All my hue lights randomly becoming unavailable

FWIW, I have a test instance of Home Assistant OS on an RPI3. Using the SSH Add-on, when I ping my Philips Hue Bridge it returns sub-second responses.

Unless your DNS has an entry for the Hue Bridge, you’ll have to ping it using its IP address.

The response times from my production server (running on an old laptop) are slightly faster (under a half-second). All devices are connected to the same 10/100 switch.

Only my production server communicates with the Hue Bridge and I haven’t noticed incidents where all Hue devices become unavailable (I currently only have a half-dozen Hue Color bulbs).

Anything else you added or changed in the HA setup?

Start with the recorder and logbook: include only what you need and exclude everything else, including events.

exclude:
  event_types:
    - service_removed
    - service_executed
    - platform_discovered
    - homeassistant_start
    - homeassistant_stop
    - feedreader
    - service_registered
    - call_service
    - component_loaded
    - logbook_entry
    - system_log_event
    - automation_triggered
    - script_started
    - timer_out_of_sync

  domains:
    - alert
    - automation
    - binary_sensor
    - camera
    - climate
    - counter
    - customizer
    - device_tracker
    - group
    - input_boolean
    - input_datetime
    - input_number
    - input_select
    - input_text
    - light
    - media_player
    - proximity
    - scene
    - sensor
    - script
    - sun
    - switch
    - persistent_notification
    - person
    - remote
    - timer
    - updater
    - variable
    - weather
    - zone

Again, the ping results won’t help here. Still, to compare:

~ $ ping 192.168.1.212
PING 192.168.1.212 (192.168.1.212): 56 data bytes
64 bytes from 192.168.1.212: seq=0 ttl=63 time=0.622 ms
64 bytes from 192.168.1.212: seq=1 ttl=63 time=0.531 ms
64 bytes from 192.168.1.212: seq=2 ttl=63 time=0.560 ms
64 bytes from 192.168.1.212: seq=3 ttl=63 time=0.497 ms
64 bytes from 192.168.1.212: seq=4 ttl=63 time=0.418 ms
64 bytes from 192.168.1.212: seq=5 ttl=63 time=0.510 ms
64 bytes from 192.168.1.212: seq=6 ttl=63 time=0.478 ms
64 bytes from 192.168.1.212: seq=7 ttl=63 time=0.508 ms
64 bytes from 192.168.1.212: seq=8 ttl=63 time=0.454 ms
64 bytes from 192.168.1.212: seq=9 ttl=63 time=0.710 ms
64 bytes from 192.168.1.212: seq=10 ttl=63 time=0.526 ms
64 bytes from 192.168.1.212: seq=11 ttl=63 time=0.536 ms
64 bytes from 192.168.1.212: seq=12 ttl=63 time=0.506 ms
64 bytes from 192.168.1.212: seq=13 ttl=63 time=0.481 ms
64 bytes from 192.168.1.212: seq=14 ttl=63 time=0.456 ms
64 bytes from 192.168.1.212: seq=15 ttl=63 time=0.454 ms

and this is what’s registered on the Hub:


Add the Hue debug logging setting and check for events that take 5+ seconds. If there are none, all is well; if there are, those are the moments all Hue devices go unavailable.
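If you'd rather not eyeball the debug log, here is a rough Python sketch that scans log lines for slow fetches. It assumes the debug output contains lines shaped like "Finished fetching <name> data in <seconds> seconds"; that format is not confirmed anywhere in this thread, so adjust the regex to whatever your log actually shows. The function name `slow_fetches` and the `threshold` parameter are made up for illustration.

```python
import re

# ASSUMPTION: debug lines look like
#   "... Finished fetching <name> data in <seconds> seconds"
# Check your own log and adapt the pattern if it differs.
FETCH_RE = re.compile(
    r"Finished fetching (?P<name>.+?) data in (?P<secs>\d+(?:\.\d+)?) seconds"
)


def slow_fetches(lines, threshold=5.0):
    """Return (name, seconds) for each fetch that took at least `threshold` seconds."""
    hits = []
    for line in lines:
        match = FETCH_RE.search(line)
        if match is None:
            continue
        seconds = float(match.group("secs"))
        if seconds >= threshold:
            hits.append((match.group("name"), seconds))
    return hits


# Usage (the log path is an assumption, point it at your own log file):
# with open("home-assistant.log") as fh:
#     for name, seconds in slow_fetches(fh):
#         print(f"slow {name} fetch: {seconds:.3f}s")
```

If this prints nothing, no fetch crossed the threshold during the logged period.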

I mean, you know how it goes - You add little things here and there and then one random day your wife shouts from another room “Alexa won’t turn the Family Room lights off!”

Good starting list - Thank you! I was just going through one of my old RPi2 configuration.yaml files and found that I had a bunch of things excluded from recorder and logbook, so I’ll start there and then expand to what you’ve shared if the problem doesn’t go away.

You may have misunderstood my suggestion. Use ping to check connectivity when the devices are reported to be unavailable.

Pinging when they are all available is not likely to reveal much unless response times are unusually long even under normal circumstances (unlikely). I only posted my results for comparative “normal” conditions. As mentioned, I’ve never experienced the unavailable situation so I can’t provide ping times for that condition.
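Since the outages are brief, it may be easier to leave a timestamped ping running than to catch one by hand. A minimal Python sketch, assuming a Linux-style `ping` where `-c` is the packet count and `-W` is the reply timeout in seconds (other platforms use different flags). The helper names `ping_once` and `monitor` are hypothetical.

```python
import subprocess
import time
from datetime import datetime


def ping_once(host, timeout_s=2, run=subprocess.run):
    """Send a single ping; return True if the host replied (exit code 0)."""
    # ASSUMPTION: Linux-style flags (-c count, -W timeout in seconds).
    result = run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(host, interval_s=2, count=5, run=subprocess.run, sleep=time.sleep):
    """Ping `host` `count` times, print and return timestamped results."""
    results = []
    for _ in range(count):
        ok = ping_once(host, run=run)
        stamp = datetime.now().isoformat(timespec="seconds")
        print(f"{stamp} {host} {'reachable' if ok else 'UNREACHABLE'}")
        results.append((stamp, ok))
        sleep(interval_s)
    return results


# Usage: replace the address with your bridge's IP.
# monitor("192.168.1.212", interval_s=2, count=1800)  # roughly one hour
```

Any UNREACHABLE timestamps can then be compared with the "became unavailable" entries in the logbook.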

Anyway, if you say it’s a glitch with the integration then so be it.

@roddie, check “How to reduce your database size and extend the life of your SD card” by @tom_l for detailed info on settings for recorder, logbook and history (which I forgot all about…)

Thanks, @Mariusthvdb! This is excellent. I went through an exercise like this when I was still on my RPi2, but I didn’t bother when I rebuilt on an RPi3 or migrated to the Mini PC.

That said, I’m thinking it might be related to the Integration itself because that’s really the only thing that I can think of that changed between stable and my wife yelling at me. I’ve removed the integration and gone back to the manual entry in my YAML file.

I’ll keep working at it and let everyone know what I come up with

So, an update - I removed the Hue integration and went back to the manual YAML method:

hue:
  bridges:
    - host: 10.x.x.202

For whatever reason, HA didn’t see the hub/lights at all anymore - IIRC, I had to manually trigger a discovery when I had it configured this way before, but I was short on time, so I didn’t bother trying to troubleshoot for very long.

I added the Hue integration back while leaving the YAML intact, and I’ve had only a fraction of the “unavailable” messages in my Logbook since, and only with one bulb. Prior to this, I was getting them a lot more often with all of my bulbs.

It could just be a coincidence, and I’ll keep monitoring, but so far, so good.

I have had exactly the same issue since the last update. Every 1-2 minutes the Hue lights go to off/unavailable in the log. No changes to the config; it worked for years before. Any real fix/solution? Sounds like a trial-and-error approach so far?

I see in the log file:
Timeout fetching sensor data
Logger: homeassistant.components.hue.sensor_base
Source: helpers/update_coordinator.py:140
Integration: Philips Hue (documentation, issues)
First occurred: July 17, 2020, 11:21:47 AM (46 occurrences)
Last logged: 7:07:23 AM

and

Timeout fetching light data
Timeout fetching group data

Log Details (ERROR)

Logger: homeassistant.components.hue.light
Source: helpers/update_coordinator.py:140
Integration: Philips Hue (documentation, issues)
First occurred: July 16, 2020, 9:03:50 PM (99 occurrences)
Last logged: 7:25:14 AM

My log file is showing three different errors in fact. I wonder if all of them are coming from different issues that people may be having here:

  • Request failed 3 times, giving up (x114),
  • Timeout fetching group data (x38),
  • Timeout fetching sensor data (x19).
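To compare counts like these between setups, a small Python sketch can tally the messages straight from a log file. It only does substring matching on the error texts quoted in this thread, so it makes no assumption about the rest of the line format. The function name `tally_errors` and the log path in the usage comment are assumptions.

```python
from collections import Counter

# Error texts quoted in this thread; extend the tuple for other messages.
MESSAGES = (
    "Request failed 3 times, giving up",
    "Timeout fetching light data",
    "Timeout fetching group data",
    "Timeout fetching sensor data",
)


def tally_errors(lines, messages=MESSAGES):
    """Count how many lines contain each known error message."""
    counts = Counter()
    for line in lines:
        for message in messages:
            if message in line:
                counts[message] += 1
    return counts


# Usage (path is an assumption, use wherever your log file lives):
# with open("home-assistant.log") as fh:
#     for message, n in tally_errors(fh).most_common():
#         print(f"{message} (x{n})")
```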

I have made sure that my bridge has a fixed IP. Zigbee is operating on channel 11 and WiFi, I think, on channel 6, so there should be no interference(?). Other than that, I’m running a little low on ideas.

I may have made some progress with my issue. I have been continuously pinging my Pi-based HA server every 2s from another Pi. Both are connected to the same stock Virgin router. Every other minute, I can see the host becoming unreachable:

Mon 10 Aug 08:22:58 BST 2020: 64 bytes from 192.168.0.60: icmp_seq=405 ttl=64 time=0.475 ms
Mon 10 Aug 08:23:00 BST 2020: 64 bytes from 192.168.0.60: icmp_seq=406 ttl=64 time=0.624 ms
Mon 10 Aug 08:23:13 BST 2020: From 192.168.0.50 icmp_seq=411 Destination Host Unreachable
Mon 10 Aug 08:23:13 BST 2020: From 192.168.0.50 icmp_seq=412 Destination Host Unreachable
Mon 10 Aug 08:23:17 BST 2020: From 192.168.0.50 icmp_seq=413 Destination Host Unreachable
Mon 10 Aug 08:23:17 BST 2020: From 192.168.0.50 icmp_seq=414 Destination Host Unreachable
Mon 10 Aug 08:23:22 BST 2020: From 192.168.0.50 icmp_seq=415 Destination Host Unreachable
Mon 10 Aug 08:23:22 BST 2020: From 192.168.0.50 icmp_seq=416 Destination Host Unreachable
Mon 10 Aug 08:23:25 BST 2020: 64 bytes from 192.168.0.60: icmp_seq=417 ttl=64 time=2079 ms
Mon 10 Aug 08:23:25 BST 2020: 64 bytes from 192.168.0.60: icmp_seq=418 ttl=64 time=0.648 ms

and when I refresh the logbook in HA, these are clearly correlated with my bridge going down. So this now looks like a network connectivity problem on the server side.
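To line these gaps up with logbook entries, a Python sketch like the following can pull the outage windows out of a timestamped ping log. It assumes lines shaped like the excerpt above (a `date`-style prefix ending in the year and a colon, then the ping output) and treats any line without "bytes from" as a failure. The names `parse_line` and `outages` are hypothetical, and the timezone abbreviation is ignored.

```python
import re
from datetime import datetime

# Matches e.g. "Mon 10 Aug 08:23:00 BST 2020: 64 bytes from ..."
LINE_RE = re.compile(
    r"^\w{3} (?P<day>\d{1,2}) (?P<mon>\w{3}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\w+ (?P<year>\d{4}): (?P<rest>.*)$"
)


def parse_line(line):
    """Return (timestamp, got_reply) or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(
        f"{match['day']} {match['mon']} {match['year']} {match['time']}",
        "%d %b %Y %H:%M:%S",  # English month names assumed
    )
    return stamp, "bytes from" in match["rest"]


def outages(lines):
    """Return (last_reply, next_reply, gap_seconds) for each run of failures."""
    windows = []
    last_ok = None
    failing = False
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, ok = parsed
        if ok:
            if failing and last_ok is not None:
                windows.append((last_ok, stamp, (stamp - last_ok).total_seconds()))
            failing = False
            last_ok = stamp
        else:
            failing = True
    return windows
```

On the excerpt above this gives a single window from 08:23:00 to 08:23:25, i.e. a gap of 25 seconds between replies.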

I am trying to evaluate whether this is happening because of the router or because of my Pi4.

This is an old thread, but I’m wondering if anyone has found any resolution to this. I am encountering this issue now. It might be totally unrelated, but it seems to have started when I started hitting the bridge pretty hard with Halloween automations that quickly and repeatedly change 4 different lights to different colors. Maybe the bridge is just “getting tired” and not working as well as it did previously?

I’m still having the issue here - I thought I’d resolved it by configuring Hue support directly in configuration.yaml, but it’s still misbehaving.

Yep, still having the same issue on my setup too.

Hey guys, I’m in the same boat here.

Tried resetting the Hue hub, removed and re-added the Hue integration, disabled auto-discovery through configuration.yaml but my lamps still become briefly unavailable and this is starting to get really annoying.

I only have 3 lights on the hub so there isn’t any chance of overloading it. It just misbehaves.

This may be unrelated, but it seems that the Hue integration gets in a weird state at the same time when the bulbs misbehave. Specifically, if I’m on the Integrations page and click on one of my four Hue integrations instead of showing me the settings, it says something like “Installing” almost as though the integration crashed and is resetting.

Again, could be a false alarm, but thought I’d mention it.

As it turns out, I think the root cause of my issue was that my SD card was starting to go bad. Eventually HA stopped responding and the SD partitions were corrupt. I rebuilt a new HA on a new Raspberry Pi and everything is working smoothly.

I’m having the same issue. Every 2 min the system has an error and it “Became unavailable” for 5 seconds.

This is an issue for some automations. Having read the whole thread, I have not seen a possible solution, but maybe I missed something. What I tried:

  • Changing the Hue Hub position
  • Changing the Raspberry position
  • Changing ports
  • Uninstalling the Hue integration from Home Assistant
  • Reinstalling the Hue integration in Home Assistant

Thanks in advance if someone can help!

Regards,

No solution I’m afraid but I have recently started experiencing the same problem (Feb '21). I’m running on a Linux box not a pi, so not an SD card issue for me. Upgraded to 2021.02.06 and still getting the issue. Bit of a headscratcher. Quite annoying.

Yes, it’s quite annoying. What I’ve noticed is that after every system update the situation gets better. Instead of losing the connection every 5s, it starts losing it every hour, and then the time between occurrences gets shorter and shorter again.
Have you also noticed that?
Do you think this information could help solve something?
Regards,