Devices become "Unresponsive", fixed with "Reload", can't be constantly reloading

Howdy,

I have Home Assistant installed via HASSIO and roughly 40 devices on my network. I have had this issue for a while, but today I finally found some time to look into it.

Basically what is happening is that the devices become “unresponsive”. I’ll give as an example a Shelly H&T humidity and temperature sensor, connected with the USB base.

I add it to HA,
Everything OK
Half an hour later, device “unresponsive”.

If I go to HA Devices and Integrations and simply “Reload”, it’s working again. For half an hour or so.

Other devices like the Shelly DW2 also suffer from this illness. But let’s not think it only affects battery-powered devices: I also have a Shelly GAS CNG, which is plugged into an outlet and always on, and it also becomes unresponsive.

I have also read a bit about this here in the community forums; apparently several devices have been going unresponsive since an HA update a while back.

My question is: is this going to be fixed?
Is there a command to automate the “reload” of integrations every once in a while so they stop showing as “unresponsive”?

Thank you

Bump

Hiya people c’mon a little help here :sweat_smile:

Is there any way to make home assistant periodically reload an integration?

Some devices don’t like a mixed 2.4 GHz/5 GHz Wi-Fi network.
Are you able to set your Wi-Fi network to 2.4 GHz only? Check what happens then.
I also have some devices that don’t like channel switching. Try fixing the 2.4 GHz Wi-Fi channel to 1, 6, or 11.

Yes, you can set it up via an automation, but it is only a nasty workaround :wink: and it makes the device very unreliable, because you want to be able to monitor or use the device at any time.
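
For anyone who wants to try the reload workaround from outside HA, here is a minimal sketch in Python. It assumes the `homeassistant.reload_config_entry` service is available in your HA version and that you have created a long-lived access token. The base URL, token, and entry ID are placeholders, and `build_reload_request` is only an illustrative helper name. The function builds the request without sending it; the actual call is left as a comment.

```python
import json
import urllib.request


def build_reload_request(base_url: str, token: str, entry_id: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking HA to reload one config entry."""
    # Service calls go through /api/services/<domain>/<service> on the REST API.
    url = f"{base_url.rstrip('/')}/api/services/homeassistant/reload_config_entry"
    body = json.dumps({"entry_id": entry_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# To actually send it (placeholder values, replace with your own):
# req = build_reload_request("http://homeassistant.local:8123", "YOUR_TOKEN", "YOUR_ENTRY_ID")
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.status)
```

Running this on a timer only hides the symptom, and for a sleeping device a reload may just restore the last known value rather than a fresh one.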

Hi there and thank you for your reply. The IOT network is 2.4 only and clients are connected on channel 1 (it’s the optimal). I have two Aruba InstantOn AP22, all the settings are manageable.

I have a theory about this. Except for the Shelly GAS, where it only happened once and may have been something unrelated, all the devices that become “unresponsive” are battery-powered. The Shelly DW2, for example, is a door/window sensor. If the window stays closed for 8 hours and the temperature barely changes, it won’t report anything. So after a while HA says it’s “unresponsive”. Battery devices can’t be always on.
Reloading the integration works in the sense that the device no longer shows as unresponsive (although I didn’t wake the devices to check) and HA shows the last value the device actually reported.

If it would be my installation, I’d try to solve the problem, not try to work around the symptom.

But for that some more information would be necessary. Focus on one device (type) and see what the logs bring up. Enable the debug mode for the integration you’re using to integrate that device. If it’s via an Add-on, what do the logs say there? If help is needed, don’t hesitate to post some more info on the things involved (HA-OS version, integration version, Add-on version, hardware and so on).
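
Once debug logging is on, the log gets noisy fast. A small sketch like the one below can narrow it down to one integration or one device. The logger names `homeassistant.components.shelly` and `aioshelly` are my assumption about what is relevant here, and the log path and IP address in the comment are placeholders that depend on your installation.

```python
from typing import Iterable, Iterator

# Logger names assumed to be relevant for the Shelly integration; adjust as needed.
DEFAULT_NEEDLES = ("homeassistant.components.shelly", "aioshelly")


def filter_log_lines(
    lines: Iterable[str], needles: Iterable[str] = DEFAULT_NEEDLES
) -> Iterator[str]:
    """Yield only the lines that contain at least one of the given substrings."""
    wanted = tuple(needles)
    for line in lines:
        if any(needle in line for needle in wanted):
            yield line.rstrip("\n")


# Example use (path and IP are placeholders, not taken from this thread):
# with open("/config/home-assistant.log", encoding="utf-8") as fh:
#     for hit in filter_log_lines(fh, DEFAULT_NEEDLES + ("192.168.1.50",)):
#         print(hit)
```

Adding the device's IP or name as an extra substring makes it easier to compare the one misbehaving sensor against the ones that work.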

PS: it seems only your Shelly devices are affected. Please check whether these devices behave differently depending on the power source they are connected to. I have a Shelly button that does not work while it is plugged in (versus on battery), and its admin interface only works while it is plugged in. Kind of an “if power connected, only admin use”… :slight_smile:

Hi and thank you for the replies. Ok so to debug this and check logs etc I’ll have to do it over the weekend.

Only my Shellys, and only the battery-powered ones. HOWEVER, there is inconsistent behaviour with the H&T sensors, as I currently have 3 active using USB bottoms. They’re not always on, but they can update more frequently, given that atmospheric conditions change.

What I find strangest is that only one of these H&Ts goes unresponsive. The other two, exactly alike and with the same config, don’t.
I’m actually a bit tempted to spin a new HASSIO VM and add all integrations there.

Are you sure that device isn’t dropping from your network, then coming back a few minutes later? A new HAOS installation isn’t going to fix the problem.

Yes I am. Thank you.

Well, something is happening to it to make HA treat it differently than the other sensors. If they are all the exact same hardware, firmware, and integration, they all go through the same code. Usually something like that boils down to a hardware or networking issue.

It’s not a hardware issue and it’s not a networking issue. I am a systems and network administrator and have thousands worth of top networking hardware to support IOT. IOT doesn’t work if the underlying infrastructure is not good. I make my living out of it. Also, the H&T that keeps going unresponsive is the one with the best Wi-Fi signal. Wi-Fi is provided by two HPE antennas (each supports 200 clients, so that would be 400 total) and I have ZERO network-related issues.

Then it sounds like it’s a hardware issue with the device itself. The same code isn’t going to treat the same hardware differently. That’s not how code works and you should know this as a system admin. Intermittent issues always come down to the hardware itself, the hardware’s config, the hardware’s firmware, or networking.

If you have 2 sensors working well and 1 that misbehaves… I’d start looking at the hardware. You’re welcome to continue to question the integration, but you’ll notice that you’re the only one having this issue with Shelly devices…

I have 11 battery based sensors behaving poorly all with the same behaviour. (10 DW2 and 1 H&T)
Not ONE. ELEVEN.

This HA installation is my first one, from idk how many years ago when I first tried this. I look at the config directory and I see leftover crap from forever ago. Garbage collection isn’t a feature here.

Ok, by all means write up an issue then. Don’t take any advice here, and just assume the code is bad. Is that the affirmation you want with this topic?

Advice? The only advice here was from @paddy0174 that actually mentioned something useful like next steps in order to pursue the issue.
All you did was “it’s a hardware issue, it’s a whatever issue, HA code is perfect”. Cheers man, you’re a genius.

I’m giving you starting points to look at seeing that you went 3 days and had to bump your thread. You seem to gloss over that fact and are upset that I said any of this. Have a great time being very upset over some small pointers. I’m out.

I’m not sure why you seem offended by my responses. Anyone at every level of competence can overlook simple things. I deal with it on a day to day basis integrating IOT products into our assembly lines. But what do I know, I don’t help people every day or anything in my free time.

The offensive part of your response is pushing responsibility for the issue onto “hardware, network & others” out of thin air, when you don’t know any details of my HASSIO install, of my network, or of the monitoring and general work I put into it. Your insistence is offensive after I mentioned that I am a professional in this field and have made sure that part is fine. To you, it seems more plausible that 11 devices fail simultaneously than that HA has an issue. Because you know, code is perfect, bugs don’t exist, actually when a software developer delivers a product that product is closed and will be working forever without any errors, because code is perfect.

If you had any experience with software or with large-scale monitoring, you’d see how frequently polling issues (isn’t what HA does a polling job, btw?) are due to underlying conditions that have NOTHING to do with hardware.

Guys, please. :slight_smile: We’re all here to get something cool out of our hobby, namely the best smart home ever! :wink:

So back to topic: my suggestion would be, to start fresh and with the basics.

  • What HA version are you running, and is it HA-OS or another installation method?
  • How do you communicate with the Shellys? Via integration or via MQTT with the Shelly original firmware? Or did you flash a custom firmware on the Shellys (btw that’s what I do)?
  • Have you tried changing the location of the “unresponsive” device?

And now to my ideas how to get closer:

  • After a reload of the integration, you should be able to get to the logs of the device in some way. I’m not sure, but the Shellys should report back their Wi-Fi signal strength. Is there something to look for, e.g. is the signal bad even while the device is running (and connecting)? Try changing the location, and see if it gets unresponsive after a while in the new location.
  • If you don’t mind, have you thought about flashing the Shellys with ESPHome? I do this for all my Shellys, because I don’t want to run another integration and my Shellys are working really nice. But all that depends on you, having physical access to the device (eg. not inside the wall).
  • Depending on how you run the Shellys (integration, MQTT, ESPHome), is there any admin menu available? If you run it with the defaults, the Shelly should connect via the Shelly cloud. If so, is there a firmware update available in the Shelly admin interface of that device?

Questions, and more questions! :rofl: See what you can answer and try, and if things change. But the details from your installation would be good to know. It’s hard to guess, especially with Shellys, as they offer numerous ways to connect to them…

Let us know, what comes up! :slight_smile: Ah, and if this doesn’t get you anywhere, we do need some logs. :wink:

I’ve been writing code for HA and using it for 8+ years. I know very well how it works and I also know bugs happen. I never said the code was perfect. You stated that you had 3 devices, 2 that work without issue and 1 that has this problem. In my experience this is ALWAYS a hardware or networking issue.

Then, out of nowhere, you explain that you have 11 that all act this way. In my experience this would typically be a software issue. However, hardware issues are not out of the question if you set them all up the exact same way, which is easily done if you have an onboarding process.

Please keep in mind that I’m working off what you tell me. Getting pissed at me because I’m working off your information doesn’t help anyone. I’m sorry that I hurt your ego, can we please leave it at the door now and solve your problem?

FYI shelly integration doesn’t poll. It waits for a response from the device.

Lastly, this issue on shelly might shed some light.

According to the code owner, the integration expects the sleep wake up cycle to respond in a specific period of time. What’s the wake cycle on your devices?
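
To make the idea of an expected wake-up window concrete, here is a toy model in Python. This is not the integration's actual code: the function names are made up for illustration, and the 1.2 multiplier is a placeholder value, not a confirmed constant. It only shows how a device that reports on thresholds, and stays quiet longer than its declared sleep period, would end up flagged.

```python
from datetime import datetime, timedelta


def stale_deadline(
    last_seen: datetime, sleep_period: timedelta, multiplier: float = 1.2
) -> datetime:
    """Latest moment the next report may arrive before the device counts as stale.

    The multiplier (placeholder value) adds some slack on top of the sleep period.
    """
    return last_seen + sleep_period * multiplier


def is_stale(
    now: datetime, last_seen: datetime, sleep_period: timedelta, multiplier: float = 1.2
) -> bool:
    """True if no report arrived within the allowed window."""
    return now > stale_deadline(last_seen, sleep_period, multiplier)


# Example with made-up numbers: last report at 22:00, declared sleep period of 6 hours.
last = datetime(2023, 7, 20, 22, 0)
print(stale_deadline(last, timedelta(hours=6)))
```

Under this model, a door sensor whose window stays shut all night would cross the deadline without anything being wrong on the network, which is why the configured wake cycle matters.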

@paddy0174

  • First question, about the environment: HASSIO VM image
    • Home Assistant 2023.7.3
    • Supervisor 2023.07.1
    • Operating System 10.3
    • Frontend 20230705.1 - latest
  • Shellys communicate via the integration. I have multiple VLANs, with avahi handling mDNS (but this is only for HomeKit, because the Shellys and Home Assistant are all on the same subnet), and even between different VLANs everything’s working and tuned to perfection. Any new Shelly or anything else connected to the IOT network is immediately recognised by HA. I just have to confirm the integration, and in this case add the authentication, and everything is correctly set up. No custom firmware or any changes other than periodically updating their firmware when new versions are available.
  • Changing the location is irrelevant, since I have one DW2 per window plus the front door, so that’s 9 windows and the front door. Some are closer to the antenna, some are further away, some have a direct view of the antenna and some don’t, etc. So even disregarding the H&T device, all things considered, I don’t think there is another place to put any of them that would make a difference. Unless I actually glued one to the AP. And, for example, the DW2s all fail at the same time (probably because I reload them at the same time too), so it’s hard for me to understand how the Wi-Fi and placement of the device would be a factor.

I have to see how to enable debug logging and do the tests with some more time than … well, when I should be working :sweat_smile:

I have 30 Shelly devices, of these 11 are giving these problems and what they have in common is being (originally) battery powered devices. The H&T are “no longer” battery devices but their configuration is kind of still built around that principle, I’ll explain a bit further down.

@petro
Allow me to put this to you another way.
Initially, when I first added them to HA, this didn’t happen. It started at some point, and I didn’t pay much attention at the time. I’m sorry, I honestly have no idea what the last working version of HA Core was.

  • The devices:
    • 10 Shelly DW2 (Door Window sensors)
    • 3 Shelly H&T (Humidity & Temperature)

The Shelly DW2 are battery powered and there’s not much you can do about it. The H&T sensors originally came with the battery bottom, but I purchased the USB bottoms.

You say the Shelly integration doesn’t poll and instead waits for a response from the device. Ok, then if a device doesn’t report back, how long does it take to become unresponsive? Because with the H&T I’m counting a few minutes, and with the DW2 around a few hours. I only noticed the DW2 because they stop responding overnight.

The H&T wakes when a value crosses a threshold, either humidity or temperature. If the H&T is on battery, the thresholds that can be selected are further apart than on USB (e.g. on battery you can only select 1.0 intervals, while USB enables 0.5 intervals). Other than that, it has no preset reporting interval.
The DW2 are battery only but actually last long. They report if the window state changes (open/closed), have luminosity and temperature sensors, which also work by thresholds. No setting to periodically wake and call home.

The devices have the IP statically configured on the device itself & a static mapping in the router.
If I wake any of the devices I get an immediate connection with them; they have a stable connection and don’t drop (unless they go idle). I can see the signal strength for any device individually in my APs’ management interface.

I read the issue you shared there, the resolution was:

Solution: If you are running HA in docker and your network is configured as 'bridge' you have to add port 5683 in the container:

Local port 5683; Container port 5683; Type UDP

Now it works for me!

I am not using a Docker-type install. Home Assistant and the devices are on the same subnet, so this doesn’t apply to me.
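
Since that resolution is all about UDP port 5683, one way to check whether device pushes reach a given machine at all is a tiny UDP listener, sketched below in Python. Assumptions: the helper names are made up, it only sees unicast datagrams addressed to the machine it runs on (it does not join any multicast group), and on the HA host itself the port is probably already taken, so it would have to run on another machine that a device has been pointed at.

```python
import socket
from typing import Optional, Tuple


def open_udp_listener(host: str = "0.0.0.0", port: int = 5683) -> socket.socket:
    """Bind a UDP socket on the given address and port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def wait_for_datagram(sock: socket.socket, timeout: float) -> Optional[Tuple[str, int]]:
    """Return (sender IP, payload size) for the next datagram, or None on timeout."""
    sock.settimeout(timeout)
    try:
        data, (addr, _port) = sock.recvfrom(2048)
    except socket.timeout:
        return None
    return addr, len(data)


# Example use: wake a device by hand, then see whether anything arrives.
# listener = open_udp_listener()
# print(wait_for_datagram(listener, timeout=60.0))
# listener.close()
```

If nothing shows up while the device is awake and its own web interface reports the new state, the push is not leaving the device or not reaching that address.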

Now I’m going to describe a behaviour with the DW2 sensors that I have tested repeatedly:

Everything is OK and the integration is active. You open a window, you close the window, and HA reports accordingly.
You go to sleep, and next morning the device is unresponsive. You go to the integration and reload.
It reloads successfully and loads the last recorded values. Did HA contact the device on the reload? Impossible, the device IS asleep. But the integration shows as reloaded and reports values. OK.
You open the window. In HA the sensor doesn’t change state. You close the window. HA doesn’t change state. And so on. But it is not showing as unresponsive. Leave the window open.
You manually wake the DW2 sensor (put a pin to it). You access its management interface via HTTP. The state is correctly reported (as open, in this case). You go to HA. Nothing.
While the DW2 is awake, you reload the HA integration. The state is now correctly reported in HA. The DW2 goes idle. You close the window. HA reports correctly. You open the window. HA reports correctly. That holds until the next “unresponsive” event, where you have two options: either you simply reload the integrations, which clears the errors, or, if you really want things functional, you wake every device before reloading.