Shelly 1 and 2pm stop communicating but still on wifi?

New Shelly user, long time HA and networking…

This is about shelly not about HA really, but hopefully some expertise here:

I have three shelly switches, two 2PM and one 1, that have been installed for 4-5 days. I also have a CNG sensor and flood sensor both of which are working fine.

In the last day I’ve had three failures, on one twice, one once, and one not at all.

They do NOT drop off wifi, as in disassociate. I checked the AP logs. They actually continue to communicate at the low level in that they respond properly to ARP requests, and with the expected IP and MAC and on the right SSID. All my testing is on one subnet.

They will not respond to pings, or to home assistant or to web access.

A power cycle fixes it.

It is not a wifi signal issue, they have an RSSI of -38 and -56 dBm. And again, the AP shows them still connected in this failure mode.

It acts as though the IP stack is failing.

They are running the stock Shelly firmware, which is current, and no cloud or other options are enabled other than Bluetooth proxy (which works). They have not been modified or re-flashed with any other software.

This feels like some kind of software glitch. Is anyone having issues? The firmware in use is 20240430-105743/1.3.1-gd8534ee so it is really new (but I have no older experience as I only just got these).

Again – NOT a wifi dropout issue. I see lots of postings about those (though someone who doesn’t know how to look might think it is wifi, given that it goes incommunicado for most things).

Any ideas?

Linwood

PS. Yes, this started near the peak of the aurora from the CME, but…

Have you enabled debug logging and captured the logs?

I had not, I just did, however I assume that logging comes from the communications with HA. When they stop communicating can it show anything other than that they stopped communicating even with pings? (I see these going off due to a network monitor that does pings every 60s).
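For anyone who wants to timestamp this failure mode without a full network monitor, here is a minimal sketch. It is an assumption-laden stand-in for the ping monitor mentioned above: it probes TCP port 80 (the device's web interface) instead of sending ICMP pings, since ICMP needs raw sockets. The function names and the 60-second interval are illustrative only.

```python
import socket
import time
from datetime import datetime


def tcp_reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def monitor(hosts, interval=60.0, rounds=None):
    """Probe each host every `interval` seconds and print state changes only."""
    last = {}
    done = 0
    while rounds is None or done < rounds:
        for host in hosts:
            up = tcp_reachable(host)
            if last.get(host) != up:
                stamp = datetime.now().isoformat(timespec="seconds")
                print(f"{stamp} {host} {'UP' if up else 'DOWN'}")
                last[host] = up
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)
```

Calling `monitor(["192.168.1.50"])` (a hypothetical address) runs until interrupted. A DOWN line while the AP still lists the device as associated would match the behaviour described in this thread.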

I’m waiting for more. I had a failure at 2am, 11am and 1pm. It’s now 5pm so lack of another event so far means nothing.

More when something happens.

I don’t know, but there might be a clue in there.

If there are not any clues, I would take the logs and post them as a new issue on Shelly GitHub Issues (I’m assuming that you checked there for any similar issues).

I can’t see how the integration is actually involved when the device stops responding even to pings, but in reading through the issue list nothing sounds similar. I found a lot of postings elsewhere about dropping off wifi (but this is not that). I’ll just have to wait for it to happen again and see if I get anything.

I want to enable logging in the device, but there’s no indication it persists through power cycles (do you know?). I am also completely confused about enabling logging: it has a specific enable for MQTT and for websocket, but no general one. I enabled websocket and will leave a couple of browsers open to see if they stay connected and show anything. Now I just need it to fail again.
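One alternative to keeping browsers open, offered as a sketch only: as far as I know the Gen2 firmware also has a UDP debug target (a `debug.udp.addr` entry in the Sys config, next to the MQTT and websocket enables). Please verify that against the Shelly API docs for your firmware before relying on it. If it is available, a small listener on another machine could write whatever the device sends to a file. The port 8514 and the file name below are hypothetical choices.

```python
import socket
from datetime import datetime


def format_entry(payload: bytes, sender: str) -> str:
    """Turn one received datagram into a timestamped log line."""
    text = payload.decode("utf-8", errors="replace").rstrip()
    stamp = datetime.now().isoformat(timespec="seconds")
    return f"{stamp} {sender} {text}"


def listen(port: int = 8514, path: str = "shelly-debug.log", max_packets=None):
    """Append every UDP datagram received on `port` to the file at `path`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        received = 0
        with open(path, "a", encoding="utf-8") as log:
            while max_packets is None or received < max_packets:
                payload, (addr, _port) = sock.recvfrom(4096)
                log.write(format_entry(payload, addr) + "\n")
                log.flush()  # keep the file current in case the device hangs
                received += 1
```

Because the file is flushed after every datagram, the last lines before a hang would survive the power cycle needed to recover the device.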

A device hardware / firmware issue can be logged with Shelly support directly:

Ah, thank you. Done.

And of course after those three outages no more… Murphy is active.

Unless debug logging and keeping a web browser open is actually avoiding some bug.

Waiting continues…

Arggg…

So after a few days of no outages I decided to stop the web browser that was open on each one. Within a few hours one of them hung again - could be coincidence, could be the web browser was somehow avoiding the bug.

What has me frustrated is the debug mode on the shelly integration had been turned off. I didn’t do it. Something else did. Not sure what.

But I also turned it back on and do not see where to find the log. Anyone know? Under System > Logs, the drop-down doesn’t have anything specific to Shelly. Do I need to enable it in configuration.yaml somewhere and not just on the integration screen? I still don’t expect to get anything there, other than loss of communication, but…

Linwood

I had two 1s that behaved like that. Flashed with ESPHome and rock solid for the last three years.

I hadn’t realized that was an option. Do you know if there are example yaml files floating around? Or could you share?

Update: I found several, thanks. What I was missing is just the idea it would work.

@maxK thank you. I didn’t realize it was mixed in with the core. I guess if it fails again I will have something.
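Since the Shelly entries are mixed in with everything else in the core log, a small filter can pull them out afterwards. This is a sketch under assumptions: that the install writes a plain-text `home-assistant.log`, and that the relevant logger names are `homeassistant.components.shelly` and `aioshelly`. Adjust both if yours differ.

```python
from typing import Iterable, Iterator

# Logger names assumed for the Shelly integration and its backing library.
SHELLY_LOGGERS = ("homeassistant.components.shelly", "aioshelly")


def shelly_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that mention a Shelly-related logger."""
    for line in lines:
        if any(name in line for name in SHELLY_LOGGERS):
            yield line.rstrip("\n")


def shelly_lines_from_file(path: str) -> list:
    """Read a log file and return its Shelly-related lines."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return list(shelly_lines(handle))
```

For example, `shelly_lines_from_file("/config/home-assistant.log")` (path assumed) returns only the Shelly lines, which is easier to attach to a support ticket than the full log.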

I want to correct a mis-impression – Shelly responded promptly to that ticket and I somehow lost the email (since found). They sent a reminder a week or so later.

Perhaps worse (or better), for the last 5 days or so there have been no outages. I hate mysteries.

Hi, do you perhaps use an AiMesh router setup?
I also had some problems and, in desperation, I bound the Shelly to a specific router. It has now been a few days without a problem, but it is still too soon to conclude anything. I’m not sure how AiMesh works, but I was wondering whether being handed off to another router might be when it became inaccessible.

No mesh involved, no wifi issues involved as I’ve mentioned above, since ARP works. This is not a network issue (in the sense of outside the device), it is something happening inside the device.

It’s been a while, but have you achieved progress in the meantime?
I have a bunch of Shellys, but only the 2PM Gen3s for the window shades are doing that and it’s super annoying to open their hiding spots to reset them. The others are perfectly fine.
Unfortunately I can’t diagnose deeper because I have TP-Link hardware where the interface is a**. Might have to change that at some point in the future when I have money lying around…

I was thinking of programming a reboot every night inside the devices, but as it’s not possible with normal schedules, I didn’t get around to looking up how to do it with a script.
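A possible way to do the nightly reboot without an on-device script, offered as a sketch: on Gen2 and later devices the RPC API should accept a `Schedule.Create` call whose action is `Shelly.Reboot`. The code below only builds and sends that request. It assumes the device has no authentication enabled and uses the six-field cron-style timespec (seconds first), so check both against the Shelly API docs for your model. The function names are illustrative.

```python
import json
import urllib.request


def nightly_reboot_request(hour: int = 3, minute: int = 0, request_id: int = 1) -> dict:
    """Build a JSON-RPC request that schedules a daily reboot at hour:minute."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("hour must be 0-23 and minute 0-59")
    return {
        "id": request_id,
        "method": "Schedule.Create",
        "params": {
            "enable": True,
            # Cron-like fields: second minute hour day-of-month month day-of-week
            "timespec": f"0 {minute} {hour} * * *",
            "calls": [{"method": "Shelly.Reboot"}],
        },
    }


def send_rpc(device_ip: str, request: dict, timeout: float = 5.0) -> dict:
    """POST the request to the device's /rpc endpoint and return the reply."""
    data = json.dumps(request).encode("utf-8")
    req = urllib.request.Request(
        f"http://{device_ip}/rpc",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Something like `send_rpc("192.168.1.60", nightly_reboot_request(3, 30))` (hypothetical address) would be run once per device. It only works around the hang rather than fixing it, and it cannot help a device that is already unresponsive.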

My problem just went away, so I assume that Shelly did some update at some point.

Not my intention to disappoint you, but the most recent firmware versions for Shelly Gen1 were released too long ago to have fixed your issue:

  • 1.14 - September’23
  • 1.14.1 RC1 (and never finalized) - November’23

Maybe a terminology issue: I have the Shelly 1 Gen 2?

Or maybe it was global warming, cosmic rays, or something I changed for other reasons on the network and did not notice, but no longer failing is still a good thing.