I've been fighting issues with ZHA for weeks and making little to no progress.
ZHA with SkyConnect (on a 10' extension cable, I think)
HassOS as a VM on ESXi 7
Home Assistant 2023.3.3 (started a while ago, maybe on 2022.12 or 2023.1)
Supervisor 2023.3.1
OS 9.5
Frontend 20230309.0
I was at around 125 devices connected to my ZHA network with an old Nortek 2-in-1 stick. 35 or so of them were repeaters. I got my SkyConnect in late December (I think?), so I was excited to move to it. Migration was a piece of cake.
A little while earlier a couple of Osram bulbs had started acting up, so I thought the SkyConnect might work better. It didn't make a difference, so I replaced a couple of them with Sengled bulbs. Those also had issues where they wouldn't respond to commands. At the time I was at around 135 devices, still with 35 or so routers.
One day ZHA crashed completely. No device on the network would respond. I reloaded the integration and it failed to load. I got an error message (I can't find the text, it was a while ago) and it took me a while to figure out that I needed to disconnect the SkyConnect, connect it again, then reload the integration. It comes back up right away, but it can take another 20 minutes for all of my devices to come back online.
This continued for a few weeks. Same issues, same fix. Sometimes it would crash 3 times a day, other times it would be fine for a few days, but that was rare.
I decided to set up Z2M and start moving devices. Subjectively, Z2M seemed faster and more stable, while I still had issues with ZHA. Same thing: it would crash and I'd have to reseat the USB and reload ZHA to get it back online.
I'm down to 101 devices on ZHA; the rest have been moved to Z2M. Yesterday I posted in the HA Facebook group and someone mentioned that failing Osram devices were causing their ZHA network to crash. Since I'd already had several Osram failures and symptoms similar to what he described, I unplugged every Osram device on my network. (I didn't consider that I was disconnecting 12 or 13 routers, but I've still got around 25 routers for the 101 ZHA devices, and I think that'll iron itself out.)
This was last night, around 9pm. I reloaded ZHA after I did that and let it go about its business. By 12:10am, it had crashed again. I reseated the dongle and reloaded ZHA before leaving for work, and so far (11:30am) it seems to be okay, as far as I can tell from here.
What can I look at to find the cause of these problems? I'm hoping we are onto something with Osram devices being the cause, but since it crashed 3 hours later, I'm not so sure. I'm wondering if that crash was caused by the mesh changing and having to rebuild a lot of routes?
Where can I find helpful logs? If I look at the logs from before ZHA crashes I can't find anything that's helpful, but I don't know what I'm looking for. If it stops responding and I reload it before reseating the USB, I get an error in the logs that it failed to load, but I don't remember the wording, since I can tell when it's not working and I just fix it to get things back up and running.
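In case it helps anyone answering, here is a rough Python sketch of how I could pull just the Zigbee-related warnings and errors out of the log around a crash. It assumes the usual Home Assistant log line layout (timestamp, level, thread in parentheses, logger name in brackets) and that the ZHA stack logs under names starting with `homeassistant.components.zha`, `zigpy`, or `bellows`. The function names are made up for illustration, and the log path in the usage note is an assumption about my install.

```python
import re

# Assumed Home Assistant log line layout, e.g.:
# "2023-03-15 00:10:02.123 ERROR (MainThread) [some.logger.name] message text"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) "
    r"\[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)

# Logger name prefixes assumed to belong to the ZHA / Zigbee stack.
ZIGBEE_LOGGERS = ("homeassistant.components.zha", "zigpy", "bellows")
PROBLEM_LEVELS = {"WARNING", "ERROR", "CRITICAL"}


def zigbee_problems(lines):
    """Yield (timestamp, level, logger, message) for Zigbee-related problems."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            # Continuation lines such as tracebacks are skipped in this sketch.
            continue
        if (
            match["level"] in PROBLEM_LEVELS
            and match["logger"].startswith(ZIGBEE_LOGGERS)
        ):
            yield match["ts"], match["level"], match["logger"], match["msg"]


def scan_log(path):
    """Print every Zigbee-related warning or error found in the file at path."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for ts, level, logger, msg in zigbee_problems(handle):
            print(ts, level, logger, msg)
```

I would run it as `scan_log("/config/home-assistant.log")` right after a crash, before reloading ZHA, so the lines leading up to the failure are still in the file.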
Short of moving 100 devices to Z2M, what can I do to find the cause of this? I'm fine with EVENTUALLY moving to Z2M, but I don't want to try to move 100 devices at once.
Should I ditch the SkyConnect and migrate back to the Nortek for the time being? Surprisingly, I don't think I had a single ZHA issue when that was my coordinator, but I don't think the device count ever went above 110 on it.
Any insight or advice would be much appreciated!! Also, please let me know what other info I might need to provide. This is my first post here, so hopefully I've given enough info.