ZHA not able to load. Is Home Assistant trying to upgrade the firmware without my knowledge?

I have newly installed Home Assistant, and added ~26 Zigbee devices today. First, it seemed to go well, but after a while, I was not able to add more devices until I restarted the computer. After another reset, the ZHA integration was stuck in an “Initializing” / “Failed setup, will retry” loop. I tried deleting and adding the integration again, with the same result.

Setup:

  • Intel NUC with Proxmox
  • Ubuntu VM with Aeotec Z Stick 7 (Z-Wave) and SMLIGHT SLZB-07 (Zigbee) attached to that VM (Note: They use the same USB-serial converter, and might be confused by auto probing software?)
  • Home Assistant 2024.4.1 in Docker in Ubuntu VM
  • Zigbee runs in ZHA, Z-Wave in separate zwave-js

Based on the logs, it seems that it starts initializing the network, but at some point it gets interrupted by universal_silabs_flasher probing for bootloaders.

Below is a partial excerpt from the logs.

# I guess this is regular Zigbee stuff during initialization
2024-04-06 18:34:31.612 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'6430a1a9602a15be6eb07e'
2024-04-06 18:34:31.612 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'87009f7e'
2024-04-06 18:34:31.612 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received sendUnicast: [<EmberStatus.SUCCESS: 0>, 12]
2024-04-06 18:34:31.641 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'7430b1a9702a77eaa8f27fdb55c01c959c4dec8d37a47e'
2024-04-06 18:34:31.641 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8070787e'
2024-04-06 18:34:31.642 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received trustCenterJoinHandler: [0x5862, dc:8e:95:ff:fe:35:66:f1, <EmberDeviceUpdate.STANDARD_SECURITY_SECURED_REJOIN: 0>, <EmberJoinDecision.NO_ACTION: 3>, 0x26cb]
2024-04-06 18:34:31.642 DEBUG (MainThread) [bellows.zigbee.application] Received trustCenterJoinHandler frame with [0x5862, dc:8e:95:ff:fe:35:66:f1, <EmberDeviceUpdate.STANDARD_SECURITY_SECURED_REJOIN: 0>, <EmberJoinDecision.NO_ACTION: 3>, 0x26cb]
2024-04-06 18:34:31.642 INFO (MainThread) [zigpy.application] Device 0x5862 (dc:8e:95:ff:fe:35:66:f1) joined the network
2024-04-06 18:34:31.642 DEBUG (MainThread) [zigpy.device] [0x5862] Cancelling old group rescan
2024-04-06 18:34:31.642 DEBUG (MainThread) [zigpy.zcl] [0x5862:1:0x0004] Sending request header: ZCLHeader(frame_control=FrameControl<0x01>(frame_type=<FrameType.CLUSTER_COMMAND: 1>, is_manufacturer_specific=False, direction=<Direction.Client_to_Server: 0>, disable_default_response=0, reserved=0, *is_cluster=True, *is_general=False), tsn=3, command_id=2, *direction=<Direction.Client_to_Server: 0>)
2024-04-06 18:34:31.642 DEBUG (MainThread) [zigpy.zcl] [0x5862:1:0x0004] Sending request: get_membership(groups=[])
2024-04-06 18:34:31.643 DEBUG (MainThread) [bellows.zigbee.application] Sending packet ZigbeePacket(timestamp=datetime.datetime(2024, 4, 6, 18, 34, 31, 643027, tzinfo=datetime.timezone.utc), src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), src_ep=1, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x5862), dst_ep=1, source_route=None, extended_timeout=False, tsn=3, profile_id=260, cluster_id=4, data=Serialized[b'\x01\x03\x02\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=None, rssi=None)
2024-04-06 18:34:31.643 DEBUG (MainThread) [bellows.ezsp.protocol] Send command sendUnicast: (<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 0x5862, EmberApsFrame(profileId=260, clusterId=4, sourceEndpoint=1, destinationEndpoint=1, options=<EmberApsOption.APS_OPTION_RETRY|APS_OPTION_ENABLE_ROUTE_DISCOVERY: 320>, groupId=0, sequence=3), 27, b'\x01\x03\x02\x00')
2024-04-06 18:34:31.643 DEBUG (Thread-112) [aiosqlite] executing functools.partial(<built-in method execute of sqlite3.Connection object at 0x7e77978e2110>, 'UPDATE devices_v12\n SET last_seen=:ts\n WHERE ieee=:ieee AND :ts - last_seen > :min_update_delta', {'ts': 1712428471.642384, 'ieee': dc:8e:95:ff:fe:35:66:f1, 'min_update_delta': 30.0})
2024-04-06 18:34:31.643 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'403121a9602a15d001904b21aa5493099d4e27a8f6ca6688ffc6e50f7e'
2024-04-06 18:34:31.643 DEBUG (Thread-112) [aiosqlite] operation functools.partial(<built-in method execute of sqlite3.Connection object at 0x7e77978e2110>, 'UPDATE devices_v12\n SET last_seen=:ts\n WHERE ieee=:ieee AND :ts - last_seen > :min_update_delta', {'ts': 1712428471.642384, 'ieee': dc:8e:95:ff:fe:35:66:f1, 'min_update_delta': 30.0}) completed
2024-04-06 18:34:31.644 DEBUG (Thread-112) [aiosqlite] executing functools.partial(<built-in method commit of sqlite3.Connection object at 0x7e77978e2110>)
2024-04-06 18:34:31.644 DEBUG (Thread-112) [aiosqlite] operation functools.partial(<built-in method commit of sqlite3.Connection object at 0x7e77978e2110>) completed
2024-04-06 18:34:31.652 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'0531a1a9602a15bf0c9b7e'
2024-04-06 18:34:31.652 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8160597e'
2024-04-06 18:34:31.652 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received sendUnicast: [<EmberStatus.SUCCESS: 0>, 13]
2024-04-06 18:34:31.741 DEBUG (Thread-112) [aiosqlite] executing functools.partial(<function PersistingListener._set_isolation_level.<locals>.<lambda> at 0x7e779462d940>)
2024-04-06 18:34:31.741 DEBUG (Thread-112) [aiosqlite] operation functools.partial(<function PersistingListener._set_isolation_level.<locals>.<lambda> at 0x7e779462d940>) completed
2024-04-06 18:34:31.741 DEBUG (Thread-112) [aiosqlite] executing functools.partial(<built-in method execute of sqlite3.Connection object at 0x7e77978e2110>, 'PRAGMA wal_checkpoint;', [])

# Suddenly, serial port closed
2024-04-06 18:34:31.742 DEBUG (bellows.thread_0) [bellows.uart] Connection lost: None
2024-04-06 18:34:31.742 DEBUG (bellows.thread_0) [bellows.uart] Closed serial connection
2024-04-06 18:34:31.745 DEBUG (Thread-112) [aiosqlite] operation functools.partial(<built-in method execute of sqlite3.Connection object at 0x7e77978e2110>, 'PRAGMA wal_checkpoint;', []) completed
2024-04-06 18:34:31.745 DEBUG (Thread-112) [aiosqlite] executing functools.partial(<function PersistingListener._set_isolation_level.<locals>.<lambda> at 0x7e779652b560>)
2024-04-06 18:34:31.745 DEBUG (Thread-112) [aiosqlite] operation functools.partial(<function PersistingListener._set_isolation_level.<locals>.<lambda> at 0x7e779652b560>) completed
2024-04-06 18:34:31.745 DEBUG (Thread-112) [aiosqlite] executing functools.partial(<built-in method close of sqlite3.Connection object at 0x7e77978e2110>)
2024-04-06 18:34:31.746 DEBUG (Thread-112) [aiosqlite] operation functools.partial(<built-in method close of sqlite3.Connection object at 0x7e77978e2110>) completed

# Failed to set up
2024-04-06 18:34:31.746 DEBUG (MainThread) [homeassistant.components.zha] Failed to set up ZHA
Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/components/zha/__init__.py", line 153, in async_setup_entry
zha_gateway = await ZHAGateway.async_from_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 197, in async_from_config
await instance.async_initialize()
File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 215, in async_initialize
await app.startup(auto_form=True)
File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 234, in startup
await self.initialize(auto_form=auto_form)
File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 180, in initialize
await self.permit(0)
File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 938, in permit
await super().permit(time_s, node)
File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 1254, in permit
await zigpy.zdo.broadcast(
File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 688, in broadcast
return await app.broadcast(
^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 929, in broadcast
await self.send_packet(
File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 912, in send_packet
raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to enqueue message after 3 attempts: <EmberStatus.NETWORK_BUSY: 161>

# Something seems to be probing the serial ports for bootloaders??
2024-04-06 18:34:31.749 INFO (MainThread) [universal_silabs_flasher.flasher] Probing ApplicationType.GECKO_BOOTLOADER at 115200 baud

# This is the serial port of my Zigbee stick
2024-04-06 18:34:31.749 DEBUG (MainThread) [zigpy.serial] Opening a serial connection to '/dev/ttyUSB0' (115200 baudrate)
2024-04-06 18:34:31.751 DEBUG (MainThread) [universal_silabs_flasher.common] Connection made: SerialTransport(<_UnixSelectorEventLoop running=True closed=False debug=False>, <universal_silabs_flasher.gecko_bootloader.GeckoBootloaderProtocol object at 0x7e778ed3d580>, Serial<id=0x7e7791b31f90, open=True>(port='/dev/ttyUSB0', baudrate=115200, bytesize=8, parity='N', stopbits=1, timeout=0, xonxoff=False, rtscts=False, dsrdtr=False))
2024-04-06 18:34:31.752 DEBUG (MainThread) [universal_silabs_flasher.common] Sending data b'\n'
2024-04-06 18:34:31.752 DEBUG (MainThread) [universal_silabs_flasher.common] Sending data b'3'
2024-04-06 18:34:31.806 DEBUG (MainThread) [universal_silabs_flasher.common] Received data b'\x151\xb1\xa9\r*\xf2\x90\xb8\xaco\xdbU\xc0\x1c\x95\xd0\xe1\'"\xd0~'
2024-04-06 18:34:31.807 DEBUG (MainThread) [universal_silabs_flasher.gecko_bootloader] Parsing State.WAITING_FOR_MENU: bytearray(b'\x151\xb1\xa9\r*\xf2\x90\xb8\xaco\xdbU\xc0\x1c\x95\xd0\xe1\'"\xd0~')

Don’t have an answer but a genuine question.

If you have proxmox installed, why are you going through the unnecessary complication of running HA in Ubuntu Docker?

1 Like

I’m open to suggestions, what would you recommend?

AFAIK, the alternatives would be a thinner LXC container with docker (no experience with that, but IIRC docker in LXC was not recommended) or running a VM with HAOS. I have some internal resistance to the aggressive user friendliness direction Home Assistant has taken, where first YAML configuration was scrapped, and now it looks like I can’t use Zigbee because something is helping me with a surprise firmware upgrade. HAOS seems like everything bundled together in something that can be hard to understand and work with, for instance if I need to migrate something to another host. So I guess running it in a docker-compose stack gives me a slight sense of control :slight_smile:

I’d set the paranoia aside for a little while & try running Haos using tteck’s scripts (can’t send a link right now. Don’t have it saved on mobile).

What you’re seeing is probably normal communication which is only shown when you have DEBUG logs enabled. Otherwise it’s usually hidden.

I’m not looking into this because of my paranoia, it’s because Zigbee doesn’t work.

Since the coordinator setup failed, I’d investigate to ensure the device path hasn’t changed. Did you check if the device is still linked to the same virtual device ID? The logs reference /dev/ttyUSB0, is that right?

Have you physically unplugged the coordinator to reset it?

You mention a “serial to usb converter” - What does that do? How is the phyical usb port passed to the Ubuntu VM that Home Assistant is running in?

I checked the path, and the initialization cycle repeats seemingly indefinitely. /dev/ttyUSB0 refers to the Zigbee device (ttyUSB0 corresponds to the serial number of the Z-Wave one). I also removed the Z-Wave controller to ensure that it was not a mix-up, and also that it was not the one triggering the bootloader probing.

I have physically unplugged the coordinator, and also turned off the entire PC.

The Serial to USB converter is a chip on the Zigbee coordinator stick, which is also used on the Z-Wave controller stick. The chip is what the computer sees, and how it recognizes the hardware. That’s why I suspected there might be some confusion. But unplugging the Z-Wave stick should have eliminated that.

My best guess is that the bootloader probe/surprise firmware upgrade module starts probing and interferes with ZHA, causing the Zigbee integration to fail. Maybe when I get enough devices in the network, the Zigbee integration takes some time to load, and causes the probe to think that the device is free.

They both can’t be the same device ID.

You seem convinced that a firmware upgrade is being initiated, but I don’t agree. I haven’t seen any evidence of ZHA upgrading firmware automatically in the several years I’ve run Home Assistant. I’ll leave you with that train of thought to pursue. Best of luck.

Typo, the Z-Wave is ttyUSB1.

I might be mistaken about that it actually tries to do a firmware upgrade, but I am trying to find out why ZHA fails to load. Based on the logs, it is apparent that the initialization fails when another module starts probing for a bootloader, which likely means that it takes over the serial port.

After removing and adding the integration again, ZHA is now able to successfully load. During setup of the integration this time, I was prompted with the type of controller, which I didn’t get the last time. The other main differences is that the entire loading takes seconds instead of minutes, and I find no track of universal_silabs_flasher in the logs.

My last comment was a bit harsh, but let me genuinely try to address your concerns and clear up some misconceptions.

I’d suggest running HAOS in a VM by itself and simplifying your setup.
As it stands, you currently have Proxmox > Ubuntu VM > Docker > HA (not sure which installation type, but I’m assuming it’s not HAOS).
Each of those is a potential point of failure when it comes to communicating with attached hardware. Why not try Proxmox > VM running HAOS and reduce your points of failure by 50%? (tteck script is here).

That will be true for any installation type. I’m not saying you’re wrong, but for years HA had a reputation for being hard to work with for beginners.
The devs listened and made it more accessible - maybe too accessible, but that ship has sailed.
My point is - you definitely can’t please everyone, but at least it shows that the devs are listening to people’s feedback. I’d rather have that than use a platform where feedback is ignored.

Still not sure why you got those entries, but based on your last comment there’s a few possibilities:

  • Multi-protocol (Zigbee + Matter on the same chip) related. Maybe at some point you clicked the enable multi-protocol option. (Side note - don’t enable that option. It’s still experimental and has caused issues).
  • (More probable) Given that you previously didn’t get the prompt for the controller type, the bootloader was trying to figure out which chip it was communicating with.
  • (Most probable, and related to the above) Given you’re defining the path using ttyUSB0/1 instead of serial_by_id, it picked up the Zwave chip.

Using the ttyUSB to define the path is dangerous since there’s no guarantee that it won’t change if you unplug & replug the dongles.
Have a look at this thread to learn how to set up ZHA & Zwave to use the serial_by_id in the integration/addon setup.

Honestly, it’s not hard to understand and work with at all, especially for a migration.
I moved everything from a Pi to a laptop running proxmox and it couldn’t have been smoother or simpler:

  • Take a full backup of your old HA instance and download it locally.
  • Install HA on your new host.
  • Choose “Restore from backup” on the onboarding screen and upload your backup.

It’s a simple 3 step process which restores everything, including any addons if you’re running HAOS. I was done in about an hour, and that includes installing HA in a VM from scratch.

Hope that clears up some things for you!

2 Likes

ZHA and Zigbee 2 MQTT are different than ZWave in that the controller is reloaded from the integration at startup ontonthe controller whereas ZWave stores on the controller and loads up ZWave. Yes it’s way more technical than that but it is a fundamental difference.

While I’m not positive exactly how the commands look on the wire they would look like someone or something trying to flash the stick. I’d bet that’s what you are seeing.

2 Likes

Completely agree with @ShadowFist here. That’s an awfully complicated path, especially in terms of hardware. If you want to simplify and maintain machine independence, I’d go with Proxmox-> HAOS OR just plain docker on a bare metal OS → HA Container. I currently do the latter in a docker swarm for that very reason.

Again, 100% agree. I’m also in the “I wish YAML was still more prevalent”, but at the end of the day, HA is still HA regardless of the installation type.

2 Likes

About my setup - thanks for the tips, I understand they are well intended. My comments were to explain, not defend, why I chose the way I did, answering the question “why are you going through the unnecessary complication of running HA in Ubuntu Docker?”.

I will probably migrate to a more elegant solution at some point, but right now I’ll get started with what I know how to set up. After some years with no time to play with home automation, I’m trying to get Home Assistant back up and running with ~40 ZigBee and ~30 Z-Wave devices, cameras, robot vacuums, car, lawn mover, heat pump, ventilation+++. Also Frigate and AppDaemon. So one thing at the time :slight_smile: Also, Proxmox > HAOS VM is probably also a stack of a Linux OS and maybe a container environment too.

I have not enabled multi-protocol, and I believe that is not an option for my SLZB-07.

For some reason I could not set up the ZHA integration with /dev/serial/by-id, I already tried that. But I will look more into it.

I am quite sure the Zigbee network is stored on the stick, as the devices were there after deleting and adding the integration again, just without correct device names and areas. And, I also believe there is no reason to probe for a bootloader in the middle of initializing the network.

Right now ZHA loads as expected, and I haven’t been able to reproduce the issue after several restarts and deleting/adding the integration, but if I can I’ll probably make an issue on github.

Correct. if using Zigbee2MQTT, what is stored on the stick is saved in backup.json. In ZHA, you can make a backup of the NVRAM of the stick manually.

2 Likes

Do you know the port speed and flow control settings? Adding the ZHA integration with defaults (where it suggests /dev/ttyUSBx) works, but using Enter Manually and EZSP I am prompted for port speed and flow control. Default settings (115200 and “blank” which I assume equates to none) gives me a “Failed to connect”.

Entering /dev/ttyUSB0 as Serial device path worked, so there is likely some problem with using /dev/serial/by-id…

Could you paste the full serial by id path you used and screenshots/logs of what you’re getting?

It should look something like this /dev/serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_C0F003D3-if01-port0

Just guessing, but probably docker privileged mode can’t access devices through symlinks. Mapping the device from /dev/serial/by/id… to /dev/zigbee and pointing to that one from Home Assistant worked.

Still, I’d like to understand why the universal_silabs_flasher started interfering. I don’t want to spend hours setting up the system when it can happen at any time.

Anyways. I now communicate with the stick using /dev/serial/by-id, and went on adding some more devices. After three devices successfully added, the fourth one wouldn’t add, and I restarted the integration. Now it’s back to trying to probe.

As suggested above, maybe it is a failure/timeout etc in the stick that triggers a probing for the type of stick. But, then I would expect to see some logs from universal_silabs_flasher also when everything works as expected.

I’m honestly stumped. It might be your setup, but I enabled debug logs and reloaded ZHA and didn’t get any flasher entries.
Then again, I’m running a different ezsp (Silabs) coordinator.

Clutching at straws here, but if you are 100% sure you didn’t enable matter support, can you check if it came preloaded with the matter firmware? The product page specifically lists it as supporting thread/matter.