Network issue when adding a new Thread device on my bare-metal installation, but working on my virtualized platform

Hello,

I just switched from a virtualized installation of HA, based on Proxmox and a standard x86-64 image, to a bare-metal installation so that it runs on dedicated hardware.

I am facing issues with Thread and Matter that I do not have on the virtualized installation, and I currently have no clue where to look for more information to identify what exactly is causing the issue. Both are configured the same way.

What is not working: I can trigger the discovery of a new Matter/Thread device, but it fails at the network step (my phone reports that it can’t connect to the Thread network).

Where it is working: Intel NUC, Proxmox OS, one VM running the latest HA OS.
I am using a Sonoff Zigbee Dongle E (usb-ITEAD_SONOFF_Zigbee_3.0_USB_Dongle_Plus_V2_20230829202004-if00) and the Silicon Labs Multiprotocol add-on, version 2.4.4.
On top of that, I run the ZHA and Thread/Matter integrations. I can add a Zigbee device and I can add a Thread device; both work fine. The Sonoff dongle uses USB passthrough, of course.

Where it is not working: Zimaboard using the same Generic x86-64 image.
I installed the same add-ons and the same integrations, and I use the exact same Sonoff USB dongle. All installed component versions look the same.

I can also move the Sonoff dongle from one installation to the other: it keeps working on my virtualized installation, and it does not work on the other one.

So:

  • HA is working on at least one of my installations
  • the Sonoff dongle is working
  • the Matter/Thread devices and Zigbee devices are working

There must be something incorrectly configured on my installation, but I have no idea what to look for or what the correct settings should be. I thought IPv6 on the network might be an issue, but both installations are on the same wired network, with DHCP. Still, maybe there is a difference in this area that should be checked.

If anyone knows some troubleshooting steps that I can run on both installations, maybe I would be able to identify what is wrong with my production installation!

Thanks!

Not sure… so here are some guesses. Unless you transferred the image from Proxmox to the Zimaboard, the Zimaboard’s Multiprotocol Thread stack starts with a new, different dataset. If you are commissioning devices using Android, Android will use the Thread dataset it found the first time, which is probably the Proxmox Multiprotocol Thread dataset. If this is the problem, try either importing or sync’ing the Android dataset with the Zimaboard’s HA Thread integration.

Your old Thread dataset can be found in .storage/thread.datasets.

Hello to both of you.

I am not sure I fully understand your answer.
Just to be sure your remark applies to my installation: my VM and my bare-metal installation are two separate installations. No system copy, nothing reused (imported/exported). I just made sure to install the same integrations and services and to configure them in a very similar way… but maybe not 100% identically, as there are of course differences between the two systems.

I had no idea what this dataset is or how to sync it. Can I just reset it? I had no idea my issue could come from the Android smartphone rather than my HA installation!

Do you move the dongle from one installation to the other, or do you have two similar dongles? “The exact same” is ambiguous.

All the devices on a Thread network use a common “dataset”, which contains things like credentials, the channel number to operate on, etc. Usually, when a Thread network is used for the first time, the Thread Border Router autogenerates the dataset. HA’s Multiprotocol add-on and OTBR add-on do just this, and the HA Thread integration retrieves this dataset from the add-on.
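To make the “dataset” a little more concrete: the “Active dataset TLVs” string is a sequence of type-length-value records (one byte of type, one byte of length, then the value). The sketch below splits such a hex string into named fields. It assumes the MeshCoP TLV type numbers from the Thread specification (0 = channel, 1 = PAN ID, 2 = extended PAN ID, 3 = network name, 5 = network key, 7 = mesh-local prefix, and so on), ignores the rarely used extended-length form, and `decode_dataset` is a hypothetical helper, not a Home Assistant API.

```python
# Minimal Thread dataset TLV decoder (sketch). Type numbers are assumed from
# the MeshCoP TLV table in the Thread specification; extended-length TLVs are
# not handled.
TLV_NAMES = {
    0: "channel",
    1: "pan_id",
    2: "extended_pan_id",
    3: "network_name",
    4: "pskc",
    5: "network_key",
    7: "mesh_local_prefix",
    12: "security_policy",
    14: "active_timestamp",
    53: "channel_mask",
}


def decode_dataset(tlv_hex: str) -> dict:
    """Split an "Active dataset TLVs" hex string into named fields."""
    data = bytes.fromhex(tlv_hex.strip())
    fields = {}
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError(f"truncated TLV header at byte {i}")
        tlv_type, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError(f"TLV type {tlv_type} is truncated")
        name = TLV_NAMES.get(tlv_type, f"type_{tlv_type}")
        if tlv_type == 0:
            # Channel TLV: 1-byte channel page, then a 2-byte channel number.
            fields[name] = int.from_bytes(value[1:3], "big")
        elif tlv_type == 3:
            # Network name is UTF-8 text.
            fields[name] = value.decode("utf-8")
        else:
            # Everything else stays as hex so it is easy to compare.
            fields[name] = value.hex()
        i += 2 + length
    return fields


if __name__ == "__main__":
    # The example TLV posted later in this thread, split per record.
    example = (
        "0e080000000000010000"
        "000300000f"
        "35060004001fffe0"
        "0208eb1a834bfa4bb900"
        "0708fda1908e83ccabb8"
        "0510698610ca912e666c08b8112a1b331f94"
        "030e686f6d652d617373697374616e74"
        "01023918"
        "041082b73b287fbc0b1f22e4d57e99dde75f"
        "0c0402a0f7f8"
    )
    info = decode_dataset(example)
    print(info["channel"], info["network_name"], info["pan_id"])
```

For that example dataset, the decoder reports channel 15 and network name home-assistant, which is the kind of detail worth checking on each installation.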

Go to UI -> Integrations -> Thread integration and click “Configure”. You will likely see only one Thread network listed, called “home-assistant”. Also check whether the “home-assistant” network is shown under the heading “Preferred network”. For the “home-assistant” network, click the (i) icon. You should get a pop-up, and one of the items is “Active dataset TLVs”.
This is the “dataset” I’ve been referring to. Compare the dataset on Proxmox with the one on the Zimaboard. If they are different, that difference may well be the problem.
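Comparing two long hex strings by eye is error-prone. A small script can split each “Active dataset TLVs” string into records and report which fields differ. This is a sketch: `split_tlvs` and `diff_datasets` are hypothetical helpers, the field names assume the Thread MeshCoP TLV type numbers, and records are compared per type, so letter case and record order do not matter.

```python
import sys

# Assumed MeshCoP TLV type numbers (subset) -> readable names.
FIELD_NAMES = {
    0: "channel", 1: "pan_id", 2: "extended_pan_id", 3: "network_name",
    4: "pskc", 5: "network_key", 7: "mesh_local_prefix",
    12: "security_policy", 14: "active_timestamp", 53: "channel_mask",
}


def split_tlvs(tlv_hex: str) -> dict[int, str]:
    """Map each TLV type to its value as lowercase hex."""
    data = bytes.fromhex("".join(tlv_hex.split()))
    records = {}
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError(f"truncated TLV header at byte {i}")
        tlv_type, length = data[i], data[i + 1]
        value = data[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError(f"TLV type {tlv_type} is truncated")
        records[tlv_type] = value.hex()
        i += 2 + length
    return records


def diff_datasets(a_hex: str, b_hex: str) -> list[str]:
    """Return the names of the fields whose values differ between two datasets."""
    a, b = split_tlvs(a_hex), split_tlvs(b_hex)
    return [
        FIELD_NAMES.get(t, f"type_{t}")
        for t in sorted(a.keys() | b.keys())
        if a.get(t) != b.get(t)
    ]


if __name__ == "__main__":
    # Usage: python diff_tlv.py <proxmox TLV> <zimaboard TLV>
    print(diff_datasets(sys.argv[1], sys.argv[2]) or "datasets are identical")
```

If the two installations really share one Thread network, the list should be empty. A difference in `network_key` or `extended_pan_id` means they are two separate networks.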

For how to transfer the dataset, here is the document to read. It describes two cases.

I mentioned “sync’ing” earlier. I find it somewhat confusing, but my understanding is that sync’ing can copy the Thread dataset from HA to Android (when HA has a “preferred” dataset and Android has no dataset at all) and from Android to HA (when Android has a Thread dataset and HA has no “preferred” dataset). So the latter may work for you, but I’m no longer entirely sure.
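The two sync cases described above can be written as a small decision rule. This encodes the poster’s stated (and explicitly uncertain) understanding only, not documented behavior of the HA companion app or Google Play services. `Sync` and `sync_direction` are hypothetical names, and any combination outside the two described cases is labeled as not covered.

```python
from enum import Enum


class Sync(Enum):
    HA_TO_ANDROID = "copy HA's preferred dataset to Android"
    ANDROID_TO_HA = "copy Android's dataset to HA"
    UNCLEAR = "not covered by the description above"


def sync_direction(ha_has_preferred: bool, android_has_dataset: bool) -> Sync:
    """Encode the two sync cases as described in the post (assumption only)."""
    if ha_has_preferred and not android_has_dataset:
        return Sync.HA_TO_ANDROID
    if android_has_dataset and not ha_has_preferred:
        return Sync.ANDROID_TO_HA
    # Both sides already have a dataset, or neither does.
    return Sync.UNCLEAR
```

In this thread, both the phone (holding the Proxmox dataset) and the new HA install (holding its own preferred dataset) likely already have a dataset. That falls outside both described cases, which fits the suggestion to import the dataset explicitly instead.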

However, I think the best thing to try is to “Import” the dataset from Android to HA (“Case 2” in the document). Give it a try and see whether it makes the “Active dataset TLVs” the same on Proxmox and the Zimaboard.

[Edit] Let me add a third possibility… In the Thread integration, click the three dots in the upper-right corner and you’ll see an option to “Add dataset from TLV”. There you should be able to paste the “Active dataset TLVs” from Proxmox into the Zimaboard. It might actually be easier than importing.
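Because the TLV is a long hex string copied by hand, a truncated or mangled paste is an easy mistake to rule out before using “Add dataset from TLV”. The sketch below checks that the string is clean hex, that every TLV record is complete, and that a few core fields are present. The function name and the choice of “required” fields are assumptions for illustration. They are not what Home Assistant itself validates.

```python
import string

# Fields assumed to be needed for a device to join (illustrative minimum set;
# type numbers from the Thread MeshCoP TLV table).
REQUIRED = {
    0: "channel",
    1: "pan_id",
    2: "extended_pan_id",
    3: "network_name",
    5: "network_key",
    7: "mesh_local_prefix",
}


def check_tlv_for_paste(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the string looks usable."""
    tlv = "".join(raw.split())  # drop spaces/newlines picked up while copying
    if len(tlv) % 2 or any(c not in string.hexdigits for c in tlv):
        return ["not an even-length hex string"]
    data = bytes.fromhex(tlv)
    problems, seen, i = [], set(), 0
    while i < len(data):
        if i + 2 > len(data):
            problems.append(f"truncated TLV header at byte {i}")
            break
        tlv_type, length = data[i], data[i + 1]
        if i + 2 + length > len(data):
            problems.append(f"TLV type {tlv_type} is cut off (incomplete copy?)")
            break
        seen.add(tlv_type)
        i += 2 + length
    problems += [f"missing {name}" for t, name in REQUIRED.items() if t not in seen]
    return problems
```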


Oh, I see. I have a single dongle, which I switch from one installation to the other.

Thanks, I will try that tonight and let you know whether I manage to make progress on my issue 🙂

Okay, I guess something went wrong in my second installation.
Just to be clear, I do not want to share the Sonoff dongle between both installations. I do not need it on my previous Proxmox setup, and I can get rid of everything there; that option is fine with me too.

Here is the initial status on my Zimaboard:

I have two networks. One is using channel 15 (!!!)


The other is using channel 25

My ZHA/SL Multiprotocol is using 25.

If I understand correctly, is it already incorrect at this point?
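On the channel question: with the Silicon Labs Multiprotocol add-on, one radio serves both Zigbee and Thread, so the Thread network in use has to be on the same IEEE 802.15.4 channel (11 to 26) as ZHA. The sketch below reads the channel out of a dataset TLV string and compares it with the Zigbee channel. `thread_channel` and `channels_match` are hypothetical helpers. The Channel TLV layout (type 0: one channel-page byte followed by a two-byte channel number) is assumed from the Thread specification.

```python
def thread_channel(tlv_hex: str) -> int:
    """Return the channel number from a dataset's Channel TLV (type 0)."""
    data = bytes.fromhex("".join(tlv_hex.split()))
    i = 0
    while i + 2 <= len(data):
        tlv_type, length = data[i], data[i + 1]
        if tlv_type == 0 and length == 3 and i + 5 <= len(data):
            # Value = 1-byte channel page + 2-byte big-endian channel number.
            return int.from_bytes(data[i + 3 : i + 5], "big")
        i += 2 + length
    raise ValueError("no Channel TLV found")


def channels_match(tlv_hex: str, zigbee_channel: int) -> bool:
    """True if the Thread dataset and ZHA use the same 802.15.4 channel."""
    if not 11 <= zigbee_channel <= 26:
        raise ValueError("2.4 GHz 802.15.4 channels are 11-26")
    return thread_channel(tlv_hex) == zigbee_channel
```

With ZHA on 25, only a Thread network whose dataset decodes to channel 25 can be served by the shared radio. A network on channel 15 would not match.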

Here are the Thread networks on my Proxmox installation:

And here is the information:

I plugged my dongle back in there to check: everything is still working, and Thread is functional:

I fixed it this way on the Zimaboard:


which is still not correct, I guess, as I am still facing the network issue.

I am not sure I properly understood everything you shared, but I went through the documentation and it seems “maybe” OK.
I would be more than happy to reset everything if it could be a shortcut. I am in the process of adding devices, so I prefer starting with something clean and properly configured.

Here is the “final” configuration after a system reboot, but there is still the same issue at the network check on my smartphone:

Thanks!

A bit of a mess, methinks 🤔.

It seems like your Proxmox HA instance is named “homeassistant.local”
and your ZimaBoard HA instance has the name “homeassistant-2.local”.

It seems that your working Proxmox Thread Network (the one using Channel 25) has the name “ha-thread-1b6f”.

Somehow your ZimaBoard

  • has auto-discovered the Proxmox Thread network named ha-thread-1b6f.
  • has also created its own thread network named “ha-thread-8cea”

In your fix for ZimaBoard, it looks like:

  • Has its own Thread Network named home-assistant and is marked as preferred.
  • Has discovered the Proxmox Thread network “ha-thread-1b6f”

I really don’t know what to recommend… maybe try the following:
Assuming ha-thread-1b6f is the working Thread network from Proxmox, then on your ZimaBoard “fix” (last image), click “MAKE PREFERRED NETWORK” for the ha-thread-1b6f network. If that succeeds, it should give your ZimaBoard’s “home-assistant” Thread network the same dataset as ha-thread-1b6f, and you should then be working with your existing devices.


You need to synchronise the Thread datasets.

On your VM, go to .storage and find the thread.datasets file.

Open it:

```json
{
  "version": 1,
  "minor_version": 4,
  "key": "thread.datasets",
  "data": {
    "datasets": [
      {
        "created": "2023-08-07T08:39:54.356366+00:00",
        "id": "01H77JE77M0KV2WVAD7VNB6NEZ",
        "preferred_border_agent_id": "3443750a34507d6d18072ac71376c3b6",
        "preferred_extended_address": "020d1099d6c33960",
        "source": "otbr",
        "tlv": "0e080000000000010000000300000f35060004001fffe00208eb1a834bfa4bb9000708fda1908e83ccabb80510698610ca912e666c08b8112a1b331f94030e686f6d652d617373697374616e7401023918041082b73b287fbc0b1f22e4d57e99dde75f0c0402a0f7f8"
      }
    ],
    "preferred_dataset": "01H77JE77M0KV2WVAD7VNB6NEZ"
  }
}
```
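If you would rather not scan the JSON by hand, a short script can pick out the preferred dataset. It looks up the entry whose `id` matches `preferred_dataset` and prints its TLV together with its `preferred_border_agent_id`. This is a sketch that assumes the file layout shown above. The path to `.storage/thread.datasets` depends on how you reach the config directory, so it is passed in as an argument, and `load_store` and `preferred_dataset` are hypothetical helper names.

```python
import json
import sys
from pathlib import Path


def load_store(path: str) -> dict:
    """Read an HA .storage JSON file such as thread.datasets."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def preferred_dataset(store: dict) -> dict:
    """Return the dataset entry whose id matches data.preferred_dataset."""
    data = store["data"]
    preferred_id = data.get("preferred_dataset")
    for dataset in data.get("datasets", []):
        if dataset.get("id") == preferred_id:
            return dataset
    raise LookupError("no preferred dataset in this store")


if __name__ == "__main__":
    # Usage: python show_preferred.py <path to .storage/thread.datasets>
    entry = preferred_dataset(load_store(sys.argv[1]))
    print("source:      ", entry.get("source"))
    print("border agent:", entry.get("preferred_border_agent_id"))
    print("tlv:         ", entry["tlv"])
```

Running it on both installations gives two TLV strings and two border agent IDs that can be compared directly.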

On your Zimaboard, go to the Thread integration, click “Configure”, then the three dots in the upper-right corner.

Choose “Add dataset from TLV”.


Paste the long TLV string from your VM’s dataset.

Now you will have one Thread network.


The tlv value for the dataset under the .storage directory is exactly the same value shown in the UI under “Active dataset TLVs”, and it is the one I had already copied when @wmaker suggested it.

I tried that option:


But I still get the error “Unable to connect to Thread network home-assistant” 🙁


Do you have any new errors in your logs coinciding with the attempt? I’d like to know why it failed, and the logs probably said something.

If your last image is for the ZimaBoard, then it looks like it now has the “preferred” dataset, which is hopefully the original working one from Proxmox. However, the Thread integration also thinks the Proxmox TBR is the one the dataset is associated with. Try the following…
Power down the Proxmox VM so that it stops advertising its TBR. Then plug the Sonoff into the ZimaBoard and power it up… Now what do you see in the Thread integration?

Matter logs:

```
2024-03-31 22:48:58 (MainThread) INFO [matter_server.server.helpers.paa_certificates] Fetched 130 PAA root certificates from DCL.
2024-03-31 22:48:58 (MainThread) INFO [matter_server.server.helpers.paa_certificates] Fetching the latest PAA root certificates from Git.
2024-03-31 22:49:11 (MainThread) INFO [matter_server.server.helpers.paa_certificates] Fetched 90 PAA root certificates from Git.
2024-03-31 22:49:11 (MainThread) WARNING [FabricAdmin] Allocating new controller with CaIndex: 1, FabricId: 0x0000000000000002, NodeId: 0x000000000001B669, CatTags: []
2024-03-31 22:49:11 (Dummy-2) CHIP_ERROR [chip.native.DL] Long dispatch time: 506 ms, for event type 2
2024-03-31 22:49:11 (MainThread) INFO [matter_server.server.device_controller] Loaded 0 nodes from stored configuration
2024-03-31 22:49:11 (MainThread) INFO [matter_server.server.vendor_info] Loading vendor info from storage.
2024-03-31 22:49:11 (MainThread) INFO [matter_server.server.vendor_info] Loaded 184 vendors from storage.
2024-03-31 22:49:11 (MainThread) INFO [matter_server.server.vendor_info] Fetching the latest vendor info from DCL.
2024-03-31 22:49:11 (MainThread) INFO [matter_server.server.vendor_info] Fetched 182 vendors from DCL.
2024-03-31 22:49:11 (MainThread) INFO [matter_server.server.vendor_info] Saving vendor info to storage.
```
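Regarding the earlier question about errors that coincide with the commissioning attempt: once debug logging is on, the Matter Server log gets long, and it can help to keep only the non-INFO lines. This sketch assumes the line format seen in the excerpt above (`timestamp (thread) LEVEL [logger] message`). The regex, the `NOTABLE` level set and `notable_lines` are hypothetical choices, and a level name alone does not tell you whether a line relates to the failed attempt.

```python
import re
import sys

# Assumed format: "2024-03-31 22:49:11 (MainThread) WARNING [logger.name] message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<thread>[^)]*)\) "
    r"(?P<level>[A-Z_]+) "
    r"\[(?P<logger>[^\]]*)\] "
    r"(?P<msg>.*)$"
)

NOTABLE = {"WARNING", "ERROR", "CRITICAL", "CHIP_ERROR"}


def notable_lines(lines):
    """Yield (timestamp, level, logger, message) for lines whose level is in NOTABLE."""
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match["level"] in NOTABLE:
            yield match["ts"], match["level"], match["logger"], match["msg"]


if __name__ == "__main__":
    # Usage: python notable.py < matter_server.log
    for ts, level, logger, msg in notable_lines(sys.stdin):
        print(f"{ts} {level:<10} {logger}: {msg}")
```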

Silicon Labs Multiprotocol:

```
Restarting
[22:50:59:152210] Info : Endpoint socket #12: Client disconnected. 1 connections
[22:50:59:152331] Info : Client disconnected
[22:51:00:157701] Info : New client connection using library v4.3.1.0
[22:51:00:161248] Info : Endpoint socket #12: Client connected. 2 connections
Reusing socket from previous instance.
```

There are no Thread entries, so I do not know where to look or what to look for 🙁

The Proxmox installation has almost always been down, except when I was asked to retrieve some configuration from it.
I do not care about that installation: it was only for some quick tests to check that, with Multiprotocol, I could get ZHA and Thread working properly (and reliably), and it will be deleted. The “quick” tests are starting to get very annoying 😀

Here is the Thread page now, after some reboots, TLV imports, and a reset of the phone app that was connected to both installations:

Is it better ? I have no idea !

But the result is the same: neither my Nuki 4 Pro nor my Eve Energy can connect. They get to “Verify network connectivity”… and fail.

Things are now getting even worse… Home Assistant is now telling me that I need a border router.

Even though the Thread configuration is there and seems OK.

I updated to the latest Silicon Labs Multiprotocol version (2.4.5) and Matter (5.5.1).
I am going through the very few logs I can find and there are no obvious errors, but I activated debug logging in case it helps.

I keep asking, but is there any way to factory reset the installation, so that HA, the Sonoff dongle and my smartphone are all wiped and I can use Matter and Thread again?

I thought I would be able to fix this installation, it seems I cannot.

Android/Google Play stores the first credentials it ever gets and won’t use another set. I think it stores the credentials against a border router “agent-id”, so either your credentials are not the same as the original ones, and/or the “agent-id” is different, which may be why you’re getting “border router not found”. There is a way to fix this, but you have to clear out Google Play’s data, which will also clear other things you may use that are not HA-related.


This is good to know.
I am not keen to do it, but I think I should start considering it, because after weeks of debugging, searching, and finally asking here, I think I am stuck with this installation that shares configuration from two platforms.
I do not know whether I did, or am still doing, anything wrong, but I think switching from one installation to another after a failure should be a supported use case, and I hope others will not face the same issues I currently have.

I will probably try clearing things this weekend: deleting integrations and services, removing any configuration, and also flashing the dongle in case that resets anything on it or helps generate new identifiers. I hope it will then behave like a fresh installation!
Thanks!