I have an SMLight SLZB06 Zigbee coordinator set up in a remote location, and I’m connecting to it via a WireGuard VPN from my main Home Assistant instance (using Zigbee2MQTT). Whenever the VPN drops or Zigbee2MQTT hangs, the coordinator seems to become unresponsive, so the entire Zigbee network goes down.
Right now, I’m dealing with it by power-cycling the coordinator using a smart plug whenever the network stays down for too long—but I’m looking for a more graceful or automated solution. It’s vital for me to be able to handle this remotely without on-site intervention.
I’ve seen some hints about using a firmware (on the SLZB06) that might allow an HTTP reboot endpoint or even running Z2M locally on the coordinator’s ESP32 chip and forwarding data over MQTT. Has anyone gone that route or done something similar for a robust remote setup?
Is there a better way to detect a hung coordinator or a broken VPN tunnel and auto-restart it without physically cutting power?
Would running Zigbee2MQTT locally on the coordinator (or on a device at the remote site) and sending sensor data over MQTT to my main HA instance be simpler in the long run?
Any best practices for maintaining stability over WireGuard, like keepalive settings or specific Z2M config tweaks?
I’d really appreciate any insights or experiences. Thanks in advance!
You can’t run Zigbee2MQTT on an ESP32. You can, however, run Zigbee2Tasmota on one. But unless you know what you are doing, I would run Zigbee2MQTT on a separate remote device.
That’s what we’re doing right now (running Z2M on a separate remote device), but sometimes the connectivity is unstable and the Zigbee network goes down. In most cases, the only way to recover is to power the coordinator off and on with the Wi-Fi smart plug, but I’m looking for a more stable and robust solution.
I’m in a similar situation, running SLZB-06 with VPN to Z2M.
I was able to improve VPN stability by setting the optional ping IP. This should be the local endpoint of your tunnel.
Some firmware versions have an experimental mode called hub. In hub mode the device doesn’t just act as a coordinator, it also replaces the likes of Z2M or ZHA and sends MQTT messages directly. (I will try that once it’s considered stable.)
Mine cuts off every two days or so. I can see the VPN connection is alive, but Z2M can’t connect to the coordinator, and the HTTP interface doesn’t respond either.
We’ve done a couple of things to make these connections more stable:
We created a blueprint that automatically restarts the add-on if it detects that it’s down. It retries multiple times.
We always use a Wi-Fi smart plug (we use a Shelly Plug S) to power the coordinator. This way we can either power cycle the coordinator (this is also done automatically in the blueprint) or tell whether the issue is related to the internet connection (because the plug is offline too).
Sometimes, if the connection is going down too often, we enable the “allow multi-threaded socket connection” option under the Z2M and ZHA menu.
I don’t know exactly what it does, but we read somewhere that it helps when disconnects happen frequently, and it has worked for us.
These strategies made our connections much more stable. We have dozens of them installed all around the country we work in, and we haven’t had any issues that required physical access to the coordinators.
That being said, because of the architecture we’re using, we’re very eager to start using the Zigbee Hub as soon as it is compatible with all the Zigbee devices we install. Today I sent an email to SMLight to start a conversation with them, understand their roadmap and see when we can expect to have this working.
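The recovery logic described in this post (restart the add-on a few times, power cycle through the plug if that doesn't help, and treat an offline plug as an internet problem) can be sketched as a small decision function. This is only an illustration in Python, not the actual blueprint. The names, the ordering of the checks and the default of three restart attempts are assumptions.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"                    # everything is running
    WAIT_FOR_LINK = "wait_for_link"  # remote site looks offline, nothing to do
    RESTART_ADDON = "restart_addon"  # try restarting Z2M first
    POWER_CYCLE = "power_cycle"      # toggle the smart plug


def next_action(addon_running: bool, plug_online: bool,
                restart_attempts: int, max_restarts: int = 3) -> Action:
    """Pick the next recovery step (hypothetical helper, not the blueprint)."""
    if addon_running:
        return Action.NONE
    if not plug_online:
        # The plug is unreachable too, so the site's connection is likely down.
        return Action.WAIT_FOR_LINK
    if restart_attempts < max_restarts:
        return Action.RESTART_ADDON
    # Restarts did not help: power cycle the coordinator via the plug.
    return Action.POWER_CYCLE
```

Keeping the decision separate from the actions makes it easy to check each case on its own before wiring it to real service calls.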
I added a script on the SMLight that automatically reboots the coordinator every day at 11 PM, and this helped a lot.
On the HASS side, I created a helper that pings the coordinator every 10 seconds. Whenever it goes from “off” to “on” (disconnected to connected), it starts the Z2M add-on.
These two techniques have made my connections much more resilient. I barely have any issues now.
I’d love these connections to be even more robust and stable, but I don’t think there’s much more I could do for that.
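The ping helper boils down to rising-edge detection: act only when the coordinator goes from unreachable to reachable, not on every successful ping. Here is a minimal Python sketch of just that logic. The class name is hypothetical, and the ping itself and the call that starts the Z2M add-on are left out on purpose.

```python
class RisingEdge:
    """Fire once each time the input changes from False to True."""

    def __init__(self) -> None:
        # None means no sample has been seen yet.
        self._last = None

    def update(self, reachable: bool) -> bool:
        # Fire only on an observed False -> True transition, so the very
        # first sample never triggers a start.
        fired = self._last is False and reachable
        self._last = reachable
        return fired
```

Feeding it one ping result every 10 seconds and starting the add-on whenever `update()` returns `True` mirrors the "off" to "on" trigger described above.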
@robert97 can you please share the HASS helper? Since the latest SMLight firmware, the connection is more stable than it used to be. I have also set up the script to reboot the coordinator, but it causes “spikes” in the temperature measurements on the graph. The values while it’s disconnected are a few degrees higher than the last measured value.
Here’s the script. I’m using timezonedb to get the time. It’s free, but you need an API key. Replace API_KEY in the URL with your own key and change the zone to match your location.
#META {"start":1}
# ===================================================================
# SLZB Auto-Reboot Script (TimeZoneDB GET Version)
# ===================================================================
# This script reboots the device daily at a specified local time.
# It uses the TimeZoneDB API and parses the "formatted" string.
# ===================================================================
import HTTP
import json
import string
# ## Configuration ##
# Set the desired local hour for the reboot (24-hour format).
var REBOOT_HOUR_LOCAL = 23
# Failsafe timer: 24 hours + 15 minutes.
var FAILSAFE_INTERVAL = (24 * 3600 * 1000) + (15 * 60 * 1000)
# Recommended HTTP buffer size to hold the full response.
var HTTP_BUFFER_SIZE = 512
# ## Global Variables ##
var time_to_reset = FAILSAFE_INTERVAL
SLZB.log("Auto-reboot script started. Reboot scheduled for " .. REBOOT_HOUR_LOCAL .. ":00 local time.")
# Set to true once the local time has been fetched successfully.
var synced = false
# The main loop runs indefinitely.
while (true)
    # Attempt to sync the time on the first run and retry after every failure.
    if (!synced)
        SLZB.log("Attempting to get precise local time from timezonedb.com...")
        var json_response = nil
        var url = "http://api.timezonedb.com/v2.1/get-time-zone?key=API_KEY&format=json&by=zone&zone=Europe/Madrid&fields=formatted"
        if (HTTP.open(url, "get", HTTP_BUFFER_SIZE))
            var code = HTTP.perform()
            if (code == 200)
                json_response = HTTP.getResponse()
            else
                SLZB.log("HTTP request failed. Code: " .. code .. ". Falling back to failsafe.")
            end
            # Per guidelines, close the connection immediately.
            HTTP.close()
        else
            SLZB.log("HTTP.open() failed. Check network. Falling back to failsafe.")
        end
        # If we got a valid response, parse it.
        if (json_response != nil)
            var time_data = json.load(json_response)
            if (time_data != nil && time_data["status"] == "OK" && time_data.contains("formatted"))
                var formatted_str = time_data["formatted"] # e.g., "2025-09-19 19:18:09"
                # Extract the HH:MM:SS part of the string
                var parts = string.split(formatted_str, " ") # -> ["2025-09-19", "19:18:09"]
                if (size(parts) == 2)
                    var time_part = parts[1] # -> "19:18:09"
                    var time_components = string.split(time_part, ":") # -> ["19", "18", "09"]
                    # Convert to numbers
                    var h = int(time_components[0])
                    var m = int(time_components[1])
                    var s = int(time_components[2])
                    # Calculate milliseconds until the next target reboot time
                    var millis_now = (h * 3600 + m * 60 + s) * 1000
                    var millis_target = REBOOT_HOUR_LOCAL * 3600 * 1000
                    var millis_until_target = millis_target - millis_now
                    if (millis_until_target < 0)
                        millis_until_target += (24 * 3600 * 1000) # Add 24 hours
                    end
                    # The deadline is compared against uptime below, so add the
                    # current uptime in case the sync happened well after boot.
                    time_to_reset = SLZB.millis() + millis_until_target
                    synced = true
                    SLZB.log("Time synchronized. Reboot scheduled in " .. (millis_until_target / 3600000) .. " hours.")
                else
                    SLZB.log("Could not parse formatted date string. Falling back to failsafe.")
                    time_to_reset = FAILSAFE_INTERVAL
                end
            else
                SLZB.log("API response status not OK or missing data. Falling back to failsafe.")
                time_to_reset = FAILSAFE_INTERVAL
            end
        end
    end
    # Compare the device's uptime with the calculated reboot time.
    if (SLZB.millis() >= time_to_reset)
        SLZB.log("!!! REBOOT CONDITION MET. Rebooting device now. !!!")
        SLZB.reboot()
    end
    # Log the current status for monitoring.
    SLZB.log("Uptime: " .. (SLZB.millis()/1000) .. "s | Target for Reboot: " .. (time_to_reset/1000) .. "s")
    # Wait for 60 seconds before checking again to conserve CPU.
    SLZB.delay(60000)
end
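If you want to sanity-check the scheduling arithmetic off the device, the same calculation is easy to reproduce in Python. This is just a sketch of the math used in the script above; the function name is hypothetical, and it assumes the same "YYYY-MM-DD HH:MM:SS" format that the script parses.

```python
DAY_MS = 24 * 3600 * 1000


def millis_until_hour(formatted: str, target_hour: int) -> int:
    """Milliseconds from the given local time until the next target_hour:00:00."""
    time_part = formatted.split(" ")[1]              # "19:18:09"
    h, m, s = (int(x) for x in time_part.split(":"))
    now_ms = (h * 3600 + m * 60 + s) * 1000
    target_ms = target_hour * 3600 * 1000
    delta = target_ms - now_ms
    if delta < 0:
        delta += DAY_MS                              # target already passed today
    return delta


print(millis_until_hour("2025-09-19 19:18:09", 23))  # 13311000 (about 3.7 hours)
print(millis_until_hour("2025-09-19 23:30:00", 23))  # 84600000 (23.5 hours)
```

The second call shows the wrap-around case: once the target hour has passed, the next reboot lands on the following day.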