TL;DR - My VM loses network connectivity sometimes, so I built a watchdog that resets it when this happens.
Hello, I’ve been using HA for about 9-10 months now as a long-time HomeSeer convert, and I love it! It runs on an HP Elite Mini PC (Win 10 / VirtualBox), alongside Blue Iris 5 (I’m starting to look at Frigate, but that’s another topic). With the addition of a wall tablet in the living room, it has become a part of everyone’s daily routine.
Overall the system has been very reliable, until anything disrupts the controller PC’s network connection. That could be a momentary internet outage, a router / switch reset or software update, unplugging the network cable from the PC, etc. If any of these happen, the host PC re-establishes its network connection, but the VirtualBox VM’s network connection does not fully restore.
When in this state, the following conditions are seen:
- No HA access from any PCs on the network or via Nabu Casa, including the wall tablet. Note that all local PCs are on the same subnet.
- HA can still be accessed from the host PC, however, and from there it appears to be running fine.
- Pinging the VM from the host PC works, but pinging the VM from any other PC on the network fails (this works fine when everything is running OK).
- Most devices / integrations on the network and internet are Unavailable from HA. I say ‘most’ because, strangely, some like the Ecobee are not showing Unavailable. I suspect those connections are down too, but I haven’t tested to be sure.
When in this state, shutting down the VM from the host PC and restarting it restores the connection.
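The symptoms above can be confirmed from another PC on the network with a small TCP probe. This is only a sketch: `can_reach` is a hypothetical helper, and the address and port you pass in are assumptions (8123 is HA’s default web port; the VM’s IP will be whatever yours is).

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_reach("<VM address>", 8123)` run on a second PC should return False while the VM is in the broken state, even though the same call from the host PC returns True.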
The best solution would be to fix the root cause. After some digging, this appears to be a long-standing VirtualBox issue, with no solutions that worked for me (at least none that I could find). The next best thing is a watchdog that can reboot the VM when this condition occurs! I am new to Python, and thought this would be a good project to tackle.
One thing to note - the ‘Reboot System’ option in HA does not resolve this issue for me. It does reboot the VM, but the conditions described above still persist. Only when the VM is shut down from the host side and restarted does it restore full network functionality.
My solution uses a Python script that runs on the host (Windows) system and establishes an MQTT connection with HA. The script subscribes to a topic, and if HA detects that network connectivity is lost, HA publishes a reboot command to that topic. The host script then shuts down the VM and starts it up again.
The automation that detects the network outage has two triggers: one fires if the main router goes unavailable, and the other fires on HA startup. Additional conditions check whether a few other network devices are unavailable as well. If so, the automation sends a persistent notification, waits a short time, then sends the reboot signal.
It’s been running great for a few weeks now. I haven’t seen too many natural network drops, but in all test cases it has worked 100% of the time so far. I do need to get rid of the hacky time delays in the script and replace them with wait logic.
Network Watchdog Python script (it uses the Paho MQTT client):
import paho.mqtt.client as mqtt
import time
import subprocess
from time import localtime, strftime

broker = "10.0.0.11"
port = 1883


def printmessage(newmessage1, newmessage2):
    # Print a timestamped log line
    print(strftime("%Y-%m-%d %H:%M:%S", localtime()), newmessage1, newmessage2)


def on_message(client, userdata, message):
    payload = str(message.payload.decode("utf-8"))
    printmessage("message received ", payload)
    if payload == "ON":
        printmessage("Reboot Command Received... Rebooting VM", "")
        # Ask the VM to shut down cleanly (ACPI power button)
        TheExecutable = 'C:\\Progra~1\\Oracle\\VirtualBox\\VBoxManage.exe'
        pid = subprocess.Popen([TheExecutable, "controlvm", "HomeAssistantR02", "acpipowerbutton"])  # Call subprocess
        printmessage("Delay 30 seconds... ", "")
        time.sleep(30)
        printmessage("Launching VM (takes about 90 seconds)..", "")
        TheExecutable = 'C:\\Progra~1\\Oracle\\VirtualBox\\VirtualBoxVM.exe'
        pid = subprocess.Popen([TheExecutable, "--startvm", "HomeAssistantR02"])  # Call subprocess
        time.sleep(90)


def on_publish(client, userdata, mid, reason_code, properties):  # callback for published messages
    printmessage("data published \n", "")


def on_connect(client, userdata, flags, reason_code, properties):
    client.inprogress_flag = False
    if reason_code == 0:
        # success connect
        client.connected_flag = True  # set flag
        printmessage("Connected to ", broker)
        printmessage("Subscribing to topic hostPC/reboot", "")
        client.subscribe("hostPC/reboot")
    if reason_code > 0:
        # error processing
        printmessage("Error Connecting: ", reason_code)


def on_disconnect(client, userdata, flags, reason_code, properties):
    client.inprogress_flag = False
    if reason_code == 0:
        # success disconnect
        printmessage("Client Disconnected", "")
        client.connected_flag = False  # reset flag
    if reason_code > 0:
        # error processing
        printmessage("Error Disconnecting: ", reason_code)


mqtt.Client.connected_flag = False  # create flag in class
mqtt.Client.inprogress_flag = False  # create flag in class
client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.username = 'removed for post'
client.password = 'removed for post'
client.on_publish = on_publish  # assign function to callback
client.on_message = on_message  # attach function to callback
client.on_connect = on_connect  # bind callback function
client.on_disconnect = on_disconnect  # bind callback function

try:
    if not client.inprogress_flag:
        printmessage("Connecting to broker", "")
        client.inprogress_flag = True
        client.connect(broker, port, keepalive=60)  # establish connection
except Exception:
    printmessage("connect failed", "")
    time.sleep(5)

# Blocks forever; keeps retrying if the first connection attempt failed
client.loop_forever(retry_first_connection=True)
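One possible way to replace the fixed 30 second delay with wait logic is to poll the VM state until it reports powered off. This is only a sketch, not tested against my setup: `parse_vm_state`, `get_vm_state` and `wait_for_state` are hypothetical helper names, and it assumes `VBoxManage showvminfo <vm> --machinereadable` prints a line like `VMState="poweroff"` once the VM has stopped.

```python
import subprocess
import time

# Path and VM name taken from the watchdog script; adjust for your install
VBOXMANAGE = r"C:\Progra~1\Oracle\VirtualBox\VBoxManage.exe"
VM_NAME = "HomeAssistantR02"


def parse_vm_state(showvminfo_output):
    """Return the VMState value from machine-readable showvminfo output, or None."""
    for line in showvminfo_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VMState":
            return value.strip().strip('"')
    return None


def get_vm_state(vm_name=VM_NAME):
    """Ask VBoxManage for the current state of the VM."""
    result = subprocess.run(
        [VBOXMANAGE, "showvminfo", vm_name, "--machinereadable"],
        capture_output=True, text=True, check=False,
    )
    return parse_vm_state(result.stdout)


def wait_for_state(target, get_state=get_vm_state, timeout=120, interval=2,
                   sleep=time.sleep):
    """Poll get_state() every `interval` seconds until it returns `target`.

    Returns True if the target state was seen, False once roughly `timeout`
    seconds of sleeping have elapsed.
    """
    waited = 0
    while waited <= timeout:
        if get_state() == target:
            return True
        sleep(interval)
        waited += interval
    return False
```

In `on_message`, the `time.sleep(30)` after the `acpipowerbutton` call could then become `wait_for_state("poweroff")`, with the VM started only if that returns True. This does not help with the 90 second startup delay, since the VM reports as running well before HA itself is up.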
And the HA automation:
alias: "Control: Network Watchdog"
description: ""
trigger:
  - platform: state
    entity_id:
      - sensor.nokia
    from: null
    to: unavailable
    for:
      hours: 0
      minutes: 1
      seconds: 0
    id: nokiacellunavailable
  - platform: homeassistant
    event: start
    id: hastarting
condition:
  - condition: or
    conditions:
      - condition: and
        conditions:
          - condition: trigger
            id:
              - hastarting
          - condition: state
            state: unavailable
            entity_id: sensor.nokia
            for:
              hours: 0
              minutes: 2
              seconds: 0
      - condition: and
        conditions:
          - condition: trigger
            id:
              - nokiacellunavailable
  - condition: state
    state: unavailable
    entity_id: climate.garagetemp_thermostat
    for:
      hours: 0
      minutes: 0
      seconds: 0
  - condition: state
    entity_id: sensor.obihai_sp1_service_status
    state: unavailable
action:
  - service: persistent_notification.create
    metadata: {}
    data:
      message: Home Assistant detected a Network disconnect and is rebooting...
  - delay:
      hours: 0
      minutes: 0
      seconds: 5
      milliseconds: 0
  - service: mqtt.publish
    data:
      qos: "0"
      topic: hostPC/reboot
      payload: "ON"
mode: single