I’ve had an ongoing issue with Home Assistant where I have to do a full restart of the host machine after any type of network (LAN or WAN) outage to get it to reconnect. This means that if I reboot my router overnight, HA is not working in the morning.
I am running HA on Windows 11 Oracle VM (haos_ova-8.4.vdi) with a static IP set. It will not reconnect on a simple HA restart or reboot. I have to restart the whole host machine. My network setup is TP Link Omada hardware. Any suggestions would be greatly appreciated. Thanks.
Same thing happens to me (VMware) whenever my Windows 10 machine’s network adapter connection changes. I thought it might be because I have multiple adapters and teaming going on, but perhaps not.
I am hoping someone who has the solution will chime in. We can’t be the only two out there experiencing this issue. It really sucks when you’re away, the power goes out and comes back on, and you have no functionality until you can get home to manually reboot.
You are not alone. I am using HA inside VirtualBox, on a G2 Mini PC, wired Ethernet. My system is very stable, except that whenever something happens to my router (power loss, reboot, etc.) the PC reconnects but the VM fails to reconnect. The HA automations continue to work and there is no issue with Zigbee or Z-Wave. Of course, when this happens I lose all app connectivity. On the router side, I can see the PC connected, but the virtual machine does not show as connected. This has been persistent and repeatable through multiple versions over the last six months since I built the system (running 2022.9.4 now), and whenever I make changes to my router settings or reboot it, I have the extra step of manually rebooting HA… My router rarely goes down; I don’t need to reboot it very often and I have it plugged into a UPS, but I would very much like a solution for this.
I have seen going back several years in the Hassio boards people with similar problems but no definitive solve. I think it might be a problem with the VM.
Edit to add: The last time I encountered this was this week. I shut off a breaker to add a motion switch, and the wired AP that my G2 box plugs into was on that breaker. So although my router didn’t go down (it’s UPS protected), just the interruption was enough to screw up the connection and require a reboot. I dug pretty deep in the forums and Reddit looking for a solve, and am seeing a lot of people with the same problem. I saw one commenter mention that it’s a known issue in VirtualBox, with no further details.
UPDATE: I ended up eliminating the VM altogether. I had a spare SSD that I added to the empty bay of the G2 Mini, and I booted from the original Windows drive to write HAOS to the new drive using Etcher. I have automated backups (local and to a NAS) and don’t miss the VM at all.
Hi, I was struggling with the same issue, so I’ve written a Python script to restart my HomeAssistant VM after losing internet connectivity.
What it does is try to connect to Google’s DNS service (8.8.8.8) once every interval (default 5 seconds). If it can, it carries on its merry way, checking connectivity every 5 seconds. If the connection is lost, it keeps checking every 5 seconds UNTIL it is able to connect again, at which point it resets the HomeAssistant VM and goes back to checking connectivity.
Instructions (I’m not sure of the reader’s level of comfort with programming/running scripts, so I’ll try to be as detailed as possible. Also, these steps are Windows based but should be relatively easy to modify for other OSes):
Install Python. It should be installed on the host machine, not the VM. I won’t put a bunch of instructions on how to install Python; just google “python [OS]” and you should be able to figure it out. Get it from python.org. NOTE: When installing, be sure to click the tick-box that adds Python to the PATH.
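Change the value of vbox_manage_path to the full path of the VBoxManage executable inside the directory where VirtualBox is installed. The default for Windows is "C:\Program Files\Oracle\VirtualBox\VBoxManage.exe" (in the script, write it as a raw string, r"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe", so the backslashes are taken literally).
Alternatively: Add the VirtualBox install directory to your Path environment variable and leave the vbox_manage_path variable as "VBoxManage". NOTE: Don’t mess with environment variables if you don’t know what you’re doing; use the full path method above.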
Change the value of vm_name to the name of your HomeAssistant VM in VirtualBox
Set the value of connectivity_check_interval_seconds to however frequently you want to check for internet connectivity. I often have very short dropouts of <10 seconds so I’m starting with every 5 seconds and seeing how that goes. Something to note: if you set it to a higher value and your internet drops out and reconnects straight after a check and in less time than the interval, the script will not be aware of the dropout and your HA will lose connectivity and not be reset, so bear that in mind when setting this value.
Make a new txt file, name it whatever you want, copy the script from below into the file, and change the extension to .py.
Execute the script in cmd or PowerShell. Easiest way to do it is to navigate to the directory that the script is in, with no files highlighted, hold shift and then right click and then click on “Open command window here” or “Open PowerShell window here”. When it opens, type python '[name of script].py' and hit enter. The script should output "Internet is available." and continue to do so once every interval you’ve set. Leave this window open forever so that the script will reset your VM if connectivity is lost.
Here is the script:
import socket
import subprocess
import time

# Full path to VBoxManage, or just "VBoxManage" if its folder is on your PATH
vbox_manage_path = "VBoxManage"
vm_name = "HomeAssistantVMName"
connectivity_check_interval_seconds = 5


def check_internet_connectivity():
    try:
        # Check if we can connect to Google's DNS server at 8.8.8.8.
        # The "with" block closes the socket again after each check.
        with socket.create_connection(("8.8.8.8", 53), timeout=5):
            return True
    except OSError:
        return False


had_dropout = False
while True:
    if check_internet_connectivity():
        print("Internet is available.")
        if had_dropout:
            # Connectivity just came back after an outage: reset the VM
            print("Resetting VM")
            subprocess.run([vbox_manage_path, "controlvm", vm_name, "reset"])
            had_dropout = False
    else:
        print("Internet is not available.")
        had_dropout = True
    time.sleep(connectivity_check_interval_seconds)
DISCLAIMER: Use at your own risk, I bear no responsibility if you break something by not doing it right.
Potential improvements/differences: Ping external IP of HA instead of Google’s DNS for a more accurate idea of whether or not it’s reachable. (quick try of this and my ping hits the internal IP = useless, not sure how to hit the external IP from the host machine, don’t have time right now to figure it out, sure it’s possible)
You may want to work out a way to do this with the Windows Task Scheduler instead of in a while loop, so you don’t have to have the script running 100 per cent of the time, and by all means, go for it. My issue is keeping track of previous connectivity between runs.
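On keeping track of previous connectivity between runs: one option is a small marker file on disk. The following is only a sketch under assumptions, not a tested replacement for the loop above. The names STATE_FILE, run_once and main are made up for illustration, and the VBoxManage path and VM name are placeholders to adjust.

```python
import socket
import subprocess
from pathlib import Path

# Placeholders for illustration; adjust to your own setup.
VBOX_MANAGE = r"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe"
VM_NAME = "HomeAssistantVMName"
STATE_FILE = Path.home() / "ha_dropout.flag"  # hypothetical marker file


def internet_is_up(host="8.8.8.8", port=53, timeout=5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reset_vm():
    """Hard-reset the VM, same command as in the looping script."""
    subprocess.run([VBOX_MANAGE, "controlvm", VM_NAME, "reset"], check=False)


def run_once(is_up, state_file, reset):
    """One scheduled check. Returns True if the VM was reset on this run."""
    if not is_up:
        state_file.touch()   # remember the dropout for the next run
        return False
    if state_file.exists():
        reset()              # connectivity is back after a recorded dropout
        state_file.unlink()
        return True
    return False


def main():
    run_once(internet_is_up(), STATE_FILE, reset_vm)
```

To use it, add a call to main() as the last line and have Task Scheduler run the script on a repeating trigger. The check interval is then whatever repeat interval the task uses, so the caveat about short dropouts between checks applies here too.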
Hope this helps. Let me know if you have any issues and I’ll try to help if I have time.
Thank you for the idea, I wanted to solve this without Python, here is what GPT came up with. Just save it on the Windows Machine as “HA_reboot.bat” and run it.
@echo off
SET vm_saved=1

:check_again
SET currentTime=%TIME%
SET currentDate=%DATE%
ping 8.8.8.8 -n 1 >nul
if %errorlevel%==0 (
    if NOT %vm_saved%==1 (
        echo [%currentDate% %currentTime%] Internet connection restored. Saving and restarting VM...
        "C:\Program Files\Oracle\VirtualBox\VBoxManage.exe" controlvm "HomeAssistant" savestate
        REM Wait until the VM state has been saved
        timeout /t 60 /nobreak >nul
        "C:\Program Files\Oracle\VirtualBox\VBoxManage.exe" startvm "HomeAssistant"
        SET vm_saved=1
    )
) else (
    echo [%currentDate% %currentTime%] Internet connection lost.
    SET vm_saved=0
)
timeout /t 5 /nobreak >nul
goto check_again
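The fixed 60-second wait in the batch file is a guess at how long saving the VM state takes. As a possible alternative, here is a rough Python sketch that polls the VM state instead. It assumes that VBoxManage showvminfo with --machinereadable prints a line of the form VMState="saved" once the state is saved; please check that against the output on your own machine. The function names are invented for this example and the path and VM name are placeholders.

```python
import subprocess
import time

# Placeholders for illustration; adjust to your own setup.
VBOX_MANAGE = r"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe"
VM_NAME = "HomeAssistant"


def parse_vm_state(showvminfo_output):
    """Extract the VMState value from machine-readable showvminfo output."""
    for line in showvminfo_output.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "VMState":
            return value.strip().strip('"')
    return None


def get_vm_state():
    """Ask VBoxManage for the current state of the VM."""
    result = subprocess.run(
        [VBOX_MANAGE, "showvminfo", VM_NAME, "--machinereadable"],
        capture_output=True, text=True, check=False,
    )
    return parse_vm_state(result.stdout)


def wait_for_state(wanted, get_state=get_vm_state, timeout=120, poll=2,
                   sleep=time.sleep):
    """Poll until the VM reports `wanted`; False if `timeout` seconds pass first."""
    waited = 0
    while waited <= timeout:
        if get_state() == wanted:
            return True
        sleep(poll)
        waited += poll
    return False
```

The idea would be to call wait_for_state("saved") after the savestate command and only then run startvm, rather than always sleeping for a minute.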
Why does everyone use Google’s DNS? Isn’t it enough to use the router’s IP? That would be more logical. Why overload Home Assistant if there is no Internet?
It’s interesting how almost every VM problem ends up with no solution. I have started facing a few VM problems like this one, and there is never any solution. It’s quite frustrating to start losing my entire HA build, which I have been working on for 5+ years. It’s no longer running smoothly on a Raspberry Pi.
I have been reading a lot of threads about it, and it seems all VMs keep getting these kinds of errors. I have Windows Server, so maybe I should use Hyper-V directly?
power outage → power restore (network is lost every time)
router restart (network is lost sometimes)
HA VM in VirtualBox, running on Asus laptop.
Everything works perfectly except this issue.
I can manually reset the network in the VM by going to the VM network settings and changing ‘Attached to’ from ‘Bridged’ to ‘NAT’ and back. Sometimes it works, sometimes it doesn’t.
When the network stops working, the ‘network info’ command in HA sometimes shows the correct IPs and DNS, and sometimes it shows 0.0.0.0.
Host machine network is not affected, so it’s clearly a VirtualBox issue.
Anyone found a solution (except restarting VM)?
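For anyone who wants to script the Bridged to NAT and back toggle described above instead of clicking through the settings, here is a rough Python sketch. It assumes the VM uses network adapter 1 and relies on the VBoxManage controlvm nicN subcommand for changing the attachment of a running VM. The VM name and host adapter name are placeholders (VBoxManage list bridgedifs should show the adapter names on your host), and the function names are made up. As noted above, the toggle itself only works some of the time, so treat this as an experiment rather than a fix.

```python
import subprocess
import time

# Placeholders for illustration; adjust to your own setup.
VBOX_MANAGE = r"C:\Program Files\Oracle\VirtualBox\VBoxManage.exe"
VM_NAME = "HomeAssistant"
HOST_ADAPTER = "Name of your host network adapter"


def nic_toggle_commands(vm_name, host_adapter, nic=1, vbox_manage=VBOX_MANAGE):
    """Build the two VBoxManage calls: switch the NIC to NAT, then back to bridged."""
    base = [vbox_manage, "controlvm", vm_name, f"nic{nic}"]
    return [base + ["nat"], base + ["bridged", host_adapter]]


def toggle_nic(vm_name=VM_NAME, host_adapter=HOST_ADAPTER, pause_seconds=5):
    """Run both commands with a short pause after each one."""
    for command in nic_toggle_commands(vm_name, host_adapter):
        subprocess.run(command, check=False)
        time.sleep(pause_seconds)
```

Calling toggle_nic() from a connectivity watchdog in place of a full reset would avoid rebooting HA, if the toggle turns out to be reliable on your setup.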
It’s the combination of both Windows and VMware, imho. Windows as the underlying system is not ideal…
Think of it from a developer perspective:
Practically every developer in FOSS is using open-source or, where that is not possible, free software. That starts with Linux on the server (the HA development machine) and doesn’t end with Proxmox.
So the least likely combination is Windows and VMware. And if only a few users use this combination, it’s much harder to develop for or troubleshoot.
That’s not intentional, but if you have only a few users, there’s likely not much feedback about bugs. And to debug this, you’d need to have the same systems running, which, as I said, is not likely to happen with Windows and VMware.
Windows and VMware are both products that need support and fixing like any other software. But with these, you only get that kind of support if you pay for it!
It’s not meant to be rude, but you get what you choose. With Windows and VMware you chose a system that is harder to maintain than the alternatives. That’s why most people recommend simply using an alternative to that combination.
I know, this is not what people want to hear, but really, the energy you put in debugging a Windows/VMware system is better put in using Linux and Proxmox.
Google DNS is public, but it is at times hit by DDoS attacks, so it is not the best host to use.
Your own router is not the best either, if that is the one causing the issue. It might reply on both the internal and external IP while nothing further out is reachable.
If you do a traceroute (tracert on Windows) to Google DNS, you can see the hops between your client machine and Google DNS.
The best IP should be the one just after your router’s IP (the router might be represented with both its internal and external address in the list).
If the next one starts with 100 (more precisely, if it is in the 100.64.0.0/10 range), then your internet connection is behind CGNAT, and that might also be a cause of the issue. In that case you might want to use the next IP in the list that does not start with 100.
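To make that selection rule concrete, here is a small Python sketch that picks a probe address from a list of traceroute hops typed in by hand. It skips private (LAN) addresses, anything in the CGNAT range 100.64.0.0/10, and any addresses you pass in as belonging to your router. The function name and the example hop list are invented for illustration; real hops will look different on every connection.

```python
import ipaddress
from typing import Iterable, Optional

# Shared address space used for carrier-grade NAT
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")


def pick_probe_hop(hops: Iterable[str],
                   router_ips: Iterable[str] = ()) -> Optional[str]:
    """Return the first hop that is not private, not CGNAT and not the router."""
    skip = {ipaddress.ip_address(ip) for ip in router_ips}
    for hop in hops:
        addr = ipaddress.ip_address(hop)
        if addr in skip or addr.is_private or addr in CGNAT_RANGE:
            continue
        return hop
    return None


# Made-up hop list: LAN router, a CGNAT hop, then two public hops
example_hops = ["192.168.0.1", "100.72.10.1", "1.1.1.1", "8.8.8.8"]
print(pick_probe_hop(example_hops))
```

Note that the check is on the actual 100.64.0.0/10 range rather than on the first octet, so a public address that merely happens to start with 100 is not skipped.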