VirtualBox: Home Assistant OS keeps crashing and using 100% CPU on host

Hello,
My home assistant setup started crashing about 2 weeks ago. Any help that anyone has to troubleshoot this would be appreciated.

Hardware: Mini PC (Intel N100 CPU, 12GB RAM, 256GB SSD)

Configuration: Windows > Oracle VirtualBox > Home Assistant OS (.vdi image)

VM Resources: 4GB RAM, 4 vCPUs, 100% execution cap, 32GB storage

Devices and Services within HAOS: August Lock integration, Bond bridge integration, HACS, HomeKit integration (export of devices to HomeKit), Litter Robot, SmartThinQ LGE, System Monitor, Z-Wave JS UI, ZHA

Behavior on Host PC: CPU usage is pegged at 100% until I open the virtual machine window, at which point it drops to 30-50%. Sometimes this alone fixes the issue. Everything else seems nominal. If HAOS does not recover when I open the VM through VirtualBox, I shut down the VM and restart it within VirtualBox (I run the VM headless most of the time).

I sometimes see errors on the VM CLI. The screenshot below is an example, but I don’t know much about the error or how to troubleshoot it…

Behavior within HAOS: Home assistant will stop responding, no automations will run and the web-UI is unreachable.

I also sometimes see errors on the HAOS logs page, but I’m unable to correlate them to a specific device or plug-in. I do not have a screenshot of these because I thought the logs would be saved within HAOS, but I can’t find them after a reboot…
I recall seeing runner.py errors and an ncp-ezsp error that may be relevant. I see the Python errors regularly after recovering from a crash; the ncp-ezsp error I’ve only seen once.

What I’ve tried to resolve this: I’ve disabled a few of the integrations that I thought were causing issues, primarily the Ring camera integration, as I was getting ffmpeg errors while it was running. Those errors have gone away, but the crashing has not.

I installed System Monitor hoping to see something there, but the CPU and memory usage within Home Assistant seem fine. It is weird to me that the CPU usage on the Windows host is much higher than what HAOS indicates it is using.



I will try to capture the logs within HAOS next time it crashes to see if there is valuable information there…

I have tried looking for patterns in when and how it crashes, but it seems to be random and does not follow any particular schedule. Sometimes it’ll crash after 1-2 hours, sometimes it’ll go 4-5 days without crashing. I’ve logged about a dozen crashes since June 16th and cannot find a pattern. It is not related to the host PC updating or restarting: I’m logging each reboot date/time and they don’t match the crashes (the VM successfully restarts upon host boot-up every time).

Any help troubleshooting this further would be greatly appreciated.
Thanks everyone!

Windows and HAOS in VirtualBox, not a good combination. What version of VirtualBox are you running?

It ran just fine for a little over 2 months before the issues started, but I have heard that it’s not super stable. I may look into moving to Proxmox, with HAOS and Windows running alongside one another instead of HAOS within Windows.

VirtualBox Version: 7.0.18

HAOS Version: 2024.6.3

I started having issues with HAOS 2024.5.X and a different version of VirtualBox. I updated both of them to the versions shown above and then let it be so I could capture data without changing much.

This is a recurring theme where Vbox users run fine for a period of time and then it stops working. Sometimes after an upgrade and sometimes on its own. The only reliable thing I’ve seen that’s easy to try is to move the virtual disk to a new virtual machine and see if that corrects the problem. You can also play with the USB and optimization settings to see if it comes back to life.

I would argue that 4 vCPUs on an N100 processor is asking too much of it, especially if you are using the Windows OS as well. When I ran Home Assistant on VirtualBox I never gave it more than 2 vCPUs.

Will give this a try this weekend. I also believe it’s a VirtualBox/HAOS issue, as just opening the VM window will sometimes fix it, which wouldn’t make much sense if HAOS itself had crashed. It’s almost as if it were waiting for a user input that it can’t receive…

Windows usage is very minimal, mainly use it as the interface to HA when configuring things, looking up tutorials, etc. Otherwise the PC sits idle. I can reduce the vcpus available to HAOS, particularly since it doesn’t look like I need 4 vcpus based on system monitor.

Thanks for the input!

I’m guessing the VM is CPU constrained based on your posted logs. Depending on how you have Windows set up, opening the VirtualBox GUI will give more priority to the VirtualBox VMs, unless you told Windows to prioritize background processes. And it’s likely constrained because of the 4 vCPU setting, not the VM load.

I’m a little confused. How could the VM be CPU constrained if the VM load is low? How would the 4 vCPU setting affect this? Wouldn’t giving it more CPU be better than lowering it to 2 vCPUs? I can look into telling Windows to prioritize the VM, but I’m still confused about how the VM is consuming so much CPU when the load within the VM is low.

I’ll play around with the settings and see if it has a positive impact.

Just because the HA utilization is low doesn’t necessarily mean that the virtual machine itself isn’t causing spikes. It would be very easy for you to reduce resources allocated to the VM and see if it changes anything. Definitely try easy fixes before complex ones just in case they take care of it!
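
To make the "fewer vCPUs" suggestion concrete, here is a minimal sketch. The N100 has four cores and four threads, so a 4 vCPU guest leaves nothing free for Windows and VirtualBox's own threads, and as far as I understand the load shown inside the guest does not include that host-side contention. The reserve of two logical CPUs for the host is only a rule of thumb I am assuming, and "HAOS" is a placeholder VM name; `VBoxManage modifyvm <vm> --cpus <n>` is the real command, and the VM has to be powered off when you run it.

```python
import os


def suggest_vcpus(host_logical_cpus: int, reserve_for_host: int = 2) -> int:
    """Return a vCPU count that leaves `reserve_for_host` logical CPUs to the host.

    The default reserve of 2 is an assumed rule of thumb, not a VirtualBox setting.
    """
    if host_logical_cpus < 1:
        raise ValueError("host_logical_cpus must be at least 1")
    # Never suggest fewer than one vCPU, even on a tiny host.
    return max(1, host_logical_cpus - reserve_for_host)


def modifyvm_command(vm_name: str, vcpus: int) -> list[str]:
    """Build the VBoxManage call that sets the vCPU count (VM must be powered off)."""
    return ["VBoxManage", "modifyvm", vm_name, "--cpus", str(vcpus)]


if __name__ == "__main__":
    host_cpus = os.cpu_count() or 1
    vcpus = suggest_vcpus(host_cpus)
    # "HAOS" is a placeholder; use the VM name shown in the VirtualBox Manager.
    print(" ".join(modifyvm_command("HAOS", vcpus)))
```

On a 4-thread host with the default reserve this suggests 2 vCPUs, which lines up with the 2 vCPUs mentioned earlier in the thread. The script only prints the command so you can review it before running it yourself.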

  • run VirtualBox with Admin rights
  • set Windows 11 power mode to “Best Performance” from “Balanced”
  • disable power throttling just for VirtualBox processes

From Terminal (Admin) shell (opened with Win+X):

powercfg /powerthrottling disable /path "C:\Program Files\Oracle\VirtualBox\VBoxHeadless.exe"
powercfg /powerthrottling disable /path "C:\Program Files\Oracle\VirtualBox\VirtualBoxVM.exe"
powercfg /powerthrottling list

For sure!
Restarted the VM with 2 vcpus now, not changing anything else. Will run it for a few days to see if things are more stable and go from there!

Will make sure that power throttling is disabled. Thanks for the detailed instructions.

No crashes in the past 7hrs while at work, but did see this in the HAOS CLI log:

So whatever the root cause is, it is still present, but now it doesn’t cause HA to become unresponsive, which is a step in the right direction for sure.

The only error I see in HA is the following from this morning, though I think it is unrelated…

Logger: aiohttp.server
Source: /usr/local/lib/python3.12/site-packages/aiohttp/web_protocol.py:421
First occurred: 9:52:51 AM (1 occurrences)
Last logged: 9:52:51 AM

Error handling request
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/aiohttp/web_protocol.py", line 350, in data_received
    messages, upgraded, tail = self._request_parser.feed_data(data)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "aiohttp/_http_parser.pyx", line 557, in aiohttp._http_parser.HttpParser.feed_data
aiohttp.http_exceptions.BadStatusLine: 400, message:
  Invalid method encountered:

    b'\x16\x03\x01\x02\x96\x01'
      ^
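
One possible reading of the bytes in that error, offered as a guess rather than a diagnosis: they look like the start of a TLS handshake, which is what you would get if some client tried to reach Home Assistant's plain HTTP port using https://. The sketch below decodes the six bytes as a TLS record header followed by the handshake message type. The function name and lookup tables are my own, and only the most common type codes are included.

```python
import struct

# Common TLS record content types and handshake message types (partial lists).
CONTENT_TYPES = {20: "change_cipher_spec", 21: "alert", 22: "handshake", 23: "application_data"}
HANDSHAKE_TYPES = {1: "client_hello", 2: "server_hello"}


def describe_tls_prefix(data: bytes) -> dict:
    """Interpret the first 6 bytes as a TLS record header plus handshake type."""
    if len(data) < 6:
        raise ValueError("need at least 6 bytes")
    # Record header: 1-byte content type, 2-byte version, 2-byte big-endian length.
    content_type, major, minor, length = struct.unpack("!BBBH", data[:5])
    return {
        "content_type": CONTENT_TYPES.get(content_type, "unknown"),
        "record_version": (major, minor),
        "record_length": length,
        # The first byte of a handshake record's payload is the message type.
        "handshake_type": HANDSHAKE_TYPES.get(data[5], "unknown"),
    }


print(describe_tls_prefix(b"\x16\x03\x01\x02\x96\x01"))
```

This prints a handshake record with version (3, 1), length 662, and type client_hello. If that reading is right, the error comes from a single mistyped or misconfigured HTTPS request and has nothing to do with the crashes.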

Been running for 8-9 days now and everything seems stable enough! Will look at logs tonight and see if there are any issues. Then I’ll slowly start re-enabling features to make sure it’s all good!

Thanks everyone for the help

This makes it 10 days since HA crashed, so I’m marking this as solved with @mterry63’s answer. I’m unsure why it would have worked for a few months with 4 vCPUs and then become unstable, but changing that to 2 vCPUs seems to have gotten rid of the crashing.

I am still seeing issues in the CLI, but as long as it’s not crashing, I don’t particularly care. Thank you @francisp and @CO_4X4 for helping me troubleshoot. You guys were very helpful.