VirtualBox on Linux instability / corruption?

Has anyone else noticed stability problems running HAOS on VirtualBox on a Linux Host lately?

In the last few months, I’ve run into some problems where some JSON files or the sqlite recorder database have become corrupt. I’ve also noticed processes that die due to Signal 11 (SEGV) and 6 (ABORT). These processes have included python3.12 (Home Assistant) and Node (ZwaveJS UI).

Things generally get auto-restarted by the supervisor. No notifications/repairs are generated, so if you don’t happen to catch that there was a restart, it is easy to be unaware. I’ve added automations to notify me when Home Assistant restarts or when Z-Wave devices go unavailable.

You can find a record via journalctl on HaOS if you have access to the console or have done the steps to enable ssh on port 22222 to HaOS. (This doesn’t work from the container-based ssh addon, unless you’ve got a container that maps in the journal files.)

# journalctl | grep -E 'ABEND|supervisor.docker'
Mar 18 19:57:05 homeassistant audit[4279]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=unconfined pid=4279 comm="python3" exe="/usr/local/bin/python3.12" sig=11 res=1
Mar 18 19:57:09 homeassistant hassio_supervisor[433]: 24-03-18 15:57:09 INFO (SyncWorker_5) [supervisor.docker.manager] Starting homeassistant
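
If there are a lot of these, a small filter makes it easier to see which executables and signals keep coming up. This is only a rough sketch: summarize_abends is a made-up helper name, and it assumes the audit records look like the ANOM_ABEND line above (an exe="..." field followed directly by sig=N).

```shell
#!/bin/sh
# Hypothetical helper: summarize audit ANOM_ABEND records by executable
# and signal number. Reads journal text on stdin and prints one
# "count exe sig=N" line per combination, most frequent first.
summarize_abends() {
    grep 'ANOM_ABEND' |
        # Keep only the exe path and the signal number from each record.
        sed -n 's/.* exe="\([^"]*\)" sig=\([0-9]*\).*/\1 sig=\2/p' |
        sort | uniq -c | sort -rn
}
```

Usage would be something like: journalctl | summarize_abends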

At first I thought this was a hardware problem. I ran a bunch of hardware diagnostics (memtest86+, disk tests, etc.) and found nothing.

So I moved the HaOS VM to a new physical host. Things were better for a while but then problems started occurring again.

If I use vboximg-mount to get access to the partitions inside of the VDI, things look OK. The EXT4 filesystems don’t show any problems from fsck.ext4 -f volN
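
For anyone wanting to repeat that check, here is a minimal sketch of looping over the volN files. It assumes the VDI is already mounted with vboximg-mount at some mount point and that the VM is shut down. check_volumes and the FSCK override are hypothetical names of mine, and I’ve added -n so the check never writes to the image. Not every HaOS partition is ext4, so some volumes are expected to be rejected.

```shell
#!/bin/sh
# Hypothetical helper: run a forced, read-only ext4 check on every volN
# file found under the given vboximg-mount mount point.
# FSCK can be overridden (e.g. FSCK=echo) for a dry run.
check_volumes() {
    mnt="$1"
    for vol in "$mnt"/vol*; do
        [ -e "$vol" ] || continue   # glob matched nothing
        # -f forces a check even if the filesystem is marked clean;
        # -n opens it read-only and answers "no" to every question.
        ${FSCK:-fsck.ext4} -f -n "$vol" ||
            echo "problems reported, or not an ext4 volume: $vol" >&2
    done
}
```

Usage would be something like: check_volumes /mnt/haos-vdi (the path is just an example).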

I’ve used a fresh download of haos VDI 12.1. I’m using a separate VDI for the hassos-data disk.

I haven’t found any clues in either the HaOS kernel logs, the virtualbox logs, or the physical host logs. So I’m somewhat grasping at straws trying to track down bugs.

The systems I’ve encountered stability problems on are running (current) VirtualBox release 7.0.14 on a nuc (Ubuntu 22.04 LTS) and a Lenovo m710q (RHEL 9.3). Other things running on those hosts and other non-Home Assistant VMs haven’t shown any problems.

I’ve started some testing for a HaOS VM on VirtualBox on Windows 11. So far I haven’t seen any problems there, but I have no idea what might trigger the problem, so other than placing some load on it, I’m not sure what to try.

I haven’t opened an issue against haos or virtualbox since I don’t have any specifics to go on.

Anyone else see something like this?

I ran HASS OS in VirtualBox for nearly 2 years and never experienced issues like yours. I would say that it was stable, BUT the big thing for me was performance.

The setup used far too much system (kernel) time, and doing basic things was very slow.

I had enough of this and migrated to running core and its add-ons in containers. Load average went down by 80% and I had so much headroom that I’m now able to run more services and even have Frigate running on the same box. It was like having a brand new server again!

I’ll never go back to VirtualBox for HA, ever, that’s for sure. It looks like you know Linux fairly well, so running Docker or Podman shouldn’t be a challenge.

Hi @_dev_null - Thanks for your reply.

I probably should have mentioned I have another VirtualBox instance in a different location that has been running fine for a number of years. (I don’t remember when it was installed, but the first backup I can find was from Dec 2020.)

It’s running on VirtualBox 6.1.42 on RHEL 7.9. So a much older install.

(At one point I was forced to rebuild the VM because an HaOS update made it unbootable. It turned out the default disk controller from 2020 was a pretty old IDE setup. At some point maybe around HaOS 8.5 support for that was dropped and I didn’t catch it. I created a new VM with the new defaults and got a SATA controller as the default emulated device. It’s been pretty happy since.)

I’m surprised you were seeing that much load from your VirtualBox VM. I know there is some overhead from the emulation, running an extra kernel, etc. I rarely see a load average above around 0.75 unless my VMs are doing some active work. I have some other smallish non-Home Assistant VMs that I use for different isolated services/OS instances.

I did resist running HaOS for a while. The Home Assistant instance I’m having a problem with used to be hosted as a “supervised” install on an RPI 4 from 2021 to some point in 2023. But I was running out of memory on a 4GB RPI 4 and the requirements for the supervised install have gotten onerous enough for me to make it worth just moving to a VM on x64 (Thinkcentre mini pc).

I’ve stuck with VirtualBox for many years because I liked the flexibility of being able to fairly easily move VMs between a Windows laptop, a Linux server, a mac mini, etc.


I suppose my next step is to try an alternative to VirtualBox 7: either KVM or VirtualBox 6 (since my other Home Assistant instance seems to be running fine that way). However, I’ll still be just shooting in the dark because I have no idea how to make the problem reproducible.

I’ve moved to KVM on the same hardware. A bit of learning was needed, but I really like the ability to use a nice CLI (virsh) as well as have a GUI/Web GUI (virt-manager or Cockpit) available when I want.

Being able to get to the serial console for the HaOS guest easily via virsh is very handy, especially when logging in remotely. It avoids having to use an emulated display adapter and a remote desktop for what is essentially text-level access. HaOS has both consoles enabled out of the box, so I didn’t have to do anything.
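
As a rough illustration of the console access, something like the wrapper below works. The domain name "haos" is just an assumption (virsh list --all shows the real names), and haos_console and the VIRSH override are hypothetical names for the example.

```shell
#!/bin/sh
# Hypothetical helper: attach to the serial console of a libvirt guest.
# "haos" is an assumed default domain name; pass your own as $1.
# VIRSH can be overridden (e.g. VIRSH=echo) to print the command
# arguments instead of running virsh.
haos_console() {
    domain="${1:-haos}"
    # Connect to the system-level libvirt daemon and open the guest's
    # text console. Press Ctrl+] to detach.
    ${VIRSH:-virsh} --connect qemu:///system console "$domain"
}
```

Usage would be something like: haos_console haos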

I’m not usually a big GUI/web GUI user, but the Cockpit project that Red Hat is behind gives a nice overview of LVM-based storage, KVM virtual machines, performance metrics, etc. without having to do much work.

Since the VirtualBox problems would crop up and then go away for long periods, it will take a few weeks to be confident that switching away from VirtualBox 7 solved the problem.