Has anyone else noticed stability problems running HAOS on VirtualBox on a Linux Host lately?
In the last few months, I’ve run into problems where some JSON files or the SQLite recorder database have become corrupt. I’ve also noticed processes dying from signal 11 (SIGSEGV) and signal 6 (SIGABRT). These processes have included python3.12 (Home Assistant) and Node (Z-Wave JS UI).
Things generally get auto-restarted by the Supervisor, and no notifications or repairs are generated, so unless you happen to catch the restart it is easy to be unaware of it. I’ve added automations to notify me when Home Assistant restarts or when Z-Wave devices go unavailable.
You can find a record via journalctl on HAOS if you have access to the console or have done the steps to enable SSH on port 22222 to HAOS. (This doesn’t work from the container-based SSH add-on unless you’ve got a container that maps in the journal files.)
# journalctl | grep -E 'ABEND|supervisor.docker'
Mar 18 19:57:05 homeassistant audit[4279]: ANOM_ABEND auid=4294967295 uid=0 gid=0 ses=4294967295 subj=unconfined pid=4279 comm="python3" exe="/usr/local/bin/python3.12" sig=11 res=1
Mar 18 19:57:09 homeassistant hassio_supervisor[433]: 24-03-18 15:57:09 INFO (SyncWorker_5) [supervisor.docker.manager] Starting homeassistant
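For anyone who wants to tally these crashes rather than eyeball them, here is a rough Python sketch that parses ANOM_ABEND lines in the format shown above and counts them per executable and signal. The function names (parse_abend, summarize) are my own, and the regex only assumes the default journalctl line layout seen in that excerpt. I'm not assuming Python is available on the HAOS host itself, so the idea is to save the journalctl output to a file and run this on another machine.

```python
import re
import signal
from collections import Counter

# Matches audit records in the layout shown in the excerpt above, e.g.
# Mar 18 19:57:05 homeassistant audit[4279]: ANOM_ABEND ... pid=4279
#   comm="python3" exe="/usr/local/bin/python3.12" sig=11 res=1
ABEND_RE = re.compile(
    r'^(?P<when>\w{3}\s+\d+\s+[\d:]+)\s+\S+\s+audit\[\d+\]:\s+ANOM_ABEND\b'
    r'.*?\bpid=(?P<pid>\d+)'
    r'.*?\bcomm="(?P<comm>[^"]*)"'
    r'.*?\bexe="(?P<exe>[^"]*)"'
    r'.*?\bsig=(?P<sig>\d+)'
)


def signal_name(number):
    """Translate a signal number to its name on the machine running this script."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


def parse_abend(line):
    """Return a dict describing one ANOM_ABEND record, or None for other lines."""
    match = ABEND_RE.match(line)
    if match is None:
        return None
    return {
        "when": match["when"],
        "pid": int(match["pid"]),
        "comm": match["comm"],
        "exe": match["exe"],
        "signal": signal_name(int(match["sig"])),
    }


def summarize(lines):
    """Count abnormal terminations per (executable, signal) pair."""
    counts = Counter()
    for line in lines:
        event = parse_abend(line)
        if event is not None:
            counts[(event["exe"], event["signal"])] += 1
    return counts
```

Usage would be something like saving the output with journalctl > journal.txt, copying the file off the VM, and then calling summarize(open("journal.txt")). Signal names are looked up on the machine running the script, so it should be a Linux box for the numbers to map the same way.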
At first I thought this was a hardware problem. I ran a bunch of hardware diagnostics (memtest86+, disk tests, etc.) and found nothing.
So I moved the HAOS VM to a new physical host. Things were better for a while, but then the problems started occurring again.
If I use vboximg-mount to get access to the partitions inside the VDI, things look OK. The ext4 filesystems don’t show any problems from fsck.ext4 -f volN.
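A clean fsck only says the filesystem structure is fine, not that the file contents are. As a rough sketch for checking the contents too, something like the following could be run against a copy of the Home Assistant config directory. It assumes the JSON files of interest live under .storage and that the recorder database uses the default name home-assistant_v2.db; both are assumptions on my part, as are the function names. The SQLite check uses PRAGMA integrity_check on a read-only connection.

```python
import json
import sqlite3
from pathlib import Path


def check_json(path):
    """Return None if the file parses as JSON, else a short error string."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
    except (OSError, ValueError) as err:  # JSON and decoding errors are ValueErrors
        return f"{type(err).__name__}: {err}"
    return None


def check_sqlite(path):
    """Return the rows from PRAGMA integrity_check (["ok"] when clean)."""
    # Open read-only so the check itself cannot modify the database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
        try:
            return [row[0] for row in conn.execute("PRAGMA integrity_check")]
        finally:
            conn.close()
    except sqlite3.DatabaseError as err:
        return [f"{type(err).__name__}: {err}"]


def scan_config(config_dir):
    """Map each problem file under config_dir to a description of what failed."""
    problems = {}
    root = Path(config_dir)

    storage = root / ".storage"  # assumed location of the JSON files
    if storage.is_dir():
        for path in sorted(storage.iterdir()):
            if path.is_file():
                error = check_json(path)
                if error is not None:
                    problems[str(path)] = error

    database = root / "home-assistant_v2.db"  # assumed default recorder name
    if database.is_file():
        result = check_sqlite(database)
        if result != ["ok"]:
            problems[str(database)] = "; ".join(result)

    return problems
```

Calling scan_config on the copied directory returns an empty dict when everything parses. Running it on a copy (or with Home Assistant stopped) avoids reading files in the middle of a write.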
I’ve used a fresh download of the HAOS 12.1 VDI. I’m using a separate VDI for the hassos-data disk.
I haven’t found any clues in the HAOS kernel logs, the VirtualBox logs, or the physical host logs. So I’m somewhat grasping at straws trying to track down bugs.
The systems I’ve encountered stability problems on are running the current VirtualBox release (7.0.14) on a NUC (Ubuntu 22.04 LTS) and a Lenovo M710q (RHEL 9.3). Other things running on those hosts, including other non-Home Assistant VMs, haven’t shown any problems.
I’ve started some testing of a HAOS VM on VirtualBox on Windows 11. So far I haven’t seen any problems there, but I have no idea what might trigger the problem, so other than placing some load on it, I’m not sure what to try.
I haven’t opened an issue against HAOS or VirtualBox since I don’t have any specifics to go on.
Anyone else see something like this?