Host keeps crashing post Supervisor 2022.10.0 update

My Home Assistant system had been running stably (enough) for a long while. After I recently upgraded to Supervisor 2022.10.0, though, it started crashing regularly. The error I see on the host machine is “possible timer handling issue on cpu=0”, along with a whole bunch of other log output, after which the screen usually only reads: “ha >”.

I went through some troubleshooting steps on Discord, rebooting in various ways. Sometimes the host would never fully boot. Sometimes it would, but would eventually give this error and run out of memory.

As a last resort, I grabbed a new VDI of Home Assistant and restored from backup. This solved the problem for about a week, and then it came back. It MAY have been triggered by a power loss to the host machine, but I’m not 100% sure about that.

I’m going to go through the restore process again, but was curious if anyone else is experiencing this or might have an idea on what is going wrong.

Thanks!

So much missing information.
What is the host?
How is Home Assistant installed?
What else is running on the host?

Sorry about that!

Home Assistant 2022.10.3
Supervisor 2022.10.0
Operating System 9.2
Frontend 20221010.0 - latest

I’m running Hassio in an Oracle VirtualBox VM (using the image provided by Home Assistant), which runs on a Windows 10 machine.

Windows machine details:

  • Windows 10 Pro 10.0.19043 Build 19043
  • HP Z440 Workstation
  • Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz, 3492 Mhz, 4 Core(s), 8 Logical Processor(s)
  • 32 GB RAM

Here’s how the VM is set up:




What other information can I provide? Happy to!

Well, there goes my idea that your Raspberry was simply running out of RAM…

So, what is crashing? The VM or the host PC? If it’s the VM you might try increasing the memory, but 8GB should be overkill already.

I am not familiar with VMs. Where are the database files stored?
Try deleting config/home-assistant_v2.db. It can grow pretty large in short order.

The VM is crashing. I am using MariaDB, so that .db file isn’t there. That said, here’s what glances is showing. It seems like I have plenty of disk space still, no?


I think I am going down the wrong rabbit hole since I have little experience with VMs. Maybe someone else will have a better insight here.


Your VM should have 2 cores and 4 GB RAM (you can get away with 2 GB RAM). Your disk should be 32 GB or 64 GB. MariaDB is no issue and will run on 2 GB RAM.

I also suggest you update Win10 to 19044.2130 and make certain you are using the latest VirtualBox version.
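If it helps, the sizing above (2 cores, 4 GB RAM) can also be applied from the command line rather than the VirtualBox GUI. Here is a minimal Python sketch that only builds the `VBoxManage modifyvm` call. The VM name `haos-test` is a made-up placeholder, and the VM has to be powered off before the command is actually run.

```python
from typing import List


def build_modifyvm_command(vm_name: str, cpus: int = 2, memory_mb: int = 4096) -> List[str]:
    """Build the argument list for resizing a powered-off VirtualBox VM.

    vm_name is whatever the VM is called in VirtualBox; the name used
    below is a hypothetical placeholder, not taken from this thread.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if memory_mb < 1:
        raise ValueError("memory_mb must be at least 1")
    return [
        "VBoxManage", "modifyvm", vm_name,
        "--cpus", str(cpus),         # number of virtual CPUs
        "--memory", str(memory_mb),  # RAM in megabytes
    ]


command = build_modifyvm_command("haos-test")
print(" ".join(command))
```

To apply it for real, the list could be passed to `subprocess.run(command, check=True)` on a machine where `VBoxManage` is on the PATH.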

As you suggested, I shut down my VM normally, updated the settings to 2 CPUs, updated Windows to 19044.2130 and rebooted. My RAM was already at 8 GB and my storage at 32 GB (with plenty of room).

Upon restart I’m getting a new symptom that I came across last time this happened (1 week ago). The VB host starts and then gets stuck here:

HA is inaccessible via browser. In the past I would try resetting the VM and could sometimes get back in, but ultimately that’s when I started from a fresh VDI and restored Home Assistant from backup. Unfortunately, the problem came back 1 week later.

I am also hearing the constant sound of a USB device disconnecting; however, when I look at USBLogView and USBDView, nothing is showing as disconnecting. I’ve seen this particular behavior before, many moons ago, but it didn’t seem to impact how things were running. I’d eventually just restart the entire machine out of annoyance and it would go away.

Does this provide any other clues for you?

I just rebooted the Host machine and now it starts up normally. I expect in a few hours I will start getting that timing error again. Odd…

I believe you have a disk corruption problem somewhere on your Windows PC. You can try to get things going using Hyper-V instead of VirtualBox.

Hyper-V is part of Windows Pro, just enable it.

Please check the Windows disk for errors. Also, if the problem persists, share your VB network configuration.

Here’s what chkdsk reports. It seems to think things are okay. Is there another scan I should try? I would try Hyper-V, but I might run into trouble there. Even though my machine is reporting that it supports virtualization, I can’t even get WSL to recognize this fact. See my unanswered post on that here: https://www.reddit.com/r/WindowsHelp/comments/xlblq4/wsl_thinks_vm_platform_isnt_enabled_but_it_seems/

Microsoft Windows [Version 10.0.19044.2130]
(c) Microsoft Corporation. All rights reserved.

C:\Windows\system32>chkdsk
The type of the file system is NTFS.
Volume label is Windows.

WARNING!  /F parameter not specified.
Running CHKDSK in read-only mode.

Stage 1: Examining basic file system structure ...
  1039104 file records processed.
File verification completed.
 Phase duration (File record verification): 8.05 seconds.
  13229 large file records processed.
 Phase duration (Orphan file record recovery): 0.00 milliseconds.
  0 bad file records processed.
 Phase duration (Bad file record checking): 0.29 milliseconds.

Stage 2: Examining file name linkage ...
  430 reparse records processed.
  1422006 index entries processed.
Index verification completed.
 Phase duration (Index verification): 20.10 seconds.
  0 unindexed files scanned.
 Phase duration (Orphan reconnection): 8.80 seconds.
  0 unindexed files recovered to lost and found.
 Phase duration (Orphan recovery to lost and found): 0.29 milliseconds.
  430 reparse records processed.
 Phase duration (Reparse point and Object ID verification): 6.51 milliseconds.

Stage 3: Examining security descriptors ...
Security descriptor verification completed.
 Phase duration (Security descriptor verification): 40.03 milliseconds.
  191452 data files processed.
 Phase duration (Data attribute verification): 0.37 milliseconds.
CHKDSK is verifying Usn Journal...
  35243112 USN bytes processed.
Usn Journal verification completed.
 Phase duration (USN journal verification): 122.09 milliseconds.

Windows has scanned the file system and found no problems.
No further action is required.

 249416758 KB total disk space.
 101401996 KB in 626902 files.
    398108 KB in 191453 indexes.
         0 KB in bad sectors.
   1152794 KB in use by the system.
     65536 KB occupied by the log file.
 146463860 KB available on disk.

      4096 bytes in each allocation unit.
  62354189 total allocation units on disk.
  36615965 allocation units available on disk.
Total duration: 37.14 seconds (37149 ms).

C:\Windows\system32>
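For reference, the summary figures in the chkdsk output above work out to a bit under 59% of the Windows volume being free. A small Python sketch of that arithmetic follows. It reads only the two summary lines quoted in the report, and the function names are my own.

```python
import re


def parse_chkdsk_kb(report: str) -> dict:
    """Pull total and available space (in KB) out of a chkdsk summary."""
    total = re.search(r"(\d+) KB total disk space", report)
    free = re.search(r"(\d+) KB available on disk", report)
    if total is None or free is None:
        raise ValueError("chkdsk summary lines not found")
    return {"total_kb": int(total.group(1)), "free_kb": int(free.group(1))}


def free_percent(report: str) -> float:
    """Share of the volume that is still free, as a percentage."""
    sizes = parse_chkdsk_kb(report)
    return 100.0 * sizes["free_kb"] / sizes["total_kb"]


# The two relevant lines, copied from the report above.
sample = """
 249416758 KB total disk space.
 146463860 KB available on disk.
"""

print(f"{free_percent(sample):.1f}% free")  # prints "58.7% free"
```

Note that this only covers the Windows volume that chkdsk scanned; it says nothing about the virtual disk inside the VM.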

Okay, here’s some new weird info. Rebooted my machine and the startup froze here:


I then clicked the X (close) button and then hit cancel. It unstuck. I was able to duplicate this behavior one other time.

Also just looked into Hyper-V and it doesn’t support USB passthrough, so that would be a non-starter as I have USB Z-Wave and Zigbee devices at the center of this.

Change your storage controller to SATA. Try a HAOS instance with settings similar to the following (not important, but you can remove the Floppy support as well):

Also, while you are spinning up another test VM, maybe use a fresh VDI download, and consider different storage media (a different physical disk, or even an external one) to house your VM image, just to be sure.

What are the VM settings in your Network tab, BTW?

Trying all the things you mentioned and reinstalling/restoring. I tried a full restore last Friday and once again everything worked great until the following Thursday. So weird! Let’s see if your most recent suggestions improve things for next week.

I am also going to uninstall some add-ons that aren’t must-haves (MariaDB, InfluxDB).

Here are my network settings, as requested. If it helps, the Virtual Machine is installed on a Samsung SSD 850 Pro 256GB

Try different “Promiscuous Mode” settings.
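In VirtualBox the promiscuous mode policy of a network adapter takes one of three values: deny, allow-vms, or allow-all. It can be changed in the adapter's advanced settings, or with `VBoxManage modifyvm`. The Python sketch below only builds that command. The VM name is a hypothetical placeholder, adapter 1 is an assumption, and the `--nicpromisc<N>` spelling is the one used by VirtualBox 6.x.

```python
from typing import List

# The three promiscuous-mode policies VirtualBox accepts.
PROMISC_POLICIES = ("deny", "allow-vms", "allow-all")


def build_promisc_command(vm_name: str, policy: str, adapter: int = 1) -> List[str]:
    """Build the VBoxManage call that sets an adapter's promiscuous policy.

    vm_name is a placeholder for the VM's name in VirtualBox;
    adapter is the 1-based network adapter number.
    """
    if policy not in PROMISC_POLICIES:
        raise ValueError(f"policy must be one of {PROMISC_POLICIES}")
    if adapter < 1:
        raise ValueError("adapter numbers start at 1")
    return ["VBoxManage", "modifyvm", vm_name, f"--nicpromisc{adapter}", policy]


# Print one command per policy so each can be tried in turn.
for policy in PROMISC_POLICIES:
    print(" ".join(build_promisc_command("haos-test", policy)))
```

As with other `modifyvm` settings, the VM should be powered off when a command like this is run.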

Another Thursday and another crash. Some more info: it’s not enough for me to simply replace my existing VDI with a fresh one. I need to delete the entire VM instance in VirtualBox and recreate from scratch. This is driving me crazy!

@k8gg : what is promiscuous mode?

Here’s a VBox log leading up to the event:

And then when I reset it and started it up again, it worked for an hour or so and then this happened:

Any ideas?

I believe your system either does not support VT-x or does not have it enabled, and the crash might be due to this (slow performance, something timing out). Alternatively, you have RAM issues, or something on your PC interferes with VB. See the lines below from your VBox log.

00:00:05.723416 HM: HMR3Init: Attempting fall back to NEM: VT-x is not available
00:00:05.781980 CPUM: No hardware-virtualization capability detected

I think you should run HA on another machine that supports hardware virtualisation, or run it on the present system as a bare-metal install.
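To check whether the same messages show up again on later boots, a VBox.log can be scanned for the two lines quoted above. This is a minimal Python sketch: the match strings come straight from those two log lines, the function name is my own, and the third sample line is made-up filler to show that unrelated lines are ignored.

```python
import re
from typing import List

# Messages taken from the two log lines quoted in this post.
VIRT_PATTERNS = (
    re.compile(r"VT-x is not available"),
    re.compile(r"No hardware-virtualization capability detected"),
)


def find_virtualization_warnings(log_text: str) -> List[str]:
    """Return every log line that reports missing hardware virtualization."""
    hits = []
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in VIRT_PATTERNS):
            hits.append(line.strip())
    return hits


sample_log = """\
00:00:05.723416 HM: HMR3Init: Attempting fall back to NEM: VT-x is not available
00:00:05.781980 CPUM: No hardware-virtualization capability detected
filler line that is not from any real log
"""

for hit in find_virtualization_warnings(sample_log):
    print(hit)
```

For a real log, the text could be read with `open(path, encoding="utf-8", errors="replace").read()`, where `path` points at the VM's VBox.log file.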