Corrupted DB, kernel panic, etc. at or near the 2024.8 update (not sure)

csillag · August 10, 2024, 10:19pm

I am running HA on a server with ECC memory. Generally speaking it's rock solid, so I think I can rule out hardware errors right from the start.

It’s running in a VirtualBox image, downloaded from here: https://github.com/home-assistant/operating-system/releases/download/12.4/haos_ova-12.4.vdi.zip

It has 4 GB of memory, 4 CPUs, and 20 GB of disk space, provided by RAID over SSDs. It's solid.

Still, looking at the console of the VM, I am seeing this:

I started digging because I was seeing errors in my HA Core logs, which I checked after noticing that all my statistics had disappeared. The logs mentioned unrecoverable database corruption.
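To confirm whether the recorder database itself is damaged, one option is SQLite's built-in `PRAGMA integrity_check`. The sketch below is a minimal example, assuming the default SQLite recorder file (`home-assistant_v2.db` in the config directory) and that you can get at that file from outside the VM. It works on a temporary copy (plus the `-wal` sidecar, if present) so the live database is never touched; stopping HA Core before copying is the safer choice. The function name is hypothetical.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def check_sqlite_integrity(db_path: str) -> list[str]:
    """Run PRAGMA integrity_check on a *copy* of an SQLite database.

    Returns the messages reported by SQLite; a healthy database yields ["ok"].
    """
    src = Path(db_path)
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / src.name
        shutil.copy2(src, copy)  # never run checks against the live file
        wal = src.with_name(src.name + "-wal")
        if wal.exists():
            # Copy the write-ahead log too, so recent commits are included.
            shutil.copy2(wal, copy.with_name(copy.name + "-wal"))
        conn = sqlite3.connect(copy)
        try:
            rows = conn.execute("PRAGMA integrity_check").fetchall()
        finally:
            conn.close()  # close before the temp directory is removed
    return [row[0] for row in rows]
```

Anything other than `["ok"]` (for example, messages about malformed pages or missing index entries) points to on-disk corruption rather than an application-level error.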

The kernel panic seems to happen while working with ext2 (the file system), so I suspect the file system is corrupted as well.

Where does the instability come from? I was not doing anything extravagant.

No custom kernel drivers, no fancy hardware add-ons. Just Home Assistant sitting there, communicating with stuff via TCP/IP.

What kind of kernel does HAOS ship with? I would expect them to have chosen a well-tested, stable, and reliable source, wouldn't you?

I have spent a lot of time setting this up, and now all my previous stats are lost, maybe even my configuration?

What is going on here?

I don't think it's my hardware since, again, this server runs on ECC memory and hosts a bunch of VMs, and nothing else is complaining, just Home Assistant…

To say I’m not happy about this would be an understatement.

Can someone please offer any explanation of what is going on?

MaxK · August 10, 2024, 10:52pm

I would start here: 2024.5+: Tracking down instability issues caused by integrations.

Architecture overview (Home Assistant Developer Docs).

Your backups will have this data and config info.

nickrout · August 11, 2024, 12:02am

I have seen a lot of problems with VirtualBox and HA lately. Try Proxmox or VMware.

csillag · August 11, 2024, 12:12am

I would start here: 2024.5+: Tracking down instability issues caused by integrations

I have enabled the debug option now, but why should we assume this has anything to do with integrations? The problems I am seeing are not in the HA Core logs but in dmesg, the kernel logs… several layers lower.
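When the evidence lives in the kernel log rather than the HA Core log, it can help to filter a saved copy of that log down to the lines that matter. The sketch below assumes you can export the kernel log as plain text (for example, saved `dmesg` output). The regular expression is a hypothetical, non-exhaustive list of common kernel message prefixes; exact wording varies between kernel versions, so treat a miss as inconclusive.

```python
import re

# Hypothetical, non-exhaustive patterns for kernel messages worth a closer look.
# Exact message text differs between kernel versions; adjust as needed.
SUSPICIOUS = re.compile(
    r"Kernel panic|Oops|BUG:|EXT[234]-fs (error|warning)|I/O error"
)


def suspicious_kernel_lines(log_text: str) -> list[str]:
    """Return the lines of a kernel log that match SUSPICIOUS, in order."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]
```

Calling `suspicious_kernel_lines(open("dmesg.txt").read())` on a saved log returns only the matching lines, which makes it easier to see whether file-system errors appear before the panic or only as a consequence of it.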

Meanwhile, here is a new crash:

I am hardly running any integrations. (Shelly and Solax, besides the default ones.)

None of them are doing anything that would have the power to crash the kernel…

nickrout · August 11, 2024, 12:17am

You are right, it is not an ha issue at all.

The kernel configuration can be found here

Don't ask me exactly where; sorry, I don't have all day. No doubt someone who understands the repo can point to it; I don't have that depth of knowledge.

csillag · August 11, 2024, 11:40pm

OK, so the solution was that I am ditching the Home Assistant Operating System.

I set up a VirtualBox VM as I normally do, installed an OS, and did a Home Assistant Core install.

Migrated my config manually.

I did lose all my previous statistical data (energy etc.), but there have been no kernel panics or DB corruption since then.

Let’s see how this works out.

I think I can live without the Supervisor and their OS layer…