Database corrupt, yet again

jba · December 3, 2023, 7:28pm

Hey guys,

Hope someone can help me, or maybe knows what I can do to fix it.
I’m running HA in a Debian VirtualBox VM.
I keep getting a malformed database every 2 to 5 months.

I have the recorder set to purge after 5 days, with repack enabled.
The way I fix it now is by exporting the corrupt DB and running .recover on it:

sqlite3 ./home-assistant_v2.db.corrupt ".recover" | sqlite3 ./home-assistant_v2.db_fix

Over SSH I run ha core stop, replace the DB with the fixed one, and then run ha core start.
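
Before swapping the recovered file in, it can help to confirm that it passes SQLite’s own integrity check. The following is a minimal sketch using Python’s standard sqlite3 module. The function name integrity_problems is made up for illustration, and it assumes the file is not being written to while the check runs.

```python
import sqlite3
from pathlib import Path


def integrity_problems(db_path: str) -> list[str]:
    """Return the messages from PRAGMA integrity_check, or [] if the file is clean."""
    # mode=ro opens the file read-only, so the check itself cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    except sqlite3.DatabaseError as exc:
        # A badly damaged file can fail before the check returns any rows.
        return [str(exc)]
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    # A healthy database reports a single row containing "ok".
    return [] if messages == ["ok"] else messages
```

Calling integrity_problems("./home-assistant_v2.db_fix") and getting an empty list back suggests the recovered file is structurally sound. It says nothing about rows that .recover could not salvage.
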
What I would really like to know is why it keeps getting corrupted.
These are the log messages of the malformation:

Logger: homeassistant.components.recorder.util
Source: components/recorder/util.py:139
Integration: Recorder (documentation, issues)
First occurred: 04:12:05 (1 occurrences)
Last logged: 04:12:05

Error executing query: (sqlite3.DatabaseError) database disk image is malformed (Background on this error at: https://sqlalche.me/e/20/4xp6)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 1135, in fetchall
    rows = dbapi_cursor.fetchall()
           ^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.DatabaseError: database disk image is malformed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/recorder/util.py", line 139, in session_scope
    yield session
  File "/usr/src/homeassistant/homeassistant/components/recorder/purge.py", line 88, in purge_old_data
    has_more_to_purge |= _purge_states_and_attributes_ids(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/purge.py", line 195, in _purge_states_and_attributes_ids
    state_ids, attributes_ids = _select_state_attributes_ids_to_purge(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/purge.py", line 255, in _select_state_attributes_ids_to_purge
    ).all():
      ^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 1390, in all
    return self._allrows()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 554, in _allrows
    rows = self._fetchall_impl()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 2293, in _fetchall_impl
    return list(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/loading.py", line 215, in chunks
    fetch = cursor._raw_all_rows()
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 546, in _raw_all_rows
    rows = self._fetchall_impl()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 2102, in _fetchall_impl
    return self.cursor_strategy.fetchall(self, self.cursor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 1139, in fetchall
    self.handle_exception(result, dbapi_cursor, e)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 1080, in handle_exception
    result.connection._handle_dbapi_exception(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2343, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 1135, in fetchall
    rows = dbapi_cursor.fetchall()
           ^^^^^^^^^^^^^^^^^^^^^^^
sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) database disk image is malformed
(Background on this error at: https://sqlalche.me/e/20/4xp6)

and

Logger: homeassistant.components.recorder.core
Source: components/recorder/core.py:912
Integration: Recorder (documentation, issues)
First occurred: 04:12:05 (1 occurrences)
Last logged: 04:12:05

Unrecoverable sqlite3 database corruption detected: (sqlite3.DatabaseError) database disk image is malformed (Background on this error at: https://sqlalche.me/e/20/4xp6)
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 1135, in fetchall
    rows = dbapi_cursor.fetchall()
           ^^^^^^^^^^^^^^^^^^^^^^^
sqlite3.DatabaseError: database disk image is malformed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 912, in _process_one_task_or_recover
    return task.run(self)
           ^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/tasks.py", line 113, in run
    if purge.purge_old_data(
       ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/util.py", line 625, in wrapper
    return job(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/purge.py", line 88, in purge_old_data
    has_more_to_purge |= _purge_states_and_attributes_ids(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/purge.py", line 195, in _purge_states_and_attributes_ids
    state_ids, attributes_ids = _select_state_attributes_ids_to_purge(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/purge.py", line 255, in _select_state_attributes_ids_to_purge
    ).all():
      ^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 1390, in all
    return self._allrows()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 554, in _allrows
    rows = self._fetchall_impl()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 2293, in _fetchall_impl
    return list(self.iterator)
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/loading.py", line 215, in chunks
    fetch = cursor._raw_all_rows()
            ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/result.py", line 546, in _raw_all_rows
    rows = self._fetchall_impl()
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 2102, in _fetchall_impl
    return self.cursor_strategy.fetchall(self, self.cursor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 1139, in fetchall
    self.handle_exception(result, dbapi_cursor, e)
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 1080, in handle_exception
    result.connection._handle_dbapi_exception(
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/base.py", line 2343, in _handle_dbapi_exception
    raise sqlalchemy_exception.with_traceback(exc_info[2]) from e
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/engine/cursor.py", line 1135, in fetchall
    rows = dbapi_cursor.fetchall()
           ^^^^^^^^^^^^^^^^^^^^^^^
sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) database disk image is malformed
(Background on this error at: https://sqlalche.me/e/20/4xp6)

My server is running on:

Intel(R) Core™ i5-4590 CPU @ 3.30GHz
disk: Samsung SSD 850
RAM 16GB

I already replaced the SSD and the RAM.

NathanCu · December 3, 2023, 7:33pm

If you’re seeing it that often, either you have disk issues or something operational is going on…

Assuming you are doing everything you can to ensure it’s not disk…

How are you shutting down HA? Is the machine crashing? What happens when the host drops? Do you have any power events? Those logs are great for looking at the actual corruption, but do you have any other issues in your log from before the DB corruption? (I’m wondering if your corruption is a symptom of something else rather than the cause.)

jba · December 5, 2023, 3:47pm

I hardly ever shut down or reboot the host, and when I do, I do it the right way (ACPI shutdown).
The server itself has never crashed, and there has been no power loss or anything like that.

I already replaced the SSD and the RAM, so it can’t be that.

I’ll save the entire log next time it happens, as I don’t have access to it anymore since restoring the DB.

nikito7 · December 13, 2023, 1:24pm

ACPI shutdown skips some stuff

jbakers · December 13, 2023, 1:41pm

That’s a rather empty answer, to be honest.
As far as I know, ACPI shutdown is the preferred method of shutting down or rebooting a VM.

Also, could you specify what is being skipped?

vingerha · December 15, 2023, 7:10am

Earlier this month I migrated from MariaDB back to SQLite. Having never had any issues before, I ran into the above for the first time this morning, seemingly after a restart. I restored backups from 4 days ago, when I know (!) it was working, but these also proved to be corrupt.
I’m now wondering where this comes from, as I doubt that the DB or the disk is actually corrupt…
EDIT: I restart HA (only HA) each morning at 01:00

tbandras · December 19, 2023, 7:33pm

Would you mind explaining these steps to a beginner? I am apparently having the same issue (see the description here) and only realized today that the symptoms are exactly the same as yours.

My setup is similar to yours, I reckon.

vingerha · December 20, 2023, 10:41am

Well, I found the source… ME… I opened the DB with DB Browser whilst HA was still running
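
A lower-risk way to look inside the database is to open it read-only, ideally on a copy rather than the live file. The sketch below uses Python’s standard sqlite3 module. The helper names open_read_only and table_names are made up for illustration, and whether a read-only connection would have avoided the corruption described here is an assumption, not something verified in this thread.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    """Open a SQLite file so that this connection cannot write to it."""
    # The mode=ro URI parameter makes SQLite reject any write from this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def table_names(db_path: str) -> list[str]:
    """List the tables in the database without modifying the file."""
    conn = open_read_only(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

Any INSERT, UPDATE, or DELETE issued through open_read_only raises sqlite3.OperationalError, so an accidental edit in the inspection tool cannot reach the file.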

firstcolle · December 9, 2024, 3:41pm

Hi, have you fixed your problem?
I have the same problem. What I notice is that the repack destroys my database.
In September, I decided to start fresh with a clean database, setting the purge to 90 days.
Until today, everything was fine, and the database was only 4GB. Even the repack, scheduled every second Sunday of the month, worked smoothly, and I noticed the database size decreasing significantly.

However, last night (the second Sunday of December, when the repack is scheduled), it got corrupted again.
I tried downloading the file, repairing it with SQLite, and re-uploading it after stopping the core. Upon restarting, the freshly uploaded database is immediately marked as corrupted, and HA starts with a new database.

Even if I manually start the purge action with only

action: recorder.purge
data:
  repack: true

the database gets corrupted.

vingerha · December 9, 2024, 4:42pm

I’ve never seen this with my DB. The SQLite file itself is about 1.5 GB, and I do a purge every 10 days with repack.

jba · December 10, 2024, 8:43pm

Yeah, I fixed it, but not in a way I think you’d like to hear.
I stopped using VirtualBox.
I switched to Proxmox CE and will never look back.
I used tteck’s script to create the HAOS VM (not LXC), and it’s been running with no problems at all for 10+ months.
This particular Proxmox VE Helper-Script

I think my corruption problems were caused by the combination of VirtualBox and my full image backup scripts, which ran weekly on my Debian machine. (This is why I marked NathanCu’s post as the solution.)
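
For anyone who only needs a copy of the recorder database rather than a whole-VM image, SQLite’s online backup API is one way to get a consistent copy while the file is in use. This is a minimal sketch with Python’s standard sqlite3 module; the function name snapshot_database is made up for illustration, and nothing in this thread establishes that it would have prevented the corruption above.

```python
import sqlite3
from pathlib import Path


def snapshot_database(source_path: str, target_path: str) -> None:
    """Copy a SQLite database to target_path using the online backup API."""
    # Open the source read-only so the snapshot cannot alter the original.
    source_uri = Path(source_path).resolve().as_uri() + "?mode=ro"
    source = sqlite3.connect(source_uri, uri=True)
    target = sqlite3.connect(target_path)
    try:
        # Connection.backup copies the pages under SQLite's own locking,
        # so the target ends up as a consistent snapshot of the source.
        source.backup(target)
    finally:
        target.close()
        source.close()
```

Unlike copying the file with cp while a writer is active, this goes through SQLite’s locking, so the copy should not contain a half-written transaction.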

Proxmox also has a backup function, and whether it’s Snapshot, Suspend or Stop, it has never failed me.

If you have a dedicated server, I advise you to back up your HA, install Proxmox on a fresh system, and restore the backup.

firstcolle · December 12, 2024, 8:51am

For now, I have simply disabled the repack function and let the DB grow.
I’ll look into your solution, but it doesn’t seem simple.