PSA: 2024.7 recorder problems

The migration code is at core/homeassistant/components/recorder/migration.py at 5fb6c65d23e4af9af3bd9080017772d5623162d6 · home-assistant/core · GitHub if you would like to take a stab at a PR to improve the rebuild speed for large SQLite databases.

2 Likes

It’s now 6 hours after the last log entry for recorder which is:

2024-08-12 04:47:58.331 WARNING (Recorder) [homeassistant.components.recorder.migration] Rebuilding SQLite table states; This will take a while; Please be patient!

I feel everything is still super sluggish and nothing really works but CPU usage is “only” 20%.

Is the process still ongoing? How long can it take realistically?

Yes

This depends on:

  1. Your hardware. CPU & Memory
  2. The size of your database in GB.
  3. The number of entities in your database.

I’ve heard it’s taken over 48 hours for some people.

Has there been a clear manual migration process documented while minimizing Home Assistant downtime? That would be perfectly acceptable for me - though the hardware (8-core Ryzen, 4 allocated to hassos kvm) should be able to finish the migration fine.

edit: I see it is documented above. Can we run this with the recorder temporary disabled somehow?

Found this topic via google search. Not sure 100% related but seems like this.

I’m on 2024.7.4 core.
Yesterday evening, I wasplaying with Solar Panels Forecast settings. It was the only activity I did. Today I found that some statistics including Energy dashboard doesn’t work. Log shows following error starting from yesterdays’s 23pm CEST

024-08-15 11:57:58.556 ERROR (Recorder) [homeassistant.components.recorder.core] SQLAlchemyError error processing task CommitTask()
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 931, in _process_one_task_or_event_or_recover
    task.run(self)
  File "/usr/src/homeassistant/homeassistant/components/recorder/tasks.py", line 291, in run
    instance._commit_event_session_or_retry()  # noqa: SLF001
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1204, in _commit_event_session_or_retry
    self._commit_event_session()
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1230, in _commit_event_session
    session.execute(
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2351, in execute
    return self._execute_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/session.py", line 2236, in _execute_internal
    result: Result[Any] = compile_state_cls.orm_execute_statement(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/bulk_persistence.py", line 1603, in orm_execute_statement
    result = _bulk_update(
             ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/bulk_persistence.py", line 335, in _bulk_update
    persistence._emit_update_statements(
  File "/usr/local/lib/python3.12/site-packages/sqlalchemy/orm/persistence.py", line 948, in _emit_update_statements
    raise orm_exc.StaleDataError(
sqlalchemy.orm.exc.StaleDataError: UPDATE statement on table 'states' expected to update 24 row(s); 0 were matched.

Glancing at the data in the states table I can’t see anomalies. Maybe it’s related to non-existent FKs for queued queries (as I understand from the first post)

Anyway, the most important question is: how can I recover from this situation? Of course without loss of today’s data if possible.

I appreciate a prompt response.

BTW I have HA running off SSD. So I assume it’s not the storage damage (but still possible of course)

Edit:
I after enabling debug for sqlalchemy I found that while requested records indeed don’t exist in states table. What was even more strange, the maximum value of state_id found in the table was way lower than requested ids (like a difference by 10k or so).
Seeing no option to fix data in the table manually in order to make the recorder continue running, I decided to restart the restart.
On start, the HA marked the db as corrupted creating a new empty one.

Is a corruption of db something that might have come with v2024.7? Or should I be afraid of SSD?

Edit2. I succeeded with db recovery. So I lost only one day of data (yesterday). I could merge those data but it’s not worth it. Having such a tool could be helpful

I think this is an important pickup. I know some columns changed from INT to BIGINT (SQLite). Perhaps something went wrong with what was considers the current ID.

Fck…
The system did it to me again. Once I switched over to recovered db, it broke at 10:58PM CEST. It is the same time it got broken for the first time.

Is there any process that can be associated with this time?

Great contribution! Been looking for what is supposed to happen, do you also have the steps for manual upgrade for a MariaDB10 ?

I’m running on a synology NAS, HA in docker, standard MariaDB on syn. 2024.08 did not automatically start the migration. Updated the docker image to 2024.08.1, only after starting in safe mode the migration started.

No errors during migration, but since there was no feedback I cancelled the first migration after 6 days of (single core) 100% CPU for MariaDB. Updated to 2024.8.2, same result, only starts in safe mode. No errors, still running after >28h but no finished migration. Last message in HA.log:

2024-08-17 11:09:39.525 WARNING (Recorder) [homeassistant.components.recorder.migration] Modifying columns attributes_id in table state_attributes. Note: this can take several minutes on large databases and slow machines. Please be patient!

Cannot restart HA because it says the migration is still running…
On syno in /var/packages/MariaDB10/target/mysql/ there’s a file NAS_name.err, this file is growing by the minute with lines like this:

---TRANSACTION 59630390, ACTIVE 86678 sec fetching rows
mysql tables in use 2, locked 2
124478 lock struct(s), heap size 12083320, 2987245 row lock(s)
MariaDB thread id 5, OS thread handle 140079888877248, query id 418 NAS_Synology 192.168.10.232 home-assistant copy to tmp table
ALTER TABLE state_attributes MODIFY attributes_id BIGINT NOT NULL AUTO_INCREMENT
--------
FILE I/O
--------
Pending flushes (fsync): 0
3820464 OS file reads, 2484150 OS file writes, 89079 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
---
LOG
---
Log sequence number 1298633634812
Log flushed up to   1298633634812
Pages flushed up to 1298633394058
Last checkpoint at  1298633394058
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 50331648
Dictionary memory allocated 182616
Buffer pool size   993
Free buffers       0
Database pages     256
Old database pages 0
Modified db pages  255
Percent of dirty pages(LRU & free pages): 99.222
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 110780, not young 3202335
0.00 youngs/s, 0.00 non-youngs/s
Pages read 3820428, created 130742, written 2424980
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 256, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
5 read views open inside InnoDB
state: sleeping
----------------------------
END OF INNODB MONITOR OUTPUT
============================

It seems to me the synology is not capable to perform this migration within a reasonable time. Is there a possibility to follow another route to perform this migration, something like:

  • Start a VM in VMware on my windows machine including a MariaDB or other DB
  • Create the new tables including the correct BIGINT columns
  • Temporarily disable recorder in HA
  • Copy all relevant tables and data from the NAS to the windows machine if possible directly into the new lay-out tables (or perform the migration locally on windows?)
  • Drop the effected tables on the NAS
  • Move data back into MariaDB on the NAS

This thread is for the SQLite table rebuild that was added in 2024.7.x to fix the foreign key problem with the new version of SQLite. Please start a new thread for help with the MariaDB/MySQL/Postgresql BIGINT schema change that was introduced in 2024.8.x

I am fairly certain I have issue #2 (a third party integration is causing recorder to stop working). I tried disabling all third party integrations and also tried rolling back to 2024.6.4, but I still get the recorder errors. Any tips on how I could narrow down the offending integration?

My integrations:

Logs:

Logger: homeassistant.components.recorder.util
Source: components/recorder/util.py:136
integration: Recorder (documentation, issues)
First occurred: 7:28:02 PM (3 occurrences)
Last logged: 7:35:10 PM

Error executing query
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/util.py", line 136, in session_scope
    yield session
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/statistics.py", line 470, in compile_missing_statistics
    modified_statistic_ids = _compile_statistics(
                             ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/statistics.py", line 552, in _compile_statistics
    compiled: PlatformCompiledStatistics = platform_compile_statistics(
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/sensor/recorder.py", line 467, in compile_statistics
    last_stats = statistics.get_latest_short_term_statistics_with_session(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/statistics.py", line 1995, in get_latest_short_term_statistics_with_session
    return _sorted_statistics_to_dict(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/statistics.py", line 2164, in _sorted_statistics_to_dict
    "end": start_ts + table_duration_seconds,
           ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
Logger: homeassistant.components.recorder.core
Source: components/recorder/core.py:906
integration: Recorder (documentation, issues)
First occurred: 7:28:02 PM (3 occurrences)
Last logged: 7:35:10 PM

Error while processing event CompileMissingStatisticsTask()
Error while processing event StatisticsTask(start=datetime.datetime(2024, 8, 22, 2, 25, tzinfo=datetime.timezone.utc), fire_events=True)
Error while processing event StatisticsTask(start=datetime.datetime(2024, 8, 22, 2, 30, tzinfo=datetime.timezone.utc), fire_events=True)
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/core.py", line 906, in _guarded_process_one_task_or_event_or_recover
    self._process_one_task_or_event_or_recover(task)
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/core.py", line 926, in _process_one_task_or_event_or_recover
    task.run(self)
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/tasks.py", line 178, in run
    if statistics.compile_missing_statistics(instance):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/util.py", line 643, in wrapper
    return job(instance, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/statistics.py", line 470, in compile_missing_statistics
    modified_statistic_ids = _compile_statistics(
                             ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/statistics.py", line 552, in _compile_statistics
    compiled: PlatformCompiledStatistics = platform_compile_statistics(
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/sensor/recorder.py", line 467, in compile_statistics
    last_stats = statistics.get_latest_short_term_statistics_with_session(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/statistics.py", line 1995, in get_latest_short_term_statistics_with_session
    return _sorted_statistics_to_dict(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/homeassistant/components/recorder/statistics.py", line 2164, in _sorted_statistics_to_dict
    "end": start_ts + table_duration_seconds,
           ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'

That’s a different issue caused by running an old version of HA with a database that has been upgraded to a newer version. You’ll have to remove the old style rows manually since the migration already happened.

Thank you! I took the opportunity to move my HA to MariaDB (starting with a blank DB) while running 2024.6.4, and then upgraded to 2024.8.2, and have no more issues!

1 Like

I’m running HA w/ 30 days history and Sqlite and the 2024.8.2 patch finally resolved the issues for me. It shrunk the database from 34GB back to 9GB too. I did not need to do any manual intervention.

1 Like

I have gone through this very long thread but not managed to get my logbook working again. I tried deleting the sqlite db in case it was corrupted in some way but that did not help (the file is now about 2GB). I’m not sure when this started but it was probably after an update some time ago. I am running HA Core 2024.8.3 and Supervisor 2024.08.0. Any suggestions would be most appreciated. (I am not running the integrations/add-ons mentioned in the thread as possibly causing a problem with the recorder. There is nothing in my log that suggests the cause. I can try to switch to MariaDB if anyone thinks that will help.

What’s not working with logbook?

I used to get a logbook entries for every device state change. I no longer get them.

Does the page not appear? Have you cleared your cache and refreshed your page? If the page does appear, is it all entities or just some?

Here are 2 examples of what I am seeing. One is from the HAI which uses an add-in and the other is from Emporia power monitor. None of the entities are showing logbook entries or statistics.


Sensors do not have logbook entries and that binary sensor has no changes in the given history period shown. Can you please go to your logbook sidebar and look there instead of the more info page?

I went to the logbook, sidebar and selected an entity for a door that has been opened numerous times in the last day, and there are no logbook entries whatsoever. I know what this used to look like, and it had entries for every opening and closing for each of these devices.