SQLAlchemyError causes Supervisor to fault

I have a persistent error with the DB. The issue reproduces every time on SQLite, MariaDB, and now on PostgreSQL. I exclude almost everything from the recorder in my config:

recorder:
  db_url: !secret psql_string 
  db_retry_wait: 15
  purge_keep_days: 90
  commit_interval: 10
  exclude:
    domains:
      - media_player
      - switch
      - automation
      - update
      - person
      - binary_sensor
      - sun
      - zone
      - camera
      - device_tracker
      - weather
      - button
      - light
      - select
      #- sensor
    entity_globs:
      - sensor.sun*
      - sensor.weather_*
    event_types:
      - service_removed
      - service_executed
      - platform_discovered
      - homeassistant_start
      - homeassistant_stop      
      - feedreader
      - service_registered
      - call_service
      - component_loaded
      - logbook_entry
      - system_log_event
      - automation_triggered
      - script_started
      - timer_out_of_sync
      - recorder_5min_statistics_generated
      - recorder_hourly_statistics_generated
      - panels_updated
      - telegram_sent
      - device_registry_updated

This is the error that causes a global fault of HA. After this error, the Supervisor crashes and only a hard reset can fix the situation.

Logger: homeassistant.components.recorder.core
Source: components/recorder/core.py:912
Integration: Recorder (documentation, issues)
First occurred: 19:02:16 (1 occurrences)
Last logged: 19:02:16

SQLAlchemyError error processing task CommitTask(): This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.errors.IoError) could not read block 23 in file "base/16480/2608": I/O error [SQL: INSERT INTO state_attributes (hash, shared_attrs) SELECT p0::BIGINT, p1::TEXT FROM (VALUES (%(hash__0)s, %(shared_attrs__0)s, 0), (%(hash__1)s, %(shared_attrs__1)s, 1)) AS imp_sen(p0, p1, sen_counter) ORDER BY sen_counter RETURNING state_attributes.attributes_id, state_attributes.attributes_id AS attributes_id__1] [parameters: {'hash__0': 2994023477, 'shared_attrs__0': '{"state_class":"measurement","server_name":"Slobozhanske","server_country":"Ukraine","server_id":"39363","bytes_received":116196516,"unit_of_measurement":"Mbit/s","device_class":"data_rate","icon":"mdi:speedometer","friendly_name":"SpeedTest Download"}', 'hash__1': 3121778268, 'shared_attrs__1': '{"state_class":"measurement","server_name":"Slobozhanske","server_country":"Ukraine","server_id":"39363","bytes_sent":117121024,"unit_of_measurement":"Mbit/s","device_class":"data_rate","icon":"mdi:speedometer","friendly_name":"SpeedTest Upload"}'}] (Background on this error at: https://sqlalche.me/e/20/e3q8) (Background on this error at: https://sqlalche.me/e/20/7s2a)
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 912, in _process_one_task_or_recover
  File "/usr/src/homeassistant/homeassistant/components/recorder/tasks.py", line 305, in run
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1163, in _commit_event_session_or_retry
  File "/usr/src/homeassistant/homeassistant/components/recorder/core.py", line 1183, in _commit_event_session
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 1969, in commit
  File "<string>", line 2, in commit
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/state_changes.py", line 103, in _go
  File "/usr/local/lib/python3.11/site-packages/sqlalchemy/orm/session.py", line 929, in _raise_for_prerequisite_state
sqlalchemy.exc.PendingRollbackError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.errors.IoError) could not read block 23 in file "base/16480/2608": I/O error

[SQL: INSERT INTO state_attributes (hash, shared_attrs) SELECT p0::BIGINT, p1::TEXT FROM (VALUES (%(hash__0)s, %(shared_attrs__0)s, 0), (%(hash__1)s, %(shared_attrs__1)s, 1)) AS imp_sen(p0, p1, sen_counter) ORDER BY sen_counter RETURNING state_attributes.attributes_id, state_attributes.attributes_id AS attributes_id__1]
[parameters: {'hash__0': 2994023477, 'shared_attrs__0': '{"state_class":"measurement","server_name":"Slobozhanske","server_country":"Ukraine","server_id":"39363","bytes_received":116196516,"unit_of_measurement":"Mbit/s","device_class":"data_rate","icon":"mdi:speedometer","friendly_name":"SpeedTest Download"}', 'hash__1': 3121778268, 'shared_attrs__1': '{"state_class":"measurement","server_name":"Slobozhanske","server_country":"Ukraine","server_id":"39363","bytes_sent":117121024,"unit_of_measurement":"Mbit/s","device_class":"data_rate","icon":"mdi:speedometer","friendly_name":"SpeedTest Upload"}'}]
(Background on this error at: https://sqlalche.me/e/20/e3q8) (Background on this error at: https://sqlalche.me/e/20/7s2a)
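
As I understand it, the PendingRollbackError itself is not the root cause, just SQLAlchemy's standard reaction to an earlier failed flush. A minimal sketch of that pattern, assuming a reachable database (the connection string, table, and values below are only illustrative, not the recorder's actual code):

from sqlalchemy import create_engine, text
from sqlalchemy.exc import PendingRollbackError, SQLAlchemyError
from sqlalchemy.orm import Session

# hypothetical connection string; any backend shows the same behaviour
engine = create_engine("postgresql+psycopg2://ha:ha@localhost/homeassistant")

with Session(engine) as session:
    try:
        # suppose this statement fails at flush/commit time (here: the
        # backend returned an I/O error while reading a catalog block)
        session.execute(
            text("INSERT INTO state_attributes (hash, shared_attrs) VALUES (:h, :a)"),
            {"h": 2994023477, "a": "{}"},
        )
        session.commit()
    except SQLAlchemyError:
        # the session is now "poisoned": every further statement raises
        # PendingRollbackError until rollback() is called explicitly
        try:
            session.execute(text("SELECT 1"))
        except PendingRollbackError:
            session.rollback()  # after this, the session is usable again

So the real question is what makes the backend fail in the first place.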

Have you tried to troubleshoot the error yourself first? For example, try removing the domains, test, then try the globs, test, then try the event types, test. If you find that a specific exclude area is the issue, try to narrow it down by testing each domain, etc.

Hi, and thanks for replying. Yes, I have been troubleshooting the error for more than 3 months. Have you seen my message? I have changed databases twice and excluded all domains and sensors from the recorder. I keep only 1 day of retention for history data. The DB does not even reach 50 MB daily, but the HA fault is still reliably reproducible.

I have seen your message, but there is no information about what steps you took to troubleshoot the problem, hence my question.

Original exception was: (psycopg2.errors.IoError) could not read block 23 in file "base/16480/2608": I/O error

Looks like a filesystem error or a duff SD card/SSD to me, especially if it affects multiple database backends.

You’ll need to do an offline check on that.
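
For example, simply reading every file end-to-end will surface unreadable blocks as an OSError from the kernel, the same class of failure psycopg2 is reporting above. A rough sketch in Python, assuming the database files live under the hypothetical path given at the bottom:

import os

def scan_for_io_errors(root: str, chunk_size: int = 1 << 20) -> list[str]:
    # Read every file under `root` end-to-end; any sector the drive
    # cannot return shows up as an OSError.
    bad = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    while f.read(chunk_size):
                        pass
            except OSError as exc:
                bad.append(f"{path}: {exc}")
    return bad

if __name__ == "__main__":
    # hypothetical mount point for the database files; adjust to your setup
    for line in scan_for_io_errors("/mnt/data"):
        print(line)

A proper filesystem check (fsck) or SMART self-test from a rescue system would be more thorough, but this kind of read pass is a quick first indicator.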

The SSD has been in use for 1 week: an Intenso M.2 SSD SATA III Top, 128 GB, rated at 520 MB/s. Do you think the SSD is too slow for the volume of history data? I have 30 devices in the network.

[Thu Dec 21 12:21:53 EET 2023][INFO] Run performance test
Category                  Test                      Result              
HDParm                    Disk Read                 331.48 MB/s              
HDParm                    Cached Disk Read          271.36 MB/s              
DD                        Disk Write                115 MB/s                 
FIO                       4k random read            19248 IOPS (76992 KB/s)  
FIO                       4k random write           10291 IOPS (41165 KB/s)  
IOZone                    4k read                   33051 KB/s               
IOZone                    4k write                  26390 KB/s               
IOZone                    4k random read            20863 KB/s               
IOZone                    4k random write           29651 KB/s 

The performance of the SSD isn’t relevant. Something is causing the underlying operating system to return an I/O error to the database engine.

If that were mine, I’d be reformatting it with an appropriate file system. I’ve installed HAOS and that appears to be using EXT4.

OOTB HAOS is installed according to the installation manual. I have reproduced this issue on SQLite, MariaDB, and PostgreSQL, even with the 1-day retention policy and a DB size under 100 MB. I strongly believe it is caused by some hardware issue. I have just ordered a new power supply and will see the result soon.

Ah, yeah, power could indeed be the cause.

Hello, since you didn’t post any updates, I wonder if and how you resolved this issue.