History Graph Cards freeze graphs? Problem with flat lines?

Hi,
I don't know why, but for the last 2 or 3 days I have noticed problems with the history graphs I have been using for a couple of weeks/months…

It's hard to explain: the graph shows flat lines for all entities of the card during the time range I'm offline (i.e. not viewing the GUI). When I access the GUI (e.g. a dashboard), the new incoming readings are shown correctly; only the last 2-3 hours of my "GUI-offline time" stay flat lines. See the screenshots at the end…
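(For reference, these are plain history-graph cards, roughly like the sketch below; the entity IDs here are just placeholders, not my real ones.)

```yaml
type: history-graph
title: Power meters
hours_to_show: 24
entities:
  - sensor.shelly_power_meter   # placeholder entity id
  - sensor.aqara_temperature    # placeholder entity id
```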

I only know similar behaviour from when nothing was recorded at all (from testing exclude/include), but I hadn't changed anything in the exclude settings for a couple of weeks…
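(By exclude/include I mean the usual recorder filters in configuration.yaml; a minimal sketch with placeholder patterns, not my actual filter list:)

```yaml
recorder:
  exclude:
    domains:
      - automation
    entity_globs:
      - sensor.*_linkquality   # placeholder pattern
```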

… and when I look at the history in the history view (not in the dashboard card), I can see all states correctly, so the recorder should be working correctly?
I restarted Home Assistant and then rebooted my RPi 4, but it didn't help.

Yesterday I thought it was related to the Shelly integration, because only the history graph cards with Shelly power meter readings had this problem, but NOT the temperature readings from Aqara (Xiaomi, Zigbee2MQTT)… but today all of them are showing the same issue, so all history graph cards show this behaviour!

Please note, I'm running Hassio on an RPi 4 with SSD, MariaDB and InfluxDB. Since starting with InfluxDB continuous queries, I have noticed the system lagging temporarily: the GUI in the browser and the Android app temporarily had problems loading pages, and I had to wait a minute before it worked as fast/slow as usual (I assume the system slows down while one or more CQs are running?). The history graphs usually show the biggest lag when loading, so the rest of the dashboard loads first… but I have never seen the behaviour shown in the following screenshots.
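(On the Home Assistant side the forwarding uses the standard influxdb integration in configuration.yaml, roughly like the sketch below; host, port and database name are from my setup, the include filter is only an example:)

```yaml
influxdb:
  host: 192.168.1.10
  port: 8086
  database: homeassistant
  include:
    domains:
      - sensor   # example filter: only forward sensor states
```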




I have also noticed a higher CPU load: idle is 4-6%, and it rises to 35% when the system is laggy, just from browsing through the dashboards… in the past it was about 15% max, so something has changed?

I can remember a couple of days in the past when I had the same issue, but after closing or reloading the GUI, a couple of minutes later the graphs loaded correctly… I had assumed that whatever loads the history states was temporarily pending. But now I can't get any graph to load correctly.

What could this be? And why is the graph card not able to show a history state, while the history page is?
Please help! Thank you in advance!
BR Frank

(screenshot)
Today from 12:15 to 12:35 I noticed a high CPU load (I wonder what caused it?) and briefly restarted Hassio, but it seems that after the restart at 12:41 the graphs lost their history states. Sometime between 19:00 and 20:00 I rebooted the system, but that didn't help…
I can't find anything unusual in the logs…

Now I shut down the system and unplugged the RPi for a minute before restarting…
It took 2-3 minutes to load the state history in the dashboard…

Now the last hours are shown, but not the rest of the day? What the heck?

The problem came back… at approx. 20:40.

I have some log errors in the Home Assistant Core log, but I assume they are from the reboot… they do talk about too high a CPU load, though…

Logger: homeassistant.components.websocket_api.http.connection
Source: components/websocket_api/http.py:186
Integration: Home Assistant WebSocket API (documentation, issues)
First occurred: 20:42:28 (1 occurrences)
Last logged: 20:42:28

[547967745376] HomeAdmin from 192.168.1.233 (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36): Client unable to keep up with pending messages. Stayed over 1024 for 5 seconds. The system's load is too high or an integration is misbehaving
Logger: homeassistant.helpers.frame
Source: helpers/frame.py:77
First occurred: June 4, 2023 at 20:34:41 (1 occurrences)
Last logged: June 4, 2023 at 20:34:41

Detected integration that called async_setup_platforms instead of awaiting async_forward_entry_setups; this will fail in version 2023.3. Please report issue to the custom integration author for hacs using this method at custom_components/hacs/__init__.py, line 171: hass.config_entries.async_setup_platforms(
Logger: homeassistant.setup
Source: runner.py:179
First occurred: 20:34:38 (2 occurrences)
Last logged: 20:34:38

Setup of input_boolean is taking over 10 seconds.
Setup of input_number is taking over 10 seconds.
Logger: homeassistant.components.sensor
Source: helpers/entity_platform.py:236
Integration: Sensor (documentation, issues)
First occurred: 20:34:26 (1 occurrences)
Last logged: 20:34:26

Platform influxdb not ready yet: Cannot connect to InfluxDB due to 'HTTPConnectionPool(host='192.168.1.10', port=8086): Max retries exceeded with url: /query?q=SHOW+DATABASES%3B (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f91a3bb50>: Failed to establish a new connection: [Errno 111] Connection refused'))'. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible.; Retrying in background in 30 seconds
Logger: homeassistant.components.influxdb.sensor
Source: components/influxdb/sensor.py:167
Integration: InfluxDB (documentation, issues)
First occurred: 20:34:26 (1 occurrences)
Last logged: 20:34:26

Cannot connect to InfluxDB due to 'HTTPConnectionPool(host='192.168.1.10', port=8086): Max retries exceeded with url: /query?q=SHOW+DATABASES%3B (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f91a3bb50>: Failed to establish a new connection: [Errno 111] Connection refused'))'. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible.

One message during boot time… so not a problem?

Logger: homeassistant.components.influxdb
Source: components/influxdb/__init__.py:488
Integration: InfluxDB (documentation, issues)
First occurred: 20:34:25 (1 occurrences)
Last logged: 20:34:25

Cannot connect to InfluxDB due to 'HTTPConnectionPool(host='192.168.1.10', port=8086): Max retries exceeded with url: /write?db=homeassistant (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f91976fb0>: Failed to establish a new connection: [Errno 111] Connection refused'))'. Please check that the provided connection details (host, port, etc.) are correct and that your InfluxDB server is running and accessible. Retrying in 60 seconds.

InfluxDB log:
Nothing suspicious…

[20:34:00] INFO: Kapacitor is waiting until InfluxDB is available...
[20:34:00] INFO: Chronograf is waiting until InfluxDB is available...
[20:34:01] INFO: Starting the InfluxDB...
[20:34:36] INFO: Starting the Kapacitor
[20:34:36] INFO: Starting Chronograf...
'##:::'##::::'###::::'########:::::'###:::::'######::'####:'########::'#######::'########::
 ##::'##::::'## ##::: ##.... ##:::'## ##:::'##... ##:. ##::... ##..::'##.... ##: ##.... ##:
 ##:'##::::'##:. ##:: ##:::: ##::'##:. ##:: ##:::..::: ##::::: ##:::: ##:::: ##: ##:::: ##:
 #####::::'##:::. ##: ########::'##:::. ##: ##:::::::: ##::::: ##:::: ##:::: ##: ########::
 ##. ##::: #########: ##.....::: #########: ##:::::::: ##::::: ##:::: ##:::: ##: ##.. ##:::
 ##:. ##:: ##.... ##: ##:::::::: ##.... ##: ##::: ##:: ##::::: ##:::: ##:::: ##: ##::. ##::
 ##::. ##: ##:::: ##: ##:::::::: ##:::: ##:. ######::'####:::: ##::::. #######:: ##:::. ##:
..::::..::..:::::..::..:::::::::..:::::..:::......:::....:::::..::::::.......:::..:::::..::
2023/06/04 20:34:37 Using configuration at: /etc/kapacitor/kapacitor.conf
time="2023-06-04T20:34:51+02:00" level=info msg="Reporting usage stats" component=usage freq=24h reporting_addr="https://usage.influxdata.com" stats="os,arch,version,cluster_id,uptime"
time="2023-06-04T20:34:51+02:00" level=info msg="Serving chronograf at http://127.0.0.1:8889" component=server
[20:34:52] INFO: Starting NGINX...

The Grafana log shows something strange:
What are these cleanup jobs every 10 minutes?

[20:34:07] INFO: Starting NGINX...
logger=cleanup t=2023-06-04T20:44:07.489215399+02:00 level=info msg="Completed cleanup jobs" duration=35.760273ms
logger=cleanup t=2023-06-04T20:54:07.478318063+02:00 level=info msg="Completed cleanup jobs" duration=25.019067ms
logger=cleanup t=2023-06-04T21:04:07.477116818+02:00 level=info msg="Completed cleanup jobs" duration=24.056814ms
logger=cleanup t=2023-06-04T21:14:07.489102729+02:00 level=info msg="Completed cleanup jobs" duration=35.19953ms
logger=cleanup t=2023-06-04T21:24:07.477455368+02:00 level=info msg="Completed cleanup jobs" duration=24.192132ms
logger=infra.usagestats t=2023-06-04T21:33:54.34898561+02:00 level=info msg="Sent usage stats" duration=348.349753ms
logger=cleanup t=2023-06-04T21:34:07.477463983+02:00 level=info msg="Completed cleanup jobs" duration=24.230492ms
logger=cleanup t=2023-06-04T21:44:07.476908322+02:00 level=info msg="Completed cleanup jobs" duration=23.659509ms
logger=cleanup t=2023-06-04T21:54:07.476606348+02:00 level=info msg="Completed cleanup jobs" duration=23.423496ms
logger=cleanup t=2023-06-04T22:04:07.479937705+02:00 level=info msg="Completed cleanup jobs" duration=25.462915ms
logger=cleanup t=2023-06-04T22:14:07.47801981+02:00 level=info msg="Completed cleanup jobs" duration=24.386623ms
logger=cleanup t=2023-06-04T22:24:07.478458198+02:00 level=info msg="Completed cleanup jobs" duration=24.58706ms
logger=cleanup t=2023-06-04T22:34:07.477420155+02:00 level=info msg="Completed cleanup jobs" duration=23.463777ms
logger=cleanup t=2023-06-04T22:44:07.477111018+02:00 level=info msg="Completed cleanup jobs" duration=23.845513ms
logger=cleanup t=2023-06-04T22:54:07.477017885+02:00 level=info msg="Completed cleanup jobs" duration=23.756508ms
logger=cleanup t=2023-06-04T23:04:07.488555618+02:00 level=info msg="Completed cleanup jobs" duration=34.558037ms
logger=cleanup t=2023-06-04T23:14:07.478016701+02:00 level=info msg="Completed cleanup jobs" duration=24.010431ms
logger=cleanup t=2023-06-04T23:24:07.477749215+02:00 level=info msg="Completed cleanup jobs" duration=23.848722ms
logger=cleanup t=2023-06-04T23:34:07.477385192+02:00 level=info msg="Completed cleanup jobs" duration=23.645241ms
logger=cleanup t=2023-06-04T23:44:07.487810807+02:00 level=info msg="Completed cleanup jobs" duration=34.757416ms
logger=cleanup t=2023-06-04T23:54:07.477170247+02:00 level=info msg="Completed cleanup jobs" duration=24.046991ms

MariaDB log seems OK:

phpmyadmin.pma__userconfig                         OK
phpmyadmin.pma__usergroups                         OK
phpmyadmin.pma__users                              OK
sys.sys_config                                     OK
[20:33:47] INFO: Ensuring internal database upgrades are performed
[20:33:47] INFO: Ensure databases exists
[20:33:48] INFO: Create database homeassistant
[20:33:48] INFO: Ensure users exists and are updated
[20:33:48] INFO: Update user hassio_mariadbuser
[20:33:48] INFO: Init/Update rights
[20:33:49] INFO: Granting all privileges to hassio_mariadbuser on homeassistant
[20:33:49] INFO: Successfully send service information to Home Assistant.

Mosquitto broker log:
Is it normal that connections are closed and reconnected every 2 minutes?

(screenshot: Mosquitto broker log)

File Editor log:
Is it normal that the File Editor logs this info every couple of seconds, even when not in use?

I'd try upgrading to 2023.5.x - there has been a lot of rework and change in the recorder - it may fix your issue.

I have updated Home Assistant (but not yet Home Assistant Core) and now it seems much better…

After a reboot it shows the full history in the cards instantly… and it seems much faster… I will keep an eye on it for a couple of days.

@PeteRage thanks for the hint. I don't understand the difference between HA and HA Core, but the rework on the recorder shouldn't land on my system until I update Core, should it?

Now I have severe problems… the MQTT broker add-on can't start, Zigbee2MQTT can't start, I can't access the system/backup and system add-on pages (they stay black, showing just the title), and the CPU load has been a constant 35% for hours (since yesterday 23:00).
Hassio is incredibly fast, but maybe because Zigbee2MQTT and Mosquitto are not running…?

Then I tried to update HA Core, Zigbee2MQTT, Mosquitto and MariaDB as well, but it hasn't gotten better…

Is there any possibility to see an overview of running tasks? Something like a task manager or process manager?
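(I guess something like the built-in profiler integration might help here? A minimal sketch, assuming it only needs to be enabled in configuration.yaml and then started via a service call:)

```yaml
# configuration.yaml
profiler:
```

```yaml
# Developer Tools -> Services (YAML mode)
service: profiler.start
data:
  seconds: 60   # profile for 60 seconds; the result file is written to the config folder
```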

Depending on the size of your DB, it can take a long time for the migration to complete; during this time there is increased CPU usage, which could be causing your issues. Search around in this community for those threads. It took my system 14 hours to complete the upgrade.

How can I see when the DB conversion is finished after updating Core?
I had seen some log entries, and it seemed to be finished by the morning (or even last night), but CPU usage is still high all day long…?
I had some restarts due to other updates in the meantime… which probably isn't good while the DB is being reorganized?

But today, shortly before lunch, I still had severe problems: some add-ons wouldn't start (Mosquitto, Zigbee2MQTT etc…) even after updating them, and some HA pages couldn't be opened (add-on page, HACS page, some system pages).
I had HACS errors or infos in the logs, but HACS was up to date? I realized later that HACS add-ons have their own update page… 4 were pending… after updating them, the CPU load instantly fell, so I assume this caused some trouble too?
But the CPU load doesn't fall back to the old idle value of about 6-8%. Now the "idle" CPU usage is about 15-20% all day long, with peaks up to 35% every few minutes…

After I successfully started Mosquitto and Zigbee2MQTT (which was only possible after the HACS + HACS add-on updates shortly after lunch), and after 1 hour of reconnecting all sensors, the system is now as fast/slow as before, judging by the loading times of the GUI pages… but this laggy loading of the dashboard doesn't show up in the CPU load? Without MQTT everything loaded quite fast, but the CPU load was still high, so these integrations probably slow everything down as usual (as before the update), but something has changed in general… could it be that the new Core and HA are more CPU-heavy?

(screenshot)

@PeteRage you were right! After over 13 hours the CPU usage has calmed down to 4-8%, so I think the DB conversion was still in progress… but I would wish for some kind of progress bar or status in the logs…

After the last update(s), overall performance now seems quite good, better than before… or similar to a couple of weeks ago… So at the moment I'm sure the HA and HA Core updates some weeks ago led to the performance problems, since I had not updated HACS and the HACS add-ons, which I was not aware of. This got worse and worse week by week and peaked yesterday…
Thank you so far, I have learned something more about HA.
BR Frank

Looks like I am facing this very issue now:

A reboot seems to help, but all graphs get stuck like this after some hours.
The forwarding to InfluxDB seems unaffected; I can see all values in Grafana.

I would say this has been happening since upgrading to version 2024.7.

I don't see anything out of the ordinary in the logs, especially not around the time the graphs start to flatten out.

Did or does anybody have a similar problem?

Read this PSA: 2024.7 recorder problems


Hello there

I'm also facing the same issues.

In addition, I noticed the RPi 5 memory usage is almost 100% :open_mouth:
Normal usage stays around 10%.

Core: 2024.1.5
Supervisor: 2024.08.0
Operating System: 12.0
Frontend: 20240104.