How to monitor a single component

As you can see in the screenshot below, something on my HA instance is running wild, eventually crashing it. After a restart it runs stably for a few days. Right now I can see the first symptoms of it heating up again.

It’s an RPi4 8 GB w/SSD running HAOS 6.x and HA 2024.8, with a number of add-ons as well as HACS-installed extensions.

Primarily I would be happy with per-add-on (Docker container) CPU usage. Glances doesn’t expose that information as sensors, AFAIK.

Thank you in advance

Do you use the CPU Speed integration?

If so, try removing it.

Thank you for attempting to help. Appreciate that.
No, I don’t use the CPU Speed integration.

Here is a list of the integrations and add-ons installed.

BTW, the issues started during the last few months, after updating to 2024.7 or 2024.8.
The system was rock stable for about 4 years.


Check out this post, it will likely help you with everything you need to find the issue.

Thank you
I will try that.

But it doesn’t address add-ons (Docker-based). Is there any easy-to-use tool to monitor Docker container performance? With sensors exposing their CPU usage, I could correlate it with total CPU usage.

Glances shows CPU usage per container, but I cannot find this information published to sensors (or their attributes).
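One workaround I’m considering in the meantime: running `docker stats` myself (e.g. from the SSH add-on, which needs protection mode disabled to reach the Docker socket) and parsing the output. A minimal sketch, assuming the `--format '{{.Name}} {{.CPUPerc}}'` output; the helper names are my own:

```python
import subprocess


def parse_stats(text):
    """Parse `docker stats --format '{{.Name}} {{.CPUPerc}}'` output
    into a {container_name: cpu_percent} dict."""
    usage = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, cpu = line.rsplit(None, 1)
        usage[name] = float(cpu.rstrip("%"))
    return usage


def addon_cpu_usage():
    """Return per-container CPU usage, assuming `docker` is reachable.

    Container names are whatever Supervisor assigned
    (homeassistant, addon_core_mosquitto, ...).
    """
    out = subprocess.run(
        ["docker", "stats", "--no-stream",
         "--format", "{{.Name}} {{.CPUPerc}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats(out)
```

The parsed dict could then be fed into HA via a command_line sensor or MQTT, but that part depends on your setup.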

I think I’ve found the component causing the load. It’s a node process. Is that Node.js?
How can I find where it comes from?
It’s shown by Glances but invisible in the HA terminal, so I assume it’s running at the OS level?

A Home Assistant restart didn’t affect the load of the node process.

The only component I found referencing Node.js is HACS 2.0. But then I’d assume an HA restart should improve the situation.

Node-RED uses Node.js, and it looks like it’s your primary automation engine. A runaway automation? I’ve seen that happen before. Although that one is also in your list.

Studio Code Server has had issues with its extensions. I have it installed but keep it off at all times; I only use it when I’m away from home.

Thank you for your answer

On Saturday I reinstalled my HA from scratch, updating the OS to 13.1 and then restoring its config from a backup. I found that the average CPU temperature dropped by 10 degrees. So far so good.

But the system crashed again this morning. This time it seemed to be caused by running out of memory.
The interesting fact is that only Home Assistant and ha_observer were restarted; their restart caused memory usage to go back to normal.
Since then it has been growing again.

That doesn’t mean other components aren’t responsible for it (e.g. the Node-RED you mentioned).

The screenshot below shows two events:

  • Saturday’s CPU overload/overheating, followed by the system reinstall
  • today’s out-of-memory event, followed by an HA restart.

At first glance, it looks like different root causes.
Previously I wrote that it was caused by Node.js, which is used by Z2M AFAIK; I found no other components using it. But this time it looks like a memory leak.

TBH I would be happy to find some add-on causing that. But if these problems are triggered by the HA process itself, it will be extremely hard to find the single component responsible. I can remove integrations from the system one by one, checking whether the situation improves.
Maybe debugging tools would help, but I’m not familiar with them, and I hope this way of experimenting is less time-consuming.

bdraco’s post that I linked above should help you find offending Python packages. That would point to integrations (if the issue is in core).


OK,
the first thing I tried was checking templates.
Guided by the link you provided, I ran profiler.dump_log_objects. It displayed a lot of messages like:

2024-09-30 19:51:34.803 CRITICAL (SyncWorker_58) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({{ (states('sensor.wattsonic_home_consumption_now')|float(0)*1000)|round(0) }}) renders=15134> all_states=False all_states_lifecycle=False domains=frozenset() domains_lifecycle=frozenset() entities=frozenset({'sensor.wattsonic_home_consumption_now'}) rate_limit=None has_time=False exception=None is_static=False>
2024-09-30 19:51:34.803 CRITICAL (SyncWorker_58) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({{ (states('sensor.wattsonic_battery_p')|float(0)*1000)|round(0) }}) renders=5158> all_states=False all_states_lifecycle=False domains=frozenset() domains_lifecycle=frozenset() entities=frozenset({'sensor.wattsonic_battery_p'}) rate_limit=None has_time=False exception=None is_static=False>
2024-09-30 19:51:34.803 CRITICAL (SyncWorker_58) [homeassistant.components.profiler] RenderInfo object in memory: <RenderInfo Template<template=({{ ((states('sensor.shelly_3em_phase_1_power')|float + states('sensor.shelly_3em_phase_2_power')|float + states('sensor.shelly_3em_phase_3_power')|float + states('sensor.pg_cube_power')|float)) |round(2) }}) renders=20170> all_states=False all_states_lifecycle=False domains=frozenset() domains_lifecycle=frozenset() entities=frozenset({'sensor.shelly_3em_phase_3_power', 'sensor.pg_cube_power', 'sensor.shelly_3em_phase_1_power', 'sensor.shelly_3em_phase_2_power'}) rate_limit=None has_time=False exception=None is_static=False>

These have a reasonably high number of renders. But isn’t that expected, considering they transform frequently changing real-time data?

  • The shelly_3em_phase_1_power template has existed in my system for several years.
  • The wattsonic* ones, since August.
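For anyone following along, the calls I used (from Developer Tools → Services, following the linked post) were roughly these; as far as I can tell these are the stock Profiler integration services, but double-check the names against your HA version:

```yaml
# Start periodically logging object growth ("Memory Growth" lines).
service: profiler.start_log_objects
data:
  scan_interval: 30
---
# Dump every object of a given type currently held in memory
# (writes CRITICAL lines like the ones above to the HA log).
service: profiler.dump_log_objects
data:
  type: RenderInfo
---
# Stop the periodic logging again.
service: profiler.stop_log_objects
```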

For those using HAOS/Supervisor instead of a Docker install: the Home Assistant Supervisor integration has (disabled by default) entities for the CPU and memory of all add-ons. If you enable them, you can monitor them. While creating the screenshot, I noticed Studio Code Server taking way more CPU than usual :slight_smile:
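Once enabled, those sensors can drive a normal automation. A sketch of what that could look like; the entity id is a guess on my part (check the names on your own Supervisor device page), not something the integration guarantees:

```yaml
automation:
  - alias: "Warn when an add-on hogs the CPU"
    trigger:
      - platform: numeric_state
        # Hypothetical entity id; yours may differ.
        entity_id: sensor.studio_code_server_cpu_percent
        above: 50
        for: "00:10:00"
    action:
      - service: persistent_notification.create
        data:
          title: "Add-on CPU alert"
          message: "Studio Code Server has been above 50% CPU for 10 minutes."
```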

Thank you.
Didn’t know about that. Very useful.

I’m not sure how I should decode the Memory Growth log records:


2024-09-30 22:02:44.403 CRITICAL (SyncWorker_14) [homeassistant.components.profiler] Memory Growth: [('dict', 2761773, 1516), ('Context', 831349, 306), ('Event', 451234, 235), ('State', 452525, 235), ('coroutine', 396, 4), ('builtin_function_or_method', 8789, 2), ('Task', 96, 1), ('Future', 129, 1), ('FutureIter', 79, 1)]
2024-09-30 22:03:13.889 CRITICAL (SyncWorker_26) [homeassistant.components.profiler] Memory Growth: [('dict', 2763092, 1319), ('Context', 831844, 495), ('State', 452787, 262), ('Event', 451495, 261)]

The third number in each 3-tuple is the delta of the increase. I assume a small number is OK and a high number might indicate a problem. But what indicates a real problem?
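As I understand it (I believe these lines come from the objgraph library that the Profiler integration uses), each entry is `(type_name, total_objects, growth_since_last_scan)`. A single large delta is often just churn; the suspicious pattern is a type that grows on every scan and never shrinks. A small helper to pick those out, with the two log lines above as sample data; the threshold is my own guess, not an official rule:

```python
def persistent_growers(scans, min_delta=100):
    """Given a list of scans, each a list of (name, total, delta) tuples,
    return the type names whose delta exceeded min_delta in EVERY scan.
    Those are the memory-leak candidates."""
    candidates = None
    for scan in scans:
        grew = {name for name, _total, delta in scan if delta >= min_delta}
        candidates = grew if candidates is None else candidates & grew
    return sorted(candidates or set())


# The two Memory Growth lines from the log above, as Python data:
scans = [
    [("dict", 2761773, 1516), ("Context", 831349, 306),
     ("Event", 451234, 235), ("State", 452525, 235), ("coroutine", 396, 4)],
    [("dict", 2763092, 1319), ("Context", 831844, 495),
     ("State", 452787, 262), ("Event", 451495, 261)],
]
print(persistent_growers(scans))  # ['Context', 'Event', 'State', 'dict']
```

Growing Context/Event/State counts line up with the recorder/state machine holding on to history, which fits a leak somewhere in an integration that keeps references to states.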

BTW, the memory consumed by the HA container is still growing, tracking the total-used-memory graph very accurately.

Because I was not able to find anything helpful while HA’s memory usage went critical, I disabled/removed a lot of integrations and add-ons.
I left the Modbus integration (the one I added in August, when the problems started to happen).

Removed Custom Components:

  • browser_mod
  • circadian_lighting
  • cz_energy_spot_prices
  • dreame_vacuum
  • fordpass
  • hacs
  • ltss
  • monitor_docker
  • powercalc
  • rpi_power

Disabled add-ons:

  • Samba Share
  • NodeRed :frowning:
  • Grafana
  • SQLite Web
  • Zigbee2MQTT :frowning:
  • SVC

Remaining add-ons:

  • SSH
  • FileEditor
  • Mosquitto Broker
  • Log Viewer
  • Unifi Network Controller
  • Home Assistant Google Drive Backup
  • Glances

I will monitor the state for a day or two; then, if there are no issues, I will start adding extensions back one by one.

After 2 days, HA has proven stable. Since the start, its memory usage slowly went from 6 up to 8%, slightly oscillating around that value.

Yesterday I added Zigbee2MQTT, Node-RED and HACS back to the party.

The pink line at about 10% is the Unifi Controller.
The pink one at about 4% is Node-RED.
Zigbee2MQTT sits at about 1.25%.

I bet this setup will be stable too.

I have a first candidate: ltss.
This is how the memory usage of HA looks after 2 days:

I will do one more round, first without and then with ltss, before I report it.

So it’s almost certain that the LTSS custom component causes the problem.
I found this LTSS record in the HA logs:

ValueError: A string literal cannot contain NUL (0x00) characters.

corresponding with the time at which memory usage starts to grow. It might happen at any time, but right now it happens at component/HA start.

I don’t know where this “broken” data is read from (API, recorder, SQLite). It’s not from the destination database, since it doesn’t even attempt to connect to Postgres.
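Not a fix for the leak itself, but for anyone hitting the same ValueError: Postgres text columns reject NUL (0x00) bytes, so any state or attribute string has to be sanitized before insert. A minimal sketch of the kind of cleanup a workaround patch would need; the function is mine, not part of ltss:

```python
def strip_nul(value):
    """Recursively remove NUL (0x00) characters from strings, which
    Postgres rejects with 'A string literal cannot contain NUL'.
    Handles nested dicts/lists such as state attributes."""
    if isinstance(value, str):
        return value.replace("\x00", "")
    if isinstance(value, dict):
        return {strip_nul(k): strip_nul(v) for k, v in value.items()}
    if isinstance(value, list):
        return [strip_nul(v) for v in value]
    return value  # numbers, None, etc. pass through unchanged
```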

Obviously, without this component, the memory leak is not observed.

I’ve reported the issue: Innit error and possible memory leak · Issue #213 · freol35241/ltss · GitHub


This may be worth writing up as an issue against core, so that that version of LTSS gets blocked.

How should I understand that? Is core somehow responsible for custom components? Or is it desirable to block components known for serious issues?

On top of that, I can’t say since which ltss version the issue has been present. It might go back more versions.

Of course, I’m ready to file the issue, but I need to understand that dependency first.

If a custom integration takes down Home Assistant, it’s typically added to the block list so that it can no longer do so.
