Hi, for part of yesterday Home Assistant was nearly unresponsive. I ran top and python3 -m homeassistant --config /config was pegged at 100%. Rebooting didn’t help. First time I’ve had to use safe mode. I did run the profiler before I put it in safe mode; however, I’d never figured out how to use Snakeviz in the past, so it was more a “following the forum guidance” kind of thing. In safe mode my CPU via system monitor dropped to ~13%.
After a few hours I switched off safe mode and it’s been ok since. CPU has been 21-33%.
Great, it’s relatively working, but the trouble is I didn’t actually identify the problem. My “attempts” on the screenshot above are just switching from Safe Mode. The first time, when I checked top, the same python3 -m homeassistant --config /config process had high CPU and I didn’t have time to troubleshoot, so I just turned Safe Mode on again.
Getting Snakeviz to work (here are the questions)
Later I figured out how to install and use Snakeviz with WSL. Nothing made sense, since I was only seeing that base_events.py was taking ~59/60 seconds. That seemed like a lot, but the chart didn’t look anything like what I was seeing in the screenshots. After tinkering I finally asked an LLM, which suggested “Something is blocking the event loop” and asked me to share my top 20 functions, which I did.
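In case it helps anyone else who gets stuck on Snakeviz: a rough sketch of how a top-20 list can be pulled out with Python's built-in pstats module. This assumes the profiler's .cprof file is an ordinary cProfile dump; `top_functions` is just a helper name I made up, and the path you pass in is whatever file the profiler wrote for you.

```python
import io
import pstats


def top_functions(profile_path: str, sort_key: str = "cumulative", limit: int = 20) -> str:
    """Return the first `limit` rows of a cProfile dump, sorted by `sort_key`.

    `sort_key` is any key pstats accepts, e.g. "cumulative" (time spent in a
    function including everything it calls) or "ncalls" (number of calls).
    """
    buffer = io.StringIO()
    stats = pstats.Stats(profile_path, stream=buffer)
    # strip_dirs() shortens file paths so the table is easier to paste in a post.
    stats.strip_dirs().sort_stats(sort_key).print_stats(limit)
    return buffer.getvalue()
```

Calling it once with "cumulative" and once with "ncalls" gives the two views: where the time went, and what was being called the most.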
The LLM initially suggested that MQTT was the culprit, but after the ncalls sort it pointed more towards template loops. Incidentally, I did recently add a couple of templates as helpers. So, the point of this whole big post is to ask whether any of you Jinja/Python gurus would take a look and tell me if anything else stands out as a problem, AND look at the templates that I’ve identified as likely problematic and, if you’re willing, give me some guidance on improving them.
Template 1
This is the first template I showed it, not the worst (IMO), but the LLM said it was a problem. It’s being used as a binary_sensor helper. The purpose of this is to trigger an automation when the Traccar location goes stale. I’m not positive, but I believe this one was human made with the help of someone on the forum.
```
{{ (now() - states.device_tracker.rosa_traccar.last_updated).total_seconds() > 1200 }}
```
Template 2
The goal of this one was to avoid having to manually add new leak sensors, but have one binary_sensor that I could use to trigger notifications if any leak sensor failed to report every day. This was definitely LLM generated.
```
{% set threshold = now() - timedelta(hours=26) %}
{% set ns = namespace(stale=false) %}
{% for entity in states.sensor
   if entity.entity_id.startswith('sensor.leak_sensor_')
   and entity.entity_id.endswith('_last_seen') %}
  {% set last_seen = as_datetime(entity.state) %}
  {% if last_seen and last_seen < threshold %}
    {% set ns.stale = true %}
  {% endif %}
{% endfor %}
{{ ns.stale }}
```
Template 3
And finally, similar to #2, this was supposed to simplify some automations by allowing me to label an OTP sensor with a clearance level, so that when an automation runs to check a submitted OTP code, it checks whether the code matches a labeled OTP sensor. Again, LLM generated.
```
{% set target_labels = ["Security 1 - All Access", "Security 4 - Garage Only"] %}
{% set ns = namespace(allowed=[]) %}
{% for lbl in target_labels %}
  {% set ns.allowed = ns.allowed + label_entities(lbl) %}
{% endfor %}
{% set sensors = states.sensor
  | selectattr('entity_id', 'search', 's_otp_sensor')
  | selectattr('entity_id', 'in', ns.allowed)
  | rejectattr('state', 'in', ['unavailable','unknown'])
  | map(attribute='entity_id')
  | list %}
{{ sensors | join('\n') }}
```
A reason I point out that two of these were LLM generated is to document examples of where they fall short in identifying problematic (presumably) situations (creating this loop that basically crashed HA).
So, are these example templates problematic? Would anyone be willing to offer ideas for fixing the problem? An idea for fixing the 2nd sensor is to use a time_pattern trigger:
```
template:
  - trigger:
      - platform: time_pattern
        hour: "/1"
    sensor:
      - name: "Leak Sensors Overdue"
        state: >
          ............................ etc
```
What about the others? Maybe I could move Template 3 into the automation itself so it’s not evaluated so frequently, but only as needed? More efficient, but less observable.
I have a second sensor, very similar to #3, that makes a list of the leak sensors that didn’t report; the automation parses that list into the notification. I suppose that could be moved to the automation too. Or should I change the template entirely?
Oh, and why might the problem have gone away despite the helpers still being around? It’s been a week or two since I created those helpers that are (presumably) the problem.
I may be answering some of my own questions, but I’d sure be grateful for any input you all can share on improving my approach to these problems.


