High CPU - School me on Jinja Template Loops, and woes of LLM generated templates

Hi, for part of yesterday Home Assistant was nearly non-responsive. I ran `top` and `python3 -m homeassistant --config /config` was pegged at 100% :open_mouth:. Rebooting didn’t help, and it was the first time I’ve ever had to use Safe Mode. I did run the profiler before putting it in Safe Mode, however I’d never figured out how to use Snakeviz in the past, so it was more a “following the forum guidance” kind of thing. In Safe Mode my CPU (via System Monitor) dropped to ~13%.

After a few hours I switched off safe mode and it’s been ok since. CPU has been 21-33%.

Great, it’s relatively working, but the trouble is I didn’t actually identify the problem. My “attempts” in the screenshot above are just switches out of Safe Mode. The first time I checked `top`, the same `python3 -m homeassistant --config /config` process had high CPU, and I didn’t have time to troubleshoot, so I just turned Safe Mode back on.

Getting Snakeviz to work (here are the questions)

Later I figured out how to install and use Snakeviz with WSL. Nothing made sense, since all I was seeing was that base_events.py was taking ~59 of 60 seconds. That seemed like a lot, but the chart didn’t look anything like what I was seeing in the screenshots. After tinkering I finally asked an LLM, which suggested “Something is blocking the event loop” and asked me to share my top 20 functions, which I did.

Snakeviz screenshots

Top 20 sorted by cumtime

Top 20 sorted by ncalls

The LLM suggested initially that MQTT was the culprit, but after the ncalls sort it pointed more towards template loops. Incidentally, I did recently add a couple of template helpers. So, the bulk of this whole big post is to ask whether any of you Jinja/Python gurus would take a look at this and tell me if anything else stands out as a problem, AND look at the templates I’ve identified as likely problematic and, if you’re willing, give me some guidance on improving them.

Template 1

This is the first template I showed it. It’s not the worst (IMO), but the LLM said it was a problem. It’s being used as a binary_sensor helper; the purpose is to trigger an automation when the Traccar location goes stale. I’m not positive, but I believe this one was human-made with the help of someone on the forum.

```
{{ (now() - states.device_tracker.rosa_traccar.last_updated).total_seconds() > 1200 }}
```
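One side note: this template will throw an error (rather than evaluate to false) whenever the tracker entity doesn’t exist yet, e.g. briefly after a restart. A slightly more defensive variant might look like this — same 1200-second threshold; the None guard is my addition, not something from the original:

```
{% set tracker = states.device_tracker.rosa_traccar %}
{{ tracker is not none
   and (now() - tracker.last_updated).total_seconds() > 1200 }}
```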
Template 2

The goal of this one was to avoid having to manually add new leak sensors, but have one binary_sensor that I could use to trigger notifications if any leak sensor failed to report every day. This was definitely LLM generated.

```
{% set threshold = now() - timedelta(hours=26) %}
{% set ns = namespace(stale=false) %}

{% for entity in states.sensor
     if entity.entity_id.startswith('sensor.leak_sensor_')
     and entity.entity_id.endswith('_last_seen') %}
  {% set last_seen = as_datetime(entity.state) %}
  {% if last_seen and last_seen < threshold %}
    {% set ns.stale = true %}
  {% endif %}
{% endfor %}

{{ ns.stale }}
```
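For comparison, I believe the loop above could also be written as a single filter chain with no namespace bookkeeping. This is just a sketch of the same 26-hour check (like the loop, it assumes the `_last_seen` states parse with `as_datetime`; invalid states are dropped by the bare `select()`):

```
{% set threshold = now() - timedelta(hours=26) %}
{{ states.sensor
   | selectattr('entity_id', 'match', 'sensor\.leak_sensor_.*_last_seen$')
   | map(attribute='state') | map('as_datetime') | select()
   | select('lt', threshold)
   | list | count > 0 }}
```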
Template 3

And finally, similar to #2, this one was supposed to simplify some automations by letting me label an OTP sensor with a clearance level, so that when an automation runs to check a submitted OTP code, it checks whether the code matches a labeled OTP sensor. Again, LLM generated.

```
{% set target_labels = ["Security 1 - All Access", "Security 4 - Garage Only"] %}
{% set ns = namespace(allowed=[]) %}

{% for lbl in target_labels %}
  {% set ns.allowed = ns.allowed + label_entities(lbl) %}
{% endfor %}

{% set sensors = states.sensor
   | selectattr('entity_id', 'search', 's_otp_sensor')
   | selectattr('entity_id', 'in', ns.allowed)
   | rejectattr('state', 'in', ['unavailable','unknown'])
   | map(attribute='entity_id')
   | list %}
{{ sensors | join('\n') }}
```

A reason I point out that two of these were LLM generated is to document examples of where LLMs fall short in identifying (presumably) problematic situations — in this case, creating a loop that basically crashed HA without flagging the risk.

So, are these example templates problematic? Would anyone be willing to offer ideas for fixing the problem? One idea for fixing the 2nd sensor is to use a time_pattern trigger:
```yaml
template:
  - trigger:
      - platform: time_pattern
        hours: "/1"
    sensor:
      - name: "Leak Sensors Overdue"
        state: >
          ............................ etc
```

What about the others? Maybe I could move Template 3 into the automation itself so it’s not evaluated so frequently, but only as needed? More efficient, but less observable.
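To make the “move it into the automation” idea concrete, the label lookup from Template 3 could be computed in a `variables:` block, so it’s only evaluated when the automation actually runs rather than continuously. Everything below is a hedged sketch — the trigger, event names, and the code-comparison condition are placeholders I made up, not my actual config:

```yaml
automation:
  - alias: "Validate submitted OTP"
    trigger:
      # hypothetical trigger; replace with whatever fires on OTP entry
      - platform: event
        event_type: otp_submitted
    variables:
      # evaluated once per run, not continuously like a helper
      allowed_sensors: >
        {{ ['Security 1 - All Access', 'Security 4 - Garage Only']
           | map('label_entities') | flatten
           | select('search', 's_otp_sensor') | list }}
    condition:
      - condition: template
        value_template: >
          {{ allowed_sensors | map('states')
             | select('eq', trigger.event.data.code)
             | list | count > 0 }}
    action: []
```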

I have a second sensor, very similar to #3, that makes a list of the leak sensors that didn’t report; its output is parsed in the automation and inserted into the notification. I suppose that could be moved to the automation too. Or should I change the template entirely?

Oh, and why might the problem have gone away despite the helpers still being around? It’s been a week or two since I created those helpers that are (presumably) the problem.

I may be answering some of my own questions, but I’d sure be grateful for any input you all can share on improving my approach to these problems.

The first two include now(), so they are rate-limited to once a minute and should not be an issue. If you want to limit their rate further by adding time_pattern triggers, that would be fine.

The third uses states.sensor, which limits its refresh to once per second, so it shouldn’t really be an issue either. But if you have a lot of sensor entities that it’s iterating through and you want to make it a bit more efficient, you could rewrite the template without states.sensor. You would need to either move it into the automation as you described, or figure out an appropriate trigger for your use case if you want to keep it external:

```
{% set target_labels = ["Security 1 - All Access", "Security 4 - Garage Only"] %}
{{ target_labels | map('label_entities') | flatten
   | select('match','sensor') | select('search', 's_otp_sensor')
   | select('has_value') | list | join('\n') }}
```

I make it a point not to debug LLM-generated code, so I didn’t look at it. But are you using the Studio Code add-on? It can eat all your memory and CPU all the same. If so, try restarting the add-on.

Thanks for the suggestion and info on update frequency.

It sounds, though, like these templates aren’t necessarily the single cause? I haven’t changed them yet, and for the most part I haven’t seen sustained high CPU since.

@Edwin_D I totally get that. Definitely some woes and issues; however, it’s made some advanced templates and ESPHome projects much more accessible. But as mentioned in the OP, it’s not comparable to a competent human.

VSCode has been an issue in the past, so I quit leaving it running. I switched to using it remotely, but I thought recent updates were supposed to fix the CPU issue. Since my remote is working fully with the new extension updates I haven’t tested the add-on.

Does anything stand out on the profile to either of you?