Heads up! Upcoming breaking change in the Template integration

I’m not saying it was the templates. I actually don’t think I have any of the offending templates. I’m saying it was simply the update that caused my CPU load to increase.

So what others in the thread are seeing might be from the update in general instead of the templates.

Just an FYI for anyone seeing the same thing.

Yeah, that’s what I’m trying to figure out. I didn’t see any bump but others are seeing a huge one. I also don’t use automations or very many templates. When I do use templates, I make sure they all work off entity state changes. I might have to put my CPU data into another software package. Maybe I do have a bump.

Prior to 0.115 it worked like this:

  • Home Assistant inspects the value_template option, identifies entities, and assigns listeners to them. If it can’t find any entities, your Template Sensor is evaluated at startup and never again (until the next restart).

  • If you also include the entity_id option, Home Assistant assigns listeners to the entities you’ve specified and does not inspect value_template at all. In other words, entity_id supersedes value_template: you have complete manual control over what causes it to be evaluated (see the sketch below).
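
Purely as an illustration of that old behavior, here’s a minimal sketch in the legacy template sensor format (the sensor name, entities and template are invented for this example):

    sensor:
      - platform: template
        sensors:
          upstairs_average:
            # Pre-0.115: listeners are attached only to the entities listed here;
            # value_template itself is not inspected for entities.
            entity_id:
              - sensor.upstairs_temperature
              - sensor.downstairs_temperature
            value_template: >-
              {{ ((states('sensor.upstairs_temperature') | float) +
                  (states('sensor.downstairs_temperature') | float)) / 2 }}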

In 0.115 it works like this:

  • Home Assistant inspects value_template, identifies entities, and assigns listeners to them. If it can’t find any entities, your Template Sensor is evaluated at startup and never again (until the next restart).

  • The entity_id option is no longer available to supersede value_template. You no longer have manual control to override the automatic system.

To be clear, the new entity-identification process is more thorough than the previous one. For example, it now handles expand properly and understands states, states.sensor, etc. I believe this is why it was decided to deprecate entity_id. However, there are situations where you may still want manual control to supersede the automatic system.
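
As a hedged illustration of that point (the group name below is invented): since the new engine handles expand, a template like this now gets listeners for the group and its members automatically in 0.115, with no entity_id hint required.

    {# Hypothetical group; in 0.115 the engine sees expand() and tracks the group's members #}
    {{ expand('group.example_lights') | selectattr('state', 'eq', 'on') | list | count }}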

So far, I have not heard a compelling technical reason that prevents restoring the entity_id option. Instead, there’s been talk of adding new options that, irony of ironies, serve as substitutes for entity_id. :man_shrugging:


It seems that I have to point out that I am just a single guy discussing my thoughts, just like all of you. I do not speak for Home Assistant and I am not at all certain that auto_update would even be accepted since it is, honestly, still ugly.

You ask for technical justification. If my previous post was not enough, I am not sure what could satisfy you. There is no reason that entity_id could not be re-implemented if that’s what you want to hear. The technical terms are “feature creep”, “technical debt” and possibly even “second system syndrome”.

It would be very helpful if you could show an example of a template that no longer works for you with the new engine.

And as additional info…

I’ve also noticed the system having higher latency than it did before.

I have a light switch that directly controls a Shelly, and the Shelly’s state change toggles a smart bulb. It used to be almost instantaneous. Now there’s at least a 0.5 to 1 second delay between flipping the switch and the light toggling.
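
For context, the chain being described is roughly this kind of thing (a hedged sketch only; the entity names are invented):

    automation:
      - alias: "Mirror Shelly switch to smart bulb"
        trigger:
          - platform: state
            entity_id: switch.shelly_living_room
        action:
          - service: light.toggle
            entity_id: light.living_room_bulb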

So “the bump” really is having an impact on performance.

Too bad there’s no good/easy way to figure out what’s causing it.

The feature was removed, so calling its reinstatement “feature creep” would be disingenuous. If there’s no technical obstacle then I recommend it be restored, because it provides manual control over which entities are assigned listeners.

A prime example is the one discussed at length in the posts above, namely the “Sensor - Unavailable/Offline Detection”. Its template uses states which, in previous versions, was not assigned any listeners and that was a good thing for that particular Template Sensor’s purposes. It was sufficient to evaluate the template once a minute by simply specifying entity_id: sensor.time. Effectively, we had manual control over which entity (or entities) served to refresh the template. We could constrain the listeners.

It’s no longer possible to do that in 0.115 (by definition, a loss of functionality). That means instead of one listener (for sensor.time) it now gets the maximum possible number of listeners (one for each entity). Instead of updating once a minute, it now updates at least that often and usually far more. It uses more resources than needed to get the job done and there’s no control over it (short of foregoing the Template Sensor altogether and resorting to a python_script … which is what Marius is doing).
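
To make that concrete, here’s a hedged sketch of the pre-0.115 approach (the sensor name and template body are simplified stand-ins; the real “Sensor - Unavailable/Offline Detection” template is more involved):

    sensor:
      - platform: template
        sensors:
          offline_entity_count:
            # Pre-0.115: 'states' gets no listener, so sensor.time alone drives
            # a once-per-minute re-evaluation of the template.
            entity_id: sensor.time
            value_template: >-
              {{ states | selectattr('state', 'in', ['unavailable', 'unknown']) | list | count }}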

I just outlined the issue for someone else who asked where sensor.time should now be included within the template:

Yes.
Another possible system killer is a template like:

      {% set ns = namespace(domains=[]) %}
      {% for d in states|groupby('domain') %}
      {% set ns.domains = ns.domains + [d[0]] %}
      {% endfor %}
      {% set list = ns.domains|join('\n') %}
      {{list if list|count < 255 else
        list|replace('input','inp')|truncate(255,true)}}

I have 3 systems: the main system with all other integrations loaded and a rather large backend setup, and 2 smaller ones dedicated to Z-Wave (Aeotec stick) and MQTT, serving as the dedicated broker. The latter 2 take away as much stress from the production system as possible.
All are on 115.3 now. My production system immediately breaks upon loading this template, and grinds to a halt. I can only restart it from the command line.
This template didn’t break a sweat in 114. The other two (Z-Wave and MQTT) can run it without obvious trouble, but of course they have practically nothing to track…

I have to add that the above template is not even the complete Template Sensor; it’s an attribute_template. The main value_template was the count:

{{states|groupby('domain')|count}}
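
Roughly, and only as a hedged sketch of how those two pieces sit together in the legacy format (the sensor name is invented and the 255-character truncation from the attribute template above is omitted):

    sensor:
      - platform: template
        sensors:
          domain_overview:
            # Main state: the number of domains
            value_template: "{{ states | groupby('domain') | count }}"
            attribute_templates:
              # Attribute: the newline-separated list of domain names
              domain_list: >-
                {% set ns = namespace(domains=[]) %}
                {% for d in states | groupby('domain') %}
                  {% set ns.domains = ns.domains + [d[0]] %}
                {% endfor %}
                {{ ns.domains | join('\n') }}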

Now these seem to be solved in 116, but I mention it here to be completely transparent :wink:


With respect finity, with HA that was always a possibility (a probability, even) as HA works on 1-second updates. If you hit the button 1 ms before the ‘scan’ (sorry, I can only relate this to PLCs) then the response will be instantaneous (well, 1 ms); if you hit it 999 ms before the scan it will be 1 s. And the average will be about 0.5 s in normal use. If you have Z-Wave devices talking directly (having been put in the same group) then you may be able to beat that half-second average. (From what I’ve read this is also possible with some Philips Hue stuff, but I can’t attest to it.) You could probably also do it with ESPHome but it would need dedicated communication. All of these inherently bypass HA as a controller, so you’d lose flexibility and probably some control too.
I don’t know what else to say other than you’ve been very lucky with the timings previously and are maybe noticing it more now that you are ‘looking for symptoms’
:man_shrugging:

With respect mutt,

That’s kind of insulting…I know it wasn’t intended but please consider who you are talking to.

I installed this ceiling fan a year ago and have been operating it as-is since then. I think I would have noticed latency in that amount of time if I was just being “lucky”.

And to be fair, I wasn’t even thinking about or considering that a 10 or 12 percent increase in CPU was causing any issues with response time. As you can see (and as I’ve mentioned in other threads) my CPU usage at times gets over 50% and I’ve never seen any latency.

I got home from work and flipped on the light switch and the light didn’t come on… then it did… I thought it was strange but didn’t even think about it being related until it kept happening every single time after that. And even this morning it’s still doing the same thing.

So, no, I wasn’t “lucky” before and I’m not suddenly “unlucky” now. Something is going on.

Understood.
I know who I’m speaking to and would trust your observations above 99% of members
And it is co-incident with the recent upgrades etc. Insulting you was furthest from my mind (for which I apologise)
But the 1 second update of HA has been discussed many times, so that will have to be taken into account.
The Shelly is connected by WiFi; the smart bulb is connected via ??? Zigbee, Z-Wave, WiFi ???
Given your experience with these, what is your best guess for the various propagation delays involved here?
  • Switch to Shelly - instantaneous?
  • Shelly over WiFi to HA - 50 to 120 ms?
  • HA read, process, write - assuming good timings - 50 ms?
  • HA to bulb - depends on transport and protocol: WiFi - 50 to 120 ms? Zigbee - your guess is better than mine. Z-Wave - I’m a Z-Wave fan but I’ve seen delays varying from, say, a quarter of a second to 3 secs. (Whether the Z-Wave is native HA-to-Z-Wave, or has to be translated to MQTT and back before passing through the Z-Wave controller API, I don’t know the speed differences but there must be some.)
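Purely as an illustrative ball-park, using the WiFi figures above for both hops (rough midpoints, not measurements): ~0 + ~85 ms + ~50 ms + ~85 ms ≈ 220 ms, i.e. very roughly a quarter of a second end to end when everything behaves.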
Given this chain, and you having to make an estimate, what would you estimate? (Not from previous experience, but in theory.) AND, if your life depended on it, what would you guarantee to (say) a client/friend/relative you had just installed this exact setup for?
Evidently something has changed in your setup, but unless your processor usage is over (say) 80% I wouldn’t say that your increased overhead is actually affecting this point-to-point response.
HA is designed to cope with varying loads whilst maintaining a reasonably consistent response (part of the one-second updates, else they’d finish one cycle and immediately start on the next, so massive processing power/speed would pay massive dividends). (Edit: But a LOT of people run quite large systems happily on a Pi3b.)
I genuinely would be interested in anyone’s views on what is absorbing this additional time.
And particularly what you think your propagation times ‘should’ be.
Also surrounding conditions: was someone streaming a 4K film and clogging your WiFi bandwidth, for example?
(Edit2: I also know that you know ALL of the above, I’m just putting it in context for everyone.)

Believe me, I get what you are saying.

I’m using ESPHome for the most part for the WiFi stuff. The light in question is Zigbee.

I do expect some small latency, of course. A quarter second seems reasonable, which is what I was previously getting.

As a test I reverted back to v114.2 and immediately got my previous performance back.

Here is an example:

Then I updated again to 115.2 and this is what I get now:

I’d say that’s a pretty significant difference. The videos were taken 16 minutes apart (and everybody else is still in bed…) so literally the only difference is the HA version.


Wow!
Pretty damning.
(Edit: AND a well-documented effect!)


So I just checked my CPU level for the last month and it’s been static at 1% on average. You can check out my config to compare with yours. I’m also using ESPHome but I only have 2 devices with ~20 entities. The event loop is tight when performing automations on it. I only have 15 template sensors and about 10 or so template entities outside of that. Most of my stuff comes from AppDaemon. My memory dropped 2k from 115.2 to 115.3, but I removed TensorFlow 1.0 and added TensorFlow 2.0.

There’s a link posted by Bdraco for installing py-spy. It can reveal what is occupying Home Assistant’s time and potentially help the development team to fix it. Someone suggested in a WTH that it be included with Home Assistant to make it easier for users to provide its reports when logging a GitHub Issue.

Direct link: GitHub - benfred/py-spy: Sampling profiler for Python programs


ooh, I didn’t think about that.

I’ll see if I can get that running and maybe it’ll help narrow it down.

At this point I don’t even know what to put into a bug report. I’m sure that “I’m getting bad latency on a few switches” won’t give them much to go on.

I’d upvote that to become an official Add-On, and a well-documented addition for other install methods, so that such diagnostic information is available to everyone (if running supervised (i.e. able to be monitored by the devs) and enabled).
:+1:

Yep, especially for those amongst us using Home Assistant OS… rather difficult to set up in that.

I don’t know if it can be used as an Add-on (i.e. its own docker container). I may be wrong but it may need to run in the same context as the python program it is profiling (so in the homeassistant docker container).

It’s coded in Rust, not python, so that means it needs Rust-related resources. I don’t know how much space all of this takes but it may be a consideration when deciding whether to include it by default.


EDIT

If I have understood the following blog post correctly, it creates a docker container for py-spy and a separate container for a python test program. It then proceeds to use the containerized py-spy to profile the containerized test program. If this is true then, theoretically, py-spy could be packaged as an Add-on.

Full disclosure: I’m making a lot of assumptions …

An integration then?

Either way, Frenk’s your man !

Edit:
So you could ‘choose’ to install it (or not)
And if you could then ‘choose’ to allow remote monitoring, just of py-spy data (or not)

That way everyone gets what they want.

Long term you could look at what processor hit it causes / data bandwidth it consumes and change accordingly.