How do you actually know if your Home Assistant instance is healthy?
Most of us look at raw CPU or RAM percentages, but those don't tell the whole story. A system can sit at 10% CPU usage while carrying 50 "Zombie" entities and no backups, making it a ticking time bomb.
I wanted a "North Star" metric, so I developed the HAGHS (Home Assistant Global Health Score).
What is HAGHS?
HAGHS is a standardized 0-100 index. It's like a "Credit Score" for your smart home. It balances your hardware performance (40%) with your software maintenance hygiene (60%).
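To illustrate the idea, here is a rough sketch of how such a 40/60 weighting could be composed as a template sensor. Note that the two subscore sensor names are placeholders for illustration, not the actual HAGHS internals:

```yaml
# Sketch only: combine two hypothetical 0-100 subscores with a
# 40% hardware / 60% hygiene weighting.
template:
  - sensor:
      - name: "Health Score Sketch"
        unit_of_measurement: "points"
        state: >
          {% set hardware = states('sensor.hardware_subscore') | float(0) %}
          {% set hygiene  = states('sensor.hygiene_subscore')  | float(0) %}
          {{ (0.4 * hardware + 0.6 * hygiene) | round(0) }}
```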
Why you should use it:
The Advisor: It doesn't just give you a number; it provides an attribute with specific steps to fix your score (e.g., "Delete 12 unavailable entities to gain +5 points").
Smart Filtering: It's designed to ignore "legitimate" sleepers like buttons or scenes so your score stays accurate.
Proactive Maintenance: It keeps you on top of backups and critical updates.
Join the "Race to 100"
I've released the reference implementation as a simple YAML template.
Post a screenshot of your score below! I'm currently sitting at an 87. Can anyone hit a perfect 100?
I have also submitted this as an official Architecture Proposal to see if we can get this logic integrated natively into Home Assistant Core. If you like the idea, a "Star" on GitHub would help show the devs there is community interest!
You could make this a template blueprint. If you make changes or fixes to the templates, people can just re-import it, and the config will be easier for users, as they only have to provide the variables.
At some point in the future they will even be able to do it from the GUI. Then you can guide the entity selection using selectors.
though the thresholds for the DB are somewhat arbitrary…
there should be some weighting there imho. as long as the available disk space is fine, who cares about the DB size? I mean mine is 3596.2, so deep in the penalty range, but only at 6% on sensor.disk_use_percent_home…
have to check what Zombie entities really are.
oh, I see: | selectattr('state', 'in', ['unavailable', 'unknown'])
ok, I didn't filter those yet, so that's going to take some work. most of the unavailable ones are set explicitly in the UI though, not sure if they count in these templates.
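for anyone else working this out, a minimal standalone version of that filter could look something like this. this is my own sketch based on the snippet above, not the actual HAGHS code, and the skipped domains are just the obvious "legitimate sleepers":

```yaml
# Sketch: count "zombie" entities, skipping domains that are
# legitimately stateless (buttons, scenes, input buttons).
template:
  - sensor:
      - name: "Zombie Entity Count"
        state: >
          {{ states
             | rejectattr('domain', 'in', ['button', 'scene', 'input_button'])
             | selectattr('state', 'in', ['unavailable', 'unknown'])
             | list | count }}
```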
thanks for the suggestion! actually, i've already taken this a step further. haghs has evolved from a simple template into a full custom integration (custom component) with a native config flow.
this means you can already do exactly what you described: install it via hacs and use the ui setup mask with entity selectors to map your sensors. moving it to a proper integration allows for much deeper logic, better performance, and the "advisor" engine, which would be hard to maintain in a blueprint.
check out the latest version on github, i'd love to hear your thoughts on the ui setup!
that's a fair point about disk space, but the db penalty isn't actually about storage capacity: it's about i/o performance. even if you have a 2tb ssd, a large database increases i/o wait times, slows down backups, and makes history/logbook loading sluggish. it's more about "recorder health" than "disk capacity". however, customizable thresholds are on the roadmap for exactly this reason, so users with high-end hardware can adjust the limits.
regarding the zombies: you're right, filtering them manually via templates is a pain. that's exactly why i implemented the haghs_ignore label support. you don't have to write any code: just tag any device or entity in the ui with that label, and the integration automatically excludes it from the calculation. it makes cleaning up the score much easier than managing complex regex filters.
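if you want the same effect in a plain template (without the integration), a sketch of that label-based exclusion could look like this. note this assumes a recent home assistant release where the `label_entities()` template function is available, and it is my own approximation of the idea, not the integration's internal logic:

```yaml
# Sketch: exclude anything tagged with the haghs_ignore label
# before counting unavailable/unknown entities.
# label_entities() requires a Home Assistant version with label support.
template:
  - sensor:
      - name: "Zombies Excluding Ignored"
        state: >
          {% set ignored = label_entities('haghs_ignore') %}
          {{ states
             | selectattr('state', 'in', ['unavailable', 'unknown'])
             | rejectattr('entity_id', 'in', ignored)
             | list | count }}
```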
I doubt I can get much better, maybe into the mid-70s. it's running on an i5 with 4 cores, 16 GB of RAM, and about 97 GB of storage, but I just installed Music Assistant and Voice with local TTS and AI.
Instead of percentage values of usage, you should rather use the real pressure (PSI) values from Linux:
These actually tell you how much time your processes spend waiting for CPU, memory, or IO.
Here is a good description of PSI:
An 80% disk space warning on a 2 TB disk is pointless.
You could make it staggered, or use a fixed amount of space that should at least be available.
hi. great questions. zombies count against the score because they represent maintenance debt. an unavailable entity often means a broken automation, a dead battery, or a zigbee device that lost its route. these create noise in your logs and make your system less predictable. it's about hygiene: if you don't need it, remove it. if it's seasonal, use the haghs_ignore label.
regarding the attribute size: while the dashboard card uses collapsible rows to stay clean, the underlying state machine still has to carry that 400+ entity string. for v2.2, i will implement a cap on the attribute list to prevent state bloat, while keeping the total count accurate. thanks for pointing out the impact on system performance.
hi. haghs checks all entities in the update domain. if the advisor is nagging you but you don't see anything in the main update notification, check settings > system > updates.
sometimes updates are marked as "skipped" or are hidden because of pending dependencies. haghs still sees these as "pending maintenance debt". if it is a legacy update you cannot perform, you can use the haghs_ignore label on that specific update entity to stop it from affecting your score. v2.2 will actually list the specific update name in the attributes to make this easier to track.
regarding psi: i will look into how we can incorporate psi values into the score for v2.2, possibly as an optional advanced metric for those who have the sensors configured.
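in the meantime, a rough sketch of how one could surface a psi value today with a command_line sensor. this assumes the host kernel is built with psi support and that /proc/pressure is readable from where home assistant runs, which may not hold in every container setup:

```yaml
# Sketch: expose the 10-second CPU pressure average from Linux PSI.
# Requires a kernel with PSI enabled and access to the host's /proc.
command_line:
  - sensor:
      name: "CPU Pressure (avg10)"
      command: "grep '^some' /proc/pressure/cpu"
      value_template: >
        {{ value.split('avg10=')[1].split(' ')[0] | float(0) }}
      unit_of_measurement: "%"
      scan_interval: 60
```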
regarding disk space: i agree. for the next update i will look into changing the logic to a staggered approach or a minimum absolute threshold (e.g., warning only if less than 10gb or 15% remains, whichever is more sensible).
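as a sketch of what that combined rule could look like (the free-space sensor name is assumed here, and the exact thresholds are placeholders pending the actual v2.2 logic):

```yaml
# Sketch: warn only when free space is low either relatively
# (less than 15% remaining) or absolutely (under 10 GB).
# sensor.disk_free_home is assumed to report gigabytes.
template:
  - binary_sensor:
      - name: "Disk Space Low"
        state: >
          {% set pct_used = states('sensor.disk_use_percent_home') | float(0) %}
          {% set free_gb  = states('sensor.disk_free_home') | float(0) %}
          {{ pct_used > 85 or free_gb < 10 }}
```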
thanks for pushing the technical depth of this project.