How do you actually know if your Home Assistant instance is healthy?
Most of us look at raw CPU or RAM percentages, but those don't tell the whole story. A system can sit at 10% CPU usage yet carry 50 "zombie" entities and have no backups, which makes it a ticking time bomb.
I wanted a "North Star" metric, so I developed the HAGHS (Home Assistant Global Health Score).
What is HAGHS?
HAGHS is a standardized 0-100 index. It's like a "credit score" for your smart home. It balances your hardware performance (40%) with your software maintenance hygiene (60%).
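To make the 40/60 weighting concrete, here is a minimal Python sketch. Only the two weights come from the post; the function name `haghs_score`, the assumption that each pillar is already a 0-100 sub-score, and the rounding are illustrative assumptions, not the integration's actual code.

```python
HARDWARE_WEIGHT = 0.4  # from the post: hardware performance counts for 40%
HYGIENE_WEIGHT = 0.6   # from the post: maintenance hygiene counts for 60%


def haghs_score(hardware: float, hygiene: float) -> int:
    """Combine two 0-100 sub-scores into one 0-100 index (hypothetical helper)."""
    for name, value in (("hardware", hardware), ("hygiene", hygiene)):
        if not 0 <= value <= 100:
            raise ValueError(f"{name} sub-score must be between 0 and 100")
    # Weighted sum; how each sub-score is derived is not shown here.
    return round(HARDWARE_WEIGHT * hardware + HYGIENE_WEIGHT * hygiene)


print(haghs_score(hardware=90, hygiene=80))
```

With this weighting, a weak hygiene sub-score drags the total down faster than a weak hardware sub-score, which matches the emphasis on maintenance in the post.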
Why you should use it:
The Advisor: It doesn't just give you a number; it provides an attribute with specific steps to fix your score (e.g., "Delete 12 unavailable entities to gain +5 points").
Smart Filtering: It's designed to ignore "legitimate" sleepers like buttons or scenes so your score stays accurate.
Proactive Maintenance: It keeps you on top of backups and critical updates.
Join the "Race to 100"
I've released the reference implementation as a simple YAML template.
Post a screenshot of your score below! I'm currently sitting at an 87. Can anyone hit a perfect 100?
I have also submitted this as an official Architecture Proposal to see if we can get this logic integrated natively into Home Assistant Core. If you like the idea, a star on GitHub would help show the devs there is community interest!
You could make this a template blueprint. If you make changes/fixes to the templates people can just re-import it, and the config for the user will be easier, as they just have to provide the variables.
Somewhere in the future they will even be able to do it from the GUI. Then you can guide the entity selection using selectors.
though the thresholds for the DB are somewhat arbitrary…
there should be some weighting of those imho. As long as the available disk space is fine, who cares about the DB size? I mean, mine is 3596.2, so deep in the penalty range, but sensor.disk_use_percent_home is only at 6%…
have to check what Zombie entities really are.
oh I see: | selectattr('state', 'in', ['unavailable', 'unknown'])
ok, I didn't filter those yet, so that's going to take some work. Most of the unavailable ones are set in the UI explicitly though, so I'm not sure if they count in these templates.
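For anyone wondering what that filter amounts to, here is a minimal Python sketch of the same selection logic. The "unavailable"/"unknown" states come from the template filter quoted above, and the button/scene exclusion and the haghs_ignore label are mentioned elsewhere in this thread. The dictionary shape of each entity and the function name `find_zombies` are assumptions made for illustration; this is not the integration's code.

```python
ZOMBIE_STATES = {"unavailable", "unknown"}  # states used by the template filter
SLEEPER_DOMAINS = {"button", "scene"}       # "legitimate sleepers" named in the post
IGNORE_LABEL = "haghs_ignore"               # opt-out label mentioned in the thread


def find_zombies(entities):
    """Return the entity_ids that would count as zombies (hypothetical helper)."""
    zombies = []
    for entity in entities:
        domain = entity["entity_id"].split(".", 1)[0]
        if domain in SLEEPER_DOMAINS:
            continue  # buttons and scenes are expected to look idle
        if IGNORE_LABEL in entity.get("labels", ()):
            continue  # explicitly excluded by the user
        if entity["state"] in ZOMBIE_STATES:
            zombies.append(entity["entity_id"])
    return zombies


# Made-up sample data, only to show the filtering.
sample = [
    {"entity_id": "sensor.attic_temp", "state": "unavailable"},
    {"entity_id": "scene.movie_night", "state": "unknown"},
    {"entity_id": "light.xmas_tree", "state": "unavailable", "labels": ["haghs_ignore"]},
    {"entity_id": "switch.pump", "state": "on"},
]
print(find_zombies(sample))
```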
Thanks for the suggestion! Actually, I've already taken this a step further. HAGHS has evolved from a simple template into a full custom integration (custom component) with a native config flow.
This means you can already do exactly what you described: install it via HACS and use the UI setup form with entity selectors to map your sensors. Moving it to a proper integration allows for much deeper logic, better performance, and the "advisor" engine, which would be hard to maintain in a blueprint.
Check out the latest version on GitHub, I'd love to hear your thoughts on the UI setup!
That's a fair point about disk space, but the DB penalty isn't actually about storage capacity, it's about I/O performance. Even if you have a 2 TB SSD, a large database increases I/O wait times, slows down backups, and makes history/logbook loading sluggish. It's more about "recorder health" than "disk capacity." However, customizable thresholds are on the roadmap for exactly this reason, so users with high-end hardware can adjust the limits.
Regarding the zombies: you're right, filtering them manually via templates is a pain. That's exactly why I implemented the haghs_ignore label support. You don't have to write any code: just tag any device or entity in the UI with that label, and the integration automatically excludes it from the calculation. It makes cleaning up the score much easier than managing complex regex filters.
I doubt I can get much better, maybe into the mid 70s. It's running on an i5 with 4 cores, 16 GB RAM and about 97 GB of storage, but I just installed Music Assistant and Voice w/local TTS and AI.
Instead of percentage usage values, you should rather use the real pressure values from Linux:
These actually tell you how much your processes are waiting for CPU, memory or I/O.
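As a rough sketch of what reading those pressure values could look like: on Linux kernels with PSI enabled, the files /proc/pressure/cpu, /proc/pressure/memory and /proc/pressure/io contain lines in the form shown in the sample below. The function names and the sample numbers are made up for illustration; nothing here is taken from HAGHS.

```python
from pathlib import Path


def parse_psi(text: str) -> dict:
    """Parse PSI text into {"some": {...}, "full": {...}} (hypothetical helper)."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # kind is "some" or "full"
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_psi(resource: str) -> dict:
    """Read one PSI file; resource is "cpu", "memory" or "io"."""
    return parse_psi(Path(f"/proc/pressure/{resource}").read_text())


# Made-up sample in the kernel's output format, so the parser runs anywhere.
SAMPLE = """\
some avg10=1.53 avg60=0.87 avg300=0.42 total=123456
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
"""
psi = parse_psi(SAMPLE)
print(psi["some"]["avg10"])
```

The avg10, avg60 and avg300 fields are percentages of time over the last 10, 60 and 300 seconds in which tasks were stalled on the resource, so they stay comparable across small and large machines.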
Here is a good description of PSI:
An 80% disk space warning on a 2 TB disk is pointless.
You could make it staggered or use a fixed amount of free space that should at least be available.
Hi. Great questions. Zombies count against the score because they represent maintenance debt. An unavailable entity often means a broken automation, a dead battery, or a Zigbee device that lost its route. These create noise in your logs and make your system less predictable. It is about hygiene: if you don't need it, remove it. If it is seasonal, use the haghs_ignore label.
Regarding the attribute size: while the dashboard card uses collapsible rows to stay clean, the underlying state machine still has to carry that 400+ entity string. For v2.2, I will implement a cap on the attribute list to prevent state bloat, while keeping the total count accurate. Thanks for pointing out the impact on system performance.
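The planned cap could look roughly like the sketch below. This only illustrates the idea of truncating the list while keeping the count accurate; the function name, the attribute keys and the default limit of 20 are assumptions, not what v2.2 will actually ship.

```python
def cap_attribute_list(entity_ids, limit=20):
    """Build attributes with a truncated list but an accurate total (hypothetical)."""
    ids = list(entity_ids)
    return {
        "zombie_count": len(ids),        # always the full count
        "zombie_entities": ids[:limit],  # at most `limit` names are stored
        "truncated": len(ids) > limit,   # tells the UI that more exist
    }


# 400 made-up entity ids, similar in scale to the case discussed above.
attrs = cap_attribute_list([f"sensor.dead_{i}" for i in range(400)])
print(attrs["zombie_count"], len(attrs["zombie_entities"]), attrs["truncated"])
```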
Hi. HAGHS checks all entities in the update domain. If the advisor is nagging you but you don't see anything in the main update notification, check Settings > System > Updates.
Sometimes updates are marked as "skipped" or are hidden because of pending dependencies. HAGHS still sees these as pending maintenance debt. If it is a legacy update you cannot perform, you can use the haghs_ignore label on that specific update entity to stop it from affecting your score. v2.2 will actually list the specific update name in the attributes to make this easier to track.
Regarding PSI: I will look into how we can incorporate PSI values into the score for v2.2, possibly as an optional advanced metric for those who have the sensors configured.
Regarding disk space: I agree. For the next update I will look into changing the logic to a staggered approach or a minimum absolute threshold (e.g., warning only if less than 10 GB or 15% remains, whichever is more sensible).
Thanks for pushing the technical depth of this project.
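One possible reading of "10 GB or 15%, whichever is more sensible" is to warn only when free space is below both limits, so a large disk is judged by the absolute floor and a small disk by the percentage. The sketch below shows that interpretation only; the function name and the way the two limits are combined are assumptions, not the planned implementation.

```python
def disk_space_warning(total_gb, free_gb, min_free_gb=10.0, min_free_pct=15.0):
    """Return True when free space is below BOTH limits (one possible reading)."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    free_pct = 100.0 * free_gb / total_gb
    # Effective threshold is the smaller of 10 GB and 15% of the disk.
    return free_gb < min_free_gb and free_pct < min_free_pct


print(disk_space_warning(total_gb=2000, free_gb=400))  # 2 TB disk, 80% used
print(disk_space_warning(total_gb=32, free_gb=4))      # small disk, 12.5% free
```

Under this reading the 2 TB disk at 80% usage no longer triggers a warning, while a 32 GB disk with 4 GB left still does.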
Thanks - I found it. It was a manual integration reporting an available update, and I did not have it as an entity on my normal view for those types of things.
Still surprised at this rating - 86% on a system with 4 CPU cores, 16 GB RAM and 100 GB storage at 49% full.
Seems to be CPU Load as I hover at 13-15% load with all the integrations and add-ons.
This is huge news, thank you for the heads-up. Having PSI natively in HAOS 16.3 and soon in the System Monitor integration makes the roadmap for HAGHS v2.2 much cleaner.
Instead of building custom file parsers, we can now aim to pull these metrics directly from the core entities once your PR is merged. This is exactly the kind of standardization I was hoping for. I will keep a close eye on your PR; this will definitely be the gold standard for the HAGHS hardware score.
Yes, the current 10% CPU threshold is quite strict for your setup. v2.2 will fix this by switching to PSI (Pressure Stall Information). Instead of raw usage, we'll measure actual resource "pressure," making the score much fairer for powerful hardware. Check out the roadmap on r/haghs for more details.
Hello there! Very cool idea! I have installed the integration myself and got a score of 85%. I think it would be a good idea to add more sensors to the Global Health Score. In my opinion these sensors should also be factored into the overall health score:
CPU Temperature
Raspberry Pi Power Supply Checker or similar, since many users are running HA on their Pis
Spent the day cleaning up zombies, ignoring seasonal items, and shrinking my database from 34GB to 72MB (yes, those units are correct, had to use VACUUM to get it to work)
Was getting excited as my score kept going up, but now everything reports fully optimised and only has a score of 86%.
I have followed the steps in this post and at https://github.com/D-N91/home-assistant-global-health-score to get HAGHS installed, but I am stuck at this point:
I assume that haghs.yaml is the YAML code posted on github that begins with this:
And I assume that I should copy all that code to my configuration.yaml file?
However, I don't have any dashboard entries already in my configuration.yaml and this code looks incomplete. There should be something about lovelace and dashboards - is that correct?
Could some kind person please guide me as to exactly what I need to do to get this into my dashboard?