HAGHS - What is your Global Health Score? šŸ›”ļø

Hey everyone,

How do you actually know if your Home Assistant instance is healthy?

Most of us look at raw CPU or RAM percentages, but those don’t tell the whole story. A system can have 10% CPU usage but be filled with 50 ā€œZombieā€ entities and have no backups—making it a ticking time bomb.

I wanted a ā€œNorth Starā€ metric, so I developed the HAGHS (Home Assistant Global Health Score).

:shield: What is HAGHS?

HAGHS is a standardized 0-100 index. It’s like a ā€œCredit Scoreā€ for your smart home. It balances your hardware performance (40%) with your software maintenance hygiene (60%).
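As a rough sketch of that 40/60 weighting (not the actual haghs.yaml from the repo — the two sub-score sensor names below are placeholders I made up for illustration):

```yaml
# Sketch of the 40/60 weighting only — the sub-score sensor names are
# placeholders, not entities the real haghs.yaml creates.
template:
  - sensor:
      - name: "Global Health Score (sketch)"
        unit_of_measurement: "pts"
        state: >
          {% set hardware = states('sensor.hardware_subscore') | float(0) %}
          {% set hygiene  = states('sensor.hygiene_subscore')  | float(0) %}
          {{ (hardware * 0.4 + hygiene * 0.6) | round(0) }}
```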

:sparkles: Why you should use it:

  • The Advisor: It doesn’t just give you a number; it provides an attribute with specific steps to fix your score (e.g., ā€œDelete 12 unavailable entities to gain +5 pointsā€).
  • Smart Filtering: It’s designed to ignore ā€œlegitimateā€ sleepers like buttons or scenes so your score stays accurate.
  • Proactive Maintenance: It keeps you on top of backups and critical updates.

:rocket: Join the ā€œRace to 100ā€

I’ve released the reference implementation as a simple YAML template.

Get the code here: GitHub - D-N91/home-assistant-global-health-score (a standardized architectural framework for measuring system health and hygiene in Home Assistant).

How to participate:

  1. Copy the haghs.yaml from the repo.
  2. Add the Gauge card to your dashboard.
  3. Post a screenshot of your score below! I’m currently sitting at an 87. Can anyone hit a perfect 100?

I have also submitted this as an official Architecture Proposal to see if we can get this logic integrated natively into Home Assistant Core. If you like the idea, a ā€œStarā€ on GitHub would help show the devs there is community interest!



You could make this a template blueprint. If you make changes/fixes to the templates people can just re-import it, and the config for the user will be easier, as they just have to provide the variables.

Somewhere in the future they will even be able to do it from the GUI. Then you can guide the entity selection using selectors.


fun project

though the thresholds for the DB are somewhat arbitrary…

there should be some weighting there imho. as long as the available disk space is fine, who cares about the db size? I mean mine is 3596.2, so deep in the penalty zone :wink: but only at 6% on sensor.disk_use_percent_home…

have to check what Zombie entities really are.
oh, I see: | selectattr('state', 'in', ['unavailable', 'unknown'])

ok, I didn’t filter those yet, so that’s going to take some work. most ā€œunavailableā€ entities are set in the UI explicitly though, not sure if they count in these templates.
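For anyone who wants to count their own zombies before installing anything, a minimal template sensor built around that same filter might look like this (the excluded domains are my guess at what ā€œlegitimate sleepersā€ covers; adjust to taste):

```yaml
template:
  - sensor:
      - name: "Zombie Entities (sketch)"
        state: >
          {{ states
             | rejectattr('domain', 'in', ['button', 'scene', 'script', 'event'])
             | selectattr('state', 'in', ['unavailable', 'unknown'])
             | list | count }}
```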


thanks for the suggestion! actually, i’ve already taken this a step further. haghs has evolved from a simple template into a full custom integration (custom component) with a native config flow.

this means you can already do exactly what you described: install it via hacs and use the ui setup mask with entity selectors to map your sensors. moving it to a proper integration allows for much deeper logic, better performance, and the ā€œadvisorā€ engine which would be hard to maintain in a blueprint.

check out the latest version on github, i’d love to hear your thoughts on the ui setup!

that’s a fair point about disk space, but the db penalty isn’t actually about storage capacity—it’s about i/o performance. even if you have a 2tb ssd, a large database increases i/o wait times, slows down backups, and makes history/logbook loading sluggish. it’s more about ā€œrecorder healthā€ than ā€œdisk capacity.ā€ however, customizable thresholds are on the roadmap for exactly this reason, so users with high-end hardware can adjust the limits.

regarding the zombies: you’re right, filtering them manually via templates is a pain. that’s exactly why i implemented the haghs_ignore label support. you don’t have to write any code—just tag any device or entity in the ui with that label, and the integration automatically excludes it from the calculation. it makes cleaning up the score much easier than managing complex regex filters.
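For those still on the pure-template version, a label-based exclusion can be expressed in a template too — a sketch, assuming the label_entities() helper (available in Home Assistant 2024.4+); the integration does the equivalent internally:

```yaml
template:
  - sensor:
      - name: "Zombies Excluding Ignored (sketch)"
        state: >
          {{ states
             | selectattr('state', 'in', ['unavailable', 'unknown'])
             | rejectattr('entity_id', 'in', label_entities('haghs_ignore'))
             | list | count }}
```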

Made me wonder though: why are the zombies bad at all, or rephrased, why do they count against health?

Also, having over 400/600, I don’t think we should list them in the attribute.

That is worse for the system in itself.

I cannot seem to find this update that it is referencing… is it logged in more detail anywhere?

I doubt I can get much better, maybe into the mid-70s. It’s running on an i5 with 4 cores, 16 GB RAM, and about 97 GB of storage, but I just installed Music Assistant and Voice with local TTS and AI.

Suggestion:

Instead of percentage values of usage, you should rather use the real pressure (PSI) values from Linux:
These tell you how much time your processes spend waiting for CPU, memory, or IO.

Here is a good description of PSI:

An 80% disk space warning on a 2 TB disk is pointless.
You could make it staggered, or use a fixed amount of free space that should at least be available.
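For reference, PSI can already be pulled into Home Assistant without waiting for integration support — a sketch, assuming the kernel exposes /proc/pressure and the command_line integration can read it from the HA container:

```yaml
command_line:
  - sensor:
      name: "CPU Pressure avg10 (sketch)"
      # /proc/pressure/cpu looks like:
      # some avg10=0.00 avg60=0.00 avg300=0.00 total=12345
      command: "cat /proc/pressure/cpu"
      unit_of_measurement: "%"
      scan_interval: 60
      value_template: >
        {{ value.split('avg10=')[1].split(' ')[0] }}
```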

hi. great questions. zombies count against the score because they represent maintenance debt. an unavailable entity often means a broken automation, a dead battery, or a zigbee device that lost its route. these create noise in your logs and make your system less predictable. it’s about hygiene: if you don’t need it, remove it. if it’s seasonal, use the haghs_ignore label.

regarding the attribute size: while the dashboard card uses collapsible rows to stay clean, the underlying state machine still has to carry that 400+ entity string. for v2.2, i will implement a cap on the attribute list to prevent state bloat, while keeping the total count accurate. thanks for pointing out the impact on system performance.


hi. haghs checks all entities in the update domain. if the advisor is nagging you but you don’t see anything in the main update notification, check settings > system > updates.

sometimes updates are marked as ā€˜skipped’ or are hidden because of pending dependencies. haghs still sees these as ā€˜pending maintenance debt’. if it is a legacy update you cannot perform, you can use the haghs_ignore label on that specific update entity to stop it from affecting your score. v2.2 will actually list the specific update name in the attributes to make this easier to track.
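The check described above can be approximated in a plain template: update entities report state ā€œonā€ while an update is pending, even when its notification was skipped. A sketch (label_entities() needs HA 2024.4+):

```yaml
template:
  - sensor:
      - name: "Pending Updates (sketch)"
        state: >
          {{ states.update
             | selectattr('state', 'eq', 'on')
             | rejectattr('entity_id', 'in', label_entities('haghs_ignore'))
             | list | count }}
```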

this is excellent feedback.

regarding psi: i will look into how we can incorporate psi values into the score for v2.2, possibly as an optional advanced metric for those who have the sensors configured.

regarding disk space: i agree. for the next update i will look into changing the logic to a staggered approach or a minimum absolute threshold (e.g., warning only if less than 10gb or 15% remains, whichever is more sensible).
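A staggered/absolute threshold along those lines could look like this in template form (the sensor names and the 10 GB / 85% cut-offs are illustrative, not the shipped logic):

```yaml
template:
  - sensor:
      - name: "Disk Penalty (sketch)"
        state: >
          {% set free_gb  = states('sensor.disk_free_home') | float(0) %}
          {% set used_pct = states('sensor.disk_use_percent_home') | float(0) %}
          {# Penalize only when little absolute space remains OR usage is high #}
          {{ 0 if free_gb > 10 and used_pct < 85 else 10 }}
```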

thanks for pushing the technical depth of this project.

> regarding psi: i will look into how we can incorporate psi values into the score for v2.2, possibly as an optional advanced metric for those who have the sensors configured.

I added PSI values in Home Assistant OS 16.3

I also worked to get PSI information into the systemmonitor integration, but that is still ongoing. Not sure if that can help you: