ha-state-archive: Structural Audit & Archival Tooling for Home Assistant
Hi everyone,
I would like to introduce an open-source project extracted from my own long-running Home Assistant production infrastructure:
Repository:
Why this project exists
As Home Assistant setups grow over the years, they increasingly behave like long-lived software systems.
Large YAML include trees, registry-managed entities, generated runtime structures, retained historical logic, partial migrations, dynamic templates… eventually the question stops being:
“Does the configuration load?”
and becomes:
“Do I still structurally understand this system?”
Most existing tools focus on:
- backups
- YAML validation
- formatting
- linting
Those tools are essential, but they do not answer questions such as:
- “Where is this entity actually declared?”
- “Should this entity exist in YAML or in the registry?”
- “Is this reference statically resolvable?”
- “Is this an integrity problem or expected runtime behavior?”
- “What structurally changed between two releases?”
- “Can this archived version be safely purged?”
What is ha-state-archive?
This is not a traditional backup tool.
ha-state-archive is an infrastructure-side archival and audit pipeline designed for long-lived Home Assistant systems.
The repository currently includes:
- include graph resolution
- declaration extraction
- structural integrity auditing
- registry authority analysis
- runtime YAML authority classification
- release-oriented diff generation
- deterministic retention workflows
- quarantine-first purge workflows
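To make the first item more concrete, here is a minimal Python sketch of include graph resolution. It is illustrative only and not the repository's actual code: the function name `build_include_graph` and the in-memory `files` mapping are hypothetical, and it only follows Home Assistant's single-file `!include` tag with a naive regex (the directory variants such as `!include_dir_list` and commented-out lines are not handled).

```python
import posixpath
import re
from collections import deque

# Matches only the single-file tag; "!include_dir_*" is skipped because
# the tag name must be followed directly by whitespace.
INCLUDE_RE = re.compile(r"!include\s+(\S+)")


def build_include_graph(files: dict[str, str], root: str) -> dict[str, list[str]]:
    """Walk includes breadth-first from `root`.

    `files` maps a relative path to its raw YAML text (hypothetical input).
    Returns a mapping of each visited file to the files it includes.
    """
    graph: dict[str, list[str]] = {}
    queue = deque([root])
    while queue:
        path = queue.popleft()
        if path in graph:
            continue  # already visited; also protects against include cycles
        text = files.get(path, "")  # a missing file simply has no includes
        base = posixpath.dirname(path)
        # Include paths are resolved relative to the including file.
        targets = [
            posixpath.normpath(posixpath.join(base, target))
            for target in INCLUDE_RE.findall(text)
        ]
        graph[path] = targets
        queue.extend(targets)
    return graph
```

A real resolver would more likely hook into the YAML loader itself rather than scan text, but the resulting graph shape (file to included files) is the same idea.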
The central component is the audit engine.
Instead of only validating YAML syntax, it attempts to reason about Home Assistant structural integrity through concepts such as:
- authority modeling
- static vs dynamic references
- actionable anomalies
- architectural observations
- bounded outputs
One important design goal is distinguishing between:
- actual integrity problems;
- intentionally dynamic Home Assistant behavior;
- expected runtime-only mechanisms;
- infrastructure-side observations.
For example, some Home Assistant platforms intentionally operate outside the entity registry model and should not automatically be treated as integrity failures.
Example audit output
Example anomaly types currently implemented:
| Type | Meaning |
|---|---|
| `declared_not_in_registry` | YAML declaration missing from registry |
| `registry_not_declared` | Registry entity without matching declaration |
| `broken_reference` | Static reference to an unknown entity |
| `runtime_yaml_observation` | Runtime YAML platform intentionally absent from registry |
The audit engine also distinguishes between:
- actionable anomalies;
- architectural observations.
Observations are reported separately and do not increment the anomaly count.
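As a rough illustration of how these categories can be separated, here is a minimal Python sketch. It is not the actual audit engine: `AuditReport` and `audit` are hypothetical names, the inputs are plain sets of entity IDs, `registry` is assumed to be already restricted to entities expected to be YAML-declared, and `references` is assumed to contain only static references.

```python
from dataclasses import dataclass, field


@dataclass
class AuditReport:
    anomalies: list[tuple[str, str]] = field(default_factory=list)
    observations: list[tuple[str, str]] = field(default_factory=list)

    @property
    def anomaly_count(self) -> int:
        # Observations are deliberately excluded from the count.
        return len(self.anomalies)


def audit(
    declared: set[str],
    registry: set[str],
    references: set[str],
    runtime_yaml: set[str],
) -> AuditReport:
    """Classify entity IDs into actionable anomalies and observations."""
    report = AuditReport()
    for entity in sorted(declared - registry):
        if entity in runtime_yaml:
            # Expected: this platform never registers its entities.
            report.observations.append(("runtime_yaml_observation", entity))
        else:
            report.anomalies.append(("declared_not_in_registry", entity))
    for entity in sorted(registry - declared):
        report.anomalies.append(("registry_not_declared", entity))
    known = declared | registry
    for ref in sorted(references - known):
        report.anomalies.append(("broken_reference", ref))
    return report
```

The point of the sketch is the split itself: a runtime-only YAML entity lands in `observations`, so it stays visible in the report without inflating `anomaly_count`.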
Structural release diffing
The repository already includes a structural diff engine capable of generating bounded release-to-release Markdown reports focused on meaningful configuration evolution rather than raw file comparison.
Current implemented concepts include:
- declaration-level changes;
- structural additions/removals;
- bounded diff outputs;
- exclusion-aware diffs;
- release-oriented Markdown reporting.
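The concepts above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the repository's diff engine: `structural_diff` is a hypothetical function, each release is reduced to a mapping of declaration ID to a content fingerprint, and exclusions are plain ID prefixes.

```python
def structural_diff(
    old: dict[str, str],
    new: dict[str, str],
    exclude_prefixes: tuple[str, ...] = (),
    max_items: int = 50,
) -> str:
    """Return a bounded Markdown report of declaration-level changes.

    `old` and `new` map a declaration ID to a fingerprint of its content
    (for example a hash); both are hypothetical inputs for this sketch.
    """

    def kept(key: str) -> bool:
        # str.startswith accepts a tuple; an empty tuple matches nothing.
        return not key.startswith(exclude_prefixes)

    added = sorted(k for k in new.keys() - old.keys() if kept(k))
    removed = sorted(k for k in old.keys() - new.keys() if kept(k))
    changed = sorted(
        k for k in old.keys() & new.keys() if kept(k) and old[k] != new[k]
    )

    lines: list[str] = []
    for title, items in (("Added", added), ("Removed", removed), ("Changed", changed)):
        lines.append(f"## {title} ({len(items)})")
        lines.extend(f"- `{item}`" for item in items[:max_items])
        if len(items) > max_items:
            # Bounded output: report how much was cut instead of listing it.
            lines.append(f"- ... {len(items) - max_items} more omitted")
    return "\n".join(lines)
```

Sorting every section keeps the report deterministic, so two runs over the same pair of releases produce byte-identical Markdown.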
The long-term direction is to move toward more semantic, structure-aware analysis of how a configuration evolves.
Architectural direction
The project intentionally follows an infrastructure-oriented approach.
Most processing occurs outside Home Assistant itself:
             [ Home Assistant ]
                      │
                      ▼
        Immutable extracted versions
                      │
                      ▼
              Archival pipeline
            ┌─────────┴─────────┐
            ▼                   ▼
    Structural audit      Release diffs
            │                   │
            └─────────┬─────────┘
                      ▼
              MQTT supervision
                      ▼
          Retention classification
                      ▼
                 Quarantine
                      ▼
         Delayed irreversible purge
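The last three stages (retention classification, quarantine, delayed purge) can be illustrated with a small Python sketch. It is a simplified assumption of how such a decision could look, not the project's real logic: `Action`, `classify`, and the 90-day and 30-day defaults are all hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Action(Enum):
    KEEP = "keep"
    QUARANTINE = "quarantine"
    PURGE = "purge"


def classify(
    created: datetime,
    quarantined_at: Optional[datetime],
    now: datetime,
    retention: timedelta = timedelta(days=90),  # assumed value
    grace: timedelta = timedelta(days=30),      # assumed value
) -> Action:
    """Decide which state an archived version should be in.

    `now` is passed explicitly so the same inputs always give the same
    answer, which keeps the decision deterministic and testable.
    """
    if quarantined_at is not None:
        # Irreversible deletion only after the grace period has elapsed.
        if now - quarantined_at >= grace:
            return Action.PURGE
        return Action.QUARANTINE
    if now - created >= retention:
        # Never purge directly: expired versions go to quarantine first.
        return Action.QUARANTINE
    return Action.KEEP
```

The key property is that no path leads from `KEEP` straight to `PURGE`: a version must first spend the full grace period in quarantine, which leaves a window to recover from a wrong classification.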
The repository was extracted from a real production environment and progressively generalized for open-source publication.
The core pipeline is already running continuously in production, although APIs and some internal structures are still evolving.
Current status
At this stage, the project is closer to infrastructure engineering tooling than to a turnkey Home Assistant add-on.
The current target audience is mainly advanced users operating:
- large YAML-based setups;
- Git-driven workflows;
- multi-site environments;
- long-lived Home Assistant infrastructures.
Feedback welcome
I would especially appreciate feedback regarding:
- audit semantics;
- authority modeling;
- structural diff philosophy;
- retention strategy;
- operational ergonomics;
- edge cases around dynamic Home Assistant behavior.
I am also very open to criticism if some architectural assumptions appear too tied to my own infrastructure.
Thanks for reading.