LAN/VPN browser access from desktop or mobile devices.
The search corpus is the archive structure produced by ha-state-archive, but the project itself is filesystem-oriented and does not depend on Home Assistant internals at runtime.
Why I built it
Once you accumulate months or years of Home Assistant snapshots, searching historical configurations manually becomes painful.
I wanted something able to answer questions like:
“When did this entity first appear?”
“Which version introduced this automation?”
“What changed between these periods?”
“Where was this helper referenced historically?”
The Markdown export is especially useful for:
incident investigation;
historical analysis;
sharing findings;
external tooling and LLM workflows.
Philosophy
The project follows the same philosophy as ha-state-archive:
Home Assistant → real-time automation and operational decisions
External infrastructure → archival, audit, search and historical analysis
The goal is not to replace Home Assistant functionality, but to complement it with long-term infrastructure tooling.
Feedback welcome — especially from people running large or long-lived Home Assistant installations.
Thanks for the update. It's not clear to me how this would be deployed and put into service. Can you elaborate?
What are the requirements to use this system? I see it's a separate docker container, but I'm not sure how it expects to get access to data, and what generates the data it consumes.
Thanks, very good question — and I probably need to make this clearer in the README.
ha-archive-search is not a Home Assistant add-on and does not collect data from HA directly.
It is a separate Docker container that searches an existing filesystem tree containing Home Assistant configuration snapshots/backups.
The typical model is:
Home Assistant
↓ regular backups / snapshots / copied config versions
NAS or server filesystem
↓ mounted read-only into the container
ha-archive-search
↓
search UI + Markdown export
So the tool does not generate the data it consumes — it expects the data to already exist as directories on disk.
On my setup, Home Assistant backups are regularly stored on a NAS. The container mounts that archive directory read-only:
No database, no indexer, no HA integration, no MQTT feed, no runtime connection to Home Assistant.
Requirements:
Docker
A directory containing extracted Home Assistant configuration snapshots
That directory mounted into the container (read-only recommended)
LAN/VPN access to the web UI
The system is intentionally infrastructure-side and offline from HA runtime — closer to a portable enhanced grep/search interface for HA configuration corpora than to a live HA integration.
The ingestion layer is the only component that ever handles the encryption password. Everything downstream — audit, diff, retention, search and export — operates on immutable extracted trees with no access to secrets.
ha-state-archive handles:
encrypted .tar ingestion
decryption and extraction
creation of immutable extracted snapshots
stabilization checks before downstream processing
retention and quarantine workflows
infrastructure-side audit/diff processing
MQTT supervision back into Home Assistant
It does not rewrite or normalize Home Assistant YAML content; it only prepares stable extracted snapshot directories for downstream tools.
ha-archive-search then mounts the resulting versions/ directory read-only and provides structured search and export on top of it.
Both projects are designed to run infrastructure-side, typically on a NAS (I use Synology DSM), independently from the Home Assistant runtime itself.
So today:
direct native backup ingestion in ha-archive-search: no
searching extracted/versioned HA corpora: yes
The separation keeps the search layer intentionally lightweight, portable and dependency-light.