Arsenal — a contract-driven architecture for Home Assistant

After years of watching Home Assistant configurations grow into something unmaintainable, I tried a different approach: treating HA as a governed software system instead of a collection of automations.

The result is Arsenal — my HA setup, now public after building a publication pipeline to make sure nothing sensitive leaked.


The problem I was trying to solve

If you've maintained a serious HA installation for a few years, you've probably hit some version of this:

  • Automations triggering other automations, with no clear ownership of the logic
  • Business rules spread across scripts, helpers, UI cards, and automations simultaneously
  • Dashboards that do things instead of just rendering state
  • Entities you're afraid to delete because you don't know what depends on them
  • A configuration.yaml that grew organically and now no one fully understands

None of this is HA's fault. It's what happens when a powerful system is used without architectural constraints.


The core idea

Arsenal is built around three principles:

Backend decides. UI renders. Never the other way around.
No logic in dashboards. Cards display states they didn't calculate. Decisions happen upstream, in dedicated entities. The UI is a mirror, not an engine.

Contract before YAML.
Every functional domain has a written contract before any code. The contract defines entities, their roles, valid state transitions, and invariants the system must respect. YAML implements the contract — not the reverse.

Exposure is a decision, not an accident.
What's visible externally (API, MQTT, notifications) is explicitly governed. What's internal stays internal.
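
To make "contract before YAML" a bit more concrete, here is a minimal Python sketch of a contract expressed as data. The class name `DomainContract`, its fields, the role labels and the transition table are all illustrative assumptions, not the format used in the Arsenal repository; only the entity names and the three decision states come from this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainContract:
    """Hypothetical contract: entities with roles, plus allowed state transitions."""
    domain: str
    roles: dict[str, str]                    # entity_id -> role in the domain
    transitions: dict[str, frozenset[str]]   # state -> states reachable from it

    def allows(self, old: str, new: str) -> bool:
        """True when the contract permits moving from `old` to `new`."""
        return new in self.transitions.get(old, frozenset())


# Assumed transition table, for illustration only.
VMC = DomainContract(
    domain="vmc",
    roles={
        "sensor.vmc_besoin_brut": "need",
        "sensor.vmc_admissibilite": "admissibility",
        "input_text.vmc_decision": "decision",
    },
    transitions={
        "off_demande": frozenset({"on_demande", "verrouille"}),
        "on_demande": frozenset({"off_demande", "verrouille"}),
        "verrouille": frozenset({"off_demande"}),
    },
)
```

With something like this in place, a transition such as `VMC.allows("verrouille", "on_demande")` can be rejected by a check instead of being discovered in production.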


The architecture

┌─────────────────────────────────────────────┐
│  Physical sensors · Integrations · MQTT     │  PERCEPTION
└───────────────────────┬─────────────────────┘
                        │ raw states
                        ▼
┌─────────────────────────────────────────────┐
│  Template sensors · Helpers · Constraints   │  DECISION
└───────────────────────┬─────────────────────┘
                        │ decision states
                        ▼
┌─────────────────────────────────────────────┐
│  Automations · Sovereign scripts            │  EXECUTION
└───────────────────────┬─────────────────────┘
                        │ commands
                        ▼
                    Hardware

The UI doesn't appear in this diagram. It observes — it doesn't participate in the flow.
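
As a rough illustration of that one-way flow, the three layers can be sketched as plain Python functions that only ever pass data downward. The thresholds, the input keys and the command strings `vmc/on` / `vmc/off` are assumptions made up for this sketch; the real system implements these layers in HA YAML.

```python
def perceive(raw: dict) -> dict:
    """PERCEPTION: normalize raw readings; no decisions are taken here."""
    return {"co2_ppm": float(raw["co2"]), "humidity_pct": float(raw["humidity"])}


def decide(state: dict, co2_limit: float = 1000.0, humidity_limit: float = 65.0) -> str:
    """DECISION: turn observed state into a decision state (limits are assumed)."""
    needed = state["co2_ppm"] > co2_limit or state["humidity_pct"] > humidity_limit
    return "on_demande" if needed else "off_demande"


def execute(decision: str) -> list[str]:
    """EXECUTION: map a decision to commands; it never re-reads the sensors."""
    return ["vmc/on"] if decision == "on_demande" else ["vmc/off"]


# Data flows strictly downward: raw states -> decision state -> commands.
commands = execute(decide(perceive({"co2": "1250", "humidity": "48"})))
```

Nothing in `execute` can see a sensor value, and nothing in `perceive` can issue a command, which is the property the diagram is meant to guarantee.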


What this looks like concretely

Taking the VMC (ventilation) domain as an example:

  • sensor.vmc_besoin_brut — is there a ventilation need? (CO2, humidity)
  • sensor.vmc_admissibilite — is the context allowing action? (time, presence, locks)
  • input_text.vmc_decision — formal decision state: on_demande / off_demande / verrouille
  • automation.vmc_application_decision — fires on decision state change, nothing else
  • script.vmc_allumage — sovereign script with MQTT ACK
  • sensor.vmc_coherence — detects divergence between decision and real state

The same pattern applies to every domain: alarm, domestic hot water circulation, dehumidifier, boiler.
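
The relationship between those entities can be sketched in a few lines of Python. This is a simplified model, not the actual templates: treating the lock as a separate boolean input, giving it precedence, and considering a locked domain always coherent are assumptions of the sketch.

```python
def vmc_decision(besoin_brut: bool, admissible: bool, locked: bool) -> str:
    """Combine need and admissibility into the formal decision state."""
    if locked:
        return "verrouille"   # assumption: a lock takes precedence over everything
    return "on_demande" if (besoin_brut and admissible) else "off_demande"


def vmc_coherent(decision: str, fan_is_on: bool) -> bool:
    """Model of sensor.vmc_coherence: does the real state match the decision?"""
    if decision == "on_demande":
        return fan_is_on
    if decision == "off_demande":
        return not fan_is_on
    return True   # assumption: a locked domain makes no claim about the fan
```

The point of the split is that the decision is a value anyone can inspect, and the coherence check compares that value with reality instead of re-deriving it.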


What's in the repo

  • Full HA configuration structured by architectural layer
  • Formal contracts for each domain in 00_documentation_arsenal/
  • Versioned changelogs documenting architectural decisions, not just changes
  • Contract validation scripts that check entity consistency across YAML
  • A publication audit pipeline (scripts/security/) that enforces zero leaked credentials before any push

The publication pipeline turned out to be an interesting sub-project in its own right: it went from naive regex matching to a proper risk classification system with CRITICAL / WARNING / [scope=doc] levels, and it took several iterations to get right.
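
For readers wondering what such a classification might look like, here is a small Python sketch. The patterns, the function names and the reading of `[scope=doc]` as "finding located in a documentation file" are assumptions for illustration; the actual rules in scripts/security/ are not reproduced here.

```python
import re

# Hypothetical rules; the real pipeline's patterns are not shown in this post.
RULES = [
    ("CRITICAL", re.compile(r"(?i)\b(password|token|api_key)\s*:\s*\S+")),
    ("WARNING", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),   # looks like an IPv4 address
]


def classify(path: str, text: str) -> list[tuple[str, int, str]]:
    """Return (level, line_number, path) for every line matching a rule."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "!secret" in line:   # HA's !secret indirection keeps the value out of the file
            continue
        for level, pattern in RULES:
            if pattern.search(line):
                # Assumption: findings in Markdown docs are tagged, not dropped.
                tag = f"{level} [scope=doc]" if path.endswith(".md") else level
                findings.append((tag, number, path))
                break   # report only the most severe rule per line
    return findings


def publishable(findings: list[tuple[str, int, str]]) -> bool:
    """Block publication when any finding is CRITICAL outside documentation."""
    return not any(level == "CRITICAL" for level, _, _ in findings)
```

A pre-push hook could then refuse to continue whenever `publishable(...)` returns `False` for any file in the export.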


What this isn't

Not a setup to copy. The entities, MQTT topics, and device names are specific to one installation.

Not a framework or a library. Arsenal is a configuration with a strong point of view on how to organize one.

What might be useful: the patterns, the contract structure, the separation-of-concerns approach, and the tooling.


GitHub: antoinevalentinHA/arsenal (Home Assistant as a governed system)

Happy to discuss the architectural choices — particularly around the decision layer and the contract model, which are the parts that differ most from typical HA setups.

A quick follow-up after the initial publication.

Over the last weeks, Arsenal evolved from a documented architecture into a governed system with automated validation.

Some highlights:

  • Contract consistency checks integrated into CI
  • Security publication audit with risk classification
  • Architectural changelogs documenting decisions, not only changes
  • Public release pipeline used continuously on the real system
  • More than 100 automated checks executed on every change

One unexpected result: the documentation itself became executable governance.
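
A minimal Python sketch of what "executable" can mean here: compare the entity ids a contract declares with the entity ids that actually appear in the YAML. The regex, the list of domains and both function names are assumptions of this sketch, and scanning raw text is a simplification (for example, it would mistake a service name like `script.turn_on` for an entity id).

```python
import re

# Assumed subset of HA domains; extend as needed.
ENTITY_ID = re.compile(
    r"\b(?:sensor|binary_sensor|input_text|input_boolean|automation|script)\.[a-z0-9_]+\b"
)


def undeclared_entities(yaml_text: str, contract_entities: set[str]) -> set[str]:
    """Entity ids referenced in the YAML that no contract declares."""
    return set(ENTITY_ID.findall(yaml_text)) - contract_entities


def unimplemented_entities(yaml_text: str, contract_entities: set[str]) -> set[str]:
    """Entity ids a contract declares that never appear in the YAML."""
    return contract_entities - set(ENTITY_ID.findall(yaml_text))
```

If either set is non-empty, the documentation and the configuration disagree, and a CI job can fail the change until one of them is corrected.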

The original goal was not to document Home Assistant better.

The goal was to reduce architectural entropy over time and make large-scale refactoring possible without losing confidence in the system.

The repository is still very specific to one installation, but the patterns are proving surprisingly transferable.

I'm curious: how many people here are using formal architectural rules, contracts, invariants, or CI validation around their Home Assistant configuration?

As an IT guy with a quite large, organically grown and hard-to-maintain system, I'm quite interested in your approach. My wife acceptance factor (WAF) plummeted after I tried to use AI to refactor the already grown system, so I'm looking for a more structured approach.
I already started moving logic from the UI into templates. It really makes the UI responsive again and helps keep the system clean, but I tend to skip this approach when I try out new stuff.

I'll take a deeper look at the repo and run an automatic translation of your code comments, because my French is not good enough.

Thank you for sharing your work.

Thanks — and yes, WAF was a real part of what pushed me here (alongside simply enjoying the architecture side of the problem). A system the household can't trust eventually gets ripped out, however clever it is.

Sounds like you're already on one of the most important steps: getting logic out of the UI. That's one of the foundational principles in Arsenal — the backend decides, the UI only renders. Everything else tends to flow from that separation.

The “I skip it when I try new stuff” part is exactly where systems drift, in my experience. Most long-term complexity doesn't come from bad design decisions; it comes from temporary experiments that quietly become permanent.

What helped me wasn't having more discipline in the moment. It was making promotion explicit: an experiment can be as messy as it wants, but it doesn't get to influence a real decision entity until there's a contract for it. That single rule keeps the mess quarantined while still allowing fast iteration.

Auto-translation should handle most of the comments reasonably well. The only potentially confusing part is some domain-specific entity vocabulary — e.g. "verrouille" (locked), "besoin_brut" (raw demand), "on_demande" / "off_demande" (requested on/off) — but the overall structure should remain readable.

I'd be very interested to hear which refactor caused the biggest WAF drop in your case. Those stories usually contain the most valuable architectural lessons.

Well, there were 3 issues that ruined the acceptance of the system:

  1. Moving from a small setup with HAOS to a bigger one based on runtipi/docker
    Some stuff broke and it took some time to get everything up and running again
  2. I used AI to do some refactorings that did not play well with the GUI-based logic. I was too eager to experiment with AI
  3. Trying to use blueprints to not reinvent the wheel
    The overall issue is that my family, especially my wife, likes a system that takes care of things when she is not in the house but really hates battling automations that were not written with such a case in mind. A lot of the brilliant blueprints actually ended up in a battle between the inhabitants and the automations: the automation lowers the cover, someone in the house adjusts it, and after a certain grace period the automation lowers it again, and the vicious cycle repeats.

I'm still trying to find a balance between using other people's stuff, which is brilliant but sometimes does not really fit my special cases, and doing everything from scratch. Maybe AI plus a good skill pack (adkisson/ha-development-skillpack: Skill pack for LLM vibe coding for Home Assistant), together with a more architectural framework like yours, can do the job.

I'm curious: are you using AI to develop stuff and, if so, with which kind of approach?
I'm experimenting with bmad-code-org/BMAD-METHOD (Breakthrough Method for Agile Ai Driven Development) with the free version of Antigravity.

Hi Alexander,

Thanks for the detailed reply — your point 3 is the one I've spent the most time on, because it isn't really an automation bug, it's a governance gap.

The cover war (automation closes → someone reopens → grace period → automation closes again) happens because the automation treats its own intent as authoritative and re-asserts on a timer. In Arsenal the rule is the opposite: a human action is a first-class writer, and an automation may only act when it can justify acting — its triggering condition must have genuinely changed (sun crossed a threshold, presence flipped, etc.). "The grace period elapsed" is not a legitimate reason to overrule a human.

Add an explicit manual-override state with a defined lifecycle, have the automation read it before it's allowed to write, and the loop disappears. Three doctrines do most of the work: one authoritative writer per output, every action carries an explicit legitimacy condition, and no silent fallback — if the system can't justify an action it does nothing rather than guess.
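
Here is one way that rule could be modelled, as a small Python sketch. The class `CoverGovernor`, its fields and the chosen lifecycle (the override ends only when the triggering condition really changes) are assumptions for illustration, not Arsenal's actual implementation, which lives in HA helpers and automations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoverGovernor:
    """Hypothetical arbiter between a human and a cover automation."""
    condition: Optional[str] = None   # last observed triggering condition
    manual_override: bool = False     # explicit state, set by a human action
    pending: bool = False             # a condition change not yet acted on

    def observe(self, condition: str) -> None:
        """Record the triggering condition; only a real change creates a justification."""
        if condition != self.condition:
            self.condition = condition
            self.manual_override = False   # assumed lifecycle: a real change ends the override
            self.pending = True

    def human_moved_cover(self) -> None:
        """A human action is a first-class write and supersedes any pending one."""
        self.manual_override = True
        self.pending = False

    def automation_may_write(self) -> bool:
        """The automation reads the override first and acts only with a justification."""
        if self.manual_override or not self.pending:
            return False   # no silent fallback: without a reason, do nothing
        self.pending = False
        return True
```

In the cover scenario: the sun drops, the automation closes the cover once; someone reopens it; every later timer tick observes the same condition and is refused; only the next real change (for example the sun crossing the threshold again) lets the automation write.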

On AI — yes, heavily. I should be upfront: I came to all of this about a year ago, with no background in Home Assistant, software architecture, or code. So the setup below isn't a stylistic preference, it's what makes the whole thing possible for someone starting from zero — and it's close to the opposite of vibe coding:

  • Claude handles contracts, governance, audits and review. It never touches the repo.
  • ChatGPT generates code against the contract.
  • I commit everything by hand; the AI never pushes.

Your point 2 (AI refactors that don't play well with the GUI logic) is exactly the failure I designed against. The fix isn't a better prompt, it's an invariant the AI can't argue its way past: I front-load CI validators (naming, single-writer, reachability, synchrony…) so any AI proposal that violates a contract is rejected before it reaches my family's house. That inverts the risk — AI becomes safe to use because the contracts are non-negotiable.
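
As an example of the kind of validator meant here, a single-writer check can be sketched in a few lines of Python. The function name, the `(writer, target)` input shape and the entity ids in the usage note are hypothetical; extracting those pairs from real YAML is the harder part and is left out.

```python
from collections import defaultdict


def single_writer_violations(writes: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each target to its writers when more than one entity writes to it.

    `writes` is a list of (writer, target) pairs, assumed to have been
    extracted from the configuration beforehand.
    """
    writers = defaultdict(set)
    for writer, target in writes:
        writers[target].add(writer)
    # Keep only targets that break the "one authoritative writer" rule.
    return {target: sorted(found) for target, found in writers.items() if len(found) > 1}
```

Given pairs such as `("script.vmc_allumage", "switch.vmc")` and `("automation.test_humidite", "switch.vmc")`, the result names `switch.vmc` and both writers, and a CI step can fail whenever the returned dict is not empty.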

So I think your instinct is right: skill pack + architectural framework is the combination. The skill pack gives the AI competence; the framework gives it boundaries. Competence without boundaries is precisely how point 2 happens — and as a beginner, those boundaries are exactly what I can't produce on intuition, so I had to encode them explicitly.

That's also where I'd gently differ from BMAD. I read BMAD as "documentation is the single source of truth." Arsenal's anchor is the reverse: the runtime is the reference, and the contract documents and audits the runtime. For a live house where my wife is the real acceptance test, I can't afford a spec that quietly drifts from what the home actually does. Same instinct as BMAD — make AI predictable through structure — opposite anchor point.

I haven't run BMAD or Antigravity yet, though Antigravity hosting Claude and Gemini side by side maps almost perfectly onto the dual-AI split I already use, so I'll probably give it a spin. My honest expectation: the agentic loop will generate code just fine. The hard part was never generation — it's keeping a ~3,400-entity system coherent, which I could never hold by hand, let alone a year in. The contracts are what make it survivable: it's a governance problem before it's a coding one, and that's where they earn their keep.

Curious how BMAD holds up once your config gets large and stateful — keep me posted.