This is a deep technical post on how I structured a room-aware interaction layer in Home Assistant.
If you're interested in how this feels from a user perspective (vs. how it’s built), I’ve written a separate, non-technical version here: Homes That Behave: From Echo Devices to Room-Aware Experiences
Otherwise, if you enjoy architecture, systems design, and Home Assistant internals… welcome!
Full transparency: I used Copilot to help organize and structure this post based on my existing automations, scripts, and helpers.
All architecture, design decisions, and implementation are my own, but Copilot helped turn a working system into something readable and shareable.
I have also spent my entire 40+ year career in Enterprise IT building enterprise systems, and I have to say: Home Assistant has been an AMAZING platform for building robust, enterprise-grade solutions. Yes, I like to tinker, but the power of this platform has let me go far beyond that, and my approach to technology mirrors my career. For many, this entire post is overkill, but for me, it is how I made sure that technology sits in the background. Thank you to everyone at Home Assistant, Music Assistant, and the Open Home Foundation for what you stand for and the products you have produced.
Overview + Architecture
I built what I call the Concierge (also referred to as Abilities Concierge) as a room-aware interaction layer on top of Home Assistant.
The goal is simple:
- Walk into any room and ask: “What can I do here?”
- Receive a truthful, capability-aware response
- Follow up with simple commands like:
  - “Turn on the lamps”
  - “Open the shades”
  - “Play jazz”
- Do all of this without needing to know device names, entity IDs, or room-specific vocabulary
Pattern: Lamps vs Lights (Human Vocabulary Mapping)
In my system:
- “Lamps” = devices labeled as lamps
- “Lights” = built-in overhead lighting
This allows natural language like:
- “Turn on the lamps”
- “Turn on the lights”
…without requiring device names.
The system derives intent dynamically based on labels, not entity naming.
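To make the label idea concrete, here is a minimal Python sketch of how a spoken word could resolve to entities through labels instead of names. It is an illustration, not Home Assistant's API. The `Entity` record, the `VOCAB_TO_LABEL` table, and the label strings `lamp` and `lights` are hypothetical stand-ins for the area and label registries. Inside Home Assistant itself, the same intersection can be built with the `area_entities()` and `label_entities()` template functions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a light entity plus its area and label metadata."""
    entity_id: str
    area: str
    labels: frozenset = field(default_factory=frozenset)

# Hypothetical mapping from spoken words to label names.
VOCAB_TO_LABEL = {"lamp": "lamp", "lamps": "lamp", "light": "lights", "lights": "lights"}

def resolve_targets(word: str, area: str, entities: list) -> list:
    """Return sorted entity IDs in `area` whose labels match the spoken word.

    Entity IDs and friendly names are never parsed; only area and label drive the match.
    """
    label = VOCAB_TO_LABEL.get(word.lower())
    if label is None:
        return []  # unknown vocabulary: do nothing rather than guess
    return sorted(e.entity_id for e in entities if e.area == area and label in e.labels)


entities = [
    Entity("light.reading_corner", "living_room", frozenset({"lamp"})),
    Entity("light.ceiling_cans", "living_room", frozenset({"lights"})),
    Entity("light.bedside", "bedroom", frozenset({"lamp"})),
]
print(resolve_targets("lamps", "living_room", entities))  # ['light.reading_corner']
```

Renaming `light.reading_corner` does not change what "the lamps" means. Only moving it to another area or changing its label does.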
Why this matters:
Most smart homes require you to learn how they work.
This system does the opposite—the home explains itself and adapts to the room you’re in, so anyone (including guests) can use it immediately without training.
Core Design Principles
- Room context is authoritative (Area-based or explicitly passed)
- Voice is an exception interface (discovery, quick actions, media)
- Silence is success (speak only when it adds value)
- Explicit beats clever (no hidden state or guessing)
- Stateless commands (each request stands alone)
- Self-healing system (missing devices/helpers never break behavior)
- Learning is opt-in only (guests never train the system)
System Architecture Diagram (Logical Model)
Voice Inputs (HA Voice)
↓
Concierge Entry Automations (Voice Intent + Room Resolution)
↓
Room & Audio Resolver (Keystone)
- Determines room
- Determines output mode
- Discovers capabilities
↓
├── Capability Discovery Scripts ──► Speech Layer (Unified)
└── Action Scripts (Lights, Music, Shades, TV) ──► Sonos Speak w/ Ducking ──► Speech Layer (Unified)
↓
Speech Layer (Unified)
↓
Output Channels
- Sonos (primary)
- Voice Assistant fallback
- ESPHome (visual signals)
Supporting Layer:
- Helpers (brightness, speaker profiles, posture)
- Integrations (lighting, media, sensors, AI)
- Dashboards (Fully Kiosk)
Part 2 — Objects Required to Implement the Concierge
This section documents the actual system components:
Automations (entry points), Scripts (engines), and Helpers (state storage).
1. Entry Automations (Voice + Orchestration)
Concierge – Voice Entry (HA Voice)
Purpose
- Primary voice entry point
- Resolves room via area_id(trigger.device_id)
- Routes to discovery or action pipeline
Calls
- Room & Audio Resolver (Keystone)
- Room Abilities – Speak (Unified)
- Follow-up command automations
Concierge – Follow-Up Commands (HA Voice)
Purpose
- Handles stateless commands:
- Lamps / lights / music / TV / shades
- Uses “speak-first” + parallel execution pattern
Calls
- Turn On Lamps / Lights – Usual
- Music scripts
- Shade scene triggers
- TV control scripts
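The "speak-first + parallel execution" pattern can be read in more than one way. Below is a minimal asyncio sketch of one reading: acknowledge the request out loud first, then fire all device actions concurrently so a slow shade motor never delays the lamps. `speak`, `run_action`, and `handle_follow_up` are hypothetical names, and the real automations may order things differently.

```python
import asyncio

async def speak(room: str, message: str, log: list) -> None:
    """Hypothetical speech call (Sonos or fallback TTS in the real system)."""
    log.append(f"speak:{room}:{message}")
    await asyncio.sleep(0)  # stand-in for real I/O

async def run_action(name: str, log: list) -> None:
    """Hypothetical device action such as lamps, shades, or music."""
    await asyncio.sleep(0)  # stand-in for a device call
    log.append(f"action:{name}")

async def handle_follow_up(room: str, actions: list, log: list) -> None:
    # Speak first: the acknowledgement is not held up by slow devices.
    await speak(room, "Okay, " + " and ".join(actions) + ".", log)
    # Then run every action concurrently. return_exceptions=True collects failures
    # as results instead of raising the first one, so each outcome can be checked.
    results = await asyncio.gather(*(run_action(a, log) for a in actions), return_exceptions=True)
    for action, result in zip(actions, results):
        if isinstance(result, Exception):
            log.append(f"failed:{action}")

log = []
asyncio.run(handle_follow_up("office", ["lamps", "shades"], log))
print(log)
```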
Concierge – Room Monitoring Awareness
Purpose
- Handles awareness queries:
- Temperature
- Humidity
- Light level
- Air quality
Calls
- Room Monitoring Abilities – Speak
- Sensor-specific speak scripts
Concierge – Follow-Up Sensor Queries
Purpose
- Dedicated handler for environmental questions
Concierge – Lighting Percentage Catcher
Concierge – Shade Percentage Catcher
Purpose
- Interprets natural language like:
- “Set lights to 30%”
- “Shades to 50%”
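A percentage catcher boils down to pulling a target and a number out of a short phrase. This Python sketch uses a regular expression for that. The grammar (optional "set" and "the", then lights/lamps/shades, "to", a number, and "%" or "percent") is an assumption, and the actual automations may match sentences differently.

```python
import re
from typing import Optional, Tuple

# Hypothetical grammar: "[set] [the] <lights|lamps|shades> to <N>% | <N> percent"
_PATTERN = re.compile(
    r"^(?:set\s+)?(?:the\s+)?(?P<target>lights?|lamps?|shades?)\s+to\s+"
    r"(?P<pct>\d{1,3})\s*(?:%|percent)$",
    re.IGNORECASE,
)

def catch_percentage(utterance: str) -> Optional[Tuple[str, int]]:
    """Return (target, percent), or None if the phrase does not match or is out of range."""
    match = _PATTERN.match(utterance.strip())
    if match is None:
        return None
    pct = int(match.group("pct"))
    if not 0 <= pct <= 100:
        return None  # reject rather than clamp: explicit beats clever
    target = match.group("target").lower().rstrip("s")  # normalise plural to singular
    return target, pct
```

Out-of-range values are rejected rather than silently clamped, which follows the "explicit beats clever" principle.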
Lighting Profile – Learn on Use
Purpose
- Learns brightness from real usage
- Triggers after state stabilization
Updates
input_number.<entity>_learned_brightness
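"Triggers after state stabilization" is a debounce: a new value is recorded only once it has stopped changing for a while. In Home Assistant this is commonly expressed with a state trigger's `for:` option. The Python class below just models the same idea. The 30-second window and the rule that "off" is never learned are assumptions, not values from my setup.

```python
from dataclasses import dataclass
from typing import Optional

STABLE_SECONDS = 30.0  # hypothetical stabilization delay

@dataclass
class LearnOnUse:
    """Models one light: brightness is learned only after it stops changing."""
    learned: Optional[int] = None
    pending: Optional[int] = None
    changed_at: float = 0.0

    def on_brightness_change(self, brightness: int, now: float) -> None:
        # Every change restarts the window, so a dimmer being dragged is ignored.
        self.pending = brightness
        self.changed_at = now

    def tick(self, now: float) -> Optional[int]:
        """Commit the pending value once stable long enough; return what was learned."""
        if self.pending is None or now - self.changed_at < STABLE_SECONDS:
            return None
        value, self.pending = self.pending, None
        if value <= 0:
            return None  # hypothetical rule: turning a light off is never its "usual"
        self.learned = value
        return value
```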
Learn Music Volume – On Sonos Change
Purpose
- Captures room music volume into speaker profile
Concierge – Intentional Learning
Purpose
- Only learns when explicitly requested:
- “Remember this”
- “That’s perfect”
Bedtime / Goodnight / Good Morning (Concierge)
Purpose
- Room posture + environment control routines
Duck Sonos When Assist is Listening
Purpose
- Improves voice interaction clarity
2. Core Scripts (Headless Engines)
Room & Audio Resolver (Keystone)
The single most important script
Responsibilities
- Resolve room context
- Detect Sonos presence
- Select output mode (Sonos vs fallback)
- Seed speaker profile
- Determine:
- has_lamps
- has_lights
- output_mode
- scope
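Here is a reduced sketch of what the Keystone computes. It covers room precedence (an explicit room beats the device's area), capability flags derived from labels, and output-mode selection. It deliberately leaves out `scope` and speaker-profile seeding because their details are not spelled out here. `RoomContext`, `resolve_room`, and the `"sonos"`/`"voice_fallback"` values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RoomContext:
    """Subset of what the Keystone returns; `scope` and profile seeding are omitted."""
    room: str
    has_lamps: bool
    has_lights: bool
    output_mode: str  # "sonos" or "voice_fallback" (hypothetical values)

def resolve_room(explicit_room: Optional[str], device_area: Optional[str],
                 labels_by_room: dict, sonos_available: dict) -> Optional[RoomContext]:
    # An explicitly passed room wins; otherwise use the area of the device that heard it.
    room = explicit_room or device_area
    if room is None:
        return None  # no room context: leave the decision to the caller
    # Missing entries degrade to "capability absent" instead of raising (self-healing).
    labels = labels_by_room.get(room, set())
    output_mode = "sonos" if sonos_available.get(room, False) else "voice_fallback"
    return RoomContext(room, "lamp" in labels, "lights" in labels, output_mode)
```

Every downstream script receives the same `RoomContext`, so none of them has to re-derive the room on its own.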
Room Abilities – Speak (Unified)
Purpose
- Answers: “What can I do here?”
Behavior
- Discovers capabilities in real-time:
- Lamps vs Lights
- Shades
- Music
- TV
- Speaks only what exists
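"Speaks only what exists" means the answer is assembled from the discovered capability flags rather than from a fixed script. A minimal Python sketch follows. The phrase table and the `abilities_sentence` name are hypothetical.

```python
def abilities_sentence(room: str, capabilities: dict) -> str:
    """Build a spoken answer to 'What can I do here?' from discovered capabilities only."""
    # Hypothetical phrasing; only entries whose flag is truthy are spoken, in this order.
    phrases = {
        "lamps": "turn on the lamps",
        "lights": "turn on the lights",
        "shades": "open or close the shades",
        "music": "play music",
        "tv": "control the TV",
    }
    available = [text for key, text in phrases.items() if capabilities.get(key)]
    if not available:
        return f"There's nothing I can control in the {room} right now."
    if len(available) == 1:
        joined = available[0]
    elif len(available) == 2:
        joined = f"{available[0]} or {available[1]}"
    else:
        joined = ", ".join(available[:-1]) + f", or {available[-1]}"
    return f"In the {room}, you can {joined}."
```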
Sonos Speak with Ducking (Room)
Purpose
- Central speech system for the home
Behavior
- Ducks music
- Speaks message
- Restores playback
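The key property of ducking is that the restore step always runs, even if speech fails. This sketch uses a fake player to show that with `try`/`finally`. `FakePlayer` and `speak_with_ducking` are hypothetical and not the Sonos API. For the real thing, Home Assistant's Sonos integration provides `sonos.snapshot` and `sonos.restore` actions. The 0.25 duck level is borrowed from the speaker-profile example in the Helpers section, and resuming playback is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class FakePlayer:
    """Hypothetical stand-in for a Sonos player; not the Sonos or Home Assistant API."""
    volume: float = 0.6
    events: list = field(default_factory=list)

    def set_volume(self, level: float) -> None:
        self.volume = level
        self.events.append(("volume", level))

    def announce(self, message: str) -> None:
        self.events.append(("say", message))

def speak_with_ducking(player: FakePlayer, message: str, duck: float = 0.25) -> None:
    """Snapshot, duck, speak, restore; restore runs even if speaking fails."""
    snapshot = player.volume                    # 1. snapshot current volume
    try:
        player.set_volume(min(snapshot, duck))  # 2. duck (never raise a quiet room)
        player.announce(message)                # 3. speak
    finally:
        player.set_volume(snapshot)             # 4. restore

p = FakePlayer(volume=0.6)
speak_with_ducking(p, "Lamps are on")
print(p.events)  # [('volume', 0.25), ('say', 'Lamps are on'), ('volume', 0.6)]
```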
Voice Assistant Speak (Fallback)
Purpose
- Used when Sonos is unavailable or unsuitable
Turn On Lamps – Usual (Room)
In our home, we separate Lamps from Lights by applying a label to every lamp, which distinguishes them from the overhead lights.
Purpose
- Restores learned brightness per lamp
Turn On Lights – Usual (Room)
In our home, Lights are the built-in overhead fixtures in the ceiling (art lights, overhead lights, etc.). We apply a 'Lights' label to them to distinguish them from Lamps.
Purpose
- Applies learned brightness across all luminaires
Learn Lighting – Usual (Room)
Purpose
- Explicit learning of lighting posture
Resolve Music Player – Room
Purpose
- Determines correct Sonos player
Music – Update Last Media (Room)
Purpose
- Persists playback context
Music – Determine Genre
Purpose
- Identifies current playback genre
Play Genre / Artist / Album – Room
Purpose
- Media playback via Music Assistant
Start Music – Usual (Room)
Purpose
- Deterministic music startup
Continue Playing – Room
Purpose
- Resume previous playback
Speak Room Temperature / Humidity / Air Quality / Light Level
Purpose
- Environment awareness responses
Set Room Posture
Purpose
- Sets Daytime vs Overnight behavior
Kiosk Set Day Mode / Overnight Mode
Purpose
- Controls dashboard state
Sonos Snapshot + Alarm Chime
Purpose
- Protect and restore audio state for alerts
3. Helpers (State + Memory Layer)
Lighting Helpers (Per Device)
input_number.<entity>_learned_brightness
Used for
- Restoring “usual” lighting state
Speaker Profile (Per Room)
input_text.<room>_speaker_profile
Stores
{"chat":0.45,"music":0.60,"duck":0.25}
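Because the speaker profile is JSON stored in a text helper, reading it defensively is what keeps a missing or corrupted helper from breaking speech. This Python sketch parses the profile with fallbacks and updates only the `music` value, as the Intentional Learning flow does. The defaults mirror the example above. `load_profile` and `learn_music_volume` are hypothetical names.

```python
import json
from typing import Optional

# Fallback values used when the helper is missing, empty, or malformed (self-healing).
# They mirror the example profile; real defaults may differ.
DEFAULT_PROFILE = {"chat": 0.45, "music": 0.60, "duck": 0.25}

def load_profile(raw: Optional[str]) -> dict:
    """Parse the helper's JSON text, keeping only known keys with valid 0..1 values."""
    try:
        data = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    profile = dict(DEFAULT_PROFILE)
    for key in DEFAULT_PROFILE:
        value = data.get(key)
        if isinstance(value, (int, float)) and not isinstance(value, bool) and 0.0 <= value <= 1.0:
            profile[key] = float(value)
    return profile

def learn_music_volume(raw: Optional[str], volume: float) -> str:
    """Return new helper text with only the music value changed."""
    profile = load_profile(raw)
    profile["music"] = round(min(max(volume, 0.0), 1.0), 2)
    # Compact separators keep the string short for an input_text helper.
    return json.dumps(profile, separators=(",", ":"))
```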
Room Posture
input_select.room_posture_<room>
Values
- Daytime
- Overnight
Audit / Observability
input_datetime.lamp_brightness_profile_last_updated
input_datetime.speaker_profile_last_updated
Music Helpers
- Genre counters
- Last media tracking
- Genre determination text
Part 3 - Concierge Call Graph (Automations → Scripts → Helpers)
1. Primary Voice Entry Flow
Automation: Concierge – Voice Entry (HA Voice)
↓
Script: Room & Audio Resolver (Keystone)
↓
├── IF discovery intent ("what can I do here?")
│ ↓
│ Script: Room Abilities – Speak (Unified)
│ ↓
│ Script: Sonos Speak with Ducking (Room)
│ OR
│ Script: Voice Assistant Speak (Fallback)
│
├── IF action intent (lamps / lights / music / TV / shades)
│ ↓
│ Automation: Concierge – Follow-Up Commands
│
└── IF awareness intent (temperature / humidity / air quality)
↓
Automation: Concierge – Room Monitoring Awareness
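The three-way branch above can be modeled as a small classifier. In this Python sketch, a hypothetical keyword table stands in for however the automations actually match sentences, and the returned pipeline names are illustrative. Order matters: awareness is checked before action because "light level" also contains "light".

```python
import re

# Hypothetical keyword patterns, checked in priority order.
DISCOVERY = re.compile(r"\bwhat can i do( here)?\b", re.IGNORECASE)
AWARENESS = re.compile(r"\b(temperature|humidity|air quality|light level)\b", re.IGNORECASE)
ACTION = re.compile(r"\b(lamps?|lights?|music|play|tv|shades?)\b", re.IGNORECASE)

def route(utterance: str) -> str:
    """Pick a pipeline for an utterance; discovery, then awareness, then action."""
    if DISCOVERY.search(utterance):
        return "room_abilities_speak"
    if AWARENESS.search(utterance):  # before ACTION: "light level" contains "light"
        return "room_monitoring_awareness"
    if ACTION.search(utterance):
        return "follow_up_commands"
    return "unhandled"
```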
2. Follow-Up Command Flow (Core Interaction Engine)
Automation: Concierge – Follow-Up Commands
↓
Script: Room & Audio Resolver (Keystone)
↓
├── Lighting
│ ↓
│ Script: Turn On Lamps – Usual (Room)
│ Script: Turn On Lights – Usual (Room)
│
├── Shades
│ ↓
│ Scene Mapping (PowerView Scenes)
│
├── Music
│ ↓
│ Script: Resolve Music Player – Room
│ ↓
│ Script: Start Music – Usual (Room)
│ OR
│ Script: Play Genre / Artist / Album (Room)
│
├── TV
│ ↓
│ Apple TV Service Calls
│
↓
(Speak-first pattern)
↓
Script: Sonos Speak with Ducking (Room)
OR
Script: Voice Assistant Speak (Fallback)
3. Awareness / Sensor Query Flow
Automation: Concierge – Room Monitoring Awareness
↓
Script: Room & Audio Resolver (Keystone)
↓
Script: Room Monitoring Abilities – Speak
↓
├── Script: Speak Room Temperature
├── Script: Speak Room Humidity
├── Script: Speak Room Light Level
├── Script: Speak Room Air Quality
│ ↓
│ (Optional)
│ OpenAI Enrichment (Air-Q interpretation)
↓
Script: Sonos Speak with Ducking (Room)
OR
Script: Voice Assistant Speak (Fallback)
4. Lighting Learning Flow (Passive Memory System)
Automation: Lighting Profile – Learn on Use
↓
(Triggered by brightness changes + stabilization delay)
↓
Update Helper:
input_number.<light>_learned_brightness
↓
Update Helper:
input_datetime.lamp_brightness_profile_last_updated
5. Intentional Learning Flow (Explicit Only)
Automation: Concierge – Intentional Learning
↓
Script: Learn Lighting – Usual (Room)
↓
Update:
input_number.<light>_learned_brightness
OR
Script: Learn Music Volume – Room
↓
Update:
input_text.<room>_speaker_profile (music value only)
6. Music Playback State Tracking
Automation: Music Genre – Observe Playback (All Sources)
↓
Script: Music – Determine Genre
↓
Update Helpers:
counter.music_genre_<genre>
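Genre counters depend on turning free-text genres into stable helper IDs. This sketch shows one way to build `counter.music_genre_<genre>` from a genre string and tally plays. The slug rule is similar in spirit to, but not guaranteed identical to, Home Assistant's own slugify. `genre_counter_id` and `record_play` are hypothetical.

```python
import re

def genre_counter_id(genre: str) -> str:
    """Map a free-text genre to a counter helper ID of the form counter.music_genre_<slug>."""
    # Lowercase and collapse each run of non-alphanumerics into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", genre.lower()).strip("_")
    return f"counter.music_genre_{slug or 'unknown'}"

def record_play(counts: dict, genre: str) -> None:
    """Increment the in-memory tally for a genre (stand-in for incrementing the counter helper)."""
    key = genre_counter_id(genre)
    counts[key] = counts.get(key, 0) + 1
```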
---
Automation: Music – Capture Last Media (Robust)
↓
Script: Music – Update Last Media (Room)
↓
Update Helpers:
last media context (title / artist / source)
7. Continue / Resume Music Flow
Automation: Voice – Continue Playing
↓
Script: Resolve Music Player – Room
↓
Script: Continue Playing – Room
↓
Script: Music – Update Last Media (Room)
8. Speech Pipeline (Global Pattern)
ANY SCRIPT that needs to speak
↓
Script: Room & Audio Resolver (Keystone)
↓
├── IF Sonos available
│ ↓
│ Script: Sonos Speak with Ducking (Room)
│ ↓
│ ├── Snapshot volume
│ ├── Duck music
│ ├── Speak message
│ └── Restore + resume
│
└── ELSE
↓
Script: Voice Assistant Speak (Fallback)
9. Bedtime / Goodnight / Good Morning (Experience Flow)
Automation: Bedtime / Goodnight / Good Morning
↓
Script: Concierge Bedtime / Goodnight / Good Morning
↓
├── Lighting Scripts (Usual / Dim / Off)
├── Shade Scenes (PowerView)
├── Music Control (Start / Stop / Pink Noise)
├── Script: Set Room Posture
│ ↓
│ Update:
│ input_select.room_posture_<room>
│
├── Script: Kiosk Set Mode
│ ↓
│ Fully Kiosk commands
│
└── Script: Sonos Speak with Ducking (Room)
10. Sonos Listening Etiquette Flow
Automation: Duck Sonos When Any Assist Satellite Is Listening
↓
Script: Duck Sonos While Assist Is Listening (Room)
11. Dashboard / Button Path (Non-Voice Entry)
Dashboard Button (Fully Kiosk)
↓
Script Call (same engines as voice)
↓
Script: Room & Audio Resolver (Keystone)
↓
Same execution path as Follow-Up Commands
12. Messaging / Visual Signals (ESPHome Layer)
Automation: Message Event (e.g., Laundry Complete)
↓
Presence Resolution (which room to notify)
↓
├── Script: Sonos Speak with Ducking (Room)
│
└── ESPHome Device
↓
Set Ring Color (status indication)
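"Presence Resolution (which room to notify)" needs some rule for picking rooms. Here is one plausible rule as a Python sketch: notify every room that is currently occupied, otherwise the most recently occupied one. The rule itself, the `rooms_to_notify` name, and the input shapes are all assumptions for illustration.

```python
def rooms_to_notify(occupied: dict, last_seen: dict) -> list:
    """Hypothetical rule: every occupied room, else the most recently occupied one."""
    now_occupied = sorted(room for room, present in occupied.items() if present)
    if now_occupied:
        return now_occupied
    if last_seen:
        # Room whose last-seen timestamp is the largest.
        return [max(last_seen, key=last_seen.get)]
    return []  # nobody detected anywhere: leave the choice to the caller
```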
Mental Model Summary (Why This Works)
This call graph collapses into 3 core truths:
1. Everything flows through the Keystone
- Room resolution
- Output channel selection
- Capability awareness
2. Automations trigger, Scripts execute
- Automations = entry + orchestration
- Scripts = reusable engines
3. Helpers only store human intent
- Brightness
- Volume
- Room posture
Part 4 — Appendix: Integrations Used
Lighting & Power
- Insteon
- All lights and switches
- Enables dimming + learned brightness
- Labels used to identify Lamps and Lights (Overhead)
Presence & Environment
- Aqara Multi-Sensors
- Presence, humidity, light level, temperature
- Air-Q
- Air quality, noise level
- HomeKit
- Bridge for Aqara + leak sensors
Audio & Media
- Sonos
- Primary audio output
- Music Assistant
- Music orchestration
- Apple TV Integration
- TV detection + control
Voice Intelligence
- HA Voice (Assist)
- Local voice input
- OpenAI + OpenAI TTS
- Fallback intelligence
- Air quality interpretation
- Music genre classification
Dashboards & UI
- Fully Kiosk Browser
- Dashboard control
- Media Index
- Apple Shared Album source for rolling images/video
Security & Access
- Unifi Protect
- Cameras
- Unifi Access
- Door + gate control
Task Management
- Microsoft 365 To Do
- Shared shopping list
Messaging & Visual Signals
- ESPHome
- Voice assistant LED ring control
- Visual cues (e.g., laundry complete)
- Presence-aware message routing
My overall objective
This system is not a collection of automations.
It is a translation layer:
Integrations → Capabilities → Experiences
By separating:
- discovery from action
- action from voice
- voice from devices
…the system becomes:
- deterministic
- scalable
- understandable
- and adaptable to any environment