The Concierge: A Room-Aware, Deterministic Interaction Layer for Home Assistant

This is a deep technical post on how I structured a room-aware interaction layer in Home Assistant.

If you're interested in how this feels from a user perspective (vs. how it’s built), I’ve written a separate, non-technical version here: Homes That Behave: From Echo Devices to Room-Aware Experiences

Otherwise, if you enjoy architecture, systems design, and Home Assistant internals… welcome 🙂

Full transparency: I used Copilot to help organize and structure this post based on my existing automations, scripts, and helpers.

All architecture, design decisions, and implementation are my own, but Copilot helped turn a working system into something readable and shareable.

I have spent my entire 40+ year career in enterprise IT building enterprise systems, and I have to say that Home Assistant has been an amazing platform for building robust, enterprise-grade solutions. Yes, I like to tinker, but the power of this platform has let me go far beyond that, and my approach to technology mirrors my career. For many, this entire post is overkill, but for me it is how I make sure the technology sits in the background. Thank you to everyone at Home Assistant, Music Assistant, and the Open Home Foundation for what you stand for and the products you have produced.

Overview + Architecture

I built what I call the Concierge (also referred to as Abilities Concierge) as a room-aware interaction layer on top of Home Assistant.

The goal is simple:

  • Walk into any room and ask: “What can I do here?”
  • Receive a truthful, capability-aware response
  • Follow up with simple commands like:
    • “Turn on the lamps”
    • “Open the shades”
    • “Play jazz”
  • Without needing to know device names, entity IDs, or room-specific vocabulary

Pattern: Lamps vs Lights (Human Vocabulary Mapping)

In my system:

  • “Lamps” = devices labeled as lamps

  • “Lights” = built-in overhead lighting

This allows natural language like:

  • “Turn on the lamps”

  • “Turn on the lights”

…without requiring device names.

The system derives intent dynamically based on labels, not entity naming.
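
To make the label idea concrete, here is a minimal Python sketch of the lookup. It is not the actual Home Assistant implementation (that lives in scripts and templates); the `Entity` class, the label names `lamp` and `lights`, and the entity IDs are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """Hypothetical stand-in for a registry entry: ID, area, and labels."""
    entity_id: str
    area: str
    labels: frozenset = field(default_factory=frozenset)


# Spoken word -> label. The label names are assumptions, not the real ones.
WORD_TO_LABEL = {"lamps": "lamp", "lights": "lights"}


def targets_for(word, area, registry):
    """Return entity IDs in `area` that carry the label behind `word`."""
    label = WORD_TO_LABEL.get(word.lower())
    if label is None:
        return []  # unknown vocabulary: nothing to act on
    return [e.entity_id for e in registry if e.area == area and label in e.labels]


REGISTRY = [
    Entity("light.den_floor_lamp", "den", frozenset({"lamp"})),
    Entity("light.den_ceiling", "den", frozenset({"lights"})),
    Entity("light.kitchen_cans", "kitchen", frozenset({"lights"})),
]

print(targets_for("lamps", "den", REGISTRY))
```

Because the lookup keys on labels and area only, renaming an entity or adding a new lamp to a room needs no change to the command handling.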

Why this matters:

Most smart homes require you to learn how they work.

This system does the opposite—the home explains itself and adapts to the room you’re in, so anyone (including guests) can use it immediately without training.


Core Design Principles

  • Room context is authoritative (Area-based or explicitly passed)
  • Voice is an exception interface (discovery, quick actions, media)
  • Silence is success (speak only when it adds value)
  • Explicit beats clever (no hidden state or guessing)
  • Stateless commands (each request stands alone)
  • Self-healing system (missing devices/helpers never break behavior)
  • Learning is opt-in only (guests never train the system)

System Architecture Diagram (Logical Model)

                  ┌────────────────────────────┐
                  │   Voice Inputs (HA Voice)  │
                  │                            │
                  └────────────┬───────────────┘
                               │
                               ▼
               ┌────────────────────────────────┐
               │ Concierge Entry Automations     │
               │ (Voice Intent + Room Resolution)│
               └────────────┬───────────────────┘
                            │
                            ▼
        ┌────────────────────────────────────────────┐
        │ Room & Audio Resolver (Keystone)           │
        │ - Determines room                          │
        │ - Determines output mode                   │
        │ - Discovers capabilities                   │
        └────────────┬───────────────────────────────┘
                     │
      ┌──────────────┴───────────────┐
      ▼                              ▼
┌──────────────┐           ┌────────────────────┐
│ Capability   │           │ Action Scripts     │
│ Discovery    │           │ (Lights, Music,    │
│ Scripts      │           │ Shades, TV)        │
└──────┬───────┘           └─────────┬──────────┘
       │                             │
       ▼                             ▼
┌──────────────┐           ┌────────────────────┐
│ Speech Layer │◄──────────┤ Sonos Speak w/     │
│ (Unified)    │           │ Ducking            │
└──────┬───────┘           └────────────────────┘
       │
       ▼
┌──────────────────────────────┐
│ Output Channels              │
│ - Sonos (primary)            │
│ - Voice Assistant fallback   │
│ - ESPHome (visual signals)   │
└──────────────────────────────┘

Supporting Layer:
- Helpers (brightness, speaker profiles, posture)
- Integrations (lighting, media, sensors, AI)
- Dashboards (Fully Kiosk)

Part 2 — Objects Required to Implement the Concierge

This section documents the actual system components:
Automations (entry points), Scripts (engines), and Helpers (state storage).


1. Entry Automations (Voice + Orchestration)

Concierge – Voice Entry (HA Voice)

Purpose

  • Primary voice entry point
  • Resolves room via area_id(trigger.device_id)
  • Routes to discovery or action pipeline

Calls

  • Room & Audio Resolver (Keystone)
  • Room Abilities – Speak (Unified)
  • Follow-up command automations
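
As a rough illustration of the entry point's job (resolve the room, then route), here is a Python sketch. The real automation uses `area_id(trigger.device_id)` and Assist sentence triggers; the device-to-area table, the keyword lists, and the function names below are assumptions for the sake of the example.

```python
# Hypothetical device -> area table standing in for area_id(trigger.device_id).
DEVICE_AREAS = {"satellite_den": "den", "satellite_kitchen": "kitchen"}

# Assumed keyword lists; the real system matches voice sentences instead.
DISCOVERY = ("what can i do",)
AWARENESS = ("temperature", "humidity", "light level", "air quality")
ACTIONS = ("lamps", "lights", "music", "tv", "shades")


def resolve_room(device_id, explicit_room=None):
    """An explicitly passed room wins; otherwise use the device's area."""
    return explicit_room or DEVICE_AREAS.get(device_id)


def route(utterance, device_id, explicit_room=None):
    """Return (pipeline, room) for one stateless request."""
    room = resolve_room(device_id, explicit_room)
    if room is None:
        return ("unresolved", None)
    text = utterance.lower()
    if any(phrase in text for phrase in DISCOVERY):
        return ("discovery", room)
    if any(phrase in text for phrase in AWARENESS):
        return ("awareness", room)
    if any(phrase in text for phrase in ACTIONS):
        return ("action", room)
    return ("unknown", room)


print(route("What can I do here?", "satellite_den"))
```

Passing `explicit_room` mirrors the "explicitly passed" room context from the design principles, which is how a non-voice caller such as a dashboard button could reuse the same routing.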

Concierge – Follow-Up Commands (HA Voice)

Purpose

  • Handles stateless commands:
    • Lamps / lights / music / TV / shades
  • Uses “speak-first” + parallel execution pattern

Calls

  • Turn On Lamps / Lights – Usual
  • Music scripts
  • Shade scene triggers
  • TV control scripts

Concierge – Room Monitoring Awareness

Purpose

  • Handles awareness queries:
    • Temperature
    • Humidity
    • Light level
    • Air quality

Calls

  • Room Monitoring Abilities – Speak
  • Sensor-specific speak scripts

Concierge – Follow-Up Sensor Queries

Purpose

  • Dedicated handler for environmental questions

Concierge – Lighting Percentage Catcher

Concierge – Shade Percentage Catcher

Purpose

  • Interprets natural language like:
    • “Set lights to 30%”
    • “Shades to 50%”
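
The parsing step behind a percentage catcher can be sketched in a few lines of Python. This is an assumption about the logic, not the actual automation (which would rely on Assist sentence slots); the regular expression and the `parse_percentage` name are illustrative only.

```python
import re

# Target word, then the first standalone 1-3 digit number followed by % or "percent".
PERCENT_RE = re.compile(
    r"\b(lights|lamps|shades)\b.*?\b(\d{1,3})\s*(?:%|percent)",
    re.IGNORECASE,
)


def parse_percentage(utterance):
    """Return (target, percent) or None when nothing valid was said."""
    match = PERCENT_RE.search(utterance)
    if match is None:
        return None
    percent = int(match.group(2))
    if percent > 100:
        return None  # reject out-of-range values instead of guessing
    return match.group(1).lower(), percent


print(parse_percentage("Set lights to 30%"))
```

Returning `None` for anything ambiguous keeps the handler in line with "explicit beats clever": a request that cannot be parsed is ignored rather than approximated.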

Lighting Profile – Learn on Use

Purpose

  • Learns brightness from real usage
  • Triggers after state stabilization

Updates

  • input_number.<entity>_learned_brightness
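
One way to read "triggers after state stabilization" is: only learn a brightness that was held for some minimum time. The Python sketch below shows that rule on a list of change events. The event format, the `hold_seconds` threshold, and the choice to ignore a brightness of 0 (light off) are assumptions, not details taken from the real automation.

```python
def learned_brightness(events, hold_seconds, now):
    """Return the most recent brightness held for at least `hold_seconds`.

    events: list of (timestamp_seconds, brightness), sorted by time.
    now:    current time in seconds, used to age the last event.
    """
    learned = None
    for index, (timestamp, brightness) in enumerate(events):
        if index + 1 < len(events):
            held_until = events[index + 1][0]  # replaced by the next change
        else:
            held_until = now                   # still the current value
        # Assumption: "off" (0) is not a preference worth remembering.
        if brightness > 0 and held_until - timestamp >= hold_seconds:
            learned = brightness
    return learned


# Quick fiddling at 40 and 55 is ignored; 60 was held long enough to count.
events = [(0, 40), (2, 55), (3, 60), (100, 20)]
print(learned_brightness(events, hold_seconds=30, now=110))
```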

Learn Music Volume – On Sonos Change

Purpose

  • Captures room music volume into speaker profile

Concierge – Intentional Learning

Purpose

  • Only learns when explicitly requested:
    • “Remember this”
    • “That’s perfect”

Bedtime / Goodnight / Good Morning (Concierge)

Purpose

  • Room posture + environment control routines

Duck Sonos When Assist is Listening

Purpose

  • Improves voice interaction clarity

2. Core Scripts (Headless Engines)


Room & Audio Resolver (Keystone)

The single most important script

Responsibilities

  • Resolve room context
  • Detect Sonos presence
  • Select output mode (Sonos vs fallback)
  • Seed speaker profile
  • Determine:
    • has_lamps
    • has_lights
    • output_mode
    • scope
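
The resolver's output can be pictured as a single small record that every downstream script consumes. The Python sketch below assumes a shape for it; the field names come from the list above, while the label names, the default `scope` value of "room", and the function signature are illustrative guesses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoomContext:
    room: str
    has_lamps: bool
    has_lights: bool
    output_mode: str  # "sonos" or "fallback"
    scope: str        # meaning assumed here; "room" used as a placeholder


def resolve_room_context(room, labels_in_room, sonos_state, scope="room"):
    """Build the context from what actually exists in the room right now."""
    # A missing or unavailable player must never break speech output.
    sonos_ok = sonos_state not in (None, "unavailable", "unknown")
    return RoomContext(
        room=room,
        has_lamps="lamp" in labels_in_room,
        has_lights="lights" in labels_in_room,
        output_mode="sonos" if sonos_ok else "fallback",
        scope=scope,
    )


print(resolve_room_context("den", {"lamp", "lights"}, "playing"))
```

Computing this once per request, rather than caching it, is what keeps each command stateless: the answer always reflects the room as it is at that moment.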

Room Abilities – Speak (Unified)

Purpose

  • Answers: “What can I do here?”

Behavior

  • Discovers capabilities in real-time:
    • Lamps vs Lights
    • Shades
    • Music
    • TV
  • Speaks only what exists
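
"Speaks only what exists" boils down to building the sentence from discovered capabilities rather than from a fixed script. A minimal Python sketch follows; the wording of the phrases and the `abilities_sentence` name are made up for illustration.

```python
# Capability key -> what the user can say. The phrasing is an assumption.
PHRASES = [
    ("lamps", "turn on the lamps"),
    ("lights", "turn on the lights"),
    ("shades", "open the shades"),
    ("music", "play music"),
    ("tv", "turn on the TV"),
]


def abilities_sentence(capabilities):
    """Mention only the capabilities that are present and truthy."""
    options = [phrase for key, phrase in PHRASES if capabilities.get(key)]
    if not options:
        return "There is nothing I can control in this room."
    if len(options) == 1:
        listing = options[0]
    else:
        listing = ", ".join(options[:-1]) + " or " + options[-1]
    return f"In this room you can {listing}."


print(abilities_sentence({"lamps": True, "shades": True, "music": True}))
```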

Sonos Speak with Ducking (Room)

Purpose

  • Central speech system for the home

Behavior

  • Ducks music
  • Speaks message
  • Restores playback
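
The duck, speak, restore sequence is essentially a guarded block: whatever happens while speaking, the original volume comes back. Here is a Python sketch of that shape using a fake player. `FakePlayer`, `ducked`, and the 0.25 duck level are assumptions; the real script calls Sonos services instead.

```python
from contextlib import contextmanager


class FakePlayer:
    """Hypothetical stand-in for a Sonos player; records what was done."""

    def __init__(self, volume, playing):
        self.volume = volume
        self.playing = playing
        self.log = []

    def set_volume(self, volume):
        self.volume = volume
        self.log.append(("volume", volume))

    def say(self, message):
        self.log.append(("say", message))


@contextmanager
def ducked(player, duck_volume):
    """Lower the volume while music plays; always restore it afterwards."""
    original = player.volume
    if player.playing:
        player.set_volume(min(original, duck_volume))  # never raise the volume
    try:
        yield player
    finally:
        if player.volume != original:
            player.set_volume(original)


def speak(player, message, duck_volume=0.25):
    with ducked(player, duck_volume):
        player.say(message)


player = FakePlayer(volume=0.6, playing=True)
speak(player, "The lamps are on.")
print(player.log)
```

Putting the restore in `finally` is the point of the sketch: a failed announcement should not leave the room stuck at the ducked volume.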

Voice Assistant Speak (Fallback)

Purpose

  • Used when Sonos is unavailable or unsuitable

Turn On Lamps – Usual (Room)

In our home, a label on every lamp distinguishes lamps from the overhead lights.

Purpose

  • Restores learned brightness per lamp

Turn On Lights – Usual (Room)

In our home, lights are the built-in overhead fixtures in the ceiling (art lights, overhead lights, etc.). We label them 'Lights' to distinguish them from lamps.

Purpose

  • Applies learned brightness across all luminaires

Learn Lighting – Usual (Room)

Purpose

  • Explicit learning of lighting posture

Resolve Music Player – Room

Purpose

  • Determines correct Sonos player

Music – Update Last Media (Room)

Purpose

  • Persists playback context

Music – Determine Genre

Purpose

  • Identifies current playback genre

Play Genre / Artist / Album – Room

Purpose

  • Media playback via Music Assistant

Start Music – Usual (Room)

Purpose

  • Deterministic music startup

Continue Playing – Room

Purpose

  • Resume previous playback

Speak Room Temperature / Humidity / Air Quality / Light Level

Purpose

  • Environment awareness responses

Set Room Posture

Purpose

  • Sets Daytime vs Overnight behavior

Kiosk Set Day Mode / Overnight Mode

Purpose

  • Controls dashboard state

Sonos Snapshot + Alarm Chime

Purpose

  • Protect and restore audio state for alerts

3. Helpers (State + Memory Layer)


Lighting Helpers (Per Device)

input_number.<entity>_learned_brightness

Used for

  • Restoring “usual” lighting state

Speaker Profile (Per Room)

input_text.<room>_speaker_profile

Stores

{"chat":0.45,"music":0.60,"duck":0.25}
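
Since the profile is JSON stored in a text helper, reading it defensively is what keeps a missing or corrupted helper from breaking speech. The Python sketch below shows one way to do that. Using the example values above as defaults and restricting volumes to the 0.0 to 1.0 range are assumptions; note also that `input_text` values are limited to 255 characters, so the profile has to stay compact.

```python
import json

# Assumed defaults, taken from the example profile shown above.
DEFAULT_PROFILE = {"chat": 0.45, "music": 0.60, "duck": 0.25}


def load_profile(raw):
    """Parse a stored profile, falling back to defaults key by key."""
    profile = dict(DEFAULT_PROFILE)
    try:
        data = json.loads(raw) if raw else {}
    except (TypeError, ValueError):  # helper missing, "unknown", or garbled
        data = {}
    if isinstance(data, dict):
        for key in DEFAULT_PROFILE:
            value = data.get(key)
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if is_number and 0.0 <= value <= 1.0:
                profile[key] = float(value)
    return profile


print(load_profile('{"chat":0.5,"music":"loud"}'))
```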

Room Posture

input_select.room_posture_<room>

Values

  • Daytime
  • Overnight

Audit / Observability

input_datetime.lamp_brightness_profile_last_updated
input_datetime.speaker_profile_last_updated

Music Helpers

  • Genre counters
  • Last media tracking
  • Genre determination text

Part 3 - Concierge Call Graph (Automations → Scripts → Helpers)


1. Primary Voice Entry Flow

Automation: Concierge – Voice Entry (HA Voice)
    ↓
Script: Room & Audio Resolver (Keystone)
    ↓
    ├── IF discovery intent ("what can I do here?")
    │       ↓
    │   Script: Room Abilities – Speak (Unified)
    │       ↓
    │   Script: Sonos Speak with Ducking (Room)
    │       OR
    │   Script: Voice Assistant Speak (Fallback)
    │
    ├── IF action intent (lamps / lights / music / TV / shades)
    │       ↓
    │   Automation: Concierge – Follow-Up Commands
    │
    └── IF awareness intent (temperature / humidity / light level / air quality)
            ↓
        Automation: Concierge – Room Monitoring Awareness

2. Follow-Up Command Flow (Core Interaction Engine)

Automation: Concierge – Follow-Up Commands
    ↓
Script: Room & Audio Resolver (Keystone)
    ↓
    ├── Lighting
    │     ↓
    │   Script: Turn On Lamps – Usual (Room)
    │   Script: Turn On Lights – Usual (Room)
    │
    ├── Shades
    │     ↓
    │   Scene Mapping (PowerView Scenes)
    │
    ├── Music
    │     ↓
    │   Script: Resolve Music Player – Room
    │        ↓
    │   Script: Start Music – Usual (Room)
    │   OR
    │   Script: Play Genre / Artist / Album (Room)
    │
    ├── TV
    │     ↓
    │   Apple TV Service Calls
    │
    ↓
(Speak-first pattern)
    ↓
Script: Sonos Speak with Ducking (Room)
    OR
Script: Voice Assistant Speak (Fallback)

3. Awareness / Sensor Query Flow

Automation: Concierge – Room Monitoring Awareness
    ↓
Script: Room & Audio Resolver (Keystone)
    ↓
Script: Room Monitoring Abilities – Speak
    ↓
    ├── Script: Speak Room Temperature
    ├── Script: Speak Room Humidity
    ├── Script: Speak Room Light Level
    ├── Script: Speak Room Air Quality
    │       ↓
    │   (Optional)
    │   OpenAI Enrichment (Air-Q interpretation)
    ↓
Script: Sonos Speak with Ducking (Room)
    OR
Script: Voice Assistant Speak (Fallback)

4. Lighting Learning Flow (Passive Memory System)

Automation: Lighting Profile – Learn on Use
    ↓
(Triggered by brightness changes + stabilization delay)
    ↓
Update Helper:
    input_number.<entity>_learned_brightness
    ↓
Update Helper:
    input_datetime.lamp_brightness_profile_last_updated

5. Intentional Learning Flow (Explicit Only)

Automation: Concierge – Intentional Learning
    ↓
Script: Learn Lighting – Usual (Room)
        ↓
    Update:
        input_number.<entity>_learned_brightness

    OR

Script: Learn Music Volume – Room
        ↓
    Update:
        input_text.<room>_speaker_profile (music value only)
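
The "music value only" update can be sketched as a read, modify, write on the stored JSON that leaves the other keys untouched. This Python version is an illustration under assumptions: the function name, the clamping to 0.0 to 1.0, and the rounding to two decimals are not taken from the real script.

```python
import json


def learn_music_volume(raw_profile, volume):
    """Return the profile JSON with only the "music" value replaced."""
    try:
        profile = json.loads(raw_profile)
    except (TypeError, ValueError):
        profile = {}  # unreadable helper: start from an empty profile
    if not isinstance(profile, dict):
        profile = {}
    # Clamp and round so the stored text stays short and valid.
    profile["music"] = round(min(max(volume, 0.0), 1.0), 2)
    return json.dumps(profile, separators=(",", ":"))


stored = '{"chat":0.45,"music":0.60,"duck":0.25}'
print(learn_music_volume(stored, 0.52))
```

Compact separators keep the serialized profile well under the 255-character limit of an `input_text` helper.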

6. Music Playback State Tracking

Automation: Music Genre – Observe Playback (All Sources)
    ↓
Script: Music – Determine Genre
    ↓
Update Helpers:
    counter.music_genre_<genre>

---

Automation: Music – Capture Last Media (Robust)
    ↓
Script: Music – Update Last Media (Room)
    ↓
Update Helpers:
    last media context (title / artist / source)

7. Continue / Resume Music Flow

Automation: Voice – Continue Playing
    ↓
Script: Resolve Music Player – Room
    ↓
Script: Continue Playing – Room
    ↓
Script: Music – Update Last Media (Room)

8. Speech Pipeline (Global Pattern)

ANY SCRIPT that needs to speak
    ↓
Script: Room & Audio Resolver (Keystone)
    ↓
    ├── IF Sonos available
    │       ↓
    │   Script: Sonos Speak with Ducking (Room)
    │       ↓
    │       ├── Snapshot volume
    │       ├── Duck music
    │       ├── Speak message
    │       └── Restore + resume
    │
    └── ELSE
            ↓
        Script: Voice Assistant Speak (Fallback)

9. Bedtime / Goodnight / Good Morning (Experience Flow)

Automation: Bedtime / Goodnight / Good Morning
    ↓
Script: Concierge Bedtime / Goodnight / Good Morning
    ↓
    ├── Lighting Scripts (Usual / Dim / Off)
    ├── Shade Scenes (PowerView)
    ├── Music Control (Start / Stop / Pink Noise)
    ├── Script: Set Room Posture
    │        ↓
    │   Update:
    │       input_select.room_posture_<room>
    │
    ├── Script: Kiosk Set Mode
    │        ↓
    │   Fully Kiosk commands
    │
    └── Script: Sonos Speak with Ducking (Room)

10. Sonos Listening Etiquette Flow

Automation: Duck Sonos When Any Assist Satellite Is Listening
    ↓
Script: Duck Sonos While Assist Is Listening (Room)

11. Dashboard / Button Path (Non-Voice Entry)

Dashboard Button (Fully Kiosk)
    ↓
Script Call (same engines as voice)
    ↓
Script: Room & Audio Resolver (Keystone)
    ↓
Same execution path as Follow-Up Commands

12. Messaging / Visual Signals (ESPHome Layer)

Automation: Message Event (e.g., Laundry Complete)
    ↓
Presence Resolution (which room to notify)
    ↓
    ├── Script: Sonos Speak with Ducking (Room)
    │
    └── ESPHome Device
            ↓
        Set Ring Color (status indication)
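
Presence-aware routing can be thought of as turning one message into a list of per-room actions. The Python sketch below assumes that occupied rooms get a visual signal and, where a Sonos exists, a spoken message as well; the data shapes, the action tuples, and the default ring color are all hypothetical.

```python
def route_message(message, presence, sonos_rooms, ring_color="green"):
    """Return the actions to run, limited to rooms where someone is present.

    presence:    dict of room -> bool (occupied or not)
    sonos_rooms: set of rooms that have a Sonos player
    """
    actions = []
    for room, occupied in presence.items():
        if not occupied:
            continue  # silence is success: empty rooms get nothing
        if room in sonos_rooms:
            actions.append(("speak", room, message))
        actions.append(("ring", room, ring_color))
    return actions


presence = {"den": True, "kitchen": False, "office": True}
for action in route_message("Laundry is complete", presence, {"den", "kitchen"}):
    print(action)
```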

Mental Model Summary (Why This Works)

This call graph collapses into 3 core truths:

1. Everything flows through the Keystone

  • Room resolution
  • Output channel selection
  • Capability awareness

2. Automations trigger, Scripts execute

  • Automations = entry + orchestration
  • Scripts = reusable engines

3. Helpers only store human intent

  • Brightness
  • Volume
  • Room posture

Part 4 — Appendix: Integrations Used


Lighting & Power

  • Insteon
    • All lights and switches
    • Enables dimming + learned brightness
    • Labels used to identify Lamps and Lights (Overhead)

Presence & Environment

  • Aqara Multi-Sensors
    • Presence, humidity, light level, temperature
  • Air-Q
    • Air quality, noise level
  • HomeKit
    • Bridge for Aqara + leak sensors

Audio & Media

  • Sonos
    • Primary audio output
  • Music Assistant
    • Music orchestration
  • Apple TV Integration
    • TV detection + control

Voice Intelligence

  • HA Voice (Assist)
    • Local voice input
  • OpenAI + OpenAI TTS
    • Fallback intelligence
    • Air quality interpretation
    • Music genre classification

Dashboards & UI

  • Fully Kiosk Browser
    • Dashboard control
  • Media Index
    • Apple Shared Album source for rolling images/video

Security & Access

  • Unifi Protect
    • Cameras
  • Unifi Access
    • Door + gate control

Task Management

  • Microsoft 365 To Do
    • Shared shopping list

Messaging & Visual Signals

  • ESPHome
    • Voice assistant LED ring control
    • Visual cues (e.g., laundry complete)
  • Presence-aware message routing

My overall objective

This system is not a collection of automations.

It is a translation layer:

Integrations → Capabilities → Experiences

By separating:

  • discovery from action
  • action from voice
  • voice from devices

…the system becomes:

  • deterministic
  • scalable
  • understandable
  • and adaptable to any environment