Atom Echo with Claude and Continuous Conversation

Voice Assistant with Claude CLI, Custom Wake Word & Continuous Conversation. Yes, I asked Claude to reformat my post below; it makes it easier to follow, even if the AI formatting is obvious! :slight_smile:

Here's my voice assistant setup: Claude CLI (not the API) as the conversation agent, a custom-trained wake word, streaming TTS to Sonos/Google Home speakers, and continuous conversation.


Hardware & Software

| Component | Details |
|-----------|---------|
| Server | Mac Mini M1 16GB, Asahi Linux (Fedora) |
| Microphones | 5x M5Stack ATOM Echo (ESP32) |
| Speakers | Sonos Play:1 (3x), Google Home (2x) |
| Wake Word | Custom “Jarvis” via microWakeWord - runs on-device |
| STT | Google Cloud Speech-to-Text (~1s) |
| Conversation | Claude Haiku via Claude Code CLI in Docker |
| TTS | Piper (en_US-ryan-high) - ONNX Runtime CPU, ~200ms/sentence |

Flow: ATOM Echo → Wake Word (on-device) → HA → Google Cloud STT → Claude CLI → Piper TTS → HTTP stream → Sonos/Google Home


Key Features

1. Custom Wake Word

Trained using the microWakeWord framework; runs entirely on the ESP32. Tune probability_cutoff per room (0.5-0.7) - lower = more sensitive.
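In ESPHome terms, the per-room tuning lives in the micro_wake_word block. A hypothetical sketch (the model filename is an example, and exact keys vary by ESPHome version):

```yaml
# ESPHome config fragment (illustrative, not my exact file)
micro_wake_word:
  models:
    - model: jarvis.json         # custom-trained wake word model
      probability_cutoff: 0.6    # 0.5-0.7; lower = more sensitive, more false triggers
```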

2. Perceived Latency Reduction

Play “One moment” via TTS immediately after STT completes. Masks Claude’s processing time. Only works for Google Home - Sonos buffering (~2-3s) causes the ack to get preempted, so I skip it for Sonos.

3. Sentence-Level Streaming

Claude streams response → SentenceBuffer splits on .!? → each sentence immediately sent to Piper → WAV hosted on HTTP server → played on speaker. Speaker starts talking while Claude is still generating.
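The sentence-splitting step above can be sketched as a small buffer class. The name SentenceBuffer comes from the post; the internals here are my assumptions (split after ., ! or ? followed by whitespace, flush the remainder at end of stream):

```python
import re

class SentenceBuffer:
    """Accumulate streamed text deltas and emit complete sentences."""

    def __init__(self):
        self._buf = ""

    def feed(self, delta: str) -> list[str]:
        """Add a text delta; return any complete sentences seen so far."""
        self._buf += delta
        sentences = []
        # A sentence ends at ., ! or ? followed by whitespace.
        while match := re.search(r"[.!?]\s+", self._buf):
            sentences.append(self._buf[: match.end()].strip())
            self._buf = self._buf[match.end():]
        return sentences

    def flush(self) -> str:
        """Return whatever remains when the stream ends (final sentence)."""
        rest, self._buf = self._buf.strip(), ""
        return rest
```

Each sentence returned by feed() goes straight to Piper, so the speaker starts talking on the first complete sentence.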

4. Sonos Queue-Based Playback

Sonos buffering caused the second sentence to overwrite the first before it played. Fix: use the SoCo library directly - the first sentence clears the queue and starts playback; subsequent sentences just add to the queue, and Sonos plays them in sequence.
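The queue logic boils down to a few SoCo calls. A minimal sketch (speaker is a soco.SoCo instance; discovery and error handling omitted, and the WAV URL scheme is an assumption):

```python
def play_sentence(speaker, wav_url: str, first: bool) -> None:
    """Queue-based Sonos playback: clear-and-play for the first
    sentence, append-only for the rest.

    `speaker` is expected to be a soco.SoCo instance (pip install soco);
    `wav_url` points at a WAV hosted by the adapter's HTTP server.
    """
    if first:
        # First sentence: wipe any stale queue, enqueue, start playing.
        speaker.clear_queue()
        speaker.add_uri_to_queue(wav_url)
        speaker.play_from_queue(0)
    else:
        # Later sentences just append; Sonos plays them in sequence.
        speaker.add_uri_to_queue(wav_url)
```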

5. Auto Speaker Routing

Detects which ATOM Echo triggered (checks satellite state in HA) and routes audio to the corresponding room’s speaker automatically.
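The routing itself can be as simple as a satellite-to-speaker map keyed on the entity whose state changed. A sketch with hypothetical entity IDs (mine differ):

```python
# Hypothetical mapping: these entity IDs are illustrative examples.
SATELLITE_TO_SPEAKER = {
    "assist_satellite.atom_echo_kitchen": "media_player.sonos_kitchen",
    "assist_satellite.atom_echo_office": "media_player.google_home_office",
}

def speaker_for(satellite_entity_id: str) -> str:
    """Return the media_player for the room whose satellite triggered."""
    speaker = SATELLITE_TO_SPEAKER.get(satellite_entity_id)
    if speaker is None:
        raise KeyError(f"no speaker mapped for {satellite_entity_id}")
    return speaker
```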

6. Continuous Conversation

After response finishes, ATOM Echo listens for 10 more seconds without wake word:

  1. Adapter calculates TTS duration, waits for playback

  2. Satellite state: processing → idle

  3. HA automation triggers ESPHome start_listening service

  4. 10s timeout script - cancelled if user speaks, otherwise returns to wake word mode

Critical: trigger the automation on from: processing, to: idle (not just to: idle) to avoid loops.
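The from/to guard in HA automation terms might look like this (entity and service names are hypothetical examples):

```yaml
# HA automation sketch (illustrative); the from: guard prevents retrigger loops
automation:
  - alias: "Continuous conversation: re-open mic"
    trigger:
      - platform: state
        entity_id: assist_satellite.atom_echo_kitchen
        from: processing   # guarding on BOTH states avoids loops
        to: idle
    action:
      - service: esphome.atom_echo_kitchen_start_listening
```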

7. Claude CLI Streaming

Spawns Claude with: --model haiku --continue --allowedTools WebSearch,WebFetch,Bash --output-format stream-json --include-partial-messages

  • --continue resumes conversation context

  • --allowedTools enables web search + hass-cli for HA control

  • Stream JSON parsed for content_block_delta events with text
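Putting the flags and the parsing together, a sketch of the spawn-and-parse loop (the exact JSON nesting of --include-partial-messages events is my assumption; the post only says content_block_delta events carry the text):

```python
import json
import subprocess

def extract_text(line: str) -> str:
    """Pull the text delta out of one stream-json line, or return ""."""
    event = json.loads(line)
    if event.get("type") == "stream_event":
        inner = event.get("event", {})
        # content_block_delta events carry the streamed text.
        if inner.get("type") == "content_block_delta":
            return inner.get("delta", {}).get("text", "")
    return ""

def stream_claude(prompt: str):
    """Spawn Claude CLI with the flags above and yield text deltas."""
    proc = subprocess.Popen(
        ["claude", "-p", prompt,
         "--model", "haiku", "--continue",
         "--allowedTools", "WebSearch,WebFetch,Bash",
         "--output-format", "stream-json",
         "--include-partial-messages"],
        stdout=subprocess.PIPE, text=True,
    )
    for line in proc.stdout:
        if line.strip():
            if text := extract_text(line):
                yield text
    proc.wait()
```

Deltas from stream_claude() feed straight into the sentence buffer, so TTS starts before the full response exists.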

8. Conversation Persistence

Claude CLI runs continuously in Docker with --continue flag. Follow-up questions work naturally, no cold start after first interaction.


Docker Setup

Container includes:

  • Python 3.11 + Node.js 20 (for Claude CLI)

  • @anthropic-ai/claude-code (npm)

  • homeassistant-cli (pip) for HA control via Bash tool

  • wyoming, aiohttp, soco, pyyaml

Key config:

  • network_mode: host - required for SoCo Sonos discovery

  • Named volume for /root/.claude - persists Claude auth

  • Wyoming server on port 10500
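As a compose fragment, the key config above might look like this (service and volume names are examples, not my actual file):

```yaml
# docker-compose sketch (illustrative)
services:
  claude-adapter:
    build: .
    network_mode: host              # SoCo's Sonos discovery needs the host network
    volumes:
      - claude-auth:/root/.claude   # persist Claude CLI auth across rebuilds
    # Wyoming server listens on 10500; host networking, so no ports: mapping
volumes:
  claude-auth:
```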

Piper TTS: Standard rhasspy/wyoming-piper image, CPU-only ONNX Runtime. No MLX (macOS only) or GPU accel on Asahi Linux, but M1 CPU is fast enough.


Latency Breakdown

| Stage | Latency |
|-------|---------|
| Wake word | <100ms (on-device) |
| STT | ~1s (Google Cloud) |
| Claude | 2-4s (masked by ack) |
| TTS | ~200ms/sentence |
| Speaker | Immediate (Google) / 2-3s buffer (Sonos, solved with queue) |


To Improve

  1. Local STT - Looking for a fast, accurate alternative for the M1. Whisper.cpp works but is less accurate than Google Cloud.

  2. Sonos AudioClip API - Lower latency possible but developer portal is not working.

  3. Wake word tuning - Large rooms need a lower probability_cutoff, but too low = false triggers.