Speech-to-Phrase Docker Compose

I am having a lot of difficulty setting up a working Docker Compose configuration for Speech-to-Phrase. Would anyone be able to post theirs? I think my trouble is with the command arguments and the token. Should the URL be the external or the internal one?

I think I used Piper TTS.

I will edit this post to add my compose file.

EDIT

services:

##########################################
#             OpenWakeWord               #
##########################################
  wakeword:
    container_name: openwakeword
    hostname: openwakeword
    restart: unless-stopped
    image: rhasspy/wyoming-openwakeword  
#    ports:
#      - 10400:10400 # I do not expose the port; HA connects over the proxylocal Docker network
    volumes:
      - /yourserverfolder/custom:/custom
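    # These are command-line flags, not environment variables, so they go in `command`.
    # --custom-model-dir takes the path inside the container (the /custom mount above).
    command: --preload-model 'ok_nabu' --custom-model-dir /custom # I trained a custom "alexa" wake word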
    networks:
      proxylocal:

##########################################
#                TTS-Piper               #
##########################################
  pipertts:
    container_name: pipertts
    hostname: pipertts
    restart: "unless-stopped"
    image: rhasspy/wyoming-piper    
    volumes:
       - /yourserverfolder/data:/data/
#    ports:
#      - "10200:10200" # I do not expose the port; HA connects over the proxylocal Docker network
    command: --voice en_US-lessac-medium
    environment:
      PGID: 1015 # group I created to run the container under
      PUID: 1015 # user I created to run the container under
    networks:
      proxylocal: # isolated network created for container-to-container communication
      proxywan: # only for containers that need WAN access

##########################################
#              NETWORKS                  #
##########################################      
networks:
  proxylocal:
    external: true
  proxywan:
    external: true

It's not shown in the compose file above, but I am also using linuxserver.io's faster-whisper for speech-to-text.

Speech-to-Phrase would be a substitute for Whisper STT. I think Piper is TTS?

This configuration works for me:

services:
  speech-to-phrase:
    image: rhasspy/wyoming-speech-to-phrase
    container_name: speech-to-phrase
    ports:
      - "10300:10300"
    volumes:
      - /mnt/docker-data/appdata/speech-to-phrase/models:/models
      - /mnt/docker-data/appdata/speech-to-phrase/train:/train
    command:
      --hass-websocket-uri 'ws://homeassistant.local:8123/api/websocket'
      --hass-token '<LONG_LIVED_ACCESS_TOKEN>'
      --retrain-on-start
    restart: unless-stopped

Note: the restart policy is up to you.
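On the token question, a minimal sketch of one way to keep the token out of the compose file, relying on Compose's `${VAR}` interpolation. `HASS_TOKEN` is a hypothetical variable name, assumed to be defined in a `.env` file next to the compose file; the URL just needs to be an address the container can actually reach.

```yaml
# .env (same directory as the compose file), hypothetical variable name:
#   HASS_TOKEN=<LONG_LIVED_ACCESS_TOKEN>
services:
  speech-to-phrase:
    image: rhasspy/wyoming-speech-to-phrase
    command:
      --hass-websocket-uri 'ws://homeassistant.local:8123/api/websocket'
      --hass-token '${HASS_TOKEN}'
      --retrain-on-start
```

Running `docker compose config` prints the file with variables substituted, which is a quick way to check that the token was picked up.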


Hi All,

Should the Docker containers (HA, TTS, and STP, i.e. Speech-to-Phrase) be started in a particular order?

I seem to have a reproducible problem: when STP is started at the same time as HA, it won’t recognise most of my commands/devices. The built-in sentences work fine. Restarting only STP fixes it.

I tried adding a health check to HA, which pauses the startup of STP a little, but apparently not for long enough?

Should there be a shared config directory?

I’m using:
image: rhasspy/wyoming-speech-to-phrase:1.4.1
image: ghcr.io/home-assistant/home-assistant:2025.10.2
image: rhasspy/wyoming-piper:1.6.2

My compose file:

services:
  homeassistant:
    container_name: homeassistant
    image: "ghcr.io/home-assistant/home-assistant:2025.10.2"
    volumes:
      - /home/ron/docker/certbot/data/certbot/:/certs
      - /home/ron/docker/home-assistant/config/:/config
      - /etc/localtime:/etc/localtime:ro
      - havol:/data/
    restart: unless-stopped
    network_mode: host
    healthcheck:
      test: ["CMD", "curl", "-f", "https://myha.B.B:8123"]
      interval: 60s
      timeout: 5s
      retries: 5
      start_period: 20s


  rhasspy-speech:
    container_name: rhasspy-speech
    image: rhasspy/wyoming-speech-to-phrase:1.4.1
    restart: unless-stopped
    depends_on:
      homeassistant:
        condition: service_healthy
        restart: true
    volumes:
      - "./rhasspy-speech-config/models:/models"
      - "./rhasspy-speech-config/train:/train"
    ports:
      - "10300:10300"
      - "8099:8099"
    command: '--hass-websocket-uri wss://myha.B.B:8123/api/websocket --hass-token BBB --retrain-on-start'


volumes:
  havol:
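
A possible explanation, offered as a guess rather than a diagnosis: the `curl -f` check passes as soon as Home Assistant answers HTTP, which may be before every integration has finished loading its entities. Also, `start_period` does not postpone a passing check; it only keeps early failures from counting against `retries`. Docker runs the first probe one `interval` after the container starts, so a minimal sketch for testing this theory is to lengthen `interval`. The 180s value is an arbitrary assumption to tune, not a recommendation.

```yaml
services:
  homeassistant:
    healthcheck:
      test: ["CMD", "curl", "-f", "https://myha.B.B:8123"]
      interval: 180s     # assumption: first probe, and therefore STP startup, waits about 3 minutes
      timeout: 5s
      retries: 5
      start_period: 20s  # does not delay a passing check
```

The trade-off is that a later outage also takes longer to show up as unhealthy, so this is more of a diagnostic step than a permanent fix.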