[Voice] Multiple Voice Assistants (>1 Wake-word and Pipeline)

It would be awesome if we could have multiple voice assistants in operation simultaneously. So if I say “Ok Nabu” I get a Nabu Casa pipeline for controlling the house, and if I say “OK Jarvis” I get a ChatGPT pipeline. Others have described how it would be useful to have different languages for bilingual households.
This actually briefly almost worked by mistake (at least the openWakeWord side did): Wyoming ignores second assistant’s wake word · Issue #101942 · home-assistant/core · GitHub.

Provided the hardware is capable of running multiple wake-word models, it would seem trivial to implement: just mandate that each active pipeline has a different wake word, and once a wake word is detected, forward the speech to the appropriate pipeline/Wyoming endpoint.

Related:

With 2 pieces of hardware you can do it now.

I am aware, as you can select the pipeline for each piece of hardware, but why not allow more than one on a single piece of hardware?

This would be extremely helpful.
I’d like to have this possibility to use different wake words for different languages.

Yeah we need this.

As far as my testing goes, we can only use one language per voice satellite.

Some are reporting success using the Wyoming satellite function on Linux devices, e.g. Raspberry Pi Zeros.

I’d love to see this. I’ve actually bought several of those Atom Echo devices and plugged them into USB outlets in most of my rooms, and they work decently enough for generic house control.

Recognizing different wake words on the same hardware would allow a different pipeline to be associated with each one, so requests could be routed to something like my Ollama instance for general LLM interaction.

Adding my comment.

I think this is useful for safety and not only for “fun”: I have three mini-mes running around my apartment, and the oldest is already capable of asking Google to start music, lights, etc.

I also have electric shutters in my house, and I think these should be usable only by adults (it’s not fun to be locked outside). Having multiple wake words could be a good solution.

For example: “Hey Jarvis, lower the shutters” would work for me and my wife, but should not be available via my daughters’ wake word (“Hey Casita”, for example, after the Disney movie Encanto).

I think this is a great use case, but not for multiple wake words, because it won’t take your kids long to work out that they can also give Jarvis commands.
Instead this seems ideal for Voice Assist to recognise the speaker.

Sounds as if this feature is planned. Mentioned in the latest update at 11:48 and again here:

I’d like to suggest that support for multiple wakewords could be used for more than just running multiple pipelines.

Commercial voice assistant smart speakers like Google Home or Alexa can cancel/stop alarms, or stop playing music, when the user says “Stop”. Home Assistant’s voice assistant should also have this capability.

What if we were to support multiple wakewords, and use “Stop” as a secondary wakeword that could stop timers, and perhaps send commands to HA to stop playback on media devices?

With some help from Gemini I was finally able to do it.

Step 1: ALSA config to split the input device. In my case it is a Poly CL-3200M speakerphone.

# /etc/asound.conf

# Define the shared microphone input using dsnoop
pcm.mic_shared {
    type dsnoop             # Use the dsnoop plugin for sharing capture
    ipc_key 1024           # Unique IPC key for this dsnoop instance
                           # (must be different from any dmix ipc_key)
    ipc_key_add_uid false  # Set to false if accessed by different users/root
                           # Set to true if only accessed by the same user
                           # For Docker with host devices, false might be safer initially

    slave {
        pcm "hw:P3200M,0"  # The actual hardware device from arecord -l
        channels 1         # We want mono input for voice
        rate 16000         # Your desired sample rate (e.g., 16kHz for voice)
        format "S16_LE"    # 16-bit signed little-endian (common for voice)
        period_size 1024   # ALSA period size in frames
        buffer_size 8192   # ALSA buffer size in frames (e.g., 4-8 periods)
                           # Larger buffers can help prevent xruns if RT is an issue
                           # but increase latency.
    }

    # Optional: If your P3200M is stereo but you only want the left channel
    # (assuming channel 0 is left, channel 1 is right for the hardware)
    # If P3200M is mono, this bindings section is not strictly needed
    # but doesn't hurt. If it's stereo and you want to pick one channel,
    # make sure 'channels 1' is above in the slave {} block.
    # bindings {
    #    0 0  # Bind channel 0 of this mic_shared device to channel 0 of the slave (hw:P3200M,0)
    # }
}

# Optional: If you also want to make this shared mic the default ALSA capture device
# (Be cautious if other applications expect a different default)
# pcm.default {
#    type asym
#    playback.pcm "plughw:P3200M,0" # Or your preferred default playback
#    capture.pcm "mic_shared"
# }
# ctl.default {
#    type hw
#    card "P3200M"
# }

# If you want a simple default playback to your P3200M speaker for testing
# (used by your Docker 'aplay' command later if you change -D)
pcm.!default {
    type plug
    slave.pcm "hw:P3200M,0" # This assumes P3200M is also your desired speaker
                            # If not, use the correct playback device here
}

# This ensures control elements also default to P3200M
ctl.!default {
    type hw
    card "P3200M"
}

Now I can run one openWakeWord instance and as many Wyoming satellites as I like.
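
A quick sanity check that the dsnoop device really allows concurrent capture: run two recordings at the same time on the host (assuming alsa-utils is installed there; the filenames are just placeholders). Both should finish without a “device busy” error.

# Terminal 1
arecord -D mic_shared -r 16000 -f S16_LE -c 1 -d 5 /tmp/dsnoop_test1.wav

# Terminal 2, started while the first recording is still running
arecord -D mic_shared -r 16000 -f S16_LE -c 1 -d 5 /tmp/dsnoop_test2.wav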

Step 2: patch wyoming-satellite to also install alsa-utils:

In the Dockerfile:

apt-get install --yes --no-install-recommends avahi-utils alsa-utils && \
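
To confirm the patched image actually contains alsa-utils, you can rebuild it and call arecord directly as the entrypoint (the image tag here is arbitrary; ./wyoming-satellite is the same build context used in the compose file below):

# rebuild the patched satellite image
docker build -t wyoming-satellite-alsa ./wyoming-satellite
# print arecord's version from inside the image; failure means alsa-utils is missing
docker run --rm --entrypoint arecord wyoming-satellite-alsa --version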

Step 3: in docker-compose, pass through /dev/snd AND set ipc: host (dsnoop shares the capture buffer between processes via SysV IPC shared memory, so the containers need the host’s IPC namespace):

networks:
  voice_services_net: # Define our custom network

services:
  oww:
    image: rhasspy/wyoming-openwakeword
    container_name: oww
    hostname: oww
    user: "1000:1000" # current user
    restart: always
    volumes:
      - ./openwakeword:/config
    command: --preload-model 'hey_chewbacca' --custom-model-dir /config --debug
    networks: # Assign to our custom network
      - voice_services_net
    ports:
     - "10400:10400" # Expose oww's port 10400 for HA.

  sat-chewbacca:
    build:
      context: ./wyoming-satellite # Path to the directory containing the Dockerfile
      dockerfile: Dockerfile
    container_name: sat-chewbacca
    hostname: sat-chewbacca
    ipc: host
    privileged: true # Still likely needed for easy /dev/snd access.
                     # Alternatives involve specific cgroup device permissions.
    devices:
      - "/dev/snd:/dev/snd"
    networks:
      - voice_services_net
    ports: # Expose for HA
      - "10200:10200"
    volumes:
      - ./chewbacca_recordings:/recordings
      - /etc/asound.conf:/etc/asound.conf:ro
    command:
      - "--uri"
      - "tcp://0.0.0.0:10200" # Satellite listens on its port 10200
      - "--name"
      - "dining-chewbaa"
      - "--zeroconf-name"
      - "dining-chewbaa"
      - "--mic-command"
      - "arecord -D mic_shared -r 16000 -f S16_LE -c 1 -t raw"
      - "--snd-command"
      - "aplay -r 22050 -c 1 -f S16_LE -t raw -D plughw:CARD=P3200M,DEV=0 --nonblock"
      - "--debug"
      # - "--debug-recording-dir"
      # - "/recordings"
      - "--wake-word-name"
      - "hey_chewbacca"
      - "--wake-uri"
      - "tcp://oww:10400" # This will resolve to the oww container on voice_services_net

  sat-mendeleev:
    build:
      context: ./wyoming-satellite
      dockerfile: Dockerfile
    container_name: sat-mendeleev
    hostname: sat-mendeleev
    ipc: host
    privileged: true
    devices:
      - "/dev/snd:/dev/snd"
    networks:
      - voice_services_net
    ports: # Expose for HA
      - "10300:10300"
    volumes:
      - ./mendeleev_recordings:/recordings
      - /etc/asound.conf:/etc/asound.conf:ro
    command:
      - "--uri"
      - "tcp://0.0.0.0:10300" # Satellite listens on its port 10300
      - "--name"
      - "dining-mendeleev"
      - "--zeroconf-name"
      - "dining-mendeleev"
      - "--mic-command"
      - "arecord -D mic_shared -r 16000 -f S16_LE -c 1 -t raw"
      - "--snd-command"
      - "aplay -r 22050 -c 1 -f S16_LE -t raw -D plughw:CARD=P3200M,DEV=0 --nonblock"
      - "--debug"
      # - "--debug-recording-dir"
      # - "/recordings"
      - "--wake-word-name"
      - "hey_min_di_le_ev"
      - "--wake-uri"
      - "tcp://oww:10400" # This will resolve to the oww container on voice_services_net

  sat-mr-anderson:
    build:
      context: ./wyoming-satellite
      dockerfile: Dockerfile
    container_name: sat-mr-anderson
    hostname: sat-mr-anderson
    ipc: host
    privileged: true
    devices:
      - "/dev/snd:/dev/snd"
    networks:
      - voice_services_net
    ports: # Expose for HA
      - "10250:10250"
    volumes:
      - ./mr_anderson_recordings:/recordings
      - /etc/asound.conf:/etc/asound.conf:ro
    command:
      - "--uri"
      - "tcp://0.0.0.0:10250"
      - "--name"
      - "dining-mr-anderson"
      - "--zeroconf-name"
      - "dining-mr-anderson"
      - "--mic-command"
      - "arecord -D mic_shared -r 16000 -f S16_LE -c 1 -t raw"
      - "--snd-command"
      - "aplay -r 22050 -c 1 -f S16_LE -t raw -D plughw:CARD=P3200M,DEV=0 --nonblock"
      - "--debug"
      # - "--debug-recording-dir"
      # - "/recordings"
      - "--wake-word-name"
      - "Mr._Anderson"
      - "--wake-uri"
      - "tcp://oww:10400" # This will resolve to the oww container on voice_services_net
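
With the ALSA config, the patched Dockerfile and the compose file in place, the whole stack comes up like any other compose project (service names as defined above):

docker compose up -d --build
docker compose logs -f oww sat-chewbacca sat-mendeleev sat-mr-anderson

Each satellite is then added to Home Assistant via the Wyoming integration on its own port (10200, 10250 and 10300 here), and each one can be assigned its own pipeline.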

OrangePi Zero3 running these 4 containers:

  • Load: 60%
  • Temp: +57C (passive cooling)
  • Mem: 10% of 4 GiB