New build with Loxone, Sonos, Voice Assist and Home Assistant

Hello, I am new to the community and hope you can help me. If this is the wrong category, please help me move it to the right one…

We are planning a new build. We will implement blinds, lights, and access control with Loxone. There will also be a Loxone weather station and various motion detectors.

For sound, we have decided on Sonos speakers, which will be installed in almost every room. However, I do not want to use Google or Alexa as a voice assistant; I would prefer Voice Assist. The question is how to do this. I could place ESP boxes or a Voice Assist PE next to each Sonos speaker, but I am not sure whether it will work with the microphone right next to the speaker while music is playing. Currently I am testing a Voice Assist PE and cannot get the response to come from the Sonos speaker instead of the PE. Do you have any tips or YAML code? Or would you recommend a completely different setup?

Generally, I want to set up Home Assistant as the main system and treat Loxone as secondary. Is this a bad idea? I want to have the possibility to expand my smart home later. Nevertheless, I believe that I should wire everything that can be wired during the new build. What do you think? I am very curious about your opinions.

Thanks in advance for your replies!

Any thoughts?

It probably won’t work - most voice assistants have trouble with background noise, and a speaker a few inches away will be overwhelming. You can mute the Sonos for the TTS response, but the wake word is a real problem.
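If you do go the mute-before-speaking route, here is a rough sketch using the Sonos integration's snapshot/restore services. Entity names (`media_player.kitchen`, `tts.piper`) and the delay are placeholders - adjust for your setup.

```yaml
# Hypothetical sketch: duck a Sonos speaker, speak, then restore playback.
script:
  ducked_tts:
    sequence:
      - action: sonos.snapshot          # remember current volume/queue/playback state
        target:
          entity_id: media_player.kitchen
      - action: media_player.volume_set # turn the music down, not off
        target:
          entity_id: media_player.kitchen
        data:
          volume_level: 0.15
      - action: tts.speak               # speak through the same Sonos
        target:
          entity_id: tts.piper          # whichever TTS engine you have installed
        data:
          media_player_entity_id: media_player.kitchen
          message: "Example response"
      - delay: "00:00:03"               # crude wait for the speech to finish
      - action: sonos.restore           # resume whatever was playing
        target:
          entity_id: media_player.kitchen
```

Note that `media_player.play_media` with `announce: true` (as in the script further down) overlays the announcement on Sonos without needing the snapshot/restore pair at all.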

I don’t think you can at the moment. Assist on HA assumes a single speaker/mic unit - the response always goes back to the source of the command - and as far as I can see there’s no way to hack into that pipeline. (Somebody please tell me I’m wrong - I’d love to be able to do it.)

You can have responses on the Sonos if you restrict yourself to custom sentences and intents (which is what I do currently). For example…

# custom_sentences/en/coffee.yaml (the file name is arbitrary)
language: "en"
intents:
  CustomCoffee:
    data:
      - sentences:
          - "(Turn | Switch) on the coffee (maker | machine)"
          - "Coffee (maker | machine) on"

and…

# Turn on the coffee machine

intent_script:
  CustomCoffee:
    action:
      - service: switch.turn_on
        target:
          entity_id: switch.bedroom_socket_1
      - service: script.willow_tts_response
        data:
          tts_sentence: "{{ states('sensor.finished_phrase') }} Coffee maker on."

script.willow_tts_response is a single script which handles all TTS responses.

alias: Willow TTS response
fields:
  tts_sentence:
    name: TTS_sentence
    selector:
      text: {}
sequence:
  - if:
      - condition: state
        entity_id: input_boolean.use_cloud_service
        state: "off"
    then:
      - action: media_player.play_media
        target:
          entity_id:
            - "{{ states('sensor.speaker') }}"
        data:
          announce: true
          media_content_id: >
            media-source://tts/picotts?message="{{ tts_sentence }}"
          media_content_type: music
          extra:
            volume: 50
    else:
      - action: tts.speak
        target:
          entity_id: tts.elevenlabs
        data:
          cache: true
          media_player_entity_id: "{{ states('sensor.speaker') }}"
          message: "{{ tts_sentence }}"
          options:
            voice: Yko7PKHZNXotIFUBG7I9

Normally it uses the Elevenlabs TTS integration, but if the internet is down it falls back on PicoTTS, which is entirely local. sensor.speaker holds the name of the nearest Sonos speaker, based on tracking phone/watch/movement using Bermuda.
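For anyone wanting to replicate the sensor.speaker idea without Bermuda, a minimal sketch of a template sensor mapping a room to a Sonos entity. The source sensor name (`sensor.my_phone_area`) and the area-to-speaker mapping are assumptions for illustration - Bermuda, or any other presence method, just needs to produce the room name.

```yaml
template:
  - sensor:
      - name: "Speaker"
        state: >
          {% set area = states('sensor.my_phone_area') %}
          {% set map = {
            'Kitchen': 'media_player.kitchen',
            'Living Room': 'media_player.living_room',
            'Bedroom': 'media_player.bedroom'
          } %}
          {{ map.get(area, 'media_player.living_room') }}
```

The fallback in `map.get()` means TTS still goes somewhere sensible when the tracker reports an unknown area.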

All this is redundant if the coffee maker switch is exposed to Assist - the command will be handled by built-in intents. But then the response can only come through the voice assistant. If you don’t need a response, of course, that’s fine.

I should warn you - this is a very deep rabbit hole. :grin:

Edit: sensor.finished_phrase is a template sensor which picks a random phrase from a list - “OK, done” etc.
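For completeness, a sketch of such a sensor (the phrases are examples). One caveat: a plain template with no entity references only re-renders on reload, so a trigger-based template sensor is one way to get a fresh random pick.

```yaml
template:
  - trigger:
      - platform: time_pattern
        minutes: "/5"    # re-roll the phrase every five minutes
    sensor:
      - name: "Finished phrase"
        state: "{{ ['OK, done.', 'Done.', 'All sorted.'] | random }}"
```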