A presentable voice assistant satellite

Am I correct that you’re not running the changes I suggested? You should put them back in; I’m pretty sure you won’t hit that specific issue, as I haven’t seen it at all with the changes in place. I also made a different modification for my ESP32-S3-BOX. It is much the same as the ESP32 mod, but slightly changed because the S3-BOX does the initial wake word detection. Both the ESP32 and the ESP32-S3 ran without issue for at least 4 days. I had the S3-BOX first and gave up on it because it would be non-responsive most mornings. I did have an issue with the ESP32 that I describe below.

You were correct about network connectivity being part of the problem with the voice assistant. The audio between the satellite and HA is sent over UDP, so if a packet gets dropped or arrives out of order, the audio gets messed up. I ran some ping tests and found that with reasonable ping response times (less than 10 ms) the audio quality out of the speaker was great. However, if something was delaying packets on the network, I got the stutter issue.

I did have an issue last night where the ping responses from the ESP32 were greater than 1000 ms, with about a 50% packet drop rate on the ping requests. I’m not sure what caused this, but it happened after I made my network modifications, so network activity would have been getting interrupted. I’m going to watch this to see whether the ESP32 gets slower at responding to pings over time. If that happens, I’m going to set up an automation in HA to reboot the satellites in the middle of the night.
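For anyone wanting to try the nightly reboot idea, here is a minimal sketch of what such an HA automation might look like. It assumes each satellite exposes a restart switch, as the `platform: restart` switch in the config posted later in this thread does. The entity IDs and the 03:30 time are hypothetical placeholders, so substitute whatever your devices actually show in HA.

```yaml
# Sketch only: goes under the automation: key in configuration.yaml
# (or as a list entry in automations.yaml, without the top-level key).
automation:
  - alias: "Restart voice satellites nightly"
    trigger:
      - platform: time
        at: "03:30:00"          # hypothetical time, pick a quiet hour
    action:
      - service: switch.turn_on
        target:
          entity_id:
            # Hypothetical entity IDs; check yours in HA
            - switch.s3test_restart
            - switch.livingroom_satellite_restart
```

This only masks the symptom, of course. If ping times degrade steadily between reboots, the cause is still worth tracking down.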

Check this post #72 further up-thread from @Rich37804, specifically regarding using GPIO20 for your amplifier bit clock. Also be certain the ground between the MAX98357A amplifier and the ESP32-S3 is connected.

My device became very unstable with those changes. Lots of freezing.
I’m going on 3 days with only one issue as things are now, with 4 assistants running.

Had two false positives tonight, but that’s not surprising, considering the VA sits beneath the television. Just very happy to have my first VA fully assembled and operational this afternoon and evening. Plan is to assemble another two or three this weekend.

Hmm that’s interesting. The changes I posted work for me on the ESP32. I had lots of issues without those changes. These same changes were problematic on the ESP32S3, so I made slightly different changes for better performance on the ESP32S3. Can you post your most recent configuration file for the ESP32, so I can run a test on my system?

For the ChatGPT part, did you set up the localai implementation or did you just use the online approach? I set up the localai stuff. From the command line I get pretty good responses to general questions. Here’s an example command line query and response:

$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "luna",
     "messages": [{"role": "user", "content": "Who is batman?"}],
     "temperature": 0.5 
}'
{"created":1709344133,"object":"chat.completion","id":"0ebf26e2-869e-4dcf-b6ac-413c710fa692","model":"lunademo","choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"Batman is a superhero that appears in American comic books published by DC Comics. He is the alias of Bruce Wayne, a billionaire playboy who uses his wealth to fight crime in Gotham City."}}],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

When I hook it up to the HA voice assistant pipeline using extended_openai_conversation, I see the question come through in the localai logs. I’m running localai on an AMD 3900X, giving 22 cores to localai processing. Responses take about 3 minutes. I can’t seem to get things right so that it’ll attempt to turn the lights out. I’m not sure if I have a configuration issue, a model issue, or a docker container issue. The thing I really don’t understand is that the model, which returns good responses to general questions through the command line, simply returns the question I ask as the response when queried through the extended_openai_conversation integration. This is seen in the GUI and in the localai logs.

This link was the best reference I found for setting up localai with an appropriate model. I don’t currently have a CUDA-based graphics card in the server I’m running it on, so I had to slightly modify the docker compose lines. I’m running the full pipeline on one server. This is my full docker-compose file for the pipeline:

version: '3'
services:
  wyoming-whisper:
    image: rhasspy/wyoming-whisper
    ports:
      - "10300:10300"
    volumes:
      - ./whisper-data:/data
    #command: [ "--model", "medium-int8", "--language", "en" ]
    command: [ "--model", "small-int8", "--language", "en" ]
    restart: unless-stopped

  openwakeword:
    container_name: openWakeWord
    image: rhasspy/wyoming-openwakeword
    volumes:
      - ./openwakeword-data:/data
      - ./openwakeword-data:/custom
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    environment:
      - TZ=America/New_York
    #command: [ "--preload-model", "ok_nabu", "--custom-model-dir", "/custom" ]
    #command: --preload-model 'ok_nabu' --custom-model-dir /custom
    #command: --preload-model 'alexa' --custom-model-dir /custom
    command: --preload-model 'hey_jarvis' --custom-model-dir /custom
    restart: unless-stopped
    ports:
      - 10400:10400

  wyoming-piper:
    image: rhasspy/wyoming-piper
    ports:
      - "10200:10200"
    volumes:
      - "./piper-data:/data"
    #command: [ "--voice", "en-gb-southern_english_female-low" ]
    command: [ "--voice", "en_GB-northern_english_male-medium" ]
    #command: [ "--voice", "en_GB-semaine-medium" ]
    restart: unless-stopped

  localai:
    image: quay.io/go-skynet/local-ai:v2.9.0
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      - MODELS_PATH=/models
    volumes:
      - "./models:/models"

Anyway, that’s a lot of words. I’m really hoping you have localai working and, if so, that you could share which model you’re using and any configuration you’ve used to make it work.

Only problem is, too many of the AITRIP ESP32-S3-WROOM-1-N16R8 boards I received simply do not work reliably. :astonished: I cannot recommend this board, based on my experience with two different orders of three boards each.

Mine have been purring like kittens for a few days now. I only have 2 running at the moment because I have replaced the other 2 with Wyoming satellites.
No errors/issues from the 2 esp32 satellites in a few days now.

Are you running micro wake word on yours?

No, I am not. I see no need for that in my configuration.

I was just wondering, as the S3 I am trying to get micro wake word running on is causing crackling on the speaker, as I said earlier. I have had 2 ESP32 WROOM boards running without micro wake word for a while.

It just seems to be my S3 boards which are causing issues.

The way I troubleshoot that is to disconnect one wire at a time until the crackling goes away. Don’t unplug any ground or positive lines. Unplug one wire; if the crackling stays, plug it back in. If you unplug one and the crackling stops, reassign that output/input to a different GPIO. On one of my S3 boards it was a line running to the microphone causing it.
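As a rough illustration of the "reassign it to a different GPIO" step: if your config keeps its pins in a `substitutions:` block (the full config posted further down does), moving a signal is a one-line change plus moving the physical wire. The pin numbers here are hypothetical examples, not a recommendation; check your board's pinout for pins that are actually free.

```yaml
substitutions:
  # Hypothetical example: the microphone data line was on GPIO4 and
  # unplugging it stopped the crackling, so try it on another free pin.
  # mic_pin: GPIO4   # old assignment
  mic_pin: GPIO5     # new assignment (assumed to be free on your board)
```

Remember to re-flash after the change, since the pin mapping is compiled into the firmware.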

The microphone is responding fine; it’s only when the speaker plays the response that it crackles. Sometimes one word is intelligible, but not often. I have tried various pins for the speaker with no luck.

I had the same problem and managed to solve it by changing the speaker ground pin.

Will try that, as I have not tried that yet. Thanks for the tip.

This configuration is working well for me so far with micro_wake_word enabled on the esp32-s3-devkitc-1. I’m also using the on-board ws2812 LED for visual feedback. I’ve modified the yaml from here to suit the s3-board.

I’m using this board (ESP32-S3 N16R8). If you go for this one, be aware that you’ll have to solder the LED pads for pin 48 for the LED to work. Also, 5V is not connected for powering the Max98357, but I have not had an issue so far powering it from 3.3V, which it supports.

The one issue I am getting, and I was getting this with my esp32-devkit boards, is the voice/speaker jitter/stutter. Not sure if this is common across all implementations of VA on esp32s. This is an intermittent issue and I’m guessing it is CPU load related.

esphome:
  name: s3test
  friendly_name: S3Test
  platformio_options:
    board_build.flash_mode: dio
  on_boot:
      priority: 600
      then: 
        # Run the script to refresh the LED status
        - script.execute: control_led
        # - output.turn_off: set_low_speaker
        # If after 30 seconds, the device is still initializing (It did not yet connect to Home Assistant), turn off the init_in_progress variable and run the script to refresh the LED status
        - delay: 30s
        - if:
            condition:
              lambda: return id(init_in_progress);
            then:
              - lambda: id(init_in_progress) = false;
              - script.execute: control_led
              
esp32:
  board:   esp32-s3-devkitc-1
  variant: esp32s3
  framework:
    type:  esp-idf
    components:
      - name:    esphome_board
        source:  github://jesserockz/esphome-esp-adf-board@main
        refresh: 0s
    sdkconfig_options:
      CONFIG_ESP32S3_DEFAULT_CPU_FREQ_240: "y"
      CONFIG_ESP32S3_DATA_CACHE_64KB:      "y"
      CONFIG_ESP32S3_DATA_CACHE_LINE_64B:  "y"
      CONFIG_AUDIO_BOARD_CUSTOM:           "y"

psram:
  mode: octal
  speed: 80MHz

# Enable logging
logger:

ota:
  password: "<redacted>"

# Enable Home Assistant API
api:
  encryption:
    key: "<redacted>"
  # If the device connects, or disconnects, to Home Assistant: Run the script to refresh the LED status
  on_client_connected:
    - script.execute: control_led
  on_client_disconnected:
    - script.execute: control_led

wifi:
  ssid: !secret tp_wifi_ssid
  password: !secret tp_wifi_password

  # If the device connects, or disconnects, to the Wifi: Run the script to refresh the LED status
  on_connect:
    - script.execute: control_led
  on_disconnect:
    - script.execute: control_led

substitutions:
  # Phases of the Voice Assistant
  # IDLE: The voice assistant is ready to be triggered by a wake-word
  voice_assist_idle_phase_id: '1'
  # LISTENING: The voice assistant is ready to listen to a voice command (after being triggered by the wake word)
  voice_assist_listening_phase_id: '2'
  # THINKING: The voice assistant is currently processing the command
  voice_assist_thinking_phase_id: '3'
  # REPLYING: The voice assistant is replying to the command
  voice_assist_replying_phase_id: '4'
  # NOT_READY: The voice assistant is not ready 
  voice_assist_not_ready_phase_id: '10'
  # ERROR: The voice assistant encountered an error
  voice_assist_error_phase_id: '11'  
  # MUTED: The voice assistant is muted and will not reply to a wake-word
  voice_assist_muted_phase_id: '12'

  #pins
  i2s_out_lrclk_pin: GPIO6 # LRC on Max98357
  i2s_out_bclk_pin: GPIO7 # BCLK on Max98357
  i2s_in_lrclk_pin: GPIO3 # WS on INMP441
  i2s_in_bclk_pin: GPIO2 # SCK on INMP441
  light_pin: GPIO48 # on-board LED
  speaker_pin: GPIO8 # DIN on Max98357
  mic_pin: GPIO4 # SD on INMP441

globals:
  # Global initialisation variable. Initialized to true and set to false once everything is connected. Only used to have a smooth "plugging" experience
  - id: init_in_progress
    type: bool
    restore_value: no
    initial_value: 'true'
  # Global variable tracking the phase of the voice assistant (defined above). Initialized to not_ready
  - id: voice_assistant_phase
    type: int
    restore_value: no
    initial_value: ${voice_assist_not_ready_phase_id}

light:
  - platform: esp32_rmt_led_strip
    rgb_order: GRB
    pin: ${light_pin}
    num_leds: 1
    rmt_channel: 0
    chipset: WS2812
    name: "Status LED"
    id: led
    disabled_by_default: True
    # entity_category: diagnostic
    icon: mdi:led-on
    default_transition_length: 0s
    effects:
      - pulse:
          name: "Slow Pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "Fast Pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

script:
  # Master script controlling the LED, based on different conditions: initialization in progress, wifi and API connected, and the current voice assistant phase.
  # For the sake of simplicity and re-usability, the script calls child scripts defined below.
  # This script will be called every time one of these conditions is changing.
  - id: control_led
    then:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:
            - if:
                condition:
                  wifi.connected:
                then:
                  - if:
                      condition:
                        api.connected:
                      then:
                        - lambda: |
                            switch(id(voice_assistant_phase)) {
                              case ${voice_assist_listening_phase_id}:
                                id(control_led_voice_assist_listening_phase).execute();
                                break;
                              case ${voice_assist_thinking_phase_id}:
                                id(control_led_voice_assist_thinking_phase).execute();
                                break;
                              case ${voice_assist_replying_phase_id}:
                                id(control_led_voice_assist_replying_phase).execute();
                                break;
                              case ${voice_assist_error_phase_id}:
                                id(control_led_voice_assist_error_phase).execute();
                                break;
                              case ${voice_assist_muted_phase_id}:
                                id(control_led_voice_assist_muted_phase).execute();
                                break;
                              case ${voice_assist_not_ready_phase_id}:
                                id(control_led_voice_assist_not_ready_phase).execute();
                                break;
                              default:
                                id(control_led_voice_assist_idle_phase).execute();
                                break;
                            }
                      else:
                        - script.execute: control_led_no_ha_connection_state
                else:
                  - script.execute: control_led_no_ha_connection_state
          else:
            - script.execute: control_led_init_state


  # Script executed during initialisation: In this example: Turn the LED in green with a fast pulse 🟢
  - id: control_led_init_state
    then:
      - light.turn_on:
          id: led
          blue: 0%
          red: 0%
          green: 100%
          effect: "Fast Pulse"
  

  # Script executed when the device has no connection to Home Assistant: In this example: Turn off the LED 
  - id: control_led_no_ha_connection_state
    then:
      - light.turn_off:
          id: led  


  # Script executed when the voice assistant is idle (waiting for a wake word): In this example: Turn the LED in white with 20% of brightness ⚪
  - id: control_led_voice_assist_idle_phase
    then:
      - light.turn_on:
          id: led
          blue: 100%
          red: 100%
          green: 100%
          brightness: 20%
          effect: "none"


  # Script executed when the voice assistant is listening to a command: In this example: Turn the LED in blue with a slow pulse 🔵
  - id: control_led_voice_assist_listening_phase
    then:
      - light.turn_on:
          id: led
          blue: 100%
          red: 0%
          green: 0%
          effect: "Slow Pulse"


  # Script executed when the voice assistant is processing the command: In this example: Turn the LED in blue with a fast pulse 🔵         
  - id: control_led_voice_assist_thinking_phase
    then:
      - light.turn_on:
          id: led
          blue: 100%
          red: 0%
          green: 0%
          effect: "Fast Pulse"


  # Script executed when the voice assistant is replying to a command: In this example: Turn the LED in blue, solid (no pulse) 🔵         
  - id: control_led_voice_assist_replying_phase
    then:
      - light.turn_on:
          id: led
          blue: 100%
          red: 0%
          green: 0%
          brightness: 100%
          effect: "none"


  # Script executed when the voice assistant encounters an error: In this example: Turn the LED in red, solid (no pulse) 🔴        
  - id: control_led_voice_assist_error_phase
    then:
      - light.turn_on:
          id: led
          blue: 0%
          red: 100%
          green: 0%
          brightness: 100%
          effect: "none"


  # Script executed when the voice assistant is muted: In this example: Turn off the LED 
  - id: control_led_voice_assist_muted_phase
    then:
      - light.turn_off:
          id: led


  # Script executed when the voice assistant is not ready: In this example: Turn off the LED 
  - id: control_led_voice_assist_not_ready_phase
    then:
      - light.turn_off:
          id: led

# This is how to include the Espressif Audio Development Framework. 
# This is needed to be able to use VAD (Voice Activity Detection) and prevent the voice assistant from constantly streaming audio to Home Assistant
# For now, this component is not documented, nor on the code base of ESPHome, hence the reference to the external component.
esp_adf:    
external_components:
  - source: github://pr#5230
    components:
      - esp_adf
    refresh: 0s 

# Declaration of the switch that will be used to turn on or off (mute) our voice assistant
switch: 
#system     
  - platform: restart
    name: Restart
    id: restart_switch  
  - platform: template
    name: Enable Voice Assistant
    id: use_wake_word
    optimistic: true
    restore_mode: RESTORE_DEFAULT_ON
    icon: mdi:assistant
    # When the switch is turned on (on Home Assistant):
    # Start the voice assistant component
    # Set the correct phase and run the script to refresh the LED status
    on_turn_on:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:      
            - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
            - if:
                condition:
                  not:
                    - voice_assistant.is_running
                then:
                  - micro_wake_word.start
            - script.execute: control_led
    # When the switch is turned off (on Home Assistant):
    # Stop the voice assistant component
    # Set the correct phase and run the script to refresh the LED status
    on_turn_off:
      - if:
          condition:
            lambda: return !id(init_in_progress);
          then:      
            - voice_assistant.stop
            - micro_wake_word.stop
            - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
            - script.execute: control_led


# This is our two i2s buses with the correct pins. 
# You can refer to the wiring diagram of our voice assistant for more details 
i2s_audio:
  - id: i2s_out
    i2s_lrclk_pin: ${i2s_out_lrclk_pin}
    i2s_bclk_pin: ${i2s_out_bclk_pin}
  - id: i2s_in
    i2s_lrclk_pin: ${i2s_in_lrclk_pin}
    i2s_bclk_pin: ${i2s_in_bclk_pin}


# This is the declaration of our microphone. 
# It includes the data pin (You can refer to the wiring diagram of our voice assistant for more details)
# It references the correct i2s bus declared above.
microphone:
  platform: i2s_audio
  id: external_microphone
  adc_type: external
  i2s_audio_id: i2s_in
  i2s_din_pin: ${mic_pin}
  channel: left
  pdm: false

# This is the declaration of our speaker. 
# It includes the data pin (You can refer to the wiring diagram of our voice assistant for more details)
# It references the correct i2s bus declared above.
# output:
#   - platform: gpio
#     pin:
#         number: ${speaker_pin}
#         allow_other_uses: true
#     id: set_low_speaker

speaker:
  platform: i2s_audio
  id: external_speaker
  dac_type: external
  i2s_audio_id: i2s_out
  i2s_dout_pin: 
    number: ${speaker_pin}
    # allow_other_uses: true


micro_wake_word:
  model: okay_nabu
  on_wake_word_detected:
    then:
      - voice_assistant.start:

# This is the declaration of our voice assistant
# It references the microphone and speaker declared above.
voice_assistant:
  id: va
  microphone: external_microphone
  speaker: external_speaker
  # use_wake_word: true
  
  # This is how I personally tune my voice assistant, you may have to test a few values for the 4 parameters below
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 8.0
  vad_threshold: 3

  # When the voice assistant connects to HA:
  # Set init_in_progress to false (Initialization is over).
  # If the switch is on, start the voice assistant
  # In any case: Set the correct phase and run the script to refresh the LED status
  on_client_connected:
    - lambda: id(init_in_progress) = false; 
    - if:
        condition:
          switch.is_on: use_wake_word
        then:
          - micro_wake_word.start
          - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
        else:
          - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
    - script.execute: control_led

  # When the voice assistant disconnects from HA: 
  # Stop the voice assistant
  # Set the correct phase and run the script to refresh the LED status
  on_client_disconnected:
    - lambda: id(voice_assistant_phase) = ${voice_assist_not_ready_phase_id};  
    - micro_wake_word.stop
    - script.execute: control_led

  # When the voice assistant starts to listen: Set the correct phase and run the script to refresh the LED status
  on_listening:
    - lambda: id(voice_assistant_phase) = ${voice_assist_listening_phase_id};
    - script.execute: control_led

  # When the voice assistant starts to think: Set the correct phase and run the script to refresh the LED status
  on_stt_vad_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_thinking_phase_id};
    - script.execute: control_led

  # When the voice assistant starts to reply: Set the correct phase and run the script to refresh the LED status
  on_tts_stream_start:
    - lambda: id(voice_assistant_phase) = ${voice_assist_replying_phase_id};
    - script.execute: control_led
  
  on_end:
    - if:
        condition:
          - switch.is_on: use_wake_word
        then:
          - wait_until:
              not:
                voice_assistant.is_running:
          - micro_wake_word.start
  # When the voice assistant finishes replying: Set the correct phase and run the script to refresh the LED status
  on_tts_stream_end:
    - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
    - script.execute: control_led

  # When the voice assistant encounters an error: 
  # Set the error phase and run the script to refresh the LED status
  # Wait 1 second and set the correct phase (idle or muted depending on the state of the switch) and run the script to refresh the LED status 
  on_error:
    - if:
        condition:
          lambda: return !id(init_in_progress);
        then:
          - lambda: id(voice_assistant_phase) = ${voice_assist_error_phase_id};  
          - script.execute: control_led
          - delay: 1s
          - if:
              condition:
                switch.is_on: use_wake_word
              then:
                - lambda: id(voice_assistant_phase) = ${voice_assist_idle_phase_id};
              else:
                - lambda: id(voice_assistant_phase) = ${voice_assist_muted_phase_id};
          - script.execute: control_led

VAD will always say that, so there is no benefit in having it in the config. esp-adf only works with the S3 Box 3, so you can safely remove esp_adf:, the external component, and vad_threshold: :slight_smile:
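For reference, a rough sketch of what that removal might look like against the config posted above. This is untested. It is also an assumption, not something confirmed in this thread, that the `esphome_board` framework component and `CONFIG_AUDIO_BOARD_CUSTOM` are part of the same ADF setup and may need to come out as well.

```yaml
# Sketch only: the posted config with the ADF/VAD pieces taken out.
#
# Deleted entirely:
#   esp_adf:
#   external_components:
#     - source: github://pr#5230
#       components:
#         - esp_adf
#       refresh: 0s
#
# Assumption (not confirmed here): the esphome_board entry under
# esp32 -> framework -> components and CONFIG_AUDIO_BOARD_CUSTOM are
# also ADF-related and may need removing too.

voice_assistant:
  id: va
  microphone: external_microphone
  speaker: external_speaker
  noise_suppression_level: 4
  auto_gain: 31dBFS
  volume_multiplier: 8.0
  # vad_threshold: 3   # removed along with esp_adf
  # ... on_* handlers unchanged from the posted config ...
```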

Stutter shouldn’t be an issue on an S3. Make sure that your Wi-Fi connection is good and that your HA instance isn’t under load (check the hardware monitor whilst issuing a voice command). If you have other components installed on the same board, such as web_server:, try disabling them.
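One way to keep an eye on the Wi-Fi side from the device itself is ESPHome's `wifi_signal` sensor. A minimal sketch follows; the sensor name and update interval are arbitrary choices, and the commented-out `web_server:` block only applies if your config has one.

```yaml
sensor:
  # Reports the satellite's Wi-Fi RSSI (in dBm) to Home Assistant
  - platform: wifi_signal
    name: "Satellite WiFi Signal"   # arbitrary name
    update_interval: 60s

# If your config has a web server, try commenting it out while testing:
# web_server:
#   port: 80
```

If the signal reading drops around the times the stutter occurs, that points at the network rather than the board.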

Thank you, I’ll try that with the second VA I assembled today. (Apparently, I found one of the few good S3 boards in my recent shipments.) NOPE! :frowning: esp_adf is needed.

-- Building ESP-IDF components for target esp32s3
-- Configuring incomplete, errors occurred!
See also "/data/build/vasst-sunroom/.pioenvs/vasst-sunroom/CMakeFiles/CMakeOutput.log".

fatal: not a git repository (or any parent up to mount point /)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
CMake Error at /data/cache/platformio/packages/framework-espidf/tools/cmake/build.cmake:201 (message):
  Failed to resolve component 'audio_sal'.
Call Stack (most recent call first):
  /data/cache/platformio/packages/framework-espidf/tools/cmake/build.cmake:241 (__build_resolve_and_add_req)
  /data/cache/platformio/packages/framework-espidf/tools/cmake/build.cmake:518 (__build_expand_requirements)
  /data/cache/platformio/packages/framework-espidf/tools/cmake/project.cmake:476 (idf_build_process)
  CMakeLists.txt:3 (project)

========================== [FAILED] Took 4.08 seconds ==========================

FYI, I’ve found the Adafruit SPH0645 MEMS microphone to be a MUCH better device to connect, configure, and physically install. It has pins in a straight line, instead of all around a circle. Also, if you leave the SEL pin unconnected, it defaults to Left, so there is no need to ground it.

Run Clean Build Files, then try the install again. You have to run a clean build after adding or removing a component when using the esp-idf framework.