M5Stack Atom Echo won't execute commands

Hello all. Hoping to get some helpful insight on an issue I’m having:

I’ve set up a voice assistant, and it works, mostly. I do not have Nabu Casa Cloud, but my HA instance is accessible from outside my local network via Tailscale. I’m running this on a Home Assistant Green. I have Wyoming set up with Piper for TTS and Speech-to-Phrase for STT. I also have openWakeWord set up, currently using the default “OK, Nabu” wake word.

The assistant works. I can type to it, or, if I’m connected via HTTPS, I can use the mic in my browser or on my mobile to successfully trigger the test automations I have set up.

I got an M5Stack Atom Echo to use as a mic/speaker in conjunction with this setup. It came preinstalled with ESPHome firmware for HA, but because I couldn’t figure out how to add it in that state, I went ahead and re-flashed it via the instructions here. I entered my WiFi credentials, and when it asked to open the HA instance to add the device, I used the Tailscale URL. Everything went through successfully. I added it to the ESPHome integration, and it’s linked to my assistant. The wake word engine is set to On Device.
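For reference, I believe the voice-related parts of the stock Atom Echo firmware look roughly like this. This is only a sketch based on the public ESPHome m5stack-atom-echo package; the IDs and exact options are my assumptions and may differ from what actually got flashed:

```yaml
# Sketch only: approximately the voice-related pieces of the stock
# m5stack-atom-echo ESPHome package (IDs and options are assumptions)
microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: GPIO23
    adc_type: external
    pdm: true

micro_wake_word:
  id: mww
  models:
    - okay_nabu

voice_assistant:
  id: va
  microphone: echo_microphone
  use_wake_word: false  # wake word handled by micro_wake_word on device
```

The point being: only wake word detection runs on the device; everything after that is streamed to HA’s pipeline.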

OK, that’s all good. But now, when I speak the wake word, the Atom Echo’s light turns blue and pulses while it waits to recognize a phrase. Then I speak the trigger sentence for the test automation (turning on the one light I have exposed to Assist), and the Atom Echo’s light turns red, then returns to the standby white.

I looked around in various logs for errors, and I did find a clearly correlated one in the Speech-to-Phrase add-on:

ERROR (online2-cli-nnet3-decode-faster[5.5]:GetLattice():online2/online-nnet3-decoding.cc:69) You cannot get a lattice if you decoded no frames.
kaldi::KaldiFatalErroronline2-cli-nnet3-decode-faster --config=/data/models/en_US-rhasspy/model/online/conf/online.conf --max-active=7000 --lattice-beam=8.0 --acoustic-scale=1.0 --beam=24.0 /data/models/en_US-rhasspy/model/model/final.mdl /share/speech-to-phrase/train/en_US-rhasspy/graph/HCLG.fst /share/speech-to-phrase/train/en_US-rhasspy/graph/words.txt ark:/tmp/tmpzgncdb9t 
LOG (online2-cli-nnet3-decode-faster[5.5]:ComputeDerivedVars():ivector/ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (online2-cli-nnet3-decode-faster[5.5]:ComputeDerivedVars():ivector/ivector-extractor.cc:204) Done.
LOG (online2-cli-nnet3-decode-faster[5.5]:RemoveOrphanNodes():nnet3/nnet-nnet.cc:948) Removed 1 orphan nodes.
LOG (online2-cli-nnet3-decode-faster[5.5]:RemoveOrphanComponents():nnet3/nnet-nnet.cc:847) Removing 2 orphan components.
LOG (online2-cli-nnet3-decode-faster[5.5]:Collapse():nnet3/nnet-utils.cc:1488) Added 1 components, removed 2
LOG (online2-cli-nnet3-decode-faster[5.5]:CompileLooped():nnet3/nnet-compile-looped.cc:345) Spent 0.0691831 seconds in looped compilation.

Does anyone have any insight into what might be going on?

I’m so close to this working…

I’ve now installed the ESPHome Device Builder add-on and hit the “Take control” button for this device. I was then able to update the firmware to the latest version (for some reason, taking control caused an old firmware to be compiled and installed). It still fails, but now I’m able to see the logs from the device itself:

INFO ESPHome 2025.7.2
INFO Reading configuration /config/esphome/m5stack-atom-echo-xxxxxx.yaml...
INFO Starting log output from 192.168.XXX.XXX using esphome API
INFO Successfully resolved m5stack-atom-echo-bbe038 @ 192.168.XXX.XXX in 0.000s
INFO Successfully connected to m5stack-atom-echo-xxxxxx @ 192.168.XXX.XXX in 0.099s
INFO Successful handshake with m5stack-atom-echo-xxxxx  @ 192.168.XXX.XXX in 0.123s
[19:40:34][I][app:164]: ESPHome version 2025.7.2 compiled on Jul 20 2025, 19:35:00
[19:40:34][C][wifi:613]: WiFi:
[19:40:34][C][wifi:434]:   Local MAC: XX:XX:XX:XX:XX:XX
[19:40:34][C][wifi:439]:   SSID: [redacted]
[19:40:34][C][wifi:442]:   IP Address: 192.168.XXX.XXX
[19:40:34][C][wifi:446]:   BSSID: [redacted]
[19:40:34][C][wifi:446]:   Hostname: 'm5stack-atom-echo-xxxxxx'
[19:40:34][C][wifi:446]:   Signal strength: -42 dB ▂▄▆█
[19:40:34][C][wifi:455]:   Channel: 3
[19:40:34][C][wifi:455]:   Subnet: 255.255.255.0
[19:40:34][C][wifi:455]:   Gateway: 192.168.XXX.1
[19:40:34][C][wifi:455]:   DNS1: 192.168.XXX.1
[19:40:34][C][wifi:455]:   DNS2: 192.168.XXX.1
[19:40:34][C][logger:246]: Logger:
[19:40:34][C][logger:246]:   Max Level: DEBUG
[19:40:34][C][logger:246]:   Initial Level: DEBUG
[19:40:34][C][logger:252]:   Log Baud Rate: 115200
[19:40:34][C][logger:252]:   Hardware UART: UART0
[19:40:34][C][logger:259]:   Task Log Buffer Size: 768
[19:40:34][C][esp32_rmt_led_strip:268]: ESP32 RMT LED Strip:
[19:40:34][C][esp32_rmt_led_strip:268]:   Pin: 27
[19:40:34][C][esp32_rmt_led_strip:272]:   RMT Symbols: 192
[19:40:34][C][esp32_rmt_led_strip:297]:   RGB Order: GRB
[19:40:34][C][esp32_rmt_led_strip:297]:   Max refresh rate: 0
[19:40:34][C][esp32_rmt_led_strip:297]:   Number of LEDs: 1
[19:40:34][C][template.select:065]: Template Select 'Wake word engine location'
[19:40:34][C][template.select:066]:   Update Interval: 60.0s
[19:40:34][C][template.select:069]:   Optimistic: YES
[19:40:34][C][template.select:069]:   Initial Option: On device
[19:40:34][C][template.select:069]:   Restore Value: YES
[19:40:34][C][gpio.binary_sensor:052]: GPIO Binary Sensor 'Button'
[19:40:34][C][gpio.binary_sensor:053]:   Pin: GPIO39
[19:40:35][C][gpio.binary_sensor:055]:   Mode: interrupt
[19:40:35][C][gpio.binary_sensor:072]:   Interrupt Type: ANY_EDGE
[19:40:35][C][light:092]: Light 'living_room_assist'
[19:40:35][C][light:094]:   Default Transition Length: 0.0s
[19:40:35][C][light:094]:   Gamma Correct: 2.80
[19:40:35][C][template.switch:079]: Template Switch 'Use listen light'
[19:40:35][C][template.switch:079]:   Restore Mode: restore defaults to ON
[19:40:35][C][template.switch:057]:   Optimistic: YES
[19:40:35][C][template.switch:079]: Template Switch 'timer_ringing'
[19:40:35][C][template.switch:079]:   Restore Mode: always OFF
[19:40:35][C][template.switch:057]:   Optimistic: YES
[19:40:35][C][factory_reset.button:011]: Factory Reset Button 'Factory reset'
[19:40:35][C][factory_reset.button:011]:   Icon: 'mdi:restart-alert'
[19:40:35][C][i2s_audio.microphone:083]: Microphone:
[19:40:35][C][i2s_audio.microphone:083]:   Pin: 23
[19:40:35][C][i2s_audio.microphone:083]:   PDM: YES
[19:40:35][C][i2s_audio.microphone:083]:   DC offset correction: YES
[19:40:35][C][psram:016]: PSRAM:
[19:40:35][C][psram:019]:   Available: NO
[19:40:35][C][i2s_audio.speaker:114]: Speaker:
[19:40:35][C][i2s_audio.speaker:114]:   Pin: 22
[19:40:35][C][i2s_audio.speaker:114]:   Buffer duration: 60
[19:40:35][C][i2s_audio.speaker:120]:   Timeout: 500 ms
[19:40:35][C][i2s_audio.speaker:128]:   Communication format: std
[19:40:35][C][captive_portal:099]: Captive Portal:
[19:40:35][C][esphome.ota:073]: Over-The-Air updates:
[19:40:35][C][esphome.ota:073]:   Address: m5stack-atom-echo-xxxxxx.local:3232
[19:40:35][C][esphome.ota:073]:   Version: 2
[19:40:35][C][safe_mode:018]: Safe Mode:
[19:40:35][C][safe_mode:019]:   Boot considered successful after 60 seconds
[19:40:35][C][safe_mode:019]:   Invoke after 10 boot attempts
[19:40:35][C][safe_mode:019]:   Remain for 300 seconds
[19:40:35][C][web_server.ota:224]: Web Server OTA
[19:40:35][C][api:207]: API Server:
[19:40:35][C][api:207]:   Address: m5stack-atom-echo-xxxxxx.local:6053
[19:40:35][C][api:212]:   Using noise encryption: YES
[19:40:35][C][mdns:122]: mDNS:
[19:40:35][C][mdns:122]:   Hostname: m5stack-atom-echo-xxxxxx
[19:40:35][C][micro_wake_word:064]: microWakeWord:
[19:40:35][C][micro_wake_word:065]:   models:
[19:40:35][C][micro_wake_word:014]:     - Wake Word: Okay Nabu
[19:40:35][C][micro_wake_word:014]:       Probability cutoff: 0.97
[19:40:35][C][micro_wake_word:014]:       Sliding window size: 5
[19:40:35][C][micro_wake_word:014]:     - Wake Word: Hey Mycroft
[19:40:35][C][micro_wake_word:014]:       Probability cutoff: 0.95
[19:40:35][C][micro_wake_word:014]:       Sliding window size: 5
[19:40:35][C][micro_wake_word:014]:     - Wake Word: Hey Jarvis
[19:40:35][C][micro_wake_word:014]:       Probability cutoff: 0.97
[19:40:35][C][micro_wake_word:014]:       Sliding window size: 5
[19:40:35][C][micro_wake_word:022]:     - VAD Model
[19:40:35][C][micro_wake_word:022]:       Probability cutoff: 0.50
[19:40:35][C][micro_wake_word:022]:       Sliding window size: 5
[19:40:58][D][micro_wake_word:325]: Detected 'Okay Nabu' with sliding average probability is 0.98 and max probability is 1.00
[19:40:58][D][voice_assistant:477]: State changed from IDLE to START_MICROPHONE
[19:40:58][D][voice_assistant:484]: Desired state set to START_PIPELINE
[19:40:58][D][micro_wake_word:370]: Stopping wake word detection
[19:40:58][D][voice_assistant:207]: Starting Microphone
[19:40:58][D][ring_buffer:034]: Created ring buffer with size 16384
[19:40:58][D][voice_assistant:477]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[19:40:58][D][micro_wake_word:378]: State changed from DETECTING_WAKE_WORD to STOPPING
[19:40:58][D][voice_assistant:477]: State changed from STARTING_MICROPHONE to START_PIPELINE
[19:40:58][D][micro_wake_word:273]: Inference task is stopping, deallocating buffers
[19:40:58][D][micro_wake_word:278]: Inference task is finished, freeing task resources
[19:40:58][D][micro_wake_word:378]: State changed from STOPPING to STOPPED
[19:40:58][D][voice_assistant:228]: Requesting start
[19:40:58][D][voice_assistant:477]: State changed from START_PIPELINE to STARTING_PIPELINE
[19:40:58][D][voice_assistant:499]: Client started, streaming microphone
[19:40:58][D][voice_assistant:477]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[19:40:58][D][voice_assistant:484]: Desired state set to STREAMING_MICROPHONE
[19:40:58][D][voice_assistant:623]: Event Type: 1
[19:40:58][D][voice_assistant:626]: Assist Pipeline running
[19:40:58][D][voice_assistant:623]: Event Type: 3
[19:40:58][D][voice_assistant:645]: STT started
[19:40:58][D][light:052]: 'living_room_assist' Setting:
[19:40:58][D][light:076]:   Red: 0%, Green: 0%, Blue: 100%
[19:40:58][D][light:126]:   Effect: 'Slow Pulse'
[19:41:03][D][voice_assistant:623]: Event Type: 11
[19:41:03][D][voice_assistant:824]: Starting STT by VAD
[19:41:03][D][voice_assistant:623]: Event Type: 12
[19:41:03][D][voice_assistant:828]: STT by VAD end
[19:41:03][D][voice_assistant:477]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[19:41:03][D][voice_assistant:484]: Desired state set to AWAITING_RESPONSE
[19:41:03][D][voice_assistant:477]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[19:41:03][D][light:052]: 'living_room_assist' Setting:
[19:41:03][D][light:076]:   Red: 0%, Green: 0%, Blue: 100%
[19:41:03][D][light:126]:   Effect: 'Fast Pulse'
[19:41:03][D][voice_assistant:477]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[19:41:04][D][voice_assistant:623]: Event Type: 0
[19:41:04][E][voice_assistant:796]: Error: stt-no-text-recognized - No text recognized
[19:41:04][D][voice_assistant:605]: Signaling stop
[19:41:04][D][voice_assistant:477]: State changed from AWAITING_RESPONSE to STOP_MICROPHONE
[19:41:04][D][voice_assistant:484]: Desired state set to IDLE
[19:41:04][D][voice_assistant:623]: Event Type: 2
[19:41:04][D][voice_assistant:763]: Assist Pipeline ended
[19:41:04][D][voice_assistant:477]: State changed from STOP_MICROPHONE to IDLE
[19:41:04][D][light:052]: 'living_room_assist' Setting:
[19:41:04][D][light:069]:   Brightness: 100%
[19:41:04][D][light:076]:   Red: 100%, Green: 0%, Blue: 0%
[19:41:04][D][light:126]:   Effect: 'None'
[19:41:05][D][micro_wake_word:360]: Starting wake word detection
[19:41:05][D][light:052]: 'living_room_assist' Setting:
[19:41:05][D][light:069]:   Brightness: 60%
[19:41:05][D][light:076]:   Red: 100%, Green: 89%, Blue: 71%
[19:41:05][D][micro_wake_word:378]: State changed from STOPPED to STARTING
[19:41:05][D][micro_wake_word:261]: Inference task has started, attempting to allocate memory for buffers
[19:41:05][D][micro_wake_word:266]: Inference task is running
[19:41:05][D][micro_wake_word:378]: State changed from STARTING to DETECTING_WAKE_WORD
[19:41:05][D][ring_buffer:034][mww]: Created ring buffer with size 3840
[19:41:06][D][light:052]: 'living_room_assist' Setting:
[19:41:06][D][light:069]:   Brightness: 60%
[19:41:06][D][light:076]:   Red: 100%, Green: 89%, Blue: 71%

Does that help anyone smarter than me see what’s going on? It kind of seems like the Atom Echo is not able to communicate with the STT entity; I’m not really sure where to start.

I know the mic works, because it successfully detects the wake word, and it does this consistently. But the pipeline somehow can’t interpret input from this device’s microphone.

Again, STT works correctly from a laptop or mobile mic, so I know it has something to do with the M5Stack device…

Do you see any error in the voice assistant log when you go to your Voice Assistants page, click the three dots on your pipeline, and then Debug?
You can see the raw log for each triggered conversation at the bottom of the debug page.

If I’m viewing the correct log, then the error is:

error:
  code: stt-no-text-recognized
  message: No text recognized

Again, though, the same wake word/command succeeds perfectly when spoken into the mobile app mic over HTTPS.

I have exactly the same issue: working perfectly one day and getting stt-no-text-recognized the next. I’ve been using Home Assistant Cloud speech-to-text. I switched to faster-whisper, and it’s recognizing speech, but not very well; it gets most sentences wrong. I can’t figure it out. I think it started after an ESPHome update, but I’m not sure. I have a Home Assistant Yellow and an Atom Echo.

I know that in a browser or in the mobile companion app, if you’re connected to HA via http:// it will not allow you to use the mic for Assist. If you’re connected via https:// (whether locally or from outside your home network, e.g., over cell data from the companion app), then it will. I’m wondering if this has something to do with the Atom Echo trying to use the STT engine from within the local network without encryption, but I don’t know how to check what the Echo is using to communicate, nor how to change it. Would anyone with more knowledge of ESPHome be able to point me in the right direction, or have another suggestion if that doesn’t seem to be the issue?
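For what it’s worth, the device log above prints “Using noise encryption: YES”, which I believe means the Echo talks to HA over the ESPHome native API on TCP port 6053 (encrypted with a Noise pre-shared key), not over HTTP/HTTPS at all, so the browser’s HTTPS-only mic restriction may simply not apply here. If I understand right, the relevant config block looks something like this (the key below is a placeholder, not a real value):

```yaml
# ESPHome native API: its own TCP protocol on port 6053, encrypted with
# a Noise PSK, independent of HA's HTTP/HTTPS setup. Placeholder key below.
api:
  encryption:
    key: "REPLACE_WITH_BASE64_32_BYTE_KEY"
```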

OK, here’s a full log from the Voice Assistant debug page (I was on mobile before and couldn’t get it to let me select the whole log):

stage: done
run:
  pipeline: [REDACTED]
  language: en
  conversation_id: [REDACTED]
  tts_output:
    token: [REDACTED].wav
    url: /api/tts_proxy/[REDACTED].wav
    mime_type: audio/x-wav
    stream_response: false
events:
  - type: run-start
    data:
      pipeline: [REDACTED]
      language: en
      conversation_id: [REDACTED]
      tts_output:
        token: [REDACTED].wav
        url: /api/tts_proxy/[REDACTED].wav
        mime_type: audio/x-wav
        stream_response: false
    timestamp: "2025-07-23T06:14:02.828481+00:00"
  - type: stt-start
    data:
      engine: stt.speech_to_phrase
      metadata:
        language: en
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2025-07-23T06:14:02.829723+00:00"
  - type: stt-vad-start
    data:
      timestamp: 2470
    timestamp: "2025-07-23T06:14:05.418225+00:00"
  - type: stt-vad-end
    data:
      timestamp: 3380
    timestamp: "2025-07-23T06:14:06.329031+00:00"
  - type: error
    data:
      code: stt-no-text-recognized
      message: No text recognized
    timestamp: "2025-07-23T06:14:06.410426+00:00"
  - type: run-end
    data: null
    timestamp: "2025-07-23T06:14:06.411993+00:00"
stt:
  engine: stt.speech_to_phrase
  metadata:
    language: en
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: false
error:
  code: stt-no-text-recognized
  message: No text recognized
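Per the stt metadata in that log, Speech-to-Phrase expects 16 kHz, 16-bit, mono PCM WAV audio. In case it helps anyone sanity-check a captured recording against that, here’s a quick Python sketch (`check_stt_format` is just a name I made up; only the 16000/16-bit/mono target comes from the log above):

```python
# Hypothetical helper: verify a WAV file matches the audio format the
# STT metadata above declares (16 kHz sample rate, 16-bit samples, mono).
import wave


def check_stt_format(path: str) -> list[str]:
    """Return a list of mismatches against the 16 kHz / 16-bit / mono target.

    An empty list means the file already matches what Speech-to-Phrase
    reported in the pipeline debug log.
    """
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != 16000:
            problems.append(f"sample rate {w.getframerate()} != 16000")
        if w.getsampwidth() != 2:
            problems.append(f"sample width {w.getsampwidth() * 8} bits != 16")
        if w.getnchannels() != 1:
            problems.append(f"{w.getnchannels()} channels != 1 (mono)")
    return problems
```

Running it on a capture that comes back empty would at least rule out a format mismatch as the cause of the no-text-recognized error.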

OK, completely randomly, it succeeded about four times in turning the exposed light on and off via the Atom Echo. Most times, though, it still gives a red light and the no-text-recognized error. So I thought: it sounds like it’s trying to do all the processing on the tiny device itself. I switched the wake word engine location from On Device to In Home Assistant, and now it succeeds much more often (albeit extremely slowly). However, the device seems to get hot, and then it stops being able to interpret any input.

I think this device might be trying to do all the voice processing on-device instead of just the wake word detection…

So I built a Wyoming satellite on a Raspberry Pi Zero 2 W, and everything works fine. I think the issue is really that the M5Stack is being used as a satellite (perhaps because of the way I set it up), and it’s just not powerful enough.