M5Stack Atom Echo issues (buggy audio feedback, gets stuck (?))

Hi everybody,

I bought some M5Stack Atom Echo devices, hoping to replace my existing Alexa devices. They were purchased from the official M5Stack store, so I’ll assume the issues described below are not caused by cheap counterfeit units.

The firmware was installed with the web installer linked from the “$13 voice assistant for Home Assistant” page on the Home Assistant website, using a Chromium browser. I tested two separate M5Stack Atom Echos and the issues were the same on both: each was flashed, connected to my WiFi network, and then automatically discovered by Home Assistant.

The devices are powered via USB-C by an Anker USB power hub.

Setup: voice assistant, openWakeWord, Piper, Whisper

1. buggy speaker

(Video, you might need to increase the volume to hear this.) My setup is in German, so the response is “Schwenker existiert nicht” (‘Schwenker does not exist’). It isn’t played back as I’d expect, though; it sounds more like “Schwenker ex…xsxsxs… sstiert nicht”.

This is the same with all audio feedback. “Schwenker” does actually exist, so when I turn it on/off, the feedback sounds like this (video): instead of “ausgeschaltet” (‘switched off’), it says “ausge…schschschalttttttttttetttttt”.

My initial thought was that the speaker was damaged, but then why would this happen on both (brand new) devices?
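
The device log further down may hint at why the audio stutters: while the TTS stream is running, the I2S speaker is started and stopped over and over (by my count about 25 times in roughly ten seconds). Below is a minimal Python sketch for counting that, assuming the log format shown in this post. `count_speaker_restarts` is a hypothetical helper name, and the sample is a shortened excerpt of my log. Whether these restarts actually cause the stutter is only my guess.

```python
# Shortened excerpt of the device log from this post (only two restarts kept).
SAMPLE = """\
[08:39:06][D][voice_assistant:664]: TTS stream start
[08:39:07][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:16][D][voice_assistant:672]: TTS stream end
"""


def count_speaker_restarts(log_text):
    """Count 'Starting I2S Audio Speaker' lines seen while a TTS stream is active."""
    in_stream = False
    restarts = 0
    for line in log_text.splitlines():
        if "TTS stream start" in line:
            in_stream = True
        elif "TTS stream end" in line:
            in_stream = False
        elif in_stream and "Starting I2S Audio Speaker" in line:
            restarts += 1
    return restarts


if __name__ == "__main__":
    print("speaker restarts during TTS stream:", count_speaker_restarts(SAMPLE))
```

Pasting the full log in place of `SAMPLE` gives the count for a whole response.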

2. not recognizing commands, then getting stuck

Sometimes, commands don’t get recognized. I made an automation for reporting the current time:

This should pick up on “Wie spät” (‘what time is it’) or “Uhrzeit” (‘time’). Sometimes this works fine; when it doesn’t, I get “Entschuldigung, das habe ich nicht verstanden” (‘sorry, I did not understand that’), also with the buggy audio feedback.

When this happens, I can no longer trigger the device by saying “okay nabu”; the LED stays white (it would be blue if it were listening for commands). Sometimes I can trigger it again after a minute or so; other times I have to power cycle it to get it working again.

3. not that great at understanding speech (?)

Sometimes, commands will be understood, sometimes they won’t. The device is on my desk, so maybe 50cm in front of me. I have to speak very clearly in order to be understood.

As you can see in the setup, I am using the base Whisper model. Before, it was tiny-int8, which didn’t work at all. base does work now, but not that well. I am not sure which model to pick so that recognition is good enough without being too slow (see 4.).

4. takes way too long

When I say a command, it takes between 6 and 10 seconds for the device and/or Home Assistant to act on it. I say something, the LED pulses blue for that long, and only then does it reply and/or execute the command.
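
To see where those seconds go, here is a small Python sketch (my own helper, not part of ESPHome or Home Assistant) that reads the timestamps from the device log below and measures the gaps between pipeline stages. The stage names and the `stage_times` / `seconds_between` helpers are made up for illustration; the marker strings and sample lines are copied from my log. The timestamps only have one-second resolution, so each figure can be off by about a second.

```python
import re
from datetime import datetime

# Matches ESPHome log lines such as:
# [08:38:55][D][voice_assistant:532]: Wake word detected
LOG_LINE = re.compile(r"^\[(\d{2}:\d{2}:\d{2})\]\[\w\]\[[^\]]+\]: (.*)$")

# (stage name, start of the log message); stage names are hypothetical labels.
MARKERS = [
    ("wake_word", "Wake word detected"),
    ("stt_end", "STT by VAD end"),
    ("stt_result", "Speech recognised as"),
    ("intent_start", "Intent started"),
    ("response", "Response:"),
    ("tts_start", "TTS stream start"),
    ("tts_end", "TTS stream end"),
]

# Selected lines from the log in this post.
SAMPLE = """\
[08:38:55][D][voice_assistant:532]: Wake word detected
[08:38:57][D][voice_assistant:681]: STT by VAD end
[08:38:59][D][voice_assistant:551]: Speech recognised as: " Schwänker an"
[08:38:59][D][voice_assistant:556]: Intent started
[08:39:05][D][voice_assistant:579]: Response: "Entschuldigung, das habe ich nicht verstanden"
[08:39:06][D][voice_assistant:664]: TTS stream start
[08:39:16][D][voice_assistant:672]: TTS stream end
"""


def stage_times(log_text):
    """Return the first timestamp at which each marker appears in the log."""
    times = {}
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        stamp, message = match.groups()
        for name, needle in MARKERS:
            if name not in times and message.startswith(needle):
                times[name] = datetime.strptime(stamp, "%H:%M:%S")
    return times


def seconds_between(times, start, end):
    """Whole seconds elapsed between two recorded stages."""
    return int((times[end] - times[start]).total_seconds())


if __name__ == "__main__":
    t = stage_times(SAMPLE)
    print("speech end -> transcript:", seconds_between(t, "stt_end", "stt_result"), "s")
    print("intent start -> response:", seconds_between(t, "intent_start", "response"), "s")
    print("TTS playback:", seconds_between(t, "tts_start", "tts_end"), "s")
```

For the run logged below this gives roughly 2 s from the end of speech to the transcript, 6 s between “Intent started” and the response text, and 10 s for playing the reply. So at least in that run, most of the wait before the reply seems to sit in the intent stage rather than in speech-to-text.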

Below are the log files from one of the two devices; this one was updated to the latest ESPHome version after adopting it; the other one (with the exact same issues) was still one ESPHome version behind.


[08:38:52][D][voice_assistant:185]: VAD detected speech
[08:38:52][D][voice_assistant:416]: State changed from WAITING_FOR_VAD to START_PIPELINE
[08:38:52][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:38:52][D][voice_assistant:202]: Requesting start...
[08:38:52][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:38:53][D][voice_assistant:437]: Client started, streaming microphone
[08:38:53][D][voice_assistant:416]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[08:38:53][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:38:53][D][voice_assistant:523]: Event Type: 1
[08:38:53][D][voice_assistant:526]: Assist Pipeline running
[08:38:53][D][voice_assistant:523]: Event Type: 9
[08:38:55][D][voice_assistant:523]: Event Type: 10
[08:38:55][D][voice_assistant:532]: Wake word detected
[08:38:55][D][voice_assistant:523]: Event Type: 3
[08:38:55][D][voice_assistant:537]: STT started
[08:38:55][D][light:036]: 'Unten/Arbeitszimmer/Echo-m5-235f1c' Setting:
[08:38:55][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[08:38:55][D][light:109]:   Effect: 'Slow Pulse'
[08:38:57][D][voice_assistant:523]: Event Type: 11
[08:38:57][D][voice_assistant:677]: Starting STT by VAD
[08:38:57][D][voice_assistant:523]: Event Type: 12
[08:38:57][D][voice_assistant:681]: STT by VAD end
[08:38:57][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[08:38:57][D][voice_assistant:422]: Desired state set to AWAITING_RESPONSE
[08:38:57][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[08:38:57][D][light:036]: 'Unten/Arbeitszimmer/Echo-m5-235f1c' Setting:
[08:38:57][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[08:38:57][D][light:109]:   Effect: 'Fast Pulse'
[08:38:58][D][esp-idf:000]: I (66522115) I2S: DMA queue destroyed

[08:38:58][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[08:38:59][D][voice_assistant:523]: Event Type: 4
[08:38:59][D][voice_assistant:551]: Speech recognised as: " Schwänker an"
[08:38:59][D][voice_assistant:523]: Event Type: 5
[08:38:59][D][voice_assistant:556]: Intent started
[08:39:05][D][voice_assistant:523]: Event Type: 6
[08:39:05][D][voice_assistant:523]: Event Type: 7
[08:39:05][D][voice_assistant:579]: Response: "Entschuldigung, das habe ich nicht verstanden"
[08:39:05][D][light:036]: 'Unten/Arbeitszimmer/Echo-m5-235f1c' Setting:
[08:39:05][D][light:051]:   Brightness: 100%
[08:39:05][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[08:39:05][D][light:109]:   Effect: 'None'
[08:39:05][D][voice_assistant:523]: Event Type: 8
[08:39:05][D][voice_assistant:599]: Response URL: "http://10.0.0.25:8123/api/tts_proxy/5c02e4a6af79b53b45aa3d8f4b2d40a7881ea901_de-de_78c4af86c1_tts.home_assistant_cloud.wav"
[08:39:05][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[08:39:05][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[08:39:05][D][esp-idf:000]: I (66529418) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[08:39:05][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:05][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:05][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:05][D][light:036]: 'Unten/Arbeitszimmer/Echo-m5-235f1c' Setting:
[08:39:05][D][light:051]:   Brightness: 60%
[08:39:05][D][light:059]:   Red: 100%, Green: 89%, Blue: 71%
[08:39:06][D][voice_assistant:523]: Event Type: 98
[08:39:06][D][voice_assistant:664]: TTS stream start
[08:39:07][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:10][D][esp-idf:000]: I (66534153) I2S: DMA queue destroyed
[08:39:10][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:12][D][esp-idf:000]: I (66536811) I2S: DMA queue destroyed
[08:39:12][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:14][D][esp-idf:000]: I (66538306) I2S: DMA queue destroyed

[08:39:14][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:16][D][voice_assistant:523]: Event Type: 99
[08:39:16][D][voice_assistant:672]: TTS stream end
[08:39:16][D][esp-idf:000]: I (66540523) I2S: DMA queue destroyed
[08:39:16][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:16][D][voice_assistant:319]: Speaker has finished outputting all audio
[08:39:16][D][voice_assistant:416]: State changed from RESPONSE_FINISHED to IDLE
[08:39:16][D][voice_assistant:422]: Desired state set to IDLE
[08:39:16][D][voice_assistant:416]: State changed from IDLE to START_MICROPHONE
[08:39:16][D][voice_assistant:422]: Desired state set to WAIT_FOR_VAD
[08:39:16][D][voice_assistant:155]: Starting Microphone
[08:39:16][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:39:16][D][esp-idf:000]: I (66540560) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[08:39:16][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[08:39:16][D][voice_assistant:172]: Waiting for speech...
[08:39:16][D][voice_assistant:416]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[08:39:16][D][voice_assistant:185]: VAD detected speech
[08:39:16][D][voice_assistant:416]: State changed from WAITING_FOR_VAD to START_PIPELINE
[08:39:16][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:39:16][D][voice_assistant:202]: Requesting start...
[08:39:16][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:39:16][D][voice_assistant:437]: Client started, streaming microphone
[08:39:16][D][voice_assistant:416]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[08:39:16][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:39:16][D][voice_assistant:523]: Event Type: 1
[08:39:16][D][voice_assistant:526]: Assist Pipeline running
[08:39:16][D][voice_assistant:523]: Event Type: 9

I haven’t had any experience with voice assistants or the M5Stack Atom Echo, so I thought using the web installer would be best. But perhaps there are some settings I should change to make things work better? Below is the ESPHome YAML that must have been generated automatically by the web installer:

substitutions:
  name: m5stack-atom-echo-235f1c
  friendly_name: Unten/Arbeitszimmer/Echo-m5-235f1c
packages:
  m5stack.atom-echo-voice-assistant: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: yesnomaybeidontknow


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

Should I add / remove / change anything here? Since this is an ESP32, I would like to add a Bluetooth proxy (though this might make things even slower?).

I was looking forward to a local solution, but as of now, it feels nothing like using Alexa. It feels like gambling on whether something will work and then, if it doesn’t, waiting until it’s possible to try again.

Thank you in advance for your ideas :)