M5Stack Atom Echo issues (buggy feedback, gets stuck (?))

prankousky · March 13, 2024, 7:50am

Hi everybody,

I bought some m5stack Atom Echo devices, hoping to replace my existing Alexa devices. They were purchased from the official m5 store (so I’ll just assume the issues I’m about to describe are not caused by me using cheap fake copies of these).

Firmware was installed from the “$13 voice assistant for Home Assistant” page on the Home Assistant website, using a Chromium browser. Both devices (I tested two individual M5Stack Atom Echos; the issues were the same on both) were flashed, connected to my WiFi network, and then automatically discovered by Home Assistant.

The devices are powered via USB-C by an Anker USB power hub.

Setup

voice assistant

openwakeword

piper

whisper

1. buggy speaker

(Video, you might need to increase volume to hear this) My setup is in German, so is the response “Schwenker existiert nicht” (‘schwenker does not exist’). But it doesn’t play this as I’d expect, rather like “Schwenker ex…xsxsxs… sstiert nicht”.

This is the same with all audio feedback. “Schwenker” does actually exist, so when I turn it on/off, feedback will be like (video) this… instead of “ausgeschaltet” (‘switched off’), it says “ausge…schschschalttttttttttetttttt”.

My initial thought was that the speaker was damaged, but then why would this happen on both (brand new) devices?

2. not recognizing commands, then getting stuck

Sometimes, commands don’t get recognized. I made an automation for reporting the current time:

This should pick up on “Wie spät” (‘what time is it’) or “Uhrzeit” (‘time’). Sometimes this works fine; when it doesn’t, I get an “Entschuldigung, das habe ich nicht verstanden” (‘sorry, I did not understand this’), also with buggy audio feedback.
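For readers following along, a sentence-triggered automation of this kind might look roughly like the sketch below. It is not the automation from this post (which is not shown here); the alias and the response wording are assumptions, and it assumes a Home Assistant version that supports sentence triggers and the `set_conversation_response` action.

```yaml
# Hypothetical sketch (an entry in automations.yaml), NOT the original automation.
- alias: "Report current time"        # assumed name
  trigger:
    # Sentence trigger: fires when Assist hears one of these phrases.
    - platform: conversation
      command:
        - "Wie spät"
        - "Uhrzeit"
  action:
    # The text set here is spoken back as the reply to the sentence.
    - set_conversation_response: "Es ist {{ now().strftime('%H:%M') }} Uhr"
```

If the sentence is transcribed differently by Whisper (for example with extra words), the trigger will not match and Assist falls back to its "not understood" reply.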

When this happens, I can no longer trigger the device by saying “okay nabu”; the LED stays white (it would be blue if it were listening for commands). Sometimes I can trigger it again after a minute or so; other times I have to power cycle it to get it working again.

3. not that great in understanding language (?)

Sometimes, commands will be understood, sometimes they won’t. The device is on my desk, so maybe 50cm in front of me. I have to speak very clearly in order to be understood.

As you can see in the setup, I am using the base model. Before, it was tiny-int8, which didn’t work at all. base does work now, but not that well. I am not sure which model to pick so that recognition is good enough without being too slow (see 4.).

4. takes way too long

When I say a command, it takes between 6 and 10 seconds for the device and/or Home Assistant to act on it. I say something, the LED pulses blue for that amount of time, and then the device replies and/or executes the command.

Below are the log files from one of the two devices; this one was updated to the latest ESPHome version after adopting it; the other one (with the exact same issues) was still one ESPHome version behind.


[08:38:52][D][voice_assistant:185]: VAD detected speech
[08:38:52][D][voice_assistant:416]: State changed from WAITING_FOR_VAD to START_PIPELINE
[08:38:52][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:38:52][D][voice_assistant:202]: Requesting start...
[08:38:52][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:38:53][D][voice_assistant:437]: Client started, streaming microphone
[08:38:53][D][voice_assistant:416]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[08:38:53][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:38:53][D][voice_assistant:523]: Event Type: 1
[08:38:53][D][voice_assistant:526]: Assist Pipeline running
[08:38:53][D][voice_assistant:523]: Event Type: 9
[08:38:55][D][voice_assistant:523]: Event Type: 10
[08:38:55][D][voice_assistant:532]: Wake word detected
[08:38:55][D][voice_assistant:523]: Event Type: 3
[08:38:55][D][voice_assistant:537]: STT started
[08:38:55][D][light:036]: 'Unten/Arbeitszimmer/Echo-m5-235f1c' Setting:
[08:38:55][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[08:38:55][D][light:109]:   Effect: 'Slow Pulse'
[08:38:57][D][voice_assistant:523]: Event Type: 11
[08:38:57][D][voice_assistant:677]: Starting STT by VAD
[08:38:57][D][voice_assistant:523]: Event Type: 12
[08:38:57][D][voice_assistant:681]: STT by VAD end
[08:38:57][D][voice_assistant:416]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[08:38:57][D][voice_assistant:422]: Desired state set to AWAITING_RESPONSE
[08:38:57][D][voice_assistant:416]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[08:38:57][D][light:036]: 'Unten/Arbeitszimmer/Echo-m5-235f1c' Setting:
[08:38:57][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[08:38:57][D][light:109]:   Effect: 'Fast Pulse'
[08:38:58][D][esp-idf:000]: I (66522115) I2S: DMA queue destroyed

[08:38:58][D][voice_assistant:416]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[08:38:59][D][voice_assistant:523]: Event Type: 4
[08:38:59][D][voice_assistant:551]: Speech recognised as: " Schwänker an"
[08:38:59][D][voice_assistant:523]: Event Type: 5
[08:38:59][D][voice_assistant:556]: Intent started
[08:39:05][D][voice_assistant:523]: Event Type: 6
[08:39:05][D][voice_assistant:523]: Event Type: 7
[08:39:05][D][voice_assistant:579]: Response: "Entschuldigung, das habe ich nicht verstanden"
[08:39:05][D][light:036]: 'Unten/Arbeitszimmer/Echo-m5-235f1c' Setting:
[08:39:05][D][light:051]:   Brightness: 100%
[08:39:05][D][light:059]:   Red: 0%, Green: 0%, Blue: 100%
[08:39:05][D][light:109]:   Effect: 'None'
[08:39:05][D][voice_assistant:523]: Event Type: 8
[08:39:05][D][voice_assistant:599]: Response URL: "http://10.0.0.25:8123/api/tts_proxy/5c02e4a6af79b53b45aa3d8f4b2d40a7881ea901_de-de_78c4af86c1_tts.home_assistant_cloud.wav"
[08:39:05][D][voice_assistant:416]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[08:39:05][D][voice_assistant:422]: Desired state set to STREAMING_RESPONSE
[08:39:05][D][esp-idf:000]: I (66529418) I2S: DMA Malloc info, datalen=blocksize=512, dma_buf_count=8

[08:39:05][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:05][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:05][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:05][D][light:036]: 'Unten/Arbeitszimmer/Echo-m5-235f1c' Setting:
[08:39:05][D][light:051]:   Brightness: 60%
[08:39:05][D][light:059]:   Red: 100%, Green: 89%, Blue: 71%
[08:39:06][D][voice_assistant:523]: Event Type: 98
[08:39:06][D][voice_assistant:664]: TTS stream start
[08:39:07][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:07][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:08][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:09][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:10][D][esp-idf:000]: I (66534153) I2S: DMA queue destroyed
[08:39:10][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:10][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:11][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:12][D][esp-idf:000]: I (66536811) I2S: DMA queue destroyed
[08:39:12][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:12][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:13][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:14][D][esp-idf:000]: I (66538306) I2S: DMA queue destroyed

[08:39:14][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:14][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:15][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:167]: Stopping I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:161]: Starting I2S Audio Speaker
[08:39:16][D][i2s_audio.speaker:164]: Started I2S Audio Speaker
[08:39:16][D][voice_assistant:523]: Event Type: 99
[08:39:16][D][voice_assistant:672]: TTS stream end
[08:39:16][D][esp-idf:000]: I (66540523) I2S: DMA queue destroyed
[08:39:16][D][i2s_audio.speaker:178]: Stopped I2S Audio Speaker
[08:39:16][D][voice_assistant:319]: Speaker has finished outputting all audio
[08:39:16][D][voice_assistant:416]: State changed from RESPONSE_FINISHED to IDLE
[08:39:16][D][voice_assistant:422]: Desired state set to IDLE
[08:39:16][D][voice_assistant:416]: State changed from IDLE to START_MICROPHONE
[08:39:16][D][voice_assistant:422]: Desired state set to WAIT_FOR_VAD
[08:39:16][D][voice_assistant:155]: Starting Microphone
[08:39:16][D][voice_assistant:416]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[08:39:16][D][esp-idf:000]: I (66540560) I2S: DMA Malloc info, datalen=blocksize=1024, dma_buf_count=4
[08:39:16][D][voice_assistant:416]: State changed from STARTING_MICROPHONE to WAIT_FOR_VAD
[08:39:16][D][voice_assistant:172]: Waiting for speech...
[08:39:16][D][voice_assistant:416]: State changed from WAIT_FOR_VAD to WAITING_FOR_VAD
[08:39:16][D][voice_assistant:185]: VAD detected speech
[08:39:16][D][voice_assistant:416]: State changed from WAITING_FOR_VAD to START_PIPELINE
[08:39:16][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:39:16][D][voice_assistant:202]: Requesting start...
[08:39:16][D][voice_assistant:416]: State changed from START_PIPELINE to STARTING_PIPELINE
[08:39:16][D][voice_assistant:437]: Client started, streaming microphone
[08:39:16][D][voice_assistant:416]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[08:39:16][D][voice_assistant:422]: Desired state set to STREAMING_MICROPHONE
[08:39:16][D][voice_assistant:523]: Event Type: 1
[08:39:16][D][voice_assistant:526]: Assist Pipeline running
[08:39:16][D][voice_assistant:523]: Event Type: 9

I haven’t had any experience with voice / M5Stack Atom Echo, so I thought using the web installer would be best. But perhaps there are some settings I should change to make things work better? Below is the ESPHome YAML that must have been generated automatically when I used the web installer:

substitutions:
  name: m5stack-atom-echo-235f1c
  friendly_name: Unten/Arbeitszimmer/Echo-m5-235f1c
packages:
  m5stack.atom-echo-voice-assistant: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: yesnomaybeidontknow


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

Should I add / remove / change anything here? Since this is an ESP32, I would like to add bluetooth proxy (though this might make things even slower?).

I was looking forward to a local solution, but as of now, it feels nothing like using Alexa. It feels like gambling on whether or not something will work, then (if it doesn’t) waiting until it’s possible to try again.

Thank you in advance for your ideas

saxn-paule · June 10, 2024, 11:58am

I have exactly the same issue with a brand new Echo.
With the language set to English, everything works well and all commands are recognized.

After switching to German, commands are recognized only with a lot of luck, and the Echo hangs after 5 - 10 commands.

saxn-paule · June 12, 2024, 11:33am

With these settings/model, German recognition works well:

command: --model guillaumekln/faster-whisper-small --language de --beam-size 3

To keep the Echo from getting stuck, use an automation that, every three minutes, turns wake word detection off and then back on 2 seconds later.
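A minimal sketch of such an automation, assuming the device exposes wake word detection as a switch entity in Home Assistant. The alias and the entity ID `switch.atom_echo_use_wake_word` are hypothetical; substitute the switch shown on your device's page.

```yaml
# Hypothetical sketch (an entry in automations.yaml); the entity ID is an assumption.
- alias: "Restart Atom Echo wake word detection"   # assumed name
  trigger:
    - platform: time_pattern
      minutes: "/3"                                # fires every three minutes
  action:
    - service: switch.turn_off
      target:
        entity_id: switch.atom_echo_use_wake_word  # assumed entity ID
    - delay: "00:00:02"                            # wait 2 seconds
    - service: switch.turn_on
      target:
        entity_id: switch.atom_echo_use_wake_word  # assumed entity ID
```

Note that the toggle runs on a fixed schedule, so it may occasionally cut off a command that is in progress at that moment.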

prankousky · June 13, 2024, 12:45pm

Where did you place this command? Sorry, it’s been a while since I’ve been working with this…

saxn-paule · June 13, 2024, 2:00pm

As part of my Docker config.

version: '3'

services:
  whisper:
    container_name: whisper
    image: rhasspy/wyoming-whisper
    command: --model guillaumekln/faster-whisper-small --language de --beam-size 3
    volumes:
      - /home/psc/Docker-Whisper/whisper-data:/data
    environment:
      - TZ=Europe/Berlin
    restart: unless-stopped
    ports:
      - 10300:10300

  piper:
    container_name: piper
    image: rhasspy/wyoming-piper
    command: --voice en_US-lessac-high
    volumes:
      - /home/psc/Docker-Whisper/piper-data:/data
    environment:
      - TZ=Europe/Berlin
    restart: unless-stopped
    ports:
      - 10200:10200

I can’t use the HA addons because I run Home-Assistant in a Docker container as well.

On a NUC with an i5-6260U and 8 GB RAM, it takes ~4 seconds to recognize the wake word and execute the command.

radinsky · July 13, 2024, 1:03pm

I have all the same issues as described in the first post.
I have searched both the forums and the project’s GitHub; all of the issues are reported there, but no fixes or workarounds are provided.
I hope to see some development on this. Voice assistants got so much attention (Year of the Voice, etc.), but so far this is barely demo grade, and certainly not something usable in an everyday routine.
For now I have ordered an S3 Box and will test its performance (though I don’t expect any huge surprises).

w00z · July 31, 2024, 7:30am

Would you mind posting your experience with the S3? As you said, the Atom Echo is only usable for toying around, not for day to day use.

radinsky · July 31, 2024, 9:36am

I have been playing with it for the last week or so and tried the “stock” Home Assistant ESPHome voice assistant firmware. It is not stable. You can reboot the device and it will work for a while, then it stops responding or the voice recognition stops working. Really not reliable so far.

I also tried BigBobbas’ custom firmware (look on GitHub). The voice assistant issues are pretty much the same (I think even worse). There is touchscreen functionality you can customize, but it is not trivial and requires getting familiar with ESPHome YAML “coding”. Compiling and flashing the firmware for every change you want to verify is slow and takes a lot of time. Documentation and examples are also very limited, but ChatGPT helped (and confused) me a lot.

The S3 Box I purchased is the 3B and (I didn’t know) doesn’t include the battery (the SENSOR model is out of stock). If it starts working properly, I will get the battery (SENSOR) version.

Anyway for now it’s not stable and not reliable. Unfortunately not even close to google home or alexa…

hksthff · August 2, 2024, 9:54am

Unfortunately, same issues here. I was really excited about the Atom Echo and wanted to use it for basic info and controls (timer, clock, temperature, open/close shutters), but the device recognizes commands in neither German nor English (“sorry, not aware of…”).