Voice Assist with M5Stack Atom Echo is working, but do wake word and speech-to-text take too long?

Hi All!

My M5Stack Atom Echo finally arrived today! :smiley: I installed it and it works, even in Italian :slight_smile: But it seems quite slow, and I don’t know whether I can change anything to improve it.

I have Home Assistant OS installed on a Beelink MINI S12 Pro with an Alder Lake-N N100, 16GB DDR4 and a 500GB M.2 PCIe 2280 NVMe, so I think it’s quite powerful!

I’m running some tests, and the Whisper add-on log says:

INFO:faster_whisper:Processing audio with duration 00:02.560
INFO:wyoming_faster_whisper.handler: Accendi luce studio

The Voice Assist debug view shows this raw output for the wake word stage, which took 2.43 seconds:

entity_id: wake_word.openwakeword
metadata:
  format: wav
  codec: pcm
  bit_rate: 16
  sample_rate: 16000
  channel: 1
timeout: 5
wake_word_output:
  wake_word_id: ok_nabu_v0.1
  wake_word_phrase: ok nabu
  timestamp: 1235

and this raw output for the speech-to-text stage, which took 7.03 seconds:

metadata:
  language: it
  format: wav
  codec: pcm
  bit_rate: 16
  sample_rate: 16000
  channel: 1
stt_output:
  text: " Accendi luce studio"

The natural language processing stage took only 0.05 seconds:

conversation_id: null
device_id: 5051163320b288942a1c8a5c4096ebaf
intent_output:
  response:
    speech:
      plain:
        speech: Ho acceso
        extra_data: null
    card: {}
    language: it
    response_type: action_done
    data:
      targets: []
      success:
        - name: Studio
          type: area
          id: studio
        - name: Studio
          type: entity
          id: light.studio
        - name: Studio Strip
          type: entity
          id: light.studio_strip
      failed: []
  conversation_id: null
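
To put the three stage timings side by side, here is a small sketch of the arithmetic. The numbers are copied from the debug output and the Whisper log above; the function names are only illustrative, and the "stage time per second of audio" figure is just a rough comparison, not an official metric.

```python
# Stage timings in seconds, copied from the Voice Assist debug output.
STAGE_SECONDS = {
    "wake_word": 2.43,
    "speech_to_text": 7.03,
    "intent": 0.05,
}
# Audio duration reported by faster_whisper in the add-on log (00:02.560).
AUDIO_SECONDS = 2.56


def stage_shares(stages):
    """Return each stage's fraction of the total pipeline time."""
    total = sum(stages.values())
    return {name: secs / total for name, secs in stages.items()}


def time_per_audio_second(stage_seconds, audio_seconds):
    """Stage time divided by the length of the audio (illustrative ratio)."""
    return stage_seconds / audio_seconds


if __name__ == "__main__":
    for name, share in stage_shares(STAGE_SECONDS).items():
        print(f"{name}: {STAGE_SECONDS[name]:.2f} s ({share:.0%} of total)")
    ratio = time_per_audio_second(STAGE_SECONDS["speech_to_text"], AUDIO_SECONDS)
    print(f"speech-to-text stage / audio length: {ratio:.2f}x")
```

Speech-to-text clearly dominates the total here, which is why that is the stage I’m asking about.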

The Whisper settings are:

model: small-int8
language: it
beam size: 5

and this is the full Whisper log:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service whisper: starting
s6-rc: info: service whisper successfully started
s6-rc: info: service discovery: starting
[16:16:45] WARNING: Your CPU does not support Advanced Vector Extensions (AVX). Whisper will run slower than normal.
INFO:__main__:Ready
[16:16:48] INFO: Successfully send discovery information to Home Assistant.
s6-rc: info: service discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
INFO:faster_whisper:Processing audio with duration 00:02.560
INFO:wyoming_faster_whisper.handler: Accendi luce studio
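
The AVX warning in the log can be cross-checked against the CPU flags that the operating system actually sees. Here is a minimal sketch, assuming a Linux shell where Python is available and `/proc/cpuinfo` is readable; the helper names are illustrative and not part of the Whisper add-on.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text):
    """True if the exact 'avx' flag is among the reported CPU flags."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    # Assumption: a Linux system that exposes /proc/cpuinfo.
    text = Path("/proc/cpuinfo").read_text()
    print("AVX reported:", has_avx(text))
```

If this prints `False` on a CPU that should support AVX, one possible explanation is a virtualization layer hiding the flag, but that is only a guess and nothing in the log confirms it.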

Is something misconfigured? The wake word and speech-to-text processing seem to take a long time. Am I right?

Thanks in advance!

Go to Settings > Devices and click the ESP device. On its configuration card, what is “Finished Speaking Detection” set to? ‘Relaxed’ seems to work well.

Yes, it’s set to Relaxed… but I don’t think the speech-to-text time is related to this setting.