Voice Assist with M5Stack Atom Echo activated. Long time for wake word and speech-to-text?

SaintTDI · April 3, 2024, 2:45pm

Hi All!

Today the M5Stack Atom Echo finally arrived! I installed it and it works, even with the Italian language. But it seems quite slow, and I don’t know if I can change something to improve it.

I have Home Assistant OS installed on a Beelink MINI S12 Pro with an Alder Lake-N N100, 16GB DDR4 and a 500GB M.2 PCIe 2280 NVMe, so I think it’s quite powerful!

I’m doing some tests, and the logs of the Whisper add-on say:

INFO:faster_whisper:Processing audio with duration 00:02.560
INFO:wyoming_faster_whisper.handler: Accendi luce studio

The Voice Assist debug view shows this raw output for the wake word, which took 2.43 seconds:

entity_id: wake_word.openwakeword
metadata:
  format: wav
  codec: pcm
  bit_rate: 16
  sample_rate: 16000
  channel: 1
timeout: 5
wake_word_output:
  wake_word_id: ok_nabu_v0.1
  wake_word_phrase: ok nabu
  timestamp: 1235

and this raw output for the speech-to-text, which took 7.03 seconds:

metadata:
  language: it
  format: wav
  codec: pcm
  bit_rate: 16
  sample_rate: 16000
  channel: 1
stt_output:
  text: " Accendi luce studio"

The natural language processing took only 0.05 seconds:

conversation_id: null
device_id: 5051163320b288942a1c8a5c4096ebaf
intent_output:
  response:
    speech:
      plain:
        speech: Ho acceso
        extra_data: null
    card: {}
    language: it
    response_type: action_done
    data:
      targets: []
      success:
        - name: Studio
          type: area
          id: studio
        - name: Studio
          type: entity
          id: light.studio
        - name: Studio Strip
          type: entity
          id: light.studio_strip
      failed: []
  conversation_id: null
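
To put the three stage timings next to the audio length that Whisper logged, here is a rough Python sketch. It only does arithmetic on the numbers quoted in this post; the helper name `parse_duration` and the assumption that the log duration is formatted as `MM:SS.mmm` (optionally with leading hours) are mine, not something taken from the add-on documentation.

```python
import re

def parse_duration(log_line: str) -> float:
    """Return the duration in seconds from a 'Processing audio with duration ...' line.

    Assumes the format MM:SS.mmm, optionally prefixed with hours (HH:MM:SS.mmm).
    """
    match = re.search(r"duration (?:(\d+):)?(\d+):(\d+\.\d+)", log_line)
    if match is None:
        raise ValueError("no duration found in log line")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)

# Values copied from the post: the Whisper log line and the pipeline debug timings.
LOG_LINE = "INFO:faster_whisper:Processing audio with duration 00:02.560"
STAGE_SECONDS = {"wake_word": 2.43, "speech_to_text": 7.03, "intent": 0.05}

audio_s = parse_duration(LOG_LINE)
total_s = sum(STAGE_SECONDS.values())
# Time spent in the speech-to-text stage beyond the length of the audio itself.
stt_extra_s = STAGE_SECONDS["speech_to_text"] - audio_s
# How many times longer the stage took than the audio it processed.
stt_ratio = STAGE_SECONDS["speech_to_text"] / audio_s

print(f"audio length:      {audio_s:.2f} s")
print(f"all stages:        {total_s:.2f} s")
print(f"STT beyond audio:  {stt_extra_s:.2f} s")
print(f"STT / audio ratio: {stt_ratio:.1f}x")
```

With these numbers the speech-to-text stage is by far the largest part of the total, which is why it is the stage I’m asking about.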

The settings for Whisper are:

model: small-int8
language: it
beam size: 5

and this is the full Whisper log:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service whisper: starting
s6-rc: info: service whisper successfully started
s6-rc: info: service discovery: starting
[16:16:45] WARNING: Your CPU does not support Advanced Vector Extensions (AVX). Whisper will run slower than normal.
INFO:__main__:Ready
[16:16:48] INFO: Successfully send discovery information to Home Assistant.
s6-rc: info: service discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
INFO:faster_whisper:Processing audio with duration 00:02.560
INFO:wyoming_faster_whisper.handler: Accendi luce studio
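
The AVX warning in that log can be cross-checked from inside the system that runs Whisper. Below is a minimal sketch, assuming a Linux x86 environment where `/proc/cpuinfo` has a `flags` line (on other architectures the key is named differently); the function names `cpu_flags` and `avx_report` are made up for this example.

```python
from pathlib import Path

def cpu_flags(cpuinfo_text: str) -> set:
    """Return the CPU feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 system)

def avx_report(cpuinfo_text: str) -> dict:
    """Report whether the 'avx' and 'avx2' flags are present."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in ("avx", "avx2")}

if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(avx_report(cpuinfo.read_text()))
    else:
        print("/proc/cpuinfo not available on this system")
```

The result only describes what the environment running the script can see, so it should be run in the same place as the add-on to be comparable with the warning.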

Is something configured wrong? I think the wake word and speech-to-text processing take a long time, am I right?

Thanks in advance!

Demusman · April 3, 2024, 4:27pm

Go to Settings, Devices, click the ESP device, and on the configuration card check what “Finished Speaking Detection” is set to. ‘Relaxed’ seems to work well.

SaintTDI · April 3, 2024, 4:35pm

Yes, it’s set to relaxed… but I don’t think the speech-to-text time is related to this setting.