Year of the Voice - Chapter 4: Wake words

There are a number of dual-mic webcams now that have better pickup than speakerphones.
There are also advanced KWS (keyword spotting) systems that use both video and voice for authentication.
I have an Anker C300 webcam and its mic is miles better than the Anker PowerConf speakerphone I have.

Similar situation here. I haven’t had time to try yet, but it seems like you could install the script to get this going on the Pi and have it work.

Not sure what audio processing is done, though. The new setup might break the old one.
Do a backup and try it out! Let us know how it goes.

We have an ESPHome config for it here. However, we didn’t like its quality as a voice assistant, so we haven’t promoted it, to avoid people spending money to buy one and being disappointed.


Replying to myself, but can anyone please tell me what the go is with the available wake words?

According to the announcement stream we should have all these available to use:

However I only have these in the list:

…which is in line with the docs…

…but what about the big list from the video? Is there something I’m missing here?

I believe the other wake words come with the Porcupine1 wake word engine.

Right, ok. That explains it. Thanks.

This is … not fun :frowning:
Having installed Voice Assist, and got homeassistant-satellite on a RasPi3 to recognise my USB mic and headphones … I have been asking it to “Turn the study light on” or “Turn on the study light”.
With the --debug option I can see that it detected:

  • ’ Turn the study like on.’
  • ’ turn the study light’
  • ’ Turn the study light.’
  • ’ turn on the study flight’
  • ’ turn on the stubby light.’
  • ’ Turn on the stabbing light.’

Very much a surly teenager deliberately misinterpreting every command.

Finally I tried just “Turn on the light” and, despite it appearing to detect ’ Turn on the light.’, once again I got the “Sorry, I couldn’t understand that” :frowning: Or is that because HA doesn’t know which room my voice assist satellite is in?

DEBUG:homeassistant_satellite.remote:{'type': 'auth_required', 'ha_version': '2023.10.3'}
DEBUG:homeassistant_satellite.remote:{'type': 'auth_ok', 'ha_version': '2023.10.3'}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'result', 'success': True, 'result': None}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'run-start', 'data': {'pipeline': '01gzx9v1fjm5mmjvej04fadjv5', 'language': 'en', 'runner_data': {'stt_binary_handler_id': 1, 'timeout': 300}}, 'timestamp': '2023-10-16T04:51:04.091539+00:00'}}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'wake_word-start', 'data': {'entity_id': 'wake_word.openwakeword', 'metadata': {'format': 'wav', 'codec': 'pcm', 'bit_rate': 16, 'sample_rate': 16000, 'channel': 1}, 'timeout': 3}, 'timestamp': '2023-10-16T04:51:04.091655+00:00'}}
DEBUG:__main__:wake_word-start {'entity_id': 'wake_word.openwakeword', 'metadata': {'format': 'wav', 'codec': 'pcm', 'bit_rate': 16, 'sample_rate': 16000, 'channel': 1}, 'timeout': 3}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'wake_word-end', 'data': {'wake_word_output': {'wake_word_id': 'hey_rhasspy_v0.1', 'timestamp': 3490}}, 'timestamp': '2023-10-16T04:51:11.096731+00:00'}}
DEBUG:__main__:wake_word-end {'wake_word_output': {'wake_word_id': 'hey_rhasspy_v0.1', 'timestamp': 3490}}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'stt-start', 'data': {'engine': 'stt.faster_whisper', 'metadata': {'language': 'en', 'format': 'wav', 'codec': 'pcm', 'bit_rate': 16, 'sample_rate': 16000, 'channel': 1}}, 'timestamp': '2023-10-16T04:51:11.096827+00:00'}}
DEBUG:__main__:stt-start {'engine': 'stt.faster_whisper', 'metadata': {'language': 'en', 'format': 'wav', 'codec': 'pcm', 'bit_rate': 16, 'sample_rate': 16000, 'channel': 1}}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'stt-vad-start', 'data': {'timestamp': 3715}, 'timestamp': '2023-10-16T04:51:11.473481+00:00'}}
DEBUG:__main__:stt-vad-start {'timestamp': 3715}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'stt-vad-end', 'data': {'timestamp': 4400}, 'timestamp': '2023-10-16T04:51:12.848287+00:00'}}
DEBUG:__main__:stt-vad-end {'timestamp': 4400}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'stt-end', 'data': {'stt_output': {'text': ' Turn on the light.'}}, 'timestamp': '2023-10-16T04:51:13.319297+00:00'}}
DEBUG:__main__:stt-end {'stt_output': {'text': ' Turn on the light.'}}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'intent-start', 'data': {'engine': 'homeassistant', 'language': 'en', 'intent_input': ' Turn on the light.', 'conversation_id': None, 'device_id': None}, 'timestamp': '2023-10-16T04:51:13.319343+00:00'}}
DEBUG:__main__:intent-start {'engine': 'homeassistant', 'language': 'en', 'intent_input': ' Turn on the light.', 'conversation_id': None, 'device_id': None}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'intent-end', 'data': {'intent_output': {'response': {'speech': {'plain': {'speech': "Sorry, I couldn't understand that", 'extra_data': None}}, 'card': {}, 'language': 'en', 'response_type': 'error', 'data': {'code': 'no_intent_match'}}, 'conversation_id': None}}, 'timestamp': '2023-10-16T04:51:13.330464+00:00'}}
DEBUG:__main__:intent-end {'intent_output': {'response': {'speech': {'plain': {'speech': "Sorry, I couldn't understand that", 'extra_data': None}}, 'card': {}, 'language': 'en', 'response_type': 'error', 'data': {'code': 'no_intent_match'}}, 'conversation_id': None}}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'tts-start', 'data': {'engine': 'tts.piper', 'language': 'en_GB', 'voice': 'en_GB-alba-medium', 'tts_input': "Sorry, I couldn't understand that"}, 'timestamp': '2023-10-16T04:51:13.330497+00:00'}}
DEBUG:__main__:tts-start {'engine': 'tts.piper', 'language': 'en_GB', 'voice': 'en_GB-alba-medium', 'tts_input': "Sorry, I couldn't understand that"}
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'tts-end', 'data': {'tts_output': {'media_id': "media-source://tts/tts.piper?message=Sorry,+I+couldn't+understand+that&language=en_GB&voice=en_GB-alba-medium", 'url': '/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-gb_35f6e7cd1a_tts.piper.wav', 'mime_type': 'audio/x-wav'}}, 'timestamp': '2023-10-16T04:51:13.330692+00:00'}}
DEBUG:__main__:tts-end {'tts_output': {'media_id': "media-source://tts/tts.piper?message=Sorry,+I+couldn't+understand+that&language=en_GB&voice=en_GB-alba-medium", 'url': '/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-gb_35f6e7cd1a_tts.piper.wav', 'mime_type': 'audio/x-wav'}}
DEBUG:root:play ffmpeg: ['ffmpeg', '-i', 'http://192.168.1.98:8123/api/tts_proxy/dae2cdcb27a1d1c3b07ba2c7db91480f9d4bfd8f_en-gb_35f6e7cd1a_tts.piper.wav', '-f', 'wav', '-ar', '22050', '-ac', '1', '-filter:a', 'volume=1.0', '-']
DEBUG:root:play: ['aplay', '-D', 'plughw:CARD=Headphones,DEV=0', '-r', '22050', '-c', '1', '-f', 'S16_LE', '-t', 'raw']
Playing raw data 'stdin' : Signed 16 bit Little Endian, Rate 22050 Hz, Mono
DEBUG:homeassistant_satellite.remote:{'id': 1, 'type': 'event', 'event': {'type': 'run-end', 'data': None, 'timestamp': '2023-10-16T04:51:13.330712+00:00'}}
DEBUG:__main__:run-end None
DEBUG:homeassistant_satellite.remote:Pipeline finished
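
For what it’s worth, one way to separate an STT problem from an intent-matching problem is to send the transcribed text straight to the conversation API and check whether Assist can resolve it on its own. A rough Python sketch, assuming a long-lived access token (the URL is the one visible in the log above; the token is a placeholder):

# Hypothetical quick check: bypass wake word and STT entirely and POST the
# text to Home Assistant's conversation endpoint.
import requests

HA_URL = "http://192.168.1.98:8123"            # taken from the tts_proxy URL in the log
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"         # placeholder: create one under your HA profile

resp = requests.post(
    f"{HA_URL}/api/conversation/process",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"text": "Turn on the light", "language": "en"},
    timeout=10,
)
result = resp.json()

# response_type comes back as "error" with code "no_intent_match" when Assist
# can't resolve the sentence, e.g. if the light or satellite has no area assigned.
print(result["response"]["response_type"])
print(result["response"]["speech"]["plain"]["speech"])

If that call also returns no_intent_match, the problem is exposure/area assignment rather than the microphone or Whisper.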

Ohhhh … changing microphone helped the audio quality immensely … but unfortunately not so much improvement in the recognition :frowning:

There is far too much running on an ESP32-S3-Box, and trying to do ASR & TTS on it spreads resources too thin.
Likely though it’s still a perfect platform for post-BSS (Blind Source Separation) processing, or for running KWS onboard and broadcasting only on a KW hit.
Also the ESP32-S3-Box is just bloated hardware-wise, where £50 buys a lot that isn’t necessary.
Any ESP32-S3 can use the ADF, which has the only free BSS algorithm available, even if it is a blob.
It would be quite simple to employ TFLite4Micro with a pretrained model to select the output from the BSS to get the ‘Voice’.
Basically Espressif run 2x KWS on the outputs to select which one is the voice command.
It just needs an ESP32-S3 add-on board with 2x mics and an ADC, and I recommend the MAX9814 on the analogue side to extend far-field pickup.
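
To make the selection step concrete, here is a rough sketch in plain Python (illustrative only: on an ESP32-S3 this would be C against ESP-ADF and TFLite Micro, and score_wake_word is a stand-in for whatever KWS model is used):

# Pick the BSS output channel that contains the wake word, if any.
from typing import Callable, Optional, Sequence

def pick_voice_channel(
    bss_outputs: Sequence[Sequence[float]],               # e.g. the two separated streams from BSS
    score_wake_word: Callable[[Sequence[float]], float],  # KWS model returning a score in 0..1
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the index of the BSS output containing the wake word, or None."""
    scores = [score_wake_word(chunk) for chunk in bss_outputs]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best if scores[best] >= threshold else None

# Only when a channel wins would the satellite start streaming that channel to
# the server, which is the "broadcast only on KW hit" part of the suggestion.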

Very likely, as basic word stemming with a full-vocab simple ASR is likely to do so.
Likely you could train an n-gram LM (language model) on the fly that is implicit to the entities and load the ASR up with that LM.
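
As a minimal sketch of what an n-gram LM “implicit to the entities” could look like (entity names and command templates are invented for the example; a real setup would export the counts to something the ASR decoder can load, such as an ARPA file):

# Build a tiny bigram LM on the fly from the sentences the entities imply.
from collections import Counter

entities = ["study light", "kitchen light", "bedroom lamp"]
templates = ["turn on the {}", "turn off the {}", "turn the {} on", "turn the {} off"]

sentences = [t.format(e) for t in templates for e in entities]

unigrams = Counter()
bigrams = Counter()
for s in sentences:
    words = ["<s>"] + s.split() + ["</s>"]
    unigrams.update(words[:-1])            # context counts
    bigrams.update(zip(words, words[1:]))  # word-pair counts

def bigram_prob(w1: str, w2: str) -> float:
    """P(w2 | w1) with no smoothing -- just enough to show the idea."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

print(bigram_prob("study", "light"))   # high: in-domain
print(bigram_prob("study", "flight"))  # zero: the LM steers the decoder away from this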

I still think an LLM would be much better than basic word stemming, as LLMs have really made basic word stemming obsolete.
What you do is use LangChain and present the entity sentences as documents.
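
The retrieval idea can be sketched without any LangChain plumbing, just to show its shape: treat each entity sentence as a document and pick the one closest to the (possibly garbled) transcript. Plain string similarity stands in here for the embedding or LLM scoring you would actually use, and the sentences are invented for the example:

# Match a noisy transcript against per-entity "documents" by similarity.
from difflib import SequenceMatcher

documents = [
    "turn on the study light",
    "turn off the study light",
    "turn on the kitchen light",
    "turn off the kitchen light",
]

def best_match(transcript: str, docs: list[str]) -> tuple[str, float]:
    """Return the closest entity sentence to the transcript and its similarity."""
    t = transcript.lower().strip(" .")
    scored = [(d, SequenceMatcher(None, t, d).ratio()) for d in docs]
    return max(scored, key=lambda pair: pair[1])

# The garbled STT output from earlier in the thread still resolves correctly:
print(best_match(" turn on the study flight", documents))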

An LM would likely improve the current setup, but LLMs are really making all previous methods obsolete.
LMs are pretty old tech, but they are quick to create on the fly and to reload into the ASR on any entity change.

Do you have any network infrastructure recommendations around this setup?
There are hints in a few threads that a constant UDP stream isn’t reliably supported by certain wifi setups.

I think if you’re using a Pi then it’s now using websockets, so TCP, but the M5Stack might well still be using UDP, which was always a bad idea, if only because you are constantly broadcasting an audio stream to all endpoints on that network and they all still need to check whether the packet applies to them.

Getting the same issue with an Atom Echo using HA Cloud pipelines and rhasspy/wyoming-openwakeword docker.
Toggling the wake word switch doesn’t resolve it but it seems to start responding again after 5-10 mins.

My chain is
Atom ECHO => TPlink WiFi access point (TLWA901ND V5) with 100Mbit LAN connection => LAN cable => main router (o2 homebox 6641) with 1000MBit ports => LAN cable => Raspberry Pi4

I dunno, but I never liked the idea of using the same MQTT network for audio.

Haven’t got one, but I’m just presuming it’s the same.

I have this issue described above. Can anyone point me in the right direction? Is there any update for the Atom?

Thank you

From how I see it, at this point in time the best we can do is collect cases until someone with more insights into the underlying tech finds the pattern/problem.

What is your setup? Hardware and software?

I have HA in a Proxmox VM, with Supervisor…

Home Assistant 2023.10.3
Supervisor 2023.10.0
Operating System 11.0
Interface: 20231005.0 - latest

Using an Atom. I just bought 2 units and both have the same issue. The firmware on the Atom is:

atom-echo-voice-assistant
by m5stack
Firmware: 2023.10.0b1 (Oct 13 2023, 23:14:59)
Hardware: 1.0

I have tried without the wake word, by pressing the button, and the Atom doesn’t have this problem, so it’s related to the wake word. Without it I can press the button multiple times and it responds every time!

I have opened an issue on GitHub for anyone who wants to join:


I have 1 Atom Echo flashed with the voice assistant with the same symptoms when using openwakeword. I also have 2 ESP32-S3-Box-Lite that have voice assistant flashed on them with the exact same symptoms on those. I can ask a few commands (2-5) and then the voice assistants freeze up.

I have had another ESP32-S3-Box-Lite flashed with the Willow voice assistant for about 3 weeks now, and it does not have any issues responding to wake words or locking up. Even when the Atom Echo and the others stop responding with openwakeword, the Willow box will still respond.


Oh, I had never heard about it until I saw it on Amazon a few days ago, noticed it was running on an ESP32, and wondered if it could be used for Assist.

Can you tell me what kind of issues there were with the device?