I wonder if they’re generating code at runtime and need to mark it as executable?
Sure, no problem
Both of these features should be in the latest version now. The wake-up sound and the beep when a voice command has finished recording are now configurable through the web interface.
Also, POST-ing to /api/text-to-speech?repeat=true will now repeat the last spoken sentence.
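For instance, a minimal call (just a sketch; this assumes Rhasspy's default HTTP port 12101 and that no request body is needed for a repeat):

import requests

# Repeat the last sentence Rhasspy spoke; no text body is required.
requests.post("http://localhost:12101/api/text-to-speech",
              params={"repeat": "true"})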
I ended up developing the slots lists feature for this purpose instead. This just allows you to reference an external text file from your grammar, but it could be populated from anywhere. The idea was to then write a small service to connect to Home Assistant’s /api/config endpoint and extract the appropriate entity names (maybe in a cron job).
I personally didn’t end up needing to write that service because I only have a few entities I care to talk about, but I did create one for pulling movie and TV show names out of Kodi. They just go into a movies file and a shows file in my profile’s slots directory, respectively. Then, I can just have something like this in my sentences.ini file:
[PlayMovie]
play ($movies){movie_name}
[PlayTVShow]
play ($shows){show_name}
There is some post-processing I have to do on the names, of course, like changing “2” to “two” and getting rid of apostrophes, etc.
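A minimal sketch of what such an entity-pulling service could look like (the Home Assistant URL, token, and slots path are assumptions; /api/states is the Home Assistant REST endpoint that returns entity data, including friendly names):

import re
import requests

HASS_URL = "http://localhost:8123"               # assumption: local Home Assistant
TOKEN = "YOUR_LONG_LIVED_TOKEN"                  # assumption: long-lived access token
SLOTS_FILE = "/profiles/en/slots/hass_entities"  # assumption: profile slots path

DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def clean(name):
    # The post-processing described above: spell out digits, drop apostrophes.
    name = re.sub(r"\d", lambda m: " " + DIGITS[m.group()] + " ", name)
    return " ".join(name.replace("'", "").lower().split())

resp = requests.get(HASS_URL + "/api/states",
                    headers={"Authorization": "Bearer " + TOKEN})
resp.raise_for_status()

names = sorted({clean(s["attributes"]["friendly_name"])
                for s in resp.json()
                if "friendly_name" in s.get("attributes", {})})

# One entity name per line, ready to reference as ($hass_entities) in sentences.ini.
with open(SLOTS_FILE, "w") as f:
    f.write("\n".join(names))

Run it from cron at whatever interval you like, then re-train Rhasspy so the new slot values are picked up.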
hmm maybe.
I have to admit that this is the only container that I have (I think) that executes code from a mapped folder (besides Python code).
Thanks for adding the requested features! Can’t wait to play around with this!
Btw, is there any way to support your work with a little contribution?
UPDATE:
Just updated to the latest version, and I have a problem with my custom wake sound: the audio plays fine, but intent recognition isn’t working.
I suspect this could be because my wake sound is speech as well, and that it interferes with the intent recognition. Could this be the problem?
If so, would it be possible to delay voice recognition for the duration of the wake sound?
Ahh, OK. I’ll try setting up some manual commands & slots lists.
- Can synonyms be used in slots lists?
- What’s the recommended way to send HASS replies to commands (like “the garage door is open”) to the Rhasspy host? I see Rhasspy has an API for posting WAVs to play, or text for TTS. Should I be making a HASS rest_command service to post text replies to Rhasspy’s API? Does Rhasspy have some sort of i-am-a-media-player integration with HASS that I’ve missed?
- Porcupine seems to do a good job of waking in my short tests so far, but the voice recognition frequently misses the first word of commands, such as make/set/turn/what/tell/how. Is this just my mic being of insufficient quality (testing with a Microsoft HD3000 webcam USB mic), or might it be failing to capture the beginning of the audio? I am waiting until after the wake-word beep.
Thanks for your help!
You can do two things for text-to-speech replies:
- let your Hass.io instance speak
- send a message to the Rhasspy API with a text payload
- or both
I have an automation responding to a Rhasspy event. It sends a payload via a rest command and calls the TTS service as well.
The payload via rest_command is used because that Rhasspy instance is a satellite and I want to play the audio on that satellite.
The TTS service plays the audio on the machine where HA is installed.
My conf.yaml (I am Dutch; the payload below translates to “That’s fine, I <action> the <which cover> roller shutter”):
rest_command:
  rhasspy_speak:
    url: 'http://192.168.43.169:12101/api/text-to-speech'
    method: 'POST'
    payload: '{{ payload }}'
    content_type: text/plain
My automation:
- id: '1556640424575'
  alias: Rolluiken
  trigger:
  - event_data: {}
    event_type: rhasspy_Covers
    platform: event
  condition: []
  action:
  - data_template:
      payload: Dat is goed, ik {{ trigger.event.data.actiontype }} het {{ trigger.event.data.whichcover }} rolluik
    service: rest_command.rhasspy_speak
  - data_template:
      message: Dat is goed, ik {{ trigger.event.data.actiontype }} het {{ trigger.event.data.whichcover }} rolluik
    service: tts.google_cloud_say
    entity_id: media_player.test
My media player is actually an mpd instance, sending audio to snapcast (the snapcast server runs as an addon):
media_player:
  - platform: snapcast
    host: 192.168.43.169
  - platform: mpd
    host: 192.168.43.169
    port: 6601
    name: Test
This way, you can create multiroom TTS messages.
I appreciate the thought very much but people volunteering their time is a great contribution already!
Do you see anything in the logs? I’ve never tried anything but beeps for the wake sound, so it is possible that it could interfere with the speech recognition process. I’ll work on either adding an optional delay to the speech recognition, or trying to ensure it doesn’t start until after the wake WAV has finished playing.
Yes! I just tried it and it seems to work. I have something like this in slots/movies:
toy story three:3
and this in sentences.ini:
[PlayMovie]
play ($movies){movie_name}
When I say “play toy story three”, I get back “toy story 3” in the movie_name slot.
I actually have some code that accepts MPD play requests from Home Assistant, and could play them through Rhasspy’s audio player. Would anyone be interested in that feature?
It might be the audio system you’re using to record (PyAudio vs. ALSA directly) as well as the microphone. I re-create the microphone audio stream in between turning off the wake system and turning on the voice command recognition system. Maybe re-opening the device is taking too long? I may be able to solve both this and @ntuseracc’s problem by opening the device while the wake WAV is playing but making sure the audio is ignored until after the WAV has finished…
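Something like this, as a rough sketch (not Rhasspy’s actual code; the wake-WAV duration constant is hypothetical):

import time
import pyaudio

WAKE_WAV_SECONDS = 0.5  # hypothetical: length of the wake acknowledgement WAV

pa = pyaudio.PyAudio()

# Open the microphone while the wake WAV is still playing...
stream = pa.open(format=pyaudio.paInt16, channels=1, rate=16000,
                 input=True, frames_per_buffer=480)

# ...but discard frames until playback should have finished.
deadline = time.time() + WAKE_WAV_SECONDS
while time.time() < deadline:
    stream.read(480, exception_on_overflow=False)

# From here on, the frames read are treated as the actual voice command.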
That sounds useful to me, though not urgent. I can go ahead and try the recommendations @Romkabouter suggested.
I’ve tried with both PyAudio and ALSA directly with the same results. I’ve also tried with both the Microsoft HD3000 and the PS3 Eye, same issue.
FWIW, PyAudio sometimes segfaults (maybe 1 in 10-20 times), but arecord directly seems functional with the caveat of usually missing the first word. In both cases, I’m testing on an RPi 3B+.
Thanks for the examples! I’ll give them a try.
I suspect this could be because my wake sound is speech as well, and that it interferes with the intent recognition. Could this be the problem?
That is why Snips does not listen until a playFinished message is published:
- the audio is played; when done, a message with the topic ID is published on the playFinished topic
- when wakeword feedback is enabled, Snips starts listening for voice commands only after that finished message has been published (a rough sketch of this gating follows)
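Here is how a client could gate listening on that message (topic names follow the hermes/audioServer/<siteId> layout visible in the logs below; the JSON payload fields are an assumption):

import json
import paho.mqtt.client as mqtt

SITE_ID = "default"  # assumption: the default site

def on_connect(client, userdata, flags, rc):
    client.subscribe("hermes/audioServer/{}/playFinished".format(SITE_ID))

def on_message(client, userdata, msg):
    # Only start recording the voice command once the wake sound has
    # actually finished playing, mirroring the Snips behaviour above.
    payload = json.loads(msg.payload)
    print("Playback {} finished; safe to start listening".format(payload.get("id")))

client = mqtt.Client()
client.on_connect = on_connect
client.on_message = on_message
client.connect("localhost", 1883)  # assumption: local MQTT broker
client.loop_forever()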
I’ve moved to trying to run rhasspy on my faster Intel server with the RPi 3b+ as a voice satellite via mqtt with @koan’s hermes-audio-server. The response time to commands (speech-to-text) is certainly faster!
I did get hermes-audio-recorder working fine in a Docker container. I noticed that the initial word was rarely, if ever, missed, perhaps because hermes keeps the mic recording the entire time?
On the other hand, I was unable to get hermes-audio-player working in a Docker container. It would play the wakeword acknowledgement sounds sent over MQTT just fine, but as soon as some TTS audio came in, it would immediately die with the errors below. No such behavior when run under a venv. It would be nice to have it running under Docker as well, but I’m not sure where to dig on that.
Can hermes-audio-recorder or any of the other commonly used satellite input/output setups (Matrix Voice MQTT Audio Streamer, etc.) be used with local wakeword detection, so audio is only sent when the wakeword is detected?
Some other observations:
- Sometimes when I say the wakeword, Rhasspy appears to detect it but never plays the prompt WAV nor listens to my further input, despite
DEBUG:PorcupineWakeListener:Hotword detected (True)
appearing in the log (twice).
- hermes-audio-server’s VAD appears to send audio pretty much anytime there’s sound, from typing to music to my mouse sliding on the desk. In mode 3 it sometimes doesn’t trigger for some music, but does for typing. At least for me, every mode amounts to a don’t-send-silence filter only.
- Getting default audio devices to point at the right hardware can be a real beast sometimes, especially when a non-default device can’t be selected (hermes). When I finally got it working for hermes-audio-recorder and -player in Docker (the latter of which claims on startup to be using my HDMI output but actually plays through the 3.5 mm jack anyway), it still didn’t work for hermes-audio-recorder in a venv.
2019-06-30 12:25:01,429 INFO Received an audio message of length 88 bytes with request id 7883c146-778d-43db-9e10-6e9c802818e4 on site default.
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->playback, outParams, self->primeBuffers, hwParamsPlayback, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2722
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2843
Traceback (most recent call last):
File "/usr/local/bin/hermes-audio-player", line 20, in <module>
plac.call(main)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 330, in call
cmd, result = parser.consume(arglist)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/usr/local/bin/hermes-audio-player", line 16, in main
cli.main(PLAYER, verbose, version, config, daemon)
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/cli.py", line 68, in main
server.start()
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/mqtt.py", line 68, in start
self.mqtt.loop_forever()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1578, in loop_forever
rc = self.loop(timeout, max_packets)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1072, in loop
rc = self.loop_read(max_packets)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1374, in loop_read
rc = self._packet_read()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2071, in _packet_read
rc = self._packet_handle()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2560, in _packet_handle
return self._handle_publish()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2759, in _handle_publish
self._handle_on_message(message)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2902, in _handle_on_message
callback(self, self._userdata, message)
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/player.py", line 76, in on_play_bytes
output=True)
File "/usr/local/lib/python3.7/site-packages/pyaudio.py", line 750, in open
stream = Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pyaudio.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9997] Invalid sample rate
That’s indeed the desirable scenario, but currently Hermes Audio Server is just a ‘dumb’ audio conduit because I’m not comfortable enough with integrating wake word detection. That may change, but don’t count on it in the immediate future.
An audio message of 88 bytes doesn’t sound right. Which sound was this, and what was its sample width?
Yes, configurable audio devices are on my TODO list. It will probably be the next feature I’ll implement and it will definitely make it easier to use Hermes Audio Server in more diverse scenarios.
Yes, your observation is correct. I’m just using WebRTCVAD’s modes for the VAD feature and I’m not impressed by its results, but it’s the best VAD system I found and the same one that Rhasspy uses, so I figured it was a safe bet to avoid downstream problems when I’m sending audio to Rhasspy. I added the VAD as an experimental feature mainly to save bandwidth. For that purpose it does the job.
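For reference, WebRTCVAD only classifies fixed-size frames with an aggressiveness mode, so there isn’t much to tune. A minimal sketch (the frame length and sample rate are the usual values it accepts, not necessarily Hermes Audio Server’s actual settings):

import webrtcvad

vad = webrtcvad.Vad(3)  # modes 0-3; 3 filters non-speech most aggressively

SAMPLE_RATE = 16000  # webrtcvad accepts 8, 16, 32, or 48 kHz
FRAME_MS = 30        # frames must be 10, 20, or 30 ms long
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono samples

def speech_frames(pcm):
    # Yields one True/False speech decision per frame of raw 16-bit PCM.
    for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield vad.is_speech(pcm[i:i + FRAME_BYTES], SAMPLE_RATE)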
Rhasspy’s docs say it always tries to return 16 kHz, 16-bit mono audio, so I’d guess that’s what it generates, though I can see the included wakeword WAVs are 44.1 kHz.
Somewhat interestingly, with:
pcm.speaker {
    type plug
    slave {
        pcm "hw:0,0"
    }
}
We have this crash result (two successful wakeword sounds, followed by a crash on a TTS being sent):
2019-07-01 07:05:30,203 INFO Connected to audio output bcm2835 ALSA: IEC958/HDMI (hw:0,1).
2019-07-01 07:05:30,221 INFO Connected to MQTT broker mqtt:1883 with result code 0.
2019-07-01 07:05:30,222 INFO Subscribed to hermes/audioServer/default/playBytes/+ topic.
2019-07-01 07:05:35,226 INFO Received an audio message of length 81.98 KiB with request id b0e69643-ddfa-4843-ad62-a3cf37ae57de on site default.
2019-07-01 07:05:35,689 INFO Finished playing audio message with id b0e69643-ddfa-4843-ad62-a3cf37ae57de on device bcm2835 ALSA: IEC958/HDMI (hw:0,1) on site default.
2019-07-01 07:05:37,082 INFO Received an audio message of length 117.1 KiB with request id 78bfd886-4e24-454c-ad0c-c62790e2e50b on site default.
2019-07-01 07:05:37,759 INFO Finished playing audio message with id 78bfd886-4e24-454c-ad0c-c62790e2e50b on device bcm2835 ALSA: IEC958/HDMI (hw:0,1) on site default.
2019-07-01 07:05:47,322 INFO Received an audio message of length 35.09 KiB with request id bd7053c0-df4f-4f49-bd84-ac1718a3bad1 on site default.
Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2048
Expression 'PaAlsaStreamComponent_InitialConfigure( &self->playback, outParams, self->primeBuffers, hwParamsPlayback, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2722
Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2843
Traceback (most recent call last):
File "/usr/local/bin/hermes-audio-player", line 20, in <module>
plac.call(main)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 330, in call
cmd, result = parser.consume(arglist)
File "/usr/local/lib/python3.7/site-packages/plac_core.py", line 207, in consume
return cmd, self.func(*(args + varargs + extraopts), **kwargs)
File "/usr/local/bin/hermes-audio-player", line 16, in main
cli.main(PLAYER, verbose, version, config, daemon)
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/cli.py", line 68, in main
server.start()
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/mqtt.py", line 68, in start
self.mqtt.loop_forever()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1578, in loop_forever
rc = self.loop(timeout, max_packets)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1072, in loop
rc = self.loop_read(max_packets)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 1374, in loop_read
rc = self._packet_read()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2071, in _packet_read
rc = self._packet_handle()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2560, in _packet_handle
return self._handle_publish()
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2759, in _handle_publish
self._handle_on_message(message)
File "/usr/local/lib/python3.7/site-packages/paho/mqtt/client.py", line 2902, in _handle_on_message
callback(self, self._userdata, message)
File "/usr/local/lib/python3.7/site-packages/hermes_audio_server/player.py", line 76, in on_play_bytes
output=True)
File "/usr/local/lib/python3.7/site-packages/pyaudio.py", line 750, in open
stream = Stream(self, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/pyaudio.py", line 441, in __init__
self._stream = pa.open(**arguments)
OSError: [Errno -9997] Invalid sample rate
However, after forcing a 48000 Hz output rate via asound.conf in the Docker container (44100 still crashed) with this asound entry:
pcm.speaker {
    type plug
    slave {
        pcm "hw:0,0"
        rate 48000
    }
}
… it plays successfully! Presumably the plug plugin is now resampling whatever rate PyAudio requests to the fixed 48000 Hz slave rate.
Curiously, though, the first log line changes from “Connected to audio output bcm2835 ALSA: IEC958/HDMI (hw:0,1).” to “Connected to audio output default.” The only sound configuration that changed is the addition of rate 48000. I don’t know if that’s just a display bug or related to the problem. The Pi is headless, so nothing is connected via HDMI, and I do hear the wakeword sounds over the 3.5 mm audio (hw:0,0).
2019-07-01 07:11:03,481 INFO Connected to audio output default.
2019-07-01 07:11:03,497 INFO Connected to MQTT broker mqtt:1883 with result code 0.
2019-07-01 07:11:03,499 INFO Subscribed to hermes/audioServer/default/playBytes/+ topic.
2019-07-01 07:11:13,176 INFO Received an audio message of length 81.98 KiB with request id 75a604f3-2992-4007-9f0d-88235c7f9f59 on site default.
2019-07-01 07:11:13,628 INFO Finished playing audio message with id 75a604f3-2992-4007-9f0d-88235c7f9f59 on device default on site default.
2019-07-01 07:11:15,020 INFO Received an audio message of length 117.1 KiB with request id 770f7bb9-acab-4cfa-b224-b1bb54d403bc on site default.
2019-07-01 07:11:15,668 INFO Finished playing audio message with id 770f7bb9-acab-4cfa-b224-b1bb54d403bc on device default on site default.
2019-07-01 07:11:19,212 INFO Received an audio message of length 35.09 KiB with request id 8ea3ed0b-aef3-4236-b0ef-ee4a75ad5813 on site default.
2019-07-01 07:11:20,248 INFO Finished playing audio message with id 8ea3ed0b-aef3-4236-b0ef-ee4a75ad5813 on device default on site default.
Yes, audio playback in Linux is a bit of dark magic, but I’m glad it works now. I hadn’t tested Hermes Audio Server in Docker yet.
You can use the Node-RED addon for this; it has a “get entities” node, and you can write the results to a file with the file node.
So, while I have not tried this, you could fetch all the entities at a set interval and create slots from them.
A little demo again, voice controlling the brightness of the Matrix Voice with Rhasspy
This one is a bit better; it features my Hass.io screen with the Matrix settings and a Pi Zero as a snapcast client.
Is this just for node-red-contrib-home-assistant-websocket, or is there something built in?
It is node-red-contrib-home-assistant-websocket, which comes preinstalled in the Hass.io Node-RED addon.