Year of the Voice - Chapter 2: Let's talk

I really hope satellite devices and presence detection are on the future roadmap for Assist.

As someone else pointed out, it is redundant if I am in the living room and have to say “turn on the living room light.” The user should only have to say “turn on the light.”

This is a requirement imposed by browsers. Websites cannot access a user’s private information (including the sound in your room) unless the site is served over HTTPS. This limitation applies only to browsers. Any Assist devices, like ESPHome or VoIP, will not have this limitation. When we add this to the apps natively, it won’t be an issue either.

Thank you, I really appreciate your answer because it is a priority for me and also the only option to control my devices in Czech.

Thank you for the tip! I’ll give that a try.

I may finally have a legitimate reason for upgrading my Intel NUC to something more powerful :sweat_smile:

Thanks for posting how to do this.

Ultimately, is the plan to include wake-word capabilities from within the apps natively? That would allow us to use things like Chromebook tablets, Android or iOS tablets or phones, etc., as “satellite” devices for Rhasspy!

I would love to see an outline of setting up training / tuning the models for Piper, as mentioned.

However, in general I have a question… Do we need a full separate instance of Piper for different voices, or is there a way to load multiple voices?

--voice en-us-amy-low

will download Amy. I tried a few different voices, and the files are there in the Piper docker data folder, but when I add Piper to Text-to-speech in the Assist setup, I am only able to select the voice that was passed on the command line.

In one Piper docker instance, can we pass --voice multiple times, or use commas to separate several voices?

Similarly, if we are creating an automation with service: tts.speak, can we pass a voice name in the data?

That way we could have one instance of Piper and notifications with different voices? Maybe a voice specific to the kids’ rooms or something?
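
For what it’s worth, here is a sketch of what such a call could look like, assuming the engine accepts a per-call voice option (the tts.piper entity, media player, and voice name below are placeholders; whether Piper honors options.voice may depend on the add-on version):

# Hypothetical automation action: speak a notification with a non-default voice.
service: tts.speak
target:
  entity_id: tts.piper                               # placeholder TTS entity
data:
  media_player_entity_id: media_player.kids_room     # placeholder player
  message: "Time to brush your teeth!"
  options:
    voice: en-us-ryan-low                            # assumption: per-call voice selection is supported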

This is probably a newbie type of question, as I’ve never used Docker.
I have HA installed on an RPi 4, but I have a desktop running Windows (because of BlueIris) with more CPU/RAM, so I went ahead and installed Docker there. When I run the Piper image I get this:

usage: __main__.py [-h] --piper PIPER --voice VOICE --uri URI --data-dir
                   DATA_DIR --download-dir DOWNLOAD_DIR [--speaker SPEAKER]
                   [--noise-scale NOISE_SCALE] [--length-scale LENGTH_SCALE]
                   [--noise-w NOISE_W] [--auto-punctuation AUTO_PUNCTUATION]
                   [--samples-per-chunk SAMPLES_PER_CHUNK] [--debug]
__main__.py: error: the following arguments are required: --voice 

You have to use a docker compose file to pass the required arguments. Look further up the thread.
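
Something like this minimal compose file should work (a sketch assuming the rhasspy/wyoming-piper image; its entrypoint supplies the other required arguments, which is why only --voice shows up as missing above):

version: "3"
services:
  piper:
    image: rhasspy/wyoming-piper
    command: --voice en-us-amy-low   # the only argument you need to pass yourself
    volumes:
      - ./piper-data:/data           # voices are downloaded here
    ports:
      - "10200:10200"                # Wyoming port that Home Assistant connects to
    restart: unless-stopped

Then point the Wyoming integration in Home Assistant at that host on port 10200.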

Well, I have to say I used Mizudroid, and a simple thing like turning a light on/off worked straight away. Good work, devs and @Cadster!

I run Home Assistant in Docker on a 1950X Threadripper Unraid server with a 2070 Super and dual Coral TPUs.

The Wyoming-Whisper add-on requires Home Assistant OS and has no GPU support.
The Wyoming-Whisper docker image does not support GPU or TPU.
OpenAI-Whisper does not support the Wyoming protocol.

GRR.

Has anyone successfully created a YAML config themselves? I ran into 2 issues:

  1. A lot of compile warnings (dpaste/6r36P (Python)).
  2. I could not get the M5 Stack into bootloader mode.
     → I managed it via the browser instead of the ESPHome flasher; it seems to be some sort of baud rate issue.

I pressed both buttons on boot, the left-face button, held it longer, shorter…

Using 'COM12' as serial port.
Connecting.....
Detecting chip type... ESP32
Connecting.....

Chip Info:
 - Chip Family: ESP32
 - Chip Model: ESP32-PICO-D4 (revision 1)
 - Number of Cores: 2
 - Max CPU Frequency: 240MHz
 - Has Bluetooth: YES
 - Has Embedded Flash: YES
 - Has Factory-Calibrated ADC: YES
 - MAC Address: 4C:75:25:A6:3E:E4
Uploading stub...
Running stub...
Stub running...
Changing baud rate to 460800
Changed.
Unexpected error: Reading chip details failed: Timed out waiting for packet header

EDIT: Despite the compile warnings it seems to do the job. Now I can manually update ESPHome :slight_smile:
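
For anyone else writing the config by hand, the voice-related part boils down to roughly this (a sketch assuming an ATOM Echo; the pin assignments are taken from the official example and worth verifying, and audio output needs a separate speaker/media_player block depending on the ESPHome version):

# Shared I2S bus (ATOM Echo pin assignments)
i2s_audio:
  i2s_lrclk_pin: GPIO33
  i2s_bclk_pin: GPIO19

# The ATOM Echo's PDM microphone
microphone:
  - platform: i2s_audio
    id: echo_microphone
    i2s_din_pin: GPIO23
    adc_type: external
    pdm: true

# Streams captured audio to Home Assistant's Assist pipeline
voice_assistant:
  microphone: echo_microphone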

Is it possible to add an option to the new VoIP integration to enter an external SIP phone server with credentials? That way I could use my home telephone, connected through a modem, with Home Assistant Assist.

Is there a way to get the WAV file that was fed to Whisper?
Recognition rate is quite low for me, but I have no way to tell whether it’s Whisper or the microphone quality/setup.

I see there is support for the Grandstream HT801.

I don’t have one of those, but I do have a couple of Obihai OBi200 devices which appear to perform the same function as the Grandstream.

Does anyone know if it is possible to set up the OBi200 to work with the Voice over IP integration?

Check the device’s manual for direct SIP dialing.

I’ve been working on this, as I use an OBi202. I was able to get the OBi202 configured to call HA, and the entities show up, but it looks like the VoIP integration today only supports the newer Opus codec, not any other standard SIP codecs, so it isn’t working.

I’m hoping they will support older codecs (G711U, G711A, G729), which would enable a TON of existing ATA devices (and straight-up SIP clients) to work.

I have opened a feature request; please engage there! https://community.home-assistant.io/t/support-for-other-codecs-in-voip-integration/568580

Maybe I missed something. Does on/off switch control work?
Light entities work for me, but not switches.

Concerning the implementation of wake-word detection, you should definitely have a look at what this guy has done. He went through the whole process two years ago already; I saw the video a few months back: Build your own Alexa with the ESP32 and TensorFlow Lite - YouTube

The code is also available: GitHub - atomic14/diy-alexa: DIY Alexa

Wake word detection already worked pretty well in Rhasspy, so I assume it’s just a matter of time…
