[Voice Assistant Contest] HA-Visual-Voice-Assistant

Hello everyone,

This is a demo of a voice assistant built in Home Assistant, with visual responses played on a tablet running Fully Kiosk Browser and Browser Mod.


Here’s a tutorial:

https://www.youtube.com/watch?v=bZgH4NDmBpk

EDIT UPDATE
Since AlexxIT has implemented wake word detection in the StreamAssist integration, I was able to make some changes to this integration so that these random visual responses can be played without the need for an ESP32 satellite, using:
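For illustration, a minimal sketch of such an automation (the StreamAssist sensor, the Browser Mod media player entity, and the video paths are assumptions, check the entities your setup actually creates):

```yaml
automation:
  - alias: "Play a random visual wake response"
    trigger:
      # Hypothetical StreamAssist stage sensor
      - platform: state
        entity_id: sensor.stream_assist_stage
        to: "wake"
    action:
      - service: media_player.play_media
        target:
          # Media player entity exposed by Browser Mod
          entity_id: media_player.tablet_browser
        data:
          media_content_type: video
          # Pick one of several pre-rendered responses at random
          media_content_id: >-
            {{ ['/local/assist/hello1.mp4',
                '/local/assist/hello2.mp4',
                '/local/assist/hello3.mp4'] | random }}
```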


VISUAL RESPONSES USING ESP32 SATELLITE

Here’s a small demo (turn on subtitles for English):

https://youtu.be/EcL6o62Vnoo

Hardware

  • ESP32 dev board
  • INMP441 microphone
  • WS2812 LED
  • Copper conductor spiral for touch input
  • Printed HA logo case
  • Google Home display

Software

  • ESPHome
  • Google Cloud STT
  • Edge TTS
  • Porcupine
  • Fully Kiosk Browser
  • Browser Mod
  • Extended OpenAI integration
  • Vindoz AI talking photo platform

What it does:

  • When the wake word is detected, the MUTE switch on the ESP32 satellite turns on. This is a real hardware mute switch, made by connecting the microphone’s L/R pin to a digital pin on the ESP32: the microphone records only when there is voltage on this pin, and in the code the microphone is set to the left channel (see the ESPHome sketch after this list).
  • When the wake word is detected, a video and audio response is streamed to the tablet’s display through the Browser Mod media player. You can create several responses that are played at random.
  • After this response has finished playing, the mute switch turns back off and listening starts.
  • When streaming of the TTS response starts, it is played through the Fully Kiosk media player on the tablet.
  • At the same time, a no_sound_speech video, i.e. a simulation of speech without sound, is sent to the tablet’s display through the Browser Mod media player (sketched after this list).
  • Several voice assistants can be set up, each with individual responses and pipelines, and they can be swapped through voice commands.
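A minimal ESPHome sketch of the hardware mute described in the first bullet (pin assignments are assumptions, adapt them to your wiring):

```yaml
i2s_audio:
  i2s_lrclk_pin: GPIO26
  i2s_bclk_pin: GPIO25

microphone:
  - platform: i2s_audio
    id: satellite_mic
    i2s_din_pin: GPIO33
    adc_type: external
    pdm: false
    channel: left      # the firmware only reads the left channel

switch:
  # Drives the INMP441 L/R pin: toggling it moves the mic output to
  # the other I2S channel, which silences the left-channel stream the
  # firmware listens to
  - platform: gpio
    name: "Mic Mute"
    pin: GPIO27
```

And a sketch of the simultaneous TTS audio + silent speech video step (the media player entity ids and the video path are assumptions):

```yaml
script:
  speak_with_face:
    fields:
      tts_url:
        description: URL of the generated TTS audio
    sequence:
      - parallel:
          # TTS audio through the Fully Kiosk media player
          - service: media_player.play_media
            target:
              entity_id: media_player.fully_kiosk_tablet
            data:
              media_content_type: music
              media_content_id: "{{ tts_url }}"
          # Silent "speaking" loop through the Browser Mod media player
          - service: media_player.play_media
            target:
              entity_id: media_player.tablet_browser
            data:
              media_content_type: video
              media_content_id: /local/assist/no_sound_speech.mp4
```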

This is very well done, especially the switching of languages and assistants.
Can you share a bit more about the creation of the assistant characters (Javis & Sheila)?


Very nicely made.


Is there any reason the tablet couldn’t do all the work instead of needing an ESP32?

As I understand it, there is no support for wake words in the browser, so you would need to manually interact with the tablet to start Assist; also, voice input doesn’t work if you don’t have HTTPS set up on your Home Assistant server. That’s why the ESP32 board is needed.

Indeed, that is true. Now I’m working on replacing the ESP32 satellite with AlexxIT’s Stream Assist integration in this project, and I think I’ll be able to finish soon.

I just started using it myself, so I know you should be able to do some neat things with it.


Hi, I’m trying to get your project running, but when I test the service it gives me this output:

```
stdout: ""
stderr: ""
returncode: 0
```

The results of the contest are out!
They may be of interest to you :wink:
Have a look!

Thanks for the appreciation and I look forward to further information.

Very nice setup and demo! Congratulations on your win! :wink:

How do you switch between different wake words/pipelines/languages on the same device?

With a script and an automation; I will upload them to my GitHub. The ESPHome satellite integration exposes several services: next pipeline, first pipeline, and last pipeline. An SSH script overwrites the GIFs for speech and listening, and an automation triggered by conversation executes them.
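A minimal sketch of that combination (the select entity id, the trigger sentence, and the SSH command are assumptions for illustration; the pipeline services correspond to Home Assistant’s standard select services such as select.select_next and select.select_first):

```yaml
# configuration.yaml: SSH command that swaps the face GIFs
shell_command:
  swap_assistant_gifs: >-
    ssh user@tablet-host 'cp /media/sheila/*.gif /media/active/'

automation:
  - alias: "Switch voice assistant by voice"
    trigger:
      # Sentence trigger handled by the conversation integration
      - platform: conversation
        command:
          - "switch to Sheila"
    action:
      # Advance the satellite's Assist pipeline select entity
      - service: select.select_next
        target:
          entity_id: select.satellite_assist_pipeline
      - service: shell_command.swap_assistant_gifs
```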

Thx! So you just switch the select of the pipeline entity on the ESPHome device. I didn’t think of that. Nice!

Any idea how we could enable different wake words on the same device in parallel? It should be simple with remote wake word detection to send the audio to two different pipelines :smiley: