[Voice Assistant Contest] HA-Visual-Voice-Assistant

Hello everyone,

This is a demo of a voice assistant built in Home Assistant, with visual responses played on a tablet running Fully Kiosk Browser and Browser Mod.


Here’s a tutorial:

https://www.youtube.com/watch?v=bZgH4NDmBpk

EDIT UPDATE
Since AlexxIT has implemented wake word detection in the StreamAssist integration, I was able to make some changes to this integration so that these random visual responses can be played without the need for an ESP32 satellite, using:
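For illustration, a minimal sketch of such an automation (the StreamAssist sensor, the Browser Mod media player entity, and the video paths are assumptions, check the entities your setup actually creates):

```yaml
automation:
  - alias: "Play a random visual wake response"
    trigger:
      # Hypothetical StreamAssist stage sensor
      - platform: state
        entity_id: sensor.stream_assist_stage
        to: "wake"
    action:
      - service: media_player.play_media
        target:
          # Media player entity exposed by Browser Mod
          entity_id: media_player.tablet_browser
        data:
          media_content_type: video
          # Pick one of several pre-rendered responses at random
          media_content_id: >-
            {{ ['/local/assist/hello1.mp4',
                '/local/assist/hello2.mp4',
                '/local/assist/hello3.mp4'] | random }}
```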


VISUAL RESPONSES USING ESP32 SATELLITE

Here’s a small demo (turn on subtitles for English):

https://youtu.be/EcL6o62Vnoo

Hardware

  • ESP32 dev board
  • INMP441 microphone
  • WS2812 LED
  • Copper conductor spiral for touch input
  • Printed HA logo case
  • Google Home display

Software

  • ESPHome
  • Google Cloud STT
  • Edge TTS
  • Porcupine
  • Fully Kiosk Browser
  • Browser Mod
  • Extended OpenAI integration
  • Vindoz AI talking photo platform

What it does:

  • When the wake word is detected, the MUTE switch on the ESP32 satellite turns on. This is a real hardware mute switch, made by connecting the microphone’s L/R pin to a digital pin on the ESP32: the microphone records only when there is voltage on this pin, and in the code the microphone is set to the left channel (see the ESPHome sketch after this list).
  • When the wake word is detected, a video and audio response is streamed to the tablet’s display through the Browser Mod media player. You can create several responses that are played at random.
  • After this response has finished playing, the mute switch turns back off and listening starts.
  • When streaming of the TTS response starts, it is played through the Fully Kiosk media player on the tablet.
  • At the same time, a no_sound_speech video, i.e. a simulation of speech without sound, is sent to the tablet’s display through the Browser Mod media player (sketched after this list).
  • Several voice assistants can be set up, each with individual responses and pipelines, and they can be swapped through voice commands.
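A minimal ESPHome sketch of the hardware mute described in the first bullet (pin assignments are assumptions, adapt them to your wiring):

```yaml
i2s_audio:
  i2s_lrclk_pin: GPIO26
  i2s_bclk_pin: GPIO25

microphone:
  - platform: i2s_audio
    id: satellite_mic
    i2s_din_pin: GPIO33
    adc_type: external
    pdm: false
    channel: left      # the firmware only reads the left channel

switch:
  # Drives the INMP441 L/R pin: toggling it moves the mic output to
  # the other I2S channel, which silences the left-channel stream the
  # firmware listens to
  - platform: gpio
    name: "Mic Mute"
    pin: GPIO27
```

And a sketch of the simultaneous TTS audio + silent speech video step (the media player entity ids and the video path are assumptions):

```yaml
script:
  speak_with_face:
    fields:
      tts_url:
        description: URL of the generated TTS audio
    sequence:
      - parallel:
          # TTS audio through the Fully Kiosk media player
          - service: media_player.play_media
            target:
              entity_id: media_player.fully_kiosk_tablet
            data:
              media_content_type: music
              media_content_id: "{{ tts_url }}"
          # Silent "speaking" loop through the Browser Mod media player
          - service: media_player.play_media
            target:
              entity_id: media_player.tablet_browser
            data:
              media_content_type: video
              media_content_id: /local/assist/no_sound_speech.mp4
```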

This is very well done, especially the switching of languages and assistants.
Can you share a bit more about the creation of the assistant characters (Javis & Sheila)?


Very nicely made.


Is there any reason the tablet couldn’t do all the work instead of needing an ESP32?

As I understand it, there is no support for wake words in the browser, so you would need to manually interact with the tablet to start Assist; also, voice input doesn’t work if you don’t have HTTPS set up on your Home Assistant server. That’s why the ESP32 board is needed.

Indeed, that is true. Now I’m working on replacing the ESP32 satellite with AlexxIT’s Stream Assist integration in this project, and I think I’ll be able to finish soon.

I just started using it myself, so I know you should be able to do some neat things with it.


Hi, I’m trying to get your project running, but when I test the service it gives me this output:

```
stdout: ""
stderr: ""
returncode: 0
```

The results of the contest are out!
They may be of interest to you :wink:
Have a look!

Thanks for the appreciation and I look forward to further information.

Very nice setup and demo! Congratulations on your win! :wink:

How do you switch between different wake words/pipelines/languages on the same device?

With a script and an automation; I will upload them to my GitHub. The ESPHome satellite integration exposes several services: next pipeline, first pipeline, and last pipeline. An SSH script overwrites the GIFs for speech and listening, and an automation triggered by conversation executes them.
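A minimal sketch of that combination (the select entity id, the trigger sentence, and the SSH command are assumptions for illustration; the pipeline services correspond to Home Assistant’s standard select services such as select.select_next and select.select_first):

```yaml
# configuration.yaml: SSH command that swaps the face GIFs
shell_command:
  swap_assistant_gifs: >-
    ssh user@tablet-host 'cp /media/sheila/*.gif /media/active/'

automation:
  - alias: "Switch voice assistant by voice"
    trigger:
      # Sentence trigger handled by the conversation integration
      - platform: conversation
        command:
          - "switch to Sheila"
    action:
      # Advance the satellite's Assist pipeline select entity
      - service: select.select_next
        target:
          entity_id: select.satellite_assist_pipeline
      - service: shell_command.swap_assistant_gifs
```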

Thx! So you just switch the select of the pipeline entity on the ESPHome device. I didn’t think of that. Nice!

Any idea how we could enable different wake words on the same device in parallel? It should be simple with remote wake word detection to send the audio to two different pipelines :smiley: