[Voice Assistant Contest] HA-Visual-Voice-Assistant

Hello everyone,

This is a demo of a voice assistant built in Home Assistant. It uses an ESP32 satellite, and the visual responses are played on a tablet on which I installed Fully Kiosk Browser and Browser Mod.

Here’s a small demo (turn on subtitles for English):


For more information, you can visit my GitHub page.

GitHub: https://github.com/relust/HA-Visual-Voice-Assistant


Hardware:

  • ESP32 dev board
  • INMP441 microphone
  • WS2812 LED
  • Copper wire spiral used as a touch sensor
  • Printed HA logo case
  • Android tablet


Software:

  • ESPHome
  • Google Cloud STT
  • Edge TTS
  • Porcupine
  • Fully Kiosk Browser
  • Browser Mod
  • Extended OpenAI integration
  • Vidnoz AI talking photo platform

What it does:

  • When the wake word is detected, the MUTE switch on the ESP32 satellite turns on. This is a REAL MUTE SWITCH: the L/R pin on the microphone is connected to a digital pin on the ESP32, so the microphone records only when there is voltage on that pin. In the code, the microphone is set to the left channel.
  • When the wake word is detected, a video and audio response is streamed through the Browser Mod media player on the tablet’s display. You can create several answers, which are played at random.
  • After this answer finishes playing, the mute switch turns back off and listening starts.
  • When streaming of the TTS response starts, it is played through the Fully Kiosk media player on the tablet.
  • At the same time, a no_sound_speech video (a simulation of speech, but without sound) is sent through the Browser Mod media player to the tablet display.
  • Several voice assistants can be set up, each with its own responses and pipeline, and you can switch between them with voice commands.
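
For anyone who finds the order of events easier to follow as code, here is a rough Python sketch of the sequence above. It is only a simulation and not the project's actual configuration: the class names, the player labels ("browser_mod", "fully_kiosk"), and the file names are all made up for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Satellite:
    """Hypothetical stand-in for the ESP32 satellite."""
    mute: bool = False  # models the MUTE switch described above

    def is_recording(self) -> bool:
        # While the mute switch is on, the microphone does not record.
        return not self.mute


@dataclass
class Tablet:
    """Hypothetical stand-in for the tablet; it only logs what would be played."""
    log: list = field(default_factory=list)

    def play(self, player: str, media: str) -> None:
        self.log.append((player, media))


def on_wake_word(sat: Satellite, tablet: Tablet, greetings: list, rng=random) -> str:
    """Mute, play one randomly chosen greeting clip, then unmute to listen."""
    sat.mute = True                    # mute so the greeting is not picked up
    clip = rng.choice(greetings)       # several answers, played at random
    tablet.play("browser_mod", clip)   # video + audio response on the display
    sat.mute = False                   # greeting finished: listening starts
    return clip


def on_tts_response(tablet: Tablet, tts_audio: str,
                    silent_video: str = "no_sound_speech.mp4") -> None:
    """Play the TTS audio and, at the same time, the silent talking video."""
    tablet.play("fully_kiosk", tts_audio)      # audio of the spoken reply
    tablet.play("browser_mod", silent_video)   # lip movement without sound


if __name__ == "__main__":
    sat, tab = Satellite(), Tablet()
    on_wake_word(sat, tab, ["greeting_1.mp4", "greeting_2.mp4"])
    on_tts_response(tab, "reply.mp3")
    for player, media in tab.log:
        print(player, media)
```

In the real setup these steps would be triggered by the satellite and Home Assistant rather than called by hand; the sketch only shows which player gets which media, and when the mute switch changes.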

Very well done, especially the switching of languages and assistants.
Can you share a bit more on the creation of the assistant characters (Jarvis & Sheila)?


Very nicely made.

Is there any reason the tablet couldn’t do all the work instead of needing an esp32?

As I understand it, there is no support for wake words in the browser, so you would need to manually interact with the tablet to start Assist. Voice input also doesn’t work if you don’t have HTTPS set up on your Home Assistant server. That’s why the ESP32 board is needed.

Indeed, that is true. I’m now working on replacing the ESP32 satellite with AlexxIT’s Stream Assist integration in this project, and I think I’ll be able to finish this soon.

I just started using it myself, so I know you should be able to do some neat things with it.

Hi, I’m trying to get your project running, but when I test it as a service call, it returns this output:

stdout: ""
stderr: ""
returncode: 0