Home Assistant + Local Voice Setup (TTS and STT)

I'm completely new to this forum, and to posting in forums in general.

But I have a well-working setup with a local voice assistant, and I think it's worth sharing. Maybe someone finds it useful…

I'm trying to write it up as completely as possible so that it can be replicated (at least in some way).

Hardware

First of all, my hardware. I chose these parts with power savings (I live in Germany, 27 ct/kWh), good performance, price and noise in mind. (The server is in my office.)
I'm using a self-built server with the following components:

  1. Mainboard: Gigabyte MC12-LE0 (AM4, IPMI) (used, like new, ~50€)
  2. CPU: Ryzen 7 3700X (used, 60€)
  3. CPU Fan: AMD Wraith Prism (used, 10€)
  4. RAM: 2x8GB + 2x16GB of DDR4 memory clocked at 2133 MHz (had some on hand, rest 40€)
  5. PSU: 600W be quiet! power supply (used, 30€)
  6. CASE: Cooler Master MasterBox M300L (used, 40€)
  7. STORAGE: 2x 1TB Crucial MX500 SSD (on hand, would be around 110€) + 2 TB WD Red (on hand) as backup drive
  8. GPU: NVIDIA Quadro P400 (on hand, would be ~40€ used)

As said, I'm pretty happy with my setup. It's not perfect, but with the CPU power target set to 80 W in the BIOS and the schedutil governor (powersave didn't boost my clocks when needed and drew the same idle power), I'm at 28-32 W idle. I got the GPU fairly recently and haven't measured the power consumption again, but it's rated at 5 W idle. So around 35-40 W in total, I think.
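To put the idle numbers into perspective, here is a rough sketch of the yearly electricity cost for an always-on machine. It only uses the 27 ct/kWh price and the wattages mentioned above; the function name is just for illustration, and it assumes the server sits at idle power all year, which will underestimate real usage a bit.

```python
# Rough yearly electricity cost for an always-on server.
PRICE_EUR_PER_KWH = 0.27   # 27 ct/kWh, as stated in the post
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def yearly_cost_eur(watts: float, price_eur_per_kwh: float = PRICE_EUR_PER_KWH) -> float:
    """Cost in EUR of drawing `watts` continuously for one year."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * price_eur_per_kwh


for watts in (28, 35, 40):
    print(f"{watts} W -> {yearly_cost_eur(watts):.2f} EUR/year")
```

At 35-40 W this comes out to roughly 83-95 € per year.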

It's nearly completely silent. At idle the fan only turns at around 200 RPM. I tried to set it up so the fan is off at idle, but the server board doesn't allow fans to go to 0 RPM.

I'm using Proxmox with a ZFS mirror on the SSDs and the WD Red for backups. If you're just running HA, use less storage. I think 240 GB per SSD is fine. Throw in any old HDD as a backup drive.

Sometimes I do an offline backup to a USB hard drive.

I got the HDD to spin up only when the daily Proxmox backup starts, using hook scripts and hd-idle. I know the extra spin-ups can wear the hard drive, but it's only spinning up once per day, so it will be fine.
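For anyone who wants to try something similar, here is a minimal sketch of what such a hook script could look like. This is not the exact script from this setup, just one possible approach: vzdump calls the hook with the phase name (for example job-start, job-end, job-abort) as the first argument, and the sketch asks the disk to spin down with hdparm -y once the job is over. The device path is a hypothetical placeholder.

```python
#!/usr/bin/env python3
"""Sketch of a vzdump hook script that spins the backup disk down after a job."""
import subprocess
import sys

# Hypothetical device path; replace it with your own backup disk.
BACKUP_DISK = "/dev/disk/by-id/EXAMPLE-BACKUP-DISK"


def commands_for_phase(phase: str, disk: str = BACKUP_DISK) -> list[list[str]]:
    """Return the commands to run for a given vzdump phase."""
    if phase in ("job-end", "job-abort"):
        # Flush pending writes, then ask the drive to enter standby (spin down).
        return [["sync"], ["hdparm", "-y", disk]]
    # The disk spins up on first access, so nothing is needed on job-start.
    return []


def main(argv: list[str]) -> None:
    # vzdump passes the phase name as the first argument.
    phase = argv[1] if len(argv) > 1 else ""
    for cmd in commands_for_phase(phase):
        # check=False: a failed spin-down should not mark the backup as failed.
        subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main(sys.argv)
```

The script would need to be executable and referenced via the script option of vzdump (for example in /etc/vzdump.conf). Keeping the phase-to-command mapping in its own function makes it easy to test without touching a real disk.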

Voice Assist Setup

My goal was to get it working reliably and locally for small tasks like “Turn on the light”.

First I set up the Wyoming protocol add-ons (Whisper, Piper, openWakeWord). It was working in German, but terribly slow (12 seconds). I tried different models, larger ones and smaller ones, but none were perfect. The small ones were often way too inaccurate and the large ones way too slow. So I needed acceleration.

By luck I got my hands on a free NVIDIA Quadro P400, which turned out to be just right for the job. It's low on power draw, pretty silent and delivers enough performance for TTS and STT.

Since HAOS doesn't support GPUs, I needed something external. I already had a Docker setup running in an LXC with Portainer.

I needed to set up the NVIDIA drivers on the Proxmox host and then pass the GPU through to Docker and the containers.
After some failed attempts, I stumbled across this manual for the GPU setup, which worked like a charm.

You need an image with GPU support for Docker.
This is the compose file I used:

services:
  wyoming-whisper:  
    image: slackr31337/wyoming-whisper-gpu:latest  
    container_name: wyoming-whisper
    environment:  
      - MODEL=medium-int8
      - LANGUAGE=de
      - COMPUTE_TYPE=int8
      - BEAM_SIZE=5
    ports:  
      - 10300:10300
    volumes:  
      - /path/to/persistent/data:/data  
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - utility
                - compute
  wyoming-piper:  
    image: slackr31337/wyoming-piper-gpu:latest  
    container_name: wyoming-piper
    environment:  
      - PIPER_VOICE=de_DE-eva_k-x_low
    ports:  
      - 10200:10200
    volumes:  
      - /path/to/persistent/data:/data  
    restart: unless-stopped
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - utility
                - compute

In Home Assistant I connected the services via the Wyoming integration and selected the new pipelines in the “Assistant” settings.
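If the Wyoming integration cannot find the services, it can help to first check that the two ports from the compose file (10300 for Whisper, 10200 for Piper) are reachable from another machine. The following is only a small sketch using a plain TCP connect; it does not speak the Wyoming protocol itself. The host address is a placeholder that you would replace with the IP of your Docker host.

```python
import socket

# Placeholder; replace with the IP address of the Docker LXC.
DOCKER_HOST = "127.0.0.1"

# Ports as published in the compose file above.
SERVICES = {"wyoming-whisper": 10300, "wyoming-piper": 10200}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


if __name__ == "__main__":
    for name, port in SERVICES.items():
        state = "reachable" if is_port_open(DOCKER_HOST, port) else "NOT reachable"
        print(f"{name} on {DOCKER_HOST}:{port} is {state}")
```

An open port only shows that the container is listening, not that the model has finished loading, so the first request can still take a while.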

I'm really happy with the accuracy and the response times. It's nearly always spot on with what I said and needs about 3 seconds to respond to short questions or actions.

I also tried Open WebUI as a conversation agent. I used Llama 3.2, but the responses were extremely long, and I wasn't able to run my TTS and STT containers in parallel with Open WebUI, since the containers need around 900 MiB of VRAM and Open WebUI 1400 MiB. I'm sure it's possible to optimize Llama further, but since I cannot run it in parallel I settled for the Home Assistant conversation agent.
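The VRAM problem is easy to see with a quick calculation. The sketch below uses the two usage figures from the paragraph above and assumes the Quadro P400 has 2 GB (2048 MiB) of VRAM, which is the card's spec but was not measured here. The names are only for illustration.

```python
# Assumption: the Quadro P400 has 2 GB of VRAM in total.
GPU_VRAM_MIB = 2048

# Approximate usage figures from the post.
WORKLOADS_MIB = {
    "whisper + piper containers": 900,
    "Open WebUI with the LLM": 1400,
}


def total_needed(workloads: dict[str, int]) -> int:
    """Sum of the VRAM needed by all workloads, in MiB."""
    return sum(workloads.values())


def fits(workloads: dict[str, int], available_mib: int) -> bool:
    """True if all workloads fit into the available VRAM at the same time."""
    return total_needed(workloads) <= available_mib


needed = total_needed(WORKLOADS_MIB)
print(f"needed: {needed} MiB, available: {GPU_VRAM_MIB} MiB, "
      f"fits: {fits(WORKLOADS_MIB, GPU_VRAM_MIB)}")
```

Together the two would need about 2300 MiB, roughly 250 MiB more than the card offers, so only a smaller or more heavily quantized model could run alongside the voice containers.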

Since it's working well now, I decided to buy the voice satellite from Nabu Casa for around 70€.

I would like to be able to configure the Home Assistant conversation agent a bit more. It's good for basic tasks, but since I have a good CPU I could leverage it to run a bigger LLM. Maybe that's something that will come in the future.

If you have any questions, feel free to ask.

I'm open to any suggestions for improvement! Feel free to comment.

Hope it's useful for someone!
