Egg Voice Assistant

OK wow. The reason for my original question is purely because in this photo:
[image]

…it honestly looks like there is practically an LED for each segment of the diffuser. So it does an amazing job from that single ring of LEDs!

Wow! This is really a beautiful project!
Thank you for sharing it :slight_smile:

I’m not sure I understand where you installed the microphone and the speaker.
For the speaker, the picture you posted here helps a lot; I understand it sits in the middle of the LED circle, right?
What kind of 50-51 mm speaker would you recommend? A big one like this:
[image]
Or a thinner one:
[image]

And about the microphone, where did you install it?
I would imagine it at the top of the egg, but I can’t see a dedicated space for it on your Thingiverse model.

Thanks for the details :slight_smile:

Just asking: is there a site to buy one already built? Not the tinkering type, lol.

I designed a small hole in the base for the mic. You can see it in the attached image. I also ran an old USB cable in, which is hard-wired to the board.

I don’t have a site where you can buy one. I’m still playing with it in my free time, which I don’t have much of right now. Future versions will have a printed ring backed by a trimmed wire hanger to lock the outer screen on.

OK, nice!
So the power cable comes through this hole, and the mic is positioned close to it.
And about the speaker, do you have a recommendation?
I’m about to buy the big one in the picture posted above.
Thanks :slight_smile:

There is a hole for the mic and a hole for the cable, opposite one another. The mic hole has a lip on the inside: set the mic on it and melt a little filament to keep it in place.

The speaker I used is close to the larger one. I wound up buying a Bluetooth speaker and pulled the speaker out of that. I also used the diffuser from it. I put silver duct tape (the actual metal type) on the inside of the diffuser. It cost about $5.00, plus you get an extra RGBW LED and battery to play with.

The speaker can be on top or it can be below.


I have a 10-second delay on all responses. I’ve tried changing models and settings in ESPHome/Whisper/Piper.

Any suggestions?

I do not at this moment. I get quick responses and sometimes some lag. Not sure if it could be a network thing or the cheaper ESP32.

I spent all day trying to speed up recognition and improve detection. I ended up offloading whisper and piper to Docker Compose in a VM on my server and used the linuxserver faster-whisper image to enable GPU support… the speed is seamless compared to any other off-the-shelf device now.

The link above is a good example. I’ve tried larger models such as turbo, but I need a better graphics card first. If the Assist pipeline fails with an STT error message, it’s most likely due to hardware limitations running models locally.

Can you post your docker-compose.yaml file? Mainly for open-wakeword, piper, whisper, and GPU setup.

Thanks!

services:
  whisper:
    container_name: whisper
    image: lscr.io/linuxserver/faster-whisper:latest
### GPU: uncomment the lines below to use the GPU; prefer a Tensor-core-capable card
#    runtime: nvidia
#    deploy:
#      resources:
#        reservations:
#          devices:
#            - driver: nvidia
#              count: 1
#              capabilities: [gpu]
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=tiny.int8  # or base.int8
#      - NVIDIA_VISIBLE_DEVICES=all  # Allows all GPUs; adjust as needed  
#      - NVIDIA_DRIVER_CAPABILITIES=all  
    volumes:
      - /share/whisper-piper/whisper-data:/data
    ports:
      - 10300:10300
    restart: unless-stopped


  piper:
    container_name: piper
    image: rhasspy/wyoming-piper:latest
    command:
      --voice en-us-ryan-high
    volumes:
      - /share/whisper-piper/piper-data:/data
    environment:
      - TZ=UTC
    restart: unless-stopped
    ports:
      - 10200:10200


#  openwakeword:
#    container_name: openwakeword
#    image: rhasspy/wyoming-openwakeword:latest
#    volumes:
#      - /share/whisper-piper/openwakeword:/data
#      - /share/whisper-piper/openwakeword/custom:/custom
#    runtime: nvidia
#    deploy:
#      resources:
#        reservations:
#          devices:
#            - driver: nvidia
#              count: 1
#              capabilities: [gpu]
#    environment:
#      - NVIDIA_VISIBLE_DEVICES=all
#      - NVIDIA_DRIVER_CAPABILITIES=all
#      - TZ=UTC
#      - PROBABILITY_THRESHOLD=0.2 # Adjust this value
#      - MINIMUM_MATCHES=1 # Adjust this value
#      - MODEL_PATH=/custom/lambda.tflite
#      - MODEL_TYPE=tflite
#    command:
#      - --preload-model=lambda    # Specify your custom wake word name
#      - --custom-model-dir=/custom         # Directory inside the container for custom models
#    restart: unless-stopped
#    ports:
#      - 10400:10400
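
Once these are up, each service gets added to Home Assistant separately through the Wyoming Protocol integration: one entry pointing at the Docker host on port 10300 for whisper, one on 10200 for piper, and one on 10400 for openwakeword if you uncomment it.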

WORD OF CAUTION!!! IF YOU HAVE THE OPENWAKEWORD ADD-ON AND AN EXTERNAL OPENWAKEWORD INSTANCE VIA WYOMING, IT WILL CAUSE ISSUES WITH HOME ASSISTANT. VOICE ASSISTANT WILL TRY TO CONNECT TO BOTH, AND SINCE IT DISCONNECTS AND RECONNECTS OFTEN, IT WILL MAKE VOICE UNUSABLE.

It took me a while to figure out what was going on above, just FYI.

Training custom wakewords can be done by searching “automatic model training simple google colab”. I’d highly recommend Google Colab and purchasing 100 credits for 10 dollars. Use the A100 instance and you should be able to train 5-6 models for that. I trained 2 and still had 60+ credits left… do pay attention, though, because if you keep the instance spun up it will drain your credits even when it is not doing anything.

Thanks,

My whisper and piper were working fine until the last updates; now it’s all screwed up.

I did use lscr.io/linuxserver but had problems with it; I wanted to see what other configs look like and where I might be going wrong.

I’m running on an i7, 32 GB RAM, and a PNY NVIDIA RTX A2000 12GB.

Might just need a good kick in the reboot…

Thanks again.

I’ve really been looking at an A2000 or A4000.

I have HA spun up in a VM.

Frigate with CompreFace and DeepStack, plus all the voice elements, are spun up in another VM. They are running on an older server with two E5-2690s, 768 GB DDR4, a Coral TPU, and a Quadro P600. The P600 doesn’t support Tensor cores, and even giving the machine 24+ core access it still was not good for models larger than the tiny one. I’m curious how well it’d do on an A2000/A4000 and if it’s worth it.

What do you use instead of linuxserver? Lmk how it works with the GPU!

Here is the piper compose snippet I am trying out right now, to see about making piper use the GPU.

  # lspipepr/piper:gpu-version-1.4.0
  # lscr.io/linuxserver/piper:latest
  piper:
    container_name: piper
    #image: rhasspy/wyoming-piper:latest
    image: lspipepr/piper:gpu-version-1.4.0
    command:
       #--voice en_US-ryan-high     
       #--cuda
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=America/Chicago
      - PIPER_VOICE=en_US-lessac-medium
      - PIPER_LENGTH=1 # optional play speed
      - PIPER_NOISE=0.667 # optional
      - PIPER_NOISEW=0.333 # optional
      - PIPER_SPEAKER=0 # optional
      - PIPER_PROCS=1 # optional
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
      - DEVICE=cuda
    privileged: true  # was listed as an env var, but privileged is a service-level key
    volumes:
      - /opt/voiceAssistant/lspipepr-piper-data/models:/root/.local/share/piper
      - /opt/voiceAssistant/lspipepr-piper-data:/data
      - /usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.7:/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.7:/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8:ro
      - /usr/lib/x86_64-linux-gnu/libcublasLt.so.11:/usr/lib/x86_64-linux-gnu/libcublasLt.so.12:ro
      - /usr/lib/x86_64-linux-gnu/libcublas.so.11:/usr/lib/x86_64-linux-gnu/libcublas.so.12:ro
    runtime: nvidia
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              #count: 1
              device_ids: ['0']
              capabilities: [gpu]
#                - gpu
#                - utility
#                - compute 
    ports:
      - 10200:10200
    restart: unless-stopped
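
Before blaming piper when the GPU doesn’t kick in, I sanity-check that the container runtime can see the card at all with a throwaway service like this (gpu-test and the CUDA image tag are just examples; pick a tag that matches your driver):

  gpu-test:
    container_name: gpu-test
    image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example tag, match your driver version
    command: nvidia-smi                          # just prints the GPU info and exits
    runtime: nvidia

docker compose run --rm gpu-test should print the card info; if that errors out, the problem is the host’s NVIDIA setup, not the piper container.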

I just got everything up and running on an RTX 3060; picked one up cheap and figured why not… all I can say is this is definitely usable.

I have to polish up some sentence recognition and build some helper automations for things like turn up the temp or turn down the volume in x (sketch below)… but it’s sub-5-second responses now.

You can simply say “dim the lights in the room” or “turn off all lights in the room”, or ask “What’s the temperature outside?” or “How’s the weather?”… and so forth; it responds pretty well.
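
For those helper automations, here’s a minimal sketch of the direction I mean, using custom_sentences plus intent_script (the intent name and entity are just examples, adjust to your setup):

# config/custom_sentences/en/volume.yaml
language: "en"
intents:
  LivingRoomVolumeDown:
    data:
      - sentences:
          - "turn down the volume in the living room"

# configuration.yaml
intent_script:
  LivingRoomVolumeDown:
    action:
      - service: media_player.volume_down
        target:
          entity_id: media_player.living_room
    speech:
      text: "Turning it down"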

Running Ollama with Llama 3.2 on the GPU
Faster Whisper with the medium.int8 model on the GPU

Piper I’m still running locally on the CPU. It takes essentially zero seconds to respond.

I am trying this on the Voice Preview Edition now… and for being what it is, I’m impressed.

I am using Portainer (portainer.io) in the case below, which is why I’m calling them out as separate stacks.

Installing models in Ollama isn’t very straightforward, but you can do it via the WebUI by typing the model name in and hitting enter, or via the Ollama integration in HA.
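
If the WebUI route gives you trouble, you can also pull straight from the host once the stack is up, e.g. docker exec -it ollama ollama pull llama3.2 (using the container name from Stack 1 below).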

Stack 1

services:
  ollama:
    volumes:
      - ollama:/root/.ollama
    container_name: ollama
    pull_policy: always
    tty: true
    restart: unless-stopped
    image: ollama/ollama:${OLLAMA_DOCKER_TAG-latest}
    ports:
      - ${OLLAMA_WEBAPI_PORT-11434}:11434
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            capabilities: ["gpu"]
            count: all

  open-webui:
    build:
      context: .
      args:
        OLLAMA_BASE_URL: '/ollama'
      dockerfile: Dockerfile
    image: ghcr.io/open-webui/open-webui:${WEBUI_DOCKER_TAG-main}
    container_name: open-webui
    volumes:
      - open-webui:/app/backend/data
    depends_on:
      - ollama
    ports:
      - ${OPEN_WEBUI_PORT-3000}:8080
    environment:
      - 'OLLAMA_BASE_URL=http://ollama:11434'
      - 'WEBUI_SECRET_KEY='
    extra_hosts:
      - host.docker.internal:host-gateway
    restart: unless-stopped


volumes:
  ollama: {}
  open-webui: {}
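
With that stack up, Open WebUI is reachable on port 3000 (mapped to the container’s 8080) and Ollama’s API on 11434, which is also the URL you give the Ollama integration in HA.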

Stack 2

services:
  whisper:
    container_name: whisper
    image: lscr.io/linuxserver/faster-whisper:gpu  # was :latest, but the :gpu tag is needed to run on the GPU
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
      - WHISPER_MODEL=medium.int8  # adjust as needed; options are tiny, base, small, medium, each with or without .int8
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: ["gpu"]
              count: all
    volumes:
      - /whisper-piper/whisper-data:/data
    runtime: nvidia
    ports:
      - 10300:10300
    restart: unless-stopped

EDIT: This does require CUDA and the NVIDIA Container Toolkit to be installed. Both of these have very clear instructions from NVIDIA if you search. It did take me about 2 hours to get through it all.
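
If it saves anyone time: after installing the toolkit, the step that actually wires Docker up to it is sudo nvidia-ctk runtime configure --runtime=docker followed by sudo systemctl restart docker; without that, compose can’t use the nvidia runtime.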
