Run Whisper on an external server

I just tested the new Whisper add-on and it lags pretty badly on my RPi4. The only model that actually runs at a sensible speed, tiny-int8, has roughly 40% WER (word error rate) in my language (Polish), which makes it basically unusable. I wanted to run Whisper on a beefier external server, so I made this docker-compose:

services:
  whisper:
    image: homeassistant/amd64-addon-whisper:latest
    container_name: whisper
    ports:
      - 10300:10300
    restart: unless-stopped
    volumes:
      - whisper-data:/data
    entrypoint: python3
    command: -m wyoming_faster_whisper --uri tcp://0.0.0.0:10300 --model tiny-int8 --beam-size 1 --language pl --data-dir /data --download-dir /data
volumes:
  whisper-data:

which in theory runs the same command as the official add-on does. Then I add the Wyoming integration in HA with the IP of my docker host, 192.168.10.22, and port 10300. It adds successfully and I can select it as the speech-to-text option, but when I try to use it in the conversation window it just hangs on listening indefinitely and never does anything.

Nothing in the HA logs, and the Whisper docker logs say it's running fine:

INFO:__main__:Downloading FasterWhisperModel.TINY_INT8 to /data
INFO:__main__:Ready

and they don't output anything when I try to dictate commands. When I run the Whisper server health-check command:

echo '{ "type": "describe" }' | nc -w 1 192.168.10.22 10300

from the homeassistant container I get this response from the whisper server:

{"type": "info", "data": {"asr": [{"name": "faster-whisper", "attribution": {"name": "Guillaume Klein", "url": "https://github.com/guillaumekln/faster-whisper/"}, "installed": true, "models": [{"name": "tiny-int8", "attribution": {"name": "rhasspy", "url": "https://github.com/rhasspy/models/"}, "installed": true, "languages": ["pl"]}]}], "tts": [], "handle": []}}

which indicates that the HA container communicates with the Whisper server just fine.
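One thing worth trying is to bump the server's log level and watch the container while dictating, to see whether any audio ever reaches it. This is only a sketch; the --debug flag is what the other rhasspy Wyoming servers accept, so treat it as an assumption for this particular image:

# in docker-compose.yml, append --debug to the command (assumption: this build supports the flag)
command: -m wyoming_faster_whisper --uri tcp://0.0.0.0:10300 --model tiny-int8 --beam-size 1 --language pl --data-dir /data --download-dir /data --debug

# then follow the logs while dictating a command in Assist
docker logs -f whisper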

The assist debug unfortunately just times out.

So what am I missing?
Can I run the Whisper server on a different machine at all?


I tried this, but the Wyoming integration fails to connect to the Whisper server.
I get the error below from the Whisper docker container:

curl: (6) Could not resolve host: supervisor
ERROR: Something went wrong contacting the API

An alternative would be to use different hardware for Home Assistant.
What's the best hardware for this? Or maybe just wait for the RPi 5, whenever that comes out…?

I haven't found a suitable Whisper container on Docker Hub, but maybe try running the add-on on a separate Docker host.

How about the wyoming-whisper Docker image? (It works for me, by the way.)

It's referenced from the HA Year of the Voice - Chapter 2: Let's talk - Home Assistant page.
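A minimal standalone run of that image, roughly mirroring the flags used elsewhere in this thread (a sketch only; adjust the model, language and data path to your setup):

docker run -d --name whisper -p 10300:10300 -v whisper-data:/data rhasspy/wyoming-whisper --model tiny-int8 --language pl --data-dir /data --download-dir /data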


Why can't I add another assistant in the settings? I only get a login screen. Is it only for paying customers? I feel I'm missing something very basic here…

I did the Wyoming integration part with Whisper and Piper but can't use them :confused:

I can run Whisper in Docker with the image you mentioned on an Intel NUC N5105 @ 2.00 GHz NAS. I switched to the 'tiny' model and it's pretty fast.

Do you have an idea how to fix the healthcheck?
Docker reports:
Status: unhealthy
Failure count: xx
Last output: /bin/sh: 1: nc: not found

Sorry fabs, I'm no expert and only just get by with Docker.

The following may be of some help though:

I run faster_whisper with the medium-int8 model in a docker container on a Ryzen 1700 machine and it still takes ~15 s to transcribe a simple command like "turn on x light". Not sure if any decent performance is possible right now.


Thank you for pointing out your hardware.
I'm considering buying a new host with the same N5105 CPU for a local voice assistant (Whisper), but I'm not sure whether it could handle the bigger models (small, base).
Could you please test the performance a bit more: which is the highest model usable for real-time STT on this CPU?
I would really appreciate some Whisper benchmark results with debug info on middle-range hardware (between an RPi4 and an Intel Core i5) like the N5105 Celeron, Odroid N2+, … but I haven't found any anywhere (yet).

My HA runs in docker-compose on an
Intel Core i5-3470 CPU @ 3.20 GHz with 8 GB of RAM.
Whisper works but it is slow (also around 15 seconds):

Speech-to-Text 14.21s :white_check_mark:
Engine stt.faster_whisper
Language en
Output switch off nook lights

Natural Language Processing 0.05s :white_check_mark:
Engine homeassistant
Language en
Input switch off nook lights
Response type action_done


I did some tests. I used the sentence "The quick brown fox jumps over the lazy dog" spoken via the Piper plugin.

Setting: --model tiny --language en → 6.09 s, almost instant response
Setting: --model base-int8 --language en → 6.79 s, short delay ~1 s
Setting: --model base --language en → 7.69 s, delay ~2 s

At the moment I stick to the tiny model because it feels snappy and accuracy is not a problem. I am still experimenting but could not use it a lot yet.


nc is a network connectivity tool, and the error you are seeing means nc is not installed in the container. To fix the issue you could build a custom image with a Dockerfile that installs it.
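Something along these lines should do it (a sketch only; it assumes you are on the rhasspy/wyoming-whisper image mentioned above and that it is Debian-based, so apt-get is available inside it):

FROM rhasspy/wyoming-whisper
RUN apt-get update \
 && apt-get install -y --no-install-recommends netcat-openbsd \
 && rm -rf /var/lib/apt/lists/*

Build it with something like docker build -t wyoming-whisper-nc . and point the service's image: at that tag. Alternatively, docker-compose can switch the healthcheck off entirely with healthcheck: disable: true on the service.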

Hi,
I am also getting this timeout error.

Also, the entity state is shown as unknown.

But it was added through the Wyoming protocol without any problems, with a "success" message.

This is my docker compose:

  whisper:
    image: rhasspy/wyoming-whisper
    container_name: whisper

    restart: unless-stopped

    networks:
      internal:
        ipv4_address: $WHISPER_INTERNAL_IP

    ports:
      - $WHISPER_PORT:10300

    volumes:
      - $DOCKER_APPS_DIR/whisper:/data

    command: --model tiny-int8 --beam-size 1 --language pl --data-dir /data --download-dir /data

    environment:
      - TZ=$TZ

Logs from Whisper: [screenshot]

There is no increased CPU usage or anything, not even microphone activation. Just a timeout.
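A check worth running here: the same describe probe from earlier in the thread, pointed at the internal address, to rule out a networking problem between the HA container and the Whisper container (substitute your actual $WHISPER_INTERNAL_IP; this assumes nc is available in the HA container, as it was for the original poster):

echo '{ "type": "describe" }' | nc -w 1 $WHISPER_INTERNAL_IP 10300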

Hi, there is something that I'm missing. At the beginning, when Assist was introduced, it worked out of the box (I guess using an external STT server, maybe Google?). Then with the latest HA releases this mode is no longer available, and we are forced to install local STT and TTS engines. Why this change? Is it a commercial move to push users towards the Nabu Casa service?
My impression is that in its current state local STT is totally useless, at least for languages other than English.


Hi, one of the main principles of the Home Assistant platform is to not depend on any cloud service (Nabu Casa is optional) in order to protect your privacy. That's why the assistant mechanics have, in line with this principle, evolved towards a cloudless model. That being said, it's true that right now it doesn't work that well.

This helped me get started running whisper on separate hardware from my HA Pi4: How to Use a Docker Compose File for Wyoming Whisper | ExitCode0

I have a few things running on this server, so I just added this to my docker-compose file. I use resource limits because this box also runs a Frigate server, and I've had that use up a crazy amount of resources in the past and cripple the other services.

  whisper:
    image: rhasspy/wyoming-whisper
    command: [ "--model", "medium-int8", "--language", "en" ]
    restart: unless-stopped
    ports:
      - 10300:10300
    deploy:
      resources:
        limits:
          cpus: "4.0"
          memory: 8096M
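One note on the deploy limits, as far as I know (a general docker-compose caveat, not something verified on this exact stack): the newer docker compose (v2) applies deploy.resources.limits outside swarm mode, while the legacy docker-compose v1 ignores the deploy: section unless it runs in compatibility mode:

docker-compose --compatibility up -d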

Thank you for this, I'll look into it.

If it's not too much to ask, could you please indicate what kind of performance you're getting and what hardware you have?

I'm not sure how to get performance metrics, but it takes 2-4 seconds after I finish speaking and press the voice button to when the response shows up and begins speaking back to me.

I don't see any timing info in the Whisper logs in docker.

I'm running this and a few other containers on:
Debian 11 bullseye
AMD Ryzen 7 3800X
32 GB RAM
