Assist: Rhasspy-Speech for Speech-to-Text

The latest voice blog post mentioned a new speech-to-text option: Rhasspy-Speech:

I have huge delays with Whisper due to limited hardware. Since standalone Rhasspy worked great for me, I would really like to use the new Wyoming-Rhasspy-Speech.

So far I haven't managed to run it via Docker:

  • docker-compose.yml

        services:
          rhasspy-speech:
            container_name: rhasspy-speech
            image: rhasspy/wyoming-rhasspy-speech:1.0.0
            volumes:
              - $PWD/models:/models
              - $PWD/train:/train
            ports:
              - 10300:10300
            restart: unless-stopped
            logging:
              driver: journald

  • Extracted a model from here (rhasspy/rhasspy-speech at main) into models/

Startup fails with:

    WARNING:Skipping model en_US-rhasspy (not trained)
    rhasspy-speech | WARNING:No trained models found.

Any ideas what I missed?
Do I also need to provide a sentences.yml somewhere?

I haven't tried it myself, but I picked up this docker-compose example:

services:
  rhasspy-speech:
    container_name: rhasspy-speech
    image: "rhasspy/wyoming-rhasspy-speech:1.0.0"
    restart: unless-stopped
    volumes:
      - "./config/models:/models"
      - "./config/training:/training"
    ports:
      - "10300:10300"
      - "8099:8099"     
    command: 
      - "--hass-token=your_token_here"
      - "--hass-websocket-uri=ws://home_assistant_url:8123/api/websocket"

Here's a link to the source (Discord).

8-1-2025: updated the link


Thanks. I manually built the Docker image to get v1.4.3 and used your YAML. It works now 🙂


How are your results? Speed compared to Whisper? Accuracy compared to Whisper?

First tests are great. I haven't done a rigorous comparison and can only report the "felt" improvement. Accuracy is very good and it's much faster than Whisper (tiny-int8) for me. You do need to provide the sentences manually and train, though.
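For anyone who wants more than a "felt" comparison, here is a minimal sketch of a side-by-side test. It assumes you wrap each STT engine in your own Python function that takes audio bytes and returns a transcript (that wrapper is hypothetical; neither Whisper nor Rhasspy-Speech exposes exactly this), and that you have a few recorded commands with their expected text. It reports exact-match accuracy, which is what matters when Home Assistant needs the sentence word for word, plus mean latency.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List, Tuple

# Hypothetical wrapper type: audio bytes in, transcript out.
# You would write one of these per engine yourself.
Recognizer = Callable[[bytes], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't count as errors."""
    return " ".join(text.lower().split())


def evaluate(recognizer: Recognizer,
             samples: Iterable[Tuple[bytes, str]]) -> Tuple[float, float]:
    """Return (exact-match accuracy, mean latency in seconds) over labeled samples."""
    hits = 0
    latencies: List[float] = []
    for audio, expected in samples:
        start = time.perf_counter()
        transcript = recognizer(audio)
        latencies.append(time.perf_counter() - start)
        if normalize(transcript) == normalize(expected):
            hits += 1
    if not latencies:
        raise ValueError("need at least one sample")
    return hits / len(latencies), mean(latencies)


if __name__ == "__main__":
    # Stub recognizer standing in for a real STT engine.
    samples = [(b"", "Bayern 1 Webradio starten"), (b"", "Licht aus")]
    accuracy, latency = evaluate(lambda audio: "bayern 1  webradio starten", samples)
    print(f"accuracy={accuracy:.0%} mean latency={latency * 1000:.2f} ms")
```

Run the same samples through each engine's wrapper and compare the two tuples.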

I haven't tested this implementation yet, but I tested Rhasspy in the past: basically 100% success and very fast on garbage hardware (an RPi4, IIRC) with a €1 USB mic 🙂

I get so many false detections (~50%) with Whisper (could be language-related), even with the medium model, because the recognized words are slightly off, so nothing works.
For example, it detects “of” or “offer” instead of “off”, and the command fails.

I've had a pretty bad experience with faster-whisper; it often recognizes words incorrectly. So I'm waiting for a working Rhasspy-Speech Docker image to try.

I just built the Docker image myself.
You can use the provided Dockerfile and update the version: wyoming-addons/rhasspy-speech/Dockerfile at eb1985688e429d217e0de788567e53f0fea898bc · rhasspy/wyoming-addons · GitHub

Thanks, I'm going to wait for the official image update.

For me too, Rhasspy seems more accurate and faster on first impressions.

Hi,

I’m unable to open the Discord link.

I get the same error in my container when triggering voice in the Companion app. I extracted a model into:

~/docker/wyoming-rasspy $ ls config/models/nl_NL-cgn/
config.json  frequent_words.txt  g2p.fst  lexicon.db  LICENSE  model  phoneme_examples.txt  README.md  SOURCE




rhasspy-speech  | INFO:root:Ready
rhasspy-speech  |  * Serving Flask app 'rhasspy_speech'
rhasspy-speech  |  * Debug mode: off
rhasspy-speech  | INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
rhasspy-speech  |  * Running on all addresses (0.0.0.0)
rhasspy-speech  |  * Running on http://127.0.0.1:8099
rhasspy-speech  |  * Running on http://172.18.0.2:8099
rhasspy-speech  | INFO:werkzeug:Press CTRL+C to quit
rhasspy-speech  | WARNING:root:Skipping model nl_NL-cgn (not trained)
rhasspy-speech  | WARNING:root:No trained models found.
rhasspy-speech  | ERROR:root:No model selected

In HA I have configured the ‘faster-whisper’ integration.

I would like to continue testing with Dutch.

@ronnie_j

Open the web UI on port 8099 to download the model and train your custom commands.

Hi,
I tried both “Whisper” and “Rhasspy-Speech”.
I am extremely impressed with the accuracy and incredible speed of Rhasspy-Speech.

When I use “Whisper”, it understands me in general, but the resulting text is slightly different each time. For example, these are the results of three attempts at starting my WebRadio script in German:
“Bayern-1-Webradio starten”
“Bayern eins Webradio starten”
“Bayern 1 Web Radio starten”
→ Home Assistant never understood me, because the script is called “Bayern 1 Webradio starten”

But Rhasspy-Speech does not understand “new” words.
When I want to name a timer, or add an item to the to-do/shopping list, it does not understand me.
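Both observations come down to a closed vs. open vocabulary. The sketch below only illustrates that idea with Python's `difflib`: it snaps a free-form transcript to the nearest sentence from a fixed list, so all three Whisper variants above would map to the script name, while a sentence that isn't in the list (like a new shopping-list item) maps to nothing. The sentence list, function name and 0.8 cutoff are hypothetical. As far as I understand, Rhasspy-Speech does not work this way internally: it restricts recognition to the trained sentences during decoding rather than fuzzy-matching text afterwards.

```python
import difflib
from typing import Optional, Sequence

# Hypothetical set of sentences the engine was trained on.
TRAINED_SENTENCES: Sequence[str] = (
    "Bayern 1 Webradio starten",
    "Licht im Wohnzimmer aus",
)


def snap_to_trained(transcript: str,
                    sentences: Sequence[str] = TRAINED_SENTENCES,
                    cutoff: float = 0.8) -> Optional[str]:
    """Return the closest trained sentence, or None if nothing is similar enough."""
    # Match case-insensitively, but return the sentence with its original spelling.
    by_lower = {s.lower(): s for s in sentences}
    best = difflib.get_close_matches(transcript.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[best[0]] if best else None


if __name__ == "__main__":
    for heard in ("Bayern-1-Webradio starten",
                  "Bayern eins Webradio starten",
                  "Bayern 1 Web Radio starten",
                  "Milch auf die Einkaufsliste"):
        print(f"{heard!r} -> {snap_to_trained(heard)!r}")
```

The trade-off is the one described above: a fixed list makes recognition robust for known commands but cannot produce anything outside that list.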

Would this improvement be possible?

First try speech recognition with Rhasspy-Speech; when it does not recognize the sentence, the audio is automatically handed over to Whisper, which then recognizes the sentence and passes it back to Home Assistant.

This would be an awesome feature!
It would combine the best of both worlds and would be fast but also flexible.
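The proposed chaining can be sketched as a simple fallback loop. This is a minimal sketch, not how Home Assistant or the Wyoming protocol currently handle STT: the engine functions are hypothetical stand-ins, and the key assumption is that the first engine can report “not recognized” (here `None` or an empty string) instead of returning its closest trained sentence. Whether Rhasspy-Speech can signal that is not something I have verified.

```python
from typing import Callable, Optional, Sequence

# Hypothetical engine type: audio bytes in, transcript out, or None when the
# engine could not recognize the utterance.
Engine = Callable[[bytes], Optional[str]]


def transcribe_with_fallback(audio: bytes, engines: Sequence[Engine]) -> Optional[str]:
    """Try each engine in order and return the first non-empty transcript."""
    for engine in engines:
        transcript = engine(audio)
        if transcript:  # None or "" means "not recognized": try the next engine
            return transcript
    return None  # no engine recognized the audio


def rhasspy_like(audio: bytes) -> Optional[str]:
    # Stub for a fast closed-vocabulary engine that misses a "new" word.
    return None


def whisper_like(audio: bytes) -> Optional[str]:
    # Stub for a slower general-purpose engine.
    return "Timer Pizza 10 Minuten"


if __name__ == "__main__":
    print(transcribe_with_fallback(b"...", [rhasspy_like, whisper_like]))
```

Note that the audio has to be kept in memory so it can be replayed to the second engine, and the slow path only costs time when the fast engine gives up.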

Thank you for your help in advance!

The same idea as the fallback to an LLM that we have now?


How did you install Rhasspy-Speech? Via the add-on?
And by “whisper”, do you mean faster-whisper?

Yes, I installed Rhasspy-Speech via the add-on.

Yes, by Whisper I mean faster-whisper, running as a Docker container on my Intel N100 Unraid “server”.
I tried HomeLLM, but the N100 is far too weak for that.

Faster-whisper runs “OK”, with responses taking 5 to 10 seconds.
