Assist: Rhasspy-Speech for Speech-to-Text

The latest voice blog post mentioned a new speech-to-text option, Rhasspy-Speech:

I have huge delays with Whisper due to limited hardware. Since standalone Rhasspy worked great for me, I would really like to use the new Wyoming Rhasspy-Speech.

So far I have not managed to run it via Docker:

  • docker-compose.yml
    services:
      rhasspy-speech:
        container_name: rhasspy-speech
        image: rhasspy/wyoming-rhasspy-speech:1.0.0
        volumes:
          - $PWD/models:/models
          - $PWD/train:/train
        ports:
          - 10300:10300
        restart: unless-stopped
        logging:
          driver: journald
  • Put an extracted model from here (rhasspy/rhasspy-speech at main) into models

Startup fails with:
WARNING:Skipping model en_US-rhasspy (not trained)
rhasspy-speech | WARNING:No trained models found.

Any ideas what I missed?
Do I also need to provide a sentences.yml somewhere?


I have not tried it myself, but I picked up this docker-compose example:

services:
  rhasspy-speech:
    container_name: rhasspy-speech
    image: "rhasspy/wyoming-rhasspy-speech:1.0.0"
    restart: unless-stopped
    volumes:
      - "./config/models:/models"
      - "./config/training:/training"
    ports:
      - "10300:10300"   # Wyoming port that Home Assistant connects to
      - "8099:8099"     # web UI for downloading models and training
    command:
      - "--hass-token=your_token_here"   # placeholder: your Home Assistant access token
      - "--hass-websocket-uri=ws://home_assistant_url:8123/api/websocket"   # placeholder host

Here's a link to the source (Discord).

8-1-2025: updated the link
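Once the container is up, a quick way to confirm that both published ports (10300 and 8099) answer is a plain TCP connect. This is a minimal Python sketch that assumes the container runs on `localhost` (a placeholder; use your Docker host). It only shows that something is listening. It does not show that a model is trained.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and connects; closing right away
        # is enough to prove that something is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host: replace with the machine running the container.
    host = "localhost"
    for port, purpose in ((10300, "Wyoming STT"), (8099, "web UI")):
        state = "open" if port_open(host, port) else "closed"
        print(f"{host}:{port} ({purpose}): {state}")
```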


Thanks. I manually built the Docker image to get v1.4.3 and used your YAML. It works now 🙂


How are your results? How do speed and accuracy compare to Whisper?

First tests are great. I did not do a rigorous comparison and can only report the “felt” improvement: accuracy is very good, and it's much faster than Whisper (tiny-int8) for me. You do need to provide the sentences manually and train, though.

I have not yet tested this implementation, but I tested Rhasspy in the past: basically 100% success and very fast on garbage hardware (an RPi4, IIRC) with a 1€ USB mic 🙂

I get so many false detections with Whisper (~50%, which could be language-related), even with the medium model, because the recognized words are slightly off.
For example, it detects “of” or “offer” instead of “off”, so nothing works.

I have had a pretty bad experience with faster-whisper; it often recognizes words wrong. So I am waiting for a working Rhasspy-Speech Docker image to try.

I just built the Docker image myself.
You can use the provided Dockerfile and update the version: wyoming-addons/rhasspy-speech/Dockerfile at eb1985688e429d217e0de788567e53f0fea898bc · rhasspy/wyoming-addons · GitHub

Thanks, I'm going to wait for the official image update.

For me too, Rhasspy seems more accurate and faster on first impressions.

Hi,

I’m unable to open the Discord link.

I get the same error in my container when triggering voice in the Companion app. I have extracted a model into:

~/docker/wyoming-rasspy $ ls config/models/nl_NL-cgn/
config.json  frequent_words.txt  g2p.fst  lexicon.db  LICENSE  model  phoneme_examples.txt  README.md  SOURCE




rhasspy-speech  | INFO:root:Ready
rhasspy-speech  |  * Serving Flask app 'rhasspy_speech'
rhasspy-speech  |  * Debug mode: off
rhasspy-speech  | INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
rhasspy-speech  |  * Running on all addresses (0.0.0.0)
rhasspy-speech  |  * Running on http://127.0.0.1:8099
rhasspy-speech  |  * Running on http://172.18.0.2:8099
rhasspy-speech  | INFO:werkzeug:Press CTRL+C to quit
rhasspy-speech  | WARNING:root:Skipping model nl_NL-cgn (not trained)
rhasspy-speech  | WARNING:root:No trained models found.
rhasspy-speech  | ERROR:root:No model selected

In HA I have configured the ‘faster-whisper’ integration.

I would like to continue my tests in Dutch.
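Both posters hit the same “not trained” warning. As a small diagnostic sketch, a regular expression over the container log lists which extracted models the server skipped. The function name is hypothetical, and the pattern only assumes the warning text shown in the logs in this thread. The fix suggested here is to train those models via the web UI on port 8099.

```python
import re

# Matches the warning text seen in this thread, with or without the "root:" logger
# name, e.g. "WARNING:root:Skipping model nl_NL-cgn (not trained)".
SKIPPED = re.compile(r"Skipping model (\S+) \(not trained\)")


def untrained_models(lines):
    """Return model IDs (in first-seen order) that the server skipped as untrained."""
    found = []
    for line in lines:
        match = SKIPPED.search(line)
        if match and match.group(1) not in found:
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    # Sample taken from the log above; in practice you could feed it the
    # output of `docker logs rhasspy-speech` instead.
    sample = """\
rhasspy-speech  | WARNING:root:Skipping model nl_NL-cgn (not trained)
rhasspy-speech  | WARNING:root:No trained models found.
rhasspy-speech  | ERROR:root:No model selected"""
    for model in untrained_models(sample.splitlines()):
        print(f"{model}: not trained yet, train it in the web UI (port 8099)")
```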

@ronnie_j

Open the web UI on port 8099 to download the model and train your custom commands.


Hi,
I tried both “Whisper” and “Rhasspy-Speech”.
I am extremely impressed with the accuracy and incredible speed of Rhasspy-Speech.

When I use “Whisper”, it generally understands me, but the resulting text is slightly different each time. For example, here are the results of three attempts to start my German web radio script:
“Bayern-1-Webradio starten”
“Bayern eins Webradio starten”
“Bayern 1 Web Radio starten”
→ Home Assistant never understood me, because the script is called “Bayern 1 Webradio starten”
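To make the mismatch concrete: none of the three transcripts equals the script name exactly. Even a hypothetical normalization (lowercasing, treating hyphens as spaces, dropping whitespace; this is an illustration, not something Home Assistant is claimed to do) only rescues two of them, because “eins” versus “1” is a different word altogether. That is presumably why a recognizer trained on fixed sentences fares better here.

```python
import re


def normalize(text: str) -> str:
    """Hypothetical normalization: lowercase, hyphens become spaces, whitespace dropped."""
    text = text.lower().replace("-", " ")
    return re.sub(r"\s+", "", text)


SCRIPT_NAME = "Bayern 1 Webradio starten"
TRANSCRIPTS = [
    "Bayern-1-Webradio starten",
    "Bayern eins Webradio starten",
    "Bayern 1 Web Radio starten",
]

for transcript in TRANSCRIPTS:
    exact = transcript == SCRIPT_NAME
    normalized = normalize(transcript) == normalize(SCRIPT_NAME)
    print(f"{transcript!r}: exact={exact}, normalized={normalized}")
```

Only the “eins” transcript still fails after normalization, since fixing it would require mapping number words to digits.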

But Rhasspy-Speech does not understand “new” words.
When I want to name a timer or add an item to the to-do/shopping list, it does not understand me.

Would this improvement be possible?

First try speech recognition with Rhasspy-Speech; when it does not recognize the sentence, the audio is automatically handed over to Whisper, which then recognizes the sentence and passes the text back to Home Assistant.

This would be an awesome feature!
It would combine the best of both worlds and would be fast but also flexible.
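The proposed chain is a first-match fallback: run the fast, closed-vocabulary engine first, and only pay for the slower open-vocabulary one when the first returns nothing. Below is a minimal Python sketch of that control flow only. The two recognizer functions are hypothetical stubs (audio is faked as byte labels); they are not Wyoming or Home Assistant APIs.

```python
from typing import Callable, Optional, Sequence

# A recognizer takes raw audio and returns text, or None if it found no match.
Recognizer = Callable[[bytes], Optional[str]]


def transcribe_with_fallback(audio: bytes, recognizers: Sequence[Recognizer]) -> Optional[str]:
    """Try each recognizer in order and return the first non-empty transcript."""
    for recognize in recognizers:
        text = recognize(audio)
        if text:  # None or "" means "not recognized", so hand the same audio on
            return text
    return None


def fast_closed_vocabulary(audio: bytes) -> Optional[str]:
    # Hypothetical stand-in for Rhasspy-Speech: only knows its trained sentences.
    trained = {b"lights-on": "turn on the kitchen light"}
    return trained.get(audio)


def slow_open_vocabulary(audio: bytes) -> Optional[str]:
    # Hypothetical stand-in for Whisper: always produces some text.
    return "add milk to the shopping list"


if __name__ == "__main__":
    chain = [fast_closed_vocabulary, slow_open_vocabulary]
    print(transcribe_with_fallback(b"lights-on", chain))  # trained sentence: fast path
    print(transcribe_with_fallback(b"new-words", chain))  # unknown words: fallback path
```

The sketch glosses over how the first engine signals “not recognized”. Here it is `None`, but a real closed-vocabulary recognizer may instead return its closest trained sentence, in which case some confidence threshold would be needed to decide when to fall back.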

Thank you for your help in advance!


The same idea as the fallback to an LLM that we have now?


How did you install Rhasspy-Speech? Via the add-on?
And by “Whisper”, do you mean faster-whisper?

Yes, I installed Rhasspy-Speech via the add-on.

Yes, by Whisper I mean faster-whisper, running as a Docker container on my Intel N100 Unraid “server”.
I tried HomeLLM, but the N100 is much too weak for that.

faster-whisper runs “OK”, with responses in 5 to 10 seconds.


I tried Speech-to-Phrase, the new STT announced on the voice stream yesterday, and OMG it's awesome! It solves the problems I had with faster-whisper: with faster-whisper, turning a lamp (with a specific name) on or off only worked on about the 10th try. With Speech-to-Phrase it works on the first try, and it's fast!

I am also using microWakeWord. While I have learned to trigger it quite reliably, it's not as good as “OK Google”. I hope it will improve too.

PS: I had to build the Speech-to-Phrase container myself because no Docker image was available at the time. Update: a Docker image is available now.

I don't understand the difference between rhasspy-speech and the new speech-to-phrase. Has anyone compared them yet?

I failed to install rhasspy-speech, so I will never know. I think speech-to-phrase is an evolution of rhasspy-speech.

It automatically learned about all the devices I have exposed in HA.