Kaldi + Fsticuffs (rhasspy-nlu) in HA like with Rhasspy v2?

Hello there.

First of all, I’m German, and that is probably why I have these problems with voice recognition.
I’m pretty sure it works better with the English language.

Anyway… I’m using Rhasspy v2 for many years now and it works like a charm.
But I want to switch to the Voice Assistant features built into HA because they give me more direct integration, the names of devices can be used instantly without manual setup, and so on.
And as I want to have satellites in different rooms now, I would like to use wyoming-satellite, because it’s so easy to set up.

But the problem is that the VA doesn’t recognize even the simplest words/names for the devices, or anything at all… like giving me the current time or the weather.
I can’t even get it to turn on a simple light.
I tried the default, so

  • wyoming-whisper
  • wyoming-piper
  • wyoming-openwakeword

wyoming-openwakeword works great, I’m happy with that.
wyoming-piper sounds horrible, so I exchanged it for wyoming-opentts. But both of them work, no problem here.

But wyoming-whisper is the problem. As I said, it doesn’t recognize anything for me.
So I tried wyoming-vosk, because it should be more like Kaldi (what I’ve been using for years now). It was better, but really not good enough.

So my questions are:
As I have a totally functional Rhasspy v2 setup, is it possible to integrate that in the “HA-VA workflow”?
I know that v2 uses MQTT and v3 the Wyoming protocol.
But are there any chances to integrate that with each other?

The workflow should be like this:

  • in HA: wyoming-satellites ==> wyoming-openwakeword ==> record or stream speech
  • then it should send/stream the speech to Rhasspy v2
  • in Rhasspy v2: Kaldi + Fsticuffs ==> Intents ==> back to HA
  • In HA: Intents are processed and, optionally, there is feedback via TTS (wyoming-piper or wyoming-opentts)

That way the “STT and intent recognition” part is done by Rhasspy v2, because it simply works for me.
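A minimal sketch of the Rhasspy half of this workflow, assuming the recorded speech is already available as WAV bytes. As far as I know, Rhasspy v2 exposes an HTTP endpoint `/api/speech-to-intent` on its web port (12101 by default) that takes a WAV file and returns intent JSON. The host name `rhasspy.local` and the helper names are assumptions for illustration, and the part that receives audio from a Wyoming satellite is not covered here.

```python
import json
import urllib.request

# Assumption: Rhasspy v2 is reachable under this host name; 12101 is its default web port.
RHASSPY_URL = "http://rhasspy.local:12101/api/speech-to-intent"


def build_request(wav_bytes: bytes, url: str = RHASSPY_URL) -> urllib.request.Request:
    """Prepare a POST of one recorded WAV utterance to Rhasspy."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def parse_intent(body: str) -> dict:
    """Reduce Rhasspy's intent JSON to the parts HA would need."""
    data = json.loads(body)
    intent = data.get("intent", {})
    return {
        "name": intent.get("name", ""),
        "confidence": float(intent.get("confidence", 0.0)),
        "slots": data.get("slots", {}),
        "text": data.get("text", ""),
    }


def speech_to_intent(wav_bytes: bytes) -> dict:
    """Send the audio and return the parsed intent (needs a running Rhasspy)."""
    with urllib.request.urlopen(build_request(wav_bytes), timeout=10) as resp:
        return parse_intent(resp.read().decode("utf-8"))
```

An empty intent name in the result would mean that Rhasspy did not recognize the utterance.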

Any ideas and suggestions are welcome.
Thanks in advance.

Here is a screenshot of my current Rhasspy v2-setup.
If anyone needs more information, I’m happy to provide it.

It seems @synesthesiam is already working on exactly that:

I think we should use this as a first stage, if this comes back with a low probability/rejection, then it should evaluate whisper and pass it to an LLM. On modern hardware they could run in parallel as Kaldi barely has any delay on my minipc. It’s nearly instant.
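A rough sketch of that two-stage idea, assuming the first stage returns an intent together with a confidence value. The two recognizer callables and the threshold of 0.8 are hypothetical placeholders, not real Rhasspy or HA APIs, and the stages run in sequence here rather than in parallel to keep the sketch short.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures, for illustration only.
FastRecognizer = Callable[[bytes], Tuple[Optional[str], float]]  # e.g. Kaldi + Fsticuffs
SlowRecognizer = Callable[[bytes], str]                          # e.g. Whisper + LLM


def recognize(
    audio: bytes,
    fast: FastRecognizer,
    slow: SlowRecognizer,
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (stage, result): use the fast stage unless it rejects the audio."""
    intent, confidence = fast(audio)
    if intent is not None and confidence >= threshold:
        return ("fast", intent)
    # Low confidence or outright rejection: fall back to the second stage.
    return ("slow", slow(audio))
```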

I think I’m in the same boat as you. My 5-year-old Rhasspy 2.5 setup is still outperforming the current state of openwakeword + whisper. It contains a lot of Node-RED glue and works conveniently well, e.g. by checking the site id with a prefix in the entity. “Turn the light off” will actually turn the “Livingroom light” off if I’m in the living room.
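The site id trick can be sketched roughly like this. It is only an illustration of the idea in Python, not the actual Node-RED flow; the function name, the entity naming scheme (`<domain>.<site_id>_<device>`) and the example entities are assumptions.

```python
from typing import Iterable, Optional


def resolve_entity(
    site_id: str,
    device: str,
    entity_ids: Iterable[str],
    domain: str = "light",
) -> Optional[str]:
    """Prefer the entity prefixed with the satellite's site id, else a global one."""
    known = set(entity_ids)
    slug = device.strip().lower().replace(" ", "_")
    # First try the room-specific entity, e.g. light.livingroom_light.
    room_specific = f"{domain}.{site_id.lower()}_{slug}"
    if room_specific in known:
        return room_specific
    # Otherwise fall back to an entity without a room prefix.
    generic = f"{domain}.{slug}"
    return generic if generic in known else None


entities = ["light.livingroom_light", "light.kitchen_light", "light.desk_lamp"]
print(resolve_entity("livingroom", "light", entities))
```

The same spoken command resolves to a different entity depending on which satellite heard it.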

I have replaced the text-to-speech in my Rhasspy instance with Piper, though, by wrapping it in a MaryTTS interface, and am pretty happy with my GLaDOS voice.

Thanks - didn’t see that project myself.

When I’m back home mid next week, I can give this a shot.
But in the meantime I have already set up Rhasspy 2 satellites, learned about using the satellite IDs correctly, set up UDP servers on the satellites so that they’re not streaming all the time to the central MQTT server, and much, much more.

It’s working great so far… the only thing is that the 512 MB RAM on the Pi0 2Ws isn’t really enough for the Docker installation… they’re swapping all the time, which isn’t good…
I would have to go with Debian 10, 32-bit, on the satellites; then I could install the venv directly without Docker… but having this old OS version is not my favorite thing, either.

As I said, I have everything up and running with Rhasspy 2 at the moment.
Don’t know when I will feel playful enough to go in full test mode again :wink:


Unrelated to the topic, but related to my post:
Perhaps I can free up RAM if I can get rid of the Docker container for Rhasspy with the patch or the forked repo from this:

This STT engine can be even faster when you use your own phrase dictionary. This has already been discussed in other threads.
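A small sketch of what such a phrase list could look like, assuming the engine meant here is Vosk. As far as I know, the Vosk Python API accepts a JSON array of allowed phrases as an optional third argument to `KaldiRecognizer`; the helper `build_grammar`, the example phrases and the model path are assumptions for illustration.

```python
import json
from typing import Iterable


def build_grammar(phrases: Iterable[str]) -> str:
    """Return a JSON array of phrases plus "[unk]" for out-of-list speech."""
    unique = sorted({p.strip().lower() for p in phrases if p.strip()})
    return json.dumps(unique + ["[unk]"], ensure_ascii=False)


grammar = build_grammar(["Schalte das Licht ein", "Wie spät ist es"])
print(grammar)

# With the vosk package and a downloaded model (the path is a placeholder):
#   from vosk import Model, KaldiRecognizer
#   recognizer = KaldiRecognizer(Model("path/to/model"), 16000, grammar)
```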

From my initial post

“So I tried wyoming-vosk, because it should be more like Kaldi (what I’ve been using for years now). It was better, but really not good enough.”