Kaldi + Fsticuffs (rhasspy-nlu) in HA like with Rhasspy v2?

sic6SaNdMaN · January 4, 2025, 3:14pm

Hello there.

First of all, I’m German, which is probably why I have the problems I have with voice recognition.
I’m pretty sure it works better with the English language.

Anyway… I’ve been using Rhasspy v2 for many years now and it works like a charm.
But I want to switch to the Voice Assistant features built into HA because they give me more direct integration, the names of devices can be used instantly without manual setup, and so on.
And as I want to have satellites in different rooms now, I would like to use wyoming-satellite, because it’s so easy to set up.

But the problem is that the VA doesn’t recognize even the simplest words or device names, or anything at all… like asking for the current time or the weather.
I can’t even get it to turn on a simple light.
I tried the defaults:

wyoming-whisper
wyoming-piper
wyoming-openwakeword

wyoming-openwakeword works great, I’m happy with that.
wyoming-piper sounds horrible, so I replaced it with wyoming-opentts. But both of them work, no problem here.

But wyoming-whisper is the problem. As I said, it doesn’t recognize anything for me.
So I tried wyoming-vosk, because it should be more like Kaldi (which I’ve been using for years now). It was better, but really not good enough.

So my questions are:
As I have a totally functional Rhasspy v2 setup, is it possible to integrate that in the “HA-VA workflow”?
I know that v2 uses MQTT and v3 the Wyoming protocol.
But are there any chances to integrate that with each other?

The workflow should be like this:

in HA: wyoming-satellites ==> wyoming-openwakeword ==> record or stream speech
then it should send/stream the speech to Rhasspy v2
in Rhasspy v2: Kaldi + Fsticuffs ==> Intents ==> back to HA
in HA: Intents are processed and, optionally, there is feedback via TTS (wyoming-piper or wyoming-opentts)

That way the “STT and intent recognition” part is done by Rhasspy v2, because it simply works for me.
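
To make the hand-off step in the workflow above concrete, here is a minimal sketch of how recorded speech could be passed to Rhasspy v2 over its HTTP API. It assumes the audio arrives as raw 16 kHz, 16-bit mono PCM and that Rhasspy listens on its default port 12101. The host name `rhasspy.local` and all function names are placeholders for illustration, not part of any existing integration.

```python
import io
import json
import urllib.request
import wave

# Hypothetical host; 12101 is Rhasspy's default web port.
RHASSPY_URL = "http://rhasspy.local:12101/api/speech-to-intent"


def pcm_to_wav(pcm: bytes, rate: int = 16000, width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM samples in a WAV container (assumed 16 kHz, 16-bit, mono)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def build_request(wav_bytes: bytes, url: str = RHASSPY_URL) -> urllib.request.Request:
    """Build the POST request carrying the WAV data to Rhasspy."""
    return urllib.request.Request(
        url,
        data=wav_bytes,
        headers={"Content-Type": "audio/wav"},
        method="POST",
    )


def speech_to_intent(pcm: bytes, url: str = RHASSPY_URL, timeout: float = 10.0) -> dict:
    """Send one utterance to Rhasspy and return the recognized intent as a dict."""
    request = build_request(pcm_to_wav(pcm), url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Something would still have to sit between the Wyoming satellite and this call, and the returned intent would have to be forwarded to HA. Both parts are left out here.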

Any ideas and suggestions are welcome.
Thanks in advance.

Here is a screenshot of my current Rhasspy v2-setup.
If anyone needs more information, I’m happy to provide it.

async · January 16, 2025, 10:00pm

@synesthesiam is already working on exactly that as it seems:

I think we should use this as a first stage. If it comes back with a low probability or a rejection, the utterance should then be evaluated by whisper and passed to an LLM. On modern hardware they could run in parallel, as Kaldi barely has any delay on my mini PC. It’s nearly instant.
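
A rough sketch of that two-stage idea, purely for illustration: the `Result` type, the engine callables and the `0.8` threshold are all made up here, and the stages run one after the other rather than in parallel.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    text: str
    confidence: float
    engine: str


def recognize(
    audio: bytes,
    fast: Callable[[bytes], Optional[Result]],
    slow: Callable[[bytes], Result],
    min_confidence: float = 0.8,  # assumed threshold, tune for your setup
) -> Result:
    """Try the fast, closed-vocabulary engine first; fall back on rejection."""
    first = fast(audio)
    if first is not None and first.confidence >= min_confidence:
        return first
    # Rejected or low confidence: hand the same audio to the open-vocabulary stage.
    return slow(audio)
```

Here `fast` would wrap something like Kaldi with a fixed grammar and `slow` something like whisper plus an LLM. Returning `None` from `fast` stands for an outright rejection.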

I’m in the same boat as you, I think. My 5-year-old Rhasspy 2.5 setup is still outperforming the current state of openwakeword + whisper. It contains a lot of Node-RED glue and works conveniently well, e.g. by checking the site id against a prefix in the entity name. “Turn the light off” will actually turn the “Livingroom light” off if I’m in the living room.
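
The site-id trick could look roughly like this. It is only a sketch in Python rather than the actual Node-RED flow, and the `<domain>.<site_id>_<name>` naming scheme is an assumption made for the example.

```python
from typing import Optional, Set


def resolve_entity(domain: str, name: str, site_id: str, known: Set[str]) -> Optional[str]:
    """Prefer the entity in the room the command came from, else a global one."""
    local = f"{domain}.{site_id}_{name}"   # e.g. light.livingroom_light
    if local in known:
        return local
    unprefixed = f"{domain}.{name}"        # e.g. switch.tv
    if unprefixed in known:
        return unprefixed
    return None                            # nothing matches, let the caller decide


known_entities = {"light.livingroom_light", "light.kitchen_light", "switch.tv"}
print(resolve_entity("light", "light", "livingroom", known_entities))
```

With the satellite’s site id set to `livingroom`, a bare “light” resolves to `light.livingroom_light`.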

I have replaced text to speech with piper in my Rhasspy instance by wrapping it with a MaryTTS interface though, and am pretty happy with my GLaDOS voice.

sic6SaNdMaN · January 17, 2025, 2:58pm

Thanks - I hadn’t seen that project myself.

When I’m back home mid next week, I can give this a shot.
But in the meantime I already set up Rhasspy 2 satellites, learned about using the satellite ids correctly, setting up UDP servers on the satellites so that they’re not streaming all the time to the central MQTT server, and much, much more.

It’s working great so far… the only thing is that the 512 MB RAM on the Pi0 2Ws isn’t really enough for the Docker installation… they’re swapping all the time, which isn’t good…
I would have to go with Debian 10, 32-bit on the satellites; then I could install the venv directly without Docker… but having this old OS version is not my favorite thing, either.

As I said, I have everything up and running with Rhasspy 2 at the moment.
I don’t know when I will feel playful enough to go into full test mode again.

Unrelated to the topic, but related to my post:
Perhaps I can free up RAM if I can get rid of the docker container for rhasspy with the patch or the forked repo from this:

github.com/rhasspy/rhasspy

How to compile on newer Debian versions (like jammy)

opened 02:13AM - 09 Feb 23 UTC

ryanlath

This might not be the most appropriate place for this, but the forums don't accept files, and the maintainer seems to be mia... Here's what I had to do to compile on Ubuntu 22.04 (jammy) and RaspPi OS 64 (Bullseye). Patch is attached.

```
sudo apt update
sudo apt-get install \
    python3 python3-dev python3-setuptools python3-pip python3-venv \
    git build-essential libatlas-base-dev swig portaudio19-dev \
    supervisor mosquitto sox alsa-utils libgfortran5 libopenblas-dev \
    espeak flite gfortran python3-lxml libxml2-dev libxslt1-dev libffi-dev \
    perl curl patchelf ca-certificates

git clone --recursive https://github.com/rhasspy/rhasspy
wget https://github.com/rhasspy/rhasspy/files/10692790/jammy.patch
patch -p0 < jammy.patch
cd rhasspy
./configure --enable-in-place --disable-deepspeech
make
make install
./rhasspy.sh -p en
```

I might have missed a few dependencies in the above `apt install`. And some of the patch is version comments to myself, but if you just want to check out this project and see if it's worth your time... hopefully, I can save you some time from Python dependency hell.

[jammy.patch](https://github.com/rhasspy/rhasspy/files/10692790/jammy.patch)

mchk · January 17, 2025, 4:12pm

This STT engine can be even faster when you use your own phrase dictionary. That has already been discussed in other threads.

sic6SaNdMaN · January 17, 2025, 4:27pm

From my initial post

“So I tried wyoming-vosk, because it should be more like Kaldi (which I’ve been using for years now). It was better, but really not good enough.”

sic6SaNdMaN · January 25, 2025, 10:42am

I marked this as the answer.
However, I will wait a while longer before trying that.
I read through the issues and saw that the Docker image hasn’t been updated yet from version 1.0.0 to 1.4.3 and so on.
I think this project isn’t really ready YET - but I’m excited to see progress here and will monitor it and switch to it eventually.

Thanks again and have a nice one.