Yeah, I was just going through it, Mike, and my first impression, even if it is your own work, was 'no AI!?' The reason I was looking was that I was wondering what you might be using.
Maybe you used an early version, but the requirements seem pretty standard, and it's quite normal for an ML release to be tied to a specific ML framework version. I frequently use miniconda, venv or Docker, as needing different Python bases is not uncommon but usually not a problem. Dependencies are only a problem if you ignore such tools?
I was curious, as I thought it had been chosen due to load, and with the introduction of ASR such as Whisper, SotA models are getting more focus. Take GitHub - guillaumekln/faster-whisper: Faster Whisper transcription with CTranslate2, which you have running in Rhasspy 3.0: the papers and models that hit SotA WER are freaking huge, but they have been boiled down into less accurate but far faster and smaller models.
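Something like this is all faster-whisper takes; the "small" model, file name and beam_size here are just my guesses at sensible values, not what Rhasspy ships with:

```python
# Minimal faster-whisper sketch: a "boiled down" Whisper (int8 on CPU) transcribing
# a single command recording. Model size, device and file name are illustrative.
from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")

segments, info = model.transcribe("command.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```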
Yes, if you go no holds barred on achieving SotA (State of the Art) then models can be huge, but that is what dev is about, and with Assist entities we are talking about a very small subset of a language model, hence why I was wondering what you were using.
Whisper is an example all on its own, as there is nothing really special about the acoustic side of Whisper; it is the GPT-like beam search that fixes those errors via the language modelling in the decoder, as an approximate explanation from memory.
The times Whisper gets it totally wrong but still produces a logical sentence are this process in action.
Because you have a much smaller language subset, I was expecting something lean, mean and very accurate, which is why I was checking what you are using, but dependencies… !?
If Whisper were open source then likely we could have an entity-subset LM (language model) for the decoder part; sadly it's not, and as far as I know we get what we are given.
Wav2Vec2 is a close 2nd to Whisper, where the LM or n-gram could be generated on the fly based on enabled entities, hopefully creating a super small and accurate LM.
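Roughly what I mean, loosely following the Hugging Face "Wav2Vec2 with n-grams" recipe; the model name, file paths and the KenLM lmplz step are my assumptions, not anything Rhasspy actually does:

```python
# Sketch: bias a Wav2Vec2 CTC beam-search decoder with a tiny n-gram built from a
# commands.txt corpus of command sentences. Assumes KenLM's lmplz binary is installed.
import subprocess
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC, Wav2Vec2ProcessorWithLM
from pyctcdecode import build_ctcdecoder

# 1. Build a tiny 3-gram LM from the command corpus (external KenLM tool).
#    --discount_fallback because an entity-only corpus is very small.
with open("commands.txt") as fin, open("commands.arpa", "w") as fout:
    subprocess.run(["lmplz", "-o", "3", "--discount_fallback"],
                   stdin=fin, stdout=fout, check=True)

# 2. Acoustic model plus its character vocabulary, sorted by token id.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
vocab = processor.tokenizer.get_vocab()
labels = [tok.lower() for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

# 3. Beam-search decoder steered by the entity-only n-gram.
decoder = build_ctcdecoder(labels, kenlm_model_path="commands.arpa")
processor_with_lm = Wav2Vec2ProcessorWithLM(
    feature_extractor=processor.feature_extractor,
    tokenizer=processor.tokenizer,
    decoder=decoder,
)

def transcribe(waveform, sample_rate=16000):
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    return processor_with_lm.batch_decode(logits.numpy()).text[0]
```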
Likely you could even increase accuracy by adding a custom ASR dataset of multi-voice TTS churning out common command sentences, but nowadays reuse of others' datasets seems to be the norm.
It's things like 'custom commands' that are dictating large language models, as they depart from a small, known Hass entity model, and we are talking very small: just the enabled entities, with an LM created on the fly. Instead we end up needing a full large language model, because we have to cover everything to handle custom, unknown input, such as the huge model that the entities of a shopping list or todo could require.
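Generating that LM on the fly is not much work either: pull the entities over the Home Assistant REST API and expand a few command templates into a corpus (the URL, token and templates below are just placeholders):

```python
# Sketch: turn the current entity list into a commands.txt corpus for the n-gram above.
import requests

HASS_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

TEMPLATES = [
    "turn on the {name}",
    "turn off the {name}",
    "what is the state of the {name}",
]

resp = requests.get(
    f"{HASS_URL}/api/states",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()

with open("commands.txt", "w") as fout:
    for state in resp.json():
        # friendly_name gives the spoken form, e.g. "Kitchen Light"
        name = state.get("attributes", {}).get("friendly_name")
        if not name:
            continue
        for template in TEMPLATES:
            fout.write(template.format(name=name.lower()) + "\n")
```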
Those are surely not Hass inference-based control but skills in their own right, requiring much more complex language models.
I am mainly just getting my head around the Rhasspy 3.0 Dev Preview, and thinking the 1st amazing skill is the Hass Assist inference-based module.
I can add an NLP stage in front of the word stemmer and convert from AI NLP to basic stemming NLP, but the bottleneck of logic is always the basic word stemming, which loses much of what the NLP AI is trained on that could output a correct YAML intent.
I thought Hass Assist was going to be specific to Hass entities, so lighter-weight, more accurate language-subset models could be created, purely because it isn't a full language model supporting all unknowns.
You could probably bypass Assist as a module but train on the Assist YAML dataset with a tiny entity-only LM/NLP.
If I was going to use Assist, I could also make a much lighter-weight and even more accurate ASR by having a specific subset LM that creates 'better' formatted, entity-specific output and feeds directly into the word stemmer, giving better results and negating to an extent the need for additional NLP.
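As a toy illustration of what I mean by feeding entity-constrained output straight into a stemmer (made-up entities and intents, nothing like Assist's real pipeline):

```python
# Toy sketch: stem both the transcript and the known entity/intent phrases, then match.
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
ENTITIES = {"light.kitchen": "kitchen lights", "switch.fan": "bedroom fan"}
INTENTS = {"turn on": "HassTurnOn", "turn off": "HassTurnOff"}

def stems(text):
    return {stemmer.stem(word) for word in text.lower().split()}

def match(transcript):
    words = stems(transcript)
    intent = next((i for phrase, i in INTENTS.items() if stems(phrase) <= words), None)
    entity = next((eid for eid, name in ENTITIES.items() if stems(name) <= words), None)
    return intent, entity

print(match("turn on the kitchen light"))  # ('HassTurnOn', 'light.kitchen')
```

When the ASR can only ever output words from that small vocabulary, even this crude matching gets a lot harder to break.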
Neither is what I was expecting, so I guess I will just have to wait for a Flair add-on for HA if I want to gain the accuracies I was hoping for; that was what I was expecting, and I was likely too enthusiastic.
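For reference, this is the sort of thing I'd hope a Flair add-on would do; the stock "ner" model knows nothing about Hass entities, so a real add-on would presumably train its own SequenceTagger on command sentences, but the basic API looks like this:

```python
# Sketch of the Flair tagging API; a Hass add-on would swap "ner" for a tagger
# trained on Assist-style command sentences with device/area/intent slots.
from flair.data import Sentence
from flair.models import SequenceTagger

tagger = SequenceTagger.load("ner")  # downloads a pretrained English NER model

sentence = Sentence("Turn on the kitchen light in the living room")
tagger.predict(sentence)

for span in sentence.get_spans("ner"):
    print(span)
```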
Cheers for the reply and explanation.