Year of the Voice - Chapter 1: Assist

I’ve used Flair before, and while it’s a nice framework, it suffers from the same problems as most anything machine-learning related:

  • Large list of dependencies, including specific versions of PyTorch and libraries like scikit-learn that need to be pre-compiled (HA supports more architectures than just amd64 and arm64)
  • Hundreds or thousands of MB of weights to download per language
  • Must be re-trained whenever the list of areas or entities changes, including changes to aliases (very slow without a fast CPU or GPU)

While Assist is far less flexible than something like Flair, it:

  • Fits sentence templates for dozens of languages into around 200 KB of uncompressed JSON
  • Requires no dependencies outside of the standard library
  • Can be “re-trained” by just adjusting the input entity/area lists

Plus, as we’ve mentioned before, the plan is for the current sentence templates in Assist to serve as training data for more sophisticated machine learning systems in the future. So it should be possible some day to install a Flair add-on for HA (bundled with all its dependencies and weights), and train it on the sentences generated from Assist templates :slight_smile:
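To make that concrete, here is a rough sketch of how such template expansion can work (my own illustration, not Assist's actual code; the template syntax and slot handling are simplified assumptions):

```python
import re

def expand(template: str, slots: dict[str, list[str]]) -> list[str]:
    """Expand (a|b) alternations, [optional] parts, and {slot} lists."""
    # Find the first alternation, optional part, or slot reference.
    m = re.search(r"\(([^)]*)\)|\[([^\]]*)\]|\{(\w+)\}", template)
    if m is None:
        return [" ".join(template.split())]  # no groups left; tidy whitespace
    head, tail = template[: m.start()], template[m.end():]
    if m.group(1) is not None:        # (a|b) -> try each alternative
        choices = m.group(1).split("|")
    elif m.group(2) is not None:      # [x] -> with and without it
        choices = [m.group(2), ""]
    else:                             # {slot} -> current entity/area names
        choices = slots[m.group(3)]
    return [s for c in choices for s in expand(head + c + tail, slots)]

print(expand("turn (on|off) [the] {name}",
             {"name": ["kitchen light", "desk lamp"]}))
# ['turn on the kitchen light', 'turn on the desk lamp',
#  'turn on kitchen light', 'turn on desk lamp',
#  'turn off the kitchen light', ...]
```

"Re-training" is then just re-running the expansion with an updated slot dictionary, which is why alias changes are cheap.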

Yeah, I was just going through it, Mike, and my first impression, even if it is your own, was "no AI!?", which is why I was looking: I was wondering what you might be using.

Maybe you used an early version, but the requirements seem pretty standard, and it's quite normal for an ML release to be tied to a specific framework version. I frequently use miniconda, venv, or Docker, as needing different Python bases is not uncommon but usually no problem. Dependencies are only a problem if you ignore such tools?

I was curious, as I thought it had been chosen due to load. With the introduction of ASR such as Whisper, SotA models are gaining more focus, and like GitHub - guillaumekln/faster-whisper: Faster Whisper transcription with CTranslate2, which you have running in Rhasspy 3.0, the papers and models that hit SotA WER are freaking huge, but have actually been boiled down to less accurate but far faster and smaller models.
Yes, if you go no holds barred on achieving SotA (State of the Art), then models can be huge, but that is what dev is about, and with Assist entities we are talking about a very small subset of a language model, hence why I was wondering what you were using.
Whisper is an example all by itself, as there is nothing really special about the ASR of Whisper; it's the GPT-like beam search that fixes those errors via the NLP of the decoder (roughly the beam_size knob in the snippet below), as an approximate explanation as far as my memory will dictate.
The times Whisper gets it totally wrong but still creates a logical sentence are this process in action.
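For reference, running faster-whisper looks roughly like this (model size, file path, and parameters below are placeholders, not recommendations):

```python
from faster_whisper import WhisperModel

# "tiny" keeps the download and RAM small; int8 speeds up CPU inference.
model = WhisperModel("tiny", device="cpu", compute_type="int8")

# beam_size is the decoder-side (GPT-like) beam search mentioned above.
segments, info = model.transcribe("command.wav", beam_size=5)
for segment in segments:
    print(segment.text)
```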

Because you have a much smaller language subset, I was expecting something lean and mean and very accurate, which is why I was checking what you are using, but dependencies… !?

If Whisper were open source, then likely we could have an entity-subset LM (language model) for the decoder part; sadly it's not, and as far as I know we get what we are given.
Wav2Vec2 is a close 2nd to Whisper, where the LM or n-gram could be generated on the fly based on enabled entities, hopefully creating a super small and accurate LM (see the sketch below).
Likely you could even increase accuracy by adding a custom ASR dataset of multi-voice TTS churning out common command sentences, but nowadays reuse of others' work seems to be the norm.
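As a sketch of that on-the-fly LM idea (my own illustration: KenLM's lmplz tool plus the pyctcdecode library around a Hugging Face Wav2Vec2 model; the file names and sentence list are assumptions):

```python
import subprocess
import numpy as np
import torch
from pyctcdecode import build_ctcdecoder
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# 1. Generate a tiny corpus of command sentences from the enabled entities
#    (in practice these could come straight from the Assist templates).
#    This model's vocabulary is upper-case, so the corpus must match.
sentences = [
    "TURN ON THE KITCHEN LIGHT",
    "TURN OFF THE KITCHEN LIGHT",
    "ARE ALL WINDOWS CLOSED IN THE STUDY",
]
with open("corpus.txt", "w") as f:
    f.write("\n".join(sentences) + "\n")

# 2. Build a small n-gram LM with KenLM (lmplz must be on PATH);
#    --discount_fallback is required because the corpus is tiny.
subprocess.run(
    ["lmplz", "-o", "3", "--discount_fallback",
     "--text", "corpus.txt", "--arpa", "commands.arpa"],
    check=True,
)

# 3. Attach the LM to a CTC beam-search decoder for a Wav2Vec2 model.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(labels, kenlm_model_path="commands.arpa")

# 4. Decode 16 kHz mono audio (zeros here stand in for a real recording).
speech = np.zeros(16000, dtype=np.float32)
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits[0].numpy()
print(decoder.decode(logits))
```

Rebuilding commands.arpa whenever the enabled entities change takes a fraction of a second for a corpus this size, which is the appeal over retraining the acoustic model.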

It's things like "custom commands" that are dictating large language models, as they depart from the small, known Hass entity model, and we are talking very small: just the enabled entities, creating an LM on the fly. Needing a full large language model comes from having to handle everything custom and unknown, such as the huge model that the entities of a shopping list or todo could require.
Those are surely not Hass inference-based control but skills in their own right, requiring much more complex language models.

I am just getting my head around the Rhasspy 3.0 Dev Preview, and I am thinking the 1st amazing skill is the Hass Assist inference-based module.

I can add an NLP model in front of the word stemmer and convert from AI NLP to basic stemming NLP, but the bottleneck of logic is always the basic word stemming, losing much of the NLP the AI is trained on, which could otherwise output correct YAML intents.
I thought Hass Assist was going to be specific to Hass entities, so lighter-weight, more accurate language-subset models could be created, purely because it isn't a full language model supporting all unknowns.
You could probably bypass Assist as a module but train on the Assist YAML dataset with a tiny entity-only LM NLP.
If I was going to use Assist, I could also make a much lighter-weight and equally or even more accurate ASR by having a specific subset LM that creates "better" formatted, entity-specific output and feeds it directly into the word stemmer, with better results, negating to an extent the need for additional NLP.

Neither is what I was expecting, so I guess I will just have to wait for a Flair add-on for HA if I want to gain the accuracies I was hoping for, as that was what I was expecting and I was likely too enthusiastic. :slight_smile:

Cheers for the reply and explanation.


Just wondering how to set up the entities so Assist works as advertised. I asked this:
[screenshot]

Here is my window sensor:
[screenshot]

Of these questions, which were given as examples in the release notes:

  • “What is the outside temperature?”
  • “What is the power consumption in the office?”
  • “Are any lights on in the bedroom?”
  • “Are all windows closed in the kitchen?”
  • “How many lights are on in the office?”
  • “Which doors are open?”

Only the first one worked. I either got "Not any" or "I don't understand". So what do I have wrong in my setup? (Note: I substituted "Study" for the rooms in all of the examples above.)
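From what I can tell, these queries match on an entity's device class plus its area, so maybe the sensor needs both; something like this in configuration.yaml (entity ID below is just an example of mine):

```yaml
# Guesswork on my part: give the sensor a "window" device class; the area
# itself still has to be assigned in the UI (Settings -> Areas).
homeassistant:
  customize:
    binary_sensor.study_window:
      device_class: window
```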

The basic text stemming is just not capable of that without exact definitions for each permutation.

Meta AI just released their LLaMA model, which is much smaller than ChatGPT and does run on a Pi, even if extremely slowly (10 sec/token).
Also, Stanford took the model and refined it to produce Alpaca in an extremely cost-effective manner.

https://crfm.stanford.edu/2023/03/13/alpaca.html

GitHub - ggerganov/llama.cpp: Port of Facebook’s LLaMA model in C/C++ (aka whisper.cpp, but for LLaMA)

It is interesting

Have a look at GitHub - KoboldAI/KoboldAI-Client, where GPT/LLaMA is being interfaced to fantasy erotic novels and AI Dungeon adventures. That is a concept I never thought of, but wow, the possibilities seem extremely varied and interesting, even if they might not be your thing :slight_smile:

Alpaca works without internet.

Yeah, all local models. This gives a bit of an idea of the model sizes for LLaMA/Alpaca.

Likely a Pi 4 is just too slow, but it could actually run in 4 GB; there are Arm-based SBCs such as the RK3588 variants that will run it much better, and for GPU-based or Apple silicon machines, advanced local AI has moved from the future to now.
Much of the development is in training via supervised and reinforcement learning, which for a project like this would be the documentation and the peer review of forum solutions.

If they are doing it for Dungeons & Dragons AI fantasy chat, then that is just one strange and varied example of a specific knowledge domain.


Anyone know of an existing issue when entity names have an apostrophe? Looking at the following picture, the alias works but the name does not?

P.S. Sorry for the deleted post above. Pressed wrong button.

Is “Study” an area? Questions like “are all windows closed in the study” will look at the entities in that area.

Yes I have made each room an area.

Edit: also why would the response be YES to both questions?

Edit2: Same behaviour with 2023.5.0

Can you share how you fixed it? I can’t seem to make it work.

Josh.ai just announced 3rd-party integration tooling that might unlock HA’s ability to work with Josh equipment…