I recently bought the Voice-PE hardware. At the moment I’m playing around with it but I can’t do more than the standard functions like setting a timer.
My home automation platform is HomeMatic, which I have integrated with HomeAssistant, which works fine.
My HA hardware is a Raspberry Pi 5 (64-bit) with 8 GB RAM.
I know that's not enough to run everything for the Voice-PE locally. That's why I switched the voice assistant's speech processing to Microsoft Cognitive Services. I have an M365 tenant and an Azure subscription, and there are already integrations from community users.
So far so good. When I look at the DEBUG traces from the voice assistant and listen to the voice output, both seem to be handled cleanly via Microsoft Cognitive Services.
(FYI, I use High German, but that doesn't seem to be a problem.)
My rooms are clearly named (living room, foyer, bedroom, kitchen, etc.)
When I say "OK Nabu, turn on the light in the living room", one of several things happens:
Voice-PE doesn't understand, or
it says "light on", but the light doesn't actually turn on. In the VA trace log I can see that the command was understood, but no action was sent, or no target was found.
All 86 of my HomeMatic devices are exposed to Assist. Nevertheless, it does not work. I have even tried assigning aliases to the target devices, with no luck.
Question: Does the Voice-PE have to be assigned to the same area as the device to be controlled? In other words, if my Voice-PE is not assigned to the living room and I want to switch the light in the living room, will that work?
Or what if my Voice-PE is assigned to the living room and I want to open the front door, which is assigned to the foyer?
So does anyone have an idea why HA does not turn on the device (the light in the living room)? When I toggle the switch on the HA dashboard or in the app, the light switches on properly, so it is not a fundamental problem with the HomeMatic ↔ Home Assistant integration.
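For anyone who wants to reproduce that check outside the dashboard, a minimal action call from Developer Tools would look something like this (sketch only; light.wohnzimmer is a placeholder entity ID, not my real one):

```yaml
# Developer Tools > Actions - sketch with a placeholder entity ID.
# If this turns the light on, the HomeMatic entity itself is fine and the
# problem lies in the Assist matching, not in the integration.
service: light.turn_on
target:
  entity_id: light.wohnzimmer
```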
Thanks for any advice
Oliver
PS. I have attached the VA Config and the VA Debug Trace
Do you have any other devices in the living room with the word 'light' in their name, such as a sensor or plug? I had the same issue as you: the voice assistant was trying to turn on another entity, such as the plug socket.
What I've taken to doing now is adding an alias to the device for any problem commands. This seems to have cleared up most missed commands. You can add them here.
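If UI aliases alone don't cut it, custom sentences are another option. A minimal sketch, assuming the standard config/custom_sentences layout and German as the assistant language (the file name and phrases are just examples I made up):

```yaml
# config/custom_sentences/de/lichter.yaml - sketch only, adjust as needed.
# Adds extra German trigger phrases for the built-in HassTurnOn intent;
# {name} matches the names/aliases of entities exposed to Assist.
language: "de"
intents:
  HassTurnOn:
    data:
      - sentences:
          - "{name} einschalten"
          - "mach {name} an"
```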
You say MS Azure Cognitive Services, but you don't say what integration or model you're using. That is very important. The model needs to be strong enough to understand (something like a Llama 3.1 7B or better is the lightest model I've ever been able to use successfully), and the model you choose MUST support tool use. And then finally, the integration you're using must support Assist control.
All three must be true, and that's before we even get to telling it how to control things. So how did you set up Azure?
The reason I ask is:
hallucinating about whether the light is even on says your model choice probably isn't adequate, and
not being able to control anything is usually the hallmark of a model that can't do tool use.
(BTW, I'm an architect-level Azure guy and I don't use Azure Cognitive Services yet, because it's very complex: in HA you have to conquer the LLM setup first and then add Azure on top. That was very much something I had planned on tackling later this year, if it's even possible.)
The other debug tool to try is UI → Developer Tools → Assist.
Pick the German language, type in "licht im Wohnzimmer einschalten" ("turn on the light in the living room", the German sentence shown in the attached Voice Assist debug), and see which entity was picked.
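You can also push the same sentence through the conversation agent as an action call (a sketch; on older HA versions the key is service: instead of action:):

```yaml
# Developer Tools > Actions - sends the sentence to the conversation agent
# directly; the response shows which entities were matched.
action: conversation.process
data:
  text: "licht im Wohnzimmer einschalten"
  language: "de"
```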
These are interesting facts. Will check that and come back to you.
I followed this article to set up Azure Cognitive Services, and I figured that if someone has explicitly done this integration and written a tutorial for voice assist, it should work and be tested.
It selected my experimental ESP32 touch display as one of the targets, even though it isn't even turned on. I will try to exclude these from being exposed to voice assist.
Maybe the better approach is not to expose all 86 devices to Assist, but only the ones you actually want it to control.
Yeah, you're right - I read that afterwards. Short version: that step only uses sentence patterns to match the action, and this device is only 'the ears and voice'.
It also means that, for troubleshooting purposes, we can completely eliminate that entire section, including Azure (I think, since the Azure STT seems to be working better than the built-in one here). It simply comes down to: did the sentence match? - No.
Yeah the Azure Cognitive Services seem to do a good job.
But as MCHK mentioned, I used the built-in conversation agent, which does not seem to be the best one (on a Pi 5 with 8 GB).
I switched to "Google Generative AI" with mixed results. Sometimes speaking English works better than German, and sometimes when I speak German it answers in English.
I mean, I have a powerful VMware ESXi infrastructure (a DL380 with 36 CPU cores at 3.0 GHz = 72 logical processors and 768 GB of RAM).
I could install Ollama as the LLM, but the hardware has no GPU boards, and from what I have read on the Internet, without GPUs you shouldn't even try Ollama.
So what is your best suggestion for a free LLM that does the best job with HA?
In the meantime I will try with German and English Aliases for devices and areas.
And I will speak with a mix of English/German.
Question: Would Home Assistant Cloud from Nabu Casa give a better voice recognition and device matching experience?
You need an LLM that supports tool use.
If you roll your own, you need at least a 7B model (Llama 3.1 or better), and preferably something 12B or better that has tool use, like Mistral or Mixtral.
For the publicly available models, use one that has a supported integration; any of the supported ones are fine. That puts you with something like ChatGPT (currently 4o mini) or Google Gemini.
Your success with any of those will be a function of your ability to describe context to the LLM, not of the LLM itself.
Nabu does not provide an LLM. They do have a speech to text and text to speech provider. Both work fine for English speakers. Not sure about other languages and dialects.
As for free LLMs: they exist. But remember, if you're not paying, then everything you are sending is consumed for training purposes.
For me this includes locations, when people are in and out of bed, how often I clean which room, and on and on…
For my use that’s not acceptable, and I’m not interested in sharing that so I pay for my LLM and turn off the sharing options.
Your other choice is to run your LLMs locally with Ollama.