Using Alexa as a frontend for HA Assist rather than the new hardware HA Voice PE

The new HA Voice PE seems promising (though immature) but I already have about a half dozen Alexas in my house.

I currently do “rudimentary” Alexa voice response using the Alexa skill “My Questions” for limited, pre-programmed speech-to-text, followed by some kludgey (and messy) Node-RED code.

I would like to be able to use my Alexas as a full speech-to-text frontend to the HA Assist engine - presumably by using/creating a skill that does the speech-to-text and forwards it on to HA.
Then, using an LLM (either HA’s or an external one), I can use HA Assist to translate the text into actions.

  • Is the above possible?
  • Has anyone done the above?
  • Any pointers on how to do this?

The alternative of having a dual network of Alexa and HA Voice PE devices scattered across the house is not going to appeal to my SA :slight_smile:

Not the way you want to.

You will be able to have a skill that Alexa passes control to. You will not be able to have your Alexa devices answer as if they are native HA Voice devices.

It will require knowledge of AWS and skill building in the Amazon infrastructure.

Read: lots of work for something you probably won’t like…
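To make the "skill that Alexa passes to" part concrete, here is a minimal sketch of what the Lambda entry point for such a custom skill could look like. This is an assumption-laden illustration, not anyone's working code: it assumes one intent with an AMAZON.SearchQuery catch-all slot named `utterance` that swallows everything after "Alexa, ask assist ...", and the forwarding step (`ask_assist`) is left as an injected callable so the Alexa envelope logic stands alone.

```python
# Hypothetical AWS Lambda handler for a custom "assist" skill.
# Assumes a single intent with an AMAZON.SearchQuery slot named "utterance".

def build_response(text, end_session=True):
    """Wrap plain text in the standard Alexa skill response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }

def lambda_handler(event, context, ask_assist=None):
    """Extract the spoken phrase and hand it to whatever answers it.

    `ask_assist` stands in for the forwarding step (e.g. a call into
    Home Assistant); it is injected here so the envelope logic is testable.
    """
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # "Alexa, open assist" with no phrase: keep the session open.
        return build_response("What would you like?", end_session=False)
    phrase = request["intent"]["slots"]["utterance"]["value"]
    answer = ask_assist(phrase) if ask_assist else "Done."
    return build_response(answer)
```

The real work (interaction model, skill registration, linking to your HA instance) happens in the Amazon developer console, which is the "lots of work" part above.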

Hi @puterboy

Take a look at this


That looks promising!

Previously, I was able to get Alexa to both accept input (e.g., “Alexa, bedroom temperature”) and answer (e.g., “67 degrees”), but using my very manual, kludge method.
Basically, in Node-RED I am able to listen to Alexa, parse the phrase, carry out the corresponding action, and then pass an answer back to the same (or any) Alexa device that triggered it.
Diagrammatically:

[Alexa Speech-to-text] -> [Alexa "My Questions" Skill]
-> [Node Red: Alexa Event node - On Device Activity]
-> [Node Red: Switch node - Filter (one per expected phrase)]
-> [Node-Red: Current State node (one per filter output)]
-> [Node Red: Alexa Routine Node - Speak]

If possible, I would like to do the same with more general speech-to-text.
e.g.

[Alexa Speech-to-text] -> [Alexa Skill] 
-> [HA Assist] -> [HA Action]
-> [Alexa Media Output]

Are you saying that you can’t get the final link back to Alexa Media to work?
Could one do something like:

[Alexa Speech-to-text] -> [Alexa "Custom General Speech-to-Text" Skill]
-> [Node Red: Alexa Event node - On Device Activity]
-> [Node Red: "Pass to HA Assist & Capture Response"] (does such a concept exist?)
-> [Node Red: Alexa Routine Node - Speak]

I am OK with it taking a lot of work as I have done a fair bit of hacking to date in HA and its databases.

No. What I’m saying is that you won’t be able to intercept general STT without a secondary skill call.

You CAN do ‘(wake word), ask (skill) to …’ and pass that entire payload to whatever is handling it for Assist, then respond through something like AMP (Alexa Media Player).

‘Alexa, ask assist to turn on the light’
‘Alexa, ask assist to do this weird thing with the LLM’

You CANNOT intercept the basic ‘(hey) Alexa’ speech pipeline before the call to the secondary skill and react to something like ‘Alexa, turn on the light’ - that will ALWAYS be handled by Alexa. That’s hard coded on the Amazon side and won’t ever change.

That part in my home is a non-starter and would make it inherently not worth all the effort to have a ‘kinda’ solution.

Very helpful.
Sorry for being “slow”, but are there any other limitations on this approach other than:

  1. You need to add the secondary skill words “ask assistant” to the primary “Alexa” wake word
  2. You need to build and deploy your own custom skill
  3. The general “suckiness” of Alexa STT voice recognition

The Alexa Media Player integration is… Less than reliable… (for me it works about 50% of the time) So you need to account for that or look up an alternative.

For info on that read all the Alexa Media Player threads

(short version: Amazon has zero incentive to make this easy - they MUST monetize Alexa for the business unit to continue to exist. They want you staying in their walled garden. So in essence you’re getting into a cat-and-mouse game trying to stay ahead of them as they continue to change things and make it hard to interface. If you want to interface in a way they want you to, great. I strongly suspect it will become harder in the future as they add paid Alexa to the mix, and that goes a long way to my decision… I may solve it tomorrow only for it to not work again in May… Therefore my Amazon devices now have a limited lease on life. As soon as voice response on my own HA-driven voice devices is reliable enough for family use - close, still need the ability for a custom wake word and access to the onboard timer - the Amazon devices get replaced and e-wasted. I’m personally not interested in fighting with a multinational over what they allow with the device.)

  1. You need to add the secondary skill words “ask assistant” to the primary “Alexa” wake word

You don’t necessarily need to “ask” the skill each time to perform such an action; you can open the skill and give your commands from there. With this approach, I created automations in Home Assistant that open the skill based on presence in a certain room: just by walking past, the skill starts and I give the command. Furthermore, saying “Alexa, smart home” is not so complex, because after that, if you are using AI in Assist, you can issue several commands at once, for example: turn off all the lights in the house, turn on the TV in the living room, and tell me which doors are open?
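The "open the skill from an automation" trick described above can be done through the Alexa Media Player integration, whose `media_player.play_media` service accepts a `skill` content type with the skill's Amazon ID. A sketch of what the automation's action boils down to, here as a direct REST call (URL, token, entity ID, and skill ID are all placeholders):

```python
import json
import urllib.request

def skill_launch_payload(entity_id, skill_id):
    """Service data for media_player.play_media that opens an Alexa skill.

    media_content_type "skill" is supported by the community Alexa Media
    Player integration; skill_id is the skill's Amazon ID (placeholder here).
    """
    return {
        "entity_id": entity_id,
        "media_content_type": "skill",
        "media_content_id": skill_id,
    }

def launch_skill(entity_id, skill_id,
                 ha_url="http://homeassistant.local:8123",
                 token="LONG_LIVED_TOKEN"):
    """Call HA's REST API to open the skill on one Echo device."""
    req = urllib.request.Request(
        f"{ha_url}/api/services/media_player/play_media",
        data=json.dumps(skill_launch_payload(entity_id, skill_id)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

In a real automation you would call the same service directly from HA (e.g. triggered by a presence sensor) rather than over REST.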

  2. You need to build and deploy your own custom skill
  3. The general “suckiness” of Alexa STT voice recognition

With the “Alexa-hosted” skill model, you do this task in about 2 or 3 minutes (one time) and it’s done! No pain…