Need some help here. I’m trying to play audio returned from a conversation agent on my Sonos speaker. I can see in the logs that the wake word is detected, STT completes, and the OpenAI response is returned. But I can’t for the life of me figure out how to get it to play through another audio device.
Here is the outline:
I have an HA Green with Home Assistant Cloud.
- A voice satellite: a USB microphone, plugged into the HA Green, to record and stream voice to HA.
- A wake word engine using wyoming-satellite and assist-microphone.
- An STT engine (Whisper) to transcribe audio to text.
- A conversation agent (OpenAI integration) to act on what is said and formulate a text response.
- A TTS engine (Piper) to transform the response text into an audio signal.
The last step is where I’m banging my head against a wall:
- Play the voice satellite’s audio response through a Wi-Fi-connected Sonos speaker.
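The outline above is a chain of stages in which only the last hop (where the audio ends up) is fixed by the satellite. As an illustration only, here is a minimal sketch in which the output is a pluggable sink; the stage functions are hypothetical stand-ins, not Home Assistant's pipeline API, and the wake word stage is assumed to have already fired:

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical stand-ins for each stage; none of these are Home Assistant APIs.
@dataclass
class VoicePipeline:
    stt: Callable[[bytes], str]      # e.g. Whisper: recorded audio -> transcript
    agent: Callable[[str], str]      # e.g. OpenAI: transcript -> response text
    tts: Callable[[str], bytes]      # e.g. Piper: response text -> audio
    output: Callable[[bytes], None]  # the sink: USB speaker today, Sonos wanted

    def handle(self, recorded_audio: bytes) -> str:
        """Run one utterance (after the wake word fired) through every stage."""
        transcript = self.stt(recorded_audio)
        reply = self.agent(transcript)
        self.output(self.tts(reply))  # only this call decides where audio plays
        return reply
```

With fake stages, e.g. `output=played.append`, you can check that the TTS bytes reach whatever sink you pass in; swapping that one callable is the piece the Assist Microphone add-on doesn't expose.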
Anyone have any insights here?
For context, I had the M5 Echo working well and playing through the Sonos, but the mic quality on the M5 wasn’t good enough (among other issues with it). So I’m trying to replace the M5 with a USB microphone.
I don’t believe this is possible, at least not with the Assist Microphone add-on, which I assume is what you’re currently using with the microphone plugged directly into the HA Green. I use it and it works great, with one of those round speakerphones, about 50 bucks on Amazon when I bought it a year or so ago. It’s the best voice assistant in the house. My Espressif Korvo-1 is second, but it still doesn’t handle background noise well. Newer models are coming out with XMOS chips for echo cancellation and voice isolation, which I believe are the same ones Google and Amazon use, so things will get better on that end, but it’s going to take some time.
You could create a sentence/response automation and have it play through your wireless Sonos speaker. First, verify that works: go to Developer Tools, then Services, and choose TTS (text-to-speech say). Choose your Sonos from the available entities, type something for it to say, and see if it plays. It should; it works great on my Sonos soundbar. If it works, create an automation, use a sentence as the trigger, have it do whatever actions you need, then add the TTS service as the last action so it replies. It’s kind of a headache, but it’s the only way I can think of. Maybe someone smarter than me knows another way.
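The Developer Tools test above can also be scripted against Home Assistant’s REST API, which is handy from a standalone Python program. A minimal sketch, assuming a long-lived access token and placeholder host and entity IDs (`tts.piper`, `media_player.living_room_sonos`); it targets the `tts.speak` service, whereas the older per-platform `..._say` services (such as `tts.cloud_say`) take the media player as `entity_id` instead:

```python
import json
import urllib.request

# Placeholders (assumptions): your HA address, a long-lived access token
# (created on your HA user profile page), and your own entity IDs.
HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"


def build_tts_request(message: str,
                      tts_entity: str = "tts.piper",
                      media_player: str = "media_player.living_room_sonos",
                      ) -> urllib.request.Request:
    """Build a REST call to the tts.speak service, played on a media player."""
    body = {
        "entity_id": tts_entity,                 # the TTS engine that renders speech
        "media_player_entity_id": media_player,  # the speaker that plays it
        "message": message,
    }
    return urllib.request.Request(
        f"{HA_URL}/api/services/tts/speak",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {TOKEN}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is then `urllib.request.urlopen(build_tts_request("Hello from Piper"), timeout=10)`. The same service call is what the automation’s last action would run.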
The problem is that you really have no way to tell the add-on where to send audio beyond the two dropdown boxes in its configuration, and only USB devices show up there.
Thanks for the response. I appreciate it.
I too think it’s difficult, if not impossible, to use Assist Microphone this way. Not sure if there is anything else out there I can use, but I haven’t been able to find it.
My use case is an art gallery, and it needs the OpenAI Assistants API to work. I’ve scripted a Python program that works okay. The ReSpeaker mic array works well, and I’m able to play the response on the Sonos as a notification, broken into chunks to get around the 20-second Sonos notification time limit.
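Splitting by audio length is one way to stay under that 20-second limit. A minimal sketch with Python’s standard `wave` module, assuming the TTS output is a WAV (PCM) file; it cuts on frame counts, so a chunk can end mid-word, and splitting the response text at sentence boundaries before TTS is the gentler alternative:

```python
import io
import wave

MAX_SECONDS = 20  # the Sonos notification limit mentioned above


def split_wav(data: bytes, max_seconds: float = MAX_SECONDS) -> list[bytes]:
    """Split WAV bytes into consecutive WAV files of at most max_seconds each."""
    chunks = []
    with wave.open(io.BytesIO(data), "rb") as src:
        nchannels = src.getnchannels()
        sampwidth = src.getsampwidth()
        framerate = src.getframerate()
        frames_per_chunk = int(framerate * max_seconds)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # end of input
                break
            out = io.BytesIO()
            with wave.open(out, "wb") as dst:  # header is finalized on close
                dst.setnchannels(nchannels)
                dst.setsampwidth(sampwidth)
                dst.setframerate(framerate)
                dst.writeframes(frames)
            chunks.append(out.getvalue())
    return chunks
```

Each returned chunk is a complete WAV file, so they can be queued as separate notifications in order.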
I’m working on a custom wake word and silence detection now, as well as a goodbye word, so I can keep a conversation thread open without having to say the wake word every time.
I wish I could do this using the Home Assistant pipelines.
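One common, simple approach to silence detection (not necessarily the one used here) is an energy threshold: compute the RMS level of each short audio frame and stop recording once enough consecutive quiet frames follow some speech. A minimal sketch, assuming 16-bit signed little-endian mono PCM in fixed-size frames; the threshold of 500 and the 25-frame run (about 0.5 s with 20 ms frames) are guesses to tune for the room:

```python
import array
import math
import sys


def rms(frame: bytes) -> float:
    """RMS level of one frame of 16-bit signed, little-endian, mono PCM."""
    samples = array.array("h", frame)
    if sys.byteorder == "big":
        samples.byteswap()  # array uses native order; input is assumed little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def end_of_speech(frames, threshold: float = 500.0, silent_frames_needed: int = 25):
    """Return the index of the frame at which trailing silence is long enough
    to treat the utterance as finished, or None if that never happens.

    Silence before any speech is ignored, so a pause before talking
    doesn't end the recording early.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if rms(frame) >= threshold:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= silent_frames_needed:
                return i
    return None
```

A fixed threshold struggles with background noise like a TV; adaptive thresholds or a dedicated VAD model handle that better, at the cost of a dependency.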
No problem. I’ve tried a few different solutions, and Wyoming Satellite works best, with my Espressif Korvo-1 a close second. Well, third actually: Assist Microphone is the best, but it obviously has to be plugged in over USB, using a dedicated USB speakerphone. I never got to try or see an S3 Box in action because demand drove prices way too high for me, so I can’t say how it compares.
It’s possible to do something similar for repeated commands without saying the wake word. It just isn’t possible (to my knowledge) on Wyoming Satellite or Assist Microphone, only on the S3 Box or another device using microWakeWord. It was put in as a feature request, but I think it was pushed back or not accepted. I was able to get it working on the Korvo-1: it listened for 5 seconds after replying to a command before turning off, and had a switch to turn it on/off. The feature was (or is) called continued conversation.
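The behaviour described above (a 5-second follow-up window after each reply, an on/off switch) combines naturally with the goodbye word mentioned earlier in the thread. A hypothetical sketch of that logic for a standalone Python assistant; it is not the ESPHome implementation, and the class and method names are illustrative:

```python
import re
import time
from typing import Callable, Iterable, Optional


class FollowUpWindow:
    """Hypothetical 'continued conversation' logic: after each reply, speech is
    accepted without the wake word for window_s seconds. A goodbye word closes
    the window early, and `enabled` acts as the on/off switch."""

    def __init__(self, window_s: float = 5.0,
                 goodbye_words: Iterable[str] = ("goodbye",),
                 enabled: bool = True,
                 clock: Callable[[], float] = time.monotonic):
        self.window_s = window_s
        self.goodbye_words = {w.lower() for w in goodbye_words}
        self.enabled = enabled
        self.clock = clock  # injectable so the logic can be tested without waiting
        self._open_until: Optional[float] = None

    def reply_finished(self) -> None:
        """Call when the assistant finishes speaking; opens the window."""
        if self.enabled:
            self._open_until = self.clock() + self.window_s

    def is_open(self) -> bool:
        """True while speech should be handled without the wake word."""
        return (self.enabled and self._open_until is not None
                and self.clock() < self._open_until)

    def should_handle(self, transcript: str) -> bool:
        """False (and close the window) if the transcript contains a goodbye word."""
        words = set(re.findall(r"[a-z']+", transcript.lower()))
        if words & self.goodbye_words:
            self._open_until = None
            return False
        return True
```

Keeping the window open is also what lets a single OpenAI Assistants thread carry context across turns, since the thread only needs to be reset when the window closes.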
I did decide to go ahead and get a Seeed ReSpeaker Lite, which has an XMOS chip. I’ll let you know how it goes. Honestly, it’s probably best to wait for Nabu to release their model, because it will just work out of the box. You’ll still be able to make adjustments, but it will be supported, and I imagine it will be their main testing device once it’s out.
I also wish the ESP32-P4 hadn’t been delayed by a year. Now that the first dev board is out, it looks like a major performance boost. It has no Wi-Fi/BT, though, just GPIO pins for Ethernet; the dev board uses a C6 for radio capabilities. That, and I believe the ESPHome team has “fun” getting the S3 to work, so it could be quite some time before the P4 is available in ESPHome, if it ever is. It also generates all encryption keys itself and won’t show them as plain text to software. Not sure how that would affect creating API keys. See the official specs. It would also probably have to be updated via USB/serial every time you want to update it.
I’m hoping XMOS chips handling noise/echo cancellation will do the trick. That’s my main issue now: background noise, particularly TV or almost anything decently loud, ruins the experience. My Korvo-1 works great when it’s quiet; Wyoming Satellite handles noise better but is far from perfect. I was shocked at how good the camera looked in the P4 demo linked above, considering both the display and the camera were hooked up to the P4.
Regarding the AI side, I know Nabu is working with Nvidia on a custom, completely local LLM for Home Assistant. Nvidia approached Nabu because so many people at Nvidia use HA. I know they have been porting things to run on GPUs using Nvidia’s Jetson models. It’s mostly over my head, but there is a thread on the Nvidia developer forums.
Regardless, it’s going to take time and effort to reach Google/Amazon-level reliability, but I have faith in Nabu Casa pulling it off. It just won’t happen as fast as everyone wants; it will get progressively better over time through both hardware and software updates. Like you, I want it all working perfectly now, but there isn’t a “perfect” solution at the moment, especially not one for everyone.
Just wanted to add that USB speakerphones have excellent sound playback quality. My understanding from research is that the speakerphone’s DSP and other built-in technology get used. The reason it’s still not as good as, say, Google is the cloud resources those companies use for individual voice isolation. A speakerphone works great when loud noises are in the background, like music with no lyrics, but when you are watching TV it just isn’t great at separating your voice from the voices on the TV, which is where the big-name brands still win, at least for now.
If your main purpose is music playback, they make some pretty decent USB speakers; the obvious downside is having to spend money. I haven’t looked on GitHub, but I’m assuming someone has already put this in as a feature request. It doesn’t seem like it would be hard to add to Assist Microphone, but just because something seems easy doesn’t necessarily mean it is easy to implement.
Quick demo of audio playback from the one I’m using below. All the buttons work, so I can mute the microphone and adjust the volume on the device with no extra steps needed. If this seems like a solution you might explore, don’t buy the Bluetooth/USB combo ones with batteries. I originally bought one, and I believe the continuous streaming and BT being on caused the battery to drain, and it didn’t detect that it was plugged in to charge. I would have to unplug it, then charge it before using it again. Get a USB-only one instead; a few have two USB ports, one being a dedicated power port, so this wouldn’t be an issue in that scenario. They also tend to have higher-wattage speakers.
Thanks mate! Appreciate all the helpful tips and insight into the development process. Hopefully Nabu comes out with something soon.
I ended up coding an assistant in Python and bypassing Home Assistant altogether. Latency is the main issue still nagging me. I’m not using it to control any devices, so it works for my use case.
I started with a ReSpeaker 4-mic HAT array, but the microphone wasn’t strong enough and I had to be very close to it for it to work. I still plan to use it for something else, so it wasn’t a total waste. I do like the LED capabilities it comes with.
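Since latency is the sticking point, a cheap first step is measuring each stage (recording, STT, the Assistants API call, TTS, playback) to see where the time actually goes. A minimal, stdlib-only sketch; the stage names are whatever you choose:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(stage: str, timings: Dict[str, float]) -> Iterator[None]:
    """Record how long the enclosed block took, in seconds, under `stage`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # recorded even if the stage raises, so failures still show their cost
        timings[stage] = time.perf_counter() - start
```

Usage looks like `with timed("stt", timings): text = transcribe(audio)`, where `transcribe` stands in for your own STT call; printing `timings` after each turn shows which stage dominates.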
Can you write it to a local file? If you can, you can play that from Sonos.
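One way to do that: Home Assistant serves files from its `config/www` folder at `/local/`, and a Sonos can play an HTTP URL through the `media_player.play_media` service. A minimal sketch, assuming the script can write into `config/www` and that the speaker can reach HA at a LAN address (placeholders below). Note that files under `/local/` are served without authentication:

```python
import json
import uuid
from pathlib import Path

# Placeholders (assumptions): Home Assistant serves files from config/www at
# /local/, and the Sonos can reach HA at this LAN address (not "localhost",
# since the speaker fetches the file itself).
HA_URL = "http://192.168.1.10:8123"
WWW_ROOT = Path("/config/www")


def publish_audio(audio: bytes, www_root: Path = WWW_ROOT,
                  subdir: str = "tts", suffix: str = ".mp3") -> str:
    """Write audio under www_root/subdir and return the URL HA serves it at."""
    target = www_root / subdir
    target.mkdir(parents=True, exist_ok=True)
    name = f"{uuid.uuid4().hex}{suffix}"  # unique name per response, no overwrites
    (target / name).write_bytes(audio)
    return f"{HA_URL}/local/{subdir}/{name}"


def play_media_body(url: str,
                    media_player: str = "media_player.living_room_sonos") -> str:
    """JSON body for POST /api/services/media_player/play_media."""
    return json.dumps({
        "entity_id": media_player,
        "media_content_id": url,
        "media_content_type": "music",
    })
```

The body is sent to `/api/services/media_player/play_media` with the same bearer-token header as any other REST service call.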