Is there currently any possibility to use all three components on an esp32/esp32-s3?
The problem is that, as far as I understand, wake word detection only works when the framework is set to esp-idf, which in turn does not support any media_player component. Is there a workaround, or is this expected to work in the future?
The only issue that I know of (and that is likely to be fixed in ESPHome 2023.11) is that you don’t receive the “reply” from HA under some conditions if you use the media_player component rather than the speaker one.
Now, tbh, using French with local STT (haven’t tried NC) is not really usable in practice (slow, poor recognition), so I’m not actually using it beyond basic testing.
I only tested German and it was, well, horrible too. But when I switched to Home Assistant cloud for the audio to text processing it was 1000x better, so for now I am using that.
As far as the platform goes, I saw that in a YouTube video on how to set up voice assistant (and it was explicitly mentioned - although I could not find it anywhere else).
ESP-IDF is also the framework that is preselected in the example code.
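For anyone wondering where that setting lives: a minimal sketch of the framework selection in an ESPHome device config. The board name is only an assumption (picked to match the ESP32-S3 from the question), so adjust it to your hardware.

```yaml
# Sketch only: the board value is an assumption, not taken from the example code.
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf   # the alternative value is "arduino"
```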
In the meantime, I did something similar to koying (workaround n 'round…), but I bypass the voice_assistant>media_player config entirely. I use an HA script to process the text-to-be-spoken (called in on_tts_start) which outputs it back to the ESPHome media_player via tts.speak; very convenient to add a bit of “flair” to every response anyway (and “simple” TTS queueing support)!
The voice assistant pipeline is still wonky with custom device builds (periodic errors, state freezes… don’t know if it’s better with the “featured devices”…), but other than that, this workaround works very well
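A rough sketch of what the workaround described above could look like, not the poster's actual config. All names here (`mic`, `script.satellite_speak`, `tts.piper`, `media_player.voice_satellite`) are hypothetical placeholders. It assumes the device is built with the Arduino framework so that media_player is available, and that the device is allowed to make Home Assistant service calls in the ESPHome integration options.

On the ESPHome side, the text to be spoken (available as `x` in `on_tts_start`) is handed to an HA script instead of being played through the voice_assistant's own media_player setting:

```yaml
# ESPHome device config (sketch; "mic" and the script name are placeholders)
voice_assistant:
  microphone: mic
  on_tts_start:
    - homeassistant.service:
        service: script.satellite_speak
        data:
          message: !lambda 'return x;'   # x holds the response text
```

On the Home Assistant side, the script sends the text back to the device's media_player via tts.speak:

```yaml
# Home Assistant scripts.yaml (sketch; entity IDs are placeholders)
satellite_speak:
  fields:
    message:
      description: Text to be spoken
  sequence:
    - service: tts.speak
      target:
        entity_id: tts.piper
      data:
        media_player_entity_id: media_player.voice_satellite
        message: "{{ message }}"
```

Because every response passes through the script, that is also the place to add extra steps (a chime, a prefix, simple queueing) before the tts.speak call.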
I finally got the voice assistant running with media_player on esp-idf. I needed to implement a custom component for this; you can find it here: github. The component still needs some improvements, but I would appreciate it if someone were willing to test it on different hardware than mine. Right now it is running on an ESP32-S3-DevKitC-1. Any feedback is welcome.
Same issue here…
I built my voice satellite using the template provided in a how-to video.
Helpfully, this template explains why ESP-IDF is really needed:
“This is important. ESPHome supports two frameworks: Arduino and ESP-IDF. ESP-IDF is needed to include an audio library called ESP_ADF used in our voice assistant”
media_player: is only available with the Arduino framework, which makes it impossible for our voice satellites to work like the smart speakers from Google Home, Amazon Alexa, or Apple HomeKit.
Therefore I’m still unable to replace the Google satellites around my home.
To meet the needs of, I guess, every household, the satellites need to:
Be a voice satellite (this is already accomplished)
Be able to play audio (so media_player: support on ESP-IDF is needed)
Be able to receive cast audio (without a hassle)
I’m looking forward to what the future will bring. As soon as all of this is possible, HA/ESPHome will be a real competitor to those three, and probably the only truly secure smart speaker… oh gosh, I’m looking forward to it.
What could be next: HA smartphones without any spy-stuff on them that really integrate with HA itself?