I just spent 1.5 million Fable 5 tokens on this, with over an hour of our dear Claude thinking.
Now, I know we dislike LLM vibe-coded stuff here, so if you simply don't want to engage with this type of generative solution, feel free to move along.
But if you want the goods, and I mean the good goods:
- Full Audio Duplex, which comes with:
  - A 'stop' halt-word to stop an ongoing Text-To-Speech answer
  - A 'Barge-in' mechanism, where repeating 'Okay-Nabu' erases what you said and starts a new listening session
  - VAD (Voice Activity Detection) to avoid those pesky false positives
  - Real-time AEC (Acoustic Echo Cancellation)
- Gain control for the Microphone and Speaker
  - Allowing for a louder voice (I recommend a gain of 4)
  - Allowing for a more sensitive microphone (Guys, I can whisper 'okay nabu' and it picks it up!)
Then head to the YAML file on my repo to grab the full code.
And be sure to install the ESPHome-Intercom components locally if you pick a version other than the one linked (the one ending in _cloud).
The two other versions in the same folder use a local cache of the ESPHome-Intercom components, along with the little demon images you see in the video.
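For anyone going the local route, here is a rough sketch of how ESPHome can be pointed at components stored on disk through `external_components`. The folder name `my_components` is a placeholder for illustration only, not the path the YAML files in the repo actually use, so check those files for the real one.

```yaml
# Sketch only: load external components from a local folder.
# "my_components" is a hypothetical folder name; point it at wherever
# you copied the ESPHome-Intercom components.
external_components:
  - source:
      type: local
      path: my_components
```

With a `local` source, ESPHome reads the components from disk at compile time instead of fetching them from a remote repository.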
Here is a video for those wondering how it works (I apologize for my slow local inference time!)
As for results?
It just works.
I have been using a previous version of this code for over 2 months now, with no issues.
Credit where it's due: this is all possible thanks to n-IA-hane, who made the ESPHome-Intercom component!