I’m the worst at Voice Assistants

I’ve literally been trying for weeks to get a voice assistant running. I wanted to build one for the experience and the customisable nature, and because it should be cheaper than buying pre-made, although ironically I’ve spent way more on parts trying to get stuff to work.
I want both a push-to-talk (PTT) assistant and a wake word assistant.

Boards I’ve tried:
ESP32D Wroom
ESP32 S3 Mini
Seeed Studio ESP32S3Plus

I’ve mainly used MAX98357A amplifiers, with INMP441 and ICS-43434 microphones.

I’ve tried copying and editing code from ESPHome and GitHub.
I’ve spent literal days with ChatGPT, Claude, and DeepSeek trying to get to the bottom of things.
I even bought an M5Stack Atom Echo (albeit the old Pico version and not the S3), and I’ll come to that in a moment.

I’d post the code I’ve tried, but honestly I must have tried over 30 versions at this point and, as I said, spent days debugging with AI, as well as following suggestions to buy more stuff to try.

My confusion arises from a myriad of people online seemingly making voice assistants with boards that don’t have the PSRAM of the Seeed S3 Plus, and PTT versions with a little C3 (or C6, I can’t remember). Even following along with code on GitHub from a chap called ‘WarLion’ got me nowhere: it compiles etc. but nothing ever happens.

I’m logical, so I reasoned it must be the hardware. I got more hardware and still couldn’t make it work, so it must be my soldering. Cool, I’ll breadboard everything. Then it must be my pin choices, so I researched data sheets and quizzed AI after getting it to look at the data sheets too.

Still absolutely no luck.

Even last night I was trying to get the aforementioned WarLion code to work with Claude, and in that particular case the audio isn’t getting to the speaker (yes, I have pipelines set up - I even set up a Home Assistant Cloud pipeline, which then managed to crash my system and stop HA working for an hour). Claude decided that the power coming from the 5V pin of the Seeed S3 wasn’t enough for the MAX98357A and that I should try an external 5V supply feeding the amplifier directly as well as the ESP. Again though, I’ve literally never seen anybody doing that. Heck, AI tells me that ESP boards can’t run any more than one LED, but that seems just wrong.
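A rough way to sanity-check that power suggestion is to estimate the amplifier's supply current at a few volume levels and compare it with what a USB port can deliver. The sketch below is only that, a sketch: it assumes the board's 5V pin is fed straight from a USB 2.0 port (500 mA), a round 90% efficiency for the class-D amplifier, a rough 0.35 A peak for the ESP32-S3 while transmitting over Wi-Fi, and the commonly quoted 3.2 W maximum into a 4 ohm speaker at 5V. All of those figures, and the function names, are illustrative and should be checked against the data sheets for your own parts.

```python
# Rough power budget for a MAX98357A run from a board's 5V pin.
# Every number below is an assumption to verify against your own hardware.

SUPPLY_V = 5.0          # assumed: 5V pin fed straight from USB
AMP_EFFICIENCY = 0.90   # assumed: round figure for a class-D amplifier
USB2_BUDGET_A = 0.5     # USB 2.0 port current limit
ESP32_PEAK_A = 0.35     # assumed: rough ESP32-S3 peak draw during Wi-Fi transmit


def supply_current_a(output_power_w, supply_v=SUPPLY_V, efficiency=AMP_EFFICIENCY):
    """Supply current (A) the amplifier needs to deliver output_power_w to the speaker."""
    if output_power_w < 0 or supply_v <= 0 or not 0 < efficiency <= 1:
        raise ValueError("power must be >= 0, voltage > 0, efficiency in (0, 1]")
    return output_power_w / (efficiency * supply_v)


def fits_usb_budget(output_power_w):
    """True if amplifier plus ESP32 peak draw stays inside the assumed USB budget."""
    return supply_current_a(output_power_w) + ESP32_PEAK_A <= USB2_BUDGET_A


# Hypothetical volume levels, in watts delivered to the speaker.
for label, watts in [("quiet speech", 0.25), ("loud playback", 1.0), ("rated maximum", 3.2)]:
    amp_a = supply_current_a(watts)
    verdict = "fits" if fits_usb_budget(watts) else "exceeds"
    print(f"{label}: amp {amp_a:.2f} A + ESP32 {ESP32_PEAK_A:.2f} A {verdict} {USB2_BUDGET_A} A")
```

With these assumed numbers, quiet playback fits inside a 500 mA port while sustained loud playback does not, so a separate 5V supply is at least a plausible thing to try. It does not prove that power is the reason no audio reaches the speaker.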

Going back to the Echo: I bought it and it worked. Then I tried to add an external amplifier and speaker, broke the code, and after spending days on it with AI, it has never worked properly since.

I’m not going to give up, but in a world where people seem to have made voice assistants with loads of different types of ESP boards, and not just the one particular full-size S3 dev kit or ReSpeaker, what can I do to be better and actually have a chance of making something that works?

I have 6 voice assistants using this code, which has not been modified for at least 4 months. Each is built with an ESP32-S3 dev board (N16R8), MAX98357, INMP441, BH1750 and BMP280.

Show us what hardware you are using and the code you have and we will look it over.

I tried an Atom in the early days of voice and was quickly hit with the fact that the platform isn’t done. It’s good, but not done. You’re still fighting growing pains. I’m not an electrical engineer, and the platform is still moving too fast for me to commit to building my own, so I went with the VPE (Home Assistant Voice Preview Edition).

Yes, the speaker is underpowered and yes, the mics can give false positives, but I’m not fighting with hardware AND software at the same time. One battle at a time.

When it stabilizes (I suspect in another year), I may consider building one based on that and upgrading the components that need it.

So I’d say you’re not the worst, but you may have jumped in on a big build early. No way in hell I’m trying this build right now. Too much is changing.

Obviously I have no idea what I’m doing, but isn’t the media player unnecessary these days? I thought HA incorporated that somehow

If you want to hear sound from your device it will need a media player.

I assume you have read the docs to at least get an idea?

It is not simple code, but there are plenty of examples. Just be aware that if the code was written more than 6 months ago, it will not work unless you compile it with a 6-month-old version of ESPHome. Also, for decent results you will need PSRAM, plus decent hardware for running HA if you don’t use the Nabu Casa cloud.

The code is changing again now to incorporate the new Sendspin protocol, which just adds to the fun.

Or… get a Nabu Casa account and link Alexa and/or Google Home :slight_smile:

If you want performance like Alexa or Google, you would be far better off getting Alexa or Google, that we can be sure of.

Also, I very rarely use voice, as it is just a bit flaky even using Google or Alexa. But when I do, my setup works fine. I would not put an ESPHome-based voice box where it can hear the telly though, that is just asking for trouble.

My voice control here in the office is 8 feet from my speaker and rarely has any false positives. False positives do happen though.

I have read as much as I can. As a non-coder it’s incredibly difficult when researching, because the official documentation assumes you know a certain amount, which I don’t, although I am trying to learn.
So then I bring in AI or website research and then things get really contradictory - a prime example is my media player question. I am convinced that I’ve read that a media player isn’t necessary with I2S audio, and when asking Claude it agreed.

So as a newbie that’s trying to learn, it’s extremely difficult to know ‘how’ to actually learn. As per my original post, I’ve tried all the things I would think I should try to get a result: multiple products, multiple versions of the code, and more.

Definitely seems like a gap in the YouTube market, explaining this stuff to idiots like me to help us learn.

This is me giving Claude all the information again and looking for confirmation, and this is why I’ve been confused

LLMs WILL confidently lie to you unless you have both specifically grounded them in your subject and given them clear directives not to make content up.

Ask for a verifiable source for each claim as proof and chase the links, else consider it garbage. This goes for every. single. model, especially with things regarding Home Assistant or ESPHome. The model you’re using was trained last year on bad data. :wink:

Oh, I spent quite a bit of time creating a framework: giving it data sheets, websites and specific instructions to remove guesswork, only use verifiable info, etc.

It doesn’t help that YAML is more of an organisational tool, so learning Python may help, and C++ as well. But when you have no interest in becoming a programmer beyond making things work in ESPHome, finding the info is tough, as is finding the resources to learn from.

Most of my issues now come from the fact that you can’t have two voice devices wake at the same time: neither responds. I can be next to one and 15 feet (5 meters) from the other and both wake. It’s my current pain.

Wake detection is actually great.

Speech-to-Phrase STT gives not an inch. If the TV is on in the background, it picks up one word from that and basically finds no match. Or the input isn’t loud enough, so again no match.

Whisper is good but may include the TV as part of the command.

Nabu Casa is best. Don’t know why.

Atom Echo: terrible, but I recently heard that if you swap the speaker for a larger, better one it is good. I plan to test this this week. For the price, the mics are OK.

Atom Echo S3: cheap. I’ve heard the speaker is 3x louder.

S3 Box: meh, but it works. Audio is poor, but it’s cheap.

HAVPE: best choice. Just works. Audio not great. Definitely worth $40 USD, but at >$70 I bite the bullet in support. It’s not ad-supported, so it’s worth it. I have 3.

I have some Waveshare voice devices. At $15 USD they are amazing. Not as good as the HAVPE, but definitely $15 good. Perfect to throw in a barely used room and walk away.

Pair an Atom Echo with a portable speaker and you may have something great. Add a 3D printer and you have a product.

You need to understand that AI, whichever one you use, is based on training from some time ago.

ESPHome voice and media players have changed so much over the last 6 months that no matter how much you trust AI, it is not going to provide the answer you need.

If you want a working voice assistant, you will be far better off searching this forum for solutions.

Just search for “esp32 n16r8 voice” and you will see loads of working solutions. Always pick the latest.

Have you got a link to the YAML you use with the Waveshare please?
