ESP32 S3 Box 3: Why is this so difficult?!

Thanks for the reply. Late yesterday I started to see other people with the same problem. That actually makes me happy because I know it’s not something that I borked. In that regard, misery loves company. I’ll hang out and wait for updates.

I used the Willow add-on. Problem free so far.

I don’t know why you guys are wasting your time on this. It’s all good if you like science experiments. The HA voice stuff just is NOT ready for prime time. Try Willow. I wrote up a detailed doc explaining how to get it working and IT WORKS. Literally 15 minutes to set it up…

Use the link Stiltjack posted.


I think you are misunderstanding what my ask is. Willow still needs a voice “box” for you to talk to. I already have the voice assistant part of HA working just fine, I’m struggling with that box.

ESP32 S3 Box 3 is the voice box. Speech recognition is very good indeed.

No speaker sound here either, just a “click”.
Watch the log while compiling; there seem to be a lot of issues still to be solved.

Short term fix - downgrading to an older version of esp-idf:

esp32:
  board: esp32s3box
  flash_size: 16MB
  framework:
    type: esp-idf
    version: 4.4.6

I had a startlingly easy go of it with my BOX unit (non S3). A couple weeks ago, I did the following:

  • Selected my device type

  • Hit Connect, selected proper COM port, & hit Install.

A couple minutes later, I had a working voice assist satellite device, with working mic, speaker, and cutesy display.

I immediately asked it “Hey Nabu, turn off XX lights”
And much to my surprise, without doing any other config, IT DID IT!
(+ other voice requests)

I haven’t really played with it much since, & don’t know if it’ll have any issues with newest ESPHome code.

But I was absolutely flabbergasted that it came up so easily.

I’m still doing HA voice control here with Alexa and/or Siri. But I’m now convinced that HA Voice Assist is much farther along than I had thought, & I can seriously think about starting to move to it anytime.

You can find all the YAML code that it’s using, via the GitHub link near bottom of page – firmware/voice-assistant at main · esphome/firmware · GitHub

Follow-up: If anyone uses the Ready Made Projects to build a voice satellite with M5Stack Atom Echo device, I’d have to agree that unit is more toy like. Almost un-hearable speaker volume. And no wake word – you have to push its button for Push-to-Talk functionality (apparently not enough horsepower to process wake word on device). And mic didn’t always seem to pick up voices clearly.

But even so, it IS a functional voice assist satellite, which worked immediately after loading firmware. (And is cheap way to experiment)

Thanks for that! I tried that but that ended up breaking the microphone.

I’m in the death spiral of attempting to install as well. I’ve followed the instructions here:

ESP32-S3-BOX voice assistant - Home Assistant (home-assistant.io)

which don’t work as there is never an option to set the WIFI credentials.

I tried this: firmware/wake-word-voice-assistant/esp32-s3-box-3.yaml at main · esphome/firmware · GitHub but it was rejected as being too large.

Anyone with a suggestion on how to proceed with the ESP32-S3-BOX-3 variant?

I agree, it shouldn’t be this hard. This is a bad look for the Home Assistant/ESPHome ecosystem.


Look here


I had a similar issue on Arch Linux. So this may not be relevant to you. I found that after the device got flashed the permissions on the USB reverted back and I didn’t have access anymore. As a hack I opened a background terminal and did:

while :; do chmod 666 /dev/ttyACM0; sleep 2;done

Replace ACM0 with your device name. Keep it running until you’re all done then ctrl+c


FWIW, I couldn’t get this to work using Windows 11… it did work using @pepe59’s suggestion on Ubuntu/Chrome. No clue as to why.

I think you need to pin ESPHome to 2024.4.x to fix it.
It works in my branch under GitHub - D3SOX/ESPHome-firmware: Holds firmware configuration files for projects that the ESPHome team provides.

This is really disappointing. There is a lot of blah-blah going on about voice assist, so I bought this very nice ESP32 S3 Box 3 (the M5Stack Echo was a rather disappointing experience), got the YAML code into ESPHome in HA, and ended up with a non-working speaker. It does process voice commands, not as well as Google, but that’s OK for now.
While compiling the code, the log screen is flooded with messages, warnings, and errors. How can this go unnoticed by the creators?
I followed the advice from @smcnaught to switch back to esp-idf framework version 4.4.6. That works for me, for now :)


Hello, where exactly should I write these few lines that restore ESPHome to the previous version?
In /homeassistant/esphome/esp32-s3-box-3-5a93fc.yaml, do they go at the beginning or the end?
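
A minimal sketch of where those lines go, assuming the device file is an ordinary ESPHome YAML. `esp32:` is a top-level key, so its position in the file does not matter, but it may appear only once. If the file already contains an `esp32:` block, edit that block rather than adding a second one. The placeholder comments below are illustrative assumptions, not the real contents of the file.

```yaml
# /homeassistant/esphome/esp32-s3-box-3-5a93fc.yaml (sketch, not the full file)

# ... existing top-level keys (for example substitutions:, packages:, esphome:)
# stay exactly as they are; these names are assumptions about a typical file ...

# Top-level block: starts in column 0, not indented under any other key.
esp32:
  board: esp32s3box
  flash_size: 16MB
  framework:
    type: esp-idf
    version: 4.4.6   # the short-term downgrade suggested earlier in the thread

# ... rest of the existing configuration ...
```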

The source at ESPHome-firmware/voice-assistant/esp32-s3-box-3.yaml at 0fe1bcec60bfc415dc793357783a28685e95dc1e · D3SOX/ESPHome-firmware · GitHub is already patched, use that one.


Thanks! Do I need to write anything else? The Wi-Fi password, or something like that?

I finally got my ESP32-S3-BOX-3 and whacked the standard yaml on with only two changes:

micro_wake_word_model: hey_jarvis

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
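
For anyone wondering where the `!secret` values come from: ESPHome looks them up in a `secrets.yaml` file kept in the same config folder as the device YAML. A minimal sketch, with placeholder values that are obviously not real credentials:

```yaml
# secrets.yaml (same directory as the device YAML)
# The key names must match the names used after !secret in the device config.
wifi_ssid: "MyNetworkName"          # placeholder, replace with your SSID
wifi_password: "my-wifi-password"   # placeholder, replace with your password
```

Keeping the credentials in this separate file means the device YAML can be shared on a forum without leaking them.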

I have 3 separate issues:

  1. Speaker doesn’t work. Known issue and I’ll try downgrading firmware to 4.4.6 as recommended above.

  2. Microphone is pretty bad. I tried talking to it from 2m away in a quiet room and it didn’t hear me. I had to be 0.5m away and even then it didn’t hear me sometimes. Is this also a bug in recent firmware?

  3. It takes aaaaaaaaaaages to acknowledge my command. I know my back-end (whisper & piper) is fine because if I trigger Assist on my phone and say “turn on lights”, it takes 3-4s at most. When I do it via my ESP32-S3-BOX-3, it takes around 20s! Could this be a configuration issue or just the way it is?

Appreciate any guidance with this!

EDIT: Setting ESP32 framework version to 4.4.6 broke my microphone too, until I downgraded my ESPHome docker container to version 2024.4.2. Now both mic and speaker work, and it takes 4-5s to complete a simple command, which is comparable to using my phone.
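
A rough sketch of what that container downgrade can look like, assuming ESPHome is run through Docker Compose. The service name and the `./esphome-config` path are hypothetical; the point is only the explicit version tag on the image instead of the default latest tag.

```yaml
# docker-compose.yml (sketch; the service name and host path are assumptions)
services:
  esphome:
    # Pin the image to a specific release instead of the default "latest" tag.
    image: ghcr.io/esphome/esphome:2024.4.2
    volumes:
      # Hypothetical host folder that holds the device YAML files and secrets.yaml.
      - ./esphome-config:/config
    network_mode: host
    restart: unless-stopped
```

After changing the tag, the container has to be recreated and the device firmware recompiled and reinstalled for the older version to take effect.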

So the primary issue I’m left with is the wake word. It just doesn’t seem responsive at all, I always have to say it about 5-6 times even when only 20cm away. Any tips for this?
