I wanted to join the Year of the Voice, albeit a couple of years late! I have a number of Google Nest speakers around the house, but they can’t control everything that is managed by Home Assistant, and I am slowly trying to move more things to local services.
I had a few options:
- get a Home Assistant Voice Preview Edition, about AUD$105 locally
- get an ESP32-S3-BOX, around the USD$60 mark
- build something myself (spoiler alert: I did this, and it cost less than USD$10)
Not wanting things to be too easy, I thought I’d try the build-it-myself route. I could have used my 3D printer to make an enclosure, but those can look a bit amateurish, and since I also needed a speaker I thought: why not find a cheap portable speaker and hack that? I had a hunt on AliExpress and found the “Lenovo K3 Pro 5.0 Portable Bluetooth Speaker Stereo Surround Wireless Bluetooth Speakers Music Audio Player Loudspeaker” for USD$6 delivered. Done.
When it arrived, it looked decent enough and sounded pretty good as well.
I didn’t initially plan to cast music to it, but maybe I will later down the track. It claims to have a 1200mAh battery, and indeed on opening it I found an 18650 cell rated at 1200mAh. Speaking of which, to open it:
- Peel off the rubber plate over the buttons.
- Loosen the three Phillips head screws. They are quite recessed, so you will need a long screwdriver to reach them.
- The base and speaker cover can then be separated. Be very careful when removing things, as a) the wires are very thin and delicate, and b) it would be super easy to short-circuit the battery and let the magic smoke escape.
- The speaker rests inside the speaker enclosure with no clips or glue. Once assembled, it is held in place by friction and by a foam block that presses between the battery and the speaker magnet.
- The battery is held in by two clips: slowly pull it upwards on one side, and it can then be pulled out from the other side.
- The circuit board is held on by two Phillips head screws, and the USB port is held in with one screw.
The battery will provide somewhere between about 3.7V (nominal) and 4.2V (fully charged), dropping lower as it discharges. The button used to turn the speaker on/off has roughly 3.3V constantly available between the bottom left-hand pin and the top right-hand pin (looking at the board with the button and antenna facing you). When the power is on, that voltage drops slightly, but it will probably still provide enough power for an ESP device.
Planning
So there are three options:
- Grab 5V from the USB port, but then the ESP would not be available when the speaker is running off battery
- Grab 3.3V from the power button
- Grab power directly from the battery, but that would need something to manage the power, so it would be slightly more complicated
I am currently assuming this will be powered constantly, so I might eventually go with option #1, although I may try option #2 as it would be nice for it to be portable occasionally.
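If the build ever runs from a battery (option #3), ESPHome’s `adc` sensor can report the battery voltage so Home Assistant knows when it needs charging. This is a minimal sketch, not part of the build described here: it assumes the cell is wired through a hypothetical 2:1 resistor divider to GPIO35, and the sensor name is made up. GPIO35 is used because it sits on ADC1, which keeps working while Wi-Fi is active (ADC2 does not on the original ESP32).

```yaml
# Hypothetical battery monitor: battery+ -> 2:1 resistor divider -> GPIO35 (ADC1)
sensor:
  - platform: adc
    pin: GPIO35
    name: "Battery voltage"
    attenuation: 12db        # widest input range, so ~2.1V at the pin stays readable
    update_interval: 60s
    filters:
      - multiply: 2.0        # undo the 2:1 divider (4.2V at the cell reads ~2.1V at the pin)
```

If merged into a config that already has a `sensor:` block, the entry goes into that existing list rather than a second `sensor:` key.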
The speaker has a microphone, which I would like to use, but it is located on the base of the speaker, so not ideal. It might be better to use a separate microphone and mount it either on one side of the speaker (after drilling a hole) or behind the speaker grille.
The speaker amplifier would need some reverse engineering that is currently more than I can be bothered with. The speaker is a 4 ohm 5W job, so I might simply bypass the motherboard completely and wire the ESP to the speaker via something like a MAX98357, which should be able to pump over 3W into it at 5V. Actually, part of me is tempted to remove the battery and motherboard and just keep the speaker. Hmmm…
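Bypassing the original board means the MAX98357A takes I2S audio straight from the ESP32 and drives the 4 ohm speaker directly. Below is a minimal ESPHome sketch of the speaker side, using the same pins as the full config at the end of this post. The 5V supply connection and leaving GAIN/SD unconnected (relying on the breakout’s defaults) are assumptions, not details from the build.

```yaml
# MAX98357A breakout -> ESP32 (pins match the full config; wiring is illustrative)
#   Vin  -> 5V (assumed: taken from the USB input for the most output power)
#   GND  -> GND
#   LRC  -> GPIO26   word select / left-right clock
#   BCLK -> GPIO25   bit clock
#   DIN  -> GPIO13   audio data from the ESP32
#   GAIN, SD -> left unconnected (breakout defaults assumed)
#   Speaker +/- -> the K3 Pro's 4 ohm driver, original board removed
i2s_audio:
  - id: i2s
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25

speaker:
  - platform: i2s_audio
    id: va_speaker
    i2s_audio_id: i2s
    dac_type: external
    i2s_dout_pin: GPIO13
    channel: mono
```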
Actually, yep. Gonna do that. So the plan is finalised: remove the motherboard and battery, which will free up more room for the ESP and anything else I might want to put in. I’ll just need tape or hot glue to keep the speaker firmly in place. The battery won’t be wasted; I can use it for something else. For example, an 18650 shield costs less than USD$3 and provides good backup power for an ESP device.
I would like some form of visual feedback when speaking to it, so I’ll want at least two holes: one for the microphone and one for an LED. It is quite possible the glow through the speaker grille will be enough; we’ll see!
As for the microphone, I found the existing microphone on the motherboard presses up against a rubber tube that leads to a hole in the base. I’ll try using that; if it doesn’t work, I’ll drill a hole in the side of the case.
Parts cost (in USD):
- $6 for speaker & enclosure
- $3.40 LOLIN D32
- $2.40 MAX98357 amplifier
- $0.25 LED
- $0.10 wires & solder

Less a credit for the battery (worth maybe $3), the total comes to about $9.15, just under $10.
Assembly
Initially I wanted to use an ESP32-C3 Supermini Plus, as it is super compact and has an RGB LED built in, and if you get one with the external antenna the coverage is excellent. But then I found out that the C3 only has one I2S controller, so I can’t have both a speaker AND a microphone without complications I can live without. Dang. I changed to a LOLIN D32 (I had a spare). As it’s much bigger, I had to solder wires directly to it rather than using the Dupont cables I prefer when experimenting, but what the hey. (At a pinch it might just have fitted using Dupont, but I might later add extra sensors, so the extra space gained is worth the hassle.)
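The original ESP32 on the LOLIN D32 has two I2S controllers (the C3 has one), so ESPHome can give the microphone and the amplifier a bus each. Below is a sketch of that layout for comparison with the single shared bus in the full config at the end of this post. The bus ids `i2s_in`/`i2s_out` and the amplifier-side clock pins GPIO27 and GPIO32 are hypothetical free pins, not ones used in this build.

```yaml
i2s_audio:
  - id: i2s_in               # microphone bus (pins as in the full config)
    i2s_lrclk_pin: GPIO26
    i2s_bclk_pin: GPIO25
  - id: i2s_out              # amplifier bus (hypothetical pins)
    i2s_lrclk_pin: GPIO27
    i2s_bclk_pin: GPIO32

microphone:
  - platform: i2s_audio
    id: va_mic
    i2s_audio_id: i2s_in
    adc_type: external
    i2s_din_pin: GPIO14
    channel: left

speaker:
  - platform: i2s_audio
    id: va_speaker
    i2s_audio_id: i2s_out
    dac_type: external
    i2s_dout_pin: GPIO13
    channel: mono
```

When the microphone and amplifier share one bus instead, they also share the clock lines. That is presumably why the full config pins the speaker to the same 16 kHz sample rate the microphone uses.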
I wired everything up: ran power from the speaker’s micro USB port to the ESP32, cabled up the microphone (first removing its header pins), the amplifier and the LED, then checked that it all worked. (Depending on the ESP you use, you might be able to line up the ESP’s own USB socket with the hole, and/or you might want to replace the micro USB with a USB-C port.) Then I hot glued the microphone (not my proudest moment, but it does the job) so the hole on its board lined up with the hole in the base, drilled a hole for the LED (the case looks like metal, but is just plastic) and glued that in, then shoved everything else inside. Even with the larger LOLIN D32 there was stacks of room left for future sensors like temperature, millimeter-wave, etc. I haven’t permanently fixed the speaker in place yet, as I thought I might need to move the microphone etc.; I’ll do that later.
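The microphone and LED connections can be summarised as ESPHome config with pin comments. This sketch mirrors the pins in the full config at the end of the post. Tying L/R to GND is an assumption (the config’s own comments note mixed advice on that pin), chosen because a low L/R selects the left channel, which matches `channel: left`.

```yaml
# INMP441 microphone -> ESP32
#   VDD -> 3.3V
#   GND -> GND
#   L/R -> GND     (low = left channel, matching "channel: left")
#   WS  -> GPIO26  (shared with the amplifier's LRC)
#   SCK -> GPIO25  (shared with the amplifier's BCLK)
#   SD  -> GPIO14
# WS2811 status LED: DIN -> GPIO16, plus 5V and GND
microphone:
  - platform: i2s_audio
    id: va_mic
    i2s_audio_id: i2s        # the shared bus defined under i2s_audio:
    adc_type: external
    i2s_din_pin: GPIO14
    channel: left

light:
  - platform: esp32_rmt_led_strip
    id: led
    name: "Status LED"
    pin: GPIO16
    num_leds: 1
    chipset: ws2811
    rgb_order: RGB
```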
Again, not my proudest work; I will tidy up the cabling etc. at some point. As an aside, I could potentially mount the microphone and LED in the top of the speaker, as there is probably just enough room between the speaker cone and the grille, but I took the easy way out for the moment.
In hindsight, maybe I should have tried to centre the LED under the “k”… ah well… maybe I can put another one in and give it eyes.

How does it work? Not bad at all. The microphone is surprisingly sensitive: it can pick up commands from across a large room even though it sits under the speaker itself. Volume is decent enough, and the speaker sounds good. The only issues I have are really not with the device but with the backend: it struggles to understand words accurately. It mishears things like “lamp” as “lab”, or joins words together. For example, to get the command “turn on Richard Lamp” to be semi-reliable, I needed to add extra aliases under Settings > Voice assistants > Expose > (entity), such as “richardlamp”, “rigid lamp”, “rigidlamp”, “rigid lab”, “rigidlab”, “Richard’s lamp”, and “Richard’s lab”.
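Aliases are managed in the UI. A different technique for stubborn phrases is a sentence-triggered Home Assistant automation, whose sentence syntax accepts alternatives in parentheses, so one line can cover several mishearings. This is a sketch rather than what the post uses: the alias text and the `light.richard_lamp` entity id are hypothetical, and it goes in Home Assistant’s configuration (not the ESPHome device config).

```yaml
# Home Assistant automation (not ESPHome). light.richard_lamp is a hypothetical entity id.
automation:
  - alias: "Voice: Richard lamp on"
    triggers:
      - trigger: conversation
        command:
          # (a|b) lists alternatives, so each line covers several mishearings
          - "turn on (richard|richards|rigid) (lamp|lab)"
          - "turn on (richardlamp|rigidlamp|rigidlab)"
    actions:
      - action: light.turn_on
        target:
          entity_id: light.richard_lamp
```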
Slightly annoying, but I can cope by adding extra aliases for the devices I use most often. Another slightly annoying thing is that media_player is built on the Arduino framework and is currently not supported on ESP-IDF. This matters because “ESP-IDF is needed to include an audio library called ESP_ADF used in our voice assistant”. There are some ways around this (e.g. gnumpi/esphome_audio on GitHub, “Custom audio components for ESPHome”), but for now I won’t be casting music to the speaker.
More of an issue is that it frequently stutters on playback. Apparently this may be fixed by tweaking Piper and/or moving to OpenAI (as I don’t want to use external services, I might try Llama, which in turn might help with the voice recognition accuracy). For now, though, it is still a work in progress, and the Google Nest speakers are staying.
Oh, as an aside, I enabled BLE tracking on it but found that memory usage was too high and caused issues, so I had to disable it. It may work on other ESP32 devices with more memory available, but not on the one I am using.
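To see how close a build like this runs to the memory limit before and after enabling the BLE tracker, ESPHome’s `debug` component can report both total free heap and the largest free block. The second matters because fragmentation can cause allocation failures even when total free memory looks acceptable. This is a minimal sketch that overlaps with the template free-memory sensor in the config below; the sensor names are made up.

```yaml
debug:
  update_interval: 60s

sensor:
  - platform: debug
    free:
      name: "Heap free"
    block:
      name: "Heap largest free block"
```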
FWIW, here’s my current ESPHome config:
# Compiled and tested on esphome 2025.4.0 and HA 2025.4.4
# Used INMP441 module for microphone - note that:
# - Mixed info re L/R pin. Some say it can be left unconnected, some say it needs to be grounded
# - L/R pin selects the channel: low (GND) = left, high (VCC) = right; "channel: left" below expects low
#
substitutions:
  devicename: jarvis01
  location: study
  ledpin: GPIO16 # ws2811 - legs left to right 1,2,3,4,notch: 1(din), 2(gnd), 3(5v), 4(dout)
  wspin: GPIO26 # WS or Word Select or Left Right clock
  sckpin: GPIO25 # SCK or Serial Clock or Bit Clock
  dinpin: GPIO14 # Data In or SD or Serial Data (from the INMP441)
  gpiopin: GPIO13 # DIN pin of the MAX98357A audio amplifier
  # might want to also add in a temp sensor eg DS18B20, HDC1080?
  # might want to add in a proximity sensor eg HLK-LD2410B?

esphome:
  name: $devicename
  friendly_name: $devicename
  min_version: 2024.6.0
  name_add_mac_suffix: false
  project:
    name: ninkasi.ble
    version: '0.1'
  comment: Jarvis LOLIN D32 $location
  platformio_options:
    build_flags:
      - "-D CONFIG_ADC_SUPPRESS_DEPRECATE_WARN=1" # Temporarily suppresses the "legacy adc calibration driver is deprecated" warning during compilation - https://github.com/esphome/issues/issues/5153#issuecomment-1847547482

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    version: recommended
    # Custom sdkconfig options
    sdkconfig_options:
      COMPILER_OPTIMIZATION_SIZE: y
    # Advanced tweaking options
    advanced:
      ignore_efuse_mac_crc: false

# Enable logging
logger:
  # baud_rate: 0 # disable serial UART logging to maybe save a little RAM
  # logs:
  #   component: ERROR

api:
  encryption:
    key: !secret esphome_encryption_key
  on_client_connected:
    then:
      - delay: 50ms
      - micro_wake_word.start:
  on_client_disconnected:
    then:
      - voice_assistant.stop:

ota:
  - platform: esphome
    password: !secret ota_password

wifi:
  networks:
    - ssid: !secret wifIoT_ssid
      password: !secret wifIoT_password
      priority: 2
    # Backup SSID just in case
    - ssid: !secret wifi_ssid
      password: !secret wifi_password
      priority: 1
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "$devicename Fallback Hotspot"
    password: !secret ota_password

# Remember to install via cable initially if enabling the BLE tracker below
# Note - enabling this dropped free memory to below 30 kB and caused instability
#esp32_ble_tracker:
#  scan_parameters:
#    # continuous: false
#    active: true
#    interval: 211ms # default 320ms
#    window: 120ms # default 30ms
#bluetooth_proxy:
#  active: true

light:
  - platform: esp32_rmt_led_strip
    id: led
    rgb_order: RGB
    pin:
      number: $ledpin
      # ignore_strapping_warning: true # enable this if you need to use a strapping pin
    num_leds: 1
    chipset: ws2811
    name: "Status LED"
    default_transition_length: 0s
    effects:
      - pulse:
          name: "extra_slow_pulse"
          transition_length: 800ms
          update_interval: 800ms
          min_brightness: 0%
          max_brightness: 30%
      - pulse:
          name: "slow_pulse"
          transition_length: 250ms
          update_interval: 250ms
          min_brightness: 50%
          max_brightness: 100%
      - pulse:
          name: "fast_pulse"
          transition_length: 100ms
          update_interval: 100ms
          min_brightness: 50%
          max_brightness: 100%

switch:
  - platform: template
    id: mute
    name: "Mute microphone"
    optimistic: true
    on_turn_on:
      - micro_wake_word.stop:
      - voice_assistant.stop:
      # show red, blink once after 2s, then stay red while muted
      - light.turn_on:
          id: led
          red: 100%
          green: 0%
          blue: 0%
          brightness: 30%
      - delay: 2s
      - light.turn_off:
          id: led
      - light.turn_on:
          id: led
          red: 100%
          green: 0%
          blue: 0%
          brightness: 30%
    on_turn_off:
      - micro_wake_word.start:
      # pulse green for 2s to confirm unmute
      - light.turn_on:
          id: led
          red: 0%
          green: 100%
          blue: 0%
          brightness: 60%
          effect: fast_pulse # must match an effect name defined under light: above
      - delay: 2s
      - light.turn_off:
          id: led

i2s_audio:
  - id: i2s
    i2s_lrclk_pin: $wspin
    i2s_bclk_pin: $sckpin

microphone:
  - platform: i2s_audio
    id: va_mic
    adc_type: external
    i2s_din_pin: $dinpin
    channel: left
    i2s_audio_id: i2s

output:
  - platform: gpio
    pin:
      number: $gpiopin
      allow_other_uses: true
    id: set_low_speaker

speaker:
  platform: i2s_audio
  id: va_speaker
  i2s_audio_id: i2s
  dac_type: external
  i2s_dout_pin:
    number: $gpiopin
    allow_other_uses: true
  channel: mono
  bits_per_sample: 32bit
  sample_rate: 16000

# Can use the following to provide a volume control
# Note that there can be an impact on voice quality
#number:
#  - platform: template
#    name: "Volume"
#    id: volume
#    unit_of_measurement: "%"
#    min_value: 0
#    max_value: 1
#    step: 0.1
#    mode: SLIDER
#    update_interval: never
#    optimistic: true
#    restore_value: true
#    initial_value: 0.5
#    icon: "mdi:knob"
#    entity_category: config
#    on_value:
#      - speaker.volume_set: !lambda "return x;"

micro_wake_word:
  models:
    - model: hey_jarvis
  on_wake_word_detected:
    - voice_assistant.start:
    - light.turn_on:
        id: led
        red: 100%
        green: 100%
        blue: 100%
        brightness: 30%
        effect: slow_pulse # must match an effect name defined under light: above

voice_assistant:
  id: va
  microphone: va_mic
  speaker: va_speaker
  noise_suppression_level: 2
  volume_multiplier: 4.0
  on_stt_end:
    then:
      - light.turn_off: led
  on_error:
    - micro_wake_word.start:
  on_end:
    then:
      - light.turn_off: led
      - wait_until:
          not:
            voice_assistant.is_running:
      - micro_wake_word.start:

sensor:
  - platform: uptime
    name: "$devicename Uptime"
  - platform: wifi_signal
    name: "$devicename WiFi Signal"
    update_interval: 60s
  - platform: template
    name: $devicename free memory
    lambda: return heap_caps_get_free_size(MALLOC_CAP_INTERNAL);
    icon: "mdi:memory"
    entity_category: diagnostic
    state_class: measurement
    unit_of_measurement: "B" # bytes
    update_interval: 60s

# Ah. Turns out that media_player is built on the arduino framework. Currently not supported on esp-idf
# "ESP-IDF is needed to include an audio library called ESP_ADF used in our voice assistant"
# So don't bother with this yet
# Hack here if interested: https://github.com/gnumpi/esphome_audio
#
#media_player:
#  - platform: i2s_audio
#    name: Media Player
#    dac_type: external
#    i2s_audio_id: i2s
#    i2s_dout_pin: $gpiopin
#    mode: mono
#    id: i2s_media
#    icon: mdi:speaker-wireless