Hello everyone,
My friend and I have been working on an offline voice recognition device, ESPVoice, for the past couple of months and I think we are ready to share it with you guys.
Background
We started this project because we were having a few issues with existing smart speakers. For example:
- setting up a smart speaker in HA is not straightforward
- internet/cloud dependent
- lock-in to a smart speaker ecosystem
- not easy to link a voice command to an HA automation
- privacy concerns, as a smart speaker is always listening
- inaccurate recognition for non-native speakers
- limited language support
We wanted a device that can control Home Assistant using voice commands without relying on any cloud-based solution, something that is straightforward to set up and use.
Solution
After a bit of research, we found a nice little chip, the SU-03T, that can do offline voice recognition and send the results to a serial output. Coupled with an ESP32, we are able to integrate the chip into Home Assistant and control devices via voice.
The circuit itself is pretty straightforward: basically we just need to connect the TTL UART of the SU-03T to the ESP32. The ESP32 runs the ESPHome framework. We read the serial output from the chip and forward it to Home Assistant using a Custom UART Text Sensor. Likewise, using a Custom UART Switch, we can create a switch in Home Assistant that sends commands to the chip. This way we have two-way communication between Home Assistant and our device. Neat!
Below is a sample configuration to do so:
esphome:
  includes:
    # Custom component that reads the UART line by line
    - uart_read_line_sensor.h

esp32:
  board: esp32dev
  framework:
    type: arduino

# UART link between the ESP32 and the SU-03T
uart:
  id: uart_bus
  tx_pin: GPIO17
  rx_pin: GPIO16
  baud_rate: 9600

# Exposes each line received from the chip as a text sensor in Home Assistant
text_sensor:
  - platform: custom
    lambda: |-
      auto my_custom_sensor = new UartReadLineSensor(id(uart_bus));
      App.register_component(my_custom_sensor);
      return {my_custom_sensor};
    text_sensors:
      name: "espvoiceuart"

# Writes a fixed byte sequence to the UART when pressed
button:
  - platform: template
    name: "Test Button"
    id: "testButton"
    on_press:
      - uart.write: [0xAA, 0x55, 0x00, 0x55, 0xAA]
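On the Home Assistant side, a recognized command can then be tied to an automation by watching the text sensor's state. The following is only a rough sketch, not our exact setup: it assumes the sensor shows up as sensor.espvoiceuart, that the chip reports a line such as "light_on" for the trained phrase, and that a light.living_room entity exists. All three names are placeholders to be replaced with your own.

```yaml
# Home Assistant configuration.yaml (sketch, names are assumptions)
automation:
  - alias: "ESPVoice: turn on living room light"
    trigger:
      - platform: state
        entity_id: sensor.espvoiceuart   # assumed entity ID of the text sensor
        to: "light_on"                   # hypothetical string sent by the chip
    action:
      - service: light.turn_on
        target:
          entity_id: light.living_room   # hypothetical light entity
```

Keep in mind that a state trigger only fires when the state changes, so saying the same command twice in a row may not fire the automation a second time unless the sensor value changes in between.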
We have since gone through a few iterations of redesigning the device, tuning the firmware and packaging everything into a self-contained unit. The final product, ESPVoice, contains an ESP32 board, a voice recognition chip, a speaker and a mic. ESPVoice runs ESPHome-compatible firmware and can be integrated into Home Assistant as an ESPHome device without much configuration.
Here are some demonstrations of the device: ESPVoice playlist
Limitations
ESPVoice is an offline voice recognition device that only responds to commands that have been pre-trained by the user. The user has to say the exact pre-trained words for the recognition to work. Unlike an existing smart speaker, it is not able to understand the context of what is being said.
Feel free to check out our website for more information.
For those who are interested in trying out the device, we have created a small batch of hardware that is ready to be shipped.
Thanks!