Offline voice recognition device to control your HA

Hello everyone,

My friend and I have been working on an offline voice recognition device, ESPVoice, for the past couple of months and I think we are ready to share it with you guys.

Background
We started this project because we were having a few issues with existing smart speakers. For example:

  • setting up a smart speaker in HA is not straightforward
  • internet/cloud dependent
  • lock-in to the smart speaker ecosystem
  • not easy to link a voice command to an HA automation
  • privacy concerns, as the smart speaker is always listening
  • inaccurate recognition for non-native speakers
  • limited language support

We wanted a device that can control Home Assistant using voice commands without relying on any cloud-based solution, something that is straightforward and easy to set up and use.

Solution

ESPVoice iterations: prototype to final product

After a bit of research, we found a nice little chip, the SU-03T, that is able to do offline voice recognition and output the results over a serial port. Coupled with an ESP32, we are able to integrate the chip into Home Assistant and control devices via voice.

The circuit itself is pretty straightforward: basically, we just need to connect the TTL UART of the SU-03T to the ESP32. The ESP32 runs the ESPHome framework. We are able to read the serial output from the chip and forward it to Home Assistant using the Custom UART Text Sensor. Likewise, using the Custom UART Switch, we can create a switch in Home Assistant to send commands to the chip. In this way we have two-way communication between Home Assistant and our device. Neat!

Below is a sample configuration to do so:

esphome:
  includes:
    - uart_read_line_sensor.h

esp32:
  board: esp32dev
  framework:
    type: arduino

uart:
  id: uart_bus
  tx_pin: GPIO17
  rx_pin: GPIO16
  baud_rate: 9600

text_sensor:
  - platform: custom
    lambda: |-
      auto my_custom_sensor = new UartReadLineSensor(id(uart_bus));
      App.register_component(my_custom_sensor);
      return {my_custom_sensor};
    text_sensors:
      name: "espvoiceuart"

button:
  - platform: template
    name: "Test Button"
    id: "testButton"
    on_press:
      - uart.write: [0xAA, 0x55, 0x00, 0x55, 0xAA]
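
For the other direction, here is a rough sketch of how a switch that sends commands to the chip could look, using ESPHome's template switch together with `uart.write`. Treat it as an illustration only: the switch name, the mapping of frames to on/off, and the `0x01` payload are assumptions made up for this example. The actual byte sequences depend on how the SU-03T firmware is configured.

```yaml
switch:
  - platform: template
    name: "ESPVoice Command"   # hypothetical name
    optimistic: true           # no feedback from the chip, so assume the state
    turn_on_action:
      # Hypothetical frame; replace with a command your chip firmware expects
      - uart.write: [0xAA, 0x55, 0x01, 0x55, 0xAA]
    turn_off_action:
      # Same frame as the Test Button above; its meaning is firmware-specific
      - uart.write: [0xAA, 0x55, 0x00, 0x55, 0xAA]
```

With only one `uart:` bus defined, `uart.write` does not need an explicit bus `id`.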

We have since gone through a few iterations of redesigning the device, tuning the firmware and packaging everything into a self-contained unit. The final product, ESPVoice, contains an ESP32 board, a voice recognition chip, a speaker and a mic. ESPVoice runs ESPHome-compatible firmware and can be integrated into Home Assistant as an ESPHome device without much configuration.

Here are some demonstrations of the device: ESPVoice playlist

Limitations
ESPVoice is an offline voice recognition device that only responds to commands that are pre-trained by the user. The user has to say the exact pre-trained words for the recognition to work. Unlike an existing smart speaker, it is not able to understand the context of what is said.
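
Since every phrase is pre-trained, each recognized command can be mapped to its own Home Assistant automation. A minimal sketch, assuming the text sensor shows up as `sensor.espvoiceuart`, that the chip reports a hypothetical `light_on` string for the trained phrase, and that an entity called `light.living_room` exists (none of these names are confirmed, so adjust them to your setup):

```yaml
automation:
  - alias: "ESPVoice: turn on living room light"
    trigger:
      - platform: state
        entity_id: sensor.espvoiceuart   # assumed entity ID of the UART text sensor
        to: "light_on"                   # hypothetical string reported by the chip
    action:
      - service: light.turn_on
        target:
          entity_id: light.living_room   # hypothetical light entity
```

One thing to keep in mind: a state trigger only fires when the sensor value changes, so repeating the same command twice in a row would not trigger the automation again unless the value changes in between.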

Feel free to check out our website for more information.
For those who are interested in trying out the device, we have created a small batch of hardware that is ready to be shipped.

Thanks!


Wow! Cool, where can I order? Is it self configurable (esphome?)

Hi, you can order from our online shop: Shop - ESPVoice
I’m not too sure what you mean by ‘self configurable’, but the firmware uses the ESPHome framework. So once I power up ESPVoice and connect it to my local network, Home Assistant is able to automatically detect the device as an ESPHome device, as shown below:


You are also able to reprogram the ESP32 if required. We are currently writing up the documentation for the device.

I see esphome 2023.4 has microphone and voice assistant support

This sounds amazing. Is it in any way language-dependent? I’m considering ordering one of the initial 50 pcs to see if it works with German.
Also, does it understand more than one person’s voice?

i received mine today

  1. when i powered it on, it said it was preprogrammed with a wake word (which i didn’t know). did i get a second-hand model?
  2. the wifi ap didn’t appear either
  3. i opened it and reset it a few times until the AP appeared
  4. trained a wake word
  5. trained words for 01 & 02: worked (language doesn’t matter, mine is german)
  6. tried to train a word for 12: didn’t work
  7. now it doesn’t respond properly anymore:
    if i press any command in HA, no further prompt comes and (i guess) the command to the voice chip times out, and after a few seconds it says “call me again if you need help”
    feels as if the voice chip has crashed, although the wiring shows audio is handled completely by the extra chip and not by the esp32

resetting the learned data didn’t help either

i’ll let it rest and try again later

it doesn’t say anything on the serial, besides the info an esp prints on reset

update:
replugged it and had to relearn
feels as if the mic doesn’t work most of the time

Got mine today.
Looks good and works right away with the startup sequence.

Integration in HA should be easy, but unfortunately not for me …
ESPHome finds the device, but there is no control tab to configure the 12 speech recognition phrases.
I worked through the manual (very detailed!!!), but it didn’t work for me.

I wanted to do a factory reset, but without buttons there is no chance.
@bedi Is it possible to do a hard reset to the factory settings?

Kind regards
FraMic

Unfortunately this appears to be a dead project. Website / store is gone.