Is there a way to stream audio from one ESPHome device to another?

I’m thinking about building a voice doorbell using ESPHome with an I2S microphone and an I2S speaker. What I’d like to achieve is to send my voice through an ESPHome microphone inside the house to another ESPHome device running in my porch light.

I’d appreciate it if anyone who has done this before is willing to share their experience!

I’d also love to know this, for similar applications.

There’s not a whole lot of info about the microphone component in ESPHome showing what can be done with it.

I have been waiting years for this option.
Look here:

microphone:
  - platform: ...
    on_data:
      - logger.log:
          format: "Received %d bytes"
          args: ['x.size()']

It seems that the audio is in the “x” variable, but I don’t know what type of data it is. Waveform? Binary?
We could try creating a service in Home Assistant that, whenever x changes, sends its contents to another ESP media player, and see what happens.
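Later replies establish that since ESPHome 2023.6.0, x is a std::vector<int16_t>, i.e. signed 16-bit PCM samples. Assuming those samples are sent over the network as raw bytes in the ESP32’s native little-endian order (as the lambdas further down do), a receiver could decode them roughly like this. decode_pcm16le is a hypothetical helper, not part of ESPHome or Home Assistant:

```python
import struct


def decode_pcm16le(payload: bytes) -> list[int]:
    """Decode raw bytes as signed 16-bit little-endian mono PCM samples."""
    if len(payload) % 2:
        raise ValueError("payload length must be a multiple of 2 bytes")
    count = len(payload) // 2
    # "<" = little-endian, "h" = signed 16-bit integer
    return list(struct.unpack(f"<{count}h", payload))


# Two samples, 1 and -2, laid out as an ESP32 stores them in memory.
print(decode_pcm16le(b"\x01\x00\xfe\xff"))  # [1, -2]
```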


I have looked into the Home Assistant source code for the voice assistant, but I’m not a coding expert. It seems that the voice assistant starts a UDP server to receive audio, so I wrote a custom UDP server. It seems to be receiving UDP packets, so maybe it could be useful for receiving an audio stream.

The nightmare is the voice assistant in ESPHome; I don’t understand anything about how it works.

Anyway

Here is the UDP receiver source code:
/config/custom_components/audio_receiver/manifest.json

{
    "domain": "audio_receiver",
    "name": "Audio Receiver",
    "version": "1.0",
    "documentation": "",
    "dependencies": [],
    "codeowners": [],
    "requirements": []
}

/config/custom_components/audio_receiver/__init__.py

import asyncio
import logging
import socket
from homeassistant.const import EVENT_HOMEASSISTANT_STOP

DOMAIN = "audio_receiver"

_logger = logging.getLogger(__name__)

class UDPAudioReceiver:
    def __init__(self, hass, host, port):
        self.hass = hass
        self.host = host
        self.port = port

    async def start(self):
        loop = asyncio.get_running_loop()
        self.transport, self.protocol = await loop.create_datagram_endpoint(
            lambda: UDPProtocol(self.hass),
            local_addr=(self.host, self.port)
        )
        self.hass.bus.async_listen_once(EVENT_HOMEASSISTANT_STOP, self.stop)

    async def stop(self, event):
        self.transport.close()

class UDPProtocol(asyncio.DatagramProtocol):
    def __init__(self, hass):
        self.hass = hass

    def datagram_received(self, data, addr):
        _logger.info(f"Data received: {data}")

async def async_setup(hass, config):
    host = "0.0.0.0"  # Replace with your configuration
    port = 12345  # Replace with your configuration
    receiver = UDPAudioReceiver(hass, host, port)
    hass.loop.create_task(receiver.start())
    return True

/config/custom_components/audio_receiver/config_flow.py

import voluptuous as vol
from homeassistant import config_entries
from homeassistant.core import callback
from . import DOMAIN

class AudioReceiverFlowHandler(config_entries.ConfigFlow, domain=DOMAIN):
    VERSION = 1
    CONNECTION_CLASS = config_entries.CONN_CLASS_LOCAL_PUSH

    @staticmethod
    @callback
    def async_get_options_flow(config_entry):
        return OptionsFlowHandler(config_entry)

    async def async_step_user(self, user_input=None):
        if user_input is not None:
            return self.async_create_entry(title="Audio Receiver", data=user_input)

        return self.async_show_form(
            step_id="user",
            data_schema=vol.Schema(
                {
                    vol.Required("host", default="0.0.0.0"): str,
                    vol.Required("port", default=12345): int,
                }
            ),
        )

class OptionsFlowHandler(config_entries.OptionsFlow):
    def __init__(self, config_entry):
        self.config_entry = config_entry

    async def async_step_init(self, user_input=None):
        if user_input is not None:
            return self.async_create_entry(title="", data=user_input)

        return self.async_show_form(
            step_id="init",
            data_schema=vol.Schema(
                {
                    vol.Required("host", default=self.config_entry.options.get("host", "0.0.0.0")): str,
                    vol.Required("port", default=self.config_entry.options.get("port", 12345)): int,
                }
            ),
        )

And configuration.yaml:

audio_receiver:
  host: "0.0.0.0"
  port: 12345
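The core of the component above is an asyncio DatagramProtocol. As a sketch, the same pattern can be exercised outside Home Assistant to confirm that packets arrive at all. CollectingProtocol and self_test are hypothetical names, and port 0 (let the OS pick a free port) is used instead of 12345 so the self-test does not clash with a running receiver:

```python
import asyncio
import socket


class CollectingProtocol(asyncio.DatagramProtocol):
    """Keeps every UDP payload it receives (hypothetical test helper)."""

    def __init__(self) -> None:
        self.packets: list[bytes] = []
        self.got_packet = asyncio.Event()

    def datagram_received(self, data: bytes, addr) -> None:
        self.packets.append(data)
        self.got_packet.set()


async def self_test(host: str = "127.0.0.1") -> list[bytes]:
    """Bind a listener, send it one fake audio packet, return what arrived."""
    loop = asyncio.get_running_loop()
    # Port 0 lets the OS choose a free port.
    transport, protocol = await loop.create_datagram_endpoint(
        CollectingProtocol, local_addr=(host, 0)
    )
    try:
        port = transport.get_extra_info("sockname")[1]
        # Stand-in for the ESP: 256 int16 samples = 512 bytes of silence.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
            sender.sendto(b"\x00\x00" * 256, (host, port))
        await asyncio.wait_for(protocol.got_packet.wait(), timeout=2.0)
        return protocol.packets
    finally:
        transport.close()
```

To listen for the real ESP instead, create the endpoint with local_addr=("0.0.0.0", 12345) and skip the fake sender.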

I am trying this approach to grab audio from the microphone in ESPHome and send it:

microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_in
    id: mic
    adc_type: external
    i2s_din_pin: GPIO23
    pdm: false
    on_data:
      - lambda: |-
          for (uint8_t byte : x) {
            id(audio_buffer).push_back(byte);
          }
          if (id(audio_buffer).size() >= 512) {
            int sock = ::socket(AF_INET, SOCK_DGRAM, 0);
            struct sockaddr_in destination;
            destination.sin_family = AF_INET;
            destination.sin_port = htons(12345);  //  UDP receiver port
            destination.sin_addr.s_addr = inet_addr("192.168.1.10");  //  UDP receiver IP

            ::sendto(sock, id(audio_buffer).data(), id(audio_buffer).size(), 0, reinterpret_cast<sockaddr*>(&destination), sizeof(destination));
            ::close(sock);
            id(audio_buffer).clear();
          }
globals:
  - id: is_capturing
    type: bool
    restore_value: no
    initial_value: "false"
  - id: audio_buffer
    type: std::vector<int16_t>
    restore_value: no
    initial_value: 'std::vector<int16_t>()'
  - id: sequence_number
    type: uint32_t
    restore_value: no
    initial_value: '0'

binary_sensor:
  - platform: esp32_touch
    pin: GPIO4
    threshold: 1000
    name: Action
    on_press:
      then:
        if:
          condition:
            lambda: "return !id(is_capturing);"
          then:
            - globals.set:
                id: is_capturing
                value: "true"
            - microphone.capture: mic
            - delay: 5s
            - globals.set:
                id: is_capturing
                value: "false"
            - microphone.stop_capture: mic

button:
  - platform: template
    name: "Cattura"
    on_press:
      - microphone.capture: mic
      - delay: 5s
      - microphone.stop_capture: mic

And receiver section:
/homeassistant/custom_components/audio_receiver/__init__.py

import asyncio
import logging
import wave
import os
from collections import deque
from datetime import datetime
from homeassistant.const import EVENT_HOMEASSISTANT_STOP

DOMAIN = "audio_receiver"

_logger = logging.getLogger(__name__)

class UDPAudioReceiver:
    def __init__(self, hass, host, port, save_path):
        self.hass = hass
        self.host = host
        self.port = port
        self.save_path = save_path
        self.buffer = deque()
        self.timeout_handle = None

    async def start(self):
        loop = asyncio.get_running_loop()
        self.transport, _ = await loop.create_datagram_endpoint(
            lambda: UDPProtocol(self),
            local_addr=(self.host, self.port)
        )
        _logger.info(f"UDP audio receiver started on {self.host}:{self.port}")
        self.hass.bus.async_listen_once(EVENT_HOMEASSISTANT_STOP, self.stop)

    async def stop(self, event):
        self.transport.close()
        _logger.info("UDP audio receiver stopped")
        self.save_as_wav()

    def save_as_wav(self):
        if self.buffer:
            timestamp = datetime.now().strftime("%H.%M")
            file_path = os.path.join(self.save_path, f"audio-{timestamp}.wav")
            _logger.info("Timeout reached, saving data...")
            with wave.open(file_path, 'wb') as wav_file:
                wav_file.setnchannels(1)
                wav_file.setsampwidth(2)
                wav_file.setframerate(44100)
                while self.buffer:
                    wav_file.writeframes(self.buffer.popleft())
            _logger.info(f"Audio data saved to {file_path}")
        else:
            _logger.info("Timeout reached, but no data to save.")

class UDPProtocol(asyncio.DatagramProtocol):
    def __init__(self, receiver):
        self.receiver = receiver

    def datagram_received(self, data, addr):
        _logger.info(f"Data received from {addr}")
        self.receiver.buffer.append(data)
        if self.receiver.timeout_handle:
            self.receiver.timeout_handle.cancel()
        self.receiver.timeout_handle = asyncio.get_event_loop().call_later(10, self.receiver.save_as_wav)

async def async_setup(hass, config):
    host = config[DOMAIN].get('host', '0.0.0.0')
    port = config[DOMAIN].get('port', 12345)
    save_path = config[DOMAIN].get('save_path', '/media/audio')
    receiver = UDPAudioReceiver(hass, host, port, save_path)
    hass.loop.create_task(receiver.start())
    return True
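Note that save_as_wav() hardcodes setframerate(44100), while the VLC settings later in the thread use 16000 Hz. If the WAV header’s rate does not match the microphone’s sample_rate, the recording plays back too fast or too slow. A minimal sketch that takes the rate as a parameter, assuming mono int16 PCM packets; packets_to_wav and wav_duration_seconds are hypothetical helpers:

```python
import io
import wave


def packets_to_wav(packets, sample_rate: int = 16000) -> bytes:
    """Wrap raw mono int16 PCM packets in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)            # int16_t -> 2 bytes per sample
        wav_file.setframerate(sample_rate)  # must equal the ESP sample_rate
        for packet in packets:
            wav_file.writeframes(packet)
    return buf.getvalue()


def wav_duration_seconds(data: bytes) -> float:
    """Duration a player would report for the WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()
```

For example, one second of 16 kHz audio written with sample_rate=44100 reports a duration of about 0.36 s.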

And config_flow.py:

import voluptuous as vol
from homeassistant import config_entries
from homeassistant.core import callback
from . import DOMAIN

class AudioReceiverFlowHandler(config_entries.ConfigFlow, domain=DOMAIN):
    VERSION = 1
    CONNECTION_CLASS = config_entries.CONN_CLASS_LOCAL_PUSH

    @staticmethod
    @callback
    def async_get_options_flow(config_entry):
        return OptionsFlowHandler(config_entry)

    async def async_step_user(self, user_input=None):
        if user_input is not None:
            return self.async_create_entry(title="Audio Receiver", data=user_input)

        return self.async_show_form(
            step_id="user",
            data_schema=vol.Schema(
                {
                    vol.Required("host", default="0.0.0.0"): str,
                    vol.Required("port", default=12345): int,
                    vol.Required("save_path", default="/media/audio"): str,
                }
            ),
        )

class OptionsFlowHandler(config_entries.OptionsFlow):
    def __init__(self, config_entry):
        self.config_entry = config_entry

    async def async_step_init(self, user_input=None):
        if user_input is not None:
            return self.async_create_entry(title="", data=user_input)

        return self.async_show_form(
            step_id="init",
            data_schema=vol.Schema(
                {
                    vol.Required("host", default=self.config_entry.options.get("host", "0.0.0.0")): str,
                    vol.Required("port", default=self.config_entry.options.get("port", 12345)): int,
                    vol.Required("save_path", default=self.config_entry.options.get("save_path", "/media/audio")): str,
                }
            ),
        )

And configuration.yaml:

audio_receiver:
  host: "0.0.0.0"
  port: 12345
  save_path: "/media/audio" # the audio folder must be created beforehand

I start the capture either via the button or via the touch pin. I can see the UDP packets being sent and received, and after some time of inactivity the receiver saves the buffer to a WAV file. The file is created, but I can’t hear anything relevant in it :( The capture time is 5 seconds, but I get only a 1-second file with some noise.
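One plausible explanation for the 5-second capture turning into a roughly 1-second file, assuming the mic ran at 16 kHz and every packet arrived: sendto() was given size(), the element count, so only half of each buffer’s bytes were sent (as found later in the thread), and save_as_wav() labels the data as 44100 Hz. A quick check of that arithmetic; expected_wav_seconds is a hypothetical helper:

```python
def expected_wav_seconds(capture_s: float, mic_rate: int, wav_rate: int,
                         fraction_sent: float, bytes_per_sample: int = 2) -> float:
    """Duration a WAV player would report for the received data."""
    bytes_produced = capture_s * mic_rate * bytes_per_sample
    bytes_received = bytes_produced * fraction_sent
    frames = bytes_received / bytes_per_sample   # mono: 1 frame per sample
    return frames / wav_rate


# Original setup: half the bytes sent, header says 44100 Hz.
print(round(expected_wav_seconds(5, 16000, 44100, 0.5), 2))  # 0.91
# With the byte-count fix and a matching header rate.
print(expected_wav_seconds(5, 16000, 16000, 1.0))            # 5.0
```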

Oh neat. This is at least a good start. I’m not too familiar with ESPHome programming, but let me try reading the docs. Thanks!

If the goal is to transmit audio from an ESP in a doorbell to a receiver indoors, why would you even use ESPHome? There are already several transmitter/receiver projects available online that you could use.

I made it. Partially.
There was an update, documented only in the ESPHome changelog, that changed x to a vector of int16_t samples.
I also had just noise until I changed that. Now I have choppy but clear voice.

I test with VLC at “udp://@:12345” with the options “:network-caching=1000 :demux=rawaud :rawaud-channels=1 :rawaud-samplerate=16000”.
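To check the VLC settings (or the Home Assistant receiver) without a working microphone, a PC can stand in for the ESP and send a test tone in the same format: mono int16 little-endian samples at 16 kHz, 256 samples per packet as in the lambdas later in the thread. A sketch; tone_packets and send_tone are hypothetical helpers, and host/port are whatever your receiver listens on:

```python
import math
import socket
import struct
import time

SAMPLE_RATE = 16000       # must match :rawaud-samplerate
SAMPLES_PER_PACKET = 256  # packet size used by the ESPHome lambdas


def tone_packets(freq_hz: float, seconds: float, amplitude: int = 8000):
    """Yield UDP payloads of raw int16 little-endian sine samples."""
    total = int(SAMPLE_RATE * seconds)
    for start in range(0, total, SAMPLES_PER_PACKET):
        count = min(SAMPLES_PER_PACKET, total - start)
        samples = [
            round(amplitude * math.sin(2 * math.pi * freq_hz * (start + i) / SAMPLE_RATE))
            for i in range(count)
        ]
        yield struct.pack(f"<{count}h", *samples)


def send_tone(host: str, port: int, freq_hz: float = 440.0,
              seconds: float = 5.0, realtime: bool = True) -> int:
    """Send the tone over UDP and return the number of bytes sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for packet in tone_packets(freq_hz, seconds):
            sent += sock.sendto(packet, (host, port))
            if realtime:
                # Pace packets at the audio rate so the player isn't flooded.
                time.sleep(len(packet) / 2 / SAMPLE_RATE)
    return sent
```

For example, with VLC listening on the same machine, send_tone("127.0.0.1", 12345) should play a 5-second 440 Hz tone if the rawaud options match.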


I’ve got the same setup running, VLC and an INMP441 hooked up to an Olimex ESP32-C3, but I’m only getting garbled noise. This is the YAML file:

substitutions:
  display_name: record

esphome:
  name: ${display_name}
  name_add_mac_suffix: true  
  platformio_options:
    board_build.mcu: esp32c3
    board_build.variant: esp32c3  
  includes:
    # should contain single line: #include <esp_task_wdt.h>
    - wdt_include.h
    # should contain #include <sys/socket.h>
    # and            #include <netinet/in.h>
    - std_includes.h 
  on_boot:
    then:
      - lambda: !lambda |-
          // increase watchdog timeout
          esp_task_wdt_init(90, false);

esp32:
  variant: ESP32C3
  board: esp32dev
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_BT_BLE_50_FEATURES_SUPPORTED: y
      CONFIG_BT_BLE_42_FEATURES_SUPPORTED: y
      CONFIG_COMPILER_OPTIMIZATION_PERF: y      
      CONFIG_ESP_TASK_WDT_TIMEOUT_S: "90"


status_led:  
  pin: 
    number: GPIO8
    inverted: true

logger: 


wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  fast_connect: true
  reboot_timeout: 1h  

  ap:
    ssid: !secret ap_ssid
    password: !secret ap_pass


web_server:
  auth:
    username: !secret web_username
    password: !secret web_password


i2s_audio:
  id: i2s_in
  i2s_lrclk_pin: 5 # ws
  i2s_bclk_pin: 6 # sck

microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_in
    id: mic
    adc_type: external
    i2s_din_pin: 4 # sd
    # bits_per_sample: 32bit
    # sample_rate: 32000
    pdm: false
    channel: right
    on_data:
      - lambda: |-
          for (int16_t byte : x) {
            id(audio_buffer).push_back(byte);
          }
          if (id(audio_buffer).size() >= 512) {
            int sock = ::socket(AF_INET, SOCK_DGRAM, 0);
            struct sockaddr_in destination;
            destination.sin_family = AF_INET;
            destination.sin_port = htons(12345);  //  UDP receiver port
            destination.sin_addr.s_addr = inet_addr("192.168.2.10");  //  UDP receiver IP

            ::sendto(sock, id(audio_buffer).data(), id(audio_buffer).size(), 0, reinterpret_cast<sockaddr*>(&destination), sizeof(destination));
            ::close(sock);
            id(audio_buffer).clear();
          }
globals:
  - id: is_capturing
    type: bool
    restore_value: no
    initial_value: "false"
  - id: audio_buffer
    type: std::vector<int16_t>
    restore_value: no
    initial_value: 'std::vector<int16_t>()'
  - id: sequence_number
    type: uint32_t
    restore_value: no
    initial_value: '0'


button:
  - platform: template
    name: "Record"
    on_press:
      - microphone.capture: mic
      - delay: 5s
      - microphone.stop_capture: mic

The changelog mentions the change from uint8_t here: ESPHome 2023.6.0 - 21st June 2023

Do you have any idea why I keep getting garbled noise? I’ve tried two different microphones, and they do seem to work with this project: stas-sl/esphome-sound-level-meter on GitHub

That component, however, uses its own I2S code and does some bit shifting:

i2s:
  bck_pin: 4
  ws_pin: 5
  din_pin: 6
  sample_rate: 48000            # default: 48000
  bits_per_sample: 32           # default: 32
  dma_buf_count: 8              # default: 8
  dma_buf_len: 256              # default: 256
  use_apll: true                # default: false

  # right shift samples.
  # for example if mic has 24 bit resolution, and
  # i2s configured as 32 bits, then audio data will be aligned left (MSB)
  # and LSB will be padded with zeros, so you might want to shift them right by 8 bits
  bits_shift: 8                 # default: 0
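The comment above describes 24-bit data arriving left-aligned in a 32-bit I2S slot. A small Python illustration of that arithmetic, assuming a 24-bit mic such as the INMP441 and a 16-bit target; i2s_word_to_int16 is a hypothetical helper, not code from either component:

```python
def i2s_word_to_int16(word: int, mic_bits: int = 24) -> int:
    """Turn a left-aligned 32-bit I2S word into a signed 16-bit sample."""
    if word & 0x80000000:               # reinterpret as signed 32-bit
        word -= 1 << 32
    sample = word >> (32 - mic_bits)    # like bits_shift: drop the zero padding
    return sample >> (mic_bits - 16)    # keep the 16 most significant bits


print(hex(i2s_word_to_int16(0x12345600)))  # 0x1234
print(i2s_word_to_int16(0xFFFFFF00))       # -1
```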

I solved my issue thanks to this genius: https://www.reddit.com/r/Esphome/comments/14f5mdf/i2s_sound_sampling_rate_anomalies/

id(audio_buffer).size() is not the right length to pass to sendto(): it is the number of elements, not the number of bytes. Since each int16_t sample is 2 bytes, multiplying it by 2 (" *2") solved my issue! With this I was able to send a clear audio stream to the VLC settings I posted earlier.
As you can see, I also made some of the variables global so they aren’t set again on every loop.
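The same size-versus-bytes distinction can be seen in Python with the array module, where len() counts elements like std::vector::size() and itemsize plays the role of sizeof(int16_t). A minimal sketch, assuming a 256-sample buffer:

```python
from array import array

audio_buffer = array("h", [0] * 256)   # "h" = signed short, 2 bytes here

elements = len(audio_buffer)                    # like std::vector::size()
byte_count = elements * audio_buffer.itemsize   # like size() * sizeof(int16_t)
print(elements, byte_count, len(audio_buffer.tobytes()))  # 256 512 512
```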

esphome:
  name: "${name}"
  on_boot:
  - priority: 210.0
      #before MQTT
    then:
    - lambda: |-
        id(destination).sin_family = AF_INET;
        id(destination).sin_port = htons(12345);  //  UDP receiver port
        id(destination).sin_addr.s_addr = inet_addr("192.168.XX.XX");  //  UDP receiver IP

globals:
  - id: mqtt_mic_active
    type: unsigned long
    initial_value: '0'
  - id: audio_buffer
    type: std::vector<int16_t>
    restore_value: no
    initial_value: 'std::vector<int16_t>()'
  - id: sock
    type: int
    restore_value: no
  - id: destination
    type: "struct sockaddr_in"
    restore_value: no

i2s_audio:
  id: i2s_in
  i2s_lrclk_pin: GPIO26
    #WS
  i2s_bclk_pin: GPIO25
    #SCK
microphone:
  - platform: i2s_audio
    i2s_audio_id: i2s_in
    id: inmp441_mic
    adc_type: external
    i2s_din_pin: GPIO33
        #SD
    pdm: false
    use_apll: false
    bits_per_sample: 32bit
        #scaled down to 16bit
    sample_rate: 16000
    channel: right
        #If the L/R pin (4) is low, the left channel should be active, otherwise the right channel;
        #this seems to be swapped in ESPHome...
        # -> right = low
    on_data:
      #The on_data trigger (and the internal callback) for the microphone now provides std::vector<int16_t>
      - lambda: |-
          for (int16_t sample : x) {
            id(audio_buffer).push_back(sample);
          }
          if(id(audio_buffer).size() >= 256) {
            id(sock) = ::socket(AF_INET, SOCK_DGRAM, 0);
            // size() counts int16_t samples; *2 converts it to bytes for sendto()
            ::sendto(id(sock), id(audio_buffer).data(), id(audio_buffer).size() *2, 0, reinterpret_cast<sockaddr*>(&id(destination)), sizeof(id(destination)));
            ::close(id(sock));
            id(audio_buffer).clear();
          }

Hopefully someone else finds this useful.
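For reference, the numbers implied by this config (sample_rate: 16000, 16-bit samples, one packet per 256 samples) work out as below; a back-of-the-envelope check, not measured data:

```python
SAMPLE_RATE = 16000       # sample_rate in the YAML above
SAMPLES_PER_PACKET = 256  # flush threshold in the on_data lambda
BYTES_PER_SAMPLE = 2      # int16_t

payload_bytes = SAMPLES_PER_PACKET * BYTES_PER_SAMPLE   # per UDP packet
packet_ms = 1000 * SAMPLES_PER_PACKET / SAMPLE_RATE     # audio per packet
packets_per_second = SAMPLE_RATE / SAMPLES_PER_PACKET
stream_bytes_per_second = SAMPLE_RATE * BYTES_PER_SAMPLE
print(payload_bytes, packet_ms, packets_per_second, stream_bytes_per_second)
# 512 16.0 62.5 32000
```

A 512-byte payload fits comfortably in one datagram on a 1500-byte MTU link (at most 1472 bytes of UDP payload over IPv4 without fragmentation), and VLC’s :network-caching=1000 corresponds to roughly one second, about 32000 bytes, of this stream.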


Thank you for this! My mic needs the left channel, but otherwise the choppiness issue mentioned earlier is resolved.

This led to a pull request, because the on_capturing condition for the microphone does not work:

Cheers

I tested the code on an ESP32, but the output is all noise. My ESPHome version is 2024.9.

If anyone else stumbles on this and doesn’t have VLC, I had some success getting a (somewhat distorted) stream using MPV with the above esphome code:

mpv --no-resume-playback udp://0.0.0.0:12345 -v --demuxer=rawaudio --demuxer-rawaudio-channels=1 --demuxer-rawaudio-rate=16000 --demuxer-rawaudio-format=s16be
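On byte order: the ESP32 family (including the C3) is little-endian, and the lambdas send the int16_t buffer’s memory as-is, so the stream should be s16le. Reading it as s16be swaps the two bytes of every sample, which may explain at least part of the distortion; --demuxer-rawaudio-format=s16le is probably the better match. A quick Python demonstration:

```python
import struct

sample = 1000                        # one int16_t sample on the ESP32
raw = struct.pack("<h", sample)      # little-endian bytes, as sent over UDP
print(raw.hex())                     # e803
print(struct.unpack("<h", raw)[0])   # 1000  -> read as s16le: correct
print(struct.unpack(">h", raw)[0])   # -6141 -> read as s16be: byte-swapped
```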

Works great, thanks a lot for posting this!

By the way, the ugly globals can be replaced with static variables inside the lambda, which makes the code cleaner:

    ...
    on_data:
      - lambda: |-
          static std::vector<int16_t> audio_buffer;
          static struct sockaddr_in  destination = {
            .sin_family = AF_INET,
            .sin_port = htons(12345),
            .sin_addr = { .s_addr = inet_addr("192.168.X.X") }
          };

          for (int16_t sample : x) {
            audio_buffer.push_back(sample);
          }
          if(audio_buffer.size() >= 256) {
            int sock = ::socket(AF_INET, SOCK_DGRAM, 0);
            // size() counts int16_t samples; *2 converts it to bytes for sendto()
            ::sendto(sock, audio_buffer.data(), audio_buffer.size() *2, 0, reinterpret_cast<sockaddr*>(&destination), sizeof(destination));
            ::close(sock);
            audio_buffer.clear();
          }