Is there a way to stream audio from one ESPHome device to another?

You are right… my target is to dump it to a file and then have Google transcribe it to text.

How do I call it? Running
python rec_stream_to_file.py 12345 test.wav None 1 true true
doesn’t return anything.

The code I pasted is just a Python function. You’ll have to write your own bit of Python ‘main’ code in the file to parse your command line arguments and call the function. (See sys.argv in Python docs.)
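As a rough illustration of the sys.argv route for the positional call shown above, here is a minimal sketch. The argument order (port, file, max_length, timeout, feedback, daemon) and the helper names args_from_argv, _optional and _flag are assumptions made up for this example, not part of the posted function. The important detail is that everything after file is handed to server() by keyword.

```python
def _optional(text, convert):
    """Treat the literal word 'None' on the command line as Python None."""
    return None if text == "None" else convert(text)


def _flag(text):
    """Interpret 'true'/'1'/'yes'/'on' (any case) as True, anything else as False."""
    return text.strip().lower() in ("1", "true", "yes", "on")


def args_from_argv(argv):
    """Convert six positional command-line words into server() arguments.

    Assumed (hypothetical) order: PORT FILE MAX_LENGTH TIMEOUT FEEDBACK DAEMON.
    Returns (port, file, keyword_arguments).
    """
    if len(argv) != 6:
        raise SystemExit(
            "usage: rec_stream_to_file.py PORT FILE MAX_LENGTH TIMEOUT FEEDBACK DAEMON")
    port, file, max_length, timeout, feedback, daemon = argv
    keywords = {
        "max_length": _optional(max_length, int),
        "timeout": _optional(timeout, int),
        "feedback": _flag(feedback),
        "daemon": _flag(daemon),
    }
    return int(port), _optional(file, str), keywords
```

At the bottom of the file that defines server(), this could be used as port, file, keywords = args_from_argv(sys.argv[1:]) followed by server(port, file, **keywords), with import sys at the top.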

Ah OK, now I understand.

I think I need some more help…
I am passing the args like this:

import argparse
import pathlib

parser = argparse.ArgumentParser()
parser.add_argument("-p","--port", type=int)
parser.add_argument("-f","--file", type=str)
parser.add_argument("-l","--max_length", type=int)
parser.add_argument("-t","--max_timeout", type=int)
parser.add_argument("-fb","--feedback", type=bool)
parser.add_argument("-dae","--daemon", type=bool)
args = parser.parse_args()

Then, after the function, I open the file and call server():

temp_file = open(args.file, "w")
server(args.port,temp_file,args.max_length,args.max_timeout,args.feedback,args.daemon)

But I get an error when calling it:

pi@iobroker:/opt/iobroker/iobroker-data/esphome.0 $ sudo python rec_stream_to_file.py -p 12345 -f /test.wav -l 5  -t 1 -fb true -dae true
Traceback (most recent call last):
  File "/opt/iobroker/iobroker-data/esphome.0/rec_stream_to_file.py", line 137, in <module>
    server(args.port,temp_file,args.max_length,args.max_timeout,args.feedback,args.daemon)
TypeError: server() takes from 1 to 2 positional arguments but 6 were given

Any idea? Here is the complete file:

import argparse
import pathlib

from pathlib import Path
from socketserver import BaseRequestHandler, UDPServer
from threading import Thread
from time import sleep, monotonic_ns
from wave import open as wave_open

try:
    from pyaudio import PyAudio
except ModuleNotFoundError:
    PyAudio = None

STREAM_CHANNELS = 1
STREAM_WIDTH = 2
STREAM_RATE = 16000


def server(port, file=None, *, max_length=None, timeout=None,
           feedback=True, daemon=False):
    """Receive data on a UDP port and record to file or play as audio.

    Arguments:
        port       - port number on which to listen
        file       - file to which to write; if ending in '.wav' will
                     record as audio samples; if None will play audio
        max_length - if not None, stop after this number of seconds
                     from receipt of the first datagram
        timeout    - if not None, once a datagram has been received,
                     close file and return if datagrams don't arrive
                     faster than this period in seconds
        feedback   - if true, print a period on standard output for
                     each 4 KiB received & diagnostics at shutdown
        daemon     - if true, re-raise keyboard exception on exit
    """
    wv = False
    if file is not None:
        file = Path(file)
        wv = file.suffix.lower() == '.wav'

    activity_timestamp_ns = None
    start_timestamp_ns = None
    count = 0
    exception = None
    max_length_ns = None if max_length is None \
                                        else max_length * 1000000000
    timeout_ns = None if timeout is None else timeout * 1000000000
    needs_starting = False

    class Handler(BaseRequestHandler):
        def handle(self):
            nonlocal activity_timestamp_ns, start_timestamp_ns
            nonlocal count, needs_starting
            if wv:
                fh.writeframesraw(self.request[0])
            else:
                if needs_starting:
                    needs_starting = False
                    fh.start_stream()
                fh.write(self.request[0])
            previous_count = count
            count += len(self.request[0])
            if feedback and previous_count // 4096 != count // 4096:
                print('.', end='', flush=True)
            activity_timestamp_ns = monotonic_ns()
            if start_timestamp_ns is None:
                start_timestamp_ns = activity_timestamp_ns
    def read_stream():
        nonlocal exception
        with UDPServer(('0.0.0.0', int(port)), Handler) as server:
            thread = Thread(target=server.serve_forever)
            thread.start()
            try:
                while True:
                    sleep(1)
                    now_ns = monotonic_ns()
                    if timeout_ns is not None and \
                       activity_timestamp_ns is not None and \
                       now_ns - activity_timestamp_ns > timeout_ns:
                        break
                    if max_length_ns is not None and \
                       start_timestamp_ns is not None and \
                       now_ns - start_timestamp_ns > max_length_ns:
                        break
            except KeyboardInterrupt as e:
                exception = e
            if feedback:
                diagnostic = ' & removing empty file' if \
                                activity_timestamp_ns is None else ''
                print(f'\nshutting down{diagnostic}', flush=True)
            server.shutdown()
            thread.join()

    if file is not None:
        with (
            wave_open(str(file), 'wb') if wv else open(file, 'wb')
        ) as fh:
            if wv:
                fh.setnchannels(STREAM_CHANNELS)
                fh.setsampwidth(STREAM_WIDTH)
                fh.setframerate(STREAM_RATE)
            read_stream()
        if activity_timestamp_ns is None:
            file.unlink(missing_ok=True)
    else:
        if PyAudio:
            pya = PyAudio()
        else:
            raise ModuleNotFoundError(
                            'Install pyaudio for realtime streaming')
        needs_starting = True
        fh = pya.open(STREAM_RATE, STREAM_CHANNELS,
            pya.get_format_from_width(STREAM_WIDTH), output=True,
            start=not needs_starting
        )
        read_stream()
        fh.stop_stream()
        fh.close()
        pya.terminate()

    if exception and daemon:
        raise exception


if __name__ == "__main__":
	parser = argparse.ArgumentParser()
	parser.add_argument("-p","--port", type=int)
	parser.add_argument("-f","--file", type=str)
	parser.add_argument("-l","--max_length", type=int)
	parser.add_argument("-t","--max_timeout", type=int)
	parser.add_argument("-fb","--feedback", type=bool)
	parser.add_argument("-dae","--daemon", type=bool)
	args = parser.parse_args()

	temp_file = open(args.file, "w")

	server(args.port,temp_file,args.max_length,args.max_timeout,args.feedback,args.daemon)
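The traceback comes from the bare * in server()'s signature: every parameter after it (max_length, timeout, feedback, daemon) is keyword-only, so only port and file may be passed positionally. Two other things in this main block would likely cause trouble next: argparse's type=bool turns any non-empty string, including "false", into True, and server() expects a path (it calls Path(file) and opens the file itself, via the wave module for .wav), not an already-open file object. The sketch below is one way to address all three; str2bool, build_parser and run are hypothetical helper names, not part of the original script.

```python
import argparse


def str2bool(text):
    """Convert 'true'/'false' style strings to bool.

    argparse's type=bool is a trap: bool("false") is True because any
    non-empty string is truthy, so an explicit converter is used instead.
    """
    value = text.strip().lower()
    if value in ("1", "true", "yes", "on"):
        return True
    if value in ("0", "false", "no", "off"):
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {text!r}")


def build_parser():
    parser = argparse.ArgumentParser(
        description="Record a UDP audio stream to a file (or play it).")
    parser.add_argument("-p", "--port", type=int, required=True)
    # Pass the *path*: server() opens the file itself, so no open() here.
    parser.add_argument("-f", "--file", type=str, default=None)
    parser.add_argument("-l", "--max_length", type=int, default=None)
    parser.add_argument("-t", "--max_timeout", type=int, default=None)
    parser.add_argument("-fb", "--feedback", type=str2bool, default=True)
    parser.add_argument("-dae", "--daemon", type=str2bool, default=False)
    return parser


def run(argv, server_fn):
    """Parse argv and call server_fn, passing keyword-only options by name."""
    args = build_parser().parse_args(argv)
    return server_fn(
        args.port,
        args.file,
        max_length=args.max_length,
        timeout=args.max_timeout,  # the parameter is called 'timeout'
        feedback=args.feedback,
        daemon=args.daemon,
    )
```

In rec_stream_to_file.py the existing if __name__ == "__main__": block could then simply call run(sys.argv[1:], server) (with import sys at the top). With this parser, -fb false really does turn the dots off.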




Got it done… When I activate the UDP stream it prints dots, and it stops when I stop the stream.
The file is not empty…
but I can’t hear anything, so I’ll look at the ESP side again.

Finally I got it all working… The problem was that your config is for a PDM microphone.
Once I changed that and cleaned the build, I can now hear myself in the WAV file.
Many thanks!

Is there a way to control the switch defined in the package from the user YAML file?

Add “id: stream_audio” to the switch in the package file and you can control it with “switch.turn_on: stream_audio” from your actions. (Also “turn_off” and “toggle”.)

In recent ESPHome releases, I’m afraid my on_data callback is no longer valid.

Here’s a new version of the package file that works with newer ESPHome releases and has the switch ID. (You could add a substitution to choose pdm: true or pdm: false.)

##
# @file
#
# Streaming audio from a device with a mono 16kHz 32bit PDM I2S microphone.
# Change substitutions before including package to adapt GPIO assignments.
#
# Copyright (c) Spamfast 2024-2025
#
# @version $Id: basic-udp-microphone.yaml 2999 2025-06-11 16:05:11Z spamfast $
#

substitutions:
  i2s_lrclk_pin: "40"
  i2s_din_pin: "41"

globals:
- id: sock_fd
  type: volatile int
  initial_value: "-1"

esphome:
  project:
    name: under-the-mountain.basic-udp-microphone
    version: 1.0.3
  includes:
  - components/net_headers.h
  on_boot:
  - priority: -200
    then:
    - script.execute: update_server

wifi:
  on_connect:
    then:
    - script.execute: update_server
  on_disconnect:
    then:
    - script.execute: update_server

i2s_audio:
- id: i2saudio
  i2s_lrclk_pin:
    number: ${i2s_lrclk_pin}

microphone:
- platform: i2s_audio
  id: mic
  i2s_din_pin: ${i2s_din_pin}
  pdm: true
  sample_rate: 16000
  bits_per_sample: 32bit
  adc_type: external
  on_data:
    then:
    - lambda: |-
        #pragma GCC diagnostic push
        #pragma GCC diagnostic error "-Wall"
        auto type_val = x[0];
        using Buffer = std::vector<decltype(type_val)>;
        static Buffer buffer;
        static const size_t DATAGRAM_PAYLOAD_SIZE = 512;
        static_assert(DATAGRAM_PAYLOAD_SIZE % sizeof(Buffer::value_type) == 0, "item size not compatible with datagram size");
        static const size_t ITEMS_PER_FRAME = DATAGRAM_PAYLOAD_SIZE / sizeof(Buffer::value_type);
        if (id(sock_fd) < 0) {
          buffer.clear();
          buffer.shrink_to_fit();
        } else {
          buffer.reserve(ITEMS_PER_FRAME);
          for (const auto item : x) {
            static_assert(sizeof(item) == sizeof(Buffer::value_type), "fault with decltype");
            buffer.push_back(item);
            if (buffer.size() >= ITEMS_PER_FRAME) {
              (void) ::send(id(sock_fd), buffer.data(), buffer.size() * sizeof(Buffer::value_type), 0);
              buffer.clear();
              buffer.reserve(ITEMS_PER_FRAME);
            }
          }
        }
        #pragma GCC diagnostic pop

switch:
- platform: template
  name: Stream Audio
  id: stream_audio
  restore_mode: ALWAYS_OFF
  icon: mdi:record-rec
  lambda: "return id(mic).is_running();"
  turn_on_action:
    then:
    - script.execute: update_server
    - if:
        condition:
          lambda: "return id(sock_fd) >= 0;"
        then:
        - microphone.capture:
  turn_off_action:
    then:
    - microphone.stop_capture:

text:
- platform: template
  name: UDP Audio Target
  id: server
  icon: mdi:ip-outline
  optimistic: true
  mode: text
  restore_value: true
  entity_category: config
  on_value:
    then:
    - script.execute: update_server

text_sensor:
- platform: template
  name: UDP Audio Target
  id: server_socket
  icon: mdi:ip
  update_interval: never
  entity_category: diagnostic

script:
- id: update_server
  mode: queued
  then:
  - lambda: |-
      #pragma GCC diagnostic push
      #pragma GCC diagnostic error "-Wall"
      static std::string prev_address;
      static std::string prev_port;
      std::string server_txt = id(server).state;
      while (! server_txt.empty() && isspace(server_txt.back())) {
        server_txt.resize(server_txt.size() - 1);
      }
      while (! server_txt.empty() && isspace(server_txt.front())) {
        server_txt = server_txt.substr(1);
      }
      const auto colon = server_txt.find(':');
      std::string address(server_txt.substr(0, colon));
      std::string port((colon == server_txt.npos) ? std::string() : server_txt.substr(colon + 1));
      while (! address.empty() && isspace(address.back())) {
        address.resize(address.size() - 1);
      }
      while (! port.empty() && isspace(port.front())) {
        port = port.substr(1);
      }
      if (server_txt.empty() || address.empty() || port.empty() || ! id(wlan).is_connected()) {
        id(mic).stop();
        if (! id(wlan).is_connected()) {
          id(server_socket).publish_state("offline");
        } else if (server_txt.empty()) {
          id(server_socket).publish_state("none");
        } else {
          id(server_socket).publish_state("malformed");
        }
        prev_address.clear();
        prev_port.clear();
        if (id(sock_fd) >= 0) {
          const int fd = id(sock_fd);
          id(sock_fd) = -1;
          (void) ::close(fd);
        }
      } else {
        if (prev_address != address || prev_port != port || id(sock_fd) < 0) {
          prev_address = address;
          prev_port = port;
          id(mic).stop();
          if (id(sock_fd) >= 0) {
            const int fd = id(sock_fd);
            id(sock_fd) = -1;
            (void) ::close(fd);
          }
          #pragma GCC diagnostic push
          #pragma GCC diagnostic ignored "-Wmissing-field-initializers"
          static const struct addrinfo HINTS = {
            .ai_family = AF_INET,
            .ai_socktype = SOCK_DGRAM,
          };
          #pragma GCC diagnostic pop
          struct addrinfo * addresses = nullptr;
          if (getaddrinfo(address.c_str(), port.c_str(), &HINTS, &addresses) != 0) {
            id(server_socket).publish_state("lookup-fail");
          } else if (addresses == nullptr) {
            id(server_socket).publish_state("no-addresses");
          } else {
            bool ipv4_found = false;
            for (const struct addrinfo * info = addresses; info != nullptr; info = info->ai_next) {
              if (info->ai_family == AF_INET) {
                ipv4_found = true;
                const int fd = ::socket(info->ai_family, SOCK_DGRAM, 0);
                if (fd >= 0) {
                  if (::connect(fd, info->ai_addr, info->ai_addrlen) != 0) {
                    (void) ::close(fd);
                  } else {
                    char txt[sizeof("255.255.255.255:65535")] = {};
                    const auto in4_sock_addr = reinterpret_cast<const struct sockaddr_in *>(info->ai_addr);
                    const auto ip_addr = ntohl(in4_sock_addr->sin_addr.s_addr);
                    std::snprintf(txt, sizeof(txt), "%u.%u.%u.%u:%u",
                      (unsigned) ((ip_addr >> 24) & 0xFFU),
                      (unsigned) ((ip_addr >> 16) & 0xFFU),
                      (unsigned) ((ip_addr >>  8) & 0xFFU),
                      (unsigned) ((ip_addr >>  0) & 0xFFU),
                      (unsigned) ntohs(in4_sock_addr->sin_port)
                    );
                    id(server_socket).publish_state(txt);                    
                    id(sock_fd) = fd;
                    break;
                  }
                }
              }
            }
            freeaddrinfo(addresses);
            if (id(sock_fd) < 0) {
              id(server_socket).publish_state(ipv4_found ? "socket/connect-fail" : "no-ipv4-address");
            }
          }
        }
      }
      #pragma GCC diagnostic pop
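For readers more at home in Python than in ESPHome C++ lambdas, here is a rough Python model of the two pieces of logic in the package above: the "host:port" parsing in the update_server script and the 512-byte datagram framing in the on_data lambda. The names parse_target and Framer are hypothetical; this only illustrates the behaviour and does not run on the ESP.

```python
DATAGRAM_PAYLOAD_SIZE = 512  # same constant as the on_data lambda


def parse_target(text):
    """Split a 'host:port' string roughly the way update_server does.

    Returns (address, port, status); status is 'none' for an empty string,
    'malformed' if the host or port part is missing, otherwise None.
    """
    text = text.strip()
    if not text:
        return "", "", "none"
    address, _, port = text.partition(":")  # split on the first colon only
    address, port = address.rstrip(), port.lstrip()
    if not address or not port:
        return address, port, "malformed"
    return address, port, None


class Framer:
    """Collect sample bytes and release fixed-size datagram payloads.

    Mirrors the static buffer in the on_data lambda: samples accumulate
    until a full payload is available; the remainder waits for next time.
    """

    def __init__(self, item_size, payload_size=DATAGRAM_PAYLOAD_SIZE):
        if payload_size % item_size:
            raise ValueError("item size not compatible with datagram size")
        self.item_size = item_size
        self.payload_size = payload_size
        self.buffer = bytearray()

    def reset(self):
        """Drop buffered data, as the lambda does while no socket is open."""
        self.buffer.clear()

    def feed(self, data):
        """Append whole samples and return the list of complete payloads."""
        if len(data) % self.item_size:
            raise ValueError("data must contain whole samples")
        self.buffer += data
        payloads = []
        while len(self.buffer) >= self.payload_size:
            payloads.append(bytes(self.buffer[:self.payload_size]))
            del self.buffer[:self.payload_size]
        return payloads
```

Sending fixed-size payloads keeps every datagram well below typical MTU limits while avoiding one tiny packet per on_data callback.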


I think this could be done much more simply using the built-in udp component:

i2s_audio:
  i2s_lrclk_pin: GPIOXX
  i2s_bclk_pin: GPIOYY

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIOZZ
    channel: left
    sample_rate: 48000
    bits_per_sample: 16bit
    i2s_mode: primary
    on_data:
      - udp.write:
          data: !lambda 'return x;'

udp:
  addresses: xx.xx.xx.xx # where to stream data
  port: 1234

That’s basically the whole config (apart from boilerplate).

Then I use ffplay or mpv to listen to the audio with minimal latency:

ffplay -fflags nobuffer -flags low_delay -f s16le -ar 48000 -ch_layout mono -probesize 32 -analyzeduration 0 udp://0.0.0.0:1234

mpv udp://0.0.0.0:1234 -v --demuxer=rawaudio --demuxer-rawaudio-channels=1 --demuxer-rawaudio-rate=48000 --demuxer-rawaudio-format=s16le --untimed --cache=no

ffplay also displays a nice spectrogram of the playing audio by default.

Regarding endianness: in my experiments the samples turned out to be little-endian, even though I previously saw s16be (big-endian) mentioned in this thread.
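ffplay and mpv are good for listening live, but for the original goal of dumping to a file for speech-to-text, a few lines of Python can capture the same raw stream into a WAV file. This is a minimal sketch assuming signed 16-bit PCM with the rate and channel count from the YAML (48000 Hz mono in the config above); capture_udp, to_little_endian_s16 and write_wav are hypothetical names. WAV stores 16-bit samples little-endian, so a big-endian (s16be) source would need its byte pairs swapped first.

```python
import socket
import wave


def to_little_endian_s16(pcm, source_byteorder="little"):
    """Return 16-bit PCM bytes in little-endian order, as WAV expects."""
    if len(pcm) % 2:
        raise ValueError("16-bit PCM must have an even number of bytes")
    if source_byteorder == "little":
        return bytes(pcm)
    swapped = bytearray(len(pcm))
    swapped[0::2] = pcm[1::2]  # swap the two bytes of every sample
    swapped[1::2] = pcm[0::2]
    return bytes(swapped)


def write_wav(path, pcm, rate=48000, channels=1, source_byteorder="little"):
    """Wrap raw 16-bit PCM in a WAV header and write it to path."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(to_little_endian_s16(pcm, source_byteorder))


def capture_udp(port, max_bytes, idle_timeout=1.0):
    """Collect raw datagrams until max_bytes or idle_timeout seconds of silence.

    Note: the idle timeout also applies before the first datagram arrives.
    """
    data = bytearray()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        sock.settimeout(idle_timeout)
        while len(data) < max_bytes:
            try:
                packet, _ = sock.recvfrom(65535)
            except socket.timeout:
                break
            data += packet
    return bytes(data)
```

For example, write_wav("test.wav", capture_udp(1234, 48000 * 2 * 10)) would record up to about ten seconds of 48 kHz mono audio.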


I looked at the udp: component and for the most basic use case it is indeed simpler.

However, the component only accepts a dotted-quad IPv4 address in the addresses: field, which means the generated firmware binary is tied to a fixed target IP address. You can’t even use a hostname or FQDN, so you can’t retarget the stream by changing a DNS record without reflashing.

So if you want to use a hostname/FQDN in your configuration, you have to call the POSIX getaddrinfo() C API at runtime anyway. You might then be able to use the esphome::udp::UDPComponent::add_address() C++ API; I don’t know, I haven’t tried it.

Before calling getaddrinfo() you have to wait until the WiFi (or Ethernet) is up and configured, and afterwards you can’t change the address, as there’s no API to remove addresses from the component instance. So if the network disconnects and reconnects (say, if your ESPHome device is mobile) and the second DNS lookup returns a different address, which is possible with split-horizon DNS, you have to reboot your ESPHome device.

Similarly, if you want to be able to configure the target host address/name at runtime the udp: component can’t handle it.

And since it only accepts dotted-quad, you can’t use IPv6 even if you use network: { enable_ipv6: true } in your configuration.

This rather limits the syslog: component too since it relies on udp:.
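To make the runtime-lookup idea concrete outside ESPHome, here is a minimal Python sketch of the same getaddrinfo() pattern the package's update_server script uses: resolve the name each time, keep only IPv4 results, and use the first address a UDP socket can connect to. Python's socket.getaddrinfo wraps the same C API; connect_udp_ipv4 is a hypothetical name and the status strings simply echo the ones the package publishes.

```python
import socket


def connect_udp_ipv4(host, port):
    """Resolve host now and return (connected UDP socket, 'ip:port').

    Only IPv4 (AF_INET) results are requested and the first address that
    connects wins. Because the lookup runs on every call, a changed DNS
    record is picked up the next time this runs. On failure returns
    (None, reason).
    """
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_DGRAM)
    except socket.gaierror:
        return None, "lookup-fail"
    if not infos:
        return None, "no-addresses"
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = socket.socket(family, socktype, proto)
        try:
            sock.connect(sockaddr)  # for UDP this only sets the default peer
        except OSError:
            sock.close()
            continue
        ip, peer_port = sockaddr[:2]
        return sock, f"{ip}:{peer_port}"
    return None, "socket/connect-fail"
```

After a reconnect, calling this again simply closes the old socket's role: the caller drops the previous socket and uses the new one, which is exactly what the udp: component cannot do today.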

Yes, a general-purpose DNS lookup module is needed for this and other applications, and one isn’t available in ESPHome at the moment.

Please help: the port opens, but there is no sound in the stream. I tried playing it in VLC (udp://192.168.22.200:1234) and with
ffplay -fflags nobuffer -flags low_delay -f s16le -ar 48000 -ch_layout mono -probesize 32 -analyzeduration 0 udp://192.168.22.200:1234
on Windows 10.

esphome:
  name: mic-test
  friendly_name: mic_test

esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf

# Enable logging
logger:

# Enable Home Assistant API
api:
  encryption:
    key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"

ota:
  - platform: esphome
    password: "xxxxxxxxxxxxxxxxx"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Mic-Test Fallback Hotspot"
    password: "xxxxxx"

captive_portal:



i2s_audio:
  i2s_lrclk_pin: GPIO45
  i2s_bclk_pin: GPIO9

microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO10
    channel: left
    sample_rate: 16000
    bits_per_sample: 16bit
    i2s_mode: primary
    on_data:
      - udp.write:
          data: !lambda 'return x;'

udp:
  addresses: 192.168.22.200 # where to stream data
  port: 1234

switch:
- platform: template
  name: Stream Audio
  id: stream_audio
  restore_mode: ALWAYS_OFF
  icon: mdi:record-rec
  lambda: "return id(mic).is_running();"
  turn_on_action:
    then:
    - microphone.capture: mic
  turn_off_action:
    then:
    - microphone.stop_capture: mic


The microphone on the device works poorly.

I took the pin assignments from:
https://github.com/RealDeco/xiaozhi-esphome

  # Hardware v2 pin mappings
  i2s_lrclk_pin: "45"        # I2S LRCLK (Word Select)
  i2s_bclk_pin: "9"          # I2S BCLK (Bit Clock)
  i2s_mclk_pin: "16"         # I2S MCLK (Master Clock)
  i2s_din_pin: "10"          # I2S Data In (Mic)
  i2s_dout_pin: "8"          # I2S Data Out (Speaker)

What am I doing wrong?
The firmware from that GitHub repo works, but it barely recognizes speech!

VLC: udp://192.168.22.200:1234 :network-caching=1000 :demux=rawaud :rawaud-channels=1 :rawaud-samplerate=16000
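One thing worth checking when there is no sound is whether datagrams arrive at all, and at what rate. Note that the ffplay command above uses -ar 48000 while the YAML sets sample_rate: 16000 (the VLC settings use 16000); a rate mismatch would change pitch and speed rather than cause silence, but it is easy to rule out. This is a minimal diagnostic sketch under the assumption of signed 16-bit mono samples; measure_byte_rate and guess_rate are hypothetical helpers. Close VLC/ffplay first, since normally only one program can bind the port.

```python
import socket
import time

COMMON_RATES = (8000, 16000, 22050, 32000, 44100, 48000)


def expected_bytes_per_second(rate, channels=1, bytes_per_sample=2):
    """Byte rate a raw PCM stream should have."""
    return rate * channels * bytes_per_sample


def guess_rate(bytes_per_second, channels=1, bytes_per_sample=2):
    """Map a measured byte rate to the nearest common sample rate."""
    frames_per_second = bytes_per_second / (channels * bytes_per_sample)
    return min(COMMON_RATES, key=lambda r: abs(r - frames_per_second))


def measure_byte_rate(port, seconds=5.0):
    """Average UDP payload bytes per second received on port."""
    total = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        sock.settimeout(0.5)
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            try:
                packet, _ = sock.recvfrom(65535)
            except socket.timeout:
                continue
            total += len(packet)
    return total / seconds
```

For example, rate = measure_byte_rate(1234) followed by print(rate, guess_rate(rate)). A reading of zero suggests the datagrams are not reaching the PC at all (a wrong target address or a firewall rule, for instance); about 32000 bytes/s fits 16 kHz 16-bit mono.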

Despite a lot of struggle I got it working.
The first hurdle was that the microphone component wouldn’t return any data in ‘x’; this was caused by a channel mismatch. I’m using a XIAO ESP32S3 Sense + expansion board with an MSM261D3526H1CPM microphone. The datasheet says it’s a half-cycle PDM mic, and the L/R pin is tied to ground, which SHOULD mean that the channel in ESPHome has to be set to right, but neither that nor left gave any data. Setting stereo finally caused x to contain data. I used @Spamfast’s Python function to receive the stream, but had to change STREAM_CHANNELS to 2.
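With channel: stereo the recording is interleaved left/right. If only one of the two channels actually carries the microphone signal (an assumption, not something reported here), the WAV can be reduced to mono afterwards. This is a minimal sketch for signed 16-bit interleaved PCM; extract_channel and channel_levels are hypothetical helpers, and the level check assumes a little-endian host (x86/ARM PCs), matching the little-endian stream.

```python
from array import array


def _samples(pcm):
    """Interpret raw bytes as signed 16-bit samples (host byte order)."""
    samples = array("h")
    samples.frombytes(pcm)
    return samples


def extract_channel(pcm, channel, channels=2):
    """Return one channel of interleaved 16-bit PCM as raw mono bytes."""
    return _samples(pcm)[channel::channels].tobytes()


def channel_levels(pcm, channels=2):
    """Peak absolute sample value per channel, to see which one has data."""
    samples = _samples(pcm)
    return [max((abs(s) for s in samples[c::channels]), default=0)
            for c in range(channels)]
```

The frames could be read with wave.open(path, "rb").readframes(...), checked with channel_levels(), and the louder channel written back with STREAM_CHANNELS = 1.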

This is my final config (only the relevant part):

udp:
  - id: xx
    addresses: ["10.10.11.2"]
    port: 1234

i2s_audio:
  i2s_lrclk_pin: GPIO42 

microphone:
  - platform: i2s_audio
    id: mic
    channel: stereo
    adc_type: external
    i2s_din_pin: GPIO41
    sample_rate: 16000
    bits_per_sample: 16bit
    pdm: true
    on_data:
      - udp.write:
          id: xx
          data: !lambda |-
            return x;


switch:
- platform: template
  name: Stream Audio
  id: stream_audio
  restore_mode: ALWAYS_OFF
  icon: mdi:record-rec
  lambda: "return id(mic).is_running();"
  turn_on_action:
    then:
    - microphone.capture:
  turn_off_action:
    then:
    - microphone.stop_capture:

Hello everyone,

Thank you very much for this topic; it seems to be exactly what I need: streaming audio from a microphone, a 3.5 mm jack, or Bluetooth to a server, processing it on the server, and then forwarding the stream to one or more receivers: Audio control, conferencing and multi-room audio

I have no experience with the ESP32 yet, so this may be a silly question. I’m considering using the ReSpeaker XVF3800 with the XIAO ESP32S3 to implement full-duplex audio streaming. The current idea is to receive sound from the microphone through I²S and send it to a server over WiFi/UDP (static IPv4), and to receive sound from the server over WiFi/UDP and pass it to a stereo amplifier/speakers. In addition, I need echo cancellation. It’s probably not doable with the built-in components only, right? Is it even doable with ESPHome?

I found this example configuration: Respeaker-XVF3800-ESPHome-integration/config/respeaker-xvf-satellite-example.yaml at 56bc71e0d43d77c359a62a8004a545999e7d3207 · formatBCE/Respeaker-XVF3800-ESPHome-integration · GitHub , but I haven’t found any reference to echo cancellation.

Hi, do you remember this post? I finally had time to dedicate to it. I had an active Claude Pro subscription at the time and wanted to test Claude Code thoroughly. It wasn’t easy because there are so many variables at play: I often had to restart from scratch and help with debugging, but in the end we got there. With the external component we wrote, you can get two-way audio. In future versions I might let you choose the audio direction, one way or the other. Currently it works in full duplex, so you can speak and hear from both devices. I still need to fix the echo that comes back into the microphone when the device plays sound. I noticed a mechanism in the ESP-IDF documentation that should allow this, but when I tried to implement it, the audio got significantly worse. Still, it’s a good start:
https://community.home-assistant.io/t/esphome-full-duplex-audio-intercom-because-i-was-bored-on-vacation/966706