You are right… my target is to dump it to a file and then get it translated to text with Google.
How do I call it? python rec_stream_to_file.py 12345 test.wav None 1 true true
doesn’t return anything.
The code I pasted is just a Python function. You’ll have to write your own bit of Python ‘main’ code in the file to parse your command line arguments and call the function. (See sys.argv in Python docs.)
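A minimal sketch of such a 'main' section, assuming the six positional arguments from the command line quoted above. `parse_cli` and `main` are hypothetical helper names, not part of the posted code. The options after `file` are passed by keyword because the function's signature declares them keyword-only.

```python
def parse_cli(argv):
    """Convert six command-line strings into arguments for server()."""
    def optional_number(text):
        # The literal word "None" means "no limit".
        return None if text.lower() == 'none' else float(text)

    def flag(text):
        return text.lower() in ('1', 'true', 'yes', 'on')

    port, file, max_length, timeout, feedback, daemon = argv
    return {
        'port': int(port),
        'file': None if file.lower() == 'none' else file,
        'max_length': optional_number(max_length),
        'timeout': optional_number(timeout),
        'feedback': flag(feedback),
        'daemon': flag(daemon),
    }


def main(argv, server_func):
    kwargs = parse_cli(argv)
    # port and file are positional; the remaining options go by keyword.
    server_func(kwargs.pop('port'), kwargs.pop('file'), **kwargs)


# At the bottom of rec_stream_to_file.py, below the server() definition:
# if __name__ == '__main__':
#     import sys
#     main(sys.argv[1:], server)
```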
Ah OK, now I understand.
I think I could use some more help…
I am passing the args:
import argparse
import pathlib
parser = argparse.ArgumentParser()
parser.add_argument("-p","--port", type=int)
parser.add_argument("-f","--file", type=str)
parser.add_argument("-l","--max_length", type=int)
parser.add_argument("-t","--max_timeout", type=int)
parser.add_argument("-fb","--feedback", type=bool)
parser.add_argument("-dae","--daemon", type=bool)
args = parser.parse_args()
Then, after the function, I open the file and call the server:
temp_file = open(args.file, "w")
server(args.port,temp_file,args.max_length,args.max_timeout,args.feedback,args.daemon)
But I get an error when calling it:
pi@iobroker:/opt/iobroker/iobroker-data/esphome.0 $ sudo python rec_stream_to_file.py -p 12345 -f /test.wav -l 5 -t 1 -fb true -dae true
Traceback (most recent call last):
File "/opt/iobroker/iobroker-data/esphome.0/rec_stream_to_file.py", line 137, in <module>
server(args.port,temp_file,args.max_length,args.max_timeout,args.feedback,args.daemon)
TypeError: server() takes from 1 to 2 positional arguments but 6 were given
Any idea? Here is the complete file:
import argparse
import pathlib
from pathlib import Path
from socketserver import BaseRequestHandler, UDPServer
from threading import Thread
from time import sleep, monotonic_ns
from wave import open as wave_open
try:
    from pyaudio import PyAudio
except ModuleNotFoundError:
    PyAudio = None

STREAM_CHANNELS = 1
STREAM_WIDTH = 2
STREAM_RATE = 16000

def server( port, file=None, *, max_length=None, timeout=None,
        feedback=True, daemon=False):
    """Receive data on a UDP port and record to file or play as audio.
    Arguments:
    port - port number on which to listen
    file - file to which to write; if ending in '.wav' will
           record as audio samples; if None will play audio
    max_length - if not None, stop after this number of seconds
                 from receipt of the first datagram
    timeout - if not None, once a datagram has been received,
              close file and return if datagrams doesn't arrive
              faster than this period in seconds
    feedback - if true, print a period on standard output for
               each 4kibytes received & diagnostics at shutdown
    daemon - if true, re-raise keyboard exception on exit
    """
    wv = False
    if file is not None:
        file = Path(file)
        wv = file.suffix.lower() == '.wav'
    activity_timestamp_ns = None
    start_timestamp_ns = None
    count = 0
    exception = None
    max_length_ns = None if max_length is None \
        else max_length * 1000000000
    timeout_ns = None if timeout is None else timeout * 1000000000
    needs_starting = False

    class Handler(BaseRequestHandler):
        def handle(self):
            nonlocal activity_timestamp_ns, start_timestamp_ns
            nonlocal count, needs_starting
            if wv:
                fh.writeframesraw(self.request[0])
            else:
                if needs_starting:
                    needs_starting = False
                    fh.start_stream()
                fh.write(self.request[0])
            previous_count = count
            count += len(self.request[0])
            if feedback and previous_count // 4096 != count // 4096:
                print('.', end='', flush=True)
            activity_timestamp_ns = monotonic_ns()
            if start_timestamp_ns is None:
                start_timestamp_ns = activity_timestamp_ns

    def read_stream():
        nonlocal exception
        with UDPServer(('0.0.0.0', int(port)), Handler) as server:
            thread = Thread(target=server.serve_forever)
            thread.start()
            try:
                while True:
                    sleep(1)
                    now_ns = monotonic_ns()
                    if timeout_ns is not None and \
                            activity_timestamp_ns is not None and \
                            now_ns - activity_timestamp_ns > timeout_ns:
                        break
                    if max_length_ns is not None and \
                            start_timestamp_ns is not None and \
                            now_ns - start_timestamp_ns > max_length_ns:
                        break
            except KeyboardInterrupt as e:
                exception = e
            if feedback:
                diagnostic = ' & removing empty file' if \
                    activity_timestamp_ns is None else ''
                print(f'\nshutting down{diagnostic}', flush=True)
            server.shutdown()
            thread.join()

    if file is not None:
        with (
            wave_open(str(file), 'wb') if wv else open(file, 'wb')
        ) as fh:
            if wv:
                fh.setnchannels(STREAM_CHANNELS)
                fh.setsampwidth(STREAM_WIDTH)
                fh.setframerate(STREAM_RATE)
            read_stream()
        if activity_timestamp_ns is None:
            file.unlink(missing_ok=True)
    else:
        if PyAudio:
            pya = PyAudio()
        else:
            raise ModuleNotFoundError(
                'Install pyaudio for realtime streaming')
        needs_starting = True
        fh = pya.open(STREAM_RATE, STREAM_CHANNELS,
            pya.get_format_from_width(STREAM_WIDTH), output=True,
            start=not needs_starting
        )
        read_stream()
        fh.stop_stream()
        fh.close()
        pya.terminate()
    if exception and daemon:
        raise exception

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("-p","--port", type=int)
    parser.add_argument("-f","--file", type=str)
    parser.add_argument("-l","--max_length", type=int)
    parser.add_argument("-t","--max_timeout", type=int)
    parser.add_argument("-fb","--feedback", type=bool)
    parser.add_argument("-dae","--daemon", type=bool)
    args = parser.parse_args()
    temp_file = open(args.file, "w")
    server(args.port,temp_file,args.max_length,args.max_timeout,args.feedback,args.daemon)
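The TypeError above comes from the bare `*` in the `server()` signature: everything after it (`max_length`, `timeout`, `feedback`, `daemon`) is keyword-only, so only `port` and `file` may be passed positionally. Two further pitfalls in the `__main__` block: `server()` expects a file name rather than an already opened file (it calls `Path(file)` and opens the file itself), and `type=bool` treats any non-empty string, including "false", as true. A possible corrected version, as a sketch only; the option names `--timeout` and `--no-feedback` and the helper names `build_parser` and `call_server` are assumptions, not from the original post:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('-p', '--port', type=int, required=True)
    parser.add_argument('-f', '--file', default=None)
    parser.add_argument('-l', '--max_length', type=float, default=None)
    parser.add_argument('-t', '--timeout', type=float, default=None)
    # Boolean switches: present means set, absent means default.
    parser.add_argument('--no-feedback', dest='feedback',
                        action='store_false')
    parser.add_argument('-d', '--daemon', action='store_true')
    return parser


def call_server(server_func, args):
    # Pass the file *name*; options after `file` go by keyword.
    server_func(args.port, args.file,
                max_length=args.max_length, timeout=args.timeout,
                feedback=args.feedback, daemon=args.daemon)


# In rec_stream_to_file.py, below the server() definition:
# if __name__ == '__main__':
#     call_server(server, build_parser().parse_args())
```

With this, the call would look like `python rec_stream_to_file.py -p 12345 -f /test.wav -l 5 -t 1`.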
Got it done… when I activate the UDP stream it counts dots, and it stops when I stop the stream.
The file is not empty…
but I can’t hear anything, so I will now search on the ESP side again.
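To tell a genuinely silent recording from a playback problem, the WAV file can be inspected directly. A minimal sketch using only the standard library, assuming 16-bit samples as set by STREAM_WIDTH = 2 in the script; `wav_summary` is a hypothetical helper name:

```python
import struct
import wave


def wav_summary(path):
    """Return channel count, rate, duration and peak amplitude of a WAV."""
    with wave.open(str(path), 'rb') as wf:
        if wf.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit samples')
        frames = wf.readframes(wf.getnframes())
        count = len(frames) // 2
        # WAV stores 16-bit samples as signed little-endian integers.
        samples = struct.unpack(f'<{count}h', frames[:count * 2])
        return {
            'channels': wf.getnchannels(),
            'rate': wf.getframerate(),
            'seconds': wf.getnframes() / wf.getframerate(),
            'peak': max((abs(s) for s in samples), default=0),
        }
```

A peak of zero, or one that stays in the single digits, suggests the microphone is delivering no real signal, which points at the device configuration rather than the receiver.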
Finally I got it all done… the problem was that your config is for a PDM microphone.
Once I changed that and cleaned the build, I can now hear myself in the WAV file.
Great, many thanks.
Is there a way to control the switch from the package in the user YAML file?
Add “id: stream_audio” to the switch in the package file and you can control it with “switch.turn_on: stream_audio” from your actions. (Also “turn_off” and “toggle”.)
In recent ESPHome releases, I’m afraid my on_data callback is no longer valid.
Here’s a new version of the package file that works with newer ESPHome releases and has the switch ID. (You could add a substitution to choose pdm: true or pdm: false.)
##
# @file
#
# Streaming audio from device with mono 16kHz 32bit PDM I2S microphone.
# Change substitutions before including package to adapt GPIO assignments.
#
# Copyright (c) Spamfast 2024-2025
#
# @version $Id: basic-udp-microphone.yaml 2999 2025-06-11 16:05:11Z spamfast $
#
substitutions:
  i2s_lrclk_pin: "40"
  i2s_din_pin: "41"
globals:
  - id: sock_fd
    type: volatile int
    initial_value: "-1"
esphome:
  project:
    name: under-the-mountain.basic-udp-microphone
    version: 1.0.3
  includes:
    - components/net_headers.h
  on_boot:
    - priority: -200
      then:
        - script.execute: update_server
wifi:
  on_connect:
    then:
      - script.execute: update_server
  on_disconnect:
    then:
      - script.execute: update_server
i2s_audio:
  - id: i2saudio
    i2s_lrclk_pin:
      number: ${i2s_lrclk_pin}
microphone:
  - platform: i2s_audio
    id: mic
    i2s_din_pin: ${i2s_din_pin}
    pdm: true
    sample_rate: 16000
    bits_per_sample: 32bit
    adc_type: external
    on_data:
      then:
        - lambda: |-
            #pragma GCC diagnostic push
            #pragma GCC diagnostic error "-Wall"
            auto type_val = x[0];
            using Buffer = std::vector<decltype(type_val)>;
            static Buffer buffer;
            static const size_t DATAGRAM_PAYLOAD_SIZE = 512;
            static_assert(DATAGRAM_PAYLOAD_SIZE % sizeof(Buffer::value_type) == 0, "item size not compatible with datagram size");
            static const size_t ITEMS_PER_FRAME = DATAGRAM_PAYLOAD_SIZE / sizeof(Buffer::value_type);
            if (id(sock_fd) < 0) {
              buffer.clear();
              buffer.shrink_to_fit();
            } else {
              buffer.reserve(ITEMS_PER_FRAME);
              for (const auto item : x) {
                static_assert(sizeof(item) == sizeof(Buffer::value_type), "fault with decltype");
                buffer.push_back(item);
                if (buffer.size() >= ITEMS_PER_FRAME) {
                  (void) ::send(id(sock_fd), buffer.data(), buffer.size() * sizeof(Buffer::value_type), 0);
                  buffer.clear();
                  buffer.reserve(ITEMS_PER_FRAME);
                }
              }
            }
            #pragma GCC diagnostic pop
switch:
  - platform: template
    name: Stream Audio
    id: stream_audio
    restore_mode: ALWAYS_OFF
    icon: mdi:record-rec
    lambda: "return id(mic).is_running();"
    turn_on_action:
      then:
        - script.execute: update_server
        - if:
            condition:
              lambda: "return id(sock_fd) >= 0;"
            then:
              - microphone.capture:
    turn_off_action:
      then:
        - microphone.stop_capture:
text:
  - platform: template
    name: UDP Audio Target
    id: server
    icon: mdi:ip-outline
    optimistic: true
    mode: text
    restore_value: true
    entity_category: config
    on_value:
      then:
        - script.execute: update_server
text_sensor:
  - platform: template
    name: UDP Audio Target
    id: server_socket
    icon: mdi:ip
    update_interval: never
    entity_category: diagnostic
script:
  - id: update_server
    mode: queued
    then:
      - lambda: |-
          #pragma GCC diagnostic push
          #pragma GCC diagnostic error "-Wall"
          static std::string prev_address;
          static std::string prev_port;
          std::string server_txt = id(server).state;
          while (! server_txt.empty() && isspace(server_txt.back())) {
            server_txt.resize(server_txt.size() - 1);
          }
          while (! server_txt.empty() && isspace(server_txt.front())) {
            server_txt = server_txt.substr(1);
          }
          const auto colon = server_txt.find(':');
          std::string address(server_txt.substr(0, colon));
          std::string port((colon == server_txt.npos) ? std::string() : server_txt.substr(colon + 1));
          while (! address.empty() && isspace(address.back())) {
            address.resize(address.size() - 1);
          }
          while (! port.empty() && isspace(port.front())) {
            port = port.substr(1);
          }
          if (server_txt.empty() || address.empty() || port.empty() || ! id(wlan).is_connected()) {
            id(mic).stop();
            if (! id(wlan).is_connected()) {
              id(server_socket).publish_state("offline");
            } else if (server_txt.empty()) {
              id(server_socket).publish_state("none");
            } else {
              id(server_socket).publish_state("malformed");
            }
            prev_address.clear();
            prev_port.clear();
            if (id(sock_fd) >= 0) {
              const int fd = id(sock_fd);
              id(sock_fd) = -1;
              (void) ::close(fd);
            }
          } else {
            if (prev_address != address || prev_port != port || id(sock_fd) < 0) {
              prev_address = address;
              prev_port = port;
              id(mic).stop();
              if (id(sock_fd) >= 0) {
                const int fd = id(sock_fd);
                id(sock_fd) = -1;
                (void) ::close(fd);
              }
              #pragma GCC diagnostic push
              #pragma GCC diagnostic ignored "-Wmissing-field-initializers"
              static const struct addrinfo HINTS = {
                .ai_family = AF_INET,
                .ai_socktype = SOCK_DGRAM,
              };
              #pragma GCC diagnostic pop
              struct addrinfo * addresses = nullptr;
              if (getaddrinfo(address.c_str(), port.c_str(), &HINTS, &addresses) != 0) {
                id(server_socket).publish_state("lookup-fail");
              } else if (addresses == nullptr) {
                id(server_socket).publish_state("no-addresses");
              } else {
                bool ipv4_found = false;
                for (const struct addrinfo * info = addresses; info != nullptr; info = info->ai_next) {
                  if (info->ai_family == AF_INET) {
                    ipv4_found = true;
                    const int fd = ::socket(info->ai_family, SOCK_DGRAM, 0);
                    if (fd >= 0) {
                      if (::connect(fd, info->ai_addr, info->ai_addrlen) != 0) {
                        (void) ::close(fd);
                      } else {
                        char txt[sizeof("255.255.255.255:65535")] = {};
                        const auto in4_sock_addr = reinterpret_cast<const struct sockaddr_in *>(info->ai_addr);
                        const auto ip_addr = ntohl(in4_sock_addr->sin_addr.s_addr);
                        std::snprintf(txt, sizeof(txt), "%u.%u.%u.%u:%u",
                          (unsigned) ((ip_addr >> 24) & 0xFFU),
                          (unsigned) ((ip_addr >> 16) & 0xFFU),
                          (unsigned) ((ip_addr >> 8) & 0xFFU),
                          (unsigned) ((ip_addr >> 0) & 0xFFU),
                          (unsigned) ntohs(in4_sock_addr->sin_port)
                        );
                        id(server_socket).publish_state(txt);
                        id(sock_fd) = fd;
                        break;
                      }
                    }
                  }
                }
                freeaddrinfo(addresses);
                if (id(sock_fd) < 0) {
                  id(server_socket).publish_state(ipv4_found ? "socket/connect-fail" : "no-ipv4-address");
                }
              }
            }
          }
          #pragma GCC diagnostic pop
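For readers less familiar with C++, the buffering done by the on_data lambda above (collect samples, send a datagram each time 512 bytes are available, keep the remainder) can be sketched in Python. This is an illustration of the idea only, not a port of the firmware code; `DatagramBuffer` and its `send` callback are hypothetical names.

```python
class DatagramBuffer:
    """Accumulate bytes and hand them on in fixed-size payloads."""

    def __init__(self, send, payload_size=512):
        self._send = send
        self._payload_size = payload_size
        self._pending = bytearray()

    def feed(self, data):
        self._pending.extend(data)
        # Emit as many full payloads as are available; keep the rest.
        while len(self._pending) >= self._payload_size:
            self._send(bytes(self._pending[:self._payload_size]))
            del self._pending[:self._payload_size]
```

Sending fixed-size payloads keeps every datagram a whole number of samples, which is what the static_assert in the lambda checks at compile time.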
I think this could be done much more simply using the built-in udp component:
i2s_audio:
  i2s_lrclk_pin: GPIOXX
  i2s_bclk_pin: GPIOYY
microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIOXX
    channel: left
    sample_rate: 48000
    bits_per_sample: 16bit
    i2s_mode: primary
    on_data:
      - udp.write:
          data: !lambda 'return x;'
udp:
  addresses: xx.xx.xx.xx # where to stream data
  port: 1234
That’s basically the whole config (except boilerplate).
Then I use ffplay or mpv to listen to the audio with minimal latency:
ffplay -fflags nobuffer -flags low_delay -f s16le -ar 48000 -ch_layout mono -probesize 32 -analyzeduration 0 udp://0.0.0.0:1234
mpv udp://0.0.0.0:1234 -v --demuxer=rawaudio --demuxer-rawaudio-channels=1 --demuxer-rawaudio-rate=48000 --demuxer-rawaudio-format=s16le --untimed --cache=no
By default, ffplay also displays a nice spectrogram of the audio being played.
Regarding endianness - in my experiments, it turned out to be little-endian, even though I previously saw s16be (big-endian) mentioned in this thread.
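If the byte order of a captured stream is in doubt, one rough heuristic is to decode a chunk both ways and see which interpretation gives the smoother waveform; byte-swapped audio tends to look like noise. A sketch under that assumption (it is a heuristic, not a guarantee, and needs a chunk containing actual sound rather than silence); the function names are hypothetical:

```python
import struct


def roughness(payload, byteorder):
    """Sum of absolute differences between consecutive 16-bit samples."""
    count = len(payload) // 2
    prefix = '<' if byteorder == 'little' else '>'
    samples = struct.unpack(f'{prefix}{count}h', payload[:count * 2])
    return sum(abs(b - a) for a, b in zip(samples, samples[1:]))


def guess_byteorder(payload):
    """Return 'little' or 'big', whichever decoding looks smoother."""
    little = roughness(payload, 'little')
    big = roughness(payload, 'big')
    return 'little' if little <= big else 'big'
```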
I looked at the udp: component and for the most basic use case it is indeed simpler.
However the component will only accept a dotted-quad IPv4 address in the addresses: field which means the binary firmware generated will be tied to a fixed IP address target. You can’t even use a hostname or FQDN so as to use a changed DNS server record to change the target without reflashing.
So if you want to use a hostname/FQDN in your configuration you have to use the getaddrinfo() C POSIX API at runtime anyway. You might then be able to use the esphome::udp::UDPComponent::add_address() C++ API - I don’t know, I’ve not tried it.
Before calling getaddrinfo() you’ll have to wait until the WiFi (or Ethernet) is up and configured, and you can’t change the address afterwards, as there’s no API to remove addresses from the component instance. So if the network disconnects and reconnects (say, if your ESPHome device is mobile) and the second DNS lookup returns a different result, which is possible with split-horizon DNS, then you have to reboot your ESPHome device.
Similarly, if you want to be able to configure the target host address/name at runtime the udp: component can’t handle it.
And since it only accepts dotted-quad, you can’t use IPv6 even if you use network: { enable_ipv6: true } in your configuration.
This rather limits the syslog: component too since it relies on udp:.
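The lookup-then-connect flow described above is the same on any POSIX-like system. Sketched in Python purely for illustration (the device itself would do this in C++ inside a lambda, as in the package file earlier in the thread); the helper names are hypothetical:

```python
import socket


def resolve_udp_target(host, port):
    """Return (ip, port) of the first IPv4 UDP address, or None."""
    try:
        infos = socket.getaddrinfo(host, port, family=socket.AF_INET,
                                   type=socket.SOCK_DGRAM)
    except socket.gaierror:
        return None  # lookup failed, e.g. network not up yet
    for _family, _type, _proto, _canonname, sockaddr in infos:
        return sockaddr
    return None


def open_udp_socket(target):
    """Create a UDP socket 'connected' to target so send() can be used."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.connect(target)
    return sock
```

Repeating the lookup after every network reconnect and replacing the socket when the result changes is what avoids the reboot mentioned above.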
Yes, a general-purpose DNS lookup module is needed for this and other applications, and that’s not available in ESPHome at the moment.
Please help: the port opens, but there is no sound in the stream! I tried to play it in VLC (udp://192.168.22.200:1234) as well as with
ffplay -fflags nobuffer -flags low_delay -f s16le -ar 48000 -ch_layout mono -probesize 32 -analyzeduration 0 udp://192.168.22.200:1234
on Windows 10.
esphome:
  name: mic-test
  friendly_name: mic_test
esp32:
  board: esp32-s3-devkitc-1
  framework:
    type: esp-idf
# Enable logging
logger:
# Enable Home Assistant API
api:
  encryption:
    key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"
ota:
  - platform: esphome
    password: "xxxxxxxxxxxxxxxxx"
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Mic-Test Fallback Hotspot"
    password: "xxxxxx"
captive_portal:
i2s_audio:
  i2s_lrclk_pin: GPIO45
  i2s_bclk_pin: GPIO9
microphone:
  - platform: i2s_audio
    id: mic
    adc_type: external
    i2s_din_pin: GPIO10
    channel: left
    sample_rate: 16000
    bits_per_sample: 16bit
    i2s_mode: primary
    on_data:
      - udp.write:
          data: !lambda 'return x;'
udp:
  addresses: 192.168.22.200 # where to stream data
  port: 1234
switch:
  - platform: template
    name: Stream Audio
    id: stream_audio
    restore_mode: ALWAYS_OFF
    icon: mdi:record-rec
    lambda: "return id(mic).is_running();"
    turn_on_action:
      then:
        - microphone.capture: mic
    turn_off_action:
      then:
        - microphone.stop_capture: mic
The microphone on the device works poorly.
The pin assignments are taken from:
https://github.com/RealDeco/xiaozhi-esphome
# Hardware v2 pin mappings
i2s_lrclk_pin: "45" # I2S LRCLK (Word Select)
i2s_bclk_pin: "9" # I2S BCLK (Bit Clock)
i2s_mclk_pin: "16" # I2S MCLK (Master Clock)
i2s_din_pin: "10" # I2S Data In (Mic)
i2s_dout_pin: "8" # I2S Data Out (Speaker)
What did I do wrong?!
The firmware from that repository works, but it barely recognizes speech!
VLC udp://192.168.22.200:1234
:network-caching=1000 :demux=rawaud :rawaud-channels=1 :rawaud-samplerate=16000
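One way to narrow a "no sound" problem down is to check on the receiving PC whether any datagrams arrive at all, independently of VLC or ffplay. A minimal sketch using only the standard library; it has to run on the machine whose address is in `addresses:`, with no other program bound to the port, and the helper names are hypothetical:

```python
import socket


def listen(port):
    """Bind a UDP socket on all local interfaces."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(('0.0.0.0', port))
    return sock


def collect_datagrams(sock, max_datagrams=10, timeout=2.0):
    """Return the sizes of up to max_datagrams datagrams received."""
    sock.settimeout(timeout)
    sizes = []
    try:
        while len(sizes) < max_datagrams:
            payload, _sender = sock.recvfrom(65535)
            sizes.append(len(payload))
    except socket.timeout:
        pass  # nothing (more) arrived within the timeout
    return sizes


# Example: print(collect_datagrams(listen(1234)))
```

An empty list while the Stream Audio switch is on would suggest the problem is on the device or network side (or a firewall on the PC); a list of sizes means data is arriving and the player options are the next thing to check.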
After a lot of struggle I got it working.
The first hurdle was that the microphone component wouldn’t return any data in ‘x’; this was caused by a mismatch in channel. I’m using a XIAO Sense ESP32S3 + expansion board with an MSM261D3526H1CPM microphone. The data sheet says it’s a half-cycle PDM mic, and the L/R pin is tied to ground, which SHOULD mean that the channel in ESPHome has to be set to right, but neither that nor left gave any data. Setting it to stereo finally caused x to contain data. I used @Spamfast’s Python function to receive the stream, but had to change it to STREAM_CHANNELS = 2.
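With channel set to stereo the stream carries interleaved left/right samples, which is why the receiver needs STREAM_CHANNELS = 2. If a mono recording is wanted, the two channels can be separated on the receiving side. A small sketch, assuming 16-bit little-endian samples (the byte order reported earlier in the thread); `split_stereo` is a hypothetical helper name:

```python
import struct


def split_stereo(payload):
    """Split interleaved 16-bit stereo bytes into (left, right) lists."""
    count = len(payload) // 2
    samples = struct.unpack(f'<{count}h', payload[:count * 2])
    # Even indices are the left channel, odd indices the right.
    return list(samples[0::2]), list(samples[1::2])
```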
This is my final config (only the relevant part):
udp:
  - id: xx
    addresses: ["10.10.11.2"]
    port: 1234
i2s_audio:
  i2s_lrclk_pin: GPIO42
microphone:
  - platform: i2s_audio
    id: mic
    channel: stereo
    adc_type: external
    i2s_din_pin: GPIO41
    sample_rate: 16000
    bits_per_sample: 16bit
    pdm: true
    on_data:
      - udp.write:
          id: xx
          data: !lambda |-
            return x;
switch:
  - platform: template
    name: Stream Audio
    id: stream_audio
    restore_mode: ALWAYS_OFF
    icon: mdi:record-rec
    lambda: "return id(mic).is_running();"
    turn_on_action:
      then:
        - microphone.capture:
    turn_off_action:
      then:
        - microphone.stop_capture:
Hello everyone,
thank you very much for this topic - it seems to be exactly what I need: streaming audio from a microphone, a 3.5 jack, or Bluetooth to a server, processing it on the server, and then forwarding the stream to one or multiple receivers: Audio control, conferencing and multi-room audio
I have no experience with the ESP32 yet, so this may be a silly question. I’m considering using the ReSpeaker XVF3800 with the XIAO ESP32S3 to implement full-duplex audio streaming. The current idea is to receive sound from the microphone through I²S and send it to a server over WiFi/UDP (static IPv4), and to receive sound from the server over WiFi/UDP and pass it to a stereo amplifier/speakers. In addition, I need echo cancellation. It’s probably not doable with the built-in components only, right? Is it even doable with ESPHome?
I found this example configuration: Respeaker-XVF3800-ESPHome-integration/config/respeaker-xvf-satellite-example.yaml at 56bc71e0d43d77c359a62a8004a545999e7d3207 · formatBCE/Respeaker-XVF3800-ESPHome-integration · GitHub , but I haven’t found any reference to echo cancellation.
Hi, do you remember this post? I finally had time to dedicate to it. I had an active Claude Pro subscription these days and wanted to thoroughly test Claude Code. It wasn’t easy because there are so many variables at play. I often had to restart it from scratch and help with debugging, but in the end we got there. With this external component we wrote, you can achieve two-way audio. In future versions, I might even modify it to choose the audio direction, whether it’s only one way or the other. Currently, it works in full duplex, so you can speak and hear from both devices. I also need to fix the echo that comes back into the microphone when the device emits sounds. I noticed a system in the esp-idf documentation that would allow this, but when I tried to implement it, the audio got significantly worse. But it’s a good start:
https://community.home-assistant.io/t/esphome-full-duplex-audio-intercom-because-i-was-bored-on-vacation/966706