🔔 ESPHome Full-Duplex Audio Intercom

I’m always happy to provide feedback!

Before you go down the trainer route, there are a bunch out there already, including mine. :slight_smile:

I based mine on @TaterTotterson’s and he then incorporated some of my stuff back into his. That’s not to say ours are perfect or fit everyone’s needs, so if you do decide to create your own, you can at least get some good ideas from ours.

Getting stuff merged into ESPHome can be a bit of a pain the first time so I wouldn’t blame you for making that a lower priority. It’s just something to think about when you’ve run out of other stuff to do. :slight_smile:

Great, then, if they already work well, I’ll avoid wasting time and use yours. From my research, I hadn’t seen yours in particular. I knew about Totter’s, but I HATE .ipynb notebooks. I started with a CUDA-only approach and was experimenting with IPA phonetics. Getting an English TTS to speak Italian to generate samples is a real pain. Then there is sample generation on top of sample generation, not just with LibriTTS but with other datasets too, then similar-sounding samples to use as negative examples, and then the classic training with noise, background noise, etc.

Yeah that was my reason for going the CLI route. Anyway, feel free to take whatever you need from my repo if it’ll help. The training process is kind of a black art so good luck!


Hi,
I found this awesome project while looking for a replacement for my Nuki Opener, so I’m really curious about the doorbell functionality.
Would it be possible to interface with analog intercoms?
The intercom currently installed runs 4+n wires (CA, speaker, mic, open, ground).
A remote door opening function would be great :slight_smile:

I’ll install it on an ESP32 and follow the development here.
Ben

The project is meant to be used with the new ESP devices that have speakers and mics (like the dozens of S3 variants from AliExpress, e.g. the xiaozhi balls).

There’s currently no support for analog intercoms, but if you can deal with the ADC/DAC portion, level shifting, dry-contact relays and the phone protocol (most intercoms are just good old PABX phones), you could make the intercom smart enough to interface with other devices.

Alternatively, if AzonInc succeeds, it could seriously pave the way for this type of use :crossed_fingers:

Hi, guys.

Sorry for being away so long, but life happens…

@meconiotech, I was finally able to test with xiaozhi-ball-v3-va-intercom.yaml, as you suggested long ago.

I had to fine-tune the wake word sensitivity a little because of some false-positive detections (using the Alexa wake word).
The Music Assistant integration still fails natively, but it seems to be a problem with all the native Home Assistant players.

The MA used to fail with:

2026-04-01 10:47:36.772 ERROR (MainThread) [aiohttp.server] Error handling request from 192.168.11.39
Traceback (most recent call last):
  File "/app/venv/lib/python3.13/site-packages/aiohttp/web_protocol.py", line 510, in _handle_request
    resp = await request_handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/venv/lib/python3.13/site-packages/aiohttp/web_app.py", line 569, in _handle
    return await handler(request)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/app/venv/lib/python3.13/site-packages/music_assistant/controllers/streams/streams_controller.py", line 633, in serve_queue_flow_stream
    output_format = await self.get_output_format(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<4 lines>...
    )
    ^
  File "/app/venv/lib/python3.13/site-packages/music_assistant/controllers/streams/streams_controller.py", line 1960, in get_output_format
    player_max_bit_depth = max(supported_bit_depths)
ValueError: max() iterable argument is empty

After adding this to my configuration.yaml, it works just fine:

homeassistant:
  customize:
    media_player.alexa_do_escritorio:
      supported_bit_depths:
        - 16
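
The `ValueError` at the bottom of the traceback is what Python's built-in `max()` raises for an empty iterable, which suggests the player was reporting no supported bit depths until the `customize` entry supplied one. A minimal sketch of that failure and of a defensive default; `pick_max_bit_depth` and the 16-bit fallback are hypothetical and are not Music Assistant's actual code:

```python
def pick_max_bit_depth(supported_bit_depths, fallback=16):
    """Return the largest reported bit depth, or `fallback` if none are reported."""
    # max() accepts a `default` keyword that is returned for an empty iterable.
    return max(supported_bit_depths, default=fallback)


try:
    max([])  # same failure mode as in the traceback
except ValueError as err:
    print("max() failed:", err)

print(pick_max_bit_depth([]))        # falls back instead of raising
print(pick_max_bit_depth([16, 24]))  # picks the largest reported depth
```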

Wake word detection during music playback is perfect, and mid-response too.

I finally think it’s ready for my use case, and I could not be more grateful for your amazing work.

I’ll now finally work on polishing some edges for my use case and create my own character (even though I love Troiaio, she’s your own devil to please :wink: )

But I won’t be gone for good. I hope that by the end of the month I’ll finally get two more ESP devices (two P4+C6 boards), one of them with a camera.
Then I’ll finally be able to help with the intercom part too.

Hi Will, nice to hear from you again. I’m glad everything is working well for you. I’ve put quite a bit of work into it, and at this point I’m actually using the project in every room at home.

Just one thing: be careful with the ESP32-C6. It only has a single core, and I haven’t personally tested it. Intercom will most likely work fine, but AEC, I2S full-duplex audio, media player, speaker, and GUI are all quite heavy tasks.

In the code, to make all of this coexist properly, I split the workload across multiple cores and enabled PSRAM wherever possible. With only one core, it’s likely you won’t be able to get the full experience.

As for the ESP32-P4, for now it’s the best ESP model I’ve worked with. I absolutely love it.

I also started using Music Assistant myself, since people kept talking about it on the forums, and so far I’m enjoying it. What I don’t understand is why I never had to make the 16-bit change on my side.

One thing I really can’t stand about MA is that I can’t seek through the timeline. For example, if I’m listening to a podcast, I can’t skip to the part I’m interested in and I’m forced to play it from beginning to end. I see the same problem even when I play content on a Google Nest.

For the assistant avatar, I hope it helps that I added a sort of theme support in the YAML, so you can build your own AI avatar however you like. For the idle animation, I usually generate a video with Gemini and then split the frames using FFmpeg.
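
For anyone who wants to script the frame-splitting step, here is a rough sketch that only builds the FFmpeg command line. The file names, the output pattern and the `fps` value are placeholders I made up, not what the project uses, and the output directory has to exist before FFmpeg runs:

```python
import shlex


def ffmpeg_split_command(video, out_dir="frames", fps=10):
    """Build an ffmpeg command that writes one numbered PNG per sampled frame."""
    return [
        "ffmpeg",
        "-i", video,                  # input video
        "-vf", f"fps={fps}",          # sample this many frames per second
        f"{out_dir}/frame_%04d.png",  # numbered output files
    ]


cmd = ffmpeg_split_command("idle.mp4")
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would actually run it.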


I have the same issue with seeking in MA. But I thought it was because I use YouTube Music as my main provider, and its implementation is “bad”, to say the least, since it lacks a proper API like the one Spotify has.

I’ll add some local files and give it a try later

As for the ESP32-C6, I meant a dual-chip P4+C6 board. P4s don’t have Bluetooth or Wi-Fi themselves, so they rely on a C6 for that. In my opinion (and in your experience), that is even better, as all the communication work is offloaded, leaving the P4 free for what matters.

In the coming days I’ll try to work on the little things I mentioned and on the character implementation, maybe “merging” it with the ones that already exist in @RealDeco’s repo (GitHub - RealDeco/xiaozhi-esphome: Alternative code to use xiaozhi ai devices in esphome/home assistant. · GitHub) for a good starting variety.


Using this, HA Assist, and Music Assistant would pretty much allow me to get off Alexa, well, almost, need a good grocery list integration. Been watching the FutureProof Homes Satellite 1 dev kit for this sort of usage as well.

I have the same goal.

What I’ve found so far about the shopping list is that HA support for it is the weak point. I’m trying a few add-ons, but they are all bad too, even for such a simple thing.

I’ve seen the Satellite1 project since its launch and it’s amazing, but much more expensive (though in line with its quality).

As my Echos get dumber and dumber, and Amazon seems to have abandoned the project, being able to use inexpensive devices such as the xiaozhi balls along with free-tier AI, or even 100% local, and get more than what I had back when Alexa first launched is simply perfect.

I just wish I could flash all my Echos (that should be possible, as I paid for them) just for the sound quality.

:warning: Major development update: PBX-lite rewrite / 2026.5.0-dev

Hi everyone,

I am preparing a major development update for ESPHome Intercom.

This is not a small feature release. It is a semantic cleanup of the whole project, and I want to explain it clearly before it lands because it will affect existing YAMLs.

The short version:

The next release will break existing YAMLs, but it is moving the project toward a much cleaner and more flexible architecture.

The project started as a way to make browser-to-ESP and ESP-to-Home-Assistant full-duplex intercom calls work. Over time, it grew into something much bigger: ESP-to-ESP calls, touch UIs, Voice Assistant coexistence, AEC/AFE audio processing, TCP and UDP transport, routing, call states, and Home Assistant integration.

At that point the old model was no longer enough.

:telephone_receiver: From PBX-like to PBX-lite

The project is moving from “PBX-like behavior” to a real PBX-lite model.

Every ESP is now treated as an independent extension.

That means an ESP can:

  • originate a call
  • receive a call
  • ring
  • answer
  • decline
  • hang up
  • report why a call ended
  • keep its own call state
  • call another ESP directly when possible

Home Assistant is no longer treated as the hidden owner of every call. It becomes another member of the PBX system: it can call ESPs, receive calls from ESPs, and optionally bridge calls between devices.

Do not be scared by the telephone terminology.

When I say PBX, extension, softphone, routing, or call state, I am not saying you must build a telephone system. You can still use this exactly like before: one ESP at the door, one Home Assistant card, auto-answer if you want an intercom, manual answer if you want it to ring first.

The point of the new model is that the simple case becomes just one clean case of the bigger system, instead of being held together by special-case logic.

A one-device setup is simply a PBX-lite setup with one destination in the phonebook: Home Assistant.

:notebook: Unified phonebook

The old split between separate TCP and UDP phonebooks is being replaced by a single protocol-aware phonebook.

The new logical format is:

Name|tcp|ip|tcp_port
Name|udp|ip|udp_audio_port|udp_control_port
Name|ha|ip|tcp_port|udp_audio_port|udp_control_port
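
To make the three entry shapes concrete, here is a minimal parsing sketch based only on the formats listed above. The `PhonebookEntry` class, the `parse_entry` helper, the device names and the IP addresses are all hypothetical, and how multiple entries are packed into the sensor is not covered here:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PhonebookEntry:
    name: str
    protocol: str
    ip: str
    tcp_port: Optional[int] = None
    udp_audio_port: Optional[int] = None
    udp_control_port: Optional[int] = None


# Port fields that follow "Name|protocol|ip", per protocol, in listed order.
PORT_FIELDS = {
    "tcp": ("tcp_port",),
    "udp": ("udp_audio_port", "udp_control_port"),
    "ha": ("tcp_port", "udp_audio_port", "udp_control_port"),
}


def parse_entry(line: str) -> PhonebookEntry:
    """Parse one pipe-separated phonebook entry."""
    parts = line.strip().split("|")
    if len(parts) < 3:
        raise ValueError(f"too few fields: {line!r}")
    name, protocol, ip, *ports = parts
    fields = PORT_FIELDS.get(protocol)
    if fields is None:
        raise ValueError(f"unknown protocol: {protocol!r}")
    if len(ports) != len(fields):
        raise ValueError(f"{protocol} entries need {len(fields)} port(s): {line!r}")
    return PhonebookEntry(name, protocol, ip,
                          **{f: int(p) for f, p in zip(fields, ports)})


print(parse_entry("Kitchen|tcp|192.168.1.20|6054"))
print(parse_entry("Home|ha|192.168.1.10|6054|6054|6055"))
```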

Home Assistant publishes one roster:

sensor.intercom_phonebook

The ESP firmware subscribes to that single roster and locally shapes it for its own transport.

This means:

  • TCP firmware keeps TCP peers direct.
  • UDP firmware keeps UDP peers direct.
  • TCP-to-UDP and UDP-to-TCP calls go through Home Assistant.
  • The real destination name is preserved even when HA is used as bridge.

So if a TCP ESP calls a UDP ESP, it does not pretend the destination is Home Assistant. The call still targets the real ESP, but the network endpoint points to HA because HA must bridge between protocols.
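
The local shaping rule can be sketched roughly like this. It is an illustration of the idea, not the firmware logic: `resolve_endpoint`, the dict layout and the addresses are hypothetical, and only `tcp` and `udp` peers are considered.

```python
def resolve_endpoint(local_protocol, peer, ha_ip):
    """Return (destination_name, endpoint_ip, route) for a call to `peer`.

    `peer` is a dict with "name", "protocol" and "ip" keys.
    """
    if peer["protocol"] == local_protocol:
        # Same transport: talk to the peer directly.
        return peer["name"], peer["ip"], "direct"
    # Different transport: keep the real destination name, but send
    # the traffic to Home Assistant so it can bridge the protocols.
    return peer["name"], ha_ip, "bridged"


garage = {"name": "Garage", "protocol": "udp", "ip": "192.168.1.30"}
print(resolve_endpoint("udp", garage, "192.168.1.10"))
print(resolve_endpoint("tcp", garage, "192.168.1.10"))
```

In both cases the destination name stays `Garage`; only the endpoint IP and the route change.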

:satellite: Direct ESP-to-ESP calls

Same-protocol devices can call each other directly.

Examples:

  • TCP ESP -> TCP ESP: direct call
  • UDP ESP -> UDP ESP: direct call
  • TCP ESP -> UDP ESP: bridged by Home Assistant
  • UDP ESP -> TCP ESP: bridged by Home Assistant

This is the important architectural change: the ESP devices are real call endpoints, not just Home Assistant-controlled speakers/microphones.

One way or another, the project will also include ESP-side auto-discovery.

The goal is that if multiple intercom ESPs are on the same network, they should be able to discover each other, merge those peers into the local phonebook, and call each other even before Home Assistant gets involved.

There is already work in progress around a generic ESPHome mdns_browser component for this. I am treating it as infrastructure, not as an intercom-only shortcut: it should be able to browse arbitrary mDNS service types and let YAML/components decide how to use those discovered services.

For the current dev baseline, the stable source of truth is still the Home Assistant-published phonebook. ESP-side discovery will be enabled in the standard YAMLs only after it has been validated long enough to avoid boot or reconnect instability.

:house_with_garden: Home Assistant as PBX

Home Assistant can now act in two ways.

First, Home Assistant can be a destination.

If the selected ESP destination is the Home Assistant instance name, for example Home, Office, or whatever you configured in HA settings, the Lovelace card behaves like a softphone. Pressing Call means Home Assistant calls the ESP, and the ESP sees an incoming call from HA.

Second, Home Assistant can be a bridge.

This is useful when:

  • two ESPs use different transports
  • you want the call path to go through HA
  • you want HA to keep visibility of a call
  • you explicitly enable HA-as-PBX routing

This mode is optional. Direct peer-to-peer is still the cleanest path when both devices can talk directly.

:iphone: Lovelace card behavior

The card now follows a simple rule.

If the selected ESP destination is another ESP, the card mirrors the ESP.

Pressing Call on the card presses Call on the ESP. The ESP originates the call. The card reflects the ESP state: outgoing, ringing, streaming, destination, caller, and end reason.

If the selected ESP destination is Home Assistant, the card behaves like a softphone.

This makes the card less magical and more predictable. It mirrors the device when the ESP owns the call, and it becomes a HA endpoint only when HA itself is selected.
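
Reduced to code, the rule is a single comparison. This is only an illustration; `card_mode` and the example names are hypothetical and the real card is not written in Python:

```python
def card_mode(selected_destination, ha_instance_name):
    """Return "softphone" when HA itself is the destination, else "mirror"."""
    if selected_destination == ha_instance_name:
        return "softphone"  # HA is the call endpoint
    return "mirror"         # the ESP owns the call; the card reflects its state


print(card_mode("Home", "Home"))
print(card_mode("Kitchen", "Home"))
```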

There is also a show_protocol option so the card can show TCP/UDP context and distinguish normal ESP-to-ESP calls from inter-protocol calls.

:no_entry_sign: Call reasons and Do Not Disturb

Call end reasons now travel through the protocol.

For example, if a device has Do Not Disturb enabled and another ESP calls it, the callee can send:

DND

The caller receives that reason and can show it on screen. Home Assistant and the Lovelace card preserve the reason instead of replacing it with a generic message.

Free-form reason strings are also supported. The PBX layer should forward the real reason, not invent a new one.

This matters because the ESP remains authoritative when the card is mirroring an ESP-to-ESP call. The card should show what the ESP reports, not reinterpret the call as if HA owned it.

:sound: Audio stack cleanup

The audio stack is being cleaned up into clearer layers:

  • i2s_audio_duplex: full-duplex I2S, speaker/mic paths, AEC reference capture, FIR decimation, runtime audio controls
  • esp_aec: lightweight echo cancellation path
  • esp_afe: full Espressif Audio Front-End path for advanced boards
  • audio_processor: shared interface used by both AEC and AFE processors

Generic full-experience YAMLs are moving toward the lighter AEC path by default. Full AFE remains for boards where it makes sense and has been validated.

AFE user controls are now aligned with board topology.

Single-mic boards expose:

  • Echo Cancellation
  • Noise Suppression
  • Auto Gain Control
  • Voice Activity Detector

Dual-mic boards expose:

  • Echo Cancellation
  • Speech Enhancement
  • Voice Activity Detector
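
As a quick reference, the two lists above can be expressed as a lookup keyed by microphone count. The `AFE_CONTROLS` table and `exposed_controls` helper are hypothetical names used only for this sketch:

```python
AFE_CONTROLS = {
    1: ["Echo Cancellation", "Noise Suppression",
        "Auto Gain Control", "Voice Activity Detector"],
    2: ["Echo Cancellation", "Speech Enhancement",
        "Voice Activity Detector"],
}


def exposed_controls(mic_count):
    """Return the AFE user controls exposed for a board with `mic_count` mics."""
    try:
        return AFE_CONTROLS[mic_count]
    except KeyError:
        raise ValueError(f"unsupported mic count: {mic_count}") from None


print(exposed_controls(1))
print(exposed_controls(2))
```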

Codec boards use hardware Master Volume through the codec path. Generic/no-codec boards can use software speaker volume. This avoids confusing media-player volume, intercom volume, and codec DAC volume.

:microphone: Voice Assistant coexistence

The full-experience YAMLs are still meant to combine intercom, full-duplex audio, media playback, wake word, and Home Assistant Voice Assistant on the same device.

The target is not “either intercom or voice assistant”.

The target is one ESP device that can be:

  • an intercom endpoint
  • a Voice Assistant satellite
  • a media playback endpoint
  • a PBX-lite extension

This is why the audio stack had to be cleaned up. Intercom audio, media playback, wake word, AEC/AFE processing, and VA all need to share the same hardware without stepping on each other.

:gear: TCP and UDP are both first-class

TCP is still the preferred reliable signaling/audio transport for many setups.

UDP remains important for simple low-latency direct ESP use cases and for users who want lightweight ESP-to-ESP behavior.

The point of the new architecture is not to kill one transport. The point is to make both transports fit into the same call model.

That is why the phonebook now carries the protocol, and why Home Assistant can bridge only when needed.

:warning: Breaking changes

The next release will break existing YAMLs.

Main changes:

  • versioning moves to calver: 2026.5.0
  • firmware and HA integration must be updated together
  • the wire protocol changed
  • PBX-lite becomes the default product model
  • the old simple/full split is gone
  • raw_udp remains as an explicit opt-in for raw audio use cases
  • TCP and UDP variants are explicit YAML files
  • the old split TCP/UDP phonebook model is replaced by a unified phonebook
  • UDP control uses port 6055
  • UDP audio and TCP still default to 6054
  • old UDP-specific intercom code is gone; UDP now lives inside intercom_api
  • old Home Assistant phonebook push/service patterns are being replaced by the unified phonebook subscription model
  • some device/entity naming is being cleaned up, so Home Assistant entities and automations may need to be renamed

Please do not update blindly when this lands.

Read the dev branch documentation first.

:books: Documentation to read before upgrading

The most important docs are:

  • README.md
  • docs/INTERCOM_PROTOCOL.md
  • docs/PHONEBOOK_PROTOCOL.md
  • docs/DEPLOYMENT_GUIDE.md
  • component READMEs under esphome/components/

The goal is to make the project easier to reason about long-term.

This may be uncomfortable during migration, but it gives the project a real foundation: independent ESP extensions, Home Assistant as an optional PBX member, protocol-aware routing, cleaner audio layers, and a phonebook model that can scale beyond the original doorbell use case.

:construction: Current status

This is still active development.

The stable baseline right now is:

  • Home Assistant publishes the unified phonebook.
  • ESPs subscribe to it.
  • ESP-to-ESP calls work when the phonebook gives them a compatible endpoint.
  • Home Assistant bridges when transport/protocol differences require it.
  • ESP-side mDNS discovery is being redesigned before it returns to the standard YAMLs.

So yes, the project is temporarily more disruptive, but the reason is to stop accumulating special cases and move toward a model that can scale.

The old system worked, but it was becoming difficult to explain and maintain.

The new system is stricter, but much easier to reason about.


Wow. Big changes here.

Nice to see that you've been working on the assistant moods too.
A few days ago I uploaded an ESPHome LVGL port of the RoboEyes repo.
It has been working fine for a few days, but it needs some attention to align it better with the moods (and the curious mode is killing me and my tokens - hahahah)

My port: GitHub - willrnsantana/robo_eyes_esphome · GitHub

The link to the original is in the README.

I'll try to get some time next week to catch up with all the changes and create some images (Troiaio still spooks me out when she pops up here and there! Typical! heheheh)


:rocket: v2026.6.0 - Intercom Native polish and Espressif GMF audio stack

This release is the next major step after the 2026.5.0 PBX-lite migration.

2026.5.x introduced the new call model: ESP devices as independent extensions, Home Assistant as a peer/bridge, unified phonebook, TCP/UDP routing and browser softphone support.

2026.6.0 keeps that model and rebuilds the audio foundation under it.

:house: Home Assistant / Intercom Native

The Home Assistant side has been cleaned up around the unified PBX-lite event model.

:repeat_button: Unified call event model

The integration and Lovelace card now use the unified intercom_native.call_event event shape for session, bridge and forward updates.

This gives automations and the card a more consistent view of:

  • call scope
  • event type
  • call state
  • hangup / decline / failure reason
  • bridge and forward lifecycle

The older split event behavior is no longer the preferred model.

:no_mobile_phones: Better unavailable-device handling

The card now handles unavailable ESP devices more explicitly instead of showing stale call controls as if the device were still reachable.

This should make dashboard state clearer when an ESP is offline, rebooting, being flashed, or temporarily disconnected from Home Assistant.

:high_voltage: Safer fast hangup / redial behavior

The browser softphone path has been hardened for fast user actions.

If a call is ended and another call starts immediately after, browser audio cleanup no longer tears down the new call's microphone/audio path by mistake.

This fixes a class of "second call has no browser audio" style problems.

:mobile_phone: Mobile notification answer flow

The documented mobile flow now supports real Answer / Decline actions:

  • Answer opens the dashboard view containing intercom-card with ?intercom_answer=1
  • the card requests microphone permission and starts the full-duplex browser/app audio path
  • Decline stays in Home Assistant automation logic and calls intercom_native.decline

This is the supported way to answer an ESP-originated call from the Home Assistant Companion app.
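
For the Answer action, the only part that needs care is adding the query parameter without clobbering anything already on the dashboard URL. A small sketch, assuming a dashboard path of `/lovelace/intercom` (that path is a placeholder; use the view that actually contains your intercom-card):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def answer_url(dashboard_path):
    """Return the dashboard path with intercom_answer=1 added to its query string."""
    parts = urlsplit(dashboard_path)
    query = dict(parse_qsl(parts.query))  # keep any existing parameters
    query["intercom_answer"] = "1"
    return urlunsplit(parts._replace(query=urlencode(query)))


print(answer_url("/lovelace/intercom"))
```

The resulting path is what the notification's Answer action would open.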

:broom: Versioned card cache behavior

The card is registered with a versioned frontend URL derived from the installed integration version.

After upgrading, hard-refresh the dashboard page or clear the Companion app cache if the card still shows an old version.

:white_check_mark: Minimum Versions

This release requires:

  • ESPHome: 2026.5.x or newer
  • Home Assistant Core: 2026.5.0 or newer

HACS metadata now declares the Home Assistant minimum version accordingly.

:warning: Breaking Changes

Custom YAMLs that still use the old audio component/package layout need to be migrated.

Main migration points:

  • maintained YAMLs now use esp_audio_stack
  • old i2s_audio_duplex packages are no longer the supported path
  • some YAML options were renamed:
    • speaker_volume -> master_volume
    • mic_attenuation -> input_gain
    • frame_buffers_in_psram -> buffers_in_psram
    • audio_stack_in_psram -> audio_task_stack_in_psram
  • Generic full profiles are split into AEC and AFE variants
  • full audio/LVGL profiles include OTA maintenance handling
  • old copied Lovelace card files should be replaced by the bundled card
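
The four option renames are mechanical, so they can be applied to an already-loaded options mapping with a small helper. This is a sketch under assumptions: `migrate_options` is hypothetical, it only touches the keys of the mapping you pass in, and where these options sit in your YAML depends on your profile.

```python
RENAMED_OPTIONS = {
    "speaker_volume": "master_volume",
    "mic_attenuation": "input_gain",
    "frame_buffers_in_psram": "buffers_in_psram",
    "audio_stack_in_psram": "audio_task_stack_in_psram",
}


def migrate_options(options):
    """Return a copy of `options` with the old option names replaced by the new ones."""
    return {RENAMED_OPTIONS.get(key, key): value for key, value in options.items()}


old = {"speaker_volume": "80%", "frame_buffers_in_psram": True, "sample_rate": 16000}
print(migrate_options(old))
```

Only the key names change; values are carried over as they are, so check the maintained profiles in case a renamed option also changed its accepted values.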

After upgrading, clear ESPHome build caches once before compiling:

find . -type d -name .esphome -prune -exec rm -rf {} +
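
On systems without `find` (for example a plain Windows shell), roughly the same cleanup can be done from Python. A sketch assuming the caches are directories named `.esphome` somewhere under the folder you pass in; `clear_esphome_caches` is a hypothetical helper and it deletes data, so point it at the right folder:

```python
import os
import shutil


def clear_esphome_caches(root):
    """Delete every directory named .esphome under `root`; return the removed paths."""
    removed = []
    for dirpath, dirnames, _filenames in os.walk(root):
        if ".esphome" in dirnames:
            target = os.path.join(dirpath, ".esphome")
            # Prune: never descend into the cache directory itself.
            dirnames.remove(".esphome")
            # Skip symlinks; only real directories are removed.
            if not os.path.islink(target):
                shutil.rmtree(target)
                removed.append(target)
    return removed
```

Calling `clear_esphome_caches(".")` from your ESPHome config folder mirrors the command above.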

:headphone: Audio Stack Migration

The biggest internal change in 2026.6.0 is the migration from the old custom duplex audio path to the new esp_audio_stack backend.

This replaces the maintained i2s_audio_duplex path.

The goal is not just a component rename. The new backend is built around Espressif / ESP-IDF audio components that are designed to work together:

  • esp_driver_i2s for official I2S channel ownership
  • esp_codec_dev for codec-backed devices
  • gmf_io / io_codec_dev for codec IO
  • esp_audio_effects for rate, bit-depth and layout conversion
  • esp-sr for Acoustic Echo Cancellation
  • gmf_ai_audio / esp_gmf_afe_manager for the full Audio Front-End pipeline

This means the project now carries less custom audio infrastructure and relies more directly on the Espressif audio ecosystem.

:light_bulb: Why This Matters

Earlier versions had custom code for a lot of low-level audio work:

  • I2S lifecycle
  • speaker/microphone glue
  • AEC reference routing
  • rate conversion
  • bit-depth conversion
  • channel layout conversion
  • ring buffers
  • processor feed/fetch timing
  • codec-specific assumptions

That worked, but it created too much maintenance pressure and too many board-specific edge cases.

With esp_audio_stack, the project is closer to the native ESP-IDF audio model while still exposing normal ESPHome surfaces above it:

  • microphone
  • speaker
  • media player
  • mixer
  • Voice Assistant
  • Micro Wake Word
  • intercom API
  • Home Assistant entities

:puzzle_piece: Supported Audio Shapes

The maintained profiles now cover these layouts through the new stack:

  • single-bus codec boards
  • single-bus no-codec boards
  • dual-bus MEMS mic + I2S amplifier boards
  • ES8311 stereo playback-reference boards
  • ES7210 + ES8311 TDM reference boards
  • dual-mic AFE boards
  • lightweight AEC-only Generic S3 profiles
  • full AFE profiles for larger flash/RAM layouts

Codec-backed devices use esp_codec_dev.

No-codec devices use official esp_driver_i2s channels directly, avoiding unnecessary codec/GMF IO dependencies on smaller builds.

:studio_microphone: AEC and AFE Profiles

Profiles are now split more clearly.

:feather: esp_aec

Use this for lightweight echo cancellation.

It is the default direction for:

  • intercom-only devices
  • Generic S3 full-experience profiles that need to fit smaller flash layouts
  • users who want Acoustic Echo Cancellation without the full Audio Front-End cost

:brain: esp_afe

Use this for the full Espressif Audio Front-End path.

It adds:

  • Acoustic Echo Cancellation
  • Noise Suppression
  • Automatic Gain Control
  • Voice Activity Detection
  • Speech Enhancement / Blind Source Separation on supported dual-mic boards

It is heavier, but it is the right direction for boards with enough flash/RAM and for full voice-device profiles.

:package: Generic Profile Split

Generic S3 full-experience YAMLs are now split by intended target:

  • generic-s3-full-aec-*

    • lightweight path
    • intended for 4 MB-friendly builds
    • uses standalone esp_aec
    • uses the lighter previous_frame reference
  • generic-s3-full-afe-*

    • full Audio Front-End path
    • intended for larger flash layouts
    • uses esp_afe
    • uses TYPE2-style software reference

This avoids pretending one Generic YAML can fit every board and every flash layout.

:speaker_high_volume: Better AEC Reference Handling

Echo cancellation quality depends heavily on the playback reference.

The new stack handles reference routing per topology:

  • ES8311 boards can use stereo digital feedback
  • ES7210 TDM boards can use a hardware TDM reference slot
  • no-codec Generic AEC profiles can use previous_frame
  • Generic AFE profiles can use TYPE2-style software reference

This is one of the main reasons for the audio migration. AEC quality depends on reference timing, channel layout and conversion path, not only on enabling a library.

:brain: Runtime and Memory Improvements

The migration also cleaned up runtime behavior:

  • large buffers and task stacks are allocated earlier
  • repeated heap churn during call/media transitions has been reduced
  • microphone and speaker wrapper loops wake on real events instead of spinning
  • intercom_api parks its loop when idle
  • intercom TX uses lower-copy reads where possible
  • full profiles place selected buffers/stacks in PSRAM
  • full LVGL/audio profiles enter OTA maintenance mode before flashing

This helps demanding full-experience devices where media playback, Piper TTS, Micro Wake Word, Voice Assistant, AFE/AEC and intercom all coexist.

:compass: Maintained Board Direction

Current maintained baseline:

  • Waveshare ESP32-S3 Audio Board: full AFE, dual mic, TDM reference
  • Spotpear Ball v2: codec-backed AFE/intercom profiles
  • Generic S3 AEC: lightweight 4 MB-friendly full-experience profiles
  • Generic S3 AFE: larger flash full AFE profiles
  • Generic dual-bus: maintained intercom profiles
  • Waveshare P4 Touch: present and improving, still board-specific/experimental

:test_tube: Validation

Before this release, the public YAMLs were switched to remote release mode so users can download only the YAML and let ESPHome fetch packages, assets and external components from main.

Validation performed:

  • HACS validation passes
  • hassfest validation passes
  • generic-s3-full-afe-tcp.yaml compiles successfully with ESPHome 2026.5.1
  • ESPHome fetches this repository from main
  • Espressif managed components resolve and build correctly

Generic full AFE firmware size from the validation build is about 2.1 MB.

:up_arrow: Upgrade Notes

Recommended upgrade path:

  1. Update the Home Assistant integration through HACS.
  2. Restart Home Assistant.
  3. Hard-refresh the dashboard page containing intercom-card.
  4. Clear ESPHome build cache once.
  5. Recompile from the updated YAMLs.
  6. Flash the ESP firmware.

If you maintain custom YAMLs, start from the closest maintained profile and reapply only your board-specific changes.


I finally got my hands on the overpriced P4+C6 3.4'' screen combo and will be able to help with the intercom tests, along with the xiaozhi ball!!

The speaker is quite a bit bigger than I expected. I'm hoping I'll finally have some decent audio.

BTW, the sound is surprisingly good even in open air (but has a lot of distortion at 100% volume). One of the cones is a mini sub. With a decent enclosure it may actually sound good enough to replace my Echos.

This is great! Happy to see this project moving forward, sounds like it is moving a lot closer to replacing Alexa for me, assuming I can get bluetooth proxy, PBX-lite, and Assist type interactions all working on the same ESP32 board, I'll be a happy camper!


The last two are already working. The BT proxy hasn't been tested, but I guess it should work, especially with the P4+C6 combo boards, as communication has its own processor in the C6.

As soon as I can, I'll start working on the interface. Another BT function I've never explored myself is BT speaker. Maybe I'll give it a try.

Hi, I'm glad the components are proving useful for you.

Yes, the system is constantly evolving. You know it, I also dream of having an assistant that's truly our own.

Anyway, these days I'm doing what I always do: trying to optimize everything as much as possible. Every time you add a new feature on ESPs, it feels like trying to cram more clothes into a suitcase that's already so full it barely closes.

A little preview: I've been doing some tests with SendSpin :heart:. I absolutely love it and can't wait to kick it straight into my YAMLs.

I assume your device uses the same P4 audio codecs as mine. In that case, the volume is controlled directly by the codec. When it reaches 100%, that's genuinely the maximum output the codec can provide; it's not a software calculation.

I'd recommend adding a limit so the volume can't be raised above 90%.
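
The limit itself is just a clamp. A sketch of the arithmetic, assuming volume is expressed as a 0.0 to 1.0 level; `limit_volume` is a hypothetical helper, and on the device you would apply the same idea wherever the volume gets set:

```python
def limit_volume(requested, ceiling=0.9):
    """Clamp a 0.0-1.0 volume level so it never exceeds `ceiling`."""
    return max(0.0, min(requested, ceiling))


print(limit_volume(1.0))
print(limit_volume(0.5))
```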
