ESP32 MP3 & TTS Player with MQTT, FTP Server and Message Queue

Hello everyone,

I’d like to introduce my new project: a smart home TTS/MP3 player, based on an ESP32 audio kit and fully integrated into your smart home system via MQTT.

The player provides the following features:

1. TTS (Text-to-Speech)

  • Converts text to audio using Google TTS
  • Immediate playback
  • Ideal for home automation (e.g., “Washing machine finished”, “Motion detected”)

2. TTM (Text-to-Memory) with caching

Works like TTS, but the generated MP3 files are stored locally.

  • If the same text is requested again, the cached file is played
  • Reduces internet requests and speeds up playback, since the file is read from the SD card instead of being generated again
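
The caching idea can be sketched in a few lines of Python. This is only an illustration, not the firmware's actual code: the hash-based file naming and the `synthesize` callback (standing in for the request to the TTS service) are assumptions made for the example.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_path(cache_dir: Path, text: str) -> Path:
    """Map a text to a stable file name (hypothetical naming scheme)."""
    digest = hashlib.sha1(text.strip().encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.mp3"


def get_speech(cache_dir: Path, text: str, synthesize: Callable[[str], bytes]) -> Path:
    """Return the MP3 file for `text`, calling `synthesize` only on a cache miss."""
    path = cache_path(cache_dir, text)
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        # The (hypothetical) network request happens only on this branch.
        path.write_bytes(synthesize(text))
    return path
```

The second request for the same text returns the existing file without calling `synthesize`, which is why repeated messages no longer need an internet connection.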

3. MP3 Playback

  • Plays any MP3 files from the SD card
  • Custom sounds can be uploaded via FTP

4. Playback queue

Multiple speech requests are played one after another automatically.

  • No overlap or audio glitches
  • TTS text can include a prefix that overrides the volume, for example:
    80! Water in the basement
    This ensures that important alerts are played loudly, regardless of the player’s current volume settings.
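
A rough Python sketch of how the volume prefix and the queue could work together. The exact prefix grammar (one to three digits followed by `!`), the 0 to 100 volume scale, and the names `parse_message` and `PlaybackQueue` are assumptions for illustration; the firmware may handle this differently.

```python
import re
from collections import deque
from typing import Callable, Optional, Tuple

# Assumed prefix format: 1-3 digits, "!", then the text, e.g. "80! Water in the basement".
_PREFIX = re.compile(r"^\s*(\d{1,3})!\s*(.*)$", re.DOTALL)


def parse_message(raw: str) -> Tuple[Optional[int], str]:
    """Split an optional volume prefix from the text to speak."""
    match = _PREFIX.match(raw)
    if match is None:
        return None, raw.strip()
    volume = min(int(match.group(1)), 100)  # assumption: volume scale is 0 to 100
    return volume, match.group(2).strip()


class PlaybackQueue:
    """First-in, first-out queue so that messages never overlap."""

    def __init__(self, default_volume: int) -> None:
        self.default_volume = default_volume
        self._pending = deque()

    def submit(self, raw: str) -> None:
        self._pending.append(parse_message(raw))

    def play_next(self, speak: Callable[[str, int], None]) -> bool:
        """Play the oldest pending message; return False if the queue is empty."""
        if not self._pending:
            return False
        volume, text = self._pending.popleft()
        # A prefix overrides the default volume for this one message only.
        speak(text, self.default_volume if volume is None else volume)
        return True
```

A message without a prefix is played at the player's current volume, while a prefixed one uses its own volume for that single playback.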

5. MQTT control

  • Fully controllable via MQTT
  • Compatible with any smart home system that supports MQTT
  • Commands include: mp3, tts, ttm, volume, stop, reboot, and more
  • Each TTS player can subscribe to messages individually or as part of a group
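
To illustrate addressing a single player versus a group, here is a small Python helper that builds a topic and payload pair. The topic layout `<base>/<target>/<command>` and the names `player`, `kitchen` and `all` are purely hypothetical; check the project documentation for the real topic structure.

```python
from typing import NamedTuple

# Commands named in the feature list; the project supports more than these.
COMMANDS = {"mp3", "tts", "ttm", "volume", "stop", "reboot"}


class Message(NamedTuple):
    topic: str
    payload: str


def build_message(target: str, command: str, payload: str = "",
                  base: str = "player") -> Message:
    """Build a message for one device or a group (hypothetical topic layout)."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command: {command}")
    return Message(f"{base}/{target}/{command}", payload)


# One device (assumed name "kitchen") and a group (assumed name "all").
single = build_message("kitchen", "tts", "Washing machine finished")
group = build_message("all", "ttm", "80! Water in the basement")
```

The resulting topic and payload can then be handed to any MQTT client, for example paho-mqtt's `client.publish(topic, payload)`.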

6. Configuration

  • Configurable via app.ini on the SD card
  • Easy setup of Wi-Fi, MQTT, FTP, volume, and more
  • Room-based SD card configuration:
    When moving the device, simply swap the SD card to carry all settings and files with you.
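
For a feel of what an INI-based configuration might look like, the following Python sketch parses a sample file. All section and key names here are invented for the example and are not taken from the project's app.ini; the documentation on GitHub lists the real ones.

```python
import configparser

# Hypothetical app.ini content; the real section and key names may differ.
SAMPLE_INI = """
[wifi]
ssid = MyNetwork
password = secret

[mqtt]
host = 192.168.1.10
port = 1883

[player]
volume = 50
"""


def load_settings(ini_text: str) -> dict:
    """Read the settings the sketch cares about, with fallbacks for optional keys."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {
        "ssid": parser.get("wifi", "ssid"),
        "mqtt_host": parser.get("mqtt", "host"),
        "mqtt_port": parser.getint("mqtt", "port", fallback=1883),
        "volume": parser.getint("player", "volume", fallback=50),
    }


settings = load_settings(SAMPLE_INI)
```

Because everything lives in one file on the card, moving the card to another device carries the Wi-Fi, MQTT and volume settings along with the cached audio.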

One possible use for the player is to turn it into a garden iDog that plays sounds or music whenever someone walks past.

The entire project, including documentation, is open source and available here:
GitHub: nmaciol/esp32-smarthome-mp3player-main

So it needs Internet access?

Yes, TTS requires an internet connection every time it is used, whereas TTM needs one only the first time a given text is requested. The audio is then saved as an MP3 file and played from the SD card on later requests, so most recurring messages can be played back offline.

See point 2: TTM (Text-to-Memory) with caching.

My issue is using Google for TTS.
Could a local TTS engine be used instead, such as Piper?

My devices are blocked from the internet, so access to Google would be undesirable.

The ESP32 isn’t capable of producing natural-sounding speech. Instead, you can generate voice messages using Google, OpenAI, and similar services, save them as MP3 files, upload them to the MP3/TTS player via FTP, and play them by referring to their file names.

Example: Using MQTT with the MP3/TTS Player