Play a local mp3 file on Alexa Echo dot

Bump again: is no one able to help with how to get this into a standalone Nginx Proxy Manager?

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers "ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384";
ssl_ecdh_curve X25519:secp521r1:prime256v1;
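To check whether a proxy actually serves TLS 1.2+ with only these ciphers, a small Python probe can help. This is a sketch, not a reproduction of Amazon's client: it only confirms that the server completes a handshake when the client offers nothing but the cipher list above. `restricted_context` and `probe` are hypothetical helper names. Pass `max_tls12=True` to force TLS 1.2, because with TLS 1.3 the `ssl_ciphers` list is not consulted.

```python
import socket
import ssl

# Cipher list copied from the ssl_ciphers directive above (OpenSSL names).
NGINX_CIPHERS = (
    "ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:"
    "ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:"
    "ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384"
)


def restricted_context(ciphers: str = NGINX_CIPHERS, max_tls12: bool = False) -> ssl.SSLContext:
    """Client context that offers TLS 1.2+ and only the given TLS 1.2 cipher suites."""
    ctx = ssl.create_default_context()  # still verifies the certificate chain and hostname
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if max_tls12:
        # With TLS 1.3 the cipher list is not used, so pin 1.2 to actually test it.
        ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    ctx.set_ciphers(ciphers)  # raises ssl.SSLError if none of the names are known
    return ctx


def probe(host: str, port: int = 443, max_tls12: bool = False) -> tuple:
    """Handshake with host and return (protocol version, cipher name)."""
    ctx = restricted_context(max_tls12=max_tls12)
    with socket.create_connection((host, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

For example, `probe("xxxxxx.duckdns.org", max_tls12=True)` returns the negotiated protocol and cipher name, or raises `ssl.SSLError` if the proxy rejects every cipher offered.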

I had the same issue and got that “Simon says skill …” error even when hosting the file myself. I know the file is encoded correctly, as I used ffmpeg with the settings others provided. And I didn’t want to adjust my HA Nginx proxy host config just for a simple MP3 file, leaving the rest of the setup on an insecure, dated config.

Solution: I ended up using https://jukehost.co.uk/ to host the file, and it works great.

I just posted a PR to the Alexa Media Player integration so that it supports tts.cloud_say. The only requirement is that HA is publicly accessible. It basically does what @MarkWattTech shows in his video, which was great inspiration for this update. ffmpeg is already available in HA, so there is no need to use Jovo for the conversion; it’s done locally.

It’s truly sad these hurdles exist. Google Home supports this easily, can be done directly from HA.
The easiest solution might be to use a Google Home mini or something (whatever they call them now) for playing local mp3s via Home Assistant. I would be using only Google Home devices if they supported alternate wake words. Darn you, Google.
For that reason I have 5,000 Echo devices instead.
Such a shame that you can’t even play high-quality audio with this method.
Someone mentioned Plex… how about Plex or a custom skill using Plex?
Plex can play high-quality audio over the Echo…
The problem with the official Plex Alexa skill is that it announces what it is going to play before playing it, and I don’t think that can be turned off.
(I haven’t figured out how to play Plex media using the integration…)

If you are using Nabu Casa, can you just send it an external URL? Sorry if this is a dumb answer, I’m new to Alexa, been using Google Home for a while. I got my first Alexa device and haven’t tried sending local files to it yet.

Yes, you can use the Nabu Casa (NC) URL of the file.
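Before pointing an Echo at a Nabu Casa (or any other public) URL, it can help to confirm that the file is reachable over HTTPS with a valid certificate and is served as `audio/mpeg`. A minimal standard-library sketch; `check_public_mp3` and `looks_playable` are hypothetical helpers, and it assumes the server answers HEAD requests (switch the method to GET if it does not).

```python
import urllib.request


def looks_playable(status: int, content_type: str) -> bool:
    """True when the response is a 200 whose media type is audio/mpeg."""
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type == "audio/mpeg"


def check_public_mp3(url: str) -> bool:
    """HEAD the URL with certificate verification on (urllib's default for https)."""
    req = urllib.request.Request(url, method="HEAD")
    # urlopen raises URLError for bad or self-signed certificates and HTTPError for 4xx/5xx.
    with urllib.request.urlopen(req, timeout=10) as resp:
        return looks_playable(resp.status, resp.headers.get("Content-Type", ""))
```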

I had the same “Simon says …” problem when I tried to play MP3 files on Echos following the instructions from Mark Watt Tech: https://www.youtube.com/watch?v=ZJlH6k9PY4I
I finally fixed it by correcting the src attribute of the MP3 file.
It seems the src needs to look like this (pay attention to the quotation marks):

<audio src='https://abcdef.ui.nabu.casa/local/mp3/Alarm.mp3'/>
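When building that SSML from code, it is easy to end up with typographic quotes from copy and paste. A hypothetical helper (not part of any integration) that always emits straight single quotes, escapes the URL for XML, and refuses curly quotes:

```python
from xml.sax.saxutils import escape

# Curly quotes that forum posts and word processors tend to substitute for ' and ".
TYPOGRAPHIC_QUOTES = ("\u2018", "\u2019", "\u201c", "\u201d")


def audio_ssml(*urls: str) -> str:
    """Wrap MP3 URLs in <speak><audio src='...'/></speak> using straight single quotes."""
    tags = []
    for url in urls:
        if any(q in url for q in TYPOGRAPHIC_QUOTES):
            raise ValueError(f"typographic quote in URL: {url!r}")
        # Escape &, <, > and straight single quotes so the attribute stays well-formed XML.
        safe = escape(url, {"'": "&apos;"})
        tags.append(f"<audio src='{safe}'/>")
    return "<speak>" + "".join(tags) + "</speak>"
```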

Hello everyone. I would like to use a local MP3 as a custom alarm clock, managed by Home Assistant, that plays on my Amazon Echo Dot. I succeeded in converting the MP3 and making the Nabu Casa link reachable. Then I wrote a script in HA that plays the MP3 with the notify service (let’s call it ScriptA).
Long story short: if I trigger the script by saying “Alexa, play ScriptA”, it works flawlessly. But if I run the script from HA, it works the first time and then no more, and after a while (sometimes hours) the Echo Dot becomes unavailable. I also tried using TTS to simulate the voice command from HA, but the same thing happens.
Does anyone have a suggestion or an explanation for this?
Thanks, everybody…
D.

I did get this to work with my duckdns.org server.
I can only get short-ish clips to play: a 3 minute clip plays, but a 5 minute clip does not.

I guess the video linked in this thread does show that the audio file can be at most 240 seconds long.
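A quick way to check a file against that 240-second limit before uploading it, assuming `ffprobe` (installed alongside ffmpeg) is on the PATH. The helper names are hypothetical; `max_mp3_bytes` is just the arithmetic 48 kbps × 240 s ÷ 8 = 1,440,000 bytes, a rough upper bound for a constant-bit-rate file.

```python
import subprocess

ALEXA_MAX_SECONDS = 240  # limit quoted from the video linked in this thread


def mp3_duration_seconds(path: str) -> float:
    """Read the container duration with ffprobe (installed alongside ffmpeg)."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        check=True, capture_output=True, text=True,
    )
    return float(result.stdout.strip())


def fits_alexa_limit(seconds: float, limit: float = ALEXA_MAX_SECONDS) -> bool:
    return seconds <= limit


def max_mp3_bytes(kbps: int = 48, seconds: int = ALEXA_MAX_SECONDS) -> int:
    """Rough size ceiling for a constant-bit-rate file: kbps * 1000 * seconds / 8."""
    return kbps * 1000 * seconds // 8
```

This matches the observation above: a 3-minute (180 s) clip passes `fits_alexa_limit`, a 5-minute (300 s) clip does not.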

This is the server section of my ha.conf nginx configuration file.

server {
    server_name xxxxxx.duckdns.org;

    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m;
    ssl_session_tickets off;
    ssl_certificate /etc/nginx/ha_ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ha_ssl/privkey.pem;

    # dhparams file
    ssl_dhparam /etc/nginx/certs/dhparam/dhparam-2048.pem;

    listen 443 ssl;
    http2 on;
    include /etc/nginx/mime.types;
    ssl_prefer_server_ciphers off;

    # try for alexa media
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers "ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384";
    ssl_ecdh_curve X25519:secp521r1:prime256v1:secp384r1;

    proxy_buffering off;
}

I don’t really know nginx or the Alexa magic required for longer files, but I would suggest testing with a short mp3.
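For a short test file, ffmpeg can synthesise a few seconds of sine tone with its `lavfi` input, encoded with the same 22050 Hz / 48 kbps / mono settings used elsewhere in this thread. A sketch with a hypothetical `sine_tone_cmd` helper; saving the output in `/config/www/` makes it available under `/local/`.

```python
import subprocess


def sine_tone_cmd(output_path: str, seconds: int = 5, freq_hz: int = 440) -> list:
    """ffmpeg argv that synthesises a sine tone and encodes it as 22050 Hz / 48 kbps mono MP3."""
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", f"sine=frequency={freq_hz}:duration={seconds}",
        "-ar", "22050", "-b:a", "48k", "-ac", "1", "-codec:a", "libmp3lame",
        output_path,
    ]


def make_tone_file(output_path: str, seconds: int = 5) -> None:
    """Run ffmpeg; raises CalledProcessError if encoding fails."""
    subprocess.run(sine_tone_cmd(output_path, seconds), check=True)
```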

I can only find a way to upload playlists from other music apps, not local files. I only have Amazon Music Prime, not Unlimited, so maybe that is the difference?

Use the “My Audio” Alexa skill.

Thanks for sharing this! Still working great!

I play my local music with the MyMediaAlexa skill. For a small fee of €8.50 a year, I can install a local server on a PC using software from mymediaalexa.com and set up playlists. Then I can issue a custom command from Home Assistant through Alexa Media Player that uses the skill to play my local media.

data:
  media_content_type: custom
  media_content_id: ask My Media to play my Relaxation Playlist
target:
  device_id: 79a74c8c8a16a6fd769fdf5c4215a564
action: media_player.play_media
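The same custom command can also be sent from outside Home Assistant through its REST API. A sketch assuming a long-lived access token and that the REST endpoint accepts `device_id` in the body the way the YAML `target:` does (an `entity_id` for the Echo should work as well); the helper names are hypothetical.

```python
import json
import urllib.request


def custom_command_payload(device_id: str, command: str) -> dict:
    """Service data mirroring the YAML above (device_id given as a target field)."""
    return {
        "device_id": device_id,
        "media_content_type": "custom",
        "media_content_id": command,
    }


def send_custom_command(ha_url: str, token: str, device_id: str, command: str) -> int:
    """POST to /api/services/media_player/play_media and return the HTTP status."""
    req = urllib.request.Request(
        f"{ha_url}/api/services/media_player/play_media",
        data=json.dumps(custom_command_payload(device_id, command)).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```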

With this I can play any track or playlist in my local music collection.

Exactly! This is what I use also!

I used the latest Audacity (3.7.1) to convert my audio announcement. Here is a screenshot of the MP3 export settings.

[Screenshot 2025-01-01: Audacity MP3 export settings]

You can also use Chime_TTS, which has a built-in feature to convert MP3s so they conform to Alexa devices. :slight_smile:

Hello,

time to necro this thread :slight_smile:
I tried most of these steps and failed quite a lot. Eventually I succeeded by vibe-coding it in Python.
Anyone who wants Echo speakers to play MP3s can probably recreate it.

For the OP:
The simple solution is to stay within the restrictions documented in the Speech Synthesis Markup Language (SSML) Reference for the Alexa Skills Kit.

The MP3 must have a 22050 Hz sample rate (and a 48 kbps bit rate) to be accepted by the Echo, and it must be reachable online with a valid SSL certificate. It can then be tested this way:

Test it in Home Assistant: go to Developer Tools → Actions → YAML mode and run:

action: notify.alexa_media
data:
  message: |
    <speak>
        <audio src='https://<public-home-assistant-url>/local/tts_output.mp3'/>
    </speak>
  data:
    type: tts
  target: media_player.echo_spot
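The sample-rate requirement can be enforced at conversion time. A sketch of an ffmpeg argument builder; the accepted rates (16000, 22050, 24000 Hz) and the 48 kbps bit rate are taken from the Alexa SSML `<audio>` reference, and `alexa_mp3_cmd` is a hypothetical name.

```python
# Values from the Alexa SSML <audio> reference: 48 kbps and one of these sample rates.
ALLOWED_SAMPLE_RATES = (16000, 22050, 24000)


def alexa_mp3_cmd(src: str, dst: str, sample_rate: int = 22050, channels: int = 1) -> list:
    """ffmpeg argv that re-encodes src as a 48 kbps MP3 at an accepted sample rate."""
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"{sample_rate} Hz is not an accepted sample rate")
    return [
        "ffmpeg", "-y", "-i", src,
        "-ac", str(channels),
        "-codec:a", "libmp3lame",
        "-b:a", "48k",
        "-ar", str(sample_rate),
        dst,
    ]
```

Run it with `subprocess.run(alexa_mp3_cmd("announcement.mp3", "/config/www/tts_output.mp3"), check=True)`; a typical 44100 Hz source is rejected up front instead of producing a file the Echo refuses.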

The complicated way
This is quite painful, but in the end you’ll be able to get the Echo to say anything in a voice of your choosing. On a Raspberry Pi 5, the request (with some weather and traffic information added) takes ~15-18 seconds before the Echo begins speaking. But for backend stuff like sensor- or time-triggered notifications, it might be useful.

I recommend not blindly copying and pasting this; it is not my cleanest work :smiley:

My goal

  • create dynamic text with OpenAI
  • convert the text to speech with ElevenLabs, falling back to OpenAI when I’m out of ElevenLabs credits
  • convert the MP3 file with ffmpeg to a format that Amazon allows the Echo Spot to play
  • host the MP3 file in a public Home Assistant folder for the Echo to play

API requirements (3rd-party accounts)

  • a Home Assistant long-lived access token for the local API
  • an OpenAI API key for ChatGPT and the fallback text-to-speech
  • an ElevenLabs API key, because it has nicer voices than OpenAI

Software requirements

  • install ffmpeg and the required Python modules with pip (the script also imports dotenv, so python-dotenv is needed)
apk add python3 py3-pip ffmpeg
pip install openai requests python-dotenv --break-system-packages

Register the new command in configuration.yaml:

shell_command:
  speak_smart: "python3 /config/scripts/speak_smart.py '{{ system_content }}' '{{ user_content }}'"
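One caveat with this configuration: both templates are wrapped in single quotes. As far as I can tell, when a shell_command contains a template, Home Assistant renders it and splits the result with shell-style quoting rules, so an apostrophe in the text (“you're”) breaks the command. The sketch below mimics that with Python's `shlex`; it is an approximation of HA's behaviour, not its code, and `rendered_argv` is a hypothetical name.

```python
import shlex

# Same shape as the shell_command above: each template wrapped in single quotes.
TEMPLATE = "python3 /config/scripts/speak_smart.py '{system}' '{user}'"


def rendered_argv(system: str, user: str) -> list:
    """Render the command and split it shell-style (an approximation of HA's behaviour)."""
    return shlex.split(TEMPLATE.format(system=system, user=user))
```

`rendered_argv("be short", "welcome home")` splits into the four expected arguments, while `rendered_argv("you're sassy", "hi")` raises `ValueError: No closing quotation`. Keeping apostrophes out of the prompt text sidesteps the problem.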

In the Home Assistant file explorer, I added two files:

  • /homeassistant/scripts/.env
  • /homeassistant/scripts/speak_smart.py

scripts/.env

ELEVEN_KEY=sk_0288...
OPENAI_KEY=sk-proj-Br...
HA_TOKEN=eyJh...
ECHO_ENTITY=media_player.echo_spot
HA_URL=https://<public url to home assistant without trailing slash>
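Because the script reads everything from this file, a small check can catch a missing key or a malformed URL before any API is called. A hypothetical sketch (`env_problems` is not part of the original script); ELEVEN_KEY is treated as optional because the script falls back to OpenAI TTS, and ECHO_ENTITY has a default.

```python
from collections.abc import Mapping

# ELEVEN_KEY is optional (the script falls back to OpenAI TTS); ECHO_ENTITY has a default.
REQUIRED_KEYS = ("OPENAI_KEY", "HA_TOKEN", "HA_URL")


def env_problems(env: Mapping) -> list:
    """Return human-readable problems with the .env values; an empty list means OK."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_KEYS if not env.get(key)]
    url = env.get("HA_URL", "")
    if url and not url.startswith("https://"):
        problems.append("HA_URL should be https:// so the Echo can fetch the MP3")
    if url.endswith("/"):
        problems.append("HA_URL must not end with a slash")
    return problems
```

Called as `env_problems(os.environ)` right after `load_dotenv(...)`, it lets the script exit with a clear message instead of failing deep inside an API call.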

scripts/speak_smart.py

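import os
import random
import subprocess
import sys
from datetime import datetime

import requests
from dotenv import load_dotenv
from openai import OpenAI

# Load the keys from an absolute path so the script works regardless of the working directory
load_dotenv("/config/scripts/.env")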

if len(sys.argv) != 3:
    print("Usage: speak_smart.py <system_content> <user_content>")
    sys.exit(1)

system_content = sys.argv[1]
user_content = sys.argv[2]

print("ENV file exists:", os.path.exists("/config/scripts/.env"))

# env
ELEVEN_API_KEY = os.getenv("ELEVEN_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_KEY")
HA_TOKEN = os.getenv("HA_TOKEN")
HA_URL = os.getenv("HA_URL", "http://homeassistant.local:8123")  
ECHO_ENTITY = os.getenv("ECHO_ENTITY", "media_player.echo_spot_buero")

def convert_mp3_to_48kbps(input_path, output_path):
    subprocess.run([
        "ffmpeg",
        "-y",
        "-i", input_path,
        "-ar", "22050",
        "-b:a", "48k",
        "-ac", "1",
        "-codec:a", "libmp3lame",
        output_path
    ], check=True)

TMP_FILE = "/config/tts_output.mp3"

# OpenAI Client
client = OpenAI(api_key=OPENAI_API_KEY)

# ChatGPT Text generate
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_content}
    ],
    temperature=0.9,
)
text = chat.choices[0].message.content.strip()
print(f"[{datetime.now()}] Generated text: {text}")

try:
    usage = requests.get(
        "https://api.elevenlabs.io/v1/user/subscription",
        headers={"xi-api-key": ELEVEN_API_KEY},
        timeout=5,
    ).json()
    remaining_chars = usage["character_limit"] - usage["character_count"]
    print(f"ElevenLabs: {remaining_chars} characters left")

    if remaining_chars < len(text):
        raise ValueError("Not Enough Credits at Elevenlabs")

    audio = requests.post(
        "https://api.elevenlabs.io/v1/text-to-speech/EXAVITQu4vr4xnSDxMaL",
        headers={
            "xi-api-key": ELEVEN_API_KEY,
            "Content-Type": "application/json"
        },
        json={
            "text": text,
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.7}
        },
        timeout=10,
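    )
    # Raise on HTTP errors so a failed request falls through to the OpenAI fallback
    # instead of writing an error response into the MP3 file
    audio.raise_for_status()
    with open(TMP_FILE, "wb") as f:
        f.write(audio.content)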
    print("TTS via ElevenLabs successful.")
except Exception as e:
    print(f"Fallback to OpenAI TTS: {e}")
    speech = client.audio.speech.create(
        model="tts-1-hd",
        voice="nova",
        input=text
    )
    with open(TMP_FILE, "wb") as f:
        f.write(speech.content)
    print("TTS via OpenAI successful.")

# MP3 convert
os.system(f"cp {TMP_FILE} /config/www/tts_output_orig.mp3")
convert_mp3_to_48kbps("/config/www/tts_output_orig.mp3", "/config/www/tts_output.mp3")

knock_num = random.randint(1, 9)
soundbank_url = f"soundbank://soundlibrary/doors/doors_knocks/knocks_0{knock_num}"

# Home Assistant API call
response = requests.post(
    f"{HA_URL}/api/services/notify/alexa_media",
    headers={
        "Authorization": f"Bearer {HA_TOKEN}",
        "Content-Type": "application/json",
    },
    json={
        "message": f"<speak><audio src='{soundbank_url}'/> <audio src='{HA_URL}/local/tts_output.mp3' /></speak>",
        "data": {"type": "tts"},
        "target": [ECHO_ENTITY],
    },
    timeout=10,
)

print(response.status_code, response.text)

Restart Home Assistant so it loads the new shell_command from configuration.yaml.

Test it: go to Developer Tools → Actions → YAML mode and run:

action: shell_command.speak_smart
data:
  system_content: "you are a female assistant, always angry, but handing out information sassy and short."
  user_content: "Give me a welcome home after a long workday"

Wow, you did a lot of work! The original post was simply about the OP trying to find a way to play an MP3 locally. What you have posted seems like a very complicated way to add AI TTS to your automation.

Chime_TTS is compatible with numerous TTS integrations, including OpenAI TTS, so you can pick essentially any voice offered by a compatible integration. It also plays MP3s locally by automatically converting them to the correct parameters (e.g. ffmpeg -y -i source.mp3 -ac 2 -codec:a libmp3lame -b:a 48k -ar 16000 converted.mp3).