Hello,
time to necro this thread.
I tried most of these steps and failed quite a lot. Eventually I succeeded by vibe-coding it in Python.
It might be possible to recreate this if someone wants to add MP3 playback support for Echo speakers.
For the OP:
The simple solution is to follow the API restrictions described in the Speech Synthesis Markup Language (SSML) Reference | Alexa Skills Kit.
The MP3 must be encoded at 48 kbps with a sample rate of 22050 Hz (22050 is the sample rate, not the bit rate), and it must be reachable online over HTTPS with a valid SSL certificate. It can then be tested this way:
In Home Assistant
Test
Go to Developer tools → Actions → YAML mode:
```yaml
service: notify.alexa_media
data:
  message: |
    <speak>
      <audio src='https://<public-home-assistant-url>/local/tts_output.mp3'/>
    </speak>
  data:
    type: tts
  target: media_player.echo_spot
```
The complicated way
This is quite painful, but in the end you'll be able to get Echo to say anything in a voice of your choosing. On a Raspberry Pi 5, the request (with some weather and traffic information added) takes ~15-18 seconds before Echo starts speaking. That is too slow for anything interactive, but for backend stuff like sensor- or time-triggered notifications it can be useful.
I recommend not copying and pasting this blindly; it is not my cleanest work.
My goal
- generate dynamic text with OpenAI
- convert the text to speech with ElevenLabs, falling back to OpenAI when I'm out of ElevenLabs credits
- use ffmpeg to convert the MP3 file into a format Amazon allows the Echo Spot to play
- host the MP3 file in a public Home Assistant folder (/config/www, served as /local/) for Echo to play
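The ElevenLabs-or-OpenAI decision boils down to comparing the remaining character quota with the length of the generated text. A minimal sketch of just that decision, using the `character_limit` and `character_count` fields of the ElevenLabs subscription endpoint the same way the script does (`choose_tts_provider` is a hypothetical name):

```python
def choose_tts_provider(subscription, text):
    """Return "elevenlabs" if the remaining quota covers the text, else "openai".

    `subscription` is the JSON dict from ElevenLabs' /v1/user/subscription;
    only character_limit and character_count are used.
    """
    remaining = subscription["character_limit"] - subscription["character_count"]
    return "elevenlabs" if remaining >= len(text) else "openai"


# 10 characters left, "Welcome home!" needs 13
print(choose_tts_provider({"character_limit": 10000, "character_count": 9990}, "Welcome home!"))
# -> openai
```

The full script additionally falls back to OpenAI on any error (network, bad key), not only on an empty quota.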
API requirements (3rd-party accounts)
- a Home Assistant long-lived access token
- an OpenAI API key for ChatGPT and the fallback text-to-speech
- an ElevenLabs API key, because it has nicer voices than OpenAI
Software requirements
- install ffmpeg and the required Python modules (the script also needs python-dotenv for the .env file):
```shell
apk add python3 py3-pip ffmpeg
pip install openai requests python-dotenv --break-system-packages
```
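Because `shell_command` runs the script in the same environment Home Assistant itself runs in, it's worth confirming the modules and ffmpeg are visible from there. A small sketch that lists what's missing; the module-to-package mapping reflects that `import dotenv` is provided by the `python-dotenv` package, and `missing_dependencies` is a made-up name:

```python
import importlib.util
import shutil

# import name -> pip package that provides it
REQUIRED_MODULES = {"openai": "openai", "requests": "requests", "dotenv": "python-dotenv"}
REQUIRED_BINARIES = ["ffmpeg"]


def missing_dependencies():
    """Return the pip packages and binaries that cannot be found here."""
    missing = [pkg for mod, pkg in REQUIRED_MODULES.items()
               if importlib.util.find_spec(mod) is None]
    missing += [exe for exe in REQUIRED_BINARIES if shutil.which(exe) is None]
    return missing


print(missing_dependencies() or "all dependencies found")
```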
Registration of the new command in configuration.yaml:
```yaml
shell_command:
  speak_smart: "python3 /config/scripts/speak_smart.py '{{ system_content }}' '{{ user_content }}'"
```
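As far as I can tell from the Home Assistant source, a `shell_command` that contains templates is not run through a real shell: HA renders the templates and then splits the arguments with Python's `shlex`. This sketch imitates that step (an approximation, not HA's actual code; `rendered_argv` is a hypothetical helper) to show which `sys.argv` the script receives and why an apostrophe in either text breaks the single-quoted template:

```python
import shlex

# Argument part of the shell_command above, with the templates as format fields
TEMPLATE = "/config/scripts/speak_smart.py '{system}' '{user}'"


def rendered_argv(system_content, user_content):
    """Render the arguments, then split them like shlex would."""
    args = TEMPLATE.format(system=system_content, user=user_content)
    return ["python3"] + shlex.split(args)


print(rendered_argv("you are sassy", "Give me a welcome home"))
# -> ['python3', '/config/scripts/speak_smart.py', 'you are sassy', 'Give me a welcome home']

try:
    rendered_argv("you don't care", "hi")
except ValueError as err:
    # the apostrophe unbalances the quotes
    print("shlex failed:", err)  # -> shlex failed: No closing quotation
```

So keep apostrophes out of the prompts (write "do not" instead of "don't").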
In the Home Assistant file editor, add two files (/homeassistant/ in the file editor is /config/ inside Home Assistant itself):
- /homeassistant/scripts/.env
- /homeassistant/scripts/speak_smart.py
scripts/.env
```
ELEVEN_KEY=sk_0288...
OPENAI_KEY=sk-proj-Br...
HA_TOKEN=eyJh...
ECHO_ENTITY=media_player.echo_spot
HA_URL=https://<public url to home assistant without trailing slash>
```
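The script reads these with `os.getenv` and carries on with `None` if a key is missing or misspelled, which only surfaces later as an authentication error. A minimal sketch of an upfront check to run after `load_dotenv` (the helper name is hypothetical; the key names match the `.env` above):

```python
import os

# ELEVEN_KEY is left out on purpose: without it the script just falls back to OpenAI.
# HA_URL is included because the script's http://homeassistant.local default
# is not a public HTTPS URL that Echo could fetch the MP3 from.
REQUIRED_KEYS = ["OPENAI_KEY", "HA_TOKEN", "HA_URL"]


def missing_env_keys(env=os.environ):
    """Return required keys that are absent or empty in `env`."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_env_keys()
if missing:
    print("Missing in .env:", ", ".join(missing))
```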
scripts/speak_smart.py
```python
import os
import random
import subprocess
import sys
from datetime import datetime

import requests
from dotenv import load_dotenv
from openai import OpenAI

# Load the .env next to this script, regardless of the working directory
ENV_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), ".env")
load_dotenv(ENV_PATH)
print("ENV file exists:", os.path.exists(ENV_PATH))

if len(sys.argv) != 3:
    print("Usage: speak_smart.py <system_content> <user_content>")
    sys.exit(1)

system_content = sys.argv[1]
user_content = sys.argv[2]

# env
ELEVEN_API_KEY = os.getenv("ELEVEN_KEY")
OPENAI_API_KEY = os.getenv("OPENAI_KEY")
HA_TOKEN = os.getenv("HA_TOKEN")
HA_URL = os.getenv("HA_URL", "http://homeassistant.local:8123")
ECHO_ENTITY = os.getenv("ECHO_ENTITY", "media_player.echo_spot_buero")

TMP_FILE = "/config/tts_output.mp3"         # raw TTS output, not publicly served
PUBLIC_FILE = "/config/www/tts_output.mp3"  # served as {HA_URL}/local/tts_output.mp3


def convert_mp3_to_48kbps(input_path, output_path):
    """Re-encode to what Alexa accepts: MP3, 48 kbps, 22050 Hz, mono."""
    subprocess.run([
        "ffmpeg",
        "-y",                      # overwrite the output file
        "-i", input_path,
        "-ar", "22050",            # sample rate
        "-b:a", "48k",             # bit rate
        "-ac", "1",                # mono
        "-codec:a", "libmp3lame",
        output_path,
    ], check=True)


# OpenAI client
client = OpenAI(api_key=OPENAI_API_KEY)

# Generate the text with ChatGPT
chat = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_content},
    ],
    temperature=0.9,
)
text = chat.choices[0].message.content.strip()
print(f"[{datetime.now()}] Generated text: {text}")

# Text-to-speech: ElevenLabs first, OpenAI as fallback
try:
    usage = requests.get(
        "https://api.elevenlabs.io/v1/user/subscription",
        headers={"xi-api-key": ELEVEN_API_KEY},
        timeout=5,
    ).json()
    remaining_chars = usage["character_limit"] - usage["character_count"]
    print(f"ElevenLabs: {remaining_chars} characters left")
    if remaining_chars < len(text):
        raise ValueError("Not enough credits at ElevenLabs")

    audio = requests.post(
        "https://api.elevenlabs.io/v1/text-to-speech/EXAVITQu4vr4xnSDxMaL",
        headers={
            "xi-api-key": ELEVEN_API_KEY,
            "Content-Type": "application/json",
        },
        json={
            "text": text,
            "voice_settings": {"stability": 0.5, "similarity_boost": 0.7},
        },
        timeout=10,
    )
    # Without this, an error response would be saved as the "MP3" and the fallback would never run
    audio.raise_for_status()
    with open(TMP_FILE, "wb") as f:
        f.write(audio.content)
    print("TTS via ElevenLabs successful.")
except Exception as e:
    print(f"Fallback to OpenAI TTS: {e}")
    speech = client.audio.speech.create(
        model="tts-1-hd",
        voice="nova",
        input=text,
    )
    with open(TMP_FILE, "wb") as f:
        f.write(speech.content)
    print("TTS via OpenAI successful.")

# Convert the MP3 straight into the public www folder
convert_mp3_to_48kbps(TMP_FILE, PUBLIC_FILE)

# Random door knock from the Alexa sound library, played before the message
knock_num = random.randint(1, 9)
soundbank_url = f"soundbank://soundlibrary/doors/doors_knocks/knocks_0{knock_num}"

# Home Assistant API call
response = requests.post(
    f"{HA_URL}/api/services/notify/alexa_media",
    headers={
        "Authorization": f"Bearer {HA_TOKEN}",
        "Content-Type": "application/json",
    },
    json={
        "message": f"<speak><audio src='{soundbank_url}'/> <audio src='{HA_URL}/local/tts_output.mp3' /></speak>",
        "data": {"type": "tts"},
        "target": [ECHO_ENTITY],
    },
    timeout=10,
)
print(response.status_code, response.text)
```
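If you want to change what gets played (more sounds, a different order), it can help to build the SSML from a list instead of one long f-string. A sketch using `xml.sax.saxutils.quoteattr`, which quotes the attribute and escapes characters like `&` in a URL; `build_ssml` is a made-up name:

```python
from xml.sax.saxutils import quoteattr


def build_ssml(audio_sources):
    """Wrap each source in an <audio> tag inside <speak>, escaping the src value."""
    tags = " ".join(f"<audio src={quoteattr(src)}/>" for src in audio_sources)
    return f"<speak>{tags}</speak>"


print(build_ssml([
    "soundbank://soundlibrary/doors/doors_knocks/knocks_01",
    "https://example.com/local/tts_output.mp3",
]))
# -> <speak><audio src="soundbank://soundlibrary/doors/doors_knocks/knocks_01"/> <audio src="https://example.com/local/tts_output.mp3"/></speak>
```

The result can be passed as `"message"` in the `notify/alexa_media` call unchanged.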
Restart Home Assistant so it loads the new shell_command from configuration.yaml.
Test
Go to Developer tools → Actions → YAML mode:
```yaml
action: shell_command.speak_smart
data:
  system_content: "you are a female assistant, always angry, but handing out information sassy and short."
  user_content: "Give me a welcome home after a long workday"
```
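The same shell command can also be triggered from outside Home Assistant through the REST API (`POST /api/services/<domain>/<service>` with the service data as JSON), for example from another machine or a cron job. A minimal standard-library sketch; the URL and token are placeholders and `speak_smart_request` is a hypothetical name:

```python
import json
import urllib.request


def speak_smart_request(ha_url, token, system_content, user_content):
    """Build a POST request that calls shell_command.speak_smart via the REST API."""
    payload = {"system_content": system_content, "user_content": user_content}
    return urllib.request.Request(
        f"{ha_url}/api/services/shell_command/speak_smart",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


req = speak_smart_request("https://ha.example.com", "<long-lived token>",
                          "you are sassy", "Welcome me home")
# To actually send it (generous timeout, since the script needs a while):
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(resp.status, resp.read().decode())
```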