I’ve had success! I’m using a DFPlayer Mini MP3 player module, with the speaker connected directly to it. I also have a little class-D audio amplifier, but the DFPlayer is plenty loud, so I never used it. I started with fully formed sentences as the MP3 files, but realized I wanted more flexibility, and managing the sentences required a lot of notes and memorization. Stitching sentences together word by word sounds a little bit robotic, but I’m happy to accept that in exchange for the flexibility and the ability to keep it all inside the walls of my house. Google and Amazon have no business knowing when I’m home, or when I receive visitors.
I taught my house to speak. I played around with it. Now, I want it to shut up… Balance will take a little while to find. While I was in the middle of creating this, I had a roommate move in, which increased the frequency of notifications about five-fold. I REALLY love the “garbage day tomorrow” spoken notice I get at 11PM on Tuesdays - I’ve always had a hard time remembering garbage day…
I know, the current HA version doesn’t use services anymore; they’re called actions now. I’m not ready to break other configurations yet, so I haven’t updated.
I used an online text-to-speech site to generate the audio for the vocabulary, and then split each word into its own MP3 file, named 0001.mp3 to 0300.mp3. The DFPlayer requires that all files be ... They’re all on the MicroSD card on the DFPlayer, and I’m not even close to 1GB of audio files. The YAML file needs an array that identifies each file by the word it represents. I had to split my 300-word vocabulary into two arrays, because the ESP would enter a panic loop if either array was more than 200 words long. When compiled, even with a 300-word vocabulary, I want to say that I’m using less than half the available flash space on the ESP.
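For anyone preparing their own card, here is a minimal C++ sketch of the zero-padded naming pattern described above, where track N lives in a four-digit file such as 0001.mp3. The helper name `track_filename` is hypothetical and not part of the ESPHome config; it only illustrates the naming.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the four-digit file name for a track number,
// e.g. 7 -> "0007.mp3". The pattern (0001.mp3 to 0300.mp3) comes from the
// post; the function itself is made up for illustration.
std::string track_filename(int track) {
    assert(track >= 0 && track <= 9999);  // must fit in four digits
    char buf[16];
    std::snprintf(buf, sizeof(buf), "%04d.mp3", track);
    return std::string(buf);
}
```

Something like this could drive a small renaming script on the PC side when splitting the generated audio into per-word files.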
I created a service on the ESP that can be called by HA with a real sentence being passed. The script executed on the ESP parses the “sentence,” looks up the index of each word and then plays the MP3 file named per the index in the array. I added a few tones to let me know a statement was on the way, and used different ones to give the more important ones more of a “kick.” If you specify MP3 file number 0000, it’ll play a random file, so element zero in the first array is just a placeholder. I asked a handful of friends for words to add to the vocabulary, so it drifts into the weeds at times.
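To make the lookup concrete, here is a minimal standalone C++ sketch of the same idea: split the sentence on spaces, search the first array, fall back to the second, and turn the position into a track number. The tiny arrays, the function names, and the `-1` "not found" value are all assumptions for illustration; the real config uses 200- and 141-word arrays and plays a fallback tone instead. It also assumes track = position + 1, with the second array continuing where the first ends.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Tiny stand-in vocabularies (hypothetical contents and sizes).
static const std::array<std::string, 4> kWords = {"nevermore", "tonewater", "water", "detected"};
static const std::array<std::string, 3> kWordsToo = {"laundry", "room", "yes"};

// Returns the 1-based track number for a word, or -1 if it is in neither array.
int track_for_word(const std::string& word) {
    for (std::size_t j = 0; j < kWords.size(); ++j)
        if (kWords[j] == word) return static_cast<int>(j) + 1;
    for (std::size_t j = 0; j < kWordsToo.size(); ++j)
        if (kWordsToo[j] == word) return static_cast<int>(kWords.size() + j) + 1;
    return -1;
}

// Splits on whitespace and maps each word to its track number.
// operator>> skips runs of spaces, so a doubled space yields no empty word.
std::vector<int> tracks_for_sentence(const std::string& sentence) {
    std::vector<int> tracks;
    std::istringstream in(sentence);
    std::string word;
    while (in >> word) tracks.push_back(track_for_word(word));
    return tracks;
}

// Quick self-check against the stand-in arrays above.
void check_examples() {
    assert(track_for_word("nevermore") == 1);  // first array, position 0
    assert(track_for_word("laundry") == 5);    // second array, position 0
    assert(track_for_word("xyzzy") == -1);     // unknown word
}
```

Splitting with `std::istringstream` is a design choice of this sketch, not what the lambda in the config does; it simply avoids treating repeated spaces as an unknown word.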
On the ESP itself, I can simply execute the script with a full sentence, such as:
on_press:
  - script.execute:
      id: make_speech
      full_speech: 'tonewater laundry room water detected'
From Home Assistant, I can call the service just as easily:
- service: esphome.talker_hall_make_announcement
  data:
    full_speech: 'tonemotion front door motion detected'
YAML:
globals:
  - id: words
    type: std::array<std::string, 200>
    initial_value: '{
      "nevermore", "tonedub", "tonetrip", "tonequad", "tonewater",
      "tonedoorclose", "tonedooropen", "tonemotion", "tonenotfound", "toneone",
      "tonetwo", "tonethree", "tonefour", "tonefive", "tonesix",
      "toneseven", "toneeight", "tonenine", "toneten", "one",
      "two", "three", "four", "five", "six",
      "seven", "eight", "nine", "ten", "eleven",
      "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
      "seventeen", "eighteen", "nineteen", "twenty", "thirty",
      "forty", "fifty", "sixty", "seventy", "eighty",
      "ninety", "hundred", "thousand", "monday", "tuesday",
      "wednesday", "thursday", "friday", "saturday", "sunday",
      "january", "february", "march", "april", "may",
      "june", "july", "august", "september", "october",
      "november", "december", "AM", "PM", "day",
      "days", "hour", "hours", "minute", "minutes",
      "second", "seconds", "today", "tomorrow", "yesterday",
      "above", "after", "and", "anniversary", "appointment",
      "back", "bath", "bay", "be", "bed",
      "been", "before", "begin", "begun", "below",
      "birthday", "bottom", "breakfast", "cabinet", "car",
      "ceiling", "clean", "civic", "closed", "dance",
      "daytime", "dentist", "detected", "dining", "dinner",
      "disabled", "dish", "do", "doctor", "door",
      "dryer", "during", "east", "enabled", "end",
      "ended", "family", "finished", "floor", "flower",
      "followup", "for", "front", "garage", "garbage",
      "garden", "gate", "get", "goodbye", "guest",
      "happy", "has", "have", "high", "home",
      "in", "inner", "inside", "interested", "interview",
      "intruding", "is", "jeep", "kitchen", "last",
      "laundry", "leave", "light", "living", "locked",
      "low", "lunch", "machine", "master", "media",
      "medium", "mid", "mode", "motion", "next",
      "nighttime", "noon", "north", "not", "notice",
      "now", "off", "office", "on", "open",
      "outer", "outside", "path", "pathway", "patio",
      "pick", "pie", "porch", "radio", "ready",
      "recharge", "recycle", "refrigerator", "remain", "remaining",
      "replace", "roof", "room", "run", "running",
      "schedule", "set", "shed", "shelf", "side",
      }'
  - id: wordstoo
    type: std::array<std::string, 141>
    initial_value: '{
      "south", "space", "star", "start", "started",
      "stop", "stopped", "sunrise", "sunset", "talking",
      "temperature", "time", "top", "trespassing", "truck",
      "turned", "unlocked", "up", "update", "wall",
      "warrant", "wars", "was", "washer", "washing",
      "waste", "water", "welcome", "west", "will",
      "window", "within", "work", "yard", "you",
      "your", "yours",
      << NAMES HERE >>,
      "moisture",
      "wet", "dry", "a", "afternoon", "am",
      "are", "arrived", "at", "away", "call",
      "calling", "contact", "countdown", "cycle", "darryl",
      "departed", "feed", "give", "go", "gone",
      "goodbye", "here", "i", "if", "internal",
      "left", "load", "loaded", "message", "microwave",
      "morning", "night", "nimbus", "note", "please",
      "police", "property", "ready", "right", "take",
      "thank", "thanks", "the", "to", "treat",
      "unload", "unloaded", "welcome", "when", "will",
      "written", "assistance", "baby", "cabinet", "cat",
      "cleared", "dog", "expired", "food", "give",
      "help", "longer", "me", "more", "movie",
      "music", "needed", "no", "over", "prefer",
      "preference", "quiet", "request", "requested", "save",
      "saved", "show", "shut", "silence", "silent",
      "stairs", "stereo", "television", "tell", "time",
      "timer", "treat", "treats", "under", "want",
      "yes",
      }'
  - id: input_string
    type: std::string
    restore_value: no
    initial_value: '""' # Default value
  - id: word_index
    type: int
    restore_value: no
    initial_value: '9999' # Default value
uart:
  tx_pin: ${pin_tx}
  rx_pin: ${pin_rx}
  baud_rate: 9600
  id: test_uart
dfplayer:
  uart_id: test_uart
  id: dfplayer_talker_hall
  on_finished_playback:
    then:
      logger.log: 'Playback finished event'
api:
  encryption:
    key: !secret api_key
  services:
    - service: make_announcement
      variables:
        full_speech: string
      then:
        - logger.log: "make_announcement called"
        - lambda: |-
            id(make_speech).execute(full_speech);
script:
  # Takes a sentence, splits into words, identifies the track number and sends to process_word
  # The words are queued by process_word to ensure each plays completely before the next begins
  - id: make_speech
    parameters:
      full_speech: string
    mode: queued
    then:
      - lambda: |-
          ESP_LOGI("Making Speech", "Speech: %s", full_speech.c_str());
          std::string current_word;
          std::string speech = full_speech + " "; // Add space to handle the last word
          for (size_t i = 0; i < speech.length(); i++) {
            if (speech[i] == ' ') {
              // A complete word has been found
              if (current_word.empty()) continue; // Skip repeated spaces
              bool found = false;
              for (size_t j = 0; j < id(words).size(); j++) {
                if (id(words)[j] == current_word) {
                  found = true;
                  id(word_index) = j;
                  // Queue the word; process_word is given the array position plus one
                  ESP_LOGI("Announcement Maker", "Playing word: %s at index: %d", current_word.c_str(), (int) j);
                  id(process_word).execute(j + 1);
                  break; // Exit the loop once the item is found
                }
              }
              if (!found) {
                for (size_t j = 0; j < id(wordstoo).size(); j++) {
                  if (id(wordstoo)[j] == current_word) {
                    found = true;
                    id(word_index) = j + 201;
                    // Second array: process_word treats values above 200 as wordstoo entries
                    ESP_LOGI("Announcement Maker", "Playing word: %s at index: %d", current_word.c_str(), (int) (j + 200));
                    id(process_word).execute(j + 201);
                    break; // Exit the loop once the item is found
                  }
                }
              }
              if (!found) {
                ESP_LOGI("Announcement Maker", "Word '%s' not found.", current_word.c_str());
                id(process_word).execute(18);
              }
              current_word.clear(); // Clear the current word for the next iteration
            } else {
              current_word += speech[i]; // Build the current word
            }
          }
  # Takes the track number of the word passed by make_speech and plays it, blocking until the whole word has been played
  - id: process_word
    parameters:
      word: int
    mode: queued
    then:
      - lambda: |-
          if (word > 200){
            ESP_LOGI("Process Word", "Playing word: %d, ==> %s", word - 1, id(wordstoo)[word - 201].c_str() );
            word = word - 1;
          } else {
            ESP_LOGI("Process Word", "Playing word: %d, ==> %s", word, id(words)[word - 1].c_str() );
          }
      - dfplayer.play:
          file: !lambda 'return word;'
      - delay: 100ms
      - wait_until:
          condition:
            lambda: return id(dfplayer_talker_hall).is_playing() == false;