About making inexpensive models smarter by providing tools and context (local models, gpt-5-mini, gpt-4.1-mini, gpt-4o-mini ...).

The next thing is something I already posted about in a different topic, but it deserves its own post here (this is the ONE topic that will help you convince your family of the “new Alexa”):

Become better than Alexa at music control! :dancer:t2: :musical_score:

Or, to put it in ChatGPT’s words:

The only truly great thing about Alexa was always the music voice control. You could yell “Alexa, play 90s grunge rock that doesn’t suck” from across the room, and boom – Nirvana. The rest? Ask her anything remotely complex and she’d either give you weather in Botswana or proudly respond, “Sorry, I don’t know that one.” Thanks, but no thanks.

I knew that I could never replace Alexa in the living room without REALLY well-working music control. She understands all those crazy band and song names, even though we otherwise don’t speak English to her.
And she can access the Spotify catalogue and search it for content that matches our requests.
She’s a one-trick pony whose superpower is not the one her developers hoped for (they don’t really make money from it).
The kids would kill me if this stopped working, no matter what other benefits the new voice assistant promises.

The easiest way to get this done is Music Assistant:

Marcel, its developer, also works at Home Assistant (a.k.a. “the Matter guy”), and Music Assistant and Home Assistant are increasingly becoming a perfect couple.

You get media search and play actions in Home Assistant for the services added to Music Assistant, like TuneIn, Spotify, Apple Music, Tidal, etc., without needing a specific integration that provides search functionality on its own.
So even if you haven’t considered it as a replacement for your Plex app, Sonos app or whatever else you use for music:
Voice control is still a good reason to give it a try.

There are 3 important parts for the best performance:

  • Internet search.
    It doesn’t matter whether it comes through the “allow web search” option in the model configuration or through the Google Generative AI script for web searches I showed a few posts above.
  • The Music Assistant Play Music blueprint (option 3 / full LLM script) from here: link
    (Name and describe it as Music Play, not Music Play/Search, to help the LLM distinguish it from the Music Search script below.)
  • The Music Assistant search script I posted here: link
    Searches your music services (without playing) and returns the results to the LLM.

This allows you to do things like:

  • Ask about playlists for a keyword and then choose one of the results that should be played.
  • Search the web for the newest album of an artist whose name is something like <slightly_wrong_artist_name> when you’ve forgotten the album name, and play it in a specific room.
  • Search the web for the top songs of a genre, then choose a few from the reply to play (it can add multiple songs to a player, which will play the first one and enqueue the rest).
  • Say “We’re sitting here at breakfast, looking for a few suitable playlists to choose from.”
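To make the search-then-play flow a bit more concrete, here is a minimal Python sketch of the control flow only. It assumes a tiny in-memory catalogue in place of the real music services, and `search_music`, `plan_playback` and the "play"/"enqueue" action labels are all hypothetical names, not the Music Assistant or Home Assistant API. In the real setup the LLM calls the search script, picks from the results and then calls the play script.

```python
from dataclasses import dataclass


@dataclass
class Track:
    title: str
    artist: str


# Hypothetical stand-in for the music services searched by the Music Search script.
CATALOGUE = [
    Track("Smells Like Teen Spirit", "Nirvana"),
    Track("Come as You Are", "Nirvana"),
    Track("Black Hole Sun", "Soundgarden"),
]


def search_music(query: str, limit: int = 5) -> list[Track]:
    """Search only (nothing is played): return tracks whose title or artist contains the query."""
    q = query.casefold()
    hits = [t for t in CATALOGUE if q in t.title.casefold() or q in t.artist.casefold()]
    return hits[:limit]


def plan_playback(tracks: list[Track], player: str) -> list[dict]:
    """Build hypothetical actions: play the first track, enqueue the rest on the same player."""
    if not tracks:
        return []
    actions = [{"action": "play", "player": player, "track": tracks[0]}]
    for track in tracks[1:]:
        actions.append({"action": "enqueue", "player": player, "track": track})
    return actions


# The "choose from the results" step, which the LLM does in practice.
results = search_music("nirvana")
for step in plan_playback(results, "media_player.living_room"):
    print(step["action"], "-", step["track"].artist, "-", step["track"].title)
```

Keeping search and play as two separate tools is what lets you browse results first and decide afterwards, rather than having the first match start playing right away.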

So we’re not just catching up with Alexa here, we’re beating her when it comes to user experience.

Another observation: I’m not sure about other music services, but Spotify returns not only EXACT matches but also related content and results with slightly different spellings.
When you ask for content to choose from, this improves the user experience a lot.

The moment I implemented this and showed it to my wife and the kids was also the moment we unplugged Alexa and replaced her with the Home Assistant Voice Preview Edition in the living room. :slightly_smiling_face:

You can also add this to your prompt if you want volume control similar to Alexa’s:

When we don’t specify how much the volume should be changed, use HassSetVolumeRelative with volume_step set exactly to 10 or -10.
THIS IS IMPORTANT: Always change the volume by setting the volume_step as a numeric value.
Sometimes we say things like “2 times louder” or “4 times louder”.
This has to be interpreted as a multiplier to the default volume_step of 10.
So “4 times louder” would be an increase of 4*10 = 40.
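The multiplier rule in the prompt boils down to simple arithmetic: volume_step = ±(multiplier × 10). The Python sketch below only makes that rule explicit. In practice the LLM does this interpretation and then calls HassSetVolumeRelative itself. The `volume_step` function, the keyword lists and the small number-word table are hypothetical and only illustrate the rule.

```python
import re

DEFAULT_STEP = 10  # default volume_step from the prompt above

# Hypothetical, deliberately small vocabulary for this illustration.
WORD_NUMBERS = {"two": 2, "three": 3, "four": 4, "five": 5}
DOWN_WORDS = ("quieter", "softer", "lower")


def volume_step(utterance: str) -> int:
    """Map a spoken request to a signed volume_step: multiplier * 10, negative for 'quieter'."""
    text = utterance.casefold()
    sign = -1 if any(word in text for word in DOWN_WORDS) else 1
    match = re.search(r"(\d+|two|three|four|five)\s+times", text)
    multiplier = 1  # no multiplier given -> exactly 10 or -10
    if match:
        token = match.group(1)
        multiplier = int(token) if token.isdigit() else WORD_NUMBERS[token]
    return sign * multiplier * DEFAULT_STEP


for request in ("louder", "4 times louder", "2 times quieter"):
    print(request, "->", volume_step(request))
```

Stating the rule this explicitly in the prompt (a numeric volume_step, a default of 10 and a multiplier) leaves the model less room to pick odd step sizes on its own.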
