WiiM voice control through Speech-to-Phrase via API

I created API calls that integrate speech-to-text with REST commands in order to send direct control to a WiiM device. As you can see below, it was brought to my attention that intents and/or intent_scripts with proper integration should work similarly and more efficiently. I am still researching this and will update below once I have a complete example, so you don’t have to exhaust yourself searching through this vast, spread-out, piecemeal stream of documentation.

Thanks for the helpful information.
But I have something to add.

LinkPlay is a system integration for adding media players. Therefore, many commands are built-in, eliminating the need to duplicate volume control, playback controls, or mute. Simply name your devices/rooms.

There’s a built-in action for presets. You can also create an intent based on it.
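For reference, the built-in action can be called like this (the entity id and preset number here are hypothetical examples):

```yaml
# Developer Tools → Actions, or any script/automation step
action: linkplay.play_preset
target:
  entity_id: media_player.wiim_amp   # hypothetical entity id
data:
  preset_number: 1                   # preset slot as saved in the WiiM app
```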

The only command that requires API access is adjusting the equalizer. And perhaps selecting a position (this is a more convenient implementation than using play_media).

Also, “service” hasn’t been used for a long time; these are called actions now.

They don’t get exposed to voice commands when using speech-to-phrase. Nothing I have found actually exposes those entities as a media player.

I was only talking about intents. They’re not directly related to any STT engine.

I agree that the basic S2P dictionary doesn’t pay enough attention to all existing intents, so many phrases have to be added manually.

Let me rephrase: there is no documentation anywhere that demonstrates how to write intents that let speech-to-phrase operate the LinkPlay or WiiM integrations’ media player features. Specifically for WiiM devices: volume levels, playing presets, zone control, skip/previous, and stop/pause/play. Not only are they not defined as entities, they cannot be exposed by the standard media player intents. So, from everything I have seen, controlling this via voice command has to be done with URL API commands. If there are other options that are more streamlined or more efficient, I haven’t been able to find them, hence why I created this file.

do you have documentation that shows a more efficient way?

Is your WiiM Amp added to HA as a media player? Have you assigned a friendly name (or alias) to this media player and exposed it to Assist?
If so, the basic commands (their list can be found in the repository) are already working—you can check this in text mode.

Yes, it is added with a friendly name. I have it exposed to the Speech-to-Phrase add-on, but only 3 sections are enabled; it does not detect the other options available to a media player, and there is no demonstration of how intents would, for example, take the voice command “play driving on family speaker” and know how to get to preset 1 on the family speaker.

There are defined intents in S2P, and there are intents in the link you provided, but when added they fall into nothingness, because nothing wires them into the proper commands for LinkPlay/media player to actually do anything.

Do you have a WiiM device? If so, are you able to, or have you tried to, control it via the S2P voice assistant?

This is a misunderstanding. The S2P dictionary is independent; it describes all possible phrase variants that the server should recognize. Identical syntax is used for convenience.
It’s also true that control phrases must match intents, as I mentioned earlier.

Yes, I have both WiiM and other media players. I used a fixed dictionary in Vosk two years ago, before S2P came out. Back then, there were no intents for media players, and you really had to create them yourself. A lot has changed since then.

Since then, I’ve only kept this one intent_script.

SetPreset:
  action:
    action: linkplay.play_preset
    target:
      entity_id: "{{ lp_player }}"
    data:
      preset_number: "{{ plpreset }}"
  speech:
    text: "Ok"

If you are implementing multi-room control, it will also be convenient to implement them using the media_player.join/media_player.unjoin commands in intent_scripts.

I still have no idea how to make that work.

I don’t understand how there can be so much promotion around getting this working and yet hardly any cross-documentation that truly demonstrates how to make it work successfully. It feels like a forced push toward cloud-based services unless you 1. already knew the logic/syntax, or 2. had built this up in steps along the way so it was just adapted.

How do I modify my intents in sentences.yaml to adjust audio levels? Or better yet, is there a complete list of actions that specifically work by default with WiiM through LinkPlay?

Let’s use “relative volume change” as an example. You’ll need to select the desired phrases from this file and create identical templates in the s2p dictionary.

For example, if you want to use the automatic method for obtaining names, then for the phrase increase <name> volume you need to specify

  - sentences:
      - "increase {name} volume"
    domains:
      - "media_player"

But it’s better to use your own personal list that doesn’t contain unnecessary media players.

- "increase {wiim_name} volume"

lists:
  wiim_name:
    - "first device"
    - "second device"

If necessary, use multiple lists; you can use a range for any numeric values.
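As a sketch, a numeric range looks like this (the list name is an example):

```yaml
lists:
  volume:
    range:
      from: 0
      to: 100
```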

All this syntax serves one purpose - to avoid having to manually type each phrase variation.

Overall, creating a list for all relevant phrases for control requires quite a bit of work.
Personally, I gave up on this and switched to a real STT engine a year ago. You can run good models (like Parakeet v2) on an average home CPU. Or there are cloud options like Groq or Azure that give you free limits for Whisper.

If I still want something local, what would you suggest as a good model for something capable of 99 TOPS (a mini PC running a 285H CPU, 140T GPU, INT8 NPU)? I don’t mind writing everything out; the problem is that piecing everything together and getting it functional isn’t easy with the lack of documentation. People are great at writing stuff, but not so much on the tutorial or explanation side of things.

Also, writing out a database of sentences is useless if the limits of S2P often prevent multiple words from translating into the correct “out”, which makes speech useless for anything longer than one word. If I have to say “preset 1” instead of the playlist name, and try to remember that for all zones, because it can’t interpret a multi-word playlist name and convert it to 1 to play that preset, then it’s futile.

The parakeet and canary models are available in the onnx-asr addon/app.

Example from the documentation
And also here.

lists:
  brightness_level:
    values:
      - in: (max | maximum | highest)
        out: 100
      - in: ( minimum | lowest)
        out: 1

Thank you very much; you are a great and patient person. Now I actually have something to search from to figure out how to piece this together. Bear in mind, the API method I posted worked flawlessly, but I can see it being outmatched in the near future by better integrations that offer zone synchronization and easier customization for changes, or assignment to areas so that it works with simple speech.

By multiple words I mean “whole house” or “dancing songs” as an in that should properly translate to an out of 1 and 2 respectively. Anything in a list with a single-word in works fine, regardless of how many map to the same out. But S2P often transcribes two spoken words literally, even though they are defined in lists to out as 1 or 2, and it fails. So it doesn’t recognize “whole house”; it sees it as “whole” + “house”, and instead of sending 1 as defined in the list, it sends “whole” + “house”.

Also, how do you have your WiiM device exposed to intents? Right now mine is:

  name:
    values:
      - in: "Family room speaker"
        out: "Family room speaker"
        context:
          domain: "media_player"
          media_player_supports_pause: true
          media_player_supports_volume_set: true
          media_player_supports_next_track: true
        metadata:
          domain: "media_player"
      - in: "Family room speaker Group Master"
        out: "Family room speaker Group Master"
        context:
          domain: "media_player"
          media_player_supports_pause: false
          media_player_supports_volume_set: true
          media_player_supports_next_track: false
        metadata:
          domain: "media_player"
      - in: "Family room speaker Multiroom Role"
        out: "Family room speaker Multiroom Role"
        context:
          domain: "sensor"
        metadata:
          domain: "sensor"
      - in: "Current Input"
        out: "Current Input"
        context:
          domain: "sensor"
        metadata:
          domain: "sensor"

but I need to add it in so it isn’t seen as a different device, and can be substituted into lp_player for the LinkPlay intent_script.

If I understand what you’re saying correctly, I wouldn’t do the conversion on the S2P side, but leave that task to the intent. Let the preset name slot use a list with several input name values and one output preset number.
This is more versatile and will continue to work with other STT engines.
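A minimal sketch of that idea, with hypothetical playlist names: several spoken variants resolve to one preset number, and the intent script only ever receives the number.

```yaml
lists:
  plpreset:
    values:
      - in: "(driving | road trip)"   # several spoken names, one preset
        out: 1
      - in: "(dancing fun | party)"
        out: 2
```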

Custom sentences are an independent subsystem, so they don’t use “system data”. And for this task, it’s easiest to define a clear matching list.

  - in: "Family room speaker"
    out: "media_player.devicename1"
  - in: "yet another speaker"
    out: "media_player.devicename2"

You can also use standard template functions in the script, such as retrieving the device ID by name.

device_id(entity_id) returns the device ID for a given entity ID or device name. Can also be used as a filter.
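For example, in a script’s variables step (the entity id here is a hypothetical example):

```yaml
- variables:
    # look up the device ID for a given media player entity
    dev: "{{ device_id('media_player.familyroomspeaker') }}"
```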

However, the first option seems more predictable. Entity IDs can be useful when creating multi-room controls.

I am struggling with this.

The snippet I sent you with the media player layout is from the S2P sentences.yaml, and it is based on what was trained when the device was exposed via the voice assistant.

The reason I am asking all of these questions is: if they are exposed to the voice assistant, and the device is defined as a media player in the trained sentences.yaml file, wouldn’t adding the same entity in custom sentences create a duplicate, so that spoken voice commands error out due to duplicate entities?

Do I not expose that speaker as a media player to voice assistant and ONLY include it in the custom sentences?

for example:

in the custom sentences .yaml, enter under the lists indentation:

lp_player:
  values:
    - in: "family room speaker"
      out: media_player.familyroomspeaker
    - in: "living room speaker"
      out: media_player.livingroomspeaker

plpreset:
  values:
    - in: "driving"
      out: 1
    - in: "dancing fun"
      out: 2
    - in: "three"
      out: 3
    - in: "four"
      out: 4
    - in: "five"
      out: 5

and under intents indentation in the same .yaml

PlayPreset:
  sentences:
    - "play preset {plpreset} on {lp_player}"
    - "set {lp_player} to preset {plpreset}"

Would that still expose all the other media player actions, or do I need to define all of those separately as intents? If so, and I build those out as intents, would I need separate intent_scripts for the built-in media_player intents, or would I only need these specific LinkPlay intent scripts?

intent_script:
  SetPreset:
    action:
      action: linkplay.play_preset
      target:
        entity_id: "{{ lp_player }}"
      data:
        preset_number: "{{ plpreset }}"
    speech:
      text: "Playing your playlist"
  JoinSpeakers:
    speech:
      text: "Joining your speakers."
    action:
      - variables:
          master_entity: "{{ lp_player[0] }}"
          member_entities: "{{ lp_player[1:] }}"
      - action: media_player.join
        target:
          entity_id: "{{ master_entity }}"
        data:
          group_members: "{{ member_entities }}"
  UnjoinSpeakers:
    speech:
      text: "Speakers are no longer joined."
    action:
      - action: media_player.unjoin
        target:
          entity_id: "{{ lp_player }}"
  PlayPresetInAreas:
    speech:
      text: "Playing preset {{ plpreset }} in {{ area_names | join(', ') }}."
    action:
      - variables:
          speakers: >
            {% set ns = namespace(speakers=[]) %}
            {% for area in area_names %}
              {% set ns.speakers = ns.speakers + (area_entities(area)
                   | select('match', '^media_player\.')
                   | list) %}
            {% endfor %}
            {{ ns.speakers | unique | list }}
      - choose:
          - conditions: "{{ speakers | length == 0 }}"
            sequence:
              - action: conversation.process
                data:
                  text: "I couldn't find any speakers in {{ area_names | join(', ') }}."
          - conditions: "{{ speakers | length > 1 }}"
            sequence:
              - action: media_player.join
                target:
                  entity_id: "{{ speakers[0] }}"
                data:
                  group_members: "{{ speakers[1:] }}"
      - action: linkplay.play_preset
        target:
          entity_id: "{{ speakers[0] }}"
        data:
          preset_number: "{{ plpreset }}"

and do not expose these media players to the assistant at all?

I am slowly, if at all, learning this. Language learning and processing has never been my strong suit, so it is even harder for me to figure this out because it is literally all syntax. But I did just discover that by going into Developer Tools I can see what actions are available in my system, and part of what you are saying is starting to make more sense. So I am guessing that I don’t actually need to expose anything complex to S2P, but can build it manually and set intents to match actions.

I found the link to the built-in intents. I also used your link for media_player and added those intents to my custom sentences yaml file. I left the device exposed to the voice assistant and changed {{ lp_player }} to {{ media_player }} since I use the same players throughout the house. I edited and then added my intent_scripts from above. So hopefully I am starting on the right track.

I think you get the idea.
With custom sentences, the system will only recognize the data you specified (the two media players from your code).

The input phrase is play preset four on living room speaker for which a template is found: play preset {plpreset} on {lp_player} and two variables are passed to the script: 4 and media_player.livingroomspeaker.

In turn, the phrase play preset four on living room speaker should be specified in the S2P. Otherwise, the server will ignore this command and it won’t be recognized.

As for built-in intents, they’re the same scripts, but HA does all the work itself. The user is only required to give the device a convenient name, expose it, and use the predefined phrases.

For S2P, the logic remains the same; the phrases used must be specified. Either manually or using the auxiliary tools provided by the server.

So I think I have this set up and working, but my API calls supported shuffle and stop, which are not supported for WiiM devices through the built-in media_player or LinkPlay intents.

Is there a better way to send the URL API calls rather than through a service? The API calls do support this.

I can only guess why modern companies dislike the “stop” button in their app interfaces so much. The HA team copied this practice (but usually we’re dealing with hardware devices, not just cloud services, so this doesn’t quite make sense). Therefore, you’ll have to create the intent for media_player.media_stop yourself.

media_player.repeat_set and media_player.shuffle_set will probably be added later, but for now you’ll have to implement this yourself.
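A minimal sketch of such a do-it-yourself intent_script, reusing the lp_player slot from earlier; it assumes the entity actually reports the stop and shuffle features:

```yaml
intent_script:
  StopMusic:
    speech:
      text: "Stopped"
    action:
      action: media_player.media_stop
      target:
        entity_id: "{{ lp_player }}"
  ShuffleOnMusic:
    speech:
      text: "Shuffle on"
    action:
      action: media_player.shuffle_set
      target:
        entity_id: "{{ lp_player }}"
      data:
        shuffle: true
```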

Regarding EQ, I would create a dictionary in the script that matches the player ID to its IP address (which will be passed in the request). This would let you avoid duplicating the work with names and instead use the existing list.
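A sketch of that kind of dictionary inside a script step (the entity ids and IPs are placeholders; `player` is assumed to already hold the entity id resolved from the slot):

```yaml
- variables:
    ip_map:
      media_player.wiim_one: "192.168.1.50"
      media_player.wiim_two: "192.168.1.51"
    ip: "{{ ip_map[player] }}"
```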

Ok, I want to stick with Speech-to-Phrase because I want it local; I don’t want anything to be inferred or delayed. I want it to respond to exactly how we speak, the way the original Alexa did.

We also want to stick with Amazon Music because we like our created playlists and those specific versions of the songs. We also don’t mind using our mobile devices to send specific songs (on a query, or because we want a specific song) directly to the WiiM device from the Amazon Music app. This setup is strictly to be able to quickly call our Amazon Music playlists through a totally local system (aside from the audio stream).

With that in mind, I used your feedback:

configuration.yaml needs these blocks added at the proper indentation. You need to map the sensor to the correct media player entity ID and its corresponding IP address for each device you want to expose.

intent_script:
  PlayPreset:
    speech:
      text: "Playing your playlist."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
      - action: linkplay.play_preset
        target:
          entity_id: "{{ player }}"
        data:
          preset_number: "{{ plpreset }}"

  JoinSpeakers:
    speech:
      text: "Joining your speakers now."
    action:
      - variables:
          master_entity: "{{ lp_player[0] }}"
          member_entities: "{{ lp_player[1:] }}"
      - action: media_player.join
        target:
          entity_id: "{{ master_entity }}"
        data:
          group_members: "{{ member_entities }}"

  UnjoinSpeakers:
    speech:
      text: "Speakers are no longer joined."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
      - action: media_player.unjoin
        target:
          entity_id: "{{ player }}"

  PlayPresetInAreas:
    speech:
      text: "Playing {{ plpreset }} in {{ area_names | join(', ') }}."
    action:
      - variables:
          speakers: >
            {% set ns = namespace(speakers=[]) %}
            {% for area in area_names %}
              {% set ns.speakers = ns.speakers + (area_entities(area)
                   | select('match', '^media_player\.')
                   | list) %}
            {% endfor %}
            {{ ns.speakers | unique | list }}

      - choose:
          - conditions: "{{ speakers | length == 0 }}"
            sequence:
              - action: conversation.process
                data:
                  text: "I couldn't find any speakers in {{ area_names | join(', ') }}."

          - conditions: "{{ speakers | length > 1 }}"
            sequence:
              - action: media_player.join
                target:
                  entity_id: "{{ speakers[0] }}"
                data:
                  group_members: "{{ speakers[1:] }}"

      - action: linkplay.play_preset
        target:
          entity_id: "{{ speakers[0] }}"
        data:
          preset_number: "{{ plpreset }}"

  StopPlayback:
    speech:
      text: "Stopping playback."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      - action: rest_command.wiim_stop
        data:
          ip: "{{ ip }}"

  ShuffleOn:
    speech:
      text: "Shuffle enabled."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      - action: rest_command.wiim_shuffle
        data:
          ip: "{{ ip }}"
          mode: "2"

  ShuffleOff:
    speech:
      text: "Shuffle disabled."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      - action: rest_command.wiim_shuffle
        data:
          ip: "{{ ip }}"
          mode: "4"

  RepeatOne:
    speech:
      text: "Repeating the current track."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      - action: rest_command.wiim_repeat
        data:
          ip: "{{ ip }}"
          mode: "1"

  RepeatAll:
    speech:
      text: "Repeating all tracks."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      - action: rest_command.wiim_repeat
        data:
          ip: "{{ ip }}"
          mode: "0"

  EQPreset:
    speech:
      text: "Setting EQ preset."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      - action: rest_command.wiim_eq
        data:
          ip: "{{ ip }}"
          preset: "{{ eq_preset }}"

  InputSwitch:
    speech:
      text: "Switching input."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      - action: rest_command.wiim_input
        data:
          ip: "{{ ip }}"
          source: "{{ source }}"

  PlayEverywhere:
    speech:
      text: "Playing everywhere."
    action:
      - variables:
          wiim_players: "{{ state_attr('sensor.lp_player_ip', 'map').keys() | list }}"
      - choose:
          - conditions: "{{ wiim_players | length > 1 }}"
            sequence:
              - action: media_player.join
                target:
                  entity_id: "{{ wiim_players[0] }}"
                data:
                  group_members: "{{ wiim_players[1:] }}"
      - action: linkplay.play_preset
        target:
          entity_id: "{{ wiim_players[0] }}"
        data:
          preset_number: "{{ plpreset }}"

  UnjoinAll:
    speech:
      text: "Ungrouping all speakers."
    action:
      - variables:
          wiim_players: "{{ state_attr('sensor.lp_player_ip', 'map').keys() | list }}"
      - repeat:
          for_each: "{{ wiim_players }}"
          sequence:
            - action: media_player.unjoin
              target:
                entity_id: "{{ repeat.item }}"

  SetAllVolume:
    speech:
      text: "Setting volume on all speakers."
    action:
      - variables:
          wiim_players: "{{ state_attr('sensor.lp_player_ip', 'map').keys() | list }}"
      - repeat:
          for_each: "{{ wiim_players }}"
          sequence:
            - action: media_player.volume_set
              target:
                entity_id: "{{ repeat.item }}"
              data:
                volume_level: "{{ volume | int / 100 }}"

  PauseEverywhere:
    speech:
      text: "Pausing all speakers."
    action:
      - variables:
          wiim_players: "{{ state_attr('sensor.lp_player_ip', 'map').keys() | list }}"
      - repeat:
          for_each: "{{ wiim_players }}"
          sequence:
            - action: media_player.media_pause
              target:
                entity_id: "{{ repeat.item }}"

  StopEverywhere:
    speech:
      text: "Stopping all speakers."
    action:
      - variables:
          wiim_players: "{{ state_attr('sensor.lp_player_ip', 'map').keys() | list }}"
      - repeat:
          for_each: "{{ wiim_players }}"
          sequence:
            - variables:
                ip: "{{ state_attr('sensor.lp_player_ip', 'map')[repeat.item] }}"
            - action: rest_command.wiim_stop
              data:
                ip: "{{ ip }}"

  MuteAll:
    speech:
      text: "Muting all speakers."
    action:
      - variables:
          wiim_players: "{{ state_attr('sensor.lp_player_ip', 'map').keys() | list }}"
      - repeat:
          for_each: "{{ wiim_players }}"
          sequence:
            - action: media_player.volume_mute
              target:
                entity_id: "{{ repeat.item }}"
              data:
                is_volume_muted: true

  WiimStatus:
    speech:
      text: "Checking status."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      # newer HA versions can return the rest_command response directly,
      # so no helper sensor or delay is needed
      - action: rest_command.wiim_status
        data:
          ip: "{{ ip }}"
        response_variable: resp
      - variables:
          raw: "{{ resp.content if resp is defined else '{}' }}"
          data: "{{ raw | from_json if raw is string else raw }}"
          status: "{{ data.status | default('unknown') }}"
          volume: "{{ data.vol | default('unknown') }}"
          loop_mode: "{{ data.loop | default('unknown') | string }}"
          shuffle: "{{ 'on' if loop_mode in ['2', '3'] else 'off' }}"
          repeat: "{{ 'one' if loop_mode == '1' else 'all' if loop_mode == '0' else 'off' }}"
      - action: conversation.process
        data:
          text: >
            {{ player }} is currently {{ status }} at volume {{ volume }}.
            Shuffle is {{ shuffle }} and repeat is {{ repeat }}.

  WhatsPlaying:
    speech:
      text: "Checking what's playing."
    action:
      - variables:
          player: "{{ lp_player if lp_player is string else lp_player[0] }}"
          ip: "{{ state_attr('sensor.lp_player_ip', 'map')[player] }}"
      - action: rest_command.wiim_metadata
        data:
          ip: "{{ ip }}"
        response_variable: resp
      - variables:
          raw: "{{ resp.content if resp is defined else '{}' }}"
          data: "{{ raw | from_json if raw is string else raw }}"
          meta: "{{ data.metaData | default({}) }}"
          title: "{{ meta.title | default('unknown title') }}"
          artist: "{{ meta.artist | default('unknown artist') }}"
          album: "{{ meta.album | default('unknown album') }}"
      - action: conversation.process
        data:
          text: >
            Now playing on {{ player }}:
            {{ title }} by {{ artist }} from the album {{ album }}.

### Change this template sensor to add your device entity IDs and your specific device IPs; these must match the output used in wiim.yaml

template:
  - sensor:
      - name: lp_player_ip
        state: "ok"
        attributes:
          map: >
            {{ {
              "media_player.ENTITYID": "YOUR DEVICE IP"
            } }}

rest_command:

  wiim_stop:
    url: "https://{{ ip }}/httpapi.asp?command=setPlayerCmd:stop"
    method: GET
    verify_ssl: false

  wiim_shuffle:
    url: "https://{{ ip }}/httpapi.asp?command=setPlayerCmd:loopmode:{{ mode }}"
    method: GET
    verify_ssl: false

  wiim_repeat:
    url: "https://{{ ip }}/httpapi.asp?command=setPlayerCmd:loopmode:{{ mode }}"
    method: GET
    verify_ssl: false

  wiim_eq:
    url: "https://{{ ip }}/httpapi.asp?command=EQLoad:{{ preset }}"
    method: GET
    verify_ssl: false

  wiim_input:
    url: "https://{{ ip }}/httpapi.asp?command=setPlayerCmd:switchmode:{{ source }}"
    method: GET
    verify_ssl: false

  wiim_status:
    url: "https://{{ ip }}/httpapi.asp?command=getPlayerStatus"
    method: GET
    verify_ssl: false

  wiim_metadata:
    url: "https://{{ ip }}/httpapi.asp?command=getMetaInfo"
    method: GET
    verify_ssl: false
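To sanity-check any of these rest_commands without voice, you can call one manually from Developer Tools → Actions (the IP is a placeholder):

```yaml
action: rest_command.wiim_status
data:
  ip: "192.168.1.50"
```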

Then wiim.yaml holds the intents. Note that you need to update your lp_player list as shown in the #### comments below and add entries for additional devices.

As for playlists, the out has to be 1-12, and those preset numbers have to match the associated preset in that specific WiiM device. So if you have the same playlist name across multiple WiiMs, make sure it is the exact same preset number in the WiiM app on each device. If you try to play a preset on a different WiiM, it will still send the number resolved from your spoken word, but that device will play whatever is stored under that preset number. For example, if driving is preset 2 on WiiM1, and WiiM2 does not have the same driving preset, then saying “play my playlist driving on WiiM2” will play whatever preset 2 is on WiiM2.

This file is saved in my system via the Samba add-on in both config/custom_sentences/en/ AND share/speech-to-phrase/custom_sentences/en. Saving it in both locations seemed to help everything work with less debugging.

language: en
slots: 
  media_player: 
    domain: media_player
lists:
  lp_player:
    values:
      - in: "WIIM device 1"      ####your spoken device name
        out: "media_player.WIIMdeviceEntityID"       #######this must be the entity ID for that device
  plpreset:        ####change your playlist1, playlist2 to your spoken playlist name.
    values:
      - in: "playlist1"
        out: "1"
      - in: "playlist2"
        out: "2"
      - in: "playlist3"
        out: "3"
      - in: "playlist4"
        out: "4"
      - in: "playlist5"
        out: "5"
      - in: "playlist6"
        out: "6"
      - in: "playlist7"
        out: "7"
      - in: "playlist8"
        out: "8"
      - in: "playlist9"
        out: "9"
      - in: "playlist10"
        out: "10"
      - in: "playlist11"
        out: "11"
      - in: "playlist12"
        out: "12"
  eq_preset:
    values:
      - in: "flat"
        out: "Flat"
      - in: "acoustic"
        out: "Acoustic"
      - in: "bass booster"
        out: "Bass Booster"
      - in: "boost bass"
        out: "Bass Booster"
      - in: "extra bass"
        out: "Bass Booster"
      - in: "bass reducer"
        out: "Bass Reducer"
      - in: "reduce bass"
        out: "Bass Reducer"
      - in: "less bass"
        out: "Bass Reducer"
      - in: "classical"
        out: "Classical"
      - in: "dance"
        out: "Dance"
      - in: "deep"
        out: "Deep"
      - in: "electronic"
        out: "Electronic"
      - in: "edm"
        out: "Electronic"
      - in: "hip hop"
        out: "Hip-Hop"
      - in: "hip-hop"
        out: "Hip-Hop"
      - in: "rap"
        out: "Hip-Hop"
      - in: "jazz"
        out: "Jazz"
      - in: "latin"
        out: "Latin"
      - in: "loudness"
        out: "Loudness"
      - in: "loud"
        out: "Loudness"
      - in: "lounge"
        out: "Lounge"
      - in: "piano"
        out: "Piano"
      - in: "pop"
        out: "Pop"
      - in: "r&b"
        out: "R&B"
      - in: "rnb"
        out: "R&B"
      - in: "rock"
        out: "Rock"
      - in: "small speakers"
        out: "Small Speakers"
      - in: "small speaker"
        out: "Small Speakers"
      - in: "spoken word"
        out: "Spoken Word"
      - in: "podcast"
        out: "Spoken Word"
      - in: "treble booster"
        out: "Treble Booster"
      - in: "boost treble"
        out: "Treble Booster"
      - in: "treble reducer"
        out: "Treble Reducer"
      - in: "reduce treble"
        out: "Treble Reducer"
      - in: "vocal booster"
        out: "Vocal Booster"
      - in: "boost vocals"
        out: "Vocal Booster"
  source:
    values:
      - in: "optical"
        out: "optical"
      - in: "optical in"
        out: "optical"
      - in: "bluetooth"
        out: "bluetooth"
      - in: "bt"
        out: "bluetooth"
      - in: "line in"
        out: "line-in"
      - in: "line-in"
        out: "line-in"
      - in: "aux"
        out: "line-in"
      - in: "wifi"
        out: "wifi"
      - in: "wi-fi"
        out: "wifi"
  mode:
    values:
      - in: "shuffle"
        out: "shuffle"
      - in: "random"
        out: "shuffle"
      - in: "repeat"
        out: "repeat"
      - in: "loop"
        out: "repeat"
  volume:
    range:
      from: 0
      to: 100
  position:
    range:
      from: 0
      to: 3600
intents:
  HassMediaPause:
    data:
      - sentences:
          - "(pause;<name>)"
        requires_context:
          domain: media_player
      - sentences:
          - "pause"
        requires_context:
          area:
            slot: true
      - sentences:
          - "pause [[the|my] (music|[tv] show[s]|media [player[s]])] [in] <area>"
          - "pause <area> (music|[tv] show[s]|media [player[s]])"
  HassMediaPlayerMute:
    data:
      - sentences:
          - "mute"
          - "mute ([the] (music|media [player[s]])) [<here>]"
        requires_context:
          area:
            slot: true
      - sentences:
          - "mute <name>"
          - "mute ([the] (music|media [player[s]])) on <name>"
        requires_context:
          domain: media_player
  HassMediaPlayerUnmute:
    data:
      - sentences:
          - "unmute"
          - "unmute ([the] (music|media [player[s]])) [<here>]"
        requires_context:
          area:
            slot: true
      - sentences:
          - "unmute <name>"
          - "unmute ([the] (music|media [player[s]])) on <name>"
        requires_context:
          domain: media_player
  HassMediaPrevious:
    data:
      - sentences:
          - "previous (track|item) [on|for] <name>"
          - "(go back [to the (previous|last) (song|track)];[on] <name>)"
          - "replay [the (previous|last) (song|track)] on <name>"
          - "<name> (play|replay) [the] (previous|last) [(song|track)] [again]"
        requires_context:
          domain: media_player
      - sentences:
          - "previous (track|item)"
          - "go back [to the (previous|last) (song|track)]"
          - "replay [the (previous|last) (song|track)] [again]"
          - "play [the] (previous|last) [(song|track)] [again]"
        requires_context:
          area:
            slot: true
      - sentences:
          - "previous (track|item) [in] <area>"
          - "go back [to the (previous|last) (song|track)] [in] <area>"
          - "(replay [the (previous|last) (song|track)] [again];[in] <area>)"
          - "(play [the] (previous|last) [(song|track)] [again];[in] <area>)"
  HassMediaUnpause:
    data:
      - sentences:
          - "((unpause|resume);<name>)"
        requires_context:
          domain: media_player
      - sentences:
          - "(unpause|resume)"
        requires_context:
          area:
            slot: true
      - sentences:
          - "(unpause|resume) [[the|my] (music|[tv] show[s]|media [player[s]])] [in] <area>"
          - "(unpause|resume) <area> (music|[tv] show[s]|media [player[s]])"
  HassSetVolume:
    data:
      - sentences:
          - "<numeric_value_set> <name> volume to <volume>"
          - "turn <name> [volume] (up|down) to <volume>"
          - "(<numeric_value_set> the volume to <volume>;[on] <name>)"
          - "(turn (the volume;(up|down)) to <volume>;[on] <name>)"
        requires_context:
          domain: media_player
      - sentences:
          - "<numeric_value_set> volume to <volume>"
          - "turn volume (up|down) to <volume>"
          - "<numeric_value_set> the volume to <volume>"
          - "turn (the volume;(up|down)) to <volume>"
        requires_context:
          area:
            slot: true
      - sentences:
          - "<numeric_value_set> <area> volume to <volume>"
          - "turn <area> [volume] (up|down) to <volume>"
          - "turn [volume] (up|down) to <volume> [in] <area>"
          - "(<numeric_value_set> the volume to <volume>;[in] <area>)"
          - "<numeric_value_set> the volume [in] <area> to <volume>"
          - "(turn (the volume;(up|down)) to <volume>;[in] <area>)"
  HassMediaNext:
    data:
      - sentences:
          - "next [track|item] [on|for] <name>"
          - "(skip [(to [the] next [(song|track)]|([the] (song|track)|this [(song|track)]))];[on] <name>)"
        requires_context:
          domain: media_player
      - sentences:
          - "next [track|item]"
          - "(skip [(to [the] next [(song|track)]|([the] (song|track)|this [(song|track)]))])"
        requires_context:
          area:
            slot: true
      - sentences:
          - "next [track|item] [in] <area>"
          - "(skip [(to [the] next [(song|track)]|([the] (song|track)|this [(song|track)]))];[in] <area>)"
  HassSetVolumeRelative:
    data:
      - sentences:
          - "[turn [the]] volume up"
          - "turn up [the] volume"
          - "increase [the] volume"
        slots:
          volume_step: "up"
        requires_context:
          area:
            slot: true
      - sentences:
          - "[turn [the]] volume up [by] <volume_step>"
          - "turn up [the] volume [by] <volume_step>"
          - "increase [the] volume [by] <volume_step>"
        expansion_rules:
          volume_step: "{volume_step_up:volume_step}[%| percent]"
        requires_context:
          area:
            slot: true
      - sentences:
          - "[turn [the]] volume down"
          - "turn down [the] volume"
          - "decrease [the] volume"
        slots:
          volume_step: "down"
        requires_context:
          area:
            slot: true
      - sentences:
          - "[turn [the]] volume down [by] <volume_step>"
          - "turn down [the] volume [by] <volume_step>"
          - "decrease [the] volume [by] <volume_step>"
        expansion_rules:
          volume_step: "{volume_step_down:volume_step}[%| percent]"
        requires_context:
          area:
            slot: true
      - sentences:
          - "(<name>;volume up)"
          - "increase [the] volume [on|for] <name>"
          - "increase <name> volume"
          - "turn up <name> volume"
        slots:
          volume_step: "up"
        requires_context:
          domain: "media_player"
      - sentences:
          - "(<name>;volume up) [by] <volume_step>"
          - "increase [the] volume ([on|for] <name>;[by] <volume_step>)"
          - "increase <name> volume [by] <volume_step>"
          - "turn up <name> volume [by] <volume_step>"
        expansion_rules:
          volume_step: "{volume_step_up:volume_step}[%| percent]"
        requires_context:
          domain: "media_player"
      - sentences:
          - "(<name>;volume down)"
          - "decrease [the] volume [on|for] <name>"
          - "decrease <name> volume"
          - "turn down <name> volume"
        slots:
          volume_step: "down"
        requires_context:
          domain: "media_player"
      - sentences:
          - "(<name>;volume down) [by] <volume_step>"
          - "decrease [the] volume ([on|for] <name>;[by] <volume_step>)"
          - "decrease <name> volume [by] <volume_step>"
          - "turn down <name> volume [by] <volume_step>"
        expansion_rules:
          volume_step: "{volume_step_down:volume_step}[%| percent]"
        requires_context:
          domain: "media_player"
      - sentences:
          - "(<area_floor>;volume up)"
          - "[turn [the]] volume up (in|on) <area_floor>"
          - "turn up [the] volume [in|on] <area_floor>"
          - "turn up <area_floor> volume"
          - "increase [the] volume [in|on] <area_floor>"
          - "increase <area_floor> volume"
        slots:
          volume_step: "up"
      - sentences:
          - "(<area_floor>;volume up) [by] <volume_step>"
          - "[turn [the]] volume up ((in|on) <area_floor>;[by] <volume_step>)"
          - "turn up [the] volume ([in|on] <area_floor>;[by] <volume_step>)"
          - "turn up <area_floor> volume [by] <volume_step>"
          - "increase [the] volume ([in|on] <area_floor>;[by] <volume_step>)"
          - "increase <area_floor> volume [by] <volume_step>"
        expansion_rules:
          volume_step: "{volume_step_up:volume_step}[%| percent]"

      - sentences:
          - "(<area_floor>;volume down)"
          - "[turn [the]] volume down (in|on) <area_floor>"
          - "turn down [the] volume [in|on] <area_floor>"
          - "turn down <area_floor> volume"
          - "decrease [the] volume [in|on] <area_floor>"
          - "decrease <area_floor> volume"
        slots:
          volume_step: "down"
      - sentences:
          - "(<area_floor>;volume down) [by] <volume_step>"
          - "[turn [the]] volume down ((in|on) <area_floor>;[by] <volume_step>)"
          - "turn down [the] volume ([in|on] <area_floor>;[by] <volume_step>)"
          - "turn down <area_floor> volume [by] <volume_step>"
          - "decrease [the] volume ([in|on] <area_floor>;[by] <volume_step>)"
          - "decrease <area_floor> volume [by] <volume_step>"
        expansion_rules:
          volume_step: "{volume_step_down:volume_step}[%| percent]"
  PlayPreset:
    data:
      - sentences:
          - "play preset {plpreset} on {lp_player}"
          - "play playlist {plpreset} on {lp_player}"
          - "play my playlist {plpreset} on {lp_player}"
          - "play {plpreset} on {lp_player}"
          - "set {lp_player} to preset {plpreset}"
          - "set the {lp_player} to preset {plpreset}"
          - "set {lp_player} to playlist {plpreset}"
          - "set the {lp_player} to playlist {plpreset}"
          - "{lp_player} play preset {plpreset}"
          - "{lp_player} play playlist {plpreset}"
          - "{lp_player} play {plpreset}"
        slots:
          lp_player: lp_player
          plpreset: plpreset
  JoinSpeakers:
    data:
      - sentences:
          - "join {lp_player}"
          - "group {lp_player}"
          - "combine {lp_player}"
        slots:
          lp_player:
            type: lp_player
            list: true
  UnjoinSpeakers:
    data:
      - sentences:
          - "unjoin {lp_player}"
          - "ungroup {lp_player}"
          - "separate {lp_player}"
        slots:
          lp_player: lp_player
  PlayPresetInAreas:
    data:
      - sentences:
          - "play {plpreset} in {area_names}"
          - "play my {plpreset} playlist in {area_names}"
        slots:
          plpreset: plpreset
          area_names:
            type: area
            list: true
  StopPlayback:
    data:
      - sentences:
          - "stop {lp_player}"
          - "stop the music on {lp_player}"
        slots:
          lp_player: lp_player
  ShuffleOn:
    data:
      - sentences:
          - "shuffle on {lp_player}"
          - "enable shuffle on {lp_player}"
        slots:
          lp_player: lp_player
  ShuffleOff:
    data:
      - sentences:
          - "shuffle off {lp_player}"
          - "disable shuffle on {lp_player}"
        slots:
          lp_player: lp_player
  RepeatOne:
    data:
      - sentences:
          - "repeat one on {lp_player}"
        slots:
          lp_player: lp_player
  RepeatAll:
    data:
      - sentences:
          - "repeat all on {lp_player}"
        slots:
          lp_player: lp_player

  EQPreset:
    data:
      - sentences:
          - "set {lp_player} EQ to {eq_preset}"
        slots:
          lp_player: lp_player
          eq_preset: eq_preset
  InputSwitch:
    data:
      - sentences:
          - "set {lp_player} input to {source}"
          - "switch {lp_player} to {source}"
        slots:
          lp_player: lp_player
          source:
            values:
              - "optical"
              - "bluetooth"
              - "line-in"
              - "wifi"
  PlayEverywhere:
    data:
      - sentences:
          - "play {plpreset} everywhere"
          - "play my {plpreset} playlist everywhere"
          - "play preset {plpreset} on all speakers"
          - "play {plpreset} on every speaker"
        slots:
          plpreset: plpreset
  UnjoinAll:
    data:
      - sentences:
          - "ungroup all speakers"
          - "unjoin all speakers"
          - "separate all speakers"
  SetAllVolume:
    data:
      - sentences:
          - "set all speakers to {volume} percent"
          - "set volume to {volume} on all speakers"
          - "set every speaker to {volume} percent"
        slots:
          volume: volume
  PauseEverywhere:
    data:
      - sentences:
          - "pause everywhere"
          - "pause all speakers"
          - "pause the music everywhere"
  StopEverywhere:
    data:
      - sentences:
          - "stop everywhere"
          - "stop all speakers"
          - "stop the music everywhere"
  MuteAll:
    data:
      - sentences:
          - "mute all speakers"
          - "mute everything"
          - "mute everywhere"
  WiimStatus:
    data:
      - sentences:
          - "what is the status of {lp_player}"
          - "give me the status of {lp_player}"
        slots:
          lp_player: lp_player
  WhatsPlaying:
    data:
      - sentences:
          - "what's playing on {lp_player}"
          - "what song is playing on {lp_player}"
          - "what is playing on {lp_player}"
        slots:
          lp_player: lp_player

I tested the features that matter to me and they work as intended. Please note that the custom intents/intent_scripts were added to cover WiiM controls that cannot be handled by the built-in intents, and they only work with WiiM devices that accept the same HTTP API commands. Other media_players will not respond to these rest commands, so you would need to adapt the calls to your specific device(s), map them accordingly, and keep them in a separate list in your custom sentences.
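
For anyone wiring this up, here is a minimal sketch of how one of the custom intents (EQPreset) can be connected to the WiiM HTTP API through a `rest_command` plus an `intent_script`. Treat the IP address, the `EQLoad:` command string, and the templates as assumptions: they are based on the published WiiM/LinkPlay HTTP API, so substitute your own device address and verify the command against your firmware.

```yaml
# configuration.yaml sketch -- assumes a single WiiM at 192.168.1.50 and
# the LinkPlay-style httpapi.asp endpoint; adjust for your device/firmware.
rest_command:
  wiim_eq_load:
    # WiiM devices serve this API over HTTPS with a self-signed certificate,
    # hence verify_ssl: false.
    url: "https://192.168.1.50/httpapi.asp?command=EQLoad:{{ preset }}"
    verify_ssl: false

intent_script:
  EQPreset:
    action:
      - action: rest_command.wiim_eq_load
        data:
          # eq_preset is filled by the {eq_preset} slot from the sentences above
          preset: "{{ eq_preset }}"
    speech:
      text: "Setting EQ to {{ eq_preset }}"
```

With more than one device you would also need a mapping from the `{lp_player}` slot to each device's IP (for example a dictionary in the template); this sketch hard-codes a single player to stay short.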

To note, @mchk: the reason I love stop is that it forces the LinkPlay entity in HA for that WiiM back into idle mode with no playlist selected, which means playback cannot be accidentally restarted by a misinterpreted word. In our household we love silence as much as we love music, so no surprises is critical for us. We use resume/pause when we want to actively listen to music, and we use stop when we are finished and moving on; for us it is like turning off the light.
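
Since stop (unlike pause) is exposed as a built-in media_player action, the StopPlayback intent above does not even need the HTTP API. Here is one possible intent_script wiring; the entity-id template is an assumption that the spoken `{lp_player}` name matches the entity's object_id (e.g. "kitchen wiim" becoming media_player.kitchen_wiim), so adapt it to however your players are named.

```yaml
# intent_script sketch for StopPlayback using the built-in
# media_player.media_stop action instead of a rest command.
intent_script:
  StopPlayback:
    action:
      - action: media_player.media_stop
        target:
          # Assumed naming convention: spoken name -> entity object_id
          entity_id: "media_player.{{ lp_player | lower | replace(' ', '_') }}"
    speech:
      text: "Stopped {{ lp_player }}"
```

Stopping this way drops the LinkPlay player back to idle with no playlist selected, which is exactly the "turning off the light" behavior described above.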