Voice Chapter 11: multilingual assistants are here

Saw the live-stream, some ideas:

About confirmations:

  • I think it should be configurable
  • All my devices are in one area; I would want different types of confirmations for different devices

Configurable fuzzy match options:

  • Strict match
  • Very Strict match
  • Fuzzy match
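The three strictness levels above could be sketched as thresholds on a string-similarity score. This is a minimal illustration using Python's standard-library difflib; the `MatchMode` names and the specific threshold values are my assumptions, not anything from Home Assistant's actual matching code.

```python
# Sketch of configurable match strictness for voice phrases.
# Threshold values are illustrative assumptions.
from difflib import SequenceMatcher
from enum import Enum

class MatchMode(Enum):
    VERY_STRICT = 1.0   # exact match only
    STRICT = 0.9        # tolerate tiny typos / plural endings
    FUZZY = 0.7         # tolerate larger differences

def matches(spoken: str, target: str, mode: MatchMode) -> bool:
    """Return True if the spoken phrase is close enough to the target."""
    ratio = SequenceMatcher(None, spoken.lower(), target.lower()).ratio()
    return ratio >= mode.value

print(matches("living room light", "living room lights", MatchMode.FUZZY))        # True
print(matches("living room light", "living room lights", MatchMode.VERY_STRICT))  # False
```

A per-device setting could then just store a `MatchMode` per satellite, which would also cover the per-device confirmation idea above.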

Would also love to hear more about the voice hardware roadmap, with future reference-hardware plans for both ESPHome on ESP32 and Linux Voice Assistant on ARM64 satellites from the Open Home Foundation and partners. Will you, for example, be making an official Linux Voice Assistant product that matches the Home Assistant Voice Preview Edition as reference hardware?

Would it be a good idea to have voice satellites with a fixed microphone board and a modular "compute" (a.k.a. "core") board, like the FutureProofHomes Satellite1 modular design concept?

That is, make two swappable SoM (System-on-Module) "compute" boards, with one SoM board based on ESP32 and one SoM board based on a powerful ARM64 SoC (similar to the Raspberry Pi Zero 2 W)?

Maybe base the ARM64 SoM compute board on a SoC with a built-in NPU that is powerful enough to off-load Speech-to-Text, freeing up resources for other tasks.

Perhaps even make the ESP32 SoM compute board with multiple ESP32 chips on the same board, to off-load the communication tasks and also let it work as a Thread Border Router (if they could combine an ESP32-S3 with an ESP32-C6).

Also wondering if you could test whether inexpensive AI accelerator hardware can run a small LLM?

I'll stop you here… This does not exist.

The smallest GPU (yes, you need one in LLM land, it's not optional… VRAM is your limiting factor) you can hope to use and still have a decent experience with is something better than an Nvidia 3xxx with at LEAST 8 GB of VRAM. Preferably 16 or more.

That puts you at a MINIMUM of $800 USD. Probably in the low $1000s.
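The "VRAM is your limiting factor" point is easy to see with back-of-the-envelope arithmetic: model weights alone take roughly (parameter count × bytes per parameter), plus overhead for the KV cache and activations. The 20% overhead factor below is a rough assumption for illustration, not a measured figure.

```python
# Back-of-the-envelope VRAM estimate for running an LLM locally.
# The 20% overhead factor (KV cache, activations) is a rough assumption.
def vram_needed_gb(params_billion: float, bytes_per_param: float,
                   overhead: float = 0.2) -> float:
    weights_gb = params_billion * bytes_per_param  # 1B params at 1 byte ≈ 1 GB
    return weights_gb * (1 + overhead)

# A 7B model quantized to 4 bits (0.5 bytes/param) fits in 8 GB of VRAM;
# the same model at fp16 (2 bytes/param) does not.
print(round(vram_needed_gb(7, 0.5), 1))  # 4.2
print(round(vram_needed_gb(7, 2.0), 1))  # 16.8
```

That is why 8 GB is a practical floor even for small quantized models, and why 16 GB buys you noticeably more headroom.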


As of 2025.11, timer_command: conversation_command still only works for the standard "Home Assistant" agent.

Does anyone know if they will add a variety or more new wake words in the next version?

I would not expect them to. They have three (four if you include the stop word) perfectly good, copyright-cleared terms already (even if we don't like them) and have documented how to make your own.

If they build a new one and run afoul of copyright or service marks or trademark or… They get sued because they’re a business entity.

If you do it… it's your install… have fun. Maybe the copyright owner comes to tell you to stop, but HA isn't sued out of existence just for shipping something.

Given those choices, if I'm your dev PM I won't LET you make more; three is fine, we have bigger fish to fry. You have satisfied the build requirements and prevented scope creep and a lawsuit. I call it a win. Now, am I in that room, and do I KNOW they made that decision? No. But coming from the perspective of someone who's had to make decisions like that, I wouldn't hold my breath… (probably not)


I am fine with just a single "ok nabu", it just should work as well as "ok google" 🙂


Finally, I can see a route to moving away from Alexa/Plex. Lighting and music are my primary use cases (with heating, at least logging, being the next most relevant). LMS has replaced Plex now that I have some stand-alone wireless speakers (and I can use about 3 more squeezelite players).
I’m not too worried about the performance of local acceleration, I expect models and hardware to converge soon enough as the next ASIC iterations start coming to market.
Immediate priorities for me are phrase/context accuracy, and music library handling - which both seem to be in hand.