Well… The whole voice assistant business is not so much a rabbit hole as a vast, growing warren of intersecting burrows.
My understanding is largely non-technical, but for what it’s worth (I’m sure someone will correct me)…
There are a number of stages in any voice assistant setup:
Waking up the voice assistant device with the microphone
Purpose-built voice assistants are mostly designed to be always listening, waiting for a “wake word”, but this is usually handled locally, in the device itself. If you use a phone or tablet with the HA app you can activate the microphone manually by tapping a button. Either way, this is not something that would normally be handled centrally (on a Synology server, for example).
Normally you would have a voice assistant device in or within earshot of each room, but if you use the HA app on your phone you can use it anywhere. You can do both, of course.
Someone, somewhere has probably implemented a clapper (someone, somewhere has done nearly everything), but I’ve never come across anything like it.
Issuing a command through the device
Commands are handled centrally on your Home Assistant server (which could be a Synology).
Broadly speaking, there are two approaches. You can create custom sentences and responses (intents), which involves a bit of code writing, or you can use commands that are built into HA.
Any device integrated with HA can be “exposed” to Assist (with a simple tickbox) and turned on and off with the built-in commands, and there is a growing list of more complex commands like “Set the heating to XX”, “Put XX on the shopping list”, which don’t need any setting up.
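To give an idea of the custom-sentence route mentioned above: the sentences live in a small YAML file under your config folder, and intent_script tells HA what to do and say when one of them matches. A minimal sketch, with made-up names (TurnOnDeskLamp, light.desk_lamp and the file name are only examples):

```yaml
# config/custom_sentences/en/desk_lamp.yaml
language: "en"
intents:
  TurnOnDeskLamp:
    data:
      - sentences:
          - "switch on the desk lamp"
          - "put the desk lamp on"
```

```yaml
# configuration.yaml
intent_script:
  TurnOnDeskLamp:
    action:
      - service: light.turn_on
        target:
          entity_id: light.desk_lamp  # whatever your entity is actually called
    speech:
      text: "Desk lamp is on."
```

The built-in commands need none of this - exposing the entity is enough.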
Speech to text
Whichever route you take (you can mix both), your spoken sentence has to be converted into text and then interpreted as commands that HA can obey. This can be done locally, but it is a very deep rabbit hole indeed. Most voice applications will send your command to the cloud one way or another. Voice Assistant PE is designed to do this.
Text to speech
Any responses to your commands need to be converted back to speech (even if it’s only “Sorry, I didn’t understand that”). There are several (rather robotic) local options which run on your HA server and many very realistic voices if you don’t mind connecting to a cloud service.
Responses to custom sentences can be sent to any integrated speaker; responses to built-in commands always come back through the device you spoke to. If you’re happy with the built-in commands there shouldn’t be any coding involved. If you want to go down yet another rabbit hole, you can connect to an LLM service to get more varied and “realistic” responses.
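To illustrate the “any integrated speaker” point: the action part of a custom intent (or any automation) can push its spoken response to whichever speaker you like with the tts.speak service. A minimal sketch, assuming a local Piper TTS engine and a kitchen speaker (both entity ids are placeholders for whatever you have set up):

```yaml
# Inside an intent_script action (or an automation)
- service: tts.speak
  target:
    entity_id: tts.piper                          # the local text-to-speech engine
  data:
    media_player_entity_id: media_player.kitchen_speaker
    message: "The washing machine has finished."
```

Piper is one of the local (rather robotic) options; a cloud voice would plug in the same way, just through a different tts entity.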
Hope this is some help. Unfortunately the docs about using Assist and voice commands are even more obscure than usual. If you haven’t found it already, this is a good starting point:
Edit: A historical note, which may make things a bit clearer: Assist predates Voice Assistant PE (by a couple of years) and was designed to be used through the HA app, with text or voice commands. The whole tottering edifice is built from there…