Well, now with the launch of the official Home Assistant Voice Preview Edition there is another clear reason for more powerful hardware: it is officially stated that the Home Assistant Green (and Home Assistant Yellow) does not have powerful enough hardware to run the larger Whisper models for Assist locally.
This was also mentioned multiple times by Paulus, J-Lo and Mike in the video linked here, and it is shown in the workflow diagram as well:
“The N100 platform by Intel is our recommended hardware for local voice, that is like the bare-minimum.”
“If you are going to go local then we recommend out-of-the-gate something like an Intel N100 or better, especially for Whisper, if you have a graphics processor with a couple of gigs VRAM then you can get really fast large models that have high accuracy running.”
So if you run Home Assistant OS on hardware with a relatively slow CPU (like the Home Assistant Green), you must offload speech-to-text and text-to-speech to cloud services for voice control to be fast enough to be usable. If you want to run fully locally, you need a faster computer to run Home Assistant OS and the local Whisper and Piper on.
Note that here they are not even talking about LLMs for AI conversation agents, only the STT and TTS parts. It is the Whisper part, i.e. the speech-to-text part, that is usually the bottleneck for local voice control (without an LLM). This means you will be dependent on the internet and cloud services for voice control if you do not have faster hardware to run everything locally.