What do I need to get a device working with HA's built-in voice assist and LLM?

Yes. It is similar to the HA Voice PE: an ESP32 device with a built-in speaker, amp, and mic.

If you set up a local HA voice assistant pipeline (STT / TTS / wake word), the device will use that and only that once you flash it with ESPHome firmware. With this setup it will be completely local.

I have ESPHome YAML for this device that supports the latest ESPHome version.
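
For a rough idea of what such a config involves, here is a generic sketch of an ESPHome voice satellite. This is not the YAML for this device: the device name, board, and every GPIO pin below are placeholder assumptions, and the mic/amp settings depend on the actual hardware.

```yaml
# Generic sketch only. Name, board, and all GPIO numbers are placeholders
# and must be replaced with the values for your actual device.
esphome:
  name: voice-satellite

esp32:
  board: esp32-s3-devkitc-1   # assumed board
  framework:
    type: esp-idf

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

# Native API connection to Home Assistant (needed for the voice pipeline)
api:

# Two I2S buses: one for the mic, one for the speaker amp
i2s_audio:
  - id: i2s_in
    i2s_lrclk_pin: GPIO3      # placeholder
    i2s_bclk_pin: GPIO2       # placeholder
  - id: i2s_out
    i2s_lrclk_pin: GPIO6      # placeholder
    i2s_bclk_pin: GPIO7       # placeholder

microphone:
  - platform: i2s_audio
    id: mic
    i2s_audio_id: i2s_in
    i2s_din_pin: GPIO4        # placeholder
    adc_type: external
    pdm: false

speaker:
  - platform: i2s_audio
    id: spk
    i2s_audio_id: i2s_out
    i2s_dout_pin: GPIO8       # placeholder
    dac_type: external

# Audio is streamed to HA, which runs wake word / STT / TTS in its own pipeline
voice_assistant:
  microphone: mic
  speaker: spk
  use_wake_word: true
  on_client_connected:
    - voice_assistant.start_continuous:
```

With `use_wake_word: true`, wake word detection happens in the HA pipeline rather than on the device, so everything stays local as long as the pipeline itself is local.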

Yes, but:

You can look in the forum. There are already devices with a screen that people have gotten working with ESPHome. Getting a device working with ESPHome can be straightforward, but when it comes to I2S and displays it gets a little complicated. Unless you are a programmer or want to learn, I suggest getting a device that is confirmed, tested, and that someone has already made YAML for.

Also, the device you list looks tiny, and the speaker and mic will likely be poor due to its size. For voice, the speaker matters. The mic matters as well, but mics tend to be less of an issue. I would also consider form factor: the device you list will not sit well on a table or other surface.

The Echo Ear looks OK.

The ESP32-S3-BOX is another option. I have one, but the audio is less than poor.

My current belief is that voice/mic and display should be separate. This is based on current device availability. I expect this will change by June 2026, with more and better devices available. For now I suggest you do one or the other, not both.

There are some ESP displays with a speaker (no mic), but I expect the audio is poor. Again, this is why, if you are looking for an Alexa-like device, you should get exactly that for now (ESP, speaker, mic, no display).