Yes, sure!
ESPHome is a framework for putting together smart connected devices using ESP8266 and ESP32 microcontrollers (tiny low-power computers with Wi-Fi connectivity and, on the ESP32, Bluetooth as well).
These computers are not full PCs like a Raspberry Pi (on which you would normally install a complete Linux system). Rather, the programming that goes into these devices is far lower level; you could even argue that these devices, once programmed, have no “operating system” in the usual sense.
Unlike standard Arduino programming for these boards, ESPHome is much more like a LEGO assemble-your-device experience: you describe (via YAML) what each part hooked to the board “means” (in terms of “this device is a light, that gizmo is a thermometer, this other gizmo hooked here is a light sensor”). There is one similarity to standard Arduino programming, in that the initial programming (“flashing”) of the computer is normally done over a serial connection (a USB cable or a serial adapter). Once ESPHome has been installed on the computer, however, you can upgrade its software over the air.
ESPHome has native compatibility with Home Assistant, which means that sensors and switches you program on your board can by default appear as entities in Home Assistant directly. E.g. if your ESP device has an LED connected on pin 5, you can write some very simple YAML (to be programmed into the device) that makes this LED appear as a light in Home Assistant — and toggling the switch in Home Assistant would turn the LED on and off.
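To make that concrete, here is a rough sketch of what such a YAML file might look like. The device name, board type, Wi-Fi credentials, and the `led_output` ID are all placeholders I made up for illustration; swap in whatever matches your hardware.

```yaml
# Minimal sketch: an LED on GPIO5 exposed to Home Assistant as a light.
# All names and credentials below are placeholders, not real values.
esphome:
  name: led-demo

esp8266:
  board: d1_mini          # assumption: a Wemos D1 mini; use your own board

wifi:
  ssid: "MyNetwork"       # placeholder
  password: "MyPassword"  # placeholder

api:                      # enables the native Home Assistant connection

ota:                      # enables over-the-air updates
  - platform: esphome     # older ESPHome releases use a bare "ota:" instead

output:
  - platform: gpio        # a plain on/off GPIO pin
    pin: GPIO5
    id: led_output

light:
  - platform: binary      # an on/off light driven by the output above
    name: "Demo LED"
    output: led_output
```

Once the device is flashed and added to Home Assistant, "Demo LED" should show up as a light entity you can toggle.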
Finally, ESPHome (just like Arduino programming) also lets you write code for your device, in the form of small C++ snippets called lambdas embedded in the YAML. This is the wild card for doing things that simply wouldn’t be possible via the standard LEGO-like building blocks.
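As a small illustration of the custom-code route, the fragment below uses a template sensor whose value is computed by a short C++ lambda. The sensor name and the calculation are just an example I picked; it would sit alongside the usual `esphome:`, `wifi:`, and `api:` sections of a full config.

```yaml
# Sketch: a sensor whose value comes from hand-written C++ (a "lambda").
sensor:
  - platform: template
    name: "Uptime Minutes"       # hypothetical name
    update_interval: 60s
    lambda: |-
      // Plain C++: milliseconds since boot, converted to minutes.
      return millis() / 60000.0f;
```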
Yes, given the right hardware (most likely an ESP board, an amplifier, and an I2S chip, all hooked together properly) you can use ESPHome to make a “connected speaker” which can present itself to Home Assistant, or possibly even serve other protocols like Bluetooth.
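For a rough idea of the speaker setup, a config fragment along these lines is roughly what I would expect, assuming an ESP32 with an external I2S DAC/amplifier. The pin numbers and the name are placeholders, and the exact options can differ between ESPHome versions, so check the current documentation before copying this.

```yaml
# Sketch only: ESP32 + external I2S DAC exposed as a media player.
# Pin numbers are placeholders; match them to your wiring.
i2s_audio:
  i2s_lrclk_pin: GPIO33   # word select / left-right clock
  i2s_bclk_pin: GPIO19    # bit clock

media_player:
  - platform: i2s_audio
    name: "Connected Speaker"   # hypothetical name
    dac_type: external          # audio goes to an external I2S DAC
    i2s_dout_pin: GPIO22        # serial audio data out
    mode: mono
```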
I hope that was helpful.
EDIT: I wrote a longer explainer up here: