Home Assistant Voice Preview, TTS Failing & VLANs

I have just installed a new Home Assistant Voice Preview Edition (HA VPE), and am having some Text to Speech issues which look to be network related. This is running locally and not using the HA Cloud.

It looks like everything is working up until the HA VPE unit attempts the speech output. At this point the ring pulses for a variable amount of time, which seems to be related to the length of the message.

While most of the functionality appears to be working, there seems to be an underlying issue. Going into Settings > ESPHome and clicking on the Setup Voice Assistant option goes through the firmware checks. It then tries the configuration and times out with “The voice assistant is unable to connect to Home Assistant

To play audio, the voice assistant device has to connect to Home Assistant to fetch the files. Our test shows that the device is unable to reach the Home Assistant server.”

The Help Me option on the error links to Troubleshooting Assist - Home Assistant which correctly describes the problem (“My voice assistant understands me and processes the command, but I don’t get a voice response.”) and in part refers to the internal network settings for the profile.

This is where I think the problem is. In my environment HASSIO is in its own VLAN. The HA VPE is in a different VLAN. There are no firewall rules blocking traffic between the two. HASSIO is set up with a certificate.

Because I have the certificate the config will not allow me to put an IP address into the internal network field. However, the environment has a Pi-Hole, so it can resolve the public DuckDNS name to a local IP.

What I suspect is happening, though, is that because HASSIO is on a different IP network to the HA VPE, when the HA VPE goes to fetch the speech file it tries to use the public entry and access HASSIO from the outside.
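
One way to test that suspicion is to ask the Pi-hole directly what the DuckDNS name resolves to and check whether the answer is a private address. This is only a minimal sketch, assuming `dig` is installed on a machine in the same VLAN as the VPE. The function names are illustrative and the Pi-hole address in the usage note is a placeholder.

```shell
# is_private_ipv4 ADDRESS
# Exit status 0 when ADDRESS is in an RFC 1918 private range, 1 otherwise.
is_private_ipv4() {
    case "$1" in
        10.*|192.168.*) return 0 ;;
        172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
        *) return 1 ;;
    esac
}

# where_does_it_point NAME DNS_SERVER
# Asks DNS_SERVER (for example the Pi-hole) for the A record of NAME and
# reports whether the answer stays inside the local network.
where_does_it_point() {
    addr=$(dig +short A "$1" @"$2" | tail -n 1)
    if [ -z "$addr" ]; then
        echo "$1: no answer from $2"
    elif is_private_ipv4 "$addr"; then
        echo "$1 -> $addr (private, stays local)"
    else
        echo "$1 -> $addr (public, leaves the LAN)"
    fi
}
```

Usage would be something like `where_does_it_point mydomain.duckdns.org 192.168.1.2`, where `192.168.1.2` stands in for the Pi-hole. Keep in mind this only shows what the Pi-hole answers. The VPE uses whichever DNS server its DHCP lease hands out, which may or may not be the Pi-hole.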

While most of this is working, because I cannot complete the setup, some of the workarounds are not available.

For troubleshooting I have also tried to just use Piper for a simple TTS test, but that doesn’t work either.

Has anyone else managed to get HA & HA VPE working when they are in different VLANs?

Yes, I have. I commented on a post about issues connecting a PE to a different VLAN. Take a look here:

Thanks for the link, it’s been helpful. If I understand it correctly, you added a route to allow the traffic between the two IP networks? Are these set up as VLANs, or are they routed networks?
My home lab is using a firewall to control/route the traffic. From the logs I’m comfortable that the firewall is not blocking the traffic. This seems to line up with the PE being able to control devices and generally interact with HA.
It seems that the (simplified) flow is:

  1. Voice command to the PE
  2. PE interacts with HA to process and action the command
  3. HA sends a message to the PE telling it where to look for the file containing the voice response
  4. PE initiates a new connection to HA to pull the voice file and then plays it

Step 4 is where the wheels are falling off. I will take a look at the firewall logs again and focus on the traffic from the PE to the HA…
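
Step 4 can be reproduced by hand from a laptop or other machine placed in the same VLAN as the PE: request the URL that HA would hand out and see whether a connection is made at all. A rough sketch, assuming `curl` is available. `probe_url` and `describe_status` are illustrative names, and the URL in the usage note is a placeholder.

```shell
# probe_url URL
# Prints the HTTP status code for URL. curl reports 000 when it could not
# connect or could not complete the TLS handshake.
probe_url() {
    curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1"
}

# describe_status CODE
# Turns the status code from probe_url into a short human-readable verdict.
describe_status() {
    case "$1" in
        000) echo "no connection (routing, firewall, DNS or TLS)" ;;
        2??|3??) echo "reachable (HTTP $1)" ;;
        *) echo "reachable, but the server answered HTTP $1" ;;
    esac
}
```

For example: `describe_status "$(probe_url https://mydomain.duckdns.org/)"`. A laptop has a far more capable TLS stack than the PE, so a successful result here only shows that the network path is open, not that the PE itself can complete the same request.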

That is the same.

HA uses discovery protocols and PEs might also, so you need to look for that traffic too.
Discovery protocols are rarely routable though, so you will have to set up reflectors for them somehow.

My VPE is on a separate VLAN to my HA but I’m using Unifi and it just works :man_shrugging:

VLANs and routed networks are not the same, but are typically used together. You do not need a VLAN for a routed network to work, which may be why some configurations work and mine doesn’t.

One summary explains it this way:
IP Routed Network:

An IP routed network utilizes IP addresses to route traffic across different network segments. Routers use destination IP addresses to determine the optimal path for data packets, ensuring they reach their destination. IP routing operates at Layer 3 and uses IP addresses to identify network segments.

VLANs:

VLANs create virtual networks on top of a physical network, allowing devices to be grouped logically regardless of their physical location. They enhance network performance and security by isolating broadcast domains, reducing traffic congestion and improving security. VLANs typically operate at Layer 2 and use MAC addresses for communication within the same VLAN.

Relationship:

VLANs and IP routing often work together. While VLANs segment the network at Layer 2, IP routing is used to connect these segments (inter-VLAN routing) or to route traffic across different VLANs.

This is probably firewall related, or maybe you should also be using an mDNS repeater.
But in my opinion it is firewall related. Check your firewall logs.

Thanks for the suggestion. I’m using PiHole as a local DNS server, and the firewall rules show this is working. I can see the HA VPE successfully connect to the DNS server on port 53, then communicate with HA on port 443.
There are explicit rules at the top of the firewall allowing traffic between the two devices, and the logs don’t show dropped traffic. It’s a SophosXG, and the logging doesn’t seem to show all the traffic though, so it may be a non-standard protocol.

Are you sure that esphome is using port ?
I have ESPHome devices on a different VLAN and this is what I’m using as firewall rules:


# Allow TCP traffic for ESPHome communication on port 6053
iptables -D FORWARD -i br0 -o br53 -p tcp --dport 6053 -j ACCEPT >/dev/null 2>&1
iptables -I FORWARD -i br0 -o br53 -p tcp --dport 6053 -j ACCEPT
iptables -D FORWARD -i br53 -o br0 -p tcp --sport 6053 -j ACCEPT >/dev/null 2>&1
iptables -I FORWARD -i br53 -o br0 -p tcp --sport 6053 -j ACCEPT

# Allow return traffic for established connections from br53 to br0 (any port, not only 6053)
iptables -D FORWARD -i br53 -o br0 -m state --state ESTABLISHED,RELATED -j ACCEPT >/dev/null 2>&1
iptables -I FORWARD -i br53 -o br0 -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow TCP traffic on port 3232 (ESPHome OTA updates)
iptables -I FORWARD -i br0 -o br53 -p tcp --dport 3232 -j ACCEPT
iptables -I FORWARD -i br53 -o br0 -p tcp --sport 3232 -j ACCEPT
iptables -I FORWARD -i br0 -o br53 -p udp --dport 3232 -j ACCEPT
iptables -I FORWARD -i br53 -o br0 -p udp --sport 3232 -j ACCEPT

Maybe not the best but this is working for my setup.
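
Note that the rules above only cover connections that HA opens towards the ESPHome devices (6053 for the API, 3232 for OTA). Fetching a TTS file goes the other way: the device opens a connection to HA's web server. A sketch of what the matching pair of rules could look like, assuming (from the rules above) that `br53` is the device side and `br0` is the HA side, and that HA listens on its default port 8123 (443 in the original poster's case). The helper only prints the commands so they can be reviewed before being run. `print_fetch_rules` is a made-up name.

```shell
# print_fetch_rules DEVICE_IF HA_IF PORT
# Prints (does not run) iptables commands that let devices behind DEVICE_IF
# open TCP connections to PORT on the HA side and receive the replies.
print_fetch_rules() {
    printf 'iptables -I FORWARD -i %s -o %s -p tcp --dport %s -j ACCEPT\n' "$1" "$2" "$3"
    printf 'iptables -I FORWARD -i %s -o %s -p tcp --sport %s -m state --state ESTABLISHED,RELATED -j ACCEPT\n' "$2" "$1" "$3"
}
```

Calling `print_fetch_rules br53 br0 8123` prints the two commands for that interface pair.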

As they say, “It ain’t dumb if it works”. :grinning:
The firewall rules are Any-Any, so it should be open access. However… the firewall logs are not showing traffic on those ports. I’m not sure if that’s cause or effect though. If HA is passing the link for the VPE to pull down the speech file, but the link is for the external URL (or something similar), then it may never get to that stage.

The “local network” address listed under Settings > Network in the HA UI must be accessible to both the HA VPE and HA.

If this IP is accessible from both VLANs, the HA VPE works without issue. Don’t presume HA can access the IP. You must verify it can.

So, some steps forward that may help troubleshoot. Under the ESPHome integration there is an “Enable debug logging” option.


Click on that, provide a voice command, wait for the lights to stop flashing, and click it again. The browser automatically downloads the log file. This will go back in time, so scroll to when the last interaction started. It looks like it has all the voice content, so it’s long.
The logs will end up showing the STT, and the reply it is going to send back. Following that there is an entry that includes,

“https[:]//mydomain.duckdns[.]org/api/tts_proxy/On6_8vShJLtJjyZQjHA18Q.flac”\033[0m"

Which is where I think the HA VPE is told to grab the file.
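
To pull that URL out of a long debug log without scrolling, something like the following may help. It is only a sketch: it assumes the log contains the URL in plain form (not defanged as in the quote above) and that the local `grep` supports `-o`. `tts_urls` and `url_host` are illustrative names.

```shell
# tts_urls LOGFILE
# Prints each distinct tts_proxy URL found in LOGFILE.
tts_urls() {
    grep -o 'https\{0,1\}://[^"[:space:]\\]*/api/tts_proxy/[^"[:space:]\\]*' "$1" | sort -u
}

# url_host URL
# Prints the host part of URL (scheme, port and path removed).
url_host() {
    rest=${1#*://}      # drop the scheme
    rest=${rest%%/*}    # drop the path
    printf '%s\n' "${rest%%:*}"   # drop an optional port
}
```

The host printed by `url_host` is the name the device has to resolve and reach, so it is the one worth checking against DNS and the firewall logs.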

Going down a bit gives this,

message: “\033[0;36m[D][esp-idf:000]\033[1;31m[ann_read]\033[0;36m: \033[0;31mE (141030301) esp-tls: [sock=57] delayed connect error: Connection reset by peer\033[0m\n\033[0m”

2025-05-01 11:14:11.366 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09b8c3: [D][esp-idf:000][ann_read]: E (141030301) esp-tls: [sock=57] delayed connect error: Connection reset by peer

2025-05-01 11:14:11.367 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09b8c3 @ 172.16.20.120: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: “\033[0;36m[D][esp-idf:000]\033[1;31m[ann_read]\033[0;36m: \033[0;31mE (141030303) esp-tls: Failed to open new connection\033[0m\n\033[0m”

2025-05-01 11:14:11.367 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09b8c3: [D][esp-idf:000][ann_read]: E (141030303) esp-tls: Failed to open new connection

2025-05-01 11:14:11.368 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09b8c3 @ 172.16.20.120: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: “\033[0;36m[D][esp-idf:000]\033[1;31m[ann_read]\033[0;36m: \033[0;31mE (141030304) transport_base: Failed to open a new connection\033[0m\n\033[0m”

2025-05-01 11:14:11.368 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09b8c3: [D][esp-idf:000][ann_read]: E (141030304) transport_base: Failed to open a new connection

2025-05-01 11:14:11.370 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09b8c3 @ 172.16.20.120: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_DEBUG
message: “\033[0;36m[D][esp-idf:000]\033[1;31m[ann_read]\033[0;36m: \033[0;31mE (141030305) HTTP_CLIENT: Connection failed, sock < 0\033[0m\n\033[0m”

2025-05-01 11:14:11.370 DEBUG (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09b8c3: [D][esp-idf:000][ann_read]: E (141030305) HTTP_CLIENT: Connection failed, sock < 0

2025-05-01 11:14:11.379 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09b8c3 @ 172.16.20.120: Got message of type SubscribeLogsResponse: level: LOG_LEVEL_ERROR
message: “\033[1;31m[E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT\033[0m”

2025-05-01 11:14:11.379 ERROR (MainThread) [homeassistant.components.esphome.manager] Home Assistant Voice 09b8c3: [E][speaker_media_player.pipeline:112]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT
2025-05-01 11:14:11.389 DEBUG (MainThread) [aioesphomeapi.connection] home-assistant-voice-09b8c3 @ 172.16.20.120: Got message of type MediaPlayerStateResponse: key: 2232357057
state: MEDIA_PLAYER_STATE_IDLE
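
The colour codes make the `message:` entries hard to read, and the interesting lines are buried among many others. The two small filters below are a sketch of one way to cut the log down, assuming it is the text file downloaded from the ESPHome integration. `strip_ansi` and `device_errors` are hypothetical names.

```shell
# strip_ansi
# Reads stdin and removes colour codes that appear as the literal text
# \033[...m (backslash, 0, 3, 3), which is how they show up in this log.
strip_ansi() {
    sed 's/\\033\[[0-9;]*m//g'
}

# device_errors LOGFILE
# Prints the esphome.manager lines that carry a device-side error, either
# an ESP-IDF "E (...)" entry or an ESPHome "[E][...]" entry.
device_errors() {
    grep 'homeassistant.components.esphome.manager' "$1" \
        | grep -E '\]: E \(|\[E\]\['
}
```

Run against the excerpt above, `device_errors` would keep the `esp-tls`, `transport_base`, `HTTP_CLIENT` and `speaker_media_player.pipeline` lines and drop the rest.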

So at this stage I’m wondering if the network is OK and it’s a cert issue instead, as other people have mentioned TLS 1.3?

So your “local network” is a WAN connection?
Can HA access that address?
Can Voice PE access that address?
Ping only, nothing special.
Can you temporarily add a local IP and test with that?

That’s complicated. I think the connection method is weird and doesn’t allow a domain at all. TLS 1.3 may be a side note, but I have no interest in trying to figure that part out. I had the same or similar errors and it really came down to HA not being able to access the IP provided for HA. Once I got an IP that both the Voice PE and HA could access, all was resolved.

When troubleshooting I try to get to a working state in the simplest manner possible, then work backwards (or forwards, depending on how you see it) to get to the desired state. TLS 1.3 is a rabbit hole and DuckDNS is as well. Try testing with a local IP to verify all is OK, then move on from there. Or not.

True, the definitions are not the same, but in the context of the question they are, because the question is kind of moot in the way it is formed.
VLANs and routed networks are not opposites, so using an OR here makes no sense.
You will be using routed networks whether they are real networks or VLANs, so you will always have to answer yes to that part.
From the devices’ view of the network it is unimportant, and also hidden from them, whether they are using real networks or VLANs.

It looks like there are others out there with the same issue (VA works except does not play back the auto response) and a similar setup (VA in separate VLAN to the HA, certificate enabled on HA).

This explains the symptoms I see & looks like it is my issue. PE device has no TTS responses (server SSL) · Issue #315 · esphome/home-assistant-voice-pe · GitHub (TLDR: The VA cannot use TLS, and can only pull the files if SSL is disabled on HA. There are options suggested, or wait it out for a fix.)

Can your VAs access the CAs in your cert chain?

This seems irrelevant based on the post above.

Microcontrollers like an ESP32 do not run a full Linux distribution. By definition their HTTP stack is limited in terms of performance, nor can they run or store encryption in a fast and reliable way.

So summing up all this info you are running into issues because while developing the VPE there has been an assumption made that the internal webserver is the original, non-HTTPS webserver to connect to HA in your local network.
It was never intended to serve the audio data to the VPE (or any other ESP device) over the cloud or the local webserver with some self-signed certificate.

It will auto pick external URL if the internal one has SSL configured. Like I said, the design choices of the past are not helping here. The fact that you can configure the builtin webserver with your own SSL certificate is the reason it breaks.