Home Assistant Voice Preview Edition cannot connect to local server and fails to download .flac voice response

During onboarding of the HA Voice Preview Edition, the workflow says that the device cannot connect to the HA server and directs me to a troubleshooting page to check that Settings → System → Network → Local Network URL is correct. (It is correct: it uses an https:// URL with a Let's Encrypt certificate that works for other devices on the network.)

I pressed on anyway and added the device to ESPHome. The adoption worked, and the latest code compiled and installed. I could issue voice commands successfully, but no voice response was given. The device logs show that it is trying to download the response using the remote HA cloud URL instead of the local URL, and that it is getting an SSL error. The log file follows (with my HA cloud URL changed; it is correct in the original log).

A couple of questions:

  1. Why is it trying to use the HA cloud URL instead of the local network URL configured in HA?
  2. Why can it not download the .flac file from an HA Cloud URL that works fine on my laptop if I put it into a browser?

Versions:

  • Core 2025.1.2
  • Supervisor 2024.12.3
  • Operating System 14.1
  • Frontend 20250109.0

The specific SSL error:

[10:26:28][D][esp-idf:000][ann_read]: E (1881717) esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x7280
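
For reference, here is a minimal Python sketch that turns the hex value in that line into an Mbed TLS error name. The small table is an assumption on my part, taken from the Mbed TLS headers (ssl.h and x509.h) as I remember them. It is not exhaustive and should be checked against the Mbed TLS version bundled with the firmware. The helper name decode_mbedtls_error is hypothetical.

```python
import re

# Partial table of Mbed TLS error codes (assumption: values as defined in
# the Mbed TLS ssl.h / x509.h headers; verify against the bundled version).
MBEDTLS_ERRORS = {
    0x7280: "MBEDTLS_ERR_SSL_CONN_EOF",
    0x7780: "MBEDTLS_ERR_SSL_FATAL_ALERT_MESSAGE",
    0x7880: "MBEDTLS_ERR_SSL_PEER_CLOSE_NOTIFY",
    0x2700: "MBEDTLS_ERR_X509_CERT_VERIFY_FAILED",
}

# Matches the "returned -0x7280" tail that esp-tls prints on failure.
_PATTERN = re.compile(r"returned -0x([0-9A-Fa-f]+)")


def decode_mbedtls_error(log_line):
    """Return the error name for a 'returned -0x....' log line.

    Returns None if the line contains no such code, and an
    'unknown (...)' string if the code is not in the table above.
    """
    match = _PATTERN.search(log_line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return MBEDTLS_ERRORS.get(code, "unknown (-0x%04X)" % code)
```

If the table is right, -0x7280 maps to MBEDTLS_ERR_SSL_CONN_EOF, which would suggest the connection was closed before the handshake finished rather than a certificate verification failure.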

The conversation log:

[10:26:23][D][micro_wake_word:355]: Detected 'Okay Nabu' with sliding average probability is 0.89 and max probability is 0.98
[10:26:23][D][media_player:080]: 'Media Player' - Setting
[10:26:23][D][media_player:084]:   Command: STOP
[10:26:23][D][media_player:093]:  Announcement: yes
[10:26:23][D][media_player:080]: 'Media Player' - Setting
[10:26:23][D][media_player:093]:  Announcement: yes
[10:26:23][D][nabu_media_player.pipeline:173]: Reading FLAC file type
[10:26:23][D][nabu_media_player.pipeline:184]: Decoded audio has 1 channels, 48000 Hz sample rate, and 16 bits per sample
[10:26:23][D][nabu_media_player.pipeline:211]: Converting mono channel audio to stereo channel audio
[10:26:24][D][voice_assistant:515]: State changed from IDLE to START_MICROPHONE
[10:26:24][D][voice_assistant:522]: Desired state set to START_PIPELINE
[10:26:24][D][voice_assistant:225]: Starting Microphone
[10:26:24][D][voice_assistant:515]: State changed from START_MICROPHONE to STARTING_MICROPHONE
[10:26:24][D][voice_assistant:515]: State changed from STARTING_MICROPHONE to START_PIPELINE
[10:26:24][D][voice_assistant:280]: Requesting start...
[10:26:24][D][voice_assistant:515]: State changed from START_PIPELINE to STARTING_PIPELINE
[10:26:24][D][voice_assistant:537]: Client started, streaming microphone
[10:26:24][D][voice_assistant:515]: State changed from STARTING_PIPELINE to STREAMING_MICROPHONE
[10:26:24][D][voice_assistant:522]: Desired state set to STREAMING_MICROPHONE
[10:26:24][D][voice_assistant:641]: Event Type: 1
[10:26:24][D][voice_assistant:644]: Assist Pipeline running
[10:26:24][D][voice_assistant:641]: Event Type: 3
[10:26:24][D][voice_assistant:655]: STT started
[10:26:24][D][light:036]: 'voice_assistant_leds' Setting:
[10:26:24][D][light:047]:   State: ON
[10:26:24][D][light:051]:   Brightness: 66%
[10:26:24][D][light:109]:   Effect: 'Waiting for Command'
[10:26:24][D][power_supply:033]: Enabling power supply.
[10:26:25][D][voice_assistant:641]: Event Type: 11
[10:26:25][D][voice_assistant:804]: Starting STT by VAD
[10:26:25][D][light:036]: 'voice_assistant_leds' Setting:
[10:26:25][D][light:051]:   Brightness: 66%
[10:26:25][D][light:109]:   Effect: 'Listening For Command'
[10:26:27][D][voice_assistant:641]: Event Type: 12
[10:26:27][D][voice_assistant:808]: STT by VAD end
[10:26:27][D][voice_assistant:515]: State changed from STREAMING_MICROPHONE to STOP_MICROPHONE
[10:26:27][D][voice_assistant:522]: Desired state set to AWAITING_RESPONSE
[10:26:27][D][voice_assistant:515]: State changed from STOP_MICROPHONE to STOPPING_MICROPHONE
[10:26:27][D][light:036]: 'voice_assistant_leds' Setting:
[10:26:27][D][light:051]:   Brightness: 66%
[10:26:27][D][light:109]:   Effect: 'Thinking'
[10:26:27][D][voice_assistant:515]: State changed from STOPPING_MICROPHONE to AWAITING_RESPONSE
[10:26:27][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to AWAITING_RESPONSE
[10:26:27][D][voice_assistant:641]: Event Type: 4
[10:26:27][D][voice_assistant:669]: Speech recognised as: "Turn off laundry lights."
[10:26:27][D][voice_assistant:641]: Event Type: 5
[10:26:27][D][voice_assistant:674]: Intent started
[10:26:27][D][power_supply:033]: Enabling power supply.
[10:26:28][D][voice_assistant:641]: Event Type: 6
[10:26:28][D][voice_assistant:641]: Event Type: 7
[10:26:28][D][voice_assistant:697]: Response: "Turned off the light"
[10:26:28][D][light:036]: 'voice_assistant_leds' Setting:
[10:26:28][D][light:051]:   Brightness: 66%
[10:26:28][D][light:109]:   Effect: 'Replying'
[10:26:28][D][voice_assistant:641]: Event Type: 8
[10:26:28][D][voice_assistant:719]: Response URL: "https://myhacloudurl.ui.nabu.casa/api/tts_proxy/jlEuNKyISsY4CXrf5ecRyA.flac"
[10:26:28][D][voice_assistant:515]: State changed from AWAITING_RESPONSE to STREAMING_RESPONSE
[10:26:28][D][voice_assistant:522]: Desired state set to STREAMING_RESPONSE
[10:26:28][D][media_player:080]: 'Media Player' - Setting
[10:26:28][D][media_player:087]:   Media URL: https://myhacloudurl.ui.nabu.casa/api/tts_proxy/jlEuNKyISsY4CXrf5ecRyA.flac
[10:26:28][D][media_player:093]:  Announcement: yes
[10:26:28][D][power_supply:033]: Enabling power supply.
[10:26:28][D][voice_assistant:641]: Event Type: 2
[10:26:28][D][voice_assistant:733]: Assist Pipeline ended
[10:26:28][D][esp-idf:000][ann_read]: E (1881717) esp-tls-mbedtls: mbedtls_ssl_handshake returned -0x7280

[10:26:28][D][esp-idf:000][ann_read]: E (1881719) esp-tls: Failed to open new connection

[10:26:28][D][esp-idf:000][ann_read]: E (1881720) transport_base: Failed to open a new connection

[10:26:28][D][esp-idf:000][ann_read]: E (1881723) HTTP_CLIENT: Connection failed, sock < 0

[10:26:28][E][nabu_media_player.pipeline:171]: Media reader encountered an error: ESP_ERR_HTTP_CONNECT
[10:26:28][E][nabu_media_player:305]: The announcement pipeline's file reader encountered an error.
[10:26:28][D][voice_assistant:515]: State changed from STREAMING_RESPONSE to IDLE
[10:26:28][D][voice_assistant:522]: Desired state set to IDLE
[10:26:28][D][light:036]: 'voice_assistant_leds' Setting:
[10:26:28][D][light:047]:   State: OFF
[10:26:28][D][light:109]:   Effect: 'None'
[10:26:38][D][power_supply:048]: Disabling power supply.
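
One way to narrow this down is to repeat the handshake from another machine on the same network. The following is a rough Python sketch using only the standard library. The function names host_port_from_url and probe_tls are hypothetical. It only shows what a desktop TLS client negotiates with the server and does not reproduce the device's Mbed TLS stack.

```python
import socket
import ssl
from urllib.parse import urlsplit


def host_port_from_url(url):
    """Extract (host, port) from a URL, defaulting to 443 for https and 80 otherwise."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def probe_tls(url, timeout=5.0):
    """Perform a verified TLS handshake and return (protocol, cipher name, issuer organization)."""
    host, port = host_port_from_url(url)
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
            # 'issuer' is a tuple of RDNs, each holding one (key, value) pair here.
            issuer = dict(rdn[0] for rdn in cert["issuer"])
            return tls.version(), tls.cipher()[0], issuer.get("organizationName")
```

Calling probe_tls with the Response URL from the log, or with the local HTTPS URL, shows whether a standard client can complete the handshake and which protocol version and cipher suite the server picks. A desktop TLS stack may support options the device does not, so success here does not prove the device can connect.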

I was able to get HA Voice PE replying by removing the TLS configuration from configuration.yaml and updating the local URL to HTTP instead of HTTPS.

Not a great solution, as now everything is in plain text.

Has anybody else got Voice PE working with a local HTTPS connection using a valid Let’s Encrypt certificate?

I’ve seen a few others who have encountered this problem, where a “local URL” using https does not work, but I have not seen a solution other than what you did or using an nginx proxy. I would suggest writing up an issue on this here. I am still waiting for my VPE, but I think I’m going to run into the same problem.

Thanks for the link to submit a ticket; I was not sure where to actually submit an issue beyond the community forum. Please message me with how you go when you receive yours. I think this is not a great outcome if the device is forcing a security downgrade, and I would like to help solve it.