Teaching your HA Voice PE to Whisper and Shout - Smart Volume using Built-in Mics

Have you ever been startled by your Voice Assistant being way too loud in the quiet of the night? Or struggled to hear it over the chaos of a busy household? Well, I’ve added a nifty dynamic volume control that automatically adjusts based on ambient sound levels!

The solution is implemented entirely through ESPHome Device Builder. You’ll need to take over your Home Assistant Voice PE device (install the ESPHome Device Builder add-on and find your device there), then use the YAML below instead of the default configuration.

Sensors

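Using the existing microphone, I’ve implemented three sensors:

  • Ambient Sound Peak: the peak microphone amplitude, normalized to 0-1
  • Ambient Sound Level: the peak scaled linearly to a percentage
  • Ambient Sound Level Exp: the linear level passed through an x^0.4 curve, which gives more resolution at low levels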

Dynamic Volume Control

The system adjusts in real-time to the noise level in your room:

  • Anchor Volume: Your base volume level when it’s quiet
  • Strength: How aggressively it scales up in noisy situations
  • Simple toggle to enable/disable
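
To make the interaction between Anchor and Strength concrete, here is a minimal standalone sketch of the arithmetic used by the update_dynamic_volume script in the YAML further down. The function name dynamic_volume is hypothetical and only exists for illustration; on the device the same calculation runs inside a lambda.

```cpp
#include <algorithm>

// Hypothetical helper mirroring the arithmetic of the update_dynamic_volume script.
//   ambient_percent: ambient sound level as a percentage (0-100)
//   anchor:          base volume used when the room is quiet
//   strength:        how strongly ambient noise raises the volume
float dynamic_volume(float ambient_percent, float anchor, float strength) {
  float normalized = ambient_percent / 100.0f;   // convert to the 0-1 range
  float gain = 1.0f + normalized * strength;     // gain is 1.0 in a silent room
  float volume = anchor * gain;
  return std::min(1.0f, std::max(0.0f, volume)); // keep within the valid volume range
}
```

With Anchor 0.3 and Strength 1.0, a silent room keeps the volume at 0.3, an ambient level of 50% gives 0.3 × 1.5 = 0.45, and high Anchor and Strength values simply hit the 1.0 ceiling.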

Bonus feature: It turns out this is a pretty decent proxy for presence detection - at least if you have young kids! :smile: The ambient sound level spikes whenever there’s activity in the room.

Note: This is still a work in progress - I’m still tweaking the scaling to find the sweet spot, but it’s already quite useful. No more heart attacks when asking for the weather at 3 AM!

Installation

Here’s the complete YAML - just copy this into your ESPHome Device Builder configuration:

substitutions:
  name: home-assistant-voice-REPLACE_WITH_YOUR_OWN
  friendly_name: Home Assistant Voice REPLACE_WITH_YOUR_OWN

packages:
  Nabu Casa.Home Assistant Voice PE: github://esphome/home-assistant-voice-pe/home-assistant-voice.yaml

globals:
  - id: dynamic_volume_enabled
    type: bool
    restore_value: yes
    initial_value: 'false'
  - id: last_dynamic_volume_calculation
    type: float
    restore_value: no
    initial_value: '0'

number:
  - platform: template
    name: "Dyn. Vol. Anchor"
    id: dynamic_volume_anchor
    min_value: 0.1
    max_value: 0.85
    step: 0.05
    initial_value: 0.3
    restore_value: true
    optimistic: true
    icon: "mdi:volume-high"
    unit_of_measurement: "x"
    entity_category: config
    
  - platform: template 
    name: "Dyn. Vol. Strength"
    id: dynamic_volume_strength
    min_value: 0
    max_value: 5
    step: 0.1
    initial_value: 1.0
    restore_value: true
    optimistic: true
    icon: "mdi:volume-vibrate"
    unit_of_measurement: "x"
    entity_category: config

switch:
  - platform: template
    name: "Dynamic Volume"
    id: dynamic_volume_switch
    icon: "mdi:volume-vibrate"
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    turn_on_action:
      - lambda: id(dynamic_volume_enabled) = true;
      - script.execute: update_dynamic_volume
    turn_off_action:
      - lambda: |-
          id(dynamic_volume_enabled) = false;
          // Reset to anchor volume when disabled
          id(nabu_media_player)
            ->make_call()
            .set_volume(id(dynamic_volume_anchor).state)
            .perform();

sensor:
  # Peak amplitude sensor
  - platform: template
    name: "Ambient Sound Peak"
    id: ambient_sound_peak
    unit_of_measurement: "max"
    accuracy_decimals: 6
    update_interval: 1s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    lambda: |-
      static const char *const TAG = "ambient_sound";
      static const size_t INPUT_BUFFER_SIZE = 512;
      static int16_t input_buffer[INPUT_BUFFER_SIZE];
      
      // Don't measure when media is playing to avoid feedback
      if (id(nabu_media_player)->state != media_player::MEDIA_PLAYER_STATE_IDLE) {
        ESP_LOGD(TAG, "Media player not idle.");
        return id(ambient_sound_peak).state; // Return previous value
      }
      
      // Check if micro_wake_word is ready
      if (!id(mww).is_ready()) {
        ESP_LOGD(TAG, "Micro wake word not ready yet");
        return 0;
      }

      // Start mic if needed
      if (!id(asr_mic)->is_running()) {
        id(asr_mic)->start();
        delay(50); // Give mic time to start
      }

      size_t bytes_read = id(asr_mic)->read(
        input_buffer, 
        INPUT_BUFFER_SIZE * sizeof(int16_t),
        0
      );
      
      if (bytes_read == 0) {
        memset(input_buffer, 0, INPUT_BUFFER_SIZE * sizeof(int16_t));
        ESP_LOGD(TAG, "No samples read from microphone");
        return 0;
      }
      
      size_t samples_read = bytes_read / sizeof(int16_t);
      
      // Find maximum absolute value
      float max_value = 0;
      for (size_t i = 0; i < samples_read; i++) {
        float normalized = abs(input_buffer[i]) / 32768.0f;
        max_value = max(max_value, normalized);
      }
      
      ESP_LOGD(TAG, "Max amplitude: %.6f", max_value);
      return max_value;

  # Linear scaling sensor using peak amplitude as input
  - platform: template
    name: "Ambient Sound Level"
    id: ambient_sound_level
    unit_of_measurement: "%"
    accuracy_decimals: 1
    update_interval: 1s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    filters:
      - sliding_window_moving_average:
          window_size: 5
    lambda: |-
      float peak = id(ambient_sound_peak).state;
      if (std::isnan(peak)) {
        return 0;
      }
      
      // Simple linear scaling between min/max peak values
      const float MIN_PEAK = 0.000024f;
      const float MAX_PEAK = 0.9f;
      
      float percentage = 0;
      if (peak > MIN_PEAK) {
        percentage = (peak - MIN_PEAK) / (MAX_PEAK - MIN_PEAK) * 100;
        percentage = clamp(percentage, 0.0f, 100.0f);
      }
      
      ESP_LOGD("ambient_sound", "Linear Percentage: %.1f%%", percentage);
      return percentage;

  # Exponential scaling sensor
  - platform: template
    name: "Ambient Sound Level Exp"
    id: ambient_sound_level_exp
    unit_of_measurement: "%"
    accuracy_decimals: 1
    update_interval: 1s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    lambda: |-
      float linear_value = id(ambient_sound_level).state;
      if (std::isnan(linear_value)) {
        return 0;
      }
      
      // Apply exponential curve
      // Using x^0.4 which gives more resolution to lower values while
      // still maintaining a reasonable curve
      constexpr float exp = 0.4f;
      float percentage = pow(linear_value / 100.0f, exp) * 100.0f;
      
      ESP_LOGD("ambient_sound_exp", "Exponential scaling: %.1f%% -> %.1f%%", 
               linear_value, percentage);
      
      return percentage;

script:
  - id: update_dynamic_volume
    mode: single
    then:
      - lambda: |-
          if (!id(dynamic_volume_enabled)) return;
          
          float ambient_level = id(ambient_sound_level_exp).state;
          if (std::isnan(ambient_level)) return;
          
          float anchor = id(dynamic_volume_anchor).state;
          float strength = id(dynamic_volume_strength).state;
          
          // Convert ambient level to 0-1 range
          float normalized_level = ambient_level / 100.0f;
          
          // Calculate gain factor based on ambient level and strength
          float gain = 1.0f + (normalized_level * strength);
          
          // Calculate new volume 
          float new_volume = anchor * gain;
          
          // Clamp to valid range
          new_volume = clamp(new_volume, 0.0f, 1.0f);
          
          // Only update if changed significantly
          if (std::fabs(new_volume - id(last_dynamic_volume_calculation)) > 0.01) {
            id(last_dynamic_volume_calculation) = new_volume;
            id(nabu_media_player)
              ->make_call()
              .set_volume(new_volume)
              .perform();
            
            ESP_LOGD("dynamic_volume", "Ambient: %.1f%%, Gain: %.2f, New Volume: %.2f", 
                     ambient_level, gain, new_volume);
          }

interval:
  - interval: 1s
    then:
      - script.execute: update_dynamic_volume

logger:
  level: DEBUG
  logs:
    dynamic_volume: DEBUG

esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

api:
  encryption:
    key: YOUR_OWN_SUPER_SECRET_API_KEY

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

After installing (be patient, building the firmware for the first time can take a long time), you’ll find the new controls in your device’s Configuration panel. Play around with the Anchor and Strength values to find what works best for your space!

Initially, I tried using RMS (Root Mean Square) to measure the ambient sound level, but I found that peak detection correlates much better with what we actually experience as “noise level”. The peaks in audio better represent those moments when you think “wow, it’s noisy in here” - exactly when you want your assistant to speak up!
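
For anyone curious about the difference, below is a minimal sketch of both measurements over one buffer of 16-bit samples. The function names peak_level and rms_level are hypothetical. The peak version follows the same idea as the sensor lambda above, while the RMS version is only an illustration of the approach I abandoned, not code from the configuration.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Peak: the largest absolute sample in the buffer, normalized to 0-1.
float peak_level(const int16_t *samples, size_t count) {
  float peak = 0.0f;
  for (size_t i = 0; i < count; i++) {
    float normalized = std::abs(static_cast<int>(samples[i])) / 32768.0f;
    if (normalized > peak) peak = normalized;
  }
  return peak;
}

// RMS: square root of the mean of the squared samples, normalized to 0-1.
float rms_level(const int16_t *samples, size_t count) {
  if (count == 0) return 0.0f;  // avoid dividing by zero on an empty buffer
  double sum_squares = 0.0;
  for (size_t i = 0; i < count; i++) {
    double normalized = samples[i] / 32768.0;
    sum_squares += normalized * normalized;
  }
  return static_cast<float>(std::sqrt(sum_squares / count));
}
```

A single loud spike in an otherwise silent buffer shows the difference: for eight samples where one is 16384 and the rest are 0, the peak is 0.5 while the RMS is only about 0.18, because the average is dragged down by the quiet samples.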

What do you think? Seems like a great function to have in the device by default. It’s practically free :slight_smile:


Quick update on the dynamic volume control:

Made some performance tweaks to the configuration - mainly increased the sampling interval from 1s to 5s and added more filtering to smooth out the readings. Also fixed a small issue with volume adjustments during media playback.
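
As a rough illustration of what the smoothing does, here is a simplified moving average over the last five readings. The MovingAverage class is hypothetical and is not ESPHome's implementation of sliding_window_moving_average; in particular it returns a value on every reading and does not model the send_every option.

```cpp
#include <cstddef>

// Hypothetical fixed-size moving average over the most recent 5 readings.
class MovingAverage {
 public:
  // Store a new reading and return the mean of the readings seen so far
  // (at most the last kWindow of them).
  float add(float value) {
    values_[next_] = value;
    next_ = (next_ + 1) % kWindow;   // overwrite the oldest slot next time
    if (count_ < kWindow) count_++;
    float sum = 0.0f;
    for (size_t i = 0; i < count_; i++) sum += values_[i];
    return sum / count_;
  }

 private:
  static constexpr size_t kWindow = 5;
  float values_[kWindow] = {};
  size_t next_ = 0;
  size_t count_ = 0;
};
```

A single noisy reading now moves the reported level by only a fifth of its size, which keeps the volume from jumping around on every door slam.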

Updated YAML is below if anyone wants to try it out.

substitutions:
  name: home-assistant-voice-REPLACE_WITH_YOUR_OWN
  friendly_name: Home Assistant Voice REPLACE_WITH_YOUR_OWN

packages:
  Nabu Casa.Home Assistant Voice PE: github://esphome/home-assistant-voice-pe/home-assistant-voice.yaml

globals:
  - id: dynamic_volume_enabled
    type: bool
    restore_value: yes
    initial_value: 'false'
  - id: last_dynamic_volume_calculation
    type: float
    restore_value: no
    initial_value: '0'

number:
  - platform: template
    name: "Dyn. Vol. Anchor"
    id: dynamic_volume_anchor
    min_value: 0.1
    max_value: 0.85
    step: 0.05
    initial_value: 0.3
    restore_value: true
    optimistic: true
    icon: "mdi:volume-high"
    unit_of_measurement: "x"
    entity_category: config
    
  - platform: template 
    name: "Dyn. Vol. Strength"
    id: dynamic_volume_strength
    min_value: 0
    max_value: 5
    step: 0.1
    initial_value: 1.0
    restore_value: true
    optimistic: true
    icon: "mdi:volume-vibrate"
    unit_of_measurement: "x"
    entity_category: config

switch:
  - platform: template
    name: "Dynamic Volume"
    id: dynamic_volume_switch
    icon: "mdi:volume-vibrate"
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    turn_on_action:
      - lambda: id(dynamic_volume_enabled) = true;
      - script.execute: update_dynamic_volume
    turn_off_action:
      - lambda: |-
          id(dynamic_volume_enabled) = false;
          // Reset to anchor volume when disabled
          id(nabu_media_player)
            ->make_call()
            .set_volume(id(dynamic_volume_anchor).state)
            .perform();

sensor:
  # Peak amplitude sensor
  - platform: template
    name: "Ambient Sound Peak"
    id: ambient_sound_peak
    unit_of_measurement: "max"
    accuracy_decimals: 6
    update_interval: 5s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    filters:
      - throttle: 5s
    lambda: |-
      static const char *const TAG = "ambient_sound";
      static const size_t INPUT_BUFFER_SIZE = 512;
      static int16_t input_buffer[INPUT_BUFFER_SIZE];
      
      // Don't measure when media is playing to avoid feedback
      if (id(nabu_media_player)->state != media_player::MEDIA_PLAYER_STATE_IDLE) {
        ESP_LOGD(TAG, "Media player not idle.");
        return id(ambient_sound_peak).state; // Return previous value
      }
      
      // Check if micro_wake_word is ready
      if (!id(mww).is_ready()) {
        ESP_LOGD(TAG, "Micro wake word not ready yet");
        return 0;
      }

      // Start mic if needed
      if (!id(asr_mic)->is_running()) {
        id(asr_mic)->start();
        delay(50); // Give mic time to start
      }

      size_t bytes_read = id(asr_mic)->read(
        input_buffer, 
        INPUT_BUFFER_SIZE * sizeof(int16_t),
        0
      );
      
      if (bytes_read == 0) {
        memset(input_buffer, 0, INPUT_BUFFER_SIZE * sizeof(int16_t));
        ESP_LOGD(TAG, "No samples read from microphone");
        return 0;
      }
      
      size_t samples_read = bytes_read / sizeof(int16_t);
      
      // Find maximum absolute value
      float max_value = 0;
      for (size_t i = 0; i < samples_read; i++) {
        float normalized = abs(input_buffer[i]) / 32768.0f;
        max_value = max(max_value, normalized);
      }
      
      ESP_LOGD(TAG, "Max amplitude: %.6f", max_value);
      return max_value;

  # Linear scaling sensor
  - platform: template
    name: "Ambient Sound Level"
    id: ambient_sound_level
    unit_of_measurement: "%"
    accuracy_decimals: 1
    update_interval: 5s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    filters:
      - sliding_window_moving_average:
          window_size: 5
          send_every: 5
      - throttle_average: 5s
    lambda: |-
      float peak = id(ambient_sound_peak).state;
      if (std::isnan(peak)) {
        return 0;
      }
      
      // Simple linear scaling between min/max peak values
      const float MIN_PEAK = 0.000024f;
      const float MAX_PEAK = 0.9f;
      
      float percentage = 0;
      if (peak > MIN_PEAK) {
        percentage = (peak - MIN_PEAK) / (MAX_PEAK - MIN_PEAK) * 100;
        percentage = clamp(percentage, 0.0f, 100.0f);
      }
      
      ESP_LOGD("ambient_sound", "Linear Percentage: %.1f%%", percentage);
      return percentage;

  # Exponential scaling sensor
  - platform: template
    name: "Ambient Sound Level Exp"
    id: ambient_sound_level_exp
    unit_of_measurement: "%"
    accuracy_decimals: 1
    update_interval: 5s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    filters:
      - sliding_window_moving_average:
          window_size: 5
          send_every: 5
      - throttle_average: 5s
    lambda: |-
      float linear_value = id(ambient_sound_level).state;
      if (std::isnan(linear_value)) {
        return 0;
      }
      
      // Apply exponential curve
      // Using x^0.4 which gives more resolution to lower values while
      // still maintaining a reasonable curve
      constexpr float exp = 0.4f;
      float percentage = pow(linear_value / 100.0f, exp) * 100.0f;
      
      ESP_LOGD("ambient_sound_exp", "Exponential scaling: %.1f%% -> %.1f%%", 
               linear_value, percentage);
      
      return percentage;

script:
  - id: update_dynamic_volume
    mode: single
    then:
      - lambda: |-
          if (!id(dynamic_volume_enabled)) return;
      
          // Don't update volume when media is playing
          if (id(nabu_media_player)->state != media_player::MEDIA_PLAYER_STATE_IDLE) return;
          
          float ambient_level = id(ambient_sound_level_exp).state;
          if (std::isnan(ambient_level)) return;
          
          float anchor = id(dynamic_volume_anchor).state;
          float strength = id(dynamic_volume_strength).state;
          
          // Convert ambient level to 0-1 range
          float normalized_level = ambient_level / 100.0f;
          
          // Calculate gain factor based on ambient level and strength
          float gain = 1.0f + (normalized_level * strength);
          
          // Calculate new volume 
          float new_volume = anchor * gain;
          
          // Clamp to valid range
          new_volume = clamp(new_volume, 0.0f, 1.0f);
          
          // Only update if changed significantly
          if (std::fabs(new_volume - id(last_dynamic_volume_calculation)) > 0.01) {
            id(last_dynamic_volume_calculation) = new_volume;
            id(nabu_media_player)
              ->make_call()
              .set_volume(new_volume)
              .perform();
            
            ESP_LOGD("dynamic_volume", "Ambient: %.1f%%, Gain: %.2f, New Volume: %.2f", 
                     ambient_level, gain, new_volume);
          }

interval:
  - interval: 5s
    then:
      - script.execute: update_dynamic_volume

logger:
  level: DEBUG
  logs:
    dynamic_volume: DEBUG

esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

api:
  encryption:
    key: YOUR_OWN_SUPER_SECRET_API_KEY

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password


Just compiled this. We will see how it goes. Thanks for the work.


Quick update on the dynamic volume control:

Made some structural updates to align with the latest ESPHome changes - mainly updated the media player references from nabu_media_player to external_media_player to match upstream updates.

Also streamlined the ambient sound sensors - moved the linear scaling to diagnostics and made the exponential scaling the default sensor, since it provides better resolution at lower volumes.
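
To show what "better resolution at lower volumes" means in practice, here is a minimal sketch of the curve used by the exponential sensor lambda. Strictly speaking x^0.4 is a power curve rather than an exponential one. The function name shape_level is hypothetical, and the input clamp is an addition for this standalone example.

```cpp
#include <cmath>

// Hypothetical helper: map a linear 0-100% level through x^exponent.
// An exponent below 1 stretches the low end of the range.
float shape_level(float linear_percent, float exponent = 0.4f) {
  // Clamp first so out-of-range input cannot produce NaN or values above 100.
  float clamped = std::fmin(100.0f, std::fmax(0.0f, linear_percent));
  return std::pow(clamped / 100.0f, exponent) * 100.0f;
}
```

A linear level of 1% maps to roughly 16% and 10% maps to roughly 40%, so the quiet end of the scale, where most household noise sits, occupies far more of the output range, while 0% and 100% stay where they are.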

Updated YAML is below if anyone wants to try it out.

substitutions:
  name: home-assistant-voice-REPLACE_WITH_YOUR_OWN
  friendly_name: Home Assistant Voice REPLACE_WITH_YOUR_OWN

packages:
  Nabu Casa.Home Assistant Voice PE: github://esphome/home-assistant-voice-pe/home-assistant-voice.yaml

globals:
  - id: dynamic_volume_enabled
    type: bool
    restore_value: yes
    initial_value: 'false'
  - id: last_dynamic_volume_calculation
    type: float
    restore_value: no
    initial_value: '0'

number:
  - platform: template
    name: "Dyn. Vol. Anchor"
    id: dynamic_volume_anchor
    min_value: 0.1
    max_value: 0.85
    step: 0.05
    initial_value: 0.3
    restore_value: true
    optimistic: true
    icon: "mdi:volume-high"
    unit_of_measurement: "x"
    entity_category: config
    
  - platform: template 
    name: "Dyn. Vol. Strength"
    id: dynamic_volume_strength
    min_value: 0
    max_value: 5
    step: 0.1
    initial_value: 1.0
    restore_value: true
    optimistic: true
    icon: "mdi:volume-vibrate"
    unit_of_measurement: "x"
    entity_category: config

switch:
  - platform: template
    name: "Dynamic Volume"
    id: dynamic_volume_switch
    icon: "mdi:volume-vibrate"
    optimistic: true
    restore_mode: RESTORE_DEFAULT_OFF
    entity_category: config
    turn_on_action:
      - lambda: id(dynamic_volume_enabled) = true;
      - script.execute: update_dynamic_volume
    turn_off_action:
      - lambda: |-
          id(dynamic_volume_enabled) = false;
          // Reset to anchor volume when disabled
          id(external_media_player)
            ->make_call()
            .set_volume(id(dynamic_volume_anchor).state)
            .perform();

sensor:
  # Peak amplitude sensor
  - platform: template
    name: "Ambient Sound Peak"
    id: ambient_sound_peak
    unit_of_measurement: "max"
    accuracy_decimals: 6
    update_interval: 5s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    entity_category: "diagnostic" 
    filters:
      - throttle: 5s
    lambda: |-
      static const char *const TAG = "ambient_sound";
      static const size_t INPUT_BUFFER_SIZE = 512;
      static int16_t input_buffer[INPUT_BUFFER_SIZE];
      
      // Don't measure when media is playing to avoid feedback
      if (id(external_media_player)->state != media_player::MEDIA_PLAYER_STATE_IDLE) {
        ESP_LOGD(TAG, "Media player not idle.");
        return id(ambient_sound_peak).state; // Return previous value
      }
      
      // Check if micro_wake_word is ready
      if (!id(mww).is_ready()) {
        ESP_LOGD(TAG, "Micro wake word not ready yet");
        return 0;
      }

      // Start mic if needed
      if (!id(asr_mic)->is_running()) {
        id(asr_mic)->start();
        delay(50); // Give mic time to start
      }

      size_t bytes_read = id(asr_mic)->read(
        input_buffer, 
        INPUT_BUFFER_SIZE * sizeof(int16_t),
        0
      );
      
      if (bytes_read == 0) {
        memset(input_buffer, 0, INPUT_BUFFER_SIZE * sizeof(int16_t));
        ESP_LOGD(TAG, "No samples read from microphone");
        return 0;
      }
      
      size_t samples_read = bytes_read / sizeof(int16_t);
      
      // Find maximum absolute value
      float max_value = 0;
      for (size_t i = 0; i < samples_read; i++) {
        float normalized = abs(input_buffer[i]) / 32768.0f;
        max_value = max(max_value, normalized);
      }
      
      ESP_LOGD(TAG, "Max amplitude: %.6f", max_value);
      return max_value;

  # Linear scaling sensor
  - platform: template
    name: "Ambient Sound Level Linear"
    id: ambient_sound_level
    unit_of_measurement: "%"
    accuracy_decimals: 1
    update_interval: 5s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    entity_category: "diagnostic" 
    filters:
      - sliding_window_moving_average:
          window_size: 5
          send_every: 5
      - throttle_average: 5s
    lambda: |-
      float peak = id(ambient_sound_peak).state;
      if (std::isnan(peak)) {
        return 0;
      }
      
      // Simple linear scaling between min/max peak values
      const float MIN_PEAK = 0.000024f;
      const float MAX_PEAK = 0.9f;
      
      float percentage = 0;
      if (peak > MIN_PEAK) {
        percentage = (peak - MIN_PEAK) / (MAX_PEAK - MIN_PEAK) * 100;
        percentage = clamp(percentage, 0.0f, 100.0f);
      }
      
      ESP_LOGD("ambient_sound", "Linear Percentage: %.1f%%", percentage);
      return percentage;

  # Exponential scaling sensor
  - platform: template
    name: "Ambient Sound Level"
    id: ambient_sound_level_exp
    unit_of_measurement: "%"
    accuracy_decimals: 1
    update_interval: 5s
    icon: "mdi:microphone-outline"
    state_class: "measurement"
    filters:
      - sliding_window_moving_average:
          window_size: 5
          send_every: 5
      - throttle_average: 5s
    lambda: |-
      float linear_value = id(ambient_sound_level).state;
      if (std::isnan(linear_value)) {
        return 0;
      }
      
      // Apply exponential curve
      // Using x^0.4 which gives more resolution to lower values while
      // still maintaining a reasonable curve
      constexpr float exp = 0.4f;
      float percentage = pow(linear_value / 100.0f, exp) * 100.0f;
      
      ESP_LOGD("ambient_sound_exp", "Exponential scaling: %.1f%% -> %.1f%%", 
               linear_value, percentage);
      
      return percentage;

script:
  - id: update_dynamic_volume
    mode: single
    then:
      - lambda: |-
          if (!id(dynamic_volume_enabled)) return;
      
          // Don't update volume when media is playing
          if (id(external_media_player)->state != media_player::MEDIA_PLAYER_STATE_IDLE) return;
          
          float ambient_level = id(ambient_sound_level_exp).state;
          if (std::isnan(ambient_level)) return;
          
          float anchor = id(dynamic_volume_anchor).state;
          float strength = id(dynamic_volume_strength).state;
          
          // Convert ambient level to 0-1 range
          float normalized_level = ambient_level / 100.0f;
          
          // Calculate gain factor based on ambient level and strength
          float gain = 1.0f + (normalized_level * strength);
          
          // Calculate new volume 
          float new_volume = anchor * gain;
          
          // Clamp to valid range
          new_volume = clamp(new_volume, 0.0f, 1.0f);
          
          // Only update if changed significantly
          if (std::fabs(new_volume - id(last_dynamic_volume_calculation)) > 0.01) {
            id(last_dynamic_volume_calculation) = new_volume;
            id(external_media_player)
              ->make_call()
              .set_volume(new_volume)
              .perform();
            
            ESP_LOGD("dynamic_volume", "Ambient: %.1f%%, Gain: %.2f, New Volume: %.2f", 
                     ambient_level, gain, new_volume);
          }

interval:
  - interval: 5s
    then:
      - script.execute: update_dynamic_volume

logger:
  level: DEBUG
  logs:
    dynamic_volume: DEBUG

esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

api:
  encryption:
    key: YOUR_OWN_SUPER_SECRET_API_KEY

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  

Following this project very closely. Just implemented it on all 3 of my devices.


This is cool. Consider adding your own package repo location so that you can share it more easily and users are less prone to errors when typing in commands…


Update: Now Available as an ESPHome Package!

Following community feedback, I’ve made the Dynamic Volume Control available as an easy-to-use ESPHome package. This means you no longer need to copy and paste the entire configuration - just reference the package in your configuration!

Quick Installation

substitutions:
  name: your-ha-voice-device-name  # Replace with your device name
  friendly_name: Your HA Voice     # Replace with your friendly name

packages:
  # Official ESPHome Home Assistant Voice PE package
  Nabu Casa.Home Assistant Voice PE: github://esphome/home-assistant-voice-pe/home-assistant-voice.yaml
  # Dynamic Volume package
  Jaapp.DynamicVolume: github://jaapp/ha-voice-dynamic-volume/dynamic-volume.yaml

# Your existing API configuration
api:
  encryption:
    key: YOUR_API_KEY

# Your existing WiFi configuration  
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

Benefits of Using the Package

  • Easier Installation: No more copying and pasting large code blocks
  • Fewer Errors: Eliminates typos that can happen with manual copying
  • Automatic Updates: Get improvements when I update the package
  • Better Documentation: Check out the GitHub repository for full documentation

GitHub Repository

For more information, example configurations, and to report issues, please visit the GitHub repository:
https://github.com/jaapp/ha-voice-dynamic-volume

Thanks to everyone who has tried this out and provided feedback! Special thanks to @Sir_Goodenough for suggesting the package approach.


Thanks for the Git update. That will make things way easier to update in the future. So far everything has been working great on mine.


Unfortunately I’m unable to edit my own posts. I just discovered an omission in the example yml in the post above which would cause the device to lose its hostname.

Please be sure to use this corrected yml, or the example yml from my github repo.

substitutions:
  name: your-ha-voice-device-name  # Replace with your device name
  friendly_name: Your HA Voice     # Replace with your friendly name

packages:
  # Official ESPHome Home Assistant Voice PE package
  Nabu Casa.Home Assistant Voice PE: github://esphome/home-assistant-voice-pe/home-assistant-voice.yaml
  # Dynamic Volume package
  Jaapp.DynamicVolume: github://jaapp/ha-voice-dynamic-volume/dynamic-volume.yaml

# Required base configuration
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}

# Your existing API configuration
api:
  encryption:
    key: YOUR_API_KEY

# Your existing WiFi configuration  
wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

When I add the package, the install fails looking for asr_mic, as it’s not available. Is there a workaround? I’m using the latest version of ESPHome as of 10/9.

I am also having the same error

Running into the same error :frowning:

Can we use an external media player for Voice PE replies with Smart Volume?