🚀 Bulletproof Your ESPHome: The Memory Guide

Ever find your ESPHome device mysteriously offline or stuck in a reboot loop? The culprit is often a silent killer: a memory leak.

This guide is for everyone. We’ll start with the simple, beginner-friendly way to see your device’s memory and understand its reboots. Then, we’ll level up to the professional-grade, self-healing guardian that automatically fixes memory problems before you even know they’re there.


Part 1: The Observer :male_detective: (Beginner Level)

Before you can fix a problem, you need to diagnose it. This first step is incredibly simple and gives you two essential tools: one to track your device’s free memory and another to see exactly why it last rebooted.

Your Basic Diagnostic Toolkit

Copy the code below into your device’s YAML file. This adds two new sensors to Home Assistant: one for free memory and one for the last reboot reason.

# -------------------------------------------------------------------
# STEP 1: ADD BASIC DIAGNOSTICS
# See your device's free memory and last reboot reason.
# -------------------------------------------------------------------

# Enables the ability to read debug information.
debug:
  update_interval: 1min # How often to check the memory.

sensor:
  # Creates a sensor in Home Assistant with the free memory value.
  - platform: debug
    free:
      name: "Heap Free"
      unit_of_measurement: "B" # B for Bytes
      icon: "mdi:memory"

text_sensor:
  # Creates a sensor that shows the reason for the last reboot.
  - platform: debug
    reset_reason:
      name: "Reset Reason"
      icon: "mdi:information-outline"

How to Read the Signs

After flashing your device, look for these two new sensors in Home Assistant.

1. The “Heap Free” Sensor

Check its history graph.

  • Healthy Memory: The graph will fluctuate but remain generally stable over time.
  • :rotating_light: Potential Leak: The graph shows a slow, steady decline over hours or days. It only recovers after a reboot, and then the downward trend starts all over again.

2. The “Reset Reason” Sensor

This tells you the story behind the last reboot. You’ll commonly see:

  • Power On: The device was physically powered off and on. This is the normal reason after you flash it.
  • Software/System Reset: The device was rebooted intentionally by code. This is the reason you’ll see when our Guardian script (from Part 2) is working correctly!
  • Watchdog: This is a crash! The device’s code froze, and an automatic safety system had to reboot it. A severe memory leak is one common cause, although long-blocking code can trigger it too.
  • Deep Sleep Wake: The device woke up from deep sleep, which is normal for battery-powered devices.

If your “Heap Free” is constantly trending down and your “Reset Reason” is Watchdog, you definitely have a problem. It’s time to level up.
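A falling “Heap Free” graph can also come from heap fragmentation rather than a true leak. As a hedged extension of the toolkit above, the debug sensor platform also documents block (largest free block) and loop_time options. The sketch below assumes your ESPHome version and chip support them; the sensor names are only suggestions.

```yaml
# Optional extras for the same debug sensor platform used above.
# Merge these keys into your existing "- platform: debug" sensor entry.
sensor:
  - platform: debug
    free:
      name: "Heap Free"
    block:
      name: "Heap Max Block"   # largest contiguous free block
    loop_time:
      name: "Loop Time"        # longest main-loop duration
```

If free memory stays roughly level while the largest block keeps shrinking, fragmentation is the more likely explanation than a leak.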


Part 2: The Guardian :shield: (Professional Level)

Watching graphs is one thing; automatically fixing the problem is another. The “Guardian” is a complete, self-healing system that adapts to your device and reboots it cleanly before a crash.

This is the ultimate set-and-forget solution.

The Blueprint: The Full Auto-Healing YAML

This code replaces the simple snippet from Part 1. It contains the full logic for observing, adapting, and acting, and includes the diagnostic sensors.

:bulb: Quick Start: Copy this entire block into your device’s YAML. You only need to change device_name.

# ===================================================================
# = SELF-HEALING MEMORY PROTECTION SYSTEM FOR ESPHOME
# ===================================================================

substitutions:
  device_name: your_device_name
  heap_threshold: "1000"
  memory_checks: "5"

globals:
  - id: low_memory_consecutive_checks
    type: int
    restore_value: false
    initial_value: "0"
  - id: heap_threshold_bytes
    type: float
    restore_value: false
    # Starts from the heap_threshold substitution; no secrets file is needed.
    initial_value: "${heap_threshold}"
  - id: max_heap_observed
    type: float
    restore_value: false
    initial_value: "0.0"
  - id: threshold_auto_set
    type: bool
    restore_value: false
    initial_value: "false"

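logger:
  # A per-tag level cannot be more verbose than the global level,
  # so the global level has to be DEBUG for the two tags below.
  level: DEBUG
  logs:
    heap_auto: DEBUG
    low_memory_check: DEBUG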

esphome:
  name: ${device_name}
  on_boot:
    priority: -100
    then:
      - lambda: |-
          id(threshold_auto_set) = false;
          id(max_heap_observed) = 0.0;
          id(low_memory_consecutive_checks) = 0;
          ESP_LOGI("heap_auto", "System started. Auto-threshold will be set after 5 minutes.");

debug:
  update_interval: 30s

sensor:
  - platform: debug
    free:
      name: "Heap Free"
      id: heap_free_sensor

text_sensor:
  - platform: debug
    reset_reason:
      name: "Reset Reason"
      id: reset_reason_sensor

interval:
  - interval: 30s
    id: heap_monitoring_interval
    then:
      - lambda: |-
          if (!id(threshold_auto_set) && id(heap_free_sensor).has_state()) {
            float current_heap = id(heap_free_sensor).state;
            if (current_heap > id(max_heap_observed)) {
              id(max_heap_observed) = current_heap;
              ESP_LOGD("heap_auto", "New max heap observed: %.0f bytes", current_heap);
            }
          }

  - interval: 5min
    id: auto_threshold_starter
    then:
      - lambda: |-
          if (!id(threshold_auto_set)) {
            if (id(max_heap_observed) > 0) {
              float new_threshold = id(max_heap_observed) * 0.33;
              if (new_threshold < 500) new_threshold = 500;
              if (new_threshold > 10000) new_threshold = 10000;
              id(heap_threshold_bytes) = new_threshold;
              ESP_LOGI("heap_auto", "Auto-set heap threshold to %.0f bytes (33%% of max observed %.0f bytes)", new_threshold, id(max_heap_observed));
            } else {
              id(heap_threshold_bytes) = ${heap_threshold};
              ESP_LOGI("heap_auto", "Could not determine max heap, using default threshold: %.0f bytes", id(heap_threshold_bytes));
            }
            id(threshold_auto_set) = true;
          }

  - interval: 1min
    id: heap_memory_check_interval
    then:
      - if:
          condition:
            lambda: return id(threshold_auto_set);
          then:
            - lambda: |-
                float heap_threshold = id(heap_threshold_bytes);
                int checks_needed_for_restart = ${memory_checks};
                if (id(heap_free_sensor).has_state() && id(heap_free_sensor).state < heap_threshold) {
                  id(low_memory_consecutive_checks)++;
                  ESP_LOGW("low_memory_check", "Heap low (%.0f bytes). Warning %d of %d.", id(heap_free_sensor).state, id(low_memory_consecutive_checks), checks_needed_for_restart);
                  if (id(low_memory_consecutive_checks) >= checks_needed_for_restart) {
                    ESP_LOGE("low_memory_check", "Heap critically low for %d checks. RESTARTING to prevent crash!", id(low_memory_consecutive_checks));
                    App.reboot();
                  }
                } else {
                  if (id(low_memory_consecutive_checks) > 0) {
                    ESP_LOGD("low_memory_check", "Heap recovered (%.0f bytes), resetting counter.", id(heap_free_sensor).state);
                  }
                  id(low_memory_consecutive_checks) = 0;
                }

# ===================================================================
# OPTIONAL: EXTRA SENSORS FOR THE PROS
# NOTE: A YAML file can hold only one top-level "sensor:" key. Move the
# two template sensors below into the "sensor:" list defined earlier
# instead of keeping this second "sensor:" block.
# ===================================================================
sensor:
  - platform: template
    name: "Heap Threshold"
    lambda: return id(heap_threshold_bytes);
    update_interval: 30s
    unit_of_measurement: "B"
    icon: "mdi:gauge-low"
  
  - platform: template
    name: "Low Memory Counter"
    lambda: return id(low_memory_consecutive_checks);
    update_interval: 10s
    unit_of_measurement: "checks"
    accuracy_decimals: 0 # Show whole numbers (0, not 0.0).
    icon: "mdi:counter"

binary_sensor:
  - platform: template
    name: "Low Heap Warning"
    device_class: problem
    lambda: |-
      if (id(heap_free_sensor).has_state()) {
        return id(heap_free_sensor).state < id(heap_threshold_bytes);
      }
      return false;

:sparkles: The Magic Behind the Curtain

  1. The Grace Period (First 5 Mins): The system watches and learns, allowing your device to reach a stable state.
  2. Observation Mode: It records the maximum free memory to learn what “healthy” looks like for your specific device.
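  3. Setting the Trap: After 5 minutes, it sets a safety threshold at 33% of the observed maximum, clamped to the range 500 to 10000 bytes. For example, a maximum of 24000 bytes gives a threshold of 7920 bytes, while a maximum of 180000 bytes is capped at 10000 bytes.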
  4. The Guardian Watch: Every minute, it checks the current memory against this new threshold.
  5. The Rescue Mission: If memory stays low for 5 consecutive checks, it triggers a clean, immediate reboot. Your “Reset Reason” sensor will then report Software/System Reset.

:trophy: Your Arsenal: Pro-Tips & Leak Hunting

Is it a Memory Leak or a Connection Timeout?

Check your Reset Reason sensor! If it’s Software/System Reset, our Guardian is working. But also check the main device logs. If you see “Timeout” or “Disconnecting” from the api: or mqtt: components, you might have a network issue, not a memory leak.
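Both api: and wifi: have a reboot_timeout option (15 minutes by default) that restarts the device when no client or network connection is available, which can look like a crash. A minimal sketch for ruling this out while you diagnose, assuming you merge the keys into your existing blocks; the values are examples only, and 0s disables the automatic reboot.

```yaml
# Merge these options into your existing api: and wifi: blocks.
api:
  reboot_timeout: 0s      # do not reboot when no API client is connected

wifi:
  reboot_timeout: 30min   # example value; the default is 15min
```

If the unexplained reboots stop after this change, the cause was a connection timeout rather than memory.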

:white_check_mark: Start Lean: The less code you have, the less memory you use. Avoid memory-hungry components like web_server if you don’t absolutely need them.

:white_check_mark: Pro-Tip for Lambdas: Be very careful inside lambda sections. Building lots of temporary String objects fragments the heap, which on these small devices behaves much like a leak.
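As an illustration of the lambda tip, here is a minimal sketch that formats text into a fixed-size buffer instead of chaining string concatenations. The “Heap Summary” sensor is a hypothetical example, and it assumes the heap_free_sensor id from the Guardian YAML.

```yaml
# Merge this entry into your existing text_sensor: list.
text_sensor:
  - platform: template
    name: "Heap Summary"        # hypothetical example sensor
    update_interval: 60s
    lambda: |-
      // Publish nothing until the heap sensor has a value.
      if (!id(heap_free_sensor).has_state()) return {};
      // Format into a fixed-size stack buffer instead of chaining String objects.
      char buf[32];
      snprintf(buf, sizeof(buf), "%.0f B free", id(heap_free_sensor).state);
      return std::string(buf);
```

The only heap allocation left is the single returned string, which is released after it is published.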

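:white_check_mark: Test Your Guardian: Changing heap_threshold in substitutions is not enough to force a test reboot, because after 5 minutes the auto-threshold overwrites it with a value capped at 10000 bytes. To confirm the system works, temporarily raise the 0.33 factor and the 10000 cap in the auto_threshold_starter lambda so the threshold lands above your device’s normal free heap. The device should then reboot after 5 consecutive low checks.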
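One way to exercise the reboot path on demand is a temporary template button that pushes the threshold far above the free heap. This is a sketch, not part of the original Guardian: the button name is hypothetical, and the 1000000 byte value assumes a device whose free heap is well below that (boards with PSRAM may need a larger number).

```yaml
button:
  - platform: template
    name: "Guardian Test"       # hypothetical name; remove after testing
    on_press:
      - lambda: |-
          // Test only: push the threshold far above the current free heap.
          id(heap_threshold_bytes) = 1000000.0f;
          // Skip the learning phase so the 1-minute check starts counting.
          id(threshold_auto_set) = true;
```

After pressing the button, the 1-minute check should count up to memory_checks and then reboot. Because the globals use restore_value: false, the normal threshold logic starts over after that reboot.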

:white_check_mark: Consider Deep Sleep: If your goal is maximum battery life for a sensor that reports infrequently, the deep_sleep component is your best friend. It naturally resets memory on each wake cycle.
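A minimal deep_sleep sketch, with example durations that you would tune to your own sensor:

```yaml
deep_sleep:
  run_duration: 30s      # stay awake long enough to connect and publish
  sleep_duration: 10min  # then sleep; the heap starts fresh on each wake
```

With awake times this short, the Guardian’s 5-minute learning phase never completes, so a deep-sleep device generally does not need it.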

:man_detective: The Usual Suspects: Common Leak Sources

If the Guardian is rebooting your device, free memory is staying below the threshold, most likely because of a leak. Look for these common culprits:

  • Frequent http_request components.
  • Large and complex lambda functions.
  • Parsing large JSON files.
  • Too many components updating too frequently.
  • Excessive DEBUG logging (our Guardian code already helps manage this).
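For the “updating too frequently” case, a small sketch of slowing a sensor down. The wifi_signal sensor is only an example, and both values are assumptions to adjust for your project.

```yaml
sensor:
  - platform: wifi_signal
    name: "WiFi Signal"
    update_interval: 60s   # poll once a minute instead of every few seconds
    filters:
      - throttle: 5min     # forward at most one value every 5 minutes
```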

With these tools, you can move from being a victim of memory leaks to being the master of your device’s stability. Happy project building!

Great guide, thank you!

Just a quick question regarding “Part 1: The Observer :man_detective: (Beginner Level)”. If you have multiple devices, how can you distinguish between “name: Your Device Name Heap Free” and “name: Your Device Name Reset Reason” for each device?

You replace ‘Your Device Name’ with the actual name of the device

Okay, but if I want to keep the text like “Your Device Name Reset Reason”, is it possible to use a variable, like in “$DEVICENAME: Reset Reason” or something similar?

Sorry for the confusion about naming; I will correct the guide. I just whipped something together for someone who wanted to rule out memory issues in their project. Hope this helps:

ESPHome Sensor Naming

How ESPHome Naming Works

When you define friendly_name in your ESPHome device configuration, it automatically gets prefixed to all sensor names in Home Assistant.

Basic Configuration

substitutions:
  name: "esp32-device"
  friendly_name: "Living Room"
  
esphome:
  name: ${name}
  friendly_name: ${friendly_name}

Sensor Naming Rules

:white_check_mark: Correct way:

sensor:
  - platform: uptime
    name: "Uptime"
  - platform: wifi_signal  
    name: "WiFi Signal"

Result in Home Assistant: “Living Room Uptime”, “Living Room WiFi Signal”

:x: Avoid this (creates duplicates):

sensor:
  - platform: uptime
    name: "${friendly_name} Uptime"  # DON'T DO THIS

Result: “Living Room Living Room Uptime”

Key Points

  1. Automatic prefixing: ESPHome automatically adds friendly_name to all entity names
  2. Entity ID vs Display Name:
    • Entity ID: sensor.esp32_device_uptime (based on name)
    • Display Name: “Living Room Uptime” (includes friendly_name)
  3. Don’t manually add prefixes: Let ESPHome handle the prefixing automatically
  4. Keep sensor names simple: Use descriptive but short names like “Temperature”, “Humidity”, “Uptime”

Best Practice Template

substitutions:
  name: "device-name"
  friendly_name: "Room Name"

esphome:
  name: ${name}
  friendly_name: ${friendly_name}

sensor:
  - platform: dht
    temperature:
      name: "Temperature"
    humidity:
      name: "Humidity"

This creates clean, consistent naming: “Room Name Temperature”, “Room Name Humidity” in Home Assistant while maintaining proper entity IDs.

Great, now I get how it works. Thanks a lot for the clear explanation!

I changed this with: initial_value: $heap_threshold as I don’t want to use “secrets”
Also, put initial_value(s) in quotes to pass validation and changed logger level to DEBUG instead of WARN
Compile and install …success.
Low Memory Counter shows 0,0 checks …normal?

@laptopology, if you have problems and are not sure whether memory is the cause, I suggest you first simply observe how your memory behaves, as in “Part 1: The Observer”. Once you see how your ESPHome device behaves, you will know the initial heap value to give the Guardian. The Guardian is for advanced cases only; usually you just need to find whatever is making the memory leak.