Ever find your ESPHome device mysteriously offline or stuck in a reboot loop? The culprit is often a silent killer: a memory leak.
This guide is for everyone. We’ll start with the simple, beginner-friendly way to see your device’s memory and understand its reboots. Then, we’ll level up to the professional-grade, self-healing guardian that automatically fixes memory problems before you even know they’re there.
Part 1: The Observer (Beginner Level)
Before you can fix a problem, you need to diagnose it. This first step is incredibly simple and gives you two essential tools: one to track your device’s free memory and another to see exactly why it last rebooted.
Your Basic Diagnostic Toolkit
Copy the code below into your device’s YAML file. This adds two new sensors to Home Assistant: one for free memory and one for the last reboot reason.
# -------------------------------------------------------------------
# STEP 1: ADD BASIC DIAGNOSTICS
# See your device's free memory and last reboot reason.
# -------------------------------------------------------------------
# Enables the ability to read debug information.
debug:
  update_interval: 1min # How often to check the memory.
sensor:
  # Creates a sensor in Home Assistant with the free memory value.
  - platform: debug
    free:
      name: "Heap Free"
      unit_of_measurement: "B" # B for Bytes
      icon: "mdi:memory"
text_sensor:
  # Creates a sensor that shows the reason for the last reboot.
  - platform: debug
    reset_reason:
      name: "Reset Reason"
      icon: "mdi:information-outline"
How to Read the Signs
After flashing your device, look for these two new sensors in Home Assistant.
1. The “Heap Free” Sensor
Check its history graph.
- Healthy Memory: The graph will fluctuate but remain generally stable over time.
- Potential Leak: The graph shows a slow, steady decline over hours or days. It only recovers after a reboot, and then the downward trend starts all over again.
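If you would rather not watch the graph by hand, a Home Assistant trend sensor can flag a sustained decline for you. The sketch below goes in Home Assistant’s configuration.yaml, not in the ESPHome file. The entity ID, the 6-hour window, and the gradient are assumptions for illustration and will need tuning for your device.

```yaml
# Home Assistant configuration.yaml (NOT the ESPHome device YAML).
binary_sensor:
  - platform: trend
    sensors:
      heap_leak_suspected:
        friendly_name: "Heap Leak Suspected"
        # Assumption: the entity ID Home Assistant gave your "Heap Free" sensor.
        entity_id: sensor.your_device_name_heap_free
        # Look at the last 6 hours of readings.
        sample_duration: 21600
        # Enough samples for one reading per minute over that window.
        max_samples: 400
        # Gradient is in sensor units per second; -0.1 B/s is about 360 B/hour.
        # The sensor turns on when free heap falls at least this fast.
        min_gradient: -0.1
```

A short dip will not trip this sensor, since the gradient is fitted over the whole sample window.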
2. The “Reset Reason” Sensor
This tells you the story behind the last reboot. You’ll commonly see:
- Power On: The device was physically powered off and on. This is normal after a power cycle.
- Software/System Reset: The device was rebooted intentionally by code. This is the reason you’ll see when our Guardian script (from Part 2) is working correctly! An over-the-air update also ends with this kind of reset.
- Watchdog: This is a crash! The device’s code froze, and an automatic safety system had to reboot it. This can be a symptom of a severe memory leak.
- Deep Sleep Wake: The device woke up from deep sleep, which is normal for battery-powered devices.
The exact wording of each reason varies by chip and ESPHome version.
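To make a watchdog reboot stand out in the device log, the Part 1 text sensor could be extended with an on_value trigger, roughly as sketched below. The log tag reset_check is made up for this example, and the match on "Watchdog" is an assumption: check the text your own chip reports and adjust it.

```yaml
text_sensor:
  - platform: debug
    reset_reason:
      name: "Reset Reason"
      icon: "mdi:information-outline"
      on_value:
        then:
          - lambda: |-
              // x holds the reset reason text published by the debug component.
              // Assumption: watchdog resets contain the word "Watchdog";
              // the exact wording differs between chips and versions.
              if (x.find("Watchdog") != std::string::npos) {
                ESP_LOGW("reset_check", "Last reboot looks like a watchdog reset: %s", x.c_str());
              }
```

This replaces the text_sensor entry from Part 1 rather than adding a second one.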
If your “Heap Free” is constantly trending down and your “Reset Reason” is Watchdog, you definitely have a problem. It’s time to level up.
Part 2: The Guardian (Professional Level)
Watching graphs is one thing; automatically fixing the problem is another. The “Guardian” is a complete, self-healing system that adapts to your device and reboots it cleanly before a crash.
This is the ultimate set-and-forget solution.
The Blueprint: The Full Auto-Healing YAML
This code replaces the simple snippet from Part 1. It contains the full logic for observing, adapting, and acting, and includes the diagnostic sensors.
Quick Start: Copy this entire block into your device’s YAML. You only need to change device_name.
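# ===================================================================
# = SELF-HEALING MEMORY PROTECTION SYSTEM FOR ESPHOME
# ===================================================================
substitutions:
  # Use lowercase letters, digits, and hyphens; current ESPHome versions
  # do not accept underscores in the device name.
  device_name: your-device-name
  heap_threshold: "1000" # Fallback threshold in bytes.
  memory_checks: "5" # Consecutive low readings before a reboot.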
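globals:
  # Number of consecutive checks that found the heap below the threshold.
  - id: low_memory_consecutive_checks
    type: int
    restore_value: false
    initial_value: "0"
  # Active threshold in bytes; starts at the fallback from substitutions.
  - id: heap_threshold_bytes
    type: float
    restore_value: false
    initial_value: "${heap_threshold}"
  # Highest free-heap reading seen during the grace period.
  - id: max_heap_observed
    type: float
    restore_value: false
    initial_value: "0.0"
  # Becomes true once the threshold has been chosen.
  - id: threshold_auto_set
    type: bool
    restore_value: false
    initial_value: "false"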
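logger:
  # Keep logging quiet to save memory. ESPHome does not accept per-tag levels
  # that are more verbose than the global level, so the Guardian tags stay at WARN.
  # Raise "level" and these tags to DEBUG temporarily to see its info and debug messages.
  level: WARN
  logs:
    heap_auto: WARN
    low_memory_check: WARN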
esphome:
  name: ${device_name}
  on_boot:
    priority: -100
    then:
      - lambda: |-
          id(threshold_auto_set) = false;
          id(max_heap_observed) = 0.0;
          id(low_memory_consecutive_checks) = 0;
          ESP_LOGI("heap_auto", "System started. Auto-threshold will be set after 5 minutes.");
debug:
  update_interval: 30s
sensor:
  - platform: debug
    free:
      name: "Heap Free"
      id: heap_free_sensor
  # OPTIONAL: extra sensors for the pros. They sit under this same "sensor:"
  # key because a second top-level "sensor:" block would be a duplicate key.
  - platform: template
    name: "Heap Threshold"
    lambda: return id(heap_threshold_bytes);
    update_interval: 30s
    unit_of_measurement: "B"
    icon: "mdi:gauge-low"
  - platform: template
    name: "Low Memory Counter"
    lambda: return id(low_memory_consecutive_checks);
    update_interval: 10s
    unit_of_measurement: "checks"
    icon: "mdi:counter"
text_sensor:
  - platform: debug
    reset_reason:
      name: "Reset Reason"
      id: reset_reason_sensor
interval:
  # Grace period: track the highest free-heap reading until the threshold is set.
  - interval: 30s
    id: heap_monitoring_interval
    then:
      - lambda: |-
          if (!id(threshold_auto_set) && id(heap_free_sensor).has_state()) {
            float current_heap = id(heap_free_sensor).state;
            if (current_heap > id(max_heap_observed)) {
              id(max_heap_observed) = current_heap;
              ESP_LOGD("heap_auto", "New max heap observed: %.0f bytes", current_heap);
            }
          }
  # End of grace period: set the threshold to 33% of the maximum, clamped to 500..10000 bytes.
  - interval: 5min
    id: auto_threshold_starter
    then:
      - lambda: |-
          if (!id(threshold_auto_set)) {
            if (id(max_heap_observed) > 0) {
              float new_threshold = id(max_heap_observed) * 0.33;
              if (new_threshold < 500) new_threshold = 500;
              if (new_threshold > 10000) new_threshold = 10000;
              id(heap_threshold_bytes) = new_threshold;
              ESP_LOGI("heap_auto", "Auto-set heap threshold to %.0f bytes (33%% of max observed %.0f bytes)", new_threshold, id(max_heap_observed));
            } else {
              id(heap_threshold_bytes) = ${heap_threshold};
              ESP_LOGI("heap_auto", "Could not determine max heap, using default threshold: %.0f bytes", id(heap_threshold_bytes));
            }
            id(threshold_auto_set) = true;
          }
  # Guardian watch: count consecutive low readings and reboot when the limit is reached.
  - interval: 1min
    id: heap_memory_check_interval
    then:
      - if:
          condition:
            lambda: return id(threshold_auto_set);
          then:
            - lambda: |-
                float heap_threshold = id(heap_threshold_bytes);
                int checks_needed_for_restart = ${memory_checks};
                if (id(heap_free_sensor).has_state() && id(heap_free_sensor).state < heap_threshold) {
                  id(low_memory_consecutive_checks)++;
                  ESP_LOGW("low_memory_check", "Heap low (%.0f bytes). Warning %d of %d.", id(heap_free_sensor).state, id(low_memory_consecutive_checks), checks_needed_for_restart);
                  if (id(low_memory_consecutive_checks) >= checks_needed_for_restart) {
                    ESP_LOGE("low_memory_check", "Heap critically low for %d checks. RESTARTING to prevent crash!", id(low_memory_consecutive_checks));
                    App.reboot();
                  }
                } else {
                  if (id(low_memory_consecutive_checks) > 0) {
                    ESP_LOGD("low_memory_check", "Heap recovered (%.0f bytes), resetting counter.", id(heap_free_sensor).state);
                  }
                  id(low_memory_consecutive_checks) = 0;
                }
# ===================================================================
# OPTIONAL: EXTRA BINARY SENSOR FOR THE PROS
# ===================================================================
binary_sensor:
  - platform: template
    name: "Low Heap Warning"
    device_class: problem
    lambda: |-
      if (id(heap_free_sensor).has_state()) {
        return id(heap_free_sensor).state < id(heap_threshold_bytes);
      }
      return false;
The Magic Behind the Curtain
- The Grace Period (First 5 Mins): The system watches and learns, allowing your device to reach a stable state.
- Observation Mode: It records the maximum free memory to learn what “healthy” looks like for your specific device.
- Setting the Trap: After 5 minutes, it sets a safety threshold at 33% of the observed maximum, creating a custom-fit safety net.
- The Guardian Watch: Every minute, it checks the current memory against this new threshold.
- The Rescue Mission: If memory stays low for 5 consecutive checks, it triggers a clean, immediate reboot. Your “Reset Reason” sensor will then report Software/System Reset.
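To make the threshold arithmetic concrete, here is a sketch of the 5-minute step with the 33% ratio and the two clamps pulled out into substitutions. The names threshold_ratio, threshold_min, and threshold_max are hypothetical and not part of the Guardian YAML above. The fragment is meant as a possible stand-in for the existing auto_threshold_starter interval, with its log messages left out for brevity.

```yaml
substitutions:
  heap_threshold: "1000"    # fallback, as in the Guardian YAML
  threshold_ratio: "0.33"   # hypothetical name: share of the observed maximum
  threshold_min: "500"      # hypothetical name: lower clamp in bytes
  threshold_max: "10000"    # hypothetical name: upper clamp in bytes

interval:
  - interval: 5min
    id: auto_threshold_starter
    then:
      - lambda: |-
          if (id(threshold_auto_set)) return;
          if (id(max_heap_observed) > 0) {
            // Worked examples with the values above:
            //   max 30000 B  -> 0.33 * 30000  = 9900 B  (kept as is)
            //   max 200000 B -> 0.33 * 200000 = 66000 B (capped at 10000 B)
            //   max 1200 B   -> 0.33 * 1200   = 396 B   (raised to 500 B)
            float t = id(max_heap_observed) * ${threshold_ratio};
            if (t < ${threshold_min}) t = ${threshold_min};
            if (t > ${threshold_max}) t = ${threshold_max};
            id(heap_threshold_bytes) = t;
          } else {
            // Nothing observed: fall back to the fixed default.
            id(heap_threshold_bytes) = ${heap_threshold};
          }
          id(threshold_auto_set) = true;
```

With one check per minute and 5 checks required, memory has to stay below the threshold for roughly 5 minutes before the reboot fires.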
Your Arsenal: Pro-Tips & Leak Hunting
Is it a Memory Leak or a Connection Timeout?
Check your Reset Reason sensor! If it’s Software/System Reset, our Guardian is working. But also check the main device logs. If you see “Timeout” or “Disconnecting” from the api: or mqtt: components, you might have a network issue, not a memory leak.
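Two extra sensors can help tell these cases apart: a Wi-Fi signal reading to spot a weak link, and an uptime counter to see whether the device actually rebooted or only dropped its connection. This is a minimal sketch; the names and the 60-second interval are arbitrary choices.

```yaml
# Add these entries to your existing "sensor:" list.
sensor:
  # Signal strength in dBm; values that sag before each dropout point to the network.
  - platform: wifi_signal
    name: "WiFi Signal"
    update_interval: 60s
  # Time since boot; it keeps climbing through a disconnect but resets on a reboot.
  - platform: uptime
    name: "Uptime"
```

If uptime keeps growing while the device shows as unavailable, the problem is more likely the connection than the heap.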
Start Lean: The less code you have, the less memory you use. Avoid memory-hungry components like web_server if you don’t absolutely need them.
Pro-Tip for Lambdas: Be very careful inside lambda sections. Creating lots of String objects can be a major source of leaks.
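As an illustration of the lambda tip, the sketch below formats text into a small fixed buffer on the stack and builds a single string from it, instead of concatenating many temporary strings. The sensor name is made up, and it assumes the heap_free_sensor ID from the Guardian YAML.

```yaml
text_sensor:
  - platform: template
    name: "Heap Summary"   # hypothetical sensor for illustration
    update_interval: 60s
    lambda: |-
      // Format into a fixed stack buffer; snprintf never writes past its size.
      char buf[32];
      snprintf(buf, sizeof(buf), "%.0f B free", id(heap_free_sensor).state);
      // One short-lived string per update instead of a chain of concatenations.
      return std::string(buf);
```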
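Test Your Guardian: To confirm the system works, temporarily force the threshold above your device’s free heap. Changing heap_threshold in substitutions is not enough on its own: that value is only used as a fallback when no maximum heap was observed, and the auto-set threshold is capped at 10000 bytes. Instead, temporarily change the line id(heap_threshold_bytes) = new_threshold; so that it assigns a number larger than your current “Heap Free” reading. After the 5-minute grace period and 5 low readings, the device should reboot. Remember to undo the change afterwards.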
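Another way to run the test without editing the threshold logic might be a temporary template button that pushes the threshold far above any realistic heap size. This is a sketch: the button name and the value are arbitrary, and it relies on the heap_threshold_bytes and threshold_auto_set globals from the Guardian YAML.

```yaml
button:
  - platform: template
    name: "Guardian Test"   # hypothetical name; remove the button after testing
    on_press:
      - lambda: |-
          // Force every following check to count as "low memory".
          id(heap_threshold_bytes) = 10000000.0f;
          // Skip the grace period so the 1-minute checks start right away.
          id(threshold_auto_set) = true;
```

After a press, the reboot should follow once the configured number of consecutive checks has passed. Because the globals use restore_value: false, the threshold returns to normal after the reboot.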
Consider Deep Sleep: If your goal is maximum battery life for a sensor that reports infrequently, the deep_sleep component is your best friend. It naturally resets memory on each wake cycle.
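A minimal deep sleep setup might look like the sketch below. Both durations are placeholder values to adapt to how often your sensor needs to report.

```yaml
deep_sleep:
  # Stay awake long enough to connect and publish readings.
  run_duration: 30s
  # Then sleep; the device starts with a fresh heap on every wake.
  sleep_duration: 10min
```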
The Usual Suspects: Common Leak Sources
If the Guardian is rebooting your device, a leak or a configuration that simply needs too much memory is likely. Look for these common culprits:
- Frequent http_request components.
- Large and complex lambda functions.
- Parsing large JSON files.
- Too many components updating too frequently.
- Excessive DEBUG logging (our Guardian code already helps manage this).
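For the “updating too frequently” case, slowing a sensor down and publishing only meaningful changes is usually a cheap first step. The sketch below uses an analog input as a stand-in. The pin, the name, and the numbers are assumptions to replace with your own.

```yaml
sensor:
  - platform: adc
    pin: A0                 # assumption: ESP8266 analog pin; adjust for your board
    name: "Analog Input"    # hypothetical sensor for illustration
    # Read once a minute instead of every few seconds.
    update_interval: 60s
    filters:
      # Only publish when the value moved by at least 0.05 since the last publish.
      - delta: 0.05
```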
With these tools, you can move from being a victim of memory leaks to being the master of your device’s stability. Happy project building!