I’ve been trying for about a month now to set up some magnetometers to monitor my gas and water meters. Everything works, but pretty regularly (~20 times a day) either the sensors freeze up completely (forcing me to reboot), or the device itself hits some sort of snag that causes an exception/panic or trips the interrupt watchdog and reboots. There are also pretty frequent NACKs / I2C timeouts in the logs, but obviously those are recoverable. I’m at my wits’ end trying to solve these, and can’t figure out what’s happening to my setup when others who are pushing the limits further aren’t experiencing these issues at all.
I’ve got 2 QMC5883P magnetometers hooked up to my ESP32. Unfortunately I don’t have power anywhere near the gas or water meters (they’re at opposite ends of the house), so I’ve centralized the ESP32 and am running 25 and 35 ft of Cat5 to the meters. It’s wired with a ground on each twisted pair (V/GND, SDA/GND, SCL/GND). Neither wire runs along AC mains, but one does run along some old phone wire that isn’t in use (plus a grounding cable for the same). Each sensor is on its own bus. Because of the length, I’ve also added an LTC4311 extender / active terminator to each bus to help. I’m using 4.7K pullups at the ESP.
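For a rough sense of what those cable lengths mean for the bus, here is a back-of-envelope estimate of the passive rise time. It is only a sketch: the ~52 pF/m figure is a typical Cat5 pair capacitance that I am assuming rather than something I measured, and the function names are made up for illustration. It uses the usual RC approximation for a 30% to 70% rise, t_r ≈ 0.8473 · Rp · Cb, and it ignores the LTC4311, which is there precisely to speed up rising edges.

```cpp
// ASSUMPTION: typical capacitance of one Cat5 twisted pair, not a measured value.
constexpr double kCat5PfPerMeter = 52.0;
constexpr double kMetersPerFoot = 0.3048;

// Capacitance of one signal/ground pair of the given length, in picofarads.
constexpr double cable_capacitance_pf(double length_ft) {
  return length_ft * kMetersPerFoot * kCat5PfPerMeter;
}

// Passive 30%-70% rise time in nanoseconds for a pull-up (ohms) and bus capacitance (pF).
// ohms * pF gives 1e-12 s, so multiplying by 1e-3 converts to nanoseconds.
constexpr double rise_time_ns(double pullup_ohms, double bus_pf) {
  return 0.8473 * pullup_ohms * bus_pf * 1e-3;
}

// Example: rise_time_ns(4700.0, cable_capacitance_pf(35.0)) for the longer run.
```

Under those assumptions the 25 ft run is roughly 400 pF and the 35 ft run roughly 550 pF, giving about 1.6 µs and 2.2 µs on the 4.7K pullups alone. That is at or past the 400 pF and 1000 ns limits for standard-mode I2C, although at 10 kHz the bit period is 100 µs, so edges that slow are not obviously a problem on their own.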
I’ve tried everything I can think of: tweaking the pullups, the timeouts, the frequency, the update intervals. No matter what I do, I can’t seem to prevent these lockups, panics, or watchdog resets. The annoying thing is that a reboot takes ~40 seconds, which means I miss a good number of pulses each time, especially with how often it’s rebooting.
The only other thing I could think to try is using I2C-to-differential signal converters, but here in Canada those would cost $23 a pop, and I’d need 4, and I’ve already spent too much money as is on what was supposed to be a silly little project that most people seem to have zero trouble with. I’ve even seen others running it at 50+ feet without issue, so I feel particularly insane trying to debug this.
Here’s the code I’m using. The long script in there is just the recovery script to reboot the sensors when they freeze up, but that doesn’t stop the panics / watchdog timeouts. Also, yes, I know the 15 ms update period for the water sensor is aggressive, but I’ve tested it by graphing the raw output, and any slower and it misses pulses when the tap is fully open. My water meter seems to spin exceptionally fast.
esphome:
  name: utility-monitor
  friendly_name: Utility Monitor
  on_boot:
    priority: -100
    then:
      - delay: 100ms # Let sensor power stabilize
      - lambda: |-
          const char* causes[] = {
            "Normal Boot",
            "Water I2C Frozen",
            "Gas I2C Frozen",
            "API Disconnect",
            "Recovery Failure"
          };
          int code = id(reboot_cause_code);
          // causes[] has 5 entries, so valid indices are 0..4
          if (code >= 0 && code < 5) {
            id(reboot_cause_sensor).publish_state(causes[code]);
          }
          id(recovery_status).publish_state("No issue");
          // Reset
          id(reboot_cause_code) = 0;

external_components:
  - source: github://mazzhead/esphome-components@main
    components: [qmc5883p]
  - source: github://dentra/esphome-components
    components: [ coredump, partitions ]

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP_COREDUMP_ENABLE_TO_FLASH: y
      CONFIG_ESP_COREDUMP_DATA_FORMAT_ELF: y
      CONFIG_ESP_COREDUMP_CHECKSUM_CRC32: y
      CONFIG_ESP_INT_WDT_TIMEOUT_MS: "1000" # Increase from default 300ms
      CONFIG_ESP_TASK_WDT_TIMEOUT_S: "10" # Increase task WDT too
  partitions: custom_partitions.csv

coredump:

i2c:
  - id: water_i2c_bus
    sda: GPIO26
    scl: GPIO25
    frequency: 10kHz
    timeout: 1ms
  - id: gas_i2c_bus
    sda: GPIO33
    scl: GPIO32
    frequency: 10kHz
    timeout: 1ms

switch:
  - platform: gpio
    pin: GPIO18
    id: sensor_power_water
    restore_mode: ALWAYS_ON
    internal: true
  - platform: gpio
    pin: GPIO19
    id: sensor_power_gas
    restore_mode: ALWAYS_ON
    internal: true
  - platform: template
    name: "Display Raw Gas Field Data"
    id: gas_data_switch
    optimistic: true
    entity_category: diagnostic
    icon: "mdi:eye"
    restore_mode: "RESTORE_DEFAULT_OFF"
    on_turn_off:
      then:
        - lambda: 'id(raw_gas_field).publish_state(NAN);'
  - platform: template
    name: "Display Raw Water Field Data"
    id: water_data_switch
    optimistic: true
    entity_category: diagnostic
    icon: "mdi:eye"
    restore_mode: "RESTORE_DEFAULT_OFF"
    on_turn_off:
      then:
        - lambda: 'id(raw_water_field).publish_state(NAN);'

globals:
  - id: water_total_pulses
    type: float
    restore_value: yes
    initial_value: '0.0'
  - id: gas_total_pulses
    type: float
    restore_value: yes
    initial_value: '0.0'
  # Water meter wave alternates peak heights
  - id: expecting_half_peak
    type: bool
    restore_value: yes
    initial_value: 'false'
  - id: reboot_cause_code
    type: int
    restore_value: yes
    initial_value: '0'
  # Flow tracking globals
  - id: water_pulses_last_period
    type: int
    restore_value: no
    initial_value: '-1'
  - id: gas_pulses_last_period
    type: int
    restore_value: no
    initial_value: '-1'
  - id: water_recovery_running
    type: bool
    initial_value: 'false'
  - id: gas_recovery_running
    type: bool
    initial_value: 'false'
  - id: recovery_attempts
    type: int
    initial_value: '0'

sensor:
  - platform: qmc5883p
    id: water_sensor
    i2c_id: water_i2c_bus
    field_strength_z:
      name: "Water Field Strength Z"
      id: "water_mag_z"
      internal: true
      on_value:
        then:
          - lambda: |-
              // Push to the visible sensor ONLY if the switch is ON
              if (id(water_data_switch).state) {
                id(raw_water_field).publish_state(x);
              }
              float dynamic_threshold = id(expecting_half_peak) ? id(water_mid_threshold).state : id(water_max_threshold).state;
              if (!id(water_pulse_state).state) {
                if (x >= id(water_max_threshold).state) {
                  // Full peak detected, expect half peak next
                  id(water_pulse_state).publish_state(true);
                  id(expecting_half_peak) = true;
                }
                else if (id(expecting_half_peak) && x >= id(water_mid_threshold).state) {
                  // Half peak detected, expect full peak next
                  id(water_pulse_state).publish_state(true);
                  id(expecting_half_peak) = false;
                }
              } else {
                if (x <= id(water_min_threshold).state) {
                  id(water_pulse_state).publish_state(false);
                }
              }
    update_interval: never
  - platform: qmc5883p
    id: gas_sensor
    i2c_id: gas_i2c_bus
    field_strength_x:
      name: "Gas Field Strength X"
      id: "gas_mag_x"
      internal: true
      on_value:
        then:
          - lambda: |-
              // Push to the visible sensor ONLY if the switch is ON
              if (id(gas_data_switch).state) {
                id(raw_gas_field).publish_state(x);
              }
              if (x >= id(gas_max_threshold).state) {
                id(gas_pulse_state).publish_state(true);
              }
              else if (x <= id(gas_min_threshold).state) {
                id(gas_pulse_state).publish_state(false);
              }
    update_interval: never
  # Raw sensor data for diagnostics
  - platform: template
    name: "Water Z Field"
    id: raw_water_field
    unit_of_measurement: "µT"
    accuracy_decimals: 3
    update_interval: never
    entity_category: diagnostic
    icon: "mdi:sine-wave"
  - platform: template
    name: "Gas X Field"
    id: raw_gas_field
    unit_of_measurement: "µT"
    accuracy_decimals: 3
    update_interval: never
    entity_category: diagnostic
    icon: "mdi:sine-wave"
  - platform: template
    name: "Water Total Volume"
    device_class: water
    state_class: total_increasing
    unit_of_measurement: "L"
    accuracy_decimals: 3
    icon: "mdi:water"
    lambda: |-
      return id(water_total_pulses) * id(water_multiplier).state;
    update_interval: 1s
  - platform: template
    name: "Gas Total Volume"
    device_class: gas
    state_class: total_increasing
    unit_of_measurement: "m³"
    accuracy_decimals: 2
    icon: "mdi:meter-gas"
    lambda: |-
      return id(gas_total_pulses) * id(gas_multiplier).state;
    update_interval: 1s
  # Raw pulses for diagnostics
  - platform: template
    name: "Water Pulses"
    id: raw_water_pulses
    state_class: total_increasing
    icon: "mdi:counter"
    entity_category: diagnostic
    accuracy_decimals: 2
    lambda: |-
      return (float)id(water_total_pulses);
  - platform: template
    name: "Gas Pulses"
    id: raw_gas_pulses
    state_class: total_increasing
    icon: "mdi:counter"
    entity_category: diagnostic
    accuracy_decimals: 1
    lambda: |-
      return (float)id(gas_total_pulses);
  - platform: template
    name: "Water Flow Rate"
    id: water_flow_rate
    unit_of_measurement: "L/min"
    device_class: volume_flow_rate
    state_class: measurement
    accuracy_decimals: 2
    update_interval: 5s # Calculate every 5 seconds
    lambda: |-
      if (id(water_pulses_last_period) == -1) {
        id(water_pulses_last_period) = id(water_total_pulses);
        return 0.0; // Return zero flow on first boot
      }
      int pulses_this_period = id(water_total_pulses) - id(water_pulses_last_period);
      id(water_pulses_last_period) = id(water_total_pulses);
      // pulses in 5 seconds → liters per minute
      // pulses * (multiplier L/pulse) * (60 seconds / 5 seconds) = L/min
      float liters_this_period = pulses_this_period * id(water_multiplier).state;
      float flow_rate = liters_this_period * 12.0; // Convert to per-minute
      return flow_rate;
  - platform: template
    name: "Gas Flow Rate"
    id: gas_flow_rate
    unit_of_measurement: "m³/h"
    device_class: volume_flow_rate
    state_class: measurement
    accuracy_decimals: 3
    update_interval: 15s # Gas flows slower, check less often
    lambda: |-
      if (id(gas_pulses_last_period) == -1) {
        id(gas_pulses_last_period) = id(gas_total_pulses);
        return 0.0; // Return zero flow on first boot
      }
      int pulses_this_period = id(gas_total_pulses) - id(gas_pulses_last_period);
      id(gas_pulses_last_period) = id(gas_total_pulses);
      // pulses in 15 seconds → m³ per hour
      // pulses * (multiplier m³/pulse) * (3600 seconds / 15 seconds) = m³/h
      float volume_this_period = pulses_this_period * id(gas_multiplier).state;
      float flow_rate = volume_this_period * 240.0; // Convert to per-hour
      return flow_rate;

number:
  - platform: template
    name: "Water Pulse Max Threshold"
    id: water_max_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-up"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: 21.0
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Water Pulse Mid Threshold"
    id: water_mid_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-vertical"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: 15.0
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Water Pulse Min Threshold"
    id: water_min_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-down"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: 12.0
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Gas Pulse Max Threshold"
    id: gas_max_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-up"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: -6.0
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Gas Pulse Min Threshold"
    id: gas_min_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-down"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: -10.0
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Water Volume Per Rotation"
    id: water_multiplier
    icon: "mdi:rotate-360"
    unit_of_measurement: "L"
    max_value: 1
    min_value: -1
    step: 0.0001
    initial_value: 0.065558
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Gas Volume Per Rotation"
    id: gas_multiplier
    icon: "mdi:rotate-360"
    unit_of_measurement: "m³"
    max_value: 1
    min_value: -1
    step: 0.0001
    initial_value: 0.00314
    restore_value: true
    entity_category: config
    optimistic: true

text_sensor:
  - platform: debug
    reset_reason:
      name: "Reboot Reason"
  - platform: template
    name: "Reboot Cause"
    id: reboot_cause_sensor
    icon: "mdi:sync-alert"
    entity_category: diagnostic
  - platform: template
    name: "Recovery Status"
    id: recovery_status
    icon: "mdi:tools"
    entity_category: diagnostic

debug:

script:
  - id: recover_i2c_bus
    parameters:
      bus_index: int # 0 = Water, 1 = Gas
    mode: single
    then:
      - lambda: |-
          if (bus_index == 0) id(water_recovery_running) = true;
          else id(gas_recovery_running) = true;
          id(recovery_attempts)++;
          ESP_LOGI("recovery", "===== Recovering %s bus, attempt %d of 3 =====", bus_index == 0 ? "Water" : "Gas", id(recovery_attempts));
          id(recovery_status).publish_state(str_sprintf("Recovering %s bus...", bus_index == 0 ? "Water" : "Gas"));
      # 1. Physical Power Down
      - lambda: 'ESP_LOGI("recovery", "[1/5] Sensor powered off");'
      - if:
          condition:
            lambda: 'return bus_index == 0;'
          then:
            - switch.turn_off: sensor_power_water
          else:
            - switch.turn_off: sensor_power_gas
      - delay: 1s
      # 2. Check Bus State & Hardware Reinit
      - lambda: |-
          ESP_LOGI("recovery", "[2/5] Checking if lines are stuck low...");
          // Check if the SDA or SCL lines are stuck LOW while power is off.
          // If they are, it means the ESP32 internal peripheral is confused.
          gpio_num_t sda = (bus_index == 0) ? GPIO_NUM_26 : GPIO_NUM_33;
          gpio_num_t scl = (bus_index == 0) ? GPIO_NUM_25 : GPIO_NUM_32;
          gpio_set_direction(sda, GPIO_MODE_INPUT);
          gpio_set_pull_mode(sda, GPIO_PULLUP_ONLY);
          gpio_set_direction(scl, GPIO_MODE_INPUT);
          gpio_set_pull_mode(scl, GPIO_PULLUP_ONLY);
          bool stuck = (gpio_get_level(sda) == 0 || gpio_get_level(scl) == 0);
          if (stuck) {
            ESP_LOGW("recovery", "Bus pins stuck low. Attempting line flush...");
            gpio_set_direction(scl, GPIO_MODE_OUTPUT);
            for (int i = 0; i < 16; i++) {
              gpio_set_level(scl, 1); // HIGH
              esp_rom_delay_us(20);
              gpio_set_level(scl, 0); // LOW
              esp_rom_delay_us(20);
              App.feed_wdt();
            }
          }
      # 3. Power Up
      - lambda: 'ESP_LOGI("recovery", "[3/5] Sensor powered on, waiting for boot...");'
      - if:
          condition:
            lambda: 'return bus_index == 0;'
          then:
            - switch.turn_on: sensor_power_water
          else:
            - switch.turn_on: sensor_power_gas
      - delay: 1500ms # Give sensor time to boot
      # 4. The "Safe" Re-Initialization
      # Instead of deleting the driver, we just clear the error status
      # and force the component to re-send its configuration.
      - lambda: |-
          auto* bus = (bus_index == 0) ? id(water_i2c_bus) : id(gas_i2c_bus);
          auto* mag = (bus_index == 0) ? id(water_sensor) : id(gas_sensor);
          ESP_LOGI("recovery", "[4/5] Re-initializing sensor software state...");
          mag->status_clear_error();
          mag->setup();
          if (bus->status_has_error()) {
            ESP_LOGE("recovery", "I2C Bus hardware peripheral is still errored");
          }
      - delay: 500ms
      # 5. Verification
      - lambda: |-
          ESP_LOGI("recovery", "[5/5] Verifying sensor responses");
          auto* mag = (bus_index == 0) ? id(water_sensor) : id(gas_sensor);
          mag->update();
      - delay: 200ms
      - lambda: |-
          auto* mag = (bus_index == 0) ? id(water_sensor) : id(gas_sensor);
          if (mag->status_has_error()) {
            ESP_LOGI("recovery", "===== Recovery Failed %d of 3 =====", id(recovery_attempts));
            if (bus_index == 0) id(water_recovery_running) = false;
            else id(gas_recovery_running) = false;
            if (id(recovery_attempts) >= 3) {
              id(reboot_cause_code) = (bus_index == 0 ? 1 : 2);
              delay(100);
              App.safe_reboot();
            }
          } else {
            ESP_LOGI("recovery", "===== Recovery Successful =====");
            if (bus_index == 0) id(water_recovery_running) = false;
            else id(gas_recovery_running) = false;
            id(recovery_attempts) = 0;
            id(water_mag_alive).publish_state(true);
            id(gas_mag_alive).publish_state(true);
            id(recovery_status).publish_state("Normal");
          }

interval:
  - interval: 15ms
    then:
      - if:
          condition:
            lambda: return !id(water_recovery_running);
          then:
            - lambda: |-
                // Note: this return only exits the lambda; component.update below still runs.
                if (id(water_sensor)->status_has_error()) {
                  return;
                }
            - component.update: water_sensor
  - interval: 50ms
    startup_delay: 25ms
    then:
      - if:
          condition:
            lambda: return !id(gas_recovery_running);
          then:
            - lambda: |-
                // Note: this return only exits the lambda; component.update below still runs.
                if (id(gas_sensor)->status_has_error()) {
                  return;
                }
            - component.update: gas_sensor
  # Frozen water bus check
  - interval: 300ms
    then:
      - lambda: |-
          if (id(sensor_power_water).state && !id(water_recovery_running)) {
            static float last_water_z = 0;
            static int water_frozen_count = 0;
            float current_water_z = id(water_mag_z).state;
            // Check if value hasn't changed for 10 checks (10 x 300 ms = 3 seconds)
            if (std::abs(current_water_z - last_water_z) < 0.001) { // Less than 0.001 µT change
              water_frozen_count++;
              if ((water_frozen_count >= 4) && (water_frozen_count < 10)) {
                ESP_LOGW("watchdog", "Water sensor appears frozen (%d/10), value stuck at %.2f",
                         water_frozen_count, current_water_z);
              }
              if (water_frozen_count >= 10) {
                id(water_mag_alive).publish_state(false);
                id(recover_i2c_bus).execute(0);
              }
            } else {
              if (water_frozen_count >= 4) {
                ESP_LOGD("watchdog", "Water sensor alive again, value changed to %.2f", current_water_z);
                id(recovery_attempts) = 0;
              }
              id(water_mag_alive).publish_state(true);
              water_frozen_count = 0;
            }
            last_water_z = current_water_z;
          }
  # Frozen gas bus check
  - interval: 500ms
    then:
      - lambda: |-
          if (id(sensor_power_gas).state && !id(gas_recovery_running)) {
            static float last_gas_x = 0;
            static int gas_frozen_count = 0;
            float current_gas_x = id(gas_mag_x).state;
            // Check if value hasn't changed for 8 checks (8 x 500 ms = 4 seconds)
            if (std::abs(current_gas_x - last_gas_x) < 0.001) {
              gas_frozen_count++;
              if ((gas_frozen_count >= 5) && (gas_frozen_count < 8)) {
                ESP_LOGW("watchdog", "Gas sensor appears frozen (%d/8), value stuck at %.3fµT",
                         gas_frozen_count, current_gas_x);
              }
              if (gas_frozen_count >= 8) {
                id(gas_mag_alive).publish_state(false);
                id(recover_i2c_bus).execute(1);
              }
            } else {
              if (gas_frozen_count >= 5) {
                ESP_LOGD("watchdog", "Gas sensor alive again, value changed to %.3fµT", current_gas_x);
                id(recovery_attempts) = 0;
              }
              id(gas_mag_alive).publish_state(true);
              gas_frozen_count = 0;
            }
            last_gas_x = current_gas_x;
          }

binary_sensor:
  - platform: template
    name: "Water Magnetometer Alive"
    id: water_mag_alive
    entity_category: diagnostic
  - platform: template
    name: "Gas Magnetometer Alive"
    id: gas_mag_alive
    entity_category: diagnostic
  - platform: template
    name: "Water Pulse State"
    id: water_pulse_state
    device_class: moving
    internal: true
    on_press:
      then:
        - lambda: 'id(water_total_pulses) += 0.25;'
    on_release:
      then:
        - lambda: 'id(water_total_pulses) += 0.25;'
  - platform: template
    name: "Gas Pulse State"
    id: gas_pulse_state
    device_class: moving
    internal: true
    on_press:
      then:
        - lambda: 'id(gas_total_pulses) += 0.5;'
    on_release:
      then:
        - lambda: 'id(gas_total_pulses) += 0.5;'

# Enable logging
logger:
  level: DEBUG
  logs:
    qmc5883p: INFO
    i2c.idf: WARN
    i2c: WARN
    component: INFO
    sensor: INFO
    watchdog: DEBUG
    binary_sensor: INFO
    esp32.preferences: ERROR

web_server:
  port: 80

# Enable Home Assistant API
api:
  encryption:
    key: "xxx"

ota:
  - platform: esphome
    password: "xxx"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password
  use_address: utility-monitor.local
  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Utility-Monitor Fallback Hotspot"
    password: "xxx"

captive_portal:

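One thing I tried to sanity-check is how much of the 15 ms water interval the bus itself eats at 10 kHz. The sketch below is only an estimate: I am assuming each update is a register-pointer write followed by a 6-byte read (X, Y, Z at 2 bytes each), which may not match what the qmc5883p component actually does, and the function names are made up for illustration. Every byte on I2C costs 9 clock periods (8 bits plus ACK); start/stop overhead is ignored.

```cpp
// Bytes on the wire for one "write register pointer, then read N bytes" transaction:
// address+W, register, address+R, then N data bytes. (ASSUMED transaction shape.)
constexpr int transaction_bytes(int data_bytes) {
  return 3 + data_bytes;
}

// Time one transaction occupies the bus, in milliseconds (9 clocks per byte).
constexpr double bus_time_ms(int data_bytes, double clock_hz) {
  return transaction_bytes(data_bytes) * 9.0 / clock_hz * 1000.0;
}

// Fraction of each polling interval spent on the bus.
constexpr double bus_duty(int data_bytes, double clock_hz, double interval_ms) {
  return bus_time_ms(data_bytes, clock_hz) / interval_ms;
}

// Example: bus_duty(6, 10000.0, 15.0) for the water sensor as configured above.
```

With those assumptions a single read takes about 8.1 ms at 10 kHz, which is roughly 54% of every 15 ms water interval, plus about 16% of every 50 ms gas interval. At 100 kHz the same read would be about 0.81 ms. If the I2C transfer blocks the main loop for that whole time (which I have not confirmed), that might have something to do with the watchdog trips.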
What else can I even try here? Different magnetometers? I considered swapping to MMC5603s. Should I use shielded Cat5 instead, or would the capacitance on that be too much?