Magnetometer (QMC5883P) continually locks up / crashes ESP32

I’ve been trying for about a month now to set up some magnetometers to monitor my gas and water meters. Everything works, but roughly 20 times a day one of three things happens: the sensors freeze up completely (forcing me to reboot), the device hits an exception/panic, or the interrupt watchdog triggers a reboot. There are also frequent NACKs / I2C timeouts in the logs, but those are recoverable. I’m at my wits’ end trying to solve these, and I can’t figure out what’s wrong with my setup when others who are pushing the limits further aren’t experiencing these issues at all.

I’ve got 2 QMC5883P magnetometers hooked up to my ESP32. Unfortunately I don’t have power anywhere near the gas or water meters (they’re at opposite ends of the house), so I’ve centralized the ESP32 and am running 25 ft and 35 ft of Cat5 to the meters. It’s wired with a ground on each twisted pair (V/GND, SDA/GND, SCL/GND). Neither cable runs alongside AC mains wiring, but one does run alongside some unused old phone wire and its grounding cable. Each sensor is on its own bus. Because of the length, I’ve also added an LTC4311 Extender / Active Terminator to each bus. I’m using 4.7 kΩ pull-ups at the ESP.

I’ve tried everything I can think of: tweaking the pull-ups, the timeouts, the frequency, and the update intervals. No matter what I do, I can’t seem to prevent these lockups, panics, or watchdog resets. The annoying thing is that each reboot takes ~40 seconds, so I miss a good number of pulses from anything running during that time, especially with how often it’s rebooting.

The only other thing I could think to try is I2C-to-differential signal converters, but here in Canada those cost $23 apiece and I’d need 4, and I’ve already spent too much money on what was supposed to be a silly little project that most people seem to have zero trouble with. I’ve even seen others running it at 50+ feet without issue, so I feel particularly insane trying to debug this.

Here’s the code I’m using. The long script in there is just the recovery script to reboot the sensors when they freeze up, but that doesn’t stop the panics / watchdog timeouts. I know the 15 ms update period for the water sensor is aggressive, but I’ve tested it by graphing the raw output, and any slower and it misses pulses when the tap is fully open. My water meter seems to spin exceptionally fast.

esphome:
  name: utility-monitor
  friendly_name: Utility Monitor
  on_boot:
    priority: -100
    then:
      - delay: 100ms  # Let sensor power stabilize
      - lambda: |-
          const char* causes[] = {
            "Normal Boot",
            "Water I2C Frozen", 
            "Gas I2C Frozen",
            "API Disconnect",
            "Recovery Failure"
          };
          
          int code = id(reboot_cause_code);
          // causes[] has 5 entries, so valid indices are 0-4
          if (code >= 0 && code < 5) {
            id(reboot_cause_sensor).publish_state(causes[code]);
          }
          id(recovery_status).publish_state("No issue");
          
          // Reset
          id(reboot_cause_code) = 0;

external_components:
  - source: github://mazzhead/esphome-components@main
    components: [qmc5883p]
  - source: github://dentra/esphome-components
    components: [ coredump, partitions ]

esp32:
  board: esp32dev
  framework:
    type: esp-idf
    sdkconfig_options:
      CONFIG_ESP_COREDUMP_ENABLE_TO_FLASH: y
      CONFIG_ESP_COREDUMP_DATA_FORMAT_ELF: y
      CONFIG_ESP_COREDUMP_CHECKSUM_CRC32: y
      CONFIG_ESP_INT_WDT_TIMEOUT_MS: "1000"  # Increase from default 300ms
      CONFIG_ESP_TASK_WDT_TIMEOUT_S: "10"     # Increase task WDT too
  partitions: custom_partitions.csv

coredump:

i2c:
  - id: water_i2c_bus
    sda: GPIO26
    scl: GPIO25
    frequency: 10kHz
    timeout: 1ms
  - id: gas_i2c_bus
    sda: GPIO33
    scl: GPIO32
    frequency: 10kHz
    timeout: 1ms

switch:
  - platform: gpio
    pin: GPIO18 
    id: sensor_power_water
    restore_mode: ALWAYS_ON
    internal: true
  - platform: gpio
    pin: GPIO19 
    id: sensor_power_gas
    restore_mode: ALWAYS_ON
    internal: true

  - platform: template
    name: "Display Raw Gas Field Data"
    id: gas_data_switch
    optimistic: true
    entity_category: diagnostic
    icon: "mdi:eye"
    restore_mode: "RESTORE_DEFAULT_OFF"
    on_turn_off:
      then:
        - lambda: 'id(raw_gas_field).publish_state(NAN);'
  - platform: template
    name: "Display Raw Water Field Data"
    id: water_data_switch
    optimistic: true
    entity_category: diagnostic
    icon: "mdi:eye"
    restore_mode: "RESTORE_DEFAULT_OFF"
    on_turn_off:
      then:
        - lambda: 'id(raw_water_field).publish_state(NAN);'

globals:
  - id: water_total_pulses
    type: float
    restore_value: yes
    initial_value: '0.0'
  - id: gas_total_pulses
    type: float
    restore_value: yes
    initial_value: '0.0'

  # Water meter wave alternates peak heights
  - id: expecting_half_peak
    type: bool
    restore_value: yes
    initial_value: 'false'

  - id: reboot_cause_code
    type: int
    restore_value: yes
    initial_value: '0'

  # Flow tracking globals
  - id: water_pulses_last_period
    type: int
    restore_value: no
    initial_value: '-1'
  - id: gas_pulses_last_period
    type: int
    restore_value: no
    initial_value: '-1'

  - id: water_recovery_running
    type: bool
    initial_value: 'false'
  - id: gas_recovery_running
    type: bool
    initial_value: 'false'
  - id: recovery_attempts
    type: int
    initial_value: '0'

sensor:
  - platform: qmc5883p
    id: water_sensor
    i2c_id: water_i2c_bus
    field_strength_z:
      name: "Water Field Strength Z"
      id: "water_mag_z"
      internal: true
      on_value:
        then:
          - lambda: |-

              // Push to the visible sensor ONLY if the switch is ON
              if (id(water_data_switch).state) {
                id(raw_water_field).publish_state(x);
              }

              // Hysteresis: a full peak always counts; a half peak counts only when one is expected.

              if (!id(water_pulse_state).state) {
                if (x >= id(water_max_threshold).state) {
                  // Full peak detected, expect half peak next
                  id(water_pulse_state).publish_state(true);
                  id(expecting_half_peak) = true;
                } 
                else if (id(expecting_half_peak) && x >= id(water_mid_threshold).state) {
                  // Half peak detected, expect full peak next
                  id(water_pulse_state).publish_state(true);
                  id(expecting_half_peak) = false; 
                }
              } else {
                if (x <= id(water_min_threshold).state) {
                  id(water_pulse_state).publish_state(false);
                }
              }
    update_interval: never

  - platform: qmc5883p
    id: gas_sensor
    i2c_id: gas_i2c_bus
    field_strength_x:
      name: "Gas Field Strength X"
      id: "gas_mag_x"
      internal: true
      on_value:
        then:
          - lambda: |-
              // Push to the visible sensor ONLY if the switch is ON
              if (id(gas_data_switch).state) {
                id(raw_gas_field).publish_state(x);
              }
              if (x >= id(gas_max_threshold).state) {
                  id(gas_pulse_state).publish_state(true);
              } 
              else if (x <= id(gas_min_threshold).state) {
                  id(gas_pulse_state).publish_state(false);
              }
    update_interval: never

  # Raw sensor data for diagnostics
  - platform: template
    name: "Water Z Field"
    id: raw_water_field
    unit_of_measurement: "µT"
    accuracy_decimals: 3
    update_interval: never
    entity_category: diagnostic
    icon: "mdi:sine-wave"
  - platform: template
    name: "Gas X Field"
    id: raw_gas_field
    unit_of_measurement: "µT"
    accuracy_decimals: 3
    update_interval: never
    entity_category: diagnostic
    icon: "mdi:sine-wave"

  - platform: template
    name: "Water Total Volume"
    device_class: water
    state_class: total_increasing
    unit_of_measurement: "L"
    accuracy_decimals: 3
    icon: "mdi:water"
    lambda: |-
      return id(water_total_pulses) * id(water_multiplier).state;
    update_interval: 1s
  
  - platform: template
    name: "Gas Total Volume"
    device_class: gas
    state_class: total_increasing
    unit_of_measurement: "m³"
    accuracy_decimals: 2
    icon: "mdi:meter-gas"
    lambda: |-
      return id(gas_total_pulses) * id(gas_multiplier).state; 
    update_interval: 1s

  # Raw pulses for diagnostics
  - platform: template
    name: "Water Pulses"
    id: raw_water_pulses
    state_class: total_increasing
    icon: "mdi:counter"
    entity_category: diagnostic
    accuracy_decimals: 2
    lambda: |-
        return (float)id(water_total_pulses);
  - platform: template
    name: "Gas Pulses"
    id: raw_gas_pulses
    state_class: total_increasing
    icon: "mdi:counter"
    entity_category: diagnostic
    accuracy_decimals: 1
    lambda: |-
        return (float)id(gas_total_pulses);

  - platform: template
    name: "Water Flow Rate"
    id: water_flow_rate
    unit_of_measurement: "L/min"
    device_class: volume_flow_rate
    state_class: measurement
    accuracy_decimals: 2
    update_interval: 5s  # Calculate every 5 seconds
    lambda: |-
      if (id(water_pulses_last_period) == -1) {
        id(water_pulses_last_period) = id(water_total_pulses);
        return 0.0;  // Return zero flow on first boot
      }

      int pulses_this_period = id(water_total_pulses) - id(water_pulses_last_period);
      id(water_pulses_last_period) = id(water_total_pulses);
      
      // pulses in 5 seconds → liters per minute
      // pulses * (multipler L/pulse) * (60 seconds / 5 seconds) = L/min
      float liters_this_period = pulses_this_period * id(water_multiplier).state;
      float flow_rate = liters_this_period * 12.0;  // Convert to per-minute
      
      return flow_rate;

  - platform: template
    name: "Gas Flow Rate"
    id: gas_flow_rate
    unit_of_measurement: "m³/h"
    device_class: volume_flow_rate
    state_class: measurement
    accuracy_decimals: 3
    update_interval: 15s  # Gas flows slower, check less often
    lambda: |-
      if (id(gas_pulses_last_period) == -1) {
        id(gas_pulses_last_period) = id(gas_total_pulses);
        return 0.0;  // Return zero flow on first boot
      }

      int pulses_this_period = id(gas_total_pulses) - id(gas_pulses_last_period);
      id(gas_pulses_last_period) = id(gas_total_pulses);
      
      // pulses in 15 seconds → m³ per hour
      // pulses * (multiplier m³/pulse) * (3600 seconds / 15 seconds) = m³/h
      float volume_this_period = pulses_this_period * id(gas_multiplier).state;
      float flow_rate = volume_this_period * 240.0;  // Convert to per-hour
      
      return flow_rate;

number:
  - platform: template
    name: "Water Pulse Max Threshold"
    id: water_max_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-up"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: 21.0
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Water Pulse Mid Threshold"
    id: water_mid_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-vertical"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: 15.0
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Water Pulse Min Threshold"
    id: water_min_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-down"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: 12.0
    restore_value: true
    entity_category: config
    optimistic: true

  - platform: template
    name: "Gas Pulse Max Threshold"
    id: gas_max_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-up"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: -6.0
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Gas Pulse Min Threshold"
    id: gas_min_threshold
    unit_of_measurement: "µT"
    icon: "mdi:arrow-collapse-down"
    max_value: 100
    min_value: -100
    step: 0.1
    initial_value: -10.0
    restore_value: true
    entity_category: config
    optimistic: true
  
  - platform: template
    name: "Water Volume Per Rotation"
    id: water_multiplier
    icon: "mdi:rotate-360"
    unit_of_measurement: "L"
    max_value: 1
    min_value: -1
    step: 0.0001
    initial_value: 0.065558
    restore_value: true
    entity_category: config
    optimistic: true
  - platform: template
    name: "Gas Volume Per Rotation"
    id: gas_multiplier
    icon: "mdi:rotate-360"
    unit_of_measurement: "m³"
    max_value: 1
    min_value: -1
    step: 0.0001
    initial_value: 0.00314
    restore_value: true
    entity_category: config
    optimistic: true
    

text_sensor:
  - platform: debug
    reset_reason:
      name: "Reboot Reason"
  - platform: template
    name: "Reboot Cause"
    id: reboot_cause_sensor
    icon: "mdi:sync-alert"
    entity_category: diagnostic
  - platform: template
    name: "Recovery Status"
    id: recovery_status
    icon: "mdi:tools"
    entity_category: diagnostic

debug:

script:
  - id: recover_i2c_bus
    parameters:
      bus_index: int  # 0 = Water, 1 = Gas
    mode: single
    then:
        - lambda: |-
            if (bus_index == 0) id(water_recovery_running) = true;
            else id(gas_recovery_running) = true;
            id(recovery_attempts)++;
            ESP_LOGI("recovery", "===== Recovering %s bus, attempt %d of 3 =====", bus_index == 0 ? "Water" : "Gas", id(recovery_attempts));
            id(recovery_status).publish_state(str_sprintf("Recovering %s bus...", bus_index == 0 ? "Water" : "Gas"));
            
        # 1. Physical Power Down
        - lambda: 'ESP_LOGI("recovery", "[1/5] Sensor powered off");'
        - if:
            condition:
              lambda: 'return bus_index == 0;'
            then:
              - switch.turn_off: sensor_power_water
            else:
              - switch.turn_off: sensor_power_gas
      
        - delay: 1s

        # 2. Check Bus State & Hardware Reinit
        - lambda: |-
            ESP_LOGI("recovery", "[2/5] Checking if lines are stuck low...");

            // Check if the SDA or SCL lines are stuck LOW while power is off.
            // If they are, it means the ESP32 internal peripheral is confused.
            gpio_num_t sda = (bus_index == 0) ? GPIO_NUM_26 : GPIO_NUM_33;
            gpio_num_t scl = (bus_index == 0) ? GPIO_NUM_25 : GPIO_NUM_32;

            gpio_set_direction(sda, GPIO_MODE_INPUT);
            gpio_set_pull_mode(sda, GPIO_PULLUP_ONLY);
            gpio_set_direction(scl, GPIO_MODE_INPUT);
            gpio_set_pull_mode(scl, GPIO_PULLUP_ONLY);

            bool stuck = (gpio_get_level(sda) == 0 || gpio_get_level(scl) == 0);

            if (stuck) {
              ESP_LOGW("recovery", "Bus pins stuck low. Attempting line flush...");
              gpio_set_direction(scl, GPIO_MODE_OUTPUT);
              for(int i=0; i<16; i++) {
                gpio_set_level(scl, 1); // HIGH
                esp_rom_delay_us(20);   
                gpio_set_level(scl, 0); // LOW
                esp_rom_delay_us(20);

                App.feed_wdt();
             }
            }

        # 3. Power Up
        - lambda: 'ESP_LOGI("recovery", "[3/5] Sensor powered on, waiting for boot...");'
        - if:
            condition:
              lambda: 'return bus_index == 0;'
            then:
              - switch.turn_on: sensor_power_water
            else:
              - switch.turn_on: sensor_power_gas
            
        - delay: 1500ms # Give sensor time to boot

        # 4. The "Safe" Re-Initialization
        # Instead of deleting the driver, we just clear the error status
        # and force the component to re-send its configuration.
        - lambda: |-
            auto* bus = (bus_index == 0) ? id(water_i2c_bus) : id(gas_i2c_bus);
            auto* mag = (bus_index == 0) ? id(water_sensor) : id(gas_sensor);

            ESP_LOGI("recovery", "[4/5] Re-initializing sensor software state...");
            mag->status_clear_error();
            mag->setup();

            if (bus->status_has_error()) {
                ESP_LOGE("recovery", "I2C Bus hardware peripheral is still errored");
            }

        - delay: 500ms

        # 5. Verification
        - lambda: |-
            ESP_LOGI("recovery", "[5/5] Verifying sensor responses");
            auto* mag = (bus_index == 0) ? id(water_sensor) : id(gas_sensor);
            mag->update();
      
        - delay: 200ms

        - lambda: |-
            auto* mag = (bus_index == 0) ? id(water_sensor) : id(gas_sensor);
            if (mag->status_has_error()) {
                ESP_LOGE("recovery", "===== Recovery Failed %d of 3 =====", id(recovery_attempts));
                if (bus_index == 0) id(water_recovery_running) = false;
                else id(gas_recovery_running) = false;

                if (id(recovery_attempts) >= 3) {
                  id(reboot_cause_code) = (bus_index == 0 ? 1 : 2);
                  delay(100);
                  App.safe_reboot();
                }
            } else {
                ESP_LOGI("recovery", "===== Recovery Successful =====");
                if (bus_index == 0) id(water_recovery_running) = false;
                else id(gas_recovery_running) = false;
                id(recovery_attempts) = 0;
                // Only mark the bus that was actually recovered as alive
                if (bus_index == 0) id(water_mag_alive).publish_state(true);
                else id(gas_mag_alive).publish_state(true);
                id(recovery_status).publish_state("Normal");
            }

interval:
  - interval: 15ms
    then:
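      - if:
          condition:
            # Skip the read during recovery or while the sensor is flagged as errored.
            # (A bare `return;` in a separate lambda does not stop the next action.)
            lambda: 'return !id(water_recovery_running) && !id(water_sensor)->status_has_error();'
          then:
            - component.update: water_sensor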
      
  - interval: 50ms
    startup_delay: 25ms 
    then:
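      - if:
          condition:
            # Skip the read during recovery or while the sensor is flagged as errored.
            # (A bare `return;` in a separate lambda does not stop the next action.)
            lambda: 'return !id(gas_recovery_running) && !id(gas_sensor)->status_has_error();'
          then:
            - component.update: gas_sensor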

  # Frozen water bus check
  - interval: 300ms
    then:
      - lambda: |-
          if (id(sensor_power_water).state && !id(water_recovery_running)) {
              static float last_water_z = 0;
              static int water_frozen_count = 0;
              
              float current_water_z = id(water_mag_z).state;
              
              // Frozen if the value is unchanged for 10 consecutive checks (~3 s at 300 ms)
              if (fabsf(current_water_z - last_water_z) < 0.001f) {  // Less than 0.001 µT change
                water_frozen_count++;
                if ((water_frozen_count >= 4) && (water_frozen_count < 10)) { 
                  ESP_LOGW("watchdog", "Water sensor appears frozen (%d/10), value stuck at %.2f", 
                        water_frozen_count, current_water_z);
                }
                
                if (water_frozen_count >= 10) { 
                  id(water_mag_alive).publish_state(false);
                  id(recover_i2c_bus).execute(0);
                }
              } else {
                if (water_frozen_count >= 4) {
                  ESP_LOGD("watchdog", "Water sensor alive again, value changed to %.2f", current_water_z);
                  id(recovery_attempts) = 0;
                }
                id(water_mag_alive).publish_state(true);
                water_frozen_count = 0;
              }
              
              last_water_z = current_water_z;
          }

  # Frozen gas bus check
  - interval: 500ms
    then:
      - lambda: |-
          if (id(sensor_power_gas).state && !id(gas_recovery_running)) {
              static float last_gas_x = 0;
              static int gas_frozen_count = 0;

              float current_gas_x = id(gas_mag_x).state;

              if (fabsf(current_gas_x - last_gas_x) < 0.001f) {
                gas_frozen_count++;
                if ((gas_frozen_count >= 5) && (gas_frozen_count < 8)) {
                  ESP_LOGW("watchdog", "Gas sensor appears frozen (%d/8), value stuck at %.3fµT",
                        gas_frozen_count, current_gas_x);
                }
                
                if (gas_frozen_count >= 8) {
                  id(gas_mag_alive).publish_state(false);
                  id(recover_i2c_bus).execute(1);
                }
              } else {
                if (gas_frozen_count >= 5) {
                  ESP_LOGD("watchdog", "Gas sensor alive again, value changed to %.3fµT", current_gas_x);
                  id(recovery_attempts) = 0;
                }
                id(gas_mag_alive).publish_state(true);
                gas_frozen_count = 0;
              }
              
              last_gas_x = current_gas_x;
          }

binary_sensor:
  - platform: template
    name: "Water Magnetometer Alive"
    id: water_mag_alive
    entity_category: diagnostic
  - platform: template
    name: "Gas Magnetometer Alive"
    id: gas_mag_alive
    entity_category: diagnostic

  - platform: template
    name: "Water Pulse State"
    id: water_pulse_state
    device_class: moving
    internal: true
    on_press:
      then:
        - lambda: 'id(water_total_pulses) += 0.25;'
    on_release:
      then:
        - lambda: 'id(water_total_pulses) += 0.25;'
  - platform: template
    name: "Gas Pulse State"
    id: gas_pulse_state
    device_class: moving
    internal: true
    on_press:
      then:
        - lambda: 'id(gas_total_pulses) += 0.5;'
    on_release:
      then:
        - lambda: 'id(gas_total_pulses) += 0.5;'


# Enable logging
logger:
  level: DEBUG
  logs:
    qmc5883p: INFO
    i2c.idf: WARN
    i2c: WARN
    component: INFO
    sensor: INFO
    watchdog: DEBUG
    binary_sensor: INFO
    esp32.preferences: ERROR

web_server:
  port: 80

# Enable Home Assistant API
api:
  encryption:
    key: "xxx"

ota:
  - platform: esphome
    password: "xxx"

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

  use_address: utility-monitor.local

  # Enable fallback hotspot (captive portal) in case wifi connection fails
  ap:
    ssid: "Utility-Monitor Fallback Hotspot"
    password: "xxx"

captive_portal:
    

What else can I even try here? Different magnetometers? I considered swapping to MMC5603s. Should I use shielded Cat5 instead, or would its capacitance be too much?

Wouldn’t it be better from a signal perspective to put an ESP at each location and use the Cat5 to power them? I’d probably choose a 12 V or 24 V PSU and use buck converters at the far end. I don’t think I2C is meant for such lengths.
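To put rough numbers on the power-over-Cat5 idea, here is a minimal C++ sketch of the cable voltage drop. Everything numeric in it is an assumption rather than something from this thread: 24 AWG copper at about 0.0842 Ω/m per conductor, an ESP32 peak draw of roughly 0.5 A at 3.3 V (1.65 W), and an 85% efficient buck converter. The function names are made up for illustration.

```cpp
#include <cassert>

// Assumed typical DC resistance of one 24 AWG copper conductor, in ohms per metre.
constexpr double kOhmsPerMetre24Awg = 0.0842;
constexpr double kMetresPerFoot = 0.3048;

// Round-trip (supply plus return) resistance of a cable run given in feet.
double loop_resistance_ohms(double feet) {
  assert(feet >= 0);
  return 2.0 * feet * kMetresPerFoot * kOhmsPerMetre24Awg;
}

// Approximate voltage lost in the cable when a buck converter at the far end
// delivers `load_watts`. Cable current is estimated as input power divided by
// the supply voltage, ignoring the small extra current caused by the drop itself.
double cable_drop_volts(double feet, double supply_volts,
                        double load_watts, double efficiency) {
  assert(supply_volts > 0 && efficiency > 0 && efficiency <= 1);
  const double amps = load_watts / efficiency / supply_volts;
  return amps * loop_resistance_ohms(feet);
}
```

With those assumptions, `cable_drop_volts(35, 12.0, 1.65, 0.85)` comes out near 0.29 V, while the same load fed from 5 V would lose about 0.70 V. That difference is the usual argument for sending a higher voltage down the cable and stepping it down at the meter.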


I’ve done the same thing for my water and gas. The run for my gas meter is similar in length to yours and I have no issues.

However, the run for my water meter is significantly longer and it has periodically given me issues (including right now). In my case, I believe the water meter is extra problematic partly because of the high update rate required (the meter spins VERY fast). Originally I tried putting the ESP close to it, but I believe the WiFi signal there was sub-optimal and this caused other issues.

Have you tried using two ESPs instead of one (even if they’re right next to each other)?

You must have something very wrong; even Windows 11 reboots quicker…
What do you get in the logs?

I don’t think a 15 ms interval is viable on ESPHome.
And a 1 ms timeout on 10 kHz I2C is too low.
And I’m not sure calling setup() at runtime is a good idea either.
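For a sense of scale on those two numbers, here is a back-of-the-envelope C++ sketch of how long the bus is busy per read. It assumes each update is a single register read of 6 data bytes (X, Y and Z as 16-bit values), which has not been checked against the qmc5883p component, and it ignores START/STOP overhead and clock stretching. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Clock periods per I2C byte: 8 data bits plus 1 ACK/NACK bit.
constexpr std::uint32_t kClocksPerByte = 9;

// Microseconds needed to clock `bytes_on_wire` bytes at `bus_hz`,
// ignoring START/STOP conditions and any clock stretching.
double i2c_transfer_us(std::uint32_t bytes_on_wire, std::uint32_t bus_hz) {
  assert(bus_hz > 0);
  return 1e6 * bytes_on_wire * kClocksPerByte / bus_hz;
}

// Bytes on the wire for one register read:
// address+W, register number, address+R, then the data bytes.
std::uint32_t register_read_bytes(std::uint32_t data_bytes) {
  return 3 + data_bytes;
}
```

`i2c_transfer_us(1, 10000)` is 900 µs, so a single byte nearly fills a 1 ms window at 10 kHz, and `i2c_transfer_us(register_read_bytes(6), 10000)` is 8100 µs, a bit over half of every 15 ms interval spent on the water sensor alone. Exactly what the `timeout` option measures depends on the platform, so treat this only as an indication of how tight 1 ms is.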

Overall I wonder if you have the right sensor for this approach; a digital-output Hall sensor might be much better.
And as already mentioned, put the ESP next to the sensor. It doesn’t make much sense to stretch I2C wiring to save one ESP board.

I am using this project: tronikos/esphome-magnetometer-water-gas-meter on GitHub (it uses an ESP8266 or ESP32 with a QMC5883L, QMC5883P, HMC5883L or MMC5603 triple-axis magnetometer to read a water or gas meter),

and it mostly works. But you are asking for trouble using I2C at those distances. Use your Cat5 cable to bring DC power to two independent ESP32 devices at your meter locations.

I2C was designed for very short distances (between integrated circuits, so think a few centimetres, not many metres). It can work at long distances, but it also frequently doesn’t.
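One way to see why cable length matters is the passive rise time of the bus. The sketch below uses the rise-time formula from the I2C-bus specification (t_r = 0.8473 × Rp × Cb, measured from 30% to 70%) and compares against that specification's standard-mode limits of 400 pF and 1000 ns. The 52 pF/m figure for a Cat5 pair is an assumed typical value, not a measurement of this cable, and the function names are hypothetical.

```cpp
#include <cassert>

// Passive 30%-70% rise time of a pulled-up bus line, in nanoseconds,
// using t_r = 0.8473 * Rp * Cb from the I2C-bus specification.
double rise_time_ns(double pullup_ohms, double bus_pf) {
  assert(pullup_ohms > 0 && bus_pf >= 0);
  return 0.8473 * pullup_ohms * bus_pf * 1e-3;  // ohm * pF = 1e-3 ns
}

// Capacitance of a cable run given in feet, for an assumed pF-per-metre figure.
double cable_pf(double feet, double pf_per_metre) {
  assert(feet >= 0 && pf_per_metre >= 0);
  return feet * 0.3048 * pf_per_metre;
}
```

Under that assumption, `cable_pf(35, 52)` is roughly 555 pF, already past the 400 pF figure before counting the sensor and the ESP32 pins, and `rise_time_ns(4700, cable_pf(35, 52))` is about 2200 ns against a 1000 ns limit. The 25 ft run works out to roughly 396 pF and 1580 ns. An active terminator such as the LTC4311 speeds up rising edges, so the real edges should be faster than this passive estimate, and the estimate says nothing about noise picked up along the run.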