Standalone Wifi Caller ID Decoder

Standalone Caller ID Decoder for Home Assistant

Introduction

I still have a landline around (well, an Ooma VoIP box really) with a phone number I’ve had for years. I wanted to pop up a notification when someone calls, but sadly Ooma doesn’t offer a usable API. My Pi-based HA instance isn’t near the Ooma either, so I figured a standalone WiFi-enabled caller-ID box based on an ESP would be the most flexible option.

Part cost: low (<$10, plus ~$40 shipping)
Difficulty: high (soldering, some electronics, parts acquisition, custom ESPHome component)

Caller ID Basics

Caller ID technology has been around since the 1980s, transmitting caller information between the first and second ring of an incoming call. The data is encoded using FSK (Frequency Shift Keying) modulation at 1200 baud, where:

  • Mark (binary 1) = 1200 Hz
  • Space (binary 0) = 2200 Hz
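To make the encoding concrete, here is a small Python sketch (not part of the project, just an illustration) of how one byte would go out on the line. It assumes standard 8N1 serial framing, which matches the UART settings used later in this post: a start bit (0), eight data bits sent least-significant bit first, and a stop bit (1). The function names are mine.

```python
MARK_HZ = 1200   # binary 1
SPACE_HZ = 2200  # binary 0
BAUD = 1200      # bits per second, so one bit lasts 1/1200 s (about 0.83 ms)


def frame_bits(byte):
    """Return the 10 line bits for one byte: start (0), 8 data bits LSB first, stop (1)."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def tones(byte):
    """Map each framed bit to the tone frequency that carries it."""
    return [MARK_HZ if bit else SPACE_HZ for bit in frame_bits(byte)]


# 0x55 frames into a perfectly alternating bit pattern.
sync_bits = frame_bits(0x55)
sync_tones = tones(0x55)
```

With this framing, a 0x55 byte comes out as strictly alternating bits, which is why a run of 0x55 bytes works as a synchronization pattern.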

The protocol follows the MDMF (Multiple Data Message Format) standard, which packages caller information in structured data blocks containing:

  • Date and time of the call
  • Calling number (the phone number)
  • Calling name (if available from the carrier)
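The layout is easier to see with a sample packet. The Python sketch below builds one for testing on a PC. The header byte and parameter type codes are the ones used by the decoder later in this post; the checksum rule (all bytes, including the checksum, sum to zero modulo 256) is how I understand the standard, and the phone number, name, and function names are made up for illustration.

```python
MDMF_HEADER = 0x80
PARAM_TIME, PARAM_NUMBER, PARAM_NAME = 0x01, 0x02, 0x07


def build_mdmf(params):
    """Build an MDMF packet from a list of (type, ascii_text) tuples."""
    # Each parameter is: type byte, length byte, then the ASCII text.
    body = b"".join(
        bytes([ptype, len(text)]) + text.encode("ascii") for ptype, text in params
    )
    packet = bytes([MDMF_HEADER, len(body)]) + body
    checksum = (-sum(packet)) & 0xFF  # two's complement of the byte sum
    return packet + bytes([checksum])


def checksum_ok(packet):
    """True when header, length, body and checksum sum to 0 modulo 256."""
    return sum(packet) & 0xFF == 0


example = build_mdmf([
    (PARAM_TIME, "03151430"),      # March 15, 14:30
    (PARAM_NUMBER, "5551234567"),  # made-up number
    (PARAM_NAME, "JANE DOE"),      # made-up name
])
```

Here `example` is 35 bytes long: 2 bytes of header and length, 32 bytes of parameters, and 1 checksum byte.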

Hardware: HT9032D-based Decoder

This project is based entirely on the excellent work from the Arduino Telephone Caller ID Unit Instructable for the CID module. The original design used an Arduino with the HT9032D caller ID decoder chip, which handles all the complex FSK demodulation and gives us clean digital data.

Using the Gerber files there, I had a set of 5 custom PCBs made in China for $2, plus $27 shipping. The HT9032D has been discontinued by the manufacturer, but you can still find them; I got two for $10 on Amazon. The rest of the circuit board parts I got from Digikey and Arrow for pennies. The most annoying part was finding a phone jack with the right form factor to fit the board: the one that finally fit, on my third attempt, was part number 0520186646 from Arrow.

Adapting for ESP-01

While the original design used an Arduino to drive the LCD display, I didn’t want the display and just needed Home Assistant integration, which made ESPHome the perfect choice. However, the ESP-01 module presents some unique constraints:

Limited GPIO pins: The ESP-01 only has 4 usable GPIO pins, and we need three of them:

  • GPIO3 (RX) — UART receive from HT9032D data output
  • GPIO0 — Power-down control for the HT9032D (PWDN pin)
  • GPIO2 — Ring detection input

GPIO1 is TX, which is apparently output-only, and I needed two inputs, so I used RX and GPIO2. (Maybe I could have used TX for power-down and GPIO0 for ring detection instead of GPIO2.) GPIO2 normally drives the blue LED on the ESP-01, so I had to give that up.

Boot requirements: GPIO0 and GPIO2 must both be HIGH during boot. That suits GPIO0, but unfortunately not the ring signal on GPIO2:

  • GPIO0 HIGH = HT9032D powered down (safe startup state)
  • GPIO2 HIGH = Ring detection with pull-up resistor

The ring signal is normally LOW and goes HIGH only while the phone is ringing. I needed GPIO2 to be normally HIGH and pulled LOW only when ringing, so that the ESP can boot, so I inverted the ring detection signal using an NPN transistor (2N3904).

 2N3904 connections:
   RING_SENSE ──[10K]── base
   GPIO2 ────────────── collector
   GND ──────────────── emitter

When the phone isn’t ringing, the transistor is OFF and GPIO2 reads HIGH (pulled up via the ESP’s internal resistor). When a call comes in, the ring signal turns the transistor ON, pulling GPIO2 LOW, which we detect as an inverted signal in software.

Final Hardware

So altogether we now have:

  1. HT9032D Decoder Board
  2. ESP-01
  3. NPN transistor
  4. 3.3 V regulator for the ESP, powered from a small USB brick

I 3D-printed a small mounting board to hold these pieces together, wire-wrapped the connections, and then printed a small box to hold it all.

Software: Custom ESPHome Component

The heart of this project is a custom ESPHome component that implements the MDMF protocol decoder. ESPHome’s component framework made it relatively straightforward to create a reusable caller ID decoder. The custom component is written in C++ and lives under “esphome/components/caller_id” along with its Python __init__.py. This was a rewrite of the Arduino code from the Instructable, done almost entirely by Claude Sonnet 4.

Protocol State Machine

The decoder implements a finite state machine to handle the various phases of caller ID reception:

  1. CID_IDLE — Waiting for a ring signal to activate the decoder
  2. CID_SYNC — Looking for the synchronization pattern (a run of 0x55 bytes, i.e. alternating 0 and 1 bits)
  3. CID_PACKET — Reading the main packet header and length
  4. CID_MESSAGE — Parsing individual MDMF data messages
  5. CID_END — Processing complete, waiting for line to go idle
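For anyone who wants to play with the state machine without hardware, here is a simplified Python model of it. This is only a sketch: it skips CID_IDLE (it assumes the ring has already happened), splits CID_MESSAGE into TYPE/LEN/DATA steps, and, like the component, consumes the checksum without verifying it. The class and state names are mine, and the number in the sample stream is made up.

```python
class CidDecoder:
    MIN_SYNC = 10  # same threshold as MIN_SYNC_COUNT in the component

    def __init__(self):
        self.state = "SYNC"
        self.sync_count = 0
        self.remaining = 0   # parameter bytes left in the packet
        self.ptype = 0
        self.plen = 0
        self.buf = bytearray()
        self.fields = {}     # parameter type -> decoded text

    def feed(self, byte):
        if self.state == "SYNC":
            if byte == 0x55:
                self.sync_count += 1
            elif byte == 0x80 and self.sync_count >= self.MIN_SYNC:
                self.state = "PACKET"
        elif self.state == "PACKET":
            self.remaining = byte
            self.state = "TYPE" if byte else "CHECKSUM"
        elif self.state == "TYPE":
            self.ptype = byte
            self.remaining -= 1
            self.state = "LEN"
        elif self.state == "LEN":
            self.plen = byte
            self.remaining -= 1
            self.buf = bytearray()
            if byte == 0:
                self._finish_param()  # empty parameter, nothing to read
            else:
                self.state = "DATA"
        elif self.state == "DATA":
            self.buf.append(byte)
            self.remaining -= 1
            if len(self.buf) >= self.plen:
                self._finish_param()
        elif self.state == "CHECKSUM":
            self.state = "END"  # consumed but not verified in this sketch

    def _finish_param(self):
        self.fields[self.ptype] = self.buf.decode("ascii", errors="replace")
        self.state = "TYPE" if self.remaining > 0 else "CHECKSUM"


# 30 sync bytes, header, length 12, one calling-number parameter, dummy checksum.
stream = bytes([0x55] * 30) + bytes([0x80, 0x0C, 0x02, 0x0A]) + b"5551234567" + b"\x00"
decoder = CidDecoder()
for b in stream:
    decoder.feed(b)
```

After feeding the sample stream, `decoder.fields` holds the calling number under key 0x02 and the decoder sits in the END state.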

Key Implementation Details

Power Management: The HT9032D is kept powered down (PWDN = HIGH) when idle to avoid trying to decode noise. It’s only activated when a ring is detected, giving us the brief window between rings to capture caller ID data.

Watchdog Protection: A 10-second watchdog timer resets the decoder if it gets stuck mid-decode (in the SYNC, PACKET, or MESSAGE state) without receiving a byte, preventing lockups from malformed data or electrical interference.
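One detail worth spelling out: the component restarts the timer every time a byte arrives, and compares `millis() - watchdog_start_` using unsigned 32-bit arithmetic, so the check should keep working when the millisecond counter wraps around. The Python sketch below mimics that arithmetic; the function names are mine.

```python
WATCHDOG_TIMEOUT_MS = 10_000
MASK32 = 0xFFFFFFFF  # the millisecond counter is a 32-bit unsigned value


def elapsed_ms(now, start):
    """Unsigned 32-bit difference, matching uint32_t subtraction in C++."""
    return (now - start) & MASK32


def watchdog_expired(now, start):
    """True when more than the timeout has passed since the last byte."""
    return elapsed_ms(now, start) > WATCHDOG_TIMEOUT_MS


# Timer started 256 ms before the counter wrapped back to zero.
start = 0xFFFFFF00
still_ok = watchdog_expired(5_000, start)    # 5256 ms elapsed
timed_out = watchdog_expired(10_000, start)  # 10256 ms elapsed
```

Masking the difference to 32 bits is what makes the wrapped case come out as 5256 ms instead of a huge negative number.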

MDMF Parsing: The component properly decodes MDMF parameter blocks:

  • Type 0x01: Date and time (MMDDHHMM format)
  • Type 0x02: Calling number
  • Type 0x07: Calling name
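As an illustration of the date and time field, here is a Python sketch of the same MMDDhhmm to "MM/DD hh:mm" conversion the component does, with some extra validation that the component itself does not perform. The function name and the choice to return None for bad input are my own.

```python
def format_cid_time(field):
    """Turn 'MMDDhhmm' into 'MM/DD hh:mm'; return None if the field is malformed."""
    if len(field) != 8 or not (field.isascii() and field.isdigit()):
        return None
    month, day, hour, minute = (int(field[i:i + 2]) for i in range(0, 8, 2))
    # Range checks only; this does not know how many days each month has.
    if not (1 <= month <= 12 and 1 <= day <= 31 and hour <= 23 and minute <= 59):
        return None
    return f"{field[0:2]}/{field[2:4]} {field[4:6]}:{field[6:8]}"


formatted = format_cid_time("03151430")
```

Note that the field carries no year, so anything that logs calls has to add the year itself.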

Hardware UART Limitation: Since GPIO1 (TX) and GPIO3 (RX) are the hardware UART pins, we lose console logging capability. However, logs are still available over WiFi through the Home Assistant API. And since I’m using GPIO2 for input, I lose control of the blue LED, which remains dark.

Code

cid.yaml
substitutions:
  name: "cid"
  friendly_name: "cid"

esp8266:
  board: esp01_1m
  restore_from_flash: false

# Pin assignments
# https://randomnerdtutorials.com/esp8266-pinout-reference-gpios/
# 3v3 |  | RX/GPIO3 - high at boot
# RST |  | GPIO0 - pulled up, flash if low on boot
# EN  |  | GPIO2 - pulled up, blue led on if pulled down, must be high at boot
# TX  |  | GND
#  ^ TX/GPIO1 - high at boot

# Caller ID
# Based on decoder board and code from 
# https://www.instructables.com/Arduino-Telephone-Caller-ID-Unit/

# Caller ID decoder using HT9032D module.
# Decodes MDMF (Multiple Data Message Format) caller ID from PSTN lines.
#
# ESP-01 GPIO assignments:
#   GPIO3 (RX) — CALLER_ID_RX (UART RX from HT9032D data out)
#   GPIO0      — PWDN         (HT9032D power-down control, active HIGH)
#   GPIO2      — RING_DET     (Ring detection input from HT9032D)
#
# Note: GPIO0 is HIGH at boot (normal boot mode) which matches PWDN=HIGH
# (decoder powered down). GPIO2 must also be HIGH at boot.

# invert ring signal using NPN transistor:
#   2N3904 connections:
#     SENSE ──[10K]── base
#     GPIO2 ───────── collector
#     GND ─────────── emitter

# Hardware logger disabled — GPIO1 (TX) / GPIO3 (RX) are the only
# hardware UART and GPIO3 is used for caller ID data.
# Logs are still available over WiFi via the API connection.
logger:
  baud_rate: 0

uart:
  id: caller_id_uart
  rx_pin: GPIO3
  baud_rate: 1200
  data_bits: 8
  parity: NONE
  stop_bits: 1

external_components:
  - source:
      type: local
      path: components

caller_id:
  uart_id: caller_id_uart
  pwdn_pin: GPIO0
  ring_pin:
    number: GPIO2
    inverted: true
    mode:
      input: true
      pullup: true
  caller_number:
    name: "Caller ID"
    icon: mdi:pound-box
  caller_name:
    name: "Caller Name"
    icon: mdi:account-box
  call_time:
    name: "Call Time"
    icon: mdi:clock-outline
  ring_detector:
    name: "Ring Detector"
    icon: mdi:bell-ring-outline
    filters:
      - delayed_off: 200ms

components/caller_id/caller_id_decoder.h
#pragma once

#include <cstdio>
#include <string>

#include "esphome/core/component.h"
#include "esphome/core/hal.h"
#include "esphome/core/log.h"
#include "esphome/components/uart/uart.h"
#include "esphome/components/text_sensor/text_sensor.h"
#include "esphome/components/binary_sensor/binary_sensor.h"

namespace esphome {
namespace caller_id {

// Log tag, kept inside the namespace so it cannot clash with other components.
static const char *const TAG = "caller_id";

static const uint8_t MDMF_HEADER = 0x80;
static const uint8_t MDMF_PARAM_TIME = 0x01;
static const uint8_t MDMF_PARAM_CID = 0x02;
static const uint8_t MDMF_PARAM_NAME = 0x07;

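static const uint8_t MIN_SYNC_COUNT = 10;           // 0x55 bytes required before a header is accepted
static const uint32_t WATCHDOG_TIMEOUT_MS = 10000;  // reset if no byte arrives for 10 s mid-decode
static const uint8_t LINE_IDLE_THRESHOLD = 80;      // 100 ms ticks without ringing (about 8 s)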

class CallerIDDecoder : public Component, public uart::UARTDevice {
 public:
  void set_pwdn_pin(GPIOPin *pin) { pwdn_pin_ = pin; }
  void set_ring_pin(GPIOPin *pin) { ring_pin_ = pin; }
  void set_caller_number_sensor(text_sensor::TextSensor *sensor) { caller_number_ = sensor; }
  void set_caller_name_sensor(text_sensor::TextSensor *sensor) { caller_name_ = sensor; }
  void set_call_time_sensor(text_sensor::TextSensor *sensor) { call_time_ = sensor; }
  void set_ring_detector(binary_sensor::BinarySensor *sensor) { ring_detector_ = sensor; }

  float get_setup_priority() const override { return setup_priority::DATA; }

  void setup() override {
    pwdn_pin_->setup();
    ring_pin_->setup();
    // Power down the decoder at startup (PWDN HIGH = powered down).
    pwdn_pin_->digital_write(true);
    state_ = CID_IDLE;
    msg_state_ = CIDMSG_HEADER;
  }

  void loop() override {
    // Poll ring pin and publish to binary sensor.
    bool ring_state = ring_pin_->digital_read();
    if (ring_detector_ != nullptr && ring_state != last_ring_state_) {
      ring_detector_->publish_state(ring_state);
      last_ring_state_ = ring_state;
    }

    // Watchdog: if mid-decode with no progress for 10s, reset.
    if (state_ == CID_SYNC || state_ == CID_PACKET || state_ == CID_MESSAGE) {
      if (millis() - watchdog_start_ > WATCHDOG_TIMEOUT_MS) {
        ESP_LOGW(TAG, "Watchdog timeout in state %d, resetting to idle", state_);
        reset_to_idle_();
        return;
      }
    }

    switch (state_) {
      case CID_IDLE:
        handle_idle_(ring_state);
        break;
      case CID_SYNC:
        handle_sync_();
        break;
      case CID_PACKET:
        handle_packet_();
        break;
      case CID_MESSAGE:
        handle_message_();
        break;
      case CID_END:
        handle_end_(ring_state);
        break;
    }
  }

 protected:
  GPIOPin *pwdn_pin_{nullptr};
  GPIOPin *ring_pin_{nullptr};
  text_sensor::TextSensor *caller_number_{nullptr};
  text_sensor::TextSensor *caller_name_{nullptr};
  text_sensor::TextSensor *call_time_{nullptr};
  binary_sensor::BinarySensor *ring_detector_{nullptr};

  enum State { CID_IDLE, CID_SYNC, CID_PACKET, CID_MESSAGE, CID_END };
  enum MsgState { CIDMSG_HEADER, CIDMSG_LEN, CIDMSG_DATA };

  State state_{CID_IDLE};
  MsgState msg_state_{CIDMSG_HEADER};

  uint8_t sync_count_{0};
  uint8_t packet_len_{0};
  uint8_t current_len_{0};
  uint8_t msg_type_{0};
  uint8_t msg_len_{0};
  uint8_t msg_pos_{0};
  char msg_data_[16]{};

  uint32_t watchdog_start_{0};
  uint32_t end_delay_start_{0};
  uint8_t idle_counter_{0};
  bool last_ring_state_{false};

  void reset_to_idle_() {
    ESP_LOGD(TAG, "Resetting to idle (was state %d, sync_count=%d)", state_, sync_count_);
    state_ = CID_IDLE;
    msg_state_ = CIDMSG_HEADER;
    sync_count_ = 0;
    idle_counter_ = 0;
    msg_pos_ = 0;
    pwdn_pin_->digital_write(true);  // Power down decoder.
  }

  void handle_idle_(bool ring_state) {
    if (ring_state) {
      // Ring detected — activate decoder (PWDN LOW).
      ESP_LOGI(TAG, "Ring detected, powering up decoder, entering SYNC state");
      pwdn_pin_->digital_write(false);
      sync_count_ = 0;
      watchdog_start_ = millis();
      state_ = CID_SYNC;
    }
  }

  void handle_sync_() {
    if (!available())
      return;

    uint8_t data;
    read_byte(&data);
    watchdog_start_ = millis();

    if (data == 0x55) {
      sync_count_++;
      if (sync_count_ % 10 == 0) {
        ESP_LOGD(TAG, "SYNC: received %d sync bytes so far", sync_count_);
      }
    } else {
      ESP_LOGD(TAG, "SYNC: non-sync byte 0x%02X (sync_count=%d)", data, sync_count_);
    }

    if (sync_count_ >= MIN_SYNC_COUNT && data == MDMF_HEADER) {
      ESP_LOGI(TAG, "MDMF header detected after %d sync bytes, entering PACKET state", sync_count_);
      state_ = CID_PACKET;
      watchdog_start_ = millis();
    }
  }

  void handle_packet_() {
    if (!available())
      return;

    uint8_t data;
    read_byte(&data);
    packet_len_ = data;
    current_len_ = 0;
    state_ = CID_MESSAGE;
    msg_state_ = CIDMSG_HEADER;
    watchdog_start_ = millis();
    ESP_LOGD(TAG, "PACKET: length=%d, entering MESSAGE state", packet_len_);
  }

  void handle_message_() {
    if (!available())
      return;

    // Check if we've consumed all message bytes in the packet.
    if (current_len_ >= packet_len_) {
      // Read and discard the checksum byte.
      uint8_t checksum;
      read_byte(&checksum);
      ESP_LOGD(TAG, "MESSAGE: packet complete, checksum=0x%02X, entering END state", checksum);

      state_ = CID_END;
      msg_state_ = CIDMSG_HEADER;
      idle_counter_ = 0;
      end_delay_start_ = millis();
      watchdog_start_ = millis();

      // Power down the decoder.
      pwdn_pin_->digital_write(true);
      return;
    }

    uint8_t data;

    switch (msg_state_) {
      case CIDMSG_HEADER:
        read_byte(&data);
        msg_type_ = data;
        msg_state_ = CIDMSG_LEN;
        current_len_++;
        watchdog_start_ = millis();
        ESP_LOGD(TAG, "MSG HEADER: type=0x%02X (%s)", msg_type_,
                 msg_type_ == MDMF_PARAM_CID ? "CID" :
                 msg_type_ == MDMF_PARAM_TIME ? "TIME" :
                 msg_type_ == MDMF_PARAM_NAME ? "NAME" : "UNKNOWN");
        break;

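      case CIDMSG_LEN:
        read_byte(&data);
        msg_len_ = data;
        msg_pos_ = 0;
        msg_data_[0] = '\0';  // Clear any text left over from the previous parameter.
        // A zero-length parameter has no data bytes, so go straight to the next header.
        msg_state_ = (msg_len_ == 0) ? CIDMSG_HEADER : CIDMSG_DATA;
        current_len_++;
        watchdog_start_ = millis();
        ESP_LOGD(TAG, "MSG LEN: %d bytes", msg_len_);
        break;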

      case CIDMSG_DATA:
        read_byte(&data);
        // Keep at most 15 characters plus the terminator; longer fields are truncated.
        if (msg_pos_ < sizeof(msg_data_) - 1) {
          msg_data_[msg_pos_] = static_cast<char>(data);
          msg_data_[msg_pos_ + 1] = '\0';
        }
        msg_pos_++;
        current_len_++;

        if (msg_pos_ >= msg_len_) {
          ESP_LOGD(TAG, "MSG DATA complete: \"%s\"", msg_data_);
          publish_message_();
          msg_state_ = CIDMSG_HEADER;
          watchdog_start_ = millis();
        }
        break;
    }
  }

  void handle_end_(bool ring_state) {
    // Wait for the ringing to stop before returning to idle.
    if (millis() - end_delay_start_ >= 100) {
      end_delay_start_ = millis();
      idle_counter_++;

      if (ring_state) {
        idle_counter_ = 0;
      }
    }

    if (idle_counter_ > LINE_IDLE_THRESHOLD) {
      idle_counter_ = 0;
      state_ = CID_IDLE;
    }
  }

  void publish_message_() {
    switch (msg_type_) {
      case MDMF_PARAM_CID:
        if (caller_number_ != nullptr)
          caller_number_->publish_state(std::string(msg_data_));
        break;
      case MDMF_PARAM_TIME:
        if (call_time_ != nullptr)
          call_time_->publish_state(format_time_(msg_data_));
        break;
      case MDMF_PARAM_NAME:
        if (caller_name_ != nullptr)
          caller_name_->publish_state(std::string(msg_data_));
        break;
    }
  }

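  // Format MDMF time (MMDDhhmm) into a readable "MM/DD hh:mm" string.
  std::string format_time_(const char *data) {
    // Publish a short field unchanged rather than formatting stale bytes.
    std::string raw(data);
    if (raw.size() < 8)
      return raw;
    char buf[12];  // "MM/DD hh:mm" is 11 characters plus the terminator.
    snprintf(buf, sizeof(buf), "%c%c/%c%c %c%c:%c%c",
             data[0], data[1], data[2], data[3],
             data[4], data[5], data[6], data[7]);
    return std::string(buf);
  }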
};

}  // namespace caller_id
}  // namespace esphome

components/caller_id/__init__.py
import esphome.codegen as cg
import esphome.config_validation as cv
from esphome.components import uart, text_sensor, binary_sensor
from esphome import pins

DEPENDENCIES = ["uart"]
AUTO_LOAD = ["text_sensor", "binary_sensor"]

CONF_PWDN_PIN = "pwdn_pin"
CONF_RING_PIN = "ring_pin"
CONF_CALLER_NUMBER = "caller_number"
CONF_CALLER_NAME = "caller_name"
CONF_CALL_TIME = "call_time"
CONF_RING_DETECTOR = "ring_detector"

caller_id_ns = cg.esphome_ns.namespace("caller_id")
CallerIDDecoder = caller_id_ns.class_(
    "CallerIDDecoder", cg.Component, uart.UARTDevice
)

CONFIG_SCHEMA = (
    cv.Schema(
        {
            cv.GenerateID(): cv.declare_id(CallerIDDecoder),
            cv.Required(CONF_PWDN_PIN): pins.gpio_output_pin_schema,
            cv.Required(CONF_RING_PIN): pins.gpio_input_pin_schema,
            cv.Optional(CONF_CALLER_NUMBER): text_sensor.text_sensor_schema(),
            cv.Optional(CONF_CALLER_NAME): text_sensor.text_sensor_schema(),
            cv.Optional(CONF_CALL_TIME): text_sensor.text_sensor_schema(),
            cv.Optional(CONF_RING_DETECTOR): binary_sensor.binary_sensor_schema(),
        }
    )
    .extend(cv.COMPONENT_SCHEMA)
    .extend(uart.UART_DEVICE_SCHEMA)
)


async def to_code(config):
    var = cg.new_Pvariable(config[cv.CONF_ID])
    await cg.register_component(var, config)
    await uart.register_uart_device(var, config)

    pwdn = await cg.gpio_pin_expression(config[CONF_PWDN_PIN])
    cg.add(var.set_pwdn_pin(pwdn))

    ring = await cg.gpio_pin_expression(config[CONF_RING_PIN])
    cg.add(var.set_ring_pin(ring))

    if CONF_CALLER_NUMBER in config:
        sens = await text_sensor.new_text_sensor(config[CONF_CALLER_NUMBER])
        cg.add(var.set_caller_number_sensor(sens))

    if CONF_CALLER_NAME in config:
        sens = await text_sensor.new_text_sensor(config[CONF_CALLER_NAME])
        cg.add(var.set_caller_name_sensor(sens))

    if CONF_CALL_TIME in config:
        sens = await text_sensor.new_text_sensor(config[CONF_CALL_TIME])
        cg.add(var.set_call_time_sensor(sens))

    if CONF_RING_DETECTOR in config:
        sens = await binary_sensor.new_binary_sensor(config[CONF_RING_DETECTOR])
        cg.add(var.set_ring_detector(sens))

Integration with Home Assistant

The ESPHome component exposes four entities to Home Assistant:

  • Caller ID (text sensor) — The calling phone number
  • Caller Name (text sensor) — The caller’s name if provided
  • Call Time (text sensor) — Date and time of the call
  • Ring Detector (binary sensor) — Real-time ring detection

These entities can be used in Home Assistant automations for:

  • Call logging and history tracking
  • Notifications when specific numbers call
  • Announcing callers over media players
  • Triggering other automations when specific phones call


This project was inspired by and builds upon the Arduino Telephone Caller ID Unit by [jayakody2000lk]. The ESPHome implementation and Home Assistant integration are original contributions to the maker community.
