DFR1154 AI Camera — local ESP32-S3 security camera with MQTT, Telegram and optional YOLO companion

Hi all,

I wanted to share my open-source project for the DFRobot DFR1154 ESP32-S3 AI Camera:

GitHub: PeterkoCZ91/DFR1154-ai-camera
Self-hosted AI security camera firmware for DFRobot FireBeetle 2 ESP32-S3 (DFR1154). On-device person detection (FOMO + ByteTrack), MJPEG/RTSP streaming, Home Assistant MQTT, Telegram alerts. Optional Python A12 companion with YOLOv11n + face recognition.

This is a security camera firmware for the DFR1154 board, built to run fully on your local network.

The board has an ESP32-S3, an OV3660 camera, an LTR-308 light sensor and a PDM microphone. I use it as a gate/security camera
endpoint with Home Assistant, MQTT and Telegram.

It can run on its own, or with an optional Docker/Python companion called A12. A12 connects to the camera stream and adds
YOLOv11n verification, face recognition, event scoring, PIR-triggered recording, adaptive video clips and Telegram photo/video
alerts.

No cloud is required.

What it does

The ESP32 firmware handles the camera side:

  • MJPEG stream
  • RTSP stream
  • snapshots
  • AVI recording
  • on-device motion/person detection
  • DAY / DUSK / NIGHT camera profiles
  • MQTT integration
  • Telegram alerts
  • local web dashboard
  • basic diagnostics

Internally, the firmware uses a PSRAM ring buffer so multiple parts of the system can read camera frames without constantly
copying them around.
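
To illustrate the ring buffer idea, here is a minimal sketch in Python, chosen for readability; the firmware itself is not Python. The class and method names are hypothetical and not taken from the repository. One writer stores frames in a fixed number of slots, and each reader keeps its own cursor and receives a reference to the stored frame rather than a copy.

```python
from typing import List, Optional, Tuple


class FrameRing:
    """Fixed-size ring of frame slots shared by one writer and many readers."""

    def __init__(self, slots: int) -> None:
        if slots <= 0:
            raise ValueError("slots must be positive")
        self._slots: List[Optional[bytes]] = [None] * slots
        self._head = 0  # sequence number of the next frame to be written

    def push(self, frame: bytes) -> int:
        """Store a frame, overwriting the oldest slot; return its sequence number."""
        seq = self._head
        self._slots[seq % len(self._slots)] = frame
        self._head = seq + 1
        return seq

    def read(self, cursor: int) -> Tuple[Optional[bytes], int]:
        """Return (frame, next_cursor) for a reader positioned at `cursor`.

        A reader that fell behind skips ahead to the oldest frame still stored.
        The returned frame is the stored object itself, not a copy.
        """
        oldest = max(0, self._head - len(self._slots))
        cursor = max(cursor, oldest)
        if cursor >= self._head:
            return None, cursor  # nothing new yet
        return self._slots[cursor % len(self._slots)], cursor + 1
```

A streaming task and a detection task would each hold their own cursor and call read() at their own pace. A slow reader loses old frames instead of blocking the camera.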

Person detection

The firmware does not treat every motion event the same.

It uses a three-state person decision:

  • NONE
  • UNCERTAIN
  • CONFIDENT

A confident detection can trigger an alert directly.
An uncertain detection can be sent to A12, where YOLOv11n checks the camera stream before a Telegram alert is sent.

The goal is simple: fewer useless alerts, without silently dropping events that look suspicious.
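
The three-state routing can be sketched roughly like this in Python. This is an illustration only: the score input, the threshold values and the action names are assumptions, not the firmware's actual logic.

```python
from enum import Enum


class PersonState(Enum):
    NONE = "none"
    UNCERTAIN = "uncertain"
    CONFIDENT = "confident"


def classify(score: float, low: float = 0.35, high: float = 0.70) -> PersonState:
    """Map a detection score in [0, 1] to a state (thresholds are assumed values)."""
    if score >= high:
        return PersonState.CONFIDENT
    if score >= low:
        return PersonState.UNCERTAIN
    return PersonState.NONE


def route(state: PersonState) -> str:
    """Pick a follow-up action for a state (action names are hypothetical)."""
    if state is PersonState.CONFIDENT:
        return "alert"            # trigger directly
    if state is PersonState.UNCERTAIN:
        return "verify_with_a12"  # hand over for a second check, not dropped
    return "ignore"
```

The point of the middle state is that a borderline detection is neither alerted on blindly nor thrown away.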

A12 companion

A12 is optional. It runs locally in Docker on a Raspberry Pi, mini PC or server.

It adds:

  • YOLOv11n ONNX detection
  • event scoring
  • PIR / external sensor triggered recording
  • adaptive video clips
  • Telegram photo/video alerts
  • optional face recognition using a local known_faces.pkl
  • MQTT events for Home Assistant
  • simple diagnostics with tools/a12 doctor

Face recognition is done locally. The face database is not part of the repository and should stay on the local machine.

Home Assistant

The project integrates with Home Assistant through MQTT.

I mainly use HA as the automation layer, while the camera and A12 still do their own local processing. So the camera is not just
a passive video stream; it can publish useful events that HA can react to.

Examples:

  • person detected
  • uncertain person event
  • camera status
  • PIR-triggered event
  • face recognition result
  • recording/media event
  • A12 status
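
As a rough idea of what such an event could look like on the wire, here is a small Python sketch that builds a topic and a JSON payload. The topic layout (prefix/event/name) and the payload fields are assumptions for illustration; the real topics are defined by the project itself. The result can be handed to any MQTT client for publishing.

```python
import json
from typing import Any, Dict, Tuple


def build_event(prefix: str, event: str, data: Dict[str, Any]) -> Tuple[str, str]:
    """Return (topic, payload) for one event under a per-camera topic prefix."""
    topic = f"{prefix.rstrip('/')}/event/{event}"  # assumed topic layout
    payload = json.dumps({"event": event, **data}, sort_keys=True)
    return topic, payload


topic, payload = build_event("cam/gate", "person", {"state": "uncertain"})
```

Home Assistant can then react to the event with an MQTT trigger on that topic.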

Multiple cameras

The current simple approach for multiple cameras is to run the same A12 Docker image more than once.

Each camera gets its own config, MQTT topic prefix and runtime folder.

That keeps it easy to understand, and if one camera goes offline it should not take down the others.
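
A sketch of how the per-camera separation could be planned, in Python. The directory layout, file names and the "a12/" topic prefix are hypothetical; the only point is that no two instances share a config, a prefix or a runtime folder.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List


@dataclass(frozen=True)
class A12Instance:
    camera: str
    config_file: str
    mqtt_prefix: str
    runtime_dir: str


def plan_instances(cameras: List[str], base_dir: str = "/opt/a12") -> List[A12Instance]:
    """Give every camera its own config file, MQTT prefix and runtime folder."""
    if len(set(cameras)) != len(cameras):
        raise ValueError("camera names must be unique")
    base = PurePosixPath(base_dir)  # assumed base directory
    return [
        A12Instance(
            camera=name,
            config_file=str(base / name / "config.yaml"),
            mqtt_prefix=f"a12/{name}",
            runtime_dir=str(base / name / "runtime"),
        )
        for name in cameras
    ]
```

Each entry would then map to one container started from the same image, with its own mounted config and runtime folder.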

Honest limitations

  • This is custom firmware, not ESPHome YAML.
  • A12 needs a local machine if you want YOLO or face recognition.
  • The project is mainly tuned for the DFR1154 / OV3660 board.
  • It is meant for local network use, not direct public internet exposure.
  • Setup is more technical than flashing a ready-made consumer camera.

Why I built it

I wanted a camera that behaves more like a local security sensor than just a webcam.

Something that can:

  • stream video,
  • detect people,
  • cut down on false alerts,
  • send useful Telegram notifications,
  • integrate with Home Assistant,
  • and still stay self-hosted.

Feedback, testing and PRs are welcome.

Happy to answer questions if anyone is working with ESP32-S3 cameras, DFR1154, MQTT security sensors or local AI camera
pipelines.
