Advanced debugging with HA

It’s been a while since I last published an in-depth article, and I wanted to write one about debugging. It’s quite obviously intermediate-to-advanced material, because HA increasingly “hides” complexity, which is a good thing. But if you are like me, a control freak who quickly gets upset when things aren’t acting as expected, well, this one is for you.

1 - Debugging with the internal tools

Obvious but necessary. Those tools are a treasure trove, and it takes a bit of time to realize how complete they are. In recent HA versions, you’ll find them in Settings → Developer tools.

1.1) Developer tools

Template debugger

This is my go-to whenever I create a template, especially for triggers and conditions. When I write a Jinja2 templated condition or trigger, I first copy/paste it into the template debugger to see if it spits out an error. It becomes easy to see right away if a sensor exists, if you have a syntax issue, or, most commonly for me, if you compare a string to an integer or something similar. A 10-second check saves hours of headache. This is a reflex to build.
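The string-versus-number trap mentioned above can be reproduced outside HA, because Jinja2 comparisons follow Python's rules and entity states arrive as strings. Below is a minimal Python sketch of the pitfall. The `to_float` helper is hypothetical and only loosely mirrors what a `float` filter with a default does in a template.

```python
def to_float(value, default=0.0):
    """Convert a state string to a float, falling back to a default."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return default


state = "21.5"  # entity states are strings, even when they look numeric

# Equality between a string and a number is silently False.
print(state == 21.5)            # False

# Ordering comparisons raise instead of guessing.
try:
    state > 20
except TypeError as err:
    print("TypeError:", err)

# Converting first gives the comparison you meant.
print(to_float(state) > 20)     # True
print(to_float("unavailable"))  # 0.0
```

The silent `False` on equality is the nasty one: no error shows up, the condition just never passes.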

Example of a condition I tend to test:
- "{{ (as_timestamp(now()) - as_timestamp(state_attr('automation.alarm_security_tiggers','last_triggered'), 0)) | int > 600 }}"

Or a trigger:

     - trigger: template
       value_template: >
         {%- set count = namespace() %}
         {%- set count.value = 0 %}
         {%- for sensor in expand('group.presence_detectors') -%}
           {%- if (now() - sensor.last_changed) < timedelta(seconds=90) -%}
             {%- set count.value = count.value +1 %}
           {%- endif -%}
         {%- endfor %}
         {%- for pirs in expand('group.outdoor_pirs') -%}
           {%- if (now() - pirs.last_changed) < timedelta(seconds=90) -%}
             {%- set count.value = count.value +1 %}
           {%- endif -%}
         {%- endfor %}
         {{ count.value >1 }}

States

There is more to unpack than meets the eye.

First, see the copy icon just below the sensor name, quite useful. You look for an entity, type part of its name, HA finds it, and you can then copy it. Also, the date on the right is a hyperlink to the history of the entity, which lets you quickly graph its historical values. Then you can force the state, and below it you see its value and its various other attributes. Here my wemos.pitinfo is getting my main electric meter value (0 here because my solar PV panels are producing) and all other attributes.

Actions

The Actions tab lets you kick off a script or automation and check that it does what you expect. It’s practical because you can see the return code of HTTP queries, for example, or the output of shell scripts if they have one. It’s also a good way to confirm that HA understands your action. Same principle as with Jinja templates in the debugger: make it a habit and save yourself time.

Events

I find this one particularly useful when using MQTT. You can input a request and see the result, or just listen to all MQTT events or those of a specific category. Now, this isn’t limited to MQTT; you can listen to any event type you’re looking for. Since HA is an event-based system, you can also simulate events by firing one, which will trigger your automations, for example. You may also want to test an ESP listening to MQTT messages sent by HA, so you can fire your mqtt_pub from a Linux CLI or just use the HA UI. Listening to “*” is usually very verbose, but you can find a list of the events here to narrow it down.
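Events can also be fired from outside the UI through HA's REST API (`POST /api/events/<event_type>` with a long-lived access token). The sketch below only builds and sends such a request with the standard library. The host name, token and event name in the usage note are placeholders, and it assumes the `api` integration is enabled, which is the case with the default configuration.

```python
import json
import urllib.request


def build_fire_event_request(base_url, token, event_type, event_data=None):
    """Build (but do not send) a POST request that fires an event on the HA bus."""
    url = f"{base_url.rstrip('/')}/api/events/{event_type}"
    body = json.dumps(event_data or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send(request, timeout=10):
    """Send the request and return (HTTP status, response body as text)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Usage would look like `send(build_fire_event_request("http://homeassistant.local:8123", "YOUR_TOKEN", "debug_test_event", {"source": "cli"}))`, with an automation listening for the hypothetical `debug_test_event` type on the other side.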

Statistics

Who hasn’t wrecked a long-term sensor because a ZHA or Z-Wave update ruined a specific entry? Or you just botched a template used in a utility meter, and now the graph looks like nothing. Statistics correction is there precisely to help you fix a value without pulling out your favorite SQL wrapper. Better still, it helps you identify outliers quickly.

1.2) Automations & Scripts traces

Located in Settings → Automations & scenes.
This one is pure gold, let’s dive into it:

First, just know that what I explain here is also valid for scripts and not only automations.
Now you can click the three-dot menu and choose “Run actions” or “Traces”. Run actions ignores the triggers and conditions of your automations/scripts and only runs… the actions. Quite obvious. If those actions contain if/then blocks, I think they are still evaluated though.

But the gold mine is … traces and its many facets:

Here, we can see the 4 triggers which may start an automation on top, then the conditional branching and the actions taken.

In the right-hand block, we can see details of the currently selected step in the tree, here the conditional evaluation, with the values returned when applicable. (Don’t mind the error in the condition description, it’s an old bug.)

Above it, you have a date selector to see the last 3 executions, and in the tabs of this block you’ll also find the trace timeline. If anything went wrong in an automation or script, 95% of the time you’ll find where, why and how in this tracer. If not… there is an advanced mode: in the upper-right corner, click the three-dot menu button and select “Download trace”. You’ll get a raw dump file that sometimes includes extra details on why it failed, as well as previous execution iterations.
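The downloaded trace is a JSON file, so it can be scanned with a few lines of Python instead of scrolling through it. This is a rough sketch under an assumption about the file layout: a top-level `trace` object whose own `trace` key maps step paths (such as `action/0`) to lists of runs, each run optionally carrying an `error` field. Check that against your own dump before relying on it.

```python
import json


def load_dump(filename):
    """Load a downloaded trace file from disk."""
    with open(filename, encoding="utf-8") as handle:
        return json.load(handle)


def failed_steps(dump):
    """Return (path, error) pairs for every trace step that recorded an error."""
    # Assumed layout: {"trace": {"trace": {"<path>": [{...run...}, ...]}}}
    steps = dump.get("trace", {}).get("trace", {})
    found = []
    for path, runs in steps.items():
        for run in runs:
            if "error" in run:
                found.append((path, run["error"]))
    return found


# Tiny hand-made sample in the assumed layout, not a real HA dump.
sample = {
    "trace": {
        "trace": {
            "trigger/0": [{"path": "trigger/0"}],
            "action/0": [{"path": "action/0", "error": "Service not found"}],
        }
    }
}
print(failed_steps(sample))  # [('action/0', 'Service not found')]
```

With a real file, `failed_steps(load_dump("trace.json"))` would be the call, where `trace.json` stands for whatever name the download was saved under.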

1.3) “ha core logs”

I’ll only lightly cover this one, since we’ll put together a much better way.
Long story short, start the shell (install the SSH add-on if need be) and type “ha core logs”. You’ll see the last lines; it’s messy, not very descriptive, etc. Good for survival but not really for the long term. More about this soon.
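To make that raw output a little more digestible, the lines can be counted per logger and level. The sketch below assumes the usual core log line shape, `date time LEVEL (thread) [logger] message`, and strips ANSI color codes first; lines that do not match (tracebacks, continuation lines) are simply skipped.

```python
import re
from collections import Counter

# Matches terminal color sequences such as "\x1b[31m".
ANSI = re.compile(r"\x1b\[[0-9;]*m")

# Assumed line shape: "2025-01-01 10:00:00.123 ERROR (MainThread) [some.logger] text"
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]*)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def count_by_logger(lines, levels=("WARNING", "ERROR", "CRITICAL")):
    """Count log lines per (level, logger) pair, keeping only the given levels."""
    counts = Counter()
    for raw in lines:
        match = LINE.match(ANSI.sub("", raw).rstrip("\n"))
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("logger"))] += 1
    return counts
```

Fed with the saved output of “ha core logs” (for example `count_by_logger(open("core.log"))`, where `core.log` is a hypothetical file name), `counts.most_common(10)` gives a quick list of the noisiest loggers, which is also a decent starting point for deciding what to filter.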

2 - Configurations, scripts, automations & widgets

2.1) YAML Configuration

We need traces. So we need proper logger configuration along with a few options.

In configuration.yaml:

system_log:
 max_entries: 15
 fire_event: true # set to true to receive notification if a problem is detected in the logs

Careful here, this could become intensive on your system if set too high.
If you get sudden slowdowns, turn it back off and revisit. Next, the logger.yaml:

default: warning
filters:
 homeassistant.components.homekit.util:
   - "media_player.philips_tv_2"
 homeassistant.components.websocket_api.http.connection:
   - "Received invalid command: mobile_app/push_notification_channel"
 frontend.js.modern:
   - "Uncaught error from Chrome"

logs:
 #homeassistant.components.zha: warning
 #zigpy: warning
 homeassistant.components.zha.core.device: warning
 homeassistant.components.device_tracker.unifi: error
 homeassistant.components.sonos.speaker: error
 homeassistant.components.nanoleaf: error
 homeassistant.components.device_tracker.iphonedetect: error
 homeassistant.core: warning
 homeassistant.components.mqtt: warning
 homeassistant.components.python_script: error
 aiohttp: error
 custom_components.spotcast: warning
 custom_components.localtuya.common: error
 ring_doorbell.doorbot: error
 custom_components.moonraker: error

Some things should always be logged, others only on errors or warnings. Some need to be ignored because they repeat and you don’t care, and finally some need to be activated only when you are actively troubleshooting something. Your burger, your sauce.

2.2) Scripts & Automation

Here is an automation I like, which isolates problematic log entries in separate files:

 - id: "510121"
   alias: Misc - HA - Automation log error handling
   description: Capture HA automation/script warnings & errors and log them via my logger
   mode: queued
   max: 5
   initial_state: true
   triggers:
     - trigger: event
       event_type: system_log_event
   conditions:
     - condition: template
       value_template: >
         {% set d = trigger.event.data | default({}) %}
         {% set lvl = (d.get('level','') | string | upper) %}
         {% set n = (d.get('name','') | string) %}
         {# system_log_event level is typically WARNING or ERROR #}
         {% set is_error = lvl in ['ERROR','CRITICAL'] %}
         {% set is_ha_auto =
             n.startswith('homeassistant.components.automation')
             or n.startswith('homeassistant.components.script')
             or n.startswith('homeassistant.helpers.script')
             or n.startswith('homeassistant.helpers.service')
         %}
         {{ is_error and is_ha_auto }}
   actions:
     - variables:
         d: "{{ trigger.event.data | default({}) }}"
         summary: >-
           [{{ (d.get('level','') | string | upper) }}] {{ d.get('name','unknown.logger') }}
           — {{ (d.get('message','') | string) | replace('\n',' | ') }}
           {% if d.get('source') %} @ {{ d.get('source') }}{% endif %}
           {% if d.get('exception') %} :: {{ (d.get('exception') | string) | replace('\n',' | ') }}{% endif %}
     - action: script.uns
       data:
         log_facility: errors
         logbook: true
         message: "{{ summary }}"

(Much) more about script.uns below, but suffice it to say it’s my go-to replacement for every other notification system.
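The condition of that automation is easy to get subtly wrong, so it can help to mirror it in plain Python and throw a few sample events at it before trusting it. This is only a sketch of the same rule as read from the template above: the level must be ERROR or CRITICAL and the logger name must start with one of four prefixes. The `should_forward` name is hypothetical.

```python
WATCHED_PREFIXES = (
    "homeassistant.components.automation",
    "homeassistant.components.script",
    "homeassistant.helpers.script",
    "homeassistant.helpers.service",
)


def should_forward(event_data):
    """Return True when a system_log_event payload would pass the condition."""
    level = str(event_data.get("level", "")).upper()
    name = str(event_data.get("name", ""))
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return level in ("ERROR", "CRITICAL") and name.startswith(WATCHED_PREFIXES)


print(should_forward({"level": "error", "name": "homeassistant.helpers.script.my_script"}))  # True
print(should_forward({"level": "WARNING", "name": "homeassistant.components.automation.x"}))  # False
print(should_forward({"level": "ERROR", "name": "homeassistant.components.mqtt"}))  # False
```

Note the second case: as written, the condition lets errors through but not warnings.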

2.3) Dashboard Widget

I’m graphing my raspi5 CPU:

type: custom:mini-graph-card
hours_to_show: 7
points_per_hour: 12
cache: true
smoothing: false
height: 200
line_width: 0
show:
  graph: bar
  fill: false
color_thresholds:
  - value: 0
    color: "#00FF00"
  - value: 0.5
    color: "#00FF50"
  - value: 1
    color: "#ffff00"
  - value: 1.5
    color: "#FFFF50"
  - value: 2
    color: "#FF0000"
entities:
  - name: 5M Load avg
    entity: sensor.load_5m
    graph: bar
    show_state: true
    show_points: true
    labels: true
    aggregate_func: max
    group_by: interval

Checking my RAM as well:

type: custom:apexcharts-card
graph_span: 4h
update_interval: 5min
header:
  show: true
  title: RPi5
  show_states: true
  colorize_states: true
apex_config:
  legend:
    show: false
  dataLabels:
    enabled: false
  tooltip:
    x:
      format: ddd dd MMM
yaxis:
  - id: first
    opposite: false
    min: 0
    max: ~50
    decimals: 0
    apex_config:
      tickAmount: 4
    show: true
  - id: second
    opposite: true
    min: 0
    max: 4
    decimals: 0
    apex_config:
      tickAmount: 4
    show: true
series:
  - entity: sensor.home_assistant_v2_db_size
    transform: return(x/1024)
    name: DB size
    yaxis_id: second
    type: area
    opacity: 0.25
    stroke_width: 0.05
    unit: Gb
    color: cyan
    group_by:
      func: last
      duration: 5min
    show:
      datalabels: false
  - entity: sensor.memory_use
    name: RAM
    transform: return(x/1024)
    yaxis_id: second
    type: area
    opacity: 0.4
    stroke_width: 2
    color: orange
    unit: Gb
    group_by:
      func: last
      duration: 10min
    show:
      datalabels: false
  - entity: sensor.processor_use
    name: CPU
    yaxis_id: first
    type: line
    opacity: 0.8
    stroke_width: 2
    unit: "%"
    color: purple
    group_by:
      func: last
      duration: 5min
    show:
      datalabels: false

Checking running automations, running scripts and the containers’ health:

type: vertical-stack
cards:
  - type: custom:auto-entities
    card:
      type: entities
      title: Running Automations
    filter:
      exclude:
        - state: unavailable
        - attributes:
            current: "= 0"
      include:
        - domain: automation
    show_empty: true
    sort:
      ignore_case: true
      method: name
  - type: custom:auto-entities
    card:
      type: entities
      title: Running Scripts
    filter:
      exclude:
        - state: "off"
        - state: "unavailable"
      include:
        - domain: script
    show_empty: true
    sort:
      ignore_case: true
      method: name
  - type: custom:auto-entities
    card:
      type: entities
      title: Containers CPU
    filter:
      exclude:
        - entity_id: sensor.z_wave_js_cpu_percent
        - entity_id: sensor.ring_mqtt_with_video_streaming_cpu_percent
        - entity_id: sensor.z_wave_js_memory_percent
        - entity_id: sensor.ring_mqtt_with_video_streaming_memory_percent
      include:
        - name: CPU
        - entity_id: sensor.*_cpu_percent
    sort:
      ignore_case: true
      method: name

3 - The advanced notification system I always wanted

I called it the Universal Notification System (UNS). It covers every kind of notification I have wanted so far. It could be better with interactive notifications and all, but it’s a strong start:
#-------- UNIVERSAL NOTIFICATION SYSTEM ----------------------------------------

 uns:
   mode: parallel
   alias: Universal Notification System
   fields:
     message:
       description: "Message to be distributed"
       example: "May the force be with you"
       required: true
     logbook:
       description: "Log message in the logbook"
       default: true
       required: false
     log_facility:
       description: "security|airco|safety|pool|zwave|evcars|misc|solar|energy|errors"
       example: "security (-> notify.security_log -> ~/8-security.log)"
       default: "misc"
       required: false
     emergency:
       default: False
       required: false
     notified_users:
       example: "me,her"
       description: "Send a notification to phone of listed persons"
       default: "me"
       required: false
     media_players:
       description: "Media players TTS to speak through"
       example: kitchen, office
     audio_cue:
       description: "A sound bit to play"
       example: "/local/siren.wav, /local/firealarm.wav, /local/dog.wav"
     image:
       description: "Image to attach."
       example: "/config/tmp/test.jpg"
       default: " "
     led:
       description: "Set the ESP reactive LED to a color and rhythm"
       example: "prio:90 flash:red:3 or prio:55 ttl:15s breathe:yellow or prio:65 ttl:5m breathe:blue"
       required: false
       default: "prio:60 train:orange:120/120:3"
   variables:
     urgent: >
       {% if (emergency is defined and emergency == true) %}
         {{ True }}
       {% else %}
         {{ False }}
       {% endif %}
     mediaplayers: >
       {% if (emergency is defined and emergency == 'true') or (media_players is defined and media_players == 'all') %}
         {{ '["media_player.kitchen","media_player.terrasse_sud","media_player.office","media_player.living","media_player.gaming_room","media_player.kid","media_player.kitchen"]' }}
       {% elif media_players is defined %}
         {{- '[' -}}
           {%- for i in media_players.split(',') -%}
             {% if loop.last%}
               {{- '"media_player.'+i|replace(' ','')+'"' -}}
             {% else %}
               {{- '"media_player.'+i|replace(' ','')+'",' -}}
             {% endif %}
           {%- endfor -%}
           {{- ']' -}}
       {% else %}
         none
       {% endif %}
     logto: >
       {% if log_facility == 'security' %}
         {{ "notify.security_log" }}
       {% elif log_facility == 'errors' %}
         {{ "notify.errors_log"}}
       {% elif log_facility == 'pool' %}
         {{ "notify.pool_log" }}
       {% elif log_facility == 'evcars' %}
         {{ "notify.evcars_log" }}
       {% elif log_facility == 'solar' %}
         {{ "notify.solar_log" }}
       {% elif log_facility == 'energy' %}
         {{ "notify.energy_log" }}
       {% elif log_facility == 'zwave' %}
         {{ "notify.zwave_log" }}
       {% elif log_facility == 'airco' %}
         {{ "notify.airco_log" }}
       {% elif log_facility == 'safety' %}
         {{ "notify.safety_log" }}
       {% elif log_facility == 'misc' %}
         {{ "notify.misc_log" }}
       {% else %}
         {{ "notify.misc_log" }}
       {% endif %}
     img: >
       {% if image is defined %}
         {{ image }}
       {% else %}
         www/none.png
       {% endif %}
   sequence:
     - action: notify.send_message
       target:
         entity_id: "{{ logto }}"
       data:
         message: "{{ message }}"
     - if: "{{ logbook is defined }}"
       then:
         - action: logbook.log
           data:
             name: "{{ log_facility }}"
             domain: "text"
             message: "{{ message }}"
     - if: "{{ led is defined }}"
       then:
         - action: mqtt.publish
           data:
             topic: home/led/cmd
             payload: "{{ led }}"
     - if: "{{ notified_users is defined }}"
       then:
         - if: "{{ 'me' in notified_users and (now().hour not in (1,2,3,4,5,6,7,8) or urgent) }}"
           then:
             - action: shell_command.send_pushover # www/tmp/xyz.png for Pushover
               data:
                 message: "{{ message }}"
                 image: "{{ img }}"
         - if: "{{ 'her' in notified_users and now().hour not in (1,2,3,4,5,6,7,8)}}"
           then:
             - action: notify.mobile_app_iphone # "/local/tmp/xyz.png" for HA notifications
               data:
                 message: "{{ message }}"
                 data:
                   image: "{{ img | replace('www','/local') }}"
         - if: "{{ 'fabian' in notified_users and now().hour not in (1,2,3,4,5,6,7,8)}}"
           then:
             - action: notify.mobile_app_fabian_phone 
               data:
                 message: "{{ message }}"
                 data:
                   image: "{{ img | replace('www','/local') }}"
     - if: "{{ mediaplayers != 'none' }}"
       then:
         - if: "{{ (now().hour not in (1,2,3,4,5,6,7,8) and states('group.homies') == 'home') or urgent }}"
           then:
             - action: tts.amazon_polly_say
               data:
                 entity_id: "{{ mediaplayers }}"
                 message: "{{ message }}"
             - if: "{{ audio_cue is defined }}"
               then:
                 - delay: 10 # Delay for the TTS to finish generating
                 - action: media_player.play_media
                   data:
                     entity_id: "{{ mediaplayers }}"
                     media_content_type: music
                     media_content_id: "{{ audio_cue }}"

And using the “file” integration, I created hooks for my personalized log files.

Okayyy, it’s a bit chunky, but here are some use cases:

     - action: script.uns
       data:
         logbook: true
         log_facility: security
         notified_users: me
         emergency: true
         message: >
           {% for sensor in expand('group.security') -%}
             {% if (now() - sensor.last_changed) < timedelta(seconds=60) -%}
               {{- state_attr(sensor.entity_id, 'friendly_name').partition(' ')[0] + '|' -}}
             {% endif -%}
           {%- endfor %}
         media_players: gaming_room
         audio_cue: /local/dog.wav

It logs to the logbook, but also to a specific file named 8_security.log. So if I’m looking into a precise problem regarding my EV, my cameras or the airco, I can look directly in the proper file. Really handy. The message is also voiced on the gaming room media player, followed here by a dog barking (it’s the alarm system). If the emergency flag is on, I can be notified at night too.
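The routing rules buried in the script's templates are easier to reason about when written out as plain functions. The following Python sketch mirrors them as read from the script above: quiet hours are 01:00 to 08:59, only “me” can be pushed during quiet hours and only when urgent, and TTS plays either outside quiet hours with someone home or whenever the call is urgent. The function names are hypothetical and it takes one user at a time, whereas the script checks for names inside the `notified_users` string.

```python
QUIET_HOURS = range(1, 9)  # hours 1..8, same as the (1,2,3,4,5,6,7,8) tuples


def expand_media_players(spec):
    """Turn 'kitchen, office' into full media_player entity ids."""
    return ["media_player." + name.replace(" ", "") for name in spec.split(",")]


def may_push(user, hour, urgent=False):
    """Phone notification rule: only 'me' can bypass quiet hours, when urgent."""
    if hour not in QUIET_HOURS:
        return True
    return user == "me" and urgent


def may_speak(hour, someone_home, urgent=False):
    """TTS rule: daytime with someone home, or any time when urgent."""
    return (hour not in QUIET_HOURS and someone_home) or urgent


print(expand_media_players("kitchen, office"))  # ['media_player.kitchen', 'media_player.office']
print(may_push("me", 3, urgent=True))           # True
print(may_push("her", 3, urgent=True))          # False
print(may_speak(3, someone_home=True))          # False
```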

Another one, coupled with the Frigate API:

     - repeat:
         for_each:
           - camera: camera.wyze
             entity_id: binary_sensor.living_room_person_occupancy
             where: "living room"
             image: "http://192.168.0.75:5000/api/wyze/latest.jpg?height=1080&quality=95"
           - camera: camera.rox
             entity_id: binary_sensor.pool_person_occupancy
             where: "south terrace"
             image: "http://192.168.0.75:5000/api/rox/latest.jpg?height=1080&quality=95"
           - camera: camera.rookie
             entity_id: binary_sensor.garage_person_occupancy
             where: "garage"
             image: "http://192.168.0.75:5000/api/rookie/latest.jpg?height=1080&quality=95"
           - camera: camera.sagittaire
             entity_id: binary_sensor.terrace_person_occupancy
             where: "north terrace"
             image: "http://192.168.0.75:5000/api/sagittaire/latest.jpg?height=1080&quality=95"
           - camera: camera.tic
             entity_id: binary_sensor.garden_person_occupancy
             where: "garden"
             image: "http://192.168.0.75:5000/api/tic/latest.jpg?height=1080&quality=95"
           - camera: camera.radada
             entity_id: binary_sensor.gate_person_occupancy
             where: "gate"
             image: "http://192.168.0.75:5000/api/radada/latest.jpg?height=1080&quality=95"
         sequence:
           - variables:
               ent: "{{ repeat.item.entity_id }}"
               recent: "{{ (now() - states[ent].last_changed).total_seconds() < 60 }}"
               stamp: "{{ now().strftime('-%H%MT%d%m%Y') }}"
               base: "{{ repeat.item.camera[7:] }}"
           - if: "{{ recent }}"
             then:
               - action: downloader.download_file
                 data:
                   overwrite: true
                   url: "{{ repeat.item.image }}"
                   filename: "{{ base ~ stamp ~ '.jpg' }}"
               - delay: 2
               - action: script.uns
                 data:
                   logbook: true
                   log_facility: security
                   notified_users: me
                   message: "Person detected in {{ repeat.item.where }} by {{ repeat.item.camera }}"
                   image: "{{ 'www/tmp/' ~ base ~ stamp ~ '.jpg' }}"

4 - Checking your containers’ health

Always a pain. Containers are great, until they’re not.
You basically run a mini OS of its own and it’s often hard to know what really happens inside. There are ways, but it’s tricky.

You can list running containers with:
sudo docker ps --format "table {{.ID}}\t{{.Names}}"

And from there, explore what’s happening inside a specific container by opening a bash shell in it:
docker exec -it addon_45df7312_zigbee2mqtt bash

5 - Shrinking your Database size

A classic hurdle in HA is the database size getting out of control.

5.1) Stop the bleeding

This is in recorder.yaml. Make it tighter.

purge_keep_days: 30
exclude:
 domains:
   - updater
   - alarm_control_panel
   - calendar
   - camera
   - climate
   - media_player
   - proximity
   - script
   - sun
   - vacuum
   - weather
   - zone
   - uptime
   - time_date
   - worldclock
 entities:
   - sun.sun
   - sensor.server_temp_cfac_temperature
   - sensor.server_temp_cfac_humidity
   - sensor.sonoff_100153d4ec_voltage
   - sensor.memory_free
   - sensor.memory_use
   - sensor.processor_temperature
   - sensor.memory_use_percent
   - sensor.processor_use
   - sensor.date
   - sensor.time
   - sensor.internet_time
   - sensor.solar_angle
   - sensor.home_assistant_v2_db
   - sensor.galaxy_tab_a7_free_memory
 entity_globs:
   - sensor.load_*
   - sensor.shellyem_*_channel_*_power
   - sensor.shellyem_*_channel_*_voltage
   - sensor.shelly_em_channel_*_power_factor
   - sensor.sonoff_*_power
   - sensor.sonoff_*_current
   - sensor.network_throughput_*
   - sensor.*_last_seen
   - sensor.faikin_*_ble_humidity
   - sensor.faikin_*_ble_temp
   - sensor.teleinfo*
   - sensor.smart_meter_*
   - sensor.faikin_*_humidity
   - sensor.*last_seen
   - sensor.sonoff_*_voltage
   - update.*
   - input_text.*
   - input_boolean.*
   - sensor.ipad*
   - sensor.roborock_s8_pro_ultra*
   - sensor.node_*_node_status
   - sensor.home*
 event_types:
   - service_registered
   - system_log_event
   - call_service

5.2) Clean the mess

This is a bit more dependent on your setup and SQL skills. If you use SQLite, you can add the SQLite Web “app” (add-on) to explore your database. Don’t overclick, queries take time.

The big picture:

SELECT
  SUM(pgsize) bytes,
  name
FROM dbstat
GROUP BY name
ORDER BY bytes DESC

(This one is loooong, count minutes)

This should show you the biggest tables.

Then, likely states and event_data should be your biggest ones.

To dig into them a bit more:

SELECT
  COUNT(*) as cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM events) AS cnt_pct,
  event_types.event_type
FROM events
INNER JOIN event_types ON events.event_type_id = event_types.event_type_id
GROUP BY event_types.event_type
ORDER BY cnt DESC;

SELECT
  COUNT(*) AS cnt,
  COUNT(*) * 100 / (SELECT COUNT(*) FROM states) AS cnt_pct,
  states_meta.entity_id
FROM states
INNER JOIN states_meta ON states.metadata_id=states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
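If clicking through a web UI feels slow, the per-entity count can also be run from Python's built-in `sqlite3` module, ideally against a copy of the database file rather than the live one. A minimal sketch, assuming the `states` and `states_meta` tables used in the query above; the read-only open is a precaution so the script cannot modify anything, and the results come back sorted largest first.

```python
import sqlite3

TOP_ENTITIES_SQL = """
SELECT states_meta.entity_id, COUNT(*) AS cnt
FROM states
INNER JOIN states_meta ON states.metadata_id = states_meta.metadata_id
GROUP BY states_meta.entity_id
ORDER BY cnt DESC
LIMIT ?
"""


def open_read_only(path):
    """Open a database file read-only through an SQLite URI."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def top_entities(conn, limit=20):
    """Return (entity_id, row_count) pairs for the entities with the most state rows."""
    return conn.execute(TOP_ENTITIES_SQL, (limit,)).fetchall()
```

Usage would be along the lines of `top_entities(open_read_only("/path/to/copy.db"))`, where the path is a placeholder for your own copy. Whatever sits at the top of that list is a good candidate for the recorder exclusions.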

For more about this topic, a great post here.

6 - Prompting LLMs to help you debug

I used ChatGPT to health-check my packages and it did an… awesome job.
The only thing to keep in mind is the knowledge cutoff (when the LLM’s training data ended). At the time of writing, I use ChatGPT 5.2, and its knowledge horizon is August 2025. But this varies between models, and I tend to always brief them on the updated syntax, which was introduced in October 2024. See release notes here. Just give it the link and tell it to apply the new syntax (trigger/triggers, service/action, etc.).

(My other posts: 20 things I wished I knew when I started with Home Assistant || Integrate any remote in home assistant || Anti false positive PIR+camera alarm system || Automation from zero to hero || Securing & segregating your home network || Solar panel supervision || Daikin full automation)
