TUTORIAL: ESPHome File Logger - Works with HAOS

thargy · October 19, 2024, 4:51pm

Introduction

This is a follow up to ESPHome logger => to a file - ESPHome.

UPDATE: Added log file truncation support to prevent unmanaged log file growth.

Using ESPHome, you can access logs from the AddOn. However, you must leave a running window to capture logs over time. I have multiple devices and rarely have a window open to watch when something unusual happens.

An alternative is to use the esphome command line from my development machine, but, again, I don’t leave the machine running; and why should I when I have a perfectly good Home Assistant server running all the time? Ideally, the ESPHome integration should provide an option to dump log files into the /config folder somewhere. Still, until then, this was my solution - and, most importantly, it works on HAOS!

If you are not using HAOS, there should be more than enough detail in this tutorial for you to tweak it for your environment.

Requirements

To start, you need to have installed the ESPHome Integration, and be able to access the terminal. The easiest way is using the Advanced SSH & Web Terminal.

Next, we need to find the container name for the ESPHome Integration and make a note of it. Mine is addon_5c53de3b_esphome, but yours can be different.

Access the terminal and run the command docker container ls | grep "esphome" to list all the running containers. You should see one line ending with a name like addon_5c53de3b_esphome, though your hex code in the middle is probably different.

Alternatively, you can navigate to Developer Tools->Template and paste the following into the Template Editor: addon_{{ state_attr('update.esphome_update', 'entity_picture') | replace('/api/hassio/addons/','') | replace('/icon', '') }}; however, this may break in future if the paths change.

Create the script

The first step is to create a shell startup script, which can go under the config folder in a folder called shell and must be called esphome_logs_to_file.sh (technically it must contain esphome_logs somewhere in the name). Replace the contents with the following:

#! /usr/bin/env bash

# Reads one line at a time and appends to a file, allowing you to delete the log file, or even the log directory and it will be re-created
safe_log () {    
    #echo "$$" >> .loggers.pid;
    while IFS=$'\n' read -r line; do
        # Create log folder if doesn't exist
        if [ ! -d logs ]; then
            mkdir -p logs;        
        fi
        # Create or append line to file.
        echo -e "$line" >> "logs/$1"
    done < /dev/stdin
}

# Get process ID and cmd line when ps is not available
get_ps () {    
    (for p in $(ls /proc/[0-9]*/cmdline); do
        if [ -f "$p" ]; then
            echo "$p" | echo -n "$(grep -Eo '[0-9]{1,}') ";
            tr '\0' ' ' <  $p;
            echo;
        fi
    done) 2> /dev/null;
}

# Filters processes to those containing "esphome logs" or "esphome_logs" and returns ids
get_filtered () {
    local pids="$( get_ps )";
    echo "$pids" | grep -E "esphome[ _]logs" | grep -Eo '^[0-9]{1,}'
}

log () {
    echo "[$(date -u +"%Y-%m-%dT%H:%M:%SZ")] $1";
}

# Prevent for loop running if no configs found
shopt -s nullglob;

# Enable glob support
shopt -s extglob;

# Log file location for this script
logfile="${0%.*}.log";

# Implement truncate command - this doesn't need to run in docker
if [ "$1" = "truncate" ]; then
    log "Truncating logs" >> $logfile;

    # Get optional number of lines, default 1000
    lines=${2:-1000}

    # Move to esphome directory
    cd /config/esphome/logs >> $logfile;

    # Loop through all log files, removing all but 'lines' lines.
    for file in *.clog; do
        bash -c "sed -i -e :a -e '\$q;N;${lines},\$D;ba' \"${file}\"" &
        log "Truncating ${file} to ${lines} lines." >> $logfile;
    done

    if [ -f "$logfile" ]; then        
        # Truncate our log file too
        bash -c "sed -i -e :a -e '\$q;N;${lines},\$D;ba' \"${logfile}\"" &
        log "Truncating ${logfile} to ${lines} lines." >> $logfile;
    fi

    # Restore directory
    cd - > /dev/null;

    log "Done!" >> $logfile;
    exit 0;
fi

# If we're not running in the docker, now is the time to run in docker daemon
if [ ! "$1" = "**DOCKED**" ]; then
    echo "Logging startup process to ${logfile}";
    log "Starting Daemon in ESPHome Container..." >> $logfile;
    docker exec -t -d addon_5c53de3b_esphome nohup bash -c "$0 **DOCKED** \"$@\" >> \"${logfile}\" 2>&1 &" 0<&- &>/dev/null &
    exit 0;
fi
log "Running as Daemon"

# Kill any existing loggers
log "Scanning processes for loggers...";

pids=$( get_filtered );
for pid in $pids; do
    # Don't kill ourself, also skip any processes that are already dead (usually the one generating the pids)
    if [ ! "$pid" = "$$" ] && [ -f "/proc/$pid/cmdline" ]; then
        kill -9 $pid 2>&1;
        log "Killed $pid";
    fi
done
log "Done!";

# If we were passed the kill command, we're done
if [ "$2" = "kill" ]; then
    exit $?;
fi
log "Starting loggers...";


# Move to esphome directory
cd /config/esphome;

# Get all configurated devices - skippings secrets.yaml if present
for file in !(secrets).yaml; do
    # Look for "esphome:" in file
    if grep -q "esphome:" "${file}"; then
        script -q /dev/null -c "esphome logs \"${file}\" --device OTA" | safe_log "${file%.*}.clog" 2>&1 &                                           
        log "Started logging to file for ${file%.*}";
    fi
done

# Restore directory
cd - > /dev/null;

# Restore options
shopt -u nullglob;
shopt -u extglob;

log "Done!";

IMPORTANT: Change the DOCKER_CONTAINER name to the name of your Esphome container as found above!

Script Explanation

Don’t just paste random scripts from the internet! Instead, you should always check them first. Therefore, here is a step-by-step explanation of what the script does!

The script starts with some utility functions:

safe_log - this takes the output from the loggers and appends it one line at a time to the relevant log file, ensuring it exists first. This means you can delete the log file (and even the log directory, but not the esphome directory) to clear it. You can also empty the contents.

get_ps - the standard addon containers (including Esphome) do not have the ps command, this scans /proc/ returning a PID and command line for each process, similar to ps.

get_filtered - this uses get_ps to get the running processes in the Esphome container, and filters them out to the ones containing esphome logs or esphome_logs in the command line - these are the logging processes (there are multiple per logger to help fool esphome into thinking it is running in a terminal, so that it outputs ANSI colour escape codes).

log - this appends the UTC date and time to the start of our own log messages.

We then implement a truncate command which will truncate all *.clog files in the logs folder, to the last 1000 lines (we can optionally pass in the number of lines to keep)

If we’re not doing a truncate, the script’s first action is to see if it is running as a daemon in the esphome container; if not, it runs itself in the container and exits.

The new script passes the above check and uses the above utilities to scan the container’s processes to see if any existing loggers are running. If there are, it kills them. In this way, any new call is effectively a ‘restart’, and only one logger will be created for each device.

If the script was called with the kill parameter, it will terminate now, allowing us to use the script to stop logging.

Finally, we get to the main functionality: finding each device configuration and spinning up a logger for it. It does this by scanning the /config/esphome directory for .yaml files, skipping any called secrets.yaml (the secrets file from the addon, not a device configuration). It then checks each file for the presence of the “esphome:” string, which indicates it’s a device configuration.

For each configuration file found, it executes a script command in the background, which, in turn, executes an esphome logs command. The command it runs is:

esphome logs config/esphome/${file} --device OTA

This passes in the .yaml configuration file to the esphome logs CLI command.

--device OTA connects wirelessly (this is a change to the CLI since the last topic was written); without this, esphome may pause to ask you how you wish to connect.

Returning to the outer command, script -q /dev/null -c "{{ command }}" > logs/${file%.*}.clog 2>&1 & executes the inner command in a wrapper and streams any output to a .clog file in the /config/esphome/logs directory. We use this wrapper to ensure that esphome thinks it is running in a terminal so that we can capture any ANSI colour characters in the log file - which is why I also use a .clog extension (see below), but you can change that to whatever you want.

2>&1 ensures that any errors in running the command are likewise written to the log file.

The final & launches the script process in the background, allowing multiple instances to be run and ensuring it isn’t terminated automatically (shell commands can run for a maximum of 60 seconds).

Test everything is working

You can now test the script simply by running it:

/config/shell/esphome_logs_to_file.sh

In the shell folder, you should see an esphome_logs_to_file.log file, which contains the script’s logs, and will look something like this:

[2024-10-20T16:05:56Z] Starting Daemon in ESPHome Container...
[2024-10-20T16:05:56Z] Running as Daemon
[2024-10-20T16:05:56Z] Scanning processes for loggers...
[2024-10-20T16:05:56Z] Done!
[2024-10-20T16:05:56Z] Starting loggers...
[2024-10-20T16:05:56Z] Started logging to file for {{ YOUR DEVICE }}
[2024-10-20T16:05:56Z] Started logging to file for {{ YOUR 2ND DEVICE }}
...
[2024-10-20T16:05:56Z] Done!

In the esphome folder, you should also see a logs folder with a .clog file for each device, showing it is connecting:

INFO ESPHome 2024.10.0
INFO Reading configuration ...

You can kill the loggers using:

/config/shell/esphome_logs_to_file.sh kill

And your esphome_logs_to_file.log file should end with something like:

[2024-10-20T16:00:40Z] Starting Daemon in ESPHome Container...
[2024-10-20T16:00:41Z] Running as Daemon
[2024-10-20T16:00:41Z] Scanning processes for loggers...
[2024-10-20T16:00:41Z] Killed 1414
[2024-10-20T16:00:41Z] Killed 1417
[2024-10-20T16:00:41Z] Killed 1418
[2024-10-20T16:00:41Z] Killed 1421
[2024-10-20T16:00:41Z] Killed 1424
[2024-10-20T16:00:41Z] Killed 1426
[2024-10-20T16:00:41Z] Done!

There should be three processes for each logger. Which means everything is working as it should.

Add the shell commands

You can use the shell script to start and stop the loggers, but Home Assistant is all about automation! The next bit is easy if you already have SSH access set up, e.g., through the Advanced SSH & Web Terminal. However, doing that is beyond the scope of this tutorial. Sadly, it is required as shell commands are not given access to docker.

By calling ourself via ssh, we can run a shell command as root, giving us docker access. Here’s an example configuration:

shell_command:
  # Restart logging to file for all esphome devices
  esphome_log_to_file_restart: |-
    ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' 127.0.0.1 -p2222 "/config/shell/esphome_logs_to_file.sh"
  # Stop logging to file for all esphome devices
  esphome_log_to_file_stop: |-
    ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' 127.0.0.1 -p2222 "/config/shell/esphome_logs_to_file.sh kill"
  # Truncate esphome logs - defaults to 1000 lines if {{ lines }} data is not supplied
  esphome_log_to_file_truncate: |-
    bash -c "/config/shell/esphome_logs_to_file.sh truncate {{lines}}"

Note: your public key file (here /config/.ssh/id_rsa) must be in the authorized keys supplied in the Advanced SSH & Web Terminal configuration, and the port number (here 2222) must also match.

You will also notice that our truncate command doesn’t need to use the ssh connection, as it doesn’t need to run inside the docker container.

After adding the above to the configuration and another restart, you should be able to go to Developer->Action and see two new actions: Shell Command: esphome_log_to_file_restart and Shell Command: esphome_log_to_file_stop.

You can test each from the Developer->Action tab.

Finally, we can add an automation to run the restart command every time Home Assistant starts up:

alias: ESPHome - Start File logger
triggers:
  - trigger: homeassistant
    event: start
actions:
  - action: shell_command.esphome_log_to_file_restart

You can also optionally add an automation to truncate the log files periodically:

alias: ESPHome - Truncate logs
description: "Truncate esphome logs every hour"
mode: single
triggers:
  - trigger: time_pattern
    hours: /1
actions:
  - action: shell_command.esphome_log_to_file_truncate
    data:
      lines: 10000

Note: you can change the number of lines to keep for each file, and the frequency of when you want to truncate easily.

Bonus - Colour Logs!

Eagle-eyed readers will notice I used a .clog file extension. You can choose any extension you like, e.g. .log; however, I use VSCode with the VT100 extension. By installing the same extension and adding the following to the VSCode config, I can double click a .clog file to see my esphome logs in full colour

    "workbench.editorAssociations": {
        "*.clog": "vt100.preview"
    },
    "files.associations": {
        "*.clog": "vt100"
    },
    "vt100.synchronous-scrolling": true

Known issues

Sadly, I haven’t found a way to run the scripts whenever the ESPHome addon starts (see this thread). This means that restarts of the addon (which are not common, to be honest, except after updates) will kill the loggers, and require a restart to be issued.
The VT100 extension preview window doesn’t automatically update when changes occur, unless it is attached to a normal editor window, as such you might prefer to use the normal editor window (which still colourized the log, but displays the ANSI codes). If you remove the workbench.editorAssociations setting, you can revert to the standard editor window and open a preview alongside instead.

Protoncek · October 22, 2024, 4:41pm

Thanks a lot for this! It works perfectly. I’m looking for something like this for quite a while, since it will massively help debugging “nasty” modules.

I’m gonna leave it running for a while and see how much space it will take - did you explore this yet? I guess those logs will slowly fill up hdd space?

Only thing i just can’t make it to work is start and stop via HA shell command. So, if/when you’ll have time any kind of more detailed tutorial for this would be great.

thargy · October 23, 2024, 3:42pm

Hi @Protoncek,

Thanks a lot for this! It works perfectly. I’m looking for something like this for quite a while, since it will massively help debugging “nasty” modules.

I’m glad you found it useful, it took a lot of time to get the bash to work with the various limitations put in place by Docker and HAOS.

I’m gonna leave it running for a while and see how much space it will take - did you explore this yet? I guess those logs will slowly fill up hdd space?

This is very much a “how long is a piece of string”, for instance the size of the logs depends on (amongst other things):

How many devices you have
How ‘chatty’ they are
How often they disconnect
What loglevel you have set on each device

This gets worse when you consider it also depends on how much storage space you have! As such, there isn’t one strategy to rule them all, and there isn’t realistically anything to ‘explore’ as my own setup will be unrelated to yours. What I did do is make it really easy to delete, truncate or tail logs without affecting future logging, so that everyone can choose a strategy that best suits them.

However, I’ve updated the above tutorial by adding another subcommand to the shell script, and a new shell command and automation, for anyone that wants to go down that route.

Only thing i just can’t make it to work is start and stop via HA shell command. So, if/when you’ll have time any kind of more detailed tutorial for this would be great.

Mostly likely you are failing to setup the ssh connection correctly. Try running the command line directly from the terminal to see if it works, e.g.

ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' 127.0.0.1 -p2222 "/config/shell/esphome_logs_to_file.sh"

Again, there are so many ways to configure HAOS for ssh, however, I did recommend using Advanced SSH & Web Termina, for which there are already plenty of posts about. It really would be out of scope for this topic.

Note that the location of the public key /config/.ssh/id_rsa shoudl be where you’ve placed your public/private key pair (there should also be a /config/.ssh/id_ras.pub file), and 2222 should be changed to the port you configured - I think by default it might be 22222, so double check. Finally, the addon’s authorized keys should be configured to include the public key.

Protoncek · October 24, 2024, 5:42am

Thanks a lot! I decided i won’t “play” with it anymore, since I’ll be using it only if needed - if any of my modules start to act funny, then I’ll just log in and run it manually - i have advanced ssh&web terminal installed (port 22), i also have HAOS access over putty on port 2222, all this works, so I’ll just leave it this way. I’m not that experienced in these things, so I’d rather not break anything working…

Appr. 1 day of running shows that clog files occupy about 400MB (i have over 30 esphome devices). It also seems that it occupies quite some RAM, since my ram consumption went from appr. 3G to 8GB since script started, and cpu usage also goes up a few % - from 2-3% up to 12-15%. Note that i have “overkill” machine: Intel NUC skull canyon with i7 and 16GB of ram, so on a machine like, say, Pi3 or Pi4 CPU usage can increase more and perhaps even slow down whole HA or clog the RAM.
It’s possible that the cause is not script, though, because i stopped script yesterday evening and ram and cpu are still same. HAOS reboot will probably help, i’ll do it later in the evening when i’ll have time to fix if anything goes wrong. This would need some additional testing… i’ll do it if time will allow…

EDIT: haos reboot solved ram and cpu issue, not it’s back to 2GB ram and 2% cpu usage.