First of all, thanks to all the folks who contributed to this topic. I came here looking to monitor my JBOD’s S.M.A.R.T. data and you people inspired me.
The first issue I wanted to solve was the exit code. @kiwijunglist suggested using this smartctl command:
/usr/sbin/smartctl --info --all --json --nocheck standby /dev/$arg
I have a sick disk in my media server and noticed this was always returning 0. After a bit of investigation I found this command lacked a single character (“x”), as evidenced by the following on my media server:
root@media:/tmp# smartctl --all /dev/sdf >/dev/null ; echo $?
0
root@media:/tmp# smartctl --xall /dev/sdf >/dev/null ; echo $?
68
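With the -x/--xall flag, smartctl prints the extended information (including the extended logs) and, on my disk, returns a non-zero exit status. That status is a bitmask, and each set bit can be decoded using the table in smartctl(8).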
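Decoding the exit bits can be sketched in bash, along the lines of the example in the man page. This is only a minimal sketch: the function name decode_smartctl_status is hypothetical, and the bit descriptions are my abbreviations of the exit status table in smartctl(8).

```shell
#!/usr/bin/env bash
# Print one line per bit that is set in a smartctl exit status.
# decode_smartctl_status is a hypothetical helper name; the descriptions
# are abbreviated from the exit status table in smartctl(8).
decode_smartctl_status() {
    local status="$1"
    local -a meaning=(
        "command line did not parse"
        "device open failed or low-power mode"
        "SMART/ATA command failed or checksum error"
        "SMART status: DISK FAILING"
        "prefail attributes <= threshold"
        "DISK OK, but attributes were <= threshold in the past"
        "error log contains errors"
        "self-test log contains errors"
    )
    local bit
    # Bits 0 through 7 are defined, so test all eight of them.
    for bit in 0 1 2 3 4 5 6 7; do
        if (( status & (1 << bit) )); then
            echo "bit ${bit}: ${meaning[bit]}"
        fi
    done
}

# The status my sick disk returned above: 68 = 4 + 64, so bits 2 and 6.
decode_smartctl_status 68
```

A status of 0 prints nothing, so the same function can double as a quick "anything wrong?" check.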
Using the bash script example from the smartctl(8) man page, I updated the awake script with the following payload data:
NOTE: My naming scheme is:
- JSON: smartctl_$HOSTNAME_$DEV_json
- scripts and sensors: smartctl_$HOSTNAME_$DEV
So the JSON sensor is smartctl_media_sda_json in this example.
{# Extended Status Bits #}
{# range(0,8) covers bits 0 through 7 #}
{% for index in range(0,8) -%}
  {% if states('sensor.smartctl_media_sda_json') | int | bitwise_and(2 ** index) > 0 -%}
    {% set smartctl_exit = {
      0:"command line did not parse",
      1:"open failed or low-power mode",
      2:"checksum error",
      3:"DISK FAILING",
      4:"PRE-FAIL",
      5:"DISK OK but PRE-FAIL",
      6:"Logged ERRORs",
      7:"Self-Test ERRORs"
    } -%}
    "exit_bit_{{ index }}": "{{ smartctl_exit[index] }}",
  {% else -%}
    "exit_bit_{{ index }}": 0,
  {% endif -%}
{% endfor -%}
{# END Extended Status Bits #}
"smartctl_man_page": "https://www.smartmontools.org/browser/trunk/smartmontools/smartctl.8.in"
I’m trying to dig myself out of this rabbit hole, but every time I solve one problem I create another one. I’m still tweaking this config, but thought I’d share my current HA S.M.A.R.T. solution.
The following is a complete config for the /dev/sdf disk on my media host. I have several disks in this server, so I just edit the sda.yaml file in /config/packages/smartctl/media, run a for loop to copy it into place for the other disks, and then do a quick sed -i -e "s/sda/$DEV/g" ${DEV}.yaml. I included the script I’m using to ssh into the remote host and grab the smartctl JSON data, and the crontab entry I have set on my HA server.
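The copy-and-sed step could look roughly like the sketch below. The directory and the sed expression are the ones mentioned above; the function name clone_packages and the PKG_DIR override are hypothetical additions so it can be tried somewhere harmless first, and sed -i is assumed to be GNU sed.

```shell
#!/usr/bin/env bash
# Clone the sda.yaml package for the other disks, swapping the device name.
# PKG_DIR defaults to the path from the post; override it to experiment safely.
PKG_DIR="${PKG_DIR:-/config/packages/smartctl/media}"

# clone_packages is a hypothetical helper; pass it the target device names.
clone_packages() {
    local dev
    for dev in "$@"; do
        # Copy the template package, then rewrite every "sda" to the new device.
        cp "${PKG_DIR}/sda.yaml" "${PKG_DIR}/${dev}.yaml"
        sed -i -e "s/sda/${dev}/g" "${PKG_DIR}/${dev}.yaml"
    done
}

# Example (not run automatically):
#   clone_packages sdb sdc sdd sde sdf
```

Because the substitution is a plain global replace, it assumes the string "sda" only ever appears in the file as the device name.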
You’ll have to excuse the mess, this is still a Work In Progress. I’m trying to make some sense of the logs (among other things).
homeassistant:
  customize:
    sensor.smartctl_media_sdf:
      friendly_name: Media /dev/sdf
      icon: mdi:harddisk
sensor:
  - platform: mqtt
    name: smartctl_media_sdf
    state_topic: 'smartctl/media/sdf/state'
    json_attributes_topic: 'smartctl/media/sdf/attributes'
  #################################################################################################################################
  # crontab entry (user: pi)
  # Run once an hour (minute 0 of every hour)
  # 0 * * * * for DEV in sda sdb sdc sdd sde sdf nvme0n1p2; do /home/pi/docker/homeassistant/bin/get-smartctl-json.sh media "${DEV}"; done
  #
  #################################################################################################################################
  #
  ######################
  # get-smartctl-json.sh
  ######################
  # #!/usr/bin/env bash
  #
  # # Use $HOME here: the shell does not expand a quoted '~'
  # SSHKEY="$HOME/.ssh/smart"
  # RHOST="$1"
  # RUSER=smart
  # DEV="$2"
  # RCOMMAND="sudo /usr/sbin/smartctl --info --xall --json --nocheck standby /dev/$DEV"
  # BASEDIR="/home/pi/docker/homeassistant/config/smartctl"
  # # Both arguments are required
  # [ -z "$RHOST" -o -z "$DEV" ] && { echo "$0 hostname device"; exit 1; }
  # ssh -i "$SSHKEY" "${RUSER}@${RHOST}" "$RCOMMAND" > "${BASEDIR}/${RHOST}/${DEV}.json"
  #
  #################################################################################################################################
  - platform: command_line
    name: smartctl_media_sdf_json
    command: "/bin/cat /config/smartctl/media/sdf.json"
    value_template: "{{ value_json.smartctl.exit_status }}"
    json_attributes:
      - smartctl
      - device
      - model_name
      - serial_number
      - user_capacity
      - smart_status
      - ata_smart_attributes
      - temperature
      - firmware_version
      - ata_smart_self_test_log
      - ata_smart_error_log
    scan_interval: 30
automation:
  - alias: "smartctl_media_sdf"
    trigger:
      - platform: state
        entity_id: sensor.smartctl_media_sdf_json
      - platform: homeassistant
        event: start
    action:
      service_template: >-
        {% if states('sensor.smartctl_media_sdf_json') | int | bitwise_and(2 ** 1) > 0 %} {# exit_bit_1 = sleep, see smartctl (8) #}
        script.smartctl_media_sdf_sleep
        {% else %}
        script.smartctl_media_sdf_awake
        {% endif %}
script:
  smartctl_media_sdf_awake:
    sequence:
      - service: mqtt.publish
        data:
          topic: "smartctl/media/sdf/state"
          # figure out how to define all the bitwise settings
          payload: >-
            {% if states('sensor.smartctl_media_sdf_json') | int > 0 %}
            Sick
            {% else -%}
            Healthy
            {% endif -%}
          retain: true
      - service: mqtt.publish
        data_template:
          topic: "smartctl/media/sdf/attributes"
          # IF YOU HAVE PROBLEMS WITH THE SENSOR YOU CAN COPY+PASTE THE PAYLOAD INTO HOME ASSISTANT TEMPLATE EDITOR
          payload: >-
            {
            "last updated": "{{ states('sensor.date_time') }}",
            "model name": "{{ state_attr('sensor.smartctl_media_sdf_json','model_name') | string }}",
            "serial_number": "{{ state_attr('sensor.smartctl_media_sdf_json','serial_number') | string }}",
            "firmware_version": "{{ state_attr('sensor.smartctl_media_sdf_json','firmware_version') | string }}",
            "device": "{{ state_attr('sensor.smartctl_media_sdf_json','device').name | string }}",
            "device_type": "{{ state_attr('sensor.smartctl_media_sdf_json','device').type | string }}",
            "device_protocol": "{{ state_attr('sensor.smartctl_media_sdf_json','device').protocol | string }}",
            "size": "{{ (state_attr('sensor.smartctl_media_sdf_json','user_capacity').bytes / 1000000000000) | round(2) }} TB",
            "temperature": "{{ state_attr('sensor.smartctl_media_sdf_json','temperature').current | int }}",
            "smart_status":
            {% if states.sensor.smartctl_media_sdf_json.attributes.smart_status.passed -%}
            "Healthy",
            {% else -%}
            "Sick",
            {% endif -%}
            {# ATA SMART Error Log #}
            {% if 'table' in state_attr('sensor.smartctl_media_sdf_json','ata_smart_error_log').extended.keys() -%}
            {% for attr in state_attr('sensor.smartctl_media_sdf_json','ata_smart_error_log').extended.table -%}
            "error_log_number_{{ attr.error_number }}":"{{ attr.error_description }}",
            {% endfor -%}
            {% else -%}
            "ata_smart_error_log":"Not Found",
            {% endif -%}
            {# END ATA SMART Error Log #}
            {# ATA Self Test Log #}
            {%- for attr in state_attr('sensor.smartctl_media_sdf_json','ata_smart_self_test_log') -%}
            {%- if attr == "standard" -%}
            {%- if 'table' in state_attr('sensor.smartctl_media_sdf_json','ata_smart_self_test_log').standard.keys() -%}
            {%- for log in state_attr('sensor.smartctl_media_sdf_json','ata_smart_self_test_log').standard.table -%}
            "self_test_{{ loop.index }}": "{{ log.type.string }}, {{ log.status.string }} @ {{ log.lifetime_hours }} hrs",
            {%- endfor -%}
            {%- endif -%}
            {%- endif -%}
            {%- if attr == "extended" -%}
            {%- if 'table' in state_attr('sensor.smartctl_media_sdf_json','ata_smart_self_test_log').extended.keys() -%}
            {%- for log in state_attr('sensor.smartctl_media_sdf_json','ata_smart_self_test_log').extended.table -%}
            "self_test_{{ loop.index }}": "{{ log.type.string }}, {{ log.status.string }} @ {{ log.lifetime_hours }} hrs",
            {%- endfor -%}
            {%- endif -%}
            {%- endif -%}
            {%- endfor -%}
            {# END ATA Self Test Log #}
            {# ATA Attributes #}
            {% if state_attr('sensor.smartctl_media_sdf_json','ata_smart_attributes').table -%}
            {% for attr in state_attr('sensor.smartctl_media_sdf_json','ata_smart_attributes').table -%}
            {%- if attr.id -%}
            "ID_{{ attr.id }}_{{ attr.name }}":"{{ attr.raw.value | int }}",
            {% else -%}
            "DEBUG_{{ attr }}":"You should never see this.",
            {% endif -%}
            {% endfor -%}
            {% else -%}
            "attributes_table":"Not Found",
            {% endif -%}
            {# END ATA Attributes #}
            "smartctl_id_info": "https://www.backblaze.com/blog/what-smart-stats-indicate-hard-drive-failures/",
            {# Extended Status Bits #}
            {# range(0,8) covers bits 0 through 7 #}
            {% for index in range(0,8) -%}
            {% if states('sensor.smartctl_media_sdf_json') | int | bitwise_and(2 ** index) > 0 -%}
            {% set smartctl_exit = {
            0:"command line did not parse",
            1:"open failed or low-power mode",
            2:"checksum error",
            3:"DISK FAILING",
            4:"PRE-FAIL",
            5:"DISK OK but PRE-FAIL",
            6:"Logged ERRORs",
            7:"Self-Test ERRORs"
            } -%}
            "exit_bit_{{ index }}": "{{ smartctl_exit[index] }}",
            {% else -%}
            "exit_bit_{{ index }}": 0,
            {% endif -%}
            {% endfor -%}
            {# END Extended Status Bits #}
            "smartctl_man_page": "https://www.smartmontools.org/browser/trunk/smartmontools/smartctl.8.in"
            }
          retain: true
  smartctl_media_sdf_sleep:
    sequence:
      - service: mqtt.publish
        data:
          topic: "smartctl/media/sdf/state"
          payload: "Sleep"
          retain: true
And what post is complete without a couple of obligatory screenshots?
- HACS card: custom:auto-entities