Hya,
is there a way to monitor the mdadm (linux) raid health with HA?
TIA.
Yes.
I use Netdata as one of my system monitoring solutions, and it natively monitors mdstat. There are 4 ways to get those metrics:
- Scrape the web page
- Export to a time series database, then read that directly from HA
- Set up an alert in Netdata, then monitor that alert platform in HA (less CPU usage)
- OR, the easiest, use the local JSON REST API
This is the URL that pulls the number of failed disks:
http://192.168.0.2:19999/api/v1/data?chart=mdstat.mdstat_health&after=-600&before=0&points=1&group=average&gtime=0&format=json&options=seconds&options=jsonwrap
and returns this data which you parse:
{
"api": 1,
"id": "mdstat.mdstat_health",
"name": "mdstat.mdstat_health",
"view_update_every": 600,
"update_every": 1,
"first_entry": 1618544491,
"last_entry": 1618631481,
"before": 1618631400,
"after": 1618630801,
"dimension_names": ["md126"],
"dimension_ids": ["md126"],
"latest_values": [0],
"view_latest_values": [0],
"dimensions": 1,
"points": 1,
"format": "json",
"result": {
"labels": ["time", "md126"],
"data":
[
[ 1618631400, 0]
]
},
"min": 0,
"max": 0
}
The latest_values array gives the current failed-disk count for each mdraid array, in the same order as the names in dimension_names; if any value is anything but 0, you have a problem. dimensions matches the number of RAID arrays you have in mdraid, so use that value as the length of both arrays.
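For anyone who wants to do the parsing outside HA, here is a minimal Python sketch of the idea. The helper names (`build_health_url`, `failed_disks_by_array`, `degraded_arrays`) are made up for illustration and are not part of Netdata or Home Assistant. The sketch assumes the response has the same shape as the JSON above and that the query parameter after `group` is `gtime`.

```python
import json
from urllib.parse import urlencode


def build_health_url(host: str, port: int = 19999) -> str:
    """Build a query equivalent to the URL in the post (hypothetical helper)."""
    params = [
        ("chart", "mdstat.mdstat_health"),
        ("after", -600),
        ("before", 0),
        ("points", 1),
        ("group", "average"),
        ("gtime", 0),
        ("format", "json"),
        ("options", "seconds"),
        ("options", "jsonwrap"),
    ]
    return f"http://{host}:{port}/api/v1/data?{urlencode(params)}"


def failed_disks_by_array(payload: dict) -> dict:
    """Pair each md array name with its current failed-disk count."""
    names = payload["dimension_names"]
    values = payload["latest_values"]
    # "dimensions" is the number of arrays, so both lists should have that length.
    if not (len(names) == len(values) == payload["dimensions"]):
        raise ValueError("unexpected response shape")
    return dict(zip(names, values))


def degraded_arrays(payload: dict) -> list:
    """Return the names of the arrays whose failed-disk count is not 0."""
    return [name for name, failed in failed_disks_by_array(payload).items() if failed != 0]


# Trimmed copy of the response shown above (only the fields used here).
sample = json.loads("""
{
  "dimension_names": ["md126"],
  "latest_values": [0],
  "dimensions": 1
}
""")

print(build_health_url("192.168.0.2"))
print(failed_disks_by_array(sample))
print(degraded_arrays(sample))
```

An empty list from `degraded_arrays` means every array reports 0 failed disks. Fetching the URL (for example with `urllib.request.urlopen`) is left out so the sketch runs without a Netdata agent.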
thx for the answer.
I have Netdata installed, but there is no mdstat on mine. I'm trying to find out why.
It is part of the proc plugin, so it should just read the output of /proc/mdstat if it exists.
Yes, I know, but it doesn't show up.
Solved. I was using 1.9.0; after updating to 1.30.1 it's working. Thx!
Hi @richieframe,
thanks to your help I found the values given by Netdata about the health of the RAID of my QNAP NAS.
My NAS is configured like this:
- hard-disks 1 to 4 are in RAID5;
- the fifth slot is empty;
- the sixth contains an SSD with "cache" functions.
At the moment there are no problems at either the hard-disk level or the RAID level.
I've modified the URL according to my configuration, and the output is below:
{
"api": 1,
"id": "mdstat.mdstat_health",
"name": "mdstat.mdstat_health",
"view_update_every": 600,
"update_every": 1,
"first_entry": 1642872373,
"last_entry": 1642876369,
"before": 1642876200,
"after": 1642875601,
"dimension_names": ["md1", "md3", "md322", "md256", "md321", "md13", "md9"],
"dimension_ids": ["md1", "md3", "md322", "md256", "md321", "md13", "md9"],
"latest_values": [0, 0, 0, 0, 1, 19, 19],
"view_latest_values": [0, 0, 0, 0, 1, 19, 19],
"dimensions": 7,
"points": 1,
"format": "json",
"result": {
"labels": ["time", "md1", "md3", "md322", "md256", "md321", "md13", "md9"],
"data":
[
[ 1642876200, 0, 0, 0, 0, 1, 19, 19]
]
},
"min": 0,
"max": 19
}
I also found on the net that further data can be retrieved with this URL, which returns all metrics from the running Netdata agent, and this is its output for mdstat.mdstat_health:
"mdstat.mdstat_health": {
"name":"mdstat.mdstat_health",
"family":"health",
"context":"md.health",
"units":"failed disks",
"last_updated": 1642929989,
"dimensions": {
"md1": {
"name": "md1",
"value": 0.0000000
},
"md3": {
"name": "md3",
"value": 0.0000000
},
"md322": {
"name": "md322",
"value": 0.0000000
},
"md256": {
"name": "md256",
"value": 0.0000000
},
"md321": {
"name": "md321",
"value": 1.0000000
},
"md13": {
"name": "md13",
"value": 19.0000000
},
"md9": {
"name": "md9",
"value": 19.0000000
}
}
},
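As a rough illustration of reading that structure, here is a small Python sketch. It assumes the fragment above sits inside a larger JSON object keyed by chart name, and `nonzero_dimensions` is a name invented for this example. It only lists the arrays that report a non-zero value; it does not say what those values mean on a QNAP.

```python
import json

# Fragment copied from the output above, wrapped in braces so it is valid JSON.
# Only three of the seven arrays are kept to stay short.
raw = """
{
  "mdstat.mdstat_health": {
    "name": "mdstat.mdstat_health",
    "units": "failed disks",
    "dimensions": {
      "md1": {"name": "md1", "value": 0.0000000},
      "md321": {"name": "md321", "value": 1.0000000},
      "md13": {"name": "md13", "value": 19.0000000}
    }
  }
}
"""


def nonzero_dimensions(all_metrics: dict, chart: str = "mdstat.mdstat_health") -> dict:
    """Return {array name: value} for every dimension whose value is not 0."""
    dimensions = all_metrics[chart]["dimensions"]
    return {name: dim["value"] for name, dim in dimensions.items() if dim["value"] != 0}


print(nonzero_dimensions(json.loads(raw)))
```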
Starting from this point, I would like to set up a template sensor that provides more readable values (like "status ok", "degraded" and so on...).
At first I set up the sensors for Home Assistant like this:
disk1_status:
  data_group: "mdstat.mdstat_health"
  element: "md1"
  icon: mdi:harddisk
disk2_status:
  data_group: "mdstat.mdstat_health"
  element: "md3"
  icon: mdi:harddisk
disk3_status:
  data_group: "mdstat.mdstat_health"
  element: "md322"
  icon: mdi:harddisk
disk4_status:
  data_group: "mdstat.mdstat_health"
  element: "md321"
  icon: mdi:harddisk
All return "0.0 failed disks" except the last one, which reports "1.0 failed disks".
However, I don't understand whether they refer to the status of the individual hard disks or to that of the RAID.
Unfortunately I am not a Linux user (I know only a few basic concepts), but after some help from Google it seems to me that perhaps the md1 sensor refers to the health of the RAID (and that the other "mdx" sensors represent other aspects).
If this were true, then the following sensor would be enough for me:
raid_status:
  data_group: "mdstat.mdstat_health"
  element: "md1"
  icon: mdi:harddisk
Taking into account my limitations, and while waiting to understand these concepts better, I've tried to sketch this proof-of-concept template sensor:
- platform: template
  sensors:
    ts653a_raid_status_readable:
      friendly_name: "ts653a_raid_status_readable"
      value_template: >-
        {% if is_state('ts653a_raid_status', '0.0000000') %}
          OK
        {% elif is_state('ts653a_raid_status', '1.0000000') %}
          KO
        {% else %}
          -unknown-
        {% endif %}
Unfortunately the results are poor, because it always gives me -unknown-.
I'm far away from what I would like to do...
Thanks in advance
I've done it.
It was a trivial syntax error.
I forgot to add the "sensor." prefix, so the template couldn't work.
The right syntax is:
- platform: template
  sensors:
    ts653a_raid_status_readable:
      friendly_name: "ts653a_raid_status_readable"
      value_template: >-
        {% if is_state('sensor.ts653a_raid_status', '0.0') %}
          OK
        {% elif is_state('sensor.ts653a_raid_status', '1.0') %}
          KO
        {% else %}
          -unknown-
        {% endif %}
The result is OK, so it's working.
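For comparison, the same mapping can be sketched in plain Python. This is only an illustration of the logic, not how Home Assistant evaluates the template, and `readable_status` is a made-up name. It compares the state as a number instead of a string, so '0.0' and '0.0000000' are treated the same.

```python
def readable_status(state: str) -> str:
    """Map a failed-disk count (given as a state string) to a readable label."""
    try:
        failed = float(state)
    except (TypeError, ValueError):
        # Covers non-numeric states such as "unknown" or "unavailable".
        return "-unknown-"
    if failed == 0:
        return "OK"
    if failed == 1:
        return "KO"
    # Same fallback as the template: anything other than 0 or 1 is not classified.
    return "-unknown-"


for state in ("0.0", "0.0000000", "1.0", "19.0", "unavailable"):
    print(state, "->", readable_status(state))
```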
Now the point is to understand whether the md1 element of mdstat.mdstat_health represents the health of the RAID or not...
Hi @RobinB,
Unfortunately no.
I'm stuck at the point of my last message.
The sensor works, but I don't know if it is set up correctly.
I don't know Linux well enough to understand the values of those "mdstat" sensors provided by the Netdata Agent.
After asking you yesterday, I went ahead and used the HACS Addon RAID Monitor, which actually works out fine for me. I also set up Netdata Cloud, which alerts on broken disks as well.
Hello,
Could you be more specific?
Which HACS addon are you talking about? I can't find it.
Thanks, I'll take a look.