Yes.
I use Netdata as one of my system monitoring solutions, and it natively monitors mdstat. There are four ways to get those metrics into HA:
1. Scrape the web page
2. Export to a time series database, then read that directly from HA
3. Set up an alert in Netdata, then monitor that alert platform in HA (less CPU usage)
4. Or, easiest of all, use the local JSON REST API
This is the URL that pulls the number of failed disks:
http://192.168.0.2:19999/api/v1/data?chart=mdstat.mdstat_health&after=-600&before=0&points=1&group=average&gtime=0&format=json&options=seconds&options=jsonwrap
and returns this data which you parse:
{
"api": 1,
"id": "mdstat.mdstat_health",
"name": "mdstat.mdstat_health",
"view_update_every": 600,
"update_every": 1,
"first_entry": 1618544491,
"last_entry": 1618631481,
"before": 1618631400,
"after": 1618630801,
"dimension_names": ["md126"],
"dimension_ids": ["md126"],
"latest_values": [0],
"view_latest_values": [0],
"dimensions": 1,
"points": 1,
"format": "json",
"result": {
"labels": ["time", "md126"],
"data":
[
[ 1618631400, 0]
]
},
"min": 0,
"max": 0
}
The latest_values array gives the current failed disk count for each mdraid array, in the same order as the names in dimension_names. If any value is anything other than 0, you have a problem. dimensions equals the number of RAID arrays you have in mdraid, so use that value as the length of both arrays.