Mdadm raid 5 monitor

richieframe · April 17, 2021, 4:00am

Yes.

I use Netdata as one of my system monitoring solutions, and it natively monitors mdstat. There are 4 ways to get those metrics into HA:

Scrape the web page
Export to a time series database, then read that directly from HA
Set up an alert in Netdata, then monitor that alert platform in HA (less CPU usage)
Or, the easiest, use the local JSON REST API

This is the URL that pulls the number of failed disks:

http://192.168.0.2:19999/api/v1/data?chart=mdstat.mdstat_health&after=-600&before=0&points=1&group=average&gtime=0&format=json&options=seconds&options=jsonwrap

It returns this data, which you then parse:

{
   "api": 1,
   "id": "mdstat.mdstat_health",
   "name": "mdstat.mdstat_health",
   "view_update_every": 600,
   "update_every": 1,
   "first_entry": 1618544491,
   "last_entry": 1618631481,
   "before": 1618631400,
   "after": 1618630801,
   "dimension_names": ["md126"],
   "dimension_ids": ["md126"],
   "latest_values": [0],
   "view_latest_values": [0],
   "dimensions": 1,
   "points": 1,
   "format": "json",
   "result": {
      "labels": ["time", "md126"],
      "data": [
         [1618631400, 0]
      ]
   },
   "min": 0,
   "max": 0
}

The latest_values array gives the current failed disk count for each mdraid array, in the same order as the names in dimension_names. If any value is anything but 0, you have a problem. dimensions is the number of RAID arrays you have in mdraid, so use that value as the length of both arrays.
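
If you want to test the parsing outside of HA first, here is a minimal Python sketch of it. It assumes the response has the shape shown above. The function names (failed_disks_by_array, degraded_arrays, fetch_payload) are just names I made up for illustration, and the host and port in the URL are from my setup, so change them to match yours.

```python
import json
from urllib.request import urlopen

# URL from this post; 192.168.0.2:19999 is specific to my setup (assumption for yours).
NETDATA_URL = (
    "http://192.168.0.2:19999/api/v1/data"
    "?chart=mdstat.mdstat_health&after=-600&before=0&points=1"
    "&group=average&gtime=0&format=json&options=seconds&options=jsonwrap"
)


def failed_disks_by_array(payload):
    """Pair each md array name with its current failed disk count."""
    names = payload["dimension_names"]
    values = payload["latest_values"]
    # Both lists should have one entry per array, i.e. payload["dimensions"] entries.
    if len(names) != len(values):
        raise ValueError("dimension_names and latest_values differ in length")
    return dict(zip(names, values))


def degraded_arrays(payload):
    """Return the names of arrays whose failed disk count is not 0."""
    return [name for name, count in failed_disks_by_array(payload).items() if count != 0]


def fetch_payload(url=NETDATA_URL, timeout=5):
    """Fetch and decode the JSON response from Netdata (needs a reachable Netdata host)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Trimmed copy of the response shown above, so the parsing can be tried offline.
SAMPLE = {
    "dimension_names": ["md126"],
    "latest_values": [0],
    "dimensions": 1,
}

print(failed_disks_by_array(SAMPLE))
print(degraded_arrays(SAMPLE))
```

Against a live Netdata instance you would call degraded_arrays(fetch_payload()) instead of using SAMPLE. An empty list means every array is healthy.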