How do I detect errors, notably in discovery?

I’ve got Z-Wave running on a Raspberry Pi and HA running in a VM on Hyper-V, and generally it works fine. Sometimes, especially when both power up at about the same time, they fail to start correctly, and I may lose all the Z-Wave devices. Occasionally something else goes wrong, e.g. with Cast devices or the network itself, which can take out a discovered sensor.

I would like to monitor for problems somehow and get an alert. I don’t necessarily need HA to send the alert; I could monitor from a script on another server (e.g. I have a Zabbix network monitor).

I see the RESTful API, and I guess I could code a check in an external system for every sensor’s existence or value. That seems tedious in the extreme.
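For what it’s worth, the checks don’t have to be per-sensor: Home Assistant’s REST API returns every entity’s state from a single `GET /api/states` call, authenticated with a long-lived access token. A minimal Python sketch, assuming HA is reachable at a URL such as `http://homeassistant.local:8123` (a placeholder) and that treating `unavailable`/`unknown` as “problem” states suits your setup; some entities legitimately sit at `unknown`, so you may need to filter further:

```python
import json
import urllib.request

# States treated as problems; this set is an assumption, adjust to taste.
PROBLEM_STATES = {"unavailable", "unknown"}


def fetch_states(base_url: str, token: str, timeout: float = 10.0) -> list[dict]:
    """Return every entity's state object from GET /api/states in one request."""
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/states",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def problem_entities(states: list[dict]) -> list[str]:
    """Sorted entity IDs whose current state is in PROBLEM_STATES."""
    return sorted(
        s["entity_id"] for s in states if s.get("state") in PROBLEM_STATES
    )
```

Usage would be something like `problem_entities(fetch_states("http://homeassistant.local:8123", token))`; an empty list means nothing currently reports as unavailable or unknown.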

HA seems to know every entity it has seen before in its integrations, and I think I can adjust that list if I actually remove something.

Is there an existing integration or tool, or a way to create a template sensor, that does this in a more automated fashion? Basically, for HA to monitor itself and notice if there are problems? There’s a System Health tool, which I guess feeds (or is) the API, but it’s unclear to me how to act on it other than “look at the info panel”.

I don’t much care whether I can get detailed low-level error info automatically; basically I want a “hey, come check HA, something is going wrong” message. Specifically by email, but again, being able to query from a simple script on another system would be even better.
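The “come check HA” email itself only needs the Python standard library. A minimal sketch using `smtplib`, assuming a local SMTP relay that accepts mail without authentication; the host name and addresses are placeholders, and a relay that requires a login would also need `starttls()` and `login()` calls:

```python
import smtplib
from email.message import EmailMessage
from typing import Optional


def build_alert(problems: list[str], sender: str, recipient: str) -> Optional[EmailMessage]:
    """Build a short alert email, or return None when there is nothing to report."""
    if not problems:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Home Assistant: {len(problems)} problem(s) detected"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        "Come check HA, something is going wrong:\n\n" + "\n".join(problems) + "\n"
    )
    return msg


def send_alert(msg: EmailMessage, smtp_host: str, smtp_port: int = 25) -> None:
    """Hand the message to an SMTP relay (assumed not to require authentication)."""
    with smtplib.SMTP(smtp_host, smtp_port) as smtp:
        smtp.send_message(msg)
```

Separating message construction from sending keeps the “is anything wrong?” decision testable without a mail server; only `send_alert` touches the network.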

Am I missing something obvious?

Doesn’t seem to hold much interest, but I’ll follow up with what I’ve done.

I use Zabbix as a network monitoring system; it’s free, and I use it for my home network as well as for clients.

I used Home Assistant’s RESTful API to monitor entity states, and Zabbix lets me know when an entity appears or disappears. It also alerts me if it can’t reach Home Assistant at all, so it detects a failed server too.
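The new/disappeared comparison boils down to a set difference against a saved baseline. A rough Python illustration of the idea, not the Perl script itself: `fetch_entity_ids` is any callable returning the current entity IDs (for example built on `GET /api/states`), and the baseline file path is hypothetical. It prints nothing itself; it returns one line of text, which suits a Zabbix external check, since Zabbix uses the script’s output as the item value.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def status_line(baseline: set[str], current: set[str]) -> str:
    """Summarise entities missing from, or new relative to, the baseline."""
    missing = sorted(baseline - current)
    new = sorted(current - baseline)
    if not missing and not new:
        return "OK"
    return f"CHANGED missing={','.join(missing) or '-'} new={','.join(new) or '-'}"


def run_check(fetch_entity_ids: Callable[[], Iterable[str]], baseline_path: Path) -> str:
    """One-line status for a monitoring system such as Zabbix."""
    try:
        current = set(fetch_entity_ids())
    except OSError as exc:
        # urllib's URLError and socket timeouts subclass OSError,
        # so an unreachable HA server ends up here.
        return f"UNREACHABLE {exc}"
    if not baseline_path.exists():
        # First run: record whatever HA currently reports as the expected set.
        baseline_path.write_text(json.dumps(sorted(current)))
        return f"BASELINE {len(current)} entities recorded"
    baseline = set(json.loads(baseline_path.read_text()))
    return status_line(baseline, current)
```

After intentionally removing a device, deleting the baseline file makes the next run re-record it, which plays the role of the manual adjustment described in the question.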

Here’s an example from when I disabled some Z-Wave sensors I wasn’t using:

If any other Zabbix users are out there and want it, I can share the template and the associated external check script (in Perl).

Next I’ll set it up to monitor the HA environment itself, e.g. disk, CPU, etc. I like the idea of keeping the monitoring of HA outside of HA.

FWIW.