I have come to rely on Home Assistant so much now that it going down can be critical. (In my mind at least… My wife has another opinion.) Has anyone come up with a good monitoring solution they would like to share with me?
Right now I’m running a script on another server that checks the API. If “API running” isn’t returned, it attempts to restart hass. It then sleeps for 5 minutes and attempts the API call again; if it is running, it logs and exits. If it isn’t running, it emails me the last 50 lines from the logs and quits. I’m happy to sanitize and clean up the script if anyone is interested, but I know there has to be a cleaner way to monitor it. For example, this doesn’t tell me if all the dependencies are met… e.g., whether Wink loaded properly.
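For anyone who wants a starting point, here is a minimal Python sketch of the check / restart / wait / re-check / email flow described above. It is not the original script. The base URL, log path, restart command, and the bearer-token header are all assumptions (older installs authenticated with an API password instead), so adjust them for your setup.

```python
import json
import smtplib
import subprocess
import time
import urllib.request
from email.message import EmailMessage

# All three values below are assumptions; change them to match your install.
BASE_URL = "http://localhost:8123"
LOG_PATH = "/home/hass/.homeassistant/home-assistant.log"
RESTART_CMD = ["sudo", "systemctl", "restart", "home-assistant"]


def api_is_running(base_url=BASE_URL, token=None, timeout=10):
    """Return True if GET /api/ answers with an 'API running' message."""
    request = urllib.request.Request(base_url + "/api/")
    if token:  # assumption: long-lived access token sent as a bearer token
        request.add_header("Authorization", "Bearer " + token)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):  # connection errors, HTTP errors, bad JSON
        return False
    return isinstance(body, dict) and "API running" in str(body.get("message", ""))


def restart_hass():
    """Run the (assumed) restart command; a non-zero exit code is ignored."""
    subprocess.run(RESTART_CMD, check=False)


def tail(path, count=50):
    """Return the last `count` lines of a text file as one string."""
    with open(path, "r", errors="replace") as handle:
        return "".join(handle.readlines()[-count:])


def email_log_tail(sender, recipient, smtp_host="localhost"):
    """Mail the last 50 log lines through an SMTP server (assumed local)."""
    message = EmailMessage()
    message["Subject"] = "Home Assistant did not come back after a restart"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(tail(LOG_PATH))
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(message)


def watchdog(check, restart, notify, wait_seconds=300, sleep=time.sleep):
    """Check once; on failure restart, wait, then check again.

    Returns 'ok', 'recovered', or 'failed' (after calling notify).
    """
    if check():
        return "ok"
    restart()
    sleep(wait_seconds)
    if check():
        return "recovered"
    notify()
    return "failed"
```

From cron you would call something like `watchdog(api_is_running, restart_hass, lambda: email_log_tail("hass@example.com", "me@example.com"))` and log the returned string. The addresses are placeholders.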
The “real” solution is a monitoring program such as Nagios or Zabbix. This is what I am using, as I am a sysadmin by trade. Something much simpler that could work for you is Pingdom or a similar site.
@rpitera - I don’t know how I didn’t see that in my searching before I asked this question! I’ll have to take a look and see if that is better than what I have now.
@AlucardZero & @forbin - Thanks for both of your suggestions, but I’m not looking at standing up anything or spending money with this. I actually work for a very large IT monitoring company, so using the product that I sell is always an option (I won’t add in a shameless URL or name at this point!)
My main question is more around what to monitor, which I wasn’t very clear about now that I read my original post! Right now all I’m looking at is whether or not the API is reported as “running”, but I don’t think that’s the best indicator of whether or not everything is actually working as it should. I gave one example of my Wink hub not getting initialized, but it could be that my Plex server is no longer seen, or my WeMo devices are offline, or my TV, or my Yamaha receiver… I know I could also monitor each of those individual endpoints, but that doesn’t mean that Home Assistant can see them and interact with them properly.
I think I’m probably looking for something that hasn’t been solved yet and that probably has no easy solution. Until then I’ll seek out a way to query a running config and then run tests from time to time to make sure it matches what’s actually running.
Surely you can extend your current script to do all of this via the API by checking if these devices are still visible and have the required state as you expect?
Yeah, I can make an API call to query the state of each element and that’s probably what I’ll end up doing. I’m going to work on getting an inventory and then write a script that loops through them on occasion. I don’t want to have to add another step in the process (to forget) of bringing new devices online each time.
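One way to avoid hand-maintaining that inventory is to snapshot the entity list from the API once and diff later snapshots against it. Here is a rough Python sketch of that idea. It assumes the REST endpoint `/api/states` returns a JSON list of objects with `entity_id` and `state` keys, that a bearer token is accepted, and that a broken device shows up with the state `unavailable`. The function names and entity IDs are made up for illustration.

```python
import json
import urllib.request


def fetch_states(base_url, token, timeout=10):
    """Return the list of state objects from GET /api/states (assumed endpoint)."""
    request = urllib.request.Request(
        base_url + "/api/states",
        headers={"Authorization": "Bearer " + token},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def compare_inventory(baseline_ids, states, bad_states=("unavailable",)):
    """Diff a saved list of entity IDs against a fresh /api/states result.

    missing   - in the baseline but no longer reported by the API
    unhealthy - still reported, but in one of `bad_states`
    new       - reported now but not in the baseline (candidates to add)
    """
    baseline = set(baseline_ids)
    current = {item["entity_id"]: item.get("state") for item in states}
    return {
        "missing": sorted(baseline - set(current)),
        "unhealthy": sorted(e for e in baseline if current.get(e) in bad_states),
        "new": sorted(set(current) - baseline),
    }
```

The `new` list is what saves the extra step: anything that appears there can be appended to the saved baseline automatically, so bringing a device online does not require touching the monitoring script.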
Yeah, for your scenario I would recommend writing an automation policy. I have a similar one that looks for unlocked doors and lights on while no one is home. If they are found they lock/power off and send me a text.
That’s exactly the thing I want peace of mind about: knowing that it is working. I want to feel completely confident that when I’m not getting that text, it is because the doors are locked rather than because Home Assistant has:
I get her onboard after everything ‘just works’ for a couple of weeks and I get a bright idea of changing or adding something and I break it… Then we are back to her hating it and blaming anything wrong in her life on my ‘stupidhome’! If the mailman doesn’t deliver something on time it is the automation’s fault!
Your HA Dashboard project brought a few things down because of my tinkering. Thanks
It’s all in the positioning … After I let my wife use the stock UI to turn the lights out for a week or two, she was saying “this is really hard to see without my glasses, can you do anything?” - “why of course I can honey, let me see about fixing up that old dashboard we had - you liked that right? Of course there may be a few teething problems with the lights and such but you won’t mind because in the end you’ll have the new dashboard, right?”
My wife never even attempted to use the stock HA UI after I got it installed on her phone and throughout the house. She would walk around the house and check each lock and the garage before even trying to learn how it worked. I’m sure that I could have implemented it better, but it just didn’t have the capabilities and look that HA Dashboard does.
Now that I have HA Dashboard running a few different places in the house she’s starting to use it. I think that’s probably the beginning of her seeing the use cases. I think this (https://home-assistant.io/blog/2016/01/19/perfect-home-automation/#read-more) blog hit the nail on the head and really made me rethink how I was implementing my “SmartHome”. I want most things to work via automations and only use the dashboard to override things and/or see states of things.
Yes, we are on the same path. Although something like the dashboard is necessary and good (and impressive for your friends), the real goal is to never have to use it unless something out of the ordinary happens. Ironically I am doing my best with other projects to render the dashboard unnecessary!
https://uptimerobot.com offers free monitoring and alerting of up to 50 URLs. It also has an open API, so a component could be built to both create the monitor and read its status.
[quote]The /etc/systemd/system/multi-user.target.wants/service.service file should also contain a line like Restart=always under the [Service] section of the file to enable the service to respawn after a crash
[/quote]
If you installed Home Assistant the systemd way, that should be enough, I guess?
For simpler things like scripts, I have had good experience with supervisor to automatically start and restart them without any further gimmicks like notifications and such. For Domoticz I have used Monit in the past, which also works fine. It also has support for checking things like memory and CPU usage, which is useful for restarting a process that hogs your CPU for a long time.
This is perfect! Thanks for your help and suggestion. I’ve created about 15 checks that I think give me a very good indication of whether or not my HA instance is up and working properly. It took me less time than it will take to type up how I did it!
How I accomplished it was by looking at the response from the API. I’m using the URL “api/states/entity_id” for a single device from each component that I’m loading. When loaded properly, the API will return a string that includes the friendly_name, last_state, entity_id, etc. I’m using a keyword check to validate that the actual entity_id is returned. (If it doesn’t load, the result will be “Entity not found”, which I suppose I could have also checked for and alerted on.)
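If you would rather run the same keyword check yourself instead of through a hosted monitor, it could look roughly like this in Python. This is only a sketch: it assumes the REST endpoint is `/api/states/<entity_id>`, that a bearer token is accepted, and that a missing entity comes back with an “Entity not found” message as described above. The function names are hypothetical.

```python
import urllib.error
import urllib.request


def body_has_entity(body, entity_id):
    """Keyword check: the entity_id must appear and the error text must not."""
    return entity_id in body and "Entity not found" not in body


def check_entity(base_url, token, entity_id, timeout=10):
    """Fetch one entity's state and apply the keyword check to the raw body."""
    url = "{}/api/states/{}".format(base_url, entity_id)
    request = urllib.request.Request(
        url, headers={"Authorization": "Bearer " + token}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # Error responses (such as a 404 for a missing entity) still have a body.
        body = error.read().decode("utf-8", errors="replace")
    except OSError:
        # Connection refused, timeout, DNS failure: Home Assistant is not reachable.
        return False
    return body_has_entity(body, entity_id)
```

Calling `check_entity` once per component, with one representative entity each, reproduces the “one check per component” setup; any `False` is worth an alert.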
@ThinkPad - My main concern was not necessarily if the daemon was started. I wanted to know if what I’m expecting to be loaded has in fact loaded and is represented via the API. Like my Wink components, my Nest components, my media center components, my Wemo components, etc…
I did the same thing in the period when I was running Domoticz. I used Monit to load a URL and check for a certain keyword. If that was missing, the service was not running as it should and it would warn me.