Monitoring Home Assistant - How are you doing it?

jwl173305361 · July 25, 2016, 8:55pm

I have come to rely on Home Assistant so much now that it going down can be critical. (In my mind at least… My wife has another opinion.) Has anyone come up with a good monitoring solution they would like to share with me?

Right now I’m running a script on another server that checks the API. If “API running” isn’t returned, it attempts to restart hass. It then sleeps for 5 minutes and attempts the API call again; if it is running, it logs and exits. If it isn’t running, it emails me the last 50 lines from the logs and quits. I’m happy to sanitize and clean up the script if anyone is interested, but I know there has to be a cleaner way to monitor it. For example, this doesn’t tell me if all the dependencies are met, i.e., whether Wink loaded properly.
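For anyone who wants to try the same approach, here is a minimal sketch of that check, restart, wait, re-check, alert flow. It is not the original script: the address, log path, and service name are assumptions, authentication is left out, and the alert is a `print` placeholder where an email would go.

```python
import json
import subprocess
import time
import urllib.request
from collections import deque

# Assumed values; adjust for your own install.
API_URL = "http://localhost:8123/api/"
LOG_PATH = "/home/hass/.homeassistant/home-assistant.log"
RESTART_CMD = ["sudo", "systemctl", "restart", "home-assistant"]


def api_is_running(url=API_URL, timeout=10):
    """Return True if the API root answers with the 'API running.' message."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # Connection errors, HTTP errors, and unparseable bodies all count as down.
        return False
    return isinstance(body, dict) and body.get("message") == "API running."


def tail(path, n=50):
    """Return the last n lines of a text file as one string."""
    with open(path, errors="replace") as fh:
        return "".join(deque(fh, maxlen=n))


def watchdog(check, restart, notify, wait, read_log):
    """Run one pass: check, restart on failure, wait, re-check, alert if still down."""
    if check():
        return "ok"
    restart()
    wait()
    if check():
        return "recovered"
    notify(read_log())
    return "failed"


def main():
    """Wire the sketch to real actions; meant to be called from cron or similar."""
    result = watchdog(
        check=api_is_running,
        restart=lambda: subprocess.call(RESTART_CMD),
        notify=lambda text: print("ALERT: hass still down\n" + text),  # placeholder for email
        wait=lambda: time.sleep(300),  # the 5 minute pause
        read_log=lambda: tail(LOG_PATH, 50),
    )
    print(result)
```

Passing the actions in as functions keeps the decision logic separate from the side effects, so the flow can be exercised without touching a real install.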

rpitera · July 25, 2016, 10:30pm

The best I can offer at the moment is a Google Docs based uptime monitor, but it may be a start for you…

My post with the links to the script is about midway down the thread.

AlucardZero · July 25, 2016, 10:52pm

The “real” solution is a monitoring program such as Nagios or Zabbix. This is what I am using, as I am a sysadmin by trade. Something much simpler that could work for you is Pingdom or a similar site.

forbin · July 25, 2016, 10:55pm

Monit is a possibility.

jwl173305361 · July 25, 2016, 11:15pm

Thanks everyone for your contributions!

@rpitera - I don’t know how I didn’t see that in my searching before I asked this question! I’ll have to take a look and see if that is better than what I have now.

@AlucardZero & @forbin - Thanks for both of your suggestions, but I’m not looking at standing up anything or spending money with this. I actually work for a very large IT monitoring company, so using the product that I sell is always an option (I won’t add in a shameless URL or name at this point!)

My main question is more around what to monitor, which I wasn’t very clear about now that I read my original post! Right now all I’m looking at is whether or not the API is reported as “running”, but I don’t think that’s the best indicator of whether or not everything is actually working as it should. I gave one example of my Wink hub not getting initialized, but it could be that my Plex server is no longer seen, or my WeMo devices are offline, or my TV, or my Yamaha receiver… I know I could also monitor each of those individual endpoints, but that doesn’t mean that Home Assistant can see them and interact with them properly either.

I think I’m probably looking at a problem that hasn’t been solved yet and that probably doesn’t have an easy solution. Until then I’ll seek out a way to query a running config and then run tests from time to time to make sure it matches what’s actually running.

Thanks again for your responses!

AlucardZero · July 26, 2016, 3:26am

Not a bad idea; what other sorts of things do you want to monitor?

I was thinking of alerting if, for example, any lights are on and I am not home, but I could just do that in HASS.

laf · July 26, 2016, 7:10am

Surely you can extend your current script to do all of this via the API by checking if these devices are still visible and have the required state as you expect?

jwl173305361 · July 26, 2016, 10:29am

Yeah, I can make an API call to query the state of each element and that’s probably what I’ll end up doing. I’m going to work on getting an inventory and then write a script that loops through them on occasion. I don’t want to have to add another step in the process (to forget) of bringing new devices online each time.
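A loop like that might look roughly like the sketch below. The inventory entries and the address are made up for illustration, it assumes the REST endpoint `/api/states/<entity_id>` returns JSON with a `state` field, and authentication is left out.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8123"  # assumed address

# Hypothetical inventory: entity_id -> set of acceptable states (None = any state is fine).
INVENTORY = {
    "lock.front_door": {"locked", "unlocked"},
    "media_player.plex": None,
}


def fetch_state(entity_id, base_url=BASE_URL, timeout=10):
    """Return the entity's state string, or None if it cannot be read."""
    url = "{}/api/states/{}".format(base_url, entity_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8")).get("state")
    except (OSError, ValueError, AttributeError):
        # Unreachable server, 404 for an unknown entity, or an unexpected body.
        return None


def check_inventory(inventory, fetch=fetch_state):
    """Return {entity_id: problem} for every entity that fails its check."""
    problems = {}
    for entity_id, allowed in inventory.items():
        state = fetch(entity_id)
        if state is None:
            problems[entity_id] = "missing"
        elif allowed is not None and state not in allowed:
            problems[entity_id] = "unexpected state: " + state
    return problems
```

Calling `check_inventory(INVENTORY)` returns an empty dict when everything looks healthy, so a cron job only needs to alert when the result is non-empty.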

jwl173305361 · July 26, 2016, 10:37am

Yeah, for your scenario I would recommend writing an automation policy. I have a similar one that looks for unlocked doors and lights on while no one is home. If they are found they lock/power off and send me a text.

That’s exactly the thing I want peace of mind about. I want to feel completely confident that when I’m not getting that text, it is because the doors are locked rather than because Home Assistant has:

Gone offline
Stopped communicating with my locks
aimc · July 26, 2016, 12:46pm

My wife is my monitoring service. When she says “honey, why didn’t the lights come on?” I spring into action and start fixing things.

In all seriousness though, I am using the Google method but for me, homeassistant has been rock solid whenever I am not messing with it!

jwl173305361 · July 26, 2016, 12:58pm

Touché on both!

I get her on board after everything ‘just works’ for a couple of weeks, and then I get the bright idea of changing or adding something and I break it… Then we are back to her hating it and blaming anything wrong in her life on my ‘stupidhome’! If the mailman doesn’t deliver something on time it is the automation’s fault!

Your HA Dashboard project brought a few things down because of my tinkering. Thanks

aimc · July 26, 2016, 1:49pm

It’s all in the positioning … After I let my wife use the stock UI to turn the lights out for a week or two, she was saying “this is really hard to see without my glasses, can you do anything?” - “why of course I can honey, let me see about fixing up that old dashboard we had - you liked that right? Of course there may be a few teething problems with the lights and such but you won’t mind because in the end you’ll have the new dashboard, right?”

jwl173305361 · July 26, 2016, 2:10pm

My wife never even attempted to use the stock HA UI after I got it installed on her phone and throughout the house. She would walk around the house and check each lock and the garage before even trying to learn how it worked. I’m sure that I could have implemented it better, but it just didn’t have the capabilities and look that HA Dashboard does.

Now that I have HA Dashboard running a few different places in the house she’s starting to use it. I think that’s probably the beginning of her seeing the use cases. I think this (https://home-assistant.io/blog/2016/01/19/perfect-home-automation/#read-more) blog hit the nail on the head and really made me rethink how I was implementing my “SmartHome”. I want most things to work via automations and only use the dashboard to override things and/or see states of things.

aimc · July 26, 2016, 2:34pm

Yes, we are on the same path. Although something like the dashboard is necessary and good (and impressive for your friends), the real goal is to never have to use it unless something out of the ordinary happens. Ironically I am doing my best with other projects to render the dashboard unnecessary!

rpitera · July 26, 2016, 2:59pm

Would you mind sharing this with us?

aceat64 · July 26, 2016, 6:26pm

I use Zabbix. I’ve got a VM up on Digital Ocean that runs a few websites and Zabbix.

CCOSTAN · August 3, 2016, 4:25pm

https://uptimerobot.com offers free monitoring and alerting of up to 50 URLs. It also has an open API, so a component could be built to both create the monitor and read its status.

ThinkPad · August 4, 2016, 3:32pm

Isn’t this enough: Auto-start Checklist for systemd ?

And then especially the part:

[quote]The /etc/systemd/system/multi-user.target.wants/service.service file should also contain a line like Restart=always under the [Service] section of the file to enable the service to respawn after a crash
[/quote]

If you installed Home Assistant the systemd way that should be enough, I guess?

For simpler things like scripts I have had good experience with supervisor to automatically start and restart them without any further gimmicks like notifications and such. For Domoticz I have used Monit in the past, which also works fine. It also has support for checking things like memory and CPU usage, which is useful for restarting a process if it hogs your CPU for a long time.

jwl173305361 · August 7, 2016, 11:49pm

This is perfect! Thanks for your help and suggestion. I’ve created about 15 checks that I think give me a very good indication of whether or not my HA instance is up and working properly. It took me less time than it will take to type up how I did it!

How I accomplished it was by looking at the response from the API. I’m using the URL “api/states/<entity_id>” for a single device from each component that I’m loading. When loaded properly the API will return a string that includes the friendly_name, last_state, entity_id, etc. I’m using a keyword check to validate that the actual entity_id is returned. (If it doesn’t load, the result will be “Entity not found”, which I suppose I could have also checked for and alerted on if it appeared.)
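The same keyword logic can be sketched locally in a few lines. This is only an illustration of the idea, not the checks described above: the function names are made up, and it assumes the response body for a missing entity contains the text “Entity not found”.

```python
import urllib.error
import urllib.request


def fetch_body(url, timeout=10):
    """Return (status, body text); HTTP errors such as 404 still yield their body.

    Connection failures are not caught here and raise URLError to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")


def keyword_check(body, entity_id):
    """Classify a response body the way a keyword monitor would."""
    if "Entity not found" in body:
        return "not_found"        # the component did not load this entity
    if entity_id in body:
        return "ok"               # expected keyword is present
    return "keyword_missing"      # something answered, but not with this entity
```

Checking for the error text first means a body that happens to echo the entity name inside an error message is still reported as a failure.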

@ThinkPad - My main concern was not necessarily if the daemon was started. I wanted to know if what I’m expecting to be loaded has in fact loaded and is represented via the API. Like my Wink components, my Nest components, my media center components, my Wemo components, etc…

ThinkPad · August 8, 2016, 8:52am

Clever!

I did the same thing in the period I was running Domoticz. I used Monit to load a URL and check for a certain keyword. If that was missing, the service was not running as it should and it would warn me.