Home assistant unresponsive

My Home Assistant is not reachable on :8123, and my automations are not running either. I can see the machine on the network and ping it via IP, but it has no hostname and connecting via ssh also fails.

I am running from an SSD, so it’s not an SD-card problem. I also had the same problem before I switched to the SSD, so I don’t think it is the SSD drawing too much power or something like that. Besides, I tried to follow all the best practices with my SSD setup.

Do you have any ideas what could be causing the problem or how to debug further?

This is very annoying, as I would like to use Home Assistant for my security setup when I’m away, and otherwise to monitor the state of my home, open the gate when I come back, etc. But during my last two vacations it became unresponsive after a week or so.

When you reboot, does it come back? If so, have you considered an automation to reboot the system every couple of days? I know this is not a fix, but it might help until you can figure out the problem. I reboot my machine (a VM on Ubuntu Server) once a week via automation.

Are you getting voltage spikes? Perhaps a UPS would fix it?

Yeah, so far it’s always come back when I reboot. I haven’t done so yet because I thought it might be helpful to leave it in its current state to investigate (but I will do so soon anyway).

Do you know if there is an easy way to see/track uptimes either within home-assistant itself or from a separate system via IP? I’d like to log when & how often it happens.
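For tracking this from a separate system via IP, one option is a small script on another always-on machine that polls port 8123 and logs only the UP/DOWN transitions. This is a rough sketch, not a tested monitoring tool: the function names, the log file name and the 60-second interval are all made up for illustration, and a TCP connect only shows that something is listening on the port, not that Home Assistant is fully working.

```python
import socket
import time
from datetime import datetime, timezone


def is_reachable(host, port=8123, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def record_transition(previous, current, log_path):
    """Append a timestamped UP/DOWN line when the state changes; return current."""
    if current != previous:
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(f"{stamp} {'UP' if current else 'DOWN'}\n")
    return current


def monitor(host, port=8123, interval=60.0, log_path="ha_uptime.log"):
    """Poll forever; the first poll always logs the initial state."""
    state = None
    while True:
        state = record_transition(state, is_reachable(host, port), log_path)
        time.sleep(interval)
```

Calling `monitor("192.168.1.50")` (a placeholder address) would then leave a log with one line per outage start and one per recovery, which should show when and how often it happens.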

I can try the scheduled restart, but I think it has to be triggered from within Home Assistant (the only easy way, I think, since I run on a Raspberry Pi). That will only help if the problem really shows up only after X days or is purely a network problem. It will not help at all if it’s a crash that occurs randomly with some probability…

Regarding voltage spikes: I live in Europe (the Netherlands now, Switzerland before) and I have never heard of those being a problem for anything.

Below is how I reboot my system. I use a garbage collection sensor to pick the days, but you can use a time condition as well. The odd timing is to ensure other automations have finished or not yet started.

- id: 'XXXXXXXXXXX'
  alias: Reboot HA Core
  description: Reboot home assistant at 11:57:19 PM
  trigger:
  - platform: time
    at: '23:57:19'
  condition:
  - condition: state
    entity_id: sensor.boot_hacore
    state: '0'
  action:
  - service: homeassistant.restart
  mode: single

I was burning up Z-Wave devices frequently. I installed surge protection on the main and sub panels and that has gone away. I also have a UPS on all servers and switches, which further conditions the power. It might not be the problem, but it could be, especially if you are close to a substation. I am less than 1 km from a major substation. Voltage and current spikes can cause major problems. Clean, conditioned power is a prerequisite.

Here is a good general discussion on spikes

Here is an integration for monitoring uptime.

And if you set the following in configuration.yaml:

template:
  - trigger:
      - platform: time
        at: '00:00:00'
      - platform: event
        event_type: event_template_reloaded
      - platform: homeassistant
        event: start
    sensor:
      - name: "My Uptime"
        state: >-
          {% set boot = as_timestamp(states('sensor.ha_uptime'))|int %}
          {% set duration = as_timestamp(utcnow())|int - boot %}
          {{ timedelta(seconds=duration) }}

This is what it looks like in Developer Tools. It will show days once the uptime is long enough.

Thanks for the replies here, but after some more research it seems that what I really want for a robust solution is a “hardware watchdog”. The Raspberry Pi has one built in, so I’ll try to enable that.
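To illustrate the idea behind a hardware watchdog: the hardware runs a countdown timer and resets the board unless some software keeps "petting" it. The sketch below uses the standard Linux watchdog device and pets it only while port 8123 still answers. It rests on several assumptions: a plain Linux install where `/dev/watchdog` is exposed and can be opened as root, a driver that supports "magic close", and a hardware timeout comfortably longer than the 5-second interval. The function names are invented for the example, and this is not necessarily how the watchdog would be enabled on a Home Assistant OS install.

```python
import socket
import time


def ha_responds(host="127.0.0.1", port=8123, timeout=3.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def feed_while_healthy(device_path="/dev/watchdog", check=ha_responds, interval=5.0):
    """Pet the watchdog after each passing check; return the number of pets."""
    feeds = 0
    # Opening the device arms the timer on typical Linux watchdog drivers.
    with open(device_path, "wb", buffering=0) as watchdog:
        while check():
            watchdog.write(b"\0")  # any write resets the countdown
            feeds += 1
            time.sleep(interval)
    # Assumption: the driver supports magic close. Closing without first
    # writing b"V" then leaves the timer running, so the board is expected
    # to reset once the countdown expires.
    return feeds
```

The important property is that a full system hang also stops the petting, so the board resets even when nothing inside Home Assistant can run a restart automation anymore.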