Supervisor disconnected, can't access Add-Ons, Can't reboot Host

I have been having issues for a while now, but haven’t had the chance to look into them in more detail.
I’ve been running Home Assistant OS on an Intel NUC for almost 2 years and never had any issues.
Now I cannot access the Add-ons anymore. When I go to Settings → Add-ons, I get “Could not load the Supervisor panel!” after a while. I am currently working through the 5 troubleshooting steps. The Observer on port 4357 says “Observer: Disconnected”.
A reboot of the host actually works for a while, but after a day the same errors occur. I also have to force a shutdown manually, because rebooting through Settings → System → Hardware no longer works (“Failed to get available hardware. Unknown error, see supervisor logs”).
When I connect via ssh I can log in, but there’s no command prompt, only this message:

"Welcome to the Home Assistant command line.

Waiting for Supervisor to startup..."

Loading the full Supervisor logs (System → Logs) gives “Failed to get supervisor logs, 502: Bad Gateway”.

Since Home Assistant never failed on me before, I don’t have much experience with troubleshooting.
Is there anything else I can do to get to the bottom of the problem?
Any help is appreciated!

Hi tzippy,

It seems I have the same or a comparable problem, running an HA Green after migrating from a VM on a QNAP NAS. Even restarting HA doesn’t work: I can select the restart, but it doesn’t actually restart. After a hard restart it runs fine for some hours and then shows the problems you mention:

  • No access to Add-on page
  • No access to Observer
  • No access to Supervisor
  • No access to Hardware Info
  • No possibility for a software restart
  • After a hardware restart everything is fine for some hours

Configuration:

  • HA Green
  • Core: 2024.9.1
  • Supervisor: 2024-09.1
  • OS: 13.1
  • Frontend: 20240906.1

Any guesses from the community before digging into the logs?
Is there a way to trace where and when the system starts to break down?
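
One rough way to pin down when things break, as a sketch only: poll the Observer page from another machine and log each status change with a timestamp. The address `homeassistant.local`, the log file name, and the idea that the page contains the word “Disconnected” when unhealthy are assumptions, not something confirmed here; adjust them to what your Observer page actually shows.

```python
import time
import urllib.request
from datetime import datetime

# Assumption: the Observer page on port 4357 is plain HTML and contains
# the word "Disconnected" when something is wrong.
OBSERVER_URL = "http://homeassistant.local:4357"  # hypothetical address


def classify(html):
    """Return 'disconnected' if the page text mentions it, else 'ok'."""
    return "disconnected" if "disconnected" in html.lower() else "ok"


def check_once(url=OBSERVER_URL, timeout=10):
    """Fetch the Observer page once and classify the result."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.read().decode("utf-8", errors="replace"))
    except OSError:
        # URLError and socket timeouts are both OSError subclasses.
        return "unreachable"


def watch(interval=60, logfile="observer_watch.log"):
    """Append a timestamped line to logfile whenever the status changes."""
    last = None
    while True:
        status = check_once()
        if status != last:
            with open(logfile, "a", encoding="utf-8") as fh:
                fh.write(f"{datetime.now().isoformat()} {status}\n")
            last = status
        time.sleep(interval)
```

Calling `watch()` on a laptop or another always-on machine leaves a small log whose first “disconnected” entry can be compared against scheduled jobs such as backups.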

Hi Joachim,

Thanks for sharing! Glad to see I’m not alone.
Maybe it’s not that uncommon.

I’d install the Glances Add-On and open a view to it right after a restart. Make sure you turn off Protection Mode on it so it can access your system. Look over all of the %CPU, Disk, Network and Memory values to get an idea of how everything looks when it’s running OK. Leave that screen up and as you get closer to the time it becomes unstable see if you can find any processes or containers that start using excessive resources.

Great idea. I already have Glances, actually. Will do a screen recording.

Glances is a great tool, thanks for the tip - but didn’t help me here because I didn’t notice any suspicious activity (I can’t stare at the screen for hours until my eyes water :slight_smile: ).

But here is what seems to be the cause: I mapped a network folder via the internal SAMBA share option and scheduled a daily backup routine. The backup routine got stuck and the backups were corrupt. So I deleted the connection and now everything is fine. Local backups are OK and HA is running without flaws.

I will give an NFS connection a try and see what happens, because storing backups only locally makes little sense. Creating a script that copies the backup to an external device via sftp or similar should only be a fallback scenario. Let’s see.
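
For the fallback idea, here is a minimal sketch of what such a copy script could look like. It uses `scp` rather than sftp, assumes backups are `.tar` files in `/backup`, and the remote target is a made-up placeholder. Where it would run (for example an SSH add-on with Python available) is also an assumption. The timeout is there so a stuck transfer fails instead of hanging.

```python
import subprocess
from pathlib import Path


def newest_backup(backup_dir):
    """Return the most recently modified .tar file in backup_dir, or None."""
    tars = sorted(Path(backup_dir).glob("*.tar"), key=lambda p: p.stat().st_mtime)
    return tars[-1] if tars else None


def build_scp_command(src, remote):
    """Build the scp argument list; remote looks like 'user@host:/path/'."""
    return ["scp", "-p", str(src), remote]


def copy_newest(backup_dir="/backup", remote="backup@nas.local:/volume1/ha/"):
    """Copy the newest backup to the (hypothetical) remote target.

    Raises if there is no backup, if scp fails, or if it exceeds the timeout.
    """
    src = newest_backup(backup_dir)
    if src is None:
        raise FileNotFoundError(f"no .tar backups found in {backup_dir}")
    subprocess.run(build_scp_command(src, remote), check=True, timeout=600)
    return src
```

Because the copy runs separately from the backup job itself, a hanging NAS only affects this script and not the scheduled backup.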

@tzippy : Maybe you have a similar setup?

Hello Joachim,

I have been running stable since yesterday, with a screen recording of Glances going the whole time.
But I guess I can delete that now, since what you wrote is exactly what I have been doing as well. I mapped my local NAS and scheduled a backup (it never managed to complete a successful backup, though).
Once I am home I will unmap that drive and try the NFS connection as well.
Again, thank you so much for that hint!

Tzippy