Unstable Home Assistant on Raspberry Pi 3B+ with SSD

Hello Automaters…

I’ve been an HA user for quite a while, and the number of integrations I run has grown substantially.

Over the past 6 months, I’ve noticed recurring problems with HA availability and behaviour. Sometimes HA times out and then recovers on its own after a variable period (10+ minutes). Other times it ‘hangs’: I can connect to the OS but not to HA. During both kinds of outage, HA’s recorded history has gaps, which confirms it isn’t working during those periods.

I’ve connected a WEMO switch so I can power-cycle the Pi remotely and get things up and running again, but this doesn’t help me find the root cause.

I’ve been through the logs, and apart from pages and pages of entries about devices that can’t connect while they’re turned off, I can’t find anything specific such as “HA unavailable” or “HA fatal error”.

I’m looking for help on where to start. Many posts I’ve read suggest disabling integrations one at a time, but I have too many to go down that path. I’d rather find logs that identify the cause, or log on and run some health-check commands directly. As I work for a monitoring company, I’ve pointed one of our monitoring tools at HA to check availability: the drops appear random, about 5 times a day when it hasn’t ‘hung’ outright.
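For anyone wanting to reproduce this kind of availability check without a commercial monitoring tool, here is a minimal sketch in Python. It assumes the HA web frontend is reachable at `http://homeassistant.local:8123/` (the default hostname and port; adjust for your install) and treats any HTTP 200 response within the timeout as “up”. The names `is_up`, `outages` and `monitor` are hypothetical helpers for illustration, not part of Home Assistant. It should run on a different machine from the Pi, so it keeps recording while HA is down.

```python
import http.client
import time
import urllib.request
from datetime import datetime

# Assumption: default HA hostname and port; change to match your setup.
HA_URL = "http://homeassistant.local:8123/"


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the HA web frontend answers with HTTP 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError, timeouts and refused connections are all OSError subclasses.
        return False


def outages(samples):
    """Given time-ordered (timestamp, up) samples, return a (start, end) pair for each
    run of failed checks. `end` is the first successful check after the outage,
    or None if HA was still down at the last sample."""
    result = []
    start = None
    for ts, up in samples:
        if not up and start is None:
            start = ts  # first failed check of a new outage
        elif up and start is not None:
            result.append((start, ts))  # outage ended at this successful check
            start = None
    if start is not None:
        result.append((start, None))
    return result


def monitor(url: str = HA_URL, interval: float = 30.0) -> None:
    """Poll `url` forever, printing a timestamped line whenever HA goes down or comes back."""
    was_up = True
    while True:
        up = is_up(url)
        if up != was_up:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} HA {'UP' if up else 'DOWN'}")
            was_up = up
        time.sleep(interval)
```

Calling `monitor()` prints a line on each state change, and the resulting timestamps can be lined up against the HA and Supervisor logs for the same windows. Polling the frontend rather than the REST API avoids needing an access token, at the cost of not distinguishing “frontend up, core stuck” cases.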

Can anyone advise on how to do some fault finding in HA? Are there any commands I can run, or something I can monitor (such as the Supervisor) to see errors?

I’m running on a Raspberry Pi 3B+ booting from an SSD, with a ConBee II Zigbee stick. Two weeks ago I replaced the PSU with an official Raspberry Pi one. Power draw (measured on the WEMO), CPU temperature, CPU usage and memory usage don’t appear to change much in the lead-up to these outages.