System crashes almost every day. Can someone help me understand the logs?

If you have

upnp:

in your configuration.yaml, try commenting it out.

I don’t have UPnP enabled in my configuration.yaml, but I don’t know what the UI does in this regard. Each time Home Assistant reboots, it discovers a lot of devices that I did not connect.
Is there a way to disable this in the UI?

I have an RPi 4 (4 GB) as well, using docker-compose to run a stack of:

  • Home Assistant Core
  • Eclipse Mosquitto
  • Zigbee2MQTT
  • Watchtower (to auto update)

Integrations are weather and MQTT. Controlling 35 devices / 120 entities so far.

Runs smoothly 24/7, memory at 12%, load below 0.05.

What OS are you running?

Raspbian, based on Debian.

Hmmm, try this:

discovery:
  ignore:
    - upnp

TIL that “Core” (now) refers to the Home Assistant Core application running on Python. I’ve been using the term all this time to describe what is now called Home Assistant Container. Sorry.

So, I run Home Assistant Container with Raspbian (on a Raspberry Pi) or Ubuntu (on a NUC-like x86 machine) as the OS. Both installations run without any memory or load issues.

Memory history on the x86 over the last 4 weeks:

Memory history on the RPI over the last 4 weeks:

(The RPi was set up recently, and frequent configuration changes, including restarts of HA, are being applied.)

10% memory saved by adding:

discovery:
  enable:
    - homekit

I just did that to enable only HomeKit devices and disable all others. You can also try disabling the whole “discovery” feature.

When I add this to my config, the checker gives an error.


But I don’t see how UPnP is related to Hass.io crashing every day.
Today I got a crash again: a memory drop, and then a few minutes later the system was down.

Just adding myself to this issue.
Running HassOS on an HA Blue

Add-ons:
MariaDB, ESPHome, Google Drive Backup, SSH & Web Terminal, Samba, VS Code
Integrations:
DSMR, HACS, HomeKit, Nest, OpenWeatherMap, Pi-hole, RFXCOM, Ring, Sonos, Ubiquiti UniFi, WLED, Yeelight, Z-Wave and ZHA (with ConBee II)

I also connect to InfluxDB, which is running on another server. I cannot identify exactly when this started happening, but the memory and CPU creep has become very noticeable lately:

Also, in my case, this chart shows numerous restarts as I try to pinpoint the problem. I’ve been disabling integrations one at a time, but so far no luck. The only changes over the last week were the (mandatory) updates.

The logs don’t give me anything to work with as far as I can tell, but I am really hoping to find a better way to troubleshoot this, as the “disable integration, monitor for several hours, restore snapshot” process is not really sustainable.

edit: just to note that discovery is disabled on my system.

Hi,

I’m sadly out of suggestions (Spotcast has been on my fishy list lately).

My system has stabilized after moving MQTT out to another server, doing a fresh install (not from backup, and I have about 400 entities…), booting from an SSD, and moving to an 8 GB RPi (although 4 GB would be more than fine).

I’ve seen one worrying spike: a jump of about 10% that dropped back to the previous level within a couple of minutes. But I’m roughly stable at around 1 GB of RAM in use (13% of my 8 GB).

A couple of things you could try, no guarantees.
SSH into your Raspbian system:

grep error /var/log/syslog.1

This will show system errors for yesterday; maybe there is something of interest there. syslog.2 would be the day before, and so on (older files are usually compressed, e.g. syslog.2.gz, in which case use zgrep instead of grep).
Regarding memory issues, note that Linux keeps recently used file data cached in RAM, so “used” memory can look high even though that memory is instantly available if needed. If you want to see better detail on what is using memory and CPU, use htop:

sudo apt-get install htop
htop

This may help see any issues, YMMV, good luck.
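
To put a number on the difference between memory that looks used and memory that is actually unavailable, here is a minimal sketch that reads MemTotal and MemAvailable from /proc/meminfo and prints the share of RAM really in use. The function name mem_used_percent is made up for illustration, and the sketch assumes a kernel that reports MemAvailable (3.14 or newer); without that field it would wrongly report 100.

```shell
#!/bin/sh
# Hypothetical helper (not part of Home Assistant or Raspbian).
# Prints the percentage of RAM in use, counting reclaimable cache as available.
# Optional argument: a meminfo-style file, defaulting to /proc/meminfo.
mem_used_percent() {
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0) printf "%d\n", (total - avail) * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

Running mem_used_percent from cron every few minutes and appending the result to a file would leave a record that survives a crash.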

I have a Pi 4 that is now two months old. I’m relatively new to this, so I have a limited number of add-ons and integrations, and a lot of free RAM and disk space.

Home Assistant crashes at least once a week, and so far only during the night.

Last night, after 6 days, I had exactly the same strange “semi crash” for the second time:

  • No automations ran this morning.
  • But I could connect from my PC to my Pi and log in. In the Overview menu, I could even set some lights manually.
  • But I could not open any menu other than Overview. I received the following error message when trying to open other menus, such as the logging menu, the Node-RED menu, the Supervisor menu …

image

  • So I powered my Pi off and on.
  • After the reboot, I could see in the History that the temperature measurements had stopped at 02:00 last night, for both my KNX and my Z-Wave devices.
  • After a reboot, the log file is always empty, which makes it difficult to find the cause. It would be better if the log file retained the information for 5 days (by appending).

I’ve been having this for a while, and I don’t know why.

The only workaround (sort of) that I’ve found is an automation to reboot the Pi every night at 2am.

I’ve recently been having issues with losing access to HASSIO. It’s running on a Raspberry Pi 4 with a branded Raspberry Pi power supply (so that shouldn’t be the issue). Previously I was running a VM on an ESXi server and it was (mostly) rock solid, but I needed to move my Home Assistant to a more central spot, as I was running ZHA and wanted to make sure the supervisor reached all parts of the house. Also, the USB passthrough started to get a bit flaky on the server.

Crashes happen every few days and I’m not sure what’s causing them. I lose connection to the front end, so the only thing I can do is pull the power and restart it. Accessing the logs hasn’t been exactly fruitful. The HASSIO log is lost each time I reboot the device, so that’s no help. I’ve tried accessing /var/log/, but the folder seems to be empty. I’m not really great with Linux, so I’m kind of short of ideas. I’m not sure how to SSH into the Pi to see whether it’s the host that is no longer active or just HASSIO itself.

How do I turn on logging at the host level? Is there a way to keep the HASSIO logs and not have them blown away each time the device reboots?

At this point, I’m tempted to run a separate bare-bones Home Assistant instance that will monitor my main Home Assistant and, if it doesn’t receive a response, use a WiFi switch to turn it off and on … but I suspect that’ll cause more problems in the end than it solves. I’d rather have the root cause fixed.

Any help would be greatly appreciated.

Hi, I am no expert… but here’s a couple of answers

“I can only access certain pages, some automations don’t work, etc”

  • It seems your RAM is maxed out. This was the case for me. It still “works a bit”, but most functions don’t.
  • Only solution: restart.

“My log files are erased after each restart”

  • You are right - and right to be frustrated. I found that VERY annoying.
  • Solution: automation to copy the old log file before it gets erased. Here’s how:

Create a folder called “external_data” under your config folder.
If you don’t know how, install the “File editor” add-on, which you can find by searching under “Supervisor”.
Add this to configuration.yaml:

shell_command:
  # Copy the current log to external_data, with a timestamp (YYYYMMDDHHMM) in the file name.
  backup_logs: cp /config/home-assistant.log /config/external_data/log-`date +"%Y%m%d%H%M"`.log

This is where the old logs will be copied.
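
Because every run of backup_logs writes a new timestamped file, the external_data folder keeps growing. Below is a minimal sketch of a cleanup helper, assuming the find on your system supports -delete (GNU and BusyBox builds normally do). The function name prune_old_logs and the default of five days are illustrative choices, not something Home Assistant provides.

```shell
#!/bin/sh
# Hypothetical helper: delete log copies older than a given number of days.
# $1 = folder holding the copies, $2 = age limit in days (default 5).
prune_old_logs() {
    dir=$1
    days=${2:-5}
    # Only touch files that follow the log-<timestamp>.log naming scheme;
    # -mtime +N matches files last modified more than N full days ago.
    find "$dir" -type f -name 'log-*.log' -mtime +"$days" -delete
}
```

For example, prune_old_logs /config/external_data 5 would keep roughly the last five days of copies and leave every other file in the folder alone.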

Then create an automation

  • you can add this to “automations.yaml”
- alias: Backup Log file
  trigger:
    platform: homeassistant
    event: shutdown
  action:
    service: shell_command.backup_logs

or if you prefer through the interface:

Using Windows to fetch the log
There’s still the problem that if your system is half crashed, you most likely won’t be able to shut it down, so the automation above won’t run and the logs won’t be saved. Here is what I also did (it didn’t work every time, but sometimes it worked even in a half-crashed state):
Install “Samba” to access the files from Windows and copy the log file before restarting.

Under Supervisor, install “Samba share”.
Don’t start it yet. Add this to “Configuration”, found in the top navigation:


Edit it to set your own username and password; no quotes needed.

workgroup: WORKGROUP
username: type_your_own_username_instead_of_this
password:  type_your_own_password_instead_of_this
interface: ''
allow_hosts:
  - 10.0.0.0/8
  - 172.16.0.0/12
  - 192.168.0.0/16
  - 'fe80::/10'
veto_files:
  - ._*
  - .DS_Store
  - Thumbs.db
  - icon?
  - .Trashes
compatibility_mode: false

Start the add-on.
Now, on your PC:
Open an Explorer window (the shortcut is the Windows key + “E”).
Type this:

\\homeassistant\

image
If it doesn’t work, replace homeassistant with the IP address of your system.
image
(That’s mine. Yours will almost certainly be a different IP address.)
Type in your username and password.

Use a proper text editor (Notepad is an abomination for this).
Everybody loves the free “Notepad++”.


BE CAREFUL
Bear in mind that this is your live system. If you delete or change files in there, your system might not restart, so act with care. I never delete a file there without making a backup first (OK, the log files can be deleted without problems…).

Me too: my system stalls every couple of days. Pi 4 (4 GB), Raspbian, running Core on Docker with an additional container for Portainer and future use. 14 integrations and a couple from HACS. I can’t blame anything except myself for taking the “Emergency” upgrades, replacing WireGuard with Tailscale, and making a few other config changes all in one day.
I’ve started top running in an SSH session, and I’m thinking I’ll run tail on a couple of the log files from an always-on machine (one that doesn’t stall out).
If I cannot see anything in the logs after the next lock-up, I plan to either disable all the HACS components at once or take them out one at a time.
I’m also interested in determining whether Python is running or not. I’ve never tried scripting Unix, and so far I haven’t been able to get a clear answer even from systemctl.

If anyone knows how to check Python health and where the Hass.io log files are, I’d like to know. Good luck, all.
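
One rough way to see whether the Python process is still alive around a lock-up is to record a timestamped sample at regular intervals. The sketch below is an illustration built on assumptions: log_sample is a made-up name, the default pattern homeassistant assumes the process command line contains that word (as it would when Core is started with python3 -m homeassistant), and /proc/loadavg is Linux-specific.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped health sample to a file.
# $1 = output file, $2 = pattern to look for in process command lines
#      (default "homeassistant", which is an assumption about your setup).
log_sample() {
    out=$1
    pattern=${2:-homeassistant}
    # pgrep -f matches against the full command line of every process.
    if pgrep -f "$pattern" >/dev/null 2>&1; then
        state=running
    else
        state=missing
    fi
    # /proc/loadavg starts with the 1, 5 and 15 minute load averages.
    load=$(cut -d ' ' -f 1-3 /proc/loadavg 2>/dev/null)
    printf '%s process=%s load=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$state" "$load" >> "$out"
}
```

Something like while true; do log_sample /home/pi/ha-health.log; sleep 60; done (the path is only an example) leaves a file whose last lines show what the host looked like just before it stalled.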

After deleting the database and restarting, everything was sweet

For about a week

For the last few days, it now crashes and hangs after about an hour

I’ve not changed anything to cause this

When it crashes, I cannot access any logs, and it’s a 50/50 chance whether automations and switches still work

The only fix is a hard restart

Unfortunately for me, this means the WAF is now below zero, and my day off on the weekend is going to be spent un-smarting the entire house

Hello there,

For more than two years I have been experiencing this kind of issue with the RPi 4 (I don’t remember whether I had it on the RPi 3).
The only fix I found, which has worked well for 8 months, was to move the MQTT Docker container to my NAS.
I hope that helps someone.
Maybe next week I’ll reinstall MQTT on the RPi to see the difference.

++