HassIO Keeps Going Offline

OK, I got it confused with hassbian Home Assistant instead. hassio, as you said, is not very friendly in that case. Lucky for the *nix folks on hassbian, then.

Yes hassio is totally different under the hood to hassbian.

I actually have a similar issue occurring on a non-hassio install, and it definitely has to be a memory leak, though I haven’t spent any time trying to identify it. My issue is unrelated to Pi hardware.

I have two separate LXC containers (one for configuration testing and one for production use) on a 24-core server-class machine. Both are 0.63.1 virtualenv installations in vanilla Debian. Both also have access to 4 CPU cores (speeds up installation/update efforts), 512MB of RAM, and 256MB of swap memory.

The production one only differs in that it has access to the Z-Wave USB stick, so the dev container actually has less to do, with the possible exception of discovery being disabled on the production install. However, the dev container crashes far faster for some reason.

I note that when the crashes occur, the hypervisor reports that one CPU core is at 100%, and the swap is over-allocated, though the memory consumption itself is normal.

The production container will run for weeks or up to a month with no issues, but eventually it crashes. The dev container usually goes out within a week, though I usually only start it up to do some testing, so it’s rare that it stays on that long. Sometimes I’ll see peaks in the swap usage that don’t lead to a crash (it goes down later), but sometimes it grows and eventually fails.

Next time it happens, I’ll see if I can still get into the container (multi-core and normal memory usage might allow it) and see if I can figure out which process is crushing the CPU. Usually I’m too busy trying to get the automation back up to see what’s going on.

Finally checking in again. The dev container is again using all of its swap and the website is not available, but CPU is not high.

Listing processes with swap space usage doesn’t show anything out of the ordinary, other than processes are using swap that shouldn’t need to, like sshd using almost 1MB of swap. There was no history of memory usage over 300MB on 512MB of available memory.
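
For anyone who wants to repeat that per-process swap listing, here is a minimal sketch. It assumes a Linux `/proc` filesystem is readable inside the container and reads the `VmSwap` field of each `/proc/<pid>/status` file. The function names are made up for illustration, and as noted, inside LXC the totals may not account for everything charged to the container.

```python
import os


def parse_status(text):
    """Return (name, swap_kb) from the text of a /proc/<pid>/status file."""
    name, swap_kb = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmSwap:"):
            # The line looks like "VmSwap:      920 kB".
            swap_kb = int(line.split()[1])
    return name, swap_kb


def swap_by_process(proc_root="/proc"):
    """List (swap_kb, pid, name) for every process using swap, largest first."""
    usage = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, swap_kb = parse_status(fh.read())
        except OSError:
            continue  # process exited or is not readable
        if swap_kb > 0:
            usage.append((swap_kb, int(entry), name))
    return sorted(usage, reverse=True)
```

Printing the tuples returned by `swap_by_process()` and summing the first column gives a rough total to compare against the swap figure the hypervisor reports.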

Because I’m using LXC, I can’t see everything in the container that’s using swap (the total listing might have amounted to 10MB on 256MB of assigned swap), so it’s possible something else is causing the issue. That said, Home Assistant containers are the only ones I have that are having swap issues. I’m running Plex and MythTV in LXC containers, among other services, with no issue.

I had to reboot for now, but I’ll keep trying to dig into it. I feel like it might be related to logging. Neither system is set up for verbose logging, but the hass logs are pretty verbose in default mode.

Same problem here (HassOS 1.9).
Changed the power supply and tried 3 SD cards, but still no luck! :frowning:

Anyone had any luck with this? I have been using Hassio for about 3 weeks and this happens about every 3 days. All of my zwave/zigbee automations appear to still be functioning, but I completely lose network access. The Pi does not show up in my router either. I am using Ethernet.

Is anyone aware of any logs that will persist through a reboot so I can try to narrow down the issue?

Edit:
I did add some sensors for memory use and CPU the last time I had this issue. Interestingly, memory use stayed the same and CPU usage dropped drastically when it occurred.

Blue line is CPU. Around 10:30 AM it drops to 3 and doesn’t go up until I perform a hard reset of my Pi 3+.
[image: CPU usage graph]

Second post for the second image.
The free memory percentage didn’t seem to change until I performed a reboot. The spike is the percent free after my reboot.

I set up some additional sensors around swap usage and disk usage to ensure nothing crazy is happening there. I also disabled discovery, as I have a lot of Chrome devices and see some errors about those from time to time in the HA log.

And another instance of this.
[image]
I completely lose network connectivity, which prevents anything that is not zwave/zigbee from working. I have also stumbled upon what appear to be times when all my zigbee devices fail to function until a reboot of hassio. This is extremely frustrating. I love Home Assistant and the power it has, but I can’t keep doing this if it becomes unreliable multiple times a week. Hopefully someone with more knowledge can chime in on what logging options I have. If something would survive a reboot, I feel like it would get me much closer to finding out what the real problem is.

So after logging did not help, I searched the web for others mentioning an issue with the Pi in general losing its network config. I found a mention (can’t find it now) of an issue where the DHCP lease time seemed to play a role. That rang a bell, because my lease time is 3 days (the default, and not configurable on my router). This lined up almost exactly with the times I was having the issue. A workaround many had posted was to give the Pi a static IP. I had already done this via the router, but it seemed that when the lease was up, I would lose the Pi. After following the info here: https://docs.resin.io/reference/OS/network/2.x/#setting-a-static-ip I seem to have eliminated this issue. I am now at two weeks of uptime without losing network connectivity. Now to tackle my random zigbee failures.

TL;DR
The DHCP lease time may have been killing the Pi’s IP. Setting a static IP on the Pi itself seems to have resolved this issue.

Hi,
I am dealing with this issue right now.
It had been working correctly without getting stuck for 2 months, and now it gets stuck quite frequently.
I have not changed anything; all automatic updates are disabled.
The RPi is connected via cable, i.e. when I unplug the cable there is no connection to it.
The behaviour is as described by other people: suddenly there is no connection, the router does not see it any more, the RPi is on, restarting the router does not help, and only a hard restart of the RPi does.
I am also not able to find any logs. Are they deleted after a hard restart? It would be useful to see what was going on before it got stuck.
What can change if I do not change anything: SD card? Power supply? The RPi itself? My internet provider updated the router, so maybe it is this DHCP lease time? I have not tried replacing them yet.
I have just tried deleting the DB (it was already 500 MB, but that is still not so big!). We will see.
Any other ideas are welcome.

Mine as well. For the past week I have been seeing failures. I tried using a new SD card, flashing and reloading; nothing worked. Once, when it went offline, I pulled the logs and saw that memory was full. I removed some components and moved to the prerelease 2.3 HassOS version, and it seems OK for now.


Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/entity.py", line 221, in async_update_ha_state
    await self.async_device_update()
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/entity.py", line 349, in async_device_update
    await self.hass.async_add_executor_job(self.update)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/sensor/command_line.py", line 99, in update
    self.data.update()
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/sensor/command_line.py", line 175, in update
    command, shell=shell, timeout=self.timeout)
  File "/usr/local/lib/python3.6/subprocess.py", line 336, in check_output
    **kwargs).stdout
  File "/usr/local/lib/python3.6/subprocess.py", line 403, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/local/lib/python3.6/subprocess.py", line 1275, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Out of memory
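
Note that the traceback ends inside `_execute_child`, so the error is raised while the child process for the command_line sensor is being spawned, before the command itself ever runs. As a rough sketch of what that failure looks like from the calling side (this is not Home Assistant's code; `run_sensor_command` and its `runner` parameter are hypothetical names used only for illustration):

```python
import errno
import subprocess


def run_sensor_command(command, timeout=15, runner=subprocess.check_output):
    """Run a shell command and return its output as text.

    Returns None when the child process could not be started because
    the system is out of memory (errno 12, ENOMEM).
    """
    try:
        return runner(command, shell=True, timeout=timeout).decode().strip()
    except OSError as err:
        if err.errno == errno.ENOMEM:
            # The kernel could not allocate memory to spawn the child.
            return None
        raise  # any other OS error is unexpected here
```

Catching the error only hides the symptom. The host still needs more free memory, or fewer components competing for it.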

Prathik_Gopal, where can I find these logs?

In my case, after removing home-assistant_v2.db from the config folder, it is now working fine. Additionally, I added some purging settings to configuration.yaml:
recorder:
  purge_keep_days: 10
  purge_interval: 10
I hope it will clear the DB automatically.

It should be in the config folder, named home-assistant.log. You may need to configure the logger component for certain logs to appear.

For more clarity see here: Logger component
Mine is as below. Using the log levels in the component will help you narrow down certain issues or get a deeper trace of them.

logger:
  default: warning
  logs:
    homeassistant.components.automation: info
    homeassistant.components.mqtt: info
    homeassistant.components.sensor: info

I have the same experience with hassio: after a few days it goes offline. I started with a brand new RPi 3B+ with a new SD card. I started out enthusiastic, installing HUE, Trådfri, Tellstick and ConBee, and all seemed to be fine until it went offline. I tried an RPi 3B with different cards and a 3B+ with different cards. I tried setting up openHAB2 and Domoticz on the same cards and RPis and they seem to work fine. I have had Domoticz running for years without sudden death, and currently have three Pis running Domoticz/UnifiPI/openHAB2 without problems. The only one crashing is Hassio. Any clues? Should I install Raspbian and HA instead of using Hassio?

Had same issue with HASSIO and having same issue with hassbian 


Any idea?

There’s your problem.

How do you get those logs?
I mean, when the RPi and hassio get stuck, I can only do a brute restart via the power supply, and then I lose all the logs: home-assistant.log is empty and starts anew.
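
One possible way around the log being reset on startup is to copy it somewhere else at regular intervals while the system is still running, so that the last copy survives the hard restart. Below is a minimal sketch of that idea. The function name and the `keep` limit are made up for illustration, the paths are whatever applies to your install, and how to schedule it (cron or similar) depends on the install type and is not covered here.

```python
import shutil
import time
from pathlib import Path


def snapshot_log(log_path, backup_dir, keep=20):
    """Copy the current log to a timestamped file and prune old copies.

    Returns the path of the new copy, or None if there was nothing to save.
    """
    log_path, backup_dir = Path(log_path), Path(backup_dir)
    if not log_path.is_file() or log_path.stat().st_size == 0:
        return None  # log missing or empty, nothing worth saving
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{log_path.stem}-{stamp}{log_path.suffix}"
    shutil.copy2(log_path, target)
    # Timestamped names sort chronologically, so the oldest come first.
    snapshots = sorted(backup_dir.glob(f"{log_path.stem}-*{log_path.suffix}"))
    for old in snapshots[:-keep]:
        old.unlink()
    return target
```

For example, `snapshot_log("home-assistant.log", "log-backups")` run every few minutes from the config folder would leave the most recent copies in `log-backups`, assuming the device stays alive long enough to write them to the SD card.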

Just wanted to throw another update on this. Been a few months and this issue completely went away after setting the static IP on the PI itself. The fact the issue occurred almost exactly when my DHCP lease was up was pretty telling.

Are there any friendly (for lazy people) instructions on how to set a static IP on the Pi itself?

You can find it all here: https://github.com/home-assistant/hassos/blob/dev/Documentation/network.md