HA Stops about every 2nd day - RPi2 (SSH = ok)

I have a strange issue that I need to find the cause of.

My HA setup (Python virtual environment) on an RPi2 has been running for some months, but suddenly it stops about every second day. The Pi still responds to SSH, and I can reboot it to get everything up and running again. But I need to find a solution.

I have checked home-assistant.log, it contains just 3 entries of warnings:
2018-09-26 03:08:29 WARNING (Recorder) [homeassistant.components.recorder] Ended unfinished session (id=2 from 2018-09-24 00:17:18.547353)
2018-09-26 08:25:21 WARNING (MainThread) [homeassistant.components.zwave] Z-Wave not ready after 19002 seconds, continuing anyway
2018-09-26 23:06:25 WARNING (MainThread) [asyncio] socket.send() raised exception.

Is there another place to look?.. :slight_smile:
I have configured it to start up with:

Is there a log-location for other things?
Thanks!

You might try

systemctl status [email protected]

when it fails, to see the exit code, but I expect, judging from the last message in the log, that it is a component not handling an error case correctly.

You might get more information if you increased the logging level in the logger component

Ah, I look forward to the next fail… :slight_smile: Thanks for the advice!

It has failed again now; systemctl status reports:

[email protected] - Home Assistant
   Loaded: loaded (/etc/systemd/system/[email protected]; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Fri 2018-10-12 18:39:22 CEST; 1 day 3h ago
  Process: 392 ExecStart=/srv/homeassistant/bin/hass -c /home/homeassistant/.homeassistant (code=killed, signal=SEGV)
 Main PID: 392 (code=killed, signal=SEGV)

Oct 12 18:02:32 automator hass[392]: 2018-10-12 18:02:32 INFO (MainThread) [homeassistant.helpers.script] Script Solnedgång: Executing step call service
Oct 12 18:02:32 automator hass[392]: 2018-10-12 18:02:32 INFO (MainThread) [homeassistant.helpers.script] Script Solnedgång: Executing step call service
Oct 12 18:02:32 automator hass[392]: 2018-10-12 18:02:32 INFO (MainThread) [homeassistant.helpers.script] Script Solnedgång: Executing step call service
Oct 12 18:02:32 automator hass[392]: 2018-10-12 18:02:32 INFO (MainThread) [homeassistant.helpers.script] Script Solnedgång: Executing step call service
Oct 12 18:12:13 automator hass[392]: 2018-10-12 18:12:13 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ios/identify to 192.168.55.198 (auth: True)
Oct 12 18:23:37 automator hass[392]: 2018-10-12 18:23:37 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ios/identify to 192.168.55.198 (auth: True)
Oct 12 18:32:54 automator hass[392]: 2018-10-12 18:32:54 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ios/identify to 192.168.55.198 (auth: True)
Oct 12 18:39:22 automator systemd[1]: [email protected]: Main process exited, code=killed, status=11/SEGV
Oct 12 18:39:22 automator systemd[1]: [email protected]: Unit entered failed state.
Oct 12 18:39:22 automator systemd[1]: [email protected]: Failed with result 'signal'.

The script Solnedgång ("sunset" in Swedish) has been working for 2 weeks. It turns on some lights at sunset.

Where should I look to find more details on the “error”? Or am I not seeing it in the log above? :slight_smile:

The home-assistant.log just shows some errors from days when HA was working fine:

2018-10-09 17:59:32 WARNING (MainThread) [homeassistant.components.http] legacy_api_password support has been enabled. If you don't require it, remove the 'api_password' from your http config.
2018-10-09 17:59:36 WARNING (Recorder) [homeassistant.components.recorder] Ended unfinished session (id=3 from 2018-10-01 12:48:27.737395)
2018-10-09 18:10:25 ERROR (MainThread) [frontend.js.latest.201809270] :0:0 Script error.
2018-10-09 18:10:27 ERROR (MainThread) [homeassistant.core] Timer got out of sync. Resetting
2018-10-11 18:42:14 ERROR (MainThread) [frontend.js.latest.201809270] :0:0 Script error.

Thanks!

//Sam

SEGV is a memory access violation, but searching reported errors for something like this reveals this issue, which seems to have been fixed by replacing the SD card, so I think that should be your next step.
https://github.com/home-assistant/home-assistant/issues/8079
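
If the SD card turns out not to be the cause, it may help to find out what Python was doing when the signal arrived. A minimal sketch, assuming you can run a few lines before Home Assistant starts: Python's standard `faulthandler` module can dump the traceback of every thread when the process receives SIGSEGV. The helper name `enable_crash_traceback` and the path `/tmp/hass-crash.log` are made up for illustration, not something Home Assistant provides.

```python
import faulthandler
import signal


def enable_crash_traceback(path):
    """Dump Python tracebacks to `path` if the process gets a fatal signal.

    `path` is a hypothetical log location. The returned file object must
    stay open for as long as the handler is enabled.
    """
    crash_log = open(path, "w")
    # Installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
    faulthandler.enable(file=crash_log, all_threads=True)
    return crash_log


if __name__ == "__main__":
    # "status=11/SEGV" in the systemd output is signal number 11 (SIGSEGV).
    print("SIGSEGV =", int(signal.SIGSEGV))
    log = enable_crash_traceback("/tmp/hass-crash.log")  # hypothetical path
    print("faulthandler enabled:", faulthandler.is_enabled())
```

The same handler can be switched on without touching any code by setting the environment variable `PYTHONFAULTHANDLER=1` for the process; the traceback then goes to stderr, which should end up in the journal. The traceback only shows the Python frames that were active, so it points at a suspect component rather than proving the cause.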

Ah! That may be a good point.
It just hung again:

pi@automator:~ $ systemctl status [email protected]
● [email protected] - Home Assistant
   Loaded: loaded (/etc/systemd/system/[email protected]; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Sun 2018-10-14 06:49:04 CEST; 10h ago
  Process: 5463 ExecStart=/srv/homeassistant/bin/hass -c /home/homeassistant/.homeassistant (code=killed, signal=SEGV)
 Main PID: 5463 (code=killed, signal=SEGV)

Oct 14 06:48:18 automator hass[5463]:   File "/srv/homeassistant/lib/python3.5/site-packages/aiohttp/web_response.py", line 367, in _start
Oct 14 06:48:18 automator hass[5463]:     await writer.write_headers(status_line, headers)
Oct 14 06:48:18 automator hass[5463]:   File "/srv/homeassistant/lib/python3.5/site-packages/aiohttp/http_writer.py", line 110, in write_headers
Oct 14 06:48:18 automator hass[5463]:     self._write(buf)
Oct 14 06:48:18 automator hass[5463]:   File "/srv/homeassistant/lib/python3.5/site-packages/aiohttp/http_writer.py", line 67, in _write
Oct 14 06:48:18 automator hass[5463]:     raise ConnectionResetError('Cannot write to closing transport')
Oct 14 06:48:18 automator hass[5463]: ConnectionResetError: Cannot write to closing transport
Oct 14 06:49:04 automator systemd[1]: [email protected]: Main process exited, code=killed, status=11/SEGV
Oct 14 06:49:04 automator systemd[1]: [email protected]: Unit entered failed state.
Oct 14 06:49:04 automator systemd[1]: [email protected]: Failed with result 'signal'.

I’ll move on to a new SD card and a backup… :wink: :scream:

//Sam

Okay, it is happening again… I have moved to a new SD card.
Same message again.
When I moved, I cloned the card with Win32DiskImage. I did not get any errors during the process, but could I also have copied the corruption?

I have also done some testing on my old SD card: I filled it up with data several times and was able to read it back without any errors (using H2testw)… So maybe the SD card is not the issue.

Any more thoughts?

Also, I'm still able to SSH in and start HA again.

● [email protected] - Home Assistant
   Loaded: loaded (/etc/systemd/system/[email protected]; enabled; vendor preset: enabled)
   Active: failed (Result: signal) since Wed 2018-10-17 22:56:06 CEST; 9min ago
  Process: 395 ExecStart=/srv/homeassistant/bin/hass -c /home/homeassistant/.homeassistant (code=killed, signal=SEGV)
 Main PID: 395 (code=killed, signal=SEGV)

Oct 17 21:53:07 automator hass[395]: 2018-10-17 21:53:07 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ios/identify to 192.168.55.198 (auth:
Oct 17 22:03:05 automator hass[395]: 2018-10-17 22:03:05 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ios/identify to 192.168.55.198 (auth:
Oct 17 22:07:39 automator hass[395]: 2018-10-17 22:07:39 INFO (MainThread) [homeassistant.components.updater] Submitted analytics to Home Assistant servers. Infor
Oct 17 22:07:39 automator hass[395]: 2018-10-17 22:07:39 INFO (MainThread) [homeassistant.components.updater] You are on the latest version (0.80.0) of Home Assis
Oct 17 22:14:33 automator hass[395]: 2018-10-17 22:14:33 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ios/identify to 192.168.55.198 (auth:
Oct 17 22:24:06 automator hass[395]: 2018-10-17 22:24:06 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ios/identify to 192.168.55.198 (auth:
Oct 17 22:31:38 automator hass[395]: 2018-10-17 22:31:38 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ios/identify to 192.168.55.198 (auth:
Oct 17 22:56:06 automator systemd[1]: [email protected]: Main process exited, code=killed, status=11/SEGV
Oct 17 22:56:06 automator systemd[1]: [email protected]: Unit entered failed state.
Oct 17 22:56:06 automator systemd[1]: [email protected]: Failed with result 'signal'.
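
All three status outputs in this thread end with the same systemd line, so the crash times can be pulled out and compared. A rough sketch, assuming the journal lines have been saved as text (for example from `journalctl`); the function name `find_crashes` is made up, and the year has to be passed in because this journal format does not print it. The sample lines are copied from the outputs above.

```python
import re
from datetime import datetime

# Matches the systemd line that reports the crash, for example:
# "Oct 17 22:56:06 automator systemd[1]: <unit>: Main process exited, code=killed, status=11/SEGV"
CRASH_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ systemd\[\d+\]: "
    r".*Main process exited, code=killed, status=(?P<num>\d+)/(?P<name>\w+)"
)

# Crash lines copied from the three status outputs in this thread.
SAMPLE = [
    "Oct 12 18:32:54 automator hass[392]: 2018-10-12 18:32:54 INFO (MainThread) ...",
    "Oct 12 18:39:22 automator systemd[1]: [email protected]: Main process exited, code=killed, status=11/SEGV",
    "Oct 14 06:49:04 automator systemd[1]: [email protected]: Main process exited, code=killed, status=11/SEGV",
    "Oct 17 22:56:06 automator systemd[1]: [email protected]: Main process exited, code=killed, status=11/SEGV",
]


def find_crashes(lines, year):
    """Return (timestamp, signal number, signal name) for each crash line."""
    crashes = []
    for line in lines:
        match = CRASH_RE.match(line)
        if match:
            stamp = "{} {}".format(year, match.group("stamp"))
            when = datetime.strptime(stamp, "%Y %b %d %H:%M:%S")
            crashes.append((when, int(match.group("num")), match.group("name")))
    return crashes


if __name__ == "__main__":
    found = find_crashes(SAMPLE, 2018)
    for when, number, name in found:
        print(when, number, name)
    # Time between consecutive crashes, to look for a pattern.
    for earlier, later in zip(found, found[1:]):
        print("gap:", later[0] - earlier[0])
```

If the gaps line up with something regular, such as a scheduled automation or a nightly job, that is a hint to look at software rather than storage; irregular gaps do not rule either out.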

I would suggest starting with a new image and moving your configuration over instead of cloning. Any bit errors on the old card will be copied over by the clone.

That’s the next step I will take. Maybe I will also upgrade from the RPi2 to a 3, if the installation is dependent on the hardware.

I have the same issue. I replaced the SD card twice, but it made no difference.
I use Hassbian on a Raspberry Pi B+ (first version).