Spontaneous restarts (all sensors become unavailable temporarily)

BratislaveBoy · October 24, 2022, 10:00am

Recently, I’ve noticed that sensors from different sources (Zigbee2MQTT, but also unrelated Wi-Fi sensors) become unavailable for short periods of time.

My own debugging so far:

Since this happens across different sources, I exclude add-on-specific (e.g. Zigbee2MQTT) or transmission-related (Zigbee stick) issues. It seems that Hassio itself has an issue.
My log level was at “debug”, but the log files were so huge (several gigabytes) that I couldn’t even open them properly. I changed it to “warning”, but I don’t see any specific warnings at these times.
It is a pretty vague issue, and I couldn’t find a topic on these forums that was of help so far.
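A multi-gigabyte debug log does not need to be opened in an editor at all. The minimal Python sketch below assumes the `YYYY-MM-DD HH:MM:SS.mmm` line prefix seen in the log excerpts in this thread; it streams the file line by line and keeps only the lines within a few minutes of a given time. The function names are hypothetical and not part of Home Assistant.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator, Optional

# Assumed line prefix, as in the excerpts in this thread: "2022-10-28 10:01:17.099"
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2022-10-28 10:01:17.099")


def parse_ts(line: str) -> Optional[datetime]:
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LEN], TS_FORMAT)
    except ValueError:
        return None


def lines_around(lines: Iterable[str], center: datetime,
                 minutes: int = 5) -> Iterator[str]:
    """Yield log lines within +/- `minutes` of `center`, one at a time.

    Lines without a timestamp (e.g. traceback frames) inherit the timestamp
    of the last stamped line, so tracebacks are kept together.
    """
    start = center - timedelta(minutes=minutes)
    end = center + timedelta(minutes=minutes)
    current = None
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            current = ts
        if current is not None and start <= current <= end:
            yield line
```

For example, iterating `lines_around(open("home-assistant.log", errors="replace"), datetime(2022, 10, 28, 10, 1, 17))` and printing each line shows the window around that timestamp without loading the whole file into memory.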

Hassio is running on a capable Linux server, in Virtualbox. I don’t think the PC is hanging.

I’ve recently changed the log level to “info” to investigate further. Does this ring a bell for anyone? What else can I do to investigate?

BratislaveBoy · October 26, 2022, 5:24am

Using the restart-notification blueprint, I found that the gaps are related to random restarts/crashes of Hassio.

Googling with this in mind, I find many fellow sufferers, with multiple causes. I don’t have an answer to my specific issue.

The home-assistant.log and home-assistant.log.1 files are not helpful: home-assistant.log.1 contains a few lines of startup output and then stops. Since home-assistant.log.1 is from the previous startup, it seems Hassio is restarting at least twice in a row.
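Because home-assistant.log.1 only holds the previous run, two restarts in a row leave nothing useful in it. The hypothetical helper below (a sketch, not a Home Assistant feature) copies log.1 to a timestamped file so it survives the next restart. The config directory and how you trigger it (for example from a start-up automation) are assumptions, and it cannot help if a start aborts before the copy runs.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def archive_previous_log(config_dir: Path,
                         now: Optional[datetime] = None) -> Optional[Path]:
    """Copy home-assistant.log.1 to a timestamped file in the same directory.

    Returns the path of the copy, or None if there is no log.1 yet.
    """
    src = config_dir / "home-assistant.log.1"
    if not src.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    dst = config_dir / f"home-assistant.log.1.{stamp}"
    shutil.copy2(src, dst)  # copy2 also preserves the file's timestamps
    return dst
```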

I also looked at CPU/RAM/HDD usage; none of them seem to be maxing out.

BratislaveBoy · October 28, 2022, 8:07am

to continue my monologue

I managed to retrieve the log.1 file today (previously it had been overwritten because of a double reboot), but I can’t find anything exceptional in it at the timestamp of the spontaneous reboot.

There is only an error related to a custom integration (skodaconnect), but I don’t expect it to cause a reboot.

Honestly, I don’t know what to do at this point and am considering starting an HA install from scratch, but that will be painful.

2022-10-28 10:01:17.099 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 151, in _handle_refresh_interval
    await self._async_refresh(log_failures=True, scheduled=True)
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 283, in _async_refresh
    self.async_update_listeners()
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 110, in async_update_listeners
    update_callback()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 533, in async_write_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 571, in _async_write_ha_state
    state = self._stringify_state(available)
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 539, in _stringify_state
    if (state := self.state) is None:
  File "/config/custom_components/skodaconnect/sensor.py", line 50, in state
    return self.instrument.state
  File "/usr/local/lib/python3.10/site-packages/skodaconnect/dashboard.py", line 124, in state
    val = super().state
  File "/usr/local/lib/python3.10/site-packages/skodaconnect/dashboard.py", line 55, in state
    if hasattr(self.vehicle, self.attr):
  File "/usr/local/lib/python3.10/site-packages/skodaconnect/vehicle.py", line 1396, in last_connected
    return last_connected.isoformat()
UnboundLocalError: local variable 'last_connected' referenced before assignment
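The traceback ends in the integration’s own library rather than in Home Assistant. `UnboundLocalError` is what Python raises when a local variable is only assigned on some code paths. The sketch below is a hypothetical reduction of that pattern (the function names are made up; it is not the actual skodaconnect code in vehicle.py), together with a guarded variant that returns `None` when no value is available.

```python
from datetime import datetime
from typing import Optional


def last_connected_buggy(raw: Optional[str]) -> str:
    # Hypothetical reduction: the local is bound only inside the `if`,
    # so a missing value raises UnboundLocalError on the return line.
    if raw is not None:
        last_connected = datetime.fromisoformat(raw)
    return last_connected.isoformat()


def last_connected_fixed(raw: Optional[str]) -> Optional[str]:
    # Guarded version: report None ("unknown") when no value was received.
    if raw is None:
        return None
    return datetime.fromisoformat(raw).isoformat()
```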

BratislaveBoy · October 29, 2022, 7:13am

Another restart, but the last errors in the log are different, so they must be unrelated to the restart.

If anyone can suggest anything else I could do to debug, please let me know.

2022-10-29 00:00:04.569 WARNING (MainThread) [custom_components.battery_sim] Export sensor value decreased - meter may have been reset
2022-10-29 00:00:04.576 WARNING (MainThread) [custom_components.battery_sim] Export sensor value decreased - meter may have been reset
2022-10-29 00:00:04.578 ERROR (MainThread) [homeassistant.helpers.template_entity] TemplateError('ZeroDivisionError: float division by zero') while processing template 'Template("{{ ((1-(float(states('sensor.myenergi_my_home_grid_export_today'))/float(states('sensor.myenergi_my_home_generated_today'))))*100)|round(0) }}")' for attribute '_attr_native_value' in entity 'sensor.jd_percentsolarused_kwh'
2022-10-29 00:00:04.591 ERROR (MainThread) [homeassistant.helpers.template_entity] TemplateError('ZeroDivisionError: float division by zero') while processing template 'Template("{{ (float(states('sensor.jd_selfconsumption_kwh'))/(float(states('sensor.myenergi_my_home_grid_import_today'))+float(states('sensor.jd_selfconsumption_kwh')))*100)|round(0) }}")' for attribute '_attr_native_value' in entity 'sensor.jd_percentgreen_kwh'
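Both template errors were logged at 00:00:04, which suggests the “today” counters had just reset to 0, making the denominators zero. The Python sketch below mirrors the arithmetic of the two templates (it is not Home Assistant template syntax, and the function names are hypothetical) and returns `None` instead of dividing by zero; the same check can be added to the templates themselves.

```python
from typing import Optional


def percent_solar_used(export_today: float,
                       generated_today: float) -> Optional[float]:
    """Mirror of the first template: (1 - export / generated) * 100, rounded.

    Returns None (standing in for an "unknown" state) when nothing has been
    generated yet, e.g. right after the daily counters reset.
    """
    if generated_today == 0:
        return None
    return round((1 - export_today / generated_today) * 100, 0)


def percent_green(self_consumption: float,
                  grid_import_today: float) -> Optional[float]:
    """Mirror of the second template: self / (import + self) * 100, rounded."""
    total = grid_import_today + self_consumption
    if total == 0:
        return None
    return round(self_consumption / total * 100, 0)
```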

BratislaveBoy · November 7, 2022, 6:07am

As a last resort, I went through the pain of rebuilding Hassio OS from scratch. Backing up configuration.yaml, the dashboards, the zigbee2mqtt config, and the automations made reinstalling easier, but it still took a few hours to get everything right.

My reasoning for reinstalling Hassio was that the “restart bug” had crept in somewhere during the many things I had tried in Hassio, so a fresh install with only the necessary add-ons would resolve it. Although things ran fine for the first 48 hours or so, the system has since restarted itself twice for unknown reasons.

I don’t know what to do next. Googling again for ‘spontaneous restarts hassio’ returns some topics, but they are either unresolved or do not have the same cause.

I also used the “Watchman” integration to find and fix broken entities, and I resolved the remaining warnings and errors in my logs (some templating issues). Although I don’t believe those can cause a reboot, I’m trying anything at this point.

michaelblight · November 13, 2022, 3:59am

I’ve been having the same thing happen for the last 4 days or so: HA restarting a couple of times a day. Suspiciously, it started soon after I upgraded from 2022.9 to 2022.11.

I too am running HASSOS under VirtualBox. I also have another VM where I run Mosquitto, Node Red, and other things under Docker. When HA crashes, the whole Docker engine on that VM dies too. I think this is because the HA shares I have mapped “go away”; in journalctl I’m seeing a message about Docker trying to connect to it. Next time HA dies I will look at journalctl on the HASSOS VM around that time. Normally VM issues are memory-related and the whole machine crashes, but that doesn’t seem to be the case this time, and I have 50% memory free on the host anyway. The VM’s resource usage also looks pretty relaxed…
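To line up the HASSOS journal with a crash, it helps to look only at a window around the crash time. The sketch below, assuming you know the approximate time (e.g. from a restart notification), builds such a `journalctl` call using its standard `--since`, `--until`, and `--no-pager` options; the helper name is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def journal_window_cmd(crash: datetime, minutes: int = 5) -> List[str]:
    """Build a journalctl command covering +/- `minutes` around `crash`."""
    fmt = "%Y-%m-%d %H:%M:%S"  # a timestamp format accepted by --since/--until
    start = (crash - timedelta(minutes=minutes)).strftime(fmt)
    end = (crash + timedelta(minutes=minutes)).strftime(fmt)
    return ["journalctl", "--no-pager", "--since", start, "--until", end]
```

The resulting arguments can be typed into a shell on the VM, or passed to `subprocess.run` wherever both Python and `journalctl` are available.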

michaelblight · November 14, 2022, 1:02am

HA crashed again last night, but this time it didn’t take out my Node Red server as well. I had added “shutdown” and “start” automations to log when it happens, and noticed that “shutdown” does not fire in these cases. Therefore I think it’s probably the OS that is choosing to restart. I will look into journalctl later today.
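The reasoning is that a clean stop fires the “shutdown” automation, so a “start” with no “shutdown” since the previous “start” points to HA being killed rather than stopped. Assuming the two automations write chronological (timestamp, kind) records somewhere, that check can be sketched as follows; the record format and function name are hypothetical.

```python
from typing import List, Tuple


def unclean_starts(events: List[Tuple[str, str]]) -> List[str]:
    """Return timestamps of "start" events not preceded by a "shutdown".

    `events` is a chronological list of (timestamp, kind) pairs where kind
    is "start" or "shutdown". The very first start is treated as clean.
    """
    result = []
    clean = True
    for ts, kind in events:
        if kind == "shutdown":
            clean = True
        elif kind == "start":
            if not clean:
                result.append(ts)
            clean = False  # the next start needs a shutdown before it
    return result
```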

michaelblight · November 14, 2022, 8:04am

The first suspect is the iCloud integration: it seems to be raising a “SystemExit”. I will post this separately and see if others are getting it.

2022-11-13 15:19:48.852 ERROR (MainThread) [homeassistant] Error doing job: Future exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/src/homeassistant/homeassistant/components/icloud/account.py", line 317, in keep_alive
    self.update_devices()
  File "/usr/src/homeassistant/homeassistant/components/icloud/account.py", line 185, in update_devices
    status = device.status(DEVICE_STATUS_SET)
  File "/usr/local/lib/python3.10/site-packages/pyicloud/services/findmyiphone.py", line 119, in status
    self.manager.refresh_client()
  File "/usr/local/lib/python3.10/site-packages/pyicloud/services/findmyiphone.py", line 34, in refresh_client
    req = self.session.post(
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 635, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/pyicloud/base.py", line 78, in request
    response = super().request(method, url, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/requests/adapters.py", line 489, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.10/http/client.py", line 1374, in getresponse
    response.begin()
  File "/usr/local/lib/python3.10/http/client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.10/http/client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/local/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "/usr/local/lib/python3.10/ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/local/lib/python3.10/ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
SystemExit
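The first frame (`concurrent/futures/thread.py`, in `run`) shows the `SystemExit` was raised inside an executor worker thread, and “Future exception was never retrieved” means it ended up stored on a future. The standard library’s thread pool catches `BaseException`, which includes `SystemExit`, in its workers and stores it on the future. The minimal sketch below demonstrates that; it only shows this standard-library behaviour and says nothing about where the `SystemExit` in the iCloud call came from.

```python
from concurrent.futures import ThreadPoolExecutor


def job() -> None:
    # Stand-in for the iCloud keep_alive call: raise SystemExit in a worker.
    raise SystemExit


with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(job)
    exc = future.exception()  # waits for the job, returns the stored exception

# Still running here: the SystemExit was stored on the future
# instead of terminating the interpreter.
print(type(exc).__name__)
```

Running it prints `SystemExit`, and the script carries on after the pool has shut down.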

Tyfoon · November 16, 2022, 11:18am

After years and years of happy use I now also have spontaneous reboots. Did you find any causes?

BratislaveBoy · November 16, 2022, 11:50am

I can cautiously say that the reboots are not occurring anymore for me, but I don’t know what helped during my troubleshooting process. To quickly recap:
0. Noticed that sensors temporarily become unavailable.
1. Found, through the “restart notification” automation mentioned above, that spontaneous reboots were causing the gaps in my sensors.
2. Googling the issue and checking the log and log.1 files did not help.
3. Reinstalled Hassio completely (Hassio OS running in VirtualBox on Ubuntu); this did not resolve the issue, as reboots still occurred.
4. Cleared some minor errors in my log files (e.g. templating issues that sometimes gave zero-division errors). I don’t think these were related to the reboots.
5. For unknown reasons, I’m now > 7 days reboot-free.

Tyfoon · November 16, 2022, 12:22pm

Thanks. I’m running HA Supervised (hassIO) on an Intel NUC. I’m also going through all the errors (which have been there for ages) and trying to make them disappear. Hope that helps.

michaelblight · November 21, 2022, 7:21am

I’m not any closer. The journalctl command suggests something catastrophic, but it seems to change from one crash to the next. For example:

Nov 17 20:00:59 homeassistant dockerd[352]: fatal error: unexpected signal during runtime execution
Nov 17 20:00:59 homeassistant dockerd[352]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xf0ded4]
Nov 17 20:00:59 homeassistant dockerd[352]: runtime stack:
Nov 17 20:00:59 homeassistant dockerd[352]: runtime.throw({0x2c65ff5, 0x10082a4})
Nov 17 20:00:59 homeassistant dockerd[352]:         runtime/panic.go:1198 +0x71
Nov 17 20:00:59 homeassistant dockerd[352]: runtime.sigpanic()
N

Nov 21 03:37:17 homeassistant dockerd[342]: fatal error: runtime: sudog with non-nil next
Nov 21 03:37:17 homeassistant dockerd[342]: goroutine 69 [running]:
Nov 21 03:37:17 homeassistant dockerd[342]: runtime.throw({0x2c43b3a, 0xc001feb740})

There’s nothing interesting in the log just before these - in both cases there was no log activity for about 4 seconds before.
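Because the dockerd message differs from one crash to the next, collecting every `fatal error` line from the journal makes crashes easier to compare side by side. A sketch, assuming journal text in the default short format shown above (e.g. saved from `journalctl --no-pager`), with a hypothetical function name:

```python
import re
from typing import Iterable, List, Tuple

# Matches lines such as:
# "Nov 17 20:00:59 homeassistant dockerd[352]: fatal error: unexpected signal ..."
FATAL_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ dockerd\[\d+\]: "
    r"fatal error: (?P<msg>.*)$"
)


def dockerd_fatal_errors(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, message) for every dockerd "fatal error" line."""
    hits = []
    for line in lines:
        m = FATAL_RE.match(line.rstrip("\n"))
        if m:
            hits.append((m.group("ts"), m.group("msg")))
    return hits
```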

Around the time it occurs, the HA log fills up with recorder issues, so I assume MariaDB has crashed. So I think database corruption is the most likely root cause. I’ve tried backing up and restoring just this add-on, but it doesn’t help. I assume the backup is a file copy, so any database corruption would be kept. I’ve run “mysqlcheck --all-databases” and everything seems fine. I haven’t been able to find anything in the MariaDB log because it’s clogged up by deCONZ, and I don’t know what unit (or whatever) to specify. And there doesn’t appear to be anything significant in the VM log.
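On the question of which unit to specify: Docker’s journald logging driver tags each entry with fields such as `CONTAINER_NAME`, and `journalctl` accepts `FIELD=VALUE` matches, so MariaDB’s lines can in principle be separated from deCONZ’s. The sketch below filters `journalctl -o json` output (one JSON object per line) in Python. It assumes the add-on containers log to journald, and the container name `addon_core_mariadb` is an assumption about how the add-on container is named, so check the actual name on your system first.

```python
import json
from typing import Iterable, Iterator


def messages_for_container(json_lines: Iterable[str],
                           container: str) -> Iterator[str]:
    """Yield the MESSAGE field of journal entries from one Docker container.

    Expects `journalctl -o json` output, one JSON object per line.
    """
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        if entry.get("CONTAINER_NAME") == container:
            msg = entry.get("MESSAGE")
            # Non-UTF-8 messages are exported as byte arrays; skip those here.
            if isinstance(msg, str):
                yield msg
```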

Not sure what to do next.

BratislaveBoy · November 22, 2022, 1:47pm

Scratch point 5: I just experienced another reboot… The frequency is a lot lower than when I first reported it, though…

BratislaveBoy · February 3, 2023, 9:05am

Hi All,

I recently moved from VirtualBox to VMware to run Hassio on Linux, and this actually solved the reboot problem (at least for the past 3 weeks). If you are in the same boat, give it a try. Thanks to the backup/restore option the process was relatively straightforward, apart from some settings that were not moved automatically (e.g. I had to re-create the MQTT user and sort out some other minor points, like setting up my Zigbee hardware correctly).