After months of trying to track down why my Home Assistant server crashes every 5-7 days, I finally gave up and started over from scratch.
Downloaded the latest release for the RPi, deleted all the unused entries from all my config files, removed config files no longer in use and reinstalled MQTT, Mosquitto and File Editor.
Then I copied all my configuration files back in and rebooted. I cleaned up some misc issues with the MQTT login, rebooted again, and it all ran for about 10 minutes with no errors. Then it just FROZE with "Reload UI".
Attached are all my yaml files. I really don't know why this is happening. I really can't get it more basic than this. Hopefully someone can point out what I've done wrong.
Thanks.
configuration.yaml
homeassistant:
  # Name of the location where Home Assistant is running
  name: Home
  # Location required to calculate the time the sun rises and sets
  latitude: 0
  longitude: 0
  # Impacts weather/sunrise data (altitude above sea level in meters)
  elevation: 4
  # metric for Metric, imperial for Imperial
  unit_system: metric
  # Pick yours from here: http://en.wikipedia.org/wiki/List_of_tz_database_time_zones
  time_zone: Atlantic/Madeira
  # Customization file
  customize: !include customize.yaml
recorder:
  purge_interval: 2
  purge_keep_days: 7
  db_url: !secret mysql_recorder
lovelace:
  mode: yaml
# Enables configuration UI
config:
# Checks for available updates
# Note: This component will send some information about your system to
# the developers to assist with development of Home Assistant.
# For more information, please see:
# https://home-assistant.io/blog/2016/10/25/explaining-the-updater/
updater:
  # Optional, allows Home Assistant developers to focus on popular components.
  # include_used_components: true
# System Health
system_health:
# Discover some devices automatically
discovery:
# Allows you to issue voice commands from the frontend in enabled browsers
conversation:
# Enables support for tracking state changes over time
history:
# View all events in a logbook
logbook:
# Enables a map showing the location of tracked devices
map:
# Track the sun
sun:
# Cloud
cloud:
group: !include groups.yaml
automation: !include automations.yaml
script: !include scripts.yaml
panel_custom:
  - name: hassio-main
    sidebar_title: Configurator
    sidebar_icon: hass:settings
    js_url: /api/hassio/app/entrypoint.js
    url_path: configurator
    embed_iframe: true
    require_admin: true
    config:
      ingress: core_configurator
# Sonoff Switches
switch TaskLamp:
  - platform: mqtt
    name: "Task Lamp"
    command_topic: "cmnd/sonoff-01/power"
    state_topic: "stat/sonoff-01/POWER"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    retain: false
switch TableLamp:
  - platform: mqtt
    name: "Table Lamp"
    command_topic: "cmnd/sonoff-02/power"
    state_topic: "stat/sonoff-02/POWER"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    retain: false
camera:
  - platform: foscam
    ip: 172.27.2.21
    port: 88
    username: !secret foscam_username
    password: !secret foscam_password
    name: Kitchen
  - platform: foscam
    ip: 172.27.2.22
    port: 88
    username: !secret foscam_username
    password: !secret foscam_password
    name: Front Door
sensor Livingroom:
  - platform: mqtt
    name: "Temperature"
    state_topic: "tele/sonoff-02/SENSOR"
    value_template: '{{ value_json.SI7021.Temperature }}'
    unit_of_measurement: "°C"
    availability_topic: "tele/sonoff-02/LWT"
    payload_available: "Online"
    payload_not_available: "Offline"
  - platform: mqtt
    name: "Humidity"
    state_topic: "tele/sonoff-02/SENSOR"
    value_template: '{{ value_json.SI7021.Humidity }}'
    unit_of_measurement: "%"
    availability_topic: "tele/sonoff-02/LWT"
    payload_available: "Online"
    payload_not_available: "Offline"
sensor:
  - platform: time_date
    display_options:
      - 'date_time'
#### JGAurora A4 3D Printer configurations
switch A4:
  - platform: mqtt
    name: "A4 3D Printer"
    command_topic: "cmnd/sonoff-a4/power"
    state_topic: "stat/sonoff-a4/POWER1"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    retain: false
#### JGMaker A6 3D Printer configurations
switch A6:
  - platform: mqtt
    name: "A6 3D Printer"
    command_topic: "cmnd/sonoff-a6/power"
    state_topic: "stat/sonoff-a6/POWER1"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    retain: false
#### JGMaker Magic 3D Printer configurations
switch Magic:
  - platform: mqtt
    name: "Magic 3D Printer"
    command_topic: "cmnd/sonoff-magic/power"
    state_topic: "stat/sonoff-magic/POWER1"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    retain: false
#### AnyCubic Kossel 3D Printer configurations
switch Kossel:
  - platform: mqtt
    name: "Kossel 3D Printer"
    command_topic: "cmnd/sonoff-kossel/power"
    state_topic: "stat/sonoff-kossel/POWER1"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    retain: false
#### Hevo 3D Printer configurations
switch Hevo:
  - platform: mqtt
    name: "Hevo 3D Printer"
    command_topic: "cmnd/sonoff-hevo/power"
    state_topic: "stat/sonoff-hevo/POWER1"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    retain: false
#### Artist D 3D Printer configurations
switch Artist_D:
  - platform: mqtt
    name: "Artist D 3D Printer"
    command_topic: "cmnd/sonoff-ad/power"
    state_topic: "stat/sonoff-ad/POWER1"
    qos: 1
    payload_on: "ON"
    payload_off: "OFF"
    retain: false
Thanks! I totally missed that, but it didn't throw any errors and seemed to be working. Anyway, I have corrected it and restarted Home Assistant. Hope that was the issue!
Still died 1 week later. It just locks up and fails to respond over http or ssh. Only way to get it working again is to power off (unplug power) and plug it back in.
This sounds hardware-related to me. Your configuration may put more or less load on the system, which will influence its crash frequency, but it shouldn't be doing this at all. A new SD card might be the answer, if you're sure it's not overheating.
Can you ssh into the Pi, or even ping it, once it's frozen?
Does it crash every Monday at like 2am? If the answer is yes, get a new SD card. There's a whole thread about this and the solution for everyone was SD card related.
I'm just going to go out on a limb and say that the problem is definitely SD card related. It crashes on a Monday based on your post history, you're using a Pi, and you're most likely using an SD card.
Thanks everyone for the suggestions. Here are some answers:
It's not every Monday; it's every 7 days from the last crash or manual reboot.
It's a new SD card. I've tried several. Still crashes. It's a 32GB Class 10.
This is my second power supply. It's a 50W unit with 4 x 5.2V 2.5A outlets and 1 USB-C outlet with nothing plugged into it. There are two other Raspberry Pis (a 3B and a 2B) plugged into the other two USB-A outlets, running other services, and they have never crashed. This power supply has LCD displays showing the voltage and power draw of each Pi plugged into it. Currently they are drawing between 0.1A and 0.2A at 5.2V.
As for overheating: the temps are around 40°C, with a heatsink and a fan. But I've disabled the Pi monitor plugin as part of my shotgun testing of this problem. There's not much left of my original configuration, and not much left to go wrong.
If this still crashes, I'll steal the Samsung 64GB EVO Select microSDXC Class 3 card from my camera and give it a try.
You sound exactly like everyone else on that thread, and you've posted every Monday. Just saying. Every single person: "It's not my SD card. The card is XYZ and I have 8 Pis running with this SD." Then a month later: "I replaced the SD and now it no longer crashes."
I hope you're right and the Samsung SD card fixes it. But like I said, I've already replaced the card several times over the many months this has been happening. The only thing I haven't done is try the XC card from Samsung. And just as a TL;DR: it's not every Monday, it's every 7 days. If I manually reboot in the middle of the week, it will crash in the middle of next week.
Still... I hope it's the SD card, that this puts an end to the saga, and that I can start putting my configuration back together the way it was.
So for giggles, reboot on Wednesday. If it crashes next Wednesday, it'll most likely be database related, and I would recommend moving to a different database instead of SQLite.
It appears to be in the throes of imminent death, well before its usual 1-week lifespan. Sensors and cameras are timing out. The system is losing connection, etc.
Here's the current system log and a screenshot of the usage.
21-06-16 09:25:58 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state CoreState.RUNNING
21-06-16 09:25:58 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.SECURITY/ContextType.CORE
21-06-16 09:25:58 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.PWNED/ContextType.ADDON
21-06-16 09:25:58 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.FREE_SPACE/ContextType.SYSTEM
21-06-16 09:25:58 INFO (MainThread) [supervisor.resolution.check] System checks complete
21-06-16 09:25:58 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.RUNNING
21-06-16 09:25:59 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
21-06-16 09:25:59 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state CoreState.RUNNING
21-06-16 09:25:59 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
21-06-16 10:13:10 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:13:10 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
21-06-16 10:14:45 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:25:59 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state CoreState.RUNNING
21-06-16 10:25:59 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.SECURITY/ContextType.CORE
21-06-16 10:25:59 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.PWNED/ContextType.ADDON
21-06-16 10:25:59 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.FREE_SPACE/ContextType.SYSTEM
21-06-16 10:25:59 INFO (MainThread) [supervisor.resolution.check] System checks complete
21-06-16 10:25:59 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.RUNNING
21-06-16 10:26:00 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
21-06-16 10:26:00 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state CoreState.RUNNING
21-06-16 10:26:00 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
21-06-16 10:26:10 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:26:38 INFO (MainThread) [supervisor.jobs] 'Tasks._update_addons' blocked from execution, no host internet connection
21-06-16 10:26:53 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:27:19 INFO (MainThread) [supervisor.updater] Fetching update data from https://version.home-assistant.io/stable.json
21-06-16 10:27:33 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:28:24 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:29:04 WARNING (MainThread) [supervisor.jobs] 'GitRepo.pull' blocked from execution, no supervisor internet connection
21-06-16 10:29:04 WARNING (MainThread) [supervisor.jobs] 'GitRepo.pull' blocked from execution, no supervisor internet connection
21-06-16 10:29:04 ERROR (MainThread) [asyncio] Task exception was never retrieved
future: <Task finished name='Task-68136' coro=<Repository.update() done, defined at /usr/src/supervisor/supervisor/store/repository.py:106> exception=StoreJobError("'GitRepo.pull' blocked from execution, no supervisor internet connection")>
Traceback (most recent call last):
  File "/usr/src/supervisor/supervisor/store/repository.py", line 110, in update
    await self.git.pull()
  File "/usr/src/supervisor/supervisor/jobs/decorator.py", line 86, in wrapper
    raise self.on_condition(error_msg, _LOGGER.warning) from None
supervisor.exceptions.StoreJobError: 'GitRepo.pull' blocked from execution, no supervisor internet connection
21-06-16 10:29:04 ERROR (MainThread) [asyncio] Task exception was never retrieved
future: <Task finished name='Task-68138' coro=<Repository.update() done, defined at /usr/src/supervisor/supervisor/store/repository.py:106> exception=StoreJobError("'GitRepo.pull' blocked from execution, no supervisor internet connection")>
Traceback (most recent call last):
  File "/usr/src/supervisor/supervisor/store/repository.py", line 110, in update
    await self.git.pull()
  File "/usr/src/supervisor/supervisor/jobs/decorator.py", line 86, in wrapper
    raise self.on_condition(error_msg, _LOGGER.warning) from None
supervisor.exceptions.StoreJobError: 'GitRepo.pull' blocked from execution, no supervisor internet connection
21-06-16 10:29:07 INFO (MainThread) [supervisor.jobs] 'StoreManager.update_repositories' blocked from execution, no supervisor internet connection
21-06-16 10:29:07 INFO (MainThread) [supervisor.store] Loading add-ons from store: 63 all - 0 new - 0 remove
21-06-16 10:29:17 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:31:25 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:32:13 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:32:54 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:44:12 INFO (MainThread) [supervisor.homeassistant.api] Updated Home Assistant API token
21-06-16 10:55:45 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 11:02:52 INFO (MainThread) [supervisor.host.info] Updating local host information
21-06-16 11:02:54 INFO (MainThread) [supervisor.host.services] Updating service information
21-06-16 11:02:55 INFO (MainThread) [supervisor.host.network] Updating local network information
21-06-16 11:03:02 INFO (MainThread) [supervisor.host.sound] Updating PulseAudio information
21-06-16 11:03:02 INFO (MainThread) [supervisor.host] Host information reload completed
21-06-16 11:07:08 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 11:07:59 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 11:08:50 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 11:09:41 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 11:10:32 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 11:26:00 INFO (MainThread) [supervisor.resolution.check] Starting system checks with state CoreState.RUNNING
21-06-16 11:26:00 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.SECURITY/ContextType.CORE
21-06-16 11:26:00 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.PWNED/ContextType.ADDON
21-06-16 11:26:11 WARNING (MainThread) [supervisor.utils.pwned] Can't fetch HIBP data: Timeout
21-06-16 11:26:22 WARNING (MainThread) [supervisor.utils.pwned] Can't fetch HIBP data: Timeout
21-06-16 11:26:22 INFO (MainThread) [supervisor.resolution.checks.base] Run check for IssueType.FREE_SPACE/ContextType.SYSTEM
21-06-16 11:26:22 INFO (MainThread) [supervisor.resolution.check] System checks complete
21-06-16 11:26:22 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state CoreState.RUNNING
21-06-16 11:26:22 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
21-06-16 11:26:22 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state CoreState.RUNNING
21-06-16 11:26:22 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
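For anyone wanting to see when the connectivity timeouts start piling up in a log like the one above, here is a rough Python sketch that counts the "Timeout was reached" warnings per hour. The regular expression is only inferred from the lines pasted here, the function name `timeouts_per_hour` is made up for illustration, and `SAMPLE` is just a few lines copied from this log, so treat it as a starting point rather than a proper Supervisor log parser.

```python
import re
from collections import Counter

# Pattern inferred from the pasted lines, e.g.:
# 21-06-16 10:13:10 WARNING (MainThread) [supervisor.host.network] message
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2}) (?P<hour>\d{2}):\d{2}:\d{2} "
    r"(?P<level>[A-Z]+) \(\w+\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)$"
)


def timeouts_per_hour(lines):
    """Count 'Timeout was reached' warnings, keyed by (date, hour)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # traceback and other continuation lines are skipped
        if m["level"] == "WARNING" and "Timeout was reached" in m["msg"]:
            counts[(m["date"], m["hour"])] += 1
    return counts


# A few lines copied from the log above, used only as sample input.
SAMPLE = """\
21-06-16 09:25:59 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
21-06-16 10:13:10 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 10:14:45 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
21-06-16 11:07:08 WARNING (MainThread) [supervisor.host.network] Can't update connectivity information: Error: Timeout was reached
""".splitlines()

for (date, hour), n in sorted(timeouts_per_hour(SAMPLE).items()):
    print(f"{date} {hour}:00  {n} timeout warning(s)")
```

Running the same function over a full saved log (for example `timeouts_per_hour(open(path))`, where `path` is wherever you saved it) would show whether the timeouts cluster shortly before each freeze.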
It should still have a network IP. Do all devices on your router start with 172? Typically that's reserved for use internal to a computer, not a network.
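As a quick sanity check on the 172 question: 172.16.0.0/12 is one of the RFC 1918 private ranges, so a LAN can legitimately use it, while Docker's default bridge network normally sits in 172.17.0.0/16. The sketch below uses Python's standard `ipaddress` module to see where the two camera addresses from the config fall. The `classify` helper is made up for illustration, and the Docker range is an assumption about an unmodified Docker install, not something read from this system.

```python
import ipaddress

# RFC 1918 private ranges; 172.16.0.0/12 spans 172.16.0.0 to 172.31.255.255.
RFC1918 = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Assumption: Docker's default bridge network on an unmodified install.
DOCKER_DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")


def classify(addr):
    """Report which RFC 1918 range (if any) an address belongs to."""
    ip = ipaddress.ip_address(addr)
    private_net = next((net for net in RFC1918 if ip in net), None)
    return {
        "address": addr,
        "rfc1918_range": str(private_net) if private_net else None,
        "in_docker_default_bridge": ip in DOCKER_DEFAULT_BRIDGE,
    }


# The two Foscam addresses from configuration.yaml above.
for cam in ("172.27.2.21", "172.27.2.22"):
    print(classify(cam))
```

If the router really hands out 172.27.x.x addresses, that is a valid private LAN range; the thing worth checking is whether it overlaps with any of the container networks on the Pi.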