pdwonline
(When a goat is on the net)
February 7, 2023, 8:27pm
1
I have been running HA Core in Docker for many years on a NUC i5. Now something seems to be going wrong.
HA has become slower and slower to respond. There are DB disconnections: "Got an error reading communication packets" (I am using MariaDB 10.8). Timeouts are more and more frequent: "Setup timed out for bootstrap - moving forward", and components/hunterdouglas_powerview/shade_data.py:74 "Error doing job: Task exception was never retrieved".
It is getting worse and worse. Today, half of my Z-Wave devices appeared dead in HA. Fortunately they came back after a reboot and a long wait.
How could I best diagnose what’s going on? What is blocking Home Assistant from wrapping up the start up phase?
Any ideas are welcome!
How big has the database grown? Are you using SQLite, MariaDB, or something else for the database? Are the drives eMMC, which has a shorter life than other drive types?
pdwonline
(When a goat is on the net)
February 7, 2023, 9:25pm
3
I am on MariaDB and started over with an empty database.
What do the storage settings report for memory usage and eMMC life?
pdwonline
(When a goat is on the net)
February 8, 2023, 7:34am
5
How can I get information about this?
pdwonline
(When a goat is on the net)
February 8, 2023, 8:11am
6
Found some information. These are my results, but I don’t have a clue if this is indicating a problem:
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-58-generic] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: ADATA SP550NS38
Serial Number: 2G4220001161
Firmware Version: P0414B
User Capacity: 120,034,123,776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
TRIM Command: Available, deterministic, zeroed
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Feb 8 09:08:42 2023 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 171) seconds.
Offline data collection
capabilities: (0x71) SMART execute Offline immediate.
No Auto Offline data collection support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0002) Does not save SMART data before
entering power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.
Conveyance self-test routine
recommended polling time: ( 1) minutes.
SCT capabilities: (0x0035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x0000 100 100 000 Old_age Offline - 0
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 0
9 Power_On_Hours 0x0000 100 100 000 Old_age Offline - 13623
12 Power_Cycle_Count 0x0000 100 100 000 Old_age Offline - 67
160 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 0
161 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 114
163 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 10
164 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 167326
165 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 161
166 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 61
167 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 127
168 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 1000
169 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 88
175 Program_Fail_Count_Chip 0x0000 100 100 000 Old_age Offline - 0
176 Erase_Fail_Count_Chip 0x0000 100 100 000 Old_age Offline - 0
177 Wear_Leveling_Count 0x0000 100 100 050 Old_age Offline - 671
178 Used_Rsvd_Blk_Cnt_Chip 0x0000 100 100 000 Old_age Offline - 0
181 Program_Fail_Cnt_Total 0x0000 100 100 000 Old_age Offline - 0
182 Erase_Fail_Count_Total 0x0000 100 100 000 Old_age Offline - 0
192 Power-Off_Retract_Count 0x0000 100 100 000 Old_age Offline - 13
194 Temperature_Celsius 0x0000 100 100 000 Old_age Offline - 35
195 Hardware_ECC_Recovered 0x0000 100 100 000 Old_age Offline - 0
196 Reallocated_Event_Count 0x0000 100 100 016 Old_age Offline - 0
197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 0
198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0000 100 100 050 Old_age Offline - 0
232 Available_Reservd_Space 0x0000 100 100 000 Old_age Offline - 100
241 Total_LBAs_Written 0x0000 100 100 000 Old_age Offline - 808106
242 Total_LBAs_Read 0x0000 100 100 000 Old_age Offline - 76839
245 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 334652
SMART Error Log Version: 1
Warning: ATA error count 0 inconsistent with error log pointer 1
ATA Error Count: 0
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 0 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
00 ec 00 00 00 00 00 Device Fault
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 00 00 00 00 00 00 00:00:00.000 READ DMA
Warning! SMART Self-Test Log Structure error: invalid SMART checksum.
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 55 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
7 0 65535 Read_scanning was completed without error
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
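For anyone trying to read a report like the one above: smartctl treats an attribute as failed when its normalized VALUE is less than or equal to its THRESH. The sketch below is a rough way to pull the attribute rows out of pasted output and check that rule. The names `parse_attributes` and `below_threshold` are made up for this example, and the regular expression assumes the ten-column layout shown in this post.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw value. Assumes the column layout pasted in this thread.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return one dict per SMART attribute row found in the text."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "id": int(m.group(1)),
                "name": m.group(2),
                "value": int(m.group(4)),
                "worst": int(m.group(5)),
                "thresh": int(m.group(6)),
                "raw": int(m.group(10)),
            })
    return rows


def below_threshold(rows):
    """Rows whose normalized value has reached a non-zero threshold."""
    return [r for r in rows if r["thresh"] > 0 and r["value"] <= r["thresh"]]


# A few rows copied from the report in this post.
SAMPLE = """\
5 Reallocated_Sector_Ct 0x0000 100 100 000 Old_age Offline - 0
177 Wear_Leveling_Count 0x0000 100 100 050 Old_age Offline - 671
232 Available_Reservd_Space 0x0000 100 100 000 Old_age Offline - 100
"""

if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    for failed in below_threshold(attrs):
        print("At or below threshold:", failed["name"])
```

On the rows sampled here nothing reaches its threshold. Raw values are vendor specific, so this check only covers the normalized columns.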
Okay: in the left panel, go to Settings > Storage. Two items should be listed. What are the used/total capacity and the life used and remaining on the eMMC drive?
pdwonline
(When a goat is on the net)
February 10, 2023, 7:30am
8
Since I am on Core, there is no Settings > Storage for me. I do value your suggestion that my eMMC could be worn too much, so I will try to move HA to another NUC that I have and see how it behaves.
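Without the Storage panel, the used/total capacity can still be read from the host. This is a minimal sketch using Python's standard `shutil.disk_usage`. The function name `usage_report` and the default path are assumptions for illustration; point it at whichever mount holds the HA config or the database. It does not report drive wear, only capacity.

```python
import shutil


def usage_report(path="/"):
    """Return total/used/free space in GiB and percent used for a path."""
    usage = shutil.disk_usage(path)
    gib = 2 ** 30
    # Guard against a zero-sized filesystem to avoid dividing by zero.
    percent = (usage.used / usage.total * 100) if usage.total else 0.0
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        "percent_used": percent,
    }


if __name__ == "__main__":
    report = usage_report("/")
    print(
        f"{report['used_gib']:.1f} GiB used of {report['total_gib']:.1f} GiB "
        f"({report['percent_used']:.0f}%)"
    )
```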