HA stops work every Monday

So, I ordered a new Raspberry Pi 4 8GB to replace my Pi 4 4GB and restored the backup from the old one. The only other change was moving from a 64GB A1 SD card to an A2 one. I did this last Friday and, for the first time in months, my HA didn’t crash on Monday morning! So no config change, just hardware. I’ll report back next Monday!

I forgot about the supervisor log. Could one of you check the supervisor log after a crash? Is it even reachable? One would need to follow the debugging instructions listed here to set up port 22222. Please don’t do this if you are uncomfortable with the instructions! :slight_smile:

Maybe one of you could pull the log from the saved SD card? :slight_smile:

@paddy0174, where on the SD card can I find the supervisor log?

@captain_daveman, it would be great if that solved your problem! It might be an indication that some heavy process is triggered that is just too much for our Pi 3Bs. The search therefore continues :wink:

@paddy0174, does @plevuus have a point? I can’t reach my Pi in any form on Monday morning, so getting the log from the SD card might be the best bet.

@Harmpert, just to be clear: I wasn’t using a Pi 3B, I was already using a Pi 4. The only difference is that I went from a 4GB to an 8GB one. I have a feeling it was the change of SD card type that may have solved it. I say ‘solved’, but it’s probably more a case of ‘coped with it’. I’ll take a look at my logs on Monday; if I’ve survived without crashing again, I will be able to see what was going on around the 00:00-02:00 timeframe.
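
A minimal sketch of how a log could be narrowed down to that 00:00-02:00 window. It assumes the timestamp layout of the core log (`YYYY-MM-DD HH:MM:SS` at the start of each line); the function name `lines_in_window` is made up for this example, and the supervisor log uses a two-digit year, so it would need a different format string.

```python
import sys
from datetime import datetime, time

# Assumed layout: "2021-03-20 18:04:11 ERROR (MainThread) [component] message"
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
TIMESTAMP_LENGTH = 19  # length of "2021-03-20 18:04:11"


def lines_in_window(lines, start=time(0, 0), end=time(2, 0)):
    """Return the log lines whose time of day lies between start and end (inclusive)."""
    selected = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:TIMESTAMP_LENGTH], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # lines without a leading timestamp (e.g. tracebacks) are skipped
        if start <= stamp.time() <= end:
            selected.append(line.rstrip("\n"))
    return selected


# Usage: python this_script.py <path to a copy of the log file>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for hit in lines_in_window(handle):
            print(hit)
```

Filtering on the time of day rather than on a full date keeps the same call usable for every Monday in the log.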

I’ve been searching the SD card for quite some time, but I really cannot find the supervisor log, so any tip is welcome. This weekend I’ll try to set up SSH on port 22222; maybe that can be used on Monday morning.
@captain_daveman, good to see you got rid of the crashes (at least once :wink:).

For some reason, importing the public key from a USB drive to set up SSH access on port 22222 fails. I get “Unknown error, see supervisor logs”. The supervisor log only confirms that the USB drive is recognized, but doesn’t show any error:

21-03-20 16:56:09 INFO (MainThread) [supervisor.hardware.monitor] Detecting HardwareAction.ADD usb hardware /dev/bus/usb/001/007

The core log gives a hint:

2021-03-20 18:04:11 ERROR (MainThread) [homeassistant.components.hassio.handler] Client error on os/config/sync request Cannot connect to host 172.30.32.2os:80 ssl:default [Name does not resolve]
2021-03-20 18:04:11 ERROR (MainThread) [homeassistant.components.hassio] Failed to to call os/config/sync - 
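
The odd host name in that error, `172.30.32.2os`, reads as if the IP address and the request path `os/config/sync` were glued together without a separating slash. If so, the failure may lie in how the request URL is built rather than in the USB drive. This is a guess from the log text only, not something checked against the Home Assistant source; the sketch below just shows that plain string concatenation reproduces the host name from the log.

```python
from urllib.parse import urlsplit

SUPERVISOR_IP = "172.30.32.2"  # address as it appears in the log line
COMMAND = "os/config/sync"     # request path as it appears in the log line

# Assumption: the URL is assembled by simple string concatenation.
broken_url = f"http://{SUPERVISOR_IP}{COMMAND}"   # slash missing
fixed_url = f"http://{SUPERVISOR_IP}/{COMMAND}"   # slash present

print(urlsplit(broken_url).hostname)  # 172.30.32.2os
print(urlsplit(fixed_url).hostname)   # 172.30.32.2
```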

The host log only shows the ‘mounting’ of the drive:

[430112.206756] usb 1-1.1.3: new high-speed USB device number 7 using dwc_otg
[430112.341873] usb 1-1.1.3: New USB device found, idVendor=13fe, idProduct=3e00, bcdDevice= 1.00
[430112.341892] usb 1-1.1.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[430112.341904] usb 1-1.1.3: Product: USB Flash Drive
[430112.341916] usb 1-1.1.3: Manufacturer: Philips
[430112.341928] usb 1-1.1.3: SerialNumber: 07B8120890C73B5B
[430112.343881] usb-storage 1-1.1.3:1.0: USB Mass Storage device detected
[430112.344558] scsi host0: usb-storage 1-1.1.3:1.0
[430112.511037] usbcore: registered new interface driver uas
[430113.493635] scsi 0:0:0:0: Direct-Access     Philips  USB Flash Drive  PMAP PQ: 0 ANSI: 0 CCS
[430114.083144] sd 0:0:0:0: [sda] 7570752 512-byte logical blocks: (3.88 GB/3.61 GiB)
[430114.083609] sd 0:0:0:0: [sda] Write Protect is off
[430114.083627] sd 0:0:0:0: [sda] Mode Sense: 23 00 00 00
[430114.084058] sd 0:0:0:0: [sda] No Caching mode page found
[430114.084250] sd 0:0:0:0: [sda] Assuming drive cache: write through
[430114.140458]  sda: sda1
[430114.144340] sd 0:0:0:0: [sda] Attached SCSI removable disk

The drive is FAT formatted and is named ‘CONFIG’. The file containing the public key is named authorized_keys and is ANSI encoded with LF line endings only (no CR). I do not know why it fails. Any tips?
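
To rule out the file itself, its raw bytes can be checked for the usual suspects: a byte order mark, CR characters, or non-ASCII bytes. The sketch below is one way to do that; the function name `check_key_bytes` is made up for this example, and it assumes that every key line starts with the key type (lines that begin with options would be flagged as well).

```python
import sys

# Key lines are assumed to start with the key type, e.g. "ssh-rsa" or "ssh-ed25519".
KEY_PREFIXES = ("ssh-", "ecdsa-sha2-", "sk-")


def check_key_bytes(data: bytes) -> list:
    """Return a list of human-readable problems found in an authorized_keys file."""
    problems = []
    if data.startswith(b"\xef\xbb\xbf"):
        problems.append("starts with a UTF-8 byte order mark")
    if b"\r" in data:
        problems.append("contains CR characters")
    if any(byte > 127 for byte in data):
        problems.append("contains non-ASCII bytes")
    text = data.decode("ascii", errors="replace")
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are fine
        if not stripped.startswith(KEY_PREFIXES):
            problems.append(f"line {number} does not start with a known key type")
    return problems


# Usage: python this_script.py <path to authorized_keys on the USB drive>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        issues = check_key_bytes(handle.read())
    print("\n".join(issues) if issues else "no obvious problems found")
```

If this reports nothing, the file format is probably not the cause and the import step itself is the next thing to look at.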

Hi all, my Pi went bananas again last night. I hooked up a monitor to my Pi and it showed error messages from 01:52 onwards.

I have added my findings to the GitHub entry, as the person involved there (Michael) suggested.
In short, the following error messages seem to be the interesting ones:
[#####.#####] mmc0: card never left busy state
[#####.#####] mmc0: error -110 whilst initialising SD card

Looking up these messages on Google turns up remarks about bad SD cards.
But I am not entirely convinced here:

  1. The error only happens every Sunday at around 02:00
  2. Rebooting any other time (hard or the proper way) does not cause this to happen

One remark did trigger a new line of thought, though: my card might be a fake 32GB card.
Such a card tells the system it is 32GB, but in reality it only holds 16GB.
If the Pi runs some health check at 02:00 that includes the state of the SD card, it could try to access the card beyond the 16GB mark and go into error mode.

So, I am going to buy a new 32GB card, copy my HASS.IO setup to it and see what happens. If no error occurs, I will check the old card to see whether my assumption is correct.
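
Checking the old card for fake capacity comes down to filling it with data that differs per block and reading it back: a card that reports more space than it has will return wrong data for part of the blocks. Dedicated tools such as F3 or H2testw do this thoroughly. The sketch below only illustrates the principle; the function names are made up for this example, the test file has to be large enough to fill the (empty, freshly formatted) card, and the card should be removed and reinserted between writing and reading in a real test so that the read is not served from the operating system’s cache.

```python
import hashlib
import os
import sys

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block


def block_pattern(index: int) -> bytes:
    """Build a block whose content depends on its index, so misplaced writes show up."""
    seed = hashlib.sha256(str(index).encode()).digest()  # 32 bytes
    return seed * (BLOCK_SIZE // len(seed))


def verify_capacity(path: str, blocks: int) -> list:
    """Write `blocks` unique blocks to `path`, read them back, return the bad block numbers."""
    with open(path, "wb") as handle:
        for index in range(blocks):
            handle.write(block_pattern(index))
        handle.flush()
        os.fsync(handle.fileno())  # push the data out to the device
    bad = []
    with open(path, "rb") as handle:
        for index in range(blocks):
            if handle.read(BLOCK_SIZE) != block_pattern(index):
                bad.append(index)
    return bad


# Usage: python this_script.py <test file on the card> <number of 1 MiB blocks>
if __name__ == "__main__" and len(sys.argv) > 2:
    failed = verify_capacity(sys.argv[1], int(sys.argv[2]))
    print(f"{len(failed)} bad blocks" if failed else "all blocks read back correctly")
```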

Link to the GitHub entry:
HA stops work every Monday · Issue #47928 · home-assistant/core

I rebuilt my instance onto a new SD card last week, and also deleted the database after recovering from backup.
This morning everything was still working, for the first time in weeks (might be months by now).
Not sure which, if either, of the above changes made the difference. This is starting to look like an SD card issue, as mentioned by Harmpert above.

@Wilber keep your fingers crossed! Next weekend will be the test!

Question:
How did you get your configuration onto the new card?

Yes, fair point! :slight_smile: My database does seem to be growing quite quickly these days.

I just used the standard backup/recover method.
I took a snapshot before shutting down and downloaded it to my PC. Then I created the new disk image of HA, installed Samba, uploaded the snapshot to HA and recovered from that.

You do not even have to install Samba. Already during onboarding you can choose to select and upload a snapshot from your computer. See this post.
I remember it didn’t work flawlessly in my case, but in the end it worked. (I can’t fully remember what went wrong and how I solved it. I think the progress window didn’t show, but I just waited.)

@Wilber @plevuus Thanks for the tip!

@Plevuus, although I am going to try the card route, I am still not convinced that the problem we are facing is necessarily a card issue. I have seen more people with instability problems that were only temporarily solved by using a new card. Having said that, the card might be part of the problem.

@Harmpert, you can use the free DiskInternals Linux Reader to read the contents of your Home Assistant SD card on a Windows PC.
c’t magazine has a nice SD card test on their site (in Dutch).

@Wilber
How did your system behave last night? Still no problems?

I installed a new SD card and did not have a crash last night.
On the other hand, my configuration is not entirely back to normal yet…

Still up and running fine this morning!

I can understand how a rebuild would fix a crashing problem, but it’s very strange that several of us had the same issue at around the same time.

@Wilber

Good to hear!
Indeed, you do have a point. It is still a good idea to identify which process runs on Sunday night and triggers a (seemingly) problematic SD card to act up.

Hi, same problem here.

I started with HA in Nov. 2020, and from the beginning (I think) my RPi 4 has crashed every Monday during the night. I’m not running any automation at that time.
My HA crashed at 3h04 last night (summer time, Brussels, GMT+2). Some previous crashes happened (in winter time) at around 2h00, 1h30 and 2h30.
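
Whether the one-hour shift is just the switch to summer time can be checked by converting the reported times to UTC. A rough sketch, assuming Brussels is on UTC+1 in winter and UTC+2 in summer; the calendar date in the code is only a placeholder, since only the time of day matters with fixed offsets.

```python
from datetime import datetime, timedelta, timezone

CET = timezone(timedelta(hours=1), "CET")    # Brussels winter time
CEST = timezone(timedelta(hours=2), "CEST")  # Brussels summer time


def to_utc(hour: int, minute: int, zone: timezone) -> str:
    """Convert a local time of day to UTC and return it as HH:MM."""
    local = datetime(2021, 1, 4, hour, minute, tzinfo=zone)  # placeholder date
    return local.astimezone(timezone.utc).strftime("%H:%M")


# Crash times as reported in the post.
print(to_utc(3, 4, CEST))   # 01:04
print(to_utc(2, 0, CET))    # 01:00
print(to_utc(1, 30, CET))   # 00:30
print(to_utc(2, 30, CET))   # 01:30
```

In UTC all four times fall between 00:30 and 01:30, which would fit a trigger that is scheduled in UTC rather than in local time. That is only an observation on four data points, not a confirmed cause.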

I always see the same behavior:

  • No automations ran this morning.
  • I can still log on to my Pi from my PC, and in the Overview menu I can even switch some lights manually.
  • I cannot open any menu other than “Overview”. I get the error message “Unable to load the panel source: /api/hassio/app/entrypoint.js” when trying to open other menus, such as the logging menu, the Node-RED menu or the Supervisor menu.
  • I cannot connect to my RPi over Samba.

I always have to power-cycle my RPi.

After the power cycle, some screenshots show that memory use, disk use and processor temperature are all well within acceptable limits.

Environment: Raspberry PI model4 – 64bit - 4GB Ram – 64GB SSD
HA Software versions: Core 2021.3.3 / Supervisor 2021.03.6 / Host: 5.12 /
Add-ons: File editor: 5.2.0, Samba share V 9.3.1, Terminal&SSH 9.1.0, Zwave JS 0.1.16, Log Viewer 0.9.1, Nodered 8.2.1, DSS VOip Notifier 3.5.6

@hdehaseleer, you write 64GB SSD, but I assume you mean a 64GB SD card, right? What is the brand and model of the SD card and what is its speed class?
Please have a look at the linked GitHub issue. Multiple people got rid of the issue by upgrading their SD cards (actually, everyone who tried).

@Plevuus: Thanks for your prompt response.

Yes, I’m using an SD card (not an SSD): Kingston Canvas Select Plus, Class 10, UHS-I, U1, A1, 100MB/s.

I didn’t believe that a hardware problem could occur every Monday morning at 3h00 (and only on Monday).
But indeed, perhaps the OS is doing some cleanup on Monday at around 3h00 that the SD card cannot keep up with.
I’ll buy a new one and give it a try.