Help Building Resilience?

Update: As soon as I unplugged it, it must have locked up. After waiting longer than the 5 minutes in #4, it was completely inaccessible, so I had to pull the power and plug it back in to reboot it. When I then checked the logs, there was absolutely nothing in the log from the moment I unplugged the ethernet cable. When HA finally started up after the power-pull reboot, everything was working fine except for some command line sensors that look at the syslog. The log itself showed “binary match found”, which means a command line sensor calling grep on the syslog thought the log was binary. So I had to delete the syslog, which gets rid of any corruption in it, and restart the RPI (properly, with a shutdown -r command). Only then did everything come back normally.
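A possible alternative to deleting the log, offered as a sketch rather than a tested fix for this setup: grep reports a binary match instead of the matching lines when the file contains bytes such as NUL (which an unclean power-off can leave behind), and its `-a` (`--text`) option tells it to treat the file as text anyway. The function name `count_matches` is made up for illustration.

```shell
#!/bin/sh
# count_matches PATTERN FILE
# Count matching lines, treating FILE as text even if it contains
# stray NUL bytes (for example after an unclean power-off).
count_matches() {
    # -a: process a binary file as if it were text
    # -c: print the number of matching lines instead of the lines
    grep -a -c -- "$1" "$2"
}

# Example call (the pattern is only an illustration):
#   count_matches 'ERROR' /var/log/syslog
```

If a command line sensor already pipes the syslog through grep, adding `-a` to that grep call would be the equivalent change.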

My hard-coded (crontab) scheduled reboots, with their logging and email reporting, are working perfectly, but beyond that, if there is any issue like this, my RPI is vulnerable… I am surprised watchdog didn’t make it reboot. Anyway, my idea of log scraping won’t work either in a case like this… UGH

Maybe my 200 ERROR count idea PLUS something else. Jeez

Thoughts?

The HA log from the previous session is renamed to home-assistant.log.1 on startup, so you need to actually go into the folder and read it with a text editor, or copy it out to open on another computer.


I had no idea HA did this. Since when?

From my very vague recollection, 6 months -ish.

Edit: I think since Aug 2021, see “Change logging to do rollover() instead of rotate()” by janiversen (Pull Request #55177, home-assistant/core on GitHub).

Yes, the log in the home assistant directory will be rolled, but if you view it with journalctl -f homeassistant (e.g. on Debian) it will be continuous.

@KruseLuds let’s try to get HA out of the mix for a moment to get rid of confounding issues. Make a job that pings e.g. Google once a minute and pipes the output to a file. Or just open a terminal, run a continuous ping, and break your network again. If the ping doesn’t come back, we know this issue has nothing to do with HA directly.
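A minimal sketch of such a ping job, assuming a Debian host with the standard `ping` from iputils. The names `TARGET`, `LOGFILE`, `status_word` and `check_once` are illustrative, not anything HA or Debian provides.

```shell
#!/bin/sh
# Illustrative defaults; override them in the environment if needed.
TARGET="${TARGET:-google.com}"
LOGFILE="${LOGFILE:-$HOME/ping-check.log}"

# status_word EXIT_CODE -> prints UP when ping succeeded, DOWN otherwise
status_word() {
    if [ "$1" -eq 0 ]; then echo UP; else echo DOWN; fi
}

# check_once: send one ping (2 second timeout) and append a timestamped line
check_once() {
    ping -c 1 -W 2 "$TARGET" >/dev/null 2>&1
    rc=$?
    printf '%s %s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$TARGET" "$(status_word "$rc")" >> "$LOGFILE"
}

# To check once a minute from a terminal:
#   while true; do check_once; sleep 60; done
```

After pulling and replugging the cable, the log should show whether the lines go from UP to DOWN and back to UP, or stay DOWN until a reboot.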

I like this idea - will do more investigation and let you know, thanks!

@parautnbach I think you hit upon the root error at this point. Remember, I started with a pure and clean Debian and NOTHING else on it (I had to install sudo, etc.). I tried pinging google.com, and it was returning nice and fast. I disconnected the ethernet cable, waited a couple of minutes, and reconnected the cable. Since I had been running headless, I then reconnected my session, only to see that it never recovers the internet connection (but it does recover the local connection enough to allow me to attach to it headless again). Of course this makes HA go down the drain with repeated errors. Rebooting makes the problem go away, of course. I am now researching how a plain Debian (with an ethernet internet connection) can be configured, at the host level, to reconnect when the internet comes back. Looking forward to any ideas you may have while I research what I can add or reconfigure (hence my reminder that I started with a clean Debian and nothing else on it)…

So does it get its IP address from DHCP, or have you set a static?

Set as static on the router but not on the HA end. Because of that it always gets the same address anyway. I could make it static on the HA end as well, but that wouldn’t make any difference. Restarting NetworkManager, restarting eth0, nothing seems to make it reconnect once the ethernet cable is pulled and then put back in.

I had a command line sensor that reported the number of AdGuard DNS errors in the last 10 minutes using this syntax:

sudo journalctl --since "10 minutes ago" | grep "hassio_dns" | grep "ERROR] plugin/errors:" | wc -l

However, the lack of connectivity causes Home Assistant itself to lock up, so no action could be taken on the sensor. So I am going to write an app that runs on startup on the host, outside of HA. It will update the count using the above statement and call the service to restart the AdGuard add-on once the error count gets above 50. If the count continues to go up, it will just reboot the RPI once it reaches 500 log entries in the last 10 minutes.
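A rough sketch of what such a host-side script could look like, using the thresholds from the post (above 50: restart the add-on, 500 or more: reboot). The function names are made up, and the two actions are left as placeholders that only print what they would do, since the exact command for restarting the add-on on this install is an assumption I have not verified.

```shell
#!/bin/sh
# Thresholds from the post; all names below are illustrative.
RESTART_AT=50
REBOOT_AT=500

# count_dns_errors: same filter as the command line sensor,
# with grep -c doing the counting instead of wc -l
count_dns_errors() {
    journalctl --since "10 minutes ago" \
        | grep "hassio_dns" \
        | grep -c "ERROR] plugin/errors:"
}

# decide_action COUNT -> prints none, restart_addon or reboot
decide_action() {
    if [ "$1" -ge "$REBOOT_AT" ]; then
        echo reboot
    elif [ "$1" -gt "$RESTART_AT" ]; then
        echo restart_addon
    else
        echo none
    fi
}

# act ACTION: placeholders only; swap in the real commands
act() {
    case "$1" in
        restart_addon) echo "would restart the AdGuard add-on" ;;  # real command is an assumption, not shown
        reboot)        echo "would reboot the host" ;;              # e.g. shutdown -r now
        *)             : ;;
    esac
}

# Possible main loop, run as root at startup:
#   while true; do act "$(decide_action "$(count_dns_errors)")"; sleep 60; done
```

One design point to keep in mind: because the count covers a 10-minute window, it stays high for a while after a restart, so a real version would probably need a cool-down to avoid restarting the add-on on every pass.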

I know the above is a hack, but not being able to just get the damned thing to reconnect to the internet when the cable is plugged back in is driving me nuts (and I thoroughly tried everything under the sun when pulling the cable out and putting it back in, while Home Assistant was NOT running on the RPI)!

The pi should reconnect when the cable is pulled and then replugged.

Yes it should. What do you suggest? Debian 64 bit Home Assistant Supervised, RPI 4b w/8Gig RAM, booting off a 1TB SSD (specs above).

My PC running debian and supervised certainly doesn’t exhibit this behaviour, although I set it to a static IP (how, I cannot recall now!). It is not in the dhcp.leases file on my dhcp server so I guess that is correct.

I will set it to a static IP and see if that makes any difference.

@nickrout I’ve made the IP static on the RPI. Also, when Home Assistant is NOT running on it and I am at a command prompt and can ping google.com, pulling the ethernet cable out and plugging it back in leaves it unable to reach the internet any more, unless I reboot the Pi. Ideas?

Just to say that the network is handled by network manager, which must save some logs somewhere :slight_smile:
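On a systemd-based Debian, NetworkManager logs to the journal, so `journalctl -u NetworkManager` is one place to look. Since the symptom described in this thread is that the local connection comes back but the internet does not, it may also be worth checking whether a default route is present after the replug. This is only a diagnostic sketch, and `has_default_route` is a made-up helper name.

```shell
#!/bin/sh
# has_default_route: reads `ip route` output on stdin and succeeds
# if at least one default route is listed.
has_default_route() {
    grep -q '^default '
}

# After replugging the cable, on the Pi:
#   ip route | has_default_route && echo "default route present" || echo "no default route"
#   journalctl -u NetworkManager --since "15 minutes ago"
```

If the default route is missing after the replug, the NetworkManager log lines around the time the cable went back in would be the place to look for the reason.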

I run headless, so I lose the connection when I pull the plug. However, even though the internet is lost, I can reconnect headless once the plug is back in. I updated the IP to static (I had not done it properly before for some reason, but following these steps did it) and that fixed the problem, but only temporarily. When I unplugged the ethernet cable and plugged it back in, I saw that HA and AdGuard recovered. I then rebooted the RPI without AdGuard or HA running and unplugged and replugged the ethernet cable: unable to ping anything, so no recovery. I rebooted it again and tried the same with HASS running, and HA wasn’t able to recover either. So my IP is static and we are all set in that regard, but no solution yet. Thoughts, @nickrout?


Just to be 100% certain with what you mean here: It’s a static IP set on the Pi and not a DHCP reservation on your router, right?

Actually both. :slight_smile:

And to avoid cross posting and make things easier for people to read, I am posting the remaining problem statement (reconnecting) in a different post (here) and marking this one as completed, as that is the only thing left that I can think of regarding resilience on my end…