Unable to connect to Home Assistant. July 2021 release

yann1420 · August 2, 2021, 5:54pm

Hello all,
I have an RPi 4 (8 GB) running Home Assistant Operating System. While on holiday I updated to the latest July version from my Android app.
Now that I am home I cannot access HA any more: not with the Android app (“Unable to connect to Home Assistant.”), not with Firefox (“Unable to connect to Home Assistant.”), and not with Chromium (“This site can’t be reached”) on Ubuntu. With ssh I get “ssh: connect to host port 22: No route to host”.

I use a duckdns.org entry: no connection. Then I created an entry in my local dnsmasq to shortcut the DNS name: no connection. Local hostname: no connection. IP address: no access.
There is no access through SMB anymore either.

Can someone give me a hint on how to restore the instance?

Thanks in advance

aceindy · August 2, 2021, 6:29pm

Can you ping the IP address??

EnderTheThird · August 3, 2021, 4:29am

I’ve had this issue myself with Home Assistant OS installed on an Intel NUC. It ran perfectly for about 3-4 months until the July update broke things and the core system just stopped running. The Supervisor observer would show that it was running, but HA core wouldn’t start. I wasn’t able to get my snapshots to restore via CLI because I couldn’t sort them to find the most recent full restore, so I just started over from scratch.

That was going fine until a day or two ago, and then core stopped working again. This time I had saved multiple full snapshots on my desktop, but restoring those didn’t help: core still wouldn’t start. On a fresh install, I also tried restoring those snapshots, some of which were from a week or two ago, when everything was definitely working without any issues. Core crashes and won’t start, again.

Since the snapshots don’t seem to be working correctly, I’ve now said forget it and installed HA OS under VirtualBox with an Ubuntu install on that same NUC. So far so good, and the VBox snapshots work properly, though they’re much larger files. I tested it by starting to set up my new install, making sure everything worked, and then restoring an old snapshot, which did the trick of borking it, only this time I had a VBox snapshot to fall back on.

Something is screwy with HA Core and I have no idea how to figure out what’s causing it to fail across multiple installs. I tried the Docker install on my NAS, which worked fine, but I really like the HA OS install with supervisor since for 3-4 months, it worked absolutely perfectly. Anything I can send to try to help with diagnosing this issue, I’m happy to help. I can bork this install and send whatever logs since I know my VBox snapshots can get me running again.

yann1420 · August 3, 2021, 10:19am

Why did I not think of this elementary test?
And no, the IP address cannot be pinged, and there is no trace of it in the dhcpd log file.

So, what can I do? HA OS is a pretty closed system. The only thing I can think of is to take out the storage, copy off an old enough snapshot, and tar it back in.

Other suggestions?

yann1420 · August 3, 2021, 10:22am

HA is a nice product, but far from foolproof. I don’t want to imagine what happens if the YL (Young Lady) runs into this problem.

aceindy · August 3, 2021, 10:53am

euh…

cold reset?
hardware failure?
faulty network cable?
faulty port on router?
OS doesn’t boot properly?
Not using DHCP/wrong fixed IP set?

Could be many things…guess it’s time to hook up a monitor/keyboard?
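The network-side items on this list can be narrowed down from another machine on the LAN before a monitor is attached. Below is a minimal Python sketch, assuming the Pi's address is known: the address in the usage note is a placeholder, 8123 is Home Assistant's default web port, and 22 is the SSH port from the error message in the first post. It only tests TCP connections, so it complements ping rather than replacing it.

```python
import socket


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures
        # and "No route to host".
        return False


def reachability_report(host, ports=(8123, 22), timeout=2.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: probe(host, port, timeout) for port in ports}
```

Calling `reachability_report("192.168.1.50")` (a hypothetical address) returns a dict with one True/False entry per port. If every port fails for the bare IP address as well as for the hostnames, the problem sits below DNS: cabling, the switch port, the IP configuration, or a machine that never finished booting.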

yann1420 · August 4, 2021, 6:33am

Cold reset? I unplugged it, waited 20 seconds, and powered it on again. No change.
The network is wired (RPi 4). I checked the switch and all ports are OK.
HW failure is always possible, and in order to check I connected a monitor and keyboard to the headless Pi. A lot of error messages appeared on the screen. Among them:

– failed to start HassOS overlay Setup
– failed to start File System Check on /dev/disk/by-label/hassos-data
– failed to mount HassOS data partition

It then ended up in emergency mode. A df showed me that /dev/root is 100% full. There are 8 partitions on the SD card, but as I consider HassOS a black box (and it behaves like one to me), I have no clue what the functions of these partitions are.

So, I wonder what is next?

aceindy · August 4, 2021, 6:46am

Sounds like the ssd either has a reduced size (due to corruption) or is simply full…

You could try to find the file database.db (the database with your device history; on a default install the recorder database is named home-assistant_v2.db) and delete it, which should free up some space.
Then maybe it would start up again and allow you to take a snapshot.

But you should replace the ssd either way…

PS: a hard disk is normally split up into sections.
These sections are called ‘partitions’.
These partitions can differ in size.

yann1420 · August 4, 2021, 7:38am

Thank you for your help, Aceindy.

It is running from an SD card; I have not been able to get it onto an SSD.
In a life prior to my previous life I was admin for a large number of UNIX systems, but my knowledge has become somewhat rusty.

IMHO a supervised system should have warned me or, better, fixed potential problems for me (or other novices). I therefore see this incident as an opportunity to improve HassOS.
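As an illustration of the kind of warning meant here, below is a minimal Python sketch that flags a filesystem once it crosses a usage threshold. It is not how the Supervisor works; the 90% threshold and the function names are assumptions chosen for the example. A check like this is only meaningful on a writable filesystem, since a read-only image filesystem such as squashfs always reports 100% use.

```python
import shutil


def used_percent(total, used):
    """Percentage of a filesystem in use, given byte counts."""
    return 100 * used / total if total else 0.0


def disk_warning(path, threshold_pct=90.0):
    """Return a warning string if the filesystem holding path is too full, else None."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = used_percent(usage.total, usage.used)
    if pct >= threshold_pct:
        return f"{path}: {pct:.0f}% used, only {usage.free} bytes free"
    return None
```

`disk_warning("/")` returns `None` while usage is below the threshold. Running something like it before a large backup is written would give a chance to react before the disk fills.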

I see 8 partitions, of which 4 can be mounted:

/dev/mmcblk0p2     22773     22348         0 100% /media/yann/hassos-kernel
/dev/mmcblk0p3    123520    123520         0 100% /media/yann/disk1
/dev/mmcblk0p4     22773     22348         0 100% /media/yann/hassos-kernel1
/dev/mmcblk0p5    125184    125184         0 100% /media/yann/disk

Then there are:
/dev/mmcblk0p7 with the label hassos-overlay, which does not mount (read-only?)
/dev/mmcblk0p8 with the label hassos-data, which does not mount (read-only?)

root@leonie:/media/yann#  fsck -n  /dev/mmcblk0p8
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
Warning: skipping journal recovery because doing a read-only filesystem check.
hassos-data: clean, 311842/1916928 files, 4222403/7636731 blocks
root@leonie:/media/yann# fsck -n  /dev/mmcblk0p7
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
Warning: skipping journal recovery because doing a read-only filesystem check.
hassos-overlay: clean, 53/24576 files, 8949/98304 blocks
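The two summary lines above already say how full these partitions are. Below is a small Python sketch that parses the `label: clean, used/total files, used/total blocks` form shown in this output and turns it into percentages. The function name `parse_e2fsck_summary` is made up for illustration, and only this one summary format is handled.

```python
import re

# Matches summary lines of the form printed above, for example:
#   hassos-data: clean, 311842/1916928 files, 4222403/7636731 blocks
SUMMARY = re.compile(
    r"^(?P<label>[^:]+): clean, "
    r"(?P<files_used>\d+)/(?P<files_total>\d+) files, "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks$"
)


def parse_e2fsck_summary(line):
    """Return (label, inode use %, block use %), or None if the line does not match."""
    match = SUMMARY.match(line.strip())
    if match is None:
        return None
    inode_pct = 100 * int(match["files_used"]) / int(match["files_total"])
    block_pct = 100 * int(match["blocks_used"]) / int(match["blocks_total"])
    return match["label"], round(inode_pct, 1), round(block_pct, 1)


for line in (
    "hassos-data: clean, 311842/1916928 files, 4222403/7636731 blocks",
    "hassos-overlay: clean, 53/24576 files, 8949/98304 blocks",
):
    print(parse_e2fsck_summary(line))
```

By these numbers hassos-data is roughly 55% full and hassos-overlay roughly 9% full, so according to fsck neither of these two partitions has run out of blocks or inodes.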

This leaves:

root@leonie:/media/yann# fsck -n  /dev/mmcblk0p1
fsck from util-linux 2.34
fsck.fat 4.1 (2017-01-24)
0x25: Dirty bit is set. Fs was not properly unmounted and some data may be corrupt.
 Automatically removing dirty bit.
Leaving filesystem unchanged.
/dev/mmcblk0p1: 245 files, 3860/16343 clusters

and

root@leonie:/media/yann# fdisk -l /dev/mmcblk0p6
Disk /dev/mmcblk0p6: 8 MiB, 8388608 bytes, 16384 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
root@leonie:/media/yann# fsck  -n   /dev/mmcblk0p6
fsck from util-linux 2.34
e2fsck 1.45.5 (07-Jan-2020)
ext2fs_open2: Bad magic number in super-block
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open /dev/mmcblk0p6

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

root@leonie:/media/yann#  e2fsck -b 8193 /dev/mmcblk0p6
e2fsck 1.45.5 (07-Jan-2020)
e2fsck: Read-only file system while trying to open /dev/mmcblk0p6
Disk write-protected; use the -n option to do a read-only
check of the device.
root@leonie:/media/yann# e2fsck -b 32768 /dev/mmcblk0p6
e2fsck 1.45.5 (07-Jan-2020)
e2fsck: Read-only file system while trying to open /dev/mmcblk0p6
Disk write-protected; use the -n option to do a read-only
check of the device.

Anyway, to answer your suggestion: I have not been able to find a database.db and therefore could not free up any space yet.

francisp · August 4, 2021, 8:22am

yann1420:

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
 or
    e2fsck -b 32768 <device>

I have never been able to repair a disk with a bad superblock. I think you will have to reimage your SD.

yann1420 · August 4, 2021, 9:22am

Okay, I’ll start preparing myself mentally for this event.
I wonder where I could find a description of the purpose, size, filesystem type, etc. of each of those partitions.

aceindy · August 4, 2021, 9:39am

Maybe you can retrieve the backup .tar files from the backup folder?
If not, you can take a manual backup by copying the ‘config’ folder. It should contain a folder called ‘.storage’, which holds all devices and entities.
If you use zigbee2mqtt, its data can be backed up by copying /Share/Zigbee2mqtt/
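The manual copy described above can be done from any machine that can read the folder. Below is a minimal Python sketch that packs a folder into a timestamped .tar.gz. The paths in the usage note are placeholders for wherever the ‘config’ folder and the backup destination actually are, and `backup_folder` is a name invented for this example.

```python
import tarfile
import time
from pathlib import Path


def backup_folder(source, dest_dir):
    """Pack source (a folder) into dest_dir/<name>-<timestamp>.tar.gz and return the path."""
    source = Path(source).resolve()
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        # arcname stores paths relative to the folder name; hidden
        # entries such as dot-folders are included automatically.
        tar.add(str(source), arcname=source.name)
    return archive
```

For example, `backup_folder("/mnt/hassos-data/config", "/home/user/ha-backups")` (both paths hypothetical). The destination should be outside the folder being backed up, preferably on a different disk, so the archive neither includes itself nor fills the card it is meant to protect.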

yann1420 · September 7, 2021, 1:14pm

Hello Aceindy,
What I think happened is that the filesystem reached 100% after a backup was made prior to the upgrade, and that stopped the machine from working.
I reinstalled the system with 2021.08. As a self-respecting former sysadmin I had never made a backup so far, and therefore had to reverse engineer my setup, from which I have still not fully recovered to date.
And 2021.08, which has the new energy dashboard, gives me a whole new set of questions.
One thing I tackled immediately, though, is the backup, for which I use, with great pleasure, the Nextcloud integration/add-on.

Thanks, Yann