After the latest update I get no SMB or SSH access (connection refused) and no HA web interface (connection refused, both on the IP and on the DuckDNS name). I can ping the machine just fine and have direct shell access. The issues began when the SD card apparently filled up because the database was getting too large, even though I have it set to purge on an interval (apparently that doesn't work consistently either). The log files do not list any showstopping errors whatsoever. Please advise, thanks.
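For reference, the interval purge mentioned above is configured on the recorder integration in configuration.yaml. A minimal sketch of what that usually looks like (option names such as `purge_keep_days` and `purge_interval` vary between HA versions, so treat this as an assumption and check the docs for your release):

```yaml
# configuration.yaml - sketch of automatic database purging.
# Option names depend on your HA version; verify against your release.
recorder:
  purge_keep_days: 3   # keep only the last 3 days of history
  purge_interval: 1    # attempt a purge once per day
```

If the scheduled purge silently fails, calling the `recorder.purge` service manually from Developer Tools is one way to check whether purging works at all before the card fills up.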
Perhaps missing or corrupted files?
Depending on how HA is installed, it might be possible to force the upgrade again. Otherwise it is time to trust your backups and reformat ;(
Do NOT do what I did last weekend. I backed up my data, and reformatted the card without copying the backups off the card.
Wouldn't that produce logs? I can see the shell running and can log in on it. SMB works, which means that add-on is running; and if an add-on is running, HA must be running as well, which the shell confirms. It's just without the UI and SSH.
So SMB came up, and now SSH is back up as well. Now, if only HA itself would come up… but in the meantime I finally get something useful in the logs:
2019-04-08 20:58:00 ERROR (MainThread) [homeassistant.core] Error doing job: SSL handshake failed
Traceback (most recent call last):
  File "uvloop/sslproto.pyx", line 500, in uvloop.loop.SSLProtocol._on_handshake_complete
  File "uvloop/sslproto.pyx", line 484, in uvloop.loop.SSLProtocol._do_handshake
  File "/usr/local/lib/python3.7/ssl.py", line 763, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLError: [SSL: SSLV3_ALERT_CERTIFICATE_UNKNOWN] sslv3 alert certificate unknown (_ssl.c:1056)
Even weirder is that I can connect to the static IP I have been using for ages, but if I query the IP using ifconfig I get a totally different address which doesn't even belong to my network.
When you use the SSH add-on, you are connecting to a shell INSIDE a container running alongside hass.io. If ifconfig shows a 172.x.x.x address, that's the Docker internal network; ignore it.
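To illustrate the point: Docker's default networks are carved out of the 172.16.0.0/12 private block (the default bridge is 172.17.0.0/16), so an address in that range is a strong hint you're looking at the container side. A quick sketch using only Python's standard library:

```python
import ipaddress

# Docker allocates its default networks out of the 172.16.0.0/12
# private range unless you configure it otherwise.
PRIVATE_172 = ipaddress.ip_network("172.16.0.0/12")

def looks_like_docker_internal(addr: str) -> bool:
    """Return True if addr falls inside the 172.16.0.0/12 private block."""
    return ipaddress.ip_address(addr) in PRIVATE_172

print(looks_like_docker_internal("172.17.0.2"))    # typical container address -> True
print(looks_like_docker_internal("192.168.1.10"))  # typical LAN address -> False
```

This is only a heuristic, of course: some LANs legitimately use 172.16.x addressing, in which case the check proves nothing.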
Not sure if it’s any help, but I had almost the identical problem. First I lost my Zigbee devices (“unavailable”), then the HA web interface, then SSH. Oddly enough, I never lost SMB.
I tried restarting, rebooting, powering off and back on, and removing my HUSBZB USB stick. No luck. I restored an old config file, deleted the database, still no good. Finally disconnected it all and brought it to where I had a monitor, keyboard and mouse available, and fired it back up.
Worked fine then. Powered down, moved it back to its normal home, and it’s been ok for the last couple of days. Never did find the problem.
Got the same problem after updating to 0.91. Reverted back to 0.89 via SSH, and after 0.91.1 came out I updated again and it has worked since. Don't know how 0.91.2 will fare, though…
I found the same problem.
I noticed that the error occurs when using Google Chrome.
No errors with Firefox.
Why?
Try clearing your Chrome browser cache.
Interesting. This all happened after updating to 0.90.2. I see there’s a new version, 0.91.2. I’ll give it a shot and watch for similar issues.
I do typically use Chrome, both on the laptop and the phone, but I don't see the connection with my Zigbee devices becoming unavailable or with SSH not working. Still, I've had to clear the cache, or at least refresh, in the past, so it's worth a shot.
I have SSH and Samba, just no web interface. As a matter of fact, I think HA itself is still running, since my automations still work. Wouldn't it be nice if the developers stopped adding features and focused on core stability instead…
Weird. Mine is stable as hell. I reboot it once every few weeks after an update and don’t have to mess with it at all between updates.
Perhaps the problem is with your setup…
The original poster in this thread ran out of space on their card, likely resulting in file corruption. There is no way the developers could have avoided that short of physically upgrading the OP’s system themselves.
What would be really nice is if users read the release notes before upgrading and read the log files when things fail. You have SSH and Samba, so you have the tools to do that.
Also, let us know which solutions you tried that did not work.
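Since the full log can be huge, a quick way to pull out just the error lines is to filter on the log level. A minimal sketch (the sample text here is made up for illustration; on a hass.io box you would read /config/home-assistant.log instead):

```python
# Sketch: filter a Home Assistant log down to its ERROR lines.
# On hass.io the log usually lives at /config/home-assistant.log.
def error_lines(log_text: str) -> list:
    """Return only the lines logged at ERROR level."""
    return [line for line in log_text.splitlines() if " ERROR " in line]

sample = (
    "2019-04-08 20:57:59 INFO (MainThread) [homeassistant.core] Starting\n"
    "2019-04-08 20:58:00 ERROR (MainThread) [homeassistant.core] "
    "Error doing job: SSL handshake failed\n"
)
for line in error_lines(sample):
    print(line)
```

The same filtering can obviously be done with grep over SSH; the point is just to get the signal out of the noise before posting.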
My installation ran out of space because purging the database apparently was not running, even though it was configured per the documentation. There was no mention of, or intention for, any physical upgrade on my part, by the way, so that sounds like an assumption.
As a matter of fact I always read and follow the documentation to a tee, as I'm too stupid to figure these things out myself. There was no mention of an upgrade to what is essentially a dot-dot release breaking any core services. Reading the log files, the main issue that stands out is SSL. According to the documentation, DuckDNS should renew Let's Encrypt certificates automatically: set it and forget it. But apparently this component suffers from documentation issues as well. When investigating I see hundreds of users with the same issues, and about the same number of solutions, which basically boil down to "have you tried rebooting it?".
Upgrading to the latest version, restarting HA, rebooting the Pi. Reading countless threads from people having similar issues on Discord, Reddit, the forums, etc. Digging through the log files and pinpointing it to an SSL issue. Reopening the SSL thread I opened three months ago about Let's Encrypt not renewing.
Once you have filesystem corruption, the only solution is to copy off any important data, hope it is not corrupted, and then reformat and reinstall.
Unix-style filesystems do not handle running out of space well.
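Before assuming corruption, it's worth confirming whether the filesystem actually filled up. A minimal check using Python's standard library (the path is just an example; on a hass.io box you would point it at "/" or "/config"):

```python
import shutil

def usage_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    total, used, free = shutil.disk_usage(path)
    return used / total * 100

# "." works anywhere for illustration; use "/" or "/config" on the Pi.
pct = usage_percent(".")
print(f"{pct:.1f}% used")
if pct > 95:
    print("Nearly full - the recorder database is the usual suspect.")
```

The same information comes from `df -h` over SSH; either way, a near-100% figure alongside a multi-gigabyte home-assistant_v2.db points at the database rather than corruption.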
What makes you assume file corruption is the issue here? And suppose it is: I set my db to purge precisely to prevent a full filesystem, and the system didn't comply and thus corrupted itself… that's not software, that's a time bomb.
Your quote below.
I know from many years of experience that Unix filesystems are prone to corruption, especially if it happens while the system is trying to write a bunch of data.
Journaling filesystems don’t suffer these issues.
This hasn’t been an issue with Linux filesystems in over 15 years.
I disabled SSL in the config and got back in, so no ruined filesystem whatsoever. Now I only need to solve the irritating DuckDNS/SSL issue.
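For anyone hitting the same wall, the workaround is to comment out the certificate lines under `http:` in configuration.yaml and restart HA, after which the UI answers on plain http://&lt;ip&gt;:8123 again. The /ssl/ paths below are the usual DuckDNS/Let's Encrypt add-on defaults, so treat them as an assumption for your own setup:

```yaml
# configuration.yaml - temporarily disable SSL to regain UI access.
# Paths shown are the common DuckDNS/Let's Encrypt add-on defaults.
http:
#  ssl_certificate: /ssl/fullchain.pem
#  ssl_key: /ssl/privkey.pem
```

Remember this drops you back to unencrypted HTTP, so it's a recovery step, not a fix for the certificate renewal itself.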