HA Supervised Unreliable?

knebb · April 23, 2024, 11:00am

Hi,

I have had HA running on a Debian 12 VM for a while now. I have some add-ons installed (Influx, Grafana, …), but for lack of time there is not much going on on the system. I only check it from time to time, so several days often pass before I access the frontend.

Unfortunately, my HA installation is quite unreliable. When I try to connect, the browser cannot reach the server. Checking on the Debian system itself, it appears that port 8123 is no longer open.

So for some reason the port got closed and the process behind it vanished.
I am unsure how to recover here, and I want to know WHY the port is closed!

I tried docker restart hassio_supervisor.
I checked the logs with journalctl -xe, but I do not see any hints.
I re-installed os-agent with dpkg -i os-agent_1.6.0_linux_x86_64.deb.
I re-installed Docker with curl -fsSL get.docker.com | sh.
I even re-installed the package from a fresh download with apt install ./homeassistant-supervised.deb.

No matter what I try, it stays offline and I cannot connect.
This is the journalctl -f output after restarting docker:

Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.231 INFO (MainThread) [supervisor.host.services] Updating service information
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.233 INFO (MainThread) [supervisor.resolution.checks.base] Run check for free_space/system
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.234 INFO (MainThread) [supervisor.resolution.check] System checks complete
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.234 INFO (MainThread) [supervisor.resolution.evaluate] Starting system evaluation with state running
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.243 INFO (MainThread) [supervisor.host.network] Updating local network information
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.415 INFO (MainThread) [supervisor.host.sound] Updating PulseAudio information
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.420 INFO (MainThread) [supervisor.host.manager] Host information reload completed
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.446 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.450 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state running
Apr 23 12:52:22 hoas hassio_supervisor[171900]: 2024-04-23 12:52:22.450 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete
Apr 23 12:52:30 hoas systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Apr 23 12:52:49 hoas dockerd[171900]: 2024/04/23 12:52:49 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*respWriterWrapper).WriteHeader (wrap.go:98)
Apr 23 12:52:49 hoas systemd[1]: run-docker-runtime\x2drunc-moby-22b2f2f8b804eeee94a0d6076189437ec4603ced9e6b65686a74ecaa6a4d0255-runc.jE8PQ7.mount: Deactivated successfully.
Apr 23 12:52:52 hoas systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Apr 23 12:52:52 hoas systemd[1]: systemd-timedated.service: Deactivated successfully.
Apr 23 12:53:19 hoas systemd[1]: run-docker-runtime\x2drunc-moby-22b2f2f8b804eeee94a0d6076189437ec4603ced9e6b65686a74ecaa6a4d0255-runc.caCYxo.mount: Deactivated successfully.
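
Every Supervisor entry in this excerpt is at INFO level, so it documents a normal start rather than a failure. For longer captures, a small filter can surface only the entries above INFO. A minimal Python sketch, assuming the Supervisor line layout seen above (timestamp, level, thread, logger, message); the regular expression and the name `non_info_entries` are illustrative, not part of HA.

```python
import re

# Layout assumed from the excerpt above:
# "<date> <time> LEVEL (Thread) [logger] message"
SUPERVISOR_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]+)\) \[(?P<logger>[^\]]+)\] (?P<msg>.*)"
)


def non_info_entries(lines):
    """Yield (timestamp, level, logger, message) for entries that are not INFO or DEBUG."""
    for line in lines:
        match = SUPERVISOR_LINE.search(line)
        if match and match["level"] not in ("INFO", "DEBUG"):
            yield match["ts"], match["level"], match["logger"], match["msg"]
```

Feeding it the lines of a saved journalctl capture (for example via `open(path)`) yields nothing for the excerpt above, which is consistent with the log not showing the actual failure.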

After a full system reboot everything is up and running again and I can connect to my instance. I can also tell when it went down, because from that point on no MQTT data was collected.

Now, I could schedule a cron job to reboot the system every now and then, or whenever port 8123 becomes unreachable. But I guess we all agree this is not a solution to the issue.
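
A reachability probe can still be useful for diagnosis rather than as a reboot trigger: run periodically, it records when port 8123 stopped answering, which narrows the time window to search in the logs. A minimal Python sketch, assuming HA listens on TCP port 8123 on the local host; the names `port_open` and `status_line` and the 3-second timeout are illustrative choices.

```python
import socket
from datetime import datetime


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def status_line(host: str = "127.0.0.1", port: int = 8123) -> str:
    """Build one timestamped line describing the current port state."""
    state = "open" if port_open(host, port) else "CLOSED"
    stamp = datetime.now().isoformat(timespec="seconds")
    return f"{stamp} {host}:{port} {state}"
```

Printing `status_line()` once a minute from cron and appending the output to a file gives a timeline whose first CLOSED entry marks roughly when the service went away.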

Does anyone here have ideas for troubleshooting this? Why does only a reboot help? What steps can I take to figure out why it stopped?

Thanks a lot!

/KNEBB

Nick4 · April 23, 2024, 11:28am

No, HA (Supervised) is not unreliable, your system is.

You could restart in safe mode or disable (custom/community) integrations to see where it comes from.
Another option is to wait for someone who knows more than I do.

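I wonder if this line could play a role:

Apr 23 12:52:49 hoas dockerd[171900]: 2024/04/23 12:52:49 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*respWriterWrapper).WriteHeader (wrap.go:98)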

francisp · April 23, 2024, 11:30am

If you run in a VM anyway, why not HA OS?

knebb · April 23, 2024, 12:57pm

Hi,

There are several reasons I prefer my own OS and use supervised mode. Some of them:

Use of encrypted disks
Ability to use SSH for the full system and not just inside a Docker container
Scripting admin tasks to automate as much as possible
Not having a black box
And some more

Ok, regarding the issue:
These are my integrations:

Apple TV
DLNA
HA Supervisor
IFTTT
IPP
iRobot
Met.no
MobileApp
MQTT
Philips Hue
Ping (ICMP)
Shelly
Sun

Of course I can try to disable these one by one and wait to see whether HA breaks. If it does not, I can assume I found the culprit. But this would be a very odd way to troubleshoot.
I prefer to check the logs and find the reason for a problem.

I expected to get more information from the logs.
@Nick4
What might be the problem in the entry you quoted?

Thanks!

/KNEBB

WallyR · April 23, 2024, 1:35pm

Are you adept at running a Linux server with the requirements listed for Supervisor?
Many think they can run Debian, but HA Supervised sets specific requirements that limit what you can and can’t do.
One common mistake is to try and circumvent Network Manager, which typically makes a mess of the network configuration. Often to such a degree that a rollback is not possible and a reinstall is necessary.

paddy0174 · April 23, 2024, 1:51pm

See, that’s why supervised is not the preferred installation method. It is advised for people who know their own system exactly…

As you might have noticed, we can’t really help, because no one besides you knows how your system is built. All the points you name as advantages of supervised are exactly the points you’re now fighting with.

“not having a black box”: and now you don’t have a black box? You know less about your system than you would about a “closed” HA-OS install. The thing is, with HA-OS you don’t need to know. With supervised it’s your responsibility to know and to fix.
“Use SSH for the whole system”: Yep, same here. You don’t need to fiddle around in HA-OS, it just works. You only need the SSH connection because you have to work on parts of the system that HA-OS doesn’t give you access to, because you don’t need that access there.
“Scripting admin tasks”: tasks you wouldn’t need to schedule if you didn’t need to repair your system…

To be honest, I have had this discussion many times here in the forum, so I know the arguments. HA-OS is the better alternative for almost every use case!

I’d strongly recommend you set up your VM with HA-OS and “just use it”. If you want to fiddle around in a system, set up another VM and play there. There are literally no reasons for using supervised anymore. It’s so 2010, it’s just outdated.

Sorry, not what you wanted to hear I’m sure!

knebb · April 23, 2024, 1:57pm

Hi,

Well, yes. I have been running Linux servers for 30 years. I guess I have some experience.
And yes, I read (and followed!) the steps and requirements in detail!

And no, apart from some minor changes I did not configure anything in an unusual way.

I am wondering which service I would have to restart in order to get my HA instance working again.
I just did some troubleshooting and checked the available services in systemd. So next time it happens I will try to restart these services:

root@hoas:~# systemctl list-unit-files| grep hassio
hassio-apparmor.service                    enabled         enabled
hassio-supervisor.service                  enabled         enabled
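
Those two units only cover the Supervisor side. In a Supervised install the frontend on port 8123 is normally served by a separate container, commonly named `homeassistant`, so before restarting anything it may be worth recording that container's state: exit code, OOM flag and finish time remain readable until the container is recreated. A minimal Python sketch, assuming that container name and a working `docker` CLI; `container_state` and `summarize_state` are hypothetical helpers.

```python
import json
import subprocess


def container_state(name: str = "homeassistant") -> dict:
    """Return the State object that `docker inspect` reports for one container."""
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{json .State}}", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def summarize_state(state: dict) -> str:
    """Condense the fields that usually hint at why a container stopped."""
    return (
        f"status={state.get('Status')} "
        f"exit_code={state.get('ExitCode')} "
        f"oom_killed={state.get('OOMKilled')} "
        f"finished_at={state.get('FinishedAt')}"
    )
```

Running `print(summarize_state(container_state()))` while the port is closed, followed by `docker logs --tail 200 homeassistant`, captures the evidence that a reboot would otherwise wipe.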

As long as no one else has a better idea?

/KNEBB

fleskefjes · April 23, 2024, 1:59pm

Again, probably not what you want to hear, but if you’re having stability issues and need to restart services to get it running again, I would look at the underlying cause, not at workarounds to “hack it to work”.

boheme61 · April 23, 2024, 2:00pm

Is that log snippet all you’ve got?
Why not look in the logs from before the restart?
Have you enabled DEBUG mode in the logger?
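
On a systemd host, entries from before the reboot can be selected with journalctl's boot option, provided the journal is persistent (an assumption here; if journald only logs to volatile storage, the previous boot's entries are gone). A minimal Python sketch that assembles such a query; the helper name `previous_boot_query` is hypothetical, while `--boot`, `--priority`, `--unit` and `--no-pager` are standard journalctl options.

```python
def previous_boot_query(units=(), priority="warning", boot=-1):
    """Build a journalctl argument list for one boot, filtered by priority and units."""
    argv = [
        "journalctl",
        "--no-pager",
        f"--boot={boot}",          # -1 selects the boot before the current one
        f"--priority={priority}",  # this priority and anything more severe
    ]
    for unit in units:
        argv.append(f"--unit={unit}")
    return argv
```

Passing the result to `subprocess.run`, for example with `units=["docker.service", "hassio-supervisor.service"]`, lists warnings and errors from the boot in which the port disappeared.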

knebb · April 23, 2024, 2:04pm

Well, this was my original idea here…
But as no one could tell me where to look, or help with reading the log I provided, I am limited to workarounds…

@paddy0174 : You are trying to tell me HA-OS will not have any issues I need to troubleshoot? At least I can try to troubleshoot based on my Debian; in HA-OS I am not really able to get on the console… and I am pretty sure the logs will not be better…

/KNEBB

fleskefjes · April 23, 2024, 2:05pm

FWIW I’ve been running HAOS for some years now and I’ve never needed to SSH into it to restart services or troubleshoot anything at the OS level.

francisp · April 23, 2024, 2:06pm

What VM hypervisor are you using? You should be able to get to the console.

knebb · April 23, 2024, 2:08pm

Thanks for the hint about logger.

There is obviously some more, but I will not cut and paste loads of lines as long as we do not know what we are looking for… I’ll see if I can attach it…

Logger was set to info, now switched to debug:

logger:
 default: debug

/KNEBB

knebb · April 23, 2024, 2:09pm

Yes, I am. But HA is a server-type OS, and it is very cumbersome to troubleshoot on the VM console through the hypervisor (no cut and paste, wrong keyboard layout, and the like).

boheme61 · April 23, 2024, 2:13pm

Don’t call that snippet of log entries (taken after a restart!) troubleshooting logs!
Everything looks fine, maybe besides the part Nick4 mentioned.

With 30 years of experience running Linux servers, I’m sure you know how to Google a simple log output.

boheme61 · April 23, 2024, 2:15pm

Now restart HA again and monitor your logs until it spews errors, connection timeouts, etc.

Nick4 · April 23, 2024, 2:15pm

What about the first part of that question?

I don’t know, I just made that remark in case it might help.

boheme61 · April 23, 2024, 2:25pm

A simple Google search reveals this as the first result:

github.com/moby/moby

Syslog entry: "superfluous response.WriteHeader call" in Docker 25.x

opened 04:03AM - 26 Feb 24 UTC

jamescarppe

status/0-triage kind/bug area/metrics area/metrics/otel version/25.0

### Description

When retrieving the logs for a container (via `docker logs` for example), a log entry is made in `/var/log/syslog`:

```
Feb 26 16:54:53 be-docker dockerd[410]: 2024/02/26 16:54:53 http: superfluous response.WriteHeader call from go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp.(*respWriterWrapper).WriteHeader (wrap.go:98)
```

This appears to have been introduced alongside the OpenTelemetry addition in Docker 25.0.0 as this log message is not created on previous versions of Docker (for example 24.0.6).

### Reproduce

1. Install Docker 25.x
2. Create a container (any will do)
3. Monitor `/var/log/syslog` (via `tail -f` for example)
4. While monitoring, run `docker logs containername`
5. Note the entry in syslog

### Expected behavior

No "superfluous response" message.

### docker version

```bash
Client: Docker Engine - Community
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb 6 21:14:17 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435
  Built:            Tue Feb 6 21:14:17 2024
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```

### docker info

```bash
Client: Docker Engine - Community
 Version:    25.0.3
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.5
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 10
  Running: 7
  Paused: 0
  Stopped: 3
 Images: 33
 Server Version: 25.0.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: gelf
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
 Kernel Version: 5.4.0-171-generic
 Operating System: Ubuntu 20.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 1.925GiB
 Name: be-docker
 ID: HJDW:MIDK:VVF2:ZZC5:CCHA:6KOE:CCZU:TXJ3:2YUM:2KHI:NKJX:7ID3
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: jcarppe
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support
```

### Additional Info

While a single log line entry doesn't seem like much, when using a separate tool (such as Portainer) to view container logs that auto-refreshes, this can result in repeated syslog entries.
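
Per the issue, the message first appeared with the OpenTelemetry addition in Docker 25.0.0 and is written when container logs are retrieved, so on such a host it may be logging noise rather than a sign of the outage. A minimal Python sketch for checking whether a host falls into the reported version range, assuming the version string is taken from `docker version --format '{{.Server.Version}}'`; the helper name `in_reported_range` is hypothetical, and the issue text does not say whether later releases silence the message.

```python
def in_reported_range(server_version: str) -> bool:
    """True if the Docker major version is 25 or newer, the range named in the issue."""
    major = int(server_version.strip().split(".")[0])
    return major >= 25
```

For the versions quoted in the issue, `in_reported_range("25.0.3")` is true and `in_reported_range("24.0.6")` is false.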

Now tell me, with your 30 years of Linux vs. HA experience: is this caused by HA or by your Docker environment?
PS: I am not saying it will lead you to a solution, but I do know that if you had experienced this in HAOS and it was an issue, it would be fixed by the HA devs.

PS: In HA Supervised you can also find logs in the UI under Settings / System / Logs, if I remember right. It has been 3 years since I “tried” Supervised and came to this conclusion: no, you can’t really do what you want with your “own” Debian OS. HA sets the rules you have to follow, so you really have to know HA (all parts) and the requirements for a Supervised installation, besides the Docker environment, which was totally unknown to me, “pre-retired” for more than a decade.

Anyway, your DEBUG output might lead you to an easier conclusion about what’s going on in your system.
And you can’t exclude all your Debian logs just because you can’t access HA (as you have already noticed).

paddy0174 · April 23, 2024, 2:59pm

Nope, I’m not trying to tell you, I’m telling you! History and experience show, time and again, that the only error-free supervised installations are the ones taken care of by Linux specialists.

I know from my own experience what I’m talking about. I had supervised running for nearly three years and switched to HA-OS after some nice guy here in the forum had a heated exchange with me. I tried it, and have had zero problems since then! My HA-OS install in a Proxmox VM has now been running for almost half a year without restarts of the VM or any problems.

Believe it or not, but it’s your choice:
Either be a computer nerd who wastes time, nerves and money on a supervised install that brings no advantages, or be an HA user who focuses on making his life easier.

No rudeness meant, really, but if you ask around, everybody will tell you the same (as a few already did here in this topic): use HA-OS in a VM and not supervised! Simple as that!

Why not try it for yourself and prove me or yourself wrong? Set up a VM with HA-OS, ideally with Proxmox and not VMware, restore the backup from your supervised installation and let it run for at least two weeks or one major update.
I can practically guarantee you that you’ll be better off with that installation.

As I said, this forum and its many users have proven my point right.

WallyR · April 23, 2024, 3:00pm

The reason I wrote what I did is that the usual way is not to use Network Manager but other tools, and that is simply a no-go.
Using ifconfig or directly editing config files will trash Network Manager’s control and will then affect HA in unexpected ways.
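
Whether Network Manager still controls an interface can be read from `nmcli`. A minimal Python sketch, assuming the terse output of `nmcli -t -f DEVICE,STATE device status` (one `device:state` pair per line); the helper name `unmanaged_devices` is hypothetical. Loopback and Docker's virtual interfaces may legitimately show up as unmanaged, so the entry to look for is the VM's main network interface.

```python
from typing import List


def unmanaged_devices(nmcli_output: str) -> List[str]:
    """Return the device names whose state is 'unmanaged' in terse nmcli output."""
    devices = []
    for line in nmcli_output.splitlines():
        # Split on the last colon so the state is always the final field.
        device, sep, state = line.rpartition(":")
        if sep and state.strip() == "unmanaged":
            devices.append(device)
    return devices
```

If the main interface appears in the returned list, the network is being configured outside Network Manager, which is the situation described above.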