btasker
November 6, 2021, 11:42pm
70
This is good news.
I captured some metrics (leading to the GH issue above) and found that use of Cloudflare added about half a second of latency for me (despite CF only being about 15ms away). I’ve been meaning to rewrite the fallback to use UDP to try and prove whether it’s DoT overhead, but got sidetracked on other things.
Having adguard active shouldn’t come into play, no - as Phil says, the underlying problem is that the cloudflare fallback bypasses everything/anything you’ve set up on the LAN.
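For anyone wanting to try that comparison themselves, here's a rough sketch of the plain-UDP side. It hand-rolls a minimal DNS A query per the RFC 1035 wire format rather than using a resolver library; the server IP and hostname in the usage example are placeholders, not anything taken from HA itself:

```python
import socket
import struct
import time

def build_query(name: str, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS A-record query packet (RFC 1035 wire format)."""
    # Header: ID, flags (RD set), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question

def time_udp_query(server: str, name: str, timeout: float = 2.0) -> float:
    """Send one plain-UDP query to `server` and return round-trip time in seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        start = time.monotonic()
        s.sendto(build_query(name), (server, 53))
        s.recvfrom(512)
        return time.monotonic() - start

# Example (placeholder server): time_udp_query("1.1.1.1", "example.com")
```

Timing the same queries over DoT (TCP 853 plus a TLS handshake) and diffing the two would show how much of the half-second is protocol overhead versus something else.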
1 Like
btasker
November 6, 2021, 11:45pm
71
Hopefully it’ll gain some traction, but I’m not going to hold my breath.
To be honest, if it doesn’t, then I’ve already got a path in mind.
I have a working PoC for replacing the supervisor and coredns images with patched versions without triggering the CodeNotary checks (so it won’t mark itself unhealthy/unsupported and block updates). If push comes to shove, that’ll allow me to patch out any bits they’re not willing to fix as and when they arise, and then I can get back to using HA as an appliance rather than a source of frustration.
3 Likes
pk198105
(Pk198105)
November 7, 2021, 4:25pm
72
What is so difficult about making this a configurable option? Let the people decide: let people opt out of the DNS-over-TLS Cloudflare lookup if they don’t want it.
2 Likes
I replaced the CoreDNS image with a custom-made dnsmasq container instead. Works great.
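For anyone considering the same route, a minimal sketch of the kind of dnsmasq config involved. The upstream server address is an example, and `172.30.32.3` is the address `ha dns info` reports for the Supervisor DNS on the install quoted later in this thread — verify both on your own setup:

```txt
# dnsmasq.conf sketch (example values - adjust for your network)
no-resolv                   # ignore /etc/resolv.conf entirely
server=10.33.2.254          # forward everything to the local DNS server
listen-address=172.30.32.3  # the address Supervisor expects its DNS container on
cache-size=1000
```

The point of the swap is that dnsmasq only ever talks to the upstream you give it, with no hardcoded DoT fallback.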
myhades
(Alaric)
November 8, 2021, 3:11am
74
This is getting unacceptable.
My HA runs in a network where all traffic must go through my own DNS server to avoid DNS poisoning, so any other DNS requests are blocked or redirected.
I wasn’t aware of the situation before this post, and it explains a LOT about why HA’s been hitting some weird network issues.
PLEASE DON’T force people to use embedded DNS, because Google devices do and they’re such a pain in the ass.
4 Likes
btasker
November 8, 2021, 8:34am
75
Just watch that your install doesn’t get marked Unsupported and Unhealthy as a result - it’ll prevent you from installing add-on updates.
One of the things Supervisor does is list out running containers and check they’re “approved”. One of the conditions of a supervisor install is
The operating system is dedicated to running Home Assistant Supervised.
(from architecture/adr/0014-home-assistant-supervised.md at 3331ea0255844a46d2830f3d780672b37b5793ff · home-assistant/architecture · GitHub ).
It feels like there’s a certain disconnect with reality in all this. I don’t mind not getting support where I’ve done something custom, I do mind not being able to update stuff though.
Yes, this caught me when I first tried replacing it without any nuance. Supervisor went nuts trying to replace it but of course, as soon as it stopped the container, it lost name resolution - and therein lies madness (and, possibly ironically, the reason DNS is so tenaciously configured).
Since then I’ve prepared my replacement container so that it’s no longer rejected by Supervisor, but it does take a bit of a fandango during upgrades.
This is fantastic. I did however have an issue trying to install:
Failed to install add-on
The command '/bin/bash -o pipefail -c apk add --no-cache docker' returned a non-zero code: 1
From system log:
21-11-23 15:05:47 INFO (SyncWorker_6) [supervisor.docker.addon] Starting build for 68e874ae/aarch64-addon-coredns-fix:0.1.1
21-11-23 15:06:01 ERROR (SyncWorker_6) [supervisor.docker.addon] Can't build 68e874ae/aarch64-addon-coredns-fix:0.1.1: The command '/bin/bash -o pipefail -c apk add --no-cache docker' returned a non-zero code: 1
21-11-23 15:06:01 ERROR (SyncWorker_6) [supervisor.docker.addon] Build log:
Step 1/23 : ARG BUILD_FROM=ghcr.io/hassio-addons/base/amd64:10.0.1
Step 2/23 : FROM ${BUILD_FROM}
 ---> 77db0d03c09e
Step 3/23 : SHELL ["/bin/bash", "-o", "pipefail", "-c"]
 ---> Using cache
 ---> 2643c024c7ad
Step 4/23 : ENV TERM="xterm-256color"
 ---> Using cache
 ---> 330311bea649
Step 5/23 : ARG BUILD_ARCH=amd64
 ---> Using cache
 ---> cfccee773891
Step 6/23 : RUN apk add --no-cache docker
 ---> Running in 0c97ca0a26fa
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/main/aarch64/APKINDEX.tar.gz
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.14/main: temporary error (try again later)
btasker
November 24, 2021, 8:55am
78
I think the Alpine repo’s been having some issues this week - I had an issue building another image the other day.
I should probably push a pre-built image to a registry, though; there’s no real need to build it each time as we’re not customising anything based on the build host.
2 Likes
Yeah, just tried again and got the same error - will keep trying until I forget.
GaryK
(Gary Kelley)
November 27, 2021, 8:42pm
80
Supervisor log output:
21-11-27 13:38:18 INFO (MainThread) [supervisor.store.git] Cloning add-on https://github.com/bentasker/HomeAssistantAddons/ repository
21-11-27 13:39:06 ERROR (MainThread) [supervisor.store.git] Can't clone https://github.com/bentasker/HomeAssistantAddons/ repository: Cmd('git') failed due to: exit code(128)
cmdline: git clone -v --recursive --depth=1 --shallow-submodules https://github.com/bentasker/HomeAssistantAddons/ /data/addons/git/68e874ae
stderr: 'Cloning into '/data/addons/git/68e874ae'...
fatal: unable to access 'https://github.com/bentasker/HomeAssistantAddons/': The requested URL returned error: 504
'.
21-11-27 13:39:06 INFO (MainThread) [supervisor.resolution.module] Create new issue IssueType.FATAL_ERROR - ContextType.STORE / 68e874ae
21-11-27 13:39:06 INFO (MainThread) [supervisor.resolution.module] Create new suggestion SuggestionType.EXECUTE_REMOVE - ContextType.STORE / 68e874ae
21-11-27 13:39:06 ERROR (MainThread) [supervisor.store] Can't load data from repository https://github.com/bentasker/HomeAssistantAddons/
It appears GitHub has an issue today.
Not wanting to flog a dead horse, but I’ve raised yet another issue on GitHub about this, as the behaviour on start-up has changed with the latest version
opened 03:13PM - 08 Jan 22 UTC
bug
### Describe the issue you are experiencing
After updating to the latest version and rebooting the host machine, I see a storm of DNS messages to 1.1.1.1 and 1.0.0.1 at a rate of approximately 1,330 requests per minute. This persists for 10-12 minutes, then stops.
During this time, a 'normal' number of DNS(53) requests are also sent to the locally assigned DNS server, the system is fully functional, but shows an approximate doubling of CPU usage during this period.
![Screenshot from 2022-01-08 15-01-53](https://user-images.githubusercontent.com/59442445/148649166-dfc9acfc-0feb-46a0-8437-d4f4e78bfa78.png)
### What is the used version of the Supervisor?
supervisor-2021.12.2
### What type of installation are you running?
Home Assistant OS
### Which operating system are you running on?
Home Assistant Operating System
### What is the version of your installed operating system?
Home Assistant OS 7.1
### What version of Home Assistant Core is installed?
core-2021.12.8
### Steps to reproduce the issue
1) All port 853 DNS requests on the network are redirected to a local service. N.B. In previous versions, blocking 853 requests would result in a 'message storm', but redirecting these same requests would not.
2) Update and reboot the host
### Anything in the Supervisor logs that might be useful for us?
```txt
These are the DNS logs, which I think are unhelpful because they contain no timestamp information.
[INFO] 127.0.0.1:39215 - 9428 "NS IN . udp 17 false 512" NOERROR - 0 5.432606477s
[ERROR] plugin/errors: 2 . NS: dial tcp 1.1.1.1:853: i/o timeout
[INFO] 127.0.0.1:34895 - 59462 "NS IN . udp 17 false 512" NOERROR - 0 5.458035354s
[ERROR] plugin/errors: 2 . NS: dial tcp 1.0.0.1:853: i/o timeout
[INFO] 127.0.0.1:56739 - 46027 "NS IN . udp 17 false 512" NOERROR - 0 5.331660312s
[ERROR] plugin/errors: 2 . NS: dial tcp 1.1.1.1:853: i/o timeout
[INFO] 127.0.0.1:58808 - 46741 "NS IN . udp 17 false 512" NOERROR - 0 5.420126791s
[ERROR] plugin/errors: 2 . NS: dial tcp 1.1.1.1:853: i/o timeout
[... dozens more near-identical "NS IN ." query timeouts against 1.1.1.1:853 and 1.0.0.1:853 trimmed for brevity ...]
[INFO] 127.0.0.1:51071 - 35802 "NS IN . udp 17 false 512" NOERROR - 0 4.311122831s
[ERROR] plugin/errors: 2 . NS: x509: certificate is valid for *.mydomain.org, mydomain.org, not cloudflare-dns.com
[INFO] 127.0.0.1:43592 - 61309 "NS IN . udp 17 false 512" NOERROR - 0 1.328996582s
[ERROR] plugin/errors: 2 . NS: x509: certificate is valid for *.mydomain.org, mydomain.org, not cloudflare-dns.com
[INFO] 127.0.0.1:47269 - 991 "NS IN . udp 17 false 512" NOERROR - 0 2.8312739479999998s
[ERROR] plugin/errors: 2 . NS: x509: certificate is valid for *.mydomain.org, mydomain.org, not cloudflare-dns.com
[INFO] 127.0.0.1:51775 - 59057 "NS IN . udp 17 false 512" NOERROR - 0 0.070910355s
[ERROR] plugin/errors: 2 . NS: x509: certificate is valid for *.mydomain.org, mydomain.org, not cloudflare-dns.com
```
### Additional information
I understand that the CoreDNS component used in HA is configured to use these Cloudflare DNS servers as a fallback; however, when these servers are unreachable (clearly, redirected DoT requests will fail because of the certificate mismatch), it should not cause the flood of network traffic observed at startup, especially when the locally configured DNS servers are functioning perfectly.
4 Likes
avd706
(Avd706)
January 8, 2022, 4:20pm
82
At this point, it’s better to modify the sources to shut this off.
A brave move. Best of luck.
Tommmii:
CoreDNS will, for no detectable reason, start using the hardcoded fallback. (This is bad, but workable.)
The real gotcha is that it will never revert back to the original configuration, but stays stuck on the fallback.
This has just happened on mine… I have a DHCP-assigned static IP with a client option set for localised managed DNS, including RPZ for local network protection. At some point in the last couple of days, even though it can still see the local DNS server, it no longer uses it at all…
So now none of my RPZ overrides work, and none of my local DNS resolves. I now have to go through my SMTP automations and rewrite them with hardcoded IPs because HA no longer resolves properly.
I’m struggling to come up with a reasonable idea of why someone thought this was a good idea.
2 Likes
le_top
March 24, 2022, 2:15pm
85
In my old post about this I indicated how I reconfigured ha using
ha dns options --servers dns://10.33.2.254 --servers dns://80.80.80.80 --servers dns://80.80.81.81
and also by adding a rule to my firewall.
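For reference, the firewall side of that can be sketched with plain iptables-style NAT rules. The interface name and DNS server address are examples (OPNsense users would express the same thing through the NAT/Port Forward UI rather than on the command line):

```shell
# Redirect all outbound plain DNS from the LAN to the local resolver
iptables -t nat -A PREROUTING -i lan0 -p udp --dport 53 -j DNAT --to-destination 10.33.2.254:53
iptables -t nat -A PREROUTING -i lan0 -p tcp --dport 53 -j DNAT --to-destination 10.33.2.254:53
# Redirect DoT (853) as well, so the hardcoded fallback can't bypass the local resolver
# (note the DoT client will then see a certificate mismatch, as described above)
iptables -t nat -A PREROUTING -i lan0 -p tcp --dport 853 -j DNAT --to-destination 10.33.2.254:853
```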
I am surprised about the network address resolution in HA.
I have an OPNSense firewall that provides its own IP as the DNS server and when examining the DNS configuration using ha dns info, I get:
[core-ssh ~]$ ha dns info
host: 172.30.32.3
locals:
- dns://10.33.2.254
servers: []
update_available: false
version: 2021.06.0
version_latest: 2021.06.0
On my firewall, I override some DNS entries to point to the local network address (10.X.X.X) rather than the public (dynamic) network address.
So …
Many thanks to @CentralCommand for implementing an option to disable the fallback in version 2022.05.0 of the Supervisor. It’s been a long road, but we finally got there.
SSH into your HA instance and simply type:
ha dns options --fallback=false
No more fallback…, job done
17 Likes
Yep, that’s it!
My one suggestion for those reading this, please run the following command first:
ha resolution info
I put in some checks which test user-provided DNS servers to ensure they don’t have issues. The check for the situation I described here in particular is not obvious: it’s entirely possible that your local DNS server has this issue and you’ve never noticed, since it only affects musl-based systems.
So please run that command and make sure no DNS server issues are in the list. If there are none, then feel free to disable the fallback.
If there are, then I would strongly advise fixing those first, otherwise you may have unexpected issues - particularly around updating and installing containers, since queries for github.com and ghcr.io resolve on A queries but not AAAA. If you do have the IPv6 issue I linked and you disable the fallback anyway, you will likely see all your HA containers suddenly start to think github.com and ghcr.io don’t exist and hit a lot of problems.
10 Likes
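A rough way to spot-check the A-versus-AAAA behaviour described above, independent of HA's own `ha resolution info` checks: hand-roll one A and one AAAA query against a specific server and compare the response codes. This is a sketch, not HA's actual check; the server address passed to `check_server` and the use of github.com as the test name are just illustrative:

```python
import socket
import struct

def build_query(name: str, qtype: int, txid: int = 0x0001) -> bytes:
    """Minimal DNS query packet (RFC 1035); qtype 1 = A, 28 = AAAA."""
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(p)]) + p.encode("ascii") for p in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", qtype, 1)  # QCLASS=IN

def rcode(response: bytes) -> int:
    """Extract the response code from a DNS header (0 = NOERROR, 3 = NXDOMAIN)."""
    return response[3] & 0x0F

def check_server(server: str, name: str = "github.com") -> dict:
    """Query one server for A and AAAA records. A misbehaving server of the
    kind described above answers the A query but returns NXDOMAIN for AAAA."""
    results = {}
    for label, qtype in (("A", 1), ("AAAA", 28)):
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            s.settimeout(2.0)
            s.sendto(build_query(name, qtype), (server, 53))
            results[label] = rcode(s.recvfrom(512)[0])
    return results

# Example: check_server("192.168.1.1") - if "A" is 0 but "AAAA" is 3,
# your server has the issue and disabling the fallback will hurt.
```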
petro
(Petro)
May 5, 2022, 12:27pm
89
Because this feature request has been implemented, please make new posts for support on this. This FR is formally closed.
4 Likes