Improve Privacy, Stop using hardcoded DNS

Yep it should do.

Under the hood what it does is periodically go into the HomeAssistant DNS container and check what config coredns has. If it’s not the “approved” version (i.e. has reverted to the official version) it’ll change the config and restart the coredns process to make it use the new config.
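As a rough illustration of that check-and-fix cycle (a minimal sketch, not the add-on's actual code): the four callables below are hypothetical stand-ins for "read the config from the DNS container", "write it back", "restart coredns" and "produce the approved config". How the real add-on does each of those steps isn't shown in this thread.

```python
from typing import Callable


def reconcile(
    read_config: Callable[[], str],
    write_config: Callable[[str], None],
    restart_dns: Callable[[], None],
    make_approved: Callable[[str], str],
) -> bool:
    """Run one check. Returns True if the config had to be fixed."""
    current = read_config()
    approved = make_approved(current)
    if current == approved:
        # Config is already the approved version, nothing to do.
        return False
    # Config has reverted (or was never patched): rewrite it and restart
    # the DNS process so that it picks up the new config.
    write_config(approved)
    restart_dns()
    return True
```

Calling `reconcile(...)` on a timer gives the "periodically check" behaviour; a second call straight after a fix should return `False` and leave the DNS process alone.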

There’s also a mode that should prevent supervisor from auto-updating and quietly breaking your install (trying to fix that originally led me to this DNS issue).

There are 2 modes, depending on your preference:

  • auto-patch (the default) just removes any reference to 127.0.0.1:5553 (which is where the fallback listens)
  • Template applies a config that you provide in /config

auto-patch probably works best for most people at the moment, but it’s possible that might change in future HomeAssistant releases.
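To make the auto-patch idea concrete, here is a minimal sketch of stripping references to 127.0.0.1:5553 from a CoreDNS config. The sample Corefile is a cut-down illustration rather than a copy of what any HomeAssistant release ships, and `strip_fallback` is a hypothetical helper, not the add-on's own implementation.

```python
FALLBACK_ADDR = "127.0.0.1:5553"

# Illustrative Corefile only (an assumption, not taken from a real install).
SAMPLE_COREFILE = """\
.:53 {
    forward . dns://192.168.1.1 dns://127.0.0.1:5553 {
        policy sequential
        health_check 1m
    }
    fallback REFUSED,SERVFAIL,NXDOMAIN . dns://127.0.0.1:5553
    cache 600
}
"""


def strip_fallback(corefile: str) -> str:
    """Return the config with every reference to the fallback removed."""
    kept = []
    for line in corefile.splitlines():
        if FALLBACK_ADDR not in line:
            kept.append(line)
            continue
        if line.strip().startswith("fallback"):
            # This directive only exists to reach the fallback: drop it.
            continue
        # Otherwise remove just the fallback upstream and keep the rest of
        # the line (splitting on single spaces preserves the indentation).
        tokens = [tok for tok in line.split(" ") if FALLBACK_ADDR not in tok]
        kept.append(" ".join(tokens))
    return "\n".join(kept) + "\n"
```

With the sample above, the `forward` line keeps only the local upstream and the `fallback` line disappears. Running the function on already-patched output changes nothing, which is what you want from something that runs repeatedly.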

Take with a pinch of salt, but my installation definitely feels more responsive with it installed - Cloudflare’s DNS is slow compared to my local (partly because it’s more distant). I used to assume that the ewelink API was slow, turns out that it was probably the name resolution.

Thanks for a detailed answer! Excuse this possibly silly question, but can this (local operation of HA) be tested by just yanking the WAN cable out of my router? It should, right?

I’d have thought so, yeah.

What you observe will probably depend on how your local DNS (router I assume) handles not being able to reach an upstream. Some sit and timeout (so you’d still see delays), others fail early (so it’ll be quite responsive, but obviously stuff relying on external name resolution will be broken).
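If you want to see which of those two behaviours your resolver has, a minimal sketch like the one below times a lookup through the system resolver. `time_lookup` is a hypothetical helper name, and it measures whatever resolver the machine running it is configured to use, so run it from a box on the same LAN.

```python
import socket
import time
from typing import Tuple


def time_lookup(hostname: str) -> Tuple[bool, float]:
    """Resolve hostname via the system resolver.

    Returns (resolved?, seconds taken).
    """
    start = time.perf_counter()
    try:
        socket.getaddrinfo(hostname, None)
        resolved = True
    except socket.gaierror:
        # Resolution failed; the elapsed time shows whether it failed
        # early or sat waiting for a timeout.
        resolved = False
    return resolved, time.perf_counter() - start
```

Call it for an external name with the WAN cable in and again with it out: a failure that comes back in milliseconds suggests the resolver fails early, one that takes several seconds suggests it sits and times out.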

CoreDNS uses a default dial timeout of 30s, so with the official version you’ve got quite a big delay in there when the WAN is down (calling scripts might use a shorter timeout - I haven’t looked, but I suspect not).

The timeout is built into CoreDNS and isn’t configurable AFAIK - yet another reason why baking CoreDNS into HomeAssistant is a bizarre choice: it’s most commonly seen in Kubernetes clusters, not appliances (there are much better suited options if local resolution is a must).

Let me know how you get on, it’s possible I may be able to find a way around it if it proves to be a sticking point

For reference

The… uh… assault on your network if you block Cloudflare at the edge is the result of a misconfiguration in the HomeAssistant CoreDNS config: I’ve raised a bug here: CoreDNS is misconfigured leading to unexpected healthcheck behaviour · Issue #64 · home-assistant/plugin-dns · GitHub

Basically, because that fallback is in the main forward statement, it gets healthchecked once a minute. If you’ve blocked Cloudflare, then the upstream queries will fail and the :5553 block will retry them, slowly building up a nice little traffic storm.

It is, very much, a bug in the HA DNS setup

I tried yanking the WAN cable out of the router (to simulate local-only operation) and most of HA’s functionality was retained. Yeelight bulbs switched to LAN mode after a few seconds. Of course Tuya wouldn’t work, but that’s expected. Even most Xiaomi stuff worked locally, which was unexpected. Not bad.

Does DNS blocking with Adguard Home (running as a HA addon on the same Rpi4) come into play at all with this? I have set my router DHCP server’s primary DNS to the Pi’s IP, and the secondary to 8.8.8.8, so all devices are filtered by AGH and there’s a fallback if AGH isn’t running. Does this matter with the issue at hand at all?

Short answer, no. The fallback bypasses any local DNS settings on your network (that’s the problem)

@btasker Nice issue raised on GitHub, but I predict it will just be closed with no response (just like the last four issues I raised).

Fingers crossed I’m wrong.

This is good news.

I captured some metrics (leading to the GH issue above) and found that use of Cloudflare added about half a second of latency for me (despite CF only being about 15ms away). I’ve been meaning to rewrite the fallback to use UDP to try and prove whether it’s DoT overhead, but got sidetracked on other things.
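For anyone wanting to take a similar measurement, here is a minimal sketch that times a single plain-UDP A-record query against a resolver using only the standard library. It is not how the metrics above were captured; the function names are hypothetical, and it only measures plain UDP DNS, so it says nothing about DoT by itself.

```python
import socket
import struct
import time
from typing import Optional


def build_query(name: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet asking for the A record of name."""
    # Header: id, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed with its length, ending in 0.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = IN.
    return header + qname + struct.pack("!HH", 1, 1)


def time_udp_query(resolver_ip: str, name: str, timeout: float = 3.0) -> Optional[float]:
    """Seconds taken to get any reply from resolver_ip, or None on failure."""
    query = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        try:
            sock.sendto(query, (resolver_ip, 53))
            sock.recvfrom(512)
        except OSError:
            # Covers timeouts as well as unreachable-network errors.
            return None
        return time.perf_counter() - start
```

Running `time_udp_query` a handful of times against your local resolver and against 1.1.1.1 gives a rough plain-UDP baseline to compare with whatever latency you see through the fallback. Repeat the query with different names, otherwise you are mostly timing the resolver's cache.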

Having adguard active shouldn’t come into play, no - as Phil says, the underlying problem is that the cloudflare fallback bypasses everything/anything you’ve set up on the LAN.

Hopefully it’ll gain some traction, but I’m not going to hold my breath.

To be honest, if it doesn’t, then I’ve already got a path in mind.

I have a working PoC for replacing the supervisor and coredns images with patched versions without triggering the codeNotary checks (so it won’t mark itself unhealthy/unsupported and block updates). If push comes to shove, that’ll allow me to patch out any bits they’re not willing to fix as and when they arise, and then I can get back to using HA as an appliance rather than a source of frustration :)

What is so difficult about making this a configurable option? Let people decide: let them opt out of the DNS-over-TLS Cloudflare lookup if they don’t want it.

I replaced the CoreDNS image with a custom-made dnsmasq container instead. Works great.

This is getting unacceptable.
My HA runs in a network where all traffic must go through my own DNS server to avoid DNS poisoning, so any other DNS requests are blocked or redirected.
I wasn’t aware of the situation before this post, and it explains A LOT about why HA’s been encountering some weird network issues.

PLEASE DON’T force people to use embedded DNS, because Google devices do and they’re such a pain in the ass.

Just watch your install doesn’t get marked Unsupported and Unhealthy as a result - it’ll prevent you from installing addon updates.

One of the things Supervisor does is list out running containers and check they’re “approved”. One of the conditions of a supervisor install is

The operating system is dedicated to running Home Assistant Supervised.

(from architecture/adr/0014-home-assistant-supervised.md at 3331ea0255844a46d2830f3d780672b37b5793ff · home-assistant/architecture · GitHub).

It feels like there’s a certain disconnect with reality in all this. I don’t mind not getting support where I’ve done something custom, I do mind not being able to update stuff though.

Yes, this caught me when I first tried replacing it without any nuance. Supervisor went nuts trying to replace it but of course, as soon as it stopped the container, it lost name resolution and therein lies madness (and possibly ironically, the reason DNS is so tenaciously configured).

Since then I have prepared my replacement container such that it is no longer rejected by Supervisor, but it does take a little bit of a fandango during upgrades.

This is fantastic. I did however have an issue trying to install:

Failed to install add-on

The command '/bin/bash -o pipefail -c apk add --no-cache docker' returned a non-zero code: 1

From system log:
21-11-23 15:05:47 INFO (SyncWorker_6) [supervisor.docker.addon] Starting build for 68e874ae/aarch64-addon-coredns-fix:0.1.1
21-11-23 15:06:01 ERROR (SyncWorker_6) [supervisor.docker.addon] Can't build 68e874ae/aarch64-addon-coredns-fix:0.1.1: The command '/bin/bash -o pipefail -c apk add --no-cache docker' returned a non-zero code: 1
21-11-23 15:06:01 ERROR (SyncWorker_6) [supervisor.docker.addon] Build log:
Step 1/23 : ARG BUILD_FROM=ghcr.io/hassio-addons/base/amd64:10.0.1
Step 2/23 : FROM ${BUILD_FROM}
 ---> 77db0d03c09e
Step 3/23 : SHELL ["/bin/bash", "-o", "pipefail", "-c"]
 ---> Using cache
 ---> 2643c024c7ad
Step 4/23 : ENV TERM="xterm-256color"
 ---> Using cache
 ---> 330311bea649
Step 5/23 : ARG BUILD_ARCH=amd64
 ---> Using cache
 ---> cfccee773891
Step 6/23 : RUN apk add --no-cache docker
 ---> Running in 0c97ca0a26fa
fetch https://dl-cdn.alpinelinux.org/alpine/v3.14/main/aarch64/APKINDEX.tar.gz
WARNING: Ignoring https://dl-cdn.alpinelinux.org/alpine/v3.14/main: temporary error (try again later)

I think the Alpine repo’s been having some issues this week - I had an issue building another image the other day.

I should probably push a pre-built image to a registry though, there’s no real need to have to build it each time as we’re not customising anything based on the build host

Yeah just tried again and same error - will keep trying until I forget

Supervisor log output:

21-11-27 13:38:18 INFO (MainThread) [supervisor.store.git] Cloning add-on https://github.com/bentasker/HomeAssistantAddons/ repository
21-11-27 13:39:06 ERROR (MainThread) [supervisor.store.git] Can't clone https://github.com/bentasker/HomeAssistantAddons/ repository: Cmd('git') failed due to: exit code(128)
  cmdline: git clone -v --recursive --depth=1 --shallow-submodules https://github.com/bentasker/HomeAssistantAddons/ /data/addons/git/68e874ae
  stderr: 'Cloning into '/data/addons/git/68e874ae'...
fatal: unable to access 'https://github.com/bentasker/HomeAssistantAddons/': The requested URL returned error: 504
'.
21-11-27 13:39:06 INFO (MainThread) [supervisor.resolution.module] Create new issue IssueType.FATAL_ERROR - ContextType.STORE / 68e874ae
21-11-27 13:39:06 INFO (MainThread) [supervisor.resolution.module] Create new suggestion SuggestionType.EXECUTE_REMOVE - ContextType.STORE / 68e874ae
21-11-27 13:39:06 ERROR (MainThread) [supervisor.store] Can't load data from repository https://github.com/bentasker/HomeAssistantAddons/

It appears GitHub has an issue today.

Not wanting to flog a dead horse, but I’ve raised yet another issue on GitHub about this, as the behaviour on start-up has changed with the latest version
