Under the hood, it periodically checks the config that CoreDNS is using inside the HomeAssistant DNS container. If that isn’t the “approved” version (i.e. it has reverted to the official version), it rewrites the config and restarts the CoreDNS process so that the new config takes effect.
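To make that check-and-restore step a bit more concrete, here is a minimal sketch in Python. It is not the actual implementation: the function names are made up for illustration, and reading the config, writing it back and restarting the DNS process are passed in as hypothetical callables, because how you reach into the container depends on your install.

```python
import hashlib
from typing import Callable


def config_digest(text: str) -> str:
    """Return a stable fingerprint of a config file's contents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def enforce_config(
    read_config: Callable[[], str],        # hypothetical: fetch the live config
    write_config: Callable[[str], None],   # hypothetical: overwrite the live config
    restart_dns: Callable[[], None],       # hypothetical: restart the DNS process
    approved: str,
) -> bool:
    """Restore the approved config if the live one differs.

    Returns True when the config was replaced (and the restart hook
    was called), False when nothing needed doing.
    """
    if config_digest(read_config()) == config_digest(approved):
        return False
    write_config(approved)
    restart_dns()
    return True
```

Calling `enforce_config` on a timer (cron, a sleep loop, whatever suits) gives the "periodically" part. Only restarting when something actually changed matters here, since every restart briefly interrupts name resolution.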
There’s also a mode that should prevent supervisor from auto-updating and quietly breaking your install (trying to fix that originally led me to this DNS issue).
There are 2 modes, depending on your preference:
auto-patch (the default) just removes any reference to 127.0.0.1:5553 (which is where the fallback listens)
Template applies a config that you provide in /config
auto-patch probably works best for most people at the moment, but that might change in future HomeAssistant releases.
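For anyone curious what the auto-patch idea amounts to, a rough Python sketch follows. Treat it as an illustration only: the sample config is my assumption of the general shape of the file rather than a copy of the real one, and `strip_fallback` is a hypothetical name, not something the tool exposes.

```python
FALLBACK_ADDR = "127.0.0.1:5553"

# Assumed, simplified config layout, for illustration only.
SAMPLE = """\
forward . dns://192.168.1.1 dns://127.0.0.1:5553 {
    policy sequential
}
fallback REFUSED,SERVFAIL,NXDOMAIN . dns://127.0.0.1:5553"""


def strip_fallback(corefile: str, addr: str = FALLBACK_ADDR) -> str:
    """Remove every reference to the fallback listener from a config.

    Lines that start with 'fallback' and mention addr are dropped whole.
    On any other line, only the tokens containing addr are removed.
    """
    patched = []
    for line in corefile.splitlines():
        if addr not in line:
            patched.append(line)
            continue
        tokens = line.split()
        if tokens[0] == "fallback":
            # The directive is useless without its upstream, so drop it.
            continue
        indent = line[: len(line) - len(line.lstrip())]
        kept = [tok for tok in tokens if addr not in tok]
        patched.append(indent + " ".join(kept))
    return "\n".join(patched)
```

Running `strip_fallback(SAMPLE)` leaves the local upstream in the forward line and drops the fallback line. The sketch doesn't handle a forward line whose only upstream was the fallback address, which would leave an invalid directive behind.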
Take with a pinch of salt, but my installation definitely feels more responsive with it installed - Cloudflare’s DNS is slow compared to my local resolver (partly because it’s more distant). I used to assume that the ewelink API was slow, but it turns out that it was probably the name resolution.
Thanks for a detailed answer! Excuse this possibly silly question, but can this (local operation of HA) be tested by just yanking out the WAN cable off my router? It should, right?
What you observe will probably depend on how your local DNS (router I assume) handles not being able to reach an upstream. Some sit and timeout (so you’d still see delays), others fail early (so it’ll be quite responsive, but obviously stuff relying on external name resolution will be broken).
CoreDNS uses a default dial timeout of 30s, so with the official version you’ve got quite a big delay in there when the WAN is down (calling scripts might use a shorter timeout - I haven’t looked, but I suspect not).
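If you want numbers rather than a feeling when you pull the cable, something like the following Python sketch times a single lookup and tells you whether it succeeded. The `time_lookup` helper and its `resolver` parameter are my own invention for illustration. By default it uses `socket.gethostbyname`, so it measures whatever resolver the machine you run it on is configured to use.

```python
import socket
import time
from typing import Callable, Tuple


def time_lookup(
    hostname: str,
    resolver: Callable[[str], object] = socket.gethostbyname,
) -> Tuple[float, bool]:
    """Time one name lookup.

    Returns (elapsed_seconds, succeeded). A failed lookup is reported
    rather than raised, because how long a failure takes is exactly
    what we want to know when the WAN is down.
    """
    start = time.monotonic()
    try:
        resolver(hostname)
        succeeded = True
    except OSError:  # socket.gaierror is a subclass of OSError
        succeeded = False
    return time.monotonic() - start, succeeded
```

Run it once with the WAN up and once with it down, e.g. `print(time_lookup("example.com"))`. A resolver that fails early will return quickly with `False`, while one that sits and times out will show the delay in the first value.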
The timeout is built into CoreDNS and isn’t configurable AFAIK - yet another reason why baking CoreDNS into HomeAssistant is a bizarre choice. It’s most commonly seen in Kubernetes clusters, not appliances (there are much better suited options if local resolution is a must).
Let me know how you get on, it’s possible I may be able to find a way around it if it proves to be a sticking point
Basically, because that fallback is in the main forward statement, it gets healthchecked once a minute. If you’ve blocked Cloudflare, then the upstream queries will fail and the :5553 block will retry them, slowly building up a nice little traffic storm.
I tried yanking the WAN cable out of the router (to simulate local-only operation) and most of HA’s functionality was retained. Yeelight bulbs switched to LAN mode after a few seconds. Of course Tuya wouldn’t work, but that’s expected. Even most Xiaomi stuff worked locally, which was unexpected. Not bad.
Does DNS blocking with Adguard Home (running as a HA addon on the same Rpi4) come into play at all with this? I have set my router DHCP server’s primary DNS to the Pi’s IP, and the secondary to 8.8.8.8, so all devices are filtered by AGH and there’s a fallback if AGH isn’t running. Does this matter with the issue at hand at all?
I captured some metrics (leading to the GH issue above) and found that use of Cloudflare added about half a second of latency for me (despite CF only being about 15ms away). I’ve been meaning to rewrite the fallback to use UDP to try and prove whether it’s DoT overhead, but got sidetracked on other things.
Having adguard active shouldn’t come into play, no - as Phil says, the underlying problem is that the cloudflare fallback bypasses everything/anything you’ve set up on the LAN.
Hopefully it’ll gain some traction, but I’m not going to hold my breath.
To be honest, if it doesn’t, then I’ve already got a path in mind.
I have a working PoC for replacing supervisor and coredns images with patched versions without triggering the codeNotary checks (so it won’t mark itself unhealthy/unsupported and block updates) - if push comes to shove that’ll allow me to patch out any bits they’re not willing to fix as and when they arise, and then I can get back to using HA as an appliance rather than a source of frustration
What is so difficult about making this a configurable option? Let the people decide, and let people opt out of the DNS-over-TLS Cloudflare lookup if they don’t want it.
This is getting unacceptable.
My HA runs in a network where all traffic must go through my own DNS server to avoid DNS poisoning, so any other DNS requests are blocked or redirected.
I wasn’t aware of the situation before this post, and it explains A LOT about why HA’s been encountering some weird network issues.
PLEASE DON’T force people to use embedded DNS. Google devices do, and they’re such a pain in the ass.
It feels like there’s a certain disconnect with reality in all this. I don’t mind not getting support where I’ve done something custom, I do mind not being able to update stuff though.
Yes, this caught me when I first tried replacing it without any nuance. Supervisor went nuts trying to replace it but of course, as soon as it stopped the container, it lost name resolution and therein lies madness (and possibly ironically, the reason DNS is so tenaciously configured).
Since then I have prepared my replacement container such that it is no longer rejected by Supervisor, but it does take a little bit of a fandango during upgrades.
I think the Alpine repo’s been having some issues this week - I had an issue building another image the other day.
I should probably push a pre-built image to a registry though. There’s no real need to build it each time, as we’re not customising anything based on the build host.
Not wanting to flog a dead horse, but I’ve raised yet another issue on GitHub about this, as the behaviour on start-up has changed with the latest version