Improve Privacy, Stop using hardcoded DNS

Tommmii · August 30, 2021, 9:54am

This, 99% !!
The other 1% : why offer the option to configure a DNS server, if you’re then going to ignore it in some cases ?
If/when any user (village idiot or system admin) decides to use the DNS server configuration option, then that explicitly takes the DNS issues/problems out of HA’s hands.

Either allow & obey the user’s setting,
or when there is no user setting do whatever you think is correct.

tescophil · August 30, 2021, 10:11am

Exactly. There are several layers of issues here, mixing privacy with bad practice and just dumb-ass coding mistakes, the worst of which is that the fallback for the fallback is the fallback, so the system gets stuck in an infinite, load-increasing loop when the fallback is blocked/unavailable.

getut · September 9, 2021, 4:48pm

I just found this thread after locking down my network so that apps cannot bypass my DNS filtering. The whole DNS over HTTPS thing is ludicrous and a huge security nightmare when network owners can’t see and control the traffic on their network. Applications should always honor the OS DNS settings, and all attempts to bypass that need to be treated as a malicious bypass. If someone can explain how I can open a bug ticket to add my name to the shutdown list, I’ll be more than happy to do it. But this crap needs to stop. It’s not OK to even have it as a configuration item; it simply needs to honor system DNS, full stop. If DNS over HTTPS is supported, it needs to be configured at the system level or not at all. Again, applications doing their own resolution is a malicious bypass of system owner control of the system and network.

Wim_L · September 17, 2021, 6:22am

After months of DNS misery with our secured network, we moved to regular Docker containers in a Debian VM instead of HAOS. It was some work to link Node-RED, Zigbee etc. and move the configs, but what a difference.
Load has dropped significantly and HA responds a lot faster. Both VMs have 1 core and 2 GB of RAM available.
No unwanted supervisor/audio/dns etc. containers, and with docker-compose updating is only a pull away. Don’t think I’ll ever run supervised again…

ahasbini · September 20, 2021, 11:43pm

I’ve spent weeks figuring out this problem. It all started because the pihole integration would from time to time stop showing data when using its hostname; after restarting HA it would immediately work for some time, then suddenly stop again. And the worst part is that it felt like the fallback was being triggered randomly. I’ve got ping monitors set up on my network, yet the core-dns plugin just randomly switches to Cloudflare at times when there aren’t any obvious issues, and never switches back. I wish there were more details available to better understand this and figure out improvements.

And now that I understand the issue, the only choice the devs give is to move my setup from HassOS to a standalone container setup, without any mention of this problem in the first place. That’s really a waste of time and effort for me, because the reason I went with HassOS was its great variety of features/integrations, add-on store and easy setup. I’m really hoping that this gets fixed.

alec.fenichel · September 25, 2021, 6:04pm

How do people feel about this as a solution?

Tommmii · September 26, 2021, 8:12am

Anything that removes the brainless hardcoded DNS servers is a giant step in the right direction.

tescophil · September 30, 2021, 7:14am

Shocker, this PR was closed with a final comment from the Devs of:

“We are not interested in further discussing this topic.”

Gilf0y · October 20, 2021, 4:07pm

That’s a ridiculous comment. Any Devs care to explain this? The way I read it: intentionally weakening security.

tom_l · October 20, 2021, 10:01pm

Not a dev or sysadmin but from this post it seems you were given a way forward:

That will not solve the main issue which we solved with this. If you want this then you need to do the following:

Open a PR to Supervisor and add an option to disable fallback which marks your system as unsupported and extend it to the API + tests

Create an PR to the developer documentation for having it there

Create an PR to CLI repository for using that options

Create an PR to this repository to get this options in place

bigbang · October 22, 2021, 12:51am

Can’t you just add a NAT forwarding rule to your firewall for all TCP/UDP traffic going from LAN to !LAN on port 53 to force it to your local DNS / firewall?
Got the idea from this channel: A comprehensive guide to pfSense Pt 6 - DNS - YouTube

tescophil · October 22, 2021, 6:27am

In response to the last two comments:

The ‘way forward’ suggested was to make the solution to this issue ‘unsupported’, which all responders to the PR found unacceptable.
You cannot redirect DoT traffic on port 853 (not 53), because the requester (HA) will check the returned TLS certificate and throw an error, since it was expecting a cert for Cloudflare 1.1.1.1.
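
To make the certificate point concrete, here is a minimal Python sketch of what a verifying TLS client does when it connects to port 853. It is not HA's or coredns's code, only an illustration of standard hostname verification. The function name `dot_handshake` and its parameters are made up for this example.

```python
import socket
import ssl


def dot_handshake(ip, server_name, port=853, timeout=3.0):
    """Attempt a verified TLS handshake and return a short status string.

    ip          -- address actually connected to (possibly a redirect target)
    server_name -- name the client expects to find in the certificate
    """
    # The default context checks both the CA chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=server_name):
                return "ok"
    except ssl.SSLCertVerificationError as exc:
        # A redirected connection ends up here: the local service cannot
        # present a certificate that is valid for the expected name.
        return f"certificate rejected: {exc.verify_message}"
    except OSError as exc:
        # Refused, blocked or timed-out connections end up here.
        return f"connection failed: {exc}"
```

Calling something like `dot_handshake("192.168.1.1", "cloudflare-dns.com")` against a local resolver (both values are assumptions for illustration) would be expected to report a rejected certificate rather than "ok", which is the failure described above. Plain DNS on port 53 has no such check, which is why redirecting it works.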

Overall it looks like this issue is dead. The Devs have actively decided to weaken security in favour of easing their support burden.

HeyImAlex · October 22, 2021, 6:34pm

As a HA core user, there is one thing I don’t really understand about the issue discussed here. Set aside all security implications and lack of configurability of the fallback DNS and possible increase of support activity if the fallback is removed. I think both sides have some good points here.

But - am I correct understanding that HAOS basically goes batshit insane and starts flooding your internal network with an ever accelerating avalanche of DNS requests as soon as it doesn’t have access to the internet anymore ? If so, then this is a very serious bug. It simply means that HAOS is essentially unusable without a permanent internet connection. And its failure mode is to take your entire network down with it.

And it seems that the devs are unwilling to fix this bug even though they are fully aware of it ? And willingly let it DOS everything on your network, saying that this is intended behavior and done by design ? If so then I have no words.

btasker · October 30, 2021, 4:21pm

am I correct understanding that HAOS basically goes batshit insane and starts flooding your internal network with an ever accelerating avalanche of DNS requests as soon as it doesn’t have access to the internet anymore ?

That’s my understanding, yes.

Most devs go with exponential back-off to prevent a thundering herd effect, but it seems that coredns just queues retries again, and again, and again, tying up system resources as well as hammering your network.

So it’s not just DOSing your network, it’s DOSing itself.
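
For anyone who hasn't seen it, this is roughly what exponential back-off looks like. It is a generic Python sketch, not anything taken from coredns or the Supervisor; the names `backoff_delays` and `retry` and all default values are invented for the example.

```python
import random
import time


def backoff_delays(base=1.0, cap=300.0, attempts=8, jitter=0.0, rng=None):
    """Yield the wait in seconds before each retry: base * 2**n, capped."""
    rng = rng or random.Random()
    for n in range(attempts):
        delay = min(cap, base * (2 ** n))
        # Optional jitter spreads clients out so they don't retry in lockstep.
        if jitter:
            delay *= 1 + rng.uniform(-jitter, jitter)
        yield delay


def retry(operation, delays, sleep=time.sleep):
    """Call operation, waiting longer after each failure."""
    for delay in delays:
        try:
            return operation()
        except OSError:
            sleep(delay)
    # Final attempt: if this fails too, let the error reach the caller.
    return operation()
```

With `base=1` and `cap=30` the waits are 1, 2, 4, 8, 16, 30, 30 seconds, so an unreachable upstream sees fewer and fewer requests instead of more and more.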

If so then I have no words.

This combined with some other issues leaves me looking at having to move to running a core container on an OS I maintain, because the devs aren’t willing to accept a PR without attaching conditions that no-one sane would agree to (do all this work and anyone who uses it will be unsupported).

Which raises the question of whether it’s worth it: what happens when they break something in core and then won’t fix it? As a user, am I doomed to a future of having multiple “fixed” scripts exported as volumes over the broken ones?

tescophil · October 30, 2021, 4:57pm

Currently (as the OP) I have found only one acceptable ‘Work Around’ for this.

Since simply blocking DoT 853 requests only causes HA to flood the network with more requests, an alternate approach is to redirect these requests to a local service.

Whilst these redirected requests will also fail because of the SSL certificate mismatch, currently this does not cause HA to spam the network with additional requests.

When I implemented this on my network, all I see are two requests every 5 mins (health check I assume), one to 1.1.1.1 and the other to 1.0.0.1.

Not perfect, but the best solution I’ve found so far, as clearly the Devs are not going to release a supported build without the fallback functionality.

btasker · October 30, 2021, 5:16pm

Thanks @tescophil

That’s at least feasible.

I’ve been toying with the idea of creating my own privileged add-on that’ll periodically

Copy down corefile from the DNS container
Check if it’s changed vs its own template
If it has, copy it up and kill coredns to force a reload

The advantage of this is it’ll survive reboots and updates. The disadvantage is that it’s extra complexity, and something I’ll have to maintain going forward. But, it would also mean I have a tool at hand for the next breaking change/thing.
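
The three steps above could be sketched in Python along these lines. This is only an outline under assumptions: the paths are plain local files, how the file is copied to and from the DNS container is left out, and `reload` is a hypothetical callback standing in for "kill coredns".

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of a file's contents, used to detect changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def enforce_template(live: Path, template: Path, reload) -> bool:
    """Overwrite live with template if they differ.

    Returns True when the file was replaced (and reload was called),
    False when the live file already matched the template.
    """
    if live.exists() and digest(live) == digest(template):
        return False
    live.write_bytes(template.read_bytes())
    reload()  # hypothetical hook, e.g. restart the resolver process
    return True
```

Run from a timer, this only touches the file and triggers a reload when something (an update, a restart) has put the original contents back.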

btasker · October 30, 2021, 7:40pm

Just in case anyone wants it, I’ve gone the route of throwing together a custom add-on to enforce my will on the coredns container

It’s dirty and it’s messy, but it works and survives reboots etc.

It

Strips out the references to 127.0.0.1:5553 to prevent fallback
Removes the healthchecks so coredns won’t spam your firewall

At some point I’ll probably change it to use a template for the /etc/core rewrite so that it’s less likely to break if there are unexpected changes to the file in a future update.
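
For illustration only, the two edits listed above might look like this in Python. This is not the add-on's actual code; `strip_fallback` is a made-up name, and `SAMPLE` is an assumed, simplified corefile fragment rather than the real file shipped with HAOS.

```python
FALLBACK_ADDR = "127.0.0.1:5553"  # address named in the post

# Assumed, simplified input; the real corefile may look different.
SAMPLE = """\
.:53 {
    forward . dns://192.168.1.1 dns://127.0.0.1:5553 {
        policy sequential
        health_check 1m
    }
    fallback REFUSED,SERVFAIL,NXDOMAIN . dns://127.0.0.1:5553
    cache 600
}
"""


def strip_fallback(corefile: str) -> str:
    """Return corefile text without fallback upstreams or health_check lines."""
    kept = []
    for line in corefile.splitlines():
        # Drop health check directives entirely.
        if line.strip().startswith("health_check"):
            continue
        if FALLBACK_ADDR in line:
            # Remove only the tokens that point at the fallback address.
            tokens = [t for t in line.split() if FALLBACK_ADDR not in t]
            # A "fallback" directive is pointless once its target is gone.
            if tokens and tokens[0] == "fallback":
                continue
            indent = line[: len(line) - len(line.lstrip())]
            line = indent + " ".join(tokens)
        kept.append(line)
    return "\n".join(kept) + "\n"
```

On `SAMPLE`, the forward line becomes `forward . dns://192.168.1.1 {` and the `fallback` and `health_check` lines disappear. The sketch does not handle a forward line whose only upstream is the fallback address; a template-based rewrite avoids that kind of edge case.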

Stooovie · November 5, 2021, 4:53am

This is my experience. HAOS just doesn’t work properly without internet access. Of course I’m not counting cloud based devices like Tuya. I mean the UI, ESPHome, local stuff. It just lags, some things aren’t available, and it grinds to a halt eventually. For whatever reason people didn’t believe me when I brought this up on these forums.

Stooovie · November 5, 2021, 4:58am

Interesting. My networking knowledge is limited, but I know my HAOS basically grinds to a halt without internet access, so how would I actually implement what you propose? Thanks.

Stooovie · November 5, 2021, 5:13am

Installed, thanks for your work! If I understand correctly, this basically should enable fully local operation (except cloud-based integrations of course) of HAOS, right?