Local DNS!

I have had the same issue many others have described here. I’m quite amazed that the situation is what it is. To me it would be a basic assumption that any device would use the configured or DHCP-provided DNS server. This is madness!

The local network integrations didn’t work with my local names. I was able to get them working by pointing them to the IP address rather than the name. I would prefer names over IP addresses, though, and that seems to be tricky.

Tom, there may be light at the end of the tunnel: https://github.com/home-assistant/plugin-dns/issues/20#issuecomment-926125086

Thx for pointing it out!
Fingers crossed.

Well… those PRs died a swift death.

How is this still an issue?

At the bottom of that PR (“Remove hardcoded DNS servers” by fenichelar, Pull Request #56 in home-assistant/plugin-dns on GitHub) is a comment by @balloob:

We make the decisions we do because with Home Assistant OS we are offering a solution that works, for users that want to focus on home automation, not system administration.

If you’re interested in doing DNS stuff, probably Home Assistant OS is not for you. Consider using a Supervised or Container installation.

Except that if you use Supervised, it’ll break your install periodically because there’s no way to stop it auto-updating. That is also apparently for the benefit of users, despite breaking things left, right and centre (see “How to stop supervisor auto-update?”, post #18 by john-arvid).

The whole thing is a false dichotomy anyway: you can have the “user-friendly” defaults of fallback DNS and auto-updating and still have a config option to disable them.

I get that design decisions have to be made, but it really is quite frustrating when your home automation won’t remain stable because of those decisions. Bugs happen, but you expect they’ll be fixed at some point.

See my solution (using AdGuard).

So, this is still broken, and apparently won’t get fixed. I heard that it’s so that people who don’t do system administration can… checks notes… use a system that involves the configuration and automation of several other systems. Huh.

Anyway, I imagine that anyone out there willing to do some network administration could make a firewall rule that redirects all port 53 (DNS) traffic to their chosen local DNS server and call it a day.

Edit: Even this doesn’t work. I have configured and tested NAT rules that intercept plain DNS as well as encrypted DNS (DNS over HTTPS on 443 and DNS over TLS on 853). I have hardcoded the name into the hosts file. Nothing will let me use a local server for DNS resolution. HA insists on using 1.0.0.1:853 or 1.1.1.1:853 regardless, and won’t resolve local hostnames.

Wow, glad to hear that I’m not the only one with this ridiculous issue. I’ve tried both suggested edits to the Corefile (each followed by ha dns restart), but unfortunately neither worked. The dumb way things are often handled at the container level will be the death of me.

I can get short-name resolution at the host level (and throughout my entire network), but not in the container, since it’s looking for the DNS suffix “.local.hass.io” rather than my EdgeRouter 4’s domain “.router.home” and doesn’t have the decency to ask my router. Specifying the FQDN works at both levels with nslookup/ping, but forget that: I’m not using FQDNs in my config when short names should just work.

Based on what everyone has tried so far, the only fix I can think of is to run a WINS server in Docker to auto-resolve short names in another domain. Even if that’s possible and happens to work, it’s beyond stupid for a multitude of reasons. Hopefully the devs can wake up and fix this instead of defending such a mediocre implementation.
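For anyone wondering why the short name fails while the FQDN works, here is a minimal sketch of how a stub resolver expands a name using its search list and the ndots threshold, loosely following the rule described in resolv.conf(5). It is a simplified model, not glibc’s or musl’s actual code (they differ in details), and the hostname “nas” and the search lists are hypothetical examples. With “.local.hass.io” as the only search domain, “nas” is never tried as “nas.router.home”, whereas a dotted FQDN is tried as-is first.

```python
def candidate_names(name: str, search: list[str], ndots: int = 1) -> list[str]:
    """List the fully qualified names a stub resolver would try, in order.

    Simplified model of the resolv.conf(5) ndots rule; real resolvers
    differ in details.
    """
    if name.endswith("."):
        # A trailing dot makes the name absolute: the search list is skipped.
        return [name]
    as_is = name + "."
    with_suffixes = [f"{name}.{domain.strip('.')}." for domain in search]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [as_is] + with_suffixes
    # Too few dots (e.g. a bare short name): search list first, name last.
    return with_suffixes + [as_is]


# Hypothetical short name with the container's vs. the router's search domain.
print(candidate_names("nas", ["local.hass.io"]))  # ['nas.local.hass.io.', 'nas.']
print(candidate_names("nas", ["router.home"]))    # ['nas.router.home.', 'nas.']
```

The design point is that the search list lives in the resolver configuration inside the container, so fixing the upstream server alone does not change which suffix gets appended.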

EDIT: I was able to change the line “fallback REFUSED,SERVFAIL,NXDOMAIN . dns://127.0.0.1:5553”, swapping “127.0.0.1:5553” for my router’s IP on port 53 rather than 5553. That seemed to give me full short-name resolution across the board, but of course it doesn’t survive a host reboot. If I can script this, it would suffice as a dirty workaround. Nvm, that doesn’t work either: it resolves the name but still adds the .local.hass.io suffix. Ugh.

This is absolutely the most frustrating thing broken in HASS for me.

Same on my side… this bug is driving me crazy.

Hey all, I’m Mike Degatano. Those of you who watched the 2022.3 release party saw me on the stream, where I was announced as Nabu Casa’s new hire focusing on the Supervisor. I have been working on this exact issue and was wondering if anyone facing it would be willing to try the beta channel. I have put in two changes that should help, and they are available there right now:

  1. mDNS to systemd-resolved - essentially, the DNS plugin no longer attempts to resolve mDNS and LLMNR names itself. Instead, it simply asks the host’s native systemd-resolved service to do that for us. This should ensure .local names work properly: as long as the host can resolve the name, we can too.
  2. Cloudflare as a fallback only and no healthcheck - I changed the Corefile to remove Cloudflare from the list of forwarding servers; it is now used only as a fallback. This should keep it from getting “stuck” on Cloudflare, which a lot of you have been seeing: after hitting issues with your listed local DNS servers, it would move on from them and get the idea that only the Cloudflare one works. Now it will always keep trying your local DNS servers first. It also no longer healthchecks Cloudflare at all, to prevent the runaway healthchecks some of you reported in issues when Cloudflare is blocked.
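To make the intended behaviour of those two changes concrete, here is a minimal Python sketch of the resolution order, assuming a very simplified model: each “server” is just a function returning an address or None, and the routing rule and names are hypothetical. It does not reproduce the real Corefile, CoreDNS’s forward/fallback plugins, or healthchecking.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical model: a "server" is any function that returns an IP address
# for a name, or None if it fails (timeout, SERVFAIL, ...).
Server = Callable[[str], Optional[str]]


def resolve(
    name: str,
    host_resolver: Server,
    local_servers: Sequence[Server],
    fallback: Server,
) -> Tuple[Optional[str], str]:
    """Return (answer, source) following the two changes described above."""
    if name.rstrip(".").endswith(".local"):
        # Change 1: .local (mDNS) names are handed to the host's resolver
        # (systemd-resolved in the post) instead of being resolved here.
        return host_resolver(name), "host"
    # Change 2: every query starts again at the first local server, so no
    # "stuck on Cloudflare" state carries over between queries.
    for server in local_servers:
        answer = server(name)
        if answer is not None:
            return answer, "local"
    # Only when every local server has failed is the fallback consulted.
    return fallback(name), "fallback"


print(resolve("nas.router.home", lambda n: None,
              [lambda n: None, lambda n: "192.168.1.10"],
              lambda n: "203.0.113.5"))  # ('192.168.1.10', 'local')
```

Because the loop keeps no memory between calls, a local server that failed on one query is tried again on the next, which is the “always keep trying your local DNS servers first” behaviour.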

I’m not done here; there are a few more changes to put in. But from what I’ve been reading, I thought those two might help you all. So if any of you would be willing to give the beta a try and let me know whether your experience is better or there are still issues to address, I would really appreciate it.

How do we switch to the beta for hassio-dns? Do we have to switch the supervisor to beta?

Yes, that puts the whole install on the beta channel, which means it will install the current Supervisor beta. Sorry, there’s no way to subscribe just that one plugin to beta.

I think (pretty sure) you can force just the DNS beta with
ha dns update --version 2022.04.0
(the only component that won’t work for is the Supervisor itself, if you are not on the beta channel).

Oh yeah, good point. I don’t think that is sticky though. I believe the Supervisor will eventually realize, based on the channel it thinks it’s on, that the wrong version of the DNS plugin is installed, and correct it. But it should last long enough for some testing.

I pulled the beta DNS plugin when it was released and it’s still sticky now.

Yeah, I’ve done the same. Seems to be sticky for me too.

@CentralCommand I’ve been using your changes from plugin-dns PR #82, manually applied to the CoreDNS template file. It’s been working much, much better. Over the last week I’ve only had one “incident” of DNS giving up the ghost and my external-URL-based camera entities showing up as broken images on the Lovelace dashboard. I have to restart HA to get them back; I’m not sure why a temporary DNS name failure gets stuck instead of being retried at some point. I’ll switch over to the beta if that still makes sense; otherwise, if a production release of this is coming, I’ll hold until then. Thanks for the work on this.

It should come out of beta soon, most likely next week barring reports of issues.

I’m a bit disappointed it still got stuck. From my understanding of the forward plugin in CoreDNS, I don’t see how that could happen: only your DNS servers are listed there now, and Cloudflare should only be tried if those fail. Was there anything of note in the log for the DNS plugin? You can get to that from here for reference. You can also turn on debug logging for the Supervisor with ha supervisor options --logging debug. That turns on debug logging for the plugins as well, which might give more insight.

One possibility I would like to check, if you don’t mind (it should be quick). Most of the containers used in a typical Home Assistant deployment are Alpine-based, which means they use musl rather than glibc. This has an interesting consequence for DNS because of this commit. According to the DNS spec, if a name exists, a DNS server should return NOERROR even if it has no answers for the particular query type (for example, when a host has an IPv4 address but no IPv6 one). musl enforces this; glibc does not. Because musl and Alpine are newer, not all DNS servers respect this rule, and as a result unexpected NXDOMAIN responses can be encountered on Alpine-based systems.
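Here is a minimal sketch of why a wrong NXDOMAIN on the AAAA query hurts, assuming a simplified model of the difference described above: the strict (musl-like) client treats an NXDOMAIN for any query type as “this name does not exist”, while the lenient client keeps whatever answers succeeded. This is not musl’s or glibc’s actual code; the class, function names and addresses are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Response:
    rcode: str  # "NOERROR", "NXDOMAIN", "SERVFAIL", ...
    addresses: List[str] = field(default_factory=list)


def strict_lookup(responses: Dict[str, Response]) -> List[str]:
    """musl-like: an NXDOMAIN for *any* query type means the name is gone."""
    for resp in responses.values():
        if resp.rcode == "NXDOMAIN":
            return []  # treated as "this name does not exist at all"
        if resp.rcode != "NOERROR":
            raise RuntimeError(f"lookup failed: {resp.rcode}")
    return [a for resp in responses.values() for a in resp.addresses]


def lenient_lookup(responses: Dict[str, Response]) -> List[str]:
    """Lenient: keep answers from whichever queries returned NOERROR."""
    return [a for resp in responses.values() if resp.rcode == "NOERROR"
            for a in resp.addresses]


# A server that wrongly answers NXDOMAIN for AAAA on an IPv4-only host.
bad = {"A": Response("NOERROR", ["192.168.1.10"]), "AAAA": Response("NXDOMAIN")}
print(strict_lookup(bad))   # []
print(lenient_lookup(bad))  # ['192.168.1.10']
```

With a spec-compliant server the AAAA query comes back NOERROR with no addresses, and both functions return the IPv4 address.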

We set up a simple test for this and are going to start testing DNS servers with it, so we can let users know if their DNS server has this issue once this PR is merged. In the meantime, you can test this manually using the test domain we set up, which only resolves for A-type queries:

dig _checkdns.home-assistant.io A +noall +comments
dig _checkdns.home-assistant.io AAAA +noall +comments

Your response should have status: NOERROR for both of those, like this:

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6899

If not, that may be the issue: a query for a record type that one of your local names doesn’t have could be getting back an NXDOMAIN, causing the DNS plugin (which is Alpine-based) to think the entire domain doesn’t exist. The same applies to Home Assistant itself, since it is also Alpine-based.
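If you want to script this check, here is a minimal sketch that reads the status field out of the header line of the two dig commands above and passes only when both are NOERROR. It does not run dig itself, and the function names and the NXDOMAIN sample line are made up for illustration.

```python
import re
from typing import Optional

# Matches the rcode in dig's header line, e.g. "status: NOERROR".
STATUS_RE = re.compile(r"status:\s*([A-Z]+)")


def dig_status(output: str) -> Optional[str]:
    """Return the rcode from dig's header line, or None if there is none."""
    match = STATUS_RE.search(output)
    return match.group(1) if match else None


def passes_checkdns(a_output: str, aaaa_output: str) -> bool:
    """True only if both the A and the AAAA query returned NOERROR."""
    return dig_status(a_output) == "NOERROR" and dig_status(aaaa_output) == "NOERROR"


ok = ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6899"
bad = ";; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 6900"  # made-up sample
print(passes_checkdns(ok, ok))   # True
print(passes_checkdns(ok, bad))  # False
```

To feed it real output, you could pass in subprocess.run(["dig", "_checkdns.home-assistant.io", "AAAA", "+noall", "+comments"], capture_output=True, text=True).stdout, assuming dig is installed on the machine running the script.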

@CentralCommand Very happy to see this starting to get sorted after two years of my requests constantly getting shot down.

The only remaining issue, as far as I can see, is the lack of an option to disable the fallback completely. I understand that this may well be unpopular, as it sort of defeats the purpose of the fallback as a final catch-all. That being the case, the behaviour could instead be changed to call the fallback only after a SERVFAIL response. Currently we have:

fallback REFUSED . dns://127.0.0.1:5553
fallback SERVFAIL . dns://127.0.0.1:5553
fallback NXDOMAIN . dns://127.0.0.1:5553

However, REFUSED and NXDOMAIN are not server failures but deliberate answers from the DNS server, so the fallback should not be used when these responses are received.

From my own observations, I believe the fallback is also used when a NOERROR response with no answers is received. Again, this is not an error, and the fallback should not be invoked.

Here is an example where the fallback is called even though NOERROR was returned (the fallback fails because I have it redirected to a local service, which fails because of the certificate mismatch):

[INFO] 127.0.0.1:48415 - 56587 "AAAA IN api.viessmann.com. udp 46 true 2048" NOERROR - 0 1.028544507s
[ERROR] plugin/errors: 2 api.viessmann.com. AAAA: dial tcp 1.0.0.1:853: i/o timeout

Again, very happy to see the changes you have made so far. What do you think about using the fallback only on the SERVFAIL condition?
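As a quick illustration of the proposal, here is a minimal sketch comparing which response codes would hand a query to the fallback under the three Corefile lines quoted above versus a SERVFAIL-only rule. It only models the rcode check itself, not the CoreDNS fallback plugin; the names are hypothetical.

```python
# Rcodes that trigger the fallback in the Corefile lines quoted above,
# versus the SERVFAIL-only policy proposed in this post.
CURRENT_POLICY = frozenset({"REFUSED", "SERVFAIL", "NXDOMAIN"})
PROPOSED_POLICY = frozenset({"SERVFAIL"})


def uses_fallback(rcode: str, policy: frozenset) -> bool:
    """Would a fallback rule keyed on these rcodes forward the query?"""
    return rcode in policy


for rcode in ("NOERROR", "NXDOMAIN", "REFUSED", "SERVFAIL"):
    print(f"{rcode:9} current={uses_fallback(rcode, CURRENT_POLICY)} "
          f"proposed={uses_fallback(rcode, PROPOSED_POLICY)}")
```

Under either set, a NOERROR reply (even one with no answers) is not meant to trigger the rule, which is why the log lines above look surprising.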