Local DNS!

So just to clarify, the fallback isn’t involved in mdns queries. On a system with supervisor, here’s a rough overview of what happens:

  1. In the output of ha dns info, the host system knows about the DNS server(s) listed under locals. It does not know about the ones listed under servers; plugin-dns/coredns handles those.
  2. When a query for a name that ends in .local, or for a single-label name, comes in to plugin-dns, it does not attempt to resolve it on its own. Instead it simply asks the host’s systemd-resolved for an answer over dbus. Keep in mind the host knows mdns, llmnr and the DNS servers listed in locals. It can use any and all of this information in answering queries (though it may choose not to; we have no control over that).
  3. Plugin-dns returns whatever answer systemd-resolved provides in this case. It does not ask any other DNS servers for answers, regardless of the status code. Not the ones in servers, not the fallback; nothing else is queried.
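
To make the split concrete, here is a rough sketch of the routing decision described in the three steps above. It is only an illustration, not the plugin's actual code (plugin-dns is configured through a corefile, not Python); the function name `route_query` and the labels it returns are made up for this example.

```python
def route_query(name: str) -> str:
    """Return which resolver would answer `name`, per the overview above.

    Hypothetical helper: the returned labels are illustrative only,
    they are not real plugin-dns identifiers.
    """
    # Normalise: drop the trailing root dot and ignore case.
    labels = [label for label in name.rstrip(".").lower().split(".") if label]
    if not labels:
        raise ValueError("empty name")
    # Single-label names and anything under .local are handed to the host's
    # systemd-resolved (which knows mdns, llmnr and the servers in `locals`).
    if len(labels) == 1 or labels[-1] == "local":
        return "systemd-resolved"
    # Everything else is answered by plugin-dns/coredns itself, using the
    # `servers` list and, if enabled, the fallback.
    return "coredns"
```

So `route_query("homeassistant.local")` and `route_query("nas")` both end up at systemd-resolved, while something like `route_query("lowerdeck.home.arpa")` stays with coredns.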

What you’re suggesting is actually pretty complex. I mean you’re asking for all the options required to configure a domain with totally separate handling from everything else. And even if we did that we probably wouldn’t allow .local since according to the spec we shouldn’t. The authority for .local on a network is mdns. I think if you need this much control over the network then tbh you would probably be better served with a container install than one with supervisor.

One thing that I suppose we could do here is an option to simply disable mdns/llmnr. So it would look kind of like this:

  1. Add a new option for “disable multicast dns” to supervisor’s DNS API
  2. Supervisor writes this option into config file for dns plugin
  3. DNS plugin omits this line based on this new option. Then no special treatment for MDNS and LLMNR queries is given.
  4. Update the CLI to support this new option
  5. Update API doc for this new option

In addition there would also need to be a check to mark the system as unsupported for usage of this setting in supervisor. There are definitely a bunch of dependencies on mdns that may break with this setting such as:

  1. ESPHome devices broadcast their name using mdns by default
  2. Home Assistant broadcasts its name via mdns and llmnr
  3. MDNS discovery enabled by default in HA
  4. Local Google Assistant relies on mdns (not sure if this would break or not)

This is a shortlist off the top of my head. Basically either the setting would simply mark a system as unsupported or an author would need to research all the combinations of things with that setting which would break and mark those combinations as unsupported. Since we don’t want issues caused by folks disabling mdns while also enabling features that rely on mdns.

If you or someone wants to PR this then I think that could work. I can add it to my list but tbh there’s a lot on there already unrelated to DNS so I’m not sure when I’ll get to it. That and I’m not 100% convinced we should be doing it. If you scan this thread and the others around local DNS issues you’ll see a common refrain is “Why is Home Assistant doing its own DNS and not just asking the host system?” Well .local is now a case where that’s exactly what we are doing. We’re not doing our own DNS and simply asking the host system. Seems very counterintuitive to remove that.

Puh, now you create ideas :)

From a development POV I would love to do the ultimate configuration option, but based on the complexity of its range of settings and my experience with half educated users this may create unnecessary headache and should IMHO, if at all, live behind a sub-button, aka Advanced Settings.

But should we really be able to disable mDNS? From my technical point of view more and more “smart” and not so smart devices will support it, Apple pushed it not just into home networks but also into enterprise networks, and as it can co-exist I do not see an advantage in disabling it, except maybe a somewhat less noisy network.
I know that the RFC has meanwhile reserved .local for mDNS; Apple, for example, integrated it in a way that does not interfere with a mydomain.local that is additionally in place. The key is to respect the name server(s) for each zone, and one of them can be mydomain.local.

You may disable the possibility to set a XYZ.local domain but from my POV this is not needed, because in that solution they do not interfere and co-exist. And the interface does not prevent me from adding a wrong IP address, some certain idiocracy/liberty may be left on the user’s shoulders. In this case I can imagine that either a good quick start manual for settings with DHCP, own DNS and own domain can resolve many issues. I am up for that docu work as well.

Regarding your clarification. I understand that the FB is not involved in mDNS, and neither should be the local domain. I read through the possible configuration options of core DNS and found often in examples and sites about coreDNS the way of answering a specific zone with a forward to certain DNS servers, which I then adapted. Also, I addressed that in many configurations people do not run a TLS enabled server at home, but a simple bind or dnsmasq on their routers, NAS or small hypervisor, but I already thought about the button under the Domain Name: Enable DoH or DoT and Hostname

Yes, the authority for .local is mDNS, which does not exclude possible authority of mydomain.local to another NS, each subdomain can have its own name server(s).

Regarding possible solutions in case of (mis-)configurations. Improper DNS settings ruin pretty much everything, that’s clear. What could go wrong? Assuming we have the new “domain name” option under supervisor - system - network:

  • User enters .local as domain, system says .local is not allowed because of mDNS
  • User enters domain.local but his domain is called mydomain.local, and therefore his DNS server refuses to answer for, for example, espkitchen.domain.local; all other internal requests will end in NXDOMAIN as well, and the log will contain
    172.30.32.1:57044 - 24079 "A IN espkitchen.domain.local. udp 38 false 512" NXDOMAIN qr,aa,rd 52 0.011588231s
    and obviously many other NXDOMAIN…
  • user enters his correct mydomain.local and DNS queries to his domain are served as well as mDNS.
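
The first case in the list above is the only one the system could catch at input time. A minimal sketch of such a check, assuming a hypothetical `validate_domain_option` helper for the proposed “domain name” option (neither the option nor the function exists in supervisor today):

```python
def validate_domain_option(domain: str) -> str:
    """Normalise a user-entered domain and reject bare .local.

    Hypothetical validator for the proposed "domain name" option;
    the name and behaviour are assumptions for illustration.
    """
    # Trim whitespace and surrounding dots, and ignore case.
    cleaned = domain.strip().strip(".").lower()
    if not cleaned:
        raise ValueError("domain must not be empty")
    if cleaned == "local":
        # Bare .local belongs to mDNS, so it is refused outright.
        raise ValueError(".local is not allowed because of mDNS")
    # Subdomains such as mydomain.local are accepted in this proposal.
    return cleaned
```

The second case (a wrong but well-formed domain) cannot be detected this way; it only shows up later as NXDOMAIN in the log.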

A handy “reset network settings” button could be helpful: resetting to DHCP or a new manual address, restoring the original config, applying the newly entered settings and rewriting the domain (if applicable) into the corefile. Hm, is there some way of loading separate files for zone info? A quick google did not turn up much on split configs or loading additional configs. It would be nice to not touch the corefile at all but load the zone forwarder additionally if needed. I will try to find a way for that, or do you know one?

What do you think, leaving everything as it is, just adding DNS forward for the domain set to the NS set?

And yes, I would PR that, I just need help connecting the supervisor UI to the config files…
And I would probably survey users extensively on whether they want it that way. I am experienced in heterogeneous networks, but I may miss a certain constellation that is very much in use and that should be taken care of.

Tbh I’m not sure I totally follow. But we are open to PRs here if you want to give it a go. I guess here’s the general ground rules for a PR in this area:

  1. We won’t add an option that’s basically just “inject custom corefile here”. We want specific options in the APIs for the features we support so we can ensure a finite support space.
  2. Start by adding new options to supervisor’s API here. Probably in dns.py. You should also reflect current values of these new options in the info API
  3. Generally all options for the plugin are captured in one or more properties here. You can see in there that is also where the config is written out that is passed to plugin-dns.
  4. In plugin DNS you can assume the option(s) are available and use that information to write the core file. Take a look at the template here to see how that works.
  5. New DNS API options must also be added to the CLI here. And documented here.
  6. You’ll need good test coverage for everything added to supervisor.
  7. Unfortunately we don’t have great options for automated testing of the plugins right now other than ensuring they start, so you’ll have to do more manual testing there. I find it’s easiest to just run it, either by pretending it’s an addon or by building the image and running it locally, and then test it with dig. You can also build the image and then retag it so it displaces the current image for DNS on a dev system and gets fully exercised as a DNS plugin somewhere.

I would recommend starting with a PR for supervisor and use that to lay out your plan then wait for feedback before proceeding with the other parts.

We may also ask for a new unsupported check depending on the content of the PR and expected interactions with the rest of the ecosystem. I don’t think we would do that in this case but others may think of something I forgot.

Yes, it got pretty mixed up, a short declutter:

  • Adding the option to handle the personal local domain eg. mydomain.home with the provided DNS server(s) while leaving mDNS and the fallback function untouched.
  • Optionally adding the option to reset the network settings to resolve misconfigurations.

I went back to your post about the rough overview about systemd-resolved, checked resolvectl and tried a couple of local lookups, which all failed. Adding the search domain with resolvectl domain 3 mydomain.home resolved that issue, all mDNS and DNS requests are performed now properly. I used the enp0s3 interface.

Tbh I am now a bit lost with the architecture. I am pretty sure there are reasons for that kind of setup but I have rarely seen that kind of complexity for name resolution:

  • Why are there two more or less competing name lookup services? There is coreDNS, which is configured differently than systemd-resolved.
  • Why do I need to set a search domain to properly look up an FQDN of the same domain? Because even though resolved knows about the name servers, it does not look it up properly until the search domain for that domain is set.

I can’t imagine this is practical from a maintenance and support point of view.
I would now go the other way round and clean it up from the requirements side instead of fixing the existing setup: what NS service is needed, which service can take care of it (coreDNS, resolved, dnsmasq or whatever), and keep that ONE service configured for all major setups rather than two more or less competing ones:

  • MUST resolve names for the internal docker network
  • MUST resolve mDNS
  • MUST resolve DNS with all variants (DoT/DoH/DNS)
  • SHOULD respect local search domains

I can surely take care of the configuration and extensive testing with all major client platforms. I can also test most hypervisors and I have a NUC to run it baremetal, I just can’t test on RPI and I am not experienced in python.

What do you think?

So getting this output - not seeing the two errors you mentioned - is it safe to proceed?

ipv6 error

This is exactly the issue I mentioned. Your DNS server is not handling ipv6 correctly. If you disable the fallback then your system will be marked as unsupported because it’s likely many things will not work correctly, particularly updates.

In general you should never have anything in issues. If you do there is something wrong you need to look into. Really wish we had a UI here, someday…

I just added an entry for ha.home to my router and then popped into the host shell of an HAOS system and did this:
[screenshot]

This didn’t work for you without additional changes? .home shouldn’t require anything extra to function, it’s only .local that’s got special treatment due to mdns.

If you had to change the settings of systemd-resolved to get mydomain.home to work there may be another issue in your setup. You should be able to add that as an entry to your local DNS server and have no issues resolving that from the host HA is running on and the containers that make up HA.

We need resolved because it’s the only one that handles multicast DNS that’s readily available on HAOS and stock debian. DNS resolvers like coreDNS and dnsmasq don’t do that OOTB because those kinds of queries generally aren’t supposed to be directed at them. And we’re not implementing all the logic for that ourselves. That was tried in coreDNS already; it isn’t a good idea. Multicast DNS is complicated and we don’t want to own that logic. Hence the mdns plugin we made now simply asks resolved.

Theoretically we could use resolved for everything. The change that would need to be made is that everything in network settings comes from dbus and gets handed over dbus to systemd-resolved. We already do that for some of the things, we would need to do it for everything. It’s not impossible, it’s mostly just work.

Although I should note I actually discussed that idea with pvizelli before making the mdns changes and he mentioned that we used to do that a while ago and had to change it. He couldn’t remember all the reasons at the time and I didn’t press it. But this is part of why I suggested just opening an initial PR to supervisor laying out your plan to get the opinions of a few others. Although this is starting to sound like it might warrant an ADR first instead.

Thanks. I was able to update my firewall (sophos xg) from v18 to v19 and it appears they resolved the issue. Appreciate all the work that went into this!

Sure was fun going through my entire network changing .local to .somethingelse.

Glad this useful breaking change was added.

Was there anything wrong with local DNS first, then fallback to mdns (other than a slight delay for the failed resolution)?

You did?

In my case it is not impossible, but as Mike said: lots of work, and many breaking points to forget:
certificates, connection strings, Active Directory, Exchange, DFS, scripts, ACLs…

Thank you for the insights!
That’s where I ended up lately as well: isn’t it time to review the architecture of name resolution in general?

Regarding: "entry for ha.home"… I found another “bug”. Well, actually I already hinted at it earlier. To reproduce:

  • set a DHCP IP in the UI under /config/network
  • run resolvectl status, you will get Scopes, Protocols, Current and available DNS Servers AND the DNS Domain (if the search domain is part of the DHCP offer, which it is in my case)
  • a resolvectl query ha.home will work

  • now change to static IP in the UI
  • run resolvectl status, the DNS Domain is now missing
  • a resolvectl query ha.home will not work anymore
    The issue here is the missing search domain, and that explains also why I had to set it manually, because with a static IP it is not part of the DHCP offer and cannot be set through the UI.

I will post a bug report for above and mention it in the ADR discussion.

Yeah, going through 23 servers and updating connection strings wasn’t fun. I set all my hosts to resolve to x.local and x.somethingelse during the transition so I could gradually find problems. I could then remove .local from each host one-by-one until they were all moved over.

Luckily, I don’t have anything in AD that’s named “.local”. I’d learned my lesson before setting up an AD, so picked something else by then!

I still don’t understand why this change needed to happen. HA had been working fine the way it was for years.

I learned the lesson long time ago but the network exists even a bit longer, AD on Windows Server 2000 with a clear recommendation: companyname.local
Even in Windows SBS 2011 the recommendation was still .local
Long story short, the amount of .local out there was growing even after Apple wrote RFC and begged IANA to register the address.

Well, it is like it is. For all who arrive here now and plan to invest nights in changing domain names, connection strings and certificates the proper solution is close and can partially already work now, see: Name lookup service in Home Assistant - switch to systemd-resolved · Discussion #768 · home-assistant/architecture · GitHub

hello,
is there any fix for that issue? I am running the latest haas and seeing the same issue …
ie
coredns
→ coredns -conf/corefile
is using 30-50% cpu constantly.

seems the fix is >

thanks

I could sure use some advice on what I am overlooking with HASS local DNS resolution. (HASS OS 2022.8.5)

My primary router’s DHCP is configured with a domain of “home.arpa” and static DNS host name mapping for several cameras - (lowerdeck, upperdeck, etc.) These hosts resolve locally but not within HASS.

Within HASS, I’ve configured IPV4 with static mapping to local gateway/DNS and there are no HASS resolution issues noted…

➜  ~ ha dns info   
fallback: true
host: 172.30.32.3
llmnr: true
locals:
- dns://192.168.0.1
mdns: true
servers:
- dns://192.168.0.1
update_available: false
version: 2022.04.1
version_latest: 2022.04.1
➜  ~ ha resolution info  
checks:
- enabled: true
  slug: supervisor_trust
- enabled: true
  slug: network_interface
- enabled: true
  slug: addon_pwned
- enabled: true
  slug: free_space
- enabled: true
  slug: dns_server_ipv6
- enabled: true
  slug: core_security
- enabled: true
  slug: dns_server
issues: []
suggestions: []
unhealthy: []
unsupported: []
HASS OS Terminal  
➜  ~ nslookup lowerdeck.home.arpa 
Server:         172.30.32.3
Address:        172.30.32.3#53

Non-authoritative answer:
** server can't find lowerdeck.home.arpa: NXDOMAIN

Note: I have IPv6 disabled in the HASS settings because my ISP has it disabled and I don’t see any way to enable it in the router. Also, changing the fallback to false did not help.

Thanks for any advice!

Please share the response of the following commands:

dig A lowerdeck.home.arpa
dig AAAA lowerdeck.home.arpa

Particularly the second one. If your DNS server is returning anything other than NOERROR as the status then you’re going to have an issue. I know you don’t use ipv6 but musl systems still care about this. If an NXDOMAIN response is received for a domain on one protocol it is considered non-existent on all protocols. Here was my more detailed explanation of why this is before.
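
If you want to check the status without reading the whole output, something like the sketch below pulls it out of the header line dig prints. It assumes you have captured dig's text output yourself; `dig_status` is a made-up helper name, not part of any Home Assistant tooling.

```python
import re
from typing import Optional


def dig_status(dig_output: str) -> Optional[str]:
    """Extract the status (NOERROR, NXDOMAIN, ...) from dig's text output.

    Hypothetical helper: it only looks for the "status: XYZ" field that
    dig prints in its ";; ->>HEADER<<-" line, and returns None if absent.
    """
    match = re.search(r"status:\s*([A-Z]+)", dig_output)
    return match.group(1) if match else None
```

For the AAAA query, anything other than NOERROR coming back from this is the thing to look into.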

Well, but anyway you must remove the fallback to make sure only local resolution is taking place in case your DNS does not answer quickly enough or at all.

Additionally please set the search domain in your router for the DHCP discovery and set HA to DHCP instead of a fixed address, as there is still no other way to use your own search domain. I found out that in my case HA ignores the search domain if it is not pushed by DHCP; see https://github.com/home-assistant/operating-system/issues/1916, which might also help you find the right commands to troubleshoot it further.

Anyway, IMHO the entire DNS, search domain and multi-DNS service situation in HA is, politely said, suboptimal.

Edit: Can you check whether your DNS is replying properly in general by using nslookup on a computer?

nslookup lowerdeck.home.arpa 192.168.0.1

Just because the name can be resolved from computers or phones does not necessarily mean your DNS is resolving it.

Thanks Mike and Alexander for all your troubleshooting tips!

Ok, so I made the following changes:

  • configured DNS option --fallback=false and restarted DNS
  • reconfigured the HASS IPV4 network to use DHCP. (Router assigns a static IP/hostname)

From my local PC:

nslookup  lowerdeck.home.arpa 192.168.0.1
Server:  GTC-Router.home.arpa
Address:  192.168.0.1

Non-authoritative answer:
Name:    lowerdeck.home.arpa
Address:  192.168.0.17

From HASS OS terminal:

nslookup lowerdeck.home.arpa 192.168.0.1
Server:         192.168.0.1
Address:        192.168.0.1#53

Non-authoritative answer:
Name:   lowerdeck.home.arpa
Address: 192.168.0.17
** server can't find lowerdeck.home.arpa: NXDOMAIN

➜  ~ nslookup lowerdeck.home.arpa
Server:         172.30.32.3
Address:        172.30.32.3#53

** server can't find lowerdeck.home.arpa: NXDOMAIN
➜  ~ dig A lowerdeck.home.arpa   

; <<>> DiG 9.16.29 <<>> A lowerdeck.home.arpa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60386
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: a304903ad30059fd (echoed)
;; QUESTION SECTION:
;lowerdeck.home.arpa.           IN      A

;; ADDITIONAL SECTION:
lowerdeck.home.arpa.    104     IN      A       192.168.0.17

;; Query time: 0 msec
;; SERVER: 172.30.32.3#53(172.30.32.3)
;; WHEN: Tue Sep 06 12:51:16 CDT 2022
;; MSG SIZE  rcvd: 95

➜ ~ dig AAAA lowerdeck.home.arpa

; <<>> DiG 9.16.29 <<>> AAAA lowerdeck.home.arpa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 42856
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 5fba0bf400a25ab2 (echoed)
;; QUESTION SECTION:
;lowerdeck.home.arpa.           IN      AAAA

;; AUTHORITY SECTION:
home.arpa.              300     IN      SOA     prisoner.iana.org. hostmaster.root-servers.org. 2002040800 1800 900 604800 604800

;; Query time: 177 msec
;; SERVER: 172.30.32.3#53(172.30.32.3)
;; WHEN: Tue Sep 06 12:52:27 CDT 2022
;; MSG SIZE  rcvd: 149

I’m running HASS OS 2022.8.5, which I understand is Alpine based. I can see “NOERROR” status for the DiG A command, and status “NXDOMAIN” for the DiG AAAA command. Is the recursion not available warning a problem?

The other troubleshooting steps require running resolvectl, which apparently isn’t installed on HASSOS.

docker exec -t -i homeassistant /bin/bash
bash-5.1# resolvectl status
bash: resolvectl: command not found

I’m not clear what this involves "please set the search domain in your router for the DHCP discovery ".

The search domain is in your case the Domain Name, so that’s setup properly.

Edit: Your PC correctly resolves it and so do the nslookup and the dig A, but with the recursion warning. 172.30.32.3 is AFAIK coreDNS, and that is the one that seems to fail. It’s been a while since I checked the config of coreDNS. Mike, do you have anything in mind in the coreDNS config that might make it fail? Otherwise I’ll check it quickly.

Edit: Does HA get the search domain? Unfortunately I only know it with resolvectl to find it out

resolvectl flush-caches
resolvectl status 3

I am wondering now why you do not have resolvectl in HASS OS. It seems the HA CLI does not have resolvectl. Please try the same in the core-ssh (https://github.com/hassio-addons/addon-ssh) container.

This is your problem. Doesn’t matter whether or not you disable the fallback, it’s not going to work because of this.

Since your DNS server responds with NXDOMAIN for AAAA requests rather than NOERROR, all the alpine-based containers (read: most of HA) are going to say NXDOMAIN for that domain. This is exactly what I was looking for and talking about in my post above.

Technically this means your DNS server is behaving incorrectly. If a name exists it is always supposed to respond with NOERROR. If there are no answers for that name on the particular type of query (AAAA in this case) then it should return NOERROR with no answers. But if it returns NXDOMAIN then all the alpine containers (core, supervisor, most addons, etc.) will treat that as the answer for all queries on that domain (A and AAAA) due to this commit.
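
A very simplified model of that behaviour, just to show why one NXDOMAIN is enough to lose a perfectly good A record. This is not musl's actual code; `combined_lookup` and its arguments are invented for the illustration, and the address is the one from the output earlier in the thread.

```python
from typing import List, Sequence


def combined_lookup(
    a_status: str,
    a_addresses: Sequence[str],
    aaaa_status: str,
    aaaa_addresses: Sequence[str],
) -> List[str]:
    """Simplified model of how the A and AAAA answers are combined.

    Assumption for illustration: if either query comes back NXDOMAIN,
    the name is treated as non-existent for both record types.
    """
    if "NXDOMAIN" in (a_status, aaaa_status):
        return []
    # Otherwise every address from both answers is usable.
    return list(a_addresses) + list(aaaa_addresses)


# The case from this thread: A works, AAAA says NXDOMAIN, nothing resolves.
broken = combined_lookup("NOERROR", ["192.168.0.17"], "NXDOMAIN", [])
# A spec-compliant server: AAAA is NOERROR with no answers, the A record is used.
working = combined_lookup("NOERROR", ["192.168.0.17"], "NOERROR", [])
```

Here `broken` is an empty list while `working` contains 192.168.0.17, which is the whole difference between the two server behaviours.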

You’ll need to adjust your DNS server to handle this correctly according to spec or else you will have issues with it. If there are no options around this in your DNS server then here are some other options to consider:

  1. Enable ipv6 on your network
  2. Install and use a different DNS server
  3. Don’t use a local-only internal domain. Use the same domain for internal and external access but tell your DNS server to resolve that domain to a local IP on your LAN. Also enable the fallback DNS so when your DNS server returns NXDOMAIN for AAAA requests the DNS plugin asks cloudflare for a different answer and gets a proper NOERROR response.

This isn’t true, at least the quickly enough part. I fixed that a while ago. The only time there can be a race condition between DNS servers is if you have multiple DNS servers configured. The fallback is only tried after everything else fails.

If your DNS server doesn’t respond at all then yes the fallback will happen. But that could actually work in your favor, see my solution #3 above.
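
Roughly, the ordering looks like the sketch below. It is a toy model only: the resolvers are plain Python callables, trying the configured servers strictly one after another is a simplification, and `resolve_with_fallback` is not a real supervisor or plugin-dns function.

```python
from typing import Callable, List, Optional, Sequence

# A resolver takes a name and returns addresses, or None if it failed
# (no response at all, or an error status such as NXDOMAIN).
Resolver = Callable[[str], Optional[List[str]]]


def resolve_with_fallback(
    name: str,
    servers: Sequence[Resolver],
    fallback: Optional[Resolver] = None,
) -> Optional[List[str]]:
    """Ask the configured servers first; use the fallback only if all fail."""
    for server in servers:
        answer = server(name)
        if answer is not None:
            return answer
    # Only reached when every configured server failed.
    if fallback is not None:
        return fallback(name)
    return None
```

With the fallback set to None (disabled), a name that none of your own servers can answer simply stays unresolved.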
