HA cannot add SMB storage when using a FQDN - strange but solved

I was trying to add an SMB storage to use as a backup destination. That did not work; the error message was that HA could not resolve the name. This was when I used nas5.example.com as the FQDN for the server. It did work when I used the name nas5.local.

After some serious head-banging, the problem turned out to be the result of my laziness combined with systemd-resolved behavior and IPv6. I have my own DNS servers, and there I had defined the server names using the shortest form: an A record for the name “nas5”, and a CNAME at “nas5.example.com” pointing to “nas5”:

nas5: A 10.10.10.10
nas5.example.com: CNAME nas5
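In BIND-style zone-file syntax this corresponds to roughly the following (origin handling and TTLs omitted; the exact layout is my reconstruction, not a copy of my zone files):

```
; a private top-level name, served by my own DNS
nas5.               IN  A      10.10.10.10

; inside the example.com zone, the long name is the alias
nas5.example.com.   IN  CNAME  nas5.
```

The trailing dots matter here: `nas5.` is an absolute single-label name, which turns out to be exactly the shape that trips up systemd-resolved below.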

This works fine in most cases: a client asks for the name “nas5.example.com” and gets back both the CNAME and the actual IPv4 address of “nas5”. However, on the HAOS host OS, systemd-resolved was not happy:

# resolvectl query nas5.example.com
nas5.example.com: resolve call failed: No appropriate name servers or networks for name found

nslookup works fine, of course:

# nslookup nas5.example.com
Server:		192.168.1.13
Address:	192.168.1.13:53

nas5.example.com	canonical name = nas5
Name:	nas5
Address: 10.10.10.10

nas5.example.com	canonical name = nas5

So, why would systemd-resolved fail? It seems the reason is that my network (and HA) has IPv6 enabled, so systemd-resolved queries both A and AAAA records. The DNS knows the A record for nas5, but no AAAA record. The A query therefore works fine, and the response includes both the CNAME and the IPv4 address. For the AAAA query, however, the DNS server returns only the CNAME, as there really is nothing else it could return. At this stage (I believe) systemd-resolved looks at the CNAME it got, decides that it is a single-label name, calls it quits, and throws away even the IPv4 result it already had. Why it does not simply return the IPv4 A record, I do not know or understand.
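To make the asymmetry concrete, here is a toy Python model of the zone (the data and lookup function are illustrative sketches, not systemd-resolved's actual logic). A CNAME-chasing server answers an A query with both the CNAME and the target's A record, but an AAAA query yields only the bare CNAME, because the target has no AAAA data:

```python
# Toy zone data (hypothetical; mirrors the records described above).
ZONE = {
    ("nas5", "A"): ["10.10.10.10"],
    ("nas5.example.com", "CNAME"): ["nas5"],
}

def answer(name, rtype):
    """Mimic a DNS server: follow the CNAME chain, then add any
    records of the requested type found at the final target."""
    records = []
    while (name, "CNAME") in ZONE:
        target = ZONE[(name, "CNAME")][0]
        records.append((name, "CNAME", target))
        name = target
    records += [(name, rtype, rdata) for rdata in ZONE.get((name, rtype), [])]
    return records

# A query: the CNAME plus the IPv4 address of the target.
print(answer("nas5.example.com", "A"))
# AAAA query: only the CNAME, pointing at the single-label name "nas5" --
# apparently the point where systemd-resolved gives up.
print(answer("nas5.example.com", "AAAA"))
```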

The situation is a bit different if I use multi-label CNAMEs. If I have the following in DNS:

test4.example.com: A 10.0.0.4
test2.example.com: CNAME test4.example.com

A query for test4 works as expected:

# resolvectl query test4.example.com
test4.example.com: 10.0.0.4                    -- link: end0

-- Information acquired via protocol DNS in 1.6ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
#

However, test2 fails, somewhat unexpectedly:

# resolvectl query test2.example.com
test2.example.com: Name 'test4.example.com' not found
#

Note that the error is different this time: “not found” instead of “No appropriate name servers”, and notice that it is the CNAME target that was reported as not found. But the end result is the same: even the successful IPv4 A resolution gets thrown away. Again, I have no clue as to why a successful resolution would be discarded.

Also, the failed resolution of the CNAME target test4 gets stored somewhere in the cache, which is a bummer as well, as it poisons further queries that succeeded earlier:

# resolvectl query test4.example.com
test4.example.com: 10.0.0.4                    -- link: end0

-- Information acquired via protocol DNS in 1.7ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
# resolvectl query test2.example.com
test2.example.com: Name 'test4.example.com' not found
# resolvectl query test4.example.com
test4.example.com: Name 'test4.example.com' not found
# resolvectl flush-caches
# resolvectl query test4.example.com
test4.example.com: 10.0.0.4                    -- link: end0

-- Information acquired via protocol DNS in 1.4ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

I can think of a few ways to fix this:

  1. Make systemd-resolved return the successful A record, even if the AAAA query fails with the CNAME.
  2. Disable IPv6 on HAOS.
  3. Add IPv6 AAAA records for all the hosts.
  4. Swap which name holds the A record and which is the alias (CNAME).

Option 1 sounds like it could take some time.
Option 2 is not for me, but might work for others.
Option 3 works if everything is IPv6-capable. I verified that it works, but it is not a solution for me.
Option 4 is the solution I chose. This means that I have now:

nas5.example.com: A 10.10.10.10
nas5: CNAME nas5.example.com

And this works without systemd-resolved throwing away the IPv4 A record, even though the AAAA query comes back empty:

# resolvectl query nas5.example.com
nas5.example.com: 10.10.10.10                  -- link: end0

-- Information acquired via protocol DNS in 116.5ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

If anyone knows how to fix or explain this systemd-resolved behaviour, I would be interested.

This is something you should take up with the maintainers of systemd, who I am sure will refer you to making a new RFC proposal for DNS, and also for the IPv6 standard, at the governing organisation for DNS.

Not possible. IPv6 is the future and Matter and Thread only use that.

That is the way to go, and also the official way.

See first comment.

Why would this need a new RFC? I would think this is a decision systemd-resolved can make on its own.
Even now, systemd-resolved returns the IPv4 address in cases where it gets an NXDOMAIN for the AAAA. Why could it not do the same when it receives just the CNAME, which implicitly means an AAAA record does not exist?

Because that is not the right way.
An IPv6-only system would then get a success response, but would be unable to use the returned address.

The IPv6 standard governs how and what the other protocols should do when they are both IPv4- and IPv6-capable.
It is due to the method that is used to implement IPv6 without having to make huge changes to the IPv4 protocol.

Hmm… Are we perhaps talking of different things? RFC 6724 specifically states:

“Well-behaved applications SHOULD NOT simply use the first address returned from an API such as getaddrinfo() and then give up if it fails.”

This is the behaviour I am suggesting for systemd-resolved: it does get the IPv4 address, and the AAAA query should result in NODATA, so it should use the IPv4 address.
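Incidentally, the RFC 6724 advice quoted above is straightforward to follow on the application side. A minimal Python sketch (a hypothetical helper, not code from HA or systemd) that tries every address returned by getaddrinfo() instead of giving up after the first failure:

```python
import socket

def connect_any(host, port, timeout=3.0):
    """Try every address getaddrinfo() returns, in order, and hand back
    the first socket that actually connects (per RFC 6724 guidance)."""
    last_err = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        try:
            sock = socket.socket(family, socktype, proto)
        except OSError as err:
            last_err = err  # family not supported here; try the next one
            continue
        try:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as err:
            last_err = err  # remember the failure, keep trying the rest
            sock.close()
    raise last_err if last_err else OSError(f"no addresses for {host!r}")
```

On a dual-stack host this falls back from, say, a refused ::1 connection to 127.0.0.1, which is exactly the spirit of the quoted SHOULD NOT.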

Which RFC do you think the idea of “use IPv4 address even if AAAA resolution fails” violates?

Sadly, I don’t know, but intermittent DNS lookup failures do seem to be a problem in HAOS. Desktop Linux works fine with systemd.

I personally have seen APCUPSD and SFTP backup storage both fail randomly with DNS+DHCP provided by the AdGuard Home Add-on.

Back in the day, HAOS didn’t support IPv6 which broke my MQTT devices but this seems to have been fixed some time ago (just in time for Matter+Thread).

Hard-coding IPv4 seems like a Bad Idea :tm:, but has proven reliable for my static servers.

How does that relate to a query for A and AAAA records? The root cause in the article is either an SMB misconfiguration or a missing SPN. Am I missing something?

I am not sure it does.
I only quickly scanned a bit of the article.

I have been looking at the DNS RFCs, and the behaviour of DNS lookups should be that if a query for a record hits a CNAME record, then the canonical name should be queried with the same record type as the initial query.

The other behaviour of DNS lookups is to query AAAA first on an IPv6-enabled interface, and if that does not succeed, then to query the A records.
Some DNS clients do both queries at the same time to save time, but discard the IPv4 result if an AAAA record is received.
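The parallel behaviour described above can be sketched in Python (a toy stub-resolver helper, purely illustrative). The point of contention in this thread is the merge step: in this sketch, a failure in one address family does not throw away the other family's answers:

```python
import concurrent.futures
import socket

def resolve_dual_stack(host, port=0):
    """Query the IPv6 and IPv4 families in parallel and merge whatever
    succeeds, instead of letting one failed family poison the other."""
    def query(family):
        try:
            return socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
        except socket.gaierror:
            return []  # no data for this family: just skip it
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        v6 = pool.submit(query, socket.AF_INET6)
        v4 = pool.submit(query, socket.AF_INET)
        # Crude preference for IPv6 first; proper ordering is RFC 6724's
        # job inside getaddrinfo(), not the resolver's.
        return v6.result() + v4.result()
```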

Another behaviour of DNS clients is that they should only look up records for protocols they have enabled, and not share the information between interfaces, because it might not be usable on other interfaces.

nas5 is a domain and not a host.
And therefore the reply is that no DNS servers were found for IPv6.

test2 is a host in the example.com domain, and example.com does exist in your domain service, but test2 does not have an AAAA record, so no such host exists in IPv6.

https://www.rfc-editor.org/rfc/rfc6724

There could be a typo there, mixing interfaces with protocols: I do not know of any RFC that would prohibit sharing DNS responses between interfaces. The situation is a bit different with mDNS/LLMNR, as those are interface-specific. But for the protocols, basically yes: RFC 3493 specifies that getaddrinfo() should only return results for address families we actually have addresses for.

“nas5” is a totally valid host name or domain name in the DNS sense. It is not an FQDN; it is a single-label relative name. According to RFC 8499:

“Host name” is often meant to be a domain name that follows the rules in Section 3.5 of [RFC1034], which is also called the “preferred name syntax”.

systemd-resolved has been configured in HAOS not to send the single-label names to DNS, with the directive ResolveUnicastSingleLabel=no. That is a reasonable choice, as it enables the use of LLMNR. However, this is not a DNS requirement.
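For reference, that directive goes in resolved.conf; this fragment just restates the HAOS setting described above:

```ini
[Resolve]
# Do not send single-label names (like "nas5") to unicast DNS servers;
# they are left to LLMNR/mDNS instead.
ResolveUnicastSingleLabel=no
```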

Why would there need to be a host in IPv6? Perhaps you are thinking that I have an IPv6-only environment? I do not; I run the default dual stack.
But yes, test2.example.com does not have an AAAA record, as it has the CNAME RR, and RFC 2181 specifies:

An alias name (label of a CNAME record) may, if DNSSEC is in use, have SIG, NXT, and KEY RRs, but may have no other data.

About RFC 6724: it deals with the selection of addresses from a list of source and destination addresses. The implementation of RFC 6724 lives within getaddrinfo(), and should not be handled by systemd-resolved. In other words, systemd-resolved should return the addresses it gets unmodified, and getaddrinfo() will sort them in the order defined by RFC 6724.

So, the correct way for systemd-resolved to work would be to return the IPv4 address when it gets one, regardless of whether there is an issue with getting the IPv6 address. Right now it does not work that way when a CNAME points to a single-label name. That, IMHO, is a bug, but not something that would be worth fixing, as it can be worked around easily.

The inability to resolve names that have any kind of CNAME, even a multi-label one, would be a more severe bug. However, there is a rather interesting twist: the same test I did yesterday (with the multi-label domain names) works today. I cannot tell what is different, but it now works as expected:

# resolvectl query test2.example.com
test2.example.com: 10.0.0.4                    -- link: end0
                   (test4.example.com)

-- Information acquired via protocol DNS in 110.5ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
# resolvectl query test4.example.com
test4.example.com: 10.0.0.4                    -- link: end0

-- Information acquired via protocol DNS in 1.4ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: cache network

Now I am just left wondering what happened :slight_smile:

That one was new to me too, but there was a lengthy example, and as far as I could understand, the issue was that multiple source interfaces on a host can lead to the same interface on a destination host, but with different routes for each source interface.

Do you have a link to that discussion? As DNS by itself is not interface-specific, and there is no interface-scoping for the responses, I would like to understand what this is about. It is true that you can make decisions on where to send the queries, but that is done based on the name that is being queried, not the IP address of the resolved record. And once the response arrives, the routing decisions of the packets themselves are independent of where the DNS response came from. DNS and routing work on different levels.

I found it in a couple of places.
I tried to look through my browser history, and I found it first here, in section 6.3.1:
https://www.rfc-editor.org/rfc/rfc6147

I later found it in regards with normal DNS too, but I looked at so many DNS and IPv6 RFCs that it is a huge task to go through them all.

Ok, thanks. I believe you might be referring mostly to either NAT64 prefix ambiguity or split horizon DNS. The RFC6147 issue is not precisely an interface issue, but rather a network/routing issue: if you get responses with a specific NAT64 gateway prefix, you better make sure you have a route towards that gateway.

But anyway, I believe those issues are not relevant to the discussion here, which is how HAOS deals with entries that have CNAME that points to an entry that has no AAAA.

Maybe not, but the fact is still that it is a systemd issue and should probably be addressed by the maintainers there.
systemd is not a HAOS thing, but a Linux thing.