An update on this issue: the problem with github.com not resolving appears not to be caused by the Pi-hole DNS servers, but by the main DNS server in my network.
To clarify: up to now I have been running two Pi-hole DNS servers in parallel for redundancy, and both of these Pi-holes were forwarding all non-resolvable DNS queries to the main DNS server in my local network.
This main DNS server was forwarding all queries that it could not answer itself to my Internet provider's DNS servers.
This main server is an old Windows Server that, among other things, is the DHCP, DNS and DFS server for my local network.
This has been working flawlessly for years, but now I suddenly have this problem with HA not being able to access github.com due to these DNS upstream errors.
Strangely enough, up to now I am only seeing this for the github.com FQDN and not for any other DNS queries.
And when I go to github.com through any other means, like a web browser, ping etc., the FQDN is resolved normally.
So the only problem seems to be HA sending a DNS query for github.com to the main DNS server.
To troubleshoot the problem I initially let the Home Assistant server use the Internet provider's DNS servers directly (as written in the previous post), and that solved the problem.
But then I let the Home Assistant server use the main local DNS server directly (so skipping the Pi-holes), and the problem was back.
So then I let the Pi-hole DNS servers go out directly to the Internet provider's DNS servers (so skipping the main DNS server), and let the Home Assistant server use the Pi-holes again, and the problem was gone again.
So the conclusion must be that my main DNS server somehow has trouble forwarding the github.com FQDN DNS queries from Home Assistant to external DNS servers.
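For anyone who wants to repeat this comparison without a packet capture, here is a minimal Python sketch that sends one DNS query to a chosen server and reports the reply code. It uses only the standard library. The function names are my own, and it sends a plain A query without EDNS, which may well differ from what Home Assistant actually sends, so treat it as a rough test and not as a reproduction of the exact failing query.

```python
import socket
import struct

# Reply codes from the DNS header (RFC 1035, section 4.1.1)
RCODE_NAMES = {0: "No error", 1: "Format error", 2: "Server failure",
               3: "No such name", 4: "Not implemented", 5: "Refused"}


def build_query(name: str, txid: int = 0x2441, qtype: int = 1) -> bytes:
    """Build a minimal DNS query (no EDNS) with the recursion-desired flag set."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, AN/NS/AR counts = 0
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE (1 = A record) and QCLASS (1 = IN)
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_rcode(response: bytes) -> int:
    """Return the reply code: the low 4 bits of the 16-bit flags field."""
    if len(response) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def query_rcode(server: str, name: str, timeout: float = 3.0) -> str:
    """Send one UDP query to `server` on port 53 and return the reply code as text."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        response, _ = sock.recvfrom(4096)
    rcode = parse_rcode(response)
    return RCODE_NAMES.get(rcode, f"rcode {rcode}")
```

For example, `query_rcode("192.168.1.2", "github.com")` (a placeholder address standing in for the main DNS server) can be compared with the same call against a Pi-hole or the provider's DNS server. If the main server answers this plain query with "No error", that would suggest the problem lies in something specific to the query Home Assistant sends.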
A packet capture shows this Format error in the DNS response to the query for github.com:
Domain Name System (response)
Transaction ID: 0x2441
Flags: 0x8181 Standard query response, Format error
1... .... .... .... = Response: Message is a response
.000 0... .... .... = Opcode: Standard query (0)
.... .0.. .... .... = Authoritative: Server is not an authority for domain
.... ..0. .... .... = Truncated: Message is not truncated
.... ...1 .... .... = Recursion desired: Do query recursively
.... .... 1... .... = Recursion available: Server can do recursive queries
.... .... .0.. .... = Z: reserved (0)
.... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
.... .... ...0 .... = Non-authenticated data: Unacceptable
.... .... .... 0001 = Reply code: Format error (1)
And a packet capture for a normal working DNS query response to another FQDN shows this:
Domain Name System (response)
Transaction ID: 0xa166
Flags: 0x8180 Standard query response, No error
1... .... .... .... = Response: Message is a response
.000 0... .... .... = Opcode: Standard query (0)
.... .0.. .... .... = Authoritative: Server is not an authority for domain
.... ..0. .... .... = Truncated: Message is not truncated
.... ...1 .... .... = Recursion desired: Do query recursively
.... .... 1... .... = Recursion available: Server can do recursive queries
.... .... .0.. .... = Z: reserved (0)
.... .... ..0. .... = Answer authenticated: Answer/authority portion was not authenticated by the server
.... .... ...0 .... = Non-authenticated data: Unacceptable
.... .... .... 0000 = Reply code: No error (0)
So apart from the transaction ID, the flags are identical except for the reply code, Format error (1).
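To double-check that the two flag values really differ only in the reply code, the 16-bit flags word can be decoded by hand. This is a small sketch based on the standard DNS header layout; the field names are my own labels for the bits that the capture lists.

```python
def decode_dns_flags(flags: int) -> dict:
    """Split the 16-bit DNS header flags word into its individual fields."""
    return {
        "response": (flags >> 15) & 0x1,            # QR bit
        "opcode": (flags >> 11) & 0xF,              # 4-bit opcode
        "authoritative": (flags >> 10) & 0x1,       # AA bit
        "truncated": (flags >> 9) & 0x1,            # TC bit
        "recursion_desired": (flags >> 8) & 0x1,    # RD bit
        "recursion_available": (flags >> 7) & 0x1,  # RA bit
        "z": (flags >> 6) & 0x1,                    # reserved
        "answer_authenticated": (flags >> 5) & 0x1,    # AD bit
        "non_authenticated_data": (flags >> 4) & 0x1,  # CD bit
        "rcode": flags & 0xF,                       # 4-bit reply code
    }


failing = decode_dns_flags(0x8181)  # flags from the github.com response
working = decode_dns_flags(0x8180)  # flags from the normal response

# Keep only the fields whose values differ between the two responses
differing = {
    field: (failing[field], working[field])
    for field in failing
    if failing[field] != working[field]
}
print(differing)  # {'rcode': (1, 0)}
```

The only differing field is the reply code, which matches what the capture shows: the server accepts the query as a standard recursive query but rejects it as malformed.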
I am by no means a network specialist and still don’t understand why this is happening, but when I find some more info I will report back.
If anybody has an idea about this then please let me know.