> The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.
The "possibly preface" (sic!) to me is obviously to be understood as "if there are any CNAME RRs, the answer to the query is to be prefaced by those CNAME RRs" and not "you can preface the query with the CNAME RRs or you can place them wherever you want".
But also.. the programmers working on the software running one of the most important (end-user) DNS servers in the world:
1. They change the logic of how CNAME responses are formed.
2. I assume at least some tests broke and needed to be "fixed up" (y'know - "when a CNAME is queried, I expect this response").
3. No one saw these changes in test behaviour and thought "I wonder if this order is important", or "We should research this more", or "Are other DNS servers changing order?", or "This should be flagged for a very gradual release".
4. It ends up in a test environment for, what, a month, and nothing using getaddrinfo from glibc is exercising that environment, or nobody noticed that it was broken.
Cloudflare seem to be getting into the swing of breaking things and then being transparent. But this really reads as a fun "did you know", not a "we broke things again - please still use us".
There's no real RCA except to blame an RFC - but honestly, for a large-scale operation like theirs, this seems like a very big thing to slip through the cracks.
I would make a joke about South Park's oil "I'm sorry".. but they don't even seem to be
We used to say at work that the best way to get promoted was to be the programmer that introduced the bug into production and then fix it. Crazy if true here...
"Testing environment" sounds to me like a real network real user devices are used with (like the network used inside CloudFlare offices). That's what I would do if I was developing a DNS server anyway, other than unit tests (which obviously wouldn't catch this unless they were explicitly written for this case) and maybe integration/end-to-end tests, which might be running in Alpine Linux containers and as such using musl. If that's indeed the case, I can easily imagine how noone noticed anything was broken. First look at this line:
> Most DNS clients don’t have this issue. For example, systemd-resolved first parses the records into an ordered set:
Now think about what real end user devices are using: Windows/macOS/iOS obviously aren't using glibc and Android also has its own C library even though it's Linux-based, and they all probably fall under the "Most DNS clients don't have this issue.".
That leaves GNU/Linux, where we could reasonably expect most software to use glibc for resolving queries, so presumably anyone using Linux on their laptop would catch this, right? Except most distributions started using systemd-resolved (the most notable exception is Debian, but not many people use that on desktops/laptops), which is a local caching DNS resolver and as such acts as a middleman between glibc software and the network-configured DNS server: it would resolve queries against 1.1.1.1 correctly itself, and then return the results from its cache ordered by its own ordering algorithm.
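A quick way to check whether systemd-resolved is sitting in that middleman position on a given machine is to look at what glibc is actually pointed at (a small sketch; it assumes the common stub-mode setup where /etc/resolv.conf lists 127.0.0.53):

# Sketch: detect the common systemd-resolved stub setup, where glibc sends its
# queries to 127.0.0.53 and systemd-resolved (not glibc) talks to 1.1.1.1 upstream.
with open("/etc/resolv.conf") as f:
    nameservers = [line.split()[1] for line in f
                   if line.strip().startswith("nameserver")]
print("nameservers:", nameservers)
print("systemd-resolved in the middle:", "127.0.0.53" in nameservers)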
They absolutely should have unit tests that detect any change in output and manually review those changes for an operation of this size.
OP said:
"However, we did not have any tests asserting the behavior remains consistent due to the ambiguous language in the RFC."
One could guess it's something like -- back when we wrote the tests, years ago, whoever did it missed that this was required, not helped by the fact that the spec preceded RFC 2119's standardization of the all-caps "MUST" / "SHOULD" etc. language, which would have helped us translate specs into tests more completely.
This is the part that is shocking to me. How is getaddrinfo not called in any unit or system tests?
I would hazard a guess that their test environment has both the systemd variant and the Unbound variant (Unbound technically does not arrange them, but instead reconstructs the chain according to the RFC's "CNAME restart" logic, because it is a recursive resolver in itself), but not plain directly-piped resolv.conf (presumably because who would run that in this day and age? This is sadly only a half-joke, because only a few people fall into this category).
Which goes to show, one person’s “obvious understanding” is another’s “did they even read the entire document”.
All of which also serves to highlight the value of normative language, but that came later.
You might not find it ambiguous but it is ambiguous and there were attempts to fix it. You can find a warmed up discussion about this topic here: https://mailarchive.ietf.org/arch/msg/dnsop/2USkYvbnSIQ8s2vf...
And perhaps this is somewhat pedantic, but they also write that “RFC 1034 section 3.6 defines Resource Record Sets (RRsets) as collections of records with the same name, type, and class.” But looking at the RFC, it never defines such a term; it does say that within a “set” of RRs “associated with a particular name” the order doesn’t matter. But even if the RFC had said “associated with a particular combination of name, type, and class”, I don’t see how that could have introduced ambiguity. It specifies an exception to a general rule, so obviously if the exception doesn’t apply, then the general rule must be followed.
Anyway, Cloudflare probably know their DNS better than I do, but I did not find the article especially persuasive; I think the ambiguity is actually just a misreading, and that the RFC does require a particular ordering of CNAME records.
(ETA:) Although admittedly, while the RFC does say that CNAMEs must come before As in the answer, I don’t necessarily see any clear rule about how CNAME chains must be ordered; the RFC just says “Domain names in RRs which point at another name should always point at the primary name and not the alias ... Of course, by the robustness principle, domain software should not fail when presented with CNAME chains or loops; CNAME chains should be followed”. So actually I guess I do agree that there is some ambiguity about the responses containing CNAME chains.
I just commented the same.
It's pretty clear that the "possibly" refers to the presence of the CNAME RRs, not the ordering.
"With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody."
combined with failure to follow Postel's Law:
"Be conservative in what you send, be liberal in what you accept."
What's reasonable is: "Set reserved fields to 0 when writing and ignore them when reading." (I heard that was the original example). Or "Ignore unknown JSON keys" as a modern equivalent.
What's harmful is: accept an ill-defined superset of the valid syntax and interpret it in undocumented ways.
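A toy illustration of the benign case (hypothetical config keys, not anything from the article): unknown keys are dropped, but the fields we do understand are still validated strictly.

import json

KNOWN_KEYS = {"host", "port"}  # hypothetical schema for this example

def parse_config(raw: str) -> dict:
    data = json.loads(raw)
    cfg = {k: v for k, v in data.items() if k in KNOWN_KEYS}  # ignore unknown keys
    if "port" in cfg and not isinstance(cfg["port"], int):
        raise ValueError("port must be an integer")  # stay strict on known fields
    return cfg

# A newer producer added "retry_policy"; the old reader keeps working unchanged.
print(parse_config('{"host": "example.org", "port": 53, "retry_policy": "exp"}'))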
This is useful as it allows the ISA to remain compatible with code which is unaware of future extensions which define new functionality for these bits so long as the zero value means "keep the old behavior". For example, a system register may have an EnableNewFeature bit, and older software will end up just writing zero to that field (which preserves the old functionality). This avoids needing to define a new system register for every new feature.
MCP seems to be a new round of the cycle beginning again.
I'm dead serious, we should be in a golden age of "programming in the large" formal protocols.
It also does not give any way to actually see a warning message; where would we even put it? I know for a fact that if my glibc DNS resolver started spitting out errors into /var/log/god_knows_what it would take me days to find them. At best the resolver could return some kind of errno, with perror giving us a message like "The DNS response has not been correctly formatted", and then hope that the message is caught and forwarded through whatever is wrapping the C library, hopefully into our stderr. And there are so many ways even that could fail.
Start with milliseconds, move on to seconds and so on as the unwanted behavior continues.
In this case the broken resolver was the one in the GNU C Library, hardly an obscure situation!
The news here is sort of buried in the story. Basically Cloudflare just didn't test this. Literally every datacenter in the world was going to fail on this change, probably including their own.
I would expect most datacenters to use their own local recursive caching DNS servers instead of relying on 1.1.1.1 to minimize latency.
https://blog.cloudflare.com/zone-apex-naked-domain-root-doma... , and I quote directly ... "Never one to let a RFC stand in the way of a solution to a real problem, we're happy to announce that CloudFlare allows you to set your zone apex to a CNAME."
The problem? CNAMEs are name level aliases, not record level, so this "feature" would break the caching of NS, MX, and SOA records that exist at domain apexes. Many of us warned them at the time that this would result in a non-deterministic issue. At EC2 and Route 53 we weren't supporting this just to be mean! If a user's DNS resolver got an MX query before an A query, things might work ... but the other way around, they might not. An absolute nightmare to deal with. But move fast and break things, so hey :)
In earnest though ... it's great to see how CloudFlare are now handling CNAME chains and A record ordering issues in this kind of detail. I never would have thought of this implicit contract they've discovered, and it makes sense!
Related, the phrase "CNAME chains" causes vague memories of confusion surrounding the concepts of "CNAME" and casual usage of the term "alias". Without re-reading RFC1034 today, I recall that my understanding back in the day was that the "C" was for "canonical", and that the host record the CNAME itself resolved to must itself have an A record, and not be another CNAME, and I acknowledge the already discussed topic that my "must" is doing a lot of lifting there, since the RFC in question predates a normative language standard RFC itself.
So, I don't remember exactly the initial point I was trying to get at with my second paragraph; maybe there have always been various failure modes due to varying interpretations, which have only compounded with age, new blood, non-standard language being used in self-serve DNS interfaces by providers, etc., which I suppose only strengthens the "ambiguity" claim. That doesn't excuse such a large critical service provider though, at all.
It is a nightmare, but the spec is the source of the nightmare.
CNAMEs are a huge pain in the ass (as noted by DJB: https://cr.yp.to/djbdns/notes.html)
If a small business or cloud app can't resolve a domain because the domain is doing something different, it's much easier to blame DNS, use another DNS server, and move on. Or maybe just go "some Linuxes can't reach my website, oh well, sucks for the 1-3%".
Cloudflare is large enough that they caused issues for millions of devices all at once, so they had to investigate.
What's unclear to me is if they bothered to send patches to broken open-source DNS resolvers to fix this issue in the future.
> Based on what we have learned during this incident, we have reverted the CNAME re-ordering and do not intend to change the order in the future.
> To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF.
That is, explicitly documenting the "broken" behaviour as permitted.
(Yes, there are other recursive resolver implementations, but they look at BIND as the reference implementation and absent any contravention to the RFC or intentional design-level decisions, they would follow BIND's mechanism.)
Parsing the answer section in a single pass requires more finesse, but does it need fancier data structures than a string to string map? And failing that you can loop upon CNAME. I wouldn't call a depth limit like 20 "a rather low limit on the number of CNAMEs in a response", and max 20 passes through a max 64KB answer section is plenty fast.
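A rough sketch of that approach (the (owner, type, rdata) tuples are an assumed stand-in for parsed records, not glibc's actual structures): one pass buckets the records into maps, then the alias chain is walked from the map with a bounded number of links.

def resolve_from_answer(qname, answers, max_links=20):
    """answers: (owner, rtype, rdata) tuples in whatever order the server sent them."""
    aliases, addresses = {}, {}
    for owner, rtype, rdata in answers:             # single pass, order-independent
        if rtype == "CNAME":
            aliases[owner] = rdata
        elif rtype in ("A", "AAAA"):
            addresses.setdefault(owner, []).append(rdata)
    name = qname
    for _ in range(max_links):                      # bounded walk along the alias chain
        if name in addresses:
            return addresses[name]
        if name not in aliases:
            return []                               # dead end: no alias, no address
        name = aliases[name]
    raise RuntimeError("CNAME chain too long or looping")

answers = [("a.example.test", "A", "192.0.2.1"),                 # address first,
           ("alias.example.test", "CNAME", "a.example.test"),    # aliases last:
           ("www.example.test", "CNAME", "alias.example.test")]  # order no longer matters
print(resolve_from_answer("www.example.test", answers))          # ['192.0.2.1']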
That seems like some doubling-down BS to me, since they earlier say "It's ambiguous because it doesn't use MUST or SHOULD, which was introduced a decade after the DNS RFC." The RFC says:
>The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.
How do you get to interpreting that, in the face of "MUST" being defined a decade later, as "I guess I can append the CNAME to the answer"?
Holding onto "we still think the RFC allows it" is a problem. The world is a lot better if you can just admit to your mistakes and move on. I try to model this at home and at work, because trying to "language lawyer" your way out of being wrong makes the world a worse place.
$ echo "A AAAA CAA CNAME DS HTTPS LOC MX NS TXT" | sed -r 's/ /\n/g' | sed -r 's/^/rfc1034.wlbd.nl /g' | xargs dig +norec +noall +question +answer +authority @coco.ns.cloudflare.com
;rfc1034.wlbd.nl. IN A
rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
;rfc1034.wlbd.nl. IN AAAA
rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
;rfc1034.wlbd.nl. IN CAA
rfc1034.wlbd.nl. 300 IN CAA 0 issue "really"
;rfc1034.wlbd.nl. IN CNAME
rfc1034.wlbd.nl. 300 IN CNAME www.example.org.
;rfc1034.wlbd.nl. IN DS
rfc1034.wlbd.nl. 300 IN DS 0 13 2 21A21D53B97D44AD49676B9476F312BA3CEDB11DDC3EC8D9C7AC6BAC A84271AE
;rfc1034.wlbd.nl. IN HTTPS
rfc1034.wlbd.nl. 300 IN HTTPS 1 . alpn="h3"
;rfc1034.wlbd.nl. IN LOC
rfc1034.wlbd.nl. 300 IN LOC 0 0 0.000 N 0 0 0.000 E 0.00m 0.00m 0.00m 0.00m
;rfc1034.wlbd.nl. IN MX
rfc1034.wlbd.nl. 300 IN MX 0 .
;rfc1034.wlbd.nl. IN NS
rfc1034.wlbd.nl. 300 IN NS rfc1034.wlbd.nl.
;rfc1034.wlbd.nl. IN TXT
rfc1034.wlbd.nl. 300 IN TXT "Check my cool label serving TXT and a CNAME, in violation with RFC1034"
The result is that DNS resolvers (including CloudFlare Public DNS) will give a cache-dependent result if you query e.g. a TXT record (depending on whether they already have the CNAME cached).
At internet.nl (https://github.com/internetstandards/) we found out because some people claimed to have some TXT DMARC record while also CNAMEing that record (which gives cache-dependent results; and since internet.nl uses RFC 9156 QName Minimisation, it first resolves A, and therefore caches the CNAME and will never see the TXT). People configure things similar to the https://mxtoolbox.com/dmarc/dmarc-setup-cname instructions (which I find in conflict with RFC 1034).
I don't think they're advising anyone to create both a CNAME and a TXT at the same label - but it certainly looks like that from the weird screenshot at step 5 (which doesn't match the text).
I think it's mistakenly a mish-mash of two different guides, one for 'how to use a CNAME to point to a third party DMARC service entirely' and one for 'how to host the DMARC record yourself' (irrespective of where the RUA goes).
My main point was however that it's really not okay that CloudFlare allows setting up other record types (e.g. TXT, but basically any) next to a CNAME.
The more things change, the more things stay the same. :-)
That's the only reasonable conclusion, really.
I expect this is why BIND 9 has the 'servfail-ttl' option. [0]
Turns out that there's a standards-track RFC from 1998 that explicitly permits caching SERVFAIL responses. [1] Section 8 of that document suggests that this behavior was permitted by RFC 1034 (published back in 1987).
[0] <https://bind9.readthedocs.io/en/v9.18.42/reference.html#name...>
> To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF
Of course.
There's also so much of it, and it mostly works, most of the time. This creates a hysteresis loop in human judgement of efficacy: even a blind chicken gets corn if it's standing in it. Cisco bought cisco., but (a decade ago, when I had access to the firehose) on any given day belkin. would be in the top 10 TLDs if you looked at the NXDOMAIN traffic. Clients don't opportunistically try TCP (which they shouldn't, according to the specification...), but we have DoT (...but should in practice). My ISP's reverse DNS implementation is so bad that qname minimization breaks... but "nobody should be using qname minimization for reverse DNS", and "Spamhaus is breaking the law by casting shades at qname minimization".
"4096 ought to be enough for anybody" (no, frags are bad. see TCP above). There is only ever one request in a TCP connection... hey, what are these two bytes which are in front of the payload in my TCP connection? People who want to believe that their proprietary headers will be preserved if they forward an application protocol through an arbitrary number of intermediate proxy / forwarders (because that's way easier than running real DNS at the segment edge and logging client information at the application level).
Tangential, but: "But there's more to it, because people doing these things typically describe how it works for them (not how it doesn't work) and onlookers who don't pay close attention conclude "it works"." http://consulting.m3047.net/dubai-letters/dnstap-vs-pcap.htm...
It’s always DNS.
Honestly, it shouldn't matter. Anybody who's using a stub resolver where this matters, where /anything/ matters really, should be running their own local caching / recursing resolver. These oftentimes have options for e.g. ordering things for various reasons.
Randomization within the final answer RRSet is fine (and maybe even preferred in a lot of cases)
We thought it was just the default NTP servers, but we had some reboots during this event because www.cisco.com was unavailable.
As an aside, I am super annoyed at Cloudflare for calling their proxy records "CNAME" in their UI. Those are nothing like CNAMEs and have caused endless confusion.
Please order the answer in the order the resolutions were performed to arrive at the final answer (regardless of cache timings). Anything else makes little sense, especially not in the name of some micro-optimization (which could likely be approached in other ways that don’t alter behaviour).
Something has been broken at Cloudflare for a couple of years now. It takes a very specific engineering culture to run the internet, and it's just not there anymore.
Nowhere does the RFC suggest that multiple CNAMEs need to be in a specific order.
And I'm also shocked that a Cisco switch goes into a reboot loop over this DNS ordering issue.
Also no, the client doesn't need more memory to parse the out-of-order response, it can take multiple passes through the kilobyte.
Of course, if the server sends unrelated address records in the answer section, that will result in incorrect data. (A simple counter can detect the end of the answer section, so it's not necessary to chase CNAMEs for section separation.)
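A sketch of that multi-pass idea (again with assumed (owner, type, rdata) tuples rather than glibc's real parser state): each pass is one linear scan for the current name, so no extra data structure is needed.

def chase(qname, answers, max_passes=20):
    """answers: (owner, rtype, rdata) tuples in any order; re-scanned once per CNAME link."""
    name = qname
    for _ in range(max_passes):
        addrs = [rdata for owner, rtype, rdata in answers
                 if owner == name and rtype in ("A", "AAAA")]
        if addrs:
            return addrs
        target = next((rdata for owner, rtype, rdata in answers
                       if owner == name and rtype == "CNAME"), None)
        if target is None:
            return []                  # no address and no further alias for this name
        name = target                  # follow one link, then scan the same records again
    raise RuntimeError("too many CNAME links")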
nitpicking at the RFCs when everyone knows DNS is a big old thing with lots going on
how do they not have basic integration tests to check how clients resolve
it seems very unlike cloudflare of old that was much more up front - there is no talk of the need to improve process, just blaming other people
Also, what's the right mental framework behind deciding when to release a patch RFC vs obsoleting the old standard for a comprehensive update?
Otherwise I might go to consult my favorite RFC and not even know it's been superseded. And if it has been superseded by a brand new doc, now I have to start from scratch again instead of reading the diff or patch notes to figure out what needs updating.
And if we must supersede, I humbly request a warning be put at the top, linking the new standard.
https://datatracker.ietf.org/doc/html/rfc5245
I agree, that it would be much more helpful if made obvious in the document itself.
It's not obvious that "updated by" notices are treated in any more of a helpful manner than "obsoletes"
Maybe I'm being overly-cynical but I have a hard time believing that they deliberately omitted a test specifically because they reviewed the RFC and found the ambiguous language. I would've expected to see some dialog with IETF beforehand if that were the case. Or some review of the behavior of common DNS clients.
It seems like an oversight, and that's totally fine.
rrs = resolver.resolve('www.example.test')
assert Record("cname1.example.test", type="CNAME") in rrs
assert Record("192.168.0.1", type="A") in rrs
Which wouldn't have caught the ordering problem.
Reminds me of https://news.ycombinator.com/item?id=37962674 or see https://tech.tiq.cc/2016/01/why-you-shouldnt-use-cloudflare/
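For contrast, an order-sensitive version of that membership-only pseudo-test might look like this (same hypothetical resolver/Record API as the snippet above):

rrs = resolver.resolve('www.example.test')
# Assert on the sequence, not just membership: aliases must precede the answer they alias.
assert [r.type for r in rrs] == ["CNAME", "A"]
assert rrs[0] == Record("cname1.example.test", type="CNAME")
assert rrs[-1] == Record("192.168.0.1", type="A")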
> If recursive service is requested and available, the recursive response to a query will be one of the following:
> - The answer to the query, possibly preface by one or more CNAME RRs that specify aliases encountered on the way to an answer.
> While "possibly preface" can be interpreted as a requirement for CNAME records to appear before everything else, it does not use normative key words, such as MUST and SHOULD that modern RFCs use to express requirements. This isn’t a flaw in RFC 1034, but simply a result of its age. RFC 2119, which standardized these key words, was published in 1997, 10 years after RFC 1034.
It's pretty clear that CNAME is at the beginning.
The "possibly" does not refer to the order but rather to the presence.
If they are present, they are first.
It might be a victim of the polite/ironic/sarcastic influences on language that turn innocuous words into contronyms.
Sounds low-key selfish / inconsiderate to me
... to push such a change without adequate thought or informed buy-in by consumers of that service.
Wherever possible I compile with gethostbyname instead of getaddrinfo. I use musl instead of glibc
Nothing against IPv6 but I do not use it on the computers and networks I control
When compiling software written by others, sometimes there are compile-time options that allow not using getaddrinfo or IPv6
For example,
links (--without-getaddrinfo)
haproxy (USE_GETADDRINFO="")
tnftp (--disable-ipv6)
elinks (--disable-ipv6)
wolfssl (ipv6 disabled by default)
stunnel (--disable-ipv6)
socat (--disable-ipv6)
and many more
Together with localhost TLS forward proxy I also use lots of older software that only used gethostbyname, e.g., original netcat, ucspi-tcp, libwww, original links, etc.
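For anyone who wants to see the behavioural difference without writing C, Python's socket module wraps both calls (a small illustration, not the parent's setup):

import socket

# gethostbyname() is the old IPv4-only interface; getaddrinfo() can return
# addresses from every configured family and is what most modern software uses.
print(socket.gethostbyname("www.example.org"))
for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo("www.example.org", 443):
    print(family, sockaddr)   # typically a mix of AF_INET and AF_INET6 entries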
Generally I avoid mobile OS (corporate OS for data collection, surveillance and ad services)
Mobile data is disabled. I almost never use cellular networks for internet
Mobile sucks for internet IMHO; I have zero expectation re: speed and I cannot control what ISPs choose to do
For me, non-corporate UNIX-like OS are smaller, faster, easier to control, more interesting
They write the reordering, push it, the glibc tester fires and fails, and they quickly discover: "Crap, tests are failing and the dependency (glibc) doesn't work the way I thought it would."
> Another notable affected implementation was the DNSC process in three models of Cisco ethernet switches. In the case where switches had been configured to use 1.1.1.1 these switches experienced spontaneous reboot loops when they received a response containing the reordered CNAMEs.
... but I am surprised by this:
> One such implementation that broke is the getaddrinfo function in glibc, which is commonly used on Linux for DNS resolution.
Not that glibc did anything wrong -- I'm just surprised that anyone is implementing an internet-scale caching resolver without a comprehensive test suite that includes one of the most common client implementations on the planet.
> One such implementation that broke is the getaddrinfo function in glibc, which is commonly used on Linux for DNS resolution.
> Most DNS clients don’t have this issue.
The most widespread implementation on the most widespread server operating system has the issue. I'm skeptical of what the author means by "Most DNS clients."
Also, what is the point of deploying to test if you aren't going to test against extremely common scenarios (like getaddrinfo)?
> To prevent any future incidents or confusion, we have written a proposal in the form of an Internet-Draft to be discussed at the IETF. If consensus is reached...
Pretty sure both Hyrum's Law and Postel's Law have reached the point of consensus.
Being conservative in what you emit means following the spec's most conservative interpretation, even if you think the way it's worded gives you some wiggle room. And the fact that your previous implementation did it that way for a decade means people have come to rely on it.
Any change to a global service like that, even a rollback (or data deployment or config change), should be released to a subset of the fleet first, monitored, and then rolled out progressively.
Each resolved record would be asserted as a fact, and a tiny search implementation would run after all assertions have been made to resolve the IP address irrespective of the order in which the RRsets have arrived.
A micro Prolog implementation could be rolled into glibc's resolver (or a DNS resolver in general) to solve the problem once and for all.
It's surprising how something so simple can be so broken.
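A minimal sketch of that fact-assertion idea in plain Python standing in for the micro Prolog (the record names and the 20-hop limit are made up for the example):

facts = set()   # each resolved record becomes a fact: (owner, rtype, rdata)

def assert_fact(owner, rtype, rdata):
    facts.add((owner, rtype, rdata))

def addresses(name, depth=0):
    """Backward chaining: addr(N) :- a(N). addr(N) :- cname(N, T), addr(T)."""
    if depth > 20:
        return []
    direct = [rdata for owner, rtype, rdata in facts
              if owner == name and rtype in ("A", "AAAA")]
    if direct:
        return direct
    return [ip for owner, rtype, target in facts
            if owner == name and rtype == "CNAME"
            for ip in addresses(target, depth + 1)]

# Facts can be asserted in whatever order the server happens to send the records.
assert_fact("a.example.test", "A", "192.0.2.1")
assert_fact("www.example.test", "CNAME", "a.example.test")
print(addresses("www.example.test"))   # ['192.0.2.1']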