Internet-Draft | Secure Nameserver Selection Algorithm fo | October 2024 |
Zhang, et al. | Expires 22 April 2025 | [Page] |
Nameserver selection algorithms employed by DNS resolvers are not currently standardized in the DNS protocol, and this has led to variation in the methods used by implementations in the field. Recent research has shown that some of these implementations suffer from security vulnerabilities. This document provides an in-depth analysis of nameserver selection as implemented by mainstream DNS software and summarizes uncovered vulnerabilities. It then provides recommendations for defending against these security and availability risks. Designers and operators of recursive resolvers can adopt these recommendations to improve the security and stability of the DNS.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 22 April 2025.¶
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
The robustness of the Domain Name System (DNS) is crucial for the function of the Internet. The increasing trend of relying on large DNS providers further underscores this significance. To achieve high resilience, DNS specifications mandate deploying multiple DNS nameservers to provide authoritative DNS answers. [RFC1034] REQUIRES that “every zone be available on at least two servers.” [RFC2182] further says that “nameservers MUST be placed at both topologically and geographically dispersed locations.”¶
However, [RFC1034] vaguely suggests that resolvers should “find the best server to ask”, and no specification provides an exact algorithm for implementing this. Typically, resolvers aim to choose the nameserver with optimal performance, thereby minimizing query latency and enhancing overall efficiency [Yu12] [Mueller17].¶
Lack of standardization in this area can lead to DNS implementations choosing methods that may have security issues. By exploiting properties of these selection algorithms, attackers can potentially disrupt the resilience of nameservers. Such disruptions can degrade the performance of DNS services, leading to a series of security concerns including denial-of-service (DoS) attacks, traffic hijacking, and cache poisoning [Dai21]. These vulnerabilities underscore the necessity for a secure implementation of nameserver selection. Additionally, a selection process may include vulnerable data structures, thereby enabling attackers to significantly degrade the performance of resolvers.¶
Furthermore, no specification describes how resolvers should retry and fail over to alternate nameservers, or how resolvers should safely manage the resources employed to speak to nameservers across multiple zones in the process of answering queries. This can lead to two issues. First, the nameserver selection process by resolvers may result in unbounded behaviors, potentially overloading DNS servers with cyclical queries. This vulnerability could be exploited by attackers to launch DoS attacks. Second, resolvers may fail to handle the selection of a nameserver that is unavailable or that provides an erroneous or unusable answer. Implementations that lack robust retry and failover algorithms can suffer availability issues when some of the authoritative servers for a zone are unresponsive or provide unusable responses.¶
This document does not propose to standardize a precise nameserver selection algorithm. However, it does offer recommendations on some general principles that should be followed in developing such algorithms. It starts by reviewing the nameserver selection implemented by widely used recursive DNS software (I1 - I6). Subsequently, the document details identified vulnerabilities (V1 - V5), based on existing research, emphasizing the security gaps in current practices. Lastly, the document recommends options (R1 - R5) to address these issues, emphasizes the requirement for resolvers to have robust retry and failover behavior, and restates the requirement for resolvers to sensibly bound their work.¶
To ensure stability and robustness, DNS specifications [RFC1034] [RFC2182] require multiple candidate nameservers. Specifically, a domain name should be configured with multiple candidates at both levels: NS records and IP addresses. Such candidates can be placed at both topologically and geographically dispersed locations. An example is shown in Figure 1. Typically, a recursive resolver measures the candidate nameservers while resolving the domain name and selects as the next candidate the one with the best performance from its perspective (e.g., with the lowest latency).¶
The selection algorithm of DNS resolvers for appropriate nameservers plays a critical role in DNS security. When attackers exploit vulnerabilities in the selection, they gain the ability to divert legitimate DNS traffic toward a specified nameserver (e.g., ns1-example.com or an IP address like 192.0.2.1, as shown in Figure 1). This manipulation can lead to multiple security issues:¶
One of the most straightforward consequences is the redirection of all legitimate traffic to a targeted server, causing overloads that bypass traditional DoS defense mechanisms designed to detect and filter out malicious traffic. It allows attackers to evade detection with ease, since benign users generate all redirected traffic [Gupta17] [Mirkovic04] [Zargar13]. Beyond overloading DNS nameservers, attackers can also disrupt the load balancing of upper-layer applications, which often rely on diverse nameserver responses to distribute traffic efficiently [Zhang23].¶
Moreover, exploiting nameserver selection lays the groundwork for further attacks, such as DNS cache poisoning and traffic hijacking. Such an attack narrows the query path to a single, attacker-controlled route, simplifying the manipulation of DNS responses. Researchers have highlighted that disrupting the resilience of nameservers and eliminating candidate nameservers is a critical step in executing advanced attacks, including acquiring fraudulent TLS certificates [Dai21]. DNSSEC can successfully detect cache poisoning and other response falsification attacks. However, even if DNS resolvers all implement DNSSEC validation, these attacks are still quite practical today given the poor uptake to date of DNSSEC signing in deployed authoritative DNS infrastructure.¶
Additionally, no specification describes how resolvers should retry and fail over to alternate nameservers, and this can lead to two issues. First, it is essential that resolvers retry other available nameservers for a zone when the selected nameserver does not respond or provides an unusable response. Experience has shown that there are poorly implemented DNS resolvers in the field that do not employ robust retry and failover algorithms, which can lead to availability problems when some of the authoritative servers for a zone are unresponsive or giving erroneous responses. It could also be the case that an attacker has blocked the response from a selected nameserver or injected an unusable response that appears to come from the nameserver. Second, the retry logic must be balanced with a requirement to sensibly bound the work of the resolver. Unbounded behaviors, such as overly aggressive retries or following too many referrals, may overload the resolver. Attackers can exploit these vulnerabilities using specially crafted zones and NS records, leading to Denial of Service (DoS) attacks.¶
In section 4, this document will summarize various attacks that exploit the implementation of nameserver selection. Such attacks can target both authoritative nameservers and resolvers.¶
Before examining the specific vulnerabilities, this document reviews widely used DNS implementations by referring to published research [Zhang23] [Zhang22] [Yu12] [Mueller17]. The review covers four open-source recursive DNS software packages (I1 - I4) from a white-box view, and one proprietary DNS software package (I5) from a black-box view. Additionally, it examines open resolvers in the wild (I6).¶
BIND9 [BIND9] utilizes a strategy to select the nameserver with the lowest statistical latency, through the maintenance of the Smoothed Round Trip Time (SRTT) to each nameserver. It updates the SRTT for a nameserver after each query based on the latency, leveraging the algorithm of Exponentially Weighted Moving Average (EWMA) [Hunter86]. Further, BIND9 performs a Bubble Sort to rank candidate nameservers by their SRTT and selects the nameserver with the lowest SRTT for query assignment. Initially, nameservers that have not yet been queried are assigned a random SRTT value ranging from 1 to 32 milliseconds. This range facilitates early selection and rapid SRTT updating, allowing BIND9 to evaluate all candidates at the beginning. Additionally, BIND9 decreases the SRTTs of unselected nameservers after each query, enhancing their likelihood of future selection.¶
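The SRTT bookkeeping described above can be illustrated with a minimal sketch. The smoothing weight ALPHA, the decay factor DECAY, and all function names are assumptions chosen for illustration; they are not BIND9's actual constants or code. Only the 1 to 32 millisecond starting range is taken from the text.

```python
import random

ALPHA = 0.3   # assumed weight given to the newest RTT sample
DECAY = 0.98  # assumed per-query decay applied to unselected servers


def initial_srtt(rng=random):
    """Untried servers start with a random SRTT between 1 and 32 ms."""
    return rng.uniform(1.0, 32.0)


def ewma_update(srtt_ms, sample_ms, alpha=ALPHA):
    """Blend the newest RTT sample into the smoothed estimate (EWMA)."""
    return (1.0 - alpha) * srtt_ms + alpha * sample_ms


def record_query(srtts, queried, sample_ms):
    """Update the queried server and decay every other candidate."""
    for server in srtts:
        if server == queried:
            srtts[server] = ewma_update(srtts[server], sample_ms)
        else:
            srtts[server] *= DECAY


def pick_lowest(srtts):
    """Deterministically return the candidate with the lowest SRTT."""
    return min(srtts, key=srtts.get)
```

Under these assumed constants, a slow answer from the currently preferred server raises its SRTT while every other candidate drifts downward, so a different server may be preferred on the next query.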
In its optimization efforts, BIND9 penalizes nameservers with poor performance. Specifically, its nameserver selection algorithm assigns penalties for response failures, including timeouts and DNSSEC validation errors. When such failures are detected, BIND9 increases the affected nameserver's SRTT, effectively lowering its likelihood of being chosen for future queries. Furthermore, BIND9 utilizes a "lame cache" mechanism to identify and avoid selecting nameservers that fail to provide correct delegation information, known as "lame servers" [ISC-LAME]. This is achieved by recording lame servers associated with specific <QNAME, QTYPE> tuples. Once a nameserver is marked as lame, BIND9 will exclude it from selection for a predetermined period defined by the 'lame-ttl' setting.¶
A notable feature of BIND9's design is sharing the performance status of nameservers across all delegated domains. For instance, if domain A and domain B are delegated to the same nameserver, the SRTT status updated from resolving queries for domain A will also affect the resolution process for domain B. This shared status mechanism, which operates at the IP address level, may be beneficial to BIND9's overall efficiency. However, as discussed in Section 4, this approach can cause certain vulnerabilities.¶
Like BIND9, PowerDNS Recursor [PowerDNS] maintains a shared status for each nameserver. PowerDNS also dynamically updates the SRTT using the EWMA. Unlike BIND9, PowerDNS globally shares the status at the NS record level instead of the IP address level. While processing each query, PowerDNS employs a Stable Sort to rank the candidate nameservers, selecting the one with the lowest SRTT to send a query. To facilitate rapid measurement of all candidates, the initial SRTT value is set at 0 milliseconds. In addition, PowerDNS ensures all nameservers are periodically probed. This is achieved by applying a decay factor to the SRTT of all servers during each query. It also resets the SRTT status of a nameserver periodically to maintain up-to-date records.¶
PowerDNS also implements a strategy to penalize nameservers that fail to respond. In cases where a query times out, PowerDNS applies a fixed penalty to the SRTT of the corresponding nameserver. Furthermore, when other types of failures are detected, such as malformed responses, PowerDNS marks the server as throttled. This action temporarily removes the server from the pool of candidates for query resolution.¶
Knot Resolver [Knot] also maintains the shared SRTT of nameservers but utilizes a unique method for selecting nameservers. Specifically, it updates the SRTT of a nameserver using Karn's Algorithm [Karn87]. To rank the candidate nameservers, Knot Resolver defines a specific comparison function. This function takes into account several factors: preference is given to nameservers with IPv6 addresses, those that haven’t been tried before, those with fewer detected failures, and those with the lowest latency. After executing a Quick Sort with the comparison function, Knot Resolver leverages the Epsilon-Greedy Algorithm [Sutton98] to select a nameserver to query. This algorithm balances between exploring different nameservers and exploiting the one with the highest priority. In approximately 5% of the cases, the algorithm will randomly select a nameserver from all available candidates, which allows for the exploration of different servers. In the remaining 95% of cases, the algorithm chooses the nameserver with the highest priority, ensuring that the most optimal server is typically selected for queries.¶
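The explore/exploit split described above can be sketched as follows. This is an illustrative reading of the Epsilon-Greedy Algorithm, not Knot Resolver's code; the function name and the assumption that the candidates arrive already sorted best-first are hypothetical.

```python
import random

EPSILON = 0.05  # exploration probability, matching the 5% figure in the text


def epsilon_greedy(ranked, epsilon=EPSILON, rng=random):
    """Pick a nameserver from candidates already sorted best-first."""
    if not ranked:
        raise ValueError("no candidate nameservers")
    if rng.random() < epsilon:
        # Explore: any candidate, including the best one, may be chosen.
        return rng.choice(ranked)
    # Exploit: take the highest-priority candidate.
    return ranked[0]
```

Because the exploration branch draws from every candidate, no nameserver can be starved of queries entirely, whatever state an attacker has induced in the ranking.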
Unbound [Unbound], unlike other software, implements a special strategy for selecting a nameserver. This approach involves maintaining an SRTT for each nameserver, which is updated using Karn's Algorithm. Additionally, Unbound avoids nameserver failures by either adding a penalty to the SRTT or, in more severe cases, removing the nameserver from consideration. Initially, each untried nameserver is assigned an SRTT of 376ms. However, Unbound's selection process is not solely based on the lowest SRTT. It randomly selects from all candidates and only excludes a nameserver when its SRTT exceeds the least-latent one by more than 400ms. This default behavior can be customized, allowing Unbound to select the best nameserver according to a configured probability.¶
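The band-based selection described above can be sketched as follows. This is a simplified illustration, not Unbound's code; the function name and the dictionary representation of SRTTs are assumptions, and only the 400 ms band width is taken from the text.

```python
import random

BAND_MS = 400  # band width taken from the text


def select_within_band(srtts, band_ms=BAND_MS, rng=random):
    """Pick uniformly among servers within band_ms of the fastest one."""
    if not srtts:
        raise ValueError("no candidate nameservers")
    best = min(srtts.values())
    # A server is excluded only if it exceeds the best SRTT by more than the band.
    eligible = [server for server, rtt in srtts.items() if rtt <= best + band_ms]
    return rng.choice(eligible)
```

With SRTTs of 20 ms, 419 ms, and 421 ms, the first two servers share the load and only the third is excluded.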
A further unique aspect of Unbound is that it maintains nameserver status for each delegation individually. Unlike other software that shares nameserver status across all delegated domains, Unbound records this information separately for each domain. This domain-specific status ensures that the resolution for one domain does not impact the nameserver status for other domains.¶
There is limited research on Microsoft DNS, as it is a component of the closed-source operating system, Windows Server [MicrosoftDNS]. Through software simulation, [Zhang23] reveals that Microsoft DNS records the status of non-responsive nameservers. This status is shared across all delegated domains at the IP address level.¶
Previous studies have analyzed the selection strategies employed by real-world open resolvers from a black-box perspective [Yu12] [Mueller17]. Investigations into the patterns of outgoing DNS queries reveal that a considerable number of open resolvers prioritize nameservers based on the lowest latency. This approach to server selection mirrors the methods implemented in I1 - I4. Moreover, [Zhang23] presents a large-scale analysis confirming that sharing nameserver status is prevalent. This strategy is similar to the practices observed in I1 - I3.¶
This section summarizes five published vulnerabilities (V1 - V5) targeting nameserver selection, which can affect current implementations of recursive resolvers.¶
[Zhang23] reveals an attack named Disablance (Load Balancing Disabler) to disrupt the resilience of nameservers. Consequently, attackers can efficiently overload nameservers with legitimate traffic, and even further disrupt DNS-based load balancing of upper-layer applications. Also, the uncovered attack derandomizes the NS selection and lowers the bar of traffic hijacking and cache poisoning.¶
This attack stems from weaknesses in both authoritative nameservers and recursive resolvers. Specifically, attackers can abuse a defensive strategy of nameservers, which ignore queries for domain names outside their authority. Although this strategy violates the DNS specification [RFC8906], it is widely adopted by real-world nameservers and vendors [Zhang23]. Given this precondition, attackers exploit recursive DNS software's globally shared nameserver selection status. To execute the attack, adversaries send the target resolver DNS queries that will go unanswered, so that the resolver records the nameservers as failed. This is achieved by setting up DNS records for a domain under their control to point toward the hosting provider's nameservers but excluding the targeted nameserver. For example, a victim domain is assigned multiple nameservers (N1 - Nm). The attackers' domain is then delegated to N2 - Nm, leaving N1 as the targeted nameserver. Following this, attackers send several DNS queries for their domain to the targeted resolver. These queries are ignored by the nameservers due to the lack of authority over the attackers' domain, leading the recursive resolver to deem these nameservers (N2 - Nm) non-responsive and to deprioritize them in future resolutions. As a result, DNS requests from legitimate users will overwhelm the remaining nameserver (N1) for a period.¶
The risk posed by Disablance attacks is widespread. First, three mainstream DNS implementations, BIND9 (I1), PowerDNS (I2), and Microsoft DNS (I5), have been identified as vulnerable to Disablance attacks. Furthermore, the analysis has revealed that 22.24% of the top 1 million Fully Qualified Domain Names (FQDNs) and 3.94% of the top 1 million Second-Level Domains (SLDs) are potential targets of Disablance. Additionally, 37.88% of tested open resolvers, along with 10 out of 14 well-known public DNS services, including Cloudflare and Quad9, are vulnerable to Disablance attacks.¶
[Hay13] presents an attack to manipulate BIND9's (I1) NS selection, enabling adversaries to affect the choice of nameserver queried by a resolver predictably. This vulnerability facilitates the manipulation of NS selections, thereby enhancing the effectiveness of subsequent attacks, such as cache poisoning and traffic redirection.¶
BIND9's management of shared nameserver status and its approach to decreasing the SRTT for NS selection make the attack possible. Specifically, BIND9 lowers the SRTT of less frequently used nameservers to diversify future NS choices, rather than consistently choosing the fastest candidate. For example, a victim domain is assigned multiple nameservers (N1 - Nm). To force the resolver to query N1 predictably, attackers set up a domain delegation involving a series of non-open nameservers (C1 - Cn), their own nameserver (A), and the target nameserver (N1). The attackers then request resolution of their domain, and the resolver queries C1 - Cn and then A. Each query to the non-open nameservers incrementally lowers the SRTT of the unqueried candidates, including N1. When A responds to its query, it halts further domain resolution, so N1 is never queried and its SRTT is never corrected by a real measurement. In total, the SRTT of N1 is reduced n+1 times. By using a large number of intermediary nameservers (n), attackers can drive N1's SRTT down and thereby raise its selection priority, effectively redirecting subsequent queries to N1 and disrupting the randomness of BIND9's NS selection.¶
[Herzberg12] uncovers an attack that compromises the randomness of BIND9's (I1) NS selection mechanism through the use of fragmented DNS packets. Similar to V2, this attack disrupts the non-determinism of the NS selection, reducing the likelihood of resolvers querying alternative nameservers. Like V1 and V2, this vulnerability can also lay the groundwork for advanced attacks.¶
This vulnerability stems from the way BIND9 penalizes malformed responses. For example, a victim domain is assigned multiple nameservers (N1 - Nm). The attack begins by inducing the target resolver to issue queries that elicit large DNS responses, which are split into fragments. Attackers capable of IP spoofing subsequently send crafted second fragments for all nameservers except the intended target (N2 - Nm, with N1 being the target), each containing an arbitrary byte along with the standard headers. These malicious fragments are designed to pair with the legitimate first fragment, creating a corrupted DNS packet. Hence, the resolver marks the status of N2 - Nm as faulty and continuously selects N1 for future queries. Consequently, this manipulation allows attackers to control the choice of nameserver.¶
Attackers may abuse the recursive cache, which is a component in the process of nameserver selection, to consume the resources of a resolver. This attack differs from the above attacks (V1 - V3), which disrupt the resilience of nameservers directly. Instead, it exploits inefficient designs of the recursive cache to degrade the performance of resolvers.¶
This document details a fixed vulnerability [CVE2021-25219] as an example. The core of this vulnerability lies in BIND9's handling of the lame cache. BIND9 identifies and records a lame nameserver, which cannot provide correct information on domain delegation [ISC-LAME]. Upon identifying a lame nameserver, BIND9 will avoid selecting this nameserver for a predetermined period. However, the method of recording this information, which uses a <QNAME, QTYPE> tuple for each entry, is inefficient. Attackers can trigger a series of malformed responses from nameservers, causing the resolver to devote most of its CPU time to managing and checking the lame cache. This flawed implementation potentially leads to considerable processing delays for incoming client queries.¶
[Moura21] and [Xu23] present attacks, referred to as TsuName and TsuKing respectively, that exploit the unbounded behaviors of NS selection to make resolvers issue cyclical queries. To ensure availability and robustness, some resolvers may perform unbounded retries and failovers in the NS selection, making attacks that abuse cyclical queries possible. Specifically, TsuName takes advantage of the cyclic dependencies that occur when the resolution of one domain name requires the resolution of another. By exploiting the flaw in NS selection, attackers can manipulate the resolver into participating in query loops for CNAME and NS references. TsuKing, by exploiting additional vulnerabilities in NS selection, can cause a more significant overload to the victims. Due to the unbounded retry behavior and the inability to identify the RD flag, attackers can coordinate a number of vulnerable resolvers into a query loop to amplify queries hierarchically. Such attacks can efficiently amplify DNS traffic and pose a significant DoS threat to both resolvers and nameservers.¶
In this section, this document presents recommendations (R1 - R5) to address the vulnerabilities discussed earlier. To provide practical solutions, it references the current implementations of mainstream recursive DNS software (Unbound and Knot Resolver) rather than specifying an implementation from scratch.¶
Most implementations, including I1, I2, I5, and most of I6, deterministically select the candidate nameserver with the highest priority. This approach potentially allows attackers to manipulate the status of nameservers, effectively eliminating the possibility of querying certain candidates. Addressing this, a balanced approach to nameserver selection can significantly mitigate vulnerabilities (V1 - V3) by reducing the predictability exploited by attackers.¶
We recommend the strategies employed by Knot Resolver and Unbound, which strike a balance between optimization and exploration. Specifically, Knot Resolver implements the Epsilon-Greedy Algorithm for nameserver selection. This algorithm maintains a balance between exploring various nameservers and exploiting the one deemed to have the highest priority. In approximately 5% of queries, it will randomly select a nameserver from the pool of available candidates. This randomness disrupts the deterministic selection process that attackers might exploit. Similarly, Unbound's selection mechanism does not rely solely on the shortest SRTT. It introduces randomness by selecting among all candidates and only excluding a nameserver if its SRTT exceeds the least-latent one by more than 400 milliseconds. These approaches render efforts to manipulate nameserver selection less effective and can be implemented with minimal modifications to existing software that shares nameserver status.¶
This document recommends maintaining an independent nameserver status for each domain since the shared status has been identified as contributing to two vulnerabilities (V1 and V2). Most current implementations manage a shared nameserver status across all delegated domains, which presents a significant security risk. The delegation relationship observed by resolvers does not guarantee that a domain is hosted on a designated nameserver, enabling attackers to manipulate the priority of a nameserver by crafting their own domains' configurations. Therefore, this document recommends assigning an independent status to every domain to enhance security measures. Currently, Unbound utilizes this strategy, rendering it immune to the two vulnerabilities.¶
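The difference between shared and independent status can be sketched as follows. Both classes, the penalty value, and the default SRTT are hypothetical and serve only to contrast the two keying schemes; they do not reproduce any implementation's data structures.

```python
from collections import defaultdict

PENALTY_MS = 1000.0  # assumed penalty added after a timeout
DEFAULT_MS = 100.0   # assumed SRTT for a server with no history


class SharedStatus:
    """One SRTT per server address, shared by every delegated zone."""

    def __init__(self):
        self._srtt = defaultdict(lambda: DEFAULT_MS)

    def penalize(self, zone, addr):
        self._srtt[addr] += PENALTY_MS  # the zone is ignored

    def srtt(self, zone, addr):
        return self._srtt[addr]


class PerZoneStatus:
    """One SRTT per (zone, server address) pair."""

    def __init__(self):
        self._srtt = defaultdict(lambda: DEFAULT_MS)

    def penalize(self, zone, addr):
        self._srtt[(zone, addr)] += PENALTY_MS

    def srtt(self, zone, addr):
        return self._srtt[(zone, addr)]
```

With the shared scheme, a timeout recorded while resolving an attacker-controlled zone also raises the SRTT seen by a victim zone served from the same address. With the per-zone scheme, the victim zone's view is unaffected.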
Effective cache management can avoid the resource overconsumption associated with maintaining independent nameserver statuses. One of the possible reasons for a shared nameserver status is to conserve resources and protect against DoS attacks. To demonstrate the feasibility of this recommendation, this document includes an experiment. We configured a machine with a 16-core CPU and 16 GB of memory, installing Unbound version 1.13.1, which manages independent nameserver statuses. The experiment simulated an attack aiming to overconsume resolver resources. We created a wildcard domain with five NS records and overloaded the resolver with queries for random subdomains at a rate of 3,000 QPS over a 12-hour period. The experiment resulted in a low memory consumption of approximately 92 MB, with no service interruptions detected in Unbound. These findings indicate that adopting an independent nameserver status does not inherently create a vulnerability to DoS attacks.¶
This document proposes a secure data structure for DNS caches to mitigate vulnerabilities targeting recursive resolvers (V4). NS selection typically relies on cached information regarding the performance of nameservers. However, flawed implementations may allow adversaries to disrupt the performance of recursive resolvers, increasing the risk of DoS attacks. To mitigate the risk, this document advises against inefficient data structures and unnecessary fine-grained information within the recursive cache.¶
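One possible shape for such a structure is a fixed-capacity cache with constant-time lookups and least-recently-used eviction. The sketch below is an assumption offered for illustration, not a design taken from any resolver; the class name, the default capacity, and the choice of key are hypothetical.

```python
from collections import OrderedDict


class BoundedStatusCache:
    """Fixed-capacity status cache with least-recently-used eviction."""

    def __init__(self, capacity=10000):
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._entries = OrderedDict()

    def put(self, key, value):
        """Insert or refresh an entry, evicting the oldest if full."""
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)

    def get(self, key, default=None):
        """Return the cached value and mark the entry as recently used."""
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)
        return self._entries[key]

    def __len__(self):
        return len(self._entries)
```

Keying entries by a coarse identifier, such as a (zone, server address) pair rather than a <QNAME, QTYPE> tuple, keeps the number of entries an attacker can create per nameserver small, and the capacity limit caps memory and lookup cost regardless of query volume.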
By integrating the above options, DNS resolvers can enhance their defense against manipulation of NS selection, thereby preserving the robustness and stability of DNS operation and contributing to overall Internet security.¶
This document suggests a design that prevents cyclic queries (V5) and related attacks that negatively impact a resolver's resources. Some unbounded designs can cause DNS resolvers to do more work than they can reasonably be expected to do to resolve a name, leading to potential denial-of-service attacks.¶
Resolvers should be able to detect cyclical queries and abort with an error. More generally, resolvers should take concrete steps to sensibly bound the total amount of work they are willing to perform (or the time spent doing that work) to service a request. Following the suggestions provided by [Moura21] and [Xu23], this document recommends avoiding unlimited NS references and aggressive retries. Also, related security checks (e.g., on the RD flag) should be enforced during the process of NS selection. Note that the notion of limiting work is a general principle that applies to the entirety of a resolver's work, not just to the work needed for nameserver selection.¶
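A per-request budget combined with loop detection is one way to realize this principle. The sketch below is a hypothetical illustration; the class names, the default limit of 32 upstream queries, and the method names are assumptions, not values recommended by this document or used by any implementation.

```python
class WorkLimitExceeded(Exception):
    """Raised when a client request has used up its query budget."""


class ResolutionLoop(Exception):
    """Raised when a name is revisited while still being resolved."""


class ResolutionBudget:
    """Tracks the work done on behalf of one client request."""

    def __init__(self, max_queries=32):
        self.remaining = max_queries
        self._in_progress = set()

    def charge(self):
        """Account for one upstream query; fail once the budget is spent."""
        if self.remaining <= 0:
            raise WorkLimitExceeded("upstream query budget exhausted")
        self.remaining -= 1

    def enter(self, qname, qtype):
        """Mark (qname, qtype) as in progress; detect cyclic dependencies."""
        key = (qname.lower(), qtype)
        if key in self._in_progress:
            raise ResolutionLoop("cyclic dependency on %s/%s" % key)
        self._in_progress.add(key)
        return key

    def leave(self, key):
        """Mark a sub-resolution as finished."""
        self._in_progress.discard(key)
```

A resolver following this sketch would call enter() before chasing an NS or CNAME target, charge() before each upstream query, and answer the client with an error when either exception is raised.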
To avoid introducing another issue, namely a lack of availability and robustness, this document re-emphasizes the importance of appropriate retries and failovers. All DNS resolvers should automatically retry and fail over to other nameservers for a DNS zone when they encounter a nameserver that is unresponsive or providing an erroneous response. Not only must resolvers have a robust retry and failover algorithm, but no middlebox or network appliance interposed in the traffic path of resolvers should interfere with their ability to perform retry and failover. Experience has shown that a small percentage of deployed DNS resolvers in the field do not robustly fail over to alternate nameservers, leading to availability issues. Importantly, the retry and failover mechanism needs to be balanced with the requirement to bound the resolver's work (R4).¶
A proper retry and failover design also makes resolvers robust against attacks where an adversary is attempting to interfere with DNS responses from authoritative servers in an attempt to cause an outage (e.g., by blocking responses), or in an attempt to redirect clients (e.g., by modifying responses in ways that could be detected with DNSSEC validation). As long as the attacker is not able to inject these attacks along the traffic path to all or most of the authoritative servers for a zone, a resolver with robust retry and failover behavior should be able to successfully resolve names within that zone.¶
Some of the specific conditions under which a DNS resolver should retry other nameservers for a zone include an unresponsive nameserver, a nameserver giving an erroneous response (any response code other than NOERROR or NXDOMAIN), and a nameserver giving responses that cannot be authenticated, such as responses with missing DNSSEC signatures when they were expected to be signed or responses with bogus or expired signatures.¶
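These conditions can be expressed as a bounded failover loop. The sketch below is illustrative only: the response representation (None for a timeout, otherwise a dictionary with hypothetical 'rcode' and 'validated' fields), the send_query callback, and the default of three attempts are all assumptions.

```python
USABLE_RCODES = {"NOERROR", "NXDOMAIN"}


def is_usable(response):
    """Decide whether a response is acceptable or calls for failover.

    response is None on timeout; otherwise it is a dict with an 'rcode'
    string and a 'validated' flag that is False when expected DNSSEC
    signatures are missing, bogus, or expired.
    """
    if response is None:
        return False
    if response["rcode"] not in USABLE_RCODES:
        return False
    return bool(response["validated"])


def query_with_failover(servers, send_query, max_attempts=3):
    """Try candidates in order, stopping at the first usable response."""
    for server in servers[:max_attempts]:
        response = send_query(server)
        if is_usable(response):
            return server, response
    # Every attempt failed; the caller should answer with an error.
    return None, None
```

The max_attempts limit keeps the retry behavior consistent with the requirement to bound the resolver's work.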
In early DNS specifications, resolvers were discouraged from aggressive behavior to prevent unnecessary transmissions. Consequently, the nameserver selection algorithm typically selects only one server to query, imposes restrictions on the time and scope of retries, and limits the frequency of probing. This conservative approach can hinder the resolver's ability to identify the "best" server to query. Some authoritative DNS operators have expressed concerns that resolvers can do more to find the optimal server. Intuitive proposals suggest that resolvers could enhance their performance for popular domains and adapt their aggressiveness based on current load conditions.¶
TBD¶
This document contains no actions for IANA.¶
Thanks to all the people who reviewed the draft and gave comments and support.¶