Internet Draft                                            A. Oppermann
                                                  Internet Pipeline AG
Document: draft-oppermann-BGPDNS-RR-sorting-00.txt            C. Jeker
                                        Internet Business Solutions AG
Expires: June 2002                                        January 2001


          Using BGP topology information for DNS RR sorting


Status of this Memo

   This document is an Internet-Draft and is subject to all provisions
   of Section 10 of RFC2026 except that the right to produce derivative
   works is not granted.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that
   other groups may also distribute working documents as
   Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

Abstract

   Traditional AS-based server multi-homing is a burden on the global
   Internet routing system because of excessive AS number consumption
   and the growth of non-aggregated prefixes. Often AS-based
   multi-homing is not the right solution for the customer's needs but
   merely the only one available. This document describes a method and
   a protocol for server multi-homing with multiple ISPs that needs
   neither an AS number nor PI IP address space. Multiple IP addresses
   are assigned to the customer's servers, and the topology information
   contained in the global BGP routing table is used to sort the
   multiple DNS resource records nearest-first relative to the
   requestor.

Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [1].

Oppermann            Informational - Expires June 2002                1
BGPDNS                                                     January 2001

Introduction

   End-users often want to connect (important) servers to more than
   one upstream ISP. Reasons include:

   - Acquire redundancy in case one upstream ISP, or the link to it,
     fails
   - Balance/share load over more than one upstream ISP
   - Become independent from individual ISPs

   Today these objectives have to be satisfied by requesting PI IP
   space, obtaining an AS number and participating in global BGP
   routing. While it has some advantages, this approach has several
   drawbacks for the Internet at large and for the newly multi-homed
   customer:

   - Fragmentation of the IP address space leads to excessive memory
     and computing power requirements in the Internet core routers as
     well as at the newly connected multi-homed customer
   - Rapid exhaustion of the current AS number space requires an
     upgrade of the current AS number system with a transition to
     32 bit AS numbers
   - Running and tuning full BGP at the end-user's site requires
     significant knowledge and experience as well as continued
     monitoring and adjustments
   - BGP without knowledgeable tuning quickly leads to unintended
     asymmetric traffic patterns through the upstream ISPs
   - Unqualified modification of the BGP speaking router quickly leads
     to disconnection from the global Internet because of missing
     route announcements or route flap dampening
   - Misconfiguration of the routing table entries and parameters
     quickly leads to bogus route announcements, such as announcing a
     full /8 "Class-A" or multiple /24 "Class-C" networks instead of
     an aggregate, and causes serious traffic interruptions not only
     for the end-user but also in other parts of the global Internet
   - ISPs have to employ very strict filters towards their multi-homed
     customers because of these frequent problems. These filters in
     turn decrease flexibility and increase complexity while
     representing a significant source of errors in themselves.

   Often the requestors of non-aggregated PI IP address space and AS
   numbers are not aware of these implications and lack sufficient
   technical background to qualify and quantify the impact on
   themselves and on the Internet in general.

   Often the requestor gives only one of the reasons for AS-based
   multi-homing, or only part of one. In these cases the cure of
   AS-based multi-homing is frequently worse than the disease of being
   single-homed.

   By providing an alternative to AS-based multi-homing, this document
   intends to reduce the pressure that the lack of suitable
   alternatives puts on the global BGP routing system. It offers a
   solution for many of those who currently request independent IP
   address space and AS numbers.

Overview of BGPDNS

   The BGPDNS approach combines the power of BGP with the ease of DNS.

   In AS-based multi-homing a server has only one IP address, taken
   from the PI IP address space. BGPDNS changes this. In BGPDNS
   multi-homing the server operator gets multiple upstream links from
   multiple ISPs. From each of these ISPs the operator also receives a
   reasonably sized IP prefix out of that ISP's aggregate. The
   multi-homed server is assigned one IP address from the prefix of
   each ISP. The server is thus reachable through several IP
   addresses, each through its respective ISP. The router connecting
   to all these ISPs does policy routing to direct each outgoing
   packet to the ISP to which the prefix of its source IP address
   belongs. This is also a major advantage, because it gives
   connection path symmetry between the server and the client.

   All of the IP addresses of this server are put into DNS as multiple
   "A" records for the same name. The BGPDNS server establishes a
   listen-only BGP session with each ISP's routers or route-servers to
   get a comprehensive view of the Internet topology from its own
   perspective.
   It does normal best-path evaluation, either subject to the default
   rules or with custom-crafted metrics, as in normal AS-based
   multi-homing.

   Upon a DNS request for a target server, the DNS server sends a
   packet containing the IP address of the requestor and a list of all
   IP addresses of the target server to the BGPDNS server. The BGPDNS
   server looks up the best path to the requestor in the BGP topology
   information. It then weights each of the target server's IP
   addresses according to its topological closeness to the requestor.
   The BGPDNS server returns the weighted list to the DNS server,
   which then reorders the multiple DNS resource records based on
   their weight. The IP address of the target server that corresponds
   to the best path is thus moved to first place in the DNS answer to
   the requestor. In this way the ISP with the best connectivity
   towards the requestor is selected and will carry the traffic to and
   from this requestor.

   The traffic pattern and path selection are equal to those of normal
   AS-based multi-homing, with one exception. Incoming and outgoing
   traffic for each connection is symmetrical and goes through the
   same ISP in both directions, whereas in normal AS-based
   multi-homing it can be, and often is, asymmetrical.

   Responsiveness to link state or topology changes is immediate for
   new requests, as in normal AS-based multi-homing. If a link fails,
   the corresponding IP address is no longer chosen for new requests
   because all BGP NLRI path information for this link is gone as well
   (the BGP session is down or all NLRI are withdrawn).

   In the case of an upstream link failure, the maximum black hole
   time for a particular requestor is the configured DNS resource
   record expiration timeout. Only requestors that had this particular
   upstream link as best path are affected, and only for the DNS RR
   caching time. The DNS RR expiration timeout has to be chosen
   carefully based on the operating environment.
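
   The ranking and sorting described in this overview can be
   illustrated with a small sketch. It is not part of the protocol
   definition: the routing table contents, the AS numbers (taken from
   the documentation range), the requestor addresses and all function
   names are hypothetical, and the 1/0 weighting is only one simple way
   to express "topologically closest". The sketch assumes that the
   leftmost AS of the best path identifies the upstream ISP and that
   prefixes whose AS path consists of that AS alone are the ISP's own
   aggregates.

```python
import ipaddress

# Hypothetical best-path table of the BGPDNS server: prefix -> AS path.
# The leftmost AS is the directly connected upstream ISP.
BEST_PATHS = {
    "172.16.0.0/19": [64496],            # ISP A's own aggregate
    "192.168.0.0/20": [64497],           # ISP X's own aggregate
    "198.51.100.0/24": [64496, 64500],   # requestor network, best via ISP A
    "203.0.113.0/24": [64497, 64501],    # requestor network, best via ISP X
}


def best_path(table, address):
    """Longest-prefix match; returns the AS path or None if nothing matches."""
    ip = ipaddress.ip_address(address)
    best_net, best_as_path = None, None
    for prefix, as_path in table.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_as_path = net, as_path
    return best_as_path


def weigh(table, requestor, server_ips):
    """Weight 1 for server addresses inside the chosen upstream's prefixes."""
    path = best_path(table, requestor)
    if not path:
        # Network not in table: no preference, keep the original order.
        return {ip: 0 for ip in server_ips}
    upstream = path[0]
    upstream_nets = [ipaddress.ip_network(p)
                     for p, as_path in table.items() if as_path == [upstream]]
    return {ip: int(any(ipaddress.ip_address(ip) in net for net in upstream_nets))
            for ip in server_ips}


def sort_records(server_ips, weights):
    """DNS server side: highest weight first, ties keep their original order."""
    return sorted(server_ips, key=lambda ip: weights[ip], reverse=True)


SERVER_IPS = ["172.16.5.7", "192.168.14.131"]
answer = sort_records(SERVER_IPS, weigh(BEST_PATHS, "203.0.113.9", SERVER_IPS))
```

   With the table above, a requestor in 203.0.113.0/24 is answered with
   the ISP X address first, while a requestor with no matching route
   receives the records in their configured order.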
   See section "Choosing the right values" in this document for
   guidance on selecting the DNS RR expiration timeout.

Inner workings of BGPDNS

   In principle the inner workings and the selection process are
   rather simple. The reader is assumed to be familiar with the
   relevant BGP and DNS RFCs and to have operational experience of
   running BGP in the Internet.

   Principal procedure (lookups on the BGPDNS server are shown in
   Cisco CLI syntax):

   1.  Requestor wants to resolve the FQDN of the target server and
       has already found its authoritative DNS server
   2.  DNS server receives the DNS resolve request from the requestor
   3.  DNS server finds more than one "IN A" record
   4.  DNS server sends off a BGPDNS packet to the BGPDNS server
   5.  BGPDNS receives the packet from the DNS server containing the
       requestor's IP address and the list of available server IP
       addresses
   6.  BGPDNS does a "show ip bgp requestor-ip" to find the best path
       to the requestor
   7.  BGPDNS takes the leftmost AS number of the best path
   8.  BGPDNS does a "show ip bgp regexp ^leftmost-as$" to get a list
       of the prefixes of that upstream
   9.  BGPDNS finds the local IP address that is within one of these
       prefixes
   10. BGPDNS returns the packet to the DNS server with this specific
       IP address set to highest preference
   11. DNS server responds to the requestor with the response sorted
       highest-first according to the preference values received from
       BGPDNS
   12. Requestor uses the first IP address of the DNS response to
       establish a connection with the target server

Rationale

   IP address space depletion is not as fast and does not have the
   same impact as AS number space depletion and Internet core router
   memory consumption. With next generation IP numbering [IPv6]
   address space depletion is no longer an issue. Also, IP address
   space consumption by now has a tradition of being tightly and
   carefully managed.
   Keeping IP address space aggregation intact, together with the
   positive effects on AS number consumption and router memory, by far
   outweighs the negative effect of BGPDNS, namely that it requires
   one IP address per server and upstream ISP.

   Content delivery networks like Akamai have proven that DNS RR based
   global load balancing works on a large scale and has no negative
   impact on the Internet at large or on the individual user.
   Nevertheless the DNS RR expiration timeouts have to be chosen
   according to the merits of the intended type of multi-homing.

   For all services that are stateless or have only short-lived or
   restartable sessions, BGPDNS is well suited and provides results
   equal to those of true AS-based multi-homing. Taking BGP
   convergence times and dampening into consideration, a short-lived
   session shall be defined as any kind of protocol where a large
   majority (>75%) of the sessions are shorter than 15 minutes. The
   term "session" refers to the intended lifetime of a TCP connection
   or UDP stream. The definition thus applies to popular services
   based on the protocols HTTP, HTTPS, SMTP, POP, IMAP, FTP (in part)
   and NNTP (client sessions).

Advantages and disadvantages of BGPDNS

   Advantages of BGPDNS:

   - Link and ISP redundancy
   - Load balancing over more than one link and ISP
   - Independence of a single ISP
   - Traffic symmetry
   - No impact on global BGP routing system

   Disadvantages of BGPDNS:

   - Each server requires as many IP addresses as the number of ISPs
     it is connected to
   - For cached DNS resource records only timeout based convergence
   - Additional load on the DNS system

Recommendation of BGPDNS

   For AS number requests whose main purpose is server multi-homing of
   services that are stateless or have only short-lived or restartable
   sessions, a policy like the RIPE "HTTP policy" or "Static Dial Up
   policy" should be applied.

Choosing the right values

   The question is how to deal with cached DNS RRs that point to a now
   unreachable IP address. BGP today has a global convergence and
   propagation time of approx. 3 minutes as shown by recent research
   []. It is not recommended to set the DNS RR expiry timeout below 20
   seconds, as this might lead to multiple DNS requests for the same
   HTTP page download if a client is behind a slow link. Depending on
   the frequency of topology changes and the likelihood of link
   failures, the DNS RR expiration timeout can be set as high as a
   couple of hours.

   Determining the optimal DNS RR expiration timeout means balancing
   two opposing effects. A low expiry timeout results in many repeated
   DNS requests, each adding latency to the session, so that the
   content user experiences poor responsiveness. This effect is even
   more prominent when the round trip time from the content user to
   the BGPDNS server is high or packet loss occurs on the path.

   On the other hand, a high expiration timeout may mean that, by the
   time the cached value is reused, the path information is
   sub-optimal or, if the link of the preferred IP address became
   unavailable in the meantime, that the server is partially
   unreachable until the cached DNS RR expires and a new request is
   made.

   Guideline for selecting the right DNS RR expiration timeout value:

   - Number of ISPs connected to (use the number itself)
   - Interconnection among the ISPs through which the largest target
     user population is connected (1 poor .. 10 good)
   - Stability of links to the ISPs (1 poor .. 10 good)
   - Round trip time (average) to the target user population
     (<30ms = 10 good .. >500ms = 1 poor)

   Add these numbers together and place the sum within a DNS RR
   expiration timeout window ranging from 20 seconds (score 5) to 7200
   seconds (2 hours, score 40).
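
   The scoring guideline can be written down as a short calculation.
   The text does not spell out how a score is mapped onto the window,
   so the mapping below is an assumption: a proportional scale of 180
   seconds per point (7200 seconds divided by the maximum score of
   40), which is the reading consistent with this document's own
   example of a score of 15 giving 2700 seconds. Under this reading a
   score of 5 yields 900 seconds, not 20; the 20 second bound is
   applied only as a floor. All function names are hypothetical.

```python
MIN_TIMEOUT = 20      # seconds, lower bound given in the text
MAX_TIMEOUT = 7200    # seconds, upper bound given in the text
MAX_SCORE = 40


def timeout_score(num_isps, interconnection, link_stability, rtt):
    """Sum of the four guideline values; the last three are rated 1..10."""
    for name, value in (("interconnection", interconnection),
                        ("link_stability", link_stability),
                        ("rtt", rtt)):
        if not 1 <= value <= 10:
            raise ValueError(name + " must be between 1 and 10")
    if num_isps < 1:
        raise ValueError("num_isps must be at least 1")
    return num_isps + interconnection + link_stability + rtt


def rr_timeout(score):
    """Assumed proportional mapping, clamped to the 20..7200 second window."""
    seconds = score * MAX_TIMEOUT // MAX_SCORE   # 180 seconds per point
    return max(MIN_TIMEOUT, min(MAX_TIMEOUT, seconds))


def round_to_minute(seconds):
    """Round to the nearest full minute without dropping below the floor."""
    return max(MIN_TIMEOUT, (seconds + 30) // 60 * 60)


score = timeout_score(num_isps=2, interconnection=5, link_stability=4, rtt=4)
timeout = round_to_minute(rr_timeout(score))
```

   Two ISPs with ratings of 5, 4 and 4 give a score of 15 and thus a
   timeout of 2700 seconds. A site that prefers a different mapping
   only needs to replace rr_timeout().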
   Rounding the resulting value up or down to the next reasonable
   minute value and adjusting it for any site- or application-specific
   situation is believed by the authors to result in an optimal value.
   For example, a score of 15 gives a DNS RR expiration timeout of
   2700 seconds (45 minutes).

Deployment considerations and recommendations

Normal scenario

            ISP A      ..       ISP X
          172.16/19    ..    192.168/20
              |                   |
        172.16.5.0/27  .. 192.168.14.128/27
               \                 /
                \               /
                 \             /
                  \           /
                   \         /
                    \       /
                     \     /
                   +--------+
                   | Router |  Policy routing outgoing packets
                   +--------+
   +--------+           |
   | BGPDNS |-----------|  BGPDNS server
   +--------+           |
                        |
   +--------+           |
   |  DNS   |-----------|  DNS server
   +--------+           |
                        |
                   +--------+
                   | Server |  172.16.5.7, .. , 192.168.14.131
                   +--------+

Large scale scenario

            ISP A      ..       ISP X
          172.16/19    ..    192.168/20
              |                   |
        172.16.5.0/27  .. 192.168.14.128/27
               \                 /
                \               /
                 \             /
                  \           /
                   \         /
                    \       /
                     \     /
                   +--------+
                   | Router |  Policy routing outgoing packets
                   +--------+
   +--------+           |
   | BGPDNS |-----------|  BGPDNS server
   +--------+           |
                        |
   +--------+           |
   |  DNS   |-----------|  DNS server
   +--------+           |
                        |
                     +-----+
                     | SLB |  Server load balancer
                     |     |  172.16.5.7, .. , 192.168.14.131
                     +-----+
                     /     \
                    /       \
                   /         \
            +--------+     +--------+
            | Server | ... | Server |
            +--------+     +--------+

Setup and scaling on the ISP side

   Dedicated route-servers can be used instead of BGP on the edge
   routers. The ISP uses its normal AS. All BGPDNS customers use
   AS65525 by default. All customers can use the same AS because they
   will not announce anything. The ISP MUST filter any announcement
   coming from these customers. EBGP multi-hop would be the most
   common way to feed the BGPDNS servers.

Simple case

         ISPx        ¦        Customer
                     ¦
                     ¦                      +--------+
                     ¦                      | BGPDNS |
                     ¦                      +--------+
                     ¦                       /
   +-----------+           +-----------+    /
   |Edge Router|---EBGP--->|Cust.Router|---/
   +-----------+           +-----------+
                     ¦
                     ¦
                     ¦

Large scale case

                 ISPx          ¦         Customer
                               ¦
                               ¦
   +-----------+                                    +--------+
   |RouteServer|-- --- --- ---EBGP--- --- --- --- ->| BGPDNS |
   +-----------+           multi-hop                +--------+
       \                       ¦                       /
        \    +-----------+           +-----------+    /
         ----|Edge Router|---------->|Cust.Router|---/
             +-----------+           +-----------+
                               ¦
                               ¦
                               ¦

   The ISP should filter the line extensively on its side. In
   particular, it should have an access list that only lets the
   customer's prefixes in. This avoids routing packets with foreign
   prefixes and makes configuration errors on the customer's side
   immediately obvious.

Customer Router

Other uses

   The protocol is generic enough to be useful for applications other
   than BGP and DNS. The BGP part may be replaced with any other
   suitable application capable of returning the desired answer. The
   DNS part may, for example, be replaced with an HTTP server issuing
   redirects to the appropriate server.

The BGPDNS protocol

   The BGPDNS protocol is spoken between the DNS server and the BGP
   listener. The communication is stateless and uses UDP datagrams.
   Because of its close relationship to BGP, port number 179/UDP is
   chosen for the BGPDNS service on the BGP listener.

The BGPDNS protocol packet format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version                        |Options                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Query-type                     |Request-type                   |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Request-ID                                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Multi-view                                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |RFU                                                            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Number of Objects of Query-type                                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |See Query-type detail definition                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |folded MD5 signature                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Version

      The version of BGPDNS this packet conforms to. This field should
      make future extensions of this protocol easier. Before defining
      a new version one has to consider whether it would be more
      appropriate to simply define a new Query-type. The protocol
      described in this document is version 0x1.

   Options

      Contains only *packet* level options. The options bitfield has
      the following format relative to its start position:

         Bit 00   : MD5 signature present
         Bit 01-15: reserved

      The options bitfield has to be set to NULL if no options are
      present. Any bit set to ONE has the meaning of an enabled
      option.
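
   As an illustration of the fixed header shown in the packet format,
   the following sketch packs and unpacks it. Several details are
   assumptions that the text does not state explicitly: Version,
   Options, Query-type and Request-type are taken to be 16 bits wide
   (inferred from the diagram, the 16 option bits and the 0xFFFF value
   ranges), the remaining fields 32 bits wide, all values in network
   byte order, and "Bit 00" to be the most significant bit of the
   Options field. The function names are hypothetical.

```python
import struct

# Version, Options, Query-type, Request-type (16 bit each, assumed), then
# Request-ID, Multi-view, RFU and the object count (32 bit each, assumed).
HEADER = struct.Struct("!HHHHIIII")

VERSION = 0x1
OPT_MD5_PRESENT = 0x8000   # assumption: "Bit 00" is the most significant bit


def pack_header(query_type, request_type, request_id, count,
                options=0, multi_view=0):
    """Build the 24-octet fixed header; RFU is always sent as NULL."""
    return HEADER.pack(VERSION, options, query_type, request_type,
                       request_id, multi_view, 0, count)


def unpack_header(data):
    """Parse the fixed header; the received RFU value is ignored."""
    (version, options, query_type, request_type,
     request_id, multi_view, _rfu, count) = HEADER.unpack_from(data)
    return {
        "version": version,
        "md5_present": bool(options & OPT_MD5_PRESENT),
        "query_type": query_type,
        "request_type": request_type,
        "request_id": request_id,
        "multi_view": multi_view,
        "count": count,
    }


header = pack_header(query_type=0x1, request_type=0x1, request_id=42,
                     count=2, options=OPT_MD5_PRESENT)
fields = unpack_header(header)
```

   Under these assumptions the fixed header occupies 24 octets, which
   agrees with the object counts this document gives for a 1024 octet
   packet.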

   Query-type

      The type of the query:

         0x1: client sorting IPv4
         0x2: server sorting IPv4
         0x3: client sorting IPv6
         0x4: server sorting IPv6
         0x5-0xFFFF: reserved

      for Query-type 1:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |RR answer IP address IPv4                                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |RR ranking                                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ...

      for Query-type 2:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |relative source address, IPv4                                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |RR answer IP address IPv4                                      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |RR ranking                                                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   ...

   Request-type

         0x1: request
         0x2: answer
         0x3-0xFFFF: reserved

      The DNS server sends a request packet with Request-type 0x1.
      The BGPDNS server sends an answer packet with Request-type 0x2.

   Request-ID

      Contains a unique ID for this query, set by the requestor. The
      server MUST NOT alter it in any way and MUST include it in any
      response to the requestor regarding this particular request.
      The requestor MUST generate a new unique Request-ID for every
      request it generates. The Request-ID together with the IP
      address of the requestor uniquely identifies each unanswered
      request for the server. In the requestor the Request-ID uniquely
      identifies each outstanding request to the server. The maximum
      number of outstanding requests is two to the power of the
      Request-ID field size in bits. The Request-ID MUST NOT be reused
      by the requestor within twice the timeframe of the maximum
      timeout. The timeout MUST be at minimum ONE full second and at
      maximum THIRTY seconds. The recommended timeout value is THREE
      seconds.
      The timeout value should be user adjustable within the specified
      boundaries by a configuration option at the start or during the
      runtime of the application.

   Multi-view

      Selects the BGP table view in the BGPDNS server. A BGPDNS server
      can have multiple BGP views if it serves multiple differently
      connected or routed entities. The number in this field is equal
      to the number of the view in the BGPDNS server. The default view
      is NULL.

   RFU

      Reserved for future use. These fields MUST be sent as NULL and
      MUST be ignored if received.

   Number of Objects of Query-type

      Count of the objects of the specified Query-type. A maximum
      packet size of 1024 octets allows for 124 objects of Query-type
      0x1 and 0x2 (IPv4) or 49 objects of Query-type 0x3 and 0x4
      (IPv6).

   MD5 signature

      The MD5 signature is folded from 16 octets to 4 octets. The
      first four octets are XOR'ed with the next four octets. The
      result is again XOR'ed with the next four octets, and so on for
      the remaining octets. The MD5 signature is computed by running
      the MD5 algorithm over the whole packet to be sent. The shared
      secret is used as salt to the MD5 algorithm and, if not
      sufficiently long, repeated as many times as necessary. For the
      computation the MD5 signature field is set to NULL.

General

   The server MUST NOT alter the requestor supplied values (except the
   Request-type, RR ranking and MD5 signature fields) in any way.

   A BGPDNS packet MUST never be more than 1024 octets.
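
   The folding of the MD5 signature and the object counts quoted for a
   1024 octet packet can be checked with a short sketch. It rests on
   assumptions: the fold is taken to run over all four 4-octet groups
   of the digest, the shared secret is simply prepended to the packet
   as salt (the repetition rule for short secrets is not modelled
   because the text does not give the target length), the signature is
   taken to occupy the last four octets of the packet, and the fixed
   header is taken to be 24 octets. The function names are
   hypothetical.

```python
import hashlib

MAX_PACKET = 1024   # octets, maximum BGPDNS packet size
HEADER_LEN = 24     # octets, assumed size of the fixed header
SIG_LEN = 4         # octets, folded MD5 signature


def fold_md5(digest):
    """XOR the four 4-octet groups of a 16-octet digest into 4 octets."""
    if len(digest) != 16:
        raise ValueError("an MD5 digest has 16 octets")
    folded = bytes(SIG_LEN)
    for offset in range(0, 16, SIG_LEN):
        group = digest[offset:offset + SIG_LEN]
        folded = bytes(a ^ b for a, b in zip(folded, group))
    return folded


def sign(packet, secret):
    """Return the packet with its trailing signature field filled in."""
    # The signature field is NULL while the digest is computed.
    zeroed = packet[:-SIG_LEN] + bytes(SIG_LEN)
    # Assumption: the shared secret is prepended as salt.
    digest = hashlib.md5(secret + zeroed).digest()
    return packet[:-SIG_LEN] + fold_md5(digest)


def verify(packet, secret):
    """Recompute the folded signature and compare it with the received one."""
    return sign(packet, secret) == packet


def max_objects(object_len):
    """Number of objects that fit next to the header and the signature."""
    return (MAX_PACKET - HEADER_LEN - SIG_LEN) // object_len


# Header, one 8-octet IPv4 object and an empty signature field.
unsigned = bytes(HEADER_LEN) + bytes(8) + bytes(SIG_LEN)
signed = sign(unsigned, b"shared-secret")
```

   With 8 octets per IPv4 object (address plus ranking) and 20 octets
   per IPv6 object, max_objects() gives 124 and 49, the counts stated
   for the Number of Objects field.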

Implementation considerations and recommendations

   Response time is critical

   Filling in the preference values

Failure cases

   BGPDNS server does not respond

   Loss of all BGP sessions

   Network not in table

Security Considerations

   The authors have identified the following potential security
   threats to BGPDNS:

   Authorization of BGPDNS requestors

   Denial of Service, overload attacks

   Spoofing of requests/answers

   Sniffing of communication

   Information leakage

References

   1  Bradner, S., "Key words for use in RFCs to Indicate Requirement
      Levels", BCP 14, RFC 2119, March 1997

Author's Addresses

   Andre Oppermann
   Internet Pipeline AG
   Hardstrasse 235                 Phone: +41-1-277-75-55
   8005 Zuerich, Switzerland       Email: oppermann@pipeline.ch

   Claudio Jeker
   Internet Business Solutions AG
   Hardstrasse 235                 Phone: +41-1-277-75-75
   8005 Zuerich, Switzerland       Email: jeker@n-r-g.com