Using IP Address Mobility
To Provide High Availability via NT Clusters
Joseph S. Barrera III
NT Clusters provide high availability services to unmodified clients and unmodified servers using standard network protocols and naming, in particular IP (Internet Protocol) and DNS (Domain Name System). The simplest NT Cluster consists of two NT servers, each capable of providing a service, e.g., a remote file share, so that if one server fails, the other can immediately continue providing the service. Thus to the client, a server failure appears to be an instant reboot of the server. When the low levels of the client software contain retry logic, then the failure can be masked completely.
NT Clusters provide failure transparency via IP address mobility. Any service provided by the cluster is accessed via a mobile IP address that moves to a surviving node when a previous node fails. The IP address move can be done within four seconds, and is transparent to unmodified and geographically distributed clients. NT Clusters currently use IP mobility to implement highly available web, file, print, and database services, using unmodified NT server applications, for unmodified DOS, Windows and UNIX clients, from anywhere on the planet.
This paper argues that IP mobility is a requirement for high availability to such clients, because the (more natural) alternative of mapping the server’s name to the fixed IP address of the surviving node does not work. The name-to-IP address binding is cached by clients and by DNS itself, and there is no mechanism to update these caches. Thus even after the server name has been bound to the new IP address, clients will continue to believe that the server is down because they will continue to use the IP address of the failed node.
IP mobility is the ability to move an IP address from one machine to another. For IP mobility to be useful, it must be fast (on the order of seconds, not minutes or hours) and scalable (cost independent of number of clients). Fast and scalable IP address mobility is already supported by IP, specifically by IP routing and by the Address Resolution Protocol (ARP) [RFC826] that underlies IP. If a host is on the same physical network as the mobile IP address, it is immediately informed of an IP address move via an ARP broadcast; this allows IP mobility to be fast. If a host is not on the same physical network, then it does not need to be informed of the move; this allows IP mobility to be scalable.
IP routing is what connects different physical IP networks. When a host sends an IP packet to a host on a different network, the packet is actually sent through a series of gateways, which are machines connected to two or more networks. Only the last gateway, the one on the same network as the mobile IP address, needs to know the new location of the mobile IP address.
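The scalability argument can be illustrated with a minimal sketch of a forwarding decision. The route table, gateway names, and addresses below are hypothetical and are not taken from the NT Cluster software. The sketch only shows that a remote gateway chooses its next hop from the destination's network prefix, so nothing in its state refers to the MAC address that currently owns the mobile IP address.

```python
import ipaddress


def next_hop(routes, destination):
    """Pick the next hop for `destination` by longest-prefix match.

    `routes` is a list of (network, next_hop_name) pairs; both the
    table layout and the hop names are illustrative assumptions.
    """
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), hop)
        for net, hop in routes
        if dest in ipaddress.ip_network(net)
    ]
    # The most specific (longest) matching prefix wins.
    best_net, best_hop = max(matches, key=lambda m: m[0].prefixlen)
    return best_hop


# A remote gateway's table: no entry depends on which node owns 192.0.2.10.
remote_routes = [
    ("0.0.0.0/0", "upstream-gateway"),
    ("192.0.2.0/24", "last-gateway"),
]
print(next_hop(remote_routes, "192.0.2.10"))
```

The decision is the same before and after a failover; only the last gateway's ARP cache, which is not modeled here, has to change.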
ARP (Address Resolution Protocol)
ARP provides the mapping from an IP address to a MAC (Media Access Control) address (e.g., Ethernet or FDDI address). A host broadcasts an ARP request when it wishes to discover the MAC address for an IP address on the same network. The owner of the IP address responds with an ARP reply containing the corresponding MAC and IP address pair. The original host is then allowed to cache this pair to avoid having to repeat the same ARP request. However, every host is also required to update its ARP cache from the sender MAC and IP address carried in each ARP request it receives. This requirement ensures that if a host changes its MAC address, its subsequent ARP requests will update the ARP caches of the other hosts on the same network.
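The cache update rule can be sketched as a toy model. The class and method names are hypothetical, and the model is deliberately simplified: it refreshes only entries that already exist, and it omits the RFC 826 rule that the target of a request also adds a new entry for the sender.

```python
class ArpCache:
    """Toy ARP cache for one host: maps IP address -> MAC address."""

    def __init__(self):
        self.entries = {}

    def on_reply(self, sender_ip, sender_mac):
        # A reply to our own request: remember the pair.
        self.entries[sender_ip] = sender_mac

    def on_request(self, sender_ip, sender_mac):
        # Any broadcast request we overhear: if we already hold an entry
        # for the sender's IP address, overwrite the MAC address in it.
        if sender_ip in self.entries:
            self.entries[sender_ip] = sender_mac


cache = ArpCache()
cache.on_reply("192.0.2.10", "02:00:00:00:00:01")    # learned from a reply
cache.on_request("192.0.2.10", "02:00:00:00:00:02")  # same IP, new MAC
print(cache.entries["192.0.2.10"])
```

In this model a single overheard request is enough to redirect later traffic for the address to the new MAC address.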
When a host reboots, it broadcasts an ARP request for its (normal, non-mobile) IP address. This broadcast serves two purposes. First, it will detect whether the IP address is already in use by some other host (as it should not be), as the other host will respond with an ARP reply. Second, if the host did change its MAC address, the ARP request will update any ARP caches that have an entry for the IP address, replacing the old MAC address with the new. In particular, it will update the ARP cache for any gateway on the network, ensuring that subsequent packets to the IP address sent from outside the network, as well as inside, are ultimately sent to the correct MAC address.
Similarly, when a mobile IP address is moved from one host to another, the new host broadcasts an ARP request containing the IP address and the new host’s MAC address, ensuring that all affected ARP caches are updated with the new MAC address.
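For concreteness, the following sketch builds the bytes of such a broadcast for Ethernet and IPv4, using the packet layout from [RFC826]. The function name and the example addresses are hypothetical. The code only constructs the frame; actually transmitting it requires a raw socket and operating system privileges, which are outside the scope of this sketch.

```python
import socket
import struct


def build_arp_announcement(mac: bytes, ip: str) -> bytes:
    """Return an Ethernet frame holding a broadcast ARP request in which
    the sender and target IP address are both `ip` and the sender MAC
    address is `mac`."""
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP).
    eth_header = b"\xff" * 6 + mac + struct.pack("!H", 0x0806)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        mac,          # sender hardware address (the new owner)
        ip_bytes,     # sender protocol address (the mobile IP address)
        b"\x00" * 6,  # target hardware address: unknown
        ip_bytes,     # target protocol address: the same IP address
    )
    return eth_header + arp_body


frame = build_arp_announcement(bytes.fromhex("020000000002"), "192.0.2.10")
print(len(frame), frame.hex())
```

Every host that receives the frame sees the sender pair (mobile IP address, new MAC address), which is the pair its ARP cache is required to adopt.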
NT Clusters use mobile IP addresses in addition to the fixed IP addresses assigned to each node (NT Server) in the cluster. These mobile IP addresses belong to resource groups, groups of dependent resources that the cluster attempts to keep running on one node or another. For example, a resource group might contain a web server, a database server (to handle queries made via the web server), a disk on a shared SCSI string between two nodes of the cluster, a mobile IP address, and a domain name (distinct from that used by any node) bound to the mobile IP address.
A two-node cluster could have two such resource groups, each designated to run on a different node by default. If one of the nodes failed, then the NT Cluster software would detect the failure and move the affected resource group to the remaining node. Once the other resources in the group had been started, the Cluster software would "start" the mobile IP address, triggering the ARP reply broadcasts that would clear any ARP cache entries associating the mobile IP address with the failed node. Any client previously connected to the failed node would now be able to reconnect to the new node.
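The failover sequence just described can be sketched as a small simulation. All class, field, and node names are hypothetical, and the model is not the NT Cluster implementation: failure detection is assumed to have already happened, and the ARP broadcasts are reduced to a single dictionary update.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    name: str
    mobile_ip: str
    resources: list  # started in order, before the mobile IP address
    owner: str       # node currently hosting the group


@dataclass
class Cluster:
    node_macs: dict  # node name -> MAC address of its network adapter
    groups: list
    lan_arp: dict = field(default_factory=dict)  # mobile IP -> MAC, as cached on the LAN

    def start_ip(self, group):
        # Stands in for the ARP broadcasts: the LAN now maps the
        # mobile IP address to the current owner's MAC address.
        self.lan_arp[group.mobile_ip] = self.node_macs[group.owner]

    def fail_node(self, failed):
        survivors = [n for n in self.node_macs if n != failed]
        if not survivors:
            raise RuntimeError("no surviving node")
        steps = []
        for group in self.groups:
            if group.owner != failed:
                continue
            group.owner = survivors[0]
            for resource in group.resources:
                steps.append(f"start {resource} on {group.owner}")
            # The mobile IP address is started last, once the service is ready.
            self.start_ip(group)
            steps.append(f"start {group.mobile_ip} on {group.owner}")
        return steps
```

Starting the mobile IP address last means that clients are redirected to the surviving node only after the services behind the address are able to answer.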
The NT Cluster software currently takes about four seconds to move an IP address to a new node, corresponding to the four ARP reply broadcasts it does, one a second, before deciding that it is safe to use the address. Future versions of the software could cut this time substantially by recognizing when the IP address was previously in use by a failed node, in which case the new node could start using the IP address immediately.
Note that for IP mobility to work, clients must always refer to the mobile IP addresses and associated names, and never to the actual (fixed) node IP addresses or names. One soon sees that NT Cluster resource groups form "Virtual NT Servers" that take on a life of their own independent of the physical nodes they reside on. For example, one can replace all the physical NT servers, one by one, in an NT cluster without ever stopping the virtual NT servers.
Although IP address mobility does allow NT Clustering to provide high availability, it might seem more natural to change the mapping from domain name to IP address (mobile domain names) instead of IP address to MAC address (mobile IP addresses). However, such an approach does not work, because name-to-IP address mappings are cached in too many places, without any way of updating these caches. This caching reflects the assumption that domain name-to-IP address mappings do not change.
Resolvers Cache Name-to-IP Address Mappings
DNS stores domain name information "in a distributed fashion with local caching to improve performance…. The update process allows updates [of domain name information] to percolate out through the users of the domain system rather than guaranteeing that all copies are simultaneously updated." [RFC1034, pp2-4].
This local caching is done by resolvers, agents that query domain name servers on behalf of users. Such caching is crucial for reducing the cost of name queries for clients, as well as for reducing load on name servers. However, there is no mechanism for name servers to invalidate the name-to-IP-address mappings in these caches. Such mappings are only invalidated when the TTL (time-to-live) associated with them has expired, and these TTLs are usually on the order of days [RFC1034, pp37f]. A TTL short enough to provide timely IP mobility would generate unacceptable load on the corresponding name servers. This is especially true for the most well-known servers (e.g., corporate web sites), for which high availability is particularly important and which are thus the best candidates for NT Clustering.
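The effect of TTL-based caching can be sketched with a toy resolver. The class name, the dictionary standing in for the authoritative name server, and the one-day TTL are all assumptions made for illustration. Time is passed in explicitly so that the example does not depend on a real clock.

```python
class CachingResolver:
    """Toy resolver that caches answers until their TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict: name -> IP address
        self.ttl = ttl_seconds
        self.cache = {}                     # name -> (ip, expires_at)
        self.upstream_queries = 0           # load placed on the name server

    def resolve(self, name, now):
        hit = self.cache.get(name)
        if hit is not None and now < hit[1]:
            return hit[0]                   # possibly stale; nothing invalidates it
        self.upstream_queries += 1
        ip = self.authoritative[name]
        self.cache[name] = (ip, now + self.ttl)
        return ip


zone = {"www.example.com": "192.0.2.1"}
resolver = CachingResolver(zone, ttl_seconds=86400)
resolver.resolve("www.example.com", now=0)
zone["www.example.com"] = "192.0.2.2"               # name rebound after a failure
print(resolver.resolve("www.example.com", now=3600))   # still the old address
print(resolver.resolve("www.example.com", now=86400))  # new address after expiry
```

Shortening the TTL narrows the stale window, but in this model every expiry costs one more upstream query per resolver, which is the load trade-off described above.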
Programs Cache Name-to-IP Address Mappings
Client programs also cache name-to-IP address bindings in a variety of ways. Some programs, when supplied with a domain name, perform the name lookup immediately, and only remember the IP address. Even when such programs contain retry logic, they will be unable to find a host that has resurrected itself with a new IP address. Ping is a trivial example of such a program; a more serious example is the NFS filesystem in, for example, Linux [LINUX].
Other programs will intentionally maintain a private name-to-IP-address cache to avoid the cost of the name lookup. Both the Internet Explorer 3.01 and the Netscape Navigator 3.01 browsers appear to maintain such a cache. Once either has viewed a page on a machine, it will continue to point to the machine’s old IP address even after the machine has reappeared with a new address. This is true even after reloading the current page and even though the machine’s name is clearly part of the URL.
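The difference between a program that remembers only the IP address and one that repeats the name lookup can be sketched as follows. Both classes are hypothetical; the name lookup is an injected function and reachability is modeled as a set of live addresses, so the sketch uses no real network calls.

```python
class PinnedClient:
    """Looks the name up once and remembers only the IP address."""

    def __init__(self, name, resolve):
        self.ip = resolve(name)

    def request(self, reachable):
        return self.ip in reachable


class ReResolvingClient:
    """Repeats the name lookup before every request."""

    def __init__(self, name, resolve):
        self.name = name
        self.resolve = resolve

    def request(self, reachable):
        return self.resolve(self.name) in reachable


dns = {"server": "192.0.2.1"}
pinned = PinnedClient("server", dns.get)
fresh = ReResolvingClient("server", dns.get)

# The server reappears with a new fixed address and the name is rebound.
dns["server"] = "192.0.2.2"
reachable = {"192.0.2.2"}
print(pinned.request(reachable), fresh.request(reachable))
```

Even the second client succeeds only because the lookup function here returns fresh data; behind a caching resolver it would see the old address until the TTL expired.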
NT Clustering uses IP mobility to provide high availability to unmodified and geographically distributed clients. Fast and scalable IP mobility is provided by IP as-is. The alternative of mapping domain names to different fixed IP addresses does not work due to the caching of such mappings in clients and DNS itself.
[RFC826] Plummer, David C. An Ethernet Address Resolution Protocol. Internet Request for Comments. Massachusetts Institute of Technology, November 1982.
[RFC1034] Mockapetris, P. Domain Names – Concepts and Facilities. Internet Request for Comments. USC/Information Sciences Institute, November 1987.
[LINUX] Slackware Linux, Version 2.0.27, December 1996.