12.1. Network Load-Balancing Clusters

NLB in Windows Server 2003 is accomplished by a special network driver that sits between the driver for the physical network adapter and the TCP/IP stack. This driver communicates with the NLB program (called wlbs.exe, for the Windows Load Balancing Service) running at the application layer, the same layer in the OSI model as the application you are clustering. NLB can work over FDDI- or Ethernet-based networks, even wireless networks, at up to gigabit speeds. Why would you choose NLB? For a few reasons:
NLB works in a seemingly simple way: all computers in an NLB cluster have their own IP address, just as all networked machines do these days, but they also share a single, cluster-aware IP address that allows each member to answer requests on that address. NLB takes care of the IP address conflict problem and allows clients that connect to the shared IP address to be directed automatically to one of the cluster members. NLB clusters support a maximum of 32 cluster members, meaning that no more than 32 machines can participate in the load-balancing and sharing features. Applications whose load exceeds what a single 32-member cluster can handle typically use multiple clusters, with some sort of DNS load-balancing technique or device distributing requests among them.

When considering an NLB cluster for your application, ask yourself the following question: how will a failure affect the application and the other cluster members? If you are running a high-volume e-commerce site and one member of your cluster fails, are the other servers in the cluster adequately equipped to handle the extra traffic from the failed server? A lot of cluster implementations miss this important point and later see the consequence: a cascading failure, in which the load from each failed server is shifted onto the remaining members, which then fail from overload themselves. For example, if four members each run at 70 percent of capacity, losing one pushes the surviving three to more than 90 percent. Such a scenario is entirely likely, and it entirely defeats the purpose of a cluster. Avoid it by ensuring that all cluster members have sufficient hardware to handle additional traffic when necessary.

Also examine the kind of application you are planning to cluster. What types of resources does it use extensively? Different types of applications stress different components of the systems participating in a cluster. Most enterprise applications have some sort of performance-testing utility; take advantage of any that your application offers in a testing lab and determine where potential bottlenecks might lie. Web applications, Terminal Services, and Microsoft's new ISA Server 2004 product can all take advantage of NLB clustering.
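If you want to see this arrangement from the command line, the wlbs.exe tool that ships with NLB can report on the cluster from any member. The following is a minimal sketch, assuming NLB is already installed and running on the host where you type the commands; the 192.168.0.14 cluster address is only an example:

    rem List the hosts currently converged as members of this cluster.
    wlbs query

    rem Show the cluster MAC address that NLB derives from the shared cluster IP.
    wlbs ip2mac 192.168.0.14

The query output is also a quick way to confirm which members are still converged after a host fails.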
12.1.1. NLB Terminology

Before we dig deeper into our coverage of NLB, let's go over a few terms that you will see. Some of the most common NLB technical terms are:
12.1.2. NLB Operation Styles and Modes

An NLB cluster can operate in four different ways:
You cannot mix unicast and multicast modes among the members of your cluster. All members must run either unicast or multicast mode, although the number of network cards in each member can differ. The following sections detail each mode of operation.

12.1.2.1. Single card in each server in unicast mode

A single network card in each server operating in unicast mode requires less hardware, so obviously it's less expensive than maintaining multiple NICs in each cluster member. However, network performance is reduced because of the overhead of running the NLB driver over only one network card: cluster traffic still has to pass through one adapter, which can be easily saturated, and it is additionally run through the NLB driver for load balancing. This can create real hang-ups in network performance. An additional drawback is that cluster hosts can't communicate with each other through the usual methods, such as ping; this isn't supported using just a single adapter in unicast mode, because of MAC address issues with the Address Resolution Protocol (ARP). Similarly, NetBIOS isn't supported in this mode either. This configuration is shown in Figure 12-1.

Figure 12-1. Single card in each server in unicast mode

12.1.2.2. Multiple cards in each server in unicast mode

This is usually the preferred configuration for NLB clusters because it enables the most functionality for the price in equipment. It is inherently more expensive because of the second network adapter in each cluster member, but that second adapter means there are no limitations on regular communications between members of the NLB cluster. Additionally, NetBIOS is supported through the first configured network adapter for simpler name resolution. Routers of all kinds support this method, and having more than one adapter in a machine removes the bottlenecks found with only one adapter. This configuration is shown in Figure 12-2.

Figure 12-2. Multiple cards in each server in unicast mode

12.1.2.3. Single card in each server in multicast mode

Using a single card in multicast mode allows members of the cluster to communicate with each other normally, but network performance is still reduced because you are still using only a single network card. Router support might be spotty because of the need to support multicast MAC addresses, and NetBIOS isn't supported within the cluster. This configuration is shown in Figure 12-3.

Figure 12-3. Single card in each server in multicast mode

12.1.2.4. Multiple cards in each server in multicast mode

This mode is used when some hosts have one network card and others have more than one, and all require regular communications among themselves. In this case, all hosts need to run in multicast mode, because all hosts in an NLB cluster must run the same mode. You might run into problems with router support using this model, but with careful planning you can make it work. This configuration is shown in Figure 12-4.

Figure 12-4. Multiple cards in each server in multicast mode
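Because all members must agree on the mode, it's worth spot-checking each host's configuration before chasing any other cluster problem. Here is a minimal sketch using the wlbs.exe command-line tool; run it locally on each member and compare the reported parameters (including the unicast/multicast setting) across hosts:

    rem Dump this host's current NLB configuration parameters,
    rem along with recent NLB-related event log messages.
    wlbs display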
12.1.3. Port Rules

NLB clusters also feature the ability to set port rules, which simply are ways of instructing Windows Server 2003 how to handle cluster network traffic on each TCP/IP port. Port rules filter traffic in one of three modes: disabled, in which all network traffic for the associated port or ports is blocked; single host, in which traffic for the associated port or ports is handled by one specific machine in the cluster (still with fault-tolerance features enabled); and multiple hosts (the default), in which any host in the cluster can handle traffic for a specific port or range of ports. The rules contain the following parameters:
In addition, you can select one of three options for client affinity, which, simply put, controls how traffic from a particular client is distributed among the cluster hosts: None, Single, and Class C. Single and Class C are used to ensure that all network traffic from a particular client is directed to the same cluster host; Single keys on the client's IP address, while Class C keys on the Class C portion of that address. None indicates that there is no client affinity, and traffic can go to any cluster host.

When using port rules in an NLB cluster, it's important to remember that the number and content of the port rules must match exactly on all members of the cluster. When a node attempts to join an NLB cluster, if the number or content of its port rules doesn't match the number or content of the rules on the existing member nodes, the joining member is denied membership in the cluster. You need to synchronize these port rules manually across all members of the NLB cluster.
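Once port rules are in place, wlbs.exe can also control how an individual host handles the traffic that matches a given rule. The following is a hedged sketch; port 80 simply stands in for a port that one of your rules actually covers, and each command affects only the host on which you run it:

    rem Stop accepting new connections for the rule covering port 80,
    rem but let existing connections finish (a per-rule drain).
    wlbs drain 80

    rem Block all traffic for the rule covering port 80 on this host.
    wlbs disable 80

    rem Resume normal handling of traffic for the rule covering port 80.
    wlbs enable 80

These commands change only how one member handles the rule; the rule definitions themselves still have to match exactly across the cluster, as noted earlier.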
12.1.4. Creating an NLB Cluster

To create a new NLB cluster, use the Network Load Balancing Manager and follow the instructions shown next.

The NLB cluster is created, and the first node is configured and added to the cluster.

Figure 12-7. The Cluster IP Addresses screen

Figure 12-8. The Port Rules screen

Figure 12-9. The Connect screen

Figure 12-10. The Host Parameters screen

12.1.5. Adding Other Nodes to the Cluster

Chances are good that you will want to add another machine to the cluster to take advantage of load balancing. To add a new node to an existing cluster, use the following procedure:
The node is then added to the selected NLB cluster. You can tell the process is finished when the node's status, as indicated within the Network Load Balancing Manager console, says "Converged."

12.1.6. Removing Nodes from the Cluster

For various reasons, you might need to remove a joined node from the cluster: to perform system maintenance, for example, or to replace the node with a newer, fresher, more powerful machine. You must remove an NLB cluster member gracefully. To do so, follow these steps:
This removes the node. If you are only upgrading a node of the cluster and don't want to remove it from the cluster permanently, you can use a couple of techniques to gradually reduce traffic to the host and then make it available for upgrading. The first is to perform a drainstop on the cluster host to be upgraded. Drainstopping prevents new clients from being directed to that host while allowing clients with existing connections to continue until they have completed their current operations. After all current clients have finished their operations, cluster operations on that node cease. To perform a drainstop, follow these steps:
For example, if my cluster was located at 192.168.0.14 and I wanted to upgrade node 2, I would enter the following command:

wlbs drainstop 192.168.0.14:2

In addition, you can set the Default state of the node's Initial host state to Stopped, as you learned in the previous section. This way, that particular node cannot rejoin the cluster during the upgrade process, and you can verify that your upgrade completed smoothly before the node rejoins the cluster and clients begin accessing it.
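Putting those pieces together, the whole upgrade cycle for a single node can be driven with wlbs.exe. This is only a sketch: the 192.168.0.14 cluster address and host ID 2 are carried over from the example above, and the cluster_ip:host form shown here requires that remote control be enabled for the cluster (otherwise, run each command locally on the node in question):

    rem Drain node 2: existing clients finish their work, no new clients arrive.
    wlbs drainstop 192.168.0.14:2

    rem Confirm that the remaining hosts have converged without node 2.
    wlbs query 192.168.0.14

    rem ...perform the upgrade on node 2, then bring it back into rotation...
    wlbs start 192.168.0.14:2

If you set the node's default Initial host state to Stopped, as just described, the final wlbs start is what returns the node to the cluster once you're satisfied with the upgrade.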
12.1.7. Performance Optimization

NLB clusters often have problems with switches. Switches differ from hubs in that data transmission among the computers connected to a switch is point-to-point: the switch keeps a cache of the MAC address of every attached machine and sends each frame directly to its destination port, whereas a hub simply broadcasts all data to all connected machines, which must then pick out the traffic addressed to them. Switches work against NLB clusters, however, because all cluster members share the same cluster IP address (and the MAC address associated with it), as you've already learned; the switch can't tie that address to a single port, so every packet sent to the cluster passes through all the switch ports to which cluster members are attached. Obviously, this can be a problem. To avert it, you can choose from a few workarounds: