|[ Team LiB ]|
Peripheral Component Interconnect
No one could have predicted that Peripheral Component Interconnect would become the most popular expansion bus in the world when Intel Corporation introduced it in July 1992. PCI was long awaited as a high-speed expansion bus specification, but the initial announcement proved to be both more and less than the industry hoped for. The first PCI standard defined mandatory design rules, including hardware guidelines to help ensure proper circuit operation of motherboards at high speeds with a minimum of design complications. It showed how to link together computer circuits (including the expansion bus) for high-speed operation but failed to provide a standard for the actual signals and connections that would make it a real expansion bus.
The design first took the form of a true expansion bus with Intel's development of PCI Release 2.0 in May 1993, which added the missing pieces to create a full 64-bit expansion bus. Although it got off to a rocky start when the initial chipsets implementing it proved flawed, computer-makers have almost universally announced support. The standard has been revised many times since then. By the time this book appears, Version 3.0 should be the reigning standard.
The original explicit purpose of the PCI design was to make the lives of those who engineer chipsets and motherboards easier. It wasn't so much an expansion bus as an interconnection system, hence its pompous name (Peripheral Component is just a haughty way of saying chip, and Interconnect means simply link). And that is what PCI is meant to be—a fast and easy chip link.
Even when PCI was without pretensions of being a bus standard, its streamlined linking capabilities held promise for revolutionizing computer designs. Whereas each new Intel microprocessor family required the makers of chipsets and motherboards to completely redesign their products with every new generation of microprocessor, PCI promised a common standard, one independent of the microprocessor generation or family. As originally envisioned, PCI would allow designers to link together entire universes of processors, coprocessors, and support chips without glue logic—the pesky profusion of chips needed to match the signals between different integrated circuits—using a connection whose speed was unfettered by frequency (and clock) limits. All computer chips that follow the PCI standard can be connected together on a circuit board without the need for glue logic. In itself, this could lower computer prices by making designs more economical while increasing reliability by minimizing the number of circuit components.
A key tenet of the PCI design is processor independence; that is, its circuits and signals are not tied to the requirements of a specific microprocessor or family. Even though the standard was developed by Intel, the PCI design is not limited to Intel microprocessors. In fact, Apple's PowerMac computers use PCI.
PCI can operate synchronously or asynchronously. In the former case, the speed of operation of the PCI bus is dependent on the host microprocessor's clock and PCI components are synchronized with the host microprocessor. Typically the PCI bus will operate at a fraction of the speed of the external interface of the host microprocessor. With today's high microprocessor speeds, however, the bus speed often is synchronized to the system bus or front-side bus, which may operate at 66, 100, or 133MHz. (The 400 and 533MHz buses used by the latest Pentium 4 chips actually run at 100 and 133MHz, respectively, and make multiple transfers per clock cycle to achieve their high data rates.) The PCI bus can operate at speeds up to 66MHz under the revised PCI 2.2 (and later) standards. PCI-bus derivations, such as PCI-X, use this higher speed.
PCI is designed to maintain data integrity at operating speeds down to 0 Hz, a dead stop. Although it won't pass data at 0 Hz, the design allows notebook computers to freely shift to standby mode or suspend mode.
Although all PCI peripherals should be able to operate at 33MHz, the PCI design allows you to connect slower peripherals. To accommodate PCI devices that cannot operate at the full speed of the PCI bus, the design incorporates three flow-control signals that indicate when a given peripheral or board is ready to send or receive data. One of these signals halts the current transaction. Consequently, PCI transactions can take place at a rate far lower than the maximum 33MHz bus speed implies.
The PCI design provides for expansion connectors extending the bus off the motherboard, but it limits such expansion to a maximum of three connectors (none are required by the standard). As with VL bus, this limit is imposed by the high operating frequency of the PCI bus. More connectors would increase bus capacitance and make full-speed operation less reliable.
To attain reliable operation at high speeds without the need for terminations (as required by the SCSI bus), Intel chose a reflected rather than direct signaling system for PCI. To activate a bus signal, a device raises (or lowers) the signal on the bus only to half its required activation level. As with any bus, the high-frequency signals meant for the slots propagate down the bus lines and are reflected back by the unterminated ends of the conductors. The reflected signal combines with the original signal, doubling its value up to the required activation voltage.
The basic PCI interface requires only 47 discrete connections for slave boards (or devices), with two more on bus-mastering boards. To accommodate multiple power supply and ground signals and blanked off spaces to key the connectors for proper insertion, the physical 32-bit PCI bus connector actually includes 124 pins. Every active signal on the PCI bus is adjacent to (either next to or on the opposite side of the board from) a power supply or ground signal to minimize extraneous radiation.
Although the number of connections used by the PCI system sounds high, Intel actually had to resort to a powerful trick to keep the number of bus pins manageable. The address and data signals on the PCI bus are time-multiplexed on the same 32 pins. That is, the address and data signals share the same bus connections (AD00 through AD31). On one clock cycle, the combined address/data lines carry the address values and set up the location to move information to or from. On the next cycle, the same lines switch to carrying the actual data.
This address/data cycling of the bus does not slow the bus. Even in nonmultiplexed designs, the address lines are used on one bus cycle and then the data lines are used on the next. Moreover, PCI has its own burst mode that eliminates the need for alternation between address and data cycles. During burst mode transfers, a single address cycle can be followed by multiple data cycles that access sequential memory locations.
PCI achieves its multiplexing using a special bus signal called Cycle Frame (FRAME#). The appearance of the Cycle Frame signal identifies the beginning of a transfer cycle and indicates the address/data bus holds a valid address. The Cycle Frame signal is then held active for the duration of the data transfer.
During burst mode transfers, a single address cycle can be followed by multiple data cycles that access sequential memory locations, limited only by the needs of other devices to use the bus and other system functions (such as memory refresh). The burst can continue as long as the Cycle Frame signal remains active. With each clock cycle that Cycle Frame remains active, new data is placed on the bus. If Cycle Frame is active only for one data cycle, an ordinary transfer takes place. When it stays active across multiple data cycles, a burst occurs.
This burst mode underlies the 132MBps throughput claimed for the 32-bit PCI design. (With the 64-bit extension, PCI claims a peak transfer rate of 264MBps.) Of course, PCI attains that rate only during the burst. The initial address cycle steals away a bit of time and lowers the data rate (the penalty for which declines with the increasing length of the burst). System overhead, however, holds down the ultimate throughput.
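The way the address cycle penalty shrinks as bursts lengthen can be worked out with a few lines of arithmetic. The following is a minimal sketch, assuming an idealized burst with one address cycle, no wait states, and no other system overhead; the function name and its parameters are illustrative and are not part of the PCI specification.

```python
def pci_burst_throughput_mbps(data_cycles, clock_mhz=33, bus_bytes=4, address_cycles=1):
    """Effective MB/s for one burst: bytes moved divided by total cycles spent.

    Assumes an idealized burst with no wait states and no turnaround cycles.
    """
    if data_cycles < 1:
        raise ValueError("a transfer needs at least one data cycle")
    total_cycles = address_cycles + data_cycles
    bytes_moved = data_cycles * bus_bytes
    # Bytes per cycle times millions of cycles per second gives MB/s.
    return bytes_moved / total_cycles * clock_mhz


# Longer bursts approach, but never reach, the 132MBps peak.
for n in (1, 4, 16, 64):
    print(n, round(pci_burst_throughput_mbps(n), 1))
```

Under these assumptions a transfer with a single data cycle manages only half the peak rate, because the address cycle takes as long as the data it introduces. Passing bus_bytes=8 models the 64-bit extension in the same way.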
PCI need not use all 32 (or 64) bits of the bus's data lines. Four Byte Enable signals (C/BE0# through C/BE3#) are used to indicate which of the four byte-wide lanes of PCI's 32-bit signals contain valid data. In 64-bit systems, another four signals (C/BE4# through C/BE7#) indicate the additional active byte lanes.
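A short sketch shows how the Byte Enable lines might be decoded into a list of valid byte lanes. It assumes the lines are sampled into an integer with bit 0 holding C/BE0#, and it relies on the # suffix marking the signals as active low, so a 0 bit flags a valid lane. The function name is hypothetical.

```python
def active_byte_lanes(cbe_n, width_bits=32):
    """Return the byte lanes flagged valid by the C/BE# lines.

    cbe_n is the raw value sampled from the lines (bit 0 = C/BE0#).
    The signals are active low, so a 0 bit means the corresponding
    byte lane carries valid data.
    """
    lanes = width_bits // 8
    return [lane for lane in range(lanes) if not (cbe_n >> lane) & 1]


# Only the low two bytes of a 32-bit transfer are valid here.
print(active_byte_lanes(0b1100))
```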
To accommodate devices that cannot operate at the full speed of the PCI bus, the design incorporates three flow-control signals: Initiator Ready (IRDY#, at pin B35), Target Ready (TRDY#, at pin A36), and Stop (STOP#, at pin A38). Target Ready is activated to indicate that a bus device is ready to supply data during a read cycle or accept it during a write cycle. When Initiator Ready is activated, it signals that a bus master is ready to complete an ongoing transaction. A Stop signal is sent from a target device to a master to stop the current transaction.
Data Integrity Signals
To ensure the integrity of information traversing the bus, the PCI specification makes mandatory the parity-checking of both the address and data cycles. One bit (signal PAR) is used to confirm parity across 32 address/data lines and the four associated Byte Enable signals. A second parity signal is used in 64-bit implementations. The parity signal lags the data it verifies by one cycle, and its state is set so that the total number of logical highs (1s) across it, the address/data lines, and the Byte Enable lines is even.
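The parity calculation can be sketched in a few lines. This example assumes even parity, as the PCI specification defines it, computed over the 32 address/data lines and the four Byte Enable lines; the function names are illustrative, and the one-cycle lag of PAR is not modeled.

```python
def pci_par_bit(ad, cbe_n):
    """Compute PAR for one 32-bit phase, assuming even parity.

    ad    -- value on AD00..AD31
    cbe_n -- raw value on C/BE0#..C/BE3#
    The returned bit makes the count of 1s across all 37 lines even.
    """
    covered = (ad & 0xFFFFFFFF) | ((cbe_n & 0xF) << 32)
    return bin(covered).count("1") & 1


def parity_ok(ad, cbe_n, par):
    """True when AD, C/BE#, and PAR together hold an even number of 1s."""
    return pci_par_bit(ad, cbe_n) == (par & 1)


# A single flipped data bit changes the expected PAR and is caught.
print(parity_ok(0xFFFFFFFF, 0xF, 0), parity_ok(0xFFFFFFFE, 0xF, 0))
```

As with any single parity bit, the check catches an odd number of flipped bits but misses an even number.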
If a parity error is detected during a data transfer, the bus controller asserts the Parity Error signal (PERR#). The action taken on error detection (for example, resending data) depends on how the system is configured. Another signal, System Error (SERR#), handles address parity and other errors.
Parity-checking of the data bus becomes particularly important as the bus width and speed grow. Every increase in bus complexity also raises the chance of errors creeping in. Parity-checking detects such errors so that they do not silently corrupt the information transferred across the bus.
Because of the design of the PCI system, the lack of the old IRQ signals poses a problem. Under standard PCI architecture, the compatibility expansion bus (ISA) links to the host microprocessor through the PCI bus and its host bridge. The IRQ signals cannot be passed directly through this channel because the PCI specification does not define them. To accommodate the old IRQ system under PCI architecture, several chipset and computer makers, including Compaq, Cirrus Logic, National Semiconductor, OPTi, Standard Microsystems, Texas Instruments, and VLSI Technology, developed a standard they called Serialized IRQ Support for PCI Systems.
The serialized IRQ system relies on a special signal called IRQSER that encodes all available interrupts as pulses in a series. One long series of pulses, called an IRQSER cycle, sends data about the state of all interrupts in the system across the PCI channel.
The IRQSER cycle begins with an extended pulse of the IRQSER signal, lasting from four to eight cycles of the PCI clock (which nominally runs at 33MHz but may be slower in systems with slower bus clocks). After a delay of two PCI clock cycles, the IRQSER cycle is divided into frames, each of which is three PCI clock cycles long. Each frame encodes the state of one interrupt: if the IRQSER signal pulses during the first third of the frame, the interrupt assigned to that frame is active. Table 9.2 lists which interrupts are assigned to each frame position.
In addition to the 16 IRQ signals used by the old interrupt system, the PCI serialized interrupt scheme also carries data about the state of the system management interrupt (SMI#) and the I/O check (IOCHCK#) signals as well as the four native PCI interrupts and 10 unassigned values that may be used by system designers. According to the serialized interrupt scheme, support for the last 14 frames is optional.
The IRQSER cycle ends with a Stop signal, a pulse of the IRQSER signal that lasts two or three PCI clocks, depending on the operating mode of the serialized interrupt system.
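The structure of an IRQSER cycle (start pulse, delay, three-clock frames, stop pulse) can be illustrated with a toy decoder. This is a sketch built only from the description above: it assumes the signal has been sampled once per PCI clock into a list in which a true value means a pulse is present, it leaves the electrical polarity abstract, and it reports frame positions rather than interrupt names. The function name is hypothetical.

```python
def decode_irqser(samples):
    """Decode one IRQSER cycle from per-clock samples (true = pulse present).

    Returns the list of frame positions whose interrupt was signaled active.
    Signal polarity and the frame-to-interrupt mapping are left abstract.
    """
    i = 0
    # Start: an extended pulse of four to eight clocks.
    while i < len(samples) and samples[i]:
        i += 1
    if not 4 <= i <= 8:
        raise ValueError("start pulse must last 4 to 8 clocks")
    i += 2  # two-clock delay before the first frame
    active = []
    frame = 0
    while i + 1 < len(samples):
        # A pulse lasting two or more clocks is treated as the Stop signal.
        if samples[i] and samples[i + 1]:
            return active
        if samples[i]:
            active.append(frame)  # pulse in the first third of the frame
        i += 3
        frame += 1
    raise ValueError("no stop pulse found")


# Start pulse, delay, three frames (positions 0 and 2 active), then Stop.
cycle = [1] * 4 + [0] * 2 + [1, 0, 0] + [0, 0, 0] + [1, 0, 0] + [1, 1]
print(decode_irqser(cycle))
```

Telling the Stop signal apart from an ordinary frame by its length is a simplification made for this sketch; a real controller tracks the frame count and operating mode as well.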
The PCI serialized interrupt system is only a means of data transportation. It carries the information across the PCI bus and delivers it to the microprocessor and its support circuitry. The information about the old IRQ signals gets delivered to a conventional 8259A interrupt controller or its equivalent in the microprocessor support chipset. Once at the controller, the interrupts are handled conventionally.
Although the PCI interrupt-sharing scheme helps eliminate setup problems, some systems demonstrate their own difficulties. For example, some computers force the video and audio systems to share interrupts. Any video routine that generates an interrupt, such as scrolling a window, will briefly halt the playing of audio. The audio effects can be unlistenable. The cure is to reassign one of the interrupts, if your system allows it.
Bus-Mastering and Arbitration
In operation, a bus master board sends a signal to its host to request control of the bus and starts to transfer when it receives a confirmation. Each PCI board gets its own slot-specific signals to request bus control and receive confirmation that control has been granted. This approach allows great flexibility in assigning the priorities, even the arbitration protocol, of the complete computer system. The designer of a PCI-based computer can adapt the arbitration procedure to suit his needs rather than having to adapt to the ideas of the obscure engineers who conceived the original bus specification.
Bus mastering across the PCI bus is achieved with two special signals: Request (REQ#) and Grant (GNT#). A master asserts its Request signal when it wants to take control of the bus. In return, the central resource (Intel's name for the circuitry shared by all bus devices on the motherboard, including the bus control logic) sends a Grant signal to the master to give it permission to take control. Each PCI device gets its own dedicated Request and Grant signal.
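The request and grant handshake can be modeled with a toy arbiter. Because PCI leaves the arbitration policy to the system designer, the round-robin scheme below is only one assumed possibility, and the class and method names are hypothetical rather than anything defined by the specification.

```python
class RoundRobinArbiter:
    """Toy central-resource arbiter: one REQ#/GNT# pair per device.

    Round-robin is only one possible policy; PCI leaves the choice
    to the system designer.
    """

    def __init__(self, num_devices):
        self.num_devices = num_devices
        self.last_granted = num_devices - 1  # so device 0 is checked first

    def grant(self, requesting):
        """Return the device to receive GNT#, or None if no REQ# is asserted.

        requesting -- set of device numbers currently asserting REQ#.
        """
        for offset in range(1, self.num_devices + 1):
            candidate = (self.last_granted + offset) % self.num_devices
            if candidate in requesting:
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(3)
# Devices 0 and 2 both keep requesting; grants alternate between them.
print([arbiter.grant({0, 2}) for _ in range(4)])
```

A designer who wanted fixed priorities, or a weighting that favors a disk controller over a sound card, would change only the loop inside grant.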
As a self-contained expansion bus, PCI naturally provides for hardware interrupts. PCI includes four level-sensitive interrupts (INTA# through INTD#, at pins A6, B7, A7, and B8, respectively) that enable interrupt sharing. The specification does not itself define what the interrupts are or how they are to be shared. Even the relationship between the four signals is left to the designer (for example, each can indicate its own interrupt, or they can define up to 16 separate interrupts as binary values). Typically, these details are implemented in a device driver for the PCI board. The interrupt lines are not synchronized to the other bus signals and may therefore be activated at any time during a bus cycle.
As the world shifts to lower-power systems and lower-voltage operation, PCI has been adapted to fit. Although the early incarnations of the standard provided for 3.3-volt operation in addition to the then-standard 5-volt level, the acceptance of the lower voltage standard became official only with PCI version 2.3. Version 3.0 (not yet released at the time of this writing) takes the next step and eliminates the 5-volt connector from the standard.
High frequencies, radiation, and other electrical effects also conspire to limit the number of expansion slots that can be attached in a given bus system. These limits become especially apparent with local bus systems that operate at high clock speeds. All current local bus standards limit to three the number of high-speed devices that can be connected to a single bus.
Note that the limit is measured in devices and not slots. Many local bus systems use a local bus connection for their motherboard-based display systems. These circuits count as one local bus device, so computers with local bus video on the motherboard can offer at most two local bus expansion slots.
The three-device limit results from speed considerations. The larger the bus, the higher the capacitance between its circuits (because they have a longer distance over which to interact). Every connector adds more capacitance. As speed increases, circuit capacitance increasingly degrades its signals. The only way to overcome the capacitive losses is to start with a stronger signal. To keep local bus signals at reasonable levels and yet maintain high speeds, the standards enforce the three-device limit.
A single computer can accommodate multiple PCI expansion buses bridged together to allow more than three slots. Each of these sub-buses then uses its own bus-control circuitry. From an expansion standpoint—or from the standpoint of an expansion board—splitting the system into multiple buses makes no difference. The signals get where they are supposed to, and that's all that counts. The only worries are for the engineer who has to design the system to begin with—and even that is no big deal. The chipset takes care of most expansion bus issues.
Standard PCI cards do not allow for hot-plugging. That is, you cannot and should not remove or insert a PCI expansion board into a connector in a running computer. Try it, and you risk damage to both the board and the computer.
In some applications, however, hot-plugging is desirable. For example, in fault-tolerant computers you can replace defective boards without shutting down the host system. You can also add new boards to a system while it is operating.
To facilitate using PCI expansion boards in such circumstances, engineers developed a variation on the PCI standard called PCI Hot Plug and published a specification that defines the requirements for expansion cards, computers, and their software to make it all work. The specification, now available as revision 1.1, is a supplement to the ordinary PCI Specification.
PCI builds upon the Plug-and-Play system to automatically configure itself and the devices connected to it without the need to set jumpers or DIP switches. Under the PCI specification, expansion boards include Plug-and-Play registers to store configuration information that can be tapped into for automatic configuration. The PCI setup system requires 256 registers. This configuration space is tightly defined by the PCI specification to ensure compatibility. A special signal, Initialization Device Select (IDSEL), dedicated to each slot activates the configuration read and write operations as required by the Plug-and-Play system.
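To give a feel for what the configuration registers hold, the sketch below reads the identification fields at the start of a device's configuration space. It treats the 256 registers as one byte each and uses the field offsets of the standard PCI configuration header, which are not spelled out in the text above. The function name and the sample device ID are made up for illustration.

```python
import struct


def parse_config_header(config_space):
    """Pick out the leading identification fields from a device's
    256-byte configuration space (multi-byte fields are little-endian)."""
    if len(config_space) != 256:
        raise ValueError("PCI configuration space is 256 bytes")
    vendor_id, device_id, command, status = struct.unpack_from("<HHHH", config_space, 0)
    revision, prog_if, subclass, base_class = struct.unpack_from("<BBBB", config_space, 8)
    return {
        "vendor_id": vendor_id,
        "device_id": device_id,
        "command": command,
        "status": status,
        "revision": revision,
        "prog_if": prog_if,
        "subclass": subclass,
        "base_class": base_class,
    }


# Fabricated example contents: vendor 0x8086 (Intel) with an invented device ID.
sample = struct.pack("<HHHHBBBB", 0x8086, 0x1234, 0x0007, 0x0200, 2, 0, 0, 3) + bytes(244)
header = parse_config_header(sample)
print(hex(header["vendor_id"]), hex(header["device_id"]), header["base_class"])
```

Configuration software reads fields like these from each slot in turn to learn which devices are present before it assigns them resources.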
In the late 1990s, engineers at Compaq, Hewlett-Packard, and IBM realized that microprocessor speeds were quickly outrunning the throughput capabilities of the PCI bus, so they began to develop a new, higher-speed alternative aimed particularly at servers. Working jointly, they developed a new bus specification, which they submitted to the PCI Special Interest Group in September 1998. After evaluating the specification for a year, in September 1999, the PCI SIG adopted the specification and published it as the official PCI-X Version 1.0 standard.
The PCI-X design not only increases the potential bus speed of PCI but also adds a new kind of transfer that makes the higher speeds practical. On July 23, 2002, PCI-X Version 2.0 was released to provide a further upgrade path built upon PCI-X technology. Table 9.3 compares the speed and bandwidth available under various existing and proposed PCI and PCI-X specifications.
The PCI-X design follows the PCI specification in regard to signal assignments on the bus. It supports both 32-bit and 64-bit bus designs. In fact, a PCI card will function normally in a PCI-X expansion slot, and a PCI-X expansion board will work in a standard PCI slot, although at less than its full potential speed.
The speed of the PCI-X expansion bus is not always the 133MHz of the specifications. It depends on how many slots are connected in a single circuit with the bus controller and what's in the slots. To accommodate ordinary PCI boards, for example, all interconnected PCI-X slots automatically slow to the highest speed the PCI board will accept—typically 33MHz. In addition, motherboard layouts limit the high-frequency capabilities of any bus design. Under PCI-X, only a single slot is possible at 133MHz. Two-slot designs require retarding the bus to 100MHz. With four slots, the practical top speed falls to 66MHz. The higher speeds possible under PCI-X 2.0 keep the clock at 133MHz but switch to double-clocking or quad-clocking the data to achieve higher transfer rates.
To reach even the 133MHz rate (and more reliable operation at 66MHz), PCI-X adds a new twist to the bus design, known as register-to-register transfers. On the PCI bus, cards read data directly from the bus, but the timing of signals on the bus leaves only a short window for valid data. At 33MHz, a PCI card has a window of about 7 nanoseconds to read from the bus; at 66MHz, only about 3 nanoseconds is available. In contrast, PCI-X uses a register to latch the signals on the bus. When a signal appears on the bus, the PCI-X registers lock down that value until data appears during the next clock cycle. A PCI-X card therefore has a full clock cycle to read the data: about 15 nanoseconds at 66MHz and about 7.5 nanoseconds at 133MHz.
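The full-cycle read times follow directly from the clock rate, since one cycle lasts the reciprocal of the frequency. A minimal sketch of that arithmetic follows; the function name is illustrative.

```python
def clock_period_ns(clock_mhz):
    """Length of one bus clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


# One full cycle at the PCI and PCI-X clock rates discussed above.
for mhz in (33, 66, 133):
    print(mhz, "MHz:", round(clock_period_ns(mhz), 1), "ns")
```

The results are on the order of nanoseconds, which shows how little time a card has to capture data even when it gets the whole cycle.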
Although register-to-register transfers make higher bus speeds feasible, at any given speed they actually slow the throughput of the bus by adding one cycle of delay to each transfer. Because transfers require multiple clock cycles, however, the penalty is not great, especially for bursts of data. But PCI-X incorporates several improvements in transfer efficiency that help make the real-world throughput of the PCI-X bus actually higher than PCI at the same speed.
One such addition is attribute phase, an additional phase in data transfers that allows devices to send each other a 36-bit attribute field that adds a detailed description to the transfer. The field contains a transaction byte count that describes the total size of the transfer to permit more efficient use of bursts. Transfers can be designated to allow relaxed ordering so that transactions do not have to be completed in the order they are requested. For example, a transfer of time-critical data can zoom around previously requested transfers. Relaxed ordering can keep video streaming without interruption. Transactions that are flagged as non-cache coherent in the attribute field tell the system that it need not waste time snooping through its cache for changes if the transfer won't affect the cache. By applying a sequence number in the attribute field, bus transfers in the same sequence can be managed together, which can improve the efficiency in caching algorithms.
In addition to the attribute phase, PCI-X allows for split transactions. If a device starts a bus transaction but delays in finishing it, the bus controller can use the intervening idle time for other transactions. The PCI-X bus also eliminates wait states (except for the inevitable lag at the beginning of a transaction) through split transactions by disconnecting idle devices from the bus to free up bandwidth. All PCI-X transactions are one standard size that matches the 128-bit cache line used by Intel microprocessors, permitting more efficient operation of the cache. When all the new features are taken together, the net result is that PCI-X throughput may be 10 percent or more higher than using standard PCI technology.
Because PCI-X is designed for servers, the integrity of data and the system are paramount concerns. Although PCI allows for parity-checking transfers across the bus, its error-handling mechanism is both simple and inelegant—an error would shut down the host system. PCI-X allows for more graceful recovery. For example, the controller can request that an erroneous transmission be repeated, it can reinitialize the device that erred or disable it entirely, or it can notify the operator that the error occurred.
PCI-X cards can operate at either 5.0 or 3.3 volts. However, high-speed operation is allowed only at the lower operating voltage. The PCI-X design allows for 128 bus segments in a given computer system. A segment runs from the PCI-X controller to a PCI-X bridge or between the bridge and an actual expansion bus encompassing one to four slots, so a single PCI-X system can handle any practical number of devices.
According to the PCI Special Interest Group, the designated successor to today's PCI-X (and PCI) expansion buses is PCI Express. Under development for years as 3GIO (indicating it was to be the third generation input/output bus design), PCI Express represents a radical change from previous expansion bus architectures. Instead of using relatively low-speed parallel data lines, it opts for high-speed serial signaling. Instead of operating as a bus, it uses a switched design for point-to-point communications between devices—each gets the full bandwidth of the system during transfers. Instead of special signals for service functions such as interrupts, it uses a packet-based system to exchange both data and commands. The PCI-SIG announced the first PCI Express specification at the same time as PCI-X 2.0, on July 23, 2002.
In its initial implementation, PCI Express uses a four-wire interconnection system, two wires each (a balanced pair) for separate sending and receiving channels. Each channel operates at a speed of 2.5GHz, which yields a peak throughput of 250MBps in each direction after allowing for the encoding described next (ignoring packet overhead). The system uses the 8b/10b encoding scheme, which sends each 8-bit byte as a 10-bit code and embeds the clock in the data stream so that no additional clock signal is required. The initial design contemplates future increases in speed, up to 10GHz, the theoretical maximum speed that can be achieved in standard copper circuits.
To accommodate devices that require higher data rates, PCI Express allows for multiple lanes within a single channel. In effect, each lane is a parallel signal path between two devices with its own four-wire connection. The PCI Express hardware divides the data between the multiple lanes for transmission and reconstructs it at the other end of the connection. The PCI Express specifications allow for channels with 1, 2, 4, 8, 12, 16, or 32 lanes (effectively boosting the speed of the connection by the equivalent factor). A 32-lane system at today's 2.5GHz bus speed would deliver throughput of 8000MBps in each direction.
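The lane arithmetic can be worked through in a few lines. This sketch assumes one bit per clock on each 2.5GHz channel and ten line bits for every data byte under 8b/10b encoding, and it ignores packet overhead; the function name and parameters are illustrative.

```python
def pcie_throughput_mbps(lanes, line_rate_gbps=2.5, data_bits=8, line_bits=10):
    """Peak one-direction data rate in MB/s, ignoring packet overhead."""
    if lanes not in (1, 2, 4, 8, 12, 16, 32):
        raise ValueError("unsupported lane count")
    # 8b/10b sends 10 line bits for every 8 data bits.
    data_gbps = line_rate_gbps * data_bits / line_bits
    # Convert gigabits per second to megabytes per second.
    return lanes * data_gbps * 1000 / 8


for lanes in (1, 4, 16, 32):
    print(lanes, "lanes:", pcie_throughput_mbps(lanes), "MBps")
```

On these assumptions a single lane works out to 250MBps in each direction and a 32-lane connection to 8000MBps.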
The switched design is integral to the high-speed operation of PCI Express. It eliminates most of the electrical problems inherent in a bus, such as changes in termination and the unpredictable loading that occur as different cards are installed. Each PCI Express expansion connector links a single circuit designed for high-speed operation, so sliding a card into one slot affects no other. The wiring from each PCI Express connector runs directly back to a single centralized switch that selects which device has access to the system functions, much like a bus controller in older PCI designs. This switch can be either a standalone circuit or a part of the host computer's chipset.
All data and commands for PCI Express devices are contained in packets, which incorporate error correction to ensure the integrity of transfers. Even interrupts are packetized using the Message Signaled Interrupt (MSI) system introduced with PCI version 2.2, the implementation of which is optional for ordinary PCI but mandatory for PCI Express. The packets used by PCI Express use both 32-bit and extended 64-bit addressing.
The primary concern in creating PCI Express was to accommodate the conflicting needs of compatibility while keeping up with advancing technology. Consequently, the designers chose a layered approach with the top software layers designed to match current PCI protocols, while the lowest layer, the physical, permits multiple variations.
The top two layers, Config/OS and S/W, require no change from ordinary PCI. In other words, the high-speed innovations of PCI Express are invisible to the host computer's operating system.
As to the actual hardware, PCI Express retains the standard PCI board design and dimensions. It envisions dual-standard boards that have both conventional and high-speed PCI Express connections. Such dual-standard boards are restricted to one or two lanes on an extra edge connector that's collinear with the standard PCI expansion connector but mounted between it and the back of the computer. Devices requiring a greater number of lanes need a new PCI Express connector system.
Standards and Coordination