Digital Audio

Computers, of course, use digital signals, as do many modern stereo components such as Compact Disc players and Digital Audio Tape systems. Once a sound signal has been translated from analog into digital form, it becomes just another form of data that your computer can store or compute upon. Digital technology adds new terms to the audio vocabulary and raises new concerns.

Digital recording of sound turns music into numbers. That is, a sampling circuit examines audio waveforms thousands of times every second and assigns a numerical value to the strength of the sound every time it looks; it then records the numbers. To reproduce the music or noise, a computer's sound system works backward. It takes the recorded numbers and regenerates the corresponding signal strength at intervals exactly corresponding to those at which it examined the original signal. The result is a near-exact duplication of the original audio.

The digital recording process involves several arbitrary variables. The two most important are the rate at which the original audio signal is examined—called the sampling rate—and the numeric code assigned to each value sampled. The code is digital and is defined as a given number of bits, the bit-depth or resolution of the system. The quality of sound reproduction is determined primarily by the values chosen for these variables.

Sampling Rate

The sampling rate limits the frequency response of a digital recording system. The highest frequency that can be recorded and reproduced digitally is half the sampling frequency. Why? Start by taking the worst case, when the sampling frequency and the frequency being sampled are the same. The sample would then occur at exactly the same place in the audio wave with each sample. The numbers in the digital code would be the same, and when reproduced the system would not produce a tone but a constant, unvarying voltage—direct current. The sampling frequency thus acts like an audio stroboscope. As the frequency being sampled goes up past half the sampling frequency, the tone reconstructed from the digital code actually goes down, reaching zero at the sampling frequency. Go even higher and the regenerated tone starts increasing again—from zero.
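The stroboscope effect described above can be put into a few lines of arithmetic. The following is only a minimal sketch, assuming an ideal sampler and a pure tone; the function name alias_frequency is made up for this illustration.

```python
def alias_frequency(tone_hz, sampling_hz):
    """Frequency an ideal sampler would reconstruct for a pure input tone."""
    # Sampling cannot tell a tone from one shifted by a whole multiple of
    # the sampling frequency, so fold the tone into one sampling period.
    folded = tone_hz % sampling_hz
    # Above half the sampling frequency the result mirrors back downward.
    if folded > sampling_hz / 2:
        folded = sampling_hz - folded
    return folded


for tone in (10_000, 30_000, 44_100, 50_000):
    print(tone, "Hz is reconstructed as", alias_frequency(tone, 44_100), "Hz")
```

With a 44,100 Hz sampling rate, a 30,000 Hz tone comes back as 14,100 Hz, a tone at exactly the sampling frequency comes back as 0 Hz (direct current), and a 50,000 Hz tone starts climbing again from zero.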

This top frequency that can be sampled (that is, half the sampling frequency) is often called the Nyquist frequency, after Harry Nyquist, a Swedish-born American scientist working at Bell Labs who first published the explanation of the limit in 1928. Higher frequencies become ambiguous and can be confused with lower frequency values, producing distortion. To prevent problems, frequencies higher than half the sampling frequency must be eliminated—filtered out—before they are digitally sampled. Because frequencies beyond the Nyquist frequency masquerade as lower frequencies when they get regenerated, the regenerated tones are said to alias, and filters that remove the high frequencies before sampling are called antialiasing filters.

Because no audio filter is perfect, most digital audio systems have antialiasing filters with cutoff frequencies somewhat lower than the Nyquist frequency. The Compact Disc digital audio system is designed to record sounds with frequencies up to about 20KHz, and it uses a sampling rate of 44.1KHz and a Nyquist frequency of 22.05KHz. Table 25.2 lists the sampling rates in common use in a variety of applications.

Table 25.2. Common Digital Sampling Rates

Rate (Hz) Application

5563.6 Apple Macintosh, lowest quality

7418.3 Apple Macintosh, low quality

8000 Telephone standard

8012.8 NeXT workstations

11,025 PC, low quality (1/4th CD rate)

11,127.3 Apple Macintosh, medium quality

16,000 G.722 compression standard

18,900 CD-ROM/XA long-play standard

22,050 PC, medium quality (1/2 CD rate)

22,254.5 Basic Apple Macintosh rate

32,000 Digital radio, NICAM, long-play DAT, HDTV

37,800 CD-ROM/XA higher-quality standard

44,056 Professional video systems

44,100 Basic CD standard

48,000 DVD, Audio Codec '97, Professional audio recording

96,000 DVD at highest audio quality

The odd numbers used by some of the standards are often less arbitrary than they look. For example, the 22,254.5454 Hz rate used by the Apple Macintosh system matches the horizontal line rate of the video display of the original 128K Macintosh computer system. For Mac people, that's a convenient number. The 44,056 Hz rate used by some professional video systems is designed to better match the sampling rate to the video frame rate.
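Several entries in Table 25.2 are simple fractions of the CD rate, which a little arithmetic makes plain. This sketch only checks ratios. The 1000/1001 factor is an assumption brought in from NTSC color video timing and is not stated in the text, and the 6/7 ratio is offered as an arithmetic observation, not as a design rationale.

```python
CD_RATE = 44_100  # Hz, basic CD standard from Table 25.2

half_cd = CD_RATE // 2        # PC, medium quality
quarter_cd = CD_RATE // 4     # PC, low quality
xa_high = CD_RATE * 6 // 7    # arithmetically matches the CD-ROM/XA higher rate
xa_long = xa_high // 2        # arithmetically matches the CD-ROM/XA long-play rate

# Assumption: slowing the CD rate by the NTSC color factor of 1000/1001.
video_rate = CD_RATE * 1000 / 1001

print(half_cd, quarter_cd, xa_high, xa_long, round(video_rate))
```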

Resolution

The number of bits in a digital code or bit-depth determines the number of discrete values it can record. For example, an eight-bit digital code can represent 256 distinct objects, be they numbers or sound levels. A recording system that uses an eight-bit code can therefore record 256 distinct values or steps in sound levels. Unfortunately, music and sounds vary smoothly rather than in discrete steps. The difference between the digital steps and the smooth audio value is heard as distortion. This distortion also adds to the noise in the sound recording system. Minimizing distortion and noise means using more steps. High-quality sound systems (that is, those delivering CD-quality sound) require a minimum of a 16-bit code. Higher-quality systems use 20- or 24-bit codes.
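To see how the step count shrinks the error, consider this minimal sketch. It assumes samples scaled to the range -1.0 to 1.0 and evenly spaced steps; the function names are invented for the illustration and do not come from any particular sound library.

```python
def quantization_levels(bits):
    """Number of distinct values a code of the given bit-depth can hold."""
    return 2 ** bits


def quantize(sample, bits):
    """Round a sample in [-1.0, 1.0] to the nearest evenly spaced step."""
    step = 2.0 / (quantization_levels(bits) - 1)
    index = round((sample + 1.0) / step)
    return index * step - 1.0


def worst_case_error(bits):
    """Largest possible rounding error: half of one step."""
    return 1.0 / (quantization_levels(bits) - 1)


for bits in (8, 16, 24):
    error = abs(quantize(0.3, bits) - 0.3)
    print(bits, "bits:", quantization_levels(bits), "levels, error", error)
```

Each added bit doubles the number of steps and so halves the worst-case difference between the smooth signal and its digital value.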

Bandwidth

Sampling rate and resolution determine the amount of data produced during the digitization process, which in turn determines the amount that must be recorded. In addition, full stereo recording doubles the data needed because two separate information channels are required. The 44.1KHz sampling frequency and 16-bit digital code of stereo CD audio result in the need to process and record about 1.4 million bits (roughly 176,000 bytes) of data every second, about 10MB per minute.
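The data rate of uncompressed audio is simply the product of the three variables. As a quick sketch (the function name is made up for this example):

```python
def pcm_bits_per_second(sampling_hz, bit_depth, channels):
    """Raw data rate of uncompressed digital audio."""
    return sampling_hz * bit_depth * channels


cd_bps = pcm_bits_per_second(44_100, 16, 2)
cd_bytes_per_second = cd_bps // 8
cd_bytes_per_minute = cd_bytes_per_second * 60

print(cd_bps, "bits per second")
print(cd_bytes_per_second, "bytes per second")
print(cd_bytes_per_minute, "bytes per minute")
```

For stereo CD audio this works out to 1,411,200 bits per second, or 10,584,000 bytes per minute.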

For full CD compatibility, most newer soundboards have the capability to digitize at the CD level. Intel's Audio Codec '97 specification requires a bit more, a 48KHz sampling rate, and undoubtedly stereophiles will embrace the extraordinarily high optional 96KHz sampling rate allowed by the DVD standard. For most computer operations, however, less can be better: less quality means less data to save in files and ship across the Internet. The relatively low quality of loudspeakers attached to computers, the ambient background noise in offices, and the noise the computer and its fan and disks make themselves make the nuances of top-quality sound inaudible anyway.

To save disk space and processing time, computer sound software often gives you the option of using less resource-intensive values for sampling rate and bit-depth. Most computer sound systems support 22 and 11KHz sampling; some offer other intermediate values, such as 8, 16, or 32KHz. You can trim your data needs in half simply by making system sounds monophonic instead of stereo. (Note that such savings are not quite so straightforward once audio is compressed, such as when using MP3.)

If you are making original recordings of sounds and music, you will want to use as high a rate as is consistent with your computer's resources. Often the application will dictate your format. For example, if you want to use your CD-R drive to master audio CDs, you'll need to use the standard CD format, stereo 16-bit quantization at a 44.1KHz sampling rate. On the other hand, the best tradeoff between quality and bandwidth for Internet-bound audio is 11KHz sampling with 8-bit quantization. If you have an eye to the future and an ear of gold, go for 96KHz sampling (if your computer allows it).
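To put numbers on these format choices, the sketch below compares the storage each one needs per minute. It assumes uncompressed audio and takes the "11KHz" and "22KHz" rates to mean the exact 11,025 Hz and 22,050 Hz values from Table 25.2; the format labels are only illustrative.

```python
def bytes_per_minute(sampling_hz, bit_depth, channels):
    """Storage needed for one minute of uncompressed audio."""
    return sampling_hz * bit_depth * channels * 60 // 8


# (label, sampling rate in Hz, bit-depth, channels)
FORMATS = [
    ("CD master, stereo", 44_100, 16, 2),
    ("Half rate, mono", 22_050, 16, 1),
    ("Internet-bound, mono", 11_025, 8, 1),
]

cd_size = bytes_per_minute(44_100, 16, 2)
for label, rate, depth, channels in FORMATS:
    size = bytes_per_minute(rate, depth, channels)
    print(f"{label}: {size:,} bytes per minute, {cd_size / size:.0f}x smaller than CD")
```

Dropping from the CD format to 11,025 Hz, 8-bit mono cuts the data by a factor of 16.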

Note that the format of the data sets a limit on quality without determining the actual quality of what you will hear. In other words, you cannot do better than the quality level you set through choice of bit-depth and sampling rate. The shortcomings of practical hardware, particularly inexpensive sound systems and loudspeakers, dictate that the quality of the sound that actually makes it into the air will be less realistic than the digital format may allow.

Transmission

Moving digital audio around is the same as sending any digital signal. You can move files through any convenient port.

You can usually move CD-quality audio through a 10Base-T network with no problem. For example, you can put WAV files on a server and play them back smoothly on any computer connected to the network. That is, unless you put your computer to work on other jobs that are extremely resource intensive or your network bears heavy traffic.

USB provides a quick and easy connection and is coming into use for digital recording and editing. The bandwidth of USB is, however, constrained. After allowing for handshaking and other transmission overhead, you can expect to route about six CD-quality digital audio signals simultaneously through a USB 1.0 connection.
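A rough way to judge whether a connection can carry CD-quality audio is to divide its usable bandwidth by the data rate of one stereo stream. The sketch below assumes the nominal 12Mbps signaling rate of full-speed USB 1.x and 10Mbps for 10Base-T. The 25 percent overhead figure is purely an illustrative assumption, not a measured value.

```python
CD_STREAM_BPS = 44_100 * 16 * 2  # one uncompressed stereo CD-quality stream


def max_cd_streams(link_bps, overhead_fraction=0.0):
    """Whole CD-quality streams that fit in the link after overhead."""
    usable_bps = link_bps * (1.0 - overhead_fraction)
    return int(usable_bps // CD_STREAM_BPS)


print("USB 1.x, no overhead:", max_cd_streams(12_000_000))
print("USB 1.x, 25% overhead (assumed):", max_cd_streams(12_000_000, 0.25))
print("10Base-T, no overhead:", max_cd_streams(10_000_000))
```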

The Web provides a big challenge for audio. Real-time audio playback across the Web is inevitably a big compromise, one that relies on heavy compression that restricts bandwidth, which in turn limits frequency response and guarantees a high noise level.

If you don't need real-time audio playback—if you are collecting music rather than listening to it—you can move any kind of digital audio through any channel. Ordinary telephone connections make the time involved prodigious, especially if you want to move raw CD-quality audio for an album collection. Compressing audio files trims the time to transmit them through such connections, which is why most music now moves through the Internet in compressed form using aurally friendly algorithms such as MP3 (discussed next) and newer, related systems.

Compression

The Internet is not the only place where the size of digital audio files becomes oppressive. At about 10MB per minute, audio files quickly grow huge. Archiving more than a few sound bites quickly becomes expensive in terms of disk space.

To squeeze more sound into a given amount of storage, digital audio can be compressed like any other data. Actually, audio lends itself to compression. The more efficient algorithms take into account the special character of digital sound. The best rely on psycho-acoustic principles, that is, how people actually hear sound. They discard inaudible information, not wasting space on what you cannot hear.

The algorithms for compressing and decompressing digital audio are called codecs, short for coder/decoder. Several have become popular for different applications. Windows includes several, invisibly selecting the proper one when necessary to play back an audio file. When recording, your software will prompt you to select the codec to use if a selection is available.

The Internet has brought other compression systems to the forefront. Although nothing inherent in the Web requires them, because they reduce the size of audio files (and hence their transmission time), they have made audio distribution through the Web practical. More than that, high-quality compressed audio and the Web are revolutionizing how music gets distributed.

As excitement about the Web was reaching its peak in the year 2000, developing a new audio compression system spurred more investment than perhaps any other area of Web development. Every promoter had a new and better system that he hoped to make a new industry standard. Although most of these efforts claimed high degrees of compression (and thus small file sizes), the chief ingredient in each was rights management, the ability to control duplication of music files to protect publishers' copyrights. Some systems (such as RealAudio) allow you to listen from the Web without the ability to save music. Others only let you play downloaded files on a single system after you've paid for them. All these rights-management systems are based on digital encryption. The Digital Millennium Copyright Act makes breaking these codes or even publishing how to break the codes illegal.

The most important open standard, and the one that inspired the newer compression systems, is MP3, shorthand for MPEG Audio Layer 3. (It does not stand for MPEG-3; there is no such standard, at least not yet.) Although usually regarded as a video standard, the MPEG standards discussed in Chapter 24, "Display Systems," also describe the audio that accompanies the moving images. The applications of MPEG audio are widespread: its compression system is used by Digital Compact Cassettes, digital broadcasting experiments, and the DVD.

MPEG audio is not one but a family of audio coding schemes based on the human perception of sound. The basic design has three layers that translate directly into sound quality. The layers, numbered 1 through 3, form a hierarchy of increasing complexity that yields better quality at the same bit-rate. Each layer is built on the previous one and incorporates the ability to decode signals coded under the lower layers. Table 25.3 summarizes the MPEG audio layers.

Table 25.3. MPEG Layers and Bit-Rates Compared

Layer Allowed Range Target or Optimum Sample Application

1 32 to 448Kbps 192Kbps Digital Compact Cassette

2 32 to 384Kbps 128Kbps MUSICAM (Broadcasting)

3 32 to 320Kbps 64Kbps DVD and Internet sound

As the layer number increases, the encoding becomes more complex. The result is a greater amount of compression. Because greater compression requires more processing, there is apt to be more latency (signal delay) as the layer number increases.

The layer number does not by itself determine perceived sound quality. All layers permit sampling frequencies of 32, 44.1, or 48KHz. No matter the layer, the output quality depends on the bit-rate allowed: the higher the bit-rate, the higher the quality. The higher layers allow a given quality to be maintained at lower bit-rates. At their target bit-rates, all three layers deliver sound quality approaching that of CDs.
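The figures in Table 25.3 lend themselves to a small lookup structure. The sketch below is a simplification: it treats each layer's bit-rates as a continuous range, whereas real encoders offer only specific steps within that range. The dictionary and function names are hypothetical.

```python
# Bit-rates in Kbps, copied from Table 25.3.
MPEG_AUDIO_LAYERS = {
    1: {"min_kbps": 32, "max_kbps": 448, "target_kbps": 192},
    2: {"min_kbps": 32, "max_kbps": 384, "target_kbps": 128},
    3: {"min_kbps": 32, "max_kbps": 320, "target_kbps": 64},
}

# Sampling frequencies in KHz permitted by all three layers.
ALLOWED_SAMPLING_KHZ = (32, 44.1, 48)


def is_supported(layer, kbps, sampling_khz):
    """Simplified check of a layer, bit-rate, and sampling-rate combination."""
    limits = MPEG_AUDIO_LAYERS.get(layer)
    if limits is None:
        return False
    in_range = limits["min_kbps"] <= kbps <= limits["max_kbps"]
    return in_range and sampling_khz in ALLOWED_SAMPLING_KHZ


print(is_supported(3, 128, 44.1))
print(is_supported(3, 384, 44.1))
```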

Unlike other standards, MPEG does not define compression algorithms. Instead, the layers provide standards for the data output rather than how that output is achieved. This descriptive approach allows developers to improve the quality of the algorithms as the technology and their discoveries permit. Header information describes the level and methodology of the compression used in the data that follows.

MPEG is asymmetrical in that it is designed with a complex encoder and a relatively simple decoder. Ordinarily you will decode files. Only the producer or distributor of MPEG-encoded material needs an encoder. The encoding process does not need to (and often does not) take place in real time. All layers use a polyphase filter bank with 32 sub-bands. Layer 3 also adds a modified discrete cosine transform (MDCT) that helps increase its frequency resolution.

MP3 takes advantage of the high compression afforded under the MPEG audio standard and uses it as the basis for a file system, which serves as a basis for today's MP3 hardware. The advantage of MP3 is simply compression. It squeezes audio files into about one-twelfth the space raw digital audio data would require. As a result, music that would nominally require a 50MB file under the WAV format only takes about 4MB. Smaller files mean less transmission time so that cuts and entire albums can reasonably be sent across the Internet. This also allows a substantial length of music (an hour or more) to be encoded into solid-state memory and carried about in a no-moving-parts player.

Better still, by squeezing the size of the MP3 file, the data rate required for playing back a file in real time can be similarly reduced. Instead of requiring approximately 1.4Mbps to move two CD-quality audio channels, MP3 files need only about 128Kbps (64Kbps per channel) for near-CD-quality playback. Although that's not slow enough for real-time playback through a 56K modem (remember, MP3 files are already compressed so modem-based compression cannot appreciably speed up transfer rates), real-time playback is possible with ISDN terminal adapters (with a good 128K connection and fast server), cable "modems," and ADSL connections. With light network traffic and a 56K modem, you can expect to download an MP3 file from the Web in two to four times its actual playing time.
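The relationship between bit-rate, link speed, and download time is easy to work out. The sketch below assumes the link delivers its full nominal rate, which a real modem connection rarely does, so actual download times run longer than these ratios suggest. The function name is invented for the example.

```python
RAW_CD_BPS = 44_100 * 16 * 2  # two uncompressed CD-quality channels


def download_to_playing_ratio(audio_bps, link_bps):
    """How many seconds of transfer each second of audio needs."""
    return audio_bps / link_bps


one_twelfth_bps = RAW_CD_BPS / 12  # the compression ratio cited for MP3
print("One-twelfth of the raw CD rate:", round(one_twelfth_bps), "bps")

for audio_kbps in (64, 128):
    ratio = download_to_playing_ratio(audio_kbps * 1000, 56_000)
    print(f"{audio_kbps}Kbps audio over a 56K modem: {ratio:.2f}x playing time")

print("50MB at one-twelfth the size:", round(50 / 12, 1), "MB")
```

A ratio above 1.0 means the file cannot play in real time over that link and must be downloaded first.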

Because MP3 is part of the MPEG-1 standard, it accommodates only stereo. MPEG-2 is designed for surround sound using up to eight channels. In the works is a new level of compression, often referred to as MP4, which will extend MP3-like compression to surround sound systems.

The tradeoff with gaining the compactness of MP3 files is data processing. Decoding the audio data from an MP3 file requires a substantial amount of microprocessor power, so you can't use a vintage computer you stuck on your shelf a decade ago as an audio component. Machines more than a couple of generations old can't process MP3 in real time. Typically, you'll need at least a 100MHz Pentium microprocessor, although a 300MHz Pentium II system makes a great audio-playback (and MP3-encoding) computer. In other words, a recently retired computer of that class is worth saving for your stereo system.

You can recognize MP3 files by their filename extension of MP3. Note that some encoding programs that generate MP3 files allow you to choose the "level" of processing, typically Level 1, 2, or 3. These selections correspond to the layers under the MPEG standard and represent higher degrees of compression (smaller files and lower data rates) as the numbers ascend, but only Layer 3 creates true MP3 files. Nearly all MP3 decoders will handle whatever level you choose, so go for the lower numbers if you want the highest quality and don't mind giving up more disk space.

Looped Sound

When you want to include sound as a background for an image that your audience may linger over for a long but undetermined amount of time (for example, to add background sounds to a Web page), you can minimize the storage and transmission requirements for the audio by using looped sound. As the name implies, the sound forms an endless loop, the end spliced back to the beginning. The loop can be as short as a heartbeat or as long as several musical bars. The loop simply repeats as long as the viewer lingers over the Web page. The loop only requires as much audio data as it takes to code a single pass, no matter how long it plays.

No rule says that the splice between the end and beginning of the loop must be inconspicuous, but if you want to avoid the nature of the loop becoming apparent and distracting, it should be. When looping music, you should place the splice so that the rhythm is continuous and regular. With random sounds, you should match levels and timbre at the splice. Most computer sound editors will allow you to finely adjust the splice. Most Internet sound systems support looped sounds.
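One common way to hide a splice is to crossfade the end of the recording into its beginning. The text does not prescribe this method; the sketch below is just one possible approach, assuming the audio is a plain list of sample values and using a simple linear fade. The function name is hypothetical.

```python
def make_seamless_loop(samples, fade_len):
    """Blend the last fade_len samples into the first fade_len samples."""
    if fade_len <= 0 or 2 * fade_len > len(samples):
        raise ValueError("fade_len must be positive and at most half the clip")
    tail = samples[-fade_len:]
    blended = []
    for i in range(fade_len):
        weight = i / fade_len  # 0.0 at the splice, rising toward 1.0
        blended.append(samples[i] * weight + tail[i] * (1.0 - weight))
    # The loop body is the blended opening plus the untouched middle.
    return blended + list(samples[fade_len:len(samples) - fade_len])


clip = [float(n) for n in range(10)]
print(make_seamless_loop(clip, 2))
```

The loop starts on the first sample of the old ending, so when playback wraps around from the last sample of the loop, the sound continues exactly as it did in the original recording and then fades into the opening material.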

Internet Sound Systems

Go online and you'll be confronted with a strange menagerie of acronyms describing sound systems promising everything from real-time delivery of rock concerts and background music from remote radio stations a zillion miles away to murky audio effects percolating in the background as you browse past pages and pages of uninteresting Web sites. All this stuff is packaged as digital audio of some kind; otherwise, it could never traverse the extent of the Internet. Rather than straight digital audio, however, it is processed and compressed into something that fits into an amazingly small bandwidth. Then, to get it to you, it must latch on to its own protocol. It's amazing that anything gets through at all, and, in truth, some days (and connections) are less amazing than others.

The biggest hardware obstacle to better Internet sound is bandwidth. Moving audio digitally consumes a huge amount of bandwidth, and today's typical communications hardware is simply not up to the chore. A conventional telephone conversation with a frequency response of 300 to 3000 hertz (hardly hi-fi) gets digitized by the telephone system into a signal requiring a bandwidth of 64,000 bits per second. That low-fidelity data is a true challenge to cram through a modem that has but a 28,800 or even a 33,600 bits per second data rate. As a result, all Internet sound systems start with data compression of some kind to avoid the hardware-imposed bandwidth limits.
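The 64,000 bits per second figure follows from the 8000 Hz telephone sampling rate in Table 25.2 if each sample takes eight bits. That eight-bit sample size is an assumption here (it is the usual size for digital telephony, but the text does not state it). The sketch shows how much compression a modem link would then demand.

```python
TELEPHONE_SAMPLING_HZ = 8_000  # telephone standard from Table 25.2
TELEPHONE_BITS = 8             # assumption: eight bits per sample

telephone_bps = TELEPHONE_SAMPLING_HZ * TELEPHONE_BITS


def required_compression(source_bps, link_bps):
    """Minimum compression ratio needed to fit the source into the link."""
    return source_bps / link_bps


for modem_bps in (28_800, 33_600):
    ratio = required_compression(telephone_bps, modem_bps)
    print(f"{modem_bps} bps modem needs at least {ratio:.2f}:1 compression")
```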

The Internet poses another problem for audio systems: the Web environment itself is rather inhospitable to audio signals. From its inception, the Net was developed as an asynchronous packet-switched network. Its primary protocol, TCP, was not designed for the delivery of time-critical isochronous data such as live audio. When you're downloading a file (or a Web page, which is essentially a file as well), it doesn't matter whether a packet gets delayed, but a late packet in an audio stream is less than worthless; it's an interruption that can ruin whatever you're listening to. Some Internet sound systems abandon TCP for audio transfer and use the UDP protocol instead, which can complicate matters with systems and firewalls designed expressly for TCP. Other Internet sound systems rely on TCP, citing that it ensures top audio quality in transferred files.
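Why a late packet is worthless becomes clearer with a toy model of a streaming player. In this sketch, every packet must arrive within a fixed playout delay of the moment it was sent, or its slot in the audio has already passed. The Packet structure, the function name, and the timing values are all hypothetical; real players use more elaborate buffering.

```python
from collections import namedtuple

# Times are in milliseconds on a shared, idealized clock.
Packet = namedtuple("Packet", "seq sent_ms arrived_ms")


def split_on_time_and_late(packets, playout_delay_ms):
    """Separate packets that beat their playout deadline from those that miss it."""
    on_time, late = [], []
    for packet in packets:
        deadline_ms = packet.sent_ms + playout_delay_ms
        if packet.arrived_ms <= deadline_ms:
            on_time.append(packet.seq)
        else:
            late.append(packet.seq)  # too late to play, so it is discarded
    return on_time, late


stream = [Packet(0, 0, 40), Packet(1, 20, 150), Packet(2, 40, 90)]
print(split_on_time_and_late(stream, playout_delay_ms=100))
```

A file download would simply wait for packet 1, but a live stream has to play on without it.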

Several streaming audio players are available for download from the Web. The three most popular are the audio side of Apple's QuickTime (www.apple.com/quicktime/), RealAudio from Real.com (www.real.com), and Microsoft's Windows Media Player (included with current versions of Windows and available from www.microsoft.com/windows/windowsmedia/).

Multichannel Sound

When sound recording first became possible, people were so thrilled to hear anything at all that small issues such as sound quality were unimportant to them. A tiny voice against a scratchy background sounded as real as a loving coo whispered in their ears. As new technologies such as electronic recording processes brought more life-like qualities to reproduced sound, people discovered that recordings had a missing dimension. In real life, they could hear depth, but the single horn on top of the old Victrola compressed the sound down to a single point.

Trying to recover the depth of sound recording has occupied the minds of engineers and researchers almost since the advent of electronic recording in the 1920s. Their aim has been to reproduce the entire listening experience so that you don't just hear the music but feel like you are immersed in it, sitting in a concert hall rather than your living room.

Stereo

The most natural improvement was stereophonic recording. The idea was simple, even intuitive. Scientists knew that people were able to localize sound because they had two ears. At the time, they didn't know exactly what allowed human hearing to determine locations, but they found that reproducing music through two-speaker systems with separately recorded signals created a miraculous effect: Instead of hearing sound from one speaker or the other, the ears of the listeners were fooled into hearing sounds coming from the entire space between the speakers.

Binaural

Closely related to stereo is binaural recording. As with stereo, binaural uses two recording channels, but binaural requires a special recording method for creating those signals that uses tiny microphones placed inside the ears of a dummy head. Playback, too, differs from ordinary stereo. In its purest form, a binaural system requires headphones instead of loudspeakers. In effect, the binaural system delivers the sounds that reached the dummy's ears directly into your ears.

The difference between stereo and binaural is astounding. Unlike simple stereo systems, binaural creates a convincing illusion of three-dimensional space. Listen to a conventional stereo recording on headphones, and all the sounds appear to emanate from inside your head. The only spread is between your ears. With a binaural recording, the sound appears to come from all around, outside your head, surrounding you. You are convincingly transported to the place and position of the dummy head used in making the original recording.

Without headphones, the binaural illusion disappears. To most people, a binaural recording played through loudspeakers sounds like a conventional stereophonic recording.

The development of binaural recording paralleled conventional stereo. Although initially more a laboratory curiosity, in the 1960s and early 1970s record companies released a few commercial binaural discs and radio stations transmitted a few binaural broadcasts of classical music and, rarely, rock concerts. Unfortunately, the artificial head recording technique was incompatible with the multitrack studio tapes used for the vast majority of popular and rock discs, so the technology never transcended its status as a laboratory curiosity.

Quadraphonic

An abortive attempt at creating a three-dimensional sound stage appeared in the form of quadraphonic recording in the early 1970s. The idea was simple: To create listening space around and behind the listener, two recording channels and speakers dedicated to them were added behind the listener—essentially it was stereo squared.

The problem with quad was not the concept but the available technology for broadcasting and recording discs. At the time, quad was invariably a compromise. Because of the limitations of analog technology, engineers were invariably forced to squeeze in four channels where only two were meant to fit. The most common method involved a sacrifice in channel separation. To achieve compatibility with conventional stereo, the engineers combined the front and back information for each of the two stereo channels. They then piggybacked difference information—the back channel minus the front channel—on top of the front-plus-back signal using a form of phase modulation (quadrature modulation, which, despite the similarity of names, is not related to the quadraphonic concept). These systems provided a front-to-back channel separation of about six decibels—much less than the 20 to 30 dB of separation for the front channels but sufficient (or so the engineers said) for providing the illusion of depth.
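The sum-and-difference idea behind this scheme can be shown with simple algebra. The sketch below is idealized: it treats the two signals as plain numbers and ignores the modulation step entirely, so it recovers the channels perfectly, which the real analog systems could not. The function names are made up for the example.

```python
def encode(front, back):
    """Combine front and back into a compatible sum plus a difference signal."""
    total = front + back       # what an ordinary stereo system would play
    difference = back - front  # piggybacked information for quad decoders
    return total, difference


def decode(total, difference):
    """Recover the front and back signals from the sum and difference."""
    front = (total - difference) / 2
    back = (total + difference) / 2
    return front, back


total, difference = encode(0.5, 0.25)
print("Stereo-compatible signal:", total)
print("Recovered front and back:", decode(total, difference))
```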

Only tape systems could provide four full-bandwidth and completely separate audio channels. At the time, however, the only legitimate tape systems with quality suitable for music recording used cumbersome open-reel tape. The audio cassette was still in its infancy.

Although the Compact Disc would have permitted true quadraphonic recording with four equal channels (although with a halving of playing time), it came too late. By the time of the introduction of the CD, consumer-level quad was long a dead issue. It had never made it into the disco era, let alone the 1980s. Worse, quad bore the stigma of unfulfilled promises along with skepticism about its origins. Many people suspected the stereo industry introduced and promoted quad solely to sell more stereo equipment at a time when hardware sales were flagging. Because of its bad name, the advent of technology capable of handling quad was not sufficient to resurrect it, and modern systems carefully avoided any association with it. Note that no new technology professing to be "surround sound" uses four speakers.

3D Sound

The notion that two channels of sound are all that you need for two ears is hard to shake. Somehow with just two ears most people are able to perceive sound in three dimensions—you can tell not only whether a sound source is left or right of you but also whether it is above, below, or anywhere else. Two ears shouldn't be sufficient for 3D, but (obviously) they are. Binaural recording and playback shouldn't work.

Since the first studies of stereo sound, scientists have puzzled over this anomaly. They discovered that two ears alone aren't enough. The ears had to be coupled to a powerful computer: the human brain. The brain exploited subtle differences in the signals received in each ear, differences caused by the odd shape of the human ear (in some people odder than others), and extracted enough information not only for depth but also for height and everything else.

Once they figured out how people could deduce depth from the signals from two ears, engineers went to work reversing the process. They figured if they altered the signal before it was played back, they could add cues that would fool listeners into thinking they heard sound in three dimensions when played through two speakers.

Surround Sound

Although surround sound is a relatively new innovation for home listening, it actually predates two-channel stereo by decades. The first commercial presentation of surround sound came with Walt Disney's animated feature film, Fantasia, released in 1940 with a six-channel soundtrack. Although not known then as "surround," similar multichannel formats became popular in cinematic productions during the 1950s as movie producers attempted to fight the defection of their audience to the small screens of television sets. They stretched their already-giant movie screens with Cinerama and Cinemascope and filled the auditoriums with six or more sound channels. Although these systems were formally classed as "stereo," industry insiders began to refer to the speakers in the sides and rear of the auditoriums as "surrounds," and soon the technology became surround sound.

Although "surround sound" does not inherently imply any specific number of channels, the most popular format made four the basic standard. In 1976, Dolby Laboratories introduced Dolby optical stereo sound, a system that used noise reduction to coax high-quality sound from optical sound tracks. Until then, most movies with surround sound used magnetic soundtracks, a technology that adds to the cost and complication of printing films. Optical soundtracks are printed in the same operation as the images. Fitting four channels into the optical soundtrack required matrixing, the same technique that put quadraphonic sound on vinyl phonograph discs.

The difference between this four-channel form of surround sound and quad is the arrangement of channels. Quad uses four equal speakers in the corners of the listening space—a square. Surround uses a diamond arrangement with two primary speakers—one left, one right—a center speaker, and a rear speaker. Not only is this arrangement effective in producing an illusion of an all-encompassing sound field, it is also more amenable to the matrixing technique that combines four channels into two. Videocassettes labeled "Dolby Surround" use this technology.

Dolby Pro Logic adds signal steering to the basic surround sound arrangement. That is, it is able to emphasize (or exaggerate) the surround sound effect by selectively altering the balance between channels in response to coding in the original two-channel signal. Both Dolby Surround and Dolby Pro Logic are analog technologies, although the latter uses digital logic to control the balance between channels.

In discussing surround sound, the arrangement of speakers is often abbreviated in what looks like a fraction—two numbers separated by a slash. The first number represents the speakers in front of the listener; the latter number, the speakers behind the listener. Table 25.4 lists the common surround sound configurations.

Table 25.4. Surround Sound Channel Configurations

Speaker positions: FL = front left, FML = front mid-left, FC = front center, FMR = front mid-right, FR = front right, RL = rear left, RC = rear center, RR = rear right. X = speaker used, o = speaker not used.

Designation  Common Name             FL   FML  FC   FMR  FR   RL   RC   RR
1/0          Mono                    o    o    X    o    o    o    o    o
2/0          Stereo                  X    o    o    o    X    o    o    o
3/0          Center-channel stereo   X    o    X    o    X    o    o    o
2/1          Three-channel surround  X    o    o    o    X    o    X    o
2/2          Quadraphonic            X    o    o    o    X    X    o    X
3/2          Standard surround       X    o    X    o    X    X    o    X
5/2          Enhanced surround       X    X    X    X    X    X    o    X

Further complicating the designations is a decimal component sometimes listed in the count of channels (for example, 5.1 or 7.1). The decimal indicates an additional sound source that doesn't quite reach to being a full-range channel because of limited bandwidth. Sometimes this extra channel is termed the low frequency effects (LFE) channel. Its frequency range is limited to that of nondirectional low frequencies (below 100 to 150 Hz), as would be used to power a subwoofer. Nearly all theater-style surround sound systems use an LFE channel to accentuate explosions and other impacts so that their low-frequency components might be felt as readily as they are heard.
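Both notations are easy to take apart mechanically. The sketch below assumes well-formed strings such as "3/2" or "5.1" and does no error checking; the function names are hypothetical.

```python
def parse_slash(designation):
    """Split a front/rear designation such as '3/2' into speaker counts."""
    front, rear = designation.split("/")
    return int(front), int(rear)


def parse_decimal(designation):
    """Split a channel count such as '5.1' into full-range and LFE channels."""
    full_range, lfe = designation.split(".")
    return int(full_range), int(lfe)


front, rear = parse_slash("3/2")
print("3/2:", front, "front speakers,", rear, "rear speakers")

full_range, lfe = parse_decimal("5.1")
print("5.1:", full_range, "full-range channels,", lfe, "LFE channel")
```

Note that the 3/2 speaker arrangement and the 5.1 channel count describe the same five full-range channels in two different ways.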

AC-3

More commonly known as Dolby Digital (Dolby Labs holds patents on the perceptual encoding system it uses), AC-3 is a 5.1 channel sound system. In other words, it uses five full-bandwidth (20 Hz to 20KHz) channels along with one reduced-bandwidth (20 Hz to 120 Hz) channel dedicated to low frequency effects. Speakers for a full-fledged AC-3 system are arrayed in the 3/2 configuration with three in front (left, center, and right) and two in the rear (left and right). The subwoofer can be put in any convenient location. AC-3 also allows for other configurations, including plain mono and stereo.

AC-3 is the standard sound format of movies distributed on DVD, having been approved for that purpose in December 1995. In the DVD system, all six channels of AC-3 are discrete, separately recorded and isolated from the others. The full-bandwidth channels are encoded with a sampling rate of 48KHz and a depth of 16 bits.

In operation, AC-3 compresses the raw audio data encompassing all the channels down to a bit-rate that can range from 64Kbps to 448Kbps. Typically, in stereo the bit-rate will be 192Kbps. In full 5.1 channel configuration, the bit-rate runs about 384Kbps.
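To get a feel for how much compression these bit-rates imply, you can compare them with the raw data rate of the uncompressed samples. The sketch below is rough arithmetic using the figures given above; it assumes that Kbps means 1000 bits per second and counts only the full-bandwidth channels, ignoring the LFE channel. The function names are made up for the example.

```python
SAMPLE_RATE_HZ = 48_000   # sampling rate of the full-bandwidth channels
BIT_DEPTH = 16            # bits per sample

def raw_kbps(channels):
    """Uncompressed data rate in Kbps (assuming 1 Kbps = 1000 bits/second)."""
    return channels * SAMPLE_RATE_HZ * BIT_DEPTH / 1000

def compression_ratio(channels, encoded_kbps):
    """How many times smaller the encoded stream is than the raw samples."""
    return raw_kbps(channels) / encoded_kbps

stereo_ratio = compression_ratio(2, 192)     # two channels at 192Kbps
surround_ratio = compression_ratio(5, 384)   # five full-range channels at 384Kbps
print(f"stereo {stereo_ratio:.0f}:1, surround {surround_ratio:.0f}:1")
```

Under these assumptions, stereo works out to about 8 to 1 and the five full-range channels to about 10 to 1.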

AC-3 is the standard for sound on NTSC-based DVD-Video discs. It has also been adopted as the sound system for the American digital television transmission standard (ATSC). As such, the full text of the AC-3 standard is available from the ATSC Web site at www.atsc.org.

DTS

Digital Theater Systems (DTS) began as a proprietary system for providing top-quality digital surround sound to motion picture theaters. Using the same encoding and compression system as it applied to professional applications, DTS created an alternative audio format for digital source material. The DTS system is an option for both audio and video DVDs. Nearly all DVD players lack the facility for decoding DTS signals, although you can usually connect an auxiliary processor to handle them.

As with most of the sound formats for DVD, the DTS system began as an enhancement for theatrical films. The professional DTS system syncs a CD-based digital playback system with movie projectors using a timecode encoded along with conventional sound tracks on the film itself. The timecode allows the digital audio source to be exactly locked to the film even after editing or repair (cutting and splicing) of the film.

In DVD form, the DTS sound stream encodes up to 5.1 channels sampled at 48KHz with a depth of up to 20 bits. It allows all standard channel combinations from mono to 3/2 surround with an LFE channel. The chief advantage of DTS is that it uses a lower compression ratio (about 4 to 1) than does Dolby Digital. As a result, it holds the potential of delivering higher-quality audio. The bit stream may have a data rate from 64Kbps to 1536Kbps.

SDDS

By acquiring Columbia Pictures, Sony became a major player in the motion picture business. The company is not only active in software but has also developed its own high-quality digital surround system called Sony Dynamic Digital Sound (SDDS). Based on the same compression system used by the zombie MiniDisc system, SDDS can encode up to eight channels in a 1280Kbps stream. Typically it samples 5.1 or 7.1 channels at 48KHz and a depth of 16 bits. SDDS is an optional DVD format.

Capturing Sounds

If you're not content using the sound given you by others and don't want to go to the trouble of synthesizing your own, you may look longingly at all the noises already available to you that just aren't in computer-compatible form. You might have a stack of old vinyl albums you want to transfer to CD; you might want to move a cut from a CD into an MP3 file for more compact listening on your notebook computer while you travel; or you may want to steal a sound from a television program or movie to spice up your computer's otherwise drab responses (my system has Homer Simpson crying "Doh!" every time I hit the wrong key). Equipped with a modern soundboard and suitable software, your computer can readily capture any audio source you have for whatever purpose you want.

As with most computing matters, however, you have several choices in capturing sounds. You can directly transfer the digital data from CD, you can digitize analog data from conventional audio sources, or you can make original recordings. Each technique has its own methodology and produces different results.

CD-Ripping

Audio Compact Discs already contain digital audio information, so you might think that capturing the data would be easy. After all, your computer already can tap its built-in CD drive to get the data it needs to set up programs. Matters are not quite so simple, however. The format for audio CDs is substantially different from the Yellow Book standard used by digital data CDs. The differences extend to formatting and error correction. Simply put, your ears tolerate audio errors much better than digital systems tolerate errors in code. Although a data error may be inaudible to you when you listen to a CD playing back, the data stream may include errors that will send digital circuits into conniptions. If your computer cannot detect the audio errors, they will be faithfully encoded as digital data in the proper format but with the wrong value. Transient audio errors become a permanent part of the data—and the digitally encoded audio that you listen to.

The ability of a CD drive to deliver audio in digital form to your computer is called digital audio extraction. The process requires special software that has come to be known as the CD ripper. Similarly, the extraction process is often called CD-ripping.

CD drives differ widely in their ability to yield up pure audio data. Most early drives cannot faithfully extract audio data. They are unable to properly frame that audio data into digital form, resulting in jitter. The aural result is distortion in the digital audio, most often in the form of loud clicks that repeat throughout a transfer, as if someone had thoroughly rubbed 80-grit sandpaper over your favorite vinyl album.

Note that the quality of digital audio extraction is primarily dependent on the CD drive that you use. Older ATAPI-based (IDE) CD drives were poor at digital audio extraction. Music ripped using them contained numerous objectionable clicks and pops, if you could rip from them at all. Today's drives do a much better job. The only time you're likely to encounter clicks and pops is when ripping copy-protected commercial CDs. The noise is intentional.

CD-ripping is fast. With a fast microprocessor, your computer will likely rip CDs at the highest speed of which your CD drive is capable. In other words, you may be able to rip a cut in one-sixth to one-thirty-second of the time it would take to play back.
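The arithmetic is simple enough to check for yourself. This short sketch assumes a hypothetical four-minute cut and the two extremes of extraction speed mentioned above; real drives often extract audio more slowly than their rated data speed, so treat the results as best-case figures.

```python
def rip_seconds(track_seconds, speed_multiple):
    """Best-case time to rip a cut at a given multiple of playback speed."""
    return track_seconds / speed_multiple

track = 4 * 60  # hypothetical four-minute cut, in seconds

for speed in (6, 32):
    print(f"{speed}x: {rip_seconds(track, speed):.1f} seconds")
```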

Analog Capture

The alternative to ripping digital audio is to capture audio in analog form and convert it to digital. All soundboards have analog-to-digital converter circuits that carry out this process, which is nothing more than ordinary audio sampling. You need only supply the analog source to your soundboard and specify the parameters of the sample: sampling rate and bit-depth.
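To see what those two parameters control, here is a minimal sketch of the sampling process in software. It does not talk to a soundboard; a computed sine wave stands in for the analog source, and the function names and the 440 Hz test tone are assumptions for illustration. The defaults of 44,100 Hz and 16 bits match CD audio.

```python
import io
import math
import struct
import wave

def sample_tone(freq_hz, seconds, sample_rate=44_100, bit_depth=16):
    """Sample a sine wave and quantize each sample to a signed integer."""
    peak = 2 ** (bit_depth - 1) - 1          # largest positive sample value
    count = int(seconds * sample_rate)       # sampling rate sets how many samples
    return [round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate))
            for n in range(count)]

def write_wav(target, samples, sample_rate=44_100):
    """Store 16-bit mono samples in a WAV container."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)                  # mono
        wav.setsampwidth(2)                  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))

samples = sample_tone(440, 0.5)              # half a second of a 440 Hz tone
buffer = io.BytesIO()                        # in-memory stand-in for a file on disk
write_wav(buffer, samples)
print(len(samples), "samples,", len(buffer.getvalue()), "bytes")
```

Doubling the sampling rate doubles the number of samples, and every extra bit of depth doubles the number of levels each sample can take.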

Internal CD

If your source material is CD, you can use the internal CD drive of your computer for analog capture even if it is not capable of digital audio extraction. Use the mixer software supplied with your soundboard to specify the CD drive as the source and then mute or switch off other sources. Your soundboard will then sample only the audio from your CD.

The quality you get from analog capture will be lower than with direct extraction. Your samples will pick up all noise and distortion present in the analog circuitry of your soundboard, which can be substantial. For example, while the noise floor of CD audio is about 96 dB below peak level, going through the analog audio circuitry of both the CD drive and soundboard may increase the noise so that it is only 60 to 70 dB below peak audio. This level of noise is often not objectionable and can actually sound superior to audio extracted from a marginal CD drive that is rife with clicks and pops. In fact, this noise level is comparable with optimum FM radio reception. If you are a purist, you can remove most of the residual noise with the digital noise-reduction functions available in some audio-editing programs.
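The 96 dB figure follows directly from the 16-bit depth of CD audio, because each bit adds about 6 dB of range. The sketch below works through that relationship and then runs it backward to show roughly how many bits of resolution survive a 60 to 70 dB noise floor. It uses the simple 20 log10 rule of thumb and ignores finer points of quantization theory; the function names are invented for the example.

```python
import math

def dynamic_range_db(bits):
    """Ratio of the largest to the smallest quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)

def effective_bits(noise_floor_db):
    """Approximate bit-depth that a given noise floor is equivalent to."""
    return noise_floor_db / (20 * math.log10(2))

print(f"16 bits: {dynamic_range_db(16):.1f} dB")
for floor in (60, 70):
    print(f"{floor} dB noise floor: about {effective_bits(floor):.1f} bits")
```

In other words, analog capture through noisy circuitry delivers the equivalent of roughly 10 to 12 bits of the 16 that you record.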

Another disadvantage of analog capture of CD audio is speed. Analog capture is a real-time process. Sampling a cut from a CD will require precisely the playback time of the cut. In addition, you may need to edit what you've captured, trimming the silence from the beginning and end of the cut.
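Audio-editing programs handle the trimming for you, but the underlying idea is easy to sketch. The function below drops samples from each end of a capture until it meets one loud enough to count as sound. The function name and the threshold of 500 (on a 16-bit scale) are assumptions for illustration; a noisy capture would need a higher threshold.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start = 0
    end = len(samples)
    # Advance past the quiet samples at the beginning
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Back up past the quiet samples at the end
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

captured = [0, 12, -8, 4000, -3500, 20, 2500, 5, 0, -3]
print(trim_silence(captured))  # [4000, -3500, 20, 2500]
```

Quiet samples in the middle of the cut, such as the 20 in this example, are left alone; only the ends are trimmed.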

External Components

You can also capture sounds from external components using the Auxiliary Input that's available on most (but not all) soundboards. The Auxiliary Input functions exactly like the input to a cassette recorder. The audio signals you plug in there are routed through the mixer function of the soundboard to its analog-to-digital converter to be sampled. Once the audio signals are digitized, you can use the audio data exactly like that ripped from CD.

You can plug an external audio CD player directly into the Auxiliary Input of your soundboard. Better still, you can connect the recorder outputs of your receiver to the Auxiliary Input of your soundboard and use your computer as if it were a cassette recorder, using the receiver to select the external audio source you want to record.

The mixer software supplied with most soundboards also has a MIDI input that allows you to translate MIDI files—both those you make and those you obtain elsewhere—into digital audio files that you can play back like any other music file. You can make recordings by selecting the MIDI source and using any sound-recording software. Note, however, that most soundboards use their own MIDI circuitry to synthesize the sounds, so the capabilities and idiosyncrasies of your soundboard will determine the musical voices in the MIDI recordings that you make.

Microphone Inputs

The worst way to try to capture audio from a CD is to shove a microphone in front of a speaker. Not only does the quality suffer the indignity of conversion from digital to analog format, but it also suffers the substantial shortcomings of both the speaker and the microphone. You might just as well listen to a friend play the CD over the telephone.

Using microphones is the only way to capture live sounds, be it your own speech or the band infesting your garage that no amount of warfarin will deter. Most soundboards have microphone inputs that allow you to plug directly in and start recording simply by selecting the microphone (or mic) source in your soundboard's mixer program.

That said, you should be aware that most soundboard microphone inputs are designed primarily for speech. Although they are able to accept a wide frequency range, they have a high noise level because their circuits are immersed inside the noisy electronics of your computer. You can obtain better quality recordings by using an external microphone preamplifier such as that in a mixing console (or "board").

Note that most soundboard microphone inputs are stereophonic and use stereophonic miniature jacks with three connections—tip, ring, and sleeve. A single monophonic microphone with a standard miniature plug (tip and sleeve) usually won't work with these inputs. Adapters are readily available to let you use a single microphone with a stereo input or to combine the signals from two microphones into a single plug to match your soundboard. Modern microphones designed for computer applications are usually equipped with stereo plugs to match the inputs of soundboards.

Legal Issues

Commercial recordings—in general that means all the CDs that you've bought—are protected by copyright laws. You should not post what you've captured or sampled on the Internet. You should not use material in a production (for example, a slideshow, training video, or simply as background sound) without the permission of the copyright holder. You should never sell a copy of anything that you've sampled from a copyrighted source such as a CD.

You can back up a CD that you have bought to an MP3 file for your personal use (for example, to make an MP3 file that you can listen to on a portable MP3 player). Current copyright law allows you to make a single backup of copyrighted software that you have bought. It's no different from making a copy of a disc on a cassette to play in your car. (Some folks question the legality of making such copies, but as long as you make only one copy and do it for personal use only, you won't have to worry about the copyright police pounding on your door.) On the other hand, it is both illegal and immoral to copy tracks from discs that you have borrowed from a friend or library.
