Voice Fundamentals Book

Upload: siddharta-mangas-espinosa

Posted on 03-Jun-2018

  • 8/12/2019 Voice Fundamentals Book

    1/127

Foreword

After more than 100 years of experience in providing global telecommunications solutions, Nortel (Northern Telecom) has acquired Bay Networks, Inc., adding world-class, IP-based data communications capabilities that complement and expand Nortel's acknowledged strengths. This precedent-setting union creates Nortel Networks, a new company with a widely respected heritage and a unique market position: Unified Networks.

Unified Networks create greater value for customers worldwide through network solutions that integrate data networking and telephony. The Unified Networks strategy extends to solutions, products, and services, delivering new economics in networking by reducing costs, introducing higher-revenue services, and delivering new value derived through networking.

The emergence of the World Wide Web and increasing market deregulation have created a strong demand for networks that provide increased profitability and higher service levels for organizations of all types. Unified Networks from Nortel Networks deliver cutting-edge solutions to reach these new levels of economics.

To meet increased needs, Nortel Networks is delivering a new class of customer relationships. Extranets, intranets, Web access, e-mail, call centers, and old-fashioned personal attention are combined to help customers deal with a wide range of new, challenging, and potentially confusing issues. Whether a customer is at the enterprise level, a service provider, or a small business, Nortel Networks delivers Unified Networks solutions designed to meet their unique business challenges.

Solutions based on Unified Networks strategies can take many forms, involving many different products and technologies, including those that existed prior to the merger with Bay Networks and many launched since. Unified Networks solutions are differentiated only by their size, scope, and ambition.

Each solution is tailored to the unique needs of the customer, and integrates a variety of products, technologies, and services, some of which are described below.

Accelar brings together switching and routing into a low-cost, very high-performance package.

CallPilot unifies disparate messaging systems, user interfaces, and presentation formats, making messaging more intuitive and easier to use.

Introduction

    #1 FRADs - Dataquest 1998

    #1 FRAD revenues - Dell'Oro 1H98

    #1 Packet Switch - Dataquest 1998

    #1 in PADs - Dataquest 1998

Voice Communication

Voice communication has long existed in the world of analog and digital telephone exchanges. Fixed, dedicated, switched services have provided the user with the ability to place telephone calls to practically anywhere in the world. The way that voice is carried and switched within both public and private networks is changing with the evolution towards the Broadband Integrated Services Digital Network (B-ISDN), as based upon Asynchronous Transfer Mode (ATM) technology. This evolution is accelerating the shift to enterprise networks capable of handling voice, video, and data transmissions over a single, integrated infrastructure.

This evolution delivers numerous benefits, including more efficient use of network bandwidth and the ability to offer many different types of voice service. However, there are many issues that first need to be understood, and then overcome, before these benefits can be realized.

This booklet examines these issues, highlighting voice communication techniques in use today and investigating those that may become commonplace in the future. Of course, with the limited space available, not every technological aspect can be covered in depth. This booklet is designed to serve as a useful introduction to the main subject areas, and additional references have been included for individuals interested in learning more.

References are given by a number in square brackets and listed in Appendix D, e.g. [3].

Intended Audience

This booklet is intended for a broad range of readers who are involved in the design, implementation, operation, management, or support of enterprise networks carrying both voice and data traffic. It has particular relevance to those with a background in data and/or limited experience in voice technologies, although it will also serve as a useful reference for anyone involved in voice networks today.


Table of Contents

1 Introduction
  1.1 A Short History of the Telephone
2 The Telephone
  2.1 Key Voice Fundamentals
    2.1.1 Frequency
    2.1.2 Levels
  2.2 Basic Operation of the Telephone
    2.2.1 Basic Telephony - Signaling
    2.2.2 Basic Telephony - The Speech Path
  2.3 Other Types of Telephones
    2.3.1 "Two-wire" vs. "Four-wire" Telephony
3 PBX Phone Systems
  3.1 Introduction to the PBX
  3.2 Call Routing in a PBX
  3.3 Voice Interfaces on a PBX
4 Introduction to Digital Voice
  4.1 The Channel Bank
  4.2 Digital Voice - Pulse Code Modulation (PCM) G.711
    4.2.1 A-law and µ-law PCM
    4.2.2 Power of a Digital Signal
    4.2.3 Distortion Resulting from the Digitization Process
  4.3 The Digital 1.544 Mbps PBX Interface (DS-1)
    4.3.1 Physical Interface
    4.3.2 Framing - D4
    4.3.3 Framing - Extended Superframe (ESF)
    4.3.4 Channel Associated Signaling (CAS) on DS-1
    4.3.5 Common Channel Signaling on DS-1
    4.3.6 DS-1 Alarms
  4.4 The Digital 2.048 Mbps PBX Interface (E1)
    4.4.1 Physical Interface - G.703
    4.4.2 Framing Structure - G.704
    4.4.3 Channel Associated Signaling (CAS) on E1
    4.4.4 Common Channel Signaling (CCS) on E1
    4.4.5 E1 Alarms
  4.5 The Need for PBX Synchronization
    4.5.1 PBX Systems Without Synchronization
    4.5.2 PBX Systems With Synchronization
5 Speech Compression
  5.1 Different Coding Types
  5.2 Adaptive Differential Pulse Code Modulation (ADPCM)
  5.3 Code Excited Linear Prediction (CELP)
  5.4 Low Delay-CELP (LD-CELP) ITU-T G.728
  5.5 Conjugate Structure-Algebraic CELP (CS-ACELP) ITU-T G.729
  5.6 Other Compression Techniques
  5.7 Speech Compression Impairments
    5.7.1 Mean Opinion Score (MOS)
    5.7.2 Quantization Distortion Units (QDUs)
    5.7.3 Speech Compression and Voice-Band Data
  5.8 Fax Relay
6 Echo and Echo Control
  6.1 What is Echo?
    6.1.1 Causes of Echo
  6.2 Echo Control Devices
    6.2.1 When Is Echo Control Required?
    6.2.2 Echo Control Devices
  6.3 Echo Suppressors
  6.4 Echo Cancellers
    6.4.1 Nonlinear Processor
    6.4.2 Tail Circuit Considerations
    6.4.3 Types of Echo Cancellers
    6.4.4 Tone Disabling of Echo Cancellers and Echo Suppressors
    6.4.5 G.168 (Improved Echo Canceller)
7 Introduction to Signaling Systems
  7.1 Analog Signaling Systems
    7.1.1 Ground Start: 2-Way PBX to Public Exchange Trunk Circuit
    7.1.2 E&M Trunk
    7.1.3 AC Signaling Schemes
    7.1.4 Manual Signaling
  7.2 Digital Signaling Systems
    7.2.1 Channel Associated Signaling (CAS)
    7.2.2 Common Channel Signaling (CCS)
      7.2.2.1 Private-to-Public Networking Protocols
      7.2.2.2 Public-to-Public Networking Protocol
      7.2.2.3 Private Networking Protocols
      7.2.2.4 How Does CCS Work?
8 Voice Within the Enterprise Network
  8.1 What is an Enterprise Network?
    8.1.1 Different Types of Enterprise Networks
  8.2 Time Division Multiplexers (TDM)
    8.2.1 TDM Synchronization
9 Voice Over Asynchronous Transfer Mode (ATM)
  9.1 Introduction to ATM
    9.1.1 The ATM Cell
    9.1.2 The ATM Adaptation Layers
    9.1.3 ATM Service Categories
    9.1.4 Statistical Multiplexing with ATM
  9.2 Voice and Telephony Over ATM (VTOA)
    9.2.1 Voice Sample Cellification
    9.2.2 Speech Activity Detection
  9.3 PBX Synchronization Across an ATM Network
    9.3.1 Synchronous Residual Time Stamp (SRTS)
    9.3.2 Adaptive Clock Recovery (ACR)
    9.3.3 Independent Timing
10 Voice Over Frame Relay
  10.1 Introduction to Frame Relay
  10.2 Voice Over Frame Relay (VoFR)
    10.2.1 Delay on Frame Relay Networks
    10.2.2 VoFR Standards
  10.3 Benefits and Issues of VoFR
    10.3.1 Benefits
    10.3.2 Issues
11 Voice Over IP
  11.1 What is IP?
    11.1.1 How Voice Over IP Works
    11.1.2 Benefits and Issues of Voice Over IP
    11.1.3 Standards
12 Voice Switching in the Enterprise Network
  12.1 The Evolution of Voice Networks
  12.2 What is Voice Switching?
  12.3 Why Perform Voice Switching?
APPENDIX A - Company and Product Overview
APPENDIX B - Introduction to Decibels
APPENDIX C - Glossary of Terms
APPENDIX D - References

1 Introduction

This booklet provides an introduction to many of the fundamentals of voice communication, beginning with analog techniques and concluding with voice over ATM. Some of the ways that enterprise networks can be created will be discussed, along with how to make the most efficient use of available network bandwidth through techniques such as speech compression and speech activity detection (also known as silence suppression). Many other issues are discussed, including how and why echo occurs and techniques that can be used to overcome it, thus allowing network managers to migrate their voice networks onto ATM.

1.1 A Short History of the Telephone

In the mid-1870s, while trying to understand sound and sound communications, Scottish-born inventor Alexander Graham Bell had an idea for a device that would transmit sound over long distances by converting the sound to an electrical signal. This device was later called the telephone, derived from the Greek words meaning 'far' (tele) and 'sound' (phone). Bell was not the only person of the time developing a telephone device. However, he was the first to patent the device, in 1876.

Further developments were made to the telephone during the late 1870s. Bell created the induction-based earpiece, and Thomas Edison was responsible for the design of the carbon microphone. The incorporation of these enhancements produced a truly practical instrument.

Initially the telephone had no mechanism for dialing another number. To make a call, a handle needed to be turned, which generated an electric current. This current signaled the operator of the local exchange. To connect a caller to the called party, the operator would manually insert a jack plug into the corresponding jack socket.

It wasn't until 1889 that Almon B. Strowger developed the automatic telephone exchange. The most unlikely of people to be involved in telephony, Strowger developed the exchange as a way of beating his business rival in Kansas City, USA. The wife of Strowger's main competitor was the operator of the local exchange, and whenever a call came in asking for an undertaker, naturally she passed it on to her husband. To alleviate this problem, Strowger developed the first automatic telephone exchange and the dial telephone, eliminating the need for an operator.

Telephone networks have undergone many changes since those early days. However, many of the underlying principles remain the same. The basic "two-wire" telephone used in most domestic homes today still operates in essentially the same way as the telephones of over 100 years ago.


2 The Telephone

In this section, the basic operation of the telephone is examined with a look at the two basic functions that it offers: signaling and speech transmission. To better understand this critical piece of equipment, it is important to appreciate how human voice and hearing function. To complete this section, other types of telephones will be examined, including proprietary designs and digital telephone sets.

2.1 Key Voice Fundamentals

2.1.1 Frequency

Human speech occurs as a result of air being forced from the lungs, through the vocal cords, and along the vocal tract, which extends from an opening in the vocal cords to the mouth and nose. Speech consists of a number of different types of sounds, including voiced, unvoiced, and plosive sounds. The voiced sounds result from the vocal cords vibrating, thus interrupting the flow of air from the lungs and producing sounds in the frequency range of approximately 50 to 500 Hertz (Hz). Unvoiced sounds result when the air passes some obstacle in the mouth or a constriction in the vocal tract. Finally, plosive sounds result from air being let out with a sudden burst, for example when the vocal tract is closed then suddenly released, or when the mouth is closed and suddenly opened. A person's nasal cavities and sinuses also modify all of these sounds, and all contribute to what we know as normal human speech.

The range of frequencies that result from these sound sources, combined with the structure of the vocal tract, nasal cavities, and sinuses, varies depending upon who is actually speaking. The resulting mix of frequencies determines the unique sound of a person's voice.

The range of frequencies produced by speech varies significantly from one person to another, as explained above. Normally, frequencies in the range of about 50 Hz upward are generated, with the majority of the energy concentrated between about 300 Hz and 3 kilohertz (kHz). The human ear, on the other hand, can detect sounds over a range of frequencies from around 20 Hz to 20 kHz, with maximum sensitivity in the region between 300 Hz and 10 kHz.

Taking these two factors into account, as well as the results of practical testing, the frequency band of 300 Hz to 3.4 kHz has been found to be the key sonic range for speech intelligibility and voice recognition. Reducing this bandwidth quickly reduces intelligibility, while increasing it adds quality yet does not significantly improve intelligibility or voice recognition.


As a result, the frequency band used in telephone systems is limited to between 300 Hz and 3.4 kHz, delivering a system that provides speech transmission that is quickly recognized and easily understood.
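One practical consequence of this band limit, worth noting ahead of the digital voice material in Section 4.2, is that it fixes the rate at which telephone speech must later be sampled. A minimal sketch (not from the booklet; the constant names are illustrative, and the 8 kHz figure is the sampling rate standardized for PCM):

```python
# By the Nyquist criterion, a band-limited signal must be sampled at
# more than twice its highest frequency to be represented without
# aliasing. The telephone band tops out at 3.4 kHz.

BAND_HIGH_HZ = 3400          # upper edge of the telephone band
PCM_RATE_HZ = 8000           # sampling rate standardized for PCM (G.711)

nyquist_minimum_hz = 2 * BAND_HIGH_HZ   # 6800 samples per second

assert PCM_RATE_HZ > nyquist_minimum_hz
print(nyquist_minimum_hz, PCM_RATE_HZ)  # 6800 8000
```

The standard 8 kHz rate comfortably exceeds the 6.8 kHz minimum, leaving room for the practical roll-off of anti-aliasing filters.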

2.1.2 Levels

It is important to ensure that voice signals are transmitted at the correct level across a network, so that end-to-end performance is maintained. Too low a level can result in speech merging into background noise, creating an environment where the listener finds it hard to hear the talker and is encouraged to talk loudly. On the other hand, too high a level will encourage the listener to talk too quietly.

Today, international voice communication is part of everyday life. People need to be able to communicate with others anywhere in the world as effectively as if they were in their own country, or even in their own office. This goal is complicated by the way telephone systems have evolved differently in different countries. For example, an analog telephone (the term analog is described in Section 4.2) from North America transmits a lower-level electrical signal for a given acoustic volume than a telephone in the UK.

Signal levels will be discussed in this booklet in terms of decibels (dB), and related terms such as dBm, dBm0, and dBr. Readers unfamiliar with these terms, or individuals who simply need a refresher, should refer to Appendix B.
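Appendix B introduces decibels formally; as a quick taste of the dBm unit (power expressed relative to 1 milliwatt), the conversions are simple logarithms. A generic sketch, not taken from the booklet:

```python
import math

def mw_to_dbm(power_mw: float) -> float:
    """Express a power in milliwatts as dBm (dB relative to 1 mW)."""
    return 10.0 * math.log10(power_mw)

def dbm_to_mw(level_dbm: float) -> float:
    """Inverse conversion: a dBm level back to milliwatts."""
    return 10.0 ** (level_dbm / 10.0)

# 1 mW is 0 dBm by definition; doubling the power adds about 3 dB.
print(mw_to_dbm(1.0))            # 0.0
print(round(mw_to_dbm(2.0), 2))  # 3.01
```

The logarithmic scale is what makes chained gains and losses easy to total: stages in a transmission path add in dB rather than multiply in raw power.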

2.2 Basic Operation of the Telephone

Telephones come in many varieties, yet they fall into two main categories: analog and digital. The original sets designed by A. G. Bell were analog. In fact, most telephones used in domestic environments are still analog.

The simplest form of telephone today is the two-wire "loop-disconnect" telephone. It is also known by various other names, including "loop-start" and "POTS" (Plain Old Telephone Service) telephone. It connects to the telephone exchange via two wires that carry the voice signals in both directions, hence the term two-wire telephone. The wires also carry the dialed digits to the exchange and the incoming ringing voltage to the phone. The exchange places a voltage of about 48 volts across the pair of wires to power the telephone and monitor the on-hook, off-hook, and pulse dialing activity.

2.2.1 Basic Telephony - Signaling

To initiate a call, the user lifts the handset. This action closes a switch in the telephone and causes current to flow in a loop, hence the term "loop-start." The exchange detects this current as an incoming call and provides a dial tone to the line. The dial tone signals to the user that they may now start to dial. Dialing before hearing the dial tone may result in digits being missed by the exchange; however, modern exchanges will usually return dial tone immediately after detecting current flow. Upon hearing the dial tone, the user begins to dial the called number. If the telephone is set to pulse dial, the telephone rapidly opens and closes the loop at a rate of approximately 10 PPS (Pulses Per Second) or 20 PPS. This is also referred to as loop-disconnect dialing. Figure 2.1 shows the progress of a call from the handset being lifted and dial tone being returned, to the first digit being dialed (a 3 in this case).

The dial speed and the make/break ratio are standards that were set in the past. They reflect the characteristics of switching equipment and direct control switches. The make/break ratio varies according to the different dial pulse receivers used in different countries (e.g. North America: 61/39, UK: 67/33, Germany: 60/40). The ratio 50/50 was not chosen because it did not match the characteristics of mechanical relays and switches in the switching systems.
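The timing implied by a pulse rate and a make/break ratio can be worked out directly. A minimal sketch (the ratios are those quoted above; the helper name and the 0-sends-10-pulses convention are illustrative):

```python
def pulse_timing(digit: int, pps: int = 10, break_pct: int = 61):
    """Return (break_ms, make_ms, total_ms) for pulse-dialing one digit.

    At `pps` pulses per second, each pulse period is 1000/pps ms, split
    between loop-open (break) and loop-closed (make) time according to
    the make/break ratio. A dialed digit N produces N pulses; 0 is
    conventionally sent as 10 pulses.
    """
    pulses = 10 if digit == 0 else digit
    period_ms = 1000 / pps                  # 100 ms per pulse at 10 PPS
    break_ms = period_ms * break_pct / 100  # loop-open time, e.g. 61 ms
    make_ms = period_ms - break_ms          # loop-closed time, e.g. 39 ms
    return break_ms, make_ms, pulses * period_ms

# Dialing a 3 with the North American 61/39 ratio at 10 PPS:
print(pulse_timing(3))                  # (61.0, 39.0, 300.0)
# The UK 67/33 ratio changes the split, not the overall pulse rate:
print(pulse_timing(3, break_pct=67))    # (67.0, 33.0, 300.0)
```

This also makes the speed disadvantage of pulse dialing concrete: a 0 takes a full second of pulsing before the inter-digit pause, whereas a DTMF digit takes a fixed, short burst regardless of its value.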

An alternative way of sending dialing information, called Dual Tone Multi-Frequency (DTMF), is much more common today. In this form of signaling, each number is represented by two tones that are transmitted simultaneously on the voice path for a short period of time.

Figure 2.1 Operation of Loop-Disconnect Dialing

Figure 2.2 DTMF Frequencies


The frequencies used are shown in Figure 2.2 and defined in ITU-T Recommendation Q.23 [1].

DTMF transmits digits much faster than pulse dialing, and the time taken to send each digit is independent of the digit being sent. An additional benefit of DTMF is that once the call is established, pressing a key on the phone will transmit the tones over the voice path, enabling DTMF to be used to access voice mail, home banking systems, and other tone-based systems.
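The two-tone scheme can be sketched in a few lines. The row and column frequencies below are the ones defined in Q.23 (and shown in Figure 2.2); the generator itself is an illustrative sketch, not part of the booklet:

```python
import math

# ITU-T Q.23 row and column frequencies (Hz) for the DTMF keypad.
DTMF = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}

def dtmf_samples(key: str, duration_s: float = 0.1, rate_hz: int = 8000):
    """Sum of the key's two sine tones, sampled at `rate_hz`."""
    low, high = DTMF[key]
    n = int(duration_s * rate_hz)
    return [0.5 * math.sin(2 * math.pi * low * i / rate_hz)
            + 0.5 * math.sin(2 * math.pi * high * i / rate_hz)
            for i in range(n)]

tone = dtmf_samples("3")
print(len(tone))   # 800 samples for a 100 ms burst at 8 kHz
```

Using two simultaneous tones, one from each group, is what lets a receiver reject talk-off: ordinary speech rarely contains exactly one row frequency and one column frequency at steady amplitude at the same time.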

When an incoming call arrives at the telephone set, the exchange applies an AC ringing voltage to the pair of wires. To answer the incoming call, the user picks up the handset. This action applies a loop to the line that is detected by the exchange, which then removes the ringing and connects the voice path through.

Recall - Recall is a function usually available on a simple two-wire analog telephone (except for older models). It is often accessed with a button marked "R", and can be used for a number of functions, such as accessing additional features from a telephone exchange or swapping between calls on the same line. There are two types of Recall, namely Timed Break Recall (TBR) and Earth Recall (ER). With TBR, pressing the Recall button while the handset is off-hook causes the phone to put a timed break on the line (similar to dialing a "1"). With the phone set to Earth Recall, the phone momentarily applies a ground (earth) to one of its leads, known as the B lead.

Figure 2.3 Two-wire Telephone Set Interface

2.2.2 Basic Telephony - The Speech Path

Apart from transmitting dialing information, the main function of the telephone is to provide voice communications. As already mentioned, the simple telephone has to provide simultaneous voice paths in both directions even though there are only two wires. It achieves this through the use of a hybrid circuit, the purpose of which is to take four-wire speech (i.e. separate paths for transmit and receive) and to combine the two onto a single two-wire path. Figure 2.3 shows a simplified view of the interface between a two-wire telephone and the exchange. Speech from the mouthpiece will pass through the hybrid and onto the telephone line. However, for reasons described below, a certain amount of the speech signal will be reflected back to the earpiece; this reflected signal is referred to as sidetone.

The telephone, the telephone line, and the exchange interface each present an impedance that determines the relationship between the voltage and current on the line. To enable the maximum amount of power to be transferred from the telephone to the line and into the exchange interface (and vice versa), the impedances should be as close as possible to one another. This is achieved through the use of a balance impedance.

As long as the balance impedance closely matches the impedance presented by the line and the exchange interface, minimal amounts of signal will be reflected back to the telephone earpiece. If there is a significant impedance "mismatch", then sidetone increases as more signal is reflected back to the earpiece. At the exchange interface, the speech signal originating from the telephone mouthpiece is directed towards caller "Y" by the hybrid.

In the opposite direction, a speech signal from caller "X" at the exchange is transmitted to the two-wire line, and as long as the exchange balance impedance matches the line and telephone set, little of this signal is reflected back to caller "Y".
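The effect of a mismatch can be quantified: the fraction of a signal reflected at the interface is given by the reflection coefficient, and its level in decibels is the return loss. A small sketch using purely resistive impedances (a simplification assumed for illustration; real line impedances are complex, and the 600/900 ohm values are only example figures):

```python
import math

def reflection_coefficient(z_line: float, z_balance: float) -> float:
    """Fraction of the signal voltage reflected at the hybrid, for
    purely resistive line and balance impedances (in ohms)."""
    return (z_line - z_balance) / (z_line + z_balance)

def return_loss_db(z_line: float, z_balance: float) -> float:
    """Return loss in dB; higher means less reflection (less sidetone)."""
    gamma = abs(reflection_coefficient(z_line, z_balance))
    return float("inf") if gamma == 0 else -20.0 * math.log10(gamma)

print(return_loss_db(600.0, 600.0))             # inf: perfect match
print(round(return_loss_db(600.0, 900.0), 1))   # 14.0 dB for a 600/900 ohm mismatch
```

A perfect match reflects nothing; the further the balance impedance drifts from the line impedance, the lower the return loss and the louder the sidetone (and, across a network, the echo discussed in Section 6).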

2.3 Other Types of Telephones

While the simple two-wire telephone set is used extensively in the domestic environment, it is less common in private networks based on Private Branch Exchange (PBX) systems. In this environment, it is common to find feature telephone sets that offer a wider range of facilities than the basic telephone. These feature sets will vary from manufacturer to manufacturer, each one usually being of a proprietary design. Some will be analog, while others will be digital. Most PBX systems today are of the digital variety, although this does not necessarily imply the use of a digital telephone set: most allow both analog and digital phones to be used. With an analog phone, the interface in the PBX converts the analog signals to digital PCM (see Section 4.2). A digital telephone performs the analog/digital and digital/analog conversions within the set.

    Some digital telephones are of a proprietary nature, where the format of the digital data

    is manufacturer-specific. Many digital sets, however, conform to the Basic Rate ISDN

    standards as outlined in ITU-T Recommendation I.420 [2]. This defines a digital

    interface that carries two channels operating at 64 Kbps (known as the B or Bearer

    channels) and a 16 kilobits per second (Kbps) signaling channel (known as the D or

    Delta channel).

    The B channels can be used to carry data or digitized voice in a similar manner to the

    Primary Rate interfaces as described in sections 4.3 and 4.4. The main difference is that

    on a Basic Rate interface only two B channels are available. The D channel is normally

    used to carry Common Channel Signaling (CCS) information for call control in a similar

    manner to CCS on Primary Rate interfaces (see Section 7.2). However, the specifica-

    tions also allow the D channel to be used to carry user data, including packet-switched

    and frame-based data.
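    As a quick check, the usable Basic Rate capacity follows directly from the channel rates above:

```python
B_CHANNELS = 2     # two 64 Kbps Bearer (B) channels
B_RATE = 64        # Kbps per B channel
D_RATE = 16        # Kbps Delta (D) signaling channel

total = B_CHANNELS * B_RATE + D_RATE
print(total)       # 144 Kbps of usable Basic Rate capacity
```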

    2.3.1 "Two-wire" vs. "Four-wire" Telephony

    References are frequently made to two-wire or four-wire telephone sets. Care should

    be taken when interpreting the meaning of these terms, since they can be confusing. A

    two-wire telephone means that speech is carried in both directions on the same pair of

    wires, and requires hybrid circuits to split the two paths into separate transmit and

    receive functions at the telephone set. It is also possible, particularly in some proprietary

    telephone sets, to have additional wires for signaling purposes. In this case, a two-wire

    telephone may actually be connected to the exchange by more than two wires.

    A four-wire telephone strictly means that speech is carried in each direction on separate

    pairs of wire and no hybrid circuit is necessary. However, a better definition of four-wire

    telephony is that the speech is carried on separate paths that may be pairs of wire, or

    might even be separate channels in a digital system. For example, once into a digital

    telephone exchange, speech is carried as "four-wire" even though it is carried in

    timeslots in high-speed digital signals rather than actually on four wires.


    3 PBX Phone Systems

    This section is not intended to be a detailed description of the operation of PBX phone

    systems. However, in order to better understand some of the other sections presented

    in this booklet, some of their key aspects will be covered.

    The PBX system will be examined, followed by a quick look at how telephone calls are

    routed within a PBX network. Finally, the different types of voice interfaces normally

    available on a PBX will be introduced.

    3.1 Introduction to the PBX

    Simply put, a PBX is a telephone exchange that is privately owned by an organization.

    Its objective is to provide voice (and often data) communications to its users. In addition

    to offering simple call setup facilities, it also offers many other features and facilities to

    make life easier for its users. Users can place calls to others in their organization by

    simply dialing their extension number. To place a call to a person not connected to the

    same PBX network requires the call to be routed via the Public Switched Telephone

    Network (PSTN). This usually involves dialing an access code, such as a "9" or "0",

    followed by the complete destination number including country code and area code

    if appropriate.


    Figure 3.1 Typical Components of a Digital PBX


    Most larger PBX systems today are digital. This means that they route connections in a

    digital form, with speech first being converted from analog into PCM (see Section 4.2).

    Figure 3.1 shows the typical components used within a digital PBX.

    The "core" of the PBX is the common control and the switching matrix. The common

    control acts as the "brains", and controls the overall operation of the PBX. It performs

    functions such as recognizing that a phone has been taken off hook and connecting a

    dial-tone generator to the phone, interpreting the dialed digits and routing the call to a

    particular trunk or line interface, and so on. The switching matrix takes in 1.544 Mbps bit

    streams comprising multiple 64 Kbps channels and allows them to be connected, or

    switched, to any other 64 Kbps channel on any other interface.

    Interfaces to the PBX come in two main types: lines and trunks. Lines connect user

    devices such as analog or digital telephone sets, or other devices such as data

    terminals. Trunks are shared links and can carry connections originating from line inter-

    faces on the same PBX or from other trunks also connected to the PBX. Analog trunks

    can only support one connection at a time, while digital trunks can support many

    connections simultaneously (see Sections 4.3 and 4.4). The trunks can then be broken

    down into two additional types: PSTN trunks (also called Central Office trunks) and

    private trunks. PSTN trunks connect the PBX to the public telephone network, and

    private trunks (also called "Tie lines" or "Tie trunks") connect the PBX to other PBXs as

    part of an overall private network.

    3.2 Call Routing in a PBX

    When a user dials a destination number, the PBX needs to determine how to route the

    connection in the most efficient manner. The PBX needs to consider many factors,

    including: Is the number valid? Is this user allowed to connect to the specified destina-

    tion? Which is the cheapest trunk to use? Is there a trunk free?

    Figure 3.2 shows a network of PBX systems connected together with inter-PBX trunks,

    which could be analog, digital, or both. To place a call the user takes Telephone A "off-

    hook" and dials the number for Telephone B. PBX 1 inspects the dialed digits and makes

    a decision as to which trunk to route the call on. In this case it chooses Trunk 1. PBX 1

    seizes Trunk 1 (or a single timeslot of a digital trunk) and passes dialing information

    across it. The method used for dialing will depend upon what type of trunk it is (see

    Section 7 for more details on signaling). PBX 2 receives the call, inspects the dialed

    number, and routes the call accordingly onto PBX 3. PBX 3 inspects the digits, identifies that the destination is located on this exchange, and alerts the user to an incoming call by making the phone ring.
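    The route selection performed by each PBX can be sketched as a longest-prefix lookup on the dialed digits. The table entries and trunk names below are hypothetical, chosen only to illustrate the mechanism:

```python
# Hypothetical digit-prefix routing table: dialed prefix -> chosen trunk.
ROUTES = {
    "9": "PSTN trunk",            # public network access code
    "2": "Trunk 1 (to PBX 2)",
    "3": "Trunk 2 (to PBX 3)",
    "36": "Tie trunk direct to PBX 3",
}

def route_call(dialed: str) -> str:
    """Choose the route whose digit prefix is the longest match."""
    matches = [p for p in ROUTES if dialed.startswith(p)]
    if not matches:
        raise ValueError("no route for " + dialed)
    return ROUTES[max(matches, key=len)]

print(route_call("2501"))   # Trunk 1 (to PBX 2)
print(route_call("3601"))   # Tie trunk direct to PBX 3
```

    A real PBX would also apply the validity, class-of-service, cost, and trunk-availability checks listed above before seizing the chosen trunk.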


    3.3 Voice Interfaces on a PBX

    The types of voice interface available on PBX systems are many and varied. They

    typically fall into three categories:

    Line Interfaces - These are the interfaces on the PBX that connect to desktop tele-

    phones. Line interfaces can include any of the types discussed in Sections 2.2 and 2.3,

    including:

    2-wire analog loop disconnect/loop start

    2- or 4-wire analog proprietary feature set

    4-wire digital set, either proprietary or conforming to Basic Rate ISDN

    standards

    Private Trunk Interfaces - These interfaces provide the links between PBXs within a

    multi-PBX private network. They allow calls to be routed from one PBX to another

    without the need to involve the public network, avoiding extra call cost. Private trunk

    interfaces typically include:

    2- or 4-wire analog with Ear and Mouth (E&M) signaling

    4-wire analog with AC15 signaling

    Digital trunk supporting CAS or CCS signaling

    Public Trunk Interfaces - These provide access from the PBX to the PSTN (Public

    Switched Telephone Network) for outgoing and/or incoming calls. Public trunk interfaces

    typically include:

    Ground Start analog trunk: 2-wire, both-way calling

    Analog Direct Dial In (DDI): 2-wire, typically incoming calls only

    Digital trunk supporting CAS or CCS signaling


    Figure 3.2 Call Routing in a PBX Private Network



    4 Introduction to Digital Voice

    As previously discussed, many of today's voice interfaces rely on analog technology.

    However, once received into a PBX or a wide area network, analog voice is converted

    to a digital format in order to derive all the available benefits. This section looks at the

    background to digital voice, followed by a more detailed look at how analog speech is

    actually converted to a digital format. The integration of digital voice channels into

    primary rate digital interfaces is then presented, and finally the need for synchronization

    in a PBX network is examined.

    4.1 The Channel Bank

    The channel bank was one of the first devices to make use of digital voice in a practical

    environment. It is a device that takes multiple analog voice channels, digitizes them, and

    then multiplexes them onto a high-speed digital link. Two main types of channel bank exist today: one supporting up to 24 voice channels on a 1.544 Mbps digital link, and another supporting 30/31 channels on a 2.048 Mbps link.

    The first channel bank was developed in North America in 1962, and was known as the

    D1 channel bank. It provided 24 analog inputs, each of which was converted to 8-bit

    PCM (although the least significant bit was then ignored). The resulting seven bits were

    used for each voice sample, and one bit was used for signaling; together with a single framing bit per frame, this gives a combined data rate of 1.544 Mbps. It was later found that this seven-bit PCM gave unsatisfactory

    voice quality. Subsequent generations have used eight-bit PCM with "robbed bit"

    signaling (see Section 4.3.4).

    Channel banks have evolved beyond simply supporting analog voice circuits. Today, it

    is common for a channel bank to support multiple interface types other than just the

    analog voice interface. For example:

    2-wire speech with loop disconnect signaling (incoming and outgoing)

    2-/4-wire speech with E&M signaling

    Data 0 - 64 Kbps

    Data n x 64 Kbps

    4.2 Digital Voice - Pulse Code Modulation (PCM) G.711

    Digital voice is a representation of analog voice signals that uses binary "1"s and "0"s,

    also known as bits.


    Figure 4.1 shows the progress of a speech signal entering the telephone, being

    converted into an analog electrical signal, and then being converted into a digital form.

    When the talker speaks, they create variations of pressure in the air. The telephone

    picks up these pressure changes and turns them into an electrical signal that is

    analogous to the acoustic signal from the talker (hence the term analog). This analog

    signal is then converted into a digital stream of data bits which represents the digital

    voice signal.

    But why transport voice in a digital format? There are a number of reasons, including:

    Digital transmission is independent of distance - When an analog signal is transmitted on a transmission channel, it is attenuated by losses in the cable. In addition, noise is picked up

    that will affect the quality of the voice transmission. The signal that arrives

    at the destination is made up of a combination of the original signal and

    line noise. Amplifiers can be used to boost the signal back to the original

    level, but there is no easy way for the amplifier to distinguish between the

    original signal and line noise, so the noise is also amplified.

    Digital signals take on one of two levels, represented by binary "0"s and

    binary "1"s. When noise is introduced into the digital signal, it can be

    easily removed by regenerating equipment. Thus, the signal that arrives

    at the destination is an identical replica of the signal transmitted from the

    source.

    Multiplexing of voice and data - Since the digitized PCM voice is essen-

    tially a data stream running at a bit rate of 64 Kbps, it can be readily inte-

    grated with other PCM channels to make up an aggregate connection

    combining many voice channels in one physical connection. Since it is


    Figure 4.1 Progress of Speech Signal, Analog to Digital


    essentially data, it can be combined with other 'real' data and transmitted over a common transmission medium.

    Pulse Code Modulation (PCM) is the technique used for the digital representation of analog voice signals in telephone systems, and is defined in ITU-T Recommendation G.711 [3]. A simplified view of the process can be seen in Figure 4.2.

    The analog signal is sampled at a rate of 8000 times per second. This rate is derived

    from a theory developed by Harry Nyquist, which states that the sampling rate must be

    at least twice the maximum frequency of the signal being sampled (i.e. > 2 x 3.4 kHz).

    This results in Pulse Amplitude Modulation (PAM), which is simply a series of pulses that

    represent the amplitude of the analog signal at each sample time. Each PAM sample is

    compared to a range of fixed quantization levels, each of which is represented by a fixed

    binary pattern. The binary pattern of the closest quantization level is then used to

    represent the PAM sample.
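    The sampling and quantization steps can be sketched as follows (uniform quantization of samples normalized to the range -1.0 to 1.0; the helper names are our own, not from G.711):

```python
import math

SAMPLE_RATE = 8000   # samples per second, comfortably above 2 x 3.4 kHz

def quantize(sample: float, bits: int) -> int:
    """Map a PAM sample in [-1.0, 1.0] to the nearest of 2**bits uniform levels."""
    levels = 2 ** bits
    return round((sample + 1.0) * (levels - 1) / 2.0)

# One cycle of a 1 kHz tone yields eight PAM samples, each replaced by
# the code of its nearest quantization level.
pam = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(8)]
codes = [quantize(s, 8) for s in pam]
print(codes)
```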


    Figure 4.2 Analog to PCM Conversion

    Figure 4.3 Linear/Uniform and Non-linear/Non-uniform Quantization


    Because only a finite number of quantization levels is available, this process introduces an error into the digital representation of the analog signal. Using more bits per sample provides additional quantization levels and therefore less error. To achieve reasonable quality over the range of speech amplitudes found in networks requires a minimum of 12-bit PCM samples, assuming linear quantization (also known as uniform quantization). A view of linear quantization is given in Figure 4.3a.

    In practice, this number of levels is unnecessary for two reasons. First, average signal

    levels are normally small, and only the lower quantization levels actually get used.

    Second, the human ear operates in a logarithmic manner, being more sensitive to distor-

    tion in low-level signals than in high-level signals.

    As a result, a technique known as companding is used. This reduces the number of quantization levels required by keeping the levels closely spaced at low signal amplitudes and more widely spaced at high amplitudes. This process of companding is shown in Figure 4.3b.

    4.2.1 A-law and μ-law PCM

    There are two common types of PCM, μ-law and A-law, each of which uses a different rule for the companding process. North America and Japan mostly use μ-law, whereas other areas of the world require the use of A-law. Both types are defined in the G.711 Recommendation, yet differ in a number of ways. The companding laws are different, and the allocation of PCM codes differs in relation to the amplitude of the PAM samples.

    With A-law, after converting from PAM to PCM, the even bits of each sample are inverted

    before being transmitted onto the digital transmission path. This bit inversion was origi-

    nally used to ensure that a sufficient number of "1"s existed in the digital stream,

    because any channel that was idle would otherwise produce a pattern of only "0"s. In

    fact, on a 2.048 Mbps PBX interface this inversion is unnecessary, since the problem of too many "0"s is dealt with at the physical layer (see Section 4.4.1).

    Because of these differences, when operating an international network where both μ-law and A-law PCM systems are used, it is important to perform a proper conversion between the two. The conversion process is given in G.711, which defines that digital paths between countries should carry signals encoded using A-law PCM. Where both countries use the same law, then that law should be used on digital paths between them. Any necessary conversion will be done in the countries using μ-law PCM.
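    The companding curves can be illustrated with the continuous μ-law formula. Note this is only a sketch: G.711 itself specifies piecewise-linear segmented approximations (and, for A-law, the even-bit inversion described above), so the code below is illustrative rather than bit-exact:

```python
import math

MU = 255.0   # the mu parameter used in North America and Japan

def compress(x: float) -> float:
    """Continuous mu-law compression of a sample in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def expand(y: float) -> float:
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# An input at 1% of full scale maps to roughly 23% of the output range:
# companding devotes most of the code space to low amplitudes.
print(round(compress(0.01), 3))
print(round(expand(compress(0.01)), 3))
```

    The round trip recovers the input, while small signals occupy a disproportionately large share of the compressed range, which is exactly what preserves low-level speech quality.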

    4.2.2 Power of a Digital Signal

    It is easy to quantify the power levels for an analog interface, since this is something that

    can be measured directly with a power meter. In the PCM world, there is no equivalent

    16

    Vo ice Fu n d a m e n ta ls

    Sect ion 4 - In t ro du c t ion t o Dig i ta l Vo ice


    direct method of measurement. Instead, a specific relationship is defined between an

    analog signal and a digital sequence. The relationship of power to the digital signal is

    defined in G.711 through two tables (one for μ-law and one for A-law) that define a sequence of PCM samples. When decoded, these samples result in a sine wave signal of 1 kHz, at a nominal level of 0 dBm0. This provides a theoretical maximum level of +3.17 dBm0 for μ-law, and +3.14 dBm0 for A-law. Any attempt to exceed these levels

    will result in distortion of the signal, simply because there are no more quantization

    levels.

    4.2.3 Distortion Resulting from the Digitization Process

    When PCM values are allocated to the PAM samples, a certain amount of distortion

    results because of the finite number of quantization levels available to quantize the

    analog signal. Distortion is covered in more depth in Section 7.7.

    4.3 The Digital 1.544 Mbps PBX Interface (DS-1)

    The 1.544 Mbps PBX interface is common to North America and Japan, and is often

    referred to as a "T1" or a "DS-1" interface. (In practice, these two terms are often used interchangeably, although strictly this is incorrect: DS-1 refers to the 1.544 Mbps signal format, while T1 refers to the digital transmission system that carries it.) It offers 23 or 24 traffic timeslots

    depending upon the type of signaling being used.

    In countries supporting DS-1 interfaces, such as in North America, various types of T1

    transmission facilities are offered. AMI facilities expect the attached device (e.g. PBX) to

    provide an AMI electrical signal as described in Section 4.3.1. The main problem with

    this is that long strings of "0"s do not provide any electrical voltage transitions. This can

    result in loss of repeater synchronization on the transmission facility. It is therefore the

    responsibility of the attached equipment to ensure that sufficient "1"s exist to maintain

    synchronization. The proportion of "1"s to "0"s is known as the "1"s density.

    An alternative type of facility supports Bipolar Eight Zeroes Substitution (B8ZS), where

    violation pulses are introduced into the user data stream upon the detection of an

    excess "0"s count. This technique is similar in principle to HDB3, as seen in Section 4.4.1, and is described below.

    4.3.1 Physical Interface

    DS-1 is supported via twisted pair cable only, unlike E1, which is supported on both unbalanced coaxial cable and balanced twisted pair cable. DS-1 uses Alternate Mark

    Inversion (AMI) line coding to electrically encode the signal on the line. However, to

    overcome any problem of low "1"s density, a process called B8ZS is normally used

    17

    Vo ice Fu n d a m e n ta ls

    Sect ion 4 - In t ro du c t ion t o Dig i ta l Vo ice


    instead of the E1 HDB3 process. The operation of B8ZS, shown in Figure 4.7, replaces each string of eight consecutive binary zeroes with a code that introduces bipolar violations into the 4th and 7th bit positions. This ensures that a sufficient number of voltage transitions exist, while retaining the DC-balanced nature of the signal.
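    The substitution can be sketched in a few lines of Python, representing the line signal as +1/-1/0 pulses. This is an illustration only; it assumes at least one mark precedes the first run of zeroes:

```python
def ami_b8zs(bits: str):
    """AMI-encode a bit string, replacing each run of eight zeroes with the
    B8ZS pattern 000VB0VB (bipolar violations in bit positions 4 and 7)."""
    out, polarity, i = [], 1, 0          # polarity of the next mark
    b = list(map(int, bits))
    while i < len(b):
        if b[i:i + 8] == [0] * 8:
            last = -polarity             # polarity of the most recent mark
            out += [0, 0, 0, last, -last, 0, -last, last]
            polarity = -last             # alternation resumes after the final B
            i += 8
        elif b[i]:
            out.append(polarity)
            polarity = -polarity
            i += 1
        else:
            out.append(0)
            i += 1
    return out

print(ami_b8zs("1" + "0" * 8 + "1"))
```

    In the output, the fourth and seventh pulses of the substituted block repeat the polarity of their predecessors: the deliberate violations a B8ZS-aware receiver detects and strips.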

    4.3.2 Framing - D4

    There are two common types of framing used on a DS-1 interface, D4 and Extended

    Superframe (ESF).

    D4 framing is shown in Figure 4.8. This consists of a frame of 193 bits with a repetition rate of 8000 frames per second, giving a data rate of 1.544 Mbps and a 125 μs frame

    duration. Each frame contains 24 eight-bit timeslots named timeslot 1 through to timeslot

    24, and a single bit called the F or framing bit. All 24 timeslots are normally available for

    traffic except for when CCS is carried. In this case, timeslot 24 is reserved for the

    signaling channel.

    Framing is achieved using the F bit over a sequence of 12 frames, which is also called

    a superframe. In odd-numbered frames the F bit is called Ft for terminal framing, and

    performs frame alignment. In even-numbered frames the F bit is called Fs and performs

    superframe alignment.
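    The D4 numbers can be verified directly:

```python
TIMESLOTS = 24            # eight-bit traffic timeslots per frame
BITS_PER_SLOT = 8
F_BITS = 1                # the single framing (F) bit
FRAMES_PER_SECOND = 8000

frame_bits = TIMESLOTS * BITS_PER_SLOT + F_BITS
print(frame_bits)                             # 193 bits per frame
print(frame_bits * FRAMES_PER_SECOND)         # 1544000 bps = 1.544 Mbps
print(1_000_000 // FRAMES_PER_SECOND)         # 125 microsecond frame duration
```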

    4.3.3 Framing - Extended Superframe (ESF)

    Today, ESF is more common than D4 due to its capabilities for monitoring the performance of an in-service T1 link. This was not easily possible with D4, since the link would

    need to be taken out of service in order for performance testing to be carried out.

    The extended superframe, as shown in Figure 4.9, does exactly what its name implies

    and extends the 12-frame superframe to 24 frames. The use of the F bit is also changed.


    Figure 4.4 HDB3 Coding


    Only 6 out of the 24 frames in the ESF are now used for synchronization purposes. Of

    the remaining 18 F bits, six are utilized for CRC checking to verify the integrity of the

    ESF, and 12 make up a Facility Data Link (FDL). The FDL is also known as the Data

    Link (DL) and is sometimes called the Embedded Operation Channel (EOC). The FDL

    is available for the communication of alarms, loopbacks, and general performance information between terminating devices such as Channel Service Units (CSUs), which terminate the T1s at the customer premises.

    4.3.4 Channel Associated Signaling (CAS) on DS-1

    The technique for CAS on a DS-1 is shown in Figures 4.8 and 4.9. The basic process is

    the same for both D4 and ESF framing. However for D4, only two signaling bits, A and

    B, are used for each traffic timeslot. With ESF, four bits are used: A, B, C, and D.

    The process used is called bit robbing, because the least significant bit of each traffic

    timeslot in every sixth frame is taken away to carry signaling information rather than

    traffic. Meanwhile, the other seven bits are left alone and continue to carry traffic such

    as PCM. Any distortion introduced to PCM voice traffic by this bit-robbing technique is

    negligible and can be ignored. However, for data the distortion can be significant. This

    is why data support is typically only 56 Kbps rather than 64 Kbps, with only the seven

    most significant bits being used.
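    Bit robbing can be sketched as follows (frames modeled as lists of 24 PCM octets; the function and sample values are illustrative, not from the booklet):

```python
def rob_bits(frames, signaling_bits):
    """Overwrite the least significant bit of every timeslot in each sixth
    frame of the superframe with a signaling bit (A, then B)."""
    out = []
    for n, frame in enumerate(frames, start=1):
        if n % 6 == 0:                          # frames 6, 12, ...
            sig = signaling_bits[n // 6 - 1]
            frame = [(octet & 0xFE) | sig for octet in frame]
        out.append(frame)
    return out

frames = [[0b10101011] * 24 for _ in range(12)]     # a 12-frame D4 superframe
robbed = rob_bits(frames, signaling_bits=[0, 1])    # A bit = 0, B bit = 1
print(robbed[0][0], robbed[5][0], robbed[11][0])    # 171 170 171
```

    Voice codecs tolerate the corrupted least significant bit; data services avoid it by using only the top seven bits of each octet, hence 56 Kbps.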


    Figure 4.5 G.704 Frame Structure for 2.048 Mbps E1


    4.3.5 Common Channel Signaling on DS-1

    Common Channel Signaling utilizes timeslot 24 to carry signaling information as HDLC-

    based data messages. See section 5.2.2 for more details and examples.

    4.3.6 DS-1 Alarms

    DS-1 provides the same alarm conditions as E1, but in a different manner and with a different

    naming convention. Figure 4.10 gives a comparison between DS-1 and E1 alarm conditions and the naming conventions associated with each.

    The method used by DS-1 to provide a remote alarm indication differs depending on

    whether D4 or ESF framing is being used.

    With D4 trunks, a remote alarm indication, also called a yellow alarm, is given by trans-

    mitting a "0" in the bit 2 position of every timeslot. Putting this alarm indication in the

    traffic timeslots has two implications. First, it destroys any valid information carried in the

    traffic timeslots. Second, the receiver must validate the indication for a period of time

    (typically about 600 milliseconds (ms)) before taking any action, since it is possible that

    normal traffic could briefly mimic it.

    With ESF trunks, a remote alarm indication (yellow alarm) is given by using the F bit

    Facility Data Link to transmit an alternating pattern of eight "0"s, followed by eight "1"s,

    then eight "0"s, and so on.

    4.4 The Digital 2.048 Mbps PBX Interface (E1)

    A digital PBX interface running at 2.048 Mbps, sometimes called the "E1" interface, is

    designed to conform to ITU-T Recommendation G.732 "Characteristics of Primary PCM

    Multiplex Equipment Operating at 2.048 Mbps" [4]. This in turn refers to the following

    recommendations:

    G.703: "Physical/Electrical Characteristics of Hierarchical Digital Interfaces"

    G.704: "Synchronous Frame Structures Used at Primary and Secondary

    Hierarchical Levels"

    G.711: "Pulse Code Modulation (PCM) of Voice Frequencies"

    4.4.1 Physical Interface - G.703

    ITU-T Recommendation G.703 [5] defines the electrical characteristics for many types and speeds of interface, including 64 Kbps, 1.544 Mbps, 2.048 Mbps, 6.312 Mbps, 8.448 Mbps, 32.064 Mbps, 34.368 Mbps, 44.736 Mbps, 97.728 Mbps, 139.264 Mbps, and 155.52 Mbps. European voice applications primarily use the 2.048 Mbps interface.



    One of two types of physical interface may be used: the 75-ohm unbalanced coaxial

    interface, or the 120-ohm balanced twisted pair interface. Parameters such as pulse voltage, voltage tolerance, and frequency tolerance are specified in G.703.

    The actual data bits are transmitted using Alternate Mark Inversion (AMI) with High

    Density Bipolar 3 (HDB3) encoding. The objective of using these is twofold: first, to

    remove any DC component from the transmitted signal (AMI performs this function), and

    secondly, to ensure that there are a sufficient number of voltage transitions in the signal.

    This is referred to as "1"s density, and is important so that the receiving device can

    derive synchronization, or timing from the signal (HDB3 performs this function).

    Figure 4.4 shows AMI and HDB3 encoding. AMI employs a three-level signal where binary zeroes are encoded using 0 volts, and successive binary "1"s (marks) are encoded using alternating voltages of ±2.37 V for the unbalanced interface and ±3 V for the balanced interface.

    HDB3 is defined in G.703, and works by replacing each block of 4 successive zeroes by

    a pattern of either 000V or B00V, where B represents an inserted pulse that conforms to

    the AMI rule and V represents an AMI violation. A violation is where two successive "1"s

    use the same electrical polarity. The choice of which pattern is used ensures that the

    number of B pulses between consecutive V pulses is odd, thus retaining the DC-

    balanced nature of the signal. This is important to help ensure error-free transmission.
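    An illustrative HDB3 encoder over +1/-1/0 pulses follows; it is a sketch of the substitution rule only (with an assumed starting polarity), not a complete G.703 implementation:

```python
def hdb3_encode(bits: str):
    """AMI with HDB3: each run of four zeroes becomes 000V or B00V, chosen
    so that successive violation (V) pulses alternate in polarity."""
    out, last_pulse, marks_since_v = [], -1, 0
    b = list(map(int, bits))
    i = 0
    while i < len(b):
        if b[i:i + 4] == [0, 0, 0, 0]:
            if marks_since_v % 2:              # odd count of marks: use 000V
                out += [0, 0, 0, last_pulse]   # V repeats the last polarity
            else:                              # even count: use B00V
                bp = -last_pulse               # B obeys the AMI rule
                out += [bp, 0, 0, bp]          # V then violates it
                last_pulse = bp
            marks_since_v = 0
            i += 4
        elif b[i]:
            last_pulse = -last_pulse           # normal AMI alternation
            out.append(last_pulse)
            marks_since_v += 1
            i += 1
        else:
            out.append(0)
            i += 1
    return out

# Two zero runs: both substituted, with violations of alternating polarity.
print(hdb3_encode("1000010000"))
```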

    4.4.2 Framing Structure - G.704

    ITU-T Recommendation G.704 [6] defines the frame structures for a number of different

    speed links, including 1.544 Mbps, 6.312 Mbps, 2.048 Mbps, and 8.448 Mbps.

    As shown in Figure 4.5, a frame of 256 bits is defined with a repetition rate of 8000

    frames per second, giving a data rate of 2.048 Mbps and a 125 μs frame duration. Each


    Figure 4.6 E1 Alarms


    frame comprises 32 eight-bit timeslots named timeslot 0 through to timeslot 31. Timeslot

    0 is used for a number of purposes, including frame synchronization and alarm

    reporting. Timeslot 16 is normally used to carry signaling information, although in some

    circumstances it may be used to carry a traffic channel. Timeslots 1 to 15 and 17 to 31

    are used to carry 30 traffic channels, normally PCM in the case of PBX systems.

    However, since these timeslots simply represent a 64 Kbps channel they can be used

    to carry any form of traffic, including data.

    Alternate timeslot 0s carry different information. Figure 4.5 is concerned with the frame

    alignment signal 0011011 in even frames, and the A (alarm) bit in odd frames. The

    purpose of the frame alignment signal is to allow the devices at each end of a link to


    Figure 4.7 Bipolar Eight Zeroes Substitution (B8ZS) Coding

    Figure 4.8 D4 Framing


    synchronize to the frame, enabling them to know where the frame starts and which bits

    refer to which timeslot. The A-bit, also known as the Remote Alarm Indication (RAI), is set to "0" in normal operation. In the event of a fault condition the A-bit will be set to binary "1". See Section 4.4.5 for more details on alarms.

    The other bits, Si and Sa4-Sa8, are of less significance, although still important. Si is

    reserved for international use. One specific use, as given in G.704, is to carry a Cyclic

    Redundancy Check that can be used for enhanced error monitoring on the link. It is

    important to note that both ends of a link must be configured in the same way, either with

    CRC enabled or CRC disabled.

    Sa4 to Sa8 are additional spare bits that can be used for a number of purposes as

    defined in G.704. For example, Sa4 can be used as a message-based link for opera-

    tions, maintenance and performance monitoring.

    4.4.3 Channel Associated Signaling (CAS) on E1

    Figure 4.5 demonstrates a form of signaling known as Channel Associated Signaling

    (CAS). With CAS, specific bits of data within the frame are defined to carry signaling

    information for each of the traffic timeslots. The information for each timeslot is trans-

    mitted as a group of four bits, designated A, B, C and D. Since timeslot 16 only has eight

    bits available to support 30 traffic timeslots, a multiframe structure of 16 frames (desig-

    nated frame 0 to frame 15) is defined to allow A, B, C, and D bits to be carried for all 30

    timeslots. Frame 0 carries a Multiframe Alignment Signal (MFAS) of four zeroes. This

    allows the receiving system to identify which frame is which, and associate a traffic

timeslot with its correct signaling bits. Frame 0 also carries a remote alarm indicator to signify loss of multiframe alignment. Timeslot 16 of frame 1 then carries A, B, C, and D bits for timeslots 1 and 17; timeslot 16 of frame 2 carries A, B, C, and D bits for timeslots 2 and 18; and so on up to frame 15 of the multiframe, which carries A, B, C, and D bits for timeslots 15 and 31, after which the sequence repeats.

Figure 4.9 ESF Framing

A common problem arises when the A, B, C, and D bits in timeslot 16 associated with any of the traffic channels 1 to 15 are set to 0000. If this is the case, a false multiframe
alignment signal will result, causing the whole signaling mechanism to fail. This situation

    is most common in a configuration where the PBX system is connected to a multiplexer

    network that provides an idle pattern of 0000 to the PBX for channels that are not routed.

    In fact, ITU-T G.704 recommends against the use of 0000 for any signaling purposes for

    timeslots 1 to 15. It also recommends that if B, C, and D are not used, then they should

    be set to B=1, C=0 and D=1.
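The multiframe mapping just described, together with G.704's rule against 0000, can be expressed as a small helper. The function names and the (frame, nibble) representation are my own:

```python
# Locate the ABCD signaling bits for an E1 traffic timeslot (sketch):
# timeslot 16 of multiframe frame n (1-15) carries ABCD for timeslots
# n (first nibble) and n+16 (second nibble).

def cas_location(ts):
    """Return (frame, half) saying which nibble of timeslot 16 holds ABCD."""
    if ts == 16 or not 1 <= ts <= 31:
        raise ValueError("not a traffic timeslot")
    return (ts, "first") if ts <= 15 else (ts - 16, "second")

def abcd_is_legal(ts, abcd):
    """Reject the 0000 pattern that would mimic the multiframe alignment signal."""
    return not (1 <= ts <= 15 and abcd == 0b0000)

print(cas_location(1), cas_location(17))  # (1, 'first') (1, 'second')
print(cas_location(31))                   # (15, 'second')
print(abcd_is_legal(3, 0b0000))           # False - would cause a false MFAS
```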

4.4.4 Common Channel Signaling (CCS) on E1

    Common Channel Signaling utilizes timeslot 16, as does CAS. However, rather than

    defining specific bits to carry signaling information for each of the traffic timeslots,

    signaling information is sent as High Level Data Link Control (HDLC)-based data

    messages. See section 5.2.2 for more details and examples.

4.4.5 E1 Alarms

Recommendation G.732 describes various fault conditions and subsequent actions that

    should be taken. It includes such conditions as power supply failure and codec failure,

    although this booklet will only describe problems associated with the link itself.

Frame Level Alarms - (in reference to Figure 4.6) In the event of one of the following problems occurring on the received signal (Rx), bit 3 in timeslot 0 of odd numbered frames should be set to "1" on the transmit (Tx) signal of that PBX system:

    Loss of incoming signal

    Loss of frame alignment

    Excessive error ratio (1 x 10^-3)

Figure 4.10 DS-1 and E1 Alarm Comparison

Loss of frame alignment will occur for a number of reasons, including cabling or equipment faults. In the event of a failure in the transmission system between the two PBXs, the transmission system should automatically apply an Alarm Indication Signal (AIS), a continuous stream of "1"s, to the line.

Multiframe Alarm - When running Channel Associated Signaling, bit 6 of timeslot 16 in frame 0 of the multiframe is used to indicate loss of multiframe alignment. If multiframe alignment is lost on the receive signal, the PBX will set bit 6 in its transmit signal to report the alarm to the far end.

    Loss of multiframe alignment is a rare situation, since in a normal failure condition loss

    of frame alignment is more likely. A common cause of such a condition is misconfigura-

    tion of one end of the link, as described in Section 4.3.3 above.

4.5 The Need for PBX Synchronization

    Similar to many digital systems, digital PBX systems operate internally in a synchronous

    manner where data is moved from one place to another using a common clock, or timing

    source. When two PBXs are connected together via a digital link, they must be synchro-

    nized in order to avoid bit slips and loss of frame synchronization.

    Digital PBX interfaces typically have some form of buffering mechanism in order to

    overcome problems associated with jitter, wander, and slight timing inaccuracies.

    However, in the event that the timing mismatch between two PBXs is too great, the

    buffer will fill. Once full, it must be emptied, resulting in loss of data and loss of frame

    synchronization. Frame synchronization should be regained rapidly, so long as the

    timing mismatch is not too great.

Figure 4.11 Lack of Synchronisation

    To understand why synchronization is required, consider the following two examples.

    The first example shows two PBX systems connected without synchronization, while the

    second shows PBX 2 synchronized to PBX 1.

4.5.1 PBX Systems Without Synchronization

(With reference to Figure 4.11) The two PBXs are connected via a 1.544 Mbps link. This

    is only a nominal speed and some deviation is inevitable. It is quite possible that the

    transmit clock from PBX 1 is running slightly fast, say 1.544001 Mbps. This represents

    an accuracy of about 5 parts in 10 million, which is not uncommon. PBX 2, on the other

    hand, is running slightly slowly at 1.543999 Mbps, again an accuracy of about 5 parts in

    10 million.

    Every second, PBX 1 transmits 1,544,001 bits of data onto the trunk, and PBX 2

    receives 1,543,999 bits, leaving two bits to be absorbed in a buffer at the input to PBX

    2. This will continue with two bits being absorbed in the buffer every second until the

    buffer is full, at which point it will be emptied causing loss of frame synchronization. The

    effect of this is hard to predict, but will probably cause clicks on any voice call currently

    in progress across the trunk, or disconnection.
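The arithmetic above can be checked with a toy simulation. This is my own sketch, and the 64-bit elastic-buffer depth is an assumption for illustration:

```python
# Simulate the two-bit-per-second clock mismatch described above:
# a receive buffer absorbs the surplus bits until it overflows and
# must be dumped, losing frame alignment (a "slip").

TX_RATE = 1_544_001   # PBX 1 transmit clock, bits per second
RX_RATE = 1_543_999   # PBX 2 receive clock, bits per second
BUFFER_BITS = 64      # assumed elastic-buffer depth

fill, slips = 0, 0
for second in range(10_000):
    fill += TX_RATE - RX_RATE   # 2 surplus bits accumulate per second
    if fill > BUFFER_BITS:
        slips += 1              # buffer emptied: frame sync lost
        fill = 0

print(slips)  # 303 - roughly one slip every 33 seconds
```

Even a tiny frequency offset therefore produces a steady, periodic loss of data unless the two clocks are locked together.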

4.5.2 PBX Systems With Synchronization

    (With reference to Figure 4.12) The system has been rearranged to allow PBX 2 to

    synchronize its internal clock to the data arriving at the digital trunk port. PBX 1 still

    transmits data a bit fast, at 1.544001 Mbps, but now PBX 2 also receives data at

    1.544001 Mbps. The buffer does not fill, and frame synchronization is preserved.

    This approach is effective in a point-to-point situation, but synchronization also needs to

    be maintained in a full network scenario such as shown in Figure 4.13.

Figure 4.12 PBX Synchronisation

    There are a number of rules to follow with synchronization, one of which is that the whole

    network should be synchronized back to the same clock source. Figure 4.13 shows a

    small network of five PBX systems with PBX 1 taking a high-accuracy clock from an

    ISDN connection to the public network. An ideal clocking arrangement based upon this

    network would be for PBX 2 to synchronize its system clock to the data on Link 1 coming

    from PBX 1. PBX 3 would then synchronize to the data coming from PBX 2 via Link 2.

    PBX 4 would synchronize to PBX 1 via Link 3 and PBX 5 to PBX 4 via Link 6. In this

    way, all PBX systems are synchronized together.

    However, this scenario alone would not compensate for a failure. If link 6 were to fail,

    PBX 5 would lose the clock to which it is synchronized. To overcome this problem, it is

    normal to have a clock fallback list in each PBX, and in the event of a problem the PBX

    will search for another valid source.

However, one key criterion to follow when creating clock fallback lists is to ensure that the network cannot get into a clocking loop - for example, where PBX 4 takes its clocking signal from PBX 3, PBX 3 from PBX 5, and PBX 5 from PBX 4. In this scenario

    it is likely that errors would occur, causing loss of synchronization on the links. The error

    level could extend to where synchronization cannot be regained, resulting in a complete

    loss of one or more trunks. For this reason, it is very important to follow the manufac-

    turer's guidelines when configuring network clocking.
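Checking a clocking plan for loops is a simple graph exercise. The sketch below is my own (the data structure and function name are assumptions, not a vendor tool): follow each PBX's active clock source and look for a cycle.

```python
# Detect clocking loops in a synchronization plan (illustrative sketch).

def find_clock_loop(sources):
    """sources maps PBX -> the PBX it takes its clock from (None = reference)."""
    for start in sources:
        seen = set()
        node = start
        while node is not None:
            if node in seen:
                return True   # a loop: this chain never reaches a reference
            seen.add(node)
            node = sources.get(node)
    return False

# The intended plan of Figure 4.13: all clocks trace back to PBX 1's ISDN feed.
good = {"PBX1": None, "PBX2": "PBX1", "PBX3": "PBX2",
        "PBX4": "PBX1", "PBX5": "PBX4"}
# The faulty fallback described above: 4 -> 3 -> 5 -> 4.
bad = {"PBX3": "PBX5", "PBX4": "PBX3", "PBX5": "PBX4"}

print(find_clock_loop(good), find_clock_loop(bad))  # False True
```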

Figure 4.13 Synchronisation in a Network of PBXs

5 Speech Compression

    Speech compression, also referred to as voice compression, describes the process of

    digitizing speech to a bit rate of less than 64 Kbps. It is normal, however, to start with PCM

at 64 Kbps and compress it to a lower rate. Ideally, the resulting speech quality will not be affected; in practice, however, there will be some degradation that may or may not be

    apparent to the users. There are many different techniques with different characteristics

    used for speech compression, which result in bit rates from a few Kbps up to 40 Kbps.

    PCM speech can be compressed because a large portion of the 64 Kbps bit stream is

    redundant. Furthermore, it is thought that speech of reasonable quality can be provided

    at rates as low as 1 Kbps. This has not yet been achieved, in large part because current

    understanding of the way speech works is less than complete. As time goes by, new and

    more efficient techniques are being developed to drive the bit rate lower and lower while

    maintaining acceptable quality.

    This section looks at a number of speech compression techniques in common usage

    today, and other new systems poised for entry into the marketplace. The section

    concludes with information on speech compression impairments, including the negative

    effects that are introduced into voice telephony when speech compression is used.

5.1 Different Coding Types

    Speech compression schemes can be classified into one of three categories: Waveform

    Coding, Source Coding, and Hybrid Coding.

    Waveform Coding - Waveform coders attempt to reconstruct a waveform in a form as

close to the original signal as possible, based on samples of the original waveform. In theory, this means that waveform coders are signal-independent and should work with non-voice signals such as modem and fax traffic. Typically, waveform coders are relatively

    simple to implement and produce acceptable quality speech at rates above 16 Kbps.

    Below this, the reconstructed speech quality degrades rapidly.

    PCM is an example of a waveform coding technique. If linear quantization is used, then

    at least 12 bits per PCM sample are needed to reproduce good quality speech. This

    results in a bit rate of 96 Kbps (8000 samples per second x 12). However, the nature of

    speech and human hearing does not tend to follow a linear pattern. Much of a speech

    signal is at low levels, and human ears are not sensitive to the absolute amplitude of

    sounds, but instead to the log of the waveform amplitude. Therefore, in representing

    speech digitally, more bits are allocated at the lower levels than at the higher levels. This


    process, called companding, results in digitized speech of 64 Kbps. Therefore, even

    PCM effectively represents a compressed digital speech form. Companding is

    discussed in more detail in Section 4.2.
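The logarithmic allocation of levels can be illustrated with the continuous mu-law formula (mu = 255). Note this is an illustration, not the deployed codec: real G.711 channels use a segmented 8-bit approximation of this curve, and the helper names are mine.

```python
import math

# Continuous mu-law companding curve (mu = 255), for illustration.

MU = 255.0

def mu_law_compress(x):
    """Map a linear sample in [-1, 1] onto the companded scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse mapping, recovering the linear sample."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# A quiet sample at 1% of full scale occupies almost a quarter of the
# companded range - the extra resolution at low levels described above.
print(round(mu_law_compress(0.01), 3))                 # 0.228
print(round(mu_law_expand(mu_law_compress(0.01)), 4))  # 0.01
```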

    Other common waveform coding techniques include Adaptive Differential Pulse Code

    Modulation (ADPCM), and Continuously Variable Slope Delta (CVSD).

    Source Coding - Source coders (also known as Vocoders) are more complex than

    waveform coders, but can compress speech to bit rates of 2.4 Kbps and below. To

    achieve these compression rates, knowledge of the speech generation process is

    required. In principle, the speech signal is analyzed and a model of the source is

    generated. A number of parameters are then transmitted to the destination to allow it to

    rebuild the model and thus recreate the speech. These parameters include such infor-

    mation as whether a sound is voiced (such as vowels) or unvoiced (such as most conso-

    nants), amplitude information, and pitch. While source coders provide very low bit rates,

    the subjective quality of the regenerated speech can be poor. Often, although the

speech is understood, recognition of who is speaking, referred to as talker recognition,

    is very poor. Furthermore, source coders do not carry non-speech signals very well,

    such as modem or fax signals.

    Because of these factors, source coders are not typically used in commercial applica-

    tions. Their main use has been in military applications where natural sounding speech

    is not as important as a very low bit rate, which can then be encrypted for security

    purposes.

    Hybrid Coding - As the name suggests, hybrid coding uses aspects of both waveform

    and source coding, bringing together the benefits of high-quality speech from waveform

    coders and low bit rates of source coders.

    Hybrid coders operate in a similar manner to source coders, where a model is built on

    the parameters of the speech signal. Rather than transmit these parameters directly to

the destination, hybrid coders use them to synthesize a number of new signals. These

    new signals are then compared with the original signal to find the best match. The

    modeled parameters, along with the excitation signal, which represents how the synthe-

    sized signal was produced, are then transmitted to the destination where the speech is

    reproduced. Examples of hybrid coders include Code Excited Linear Prediction (CELP)

    and its derivatives, some of which are described later in this section.


    DPCM in action, where only the difference between one sample and the next is trans-

    mitted from the source to the destination.

    For PCM, the analog speech signal is sampled at regular points, and the absolute level

    of the sample is encoded. With DPCM, the difference between one sample and the next

is encoded. By doing this, fewer bits are needed to encode the signal. Although speech samples don't change rapidly in practice, when they do, the changes in signal level are often greater than can be encoded using DPCM. This results in distortion of the signal,

    as shown in Figure 5.1.
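The slope-overload distortion of fixed DPCM is easy to demonstrate. This toy round trip is my own sketch (the 0.05 clamp stands in for a small fixed difference code), not an implementation of any standard:

```python
import math

# Toy fixed-step DPCM round trip showing slope overload: the clamped
# per-sample difference cannot follow a steep swing in the input.

MAX_DIFF = 0.05  # largest difference a small fixed code can represent

def dpcm_round_trip(samples):
    decoded, level = [], 0.0
    for s in samples:
        diff = max(-MAX_DIFF, min(MAX_DIFF, s - level))  # clamp to range
        level += diff
        decoded.append(level)
    return decoded

fast = [math.sin(2 * math.pi * i / 8) for i in range(8)]  # rapid swing
worst = max(abs(a - b) for a, b in zip(fast, dpcm_round_trip(fast)))
print(round(worst, 2))  # 1.0 - the decoder lags far behind the input
```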

    However, with ADPCM the amplitude range over which a given number of bits are used

    to encode a sample varies, or adapts, depending upon the range of amplitudes

    occurring at the time. This can be seen in Figure 5.2, which shows the principle of

    encoding the difference between the actual signal and a prediction of what it will be,

    based on the prediction that the next sample will be the same as the current one.

    The general rule that is used by ADPCM is as follows:

    "When signals are quantized towards the limit of the current range, the

    range used to quantize the next sample is changed."

    During the first few samples, the difference between one and the next will be relatively

    small. These differences are encoded using the full range of quantization levels that are

    available, given the number of bits available for each sample. For example, at 32 Kbps

    the difference is encoded in four bits, one for the sign (whether the next sample is more

    positive or negative than that predicted) and three for the magnitude of the difference. A

    few samples further on, the difference between one and the next is greater, yet can still

    be encoded using the full range of quantization levels. This is achieved through the

Figure 5.2 Adaptive Differential Pulse Code Modulation (ADPCM)

    process of adaptation, where the amplitude range that

    can be encoded with four bits changes depending upon

    the amplitudes at the time.

    As previously mentioned, ADPCM supports four different

    bit rates. The different rates are achieved through the

    number of bits used to encode the difference between

    one sample and the next, as seen in Figure 5.3.
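The adaptation rule can be sketched as code. This toy quantizer is my own illustration of the principle only; it is not the G.726 algorithm, and the step sizes and thresholds are made up:

```python
# Toy adaptive quantizer: four bits per difference (sign + 3-bit
# magnitude), with a step size that widens when codes hit the top of
# the range and narrows again when differences stay small.

def adpcm_track(samples, step=0.02):
    decoded, level = [], 0.0
    for s in samples:
        diff = s - level
        mag = min(7, int(abs(diff) / step))      # 3-bit magnitude
        level += mag * step * (1 if diff >= 0 else -1)
        decoded.append(level)
        if mag >= 6:                             # quantized near the limit:
            step *= 2.0                          # widen the range
        elif mag <= 1:
            step = max(step / 2.0, 0.001)        # narrow it again
    return decoded

ramp = [0.0, 0.2, 0.4, 0.6, 0.8]   # a swing too steep for the initial step
print(round(adpcm_track(ramp)[-1], 2))  # 0.73 - the range adapts to keep up
```

With a fixed step of 0.02 the decoder could move at most 0.14 per sample; by doubling the range whenever codes saturate, the adaptive version tracks most of the 0.8 swing.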

5.3 Code Excited Linear Prediction (CELP)

    At bit rates of around 16 Kbps and below, the quality of waveform coders falls rapidly.

We have also seen that source coders, while operating at very low bit rates, tend to reduce talker recognition substantially. Therefore, hybrid schemes, especially CELP coders and their derivatives, tend to be used today for sub-16 Kbps compression. Many CELP implementations are proprietary to the manufacturer, although we shall also discuss two standardized versions known as LD-CELP, as defined in ITU-T G.728, and CS-ACELP, as defined in ITU-T G.729.

    The essence of a CELP encoder is to analyze the incoming speech, and then transmit

    a number of parameters to the decoder so that the original speech can be reproduced

    as accurately as possible. These parameters include the mathematical model of a filter

    which simulates the talker's vocal tract (the vocal characteristics that make a person

    sound unique), gain information giving the level of the speech, and a codebook index.

    The codebook index is used to point to a sequence of pre-defined speech samples,

    known as vectors, which is common to both the transmitter and the receiver. The

    number of codebook entries and the number of samples within each entry is dependent

    upon the actual CELP implementation.

At the transmitter, groups of PCM speech samples (vectors) from the input speech, typically up to 20 ms in length, are compared to the vectors stored in the codebook. This is done by generating a synthetic speech signal for every entry in the codebook and comparing it to the actual speech input vector. The index for the vector that produces the best match with the input speech waveform is then transmitted to the receiver. At the

    receiver, this waveform is then extracted from the codebook and filtered, using the math-

    ematical model of the original talker's vocal tract. This produces highly recognizable,

    high-quality speech transmissions.
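The analysis-by-synthesis search can be sketched in miniature. Everything here is a toy of my own making: the one-tap "vocal tract" filter, the 16-entry random codebook, and the function names all stand in for the far larger, carefully trained structures a real CELP coder uses.

```python
import random

# Miniature analysis-by-synthesis search: synthesize every codebook
# vector through the (stand-in) vocal-tract filter, compare against the
# input, and transmit only the index of the best match.

random.seed(1)
CODEBOOK = [[random.uniform(-1, 1) for _ in range(5)] for _ in range(16)]

def synth(vector, gain):
    """Stand-in vocal-tract filter: a one-tap recursive smoother."""
    out, prev = [], 0.0
    for v in vector:
        prev = 0.6 * prev + gain * v
        out.append(prev)
    return out

def encode(target, gain=1.0):
    """Search the codebook for the entry whose synthesis best fits `target`."""
    def err(i):
        return sum((a - b) ** 2
                   for a, b in zip(synth(CODEBOOK[i], gain), target))
    return min(range(len(CODEBOOK)), key=err)

speech = synth(CODEBOOK[9], 1.0)  # a signal that entry 9 reproduces exactly
print(encode(speech))             # 9 - only this index need be sent
```

The exhaustive synthesize-and-compare loop is exactly why, as noted below, CELP's processing load is so high: a real coder runs it over thousands of entries every few milliseconds.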

    Due to the complexities of speech and the wide range of different human voices, the

    processing required for CELP is very intensive, typically of the order of 15 million instruc-

    tions per second (MIPS) or more for one voice channel. However, CELP is a very

Figure 5.3 ADPCM bits

    popular form of speech compression because of its high speech quality and low bit

    rates, typically between 4.8 Kbps and 16 Kbps. The practical drawbacks of CELP are

    evident in two main areas. First, CELP often produces end-to-end delays in the order of

    50 to 100 ms. This is due to a combination of the processing overhead and the number

    of speech samples that are buffered for analysis. Such high delays can cause trans-

    mission problems (see Section 6 on echo). Second, since CELP is tuned to human

    speech, it does not support voice band data well and can cause problems with modems

    and fax machines, as well as the transmission of DTMF tones.

5.4 Low Delay-Code Excited Linear Prediction (LD-CELP) ITU-T G.728

LD-CELP is based upon CELP, and provides speech quality similar to that of 32 Kbps ADPCM at a rate of 16 Kbps. It also incurs much smaller delays, typically less than 2 ms, as compared with normal CELP delay levels of 50 to 100 ms. LD-CELP uses backward adaptation to produce its filtering characteristics, which means that the filter is produced from previously reconstructed speech.

At the encoder, A-law or µ-law PCM is first converted to linear PCM. The input signal is

    then partitioned into blocks of five consecutive input signal samples. The encoder then

    compares each of 1024 codebook vectors with each input block. The 10-bit codebook

    index of the best match codebook vector is then transmitted to the decoder.

    The decoding operation is also performed on a block-by-block basis. Upon receiving

    each 10-bit index, the decoder performs a table look-up to extract the corresponding

    code vector from the codebook. The extracted code vector is then filtered to produce the

    current decoded signal vector. The five samples of the post filter signal vector are then

converted to five A-law or µ-law PCM output samples.

    Please note that this is a somewhat simplified description of the operation of LD-CELP.

    A more detailed description can be found in G.728 [15].

5.5 Conjugate Structure-Algebraic Code Excited Linear Prediction (CS-ACELP) ITU-T G.729

    CS-ACELP is another speech compression technique that is based upon CELP. It was

    originally designed for packetized voice support on mobile networks, although a different

    scheme known as RPE-LTP has been adopted on the GSM mobile telephone system

    (see Section 5.6).


    CS-ACELP operates at a rate of only 8 Kbps, yet still provides speech quality similar to

    that of ADPCM at 32 Kbps. Furthermore, it has been shown to operate well in tests even

    when packets are lost.

    The coder operates on speech frames of 10 ms, corresponding to 80 samples at a

    sampling rate of 8000 samples per second. For every 10 ms frame, the speech signal

    is analyzed to extract the parameters of the CELP model (linear prediction filter coeffi-

    cients, adaptive and fixed codebook indices and gains). These parameters are then

    transmitted to the destination in a specified frame format. At the decoder, the filter and

    gain parameters are used to retrieve the filter information and simulate the filter. The

    speech is then reconstructed by taking the codebook entry of 80 samples and passing

it through the reconstructed filter. Speech is then converted to A-law or µ-law PCM and transmitted to the interface.

    Please note that this is a somewhat simplified description of the operation of CS-ACELP.

    A more detailed description can be found in G.729 [16].
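The frame arithmetic above is worth making explicit. A short check (my own, with assumed constant names) of the figures quoted for CS-ACELP:

```python
# Arithmetic behind the figures above: a 10 ms frame at 8000 samples/s
# holds 80 samples, and an 8 Kbps coder has just 80 bits per frame to
# describe them, versus 640 bits for the same frame as 64 Kbps PCM.

SAMPLE_RATE = 8000   # samples per second
FRAME_MS = 10        # CS-ACELP frame length in milliseconds
CODED_RATE = 8000    # 8 Kbps output
PCM_RATE = 64000     # 64 Kbps PCM input

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000
coded_bits = CODED_RATE * FRAME_MS // 1000
pcm_bits = PCM_RATE * FRAME_MS // 1000

print(samples_per_frame, coded_bits, pcm_bits)  # 80 80 640
```

In other words, the coder describes each 80-sample frame with an eighth of the bits PCM would use, which is why only model parameters, not samples, can be transmitted.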

5.6 Other Compression Techniques

    Continuously Variable Slope Delta (CVSD) - CVSD was one of the earlier speech

    compression schemes designed into TDM multiplexers and was very popular in the

    early 1980s. It is a waveform coder operating directly on the waveform of the signal.

    However, rather than starting with PCM, as is the case with many other compression

    schemes, it often relies on analog rather than digital techniques. With CVSD coding, the

    sending end compares the analog input voltage to an internal reference voltage. If the

    input signal is greater than the reference, a "1" is transmitted and the reference voltage

    increased. If the input signal is less than the reference, a "0" is transmitted and the

    reference voltage is decreased.