stl 2005 man

ITU-T Software Tool Library 2005 User’s Manual

ITU-T Users’ Group on Software Tools

Geneva, August 2005

Copyright c© 2005, 2006 by the International Telecommunication Union (ITU)

This is edition 1.0 of the “ITU-T Software Tool Library Manual”, for the 2005 release ofthe ITU-T Software Tool Library, distribution 1.0, May 2005.

Published by the ITU. Copies of this manual are available as part of the STL2005 distri-bution. STL2005 copies can be acquired from:

ITU General SecretariatSales ServicePlace du NationsCH-1211 Geneve 20Switzerland

Also via the Internet from:http://www.itu.int/rec/recommendation.asp?type=folders&parent=T-REC-G.191

Permission is granted to make and distribute verbatim copies of this manual provided thecopyright notice and this permission notice are preserved on all copies.

http://www.itu.int/rec/recommendation.asp?type=folders&parent=T-REC-G.191

Contents

1 Introduction 1

1.1 Organization of the Software Library . . . . . . . . . . . . . . . . . . . . . 2

1.2 Whom to contact . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.3 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Tutorial 5

2.1 Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Definition of terms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2.1 Overload point . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2.2 Signal power . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.2.3 Signal level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.2.4 Relation between overload and maximum levels . . . . . . . . . . . 7

2.2.5 Saturation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.6 Data representation . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.7 Data justification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2.8 Equivalent results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.2.9 Little- and big-endian data ordering . . . . . . . . . . . . . . . . . . 9

2.3 Guidelines for software tool development . . . . . . . . . . . . . . . . . . . 12

2.4 Software module I/O signal representation . . . . . . . . . . . . . . . . . . 14

2.5 Tool specifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3 RATE-CHANGE: Up- and down-sampling module 19

3.1 Description of the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.1.1 High-quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

3.1.2 Telephony-band weighting . . . . . . . . . . . . . . . . . . . . . . . 21

3.1.3 Wideband weighting . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.1.4 Super-wideband weightings . . . . . . . . . . . . . . . . . . . . . . 24

3.1.5 Noise weighting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

i

ii ITU-T Software Tool Library, release 2005

3.1.6 PCM Quality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2.1 FIR module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.2.2 IIR Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.3 Tests and portability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

3.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

3.4.1 Description of the demonstration programs . . . . . . . . . . . . . . 65

3.4.2 Example: Calculating frequency responses . . . . . . . . . . . . . . 65

4 EID: Error Insertion Device 69


4.1.1 Simple Channel Model . . . . . . . . . . . . . . . . . . . . . . . . . 69

4.1.2 The Bellcore Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

4.2.1 open eid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

4.2.2 open burst eid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.2.3 reset burst eid . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.2.4 close eid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.2.5 BER generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.2.6 FER generator random . . . . . . . . . . . . . . . . . . . . . . . . . 77

4.2.7 FER generator burst . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.2.8 BER insertion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

4.2.9 FER module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79


4.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80


4.4.2 Using the bit error insertion routine . . . . . . . . . . . . . . . . . . 82

4.4.3 Using the frame erasure routine . . . . . . . . . . . . . . . . . . . . 83

5 G.711: The ITU-T 64 kbit/s log-PCM algorithm 85

5.1 Description of the algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . 85

5.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

5.2.1 alaw compress and ulaw compress . . . . . . . . . . . . . . . . . . 87

5.2.2 alaw expand and ulaw expand . . . . . . . . . . . . . . . . . . . . 88


5.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

Version: November 2, 2005 iii

5.4.1 Description of the demonstration program . . . . . . . . . . . . . . 89

5.4.2 Simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

6 G.711-PLC: Packet loss concealment with G.711 91

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91


6.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

6.3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

6.3.2 PLC Algorithm Implementation . . . . . . . . . . . . . . . . . . . . 92

6.3.3 Test Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

6.3.4 Loss Pattern Conversion Utility . . . . . . . . . . . . . . . . . . . . 96

7 G.726: The ITU-T ADPCM algorithm at 40, 32, 24, and 16 kbit/s 99

7.1 Description of the 32 kbit/s ADPCM . . . . . . . . . . . . . . . . . . . . . 100

7.1.1 PCM format conversion . . . . . . . . . . . . . . . . . . . . . . . . 100

7.1.2 Difference Signal Computation . . . . . . . . . . . . . . . . . . . . . 100

7.1.3 Adaptive Quantizer . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

7.1.4 Inverse Adaptive Quantizer . . . . . . . . . . . . . . . . . . . . . . 101

7.1.5 Quantizer Scale Factor Adaptation . . . . . . . . . . . . . . . . . . 101

7.1.6 Adaptation Speed Control . . . . . . . . . . . . . . . . . . . . . . . 101

7.1.7 Adaptive Predictor and Reconstructed Signal Calculator . . . . . . 102

7.1.8 Tone Transition and Detector . . . . . . . . . . . . . . . . . . . . . 102

7.1.9 Output PCM Format Conversion . . . . . . . . . . . . . . . . . . . 102

7.1.10 Synchronous Coding Adjustment . . . . . . . . . . . . . . . . . . . 102

7.1.11 Extension for linear input and output signals . . . . . . . . . . . . . 102

7.2 ITU-T STL G.726 Implementation . . . . . . . . . . . . . . . . . . . . . . 103

7.2.1 G726 encode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

7.2.2 G726 decode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

7.3 Portability and compliance . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

7.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107


7.4.2 Simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107

8 G.727: The ITU-T embedded ADPCM algorithm at 40, 32, 24, and 16kbit/s 109

8.1 Description of the Embedded ADPCM . . . . . . . . . . . . . . . . . . . . 109

8.1.1 Extension for linear input and output signals . . . . . . . . . . . . . 109

iv ITU-T Software Tool Library, release 2005


8.2.1 G727 reset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

8.2.2 G727 encode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

8.2.3 G727 decode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112


8.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113


8.4.2 Simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

9 G.722: The ITU-T 64, 56, and 48 kbit/s wideband speech coding algo-rithm 115

9.1 Description of the 64, 56, and 48 kbit/s G.722 algorithm . . . . . . . . . . 115

9.1.1 Functional description of the SB-ADPCM encoder . . . . . . . . . . 117

9.1.2 Functional description of the SB-ADPCM decoder . . . . . . . . . . 119


9.2.1 g722 encode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

9.2.2 g722 decode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123

9.2.3 g722 reset encoder . . . . . . . . . . . . . . . . . . . . . . . . . . 124

9.2.4 g722 reset decoder . . . . . . . . . . . . . . . . . . . . . . . . . . 124


9.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125


9.4.2 Simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

10 RPE-LTP: The full-rate GSM codec 127

10.1 Description of the 13 kbit/s RPE-LTP algorithm . . . . . . . . . . . . . . . 127

10.1.1 RPE-LTP Encoder . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

10.1.2 RPE-LTP Decoder . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

10.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129

10.2.1 rpeltp encode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

10.2.2 rpeltp decode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132

10.2.3 rpeltp init . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

10.2.4 rpeltp delete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133


10.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134


Version: November 2, 2005 v

10.4.2 Simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

11 Duo-MNRU: The Dual-mode Modulated Noise Reference Unit 137


11.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

11.2.1 MNRU process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146


11.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148


11.4.2 Simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149

12 SVP56: The Speech Voltmeter 151


12.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153

12.2.1 init speech voltmeter . . . . . . . . . . . . . . . . . . . . . . . . 155

12.2.2 speech voltmeter . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

12.2.3 Getting state variable fields . . . . . . . . . . . . . . . . . . . . . . 156


12.4 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157


12.4.2 Small example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

13 BASOP: ITU-T Basic Operators 159

13.1 Overview of basic operator libraries . . . . . . . . . . . . . . . . . . . . . . 159

13.2 Description of the 16-bit and 32-bit basicoperators and associated weights . . . . . . . . . . . . . . . . . . . . . . . 159

13.2.1 Variable definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

13.2.2 Operators with complexity weight of 1 . . . . . . . . . . . . . . . . 160







13.2.9 Basic operator usage across standards . . . . . . . . . . . . . . . . . 168

13.3 Description of the 40-bit basic operators and associated weights . . . . . . 170

13.3.1 Variable definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 170

vi ITU-T Software Tool Library, release 2005





13.3.6 Coding Guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

13.4 Description of the control basic operators and associated weights . . . . . . 174

13.4.1 Operators and complexity weights . . . . . . . . . . . . . . . . . . . 175

13.4.2 Coding guidelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 176

13.5 Complexity associated with data moves and other operations . . . . . . . . 181

13.5.1 Data moves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

13.5.2 Other operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

14 REVERB: Reverberation tool 183

14.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183


14.2.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

14.2.2 Impulse response measures . . . . . . . . . . . . . . . . . . . . . . . 184

14.2.3 Impulse response file format . . . . . . . . . . . . . . . . . . . . . . 185

14.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

14.3.1 shift . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

14.3.2 conv . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

14.3.3 Tests and portability . . . . . . . . . . . . . . . . . . . . . . . . . . 186

14.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

15 TRUNCATE: Bitstream truncation tool 187

15.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187


15.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

15.3.1 trunc . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188


15.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189

16 FREQRESP: Frequency response measurement tool 191

16.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191


16.2.1 Discrete Fourier Transform (DFT) . . . . . . . . . . . . . . . . . . 191

Version: November 2, 2005 vii

16.2.2 Hanning window generation (DFT) . . . . . . . . . . . . . . . . . . 192

16.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

16.3.1 rdft . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

16.3.2 genHanning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

16.3.3 powSpect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192


16.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193

17 UTILITIES: UGST utilities 195

17.1 Some definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195

17.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

17.2.1 scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

17.2.2 sh2fl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

17.2.3 sh2fl alt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 197

17.2.4 fl2sh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

17.2.5 serialize * justified . . . . . . . . . . . . . . . . . . . . . . . . 199

17.2.6 parallelize * justified . . . . . . . . . . . . . . . . . . . . . . . 200


17.4 Example code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202


17.4.2 The master header file for the STL demonstration programs . . . . 202

17.4.3 Short and float conversion and scaling routines . . . . . . . . . . . . 203

17.4.4 Serialization and parallelization routines . . . . . . . . . . . . . . . 204

18 References 207

A Unsupported tools 211

A.1 Source code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

A.2 Scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

A.3 Makefiles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

A.4 Test files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

B Future work 215

viii ITU-T Software Tool Library, release 2005

Chapter 1

Introduction

In July 1990, Study Group XV of the then CCITT decided to set up a group to deal withthe development of common software tools to help in the development of speech codingstandards. In the same period, cooperation was requested with SG XII Speech QualityExperts Group (SQEG), and a group called ‘User’s Group on Software Tools’ (UGST)was initially established with almost 20 corresponding members. The basic means ofinteraction were the then incipient electronic mail (e-mail) messages, for the exchange offiles and experiences – UGST was actually one of the pioneer groups in ITU collaboratingvia electronic means. In addition to this, there were meetings held mainly during regularWorking Party XV/2 (Signal Processing) sessions, where most of the decisions were made.

As result of that very intensive work, several software tools evolved forming the ‘1992 ITU-T Software Tool Library’ (STL92) which included, as its first application, the QualificationTest for a Speech Coder at 8 kbit/s. After this initial release, another release was approvedby ITU-T Study Group 15 in May, 1996, and called STL96. The STL96 introducedsubstantive improvement and new features to the STL92. In November 2000, ITU-TStudy Group 16 approved an updated version to the STL, the STL2000. In 2005, anotherupdated version of the STL, STL2005, was developed. STL2005 corrects bugs andbrings revisions (such as 32-bit accumulator basic operator weights) and adds new tools(alternative set of basic operators, new FIR filters, a reverberation tool, a frequencyresponse measurement tool and a bit stream truncation tool). STL2005 is described inthis document. Terms and conditions on the usage of the ITU-T STL are found in ITU-TRecommendation G.191 [1].

The remaining chapters of this document describe the principles that guided the genera-tion of the ITU-T STL, as well as the description of its organization. The various toolsare described in separate chapters. These descriptions have the following general outline:

a. technical description of the method or algorithm involved;

b. description of the algorithm implementation in this release (including proto-types, parameters, returned value, etc.); and

c. testing, applications and examples.

All the STL2000 modules had their portability tested for MSDOS/Windows and severalUnix flavors. In MSDOS, all modules were tested with the MSDOS port of the GNU gcc

compiler (a.k.a. DJCPP) and with at least one of these Borland compilers: Turbo C 2.0,Turbo C++ 2.0, or Borland C++ 3.1. In the Windows environment, the code was testedusing MS Visual C version 6.1 SP3 as well as using the gcc compiler available in the Cygnus

1

2 ITU-T Software Tool Library, release 2005

CYGWIN development environment (www.cygnus.com). The VAX/VMS environmentwas fully supported in the STL96 (VAXC and gcc), however it was not possible to continueit for the STL2000 due to operational reasons; nevertheless, compilation under gcc shouldprovide the expected results, and some tools were tested for Ultrix. For the Unix operatingsystem, portability was verified for three workstation platforms: Sun Solaris 5.7 (SPARCor Intel CPUs, using gcc), HP 9000 Series 700 HPUX 9.05 or 10.20 (using gcc), and SiliconGraphics. On Silicon Graphics systems, the standard cc compiler was used.The new tools and the revised portions of the STL2005 were compiled and tested with aWindows environment using MS Visual C++ 6.0.

1.1 Organization of the Software Library

Each tool of the STL has been produced as a stand-alone module, such that it may belinked to a user’s program, application or system. In the present version, there are severalof these modules:

1. RATE-CHANGE: An up- and down-sampling algorithm with embed-ded filterings:

• ITU-T Rec. G.712 filter for factors of 1:2, 2:1 and 1:1

• High-quality filter for factors 1:2, 2:1, 1:3, and 3:1

• IRS send-side weighting filter, for several sampling rates: 8, 16, and48 kHz. This includes the “full-IRS” as in ITU-T Rec. P.48 as wellas the “modified” IRS as in Annex D of ITU-T Rec. P.830.

• Modified-IRS receive-side filter is also available for 8 and 16 kHzsampled data.

• ∆SM weighting filter for near-to-far field conversion

• Psophometric weighting filter of ITU-T Rec. O.41 for noise measure-ments

• ITU-T P.341 weighting filter for wideband signal (50-7000 Hz)

• 100-5000 Hz bandpass filter (new in STL2005)

• 50-14000 Hz bandpass filter (P341 extension for super-wideband sig-nal) (new in STL2005)

• MUSHRA anchors (3.5 kHz low-pass filter, 7 kHz low-pass filter, 10kHz low-pass filter) (new in STL2005).

2. EID: Error insertion algorithm, with routines for generation of bit errorpatterns (random or burst) as well as random and burst frame erasure.

3. G.711: The 64 kbit/s PCM algorithm with A and µ law of ITU-T Rec.G.711.

4. G.711-PLC: The high-quality, low complexity packet-loss concealmentspecified in ITU-T Rec. G.711 Appendix I (new in STL2005).

5. G.726: The 40, 32, 24, and 16 kbit/s ADPCM algorithm of ITU-T Rec.G.726.

6. G.727: The 40, 32, 24, and 16 kbit/s embedded ADPCM algorithm ofITU-T Rec. G.727.

www.cygnus.com

Version: November 2, 2005 3

7. G.722: The 64, 56, and 48 kbit/s wideband speech ADPCM algorithmof ITU-T Rec. G.722.

8. RPE-LTP: The 13 kbit/s RPE-LTP algorithm of the full-rate GSMsystem (GSM Rec. 06.10).

9. MNRU: The modulated noise reference unit of ITU-T Rec. P.810 (for-merly ITU-T Rec. P.81).

10. SVP56: The Speech Voltmeter for measuring the active speech level(which skips over silence in a utterance) of ITU-T Rec. P.56.

11. BASOP: The set of basic digital signal processing (DSP) operators thatrepresent the set of instructions typically available in digital signal pro-cessors (revised in STL2005).

12. REVERB: Tool to add reverberation to speech (new in STL2005).

13. TRUNCATE: Bitstream truncation tool (new in STL2005).

14. FREQRESP: Frequency response measurement tool (new in STL2005).

15. UTILITIES: Tools that have been developed to assure proper interfac-ing between the various tools. These tools do not relate to any ITU-TRecommendation. Included are tools for conversion between float andshort data representations, between parallel and serial (bit-stream) for-mats, and for scaling of data.

It should be noted that C code is available for a number of codecs as a normative part ofthe respective standards, e.g. ITU-T G.723.1, G.729, G.722.1, G.722.2; 3GPP extendedAMR Wideband codec, enhanced aacPlus general audio codec; ETSI GSM-HR, GSM-EFR, GSM-AMR; TIA IS-641, IS-127, IS-96A, among others. These source codes arenot appropriate for inclusion in the ITU-T STL for a number of reasons: they are anintegral part of the respective standards, are maintained within the scope of the respectivestandards development organizations (SDOs), are protected by copyrights, and are openlyavailable. Parties interested in acquiring these source codes should contact the appropriateSDO.

1.2 Whom to contact

In case of problems with any of the tools, please contact the ITU-T Study Group 16secretariat at <[email protected]>. Please provide a precise description of the problemwith proper reference to the C-code, and possible solution(s), if known.

1.3 Acknowledgements

Several organizations which participate in ITU-T Study Groups 12, 15 and 16 have sub-stantially contributed to the completion of this release of the ITU-T STL.

First and foremost, UGST wishes to thank CPqD/Telebras (Brazil) for its support ofthe early coordination (1990-1993) of the activity and of the development of the follow-ing tools: Utilities, G.711, G.726, MNRU, and SVP56. For the first two, the work was


shared with PKI (Germany), which also provided the initial version of the modules EIDand RATE-CHANGE, as well as basic material that supported the initial organization ofthe work, together with Telenor (formerly NTA, Norway) and the DBP-Telekom (Ger-many). DBP-Telekom also collaborated in providing several software tools used in theHost Laboratory for the ITU-T 8 kbit/s speech coder: modified IRS filters, adaptationof the Bellcore burst frame erasure model, and ∆SM filter. UGST also wants to thankCSELT (Italy) for making available its Fortran MNRU program, which was the startingpoint of the present implementation, and for the implementation of the psophometricfilter. CNET (France) provided the G.722 tool, which was greatly appreciated. UGSTkindly thanks Mr. Jutta Deneger for allowing the incorporation of his implementation ofthe RPE-LTP algorithm in the STL. Also, Bellcore provided several programs in Fortranand C that, while not used directly in the present version of the STL, were importantin various stages of the development of the Library, especially a version of the Red BookG.721. PTT Ukraine graciously provided the G.727 implementation, which was warmlywelcomed. COMSAT Labs (now part of Lockheed Martin Global Telecommunications,LMGT), in turn, provided essential help in funding the recent coordination work (1994-current), and the harmonization and documentation of the tools. Also important was thetesting work done by the Research Institute of the Deutshes Telekom (now T-Nova/DT),as well as PKI, Telebras, AT&T (USA), and CNET. Since 2003, several companies havejointly worked on the Basic Operators revision and an alternative set addition: TexasInstruments, Conexant Systems, STMicroelectronics, Hughes Software Systems, FranceTelecom, VoiceAge. Besides this work on Basic Operators, ITU-T Q7/12 and Q10/16experts work on the addition of new tools. France Telecom and Polycom have provided es-sential contributions in these STL2005 works. France Telecom also provided great supportfor the management of Q.10/16, responsible for the up-keeping of the STL (2004-current).Special thanks to ITU-T Q7/12 rapporteurs, Paolo Usai (ETSI) and Catherine Quinquis(France Telecom), ITU-T Q10/16 STL work moderators, Karim Djafarian (Texas Instru-ments) and Stephane Ragot (France Telecom) and ITU-T SG16 Counsellor Simao Ferrazde Campos Neto (actually, the ”father” of the STL).

Several parts of this manual were possible only by the contribution of several individuals:Pierre Combescure (CNET) for the description of the G.722 algorithm, Rudolf Hofmann(PKI), for description the Gilbert-Elliot channel implemented in the EID module, PeterKroon (AT&T) for the description of the RPE-LTP algorithm, and Vijay Varma (Bell-core) for the text describing the Bellcore Burst Error Model. The following persons havecontributed to the 2005 edition of this manual: Karim Djafarian (Texas Instruments) tothe edition of the Basic Operators chapter, Claude Marro (France Telecom) to the newchapter on the reverberation tool, Cyril Guillaume (on behalf of France Telecom) to thenew chapters on the frequency response measurement tool and the bitstream truncationtool, David Kapilow (AT&T) to the new chapter of G.711 PLC tool.

It is also necessary to thank all UGST members that collaborated on the earlier versionsof STL Manual.

Chapter 2

Tutorial

2.1 Acronyms

Several acronyms are used in this text. The most relevant are:

ANSI . . . American National Standards Institute.

BBER . . Burst Bit Error Rate

BER . . . . Bit Error Rate (refers to random bit errors)

BFER . . Burst Frame Erasure Rate

DAT . . . . Digital Audio Tape.

EID . . . . Error insertion device.

ETSI . . . European Telecommunications Standards Institute.

FER . . . . Frame Erasure Rate (refers to random frame erasures)

GSM . . . Global System for Mobile Communications. Pan-European digital-cellular sys-tem operating at a net rate of 13 kbit/s in its full-rate system.

IRS . . . . . Intermediate Reference System, defined in ITU-T Rec. P.48 for the so-called“full-IRS” mask, or in Annex D of ITU-T Rec. P.830 for the so-called “modi-fied” IRS mask.

ITU . . . . International Telecommunication Union.

ITU-T . . Standardisation Sector of the International Telecommunication Union.

LSb . . . . . Least significant bit.

MIRS . . . Modified-IRS telephony speech weighting (in ITU-T Rec. P.830 Annex D).

MSb . . . . Most significant bit.

PSTN . . Public Switched Telecommunication Network.

R&O . . . Requirements and Objectives, for performance of software tools.

SQEG . . Speech Quality Experts Group, of Study Group 12 of the ITU-T.

PLC . . . . Packet loss concealment

STL92 . . ITU-T Software Tools Library, release 1992.

STL96 . . ITU-T Software Tools Library, release 1996.

STL2000 ITU-T Software Tools Library, release 2000.

STL2005 ITU-T Software Tools Library, release 2005.

UGST . . Users’ Group on Software Tools, of Study Group 16 of the ITU-T.

5


2.2 Definition of terms

In the documentation of the ITU-T software tools, several terms are widely used and aredefined below.

2.2.1 Overload point

The overload point within the digital domain is defined by the (normalized) amplitudevalue.

x over∆= 1.0

How this overload point relates to the analogue world depends on the conversion methodbetween the analog and digital domains, and is beyond the scope of this document. Allsignals in this manual are relative to this overload point in the digital domain.

NOTE: This overload point does NOT depend on the quantisation method used andremains identical, regardless of whether the quantisation is done e.g. with 32, 16, 13 or8 bits.

1. In floating point (either single or double precision), the representation of this valueis exact. In this text, and also in the tools, this data type is called float.

2. In 32 bit 2’s complement representation the data can be represented by multiplyingthe normalized value by 231. For example, the largest possible positive value is rep-resented by 0x7FFFFFFF. The largest negative value is represented by 0x80000000.In this text, and also in the tools, this data type is called long.

3. In 16 bit 2’s complement representation the data can be represented by multiplyingthe normalized value by 215. For example, the largest possible positive value isrepresented by 0x7FFF. The largest negative value is represented by 0x8000. In thistext, and also in the tools, this data type is called short.

4. The statements above may be generalized for all wordlengths in fixed point repre-sentation. The idea is to set the decimal point just after the MSb (sign bit).

2.2.2 Signal power

The power of a signal x(n) with a length of N samples is defined by

P =1

N

N−1∑n=0

x(n)2


A signal which does not contain amplitude values exceeding the overload point can havea maximum signal power of 1.0. This is the power of a DC signal with an amplitude of1.0 or of any other signal comprising only the values ± 1.0 (e.g., a square wave signal).

2.2.3 Signal level

The power level in decibels is defined relative to a reference power level P0 = 1.0:

L = 10 log10(P/P0) (dBov)

The level of a signal power P = 1.0 is thus 0 dBov (where the characters “ov” arbitrarilymean digital overload signal level), which is chosen to be the reference level. A signal withsuch power level could be either (a) a sequence of maximum positive numbers (+1), (b)a sequence of maximum negative numbers (–1), or (c) a rectangular function exercisingonly the positive or negative maximum numbers (±1). The level of a sinewave with anamplitude (peak value) of 1.0 is therefore L = −3.01 dBov.

2.2.4 Relation between overload and maximum levels

The measurement of signal levels in the digital part of the network is normally expressedby telecommunications engineers as y dBm0, i.e., the level relative to 1 mW in 600Ω.However, from the software point of view, it is more convenient to represent levels relativeto the maximum power that can be stored in integer format on a computer, e.g. z dBov.A conversion between both representations can be expressed as:

y (dBm0) = z (dBov) + C

For the G.711 encoding rule, a sinewave which exercises the maximum level has a powerTmax of 3.14 dBm0 for A-law, and of 3.17 dBm0 for µ-law. On the other hand, theRMS level of these sinewaves would always be -3.01 dBov. Therefore, C above becomes6.15 dB for A-law and 6.18 dB for µ-law. For the G.722 wideband coding algorithm, theoverload point of the A/D and D/A converters should be 9 dBm0. Therefore, in thatcase, C becomes 12.01 dB.

The following relationships summarize the discussion:

ΛA(dBm0) = Lov(dBov) + 6.15dB(A-law)

Λµ(dBm0) = Lov(dBov) + 6.18dB(µ-law)

Λwb(dBm0) = Lov(dBov) + 12.01dB(G.722)


2.2.5 Saturation

Saturation is the limitation of signal amplitudes to values equal to or smaller than theoverload point:

y(k) =

−1.0, if x(k) < −1.0x(k), if −1.0 ≤ x(k) ≤ +1.0+1.0, if x(k) > +1.0

2.2.6 Data representation

Unless otherwise noted all waveforms within the signal processing are assumed to haveinfinite precision and unlimited amplitude. The overload point is therefore the referencepoint only. In practice these signals may well be represented in 32 bit floating pointarithmetic or high precision integer arithmetic (24 bit for data and coefficients, 48 to 56bit for products and accumulation). In most cases, 16 or 32 bit integer arithmetic is notprecise enough.

Signals derived from 16 bit 2’s complement representation (DAT, files, digital I/O inter-face) should be converted to this (approximately) infinite precision before processing bymodules that require floating point input. Normalization of the floating point values tothe overload point is recommended.

2.2.7 Data justification

Justification of data here is used without distinction to data alignment and data adjust-ment: where the upper or lower significant bit of an integer sample is located.

Left-justified data are samples whose most significant bit is located at the leftmost positionof the computer storage unit used for it. Remaining low-bit positions must be set to zero.

Right-justified data are samples whose least significant bit is located at the rightmostposition of the computer storage unit used for it. Remaining upper bits depend on thedata representation: if two’s complement, sign extension from sample’s MSb to storage’sMSb is needed; otherwise, the upper (unused) bits shall be zeroes.

As an example, suppose a 12-bit resolution, two’s complement sample, to be stored forprocessing in a short. If left-justified, then a sign bit (the MSb!) is found in bit 15 (theMSb) of the short that stores it. On the other hand, if right-justified, the LSb will be thebit 0 of the short, in this case. If it is a negative number, there would be sign extensionfor bit 12 to 15. If it is an unsigned number, the upper 4 bits (in the example) are allzeros. Figure 2.1 illustrates these three cases.


Bit number 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0Bit type s v v v v v v v v v v v x x x x

(a) Left-justified data

Bit number 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0Bit type s s s s v v v v v v v v v v v v

(b) Right-justified, sign-extended data

Bit number 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0Bit type 0 0 0 0 v v v v v v v v v v v v

(c) Right-justified, unsigned data

Figure 2.1: Illustration of a left- and right-justified data with 12-bit resolution.Bit types s, v, and x represent respectively sign bit(s), valid bits and unused bits.

2.2.8 Equivalent results

Several software tools, such as the G.711 algorithm, are defined in terms of precise fixedpoint operations. Therefore, when comparing the output of one of these algorithms ondifferent platforms, or for compilation using different C compilers, one should expectidentical sample values for reference processed materials.

Other algorithms, however, may include highly intensive processing, or complex mathe-matical functions. Examples of these are rate change filters and floating-point arithmeticspeech coders, such as the 16 kbit/s LD-CELP of ITU-T Rec. G.728. In such cases, itis expected that the processing of the same reference material on different platforms willgenerate almost identical results. The generated files will probably be identical for mostof the samples, and for some samples they will differ by a small amount, e.g. ±1, or morerarely by ±2 or more. For the purposes of the STL, such an implementation is said toproduce equivalent results on different platforms.

2.2.9 Little- and big-endian data ordering

Present computer systems agree only on the data access for byte-oriented data structures.Although computer systems exist whose bytes do not have 8-bits, the majority of thesystems implement bytes as 8-bit data structures. In general, the computer architecturesdo not differ in the way they access the bit-order within a byte. In other words, for thevast majority of the computer systems existing today, the least significant bit occupiesthe lower memory position (i.e., bit 0), and the most significant bit occupies the highermemory position in the byte (i.e., bit 7). In terms of C operations, if b is a byte structure,then b&0x1 returns the LSb, and (b>>7)&0x1 returns the MSb.

Although most computer architectures agree on the definition of a byte and how its bitsare accessed, they vastly differ on how multi-byte structures are accessed. Trivial examplesof multi-byte structures are 16-bit short words or 32-bit long words. There are currently


Table 2.1: Example of big- and little-endian systemsBig-endian Little-endian

Computer Microprocessor Computer Microprocessor

Sun-3 Motorola 68000 family IBM-PC/compatibles(a) Intel 80x86/PentiumSun-4 Sun SPARC family DEC-Stations MIPS RISC

Silicon Graphics MIPS RISC DEC Alpha DEC Alpha AXPIBM 370 IBM VAX/VMS Microcomputers VAX CPU

HP 9000-700 HPPA RISCLegend: CISC: Complex Instruction Set Computer

RISC: Reduced Instruction Set ComputerNote: (a) Including Windows 9x/NT/2000 and Linux and Solaris on Intel CPUs.

two access means currently implemented by different CPUs in the market, which differon the significance of the bytes that are first read from memory positions.

On the so-called big-endian systems, the first byte read from a multi-byte structure isalways the most significant byte. For example, if the two bytes 0x12 (low address) and0x34 (high address) are stored in two consecutive memory addresses, then the numberread and stored in the CPU accumulator would be 0x3412, or 13330 in decimal. Thebig-endian data organization is, for this reason, also known as high-byte first.

For the so-called little-endian systems, the first byte read from a multi-byte structure isalways the least significant byte. For this reason, the little-endian data organization isalso known as low-byte first. Using the same example as before, for the two consecutivebytes in memory 0x12 and 0x34, the value loaded on a little-endian CPU will be 0x1234,or 4660 in decimal.

The concept is extended to other multi-byte data structures, such as 32-bit or 64-bit inte-gers. For example, the consecutive bytes 0x12, 0x34, 0x56, and 0x78 would be loaded asthe 32-bit integer 0x78563412 on the accumulator of a big-endian CPU and as 0x12345678on the accumulator of a little-endian CPU.

Table 2.1 indicates the data organization for several computer platforms. It should benoted that the data organization is a function of the CPU family rather than of theoperating system used. For example, Solaris on Sparc platforms uses big-endian dataorganization, while Solaris on Intel 80x86/Pentium platforms uses little-endian data or-ganization. Similarly, most Linux systems are little-endian (because they run on Intel80x86/Pentium CPUs), but several other implementations are actually big-endian (e.g.PowerPC CPU used in Macintosh machines).

The segment of C code in figure 2.2 can be used to determine whether a given computersystem has big- or little-endian data organization.

The approach above determines whether a platform is big- or little-endian, but it doesnot answer the question of what is the byte orientation in a given file. Although there isno closed-form method for such a determination, there is an empirical method that canbe carefully used for speech signals (usually represented using 16-bit linear PCM words)based on two speech properties: speech signals follow a gamma distribution (hence mostof the samples have small amplitude), and levels in voiced segments are usually in the –15dBov through –40 dBov range. For files that have a byte orientation mismatching that ofthe computer platform, the mostly small samples of the speech signal will be measured as


#include <stdio.h>#include <string.h>

int is_little_endian()

/* Hex version of the string ABCD */unsigned long tmp = 0x41424344;

/* Compare the hex version of the four characters with the ASCII version *//* On big-endian (or high-byte-first) systems, 0x41 (’A’ in ASCII) *//* is stored in the first memory position, and the equivalent string *//* is "ABCD". On a little-endian (or low-byte-first) system, 0x41 is *//* stored in the last position, and the equivalent string will be *//* "DCBA". Function strncmp will return 0 if both strings are equal *//* upto the first four characters. */return(strncmp("ABCD", (char *)&tmp, 4));

void main()

printf("System is %s-endian\n", is_little_endian()? "little" : "big");

Figure 2.2: Sample code for determination of byte organization.

having large amplitude. Hence, if a high-level power is found when measuring the powerof a voiced segment (typically around –4 dBov), one can assume that the file needs to bebyte-swapped. It is important however to measure the level for voiced segments, since forsilent intervals the increase in gain is not so dramatic and will not allow for a conclusionon the byte-orientation of the file.

When the change of format is necessary for short and long data, the operations in figure2.3 should be used. The conversion between big- and little-endian data representation for16-bit data is simple and is known as byte swapping. The byte swapping operation canbe implemented in several fashions. For example,

short swap_one_short(short in)

return (((in>>8)&0xFF) | (in<<8));

It should be noted that the simple byte-swapping above does not work properly for con-version of other multi-byte structures. For the purposes of the STL, however, 16-bitstructures is the most import case. For several of the STL modules, the provided test filesin general need to be byte swapped in one or another computer platform. The documen-tation and the “manifesto” accompaining each software tool module describe which files,if any, should be byte-swapped on certain platforms. As default, binary files organized in16-bit words are provided in big endian format in the STL distribution.


Byte 1 Byte 2 Byte 3 Byte 4

Byte 1 Byte 2 Byte 3 Byte 4

0 7 8 15 16 23 24 31

0 716 2324 31 8 15

Byte 1 Byte 2

Byte 1 Byte 2

0 7 8 15

0 78 15

(a) Conversion between little− and big−endian for 32−bit data (b) Conversion between little− and big−endian for 16−bit data

Figure 2.3: Conversion between big- and little-endian

2.3 Guidelines for software tool development

The software tools provided by the ITU-T User’s Group on Software Tools are to be usedby laboratories with different computers and A/D-D/A equipment. To make the softwareaccessible to everybody, it should be highly portable across operating systems and allowfor easy implementation in existing hardware environments.

To achieve this, some simple guidelines were followed in the development of the tools.The following are the UGST guidelines used to generate the official and beta releases ofthe ITU-T Software Tool Library.

i. All software should be written in ANSI C.

ii. Features of the language whose representation may create side-effects should not beused (e.g. union).

iii. All variables must be declared and the types used in the declarations must be theleast platform dependent. For example, the keyword int must be avoided. Insteadshort should be used for 16-bit integers and long should be used for 32-bit integers.

iv. The software should not contain any input or output that may be system dependent(e.g. open, read and write file operations). Instead, data must be passed to themodules as parameters of function calls. This will allow each laboratory to integratethe modules with their own application software without changing the modules.Interfaces to various file formats and user interaction can optionally be provided asexample main programs1 that will not be a part of the library module and shouldcontain the least possible amount of code.

v. Well defined digital signal formats should be used and documented for each moduleto allow the various modules to work together.

vi. The interface to the file system should be made in a standard way, but only withinthe example programs.

1Also called “demonstration programs” in this manual.


vii. The source code should be properly documented, with a standard header.

viii. Modularity is encouraged in the software design. All modules are self-contained, i.e.global definitions should be avoided.

ix. Each module should have an attached specification document explaining the func-tion and use of the module, the level of detail depending on its complexity.

x. The software modules shall be distributed to interested laboratories for commentsand testing before they are approved and included in the ITU-T Software ToolsLibrary, to minimize the ocurrence of bugs and to assure conformance with relatedITU-T Recommendations (when applicable). Two test procedures have been de-vised: compliance and portability.

The compliance procedure (or compliance test) is to certify that a given tool module fullycomplies with specifications, which should be carried out by at least one organizationother than the proponent organization (or by a group of organizations, each one checkinga different subset of the specifications, such that all together cover all the specifications).In order to minimize the probability of systematic errors, these procedures should bedefined by the verifying organization(s) without input from the tool provider(s).

The portability verification procedure (or portability test) is to certify that a given val-idated tool works on platforms other than the one(s) where they were generated andvalidated. In simple cases these verification procedures could be just test vectors (e.g.speech or noise files). It was also pointed out that problems may arise in Unix plat-forms, due to the existence of several flavors of Unix available today (this means that averification procedure could be valid in one Unix machine, but not in other).

Portability verification procedures should be provided by the proponents and shall be runon at least two relevant operating systems (DOS, UNIX). In the past, procedures for theVMS operating system used to be required, however this operating system has becomeless common. For DOS, the “pure” 16-bit mode has become less common, and 16-bitemulation window under a 32-bit version of MS Windows is now prevalent. These factsaffect the choice of compiler.

The following is a list of compilers used to test the portability of tools in the STL, althoughnot all tools were necessarily tested with all compilers.

HP/c89 This is the c89 compiler that can be purchased from HP for use in HP-UXsystems. For the STL, tests with this compiler were performed with HP-UX9.05.

HP/gcc This is the HP-UX port of the gcc compiler. The specific version may differfrom tool to tool. Versions used included gcc 2.7.2.2 for HP-UX 9.05 andgcc-2.95.2 for HP-UX 10.20.

MSDOS/gcc This is the MSDOS-6.22 port of the gcc compiler version 2.6.3-DJGPP V1.This is a 32-bit compilation of the code, however using a 16-bit interface.Executables are not likely to run under Windows MS-DOS emulation win-dow. Needs a run-time 32-bit extender called go32.exe.

MSDOS/tcc This is the Borland Turbo C++ Version 1.00 tcc compiler.


MSDOS/bcc This is Borland C++ bcc compiler. Versions used included 3.0 and 4.5.

Solaris/gcc This is the gcc compiler version 2.95 running under Solaris 7, usually in aSparc platform.

SunOS/cc This is the basic cc C compiler bundled in the SunOS distribution. For theSTL, SunOS version 4.1.3 was used.

SunOS/acc This is the licensed acc C compiler sold by Sun Microsystems. For the STL,SunOS version 4.1 was used.

Win32/gcc This is the gcc compiler version 2.95 running under Windows NT 4 SP 4and with the CYGWIN Unix emulation interface. These executables needeither the CYGWIN environment or the run-time library cygwin1.dll to run,and they are expected to work properly in a DOS emulation window un-der Windows 95/98 as well. This version will not run under native MS-DOS.

Win32/cl This is the command-line cl version 12.00.8168 C compiler of the MS VisualC V.6 SP3 running under the WinNT 4 SP4 (the executables will also runin Windows 95/98/SE/Me/2000). This version will not run under nativeMS-DOS.

2.4 Software module I/O signal representation

The idea behind the choice of the convention in this section is that all software moduleswithin the ITU-T tool library should be independent building blocks which can easilybe combined by connecting the output of one module to the input of the next module.With this characteristic, various systems may be very easily constructed. The individ-ual software modules must have well-defined interfaces to allow such simple connections,especially at the I/O level. This convention is based on the following:

1. All modules work ‘from RAM to RAM’. This means that the working modules areindependent from physical I/O functions which are normally machine dependent.This approach also allows easy cascading of modules within one ‘main’ program.

2. All signals at the I/O interfaces of modules are represented in one of the followingways:

(a) in single or double precision (32 or 64 bit) floating point representation. Thenormalized signal is used directly (overload point = reference point = 1.0)

(b) in 32 bit 2’s complement representation. The normalized signal must be mul-tiplied by 231 (i.e. the decimal point is just after the MSb, same as for 16 bitrepresentation). If less than 32 bits are required, then the signal is left adjustedwithin the 32 bit longword and the LSbs are optionally set to 0.


(c) in 16 bit 2’s complement representation, as described in section 2.2.1. If lessthan 16 bits are required, then the signal is left adjusted (left-justified) withinthe 16 bit words and the LSbs are optionally set to 0. If the host machinedoes not provide a format with 16 bit width, then the next longer wordlengthshould be used with the 16 bits right adjusted.

3. Data exchange with a module shall be done directly within the calling statement(not by global variables).

4. Data exchange with a module shall be done sample-by-sample (FIR-filtering, MNRU,etc.) or frame-by-frame (block oriented speech codec, etc.), whichever is more con-venient. Larger blocks may be formed (e.g. 128 samples at a time) for betterefficiency, however the block size should be rather small (less than 512). The blockand its length shall be variables.

5. All modules shall be constructed in a way that infinitely long signals may be pro-cessed with a reasonable amount of internal storage. As an example, the ‘main’program could read a block of input data (e.g., next frame of time signal samples)from the disk, call a module or sequence of modules, write the output signal (e.g.,next frame of coded parameters) back onto disk. This process is repeated for all theinput data blocks of interest.

6. All modules shall have

(a) an initialization part (if necessary) and

(b) a working part

The initialization part may be necessary to reset internal state variables, define themode of operation (e.g. MNRU-mode), and so on. It is called only once at thebeginning or whenever a reset to an initial state is needed.

NOTE: All state variables (if any) must be initialized at execution time, notat compile or load time.

The working part performs the processing itself. It leaves all state variables in awell-defined manner for the immediate use within the next call. One possible wayto do this is to introduce a flag-variable within the call statement (e.g., named‘Initialize’) which is set by the ‘main’ program to ’1’ for initialization and is setto ’0’ during normal operation. In this way, only one function for one module isnecessary. Alternatively, a specialized initialization routine may be written, to becalled before the main processing routine of the module. Only one of the approacheswill be followed in the future. However, both are present in the current version ofthe STL.

7. The RAM allocation shall in principle be split into ’static’ and ’temporary’ parts.‘Static’ means that the contents must be saved from call to call, preferrably bymeans of state variables rather than truly static variables2. ‘Temporary’ meansthat the contents are not saved between sucessive calls of the module.

8. All modules are separated in clearly and independently defined functions, but ac-companied by an example ‘main’ program which may also include file I/O.

2As a rule, state variables should not be defined as truly static ones because this may cause side-effects.


2.5 Tool specifications

For each tool, there are ‘Requirements and Objectives’ (R&Os) associated. Each of theR&Os has both a general and a specific part.

The general part includes the following3:

1. Portability among platforms and Operating Systems (DOS, UNIX, andVMS):

• compilation [GL-i];

• usage of language features that may cause side-effects [GL-ii];

• usage of language features that may be ambiguous among platforms[GL-iii];

• usage of system dependent calls (to access resources such as files, etc.within the modules) [GL-iv];

2. Efficiency:

• use of CPU (i.e., execution speed);

• use of I/O (intensity of access to files, etc.);

• use of memory (physical/virtual);

• code’s coverage (verbosity versus laconism);

3. Documentation:

• Self-documentation (e.g., comments, variables and structure resem-bling ITU-T Recommendations, etc.)[GL-vii];

• Separate documentation (clarity, objectivity, etc.)[GL-ix];

4. Modularity [GL-viii]

5. Fixed point versus floating point implementations;

Following are descriptions of each of the General R&Os. Full description of the R&Os canbe found in [2, Annex 4].

General performance specification refers to the document that specifies the tool in ques-tion, e.g. an ITU-T recommendation or ANSI or ETSI standard.

Portability addresses several points related to the tool’s capacity of working on severalplatforms: Compilation and linkage refers to the necessity of changes in the source codeto make a tool compile without any modification in a given environment. It was identifiedthat the operating systems of most interest are DOS and Unix (both BSD and System V).Side-effectable features are those that, if used in a program, when changing one parameter,may cause other(s) to be changed implicitly. Ambiguous features are those that, due tothe flexibility left in the C language specification, are implemented in different ways fordifferent platforms. For example, int in C is 32-bit wide in VAX-C and Unix workstations,but is 16-bit wide for most compilers available on MS-DOS (Turbo-C/MS-C). System-dependent calls are calls that are restricted to or are implementation of features of aparticular platform, to make better use of that particular computer architecture.

3GLx refers to the Guideline number x in section 2.3, e.g., GLiii is the Guideline iii.


Efficiency is related to how the computer’s resources are used in terms of CPU, I/O andmemory allocation, that may be a burden and prevent the usage in some systems, eitherby lack of resources or length of time needed for execution. Efficiency also includes code’scoverage, expressing how frequently code is accessed.

Documentation refers to how to describe the tool. Self-documentation is the documenta-tion present in the program itself to assure that the code clearly describes the algorithmimplemented, to provide compilation and linkage instructions, as well as to report knownbugs, etc. A separate document will be mandatory when no written description of thealgorithm is available, or when the written documents that specify the tool are too general.

Modularity degree is the degree of isolation that a particular tool has. From UGSTGuidelines, all tools must be modular, i.e., self-contained blocks; nonetheless, tools maymake use of system resources other than memory and CPU.

Arithmetic is the number representation specification, whether fixed (2’s complement, 1’scomplement, etc.) or floating point. Here, “fixed point” shall always be understood as2’s complement representation, except where otherwise noted.

Chapter 3

RATE-CHANGE: Up- anddown-sampling module

In certain applications involving digitized speech, such as subjective evaluation of speechprocessed by digital algorithms, it may be preferable to use sampling higher than thetypical rate used for the algorithms under test. This is desirable because simpler analogfilters with less phase distortion can be built. Another advantage is that upper frequencycomponents of the signal are not lost. It also allows for the convenient shaping of theinput signal, such as IRS, ∆SM , and psophometric weightings. Consequently there is aneed to adapt the sampling rate of the digitized signal to that of the processing algorithm.For telephony applications, the typical sampling rate is 8000 Hz with a signal bandwidthin general of 300–3400 Hz, and for wideband speech applications, a bandwidth of 50-7000Hz is desired with sampling rate of 16000 Hz. Therefore, sampling rates above 8000 Hzand 16000 Hz are desirable, respectively. In several experiments [3] the sampling rate was16 kHz. In others (see [4] and [5]), 48 kHz and 32 kHz were utilized. Hence the need for asoftware tool to carry out filtering and sampling rate change. Next, the rate change andspectral weighting routines implemented in the ITU-T STL are presented.

3.1 Description of the Algorithm

Signal processing theory describes the basic arrangement for decimation of signals; firstthe signal is low-pass filtered to limit its bandwidth in order to avoid aliasing when therate is lowered and, second, to decimate the samples, i.e., to drop out samples from theinput signal, such that the desired output rate is obtained. For example, if a rate reductionfrom 48 kHz to 8 kHz is desired, a decimation factor of 6:1 is necessary. This is equivalentto say that, after limiting the bandwidth of the digitized speech to 4 kHz, 5 out 6 samplesare skipped, or alternatively, only 1 out of 6 samples will be kept (or saved) from thesignal.

The up-sampling of signals requires that each of the input samples be followed by a numberof zero samples, such that the desired output rate is achieved; after this, an interpolationoperation of these zero samples is performed to obtain a continuous-envelope signal. Forexample, up-sampling data from 8 kHz to 16 kHz requires interleaving each sample of theinput signal with a zero sample followed by interpolation of the signal. This interpolationcan be carried out by means of a polynomial, which is equivalent to a filtering operation.

19


The type of filtering required is determined by the application intended for the signals.For the tools needed in this version of the STL, three different groups of characteristicswere defined:

• High-quality: Change in rate without changing the frequency response ofthe input signal. This is accomplished with a flat, linear phase, low-pass orbandpass FIR filter.

• Spectral weighting: Spectral weighting without rate change is necessaryfor some applications. For narrow-band speech, available are the IRS weight-ing specified in ITU-T Rec. P.48, the so-called “modified” IRS (annex D ofITU-T Rec. P.830), the far-to-near-field conversion ∆SM weighting, and thepsophometric noise weighting of ITU-T Rec. O.41. For wideband signals, themask for wideband handsets, as defined in ITU-T Rec. P.341, is also available.For super-wideband signals, the mask for super-wideband videoconferencingterminals has been derived as an extension of ITU-T Rec. P.341.

• PCM quality: Change in rate accompanied with modification of the fre-quency response of the input signal according to the mask specified in ITU-TRecommendation G.712. This is accomplished with a non-linear phase low-pass IIR filter.

3.1.1 High-quality

The response of the filters in this type of rate change must minimize phase and amplitudedistortion. For example, for decimation from 48 kHz to 16 kHz, the filter must be flatup to about 8 kHz (except for the transition, or cut-off, region), with a linear phase. Inother cases, it may be desirable to remove the DC component and hum noise (50–60 HzAC line noise) from the signal without additional phase distortion to the upper region ofthe spectrum.

z-1 -1

z z-1 -1

zz-1 -1

z z-1

y(k)

hN-1

hN-1

hhhh0 1 2 3

hhhh0 1 2 3

x(k)

Figure 3.1: FIR filter block diagram.

One way to do this is to use a linear phase finite impulse response (FIR) digital filter, asin figure 3.1. The input and output characteristic is defined by:

y(k) =N−1∑i=0

h(i) · x(k − i)


-1

K

1 01

11

0L

1L

2L

1L

21

11

1

c

c

c

c

b

b

b

b

z

z

z

z

y(k)x(k)

-1

-1

-1

Figure 3.2: Parallel-form IIR filter block diagram.

Linear phase is guaranteed if the filter is symmetric, i.e.:

h(k) = h(N − 1− k), for k = 0..N − 1

3.1.2 Telephony-band weighting

IRS weighting

The IRS weighting corresponds to a bandpass filtering characteristic whose mask can befound in ITU-T Recommendation P.48 [6]. The send and receive spectral shapes of theIRS weighting were obtained in a round-robin series of measurements made on a numberof contemporary analog telephones in the early 1970’s [7]. From these measurements, theaverage send and receive frequency-response characteristics were derived. However, forthe loudness balance purposes for which the IRS was designed, it was also necessary toinclude a 300-3400 Hz bandpass filter, known as the SRAEN filter. The values of send andreceive sensitivity currently given in ITU-T Rec. P.48 (columns 2 and 3 in Table 3.1) aretherefore composed of the average send and receive responses for a number of telephones,as well as the response of the SRAEN (Systeme de Reference pour la determination del’Affablissement Equivalent pour la Nettete; Reference System for determining Articula-tion Ratings) filter (see column 4 of Table 3.1).

Because the P.48 IRS weighting used to be considered to model of an average narrow-band telephone handset deployed in the PSTN, the IRS weighting has been chosen tosimulate speech signals obtained from a regular handset. Examples of standardizationefforts using the P.48 weighting characteristic are the ITU-T Recommendations G.711,G.721, and G.728. This weighting, as defined in P.48, is sometimes called “full-IRS”weighting.

While the weighting characteristic in P.48 was considered to model connections over ana-log transmission facilities in the past (although it is not clear why the SRAEN filtershould be included in both the send and receive paths), it is no longer representative ofconnections over modern digital facilities. In particular, the low frequency roll-off givesrise to unnecessary quality degradation. For the purpose of low bit-rate coder evaluation,especially where the coder is located in the telephone handset, a better characteristic can


Table 3.1: Send and receive amplitude frequency characteristics for the IRSresponse as in ITU-T Rec.P.48, the SRAEN filter, and the modified IRS (P.48IRS with SRAEN filter insertion loss removed).

Frequency P.48 IRS SRAEN Modified IRSSend Receive Filter Send Receive

(Hz) (dbPa/V) (dbPa/V) (dB) (dbPa/V) (dbPa/V)

100 -45.8 -27.2 14.1 -31.7 -13.4125 -36.1 -18.8 11.4 -24.7 -7.4160 -25.6 -10.8 8.4 -17.2 -2.4200 -19.2 -2.7 5.9 -13.3 3.2250 -14.3 2.7 4.0 -10.3 6.7300 -11.3 6.4 2.8 -8.5 9.2315 -10.8 7.2 2.5 -8.3 9.7400 -8.4 9.9 1.4 -7.0 11.3500 -6.9 11.3 0.6 -6.3 11.9600 -6.3 11.8 0.3 -6.0 12.1630 -6.1 11.9 0.2 -5.9 12.1800 -4.9 12.3 0.0 -4.9 12.31000 -3.7 12.6 0.0 -3.7 12.61250 -2.3 12.5 0.0 -2.3 12.51600 -0.6 13.0 0.1 -0.5 13.12000 0.3 13.1 -0.2 0.1 12.92500 1.8 13.1 -0.5 1.3 12.63000 1.5 12.5 0.5 2.0 13.03150 1.8 12.6 0.3 2.1 12.93500 -7.3 3.9 7.0 -0.3 10.94000 -37.2 -31.6 33.7 -3.5 2.15000 -52.2 -54.9 43.2 -9.0 -11.76300 -73.6 -67.5 -23*8000 -90.0 -90.0 -40*

(*): Values estimated from the modified IRS implemented in the STL.


be obtained by modifying the P.48 full-IRS response to remove the SRAEN filter as shownin columns 5 and 6 of Table 3.1. These values are specified in Annex D of ITU-T Rec-ommendation P.830 [8] and define the so-called “modified” IRS weighting. The modifiedIRS has been used in the development of ITU-T Recommendations G.723.1 and G.729.

The most important part of either the full or the modified IRS weighting is the transmis-sion (or send) characteristic. The receive characteristic is less important because listeningis in general done using handsets conforming to P.48 (which eliminates the need for filter-ing by the software, since it is done by the telephone terminal). In addition, the receivecharacteristic is relatively flat. Some studies also show that the use of headphones insteadof handsets does not result in significantly different results while yielding lesser listenerfatigue [9, 10]. Nevertheless, for cases where the receive-side MIRS filter is to be applied,a FIR implementation of this filter is available for 8000 Hz and 16000 Hz sampling rates.

An unspecified point in both P.48 and modified IRS is the phase response of the filter.There have been discussions within UGST on whether this should be implemented with alinear characteristic. The conclusion was that, since the phase response is unspecified, itshould be kept as generic as possible, what is better accomplished by keeping the phaselinear1. If a certain non-linear phase characteristic is desired by the user, this can beimplemented by cascading an all-pass filter with the desired phase response with one ofthe available FIR IRS implementations. Therefore, the IRS filters are implemented asFIR filters, as depicted in figure 3.1.

ITU-T Recommendation P.48 presents the nominal values for the amplitude response incolumn 2 of its Table 1 (here reproduced in column 2 of Table 3.1) and then the upper andlower tolerances listed in its Table 2. For the STL approach, it was decided to design IRSfilters whose characteristic would deviate no more than 0.5 dB from the average values inP.48 (see in Figure 3.12 the agreement of the nominal values, represed by dots, and themeasured frequency response for the original P.48 IRS characteristic, represented by thecontinuous curve in the figure).

Other weightings

A filter that simulates the input response characteristic of certain mobile terminals wasincorporated in the STL for data sampled at 16 kHz. Figures 3.10 and 3.11 display therespective frequency and impulse responses for the filter.

Another filter that models the input response characteristic of certain super-widebandvideoconferencing terminals was incorporated in the STL for data sampled at 32 kHz.Figures 3.22 and 3.23 show the respective frequency and impulse responses for the filter.

3.1.3 Wideband weighting

P.341 weighting

While the IRS filter is applicable to telephony bandwidth (or narrowband) speech, forwideband speech the specification for the send and receive sides is given in ITU-T Recom-mendation P.341 [11]. The mask specified in P.341 is rather wide, and an implementation

1In spite of that, a non-linear phase IIR IRS filter is provided in the IIR module as an example of acascade-form IIR filter implementation.


of the send-side mask agreed on by the experts has been incorporated in the STL.

Other weightings

In the process to select a wideband codec at 32 and 24 kbit/s, a 50 Hz-5 kHz bandpass filterwas developed and incorporated in the STL. Figures 3.24 and 3.25 display the respectivefrequency and impulse responses for the filter.

A 100Hz-5kHz filter was also designed for tests of another wideband codec. Figures 3.26and 3.27 show the respective frequency and impulse response of the filter.

3.1.4 Super-wideband weightings

P.341 extension weighting

For super-wideband signals, the mask for super-wideband videoconferencing terminals isbased on the ITU-T Rec. P.341. The sensitivity/frequency characteristics of the P.341filter were extended to a larger band [50Hz - 14 kHz] with a sampling frequency of 32kHz. The corresponding 50 Hz-14 kHz filter was developed and incorporated in the STL.Figures 3.22 and 3.23 display the respective frequency and impulse responses for the filter.

Other weightings

MUSHRA anchors for a sampling frequency of 48kHz are also provided with the STL.Anchors with cut-off frequencies 3.5kHz, 7kHz and 10kHz were designed. Their frequencyresponses are shown, respectively, in figures 3.28, 3.30, 3.32 and there impulse responseson figures 3.29, 3.31, 3.33.

3.1.5 Noise weighting

Two weighting filters are available in this version of the STL, the psophometric and the∆SM weighting filters.

The psophometric weighting curve defined by ITU-T Recommendation O.41 is used formeasuring the noise level in telephone circuits, accounting for the subjective perception ofnoise. The psophometric noise measure (given in dBmp) is related to the North-AmericanC-message weighting curve (given in dBrnC), using to the following:

dBmp = dBrnC − 90.0dB

The other type of signal weighting filter is the ∆SM , used for converting acoustic signalsrecorded in the far field using an omnidirectional microphone to the near-field equivalentof that signal if it were in the background of a telephone user. Owing to the directionalityof the human mouth, head and torso, the high frequencies will mainly be radiated in thefrontal direction, while the diffuse field will represent a spatial integration of the radiationin all directions [12]. Hence, the ∆SM filter is deployed for weighting acoustic noises(babble, vehicular, etc.) before electrical summation with clean speech files, in order to


simulate speech corrupted by background noise. It is useful in subjective listening testswhere precise control of the actual SNR is necessary.

Both these filters have been implemented as FIR filters. The psophometric filter has beendesigned for speech sampled at 8 kHz, and the ∆SM filter for speech sampled at 16 kHz.It should be noted that these filters, like the IRS filters, are also frequency-specific and,unlike the low-pass high-quality FIR filters described before, cannot be used for arbitraryrate ratio convertion.

3.1.6 PCM Quality

There are applications requiring the simulation of the response of filters found in the A/Dand D/A interfaces of current transmission systems, which are in general PCM systemssatisfying ITU-T Recommendation G.711. The filters associated with G.711 are specifiedin Recommendation G.712 [13]. The main characteristic of these filters is the low out-of-band rejection of 25 dB.

In this context it is also necessary to simulate the convertion back to and forth theanalog domain, e.g. to simulate multiple transcodings which are called asynchronoustrancodings2. One way to simulate asynchronous transcodings is by means of a non-linearphase filter (non-constant group delay), which is most efficiently implemented using IIRfilters.

Infinite impulse response (IIR) filters used in this tool are of the parallel form (see figure3.2), described by the equation:

HpI (z) = K +

L∑l=1

b0l + b1lz−1

1 + c1lz−1 + c2lz−2

and of the cascade form (see figure 3.3), described by the equation:

HcI (z) =

N∏l=1

b0l + b1lz−1 + b2lz

−2

1 + c1lz−1 + c2lz−2

3.2 Implementation

The rate change algorithm is organized in two modules, FIR and IIR, with prototypesrespectively in firflt.h and iirflt.h. It evolved from a version initially developedby PKI, as part of the ETSI Half-rate GSM codec Host Laboratory exercise [14]. Therate-change functionality was incorporated in the STL92 in two main files, hqflt.c andpcmflt.c. To make these routines more flexible, the following modifications were in-cluded:

FIR: the FIR module was divided into a library source file (fir-lib.c) containing thebasic filtering and initialization functions, as well as into source files for each kindof filter: fir-flat.c for high-quality low-pass and bandpass filters, fir-irs.c forthe classical and modified IRS filters, and so on;

2As the name indicates, there is no synchronisation between sampling instants of the two digitalsystems, i.e., re-sampling in the succeeding A/D is not synchronous to the clock in the preceeding D/Aconverter.


N

y (k)i

x (k)i

x (k)1

y (k)

-1

1

z

z

c

c

b

b

b

1

z

z

c

c

b

b

b

1

z

z

01

11

21

11

21

c

c

b

b

b

0N

1N

2N2N

1N

0i

1i

2i

1i

2i

...

...

-1

-1

-1

-1

Figure 3.3: Cascade-form IIR filter block diagram.

IIR: the IIR module was divided into a library file (iir-lib.c) containing basic filter-ing and initialization functions, as well as into source files for each kind of filter:iir-g712.c for G.712 filtering using the parallel-form filters, iir-flat.c for flatbandpass 1:3 and 3:1 asynchronization filtering using a cascade-form filter, and soon.

Files fir-*.c of the FIR module contain all the routines implementing FIR filters, i.e.,the high-quality filters and IRS, ∆SM and psophometric weighing filters. Files iir-*.c

of the IIR module implement the IIR filters, i.e., the parallel-form PCM filter and thecascade-form 3:1 asynchronization filter.

Some of these filters have been implemented using 24-bit coefficients, thus allowing real-time, bit-exact hardware implementation of these routines. It may noted that, for thesefilters in the STL, the calculations are performed in floating point by converting thecoefficients from the range −223..223–1 to −1.. + 1, which is not needed in real timehardware with fixed point DSPs.

Frequency response and impulse response plots are provided for the STL filters in theforthcoming sections. It should be noted, however, that the impulse responses shown havebeen computed from the 16-bit quantized impulse responses of the filters, as generated bythe demonstration programs, while the frequency responses were calculated as describedin section 3.3. It should be noted that the apparent asymmetry in some impulse responseshappens because an integer number of samples are generated, and linear interpolation isused to draw the figure. If the impulse responses were derived directly from the filtercoefficients, the plot would be symmetric.


NOTE: When the same filter type is used by several independent speech materials (e.g.several speech files) within the same execution of an application program, the user mustremember that the filters have memory. Hence, wrong results can be obtained if a givennumber of initial samples are not discarded. See section 3.4 for an example, where thefirst 512 samples are skipped when calculating the power level of the output tone.

3.2.1 FIR module

The frequency responses of the implemented high-quality low-pass filters are shown infigures 3.4 and 3.5 (for rate-change factors 2 and 3, respectively), while the telephonebandwidth bandpass filter is given in figure 3.6 (only a rate-change factor of 2 is available).The impulse responses of these filters are given in figures 3.7, 3.8, and 3.9, respectivelyfor the up-sampling filters (factors 2 and 3), for the down-sampling filters (factors 2 and3), and for the bandpass filter.

The transmit-side IRS filter has been implemented for the “regular” and modified flavors.The regular transmit-side P.48 IRS filter amplitude responses are shown in figure 3.12(the available sampling rates are 8 and 16 kHz). The transmit-side modified IRS filteris available for sampling at 16 kHz and 48 kHz, and their frequency responses are shownin figure 3.14. The impulse response of these transmit-side IRS filters are in figures 3.13and 3.15 for the regular and modified IRS filters, respectively. The receive-side modifiedIRS filter has also been implemented and the frequency responses for 8 kHz and 16 kHzsampling rate are found in 3.16. The impulse responses of the receive-side modified IRSfilters are shown in figure 3.17.

The frequency response of the STL psophometric filter is given in figure 3.18, and that ofthe ∆SM filter in figure 3.19.

For wideband signals, three weighting filters are available. The transmit-side ITU-T P.341filter amplitude response is shown in figure 3.20, and its impulse response is shown in figure3.21. Alternatively to the P.341 filter, the frequency and impulse responses of the twobandpass filters, 50Hz-5kHz bandpass filter and 100 Hz-5k Hz, are shown, respectively, infigures 3.24 and 3.25, and in figures 3.26 and 3.27.

For super-wideband signals, four weighting filters are available. The 50Hz-14kHz bandpassfilter, extension of ITU-T P.341 filter, is presented in figures 3.22 (frequency response),and 3.23 (impulse response). Alternatively to this P.341 filter extension, the frequencyresponses of the three MUSHRA anchors filters, LP3.5, LP7 and LP10 filters, are shownin figures 3.28, 3.30 and 3.32, respectively. Their impulse responses are shown in figures3.29, 3.31 and 3.33, respectively.

The high-quality filters were implemented for rate-change factors of 2 and 3. The IRSfilters, band-limiting filters and MUSHRA anchors have been designed for specific sam-pling rates (e.g. 8 and 16 kHz). It should be noted that, while the high-quality filters areindependent of the rate, these filters are not, because their masks are specified in termsof Hz, rather than normalized frequencies. This means that to carry out a high-qualityup-sampling from 8 to 16 kHz, and from 16 to 32 kHz, the same routines are called, whilefor IRS, band-limiting or MUSHRA anchors, there is no rate-change routine from 16 to32 kHz.

Since the digital filters have memory, state variables are needed. In this version of the


STL, a type SCD FIR is defined, containing the past sample memory, as well as filtercoefficients and other control variables. Its fields, whose values shall never be changed bythe user, are as follows:

lenh0 . . . . . . . . . .Number of FIR coefficientsdwn up . . . . . . . Down-sampling factork0 . . . . . . . . . . . . .Start index in next segment (needed in segment-wise filtering)h0 . . . . . . . . . . . . .Pointer to array with FIR coefficientsT . . . . . . . . . . . . . Pointer to delay linehswitch . . . . . . . .Switch to FIR-kernel: up- or down- sampling

The relevant routines for each module are described in the next sections.3

3It should be noted that in the source code files there are local (privately-defined) functions which arenot intended to be directly accessed by the user and therefore are not described here.


-80

-60

-40

-20

0

0 500 1000 1500 2000 2500 3000 3500 4000

|H(f

)| [d

B]

Frequency [Hz](a) High-quality filter for up-sampling.

-80

-60

-40

-20

0

0 1000 2000 3000 4000 5000 6000 7000 8000

|H(f

)| [d

B]

Frequency [Hz](b) High-quality filter for down-sampling.

Figure 3.4: High-quality filter responses for a factor of 2 and sampling rates of 8000 and16000 Hz.


-80

-60

-40

-20

0

0 500 1000 1500 2000 2500 3000 3500 4000

|H(f

)| [d

B]

Frequency [Hz](a) High-quality filter for up-sampling.

-80

-60

-40

-20

0

0 2000 4000 6000 8000 10000 12000

|H(f

)| [d

B]

Frequency [Hz](b) High-quality filter for down-sampling.

Figure 3.5: High-quality filter responses for a factor of 3 and sampling rates of 8000 and24000 Hz.


-80

-60

-40

-20

0

0 500 1000 1500 2000 2500 3000 3500 4000

|H(f

)| [d

B]

Frequency [Hz](a) High-quality bandpass for up-sampling (factor 1:2).

-80

-60

-40

-20

0

0 1000 2000 3000 4000 5000 6000 7000 8000

|H(f

)| [d

B]

Frequency [Hz](b) High-quality bandpass for 2:1 down-sampling or for 1:1 filtering.

Figure 3.6: High-quality bandpass filter responses. Mask shown is that of the G.712 filter,for reference.


Figure 3.7: Impulse response for high-quality up-sampling filters (top, factor of 2; bottom,factor of 3).


Figure 3.8: Impulse response for high-quality down-sampling filters (top, factor of 2;bottom, factor of 3).


-10000

-5000

0

5000

10000

15000

20000

25000

30000

0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150 160 170

Am

plitu

de []

Samples []

(a) High-quality bandpass for up-sampling (factor 1:2).

-4000

-2000

0

2000

4000

6000

8000

10000

12000

14000

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85

Am

plitu

de []

Samples []

(b) High-quality bandpass for down-sampling (factor 2:1).

Figure 3.9: Impulse response for high-quality bandpass filter (factors 2:1 and 1:1). Top isup-sampling by a factor of 1:2, and bottom is down-sampling by a factor of 2:1.


-90

-85

-80

-75

-70

-65

-60

-55

-50

-45

-40

-35

-30

-25

-20

-15

-10

-5

0

5

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500 7000 7500 8000

|H(f

)| [d

B]

Frequency [Hz]

Figure 3.10: STL mobile station input (MSIN) frequency response for data sampled at16 kHz (factor 1:1).

-5000

0

5000

10000

15000

20000

25000

30000

35000

0 20 40 60 80 100 120 140 160 180

Am

plitu

de []

Samples []

Figure 3.11: STL MSIN send-side filter impulse response.


-80

-60

-40

-20

0

100 1000

|H(f

)| [d

B]

Frequency [Hz]

STL Freq.Resp. IRS filter - sampl.freq=8kHz

(a) Transmission-side IRS for input samples at 8 kHz.

-80

-60

-40

-20

0

100 1000

|H(f

)| [d

B]

Frequency [Hz]

STL Freq.Resp. IRS filter - sampl.freq=16kHz

(b) Transmission-side IRS for input samples at 16 kHz.

Figure 3.12: Transmission-side IRS filter responses. The diamonds represent the nominalvalues of the “full” IRS characteristic and the interrupted line represent the mask of the“full” IRS, as shown in figure 2 of ITU-T Rec. P.48.


Figure 3.13: Impulse response of transmission-side IRS filters at 16 and 8 kHz.


-60

-55

-50

-45

-40

-35

-30

-25

-20

-15

-10

-5

0

5

10

100 1000

|H(f

)| [d

B]

Frequency [Hz]

(a) Transmission-side modified IRS for input samples at 16 kHz.

-60

-55

-50

-45

-40

-35

-30

-25

-20

-15

-10

-5

0

5

10

100 1000 10000

|H(f

)| [d

B]

Frequency [Hz]

(b) Transmission-side modified IRS for input samples at 48 kHz.

Figure 3.14: Transmission-side modified IRS filter responses. The interrupted line repre-sents the mask of the “full” IRS.


-7500

-5000

-2500

0

2500

5000

7500

10000

12500

15000

17500

20000

170 190 210 230 250 270 290 310 330

Am

plitu

de []

Samples []

(a) Transmission-side modified IRS for input samples at 16 kHz.

-3000

-2000

-1000

0

1000

2000

3000

4000

5000

6000

7000

8000

205 215 225 235 245 255 265 275 285 295 305 315

Am

plitu

de []

Samples []

(b) Transmission-side modified IRS for input samples at 48 kHz.

Figure 3.15: Impulse response of transmission-side modified IRS filters at 16 kHz (top)and 48 kHz (bottom).


-90-87-84-81-78-75-72-69-66-63-60-57-54-51-48-45-42-39-36-33-30-27-24-21-18-15-12

-9-6-303

0 500 1000 1500 2000 2500 3000 3500 4000

|H(f

)| [d

B]

Frequency [Hz]

Upper maskLower mask

STL filter

(a) Receive-side modified IRS for input samples at 8 kHz.

-90-87-84-81-78-75-72-69-66-63-60-57-54-51-48-45-42-39-36-33-30-27-24-21-18-15-12

-9-6-303

0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 5500 6000 6500 7000 7500 8000

|H(f

)| [d

B]

Frequency [Hz]

Upper maskLower mask

STL filter

(b) Receive-side modified IRS for input samples at 16 kHz.

Figure 3.16: Receive-side modified IRS filter responses. The diamonds represent thenominal values of the modified IRS characteristic and the interrupted line represent themask of the “full” IRS.


-5000

0

5000

10000

15000

20000

25000

30000

0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75

Am

plitu

de []

Samples []

(a) Receive-side modified IRS for input samples at 8 kHz.

-5000

-3000

-1000

1000

3000

5000

7000

9000

11000

13000

15000

0 10 20 30 40 50 60 70 80 90 100 110 120 130 140 150

Am

plitu

de []

Samples []

(b) Receive-side modified IRS for input samples at 16 kHz.

Figure 3.17: Impulse response of receive-side modified IRS filters at 8 kHz (top) and 16kHz (bottom).


-80

-60

-40

-20

0

0 500 1000 1500 2000 2500 3000 3500 4000

|H(f

)| [d

B]

Frequency [Hz]

STL Freq.Resp. Psophometric filter - sampl.freq=8kHz

0 dB at 800 Hz

Figure 3.18: Frequency response for the psophometric filter. The points show the averagepoints and the allowed range as per ITU-T Rec. O.41.

-40

-35

-30

-25

-20

-15

-10

-5

0

5

10

0 1000 2000 3000 4000 5000 6000 7000 8000

|H(f

)| [d

B]

Frequency [Hz]

Figure 3.19: Frequency response for the ∆SM filter.


0 1000 2000 3000 4000 5000 6000 7000 8000Frequency [Hz]

-80-75-70-65-60-55-50-45-40-35-30-25-20-15-10-505

10

Am

plitu

de R

espo

nse

[dB

]

P.341 send filter(linear phase FIR, 592 taps)

P.341 send mask

P.341 send mask

Figure 3.20: STL P.341 send-side filter frequency response for data sampled at 16 kHz(factor 1:1).

-10000

-5000

0

5000

10000

15000

20000

25000

196 216 236 256 276 296 316 336 356 376 396

Am

plitu

de []

Samples []

Figure 3.21: STL P.341 send-side filter impulse response.


0 2000 4000 6000 8000 10000 12000 14000 16000−80

−70

−60

−50

−40

−30

−20

−10

0

10

Frequency (Hz)

Am

plitu

de r

espo

nse

(dB

)

linear phase FIR, 32kHz sampling frequency, 1119 taps

Figure 3.22: STL 50Hz-14kHz band limiting filter frequency response for data sampledat 32 kHz (factor 1:1).

500 520 540 560 580 600 620−0.2

0

0.2

0.4

0.6

0.8

FIR 50−14000 Hz Flat bandpass filter Impulse response

Samples[]

Am

plitu

de[]

Figure 3.23: STL 50Hz-14kHz band limiting filter impulse response.


0 1000 2000 3000 4000 5000 6000 7000 8000Frequency [Hz]

-80.0-75.0-70.0-65.0-60.0-55.0-50.0-45.0-40.0-35.0-30.0-25.0-20.0-15.0-10.0

-5.00.05.0

10.0

Am

plitu

de R

espo

nse

[dB

]

(16 kHz sampling frequency, 592 taps)

Figure 3.24: STL 50Hz-5kHz band limiting filter frequency response for data sampled at16 kHz (factor 1:1).

-5000

0

5000

10000

15000

20000

196 216 236 256 276 296 316 336 356 376 396

Am

plitu

de []

Samples []

FIR 50-5000 Hz Flat Bandpass Filter Impulse Response



0 1000 2000 3000 4000 5000 6000 7000 8000−80

−70

−60

−50

−40

−30

−20

−10

0

10(16 kHz sampling frequency, 603 taps)

Frequency (Hz)

Am

plitu

de r

espo

nse

[dB

]

Figure 3.26: STL 100Hz-5kHz band limiting filter frequency response for data sampledat 16 kHz (factor 1:1).

220 240 260 280 300 320 340 360 380 400−0.2

−0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

Samples []

Am

plitu

de []

FIR 100−5000 Hz Flat bandpass filter Impulse response



0 0.5 1 1.5 2

x 104

−120

−100

−80

−60

−40

−20

0

MUSHRA Anchor LP3.5, 465 taps

Frequency (Hz)

Am

plitu

de R

espo

nse

(dB

)

Figure 3.28: Frequency response of the STL MUSHRA Anchor LP3.5 – Lowpass filterwith cut-off frequency 3.5kHz for a sampling frequency of 48kHz (factor 1:1).

0 50 100 150 200 250 300 350 400 450

−0.04

−0.02

0

0.02

0.04

0.06

0.08

0.1

0.12

0.14

0.16

Samples[]

Am

plitu

de[]

MUSHRA Anchor LP3.5 impulse response

Figure 3.29: Impulse response of the STL MUSHRA Anchor LP3.5 – Lowpass filter withcut-off frequency 3.5kHz for a sampling frequency of 48kHz (factor 1:1).


0 0.5 1 1.5 2

x 104

−120

−100

−80

−60

−40

−20

0

Frequency (Hz)

Am

plitu

de R

espo

nse

(dB

)MUSHRA Anchor LP7, 235 taps

Figure 3.30: Frequency response of the STL MUSHRA Anchor LP7 – Lowpass filter withcut-off frequency 7kHz for a sampling frequency of 48kHz (factor 1:1).

0 50 100 150 200−0.1

−0.05

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

Samples[]

Am

plitu

de[]

MUSHRA Anchor LP7 impulse response

Figure 3.31: Impulse response of the STL MUSHRA Anchor LP7 – Lowpass filter withcut-off frequency 7kHz for a sampling frequency of 48kHz (factor 1:1).


0 0.5 1 1.5 2

x 104

−180

−160

−140

−120

−100

−80

−60

−40

−20

0

20MUSHRA Anchor LP10, 165 taps

Frequency (Hz)

Am

plitu

de r

espo

nse

(dB

)

Figure 3.32: Frequency response of the STL MUSHRA Anchor LP10 – Lowpass filterwith cut-off frequency 10kHz for a sampling frequency of 48kHz (factor 1:1).

20 40 60 80 100 120 140 160−0.1

0

0.1

0.2

0.3

0.4

Samples[]

Am

plitu

de[]

MUSHRA Anchor LP10 impulse response

Figure 3.33: Impulse response of the STL MUSHRA Anchor LP10 – Lowpass filter withcut-off frequency 10kHz for a sampling frequency of 48kHz (factor 1:1).


3.2.1.1 * init for the FIR module

Syntax:

#include "firflt.h"

SCD FIR *delta sm 16khz init (void);

SCD FIR *hq down 2 to 1 init (void);

SCD FIR *hq up 1 to 2 init (void);

SCD FIR *hq down 3 to 1 init (void);

SCD FIR *hq up 1 to 3 init (void);

SCD FIR *irs 8khz init (void);

SCD FIR *irs 16khz init (void);

SCD FIR *linear phase pb 2 to 1 init (void);



SCD FIR *msin 16khz init();

SCD FIR *mod irs 16khz init (void);

SCD FIR *mod irs 48khz init (void);

SCD FIR *rx mod irs 8khz init(void);

SCD FIR *rx mod irs 16khz init(void);

SCD FIR *psophometric 8khz init (void);

SCD FIR *p341 16k init (void);

SCD FIR *bp14k 32khz init (void);

SCD FIR *bp5k 16k init (void);

SCD FIR *bp100 5k 16khz init (void);

SCD FIR *LP35 48kHz init (void);



Prototypes: firflt.h

Description:

delta sm 16khz init is the initialization routine for the ∆SM weighting filter for datasampled at 16 kHz using a linear phase FIR filter structure. Input and output signals willbe at 16 kHz. Code is in file fir-dsm.c and its frequency response is given in figure 3.19.

hq up 1 to 2 init is the initialization routine for high quality FIR up-sampling filteringby a factor of 2. The -3 dB point for this filter is located at approximately 3660 Hz. Codeis in file fir-flat.c and its frequency and impulse response are given in figures 3.4(a)and 3.7 (top), respectively.

hq down 2 to 1 init is the initialization routine for high quality FIR down-samplingfiltering by a factor of 2. The -3 dB point for this filter is located at approximately 3660Hz. Code is in file fir-flat.c and its frequency and impulse response are given in figures3.4(b) and 3.8 (top), respectively.

hq up 1 to 3 init is the initialization routine for high quality FIR up-sampling filter byfactor of 3. The -3 dB point for this filter is located at approximately 3650 Hz. Code isin file fir-flat.c and its frequency and impulse response are given in figures 3.5(a) and


3.7 (bottom), respectively.

hq down 3 to 1 init is the initialization routine for high quality FIR down-samplingfiltering by a factor of 3. The -3 dB point for this filter is located at approximately 3650Hz. Code is in file fir-flat.c and its frequency and impulse response are given in figures3.5(b) and 3.8 (bottom), respectively.

linear phase bp 1 to 2 init is the initialization routine for bandpass, FIR up-samplingfiltering by a factor of 2. The -3 dB points for this filter are located at approximately98 and 3460 Hz. Code is in file fir-flat.c and its frequency and impulse response aregiven in figures 3.6(b) and 3.9(b), respectively.

linear phase bp 2 to 1 init is the initialization routine for bandpass, FIR down-samplingfiltering by a factor of 2. The -3 dB points for this filter are located at approximately98 and 3460 Hz. Code is in file fir-flat.c and its frequency and impulse response aregiven in figures 3.6(a) and 3.9(a), respectively.

linear phase bp 1 to 1 init is the initialization routine for FIR 1:1 bandpass filtering.The -3 dB points for this filter are located at approximately 98 and 3460 Hz. Code is infile fir-flat.c and its frequency and impulse response are given in figures 3.6(a) and3.9(a), respectively.

msin 16khz init is the initialization routine for the high-pass, FIR 1:1 filter that sim-ulates a mobile station input characteristic. The -3 dB point for this filter is located atapproximately 195 Hz. Code is in file fir-flat.c and its frequency and impulse responseare given in figures 3.10 and 3.11, respectively.

irs 8khz init is the initialization routine for the transmit-side IRS weighting filter fordata sampled at 8 kHz using a linear phase FIR filter structure. Input and output signalswill be at 8 kHz. Code is in file fir-irs.c and its frequency and impulse response aregiven in figures 3.12 and 3.13, respectively.

irs 16khz init is the initialization routine for the transmit-side IRS weighting filter fordata sampled at 16 kHz using a linear phase FIR filter structure. Input and output signalswill be at 16 kHz. Code is in file fir-irs.c and its frequency and impulse response aregiven in figures 3.12 and 3.13, respectively.

mod irs 16khz init is the initialization routine for the transmit-side modified IRS weight-ing filter for data sampled at 16 kHz using a linear phase FIR filter structure. Input andoutput signals will be at 16 kHz since no rate change is performed by this function. Codeis in file fir-irs.c and its frequency and impulse response are given in figures 3.14 and3.15, respectively.

mod irs 48khz init is the initialization routine for the transmit-side modified IRS weight-ing filter for data sampled at 48 kHz using a linear phase FIR filter structure. Input andoutput signals will be at 48 kHz since no rate change is performed by this function. Codeis in file fir-irs.c and its frequency and impulse response are given in figures 3.14 and3.15, respectively.

rx mod irs 8khz init is the initialization routine for the receive-side modified IRS weight-ing filter for data sampled at 8 kHz using a linear phase FIR filter structure. The -3 dBpoints for this filter are located at approximately 285 Hz and 3610 Hz. Input and outputsignals will be at 8 kHz since no rate change is performed by this function. Code is infile fir-irs.c and its frequency and impulse response are given in figures 3.16 and 3.17,respectively.


rx mod irs 16khz init is the initialization routine for the receive-side modified IRSweighting filter for data sampled at 16 kHz using a linear phase FIR filter structure.The -3 dB points for this filter are located at approximately 285 Hz and 3610 Hz. Inputand output signals will be at 16 kHz since no rate change is performed by this function.Code is in file fir-irs.c and its frequency and impulse response are given in figures 3.16and 3.17, respectively.

psophometric 8khz init is the initialization routine for the O.41 psophometric weightingfilter for data sampled at 8 kHz using a linear phase FIR filter structure. Input and outputsignals will be at 8 kHz since no rate change is performed by this function. Code is in filefir-pso.c and its frequency response is given in figure 3.18.

p341 16khz init is the initialization routine for the P.341 send-side weighting filter fordata sampled at 16 kHz. Input and output signals will be at 16 kHz since no rate changeis performed by this function. Its frequency response is shown in figure 3.20 and itsimpulse response is shown in figure 3.21. The -3 dB points for this filter are located atapproximately 50 and 7000 Hz. Code is in file fir-wb.c.

bp14k 32khz init is the initialization routine for the [50Hz-14kHz] filter for data sampledat 32 kHz. Input and output signals will be at 32 kHz since no rate change is performedby this function. Its frequency response is shown in figure 3.22 and its impulse responseis shown in figure 3.23. The -3 dB points for this filter are located at approximately 50and 14000 Hz. Code is in file fir-wb.c.

bp5k 16khz init is the initialization routine for a 50Hz-5kHz band limiting filter forwideband signals sampled at 16 kHz. Input and output signals will be at 16 kHz sinceno rate change is performed by this function. Its frequency response is shown in figure3.24 and its impulse response is shown in figure 3.25. The -3 dB points for this filter arelocated at approximately 50 and 4990 Hz. Code is in file fir-wb.c.

bp100 5k 16khz init is the initialization routine for a 100Hz-5kHz band limiting filterfor wideband signals sampled at 16 kHz. Input and output signals will be at 16 kHz sinceno rate change is performed by this function. Its frequency response is shown in figure3.26 and its impulse response is shown in figure 3.27. The -3 dB points for this filter arelocated at approximately 100 and 5000 Hz. Code is in file fir-wb.c.

LP35 48kHz init is the initialization routine for a 3.5kHz lowpass filter for signals sampledat 48 kHz. Input and output signals will be at 48 kHz since no rate change is performedby this function. Its frequency response is shown in figure 3.28 and its impulse responseis shown in figure 3.29. The -3 dB point for this filter is located at approximately 3500Hz. Code is in file fir-LP.c.

LP7 48kHz init is the initialization routine for a 7kHz lowpass filter for signals sampledat 48 kHz. Input and output signals will be at 48 kHz since no rate change is performedby this function. Its frequency response is shown in figure 3.28 and its impulse responseis shown in figure 3.29. The -3 dB point for this filter is located at approximately 7000Hz. Code is in file fir-LP.c.

LP10 48kHz init is the initialization routine for a 10kHz lowpass filter for signals sampledat 48 kHz. Input and output signals will be at 48 kHz since no rate change is performedby this function. Its frequency response is shown in figure 3.28 and its impulse responseis shown in figure 3.29. The -3 dB point for this filter is located at approximately 10000Hz. Code is in file fir-LP.c.


Variables:

None.

Return value:

These functions return a pointer to a state variable structure of type SCD FIR.

3.2.1.2 hq kernel

Syntax:

#include "firflt.h"

long hq kernel(long lseg, float *x ptr, SCD FIR *fir ptr, float *y ptr);

Prototype: firflt.h

Source code: fir-lib.c

Description:

This is the main entry routine for generic FIR filtering. It works as a switch to specificup- and down-sampling FIR-kernel functions. The adequate lower-lever filtering routineprivate to the filtering module (which is not visible by the user) is defined by the initial-ization routines. Currently, this function does not work properly for sample-by-sampledownsampling operation, i.e. when lseg = 1. This limitation should be corrected inafuture version.

Please note that prior to the first call to hq kernel, one of the initialization routineshq * init must be called to allocate memory for state variables and the set the desiredfilter coefficients.

After returning from this function, the state variables are saved to allow segment-wisefiltering through successive calls of hq kernel. This is useful when large files have to beprocessed.

Variables:

lseg . . . . . . . . . . . . . . Number of input samples. Should be larger than 1 for properdownsampling operation.

x ptr . . . . . . . . . . . . . . Array with input samples.

fir ptr . . . . . . . . . . . . . . Pointer to FIR-struct.

y ptr . . . . . . . . . . . . . . Pointer to output samples.

Return value:

The number of filtered samples as a long.

3.2.1.3 hq reset

Syntax:

#include "firflt.h"

void hq reset (SCD FIR *fir ptr);

Prototype: firflt.h



Description:

Clear state variables in SCD FIR struct; deallocation of filter structure memory is not done.Please note that fir ptr should point to a valid SCD FIR structure, which was allocated byan earlier call to one of the FIR initilization routines hq * init.

Variables:

fir ptr . . . . . . . . . . . . . . Pointer to a valid structcure SCD FIR.

Return value:

None.

3.2.1.4 hq free

Syntax:

#include "firflt.h"

void hq free (SCD FIR *fir ptr);

Prototype: firflt.h


Description:

Deallocate memory, which was allocated by an earlier call to one of the FIR initilizationroutines hq * init. Note that the pointer to the structure SCD FIR must not be a nullpointer.

Variables:

fir ptr . . . . . . . . . . . . . . Pointer to a structure of type SCD FIR.

Return value:

None.

3.2.2 IIR Module

The IIR module contains filters whose main use is for asynchronous filtering. For tele-phony bandwidth asynchronous filtering, PCM filters are available in both cascade andparallel IIR filter forms. For wideband speech (50–7000 Hz), 3:1 and 1:3 rate-changefactor filters are available. A transmit-side IRS filter for speech sampled at 8 kHz is alsoavailable in this module as an example of implementation of an IIR cascade-form filter.

The PCM filters have been designed for sampling rates of 8 and 16 kHz. It should benoted that the G.712 mask is specified in terms of Hz, rather than normalized frequencies.Therefore this applies only to rate conversions of factor 2, i.e., 8 kHz to 16 kHz and 16kHz to 8 kHz. The frequency responses of the implemented PCM filters are shown infigure 3.38.

Since the digital filters need memory, state variables are needed. In the STL, a typeSCD IIR has been defined for parallel-form IIR filters, containing the past memory samplesas well as filter coefficients and other control variables. Its fields are as follows:


nblocks . . . . . . . . Number of coefficient setsidown . . . . . . . . . Up-/down-sampling factork0 . . . . . . . . . . . . . Start index in next segmentgain . . . . . . . . . . . Gain factordirect cof . . . . . . Direct path coefficientb[3] . . . . . . . . . . . .Pointer to numerator coefficientsc[2] . . . . . . . . . . . .Pointer to denominator coefficientsT[2] . . . . . . . . . . . Pointer to state variableshswitch . . . . . . . . Switch to IIR-kernel: Up or down-sampling

For the cascade-form IIR filters, the state variable structure defined is CASCADE IIR whichis slightly different from the one for the parallel form structure:

nblocks . . . . . . . . Number of stages in cascadeidown . . . . . . . . . Up-/down-sampling factork0 . . . . . . . . . . . . . Start index in next segmentgain . . . . . . . . . . . Gain Factora[2] . . . . . . . . . . . .Pointer to numerator coefficientsb[2] . . . . . . . . . . . .Pointer to denominator coefficientsT[4] . . . . . . . . . . . Pointer to state variableshswitch . . . . . . . . Switch to IIR-kernel: Up or down-sampling

It should be noted that the values of the fields must not be altered, and for most purposesthey are not needed by the user. The relevant routines for each module are described inthe next sections.

-80

-60

-40

-20

0

0 500 1000 1500 2000 2500 3000 3500 4000

|H(f

)| [d

B]

Frequency [Hz]

Figure 3.34: Frequency response of the cascade implementation of the G.712 standardPCM filter for data sampled at 8 kHz.


-80

-60

-40

-20

0

100 1000

|H(f

)| [d

B]

Frequency [Hz]

Figure 3.35: Frequency response of an IIR cascade implementation of the P.48 “full”transmit-side IRS weighting filter for data sampled at 8 kHz.

3.2.2.1 iir * init

Syntax:

#include "iirflt.h"

CASCADE IIR *iir G712 8khz init (void);

CASCADE IIR *iir irs 8khz init (void);

CASCADE IIR *iir casc lp 3 to 1 init(void);

CASCADE IIR *iir casc lp 1 to 3 init(void);

Prototypes: iirflt.h

Description:

iir G712 8khz init initializes an 8 kHz cascade IIR filter structure for a standard PCM(G.712) filtering. Input and output signals will be at 8 kHz since no rate change isperformed by this function. The -3 dB points for this filter are located at approximately230 and 3530 Hz. Its source code is found in file cascg712.c and its frequency responseis given in figure 3.34.

iir irs 8khz init initializes an 8 kHz cascade IIR filter structure for a transmit-sideP.48 IRS non-linear phase filtering. Input and output signals will be at 8 kHz since norate change is performed by this function. Its source code is found in file iir-irs.c andits frequency response is given in figure 3.35.

iir casc lp 3 to 1 init is the initialization routine for IIR low-pass filtering with adown-sampling factor of 3:1. Although this filter is relatively independent of the sampling


0 1000 2000 3000 4000 5000 6000 7000 8000Frequency [Hz]

-100.0

-90.0

-80.0

-70.0

-60.0

-50.0

-40.0

-30.0

-20.0

-10.0

0.0

10.0A

mpl

itude

Res

pons

e [d

B]

IIR 1:3 Asynchronous filter(for a sampling frequency of 16kHz)

(a) Flat low-pass up-sampling by a factor of 3:1.

0 5000 10000 15000 20000Frequency [Hz]

-100.0

-90.0

-80.0

-70.0

-60.0

-50.0

-40.0

-30.0

-20.0

-10.0

0.0

10.0

Am

plitu

de R

espo

nse

[dB

]

IIR 3:1 Asynchronous filter(for a sampling frequency of 48kHz)

(b) Flat low-pass down-sampling by a factor of 1:3.

Figure 3.36: Flat low-pass IIR filter frequency response with factors 1:3 and 3:1 forsampling rates of 16000 and 48000 Hz.


-15000

-10000

-5000

0

5000

10000

15000

20000

25000

0 50 100 150 200 250 300 350 400 450 500 550 600

Am

plitu

de []

Samples []

(a) 1:3 up-sampling factor

-4000

-2000

0

2000

4000

6000

8000

0 10 20 30 40 50 60 70 80 90 100110120130140150160170180190200

Am

plitu

de []

Samples []

(b) 3:1 down-sampling factor

Figure 3.37: Impulse response for 1:3 and 3:1 cascade-form low-pass IIR filter.


-80

-60

-40

-20

0

0 500 1000 1500 2000 2500 3000 3500 4000

|H(f

)| [d

B]

Frequency [Hz](a) G.712 for input samples at 8 kHz, up-sampling factor 1:2

-80

-60

-40

-20

0

0 1000 2000 3000 4000 5000 6000 7000 8000

|H(f

)| [d

B]

Frequency [Hz](b) G.712 for input samples at 16 kHz, down-sampling factor 2:1 or 1:1

Figure 3.38: Standard PCM (G.712) quality filter response.


Figure 3.39: Impulse response for G.712 filters (Top: factor 1:1; Middle: factor 1:2;Bottom: factor 2:1).


rate,4 it was originally designed for asynchronization filtering of 16 kHz sampled speech.The -3 dB point for this filter is located at approximately 7055 Hz. Its source code is foundin file iir-flat.c and its frequency and impulse response are given in figures 3.36(a) and3.37(a), respectively.

iir casc lp 1 to 3 init is the initialization routine for IIR low-pass filtering with a up-sampling factor of 1:3. Although this filter is relatively independent of the sampling rate,it was originally designed for asynchronization filtering of 16 kHz sampled speech. The -3dB point for this filter is located at approximately 7055 Hz. Its source code is found infile iir-flat.c and its frequency and impulse response are given in figures 3.36(b) and3.37(b), respectively.

3.2.2.2 cascade iir kernel

Syntax:

#include "iirflt.h"

long cascade iir kernel (long lseg, float *x ptr, CASCADE IIR *iir ptr,

float *y ptr);

Prototype: iirflt.h

Source code: iir-lib.c

Description:

General function for implementing filtering using a cascade-form IIR filter previouslyinitialized by one of the iir * init() routines.

Variables:

lseg . . . . . . . . . . . . . . Number of input samples.

x ptr . . . . . . . . . . . . . . Array with input samples.

iir ptr . . . . . . . . . . . . . . Pointer to a cascade-form IIR-struct CASCADE IIR.

y ptr . . . . . . . . . . . . . . Pointer to output samples.

Return value:

The number of output samples is returned as a long.

3.2.2.3 cascade iir reset

Syntax:

#include "iirflt.h"

void cascade iir reset (CASCADE IIR *iir ptr);

Prototype: iirflt.h


Description:

4Since this is a low-pass filter, change of sampling rate implies in change of the lower and upper cutofffrequencies.


Clear state variables in CASCADE IIR structure, which have been initialized by a previouscall to one of the initialisation functions. Memory previously allocated is not released.

Variables:

iir ptr . . . . . . . . . . . . . . Pointer to struct CASCADE IIR, previously initialized by a callto one of the initialization routines.

Return value:

None.

3.2.2.4 cascade iir free

Syntax:

#include "iirflt.h"

void cascade iir free (SCD IIR *iir ptr);

Prototype: iirflt.h


Description:

Deallocate memory, which was allocated by an earlier call to one of the cascade-form IIRfilter initilization routines described before. iir prt must not be a NULL pointer.

Variables:

iir ptr . . . . . . . . . . . . . . Pointer to struct CASCADE IIR, previously initialized by a callto one of the initialization routines.

Return value:

None.

3.2.2.5 stdpcm * init

Syntax:

#include "iirflt.h"

SCD IIR *stdpcm 16khz init (void);

SCD IIR *stdpcm 1 to 2 init (void);

SCD IIR *stdpcm 2 to 1 init (void);

Prototypes: iirflt.h

Description:

stdpcm 16khz init initializes a 16 kHz IIR filter structure for standard PCM (G712)filtering. Input and output signals will be at 16 kHz since no rate change is performed bythis function. The -3 dB points for this filter are located at approximately 174 and 3630Hz. Source code is found in file iir-g712.c and its frequency and impulse response aregiven in figures 3.38(b) and 3.39 (top), respectively.

stdpcm 1 to 2 init initializes standard PCM filter coefficients for filtering by the genericfiltering routine stdpcm kernel, for input signals at 8 kHz, generating the output at 16kHz. The -3 dB points for this filter are located at approximately 174 and 3630 Hz.


Source code is found in file iir-g712.c and its frequency and impulse response are givenin figures 3.38(a) and 3.39 (middle), respectively.

stdpcm 2 to 1 init initializes standard PCM filter coefficients for filtering by the genericfiltering routine stdpcm kernel for input signals at 16 kHz, generating the output at 8kHz. The -3 dB points for this filter are located at approximately 174 and 3630 Hz.Source code is found in file iir-g712.c and its frequency and impulse response are givenin figures 3.38(b) and 3.39 (bottom), respectively.

Variables:

None.

Return value:

This function returns a pointer to a state variable structure of type SCD IIR.

3.2.2.6 stdpcm kernel

Syntax:

#include "iirflt.h"

long stdpcm kernel (long lseg, float *x ptr, SCD IIR *iir ptr,

float *y ptr);

Prototype: iirflt.h


Description:

General function to perform filtering using a parallel-form IIR filter previously initializedby one of the appropriate parallel-form * init() routines available.

Variables:

lseg . . . . . . . . . . . . . . Number of input samples.x ptr . . . . . . . . . . . . . . Array with input samples.iir ptr . . . . . . . . . . . . . . Pointer to a parallel-form IIR-struct SCD IIR.y ptr . . . . . . . . . . . . . . Pointer to output samples.

Return value:

This function returns the number of output samples as a long.

3.2.2.7 stdpcm reset

Syntax:

#include "iirflt.h"

void stdpcm reset (SCD IIR *iir ptr);

Prototype: iirflt.h


Description:

Clear state variables in SCD IIR structure, which have been initialized by a previous callto one of the init functions. Memory previously allocated is not released.


Variables:

iir ptr . . . . . . . . . . . . . . Pointer to struct SCD IIR, previously initialized by a call toone of the initialization routines.

Return value:

None.

3.2.2.8 stdpcm free

Syntax:

#include "iirflt.h"

void stdpcm free (SCD IIR *iir ptr);

Prototype: iirflt.h


Description:

Release memory which was allocated by an earlier call to one of the parallel-form IIRfilter initilization routines described before. The parameter iir prt must not be a nullpointer.

Variables:

iir ptr . . . . . . . . . . . . . . Pointer to struct SCD IIR, previously initialized by a call toone of the initialization routines.

Return value:

None.

3.3 Tests and portability

Compliance with the R&Os was verified by checking the frequency response of the filtersand the size of the output files. Frequency response was obtained by feeding the filteringroutines with sinewaves and calculating the ratio in dB, for each frequency of interest.

Portability of this module was checked by running the same speech file on a proven plat-form and on a test platform. Comparison of both processed files should show either nodifferences or yield equivalent results.5 Tests were performed in the VAX/VMS environ-ment with VAX-C and gcc, in MSDOS with Borland Turbo C++ Version 1.00 and gcc(DJGPP), in SunOS with cc, acc, and gcc, and in HPUX with gcc.

5Some differences may appear in the output files, but for a few samples and by no more than 1 LSb.As an example, in the tests for checking VAX and SUN-OS, one of the files differed in 3 samples out of49152 for a cascade of high-quality up- and down-sampling of 1:6 and 6:1. For small rate change factors,differences are unlikely.


3.4 Examples

3.4.1 Description of the demonstration programs

Three programs are provided as demonstration programs for the RATE module, firdemo.c,iirdemo.c, and filter.c.

Programs firdemo.c and iirdemo.c were the first demonstration programs for the ratechange module. The former is found in directory fir of the STL and contains a cascadeprocessing of the FIR filters available upto the STL96. The latter is found in directory iir

of the STL and contains a cascade processing of the IIR filters available upto the STL96.However, because of the increasing static memory requirement for cascade processingthat came with the introduction of new filters in the STL, these two programs becameprohibitive and their maintenance was discontinued. They are still functional, althoughoutdated.

Program filter.c is a single demonstration program that incorporates both IIR andFIR filters in the STL and has been kept up-to-date as new filters are added to theSTL. Compared to the firdemo.c and iirdemo.c programs, filter.c can only perform onefiltering operation per pass, while firdemo.c and iirdemo.c could perform a number of 1:1operations combined with two up-sampling and two downsampling operations. Hence,several calls of the filter program are necessary to implement what was accomplished by asingle call of firdemo.c and iirdemo.c, in addition to the cummulative quantization noise(from the sucessive float-to-short conversions). In applications where multiple filtering isneeded and the user is concerned with the quantization noise accumulation, a custom-made program could be used e.g. based on a specialization of either firdemo.c, iirdemo.c,or filter.c.

3.4.2 Example: Calculating frequency responses

The following C code exemplifies the use of some of the filter functions available in theSTL. The C code generates a number of tones which are specified by the user (lower,upper, and step frequencies). The frequency response is obtained by calculating thepower change for each single frequency before and after filtered by the selected filter.

#include <stdio.h>#include <stdlib.h>#include <math.h>

/* UGST MODULES */#include "ugstdemo.h"#include "iirflt.h"#include "firflt.h"

/* Other stuff */#define TWO_PI (8*atan(1.0))#define QUIT(m,code) fprintf(stderr,m); exit((int)code);

void main(argc, argv)int argc;


char *argv[];

SCD_FIR *fir_state;SCD_IIR *iir_state;float *BufInp, *BufOut;char F_type[20];long j, N, N2;long inp_size, out_size;double f, f0, fstep, ff, fs, inp_pwr, H_k;char is_fir;

/* Preamble */N = 256; N2 = 20; inp_size = N * N2;

/* Read parameters for processing */GET_PAR_S(1, "_Filter type(irs,hq2,hq3,pcm,pcm1): ... ", F_type);GET_PAR_D(2, "_Start frequency [Hz]: ................ ", f0);GET_PAR_D(3, "_Stop frequency [Hz]: ................. ", ff);GET_PAR_D(4, "_Frequency step [Hz]: ................. ", fstep);FIND_PAR_D(5, "_Sampling Frequency [Hz]: ............. ", fs, 8000);

/* Check consistency of upper and lower frequencies */ff = (ff >= fs / 2)? (fs / 2) : ff;if (f0 < 2.0 / (double) inp_size * fs && f0 != 0.0)

f0 = 2.0 / (double) inp_size *fs;

/* Normalization of frequencies */f0 /= fs; ff /= fs; fstep /= fs;

/* Set flag to filter type: IIR or FIR */is_fir = (strncmp(F_type,"pcm",3)==0 || strncmp(F_type,"PCM",3)==0)

? 0 : 1;

/* ... CHOOSE CORRECT FILTER INITIALIZATION ... *//** Filter type: irs - IRS weighting 2:1 or 1:2 factor:* . fs == 8000 -> up-sample: 1:2* . fs == 16000 -> down-sample: 2:1

*/if (strncmp(F_type, "irs", 3) == 0 || strncmp(F_type, "IRS", 3) == 0)

if (fs == 8000)fir_state = irs_8khz_init();

else if (fs == 16000)fir_state = irs_16khz_init();

elseQUIT("IRS Implemented only for 8 and 16 kHz\n", 15);

/** Filter type: hq2 - High-quality 2:1 or 1:2 factor:* . fs == 8000 -> up-sample: 1:2* . fs == 16000 -> down-sample: 2:1


* hq3 - High-quality 3:1 or 3:1 factor* . fs == 8000 -> up-sample: 1:3* . fs == 16000 -> down-sample: 3:1

*/else if (strncmp(F_type,"hq",2)==0 || strncmp(F_type,"HQ",2)==0)

if (fs == 8000) /* It is up-sampling! */fir_state = F_type[2] == ’2’

? fir_up_1_to_2_init(): fir_up_1_to_3_init();

else /* It is down-sampling! */fir_state = F_type[2] == ’2’

? fir_down_2_to_1_init(): fir_down_3_to_1_init();

/** Filter type: pcm - Standard PCM quality 2:1 or 1:2 factor:* . fs == 8000 -> up-sample: 1:2* . fs == 16000 -> down-sample: 2:1* pcm1 - Standard PCM quality with 1:1 factor* . fs == 8000 -> unimplemented* . fs == 16000 -> OK, 1:1 at 16 kHz

*/else if (strncmp(F_type,"pcm",3)==0 || strncmp(F_type,"PCM",3)==0)

if (strncmp(F_type,"pcm1", 4)==0 || strncmp(F_type,"PCM1",4)==0)

if (fs == 16000)iir_state = stdpcm_16khz_init();

elseQUIT("Unimplemented: PCM w/ factor 1:1 for given fs\n", 10);

else

iir_state = (fs == 8000)? stdpcm_1_to_2_init() /* It is up-sampling! */: stdpcm_2_to_1_init(); /* It is down-sampling! */

/* Calculate Output buffer size */if (is_fir)

out_size = (fir_state->hswitch==’U’)? inp_size * fir_state->dwn_up: inp_size / fir_state->dwn_up;

elseout_size = (iir_state->hswitch==’U’)

? inp_size * iir_state->idown: inp_size / iir_state->idown;

/* Allocate memory for input buffer */if ((BufInp = (float *) calloc(inp_size, sizeof(float))) == NULL)

QUIT("Can’t allocate memory for data buffer\n", 10);


/* Allocate memory for output buffer */if ((BufOut = (float *) calloc(out_size, sizeof(float))) == NULL)

QUIT("Can’t allocate memory for data buffer\n", 10);

/* Filtering operation */for (f = f0; f <= ff; f += fstep)

/* Reset memory */memset(BufOut, ’\0’, out_size * sizeof(float));

/* Adjust top (NORMALIZED!) frequency, if needed */if (fabs(f - 0.5) < 1e-8/fs) f -= (0.05*fstep);

/* Calculate as a temporary the frequency in radians */inp_pwr = f * TWO_PI;

/* Generate sine samples with peak 20000 ... */for (j = 0; j < inp_size; j++)

BufInp[j] = 20000.0 * sin(inp_pwr * j);

/* Calculate power of input signal */for (inp_pwr = 0, j = 0; j < inp_size; j++)

inp_pwr += BufInp[j] * BufInp[j];

/* Convert to dB */inp_pwr = 10.0 * log10(inp_pwr / (double) inp_size);

/* Filtering the whole buffer ... */j = (is_fir)

? fir_kernel(inp_size, BufInp, fir_state, BufOut): stdpcm_kernel(inp_size, BufInp, iir_state, BufOut);

/* Compute power of output signal; discard initial 2*N samples */for (H_k = 0, j = 2 * N; j < out_size - 2 * N; j++)

H_k += BufOut[j] * BufOut[j];

/* Convert to dB */H_k = 10 * log10(H_k / (double) (out_size - 4 * N)) - inp_pwr;

/* Printout of gain at the current frequency */printf("\nH( %4.0f ) \t = %7.3f dB\n", f * fs, H_k);

Chapter 4

EID: Error Insertion Device

An error insertion device (EID) is used to study the behaviour of digital transmissionsystems and equipments under error conditions. This requires a model for the transmissionchannel, and an error generation algorithm. In the most general case, burst or random biterror generators are needed. In other cases, such as when evaluating mobile and wirelesssystems, random and bursty frame erasures are of importance.

The EID module implements these four functionalities. The model for random and burstybit errors, and for random frame erasure is based on a linear congruential sequence randomnumber generator, and the bit error insertion and random frame erasure are based on atwo-state channel model.

The burst frame erasure function requires a more elaborated model. For the specificapplication of wireless systems, a model based on Markov sequences has been developed.This is known within ITU as the Bellcore model [15, 16]. It has been used in the ITU-T8 kbit/s speech coder selection tests, and has been incorporated in the STL.

In this chapter one finds the description of both channel models, and a description of theirimplementation in the EID module.


4.1.1 Simple Channel Model

The bit error insertion algorithm of the EID is based on a channel model where (binary)data bitstreams are to be transmitted, and is based on the discrete Gilbert Elliott channel(GEC) model, described in [17].

This model (see figure 4.1) has two states, Good (G) and Bad or Burst (B). Associatedwith these two states, there are four parameters (probabilities): two relating to the prob-ability of remaining in state G or B, and two relating to the probability of transition fromthe current state to the other state (i.e., the occurence of a binary digit transition, orerror).

The probabilities associated with the channel states are P and Q, P being the probabilityof transition from state G to B, and Q the probability of transition from B to G. Hence,the probability of remaining in the same state is (1−P ) and (1−Q) for states G and B,

69


respectively. For a given state, there are probabilities that a change in a bit occur, andthis is PG for state G, and PB for state B.

Therefore, the channel may be either in the good state G, where the mean bit errorprobability PG is very low (PG ≈ 0), or in the bad state B, where the mean bit errorprobability PB is rather high (PB ≈ 0.5).

Figure 4.1: Gilbert Elliot Channel Model (GEC)

The mean bit error probability BER generated by this channel model is

BER =P

1− γ· PB +

Q

1− γ· PG (4.1)

whereγ = 1− (P + Q) (4.2)

is a measure for the correlation of the bit errors, and consequently an indication of theburst or random characteristic of the channel. In this issue, γ ≈ 0 implies a nearlyrandom error channel, while γ ≈ 1 implies a totally bursty channel. Please note thatBER is reasonable only in the range 0 ≤ BER ≤ 0.5.

For many applications, bit error sequences with a distinct mean bit error probabilityBER and a distinct bit error correlation γ are of interest. From equations (4.1) and(4.2) we get for the remaining parameters of the GEC for arbitrarily choosen values of0 ≤ PG < PB ≤ 0.5:

P = (1− γ) · (1− PB − BER

PB − PG

) (4.3)

Q = (1− γ) · PB − BER

PB − PG

(4.4)

In the Error Insertion Device (EID) the special values PG = 0 and PB = 0.5 are choosen.This relates to the fact that in the good state no bit changes are expected, hence PG = 0;


now, for the bad state, the channel is supposed to be in a totally uncertain state, thenPB = 0.5. With this choice, equations (4.3) and (4.4) reduce to:

P = 2 · (1− γ) · BER (4.5)

Q = (1− γ) · (1− 2 · BER) (4.6)

As an example, figure 4.2 shows the effect of γ on the auto-correlation of a bitstream,generated by the GEC (with BER = 0.02 in equations (4.5),(4.6)).

Figure 4.2: Bit Error Correlation for a Bit Error Rate of 2%

It can be seen that for γ = 0 the bit errors are statistically independent, because theautocorrelation sequence ACF(κ) has a peak (1.0) in κ = 0, and the remaining coefficientsACF(1),ACF(2),... oscillate around the selected bit error rate of 0.02 (2 · 10−2). Forγ = 0.5, slightly bursty errors can be observed, when the initial terms of the correlationsequence build a transition region, and the remaining (higher) terms are around 0.02.Increasing γ towards unity, the correlation between the bit errors also increases, leadingto totally bursty errors in the limit.

4.1.2 The Bellcore Model

The following description has been based on [15]. The actual error sequence in a wirelessenvironment will depend upon the carrier frequency, user speed, detection scheme, typeof diversity employed, mean SNR, hand-off mechanism, etc. Though a model could becreated using the above parameters, it would be impossible to apply because of the widevariance of the model parameters. It was found that a speech coder can be tested usingerror bursts generated by a much simpler model because the burst error performanceof a speech coder can be characterized to a great extend by the way it reacts to short(5–20 ms), medium (30–60 ms) and long (over 80 ms) error bursts. If a coder performswell in the presence of a representative range of short, medium and long error bursts,


it can be expected that the coder will behave well in an actual wireless communicationsenvironment, even though the actual radio channel generates error bursts with differentstatistics.

The Bellcore model, rather than modeling the wireless communications channel, modelsthe occurrence of these short, medium, and long error bursts that would enable thecharacterization of the coder reaction to error bursts and to the error burst patternsit is expected it will encounter in practice.

0

1

2

p0

01-p

1-p1

p1

j

p2

pj-1

N-1

p

1

N-2

pj

1-pj

1-p2

Figure 4.3: N-state Markov Model.

An N-state Markov model, as illustrated in figure 4.3, is used in the STL to generateframe erasure bursts. This model has to be adequate to test speech coders using shortspeech segments (6-8s). In this model, a transition from any state (0..N–1) to state 0represents a frame received without errors, while a transition from state j–1 to state jindicates that j previous frames have been received in error. A transition from j back to0 marks the end of an error burst of length j followed by a good frame.

The Markov model with N states for creating a bursty wireless communications channelis capable of erasing up to N–1 frames. The model generates both frames with correlatedframe erasures and error-free frames. The value of N will depend on the frame durationof the speech coder under test and the maximum error burst length the coder expectsto find in practice. The error statistics can be controlled by selecting the N–1 transitionprobabilities pk, k=0..N–2. The probabilities will also determine the sequence of goodand erased frames.

The steady state probabilities can be calculated by solving the state transition matrixor using numerical methods. If Sj denotes the steady state probability that the chain isin state j and pj is the probability of transitioning from state j to j + 1, the followingrelationships can be stablished:

Sj+1 = pjSj 0 ≤ j ≤ N − 2

S0 =N−1∑j=0

Sj(1− pj) (pN−1 = 0)


The equations above can be solved since the probabilities should satisfy:

N−1∑j=0

pj = 1

A frame erasure length of j can occur only if the chain first enters state j, and thentransitions to state 0. The probability Pfe of this occur is

Pfe = Sj(1− pj)

The probability of receiving a frame in error can be calculated as

Pe =N−1∑j=1

jPfe(j)

and the probability of receiving an error-free frame is 1 − Pe. It can be seen that thesteady state probability of being in state 0, S0, also gives the probability of receiving aframe without error, i.e.,

S0 = 1−N−1∑j=1

jPfe(j)

The frame error distribution can be controlled by selecting the transition probabilities.

4.2 Implementation

The EID algorithm is made in C can be found in the module eid.c, with prototypes ineid.h. This version evolved from previous C implementations developed by PKI1, andwas used in the Host Laboratory Sessions of ETSI’s contest for the second generation ofthe GSM Digital Mobile Radio Systems, and in the Selection Phase of the ITU-T 8 kbit/sspeech coder.

The random-number generator is based on a linear congruential technique, as describedin [18]. The rule here is:

an = (69069 ∗ an−1 + 1) mod 232,

which is converted (mapped) to a float number between 0 and 1.

Since the random number generator and the channel need their internal state to be saved,two state variable data structure types were defined for the EID module. The structuretype called SCD EID is applicable to the burst and random bit erasure functions, as wellas to the random frame erasure function. The fields of this structure are:

1Phillips Communications Industry.


seed Seed for random number generator.

nstates Number of states of the channel model (presently 2).

current state Index of current channel state.

ber Pointer to array containing thresholds according to the biterror rate in each state.

usrber User defined bit error rate.

usrgamma User defined correlation factor.

matrix Pointer to matrix containing thresholds according to theprobabilities for changing from one state to another one.

For burst frame erasures only, a different state variable structure type called BURST EID

has been defined, whose fields are:

seedptr Memory for random number generator.

internal Array with probabilities for each state of the Markovprocess.

index Channel’s current state.

The values of the fields shall not be altered and are not needed by user.

The random number generator always starts from the same point, if the user does notspecify different initial seeds. In order to avoid this, the EID state variables should besaved at the end of the processing of a speech sequence, e.g. to a file by the user. Thissaving is not implemented by the EID module because this envolves I/O to the computerfile systems, and this would violate one of the UGST guidelines. Nevertheless, an exampleof this procedure is described in the demonstration programs that accompany this releaseof the EID. Therefore, users should keep in mind that, unless they save (e.g. to a file) theEID state at the end of the processing, identical error patterns will be produced, whenthe processing is re-started.

The sample buffers used by the EID module use softbits. Softbits are defined as a multi-level representation of the binary (“hard”) bits ‘1’ and ‘0’ which are associated to prob-abilities of being in error. The softbit definition adopted in the ITU-T STL uses 16-bitwords as representation of the hardbits ‘1’ and ‘0’, where the hardbit ‘1’ is representedby the softbit 0x0081 and the hardbit ‘0’ is represented by the softbit 0x007F. Therefore,there are 8 significant bits for each softbit; this definition is flexible enough to accomodateall applications that utilize the softbit concept. When soft-decision is not used, the hardbit information can be derived directly from bit 7 of the 16-bit softbit word. Also, thesoftbit ’1’ is represented by the softbit 0x0081, instead of 0x0080, in order to have bothhard bits ’1’ and ’0’ equally spaced from 0x0000 (in other words, 0x0000 is exactlythe middle of the two-complement range for 0x81 and 0x7F). Therefore, a softbit 0x0000represents a total uncertainty about the true bit value.

Error patterns produced and used by the EID module use this softbit definition. Inputand output data (i.e. signals which are affected by bit errors or frame erasures) also usesoftbits, but additionally have a so-called synchronization header.

The synchronization header is defined as two consecutive 16-bit words, the first one alwaysbeing a synchronization (sync) word in the range 0x6B21 to 0x6B2F, followed by the


bitstream length word, a two-complement number indicating the number of softbits inthe frame. The sync word 0x6B20 is reserved to indicate that a frame erasure happened.For example, for the RPE-LTP algorithm, which uses 260 bits per frame, the soft bitstreamwould have the format indicated in figure 4.4. It can be seen that each RPE-LTP framewill have 262 16-bit words, being one for the sync word (0x6B21 in the example), onefor the frame length word (whose value here is 260), followed by 260 softbit words (herecorresponding to ‘1’, ‘0’, ..., ‘0’, ‘0’). This combination of the synchronization headerand a softbit “payload” is called the bitstream signal representation and is used in ITU-T Recommendation G.192 [19] to represent encoded signals between speech encoders,error-insertion devices, transmission channel models and speech decoders.

0x6B21

260

0x7F

0x7F

0x7F

0x81

1 2 3 4 261 262

Bits

0

1

14

15

...

......

Figure 4.4: Soft bitstream format for the 13 kbit/s RPE-LTP algorithm, where 260 bits are transmitted per 20ms transmission frame.

The EID routines for random bit errors are BER generator and BER insertion; for ran-dom frame erasures, FER generator random and FER module; for burst frame erasuresFER generator burst; and open eid, open burst eid and close eid for initialization(allocation) and release of EID state structures SCD EID and BURST EID. Their descriptioncan be found next. Besides these, there are other routines which are local (private) to theEID module, and therefore are not described.

4.2.1 open eid

Syntax:

#include "eid.h"

SCD EID *open eid (double ber, double gamma);

Prototype: eid.h

Description:

Allocate memory for EID struct, set up the transmission matrix according to the selectedbit error rate, and initialize the seed for the random number generator. If the symbolPORT TEST is defined at compilation time, then the seed will always be initialized to thesame value; otherwise, the seed is initialized with the system time (in seconds). Theformer is used to test portability of the EID module, since identical patterns will begenerated2.

2Another way to force the EID to produce identical bit error patterns is to save the EID state variable(of type SCD EID) e.g. to a file and, in the next call to the routine, initialize the state variable with thesaved value.


Variables:

ber . . . . . . . . . . . . . . User desired bit error rate;gamma . . . . . . . . . . . . . . User desired burst factor;

Return value:

Returns a pointer to struct SCD EID; if the initialization failed, returns a null pointer.

4.2.2 open burst eid

Syntax:

#include "eid.h"

BURST EID *open burst eid (long index);

Prototype: eid.h

Description:

Allocate memory for a state variable structure of type BURST EID and setup the trans-mission matrix according the burst frame erasure rate (BFER) selected by index, andinitialize the seed for the random number generator. If the symbol PORT TEST is definedat compilation time, then the seed will always be initialized to the same value; otherwise,the seed is initialized with the system time, in seconds (see note in the description ofopen eid()).

Variables:

index . . . . . . . . . . . . . . Indicates one of the three BFER defined for the Bellcoremodel. If index is equal to 0, the 1% BFER is selected; if1, BFER of 3% is used; or, if index equals 2, a 5% BFER isused.

Return value:

This function returns a pointer to a structure of type BURST EID. If the initializationfailed, it returns a null pointer.

4.2.3 reset burst eid

Syntax:

#include "eid.h"

void reset burst eid (BURST EID *burst eid);

Prototype: eid.h

Description:

Reset a BURST EID structure previously initialized by a call to open burst eid(). By de-fault, only counters are reset; if the symbol RESET SEED AS WELL is defined at compilationtime, the seed is also reset. However, this is not recommended.

Variables:

burst eid . . . . . . . . . . . . . . BURST EID structure to be reset.


Return value:

None.

4.2.4 close eid

Syntax:

#include "eid.h"

void close eid (SCD EID *EID);

Prototype: eid.h

Description:

Release the memory previously allocated by open eid() for the specified EID structure.

Variables:

EID . . . . . . . . . . . . . . EID state variables’ structure to be released.

Return value:

None.

4.2.5 BER generator

Syntax:

#include "eid.h"

double BER generator (SCD EID *EID, long lseg, short *EPbuff);

Prototype: eid.h

Description:

Generates a softbit error pattern according to the selected channel model present in EID.The introduction of the bit errors in the bitstream is done by the function BER insertion.It should be noted that softbit error pattern buffers do not contain synchronization head-ers.

Variables:

EID . . . . . . . . . . . . . . Structure with channel model.lseg . . . . . . . . . . . . . . Length of current frame.EPbuff . . . . . . . . . . . . . . Bit error pattern buffer with softbits.

Return value:

The bit error rate in the current frame is returned as a double.

4.2.6 FER generator random

Syntax:

#include "eid.h"

double FER generator random (SCD EID *EID);

Prototype: eid.h


Description:

Decides whether a random frame erasure should happen for the current frame accordingto the state of the GEC model in the channel memory pointed by EID.

Variables:

EID . . . . . . . . . . . . . . Structure with channel model.

Return value:

Returns a double value: 0 if the current frame should not be erased (“good frame”) and1 if the frame should erased (“bad frame”).

4.2.7 FER generator burst

Syntax:

#include "eid.h"

double BER generator burst (BURST EID *EID);

Prototype: eid.h

Description:

Decides whether a burst frame erasure should happen for the current frame according tothe state of the Bellcore model in the channel memory pointed by EID. It should be notedthat in the long run, the overall burst frame erasure rate (BFER) may not be consistentwith the BFER specified by the user (1%, 3%, or 5%). This is an inherent defficiency ofthe implemented model and the calling program is responsible for computing the overallBFER and monitoring whether this overall BFER is close enough to the desired BFER.

Variables:

EID . . . . . . . . . . . . . . Structure with Bellcore model parameters.

Return value:

This function returns a double value: 0 if the current frame should not be erased (“goodframe”) and 1 if the frame should erased (“bad frame”).

4.2.8 BER insertion

Syntax:

#include "eid.h"

void BER insertion (long lseg, short *xbuff, short *ybuff, short

*error pattern);

Prototype: eid.h

Description:

Disturbs an input bitstream xbuff according to the error pattern provided in error pattern,saving the disturbed bitstream in the output buffer ybuff. The input and output bitstreamare compliant to the bitstream format described before, i.e. are comprised of a synchro-nization header (sync word followed by a frame length word) and softbits representing theencoded bitstream. The sync and frame length words are always located in the offsets 0


and 1 of the array, respectively. The error pattern contains only softbits. The followingsummarizes the bit error insertion rules:

a) input signal (after synchronization header):

• hard bit ’0’ represented as 0x007F;• hard bit ’1’ represented as 0x0081.

b) error pattern:

• the probability for undisturbed transmission has values in therange 0x0001..0x007F, being 0x0001 the lowest probability;

• the probability for disturbed transmission has values in the range0x00FF..0x0081, being 0x00FF the lowest probability.

c) output signal computation (does not affect the syncronization header, which iscopied unmodified from the input buffer to the output buffer):

For input ‘1’ (0x0081):• if the error pattern is in the range 0x00FF..0x0081 (255..129),

then the output will be 0x0001..0x007F (1..127), respectively;

• if the error pattern is in the range 0x0001..0x007F (1..127), thenthe output will be 0x00FF..0x0081 (255..129), respectively.

For input ‘0’ (0x007F):• if the error pattern is in the range 0x00FF..0x0081 (255..129),

then the output will be 0x00FF..0x0081 (255..129), respectively;

• if the error pattern is in the range 0x0001..0x007F (1..127), thenthe output will be 0x0001..0x007F (1..127), respectively.

Variables:

lseg . . . . . . . . Length of current frame (including synchronisation header).xbuff . . . . . . . . Buffer with input bitstream of length lseg.ybuff . . . . . . . . Buffer with output bitstream of length lseg.error pattern . . . . . . Buffer with error pattern (without synchronisation header), of

length lseg–2.

Return value:

None.

4.2.9 FER module

Syntax:

#include "eid.h"

double FER module (SCD EID *EID, long lseg, short *xbuff, short

*ybuff);

Prototype: eid.h

Description:

Implementation of the frame erasure function based on the GEC model allowing a variabledegree of burstiness (as specified by parameter gamma in the state variable structure of type


SCD EID pointed by EID). This function actually erases the current frame (as describedbelow), as opposed to function FER generator random(), which only indicates whetherthe current frame should be erased.

• computes the “frame erasure pattern”;• erases all bits in one frame according the current state of the pattern generator.

The input (undisturbed) and output (disturbed) buffers have samples conforming to thebitstream representation description in Annex B of G.192. The input and output bit-stream are compliant to the bitstream format described before, i.e. are comprised of asynchronization header (sync word followed by a frame length word) and softbits repre-senting the encoded bitstream. The sync and frame length words are always located inthe offsets 0 and 1 of the array, respectively. Should the frame be erased (depending onthe frame erasure pattern), all softbits are set to 0x0000, which corresponds to a totaluncertainty about the true bit values.

In addition, the lower 4 bits of the sync word in the synchronization header are set to 0.This makes it easier for the succeeding software to detect an erased frame. The framelength word is copied unmodified to the output buffer.

Variables:

EID . . . . . . . . . . . . . . Pointer to a state variable structure structure of type SCD EID.lseg . . . . . . . . . . . . . . Length of current frame (including synchronisation header).xbuff . . . . . . . . . . . . . . Pointer to input bitstream. The synchronisation word (xbuff[0])

is processed, the frame length word (xbuff[1]) is not changed.ybuff . . . . . . . . . . . . . . Buffer with output bitstream.

Return value:

This function returns a double value: 1 if the current frame has been erased, and 0otherwise.


Portability may be checked by running the same speech file on a proven platform and on atest platform, for the whole range of input parameters. Results should be identical whenthe compilation is done with the symbol PORT TEST properly defined and the channelstates are set to a same value.

This routine had portability tested for VAX/VMS with VAX-C, MS-DOS with Turbo Cv2.0, Sun-OS with Sun-C, and HPUX with gcc.

4.4 Examples


Two programs are provided as demonstration programs for the EID module, eiddemo.c(version 3.2), eid8k.c (version 3.2), eid-xor.c (version 1.0), gen-patt.c (version 1.4), ep-stats.c (version 2.0), and eid-int.c (version 1.0).

Program eiddemo.c uses input and output file in the form of a serial bit stream conforming


to the bitstream signal representation, as defined in Annex B of ITU-T G.192. Thisprogram will disturb the input bitstream with errors using the Gilbert Elliot Channelmodel for random or burst bit error insertion and for random frame erasures. The Bellcoremodel, which is used for burst frame erasures, is supported as a command line option,but not as default. It should be noted that this program uses function FER module(), notfunction FER generator random(), for random frame erasures.

Program eid8k.c was developed during the standardization process of the ITU-T G.7298 kbit/s speech codec for the task of producing bit error masks which would be usedin the host laboratory hardware-implemented EID. For this program, input files are notgenerated, but only bit error pattern files. Consistent with the definition in the STL,error patterns do not have synchronization headers (sync word and frame length word),but only softbits representing disturbations of the channel. GEC and the Bellcore modelare supported in this program. The output file format is, as was necessary for the G.729work, different from a serial bitstream as defined in the STL because the softbits are savedas char (8-bit words) rather than as short (16-bit words). Conversion of this format tothe STL 16-bit bitstream format can be accomplished using the unsupported programch2sh.c.

It should be noted that both programs save in files the current state of the EID modelsunder use and also try to read these state files at startup time (if not found, the programscreate new ones, which are updated when the programs terminate).

Program eid-xor.c is an error-insertion program that simply XORs bits in a bitstreamfile (in one out of three formats: G.192, byte-oriented G.192, and compact) with errorpatterns (bit errors or frame erasures in one out of three formats) and saves the disturbedbitstream in a file. The error patterns need not have been produced by any of the EIDmodels implemented in the STL, they only have to be in one of the three input formats.Since error patterns are either bit error EPs or frame erasure EPs, simultaneous bit errorsand frame erasures are not allowed by eid-xor.c.

The program gen-patt.c is used to generate error patterns (EPs) using the EID modelsimplemented in the STL (Gilbert and Bellcore models). The EPs will be either frameerasures or bit errors EPs, since the models in the STL do not support mixed frameerasure/bit error mode.

Program ep-stats.c examines an error pattern file (either bit error EPs or frame erasureEPs) and displays the actual BER/FER found in the EP and the distribution of numberof consecutive errored bits or erased frames.

Program eid-int.c interpolates a frame erasure EP such that each synchronism wordfound in the EP is repeated a user-specified number of times. This is useful to align theframe erasures for codecs that have frame sizes that are an integer sub-multiple of eachother (e.g. 10ms codecs and 20 ms codecs). In the latter example, the master EP will bethe 20ms one, and the one generated by eid-int would be used for the 10ms codec.

As a final note, it should be reinforced that the definition of the symbol PORT TEST atcompilation time will affect the operation of the programs as explained before. If thissymbol is defined, functions open eid() and open burst eid() will always start from thesame seed. Therefore, the output of the programs will be the same, unless EID state filesare available. When that symbol is not defined at compilation time, the programs willuse the run-time library function time() to get the seed used in functions open eid()

and open burst eid().


4.4.2 Using the bit error insertion routine


#include "ugstdemo.h"#include "eid.h"

#define OVERHEAD 2#define LSEG 2048L /* Frame length is FIXED! */#define SYNCword 0x6B21

void main(argc, argv)int argc;char *argv[];

SCD_EID *ber_st; /* pointer to EID-structure */char ifile[128], ofile[128];/* input/output file names */FILE *ifilptr, *ofilptr; /* input/output file pointer */static int EOF_detected = 0; /* Flag to mark END OF FILE */double ber; /* bit error rate factor */double gamma; /* burst factor */static double dstbits = 0; /* distorted bits count */static double prcbits = 0; /* processed bits count */short err_pat[LSEG]; /* error pattern-buffer */short inp[LSEG+OVERHEAD], out[LSEG+OVERHEAD]; /* bit-buffers */

GET_PAR_S(1, "_File with input bitstream: ................ ", ifile);GET_PAR_S(2, "_File for disturbed bitstream: ............. ", ofile);GET_PAR_D(3, "_Bit error rate (0.0 ... 0.50): ........ ", ber);GET_PAR_D(4, "_Burst factor (0.0 ... 0.99): ........ ", gamma);

/* Open input and output files */ifilptr = fopen(ifile, RB);ofilptr = fopen(ofile, WB);

/* Allocate EID buffer for bit errors */ber_st = open_eid(ber, gamma);if (ber_st == (SCD_EID *) 0)

QUIT(" Could not create EID for bit errors!\n", 1);

/* Now process serial soft bitstream input file */while (fread(inp, sizeof(short), LSEG+OVERHEAD, ifilptr) == LSEG+OVERHEAD)

if (inp[0] == SYNCword && EOF_detected == 0)

/* Generate Error Pattern */dstbits += BER_generator(ber_st, LSEG, err_pat);

/* Modify input bitstream according the stored error pattern */


BER_insertion(LSEG+OVERHEAD, inp, out, err_pat);prcbits += (double) LSEG; /* count number of processed bits */

/* Write disturbed bits to serial soft bitstream output file */fwrite(out, sizeof(short), LSEG+OVERHEAD, ofilptr);

else

EOF_detected = 1; /* the next SYNC-word is missed */

if (EOF_detected == 1)printf(" --- end of file detected (no SYNCword match) ---\n");

printf("\n");

/* Print message with measured bit error rate */if (prcbits > 0)

printf("Measured BER: %f (%ld of %ld bits distorted)\n",dstbits / prcbits, (long) dstbits, (long) prcbits);

4.4.3 Using the frame erasure routine


#include "ugstdemo.h"#include "eid.h"

#define QUIT(m,code) fprintf(stderr,m); exit((int)code);#define LSEG 2048L /* Frame length is FIXED! */#define OVERHEAD 2#define SYNCword 0x6B21


SCD_EID *FEReid; /* pointer to EID-structure */char ifile[128], ofile[128];/* input/output file names */FILE *ifilptr, *ofilptr; /* input/output file pointer */static int EOF_detected = 0; /* Flag to mark END OF FILE */double fer; /* frame erasure factor */double gamma; /* burst factor */static double ersfrms = 0; /* total distorted frames */static double prcfrms = 0; /* number of processed frames */short inp[LSEG+OVERHEAD], out[LSEG+OVERHEAD]; /* bit-buffers */

GET_PAR_S(1, "_File with input bitstream: ................ ", ifile);GET_PAR_S(2, "_File for disturbed bitstream: ............. ", ofile);


GET_PAR_D(3, "_Frame erasure rate (0.0 ... 0.50): ........ ", fer);GET_PAR_D(4, "_Burst factor (0.0 ... 0.99): ........ ", gamma);

/* Open input and output files */ifilptr = fopen(ifile, RB);ofilptr = fopen(ofile, WB);

/* Allocate EID buffer for bit errors frame erasure */FEReid = open_eid(fer, gamma);if (FEReid == (SCD_EID *) 0)

QUIT(" Could not create EID for frame erasure module\n", 1);

/* Now process serial soft bitstream input file */while (fread(inp, sizeof(short), LSEG+OVERHEAD, ifilptr) == LSEG + OVERHEAD)

if (inp[0] == SYNCword && EOF_detected == 0)

/* Generate frame erasure */ersfrms += FER_module(FEReid, LSEG+OVERHEAD, inp, out);prcfrms++; /* count number of processed frames */

/* Write (erased) frames to serial soft bitstream output file */fwrite(out, sizeof(short), LSEG+OVERHEAD, ofilptr);

else

EOF_detected = 1; /* the next SYNC-word is missed */

if (EOF_detected == 1)printf(" --- end of file detected (no SYNCword match) ---\n");

printf("\n");

/* Print message with measured bit error rate */if (prcfrms > 0)

printf("measured FER: %f (%ld of %ld frames erased)\n",ersfrms / prcfrms, (long) ersfrms, (long) prcfrms);

Chapter 5

G.711: The ITU-T 64 kbit/slog-PCM algorithm

In the early 1960’s an interest was expressed in encoding the analog signals in telephonenetworks, mainly to reduce costs in switching and multiplexing equipments and to allowthe integration of communication and computing, increasing the efficiency in operationand maintenance [20].

In 1972, the then CCITT published the Recommendation G.711 that constitutes theprincipal reference as far as transmission systems are concerned [21]. The basic principleof the algorithm is to code speech using 8 bits per sample, the input voiceband signalbeing sampled at 8 kHz, keeping the telephony bandwidth of 300–3400 Hz. With thiscombination, each voice channel requires 64 kbit/s.

5.1 Description of the algorithm

The idea behind the digitalization of the network involved a compromise: use as faras possible the existing infrastructure; this imposes a bandwidth limitation for the bit-streams of coded signals. A rate of 64 kbit/s was found to be reasonable.

If one thinks of using the most natural quantization scheme, one will choose linear quan-tization. But one drawback of this approach is that the signal-to-noise ratio (SNR) varieswith the amplitude of the input signals: the smaller the amplitude, the smaller the SNR.And, from the quality point of view, if a signal has a wide variance, or a variance thatchanges with time (as in the case of speech signals), the SNR will also change, resultingin a wide-varying quality of the system.

To avoid this problem, one can use logarithmic quantization, which will result into a moreuniform quantization noise. With this in mind, several studies were carried out in late1960’s to choose a good algorithm for this purpose. This led to the definition of twotransmission schemes, one using the µ law compression characteristic:

c(x) = xmaxln(1 + µ|x|/xmax)

ln(1 + µ)sgn(x)

85


and the other using the A law compression characteristic:

c(x) =

A|x|

1 + ln(A)sgn(x), for 0 ≤ |x|

xmax≤ 1

A

xmax1 + ln(A|x|/xmax)

1 + ln(A)sgn(x), for 1

A ≤ |x|xmax

≤ 1

Both characteristics behave as linear for small amplitude signals (being then equivalentto a linear quantization scheme), but are truly logarithmic for large signals. In fact, forlarge signals the SNR is:

SNRµ = 6.02B + 4.77− 20 log10(ln(1 + µ))

andSNRA = 6.02B + 4.77− 20 log10(1 + ln A)

where B is the number of bits used for quantization.

The ITU chose the values A = 87.56 and µ = 255 for the G.711 standard, together with8 bits per sample, what leads the latter two equations to:

SNRµ = 6.02B − 9.99 = 38.17dB

andSNRA = 6.02B − 10.1 = 38.06dB

The G.711 standard does not specify the law as defined above, but rather uses a goodlinear-piecewise approximation for 8 bit samples, which has easier implementation (inhardware), as well as other properties (see [22, p.229]).

This approximation uses bit 1 for sign (1 for positive, 0 for negative), bits 2–4 to indicatea segment, and bits 5–8 for level1. Within each segment, the quantization is linear (4 bits,or 16 levels), having 15 segments of distinct slopes for µ law, and 13 for A law.

The A law works with signals in the range from -4096 to 4096, implying in a range of 13bits. As for the µ law, the linear signals are accepted in the range -8159 to 8159, whichis represented by 14 bits. Besides this, in the dynamic range sense, A and µ law areequivalent to 12 and 13 bit linear quantization, respectively.

One detail for the A law is that the even bits are inverted. The reason for this comes fromproblems observed (before the standardization of the line code HDB3) in transmissionsystems when long sequences of zeros happen, because small amplitudes, in A law, to becoded mostly using ‘0’ bits. With this bit-inversion, long sequences of bits ‘0’ becomesless probable, thus improving performance.

The conversion rule for A/µ law from/to linear is described in terms of tables in G.711. Agood reason for this is that there is no closed form for the compression of linear samples(although it is possible to find a closed formulae for the expansion algorithm). Hence,two implementations are possible: table look-up, and algorithmic. For in-chip (LSI)implementations, the first one may be preferred, because it is simpler to implement,at the cost of a wider chip area. For other applications, such as using Digital SignalProcessors (DSPs), or software implementations, table look-up would occupy too muchmemory, and the algorithmic solution would be preferred.

1Please note that the bit numbering in the G.711 is the reversal of the commonly used in computerlanguages, G.711’s bit 1 corresponding to common-sense’s (most significant) bit 7, and G.711’s bit 8 tothe normal least significant bit 0, respectively.


5.2 Implementation

This implementation of the G.711 can be found in the module g711.c, with prototypesin g711.h.

For the reason explained before, an algorithmic approach to the G.711 was followed. Forthe compression routines, first the samples are converted from two’s complement to signedmagnitude notation2. So, a segment classification is done, and then the linear quantizationof a certain number of bits of the input sample, that depends on the segment number(e.g., for A law, segment 1 uses a factor of 2:1, 2 a 4:1, etc.) is carried out. Finally, thesign of the sample is added. The expansion routines are even simpler: find the sign, getthe mantissa and the exponent, and compute the linear sample.

One important point here is that, following UGST Guidelines, linear input samples mustbe left-justified shorts. With this approach, the knowledge of the 0 dB reference for thefile is simplified, and the need of having to apply different normalization factors to filesif they are to be coded by A or µ law is eliminated3. As an example, suppose that wewant to process a speech file X by the G.711 at an input level of -20 dBov for both A andµ law. Then, if the sample representation is right-justified, and a factor f brings a file’slevel to -20 dBov for µ law, then for A law the factor will be 2.f , due to the differencein input signal’s dynamic range of both laws (4096 and 8159, respectively). On the otherhand, if the samples are left-justified, the factor is only one, and the routines will onlylook at the 13 or 14 most significant bits of the 16-bit word, for A and µ law, respectively.In other words, the peak value for linear and A/µ law is the same, therefore one factor issufficient.

Compliance tests to this code have been done using a ramp file having the full excursionof the dynamic range for each of the laws, and examining the compressed and expandedsamples against the values expected in tables 1a, 1b, 2a, and 2b of Recommendation G.711(see [21]). Another test done exploits the synchronous property of the G.711 scheme. Onlysamples from column 7 of G.711 tables 1 and 2 were used. These values are transparentto quantization. Hence, if the coding was done properly, output samples should matchexactly the original ones.

The compression functions are alaw compress and ulaw compress, and the expansionfunctions are alaw expand and ulaw expand. In the next part you find a summary ofcalls to these functions.

5.2.1 alaw compress and ulaw compress

Syntax:

#include "g711.h"

void alaw compress (long smpno, short *lin buf, short *log buf)

void ulaw compress (long smpno, short *lin buf, short *log buf)

Prototype: g711.h

2Using the samples as two’s complement in the compression algorithm is a very common error whoseeffects are only noticeable for small amplitude signals. Our approach agrees to the one in G.726[23], blockcompress.

3In the case of stand-alone tools, this would mean that two copies of the same file should be available!


Description:

alaw compress performs A law encoding rule according to ITU-T Recommendation G.711,and ulaw compress does the same for µ law. Note that input samples shall be left-justified, and that the output samples are right-justified with 8 bits.

Variables:

smpno . . . . . . . . . . . . . . Is the number of samples in lin buf.

lin buf . . . . . . . . . . . . . . Is the input samples’ buffer; each short sample shall con-tain linear PCM (2’s complement, 16-bit wide) samples, left-justified.

log buf . . . . . . . . . . . . . . Is the output samples’ buffer; each short sample will containright-justified 8-bit wide valid A or µ law samples.

Return value: None.

5.2.2 alaw expand and ulaw expand

Syntax:

#include "g711.h"

void alaw expand (long smpno, short *log buf, short *lin buf)

void alaw expand (long smpno, short *log buf, short *lin buf)

Prototype: g711.h

Description:

alaw expand performs A law decoding rule according to ITU-T Recommendation G.711,and ulaw expand does the same for µ law. Note that output samples will be left-justified,and that the input samples shall be right-justified with 8 bits.

Variables:

smpno . . . . . . . . . . . . . . Is the number of samples in log buf.

log buf . . . . . . . . . . . . . . Is the input samples’ buffer; each short sample shall containright-justified 8-bit wide valid A or µ law samples.

lin buf . . . . . . . . . . . . . . Is the output samples’ buffer; each short sample will con-tain linear PCM (2’s complement, 16-bit wide) samples, left-justified.

Return value: None.


Portability may be checked by running the same speech file in a proven platform and ina test platform. Files processed this way should match exactly. Source and processedreference files for portability tests are provided in the STL distribution.

These routines had portability tested for VAX/VMS with VAX-C and gcc, MS-DOS withTurbo C v2.0, HPUX with gcc, and Sun-OS with Sun-C.


5.4 Example code

5.4.1 Description of the demonstration program

One program is provided as demonstration program for the G.711 module, g711demo.c.

Program g711demo.c accepts input files in 16-bit linear PCM format for compressionoperation and produces files in the same format after the expansion operation. Thecompressed signal will be in 16-bit, right adjusted format, according to the logarithmiclaw specified by the user. Three operations are possible: linear in, linear out (lili) linearin, logarithmic out (lilo), or logarithmic in, linear out (loli).

5.4.2 Simple example

The following C code gives an example of companding using either the A- or µ-law func-tions available in the STL.

#include <stdio.h>#include "ugstdemo.h"#include "g711.h"

#define BLK_LEN 256#define QUIT(m,code) fprintf(stderr,m); exit((int)code);

main(argc, argv)int argc;char *argv[];

char law[4];

char FileIn[180], FileOut[180];short tmp_buf[BLK_LEN], inp_buf[BLK_LEN], out_buf[BLK_LEN];FILE *Fi, *Fo;void (*compress)(), (*expand)(); /* pointer to a function */

/* Get parameters for processing */GET_PAR_S(1, "_Law (A,u): ................... ", law);GET_PAR_S(2, "_Input File: .................. ", FileIn);GET_PAR_S(3, "_Output File: ................. ", FileOut);

/* Opening input and output LOG-PCM files */Fi = fopen(FileIn, RB);Fo = fopen(FileOut, WB);

/* Choose compression/expansion routinies according to the law */if (toupper(law[0])==’A’)

compress = alaw_compress;expand = alaw_expand;


else if (tolower(law[0])==’u’)

compress = ulaw_compress;expand = ulaw_expand;

else

QUIT("Bad law chosen!\n",1);

/* File processing */while (fread(inp_buf, BLK_LEN, sizeof(short), Fi) == BLK_LEN)

/* Process input linear PCM samples in blocks of length BLK_LEN */compress(BLK_LEN, inp_buf, tmp_buf);

/* Process log-PCM samples in blocks of length BLK_LEN */expand(BLK_LEN, tmp_buf, out_buf);

/* Write PCM output word */fwrite(out_buf, BLK_LEN, BLK_LEN, sizeof(short), Fo);

/* Close input and output files */fclose(Fi);fclose(Fo);return 0;

Chapter 6

G.711-PLC: Packet loss concealmentwith G.711

6.1 Introduction

Packet Loss Concealment (PLC) algorithms hide transmission losses in audio systemswhere the input signal is encoded and packetized at a transmitter, sent over a network,and received at a receiver that decodes the packet and plays out the output. G.711Appendix I [24], approved by ITU-T in September 1999, describes a high quality, lowcomplexity PLC algorithm designed for use with G.711.


A brief description of the PLC algorithm is given. A more extensive presentation can befound in Section I.2, “Algorithm description”, of G.711 Appendix I [24].

The PLC algorithm is inserted after the G.711 decoder at the receiver. The algorithmis designed to work with 10 ms frames, or 80 samples per frame at 8 KHz sampling.An external mechanism is needed to signal when packets are lost. Since speech signalsare often locally stationary, the signals recent history is used to generate a reasonableapproximation to lost frames. If the losses are not too long, and do not land in a regionwhere the signal is rapidly changing, the losses may be inaudible after concealment.

When a frame is received the decoded speech is given to the PLC algorithm. Receivedframes are saved in a 48.75 ms circular history buffer, and the output is delayed by 3.75ms (30 samples).

When a packet is lost the concealment algorithm starts synthetic signal generation. Firstthe pitch is estimated by finding the peak of the normalized autocorrelation of the mostrecent 20 ms of speech in the history buffer with the previous speech at taps from 5 to15 ms. Using the pitch estimate, the most recent pitch period from the history buffer isrepeated for the duration of the first lost frame (10 ms). If the pitch estimate is longerthan 10 ms, only a portion of the most recent pitch period will be used in the first lostframe. A 1/4 pitch period overlap add (OLA) with a triangular window is performed atall repetition boundaries, including the transition between the last received frame and thestart of the synthetic signal.

91


If consecutive frames are lost, the number of pitch periods used to generate the syntheticsignal is increased by one pitch period at the start of the 2nd and 3rd lost frames. Whenthe number of pitch periods is increased, the output is smoothly transitioned to the oldestused pitch period of the history signal with an additional 1/4 pitch period OLA. Increasingthe number of pitch periods reduces the number of unnatural harmonic artifacts in theconcealed speech for long losses. The algorithm does not distinguish between voiced andun-voiced speech and uses the same procedure for both types of speech.

At the start of the first received frame after a loss, the synthetic signal generation iscontinued and OLAed with the received speech. This OLA window length increases withthe length of the loss. For single frame losses it is 1/4 of the estimated pitch period. 4ms are added for each additional consecutive lost frame, up to a maximum of 10 ms.

If the loss exceeds 10 ms the synthetic signal is also linearly attenuated at the rate of 20%per frame. If the loss exceeds 60 ms the synthesized signal is set to silence.

6.3 Implementation

6.3.1 Introduction

The g711iplc directory contains an ANSI C implementation of the G.711 Appendix IPLC algorithm. The C++ version of this algorithm is in the g711iplc\cpp cod directory.Sample test programs read lost frame patterns in G.192 file format and apply the PLCalgorithm to audio files. The software in the g711iplc directory is covered by a morerestrictive copyright than the STL. See the copyrght.txt file for details.

6.3.2 PLC Algorithm Implementation

A detailed line by line description of the C++ code can be found in section I.3 “Algorithmdescription with annotated C++ code” of G.711 Appendix I [24] and will not be repeatedhere. The public interface functions that are called by applications are covered. The C++version is in the g711iplc\cpp code directory (files lowcfe.h and lowcfe.cc). The ANSI Cversion, contained in the files lowcfe.h and lowcfe.c, is a translation of the C++ code toC. The interface functions are the same for both versions, with the exception that the Cversions of the routines take an extra argument for the data structure that is implicitlypassed to C++ member functions in the class instance data. As for other STL modules,only the ANSI C version is compiled during STL2005 building.

6.3.2.1 Constructor

C++ syntax:

#include "lowcfe.h"

LowcFE lc; // No argument constructor

C syntax:

#include "lowcfe c.h"

g711plc construct(LowcFE c*); /* explicit constructor call */


Description:

Before the PLC algorithm can be called the data structure containing the algorithm’sinternal storage, such as the history buffer and buffer pointers, must be initialized.

6.3.2.2 Received Frames

C++ syntax:

void LowcFE::addtohistory(short *s); /* add a frame to the history buffer */

C syntax:

void g711plc addtohistory(LowcFE c*, short *s);

Description:

Frames of speech received from the transmitter are given to the PLC algorithm withaddtohistory function. The argument s points to a short array of length FRAMESZ (80samples, or 10 ms) that is used as both an input and output. Before the call is made s isfilled with the decoded G.711 data received from the transmitter. On return, it containsthe data that is output to the listener. Addtohistory performs several operations. Itstores the input speech into the history buffer for use in generating the synthetic signal ifa loss occurs. If this is the first received frame after a loss, an OLA is performed with thesynthetic signal to insure a smooth transition between the signals. In addition, it delaysthe output so an OLA can be performed at the start of a loss.

6.3.2.3 Lost Frames

C++ syntax:

void LowcFE::dofe(short *s); /* synthesize speech during loss */

C syntax:

void g711plc dofe(LowcFE c*, short *s);

Description:

If a frame is lost, the dofe routine is called. As with addtohistory, s is a pointer to shortarray of FRAMESZ samples. With dofe, s is only an output. The PLC algorithm fills swith the synthetic signal that conceals the missing frame.

6.3.2.4 Support Functions

error

Syntax:

#include "error.h"

void error(char *s, ...);

Description:

Error handles fatal errors in the programs. The pattern string, s, and optional following


arguments should be in the format of arguments accepted by the C library printf function.Error prints its argument message on stderr and then exits the program. The errorfunction never returns.

readplcmask open

Syntax:

#include "plcferio.h"

void readplcmask open(readplcmask *r, char *fname);

Description:

The readplcmask open function opens a G.192 format file containing a packet loss pat-tern. fname is the file path. If successfully opened, r contains the state informationneeded for reading the patterns. readplcmask open internally calls the STL eid moduleto determine the type of the G.192 file and select an appropriate reading function. Ifthe open fails or an unknown pattern is detected in the file, function error is called andreadplcmask open will not return.

readplcmask erased

Syntax:


int readplcmask erased(readplcmask *r);

Description:

readplcmask erased reads the next value from the opened G.192 format pattern file. Itreturns 1 if the frame is lost and should be concealed and 0 if the frame is ok. If theend of the G.192 file is reached, the routine seeks back to the beginning of the file andthe pattern sequence is repeated. If an illegal value is found in the G.192 file, the errorfunction is called.

readplcmask close

Syntax:


void readplcmask close(readplcmask *r)

Description:

readplcmask close is used to close a G.192 file that was opened with readplcmask open.

6.3.3 Test Program

6.3.3.1 Test Program Usage

The PLC algorithm is tested using g711iplc.c. The PLC test programs take 3 file argu-ments:

g711iplc mask.g192 input.raw output.raw

The mask.g192 file contains the lost frame pattern and should be in the G.192 format asspecified in the software tools library. The g192, byte, and compact representations are


supported. The G.192 file should contain only the frame headers words (G192 SYNC orG192 FER, see softbit.h), and not the data words.

A frame corresponds to 10 ms, or 80 samples. If the lost frame pattern file is shorter thanthe number of frames in the input.raw file, the program will roll-over back to the startof the pattern file. For example if the mask.g192 file contains the binary data:

0x6B21 0x6B21 0x6B21 0x6B21 0x6B21, 0x6B21 0x6B21 0x6B21 0x6B21 0x6B20

a 10% uniform loss pattern will be applied to the whole file. Erasures will occur at 90-100ms, 190-200 ms, 290-300 ms ... in the file.

While the algorithm is designed for packets containing 10ms of speech, it can be applied topacketizations containing speech chunks that are integer multiples of 10ms. For example,for a 10% uniform loss pattern with 20ms packetization one could use:

0x6B21 0x6B21 0x6B21 0x6B21 0x6B21, 0x6B21 0x6B21 0x6B21 0x6B21 0x6B21, 0x6B210x6B21 0x6B21 0x6B21 0x6B21, 0x6B21 0x6B21 0x6B21 0x6B20 0x6B20

to cause erasures at 180-200ms, 380-400ms, 580-600ms, etc.

The input audio file, input.raw, should contain header-less 16-bit binary data, sampledat 8 KHz, in the native byte order for the machine running the test programs (big-endianon SPARC or MIPS, little-endian on Intel). The test programs do not contain the G.711encoder or decoder. If you have a G.711 bit-stream, it must be decoded before theg711iplc program is run.

The output audio file, output.raw, also contains header-less 16-bit binary data. The PLCalgorithm delays the output by 3.75 ms. The test programs compensate for this delay bynot outputting the first 3.75 ms of the first packet. This way the input and output fileswill be time aligned if they are overlaid in an audio waveform editor. In addition, afterthe last full packet is input to the PLC algorithm, an extra zero filled frame is input, andthe first 3.75 ms of the corresponding output frame is sent to the output file. The lengthof the output file will always be a multiple of the 10ms frame size. If the input file lengthis not an integral number of frames the last partial input frame will be discarded.

The test programs can also simulate a silence insertion algorithm instead of the PLCalgorithm with the -nolplc option:

g711iplc -noplc mask.g192 input.raw output.raw

Instead of calling the concealment algorithm the lost frames are simply zero filled. Thisis helpful if you want to use a wave editor to view the location of the missing frames.

Use the -stats option to print out the number and percentage of frames concealed in theprocessed file.

6.3.3.2 Test Program Implementation

A simplified version of the C++ test program is shown next. This program does not sup-port any options, such as -noplc, or compensate for the algorithm delay, but demonstrateshow the components work together.


#include <stdlib.h>#include <stdio.h>#include <string.h>#include "error.h"#include "plcferio.h"#include "lowcfe.h"

int main(int argc, char *argv[]) FILE *fi; /* input file */FILE *fo; /* output file */LowcFE fesim; /* PLC simulation class */readplcmask mask; /* error pattern file reader */short s[FRAMESZ]; /* i/o buffer */

argc--; argv++;if (argc != 3)

error("Usage: g711iplc plcpattern speechin speechout");readplcmask_open(&mask, argv[0]);if ((fi = fopen(argv[1], "rb")) == NULL)

error("Can’t open input file: %s", argv[1]);if ((fo = fopen(argv[2], "wb")) == NULL)

error("Can’t open output file: %s", argv[2]);while (fread(s, sizeof(short), FRAMESZ, fi) == FRAMESZ)

if (readplcmask_erased(&mask))fesim.dofe(s); /* lost frame */

elsefesim.addtohistory(s); /* received frame */

fwrite(s, sizeof(short), FRAMESZ, fo);fclose(fo);fclose(fi);readplcmask_close(&mask);return 0;

6.3.4 Loss Pattern Conversion Utility

The PLC directory includes a tool, asc2g192, for converting ASCII loss pattern filescontaining sequences of 0s and 1s into G.192 format pattern files. In ASCII loss patternfiles, a “1” represents a lost frame and a “0” represents a received frame. For example, tocreate a 10% uniform loss pattern with each loss being 10ms, use a text editor to createa text file called fe10.txt:

0000000001

Then, convert it to the G.192 format for use by the g711iplc program with the followingcommand:

asc2g192 fe10.txt fe10.g192

Similarly, to create a 10% uniform loss pattern with each loss being 20ms (2 frames foreach loss), create the text file fe10 2.txt :


00000000000000000011

Then convert it to the G.192 format with:

asc2g192 fe10 2.txt fe10 2.g192

The asc2g192 conversion program ignores new lines and carriage returns in the input fileso the patterns can span multiple lines.

Chapter 7

G.726: The ITU-T ADPCMalgorithm at 40, 32, 24, and 16 kbit/s

In 1982, a group was established by the then CCITT Study Group XVIII to study thestandardization of a speech coding technique that could reduce the 64 kbit/s rate used indigital links, as per ITU-T Recommendation G.711 (see related Chapter), by half whilemaintaining the same voice quality.

After considering contributions received from several organizations, there was a generalfeeling that the ADPCM (Adaptive Differential Pulse Code Modulation) technique couldprovide a good quality coder. This process of finalizing an algorithm took 18 months ofdevelopment and objective and subjective testings, to culminate in a ITU Recommenda-tion, published in October, 1984, and available in the Red Book series as RecommendationG.721.

Meanwhile, problems were found with the G.721 algorithm of 1984 regarding voice-banddata signals modulated using the Frequency Shift Keying (FSK) technique, and changeshad to be done to the algorithm. These changes were approved in 1986 and published inthe next series of Recommendations of the CCITT, the Blue Book series, superseedingthe Red Book version of the G.721. This is why a note in the Blue Book G.721 warns theuser that the bit stream of coded speech from this version is incompatible with the oldone. Also in that Study Period (1985-1988), a need for other rates was identified, and anew Recommendation, G.723, was approved to extend the bitrate to 24 and 40 kbit/s.

In the Study Period of 1989–1992, these two Recommendations have been joined into asingle one, keeping full compatibility with the former ones, and adding a lower rate of 16kbit/s. This new Recommendation was named G.726, and the former G.721 and G.723have been replaced.

The current version of the STL includes a G.726 implementation. In the section to follow,the operation of the G.726 algorithm is described only for the 32 kbit/s bit rate. Acomplete description of the G.726 algorithm can be found in [23]. Other analyses of thealgorithm, besides some based on the Red Book version, can be found in several studies[25, 26, 27].

Despite the change in numbering, the ITU-T ADPCM algorithm for speech coding at 32kbit/s, the term “G.721 algorithm” has been retained for simplicity of the text, althougha more formal reference should be “G.726 at 32 kbit/s”.

99


7.1 Description of the 32 kbit/s ADPCM

The basic idea behind the G.721 coder is to code into 4-bit samples the input speech-bandsignals, sampled at 8 kHz and represented by the 8-bit of G.711 A or µ law samples. Thedecoder just implements the reverse procedure.

The ADPCM algorithm of the G.721 exploits the predictability of the speech signals.Therefore, an adaptive predictor is used to compute the difference signal d(k) (basedon the expanded input log-pcm sample s(k)), which is then quantized by an adaptivequantizer using 4 bits. These bits are sent to the decoder and then fed into an inversequantizer. The difference signal is used to calculate the reconstructed signal, sr(k), whichis compressed (A- or µ-law) and output from the decoder (sd(k)).

From this description, one could ask the following:

• If only the quantized signal is transmitted, how can the decoder reconstructthe signal?

• How can one assure estability of the predictor?• Will this bitrate reduction degrade the voice quality?

These and others have already been considered in the design of the G.721, and many blocksof the algorithm are made to assure a good behaviour. For example, one possibility inthis backward approach for adaptation is to have encoder and decoder starting from thesame point, which is accomplished by reseting key variables to a known state (usefulfor implementation verification). Leak factors have been introduced to ensure that thealgorithm will always converge, independently of the initial state. To avoid instabilities,some parameters had their range limited. To provide some insight in the building blocksof the G.721 algorithm, a short description of each of them is given [23, 26].

7.1.1 PCM format conversion

The input signal s(k), in either A- or µ-law format, must be converted into linear samples.This expansion is accomplished using the same algorithm in G.711 [21], but convertingfrom signed magnitude to 14-bit two’s complement samples.

7.1.2 Difference Signal Computation

This block simply calculates the difference between the (expanded) input signal and theestimated signal:

d(k) = sl(k)− se(k)

7.1.3 Adaptive Quantizer

A 15-level, non-uniform adaptive quantizer is used to quantize the difference signal. Beforethe quantization, this signal is converted to a logarithmic representation1 and scaled bya factor (y(k)), that is computed in the scale factor adaptation block (see below).

1Remember that to multiply samples in the linear domain one may add in the logarithmic one. Usingefficient log and exponentiation algorithms (as done here), this turns out to be very advantageous.


The output of this block is I(k), and it is used twice; first, is the ADPCM coded (quantized)sample; second, is the input to the backward part of the G.721 algorithm, to provideinformation for quantization of the next samples. One relevant point to notice here isthat the backward adaptation is done using the quantized sample. If one starts thedecoder from this very point, one will find identical behaviour. That is why only thequantized samples are needed in the decoder (i.e., no side information).

7.1.4 Inverse Adaptive Quantizer

The inverse adaptive quantizer takes the signal I(k) and converts it back to the lineardomain, generating a quantized version of the difference signal, dq(k). This is the inputto the adaptive predictor, such that the estimated signal is based on a quantized versionof the difference signal, instead of on the unquantized (original) one.

7.1.5 Quantizer Scale Factor Adaptation

This block computes y(k), the factor used in the adaptive quantizer and inverse quantizerfor domain conversion. As input, this block needs I(k), but also al(k), the adaptationspeed control parameter. The reason for the latter is that the scaling algorithm has twomodes (bimodal adaptation), one fast, another slow. This has been done to accomodatesignals that in nature produce difference signals with large fluctuations (e.g. speech) andsmall fluctuations (e.g. tones and voice-band data), respectively.

This block computes two scale factor (fast, yu(k), and slow, yl(k)) based on I(k), whichcombined using al(k) produce y(k).

7.1.6 Adaptation Speed Control

This block evaluates the parameter al(k), which can be seen as a proportion of the speed(fast or slow) of the input signal, and is in the range [0, 1]. If 0, the data are consideredto be slowly varying; if 1, they are considered to be fast varying.

To accomplish this, two measures of the average magnitude of I(k) are computed (dms(k)and dml(k)). These, in conjunction with delayed tone detect and transition detect flags(td(k) and tr(k), calculated in the Tone Transition and Detector block), are used toevaluate ap(k), whose delayed version (ap(k−1)) is used in the definition of al(k), limitingthe range to [0, 1]2.

An analysis of ap(k) gives insight on the nature of the signal: if around the value of 2, thismeans that the average magnitude of I(k) is changing, or that a tone has been detected,or that it is idle channel noise; on the other side, if near 0, the average magnitude of I(k)remains relatively constant.

2This limitation delays the start of a fast to slow transition until the average magnitude of I(k) remainsconstant for some time; acting so, premature transitions for pulsed input signals, such as switched carriervoiceband data, are avoided.


7.1.7 Adaptive Predictor and Reconstructed Signal Calculator

The adaptive predictor has as its main function to compute the signal estimate based onthe quantized difference signal, dq(k). It has 6 zeroes and 2 poles, structure that coverswell the kind of input signals expected for the algorithm. With these coefficients, and pastvalues of dq(k) and se(k), the updated value for the signal estimate se(k) is computed.

The two sets of coefficients (one for the pole section, ai(k), i = 1..2, other for the zerosection, bi(k), i = 1..6) are updated using a simplified gradient algorithm. At this point,since a situation in which the poles cause instability may arise, the two pole coefficients ai

have their ranges limited. In addition, if a transition from partial band signal is detected(signaled by tr(k)), the predictor is reset (all coefficients are set to 0), remaining disableduntil tr comes back to zero3.

The reconstructed signal sr(k) is calculated using the signal estimate se(k) and the quan-tized difference signal dq(k).

7.1.8 Tone Transition and Detector

This block is one of the changes from the Red Book version. It was added to improvealgorithm performance for signals originating from FSK modems operating in the char-acter mode. First, it checks if the signal has partial band (e.g., a tone) by looking at thepredictor coefficient a2(k), that defines the signal td(k). Second, a transition from partialband signal indicator tr(k) is set, such that predictor coefficients can be set to 0 and thequantizer can be forced into the fast mode of operation.

7.1.9 Output PCM Format Conversion

This block is unique to the decoder. Its sole function is to compress the reconstructedsignal sr(k), which is in linear PCM format, using A or µ law, and is a complement ofthe PCM format conversion block.

7.1.10 Synchronous Coding Adjustment

This block is also unique to the decoder. It has been devised in order to prevent cumulativedistortions occuring on synchronous tandem codings (ADPCM–PCM–ADPCM, etc., inpurely digital connections, i.e., with no intermediate analog conversions), provided that:

• the transmission of the ADPCM and the intermediate PCM are error-free, and

• the ADPCM and the intermediate PCM are not disturbed by digital signalprocessing devices.

7.1.11 Extension for linear input and output signals

An extension of the G.726 algorithm was carried out in 1994 to include, as an option,linear input and output signals. The specification for such linear interface is given in itsAnnex A [28].

3Note that when this happens, the quantizer is forced into the fast mode of adaptation.


This extension bypasses the PCM format conversion block for linear input signals, andboth the Output PCM Format Conversion and the Synchronous Coding Adjustmentblocks, for linear output signals. These linear versions of the input and output signals are14-bit, 2’s complement samples.

The effect of removing the PCM encoding and decoding is to decrease the coding degra-dation by 0.6 to 1 qdu, depending on the network configuration considered (presence orabsence of a G.712 filtering).

Currently, this extension has not been incorporated in the STL.

7.2 ITU-T STL G.726 Implementation

The STL implementation of the G.726 algorithm can be found in module g726.c, withprototypes in g726.h.

Originally in Fortran (VAX Fortran-77), the source was translated by means of the public-domain code converter f2c [29]. This explain why the code makes extensive use of passageof parameters by reference, rather than by value, and why many functions, that could beimplemented as macros (using the C pre-processor directive #define), are routines, andas well as all routines return void.

The problem of storing the state variables was solved by defining a structure containingall the necessary variables, defining a new type called G726 state. By means of thisapproach, several streams may be processed in parallel, provided that one structure isassigned (and that one call to the encoding/decoding routines is done) for each datastream (this can be advantageous for machines with support for parallel processing). TheG726 state variable structure has the following fields (all are short, except ylp, which islong):

sr0 Reconstructed signal with delay 0sr1 Reconstructed signal with delay 1a1r Delayed 2nd-order predictor coefficient 1a2r Delayed 2nd-order predictor coefficient 2b1r Delayed 6th-order predictor coefficient 1b2r Delayed 6th-order predictor coefficient 2b3r Delayed 6th-order predictor coefficient 3b4r Delayed 6th-order predictor coefficient 4b5r Delayed 6th-order predictor coefficient 5b6r Delayed 6th-order predictor coefficient 6dq0 Quantized difference signal with delay 0dq1 Quantized difference signal with delay 1dq2 Quantized difference signal with delay 2dq3 Quantized difference signal with delay 3dq4 Quantized difference signal with delay 4dq5 Quantized difference signal with delay 5dmsp Short term average of the F (I) sequencedmlp Long term average of the F (I) sequenceapr Triggered unlimited speed control parameteryup Fast quantizer scale factor


Figure 7.1: Packing of G.726-encoded signals (right-aligned, parallel format).

tdr Triggered tone detectorpk0 Sign of dq+sez with delay 0pk1 Sign of dq+sez with delay 1ylp Slow quantizer scale factor

The encoding function is G726 encode, and the decoding function is G726 decode. Thereare 41 other routines that, grouped in individual calls inside the encoder and decoder,implement the algorithm. Therefore, none of these 41 routines are expected to be accessedby the user, and only the two main ones.

In the following part a summary of calls to both functions is found.

7.2.1 G726 encode

Syntax:

#include "g726.h"

void G726 encode (short *inp buf, short *out buf, long smpno, char

*law, short rate, short reset, G726 state *state)

Prototype: g726.h

Description:

Simulation of the ITU-T G.726 ADPCM encoder. Takes the A or µ law input array ofshorts inp buf (16 bit, right-justified, without sign extension) with smpno samples, andsaves the encoded samples in the array of shorts out buf, with the same number of samplesand right-justified. An example of the sample packing for the G.726 encoded bitstreamis shown in figure 7.1.

The state variables are saved in the structure state, and the reset can be stablished by


making reset equal to 1. The law is A if law==’1’, and mu law if law==’0’.

Variables:

inp buf . . . . . . . . . . . . . . Is the input samples’ buffer; each short sample shall containright-justified 8-bit wide valid A or µ law samples.

out buf . . . . . . . . . . . . . . Is the output samples’ buffer; each short sample will containright-justified 2-, 3-, 4-, or 5-bit wide G.726 ADPCM samples,depending on the rate used.

smpno . . . . . . . . . . . . . . Is the number of samples in inp buf.law . . . . . . . . . . . . . . Is a char indicating if the law for the input samples is A (’1’)

or µ (’0’). See note below.rate . . . . . . . . . . . . . . Is a short indicating the number of bits per sample to used

by the algorithm: 5, 4, 3, or 2.reset . . . . . . . . . . . . . . Is the reset flag (see note below):

• 1: reset is to be applied in the variables;• 0: processing is carried out without setting state variablesto the reset state.Please note that this should normally be done only in the firstcall to the routine in processing a sample stream.

state . . . . . . . . . . . . . . The state variable structure; all the variables here are for in-ternal use of the G.726 algorithm, and should not be changedby the user. Fields of this structure are described above.

Note: Please note the difference between reset and law: reset must be either 1 (0x01)or 0 (0x00), not ‘1’ (0x31) or ‘0’ (0x30), while law is exactly the opposite.

Return value: None.

7.2.2 G726 decode

Syntax:

#include "g726.h"

void G726 decode (short *inp buf, short *out buf, long smpno, char

*law, short rate, short reset, G726 state *state)

Prototype: g726.h

Description:

Simulation of the ITU-T G.726 ADPCM decoder. Takes the ADPCM input array ofshorts inp buf (16 bit, right- justified, without sign extension) of length smpno, and savesthe decoded samples (A or µ law) in the array of shorts out buf, with the same numberof samples and right-justified.

The state variables are saved in the structure state, and the reset can be stablished bymaking reset equal to 1. The law is A if law==’1’, and mu law if law==’0’.

Variables:

inp buf . . . . . . . . . . . . . . Is the input samples’ buffer; each short sample will containright-justified 2-, 3-, 4-, or 5-bit wide G.726 ADPCM samples.

out buf . . . . . . . . . . . . . . Is the output samples’ buffer; each short sample shall containright-justified 8-bit wide valid A or µ law samples.

smpno . . . . . . . . . . . . . . Is the number of samples in inp buf.


law . . . . . . . . . . . . . . Is a char indicating if the law for the input samples is A (’1’)or µ (’0’). See note below.

rate . . . . . . . . . . . . . . Is a short indicating the number of bits per sample to usedby the algorithm: 5, 4, 3, or 2.

reset . . . . . . . . . . . . . . Is the reset flag (see note below):• 1: reset is to be applied in the variables;• 0: processing done without setting state variables to resetstate.Please note that this should normally be done only in the firstcall to the routine in processing a sample stream.

state . . . . . . . . . . . . . . The state variable structure; all the variables here are for in-ternal use of the G.721 algorithm, and should not be changedby the user. Fields of this structure are described above.

Note: Please note the difference between reset and law: reset must be either 1 (0x01)or 0 (0x00), not ‘1’ (0x31) or ‘0’ (0x30), while law is exactly the opposite.

Return value: None.

7.3 Portability and compliance

Code testing has been done using the reset test sequences for 40, 32, 24, and 16 kbit/sprovided in the G.726 test sequence diskettes (available from the ITU sales department).Other tests were also done with speech files for the 32 kbit/s mode, comparing withreference implementations, most noticeably the one from AT&T Bell Laboratories, whichis the original implementation. Both test approaches generated 100% compatibility ofthis implementation with the G.726. 4

The portability of the STL G.726 encoding function has been tested by feeding the routinewith the reset test sequences of the G.726 test sequences diskettes (available from the ITUSecretariat). As inputs, a binary version of the files nrm.a, ovr.a, nrm.m, ovr.m have beenused for the 4 bit rates; the output of G726 encoder was then compared with a binaryversion of the files rnrrfa.i, rvrrfa.i, rnrrfm.i, rvrrfm.i, rr = 16, 24, 32, 40, accordingly foreach input sequence and rate. The encoding routine passed the test when no differencesin the bit streams were found.

The portability test of the decoding function was carried out by feeding this routinewith the pertinent test sequences of the G.726 Test Sequences Diskettes. As inputs, abinary version of the files rnrrfa.i, rvrrfa.i, rnrrfa.i, rvrrfa.i, rnrrfm.i, rvrrfm.i, rnrrfm.i,rvrrfm.i, and irr (twice: one for A and another for µ law) have been used, rr being 16,24, 32, and 40. The output of G726 decoder was then compared with a binary versionof the files rnrrfa.o, rvrrfa.o, rnrrfx.o, rvrrfx.o, rnrrfm.o, rvrrfm.o, rnrrfc.o, rvrrfc.o,rirrfa.o, rirrfm.o (rr as above), respectively for each input sequences. All test vectorswere properly processed.

These routines have been tested in VAX/VMS with VAX-C and GNU-C, in the PC withBorland C v3.0 (16-bit mode) and GNU-C (32-bit mode). In the Unix environment forSun cc, acc, and gcc, and in HP for gcc.

4The problem with the A-law 40 kbit/s test vector ri40fa.o present in the STL96 has been solved inthe STL2000.


7.4 Example code


Two programs are provided as demonstration programs for the G.726 module, g726demo.cand vbr-g726.c.

Program g726demo.c accepts input files in either 16-bit, right-justified A- or µ-law format(as generated by g711demo.c) and encodes and/or decodes using one of the G.726 bitrates (16, 24, 32, or 40 kbit/s). Linear PCM files are not accepted by the program. Threeoperations are possible: logarithmic in, logarithmic out (lolo) logarithmic in, ADPCMout (load), or ADPCM in, logarithmic out (adlo).

Program vbr-g726.c can perform the same functions as g726demo.c, however it is capa-ble of two additional features. It can perform in variable bit rate mode, which is switchedat user-specified frame sizes (i.e. number of samples), and it can operate from 16-bitlinear PCM input files. In the latter case, A-law is used to compand the linear signalprior to G.726 encoding, since G.726 Annex A [28] is not yet implemented in the STL.


The following C code gives an example of G.726 coding and decoding using as inputspeech previously encoded by either the A- or µ-law functions available in the STL. Theoutput samples will be encoded using the same law of the input signal.


#define BLK_LEN 256


G726_state encoder_state, decoder_state;char law[4];short bitrate, reset;char FileIn[180], FileOut[180];short tmp_buf[BLK_LEN], inp_buf[BLK_LEN], out_buf[BLK_LEN];FILE *Fi, *Fo;

/* Get parameters for processing */GET_PAR_S(1, "_Law: ......................... ", law);GET_PAR_I(2, "_Bit-rate: .................... ", bitrate);GET_PAR_S(2, "_Input File: .................. ", FileIn);GET_PAR_S(3, "_Output File: ................. ", FileOut);



/* File processing */reset = 1; /* set reset flag as YES */while (fread(inp_buf, BLK_LEN, sizeof(short), Fi) == BLK_LEN)

/* Process input log PCM samples in blocks of length BLK_LEN */G726_encode(inp_buf, tmp_buf, BLK_LEN, law, bitrate, reset, &encoder_state);

/* Process ADPCM samples in blocks of length BLK_LEN */G726_decode(tmp_buf, out_buf, BLK_LEN, law, bitrate, reset, &decoder_state);

/* Write PCM output word */fwrite(out_buf, BLK_LEN, sizeof(short), Fo);

if (reset)reset = 0; /* set reset flag as NOMORE */

/* Close input and output files */fclose(Fi);fclose(Fo);

Chapter 8

G.727: The ITU-T embeddedADPCM algorithm at 40, 32, 24,and 16 kbit/s

8.1 Description of the Embedded ADPCM

The G.727 algorithm is specified in ITU-T Recommendation G.727 [30] with the blockdiagram shown in Figure 8.1, and will not be further described here. Additional infor-mation can be found in [27], where a thorough comparison is made between differentADPCM schemes, including G.726 and G.727. Details on the linear interface for theG.727 algorithm are found in G.727 Annex A [31].

8.1.1 Extension for linear input and output signals

An extension of the G.727 algorithm was carried out in 1994 to include, as an option,linear input and output signals. The specification for such linear interface is given in itsAnnex A [31].

This extension bypasses the PCM format conversion block for linear input signals, andboth the Output PCM Format Conversion and the Synchronous Coding Adjustmentblocks, for linear output signals. These linear versions of the input and output signals are14-bit, 2’s complement samples.

The effect of removing the PCM encoding and decoding is to decrease the coding degra-dation by 0.6 to 1 qdu, depending on the network configuration considered (presence orabsence of a G.712 filtering).

Currently, this extension has not been incorporated in the STL.


The STL implementation of the G.727 algorithm can be found in module g727.c, withprototypes in g727.h.

109


Reconstructedsignal

calculator

Input PCMformat

conversion

Differencesignal

computation

Adaptivequantizer

Feed-backbit

masking

Quantizerscale factoradaptation

ADPCMoutput

s(k)

s (k)l

d(k) l(k)

I (k)c

s (k)e

a (k)2

y (k)

a (k)

t (k)

t (k)

r

d

Inverseadaptivequantizer

Adaptivepredictor

Tone andtransitiondetector

Adaptationspeedcontrol

d (k)q

FB

l

1

y(k)

s (k)r

l(k)

(a) Encoder

Feed-forwardreconstructed

signal calculator

Output PCMformat

conversion

Feed-forwardinverse adaptive

quantizer

Adaptivepredictor

Quantizerscale factoradaptation

Feed-back bitmasking

Adaptationspeed control

ADPCMinput

I ′(k)

s (k)e

d (k)qs (k)p

a (k)2

y(k)

t (k)

t (k)

r

d

y (k)l

s (k)d

l (k)cd (k)q

y(k)

a (k)1

FF

Feed-backinverse adaptive

quantizer

Tone andtransitiondetector

Synchronouscoding

adjustment

Feed-backreconstructed

signal calculator

s (k)r FF

FB

s (k)r FB

(b) Decoder

Figure 8.1: G.727 encoder and decoder block diagrams


The problem of storing the state variables was solved by defining a structure containingall the necessary variables, defining a new type called G727 state. As for other STLmodules, the use of the state variable allows for parallel processing flows in the sameexecutable program. The internal elements of the state variable G727 state should notbe modified by the user, and are not described here.

The encoding function is G727 encode, and the decoding function is G727 decode. Addi-tionally, initialization and reset of the state variable is performed by g727 reset. Thereare other internal routines which are not for access by the user, and hence are not describedhere. Their usage description is given below.

8.2.1 G727 reset

Syntax:

#include "g727.h"

void G727 reset (g727 state *st);

Prototype: g727.h

Description:

Reset ITU-T G.727 embedded ADPCM encoder or decoder state variable.

8.2.2 G727 encode

Syntax:

#include "g727.h"

void G727 encode (short *src, short *dst, short smpno, short law,

short cbits, short ebits, g727 state *state);

Prototype: g727.h

Description:

Simulation of the ITU-T G.727 embedded ADPCM encoder. Takes the A or µ law inputarray of shorts src (16 bit, right- justified, without sign extension) of length smpno, andsaves the encoded samples in the array of shorts dst, with the same number of samples andright-justified. The ADPCM samples will have cbits core bits, and ebits enhancementbits.

The state variables are saved in the structure state, which should be initialized by g727 reset()

before use. A-law is used if law==’1’, and µ-law if law==’0’.

Variables:

src . . . . . . . . . . . . . . Is the input samples’ buffer; each short sample shall containright-justified 8-bit wide valid A or µ law samples.

dst . . . . . . . . . . . . . . Buffer with right justified short ADPCM-encoded sampleswith cbits core bits and ebits enhancement bits. Unused MSbsare set to zero.

smpno . . . . . . . . . . . . . . Is a short indicating the number of samples to encode.law . . . . . . . . . . . . . . Is a char indicating if the law for the input samples is A (’1’)

or µ (’0’).


cbits . . . . . . . . . . . . . . Number of core ADPCM bits.ebits . . . . . . . . . . . . . . Number of enhancement ADPCM bits.state . . . . . . . . . . . . . . The state variable structure; all the variables here are for in-

ternal use of the G.727 algorithm, and should not be changedby the user.

Return value: None.

8.2.3 G727 decode

Syntax:

#include "g727.h"

void G727 decode (short *src, short *dst, short smpno, short law,

short cbits, short ebits, g727 state *state );

Prototype: g727.h

Description:

Simulation of the ITU-T G.727 embedded ADPCM decoder. Takes the ADPCM inputarray of shorts src (16 bit, right-justified, without sign extension) of length smpno, andsaves the decoded samples (A or µ law) in the array of shorts dst, with the same numberof samples and right-justified. The ADPCM samples must have cbits core bits, andebits enhancement bits.

The state variables are saved in structure st, which should be initialized by g727 reset()

before use. The law is A if law==’1’, and µ law if law==’0’.

Variables:

src . . . . . . . . . . . . . . Buffer with right justified short ADPCM-encoded sampleswith cbits core bits and ebits enhancement bits. Unused MSbsare zero.

dst . . . . . . . . . . . . . . Is the input samples’ buffer; each short sample shall containright-justified 8-bit wide valid A or µ law samples.

smpno . . . . . . . . . . . . . . Is a short indicating the number of samples to encode.law . . . . . . . . . . . . . . Is a char indicating if the law for the input samples is A (’1’)

or µ (’0’).cbits . . . . . . . . . . . . . . Number of core ADPCM bits.ebits . . . . . . . . . . . . . . Number of enhancement ADPCM bits.state . . . . . . . . . . . . . . The state variable structure; all the variables here are for in-

ternal use of the G.727 algorithm, and should not be changedby the user.

Return value: None.


Code testing has been done using the reset test sequences for 5, 4, 3, and 2 bits withthe valid combination of core and enhancement bits. The reset test sequences can beacquired from the ITU Sales Department, and are not distributed with the STL. Thetesting procedure is implemented in the makefiles, which use a binary version of the test


vectors. The implementation passed the compliance test when no differences were foundbetween tested and reference test vectors. All test vectors were verified to be properlyprocessed.

These routines have been tested in in MS-DOS with Turbo C++ v1.0 (16-bit mode) andGNU-C (go32 32-bit mode), and in Windows/32 with MS Visual C and CYGNUS/gcc.In the Unix environment, they have been tested for SunOs (cc, acc, and gcc), HP-UX(gcc), and Ultrix 4.0 (cc and gcc).

8.4 Example code


One program is provided as demonstration program for the G.727 module, g727demo.c.

Program g727demo.c accepts input files in either 16-bit, right-justified A- or µ-law format(as generated by g711demo.c) and encodes and/or decodes using the G.727 algorithmfor the user-specified number of Nc core bits and Ne enhancement bits. The effectiveencoding bitrate will then be 16 × (Nc + Ne) kbit/s. Linear PCM files are not acceptedby the program, since G.727 Annex A [31] is not yet implemented in the STL. Threeoperations are possible: logarithmic in, logarithmic out (default) logarithmic in, ADPCMout (option -enc), or ADPCM in, logarithmic out (option -dec).


The following C code gives an example of G.727 coding and decoding using as inputspeech previously encoded by either the A- or µ-law functions available in the STL. Theoutput samples are encoded using the same law of the input signal.


#define BLK_LEN 256


G727_state encoder_state, decoder_state;char law;short core, enh;char FileIn[180], FileOut[180];short tmp_buf[BLK_LEN], inp_buf[BLK_LEN], out_buf[BLK_LEN];FILE *Fi, *Fo;

/* Get parameters for processing */GET_PAR_C(1, "_Law: ......................... ", law);GET_PAR_I(2, "_Core bits: ................... ", core);


GET_PAR_I(2, "_Enhancement bits: ............ ", enh);GET_PAR_S(2, "_Log-PCM Input File: .......... ", FileIn);GET_PAR_S(3, "_Log-PCM Output File: ......... ", FileOut);


/* Reset state variables */g727_reset(&encoder_state);g727_reset(&decoder_state);


/* Process input log PCM samples in blocks of length BLK_LEN */G727_encode(inp_buf, tmp_buf, BLK_LEN, law, core, enh, &encoder_state);

/* Process ADPCM samples in blocks of length BLK_LEN */G727_decode(tmp_buf, out_buf, BLK_LEN, law, core, enh, &decoder_state);

/* Write PCM output word */fwrite(out_buf, BLK_LEN, sizeof(short), Fo);

/* Close input and output files */fclose(Fi);fclose(Fo);

Chapter 9

G.722: The ITU-T 64, 56, and 48kbit/s wideband speech codingalgorithm

With the emergence of ISDN networks offering digital connectivity at 64 kbit/s betweensubscribers, the possibility was given to improve the standard telephone quality by increas-ing the transmitted bandwith. A bandwith of 50-7000 Hz corresponding to a samplingof 16 kHz was chosen because it provides a substantial improvement of the quality forapplications where the speech is to be heard through high quality loudspeakers e.g. foraudio or video conference services, commentary broadcasting, and high quality handsfreephones.

An expert group was created in November 1983 whose mandate was to define a singleworldwide standard for 7 kHz speech coding within 64 kbit/s. After many contribu-tion received from several organisations, it has been decided to choose a coder whichcombined subband filtering and adaptive differential pulse-code modulation algorithms(SB-ADPCM). The final recommendation was produced in March 1986 and approved inJuly 1986 by the then CCITT SG XVIII as Recommendation G.722 [32].

The full description on the implementation of the G.722 algorithm is found in [32], andnetwork aspects related to its operation are found in [33]. Figure 9.1 summarizes somesystemic aspects for the deployment of the G.722 algorithm. Overview and notes on thedevelopment of the G.722 algorithm can be found in several in [34, 35, 36, 37, 38, 39].The following description of the G.722 algorithm is based on the text in [40].

9.1 Description of the 64, 56, and 48 kbit/s G.722

algorithm

In order to improve the transmitted speech quality, the input signal has to be convertedafter antialiasing filtering by an analog-to-digital (A/D) converter operating at a 16 kHzsampling rate and with a resolution of at least 14 uniform PCM bits. Similarly, atthe receive side, a digital-to-analog (D/A) converter operating at a 16 kHz samplingrate and with a resolution of at least 14 uniform PCM bits should be used, followedby a reconstructing filter. The specifications of the transmission characteristics of the

115


ReconstructedSpeech Signal

Auxiliary DataChannel

WidebandSpeech

(50-7000Hz)

Auxiliary DataChannel

16kHz

A/D14 bits Mux

Device

DataInsertion

16kHz

D/A14 bits

upper sub-bandADPCM Decoder

(3 variants)

ADPCM DecoderLower sub-band

DMuxDevice

DataExtraction

QuadratureFilters

QuadratureFilters

upper sub-bandADPCM Coder

lower sub-bandADPCM Coder

* The signal in the 64kbit/s channel comprises 64, 56 or 48 kbit/s for speech and 0, 8 or 16 kbit/sfor data, depending on the operating mode.

(50-7000Hz)

* Operating modes 1-bis and 3-bis are applicable only to US national 56 kbit/s networks

Transmission

Reception

64kbit/s Channel (speech and data)

64kbit/s OperatingMode

16kbit/s

48kbit/s

64kbit/s16kbit/s

48kbit/s

Operating Mode

Mode 1:

Mode 1-bis:

Mode 2:

Mode 3:

Mode 3-bis:

64kbit/s for speech and 0kbit/s for auxiliary data




48kbit/s for speech, 6.4kbit/s for auxiliary data and1.6kbit/s for service channel framing and mode control

Notes:

* Operating modes:

Figure 9.1: G.722 encoder and decoder block diagrams


HIG HER SUB-BAND

ADPCM ENCODER

LO WER SUB-BAND

ADPCM ENCODER

TRANSMIT

QUADRATURE

MIRROR

FILTERS

MUXX

ln

XH

XL

48 kbit/s

IL

16 kbit/s

IH 64 kbit/s

I

Figure 9.2: Block diagram of the SB-ADPCM encoder

audio parts suited for the G.722 algorithm are described in the Recommendation. Someflexibility of the output bit rate was implemented to allow the opening of an auxilary datachannel within the 64 kbit/s channel.

9.1.1 Functional description of the SB-ADPCM encoder

Figure 9.2 shows a block diagram of the SB-ADPCM encoder which comprises the fol-lowing main blocks.

Transmit quadrature mirror filters

The input signal xin is first filtered by two quadrature mirror filters (QMF) which splitthe frequency band [0, 8000 Hz] into two equal subbands. The outputs xl and xh of thelower and higher subbands are downsampled at 8 kHz by the filtering procedure.

Lower subband ADPCM encoder

Figure 9.3 shows a block diagram of the lower subband ADPCM encoder. The encoderwas designed to operate at 6, 5 or 4 bits per sample corresponding to 48, 40 or 32 kbit/sto transmit the lower band. The ADPCM algorithm is very similar to the embeddedADPCM algorithm of ITU-T Recommendation G.727 [30]. It is an embedded ADPCMwith 4 core bits and 2 additionnal bits. The embedded property was introduced to preventdegradation in speech quality when the encoder and the decoder operate during shortintervals in different modes.

Adaptive quantizer A 60-level non-uniform adaptive quantizer is used to quantize thedifference el between the input signal xl and the estimated signal sl. The output of thequantizer Il is the ADPCM codeword for the lower subband. The 4 forbidden outputcodewords were primarily introduced to prevent the generation of all zero codes at allmodes, but have also later be used to recover the 8 kHz frame used by the coder.

Inverse adaptive quantizer In the feedback loop the two least significant bits of Ilare deleted to produce a 4 bit signal Ilt which is used for the adaptation of the quantizerscale factor and applied to a 15-level inverse adaptive quantizer to produce the quantizeddifference signal dlt.


60-LEVEL

ADAPTIVE

QUANTIZER

QUANTIZER

ADAPTATION

DELETE

THE 2

LSB's

15-LEVEL

INVERSE

ADAPTIVE

QUANTIZER

ADAPTIVE

PREDICTOR

XL+

-

eL IL

ILt

dLt

rLt

SL

∆L

+

+

48 kbit/s

Figure 9.3: Block diagram of the lower subband ADPCM encoder

Quantizer adaptation In order to maintain a wide dynamic range and minimize com-plexity, the quantizer scale factor adaptation is performed in the base 2 logarithmic do-main. The log-to-linear conversion is accomplished using a lookup table. There is noadaptation of the speed control parameter as in 32 kbit/s ADPCM [23] because the en-coder is not designed to transmit voiceband data.

Adaptive predictor and reconstructed signal computation The adaptive predic-tor structure is similar to the one used for G.727 ADPCM standard: 2 poles and 6 zeroes.The two sets of coefficients (one for the poles and the other for the zeroes section) are up-dated using a simplified gradient algorithm. Stability constraints are applied to the polesin order to prevent possible unstable conditions. However, no predictor reset is appliedfor some specifics inputs conditions as it is done in G.726 algorithm. The reconstructedsignal rlt is computed by adding the quantized difference signal dlt to the signal estimatesl produced by the adaptive predictor. The use of a 4-bit operation instead of a 6-bitoperation in the feedback loops of the lower band ADPCM encoder and decoder allowsfor the insertion of data in the two least significant bits without causing mistracking inthe decoder.

Higher subband ADPCM encoder

Figure 9.4 shows a block diagram of the higher subband ADPCM encoder. This encoderis designed to operate at 2 bits per sample, corresponding to a fixed bit rate of 16 kbit/s.The encoder algorithm is very similar to the lower band one but with the following maindifferences. The quantizer is a 4-level non-linear adaptive quantizer. The higher subbandADPCM encoder is not embedded, hence the inverse quantizer uses the 2 bits in thefeedback loop.


4-LEVEL

ADAPTIVE

QUANTIZER

QUANTIZER

ADAPTATION

4-LEVEL

INVERSE

ADAPTIVE

QUANTIZER

ADAPTIVE

PREDICTOR

XH+

-

eH IH

dH

rH

SH

∆H

+

+

16 kbit/s

Figure 9.4: Block diagram of the higher subband ADPCM encoder

Multiplexer

The resulting codewords from the higher and lower subbands Ih and Il are combinedto get the output codeword I with an octet format for transmission every 8 kHz frameproducing a 64 kbit/s rate at the encoder output. Notice that the 8 kHz clock may beprovided by the network as it is always done for 64 kbit/s A-law or µ-law log-PCM (G.711)systems.

9.1.2 Functional description of the SB-ADPCM decoder

Figure 9.5 shows a block diagram of the SB-ADPCM decoder.

Demultiplexer

The demultiplexer decomposes the received 64 kbit/s octet formatted signal Ir into twosignals Ilr and Ihr which form the codeword inputs for the lower and higher subbandADPCM decoders.

Lower subband ADPCM decoder

Figure 9.6 shows a block diagram of the lower subband decoder. This decoder can operatein three different modes depending on the received mode indication and correspondingto 64, 56 or 48 kbit/s. The block which produces the estimate signal is identical to thefeedback portion of the lower subband ADPCM encoder. The reconstructed signal rl isproduced by adding the signal estimate to the relevant quantized difference signals dl6, dl5or dl4, which are selected according to the received indication of the mode of operation.


HIG HER SUB-BANDADPCM ENCODER

LO WER SUB-BANDADPCM DECODER

(3 VARIANTS)

RECEIVEQUADRATURE

MIRRORFILTERS

DMUX

Xout

rH

rLILr

48 kbit/s

mode indication

Ir

64 kbit/s

IHr

16 kbit/s

Figure 9.5: Block diagram of the SB-ADPCM decoder

Higher subband ADPCM decoder

This decoder (see Figure 9.7) is identical to the feedback portion of the higher subbandADPCM encoder which is described in the section 9.1.1, the output being the recon-structed signal rh.

Receive QMF

The receive QMF are two reconstruction filters which interpolate the ouputs of the lowerand higher subband ADPCM decoders from 8 to 16 kHz (rh and rl) and generate theglobal reconstructed output xout sampled at 16 kHz. Signal xout is converted to analogby the digital to analog converter of the receiving side.


This implementation of the G.722 algorithm is composed of several source files. Theinterface routines are in file g722.c, with prototypes in g722.h. The original code ofthe STL G.722 was provided by CNET/France and its user interface was modified to beconsistent with the other software modules of the STL.

The problem of storing the state variables was solved by defining a structure calledg722 state which containing all the necessary state variables. By means of this ap-proach, several streams may be processed in parallel1, provided that one structure isassigned (and that one call to the encoding/decoding routines is done) for each datastream (this can be advantageous for machines with support for parallel processing). TheG.722 state structure has the following fields (which are all shorts):

1This feature was not possible with the original code provided by CNET and was added in themodifications of the user interface.


DELETE2 LSB's

QUANTIZERADAPTATION

15-LEVELINVERSE

ADAPTATIVEQUANTIZER

15-LEVELINVERSE

ADAPTATIVEQUANTIZER

30-LEVELINVERSE

ADAPTATIVEQUANTIZER

ADAPTATIVEPREDICTOR

60-LEVELINVERSE

ADAPTATIVEQUANTIZER

DELETE1 LSB

DELETE2 LSB's

mode indication

ILr

48 kbit/s

ILt

(ILt)

IL,4

IL,5

IL,6

(dLt)

dL,4

dL,5 dL rL

∆L

rLt

dLt

SL

SELECTOR

dL,6

Figure 9.6: Block diagram of the lower subband ADPCM decoder

QUANTIZER

ADAPTATION

ADAPTATIVE

PREDICTOR

4-LEVEL

INVERSE

ADAPTATIVE

QUANTIZER

IH

16 kbit/s

dH rH

sH

+

+

∆L

Figure 9.7: Block diagram of the higher subband ADPCM decoder


ah, al Second-order pole section coefficient buffer for higher and lowerband, respectively

bh, bl Seventh-order zero section coefficient buffer for higher and lowerband, respectively

deth, detl Delayed quantizer scale factor for higher and lower band,respectively

dh Quantizer difference signal memory

dlt Quantizer difference signal for the adaptive predictor

init qmf rx Flag indicating the need to initialize the QMF filters on thereception (decoder) side

init qmf tx Flag indicating the need to initialize the QMF filters on thetransmission (encoder) side

nbh, nbl Delayed logarithmic quantizer factor for higher and lower band,respectively

ph, plt Partially reconstructed signal memory for higher and lower band,respectively

qmf rx delayx Memory of past 24 received (decoded) samples

qmf tx delayx Memory of past 24 transmitted (encoded) samples

rh[3] Quantized reconstructed signal

rlt[3] Reconstructed signal memory for the adaptive predictor

sh, sl Predictor output value for higher and lower band, respectively

sph, spl Pole section output signal for higher and lower band, respectively

szh, szl Zero section output signal for higher and lower band, respectively

The bitstream generated by the STL G.722 encoder has 8 valid bits for each encodedsample, saved in right-justified shorts. The lower 6 bits are the lower-subband encodedbits, and the upper two bits of the 8 valid bits are the upper-subband encoded bits.When the decoder is not in operation mode 1, the decoder will descard 1 or 2 of the lowerbits of the lower-subband. It should be noted that, when bit errors are inserted in thisbitstream and the operation mode is not mode 1, the actual bit error rate seen by thedecoder may not be the one actually desired. One may consider that, in simulating asystem where auxiliary data channels are used, such as modes 2 and 3, this is actually thedesired behaviour, because errors hitting the auxiliary data will not affect the decodedspeech quality. However, if simulation of modes 1-bis or 3-bis is intended, then the someof the errors hitting the lower 1 (mode 1-bis) or 2 bits (mode 3-bis) will not be seen bythe decoder, and the overall bit error rate will actually be smaller than the desired one.There are two possible approaches to circumvent this problem:

• the use of an external program to shift the bitstream samples one or two bits (re-spectively for modes 1-bis or 3-bis) to the right before the bitstream serializationprocess for use with the STL EID module, and an external program to left-shift thebitream samples by one or two bits after error insertion and before using the STLG.722 decoder. This solution is valid for both random and burst bit errors.

• to increase proportionally the bit error rate by 1/8 (mode 1-bis) or 1/4 (mode 3-bis),to statistically compensate for errors hitting unsued bits. This solution is valid onlyfor random bit errors.


From the users’ perspective, the encoding function is g722 encode, and the decoding func-tion is g722 decode. Before using these functions, state variables for the encoder and thedecoder must be initialized respectively by g722 reset encoder and g722 reset decoder.It should be noted that encoder and decoder need individual state variables to work prop-erly.

In the following part a summary of calls to the three entry functions is found.

9.2.1 g722 encode

Syntax:

#include "g722.h"

void g722 encode (short *inp buf, short *g722 frame, long smpno,

g722 state *g722 encode);

Prototype: g722.h

Description:

Simulation of the ITU-T G.722 64 kbit/s encoder. Takes the linear (16-bit, left-justified)input array of shorts inp buf (16 bit, right-justified, without sign extension) with smpnosamples, and saves the encoded bit-stream in the array of shorts g722 frame.

The state variables are saved in the structure pointed by g722 encode, and the reset canbe stablished by making a call to g722 reset encoder.

Variables:

inp buf . . . . . . . . . . . . . . Is the input samples’ buffer with smpno left-justified 16-bitlinear short speech samples.

g722 frame . . . . . . . . . . . . . . Is the encoded samples’ buffer; each short sample will containthe encoded parameters as right-justified 8-bit samples.

smpno . . . . . . . . . . . . . . Is a long with the number of samples to be encoded from theinput buffer inp buf.

g722 encode. . . . . . . . . . . . . . A pointer to the state variable structure; all the variables hereare for internal use of the G.722 algorithm, and should not bechanged by the user. Fields of this structure are describedabove.

Return value:

Returns the number of speech samples encoded.

9.2.2 g722 decode

Syntax:

#include "g722.h"

void g722 decode (short *g722 frame, short *out buf, int mode, long

smpno, g722 state *g722 decoder, );

Prototype: g722.h

Description:

Simulation of the ITU-T 64 kbit/s G.722 decoder. Reconstructs a linear (16-bit, left-justified) array of shorts inp buf (16 bit, right-justified, without sign extension) withsmpno samples from the encoded bit-stream in the array of shorts g722 frame.


The state variables are saved in the structure pointed by g722 decoder, and the reset canbe stablished by making a call to g722 reset decoder.

Variables:

g722 frame . . . . . . . . . . . . . . Is the encoded samples’ buffer; each short sample will containthe encoded parameters as right-justified 8-bit samples.

out buf . . . . . . . . . . . . . . Is the output samples’ buffer with smpno left-justified 16-bitlinear short speech samples.

mode . . . . . . . . . . . . . . Is an int which indicates the operation mode for the G.722decoder. If equal to 1, the decoder will operate at 64 kbit/s.If equal to 2, the decoder will operate at 56 kbit/s, discardingthe least significant bit of the lower-band ADPCM. If equalto 3, the decoder will discard the two least significant bits ofthe lower band ADPCM, being equivalent to the 48 kbit/soperation of the G.722 algorithm. It should be noted that,for this implementation of the G.722 algorithm, mode 1-bis isidentical to mode 2, and mode 3-bis is identical to mode 3.

smpno . . . . . . . . . . . . . . Is a long with the number of samples in the input encodedsample buffer g722 frame to be decoded.

g722 decoder. . . . . . . . . . . . . . A pointer to the state variable structure; all the variables hereare for internal use of the G.722 algorithm, and should not bechanged by the user. Fields of this structure are describedabove.

Return value:

Returns the number of speech samples encoded.

9.2.3 g722 reset encoder

Syntax:

#include "g722.h"

void g722 reset encoder (g722 state *g722 encoder);

Prototype: g722.h

Description:

Initializes the state variables for the G.722 encoder or decoder. Coder and decoder requireeach a different state variable.

Variables:

g722 encoder. . . . . . . . . . . . . . A pointer to the G.722 encoder state variable structure whichis to be initialized.

Return value: None.

9.2.4 g722 reset decoder

Syntax:

#include "g722.h"

void g722 reset decoder (g722 state *g722 decoder);


Prototype: g722.h

Description:

Initializes the state variables for the G.722 decoder. Coder and decoder require each adifferent state variable.

Variables:

g722 decoder. . . . . . . . . . . . . . A pointer to the G.722 decoder state variable structure whichis to be initialized.

Return value: None.


The portability test for these routines has been done using the test sequences designedby the ITU-T for the G.722 algorithm (available from the ITU sales department). Itshould be noted that the G.722 test sequences are not designed to test the QMF filters,but only to exercise the upper and lower band encoder and decoder ADPCM algorithms.Therefore, testing of the codec with the test sequences was done with a special set of testprograms that used the core G.722 upper- and lower-band ADPCM coding and decodingfunctions. All test sequences were correctly processed.

This module has been tested in VAX/VMS with VAX-C, in the PC with Turbo C++ v1.0(16-bit mode) and GNU-C (32-bit mode), in the Unix environment in a Sun workstationwith cc, and in HP with gcc.

9.4 Example code


One demonstration program is provided for the G.722 module, g722demo.c. In addition,two programs are provided in the distribution when compliance testing of the encoderand decoder is necessary, tstcg722.c and tstdg722.c2.

Program g722demo.c accepts 16-bit, linear PCM samples sampled at 16 kHz as encoderinput. The decoder also produces files in the same format. The bitstream signals outof the encoder are always organized in 16-bit, right-justified words that use the lower 8bits (i.e., 64 kbit/s). According to the user-specified mode, the decoder will decode theG.722-encoded bitstream using 64, 56, or 48 kbit/s (i.e. full 8 bits, discard 1 bit of thelower band, or discard 2 bits of the lower band).


The following C code gives an example of G.722 coding and decoding using as inputwideband speech which is encoded and decoded at either 64, 56, or 48 kbit/s, accordingto the user-specified parameter mode.

2The demonstration program g722demo.c cannot be used for compliance verification because the testvectors for G.722 do not foresee processing through the quadrature mirror filters.


#include <stdio.h>#include "ugstdemo.h"#include "g722.h"#define BLK_LEN 256


g722_state encoder_state, decoder_state;int mode;char FileIn[180], FileOut[180];short smpno, tmp_buf[BLK_LEN], inp_buf[BLK_LEN], out_buf[BLK_LEN];FILE *Fi, *Fo;

/* Get parameters for processing */GET_PAR_S(1, "_Input File: .................. ", FileIn);GET_PAR_S(2, "_Output File: ................. ", FileOut);GET_PAR_I(3, "_Mode: ........................ ", mode);

/* Initialize state structures */g722_reset_encoder(&encoder_state);g722_reset_decoder(&decoder_state);

/* Opening input and output 16-bit linear PCM speech files */Fi = fopen(FileIn, RB);Fo = fopen(FileOut, WB);


/* Encode input samples in blocks of length BLK_LEN */smpno = g722_encode(inp_buf, tmp_buf, BLK_LEN, &encoder_state);

/* Decode G.722-coded samples in blocks of length BLK_LEN */smpno = g722_decode(tmp_buf, out_buf, mode, smpno, &decoder_state);

/* Write 16-bit linear PCM output decoded samples */fwrite(out_buf, smpno, sizeof(short), Fo);

/* Close input and output files */fclose(Fi); fclose(Fo);

Chapter 10

RPE-LTP: The full-rate GSM codec

In 1988, the Groupe Special Mobile of the Conference Europeene des Postes et Telecom-munications (CEPT) approved the first generation of a pan-European digital cellular radiosystem operating at a net rate of 13 kbit/s1. Its speech coding algorithm, the RPE-LTP(Regular Pulse Excitation, Long Term Predictor) was a compromise solution of the twobest coders at that stage. The full-rate GSM system started operation in the beginningof 1992 in some European countries and its expansion is expected in a mid-term. Thiscoder, despite not being an ITU-T standard, is relevant for standardization studies whenscenarios involving tandeming conditions between the PSTN and the European cellularsystem need to be studied.

The current version of the STL includes a RPE-LTP implementation based on a freelyavailable implementation originally produced at the Technical Institue of the Universityof Berlin, for a Unix environment. This code has been adapted, corrected to work onseveral platforms, and tested with the recommended test vectors, all properly processed.

Details on the algorithm can be found in several references [41, 42, 43], besides the Rec-ommendation itself [44].

10.1 Description of the 13 kbit/s RPE-LTP algorithm

The RPE-LTP is a frame based coder, encoding 20 ms frames of input data at a time.The encoder converts each 160 sample frame (8 kHz sampling rate, 13 bits uniform PCMformat) into a bitstream frame of 260 bits. The decoder uses the 260 bitstream bits togenerate a frame of 160 reconstructed speech samples.

10.1.1 RPE-LTP Encoder

A simplified block diagram of the RPE-LTP encoder [44] is shown in figure 10.1.

The input speech frame, consisting of 160 uniform 13 bits PCM signal samples, is firstpre-processed to produce an offset-free signal, which is then subjected to a first-order pre-

1The GSM standard developed initially under the responsability of the CEPT was later transferredto the European Standardisation Telecommunications Institute (ETSI), and the acronym GSM had itsmeaning changed to Global System for Mobile Communications. Currently, the GSM specifications arebeing maintained by the Third Generation Partnership Project, 3GPP (www.3gpp.org).

127

www.3gpp.org


Mux

coded as Log−Area Ratios

Reflection Coefficients

(36 bits each 20 ms)

Short term

LPC

(47 bits each 5 ms)

RPE parameters

LTP parameters

(9 bits each 5 ms)

(1) (2)

(4) (5)

(1) Short term residual

(2) Long−term residual (40 samples)

(3) Short−term residual estimate (40 samples)

(4) Reconstructed shirt−term residual (40 samples)

(5) Quantized long term residual (40 samples)

Signal

Input Short term

analysis

RPE grid

selection

analysis

filter

RPE grid

decoding and

Pre−

Processing

analysis

filter and coding

positioning

analysis

LTP

−(3)

Long term

(6)

(6) Bitstream to the decoder

Figure 10.1: Simplified block diagram of the RPE-LTP encoder.

emphasis filter. The 160 samples obtained are then analyzed to determine the coefficientsfor the short-term analysis filter (LPC analysis). Using these coefficients for the filteringof the same 160 samples produce the 160 samples of the short-term residual signal. Thefilter parameters are represented as reflection coefficients which are transformed to log-area ratios (LARs) before transmission.

For the following operations, the speech frame is divided into 4 sub-blocks consisting eachof 40 samples. Before the processing of each sub-block, the parameters of the long-termanalysis filter, the LTP lag and the LTP gain, are estimated and updated in the LTPanalysis block. Estimation and update is performed on the basis of the signal in thecurrent sub-block and a stored sequence of the 120 previously reconstructed short-termresidual samples.

A block of 40 long-term residual signal samples is obtained by subtracting 40 estimates ofthe short-term residual from the short-term residual signal itself. The resulting block isfed to the Regular Pulse Excitation (RPE) analysis which performs the basic compressionfunction.

As a result of the RPE-analysis, the block of 40 input long-term residual samples isrepresented by one of 4 candidate sub-sequences of 13 pulses each. The subsequenceselected is identified by the RPE grid position. The 13 RPE pulses are encoded usingAdaptive Pulse Code Modulation (APCM) with estimation of the sub-block amplitudewhich is transmitted to the decoder as side information. The RPE parameters are alsofed to a local RPE decoding and reconstruction module which produces a block of 40samples of the quantized version of the long-term residual signal. By adding these 40quantized samples of the long-term residual to the previously obtained block of short-term residual signal estimates, a reconstructed version of the current short-term residualsignal is obtained. The block of reconstructed short-term residual signal samples is thenfed to the long-term analysis filter which produces the new block of 40 short-term residualsignal estimates to be used for the next sub-block thereby completing the feedback loop.


decoding and

positioning

RPE grid Short term

filter

synthesis Processing

Post−

Signal

Output

filter

synthesis

Long term

(36 bits each 20 ms)

coded as Log−Area Ratios

Reflection Coefficients

RPE

parameters

(47 bits each 5 ms)

LTP parameters

(9 bits each 5 ms)

From

encoder Dem

ux

Figure 10.2: Simplified block diagram of the RPE-LTP decoder.

10.1.2 RPE-LTP Decoder

The simplified block diagram of the RPE-LTP decoder [44] is shown in figure 10.2.

The decoder includes the same structure as the feed-back loop of the encoder. In error-free transmission, the output of this stage will be the reconstructed short-term residualsamples. These samples are then applied to the short-term synthesis filter followed by thede-emphasis filter resulting in the reconstructed speech signal samples.

10.2 Implementation

This implementation of the RPE-LTP algorithm is composed of several source files. Theinterface routines are in rpeltp.c, with prototypes in rpeltp.h.

Originally written to be a device driver in Unix (known as toast), its interface was adaptedto the specifications of the ITU-T STL, and modified to operate correctly in a variety ofplatforms, like VAX, IBM PC compatibles, and Unix workstations (Sun and HP).

The problem of storing the state variables was solved by defining a structure containing allthe necessary variables, defining a new type called gsm, which is a pointer to a structure.By means of this approach, several streams may be processed in parallel, provided thatone structure is assigned (and that one call to the encoding/decoding routines is done)for each data stream (this can be advantageous for machines with support for parallelprocessing). The RPE-LTP state structure has the following fields (all except L z2 andmp are short, which are long and int, respectively):

dp0 Memory of 280 past samplesz1 DC-offset removal filter memoryL z2 DC-offset removal filter parameter.mp Preemphasisu Eighth-order short term LPC analysis coefficientsLARpp Log Area Ratio arrayj Indexnrp Long-term synthesis parameter


v Ninth order short-term synthesis vectormsr Post-processing parameterverbose Flag used only if compiled with NDEBUG==0fast Enables fast but inaccurate computation. Does not properly

process the test sequences with this mode turned on.

Table 10.1 presents the RPE-LTP encoder output parameters in order of occurence, withparameters defined in [44]. It should be noted that the bitstream file generated by theSTL implementation of the RPE-LTP algorithm uses an unpacked format, as other codecsin the STL. Therefore, each of the 76 parameters indicated in table 10.1 occupy an un-signed, right-adjusted 16-bit word. Unlikely to the G.711 and G.726 algorithms, however,the number of significant bits per bitstream parameter is not the same for all the pa-rameters, as can be seen from the table. An important implication is that the STL biterror insertion routines cannot be applied directly to the bitstream generated by the STLRPE-LTP encoder. This limitation is not a function of the EID module itself, but ofthe serialization and parallelization (S/P) routines serialize * and parallize * imple-mented in the Utility module, which are able only to handle bitstreams that have thesame number of valid bits per sample. Solution to this problem still needs to be imple-mented in the STL. It should be noted however that, since the full-rate GSM channelcoding is not implemented in the STL, bit error insertion directly in the unprotectedRPE-LTP bitstream will generally not be used. Should the user need bit error insertionin the unprotected RPE-LTP bitstream, there are two possible solutions:

• it will be necessary to pack the bits for each parameter in such a way that, as seen bythe S/P routines, each sample in the packed bitstream will have a constant numberof valid bits per bitstream sample. Since there are 260 (4 × 5 × 13) bits for eachframe, possible combinations are packed bitstreams with 65 16-bit words, of whichthe lower 4 bits are meaningful, or with 20 16-bit words, of which the lower 13 bitsare meaningful. The former is preferred, despite the longer files generated.

• the user may modify the demonstration program to generate or accept (dependingon whether it is an encoding or decoding operation) a serial bitstream format, asunderstood by the EID module, instead of a parallel bitstream format.

From the users’ perspective, the encoding function is rpeltp encode, and the decodingfunction is rpeltp decode. Before using these functions, the state variable for eitherthe encoder or the decoder must be initialized by rpeltp init. It should be notedthat encoder and decoder need individual state variables to work properly. After all theprocessing is performed, the memory allocated for the state variables can be freed bycalling rpeltp delete. The following sub-sections describe these four entry functions forthe STL RPE-LTP module.

10.2.1 rpeltp encode

Syntax:

#include "rpeltp.h"

void rpeltp encode (gsm rpe state, short *inp buf, short *rpe frame);

Prototype: rpeltp.h


Table 10.1: RPE-LTP bitstream format for each 20 ms speech frame.Parameter Parameter Number

Number of BitsLAR1 1 6LAR2 2 6LAR3 3 5LAR4 4 5LAR5 5 4LAR6 6 4LAR7 7 3LAR8 8 3

Sub-frame No. 1LTP lag 9 7LTP gain 10 2

RPE grid position 11 2Block amplitude 12 6RPE-pulse no. 1 13 3

. . . . . . . . .RPE-pulse no. 13 25 3



. . . . . . . . .RPE-pulse no. 13 42 3



. . . . . . . . .RPE-pulse no. 13 59 3



. . . . . . . . .RPE-pulse no. 13 76 3


Description:

Simulation of the GSM full-rate RPE-LTP encoder. The 16-bit, left-justified linear-PCMinput array of shortsamples inp buf are processed by the RPE-LTP encoder and theencoded bit-stream is returned in the right-justified array of shortsamples rpe frame,with one sample for each encoded parameter. The input frame has 160 samples and theencoded frame has 76 samples.

The state variables are saved in the structure pointed by rpe state, previously initialized bya call to rpeltp init(). The reset can be stablished by making a call to rpeltp init().

Variables:

rpe state . . . . . . . . . . . . . . A pointer to the state variable structure. All the variableshere are for internal use of the RPE-LTP algorithm and shouldnot be changed by the user. Fields of this structure are de-scribed above.

inp buf . . . . . . . . . . . . . . Is the linear-PCM input sample buffer which must have 160left-justified 16-bit linear-PCM shortsamples. Only the 13MSb are used.

rpe frame . . . . . . . . . . . . . . Is the encoded sample buffer. Each shortsample will containthe encoded parameters as right-justified samples. The actualnumber of significant bits per sample will depend on eachparameter.

Return value: None.

10.2.2 rpeltp decode

Syntax:

#include "rpeltp.h"

void rpeltp decode (gsm rpe state, short *rpe frame, short *out buf);

Prototype: rpeltp.h

Description:

Simulation of the GSM full-rate RPE-LTP decoder. The encoded bit-stream in the inputarray of right-justified shortsamples rpe frame is used to reconstruct a block of the speechsignal using the RPE-LTP decoder. The reconstructed speech block is returned in the16-bit, left-justified linear-PCM output array of shortsamples inp buf. The input framehas 76 samples and the decoded frame has 160 samples.

The state variables are saved in the structure pointed by rpe state, previously initializedby a call to rpeltp init(). The reset can be established by calling rpeltp init().

Variables:

rpe state . . . . . . . . . . . . . . A pointer to the state variable structure. All the variableshere are for internal use of the RPE-LTP algorithm and shouldnot be changed by the user. Fields of this structure are de-scribed above.

rpe frame . . . . . . . . . . . . . . Is the encoded sample buffer, which must have 76 right-justifiedshortsamples. The actual number of bits per sample will de-pend on each parameter.


out buf . . . . . . . . . . . . . . Is the output samples buffer, which will contain 160 left-justified, 13-bit linear-PCM shortsamples. The three LSbsare set to zero.

Return value: None.

10.2.3 rpeltp init

Syntax:

#include "rpeltp.h"

gsm rpeltp init (void);

Prototype: rpeltp.h

Description:

Initializes the state variables for the RPE-LTP encoder or decoder. Combined coder anddecoder operation requires a different state variable for the encoding and the decodeingpart.

Variables: None.

Return value:

A pointer to an initialized state variable structure defined by the type gsm. Returns NULLin case of failure.

10.2.4 rpeltp delete

Syntax:

#include "rpeltp.h"

void rpeltp init (gsm rpe state);

Prototype: rpeltp.h

Description:

Releases memory allocated to a state variable previously initialized by rpeltp init().

Variables:

rpe state . . . . . . . . . . . . . . A pointer to a previously initialized RPE-LTP state variablestructure.

Return value:

None.


The portability test for these routines has been done using the test sequences designedby the GSM for the RPE-LTP (available from ETSI), which were also used to verifythe compliance of the encoding and decoding function to the full-rate GSM voice codecRecommendation [44, Annex C].


This routine has been tested in VAX/VMS with VAX-C and gcc, in the PC with Borland Cv3.0 (16-bit mode) and gcc (32-bit mode). In the Unix environment in a Sun workstationwith cc, acc, and gcc, and in HP with gcc. In all tested cases, 100% of the test sequencespassed when the following symbols were defined at compilation time: SASR, USE FLOAT MUL

and NDEBUG. The symbol FAST must not be defined for perfomance compliant with theGSM 06.10 Recommendation, while USE FLOAT MUL must be defined at compilation time.The symbol NeedFunctionPrototypes must be undefined for pre-ANSI-C compilers (e.g.SunOS cc compiler).

10.4 Example code


One program is provided as demonstration program for the RPE-LTP module, rpedemo.c.

Program rpedemo.c accepts input files in either 16-bit linear PCM format, 16-bit, right-justified A-law format, or 16-bit, right-justified µ-law format for the encoding operation.The output of the decoder can also be in any of these formats, but it will have the sameformat as the encoding operation if encoding and decoding is performed in a single pass(default). If the encoding and decoding operations are performed in separate steps, theformat of the output signal does not need to match the format of the original linear PCMsignal. The encoder output and decoder input are signals in 16-bit, right-justified samples,as described before in Sections 10.2.1 and 10.2.2. Three operations are possible: encodeand decode in a single pass (default), encode-only (option -enc), or decode-only (option-dec).


The following C code gives an example of RPE-LTP coding and decoding using as input13-bit, linear-PCM speech samples, which are encoded and decoded at 13 kbit/s.

#include <stdio.h>#include "ugstdemo.h"#include "rpeltp.h"

#define BLK_LEN 160

int main(argc, argv)int argc;char *argv[];

gsm encoder_state, decoder_state;

char FileIn[180], FileOut[180];short bs_buf[BLK_LEN], inp_buf[BLK_LEN], out_buf[BLK_LEN];FILE *Fi, *Fo;

/* Get parameters for processing */


GET_PAR_S(1, "_Input File: .................. ", FileIn);GET_PAR_S(2, "_Output File: ................. ", FileOut);

/* Initialize state structures */encoder_state = rpeltp_init();decoder_state = rpeltp_init();


/* File processing */reset = 1; /* set reset flag as YES */while (fread(inp_buf, BLK_LEN, sizeof(short), Fi) == BLK_LEN)

/* Encode input linear PCM samples */rpeltp_encode(encoder_state, inp_buf, bs_buf, BLK_LEN);

/* Decode samples */rpeltp_decode(decoder_state, bs_buf, out_buf);

/* Write decoded samples */fwrite(out_buf, BLK_LEN, sizeof(short), Fo);

if (reset)reset = 0; /* set reset flag as NOMORE */

/* Free memory */rpe_delete(decoder_state);rpe_delete(encoder_state);

/* Close input and output files */fclose(Fi);fclose(Fo);return 0;

Chapter 11

Duo-MNRU: The Dual-modeModulated Noise Reference Unit

For evaluation of the quality of a system or equipment, it is important to express thequality measure in a unit suitable for comparison with other reference (or well-known)equipments and systems. A common way of representing these figures is by means of rela-tive units, where the quality is expressed by means of a unique figure, in a unidimensionalscale.

But it is insuficient to be unidimensional; the scale must be inequivocal, with a universalmeaning. As an example, the ACR scale (Absolute Category Rating, [45, Annex B]),which is a scale used for listening opinion tests and has five points termed Excellent,Good, Fair, Poor, and Bad, is inadequate: besides it shows a continuum of quality points,the meaning of the adjectives are far from universal, varying from language to language,and from person to person. Exchange of information on the performance of these systemsand equipments is easier and more consistent with more objective measures. The issue ofhow the MNRU is to be used as a reference system in subjective tests has been studiedin ITU-T Study Group 12, which is described in ITU-T Recommendation P.830 [8] in itsSections 8.2.2 and 11.

The Modulated Noise Reference Unity (MNRU) was introduced as a means to controlleddegradations that are representative of the non-linear distortion introduced by waveformcoding techniques. Initially aiming at evaluating the quality of log-PCM waveform codingsystems, it has been used in the process of generating several ITU-T standards, such asthe ITU-T G.726 (32 kbit/s), G.722, G.728, and G.729.

The concept of such reference unit was published in [46]. The first system aimed at wasthe PCM coding with logarithmic compression (today world-wide available by means ofthe G.711 Recommendation), whose main characteristic is to have a considerably uniformsignal-to-noise ratio (SNR) over a wide range of amplitudes. Moreover, the quantizingnoise is correlated to the signal: if no signal is present, no quantization noise is produced1,and large signals will produce more quantization noise than small ones. Therefore, themain characteristic of this reference unit should output speech corrupted by a speech-correlated noise.

In [46], the speech-correlated noise generation was based on a double-balanced ring mod-

1This is obviously academic, because always there will be idle noise, among others, in the absence ofan input signal.

137


removalDC

k

k

ν( )

ξ( )h(k)

G

G

n(k)

y(k)

N

S

NoiseGaussian

x(k)

Figure 11.1: Block diagram of the “digital” MNRU. The bandwidth of the output filterh(k) is 0–3400 Hz for the narrowband case, and 0–7000 Hz for the wideband case.

ulator, controlled by the input speech signal, which modulates a noise carrier generatedby a noise generator having a relatively uniform energy distribution, there in the range of0–20kHz. This correlated noise is then added to the input signal, with gains applied suchthat a controlled signal-to-noise ratio is obtained in the output, after the 300–3400Hzband-limiting filter.

With the 1996 revision of the MNRU description published in ITU-T RecommendationP.8102, specific guidelines were given for “digital implementations”3, eliminating many ofthe ambiguities possible in earlier descriptions [47], as explained in the STL92 manual[48, Chapter 8]. Figure 11.1 shows a block diagram of the “digital” MNRU. Also, this im-plementation allows for transparent operation on narrowband or wideband speech, hencebeing known as Dual-mode MNRU, or “Duo-MNRU”, for short.


The de-facto reference implementation of the MNRU4 is the same of the original descrip-tion, whose specification can be found in ITU-T Recommendation P.810 [49] (formerlyITU-T Recommendation P.81 [47]). This Recommendation describes two MNRU schemes,one called Narrow-band MNRU, and another, Wideband MNRU. Wideband MNRU isapplicable to systems where wideband speech (70–7000Hz) is expected, whereas Narrow-band MNRU is for telephone bandwidth (300–3400Hz). Both narrowband and widebandMNRUs are implemented in this version of the ITU-T Software Tools Library.

The basic block diagram of the P.810 MNRU is found in figure 11.1. In summary, thereare two paths, one called signal path, another called noise path. In the noise path, gaussiannoise (uniform in a range at least the cutoff frequency of the low-pass filter in the output

2Formerly known as ITU-T Recommendation P.81.3The revised P.81 define a “digital implementation” either as a digital hardware implementation or

as a software implementation of the MNRU.4Developed by the British Telecom and licenced to Malden Electronics.


of the MNRU) is modulated by the incoming signal. The result is then added with theoutput from the signal path. The gains are set such that the gain (in dB) applied in theoutput of the noise path is the signal-to-correlated-noise ratio Q, in the output of theband-pass filter, as calculated in the section to follow.

In analytical terms, the signal corrupted by the modulated noise y(k) is

y(k) = (Gsx(k) + Gnx(k)n(k)) ∗ h(k)

where Gs is the gain of the signal path, Gn is the gain of the noise path, x(k) is the inputsignal, and n(k) is the gaussian noise signal; the symbol ∗ means convolution, and h(k)is the band-pass filter.

If we suppose that the band-pass filter has |H(f)| = 1 in its pass band, and calling Q thesignal-to-noise-ratio (SNR) at its output, we may write:

10Q/10 =σ2

ξ

σ2ν

=E[ξ2(k)]

E[ν2(k)]=

G2sE[x2(k)]

G2nE[x2(k)n2(k)]

But x and n are uncorrelated, and the noise is gaussian with mean 0 and variance 1(N(0,1)):

10Q/10 =(

Gs

Gn

)2 σ2x

σ2xσ

2n

=(

Gs

Gn

)2

orQ = Γs + Γn

Γs = 20 log10(Gs)Γn = −20 log10(Gn)

If we set Γs = 0 (Gs = 1), Q is exactly Γn (or, Gn = 10−Q/20), i.e., the SNR is the gain(in dB) of the noise path and the previous expression may be written as:

y(k) = [x(n) + 10−Q/20x(k)n(k)] ∗ h(k)

or approximately

y(k) = x(k) + 10−Q/20x(k)n(k)

in the passband region of H(f)|.When both Gs and Gn are non-zero, the MNRU is in an operational mode normally calledModulated-noise mode. This is the most common operation mode.

Alternatively, if one consider Gs = 0, the output of the algorithm is only the correlatednoise, at a level Q dB below the input signal. This is Noise-only mode.

If, on the other hand, Gn = 0, the output of the algorithm is the input signal filtered byh(k), with a gain Gs; this is the Signal-only mode.

11.2 Implementation

This implementation of the MNRU algorithm can be found in the module mnru.c, withprototypes in mnru.h. A thorough characterization of this module is presented in [50].The previous version of the ITU-T STL MNRU was applicable to narrowband signals and


evolved from a Fortran implementation which had been used by several laboratories, espe-cially by participants of ETSI’s contest for the second generation of Digital Mobile RadioSystems, and was originally written by experts at CSELT/Italy (sometimes referred asCSELT MNRU), an implementation fully compliant with the narrowband MNRU speci-fication available in the then-in-force P.81 [47].

With the revision of MNRU specification, several changes had to be made to the STL92MNRU:

• The need for an upsampling by a factor of 5 before summation of the modulatednoise to the input speech was eliminated because now for digital implementations,the bandwidth of the multiplicative noise shall have the bandwidth of the inputsignal. In the previous version, the noise bandwidth had to be 20 kHz.

• The output filter for digital implementations shall be a low-pass filter, instead ofthe bandpass filter of the previous version of the MNRU

• The need of an input speech DC-component removal filter was added to the speci-fication.

These changes, especially the elimination of the 5:1 speech data rate conversion, allowedfor the implementation of both the narrowband and the wideband MNRU within the sameC function, when the output filter is adequately designed [50, pp.7–12].

The random number generator (RNG) algorithm was also modified to allow for real-timeimplementations, and the solution adopted was based on Aachen University’s approachused by the Host Laboratory for the ITU-T G.729 Selection Tests.

(8k)Table

removal

k

k

ν( )

ξ( )h(k)

Gn(k)

y(k)

N

x(k)

Noise(8 sums)

( =0.985)α

DC

Figure 11.2: STL MNRU implementation.

The block diagram of the MNRU implemented in the STL is in figure 11.2.

The MNRU works internally on a sample-by-sample basis but for ease of interface withother speech coding functions, access to it is made on a sample block basis. It should benoted however that the filters have memory, as well as do the random number generator,hence state variables are needed. These state variables have been arranged as fields of astructure whose type name is MNRU state. The fields of the structure are:


seed RNG’s seedsignal gain Gain of the signal pathnoise gain Gain of the noise pathvet Array for intermediate datalast xk x(k − 1) used as memory for the DC-removal filterlast yk ξ(k − 1) (see figure 11.2), used as memory for the DC-

removal filterDLY[2][2] Memory of delayed samples for two second-order stages

(first index) for first- and second-order delays (secondindex)

A[2][2] Numerator coefficients for the stage indicated by thefirst index and delay-order inidcated by the second index

B[2][2] Denominator coefficients for the stage indicated by thefirst index and delay-order inidcated by the second index

rnd state State structure for MNRU’s random number generator.Detailed description is found in the section on the ran-dom number generator.

rnd mode Operational mode of the random number generatorclip Number of samples clipped in the noise-insertion process

The values of the fields shall not be altered by the user.

Filters in the MNRU module

The composite frequency response of the narrowband and wideband MNRU filters isshown in figure 11.3. Figure 11.5 shows the contribution of the output low-pass filter for(a) the narrowband, and (b) the wideband cases. Figure 11.4 shows the effect of the inputDC-removal filter for (a) the narrowband and (b) the wideband operation modes of theMNRU. Details on the design of the output low-pass filters are given in [50]. The frequencyresponses have been obtained by exciting the MNRU module with digital sinewaves andcomputing the ratio of input and output signals, in dB.

The input DC-removal filter was implemented using a first-order IIR pole-zero filter de-fined by

Hi(z) =1− z−1

1− αz−1

with α=0.985. Its -3dB point is at 16 Hz for the narrowband case and at 38 Hz for thewideband case.

The output low-pass filter was implemented using a second-order cascade-form IIR filterwith two-sections as illustrated in figure 11.6 and defined by the equation:

Hi(z) = A2∏

k=1

a0k + a1kz−1 + a2kz

−2

1 + b1kz−1 + b2kz−2

IIR filters were chosen because of their low computational complexity when compared toFIR implementations, allowing for a more efficient MNRU implementation.


0 500 1000 1500 2000 2500 3000 3500 4000Frequency [Hz]

-60.0

-55.0

-50.0

-45.0

-40.0

-35.0

-30.0

-25.0

-20.0

-15.0

-10.0

-5.0

0.0

5.0

Am

plitu

de R

espo

nse

[dB

]

Duo-MNRU response for narrow-band signals(with DC removal and output low-pass filters)

(a) Narrowband Duo-MNRU

0 1000 2000 3000 4000 5000 6000 7000 8000Frequency [Hz]

-60.0

-55.0

-50.0

-45.0

-40.0

-35.0

-30.0

-25.0

-20.0

-15.0

-10.0

-5.0

0.0

5.0

Am

plitu

de R

espo

nse

[dB

]

Duo-MNRU response for wideband signals(with DC removal and output low-pass filters)

(b) Wideband Duo-MNRU

Figure 11.3: Total frequency response of the Duo-MNRU filters.


0 5 10 15 20 25 30 35 40 45 50Frequency [Hz]

-30.0

-25.0

-20.0

-15.0

-10.0

-5.0

0.0

5.0

10.0

Am

plitu

de R

espo

nse

[dB

]

Narrow Band MNRU DC Removal Filter(α=0.985)

-3 dB point


0 10 20 30 40 50 60 70 80 90 100Frequency [Hz]

-30.0

-25.0

-20.0

-15.0

-10.0

-5.0

0.0

5.0

10.0

Am

plitu

de R

espo

nse

[dB

]

Wideband MNRU DC Removal Filter(α=0.985)

-3 dB point


Figure 11.4: DC removal filter for the Duo-MNRU.


3300 3350 3400 3450 3500 3550 3600Frequency [Hz]

-20.0

-15.0

-10.0

-5.0

0.0

5.0

Am

plitu

de R

espo

nse

[dB

]

Duo-MNRU for narrow-band signals(output low-pass filter)

P.81 mask

P.81 mask

Response


6600 6700 6800 6900 7000 7100 7200Frequency [Hz]

-20.0

-15.0

-10.0

-5.0

0.0

5.0

Am

plitu

de R

espo

nse

[dB

]

Duo-MNRU response for wideband signals(with DC removal and output low-pass filters)


Figure 11.5: Output low-pass filter for the Duo-MNRU.


z-1

-1z

z-1

z-1

(1)

(1)(1)

(1)

H (1)

H (3)

H (2)

1 i

i H (0)i

i

ik

iy (k)ix (k) i

H (0)i

i

i

1

H (3)

H (2)

H (1)

(Li)

(Li)(Li)

(Li)

Figure 11.6: MNRU Filters Structure.

Random Number Generator for the MNRU module

The random number generator (RNG) used in this implementation was chosen using thefollowing criteria:

• the desired value for Q, Qd, and the measured Q, Qm, should bevery close for a wide range of Q, e.g., Q from 0 to 50 dB.

• it should show a good approximation of a gaussian distribution.This is needed because it is specified in P.810 and more importantlybecause uniform distributions do not allow good matching betweenthe desired and measured values of Q.

• the algorithm needed to be portable (i.e., identical results are gotin different platforms if the same seed is given).

The RNG chosen to be used in the STL92 version of the MNRU was based on Knuth’sSubtractive Method [51],[18, Parts 3.2–3.3], which generates adequate random sequencesbut is computationally intensive and was too complex to be implemented in a real-timedigital hardware MNRU.

The implementation used in the ITU-T G.729 8 kbit/s speech codec selection tests wasbased on a gaussian-noise table lookup, in a manner similar to Malden Electronic’s MNRUimplementation.5 This approach is considerably less computationally intensive than theSTL92 approach, and was used to further reduce the complexity of the MNRU implemen-tation.

5Malden’s MNRU uses a ROM table derived from a Gaussian distribution with 4096 samples uni-formly distributed throughout the table. An address in the table is uniformly sampled four times andaccumulated to form a gaussian noise sample.


After several experiments [50], a table with 8192 gaussian samples was chosen to be used,which is randomly and uniformly accessed 8 times (i.e., an eight-time sample accumu-lation) to be used by the MNRU algorithm. The gaussian table itself is generated inrun-time (rather than being stored in the data memory of the source or object code)using the Monte-Carlo substitution algorithm. The Monte-Carlo algorithm uses a linearcongruential generation (LCG) algorithm defined by

Ij = 69069Ij−1 + 1 (mod 232)

which is converted to numbers in the range [0..1] using the upper 24 bits of the 32-bitunsigned long Ij. I0 is a fixed seed equal to 314159265. This algorithm is used to generatethe necessary initial random samples for the substitution algorithm.

Once the table has been filled, during the normal operation of the MNRU, eight successivesamples are drawn (uniformily) from the table using a different LCG algorithm

Lj = 253Lj−1 + 1 (mod 224)

of which the upper 13 bits are used to generate random numbers uniformly distributedbetween 0 and 8191. L0 is a fixed seed equal to 12345. Both LCGs were implemented asin Aachen University’s MNRU implementation.

Since different ranges are necessary for table filling and for gaussian sample generation, twodifferent LCG random number generators were used to avoid any additional calculationsdue to range convertion and to reduce the software load.

Since the Monte-Carlo RNG is used only at startup time, it is not necessary to keepany state variables for it. The sample-drawing RNG however needs to keep stored thepreviously generated index, which is stored in a structure of type RANDOM state, whoseonly field is (as defined in mnru.h)6:

gauss Index for next random number;

The field in RANDOM state should not be altered by the user in any situation.

The operational modes are defined in mnru.h:

#define RANDOM RUN 0

#define RANDOM RESET 1

The noise modulation routine is MNRU process, which is described next.

11.2.1 MNRU process

Syntax:

#include "mnru.h"

double *MNRU process (char operation, MNRU state *s, float *input, float

*output, long n, long seed, char mode, double Q);

6The use of a structure instead of a single variable in the parent structure (MNRU state) allows forunimplemented features to be easily added in a later version of the algorithm.


Prototype: mnru.h

Description:

Module for addition of modulated noise to a vector of n samples, according to ITU-TRecommendation P.810, for either the narrowband or the wideband model. Dependingon the mode, this function:

• adds modulated noise to the input buffer at a SNR level of Q dB,saving to output buffer (mode==MOD NOISE);

• puts into output only the noise, without addition of the originalsignal (mode==NOISE ONLY);

• produces in the output a filtered-only (no noise added) version ofthe ‘input’ samples (mode==SIGNAL ONLY);

The symbols MOD NOISE, NOISE ONLY, and SIGNAL ONLY are defined in mnru.h.

Although the MNRU algorithm operates on a sample-by-sample basis, MNRU process

handles the input data in blocks of n samples, for better computational efficiency.

The implementation of the MNRU algorithm has three operational states, called MNRU START,MNRU CONTINUE and MNRU STOP. With MNRU START, the state variables are set, as well asmemory is allocated for the intermediate data, and this needs to be the first operationwith the algorithm. Differently from the speech voltmeter module, after the initializa-tion of the state variables, the normal calculations are carried out for the first block ofdata. Once reset, the algorithm changes the operation state to MNRU CONTINUE, and thenext calls to the MNRU algorithm will skip the reset operation. With the last block, itis adivisable to release the memory allocated to the intermediate data. This is accom-plished by calling MNRU process with the operational state set as MNRU STOP. These threeoperational states are defined in mnru.h as follows:

#define MNRU START 1

#define MNRU CONTINUE 0

#define MNRU STOP -1

Variables:

operation . . . . . . . . . . . . . . One of the defined operation status: MNRU START, MNRU STOP,MNRU CONTINUE.

s . . . . . . . . . . . . . . A pointer to a MNRU state structure.input . . . . . . . . . . . . . . Pointer to input float-data vector; must represent 8 or 16 kHz

speech samples.output . . . . . . . . . . . . . . Pointer to output float-data vector; will represent 8 or 16 kHz

speech samples.n . . . . . . . . . . . . . . Long with the number of samples (float) in input.seed . . . . . . . . . . . . . . Initial value for random number generator.mode . . . . . . . . . . . . . . Operation mode: MOD NOISE, SIGNAL ONLY, NOISE ONLY (de-

scription above).Q . . . . . . . . . . . . . . Double defining the desired value for the signal-to-modulated-

noise Q for the output data.

Please note that new values of seed, mode, and Q are considered only when operation is


MNRU START, because they are considered as INITIAL state values. Therefore, when theoperation is not MNRU START, they are ignored.

Return value:

Returns a (double *)NULL if not initialized or if initialization failed; returns a (double

*) to an intermediate data vector if reset was successful or is in the MNRU CONTINUE

(“run”) operation state.


In the development of this module, several steps were taken to assure its compliance toITU-T Recommendation P.810, which included:

• agreement of expected and measured Q values for tones and speech,

• addition of partial files,

• level of output files,

• frequency response of built-in filters.

Additionally to these objective measurements, a subjective test was performed. Theresults of this test are found in [50], where it was concluded that the new MNRU imple-mentation conforms to the P.810 and also behaves more closely to the hardware MNRUthan the previous STL92 version.

Additionally to the conformance tests, the algorithm was tested for portability using a1kHz tone file as input to the algorithm with Q values ranging from 0 to 50 dB in 5 dBsteps, and also for the algorithm in the SIGNAL ONLY mode. The processed test files werethen compared the the reference processed files (generated on a HP workstation). Testand reference files should be identical. The algorithm was found to compile and executecorrectly on MS-DOS under Borland Turbo-C++ 1.0 and under the MS-DOS port of theGNU-C compiler (gcc), on a HP UNIX workstation with cc (non-ANSI) and gcc, on aSun workstation with cc (non-ANSI) and also on VAX VMS and APX computers.

11.4 Example code


One demonstration program is provided for the MNRU module, mnrudemo.c. Irrespec-tive of whether the 16-bit, linear PCM input file is sampled at 8 or 16 kHz, programmnrudemo.c will add the multiplicative noise signal to the input signal at the user-definedQ level and produce as output a 16-bit, linear PCM file. Optionally, the program canproduce a signal-only file (equivalent to a very high Q value) or a noise-only file (thesignal path is disconnected).



The following C code gives an example of a possible use of the Duo-MNRU module. Theinput file speech is added to a multiplicative noise at a SNR defined by parameter Q. Allsamples in the file are processed.

#include <stdio.h>#include <stdlib.h>#include <math.h>#include "ugstdemo.h"#include "mnru.c" /* ... Include MNRU module ... */#include "ugst-utl.c" /* ... Include of utilities ... */

#define BLK_LEN 256main(argc, argv)

int argc;char *argv[];

/* File variables */char FileIn[80], FileOut[80];FILE *Fi, *Fo;MNRU_state state;short Buf[BLK_LEN];float inp[BLK_LEN], out[BLK_LEN];double QdB;long l;char MNRU_mode = MOD_NOISE, operation;

/* Read parameters for processing */GET_PAR_S(1, "_Input File: ................. ", FileIn);GET_PAR_S(2, "_Output File: ................ ", FileOut);GET_PAR_D(3, "_Desired Q: .................. ", QdB);

/* Check for parameter 4 to change MNRU operation mode */if (argc > 4)

MNRU_mode = toupper(argv[5][0]);if (MNRU_mode == ’S’) /* Signal-only mode */

MNRU_mode = SIGNAL_ONLY;else if (MNRU_mode == ’M’) /* Modulated noise, the default mode */

MNRU_mode = MOD_NOISE;else if (MNRU_mode == ’N’) /* Noise-only mode */

MNRU_mode = NOISE_ONLY;else

fprintf(stderr, "Bad mode chosen; use M,N,or S \n"); exit(2);

/* Opening input and output files */Fi = fopen(FileIn, RB);Fo = fopen(FileOut, WB);


/* INSERTION OF MODULATED NOISE ACCORDING TO P.810 (FEB.96) */

/* Set operation as start */operation = MNRU_START;

/* Process for all samples in file */while ((l = fread(Buf, sizeof(short), BLK_LEN, Fi)) != NULL)

/* Convert data from 16-bit short to normalized float */sh2fl_16bit((long) l, Buf, inp, 1);

/* MNRU processing */MNRU_process(operation, &state, inp, out, l, 314159265L, MNRU_mode, QdB);

/* Change operation mode: START --> CONTINUE */if (operation == MNRU_START)

operation = MNRU_CONTINUE;

/* Convert from normalized float to short (hard clip and rounding) */fl2sh_16bit((long) l, out, Buf, 1);

/* Save data to file */fwrite(Buf, sizeof(short), l, Fo);

/* Stop mode: Deallocation of memory, but process 0 samples */operation = MNRU_STOP;MNRU_process (operation, &state, inp, out, 0L, 0L, 0, (double) 0.0);

/* Finalizations */fclose(Fi);fclose(Fo);return(0);

Chapter 12

SVP56: The Speech Voltmeter


The specification for the measurement of the active level of speech signals is given in ITU-T Recommendation P.56 [52], and is commonly referred as speech voltmeter1. Besidesthe description above, there is complementary information in the ITU-T Handbook onTelephony [53], section on ‘Measurement of Speech’.

In summary, the P.56 algorithm takes samples of a signal in the speech bandwidth andcalculates its active speech level. This means that silence and idle noise are not taken intoaccount when calculating the level of the signal. Furthermore, structural pauses (pausesin the range of 250 ms which are inherent to the uterance process) are considered in themeasurements, but grammatical pauses (pauses between phrases or to emphasise words,generally in the range of 300 ms or more) are excluded, because they do not contributeto speech subjective loudness [53].

To decide about the activity or inactivity of a speech segment, the algorithm calculatesan envelope waveform, or short-term mean amplitude, such that pauses shorter that 100ms are not excluded, but pauses longer than 350 ms are2. A signal is considered activewhen its short-term mean level (envelope) exceeds a threshold level (or margin of ) 15.9dB below the prevailing speech voltage3, and also during short gaps between such burstsof activity.

A word of caution must be given here: the above mentioned margin above of 15.9 dB hasbeen optimized for speech with a low level of background noise. This means that in thecase of generation of material for listening subjective tests, once the original files have beenprocessed (already level equalized), especially by processes that add significant amountof noise to the files (e.g. MNRU for low values of Q), the P.56 algorithm shall not beutilized for re-equalization. Since the noise introduced by the processing algorithm will befar above the threshold discussed, the P.56 algorithm will generate wrong measurementsof the active level and speech activity. A practical way to observe whether the P.56 maybe utilized on processed files is to observe the activity factor: if it increases significantlyin relation to the original file’s activity factor, then the use of the P.56 for re-equalizations

1After the British Telecom and Malden Ltd.’s SV6 Speech Voltmeter.2Users will perceive a pause when it lasts more than about 350 ms.3The margin of 15.9 dB has been chosen to be comfortably above the circuit noise, while causing few

false detections or failures to detect, having being determined by subjective experiments.

151


should be discarded.

Another operating assumption for the P.56 Recommendation suggests that the inputsignal be band-limited (300–3400 Hz for telephony band signals and 100–7000 Hz forwideband signals), as given in Table 3 and Figure 2 of P.56.

The speech voltmeter algorithm is expressed in terms of discrete operations. Because ofthis, a minimum sampling frequency must be chosen, and the specification in P.56 givesit as 600 Hz. This is well below Nyquist frequency of the digitized sample’s normally usedfor telephony applications, either 8000 or 16000 Hz, which is explained by the fact thatthe matter of interest here is not signal’s frequency content’s information, but only signalstatistics. This is one of the unspecified details of the P.56 that may cause implementationsto differ.

After considering an input sample xi, the speech voltmeter performs two operations. First,the total energy of the signal is calculated (sq), updating also the number of samples nand signal’s (long-term) mean level s. Second, the envelope (or short-term mean level) qof the signal is extracted using a second-order exponential filtering:

pi = g · pi−1 + (1− g) · |xi|

qi = g · qi−1 + (1− g) · pi

with initial states p0 = q0 = 0 and the quantity g defined as:

g = exp(−1/(f · T ))

for f as the sampling frequency, in Hz, and T , a time constant for smoothing, equal to0.030s (30 ms).

With the envelope calculated, the algorithm calculates the number of times that theenvelope exceeds each of the threshold levels. The thresholds are represented in a vector cof B − 1 positions, where B is the resolution (number of bits) of the samples. The valuesin this vector range from half the maximum possible amplitude down to (or less than)one LSB (Least Significant Bit). In terms of practical implementations, the values of cj

are a power of 2:cj = 2j, j = 0 · · ·B − 2

There are three possible cases4:

• the envelope exceeds the threshold cj: increment the activity counter for thequantization level j, aj, and set the timer vector (or hangover counter) hj to zero.This operation means that the segment is active as far as the level j is concerned,and so the hangover counter must be set to zero, as well as the number of activesamples (aj) incremented.

• the envelope does not exceed the threshold level, but the hangover counterhj is less (or shorter) than I samples: this means that, besides the sample beinginto a pause segment (because the level is below the threshold), it is a structuralpause (because the time spent since the last activity burst is less than 200ms).Therefore, the action here is to increment the activity counter, as well as the hang-over counter for the level j.

4It is interesting to remark that the lower the threshold level, the greater the activity count for thatlevel will be.


• the envelope does not exceed the threshold level and the hangover timeexceeds I samples: this means that the sample is into a pause (because the levelis below the threshold); moreover, it is a grammatical pause (because the time spentsince the last activity burst is more than 200ms). Therefore, no increments are done.

Then, after all the samples of interest have been considered, three quantities have beenaccumulated:

1. total number of samples, n;2. signal energy, sq;3. an activity count aj for each threshold level cj, j = 0 · · ·B − 2.

The active level can be evaluated from these three parameters, as follows. First, thelong-term mean level is calculated:

L = 10 log10(sq/n)− 20 log(r)

and the activity counter and threshold vectors are converted to dB:

Aj = 10 log10(sq/aj)− 20 log(r)

Cj = 20 log10(cj)− 20 log(r)

where r is the 0 dB reference point for the measurements5.

In sequence, the difference between Aj and Cj is calculated for each j. When this differencelyes below the margin M (15.9 dB), then the active level6 A is found by interpolatingbetween this level and level − 1 (i.e., the nearest level k where Ak − Ck > M , whatgives k = − 1), using a bipartition (binary) interpolation algorithm. There are threespecial cases here:

• When = 0, then the active level is zero;• When |A − C − M | ≤ δ (where δ is the the given tolerance, or

degree of accuracy): the active level is A.• When |A−1 − C−1 −M | ≤ δ: the active level is A−1.

The tolerance δ is not specified in P.56, hence being implementation-dependent.

Once the active level is found, the only remaining point is the calculation of the activityfactor,

Activity = 10L−A

or, in percents,

Activity% = 100 · 10L−A

12.2 Implementation

This implementation of the speech voltmeter algorithm can be found in the modulesv-p56.c, with prototypes in sv-p56.h. This version evolved from a preliminary Fortranimplementation provided by Telebras, Brazil, which was used by several laboratories, in

5This is another unspecified detail in P.56. This implementation’s choice is given in next section.6The true active level is defined as the one which exceeds the threshold used for its derivation by a

M = 15.9 dB.


especial by participants of ETSI’s contest for the second generation of Digital MobileRadio Systems.

In Recommendation P.56, there are several undefined issues needed to be resolved forthe implementation of this module. Especially, the rate f used for the averages and thetolerance, or degree of accuracy, δ to be used for the interpolation of the active level haveto be defined. Another undefined parameter is the reference level, or 0 dB reference pointr. The choices of this implementation are shown in the table below:

Speech voltmeter parameters

Parameter Description Valuef sampling rate same rate of the input signal.r dB reference 0 dBov (see Chapter 2).δ tolerance ±0.5 dB (the same of M).

The P.56 algorithm operates on a sample-by-sample basis. However, since most softwareimplementations use blocks (or frames) of samples, the speech voltmeter was designed towork with blocks of samples. Measurements are cummulative, therefore state variables areneeded in this approach. These state variables have been arranged as fields of a structurewhose name is SVP56 state. The fields of the structure are7:

f Sampling frequency, in Hza[15] Activity countc[15] Threshold levelhang[15] Hangover countn Number of samples read since last resets Sum of all samples since last resetsq Squared sum of samples since last resetp Intermediate quantitiesq Envelopemax Max absolute value found since last resetrefdB 0 dB reference point, in [dB]rmsdB RMS value found since last resetmaxP Most positive value since last resetmaxN Most negative value since last resetDClevel Average level since last resetActivityFactor Activity factor since last reset

The user should note that although some fields are of interest to report signal statistics,such as long-term level, extreme values for file, average (or DC) level, etc., these valuesshall not be altered. See section 12.2.3, which describes macros for safe inpection of theparameters of interest.

The algorithm has two operational parts, one that deals with the initialization of the statevariables, and is carried out by the function init speech voltmeter, and the measuringpart (or the algorithm itself), carried out by speech voltmeter. These are presented inthe next two sections.

7All the fields are double, except the float f and the unsigned long a[], hang[], and n.


12.2.1 init speech voltmeter

Syntax:

#include "sv-p56.h"

void init speech voltmeter (SVP56 state *state, double f);

Prototype: sv-p56.h

Description:

init speech voltmeter performs the initialization of the speech voltmeter state variablesin the structure pointed by state to the appropriate initial values. The only value requiredfrom the user is the sampling rate f (in Hz) of the signal that the speech voltmeter issupposed to measure. Note that when measuring new speech material, the state variableshall be re-initialized, otherwise accumulation of previous measurements will happen andwrong measurements will be reported.

Variables:

state . . . . . . . . . . . . . . Is a pointer to a speech voltmeter state variable.f . . . . . . . . . . . . . . Is the sampling rate (in Hz) of the signal to be measured in

the next calls of speech voltmeter. If zero or negative, thesampling rate is initialized to 16000 Hz.

Return value:

None.

12.2.2 speech voltmeter

Syntax:

#include "sv-p56.h"

double speech voltmeter (float *buffer, long smpno, SVP56 state *state);

Prototype: sv-p56.h

Description:

speech voltmeter performs the measurement of the active level of a speech signal ac-cording to ITU-T Recommendation P.56. Other relevant statistics are also available inthe state variable (for details, see section 12.2.3 ahead):

• average level;• maximum and minimum amplitude values;• rms power, in dB;

Variables:

buffer . . . . . . . . . . . . . . Is the input sample float buffer.smpno . . . . . . . . . . . . . . Is the number of samples in buffer.state . . . . . . . . . . . . . . Is a pointer to the state variable buffer. This shall have been

initialized by a previous call to init speech voltmeter.

Return value: Returns the active speech level, in dB relative to dBov, as a double.


12.2.3 Getting state variable fields

Some macros are provided for the inspection of the speech voltmeter statistics:

Syntax:

#include "sv-p56.h"

SVP56 get rms dB(SVP56 state state);

SVP56 get DC level(SVP56 state state);

SVP56 get activity(SVP56 state state);

SVP56 get pos max(SVP56 state state);

SVP56 get neg max(SVP56 state state);

SVP56 get abs max(SVP56 state state);

SVP56 get smpno(SVP56 state state);

Description:

SVP56 get rms dB and SVP56 get DC level return respectively the long-term level (indBov) and the DC level (in the normalized range) calculated for the material, both as adouble.

SVP56 get activity returns the activity factor as a double, in percents (0..100%).

SVP56 get pos max, SVP56 get neg max, and SVP56 get abs max returns respectively themaximum positive, negative and absolute amplitudes found for the input data, as nor-malized double values (range –1.0..+1.0).

SVP56 get smpno returns as a unsigned long the total number of samples.

Variables:

All the macros expect a valid SVP56 state variable structure (not a pointer!).


Compliance tests of this module have been done based on the compliance with otherexisting implementations, especially of the Deutsches Bundespost Telekom ForschungsInstitute. Reported results were found to be within the error margins of the P.56 algo-rithm.

Portability was checked by running the same speech file on a proven platform and on atest platform. Results have to be identical, in especial long-term and active levels, as wellas the activity factor. During the development of this tool, the provided demonstrationprograms (see section 12.4) were used to measure and level-equalize a reference file. Thesetest files are provided in the STL distribution.

This module had portability tested for VAX/VMS with VAX-C and GNU C (gcc) and forMS-DOS with a number of Borland C/C++ compilers (Turbo C v2.0, Turbo-C++ v1.0,Borland C++ v3.1). Portability was also tested in a number of Unix workstations andcompilers: Sun workstation with Sun-OS and Sun-C (cc), acc, and gcc; HP workstationwith HP-UX and gcc.


12.4 Examples


As a part of the speech voltmeter module, two example programs are provided. They arecalled sv56demo.c and actlevel.c.

Both example programs calculate the equalization factor to equalize the active speechlevel of a file ‘NdB’ dBs below the 0 dBov reference using the algorithm described in thischapter. However, only program sv56demo.c carry out the level-equalization of the inputfile, which is saved in an aoutput file. Levels are reported in dBov.

In general, input files are in integer representation, 16-bit words, 2’s complement (i.e.,short data). In UGST convention, this data must be left-adjusted, rather than right-adjusted. Since the speech voltmeter uses float input data, it is necessary to convert fromshort (in the mentioned format) to float; this is carried out by the function sh2fl().In addition, the option to ‘normalize’ the input data to the range -1..+1 is selected. Afterthe equalization factor is found, results are reported on the screen, which varies accordingto the program used and some of the command-line options.

While program actlevel.c stops at this point, program sv56demo.c proceeds callingthe function scale() to carry out the (amplitude) equalization using single (rather thandouble) float precision. After equalization, the samples are converted back to integer(short, right-justified) with the routine fl2sh() using truncation, no zero-padding ofthe least significant bits, left-justification of data, and hard-clipping of data above theoverload point. After that, data is saved to the user-specified file .

12.4.2 Small example

Following is an simplification of the described demonstration programs. It only measuresthe statistics for the input file, without carrying out level equalizations and does notimplement the several command-line options of actlevel.c.

#include <stdio.h>#include <stdlib.h>#include <math.h>#include "ugstdemo.h" /* ... UGST demonstration program defs ... */#include "sv-p56.h" /* ... SV-P56 prototypes & defs ... */#include "ugst-utl.h" /* ... UGST utilities ... */#define BLK_LEN 256


SVP56_state state; /* Speech voltmeter state */char FileIn[180]; /* input file name */FILE *Fi; /* input file pointers */long N=BLK_LEN, l;short bitno, buffer[BLK_LEN];float Buf[BLK_LEN];


double ActiveLeveldB, sf, satur;

/* Reads parameters for processing */GET_PAR_S(1, "_Input File: ........... ", FileIn);

/* Checks parameters 2, and 3 for specification in command line */FIND_PAR_D(2, "_Sampling Frequency: .. ", sf, 16000);FIND_PAR_L(3, "_A/D resolution: ...... ", bitno, 16);

/* Calculate overload point in the non-normalized range */satur = pow ((double)2.0, (double)(bitno - 1));

/* Reset- variables for speech level measurements */init_speech_voltmeter(&state, sf);

/* Opening input file */Fi = fopen(FileIn, RB);

/* Read samples ... */while ((l = fread(buffer, N, sizeof(short), Fi)) > 0)

/* ... Convert samples to float, normalizing to +1..-1 */sh2fl((long) l, buffer, Buf, (long) state.bitno, 1);

/* ... Get the active level */ActiveLeveldB = speech_voltmeter(Buf, (long) l, &state);

/* If the activity factor is 0, don’t report many things */if (SVP56_get_activity(state) == 0)

printf("\n Activity factor is ZERO -- the file is silence or idle noise");else

printf("\n DC level: ................ %7.0f [PCM]",SVP56_get_DC_level(state) * satur);

printf("\n Maximum positive value: .. %7.0f [PCM]",SVP56_get_pos_max(state) * satur);

printf("\n Maximum negative value: .. %7.0f [PCM]",SVP56_get_neg_max(state) * satur);

printf("\n Long-term energy (rms): .. %7.3f [dBov]",SVP56_get_rms_dB(state);

printf("\n Active speech level: ..... %7.3f [dBov]", ActiveLeveldB);printf("\n Activity factor: ......... %7.3f [%%]",

SVP56_get_activity(state));fclose(Fi);

Chapter 13

BASOP: ITU-T Basic Operators

13.1 Overview of basic operator libraries

The fixed-point descriptions of G.723.1 and G.729 are based on 16- and 32-bit arithmeticoperations defined by ETSI in 1993 for the standardisation of the half-rate GSM speechcodec. These operations are also used to define the GSM enhanced full-rate (EFR) andadaptive multi-rate (AMR) speech codecs [1].

The version 2.0 of the ITU-T Basic Operators bears the following additional featurescompared to the version 1.x:

1. New 16-bit and 32-bit operators;

2. New 40-bit operators;

3. New control flow operators;

4. Revised complexity weight of version 1.x basic operators in order to reflect theevolution of processor capabilities.

13.2 Description of the 16-bit and 32-bit basic

operators and associated weights

This chapter describes the different 16-bit and 32-bit basic operators available in the STL,and are organized by complexity (”weights”). The complexity values to be considered (asof the publication of the STL2005) are the ones related to the version 2.0 of the module.When the basic operator had a different complexity value in the previous version of thelibrary (version 1.x), the previous complexity value is indicated for information. Whenthe basic operator did not exist in the previous version of the library (version 1.x), it ishighlighted as follows: “⇒ New in v2.0”.

13.2.1 Variable definitions

The variables used in the operators are signed integers in 2’s complements representation,defined by:

159


v1, v2: 16-bit variablesL v1, L v2, L v3: 32-bit variables

13.2.2 Operators with complexity weight of 1

Arithmetic operators (multiplication excluded)

add(v1, v2)

Performs the addition (v1+v2) with overflow control and saturation; the 16-bit result isset at +32767 when overflow occurs or at -32768 when underflow occurs.

sub(v1, v2)

Performs the subtraction (v1-v2) with overflow control and saturation; the 16-bit resultis set at +32767 when overflow occurs or at -32768 when underflow occurs.

abs s(v1)

Absolute value of v1. If v1 is -32768, returns 32767.

shl(v1, v2)

Arithmetically shift the 16-bit input v1 left v2 positions. Zero fill the v2 LSB of theresult. If v2 is negative, arithmetically shift v1 right by -v2 with sign extension. Saturatethe result in case of underflows or overflows.

shr(v1, v2)

Arithmetically shift the 16-bit input v1 right v2 positions with sign extension. If v2 isnegative, arithemtically shift v1 left by -v2 and zero fill the -v2 LSB of the result:

shr(v1, v2) = shl(v1, -v2)

Saturate the result in case of underflows or overflows.

negate(v1)

Negate v1 with saturation, saturate in the case when input is -32768:

negate(v1) = sub(0, v1)

s max(v1, v2) ⇒ New in v2.0

Compares two 16-bit variables v1 and v2 and returns the maximum value.

s min(v1, v2) ⇒ New in v2.0

Compares two 16-bit variables v1 and v2 and returns the minimum value.

norm s(v1)

Produces the number of left shifts needed to normalize the 16-bit variable v1 for positive


values on the interval with minimum of 16384 and maximum 32767, and for negativevalues on the interval with minimum of -32768 and maximum of -16384; in order tonormalise the result, the following operation must be done:

norm v1 = shl(v1, norm s(v1))

Note: In v1.x, the complexity weight of this operator was 15.

L add(L v1, L v2)

This operator implements 32-bit addition of the two 32-bit variables (L v1+L v2) withoverflow control and saturation; the result is set at +2147483647 when overflow occurs orat -2147483648 when underflow occurs.


L sub(L v1, L v2)

32-bit subtraction of the two 32-bit variables (L v1–L v2) with overflow control and sat-uration; the result is set at +2147483647 when overflow occurs or at -2147483648 whenunderflow occurs.


L abs(L v1)

Absolute value of L v1, with L abs(-2147483648)=2147483647.


L shl(L v1, v2)

Arithmetically shift the 32-bit input L v1 left v2 positions. Zero fill the v2 LSB ofthe result. If v2 is negative, arithmetically shift L v1 right by -v2 with sign extension.Saturate the result in case of underflows or overflows.


L shr(L v1, v2)

Arithmetically shift the 32-bit input L v1 right v2 positions with sign extension. If v2is negative, arithemtically shift L v1 left by -v2 and zero fill the -v2 LSB of the result.Saturate the result in case of underflows or overflows.


L negate(L v1)

Negate the 32-bit L v1 parameter with saturation, saturate in the case where input is-2147483648.


L max(L v1, L v2) ⇒ New in v2.0

Compares two 32-bit variables L v1 and L v2 and returns the maximum value.


L min(L v1, L v2) ⇒ New in v2.0

Compares two 32-bit variables L v1 and L v2 and returns the minimum value.

norm l(L v1)

Produces the number of left shifts needed to normalise the 32-bit variable L v1 for pos-itive values on the interval with minimum of 1073741824 and maximum 2147483647,and for negative values on the interval with minimum of -2147483648 and maximum of-1073741824; in order to normalise the result, the following operation must be done:

L norm v1 = L shl(L v1, norm l(L v1))


Multiplication operators

L mult(v1, v2)

Operator L mult implements the 32-bit result of the multiplication of v1 times v2 withone shift left, i.e.

L mult(v1, v2) = L shl((v1 × v2), 1)

Note that L mult(-32768,-32768) = 2147483647.

L mult0(v1, v2)

Operator L mult0 implements the 32-bit result of the multiplication of v1 times v2 withoutleft shift, i.e.

L mult(v1, v2) = (v1 × v2)

mult(v1, v2)

Performs the multiplication of v1 by v2 and gives a 16-bit result which is scaled, i.e.

mult(v1, v2) = extract l(L shr((v1 times v2),15))

Note that mult(-32768,-32768) = 32767.

mult r(v1, v2)

Same as mult() but with rounding, i.e.

mult r(v1, v2) = extract l(L shr(((v1 × v2)+16384), 15))

and mult r(-32768, -32768) = 32767.


L mac(L v3, v1, v2)

Multiply v1 by v2 and shift the result left by 1. Add the 32-bit result to L v3 withsaturation, return a 32-bit result:


L mac(L v3, v1, v2) = L add(L v3, L mult(v1, v2))

L mac0(L v3, v1, v2)

Multiply v1 by v2 without left shift. Add the 32-bit result to L v3 with saturation,returning a 32-bit result:

L mac(L v3, v1, v2) = L add(L v3, L mult0(v1, v2))

L macNs(L v3, v1, v2)

Multiply v1 by v2 and shift the result left by 1. Add the 32-bit result to L v3 withoutsaturation, return a 32-bit result. Generates carry and overflow values:

L macNs(L v3, v1, v2) = L add c(L v3, L mult(v1, v2))

mac r(L v3, v1, v2)

Multiply v1 by v2 and shift the result left by 1. Add the 32-bit result to L v3 withsaturation. Round the 16 least significant bits of the result into the 16 most significantbits with saturation and shift the result right by 16. Returns a 16-bit result.

mac r(L v3, v1, v2) =

round(L mac(L v3, v1, v2))=

extract h(L add(L add(L v3, L mult(v1, v2)), 32768))


L msu(L v3, v1, v2)

Multiply v1 by v2 and shift the result left by 1. Subtract the 32-bit result from L v3 withsaturation, return a 32-bit result:

L msu(L v3, v1, v2) = L sub(L v3, L mult(v1, v2)).

L msu0(L v3, v1, v2)

Multiply v1 by v2 without left shift. Subtract the 32-bit result from L v3 with saturation,returning a 32-bit result:

L msu(L v3, v1, v2) = L sub(L v3, L mult0(v1, v2)).

L msuNs(L v3, v1, v2)

Multiply v1 by v2 and shift the result left by 1. Subtract the 32-bit result from L v3

without saturation, return a 32-bit result. Generates carry and overflow values:

L msuNs(L v3, v1, v2) = L sub c(L v3, L mult(v1, v2))

msu r(L v3, v1, v2)

Multiply v1 by v2 and shift the result left by 1. Subtract the 32-bit result from L v3

with saturation. Round the 16 least significant bits of the result into the 16 bits withsaturation and shift the result right by 16. Returns a 16-bit result.


msu r(L v3, v1, v2) =

round(L msu(L v3, v1, v2))=

extract h(L add(L sub(L v3, L mult(v1, v2)), 32768))


Logical operators

s and(v1, v2) ⇒ New in v2.0

Performs a bit wise AND between the two 16-bit variables v1 and v2.

s or(v1, v2) ⇒ New in v2.0

Performs a bit wise OR between the two 16-bit variables v1 and v2.

s xor(v1, v2) ⇒ New in v2.0

Performs a bit wise XOR between the two 16-bit variables v1 and v2.

lshl(v1, v2) ⇒ New in v2.0

Logically shifts left the 16-bit variable v1 by v2 positions:

– if v2 is negative, v1 is shifted to the least significant bits by (-v2) positions withinsertion of 0 at the most significant bit.

– if v2 is positive, v1 is shifted to the most significant bits by (v2) positions withoutsaturation control.

lshr(v1, v2) ⇒ New in v2.0

Logically shifts right the 16-bit variable v1 by v2 positions:

– if v2 is positive, v1 is shifted to the least significant bits by (v2) positions withinsertion of 0 at the most significant bit.

– if v2 is negative, v1 is shifted to the most significant bits by (-v2) positionswithout saturation control.

L and(L v1, L v2) ⇒ New in v2.0

Performs a bit wise AND between the two 32-bit variables L v1 and L v2.

L or(L v1, L v2) ⇒ New in v2.0

Performs a bit wise OR between the two 32-bit variables L v1 and L v2.

L xor(L v1, L v2) ⇒ New in v2.0

Performs a bit wise XOR between the two 32-bit variables L v1 and L v2.

L lshl(L v1, v2) ⇒ New in v2.0

Logically shifts left the 32-bit variable L v1 by v2 positions:


– if v2 is negative, L v1 is shifted to the least significant bits by (-v2) positions withinsertion of 0 at the most significant bit.

– if v2 is positive, L v1 is shifted to the most significant bits by (v2) positionswithout saturation control.

L lshr(L v1, v2) ⇒ New in v2.0

Logically shifts right the 32-bit variable L v1 by v2 positions:

– if v2 is positive, L v1 is shifted to the least significant bits by (v2) positions withinsertion of 0 at the most significant bit.

– if v2 is negative, L v1 is shifted to the most significant bits by (-v2) positionswithout saturation control.

Data type conversion operators

extract h(L v1)

Return the 16 MSB of L v1.

extract l(L v1)

Return the 16 LSB of L v1.

round(L v1)

Round the lower 16 bits of the 32-bit input number into the most significant 16 bits withsaturation. Shift the resulting bits right by 16 and return the 16-bit number:

round(L v1) = extract h(L add(L v1, 32768))

L deposit h(v1)

Deposit the 16-bit v1 into the 16 most significant bits of the 32-bit output. The 16 leastsignificant bits of the output are zeroed.


L deposit l(v1)

Deposit the 16-bit v1 into the 16 least significant bits of the 32-bit output. The 16 mostsignificant bits of the output are sign-extended.



L add c(L v1, L v2)

Performs the 32-bit addition with carry. No saturation. Generates carry and overflowvalues. The carry and overflow values are binary variables which can be tested andassigned values.


L sub c(L v1, L v2)

Performs the 32-bit subtraction with carry (borrow). Generates carry (borrow) and over-flow values. No saturation. The carry and overflow values are binary variables which canbe tested and assigned values.


Arithmetic operators

shr r(v1, v2)

Same as shr() but with rounding. Saturate the result in case of underflows or overflows.

if (v2>0) then

if (sub(shl(shr(v1,v2),1), shr(v1,sub(v2,1)))==0)

then shr r(v1, v2) = shr(v1, v2)

else shr r(v1, v2) = add(shr(v1, v2), 1)

else if (v2 ≤ 0)

then shr r(v1, v2) = shr(v1, v2)

shl r(v1, v2)

Same as shl() but with rounding. Saturate the result in case of underflows or overflows:

shl r(v1, v2) = shr r(v1, -v2)

Note: In v1.x, the complexity weight of this operator was 2. Additionally, please notethat in v1.x this operator was called shift r(v1, v2); in the STL2005, bothnames can be used.

L shr r(L v1, v2)

Same as L shr(v1,v2) but with rounding. Saturate the result in case of underflows oroverflows:

if (v2 > 0) then

if (L sub(L shl(L shr(L v1,v2),1), L shr(L v1, sub(v2,1)))) == 0

then L shr r(L v1, v2) = L shr(L v1, v2)

else L shr r(L v1, v2) = L add(L shr(L v1, v2), 1)

if (v2 ≤ 0)

then L shr r(L v1, v2) = L shr(L v1, v2)

L shl r(L v1, v2)

Same as L shl(L v1,v2) but with rounding. Saturate the result in case of underflows oroverflows.

L shl r(L v1, v2) = L shr r(L v1, -v2)

In v1.x, this operator is called L shift r(L v1, v2) ; both names can be used.


i mult(v1, v2)

Multiply two 16-bit words v1 and v2 returning a 16-bit word with overflow control.


Logical Operators

rotl(v1, v2, * v3) ⇒ New in v2.0

Rotates the 16-bit variable v1 by 1 bit to the most significant bits. Bit 0 of v2 is copiedto the least significant bit of the result before it is returned. The most significant bit ofv1 is copied to the bit 0 of v3 variable.

rotr(v1, v2, * v3) ⇒ New in v2.0

Rotates the 16-bit variable v1 by 1 bit to the least significant bits. Bit 0 of v2 is copiedto the most significant bit of the result before it is returned. The least significant bit ofv1 is copied to the bit 0 of v3 variable.

L rotl(L v1, v2, * v3) ⇒ New in v2.0

Rotates the 32-bit variable L v1 by 1 bit to the most significant bits. Bit 0 of v2 is copiedto the least significant bit of the result before it is returned. The most significant bit ofL v1 is copied to the bit 0 of v3 variable.

L rotr(L v1, v2, * v3) ⇒ New in v2.0

Rotates the 32-bit variable L v1 by 1 bit to the least significant bits. Bit 0 of v2 is copiedto the most significant bit of the result before it is returned. The least significant bit ofL v1 is copied to the bit 0 of v3 variable.


L sat(L v1)

The 32-bit variable L v1 is set to 2147483647 if an overflow occurred, or -2147483648 ifan underflow occurred, on the most recent L add c(), L sub c(), L macNs() or L msuNs()

operations. The carry and overflow values are binary variables which can be tested andassigned values.


L mls(L v1, v2)

Performs a multiplication of a 32-bit variable L v1 by a 16-bit variable v2, returning a32-bit value.




div s(v1, v2)

Produces a result which is the fractional integer division of v1 by v2. Values in v1 andv2 must be positive and v2 must be greater than or equal to v1. The result is positive(leading bit equal to 0) and truncated to 16 bits. If v1=v2, then div(v1, v2) = 32767.


div l(L v1, v2)

Produces a result which is the fractional integer division of a positive 32-bit value L v1

by a positive 16-bit value v2. The result is positive (leading bit equal to 0) and truncatedto 16 bits.

13.2.9 Basic operator usage across standards

Table 13.1 contains a survey of the 16-bit and 32-bit basic operators which are used invarious standards. Follows some notes associated to 13.1:

1. abs s(v1) is referred to as abs(v1) in GSM 06.10 (GSM full-rate).

2. shl(v1,v2) is written as v1<<v2 in GSM 06.10.

3. shr(v1,v2) is written as v1>>v2 in GSM 06.10.

4. v2=extract h(L v1) is written as v2 = L v1 in GSM 06.10.

5. negate(v1) is written as –v1 in GSM 06.10.

6. L negate(L v1) is written as –L v1 in GSM 06.10.

7. L shl(L v1,v2) is written as L v1<<v2 in GSM 06.10.

8. L shr(L v1,v2) is written as L v1>>v2 in GSM 06.10.

9. L v2=deposit l(v1) is written as L v2=v1 in GSM 06.10.

10. div s(v1,v2) is written as div(v1,v2) in GSM 06.10.

11. norm l(L v1) is written as norm(L v1) in GSM 06.10.

12. GSM 06.20 uses shift r(v1,v2), which can be implemented as shr r(v1,–v2).

13. GSM 06.20 uses L shift r(L v1,v2), which can be implemented as L shr r(L v1,–v2).

14. div s(v1,v2) is written as divide s(v1,v2) in GSM 06.20.

15. Operator is not part of the original ETSI library.

16. Operator is not part of the original ETSI library but was accepted in the TETRAstandard.


Table 13.1: Use of 32-bit basic operators in G.723.1, G.729 and ETSI GSMspeech coding recommendations.

Operation Weight FR GSM HR GSM EFR GSM AMR GSM G.729 G.723.1 TETRAadd() 1 X X X X X X Xsub() 1 X X X X X X Xabs s() 1 X (1) X X X X X Xshl() 1 X (2) X X X X X Xshr() 1 X (3) X X X X X Xextract h() 1 X X X X X Xextract l() 1 X (4) X X X X X Xmult() 1 X X X X X X XL mult() 1 X X X X X X Xnegate() 1 X (5) X X X X Xround() 1 X X X X X XL mac() 1 X X X X X XL msu() 1 X X X X X XL macNs() 1 X X XL msuNs() 1 X XL add() 1 X X X X X X XL sub() 1 X X X X X X XL negate() 1 X (6) X X X X X XL shl() 1 X (7) X X X X X XL shr() 1 X (8) X X X X X Xmult r() 1 X X X X X X Xmac r() 1 X Xmsu r() 1 X XL deposit h() 1 X X X X X XL deposit l() 1 X (9) X X X X X XL abs() 1 X X X X X Xnorm s() 1 X X X X Xnorm l() 1 X (11) X X X X X XL add c() 2 XL sub c() 2 Xshr r() 3 X (12) X X X XL shr r() 3 X (13) X X X X XL sat() 4 X Xdiv s() 18 X (10) X (14) X X X X Xi mult() 3 X (15)L mls() 5 X (15)div l() 32 X (15)L mult0() 1 X (16)L mac0() 1 X (16)L msu0() 1 X (16)


13.3 Description of the 40-bit basic operators and

associated weights

This section describes the different 40-bit basic operators available in the STL, and areorganized by complexity (”weights”). The complexity values to be considered (as of thepublication of the STL2005) are the ones related to the version 2.0 of the library. Thesebasic operators did not exist in the previous version of the library (version 1.x).

A set of coding guidelines must be followed in order to avoid algorithm complexity miss-evaluation. This chapter describes also these guidelines.

13.3.1 Variable definitions

The variables used in the operators are signed integers in 2’s complements representation,defined by:

v1, v2: 16-bit variablesL v1, L v2, L v3: 32-bit variablesL40 v1, L40 v2, L40 v3: 40-bit variables


Arithmetic operators (multiplication excluded)

L40 add(L40 v1, L40 v2)

Adds the two 40-bit variables L40 v1 and L40 v2 without 40-bit saturation control. Itwill exit execution if it detects a 40-bit overflow.

L40 sub(L40 v1, L40 v2)

Subtracts the two 40-bit variables L40 v2 from L40 v1 without 40-bit saturation control.It will exit execution if it detects a 40-bit overflow.

L40 abs(L40 v1)

Returns the absolute value of the 40-bit variable L40 v1 without 40-bit saturation control.It will exit execution if it detects a 40-bit overflow.

L40 shl(L40 v1, v2)

Arithmetically shifts left the 40-bit variable L40 v1 by v2 positions:

– if v2 is negative, L40 v1 is shifted to the least significant bits by (-v2) positionswith extension of the sign bit.

– if v2 is positive, L40 v1 is shifted to the most significant bits by (v2) positionswithout 40-bit saturation control. It will exit execution if it detects a 40-bitoverflow.


L40 shr(L40 v1, v2)

Arithmetically shifts right the 40-bit variable L40 v1 by v2 positions:

– if v2 is positive, L40 v1 is shifted to the least significant bits by (v2) positions withextension of the sign bit.

– if v2 is negative, L40 v1 is shifted to the most significant bits by (-v2) positionswithout 40-bit saturation control. It will exit execution if it detects a 40-bitoverflow.

L40 negate(L40 v1)

Negates the 40-bit variable L40 v1 without 40-bit saturation control. It will exit execu-tion if it detects a 40-bit overflow.

L40 max(L40 v1, L40 v2)

Compares two 40-bit variables L40 v1 and L40 v2 and returns the maximum value.

L40 min(L40 v1, L40 v2)

Compares two 40-bit variables L40 v1 and L40 v2 and returns the minimum value.

norm L40(L40 v1)

Produces the number of left shifts needed to normalize the 40-bit variable L40 v1 forpositive values on the interval with minimum of 1073741824 and maximum 2147483647,and for negative values on the interval with minimum of -2147483648 and maximum of-1073741824; in order to normalize the result, the following operation must be done:

L40 norm v1 = L40 shl( L40 v1, norm L40( L40 v1))

Multiplication operators

L40 mult(v1, v2)

Multiplies the 2 signed 16-bit variables v1 and v2 without 40-bit saturation control. Itwill exit execution if it detects a 40-bit overflow. The operation is performed in fractionalmode:

– v1 and v2 are supposed to be in 1Q15 format.

– The result is produced in 9Q31 format.

L40 mac(L40 v3, v1, v2)

Equivalent to: L40 add( L40 v1, L40 mult( v2, v3))

L40 msu(L40 v3, v1, v2)

Equivalent to: L40 sub( L40 v1, L40 mult( v2, v3))


Logical operators

L40 lshl(L40 v1, v2)

Logically shifts left the 40-bit variable L40 v1 by v2 positions:

– if v2 is negative, L40 v1 is shifted to the least significant bits by (-v2) positionswith insertion of 0 at the most significant bit.

– if v2 is positive, L40 v1 is shifted to the most significant bits by (v2) positionswithout saturation control.

L40 lshr(L40 v1, v2)

Logically shifts right the 40-bit variable L40 v1 by v2 positions:

– if v2 is positive, L40 v1 is shifted to the least significant bits by (v2) positions withinsertion of 0 at the most significant bit.

– if v2 is negative, L40 v1 is shifted to the most significant bits by (-v2) positionswithout saturation control.

Data type conversion operators

Extract40 H(L40 v1)

Returns the bits [31..16] of L40 v1.

Extract40 L(L40 v1)


round40(L40 v1)

Equivalent to:extract h( L saturate40( L40 round( L40 v1)))

L Extract40(L40 v1)


L saturate40(L40 v1)

If L40 v1 is greater than 2147483647, the operator returns 2147483647.If L40 v1 is lower than -2147483648, the operator returns -2147483648.Otherwise, it is equivalent to L Extract40(L40 v1).

L40 deposit h(v1)

Deposits the 16-bit variable v1 in the bits [31..16] of the return value: the return valuebits [15..0] are set to 0 and the bits [39..32] sign extend v1 sign bit.


L40 deposit l(v1)

Deposits the 16-bit variable v1 in the bits [15..0] of the return value: the return value bits[39..16] sign extend v1 sign bit.

L40 deposit32(L v1)

Deposits the 32-bit variable L v1 in the bits [31..0] of the return value: the return valuebits [39..32] sign extend L v1 sign bit.

L40 round(L40 v1)

Performs a rounding to the infinite on the 40-bit variable L40 v1. 32768 is added toL40 v1 without 40-bit saturation control. It will exit execution if it detects a 40-bitoverflow. The end-result 16 LSBits are cleared to 0.


mac r40(L40 v1, v2, v3)

Equivalent to:round40( L40 mac( L40 v1, v2, v3))

msu r40(L40 v1, v2, v3)

Equivalent to:round40( L40 msu( L40 v1, v2, v3))

Mpy 32 16 ss(L v1, v2, *L v3 h, *v3 l)

Multiplies the 2 signed values L v1 (32-bit) and v2 (16-bit) with saturation controlon 48-bit. The operation is performed in fractional mode:When L v1 is in 1Q31 format, and v2 is in 1Q15 format, the result is produced in 1Q47format: L v3 h bears the 32 most significant bits while v3 l bears the 16 least significantbits.


L40 shr r(L40 v1, v2)

Arithmetically shifts the 40-bit variable L40 v1 by v2 positions to the least significantbits and rounds the result. It is equivalent to L40 shr( L40 v1, v2) except that if v2 ispositive and the last shifted out bit is 1, then the shifted result is increment by 1 without40-bit saturation control. It will exit execution if it detects a 40-bit overflow.


L40 shl r(L40 v1, v2)

Arithmetically shifts the 40-bit variable L40 v1 by v2 positions to the most significantbits and rounds the result. It is equivalent to L40 shl( L40 var1, v2) except if v2 isnegative. In this case, it does the same as L40 shr r( L40 v1, (-v2)).

L40 set(L40 v1)

Assigns a 40-bit constant to the returned 40-bit variable.


Mpy 32 32 ss(L v1, L v2, *L v3 h, *L v3 l)

Multiplies the two signed 32-bit values L v1 and L v2 with saturation control on 64-bit.The operation is performed in fractional mode: when L v1 and L v2 are in 1Q31 format,the result is produced in 1Q63 format; L v3 h bears the 32 most significant bits whileL v3 l bears the 32 least significant bits.

13.3.6 Coding Guidelines

The following recommendations must be followed in the usage of the 40-bit operators:

1. Only 40-bit variables local to functions can be declared. Declaration of arrays andstructures containing 40-bit elements must not be done.

2. 40-bit basic operators and 16/32-bit basic operators must not be mixed within thesame loop initialized with a FOR(), DO or WHILE() control basic operator.

When nested loop software structure is implemented, this recommendation appliesto the most inner loops. This enables to have, for instance, an outer loop containing2 inner loops, with the 1st inner loop using 40-bit basic operators and the 2ndinner loop using 16/32-bit basic operators. However, whenever possible, even such2 level loop structure configuration should only use either 40-bit basic operators or16/32-bit basic operators.

Current version (2.0) of the operator implementation does not evaluate the complex-ity associated to the mixing of 40-bit and 16/32-bit operators. Subsequent versionsmay do so.

13.4 Description of the control basic operators and

associated weights

This chapter describes the different control basic operators available in the STL andtheir associated complexity weights. The complexity values to be considered (as of thepublication of the STL2005) are the ones related to the version 2.0 of the library. Thesebasic operators did not exist in the previous version of the library (version 1.x).


A set of coding guidelines must be followed in order to avoid algorithm complexitymiss-evaluation. This chapter describes also these guidelines.

13.4.1 Operators and complexity weights

Nine macros are defined to enable the evaluation of the complexity associated to controlinstructions that are frequently used in C.

o The IF(expression) and ELSE macros evaluate the cost of the C statement:if(expression) ...[[ else if(expression2)...] else ...]

o The SWITCH(expression) macro evaluates the cost of the C statement:switch(expression)...

o The WHILE(expression) macro evaluates the cost of the C statement:while(expression) ...

o The FOR(expr1; expr2; expr3) macro evaluates the cost of the C statement:for( expr1; expr2; expr3) ...

o The DO and WHILE(expression) macros evaluates the cost of the C statement:do ... while(expression)

o The CONTINUE macro evaluates the cost of the C statement:while(expression)

...continue;...

orfor( expr1; expr2; expr3)

...continue;...

o The BREAK macro evaluates the cost of the C statement:while(expression)

...break;...

orfor( expr1; expr2; expr3)

...break;...


orswitch(...)

...break;...

o The GOTO macro evaluates the cost of the C statement: goto label;

Table 13.2 summarizes the control basic operators and their associated complexity.

13.4.2 Coding guidelines

When to use IF() instead of if()?

The IF() macro must be used instead of the classical C statement if(), wherever:

o There is an else or else if statement,

o There is strictly more than one basic operator to condition,

o There is at least a function call to condition.

o There is a control basic operator to condition.

Below example ...if( x == 0)

z = add(z, sub(y, x));if( z == 0)

Decode();something();

... must be written:IF( x == 0)

z = add(z, sub(y, x));IF( z == 0)

Decode();something();

While below code must stay untouched since only one basic operator is conditioned.if( x == 0)

z = add(z, x);something();

When to use FOR() and WHILE() macros?

The FOR() and WHILE() macros must be used to differentiate loops which can be handledby a h/w loop controller from complex loops which need to be controlled by additionalcontrol software.


Table 13.2: Control basic operators and associated complexity.

Complexity Weight Basic Operator Description0 DO... while(expression) The macro DO must be used instead of the

’do’ C statement.3 FOR(expr1; expr2; expr3) ... The macro FOR must be used instead of the

’for’ C statement. The complexity is inde-pendent of the number of loop iterations thatare performed.

0 if(expression)one and only one basic operator(control operators excluded)

The macro IF must not be used whenthe ’if’ structure does not have any ’else if’nor ’else’ statement and it conditions only onebasic operator (control operators excluded)

.

4 IF(expression) ... The macro IF must be used instead of the’if’ C statement in every other case: whenthere is an ’else’ or ’else if’ statement, or whenthe ’if’ conditions several basic operators, orwhen the ’if’ conditions a function call or whenthe ’if’ conditions a control operator.

4 if(expression) ... [[ELSE if(expression2)...]ELSE ...]

The macro ELSE must be used instead of the’else’ C statement.

8 SWITCH(expression) ... The macro SWITCH must be used instead ofthe ’switch’ C statement.

4 WHILE(expression) ... The macro WHILE must be used instead ofthe ’while’ C statement.The complexity is proportional to the num-ber of loop iterations that are performed.

4 while(expression) ... CONTINUE; ...orfor(expr1; expr2; expr3) ...CONTINUE;...

The macro CONTINUE must be used insteadof the ’continue’ C statement.

4 while(expression) ... BREAK; ...orfor(expr1; expr2; expr3) ...BREAK;...orswitch(var) ...BREAK;...

The macro BREAK must be used instead ofthe ’break’ C statement.

4 GOTO The macro GOTO must be used insteadof the ’goto’ C statement.


o Follows an example of a simple h/w loop that must be designed with the FOR()macro. It will iterate C-statement E0 to E20 a number of times known at loopentry (and at least once). Therefore, for such loops, there is no complexity associ-ated to the computation of the decision to loop back or not:

/* var1 > 0 is ensured */FOR( n = 0; n < var1; n++)

E0;/* never do anything that impacts var1 nor n value */E20;

o Follows an example of a complex s/w loop that must be designed with theWHILE(). It will iterate C-statement E0 to E20 a number of times undefinedat loop entry (eventually 0 times). Indeed, at the end of one loop iteration, thedecision to loop back depends on the processing done within the elapsed iteration.

/* do not need to ensure n < var1 at loop entry */WHILE(n < var1)

E0;/* can do anything that impacts var1 or n value */E20;

ANSI-C defines for() structures with while() structures, but by differencing the FOR()and WHILE() macro usage, a better complexity evaluation of the loop controlling ismade.

o A loop defined with FOR() macro:

- Only counts the initial set-up of the h/w loop controller with a complexityweight of 3.

- Must iterate at least once.

- Has a complexity independent of the number of iterations that are per-formed.

o A Loop defined with WHILE() macro:

- Counts, at every single iteration which is executed, the complexityassociated to the computation of the decision to loop back or not.

- Can be executed 0 times.

- Has a complexity proportional (by a factor of 4) to the number of iterationsthat are performed.

When to use DO and WHILE() macros?

It is important to modify below C code:do

x = sub(x, y) while( x < 0);


... into following one:DO

x = sub(x, y) WHILE( x < 0);

The following code is also possible but, although the associated complexity computationwill be identical, it can generate parsing errors by some source code editors which performon-the-fly syntax checking.

do x = sub(x, y)

WHILE(x < 0);

Testing an expression equality

if(expression) ... and while(expression) ... C statements.

All arithmetic tests on data must be presented as a comparison to zero. To performcomparison between two variables (or a variable and a non-zero constant), a subtraction(sub or L sub or L40 sub) must be performed first.

For example, below examples leads to an under evaluation of the complexity:if( a > 3 ) while( a != 5) ...

While, below examples leads to a correct evaluation of the complexity:

if( sub(a,3) > 0) while( sub( a, 5) != 0) ...

If multiple condition need to be evaluated and merged, one test() operator must be usedfor each additional test to be done.

Example 1:

The following code ...

if ( (a > b) && (c > d)) ... must be modified to:

test();if ( (sub( a, b) > 0) && (sub( c, d) > 0))

Example 2:

The following code ...if ( (a > b)&& (c > d)||(e > f))

... must be modified to:test();test();if ( (sub( a, b) > 0)&& ( sub( c, d) > 0)|| (sub( e, f) > 0))


(condition) ? (statement1) : (statement2)

The ternary operator “? :” must not be used since it does not enable the evaluation ofthe associated complexity.

Therefore, instead of writing:(condition) ? (statement1) : (statement2)

One must write:IF(condition)

statement1;ELSE

statement2;

Whenever it is possible to avoid the else clause, one should write:statement2;IF(condition)

statement1;

And whenever statement1 is one and only one basic operator (control operatorexcluded), one can write:

statement2;if(condition)

one and only one basic operator;

for(expresion1; expression2; expression3)

A ”for” C statement must be limited to initializing, testing and incrementing the loopcounter. The following C code statement is an example of incorrect usage:

for(i=0, j=0; i<N & w>0 ; i++, j+=3)

It must be replaced by:j=0;for(i=0; i<N ; i++)

j = add(j,3);if(w > 0)

break;

Actually, in order to respect the other recommendations, it must be replaced by:j=0;FOR(i=0; i<N ; i++)

j = add(j,3);if(w > 0)

BREAK;


13.5 Complexity associated with data moves and other

operations

13.5.1 Data moves

Each data move between two 16-bit variables (with move16() operator) has a complexityweight of 1 and each data move between two 32-bit variables (with move32() operator)has a complexity weight of 2.

1. A 16-bit variable cannot be directly moved to a 32-bit or 40-bit variable.2. A 32-bit variable cannot be directly moved to a 16-bit or 40-bit variable.3. A 40-bit variable cannot be directly moved to a 16-bit or 32-bit variable.

For above 3 types of moves, functions such as the following ones must be used:

round() round40() L saturate40()extract h() Extract40 H() L Extract40()extract l() Extract40 L() L40 deposit32()

L deposit h() L40 deposit h()L deposit l() L40 deposit l()

There will be no extra weighting for data move when using above functions: the weightingof the data move is already included in the weighting of these functions.

Data moves are only counted in the following cases:1. A data move from a constant to a variable;2. A data move from a variable to a variable;3. A data move of the result of a basic operation to an array variable;4. When an arithmetic test is performed on an array variable.

13.5.2 Other operations

Address computation must be excluded from the complexity evaluation. However, whenextremely complex address computations are done, these address computations should beresolved using the basic operations, in order to account for the associated complexity.

Chapter 14

REVERB: Reverberation tool

14.1 Introduction

In some hands-free applications (videoconference for example), the received sound is com-posed of direct sound from a speaker and its reverberated components. This reverberationeffect corresponds to the modification of the speech signal by the acoustic response of theenclosure. The room effect is usually modeled [54] as a finite impulse response that can bemeasured between a specific source and the position of the receiver. Thus, it is possible tosimulate a given room by convolving its measured impulse response with anechoic signals,which is the goal of this tool.


14.2.1 Algorithm

Many approaches are available to add reverberation to a signal. The most realistic ofthem is to measure a real room impulse response and to convolve anechoic signals withit, which is used in the STL. This is the principle of this tool.

The reverberated signal is computed as

srev(k) =N−1∑l=0

IR(l) · s(k − l),

where s(k) is the original signal at time index k, srev the reverberated signal, IR theimpulse response of a room, and N filter coefficients in IR.

The power level of the obtained reverberated signal depends of the experimental conditionsof the impulse response measure. As a consequence, the processed signal can be attenuatedor amplified. In order to compare reverberated sounds, the user can specify an alignmentfactor α which will scale the reverberated sound. This factor can be determined with theSV56 speech voltmeter.

The aligned reverberated signal is computed as

s′rev(k) = srev(k) · α,

where s′rev and α are the aligned reverberated signal, and the scaling factor, respectively.

183


14.2.2 Impulse response measures

Three impulse responses of typical meeting rooms have been measured and are providedwith this tool:

- File visio.IR: sound capture at 100cm distance in a small video-conference room,

- File meeting50.IR: sound capture at 50cm distance in a meeting room,

- File meeting100.IR: sound capture at 100cm distance in the same meeting room.

All of these impulse responses are sampled at 32kHz. Pictures of the two rooms consideredare shown in figure 14.1.

(a) Video-conference room

(b) Meeting room

Figure 14.1: Picture of the rooms whose impulse responses are available in the STL.


Table 14.1: Room characteristics.

Length (m) Width (m) Height (m) Volume (m3)Video-conference room 4.80 4.45 2.50 53.40Meeting room 8.55 5.30 2.70 122.35

Table 14.2: Octave-band reverberation time.

Reverberation time (ms)Octave band 125 Hz 250 Hz 500 Hz 1 kHz 2 kHz 4 kHzVideo-conference room 600 450 360 295 280 250Meeting room 671 600 518 490 466 440

The geometry characteristics are given in the table 14.1. These rooms are acousticallytreated in order to limit the reverberation (filled carpet, acoustically absorbent wall andceiling). Note that the reverberation reduces the intelligibility of recorded speech anddegrades the performance of acoustic echo canceller in case of hands-free communications.

To give more information concerning the acoustical behavior, the octave-band reverbera-tion time has been computed for frequencies below 8 kHz. The values represented in table14.2 are estimated by the backward integration method applied in each octave-band ofthe measured impulse response.

14.2.3 Impulse response file format

The Impulse Response (IR) measures are stored into ”.IR” files. Each sample of the IRis written in the IEEE floating-point format (32-bit IEEE float (0.24)). Attention mustbe paid on the concordance between the sampling frequency of input data and the one ofthe impulse response measure.

14.3 Implementation

14.3.1 shift

Syntax:

#include "reverb-lib.h"

void shift (short* buff, long N);

Prototype: reverb-lib.h

Description:

This routine replaces the first N-1 samples of a buffer by its last N-1 samples. It is usefulfor the block-based convolution, where N is the length of the blocks.


Variables:

buff . . . . . . . . . . . . . . buffer (input/output);N . . . . . . . . . . . . . . length of each block (input);

14.3.2 conv

Syntax:

#include "reverb-lib.h"

void conv (float* IR, short* buffIn, short* buffRvb, float

alignFact, long N, long L);

Prototype: reverb-lib.h

Description:

This function convolves the input buffer buffIn with an impulse response IR and storesthe processed data into the output buffer buffRvb. The alignment factor (multiplicativefactor) alignFact is used to align the energy of the input file with another file.

Variables:

IR . . . . . . . . . . . . . . impulse response buffer ;buffIn . . . . . . . . . . . . . . input buffer;buffRvb . . . . . . . . . . . . . . convolved data;alignFact . . . . . . . . . . . . . . alignment factor;N . . . . . . . . . . . . . . length of the impulse response buffer;L . . . . . . . . . . . . . . length of the input buffer to process;

14.3.3 Tests and portability

Compiled and tested on a PC (Windows) platform with MS Visual C++ 6.0.

14.4 Example code

The demonstration program uses a room impulse response and a sound file as input toproduce a reverberated sound file as output. The input sound is convolved with theroom impulse response to produce the reverberated sound. The program can be found inreverb.c.

Chapter 15

TRUNCATE: Bitstream truncationtool

15.1 Introduction

A scalable codec is a highly flexible coding technique that is characterized by a multi layerbitstream:

• The core layer provides the minimum quality. The decoder needs this layers to work.

• Upper layers enable improving quality by increasing bit rate till a maximum value.

The main feature of a scalable codec lies in bit rate flexibility. The bitrate can be adjustedbetween minimal and maximal values by any component in the communication chain. Forinstance to cope with network congestion it can be adjusted on a frame by frame basis.

The bit rate modification is very simple as it consists of cutting the bitstream at the rightrate, i.e. stripping bits. Apart from this straightforward truncation, there is no signalprocessing.

To simulate this functionality a bitstream truncation tool (truncate) is proposed. G.192bitstreams (with or without sync header), G.192 byte-oriented bitstreams (with or withoutsync header) and binary (compact) bitstreams can be processed with this tool.


This tool truncates the bitstream at the desired bit rate (see figure figure 15.1). For eachframe of an input bitstream, this tool performs the following operations :

1. Read the synchronization word and copy it to the output bitstream;

2. write the new frame length word equal to the number N2 of bits to copy to theoutput bitstream (N2 depends on the desired bit rate);

3. read the N1 words of the input bitstream and copy the first N2 16-bit words repre-senting the first N2 bits of the input to the output bitstream.

187


Figure 15.1: Bitstream truncation principle.

For example, one frame of the input bitstream will comprise 640 bits for a 32 kbit/s codecoperating a frame size of 20 ms. To truncate the 32 kbit/s input bitstream to 14 kbit/s,the output bitstream frame length will be set to 280 for each frame. Only the first 280words of the 640 words of the input bitstream will be copied in the output bitstream whilethe last 360 words will be discarded.

15.3 Implementation

15.3.1 trunc

Syntax:

#include "trunc-lib.h"

void trunc (short syncWord, short outFrameLgth, short* inpFrame,

short* outFrame);

Prototype: trunc-lib.h

Description:

This routine copies the syncWord and the first outFrameLgth words of the input frameinpFrame to the output frame outFrame.

Variables:

syncWord . . . . . . . . . . . . . . synchronization word to write to the output frame;outFrameLgth. . . . . . . . . . . . . . length of the output frame;


inpFrame . . . . . . . . . . . . . . input frame to truncate;outFrame . . . . . . . . . . . . . . output frame;



15.4 Example code

A demonstration program, truncate.c illustrates the use of this module to truncate abitstream to the desired bitrate.

Chapter 16

FREQRESP: Frequency responsemeasurement tool

16.1 Introduction

In order to measure effective codec bandwith, a frequency response measurement tool wascreated for the STL2005.


An input signal is encoded and decoded by the Codec under Test. The periodogrammethod [55] is then used to compute the average amplitude spectrum difference betweena reference signal (e.g. the input file to the speech codec) and a test signal (e.g. thespeech signal after encoding and decoding by a codec).The input and output signals are treated on a frame by frame basis, among a frame lengthof 2048 samples. A Hanning window of length 2048 samples is applied to each input andoutput frame. The resulting windowed signals are transformed to the frequency domainusing a 2048-point Fast Fourier transform. The amplitude spectra are then computed andaveraged. This tool outputs the average amplitude spectrum in ASCII and also producesa bitmap file.SG 12 has recommended as input signal the P50 test signals (p50 m.16p for male voices,and p50 f.16p for female voices) which are representative of speech signals.

16.2.1 Discrete Fourier Transform (DFT)

The spectrum is computed using the Discrete Fourier Transform (for real signals). Thisis performed as follows :

X(f) =NFFT−1∑

k=0

x(k) · cos (2πfk)− j ·NFFT−1∑

k=0

x(k) · sin (2πfk)

where NFFT is the number of DFT coefficients.

191


16.2.2 Hanning window generation (DFT)

The hanning window of length n is defined as :

hanning(k) = 0.5 · [1− cos(2π · k + 1

n + 1)] (0 ≤ k ≤ n− 1)

16.3 Implementation

16.3.1 rdft

Syntax:

#include "fft.h"

void rdft (int NFFT, float* x1, float* x2, float* y2);

Prototype: fft.h

Description:

This routine computes the positive part of the spectrum, using Real Discrete FourierTransform.

Variables:

NFFT . . . . . . . . . . . . . . number of coefficients of the fourier transform;x1 . . . . . . . . . . . . . . input real signal;x2 . . . . . . . . . . . . . . output real part of the Fourier Transform;y2 . . . . . . . . . . . . . . output imaginary part of the Fourier Transform;

16.3.2 genHanning

Syntax:

#include "fft.h"

void genHanning (int n, float* hanning);

Prototype: fft.h

Description:

This routine generates a hanning window.

Variables:

n . . . . . . . . . . . . . . number of coefficients of the hanning window;hanning . . . . . . . . . . . . . . buffer containing the coefficients of the hanning window;

16.3.3 powSpect

Syntax:

#include "fft.h"

void powSpect (float* real, float* imag, float* pws, int n);

Prototype: fft.h


Description:

This routine computes the power spectrum of the DFT of a signal.

Variables:

real . . . . . . . . . . . . . . input buffer containing the real part of the DFT;imag . . . . . . . . . . . . . . input buffer containing the imaginary part of the DFT;pws . . . . . . . . . . . . . . output buffer containing the power spectrum;n . . . . . . . . . . . . . . length of the input buffers;



16.4 Example code

A demonstration program, freqresp.c illustrates the use of this module to compute theaverage power spectrum of two signals (input and output of the codec).

Chapter 17

UTILITIES: UGST utilities

This module does not relate to any ITU-T Recommendation, but implements severalgeneral-purpose routines, that are needed when using other STL modules.

In the process of implementing the STL modules, it was found that the interfacing betweendata representations (float and short; serial and parallel) could present problems.Hence, algorithms implementation these functions have been made available in the ITU-TSTL. Additionally, a scaling routine for application of gain and loss to speech samples isincluded.

17.1 Some definitions

Some functions in this module convert between a serial format and a parallel format. Theparallel format is defined to be a representation in which all the bits in a computer wordhave an information content, as in a multi-level representation of data. Speech samplesin a computer file are a typical example of a parallel representation. A serial format isdefined as the representation of the data where each computer word refer to a single bit ofinformation. An example would be the sequence of bits sent in a communication channelrefering to an encoded digital signal. A serial bitstream, in the context of the ITU-T STL,refers to a multi-level representation of information bits in which each of the “hard” bits‘0’ or ‘1’ are mapped respectively to the so-called softbits 0x007F and 0x0081, to whichan error probability is associated. These softbits are stored in 16-bit right-justified words.In addition, if the bitstream is compliant to the bitstream signal representation in AnnexB of ITU-T Recommendation G.192, the serial bitstream “payload” described above willbe preceed by a synchronization header. A synchronization header is composed by asynchronization word followed by a frame length word. Synchronization words are wordsin the bitstream in the range 0x6B21 to 0x6B2F. A synchronization word equal to 0x6B20

indicates a frame loss. The frame length word is a two-complement word representing thenumber of softbits in the payload. Therefore, the frame length word does not account forthe synchronization header length (which equals two, by definition). Typically (as in theEID module), encoded signals are represented using the bitstreams with a synchronizationheader, while error patterns are represented without a synchronization header.

195


17.2 Implementation

The functions implemented in this module are:

• scale: . . . . . . . . . . . . . . . for level change of a float data stream;• sh2fl*: . . . . . . . . . . . . . . for conversion from short to float;• fl2sh: . . . . . . . . . . . . . . . for conversion from float to short;• serialize *: . . . . . . . . for conversion from parallel to serial data representation;• parallelize *: . . . . . .for conversion from serial to parallel data representation;

Following you find a summary of calls to these functions.

17.2.1 scale

Syntax:

#include "ugst-utl.h"

long scale (float *buffer,long smpno,double factor);

Prototype: ugst-utl.h

Description:

Gain/loss insertion algorithm that scales the input buffer data by a given factor. If thefactor is greater than 1.0, it means a gain; if less than 1.0, a loss. The basic algorithm is:

y(k) = x(k) · factor

Please note that:

• the scaled data is put into the same location of the original data, in orderto save memory space, thus overwriting original samples;

• input data buffer is an array of floats;• scaling precision is single (rather than double precision).

Variables:

buffer . . . . . . . . . . . . . . Float data vector to be scaled.smpno . . . . . . . . . . . . . . Number of samples in buffer.factor . . . . . . . . . . . . . . The floatscaling factor.

Return value:

Return the number of scaled samples.

17.2.2 sh2fl

Syntax:


void sh2fl (long n, short *ix, float *y, long resolution, char norm);


Description:

Common conversion routine. The conversion routine expects the fixed point data to be inthe range between –32768..32767. Conversion to float is done by taking into account only


the most significant bits (i.e., input samples shall be left-justified), normalizing afterwardsto the range –1..+1, if norm is 1.

In order to maintain a match with its complementary routine fl2sh, a set of macros havebeen defined for resolutions in the range of 16 to 12 bits (see below for the complementarydefinitions):

• sh2fl 16bit: conversion from 16 bit to float• sh2fl 15bit: conversion from 15 bit to float• sh2fl 14bit: conversion from 14 bit to float• sh2fl 13bit: conversion from 13 bit to float• sh2fl 12bit: conversion from 12 bit to float

Variables:

n . . . . . . . . . . . . . . Is the number of samples in ix[ ];ix . . . . . . . . . . . . . . Is input short array pointer;y . . . . . . . . . . . . . . Is output float array pointer;resolution . . . . . . . . . . . . . . Is the resolution (number of bits) desired for the input data

in the floating point representation.norm . . . . . . . . . . . . . . Flag for normalization:

1: normalize float data to the range –1..+1;0: convert from short to float, leaving data in the range:

-32768(16–resolution) .. 32767(16–resolution),where is the right-shift operation.

Return value:

None.

17.2.3 sh2fl alt

Syntax:


void sh2fl alt (long n, short *ix, float *y, short mask);


Description:

Common conversion routine alternative to routine sh2fl. This conversion routine expectsthe fixed-point data to be in the range -32768..32767. Conversion to float is done by takinginto account only the most significant bits, indicated by mask. Conversion to float resultsnecessarily in normalised values in the range -1.0 ≤ y <+1.0.

Variables:

n . . . . . . . . . . . . . . Number of samples in ix[ ].ix . . . . . . . . . . . . . . Pointer to input short array.y . . . . . . . . . . . . . . Pointer to output float array.mask . . . . . . . . . . . . . . Mask determining how many bits of the input samples are

to be considered for convertion to float. Bits ’1’ in maskindicate that this bit in particular will be used in the conver-sion. For example, mask equal to 0xFFFF indicates that all


16 bits of the word are used in the convertion, while maskequal 0xFFFE, 0xFFFC, 0xFFF8, or 0xFFF0 will force respec-tively only the upper 15, 14, 13, or 12 most significant bits tobe used.

Return value:

None.

17.2.4 fl2sh

Syntax:


long fl2sh (long n, float *x, short *iy, double half lsb, unsigned

mask);


Description:

Common quantisation routine. The conversion routine expects the floating point data tobe in the range between –1..+1, values outside this range are limited. Quantization isdone by taking into account only the most significant bits. Therefore, the quantized (orconverted) data are located left justified within the 16-bit word, and the results are in therange:

• –32768, ..., –1, 0, +1, ..., +32767, if quantized to 16 bit• –32768, ..., –2, +2, ..., +32766, if quantized to 15 bit• –32768, ..., –4, +4, ..., +32763, if quantized to 14 bit• –32768, ..., –8, +8 ..., +32760, if quantized to 13 bit• –32768, ..., –16, +16, ..., +32752, if quantized to 12 bit

The operation may be summarized as:

yk = (xk ± h)&m

where xk is the float number, yk is the quantized number, h is the value of half-LSb forthe resolution desired (which is added to xk if the latter is positive or zero, or subtractedotherwise), and m is the bit mask (to assure that the bits below the LSb are 0). Theoperation = is a truncation, and & is a bit-wise AND operation. The appropriate valuesfor h are determined by:

h = 0.5 · 216−B = 215−B

where B is the desired resolution in bits. As an example, if data is to be stored with 15bits of resolution (equivalent to –16384..+16383, in right-justified notation), the roundingnumber h is 1.0, because the smallest number in the output buffer can be +1 or –1. Themask m, by its turn, is

m = 0xFFFF (16−B)

where is the left-shift bit operation with zero-padding from the right. For the sameexample, m is 0xFFFE, i.e., only bit 0 of the samples is zeroed.

To facilitate to the use of the fl2sh, a set of macros has been defined for quantizationsin the range of 16 to 12 bits (see ugst-utl.h):


• fl2sh 16bit: conversion from float to 16 bit• fl2sh 15bit: conversion from float to 15 bit• fl2sh 14bit: conversion from float to 14 bit• fl2sh 13bit: conversion from float to 13 bit• fl2sh 12bit: conversion from float to 12 bit

In some cases truncated data is needed, what can be accomplished by setting h = 0. Forexample, at the input for A-law encoding, truncation is necessary, not rounding. On theother hand within recursive filters rounding is essential. Hence, this routine serves bothcases.

Concerning the location of the fixed-point data within one 16 bit word, it is more practicalto have the decimal point immediateley after the sign bit (between bit 15 and 14, if the bitsare ordered from 0..15). Since this is well defined, software that processes the quantizeddata needs no knowledge about the resolution of the data. It is not important whethertha data comes from A or µ law decoding routines or from 12-bit (13, 14, 16-bit) A/Dconverters.

It should be noted that this routine only processes data in a normalized form (−1.0 ≤x < +1.0); it shall not be used if data is in the short range (–32768.0 .. 32767.0).

Variables:

n . . . . . . . . . . . . . . Number of samples in x[ ].x . . . . . . . . . . . . . . Pointer to input float array.iy . . . . . . . . . . . . . . is output short array pointer.half lsb . . . . . . . . . . . . . . A double representation of half LSb for the desired resolution

(quantization).mask . . . . . . . . . . . . . . The unsigned masking of the lower (right) bits.

Return value:

Returns the number of overflows that happened in the quantization process.

17.2.5 serialize * justified

Syntax:


long serialize right justified (short *par buf, short *bit stm, long n, long

resol, char sync);

long serialize left justified (short *par buf, short *bit stm, long n, long

resol, char sync);


Description:

Routines serialize right justified and serialize left justified convert a frameof n right- or left-justified samples with a resolution resol into a right-justified, serial softbitstream of length n.resol. If the parameter sync is set, a serial bitstream compliant tothe Annex B of ITU-T Recommendation G.192 will be generated. In this case, the thelength of the bitstream is increased to (n+2).resol.1 It should be noted that the least

1The option of adding only the synchronization word, as implemented in the STL92, is no longer


significant bits of the input words are serialized first, such that the bitstream is a streamwith less significant bits coming first.

The only difference between these functions is that function serialize right justified

serializes right-justified parallel data and function serialize left justified serializeleft-adjusted data.

It is supposed that all parallel samples have a constant number of bits, or resolution, forthe whole frame. If this does not happen, the bitstream cannot be serialized by thesefunctions. As an example, this is the case of the RPE-LTP bitstream: the 260 bits of theencoded bitstream are not divided equally among the 76 parameters of the bitstream. Incases like this, users must write their own serialization function.

Variables:

par buf . . . . . . . . . . . . . . Input buffer with right- or left-adjusted, parallel samples tobe serialized.

bit stm . . . . . . . . . . . . . . Output buffer with serial bitstream. It should be noted thatbit stm must point to an appropriately allocated memory block,which should be a block of n.resol shorts if sync is 0, or ablock of (n+2).resol shorts otherwise.

n . . . . . . . . . . . . . . Number of words in the input buffer, i.e., the number of par-allel samples/frame.

resol . . . . . . . . . . . . . . Resolution (number of bits) of the samples in par buf.sync . . . . . . . . . . . . . . If 1, a synchronization header is to be used (appended) at the

boundaries of each frame of the bitstream. If 0, a synchro-nization header is not used.

Return value:

This function returns the total number of softbits in the output bitstream, includingthe synchronization word and frame length. If the value returned is 0, the number ofconverted samples is zero.

17.2.6 parallelize * justified

Syntax:


long parallelize right justified (short *bit stm, short *par buf, long bs len,

long resol, char sync);

long parallelize left justified (short *bit stm, short *par buf, long bs len,

long resol, char sync);


Description:

Functions parallelize right justified and parallelize left justified convert thesamples in input buffer bit stm from the ITU-T softbit representation to its parallel repre-sentation, given a number of bits per sample, or resolution. The input serial bitstream oflength bs len is converted into a frame with bs len/resol samples (if sync==0) or (bs len–2)/resol samples (if sync!=0), with a resolution resol. It should be noted that softbits in

available with this function since the STL96.


lower positions in the input buffer are supposed to represent less significant bits of theparallel word (considering bits that would compose the same parallel word). In otherwords, the softbits that come first are less significant than the next ones, when referringto the same parallel word (as defined by the parameter resol). Therefore, when generatinga word from the bitstream, bits from the bitstream that comes first are converted to lowersignificant bits. Frames with the synchronization flag but without the frame length causethe function to exit with an error code equal to –bs len.

The difference between both functions is that parallelize right justified convertsthe serial bitstream to a parallel data in a right-justified format, i.e., data is alignedto the right, while the routine parallelize left justified parallelizes samples withleft-justification.

If the G.192 Annex B bitstream format is used (parameter sync==1), a synchronizationheader is present at frame boundaries in the input buffer. In this case, the synchronizationand frame lengthwords are not copied from the bitstream to the output buffer.

Note that all parallel samples are supposed to have a constant number of bits, or resolu-tion, for the whole frame. This means that, by construction, the number of softbits dividedby the resolution must be an integer number, or (bs len–2)%resol==0. If this does nothappen, probably the serial bitstream was not generated by one of the serialize ...()

routines, and cannot be parallelized by these functions. An example is the case of theRPE-LTP bitstream: the 260 bits of the encoded bitstream are not divided equally amongthe 76 parameters of the bitstream. In cases like this, users must write their own paral-lelization function.

If an erased frame is found, the function returns without performing any action.

Variables:

bit stm . . . . . . . . . . . . . . Input buffer with bitstream to be parallelized.par buf . . . . . . . . . . . . . . Output buffer with right- or left-adjusted samples.bs len . . . . . . . . . . . . . . Number of bits per frame (i.e., size of input buffer, which

includes the synchronization header length if sync==1).resol . . . . . . . . . . . . . . Resolution (number of bits per parallel sample) of the right-

or left-adjusted samples in par buf.sync . . . . . . . . . . . . . . If 1, a synchronization header is expected in the boundaries

of each frame of input the bitstream. If 0, synchronizationheaders are not expected.

Return value:

On success, this function returns the number of samples of the output parallel samplebuffer.


Since these tools do not refer to ITU-T recommendations, no special compliance testsare needed. As for portability, it may be checked by running the same speech file on aproven platform and on a test platform. Files processed this way should match exactly.A preferred data file would be the ramp described in the compliance test description.

The routines in this module had portability tested for VAX/VMS with VAX-C and GNUC (gcc) and for MS-DOS with a number of Borland C/C++ compilers (Turbo C v2.0,


Turbo-C++ v1.0, Borland C++ v3.1). Portability was also tested in a number of Unixworkstations and compilers: Sun workstation with Sun-OS and Sun-C (cc), acc, and gcc;HP workstation with HP-UX and gcc.

17.4 Example code


Two programs are provided as demonstration programs for the UTL module, scaldemo.c(version 1.3) and spdemo.c (version 3.2).

Program scaldemo.c scales a 16-bit, linear PCM input file by a user-specified linearor dB gain value. Default resolution is 16 bits per sample, and rounding is used bydefault when converting from float to short. When resolutions different from 16 bitsare used with rounding, versions 1.2 and earlier of the program might not produce the”expected” results. The program used to limit the resolution of the samples (by maskingthe 16− resolution least significant bits) when converting from short to float. Additionalrounding is applied after scaling when converting from float to short. If the desiredoperation is, actually, scale and then reduce the resolution with rounding, masking beforethe scaling operation should be disabled. In version 1.3 and later, the default behavioris not to apply such mask, (same as the option -nopremask) for backward compatiblebehavior, the option -premask should be explicitly used.

Program spdemo.c converts files between serial and parallel formats using a user-specifiedresolution and frame (or block) size. A known issue with spdemonstration version 3.2 isthat the command-line option -frame does not work properly for parallel-to-serial con-version. In this case, the desired frame size has to be specified as parameter N in thecommand line.

17.4.2 The master header file for the STL demonstration pro-grams

The module also contains the common demonstration program definition file ugstdemo.h(version 2.2), which is used by all STL demonstration programs. This header file containsthe definition of a number of pseudo-functions and symbols that facilitate the use of amore homogeneous user interface for the different demonstration programs in the STL.

The available pseudo-functions include:

GET PAR * ... Pseudo-functions for printing a user prompt and reading a positionalparameter from the command line. The parameters can be char (C),integers (I), long integers (L), unsigned long integers (LU), floats (F),doubles (D), and strings (S).

FIND PAR * .. Pseudo-functions for printing a user prompt and reading a positionalparameter from the command line if it was specified by the user, or toassume a default value defined by the programer. The parameters canbe char (C), integers (I), long integers (L), floats (F), doubles (D), andstrings (S).


ARGS() ...... The pseudo-function ARGS() allows that the list of parameters that showup as its arguments be passed on to ANSI-C compliant compilers, or bediscarded for old-vintage, K&R-style compilers that do not accept pa-rameter list in function prototypes. This pseudo-function allows for saferfunction prototypes in compilers that support parameter declaration infunction prototypes and avoids the need to edit function declarations (orlong #if/#else/#end for prototype sections) for non-ANSI C compilers.

Some of the symbols defined in ugstdemo.h include:

• Symbols WB, RB, WT, RT, and RWT for file open (fopen()) operation. These symbolsare portable across a large number of platforms and permit write-binary, read-binary,write-text, read-text, and read-write-text file mode operations.

• Symbol MSDOS, which is necessary for proper compilation of some of the programsunder the MS-DOS environment. The symbol is defined in case MS-DOS is detected,and undefined in case MS-DOS is not detected.

• Symbol COMPILER, which contains a text string describing the compiler used togenerate an executable.

17.4.3 Short and float conversion and scaling routines

The following C code exemplifies the use of the short and float number format interchangeroutines, as well as of the gain scaling routine. This program is a simplified version ofthe example program scaldemo.c provided in the STL distribution. This program reads16-bit, 2-complement, left-justified input samples, converts them to a float representationin the range of -1..+1, applies a gain (or loss) factor to these samples, converts the scaledsamples back to an integer representation (16 bit, 2’s complement, left-justified) usingrounding and hard-clip of the floating point numbers. The number of most significantbits to be used is also specified by the user.


#include "ugstdemo.h"#include "ugst-utl.h"

#define LENGTH 5

main(argc, argv)int argc;char *argv[];

long i, NrSat;long B, round;double h;unsigned m;short ix[LENGTH];


float y[LENGTH];float factor;

GET_PAR_F(1, "_Factor: .............. ", factor);GET_PAR_L(2, "_Resolution: .......... ", B);GET_PAR_L(3, "_Round(1=yes,0=no): ... ", round);

/* Initialize short’s buffer, BUT left-ajusted! */for (i = 0; i < LENGTH; i++)

ix[i] = i << (16 - B);

/* Choose rounding number */h = 0.5 * (round << (16 - B));

/* Find mask */m = 0xFFFF << (16 - B);

/* Print original data */printf("ix before normalization\n");printf("=======================\n");for (i = 0; i < LENGTH; i++)

printf("ix[%3d]=%5d\n", i, ix[i]);

/* Convert samples to float, normalizing */sh2fl(LENGTH, ix, y, B, 1);

/* Normalizes vector */scale(y, LENGTH, (double) factor);

/* Convert from float to short */NrSat = fl2sh(LENGTH, y, ix, h, m);

/* Inform about overflows */if (NrSat != 0)

printf("\n Number of clippings: .......... %ld [] ", NrSat);

/* Print new data */printf("after normalization ... \n");printf("========================\n");for (i = 0; i < LENGTH; i++)

printf("y[%3d]= %e -> ix[%3d]=%5d\n", i, y[i], i, ix[i]);

return (0);

17.4.4 Serialization and parallelization routines

The following C code implements an example of use of the serialization and paralleliza-tion routines available in the STL. Input data is generated within the program. The


program takes the number of bits per sample, the justification, and whether synchroniza-tion headers should be generated. The input data is printed on the screen in its parallelrepresentation, which is then converted to the serial format and back to the parallel for-mat. Then, the serialized version of the data is printed on the screen, and the programends.


#include "ugstdemo.h"#include "ugst-utl.h"

#define LENGTH 5


long i, j, k, smpno, bitno, init;long B, just;double h;unsigned m;char c;short par[LENGTH];short ser[16 * LENGTH + 2];char sync;long (*ser_f) (); /* pointer to serialization function */long (*par_f) (); /* pointer to parallelization function */

GET_PAR_L(1, "_Resolution: ................................ ", B);GET_PAR_L(2, "_Data is Right (1) or Left (0) justified? ... ", just);GET_PAR_L(3, "_Use sync header? ........................... ", sync);

/* Initialize flag "OFF" */init = 0;c = sync ? 1 : 0;smpno = LENGTH;bitno = LENGTH * B + sync ? 2 : 0;

/* Initialize data and choose pointers to appropriate functions */if (just) /* Right-justified data */

ser_f = serialize_right_justified;par_f = parallelize_right_justified;for (i = 0; i < LENGTH; i++)

par[i] = i;else /* Left-justified data */

ser_f = serialize_left_justified;


par_f = parallelize_left_justified;for (i = 0; i < LENGTH; i++)

par[i] = i << (16 - B);

/* Print original data */printf("\npar[] before serialization\n");printf("==========================\n");for (i = 0; i < LENGTH; i++)

printf("par[%3d]=%5d\n", i, par[i]);

bitno = ser_f(par, /* input buffer pointer */ser, /* output buffer pointer */smpno, /* no. of samples (not bits) per frame */B, /* number of bits per sample */sync); /* whether sync header is present or not */

smpno = par_f(ser, /* input buffer pointer */par, /* output buffer pointer */bitno, /* number of softbits per frame */B, /* number of bits per sample */sync); /* whether sync header is present or not */

/* Print new data */printf("========================\n");printf("| 0x81 represents a ‘1’| \n| 0x7F represents a ‘0’|\n");printf("========================\n");printf("after serialization ... \n");printf("========================\n");if (sync)

printf("Sync word is ser[%d]= %04X", 0, ser[0]);printf("Frame length is ser[%d]= %04X", 1, ser[1]);

for (k = 2, i = 0; i < LENGTH; i++)

printf("\npar[%3d]=%5d\n", i, par[i]);for (j = 0; j < B; j++, k++)

printf("ser[%3d]= %04X\t", sync ? k : k - 2, ser[k]);printf("\n");

Chapter 18

References

[1] ITU-T. Recommendation G.191, Software Tools for Speech and Audio Coding Stan-dards. ITU, Geneva, March 1993.

[2] Study Group XV. Report of Working Party XV/2. Technical report, CCITT, Novem-ber 1991. COM XV-R 73-E.

[3] C. South and P. Usai. Subjective Performance of CCITT’s 16 kbit/s LD-CELPAlgorithm with Voice Signals. In Globecom 92. IEEE, 1992.

[4] H.J. Braun, S. Feldes, and G. Schroder. Preselection for the Half-Rate GSM Stan-dard. In Workshop on Speech Coding for Telecommunications, pages 90–92, Septem-ber 11-13 1991.

[5] ITU-T Q.10/16 Rapporteur. ITU-T AC-05-16 Processing Test Plan of the 14kHzLow-Complexity Audio Coding Algorithm at 24, 32 and 48 kbps Extension to ITU-TG.722.1. ITU, Strasbourg, April 2005.

[6] ITU-T. Recommendation P.48, Specification for an intermediate reference system ,volume V of Blue Book, pages 81–86. ITU, Geneva, 1989.

[7] Bell Northern Research (Canada). Frequency response characteristics for low bit-ratecodec testing. Technical report, UIT-T SG 12, Geneva, December 1994. DelayedDocument D,38 (SG12).

[8] ITU-T. Recommendation P.830, Subjective performance assessment of TelephoneBand and Wideband Digital Codecs. ITU, Geneva, February 1996.

[9] Spiros Dimolitsas, Frank Corcoran, and Channasandra Ravishankar. Correlationbetween headphone and telephone-handset listener opinion scores for single-stimulusvoice coder performance assessments. IEEE Signal Processing Letters, 2(3):41–43,March 1995.

[10] Spiros Dimolitsas, Frank Corcoran, and Channasandra Ravishankar. Dependenceof opinion scores on listening sets used in degradation category rating assessments.IEEE Transactions on Speech and Audio Processing, 3(5):421–424, September 1995.

[11] ITU-T. Recommendation P.341, Characteristics of wideband terminals. ITU,Geneva, 1994.

207


[12] Denis Byrne et al. An international comparison of long-term average speech spectra.Journal of the Acoutical Society of America, 96(4):2108–2120, October 1994.

[13] ITU-T. Recommendation G.712, Performance Characteristics of PCM channels.ITU, Geneva, 1992.

[14] Aachen University. An Implementation of the Signal Conditioning Device. TechnicalReport TD91/23, ETSI/TM/TM5/TCH-HS, April 1991.

[15] V.K. Varma. Testing speech coders for usage in wireless communications systems. InSecond IEEE Workshop on Speech Coding for Telecommunications, “Speech Codingfor the Network of the Future”, Quebec, Canada, October 13–15 1993. IEEE.

[16] Bellcore. Proposed model for simulating radio channel burst errors. Technical report,CCITT SG XII, Geneva, October 1992. Doc.SQ-15.92(Rev.).

[17] E.N Gilbert. Capacity of a burst-noise channel. Bell Syst.Tech.J., pages 1253–1265,1960.

[18] D. Knuth. Seminumerical Algorithms . In The Art of Computer Programming .Addison-Wesley, Massachusetts, 1981.

[19] ITU-T. Recommendation G.192, A Common Digital Parallel Interface for SpeechStandardization Activities. ITU, Geneva, November 1995.

[20] J. Fennick. Quality Measures and the design of telecommunications systems . ArtechHouse, 1988.

[21] ITU-T. Recommendation G.711, Pulse code molulation (PCM) of voice frequencies,volume Fascicle III.4 of Blue Book, pages 175–184. ITU, Geneva, 1989.

[22] N.S. Jayant and P. Noll. Digital Coding of Waveforms. Prentice-Hall, 1984.

[23] ITU-T. Recommendation G.726, 40, 32, 24, 16 kbit/s adaptive differential pulse codemodulation (ADPCM). ITU, Geneva, 1990.

[24] ITU-T. Recommendation G.711/Appendix I, A high quality low-complexity algorithmfor packet loss concealment with G.711. ITU, Geneva, 1999.

[25] M. Bonnet, O. Macchi, and M. Jaidane-Saidane. Theoretical analysis of the ADPCMCCITT algorithm. IEEE Trans.on Communications, 38(6):847–858, June 1990.

[26] W.R. Daumer, X. Maitre, P. Mermelstein, and I. Tokizawa. Overview of the ADPCMcoding algorithm. Proc.Globecom, pages 774–777, 1984.

[27] ITU-T. Comparison of ADPCM Algorithms. ITU-T Rec.G.726, Appendix III,Geneva 1994.

[28] ITU-T. Extensions of Recommendation G.726 for use with uniform quantized inputand output. In Recommendation G.726, chapter Annex A. ITU, Geneva, 1994.

[29] S.I. Feldman, D.M. Gay, M.W. Maimone, and N.L. Schryer. A Fortran–to–C Con-verter. Technical Report Computing Science 149, AT&T Bell Laboratories, August1990.


[30] ITU-T. Recommendation G.727, 5-, 4-, 3- and 2-bits/sample embedded adaptivedifferential pulse code modulation (ADPCM). ITU, Geneva, December 1990.

[31] ITU-T. Annex A to Recommendation G.727, Extensions of Recommendation G.727for use with uniform-quantized input and output. ITU, Geneva, November 1994.

[32] CCITT. Recommendation G.722, 7 kHz audio-coding within 64 kbit/s, volume Fas-cicle III.4 of Blue Book, pages 269–341. ITU, Geneva, 1989.

[33] ITU-T. Recommendation G.725, System Aspects for the Use of the 7kHz Audio Codecwithin 64 kbit/s, volume Fascicle III.4 of Blue Book, page 11. ITU, Geneva, 1989.

[34] Paul Mermelstein. G.722, a new CCITT coding standard for digital transmissionof wideband audio signals. IEEE Communications Magazine, 26(1):8–15, January1988.

[35] Masahiro Taka et al. Overview of the 64 kbit/s (7 kHz) audio coding standard. InGlobecom 86, pages 593–598, Houston, Texas, Dec.1–4 1986. IEEE.

[36] G. Modena, A. Coleman, P. Usai, and P. Coverdale. Subjective Performance Evalu-ation of the 7 KHz Audio Coder. In Globecom 86, pages 599–604, Houston, Texas,Dec.1–4 1986. IEEE.

[37] G. Le Tourner et al. Implementation of the 7 kHz Audio Codec and its TransmissionCharacteristics. In Globecom 86, pages 605–609. IEEE, 1986.

[38] Manfred Dietrich et al. Initialization and Mode Switching of 7 kHz Audio Terminals.In Globecom 86, pages 610–614. IEEE, 1986.

[39] Keith R. Harrison et al. Possible Applications for networking considerations relatingto the 64 kbit (7 kHz) audio coding system. In Globecom 86, pages 615–622. IEEE,1986.

[40] Xavier Maitre. 7 kHz audio coding within 64 kbit/s. IEEE Journal on Selected Areasin Communications, 6(2):283–298, February 1988.

[41] P. Kroon, R.J. Sluyter, and E.F. Deprettere. Regular-pulse excitation: a novel ap-proach to effective and efficient multipulse coding of speech. IEEE Trans. Acoust.,Speech, Signal Processing, ASSP-34(5):1054–1063, October 1986.

[42] Peter Vary et al. Speech codec for the European Mobile Radio System. In ICASSP88, pages 227–230. IEEE, 1988.

[43] Ulrich Reute (Guest Editor). Special Issue on Medium Rate Speech Coding forDigital Mobile Technology. Speech Communication, 7(2), July 1988.

[44] GSM-06.10. Full Rate Speech Transcoding. ETSI, France, October 1992. ReleasedJuly 1, 1993.

[45] ITU-T. Recommendation P.800, Methods for the subjective determination of trans-mission quality. ITU, Geneva, 1996.

[46] H.B Law and R.A. Seymour. A reference distortion system using modulated noise.Proc.Institution of Electrical Engineers (IEE), 109B:484–485, Nov 1962.


[47] ITU-T. Recommendation P.81, Modulated Noise Reference Unit (MNRU) , volume Vof Blue Book, pages 198–203. ITU, Geneva, 1989.

[48] User’s Group on Software Tools. CCITT Software Tool Library Manual. Technicalreport, CCITT SG XV, May 1992. COM XV-R 87-E.

[49] ITU-T. Recommendation P.810, Modulated Noise Reference Unit (MNRU) . ITU,Geneva, February 1996.

[50] S.F. Campos Neto. Characterization of the revised implementation of the Modu-lated Noise Reference Unit (MNRU) for the ITU-T Software Tool Library. WhiteContribution COM 15-182-E, ITU-T, 1993-1996.

[51] W.H Press, B.P Flannery, S.A. Teukolky, and W.T. Vetterling. Numerical Recipesin C: The Art of Scientific Computing . Cambridge University Press, Cambridge,1990.

[52] ITU-T. Recommendation P.56, Objective measurement of active speech level, vol-ume V of Blue Book, pages 110–120. ITU, Geneva, 1989.

[53] ITU-T. Handbook on Telephonometry . ITU, Geneva, 1992. 2nd. Edition.

[54] H.Kuttruff. Room acoustics. Elsevier applied sciences, 1991.

[55] A.V. Oppenheim and R.W. Schafer. Discrete-Time Signal Processing. Prentice-Hall,1989.

Appendix A

Unsupported tools

This Appendix to the ITU-T Software Tool Library (STL) Manual describes the unsup-ported tools provided in the ITU-T STL. These tools are provided “as is” and withoutany warranties or implied suitability to use. However, any feedback on problems withthese tools will be welcome and accomodated as possible, as will any improvements madewhich can be shared and incorporated in the STL.

A.1 Source code

asc2bin.c: . . . . . . . . . converts decimal or hex ASCII data into short/long/float or doublebinary numbers. Input data must be one number per line.

astrip.c: . . . . . . . . . . . strips off a segment of a file. Can operate on block or sample-based parameters and can apply windowing to the borders of theextracted segment. Tested in Unix/MSDOS.

bin2asc.c: . . . . . . . . . converts short/long/float or double binary numbers into octal, deci-mal or hex ASCII numbers, printing one per line. For Unix/MSDOS.

compfile.c: . . . . . . . . . compare word-wise binary files. For VMS/Unix/MSDOS.

dumpfile.c: . . . . . . . . dump a binary file. For VMS/Unix/MSDOS.

chr2sh.c: . . . . . . . . . . convert char-oriented files to short-oriented (16-bit words) files bypadding the upper byte of each word of the output file with zeros.For Unix/MSDOS.

endian.c: . . . . . . . . . . program that verifies whether the current platform is big or littleendian (i.e. high-byte first or low-byte first). For Unix/MSDOS.

fdelay.c: . . . . . . . . . . . flexible program to introduce delay into a file. Delay can be speci-fied in value and length, or can be taken from a user-specified file.For Unix/MSDOS.

g728-vt: . . . . . . . . . . . a directory with software tools for use with the G.728 floating pointverification package. Not all tools are functional; preserved here forfuture reference.

getcrc32.c: . . . . . . . . . 32-bit CRC calculation function and program (depending on howit is compiled). Uses the same polynomial as ZIP. Checked forportability across a number of platforms. Makefile compiles it intoan executable called crc. For Unix/MSDOS.

measure.c: . . . . . . . . . measure statistics/CRC for a bunch of files. For VMS/Unix/MSDOS.

211


oper.c: . . . . . . . . . . . . implement arithmetic operation on two files: add, subtract, multi-ply or divide two files applying scaling factors (linear or dB), andadding a DC level. For Unix/MSDOS.

pshar: . . . . . . . . . . . . . a directory with makefiles, readme, source code and test files for aportable shell archiving/dearchiving program compatible with Unixthe shar utility. Very simple and useful, in especial for MSDOS andVMS systems. See the directory for more details.

sb.c: . . . . . . . . . . . . . . . swap bytes for word-oriented files. For VMS/Unix/MSDOS.sh2chr.c: . . . . . . . . . . convert short-oriented (16-bit words) files to char-oriented files

by ignoring the upper byte of each word of the input file. ForUnix/MSDOS.

sine.c: . . . . . . . . . . . . . generate a sinewave file for a given speco of AC/DC/phase/ fre-quency/sampling frequency values. For VMS/Unix/MSDOS.

sub-add.c: . . . . . . . . . subtract/add files (depending on the compilation, see makefiles).For VMS/Unix/MSDOS.

xencode.c: . . . . . . . . . uuencode compatible with auto-break/sequencing for long files andCRC calculation for error detection. Not functional under MSDOS6.22.

xdecode.c: . . . . . . . . . uudecode compatible with auto-break/sequencing for long files andCRC calculation for error detection. Not functional under MSDOS6.22.

A.2 Scripts

rm.bat . . . . . . . . . . . . . “fake” deletion utility that tries to emulate the basic functionalityof the Unix command rm, that deletes multiple files specified in thecommand line. Should be put in the path, unless a version of rm isalready available.

swapover.bat . . . . . . . MSDOS batch script for byte-swapping multiple files. Uses sb.c.swapover.sh . . . . . . . . Unix script for byte-swapping multiple files. Uses sb.c.

A.3 Makefiles

makefile.tcc . . . . . . . . for Borland [bt]cc C/C++ compilermakefile.djc . . . . . . . . for MSDOS djc port of gccmakefile.unx . . . . . . . for Unix makemakefile.cl . . . . . . . . . for MS Visual C command-line compiler

A.4 Test files

tstunsup.zip . . . . . . . . zip archive with test files for testing some of the unsupported tools:

cf:

3200 cftest1.dat

3200 cftest2.dat

3200 cftest3.dat


186 delay-15.ref

186 delay-a.ref

214 delay-u.ref

186 delaydft.ref

200 delayfil.ref

sb:

100 bigend.src

100 litend.src

xencode and xdecode:

9182 voice.ori

8705 voice.uue

2093 printme.eps

3795 printme.uue

5368 voice01.uue

It is necessary to have unzip/pkunzip/Winzip installed for extrac-tion.

Appendix B

Future work

The following item have been identified as future action items by the UGST in the closefuture. Contributions are welcome, eventhough the proposed algorithm implementationsmay not be in fully compliance with the software tool guidelines:

G.728 . . . . . . . . . . . . . . . . . . . . . . . . LDCELP coding at 16 kbit/s, possibly at other bitratesas well

P.50, P.59 . . . . . . . . . . . . . . . . . . . . Reference implementations of the Artificial Speech andArtificial Conversational Speech.

Channel models . . . . . . . . . . . . . . . Transmission channel models to be incorporated in theEID module, e.g. satellite, cellular, and IP transmissionchannels.

Reference systems . . . . . . . . . . . . Alternatives to the MNRU, e.g. T-reference, S-Reference,PMNRU, PMNRU).

G.726 & G.727 Annex A . . . . . . linear-interfaced G.726 and G.727.

Processing framework . . . . . . . . . A processing framework tool for the implementation ofhost laboratory functions.

215

stl 2005 man

Documents