speech signal analysis and coding - ernetpkalra/old-courses/siv864-2010/session-0… · mpeg-4 hvxc...

Speech Signal Analysis and Coding Dr. Arun Kumar Centre for Applied Research in Electronics (CARE), IIT Delhi [email protected]


Page 1: Speech Signal Analysis and Coding - ERNETpkalra/OLD-COURSES/siv864-2010/session-0… · MPEG-4 HVXC 2-4 MPEG/ISO 1999 MPEG-4 CELP 4-24 MPEG/ISO 1999 FS-1015 LPC-10 2.4 US-DoD /NATO

Speech Signal Analysis and Coding

Dr. Arun Kumar

Centre for Applied Research in Electronics (CARE), IIT Delhi

[email protected]


Contents

• Speech Processing Applications

• Speech Signal Understanding

– Speech Production

– Speech Signal Characteristics and Analysis

• Speech Coding

– Coding Standards

– Coder Attributes including Quality Evaluation

– Coding Methodologies

Speech Processing Applications

• Speech Transmission
– Trunk-line telephony
– Wireless telephony
• Speech Storage
– Voice mail, voice memo, answering machines
• Speech Synthesis
– Text-to-speech synthesis
– Automatic information services
• Speaker Verification and Identification
– Phone banking
– Secure entry
• Aids for the Handicapped
– Variable rate playback
– Hearing aids
– Reading machine for the visually impaired
– Visual display of speech information for the hearing impaired
• Speech Enhancement
– Echo and noise cancellation
• Speech Recognition
– Automatic language translation
• Voice Personality Transformation
– Voice conversion from “source” to “target”


The Speech Signal

“It is the variation of pressure, from atmospheric pressure, as a function of time, caused by traveling waves from the speaker’s mouth (apart from nostrils, cheeks and throat).”


The Intensity Level of Speech

Units: SPL (Sound Pressure Level) in dB relative to a reference level.

Reference: $10^{-16}$ W/cm$^2$, which corresponds to ‘just barely audible’.
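As a small worked example of the dB scale above, the sketch below converts an intensity to a level relative to the quoted reference. It assumes the usual definition, level = 10·log10(I / I_ref); the function names are illustrative and not from the lecture.

```python
import math

# Reference intensity quoted on the slide ('just barely audible'), in W/cm^2.
I_REF_W_PER_CM2 = 1e-16


def intensity_level_db(intensity_w_per_cm2: float) -> float:
    """Level in dB relative to the reference: 10 * log10(I / I_ref)."""
    if intensity_w_per_cm2 <= 0:
        raise ValueError("intensity must be positive")
    return 10.0 * math.log10(intensity_w_per_cm2 / I_REF_W_PER_CM2)


def intensity_from_db(level_db: float) -> float:
    """Inverse mapping: intensity in W/cm^2 for a given level in dB."""
    return I_REF_W_PER_CM2 * 10.0 ** (level_db / 10.0)


# Every factor of 10 in intensity adds 10 dB to the level.
level_at_reference = intensity_level_db(1e-16)
level_million_times_reference = intensity_level_db(1e-10)
```

Under this definition, an intensity one million times the reference sits at 60 dB.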


[Figure: The Intensity Level of Speech. A dB scale from 0 to 120 (ticks at 0, 20, 55, 60, 70, 80, 100, 120) marking just barely audible, whisper, variations in normal voice level (1 meter distance from mouth), heavy traffic, airplane, and rock concert.]


The Intensity Level of Speech

• Energy of speech during 1 s
– $2 \times 10^{-5}$ Joules (it takes 100 Joules to light a 100 W bulb for 1 s)
• Strongest vowel: /a/ as in “talk”
• Weakest vowel: /i/ as in “see”
• Strongest consonant: /r/ as in “run”
• Weakest consonant: /θ/ as in “thin”


Speech & Audio Signal Specs.

Audio Signal Category | Bandwidth (Hz) | Sampling Rate (kHz) | Source Rate (kbps)
Telephone Band Speech | 300-3400 | 8.0 | 128
Wideband Speech | 50-7000 | 16.0 | 256
Wideband Audio | 20-20,000 | 44.1/48.0 | 705/768
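The source rates in the table follow from sampling rate times bits per sample. The sketch below reproduces them, assuming 16-bit linear PCM on a single channel (an assumption consistent with the 128 kbps telephone-band figure); the helper name is illustrative.

```python
BITS_PER_SAMPLE = 16  # assumption: 16-bit linear PCM, one channel


def source_rate_kbps(sampling_rate_khz: float,
                     bits_per_sample: int = BITS_PER_SAMPLE) -> float:
    """Uncompressed bit rate in kbps = sampling rate (kHz) * bits per sample."""
    return sampling_rate_khz * bits_per_sample


# Sampling rates taken from the table above.
source_rates = {
    "Telephone Band Speech": source_rate_kbps(8.0),
    "Wideband Speech": source_rate_kbps(16.0),
    "Wideband Audio (44.1 kHz)": source_rate_kbps(44.1),
    "Wideband Audio (48 kHz)": source_rate_kbps(48.0),
}
```

The 44.1 kHz case gives 705.6 kbps, which the table quotes as 705.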


Speech Articulation by the Vocal System

Reproduced from: D. O’Shaughnessy, Human and machine speech communication, IEEE Press, 2000


Speech Classes by Articulation

• Voiced speech

• Unvoiced speech

• Transient (stop) sounds


Acoustic Analysis of Speech

The relationship between speech sounds (phonemes) and their acoustic realizations

– Waveform
– Spectrum
– Spectrogram


Time Waveform of a Speech Sentence

[Figure: amplitude (-1 to 0.8) versus time (0 to 1.4 s) for the sentence “THIS IS GOOD”, with the segments labelled (TH), ɪ (i), s (s), ɪ (i), s (s), ɡ (G), U (O), d (D).]


Waveform Analysis of Speech

• Vowels
– High energy, periodic, steady-state utterance
• Unvoiced fricatives
– Low energy, noise-like, steady-state utterance
• Voiced fricatives
– Low energy, element of periodicity, steady-state utterance
• Stops
– Transient release, medium to low energy
• Nasals
– Low-to-medium energy, periodic, steady-state utterance


Acoustic Analysis of Vowels

Fundamental frequency F0 / Pitch period

F0 | Male | Female
Average (Hz) | 132 | 223
Range (Hz) | 50-250 | 120-500
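The pitch period is the reciprocal of F0. A minimal sketch of that conversion follows, in milliseconds and in samples; the 8 kHz default is an assumption (the telephone-band sampling rate) and the function names are illustrative.

```python
def pitch_period_ms(f0_hz: float) -> float:
    """Pitch period in milliseconds for a fundamental frequency in Hz."""
    return 1000.0 / f0_hz


def pitch_period_samples(f0_hz: float, fs_hz: float = 8000.0) -> float:
    """Pitch period in samples at sampling rate fs_hz (8 kHz assumed here)."""
    return fs_hz / f0_hz


# Averages and range limits from the table above.
average_male_ms = pitch_period_ms(132.0)
average_female_ms = pitch_period_ms(223.0)
longest_lag = pitch_period_samples(50.0)    # lowest F0 in the table
shortest_lag = pitch_period_samples(500.0)  # highest F0 in the table
```

At 8 kHz the F0 range in the table spans pitch lags from 16 to 160 samples, which is the kind of lag range a long-term (pitch) predictor has to cover.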


Acoustic Analysis of Consonants

• Stop Consonants
– Momentary blockage of the vocal tract (50-100 ms): closure phase
– Release burst (shortest acoustic event)
– Voice onset time (VOT)
• Fricatives
– Narrow constriction somewhere in the vocal tract
– Turbulent airflow through the constriction


The International Phonetic Alphabet (IPA)


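Universal Speech Production Model

[Block diagram: an Impulse Train Generator followed by a Glottal Pulse Model and a Voiced Gain forms the voiced branch; a White Noise Generator followed by an Unvoiced Gain forms the unvoiced branch; a Voiced or Unvoiced switch selects the excitation for the Vocal Tract Filter, which is followed by the Radiation Model to give the output speech.]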


Vocal Tract Model

• Time-varying all-pole linear filter excited by a source signal.
• H(z) models the vocal tract system.

e[n] → H(z) = 1/A(z) → s[n]

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{P} a_i z^{-i}}$$
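In the time domain, an all-pole filter of this kind is a recursion on past outputs. The sketch below assumes the sign convention A(z) = 1 − Σ aᵢ z⁻ⁱ, so that s[n] = e[n] + Σ aᵢ s[n−i]; the coefficient values and function names are hypothetical, chosen only to give a stable second-order example.

```python
def all_pole_filter(excitation, a):
    """Synthesize s[n] = e[n] + sum_{i=1..P} a[i-1] * s[n-i], zero initial state."""
    s = []
    for n, e_n in enumerate(excitation):
        acc = e_n
        for i, a_i in enumerate(a, start=1):
            if n - i >= 0:
                acc += a_i * s[n - i]
        s.append(acc)
    return s


def impulse_train(length, period):
    """Unit impulses every `period` samples: a crude voiced excitation."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


# Hypothetical 2nd-order filter: A(z) = 1 - 1.2 z^-1 + 0.81 z^-2,
# whose poles lie at radius 0.9, inside the unit circle (stable).
a = [1.2, -0.81]
speech = all_pole_filter(impulse_train(160, 80), a)
```

Each impulse excites a decaying resonance, so the output repeats at the pitch period while its spectral shape is set by the filter coefficients.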


[Figure: Voiced Speech Spectrum. Magnitude (dB), -100 to 80, versus frequency (Hz), 0 to 4000.]

[Figure: Superimposed 2nd-order LP Envelope. Same axes.]

[Figure: Superimposed 2nd, 6th order LP Envelopes. Same axes.]

[Figure: Superimposed 2nd, 6th, & 10th order LP Envelopes. Same axes.]

[Figure: Superimposed 2nd, 6th, 10th & 16th order LP Envelopes. Same axes.]
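LP envelopes of increasing order can be obtained from a frame's autocorrelation with the Levinson-Durbin recursion. The following is a minimal sketch under stated assumptions: the autocorrelation method is one common choice (the lecture does not say which method produced its plots), the predictor convention is x̂[n] = Σ aᵢ x[n−i], and the frame is synthetic stand-in data rather than the lecture's signal.

```python
import cmath
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[k] * x[k + lag] for k in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err): predictor coefficients a[0..order-1] and residual energy."""
    a = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[i] * r[m - 1 - i] for i in range(m - 1))
        k = acc / err  # reflection coefficient of stage m
        a = [a[i] - k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= (1.0 - k * k)
    return a, err


def lp_envelope_db(a, gain, freq_hz, fs_hz=8000.0):
    """Magnitude of gain / A(e^{jw}) in dB, with A(z) = 1 - sum a_i z^-i."""
    w = 2.0 * math.pi * freq_hz / fs_hz
    A = 1.0 - sum(a_i * cmath.exp(-1j * w * (i + 1)) for i, a_i in enumerate(a))
    return 20.0 * math.log10(gain / abs(A))


# Synthetic frame (hypothetical data): two tones plus a little noise.
rng = random.Random(0)
frame = [math.sin(2 * math.pi * 500 * n / 8000)
         + 0.5 * math.sin(2 * math.pi * 1500 * n / 8000)
         + 0.05 * rng.gauss(0.0, 1.0) for n in range(240)]
r = autocorrelation(frame, 16)
errors = {p: levinson_durbin(r, p)[1] for p in (2, 6, 10, 16)}
```

The residual energy cannot increase as the order grows, which is why higher-order envelopes follow the spectrum more closely; the cost is more coefficients to quantize and transmit.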


[Figure: Unvoiced Speech and 10th-order LP Residual. Two panels, amplitude versus time (0 to 40 ms).]

[Figure: Voiced Speech and 10th-order LP Residual. Two panels, amplitude versus time (0 to 40 ms).]

• Short-term correlation

• Long-term correlation
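The two kinds of correlation can be illustrated with a toy signal: inverse filtering with A(z) removes the short-term (sample-to-sample) correlation, while the pitch-lag correlation survives in the residual. This is a sketch on synthetic data, assuming the convention e[n] = s[n] − Σ aᵢ s[n−i]; the coefficients, pitch period and function names are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole synthesis: s[n] = e[n] + sum a[i-1] * s[n-i]."""
    s = []
    for n, e_n in enumerate(excitation):
        s.append(e_n + sum(a_i * s[n - i]
                           for i, a_i in enumerate(a, start=1) if n - i >= 0))
    return s


def lp_residual(speech, a):
    """Inverse filter: e[n] = s[n] - sum a[i-1] * s[n-i]."""
    return [speech[n] - sum(a_i * speech[n - i]
                            for i, a_i in enumerate(a, start=1) if n - i >= 0)
            for n in range(len(speech))]


def normalized_autocorr(x, lag):
    """Autocorrelation at `lag`, normalized by the energy of x."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / sum(v * v for v in x)


PITCH_PERIOD = 64    # hypothetical pitch lag in samples
a = [1.2, -0.81]     # hypothetical stable LP coefficients
excitation = [1.0 if n % PITCH_PERIOD == 0 else 0.0 for n in range(256)]
speech = synthesize(excitation, a)
residual = lp_residual(speech, a)

short_term_before = normalized_autocorr(speech, 1)
short_term_after = normalized_autocorr(residual, 1)
long_term_after = normalized_autocorr(residual, PITCH_PERIOD)
```

In this toy case the residual is the original impulse train: its lag-1 correlation is essentially zero, but its correlation at the pitch lag stays high, which is what a long-term predictor exploits.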


Speech Coding


Coding Rates

• For telephone band (or narrowband) speech:
– Signal bandwidth: 300-3400 Hz
– Sampling rate: 8000 Hz
– Resolution: 16 bits/sample linear PCM
• Uncompressed bit rate: 16 bits/sample × 8000 samples/s = 128 kbit/s
• What is the minimum coding rate for transmitting the message information?


Coder Classes according to Bit-Rate

B > 16 kbps | High bit rate coders
4 < B ≤ 16 kbps | Medium bit rate coders
1 < B ≤ 4 kbps | Low bit rate coders
B ≤ 1 kbps | Very low bit rate coders
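The classification above amounts to a threshold lookup. A minimal sketch follows; treating exactly 1 kbps as "very low" is an assumption made here to leave no gap, and the function name is illustrative.

```python
def coder_class(bit_rate_kbps: float) -> str:
    """Map a bit rate in kbps to the coder class used in the lecture."""
    if bit_rate_kbps > 16:
        return "high bit rate"
    if bit_rate_kbps > 4:
        return "medium bit rate"
    if bit_rate_kbps > 1:
        return "low bit rate"
    return "very low bit rate"  # assumption: B = 1 kbps counts as very low


# Bit rates of a few standards named in this lecture.
examples = {
    "G.711 (64 kbps)": coder_class(64),
    "G.728 (16 kbps)": coder_class(16),
    "G.729 (8 kbps)": coder_class(8),
    "FS-1015 (2.4 kbps)": coder_class(2.4),
}
```

Note that 16 kbps falls in the medium class because the upper bound of that class is inclusive.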


Standards Organizations

• ITU-T: International Telecommunication Union, Telecommunication Standardization Sector (UN)
• MPEG: Moving Picture Experts Group (ISO/IEC)
• INMARSAT: International Maritime Satellite Organization, for geosynchronous satellites
• US Government (DoD) and NATO
• TIA: Telecommunications Industry Association, for North American telecom standards
• ETSI: European Telecommunications Standards Institute


Speech Coding Standards

Name | Coding Type | Bit-rate (kbps) | Organization | Year
G.711/G.712 | PCM µ-law/A-law | 64 | ITU-T | 1972
G.721/G.723/G.726/G.727 | ADPCM | 32/24/40/16 | ITU-T | 1984/86/88/90
G.728 | LD-CELP | 16 | ITU-T | 1992
G.729 | CS-ACELP | 8.0 | ITU-T | 1995
G.723.1 | ACELP | 6.3/5.3 | ITU-T | 1995
G.722 (Wideband) | SB-ADPCM | 48/56/64 | ITU-T | 1985
G.722.1 (Wideband) | Transform | 24/32 | ITU-T | 1999
Inmarsat | IMBE | 4.15 | INMARSAT | 1990
IS-54 (old) | VSELP | 7.95 | TIA | 1992
GSM-FR | RPE-LTP | 13 | GSM | 1991
GSM-HR | CELP | 5-6 | GSM | 1994
GSM-EFR | CELP | 12.2 | GSM | 1997
IS-641 (new) | ACELP | 7.4 | TIA | 1997
Iridium | AMBE | 2.4 | Iridium | 1996
MPEG-4 | HVXC | 2-4 | MPEG/ISO | 1999
MPEG-4 | CELP | 4-24 | MPEG/ISO | 1999
FS-1015 | LPC-10 | 2.4 | US-DoD/NATO | 1984
FS-1016 | CELP | 4.8 | US-DoD/NATO | 1989
MELP | MELP | 2.4 | US-DoD/NATO | 1996


Coding Methodologies

– Waveform coding
– Vocoding or parametric coding
– Hybrid coding


Classes according to Coding Type

[Figure: speech quality (Poor, Fair, Good, Excellent) versus bit rate (1, 2, 4, 8, 16, 32, 64 kbps), with regions for parametric coders, hybrid coders, and waveform-approximating coders.]

Coding Standards

[Figure: the same quality versus bit rate plot with individual standards marked: Linear PCM, G.711, G.726, G.728, G.729, G.723.1, GSM FR, GSM EFR, GSM/2, IS96, FS1015, and MELP.]


PCM Coding

[Block diagram: x[n] → Q[.] → x’[n], with quantizer index i[n].]

• Instantaneous, non-uniform quantization
• For signals with time-varying energy, e.g. speech, uniform quantization is inefficient.
• With a uniform quantizer, if the signal amplitude is halved, SQNR falls by 6 dB.
• SQNR is independent of signal level in a logarithmic quantizer.
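The contrast between uniform and logarithmic quantization can be checked numerically. The sketch below measures SQNR for a sine wave at two amplitudes with an 8-bit uniform quantizer and with a µ-law companded quantizer (µ = 255). It uses the continuous µ-law formula rather than the segmented G.711 tables, and the test signal and function names are assumptions for illustration only.

```python
import math


def uniform_quantize(x, bits, x_max=1.0):
    """Mid-rise uniform quantizer on [-x_max, x_max] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 * x_max / levels
    index = math.floor(x / step)
    index = max(-levels // 2, min(levels // 2 - 1, index))  # clip to range
    return (index + 0.5) * step


def mu_law_quantize(x, bits, mu=255.0):
    """Compress with the mu-law curve, quantize uniformly, then expand."""
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    yq = uniform_quantize(y, bits)
    return math.copysign(math.expm1(abs(yq) * math.log1p(mu)) / mu, yq)


def sqnr_db(signal, quantizer):
    """Signal-to-quantization-noise ratio in dB."""
    noise = sum((s - quantizer(s)) ** 2 for s in signal)
    power = sum(s * s for s in signal)
    return 10.0 * math.log10(power / noise)


def sine(amplitude, n=8000, cycles=123.4):
    """Test tone (hypothetical stand-in for a speech segment)."""
    return [amplitude * math.sin(2 * math.pi * cycles * k / n) for k in range(n)]


loud, quiet = sine(0.5), sine(0.25)  # amplitude halved
uniform_drop = (sqnr_db(loud, lambda s: uniform_quantize(s, 8))
                - sqnr_db(quiet, lambda s: uniform_quantize(s, 8)))
mu_law_drop = (sqnr_db(loud, lambda s: mu_law_quantize(s, 8))
               - sqnr_db(quiet, lambda s: mu_law_quantize(s, 8)))
```

With the uniform quantizer the noise power stays fixed, so halving the amplitude (a quarter of the energy) costs about 6 dB of SQNR; with µ-law the step size shrinks with the signal and the SQNR barely moves.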


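ADPCM Coding

[Block diagram, encoder: the prediction x’[n] from predictor P is subtracted from the input x[n] to give the difference d[n], which the quantizer Q[.] turns into the code c[n]; the quantized difference d’[n] is added to x’[n] to give the reconstruction x”[n], which feeds P.]

[Block diagram, decoder: the quantized difference d’[n] recovered from c[n] is added to the prediction x’[n] from predictor P to give the output x”[n], which also feeds P.]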
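The encoder and decoder loops can be sketched in a few lines. This is plain DPCM with a fixed first-order predictor and a fixed step size; the adaptive quantizer and adaptive predictor that make it ADPCM are deliberately left out, and the predictor coefficient, step size and test signal are hypothetical.

```python
import math


def dpcm_encode(x, step, predictor_coeff=0.9):
    """Quantize the prediction error; predict from the reconstructed signal."""
    codes = []
    prediction = 0.0                            # x'[n]
    for sample in x:
        d = sample - prediction                 # d[n] = x[n] - x'[n]
        c = round(d / step)                     # c[n], integer code
        d_q = c * step                          # d'[n], quantized difference
        reconstruction = prediction + d_q       # x''[n]
        prediction = predictor_coeff * reconstruction  # x'[n+1] from P
        codes.append(c)
    return codes


def dpcm_decode(codes, step, predictor_coeff=0.9):
    """Mirror of the encoder's reconstruction loop."""
    out = []
    prediction = 0.0
    for c in codes:
        reconstruction = prediction + c * step  # x''[n] = x'[n] + d'[n]
        out.append(reconstruction)
        prediction = predictor_coeff * reconstruction
    return out


STEP = 0.02
x = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(160)]
codes = dpcm_encode(x, STEP)
decoded = dpcm_decode(codes, STEP)
max_error = max(abs(a - b) for a, b in zip(x, decoded))
largest_code = max(abs(c) for c in codes)
```

Because the encoder predicts from its own reconstruction, the decoder tracks it exactly and the error stays within half a step. The codes also span far fewer levels than quantizing x[n] directly with the same step would need.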


Prediction in the context of Coding

[Figure: two panels, amplitude versus time (0 to 20 ms), showing a signal and its first-difference signal.]
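The benefit of coding the first difference instead of the signal depends on how strongly adjacent samples are correlated. The sketch below compares signal energy with first-difference energy for a low and a high frequency tone at 8 kHz sampling; the tones are hypothetical test signals and the function names are illustrative.

```python
import math


def first_difference(x):
    """d[n] = x[n] - x[n-1], taking x[-1] = 0."""
    return [x[0]] + [x[n] - x[n - 1] for n in range(1, len(x))]


def energy(x):
    return sum(v * v for v in x)


def prediction_gain_db(x):
    """Energy of the signal over energy of its first difference, in dB."""
    return 10.0 * math.log10(energy(x) / energy(first_difference(x)))


def tone(freq_hz, fs_hz=8000.0, n=800):
    return [math.sin(2 * math.pi * freq_hz * k / fs_hz) for k in range(n)]


gain_low = prediction_gain_db(tone(200.0))    # slowly varying signal
gain_high = prediction_gain_db(tone(3000.0))  # rapidly varying signal
```

The slowly varying tone gives a large positive gain, so its difference signal needs fewer bits for the same noise level. The 3 kHz tone gives a negative gain, which is one reason a fixed first-difference predictor is not always a win and adaptive prediction helps.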


ADPCM Coding

• DPCM with fixed predictor can give 4-11 dB improvement over PCM.
• PCM with adaptive quantization can give ~5 dB improvement over µ-law non-adaptive PCM.
• DPCM with adaptive prediction can give 10-12 dB improvement over fixed predictor.


Code Excited Linear Prediction (CELP) Coding

• Most coders in the 4.8-16 kbps range are based on Linear Prediction Analysis-by-Synthesis (LPAS) coding.

• CELP belongs to the LPAS paradigm of speech coding.


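Generic Linear Prediction Analysis-by-Synthesis (LPAS) Coder

[Block diagram: LP Analysis of the input speech sets the Synthesis Filter; the Excitation Generator drives the Synthesis Filter; the synthesized signal is subtracted from the input speech, and the Error Minimization block uses the difference to select the excitation.]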
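The analysis-by-synthesis loop can be sketched as an exhaustive codebook search: each candidate excitation is passed through the synthesis filter, scaled by its best gain, and compared with the target. This is a bare-bones illustration; perceptual weighting, the adaptive (pitch) codebook and filter memory handling used in real CELP coders are omitted, and the codebook and coefficients are hypothetical.

```python
def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z): s[n] = e[n] + sum a[i-1] * s[n-i]."""
    s = []
    for n, e_n in enumerate(excitation):
        s.append(e_n + sum(a_i * s[n - i]
                           for i, a_i in enumerate(a, start=1) if n - i >= 0))
    return s


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the codevector that best matches target."""
    best = None
    for index, codevector in enumerate(codebook):
        y = synthesize(codevector, a)
        y_energy = sum(v * v for v in y)
        if y_energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        gain = sum(t * v for t, v in zip(target, y)) / y_energy  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical LP coefficients and a tiny 3-entry codebook.
a = [1.2, -0.81]
codebook = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, -1, 0, 0, 0],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
# Target built from entry 1 with gain 0.5, so the search should recover both.
target = [0.5 * v for v in synthesize(codebook[1], a)]
best_index, best_gain, best_error = search_codebook(target, codebook, a)
```

Only the winning index and gain need to be transmitted, together with the LP parameters. The cost is that every candidate must be synthesized at the encoder, which is the main source of CELP's computational load.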


CELP Decoder

[Block diagram: the excitation parameters drive the Excitation Generator, whose output passes through the filter G/A(z), set by the LP and gain parameters, to give the synthesized speech.]


Speech Quality Measurement

• Speech Quality
– Objective measures
  • Segmental SNR
  • Itakura-Saito distance measure
  • Spectral distortion (SD)
  • ITU-T P.862 Recommendation
– Subjective measures
  • Mean opinion score (MOS)
  • Diagnostic Rhyme Test (DRT)
  • Diagnostic Acceptability Measure (DAM)
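Of the objective measures listed, segmental SNR is the simplest to compute: the SNR is evaluated frame by frame in dB and then averaged. The sketch below assumes 160-sample frames (20 ms at 8 kHz) and skips silent or perfectly coded frames; practical implementations often clamp per-frame values to a fixed range instead, and the names here are illustrative.

```python
import math


def segmental_snr_db(reference, decoded, frame_len=160):
    """Mean of per-frame SNRs in dB over non-overlapping frames."""
    frame_snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        dec = decoded[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - d) ** 2 for r, d in zip(ref, dec))
        if signal == 0.0 or noise == 0.0:
            continue  # assumption: skip silent or error-free frames
        frame_snrs.append(10.0 * math.log10(signal / noise))
    if not frame_snrs:
        raise ValueError("no usable frames")
    return sum(frame_snrs) / len(frame_snrs)


# Hypothetical check: a 'decoder' that scales the input by 0.9 leaves an
# error of 10% of the amplitude, i.e. 20 dB SNR in every frame.
reference = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(320)]
decoded = [0.9 * v for v in reference]
seg_snr = segmental_snr_db(reference, decoded)
```

Averaging in the dB domain gives quiet frames the same weight as loud ones, which a single SNR over the whole utterance does not.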


Absolute Category Rating Tests (MOS)

• Listening quality scale

Excellent | 5
Good | 4
Fair | 3
Poor | 2
Bad | 1


Diagnostic Rhyme Test

• Measures speech intelligibility
• Listeners are presented with one of two words which differ only in the leading consonant
– Examples:
  • Meet - Beat
  • Than - Dan
  • Met - Net
  • Jest - Guest
• Total possible pairs = 96
• Intelligibility score, S, is given by:

$$S = 100 \times \frac{N(\text{correct}) - N(\text{incorrect})}{N(\text{test pairs})}$$

Coder | Rate (kbps) | DRT | MOS
FS1016 | 4.8 | 91.7 | 3.3
G.728 | 16 | 93.0 | 3.9
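The intelligibility score can be computed directly from the response counts. A minimal sketch follows, using S = 100 × (N(correct) − N(incorrect)) / N(test pairs); the counts in the example are hypothetical, not results from the lecture.

```python
def drt_score(n_correct: int, n_incorrect: int, n_pairs: int = 96) -> float:
    """DRT score S = 100 * (N_correct - N_incorrect) / N_pairs."""
    if n_pairs <= 0:
        raise ValueError("n_pairs must be positive")
    return 100.0 * (n_correct - n_incorrect) / n_pairs


perfect = drt_score(96, 0)   # every pair identified correctly
guessing = drt_score(48, 48) # as many wrong as right
example = drt_score(90, 6)   # hypothetical listener result
```

Subtracting the incorrect responses corrects for guessing: a listener who picks at random between the two words scores about 0 rather than 50.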


Perceptual evaluation of speech quality (PESQ)

• Part of the ITU-T P.862 standard
• Objective is to mimic sound perception by persons in real life
• PESQ simulates experiments in which subjects judge speech quality
• Physical signals are mapped to psychophysical representations that match internal representations in the head


Speech Coder Complexity Issues

• Complexity
– Computational complexity
  • Simplex/half-duplex/full-duplex real-time performance on a single DSP
  • Fixed point vs. floating point
  • CELP coders are computationally complex
– Memory requirement
  • Storage of look-up tables, codebooks etc.


Timing Diagram for various Coding Delays

[Figure: timing diagram against time (frame index, 0 to 5). Rows, from top to bottom: buffer input speech frames 1 to 5; encode frames 1 to 4; transmit bits of frames 1 to 3; decode frames; play back decoded speech frames 1 and 2. Marked intervals: algorithmic buffering delay, encoder processing delay, bit transmission delay, decoder processing delay, the total processing delay (the sum of the two processing delays), and the total one-way coding delay.]
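The delay components in the diagram add up to the total one-way coding delay. The sketch below assumes that "the two" delays summed into the total processing delay are the encoder and decoder processing delays, and the millisecond values are hypothetical, chosen only to show the arithmetic.

```python
def total_processing_delay_ms(encoder_ms: float, decoder_ms: float) -> float:
    """Assumed meaning of 'sum of the two': encoder plus decoder processing."""
    return encoder_ms + decoder_ms


def one_way_coding_delay_ms(buffering_ms: float, encoder_ms: float,
                            transmission_ms: float, decoder_ms: float) -> float:
    """Buffering + processing (encoder and decoder) + bit transmission."""
    return (buffering_ms
            + total_processing_delay_ms(encoder_ms, decoder_ms)
            + transmission_ms)


# Hypothetical numbers for a coder with 20 ms frames.
delay = one_way_coding_delay_ms(buffering_ms=20.0, encoder_ms=10.0,
                                transmission_ms=20.0, decoder_ms=5.0)
```

The buffering term is fixed by the frame length (plus any look-ahead) regardless of processor speed, whereas the processing terms shrink on faster hardware.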


Thank You!