[ieee 2010 data compression conference - snowbird, ut, usa (2010.03.24-2010.03.26)] 2010 data...



Analysis of Amplitude Quantization in ACELP Excitation Coding

Wisarn Patchoo¹, Thomas R. Fischer¹, Changho Ahn², and Sangwon Kang²

¹School of EECS, Washington State University, Pullman, WA 99163-2752, USA
²School of Elec. Eng. and Comp. Science, Hanyang Univ., Ansan, Korea

Algebraic Code-Excited Linear Prediction (ACELP) is a popular linear prediction speech coding algorithm that provides good performance with reasonable implementation complexity, and requires low transmission bit rate (e.g., [1]). The excitation sequence is formed as the sum of two quantized excitations: an adaptive (pitch) codebook excitation and an algebraic (fixed) codebook excitation. Algebraic codevectors are sparse. A sub-frame of length $L$ is partitioned into $K$ interleaved tracks of pulse positions, and $m$ total non-zero pulse positions are partitioned into $m_k$ non-zero pulse positions in track $k$, with $\sum_{k=0}^{K-1} m_k = m$. The algebraic codebook is a product code of the form $g\mathbf{c}$, where $g$ is a (quantized) sub-frame gain, and $\mathbf{c}$ is the sparse, $L$-dimensional codevector with $m$ non-zero amplitudes restricted to the values $\pm 1$. The algebraic codevector is used to excite a synthesis filter with impulse response $h(n)$, forming the fixed-codebook excitation contribution to the synthesized speech sub-frame, $gH\mathbf{c}$, where $H$ is a lower-triangular matrix formed from $h(n)$ [1].
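The track structure and the filtered contribution $gH\mathbf{c}$ described above can be illustrated with a minimal Python sketch. The sub-frame length, track count, impulse response, gain, and the helper names (`track_positions`, `build_codevector`, `impulse_response_matrix`) are illustrative assumptions, as is the interleaving rule that track $k$ holds positions $k, k+K, k+2K, \ldots$; none of these values are taken from the paper or from G.729.1.

```python
def track_positions(L, K):
    """Assumed interleaving: track k holds positions k, k+K, k+2K, ..."""
    return [list(range(k, L, K)) for k in range(K)]


def build_codevector(L, pulses):
    """Sparse L-dimensional codevector from (position, sign) pairs."""
    c = [0] * L
    for pos, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("pulse amplitudes are restricted to +1 or -1")
        c[pos] += sign  # pulses that share a position add together
    return c


def impulse_response_matrix(h, L):
    """Lower-triangular Toeplitz matrix with H[n][j] = h[n - j] for n >= j."""
    return [[h[n - j] if 0 <= n - j < len(h) else 0.0 for j in range(L)]
            for n in range(L)]


def matvec(A, v):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical toy parameters (not from the paper).
L, K = 8, 4
tracks = track_positions(L, K)
# m = 2 pulses: +1 at the second position of track 0, -1 at the first of track 2.
pulses = [(tracks[0][1], +1), (tracks[2][0], -1)]
c = build_codevector(L, pulses)
h = [1.0, 0.5, 0.25]   # assumed short synthesis-filter impulse response
H = impulse_response_matrix(h, L)
g = 2.0                # assumed sub-frame gain
contribution = [g * y for y in matvec(H, c)]   # the vector g*H*c
```

Because $H$ is lower triangular and Toeplitz, multiplying by it is the same as convolving the sparse codevector with $h(n)$ and truncating to the sub-frame, so each pulse simply contributes a shifted, signed copy of the impulse response.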

In this paper we study the coding performance advantages possible by using the optimum ACELP codevector amplitudes compared to the ACELP codebook. Denote the $m$ non-zero positions in the codevector as $0 \le i_0 \le \cdots \le i_{m-1} \le L$. Combining the sub-frame gain, $g$, with the codevector amplitudes $\mathbf{c}$, and since the remaining $L-m$ dimensions of $\mathbf{c}$ are zero, an optimum algebraic codevector must minimize $\|\mathbf{x} - H\mathbf{c}\|^2 = \|\mathbf{x} - \tilde{H}\tilde{\mathbf{c}}\|^2$, where $\mathbf{x}$ is a target signal, $\tilde{H} = [\mathbf{h}_{i_0}\ \mathbf{h}_{i_1}\ \cdots\ \mathbf{h}_{i_{m-1}}]$ is an $L$ row by $m$ column matrix with $\mathbf{h}_{i_j}$ the $i_j$-th column of impulse response matrix $H$, $\tilde{\mathbf{c}}$ collects the $m$ non-zero amplitudes of $\mathbf{c}$, and $\mathbf{c} = \sum_{j=0}^{m-1} \mathbf{c}_{i_j}$ is the sum of the $m$ interleaved algebraic codevectors. The optimum (unquantized) non-zero pulse amplitudes are $\mathbf{c}_{opt} = (\tilde{H}^T \tilde{H})^{-1} \tilde{H}^T \mathbf{x}$, provided $\tilde{H}^T \tilde{H}$ is invertible. The empirical density of the normalized non-zero pulse amplitudes of $\mathbf{c}_{opt}$ is symmetric and bimodal. Rate-distortion analysis of the quantization of such random variables indicates that at the (typical) ACELP encoding rate of 1 bit per amplitude, and for the squared-error distortion measure, simple uniform scalar quantization is optimum. At rates larger than 1 bit per pulse amplitude some increase in SNR is possible, but the increase in SNR yields only modest improvement in perceptual quality, as measured by mean opinion score.

[1] ITU-T G.729.1, "G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable with G.729," May 2006.
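The least-squares amplitudes given in the abstract can be sketched in pure Python as follows. This is only an illustration under stated assumptions: the impulse response, pulse positions, and target vector are hypothetical toy values, the helper names are invented for this example, and the sign-plus-shared-gain rule in `one_bit_amplitudes` is a simple stand-in for 1-bit amplitude coding, not the quantizer analyzed by the authors.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("normal-equation matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def selected_columns(h, L, positions):
    """Columns of the lower-triangular H at the pulse positions."""
    return [[h[n - i] if 0 <= n - i < len(h) else 0.0 for n in range(L)]
            for i in positions]


def optimum_amplitudes(cols, x):
    """Least-squares amplitudes from the normal equations."""
    gram = [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]
    rhs = [sum(a * b for a, b in zip(u, x)) for u in cols]
    return solve(gram, rhs)


def squared_error(cols, amps, x):
    """Squared error between x and the weighted sum of the columns."""
    y = [sum(a * col[n] for a, col in zip(amps, cols)) for n in range(len(x))]
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def one_bit_amplitudes(cols, c_opt, x):
    """Assumed 1-bit rule: keep only the signs, share one least-squares gain."""
    signs = [1.0 if a >= 0 else -1.0 for a in c_opt]
    y = [sum(s * col[n] for s, col in zip(signs, cols)) for n in range(len(x))]
    gain = sum(xi * yi for xi, yi in zip(x, y)) / sum(v * v for v in y)
    return [gain * s for s in signs]


# Hypothetical toy data: x is built so the exact amplitudes are -3 and +1.
L = 8
h = [1.0, 0.5, 0.25]
positions = [2, 4]
x = [0.0, 0.0, -3.0, -1.5, 0.25, 0.5, 0.25, 0.0]
cols = selected_columns(h, L, positions)
c_opt = optimum_amplitudes(cols, x)
c_1bit = one_bit_amplitudes(cols, c_opt, x)
err_opt = squared_error(cols, c_opt, x)
err_1bit = squared_error(cols, c_1bit, x)
```

In this toy case the unquantized amplitudes reproduce the target essentially exactly, while the sign-only amplitudes with a shared gain leave a residual error. That gap is the kind of difference the abstract weighs against the extra bits needed to encode amplitudes more finely.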

2010 Data Compression Conference

1068-0314/10 $26.00 © 2010 IEEE

DOI 10.1109/DCC.2010.52

550