super-wideband fine spectrum quantization for low-rate ...mdct coding mode of the 3gpp evs codec...

23
Super-Wideband Fine Spectrum Quantization for Low-rate High-Quality MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15 th Dec 2015

Upload: others

Post on 25-Sep-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

Super-Wideband Fine Spectrum

Quantization for Low-rate High-Quality

MDCT Coding Mode of The 3GPP EVS

Codec

Presented by

Srikanth Nagisetty, Hiroyuki Ehara

15th Dec 2015

Page 2: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

2

Topics of this Presentation

Background

EVS Encoder Overview

LR-MDCT Coder

Envelope Coding

Spectrum Coding

Bit allocation

Gap Filling

Evaluation Results

Summary

Page 3: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

3

Background

Challenge:

Encoding SWB band signal at low delay & low bit-rate

Mixed contents and music sampled at 32kHz

Frame length : 20 ms

Too few bits for quantizing SWB spectrum (quantizing 560 bins using around 256 bits)

Highly efficient quantization algorithm is needed

Page 4: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

4

EVS Encoder Overview

The Low Rate High Quality (LR-HQ) MDCT coding is

one of the mode in the EVS MDCT coder.

Speech Signals

Mixed & Music Signals

Page 5: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

5

LR-HQ MDCT SWB Encoder (1/2)

Signal Classification Spectrum Coding

Advanced bit allocation

Quantized LF information

for gap filling

Harmonic Normal

Transient Huffman Coding

Energy Coding

Page 6: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

6

LR-HQ MDCT SWB Coder -Envelope coding

1,...,0,)(log)(

1)(

22

)(

bands

bkk

kk

MM NbEpsilonkXbE

end

bstart

Band Energies

1,...,0,))(

()(int

bandsM

M Nbq

bEroundbI

Scaled Energies

1,...,1),1()()(

)()0()0(int

bandsMMM

refMM

NbbIbIbI

q

IroundII

Differential Indices

Selection based on 1. Range of indices 2. Bits consumption

Page 7: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

7

HF bands for Quantization

Spectral Peak Tracking

Reduces the width for the current frame - Bit savings - Inter-frame Continuity

Temporal discontinuity -If the current frame HF spectrum

peak is quantized while the previous frame is not quantized

-Perceivable degradation

n HF Bands

Energy 𝑃𝑒𝑎𝑘

𝐴𝑣𝑒𝑟𝑎𝑔𝑒 − 𝐸𝑛𝑒𝑟𝑔𝑦 Average-Energy

Non-Zero Non-Zero

Spectral Peak Tracking

Bit allocation

Bit allocation

Page 8: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

8

LR-HQ MDCT Coder -Bit allocation (1/2)

Bits(1)>T1

Bands(1)

Bands(2)

Bands(n-1)

Bands(n) Bits(n)

Bits(n-1)

Bits(1)

Bits(2)

Bit

allocation

Result

Iterative

: :

Bits(2)>T2

Bits(n-1)>Tn-1

Bits(n)>Tn

: Dynamic

Bit

allocation

Combination

Logic

Dynamic Bit allocation

Bands with bits < Threshold identified and relocates bits to other bands

Page 9: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

9

LR-HQ MDCT Coder -Bit allocation (2/2)

Adaptive bit allocation based on Grouping

Adaptive

Grouping

Bit

allocation

to Group

Bands(1)

Bands(2)

Bands(n-1)

Bands(n)

Group(k)

Group(1)

Group(2)

Bit

allocation

Results : : Dynamic Bit

allocation to

bands in a

Group

Iterative: Identify Next Dominant Group

Identify

Dominant

Group

Bit allocation

Bits allocated 1) Adaptively grouping the bands and 2) By exploiting the relationship between the groups. This approach is more suitable for tonal (Harmonic) like signals as the energy of the bands is mainly concentrated at discrete tones.

Identifies dominant bands (local max) and its valley to form groups Group energies

Group energy and Variance

Page 10: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

10

LR-HQ MDCT Coder -Gap Filling

2 0 4 10 8 14 kHz 12 6

19 20 21 22

TCQ/USQ is performed on Non Zero bands

Noise-filling is performed on 0-bit-allocated LF bands

Zero-bit bands cause spectral gaps, which lead to audible

artifacts if left alone. Gap filling technique is used.

Spectral Gaps

Page 11: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

11

LR-HQ MDCT Coder -Gap Filling Normal Mode

2 0 4 10 8 14 kHz 12 6

19 20 21 22

… TCQ/USQ is performed on Non Zero bands

Noise-filling is performed on 0-bit-allocated LF bands

Best match spectrum filled in the High Frequency

Normalized LF

Page 12: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

12

LR-HQ MDCT Coder -Gap Filling Harmonic Mode

2 0 4 10 8 14 kHz 12 6

19 20 21 22

TCQ/USQ is performed on Non Zero bands

Best match spectrum filled in the High Frequency

Noise-filling is performed on 0-bit-allocated LF bands

Mirroring

Gap Filled Spectrum

1.Harmonic relation not maintained b/w reproduced HF and input HF

2.Roughness perceived during reproduction @ the boundary regions

Normalized LF

Page 13: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

13

f(n)

Synthesized LF spectrum Replicated HF spectrum

Last spectral peak in LF

freq

f(n)

Last spectral peak in LF

HarmonicEst HarmonicEstHarmonicEst

1st spectral peak in HF

3rd spectral peak in HF

1st spectral peak in HF

2nd spectral peak in HF

3rd spectral peak in HF

LR-HQ MDCT Coder -Spectral Correction Roughness perceived when

2 tones fall in 30 - 600 Hz and amplitudes are rapidly changing

freq

Spectral correction using Harmonic transposition.

This method retains both the fine structure of the spectrum and the harmonic relationships between the low frequency tones and the replicated high frequency tones

Page 14: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

14

LR-HQ MDCT Coder - Listening Test Setup

Degradation Category Rating (DCR) methodology (ITU-T P.800) : 24 Mixed and Music samples recorded in Japanese language 16 Japanese naïve listeners Codecs EVS SWB LR-HQ mode Reference Codec: AMR-WB+

Page 15: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

15

LR-HQ MDCT Coder - Evaluation Results

LR-HQ SWB Performance is equal or greater than AMR-WB+ whose algorithmic delay is longer than twice of EVS (32ms)

Page 16: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

16

Summary

For encoding the SWB spectral coefficients at low bit budget

Spectral band energies are quantized using an efficient Huffman coding methods

Advanced bit allocation methods are used for efficient representation of spectrum.

Spectral holes in the full spectrum coding is filled using gap filling techniques

Gap-filling techniques are improved by introducing a fine spectrum normalization and adaptive sparse BWE coding

Conclusion: EVS LR-HQ SWB coder meets the performance requirements and is adopted as a part of multi-mode MDCT coding in the EVS codec.

Page 17: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

17

References

1. 3GPP TR 22.813 - “Study of Use Cases and requirements for enhanced voice codecs for the EPS”, v.10.0.0, Mar 2010.

2. 3GPP TS 26.090 - “Mandatory speech codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; General description Speech”, v.11.0.0, Sep.2012.

3. 3GPP TS 26.190 - “Speech codec speech processing functions; Adaptive Multi-Rate-Wide-band (AMR-WB) speech codec; General description”, v.11.0.0, Oct.2012.

4. 3GPP TS 26.445 - “Codec for Enhanced Voice Services (EVS); Detailed Algorithmic Description”, v.12.0.0), Sep. 2014.

5. S. Bruhn, et al., “Standardization of the new EVS Codec”, IEEE ICASSP, April 2015. 6. S. Nagisetty, et al., “Low Bitrate High Quality MDCT Audio coding of the 3GPP EVS”, IEEE ICASSP 2015. 7. ITU-T G.719, “Low complexity, full band audio coding for high quality, conversational applications”, ITU-T

Recommendation G.719, June 2008. 8. T. R. Fischer, H. Sung, J. Zhan, E. Oh, “High-quality audio transform coded excitation using trellis codes,” ICASSP,

2008, pp.197-200. 9. ITU-T G.718 Annex B, “Superwideband scalable extension for ITU-T G.718”, ITU-T Recommendation G.718

Amendment 2, March, 2010. 10. L. Laaksonen et al., “Superwideband extension of G.718 and G.729.1 speech codecs”, Interspeech 2010. 11. M. Oshikiri, H. Ehara, K. Yoshida, “Efficient spectrum coding for super-wideband speech and its application to

7/10/15 kHz bandwidth scalable coders,” IEEE ICASSP, 2004, pp.I-481-484. 12. H. Fastl, E. Zwicker, Psychoacoustics: Facts and Models. Springer series in information sciences. Springer, 3rd

edition, 2007. 13. Audio codec processing functions; Extended Adaptive Multi-Rate Wide-band (AMR-WB+) codec”, 3GPP TS

26.290. 14. 3GPP, T-doc S4-130522, EVS performance requirements, April 2013. 15. ITU-T P.800, Methods for Subjective Determination of Transmission Quality. International Telecommunication

Union (ITU), Series P., August 1996. 16. 3GPP, Tdoc S4-141065, Report of the Global Analysis Lab for the EVS Selection Phase, Aug. 2014.

Page 18: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

18

APPENDIX

Page 19: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

19

LR-HQ MDCT SWB Encoder (2/2)

Frequency

Energy

2 0 4 10 8 14 kHz 12 6

19 20 21 22

EM(1) EM(2) EM(b-1) EM(b) .. Spectral Coefficients are grouped Spectral bands

Spectral band energies

Page 20: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

20

LR-HQ MDCT Coder -Envelope coding (2/2)

IM(b)

IM(0)

:

Small Symbol Huffman

< [-64 63]

Context Based Huffman

Resized Huffman

Large Symbol Huffman

> [-64 63]

Pulse Mode Scale Mode

32 Symbol Huffman tables 8 Symbol

Huffman table

determines best Huffman table for encoding the current band

IM(b-1)

IM(b)

If exceeds [-4 3] 1. Position 2. Amplitude of

coded directly

IM(b)

IM(b)

Range

Span is narrowed down to fewer code words (21 symbols)

IM(b)

Page 21: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

21

LR-HQ MDCT Coder -Envelope coding (3/3)

Small Symbol coding method

Context based Huffman: determines best Huffman

table for encoding the current band

Resized Huffman : narrowed to a smaller range for

using Huffman table with fewer symbols (21 symbols).

T)(b-ΔIT)(b-ΔI(b) ΔI

T) (b-ΔIT) (b-ΔI(b) ΔI(b) I Δ

MMM

MMMM

1),3,-1max(

1),3, -1min( 1

IM(b-1)

IM(b)

IM(b)

thrTthrT

b(b)IΔ M

15,15

bandfor index aldifferenti new the theis

1

Page 22: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

22

LR-HQ MDCT Coder -Quantization (TCQ and USQ)

Magnitude Quantized by USQ & TCQ

Position, Number and sign coded by Arithmetic coding

Page 23: Super-Wideband Fine Spectrum Quantization for Low-rate ...MDCT Coding Mode of The 3GPP EVS Codec Presented by Srikanth Nagisetty, Hiroyuki Ehara 15th Dec 2015 . 2 Topics of this Presentation

23

LR-HQ MDCT Coder -Sparse Band Search

Gap filling for Normal mode

1,...,0',)'()'()'( 2 NlagkkEnekcorrkS

1

0])[]'[(

~])[()'(

Ncnt

kkIdxklagkXkIdxXkcorr

1

0

2])[]'[(~

)'(Ncnt

kkIdxklagkXkEne

Best match index is identified by finding the k' which maximizes the correlation measure, S(k') according to

Reduces computational load

Best match spectrum filled in the High Frequency Region