integrating perception into the flexcode project flexcode...auditory modeling and audio coding •...
TRANSCRIPT
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 1
FlexCode
The Sensitivity Matrix
Integrating Perception into the Flexcode Project
Jan PlasbergFlexcode Seminar Lannion
June 6, 2007
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 2
FlexCode Outline
• Flexcode in a Nutshell• The Sensitivity Matrix• Coding with the Sensitivity Matrix• Open Issues
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 3
FlexCode Who?
France Telecom
RWTH Aachen
KTH
NokiaEricsson
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 4
FlexCode The Problem
• Heterogeneity of networks increasing• Networks inherently variable (mobile users)
• But:– Coders not designed for specific environment– Coders inflexible (codebooks and FEC)– Feedback channel underutilized
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 5
FlexCode Adaptation and Coding
set of models, fixed quantizer
flexible
rigid
CELP with FEC
fixed model
fixed quantizer
adaptive coding
AMR
FlexCode
PCM
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 6
FlexCode The Tools
• Tools include– Models of source, channel, receiver– High-rate quantization theory– Multiple description coding (MDC)– Iterative source-channel decoding– Distortion measures using the sensitivity matrix
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 7
FlexCode Outline
• Flexcode in a Nutshell• The Sensitivity Matrix
– Why, what and how?– What can it tell us?
• Coding with the Sensitivity Matrix• Open Issues
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 8
FlexCode
• Simultaneous masking (frequency masking)
Human Auditory Masking
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 9
FlexCode
• Non-simultaneous masking (time masking)
• Not accounted for by simple auditory models• Coding standards use (heuristic) work-arounds
Human Auditory Masking
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 10
FlexCode
• Merge the worlds ofauditory modeling and audio coding
• Advanced auditory models too complex for coding
AM
AM+
x
x̂⋅ )ˆ,( xxd
perceptual distortion criterion
Motivation
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 11
FlexCode The Sensitivity Matrix
• Sensitivity Matrix M from Taylor expansion
Complexity problem solved!
Analysis of model properties using linear algebra• Masking curve• Subspace-based analysis
0)ˆ,(),ˆ()()ˆ(21)ˆ,( →−−≈ xxxxxMxxxx dd x
H
xx
xxxM=
∂∂∂
=ˆ
2
, ˆˆ)ˆ,()]([
jijix xx
d
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 12
FlexCode Application to an Auditory Model
• A typical auditory model (here: the Dau model)
• Distortion is sum of per-channel distortions
Filte
rban
k
Stage 1
Stage 2
Stage KL yx
channel m
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 13
FlexCode Application to an Auditory Model
• Find sensitivity matrix for model output y per channel m
Filte
rban
kStage
1Stage
2Stage
KL yx
)ˆ()()ˆ(21)ˆ,( )()( yyyMyyyy −−= m
yHmd
IyM 2)()( =myhere
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 14
FlexCode Application to an Auditory Model
• Each model stage linearized by respective Jacobian
Filte
rban
kStage
1Stage
2Stage
KL yx
z∂ y∂)(zJ z
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 15
FlexCode Application to an Auditory Model
• Get sensitivity matrix for channel m in input signal domain from chain rule;
Filte
rban
kStage
1Stage
2Stage
KL yx
∏∏⎥⎥⎦
⎤
⎢⎢⎣
⎡=
k
mk
H
k
mk
mx x )()()( 2)( JJM
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 16
FlexCode Application to an Auditory Model
• Sensitivity Matrix is sum of per-channel matrices
Filte
rban
kStage
1Stage
2Stage
KL yx
∑ ∏∏∑⎥⎥⎦
⎤
⎢⎢⎣
⎡==
m k
mk
H
k
mk
m
mxx
)()()( 2)()( JJxMxM
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 17
FlexCode Validity of the Approximation
• Narrow-band music + white noise at different SNR (10-60 dB), correlation > 0.9
(dB)Daud
(dB)SMd coding possible
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 18
FlexCode Analysis: Fourier Transform
• Sensitivity matrix for frequency domain vectors, with = DFT in matrix notation
)(XM X
HxX FxMFXM )()( =
F
)ˆ()ˆ( XXFxx −=− H
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 19
FlexCode Analysis: Masking Curve
• Assume masking threshold is at • Masking curve at frequency i:
gain for unit distortion in DFT bin i
to approach masking threshold
iiXHi guXMu )(
211 =
[ ]Hi 010 LL=u
[ ] iiXig
,)(2XM
=⇔
ig
1)ˆ( =XX,d
pos. i
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 20
FlexCode
0 500 1000 1500 2000 2500 3000 3500 40000
10
20
30
40
50
60
F
MA
SK
Example 1: Simultaneous Masking
• Masking curve for 50 dB sinusoid at 1000 Hz
dB
f (Hz)
- sensitivity matrix for Dau et al.- spectral model (v.d.Par, 2002)O human listeners (Zwicker, 1982)
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 21
FlexCodeExample 2: Non-Simultaneous Masking
• Subspace analysis of M: low sensitivity space
signal
noise
projection
t (ms)
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 22
FlexCodeF
200
700
F
200
700
F
T 50 100
200
700
A Spectro-Temporal Experiment
• Spectrograms for two sinusoids (200 Hz and 700 Hz) coded with an APCM coder
signal
WMSE-coder
Sens.-coder
f (H
z)
t (ms)
f (H
z)f (
Hz)
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 23
FlexCode Outline
• Flexcode in a Nutshell• The Sensitivity Matrix• Coding with the Sensitivity Matrix• Open Issues
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 24
FlexCode Problem
• Sensitivity Matrix generally not diagonal in typical coding domains (e.g., time, frequency, etc.)
– distortion not single-letter:
– no analytical solution known for optimal coding– so far only trained VQ available
(undoable for large dimensions)!
)ˆ()()ˆ(21)ˆ,( xxxMxxxx −−≈ x
Hd
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 25
FlexCode Solution Outline
• Sensitivity Matrix diagonal in perceptual domain– transform signal vectors into perceptual domain– (W)MSE-optimal coding in perceptual domain results in
minimum perceptual distortion • Optimal Transform:
)()()( xMxFxF xoptT
opt c=′′
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 26
FlexCode Making it Practical
• Problem: optimal transform depends on x– not known at decoder!
• Solution: Parameterize transform as DCT + diagonal weights
– combines DCT coding gain with perceptual transform!– Model-driven approach to temporal noise shaping (TNS)
tDCTf WTWxF =′ )(
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 27
FlexCode Examples
• Original
• Freq. weights
• Time-freq. weights
DCTf TWxF =′ )(
tDCTf WTWxF =′ )(
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 28
FlexCode Outline
• Flexcode in a Nutshell• The Sensitivity Matrix• Coding with the Sensitivity Matrix• Open Issues
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 29
FlexCode Complexity
• Main complexity lies here:
• Can we make multiplications ‘sparse’?• Can we reduce the size of the matrices?
∑ ∏∏⎥⎥⎦
⎤
⎢⎢⎣
⎡=
m k
mk
H
k
mk
)()(2)( JJxM
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 30
FlexCode Conclusion
• Using auditory models in coding is possible• General method• Direct applicability in AbS structures (CELP)• Solution outline for transform coding
SIP - Sound and Image Processing Lab, EE, KTH Stockholm 31
FlexCode The End
Thank you!