Pyramid Vector Quantization (xiph.orgtterribe/daala/pvq201404.pdf)
Posted on 15-Feb-2018
Mozilla
Pyramid Vector Quantization

What is Pyramid Vector Quantization?
- A vector quantizer
  - that has a simple algebraic structure
  - to perform gain-shape quantization

Motivation

Why Vector Quantization?
- 3 classic advantages (Lookabaugh et al. 1989)
  - Space-filling advantage: VQ codepoints tile space more efficiently
    - Example: 2-D squares vs. hexagons
    - Maximum possible gain for large dimension: 1.53 dB
  - Shape advantage: VQ can use more points where the PDF is higher
    - 1.14 dB gain for a 2-D Gaussian, 2.81 dB for high dimension
    - Can be mitigated with entropy coding
  - Memory advantage: exploit statistical dependence between vector components
    - Transform coefficients are not strongly correlated
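The dB figures above can be reproduced from textbook constants. This is a minimal Python sketch, assuming the standard normalized second moments of the square cell (1/12), the regular hexagon (5/(36√3)) and the high-dimensional sphere limit (1/(2πe)); the helper name `db` is hypothetical, and the 1.14 dB 2-D Gaussian figure is not recomputed here.

```python
import math

def db(ratio: float) -> float:
    """Convert a power (MSE) ratio to decibels."""
    return 10.0 * math.log10(ratio)

# Normalized second moments (per-dimension MSE of a unit-volume cell).
G_SQUARE = 1.0 / 12.0                              # scalar quantizer cell
G_HEXAGON = 5.0 / (36.0 * math.sqrt(3.0))          # hexagonal lattice cell
G_SPHERE_LIMIT = 1.0 / (2.0 * math.pi * math.e)    # sphere as N -> infinity

hexagon_gain_db = db(G_SQUARE / G_HEXAGON)              # 2-D example
space_filling_limit_db = db(G_SQUARE / G_SPHERE_LIMIT)  # 10*log10(pi*e/6)

# Shape advantage for a Gaussian source as N -> infinity: the total
# high-dimension gain over fixed-rate scalar quantization, sqrt(3)*pi/2,
# divided by the space-filling part, pi*e/6, which equals 3*sqrt(3)/e.
gaussian_shape_limit_db = db(3.0 * math.sqrt(3.0) / math.e)

print(f"hexagon vs. square:        {hexagon_gain_db:.2f} dB")
print(f"space-filling limit:       {space_filling_limit_db:.2f} dB")
print(f"Gaussian shape-adv. limit: {gaussian_shape_limit_db:.2f} dB")
```

The three printed values are about 0.17 dB, 1.53 dB and 2.81 dB, matching the slide.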

Why Vector Quantization?
- Important: the space-filling advantage applies even when values are totally uncorrelated
- Another important advantage:
  - Can have codebooks with less than 1 bit per dimension
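To see how fractional rates arise, one can count PVQ codewords: the codebook with K pulses in N dimensions contains every integer vector whose absolute values sum to K. A minimal sketch follows; the recurrence is the standard one for this count, and the function names are hypothetical.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def pvq_codebook_size(n: int, k: int) -> int:
    """V(n, k): number of integer vectors of length n with sum(|y_i|) == k.
    Uses V(n, k) = V(n-1, k) + V(n, k-1) + V(n-1, k-1)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return (pvq_codebook_size(n - 1, k) + pvq_codebook_size(n, k - 1)
            + pvq_codebook_size(n - 1, k - 1))

def bits_per_dimension(n: int, k: int) -> float:
    """Fixed-rate cost of indexing the codebook, spread over n dimensions."""
    return math.log2(pvq_codebook_size(n, k)) / n

for n, k in [(16, 1), (16, 2), (64, 4)]:
    print(f"N={n}, K={k}: {pvq_codebook_size(n, k)} codewords, "
          f"{bits_per_dimension(n, k):.3f} bits/dimension")
```

A single pulse in a 16-coefficient band gives 32 codewords, i.e. 5 bits or 0.3125 bits per dimension; two pulses give 512 codewords (0.5625 bits per dimension).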

Why Algebraic VQ?
- Trained VQ is impractical for high rates and large dimensions
  - High dimension → large LUTs, lots of memory
    - Exponential in bitrate
  - No codebook structure → slow search
- "Algebraic" VQ solves these problems
  - Structured codebook: no LUTs, fast search
  - Space-filling lattice for arbitrary dimension unknown; have to approximate
  - PVQ is asymptotically optimal for Laplacian sources

Why Gain-Shape Quantization?
- Separate "gain" (energy) from "shape" (spectrum)
  - Vector = magnitude × unit vector (a point on the sphere)
- Potential advantages
  - Can give each piece a different rate allocation
  - Preserve energy (contrast) instead of low-passing
    - Scalar quantization can only add energy by coding ±1's
  - Implicit activity masking
    - Can derive quantization resolution from the explicitly coded energy
  - Better representation of coefficients
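A small numerical illustration of the energy-preservation point, assuming uniform rounding for both quantizers and the same step size q for the coefficients and for the gain. To isolate the effect of the gain, the shape is kept exact here, whereas real PVQ would also quantize it; the input vector, step size and helper names are arbitrary choices for illustration.

```python
import math

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def scalar_quantize(v, q):
    """Uniform scalar quantization with rounding; coefficients smaller
    than q/2 in magnitude become 0."""
    return [q * round(c / q) for c in v]

def gain_shape_quantize_gain_only(v, q):
    """Quantize only the gain (same step q); keep the exact shape v/|v|."""
    g = norm(v)
    g_hat = q * round(g / q)
    return [g_hat * c / g for c in v]

# One large coefficient plus many small ones.
x = [8.0] + [1.5 if i % 2 == 0 else -1.5 for i in range(31)]
q = 4.0
xs = scalar_quantize(x, q)
xg = gain_shape_quantize_gain_only(x, q)
print(norm(x) ** 2, norm(xs) ** 2, norm(xg) ** 2)
```

Scalar quantization keeps an energy of 64 out of 133.75 because every small coefficient is zeroed, while the gain-only reconstruction has energy 144: its error is bounded by the gain step rather than biased toward zero.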

How it Works (High-Level)

Simple Case: PVQ without a Predictor
- Scalar quantize the gain
- Place K unit pulses in N dimensions
  - Up to N = 1024 dimensions for large blocks
  - Only has N-1 degrees of freedom
- Normalize to unit norm
- K is derived implicitly from the gain
  - Can also code K and derive the gain
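The steps above can be sketched in Python. This is a minimal illustration, not Daala's implementation: the gain uses a plain uniform quantizer, K is passed in explicitly instead of being derived from the gain, and the pulse search is a common greedy heuristic (project onto the pyramid, then add one pulse at a time where it most increases the correlation with the input), which is not guaranteed to find the optimal codeword. `pvq_search` and `pvq_quantize` are hypothetical names.

```python
import math

def pvq_search(x, k):
    """Greedy search for an integer vector y with sum(|y_i|) == k that
    approximately maximizes the cosine similarity between x and y."""
    n = len(x)
    ax = [abs(c) for c in x]
    l1 = sum(ax)
    if l1 > 0:
        # Project onto the pyramid sum(|y_i|) == k and round down.
        y = [int(math.floor(k * a / l1)) for a in ax]
    else:
        y = [k] + [0] * (n - 1)   # zero input: any codeword is equally good
    xy = sum(a * b for a, b in zip(ax, y))   # correlation with |x|
    yy = sum(b * b for b in y)               # energy of y
    for _ in range(k - sum(y)):
        # Add the next pulse where it maximizes (x.y)^2 / (y.y).
        best_i, best_score = 0, -1.0
        for i in range(n):
            score = (xy + ax[i]) ** 2 / (yy + 2 * y[i] + 1)
            if score > best_score:
                best_i, best_score = i, score
        xy += ax[best_i]
        yy += 2 * y[best_i] + 1
        y[best_i] += 1
    return [b if c >= 0 else -b for c, b in zip(x, y)]   # restore signs

def pvq_quantize(x, gain_step, k):
    """Gain-shape quantization: uniform scalar gain, PVQ shape with k pulses.
    Returns (quantized gain, integer codeword, reconstructed vector)."""
    g = math.sqrt(sum(c * c for c in x))
    g_hat = gain_step * round(g / gain_step)
    y = pvq_search(x, k)
    ny = math.sqrt(sum(b * b for b in y))
    shape = [b / ny for b in y] if ny > 0 else [0.0] * len(x)
    return g_hat, y, [g_hat * s for s in shape]

g_hat, y, x_hat = pvq_quantize([4.0, -2.0, 1.0, 0.5], gain_step=1.0, k=6)
```

For this input the search returns y = [3, -2, 1, 0], and the reconstruction's norm equals the quantized gain, 5, because the codeword is normalized before scaling.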

Codebook for N=3 and different K
[Figure: PVQ codebook points for N=3 and several values of K]
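The codebook in the figure can be enumerated by brute force for small N and K. A minimal Python sketch (hypothetical helper names) lists every integer vector with K pulses and normalizes it onto the unit sphere:

```python
import itertools
import math

def pvq_codebook(n, k):
    """All length-n integer vectors whose absolute values sum to k."""
    return [y for y in itertools.product(range(-k, k + 1), repeat=n)
            if sum(abs(v) for v in y) == k]

def to_unit_sphere(y):
    """Normalize a (nonzero) codeword to unit L2 norm."""
    r = math.sqrt(sum(v * v for v in y))
    return tuple(v / r for v in y)

for k in (1, 2, 3):
    points = [to_unit_sphere(y) for y in pvq_codebook(3, k)]
    print(f"N=3, K={k}: {len(points)} codewords")  # 6, 18 and 38
```

For K=1 the six codewords are the ± axis directions; larger K adds points between them.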

PVQ vs. Scalar Quantization

PVQ with a Predictor
- Video provides us with useful predictors
- We want to treat vectors in the direction of the prediction as "special"
  - They are much more likely
- Subtracting and coding the residual would lose energy preservation
- Solution: align the codebook axes with the prediction, and treat one dimension differently

2-D Projection Example (figure sequence)
- Input
- Input + Prediction
- Compute Householder reflection
- Apply reflection
- Compute & code angle θ
- Code other dimensions
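The reflection step in the figure can be written down directly. A minimal Python sketch for a nonzero prediction, assuming the reflection maps the prediction r onto the axis of its largest-magnitude coefficient (a common numerically stable choice; the slides do not specify the axis), with hypothetical helper names. Because the reflection is orthogonal, the reflected input keeps its norm, and its coordinate on that axis encodes cos θ:

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def householder_setup(r):
    """Return (v, m, s) such that reflecting r about v gives -s*|r|*e_m,
    where m indexes r's largest-magnitude coefficient and s is its sign."""
    m = max(range(len(r)), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0 else -1.0
    v = list(r)
    v[m] += s * math.sqrt(dot(r, r))  # same sign as r[m]: no cancellation
    return v, m, s

def reflect(x, v):
    """Apply H = I - 2 v v^T / (v^T v); H is orthogonal and its own inverse."""
    c = 2.0 * dot(v, x) / dot(v, v)
    return [xi - c * vi for xi, vi in zip(x, v)]

def angle_to_prediction(x, r):
    """theta in [0, pi]: the angle between the input x and the prediction r."""
    cos_t = dot(x, r) / math.sqrt(dot(x, x) * dot(r, r))
    return math.acos(max(-1.0, min(1.0, cos_t)))

r = [3.0, 1.0]   # prediction
x = [2.0, 2.0]   # input
v, m, s = householder_setup(r)
x_ref = reflect(x, v)
theta = angle_to_prediction(x, r)
# x_ref[m] equals -s * |x| * cos(theta); the other coordinates carry
# the part of x orthogonal to the prediction.
```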

What does this accomplish?
- Creates another "intuitive" parameter, θ
  - "How much like the predictor are we?"
  - θ = 0 → use the predictor exactly
- θ determines how many pulses go in the "prediction" direction
  - K (and thus bitrate) for the remaining N-1 dimensions is adjusted down
- The remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
  - Can repeat for more predictors
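On the decoder side the reflection can be undone once θ and the remaining N-1 dimensions are known. A minimal sketch, assuming the reflection axis is the largest-magnitude coefficient of a nonzero prediction; `z` stands for the unit-norm shape of the N-1 non-prediction dimensions (which PVQ would code with the reduced K), and all names are hypothetical. With θ = 0 the output is the predictor's direction scaled by the gain:

```python
import math

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def reflect(x, v):
    """Householder reflection H = I - 2 v v^T / (v^T v)."""
    c = 2.0 * dot(v, x) / dot(v, v)
    return [xi - c * vi for xi, vi in zip(x, v)]

def reconstruct(g_hat, theta, z, r):
    """Rebuild x_hat from the gain g_hat, the angle theta, a unit-norm shape z
    for the N-1 dimensions orthogonal to the prediction, and the prediction r."""
    m = max(range(len(r)), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0 else -1.0
    v = list(r)
    v[m] += s * math.sqrt(dot(r, r))
    # In the reflected domain the prediction points along -s * e_m.
    rest = iter(z)
    y = [-s * g_hat * math.cos(theta) if i == m
         else g_hat * math.sin(theta) * next(rest)
         for i in range(len(r))]
    return reflect(y, v)  # H is its own inverse, so this undoes the reflection

# theta = 0 reproduces the predictor direction, scaled to the coded gain.
x0 = reconstruct(2.0, 0.0, [1.0], [3.0, 1.0])
```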

Details

Band Structure
- DC coded separately with scalar quantization
- AC coefficients grouped into bands
  - Gain, theta, etc. signaled separately for each band
  - Layout ad hoc for now
- Scan order in each band optimized for decreasing average variance

Band Structure
- Scan order is possibly over-fit
[Figure: band layouts for 4x4, 8x8 and 16x16 blocks]

To Predict or Not to Predict
- θ > π/2 → prediction not helping
  - Could code large θ's, but that doesn't seem useful
  - Need to handle zero predictors anyway
- Current approach: code a "noref" flag
  - Currently jointly code up to 4 flags at once, with a fixed order-0 probability per band (5% of keyframe (KF) rate)
  - Patches in review cut this down a lot:
    - Force noref=1 when the predictor is zero in keyframes
    - Separate probabilities for each block size
    - Adapt the probabilities
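The θ > π/2 rule reduces to a sign test, since θ > π/2 exactly when the dot product of input and prediction is negative. A minimal sketch of that decision with a hypothetical function name; a real encoder might instead choose noref by rate-distortion cost:

```python
def choose_noref(x, r, eps=1e-12):
    """Return True (noref = 1) when the predictor r is (numerically) zero or
    when the angle between x and r exceeds pi/2, i.e. x . r < 0."""
    rr = sum(v * v for v in r)
    if rr <= eps:
        return True                  # zero predictor: nothing to align with
    xr = sum(a * b for a, b in zip(x, r))
    return xr < 0                    # cos(theta) < 0  <=>  theta > pi/2

print(choose_noref([1.0, 0.0], [0.0, 0.0]),    # zero predictor
      choose_noref([1.0, 0.0], [-1.0, 0.1]),   # points away from x
      choose_noref([1.0, 0.0], [1.0, 1.0]))    # theta = pi/4
```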

Quantization Matrix
- Simple approach (what we're doing now)
  - Separate quantization resolution for each band
  - Keep flat quantization within bands
- Advanced approach
  - Scaling after normalization is complicated
    - Unit pulses are no longer "unit" (how do they sum to K?)
    - Householder reflection scrambles things further
  - Better(?): pre-scale the vector by the quantization factors
    - Effects on energy preservation?

Quantization Matrix Example
[Figure: flat quantizer (base Q=35) vs. adjusted per-band (base Q=23)]
- Metrics: +15 PSNR, +12 SSIM, -18 PSNR-HVS

Activity Masking
- Goal: use better resolution in flat areas
  - Low contrast → low energy (gain)
- Derivations in doc/video_pvq.lyx and doc/theoretical_results.lyx
  - Currently wrong/incomplete; working on updates

Activity Masking
- Step 1: Compand the gain ($g$)
  - Goal: $Q \propto g^{2\alpha}$ (x264 uses $\alpha = 0.173$; we start with $\alpha = 1/6$)
  - Quantize $\hat g = (Q_g \hat h)^{\beta}$, encode $\hat h$
    - $\beta = 1/(1 - 2\alpha)$
    - $Q_g = (Q/\beta)^{\beta}$
  - Offset the steps so that at least one value of $\hat h$ gives the same gain as the prediction
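A minimal Python sketch of the companding in Step 1, assuming Q_g is simply given (here an arbitrary 2.0) and omitting the offset that aligns a step with the prediction's gain. It checks numerically that the reconstruction step dĝ/dĥ grows like ĝ^(2α), which is the stated goal; helper names are hypothetical.

```python
import math

def beta_from_alpha(alpha):
    """beta = 1 / (1 - 2*alpha)."""
    return 1.0 / (1.0 - 2.0 * alpha)

def expand_gain(h, q_g, beta):
    """Reconstructed gain g_hat = (Q_g * h)^beta for companded index h."""
    return (q_g * h) ** beta

def compand_gain(g, q_g, beta):
    """Nearest companded index h for a gain g (rounding in companded domain)."""
    return round(g ** (1.0 / beta) / q_g)

alpha = 1.0 / 6.0
beta = beta_from_alpha(alpha)   # 1.5 for alpha = 1/6
q_g = 2.0                       # hypothetical companded step size

for h in (1, 4, 16, 64):
    g_hat = expand_gain(h, q_g, beta)
    step = expand_gain(h + 1, q_g, beta) - g_hat
    # step / g_hat^(2*alpha) approaches beta * q_g as h grows.
    print(h, round(g_hat, 2), round(step, 2), round(step / g_hat ** (2 * alpha), 2))
```

Analytically, dĝ/dĥ = β·Q_g·ĝ^(2α) for ĝ = (Q_g·ĥ)^β, because (β - 1)/β = 2α.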
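
Activity Masking (cont'd)
- Step 2: Choose the θ resolution
  - Polar coordinates: match the step on the circle of radius $\hat g$ to the gain step
    $\hat g\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = d\hat g/d\hat h$
    - $\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = \sqrt{2 - 2\cos(\theta - \vartheta)} \approx \operatorname{arcdistance}(\theta, \vartheta) = |\theta - \vartheta|$, at least for small $\theta - \vartheta$
  - $Q_\theta = (d\hat g/d\hat h)/\hat g = \beta/\hat h$
    - Make sure $Q_\theta$ evenly divides $\pi/2$
    - When $\hat g$ is small, force $Q_\theta = \pi/2$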
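Step 2 can be sketched as follows: compute Q_θ = β/ĥ, then round the number of steps in [0, π/2] to an integer so that the step divides π/2 evenly, falling back to a single step (Q_θ = π/2) when the gain is small. The rounding rule and function names are assumptions for illustration:

```python
import math

def theta_resolution(h, beta):
    """Return (Q_theta, steps): Q_theta is close to beta/h and divides pi/2
    into an integer number of steps; small gains collapse to one step."""
    if h <= 0:
        return math.pi / 2, 1
    steps = max(1, round((math.pi / 2) * h / beta))
    return (math.pi / 2) / steps, steps

def quantize_theta(theta, q_theta):
    """Index of the nearest multiple of Q_theta (theta assumed in [0, pi/2])."""
    return round(theta / q_theta)

q_theta, steps = theta_resolution(h=10.0, beta=1.5)   # 10 steps of pi/20
t = quantize_theta(0.5, q_theta)

# The chord sqrt(2 - 2cos(d)) is close to the arc length d for small d.
d = q_theta
chord = math.sqrt(2.0 - 2.0 * math.cos(d))
```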

Activity Masking (cont'd)
- Step 3: Choose K
  - $D = (g - \hat g)^2 + g\hat g\,(D_\theta + \sin\theta\,\sin\vartheta\,D_{pvq})$
    - $D_\theta = 2 - 2\cos(\theta - \vartheta)$: distortion due to θ quantization
    - $D_{pvq}$: distortion due to PVQ on the last N-1 dimensions
  - Distortion due to scalar quantizing the gain: $(d\hat g/d\hat h)^2/12$
  - High-rate distortion due to PVQ: $(N-1)^2/(24K^2)$
    - Derived experimentally; far too high at low rate
    - N-2 DOF → should be (N-2) × the gain distortion
  - Assume $g = \hat g$, $\theta = \vartheta$, and solve for K:
    $K = (\hat h/\beta)\sin\vartheta\,(N-1)/\sqrt{2(N-2)} \approx (\hat h/\beta)\sin\vartheta\sqrt{N/2}$
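The algebra in Step 3 can be checked numerically. A minimal Python sketch (hypothetical names): it compares the decomposed distortion D with a direct squared error for random vectors, and confirms that the closed-form K balances the PVQ distortion against N-2 times the gain distortion, using dĝ/dĥ = βĝ/ĥ from Q_θ = β/ĥ. A real encoder would still have to round K to an integer.

```python
import math
import random

def decomposed_distortion(g, g_hat, theta, theta_hat, z, z_hat):
    """D = (g - g_hat)^2 + g*g_hat*(D_theta + sin(theta)*sin(theta_hat)*D_pvq)."""
    d_theta = 2.0 - 2.0 * math.cos(theta - theta_hat)
    d_pvq = sum((a - b) ** 2 for a, b in zip(z, z_hat))
    return (g - g_hat) ** 2 + g * g_hat * (
        d_theta + math.sin(theta) * math.sin(theta_hat) * d_pvq)

def build(g, theta, z):
    """Vector g*(cos(theta) on the prediction axis, sin(theta)*z elsewhere)."""
    return [g * math.cos(theta)] + [g * math.sin(theta) * c for c in z]

def unit(v):
    r = math.sqrt(sum(c * c for c in v))
    return [c / r for c in v]

def k_exact(h, beta, theta, n):
    return (h / beta) * math.sin(theta) * (n - 1) / math.sqrt(2.0 * (n - 2))

def k_approx(h, beta, theta, n):
    return (h / beta) * math.sin(theta) * math.sqrt(n / 2.0)

rng = random.Random(1)
z = unit([rng.gauss(0, 1) for _ in range(7)])
z_hat = unit([c + 0.1 * rng.gauss(0, 1) for c in z])
x, x_hat = build(3.0, 0.7, z), build(2.8, 0.65, z_hat)
direct = sum((a - b) ** 2 for a, b in zip(x, x_hat))
model = decomposed_distortion(3.0, 2.8, 0.7, 0.65, z, z_hat)
print(direct, model)

# Balance check with g = g_hat and theta = theta_hat.
h, beta, theta, n, g = 8.0, 1.5, 0.6, 64, 1.0
k = k_exact(h, beta, theta, n)
pvq_term = g * g * math.sin(theta) ** 2 * (n - 1) ** 2 / (24.0 * k * k)
gain_term = (n - 2) * (beta * g / h) ** 2 / 12.0
```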

Loss Robustness
- $K \approx (\hat h/\beta)\sin\vartheta\sqrt{N/2}$
  - $\hat h$ is offset by the companded reference gain, so it can be wrong if there are losses
  - But if K is wrong, we'll decode the wrong number of pulses, totally desyncing the bitstream
- Remove the dependence on $\hat h$
  - $\sin\vartheta \approx \vartheta \Rightarrow \hat h\sin\vartheta \approx \hat h\vartheta = \hat h Q_\theta(\vartheta/Q_\theta) = (\vartheta/Q_\theta)\beta$, so $K \approx (\vartheta/Q_\theta)\sqrt{N/2}$
  - $\vartheta/Q_\theta$ is the index encoded in the bitstream
  - Since $Q_\theta$ is not exact, we can't cap $\vartheta \le \pi/2$ in the bitstream
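K can thus be computed from quantities the decoder always has. A minimal sketch with hypothetical names compares the gain-based formula with the index-based one, assuming Q_θ = β/ĥ exactly (no snapping to divide π/2) so that the two agree up to the sin ϑ ≈ ϑ approximation and rounding:

```python
import math

def k_from_gain(h, beta, theta, n):
    """K ~= (h/beta) * sin(theta) * sqrt(N/2); needs the companded gain h."""
    return max(0, round((h / beta) * math.sin(theta) * math.sqrt(n / 2.0)))

def k_from_theta_index(t, n):
    """K ~= t * sqrt(N/2), where t = theta/Q_theta is the coded theta index.
    Depends only on the coded index and the band size, not on h."""
    return max(0, round(t * math.sqrt(n / 2.0)))

h, beta, n = 12.0, 1.5, 16
q_theta = beta / h            # 0.125 rad, unsnapped for illustration
t = 3                         # coded theta index
theta = t * q_theta
print(k_from_gain(h, beta, theta, n), k_from_theta_index(t, n))
```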

Inter-band Masking
- $\hat g$ is per-band, but traditional activity masking is per-block
  - Could just sum $\hat g$ over all bands
  - Actual model is that energy in one band masks energy in another
    - Lower bands appear to mask higher ones, but not the other way around
- Still very early; not much is tuned
  - $\rho = \left(\hat g_h^2/(\hat g_h^2 + \eta\,\hat g_l^2)\right)^{\alpha}$, where η controls the amount of masking
  - $\rho\hat h$ is then used to derive $Q_\theta$ and K instead of $\hat h$
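A minimal sketch of the inter-band masking factor, with hypothetical names. It assumes ĝ_l is the gain of the lower (masking) band and reuses the activity-masking α; since ρ ≤ 1, using ρĥ in place of ĥ makes Q_θ = β/(ρĥ) coarser when a lower band carries a lot of energy:

```python
def masking_factor(g_h, g_l, eta, alpha):
    """rho = (g_h^2 / (g_h^2 + eta * g_l^2))^alpha."""
    den = g_h * g_h + eta * g_l * g_l
    if den <= 0.0:
        return 1.0          # no energy anywhere: no masking
    return (g_h * g_h / den) ** alpha

def masked_theta_step(h, g_h, g_l, eta, alpha, beta):
    """Q_theta = beta / (rho * h); assumes h > 0 and g_h > 0."""
    rho = masking_factor(g_h, g_l, eta, alpha)
    return beta / (rho * h)

print(masked_theta_step(10.0, 1.0, 0.0, 1.0, 1.0 / 6.0, 1.5),   # no masking
      masked_theta_step(10.0, 1.0, 2.0, 1.0, 1.0 / 6.0, 1.5))   # masked
```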

Calibration
- Activity masking always increases rate
  - Scale the base quantizer in each band to reduce rate: $Q = Q_0 L^{(1/\beta - 1)}$
    - L is the maximum luma value
  - Just an approximation, but it seems to work okay
  - AM currently disabled for chroma

Activity Masking Example
[Figure: no activity masking (base Q=23) vs. activity masking (base Q=23)]

Open Issues
- Better entropy coding
  - Everything is order-0
  - Take advantage of correlation in gain/θ/noref/etc.
- Better RDO
  - Currently iterating over a small range of gains/θs
  - Rate estimates are very approximate
- Reducing overhead of the loss-robust case
- Noise injection/folding
- Bit-exact implementation
- Tuning, etc.
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla2
What is Pyramid Vector Quantization
A Vector Quantizer That has a simple algebraic structure To perform gain-shape quantization
Mozilla3
Motivation
Mozilla4
Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)
ndash Space filling advantage VQ codepoints tile space more efficiently
Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB
ndash Shape advantage VQ can use more points where PDF is higher
114 dB gain for 2-D Gaussian 281 for high dimension
ndash Memory advantage exploit statistical dependence between vector components
Mozilla5
Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)
ndash Space filling advantage VQ codepoints tile space more efficiently
Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB
ndash Shape advantage VQ can use more points where PDF is higher
Can be mitigated with entropy coding
ndash Memory advantage exploit statistical dependence between vector components
Transform coefficients are not strongly correlated
Mozilla6
Why Vector Quantization Important Space advantage applies even when
values are totally uncorrelated Another important advantage
ndash Can have codebooks with less than 1 bit per dimension
Mozilla7
Why Algebraic VQ Trained VQ impractical for high rates large
dimensionsndash High dimension rarr large LUTs lots of memory
Exponential in bitrate
ndash No codebook structure rarr slow search
ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search
Space-filling lattice for arbitrary dimension unknown have to approximate
ndash PVQ asymptotically optimal for Laplacian sources
Mozilla8
Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)
ndash Vector = Magnitude times Unit Vector (point on sphere)
Potential advantagesndash Can give each piece different rate allocations
Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos
ndash Implicit activity masking Can derive quantization resolution from the explicitly
coded energy
ndash Better representation of coefficients
Mozilla9
How it Works (High-Level)
Mozilla10
Simple Case PVQ without a Predictor
Scalar quantize gain Place K unit pulses in N dimensions
ndash Up to N = 1024 dimensions for large blocks
ndash Only has N-1 degrees of freedom
Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla3
Motivation
Mozilla4
Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)
ndash Space filling advantage VQ codepoints tile space more efficiently
Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB
ndash Shape advantage VQ can use more points where PDF is higher
114 dB gain for 2-D Gaussian 281 for high dimension
ndash Memory advantage exploit statistical dependence between vector components
Mozilla5
Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)
ndash Space filling advantage VQ codepoints tile space more efficiently
Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB
ndash Shape advantage VQ can use more points where PDF is higher
Can be mitigated with entropy coding
ndash Memory advantage exploit statistical dependence between vector components
Transform coefficients are not strongly correlated
Mozilla6
Why Vector Quantization Important Space advantage applies even when
values are totally uncorrelated Another important advantage
ndash Can have codebooks with less than 1 bit per dimension
Mozilla7
Why Algebraic VQ Trained VQ impractical for high rates large
dimensionsndash High dimension rarr large LUTs lots of memory
Exponential in bitrate
ndash No codebook structure rarr slow search
ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search
Space-filling lattice for arbitrary dimension unknown have to approximate
ndash PVQ asymptotically optimal for Laplacian sources
Mozilla8
Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)
ndash Vector = Magnitude times Unit Vector (point on sphere)
Potential advantagesndash Can give each piece different rate allocations
Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos
ndash Implicit activity masking Can derive quantization resolution from the explicitly
coded energy
ndash Better representation of coefficients
Mozilla9
How it Works (High-Level)
Mozilla10
Simple Case PVQ without a Predictor
Scalar quantize gain Place K unit pulses in N dimensions
ndash Up to N = 1024 dimensions for large blocks
ndash Only has N-1 degrees of freedom
Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla4
Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)
ndash Space filling advantage VQ codepoints tile space more efficiently
Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB
ndash Shape advantage VQ can use more points where PDF is higher
114 dB gain for 2-D Gaussian 281 for high dimension
ndash Memory advantage exploit statistical dependence between vector components
Mozilla5
Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)
ndash Space filling advantage VQ codepoints tile space more efficiently
Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB
ndash Shape advantage VQ can use more points where PDF is higher
Can be mitigated with entropy coding
ndash Memory advantage exploit statistical dependence between vector components
Transform coefficients are not strongly correlated
Mozilla6
Why Vector Quantization Important Space advantage applies even when
values are totally uncorrelated Another important advantage
ndash Can have codebooks with less than 1 bit per dimension
Mozilla7
Why Algebraic VQ Trained VQ impractical for high rates large
dimensionsndash High dimension rarr large LUTs lots of memory
Exponential in bitrate
ndash No codebook structure rarr slow search
ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search
Space-filling lattice for arbitrary dimension unknown have to approximate
ndash PVQ asymptotically optimal for Laplacian sources
Mozilla8
Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)
ndash Vector = Magnitude times Unit Vector (point on sphere)
Potential advantagesndash Can give each piece different rate allocations
Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos
ndash Implicit activity masking Can derive quantization resolution from the explicitly
coded energy
ndash Better representation of coefficients
Mozilla9
How it Works (High-Level)
Mozilla10
Simple Case PVQ without a Predictor
Scalar quantize gain Place K unit pulses in N dimensions
ndash Up to N = 1024 dimensions for large blocks
ndash Only has N-1 degrees of freedom
Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
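Activity Masking (cont'd)
Step 2: Choose θ resolution
– Polar coordinates: $\hat{g}\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = d\hat{g}/d\hat{h}$
$\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = \sqrt{2 - 2\cos(\theta - \vartheta)} \approx \operatorname{arcdistance}(\theta, \vartheta) = |\theta - \vartheta|$
– At least for small $\theta - \vartheta$
– $Q_\theta = (d\hat{g}/d\hat{h})/\hat{g} = \beta/\hat{h}$
Make sure $Q_\theta$ evenly divides $\pi/2$
When $\hat{g}$ is small, force $Q_\theta = \pi/2$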
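A sketch of choosing $Q_\theta$ and quantizing θ, assuming "evenly divides π/2" is achieved by rounding $(\pi/2)/Q_\theta$ to the nearest positive integer. That snapping rule, and letting small $\hat{h}$ fall back to $\pi/2$ through the clamp rather than an explicit threshold, are choices of this sketch; the names are hypothetical.

```python
import math


def theta_resolution(h_hat, beta):
    """Q_theta = beta / h_hat, snapped so an integer number of steps spans [0, pi/2].

    With small h_hat the step count clamps to 1, i.e. Q_theta = pi/2.
    """
    half_pi = math.pi / 2
    if h_hat <= 0:
        return half_pi
    steps = max(1, round(half_pi * h_hat / beta))  # (pi/2) / (beta / h_hat)
    return half_pi / steps


def quantize_theta(theta, q_theta):
    """Index to code in the bitstream: theta / Q_theta, rounded."""
    return round(theta / q_theta)
```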
Mozilla31
Activity Masking (cont'd)
Step 3: Choose K
– $D = (g - \hat{g})^2 + g\hat{g}\,(D_\theta + \sin\theta\,\sin\vartheta\,D_{pvq})$
$D_\theta = 2 - 2\cos(\theta - \vartheta)$ = distortion due to θ quantization
$D_{pvq}$ = distortion due to PVQ on the last $N-1$ dimensions
– Distortion due to scalar quantizing the gain: $(d\hat{g}/d\hat{h})^2/12$
– High-rate distortion due to PVQ: $(N-1)^2/(24K^2)$
Derived experimentally; far too high at low rate
$(N-2)$ DOF → should be $(N-2)$ times the gain distortion
– Assume $g = \hat{g}$, $\theta = \vartheta$, and solve for K: $K = (\hat{h}/\beta)\,\sin\vartheta\,(N-1)/\sqrt{2(N-2)} \approx (\hat{h}/\beta)\,\sin\vartheta\,\sqrt{N/2}$
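Spelling out the last step: with $g = \hat{g}$ and $\theta = \vartheta$, the PVQ term becomes $\hat{g}^2 \sin^2\vartheta\,(N-1)^2/(24K^2)$. Setting it equal to $(N-2)$ times the gain distortion, $(N-2)(d\hat{g}/d\hat{h})^2/12$, and using $d\hat{g}/d\hat{h} = \beta\hat{g}/\hat{h}$ (from $Q_\theta = \beta/\hat{h}$ above) gives the formula for K. The sketch below evaluates both forms; the names are hypothetical and K is left unrounded.

```python
import math


def pulses_exact(h_hat, beta, theta_hat, n):
    """K = (h_hat/beta) * sin(theta_hat) * (N-1) / sqrt(2(N-2)), valid for N > 2."""
    if n <= 2:
        raise ValueError("need N > 2")
    return (h_hat / beta) * math.sin(theta_hat) * (n - 1) / math.sqrt(2 * (n - 2))


def pulses_approx(h_hat, beta, theta_hat, n):
    """Large-N approximation K ~ (h_hat/beta) * sin(theta_hat) * sqrt(N/2)."""
    return (h_hat / beta) * math.sin(theta_hat) * math.sqrt(n / 2)
```

The ratio of the two forms is $(N-1)/\sqrt{(N-1)^2 - 1}$, which is already within about 0.2% of 1 at N = 16.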
Mozilla32
Loss Robustness
$K \approx (\hat{h}/\beta)\,\sin\vartheta\,\sqrt{N/2}$
– $\hat{h}$ is offset by the companded reference gain, so it can be wrong if there are losses
– But if K is wrong, we'll decode the wrong number of pulses, totally desyncing the bitstream
Remove dependence on $\hat{h}$:
– $\sin\vartheta \approx \vartheta \rightarrow \hat{h}\sin\vartheta \approx \hat{h}\vartheta = \hat{h}Q_\theta(\vartheta/Q_\theta) = \beta\,(\vartheta/Q_\theta)$
– $\vartheta/Q_\theta$ is the index encoded in the bitstream
Since $Q_\theta$ is not exact, we can't cap $\vartheta \le \pi/2$ in the bitstream
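Substituting $\hat{h}\sin\vartheta \approx \beta\,(\vartheta/Q_\theta)$ into $K \approx (\hat{h}/\beta)\sin\vartheta\sqrt{N/2}$ gives $K \approx (\vartheta/Q_\theta)\sqrt{N/2}$, which depends only on the coded index and N. A minimal sketch, assuming round-to-nearest for the integer K (the slides do not say how K is rounded); the function name is hypothetical.

```python
import math


def pulses_from_theta_index(theta_index, n):
    """Loss-robust K ~ (theta/Q_theta) * sqrt(N/2), rounded to an integer.

    Only the coded theta index and the band size N are needed, so the
    decoder does not depend on h_hat (and hence on the reference gain).
    The index is not capped, since Q_theta is not known exactly here.
    """
    return round(theta_index * math.sqrt(n / 2))
```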
Mozilla33
Inter-band Masking
$\hat{g}$ is per-band, but traditional activity masking is per-block
– Could just sum $\hat{g}$ over all bands
– Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher ones, but not the other way around
Still very early; not much is tuned
– $\rho = \left(\hat{g}_h^2 / (\hat{g}_h^2 + \eta\,\hat{g}_l^2)\right)^{\alpha}$; η controls the amount of masking
$\rho\hat{h}$ is then used to derive $Q_\theta$ and K instead of $\hat{h}$
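A sketch of the masking factor, reading $\hat{g}_h$ as the gain of the band being coded and $\hat{g}_l$ as the gain of the lower-frequency band(s) that mask it, and reusing the same α as the gain companding. Both readings are assumptions, as is the function name.

```python
def masking_factor(g_high, g_low, eta, alpha=1 / 6):
    """rho = (g_h^2 / (g_h^2 + eta * g_l^2))**alpha.

    rho is 1 when the lower bands are empty and shrinks as their energy grows.
    """
    gh2 = g_high * g_high
    denom = gh2 + eta * g_low * g_low
    if denom == 0.0:
        return 1.0  # no energy at all: apply no masking (assumption)
    return (gh2 / denom) ** alpha
```

Replacing $\hat{h}$ by $\rho\hat{h}$ (with $\rho \le 1$) makes $Q_\theta = \beta/(\rho\hat{h})$ coarser and K smaller, so a band next to energetic lower bands gets less resolution.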
Mozilla34
Calibration
Activity masking always increases rate
– Scale base quantizer in each band to reduce rate: $Q = Q_0 L^{(1/\beta - 1)}$
L is the maximum luma value
– Just an approximation; seems to work okay
– AM currently disabled for chroma
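The calibration formula as a one-liner, assuming 8-bit video so that L = 255 (the slide only says L is the maximum luma value); the function name is hypothetical. With $\beta = 1/(1-2\alpha)$ the exponent $1/\beta - 1$ simplifies to $-2\alpha$.

```python
def calibrated_quantizer(q0, beta, max_luma=255):
    """Q = Q0 * L**(1/beta - 1), with L the maximum luma value.

    Because beta = 1/(1 - 2*alpha), the exponent 1/beta - 1 equals -2*alpha.
    With beta = 1 (no activity masking) this returns Q0 unchanged.
    """
    return q0 * max_luma ** (1.0 / beta - 1.0)
```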
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues
Better entropy coding
– Everything order-0
– Take advantage of correlation in gain, θ, noref, etc.
Better RDO
– Currently iterating over small range of gains, θs
– Rate estimates very approximate
Reducing overhead of loss-robust case
Noise injection/folding
Bit-exact implementation
Tuning, etc.
Mozilla5
Why Vector Quantization 3 classic advantages (Lookabaugh et al 1989)
ndash Space filling advantage VQ codepoints tile space more efficiently
Example 2-D squares vs hexagons Maximum possible gain for large dimension 153 dB
ndash Shape advantage VQ can use more points where PDF is higher
Can be mitigated with entropy coding
ndash Memory advantage exploit statistical dependence between vector components
Transform coefficients are not strongly correlated
Mozilla6
Why Vector Quantization Important Space advantage applies even when
values are totally uncorrelated Another important advantage
ndash Can have codebooks with less than 1 bit per dimension
Mozilla7
Why Algebraic VQ Trained VQ impractical for high rates large
dimensionsndash High dimension rarr large LUTs lots of memory
Exponential in bitrate
ndash No codebook structure rarr slow search
ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search
Space-filling lattice for arbitrary dimension unknown have to approximate
ndash PVQ asymptotically optimal for Laplacian sources
Mozilla8
Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)
ndash Vector = Magnitude times Unit Vector (point on sphere)
Potential advantagesndash Can give each piece different rate allocations
Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos
ndash Implicit activity masking Can derive quantization resolution from the explicitly
coded energy
ndash Better representation of coefficients
Mozilla9
How it Works (High-Level)
Mozilla10
Simple Case PVQ without a Predictor
Scalar quantize gain Place K unit pulses in N dimensions
ndash Up to N = 1024 dimensions for large blocks
ndash Only has N-1 degrees of freedom
Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla6
Why Vector Quantization Important Space advantage applies even when
values are totally uncorrelated Another important advantage
ndash Can have codebooks with less than 1 bit per dimension
Mozilla7
Why Algebraic VQ Trained VQ impractical for high rates large
dimensionsndash High dimension rarr large LUTs lots of memory
Exponential in bitrate
ndash No codebook structure rarr slow search
ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search
Space-filling lattice for arbitrary dimension unknown have to approximate
ndash PVQ asymptotically optimal for Laplacian sources
Mozilla8
Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)
ndash Vector = Magnitude times Unit Vector (point on sphere)
Potential advantagesndash Can give each piece different rate allocations
Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos
ndash Implicit activity masking Can derive quantization resolution from the explicitly
coded energy
ndash Better representation of coefficients
Mozilla9
How it Works (High-Level)
Mozilla10
Simple Case PVQ without a Predictor
Scalar quantize gain Place K unit pulses in N dimensions
ndash Up to N = 1024 dimensions for large blocks
ndash Only has N-1 degrees of freedom
Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla7
Why Algebraic VQ Trained VQ impractical for high rates large
dimensionsndash High dimension rarr large LUTs lots of memory
Exponential in bitrate
ndash No codebook structure rarr slow search
ldquoAlgebraicrdquo VQ solves these problemsndash Structured codebook no LUTs fast search
Space-filling lattice for arbitrary dimension unknown have to approximate
ndash PVQ asymptotically optimal for Laplacian sources
Mozilla8
Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)
ndash Vector = Magnitude times Unit Vector (point on sphere)
Potential advantagesndash Can give each piece different rate allocations
Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos
ndash Implicit activity masking Can derive quantization resolution from the explicitly
coded energy
ndash Better representation of coefficients
Mozilla9
How it Works (High-Level)
Mozilla10
Simple Case PVQ without a Predictor
Scalar quantize gain Place K unit pulses in N dimensions
ndash Up to N = 1024 dimensions for large blocks
ndash Only has N-1 degrees of freedom
Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla8
Why Gain-Shape Quantization Separate ldquogainrdquo (energy) from ldquoshaperdquo (spectrum)
ndash Vector = Magnitude times Unit Vector (point on sphere)
Potential advantagesndash Can give each piece different rate allocations
Preserve energy (contrast) instead of low-passing Scalar can only add energy by coding plusmn1rsquos
ndash Implicit activity masking Can derive quantization resolution from the explicitly
coded energy
ndash Better representation of coefficients
Mozilla9
How it Works (High-Level)
Mozilla10
Simple Case PVQ without a Predictor
Scalar quantize gain Place K unit pulses in N dimensions
ndash Up to N = 1024 dimensions for large blocks
ndash Only has N-1 degrees of freedom
Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking
$\hat{g}$ is per-band, but traditional activity masking is per-block
– Could just sum $\hat{g}$ over all bands
– Actual model is that energy in one band masks energy in another
  – Lower bands appear to mask higher, but not the other way around
Still very early, not much is tuned
– $\rho = \left(\frac{\hat{g}_h^2}{\hat{g}_h^2 + \eta\,\hat{g}_l^2}\right)^{\alpha}$, where η controls the amount of masking
  – $\rho\hat{h}$ is then used to derive $Q_\theta$ and K instead of $\hat{h}$
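The inter-band factor ρ is sketched below. Several points are assumptions:
- `g_h` is the quantized gain of the band being coded.
- `g_l` is the (combined) quantized gain of the lower band(s) that mask it.
- α is the same activity-masking exponent as in Step 1.
- Returning 1 when both gains are zero is our own choice.

Lower-band energy pushes ρ below 1. That shrinks the effective ĥ, which gives a coarser Q_θ and fewer pulses.

```python
def masking_factor(g_h: float, g_l: float, eta: float, alpha: float = 1.0 / 6.0) -> float:
    """rho = (g_h^2 / (g_h^2 + eta * g_l^2)) ** alpha."""
    num = g_h * g_h
    den = num + eta * g_l * g_l
    if den == 0.0:
        return 1.0  # no energy anywhere: apply no extra masking
    return (num / den) ** alpha


def masked_h(h: float, g_h: float, g_l: float, eta: float,
             alpha: float = 1.0 / 6.0) -> float:
    """rho * h_hat, which replaces h_hat when deriving Q_theta and K."""
    return masking_factor(g_h, g_l, eta, alpha) * h


if __name__ == "__main__":
    for g_l in (0.0, 10.0, 40.0):
        print(g_l, masking_factor(10.0, g_l, eta=0.5), masked_h(12.0, 10.0, g_l, 0.5))
```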
Mozilla34
Calibration
Activity masking always increases rate
– Scale base quantizer in each band to reduce rate: $Q = Q_0 L^{(1/\beta - 1)}$
  – L is the maximum luma value
– Just an approximation, seems to work okay
– AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues
Better entropy coding
– Everything order-0
– Take advantage of correlation in gain, θ, noref, etc.
Better RDO
– Currently iterating over small range of gains, θs
– Rate estimates very approximate
Reducing overhead of loss-robust case
Noise injection/folding
Bit-exact implementation
Tuning, etc.
Mozilla9
How it Works (High-Level)
Mozilla10
Simple Case: PVQ without a Predictor
Scalar quantize gain
Place K unit pulses in N dimensions
– Up to N = 1024 dimensions for large blocks
– Only has N − 1 degrees of freedom
Normalize to unit norm
K is derived implicitly from the gain
Can also code K and derive gain
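The following is a minimal Python sketch of the pulse-placement step. It projects |x| onto the pyramid Σ|y_i| = K and rounds down. It then adds the remaining pulses one at a time, each where it most increases the normalized correlation ⟨|x|, y⟩²/⟨y, y⟩. This greedy search is similar in spirit to the one in the Opus/CELT encoder, but it is not claimed to be Daala's exact search. The function names are hypothetical, and K ≥ 1 is assumed.

```python
import math


def pvq_search(x, k):
    """Return an integer vector y with sum(|y_i|) == k pointing roughly along x."""
    n = len(x)
    ax = [abs(v) for v in x]
    l1 = sum(ax)
    if l1 == 0.0:
        return [k] + [0] * (n - 1)  # arbitrary codeword for a zero input
    # 1) Project onto the pyramid and round down, so at most k pulses are used.
    y = [int(math.floor(k * a / l1)) for a in ax]
    xy = sum(a * p for a, p in zip(ax, y))  # correlation <|x|, y>
    yy = sum(p * p for p in y)  # energy <y, y>
    # 2) Greedily place the remaining pulses to maximize xy^2 / yy.
    for _ in range(k - sum(y)):
        best = max(range(n), key=lambda i: (xy + ax[i]) ** 2 / (yy + 2 * y[i] + 1))
        xy += ax[best]
        yy += 2 * y[best] + 1
        y[best] += 1
    # 3) Restore the signs of the input.
    return [p if v >= 0 else -p for p, v in zip(y, x)]


def pvq_reconstruct(y, gain):
    """Normalize the codeword to unit norm and scale by the quantized gain."""
    norm = math.sqrt(sum(p * p for p in y))
    return [gain * p / norm for p in y]


if __name__ == "__main__":
    x = [0.9, -0.3, 0.1, 0.0]
    y = pvq_search(x, 5)
    print(y, pvq_reconstruct(y, math.sqrt(sum(v * v for v in x))))
```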
Mozilla11
Codebook for N = 3 and different K
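The codebooks in the figure are the integer vectors with Σ|y_i| = K. The code below lists them for small cases. It also counts them with the standard PVQ codebook-size recurrence V(N, K) = V(N−1, K) + V(N, K−1) + V(N−1, K−1). For N = 3 this gives 6, 18, and 38 codewords for K = 1, 2, and 3. The helper names are hypothetical.

```python
from functools import lru_cache
from itertools import product


@lru_cache(maxsize=None)
def codebook_size(n: int, k: int) -> int:
    """Number of integer N-vectors with sum(|y_i|) == K (the PVQ codebook size)."""
    if k == 0:
        return 1
    if n == 0:
        return 0
    return codebook_size(n - 1, k) + codebook_size(n, k - 1) + codebook_size(n - 1, k - 1)


def enumerate_codebook(n: int, k: int):
    """Brute-force listing, only practical for tiny N and K such as N = 3."""
    return [v for v in product(range(-k, k + 1), repeat=n) if sum(map(abs, v)) == k]


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, codebook_size(3, k), len(enumerate_codebook(3, k)))
```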
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor
Video provides us with useful predictors
We want to treat vectors in the direction of the prediction as “special”
– They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution: align the codebook axes with the prediction, and treat one dimension differently
Mozilla14
2-D Projection Example
[Figure: input vector]
Input
Mozilla15
2-D Projection Example
[Figure: input and prediction vectors]
Input + Prediction
Mozilla16
2-D Projection Example
[Figure: input and prediction vectors]
Input + Prediction
Compute Householder reflection
Mozilla17
2-D Projection Example
[Figure: input and prediction vectors]
Input + Prediction
Compute Householder reflection
Apply reflection
Mozilla18
2-D Projection Example
[Figure: reflected input and prediction, with angle θ]
Input + Prediction
Compute Householder reflection
Apply reflection
Compute & code angle
Mozilla19
2-D Projection Example
[Figure: reflected input and prediction, with angle θ]
Input + Prediction
Compute Householder reflection
Apply reflection
Compute & code angle
Code other dimensions
Mozilla20
What does this accomplish?
Creates another “intuitive” parameter, θ
– “How much like the predictor are we?”
– θ = 0 → use predictor exactly
θ determines how many pulses go in the “prediction” direction
– K (and thus bitrate) for remaining N − 1 dimensions adjusted down
Remaining N − 1 dimensions have N − 2 degrees of freedom (no redundancy)
– Can repeat for more predictors
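On the decoder side, the parameters combine as sketched below. The code places ĝcos θ on the prediction axis and ĝ sin θ times a unit vector u on the remaining N − 1 dimensions. It then applies the same Householder reflection again, since a reflection is its own inverse. With θ = 0 the output is the gain times the normalized predictor. The function name is hypothetical. Here u stands for the already-decoded and normalized PVQ shape, and r must be non-zero.

```python
import math


def reconstruct(g_hat, theta, u, r):
    """Rebuild x_hat from the gain, the angle theta, a unit (N-1)-vector u
    (the decoded PVQ shape), and the non-zero prediction r."""
    n = len(r)
    m = max(range(n), key=lambda i: abs(r[i]))
    s = 1.0 if r[m] >= 0 else -1.0
    # In the reflected domain the prediction lies along -s * e_m.
    rest = iter(u)
    z = [g_hat * (-s * math.cos(theta) if i == m else math.sin(theta) * next(rest))
         for i in range(n)]
    # Householder reflections are their own inverse: apply the same one again.
    v = list(r)
    v[m] += s * math.sqrt(sum(a * a for a in r))
    coef = 2.0 * sum(a * b for a, b in zip(v, z)) / sum(a * a for a in v)
    return [zi - coef * vi for zi, vi in zip(z, v)]


if __name__ == "__main__":
    print(reconstruct(2.0, 0.0, [1.0, 0.0], [3.0, 4.0, 0.0]))
    print(reconstruct(2.0, 0.7, [0.6, 0.8], [3.0, 4.0, 0.0]))
```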
Mozilla21
Details
Mozilla22
Band Structure
DC coded separately with scalar quantization
AC coefficients grouped into bands
– Gain, theta, etc. signaled separately for each band
– Layout ad hoc for now
Scan order in each band optimized for decreasing average variance
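Below is a small sketch of how a per-band scan order could be derived from training data. It estimates each position's variance over a set of training blocks and sorts positions by decreasing variance. Both the training data layout and the function name are assumptions; the slides do not say how the variances were gathered.

```python
def scan_order(training_blocks):
    """Return band positions sorted by decreasing average variance.

    training_blocks: list of equal-length coefficient vectors for one band.
    """
    count = len(training_blocks)
    n = len(training_blocks[0])
    variances = []
    for i in range(n):
        vals = [blk[i] for blk in training_blocks]
        mean = sum(vals) / count
        variances.append(sum((v - mean) ** 2 for v in vals) / count)
    # sorted() is stable, so positions with equal variance keep their index order.
    return sorted(range(n), key=lambda i: -variances[i])


if __name__ == "__main__":
    blocks = [[0.0, 5.0, 1.0], [0.0, -5.0, -1.0], [0.0, 5.0, 1.0], [0.0, -5.0, -1.0]]
    print(scan_order(blocks))
```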
Mozilla23
Band Structure
Scan order is possibly over-fit
[Figure: band structure for 4x4, 8x8, and 16x16 blocks]
Mozilla24
To Predict or Not to Predict
θ > π/2 → prediction not helping
– Could code large θ's, but doesn't seem that useful
– Need to handle zero predictors anyway
Current approach: code a “noref” flag
– Currently jointly code up to 4 flags at once, with a fixed order-0 probability per band (5% of keyframe rate)
– Patches in review cut this down a lot:
  Force noref=1 when predictor is zero in keyframes
  Separate probabilities for each block size
  Adapt the probabilities
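The slide's criterion for dropping the prediction can be sketched as follows. The predictor is skipped when it is zero, or when θ > π/2, which is equivalent to a negative dot product ⟨x, r⟩. This rule ignores rate, so an actual encoder may decide differently. The `pack_flags` mapping of up to four flags into one joint symbol is a hypothetical layout, not the bitstream's.

```python
def choose_noref(x, r):
    """Return True when the prediction should be skipped (noref = 1).

    Mirrors the slide's criterion only: a zero predictor, or theta > pi/2,
    i.e. cos(theta) = <x, r> / (|x| |r|) < 0.
    """
    if all(v == 0 for v in r):
        return True
    return sum(a * b for a, b in zip(x, r)) < 0


def pack_flags(flags):
    """Hypothetical joint symbol (0..15) for up to 4 noref flags."""
    if len(flags) > 4:
        raise ValueError("at most 4 flags are coded jointly")
    return sum(int(f) << i for i, f in enumerate(flags))


if __name__ == "__main__":
    flags = [choose_noref([1.0, 2.0], [0.0, 0.0]), choose_noref([1.0, 1.0], [1.0, 0.0])]
    print(flags, pack_flags(flags))
```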
Mozilla25
Quantization Matrix
Simple approach (what we're doing now)
– Separate quantization resolution for each band
  Keep flat quantization within bands
Advanced approach
– Scaling after normalization: complicated
  Unit pulses no longer “unit” (how to sum to K)
  Householder reflection scrambles things further
– Better(?): pre-scale vector by quantization factors
  – Effects on energy preservation
Mozilla26
Quantization Matrix Example
[Figure: flat quantizer (base Q=35) vs. adjusted per-band (base Q=23)]
Mozilla27
Quantization Matrix Example
[Figure: flat quantizer (base Q=35) vs. adjusted per-band (base Q=23)]
Metrics: +15 PSNR, +12 SSIM, -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla10
Simple Case PVQ without a Predictor
Scalar quantize gain Place K unit pulses in N dimensions
ndash Up to N = 1024 dimensions for large blocks
ndash Only has N-1 degrees of freedom
Normalize to unit norm K is derived implicitly from the gain Can also code K and derive gain
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla11
Codebook for N=3 anddifferent K
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla12
PVQ vs Scalar Quantization
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla13
PVQ with a Predictor Video provides us with useful predictors We want to treat vectors in the direction of the
prediction as ldquospecialrdquondash They are much more likely
Subtracting and coding the residual would lose energy preservation
Solution align the codebook axes with the prediction and treat one dimension differently
Mozilla14
2-D Projection Example
Input
Input
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking
ĝ is per-band, but traditional activity masking is per-block
– Could just sum ĝ over all bands
– Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher ones, but not the other way around
Still very early, not much is tuned
– $\rho = \left(\hat{g}_h^2 / (\hat{g}_h^2 + \eta\,\hat{g}_l^2)\right)^{\alpha}$; η controls the amount of masking
$\rho\hat{h}$ is then used to derive $Q_\theta$ and K instead of ĥ
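A small sketch of the inter-band factor. We read $\hat{g}_h$ as the current (higher) band's gain and $\hat{g}_l$ as the masking lower-band gain; that reading, the zero-gain convention, and the function names are assumptions. Since ρ ≤ 1 for η ≥ 0, using ρĥ in place of ĥ makes $Q_\theta = \beta/(\rho\hat{h})$ coarser and K smaller, i.e. less resolution where lower bands carry more energy.

```python
def masking_factor(g_h: float, g_l: float, eta: float, alpha: float) -> float:
    """rho = (g_h^2 / (g_h^2 + eta * g_l^2))^alpha, in [0, 1] for eta >= 0."""
    num = g_h * g_h
    den = num + eta * g_l * g_l
    if den == 0.0:
        return 1.0          # both gains zero: nothing to mask (our convention)
    return (num / den) ** alpha

def masked_h(h_hat: float, g_h: float, g_l: float, eta: float, alpha: float) -> float:
    """Scaled companded gain rho * h_hat, used in place of h_hat for Q_theta and K."""
    return masking_factor(g_h, g_l, eta, alpha) * h_hat
```

Setting η = 0 disables inter-band masking entirely (ρ = 1), which makes it easy to A/B against per-band masking alone.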
Mozilla34
Calibration
Activity masking always increases rate
– Scale the base quantizer in each band to reduce rate: $Q = Q_0\,L^{(1/\beta - 1)}$
L is the maximum luma value
– Just an approximation; seems to work okay
– AM currently disabled for chroma
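Taking the calibration formula $Q = Q_0\,L^{(1/\beta - 1)}$ at face value, the exponent simplifies through $\beta = 1/(1 - 2\alpha)$; with the starting value α = 1/6 it becomes −1/3:

```latex
\frac{1}{\beta} - 1 = (1 - 2\alpha) - 1 = -2\alpha
\quad\Longrightarrow\quad
Q = Q_0\,L^{-2\alpha},
\qquad
\alpha = \tfrac{1}{6} \;\Rightarrow\; Q = Q_0\,L^{-1/3}
```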
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues
Better entropy coding
– Everything order-0
– Take advantage of correlation in gain, θ, noref, etc.
Better RDO
– Currently iterating over a small range of gains and θs
– Rate estimates very approximate
Reducing overhead of the loss-robust case
Noise injection/folding
Bit-exact implementation
Tuning, etc.
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla15
2-D Projection Example
Prediction
Input
Input + Prediction
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla16
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla17
2-D Projection Example
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla18
2-D Projection Example
θ
Prediction
Input
Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla19
2-D Projection Example Input + Prediction Compute Householder
Reflection Apply Reflection Compute amp
code angle Code other
dimensions
Prediction
Input
θ
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking
- Goal: Use better resolution in flat areas
  - Low contrast → low energy (gain)
  - Derivations in doc/video_pvq.lyx, doc/theoretical_results.lyx
    - Currently wrong/incomplete, working on updates
Mozilla29
Activity Masking
- Step 1: Compand gain (g)
  - Goal: $Q \propto g^{2\alpha}$ (x264 uses α = 0.173, we start with 1/6)
  - Quantize $\hat{g} = (Q_g \hat{h})^\beta$, encode ĥ
    - $\beta = 1/(1 - 2\alpha)$
Qg = (Qβ)β
  - Offset steps so at least one value of ĥ gives the same gain as the prediction
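Step 1 can be sketched as below, assuming $Q_g$ is already known and passed in (the sketch does not derive it from Q) and that rounding happens in the companded domain. It omits the offset that lets one value of ĥ reproduce the predicted gain. The helper `step_size` (a hypothetical name, like the others) measures the spacing between reconstructed gains, so you can check that it grows with the gain, as $Q \propto g^{2\alpha}$ intends.

```python
def beta_from_alpha(alpha):
    """beta = 1 / (1 - 2 * alpha); alpha = 1/6 gives beta = 1.5."""
    if not 0.0 <= alpha < 0.5:
        raise ValueError("alpha must be in [0, 0.5)")
    return 1.0 / (1.0 - 2.0 * alpha)


def compand_quantize(g, q_g, beta):
    """Quantize gain g as g_hat = (q_g * h) ** beta; return (h, g_hat)."""
    h = round(g ** (1.0 / beta) / q_g)  # integer index in the companded domain
    return h, (q_g * h) ** beta


def step_size(h, q_g, beta):
    """Distance between the reconstructed gains for indices h and h + 1."""
    return (q_g * (h + 1)) ** beta - (q_g * h) ** beta
```

With β = 1 (α = 0) the steps are uniform and this reduces to plain scalar quantization of the gain. With β = 1.5 the steps widen as the gain grows, so low-energy (flat) bands get finer gain resolution.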
Mozilla30
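Activity Masking (cont'd)
- Step 2: Choose θ resolution
  - Polar coordinates: $\hat{g}\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = d\hat{g}/d\hat{h}$
    - $\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = \sqrt{2 - 2\cos(\theta - \vartheta)} \approx \mathrm{arcdistance}(\theta, \vartheta) \approx |\theta - \vartheta|$
    - At least for small $\theta - \vartheta$
  - $Q_\theta = (d\hat{g}/d\hat{h})/\hat{g} = \beta/\hat{h}$
    - Make sure $Q_\theta$ evenly divides $\pi/2$
    - When ĝ is small, force $Q_\theta = \pi/2$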
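A sketch of step 2, assuming ĥ and β from step 1. It starts from $Q_\theta = \beta/\hat{h}$, rounds the number of steps in $[0, \pi/2]$ to an integer so that $Q_\theta$ divides π/2 evenly, and never uses fewer than one step. In this sketch the small-gain rule falls out of the rounding (a small ĥ yields a single step of π/2) rather than coming from a separate threshold, which is an assumption. The helper `chord` (a hypothetical name) checks the small-angle approximation used in the derivation.

```python
import math


def theta_resolution(h_hat, beta):
    """Return (q_theta, steps) with q_theta * steps == pi / 2."""
    if h_hat <= 0:
        return math.pi / 2, 1  # zero gain: a single angular step
    steps = max(1, round((math.pi / 2) / (beta / h_hat)))
    return (math.pi / 2) / steps, steps


def chord(theta, vartheta):
    """Distance between unit vectors at angles theta and vartheta."""
    return math.sqrt(2.0 - 2.0 * math.cos(theta - vartheta))
```

For ĥ = 10 and β = 1.5 the raw step is 0.15 rad, about 10.5 steps per quarter turn, which rounds to 10 steps of π/20.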
Mozilla31
Activity Masking (cont'd)
- Step 3: Choose K
  - $D = (g - \hat{g})^2 + g\hat{g}\,(D_\theta + \sin\theta\,\sin\vartheta\,D_{pvq})$
    - $D_\theta = 2 - 2\cos(\theta - \vartheta)$ = distortion due to θ quantization
    - $D_{pvq}$ = distortion due to PVQ on the last N − 1 dimensions
  - Distortion due to scalar quantizing the gain: $(d\hat{g}/d\hat{h})^2/12$
  - High-rate distortion due to PVQ: $(N - 1)^2/(24K^2)$
    - Derived experimentally, far too high at low rate
    - (N − 2) DOF → should be (N − 2) times the gain distortion
  - Assume $g = \hat{g}$, $\theta = \vartheta$, and solve for K: $K = (\hat{h}/\beta)\sin\vartheta\,(N - 1)/\sqrt{2(N - 2)} \approx (\hat{h}/\beta)\sin\vartheta\sqrt{N/2}$
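The step 3 result can be checked numerically. This sketch assumes N ≥ 3 and the distortion models above. It computes K from the exact expression and from the √(N/2) approximation. The function `pvq_vs_gain_distortion` (a hypothetical name) evaluates both sides of the balance condition that the formula solves: PVQ distortion on the last N − 1 dimensions equal to N − 2 times the gain distortion, with $d\hat{g}/d\hat{h} = \beta\hat{g}/\hat{h}$ from step 2. Rounding K to an integer is left to the caller.

```python
import math


def pulses_exact(h_hat, beta, vartheta, n):
    """K = (h_hat / beta) * sin(vartheta) * (N - 1) / sqrt(2 * (N - 2))."""
    if n < 3:
        raise ValueError("need N >= 3 so that N - 2 > 0")
    return (h_hat / beta) * math.sin(vartheta) * (n - 1) / math.sqrt(2.0 * (n - 2))


def pulses_approx(h_hat, beta, vartheta, n):
    """K ~= (h_hat / beta) * sin(vartheta) * sqrt(N / 2)."""
    return (h_hat / beta) * math.sin(vartheta) * math.sqrt(n / 2.0)


def pvq_vs_gain_distortion(g_hat, h_hat, beta, vartheta, n, k):
    """Return (PVQ term, target) under the slide's models with g = g_hat, theta = vartheta."""
    # g * g_hat * sin(theta) * sin(vartheta) * (N - 1)^2 / (24 K^2)
    pvq = g_hat ** 2 * math.sin(vartheta) ** 2 * (n - 1) ** 2 / (24.0 * k ** 2)
    # Gain step d(g_hat)/d(h_hat) = beta * g_hat / h_hat; scalar distortion = step^2 / 12
    gain_step = beta * g_hat / h_hat
    return pvq, (n - 2) * gain_step ** 2 / 12.0
```

At N = 64 the exact and approximate K differ by roughly 0.01%. The gap grows for small bands (about 6% at N = 4).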
Mozilla32
Loss Robustness
- $K \approx (\hat{h}/\beta)\sin\vartheta\sqrt{N/2}$
  - ĥ is offset by the companded reference gain, so it can be wrong if there are losses
  - But if K is wrong, we'll decode the wrong number of pulses, totally desyncing the bitstream
- Remove dependence on ĥ
  - $\sin\vartheta \approx \vartheta \rightarrow \hat{h}\sin\vartheta \approx \hat{h}\vartheta = \hat{h}Q_\theta(\vartheta/Q_\theta) = \beta(\vartheta/Q_\theta)$
  - $\vartheta/Q_\theta$ is the index encoded in the bitstream
    - Since $Q_\theta$ is not exact, can't cap $\vartheta \le \pi/2$ in the bitstream
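With the ĥ dependence removed, $(\hat{h}/\beta)\sin\vartheta \approx \vartheta/Q_\theta$, so K follows from the transmitted angle index alone. A sketch, assuming the integer index t = ϑ/Q_θ and the band size N are all the decoder needs (the name `pulses_from_theta_index` is hypothetical):

```python
import math


def pulses_from_theta_index(t, n):
    """Loss-robust pulse count: K ~= t * sqrt(N / 2), with t = vartheta / Q_theta."""
    if t < 0:
        raise ValueError("theta index must be non-negative")
    return round(t * math.sqrt(n / 2.0))
```

An index of 0 (θ quantized to zero, use the predictor exactly) yields no pulses. The decoder no longer needs ĥ, which may be wrong after a loss, to know how many pulses to read. The cost is the small-angle approximation and a $Q_\theta$ that only approximately equals β/ĥ once it is adjusted to divide π/2.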
Mozilla33
Inter-band Masking
- ĝ is per-band, but traditional activity masking is per-block
  - Could just sum ĝ over all bands
  - Actual model is that energy in one band masks energy in another
    - Lower bands appear to mask higher, but not the other way around
- Still very early, not much is tuned
  - $\rho = \left(\hat{g}_h^2 / (\hat{g}_h^2 + \eta\,\hat{g}_l^2)\right)^\alpha$, where η controls the amount of masking
  - $\rho\hat{h}$ is then used to derive $Q_\theta$ and K instead of ĥ
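A sketch of the inter-band masking factor. It assumes ĝ_l is a single combined gain for the lower (masking) bands; how the lower bands are combined is not specified above, so treat that as an assumption. The function names are hypothetical. Setting η = 0 turns masking off (ρ = 1).

```python
def masking_factor(g_high, g_low, eta, alpha):
    """rho = (g_high^2 / (g_high^2 + eta * g_low^2)) ** alpha."""
    denom = g_high ** 2 + eta * g_low ** 2
    if denom == 0.0:
        return 1.0  # no energy anywhere: nothing to mask
    return (g_high ** 2 / denom) ** alpha


def masked_index(h_hat, g_high, g_low, eta, alpha):
    """rho * h_hat, used in place of h_hat when deriving Q_theta and K."""
    return masking_factor(g_high, g_low, eta, alpha) * h_hat
```

Because ρ ≤ 1 whenever η ≥ 0, strong low-band energy lowers the effective ĥ. Through $Q_\theta = \beta/\hat{h}$ and the K formula, this gives the masked band a coarser angle and fewer pulses.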
Mozilla34
Calibration
- Activity masking always increases rate
  - Scale base quantizer in each band to reduce rate: $Q = Q_0 L^{(1/\beta - 1)}$
    - L is the maximum luma value
  - Just an approximation, seems to work okay
  - AM currently disabled for chroma
Mozilla35
Activity Masking Example
[Figure: No activity masking (base Q=23)]
Mozilla36
Activity Masking Example
[Figure: Activity masking (base Q=23)]
Mozilla37
Open Issues
- Better entropy coding
  - Everything order-0
  - Take advantage of correlation in gain, θ, noref, etc.
- Better RDO
  - Currently iterating over a small range of gains, θs
  - Rate estimates very approximate
- Reducing overhead of loss-robust case
- Noise injection/folding
- Bit-exact implementation
- Tuning, etc.
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla20
What does this accomplish Creates another ldquointuitiverdquo parameter θ
ndash ldquoHow much like the predictor are werdquo
ndash θ = 0 rarr use predictor exactly
θ determines how many pulses go in the ldquopredictionrdquo direction
ndash K (and thus bitrate) for remaining N-1 dimensions adjusted down
Remaining N-1 dimensions have N-2 degrees of freedom (no redundancy)
ndash Can repeat for more predictors
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla21
Details
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla22
Band Structure DC coded separately with scalar quantization AC coefficients grouped into bands
ndash Gain theta etc signaled separately for each band
ndash Layout ad-hoc for now
Scan order in each band optimized for decreasing average variance
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla23
Band Structure
Scan order is possibly over-fit
4x48x8
16x16
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla24
To Predict or Not to Predict θ gt π2 rarr Prediction not helping
ndash Could code large θrsquos but doesnrsquot seem that useful
ndash Need to handle zero predictors anyway
Current approach code a ldquonorefrdquo flagndash Currently jointly code up to 4 flags at once with
fixed order-0 probability per band (5 of KF rate)
ndash Patches in review cut this down this a lot Force noref=1 when predictor is zero in keyframes Separate probabilities for each block size Adapt the probabilities
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla25
Quantization Matrix Simple approach (what wersquore doing now)
ndash Separate quantization resolution for each band Keep flat quantization within bands
Advanced approachndash Scaling after normalization complicated
Unit pulses no longer ldquounitrdquo (how to sum to K) Householder reflection scrambles things further
ndash Better() Pre-scale vector by quantization factors
ndash Effects on energy preservation
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
Activity Masking cotd Step 2 Choose θ resolution
ndash Polar coordinates ĝ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = dĝdĥ radic(cos θ ndash cos )ϑ 2 + (sin θ ndash sin )ϑ 2 = 2 ndash 2cos(θ ndash ϑ)
asymp arcdistance(θ ) asymp ϑ θ ndash ϑndash At least for small θ ndash ϑ
ndash Qθ = (dĝdĥ)ĝ = βĥ
Make sure Qθ evenly divides π2
When ĝ is small force Qθ = π2
Mozilla31
Activity Masking cotd Step 3 Choose K
ndash D = (g - ĝ)2 + gĝ(Dθ + sin θ sin ϑ D
pvq)
Dθ = 2 ndash 2cos(θ ndash ϑ) = distortion due to θ quant
Dpvq
= distortion due to PVQ on last N ndash 1 dimensions
ndash Distortion due to scalar quantizing gain (dĝdĥ)212
ndash High-rate distortion due to PVQ (N ndash 1)2(24K2) Derived experimentally far too high at low rate (N ndash 2) DOF rarr should be (N ndash 2) times gain distortion
ndash Assume g = ĝ θ = ϑ solve for K K = (ĥβ) sin ϑ (N ndash 1)radic2(N ndash 2) asymp (ĥβ) sin ϑ radicN2
Mozilla32
Loss Robustness K asymp (ĥβ) sin ϑ radicN2
ndash ĥ is offset by the companded reference gain so can be wrong if there are losses
ndash But if K is wrong wersquoll decode the wrong number of pulses totally desyncing the bitstream
Remove dependence on ĥ
ndash sin ϑ asymp rarr ĥ ϑ sin ϑ asymp ĥ = ĥQϑθ(ϑQ
θ) = (ϑQ
θ)β
ndash (ϑQθ) is the index encoded in the bitstream
Since Qθ not exact canrsquot cap ϑ le π2 in bitstream
Mozilla33
Inter-band Masking ĝ is per-band but traditional activity masking is
per-blockndash Could just sum ĝ over all bands
ndash Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher but not other way around
Still very early not much is tuned
ndash ρ = (ĝh
2(ĝh
2 + ηĝl
2))α η controls amount of masking
ρĥ then used to derive Qθ and K instead of ĥ
Mozilla34
Calibration Activity masking always increases rate
ndash Scale base quantizer in each band to reduce rate Q = Q
0L(1β ndash 1)
L is the maximum luma value
ndash Just an approximation seems to work okay
ndash AM currently disabled for chroma
Mozilla35
Activity Masking Example
No activity masking (base Q=23)
Mozilla36
Activity Masking Example
Activity masking (base Q=23)
Mozilla37
Open Issues Better entropy coding
ndash Everything order-0
ndash Take advantage of correlation in gainθnorefetc
Better RDOndash Currently iterating over small range of gains θs
ndash Rate estimates very approximate
Reducing overhead of loss-robust case Noise injectionfolding Bit-exact implementation tuning etc
- Slide 1
- Slide 2
- Slide 3
- Slide 4
- Slide 5
- Slide 6
- Slide 7
- Slide 8
- Slide 9
- Slide 10
- Slide 11
- Slide 12
- Slide 13
- Slide 14
- Slide 15
- Slide 16
- Slide 17
- Slide 18
- Slide 19
- Slide 20
- Slide 21
- Slide 22
- Slide 23
- Slide 24
- Slide 25
- Slide 26
- Slide 27
- Slide 28
- Slide 29
- Slide 30
- Slide 31
- Slide 32
- Slide 33
- Slide 34
- Slide 35
- Slide 36
- Slide 37
-
Mozilla26
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Mozilla27
Quantization Matrix Example
Flat Quantizer (base Q=35) Adjusted Per-Band (base Q=23)
Metrics +15 PSNR +12 SSIM -18 PSNR-HVS
Mozilla28
Activity Masking Goal Use better resolution in flat areas
ndash Low contrast rarr low energy (gain)
ndash Derivations in docvideo_pvqlyx doctheoretical_resultslyx
Currently wrongincomplete working on updates
Mozilla29
Activity Masking Step 1 Compand gain (g)
ndash Goal Q prop g2α (x264 uses α = 0173 we start with 16)
ndash Quantize ĝ = (Qgĥ)β encode ĥ
β = 1(1-2α)
Qg = (Qβ)β
ndash Offset steps so at least one value of ĥ gives same gain as the prediction
Mozilla30
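Activity Masking (cont'd)
Step 2: Choose θ resolution
– Polar coordinates: $\hat{g}\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = \frac{d\hat{g}}{d\hat{h}}$
$\sqrt{(\cos\theta - \cos\vartheta)^2 + (\sin\theta - \sin\vartheta)^2} = \sqrt{2 - 2\cos(\theta - \vartheta)} \approx \operatorname{arcdistance}(\theta, \vartheta) \approx |\theta - \vartheta|$
– At least for small $\theta - \vartheta$
– $Q_\theta = \frac{d\hat{g}/d\hat{h}}{\hat{g}} = \frac{\beta}{\hat{h}}$
Make sure $Q_\theta$ evenly divides $\pi/2$
When $\hat{g}$ is small, force $Q_\theta = \pi/2$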
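Below is Step 2 as a Python sketch. The helper `theta_step` is hypothetical. It starts from Q_θ = β/ĥ, then rounds the number of steps in [0, π/2] so that Q_θ divides π/2 evenly. The slides give no threshold for "ĝ is small", so the sketch uses an assumed cutoff `small_h`. The `chord` helper checks the chord-versus-arc approximation:

```python
import math

HALF_PI = math.pi / 2.0


def theta_step(h: int, beta: float = 1.5, small_h: int = 1) -> float:
    """Angular step Q_theta for companded gain index h.

    Starts from Q_theta = beta / h, then rounds the number of steps in
    [0, pi/2] to an integer so that Q_theta divides pi/2 evenly.
    small_h is a hypothetical cutoff for "g_hat is small".
    """
    if h <= small_h:
        return HALF_PI  # coarsest resolution: one step spans [0, pi/2]
    n_steps = max(1, round(HALF_PI / (beta / h)))
    return HALF_PI / n_steps


def chord(theta: float, vartheta: float) -> float:
    """Euclidean distance between unit vectors at angles theta and vartheta."""
    return math.hypot(math.cos(theta) - math.cos(vartheta),
                      math.sin(theta) - math.sin(vartheta))


for h in (1, 2, 5, 10, 40):
    q_theta = theta_step(h)
    print(f"h={h:3d}  Q_theta={q_theta:.4f}  steps in [0, pi/2]={round(HALF_PI / q_theta)}")

# Chord length versus the arc length |theta - vartheta| for a few angle gaps.
for d in (0.05, 0.2, 0.8):
    print(f"d={d}: chord={chord(d, 0.0):.5f}  arc={d}")
```

The chord and the arc agree closely for small gaps and diverge as the gap grows. This is why the approximation is stated only for small θ – ϑ.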
Mozilla31
Activity Masking (cont'd)
Step 3: Choose K
– $D = (g - \hat{g})^2 + g\hat{g}\left(D_\theta + \sin\theta\,\sin\vartheta\,D_{pvq}\right)$
$D_\theta = 2 - 2\cos(\theta - \vartheta)$ = distortion due to θ quantization
$D_{pvq}$ = distortion due to PVQ on the last $N - 1$ dimensions
– Distortion due to scalar quantizing the gain: $(d\hat{g}/d\hat{h})^2/12$
– High-rate distortion due to PVQ: $(N - 1)^2/(24K^2)$
Derived experimentally; far too high at low rate
$N - 2$ degrees of freedom (DOF) → PVQ distortion should be $N - 2$ times the gain distortion
– Assume $g = \hat{g}$, $\theta = \vartheta$ and solve for K: $K = \frac{\hat{h}}{\beta}\sin\vartheta\,\frac{N - 1}{\sqrt{2(N - 2)}} \approx \frac{\hat{h}}{\beta}\sin\vartheta\sqrt{N/2}$
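The distortion decomposition and the K formula can be checked numerically. This Python sketch makes several assumptions:
- The reference direction is e_0, and the shape z lives in the remaining N − 1 coordinates.
- The vectors are random test data, not codec output.
- `choose_k` (a hypothetical name) returns unrounded values, because the slides do not say how K is rounded.

```python
import math
import random


def unit(v):
    """Scale v to unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gain_shape(g, angle, z):
    """g * (cos(angle) * e_0 + sin(angle) * z), z a unit vector in the last N-1 coords."""
    return [g * math.cos(angle)] + [g * math.sin(angle) * zi for zi in z]


def distortion_formula(g, theta, z, g_hat, vartheta, z_hat):
    """D = (g - g_hat)^2 + g*g_hat*(D_theta + sin(theta)*sin(vartheta)*D_pvq)."""
    d_theta = 2.0 - 2.0 * math.cos(theta - vartheta)  # distortion from theta quantization
    d_pvq = sq_dist(z, z_hat)                         # shape distortion on the last N-1 dims
    return (g - g_hat) ** 2 + g * g_hat * (d_theta + math.sin(theta) * math.sin(vartheta) * d_pvq)


def choose_k(h, vartheta, n, beta=1.5):
    """Unrounded K from Step 3 (requires n > 2): (exact form, sqrt(N/2) form)."""
    base = (h / beta) * math.sin(vartheta)
    return base * (n - 1) / math.sqrt(2.0 * (n - 2)), base * math.sqrt(n / 2.0)


rng = random.Random(1)
n = 16
z = unit([rng.gauss(0.0, 1.0) for _ in range(n - 1)])
z_hat = unit([zi + 0.1 * rng.gauss(0.0, 1.0) for zi in z])
g, theta, g_hat, vartheta = 100.0, 0.70, 96.0, 0.65

direct = sq_dist(gain_shape(g, theta, z), gain_shape(g_hat, vartheta, z_hat))
print(f"direct D = {direct:.6f}, formula D = {distortion_formula(g, theta, z, g_hat, vartheta, z_hat):.6f}")
print("K (exact, approx) =", choose_k(20, vartheta, n))
```

The exact K is chosen to balance two terms. The first is the PVQ term ĝ² sin²ϑ (N − 1)²/(24K²). The second is (N − 2) times the gain term (dĝ/dĥ)²/12, with dĝ/dĥ = βĝ/ĥ.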
Mozilla32
Loss Robustness
$K \approx \frac{\hat{h}}{\beta}\sin\vartheta\sqrt{N/2}$
– $\hat{h}$ is offset by the companded reference gain, so it can be wrong if there are losses
– But if K is wrong, we'll decode the wrong number of pulses, totally desyncing the bitstream
Remove dependence on $\hat{h}$
– $\sin\vartheta \approx \vartheta \;\Rightarrow\; \hat{h}\sin\vartheta \approx \hat{h}\vartheta = \hat{h}Q_\theta\,(\vartheta/Q_\theta) = (\vartheta/Q_\theta)\,\beta$
– $\vartheta/Q_\theta$ is the index encoded in the bitstream
Since $Q_\theta$ is not exact, we can't cap $\vartheta \le \pi/2$ in the bitstream
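Substituting ĥ sin ϑ ≈ (ϑ/Q_θ)β turns the K rule into K ≈ (ϑ/Q_θ)√(N/2). This Python sketch compares that form with the ĥ-dependent one. It assumes the unsnapped Q_θ = β/ĥ and β = 1.5, and both helper names are hypothetical:

```python
import math


def k_from_gain(h, theta_index, n, beta=1.5):
    """K ~= (h/beta) * sin(vartheta) * sqrt(N/2) with vartheta = index * (beta/h).

    Depends on h, which a decoder may get wrong after a loss."""
    vartheta = theta_index * (beta / h)
    return (h / beta) * math.sin(vartheta) * math.sqrt(n / 2.0)


def k_from_index(theta_index, n):
    """Loss-robust form: K ~= (vartheta/Q_theta) * sqrt(N/2).

    Needs only the coded theta index and the band size N."""
    return theta_index * math.sqrt(n / 2.0)


n = 16
for h in (10, 40, 160):
    for idx in (1, 2, 3):
        print(f"h={h:3d} idx={idx}  from gain={k_from_gain(h, idx, n):.3f}  "
              f"from index={k_from_index(idx, n):.3f}")
```

For small angles the two forms agree. Only the second one can be computed without trusting ĥ. Once Q_θ is snapped to divide π/2, ĥQ_θ is no longer exactly β, so the forms drift apart further.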
Mozilla33
Inter-band Masking
$\hat{g}$ is per-band, but traditional activity masking is per-block
– Could just sum $\hat{g}$ over all bands
– Actual model is that energy in one band masks energy in another
Lower bands appear to mask higher ones, but not the other way around
Still very early; not much is tuned
– $\rho = \left(\frac{\hat{g}_h^2}{\hat{g}_h^2 + \eta\,\hat{g}_l^2}\right)^{\alpha}$, where η controls the amount of masking
$\rho\hat{h}$ is then used instead of $\hat{h}$ to derive $Q_\theta$ and K
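Below is a Python sketch of the inter-band masking factor. It makes these assumptions:
- ĝ_h is the gain of the band being coded, and ĝ_l is that of a lower band.
- α = 1/6 and β = 1.5 carry over from the earlier slides.
- The zero-energy fallback is an assumption, and the helper names are hypothetical.

```python
def masking_factor(g_high, g_low, eta, alpha=1.0 / 6.0):
    """rho = (g_h^2 / (g_h^2 + eta * g_l^2))**alpha.

    Returning 1.0 (no masking) when both gains are zero is an assumption."""
    num = g_high * g_high
    den = num + eta * g_low * g_low
    return 1.0 if den == 0 else (num / den) ** alpha


def masked_theta_step(h, rho, beta=1.5):
    """Q_theta derived from rho*h instead of h: beta / (rho * h).

    Snapping to divide pi/2 and the small-gain case (rho*h near 0)
    are left to the caller in this sketch."""
    return beta / (rho * h)


h = 20
for g_low in (0.0, 5.0, 20.0, 80.0):
    rho = masking_factor(10.0, g_low, eta=0.5)
    print(f"g_l={g_low:5.1f}  rho={rho:.4f}  Q_theta={masked_theta_step(h, rho):.4f}")
```

A stronger lower band shrinks ρ. This gives a coarser angular step, and through ρĥ it also gives a smaller K. With η = 0, ρ = 1 and the scheme falls back to per-band activity masking.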