Audio Compression Techniques, MUMT 611, January 2005, Assignment 2, Paul Kolesnik


Post on 19-Dec-2015


Audio Compression Techniques

MUMT 611, January 2005, Assignment 2, Paul Kolesnik


Introduction

Digital audio compression
- Removal of redundant or otherwise irrelevant information from an audio signal
- Audio compression algorithms are often referred to as "audio encoders"

Applications
- Reduces required storage space
- Reduces required transmission bandwidth


Audio Compression

Audio signal: overview
- Sampling rate (number of samples per second)
- Bit rate (number of bits per second). Typically, an uncompressed stereo 16-bit 44.1 kHz signal has a bit rate of about 1.4 Mbit/s
- Number of channels (mono / stereo / multichannel)

The bit rate can be reduced by lowering those values or by data compression / encoding
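The bit-rate figure above follows from multiplying the parameters on this slide. A minimal Python sketch of that arithmetic, assuming plain uncompressed PCM with no container overhead; the function name `pcm_bit_rate` is hypothetical.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


cd = pcm_bit_rate(44_100, 16, 2)  # CD-quality stereo
print(f"{cd} bit/s = {cd / 1e6:.2f} Mbit/s = {cd / 8 / 1e6:.3f} MB/s")
# 1411200 bit/s = 1.41 Mbit/s = 0.176 MB/s

# Lowering any one parameter reduces the rate proportionally:
print(pcm_bit_rate(44_100, 16, 1))  # mono: 705600
print(pcm_bit_rate(22_050, 16, 2))  # half the sampling rate: 705600
print(pcm_bit_rate(44_100, 8, 2))   # 8-bit samples: 705600
```

Note that the result is in megabits, not megabytes, per second: the byte rate is eight times smaller.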


Audio Data Compression

Redundant information
- Implicit in the remaining information
- Example: an oversampled audio signal

Irrelevant information
- Perceptually insignificant
- Cannot be recovered from the remaining information


Audio Data Compression

Lossless audio compression
- Removes redundant data
- The resulting signal is the same as the original: perfect reconstruction

Lossy audio encoding
- Removes irrelevant data
- The resulting signal is similar to the original
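The lossless/lossy distinction can be demonstrated in a few lines of Python. This sketch uses the general-purpose `zlib` compressor as a stand-in for a lossless coder (real lossless audio codecs use audio-specific methods), and a crude requantization that discards the 8 least significant bits as a stand-in for lossy encoding. Both choices are illustrative assumptions.

```python
import math
import struct
import zlib

# One tenth of a second of a 440 Hz tone as 16-bit PCM samples.
RATE = 44_100
samples = [int(8000 * math.sin(2 * math.pi * 440 * n / RATE)) for n in range(RATE // 10)]
fmt = f"<{len(samples)}h"  # little-endian signed 16-bit
raw = struct.pack(fmt, *samples)

# Lossless: decompression returns exactly the original samples.
packed = zlib.compress(raw, 9)
restored = list(struct.unpack(fmt, zlib.decompress(packed)))
print("lossless:", len(raw), "->", len(packed), "bytes, exact:", restored == samples)

# Lossy (toy): discard the 8 least significant bits of every sample.
coarse = [(s >> 8) << 8 for s in samples]
max_err = max(abs(a - b) for a, b in zip(samples, coarse))
print("lossy: exact:", coarse == samples, "max error:", max_err)
```

The lossy version can never be undone exactly, but its error stays below 256 steps out of a 16-bit range.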


Audio Data Compression

Audio vs. speech compression techniques
- Speech compression uses a model of the human vocal tract to compress signals
- Audio compression does not use this technique, because the variety of possible audio signals is much larger


Generic Audio Encoder

[Figure: generic audio encoder]


Generic Audio Encoder

Psychoacoustic model
- Psychoacoustics: the study of how sounds are perceived by humans
- Uses perceptual coding to eliminate information from the audio signal that is inaudible to the ear
- Detects conditions under which different audio signal components mask each other


Psychoacoustic Model

Signal masking
- Threshold cut-off
- Spectral (frequency / simultaneous) masking
- Temporal masking

Threshold cut-off and spectral masking occur in the frequency domain; temporal masking occurs in the time domain


Signal Masking

Threshold cut-off
- The hearing threshold level is a function of frequency
- Any frequency components below the threshold will not be perceived by the human ear

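The slide does not give a formula for the hearing threshold. One widely used approximation of the threshold in quiet is Terhardt's formula (in dB SPL, with frequency in kHz). The Python sketch below assumes that formula and applies a threshold cut-off to a list of hypothetical (frequency, level) components.

```python
import math


def absolute_threshold_db(f_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def threshold_cutoff(components):
    """Keep only (frequency_hz, level_db) pairs above the threshold in quiet."""
    return [(f, level) for f, level in components if level > absolute_threshold_db(f)]


for f in (50, 1000, 3500, 15000):
    print(f, "Hz:", round(absolute_threshold_db(f), 1), "dB SPL")

# Hypothetical spectral components: (frequency in Hz, level in dB SPL).
components = [(50, 30.0), (1000, 10.0), (3500, 0.0), (15000, 20.0)]
print(threshold_cutoff(components))  # [(1000, 10.0), (3500, 0.0)]
```

The ear is most sensitive around 3 to 4 kHz, where the approximated threshold drops below 0 dB SPL. A 30 dB component at 50 Hz or a 20 dB component at 15 kHz is therefore discarded, while much quieter mid-range components are kept.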


Signal Masking

Spectral masking
- A frequency component can be partly or fully masked by another component that is close to it in frequency
- This shifts the hearing threshold

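Here is a rough Python sketch of how spectral masking shifts the threshold around a single masker. It assumes Zwicker's approximation of the Bark (critical-band) scale, a fixed 10 dB margin below the masker, and a triangular spreading function. That function falls 27 dB per Bark below the masker and 10 dB per Bark above it. These numbers are illustrative assumptions; real psychoacoustic models are considerably more detailed.

```python
import math


def bark(f_hz: float) -> float:
    """Zwicker's approximation of the critical-band rate (Bark) for f_hz."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masking_threshold_db(probe_hz, masker_hz, masker_db,
                         margin_db=10.0, lower_slope=27.0, upper_slope=10.0):
    """Threshold (dB) that a single masker imposes at probe_hz (triangular spread)."""
    dz = bark(probe_hz) - bark(masker_hz)
    slope = upper_slope if dz >= 0 else lower_slope  # dB per Bark
    return masker_db - margin_db - slope * abs(dz)


def is_masked(probe_hz, probe_db, masker_hz, masker_db):
    return probe_db < masking_threshold_db(probe_hz, masker_hz, masker_db)


# A 70 dB masker at 1 kHz and a 40 dB probe tone at two frequencies.
print(is_masked(1100, 40.0, 1000, 70.0))  # True: close in frequency
print(is_masked(4000, 40.0, 1000, 70.0))  # False: far away on the Bark scale
```

The shallower upper slope models the fact that masking spreads further toward higher frequencies than toward lower ones.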


Signal Masking

Temporal masking
- A quieter sound can be masked by a louder sound if the two are close in time
- Sounds occurring shortly before or shortly after a sudden increase in volume can be masked

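Temporal masking can be sketched the same way, as a threshold that decays in time around a loud masker. The Python sketch below assumes a masker occupying a time interval and a 10 dB margin. It also assumes a 20 ms pre-masking window and a 100 ms post-masking window, with linear decay in dB. The window lengths are rough illustrative values, and measured masking curves are not linear.

```python
def temporal_threshold_db(t_ms, onset_ms, end_ms, masker_db,
                          pre_ms=20.0, post_ms=100.0, margin_db=10.0):
    """Masking threshold (dB) at time t_ms for a masker lasting from onset_ms
    to end_ms; decays linearly to 0 dB at the window edges, -inf outside."""
    peak = masker_db - margin_db
    if onset_ms <= t_ms <= end_ms:               # simultaneous masking
        return peak
    if onset_ms - pre_ms < t_ms < onset_ms:      # pre-masking (before the attack)
        return peak * (1.0 - (onset_ms - t_ms) / pre_ms)
    if end_ms < t_ms < end_ms + post_ms:         # post-masking (after the masker)
        return peak * (1.0 - (t_ms - end_ms) / post_ms)
    return float("-inf")


def is_temporally_masked(t_ms, probe_db, onset_ms, end_ms, masker_db):
    return probe_db < temporal_threshold_db(t_ms, onset_ms, end_ms, masker_db)


# An 80 dB masker from 0 to 50 ms and a 40 dB probe at several times (ms).
for t in (-30, -5, 25, 60, 200):
    print(t, is_temporally_masked(t, 40.0, 0.0, 50.0, 80.0))
```

The asymmetric windows reflect that masking extends much further after a loud sound than before it.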


Spectral Analysis

Tasks of spectral analysis
- To derive masking thresholds that determine which signal components can be eliminated
- To generate a representation of the signal to which the masking thresholds can be applied

Spectral analysis is done through transforms or filter banks


Spectral Analysis

Transforms
- Fast Fourier Transform (FFT)
- Discrete Cosine Transform (DCT): similar to the FFT but uses only cosine (real-valued) basis functions
- Modified Discrete Cosine Transform (MDCT) [used by MPEG-1 Layer III, MPEG-2 AAC, Dolby AC-3]: an overlapped and windowed version of the DCT
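The key property of the MDCT is that overlapping, windowed blocks can be reconstructed exactly, even though each block of 2N samples yields only N coefficients (time-domain aliasing cancellation). The NumPy sketch below assumes a sine window and a direct matrix implementation; real encoders use fast FFT-based algorithms. It transforms a signal block by block with 50% overlap, then checks that overlap-adding the inverse transforms returns the input.

```python
import numpy as np


def sine_window(length: int) -> np.ndarray:
    """Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley)."""
    n = np.arange(length)
    return np.sin(np.pi * (n + 0.5) / length)


def mdct_matrix(N: int) -> np.ndarray:
    """N x 2N MDCT basis, scaled by sqrt(2/N) so that overlap-add has unit gain."""
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.sqrt(2.0 / N) * np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct_roundtrip(x: np.ndarray, N: int) -> np.ndarray:
    """Analyse x into MDCT blocks (hop N, block 2N) and resynthesise it."""
    C, w = mdct_matrix(N), sine_window(2 * N)
    # Pad so every input sample is covered by exactly two overlapping blocks.
    padded = np.concatenate([np.zeros(N), x, np.zeros(N + (-len(x)) % N)])
    out = np.zeros(len(padded))
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = padded[start:start + 2 * N]
        coeffs = C @ (w * block)                        # forward MDCT: 2N samples -> N coefficients
        out[start:start + 2 * N] += w * (C.T @ coeffs)  # inverse MDCT, windowed, overlap-added
    return out[N:N + len(x)]


x = np.random.default_rng(0).standard_normal(1000)
y = mdct_roundtrip(x, N=64)
print(np.max(np.abs(x - y)))  # tiny: only floating-point rounding remains
```

Each inverse transform on its own contains time-domain aliasing. The aliasing terms of neighbouring blocks have opposite signs and cancel in the overlap-add, which is why the MDCT can be critically sampled and still overlap.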


Spectral Analysis

Filter banks
- Time sample blocks are passed through a set of bandpass filters
- Masking thresholds are applied to the resulting frequency subband signals
- Polyphase and wavelet banks are the most popular filter structures


Filter Bank Structures

Polyphase filter bank [used in all of the MPEG-1 encoders]
- The signal is separated into subbands of equal width over the entire frequency range
- The resulting subband signals are downsampled to create shorter signals (which are later reconstructed during the decoding process)
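The split-and-downsample idea can be illustrated with the simplest possible filter bank: two equal-width bands built from the Haar filter pair, each downsampled by 2. This is only a stand-in; the MPEG-1 polyphase filter bank uses 32 equal-width bands and far longer filters. The Python sketch shows that the two half-rate subband signals together have as many samples as the input, and that the synthesis stage reconstructs it.

```python
import numpy as np


def analysis(x):
    """Split x into low and high bands, each downsampled by 2 (Haar pair)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:
        raise ValueError("even-length input required")
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2)   # lowpass filter + downsample
    high = (even - odd) / np.sqrt(2)  # highpass filter + downsample
    return low, high


def synthesis(low, high):
    """Upsample and recombine the two subbands."""
    even = (low + high) / np.sqrt(2)
    odd = (low - high) / np.sqrt(2)
    y = np.empty(2 * len(low))
    y[0::2], y[1::2] = even, odd
    return y


fs = 44_100
x = np.sin(2 * np.pi * 1000 * np.arange(512) / fs)  # a low-frequency tone
low, high = analysis(x)
print(len(x), len(low), len(high))                 # 512 256 256
print(np.sum(high ** 2) / np.sum(low ** 2))        # small: the tone lives in the low band
print(np.allclose(synthesis(low, high), x))        # True
```

Downsampling keeps the total number of samples constant, so splitting into subbands does not by itself increase the data rate. The saving comes later, when each subband is quantized according to the masking threshold.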


Filter Bank Structures

Wavelet filter bank [used by the Enhanced Perceptual Audio Coder (EPAC) by Lucent]
- Unlike the polyphase filter bank, the subbands are not evenly spaced: they are wider at higher frequencies and narrower at lower frequencies
- This allows for better time resolution at high frequencies (e.g. short attacks), but at the expense of frequency resolution there
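Repeatedly splitting only the low band of a two-band filter bank produces octave-spaced subbands. This Python sketch assumes Haar filters and a plain dyadic tree; EPAC's actual filter bank is not described here. For a 44.1 kHz signal, it prints each subband's frequency range and length. The high bands are wide and short (good time resolution); the low bands are narrow and long (good frequency resolution).

```python
import numpy as np


def haar_split(x):
    """Split x into half-rate low and high bands (orthonormal Haar pair)."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)


def wavelet_decompose(x, levels):
    """Dyadic wavelet tree: at each level only the low band is split again.
    Returns subbands ordered from lowest to highest frequency."""
    bands, low = [], x
    for _ in range(levels):
        low, high = haar_split(low)
        bands.append(high)
    bands.append(low)
    return bands[::-1]


def octave_band_edges(fs, levels):
    """(low_hz, high_hz) edges of the subbands, lowest band first."""
    nyquist = fs / 2
    edges = [(0.0, nyquist / 2 ** levels)]
    for j in range(levels, 0, -1):
        edges.append((nyquist / 2 ** j, nyquist / 2 ** (j - 1)))
    return edges


fs, levels = 44_100, 3
x = np.random.default_rng(1).standard_normal(1024)  # length divisible by 2**levels
for (lo, hi), band in zip(octave_band_edges(fs, levels), wavelet_decompose(x, levels)):
    print(f"{lo:8.1f} to {hi:8.1f} Hz: {len(band)} samples")
```

The top band covers the upper half of the spectrum with the fewest frequency details but the finest time steps. This is why a wavelet bank handles short attacks well.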


Noise Allocation

System task: derive the shifted hearing threshold and apply it to the input signal
- Anything below the threshold does not need to be transmitted
- Any noise below the threshold is irrelevant

Frequency component quantization
- Tradeoff between space and noise
- The encoder saves space by using just enough bits for each frequency component to keep the quantization noise under the threshold; this is known as noise allocation
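Here is a minimal Python sketch of noise allocation. It assumes the common rule of thumb that each extra bit of a uniform quantizer lowers the quantization noise by about 6.02 dB. For each subband, the encoder compares the signal level with the masking threshold (the signal-to-mask ratio) and assigns just enough bits to push the noise below the threshold. Bands already below the threshold get no bits. The band levels and the 15-bit cap are hypothetical values.

```python
import math

DB_PER_BIT = 6.02  # approximate noise reduction per extra quantizer bit


def allocate_bits(signal_db, mask_db, max_bits=15):
    """Bits per subband so that quantization noise falls below the mask."""
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m  # signal-to-mask ratio in dB
        b = math.ceil(smr / DB_PER_BIT) if smr > 0 else 0
        bits.append(min(b, max_bits))
    return bits


# Hypothetical levels for three subbands (dB).
print(allocate_bits([60.0, 45.0, 30.0], [40.0, 42.0, 35.0]))  # [4, 1, 0]
```

The third band is entirely below its masking threshold, so it is not transmitted at all.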


Noise Allocation

Pre-echo
- If a single audio block contains silence followed by a loud attack, a pre-echo error occurs: the quantization noise is spread over the whole block, so after decoding there is audible noise in the silent part of the block
- This is avoided by monitoring the audio data in advance at the encoding stage and splitting the audio into shorter blocks when pre-echo is likely
- This does not completely eliminate pre-echo, but can make it short enough to be masked by the attack (temporal masking)
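The monitoring step can be sketched as a simple transient detector: split a block into sub-blocks, measure their energy, and switch to short blocks when the energy jumps sharply. This Python/NumPy sketch assumes 8 sub-blocks and a jump factor of 10. Real encoders use their own, more elaborate block-switching criteria.

```python
import numpy as np


def needs_short_blocks(block, n_sub=8, jump=10.0, eps=1e-12):
    """Return True if the energy of any sub-block exceeds that of the
    previous sub-block by more than `jump` (a likely attack)."""
    subs = np.array_split(np.asarray(block, dtype=float), n_sub)
    energy = np.array([np.sum(s ** 2) for s in subs]) + eps  # eps avoids division by zero
    return bool(np.any(energy[1:] / energy[:-1] > jump))


rng = np.random.default_rng(0)
attack = np.concatenate([np.zeros(768), rng.standard_normal(256)])  # silence, then attack
steady = np.sin(2 * np.pi * 1000 * np.arange(1024) / 44_100)       # stationary tone
print(needs_short_blocks(attack))  # True
print(needs_short_blocks(steady))  # False
```

A stationary tone has nearly equal energy in every sub-block and keeps the long block (better frequency resolution). The attack triggers the switch to short blocks.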


Pre-echo Effect

[Figure: pre-echo effect]


Additional Encoding Techniques

Other encoding techniques are available, either as alternatives or in combination:
- Predictive coding
- Coupling / delta encoding
- Huffman encoding


Additional Encoding Techniques

Predictive coding
- Often used in speech and image compression
- Estimates the expected value of each sample based on previous sample values
- Transmits/stores the difference between the expected and the actual value
- The decoder generates the same estimate for the next sample and then adjusts it by the stored difference
- Used for additional compression in MPEG-2 AAC
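The encode/decode loop described above can be sketched in Python with a fixed second-order predictor, which extrapolates a straight line through the two previous samples. This time-domain predictor is chosen for clarity and is not the predictor used in MPEG-2 AAC, which operates on spectral coefficients. With integer samples the round trip is lossless, and the residuals are much smaller than the samples themselves.

```python
import math


def predict(history):
    """Fixed predictor from up to two previous samples:
    0, then the previous sample, then linear extrapolation."""
    if not history:
        return 0
    if len(history) == 1:
        return history[-1]
    return 2 * history[-1] - history[-2]


def encode(samples):
    """Store only the difference between each sample and its prediction."""
    return [x - predict(samples[max(0, i - 2):i]) for i, x in enumerate(samples)]


def decode(residuals):
    """Rebuild each sample from the same prediction plus the stored difference."""
    out = []
    for r in residuals:
        out.append(predict(out[-2:]) + r)
    return out


samples = [round(1000 * math.sin(2 * math.pi * n / 64)) for n in range(64)]
res = encode(samples)
print(max(map(abs, samples)), max(map(abs, res[2:])))  # large samples, small residuals
print(decode(res) == samples)                          # True: lossless round trip
```

Because the decoder forms its prediction only from samples it has already reconstructed, encoder and decoder stay in step without transmitting the predictions themselves.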


Additional Encoding Techniques

Coupling / delta encoding
- Used when the audio signal consists of two or more channels (stereo or surround sound)
- Similarities between channels are used for compression
- The sum and the difference of two channels are derived; the difference is usually close to zero and therefore requires less space to encode
- The sum/difference step itself is lossless: both channels can be recovered exactly
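Sum/difference coding of two channels can be made exactly invertible on integer samples. The Python sketch below follows a mid/side scheme similar to the one used by lossless coders such as FLAC. The mid channel is the floored average and the side channel is the difference. The bit lost by halving the sum is recovered from the side channel, whose parity matches that of the sum. The sample values are hypothetical.

```python
def ms_encode(left, right):
    mid = [(l + r) >> 1 for l, r in zip(left, right)]  # floored average
    side = [l - r for l, r in zip(left, right)]         # difference, usually small
    return mid, side


def ms_decode(mid, side):
    left, right = [], []
    for m, s in zip(mid, side):
        total = (m << 1) | (s & 1)  # L+R and L-R always have the same parity
        left.append((total + s) >> 1)
        right.append((total - s) >> 1)
    return left, right


left = [1000, 1012, -530, -529, 7]
right = [998, 1010, -531, -533, 8]
mid, side = ms_encode(left, right)
print(mid, side)                              # side values are close to zero
print(ms_decode(mid, side) == (left, right))  # True
```

For strongly correlated channels the side signal needs far fewer bits than either original channel. That is where the saving comes from.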


Additional Encoding Techniques

Huffman coding
- An information-theory-based technique
- Values that occur often in the signal are represented by shorter codes, and the code assignments are stored in a look-up table
- Implemented using look-up tables in both the encoder and the decoder
- Provides substantial lossless compression at the cost of additional computation
- Used by MPEG-1 Layer III and MPEG-2 AAC
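Here is a compact Python sketch of building a Huffman code from symbol frequencies and using it as a look-up table in both directions. The input is a hypothetical list of quantized values dominated by zeros. Codecs such as MPEG-1 Layer III use predefined tables from the standard rather than building one per signal, so this illustrates only the principle.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(symbols):
    """Map each symbol to a prefix-free bit string; frequent symbols get shorter codes."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def encode(symbols, code):
    return "".join(code[s] for s in symbols)


def decode(bits, code):
    inverse = {c: s for s, c in code.items()}  # decoder-side look-up table
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free: the first match is the symbol
            out.append(inverse[current])
            current = ""
    return out


data = [0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 2, 0, 1, 0, 0, -1]
code = huffman_code(data)
bits = encode(data, code)
print(code)
print(len(bits), "bits vs", 2 * len(data), "bits with a fixed 2-bit code")
print(decode(bits, code) == data)  # True
```

The most frequent value (0) receives a 1-bit code, which is where the saving comes from.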


Encoding - Final Stages

- Audio data is packed into frames
- Frames are stored or transmitted
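As a concrete example of framing (the slide does not name a format): in MPEG-1 Layer III each frame carries 1152 samples per channel. The frame length in bytes therefore follows from the bit rate and the sampling rate, plus one optional padding byte. Here is a small Python sketch of that arithmetic; the function name is hypothetical.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def layer3_frame_bytes(bitrate_bps: int, sample_rate_hz: int, padding: bool = False) -> int:
    """Frame length in bytes (header included) for MPEG-1 Layer III."""
    return (SAMPLES_PER_FRAME // 8) * bitrate_bps // sample_rate_hz + int(padding)


print(layer3_frame_bytes(128_000, 44_100))        # 417
print(layer3_frame_bytes(128_000, 44_100, True))  # 418
print(44_100 / SAMPLES_PER_FRAME)                 # about 38.3 frames per second
```

At 44.1 kHz the exact frame size is not a whole number of bytes, so some frames carry the padding byte to keep the average bit rate exact.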


Conclusion

HTML bibliography: http://www.music.mcgill.ca/~pkoles

Questions