audio fundamentals, compression techniques &...

53
Audio Fundamentals, Compression Techniques & Standards Instructor: Hamid R. Rabiee Spring 2013

Upload: buidang

Post on 24-Apr-2018

227 views

Category:

Documents


7 download

TRANSCRIPT

Page 1: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Audio Fundamentals, Compression Techniques &

Standards Instructor: Hamid R. Rabiee

Spring 2013

Page 2: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Digital Media Lab - Sharif University of Technology 2

Outline

²  Audio Fundamentals ²  Sampling, digitization, quantization

²  µ-law and A-law

²  Sound Compression

²  PCM, DPCM, ADPCM, CELP, LPC

²  Audio Compression

²  MPEG Audio Codec

²  Perceptual Coding

²  MPEG1

²  Layer1

²  Layer2

²  Layer3(MP3)

Page 3: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Sound, Sound Wave, Acoustics

²  Sound: A continuous wave that travels through a medium

²  Sound wave: Energy causes disturbance in a medium, made of

pressure differences (measures pressure level at a location)

²  Acoustics: The study of sound (generation, transmission, and reception

of sound waves)

²  Example is striking a drum

²  Head of drum vibrates => disturbs air molecules close to head

²  Regions of molecules with pressure above and below equilibrium

²  Sound transmitted by molecules bumping into each other

3

Page 4: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

The Characteristics of Sound Waves

² Frequency

² The rate at which sound is measured ² Number of cycles per second or Hertz (Hz) ² Determines the pitch of the sound as heard by our ears ² The higher frequency, the clearer and sharper the sound => the

higher pitch of sound

² Amplitude

² Sound’s intensity or loudness

² The louder the sound, the larger the amplitude.

² All sounds have a duration and successive musical sounds is called rhythm

4

Page 5: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

The Characteristics of Sound Waves (cont.)

5

Page 6: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

²  To get audio or video into a computer, we must digitize it

²  Audio signals are continuous in time and amplitude

²  Audio signal must be digitized in both time and amplitude

to be represented in binary form.

6

Audio Digitization

Analog-Digital Converter (ADC)

Voice 10010010000100

Sinusoid wave

Page 7: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

How Digitized Audio Data

To decide how to digitize audio data we need to answer the following questions:

²  What is the sampling rate?

²  How finely is the data to be quantized, and is quantization uniform?

²  How is audio data formatted? (File format)

Digital Media Lab - Sharif University of Technology 7

Page 8: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Human Perception

² Speech is a complex waveform

²  Vowels (a,i,u,e,o) and bass sounds are low frequencies

²  Consonants (s,sh,k,t,…) are high frequencies

²  Humans most sensitive to low frequencies

² Most important region is 2 kHz to 4 kHz

² Hearing dependent on room and environment

²  Sounds masked by overlapping sounds

² Temporal & Frequency masking

8

Page 9: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Audio Digitization

² Step1. Sampling

divide the horizontal axis (time dimension) into discrete

pieces. Uniform sampling is ubiquitous (everywhere at once).

9

Page 10: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Nyquist Theorem

²  Suppose we are sampling a waveform. How often do we need to sample it to figure out its frequency?

²  Signals can be decomposed into a sum of sinusoids. ²  The Nyquist theorem states how frequently we must sample in time to be

able to recover the original sound.

•  The Nyquist Theorem, also known as the sampling theorem, is a principle that engineers follow in the digitization of analog signals. •  For Analog-to-Digital conversion (ADC) to result in a faithful reproduction of the signal, slices, called samples, of the analog waveform must be taken frequently. • The number of samples per second is called the sampling rate or sampling frequency.

10

Page 11: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Nyquist Theorem (cont.)

² Nyquist rate

It can be proven that a

bandwidth-limited signal

can be fully reconstructed

from its samples, if the

sampling rate is at least

twice the highest

frequency in the signal

(fmax).

11

Page 12: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

12

²  Step2. Quantization -- Converts actual amplitude sample values (usually voltage measurements) into an integer approximation

²  Tradeoff between number of bits required and error ²  Human perception limitations affect allowable error

²  Specific application affects allowable error

²  Once samples have been captured (Discrete in time by sampling – Nyquist), they must be made discrete in amplitude (Discrete in amplitude by quantization).

²  Two approaches to quantization ²  Rounding the sample to the closest integer.

²  (e.g. round 3.14 to 3) ²  Create a Quantizer table that generates a staircase pattern of values

based on a step size.

Quantization

Page 13: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

13

Reconstruction

²  Analog-to-Digital Converter (ADC) provides the sampled and

quantized binary code.

²  Digital-to-Analog Converter (DAC) converts the quantized

binary code back into an approximation of the analog signal

(reverses the quantization process) by clocking the code to the

same sample rate as the ADC conversion.

Page 14: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

14

Quantization Error (Noise)

²  After quantization, some information is lost

²  Errors (noise) introduced

²  The difference between the original sample value and the rounded

value is called the quantization error

²  A Signal to Noise Ratio (SNR) is the ratio of the relative sizes of the

signal values and the errors.

²  The higher the SNR, the smaller the average error is with respect to

the signal value, and the better the fidelity.

Quantization is only an approximation.

Page 15: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Quantization (An Example)

Digital Media Lab - Sharif University of Technology 15

Page 16: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

µ-law, A-law

²  Non-uniform quantizers: Difficult to make, Expensive.

²  Solution: Companding => Uniform Q. => Expanding

16

Page 17: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

µ-law, A-law (cont.)

17

Page 18: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

µ-law vs. A-law

²  8-bit µ-law used in US for telephony

²  ITU Recommendation G.711

²  8-bit A-law used in Europe for telephony

²  Similar, but a slightly different curve.

²  Both give similar quality to 12-bit linear encoding.

²  A-law used for International circuits.

²  Both are linear approximations to a log curve.

²  8000 samples/sec * 8bits per sample = 64Kb/s data rate

Digital Media Lab - Sharif University of Technology 18

Page 19: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

AUDIO & SOUND COMPRESSION TECHNIQUES

Page 20: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Audio Coding

²  Quantization and transformation of data are collectively known as

coding of the data. ²  For audio, the -law technique for compounding audio signals is usually combined with an

algorithm that exploits the temporal redundancy present in audio signals.

²  Differences in signals between the present and a past time can reduce the size of signal values

and also concentrate the histogram of pixel values (differences, now) into a much smaller range.

²  The result of reducing the variance of values is that lossless compression methods produce a bit

stream with shorter bit lengths for more likely values

20

Page 21: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Audio Coding (cont.)

²  In general, producing quantized sampled output for audio is called PCM

(Pulse Code Modulation). The differences version is called DPCM (and a

crude but efficient variant is called DM). The adaptive version is called

ADPCM.

Digital Media Lab - Sharif University of Technology 21

Page 22: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Compression

Every compression scheme has three stages:

²  The input data is transformed to a new representation that is easier or

more efficient to compress.

²  We may introduce loss of information. ²  (Quantization is the main lossy step ).

²  Coding.

²  Assign a codeword (thus forming a binary bit stream) to each output level or symbol

Digital Media Lab - Sharif University of Technology 22

Page 23: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Sound Compression

²  Some techniques for sound compression: ²  PCM - send every sample

²  DPCM - send differences between samples

²  ADPCM - send differences, but adapt how we code them

²  LPC - linear model of speech formation

²  CELP - use LPC as base, but also use some bits to code corrections for the things LPC gets

wrong.

²  The techniques that code received sound signals ²  PCM, DPCM, ADPCM

²  The techniques that parameterize the sound model (model-based

coding) ²  LPC, CELP

23

Page 24: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Pulse-code Modulation (PCM)

²  µ-law and a-law PCM have already reduced the data sent.

²  However, each sample is still independently encoded.

²  In reality, samples are correlated.

²  Can utilize this correlation to reduce the data sent.

Digital Media Lab - Sharif University of Technology 24

Page 25: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Pulse-code Modulation (PCM)

²  Digital Representation of an Analog Signal

²  Sampling and Quantization

²  Parameters:

²  Sampling Rate (Samples per Second)

²  Quantization Levels (Bits per Sample)

Digital Media Lab - Sharif University of Technology 25

Original analog signal and its corresponding PCM signals.

Decoded staircase signal. Reconstructed signal after low-pass filtering.

Page 26: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Differential PCM (DPCM)

²  Makes a simple prediction of the next sample, based on weighted previous

n samples.

²  Normally the difference between samples is relatively small and can be

coded with less than 8 bits.

²  Typically use 6 bits for difference, rather than 8 bits for absolute value.

²  Compression is lossy, as not all differences can be coded

Digital Media Lab - Sharif University of Technology 26

Page 27: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Adaptive DPCM (ADPCM)

²  Different Coding techniques in comparison with DPCM

²  2 Methods ²  Adaptive Quantization

² Quantization levels are adaptive, based on the content of the audio.

²  Adaptive Prediction

²  Receiver runs same prediction algorithm and adaptive quantization levels

to reconstruct speech.

Digital Media Lab - Sharif University of Technology 27

Page 28: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Model-based Coding

²  PCM, DPCM and ADPCM directly code the received audio signal.

²  An alternative approach is to build a parameterized model of the

²  sound source (i.e.. Human voice).

²  For each time slice (e.g. 20ms):

²  Analyze the audio signal to determine how the signal was produced.

²  Determine the model parameters that fit.

²  Send the model parameters.

²  At the receiver, synthesize the voice from the model and received

parameters.

Digital Media Lab - Sharif University of Technology 28

Page 29: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Speech formation

²  Voiced sounds: series of pulses of air as larynx opens and closes. Basic

tone then shaped by changing resonance of vocal tract.

²  Unvoiced sounds: larynx held open, turbulent noise made in mouth.

Digital Media Lab - Sharif University of Technology 29

Page 30: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

LPC

²  Introduced in 1960s.

²  Low-bit rate encoder: ²  1.2Kb/s - 4Kb/s

²  Sounds very synthetic

²  Basic LPC mostly used where bitrate really matters (eg in miltary applications)

²  Most modern voice codecs (eg GSM) are based on enhanced LPC encoders.

Digital Media Lab - Sharif University of Technology 30

Page 31: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

LPC

²  Digitize signal, and split into segments (eg 20ms)

²  For each segment, determine: ²  Pitch of the signal (i.e. basic formant frequency)

²  Loudness of the signal.

²  Whether sound is voiced or unvoiced

² Voiced: vowels, “m”, “v”, “l”

² Unvoiced: “f”, “s”

²  Vocal tract excitation parameters (LPC coefficients)

Digital Media Lab - Sharif University of Technology 31

Page 32: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Limitations of LPC Model

²  LPC linear predictor is very simple.

²  For this to work, the vocal tract “tube” must not have any side branches

(these would require a more complex model).

²  OK for vowels (tube is a reasonable model)

²  For nasal sounds, nose cavity forms a side branch.

²  In practice this is ignored in pure LPC.

²  More complex codecs attempt to code the residue signal, which helps correct this.

Digital Media Lab - Sharif University of Technology 32

Page 33: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Code Excited Linear Prediction (CELP)

²  Goal is to efficiently encode the residue signal, improving speech quality

over LPC, but without increasing the bit rate too much.

²  CELP codecs use a codebook of typical residue values.

²  Analyzer compares residue to codebook values.

²  Chooses value which is closest.

²  Sends that value.

²  Receiver looks up the code in its codebook, retrieves the residue, and uses

this to excite the LPC formant filter.

Digital Media Lab - Sharif University of Technology 33

Page 34: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

CELP

²  Problem is that codebook would require different residue values for every

possible voice pitch.

²  Codebook search would be slow, and code would require a lot of bits to send.

²  One solution is to have two codebooks. ²  One fixed by codec designers, just large enough to represent one pitch period of residue.

²  One dynamically filled in with copies of the previous residue delayed by various amounts

(delay provides the pitch)

²  CELP algorithm using these techniques can provide pretty good quality at

4.8Kb/s.

Digital Media Lab - Sharif University of Technology 34

Page 35: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Enhanced LPC Usage

²  GSM (Group Special Mobile) ²  Residual Pulse Excited LPC

²  13Kb/s

²  LD-CELP ²  Low-delay Code-Excited Linear Prediction (G.728)

²  16Kb/s

²  CS-ACELP ²  Conjugate Structure Algebraic CELP (G.729)

²  8Kb/s

²  MP-MLQ ²  Multi-Pulse Maximum Likelihood Quantization (G.723.1)

²  6.3Kb/s

Digital Media Lab - Sharif University of Technology 35

Page 36: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Audio Compression

²  LPC-based codec model the sound source to achieve good compression.

²  Works well for voice.

²  Terrible for music.

²  What if you can’t model the source?

²  Model the limitations of the human ear.

²  Not all sounds in the sampled audio can actually be heard.

²  Analyze the audio and send only the sounds that can be heard.

²  Quantize more coarsely where noise will be less audible.

²  Audio vs. Speech

²  Higher quality requirement for audio

²  Wider frequency range of audio Sound Signal

Digital Media Lab - Sharif University of Technology 36

Page 37: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

37

Psychoacoustics Model

²  Dynamic range is ratio of maximum signal amplitude to minimum signal

amplitude (measured in decibels).

²  D = 20 log (Amax/Amin) dB

²  Human hearing has dynamic range of ~ 96dB

²  Sensitivity of the ear is dependent on frequency.

²  Most sensitive in range of 2-5KHz

Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Sensitivity of human ears

Page 38: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Psychoacoustics Model

²  Amplitude Sensitivity

²  Frequencies only heard if they exceed a sensitivity threshold:

Digital Media Lab - Sharif University of Technology 38

Page 39: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

39

Psychoacoustics Model

²  Frequency Masking

²  The sensitivity threshold curve is distorted by the presence of loud sounds.

²  Frequencies just above and below the frequency of a loud sound need to be louder than

the normal minimum amplitude before they can be heard.

Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 40: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

40

Psychoacoustics Model

² Temporal masking ²  If we hear a loud sound, then it stops, it takes a little while until we can hear a

soft tone nearby

² After hearing a loud sound, the ear is deaf to quieter sounds in the

same frequency range for a short time.

Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 41: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Principle of Audio Compression

²  Audio compression – Perceptual coding

²  Take advantage of psychoacoustics model

²  Distinguish between the signal of different sensitivity to human ears

²  Signal of high sensitivity – more bits allocated for coding

²  Signal of low sensitivity – less bits allocated for coding

²  Exploit the frequency masking

²  Don’t encode the masked signal (range of masking is 1 critical band)

²  Exploit the temporal masking

²  Don’t encode the masked signal

41 Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 42: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

AUDIO CODING STANDARDS

Page 43: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Audio Coding Standards

²  G.711 - A-LAW/U-LAW encodings (8 bits/sample)

²  G.721 - ADPCM (32 kbs, 4 bits/sample)

²  G.723 - ADPCM (24 kbs and 40 kbs, 8 bits/sample)

²  G.728 - CELP (16 kbs)

²  LPC (FIPS-1015) - Linear Predictive Coding (2.4kbs)

²  CELP (FIPS-1016) - Code excited LPC (4.8kbs, 4bits/sample)

²  G.729 - CS-ACELP (8kbs)

²  MPEG1/MPEG2, AC3 - (16-384kbs) mono, stereo, and 5+1 channels

Digital Media Lab - Sharif University of Technology 43

Page 44: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

MPEG Audio Codec

²  MPEG (Motion Picture Expert Group) and ISO (International Standard

Organization) have published several standards about digital audio coding.

²  MPEG-1 Layer 1,2 and 3 (MP3)

²  MPEG2 AAC

²  MPEG4 AAC and TwinVQ

²  Exploits the psychoacoustic models.

²  Frequency masking is always utilized

²  More complex forms of MPEG also employ temporal masking

²  They have been widely used in consumer electronics, digital audio

broadcasting, DVD and movies etc.

44 Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 45: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Advantages of MPEG approach

²  Complex psychoacoustic modeling only in coding phase

²  Desirable for real time (Hardware or software) decompression

²  Essential for broadcast purposes.

²  Decompression is independent of the psychoacoustic

²  models used

²  Different models can be used

²  If there is enough bandwidth no models at all.

²  Although (data) lossy, MPEG claims to be perceptually lossless:

²  Human tests (part of standard development), Expert listeners.

²  Under Optimal listening conditions no statistically distinguishable difference

between original and MPEG.

45 Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 46: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

46

MPEG Audio Codec

²  Procedure of MPEG audio coding ²  The audio signal is first samples and quantized using PCM

²  The PCM samples are then divided up into 32 frequency sub-bands (sub-band filtering using

DFT)

²  Use psychoacoustics model in bit allocation

²  If the amplitude of signal in band is below the masking threshold, don’t encode

²  Otherwise, allocate bits based on the sensitivity of the signal

²  Multiplex the output of the 32 bands into one bit stream

Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 47: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

47

MPEG Layers

²  MPEG defines 3 levels of processing layers for audio: ²  Level 1 is the basic mode,

²  Levels 2 and 3 more advance (use temporal masking).

²  Level 3 is the most common form for audio files on the Web

²  Our beloved MP3 files

²  Strictly speaking these files should be called MPEG-1 level 3 files.

²  Each level: ²  Increasing levels of sophistication

²  Greater compression ratios.

²  Greater computation expense (but mainly at the coder side)

Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 48: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

MPEG1 Audio Layer 1

²  MPEG 1 audio allows sampling rate at 44.1 48, 32, 22.05, 24 and 16KHz.

²  MPEG1 Divides data into frames

²  Each of them contains 384 samples,

²  12 samples from each of the 32 filtered sub-bands.

²  Psychoacoustic model only uses frequency masking.

²  Optional Cyclic Redundancy Code (CRC) error checking.

48 Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Filtering And

downsampling

Audio samples

12 samples 12 samples

12 samples

Perceptual coder

Normalize By scale factor

Page 49: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

MPEG1 Audio Layer 2

²  Layer 2 is very similar to Layer 1, but codes audio data in larger groups: ²  Use three frames in filter: before, current, next, a total of 1152 samples.

²  This models a little bit of the temporal masking.

²  Imposes some restrictions on bit allocation in middle and high sub-bands.

²  More compact coding of scale factors and quantized samples.

²  Better audio quality due to saving bits here so more bits can be used in

quantized sub-band values

49 Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Filtering And

downsampling

Audio samples

36 samples 36 samples

36 samples

Perceptual coder

Normalize By scale factor

Page 50: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

MPEG Audio Layer 3 : MP3

²  Targeted at bit rates of 64 kbits/sec per channel.

²  Much more complex approach.

²  Psychoacoustic model includes temporal masking effects,

²  Better critical band filter is used (non-equal frequencies)

²  Uses a modified DCT (MDCT) for lossless sub-band transformation.

²  Greater frequency resolution accounts for poorer time resolution

²  Uses Huffman coding on quantized samples for better compression.

50 Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 51: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

MPEG Audio Layer 3 : MP3

51 Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Page 52: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

MPEG Audio

Digital Media Lab - Sharif University of Technology 52

Page 53: Audio Fundamentals, Compression Techniques & Standardsce.sharif.edu/courses/91-92/2/ce873-1/resources/root... ·  · 2015-04-06Audio Fundamentals, Compression Techniques & Standards

Sharif University of Technology, Department of Computer Engineering, Multimedia Systems Course

Next Session

Image

53