digitized sound - electrical, computer & energy...
Digitized Sound
Telecommunications 1
P. Mathys
Sampling of Waveforms
Computers cannot directly deal with continuous-time (CT) waveforms. A CT waveform needs to be sampled at regular time intervals before it can be stored and processed by a computer. The sampling operation converts the CT signal into a discrete-time (DT) signal or sequence.
Sampling of Waveforms
The next slide shows a sinewave and its samples after sampling with rate Fs = 16000 Hz (16000 samples/sec). The samples are marked with "o" in the graph. Each sample is a real number, e.g., 0.70710678, and a sampled waveform is a sequence of such numbers.
Sampling of 1000 Hz Sinewave
16000 samples/sec ==> Fs = 16000 Hz, 16 samples/period
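A minimal NumPy sketch of the sampling step described above. It assumes a unit-amplitude sinewave that starts at phase zero (the slide does not state amplitude or phase) and computes one period of the DT sequence x[n] = sin(2*pi*1000*n/16000). Under that assumption, sample n = 2 is sin(pi/4) = 0.70710678, the example value quoted in the text.

```python
import numpy as np

f0 = 1000.0                 # tone frequency in Hz
fs = 16000.0                # sampling rate in Hz (16000 samples/sec)
samples_per_period = int(fs / f0)

# Assumption: unit amplitude, zero phase. Sample n is taken at time t = n / fs.
n = np.arange(samples_per_period)
x = np.sin(2 * np.pi * f0 * n / fs)

print(samples_per_period)   # 16
print(np.round(x, 8))       # one period of the DT sequence
```

Doubling fs to 32000 or halving it to 8000 in this sketch gives the 32 and 8 samples/period of the following slides.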
Sampling at Higher Rate
If we sample at a higher rate, we expect to need more memory, need larger files to store all samples, take longer to transmit the samples over a network, and reduce the approximation error that results from sampling a CT signal.
Sampling at Higher Rate
32000 samples/sec ==> Fs = 32000 Hz, 32 samples/period
Sampling at Lower Rate
If we sample at a lower rate, we will use less storage space, and transmission (e.g., over the Internet) will be faster. But how much will the quality degrade? What is the minimum sampling rate that is needed?
Sampling at Lower Rate
8000 samples/sec ==> Fs = 8000 Hz, 8 samples/period
Using Different Sampling Rates
| Sampling Rate Fs | File Size (1 sec, 16 bits) | Sound Sample |
|---|---|---|
| 8,000 Hz | 16 kB | sin1000_8.wav |
| 16,000 Hz | 32 kB | sin1000_16.wav |
| 32,000 Hz | 64 kB | sin1000_32.wav |
What Sampling Rate is Best?
The previous three examples of sampling a 1000 Hz tone at Fs=8000 Hz, Fs=16000 Hz, and Fs=32000 Hz show no difference in sound quality. But let's look at another example where we sample a 5000 Hz tone at Fs=16000 Hz and Fs=8000 Hz.
5000 Hz Sinewave, Fs=16000 Hz
16 samples in 1 msec, 3.2 samples per period
sin5000_16.wav
5000 Hz Sinewave, Fs=8000 Hz
Leaving out every second sample (x), we have 8 samples (o) in 1 msec, 1.6 samples per period.
Sound: Fs=8000 Hz: sin5000_8.wav; Fs=16000 Hz: sin5000_16.wav
5000 Hz Sinewave
The 5000 Hz sinewave sounds different at Fs=8000 Hz and at Fs=16000 Hz. The reason is that the soundcard, which converts the samples back to a CT waveform, tries to find the "smoothest" (i.e., the lowest frequency) waveform that passes through all samples, as shown in the next slide.
5000 Hz Sinewave, Fs=8000 Hz
Green curve is “smoother” than blue (dashed) curve.
sin5000_8.wav
sin5000_16.wav
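A short NumPy check of why the soundcard finds a lower-frequency waveform, assuming unit-amplitude, zero-phase tones (not stated on the slides). At Fs = 8000 Hz, sin(2*pi*5000*n/8000) = sin(2*pi*n - 2*pi*3000*n/8000) = -sin(2*pi*3000*n/8000), so the 5000 Hz samples coincide with those of a phase-inverted 3000 Hz tone (8000 - 5000 Hz), which is the lowest-frequency sinusoid through them.

```python
import numpy as np

fs = 8000.0                 # reduced sampling rate in Hz
n = np.arange(32)           # 4 ms worth of sample indices

x_5000 = np.sin(2 * np.pi * 5000.0 * n / fs)   # samples of the 5000 Hz tone
x_3000 = np.sin(2 * np.pi * 3000.0 * n / fs)   # samples of a 3000 Hz tone

# The two sample sequences are identical up to a sign (phase) flip.
print(np.allclose(x_5000, -x_3000))   # True
```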
Aliasing
The effect whereby tone frequencies are altered because of a reduction in sampling rate is called aliasing. Aliasing affects all sampled sound sequences, whether they be pure tones, music or speech. Music examples:
| Original, Fs=44100 Hz | w/o Aliasing, Fs=11025 Hz | with Aliasing, Fs=11025 Hz |
|---|---|---|
| muss44.wav | muss11.wav | muss11_ali.wav |
Aliasing
[Figure panels: With Aliasing, Original]
Aliasing folds high-frequency components down to low frequencies. This is visible and audible especially well in the portions marked in red.
Sound: muss11_ali_orig.wav; muss44s.wav; muss11_ali44s.wav
Aliasing
[Figure panels: With Aliasing, No Aliasing]
To prevent aliasing from occurring, the sound file needs to be lowpass-filtered before the sampling rate is reduced.
Sound: muss11_ali_noali.wav; muss11_ali44s.wav; muss11_44s.wav
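The slides do not say which filter was used to produce the non-aliased file. A minimal sketch of the idea, assuming a 101-tap Hamming-windowed-sinc FIR lowpass filter (cutoff at 90% of the new Nyquist frequency) and a synthetic 15 kHz test tone in place of the music: reducing 44100 Hz to 11025 Hz by keeping every 4th sample folds the tone down to 3975 Hz, while filtering first removes it. The function names are hypothetical.

```python
import numpy as np

def lowpass_fir(num_taps, cutoff_hz, fs_hz):
    """Hamming-windowed sinc lowpass FIR filter with unity gain at DC."""
    m = np.arange(num_taps) - (num_taps - 1) / 2                    # taps centred on 0
    h = 2 * cutoff_hz / fs_hz * np.sinc(2 * cutoff_hz / fs_hz * m)  # ideal lowpass
    h *= np.hamming(num_taps)                                        # taper the sinc
    return h / h.sum()

def reduce_rate(x, factor, fs_hz, prefilter=True, num_taps=101):
    """Keep every `factor`-th sample, optionally lowpass-filtering first."""
    if prefilter:
        new_nyquist = fs_hz / factor / 2
        h = lowpass_fir(num_taps, 0.9 * new_nyquist, fs_hz)
        x = np.convolve(x, h, mode="valid")          # drop filter edge transients
    return x[::factor]

fs = 44100.0
t = np.arange(44100) / fs                  # 1 second of samples
tone = np.sin(2 * np.pi * 15000.0 * t)     # above the new Nyquist frequency (5512.5 Hz)

naive = reduce_rate(tone, 4, fs, prefilter=False)    # 15000 Hz aliases to 3975 Hz
filtered = reduce_rate(tone, 4, fs, prefilter=True)  # tone removed before decimation

naive_rms = np.sqrt(np.mean(naive ** 2))
filtered_rms = np.sqrt(np.mean(filtered ** 2))
print(f"RMS without lowpass: {naive_rms:.3f}, with lowpass: {filtered_rms:.5f}")
```

The aliased tone keeps its full amplitude; only its frequency changes, which is why the artifacts are so audible.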
Nyquist Rate
Let B (in Hz) be the highest frequency contained in a sound waveform.
Sampling Theorem: To avoid distortion due to aliasing, a signal of bandwidth B must use a sampling rate Fs > 2B.
The sampling rate of 2B samples/sec is called the Nyquist rate.
Common Sampling Rates
| Application | Sampling Rate Fs [Hz] | Bandwidth B [Hz] |
|---|---|---|
| Telephony | 8000 | 3000 |
| Music, Low Quality | 11025 | 5000 |
| Music | 22050 | 10000 |
| Music, Hi-Fi | 44100 | 20000 |
| DAT (Digital Audio Tape) | 48000 | 22000 |
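A small sketch that checks each row of the table against the sampling theorem (Fs > 2B) and computes where a tone lands after sampling. The folding rule in the hypothetical `alias_frequency` helper is the standard one for real sinusoids: reduce f modulo Fs, then reflect into the range 0 ... Fs/2.

```python
# Rows of the table above: (application, Fs in Hz, bandwidth B in Hz)
COMMON_RATES = [
    ("Telephony", 8000, 3000),
    ("Music, Low Quality", 11025, 5000),
    ("Music", 22050, 10000),
    ("Music, Hi-Fi", 44100, 20000),
    ("DAT (Digital Audio Tape)", 48000, 22000),
]

def nyquist_rate(bandwidth_hz):
    """Nyquist rate 2B in samples/sec."""
    return 2 * bandwidth_hz

def satisfies_sampling_theorem(fs_hz, bandwidth_hz):
    """True if Fs > 2B, i.e., a signal of bandwidth B is sampled without aliasing."""
    return fs_hz > nyquist_rate(bandwidth_hz)

def alias_frequency(f_hz, fs_hz):
    """Apparent frequency (0 .. Fs/2) of a tone at f_hz after sampling at fs_hz."""
    f = f_hz % fs_hz
    return min(f, fs_hz - f)

for app, fs, b in COMMON_RATES:
    print(f"{app}: Fs={fs}, 2B={nyquist_rate(b)}, ok={satisfies_sampling_theorem(fs, b)}")

print(alias_frequency(5000, 8000))    # 3000: the 5000 Hz tone sampled at 8000 Hz
print(alias_frequency(5000, 16000))   # 5000: no aliasing at 16000 Hz
```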
Quantization
Sampling alone is not sufficient to convert a waveform to a format that a computer can handle. The problem is that each sample is a real number that requires infinite precision for processing and storing. Computers have finite word length and can only approximate real numbers.
Quantization
For example,
pi = 3.14159265358979323846264…
is a real number. On a computer that can represent 5 decimal digits we would approximate pi as 3.1416. Typical computer word lengths are 8, 16, 32, and 64 bits (approximately 2, 4, 9, and 19 decimal digits).
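One way to arrive at the digit counts above, offered as an interpretation rather than the slide's stated method: a k-bit word can always hold floor(k * log10(2)) full decimal digits, since 2^k lies between 10^d and 10^(d+1) for that d. The sketch also rounds pi to 5 significant digits.

```python
import math

def decimal_digits(bits):
    """Full decimal digits that a word of `bits` bits can always represent."""
    return math.floor(bits * math.log10(2))

for bits in (8, 16, 32, 64):
    print(bits, decimal_digits(bits))   # 8 2, 16 4, 32 9, 64 19

print(round(math.pi, 4))   # 3.1416, pi with 5 significant digits
```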
Quantization
The process of approximating a real number by a number with finite wordlength is called quantization. Unlike sampling (at a rate above the Nyquist rate), quantization is not a reversible process. However, by choosing a large enough wordlength, the quantization error can be made as small as desired.
Quantization: Examples
The next slides show the quantization of a sine wave to 2, 3, and 4 bits:
2 bits (4 levels): 00, 01, 10, 11
3 bits (8 levels): 000, 001, 010, 011, 100, 101, 110, 111
4 bits (16 levels): 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111, 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
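A minimal sketch of a uniform quantizer that maps samples in [-1, 1] to the k-bit codes listed above. Two assumptions are not stated on the slides: the range [-1, 1] is split into 2^k equal intervals numbered from the bottom, and the 1000 Hz sinewave is sampled at Fs = 16000 Hz half a sampling interval after its zero crossing. Under these assumptions the sketch reproduces the digital sequences shown on the 2-, 3- and 4-bit example slides. `quantize_codes` and `reconstruct` are hypothetical helper names.

```python
import numpy as np

def quantize_codes(x, bits):
    """Uniform quantizer: map samples in [-1, 1] to integer codes 0 .. 2**bits - 1."""
    levels = 2 ** bits
    idx = np.floor((x + 1) / 2 * levels).astype(int)
    return np.clip(idx, 0, levels - 1)       # x = 1.0 goes into the top interval

def reconstruct(codes, bits):
    """Map codes back to the midpoint of their interval (the original x is lost)."""
    levels = 2 ** bits
    return (codes + 0.5) / levels * 2 - 1

fs, f0 = 16000.0, 1000.0
n = np.arange(16)                             # one period
# Assumption: samples taken half a sampling interval after the zero crossing.
x = np.sin(2 * np.pi * f0 * (n + 0.5) / fs)

for bits in (2, 3, 4):
    codes = quantize_codes(x, bits)
    print(bits, ",".join(format(int(c), f"0{bits}b") for c in codes))
```

Because every x inside an interval maps to the same code, `reconstruct` can only return the interval midpoint; this is the irreversibility described above.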
Quantization: Example (2 Bits)
Digital sequence of samples (Fs=16000 Hz): 10,11,11,11,11,11,11,10,01,00,00,00,00,...
[Figure: 1000 Hz sinewave quantized to 2 bits (4 levels, 00 to 11)]
Sound: 16 bits: q1000_16.wav; 2 bits: q1000_2.wav
Quantization: Example (2 Bits)
Quantization is a non-linear operation that introduces new frequency components. Here the new components appear at odd multiples of the fundamental frequency.
Sound: q1000h_16_2.wav; 16 bits: q1000_16.wav; 2 bits: q1000_2.wav
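A NumPy sketch that makes the odd-multiple claim visible in a spectrum. It assumes unit amplitude, a half-sample phase offset, and interval-midpoint reconstruction for the 2-bit quantizer (all assumptions, not stated on the slides). The quantized waveform then satisfies q(t + T/2) = -q(t), and such half-wave symmetry leaves energy only at odd multiples of 1000 Hz. With only 16 samples per period, harmonics above Fs/2 = 8000 Hz fold back, but odd harmonics fold onto odd multiples, so the even bins stay empty.

```python
import numpy as np

fs, f0, bits = 16000.0, 1000.0, 2
n = np.arange(1600)                              # 100 periods -> FFT bins every 10 Hz
x = np.sin(2 * np.pi * f0 * (n + 0.5) / fs)      # assumed half-sample phase offset

levels = 2 ** bits
codes = np.clip(np.floor((x + 1) / 2 * levels), 0, levels - 1)
q = (codes + 0.5) / levels * 2 - 1               # 2-bit waveform (interval midpoints)

spectrum = np.abs(np.fft.rfft(q)) / len(q)
freqs = np.fft.rfftfreq(len(q), d=1 / fs)

for k in range(1, 8):                            # multiples 1000 Hz ... 7000 Hz
    bin_k = int(round(k * f0 / (fs / len(q))))
    print(f"{freqs[bin_k]:6.0f} Hz  {spectrum[bin_k]:.4f}")
```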
Quantization: Example (3 Bits)
[Figure: 1000 Hz sinewave quantized to 3 bits (8 levels, 000 to 111)]
Digital sequence of samples (Fs=16000 Hz): 100,110,111,111,111,111,110,100,011,001,...
Sound: 16 bits: q1000_16.wav; 3 bits: q1000_3.wav
Quantization: Example (4 Bits)
Digital sequence of samples (Fs=16000 Hz): 1001,1100,1110,1111,1111,1110,1100,1001,0110,...
[Figure: 1000 Hz sinewave quantized to 4 bits (16 levels, 0000 to 1111)]
Sound: 4 bits: q1000_4.wav; 16 bits: q1000_16.wav
Quantization Error
The difference q(t)-y(t) between the quantized signal and the original signal (shown dotted in red in the previous slides) is called quantization error. The quantization error becomes larger if fewer bits are used. Quantization is a nonlinear process that introduces new and unwanted frequency components.
Signal to Quantization Noise Ratio
The signal to quantization noise ratio (SQNR) is a measure to judge the effect of quantization. SQNR is computed as:
SQNR = 10*log(signal power / quantization error power)
where the logarithm is taken to the base 10 and SQNR is measured in dB (decibels).
If the signal power is 10 times higher than the quantization error power, then the SQNR is 10 dB; if it is 100 times higher, the SQNR is 20 dB; if it is 1000 times higher, the SQNR is 30 dB, etc. For quantization of a full-scale sinusoid to k bits we have
SQNR = 6.02 x k + 1.76 dB
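A sketch that compares the 6.02 x k + 1.76 dB formula with a measured SQNR. It assumes a uniform k-bit quantizer with interval-midpoint reconstruction and a full-scale sinusoid at an arbitrary frequency (0.0123457 cycles/sample), so that the error is spread roughly uniformly over each quantization step; the helper names are hypothetical. The formula follows from signal power 1/2 and error power step^2/12 with step = 2/2^k. For very few bits the uniform-error assumption breaks down, so the measured and formula values can drift apart there.

```python
import numpy as np

def sqnr_db(signal, quantized):
    """SQNR = 10*log10(signal power / quantization error power), in dB."""
    err = quantized - signal
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(err ** 2))

def quantize(x, bits):
    """Uniform quantizer for x in [-1, 1]; returns interval midpoints."""
    levels = 2 ** bits
    codes = np.clip(np.floor((x + 1) / 2 * levels), 0, levels - 1)
    return (codes + 0.5) / levels * 2 - 1

def sqnr_formula_db(bits):
    """Rule of thumb for a full-scale sinusoid quantized to `bits` bits."""
    return 6.02 * bits + 1.76

n = np.arange(200_000)
x = np.sin(2 * np.pi * 0.0123457 * n)      # full-scale sinusoid
for bits in (2, 4, 8, 16):
    print(bits, round(sqnr_formula_db(bits), 1), round(sqnr_db(x, quantize(x, bits)), 1))

# Converting dB back to a power ratio: 90 dB -> 10**9
print(10 ** (90 / 10))
```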
Signal to Quantization Noise Ratio
Quantization of Sinewaves
| Quantization | SQNR [dB] | Sound |
|---|---|---|
| 16 bits | 98.1 | q1000_16.wav |
| 8 bits | 49.9 | q1000_8.wav |
| 4 bits | 25.8 | q1000_4.wav |
| 3 bits | 19.8 | q1000_3.wav |
| 2 bits | 13.8 | q1000_2.wav |
| 1 bit | 7.8 | q1000_1.wav |
Quantization of Speech/Music
| Quantization | SQNR [dB] | Sound |
|---|---|---|
| 16 bits | ~84 | tlp_16.wav |
| 8 bits | 35.0 | tlp_8.wav |
| 6 bits | 22.6 | tlp_6.wav |
| 4 bits | 10.1 | tlp_4.wav |
| 2 bits | -3.5 | tlp_2.wav |
SQNR for 16-bit Quantization
16-bit quantization (e.g., for CD) yields approximately 90 dB SQNR. What does that mean?
10*log(signal pwr/noise pwr) = 90
signal pwr/noise pwr = 10^9
That is, the quantization noise power is only one billionth of the signal power.
Common Parameters for Sound
The table on the next slide shows common sampling rates Fs and quantizations Q (in bits/sample) for sound waveforms. The uncompressed rate R (in bytes/sec) is computed using:
R = #channels x Fs x Q / 8
Common Parameters for Sound
| Application | # of Channels | Fs [Hz] | Q [bits] | Rate R [kB/sec] |
|---|---|---|---|---|
| Speech (Telephony) | 1 | 8000 | 8 | 8 |
| Speech | 1 | 11025 | 8 | 11.025 |
| Music, Low Quality | 1 | 11025 | 16 | 22.05 |
| Speech, High Quality | 1 | 22050 | 16 | 44.1 |
| Music, Stereo | 2 | 22050 | 16 | 88.2 |
| Music, Hi-Fi | 1 | 44100 | 16 | 88.2 |
| Music, Hi-Fi Stereo | 2 | 44100 | 16 | 176.4 |
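A small sketch of the rate formula R = #channels x Fs x Q / 8 applied to every row of the table, plus the uncompressed file-size formula from the summary (assuming whole seconds, no file header, and 1 kB = 1000 bytes, which is what the table values imply). The function names are hypothetical. The same `file_size_bytes` call also reproduces the 16, 32 and 64 kB entries of the "Using Different Sampling Rates" table.

```python
# Rows of the table above: (application, channels, Fs in Hz, Q in bits/sample)
CONFIGS = [
    ("Speech (Telephony)", 1, 8000, 8),
    ("Speech", 1, 11025, 8),
    ("Music, Low Quality", 1, 11025, 16),
    ("Speech, High Quality", 1, 22050, 16),
    ("Music, Stereo", 2, 22050, 16),
    ("Music, Hi-Fi", 1, 44100, 16),
    ("Music, Hi-Fi Stereo", 2, 44100, 16),
]

def rate_bytes_per_sec(channels, fs_hz, bits):
    """Uncompressed rate R = #channels x Fs x Q / 8, in bytes/sec."""
    return channels * fs_hz * bits // 8

def file_size_bytes(channels, fs_hz, bits, seconds):
    """Uncompressed size = #channels x Fs x bits/sample x time(sec) / 8 (no header)."""
    return channels * fs_hz * bits * seconds // 8

for app, ch, fs, q in CONFIGS:
    print(f"{app}: {rate_bytes_per_sec(ch, fs, q) / 1000} kB/sec")

# 1 second of 16-bit mono at 8000, 16000 and 32000 Hz
print([file_size_bytes(1, fs, 16, 1) for fs in (8000, 16000, 32000)])
```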
The MPEG Standards
MPEG stands for Moving Picture Experts Group. This group works on standards for coding of moving images and sound. MPEG standards can be obtained from ISO (International Organization for Standardization) or, in the US, from ANSI (American National Standards Institute).
The MPEG Standards
MPEG-1: Standard for compression and coding of relatively low-resolution video (352x240 pixels, 30 frames/s) at 1.152 Mbits/sec (or 144 kB/sec) for CD-ROM.
MPEG-2: An extension of MPEG-1 for high-quality digital video using rates in the range 1.5 to 15 Mbits/sec.
MPEG-3: Was once intended for HDTV (High Definition Television) applications, but HDTV is now part of MPEG-2. Thus, MPEG-3 efforts were abandoned.
MPEG-4: Intended for very low bitrate coding of audio-visual programs, e.g., for interactive mobile multimedia applications at rates up to 64 kbits/sec.
The MPEG Standards
MPEG-1 (and MPEG-2) specify a family of three audio coding schemes, called layer-1, layer-2, and layer-3. From layer-1 to layer-3, encoder complexity and performance (sound quality per bitrate) increase.
Layer-1: From 32 kbit/sec to 448 kbit/sec
Layer-2: From 32 kbit/sec to 384 kbit/sec
Layer-3: From 32 kbit/sec to 320 kbit/sec
The MPEG Standards
Layer-1: 4:1 (typ. rate 384 kbps)
Layer-2: 6:1…8:1 (typ. rate 224 kbps)
Layer-3: 10:1…12:1 (typ. rate 128 kbps)
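The slides do not state the reference rate behind these ratios. Assuming it is uncompressed CD-quality audio (2 channels x 44100 Hz x 16 bits = 1411.2 kbit/sec), dividing by the typical layer rates gives values consistent with the ratios listed.

```python
# Assumption: compression ratios are relative to 16-bit stereo at 44.1 kHz (CD quality).
CD_RATE_KBPS = 2 * 44100 * 16 / 1000      # 1411.2 kbit/sec

TYPICAL_RATES_KBPS = {"Layer-1": 384, "Layer-2": 224, "Layer-3": 128}

for layer, kbps in TYPICAL_RATES_KBPS.items():
    print(f"{layer}: {CD_RATE_KBPS / kbps:.1f}:1")   # about 3.7, 6.3 and 11.0
```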
MP3: Typical Compression Ratios
MP3 Compression Techniques
Minimal Audition Threshold: The threshold of hearing is not flat across frequency; the ear is most sensitive between 2 and 5 kHz. Sounds below the threshold need not be retained and coded.
Masking Effect: During strong sounds you do not hear the weakest sounds. Thus, using psychoacoustic modeling, not all sounds need to be coded.
MP3 Compression Techniques
Reservoir of Bytes: Some passages may not be codeable at a given rate without altering musical quality. MP3 then "borrows" bytes from other passages that can be coded at a lower rate.
Huffman Coding: Is used to code the information into variable-length codewords after compression.
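To illustrate what variable-length codewords buy, here is a generic Huffman code construction in Python. It is only a sketch of the general technique: MP3 itself selects among predefined Huffman tables rather than building a code for each file, and the symbol counts below are hypothetical quantized values in which small magnitudes dominate. Frequent values receive short codewords, so the total is shorter than with a fixed-length code.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix-free Huffman code {symbol: bitstring} from a symbol sequence."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)       # merge the two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical quantized values: small magnitudes are much more common.
values = [0] * 50 + [1] * 20 + [-1] * 20 + [2] * 5 + [-2] * 5
code = huffman_code(values)
bits = sum(len(code[v]) for v in values)
print(code)
print(bits, "bits vs", 3 * len(values), "bits with a fixed 3-bit code")
```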
Summary
Sound waveforms need to be sampled and quantized for computers.
Sampling converts continuous time to discrete time. Aliasing must be avoided.
Quantization converts amplitude to a finite wordlength value. Keep SQNR high.
File size (uncompressed) in bytes is:
#channels x Fs x bits/sample x time(sec) / 8