digital signal processing
DESCRIPTION
Digital Signal Processing. January 16, 2014. Analog and Digital. In “reality”, sound is analog. Variations in air pressure are continuous: the signal has an amplitude value at all points in time, and there are an infinite number of possible air pressure values (compare an analog clock).
TRANSCRIPT
![Page 1: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/1.jpg)
Digital Signal Processing
January 18, 2016
![Page 2: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/2.jpg)
Analog-to-Digital Conversion
• Recording sounds onto a computer requires an analog-to-digital conversion (A-to-D)
• When computers record sound, they need to digitize analog readings in two dimensions:
X: Time (this is called sampling)
Y: Amplitude (this is called quantization)
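The two dimensions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_and_quantize`, the 100 Hz test tone, and the mapping of amplitudes in [-1, 1] onto integer codes are all made up for the example and are not from the slides.

```python
import math

def sample_and_quantize(signal, duration_s, sampling_rate_hz, bits):
    """Digitize a continuous function of time (hypothetical helper).

    Sampling makes time discrete; quantization makes amplitude discrete.
    The signal is assumed to stay within [-1.0, 1.0].
    """
    levels = 2 ** bits
    n_samples = round(duration_s * sampling_rate_hz)
    codes = []
    for n in range(n_samples):
        t = n / sampling_rate_hz              # X dimension: sampling
        amplitude = signal(t)
        # Y dimension: quantization onto integer codes 0 .. levels - 1
        codes.append(round((amplitude + 1.0) / 2.0 * (levels - 1)))
    return codes

def tone(t):
    """A 100 Hz sine wave standing in for the analog signal."""
    return math.sin(2 * math.pi * 100 * t)

codes = sample_and_quantize(tone, duration_s=0.01, sampling_rate_hz=8000, bits=8)
print(len(codes), min(codes), max(codes))
```

Raising `sampling_rate_hz` gives more points along the time axis; raising `bits` gives more possible values along the amplitude axis.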
![Page 3: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/3.jpg)
Sampling Example
[Figure: a waveform with individual samples marked “o”; x-axis: nominal time, 0 to 100; y-axis: amplitude, -10000 to 10000]
Thanks to Chilin Shih for making these materials available.
![Page 4: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/4.jpg)
Sampling Example
![Page 5: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/5.jpg)
Digital Dimension #2: Quantization
• Each sample that is taken has a range of pressure values
• This range is determined by the number of bits allotted to each sample
• Remember: in computers, numbers are stored in binary format (sequences of ones and zeroes).
• Ex: 89 = 01011001 in 8-bit encoding
• Typical sample sizes:
• 8 bits: 2^8 = 256 values
• 12 bits: 2^12 = 4,096 values
• 16 bits: 2^16 = 65,536 values
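The numbers on this slide can be checked directly. The helper names below (`levels_for_bits`, `to_binary`) are hypothetical and only serve the illustration.

```python
def levels_for_bits(bits):
    """Number of distinct values that fit in a sample of the given size."""
    return 2 ** bits

def to_binary(value, bits):
    """Write a non-negative integer as a zero-padded binary string."""
    return format(value, f"0{bits}b")

for bits in (8, 12, 16):
    print(bits, "bits:", levels_for_bits(bits), "values")

print(to_binary(89, 8))  # the 8-bit encoding of 89
```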
![Page 6: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/6.jpg)
Samples Go Small
• We lose information when the sample size is too small, given the same sampling rate.
[Figure: the same sampled waveform quantized to only four amplitude levels; x-axis: nominal time, 0 to 100; y-axis: amplitude, -10000 to 10000]
• Sample size here = 2 bits = 2^2 = 4 values
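A rough way to see the information loss is to round a sine wave to the nearest available level and look at the worst rounding error. This is a sketch under assumptions: the levels are taken to be evenly spaced across [-1, 1], and the function names are invented for the example.

```python
import math

def quantize(amplitude, bits):
    """Round an amplitude in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)                 # spacing between adjacent levels
    return round((amplitude + 1.0) / step) * step - 1.0

def max_quantization_error(bits, n_samples=100):
    """Largest rounding error over one cycle of a sine wave."""
    wave = [math.sin(2 * math.pi * n / n_samples) for n in range(n_samples)]
    return max(abs(s - quantize(s, bits)) for s in wave)

for bits in (2, 8, 16):
    print(bits, "bits: worst error", max_quantization_error(bits))
```

With 2 bits every sample lands on one of only four values, so the rounding error can be as large as a third of the full amplitude; with 16 bits it is tiny.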
![Page 7: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/7.jpg)
Quantization
![Page 8: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/8.jpg)
Quantization Noise
![Page 9: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/9.jpg)
Sample Size Demo
• 11k 16 bits
• 11k 8 bits
• 8k 16 bits
• 8k 8 bits (telephone)
• Note: CDs sample at 44,100 Hz and have 16-bit quantization.
• Also check out bad and actedout examples in Praat.
![Page 10: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/10.jpg)
Quantization Range
• With 16-bit quantization, we can encode 65,536 different possible amplitude values.
• Remember that I(dB) = 10 * log10(A^2 / r^2)
• Substitute the max and min amplitude values for A and r, respectively, and we get:
• I(dB) = 10 * log10(65536^2 / 1^2) = 96.3 dB
• Some newer machines have 24-bit quantization
• = 16,777,216 possible amplitude values.
• I(dB) = 10 * log10(16777216^2 / 1^2) = 144.5 dB
• This is bigger than the range of sounds we can listen to without damaging our hearing.
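The two dB figures follow from the intensity formula with A set to the number of amplitude values and r set to 1, as on the slide. A minimal check (the function name `dynamic_range_db` is just a label for this example):

```python
import math

def dynamic_range_db(bits):
    """10 * log10(A^2 / r^2) with A = 2**bits and r = 1, as on the slide."""
    max_amplitude = 2 ** bits
    min_amplitude = 1
    return 10 * math.log10(max_amplitude ** 2 / min_amplitude ** 2)

print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(24), 1))  # 144.5
```

Each extra bit doubles the number of amplitude values and so adds about 6 dB to the range.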
![Page 11: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/11.jpg)
Problem: Clipping
• Clipping occurs when the pressure in the analog signal exceeds the quantization range in digitization
• Check out sylvester and normal in Praat.
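Clipping can be imitated by forcing out-of-range values onto the nearest representable extreme. The sketch assumes signed 16-bit samples (-32768 to 32767), which is a common convention but is not stated on the slide; `clip_to_range` and the sample values are made up.

```python
def clip_to_range(samples, bits=16):
    """Force every sample into the signed integer range for the given sample size."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    return [max(lowest, min(highest, s)) for s in samples]

# One cycle of a wave whose peaks are too large for 16 bits.
too_loud = [0, 20000, 40000, 20000, 0, -20000, -40000, -20000]
print(clip_to_range(too_loud))
# [0, 20000, 32767, 20000, 0, -20000, -32768, -20000]
```

The peaks are flattened at the edge of the range, which is what a clipped waveform looks like in a waveform display.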
![Page 12: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/12.jpg)
A Note on Formats
• Digitized sound files come in different formats…
• .wav, .aiff, .au, etc.
• Lossless formats digitize sound in the way I’ve just described.
• They only differ in terms of “header” information and specified limits on file size, etc.
• Lossy formats use algorithms to condense the size of sound files
• …and the sound file loses information in the process.
• For instance: the .mp3 format saves space in part by eliminating some very high frequency information.
• (which is hard for people to hear)
![Page 13: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/13.jpg)
AIFF vs. MP3
.aiff format
.mp3 format
(encoded at 128 kbps)
• This trick can work pretty well…
![Page 14: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/14.jpg)
MP3 vs. MP3
.mp3 format
(encoded at 128 kbps)
.mp3 format
(encoded at 64 kbps)
• .mp3 conversion can induce reverb artifacts, and also cut down on temporal resolution (among other things).
![Page 15: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/15.jpg)
Sound Digitization Summary
• Samples are taken of an analog sound’s pressure value at a recurring sampling rate.
• This digitizes the time dimension in a waveform.
• The sampling frequency needs to be at least twice as high as the highest frequency component you want to capture in the signal.
• E.g., 44,100 Hz for speech (captures components up to 22,050 Hz)
• Quantization converts the amplitude value of each sample into a binary number in the computer.
• This digitizes the amplitude dimension in a waveform.
• Rounding-off errors can lead to quantization noise.
• Excessive amplitude can lead to clipping errors.
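The twice-as-high rule for the sampling frequency can be made concrete. The sketch below computes the highest frequency a given sampling rate can capture, and, as an added illustration not covered on the slides, the frequency a pure tone would appear to have if it lies above that limit. Function names are hypothetical.

```python
def highest_capturable_frequency(sampling_rate_hz):
    """Half the sampling rate: the highest component that can be captured."""
    return sampling_rate_hz / 2

def apparent_frequency(tone_hz, sampling_rate_hz):
    """Frequency a sampled pure tone appears to have.

    Tones above half the sampling rate fold back to a lower frequency,
    measured from the nearest whole multiple of the sampling rate.
    """
    nearest_multiple = round(tone_hz / sampling_rate_hz) * sampling_rate_hz
    return abs(tone_hz - nearest_multiple)

print(highest_capturable_frequency(44100))   # 22050.0
print(apparent_frequency(3000, 8000))        # 3000: below the limit, unchanged
print(apparent_frequency(7000, 8000))        # 1000: above the limit, misrepresented
```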
![Page 16: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/16.jpg)
The Digitization of Pitch
• Praat can give us a representation of speech that looks like:
• The blue line represents the fundamental frequency (F0) of the speaker’s voice.
• Also known as a pitch track
• How can we automatically “track” F0 in a sample of speech?
![Page 17: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/17.jpg)
Pitch Tracking
• Voicing:
• Air flow through vocal folds
• Rapid opening and closing due to Bernoulli Effect
• Each cycle sends an acoustic shockwave through the vocal tract
• …which takes the form of a complex wave.
• The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.
![Page 18: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/18.jpg)
Voicing Bars
![Page 19: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/19.jpg)
Voicing Bars
Individual glottal pulses
![Page 20: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/20.jpg)
Voicing = Complex Wave
• Note: voicing is not perfectly periodic.
• …always some random variation from one cycle to the next.
• How can we measure the fundamental frequency of a complex wave?
![Page 21: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/21.jpg)
• The basic idea: figure out the period between successive cycles of the complex wave.
• Fundamental frequency = 1 / period
duration = ???
![Page 22: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/22.jpg)
Measuring F0
• To figure out where one cycle ends and the next begins…
• The basic idea is to find how well successive “chunks” of a waveform match up with each other.
• One period = the length of the chunk that matches up best with the next chunk.
• Automatic Pitch Tracking parameters to think about:
1. Window size (i.e., chunk size)
2. Step size
3. Frequency range (= period range)
![Page 23: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/23.jpg)
Window (Chunk) Size
Here’s an example of a small window
![Page 24: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/24.jpg)
Window (Chunk) Size
Here’s an example of a large(r) window
![Page 25: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/25.jpg)
Initial window of the waveform is compared to another window (of the same duration) at a later point in the waveform
![Page 26: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/26.jpg)
Matching
The waveforms in the two windows are compared to see how well they match up.
Correlation = measure of how well the two windows match
???
![Page 27: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/27.jpg)
Autocorrelation
• The measure of correlation =
• Sum of the point-by-point products of the two chunks.
• The technical name for this is autocorrelation…
• because two parts of the same wave are being matched up against each other.
• (“auto” = self)
![Page 28: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/28.jpg)
Autocorrelation Example
• Ex: consider window x, with n samples…
• What’s its correlation with window y?
• (Note: window y must also have n samples)
• x_1 = first sample of window x
• x_2 = second sample of window x
• …
• x_n = nth (final) sample of window x
• y_1 = first sample of window y, etc.
• Correlation (R) = x_1*y_1 + x_2*y_2 + … + x_n*y_n
• The larger R is, the better the correlation.
![Page 29: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/29.jpg)
By the Numbers
| Sample | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| x | .8 | .3 | -.2 | -.5 | .4 | .8 |
| y | -.3 | -.1 | .1 | .3 | .1 | -.1 |
| product | -.24 | -.03 | -.02 | -.15 | .04 | -.08 |

Sum of products = -.48
• These two chunks are poorly correlated with each other.
![Page 30: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/30.jpg)
By the Numbers, part 2
| Sample | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| x | .8 | .3 | -.2 | -.5 | .4 | .8 |
| z | .7 | .4 | -.1 | -.4 | .1 | .4 |
| product | .56 | .12 | .02 | .2 | .04 | .32 |

Sum of products = 1.26
• These two chunks are well correlated with each other (or at least better than the previous pair).
• Note: matching peaks count for more than matches close to 0.
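The sum-of-products measure is short enough to write out, and running it on the sample values from the two “By the Numbers” slides reproduces their totals. The function name `correlation` is just a label for this sketch.

```python
def correlation(window_a, window_b):
    """Sum of the point-by-point products of two equal-length windows."""
    if len(window_a) != len(window_b):
        raise ValueError("both windows must have the same number of samples")
    return sum(a * b for a, b in zip(window_a, window_b))

x = [0.8, 0.3, -0.2, -0.5, 0.4, 0.8]
y = [-0.3, -0.1, 0.1, 0.3, 0.1, -0.1]
z = [0.7, 0.4, -0.1, -0.4, 0.1, 0.4]

print(round(correlation(x, y), 2))  # -0.48: poorly correlated
print(round(correlation(x, z), 2))  # 1.26: better correlated
```

Because each term is a product, a pair of matching large values (such as .8 and .7) contributes far more to the total than a pair of matching values near zero.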
![Page 31: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/31.jpg)
Back to (Digital) Reality
The waveforms in the two windows are compared to see how well they match up.
Correlation = measure of how well the two windows match
???
These two windows are poorly correlated
![Page 32: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/32.jpg)
Next: the pitch tracking algorithm moves further down the waveform and grabs a new window
![Page 33: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/33.jpg)
The distance the algorithm moves forward in the waveform is called the step size
“step”
![Page 34: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/34.jpg)
Matching, again
The next window gets compared to the original.
???
![Page 35: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/35.jpg)
Matching, again
The next window gets compared to the original.
???
These two windows are also poorly correlated
![Page 36: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/36.jpg)
The algorithm keeps chugging and, eventually…
another “step”
![Page 37: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/37.jpg)
Matching, again
The best match is found.
???
These two windows are highly correlated
![Page 38: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/38.jpg)
The fundamental period can be determined by calculating the length of time between the start of window 1 and the start of (well correlated) window 2.
period
![Page 39: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/39.jpg)
Mopping up
• Frequency is 1 / period
• Q: How many possible periods does the algorithm need to check?
• Frequency range (default in Praat: 75 to 600 Hz)
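One way to answer the question is to convert the frequency range into a range of candidate periods measured in samples. The sketch assumes a 44,100 Hz recording and that the tracker tries every whole-sample period in the range; both are assumptions for illustration, and `lag_range` is a made-up name.

```python
def lag_range(sampling_rate_hz, f_min_hz=75, f_max_hz=600):
    """Shortest and longest candidate periods, in samples.

    The highest frequency gives the shortest period (rounded up),
    the lowest frequency gives the longest period (rounded down).
    """
    shortest = -(-sampling_rate_hz // f_max_hz)   # ceiling division
    longest = sampling_rate_hz // f_min_hz
    return shortest, longest

shortest, longest = lag_range(44100)
candidates = longest - shortest + 1
print(shortest, longest, candidates)  # 74 588 515
```

Narrowing the frequency range shrinks this count, which is why the range limits how much work the algorithm has to do.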
![Page 40: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/40.jpg)
Moving on
• Another comparison window is selected and the whole process starts over again.
![Page 41: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/41.jpg)
[Figure: pitch track for the utterance “Uhm, I would like a flight to Seattle from Albuquerque”; x-axis: Time (s), 1 to 4; y-axis: F0 (Hz), 200 to 400]
• The algorithm ultimately spits out a pitch track.
• This one shows you the F0 value at each step.
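The whole procedure (grab a window, slide a second window forward, keep the best match, convert the period to a frequency, then step forward and repeat) can be put together as a toy tracker. This is a simplified sketch, not Praat's actual algorithm: it uses the raw sum of products, takes the window to be three times the longest period, and is tried on a synthetic 200 Hz wave rather than real speech. All names and parameter values are assumptions.

```python
import math

def correlation(window_a, window_b):
    """Sum of the point-by-point products of two windows."""
    return sum(a * b for a, b in zip(window_a, window_b))

def estimate_f0(samples, start, sampling_rate_hz, f_min_hz, f_max_hz):
    """Estimate F0 at one position by finding the best-matching later window."""
    shortest_lag = -(-sampling_rate_hz // f_max_hz)   # ceiling division
    longest_lag = sampling_rate_hz // f_min_hz
    window_size = 3 * longest_lag
    reference = samples[start:start + window_size]
    best_lag, best_r = None, float("-inf")
    for lag in range(shortest_lag, longest_lag + 1):
        candidate = samples[start + lag:start + lag + window_size]
        if len(candidate) < len(reference):
            break                                     # ran off the end of the signal
        r = correlation(reference, candidate)
        if r > best_r + 1e-9:                         # on ties, keep the shorter period
            best_lag, best_r = lag, r
    if best_lag is None:
        return None
    return sampling_rate_hz / best_lag                # frequency = 1 / period

def pitch_track(samples, sampling_rate_hz, step_size, f_min_hz, f_max_hz):
    """One F0 estimate per step through the signal."""
    longest_lag = sampling_rate_hz // f_min_hz
    last_start = len(samples) - 4 * longest_lag       # room for the window plus the longest lag
    return [estimate_f0(samples, start, sampling_rate_hz, f_min_hz, f_max_hz)
            for start in range(0, max(last_start, 0) + 1, step_size)]

# Synthetic "voicing": a 200 Hz complex wave sampled at 8000 Hz for 0.25 s.
rate = 8000
wave = [math.sin(2 * math.pi * 200 * n / rate) + 0.5 * math.sin(2 * math.pi * 400 * n / rate)
        for n in range(2000)]
track = pitch_track(wave, rate, step_size=80, f_min_hz=100, f_max_hz=500)
print(track[:5])
```

On this perfectly periodic test wave every step reports 200 Hz; real voicing varies from cycle to cycle, so a real pitch track moves around.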
Thanks to Chilin Shih for making these materials available
![Page 42: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/42.jpg)
Pitch Tracking in Praat
• Play with F0 range.
• Create Pitch Object.
• Also go To Manipulation…Pitch.
• Also check out:
![Page 43: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/43.jpg)
Summing Up
• Pitch tracking uses three parameters
1. Window size
• Ensures reliability
• In Praat, the window size is always three times the longest possible period.
• E.g.: 3 × 1/75 = 0.04 sec.
2. Step size
• For temporal precision
3. Frequency range
• Reduces computational load
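The window-size arithmetic can be checked quickly. The sketch assumes the three-periods rule stated above and a 44,100 Hz recording for the sample count; `analysis_window` is a hypothetical helper.

```python
def analysis_window(f_min_hz, sampling_rate_hz, periods_per_window=3):
    """Window length in seconds and in samples for a given lowest frequency."""
    duration_s = periods_per_window / f_min_hz        # longest period = 1 / lowest frequency
    n_samples = round(duration_s * sampling_rate_hz)
    return duration_s, n_samples

duration_s, n_samples = analysis_window(75, 44100)
print(duration_s, n_samples)  # 0.04 seconds, 1764 samples
```

Raising the lowest frequency in the range shortens the window, so the two parameters are not independent.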
![Page 44: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/44.jpg)
Deep Thought Questions
• What might happen if:
• The shortest period checked is longer than the fundamental period?
• AND two fundamental periods fit inside a window?
• Potential Problem #1: Pitch Halving
• The pitch tracker thinks the fundamental period is twice as long as it is in reality.
• It estimates F0 to be half of its actual value
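Pitch halving is easy to provoke in a toy tracker. The sketch below is a simplified sum-of-products matcher (an assumption, not Praat's algorithm) run on a synthetic 200 Hz wave: with a wide frequency range it finds 200 Hz, but when the highest frequency checked is set to 150 Hz, the true period is never tried and the best remaining match is two periods long. Names and values are made up for the illustration.

```python
import math

def estimate_f0(samples, sampling_rate_hz, f_min_hz, f_max_hz):
    """Best-matching period within the allowed range, converted to Hz."""
    shortest_lag = -(-sampling_rate_hz // f_max_hz)   # ceiling division
    longest_lag = sampling_rate_hz // f_min_hz
    window_size = 3 * longest_lag
    reference = samples[:window_size]
    best_lag, best_r = shortest_lag, float("-inf")
    for lag in range(shortest_lag, longest_lag + 1):
        candidate = samples[lag:lag + window_size]
        r = sum(a * b for a, b in zip(reference, candidate))
        if r > best_r + 1e-9:                         # on ties, keep the shorter period
            best_lag, best_r = lag, r
    return sampling_rate_hz / best_lag

# Synthetic 200 Hz complex wave sampled at 8000 Hz (period = 40 samples).
rate = 8000
wave = [math.sin(2 * math.pi * 200 * n / rate) + 0.5 * math.sin(2 * math.pi * 400 * n / rate)
        for n in range(1000)]

print(estimate_f0(wave, rate, f_min_hz=100, f_max_hz=500))  # 200.0: true period is in range
print(estimate_f0(wave, rate, f_min_hz=100, f_max_hz=150))  # 100.0: halved
```

In the second call the shortest period checked (54 samples) is longer than the true period (40 samples), so the tracker settles on 80 samples, which is two true periods.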
![Page 45: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/45.jpg)
Pitch Halving
pitch is halved
Check out normal file in Praat.
![Page 46: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/46.jpg)
More Deep Thoughts
• What might happen if:
• The shortest period checked is less than half of the fundamental period?
• AND the second half of the fundamental cycle is very similar to the first?
• Potential Problem #2: Pitch doubling
• The pitch tracker thinks the fundamental period is half as long as it actually is.
• It estimates the F0 to be twice as high as it is in reality.
![Page 47: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/47.jpg)
Pitch Doubling
pitch is doubled
![Page 48: Digital Signal Processing](https://reader036.vdocument.in/reader036/viewer/2022062809/568159f6550346895dc742b9/html5/thumbnails/48.jpg)
Microperturbations
• Another problem:
• Speech waveforms are partly shaped by the type of segment being produced.
• Pitch tracking can become erratic at the juncture of two segments.
• In particular:
• voiced to voiceless segments
• sonorants to obstruents
• These discontinuities in F0 are known as microperturbations.
• Also: transitions between modal and creaky voicing tend to be problematic.