Digital Signal Processing January 16, 2014

Upload: kelly-horton

Post on 14-Jan-2016



Analog and Digital

• In “reality”, sound is analog.

• variations in air pressure are continuous

• = it has an amplitude value at all points in time.

• and there are an infinite number of possible air pressure values.

• Back in the bad old days, acoustic phonetics was strictly an analog endeavor.

[Image: analog clock]


Analog and Digital

• In the good new days, we can represent sound digitally in a computer.

• In a computer, sounds must be discrete.

• everything = 1 or 0

[Image: digital clock]

• Computers represent sounds as sequences of discrete pressure values at separate points in time.

• Finite number of pressure values.

• Finite number of points in time.


Analog-to-Digital Conversion

• Recording sounds onto a computer requires an analog-to-digital conversion (A-to-D)

• When computers record sound, they need to digitize analog readings in two dimensions:

X: Time (this is called sampling)

Y: Amplitude (this is called quantization)

sampling

quantization
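The two dimensions can be sketched in a few lines of Python. This is only an illustration, not how any particular sound card works: the function name `digitize`, the 1000 Hz sampling rate, and the 100 Hz sine wave standing in for the "analog" signal are all assumptions made for the example.

```python
import math

def digitize(signal, duration_s, sampling_rate_hz, n_bits):
    """Sample a continuous function of time and quantize each sample.

    `signal` is any function mapping time (seconds) to a value in [-1, 1].
    """
    max_level = 2 ** (n_bits - 1) - 1             # e.g. 32767 for 16 bits
    n_samples = round(duration_s * sampling_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sampling_rate_hz                  # sampling: discrete points in time
        value = signal(t)
        samples.append(round(value * max_level))  # quantization: discrete amplitudes
    return samples

# A 100 Hz sine wave stands in for the analog signal (an assumption for the demo).
analog = lambda t: math.sin(2 * math.pi * 100 * t)
digital = digitize(analog, duration_s=0.01, sampling_rate_hz=1000, n_bits=16)
print(digital)
```

The result is a finite list of integers: a finite number of points in time, each holding one of a finite number of pressure values.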


Sampling Example

[Figure: a waveform with its sample points marked; x-axis: nominal time (0 to 100), y-axis: amplitude (-10000 to 10000)]

Thanks to Chilin Shih for making these materials available.


Sampling Example


Sampling Rate

• Sampling rate = frequency at which samples are taken.

• What’s a good sampling rate for speech?

• Typical options include:

• 22050 Hz, 44100 Hz, 48000 Hz

• sometimes even 96000 Hz and 192000 Hz

• Higher sampling rate preserves sound quality.

• Lower sampling rate saves disk space.

• (which is no longer much of an issue)

• Young, healthy human ears are sensitive to sounds from 20 Hz to 20,000 Hz


One Consideration

• The Nyquist Frequency

• = highest frequency component that can be captured with a given sampling rate

• = one-half the sampling rate

Problematic Example:

• 100 Hz sound

• 100 Hz sampling rate

samples 1 2 3

Harry Nyquist (1889-1976)


Nyquist’s Implication

• An adequate sampling rate has to be…

• at least twice as high as the highest frequency component in the signal that you’d like to capture.

• 100 Hz sound

• 200 Hz sampling rate

samples 1 2 3 4 5 6
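The two cases (a 100 Hz sound sampled at 100 Hz and at 200 Hz) can be tried out numerically. The sketch below is illustrative only: the helper name `sample_tone` is made up, and the starting phase (the wave begins at a peak) is an assumption chosen so the samples are easy to read.

```python
import math

def sample_tone(freq_hz, sampling_rate_hz, n_samples, phase=math.pi / 2):
    """Return n_samples of a sinusoid; the default phase starts it at a peak."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sampling_rate_hz + phase)
        for n in range(n_samples)
    ]

at_100 = sample_tone(100, 100, 3)   # one sample per cycle
at_200 = sample_tone(100, 200, 6)   # two samples per cycle

print([round(s, 3) for s in at_100])
print([round(s, 3) for s in at_200])
```

At 100 Hz every sample lands on the same point in the cycle, so the samples look like a flat line. At 200 Hz the samples alternate between peak and trough, so the 100 Hz oscillation is visible. Note that at exactly twice the frequency the outcome depends on phase: with `phase=0` every sample would fall on a zero crossing.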


Sampling Rate Demo

• Speech should be sampled at 44100 Hz or higher

• (although there is little frequency information in speech above 10,000 Hz)

• 44100 Hz

• 22050 Hz

• 11025 Hz (watch out for [s])

• 8000 Hz

• 5000 Hz


Another Problem

• When the continuous sound signal completes more than half a cycle in between samples (i.e., its frequency is above the Nyquist frequency), a phenomenon called aliasing occurs.

• The digital signal then contains a low frequency component which is not in the analog signal.
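Aliasing is easy to reproduce with made-up numbers. In the sketch below, the 100 Hz sampling rate and the 90 Hz tone are assumptions picked for the demo, and `alias_frequency` and `sample_cosine` are hypothetical helper names.

```python
import math

def alias_frequency(freq_hz, sampling_rate_hz):
    """Frequency (0 to Nyquist) that a sampled tone is indistinguishable from."""
    folded = freq_hz % sampling_rate_hz
    return min(folded, sampling_rate_hz - folded)

def sample_cosine(freq_hz, sampling_rate_hz, n_samples):
    """Samples of a cosine wave taken at the given sampling rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sampling_rate_hz)
            for n in range(n_samples)]

fs = 100
high = sample_cosine(90, fs, 20)                      # above the 50 Hz Nyquist frequency
low = sample_cosine(alias_frequency(90, fs), fs, 20)  # its low-frequency alias

print(alias_frequency(90, fs))
print(max(abs(a - b) for a, b in zip(high, low)))     # difference is only rounding error
```

Once sampled, the 90 Hz tone and the 10 Hz tone produce the same list of numbers, so the computer has no way of telling them apart.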


The Aliasing Solution: Filtering

• Whenever sound is digitized, frequencies above the Nyquist frequency need to be filtered out of the signal before it is sampled, so that they cannot show up as aliases in the end product.

• E.g., CDs digitize at a 44100 Hz sampling rate…

• And filter out any components over 20000 Hz.

• “Low-pass filters”

• allow low frequencies to pass through the filter.

• and remove high frequencies from the signal.

• Cf. “high-pass” filters:

• allow high frequencies to pass through the filter.


Low-Pass Filter in Action

• Power spectrum of 100 Hz + 1000 Hz combo:

• Filter passes 100 Hz component, but not 1000 Hz component.
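A rough way to see this in code is to use a moving average as a (very crude) low-pass filter. Everything below is an assumption for illustration: the 8000 Hz sampling rate, the 8-sample averaging window, and the helper names are not from the slides, and this is not the filter used to make the figure.

```python
import math

FS = 8000  # sampling rate in Hz (assumed for the demo)

def two_tone(n_samples):
    """Equal-amplitude mix of a 100 Hz and a 1000 Hz sine wave."""
    return [math.sin(2 * math.pi * 100 * n / FS) + math.sin(2 * math.pi * 1000 * n / FS)
            for n in range(n_samples)]

def moving_average(x, width):
    """Crude low-pass filter: each output is the mean of `width` consecutive inputs."""
    return [sum(x[i:i + width]) / width for i in range(len(x) - width + 1)]

def amplitude_at(x, freq_hz):
    """Amplitude of the component at freq_hz (one bin of a Fourier transform)."""
    re = sum(v * math.cos(2 * math.pi * freq_hz * n / FS) for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * freq_hz * n / FS) for n, v in enumerate(x))
    return 2 * math.hypot(re, im) / len(x)

width = 8                                # spans exactly one cycle of 1000 Hz at 8000 Hz
raw = two_tone(800 + width - 1)
filtered = moving_average(raw, width)    # 800 output samples

for f in (100, 1000):
    print(f, round(amplitude_at(raw[:800], f), 3), round(amplitude_at(filtered, f), 3))
```

Because the averaging window covers one full cycle of the 1000 Hz component, that component averages out to zero, while the slowly varying 100 Hz component passes through almost unchanged.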


Digital Dimension #2: Quantization

• Each sample that is taken can take on one of a limited range of pressure values

• This range is determined by the number of bits allotted to each sample

• Remember: in computers, numbers are stored in binary format (sequences of ones and zeroes).

• Ex: 89 = 01011001 in 8-bit encoding

• Typical sample sizes:

• 8 bits = 2^8 = 256 values

• 12 bits = 2^12 = 4,096 values

• 16 bits = 2^16 = 65,536 values
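The numbers above can be checked directly. The helper names `n_values` and `to_binary` are made up for this sketch, which assumes plain unsigned binary encoding.

```python
def n_values(n_bits):
    """Number of distinct values that n_bits can represent."""
    return 2 ** n_bits

def to_binary(value, n_bits):
    """Unsigned binary encoding of a non-negative integer, zero-padded to n_bits."""
    if not 0 <= value < 2 ** n_bits:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    return format(value, f"0{n_bits}b")

print(to_binary(89, 8))          # 01011001
for bits in (8, 12, 16):
    print(bits, n_values(bits))  # 256, 4096, 65536
```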


Samples Go Small

• We lose information when the sample size is too small, given the same sampling rate.

[Figure: the same waveform sampled with a small sample size; x-axis: nominal time (0 to 100), y-axis: amplitude (-10000 to 10000)]

• Sample size here = 2 bits = 2^2 = 4 values
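To get a feel for how much information is lost, the sketch below rounds the same sine wave to 2-bit, 8-bit, and 16-bit levels and reports the worst rounding error for each. The evenly spaced levels between -1 and 1 and the function name `quantize` are assumptions for the example.

```python
import math

def quantize(value, n_bits):
    """Round a value in [-1, 1] to the nearest of 2**n_bits evenly spaced levels."""
    n_levels = 2 ** n_bits
    step = 2 / (n_levels - 1)            # spacing between adjacent levels
    index = round((value + 1) / step)    # nearest level, 0 .. n_levels - 1
    return -1 + index * step

# One cycle of a sine wave, 20 samples long.
wave = [math.sin(2 * math.pi * n / 20) for n in range(20)]

for bits in (2, 8, 16):
    quantized = [quantize(v, bits) for v in wave]
    distinct = len({round(q, 9) for q in quantized})
    worst = max(abs(q - v) for q, v in zip(quantized, wave))
    print(bits, distinct, worst)
```

With 2 bits the wave collapses onto at most four distinct values; the rounding error shrinks as more bits are allotted to each sample.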


Quantization


Quantization Noise


Sample Size Demo

• 11k 16 bits

• 11k 8 bits

• 8k 16 bits

• 8k 8 bits (telephone)

• Note: CDs sample at 44,100 Hz and have 16-bit quantization.

• Also check out bad and actedout examples in Praat.


Quantization Range

• With 16-bit quantization, we can encode 65,536 different possible amplitude values.

• Remember that I(dB) = 10 * log10 (A^2 / r^2)

• Substitute the max and min amplitude values for A and r, respectively, and we get:

• I(dB) = 10 * log10 (65536^2 / 1^2) = 96.3 dB

• Some newer machines have 24-bit quantization…

• = 16,777,216 possible amplitude values.

• I(dB) = 10 * log10 (16777216^2 / 1^2) = 144.5 dB

• This is bigger than the range of sounds we can listen to without damaging our hearing.
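The same substitution can be done for any sample size. The function name `dynamic_range_db` is made up for this sketch; like the formula above, it takes the maximum amplitude to be 2^bits and the minimum to be 1.

```python
import math

def dynamic_range_db(n_bits):
    """Ratio of largest to smallest amplitude value (2**n_bits to 1), in decibels."""
    max_amplitude = 2 ** n_bits
    min_amplitude = 1
    return 10 * math.log10(max_amplitude ** 2 / min_amplitude ** 2)

for bits in (16, 24):
    print(bits, round(dynamic_range_db(bits), 1))   # 96.3 and 144.5
```

Each extra bit doubles the number of amplitude values, which adds about 6 dB to the range.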


Problem: Clipping

• Clipping occurs when the pressure in the analog signal exceeds the sample size range in digitization

• Check out sylvester and normal in Praat.
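Clipping can be imitated by clamping samples to the range a 16-bit number can hold. The signed 16-bit range (-32768 to 32767), the made-up peak value of 40000, and the function name `clip` are assumptions for this sketch.

```python
import math

def clip(sample, n_bits=16):
    """Clamp an integer sample to the range a signed n_bits value can hold."""
    lo, hi = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return max(lo, min(hi, sample))

# A sine wave whose peak (40000) is too loud for 16-bit samples (max 32767).
loud = [round(40000 * math.sin(2 * math.pi * n / 16)) for n in range(16)]
recorded = [clip(s) for s in loud]

print(loud)
print(recorded)
```

The quiet parts of the wave come through unchanged, but the peaks and troughs are flattened at the limits of the range, which is what gives a clipped waveform its squared-off look.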


A Note on Formats

• Digitized sound files come in different formats…

• .wav, .aiff, .au, etc.

• Lossless formats digitize sound in the way I’ve just described.

• They only differ in terms of “header” information and specified limits on file size, etc.

• Lossy formats use algorithms to condense the size of sound files

• …and the sound file loses information in the process.

• For instance: the .mp3 format primarily saves space by eliminating some very high frequency information.

• (which is hard for people to hear)


AIFF vs. MP3

.aiff format

.mp3 format

(digitized at 128 kbps)

• This trick can work pretty well…


MP3 vs. MP3

.mp3 format

(digitized at 128 kbps)

.mp3 format

(digitized at 64 kbps)

• .mp3 conversion can induce reverb artifacts, and also cut down on temporal resolution (among other things).


Sound Digitization Summary

• Samples are taken of an analog sound’s pressure value at a recurring sampling rate.

• This digitizes the time dimension in a waveform.

• The sampling frequency needs to be at least twice as high as any frequency components you want to capture in the signal.

• E.g., 44100 Hz for speech

• Quantization converts the amplitude value of each sample into a binary number in the computer.

• This digitizes the amplitude dimension in a waveform.

• Rounding-off errors can lead to quantization noise.

• Excessive amplitude can lead to clipping errors.


The Digitization of Pitch

• Praat can give us a representation of speech that looks like:

[Figure: Praat display with a pitch track drawn as a blue line]

• The blue line represents the fundamental frequency (F0) of the speaker’s voice.

• Also known as a pitch track

• How can we automatically “track” F0 in a sample of speech?


Pitch Tracking

• Voicing:

• Air flow through vocal folds

• Rapid opening and closing due to Bernoulli Effect

• Each cycle sends an acoustic shockwave through the vocal tract

• …which takes the form of a complex wave.

• The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.


Voicing Bars


Voicing Bars

Individual glottal pulses


Voicing = Complex Wave

• Note: voicing is not perfectly periodic.

• …always some random variation from one cycle to the next.

• How can we measure the fundamental frequency of a complex wave?


• The basic idea: figure out the period between successive cycles of the complex wave.

• Fundamental frequency = 1 / period

duration = ???


Measuring F0

• To figure out where one cycle ends and the next begins…

• The basic idea is to find how well successive “chunks” of a waveform match up with each other.

• One period = the length of the chunk that matches up best with the next chunk.

• Automatic Pitch Tracking parameters to think about:

1. Window size (i.e., chunk size)

2. Step size

3. Frequency range (= period range)


Window (Chunk) Size

Here’s an example of a small window


Window (Chunk) Size

Here’s an example of a large(r) window


Initial window of the waveform is compared to another window (of the same duration) at a later point in the waveform


Matching

The waveforms in the two windows are compared to see how well they match up.

Correlation = measure of how well the two windows match

???


Autocorrelation

• The measure of correlation =

• Sum of the point-by-point products of the two chunks.

• The technical name for this is autocorrelation…

• because two parts of the same wave are being matched up against each other.

• (“auto” = self)


Autocorrelation Example

• Ex: consider window x, with n samples…

• What’s its correlation with window y?

• (Note: window y must also have n samples)

• x1 = first sample of window x

• x2 = second sample of window x

• …

• xn = nth (final) sample of window x

• y1 = first sample of window y, etc.

• Correlation (R) = x1*y1 + x2*y2 + … + xn*yn

• The larger R is, the better the correlation.


By the Numbers

Sample      1      2      3      4      5      6

x          .8     .3    -.2    -.5     .4     .8

y         -.3    -.1     .1     .3     .1    -.1

product  -.24   -.03   -.02   -.15    .04   -.08

Sum of products = -.48

• These two chunks are poorly correlated with each other.


By the Numbers, part 2

Sample      1      2      3      4      5      6

x          .8     .3    -.2    -.5     .4     .8

z          .7     .4    -.1    -.4     .1     .4

product   .56    .12    .02     .2    .04    .32

Sum of products = 1.26

• These two chunks are well correlated with each other.

(or at least better than the previous pair)

• Note: matching peaks count for more than matches close to 0.
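Both sums can be reproduced with a few lines of Python. The function name `correlation` is made up for this sketch; the sample values are the ones from the two tables.

```python
def correlation(a, b):
    """Sum of the point-by-point products of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("both windows must have the same number of samples")
    return sum(ai * bi for ai, bi in zip(a, b))

x = [0.8, 0.3, -0.2, -0.5, 0.4, 0.8]
y = [-0.3, -0.1, 0.1, 0.3, 0.1, -0.1]
z = [0.7, 0.4, -0.1, -0.4, 0.1, 0.4]

print(round(correlation(x, y), 2))   # -0.48
print(round(correlation(x, z), 2))   # 1.26
```

The first and last samples show why matching peaks count for more: .8 times .7 contributes .56 to the sum, while two well-matched values near zero contribute almost nothing.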


Back to (Digital) Reality

The waveforms in the two windows are compared to see how well they match up.

Correlation = measure of how well the two windows match

???

These two windows are poorly correlated


Next: the pitch tracking algorithm moves further down the waveform and grabs a new window


The distance the algorithm moves forward in the waveform is called the step size

“step”


Matching, again

The next window gets compared to the original.

???


Matching, again

The next window gets compared to the original.

???

These two windows are also poorly correlated


The algorithm keeps chugging and, eventually…

another “step”


Matching, again

The best match is found.

???

These two windows are highly correlated


The fundamental period can be determined by calculating the length of time between the start of window 1 and the start of (well-correlated) window 2.

period


Mopping up

[Figure: waveform with the period marked]

• Frequency is 1 / period

• Q: How many possible periods does the algorithm need to check?

• Frequency range (default in Praat: 75 to 600 Hz)
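The whole procedure (grab a window, compare it with later windows, pick the best match, convert the period to a frequency) fits in a short sketch. This is a bare-bones illustration, not Praat's algorithm, which is more elaborate. The function name `estimate_f0`, the 8000 Hz sampling rate, and the synthetic 125 Hz "voice" are assumptions for the demo, and the step size here is a single sample.

```python
import math

def estimate_f0(samples, fs, f0_min=75.0, f0_max=600.0):
    """Estimate F0 by finding the lag whose window best matches the first window."""
    window = math.ceil(3 * fs / f0_min)      # three times the longest period
    shortest = math.ceil(fs / f0_max)        # shortest period to check, in samples
    longest = math.floor(fs / f0_min)        # longest period to check, in samples
    if len(samples) < window + longest:
        raise ValueError("signal too short for this frequency range")
    first = samples[:window]
    best_lag, best_r = None, -math.inf
    for lag in range(shortest, longest + 1):
        later = samples[lag:lag + window]
        r = sum(a * b for a, b in zip(first, later))   # autocorrelation at this lag
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag                     # frequency = 1 / period

fs = 8000
# Synthetic "voice": a 125 Hz fundamental plus two weaker harmonics.
voice = [math.sin(2 * math.pi * 125 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 250 * n / fs)
         + 0.25 * math.sin(2 * math.pi * 375 * n / fs)
         for n in range(1000)]
print(estimate_f0(voice, fs))
```

At an 8000 Hz sampling rate, a frequency range of 75 to 600 Hz corresponds to periods of 14 to 106 samples, so this sketch checks 93 candidate periods. Narrowing the frequency range shrinks that number.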


Moving on

• Another comparison window is selected and the whole process starts over again.


[Figure: pitch track of the utterance “Uhm, I would like a flight to Seattle from Albuquerque”; x-axis: Time (1 to 4 s), y-axis: F0 (200 to 400 Hz)]

• The algorithm ultimately spits out a pitch track.

• This one shows you the F0 value at each step.

Thanks to Chilin Shih for making these materials available


Pitch Tracking in Praat

• Play with F0 range.

• Create Pitch Object.

• Also go To Manipulation…Pitch.

• Also check out:


Summing Up

• Pitch tracking uses three parameters

1. Window size

• Ensures reliability

• In Praat, the window size is always three times the longest possible period.

• E.g.: 3 X 1/75 = .04 sec.

2. Step size

• For temporal precision

3. Frequency range

• Reduces computational load


Deep Thought Questions

• What might happen if:

• The shortest period checked is longer than the fundamental period?

• AND two fundamental periods fit inside a window?

• Potential Problem #1: Pitch Halving

• The pitch tracker thinks the fundamental period is twice as long as it is in reality.

• It estimates F0 to be half of its actual value
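Pitch halving can be provoked on purpose by setting the top of the frequency range below the true F0. The sketch below is a toy autocorrelation search, not Praat's algorithm; the function name `best_period`, the 8000 Hz sampling rate, and the synthetic 200 Hz voice are all assumptions for the demo.

```python
import math

def best_period(samples, fs, f0_min, f0_max):
    """Return the lag (in samples) with the largest autocorrelation in range."""
    window = math.ceil(3 * fs / f0_min)      # three times the longest period
    first = samples[:window]
    lags = range(math.ceil(fs / f0_max), math.floor(fs / f0_min) + 1)

    def score(lag):
        r = sum(a * b for a, b in zip(first, samples[lag:lag + window]))
        return (round(r, 6), -lag)           # ties go to the shorter period

    return max(lags, key=score)

fs = 8000
true_f0 = 200                                # true period = 40 samples
voice = [math.sin(2 * math.pi * true_f0 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 2 * true_f0 * n / fs)
         for n in range(1000)]

ok = fs / best_period(voice, fs, f0_min=75, f0_max=600)
halved = fs / best_period(voice, fs, f0_min=75, f0_max=150)
print(ok, halved)
```

With the range capped at 150 Hz, the shortest period checked (54 samples) is longer than the true period (40 samples), so the best match available is two periods later, at 80 samples, and the estimate comes out at half the true F0.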


Pitch Halving

pitch is halved

Check out normal file in Praat.


More Deep Thoughts

• What might happen if:

• The shortest period checked is less than half of the fundamental period?

• AND the second half of the fundamental cycle is very similar to the first?

• Potential Problem #2: Pitch doubling

• The pitch tracker thinks the fundamental period is half as long as it actually is.

• It estimates the F0 to be twice as high as it is in reality.


Pitch Doubling

pitch is doubled


Microperturbations

• Another problem:

• Speech waveforms are partly shaped by the type of segment being produced.

• Pitch tracking can become erratic at the juncture of two segments.

• In particular:

• voiced to voiceless segments

• sonorants to obstruents

• These discontinuities in F0 are known as microperturbations.

• Also: transitions between modal and creaky voicing tend to be problematic.