• Professor Douglas Lyon
• Fairfield University
• http://www.docjava.com
Voice and Signal Processing
Two Course Texts!
• Java for Programmers
• Available from: http://www.docjava.com
Java Digital Signal Processing
• Java for Programmers
• Available from: http://www.docjava.com
Grading
• Midterm: 1/3
• Homework: 1/3
• Final: 1/3
• Midterm and Final – Take home!
Pre-reqs
• You should have CS232 and MA 172
• OR permission of the instructor
• You need a working knowledge of Java!
What do I need to learn this?
• Basic multimedia programming
  – It helps implement interesting programs
  – It enables active learning
  – It requires a good background in Java programming
Preliminary Java Topics
• exceptions (ch 11)
• nested reference data types (ch 12)
• threads (ch 13)
Preliminary IO Topics
• files (ch 14)
• streams (ch 15)
• readers (ch 16)
• writers (ch 17)
Preliminary GUI Topics
• Swing (ch 18)
• Events (ch 19)
What is Voice and Signal Processing?
• 1D data processing
  – Input: sound
  – Output: sound
  – Time-varying functions are used as both input and output.
What is Digital Signal Processing?
• A kind of data processing.
• Typically numeric data processing
  – Look at the kind and DIMENSION of the data:
  – 1D in, 1D out -> DSP
  – 2D in, 2D out -> image processing
  – 2D in, symbols out -> computer vision
  – 3D in, 2D out -> computer graphics
What are some DSP examples?
• If the input is images and the output is images we call it image processing
• If the input is images and the output is symbols we call it pattern recognition or machine vision
• If the input is text and the output is voice we call it voice synthesis.
• If the input is voice and the output is text we call it voice recognition
• If the input is images and geometry and the output is images we call it image warping
What are some 1D DSP applications?
• Analysis: weak variables -> strong variables
• Synthesis: strong variables -> weak variables
What are some kinds of 1D data?
• Any form of energy that can be digitized.
• Any source of data (a function in 1D):
  – Voice data
  – Sound data
  – Temperature data
  – Range, blood pressure, EEG (brain activity), EKG (heart activity), weight, age, etc.
non-physical phenomena and DSP
• Anything that can produce a digital stream of data is suitable for DSP, e.g.:
  – financial data
  – statistical data
  – network traffic
What is Audio?
• A pressure wave that moves through air.
• Sensed by the human auditory system (the ear).
• Audio is a sensation.
What is a signal?
• A signal is any sequence of values.
• Stock price: a function of time
• Image: a 2D signal
• Movie: a 2D signal as a function of time
• Any collection of symbols or numbers
What is a continuous signal?
• A signal that can be represented as a function of a real-valued domain (e.g., time).
What is a discrete signal?
• A signal that can be represented by a function over a whole number domain.
• Sampling is the reduction of a continuous signal to a discrete signal.
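A minimal Java sketch of sampling, assuming the continuous signal is a pure sine v(t) = sin(2πft); the class and method names are hypothetical. Each sample is the continuous function evaluated at t = k / sampleRate, so the whole-number index k becomes the discrete domain.

```java
/** Hypothetical helper: turns a continuous sine into a discrete signal. */
final class Sampler {
    /** Returns n samples v[k] = sin(2*pi*frequency*t), with t = k / sampleRate seconds. */
    static double[] sampleSine(double frequency, double sampleRate, int n) {
        double[] samples = new double[n];
        for (int k = 0; k < n; k++) {
            double t = k / sampleRate; // time of the k-th sample, in seconds
            samples[k] = Math.sin(2.0 * Math.PI * frequency * t);
        }
        return samples;
    }

    public static void main(String[] args) {
        // A 1 Hz sine sampled at 4 Hz: one sample every 0.25 s.
        for (double x : sampleSine(1.0, 4.0, 4)) {
            System.out.printf("%.3f%n", x);
        }
    }
}
```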
What is quantization?
• Given a signal whose set of values is S, create a new signal whose set of values is S′, such that card(S) > card(S′); that is, the new signal takes fewer distinct values.
Quantization
• One part of digitization
• Input: v(t)
• Output: vq(t)
• Let N = the number of quantization levels.
• Suppose the minimum voltage is 0 V dc.
• Suppose the maximum voltage is 1 V dc.
• What is the minimum quantization step?
What is digitization?
• Sampling + quantization
• Converts a continuous signal into a discrete signal
What is Digital Signal Processing?
• Data processing with signals that are both sampled and quantized
Compute the quantization step
• (maximum voltage − minimum voltage) / total number of steps; with a 0 V minimum this is maximum voltage / N.
• For example, CD audio uses 16-bit samples.
  – N = 2**16 = 65536
  – Quantization step = 1 V / 65536 ≈ 0.000015 V (about 15 µV)
• For 8-bit AU files, N = 2**8 = 256.
  – Quantization step = 1 V / 256 ≈ 0.0039 V
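A minimal Java sketch of the step computation and of a uniform quantizer that maps a voltage to an integer level. It follows the convention step = (vMax − vMin) / N with N = 2^bits; the names are hypothetical, and clamping vMax into the top level is an assumption.

```java
/** Hypothetical uniform quantizer for the 0..1 V example above. */
final class Quantizer {
    /** Quantization step: (vMax - vMin) / N, where N = 2^bits levels. */
    static double step(double vMin, double vMax, int bits) {
        long levels = 1L << bits;
        return (vMax - vMin) / levels;
    }

    /** Maps v in [vMin, vMax] to an integer level 0..N-1. */
    static int quantize(double v, double vMin, double vMax, int bits) {
        int maxLevel = (1 << bits) - 1;
        int level = (int) Math.floor((v - vMin) / step(vMin, vMax, bits));
        // Clamp so that v == vMax (and out-of-range inputs) land on a valid level.
        return Math.max(0, Math.min(maxLevel, level));
    }

    public static void main(String[] args) {
        System.out.println(step(0.0, 1.0, 16));         // CD audio: 16-bit samples
        System.out.println(step(0.0, 1.0, 8));          // 8-bit AU
        System.out.println(quantize(0.5, 0.0, 1.0, 8)); // mid-scale level
    }
}
```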
How do I do digitization?
[Figure: low-pass filter -> analog-to-digital converter -> parallel-to-serial converter -> PCM]
• A low-pass filter removes high frequencies.
• The ADC samples the signal and quantizes it.
• The parallel-to-serial converter is a shift register.
Sampling and Quantization
What is the noise relative to the signal?
• SNR = signal-to-noise ratio
• In bels: log10(signal power / noise power).
• The bel is named after Alexander Graham Bell.
• One tenth of a bel is the decibel (dB), so SNR in dB = 10 log10(signal power / noise power).
SNR
Dynamic Range
• Each extra bit doubles the number of levels; log10(2) is about 0.3, so each bit adds about 0.3 * 20 = 6 dB.
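A minimal Java sketch of the decibel arithmetic, with hypothetical names. Power ratios use 10 log10; the dynamic range uses 20 log10 because 2^bits is a ratio of amplitudes, and power goes as amplitude squared. This compares full scale with one step and leaves out the extra 1.76 dB term that appears when a full-scale sine is compared with uniformly distributed quantization noise.

```java
/** Hypothetical helpers for decibel arithmetic. */
final class Decibels {
    /** SNR in dB from a power ratio: 10 * log10(signalPower / noisePower). */
    static double snrDb(double signalPower, double noisePower) {
        return 10.0 * Math.log10(signalPower / noisePower);
    }

    /**
     * Dynamic range of an ideal b-bit quantizer, full scale versus one step.
     * 20 * log10(2^b) = b * 20 * log10(2), about 6.02 dB per bit.
     */
    static double dynamicRangeDb(int bits) {
        return 20.0 * Math.log10(Math.pow(2.0, bits));
    }

    public static void main(String[] args) {
        System.out.printf("SNR for a 100:1 power ratio = %.1f dB%n", snrDb(100.0, 1.0));
        for (int bits : new int[] {8, 16}) {
            System.out.printf("%d bits -> %.1f dB%n", bits, dynamicRangeDb(bits));
        }
    }
}
```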
sampling rate
• Nyquist–Shannon sampling theorem
• If a function f(t) contains no frequencies higher than W cps (cycles per second, i.e., Hz),
• it is completely determined by giving its ordinates at a series of points spaced 1/(2W) seconds apart.
• Example: if W = 10 Hz, sample at 20 Hz (one sample every 0.05 s).
aliasing
• A sampling artifact that occurs when sampling below the Nyquist rate.
• High frequencies are reconstructed as (masquerade as) low frequencies.
• In images, aliasing can appear as interference (moiré) patterns.
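A minimal Java sketch of where a sampled sinusoid "lands" after reconstruction, with hypothetical names. It uses the standard folding rule for a real sinusoid: the apparent frequency is the distance from f to the nearest integer multiple of the sampling rate, which always falls between 0 and the Nyquist frequency. The 20 Hz rate reuses the W = 10 Hz example.

```java
/** Hypothetical helper: apparent frequency of a sampled sinusoid. */
final class Alias {
    /**
     * Apparent frequency of a real sinusoid of f Hz sampled at fs Hz:
     * the distance from f to the nearest integer multiple of fs.
     * The result lies between 0 and fs / 2 (the Nyquist frequency).
     */
    static double aliasFrequency(double f, double fs) {
        double nearestMultiple = fs * Math.round(f / fs);
        return Math.abs(f - nearestMultiple);
    }

    public static void main(String[] args) {
        double fs = 20.0; // sampling rate from the W = 10 Hz example
        for (double f : new double[] {7.0, 13.0, 25.0}) {
            System.out.printf("%.1f Hz appears as %.1f Hz%n", f, aliasFrequency(f, fs));
        }
    }
}
```

A 13 Hz tone sampled at 20 Hz is indistinguishable from a 7 Hz tone, which is why the anti-aliasing filter must remove it before sampling.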
What is an anti-aliasing filter?
• A low-pass filter, applied before sampling, that removes frequencies above the Nyquist frequency.
What is oversampling?
• Sampling at a rate higher than the Nyquist rate.
General Analysis for the ADC
The role of the low-pass filter
• It acts as an anti-aliasing filter.
• Nyquist frequency = sampling frequency / 2
• It passes only frequencies below the Nyquist frequency.
How do I reconstruct a signal?
[Figure: PCM -> serial-to-parallel converter -> digital-to-analog converter -> low-pass filter]
sample/reconstruction process
[Figure: v(t) sampled at fs -> amplifier -> low-pass filter -> output]
Digitizing Voice: PCM Waveform Encoding
• Nyquist theorem: sample at twice the highest frequency
  – Voice frequency range: 300-3400 Hz
  – Sampling frequency = 8000/sec (every 125 µs)
  – Bit rate: (2 x 4 kHz) x 8 bits per sample
  – = 64,000 bits per second (DS-0)
• By far the most commonly used method
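A minimal Java check of the PCM arithmetic on this slide, with hypothetical names: 8000 samples per second at 8 bits each gives the 64 kbps DS-0 rate, and one sample arrives every 125 µs.

```java
/** Hypothetical arithmetic check for the DS-0 figures above. */
final class PcmRate {
    /** Bit rate in bits per second = samples per second * bits per sample. */
    static long bitRate(int sampleRate, int bitsPerSample) {
        return (long) sampleRate * bitsPerSample;
    }

    /** Time between samples, in microseconds. */
    static double samplePeriodMicros(int sampleRate) {
        return 1_000_000.0 / sampleRate;
    }

    public static void main(String[] args) {
        int sampleRate = 2 * 4000; // twice the 4 kHz voice channel
        System.out.println(bitRate(sampleRate, 8) + " bit/s");
        System.out.println(samplePeriodMicros(sampleRate) + " us");
    }
}
```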
CODEC
[Figure: CODEC output: PCM at 64 kbps = DS-0]
In 1D, DSP Is…
• 1D Digital signal processing is a kind of data processing that operates on 1D PCM data.
O-scope
Harmonics
• The fundamental frequency of a periodic sound is its lowest frequency component; it is often, though not always, the component of strongest magnitude.
• Few sounds are just sine waves.
• The additional sine waves in a sound make up its harmonic content, which largely determines its timbre.
Harmonic formula
• A harmonic is an integer multiple of the fundamental frequency.
• If 440 Hz is the 1st harmonic then
• 880 Hz is the 2nd harmonic
• Individual sine waves are called partials.
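A minimal Java sketch of the harmonic formula, with hypothetical names: the nth harmonic is n times the fundamental, so a 440 Hz fundamental gives 440, 880, 1320, 1760 Hz, and so on.

```java
/** Hypothetical helper: frequencies of the first few harmonics. */
final class HarmonicSeries {
    /** Returns n * fundamental for n = 1..count. */
    static double[] harmonics(double fundamental, int count) {
        double[] h = new double[count];
        for (int n = 1; n <= count; n++) {
            h[n - 1] = n * fundamental;
        }
        return h;
    }

    public static void main(String[] args) {
        for (double f : harmonics(440.0, 4)) {
            System.out.println(f + " Hz");
        }
    }
}
```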
Harmonic Motion
The frequency of the oscillations is given by
How do I model Spectra?
• Suppose the continuous signal is v(t)
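• Let the Fourier coefficients be denoted a0, a1, b1, a2, b2, ..., so that
$$v(t) = a_0 + (a_1\cos\omega t + b_1\sin\omega t) + (a_2\cos 2\omega t + b_2\sin 2\omega t) + \cdots$$
where ω = 2πf0 is the angular frequency of the fundamental.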
Sawtooth Wave Form
[Figure: sawtooth approximation with K = 10 harmonics]
Model of a Saw Wave
$$f(x) = 2\sum_{n=1}^{K} \frac{(-1)^{n+1}}{n}\sin(nx)$$
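A minimal Java sketch of the K-term sawtooth sum f(x) = 2 Σ (−1)^(n+1) sin(nx)/n, assuming x is in radians; the names are hypothetical. As K grows the sum approaches the ramp f(x) = x on (−π, π), with overshoot near ±π (the Gibbs phenomenon).

```java
/** Hypothetical sketch: partial Fourier sum for the sawtooth model. */
final class SawWave {
    /** f(x) = 2 * sum_{n=1..k} (-1)^(n+1) * sin(n x) / n, with x in radians. */
    static double saw(double x, int k) {
        double sum = 0.0;
        for (int n = 1; n <= k; n++) {
            double sign = (n % 2 == 1) ? 1.0 : -1.0; // (-1)^(n+1)
            sum += sign * Math.sin(n * x) / n;
        }
        return 2.0 * sum;
    }

    public static void main(String[] args) {
        // At x = 1 rad the partial sums approach the ideal ramp value f(1) = 1.
        for (int k : new int[] {10, 100, 1000}) {
            System.out.printf("K = %d: f(1.0) = %.4f%n", k, saw(1.0, k));
        }
    }
}
```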
[Figure: sawtooth wave, K = 100]
Example: a 4 voice synthesizer
• Design a program that can:
  – Play sound
  – Provide a GUI for setting the amplitudes of up to 7 harmonics
  – Enable the user to alter the frequency of the fundamental tone
  – Enable the playing of 4 voices
  – Enable the control of the overall volume
Building an Oscillator in software
• // The period of the waveform, in seconds, is
• lambda = 1 / frequency;
• // The number of samples per period is
• samplesPerCycle = sampleRate * lambda;
• // with, for example,
• sampleRate = 8000; // samples per second
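A minimal table-lookup oscillator in Java that follows the formulas above (lambda = 1 / frequency, samplesPerCycle = sampleRate * lambda). The names are hypothetical. Rounding samplesPerCycle to an integer is an assumption, and it shifts the pitch slightly when sampleRate / frequency is not a whole number (8000 / 440 rounds to 18 samples, about 444 Hz). Sending the buffer to a sound device is left out.

```java
/** Hypothetical table-lookup sine oscillator. */
final class Oscillator {
    /** Builds one period of a sine wave sampled at sampleRate. */
    static double[] oneCycle(double frequency, double sampleRate) {
        double lambda = 1.0 / frequency; // period in seconds
        int samplesPerCycle = (int) Math.round(sampleRate * lambda);
        double[] cycle = new double[samplesPerCycle];
        for (int i = 0; i < samplesPerCycle; i++) {
            cycle[i] = Math.sin(2.0 * Math.PI * i / samplesPerCycle);
        }
        return cycle;
    }

    /** Produces n output samples by repeating the stored cycle. */
    static double[] play(double[] cycle, int n) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = cycle[i % cycle.length];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] cycle = oneCycle(1000.0, 8000.0); // 8 samples per period
        double[] out = play(cycle, 16);            // two periods
        System.out.println(cycle.length + " samples per cycle, " + out.length + " samples out");
    }
}
```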
Fourier transform
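$$V(f) = \mathcal{F}[v(t)] = \int_{-\infty}^{\infty} v(t)\,e^{-2\pi i f t}\,dt$$
$$v(t) = \mathcal{F}^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\,e^{2\pi i f t}\,df$$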
How do you compute the Fourier Coefficients?
• Use the Fourier transform!
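$$v(t) = a_0 + (a_1\cos\omega t + b_1\sin\omega t) + (a_2\cos 2\omega t + b_2\sin 2\omega t) + \cdots, \qquad \omega = 2\pi f_0$$
$$V(f) = \mathcal{F}[v(t)] = \int_{-\infty}^{\infty} v(t)\,e^{-2\pi i f t}\,dt$$
$$v(t) = \mathcal{F}^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\,e^{2\pi i f t}\,df$$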
Recall Euler’s identity
• Complex numbers have a real and an imaginary part:
$$e^{i\theta} = \cos\theta + i\sin\theta$$
Another way to express a function
$$v(t) = a_0 + (a_1\cos\omega t + b_1\sin\omega t) + (a_2\cos 2\omega t + b_2\sin 2\omega t) + \cdots, \qquad \omega = 2\pi f_0$$
• f0: fundamental frequency
• nf0: nth harmonic of f0
Sine-Cosine Representation
$$x(t) = \sum_{n=0}^{\infty} a_n\cos(2\pi n f_0 t) + \sum_{n=1}^{\infty} b_n\sin(2\pi n f_0 t)$$
• f0: fundamental frequency
• nf0: nth harmonic of f0
Correlation
• The Fourier coefficients are found by correlating the time-dependent function x(t) with the nth-harmonic sine-cosine pair:
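$$a_0 = \frac{1}{T}\int_0^T x(t)\,dt$$
$$a_n = \frac{2}{T}\int_0^T x(t)\cos(2\pi n f_0 t)\,dt$$
$$b_n = \frac{2}{T}\int_0^T x(t)\sin(2\pi n f_0 t)\,dt$$
where T = 1/f0 is the period.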
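A minimal Java sketch of the correlation integrals, assuming x is given as m equally spaced samples x[k] = x(kT/m) over one period; the names are hypothetical. Each integral becomes a Riemann sum with dt = T/m, so the factor 2/T turns into 2/m. For a signal made only of harmonics well below m/2, as in the example, the sums are exact up to rounding.

```java
/** Hypothetical sketch: Fourier coefficients by correlation, from samples of one period. */
final class FourierCoefficients {
    /** a0 = (1/T) * integral of x(t) over one period, approximated by the sample mean. */
    static double a0(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Returns {a_n, b_n} for n >= 1 from m samples spanning one period. */
    static double[] anbn(double[] x, int n) {
        int m = x.length;
        double a = 0.0, b = 0.0;
        for (int k = 0; k < m; k++) {
            double angle = 2.0 * Math.PI * n * k / m; // 2*pi*n*f0*t with f0*t = k/m
            a += x[k] * Math.cos(angle);
            b += x[k] * Math.sin(angle);
        }
        return new double[] {2.0 * a / m, 2.0 * b / m};
    }

    public static void main(String[] args) {
        int m = 64;
        double[] x = new double[m];
        for (int k = 0; k < m; k++) {
            double phase = 2.0 * Math.PI * k / m;
            x[k] = 0.5 + 3.0 * Math.cos(phase) + 2.0 * Math.sin(2.0 * phase);
        }
        double[] c1 = anbn(x, 1), c2 = anbn(x, 2);
        System.out.printf("a0 = %.3f%n", a0(x));
        System.out.printf("a1 = %.3f, b1 = %.3f%n", c1[0], c1[1]);
        System.out.printf("a2 = %.3f, b2 = %.3f%n", c2[0], c2[1]);
    }
}
```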
amplitude-phase representation
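$$x(t) = c_0 + \sum_{n=1}^{\infty} c_n\cos(2\pi n f_0 t - \phi_n)$$
$$c_0 = \frac{1}{T}\int_0^T x(t)\,dt, \qquad c_n = \sqrt{a_n^2 + b_n^2}, \qquad \phi_n = \tan^{-1}\frac{b_n}{a_n}$$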
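A minimal Java sketch of the conversion from sine-cosine to amplitude-phase form, with hypothetical names. It assumes the convention a cos θ + b sin θ = c cos(θ − φ), for which φ = tan⁻¹(b/a). Using Math.atan2 instead of a plain arctangent resolves the quadrant, which matters when a is negative.

```java
/** Hypothetical converter from sine-cosine to amplitude-phase form. */
final class AmplitudePhase {
    /** Returns {c, phi} such that a*cos(theta) + b*sin(theta) = c*cos(theta - phi). */
    static double[] fromSineCosine(double a, double b) {
        double c = Math.hypot(a, b);   // sqrt(a^2 + b^2)
        double phi = Math.atan2(b, a); // tan^-1(b / a), quadrant resolved
        return new double[] {c, phi};
    }

    public static void main(String[] args) {
        double[] cp = fromSineCosine(3.0, 4.0);
        System.out.printf("c = %.3f, phi = %.4f rad%n", cp[0], cp[1]);
    }
}
```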
Average Power
$$P = \frac{1}{t_2 - t_1}\int_{t_1}^{t_2} x(t)^2\,dt$$
Periodic signal average power:
$$P = \frac{1}{T}\int_0^T x(t)^2\,dt$$
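A minimal Java sketch of the periodic-signal average power for m samples covering one period, where the integral becomes the mean of the squared samples; the names are hypothetical. A sine of amplitude A gives A²/2, which the example checks.

```java
/** Hypothetical helper: average power of a sampled periodic signal. */
final class AveragePower {
    /** (1/m) * sum of x[k]^2, the discrete form of (1/T) * integral of x(t)^2 dt. */
    static double averagePower(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v * v;
        return sum / x.length;
    }

    public static void main(String[] args) {
        int m = 100;
        double amplitude = 2.0;
        double[] x = new double[m];
        for (int k = 0; k < m; k++) {
            x[k] = amplitude * Math.sin(2.0 * Math.PI * k / m); // one full period
        }
        System.out.printf("P = %.3f (A^2/2 = %.3f)%n", averagePower(x), amplitude * amplitude / 2.0);
    }
}
```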
PSD (Power Spectral Density)
• S(f) is the power at a specific frequency f (strictly, the power per unit bandwidth around f).
Linear combinations in the time domain become linear combinations in the frequency domain:
$$a_1 V_1(f) + a_2 V_2(f) = \mathcal{F}[a_1 v_1(t) + a_2 v_2(t)]$$
Delay in the time domain causes a phase shift in the frequency domain:
$$V(f)\,e^{-2\pi i f t_d} = \mathcal{F}[v(t - t_d)]$$
Scale change in the time domain causes a reciprocal scale change in the frequency domain:
$$\frac{1}{|\alpha|}\,V\!\left(\frac{f}{\alpha}\right) = \mathcal{F}[v(\alpha t)], \qquad \alpha \neq 0$$
Convolution theorem: multiplication in the time domain causes convolution in the frequency domain:
$$(V * W)(f) = \mathcal{F}[v(t)\,w(t)]$$
Convolution between two functions of the same variable is defined by
$$(V * W)(f) = \int_{-\infty}^{\infty} V(\lambda)\,W(f - \lambda)\,d\lambda$$
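A minimal Java sketch of the discrete counterpart of the convolution integral, in which the integral over λ becomes a sum over the index k: (v * w)[n] = Σ v[k] w[n − k]. The names are hypothetical, and both inputs are assumed to be non-empty.

```java
/** Hypothetical helper: discrete linear convolution of two sequences. */
final class Convolution {
    /** (v * w)[n] = sum over k of v[k] * w[n - k]; output length is |v| + |w| - 1. */
    static double[] convolve(double[] v, double[] w) {
        double[] out = new double[v.length + w.length - 1];
        for (int k = 0; k < v.length; k++) {
            for (int j = 0; j < w.length; j++) {
                out[k + j] += v[k] * w[j]; // contributes to index n = k + j
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] y = convolve(new double[] {1, 2, 3}, new double[] {0, 1, 0.5});
        System.out.println(java.util.Arrays.toString(y)); // [0.0, 1.0, 2.5, 4.0, 1.5]
    }
}
```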
Various Codec Bandwidth Consumptions
Encoding/Compression: Resulting Bit Rate
• G.711 PCM (A-law/μ-law): 64 kbps (DS-0), the standard transmission rate for voice
• G.726 ADPCM: 16, 24, 32, 40 kbps
• G.727 E-ADPCM: 16, 24, 32, 40 kbps
• G.729 CS-ACELP: 8 kbps
• G.728 LD-CELP: 16 kbps
• G.723.1 CELP: 6.3/5.3 kbps, variable
A means to improve SNR
• Compression uses a coder and a decoder (together, a CODEC).
• One CODEC is called μ-law (u-law).
• μ-law runs at 8 kHz sampling and 8 bits per digitized sample.
• μ-law is meant for voice.
Voice-Grade Audio: Application
• Voice over IP
• Voice is band-limited to about 3.4 kHz.
• Sample at 8 kHz; that is plenty (the Nyquist rate for 3.4 kHz is 6.8 kHz).
• Quantize to 8 bits of data (about 48 dB SNR).
• Improve the SNR with compression.
Voice Quality of Service (QoS) Requirements
• Loss
• Delay
• Delay variation (jitter)
Avoiding The 3 Main QoS Challenges
The u-law codec
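• X is a number whose range is 0..255.
• For 1 ≤ X ≤ 255, log base 2 of X ranges from 0 to just under 8 (log2 256 = 8).
• μ-law uses a scale factor, μ, that multiplies the input before the log is taken; G.711 telephony in North America and Japan uses μ = 255.
• log2(x) = log(x) / log(2) (change of base).
• μ-law takes the log to the base 1 + μ: F(x) = sgn(x) log(1 + μ|x|) / log(1 + μ), for −1 ≤ x ≤ 1, so full scale maps to ±1.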
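A minimal Java sketch of the continuous μ-law companding curve with μ = 255, for inputs normalized to [−1, 1]; the names are hypothetical, and this is not the bit-exact segmented 8-bit encoder defined in G.711. Dividing by log(1 + μ) is the "log to the base 1 + μ" step, and small inputs come out boosted relative to large ones, so a uniform 8-bit quantizer applied afterwards spends more of its levels on quiet speech.

```java
/** Hypothetical sketch of the continuous mu-law curve (not bit-exact G.711). */
final class MuLaw {
    static final double MU = 255.0;

    /** Compresses x in [-1, 1]: sgn(x) * log(1 + MU*|x|) / log(1 + MU). */
    static double compress(double x) {
        return Math.signum(x) * Math.log1p(MU * Math.abs(x)) / Math.log1p(MU);
    }

    /** Inverts compress: sgn(y) * ((1 + MU)^|y| - 1) / MU. */
    static double expand(double y) {
        return Math.signum(y) * (Math.pow(1.0 + MU, Math.abs(y)) - 1.0) / MU;
    }

    public static void main(String[] args) {
        for (double x : new double[] {0.01, 0.1, 0.5, 1.0}) {
            double y = compress(x);
            System.out.printf("x = %.2f -> y = %.3f -> back to %.4f%n", x, y, expand(y));
        }
    }
}
```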