professor douglas lyon [email protected] fairfield university voice and signal processing

• Professor Douglas Lyon

• [email protected]

• Fairfield University

• http://www.docjava.com

Voice and Signal Processing


Two Course Texts!

• Java for Programmers

• Available from: http://www.docjava.com


Java Digital Signal Processing

• Java for Programmers

• Available from: http://www.docjava.com

Grading

• Midterm: 1/3

• Homework: 1/3

• Final: 1/3

• Midterm and Final – Take home!

Email

• Please send me an e-mail asking to be placed on the CR310 List

• E-mail: [email protected]

Pre-reqs

• You should have CS232 and MA 172

• OR permission of the instructor

• You need a working knowledge of Java!

What do I need to learn this?

• Basic multimedia programming
  – It helps implement interesting programs
  – It enables active learning
  – It requires a good background in Java programming

Preliminary Java Topics

• exceptions (ch11)

• nested reference data types (ch 12)

• threads (ch13)

Preliminary IO Topics

• files (ch14)

• streams(15)

• readers (16)

• writers (17)

Preliminary GUI Topics

• Swing (ch 18)

• Events (ch 19)

What is Voice and Signal Processing?

• 1D data processing
  – input sound
  – output sound
  – time-varying functions are used as both input and output.

What is Digital Signal Processing?

• A kind of data processing.

• Typically numeric data processing
  – Look at the kind and DIMENSION of the data.
  – 1D in, 1D out -> DSP
  – 2D in, 2D out -> image processing
  – 2D in, symbols out -> computer vision
  – 3D in, 2D out -> computer graphics

What are some DSP examples?

• If the input is images and the output is images we call it image processing

• If the input is images and the output is symbols we call it pattern recognition or machine vision

• If the input is text and the output is voice we call it voice synthesis.

• If the input is voice and the output is text we call it voice recognition

• If the input is images and geometry and the output is images we call it image warping

What are some 1D DSP applications?

• Analysis – weak variables -> strong variables

• Synthesis – strong variables -> weak variables

What are some kinds of 1D data?

• Any form of energy that can be digitized.

• Any source of data (a function in 1D).
  – Voice data
  – Sound data
  – Temperature data
  – Range, blood pressure, EEG (brain stuff), EKG (heart stuff), weight, age…

non-physical phenomena and DSP

• Anything that can produce a digital stream of data is suitable for DSP, e.g.,
  – financial data
  – statistical data
  – network traffic, etc.

What is Audio?

• Pressure wave that moves air.

• Human auditory system (ear).

• Audio is a sensation.

What is a signal?

• A signal is any sequence of values.

• Stock price-a function of time

• Image (2d signal)

• Movie – 2d signal as a function of time

• Any collection of symbols or numbers

What is a continuous signal?

• A signal that can be represented as a function of a real-valued domain (e.g., time)

What is a discrete signal?

• A signal that can be represented by a function over a whole number domain.

• Sampling is the reduction of a continuous signal to a discrete signal.

http://en.wikipedia.org/wiki/Continuous_signal

http://en.wikipedia.org/wiki/Discrete_signal

What is quantization?

• Given a signal with a set of values, S, create a new signal with a set of values, S', such that card(S) > card(S'), where card() is the number of elements in the set.

Quantization

• One part of digitization

• input v(t)

• output Vq(t)

• let N = the number of quantization levels.

• Suppose minimum voltage is 0 vdc

• Suppose max voltage is 1 vdc

• What is the min quantization step?

What is digitization?

• Sampling + quantization

• Converts a continuous signal into a discrete signal

What is Digital Signal Processing?

• Data processing with signals that are both sampled and quantized

Compute the quantization step

• Quantization step = maximum voltage / total number of steps.

• For example, a CD uses 16-bit audio samples.
  – N = 2**16 = 65536
  – Quantization step = 1/65536 ≈ 0.000015 V

• For AU files, N = 2**8 = 256
  – Quantization step = 1/256 ≈ 0.0039 V
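The step computation above can be sketched in Java. This is a minimal illustration, assuming a 0 V minimum and a uniform quantizer; the class and method names (`UniformQuantizer`, `step`, `levelIndex`) are hypothetical and not taken from the course texts.

```java
// Sketch of uniform quantization; names and the 0 V minimum are assumptions.
class UniformQuantizer {
    /** Quantization step when 0..maxVoltage is divided into N = 2^bits levels. */
    static double step(double maxVoltage, int bits) {
        return maxVoltage / (1L << bits);
    }

    /** Index (0..N-1) of the quantization level that contains voltage v. */
    static long levelIndex(double v, double maxVoltage, int bits) {
        long n = 1L << bits;
        long index = (long) Math.floor(v / step(maxVoltage, bits));
        return Math.min(Math.max(index, 0), n - 1); // clamp to the valid range
    }

    public static void main(String[] args) {
        System.out.println("16-bit step (CD): " + step(1.0, 16));
        System.out.println("8-bit step (AU):  " + step(1.0, 8));
        System.out.println("0.5 V at 8 bits -> level " + levelIndex(0.5, 1.0, 8));
    }
}
```

Clamping the top of the range means an input of exactly the maximum voltage falls in the highest level rather than overflowing.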

How do I do digitization?

Low-pass filter -> analog-to-digital converter (ADC) -> parallel-to-serial converter -> PCM

• A low-pass filter removes high frequencies.

• The ADC samples the signal and quantizes it.

• The parallel-to-serial converter is a shift register.

Sampling and Quantization

What is the noise relative to the signal?

• SNR = signal to noise ratio

• SNR = log10(signal power / noise power), measured in bels.

• The bel is named after Alexander Graham Bell.

• One tenth of a bel is called the decibel (dB), so SNR in dB = 10 log10(signal power / noise power).
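As a small worked example, the SNR in decibels can be computed from two power values. This is a sketch; the class and method names are illustrative, and it assumes both powers are positive and in the same units.

```java
// Sketch: SNR in dB from signal and noise power (names are illustrative).
class SnrDecibels {
    /** 10 * log10(signalPower / noisePower); both powers must be positive. */
    static double snrDb(double signalPower, double noisePower) {
        return 10.0 * Math.log10(signalPower / noisePower);
    }

    public static void main(String[] args) {
        // A signal with 100 times the noise power.
        System.out.println("SNR = " + snrDb(100.0, 1.0) + " dB");
    }
}
```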

Dynamic Range

log10(2) is about 0.3, and 0.3 * 20 = 6, so each bit of quantization adds about 6 dB of dynamic range.
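The 6 dB figure can be checked numerically. The sketch below assumes the dynamic range of an N-level quantizer is 20 log10(N) with N = 2^bits; the class and method names are made up for illustration.

```java
// Sketch: dynamic range of a quantizer with 2^bits levels (illustrative names).
class DynamicRange {
    /** 20 * log10(2^bits) = 20 * bits * log10(2), roughly 6 dB per bit. */
    static double dynamicRangeDb(int bits) {
        return 20.0 * bits * Math.log10(2.0);
    }

    public static void main(String[] args) {
        System.out.println("8 bits:  " + dynamicRangeDb(8) + " dB");
        System.out.println("16 bits: " + dynamicRangeDb(16) + " dB");
    }
}
```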

sampling rate

• Nyquist–Shannon sampling theorem

• If a function f(t) contains no frequencies higher than W cps,

• it is completely determined by giving its ordinates at a series of points spaced 1/(2W) seconds apart.

• Example: if W = 10 Hz, then sample at 20 Hz or faster.

http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampling_theorem


aliasing

• A sampling artifact that occurs when sampling below the Nyquist rate.

• High frequencies can be reconstructed as low frequencies.

• Images can have interference patterns
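To make the aliasing effect concrete, the sketch below folds an input frequency into the band 0..sampleRate/2 and shows that a 7 Hz cosine sampled at 10 Hz produces the same samples as a 3 Hz cosine. The class and method names are hypothetical, and cosine inputs are assumed so that phase inversion does not matter.

```java
// Sketch: apparent (aliased) frequency after sampling; names are illustrative.
class AliasDemo {
    /** Folds frequency f (Hz) into the range 0..sampleRate/2. */
    static double aliasFrequency(double f, double sampleRate) {
        return Math.abs(f - sampleRate * Math.round(f / sampleRate));
    }

    /** k-th sample of a unit cosine of frequency f taken at sampleRate. */
    static double sample(double f, double sampleRate, int k) {
        return Math.cos(2.0 * Math.PI * f * k / sampleRate);
    }

    public static void main(String[] args) {
        double fs = 10.0; // Nyquist frequency is 5 Hz, so 7 Hz is too high
        System.out.println("7 Hz sampled at 10 Hz looks like "
                + aliasFrequency(7.0, fs) + " Hz");
        for (int k = 0; k < 5; k++) {
            System.out.println(sample(7.0, fs, k) + " vs " + sample(3.0, fs, k));
        }
    }
}
```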

What is an anti-aliasing filter?

Low-pass filter

http://en.wikipedia.org/wiki/Image:Aliased.png

http://en.wikipedia.org/wiki/Image:Antialiased-sinc.png

What is oversampling?

• Sampling at a rate higher than the Nyquist rate.

General Analysis for the ADC

The role of the low-pass filter

• anti-aliasing filter

• Nyquist frequency = sample frequency / 2

• Only pass frequencies below the Nyquist frequency

How do I reconstruct a signal?

PCM -> serial-to-parallel converter -> digital-to-analog converter (DAC) -> low-pass filter

sample/reconstruction process

[Figure: sample/reconstruction circuit with labels v(t), f_s, amplifier, low-pass filter, output, R]

Digitizing Voice: PCM Waveform Encoding

• Nyquist theorem: sample at least twice the highest frequency
  – Voice frequency range: 300-3400 Hz
  – Sampling frequency = 8000/sec (every 125 µs)
  – Bit rate: (2 x 4 kHz) x 8 bits per sample = 64,000 bits per second (DS-0)

• By far the most commonly used method
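The DS-0 arithmetic can be written out as a short Java sketch. The class and method names are illustrative assumptions; only the numbers (8000 samples per second, 8 bits per sample) come from the slide.

```java
// Sketch: PCM bit rate and sample interval (illustrative names).
class PcmBitRate {
    /** Bits per second for a single PCM channel. */
    static int bitRate(int sampleRateHz, int bitsPerSample) {
        return sampleRateHz * bitsPerSample;
    }

    /** Time between samples, in microseconds. */
    static double sampleIntervalMicros(int sampleRateHz) {
        return 1_000_000.0 / sampleRateHz;
    }

    public static void main(String[] args) {
        System.out.println("Bit rate: " + bitRate(8000, 8) + " bits/s");
        System.out.println("Interval: " + sampleIntervalMicros(8000) + " us");
    }
}
```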

CODEC -> PCM at 64 kbps = DS-0

In 1D, DSP Is…

• 1D Digital signal processing is a kind of data processing that operates on 1D PCM data.

O-scope

Harmonics

• The fundamental frequency of a sound is said to be the component of strongest magnitude.

• Few sounds are just sine waves.

• The extra waves in a sound make up its harmonic content, or timbre.

Harmonic formula

• A harmonic is an integer multiple of the fundamental pitch.

• If 440 Hz is the 1st harmonic then

• 880 Hz is the 2nd harmonic

• Individual sine waves are called partials.

Harmonic Motion

The frequency of the oscillations is given by

http://en.wikipedia.org/wiki/Frequency

http://en.wikipedia.org/wiki/Image:Simple_harmonic_oscillator.gif

How do I model Spectra?

• Suppose the continuous signal is v(t)

• Let the Fourier coefficients be denoted $a_0, a_1, b_1, a_2, b_2, \ldots$ so that

$$v(t) = a_0 + (a_1 \cos \omega t + b_1 \sin \omega t) + (a_2 \cos 2\omega t + b_2 \sin 2\omega t) + \cdots$$

Sawtooth Wave Form

K=10

Model of a Saw Wave

$$f(x) = \frac{2}{\pi} \sum_{n=1}^{K} \frac{(-1)^{n+1} \sin(nx)}{n}$$
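A partial sum over K harmonics of the saw wave can be sketched in Java as follows. The class and method names are illustrative, and the 2/π scale factor is an assumption chosen so that the wave spans roughly -1 to 1; a different normalization only changes the amplitude.

```java
// Sketch: K-harmonic approximation of a saw wave (names and scale assumed).
class SawWave {
    /** Sums (-1)^(n+1) * sin(n x) / n for n = 1..k, scaled by 2/pi. */
    static double saw(double x, int k) {
        double sum = 0.0;
        for (int n = 1; n <= k; n++) {
            double sign = (n % 2 == 1) ? 1.0 : -1.0; // (-1)^(n+1)
            sum += sign * Math.sin(n * x) / n;
        }
        return (2.0 / Math.PI) * sum;
    }

    public static void main(String[] args) {
        // More harmonics give a closer approximation of the ramp x / pi.
        for (int k : new int[] {10, 100}) {
            System.out.println("K=" + k + ", f(pi/2) = " + saw(Math.PI / 2, k));
        }
    }
}
```

With this scaling the sum converges to x/π on the interval (-π, π), so values near 0.5 at x = π/2 indicate that the harmonics are being combined correctly.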

Saw wave, K=100

Example: a 4 voice synthesizer

• Design a program that can:
  – Play sound
  – Provide a GUI for determining the amplitudes of up to 7 harmonics
  – Enable the user to alter the frequency for the fundamental tone
  – Enable the playing of 4 voices
  – Enable the control of the overall volume

Building an Oscillator in software

• //the period of the wave form is

• lambda = 1 / frequency in seconds

• //The number of samples per period is

• samplesPerCycle = sampleRate * lambda;

• sampleRate = 8000 samples/ second
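The oscillator steps above might look like this in Java. This is only a sketch: the class and method names are hypothetical, the wave shape is assumed to be a sine, and the sample count is rounded to a whole number, which shifts the pitch slightly when the sample rate is not an exact multiple of the frequency.

```java
// Sketch: one cycle of a sine oscillator as a wavetable (illustrative names).
class SineOscillator {
    /** Returns one period of a unit sine wave sampled at sampleRate. */
    static double[] oneCycle(double frequency, double sampleRate) {
        double lambda = 1.0 / frequency; // period of the wave form, in seconds
        int samplesPerCycle = (int) Math.round(sampleRate * lambda);
        double[] cycle = new double[samplesPerCycle];
        for (int i = 0; i < samplesPerCycle; i++) {
            cycle[i] = Math.sin(2.0 * Math.PI * i / samplesPerCycle);
        }
        return cycle;
    }

    public static void main(String[] args) {
        double[] cycle = oneCycle(400.0, 8000.0);
        System.out.println("samplesPerCycle = " + cycle.length);
    }
}
```

Playing the tone then amounts to reading the table repeatedly and writing the samples to an audio output.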

Fourier transform

$$V(f) = F[v(t)] = \int_{-\infty}^{\infty} v(t)\, e^{-2\pi i f t}\, dt$$

$$v(t) = F^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\, e^{2\pi i f t}\, df$$
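For sampled data the integral becomes a finite sum. The sketch below is a naive O(N²) discrete Fourier transform, given only to illustrate the definition; it is not claimed to be the method used in the course texts, and the class and method names are made up. A fast Fourier transform would be used in practice.

```java
// Sketch: naive discrete Fourier transform of a real signal (illustrative).
class NaiveDft {
    /** Returns {re, im}, where re[k] + i*im[k] = sum_t x[t] e^{-2 pi i k t / N}. */
    static double[][] dft(double[] x) {
        int n = x.length;
        double[] re = new double[n];
        double[] im = new double[n];
        for (int k = 0; k < n; k++) {
            for (int t = 0; t < n; t++) {
                double angle = 2.0 * Math.PI * k * t / n;
                re[k] += x[t] * Math.cos(angle);
                im[k] -= x[t] * Math.sin(angle); // e^{-i a} = cos a - i sin a
            }
        }
        return new double[][] {re, im};
    }

    public static void main(String[] args) {
        double[] x = new double[8];
        for (int i = 0; i < x.length; i++) {
            x[i] = Math.cos(2.0 * Math.PI * i / x.length); // one cycle
        }
        double[][] spectrum = dft(x);
        for (int k = 0; k < x.length; k++) {
            System.out.println(k + ": " + Math.hypot(spectrum[0][k], spectrum[1][k]));
        }
    }
}
```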

How do you compute the Fourier Coefficients?

• Use the Fourier transform!

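$$v(t) = a_0 + (a_1 \cos \omega t + b_1 \sin \omega t) + (a_2 \cos 2\omega t + b_2 \sin 2\omega t) + \cdots$$

$$V(f) = F[v(t)] = \int_{-\infty}^{\infty} v(t)\, e^{-2\pi i f t}\, dt$$

$$v(t) = F^{-1}[V(f)] = \int_{-\infty}^{\infty} V(f)\, e^{2\pi i f t}\, df$$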

Recall Euler’s identity

• Complex numbers have a real and imaginary part:

$$e^{i\theta} = \cos\theta + i \sin\theta$$

Another way to express a function

$$v(t) = a_0 + (a_1 \cos \omega t + b_1 \sin \omega t) + (a_2 \cos 2\omega t + b_2 \sin 2\omega t) + \cdots$$

$f_0$ = frequency, with $\omega = 2\pi f_0$

$n f_0$ = nth harmonic of $f_0$

Sine-Cosine Representation

$$x(t) = \sum_{n=0}^{\infty} a_n \cos(2\pi n f_0 t) + \sum_{n=1}^{\infty} b_n \sin(2\pi n f_0 t)$$

$f_0$ = frequency

$n f_0$ = nth harmonic of $f_0$

Correlation

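• Fourier coefficients are found by correlating the time-dependent function, x(t), with the nth harmonic sine-cosine pair over one period T:

$$a_0 = \frac{1}{T} \int_0^T x(t)\, dt$$

$$a_n = \frac{2}{T} \int_0^T x(t) \cos(2\pi n f_0 t)\, dt$$

$$b_n = \frac{2}{T} \int_0^T x(t) \sin(2\pi n f_0 t)\, dt$$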
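For a sampled signal, the correlation integrals can be approximated by sums over one period. The sketch below assumes the array holds exactly one period of uniformly spaced samples and that n is between 1 and N/2 - 1; the class and method names are illustrative.

```java
// Sketch: Fourier coefficients by correlation over one sampled period.
class FourierCoefficients {
    /** a_0: the mean value of the signal over the period. */
    static double a0(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** a_n: correlation with the nth harmonic cosine. */
    static double a(double[] x, int n) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * Math.cos(2.0 * Math.PI * n * i / x.length);
        }
        return 2.0 * sum / x.length;
    }

    /** b_n: correlation with the nth harmonic sine. */
    static double b(double[] x, int n) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * Math.sin(2.0 * Math.PI * n * i / x.length);
        }
        return 2.0 * sum / x.length;
    }

    public static void main(String[] args) {
        double[] x = new double[64];
        for (int i = 0; i < x.length; i++) {
            double theta = 2.0 * Math.PI * 2 * i / x.length; // 2nd harmonic
            x[i] = 1.0 + 3.0 * Math.cos(theta) + 4.0 * Math.sin(theta);
        }
        System.out.println("a0=" + a0(x) + " a2=" + a(x, 2) + " b2=" + b(x, 2));
    }
}
```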

amplitude-phase representation

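$$x(t) = c_0 + \sum_{n=1}^{\infty} c_n \cos(2\pi n f_0 t - \phi_n)$$

$$c_0 = \frac{1}{T} \int_0^T x(t)\, dt$$

$$c_n = \sqrt{a_n^2 + b_n^2}$$

$$\phi_n = \tan^{-1} \frac{b_n}{a_n}$$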
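Converting a sine-cosine pair into amplitude and phase can be sketched as below. The names are illustrative, and `Math.atan2` is used in place of a plain arctangent so that the quadrant is handled when the cosine coefficient is negative or zero. The sketch also assumes the phase enters as cos(θ - φ), which is the sign convention that matches φ = atan(b/a).

```java
// Sketch: sine-cosine pair (a, b) to amplitude c and phase phi (illustrative).
class AmplitudePhase {
    /** c = sqrt(a^2 + b^2). */
    static double amplitude(double a, double b) {
        return Math.hypot(a, b);
    }

    /** phi such that a cos(theta) + b sin(theta) = c cos(theta - phi). */
    static double phase(double a, double b) {
        return Math.atan2(b, a);
    }

    public static void main(String[] args) {
        double a = 3.0, b = 4.0;
        System.out.println("c = " + amplitude(a, b) + ", phi = " + phase(a, b));
    }
}
```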

Average Power

$$P = \frac{1}{t_2 - t_1} \int_{t_1}^{t_2} x(t)^2\, dt$$

Periodic signal average power:

$$P = \frac{1}{T} \int_0^T x(t)^2\, dt$$
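For sampled data, the average power reduces to the mean of the squared samples. A minimal sketch, with illustrative names, assuming the array covers a whole number of periods:

```java
// Sketch: average power of a sampled signal as the mean square (illustrative).
class AveragePower {
    static double averagePower(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v * v;
        }
        return sum / x.length;
    }

    public static void main(String[] args) {
        double[] x = new double[20];
        for (int i = 0; i < x.length; i++) {
            x[i] = 2.0 * Math.sin(2.0 * Math.PI * i / x.length); // amplitude 2
        }
        // A sine of amplitude A has average power A^2 / 2.
        System.out.println("P = " + averagePower(x));
    }
}
```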

PSD (Power Spectral Density)

• $S(f)$ is the power at a specific frequency, $f$.

Linear combinations in the time domain become linear combinations in the frequency domain

$$a_1 V_1(f) + a_2 V_2(f) = F[a_1 v_1(t) + a_2 v_2(t)]$$

Delay in the time domain causes a phase shift in the frequency domain

$$V(f)\, e^{-2\pi i f t_d} = F[v(t - t_d)]$$

Scale change in the time domain causes a reciprocal scale change in the frequency domain

$$\frac{1}{|\alpha|} V\!\left(\frac{f}{\alpha}\right) = F[v(\alpha t)], \quad \alpha \neq 0$$

Convolution theorem: multiplication in the time domain causes convolution in the frequency domain

$$[V * W](f) = F[v(t)\, w(t)]$$

Convolution between two functions of the same variable is defined by

$$[V * W](f) = \int_{-\infty}^{\infty} V(\lambda)\, W(f - \lambda)\, d\lambda$$
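For finite sequences the convolution integral becomes a sum. The sketch below is a direct (full) discrete convolution with illustrative names; it assumes both arrays are non-empty and returns an array of length v.length + w.length - 1.

```java
import java.util.Arrays;

// Sketch: direct discrete convolution of two finite sequences (illustrative).
class DiscreteConvolution {
    /** out[k] = sum over i of v[i] * w[k - i]. */
    static double[] convolve(double[] v, double[] w) {
        double[] out = new double[v.length + w.length - 1];
        for (int i = 0; i < v.length; i++) {
            for (int j = 0; j < w.length; j++) {
                out[i + j] += v[i] * w[j];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[] v = {1.0, 2.0, 3.0};
        double[] w = {0.0, 1.0, 0.5};
        System.out.println(Arrays.toString(convolve(v, w)));
        // prints [0.0, 1.0, 2.5, 4.0, 1.5]
    }
}
```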

Various Codec Bandwidth Consumptions

Codec      Encoding/Compression    Resulting Bit Rate
G.711      PCM (A-law/u-law)       64 kbps (DS0), the standard transmission rate for voice
G.726      ADPCM                   16, 24, 32, 40 kbps
G.727      E-ADPCM                 16, 24, 32, 40 kbps
G.729      CS-ACELP                8 kbps
G.728      LD-CELP                 16 kbps
G.723.1    CELP                    6.3/5.3 kbps (variable)

A means to improve SNR

• Compression uses a coder and a decoder.

• One CODEC is called u-law.

• u-law runs at 8 kHz sampling and 8 bits per digitized sample.

• u-law is meant for voice.

Voice grade audio-Application

• voice over IP

• Voice ranges up to about 3.4 kHz

• Sample at 8 kHz; that should be plenty

• Quantize to 8 bits of data (about 48 dB SNR)

• Improve the SNR with compression

Voice Quality of Service (QoS) Requirements

Loss

Delay

Delay Variation (Jitter)

Avoiding The 3 Main QoS Challenges

The u-law codec

• X is a number whose range is 0..255

• The log to the base 2 of X (for X >= 1) is a number whose range is about 0..8

• u-law uses a scale factor (mu) that multiplies the input before the log is taken.

• log2(x) = log(x)/log(2)

• Mu-law takes the log to the base 1+mu.
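The logarithmic compression described above can be sketched with the continuous mu-law companding curve, y = sgn(x) log(1 + mu|x|) / log(1 + mu). This is an illustration under stated assumptions: the input is normalized to -1..1, mu = 255, and the class and method names are made up. It is not the bit-exact segmented 8-bit encoding used by telephone codecs.

```java
// Sketch: continuous mu-law compress/expand curve (not a bit-exact codec).
class MuLawCompander {
    static final double MU = 255.0; // assumed scale factor

    /** Compresses x in [-1, 1]: log to the base (1 + mu) of (1 + mu |x|). */
    static double compress(double x) {
        return Math.signum(x) * Math.log1p(MU * Math.abs(x)) / Math.log1p(MU);
    }

    /** Inverse of compress for y in [-1, 1]. */
    static double expand(double y) {
        return Math.signum(y) * (Math.pow(1.0 + MU, Math.abs(y)) - 1.0) / MU;
    }

    public static void main(String[] args) {
        for (double x : new double[] {0.01, 0.1, 0.5, 1.0}) {
            double y = compress(x);
            System.out.println(x + " -> " + y + " -> " + expand(y));
        }
    }
}
```

Small inputs are boosted before quantization, so quiet speech uses more of the 8-bit range, which is how the compression improves the SNR for low-level signals.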
