@let@token forecasting in signal processing · forecasting in signal processing s. umesh department...

23
Forecasting in Signal Processing S. Umesh Department of Electrical Engineering Indian Institute of Technology Kanpur Kanpur - 208016, INDIA 17 th March, 2008 S. Umesh

Upload: others

Post on 26-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Forecasting in Signal Processing

S. Umesh

Department of Electrical Engineering

Indian Institute of Technology Kanpur

Kanpur - 208016, INDIA

17th March, 2008

S. Umesh

What is Forecasting?

To estimate or predict in advance thevalue or occurence of an event, usually byanalysis of previous data.

Trained Model

PredictorTest Data

Train Data

PredictedValue

Forecasting

Train: Given Historical data, develop and train a

mathematical model that explains the historical data very well.

Validate: Optimize or test this model to determine how well

it forecasts some of this historical data which is unobserved by

the model, but for which we know the ground truth.

Test: Finally, use this model to forecast future values.

The parameters of the model may change as it observes newdata to better forecast future data, i.e. model may vary withtime.

S. Umesh

Linear Prediction

Given a sequence of data samples x [1], . . . x [N − 1],predict Nth sample, x [N], based on previous P samples.

x[N ]?

N-1 N

· · ·

n

x[n]

- x [N] =∑P

k=1 akx [n − k]

- A P th order Linear Predictor.

Example: Given x [1], . . . x [20] & P = 8, the prediction equations are:

x [9] = a1x [8] + a2x [7] + . . . + a8x [1]

x [10] = a1x [9] + . . . . . . . . . . . . + a8x [2]

. . .

x [17] = a1x [16] + . . . . . . . . . . . . + a8x [9]

How do we determine a1, . . . a8?

S. Umesh

Linear Prediction

Prediction error for nth sample is e[n] = x [n] − x [n]

Estimate model parameters ai , for i = 1, . . . , 8by minimizing

∑16n=9 e2[n] =

∑17n=9(x [n] − x [n])2.

Let a1 . . . a8 be the estimated parameters of the LP model.Validation: Using the model, we can check or test how well it predicts

x [18] = a1x [16] + . . . + a8x [9]

x [19] = a1x [17] + . . . + a8x [10]

If the prediction is good then we can use this model with confidenceto estimate the unseen data x [21], x [22] etc.

Applications: Stock price prediction, speech coding, Forex rate

S. Umesh

Linear Predictive Coding

Linear Prediction is used for compression of speech signal.

Forms the basis for mobile telephony and Voice Over IP (e.g. Skype).

SpeechVocal Cord

Glottis Vocal tract,

h[n]g[n]

Tongue,Lip and Jaw

s[n]

FilterSource (excitation)

Vocal-tract shape determines sound produced

Linear-Prediction of speech s[n] (with P = 8 to P = 12)⇒ good model for characterizing vocal-tract shape

- ak (LP coefficients) model vocal tract-shape (and hence sound).

The error in modeling represents the glottis excitation – residual.

Model parameter updated every 10 msec, to keep up with changes invocal-tract shape (or the sound produced).

S. Umesh

Linear Predictive Coding (Continued)

For data that is 800 sample long corresponding to 10 ms of speech:

Need only 10-12 parameters + parameter to represent the residual.Residual :e[n] = s[n] − s[n] = s[n] −

∑p

k=1 aks[n − k ]

Decoder:

Using the residual, e[n] and LPC coefficients, ak , we predict s[n]

Prediction at Receiver : s[n] = e[n] +

p∑

k=1

aks[n − k ]

Perfect prediction if residual, e[n], is exactly transmitted but stillrequires 800 samples! – thankfully less dynamic range.

In practice e[n] is generated at decoder (and is not same as e[n]) –Code Excitation Linear Prediction (CELP) [ Atal, ICASSP’84 ]

Other applications of LPC: System Identification, Deconvolution

S. Umesh

Auto-Regressive Estimation

Spectral Estimation: dominant “lines” or frequency in data.

Sxx(f ) =∑

k=−∞rxx [k]e−j2πfk FourierTrans.

⇐⇒ rxx [k] for −∞ < k < ∞

Given finite length data: x [0], x [1], . . . x [N − 1],

rxx [k] =

{1N

∑N−1−kn=0 x∗[n]x [n + k], for k = 0, 1, 2 . . . P − 1

r∗xx [−k], k = −(P − 1),−(P − 2), . . . ,−1

Poor estimation: rxx [k] is assumed zero outside [−(N − 1),N − 1]

Correlation Extrapolation:

rxx [k] =

{rxx [k] for 0 ≤ k ≤ P

−∑P

l=1 arxx [k − l ] , k > P

S. Umesh

Auto-Regressive Estimation

S. Umesh

Language Modeling(LM) (N-gram)

In LPC, given previous P samples, we predict next sample x [n].

Given a sequence of P words, can we predict the next word?

Example:

S. Umesh

Language Modeling(LM) (N-gram)

In LPC, given previous P samples, we predict next sample x [n].

Given a sequence of P words, can we predict the next word?

Example:I want to make a collect..............?

S. Umesh

Language Modeling(LM) (N-gram)

In LPC, given previous P samples, we predict next sample x [n].

Given a sequence of P words, can we predict the next word?

Example:I want to make a collect..............?...........Call ⇒ most Probable

S. Umesh

Language Modeling(LM) (N-gram)

In LPC, given previous P samples, we predict next sample x [n].

Given a sequence of P words, can we predict the next word?

Example:I want to make a collect..............?...........Call ⇒ most Probable...........Telephone Call

S. Umesh

Language Modeling(LM) (N-gram)

In LPC, given previous P samples, we predict next sample x [n].

Given a sequence of P words, can we predict the next word?

Example:I want to make a collect..............?...........Call ⇒ most Probable...........Telephone Call...........Person-to-person Call

S. Umesh

Language Modeling(LM) (N-gram)

In LPC, given previous P samples, we predict next sample x [n].

Given a sequence of P words, can we predict the next word?

Example:I want to make a collect..............?...........Call ⇒ most Probable...........Telephone Call...........Person-to-person Call...........International call

S. Umesh

Language Modeling(LM) (N-gram)

In LPC, given previous P samples, we predict next sample x [n].

Given a sequence of P words, can we predict the next word?

Example:I want to make a collect..............?...........Call ⇒ most Probable...........Telephone Call...........Person-to-person Call...........International call

Many Uses: Automatic Speech Recognition, Search Engines etc.

Example: To Correct “real-word” spelling mistake based on context.

They are leaving in 15 minuets︸ ︷︷ ︸

15 minutes

to go home.

Or, They signed the Piece Treaty (“Peace”)

S. Umesh

Language Modeling(LM) (N-gram)

In Automatic Speech Recognition⇒ we may get many competing sentences (hypothesis) as output.

LM helps increase score of “grammatically correct sentences.”

In Word Prediction

Use Conditional Probability as measure unlike Error in linear predictionUse known data to compute

P(wi |wi−1, wi−2, .....) =P(wi , wi−1, ....)

P(wi−1, .....)

Predict the next word as the one with highest probability.

However, distant words have little influence on the current word.

Therefore normally P(wi |wi−1,wi−2, .....,wi−N ) depends onlyprevious N words (finite memoryness), N = 1, 2, · · ·

S. Umesh

Language Modeling(LM) (N-gram)

Markov Property: p(wi |wi−1, wi−2, . . . w0) = p(wi |wi−1, wi−2 . . . , wi−N)

First-order Markov (N = 1) depends only on previous word or state.Example: P(“That cat sat on mat”) = P(“mat”|“on sat cat That”) ·

P(“on”|“sat cat That”)·P(“sat”|“cat That”)·P(“cat|“That”)·P(“That”) =

P(“mat”|“on”) ·P(“on”|“sat”) ·P(“sat”|“cat”) ·P(“cat|“That”) ·P(“That”)

Probability of observing {Sunny, Cloudy, Cloudy, Rainy, Sunny} is:Prob(Sunny) ·Prob(Cloudy |Sunny) ·Prob(Cloudy |Cloudy) ·Prob(Rainy |Cloudy) ·

Prob(Sunny |Rainy) = (0.333) · (0.25) · (0.125) · (0.375) · (0.125)

S. Umesh

Hidden Markov Model (HMM)

Markovian: Helps finding the probability of a word sequence, P(W )

However, in ASR, the word sequence is Unknown (hidden)We observe only Speech Signal corresponding to word sequence.

Automatic Speech Recognition: Given the speech signal, can weuncover the underlying (most likely) word sequence?

Use Hidden Markov Models (HMMs).

S. Umesh

Speech Signal Processing

Synthesis

Speech Analysis

Large Vocabulary Small Vocabulary

Speaker IndependentSpeaker Dependent

RecognitionCompression

Isolated Word Continuous Speech

IBM Via VoiceDragon Naturally Speaking

Inter Disciplinary Nature

ASRSearch

Perception

&

Psychology

Signal

Processing

Phonetics

&

Linguistics

• Dynamic Prog• Viterbi Search

• Decision Trees

•Estimation

•Statistical Model, HMM•Probability Theory

•Feature Extraction

•Spectrum Estimation•Filter-Banks

Statistics

•Grammar•Lexicon

Understanding Ear Processing for better Features

S. Umesh

Overview of Speech Recognition

6000 6500 7000 7500 8000

−0.15

−0.1

−0.05

0

0.05

0.1

acoustic- of feature vectorsSequence

ExtractionFeatureSpeech

Signal Pattern ClassifierText

AcousticModel

LanguageModel Dictionary

“That cat sat on mat”

o1, o2, . . . , oT

Observations: O = {o1, o2, o3, · · · , oT }

Word sequences: W = {w1,w2,w3, · · · ,wn}

Find most likely word-sequence, W , given speech data :

W = arg maxw∈N

P(W |O)

Using Bayes’ rule:

W = arg maxw∈N

P(W |O) = arg maxw∈N

AcousticModel︷ ︸︸ ︷

P(O|W )

LanguageModel︷ ︸︸ ︷

P(W )

P(O)

S. Umesh

Issues in Developing ASR Systems

Many sources of variabilities!

Type of Channel

Microphone: 60 to 20000HzTelephone: 300 to 3400 Hz

Noise: Environmental noiseaffects ASR performance.

Speaker Characteristics

Differences in vocal tractlength.Prosody (durations, pitch)Speaking Rate

Figure: Same sound spoken by differentspeakers have different spectra.

0 1000 2000 3000 4000 5000 6000 7000 80000

0.5

1

1.5

2

2.5x 10

9 Vowel /eh/

Frequency(Hz)

Mag

nitu

de

Female SpeakerMale Speaker

Formants

Speaker-Independent Model is notcompact, due to speaker relatedvariabilities in spectra.

Speaker Independent models have applications in Railway EnquirySystem, Direct Assistance etc.

S. Umesh

Affine Model for Spectra (Bharath & Umesh, IEEE-ICASSP’04)∗

Spectra for same sound: SA(f ) = SB(αABf + κ (αAB − 1)

Gives better fit and recognition results than existing models

More importantly, using transformation: ν = log(1 + fκ),

SA(ν) = SA(f = κ (eν − 1)) = SB

(e log αAB+ν − 1

))= SB(ν+log αAB) .

i.e., in the “ν” they are just translated version of one another

Interestingly, it is known that earanalyses frequency in the“ν-scale” and not in linearfrequency, f, (Hz) domain.

Tonotopic theory suggests thatspeaker differences manifest astranslations in basilar membrane

Our model provides a connectionbetween speech productionmodel and the hearingmechanism.

Our model seems to validate“tonotopic” theory.

∗ Voted top in its Review Category at IEEE-ICASSP’04

S. Umesh

Some possible Areas on which we would like to focus....

Speaker Normalization and Acoustic Modeling

Better models, computationally efficiency, Children Speech

Speaker Recognition & Verification

Speaker normalization ideas can be usedApplications in defense, speaker-segmentation, clustering.

Emotion detection (Stress, anger, sadness etc.. )

Useful in monitoring emotional health of military personnelUseful in routing call appropriately in call center.

Speech Recognition Systems for Indian Languages (ILs)

No annotated Speech database available for ILsLDC-IL consortium to generate annotated speech database.

Need manpower for transcription (phoneticians and linguists)Use this speech database to develop recognition systems for ILs.

S. Umesh