@let@token forecasting in signal processing · forecasting in signal processing s. umesh department...
TRANSCRIPT
Forecasting in Signal Processing
S. Umesh
Department of Electrical Engineering
Indian Institute of Technology Kanpur
Kanpur - 208016, INDIA
17th March, 2008
S. Umesh
What is Forecasting?
To estimate or predict in advance thevalue or occurence of an event, usually byanalysis of previous data.
Trained Model
PredictorTest Data
Train Data
PredictedValue
Forecasting
Train: Given Historical data, develop and train a
mathematical model that explains the historical data very well.
Validate: Optimize or test this model to determine how well
it forecasts some of this historical data which is unobserved by
the model, but for which we know the ground truth.
Test: Finally, use this model to forecast future values.
The parameters of the model may change as it observes newdata to better forecast future data, i.e. model may vary withtime.
S. Umesh
Linear Prediction
Given a sequence of data samples x [1], . . . x [N − 1],predict Nth sample, x [N], based on previous P samples.
x[N ]?
N-1 N
· · ·
n
x[n]
- x [N] =∑P
k=1 akx [n − k]
- A P th order Linear Predictor.
Example: Given x [1], . . . x [20] & P = 8, the prediction equations are:
x [9] = a1x [8] + a2x [7] + . . . + a8x [1]
x [10] = a1x [9] + . . . . . . . . . . . . + a8x [2]
. . .
x [17] = a1x [16] + . . . . . . . . . . . . + a8x [9]
How do we determine a1, . . . a8?
S. Umesh
Linear Prediction
Prediction error for nth sample is e[n] = x [n] − x [n]
Estimate model parameters ai , for i = 1, . . . , 8by minimizing
∑16n=9 e2[n] =
∑17n=9(x [n] − x [n])2.
Let a1 . . . a8 be the estimated parameters of the LP model.Validation: Using the model, we can check or test how well it predicts
x [18] = a1x [16] + . . . + a8x [9]
x [19] = a1x [17] + . . . + a8x [10]
If the prediction is good then we can use this model with confidenceto estimate the unseen data x [21], x [22] etc.
Applications: Stock price prediction, speech coding, Forex rate
S. Umesh
Linear Predictive Coding
Linear Prediction is used for compression of speech signal.
Forms the basis for mobile telephony and Voice Over IP (e.g. Skype).
SpeechVocal Cord
Glottis Vocal tract,
h[n]g[n]
Tongue,Lip and Jaw
s[n]
FilterSource (excitation)
Vocal-tract shape determines sound produced
Linear-Prediction of speech s[n] (with P = 8 to P = 12)⇒ good model for characterizing vocal-tract shape
- ak (LP coefficients) model vocal tract-shape (and hence sound).
The error in modeling represents the glottis excitation – residual.
Model parameter updated every 10 msec, to keep up with changes invocal-tract shape (or the sound produced).
S. Umesh
Linear Predictive Coding (Continued)
For data that is 800 sample long corresponding to 10 ms of speech:
Need only 10-12 parameters + parameter to represent the residual.Residual :e[n] = s[n] − s[n] = s[n] −
∑p
k=1 aks[n − k ]
Decoder:
Using the residual, e[n] and LPC coefficients, ak , we predict s[n]
Prediction at Receiver : s[n] = e[n] +
p∑
k=1
aks[n − k ]
Perfect prediction if residual, e[n], is exactly transmitted but stillrequires 800 samples! – thankfully less dynamic range.
In practice e[n] is generated at decoder (and is not same as e[n]) –Code Excitation Linear Prediction (CELP) [ Atal, ICASSP’84 ]
Other applications of LPC: System Identification, Deconvolution
S. Umesh
Auto-Regressive Estimation
Spectral Estimation: dominant “lines” or frequency in data.
Sxx(f ) =∑
∞
k=−∞rxx [k]e−j2πfk FourierTrans.
⇐⇒ rxx [k] for −∞ < k < ∞
Given finite length data: x [0], x [1], . . . x [N − 1],
rxx [k] =
{1N
∑N−1−kn=0 x∗[n]x [n + k], for k = 0, 1, 2 . . . P − 1
r∗xx [−k], k = −(P − 1),−(P − 2), . . . ,−1
Poor estimation: rxx [k] is assumed zero outside [−(N − 1),N − 1]
Correlation Extrapolation:
rxx [k] =
{rxx [k] for 0 ≤ k ≤ P
−∑P
l=1 arxx [k − l ] , k > P
S. Umesh
Language Modeling(LM) (N-gram)
In LPC, given previous P samples, we predict next sample x [n].
Given a sequence of P words, can we predict the next word?
Example:
S. Umesh
Language Modeling(LM) (N-gram)
In LPC, given previous P samples, we predict next sample x [n].
Given a sequence of P words, can we predict the next word?
Example:I want to make a collect..............?
S. Umesh
Language Modeling(LM) (N-gram)
In LPC, given previous P samples, we predict next sample x [n].
Given a sequence of P words, can we predict the next word?
Example:I want to make a collect..............?...........Call ⇒ most Probable
S. Umesh
Language Modeling(LM) (N-gram)
In LPC, given previous P samples, we predict next sample x [n].
Given a sequence of P words, can we predict the next word?
Example:I want to make a collect..............?...........Call ⇒ most Probable...........Telephone Call
S. Umesh
Language Modeling(LM) (N-gram)
In LPC, given previous P samples, we predict next sample x [n].
Given a sequence of P words, can we predict the next word?
Example:I want to make a collect..............?...........Call ⇒ most Probable...........Telephone Call...........Person-to-person Call
S. Umesh
Language Modeling(LM) (N-gram)
In LPC, given previous P samples, we predict next sample x [n].
Given a sequence of P words, can we predict the next word?
Example:I want to make a collect..............?...........Call ⇒ most Probable...........Telephone Call...........Person-to-person Call...........International call
S. Umesh
Language Modeling(LM) (N-gram)
In LPC, given previous P samples, we predict next sample x [n].
Given a sequence of P words, can we predict the next word?
Example:I want to make a collect..............?...........Call ⇒ most Probable...........Telephone Call...........Person-to-person Call...........International call
Many Uses: Automatic Speech Recognition, Search Engines etc.
Example: To Correct “real-word” spelling mistake based on context.
They are leaving in 15 minuets︸ ︷︷ ︸
15 minutes
to go home.
Or, They signed the Piece Treaty (“Peace”)
S. Umesh
Language Modeling(LM) (N-gram)
In Automatic Speech Recognition⇒ we may get many competing sentences (hypothesis) as output.
LM helps increase score of “grammatically correct sentences.”
In Word Prediction
Use Conditional Probability as measure unlike Error in linear predictionUse known data to compute
P(wi |wi−1, wi−2, .....) =P(wi , wi−1, ....)
P(wi−1, .....)
Predict the next word as the one with highest probability.
However, distant words have little influence on the current word.
Therefore normally P(wi |wi−1,wi−2, .....,wi−N ) depends onlyprevious N words (finite memoryness), N = 1, 2, · · ·
S. Umesh
Language Modeling(LM) (N-gram)
Markov Property: p(wi |wi−1, wi−2, . . . w0) = p(wi |wi−1, wi−2 . . . , wi−N)
First-order Markov (N = 1) depends only on previous word or state.Example: P(“That cat sat on mat”) = P(“mat”|“on sat cat That”) ·
P(“on”|“sat cat That”)·P(“sat”|“cat That”)·P(“cat|“That”)·P(“That”) =
P(“mat”|“on”) ·P(“on”|“sat”) ·P(“sat”|“cat”) ·P(“cat|“That”) ·P(“That”)
Probability of observing {Sunny, Cloudy, Cloudy, Rainy, Sunny} is:Prob(Sunny) ·Prob(Cloudy |Sunny) ·Prob(Cloudy |Cloudy) ·Prob(Rainy |Cloudy) ·
Prob(Sunny |Rainy) = (0.333) · (0.25) · (0.125) · (0.375) · (0.125)
S. Umesh
Hidden Markov Model (HMM)
Markovian: Helps finding the probability of a word sequence, P(W )
However, in ASR, the word sequence is Unknown (hidden)We observe only Speech Signal corresponding to word sequence.
Automatic Speech Recognition: Given the speech signal, can weuncover the underlying (most likely) word sequence?
Use Hidden Markov Models (HMMs).
S. Umesh
Speech Signal Processing
Synthesis
Speech Analysis
Large Vocabulary Small Vocabulary
Speaker IndependentSpeaker Dependent
RecognitionCompression
Isolated Word Continuous Speech
IBM Via VoiceDragon Naturally Speaking
Inter Disciplinary Nature
ASRSearch
Perception
&
Psychology
Signal
Processing
Phonetics
&
Linguistics
• Dynamic Prog• Viterbi Search
• Decision Trees
•Estimation
•Statistical Model, HMM•Probability Theory
•Feature Extraction
•Spectrum Estimation•Filter-Banks
Statistics
•Grammar•Lexicon
Understanding Ear Processing for better Features
S. Umesh
Overview of Speech Recognition
6000 6500 7000 7500 8000
−0.15
−0.1
−0.05
0
0.05
0.1
acoustic- of feature vectorsSequence
ExtractionFeatureSpeech
Signal Pattern ClassifierText
AcousticModel
LanguageModel Dictionary
“That cat sat on mat”
o1, o2, . . . , oT
Observations: O = {o1, o2, o3, · · · , oT }
Word sequences: W = {w1,w2,w3, · · · ,wn}
Find most likely word-sequence, W , given speech data :
W = arg maxw∈N
P(W |O)
Using Bayes’ rule:
W = arg maxw∈N
P(W |O) = arg maxw∈N
AcousticModel︷ ︸︸ ︷
P(O|W )
LanguageModel︷ ︸︸ ︷
P(W )
P(O)
S. Umesh
Issues in Developing ASR Systems
Many sources of variabilities!
Type of Channel
Microphone: 60 to 20000HzTelephone: 300 to 3400 Hz
Noise: Environmental noiseaffects ASR performance.
Speaker Characteristics
Differences in vocal tractlength.Prosody (durations, pitch)Speaking Rate
Figure: Same sound spoken by differentspeakers have different spectra.
0 1000 2000 3000 4000 5000 6000 7000 80000
0.5
1
1.5
2
2.5x 10
9 Vowel /eh/
Frequency(Hz)
Mag
nitu
de
Female SpeakerMale Speaker
Formants
Speaker-Independent Model is notcompact, due to speaker relatedvariabilities in spectra.
Speaker Independent models have applications in Railway EnquirySystem, Direct Assistance etc.
S. Umesh
Affine Model for Spectra (Bharath & Umesh, IEEE-ICASSP’04)∗
Spectra for same sound: SA(f ) = SB(αABf + κ (αAB − 1)
Gives better fit and recognition results than existing models
More importantly, using transformation: ν = log(1 + fκ),
SA(ν) = SA(f = κ (eν − 1)) = SB
(κ
(e log αAB+ν − 1
))= SB(ν+log αAB) .
i.e., in the “ν” they are just translated version of one another
Interestingly, it is known that earanalyses frequency in the“ν-scale” and not in linearfrequency, f, (Hz) domain.
Tonotopic theory suggests thatspeaker differences manifest astranslations in basilar membrane
Our model provides a connectionbetween speech productionmodel and the hearingmechanism.
Our model seems to validate“tonotopic” theory.
∗ Voted top in its Review Category at IEEE-ICASSP’04
S. Umesh
Some possible Areas on which we would like to focus....
Speaker Normalization and Acoustic Modeling
Better models, computationally efficiency, Children Speech
Speaker Recognition & Verification
Speaker normalization ideas can be usedApplications in defense, speaker-segmentation, clustering.
Emotion detection (Stress, anger, sadness etc.. )
Useful in monitoring emotional health of military personnelUseful in routing call appropriately in call center.
Speech Recognition Systems for Indian Languages (ILs)
No annotated Speech database available for ILsLDC-IL consortium to generate annotated speech database.
Need manpower for transcription (phoneticians and linguists)Use this speech database to develop recognition systems for ILs.
S. Umesh