Speech Recognition Raymond Sastraputera


Post on 20-Jan-2016


Page 1: Speech Recognition Raymond Sastraputera.  Introduction  Frame/Buffer  Algorithm  Silent Detector  Estimate Pitch ◦ Correlation and Candidate ◦ Optimal

Speech Recognition
Raymond Sastraputera

Page 2

Introduction
Frame/Buffer
Algorithm
Silent Detector
Estimate Pitch
◦ Correlation and Candidate
◦ Optimal Candidate
◦ Buffer Delay
Added Bias
Test and Result
Conclusion

Page 3

Estimates the pitch of a speech signal

Written in C++

Page 4

Frame segments are shifted with no overlap

[Figure labels: Frame segment, Buffer]
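A minimal sketch of non-overlapping framing, assuming the buffer is a flat array of samples. The function name `splitIntoFrames` and the parameter `frameSize` are hypothetical, and dropping an incomplete trailing frame is a choice made only for this example.

```cpp
#include <cstddef>
#include <vector>

// Split a sample buffer into consecutive frames with no overlap:
// each frame starts exactly where the previous one ended.
// Assumption: trailing samples that do not fill a whole frame are dropped.
std::vector<std::vector<float>> splitIntoFrames(const std::vector<float>& buffer,
                                                std::size_t frameSize) {
    std::vector<std::vector<float>> frames;
    if (frameSize == 0) {
        return frames;  // nothing sensible to return for an empty frame size
    }
    for (std::size_t start = 0; start + frameSize <= buffer.size(); start += frameSize) {
        frames.emplace_back(buffer.begin() + start, buffer.begin() + start + frameSize);
    }
    return frames;
}
```

With a 7-sample buffer and `frameSize` of 3, this yields two frames and ignores the last sample.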

Page 5

Initial detection of silence

$$|\max(x)| + |\max(y)| + |\max(z)| + |\min(x)| + |\min(y)| + |\min(z)| < \text{Threshold Value (50 dB)}$$

[Figure labels: X, Y, Z]
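The silence test can be sketched as below. This assumes the frame counts as silent when the summed peak magnitudes fall below the threshold, and that the threshold is expressed in the same units as the samples. The names `peakSum` and `isSilent` are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// |max(v)| + |min(v)| for one segment; an empty segment contributes 0.
float peakSum(const std::vector<float>& v) {
    if (v.empty()) {
        return 0.0F;
    }
    auto mm = std::minmax_element(v.begin(), v.end());  // first = min, second = max
    return std::fabs(*mm.second) + std::fabs(*mm.first);
}

// Assumption: silence means the summed peaks of x, y and z are below the threshold.
bool isSilent(const std::vector<float>& x,
              const std::vector<float>& y,
              const std::vector<float>& z,
              float threshold = 50.0F) {
    return peakSum(x) + peakSum(y) + peakSum(z) < threshold;
}
```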

Page 6

Correlation of two vectors

$$P_{V_1,V_2} = \frac{\sum_j V_1(j)\,V_2(j)}{\sqrt{\sum_j V_1(j)^2 \,\sum_j V_2(j)^2}}$$
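A minimal sketch of a correlation between two vectors, assuming the normalized cross-correlation form (sum of products divided by the square root of the product of the two energies). The function name `correlation` is hypothetical, and returning 0 for a zero-energy vector is a choice made for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized cross-correlation of v1 and v2 over their common length.
// The result lies in [-1, 1]; 1 means the two vectors have the same shape.
double correlation(const std::vector<float>& v1, const std::vector<float>& v2) {
    const std::size_t n = std::min(v1.size(), v2.size());
    double cross = 0.0, energy1 = 0.0, energy2 = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        cross += static_cast<double>(v1[j]) * v2[j];
        energy1 += static_cast<double>(v1[j]) * v1[j];
        energy2 += static_cast<double>(v2[j]) * v2[j];
    }
    const double denom = std::sqrt(energy1 * energy2);
    return denom > 0.0 ? cross / denom : 0.0;  // assumption: 0 when undefined
}
```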

Page 7

Correlation P(x,y)

Calculate for different window sizes (nm)
◦ The window size will be the pitch value (in samples)
◦ A correlation value above the threshold becomes a candidate with score 1

[Figure labels: X, Y, Z; Vector x, Vector y; nm, nm]
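The search over window sizes might look like the sketch below. It assumes that vector x is the first nm samples of the frame and vector y the next nm samples, and that the correlation is a normalized cross-correlation. The names `segmentCorrelation` and `scoreOneCandidates` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalized correlation between frame[a, a+n) and frame[b, b+n).
// The caller must ensure both ranges lie inside the frame.
double segmentCorrelation(const std::vector<float>& frame,
                          std::size_t a, std::size_t b, std::size_t n) {
    double cross = 0.0, ea = 0.0, eb = 0.0;
    for (std::size_t j = 0; j < n; ++j) {
        cross += static_cast<double>(frame[a + j]) * frame[b + j];
        ea += static_cast<double>(frame[a + j]) * frame[a + j];
        eb += static_cast<double>(frame[b + j]) * frame[b + j];
    }
    const double denom = std::sqrt(ea * eb);
    return denom > 0.0 ? cross / denom : 0.0;
}

// Window sizes whose P(x,y) exceeds the threshold; these get score 1.
// Assumption: x = frame[0, n), y = frame[n, 2n).
std::vector<std::size_t> scoreOneCandidates(const std::vector<float>& frame,
                                            std::size_t nMin, std::size_t nMax,
                                            double threshold) {
    std::vector<std::size_t> candidates;
    for (std::size_t n = nMin; n <= nMax && 2 * n <= frame.size(); ++n) {
        if (n > 0 && segmentCorrelation(frame, 0, n, n) > threshold) {
            candidates.push_back(n);
        }
    }
    return candidates;
}
```

If the window size is the pitch period in samples, a 20000 Hz sample rate and a 50 to 400 Hz pitch range would correspond to window sizes from 50 to 400 samples.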

Page 8

Correlation P(y,z)

Calculate for different nm
◦ Only for window sizes among the candidates with score 1
◦ A correlation value above the threshold becomes a candidate with score 2

[Figure labels: X, Y, Z; Vector y, Vector z; nm, nm]

Page 9

Correlation Q(n,m)

Calculate for different nm
◦ nMAX is the maximum nm among the candidates

Optimal Candidate
◦ The current candidate becomes optimal if its Qnm*0.77 is higher than the preceding candidate's Qnm

[Figure labels: X, Y, Z; Vector x, Vector z; nMAX, nMAX, nm]
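One reading of the optimal-candidate rule is sketched below. It assumes candidates are visited in order, that "preceding candidate" means the candidate currently held as optimal, and that the Q(n,m) values have already been computed. The `Candidate` struct and `pickOptimal` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct Candidate {
    std::size_t windowSize;  // nm, in samples
    double q;                // Q(n,m) value for this window size
};

// Returns the index of the optimal candidate, or -1 if there are none.
// A later candidate replaces the current optimum only if its Q value,
// scaled by the multiplier, still exceeds the current optimum's Q value.
int pickOptimal(const std::vector<Candidate>& candidates, double multiplier = 0.77) {
    if (candidates.empty()) {
        return -1;
    }
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (candidates[i].q * multiplier > candidates[best].q) {
            best = i;
        }
    }
    return static_cast<int>(best);
}
```

The multiplier below 1 favors the earlier candidate: a later one must be clearly better, not just marginally better, to take over.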

Page 10

Candidate score 1: Correlation P(x,y)
◦ No candidate → silence
◦ Single candidate → compute P(y,z)
   Score stays at 1 → hold
   Score 2 → estimated pitch
◦ Multi candidate → compute P(y,z)

Candidate score 2: Correlation P(y,z)
◦ No candidate → compute Q(n,m) of candidate score 1
◦ Single candidate → estimated pitch
◦ Multi candidate → compute Q(n,m)

Optimal Pitch: Correlation Q(n,m)
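The decision flow on this slide can be read as two small decision functions, sketched below. This is one interpretation of the slide, not the author's code; the `Step` enum and the function names are hypothetical, and the inputs are the numbers of candidates holding score 1 and score 2.

```cpp
#include <cstddef>

enum class Step { Silence, ComputePyz, Hold, EstimatedPitch, ComputeQnm };

// Decision after P(x,y): no score-1 candidate means silence,
// otherwise P(y,z) is computed for the score-1 candidates.
Step afterPxy(std::size_t scoreOneCount) {
    return scoreOneCount == 0 ? Step::Silence : Step::ComputePyz;
}

// Decision after P(y,z).
Step afterPyz(std::size_t scoreOneCount, std::size_t scoreTwoCount) {
    if (scoreOneCount == 1) {
        // Single candidate: it either reached score 2 or stays on hold.
        return scoreTwoCount == 1 ? Step::EstimatedPitch : Step::Hold;
    }
    if (scoreTwoCount == 1) {
        return Step::EstimatedPitch;
    }
    // No score-2 candidate: Q(n,m) over the score-1 candidates.
    // Several score-2 candidates: Q(n,m) over the score-2 candidates.
    return Step::ComputeQnm;
}
```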

Page 11

Single candidate with score 2

From Q(n,m) of
◦ Candidate score 2
◦ Candidate score 1

On hold, and the next frame's estimated pitch is neither silence nor on hold.

Page 12

Delay the returned value of the estimated pitch
◦ Needed to limit the duration of on hold

Page 13

Conditions:
◦ The two previous frames are not silent
◦ The previous frame is not on hold
◦ The previous frame's pitch is between 5/8 and 7/4 of the preceding frame's pitch

Page 14

P(x,y) is doubled
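A sketch of the added bias, assuming it works as follows: when the conditions on the two previous frames hold, P(x,y) is doubled. The `FrameResult` struct and both function names are hypothetical, "preceding frame" is taken to mean the frame before the previous one, and treating the 5/8 and 7/4 bounds as inclusive is a guess.

```cpp
// Outcome stored for an already processed frame (hypothetical layout).
struct FrameResult {
    bool silent;
    bool onHold;
    float pitch;  // estimated pitch; only meaningful if not silent and not on hold
};

// True when all bias conditions hold for the two frames before the current one.
bool biasApplies(const FrameResult& prev, const FrameResult& prevPrev) {
    if (prev.silent || prevPrev.silent) return false;  // both must be non-silent
    if (prev.onHold) return false;                     // previous must not be on hold
    if (prevPrev.pitch <= 0.0F) return false;          // avoid dividing by zero
    const float ratio = prev.pitch / prevPrev.pitch;
    // Float literals on purpose: 5/8 and 7/4 as integer divisions would give 0 and 1.
    return ratio >= 5.0F / 8.0F && ratio <= 7.0F / 4.0F;
}

// P(x,y) is doubled when the bias applies, otherwise left unchanged.
double biasedPxy(double pxy, bool bias) {
    return bias ? 2.0 * pxy : pxy;
}
```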

Page 15

correlation_threshold_silent(0.88)
Qnm_optimal_multiplier(0.77)
sample_rate(20000.0F)
max_pitch(400)
min_pitch(50)
pitch_buffer_size(20)
bias_max_frequency(7/4)
bias_min_frequency(5/8)
silent_threshold(50.0F)
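The parameters above could be grouped as in the sketch below. The struct name `PitchParams`, the member types, and the two helper functions are assumptions, as is reading `max_pitch` and `min_pitch` as frequencies in Hz. The two bias ratios are written with float literals because `7/4` and `5/8` evaluated as C++ integer divisions would give 1 and 0.

```cpp
// Hypothetical grouping of the listed parameters; types are assumed.
struct PitchParams {
    float correlation_threshold_silent = 0.88F;
    float Qnm_optimal_multiplier = 0.77F;
    float sample_rate = 20000.0F;
    int max_pitch = 400;          // assumed to be in Hz
    int min_pitch = 50;           // assumed to be in Hz
    int pitch_buffer_size = 20;
    float bias_max_frequency = 7.0F / 4.0F;  // 1.75
    float bias_min_frequency = 5.0F / 8.0F;  // 0.625
    float silent_threshold = 50.0F;
};

// Smallest window size in samples: one period of the highest pitch.
int minWindowSize(const PitchParams& p) {
    return static_cast<int>(p.sample_rate / static_cast<float>(p.max_pitch));
}

// Largest window size in samples: one period of the lowest pitch.
int maxWindowSize(const PitchParams& p) {
    return static_cast<int>(p.sample_rate / static_cast<float>(p.min_pitch));
}
```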

Page 19

Some improvements could increase the performance of the pitch estimation:
◦ Reduce the search space
◦ Add the first-order derivative of the pitch
◦ Filter out outliers / noise

The current algorithm might not be fast enough to run in real time.

Page 20

Bagshaw, Paul Christopher. Automatic Prosodic Analysis for Computer Aided Pronunciation Teaching. The University of Edinburgh (1994).