speaker recognition system by abhishek mahajan
TRANSCRIPT
![Page 1: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/1.jpg)
SHREEJEE INSTITUTE OF TECHNOLOGY AND MANAGEMENT
Speaker Recognition
• Guided By:- Mr. Prakash Singh Panwar
• By:- Rajpal Singh Chouhan
• EC BRANCH 1ST YEAR
![Page 2: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/2.jpg)
What is Speaker Recognition?
Speaker Recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech signals.
Speaker Recognition = Speaker Identification + Speaker Verification
![Page 3: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/3.jpg)
Speaker Identification
• Determine the speaker identity.
• Selection between a set of known voices.
• The user does not claim an identity.
Whose voice is this?
![Page 4: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/4.jpg)
Speaker Verification
• Synonyms: authentication, detection.
• User claims an identity.
• System task: Accept or reject identity claim.
Is this Ahmad’s voice?
![Page 5: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/5.jpg)
Model of Speaker Recognizer
Fig. 1: Simple model of a speaker recognizer.
(Figure labels: "Hello, Mr. John"; "U Permitted to Access".)
![Page 6: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/6.jpg)
The Structure of Speaker Recognizer
Figure 2: Functional scheme of an ASR system.
(Figure blocks: Feature Extraction → Feature Vector; Training Mode: Speaker Modeling; Recognition: Classification → Decision Logic → Speaker #ID, e.g. Speaker_1.)
![Page 7: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/7.jpg)
Speech Signal Analysis: Feature Extraction
• The aim is to extract voice features that distinguish the different phonemes of a language.
![Page 8: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/8.jpg)
MFCC extraction
Speech signal x(n) → Pre-emphasis → x'(n) → Window → x_t(n) → DFT → X_t(k) → Mel filter banks → Y_t(m) → Log(|·|²) → IDFT → MFCC y_t^(m)(k)
MFCC stands for Mel-frequency cepstral coefficients. Together they form a representation of the short-term power spectrum of a sound and are widely used in audio processing.
The MFCCs are the amplitudes of the spectrum produced by the final transform of the log Mel filter-bank energies.
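The chain on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenters' implementation: the frame length (200 samples), hop (80 samples), FFT size, filter count, number of coefficients and pre-emphasis factor 0.97 are all assumed values, and the final IDFT step is realized as a DCT-II, which is a common substitution because the log Mel energies are real.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):      # rising edge
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge, peak of 1 at center
            fbank[m - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80, n_fft=256,
         n_filters=20, n_ceps=12, pre_emph=0.97):
    # Pre-emphasis: x'(n) = x(n) - a * x(n - 1)
    emphasized = np.append(signal[0], signal[1:] - pre_emph * signal[:-1])
    # Framing and Hamming window: x_t(n)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # DFT and power spectrum |X_t(k)|^2 (frames are zero-padded to n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Mel filter-bank energies Y_t(m), then log
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Final transform, realized here as a DCT-II (assumption, see lead-in)
    m = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (m + 0.5) / n_filters)
    return log_energies @ basis.T          # one row of coefficients per frame

# Usage: one second of a 440 Hz tone sampled at 8 kHz
t = np.arange(8000) / 8000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
print(features.shape)
```

Each row of `features` is the feature vector for one frame of speech.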
![Page 9: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/9.jpg)
(Figure panels:)
• Speech waveform of a phoneme “\ae”
• After pre-emphasis and Hamming windowing
• Power spectrum
• MFCC
![Page 10: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/10.jpg)
Speech Signal to Feature Vector
(Figure: a speech signal converted into a feature vector of numeric values.)
![Page 11: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/11.jpg)
Vector Quantization (VQ)
Aim of VQ:
• Representation of large amounts of data by (few) prototype vectors.
• Example: identification and grouping in clusters of similar data.
• Assignment of a feature vector to the closest prototype w (similarity or distance measure, e.g. Euclidean distance).
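The two ideas on this slide, building a few prototype vectors and assigning each feature vector to its closest prototype, can be sketched as follows. The slide does not say how the prototypes are trained, so plain k-means is used here as an assumption; the function names and toy data are hypothetical.

```python
import numpy as np

def nearest_prototype(vectors, codebook):
    """Return, per vector, the index of and Euclidean distance to its closest prototype."""
    dists = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1), dists.min(axis=1)

def train_codebook(vectors, n_prototypes, n_iter=20, seed=0):
    """Plain k-means: start from random data points, then alternate assign and update."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(vectors), n_prototypes, replace=False)
    codebook = vectors[start].astype(float)
    for _ in range(n_iter):
        labels, _ = nearest_prototype(vectors, codebook)
        for j in range(n_prototypes):
            members = vectors[labels == j]
            if len(members) > 0:           # keep the old prototype if a cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook

# Usage: eight 2-D "feature vectors" forming two obvious clusters
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                 [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
codebook = train_codebook(data, n_prototypes=2)
labels, distances = nearest_prototype(data, codebook)
print(codebook)
print(labels)
```

The eight vectors end up represented by two prototypes, which is the data reduction the slide describes.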
![Page 12: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/12.jpg)
Database Creation Process
(Figure labels: Database with entries Speaker #1, Speaker #2, Speaker #3; "Hello, Speaker #1"; "Hello, Speaker #2".)
![Page 13: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/13.jpg)
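Speaker Identification
(Figure: an unknown voice, Speaker #?, is compared against database entries #1, #2, #3; the result shown is Speaker 1 with the value 5.94, so the output is Speaker #1.)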
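Identification as pictured here amounts to comparing the unknown voice against every database entry and picking the best match. A minimal sketch, assuming each entry is a VQ codebook and the match score is the average Euclidean distance to the nearest prototype (the slide shows a value of 5.94 but does not state how it is computed); all names and numbers below are hypothetical.

```python
import numpy as np

def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest prototype."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

def identify(features, codebooks):
    """Return the best-matching speaker and the score of every enrolled speaker."""
    scores = {name: average_distortion(features, cb) for name, cb in codebooks.items()}
    best = min(scores, key=scores.get)     # lowest distortion wins
    return best, scores

# Usage with toy 2-D codebooks for three enrolled speakers
codebooks = {
    "Speaker_1": np.array([[0.0, 0.0], [1.0, 1.0]]),
    "Speaker_2": np.array([[10.0, 10.0], [11.0, 11.0]]),
    "Speaker_3": np.array([[-10.0, 5.0], [-9.0, 6.0]]),
}
unknown = np.array([[0.2, 0.1], [0.9, 1.1]])
speaker, scores = identify(unknown, codebooks)
print(speaker, scores)
```

No identity is claimed by the user; the system simply returns the closest enrolled speaker.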
![Page 14: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/14.jpg)
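Speaker Verification
(Figure: a voice claiming to be Speaker #1 is compared against the database entries #1, #2, #3; the result shown is Speaker 1 with the value 5.94, and the decision is Accept.)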
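Verification differs from identification in that only the claimed speaker's model is consulted and the outcome is accept or reject. A minimal sketch, assuming the score is an average VQ distortion and that a claim is accepted when the score is at or below a threshold; the threshold value and the toy data are hypothetical, since the slide gives no decision rule.

```python
import numpy as np

def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest prototype."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return float(dists.min(axis=1).mean())

def verify(features, claimed_codebook, threshold):
    """Accept the identity claim if the distortion is at or below the threshold."""
    score = average_distortion(features, claimed_codebook)
    return score <= threshold, score

# Usage: codebook of the claimed speaker, one genuine and one impostor utterance
claimed = np.array([[0.0, 0.0], [1.0, 1.0]])
genuine = np.array([[0.1, 0.0], [0.9, 1.0]])
impostor = np.array([[5.0, 5.0], [6.0, 4.0]])

print(verify(genuine, claimed, threshold=0.5))
print(verify(impostor, claimed, threshold=0.5))
```

In practice the threshold trades false acceptances against false rejections and would be tuned on held-out recordings.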
![Page 15: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/15.jpg)
Database Creation Condition
Table 1: Database description.

| Parameter | Characteristics |
|---|---|
| Language | Bangla |
| No. of speakers | 5 |
| Speech type | Sentence reading |
| Recording condition | A normal room condition |
| Audio length | 60-90 seconds |
| Audio type | Stereo |
| Sample format | 16-bit PCM |
| Sampling frequency | 8 kHz |
| Bit rate | 1411 kbps |
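For uncompressed PCM audio, the bit rate follows from the other parameters: sampling frequency × bits per sample × number of channels. The short sketch below applies that relation; the function name is hypothetical. With the values as listed (8 kHz, 16-bit, stereo) it gives 256 kbps, whereas 1411 kbps corresponds to 44.1 kHz at 16-bit stereo. Which entry reflects the actual recordings cannot be determined from the slide.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Uncompressed PCM bit rate in kilobits per second (1 kbps = 1000 bit/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0

print(pcm_bit_rate_kbps(8000, 16, 2))     # 256.0  (values as listed in the table)
print(pcm_bit_rate_kbps(44100, 16, 2))    # 1411.2 (44.1 kHz, for comparison)
```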
![Page 16: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/16.jpg)
Speaker Recognition Result
Table 3: Test result for speaker recognition system.

| Speaker | No. of inputs | Correct | Incorrect | Accuracy |
|---|---|---|---|---|
| Speaker_1 | 5 | 5 | 0 | 100% |
| Speaker_2 | 9 | 8 | 1 | 88.89% |
| Speaker_3 | 6 | 6 | 0 | 100% |
| Speaker_3 | 12 | 11 | 1 | 91.67% |
| Speaker_4 | 8 | 8 | 0 | 100% |
| Speaker_5 | 10 | 10 | 0 | 100% |
| Total | 50 | 48 | 2 | 96% |
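The accuracy column is simply correct / inputs × 100. As a quick check, the sketch below recomputes it from the counts in the table; the row labels are copied as they appear on the slide, including the repeated Speaker_3 label.

```python
# (label, number of inputs, number correct), as listed in the results table
rows = [
    ("Speaker_1", 5, 5),
    ("Speaker_2", 9, 8),
    ("Speaker_3", 6, 6),
    ("Speaker_3", 12, 11),
    ("Speaker_4", 8, 8),
    ("Speaker_5", 10, 10),
]

def accuracy(n_inputs, n_correct):
    """Percentage of correctly recognized inputs, rounded to two decimals."""
    return round(100.0 * n_correct / n_inputs, 2)

for label, n_inputs, n_correct in rows:
    print(label, n_inputs, n_correct, n_inputs - n_correct, accuracy(n_inputs, n_correct))

total_inputs = sum(r[1] for r in rows)
total_correct = sum(r[2] for r in rows)
overall = accuracy(total_inputs, total_correct)
print("Total", total_inputs, total_correct, total_inputs - total_correct, overall)
```

The totals come out to 50 inputs, 48 correct and 2 incorrect, for an overall accuracy of 96%.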
![Page 17: Speaker recognition system by abhishek mahajan](https://reader033.vdocument.in/reader033/viewer/2022042723/5885ff4a1a28ab0a3f8b531b/html5/thumbnails/17.jpg)
Applications
• Transaction authentication
  – Toll fraud prevention
  – Telephone credit card purchases
  – Telephone brokerage (e.g., stock trading)
• Access control
  – Physical facilities
  – Computers and data networks
• Information retrieval
  – Customer information for call centers
  – Audio indexing (speech skimming device)
• Forensics
  – Voice sample matching