Automatic Speaker Recognition System Using Mel Frequency Cepstral Coefficients (MFCC) and Vector Quantization (VQ) Approach

Presented by:

Md. Abdullah-al-MAMUN


OUTLINE

• What is speaker recognition?
  – Speaker Identification
  – Speaker Verification
• The Structure of Speaker Recognizer
• Feature Extraction: MFCC
• Speech Signal to Vector Quantization (VQ)
• Database Creation Process
• Speaker Identification
• Speaker Verification
• Table: Speaker Recognition Result
• Applications
• Conclusion
• References

What is Speaker Recognition?

Speaker Recognition is the process of automatically recognizing who is speaking on the basis of individual information included in speech signals.

Speaker Recognition =

Speaker Identification, Speaker Verification

Speaker Identification

Whose voice is this?


Speaker Verification

• Synonyms: authentication, detection.
• User claims an identity.
• System task: accept or reject the identity claim.

Is this Ahmad’s voice?

Model of Speaker Recognizer

Figure 1: Simple model of a speaker recognizer.

[Figure labels: “Hello, Mr. John”; “You are permitted to access”]

The Structure of Speaker Recognizer

Figure 2: Functional scheme of an ASR system.

[Figure blocks: Feature Extraction, Feature Vector, Training Mode, Recognition, Speaker Modeling, Classification, Decision Logic, Speaker #ID, Speaker_1]

Speech Signal Analysis

Feature Extraction - The aim is to extract the voice features to distinguish different phonemes of a language.

[Figure: example feature values 515645465, 156156165, 156456454, 251561565]

MFCC extraction

Speech signal x(n) → Pre-emphasis → x’(n) → Window → x_t(n) → DFT → X_t(k) → Mel filter banks → Y_t(m) → Log(|·|²) → IDFT → MFCC y_t(m)(k)

MFCC stands for Mel-frequency cepstral coefficients, a representation of the short-term power spectrum of a sound used in audio processing.

The MFCCs are the amplitudes of the spectrum that results from the final inverse transform of the log Mel filter-bank outputs.
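The extraction chain (pre-emphasis, windowing, DFT, Mel filter banks, log, inverse transform) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the presenter's implementation: the pre-emphasis factor 0.97, the 25 ms frame with 10 ms hop, 20 Mel filters, 12 coefficients, and the Mel formula 2595·log10(1 + f/700) are common textbook choices that the slides do not specify, and a DCT-II stands in for the IDFT step, as is usual in practice. All function names are hypothetical.

```python
import cmath
import math

def hz_to_mel(f_hz):
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def pre_emphasis(x, alpha=0.97):
    # x'(n) = x(n) - alpha * x(n-1); alpha is an assumed value
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def hamming(frame):
    size = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1)))
            for n, s in enumerate(frame)]

def power_spectrum(frame):
    # |X_t(k)|^2 via a direct DFT (slow, but needs no external library)
    size = len(frame)
    return [abs(sum(s * cmath.exp(-2j * math.pi * k * n / size)
                    for n, s in enumerate(frame))) ** 2
            for k in range(size // 2 + 1)]

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the Mel scale from 0 Hz to Nyquist
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    edges = [mel_to_hz(m) * n_fft / sample_rate for m in mel_points]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values, n_out):
    size = len(values)
    return [sum(v * math.cos(math.pi * k * (n + 0.5) / size)
                for n, v in enumerate(values))
            for k in range(n_out)]

def mfcc(signal, sample_rate=8000, frame_len=200, hop=80,
         n_filters=20, n_ceps=12):
    emphasized = pre_emphasis(signal)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = hamming(emphasized[start:start + frame_len])
        power = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(row, power)) for row in bank]
        log_energies = [math.log(max(e, 1e-12)) for e in energies]
        # Drop coefficient 0 (overall energy); keep coefficients 1..n_ceps
        features.append(dct2(log_energies, n_ceps + 1)[1:])
    return features

# Usage: 0.1 s of a 440 Hz test tone sampled at 8 kHz
tone = [math.sin(2.0 * math.pi * 440.0 * n / 8000.0) for n in range(800)]
vectors = mfcc(tone)
print(len(vectors), len(vectors[0]))  # 8 12
```

Each frame yields one 12-dimensional feature vector; these vectors are what the later vector quantization stage operates on.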

Explanatory Example

[Figure panels: speech waveform of a phoneme “\ae”; after pre-emphasis and Hamming windowing; power spectrum; MFCC]

Speech Signal to Feature Vector

[Figure: example feature values 515645465, 156156165, 156456454, 251561565]

Feature Vector to Classification

Vector Quantization (VQ)

Vector Quantization (VQ)

Aim of VQ: representation of large amounts of data by a few prototype vectors.

Example: identification and grouping of similar data into clusters.

Each feature vector is assigned to the closest prototype w, using a similarity or distance measure (e.g. Euclidean distance).
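The assignment step described above can be sketched as follows. The prototype values and feature vectors are made-up two-dimensional numbers chosen only for illustration, and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def quantize(vector, codebook):
    """Return (index, distance) of the prototype closest to the vector."""
    distances = [euclidean(vector, w) for w in codebook]
    best = min(range(len(codebook)), key=distances.__getitem__)
    return best, distances[best]

# Hypothetical prototypes w and feature vectors
codebook = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
features = [(0.5, -0.2), (4.0, 6.0), (9.0, 1.0), (5.5, 4.5)]

labels = [quantize(v, codebook)[0] for v in features]
print(labels)  # [0, 1, 2, 1]
```

Four feature vectors are thus represented by three prototypes, which is the data reduction that VQ aims for.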

Database Creation Process

[Figure labels: Database; Speaker #1, Speaker #2, Speaker #3; “Hello, Speaker #1”; “Hello, Speaker #2”]
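The slides do not say how each speaker's entry in the database is built. A common approach with VQ is to train one codebook of prototype vectors per speaker from that speaker's feature vectors. The sketch below assumes plain k-means clustering with a deterministic initialization (LBG is a frequent alternative); the speaker labels, codebook size, and two-dimensional training vectors are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train_codebook(vectors, size, iterations=20):
    # Deterministic start: pick evenly spaced training vectors as prototypes
    step = len(vectors) / size
    codebook = [list(vectors[int(i * step)]) for i in range(size)]
    for _ in range(iterations):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda i: sq_dist(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous prototype
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook

def build_database(training_data, size=2):
    # One codebook per enrolled speaker
    return {speaker: train_codebook(vecs, size)
            for speaker, vecs in training_data.items()}

# Hypothetical training features (stand-ins for MFCC vectors)
training_data = {
    "Speaker #1": [(0.0, 0.1), (0.2, 0.0), (1.0, 1.1), (1.2, 0.9)],
    "Speaker #2": [(5.0, 5.1), (5.2, 4.9), (8.0, 8.2), (8.1, 7.9)],
}
database = build_database(training_data)
for speaker, codebook in database.items():
    print(speaker, codebook)
```

In a real system the training vectors would be the MFCC vectors of each speaker's enrollment recordings, and the codebook would be considerably larger than two entries.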

Speaker Identification

[Figure labels: Database (#1, #2, #3); input: Speaker #?; output: Speaker #1]
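Identification with VQ is usually decided by distortion: the unknown utterance's feature vectors are quantized against every speaker's codebook, and the speaker whose codebook gives the lowest average distortion is selected. The following is a minimal sketch of that rule, assuming squared Euclidean distance; the codebooks and test vectors are invented for illustration and the function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def avg_distortion(vectors, codebook):
    # Mean distance from each vector to its nearest prototype
    return sum(min(sq_dist(v, w) for w in codebook) for v in vectors) / len(vectors)

def identify(vectors, database):
    scores = {spk: avg_distortion(vectors, cb) for spk, cb in database.items()}
    return min(scores, key=scores.get), scores

# Hypothetical codebooks for three enrolled speakers
database = {
    "#1": [(0.1, 0.05), (1.1, 1.0)],
    "#2": [(5.1, 5.0), (8.05, 8.05)],
    "#3": [(-3.0, 2.0), (-4.0, 4.0)],
}
unknown = [(0.0, 0.2), (1.0, 0.9), (0.3, 0.1)]

speaker, scores = identify(unknown, database)
print(speaker)  # #1
```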

Speaker Verification

[Figure labels: Database (#1, #2, #3); claimed identity: Speaker #1; output: Accept]
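For verification, only the claimed speaker's codebook is consulted, and the claim is accepted when the distortion stays below a threshold. The sketch below assumes squared Euclidean distance; the threshold value 0.5, the codebooks, and the test vectors are hypothetical, and in practice the threshold would be tuned on held-out recordings.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def avg_distortion(vectors, codebook):
    return sum(min(sq_dist(v, w) for w in codebook) for v in vectors) / len(vectors)

def verify(vectors, claimed, database, threshold):
    # Accept the identity claim only if the claimed codebook fits well enough
    distortion = avg_distortion(vectors, database[claimed])
    return distortion <= threshold, distortion

# Hypothetical codebooks and test utterance
database = {
    "#1": [(0.1, 0.05), (1.1, 1.0)],
    "#2": [(5.1, 5.0), (8.05, 8.05)],
}
utterance = [(0.0, 0.2), (1.0, 0.9), (0.3, 0.1)]

accepted, score = verify(utterance, "#1", database, threshold=0.5)
print("Accept" if accepted else "Reject")  # Accept

accepted, score = verify(utterance, "#2", database, threshold=0.5)
print("Accept" if accepted else "Reject")  # Reject
```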

Database Creation Condition

Table 1: Database description.

Parameter            Characteristics
Language             Bangla
No. of speakers      5
Speech type          Sentence reading
Recording condition  A normal room condition
Audio length         60-90 seconds
Audio type           Stereo
Sample format        16-bit PCM
Sampling frequency   8 kHz
Bit rate             1411 kbps
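For uncompressed PCM audio, the bit rate follows from the other parameters as sampling frequency × bits per sample × number of channels. The short check below applies that relation; the helper name is hypothetical. Taken at face value, 8 kHz 16-bit stereo gives 256 kbps, while 1411 kbps matches 44.1 kHz 16-bit stereo. The slides do not explain the difference.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    # Uncompressed PCM: bits per second = rate * sample width * channels
    return sample_rate_hz * bits_per_sample * channels / 1000.0

print(pcm_bit_rate_kbps(8000, 16, 2))   # 256.0
print(pcm_bit_rate_kbps(44100, 16, 2))  # 1411.2
```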

Speaker Recognition Result

Table 3: Test result for speaker recognition system.

Speaker        No. of input  Correct  Incorrect  Accuracy
Speaker_1      5             5        0          100%
Speaker_2      9             8        1          88.88%
Speaker_3      6             6        0          100%
Speaker_3      12            11       1          91.67%
Speaker_4      8             8        0          100%
Speaker_5      10            10       0          100%
Total Speaker  50            48       2          96%
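The accuracy column is the number of correct decisions divided by the number of inputs. The sketch below recomputes it from the counts in Table 3, with the rows copied as they appear in the table; the variable and function names are hypothetical.

```python
# (speaker label, number of inputs, number correct), as listed in Table 3
results = [
    ("Speaker_1", 5, 5),
    ("Speaker_2", 9, 8),
    ("Speaker_3", 6, 6),
    ("Speaker_3", 12, 11),
    ("Speaker_4", 8, 8),
    ("Speaker_5", 10, 10),
]

def accuracy(correct, total):
    return 100.0 * correct / total

for name, inputs, correct in results:
    print(f"{name}: {accuracy(correct, inputs):.2f}%")

total_inputs = sum(inputs for _, inputs, _ in results)
total_correct = sum(correct for _, _, correct in results)
overall = accuracy(total_correct, total_inputs)
print(f"Overall: {overall:.0f}%")  # Overall: 96%
```

The printed values are rounded to two decimals, so 8 of 9 appears as 88.89% where the table shows the truncated 88.88%.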

Applications

• Transaction authentication
  – Toll fraud prevention
  – Telephone credit card purchases
  – Telephone brokerage (e.g., stock trading)
• Access control
  – Physical facilities
  – Computers and data networks
• Information retrieval
  – Customer information for call centers
  – Audio indexing (speech skimming device)
• Forensics
  – Voice sample matching

Conclusions

• Achieving 100% accuracy is very difficult; our proposed system achieves 96% accuracy with limited resources (speakers and utterances).
• Avoid poor-quality microphones to get better accuracy.
• Training the recognizer will provide an even better experience.

Thank You
