
Speaker and Language Recognition: A Guided Safari

Doug Reynolds

2008 Odyssey Workshop

This work was sponsored by the Department of Defense under Air Force contract F19628-00-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Roadmap

• The odyssey from 1994 to 2008
• The scenic route through NIST speaker and language recognition evaluations
• The expedition into future territories

The Odyssey: Martigny, Switzerland – April 5-7, 1994

• First workshop focused solely on speaker recognition
  – Helped form working relationships among the international SID community
• 46 papers, 6 tutorials/keynotes
• 65 attendees
• Technologies: TD-HMMs, TI-GMMs, MLP, LVQ, RBF, DTW, LTA
• Corpora: home grown (digits, words, phrases, 10-30 speakers), YOHO, TIMIT, NTIMIT, KING, POLYPHONE, SWB1
• Very difficult to compare results
  – Varying corpora, experiment designs, measures of performance
• Large emphasis on text-dependent applications (telecoms)
  – Some papers on forensic SV (humans and machines)

The Odyssey: Avignon, France – April 20-23, 1998

• Focus on forensic and commercial applications
• 40 papers, 5 keynotes
• 78 participants
• Technologies: more emphasis on statistical approaches (HMM, GMM, AHS)
• Corpora: still a diverse, small set (less home-grown); more TIMIT and SPIDRE (SWB)
  – Europeans showing lead in common corpora/experiments (POLYCOST, VERIVOX, CAVE)
• Increasing buzz about dot-com speech/speaker companies
• Some lasting themes in talks
  – Doddington: getting to know the speaker
  – Champod: LRs as evidence in a Bayesian framework
• Some friction between the automatic speaker recognition community and the expert human speaker examiner community
  – ASR crowd pressed for measured error rates
  – Examiner crowd pressed for transparency and explanation in results

The Odyssey: Crete, Greece – June 18-22, 2001

• Start of the official "Odyssey" workshop series
  – Originally set for Tel Aviv, Israel
• 40 papers, 3 keynotes
• 75 participants
• Technologies: more papers on new tasks (biometrics, diarization) and on practical issues (robustness, channel compensation, threshold setting)
• Corpora: many more papers using SRE corpora and experiment design
• Bayesian framework taking hold for forensic applications
• Several speaker verification companies (PerSay, Nuance, VoiceVault)

The Odyssey: Toledo, Spain – May 31 - June 3, 2004

• Co-occurrence with the NIST SRE 2004 workshop
• 61 papers, 4 keynotes
• 147 participants
• Technologies: GMM, SVM, NAP, LFA, high-level features, adaptation, audio-video, LID
• Corpora: SRE corpora/protocol dominant for text-independent telephone work; RT BNEWS data for diarization; TNO/NFI field forensic corpus
• Text-dependent work focusing more on user phrases (fewer digit strings)

The Odyssey: San Juan, Puerto Rico – June 28-30, 2006

• Co-occurrence with the NIST SRE 2006 workshop
  – Followed LRE 2005 in December
• 60 papers, 1 keynote
• 103 participants
• Technologies: GMM-SVM, NAP/LFA, GMM-MMI, high-level features, robustness
• Corpora: dominated by SRE and LRE corpora/protocol

The Odyssey: Stellenbosch, South Africa – January 21-24, 2008

• Expect to see continued trends in
  – Common corpora/evaluations
  – High-quality papers and novel topics
• More fish pictures …

Roadmap

• The odyssey from 1994 to 2008
• The scenic route through NIST speaker and language recognition evaluations
• The expedition into future territories

NIST Speaker/Language Recognition Evaluations

• Recurring NIST evaluations of speaker/language recognition technology
• Aim: provide a common paradigm for comparing technologies
• Focus: conversational telephone speech (text-independent)

[Diagram: the evaluation cycle. NIST (evaluation coordinator) and the Linguistic Data Consortium (data provider) enable comparison of technologies on a common task; technology developers evaluate and improve their systems; technology consumers supply application-domain parameters.]

NIST SRE/LRE: Pre-history (1992-1994)

• 1992-1993: Rutgers Summer Workshop; DARPA SID eval
  – 3 sites: Dragon (LVCSR), ITT (NN), MITLL (GMM)
  – Early SWB1; 1-4 conversations train; 24 targets; 111 target tests, 466 impostor tests; 10-60 s tests
  – Speaker-dependent ROC; intro of Swets normal-deviate plot (DET; a DET-curve sketch follows this slide); areas under ROC; Pd at Pf = 10%
  – Sun Sparc 10, 20 MB
• 1994: Informal LRE
  – 4 sites: OGI, MITLL, MIT, ITT
  – OGI 12-language corpus
• 1994: Martigny Workshop
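The DET plot mentioned above warps miss and false-alarm probabilities onto normal-deviate (probit) axes, so that systems with roughly Gaussian score distributions trace near-straight lines. Below is a minimal sketch of that warping, assuming Python with NumPy/SciPy/Matplotlib and hypothetical score arrays (target_scores, impostor_scores); it is illustrative only, not the original evaluation tooling.

# Minimal DET-curve sketch: sweep a threshold over detection scores and map
# P_miss / P_fa onto normal-deviate (probit) axes. Score arrays are synthetic.
import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

def det_points(target_scores, impostor_scores):
    """Return (P_miss, P_fa) at thresholds placed at every observed score."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    p_miss = np.array([(target_scores < t).mean() for t in thresholds])
    p_fa = np.array([(impostor_scores >= t).mean() for t in thresholds])
    return p_miss, p_fa

# Hypothetical scores: targets score higher than impostors on average.
rng = np.random.default_rng(0)
target_scores = rng.normal(2.0, 1.0, 1000)
impostor_scores = rng.normal(0.0, 1.0, 10000)

p_miss, p_fa = det_points(target_scores, impostor_scores)
eps = 1e-6  # keep the probit transform finite at 0 and 1
plt.plot(norm.ppf(np.clip(p_fa, eps, 1 - eps)),
         norm.ppf(np.clip(p_miss, eps, 1 - eps)))
plt.xlabel("False alarm probability (normal deviate)")
plt.ylabel("Miss probability (normal deviate)")
plt.title("DET curve (sketch)")
plt.show()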

NIST SRE/LRE: Formal Start (1995-1996)

SRE 1
• 6 sites: BBN (uni-Gauss), Dragon (LVCSR), Ensigma (ergodic HMM), INRS (phone HMMs), ITT (NN), MITLL (GMM o64, cohort)
• SWB1; 26 targets
• Train: 10s, 30s, 4 sessions. Test: 5s, 10s, 30s; separate target and impostor tests
• Area under ROC; Pd at fixed Pf; closed-set error
• Same/different phone effect; speaker-dependent ROC

SRE 2
• 11 sites: Ensigma, ITT, MITLL, SRI, CAIP, INRS, Dragon, BBN, LIMSI, AT&T, Sanders
• Broad phone HMM, LVCSR, VQ, adapted GMM, SVM, world/cohorts, hnorm, anchor models
• SWB1; 40 targets
• Train: 2 min / 1 session, 1-handset and 2-handset conditions. Test: 3s, 10s, 30s; separate target and impostor tests
• Pooled DET, DCF (a DCF sketch follows this slide)

LRE 1
• 4-5 sites
• PPRLM, GMM-CEP, syllabic models, fusion
• CallFriend; 12 languages, 3 dialects
• Test: 3s, 10s, 30s
• DET, DCF
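The DCF scored here is a prior-weighted combination of the miss and false-alarm rates at a single decision threshold; minDCF is its minimum over all thresholds. A minimal sketch follows, using the commonly cited SRE cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01) as an assumption; the official values are defined in each year's evaluation plan.

# Hedged sketch of the NIST detection cost function (DCF) and minDCF.
# The default cost parameters are assumptions (commonly cited SRE settings),
# not a statement of any particular evaluation plan.
import numpy as np

def dcf(p_miss, p_fa, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Prior-weighted detection cost at one operating point."""
    return c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)

def min_dcf(target_scores, impostor_scores, **costs):
    """Minimum DCF over thresholds placed at every observed score."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    best = np.inf
    for t in thresholds:
        p_miss = (target_scores < t).mean()
        p_fa = (impostor_scores >= t).mean()
        best = min(best, dcf(p_miss, p_fa, **costs))
    return best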

NIST SRE/LRE: Steady Progress (1997-1999)

Avignon Workshop (1998)

SRE 3 (1997)
• 8 sites
• Pitch features; handset mic detector/compensation using more dev data
• SWB2p1
• All speakers act as targets and impostors (current paradigm)
• Train: 2 min / 1 session, 1-handset and 2-handset conditions
• Test: 3s, 10s, 30s; no cross-sex trials; matched and mismatched test phone
• DET, DCF

SRE 4 (1998)
• 12 sites
• Phone sequences (BBN), sequence models (Dragon)
• SWB2p2
• Train: 2 min / 1 session, 2 sessions, all minus 2 sessions
• Test: 3s, 10s, 30s
• SN/DN and handset-type side knowledge
• Human performance (3s)

SRE 5 (1999)
• 13 sites
• T-norm (sketched below), system fusion
• SWB2p3
• Train: 2 min / 2 sessions
• Test: varying durations (0-15, 15-30, 30-45, >45 s), different number
• New tasks: 2-speaker test, speaker tracking
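T-norm, listed in the 1999 column above, normalizes each trial score by the statistics of the same test segment scored against a cohort of impostor models, which makes a single decision threshold behave more consistently across segments. A minimal sketch under that standard definition; score(model, segment) is a hypothetical stand-in for whatever classifier produces the raw scores.

# Hedged T-norm sketch: shift and scale the raw trial score by the mean/std
# of the same test segment scored against a cohort of impostor models.
# `score` is a hypothetical callable, e.g. a GMM log-likelihood-ratio scorer.
import numpy as np

def tnorm(raw_score, test_segment, cohort_models, score):
    cohort_scores = np.array([score(m, test_segment) for m in cohort_models])
    return (raw_score - cohort_scores.mean()) / (cohort_scores.std() + 1e-12)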

NIST SRE/LRE: New Directions (2000-2003)

Odyssey Workshop (2001); JHU SuperSID Workshop (2002)

SRE 6 (2000)
• 12 sites (first shark sighting)
• SMS
• SWB2p1/p2, AHUMADA
• Train: 2 min / 1 session. Test: variable, 0-60 s
• New tasks: 2-speaker train & test, N-speaker segmentation

SRE 7 (2001)
• 13 sites
• Per-frame SVM fusion, text-constrained GMM, word & phone N-grams
• SWB2p1/p2, AHUMADA, SWB2p4 (cell)
• Extended data task (SWB1)

SRE 8 (2002)
• 24 sites
• Feature mapping, high-level features, MLP fusion
• SWB2p5 (cell), SWB2p2/p3 (ext), FBI VoiceDB (multi-modal), BNEWS (seg), Meeting (seg)

SRE 9 (2003)
• 19 sites
• SVM-GLDS, phone SVM, NERFs
• SWB2p5 (cell), SWB2p2/p3 (ext)

LRE 2 (2003)
• 6 sites
• PPRLM, GMM-SDC, SVM-SDC, fusion
• CallFriend; 12 languages
• Test: 3s, 10s, 30s

NIST SRE/LRE: Current Period (2004-2008)

Odyssey Workshops (2004, 2006); 2008: Odyssey WS, SRE 13, JHU WS

SRE 10 (2004)
• 24 sites
• Large system fusion
• Mixer1
• Bilingual speakers

SRE 11 (2005)
• 27 sites
• LFA, SVM-MLLR
• Mixer2, MMSR
• Cross-channel microphones
• Calibrated LLRs (a calibration sketch follows this slide)

SRE 12 (2006)
• 36 sites
• SVM-GSV, spectral-only systems
• Mixer2+3, MMSR
• Bilingual, cross-channel
• Multi-site collaboration

LRE 3 (2005)
• 11 sites
• GMM-MMI, TRAPS/NN decoder, phone lattices, PPR-BinTree, PPR-SVM
• OHSU, Mixer1/2
• 7 languages, 3 dialects/accents

LRE 4 (2007)
• 21 sites
• SVM-GSV, h.o. n-grams, fLFA/fNAP
• Mixer5, OHSU
• 14 languages, 5 dialects/accents
• Calibrated LLRs
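The "calibrated LLRs" noted for the 2005-2007 evaluations refer to mapping raw detector scores into scores that can be interpreted as log-likelihood ratios. One common recipe, sketched below under that assumption, is an affine transform (scale and offset) fit by logistic regression on held-out target/non-target trials; it is not the specific calibration used by any particular site.

# Hedged sketch of affine score calibration toward log-likelihood ratios:
# fit s_cal = a*s + b with logistic regression on development trials.
# Note: a*s + b is the log posterior odds under the training prior; subtract
# the prior log odds to obtain a proper LLR.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_affine_calibration(dev_scores, dev_labels):
    """dev_labels: 1 for target trials, 0 for non-target trials."""
    lr = LogisticRegression()
    lr.fit(np.asarray(dev_scores, dtype=float).reshape(-1, 1),
           np.asarray(dev_labels, dtype=int))
    return float(lr.coef_[0][0]), float(lr.intercept_[0])

def apply_calibration(scores, a, b):
    return a * np.asarray(scores, dtype=float) + b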

NIST SRE: How are we doing?

[Figure: DCF (0-0.08) vs. evaluation year (1996-2006) for a sampling of SRE tasks: landline 1-speaker (2 min train / 30 sec test, and the earlier 40-target-speaker paradigm), cellular 1-speaker (2 min train / 30 sec test), landline and cellular/landline 2-speaker detection, Ahumada (Spanish), multimodal (FBI), cell/landline 1-speaker 8-conv train / 1-conv test, cross-microphone (1-conv telephone train / 1-conv microphone test), and cross-language. Corpora along the timeline: SWB1, SWB2 parts 1-5, Mixer 1-3, MMSR.]

• Sampling of tasks shown (28 in SRE04)

SRE Performance Trends 2001-2007: Lincoln Systems

[Figure: EER (%) and minDCF×100 vs. year (2001-2006) for the 1conv4w-1conv4w and 8conv4w-1conv4w conditions, across SWB1, SWB2, and Mixer 2-3 data.]

• Consistent and steady improvement for the data/task focus (an EER computation sketch follows this slide)
• New data sets designed to be more challenging
• New features, classifiers, and compensations drive error rates down over time:
  – 2001: text-constrained GMM, word n-grams
  – 2002: SuperSID high-level features
  – 2003: feature mapping, SVM-GLDS
  – 2004: Phone/Word-SVM, GMM-ATNORM
  – 2005: NAP, TC-SVM, word/phone lattices
  – 2006: SVM-GSV, GMM-LFA, multi-feature SVM-GLDS, SVM-MLLR+NAP
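The EER plotted above is the operating point at which the miss and false-alarm rates are equal. A minimal sketch under that standard definition, again assuming hypothetical target/impostor score arrays.

# Hedged EER sketch: sweep thresholds and report the point where P_miss and
# P_fa cross, approximated by their average at the threshold that minimizes
# the gap between them.
import numpy as np

def eer(target_scores, impostor_scores):
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    p_miss = np.array([(target_scores < t).mean() for t in thresholds])
    p_fa = np.array([(impostor_scores >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(p_miss - p_fa))
    return 0.5 * (p_miss[i] + p_fa[i])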

LRE Performance Trends 1996-2007: Lincoln Systems

[Figure: EER (%) at 30s, 10s, and 3s test durations for evaluations from 1996 to 2007, on CallFriend (12 languages), OHSU (7 languages), and Mixer3 (14 languages).]

Year  Main LID technology
1996  PPRLM
2003  + GMM-SDC, SVM-SDC
2005  + Phone lattices, SVM with n-grams, binary trees
2007  + TRAPS tokenizers, fLFA, fNAP, GMM-MMI, SVM-GSV, calibrated LLRs

Roadmap

• The odyssey from 1994 to 2008
• The scenic route through NIST speaker and language recognition evaluations
• The expedition into future territories

The Expedition: Evaluations

• The evaluation paradigm has clearly helped propel speaker and language R&D forward
  – Common focus
  – Comparable results and repeatable experiments
  – Collaboration
• But there are some issues to consider
  – Proliferation of tasks and conditions can dilute and fragment community effort
  – Evaluations are application-dependent: the tasks, conditions, and data are representative of some application(s). Are these being set in a meaningful way?
  – Performance numbers need context: time-pressed, less-technical potential users want a yes/no answer to "will it or won't it work for my application?"
  – Speaker and language recognition research increasingly relies on data-driven discovery: does performance depend on highly matched dev data? Are performance gains due to technology or data?

The Expedition: Research

• Speaker and language research is built on three core areas
  – Speech science: understanding how speaker/language information is conveyed in the speech signal and how to robustly extract measures of this information
  – Pattern recognition: techniques and algorithms to effectively represent and compare salient patterns in data
  – Data-driven discovery: effectively using data to apply, refine, and improve systems built from the above
• Current speaker/language research is heavily weighted toward data-driven discovery
  – Cure or curse?
  – Are we discovering underlying problems to address in research, or just where we want more data?

Page 2: Speaker and Language Recognition

MIT Lincoln Laboratory2

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory3

Odyssey 2008

The OdysseyMartigny Switzerland ndash April 5-7 1994

bull First workshop focused solely on speaker recognition

ndash Helped form working relationships among international SID community

bull 46 papers 6 tutorialskeynotesbull 65 attendeesbull Technologies TD-HMMs TI-GMMs MLP

LVQ RBF DTW LTAbull Corpora Home grown (digits words

phrases 10-30 speakers) YOHO TIMIT NTIMIT KING POLYPHONE SWB1

bull Very difficult to compare resultsndash Varying corpora experiment designs

measures of performancebull Large emphasis on text-dependent

applications (telcoms)ndash Some papers on forensic SV (human and

machines)

MIT Lincoln Laboratory4

Odyssey 2008

The OdysseyAvignon France ndash April 20-23 1998

bull Focus on forensic and commercial applications

bull 40 papers 5 keynotesbull 78 participantsbull Technologies More emphasis on statistical

approaches (HMM GMM AHS)bull Corpora Still diverse small set (less home-

grown) more TIMIT and SPIDRE (SWB)ndash Europeans showing lead in common

corporaexperiments (POLYCOST VERIVOX CAVE)

bull Increasing buzz about dot-com speechspeaker companies

bull Some lasting themes in talksndash Doddington getting to know the speakerndash Champod LRs as evidence in Baysian

framework

bull Some friction between automatic speaker recognition community and expert human speaker examiner community

ndash ASR crowd pressed for measured error rate

ndash Examiner crowd pressed for transparency and explanation in results

MIT Lincoln Laboratory5

Odyssey 2008

The OdysseyCrete Greece ndash June 18-22 2001

bull Start of official ldquoOdysseyrdquo workshop series

ndash Originally set for Tel-Aviv Israelbull 40 papers 3 keynotesbull 75 participantsbull Technologies More papers on new

tasks (biometrics diarization) addressing practical issues (robustness channel compensation threshold setting)

bull Corpora Many more papers using SRE corpora and experiment design

bull Bayesian framework taking hold for forensic applications

bull Several speaker verification companies (PerSay Nuance VoiceVault)

MIT Lincoln Laboratory6

Odyssey 2008

The OdysseyToledo Spain ndash May 31- June 3 2004

bull Co-occurrence with NIST SRE 2004 workshop

bull 61 paper 4 keynotesbull 147 participantsbull Technologies GMM SVM NAP LFA

high-level features adaptation audio-video LID

bull Corpora SRE corporaprotocol dominant for TI-Telephone RT BNEWS data for diarization TNONFI field forensic corpus

bull Text-dependent work focusing more on user phrases (less digit strings)

MIT Lincoln Laboratory7

Odyssey 2008

The OdysseySan Juan Puerto Rico ndash June 28-30 2006

bull Co-occurrence with NIST SRE 2006 workshop

ndash Followed LRE 2005 in December bull 60 papers 1 keynotebull 103 participantsbull Technologies GMM-SVM NAPLFA

GMM-MMI high-level features robustness

bull Corpora Dominated by SRE and LRE corporaprotocol

MIT Lincoln Laboratory8

Odyssey 2008

The OdysseyStellenbosch South Africa ndash January 21-24 2008

bull Expect to see continued trends inndash Common corporaevaluationsndash High-quality papers and novel topics

bull More fish pictures hellip

MIT Lincoln Laboratory9

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch
Page 3: Speaker and Language Recognition

MIT Lincoln Laboratory3

Odyssey 2008

The OdysseyMartigny Switzerland ndash April 5-7 1994

bull First workshop focused solely on speaker recognition

ndash Helped form working relationships among international SID community

bull 46 papers 6 tutorialskeynotesbull 65 attendeesbull Technologies TD-HMMs TI-GMMs MLP

LVQ RBF DTW LTAbull Corpora Home grown (digits words

phrases 10-30 speakers) YOHO TIMIT NTIMIT KING POLYPHONE SWB1

bull Very difficult to compare resultsndash Varying corpora experiment designs

measures of performancebull Large emphasis on text-dependent

applications (telcoms)ndash Some papers on forensic SV (human and

machines)

MIT Lincoln Laboratory4

Odyssey 2008

The OdysseyAvignon France ndash April 20-23 1998

bull Focus on forensic and commercial applications

bull 40 papers 5 keynotesbull 78 participantsbull Technologies More emphasis on statistical

approaches (HMM GMM AHS)bull Corpora Still diverse small set (less home-

grown) more TIMIT and SPIDRE (SWB)ndash Europeans showing lead in common

corporaexperiments (POLYCOST VERIVOX CAVE)

bull Increasing buzz about dot-com speechspeaker companies

bull Some lasting themes in talksndash Doddington getting to know the speakerndash Champod LRs as evidence in Baysian

framework

bull Some friction between automatic speaker recognition community and expert human speaker examiner community

ndash ASR crowd pressed for measured error rate

ndash Examiner crowd pressed for transparency and explanation in results

MIT Lincoln Laboratory5

Odyssey 2008

The OdysseyCrete Greece ndash June 18-22 2001

bull Start of official ldquoOdysseyrdquo workshop series

ndash Originally set for Tel-Aviv Israelbull 40 papers 3 keynotesbull 75 participantsbull Technologies More papers on new

tasks (biometrics diarization) addressing practical issues (robustness channel compensation threshold setting)

bull Corpora Many more papers using SRE corpora and experiment design

bull Bayesian framework taking hold for forensic applications

bull Several speaker verification companies (PerSay Nuance VoiceVault)

MIT Lincoln Laboratory6

Odyssey 2008

The OdysseyToledo Spain ndash May 31- June 3 2004

bull Co-occurrence with NIST SRE 2004 workshop

bull 61 paper 4 keynotesbull 147 participantsbull Technologies GMM SVM NAP LFA

high-level features adaptation audio-video LID

bull Corpora SRE corporaprotocol dominant for TI-Telephone RT BNEWS data for diarization TNONFI field forensic corpus

bull Text-dependent work focusing more on user phrases (less digit strings)

MIT Lincoln Laboratory7

Odyssey 2008

The OdysseySan Juan Puerto Rico ndash June 28-30 2006

bull Co-occurrence with NIST SRE 2006 workshop

ndash Followed LRE 2005 in December bull 60 papers 1 keynotebull 103 participantsbull Technologies GMM-SVM NAPLFA

GMM-MMI high-level features robustness

bull Corpora Dominated by SRE and LRE corporaprotocol

MIT Lincoln Laboratory8

Odyssey 2008

The OdysseyStellenbosch South Africa ndash January 21-24 2008

bull Expect to see continued trends inndash Common corporaevaluationsndash High-quality papers and novel topics

bull More fish pictures hellip

MIT Lincoln Laboratory9

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch
Page 4: Speaker and Language Recognition

MIT Lincoln Laboratory4

Odyssey 2008

The OdysseyAvignon France ndash April 20-23 1998

bull Focus on forensic and commercial applications

bull 40 papers 5 keynotesbull 78 participantsbull Technologies More emphasis on statistical

approaches (HMM GMM AHS)bull Corpora Still diverse small set (less home-

grown) more TIMIT and SPIDRE (SWB)ndash Europeans showing lead in common

corporaexperiments (POLYCOST VERIVOX CAVE)

bull Increasing buzz about dot-com speechspeaker companies

bull Some lasting themes in talksndash Doddington getting to know the speakerndash Champod LRs as evidence in Baysian

framework

bull Some friction between automatic speaker recognition community and expert human speaker examiner community

ndash ASR crowd pressed for measured error rate

ndash Examiner crowd pressed for transparency and explanation in results

MIT Lincoln Laboratory5

Odyssey 2008

The OdysseyCrete Greece ndash June 18-22 2001

bull Start of official ldquoOdysseyrdquo workshop series

ndash Originally set for Tel-Aviv Israelbull 40 papers 3 keynotesbull 75 participantsbull Technologies More papers on new

tasks (biometrics diarization) addressing practical issues (robustness channel compensation threshold setting)

bull Corpora Many more papers using SRE corpora and experiment design

bull Bayesian framework taking hold for forensic applications

bull Several speaker verification companies (PerSay Nuance VoiceVault)

MIT Lincoln Laboratory6

Odyssey 2008

The OdysseyToledo Spain ndash May 31- June 3 2004

bull Co-occurrence with NIST SRE 2004 workshop

bull 61 paper 4 keynotesbull 147 participantsbull Technologies GMM SVM NAP LFA

high-level features adaptation audio-video LID

bull Corpora SRE corporaprotocol dominant for TI-Telephone RT BNEWS data for diarization TNONFI field forensic corpus

bull Text-dependent work focusing more on user phrases (less digit strings)

MIT Lincoln Laboratory7

Odyssey 2008

The OdysseySan Juan Puerto Rico ndash June 28-30 2006

bull Co-occurrence with NIST SRE 2006 workshop

ndash Followed LRE 2005 in December bull 60 papers 1 keynotebull 103 participantsbull Technologies GMM-SVM NAPLFA

GMM-MMI high-level features robustness

bull Corpora Dominated by SRE and LRE corporaprotocol

MIT Lincoln Laboratory8

Odyssey 2008

The OdysseyStellenbosch South Africa ndash January 21-24 2008

bull Expect to see continued trends inndash Common corporaevaluationsndash High-quality papers and novel topics

bull More fish pictures hellip

MIT Lincoln Laboratory9

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch
Page 5: Speaker and Language Recognition

MIT Lincoln Laboratory5

Odyssey 2008

The OdysseyCrete Greece ndash June 18-22 2001

bull Start of official ldquoOdysseyrdquo workshop series

ndash Originally set for Tel-Aviv Israelbull 40 papers 3 keynotesbull 75 participantsbull Technologies More papers on new

tasks (biometrics diarization) addressing practical issues (robustness channel compensation threshold setting)

bull Corpora Many more papers using SRE corpora and experiment design

bull Bayesian framework taking hold for forensic applications

bull Several speaker verification companies (PerSay Nuance VoiceVault)

MIT Lincoln Laboratory6

Odyssey 2008

The OdysseyToledo Spain ndash May 31- June 3 2004

bull Co-occurrence with NIST SRE 2004 workshop

bull 61 paper 4 keynotesbull 147 participantsbull Technologies GMM SVM NAP LFA

high-level features adaptation audio-video LID

bull Corpora SRE corporaprotocol dominant for TI-Telephone RT BNEWS data for diarization TNONFI field forensic corpus

bull Text-dependent work focusing more on user phrases (less digit strings)

MIT Lincoln Laboratory7

Odyssey 2008

The OdysseySan Juan Puerto Rico ndash June 28-30 2006

bull Co-occurrence with NIST SRE 2006 workshop

ndash Followed LRE 2005 in December bull 60 papers 1 keynotebull 103 participantsbull Technologies GMM-SVM NAPLFA

GMM-MMI high-level features robustness

bull Corpora Dominated by SRE and LRE corporaprotocol

MIT Lincoln Laboratory8

Odyssey 2008

The OdysseyStellenbosch South Africa ndash January 21-24 2008

bull Expect to see continued trends inndash Common corporaevaluationsndash High-quality papers and novel topics

bull More fish pictures hellip

MIT Lincoln Laboratory9

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The Expedition: Research

• Speaker and language recognition research is built on three core areas
  – Speech Science: understanding how speaker/language information is conveyed in the speech signal, and how to robustly extract measures of this information
  – Pattern Recognition: techniques and algorithms to effectively represent and compare salient patterns in data
  – Data-Driven Discovery: effectively using data to apply, refine, and improve systems built from the above

• Current speaker/language research is heavily weighted toward data-driven discovery
  – Cure or curse?
  – Are we discovering underlying problems to address in research, or just where we want more data?

Page 6: Speaker and Language Recognition

MIT Lincoln Laboratory6

Odyssey 2008

The OdysseyToledo Spain ndash May 31- June 3 2004

bull Co-occurrence with NIST SRE 2004 workshop

bull 61 paper 4 keynotesbull 147 participantsbull Technologies GMM SVM NAP LFA

high-level features adaptation audio-video LID

bull Corpora SRE corporaprotocol dominant for TI-Telephone RT BNEWS data for diarization TNONFI field forensic corpus

bull Text-dependent work focusing more on user phrases (less digit strings)

MIT Lincoln Laboratory7

Odyssey 2008

The OdysseySan Juan Puerto Rico ndash June 28-30 2006

bull Co-occurrence with NIST SRE 2006 workshop

ndash Followed LRE 2005 in December bull 60 papers 1 keynotebull 103 participantsbull Technologies GMM-SVM NAPLFA

GMM-MMI high-level features robustness

bull Corpora Dominated by SRE and LRE corporaprotocol

MIT Lincoln Laboratory8

Odyssey 2008

The OdysseyStellenbosch South Africa ndash January 21-24 2008

bull Expect to see continued trends inndash Common corporaevaluationsndash High-quality papers and novel topics

bull More fish pictures hellip

MIT Lincoln Laboratory9

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch
Page 7: Speaker and Language Recognition

MIT Lincoln Laboratory7

Odyssey 2008

The OdysseySan Juan Puerto Rico ndash June 28-30 2006

bull Co-occurrence with NIST SRE 2006 workshop

ndash Followed LRE 2005 in December bull 60 papers 1 keynotebull 103 participantsbull Technologies GMM-SVM NAPLFA

GMM-MMI high-level features robustness

bull Corpora Dominated by SRE and LRE corporaprotocol

MIT Lincoln Laboratory8

Odyssey 2008

The OdysseyStellenbosch South Africa ndash January 21-24 2008

bull Expect to see continued trends inndash Common corporaevaluationsndash High-quality papers and novel topics

bull More fish pictures hellip

MIT Lincoln Laboratory9

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch
Page 8: Speaker and Language Recognition

MIT Lincoln Laboratory8

Odyssey 2008

The OdysseyStellenbosch South Africa ndash January 21-24 2008

bull Expect to see continued trends inndash Common corporaevaluationsndash High-quality papers and novel topics

bull More fish pictures hellip

MIT Lincoln Laboratory9

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch
Page 9: Speaker and Language Recognition

MIT Lincoln Laboratory9

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch
Page 10: Speaker and Language Recognition

MIT Lincoln Laboratory10

Odyssey 2008

NIST SpeakerLanguage Recognition Evaluations

bull Recurring NIST evaluations of speakerlanguage recognition technology

bull Aim Provide a common paradigm for comparing technologies

bull Focus Conversational telephone speech (text-independent)

Evaluation Coordinator

Linguistic Data Consortium

Data Provider Comparison of technologies on common task

Evaluate

Improve

Technology ConsumersApplication domain parameters

Technology Developers

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch
Page 11: Speaker and Language Recognition

MIT Lincoln Laboratory11

Odyssey 2008

NIST SRELREPre-history

1992 1993

Rutgers Summer Workshop

1994Informal LRE bull 4 sites OGI MITLL MIT ITTbull OGI 12 lang corpus

MartignyWorkshop

DARPA SID evalbull 3 sites Dragon (LVCSR) ITT

(NN) MITLL (GMM)bull Early SWB1bull 1-4 conv trainbull 24 tgtsbull 111 tgt-test 466 imp-testbull 10-60s testbull Speaker dependent ROCbull Intro of Swets Normal-Deviate

plot (DET)bull Areas Under ROC PdPf=10bull Sun Sparc 1020 MB

MIT Lincoln Laboratory12

Odyssey 2008

NIST SRELREFormal Start

1995 1996SRE 1 bull 6 sites BBN (uniGauss)

Dragon (LVCSR) Ensigma(ergodic HMM) INRS (phone HMMs) ITT (NN) MITLL (GMM o64 cohort)

bull SWB1bull 26 tgtsbull Train 10s 30s 4 sessbull Test 5s 10s 30s Separate

tgt and imp testsbull Area Under ROC

PdPf=3105 closed set error

bull SameDiff phone effectbull Speaker dependent ROC

SRE 2 bull 11 sites Ensigma ITT

MITLL SRI CAIP INRS Dragon BBN LIMSI ATampT Sanders

bull Broad phone HMM LVCSR VQ adapted GMM SVM worldcohorts hnorm anchor models

bull SWB1bull 40 tgtsbull Train 2 min - 1 sess 1

handset 2 handset bull Test 3s 10s 30s Separate

tgt and imp testsbull Pooled DET DCF

LRE 1 bull 4-5 sitesbull PPRLM GMM-CEP

Syllabic models fusion

bull Callfriendbull 12 languages 3

dialectsbull Test 3s 10s 30sbull DET DCF

MIT Lincoln Laboratory13

Odyssey 2008

NIST SRELRESteady Progress

Avignon Workshop

1997 1998SRE 3 bull 8 sites bull Pitch features handset mic

detectorcomp using more dev data

bull SWB2p1bull All speakers act as tgts

and imposters (current paradigm)

bull Train 2 min - 1 sess 1 handset 2 handset

bull Test 3s 10s 30sbull No cross-sex trials

matched and mismatched test phone

bull DET DCF

SRE 4bull 12 sitesbull Phone sequences (BBN)

sequence models (Dragon)

bull SWB2p2bull Train 2 min - 1 sess 2

sess all - 2 sessbull Test 3s 10s 30sbull SNDN and HS type side

knowledgebull Human performance

3s

1999SRE 5bull 13 sitesbull T-norm system fusionbull SWB2p3bull Train 2 min - 2 sessbull Test varying duration (0-

15 15-30 30-45gt45) diff number

bull New tasks 2-spkr test speaker tracking

MIT Lincoln Laboratory14

Odyssey 2008

NIST SRELRENew Directions

Odyssey Workshop

JHU SuperSIDWorkshop

2000 2001SRE 6bull 12 sites (First shark

sighting) bull SMS bull SWB2p1p2

AHUMADAbull Train 2 min - 1 sessbull Test variable 0-60 bull New tasks 2-spkr

train amp test N-speaker segmentation

20032002SRE 7bull 13 sitesbull Per-frame SVM

Fusion text-constrained GMM word amp phone N-gram

bull SWB2p1p2 AHUMADA SWB2p4 (cell)

bull Extended data task (SWB1)

SRE 8bull 24 sitesbull Feat map

high-level features mlpfusion

bull SWB2p5 (cell) SWB2p2p3 (ext) FBIVoiceDB(Multi Modal) BNEWS (seg) Meeting (seg)

SRE 9bull 19 sitesbull SVM GLDS phone

svm nerfs)bull SWB2p5 (cell)

SWB2p2p3 (ext)

LRE 2bull 6 sitesbull PPRLM GMM-SDC

SVM-SDC fusionbull Callfriendbull 12 languagesbull Test 3s 10s 30s

MIT Lincoln Laboratory15

Odyssey 2008

NIST SRELRECurrent Period

Odyssey Workshop

Odyssey Workshop

2004 2005 20072006LRE 4bull 21 sitesbull SVM-GSV ho

ngrams fLFAfNAPbull Mixer5 OHSUbull 14 languages 5

dialectsaccentsbull Calibrated LLRs

LRE 3bull 11 sitesbull GMM-MMI TRAPSNN-

decoder phone lattice

2008

Odyssey WS SRE 13 JHU WS

SRE 10bull 24 sitesbull Large system

fusionbull Mixer1bull Bilingual

speakers

SRE 11bull 27 sitesbull LFA SVM-MLLRbull Mixer2 MMSRbull Cross-channel

microphonesbull Calibrated LLRs

SRE 12bull 36 sitesbull SVM-GSV

spectral-only systems

bull Mixer2+3 MMSRbull Bilingual cross-

channelbull Multi-site

collaboration

PPR-BinTree PPR-SVMbull OHSU Mixer12bull 7 languages 3

dialectsaccents

MIT Lincoln Laboratory16

Odyssey 2008

NIST SREHow are we doing

0

001002

003004

005

006007

008

1996

1997

1998

1999

2000

2001

2002

2003

2004

2005

2006

Year

DC

F

Landline 1sp 2 min train 30 sec test

Cellular 1sp 2 min train 30 sec test

Landline 2-speaker detection

Ahumada(Spanish)

Multimodal (FBI)Landline 1sp (40

target speaker paradigm)

Cellularland 2-speaker detection

CellLand 1sp 8-conv train 1-conv test

Cross-mic1-conv train (tel) 1-conv test (mic)

Swb1 Swb2p1 Swb2p3 Swb2p4 Swb2p5 Mixer1 Mixer3

bull Sampling of tasks shown 28 in SRE04

Cross-language

Swb2p2 Mixer2 MMSR

MIT Lincoln Laboratory17

Odyssey 2008

0123456789

10

0

1

2

3

4

2001 2002 2003 2004 2005 2006

SRE Performance Trends 2001-2007Lincoln Systems

bull Consistent and steady improvement for datatask focus

EER ()1conv4w1conv4w

8conv4w1conv4w

minDCFx100

2001 2002 2003 2004 2005 2006SWB1 SWB2 MIXER2-3

bull New data sets designed to be more challenging

bull New features classifiers and compensations drive error rates down over time

SVM-GSV GMM-LFA MultiFeatSVM-GLDS SVM-MLLR+NAP

2006

NAP TC-SVM wordphone lattices2005

PhoneWord-SVM GMM-ATNORM2004

Feature Mapping SVM-GLDS2003

SuperSID High-level features2002

Text-const GMM word-ngram2001

MIT Lincoln Laboratory18

Odyssey 2008

0

10

20

30

40

1996 2003 2005 2005 2007 2007

EER

()

30s 10s 3s

CallFriend(12-lang)

OHSU(7-lang)

Mixer3(14-lang)

113

32 421014

LRE Performance Trends 1996-2007Lincoln Systems

19

Year Main LID Technology

1996 PPRLM2003 + GMM-SDC SVM-SDC2005 + Phone lattices SVM w ngrams

Binary Trees2007 + TRAPS tokenizers fLFA fNAP

GMM-MMI SVM-GSV calibrated LLRs

MIT Lincoln Laboratory19

Odyssey 2008

Roadmap

bull The odyssey from 1994 to 2008

bull The scenic route through NIST speaker and language recognition evaluations

bull The expedition into future territories

MIT Lincoln Laboratory20

Odyssey 2008

The ExpeditionEvaluations

bull The evaluation paradigm has clearly helped propel speaker and language RampD forward

ndash Common focus ndash Comparable results and repeatable experimentsndash Collaboration

bull But there are some issues to considerndash Proliferation of tasks and conditions can dilute and fragment

community effortndash Evaluations are application-dependent

The tasks conditions and data are representative of some application(s)

Are these being set in a meaningful wayndash Performance numbers need context

Time-pressed less-technical potential users want yesno to ldquowill it or wonrsquot it work for my applicationrdquo

ndash Speaker and language recognition research increasingly relies on data driven discovery

Does performance depend on highly matched dev data Are performance gains due to technology or data

MIT Lincoln Laboratory21

Odyssey 2008

The ExpeditionResearch

bull Speaker and language research are built on three core areas

ndash Speech Science Understanding how speakerlanguage information is conveyed in the speech signal and how to robustly extract measures of this information

ndash Pattern Recognition Techniques and algorithms to effectively represent and compare salient patterns in data

ndash Data Driven Discovery Effectively using data to apply refine and improve systems built from above

bull Current speakerlanguage research is heavily weighted toward data driven discovery

ndash Cure or cursendash Are we discovering underlying problems to address in

research or just where we want more data

  • Speaker and Language RecognitionA Guided Safari
  • Roadmap
  • The OdysseyMartigny Switzerland ndash April 5-7 1994
  • The OdysseyAvignon France ndash April 20-23 1998
  • The OdysseyCrete Greece ndash June 18-22 2001
  • The OdysseyToledo Spain ndash May 31- June 3 2004
  • The OdysseySan Juan Puerto Rico ndash June 28-30 2006
  • The OdysseyStellenbosch South Africa ndash January 21-24 2008
  • Roadmap
  • NIST SpeakerLanguage Recognition Evaluations
  • NIST SRELREPre-history
  • NIST SRELREFormal Start
  • NIST SRELRESteady Progress
  • NIST SRELRENew Directions
  • NIST SRELRECurrent Period
  • NIST SREHow are we doing
  • SRE Performance Trends 2001-2007Lincoln Systems
  • LRE Performance Trends 1996-2007Lincoln Systems
  • Roadmap
  • The ExpeditionEvaluations
  • The ExpeditionResearch