sp eech in telligence for security and defense · 2015. 8. 21. · uni vers ity res ea rch group es...

29
Speech intelligence for security and defense (getting state-of-the-art speech recognition research from university lab to the real world) Pavel Mat!"ka% ’e)r Sc-.ar/ an1 2an 34on/a6 7ernock8 Phonexia Ltd. and Brno University of Technology, Czech Republic ISS World Prague, 4-5 th June 2009

Upload: others

Post on 26-Feb-2021

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

Speech intelligence for security and defense

(getting state-of-the-art speech recognition research from university lab to the real world)

Pavel Mat!"ka% 'e)r Sc-.ar/ an1 2an 34on/a6 7ernock8

Phonexia Ltd. and Brno University of Technology, Czech Republic

ISS World Prague, 4-5th June 2009

Page 2: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

2/28

Plan ! Speech technogies " an introduction

! Who we are

! Technologies

! Developer*s corner! Summary

Page 3: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

3/28

Needle in a haystack! Speech is the most important modality of human-human communication 45608 of information: ; criminals and terrorists are also communicating by speech

! Speech is easy to acquire in both civilian and intelligence/defense scenarios.

! More difficult is to find what we are looking for! Typically done by human experts, but always count on:

" Limited personnel

" Limited budget

" Not enough languages spoken

" Insufficient security clearances

Technologies of speech processing are not almighty but can help to narrow the search space.

Page 4: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

4/28

6S9eec- recogni)ion<What was said ? ! Speech recognition

" Complete transcription - Large Vocabulary Continuous speech recognition (LVCSR): transcription, speech to text, S2T.

" Detection of keywords / keyphrases " keyword spotting (KWS), spoken term detection (STD)

Which language ? ! Language recognition (LRE), Language identification (LID)

Who said it ? ! choose one out of a set of N speakers " speaker identification! confirm the claimed identity of a speaker " speaker verification! Haven*t heard the speaker before " age ID, gender ID, etc.

Page 5: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

5/28

Plan ! Speech technogies " an introduction

! Who we are

! Technologies

! Developer*s corner! Summary

Page 6: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

6/28

Speech@FIT at BUT! University research

group established in 1997

! 20 people in 2009 (faculty, researchers, students, support staff).

! Provides also education within Dpt. of Computer Graphics and Multimedia.

! Cooperating with EU and US universities and companies.

! Supported by EC, US and national projects

The goal: high profile research in speech theory, algorithms and software implementation

Page 7: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

7/28

Focus on evaluations! AICm better than the other guysF " not relevant unless the same data and

evaluation metrics for everyone.! NIST " US Government Agency, http://www.nist.gov/speech ! Regular benchmark campaigns " evaluations " of speech technologies.! All participants have the same data and have the same limited time to

process them and send results to NIST => objective comparison.! The results and details of systems are discussed at NIST workshops.! Speech@FIT extensively participating in NIST evaluations:

! Transcription 2005, 2006, 2007, 2009 ! Language ID 2003, 2005, 2007, 2009 (now!)! Speaker Verification 1998, 1999, 2006, 2008, ! Spoken term detection 2006

Why are we doing this ? ! We believe that evaluations are really advancing the state of the art ! Do not want to waste our time on useless work ; 

Page 8: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

8/28

Phonexia Ltd.! Company created in 2006 by 6

Speech@FIT members! Closely cooperating with the

research group! Key people

" Pavel MatJjka, CEO" Petr Schwarz, CTO" Igor Szöke, CFO" Dr. LukRS Burget, research 

coordinator" Dr. Uan VernockW, university 

relations" TomRS KaSpRrek, hardware 

architect

The goal: bringing mature technologies to the market, especially in the security/defense sector

Page 9: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

9/28

Not new in the business !Speech@FIT! NIST evaluations are

supported by intelligence sponsors in the US.

! Project sponsored by US Air Force EOARD

! Project supported by Czech Ministry of Interior

! Czech Ministry of Education supporting FIT BUT under framework project FSecurity-Oriented Research in Information Technology[

Phonexia! Founded based on

consultations from Czech military intelligence.

! Delivers systems for civilian and military intelligence since 2006.

! Customers in ! Czech Republic! Germany! Spain! Russia

Page 10: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

10/28

Plan ! Speech technogies " an introduction

! Who we are

! Technologies

! Developer*s corner! Summary

Page 11: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

11/28

Language IDTechnical approach

! acoustic

! phonotactic

Page 12: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

12/28

Research achievements

! NIST LRE 2005 "Speech@FIT the best in 2 out of 3 categories

! NIST LRE 2007 "confirmation of the leading position.

!"!#$##%&%'()#$##%&%*!"#$##%&%*"'#+#,,&,)'"#$##%&%-.(#$##%&%/!0#$##%&%12"#$##%&%3!(#$##%&%40!#$##%&%5!3#$##%&%6.'#$##%&%

!"!#$##%&%'()#+#,7&7*!"#$##%&%*"'#$##%&7)'"#$##8&,-.(#$##%&%/!0#$##%&%12"#$##%&%3!(#$##9&740!#$##%&%5!3#$##%&%6.'#$##%&9

!"!#$##%&%'()#$#9:&9*!"#$##%&%*"'#$##%&%)'"#+#;8&<-.(#$##%&%/!0#$##%&%12"#$##%&%3!(#$##%&%40!#$##%&%5!3#$##%&%6.'#$##%&%

!"!#+#8=&,'()#$##9&<*!"#$#9=&,*"'#$##%&%)'"#$##%&%-.(#$#99&=/!0#$##%&,12"#$#==&=3!(#$##%&%40!#$##%&95!3#$##<&86.'#$##%&9

Key ideas:! Discriminative modeling

! Gathering training data from public sources

Page 13: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

13/28

ProductsReady to ship: Phonexia LID! Application with GUI for sorting of record,

and command line version! Combination of acoustic and phontatic

approach! 12 pre-trained languages! Possibility to train new language/model by

customer! Possibility to discriminatively train higher

quality languages/models by Phonexia! API for developers

Ongoing development! Increasing the robustness to adverse

factors (speaker, acoustic environment, channel)

Page 14: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

14/28

Speaker verificationTechnical approach

! Model of speaker against model of the Fworld[

Page 15: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

Fighting unwanted variability

UBM

Target speaker model

Page 16: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

Let the models move !

UBM

Target speaker model Test data

For recognition, move both models along the high inter-session variability direction(s) to fit well the test data

Page 17: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

17/28

Research achievements

Key ideas:! Coping with unwanted variability

! Compact representation of speakers allowing for extremely fast scoring of speech files.

<- NIST SRE 2006: ! BUT! STBU

consortium

NIST SRE 2008 ->! confirming

leading position

Page 18: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

18/28

ProductsReady to ship: Phonexia Speaker

Verification! GUI application for speaker search in

audio archives! Command line version and API for

developers

Ongoing development! More powerful techniques for

robustness on non-speaker information " Joint Factor Analysis.

! Calibration in different setups (lengths of utterances, etc.) to always obtain a meaningful score.

Page 19: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

19/28

But what if we did not hear the speaker before ?

Gender ID ! The easiest speech application to deploy ;

! ; and the most accurate 4\9^8 on challenging channels)

! Limits search space by 50% ! Available now, standalone or in

Phonexia Speaker ID

Page 20: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

20/28

Keyword spottingTechnical approach! Comparing keyword model output with an anti-model.! Key question: what is the needed tradeoff between

speed and accuracy?

Acoustic! Fast! No problem with OOVL Can not index " new keyword

mens new processing of all the data

L Does not have language model " problem with short keywords.

LVCSR! once indexed, the search is very

fast ! More precise.L More complex, recognition is

slowerL Limited vocabulary " OOV

Page 21: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

21/28

Research achievements

Key ideas:! Expertise with acoustic, word and sub-word recognition

! Speech indexing and search

! Normalization of scores.

NIST STD 2006 = English MV Task 2008 = Czech

Page 22: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

22/28

ProductsReady to ship: Phonexia Acoustic KWS! GUI application for keyword spotting in

incoming files! Czech and Russian supported

Ongoing development! Command line version and API for

developers! LVCSR-based KWS for English

and Czech ! Other languages " Polish,

Hungarian, Slovak.

Page 23: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

23/28

What is special for ISS public?>e kno. you are no) .orking .i)- 4iAi…! Phonexia PreSelector " filtering out DTMF, FAX, ringing tones,

noises.! Channel compensation " coping with irrelevant information. >e kno. .e .ill no) ge) your 6-o)< 1a)a…! LID: Training new languages by the user! SID: Background models trained on publicly available databases.! Phonexia application won*t need Internet connection.>e kno. youDll be in)eres)e1 in languages .e 1onD) su99or)! Custom development (but costly and long)! Language-independent technologies, such as SIDWe know this is not a box-software! We respect specifics of each customer! We are used to adapt our systems to your data and needs

Page 24: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

24/28

Plan ! Speech technogies " an introduction

! Who we are

! Technologies

! Developer*s corner! Summary

Page 25: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

25/28

Brno Speech Core! Shares building

blocks (source code) among all our technologies

! Allows for fast prototyping of any speech application.

! Unified application interface enables fast and clean integration of our technology to customers* systems.

BSCORE

LID

SID

VADKWS

GID

LVCSR

PhnRec

! The API allows to use (and distribute) the technology as the whole or in parts

Page 26: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

26/28

Forms of delivery! Executable software including GUI

! Libraries + models + API

! Combination of both

! Integration in a full speech search system

! Consulting

Lang. ID

KWS EN

KWS RU

KWS AR

Preselector

Speaker ID

Gender ID

SPEECH

Page 27: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

27/28

Plan ! Speech technogies " an introduction

! Who we are

! Technologies

! Developer*s corner! Summary

Page 28: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

SummarySpeech@FIT:! Research " academic, but driven by real demands of the

intelligence community. Phonexia:! Technology, SDKs! Stand alone applications! Custom development! Maintenance, training, services! ConsultingTogether: ! Serving the intelligence community in making the world a

safer place.

Page 29: Sp eech in telligence for security and defense · 2015. 8. 21. · Uni vers ity res ea rch group es tab lishe d in 1 9 9 7! 20 pe op le in 20 09 (facul ty, researc he rs, stud en

ContactsPhonexia, Ltd. http://phonexia.com/

Pavel MatJjka, CEO, [email protected]

Petr Schwarz, CTO, [email protected]

Speech@FIT, Brno University of Technology, http://speech.fit.vutbr.cz/

Uan FHon`a[ Cernocky, Head of Department, [email protected]

Thanks for your attentionReady for your questions now or in our booth