BUT QUESST 2015 System Description
Miroslav Skácel, Igor Szöke
Speech@FIT, Faculty of Information Technology
Brno University of Technology
MediaEval QUESST 2015 workshop, September 14-15, 2015, Wurzen
System overview
Our internal task was:
● to reuse some of the Atomic systems we already have
● to incorporate bottlenecks
● to calibrate and fuse
● to cope with T2/T3 queries
We ended up with:
● 4 Atomic systems
● 3 QbE subsystems based on DTW
● 4 languages (Czech, Portuguese, Russian and Spanish)
Atomic system
● no adaptation on target data (SMVN, VTLN, …)
● Artificial Neural Networks – to estimate bottlenecks
● bottlenecks – trained on the GlobalPhone (GP) database
Subsystem
Neural network based features:
● bottleneck features (30 dimensional)
● no VTLN, no SMN/SVN
Query detector:
● based on Dynamic Time Warping (DTW)
DTW QbE subsystem
● segmental DTW (the query can start in any frame of the utterance)
● Voice Activity Detection (VAD) only on queries
● Pearson product-moment correlation distance (dcorr)
● slope limitation
● online normalization of the path
● bottlenecks superior to posteriors
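The first, third and fifth bullets of this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that dcorr is defined as one minus the Pearson correlation of two feature vectors, that the path may use horizontal, vertical and diagonal steps, and that online normalization means choosing each step by the accumulated cost divided by the path length so far. Slope limitation and VAD are left out. The names `corr_distance` and `segmental_dtw` are hypothetical.

```python
import numpy as np


def corr_distance(query, utt):
    """Correlation distance between all query frames and all utterance frames.

    query: (Nq, D) array, utt: (Nu, D) array.
    Returns an (Nq, Nu) matrix of 1 - Pearson correlation (assumed definition).
    """
    q = query - query.mean(axis=1, keepdims=True)
    u = utt - utt.mean(axis=1, keepdims=True)
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-12)
    u = u / (np.linalg.norm(u, axis=1, keepdims=True) + 1e-12)
    return 1.0 - q @ u.T


def segmental_dtw(dist):
    """Subsequence DTW: the path may start and end in any utterance frame.

    Returns (normalized cost, start frame, end frame) of the best path.
    No slope limitation is applied in this sketch.
    """
    nq, nu = dist.shape
    acc = np.full((nq, nu), np.inf)       # accumulated distance
    length = np.ones((nq, nu))            # number of cells on the path
    start = np.zeros((nq, nu), dtype=int) # utterance frame where the path began

    acc[0, :] = dist[0, :]                # free start: no cost to enter anywhere
    start[0, :] = np.arange(nu)

    for i in range(1, nq):
        # First column can only be reached from the cell above it.
        acc[i, 0] = acc[i - 1, 0] + dist[i, 0]
        length[i, 0] = length[i - 1, 0] + 1
        for j in range(1, nu):
            best = None
            for pi, pj in ((i - 1, j), (i, j - 1), (i - 1, j - 1)):
                cand_acc = acc[pi, pj] + dist[i, j]
                cand_len = length[pi, pj] + 1
                cand = cand_acc / cand_len  # online length normalization
                if best is None or cand < best[0]:
                    best = (cand, cand_acc, cand_len, start[pi, pj])
            _, acc[i, j], length[i, j], start[i, j] = best

    norm = acc[-1] / length[-1]           # normalized cost for every end frame
    end = int(np.argmin(norm))
    return float(norm[end]), int(start[-1, end]), end


# Toy usage with random 30-dimensional "bottleneck-like" features.
rng = np.random.default_rng(0)
utterance = rng.standard_normal((100, 30))
query = utterance[20:30].copy()
print(segmental_dtw(corr_distance(query, utterance)))
```

The double loop is written for readability; a practical system would vectorize it or add the slope limitation as a restriction on the allowed predecessor cells.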
features      minCnxe with dcorr (ALL)
SD CZ POST    0.984
SD HU POST    0.972
SD RU POST    0.952
GP CZ BN      0.853
GP PO BN      0.894
GP RU BN      0.893
GP SP BN      0.904
Dealing with T2
● query split into equal parts
● each part searched in the utterance separately
● results averaged together
● query split into 2 (denoted as 2w) and 3 (3w) parts
in late evaluation
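The splitting scheme on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the query is split by frame count with `np.array_split`, and `toy_search` is a hypothetical stand-in detector (a fixed-length sliding window with Euclidean distance), not the DTW search of the actual system. The names `split_query`, `toy_search` and `split_query_score` are not from the original.

```python
import numpy as np


def split_query(query, n_parts):
    """Split the query frames into n_parts (nearly) equal consecutive parts."""
    return np.array_split(query, n_parts, axis=0)


def toy_search(part, utt):
    """Stand-in detector (NOT DTW): lowest mean frame distance over all
    utterance windows that have the same length as the query part."""
    n = len(part)
    costs = [
        np.linalg.norm(utt[s:s + n] - part, axis=1).mean()
        for s in range(len(utt) - n + 1)
    ]
    return float(min(costs))


def split_query_score(query, utt, n_parts, search=toy_search):
    """Search each part separately and average the per-part results."""
    parts = split_query(query, n_parts)
    return float(np.mean([search(p, utt) for p in parts]))


# Toy usage: the two halves of the query occur at different places in the
# utterance, so the whole query matches poorly but each half matches exactly.
rng = np.random.default_rng(1)
utterance = rng.standard_normal((60, 5))
query = np.concatenate([utterance[40:46], utterance[10:16]])
print("1 part :", split_query_score(query, utterance, 1))
print("2 parts:", split_query_score(query, utterance, 2))
```

Passing the detector in as `search` keeps the splitting and averaging logic independent of how each part is actually matched.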
Score normalization
● raw detection scores normalized by length
● the best detection per utterance-query pair selected
● mode normalization performed
[Figure panels: original, mode norm.]
Results
● posteriors do not work for this year's dataset
● slope limitation helps to control the path shape
● a feature stack of more than 4 languages does not improve performance
● mode normalization works well for raw score normalization
● next year we will focus on denoising and dereverberation