research activities at auth related to dialogue detection ioannis pitas constantine kotropoulos...
Post on 20-Dec-2015
Research activities at AUTH related to dialogue detection
Ioannis Pitas, Constantine Kotropoulos
Nikos Nikolaidis
WP6 e-team: Audiovisual Understanding
AIIA Lab, Department of Informatics, Aristotle University of Thessaloniki
Outline
• Introduction
• Dialogue detection concept: cross-correlation of indicator functions
• Speaker turn detection based on speech and visual cues (mouth activity)
• Frontal face detection; facial feature detection (e.g. mouth)
• One-two speaker detection
• Speaker clustering based on speech and visual cues
• Fingerprinting
Indicator functions and their cross-correlation (1)
A dialogue between two persons from the movie “Secret Window” [Dialogue 1].
[Figure: indicator functions $I_A(n)$ and $I_B(n)$ plotted against $n$, and their cross-correlation $c_{AB}(d)$ plotted against $d$]
Indicator functions and their cross-correlation (2)
A scene without a dialogue between two persons.
[Figure: indicator functions $I_A(n)$ and $I_B(n)$ plotted against $n$, and their cross-correlation $c_{AB}(d)$ plotted against $d$]
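The idea behind these two slides can be illustrated with a minimal sketch. It assumes that $I_A(n)$ and $I_B(n)$ are binary sequences equal to 1 when person A (respectively B) is active at frame $n$, and that the cross-correlation is the plain unnormalised sum $c_{AB}(d) = \sum_n I_A(n) I_B(n+d)$; the slides do not state the exact definition or any normalisation, and the function name `cross_correlation` is hypothetical.

```python
import numpy as np


def cross_correlation(ind_a, ind_b):
    """Return (lags, c) with c[k] = sum_n ind_a[n] * ind_b[n + lags[k]].

    Assumption: unnormalised cross-correlation of two binary indicator
    sequences; the slides do not specify a normalisation.
    """
    a = np.asarray(ind_a, dtype=float)
    b = np.asarray(ind_b, dtype=float)
    # np.correlate(b, a, "full") covers lags -(len(a) - 1) .. len(b) - 1.
    c = np.correlate(b, a, mode="full")
    lags = np.arange(-(len(a) - 1), len(b))
    return lags, c


# Toy dialogue: A and B alternate in turns of 5 frames each.
turn = 5
ind_a = np.array(([1] * turn + [0] * turn) * 2)
ind_b = 1 - ind_a
lags, c = cross_correlation(ind_a, ind_b)
print("strongest lag:", lags[np.argmax(c)])

# Toy non-dialogue: B is never active, so the cross-correlation is flat.
_, c_flat = cross_correlation(ind_a, np.zeros_like(ind_a))
print("max without dialogue:", c_flat.max())
```

In the toy dialogue the cross-correlation peaks at a lag equal to the turn length, while in the scene without a second active person it stays at zero; a detector could threshold such peaks, although the threshold choice is outside what the slides describe.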
Speaker Turn Detection
Audio segmentation aims at finding acoustic events within an audio stream. Speaker turn detection is a special case of speaker segmentation.
It is an important step in the pre-processing of speech in order to implement audio indexing or speaker tracking.
Usually, no prior knowledge about speakers is assumed.
[Figure: audio stream segments labelled Speaker 1 and Speaker 2]
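Model-based segmentation: DISTBIC
Contrast the hypothesis of no speaker turn, $Z \sim \mathcal{N}(\mu_Z, \Sigma_Z)$, against the hypothesis of a speaker turn at $i$, $X \sim \mathcal{N}(\mu_X, \Sigma_X)$ and $Y \sim \mathcal{N}(\mu_Y, \Sigma_Y)$, with $N_X$ vectors in $X$, $N_Y$ vectors in $Y$, and $N_Z = N_X + N_Y$.

$$ML(i) = \frac{N_Z}{2}\log|\Sigma_Z| - \frac{N_X}{2}\log|\Sigma_X| - \frac{N_Y}{2}\log|\Sigma_Y|$$

BIC criterion:

$$BIC(i) = ML(i) - \lambda P > 0 \;\Rightarrow\; \text{speaker turn}$$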
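The BIC test above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes full-covariance Gaussians estimated by maximum likelihood, the commonly used penalty $P = \frac{1}{2}\left(p + \frac{p(p+1)}{2}\right)\log N_Z$ for feature dimension $p$ (the slide does not spell out $P$), and a penalty weight `lam` that defaults to 1. The function names are hypothetical, and the distance-based pre-selection of candidate points that DISTBIC performs before the BIC test is not shown.

```python
import numpy as np


def log_det_cov(frames):
    """Log-determinant of the maximum-likelihood covariance of the frames."""
    cov = np.atleast_2d(np.cov(frames, rowvar=False, bias=True))
    _, logdet = np.linalg.slogdet(cov)
    return logdet


def bic_score(z, i, lam=1.0):
    """BIC(i) = ML(i) - lam * P for a candidate turn at frame i of window z.

    z is an (N_Z, p) array of feature vectors; X = z[:i], Y = z[i:].
    Assumption: P = 0.5 * (p + p * (p + 1) / 2) * log(N_Z).
    """
    x, y = z[:i], z[i:]
    n_z, dim = z.shape
    ml = 0.5 * (n_z * log_det_cov(z)
                - len(x) * log_det_cov(x)
                - len(y) * log_det_cov(y))
    penalty = 0.5 * (dim + 0.5 * dim * (dim + 1)) * np.log(n_z)
    return ml - lam * penalty


rng = np.random.default_rng(0)
same_speaker = rng.normal(0.0, 1.0, size=(400, 2))
two_speakers = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
                          rng.normal(5.0, 1.0, size=(200, 2))])

# A positive score suggests a speaker turn at the candidate frame.
print("homogeneous window:", bic_score(same_speaker, 200) > 0)
print("window with a turn:", bic_score(two_speakers, 200) > 0)
```

The synthetic 2-D Gaussian "features" stand in for real acoustic feature vectors; in practice each candidate $i$ needs enough frames on both sides for the covariance estimates to be well conditioned.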
Frontal face images at quartet and octet resolution
[Figure: Original Image, Quartet Image, Octet Image]
Face detection based on corners
The figures show the 3 possible feature point set configurations, having 100 feature points each. They differ in the minimum distance allowed between the feature points. In general, small inter-feature-point distances yield a feature point concentration and poor face detection. The minimum allowed distance is a parameter of the training procedure.
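To illustrate how a minimum-distance constraint spreads feature points out, here is a small sketch. It is not the training procedure from the slides, which is not described in detail: it assumes that each candidate point carries a score (for example a corner strength) and greedily keeps the highest-scoring candidates that lie at least `min_distance` away from all points kept so far. The function name and parameters are hypothetical.

```python
import numpy as np


def select_feature_points(points, scores, num_points, min_distance):
    """Greedily pick up to num_points candidates, best score first,
    rejecting any candidate closer than min_distance to a kept point."""
    points = np.asarray(points, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    kept = []
    for idx in order:
        if all(np.linalg.norm(points[idx] - points[j]) >= min_distance
               for j in kept):
            kept.append(idx)
        if len(kept) == num_points:
            break
    return points[np.array(kept, dtype=int)]


# Candidate points on a 20 x 20 grid with random (made-up) scores.
rng = np.random.default_rng(0)
grid = np.array([(r, c) for r in range(20) for c in range(20)])
scores = rng.random(len(grid))

dense = select_feature_points(grid, scores, num_points=10, min_distance=1)
spread = select_feature_points(grid, scores, num_points=10, min_distance=4)
print(len(dense), len(spread))
```

With a small `min_distance` the selected points may cluster wherever the scores happen to be high, which corresponds to the feature point concentration mentioned above; a larger value forces a more even spatial coverage.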
Face detection Receiver Operating Characteristic (ROC) curves
• For the SVM-based face detection, the best results were obtained with the sigmoidal kernel. Best equal error rate: 4.5%.
• The maximum likelihood detection commits a few false alarms. For FAR in [5.2%, 5.67%], the FRR drops quickly from 6.1% to 0.7%.
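The quantities quoted above (FAR, FRR, equal error rate) can be computed from detector scores as in the following sketch. It assumes that a higher score means "more face-like", that FAR is the fraction of non-face samples accepted, and that FRR is the fraction of face samples rejected; the function names and the toy scores are made up for illustration and are not data from the slides.

```python
import numpy as np


def far_frr(face_scores, nonface_scores, threshold):
    """False acceptance and false rejection rates at a given threshold."""
    far = np.mean(np.asarray(nonface_scores) >= threshold)
    frr = np.mean(np.asarray(face_scores) < threshold)
    return far, frr


def equal_error_rate(face_scores, nonface_scores):
    """Approximate EER: sweep the observed scores as thresholds and take
    the operating point where FAR and FRR are closest to each other."""
    thresholds = np.unique(np.concatenate([face_scores, nonface_scores]))
    rates = [far_frr(face_scores, nonface_scores, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return 0.5 * (far + frr)


# Toy scores, for illustration only.
faces = np.array([0.9, 0.8, 0.7, 0.4])
nonfaces = np.array([0.6, 0.3, 0.2, 0.1])
print("EER:", equal_error_rate(faces, nonfaces))
```

Sweeping the threshold and plotting FRR against FAR yields the ROC curve; the equal error rate is the point on that curve where the two error rates coincide.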
One/Two Speaker Detection
Two-speaker detection (NIST 2002): Best EER 16.2%
Kajarekar, Adami, Hermansky, 2003
One-speaker detection (NIST 2002): Best EER 7.1%
Frontal face authentication