Macquarie RT05s Speaker Diarisation System

Steve Cassidy

Centre for Language Technology

Macquarie University

Sydney

© 19 Apr 2023 Macquarie University

System Goals

• Develop a simple end-to-end system for the SPKR task

• Platform for experimentation

• Improve on RT04s system

Overall Results

[Figure: overall results by meeting source (AMI, CMU, ICSI, NIST, VT)]

System Overview

Pipeline: Feature Extraction → Segmentation → Turn Clustering → Speaker ID

• Single Distant Microphone

• Implemented in C and Tcl

• Runs in around 6x real time on single AMD64

• Developed with RT04 devtest data

– No AMI or VT data seen before eval

Feature Extraction

• 26 coefficients:

– 12 MFCC

– RMS Energy

– Delta Coefficients

• 10ms frame rate, 25.6ms window

• Mean subtraction based on mean of first 60 seconds of file

• Uses the KTH Snack toolkit
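The mean subtraction step can be sketched as follows. This is a minimal illustration in Python rather than the system's own C/Tcl code; the function name `subtract_initial_mean` and its parameters are hypothetical. At a 10ms frame rate, the first 60 seconds correspond to 6000 frames.

```python
def subtract_initial_mean(frames, frame_rate_hz=100, seconds=60.0):
    """Subtract the per-coefficient mean of the first `seconds` of a file.

    `frames` is a list of feature vectors (lists of floats). The default of
    100 frames per second matches a 10ms frame rate. Files shorter than
    `seconds` simply use every available frame for the mean.
    """
    if not frames:
        return []
    n_init = max(1, min(len(frames), int(round(seconds * frame_rate_hz))))
    dim = len(frames[0])
    # Mean of each coefficient over the initial stretch only.
    mean = [sum(f[d] for f in frames[:n_init]) / n_init for d in range(dim)]
    # The same mean is removed from every frame in the file.
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


# Toy example: 1 frame per second, mean taken over the first 2 seconds.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
normalised = subtract_initial_mean(toy, frame_rate_hz=1, seconds=2)
```

Estimating the mean from a fixed initial stretch keeps the normalisation causal, at the cost of ignoring later changes in channel conditions.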

Speech Activity Detection (SAD)

• Goal: find obvious regions of non-speech for gross segmentation of recording

• GMMs for speech and non-speech

– Speech model: 32 mixtures

– Non-speech model: 8 mixtures

• Trained on RT04s devtest data set

– Reference labels generated from speaker labelling

– Ignored silence regions < 0.3s

• Evaluate frame classification error (%):

Dataset        NSPER  SPER
RT04s unseen   32     19
RT05s          47     15
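The slides do not define NSPER and SPER. The sketch below assumes NSPER is the percentage of reference non-speech frames labelled as speech and SPER is the percentage of reference speech frames labelled as non-speech. The function name and label strings are hypothetical.

```python
def frame_error_rates(reference, hypothesis):
    """Return (NSPER, SPER) in percent under the assumed definitions.

    NSPER: reference non-speech frames classified as speech.
    SPER:  reference speech frames classified as non-speech.
    Both inputs are equal-length sequences of "speech" / "nonspeech".
    """
    ns_total = ns_wrong = sp_total = sp_wrong = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref == "speech":
            sp_total += 1
            sp_wrong += hyp != "speech"
        else:
            ns_total += 1
            ns_wrong += hyp == "speech"
    nsper = 100.0 * ns_wrong / ns_total if ns_total else 0.0
    sper = 100.0 * sp_wrong / sp_total if sp_total else 0.0
    return nsper, sper


ref = ["speech"] * 4 + ["nonspeech"] * 2
hyp = ["speech", "speech", "speech", "nonspeech", "speech", "nonspeech"]
rates = frame_error_rates(ref, hyp)
```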

• SAD is performed by classifying successive windows of 10 frames using the GMM models

• Consecutive regions are merged and labelled

• Non-speech < 0.35s merged with following segment

• Speech < 0.15s merged with following non-speech
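The merging rules above can be sketched as follows, assuming each 10-frame window has already been labelled by the GMMs. The function names are hypothetical. Applying the non-speech rule before the speech rule is an assumption taken from the order of the bullets.

```python
FRAME_S = 0.01       # 10ms frame rate
WINDOW_FRAMES = 10   # each classified window covers 10 frames (0.1s)


def merge_runs(labels, window_s=FRAME_S * WINDOW_FRAMES):
    """Collapse per-window labels into [label, start_s, end_s] segments."""
    segments = []
    for i, lab in enumerate(labels):
        start = i * window_s
        if segments and segments[-1][0] == lab:
            segments[-1][2] = start + window_s
        else:
            segments.append([lab, start, start + window_s])
    return segments


def absorb_short(segments, label, min_dur):
    """Give segments of `label` shorter than `min_dur` the label of the
    segment that follows, then merge neighbours that now share a label."""
    out = []
    for i, (lab, start, end) in enumerate(segments):
        has_next = i + 1 < len(segments)
        if lab == label and end - start < min_dur and has_next:
            lab = segments[i + 1][0]
        if out and out[-1][0] == lab:
            out[-1][2] = end
        else:
            out.append([lab, start, end])
    return out


def smooth_sad(labels):
    """Apply both duration rules to a sequence of window labels."""
    segs = merge_runs(labels)
    segs = absorb_short(segs, "nonspeech", 0.35)  # short gaps become speech
    segs = absorb_short(segs, "speech", 0.15)     # short bursts become non-speech
    return segs


windows = (["speech"] * 5 + ["nonspeech"] * 2 + ["speech"] * 5
           + ["nonspeech"] * 6 + ["speech"] * 1 + ["nonspeech"] * 5)
smoothed = smooth_sad(windows)
```

In this toy input the 0.2s pause is absorbed into the surrounding speech and the isolated 0.1s speech window is absorbed into the non-speech around it, leaving one speech segment followed by one non-speech segment.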

• Evaluation

– Frame classification error

– Boundaries missed: nothing within 0.5s

– Boundaries inserted inside real segments

Meeting      Frame Error     Boundary Error
                             % Miss   % FP   # Auto
CMU 1415     89     7        91       77     45
ICSI 1100    99     4        85       88     99
NIST 0939    71     9        83       84     97
AMI 1206     43     18       25       79     348
VT 1430      100    0        99       50     2
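A boundary evaluation of this kind can be sketched as follows. The 0.5s tolerance comes from the slide. Two things are assumptions: an automatic boundary counts as inserted when it has no reference boundary within the same tolerance, and each rate is normalised by the number of boundaries in its own list. The function name is hypothetical.

```python
def boundary_errors(reference, automatic, tolerance_s=0.5):
    """Count missed and inserted boundaries (times in seconds).

    Missed:   a reference boundary with no automatic boundary within
              `tolerance_s`.
    Inserted: an automatic boundary with no reference boundary within
              `tolerance_s` (assumed reading of "inside real segments").
    """
    missed = sum(
        1 for r in reference
        if not any(abs(r - a) <= tolerance_s for a in automatic)
    )
    inserted = sum(
        1 for a in automatic
        if not any(abs(a - r) <= tolerance_s for r in reference)
    )
    return {
        "missed": missed,
        "inserted": inserted,
        "miss_pct": 100.0 * missed / len(reference) if reference else 0.0,
        "fp_pct": 100.0 * inserted / len(automatic) if automatic else 0.0,
    }


result = boundary_errors([1.0, 5.0, 9.0, 12.0], [1.2, 3.0, 9.4])
```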

Turn Segmentation

• Speech regions are segmented using the Bayesian Information Criterion (BIC)

• Compare the fit of a single Gaussian model of the sequence with a pair of models, one on each side of a candidate break

• Fixed windows of 200 frames advanced over speech region

• Peaks in delta BIC curve indicate change points
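The delta BIC comparison can be sketched as follows. To stay dependency-free this sketch uses diagonal-covariance Gaussians, which is an assumption; the slides do not state the covariance type. The penalty weight, the `margin` parameter and all function names are hypothetical. A positive score at a candidate frame favours the two-model hypothesis, so the highest positive value in a window marks a likely change point.

```python
import math
import random


def log_det_diag(frames, floor=1e-6):
    """Log-determinant of a diagonal covariance fitted to `frames`."""
    n, dim = len(frames), len(frames[0])
    total = 0.0
    for d in range(dim):
        col = [f[d] for f in frames]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        total += math.log(max(var, floor))
    return total


def delta_bic(frames, split, penalty_weight=1.0):
    """Delta BIC for one model versus two models split at `split`."""
    n, dim = len(frames), len(frames[0])
    left, right = frames[:split], frames[split:]
    gain = 0.5 * (n * log_det_diag(frames)
                  - len(left) * log_det_diag(left)
                  - len(right) * log_det_diag(right))
    # The extra model adds `dim` means and `dim` variances.
    penalty = 0.5 * penalty_weight * (2 * dim) * math.log(n)
    return gain - penalty


def best_split(frames, margin=10):
    """Return (score, frame index) of the delta BIC peak in one window."""
    if len(frames) < 2 * margin:
        raise ValueError("window too short for the requested margin")
    candidates = range(margin, len(frames) - margin + 1)
    return max((delta_bic(frames, t), t) for t in candidates)


# Synthetic 200-frame window whose statistics change at frame 100.
rng = random.Random(0)
window = ([[rng.gauss(0.0, 1.0)] for _ in range(100)]
          + [[rng.gauss(5.0, 1.0)] for _ in range(100)])
score, change_frame = best_split(window)
```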

[Figure: turn segmentation boundary errors per meeting, % Error (0 to 100) for FP and Miss: CMU/98, ICSI/198, NIST/257, AMI/427, VT/168]

Turn Clustering

• Given a set of speaker turns, find natural clusters

• Number of clusters unknown

• Requires:

– Distance metric on speaker turns

– Clustering algorithm

– Cluster evaluation metric

Speaker Similarity

Mean + variance of feature vectors

K-L distance metric
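One way to turn the mean and variance of two turns into a distance is the Kullback-Leibler divergence between diagonal Gaussians. Plain K-L divergence is not symmetric, so this sketch sums both directions, which is an assumption; the slides do not say which variant was used. The function names are hypothetical.

```python
import math


def turn_stats(frames, floor=1e-6):
    """Per-coefficient mean and (floored) variance of one speaker turn."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, floor)
        for d in range(dim)
    ]
    return means, variances


def kl_diag(p, q):
    """KL(p || q) for diagonal Gaussians given as (means, variances)."""
    (mean_p, var_p), (mean_q, var_q) = p, q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def symmetric_kl(p, q):
    """Symmetrised K-L distance: KL(p || q) + KL(q || p)."""
    return kl_diag(p, q) + kl_diag(q, p)


# Two unit-variance turns whose means differ by 1.
a = ([0.0], [1.0])
b = ([1.0], [1.0])
distance = symmetric_kl(a, b)
```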

• Implementation:

– Select segments longer than 1.5s for clustering

– KL distance on mean/variance of features

– Hierarchical clustering

– Select labellings for 2, 3…N speakers

– Cluster evaluation performed after speaker ID
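The hierarchical clustering step can be sketched as bottom-up merging over a precomputed distance matrix, recording a labelling each time the cluster count falls within 2…N. Average linkage is an assumption, since the slides do not name a linkage rule, and the function name `agglomerate` is hypothetical.

```python
def agglomerate(dist, max_speakers):
    """Agglomerative clustering over a symmetric distance matrix.

    Returns {k: labels} for every k from 2 up to `max_speakers`, where
    labels[i] is the cluster index of turn i when k clusters remain.
    """
    n = len(dist)
    clusters = [[i] for i in range(n)]
    labellings = {}

    def linkage(a, b):
        # Average pairwise distance between members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        if 2 <= len(clusters) <= max_speakers:
            labels = [0] * n
            for cid, members in enumerate(clusters):
                for m in members:
                    labels[m] = cid
            labellings[len(clusters)] = labels
        # Merge the closest pair of clusters.
        _, x, y = min(
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return labellings


# Four toy turns on a line; distances are absolute differences.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
labellings = agglomerate(dist, max_speakers=3)
```

Keeping every labelling from 2 to N speakers defers the choice of speaker count to a later evaluation stage.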

Speaker ID

• Use cluster labelled turns to train speaker models

– 32 mixture GMM

• Now classify and re-label all speaker turns

• Potentially correct poor clustering decisions

• Very small amounts of data to support models
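The train-then-relabel loop can be sketched as follows. To keep the example short, a single diagonal Gaussian per speaker stands in for the 32 mixture GMM, which is a deliberate simplification and not the system's model. The function names and toy data are hypothetical.

```python
import math


def fit_gaussian(frames, floor=1e-3):
    """Diagonal Gaussian (means, variances) fitted to pooled frames."""
    n, dim = len(frames), len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in frames) / n, floor)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(frames, model):
    """Total log-likelihood of `frames` under a diagonal Gaussian."""
    means, variances = model
    total = 0.0
    for f in frames:
        for x, m, v in zip(f, means, variances):
            total += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return total


def relabel_turns(turns, labels):
    """Train one model per cluster label, then re-classify every turn."""
    pooled = {}
    for frames, lab in zip(turns, labels):
        pooled.setdefault(lab, []).extend(frames)
    models = {lab: fit_gaussian(frames) for lab, frames in pooled.items()}
    return [
        max(models, key=lambda lab: log_likelihood(frames, models[lab]))
        for frames in turns
    ]


# The last turn is acoustically close to "s2" but was clustered as "s1".
turns = [
    [[0.0], [0.2], [-0.1]],
    [[0.1], [-0.2], [0.0]],
    [[5.0], [5.2], [4.9]],
    [[5.1], [4.8], [5.0]],
    [[4.9], [5.1], [5.0]],
]
labels = ["s1", "s1", "s2", "s2", "s1"]
relabelled = relabel_turns(turns, labels)
```

In this toy case the re-classification moves the misclustered turn to `s2`, which illustrates how the speaker ID pass can correct a poor clustering decision.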

Overall Results

[Figure: overall results by meeting source (AMI, CMU, ICSI, NIST, VT)]

What Didn’t Work

• Inter-channel phase and level differences

• Exemplar speaker models

• SVD based turn clustering

– Find similar groups by factoring the distance matrix

– One product of SVD is a number of clusters
