Iowa State University Developmental Robotics Laboratory

Unsupervised Segmentation of Audio Speech using the Voting Experts Algorithm

Matthew Miller, Alexander Stoytchev
Developmental Robotics Lab
Department of Electrical and Computer Engineering
Iowa State University
[email protected], [email protected]
www.cs.iastate.edu/~mamille/



Page 2

Language: A Grand Challenge
• A working example
• Automatically acquires language
• Well studied

Page 3

Statistical Learning Experiments

• Saffran et al. (1996): 8-month-olds can segment speech.

Artificial Language: tupiro golabu bedaku padoti

Language:        tu  pi  ro  go  la  bu  be  da  ku
Transition Prob:   1.0 1.0 .25 1.0 1.0 .25 1.0 1.0 ...

[Figure labels: Acclimate, Novel Word]

• Hypothesis: Infants use local minima in single syllable transition probabilities to segment speech streams.
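
The hypothesis above can be illustrated with a minimal sketch. It estimates syllable-to-syllable transition probabilities from a stream built out of the four artificial words and breaks the stream at strict local minima. The word order, the function names `transition_probs` and `segment_at_minima`, and the strict-minimum rule are assumptions made for illustration; the slides do not give the actual stream. With this particular toy order the between-word transitions come out at 1/3 rather than the .25 shown on the slide.

```python
from collections import Counter


def transition_probs(syllables):
    """Estimate Pr(next syllable | current syllable) for each adjacent pair."""
    pairs = list(zip(syllables, syllables[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(syllables[:-1])
    return [pair_counts[(a, b)] / first_counts[a] for a, b in pairs]


def segment_at_minima(syllables, probs):
    """Break the stream wherever a transition probability is a strict local minimum."""
    words, current = [], [syllables[0]]
    for i, p in enumerate(probs):
        left = probs[i - 1] if i > 0 else float("inf")
        right = probs[i + 1] if i + 1 < len(probs) else float("inf")
        if p < left and p < right:
            words.append("".join(current))
            current = []
        current.append(syllables[i + 1])
    words.append("".join(current))
    return words


# Hypothetical word order; the source does not specify the real stream.
order = ["tupiro", "golabu", "bedaku", "padoti", "tupiro", "bedaku", "golabu",
         "padoti", "golabu", "tupiro", "padoti", "bedaku", "tupiro"]
# Each word is three two-letter syllables.
syllables = [w[i:i + 2] for w in order for i in range(0, 6, 2)]
probs = transition_probs(syllables)
words = segment_at_minima(syllables, probs)
```

Within-word transitions are always 1.0 in this stream, so every dip marks a word boundary and `words` reproduces `order`. Real speech would be far noisier than this toy case.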

Page 4

Voting Experts

• An algorithm for unsupervised segmentation
• Key Idea: Natural “chunks” have:
  – Low Internal Information
  – High Boundary Entropy

itwasabrightcolddayinaprilandtheclockswere

$I(\text{"bright"}) = -\log(\Pr(\text{"bright"}))$

$I(\text{"bright"}) < I(\text{"rightc"})$

Page 5

Voting Experts

• An algorithm for unsupervised segmentation
• Key Idea: Natural “chunks” have:
  – Low Internal Information
  – High Boundary Entropy

itwasabrightcolddayinaprilandtheclockswere

$E(\text{"bright"}) = -\sum_{c} \Pr(c \mid \text{"bright"}) \log(\Pr(c \mid \text{"bright"}))$

$E(\text{"bright"}) > E(\text{"brigh"})$
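
As a rough illustration of the two quantities behind the key idea, the sketch below estimates internal information and boundary entropy directly from substring counts in a tiny corpus. The corpus string and the function names are hypothetical, and probabilities are taken as raw relative frequencies among n-grams of the same length, which is an assumption rather than something the slides specify.

```python
import math
from collections import Counter


def internal_information(chunk, text):
    """I(chunk) = -log Pr(chunk), with Pr estimated among n-grams of the same length."""
    n = len(chunk)
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    # Raises ValueError if the chunk never occurs in the text (log of zero).
    return -math.log(grams[chunk] / sum(grams.values()))


def boundary_entropy(chunk, text):
    """Entropy of the distribution over the symbol that follows each occurrence of chunk."""
    n = len(chunk)
    followers = Counter(text[i + n] for i in range(len(text) - n)
                        if text.startswith(chunk, i))
    total = sum(followers.values())
    return -sum((c / total) * math.log(c / total) for c in followers.values())


# Hypothetical toy corpus, chosen so that "the" behaves like a natural chunk.
text = "thecatthedogthecowthepig"
i_chunk = internal_information("the", text)      # frequent substring
i_straddle = internal_information("hec", text)   # straddles a word boundary
e_chunk = boundary_entropy("the", text)          # followed by c, d, c, p
e_partial = boundary_entropy("th", text)         # always followed by "e"
```

In this corpus the chunk "the" has lower internal information than the straddling string "hec", and higher boundary entropy than the partial chunk "th", mirroring the two comparisons on the slides.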

Page 6

VE Implementation (Cohen 2006)

1. Build an n-gram trie from text.
2. Slide a window along the text sequence.
3. Two experts vote how to break the window:
   1. One minimizes internal info
   2. Other maximizes boundary entropy

i t w a s a b r i g h t c o l d d a y i n a p r i l

[Figure label: Window]

Expert 1: $\min_{s.t = \text{window}} [I(s) + I(t)]$

Example split: $I(\text{"as"})$, $I(\text{"abrig"})$

Page 7

VE Implementation (Cohen 2006)

1. Build an n-gram trie from text.
2. Slide a window along the text sequence.
3. Two experts vote how to break the window:
   1. One minimizes internal info
   2. Other maximizes boundary entropy

i t w a s a b r i g h t c o l d d a y i n a p r i l

[Figure label: Window]

Expert 2: $\max_{s.t = \text{window}} [E(s)]$

Example split: $E(\text{"asa"})$
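
The sliding-window voting described in steps 1 to 3 can be sketched as follows. This is a simplified illustration, not the implementation from Cohen (2006): a flat `Counter` stands in for the n-gram trie, the scores are raw and unnormalized, and the name `cast_votes` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def cast_votes(text, window_len):
    """Return votes[k], the number of votes for a break just before text[k]."""
    # Counts of every n-gram up to the window length (a flat stand-in for the trie).
    counts = Counter(text[i:i + n]
                     for n in range(1, window_len + 1)
                     for i in range(len(text) - n + 1))

    def info(s):
        # Internal information: -log of the relative frequency among same-length n-grams.
        return -math.log(counts[s] / (len(text) - len(s) + 1))

    def entropy(s):
        # Boundary entropy: entropy of the symbol following each occurrence of s.
        nxt = Counter(text[i + len(s)] for i in range(len(text) - len(s))
                      if text.startswith(s, i))
        total = sum(nxt.values())
        return -sum(c / total * math.log(c / total) for c in nxt.values())

    votes = [0] * (len(text) + 1)
    for start in range(len(text) - window_len + 1):
        window = text[start:start + window_len]
        splits = range(1, window_len)  # candidate break positions inside the window
        # Expert 1: split s.t that minimizes I(s) + I(t).
        k_info = min(splits, key=lambda k: info(window[:k]) + info(window[k:]))
        # Expert 2: split whose left part s has maximal boundary entropy.
        k_entropy = max(splits, key=lambda k: entropy(window[:k]))
        votes[start + k_info] += 1
        votes[start + k_entropy] += 1
    return votes


# Hypothetical toy corpus; window_len must be at least 2.
votes = cast_votes("thecatthedogthecowthepig", window_len=4)
```

Each window contributes exactly two votes, one per expert, so positions covered by many windows can accumulate several votes. Ties are resolved by taking the first candidate split, which is an arbitrary choice in this sketch.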

Page 8

VE Implementation (Cohen 2006)

1. Build an n-gram trie from text.
2. Slide a window along the text sequence.
3. Two experts vote how to break the window:
   1. One minimizes internal info
   2. Other maximizes boundary entropy
4. Break at vote peaks

i t w a s a b r i g h t c o l d d a y i n a p r i l

Votes per boundary:
i | t | w | a | s | a | b | r | i | g | h | t | c | o | l | d
  0   3   1   0   3   2   0   1   1   0   0   6   1   0   0
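
Step 4 can be sketched as a search for strict local maxima in the vote tally. The vote list below is read off this slide under the assumption that the numbers align left to right with the gaps between letters; the `threshold` parameter and the function name `break_at_peaks` are hypothetical.

```python
def break_at_peaks(text, votes, threshold=0):
    """Split text at gaps whose vote count is a strict local maximum above threshold.

    votes[i] is the tally for the gap between text[i] and text[i + 1].
    """
    pieces, start = [], 0
    for i, v in enumerate(votes):
        left = votes[i - 1] if i > 0 else -1
        right = votes[i + 1] if i + 1 < len(votes) else -1
        if v > threshold and v > left and v > right:
            pieces.append(text[start:i + 1])
            start = i + 1
    pieces.append(text[start:])
    return pieces


text = "itwasabrightcold"
# Assumed alignment of the slide's vote counts with the 15 gaps between letters.
votes = [0, 3, 1, 0, 3, 2, 0, 1, 1, 0, 0, 6, 1, 0, 0]
pieces = break_at_peaks(text, votes)
# pieces == ["it", "was", "abright", "cold"]
```

Under this reading the peaks of 3, 3, and 6 votes fall after "it", "was", and "abright". The two adjacent gaps with one vote each are not strict peaks, so no break is placed there.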

Page 9

VE Results
• Results are surprisingly good on text
  – Especially given its simplicity
  – Accuracy and Hit rate about 75%

• Seems to capture something about the nature of “chunks”

• Can we use this algorithm to segment real audio?

It was a br igh t

Pages 10-13

Acoustic Model

• Cluster spectral features using a GGSOM
• Collapse state sequence
• Run VE to get breaks
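
The "collapse" step can be sketched as follows, assuming it means merging runs of identical consecutive cluster labels so that VE sees one symbol per run. That reading, the function name `collapse`, and the example labels are assumptions; the slides do not spell out the procedure. The sketch also records where each run starts so that breaks found on the collapsed sequence could be mapped back to audio frames.

```python
from itertools import groupby


def collapse(states):
    """Merge runs of identical consecutive states; also return each run's first frame index."""
    collapsed, starts, frame = [], [], 0
    for state, run in groupby(states):
        collapsed.append(state)
        starts.append(frame)
        frame += sum(1 for _ in run)  # advance by the length of this run
    return collapsed, starts


# Hypothetical per-frame cluster labels from the acoustic model.
frame_states = [3, 3, 3, 7, 7, 2, 2, 2, 2, 3]
symbols, starts = collapse(frame_states)
# symbols == [3, 7, 2, 3], starts == [0, 3, 5, 9]
```

A break placed before `symbols[k]` would then correspond to frame `starts[k]` in the original recording.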

Page 14

Experiments and Results
• Used the model to segment “1984”
  – CD 1 of audio book (40 mins)
  – Chosen for length, consistency
  – Evaluation: Human graders

Page 15

New Experiments
• Trained on infant datasets
• Tested on manually generated keys

Stream A: tupiro golabu bedaku padoti
Stream B: dapiku tilado pagotu burobi

[Diagram labels: Train, Test; Acoustic Model A, Acoustic Model B; VE Model A, VE Model B; Key A, Key B]

Page 16

New Experiments
• Trained on infant datasets
• Tested on manually generated keys

Stream A: tupiro golabu bedaku padoti
Stream B: dapiku tilado pagotu burobi

[Diagram labels: Test; Acoustic Model A, Acoustic Model B; VE Model A, VE Model B; Key B, Key A]

Page 17

Results
• Experiment 1
  – Accuracy: 50% on all induced breaks
  – Hit Rate: 75% of word breaks
  – Significantly better than chance
• Experiment 2
  – Accuracy: 16% on all induced breaks
  – Hit Rate: 1% of word breaks
  – Worse than chance
  – 18 breaks, 3 correct
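
One plausible reading of the two metrics is sketched below: accuracy as the fraction of induced breaks that coincide with a true word break, and hit rate as the fraction of true word breaks that were induced. Exact position matching, the function name, and the example positions are assumptions; the slides do not state how a break was judged correct.

```python
def accuracy_and_hit_rate(induced_breaks, true_breaks):
    """Return (accuracy, hit_rate) for two collections of break positions."""
    induced, true = set(induced_breaks), set(true_breaks)
    correct = len(induced & true)
    accuracy = correct / len(induced) if induced else 0.0  # correct / all induced breaks
    hit_rate = correct / len(true) if true else 0.0        # correct / all true word breaks
    return accuracy, hit_rate


# Hypothetical break positions, not data from the experiments.
induced = [2, 5, 9, 12]
true = [2, 5, 7, 12, 16, 20]
accuracy, hit_rate = accuracy_and_hit_rate(induced, true)
# accuracy == 0.75, hit_rate == 0.5
```

Under this definition, 18 induced breaks with 3 correct give 3/18 ≈ 0.167, in line with the roughly 16% accuracy listed for Experiment 2.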

Page 18

Conclusions and Future Work
• VE Model can be used to segment audio
• Can reproduce the results of infant studies
• May model part of the human chunking mechanism
• Have built more sophisticated acoustic models
  – Better results (nearly perfect)

Page 19

Thank You
• www.cs.iastate.edu/~mamille/