Iowa State University, Developmental Robotics Laboratory
Unsupervised Segmentation of Audio Speech using the Voting Experts Algorithm
Matthew Miller, Alexander Stoytchev
Developmental Robotics Lab
Department of Electrical and Computer Engineering
Iowa State University
[email protected], [email protected]
www.cs.iastate.edu/~mamille/
Language: A Grand Challenge
• A working example
• Automatically acquires language
• Well studied
Statistical Learning Experiments
• Saffran et al. (1996): 8-month-olds can segment speech.
Artificial language: tupiro golabu bedaku padoti
Syllable stream: tu pi ro go la bu be da ku ...
Transition probabilities (between adjacent syllables): 1.0 1.0 .25 1.0 1.0 .25 1.0 1.0 ...
[Figure labels: Acclimate, Novel Word]
• Hypothesis: Infants use local minima in single syllable transition probabilities to segment speech streams.
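The hypothesis can be illustrated with a minimal sketch that estimates syllable transition probabilities from a stream and breaks it at local minima. The word order in `ORDER`, the two-letter syllabification, and all function names are assumptions made for illustration, not taken from the study. With this short hypothetical stream the between-word probability comes out at 1/3 rather than the .25 shown on the slide.

```python
from collections import Counter

WORDS = ["tupiro", "golabu", "bedaku", "padoti"]   # artificial language from the slide
ORDER = [0, 1, 2, 3, 0, 2, 1, 3, 1, 0, 3, 2, 0]   # hypothetical word order (assumption)

def syllabify(word):
    """Split a word into two-letter syllables (an assumption that fits these words)."""
    return [word[i:i + 2] for i in range(0, len(word), 2)]

def make_stream(order=ORDER):
    """Concatenate the words into one unbroken syllable stream."""
    return [s for k in order for s in syllabify(WORDS[k])]

def transition_probs(syllables):
    """Estimate Pr(next | current) for every adjacent syllable pair."""
    pairs = list(zip(syllables, syllables[1:]))
    pair_counts = Counter(pairs)
    first_counts = Counter(a for a, _ in pairs)
    return [pair_counts[p] / first_counts[p[0]] for p in pairs]

def segment_at_minima(syllables):
    """Break the stream wherever the transition probability is a strict local minimum."""
    probs = transition_probs(syllables)
    words, start = [], 0
    for i in range(1, len(probs) - 1):
        if probs[i] < probs[i - 1] and probs[i] < probs[i + 1]:
            # probs[i] is the transition from syllable i to syllable i + 1
            words.append("".join(syllables[start:i + 1]))
            start = i + 1
    words.append("".join(syllables[start:]))
    return words

print(segment_at_minima(make_stream()))
```

On this stream the within-word transitions are all 1.0, so every word boundary is a strict local minimum and the original words are recovered.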
Voting Experts
• An algorithm for unsupervised segmentation
• Key Idea: Natural “chunks” have:
– Low Internal Information
– High Boundary Entropy
itwasabrightcolddayinaprilandtheclockswere
Internal information: $I(\text{"bright"}) = -\log \Pr(\text{"bright"})$
$I(\text{"bright"}) < I(\text{"rightc"})$
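Boundary entropy: $E(\text{"bright"}) = -\sum_{c} \Pr(c \mid \text{"bright"}) \log \Pr(c \mid \text{"bright"})$, where $c$ ranges over the characters that follow "bright"
$E(\text{"bright"}) > E(\text{"brigh"})$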
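Both quantities can be computed from plain n-gram counts. The sketch below is a minimal illustration, assuming probabilities are estimated as relative frequencies among n-grams of the same length. The toy text and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def ngram_stats(text, max_n):
    """Count every n-gram up to max_n and the characters that follow each one."""
    counts, followers = Counter(), defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            counts[gram] += 1
            if i + n < len(text):
                followers[gram][text[i + n]] += 1
    return counts, followers

def internal_information(gram, counts, text_len):
    """I(w) = -log Pr(w), with Pr(w) the relative frequency among same-length n-grams."""
    total = text_len - len(gram) + 1
    return -math.log(counts[gram] / total)

def boundary_entropy(gram, followers):
    """E(w) = -sum_c Pr(c | w) log Pr(c | w) over the characters c that follow w."""
    nxt = followers[gram]
    total = sum(nxt.values())
    return -sum((c / total) * math.log(c / total) for c in nxt.values())

text = "thecatthedogthecowthehen"   # toy text (assumption)
counts, followers = ngram_stats(text, 3)
print(internal_information("the", counts, len(text)),
      internal_information("hec", counts, len(text)))
print(boundary_entropy("the", followers), boundary_entropy("th", followers))
```

In the toy text the chunk "the" has lower internal information than the straddling string "hec", and higher boundary entropy than its prefix "th", which is always followed by "e".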
VE Implementation (Cohen 2006)
1. Build an n-gram trie from text.
2. Slide a window along the text sequence
3. Two experts vote how to break the window
   1. One minimizes internal info
   2. Other maximizes boundary entropy
i t w a s a b r i g h t c o l d d a y i n a p r i l
[Figure: window sliding over the text]
Expert 1: $\min_{s \cdot t = \text{window}} [I(s) + I(t)]$, e.g. $I(\text{"as"}) + I(\text{"abrig"})$
Expert 2: $\max_{s \cdot t = \text{window}} [E(s)]$, e.g. $E(\text{"asa"})$
4. Break at vote peaks
i t w a s a b r i g h t c o l d d a y i n a p r i l
Votes for a break at each boundary:
i | t | w | a | s | a | b | r | i | g | h | t | c | o | l | d
  0   3   1   0   3   2   0   1   1   0   0   6   1   0   0
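The four steps can be put together in a minimal sketch. It is a simplification under stated assumptions, not the Cohen (2006) implementation: a flat dictionary of counts stands in for the n-gram trie, scores are not standardized by n-gram length, and the `window` and `threshold` defaults are hypothetical. The vote vector in the usage lines is the one from the slide, assuming each value belongs to the boundary after the corresponding character.

```python
import math
from collections import Counter, defaultdict

def build_counts(text, max_n):
    """Step 1: count n-grams and their following characters (stand-in for the trie)."""
    counts, followers = Counter(), defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            gram = text[i:i + n]
            counts[gram] += 1
            if i + n < len(text):
                followers[gram][text[i + n]] += 1
    return counts, followers

def cast_votes(text, window=4):
    """Steps 2 and 3: slide a window; each expert votes for one split per window."""
    counts, followers = build_counts(text, window)

    def info(gram):      # internal information I(w) = -log Pr(w)
        return -math.log(counts[gram] / (len(text) - len(gram) + 1))

    def entropy(gram):   # boundary entropy E(w) over the characters that follow w
        nxt = followers[gram]
        total = sum(nxt.values())
        return -sum(c / total * math.log(c / total) for c in nxt.values())

    votes = [0] * (len(text) + 1)   # votes[p] = votes for a break before text[p]
    for start in range(len(text) - window + 1):
        win = text[start:start + window]
        splits = range(1, window)
        best_info = min(splits, key=lambda k: info(win[:k]) + info(win[k:]))
        best_ent = max(splits, key=lambda k: entropy(win[:k]))
        votes[start + best_info] += 1
        votes[start + best_ent] += 1
    return votes

def vote_peaks(votes, threshold=0):
    """Step 4: positions whose vote count is a strict local maximum above threshold."""
    return [p for p in range(1, len(votes) - 1)
            if votes[p] > threshold and votes[p] > votes[p - 1] and votes[p] > votes[p + 1]]

def split_at(text, cuts):
    """Cut the text at the given positions."""
    bounds = [0] + list(cuts) + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]

# Peak picking on the vote counts shown on the slide (padded with 0 at both ends).
slide_votes = [0] + [0, 3, 1, 0, 3, 2, 0, 1, 1, 0, 0, 6, 1, 0, 0] + [0]
print(split_at("itwasabrightcold", vote_peaks(slide_votes)))
```

With strict local maxima, the slide's votes yield breaks after "it", "was", and "abright". The two adjacent 1-vote boundaries inside "bright" form a plateau and are not treated as peaks here; a different tie rule or threshold would change that.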
VE Results
• Results are surprisingly good on text
– Especially given its simplicity
– Accuracy and Hit rate about 75%
• Seems to capture something about the nature of “chunks”
• Can we use this algorithm to segment real audio?
[Figure labels: It was a br igh t]
Acoustic Model
• Cluster spectral features using a GGSOM
• Collapse state sequence
• Run VE to get breaks
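The collapse step can be sketched as follows, assuming each audio frame has already been assigned a cluster label (a "state") and that collapsing means merging runs of identical consecutive states. The function name, the example labels, and the idea of keeping run start frames to map breaks back to the audio are assumptions for illustration. The clustering itself is not shown.

```python
def collapse_states(states):
    """Merge runs of identical consecutive states.

    Returns the collapsed sequence and, for each collapsed state,
    the index of the first frame of its run.
    """
    collapsed, starts = [], []
    for frame, state in enumerate(states):
        if not collapsed or state != collapsed[-1]:
            collapsed.append(state)
            starts.append(frame)
    return collapsed, starts

# Hypothetical per-frame cluster labels.
frames = [3, 3, 3, 7, 7, 2, 2, 2, 2, 7]
collapsed, starts = collapse_states(frames)
print(collapsed)   # [3, 7, 2, 7]
print(starts)      # [0, 3, 5, 9]

# A break placed before collapsed position 2 maps back to frame starts[2].
print(starts[2])   # 5
```

The collapsed sequence is what a segmenter such as VE would read as its "text"; the start indices let an induced break be converted back to a frame position.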
Experiments and Results
• Used the model to segment “1984”
– CD 1 of audio book (40 mins)
– Chosen for length, consistency
– Evaluation: Human graders
New Experiments
• Trained on infant datasets
• Tested on manually generated keys
Stream A: tupiro golabu bedaku padoti
Stream B: dapiku tilado pagotu burobi
[Diagram labels: Train, Test; Acoustic Model A, Acoustic Model B; VE Model A, VE Model B; Key A, Key B]
[Diagram labels: Test; Acoustic Model A, Acoustic Model B; VE Model A, VE Model B; Key B, Key A]
Results
• Experiment 1
– Accuracy: 50% on all induced breaks
– Hit Rate: 75% of word breaks
– Significantly better than chance
• Experiment 2
– Accuracy: 16% on all induced breaks
– Hit Rate: 1% of word breaks
– Worse than chance
– 18 breaks, 3 correct
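The two measures can be sketched as below, assuming accuracy is the fraction of induced breaks that are correct and hit rate is the fraction of true word breaks that were found. Break positions are compared by exact match here, which is an assumption; scoring audio would need some tolerance. The function name and the example positions are hypothetical.

```python
def score_breaks(induced, true):
    """Return (accuracy, hit_rate) for induced break positions against a key."""
    induced, true = set(induced), set(true)
    correct = induced & true
    accuracy = len(correct) / len(induced) if induced else 0.0
    hit_rate = len(correct) / len(true) if true else 0.0
    return accuracy, hit_rate

# Hypothetical positions: 3 of 4 induced breaks are correct, 3 of 4 true breaks are found.
print(score_breaks([2, 5, 12, 14], [2, 5, 6, 12]))   # (0.75, 0.75)

# Experiment 2 reports 18 induced breaks with 3 correct.
print(3 / 18)   # about 0.167, in line with the reported 16% accuracy
```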
Conclusions and Future Work
• VE Model can be used to segment audio
• Can reproduce the results of infant studies
• May model part of the human chunking mechanism
• Have built more sophisticated acoustic models
– Better results (nearly perfect)
Thank You
• www.cs.iastate.edu/~mamille/