You’re Not From ‘Round Here, Are You? Naïve Bayes Detection of Non-native Utterance Text
Laura Mayfield Tomokiyo
Rosie Jones
Carnegie Mellon University
Overview
Motivation
Speech data
Accent detection as document classification
Classification performance
Discriminative tokens
Conclusions
Non-native speech recognition
The warship U.S.S. Jarrett has pulled into port in San Diego, CA after training voyage
Native recognizer (word accuracy = 26.7):
Tomorrow CPU a sister at has spilled into port and sandy and afford after a training wage
Non-native recognizer (word accuracy = 73.3):
The worst eighty U.S.S. chart has pulled into port in San Diego California after training warrior
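The slides report word accuracy without defining it. A minimal sketch, assuming the common definition (reference length minus word-level edit distance, divided by reference length, as a percentage) and simple whitespace tokenization; the function name `word_accuracy` is hypothetical, and the recognizer's actual scoring (for example, how "U.S.S." or "CA" is tokenized) may differ:

```python
def word_accuracy(reference, hypothesis):
    """Percent word accuracy: 100 * (N - edit_distance) / N, with N = reference length.

    Assumes a non-empty reference and whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # match or substitution
        prev = curr
    return 100.0 * (len(ref) - prev[-1]) / len(ref)


# Toy fragment: three substitutions and one deletion against six reference words.
acc = word_accuracy("pulled into port in San Diego", "spilled into port and sandy")
```

Because insertions count as errors, this measure can fall below zero when the hypothesis is much longer than the reference.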
Motivation
Practical: can we detect non-native users with enough accuracy to switch acoustic models?
Exploratory: how well does an algorithm based only on text features work? What tokens are discriminative for non-native speakers?
Speech examples
Read speech
Over the next two months, public officials, Native American leaders, businesses and environmental groups will come up with plans for meeting the law’s requirements.
Spontaneous speech
I like to have anything very special in Boston, very native in Boston.
Local specialties
Speech data
                  Read speech                                           Spontaneous speech
Native language   Speaker count   Utterance count   Word count (types)   Speaker count   Utterance count   Word count (types)
Japanese          10              957               15868 (3195)         31              1685              15934 (826)
English           8               756               10237 (2073)         6               320               4117 (418)
Mandarin          ---             ---               ---                  6               374               3490 (391)
Transcripts and hypotheses
Classification based on transcripts:
A safety net for the salmons
Environment= environmentalists…
•Usually gives a good idea of gold standard
•Finds true differences in linguistic usage
Classification based on hypotheses:
A safety net forced simon
Um environmental activists…
•Implicitly models acoustics
•Benefits from amplified difference between native and non-native samples
“A safety net for salmon: environmentalists, the government, and ordinary folks team up to save the Northwest’s wondrous wild salmon”
Related work
Acoustic feature based accent discrimination (e.g. Fung and Liu 1999)
Competing HMM based accent discrimination (e.g. Teixeira et al. 1996)
Classification of documents according to style (Argamon-Engelson et al. 1998), author (Mosteller and Wallace 1964)
Accent detection as document classification
Native speaker utterances
Non-native speaker utterances
Classifier
Accent detection as document classification
Classifier
Test speaker utterances
Classification decision: native or non-native?
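The train-then-classify flow above can be sketched with a small multinomial naïve Bayes classifier. This is an illustrative sketch only, not the classifier used in the experiments: the class name `NaiveBayes`, the add-one smoothing, the choice to ignore unseen tokens, and the toy utterances are all assumptions made for the example.

```python
import math
from collections import Counter


def tokens(utterance):
    """Word unigrams plus adjacent-word bigrams (joined with ';')."""
    words = utterance.lower().split()
    return words + [a + ";" + b for a, b in zip(words, words[1:])]


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed choice)."""

    def fit(self, utterances, labels):
        self.counts = {}              # label -> Counter of token counts
        self.docs = Counter(labels)   # label -> number of training utterances
        for text, label in zip(utterances, labels):
            self.counts.setdefault(label, Counter()).update(tokens(text))
        self.vocab = set()
        for counter in self.counts.values():
            self.vocab.update(counter)
        return self

    def log_scores(self, utterance):
        total_docs = sum(self.docs.values())
        scores = {}
        for label, counter in self.counts.items():
            denom = sum(counter.values()) + len(self.vocab)
            score = math.log(self.docs[label] / total_docs)   # class prior
            for tok in tokens(utterance):
                if tok in self.vocab:   # tokens never seen in training are skipped
                    score += math.log((counter[tok] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, utterance):
        scores = self.log_scores(utterance)
        return max(scores, key=scores.get)


# Invented toy data, for illustration only.
train_texts = ["habitats and rivers", "nineteen hundreds now",
               "the the habitat and", "in in the river"]
train_labels = ["native", "native", "non-native", "non-native"]
clf = NaiveBayes().fit(train_texts, train_labels)
```

To classify a speaker rather than a single utterance, one option is to sum the per-utterance log scores over all of that speaker's utterances before taking the maximum.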
Experimental methodology
Rainbow naïve Bayes classifier
Both word and part-of-speech tokens were examined
Classification based on token unigrams and bigrams
No feature selection initially
Stopwords were not excluded from feature set
Data randomly split into 30% testing, 70% training data for evaluation; evaluation repeated 20 times and classification results averaged
Utterances from the same speaker never appeared in both training and test sets
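One way to honor the speaker constraint is to draw the random split over speakers rather than over utterances. The sketch below assumes that approach; the slides do not say how the split was implemented, so the function names, the `(speaker_id, text, label)` tuple layout, and the `train_and_score` callback are hypothetical. Splitting by speaker yields roughly, not exactly, 30% of the utterances in the test set, and nothing here guarantees that every class appears in both sets.

```python
import random


def speaker_disjoint_split(utterances, test_fraction=0.3, rng=None):
    """Split (speaker_id, text, label) tuples so no speaker is in both sets."""
    rng = rng or random.Random()
    speakers = sorted({spk for spk, _, _ in utterances})
    rng.shuffle(speakers)
    n_test = max(1, round(test_fraction * len(speakers)))
    test_speakers = set(speakers[:n_test])
    train = [u for u in utterances if u[0] not in test_speakers]
    test = [u for u in utterances if u[0] in test_speakers]
    return train, test


def repeated_evaluation(utterances, train_and_score, trials=20, seed=0):
    """Average the score returned by train_and_score over random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        train, test = speaker_disjoint_split(utterances, 0.3, rng)
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)


# Invented toy corpus: 10 speakers with 3 utterances each.
corpus = [("spk%d" % s, "utterance %d" % u, "native" if s % 2 else "non-native")
          for s in range(10) for u in range(3)]
train_set, test_set = speaker_disjoint_split(corpus, 0.3, random.Random(1))
```

Here `train_and_score` stands for any function that trains a classifier on the first argument and returns its accuracy on the second.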
Classification of spontaneous speech (transcripts only)
[Bar chart: classification accuracy (0 to 100) for Baseline, Word, POS, and POSNoun features on the tasks Native/Japanese, Native/Chinese, Japanese/Chinese, Native/Non-native, and Native/Japanese/Chinese]
Classification of read speech
[Bar chart: classification accuracy (0 to 100) against baseline for Word-trans, POS-trans, Word-hypo, and POS-hypo in condition A]
A train: same texts, test: same texts
Classification of read speech
[Bar chart: classification accuracy (0 to 100) against baseline for trans-word, trans-POS, hypo-word, and hypo-POS in conditions A, B, C, and D]
A train: same texts, test: same texts
B train: disjoint texts, test: disjoint texts
C train: disjoint texts, test: same texts
D train: same texts, test: disjoint texts
Classification of read speech
[Bar chart: classification accuracy (0 to 100) against baseline for trans-word, trans-POS, hypo-word, and hypo-POS in condition B]
B train: disjoint texts, test: disjoint texts
Feature Selection
Method               Number of features   Accuracy
None                 4087                 47
IG-524               524                  69
SMART-524            524                  88
IG-200               200                  74
SMART-524, IG-200    200                  88
IG-70                70                   70
M&W-70               70                   87
IG-48                48                   74
SMART-48             48                   84
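The slides do not define the method labels. Assuming that IG-n means keeping the n tokens with the highest information gain, a minimal sketch of that ranking follows. It treats each utterance as a set of tokens and scores each token by how much knowing its presence or absence reduces the entropy of the class label. The function names and toy data are hypothetical, and the toolkit's actual computation may differ.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a collection of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(docs, labels, token):
    """Entropy reduction in the label from knowing whether token is present.

    docs is a list of token sets, aligned with labels.
    """
    base = entropy(Counter(labels).values())
    present = Counter(l for d, l in zip(docs, labels) if token in d)
    absent = Counter(l for d, l in zip(docs, labels) if token not in d)
    conditional = 0.0
    for part in (present, absent):
        size = sum(part.values())
        if size:
            conditional += size / len(docs) * entropy(part.values())
    return base - conditional


def top_features(docs, labels, k):
    """The k tokens with highest information gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Invented toy data, for illustration only.
docs = [{"the", "salmon"}, {"the", "habitats"}, {"the", "in"}, {"the", "in", "that"}]
labels = ["native", "native", "non-native", "non-native"]
best = top_features(docs, labels, 1)
```

In this toy set, "in" separates the two classes perfectly and scores 1 bit, while "the" occurs in every document and scores 0.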
Discriminative sequences
Speech type   Token type   Native            Non-native
Read          Word         NMFS              the + the
                           the               that
Read          POS          noun(pl)          noun(sing)
                           noun(pl)          verb(past)
Spontaneous   Word         Wonderland        the
Spontaneous   POS          TO + verb(base)   noun(sing)
Spontaneous   POSNoun      am                noun(sing)
transcriptions hypotheses
Conclusions
Transcriptions of spontaneous speech can be classified with high accuracy for both 2-way and 3-way distinctions
Read speech samples, which are simple transformations of native-produced text, can be classified with high accuracy
Recognizer output is classified more accurately than transcripts
Future directions
Incorporating the classification decision in acoustic model selection
Minimizing the number of samples from the test speaker needed for classification
Applying classification to parsing grammar selection, language model construction, writer identification
Discriminative POS sequences
Native                 Non-native
Noun(pl)               Noun(sing)
Determiner             Preposition
Noun(pl);preposition   Preposition;preposition
Adjective;noun(pl)     Noun(sing);noun(sing)
Gerund;particle        Particle;preposition
Noun(s);verb(3s)       Cardinal#;cardinal#
Noun(pl);modal         Verb(past)
Discriminative word sequences
Native              Non-native
NMFS                the;the
the;NMFS            in;in
nineteen;hundreds   the
hundreds;now        in
hundreds            that
habitats;and        habitat;and
Phone-based classification
[Bar chart, condition B: classification accuracy (0 to 100) for Words and Phones, using token identity versus POS/phone class]
Discriminative tokens
                 Native   Non-native
Phone identity   //       /I/
Phone class      CCC      V