Post on 21-Dec-2015
![Page 1: You’re Not From ‘Round Here, Are You? Naïve Bayes Detection of Non-native Utterance Text Laura Mayfield Tomokiyo Rosie Jones Carnegie Mellon University](https://reader030.vdocument.in/reader030/viewer/2022032521/56649d5a5503460f94a3a723/html5/thumbnails/1.jpg)
You’re Not From ‘Round Here, Are You? Naïve Bayes Detection of Non-native Utterance Text
Laura Mayfield Tomokiyo
Rosie Jones
Carnegie Mellon University
Overview
• Motivation
• Speech data
• Accent detection as document classification
• Classification performance
• Discriminative tokens
• Conclusions
Non-native speech recognition
The warship U.S.S. Jarrett has pulled into port in San Diego, CA after training voyage
Native recognizer (word accuracy = 26.7):
Tomorrow CPU a sister at has spilled into port and sandy and afford after a training wage
Non-native recognizer (word accuracy = 73.3):
The worst eighty U.S.S. chart has pulled into port in San Diego California after training warrior
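The word accuracy figures above can be illustrated with a minimal sketch, assuming the usual definition: reference length minus substitutions, deletions, and insertions, divided by reference length. The function names are hypothetical, and the slides do not state how the transcripts were tokenized or normalized (for example "CA" versus "California"), so this is not guaranteed to reproduce the numbers shown.

```python
def word_edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions and insertions
    needed to turn the reference token list into the hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy in percent; assumes whitespace tokenization and
    case-insensitive matching (both are assumptions of this sketch)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 100.0 * (len(ref) - word_edit_distance(ref, hyp)) / len(ref)
```

Because insertions count as errors, this measure can drop below zero when the hypothesis is much longer than the reference.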
Motivation
Practical
• Can we detect non-native users with enough accuracy to switch acoustic models?
Exploratory
• How well does an algorithm based only on text features work?
• What tokens are discriminative for non-native speakers?
Speech examples
Read speech: Over the next two months, public officials, Native American leaders, businesses and environmental groups will come up with plans for meeting the law’s requirements.
Spontaneous speech: I like to have anything very special in Boston, very native in Boston.
Gloss: local specialties
Speech data

| Native language | Read: speakers | Read: utterances | Read: words (types) | Spontaneous: speakers | Spontaneous: utterances | Spontaneous: words (types) |
|---|---|---|---|---|---|---|
| Japanese | 10 | 957 | 15868 (3195) | 31 | 1685 | 15934 (826) |
| English | 8 | 756 | 10237 (2073) | 6 | 320 | 4117 (418) |
| Mandarin | --- | --- | --- | 6 | 374 | 3490 (391) |
Transcripts and hypotheses
Source text: “A safety net for salmon: environmentalists, the government, and ordinary folks team up to save the Northwest’s wondrous wild salmon”
Transcript: A safety net for the salmons / Environment= environmentalists…
Hypothesis: A safety net forced simon / Um environmental activists…
Classification based on transcripts:
• Usually gives a good idea of gold standard
• Finds true differences in linguistic usage
Classification based on hypotheses:
• Implicitly models acoustics
• Benefits from amplified difference between native and non-native samples
Related work
• Acoustic-feature-based accent discrimination (e.g. Fung and Liu 1999)
• Competing-HMM-based accent discrimination (e.g. Teixeira et al. 1996)
• Classification of documents according to style (Argamon-Engelson et al. 1998) and author (Mosteller and Wallace 1964)
Accent detection as document classification
[Diagram: native speaker utterances and non-native speaker utterances feed into the classifier]
Accent detection as document classification
[Diagram: test speaker utterances feed into the classifier, which outputs a classification decision: native or non-native?]
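The document-classification framing can be sketched with a small multinomial naïve Bayes classifier. This is a generic illustration with add-one smoothing over word unigrams and bigrams, not the Rainbow implementation used in the talk; the class name, the `alpha` parameter, and the toy utterances in the usage note are all assumptions made for the example.

```python
import math
from collections import Counter


def tokens(utterance):
    """Word unigrams plus bigrams (bigrams joined with ';')."""
    words = utterance.lower().split()
    bigrams = [a + ";" + b for a, b in zip(words, words[1:])]
    return words + bigrams


class NaiveBayes:
    """Multinomial naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # smoothing constant (assumed value)
        self.token_counts = {}        # label -> Counter of token counts
        self.doc_counts = Counter()   # label -> number of training utterances
        self.vocab = set()

    def fit(self, labelled_utterances):
        for label, utterance in labelled_utterances:
            toks = tokens(utterance)
            self.token_counts.setdefault(label, Counter()).update(toks)
            self.doc_counts[label] += 1
            self.vocab.update(toks)
        return self

    def log_scores(self, utterances):
        """Unnormalized log posterior of each label for a speaker's utterances."""
        toks = [t for u in utterances for t in tokens(u)]
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.token_counts.items():
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(self.doc_counts[label] / total_docs)
            for t in toks:
                if t in self.vocab:   # tokens never seen in training are skipped
                    score += math.log((counts[t] + self.alpha) / denom)
            scores[label] = score
        return scores

    def classify(self, utterances):
        scores = self.log_scores(utterances)
        return max(scores, key=scores.get)
```

A usage example on invented toy data: after `NaiveBayes().fit([("native", "salmon need clean rivers"), ("non-native", "the the salmon is in in river")])`, calling `classify(["in in the the river"])` pools the tokens of all utterances passed in and returns the label with the highest score.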
Experimental methodology
• Rainbow naïve Bayes classifier
• Both word and part-of-speech tokens were examined
• Classification based on token unigrams and bigrams
• No feature selection initially
• Stopwords were not excluded from the feature set
• Data randomly split into 30% testing and 70% training data for evaluation; evaluation repeated 20 times and classification results averaged
• Utterances from the same speaker never appeared in both training and test sets
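The evaluation protocol above can be sketched as a repeated, speaker-disjoint random split. The slides do not say whether the 30/70 split was drawn over speakers or over utterances; this sketch assumes it is drawn over speakers, which is one simple way to guarantee that no speaker appears on both sides. The function names and the `evaluate` callback are hypothetical.

```python
import random


def speaker_disjoint_split(utterances_by_speaker, test_fraction=0.3, rng=None):
    """Split (speaker, utterance) pairs so that no speaker is in both sets.

    Assumption of this sketch: the fraction applies to speakers, not utterances.
    """
    rng = rng or random.Random()
    speakers = sorted(utterances_by_speaker)
    rng.shuffle(speakers)
    n_test = max(1, round(test_fraction * len(speakers)))
    test_speakers = set(speakers[:n_test])
    train, test = [], []
    for speaker in speakers:
        target = test if speaker in test_speakers else train
        target.extend((speaker, u) for u in utterances_by_speaker[speaker])
    return train, test


def repeated_evaluation(utterances_by_speaker, evaluate, repeats=20, seed=0):
    """Average the score returned by evaluate(train, test) over random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        train, test = speaker_disjoint_split(utterances_by_speaker, rng=rng)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

A fuller version would probably stratify the split by class so that every test set contains both native and non-native speakers; that is left out here to keep the sketch short.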
Classification of spontaneous speech (transcripts only)
[Bar chart: classification accuracy (0 to 100) for Baseline, Word, POS, and POSNoun tokens on five tasks: Native/Japanese, Native/Chinese, Japanese/Chinese, Native/Non-native, and Native/Japanese/Chinese]
Classification of read speech
[Bar chart: classification accuracy (0 to 100) in condition A for Word-trans, POS-trans, Word-hypo, and POS-hypo tokens, with baseline]
A: train: same texts; test: same texts
Classification of read speech
[Bar chart: classification accuracy (0 to 100) in conditions A to D for trans-word, trans-POS, hypo-word, and hypo-POS tokens, with baseline]
A: train: same texts; test: same texts
B: train: disjoint texts; test: disjoint texts
C: train: disjoint texts; test: same texts
D: train: same texts; test: disjoint texts
Classification of read speech
[Bar chart: classification accuracy (0 to 100) in condition B for trans-word, trans-POS, hypo-word, and hypo-POS tokens, with baseline]
A: train: same texts; test: same texts
B: train: disjoint texts; test: disjoint texts
C: train: disjoint texts; test: same texts
D: train: same texts; test: disjoint texts
Feature Selection

| Method | Number of features | Accuracy (%) |
|---|---|---|
| None | 4087 | 47 |
| IG-524 | 524 | 69 |
| SMART-524 | 524 | 88 |
| IG-200 | 200 | 74 |
| SMART-524, IG-200 | 200 | 88 |
| IG-70 | 70 | 70 |
| M&W-70 | 70 | 87 |
| IG-48 | 48 | 74 |
| SMART-48 | 48 | 84 |
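The IG rows presumably refer to selecting the top-ranked features by information gain, a common criterion in text classification. The sketch below shows one standard formulation, using token presence or absence per document; whether the talk used exactly this variant is not stated, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(documents, labels, token):
    """Reduction in class entropy from knowing whether `token` occurs.

    documents: list of token collections; labels: parallel class labels.
    """
    with_tok, without_tok = Counter(), Counter()
    for doc, label in zip(documents, labels):
        (with_tok if token in doc else without_tok)[label] += 1
    n = len(labels)
    h_class = entropy(Counter(labels).values())
    h_conditional = 0.0
    for part in (with_tok, without_tok):
        size = sum(part.values())
        if size:
            h_conditional += size / n * entropy(part.values())
    return h_class - h_conditional


def top_k_by_information_gain(documents, labels, k):
    """Keep the k tokens with the highest information gain."""
    vocab = sorted({t for doc in documents for t in doc})
    ranked = sorted(vocab,
                    key=lambda t: information_gain(documents, labels, t),
                    reverse=True)
    return ranked[:k]
```

A token that occurs in every document of one class and in none of the other reaches the maximum gain, which for two equally frequent classes is 1 bit.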
Discriminative sequences

| Speech type | Token type | Native | Non-native |
|---|---|---|---|
| Read | Word | NMFS | the + the |
| | | the | that |
| Read | POS | noun(pl) | noun(sing) |
| | | noun(pl) | verb(past) |
| Spontaneous | Word | Wonderland | the |
| Spontaneous | POS | TO + verb(base) | noun(sing) |
| Spontaneous | POSNoun | am | noun(sing) |

Labels on slide: transcriptions, hypotheses
Conclusions
Transcriptions of spontaneous speech can be classified with high accuracy for both 2-way and 3-way distinctions
Read speech samples, which are simple transformations of native-produced text, can be classified with high accuracy
Recognizer output is classified more accurately than transcripts
Future directions
Incorporating the classification decision in acoustic model selection
Minimizing the number of samples from the test speaker needed for classification
Applying classification to parsing grammar selection, language model construction, writer identification
Discriminative POS sequences

| Native | Non-native |
|---|---|
| Noun(pl) | Noun(sing) |
| Determiner | Preposition |
| Noun(pl);preposition | Preposition;preposition |
| Adjective;noun(pl) | Noun(sing);noun(sing) |
| Gerund;particle | Particle;preposition |
| Noun(s);verb(3s) | Cardinal#;cardinal# |
| Noun(pl);modal | Verb(past) |
Discriminative word sequences

| Native | Non-native |
|---|---|
| NMFS | the;the |
| the;NMFS | in;in |
| nineteen;hundreds | the |
| hundreds;now | in |
| hundreds | that |
| habitats;and | habitat;and |
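The slides do not say how the discriminative tokens were ranked. One common way to produce such lists, shown here purely as an assumed illustration, is to sort tokens by the smoothed log-odds of their class-conditional probabilities. The function name, the smoothing constant `alpha`, and the toy token lists in the usage note are hypothetical.

```python
import math
from collections import Counter


def log_odds_ranking(native_tokens, nonnative_tokens, alpha=1.0):
    """Rank tokens from most native-leaning to most non-native-leaning.

    The score is log P(token | native) - log P(token | non-native),
    with additive smoothing so unseen tokens keep a finite score.
    """
    nat, non = Counter(native_tokens), Counter(nonnative_tokens)
    vocab = set(nat) | set(non)
    nat_total = sum(nat.values()) + alpha * len(vocab)
    non_total = sum(non.values()) + alpha * len(vocab)
    scores = {
        t: math.log((nat[t] + alpha) / nat_total)
           - math.log((non[t] + alpha) / non_total)
        for t in vocab
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, `log_odds_ranking(["nmfs", "nmfs", "the", "habitats"], ["the", "the", "the;the", "in"])` puts `nmfs` at the top and gives negative scores to tokens that occur mostly on the non-native side.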
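Phone-based classification
[Bar chart, condition B: classification accuracy (0 to 100) for words and phones, using token identity versus POS/phone class]
Discriminative tokens

| Token type | Native | Non-native |
|---|---|---|
| Phone identity | // | /I/ |
| Phone class | CCC | V |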