TRANSCRIPT
Source: people.ischool.berkeley.edu/~dbamman/nlpF18/slides/5_truth_ethics.pdf
Natural Language Processing
Info 159/259
Lecture 5: Truth and ethics (Sept 6, 2018)
David Bamman, UC Berkeley
Hwæt! Wé Gárdena in géardagum, þéodcyninga þrym gefrúnon, hú ðá æþelingas ellen fremedon. Oft Scyld Scéfing sceaþena
(the opening lines of Beowulf, in Old English)
A window-based feedforward network over the token sequence x1…x7 = "I hated it I really hated it":

h1 = σ(x1W1 + x2W2 + x3W3) = f(I, hated, it)
h2 = σ(x3W1 + x4W2 + x5W3) = f(it, I, really)
h3 = σ(x5W1 + x6W2 + x7W3) = f(really, hated, it)

[Figure: each window of three token vectors is multiplied by position-specific weight matrices W1, W2, W3 and passed through a nonlinearity to produce hidden vectors h1, h2, h3.]
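The window computation above can be sketched in a few lines of NumPy; the dimensions (4-dim token embeddings, 5-dim hidden units) and the random weights are illustrative, not taken from the slide:

```python
import numpy as np

# Window-based feedforward layer: windows of width 3 with stride 2,
# as in the slide: (x1,x2,x3), (x3,x4,x5), (x5,x6,x7)
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))                        # token vectors x1..x7
W1, W2, W3 = (rng.normal(size=(4, 5)) for _ in range(3))

H = np.array([sigmoid(X[s] @ W1 + X[s + 1] @ W2 + X[s + 2] @ W3)
              for s in (0, 2, 4)])
print(H.shape)   # (3, 5): one hidden vector per window
```

Each window position gets its own weight matrix, so the same token (e.g., x3) can contribute differently depending on where it sits in the window.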
Convolutional networks

[Figure: a convolution applies one filter to sliding windows over inputs x1…x7, producing a sequence of values (here 1, 10, 2, -1, 5, 10); max pooling then keeps the maximum, 10.]

This defines one filter.
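A minimal sketch of one filter followed by max pooling; the input values and filter weights below are made up for illustration:

```python
import numpy as np

# One convolutional filter of width 3 slid over a sequence of 7 inputs,
# followed by max pooling over the resulting windows.
x = np.array([1.0, 10.0, 2.0, -1.0, 5.0, 10.0, 3.0])   # x1..x7
w = np.array([0.5, -0.2, 0.3])                          # one filter, width 3

# Convolution: the same weights w applied to every window (stride 1)
conv = np.array([x[i:i + 3] @ w for i in range(len(x) - 2)])

# Max pooling: collapse the variable-length sequence to a single feature
feature = conv.max()
```

Because pooling discards position, the filter detects whether its pattern occurs anywhere in the input, not where.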
Zhang and Wallace 2016, “A Sensitivity Analysis of (and Practitioners’ Guide to)
Convolutional Neural Networks for Sentence Classification”
Modern NLP is driven by annotated data
• Penn Treebank (1993; 1995; 1999): morphosyntactic annotations of WSJ
• OntoNotes (2007–2013): syntax, predicate-argument structure, word sense, coreference
• FrameNet (1998–): frame-semantic lexica/annotations
• MPQA (2005): opinion/sentiment
• SQuAD (2016): annotated questions + spans of answers in Wikipedia
• In most cases, the data we have is the product of human judgments.
• What’s the correct part of speech tag?
• Syntactic structure?
• Sentiment?
Modern NLP is driven by annotated data
Ambiguity
“One morning I shot an elephant in my pajamas”
Animal Crackers
Dogmatism
Fast and Horvitz (2016), “Identifying Dogmatism in Social Media: Signals and Models”
Sarcasm
https://www.nytimes.com/2016/08/12/opinion/an-even-stranger-donald-trump.html?ref=opinion
Pustejovsky and Stubbs (2012), Natural Language Annotation for Machine Learning
Annotation pipeline
Annotation Guidelines
• Our goal: given the constraints of our problem, how can we formalize our description of the annotation process to encourage multiple annotators to provide the same judgment?
Annotation guidelines
• What is the goal of the project?
• What is each tag called and how is it used? (Be specific: provide examples, and discuss gray areas.)
• What parts of the text do you want annotated, and what should be left alone?
• How will the annotation be created? (For example, explain which tags or documents to annotate first, how to use the annotation tools, etc.)
Pustejovsky and Stubbs (2012), Natural Language Annotation for Machine Learning
Practicalities
• Annotation takes time and concentration (annotators can't do it 8 hours a day)
• Annotators get better as they annotate (earlier annotations are not as good as later ones)
Why not do it yourself?
• Expensive/time-consuming
• Multiple people provide a measure of consistency: is the task well enough defined?
• Low agreement can mean the annotators need more training, the guidelines aren't well enough defined, or the task itself is flawed
Adjudication
• Adjudication is the process of deciding on a single annotation for a piece of text, using information about the independent annotations.
• It can be as time-consuming as a primary annotation, or more so.
• The adjudicated annotation does not need to be identical to either primary annotation (both annotators can be wrong by chance).
Interannotator agreement
Confusion matrix (rows: annotator B; columns: annotator A):

                puppy   fried chicken
puppy             6           3
fried chicken     2           5

observed agreement = (6 + 5)/16 = 11/16 = 68.75%

(Image: https://twitter.com/teenybiscuit/status/705232709220769792/photo/1)
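The observed agreement is just the diagonal of the confusion matrix over the total count:

```python
import numpy as np

# Confusion matrix between the two annotators
# (rows: annotator B; columns: annotator A)
counts = np.array([[6, 3],
                   [2, 5]])

observed = np.trace(counts) / counts.sum()
print(observed)   # 11/16 = 0.6875
```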
Cohen's kappa
• If classes are imbalanced, we can get high interannotator agreement simply by chance

                puppy   fried chicken
puppy             7           4
fried chicken     8          81

(rows: annotator B; columns: annotator A)

κ = (po - pe) / (1 - pe)

With observed agreement po = (7 + 81)/100 = 0.88:

κ = (0.88 - pe) / (1 - pe)
Cohen's kappa
• The expected probability of agreement is how often we would expect two annotators to agree, assuming independent annotations:

pe = P(A = puppy, B = puppy) + P(A = chicken, B = chicken)
   = P(A = puppy)P(B = puppy) + P(A = chicken)P(B = chicken)
Cohen's kappa

pe = P(A = puppy)P(B = puppy) + P(A = chicken)P(B = chicken)

From the matrix above:

P(A = puppy)   = 15/100 = 0.15
P(B = puppy)   = 11/100 = 0.11
P(A = chicken) = 85/100 = 0.85
P(B = chicken) = 89/100 = 0.89

pe = 0.15 × 0.11 + 0.85 × 0.89 = 0.773
Cohen's kappa

κ = (po - pe) / (1 - pe) = (0.88 - 0.773) / (1 - 0.773) = 0.471
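The whole computation above transcribes into a small function:

```python
import numpy as np

# Cohen's kappa: observed agreement corrected for chance agreement,
# applied to the imbalanced puppy / fried-chicken matrix from the slide.
def cohens_kappa(counts):
    """counts[i, j]: items annotator B labeled class i and annotator A class j."""
    n = counts.sum()
    p_o = np.trace(counts) / n         # observed agreement
    p_a = counts.sum(axis=0) / n       # annotator A's marginals (columns)
    p_b = counts.sum(axis=1) / n       # annotator B's marginals (rows)
    p_e = (p_a * p_b).sum()            # expected agreement by chance
    return (p_o - p_e) / (1 - p_e)

counts = np.array([[7, 4],
                   [8, 81]])
kappa = cohens_kappa(counts)
print(round(kappa, 3))   # 0.471
```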
• “Good” values are subject to interpretation, but rule of thumb:
Cohen’s kappa
0.80-1.00 Very good agreement
0.60-0.80 Good agreement
0.40-0.60 Moderate agreement
0.20-0.40 Fair agreement
< 0.20 Poor agreement
                puppy   fried chicken
puppy             0           0
fried chicken     0         100

(rows: annotator B; columns: annotator A)
                puppy   fried chicken
puppy            50           0
fried chicken     0          50

(rows: annotator B; columns: annotator A)
                puppy   fried chicken
puppy             0          50
fried chicken    50           0

(rows: annotator B; columns: annotator A)
Interannotator agreement
• Cohen’s kappa can be used for any number of classes.
• Still requires two annotators who evaluate the same items.
• Fleiss’ kappa generalizes to multiple annotators, each of whom may evaluate different items (e.g., crowdsourcing)
Fleiss' kappa
• Same fundamental idea: measure the observed agreement relative to the agreement we would expect by chance.
• With N > 2 annotators, we calculate agreement among pairs of annotators.

κ = (Po - Pe) / (1 - Pe)
Fleiss' kappa

nij = number of annotators who assign category j to item i

Pi = 1/(n(n-1)) × Σj=1..K nij(nij - 1)

For item i with n annotations: how many of the n(n-1) possible pairs of annotators agree.
Fleiss' kappa

Example: item i with n = 4 annotations:

Annotator:  A  B  C  D
Label:      +  +  +  -

nij: three "+" labels and one "-" label (ni+ = 3, ni- = 1)

Agreeing pairs: A-B, B-A, A-C, C-A, B-C, C-B

Pi = 1/(4·3) × (3·2 + 1·0) = 6/12 = 0.5
Fleiss' kappa

Po = (1/N) Σi=1..N Pi        (average agreement over all items)

pj = 1/(Nn) Σi=1..N nij      (probability of category j)

Pe = Σj=1..K pj²             (expected agreement by chance: the joint probability that two raters pick the same label is the product of their independent probabilities of picking that label)
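The formulas above transcribe directly into code; the sketch below assumes every item receives the same number n of annotations:

```python
import numpy as np

# Fleiss' kappa for a matrix n_ij of per-item label counts.
def fleiss_kappa(n_ij):
    """n_ij[i, j]: number of annotators assigning category j to item i."""
    N, K = n_ij.shape
    n = n_ij[0].sum()
    P_i = (n_ij * (n_ij - 1)).sum(axis=1) / (n * (n - 1))   # per-item agreement
    P_o = P_i.mean()                                         # average over items
    p_j = n_ij.sum(axis=0) / (N * n)                         # category probabilities
    P_e = (p_j ** 2).sum()                                   # chance agreement
    return (P_o - P_e) / (1 - P_e)

# The single item from the slide (labels + + + -): n_ij = [3, 1]
item = np.array([3, 1])
P_i = (item * (item - 1)).sum() / (4 * 3)
print(P_i)   # 0.5
```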
Annotator bias correction
• Dawid, A. P. and Skene, A. M. (1979), "Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm," Journal of the Royal Statistical Society, 28(1):20–28.
• Wiebe et al. (1999), "Development and use of a gold-standard data set for subjectivity classifications," ACL (for sentiment).
• Carpenter (2010), "Multilevel Bayesian Models of Categorical Data Annotation."
• Snow, O'Connor, Jurafsky, and Ng (2008), "Cheap and Fast - But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks," EMNLP.
• Sheng et al. (2008), "Get another label? Improving data quality and data mining using multiple, noisy labelers," KDD.
• Raykar et al. (2009), "Supervised learning from multiple experts: whom to trust when everyone lies a bit," ICML.
• Hovy et al. (2013), "Learning Whom to Trust with MACE," NAACL.
Annotator bias correction

Confusion matrix for a single annotator (David), giving P(label | truth):

truth \ label   positive  negative  mixed  unknown
positive          0.95      0        0.03   0.02
negative          0         0.80     0.10   0.10
mixed             0.20      0.05     0.50   0.25
unknown           0.15      0.10     0.10   0.70
Annotator bias correction (Dawid and Skene 1979)

Basic idea: the true label is unobserved; what we observe are noisy judgments by annotators, each filtered through that annotator's confusion matrix P(label | truth).

[Figure: graphical model in which the latent truth generates each annotator's observed labels through their confusion matrix.]
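A compact sketch of the EM procedure this implies: alternate between inferring a posterior over each item's true label and re-estimating the class priors and per-annotator confusion matrices. The majority-vote initialization and the smoothing constant are my choices, not from Dawid and Skene's paper:

```python
import numpy as np

# EM for latent true labels with per-annotator confusion matrices.
def dawid_skene(labels, K, iters=20):
    """labels[i][a]: the category annotator a assigned to item i."""
    labels = np.asarray(labels)
    N, A = labels.shape
    T = np.zeros((N, K))                 # soft estimates of the true labels
    for a in range(A):
        T[np.arange(N), labels[:, a]] += 1
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: class priors and one smoothed confusion matrix per annotator
        pi = T.mean(axis=0)
        conf = np.full((A, K, K), 1e-6)
        for a in range(A):
            for i in range(N):
                conf[a, :, labels[i, a]] += T[i]
        conf /= conf.sum(axis=2, keepdims=True)
        # E-step: posterior over each item's true label
        log_post = np.tile(np.log(pi), (N, 1))
        for a in range(A):
            log_post += np.log(conf[a][:, labels[:, a]]).T
        log_post -= log_post.max(axis=1, keepdims=True)
        T = np.exp(log_post)
        T /= T.sum(axis=1, keepdims=True)
    return T.argmax(axis=1)

# Two reliable annotators outvote a third who disagrees on the last item
votes = [[0, 0, 0], [1, 1, 1], [0, 0, 1]]
print(dawid_skene(votes, K=2))
```

Unlike simple majority vote, this weighs each annotator's vote by how reliable their judgments appear to be across the whole dataset.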
Evaluation
• A critical part of developing new algorithms and methods, and of demonstrating that they work
Classification
𝓧 = set of all documents 𝒴 = {english, mandarin, greek, …}
A mapping h from input data x (drawn from instance space 𝓧) to a label (or labels) y from some enumerable output space 𝒴
x = a single document y = ancient greek
instance space 𝓧, partitioned into train / dev / test
Experiment design

            training          development        testing
size        80%               10%                10%
purpose     training models   model selection    evaluation; never look at it until the very end
Metrics
• Evaluations presuppose that you have some metric to evaluate the fitness of a model.
• Language model: perplexity
• POS tagging/NER: accuracy, precision, recall, F1
• Phrase-structure parsing: PARSEVAL (bracketing overlap)
• Dependency parsing: Labeled/unlabeled attachment score
• Machine translation: BLEU, METEOR
• Summarization: ROUGE
Multiclass confusion matrix (rows: true y; columns: predicted ŷ):

        POS   NEG   NEUT
POS     100     2     15
NEG       0   104     30
NEUT     30    40     70
Accuracy

Accuracy = (1/N) Σi=1..N I[ŷi = yi]

where the indicator function I[x] = 1 if x is true and 0 otherwise.
Precision

Precision: the proportion of items predicted to be a class that actually belong to that class.

Precision(POS) = Σi=1..N I(yi = ŷi = POS) / Σi=1..N I(ŷi = POS)
Recall

Recall: the proportion of items truly in a class that are predicted to be that class.

Recall(POS) = Σi=1..N I(yi = ŷi = POS) / Σi=1..N I(yi = POS)
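All three metrics follow directly from the multiclass confusion matrix above:

```python
import numpy as np

# Confusion matrix (rows: true y; columns: predicted ŷ),
# classes ordered POS, NEG, NEUT.
C = np.array([[100,   2,  15],
              [  0, 104,  30],
              [ 30,  40,  70]])

accuracy = np.trace(C) / C.sum()            # 274/391 ≈ 0.70
precision_pos = C[0, 0] / C[:, 0].sum()     # 100/130 ≈ 0.77: of predicted POS, how many are POS
recall_pos = C[0, 0] / C[0, :].sum()        # 100/117 ≈ 0.85: of true POS, how many were found
```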
Significance
• If we observe a difference in performance, what's the cause? Is one system truly better than the other, or is the difference a function of randomness in the data? If we had tested on other data, would we get the same result?
Your work 58%
Current state of the art 50%
Hypothesis testing
• Hypothesis testing measures our confidence in what we can say about a null hypothesis from a sample.
Hypothesis testing
• Current state of the art = 50%; your model = 58%. Both evaluated on the same test set of 1000 data points.
• Null hypothesis = there is no difference, so we would expect your model to get 500 of the 1000 data points right.
• If we make parametric assumptions, we can model this with a Binomial distribution (number of successes in n trials)
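At n = 1000 the binomial tail can even be computed exactly with the standard library, no approximation needed:

```python
from math import comb

# P(X >= k) for X ~ Binomial(n, 1/2): the probability of getting at least
# k of n predictions right purely by chance under the null.
def binom_tail(k, n):
    return sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n

# 510/1000 correct is unsurprising under the null; 580/1000 is not
p_510 = binom_tail(510, 1000)   # ≈ 0.27
p_580 = binom_tail(580, 1000)   # far below any conventional significance level
```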
Example

[Figure: binomial probability distribution for the number of correct predictions in n = 1000 trials with p = 0.5, peaked around 500.]
Example

At what point is a sample statistic unusual enough to reject the null hypothesis? (The distribution is marked at 510 and at 580 correct predictions.)
Example
• The form we assume for the null hypothesis lets us quantify that level of surprise.
• We can do this for many parametric forms that allow us to measure P(X ≤ x) for some sample of size n; for large n, we can often make a normal approximation.
Tests
• Decide on the level of significance α (e.g., 0.05 or 0.01).
• Testing is evaluating whether the sample statistic falls in the rejection region defined by α.
1 “jobs” is predictive of presidential approval rating
2 “job” is predictive of presidential approval rating
3 “war” is predictive of presidential approval rating
4 “car” is predictive of presidential approval rating
5 “the” is predictive of presidential approval rating
6 “star” is predictive of presidential approval rating
7 “book” is predictive of presidential approval rating
8 “still” is predictive of presidential approval rating
9 “glass” is predictive of presidential approval rating
… …
1,000 “bottle” is predictive of presidential approval rating
Errors
• For any significance level α and n hypothesis tests, we can expect α × n type I errors.
• With α = 0.01 and n = 1000, that's 10 "significant" results simply by chance.
Multiple hypothesis corrections
• Bonferroni correction: for a family-wise significance level α0 across n hypothesis tests, test each individual hypothesis at

  α = α0 / n

• [Very strict; controls the probability of at least one type I error.]
• False discovery rate (a less strict alternative)
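As code, the correction is just a tighter per-test threshold (the function name and example p-values below are illustrative):

```python
# Bonferroni correction: to keep the family-wise error rate at alpha0 across
# n hypothesis tests, compare each p-value against alpha0 / n instead of alpha0.
def bonferroni_significant(p_values, alpha0=0.05):
    threshold = alpha0 / len(p_values)
    return [p < threshold for p in p_values]

# With 4 tests at alpha0 = 0.05, only p-values below 0.0125 survive
print(bonferroni_significant([0.001, 0.02, 0.04, 0.6]))
```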
Nonparametric tests
• Many hypothesis tests rely on parametric assumptions (e.g., normality)
• Alternatives that don’t rely on those assumptions:
• permutation test
• the bootstrap
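A sketch of a paired permutation test for comparing two systems evaluated on the same test items; `correct_a` and `correct_b` are hypothetical 0/1 vectors of per-item correctness (names are mine):

```python
import random

# Paired permutation test: under the null that the two systems are
# interchangeable, swapping their per-item scores at random should produce
# differences at least as large as the observed one reasonably often.
def permutation_test(correct_a, correct_b, trials=10000, seed=0):
    rng = random.Random(seed)
    observed = abs(sum(correct_a) - sum(correct_b))
    count = 0
    for _ in range(trials):
        diff = 0
        for a, b in zip(correct_a, correct_b):
            if rng.random() < 0.5:       # randomly swap the two systems' scores
                a, b = b, a
            diff += a - b
        count += abs(diff) >= observed
    return count / trials                # two-sided p-value estimate
```

No normality assumption is needed; the null distribution is built from the data itself.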
Issues
• Evaluation performance may not hold across domains (e.g., WSJ → literary texts)
• Covariates may explain performance (MT/parsing, sentences up to length n)
• Multiple metrics may offer competing results
Søgaard et al. 2014
Accuracy in-domain vs. out-of-domain:

Task                       In-domain        Out-of-domain
English POS                WSJ 97.0         Shakespeare 81.9
German POS                 Modern 97.0      Early Modern 69.6
English POS                WSJ 97.3         Middle English 56.2
Italian POS                News 97.0        Dante 75.0
English POS                WSJ 97.3         Twitter 73.7
English NER                CoNLL 89.0       Twitter 41.0
Phrase structure parsing   WSJ 89.5         GENIA 79.3
Dependency parsing         WSJ 88.2         Patent 79.6
Dependency parsing         WSJ 86.9         Magazines 77.1
Ethics
Why does a discussion about ethics need to be a part of NLP?
Conversational Agents
Question Answering
http://searchengineland.com/according-google-barack-obama-king-united-states-209733
Language Modeling
Vector semantics
• The decisions we make about our methods — training data, algorithms, evaluation — are often tied up with their use and impact in the world.
I saw the man with the telescope
[figure: dependency parse of the sentence showing its attachment ambiguity; edge labels nsubj, dobj, det, prep, pobj]
Scope
• NLP often operates on text divorced from the context in which it is uttered.
• It’s now being used more and more to reason about human behavior.
Privacy
Interventions
Exclusion
• Focus on data from one domain/demographic
• State-of-the-art models perform worse for young people (Hovy and Søgaard 2015) and for speakers of minority dialects (Blodgett et al. 2016)
Exclusion
[figures: language identification and dependency parsing performance on African-American English]
Blodgett et al. (2016), "Demographic Dialectal Variation in Social Media: A Case Study of African-American English" (EMNLP)
Overgeneralization
• Managing and communicating the uncertainty of our predictions
• Is a false answer worse than no answer?
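One standard way to manage this trade-off is a reject option: return the model's answer only when its confidence clears a threshold, and otherwise abstain. A minimal sketch (the function name and the 0.9 threshold are illustrative, not from the slides):

```python
def answer_or_abstain(probs, threshold=0.9):
    """Return the argmax label only when the model is confident enough;
    otherwise return None (abstain) rather than risk a false answer.

    probs: dict mapping candidate answers to predicted probabilities.
    """
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None
```

Under this policy, a QA system that is only 60% sure would say "no answer" instead of confidently asserting a falsehood; whether that is the right behavior depends on the cost of each error in the application.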
Dual Use
• Authorship attribution (author of Federalist Papers vs. author of ransom note vs. author of political dissent)
• Fake review detection vs. fake review generation
• Censorship evasion vs. enabling more robust censorship
Homework 2
• Derive the updates for a CNN and implement the functions for the forward/backward pass
• Out today, due Sept 20
• Be sure to check Piazza for any updates
• Start early! Another short homework will come out next Thursday.
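As a warm-up for the forward/backward derivation, here is a minimal 1-D convolution layer in plain Python (illustrative only; the function names, shapes, and lack of nonlinearity are my simplifications, not the assignment's required API):

```python
def conv1d_forward(x, w):
    """Valid 1-D convolution (cross-correlation): y[i] = sum_k x[i+k] * w[k]."""
    K = len(w)
    return [sum(x[i + k] * w[k] for k in range(K))
            for i in range(len(x) - K + 1)]

def conv1d_backward(x, w, dy):
    """Given upstream gradients dy = dL/dy, return (dL/dx, dL/dw)
    by routing each gradient back through the positions that produced it."""
    K = len(w)
    dx = [0.0] * len(x)
    dw = [0.0] * K
    for i, g in enumerate(dy):
        for k in range(K):
            dx[i + k] += g * w[k]    # y[i] depends on x[i+k] with weight w[k]
            dw[k] += g * x[i + k]    # ...and on w[k] with weight x[i+k]
    return dx, dw
```

For example, `conv1d_forward([1, 2, 3, 4], [1, 0, -1])` gives `[-2, -2]`. Note that each input position x[i+k] contributes to several outputs, so its gradient is a sum over those outputs; the same accumulation pattern shows up in the weight gradient.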