Machine Learning in NLP - Uppsala University · nivre/master/ml13-18.pdf · 2018-05-23

Machine Learning in NLP

Joakim Nivre

Uppsala University

Linguistics and Philology

Machine Learning in NLP 1(41)

I Why do we use machine learning in NLP?

I When should we (not) use machine learning?

I Appropriate problems for machine learning (from Lecture 1):
  I Problems for which there is no known exact method
  I Problems for which the exact method is too expensive
  I Problems that evolve over time

Eugene Wigner’s article “The Unreasonable Effectiveness of Mathematics in the Natural Sciences” examines why so much of physics can be neatly explained with simple mathematical formulas such as f = ma or e = mc². Meanwhile, sciences that involve human beings rather than elementary particles have proven more resistant to elegant mathematics. Economists suffer from physics envy over their inability to neatly model human behavior. An informal, incomplete grammar of the English language runs over 1,700 pages. Perhaps when it comes to natural language processing and related fields, we’re doomed to complex theories that will never have the elegance of physics equations. But if that’s so, we should stop acting as if our goal is to author extremely elegant theories, and instead embrace complexity and make use of the best ally we have: the unreasonable effectiveness of data.

Alon Halevy, Peter Norvig and Fernando Pereira. 2009. The Unreasonable Effectiveness of Data. IEEE Intelligent Systems.

Plan for this Lecture

I A historical perspective
  I Formal theory-driven systems
  I Statistical methods
  I Deep learning

I Strengths and weaknesses of machine learning in NLP

Running Example: Parsing

I Input: natural language sentence (word sequence)

I Output: tree or graph capturing syntactic structure

[Figure: dependency tree for "we saw her duck" with arcs labeled nsubj, obj and xcomp]

Computational Linguistics in the 1980s

I Languages described by formal systems
  I Inventory of elementary units (lexicon)
  I Rules for combining units (grammar)

I Created by linguists in a theoretical framework
  I Linguistic levels: morphology, syntax, semantics
  I Generate all and only well-formed expressions

I Combined with algorithms for analysis/synthesis

Issues

I Coverage
  I Hard to build a complete description of a language
  I Languages are constantly changing

I Robustness
  I Language use is not always well-formed
  I Made worse by lack of coverage

I Ambiguity
  I Natural language grammars inherently ambiguous
  I Combinatorial explosion from interacting rules and levels
  I Practical applications need disambiguation

[Figure: two dependency analyses of "we saw her duck". One uses the arcs nsubj, obj and xcomp; the other uses nsubj, obj and nmod:poss.]

Statistical NLP in the 1990s

Eisner’s Model C

I Stochastic process generating a dependency tree
  I Tree probability = product of subtree probabilities
  I Subtree probability = product of child probabilities
  I Child conditioned on tagged head word and preceding child tag
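Read schematically, the bullets above describe a factorization of roughly the following shape. The notation is an assumption made for illustration and is not taken from the slides: c_1, ..., c_{k_h} are the children of head h, tw(x) is the tagged word x, tag(c_0) is a boundary symbol, and details such as the separate treatment of left and right children are left out.

```latex
P(T) = \prod_{h \in T} P\bigl(c_1, \dots, c_{k_h} \mid h\bigr),
\qquad
P\bigl(c_1, \dots, c_{k_h} \mid h\bigr)
  = \prod_{i=1}^{k_h}
    P\bigl(\mathit{tw}(c_i) \,\big|\, \mathit{tw}(h),\ \mathit{tag}(c_{i-1})\bigr)
```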

Statistical NLP in the 1990s

I Probabilistic models of language
  I Generative models of P(X, Y)
  I Examples: HMM, PCFG, NB

I Parameters estimated from (annotated) data
  I Maximum-likelihood estimation
  I Smoothing to cope with sparse data

I Inference algorithms for analysis:
  I Exact argmax search using dynamic programming
  I Examples: Viterbi, CKY
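The three ingredients above (a generative model, counting with smoothing, exact search) fit in a few lines for an HMM tagger. This is a minimal sketch under stated assumptions: the three training sentences, the tag names and the choice of add-one smoothing are invented for illustration and are not from the lecture.

```python
import math
from collections import Counter

# Tiny hand-made training set (hypothetical, for illustration only).
TRAIN = [
    [("we", "PRON"), ("saw", "VERB"), ("her", "PRON"), ("duck", "VERB")],
    [("we", "PRON"), ("saw", "VERB"), ("her", "DET"), ("duck", "NOUN")],
    [("her", "DET"), ("duck", "NOUN"), ("saw", "VERB"), ("we", "PRON")],
]

# "Learning" is just counting.
trans, emit, tag_count, prev_count = Counter(), Counter(), Counter(), Counter()
for sent in TRAIN:
    prev = "<s>"
    for word, tag in sent:
        trans[prev, tag] += 1      # counts for P(tag | previous tag)
        prev_count[prev] += 1
        emit[tag, word] += 1       # counts for P(word | tag)
        tag_count[tag] += 1
        prev = tag

TAGS = sorted(tag_count)
VOCAB = {w for sent in TRAIN for w, _ in sent} | {"<unk>"}


def log_trans(prev, tag):
    # Relative frequency with add-one smoothing over the tag set.
    return math.log((trans[prev, tag] + 1) / (prev_count[prev] + len(TAGS)))


def log_emit(tag, word):
    # Unknown words share the probability mass reserved for "<unk>".
    word = word if word in VOCAB else "<unk>"
    return math.log((emit[tag, word] + 1) / (tag_count[tag] + len(VOCAB)))


def viterbi(words):
    """Exact argmax over tag sequences by dynamic programming."""
    best = {t: log_trans("<s>", t) + log_emit(t, words[0]) for t in TAGS}
    backpointers = []
    for word in words[1:]:
        new_best, pointers = {}, {}
        for t in TAGS:
            p = max(TAGS, key=lambda q: best[q] + log_trans(q, t))
            new_best[t] = best[p] + log_trans(p, t) + log_emit(t, word)
            pointers[t] = p
        best = new_best
        backpointers.append(pointers)
    tag = max(TAGS, key=best.get)
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


print(viterbi(["we", "saw", "her", "duck"]))
print(viterbi(["we", "saw", "her", "goose"]))   # unseen word still gets a tag
```

Because of smoothing, every tag sequence keeps a non-zero probability, so the tagger always returns an analysis, even for input it has never seen.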

How Does This Help?

I Ambiguity
  I Disambiguation through probability ranking
  I Learning from data more effective than heuristics
  I Statistical evaluation to measure progress

[Figure: two dependency analyses of "we saw her duck". One uses the arcs nsubj, obj and xcomp; the other uses nsubj, obj and nmod:poss.]

I Coverage
  I Smoothing allows graceful degradation
  I Unknown words can be interpreted in context

I Robustness
  I Probability ranking allows constraint relaxation
  I No sharp line between well-formed and deviant

A New Paradigm

I Emphasis on robust large-scale processing

I Quantitative evaluation
  I Naturally occurring test data
  I Exact numerical metrics (frequency-based)

I Data-driven development
  I Naturally occurring training data
  I Models induced using statistical inference

Machine Learning?

I Statistical models of the (early) 90s:
  I Generative models of P(X, Y)
  I Maximum likelihood estimation (with smoothing)
  I No advanced learning algorithms – just counting

I Main limitation:
  I Rigid independence assumptions (local context)
  I Required for effective learning and efficient inference

Machine Learning in NLP (2005)

McDonald’s Discriminative Model

I Discriminative model of trees given sentences
  I Online learning (perceptron style)
  I Max-margin objective (MIRA)
  I Rich features over the input-output space

Machine Learning in NLP (2005)

I Conditional or discriminative models
  I Models for prediction X → Y
  I Examples: Perceptron, SVM, MaxEnt

I Parameters estimated from (annotated) data
  I Learning as numerical optimization
  I Regularization to prevent overfitting

I Inference algorithms for analysis
  I Exact argmax search not always possible
  I Heuristic methods like beam search and reranking
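As a concrete instance of a discriminative model trained by error-driven updates, here is a minimal multiclass perceptron over sparse indicator features. The toy data, the feature names and the function names are hypothetical, and the sketch leaves out regularization (or weight averaging), which a practical system would add.

```python
from collections import defaultdict

# Hypothetical toy data: predict the tag of "duck" from indicator features
# of its context. Feature names are made up for illustration.
DATA = [
    ({"prev=her", "prev2=saw", "next=</s>"}, "VERB"),
    ({"prev=the", "prev2=saw", "next=</s>"}, "NOUN"),
    ({"prev=her", "prev2=fed", "next=</s>"}, "NOUN"),
    ({"prev=to", "prev2=had", "next=</s>"}, "VERB"),
]
LABELS = ["NOUN", "VERB"]


def predict(weights, feats):
    """Score each label as the sum of weights of its active (feature, label) pairs."""
    scores = {y: sum(weights[f, y] for f in feats) for y in LABELS}
    return max(LABELS, key=lambda y: scores[y])


def train(data, epochs=10):
    """Perceptron training: update weights only when the prediction is wrong."""
    weights = defaultdict(float)
    for _ in range(epochs):
        mistakes = 0
        for feats, gold in data:
            guess = predict(weights, feats)
            if guess != gold:
                mistakes += 1
                for f in feats:
                    weights[f, gold] += 1.0    # promote the correct label
                    weights[f, guess] -= 1.0   # demote the wrong prediction
        if mistakes == 0:                      # converged on the training data
            break
    return weights


weights = train(DATA)
print([predict(weights, feats) for feats, _ in DATA])
```

The model never estimates P(X, Y); it only learns weights that separate correct from incorrect outputs, which is why arbitrary overlapping features of the input can be added freely.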

How Does This Help?

I Independence assumptions can be relaxed
  I No need to estimate joint distribution P(X, Y)
  I Features over input X come for free

I Prediction accuracy improves with rich features
  I Arbitrary combinations of input and output features
  I Fall back on heuristic inference for efficiency if needed

Problem Solved?

I Feature engineering
  I Feature combinations have to be hand-crafted
  I Feature selection requires trial-and-error experiments

I Sparse discrete features
  I Most features are binarized symbolic features (1-hot)
  I Feature vectors get extremely high-dimensional but sparse
  I Problematic for learning and efficient inference

Deep Learning in NLP (2014)

Chen and Manning’s Transition-Based Parser

[From "A Fast and Accurate Dependency Parser using Neural Networks": Model Architecture]

[Figure: parser configuration feeding a neural network. Stack: ROOT, has/VBZ (with left child He/PRP attached by nsubj), good/JJ. Buffer: control/NN, ./. Correct transition: SHIFT. Network: input layer, hidden layer, output layer with softmax probabilities.]

I MaltParser with MLP instead of SVM (greedy, local)

I But 2 percentage points better LAS on PTB/CTB!?
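To make the stack/buffer picture concrete, here is a small sketch of a transition system replaying a fixed transition sequence for "He has good control .". It assumes an arc-standard system with SHIFT, LEFT-ARC and RIGHT-ARC; the function name, the labels other than nsubj, and the hard-coded sequence are illustrative assumptions, not something given on the slide.

```python
def apply_transition(name, stack, buffer, arcs, label=None):
    """Apply one arc-standard transition to the configuration, in place."""
    if name == "SHIFT":            # move the next buffer item onto the stack
        stack.append(buffer.pop(0))
    elif name == "LEFT-ARC":       # top of stack becomes head of the item below it
        dependent = stack.pop(-2)
        arcs.append((stack[-1], dependent, label))
    elif name == "RIGHT-ARC":      # item below the top becomes head of the top
        dependent = stack.pop()
        arcs.append((stack[-1], dependent, label))
    else:
        raise ValueError(f"unknown transition: {name}")


WORDS = ["ROOT", "He", "has", "good", "control", "."]

# Hard-coded transition sequence (an oracle); a greedy parser would instead
# ask a classifier for the best transition in each configuration.
SEQUENCE = [
    ("SHIFT", None), ("SHIFT", None), ("LEFT-ARC", "nsubj"), ("SHIFT", None),
    ("SHIFT", None), ("LEFT-ARC", "amod"), ("RIGHT-ARC", "dobj"),
    ("SHIFT", None), ("RIGHT-ARC", "punct"), ("RIGHT-ARC", "root"),
]

stack, buffer, arcs = [0], [1, 2, 3, 4, 5], []
for name, label in SEQUENCE:
    apply_transition(name, stack, buffer, arcs, label)

for head, dependent, label in arcs:
    print(f"{label}({WORDS[head]}, {WORDS[dependent]})")
```

After the fourth transition the stack holds ROOT, has, good and the buffer holds control and the final period, which is the configuration in the figure; the next step in the sequence is indeed SHIFT.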

Traditional Sparse Features

[From "A Fast and Accurate Dependency Parser using Neural Networks": Traditional Features]

Feature vector: 0 0 0 1 0 0 1 0 0 0 1 0 (binary, sparse, dim = 10^6 ~ 10^7)

Indicator features:
  lc(s2).t = PRP ^ s2.t = VBZ ^ s1.t = JJ
  lc(s2).w = He ^ lc(s2).l = nsubj ^ s2.w = has
  s2.w = has ^ s2.t = VBZ
  s1.w = good ^ s1.t = JJ ^ b1.w = control

[Figure: parser configuration. Stack: ROOT, has/VBZ (with left child He/PRP attached by nsubj), good/JJ. Buffer: control/NN, ./. Correct transition: SHIFT.]

I Sparse – but lexical features and interaction features crucial

I Incomplete – unavoidable with hand-crafted feature templates

I Expensive – accounts for 95% of computing time
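The indicator features above come from hand-written templates that are instantiated on each parser configuration. The sketch below shows one way this could look; the dictionary layout, the template encoding and the function names are assumptions made for illustration.

```python
# A parser configuration, sketched as a dict from positions (s1, s2, b1,
# lc(s2)) to token attributes: w = word, t = POS tag, l = dependency label.
CONFIG = {
    "s2": {"w": "has", "t": "VBZ"},
    "s1": {"w": "good", "t": "JJ"},
    "b1": {"w": "control", "t": "NN"},
    "lc(s2)": {"w": "He", "t": "PRP", "l": "nsubj"},
}

# Hand-crafted templates: each is a conjunction of (position, attribute) pairs.
TEMPLATES = [
    [("lc(s2)", "t"), ("s2", "t"), ("s1", "t")],
    [("lc(s2)", "w"), ("lc(s2)", "l"), ("s2", "w")],
    [("s2", "w"), ("s2", "t")],
    [("s1", "w"), ("s1", "t"), ("b1", "w")],
]


def extract(config):
    """Instantiate every template on a configuration as an indicator string."""
    return [
        " ^ ".join(f"{pos}.{attr} = {config[pos][attr]}" for pos, attr in template)
        for template in TEMPLATES
    ]


feature_index = {}   # grows by one dimension for every new instantiated feature


def to_sparse(features):
    """Map indicator strings to the indices of the non-zero dimensions."""
    return sorted(feature_index.setdefault(f, len(feature_index)) for f in features)


features = extract(CONFIG)
print(features)
print(to_sparse(features))
```

Every new combination of words, tags and labels opens a new dimension, which is how such vectors become very high-dimensional while only a handful of entries are non-zero for any one configuration.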

Dense Features

[From "A Fast and Accurate Dependency Parser using Neural Networks": Indicator Features Revisited]

Our solution: neural networks! Learn a dense and compact feature representation.

Feature vector: 0.1 0.9 -0.2 0.3 -0.1 -0.5 … (dense, dim = 200)

[Figure: parser configuration. Stack: ROOT, has/VBZ (with left child He/PRP attached by nsubj), good/JJ. Buffer: control/NN, ./. Correct transition: SHIFT.]

[From "A Fast and Accurate Dependency Parser using Neural Networks": Distributed Representations]

• We represent each word as a d-dimensional dense vector (i.e., word embeddings).
• Similar words are expected to have close vectors.

[Figure: words plotted in embedding space: come, go, were, was, is, good]

I Sparse – dense features capture similarities (words, pos, dep)

I Incomplete – neural network learns interaction features

I Expensive – matrix multiplication with low dimensionality
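The dense alternative can be sketched as an embedding lookup followed by a small feed-forward network. Everything concrete here is an assumption for illustration: the toy dimensions, the random untrained weights, the tanh activation and the three-transition output are not taken from the slides.

```python
import math
import random

random.seed(0)
DIM, HIDDEN, SLOTS = 4, 5, 3             # toy sizes, chosen only for readability
TRANSITIONS = ["SHIFT", "LEFT-ARC", "RIGHT-ARC"]
ITEMS = ["has", "good", "control", "VBZ", "JJ", "NN", "nsubj"]


def rand_vec(n):
    return [random.uniform(-0.1, 0.1) for _ in range(n)]


# One dense vector per word, POS tag, or dependency label (untrained here).
embedding = {item: rand_vec(DIM) for item in ITEMS}
W1 = [rand_vec(DIM * SLOTS) for _ in range(HIDDEN)]   # input -> hidden
b1 = rand_vec(HIDDEN)
W2 = [rand_vec(HIDDEN) for _ in TRANSITIONS]          # hidden -> output


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(scores):
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(items):
    """Concatenate embeddings, apply one hidden layer, return transition probabilities."""
    assert len(items) == SLOTS
    x = [v for item in items for v in embedding[item]]
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W1, x), b1)]
    return dict(zip(TRANSITIONS, softmax(matvec(W2, hidden))))


print(forward(["has", "good", "control"]))
```

The input is a short dense vector (here 12 numbers) rather than a sparse vector with millions of dimensions, and interactions between items are left to the hidden layer instead of being enumerated as templates.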

PoS Embeddings

[Figure: POS Embeddings, from "A Fast and Accurate Dependency Parser using Neural Networks", visualized with t-SNE (van der Maaten and Hinton 2008)]

Dep Embeddings

[Figure: Dependency Embeddings, from "A Fast and Accurate Dependency Parser using Neural Networks"]

The Power of Embeddings

One-Hot (discrete, sparse) vs. Embedding (continuous, dense)

I Inherently much more expressive (R× D vs. 1)

I Can capture similarities between items (sparsity)

I Can be pre-trained on large unlabeled corpora (OOV)

I Can be learned/tuned specifically for the parsing task
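The point about similarity can be checked with a few lines of arithmetic. In this sketch the four-word vocabulary and the two-dimensional embedding values are hand-picked, hypothetical numbers rather than learned vectors.

```python
import math

VOCAB = ["come", "go", "was", "were"]


def one_hot(word):
    """Discrete, sparse representation: a single 1 at the word's index."""
    return [1.0 if w == word else 0.0 for w in VOCAB]


# Hand-picked 2-dimensional embeddings (hypothetical values, not learned).
EMBEDDING = {
    "come": [0.9, 0.1],
    "go": [0.8, 0.2],
    "was": [0.1, 0.9],
    "were": [0.2, 0.8],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Any two distinct one-hot vectors are orthogonal, so all words look equally unrelated.
print(cosine(one_hot("come"), one_hot("go")))
# Dense vectors can place related words close together.
print(cosine(EMBEDDING["come"], EMBEDDING["go"]))
print(cosine(EMBEDDING["come"], EMBEDDING["was"]))
```

With one-hot vectors, evidence about "come" says nothing about "go"; with embeddings, what is learned for one word partly transfers to its neighbours, which is how dense features ease the sparsity problem.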

Recurrent Neural Networks

I Bi-LSTM encodes global context in word representations

I Character models capture morphology (and help sparsity)

Neural Network Techniques in Parsing

I Empirical results have improved substantially since 2014

I Neural network techniques yield more effective features:
  I Features are learned (not hand-crafted)
  I Features are continuous and dense (not discrete and sparse)
  I Features can be tuned to (multiple) specific tasks
  I Features can capture unbounded dependencies
  I Features can capture subword regularities

I Parsing architectures remain essentially the same

Strengths and Weaknesses

I Is (deep) machine learning always the solution?

I On the one hand
  I Learning from data is extremely powerful
  I Normally the first choice for maximizing accuracy

I On the other hand
  I Conditions for applying machine learning may not be ideal
  I There may be additional factors to consider

The Unreasonable Effectiveness of Data?

I What kind of data is available?
  I Do we have labeled data?
  I How much data do we have?
  I Do we have data from the right domain/language?

I What to do if we don’t have adequate/sufficient data?
  I Collect and/or annotate (more) data
  I Apply cross-domain or cross-language learning
  I Consider a rule-based (or hybrid) method

F-score Isn’t All That Matters

I We may care more about minimum than average quality

I Users may want to have predictions explained

I There may be ethical considerations with biased data

I Companies often need to maintain legacy systems
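The first point, minimum versus average quality, can be illustrated numerically. The per-document counts for the two systems below are invented for the example, and the function names are assumptions.

```python
def f_score(tp, fp, fn):
    """Balanced F-score (F1) from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical per-document counts (tp, fp, fn) for two systems.
SYSTEM_A = [(19, 1, 1), (19, 1, 1), (19, 1, 1), (5, 5, 5)]
SYSTEM_B = [(8, 2, 2), (8, 2, 2), (8, 2, 2), (8, 2, 2)]


def summarize(counts):
    """Return (average F-score, worst-case F-score) over documents."""
    scores = [f_score(*c) for c in counts]
    return sum(scores) / len(scores), min(scores)


for name, counts in [("A", SYSTEM_A), ("B", SYSTEM_B)]:
    mean, worst = summarize(counts)
    print(f"system {name}: mean F = {mean:.3f}, min F = {worst:.3f}")
```

System A wins on the average while System B never fails badly; which one is preferable depends on whether the application can tolerate the occasional poor output.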

Conclusion

I NLP today is overwhelmingly data-driven

I Deep learning is an evolution, not a revolution

I Machine learning is often the best solution

I But be open to pitfalls and alternative techniques
