TRANSCRIPT

Page 1:

@EVANAHARI, YOW! AUSTRALIA 2016

Something Old, Something New

A Talk about NLP for the Curious

Page 2:

Jabberwocky

Page 3:

– Lewis Carroll, from Through the Looking-Glass, and What Alice Found There, 1871

“’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe:
All mimsy were the borogoves,
And the mome raths outgrabe.”

Page 4:

Page 5:

Page 6:

Why are these monkeys following me?

Arrfff!

LOL

Page 7:

Challenges

• Mistakes

• Slang & sparse words

• Ambiguity types: lexical, syntax level, referential

Page 8:

Human Language

• The cortical speech center is unique to humans

• Evolution over hundreds of thousands of years
  • Vocabulary
  • Grammar
  • Speed

• An advanced processing unit
  • Sounds
  • Meaning of words
  • Grammar constructs
  • Matching against a knowledge base
  • Understanding context and humor!

Page 9:

Human Language Processing

Phonology − organization of sounds

Morphology − construction of words

Syntax − creation of valid sentences/phrases and identifying the structural roles of words in them

Semantics − finding meaning of words/phrases/sentences

Pragmatics − Situational meaning of sentences

Discourse − order of sentences affecting interpretation

World knowledge − mapping to general world knowledge

Context awareness - the hardest part…?

Page 10:

Natural Language Processing

• Computers generating language

• Computers understanding human language

Lexical analysis

Syntactic analysis

Semantic analysis

Discourse Integration

Pragmatic Analysis

Page 11:

– J. R. Firth, 1957

“You shall know a word by the company it keeps.”

Page 12:

Language Models

• Represent language in a mathematical way

A language model is a function that captures the statistical characteristics of the word-sequence distribution in a language

• Dimensionality challenge

A 10-word sequence from a 100 000-word vocabulary —> 100 000^10 = 10^50 possible sequences

• Large sample set vs processing time & cost vs accuracy

Page 13:

Bag-of-words

• Not suited for a huge vocabulary
• Semantics are not considered
• Order of words is lost

Vocabulary: happy, birthday, to, you, dear, “name”

happy = [1 0 0 0 0 0]
birthday = [0 1 0 0 0 0]
to = [0 0 1 0 0 0]
you = [0 0 0 1 0 0]
dear = [0 0 0 0 1 0]
“name” = [0 0 0 0 0 1]

Sample text:
Happy birthday to you = [1 1 1 1 0 0]
Happy birthday to you = [1 1 1 1 0 0]
Happy birthday dear “name” = [1 1 0 0 1 1]
Happy birthday to you = [1 1 1 1 0 0]

Term frequency: [4 4 3 3 1 1]
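To make the counting concrete, here is a minimal Python sketch (mine, not the talk's) that reproduces the term-frequency vector above:

```python
# Bag-of-words term frequencies over a fixed vocabulary (lower-cased tokens).
from collections import Counter

vocabulary = ["happy", "birthday", "to", "you", "dear", "name"]

sample_text = (
    "happy birthday to you "
    "happy birthday to you "
    "happy birthday dear name "
    "happy birthday to you"
)

counts = Counter(sample_text.split())

# One dimension per vocabulary word; word order is discarded.
term_frequency = [counts[word] for word in vocabulary]
print(term_frequency)  # [4, 4, 3, 3, 1, 1]
```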

Page 14:

n-grams

“Hello everyone who is eager to learn NLP!”

• “gram”: a unit, e.g. letter, phoneme, word, …

• uni-gram: Hello, everyone, who, is, …

• bi-gram: Hello-everyone, everyone-who, who-is, …

• n-gram: n-length sequences of units

• k-skip-gram: skip k units

• bi-skip-tri-gram: Hello-is-learn, everyone-eager-NLP
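A small Python sketch (illustrative, not from the talk) that extracts n-grams and k-skip-n-grams; to match the slide's bi-skip-tri-gram example it skips exactly k units between picks, whereas the common definition allows up to k:

```python
# n-grams and k-skip-n-grams over word units.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def k_skip_ngrams(tokens, n, k):
    # Take n tokens, skipping exactly k tokens between consecutive picks.
    step = k + 1
    span = (n - 1) * step + 1
    return [tuple(tokens[i:i + span:step]) for i in range(len(tokens) - span + 1)]

tokens = "Hello everyone who is eager to learn NLP".split()
print(ngrams(tokens, 1)[:3])        # [('Hello',), ('everyone',), ('who',)]
print(ngrams(tokens, 2)[:3])        # [('Hello', 'everyone'), ('everyone', 'who'), ('who', 'is')]
print(k_skip_ngrams(tokens, 3, 2))  # [('Hello', 'is', 'learn'), ('everyone', 'eager', 'NLP')]
```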

Page 15:

n-gram Probabilistic Model

• Given a sequence of words, what is the likelihood of the next?

• Using counts of n-grams extracted from a training data set we can predict the next word x based on probabilities

• Simple; only the previous n-1 words determine the probability

• Difficult to handle infrequent words and expressions

• Smoothing (e.g. Good-Turing, the Katz back-off model, etc.)

• Use additional sampling (bi-grams, tri-grams, skip-grams)

P(x_i | x_{i-(n-1)}, …, x_{i-1}) = count(x_{i-(n-1)}, …, x_{i-1}, x_i) / count(x_{i-(n-1)}, …, x_{i-1})
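A minimal sketch of this count-based estimate for bi-grams (n = 2) on a toy corpus:

```python
# Estimate P(word | prev) from unigram and bigram counts.
from collections import Counter

corpus = "happy birthday to you happy birthday to you happy birthday dear name".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_next(word, prev):
    """P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(p_next("to", "birthday"))    # 2/3: "birthday" is followed by "to" twice, "dear" once
print(p_next("dear", "birthday"))  # 1/3
```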

Page 16:

Example Use: Named Entity Extraction (NER)

Examples:

• Grammar based: “…live in <city>”

• Co-occurrence based: “new+york”, “san+francisco”, …

Common pattern: inference by applying various models and combining their outputs
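A toy sketch of both patterns (the regex, the city list, and the function name are hypothetical, not the talk's implementation):

```python
# Grammar-based and co-occurrence-based city extraction.
import re

known_city_bigrams = {("new", "york"), ("san", "francisco")}

def extract_cities(text):
    cities = set()
    # Grammar based: whatever capitalized phrase follows "live in" is a candidate.
    for match in re.finditer(r"live in ((?:[A-Z]\w+\s?)+)", text):
        cities.add(match.group(1).strip())
    # Co-occurrence based: adjacent word pairs that form a known city name.
    tokens = re.findall(r"\w+", text.lower())
    for pair in zip(tokens, tokens[1:]):
        if pair in known_city_bigrams:
            cities.add(" ".join(pair).title())
    return cities

print(extract_cities("They live in Sydney but fly to New York often."))
# {'Sydney', 'New York'} (set order may vary)
```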

Page 17:

Naive Bayes Probabilistic Model

[Figure: an apple classified by features Round (+), Red (+), HasLeaf (0)]

Page 18:

Example Use: Text Classification

Sample data (color mentioned in a fruit text vs. whether the text is about an apple):

Color    Apple?
Red      No
Green    Yes
Yellow   Yes
Red      Yes
Red      Yes
Green    Yes
Yellow   No
Yellow   No
Red      Yes
Yellow   Yes
Red      No
Green    Yes
Green    Yes
Yellow   No

Frequency table:

Feature   No   Yes   Total
Green     0    4     4/14 = 0.29
Yellow    3    2     5/14 = 0.36
Red       2    3     5/14 = 0.36
Total     5    9     5/14 = 0.36, 9/14 = 0.64

Incoming fruit text says “red” - is it about an apple?

P(Yes | Red) = P(Red | Yes) * P(Yes) / P(Red)

P(Red | Yes) = 3/9 = 0.33
P(Yes) = 9/14 = 0.64
P(Red) = 5/14 = 0.36

P(Yes | Red) = 0.33 * 0.64 / 0.36 = 0.60

60% chance it’s about an apple!
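The same worked example as a short Python sketch over the 14-row sample:

```python
# Bayes' rule over the color counts from the slide's sample data.
data = [
    ("red", "no"), ("green", "yes"), ("yellow", "yes"), ("red", "yes"),
    ("red", "yes"), ("green", "yes"), ("yellow", "no"), ("yellow", "no"),
    ("red", "yes"), ("yellow", "yes"), ("red", "no"), ("green", "yes"),
    ("green", "yes"), ("yellow", "no"),
]

n = len(data)
p_yes = sum(1 for _, label in data if label == "yes") / n   # 9/14
p_red = sum(1 for color, _ in data if color == "red") / n   # 5/14
p_red_given_yes = (
    sum(1 for color, label in data if color == "red" and label == "yes")
    / sum(1 for _, label in data if label == "yes")          # 3/9
)

# P(Yes | Red) = P(Red | Yes) * P(Yes) / P(Red)
print(round(p_red_given_yes * p_yes / p_red, 2))  # 0.6
```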

Page 19:

Naive Bayes

Things to Consider:

• Easy and fast, good for multi-class problems, better than most

• Does not handle unknown categories well; needs smoothing

• Needs less training data, but it must be representative

• Assumes the attributes are truly independent

Page 20:

Combining Models

Things to Consider:

• How many models can you afford?

• How good are your models (i.e. training data)?

• Latency vs accuracy?

Page 21:

Bag of Words

[Figure: each word maps to a sparse one-hot vector, e.g. [0 0 0 1] and [0 1 0 0]]

Page 22:

Continuous Bag of Words (Embeddings)

[Figure: each word now maps to a dense continuous vector, e.g. [2 3 8 1] and [7 5 6 2]]
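A toy numpy contrast of the two representations (the dense values are illustrative only):

```python
# Sparse one-hot rows vs. dense continuous embeddings for a 4-word vocabulary.
import numpy as np

one_hot = np.eye(4, dtype=int)            # each word: a single 1, rest 0
embeddings = np.array([[2, 3, 8, 1],      # each word: dense learned values
                       [7, 5, 6, 2],
                       [1, 9, 4, 3],
                       [5, 2, 7, 8]])

word_id = 1
print(one_hot[word_id])      # [0 1 0 0] - no notion of similarity between words
print(embeddings[word_id])   # [7 5 6 2] - distances between rows carry meaning
```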

Page 23:

Page 24:

Distributed Representation

• A word is a point in a multi-dimensional vector space, where each dimension is a feature of the word

• Who decides the features?

• HUMAN: decides the features, e.g. gender, plurality, semantic characteristics

• COMPUTER: learns the features as continuous values

Page 25:

Neural Net Language Model

• A model based on the capabilities of a NN is an NNLM

• Relies on the NN to discover the features of a distributed representation

• Extrapolation makes it possible to keep a dense model - even for very large data sets

Page 26:

Mikolov et al.’s CBOW vs Continuous Skip-gram

• CBOW - predict a term based on context (near-terms)

• w-2, w-1, w+1, w+2 —> w

• fast to train

• higher accuracy for frequent words

• conditioning on context needs larger data sets

• Continuous Skip-gram - predict context (near-terms) based on a word

• w —> w-2, w-1, w+1, w+2

• k-skip-n-gram: k and n determine complexity (training time vs accuracy)

• helps create more samples from a smaller data set (data sparsity, rare terms)
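As a sketch of both training modes, here is a minimal example using gensim's word2vec implementation (gensim 4.x parameter names; the tiny corpus is illustrative only):

```python
# Train CBOW and continuous skip-gram models on a toy corpus.
from gensim.models import Word2Vec

sentences = [
    ["happy", "birthday", "to", "you"],
    ["happy", "birthday", "dear", "name"],
    ["hello", "everyone", "eager", "to", "learn", "nlp"],
]

# sg=0 trains CBOW (predict a word from its context); sg=1 trains skip-gram
# (predict context from a word). window is the near-term span on each side.
cbow = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=0)
skipgram = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1)

print(skipgram.wv["birthday"].shape)              # (16,) learned dense features
print(skipgram.wv.most_similar("happy", topn=2))  # nearest neighbors in vector space
```

The sg flag is the only switch between the two objectives; everything else about the model stays the same.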

Page 27:

Diagram borrowed from Mikolov et al.’s paper

Page 28:

NN-based Probabilistic Prediction Model

1. Probability of the next term, by repeated application of Bayes’ theorem (the chain rule). Approximate the full history t with n terms to gain the simplicity of n-grams:

P(w_1, w_2, …, w_t) = P(w_1) P(w_2 | w_1) P(w_3 | w_1, w_2) … P(w_t | w_1, w_2, …, w_{t-1})

2. A d-dimensional feature vector C_{w_{t-i}} (column w_{t-i} of parameter matrix C), where C_k contains the learned features for word k; the input x concatenates the feature vectors of the n-1 preceding words:

x = (C_{w_{t-n+1}, 1}, …, C_{w_{t-n+1}, d}, C_{w_{t-n+2}, 1}, …, C_{w_{t-2}, d}, C_{w_{t-1}, 1}, …, C_{w_{t-1}, d})

3. Use a standard NN for probabilistic classification (softmax):

P(w_t = k | w_{t-n+1}, …, w_{t-1}) = e^{a_k} / Σ_{i=1}^{N} e^{a_i}

where

a_k = b_k + Σ_{i=1}^{h} W_{ki} tanh(c_i + Σ_{j=1}^{(n-1)d} V_{ij} x_j)
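A toy numpy sketch of this forward pass with random, untrained weights (all sizes are illustrative; the notation follows the equations above):

```python
# Forward pass of the NNLM: embed the context, one hidden tanh layer, softmax.
import numpy as np

rng = np.random.default_rng(0)
N, d, n, h = 10, 4, 3, 8   # vocabulary size, feature dim, n-gram order, hidden units

C = rng.normal(size=(N, d))              # feature vectors: one d-dim row per word
V = rng.normal(size=(h, (n - 1) * d))    # input-to-hidden weights
W = rng.normal(size=(N, h))              # hidden-to-output weights
b, c = np.zeros(N), np.zeros(h)          # output and hidden biases

def next_word_distribution(context):
    """P(w_t = k | w_{t-n+1}, ..., w_{t-1}) for n-1 context word indices."""
    x = np.concatenate([C[w] for w in context])   # concatenated features, (n-1)*d
    a = b + W @ np.tanh(c + V @ x)                # one activation a_k per word k
    e = np.exp(a - a.max())                       # numerically stable softmax
    return e / e.sum()

p = next_word_distribution([3, 7])   # ids of the two preceding words
print(p.shape, round(p.sum(), 6))    # (10,) 1.0
```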

Page 29:

Diagram borrowed from Bengio et al.’s paper

Page 30:

NLP is not New…

ABBYY, Angoss, Attensity, AUTINDEX, Autonomy, Averbis, Basis Technology, Clarabridge, Complete Discovery Source, Endeca Technologies, Expert System S.p.A., FICO Score, General Sentiment, IBM LanguageWare, IBM SPSS, Insight, Language Computer Corporation, Lexalytics, LexisNexis, Luminoso, Mathematica, MeaningCloud, Medallia, Megaputer Intelligence, NetOwl, RapidMiner, SAS Text Miner and Teragram, Semantria, Smartlogic, StatSoft, Sysomos, WordStat, Xpresso, …

Page 31:

…but Getting Hot (Again)

• Big text data sets available

• Distributed processing tech & capacity cheaper

• ML-based training economically possible (and more accurate)

• Open source movement

• Large upswing potential…

No animals were harmed during this photo shoot

Page 32:

Cheat Sheet

• OpenNLP - Java, Apache, familiar, easier, older
• CoreNLP - Java, Stanford, popular, good tool span
• NLTK - Python, rich in resources, easiest
• spaCy - up and coming, Python, promising…
• FastText - nothing new…?
• Spark - “ML framework”, custom implementation, large scale
• Deeplearning4j - word2vec (Java, Scala)
• TensorFlow (SyntaxNet) - separated optimization & more tuning knobs, better syntax-parsing model, very recently large scale too

Page 33:

Summary and Questions

• Language is key to our species’ success
• Our multi-step processing is complex, and our brains are forgiving
• A language model represents word-sequence distributions within a language
• Bag-of-words and n-grams are common representations
• Naive Bayes is common for probabilistic models
• Distributed representations are dense and powerful
• NNLMs are based on learned word features
• Positive NLP trends: more open source tools and frameworks, and pre-generated distributed representations available to all

Page 34:

@EVANAHARI, YOW! AUSTRALIA 2016

Jabberwocky

Vote!