Some slides courtesy & adapted from Jason Eisner
Overview of NLP Tasks and Featurization
Frank Ferraro – [email protected] 473/673
Today’s Learning Goals
• Learn about where to find NLP resources/how the community works
• Learn about the many depths of language (the “iceberg”)
• Define featurization and some examples
• Learn about NLP tasks at a high level, e.g.,
– Document classification
– Part of speech tagging
– Syntactic parsing
– Entity id/coreference
The NLP Research Community
• Papers
– The ACL Anthology (http://aclweb.org/anthology) has nearly everything, free! As of 2016: over 36,000 papers!
– Free-text searchable – a great way to learn about current research on a topic
– New search interfaces currently available in beta
» Find recent or highly cited work; follow citations
– Used as a dataset by various projects
» Analyzing the text of the papers (e.g., parsing it)
» Extracting a graph of papers, authors, and institutions (Who wrote what? Who works where? What cites what?)
Slide courtesy Jason Eisner, with mild edits
The NLP Research Community
• Conferences
– Most work in NLP is published as 8-page conference papers with 3 double-blind reviewers.
– Main annual conferences: ACL, EMNLP, NAACL
• Also EACL, IJCNLP, COLING
• Plus journals (TACL, Computational Linguistics [CL])
• Plus various specialized conferences and workshops
– Big events, and growing fast! ACL 2015:
• About 1500 attendees
• 692 full-length papers submitted (173 accepted)
• 648 short papers submitted (145 accepted)
• 14 workshops on various topics
The NLP Research Community
• Standard tasks
– If you want people to work on your problem, make it easy for them to get started and to measure their progress. Provide:
• Test data, for evaluating the final systems
• Development data, for measuring whether a change to the system helps, and for tuning parameters
• An evaluation metric (a formula for measuring how well a system does on the dev or test data)
• A program for computing the evaluation metric
• Labeled training data and other data resources
• A prize? – with clear rules on what data can be used
The NLP Research Community: Standard Tasks (some of them)
GLUE: https://gluebenchmark.com/
SuperGLUE: https://super.gluebenchmark.com/
The NLP Research Community: Standard Tasks (some of them)
https://semeval.github.io/SemEval2021/tasks
(past years too!)
The NLP Research Community
• Software
– Lots of people distribute code for these tasks
• Or you can email a paper’s authors to ask for their code
– Some lists of software, but no central site
– Some end-to-end pipelines for text analysis
• “One-stop shopping”: cleanup/tokenization + morphology + tagging + parsing + …
• NLTK is easy for beginners and has a free book (intersession?)
• GATE has been around for a long time and has a bunch of modules
The NLP Research Community
• Software
– To find good or popular tools: search current papers, ask around, use the web
– Still, it is often hard to identify the best tool for your job:
• Produces appropriate, sufficiently detailed output?
• Accurate? (on the measure you care about)
• Robust? (accurate on your data, not just theirs)
• Fast?
• Easy and flexible to use? Nice file formats, command line options, visualization?
• Trainable for new data and languages? How slow is training?
• Open-source and easy to extend?
The NLP Research Community
• Datasets
– Raw text or speech corpora
• Or just their n-gram counts, for super-big corpora
• Various languages and genres
• Usually there’s some metadata (each document’s date, author, etc.)
• Sometimes licensing restrictions (proprietary or copyrighted data)
– Text or speech with manual or automatic annotations
• What kind of annotations? That’s the rest of this lecture …
• May include translations into other languages
– Words and their relationships
• Morphological, semantic, translational, evolutionary
– Grammars
– World Atlas of Language Structures
– Parameters of statistical models (e.g., grammar weights)
The NLP Research Community
• Datasets
– Read papers to find out what datasets others are using
• The Linguistic Data Consortium (searchable) hosts many large datasets
• Many projects and competitions post data on their websites
• But sometimes you have to email the author for a copy
– The CORPORA mailing list is also a good place to ask around
– The LREC conference publishes papers about new datasets & metrics
– Amazon Mechanical Turk – pay humans (very cheaply) to annotate your data or to correct automatic annotations
• Old task, new domain: annotate parses etc. on your kind of data
• New task: annotate something new that you want your system to find
• Auxiliary task: annotate something new that your system may benefit from finding (e.g., annotate subjunctive mood to improve translation)
– Can you make annotation so much fun or so worthwhile that they’ll do it for free?
The NLP Research Community
• Standard data formats
– Often just simple ad hoc text-file formats
• Documented in a README; easily read with scripts
– Some standards:
• Unicode – strings in any language (see the ICU toolkit)
• PCM (.wav, .aiff) – uncompressed audio
– BWF and AUP extend it with metadata; there are also many compressed formats
• XML – documents with embedded annotations
• Text Encoding Initiative – faithful digital representations of printed text
• Protocol Buffers, JSON – structured data
• UIMA – “unstructured information management”; Watson uses it
– Standoff markup: raw text in one file, annotations in other files (“noun phrase from byte 378–392”)
• Annotations can be independently contributed & distributed
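As a minimal sketch of standoff markup (the text, annotation types, and offsets below are made up for illustration), the raw text lives in one file and the annotations point into it by character offset:

```python
# Standoff markup sketch: the raw text is stored once, untouched;
# annotations live separately as character offsets into it.
raw_text = "Electronic alerts have been used to assist the authorities."

# Each annotation references a span of raw_text instead of embedding tags.
annotations = [
    {"type": "noun_phrase", "start": 0, "end": 17},   # "Electronic alerts"
    {"type": "noun_phrase", "start": 43, "end": 58},  # "the authorities"
]

for ann in annotations:
    span = raw_text[ann["start"]:ann["end"]]
    print(ann["type"], repr(span))
```

Because the annotations never modify the text, different groups can contribute and distribute annotation layers independently.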
The NLP Research Community
• Survey articles
– May help you get oriented in a new area
– Synthesis Lectures on Human Language Technologies
– Handbook of Natural Language Processing
– Oxford Handbook of Computational Linguistics
– Foundations & Trends in Machine Learning
– Survey articles in journals – JAIR, CL, JMLR
– ACM Computing Surveys?
– Online tutorial papers
– Slides from tutorials at conferences
– Textbooks
To Write A Typical Paper
• Need some of these ingredients:
– A domain of inquiry: a scientific or engineering question
– A task: input & output representations, an evaluation metric
– Resources: corpora, annotations, dictionaries, …
– A method for training & testing
– An algorithm: derived from a model?
– Analysis of results: comparison to baselines & other systems, significance testing, learning curves, ablation analysis, error analysis
• There are other kinds of papers too: theoretical papers on formal grammars and their properties, new error metrics, new tasks or resources, etc.
http://www.qwantz.com/index.php?comic=170
“Language is Productive”
Adapted from Jason Eisner, Noah Smith
orthography
orthography
morphology: study of how words change
Watergate
![Page 21: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/21.jpg)
Watergate ➔ Troopergate, Bridgegate, Deflategate
orthography
morphology
lexemes: a basic “unit” of language
Ambiguity
Kids Make Nutritious Snacks
Kids Prepare Nutritious Snacks
Kids Are Nutritious Snacks
sense ambiguity
orthography
morphology
lexemes
syntax: study of structure in language
Ambiguity
British Left Waffles on Falkland Islands
Lexical Ambiguity…
British Left Waffles on Falkland Islands
… yields the “Part of Speech Tagging” task
British Left Waffles on Falkland Islands
• Adjective Noun Verb (the British political Left is indecisive)
• Noun Verb Noun (the British left food behind)
Parts of Speech
Classes of words that behave like one another in “similar” contexts
Pronunciation (stress) can differ: object (noun: OB-ject) vs. object (verb: ob-JECT)
POS tagging can help improve the inputs to other systems (text-to-speech, syntactic parsing)
Syntactic Ambiguity…
Pat saw Chris with the telescope on the hill.
I ate the meal with friends.
… yields the “Syntactic Parsing” task
Pat saw Chris with the telescope on the hill.
I ate the meal with friends.
(dependency arcs such as dobj and ncomp attach differently in each reading)
Syntactic Parsing
I ate the meal with friends
[S [NP I] [VP ate [NP the meal] [PP with friends]]]
Syntactic parsing: perform a “meaningful” structural analysis according to grammatical rules
Syntactic Parsing Can Show the Structure of Language
I ate the meal with friends
[S [NP I] [VP ate [NP the meal] [PP with friends]]]
This structure is provided by a grammar. We’ll talk more about how to (a) obtain it, (b) learn it, or (c) both!
Syntactic Parsing Can Help Disambiguate
I ate the meal with friends
[S [NP I] [VP ate [NP the meal] [PP with friends]]] (the PP modifies the eating)
[S [NP I] [VP ate [NP [NP the meal] [PP with friends]]]] (the PP modifies the meal)
Clearly Show Ambiguity… But Not Necessarily All Ambiguity
I ate the meal with friends
[S [NP I] [VP ate [NP the meal] [PP with friends]]]
I ate the meal with gusto
I ate the meal with a fork
orthography
morphology
lexemes
syntax
semantics: study of (literal?) meaning
orthography
morphology
lexemes
syntax
semantics
pragmatics: study of (implied?) meaning
orthography
morphology
lexemes
syntax
semantics
pragmatics
discourse: study of how we communicate
Semantics → Discourse Processing
John stopped at the donut store.
Courtesy Jason Eisner
John stopped at the donut store before work.
John stopped at the donut store on his way home.
John stopped at the donut shop.
John stopped at the trucker shop.
John stopped at the mom & pop shop.
John stopped at the red shop.
Entity Coreference Resolution
Pat and Chandler agreed on a plan.
He said Pat would try the same tactic again.
Is “he” the same person as “Chandler”?
Discourse Processing through Coreference
I spread the cloth on the table to protect it.
I spread the cloth on the table to display it.
NLP + Latent Modeling
Explain what you see/annotate with things “of importance” you don’t
orthography
morphology
lexemes
syntax
semantics
pragmatics
discourse
observed text
orthography
morphology
lexemes
syntax
semantics
pragmatics
discourse
VISION
AUDIO
prosody
intonation
color
ML Term: “Featurization”
The procedure of extracting features for some input
Often viewed as a K-dimensional vector function f of the input language x:
f(x) = (f_1(x), …, f_K(x))
Each f_i is a feature (/feature function)
In supervised settings, featurization can equivalently be viewed as a K-dimensional vector function f of the input language x and a potential label y:
f(x, y) = (f_1(x, y), …, f_K(x, y))
Features can be thought of as “soft” rules. E.g., POSITIVE-sentiment tweets may be more likely to contain the word “happy”.
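As a minimal sketch of such feature functions (the specific features and strings are illustrative, not from the slides):

```python
# Feature functions over an input x (a tweet) and a candidate label y.
def f1(x, y):
    # "soft rule": POSITIVE tweets may be more likely to contain "happy"
    return 1 if y == "POSITIVE" and "happy" in x.lower().split() else 0

def f2(x, y):
    # another illustrative soft rule for the NEGATIVE label
    return 1 if y == "NEGATIVE" and "ugh" in x.lower().split() else 0

def featurize(x, y):
    # f(x, y) = (f_1(x, y), ..., f_K(x, y)) with K = 2 here
    return (f1(x, y), f2(x, y))

print(featurize("So happy today!", "POSITIVE"))  # (1, 0)
```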
Defining Appropriate Features
Feature functions help extract useful features (characteristics) of the data
They turn data into numbers
Features that are not 0 are said to have fired
You can define classes of features by templating (we’ll come back to this!)
Often binary-valued (0 or 1), but can be real-valued
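The templating idea can be sketched as follows (the vocabulary, labels, and the template itself are made up for illustration): one template, “word w appears and the label is y,” expands into one binary feature per (word, label) pair.

```python
vocab = ["happy", "sad", "broken"]
labels = ["POSITIVE", "NEGATIVE"]

def make_template_feature(w, y):
    # Instantiate the template for one (word, label) pair.
    def feat(x, label):
        return 1 if label == y and w in x.lower().split() else 0
    return feat

# The template expands into |vocab| * |labels| = 6 feature functions.
features = {(w, y): make_template_feature(w, y) for w in vocab for y in labels}

# Features whose value is not 0 are said to have "fired".
fired = [(w, y) for (w, y), f in features.items() if f("happy but broken", "POSITIVE")]
print(fired)
```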
Three Common Types of Featurization in NLP
1. Bag-of-words (or bag-of-characters, bag-of-relations)
– Identify unique sufficient atomic sub-parts (e.g., words in a document)
– Define simple features over these, e.g.,
• Binary (0 or 1) ➔ indicating presence
• Natural numbers ➔ indicating the number of times in a context
• Real-valued ➔ various other scores (we’ll see examples throughout the semester)
2. Linguistically-inspired features
– Define features from words, word spans, or linguistic-based annotations extracted from the document
3. Dense features via embeddings
– Compute/extract a real-valued vector, e.g., from word2vec, ELMo, BERT, …
Example: Document Classification via Bag-of-Words Features
TECH
NOT TECH
Electronic alerts have been used to assist the authorities in moments of chaos and potential danger: after the Boston bombing in 2013, when the Boston suspects were still at large, and last month in Los Angeles, during an active shooter scare at the airport.
Let’s make a core assumption: the label can be predicted from counts of individual word types
With V word types, define V feature functions f_i(x):
f_i(x) = # of times word type i appears in document x
Stacking these gives the feature vector f(x) = (f_i(x))_{i=1}^{V}
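These count features can be sketched in a few lines (naive whitespace tokenization is assumed; a real system would tokenize more carefully):

```python
from collections import Counter

# Bag-of-words featurization: f_i(x) = count of word type i in document x.
def bow_features(x):
    return Counter(x.lower().split())

doc = "the Boston bombing in 2013 when the Boston suspects were still at large"
f = bow_features(doc)
print(f["boston"])  # 2
```

A `Counter` returns 0 for any word type that never appears (e.g., `f["sniffle"]`), matching the zero-valued features in the table.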
feature   f_i(x) value
alerts 1
assist 1
bombing 1
Boston 2
…
sniffle 0
…
Example: Document Classification with Bag-of-Words Features
feature weight
alerts .043
assist -0.25
bombing 0.8
Boston -0.00001
…
w: weights
feature   f_i(x) value
alerts 1
assist 1
bombing 1
Boston 2
…
f(x): “bag of words”
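A typical use of the weights (not spelled out on the slide) is a linear score, the dot product w · f(x). A sketch, with weights echoing the slide’s illustrative values:

```python
# score(x) = sum_i w_i * f_i(x): weights times bag-of-words counts.
weights = {"alerts": 0.043, "assist": -0.25, "bombing": 0.8, "boston": -0.00001}
features = {"alerts": 1, "assist": 1, "bombing": 1, "boston": 2}

score = sum(weights.get(w, 0.0) * v for w, v in features.items())
print(score)
```

A classifier would then threshold this score (or compare scores across labels) to pick TECH vs. NOT TECH.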
600.465 - Intro to NLP - J. Eisner
Text Annotation Tasks
1. Classify the entire document(“text categorization”)
2. Classify individual word tokens
3. Classify word tokens in a sequence
4. Identify phrases (“chunking”)
5. Syntactic annotation (parsing)
6. Semantic annotation
Text Classification
Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…
Input:
• a document (more generally, any linguistic blob)
• a fixed set of classes C = {c1, c2, …, cJ}
Output: a predicted class c from C
Text Classification: Hand-coded Rules?
Assigning subject categories, topics, or genres
Spam detection
Authorship identification
Age/gender identification
Language Identification
Sentiment analysis
…
Rules based on combinations of words or other features
spam: black-list-address OR (“dollars” AND “have been selected”)
Accuracy can be high if the rules are carefully refined by an expert
Building and maintaining these rules is expensive
Can humans faithfully assign uncertainty?
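The slide’s spam rule can be coded directly; a sketch (the blacklist address is made up):

```python
# Hand-coded rule: spam if black-listed address
# OR ("dollars" AND "have been selected") appear in the body.
BLACKLIST = {"offers@example-spam.biz"}

def is_spam(sender, body):
    text = body.lower()
    return sender in BLACKLIST or ("dollars" in text and "have been selected" in text)

print(is_spam("friend@example.com", "You have been selected to win dollars!"))  # True
```

Every new evasion tactic means another hand-written clause, which is why maintaining such rules is expensive.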
Text Classification: Supervised Machine Learning
Input:
• a document d
• a fixed set of classes C = {c1, c2, …, cJ}
• a training set of m hand-labeled documents (d1, c1), …, (dm, cm)
Output: a learned classifier γ that maps documents to classes
Naïve Bayes
Logistic regression
Support-vector machines
k-Nearest Neighbors
…
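As a toy end-to-end sketch of this setup, using Naïve Bayes (one of the methods listed) with add-one smoothing; the training documents and labels are invented for illustration:

```python
import math
from collections import Counter, defaultdict

# Labeled training set: (d_i, c_i) pairs in, a classifier gamma out.
train = [
    ("new gadget released with faster chip", "TECH"),
    ("software update improves phone battery", "TECH"),
    ("team wins the championship game", "NOT_TECH"),
    ("recipe for a hearty winter soup", "NOT_TECH"),
]

class_counts = Counter()
word_counts = defaultdict(Counter)
vocab = set()
for doc, c in train:
    class_counts[c] += 1
    for w in doc.split():
        word_counts[c][w] += 1
        vocab.add(w)

def gamma(doc):
    # argmax_c [ log p(c) + sum_w log p(w | c) ], add-one smoothed
    best, best_lp = None, float("-inf")
    for c in class_counts:
        lp = math.log(class_counts[c] / sum(class_counts.values()))
        total = sum(word_counts[c].values())
        for w in doc.split():
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

print(gamma("faster phone chip"))  # TECH
```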
Text Annotation Tasks
1. Classify the entire document(“text categorization”)
2. Classify individual word tokens
3. Classify word tokens in a sequence
4. Identify phrases (“chunking”)
5. Syntactic annotation (parsing)
6. Semantic annotation
Slide courtesy Jason Eisner, with mild edits
![Page 76: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/76.jpg)
slide courtesy of D. Yarowsky
p(class | token in context)
(WSD)
Build a special classifier just for tokens of “plant”
Slide courtesy Jason Eisner, with mild edits
![Page 77: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/77.jpg)
slide courtesy of D. Yarowsky
p(class | token in context)
WSD for “sentence”
Build a special classifier just for tokens of “sentence”
Slide courtesy Jason Eisner, with mild edits
![Page 78: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/78.jpg)
![Page 79: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/79.jpg)
![Page 80: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/80.jpg)
![Page 81: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/81.jpg)
![Page 82: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/82.jpg)
![Page 83: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/83.jpg)
What features? Example: “word to [the] left [of correction]”
slide courtesy of D. Yarowsky (modified)
Spelling correction using an n-gram language model (n ≥ 2) would use words to left and right to help predict the true word.
Similarly, an HMM would predict a word’s class using classes to left and right.
But we’d like to throw in all kinds of other features, too …
Slide courtesy Jason Eisner, with mild edits
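The n-gram idea can be sketched concretely: score each candidate correction by the log-probability its sentence gets under a bigram language model, and keep the winner. The bigram probabilities below are invented for illustration; a real spelling corrector would also model the error channel.

```python
import math

def sentence_logprob(sentence, bigram_probs, floor=1e-6):
    """Log-probability of a word sequence under a bigram language model.
    Unseen bigrams get a small floor probability instead of zero."""
    words = ["<s>"] + sentence.lower().split()
    return sum(math.log(bigram_probs.get((prev, w), floor))
               for prev, w in zip(words, words[1:]))

# Invented bigram probabilities: the words to the left and right of the
# ambiguous token push the model toward the intended word.
bigram_probs = {("<s>", "a"): 0.1,
                ("a", "special"): 0.01, ("special", "classifier"): 0.2,
                ("a", "spacial"): 0.0001, ("spacial", "classifier"): 0.0001}
candidates = ["a special classifier", "a spacial classifier"]
best = max(candidates, key=lambda s: sentence_logprob(s, bigram_probs))
```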
![Page 84: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/84.jpg)
An assortment of possible cues ...
slide courtesy of D. Yarowsky (modified)
Generates a whole bunch of potential cues – use data to find out which ones work best
Slide courtesy Jason Eisner, with mild edits
![Page 85: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/85.jpg)
An assortment of possible cues ...
slide courtesy of D. Yarowsky (modified)
merged ranking of all cues of all these types
This feature is relatively weak, but weak features are still useful, especially since very few features will fire in a given context.
Slide courtesy Jason Eisner, with mild edits
![Page 86: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/86.jpg)
Final decision list for lead (abbreviated)
slide courtesy of D. Yarowsky (modified)
List of all features, ranked by their weight.
(These weights are for a simple “decision list” model where the single highest-weighted feature that fires gets to make the decision all by itself. However, a log-linear model, which adds up the weights of all features that fire, would be roughly similar.)
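The difference between the two decision rules can be shown in a few lines. The cue names, senses, and weights below are invented for illustration, not Yarowsky’s actual list for “lead”:

```python
# Hypothetical weighted cues for disambiguating an ambiguous word.
# Each cue is (feature_name, sense, weight); a cue "fires" when its
# feature name is among the features extracted from the context.
cues = [
    ("word-to-left=zinc",  "metal", 10.2),
    ("collocate=narrow",   "metal",  9.5),
    ("word-to-right=role", "guide",  8.7),
    ("collocate=follow",   "guide",  4.1),
]

def decision_list(fired):
    """Decision list: the single highest-weighted firing cue decides alone."""
    best = max((c for c in cues if c[0] in fired), key=lambda c: c[2])
    return best[1]

def additive(fired):
    """Log-linear style: sum the weights of all firing cues per sense."""
    totals = {}
    for name, sense, w in cues:
        if name in fired:
            totals[sense] = totals.get(sense, 0.0) + w
    return max(totals, key=totals.get)

context = {"word-to-right=role", "collocate=follow", "collocate=narrow"}
```

On this context the two rules disagree: one strong “metal” cue outranks each “guide” cue alone, but the two “guide” cues together outweigh it.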
Slide courtesy Jason Eisner, with mild edits
![Page 87: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/87.jpg)
Text Annotation Tasks
1. Classify the entire document(“text categorization”)
2. Classify individual word tokens
3. Classify word tokens in a sequence
4. Identify phrases (“chunking”)
5. Syntactic annotation (parsing)
6. Semantic annotation
Slide courtesy Jason Eisner, with mild edits
![Page 88: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/88.jpg)
Part of Speech Tagging
• We could treat tagging as a token classification problem
– Tag each word independently given features of context
– And features of the word’s spelling (suffixes, capitalization)
Slide courtesy Jason Eisner, with mild edits
![Page 89: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/89.jpg)
Sequence Labeling as Classification
Classify each token independently, but use information about the surrounding tokens (a sliding window) as input features.
John saw the saw and decided to take it to the table.
classifier
NNP
Slide from Ray Mooney
Slide courtesy Jason Eisner, with mild edits
Sliding the same classifier across the sentence one position at a time yields the full tag sequence:
John/NNP saw/VBD the/DT saw/NN and/CC decided/VBD to/TO take/VB it/PRP to/IN the/DT table/NN
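One way to realize the sliding-window classifier is a feature extractor that, for each token, combines the word itself, simple spelling cues, and the neighbouring words. The exact feature names below are illustrative, not a fixed standard:

```python
def window_features(tokens, i, k=2):
    """Features for classifying tokens[i]: the word itself, simple spelling
    features, and the neighbouring words in a +/-k sliding window."""
    feats = {
        "word=" + tokens[i].lower(),
        "capitalized=" + str(tokens[i][0].isupper()),
        "suffix2=" + tokens[i][-2:].lower(),
    }
    for d in range(1, k + 1):
        left = tokens[i - d] if i - d >= 0 else "<s>"    # sentence boundary
        right = tokens[i + d] if i + d < len(tokens) else "</s>"
        feats.add(f"w[-{d}]={left.lower()}")
        feats.add(f"w[+{d}]={right.lower()}")
    return feats

sent = "John saw the saw and decided to take it to the table .".split()
f3 = window_features(sent, 3)   # features for the second "saw"
```

The window around the second “saw” contains “the” on the left and “and” on the right, which is exactly the evidence a classifier needs to prefer NN over VBD here.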
![Page 101: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/101.jpg)
Part of Speech Tagging
Or we could use an HMM:
Start PN Verb Det Noun Prep Noun Prep Det Noun
Bill directed a cortege of autos through the dunes
[Trellis diagram: transition probabilities (e.g., 0.4, 0.6, 0.001) come from a tag bigram model; emission probabilities come from unigram replacement; at each word position the states Det, Adj, Noun connect Start to Stop (e.g., Adj: directed …)]
Slide courtesy Jason Eisner, with mild edits
We’ll see HMMs later!
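As a preview, the “best path” computation an HMM tagger performs is the Viterbi algorithm. This is a hedged sketch with invented toy transition and emission probabilities, not the model from the slide:

```python
def viterbi(words, tags, trans, emit):
    """Most probable tag sequence under an HMM.
    trans[(t_prev, t)] and emit[(t, w)] are probabilities (0 if absent)."""
    V = [{}]                                   # V[i][t] = best prob of prefix
    back = [{}]                                # back-pointers for recovery
    for t in tags:
        V[0][t] = trans.get(("Start", t), 0.0) * emit.get((t, words[0]), 0.0)
    for i in range(1, len(words)):
        V.append({}); back.append({})
        for t in tags:
            best_prev = max(tags, key=lambda p: V[i-1][p] * trans.get((p, t), 0.0))
            V[i][t] = (V[i-1][best_prev] * trans.get((best_prev, t), 0.0)
                       * emit.get((t, words[i]), 0.0))
            back[i][t] = best_prev
    last = max(tags, key=lambda t: V[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):     # follow back-pointers
        path.append(back[i][path[-1]])
    return list(reversed(path))

# Toy model (probabilities invented): "saw" is ambiguous Noun/Verb.
tags = ["Noun", "Verb", "Det"]
trans = {("Start", "Noun"): 0.5, ("Start", "Det"): 0.4,
         ("Noun", "Verb"): 0.6, ("Verb", "Det"): 0.7, ("Det", "Noun"): 0.8}
emit = {("Noun", "John"): 0.1, ("Verb", "saw"): 0.2, ("Noun", "saw"): 0.05,
        ("Det", "the"): 0.5, ("Noun", "table"): 0.1}
best = viterbi("John saw the table".split(), tags, trans, emit)
```

Even though “saw” is more often a noun in isolation in some corpora, the tag-bigram context (Noun → Verb) pulls it to Verb here.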
![Page 102: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/102.jpg)
Part of Speech Tagging
• We could treat tagging as a token classification problem
– Tag each word independently given features of context
– And features of the word’s spelling (suffixes, capitalization)
• Or we could use an HMM:
– The point of the HMM is basically that the tag of one word might depend on the tags of adjacent words.
• Combine these two ideas?
– We’d like rich features (e.g., in a log-linear model), but we’d also like our feature functions to depend on adjacent tags.
– So, the problem is to predict all tags together.
Slide courtesy Jason Eisner, with mild edits
![Page 103: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/103.jpg)
• Easy to build a “yes” or “no” predictor from supervised training data
– Plenty of software packages to do the learning & prediction
– Lots of people in NLP never go beyond this ☺
• Similarly, easy to build a system that chooses from a small finite set
– Basically the same deal
– But runtime goes up linearly with the size of the set, unless you’re clever (HW3)
• Harder to predict the best string or tree (set is exponentially large or infinite)
Supervised Learning Methods
Slide courtesy Jason Eisner, with mild edits
![Page 104: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/104.jpg)
Can We Do Better? Can We Be More Expressive?
CRF Tutorial, Fig 1.2, Sutton & McCallum (2012)
See: CMSC 678 or CMSC 691 (Prob. & Graphical ML)
![Page 105: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/105.jpg)
[RNN diagram: observe words w_{i−3} … w_i one at a time; each hidden-state “cell” h_{i−3} … h_i feeds the next; predict the label y_i for each word from these hidden states]
Can We Use Neural, Recurrent Methods?
![Page 106: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/106.jpg)
• Easy to build a “yes” or “no” predictor from supervised training data
– Plenty of software packages to do the learning & prediction
– Lots of people in NLP never go beyond this ☺
• Similarly, easy to build a system that chooses from a small finite set
– Basically the same deal
– But runtime goes up linearly with the size of the set, unless you’re clever (HW3)
• Harder to predict the best string or tree (set is exponentially large or infinite)
– In the best case, requires dynamic programming; you might have to write your own code
– But finite-state or CRF toolkits may find the best string for you
– And you could modify someone else’s parser to pick the best tree
– An algorithm for picking the best can usually be turned into a learning algorithm
– You may need to rely on approximate solutions (e.g., via beam search)
Supervised Learning Methods
Slide courtesy Jason Eisner, with mild edits
![Page 107: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/107.jpg)
Text Annotation Tasks
1. Classify the entire document(“text categorization”)
2. Classify individual word tokens
3. Classify word tokens in a sequence
4. Identify phrases (“chunking”)
5. Syntactic annotation (parsing)
6. Semantic annotation
Slide courtesy Jason Eisner, with mild edits
![Page 108: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/108.jpg)
Example: Finding Named Entities
Named entity recognition (NER)
Identify proper names in texts, and classify them into a set of predefined categories of interest:
Person names
Organizations (companies, government organisations, committees, etc.)
Locations (cities, countries, rivers, etc.)
Date and time expressions
Measures (percent, money, weight, etc.), email addresses, Web addresses, street addresses, etc.
Domain-specific: names of drugs, medical conditions, names of ships, bibliographic references, etc.
Cunningham and Bontcheva (2003, RANLP Tutorial)
![Page 109: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/109.jpg)
9/12/2020 Slide from Jim Martin
Named Entity Recognition
CHICAGO (AP) — Citing high fuel prices, United Airlines said Friday
it has increased fares by $6 per round trip on flights to some cities
also served by lower-cost carriers. American Airlines, a unit AMR,
immediately matched the move, spokesman Tim Wagner said.
United, a unit of UAL, said the increase took effect Thursday night
and applies to most routes where it competes against discount
carriers, such as Chicago to Dallas and Atlanta and Denver to San
Francisco, Los Angeles and New York.
Slide courtesy Jason Eisner, with mild edits
![Page 110: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/110.jpg)
Slide from Jim Martin
NE Types
Slide courtesy Jason Eisner, with mild edits
![Page 111: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/111.jpg)
Slide from Chris Brew, adapted from slide by William Cohen
Example: Information Extraction
Filling slots in a database from sub-segments of text. As a task:
October 14, 2002, 4:00 a.m. PT
For years, Microsoft Corporation CEO Bill
Gates railed against the economic philosophy
of open-source software with Orwellian fervor,
denouncing its communal licensing as a
"cancer" that stifled technological innovation.
Today, Microsoft claims to "love" the open-
source concept, by which software code is
made public to encourage improvement and
development by outside programmers. Gates
himself says Microsoft will gladly disclose its
crown jewels--the coveted code behind the
Windows operating system--to select
customers.
"We can be open source. We love the concept
of shared source," said Bill Veghte, a
Microsoft VP. "That's a super-important shift
for us in terms of code access.“
Richard Stallman, founder of the Free
Software Foundation, countered saying…
NAME TITLE ORGANIZATION
Bill Gates CEO Microsoft
Bill Veghte VP Microsoft
Richard Stallman founder Free Soft..
IE
Slide courtesy Jason Eisner, with mild edits
![Page 112: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/112.jpg)
Phrase Types to Identify for IE
Closed set
He was born in Alabama…
The big Wyoming sky…
U.S. states
Regular set
Phone: (413) 545-1323
The CALD main office can be
reached at 412-268-1299
U.S. phone numbers
Complex pattern
University of Arkansas
P.O. Box 140
Hope, AR 71802
U.S. postal addresses
Headquarters:
1128 Main Street, 4th Floor
Cincinnati, Ohio 45210
…was among the six houses
sold by Hope Feldman that year.
Ambiguous patterns,
needing context and
many sources of evidence
Person names
Pawel Opalinski, Software
Engineer at WhizBang Labs.
Slide from Chris Brew, adapted from slide by William Cohen
Slide courtesy Jason Eisner, with mild edits
![Page 113: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/113.jpg)
• A key step in IE is to identify relevant phrases
– Named entities
• As on previous slides
– Relationship phrases
• “said”, “according to”, …
• “was born in”, “hails from”, …
• “bought”, “hopes to acquire”, “formed a joint agreement with”, …
– Simple syntactic chunks (e.g., non-recursive NPs)
• “Syntactic chunking” sometimes done before (or instead of) parsing
• Also, “segmentation”: divide Chinese text into words (no spaces)
• So, how do we learn to mark phrases?
Identifying phrases
Slide courtesy Jason Eisner, with mild edits
![Page 114: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/114.jpg)
Reduce to a tagging problem …
• The IOB encoding (Ramshaw & Marcus 1995):
– B_X = “beginning” (first word of an X)
– I_X = “inside” (non-first word of an X)
– O = “outside” (not in any phrase)
– Does not allow overlapping or recursive phrases
… United/B_ORG Airlines/I_ORG said/O Friday/O it/O has/O increased/O …
… the/O move/O ,/O spokesman/O Tim/B_PER Wagner/I_PER said/O …
Slide adapted from Chris Brew
What if this were tagged as B_ORG instead?
Slide courtesy Jason Eisner, with mild edits
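Reducing chunking to tagging means converting labeled spans into per-token IOB tags, as in the Ramshaw & Marcus encoding above. A minimal sketch (the span-triple representation is an assumption of this example, not a standard interface):

```python
def to_iob(tokens, spans):
    """spans: list of (start, end, type) token spans, end exclusive.
    Returns one IOB tag per token; assumes spans do not overlap."""
    tags = ["O"] * len(tokens)
    for start, end, typ in spans:
        tags[start] = "B_" + typ          # first word of the phrase
        for i in range(start + 1, end):
            tags[i] = "I_" + typ          # non-first words of the phrase
    return tags

tokens = "United Airlines said Friday it has increased".split()
tags = to_iob(tokens, [(0, 2, "ORG")])
```

A sequence tagger trained on such tags then recovers the phrases by reading off maximal B_X I_X… runs.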
![Page 115: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/115.jpg)
• Classified ads
• Restaurant reviews
• Bibliographic citations
• Appointment emails
• Legal opinions
• Papers describing clinical medical studies
• …
Example applications for IE
Slide courtesy Jason Eisner, with mild edits
![Page 116: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/116.jpg)
Text Annotation Tasks
1. Classify the entire document(“text categorization”)
2. Classify individual word tokens
3. Classify word tokens in a sequence
4. Identify phrases (“chunking”)
5. Syntactic annotation (parsing)
6. Semantic annotation
Slide courtesy Jason Eisner, with mild edits
![Page 117: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/117.jpg)
Context Free Grammar
Set of rewrite rules, comprised of terminals and non-terminals
S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
…
![Page 118: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/118.jpg)
Generate from a Context Free Grammar
S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
…
Baltimore is a great city
[Derivation tree: S → NP VP; NP → Noun → Baltimore; VP → Verb NP; Verb → is; NP → a great city]
![Page 119: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/119.jpg)
Assign Structure (Parse) with aContext Free Grammar
S → NP VP
NP → Det Noun
NP → Noun
NP → Det AdjP
NP → NP PP
PP → P NP
AdjP → Adj Noun
VP → V NP
Noun → Baltimore
…
Baltimore is a great city
[Parse tree: S → NP VP; NP → Noun → Baltimore; VP → Verb NP; Verb → is; NP → a great city]
[S [NP [Noun Baltimore] ] [VP [Verb is] [NP a great city]]]
bracket notation
(S (NP (Noun Baltimore))
   (VP (V is)
       (NP a great city)))
S-expression
![Page 120: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/120.jpg)
T = Cell[N][N+1]
for (j = 1; j ≤ N; ++j) {
    T[j-1][j].add(X for non-terminal X in G if X → wordj)
}
for (width = 2; width ≤ N; ++width) {
    for (start = 0; start ≤ N - width; ++start) {
        end = start + width
        for (mid = start+1; mid < end; ++mid) {
            for (non-terminal Y : T[start][mid]) {
                for (non-terminal Z : T[mid][end]) {
                    T[start][end].add(X for rule X → Y Z : G)
                }
            }
        }
    }
}
You may get to use a dynamic program
[Diagram: a rule X → Y Z combines the chart cells for adjacent subconstituents Y and Z]
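The dynamic program above (CKY recognition, assuming a grammar in Chomsky normal form) translates almost line-for-line into Python. The toy grammar and sentence below are invented for illustration:

```python
def cky(words, unary, binary):
    """CKY recognizer for a CNF grammar.
    unary: word -> set of non-terminals X with X -> word.
    binary: (Y, Z) -> set of non-terminals X with X -> Y Z."""
    N = len(words)
    T = [[set() for _ in range(N + 1)] for _ in range(N)]
    for j in range(1, N + 1):                      # fill width-1 cells
        T[j - 1][j] = set(unary.get(words[j - 1], ()))
    for width in range(2, N + 1):                  # then wider spans
        for start in range(0, N - width + 1):
            end = start + width
            for mid in range(start + 1, end):      # split point
                for Y in T[start][mid]:
                    for Z in T[mid][end]:
                        T[start][end] |= binary.get((Y, Z), set())
    return T

# Toy CNF grammar (invented): S -> NP VP, VP -> V NP
unary = {"Baltimore": {"NP"}, "loves": {"V"}, "crabs": {"NP"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}}
chart = cky("Baltimore loves crabs".split(), unary, binary)
```

The sentence is grammatical exactly when the start symbol S lands in the cell spanning the whole input, T[0][N].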
![Page 121: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/121.jpg)
You can get dependency-based syntactic forms
I ate the meal with friends
[Diagram: the same sentence analyzed as a constituency tree (S, NP, VP, PP) and as a dependency graph with arcs nsubj, dobj, nmod]
![Page 122: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/122.jpg)
And Go From Syntax to Shallow Semantics
http://corenlp.run/ (constituency & dependency)
https://github.com/hltcoe/predpatt
http://openie.allenai.org/
http://www.cs.rochester.edu/research/knext/browse/ (constituency trees)
http://rtw.ml.cmu.edu/rtw/
Angeli et al. (2015)
“Open Information Extraction”
a sampling of efforts
![Page 123: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/123.jpg)
• Easy to build a “yes” or “no” predictor from supervised training data
– Plenty of software packages to do the learning & prediction
– Lots of people in NLP never go beyond this ☺
• Similarly, easy to build a system that chooses from a small finite set
– Basically the same deal
– But runtime goes up linearly with the size of the set, unless you’re clever (HW3)
• Harder to predict the best string or tree (set is exponentially large or infinite)
– In the best case, requires dynamic programming; you might have to write your own code
– But finite-state or CRF toolkits may find the best string for you
– And you could modify someone else’s parser to pick the best tree
– An algorithm for picking the best can usually be turned into a learning algorithm
– You may need to rely on approximate solutions (e.g., via beam search)
• Hardest if your features look at “non-local” properties of the string or tree
– Now dynamic programming won’t work (or will be something awful like O(n^9))
– You need some kind of approximate search
– Can be harder to turn approximate search into a learning algorithm
– Still, this is a standard preoccupation of machine learning (“structured prediction,” “graphical models”)
Supervised Learning Methods
Slide courtesy Jason Eisner, with mild edits
![Page 124: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/124.jpg)
Text Annotation Tasks
1. Classify the entire document(“text categorization”)
2. Classify individual word tokens
3. Classify word tokens in a sequence
4. Identify phrases (“chunking”)
5. Syntactic annotation (parsing)
6. Semantic annotation
Slide courtesy Jason Eisner, with mild edits
![Page 125: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/125.jpg)
Semantic Role Labeling (SRL)
• For each predicate (e.g., verb)
1. find its arguments (e.g., NPs)
2. determine their semantic roles
John drove Mary from Austin to Dallas in his Toyota Prius.
The hammer broke the window.
– agent: Actor of an action
– patient: Entity affected by the action
– source: Origin of the affected entity
– destination: Destination of the affected entity
– instrument: Tool used in performing the action
– beneficiary: Entity for whom the action is performed
Slide thanks to Ray Mooney (modified)
Slide courtesy Jason Eisner, with mild edits
![Page 126: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/126.jpg)
As usual, can solve as classification …
• Consider one verb at a time: “bit”
• Classify the role (if any) of each of the 3 NPs
[Parse tree for a sentence whose verb is “bit”, with three candidate NPs (headed by “dog”, “girl”, “boy”), one inside a PP “with …”]
Color code: not-a-role, agent, patient, source, destination, instrument, beneficiary
Slide thanks to Ray Mooney (modified)
Slide courtesy Jason Eisner, with mild edits
![Page 127: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/127.jpg)
Parse tree paths as classification features
[Same parse tree: verb “bit” with three candidate NPs (“dog”, “girl”, “boy”)]
Path feature is V ↑ VP ↑ S ↓ NP, which tends to be associated with the agent role
Slide thanks to Ray Mooney (modified)
Slide courtesy Jason Eisner, with mild edits
![Page 128: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/128.jpg)
[Same parse tree: verb “bit” with three candidate NPs (“dog”, “girl”, “boy”)]
Slide thanks to Ray Mooney (modified)
Parse tree paths as classification features
Path feature is V ↑ VP ↑ S ↓ NP ↓ PP ↓ NP, which tends to be associated with no role
Slide courtesy Jason Eisner, with mild edits
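A path feature such as V ↑ VP ↑ S ↓ NP can be computed by walking up from the predicate to the lowest common ancestor, then down to the candidate argument. The parent-pointer tree representation and node names below are invented for illustration:

```python
def tree_path(parents, labels, a, b):
    """Path feature from node a to node b: labels joined by up-arrows to the
    lowest common ancestor, then down-arrows to the target node."""
    def chain(n):                       # n and all its ancestors, bottom-up
        out = [n]
        while parents.get(n) is not None:
            n = parents[n]
            out.append(n)
        return out
    up, down = chain(a), chain(b)
    lca = next(n for n in up if n in down)          # lowest common ancestor
    path = "↑".join(labels[n] for n in up[:up.index(lca) + 1])
    for n in reversed(down[:down.index(lca)]):
        path += "↓" + labels[n]
    return path

# Toy tree for "The dog bit the girl": S -> NP1 VP, VP -> V NP2
parents = {"S": None, "NP1": "S", "VP": "S", "V": "VP", "NP2": "VP"}
labels = {"S": "S", "NP1": "NP", "VP": "VP", "V": "V", "NP2": "NP"}
agent_path = tree_path(parents, labels, "V", "NP1")
patient_path = tree_path(parents, labels, "V", "NP2")
```

The two resulting strings are exactly the kind of feature values a role classifier would weight: the subject path tends toward agent, the object path toward patient.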
![Page 129: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/129.jpg)
Head words as features
• Some roles prefer to be filled by certain kinds of NPs.
• This can give us useful features for classifying accurately:
– “John ate the spaghetti with chopsticks.” (instrument)
– “John ate the spaghetti with meatballs.” (patient)
– “John ate the spaghetti with Mary.”
• Instruments should be tools
• Patient of “eat” should be edible
– “John bought the car for $21K.” (instrument)
– “John bought the car for Mary.” (beneficiary)
• Instrument of “buy” should be Money
• Beneficiaries should be animate (things with desires)
– “John drove Mary to school in the van”
– “John drove the van to work with Mary.”
• What do you think?
Slide thanks to Ray Mooney (modified)
Slide courtesy Jason Eisner, with mild edits
![Page 130: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/130.jpg)
Uses of Semantic Roles
• Find the answer to a user’s question
– “Who” questions usually want Agents
– “What” questions usually want Patients
– “How” and “with what” questions usually want Instruments
– “Where” questions frequently want Sources/Destinations
– “For whom” questions usually want Beneficiaries
– “To whom” questions usually want Destinations
• Generate text
– Many languages have specific syntactic constructions that must or should be used for specific semantic roles.
• Word sense disambiguation, using selectional restrictions
– The bat ate the bug. (what kind of bat? what kind of bug?)
• Agents (particularly of “eat”) should be animate – animal bat, not baseball bat
• Patients of “eat” should be edible – animal bug, not software bug
– John fired the secretary.
– John fired the rifle.
Patients of fire1 are different than patients of fire2
Slide thanks to Ray Mooney (modified)
Slide courtesy Jason Eisner, with mild edits
![Page 131: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/131.jpg)
• PropBank – coarse-grained roles of verbs
• NomBank – similar, but for nouns
• FrameNet – fine-grained roles of any word
• TimeBank – temporal expressions
Other Current Semantic Annotation Tasks (similar to SRL)
Slide courtesy Jason Eisner, with mild edits
![Page 132: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/132.jpg)
We avenged the insult by setting fire to his village.
REVENGE FRAME
Avenger
Offender (unexpressed in this sentence)
Injury
Injured Party (unexpressed in this sentence)
Punishment
FrameNet Example
a word/phrase that triggers the REVENGE frame
Slide thanks to CJ Fillmore (modified)
Slide courtesy Jason Eisner, with mild edits
![Page 133: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/133.jpg)
REVENGE FRAME
triggering words and phrases
(not limited to verbs)
avenge, revenge, retaliate, get back at, pay back, get even, …
revenge, vengeance, retaliation, retribution, reprisal, …
vengeful, retaliatory, retributive; in revenge, in retaliation, …
take revenge, wreak vengeance, exact retribution, …
FrameNet Example
Slide thanks to CJ Fillmore (modified)
Slide courtesy Jason Eisner, with mild edits
![Page 134: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/134.jpg)
Slide courtesy Jason Eisner
![Page 135: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/135.jpg)
Generating new text
1. Speech recognition (transcribe as text)
2. Machine translation
3. Text generation from semantics
4. Inflect, analyze, or transliterate words
5. Single- or multi-doc summarization
Slide courtesy Jason Eisner, with mild edits
![Page 136: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/136.jpg)
Deeper Information Extraction
1. Coreference resolution (within a document)
2. Entity linking (across documents)
3. Event extraction and linking
4. Knowledge base population (KBP)
5. Recognizing textual entailment (RTE)
Slide courtesy Jason Eisner, with mild edits
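To see why coreference resolution is hard, consider the weakest possible baseline: cluster mentions whose head word matches. The sketch below is an illustrative assumption, far weaker than real systems; the crude "last token is the head" heuristic is also an assumption.

```python
# A toy string-match coreference baseline (illustrative only).
def toy_coref(mentions):
    """Cluster mentions whose (crudely estimated) head words match."""
    clusters = {}
    for m in mentions:
        head = m.lower().split()[-1]     # crude head: last token
        clusters.setdefault(head, []).append(m)
    return list(clusters.values())

clusters = toy_coref(["Barack Obama", "Obama", "the president", "President Obama"])
```

The baseline correctly groups the three "Obama" mentions but strands "the president" in its own cluster, even though in context it refers to the same entity; resolving that requires world knowledge and discourse context, which is what makes coreference (and, across documents, entity linking) a deep task.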
![Page 137: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/137.jpg)
User interfaces
1. Dialogue systems
▪ Personal assistants
▪ Human-computer collaboration
▪ Interactive teaching
2. Language teaching; writing help
3. Question answering
4. Information retrieval
Slide courtesy Jason Eisner, with mild edits
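Information retrieval, the last item above, can be sketched in a few lines as bag-of-words overlap between query and document. This toy ranker is an illustration only; practical IR systems use tf-idf/BM25 weighting or learned neural scorers.

```python
# A toy bag-of-words retrieval ranker (illustrative only).
def rank(query, docs):
    """Sort documents by how many query words they share (most first)."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in docs]
    return [d for score, d in sorted(scored, key=lambda x: -x[0])]

docs = [
    "parsing builds a syntax tree",
    "dialogue systems talk to users",
    "question answering finds answers in text",
]
top = rank("how do dialogue systems work", docs)[0]
```

Word overlap ignores synonymy and word order ("dog bites man" vs. "man bites dog"), which is why ranking quality improves so much with term weighting and semantic representations.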
![Page 138: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/138.jpg)
Multimodal interfaces or modeling
1. Sign languages
2. Speech + gestures
3. Images + captions
4. Brain recordings, human reaction times
Slide courtesy Jason Eisner, with mild edits
![Page 139: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/139.jpg)
Discovering Linguistic Structure
1. Decipherment
2. Grammar induction
3. Topic modeling
4. Deep learning of word meanings
5. Language evolution (historical linguistics)
6. Grounded semantics
NLP mostly automates analyses that humans do well, so that they can be applied to far more sentences than any human could handle. But this slide is about language analysis that is hard even for humans. Computational linguistics (like computational biology, etc.) can discover underlying patterns in large datasets: things we didn’t know!
Slide courtesy Jason Eisner, with mild edits
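Item 4 above, learning word meanings from data, rests on the distributional idea that words used in similar contexts have similar meanings. The pure-Python sketch below is a count-based illustration of that idea, not a deep learning method: each word becomes a vector of co-occurrence counts, and words are compared by cosine similarity. The tiny corpus is invented for the example.

```python
# A count-based distributional-semantics sketch (illustrative only):
# the same intuition that word embeddings learn from much larger corpora.
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of its neighbors within the window."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        toks = sent.lower().split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if j != i:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the dog ate the bone",
    "the cat ate the fish",
    "stocks fell on the news",
]
vecs = cooccurrence_vectors(corpus)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])        # similar contexts
sim_cat_stocks = cosine(vecs["cat"], vecs["stocks"])  # unrelated contexts
```

Even on five sentences, "cat" and "dog" come out more similar than "cat" and "stocks": a pattern discovered from raw text, which is the sense in which such methods reveal structure we did not annotate.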
![Page 140: Overview of NLP Tasks and Featurization](https://reader030.vdocument.in/reader030/viewer/2022012411/616a82d72431d402341a6ed3/html5/thumbnails/140.jpg)
Today’s Learning Goals
• Learn about the many depths of language (the “iceberg”)
• Define featurization and some examples
• Learn about NLP Tasks at a high level, e.g.:
– Document classification
– Part of speech tagging
– Syntactic parsing
– Entity id/coreference