Introduction to Natural Language Processing
Bernardo Magnini
FBK, Trento, Italy
magnini@fbk.eu
NLP, Bologna May 22 2017 - Bernardo Magnini
Outline
• What is Natural Language Processing (NLP)
• Challenges in NLP
  – Ambiguity, redundancy
  – Lack of knowledge, need for inferences
  – Probabilistic judgments
• Natural Language Processing: where we are
  – Applications
  – Current limitations
• Several approaches
  – Frame semantics
  – Distributional semantics
  – Probabilistic models
What is Computational Linguistics
Computational Linguistics (CL) is the scientific study of language from a computational perspective. [www.aclweb.org/]
The long-term goal is to build machines that understand natural languages (e.g. English, German, Italian), both spoken and written.
Other terminology:
– Natural Language Processing (NLP)
– Human Language Technology (HLT)
A computational linguist is a scientist in CL, with a background in linguistics or computer science.
The Association for Computational Linguistics (ACL) is the reference scientific society for CL.
• Journals: Computational Linguistics, Transactions of the ACL, JNLE, etc.
• Conferences: ACL, EACL, EMNLP, COLING, LREC, etc.
Natural Language Interpretation
[Diagram: interpretation pipeline. Input: Speech, Text. Representations: Word Sequence, Sentence structure, Logical Form, Interpretation. Processing steps: Lexical analysis, Syntax (Parsing), Semantics, Pragmatics. Resources: Lexicon, Grammar, Context, World Knowledge.]
Lexical Analysis
Word level
– Tokenization (the role of punctuation)
– Morphological analysis
  • Lemma, part of speech, morphological features
World War One veteran becomes world's oldest man.
1. World (WORLD NOUN COMMON M SING)
2. War (WAR NOUN COMMON F SING)
3. One (ONE ADV NEG)
4. veteran (VETERAN NOUN COMMON SING)
5. becomes (BECOME VERB PRES 3 SING)
6. world (WORLD NOUN COMMON M SING)
7. 's ('S POS)
8. oldest (OLD ADJECTIVE M SING)
9. man (MAN NOUN COMMON M SING)
10. . (. )
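The tokenization step for the example sentence can be sketched as follows. This is a minimal illustration, assuming a simple regular-expression tokenizer; the name `tokenize` and the pattern are hypothetical and are not the tool that produced the analysis on the slide. The pattern splits off punctuation and the possessive clitic 's, which gives the same ten tokens as the numbered list.

```python
import re

# Illustrative pattern (an assumption, not the slide's tool):
# the clitic 's, runs of word characters, or single punctuation marks.
TOKEN_PATTERN = re.compile(r"'s\b|\w+|[^\w\s]")


def tokenize(text):
    """Split raw text into word, clitic and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("World War One veteran becomes world's oldest man.")
for position, token in enumerate(tokens, start=1):
    print(position, token)
```

A real tokenizer would also need rules for abbreviations, numbers and hyphenated words, which is where the role of punctuation becomes harder.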
Syntactic Analysis
• Sentence level
– Shallow parsing: chunking
  [NP I] [VP ate] [NP the spaghetti] [PP with] [NP chopsticks].
  [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs].
– Deep parsing: the syntactic structure of a sentence is recognized
– Syntactic constituents, syntactic ambiguities
Semantic Interpretation
Giorgio loves Maria
PR-Noun (Giorgio) PR-Noun (Maria) Verb (love(x-Subj, y-Obj))
NP (Giorgio-Subj) NP (Maria-Obj)
VP (love(x, Maria-Obj))
S (love(Giorgio-Subj, Maria-Obj))
• Compositional view of meaning
– Word sense disambiguation
– The meaning of a sentence is built from the meaning of its words
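The bottom-up composition for "Giorgio loves Maria" can be sketched with plain functions. This is only an illustration under stated assumptions: the verb is modelled as a curried function that first takes its object (building the VP) and then its subject (building S), and the tuple encoding of the logical form is hypothetical.

```python
# Hypothetical lexicon entry: the verb first takes its object, then its
# subject, mirroring the steps VP -> V NP and S -> NP VP.
def love(obj):
    def verb_phrase(subj):
        return ("love", subj, obj)
    return verb_phrase


giorgio = "Giorgio"
maria = "Maria"

vp = love(maria)        # VP: love(x, Maria-Obj)
sentence = vp(giorgio)  # S:  love(Giorgio-Subj, Maria-Obj)
print(sentence)
```

The meaning of the sentence is obtained only from the meanings of the words and the order in which the syntax combines them.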
Discourse Analysis and Pragmatics
• Sentences are interpreted in the communicative context in which they are uttered
– Non-linguistic context (e.g. time, place)
– Anaphora and ellipsis resolution
  I put pasta in the dish and then I ate it
– World knowledge is required (e.g. dishes are not food)
– Discourse relations (e.g. temporal connectives)
– User model (e.g. profiling)
NLP: A Difficult Challenge
Redundancy: different expressions (birthplace, native city) for the same meaning
Ambiguity: the same expression has different meanings (which “Mozart”?)
Incompleteness: inferences are needed (house - located_in - Salzburg)
Non-literal meaning: “saw the light”
CL: Where we are
✓ Natural Language Understanding
✓ Artificial Intelligence
✓ An interdisciplinary field: computer science, statistics, linguistics, psychology, philosophy of language, …
HAL 9000, 2001: A Space Odyssey (1968)
IBM Watson at the Jeopardy! Challenge (2011)
✓ Rule-based systems: 1970s-1980s
✓ Data-driven approaches: 1990s-now
✓ Enabling technology: search engines, translation, voice commands
Where we are: Personal Assistant
Voice commands
– Speech recognition
– Interpretation of simple questions
Context aware
– Know where you are: “the closest restaurant…”
Personalized
– Know your social network: “send a message to my wife…”
Still: poor dialogue, restricted domains, noisy environments, …
Where we are: Subtitling and Translation
✓ Real-time transcription
✓ Quasi real-time translation
✓ Applications:
  – media content (BBC)
  – Skype Translator
  – education (lectures)
✓ Language acquisition
Still: poor quality of translation (compared to professional level), …
Where we are: Semantic Tagging
The semantic web (3.0)
– Using metadata to tag “post”
– Crucial for semantic search
– Multimedia tagging
– Sentiment
The web of data
– Linking text to structured data (Linked Open Data, Wikipedia)
Still: tagging “big data” is computationally expensive, and portability (across domains and languages) is very poor
Where we are: Entity-Based Search
From keywords to objects
– Information extraction from large-scale archives: entities, persons, locations, institutions, relations.
Still: only simple entities, no events, coreference is problematic for low-frequency entities, …
Where we are: Deep Understanding
Semantic inferences
– Entailment, similarity, causality, temporal relations
– Probabilistic judgements
[Figure: graph over related statements, annotated with numeric labels (0.87, 0.9, 0.93, 1.0 and small integers) whose placement is not recoverable from the extraction. Statements shown: “chocolates' price is unashamedly high”, “chocolate bars cost like gold bars”, “chocolates' price is high”, “food on train is too expensive”, “food on train is expensive”, “food is too expensive”, “food costs too much”, “food is expensive”, “food in economy costs too much”, “sandwiches are overpriced”, “they charge too much for sandwiches”, “sandwiches cost too much money”, “sandwiches cost too much”.]
Still: performance is poor, clear models are lacking, available datasets are very small, …
1. Frame Semantics
Apple Siri
• http://www.apple.com/it/ios/siri/
• Personal voice assistant on all Apple devices
• Since 2011, several languages
• Voice commands: email, appointments, news, maps, points of interest, weather forecast, booking, search, …
• Integrated into third-party apps: WhatsApp, LinkedIn, Pinterest, …
“Frame and Slots” Semantics
Add statistical classifiers to map words to semantic frame fillers
[Figure: an example frame, with labels pointing to the Frame, a Slot and a Filler]
Managing conversations…
• A rule-based dialogue manager. At each dialogue state the system tries to fill a slot with user information.
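The slot-filling behaviour described above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical restaurant-booking frame; the slot names, prompts and functions are invented for the example and do not describe Siri's actual frames or rules.

```python
# Hypothetical frame for a restaurant-booking task; slot names and prompts
# are illustrative assumptions.
PROMPTS = {
    "place": "Which restaurant?",
    "time": "At what time?",
    "people": "For how many people?",
}


def next_action(frame):
    """Rule: ask for the first empty slot; when none is empty, act."""
    for slot in PROMPTS:
        if frame.get(slot) is None:
            return ("ask", slot, PROMPTS[slot])
    return ("execute", None, "Booking confirmed.")


def update(frame, slot, value):
    """Return a copy of the frame with one slot filled by user information."""
    filled = dict(frame)
    filled[slot] = value
    return filled


frame = {"place": None, "time": None, "people": None}
frame = update(frame, "place", "Da Mario")
action = next_action(frame)  # the system now asks for the time
print(action)
```

Each dialogue state is simply the current content of the frame, so the rule set stays small, at the price of handling only the restricted domain the frame was written for.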
“Active Ontology” (Apple Siri)
• Active Ontology: relational network of concepts
• Rule sets that perform actions on concepts
Frames and Linguistics
FrameNet [Fillmore et al. 01]
Frame: Hit_target (hit, pick off, shoot)
Lexical units (LUs): words that evoke the frame (usually verbs)
Frame elements (FEs): the involved semantic roles
Frame elements shown (the figure groups them into Core and Non-Core): Agent, Target, Instrument, Manner, Means, Place, Purpose, Subregion, Time
[Agent Kristina] hit [Target Scott] [Instrument with a baseball] [Time yesterday].
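The bracketed annotation of the Hit_target example can be read into a frame structure with a short script. This is a sketch under assumptions: the bracket format is taken from the example sentence, while the function names and the dictionary layout are hypothetical and are not part of FrameNet or its tools.

```python
import re

# Matches one annotated span such as "[Agent Kristina]".
ANNOTATION = re.compile(r"\[(\w+) ([^\]]+)\]")


def frame_elements(annotated):
    """Map each frame element name to the text span that fills it."""
    return {role: span.strip() for role, span in ANNOTATION.findall(annotated)}


def target_words(annotated):
    """Words outside any bracket, i.e. candidates for the lexical unit."""
    return ANNOTATION.sub(" ", annotated).replace(".", " ").split()


example = ("[Agent Kristina] hit [Target Scott] "
           "[Instrument with a baseball] [Time yesterday].")
roles = frame_elements(example)
print(roles)
print(target_words(example))
```

The frame elements play the part of slots and the annotated spans the part of fillers, which is the link to the "frame and slots" view used for dialogue.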
Abstract Meaning Representation
Semantic roles as slots
Once when I was six years old I saw a magnificent picture in a book , called True Stories from Nature , about the primeval forest .
(s / see-01
   :ARG0 (i / i)
   :ARG1 (p / picture
      :mod (m / magnificent)
      :location (b2 / book :wiki -
         :name (n / name :op1 "True" :op2 "Stories" :op3 "from" :op4 "Nature")
         :topic (f / forest
            :mod (p2 / primeval))))
   :mod (o / once)
   :time (a / age-01
      :ARG1 i
      :ARG2 (t / temporal-quantity :quant 6
         :unit (y / year))))
Semantic Role Labeling (SRL)
• SRL can be treated as a sequence labeling problem.
• For each verb, try to extract a value for each of the possible semantic roles for that verb.
• Employ any of the standard sequence labeling methods (e.g. machine learning algorithms)
Computational Linguistics 2016 - Bernardo Magnini
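Treating SRL as sequence labeling means giving every token one tag. A common way to do this, assumed here for illustration and not stated on the slide, is the BIO encoding: B-role opens a role span, I-role continues it, and O marks tokens outside any role. The sketch below converts role spans for the Hit_target example sentence into such a tag sequence; the function name and the span format are hypothetical.

```python
def spans_to_bio(tokens, spans):
    """Convert role spans into one BIO tag per token.

    spans: list of (role, start, end) with end exclusive, non-overlapping.
    """
    tags = ["O"] * len(tokens)
    for role, start, end in spans:
        tags[start] = "B-" + role
        for i in range(start + 1, end):
            tags[i] = "I-" + role
    return tags


tokens = "Kristina hit Scott with a baseball yesterday".split()
spans = [("Agent", 0, 1), ("Target", 2, 3), ("Instrument", 3, 6), ("Time", 6, 7)]
tags = spans_to_bio(tokens, spans)
for token, tag in zip(tokens, tags):
    print(token, tag)
```

Once the roles are encoded this way, any sequence labeler that predicts one tag per token can be trained on the result.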
SRL with Parse Trees
• Assume that a syntactic parse is available.
• For each predicate (verb), label each node in the parse tree as either not-a-role or one of the possible semantic roles.
[Figure: constituency parse tree with node labels S, NP, VP, PP, Prep, V, Det, A, Adj, N and ε over the words The, dog, with, the, girl, bit, a, big, boy (word order not recoverable from the extraction). The predicate and the node to be classified are marked. Colour code for nodes: not-a-role, agent, patient, source, destination, instrument, beneficiary.]
Parse Tree Path Feature
[Figure: parse tree with the path from the predicate to the node to be classified highlighted]
Path Feature Value: V ↑ VP ↑ S ↓ NP
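The path feature can be computed by climbing from the predicate to the lowest common ancestor and then descending to the node being classified. The sketch below assumes a simplified, hypothetical tree for "The dog bit a boy" (not the exact tree of the slide), a tree encoded as nested (label, children) tuples, and nodes addressed by their child-index positions.

```python
# Hypothetical simplified tree: (label, [children]); leaves have no children.
TREE = ("S", [
    ("NP", [("Det", [("The", [])]), ("N", [("dog", [])])]),
    ("VP", [
        ("V", [("bit", [])]),
        ("NP", [("Det", [("a", [])]), ("N", [("boy", [])])]),
    ]),
])


def label_at(tree, position):
    """Label of the node reached by following child indices from the root."""
    node = tree
    for index in position:
        node = node[1][index]
    return node[0]


def path_feature(tree, predicate_pos, node_pos):
    """Labels going up from the predicate to the common ancestor, then down."""
    depth = 0  # depth of the lowest common ancestor
    while (depth < min(len(predicate_pos), len(node_pos))
           and predicate_pos[depth] == node_pos[depth]):
        depth += 1
    up = [label_at(tree, predicate_pos[:d])
          for d in range(len(predicate_pos), depth - 1, -1)]
    down = [label_at(tree, node_pos[:d])
            for d in range(depth + 1, len(node_pos) + 1)]
    return "↑".join(up) + "".join("↓" + label for label in down)


subject_path = path_feature(TREE, (1, 0), (0,))   # V to the subject NP
object_path = path_feature(TREE, (1, 0), (1, 1))  # V to the object NP
print(subject_path, object_path)
```

The two paths differ, which is what makes the feature useful: a subject and an object NP have the same phrase type but sit in different places relative to the verb.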
SRL Features
[Figure: parse tree with the predicate and the node to be classified]
Phrase type: NP
Parse path: V↑VP↑S↓NP
Position: precede
Voice: active
Head word: dog
Lexical Meaning
Word meanings may help slot filling (e.g. a food is expected as the patient of eat)
We need lexical dictionaries: a snapshot of the WordNet hierarchy
http://www.cogsci.princeton.edu/~wn/
BabelNet: A Rich and Multilingual Semantic Network
http://babelnet.org
Ontology Population and Linking
• Detecting and collecting as much information as possible from (unstructured) text and storing it in a (structured) knowledge base. The goal is then to retrieve and use information from the KB rather than from texts.
• Linking against existing repositories
  – Wikipedia info-boxes
• Cross-document coreference: are we talking about the same entity/event?
  – Need for world knowledge to fill textual gaps
• Building a Knowledge Graph (Google-like)
KnowledgeStore in
A Multimodal KnowledgeStore
Interpreting / extracting / aligning knowledge from different media (e.g., video, commentary, images, text, …)
[Table with columns Frame, Commentary, Knowledge; the Frame entries did not survive extraction]
Commentary: “Sanchez, Sanchez, . . . goal. Sanchez equalizes for Chile”
Knowledge: dbpedia:Alexis_Sanchez scorestAt 32min
Commentary: “Yellow card for the Chilean defender”
Knowledge: dbpedia:Mauricio_Pinilla yellowCardAt 102min
Commentary: “Now is Marcelo turn, to kick the fourth penalty” “Marcelo. . . Goal”
Knowledge: dbpedia:Marcelo_Vieira kicks SuppPenalty4; SuppPenalty4 leadsTo goal
2. Representing Word Meaning with Vectors
What is the meaning of “bardiwac”?
• He handed her glass of bardiwac.
• Beef dishes are made to complement the bardiwacs.
• Nigel staggered to his feet, face flushed from too much bardiwac.
• Malbec, one of the lesser-known bardiwac grapes, responds well to Australia’s sunshine.
• I dined off bread and cheese and this excellent bardiwac.
• The drinks were delicious: blood-red bardiwac as well as light, sweet Rhenish.
⇒ bardiwac is a heavy red alcoholic beverage made from grapes
Stefan Evert 2010
Geometric interpretation of Word Meaning
• row vector x_dog describes the co-occurrence frequencies of the word dog in the corpus
• can be seen as coordinates of a point in n-dimensional Euclidean space R^n
Stefan Evert 2010
Distance and similarity
• illustrated for two dimensions: get and use: x_dog = (115, 10)
• similarity = spatial proximity (Euclidean distance)
• location depends on frequency of noun (f_dog ≈ 2.7 · f_cat)
Stefan Evert 2010
Vector Space
Postulate: Words that are “close together” in the vector space have similar meaning. There is one dimension for each term in the document collection.
[Figure: vectors d1 to d5 plotted against the term axes t1, t2, t3, with angles θ and φ between vectors]
Angle and similarity
• direction more important than location
• normalise “length” ||x_dog|| of vector
• or use angle α as distance measure
Stefan Evert 2010
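The difference between location and direction can be checked numerically. In this sketch x_dog = (115, 10) is the vector from the slides, while x_rare is a hypothetical word with the same usage profile but ten times fewer occurrences; the function names are illustrative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def norm(u):
    """Length ||u|| of a vector."""
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    """Cosine of the angle between two vectors (independent of length)."""
    return sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))


def angle_degrees(u, v):
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine(u, v)))))


x_dog = (115, 10)      # counts from the slides (dimensions: get, use)
x_rare = (11.5, 1.0)   # hypothetical: same usage profile, 10 times rarer

print(euclidean(x_dog, x_rare))      # large: location depends on frequency
print(angle_degrees(x_dog, x_rare))  # close to 0: same direction
```

Under the Euclidean distance the two words look far apart only because one is more frequent, whereas the angle treats them as equivalent, which is the motivation for normalising the vector length.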
Application: Clustering
Adapted from Stefan Evert 2010
Embeddings: Dimensionality Reduction
• One-hot vector (a vector with a single 1 and 0 everywhere else)
  – Social = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
  – Private = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
  – Easy to obtain, but computationally expensive (vector size = vocabulary size)
  – Do not generalize
  – Risk of overfitting (due to size)
• Dense vectors perform better and reduce the risk of overfitting:
  – Social = [0.5, 0.3, 1.0]
  – Private = [0.2, 1.0, 0.4]
Using Neural Networks to Build Embeddings
Transform the original sparse vector INTO a new dense and optimized vector
[Figure labels: INPUT, CLASSIFIER]
Neural Word Embedding
In this example we suppose 10,000 words in the vocabulary. The network extracts a dense vector of 300 elements for each word.
[Figure labels: Vocabulary size, Dense vector size]
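The step from a sparse to a dense vector can be sketched as a vector-matrix product: multiplying a one-hot vector by a weight matrix with one row per vocabulary word selects that word's row. The sizes below are kept tiny instead of 10,000 by 300, and the random weights are a stand-in for trained ones; all names are hypothetical.

```python
import random

VOCAB_SIZE = 5   # the slide assumes 10,000; kept tiny here
DENSE_SIZE = 3   # the slide assumes 300

random.seed(0)
# Weight matrix with one row per vocabulary word
# (random values standing in for trained weights).
W = [[random.uniform(-1, 1) for _ in range(DENSE_SIZE)]
     for _ in range(VOCAB_SIZE)]


def one_hot(index, size):
    """Sparse vector with a single 1 at the word's index."""
    return [1 if i == index else 0 for i in range(size)]


def embed(sparse, weights):
    """Vector-matrix product: sparse (1 x V) times weights (V x D)."""
    return [sum(x * row[j] for x, row in zip(sparse, weights))
            for j in range(len(weights[0]))]


dense = embed(one_hot(2, VOCAB_SIZE), W)
print(dense == W[2])  # the product selects row 2 of W
```

In practice the multiplication is replaced by a direct row lookup, and the rows of the matrix are what training optimizes.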
3. Representing Meaning with Probabilistic Models
Linguistic events are not uniformly distributed (Zipf's Law)
[Figure: log-log plot of word frequency against rank for “Pinocchio”; both axes run from 1 to 10,000]
Logarithmic scale: log f(z) = log K - a log z
With K = 6185, i.e. f × z = 6185
The slope of the curve is defined by the coefficient a (here 1)
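The relation can be checked with a few ranks. The sketch uses K = 6185 from the slide and assumes a = 1, which is what f × z = K implies; the function name is hypothetical, and the predicted frequencies are those of the idealised law, not counts measured on the text.

```python
import math

K = 6185   # constant reported on the slide
A = 1.0    # slope assumed to be 1, consistent with f x z = K


def zipf_frequency(rank, k=K, a=A):
    """Predicted frequency of the word at a given rank: f(z) = K / z**a."""
    return k / rank ** a


for rank in (1, 10, 100, 1000):
    f = zipf_frequency(rank)
    # log f(z) + a * log z should equal log K at every rank
    print(rank, f, math.log10(f) + A * math.log10(rank))
```

On a log-log plot these points fall on a straight line with slope -a, which is the shape shown for the rank-frequency curve.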
![Page 54: Introduction to Natural Language Processing...Introduction to Natural Language Processing Bernardo Magnini FBK, Trento, Italy magnini@fbk.eu NLP, Bologna May 22 2017 -Bernardo Magnini](https://reader030.vdocument.in/reader030/viewer/2022041106/5f0856b47e708231d421847e/html5/thumbnails/54.jpg)
54Computational Linguistics 2013, Bernardo Magnini
Linguistic Events
• Examples of linguistic events
– The probability that the word “dog” occurs after the word “the”
– The probability that the word “race” is a NOUN given that the word before is an ARTICLE
– The probability that the translation of “chair” is “sedia” given that the word before is translated as “tavolo”
– The probability that “dog” is a SUBJECT of the verb “bark”
– The probability that “John Smith” is a PERSON
– The probability of the sentence “My dog barks”
– The probability of a text like “Pinocchio” to be generated
• Estimate posterior probabilities for linguistic events:
– use corpora for empirical observations
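As an illustration of estimating such a probability from empirical observations, here is a minimal sketch for the "race" example. The four-sentence tagged corpus is invented for the demonstration, and the estimate is a plain relative frequency; `p_tag_given_word` and `p_tag_given_context` are hypothetical names.

```python
from collections import Counter

# Hypothetical hand-tagged toy corpus: (word, tag) pairs per sentence.
tagged_corpus = [
    [("the", "ARTICLE"), ("race", "NOUN"), ("starts", "VERB")],
    [("a", "ARTICLE"), ("race", "NOUN"), ("ends", "VERB")],
    [("they", "PRONOUN"), ("race", "VERB"), ("home", "ADVERB")],
    [("the", "ARTICLE"), ("dog", "NOUN"), ("barks", "VERB")],
]

word_counts = Counter()       # count(word)
word_tag_counts = Counter()   # count(word, tag)
context_counts = Counter()    # count(previous tag, word)
event_counts = Counter()      # count(previous tag, word, tag)

for sentence in tagged_corpus:
    for word, tag in sentence:
        word_counts[word] += 1
        word_tag_counts[(word, tag)] += 1
    # Pair every token with the token before it.
    for (_, prev_tag), (word, tag) in zip(sentence, sentence[1:]):
        context_counts[(prev_tag, word)] += 1
        event_counts[(prev_tag, word, tag)] += 1


def p_tag_given_word(tag, word):
    """Relative-frequency estimate of P(tag | word), ignoring context."""
    total = word_counts[word]
    return word_tag_counts[(word, tag)] / total if total else 0.0


def p_tag_given_context(tag, word, prev_tag):
    """Relative-frequency estimate of P(tag | word, previous tag)."""
    total = context_counts[(prev_tag, word)]
    return event_counts[(prev_tag, word, tag)] / total if total else 0.0


p_noun = p_tag_given_word("NOUN", "race")                        # 2 of 3 occurrences
p_noun_after_article = p_tag_given_context("NOUN", "race", "ARTICLE")  # 2 of 2
```

In this toy corpus, conditioning on the preceding ARTICLE raises the estimate from 2/3 to 1, which is the kind of evidence a tagger exploits; with so few observations the numbers themselves are not meaningful.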
![Page 55: Introduction to Natural Language Processing...Introduction to Natural Language Processing Bernardo Magnini FBK, Trento, Italy magnini@fbk.eu NLP, Bologna May 22 2017 -Bernardo Magnini](https://reader030.vdocument.in/reader030/viewer/2022041106/5f0856b47e708231d421847e/html5/thumbnails/55.jpg)
Language Model
• Define a statistical (stochastic) model of language
1. Observe linguistic phenomena on a training corpus
2. Estimate probabilities of linguistic events (i.e. the language model)
3. Apply the model to a new (not yet observed) set of data
• A text T is seen as a sequence of simple events e1, e2, … e|T| (not necessarily independent), each of them representing the occurrence of a word in T.
• Example: The1 dog2 sleeps3 here4 the5 dog6 sleeps7 there8 the9 dog10 eats11 here12 the13 cat14 eats15 there16 a17 cat18 sleeps19 …
• e1 = The, e2 = dog, …
– p(ei = w) is the probability that the word type w occurs as the i-th event in T
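The observe-and-estimate steps can be sketched on the example text. The sketch uses only the 19 tokens shown (the slide's text continues beyond them), lowercases them so that "The" and "the" count as one word type, and uses unsmoothed relative frequencies; these choices, and the names `p_unigram` and `p_bigram`, are assumptions made for illustration.

```python
from collections import Counter

# The 19 tokens visible in the example.
text = (
    "The dog sleeps here the dog sleeps there the dog eats here "
    "the cat eats there a cat sleeps"
)
events = text.lower().split()   # e1, e2, ..., e19

# Step 1: observe the training corpus.
unigram_counts = Counter(events)
bigram_counts = Counter(zip(events, events[1:]))
history_counts = Counter(events[:-1])   # words that have a successor


# Step 2: estimate probabilities by relative frequency.
def p_unigram(w):
    """P(e_i = w), ignoring the position i and the context."""
    return unigram_counts[w] / len(events)


def p_bigram(w, previous):
    """P(e_i = w | e_(i-1) = previous)."""
    total = history_counts[previous]
    return bigram_counts[(previous, w)] / total if total else 0.0


# Step 3: apply the model to a new word sequence (chain of bigram estimates).
def p_sequence(words):
    probability = p_unigram(words[0])
    for previous, w in zip(words, words[1:]):
        probability *= p_bigram(w, previous)
    return probability


p_the = p_unigram("the")                 # 4 occurrences out of 19 events
p_dog_after_the = p_bigram("dog", "the") # 3 of the 4 successors of "the"
p_new = p_sequence(["the", "cat", "sleeps"])
```

Any sequence containing an unseen bigram gets probability zero here, which is why practical language models add smoothing.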
![Page 56: Introduction to Natural Language Processing...Introduction to Natural Language Processing Bernardo Magnini FBK, Trento, Italy magnini@fbk.eu NLP, Bologna May 22 2017 -Bernardo Magnini](https://reader030.vdocument.in/reader030/viewer/2022041106/5f0856b47e708231d421847e/html5/thumbnails/56.jpg)
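Estimate probabilities for linguistic events: EM Approach
[Figure: the EM cycle]
• Initial guess: starting values for the unknown parameters (probabilities)
• Expectation step: use the current parameters and the observed structure (words) to guess the unknown hidden structure (tags, parses, etc.)
• Maximization step: use the guessed hidden structure and the observed structure (words) to re-estimate the unknown parameters (probabilities)
• Repeat the two steps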
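The alternation of the two steps can be sketched on a deliberately non-linguistic toy problem: two biased coins, where the identity of the coin plays the role of the hidden structure and the head counts play the role of the observed structure. The data, the initial guess and the equal prior on the two coins are all assumptions chosen for the illustration.

```python
import math

# Observed structure: number of heads in each of 5 trials of 10 flips.
# Hidden structure: which of two biased coins produced each trial.
heads = [5, 9, 8, 4, 7]
FLIPS = 10


def likelihood(h, theta):
    """P(h heads in FLIPS flips | bias theta), up to a constant factor
    (the binomial coefficient is the same for both coins and cancels)."""
    return theta ** h * (1 - theta) ** (FLIPS - h)


def log_likelihood(theta_a, theta_b):
    """Log-likelihood of the data (up to an additive constant),
    assuming each coin is picked with probability 0.5."""
    return sum(
        math.log(0.5 * likelihood(h, theta_a) + 0.5 * likelihood(h, theta_b))
        for h in heads
    )


def em_step(theta_a, theta_b):
    heads_a = tails_a = heads_b = tails_b = 0.0
    for h in heads:
        # E step: posterior probability that coin A produced this trial.
        la, lb = likelihood(h, theta_a), likelihood(h, theta_b)
        resp_a = la / (la + lb)
        heads_a += resp_a * h
        tails_a += resp_a * (FLIPS - h)
        heads_b += (1 - resp_a) * h
        tails_b += (1 - resp_a) * (FLIPS - h)
    # M step: re-estimate each bias from the expected counts.
    return heads_a / (heads_a + tails_a), heads_b / (heads_b + tails_b)


theta_a, theta_b = 0.6, 0.5          # initial guess
trace = [log_likelihood(theta_a, theta_b)]
for _ in range(20):
    theta_a, theta_b = em_step(theta_a, theta_b)
    trace.append(log_likelihood(theta_a, theta_b))
```

The list `trace` never decreases from one iteration to the next, which is the property that makes EM usable even though it may stop at a local optimum.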
![Page 57: Introduction to Natural Language Processing...Introduction to Natural Language Processing Bernardo Magnini FBK, Trento, Italy magnini@fbk.eu NLP, Bologna May 22 2017 -Bernardo Magnini](https://reader030.vdocument.in/reader030/viewer/2022041106/5f0856b47e708231d421847e/html5/thumbnails/57.jpg)
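For Hidden Markov Models
[Figure: the EM cycle]
• Initial guess: starting values for the unknown parameters (probabilities)
• E step: use the current parameters and the observed structure (words, ice cream) to guess the unknown hidden structure (tags, parses, weather)
• M step: use the guessed hidden structure and the observed structure (words, ice cream) to re-estimate the unknown parameters (probabilities)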
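For an HMM, the E step is usually computed with the forward-backward algorithm. The sketch below runs it once on an ice cream and weather example; the two states, all probabilities and the observation sequence are invented for the illustration and are not the values used in the original course material.

```python
STATES = ["HOT", "COLD"]

# Hypothetical current parameter guess.
start = {"HOT": 0.5, "COLD": 0.5}
trans = {
    "HOT": {"HOT": 0.8, "COLD": 0.2},
    "COLD": {"HOT": 0.2, "COLD": 0.8},
}
emit = {  # P(number of ice creams eaten | weather)
    "HOT": {1: 0.1, 2: 0.2, 3: 0.7},
    "COLD": {1: 0.7, 2: 0.2, 3: 0.1},
}

observations = [2, 3, 3, 2, 1, 1]   # observed structure: ice creams per day


def forward(obs):
    """alpha[t][s] = P(obs[0..t], state at day t = s)."""
    alpha = [{s: start[s] * emit[s][obs[0]] for s in STATES}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: emit[s][o] * sum(prev[r] * trans[r][s] for r in STATES)
            for s in STATES
        })
    return alpha


def backward(obs):
    """beta[t][s] = P(obs[t+1..] | state at day t = s)."""
    beta = [{s: 1.0 for s in STATES}]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, {
            s: sum(trans[s][r] * emit[r][o] * nxt[r] for r in STATES)
            for s in STATES
        })
    return beta


alpha = forward(observations)
beta = backward(observations)
total = sum(alpha[-1][s] for s in STATES)   # P(observations)

# E step output: posterior over the hidden weather for every day.
gamma = [
    {s: alpha[t][s] * beta[t][s] / total for s in STATES}
    for t in range(len(observations))
]
```

An M step would then re-estimate `start`, `trans` and `emit` from these posteriors (together with the pairwise posteriors over consecutive days), and the cycle would repeat.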
![Page 58: Introduction to Natural Language Processing...Introduction to Natural Language Processing Bernardo Magnini FBK, Trento, Italy magnini@fbk.eu NLP, Bologna May 22 2017 -Bernardo Magnini](https://reader030.vdocument.in/reader030/viewer/2022041106/5f0856b47e708231d421847e/html5/thumbnails/58.jpg)
600.465 - Intro to NLP - J. Eisner
![Page 59: Introduction to Natural Language Processing...Introduction to Natural Language Processing Bernardo Magnini FBK, Trento, Italy magnini@fbk.eu NLP, Bologna May 22 2017 -Bernardo Magnini](https://reader030.vdocument.in/reader030/viewer/2022041106/5f0856b47e708231d421847e/html5/thumbnails/59.jpg)
![Page 60: Introduction to Natural Language Processing...Introduction to Natural Language Processing Bernardo Magnini FBK, Trento, Italy magnini@fbk.eu NLP, Bologna May 22 2017 -Bernardo Magnini](https://reader030.vdocument.in/reader030/viewer/2022041106/5f0856b47e708231d421847e/html5/thumbnails/60.jpg)
NLP Research at FBK-irst (Trento)
• Human Language Technologies (HLT) research unit
– Natural Language Processing
– Machine Translation
– Speech Recognition
• Language Technologies
– TextPro platform: basic text processing in Italian and English
– Speech Recognition system
• High level education
– PhD students
– Internship, thesis
– LCT seminars
• Spin-off: Pervoice, Spazio Dati, Cross Library Service, Semantic Valley
![Page 61: Introduction to Natural Language Processing...Introduction to Natural Language Processing Bernardo Magnini FBK, Trento, Italy magnini@fbk.eu NLP, Bologna May 22 2017 -Bernardo Magnini](https://reader030.vdocument.in/reader030/viewer/2022041106/5f0856b47e708231d421847e/html5/thumbnails/61.jpg)
Textbooks
1. [JM] Jurafsky-Martin, Speech and Language Processing, 2009 (third edition in preparation).
2. [MRS] Manning, Raghavan, Schütze, An Introduction to Information Retrieval, CUP, 2008.
3. [CFL] Clark, Fox, Lappin, The Handbook of Computational Linguistics and Natural Language Processing, 2010.
4. [MS] Manning-Schütze, Foundations of Statistical Natural Language Processing, 1999.
5. [CL] Mitkov, Handbook of Computational Linguistics, 2003.
6. [JM] Jackson-Moulinier, Natural Language Processing for Online Applications, 2002.
7. [AE] Agirre-Edmonds, Word Sense Disambiguation, 2006.
8. [WN] Fellbaum, WordNet, MIT Press, 1998.
9. [AL] Allen, Natural Language Understanding, 1995.