the semantics and pragmatics of natural language
![Page 1: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/1.jpg)
The Semantics and Pragmatics
of Natural Language
Daniela GÎFU
http://profs.info.uaic.ro/~daniela.gifu/
“ALEXANDRU IOAN CUZA” UNIVERSITY OF IAŞI
FACULTY OF COMPUTER SCIENCE
![Page 2: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/2.jpg)
Course 1
SPNL OVERVIEW
![Page 3: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/3.jpg)
https://profs.info.uaic.ro/~daniela.gifu/
Who am I?
![Page 4: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/4.jpg)
“Alexandru Ioan Cuza” University of Iași
T H E H A L L O F T H E L O S T S T E P S
![Page 5: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/5.jpg)
Faculty of Computer Science
BE AMONG THE FIRST…..
![Page 6: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/6.jpg)
Romanian Academy
![Page 7: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/7.jpg)
![Page 8: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/8.jpg)
What is this course about?
➢ Meaning and Natural Language Processing (NLP)
➢ Computational Semantics
➢ Computational Pragmatics
![Page 9: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/9.jpg)
Familiarization
with relevant terminology
• Semantics
• Pragmatics
• Natural language
• Computational Linguistics
• Natural Language Processing
…
![Page 10: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/10.jpg)
Simulation of human (natural) intelligence by machines
Interdisciplinary field ~ scientific study of language from a computational perspective
A discipline that spans theory and practice to understand computer systems and networks at a deep level.
![Page 11: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/11.jpg)
Computational Linguistics (CL)
vs.
Natural Language Processing (NLP)
![Page 12: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/12.jpg)
CL = provides the theoretical background (computational
theories of language, linguistic models).
NLP = applied CL, including:
- natural language technology (NLT)
- human language technology (HLT)
Research
Engineering techniques have to be underpinned by scientific
understanding…
Good performance on some
tasks when large amounts of
(annotated) data are available
![Page 13: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/13.jpg)
Spoken language
- speech processing (from speech to text to syntax and
semantics to speech) - https://speechlogger.appspot.com/ro/
Ex: mobile
Written language – my area of interest
Language in correlation with other modalities
(multimodality)
- speech
- intonation
- image
Ex: GPS (Global Positioning System)
Natural Language Technology
![Page 14: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/14.jpg)
Document segmentation and interpretation
– cleaning (elimination of dots, enhancing contrast,
etc.)
– separation of text from image, curved lines...
– recognizing printed, semi-uncial characters, etc.
• Optical Character Recognition (OCR)
~100% accuracy in scanning printed Latin-script material
Challenge in OCR
Written Language Technologies
Students?
![Page 15: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/15.jpg)
OCR Handwriting – Why?
= presents some unique particularities
= many varieties of cursive writing
see: https://pdf.iskysoft.com/ocr-pdf/handwriting-ocr.html
![Page 16: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/16.jpg)
OCR Handwriting – very challenging
= the interpretation of physician handwriting (Rasmussen,
L.V. et al., 2012; Broda, B. & Piasecki, M., 2007)
= analysis of old handwritten documents (useful for linguists,
musicians, historians, etc.)
Document Image Analysis
PR (Pattern Recognition) = a sub-topic of machine learning:
the description or classification (recognition) of measurements.
![Page 17: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/17.jpg)
Differences between CL Approaches
![Page 18: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/18.jpg)
•Analysis and understanding of written language
– sub-syntactic processing
• lexical units
• sentence splitting
• clause borders
• part of speech and morphological information
• lemmas
• entity names
• groups (nominal, verbal, prepositional, etc.)
and lexical attractions (collocations)
Written Language Technologies
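Two of the sub-syntactic steps listed on this slide, sentence splitting and the identification of lexical units, can be illustrated with a minimal sketch. The regular expressions below are naive assumptions chosen for illustration (they would mis-split abbreviations such as "etc."), and the function names are hypothetical, not part of any course tool.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "I saw the gentleman with the hat. He waved!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

print(sentences)
print(tokens)
```

Real pipelines replace both functions with trained or rule-rich components; the sketch only shows where the two steps sit relative to each other.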
![Page 19: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/19.jpg)
• Language analysis and understanding
– semantic and discourse processing
• semantic disambiguation → word senses
• semantic role labeling → NLTK
• rhetorical structure of discourse and dialogue →
RST (Rhetorical Structure Theory)
• anaphora resolution → Stanford CoreNLP
• text summarization → Machine Learning
Written Language Technologies
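Semantic disambiguation, the first item on this slide, can be sketched with a simplified Lesk-style overlap count: pick the sense whose dictionary gloss shares the most words with the context. This is one classic technique, not necessarily the one used in the course. The two-sense inventory for "bank", the glosses, and the stopword list are all hypothetical and written only for this example.

```python
def simplified_lesk(word, context, sense_inventory, stopwords=frozenset()):
    """Return the sense whose gloss shares the most words with the context."""
    context_words = {w.lower() for w in context} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in sense_inventory[word].items():
        gloss_words = set(gloss.lower().split()) - stopwords
        overlap = len(context_words & gloss_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical sense inventory (a real system would use a resource such as WordNet).
SENSES = {
    "bank": {
        "bank/finance": "an institution that accepts deposits and lends money",
        "bank/river": "sloping land beside a body of water such as a river",
    }
}
STOP = frozenset({"a", "an", "the", "of", "and", "that", "as", "such",
                  "to", "on", "he", "she"})

s1 = simplified_lesk("bank", "she sat on the bank of the river".split(), SENSES, STOP)
s2 = simplified_lesk("bank", "he went to the bank to deposit money".split(), SENSES, STOP)
print(s1, s2)
```

The sketch compares surface forms only, so "deposit" and "deposits" do not match; lemmatization, listed among the sub-syntactic steps, is what would close that gap.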
![Page 20: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/20.jpg)
the study of mathematical structures and methods that are
of importance to linguistics.
Phonetics → Phonology → Morphology → Syntax → Semantics → …
Sociolinguistics → Language Acquisition.
Mathematical Linguistics
Mathematical Linguistics before Computational Linguistics….
ML ⇔ CL?
![Page 21: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/21.jpg)
= the art of solving problems that require analyzing
(or generating) natural language text.
Find the metrics for a good solution to the
engineering problem…
NLP
Google Translate – Don’t blame!!!!
Romanian = Luceafărul de dimineață
English = The morning gentleman (bad answer)
= Morning star (good answer)
Why????
explains how human translators do their job...
Let’s try!
![Page 22: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/22.jpg)
NLP – a subdomain of
Artificial Intelligence & Linguistics
Thematic Areas
- Linguistics - mathematical linguistics - computational
linguistics
- Formal Language
- Linguistic and Language Processing
- The grammatical structure of utterances: the sentence,
constituents, phrase, classifications and structural rules,
syntactic processing ...
- Parser or Syntax Analyzer
- Semantics & Pragmatics
![Page 23: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/23.jpg)
= an area of Artificial Intelligence (AI) devoted to
creating computers that use NL as input and/or
output.
NLP
AI-hard problem
= machine reading
comprehension
= produces language
as output on the basis
of data input
![Page 24: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/24.jpg)
= developing computational methods/models of human
linguistic behavior.
CL
▪ INFORMATION RETRIEVAL
▪ INFORMATION EXTRACTION
▪ MACHINE TRANSLATION
▪ QUESTION – ANSWERING
▪ SUMMARIZATION
▪ MACHINE READABLE DICTIONARIES
▪ SPELLING & GRAMMAR CHECKERS
…
Let’s describe and exemplify
![Page 25: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/25.jpg)
A discipline concerned with understanding written and spoken
language from a computational perspective.
- detecting synonymy (Grigonytė et al., 2010);
- developing WordNet (including Romanian - Gala and Mititelu,
2013), (Iftene and Balahur, 2007)...;
- WSD, word sense disambiguation (Yang, H. et al. 2010), (Lefever and Hoste, 2010), (Tufiș,
2002)...;
- semantic annotation (Garcia et al., 2012)...;
- reconstructing a diachronic morphology (Cristea et al.,
2007/2012)
- diachronic text classification (Mihalcea and Năstase, 2012;
Popescu and Strapparava, 2015), etc.
- epoch detection (Gifu, 2015/2016/2017)...;
CL – Applications
Tools developed
by students…
![Page 26: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/26.jpg)
Linguistic & Language Processing
1. Linguistics
- Science of language. Includes:
✓ Sounds (phonology)
✓ Word formation (morphology)
✓ Sentence structure (syntax)
✓ Meaning (semantics) and understanding
(pragmatics)…
2. Levels of linguistic analysis
- Higher level → Speech Recognition (SR)
- Lower levels → Natural Language Processing (NLP)
![Page 27: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/27.jpg)
Levels of Linguistic Analysis
Speech Recognition: acoustic signal, phonemes
NLP: letters (strings), morphemes, words, phrases & sentences, meaning out of context, meaning in context
Phonetics – production and perception of speech
Phonology – Sound patterns of language
Lexicon – Dictionary of words in a language
Morphology – Word formation and structure
Syntax – Sentence structure
Semantics – Intended meaning
Pragmatics – Understanding from external info
![Page 28: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/28.jpg)
NLP Pipeline
Course purpose
![Page 29: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/29.jpg)
MAIN CONCEPTS
1. Natural Language
- used by human beings for communication...
- sign, system, symbols, rule-set (or grammar)
2. Semantics
- literal meaning determined from a word, phrase,
sentence.
3. Pragmatics
- contextual meaning {situation, speaker, etc.}
![Page 30: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/30.jpg)
Natural or ordinary language
• A system of speech symbols → (form criterion)
Types:
a) speech (spoken language)
b) writing (written language) - the representation of a spoken or
gestural language.
• The most important means of human communication →
(function criterion)
![Page 31: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/31.jpg)
Natural Language…
• Multiplicity of languages
![Page 32: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/32.jpg)
Formal Language_I
1. Symbol
- a character, an abstract entity that has no meaning by
itself
Ex: letters, digits and special characters
2. Alphabet
- finite set of symbols
- often denoted by Σ
Ex:
B = {0, 1} says B is an alphabet of two symbols, 0 and 1
C = {a, b, c} – C an alphabet of 3 symbols, a, b and c
* More about formal language:
http://www.its.caltech.edu/~matilde/FormalLanguageTheory.pdf
![Page 33: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/33.jpg)
Formal Language_II
3. String or word
- a finite sequence of symbols from an alphabet
Ex: 01110 and 111 are strings from the alphabet B above
aaabccc and b are strings from the C above
4. Sentence
- a string of words.
Ex: I saw the gentleman with the hat.
String = a b c d e c f (the repeated word "the" gets the same symbol, c)
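The definitions of alphabet, string, and sentence can be checked mechanically. The sketch below assumes a hypothetical helper, `is_string_over`, and treats a sentence as a string whose symbols are whole words drawn from a small vocabulary, which is an illustrative assumption.

```python
def is_string_over(sequence, alphabet):
    """True if every symbol of the finite sequence belongs to the alphabet."""
    return all(symbol in alphabet for symbol in sequence)

B = {"0", "1"}           # alphabet of two symbols
C = {"a", "b", "c"}      # alphabet of three symbols

print(is_string_over("01110", B))     # True
print(is_string_over("aaabccc", C))   # True
print(is_string_over("012", B))       # False: "2" is not a symbol of B

# A sentence as a string of words: here the "alphabet" is a vocabulary.
VOCAB = {"I", "saw", "the", "gentleman", "with", "hat"}
sentence = "I saw the gentleman with the hat".split()
print(is_string_over(sentence, VOCAB))  # True
```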
![Page 34: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/34.jpg)
Formal Language_III
How can the possible relations between the parts of a string be defined?
A.
[I] saw the gentleman [with the binoculars] = [a] b c d [e c f]
B.
I saw [the gentleman with the binoculars] = a b [c d e c f]
We can represent structures with trees…
[Figure: two trees for "I saw the gentleman with the binoculars.", one per reading]
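The two readings A and B can be written down as trees. In this minimal sketch a tree is a nested tuple `(label, child, ...)` and a leaf is a plain string. The category labels S, NP, VP and PP are assumptions added for illustration; the slide itself only marks the groupings.

```python
# Trees as nested tuples: (label, child, child, ...); leaves are plain strings.
NP_I = ("NP", "I")
PP = ("PP", "with", ("NP", "the", "binoculars"))

# Reading A: the PP is attached to the verb phrase (the seeing is done with binoculars).
tree_a = ("S", NP_I, ("VP", "saw", ("NP", "the", "gentleman"), PP))
# Reading B: the PP is attached inside the object NP (the gentleman has the binoculars).
tree_b = ("S", NP_I, ("VP", "saw", ("NP", "the", "gentleman", PP)))

def leaves(tree):
    """Return the words at the leaves of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words

def bracketed(tree):
    """Render the tree as a labelled bracketing."""
    if isinstance(tree, str):
        return tree
    return "[" + tree[0] + " " + " ".join(bracketed(c) for c in tree[1:]) + "]"

print(bracketed(tree_a))
print(bracketed(tree_b))
print(leaves(tree_a) == leaves(tree_b))
```

Both trees have exactly the same leaves, so the string alone cannot tell the readings apart; only the structure does.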
![Page 35: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/35.jpg)
Formal Language_IV
5. Language
- a set of strings of symbols from an alphabet.
6. Natural Language or ordinary language
- open-ended = built on 3 different knowledge components: the
sound of words - phonology; the meaning of words -
semantics; the grammatical rules according to which words are
put together - syntax.
7. Formal language
- a set L of sequences/strings over some finite alphabet Σ
- described using formal grammars (a set of rules specifying
which strings belong to it).
- many applications (e.g., Prognosis wearable system)
![Page 36: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/36.jpg)
Formal Language_V
Context-Free Grammars (CFG) - a finite set of grammar rules
https://www.tutorialspoint.com/automata_theory/context_free_grammar_introduction.htm
= a quadruple (N, T, P, S), where:
N = a finite set of non-terminal symbols (characters or variables).
Note! Each n ∈ N = a type of phrase/clause in the sentence.
T = a finite set of terminals (an alphabet, defined by the grammar), disjoint from N: N ∩ T = ∅.
P = a finite set of (rewrite) rules or productions of the grammar, each rewriting one non-terminal
as a string over N ∪ T: A → α, with A ∈ N and α ∈ (N ∪ T)*.
Note! The left-hand side of a production rule does not have any right context or left
context. * = Kleene star operation = a unary operation on sets of strings or sets of symbols
(characters) → applied to a set V it is written V* (also used in regular expressions).
Ex: {"a", "b", "c"}* = {ε, "a", "b", "c", "aa", "ab",
"ac", "ba", "bb", "bc", "ca", "cb", "cc", "aaa", "aab",
...}
ε = the empty string; {ε} = the language consisting only of the empty string.
S = start symbol (S ∈ N), used to represent the whole sentence.
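A small sketch can make the quadruple (N, T, P, S) and the Kleene star concrete. The toy grammar below is an assumption written for this example (it is not taken from the slides), and `kleene_star`, `is_well_formed` and `derive` are hypothetical helper names. Since V* is infinite, `kleene_star` only enumerates strings up to a chosen length, and `derive` only explores a bounded number of rewriting steps.

```python
from itertools import product

def kleene_star(symbols, max_len):
    """All strings over `symbols` of length 0..max_len (a finite slice of symbols*)."""
    result = []
    for n in range(max_len + 1):
        result.extend("".join(p) for p in product(sorted(symbols), repeat=n))
    return result

# Toy grammar G = (N, T, P, S), assumed for illustration only.
N = {"S", "NP", "VP", "PP"}
T = {"I", "saw", "the", "gentleman", "hat", "with"}
P = {
    "S": [["NP", "VP"]],
    "NP": [["I"], ["the", "gentleman"], ["the", "hat"], ["NP", "PP"]],
    "VP": [["saw", "NP"]],
    "PP": [["with", "NP"]],
}
S = "S"

def is_well_formed(N, T, P, S):
    """Check N ∩ T = ∅, S ∈ N, and that every rule has the shape A → α, α ∈ (N ∪ T)*."""
    return (
        N.isdisjoint(T)
        and S in N
        and all(lhs in N for lhs in P)
        and all(sym in N | T for rhss in P.values() for rhs in rhss for sym in rhs)
    )

def derive(form, steps):
    """Yield terminal strings reachable from the sentential form `form`
    in at most `steps` leftmost rewriting steps."""
    for i, sym in enumerate(form):
        if sym in N:                      # leftmost non-terminal found
            if steps == 0:
                return                    # budget exhausted, abandon this branch
            for rhs in P[sym]:
                yield from derive(form[:i] + rhs + form[i + 1:], steps - 1)
            return
    yield " ".join(form)                  # no non-terminal left: a sentence

print(kleene_star({"a", "b", "c"}, 2))
print(is_well_formed(N, T, P, S))
sentences = set(derive([S], 7))
print("I saw the gentleman with the hat" in sentences)
```

Each rule rewrites a single non-terminal regardless of its neighbours, which is exactly the "no left or right context" property noted on the slide.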
![Page 37: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/37.jpg)
Main Concepts - II
CONCLUSIONS
Computational semantics and pragmatics:
➢ automatic construction of semantic representations for NL
expressions (in context).
➢ automatic inferences over the representations.
Major Issues:
➢ Ambiguity at various levels:
lexical, syntactic, semantic, pragmatic
➢ Interface between the logical form (LF) derived from linguistic form and the context of use
(essential for modelling anaphora).
Tools used include:
➢ Information: syntax, world knowledge, lexical semantics,
corpora…
➢ Inference: logic (model checking and theorem proving), machine
learning, statistics…
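Inference over semantic representations by model checking can be sketched in a few lines: a formula is evaluated against a small finite model. Everything below is an illustrative assumption: the tuple encoding of formulas, the function name `holds`, the toy model, and the simplification of "the" to an existential quantifier.

```python
def holds(formula, model, g=None):
    """Evaluate a formula (nested tuples) in a finite model under assignment g."""
    g = g or {}
    op = formula[0]
    if op == "pred":
        _, name, *args = formula
        # Variables are looked up in g; anything else is treated as a constant.
        values = tuple(g.get(a, a) for a in args)
        return values in model["relations"][name]
    if op == "not":
        return not holds(formula[1], model, g)
    if op == "and":
        return holds(formula[1], model, g) and holds(formula[2], model, g)
    if op == "implies":
        return (not holds(formula[1], model, g)) or holds(formula[2], model, g)
    if op in ("exists", "forall"):
        _, var, body = formula
        results = (holds(body, model, {**g, var: d}) for d in model["domain"])
        return any(results) if op == "exists" else all(results)
    raise ValueError(f"unknown operator: {op}")

# Toy model: one speaker, one gentleman, one hat.
MODEL = {
    "domain": ["speaker", "g1", "h1"],
    "relations": {
        "gentleman": {("g1",)},
        "hat": {("h1",)},
        "saw": {("speaker", "g1")},
        "with": {("g1", "h1")},
    },
}

# "I saw the gentleman with the hat" (reading: the gentleman has the hat):
# ∃x (gentleman(x) ∧ (∃y (hat(y) ∧ with(x, y)) ∧ saw(speaker, x)))
saw_gentleman_with_hat = (
    "exists", "x",
    ("and", ("pred", "gentleman", "x"),
     ("and",
      ("exists", "y", ("and", ("pred", "hat", "y"), ("pred", "with", "x", "y"))),
      ("pred", "saw", "speaker", "x"))),
)
# "I saw a hat": ∃x (hat(x) ∧ saw(speaker, x))
saw_a_hat = ("exists", "x", ("and", ("pred", "hat", "x"), ("pred", "saw", "speaker", "x")))

print(holds(saw_gentleman_with_hat, MODEL))
print(holds(saw_a_hat, MODEL))
```

A model checker answers "is this formula true in this situation?"; a theorem prover instead asks whether a formula follows from others in every situation, which this sketch does not attempt.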
![Page 38: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/38.jpg)
Semester Homework:
1. Each student has to present a paper related to
his/her SEMEVAL task, which will guide the final project
- https://aclweb.org/anthology/
published between 2018 and 2021
EMNLP (Conference on Empirical Methods in Natural Language
Processing)
ACL (Association for Computational Linguistics)
EACL (European Chapter of the Association for
Computational Linguistics)
COLING (International Conference on
Computational Linguistics) …
![Page 39: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/39.jpg)
Final project: SEMEVAL 2022
Groups of 2-3 students:
- 1-2 humanists & 1 computer scientist prepare a paper
for SEMEVAL-2022 based on their research,
under constant supervision -
https://semeval.github.io/SemEval2022/tasks
![Page 40: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/40.jpg)
Project steps – next time
1. Form a team...
2. Choose a task
3. Define the teamwork
4. Establish the modular structure
5. Edit the paper – a possible structure
![Page 41: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/41.jpg)
5. Edit the paper – making an outline
* Choosing a Title
* Abstract (executive summary) & Keywords
* Introduction (the new approach; background
information; research problem/question; theoretical
framework)
* SOTA (citation tracking; content alert services;
evaluating sources; primary sources; secondary sources…)
* Methodology (qualitative methods; quantitative
methods)
* Results
* Discussion
* Conclusions and future work
* References
![Page 42: The Semantics and Pragmatics of Natural Language](https://reader031.vdocument.in/reader031/viewer/2022011901/61d630909d14531c1f70e5fd/html5/thumbnails/42.jpg)
Thank you!