Natural Language Understanding
Difficulties:
Large amount of human knowledge assumed – context is key.
Language is pattern-based. Patterns can restrict possible interpretations.
Language is purposeful. There is a goal behind an utterance.
Other Difficulties
Ambiguity (at different levels):
  word meanings
  syntactic structure
  referential ambiguity
  intentional ambiguity
Imprecision
Idioms, jargon, slang
Language changes
Analysis of Language
Analysis of language occurs at different levels:
  Prosody – rhythm and intonation
  Phonology – sound formation (from phonemes)
  Morphology – word formation (from morphemes)
  Syntax – phrase and sentence formation
  Semantics – applying meaning to expressions
  Pragmatics – how language is used
  World knowledge – contextual information
Processing Language
Parsing – analyzing the syntactic structure of sentences, often resulting in a parse tree.
Semantics – analyzing the meaning of sentences, resulting in semantic networks, logical statements, or other knowledge representation (KR).
Integration of world knowledge – adding appropriate knowledge from the domain of discourse.
Use of knowledge learned from the discourse.
Processing (cont'd)
Often, the steps are done sequentially (parse the syntax of sentences, make semantic inferences, add domain knowledge, use the result), with the output of one stage becoming the input to the next stage.
Alternatively, fragments may be passed along as soon as they are determined (incremental parsing).
Feedback from later stages may be necessary to resolve references (“I shot the bear in my pajamas”), as in blackboard systems.
Context-Free Grammars
A good deal of syntax can be represented using context-free grammars (CFGs). Rules are of the form:
<non-terminal> <- list of <terminals> and <non-terminals>
Non-terminals are syntactic categories; terminals are words (and punctuation). One non-terminal is “sentence”, the start symbol.
CFG Example
sent <- np, vp.
np <- noun.
np <- art, noun.
np <- art, adj, noun.
vp <- verb.
vp <- verb, np.
noun <- boy.
noun <- dog.
art <- a.
art <- the.
adj <- yellow.
verb <- runs.
verb <- pets.
Prolog Code
sent(X,Y) :- np(X,Z), vp(Z,Y).
np(X,Y) :- noun(X,Y).
np(X,Y) :- art(X,Z), noun(Z,Y).
np(X,Y) :- art(X,Z), adj(Z,W), noun(W,Y).
vp(X,Y) :- verb(X,Y).
vp(X,Y) :- verb(X,Z), np(Z,Y).
noun([boy|Y],Y).
noun([dog|Y],Y).
art([a|Y],Y).
art([the|Y],Y).
adj([yellow|Y],Y).
verb([runs|Y],Y).
verb([pets|Y],Y).
Prolog Example
| ?- sent([the,boy,pets,a,dog],[]).
true ? ;
yes
| ?- sent([the,boy,likes,a,dog],[]).
no
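Most Prolog systems also accept definite clause grammar (DCG) notation, which hides the two list arguments threaded through the hand-written clauses. The following is a minimal sketch of the same grammar, assuming a system that supports the standard `-->` translation and `phrase/2` (for example SWI-Prolog or GNU Prolog). Because the translation defines `sent/2`, `np/2` and so on, it should be loaded in a separate file from the hand-written version.

```prolog
% DCG version of the grammar. Prolog translates each rule into a
% clause with two extra list arguments, as in the hand-written code.
sent --> np, vp.
np   --> noun.
np   --> art, noun.
np   --> art, adj, noun.
vp   --> verb.
vp   --> verb, np.
noun --> [boy].
noun --> [dog].
art  --> [a].
art  --> [the].
adj  --> [yellow].
verb --> [runs].
verb --> [pets].
```

A query such as `phrase(sent, [the,boy,pets,a,dog])` should succeed, while `phrase(sent, [the,boy,likes,a,dog])` should fail because `likes` is not in the lexicon.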
Parsing
We can augment the Prolog program so that each
clause has a third variable, which contains the parse
tree of the phrase. The parse trees are built up
recursively.
Parsing Code
sent(X,Y,s(M1,M2)) :- np(X,Z,M1), vp(Z,Y,M2).
np(X,Y,M) :- noun(X,Y,M).
np(X,Y,np(M1,M2)) :- art(X,Z,M1), noun(Z,Y,M2).
np(X,Y,np(M1,M2,M3)) :- art(X,Z,M1), adj(Z,W,M2), noun(W,Y,M3).
vp(X,Y,vp(M)) :- verb(X,Y,M).
vp(X,Y,vp(M1,M2)) :- verb(X,Z,M1), np(Z,Y,M2).
noun([boy|Y],Y,noun(boy)).
noun([dog|Y],Y,noun(dog)).
art([a|Y],Y,art(a)).
art([the|Y],Y,art(the)).
adj([yellow|Y],Y,adj(yellow)).
verb([runs|Y],Y,verb(runs)).
verb([pets|Y],Y,verb(pets)).
Parsing Example
| ?- sent([the,boy,pets,a,dog],[],M).
M = s(np(art(the),noun(boy)),vp(verb(pets),np(art(a),noun(dog)))) ? ;
(1 ms) no
| ?- sent([the,yellow,dog,runs],[],M).
M = s(np(art(the),adj(yellow),noun(dog)),vp(verb(runs))) ? ;
no
Semantics
Since the grammar rules are arbitrary Prolog code, it is possible
to add tests to them. For example, we could
include a type system and allow only parses that are
consistent with the types (for example, only animate
actors).
In addition, we could return the meaning of the
phrase or sentence instead of just a parse tree.
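The type test described above can be sketched as follows. This is only an illustration under stated assumptions: the predicate `animate/1`, the `t_` prefix on the grammar predicates, and the extra noun `rock` are hypothetical names introduced here and are not part of the grammar on the earlier slides.

```prolog
% Hypothetical type facts: which nouns denote animate things.
animate(boy).
animate(dog).

% Lexicon in difference-list form; nouns and verbs also return the word.
t_noun([boy|Y], Y, boy).
t_noun([dog|Y], Y, dog).
t_noun([rock|Y], Y, rock).
t_art([a|Y], Y).
t_art([the|Y], Y).
t_verb([runs|Y], Y, runs).
t_verb([pets|Y], Y, pets).

% A noun phrase passes up its head noun.
t_np(X, Y, N) :- t_noun(X, Y, N).
t_np(X, Y, N) :- t_art(X, Z), t_noun(Z, Y, N).

% A verb phrase passes up its verb; the object is parsed but not typed.
t_vp(X, Y, V) :- t_verb(X, Y, V).
t_vp(X, Y, V) :- t_verb(X, Z, V), t_np(Z, Y, _).

% A sentence is accepted only if its actor (the subject) is animate.
t_sent(X, Y) :- t_np(X, Z, Actor), animate(Actor), t_vp(Z, Y, _).
```

With these clauses, `t_sent([the,dog,runs],[])` should succeed, while `t_sent([the,rock,pets,a,dog],[])` should fail on the `animate/1` test even though the sentence is syntactically well formed.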
Frame and Slot Notation
In this simple example, we will use a frame and slot
notation for the meaning of words, phrases, and
sentences. A meaning will consist of a pair
containing a head item and a list of slots, each of
which is an attribute/value pair. Values may be
variables to be instantiated at a later time.
Notation Examples
For example, a verb in Simmons's semantic
representation scheme has the attributes agent and
object. The meaning of a verb, say, likes, could be
represented by the term
meaning([likes,[agent,X],[object,Y]], [[agent,X],[object,Y]])
X and Y will be instantiated by the meanings of other
words and phrases of the sentence.
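How the variables in such a term get instantiated can be sketched with plain unification. The predicates `verb_meaning/2`, `fill/3` and `likes_frame/3` below are hypothetical helpers introduced for illustration; only the shape of the `meaning(Head, Slots)` term comes from the slide. The sketch assumes `memberchk/2` is available, as it is in SWI-Prolog and GNU Prolog.

```prolog
% Lexical entry for the verb, using the meaning(Head, Slots) term.
verb_meaning(likes, meaning([likes,[agent,X],[object,Y]],
                            [[agent,X],[object,Y]])).

% fill(Name, Filler, Meaning): bind the variable of slot Name to Filler.
% memberchk/2 unifies [Name,Filler] with the first slot of that name.
fill(Name, Filler, meaning(_, Slots)) :-
    memberchk([Name,Filler], Slots).

% Hypothetical driver: build the frame for "<Agent> likes <Object>".
likes_frame(Agent, Object, Frame) :-
    verb_meaning(likes, M),
    fill(agent, Agent, M),
    fill(object, Object, M),
    M = meaning(Frame, _).
```

Because the slot list shares its variables with the head item, filling a slot also fills the corresponding position in the head. The query `likes_frame([boy], [dog], F)` should bind `F` to `[likes,[agent,[boy]],[object,[dog]]]`.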
More on Slots
The attribute names may be semantic relationships
(agent, object), or surface semantic relationships
(adjmod – the thing modified by an adjective, or
pobj – the object of a preposition). The slot filler
must come from an appropriate part of the sentence
as indicated by the grammar.
Another Example
prep([over|R], R, meaning([V,[location,[over,X]]|A], [[pmod,[V|A]],[pobj,X]])).
The preposition over modifies the item V that the
prepositional phrase attaches to (indicated by pmod), which may
already have a list of attributes A. The object of the
preposition, X, is added to the list of attributes under
the attribute name location, with the value [over,X].
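As a rough illustration of how this entry could be used, the sketch below repeats the clause, with the list tail written as `|A`, and adds a hypothetical helper `attach_prep/5` that fills the pmod and pobj slots by unification. The helper, the verb `jumps`, and the choice to skip parsing the object noun phrase are assumptions made to keep the example short.

```prolog
% The preposition entry, with the existing attributes as the list tail A.
prep([over|R], R, meaning([V,[location,[over,X]]|A],
                          [[pmod,[V|A]],[pobj,X]])).

% Hypothetical helper: attach a preposition to the frame it modifies
% (Modified) and to the frame of its object (Object).
attach_prep(Words, Rest, Modified, Object, Result) :-
    prep(Words, Rest, meaning(Result, Slots)),
    memberchk([pmod,Modified], Slots),
    memberchk([pobj,Object], Slots).
```

For example, `attach_prep([over,the,dog], R, [jumps,[agent,[boy]]], [dog], M)` should leave `R = [the,dog]` and bind `M` to `[jumps,[location,[over,[dog]]],[agent,[boy]]]`: the location slot is inserted directly after the head, ahead of the attributes the frame already had.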
Semantics - Example
| ?- sent([i,shot,the,bear,in,my,pajamas],[],M).
M = meaning([shoot,[location,[in,[pajamas,[owner,me]]]],[agent,[i]],[object,[bear]],[time,past]],[]) ? ;
;
no
Phrase Structured Grammar
These kinds of grammars are called phrase structure
grammars. As implemented in Prolog, they have
the computing power of any Turing-complete
system and yet are simple to follow.
Alternative Methods
Chart Parsing (Earley) – see book
Transition Network Parser: The grammar is
represented as a set of finite state machines
(transition diagrams). Each FSM implements a
non-terminal. Arcs are labeled with non-terminals or
terminals. In the former case, a subprogram is
invoked (jump to the network for that
non-terminal). A path from the start node to the end
node indicates acceptance.
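A transition network parser for the small grammar used earlier can be sketched in Prolog as follows. The state names (`s0`, `n0`, ...) and the predicates `arc/4`, `start/2`, `final/2`, `word/2`, `traverse/3`, `walk/4` and `step/3` are hypothetical; this is one possible encoding, not the one from the book.

```prolog
% One network per non-terminal: arc(Net, From, Label, To).
% A label is a word category (terminal arc) or the name of a network.
arc(sent, s0, np,   s1).
arc(sent, s1, vp,   s2).
arc(np,   n0, noun, n2).
arc(np,   n0, art,  n1).
arc(np,   n1, noun, n2).
arc(np,   n1, adj,  n3).
arc(np,   n3, noun, n2).
arc(vp,   v0, verb, v1).
arc(vp,   v1, np,   v2).

start(sent, s0).  start(np, n0).  start(vp, v0).
final(sent, s2).  final(np, n2).  final(vp, v1).  final(vp, v2).

word(art, a).      word(art, the).
word(adj, yellow).
word(noun, boy).   word(noun, dog).
word(verb, runs).  word(verb, pets).

% traverse(Net, Words, Rest): Net accepts a prefix of Words, leaving Rest.
traverse(Net, Words, Rest) :-
    start(Net, S),
    walk(Net, S, Words, Rest).

% Stop in a final state, or follow one arc and continue.
walk(Net, S, Words, Words) :- final(Net, S).
walk(Net, S, Words, Rest) :-
    arc(Net, S, Label, S1),
    step(Label, Words, Words1),
    walk(Net, S1, Words1, Rest).

% Terminal arc: consume one word of the given category.
step(Cat, [W|Ws], Ws) :- word(Cat, W).
% Non-terminal arc: jump to that network as a subprogram.
step(Sub, Words, Rest) :- start(Sub, _), traverse(Sub, Words, Rest).
```

A sentence is accepted when the `sent` network consumes the whole input, for instance `traverse(sent, [the,yellow,dog,runs], [])`.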
Augmented Transition Networks
Procedures may be attached to arcs and are
triggered when the arcs are traversed. A
procedure may perform a test, or set a variable to a
value for later use. ATNs are often combined with
KR schemas to produce a meaning of the sentence or
phrase (semantics).
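The idea of arc procedures that set and test variables can be sketched with a single small network that checks subject-verb number agreement. Everything here is an assumption made for illustration: the lexicon with a number feature, the words `boys` and `run`, the register name `subj_num`, and the predicates `lex/3`, `atn_arc/4`, `atn_final/1`, `atn_run/4`, `act/4` and `atn_accept/2`.

```prolog
% Hypothetical lexicon with a number feature: lex(Word, Category, Number).
lex(the,  art,  _).
lex(boy,  noun, sg).
lex(boys, noun, pl).
lex(runs, verb, sg).
lex(run,  verb, pl).

% atn_arc(From, Category, To, Action): Action runs when the arc is taken.
atn_arc(q0, art,  q1, none).
atn_arc(q0, noun, q2, set(subj_num)).    % remember the subject's number
atn_arc(q1, noun, q2, set(subj_num)).
atn_arc(q2, verb, q3, test(subj_num)).   % verb must agree with the subject

atn_final(q3).

% atn_run(State, Words, RegsIn, RegsOut): registers are Name=Value pairs.
atn_run(S, [], Regs, Regs) :- atn_final(S).
atn_run(S, [W|Ws], Regs0, Regs) :-
    atn_arc(S, Cat, S1, Action),
    lex(W, Cat, Num),
    act(Action, Num, Regs0, Regs1),
    atn_run(S1, Ws, Regs1, Regs).

% Procedures attached to arcs.
act(none, _, Regs, Regs).
act(set(Name), Value, Regs, [Name=Value|Regs]).
act(test(Name), Value, Regs, Regs) :- memberchk(Name=Value, Regs).

atn_accept(Words, Regs) :- atn_run(q0, Words, [], Regs).
```

Here `atn_accept([the,boys,run], R)` should succeed with `R = [subj_num=pl]`, while `atn_accept([the,boy,run], _)` should fail at the verb arc, because the test on the stored register does not match the verb's number.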
Uses of Natural Language
Database front-end
Question answering
Information extraction and summary (Web)
Next generation computing
Better than keyword search – incorporates context