SWG Strategy
(C) Copyright IBM Corp. 2006, 2012. All Rights Reserved.
International Technology Alliance Programme:
Fact Extraction using a Controlled Natural Language
David Mott, Dave Braines, ETS, Hursley, IBM UK
SWG Strategy – Emerging Technology Services, Hursley
(C) Copyright IBM Corp. 2006, 2011. All Rights Reserved.
Team
Dave Braines, David Mott
– IBM, Hursley
Steve Poteet, Ping Xue, Anne Kao
– Boeing, Seattle
Paul Smart, Antonio Penta, Ron Tasker
– University of Southampton
International Technology Alliance (ITA) in network and information sciences
How can coalition operations be assisted by networks of computer systems?
US/UK Academic/Industry collaboration
10 year programme ending in May 2016
– Sponsored by UK MOD and US ARL
– Research must be scientific, fundamental, reviewed by academic peers, and published
ITA Consortium Members
Fundamental Research Issues
How do we assist people to create and use applications that reason?
– Modelling concepts, relationships and rules of inference
– Grasping the basic logic of the model and rules
– Understanding the reasoning performed by others
– Sharing understanding across the human team
– Sharing reasoning and artefacts across different systems
Supporting the "analyst"
[Diagram labels: documents (doc27); CE Facts; Inference Rationale; Argumentation; Query; Analyst's Conceptual Model; Assumptions; Uncertainty; CNL Tools; NLP; Requirements; Product]
Analyst's "Conceptual Model"
Analyst represents specialist knowledge as concepts, facts and rules for inference
– a conceptual model
– a common set of concepts
The system must "understand" the conceptual model
– assist analyst to search for patterns, deduce information
A language to build the conceptual model
– analyst: easy to understand
– system: readable, unambiguous and formal
We use Controlled English to express the model
Controlled English
A Controlled Natural Language, being a subset of English
– limited syntax, but still readable as English
– meanings of the expressions unambiguously defined
Avoids the complexity of a real Natural Language
– computer systems can read, interpret and apply it
Retains the appearance of a real language
– humans can naturally use it, without learning "computer speak"
The analyst may use Controlled English to construct their Conceptual Model
the person John is married to the person Jane and has red as hair colour.
Based on work by John Sowa
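To illustrate why a limited syntax makes such a sentence machine-readable, here is a minimal Python sketch that turns the example sentence into subject/relation/object triples. It is not the project's CE parser: the function name `read_ce_sentence`, the three regular expressions and the triple layout are all assumptions made for this illustration, and only the two clause shapes in the example are handled.

```python
import re

# Hypothetical, heavily simplified reader for two CE clause shapes only;
# the real CE grammar is much richer than this.
SUBJECT = re.compile(r"^the (?P<concept>\w+) (?P<name>\w+) (?P<rest>.+)\.$")
RELATION = re.compile(r"^(?P<rel>is [\w ]+?) the (?P<concept>\w+) (?P<name>\w+)$")
ATTRIBUTE = re.compile(r"^has (?P<value>\w+) as (?P<attr>[\w ]+)$")


def read_ce_sentence(sentence):
    """Return (subject, relation, object) triples for one CE sentence."""
    head = SUBJECT.match(sentence.strip())
    if head is None:
        raise ValueError("not a recognised CE sentence: " + sentence)
    subject = (head["concept"], head["name"])
    triples = []
    # Clauses about the same subject are chained with "and".
    for clause in head["rest"].split(" and "):
        rel = RELATION.match(clause)
        attr = ATTRIBUTE.match(clause)
        if rel:
            triples.append((subject, rel["rel"], (rel["concept"], rel["name"])))
        elif attr:
            triples.append((subject, "has as " + attr["attr"], attr["value"]))
        else:
            raise ValueError("unrecognised clause: " + clause)
    return triples


if __name__ == "__main__":
    text = "the person John is married to the person Jane and has red as hair colour."
    for triple in read_ce_sentence(text):
        print(triple)
```

Anything outside the two clause shapes is rejected rather than guessed at, which mirrors the point of the slide: the language is small enough that every accepted sentence has one reading.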
CE for Reasoning
CE used to define:
– "propositions", facts, assumptions
– logical rules
– queries
– meta model of concepts
Inference engines constructed to apply logical rules
– Specific Prolog implementations
– CE Store based on Java and SQL
Rationale may be constructed:
– presented to users for hybrid man/machine reasoning
– to determine dependencies
Formal semantics for CE
– (partially defined) in FOPL
Applications
– analysis of information
– societal and open government data
– planning and resource allocation
– (in progress) NLP
Fact Extraction using Controlled Natural Language
As the target of the NL processing
– facts in documents can be used for further reasoning
As a means of describing the NL processing
– to share understanding of the linguistic processing
– to help configure NL tooling
Controlled English is "Curiously Useful" – Why?
perhaps because humans are naturally good at using language to model, understand and reason
we can build upon "literary devices" already developed to solve problems in expressing knowledge
Conceptual Model(s)
Meta Model
– concepts: Concept, Entity Concept, Relation Concept, Conceptual Model
– relations: belongs to, has as domain
Semiotic Triangle
– concepts: Thing, Meaning, Symbol
– relations: stands for, expresses
General
– concepts: Agent, Spatial Entity, Temporal Entity, Situation, Container
– relations: has as agent role, is contained in
Linguistic
– concepts: Sentence, Phrase, Word, Noun, Linguistic Category, Linguistic Frame
– relations: has as dependent, is parsed from
ACM
– concepts: Place, Church, Person, Village, IED, Facility, ....
– relations: is located in
[Diagram: triangle with corners meaning, symbol and thing; edges labelled conceptualises, stands for and expresses]
"Our" Semiotic Triangle, based on the original [Ogden, C. K. and Richards, I. A. (1923)]
Current NL Processing
[Diagram: NL processing pipeline. Components: SYNCOIN Reports; Message PreProcessor; Stanford Parser; Entity Extractor; Situation Extractor; CE Aggregator; CE Store. Other labels: Names; "Stylistic" CE; Conceptual Model (concepts, logical rules, linguistic expression); Proper Nouns (places, units); For Analysis]
Our focus is on the semantics of the conceptual model
General Semantics: Containers
"the patrol in East Rashid discovers the facility."
if ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and ( the noun phrase NP2 stands for the thing T2 )
then ( the thing T2 is a container ).
if ( the noun phrase NP1 stands for the thing T1 and has the prepositional phrase PP as dependent ) and ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and ( the noun phrase NP2 stands for the container T2 )
then ( the thing T1 is contained in the container T2 ).
[Diagram: the noun phrase np1 stands for the thing t1 and has the prepositional phrase pp1 as dependent; pp1 has the word |in| as head and the noun phrase np2 as object; np2 stands for the thing t2; t2 is a container; t1 is contained in t2]
Least Commitment approach – don't say what sort of container
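The two container rules can be traced by hand on the example sentence. The Python sketch below does this over a small set of triples. It is an illustration only, not the Prolog or CE Store implementation: the fact layout, the identifiers np1, pp1, t1 and so on, and the function names are assumptions chosen to match the rule text.

```python
# Facts are (subject, relation, object) triples; the identifiers np1, pp1, t1
# and so on are illustrative names, not output from a real parser.
FACTS = {
    ("np1", "stands for", "t1"),
    ("np1", "has as dependent", "pp1"),
    ("pp1", "has as head", "|in|"),
    ("pp1", "has as object", "np2"),
    ("np2", "stands for", "t2"),
}


def objects(facts, subject, relation):
    """All objects o such that (subject, relation, o) is a known fact."""
    return {o for (s, r, o) in facts if s == subject and r == relation}


def apply_container_rules(facts):
    """Apply the two container rules once, in order, and return all facts."""
    facts = set(facts)
    in_phrases = [s for (s, r, o) in facts if r == "has as head" and o == "|in|"]

    # Rule 1: the object of an 'in' phrase stands for a container.
    for pp in in_phrases:
        for np2 in objects(facts, pp, "has as object"):
            for t2 in objects(facts, np2, "stands for"):
                facts.add((t2, "is a", "container"))

    # Rule 2: the thing the parent noun phrase stands for is contained in it.
    for (np1, relation, pp) in list(facts):
        if relation != "has as dependent" or pp not in in_phrases:
            continue
        for t1 in objects(facts, np1, "stands for"):
            for np2 in objects(facts, pp, "has as object"):
                for t2 in objects(facts, np2, "stands for"):
                    if (t2, "is a", "container") in facts:
                        facts.add((t1, "is contained in", t2))
    return facts


if __name__ == "__main__":
    for fact in sorted(apply_container_rules(FACTS) - FACTS):
        print(fact)
```

Note that the sketch, like the rules, concludes only that t2 is some container. Nothing says whether it is a place, a building or a box, which is the least-commitment point made on the slide.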
Specific Semantics: Entities from Noun Phrases
"the patrol in East Rashid discovers the facility."
if ( the noun phrase NP has the noun N as head and stands for the thing T ) and ( the noun N expresses the entity concept EC )
then ( the thing T realises the entity concept EC ).
[Diagram: the noun phrase np1 has the noun |patrol| as head and stands for the thing s1; |patrol| expresses the entity concept 'patrol unit'; s1 realises 'patrol unit', so s1 is a patrol unit. Additional label: Analyst's helper]
Requires "expresses" link between words and concepts
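A short Python sketch of the rule above shows how much depends on the "expresses" link. The `NounPhrase` class, the function `realised_concepts` and the dictionary layout are assumptions made for illustration; only the wording of the emitted sentence follows the rule's conclusion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounPhrase:
    name: str       # identifier of the phrase, e.g. "np1"
    head_noun: str  # the noun the phrase has as head
    thing: str      # the thing the phrase stands for


def realised_concepts(noun_phrases, expresses):
    """Apply the rule: T realises EC when the head noun of NP expresses EC.

    `expresses` maps a noun to the list of entity concepts it expresses.
    """
    facts = []
    for phrase in noun_phrases:
        for concept in expresses.get(phrase.head_noun, ()):
            facts.append(
                f"the thing {phrase.thing} realises the entity concept '{concept}'."
            )
    return facts


if __name__ == "__main__":
    phrases = [
        NounPhrase("np1", "patrol", "s1"),
        NounPhrase("np2", "facility", "s2"),
    ]
    # Only |patrol| has an "expresses" link in this hypothetical lexicon.
    print(realised_concepts(phrases, {"patrol": ["patrol unit"]}))
```

In this run nothing is concluded about s2, because no concept is linked to the noun |facility|. Without the link the rule simply does not fire.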
"Analyst's Helper"
[Diagram labels: Analyst Helper; NL parser; "expresses"; conceptual model; Proper Names; wordnet/etc; gazetteers etc; meta information; ITAnet; MetaModel generator; Analyst; translate; semantic rules]
CE sentences shown in the diagram:
– the word |xxx| is an unrecognised word
– the word |www| expresses the concept yyy
Only the analyst knows what the concepts mean
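The exchange between the helper and the analyst can be sketched in Python as follows. The function names and the dictionary lexicon are hypothetical; only the two sentence shapes are taken from the slide. The helper reports words with no "expresses" link, and the analyst's answer is recorded as a new CE sentence.

```python
def unrecognised_word_reports(words, expresses):
    """CE sentences for words that have no "expresses" link yet."""
    reports = []
    # dict.fromkeys removes repeated words while keeping first-seen order.
    for word in dict.fromkeys(w.lower() for w in words):
        if word not in expresses:
            reports.append(f"the word |{word}| is an unrecognised word.")
    return reports


def expresses_sentence(word, concept):
    """CE sentence recording the analyst's decision for one word."""
    return f"the word |{word.lower()}| expresses the concept {concept}."


if __name__ == "__main__":
    lexicon = {"patrol": "patrol unit"}  # hypothetical existing links
    print(unrecognised_word_reports(["patrol", "facility", "Patrol"], lexicon))
    print(expresses_sentence("facility", "facility"))
```

The code only detects the gap. Choosing the concept stays with the analyst, in line with the slide's closing remark.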
Current question
How should the "expresses" link be made more expressive?
– conditional rules to handle ambiguous words
– selectional constraints based on semantics of models?
– introduce verbnet, etc?
– ...
The ambiguity barrier
we start from basic CE and move towards full English
Can we control the crossing of the ambiguity barrier?
[Diagram: a scale of increasing ambiguity from Basic CE, across the Ambiguity Barrier, to Full English. Features placed along the scale: anaphoric reference; sub clauses; prepositional phrases; flexible identities; verb inflections; domain specific syntax]
CE needs to be enhanced
"Identical" NL and CNL parsers
[Diagram labels: NL Parser; CNL Parser; lexicon; conceptual model; Reference English Grammar; Semantic Theory; stylistically expressive CE; basic CE or predicate logic or CE-in-Java; NLP]
– Increase stylistic expressibility of CE
– Better understanding of linguistics
Linguistic Frame for semantics
there is a linguistic frame named vp0 that
has 'is the dog Fido' as example and
defines the verb phrase VP_vp0 and
has the sequence
( the copula BE_vp0 , and the noun phrase OBJ_vp0 )
as syntactic pattern and
is predicated on the thing T and
has the statement that
( the noun phrase OBJ_vp0 is predicated on the thing OBJ )
and
( the thing T is the same as the thing OBJ )
as semantic statement.
the word |is| belongs to the linguistic category 'copula'.
the word |dog| is a noun.
the entity concept ce:Dog is expressed by the word |dog| and
has 'dog' as concept term.
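[Diagram: syntax: a copula followed by a noun phrase forms a verb phrase, as in "is the dog fido"; semantics: v(OBJ), dog(OBJ).. for the noun phrase and v(T) T=OBJ,... for the verb phrase. Additional labels: Analyst's Conceptual Model; Linguistic Model]
We want exactly the same logic here as in the real NL processing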
Could we?
use LKB instead of the Stanford Parser?
use the ERG instead of WordNet etc?
– where does the Analyst's Helper fit in?
improve our linguistic model to take account of LKB semantic theory?
represent MRS in CE?
represent linguistic rules in CE?