From Words to Knowledge
ORION Active Structure
Two Approaches
We could separate the process of turning words into knowledge into its components, or we could adopt a more holistic approach.
A Sequence of Activities
This approach segments the process into separate parts, each of which is blind to all the others. This seems easier conceptually, but is obviously not what people do in reading text.
[Diagram: a sequence of separate stages labelled Words, POS Tags, Grammar, Semantics, Knowledge]
The Holistic Approach
The lexing, grammar, semantic and structure-building processes proceed simultaneously and synergistically, opportunistically using any information coming from any direction
[Diagram labels: Active Structure, Local Knowledge, Global Knowledge, Semantics, Grammar, Words]
The Basic Elements
These are the basic elements of Active Structure - variables, operators, links, values flowing in the structure
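The following is a minimal sketch of how variables, operators, links and flowing values might fit together. It is not the ORION implementation: the class names `Variable` and `PlusOperator` and the choice of an addition operator are assumptions made purely for illustration. The point it shows is that a value can flow through the structure in any direction, depending on which variables are already known.

```python
class Variable:
    """Hypothetical node: holds a value (None = unknown) and links to operators."""

    def __init__(self, name, value=None):
        self.name = name
        self.value = value
        self.links = []  # operators attached to this variable

    def set(self, value):
        # Only an unknown variable accepts a value; this also stops endless loops.
        if self.value is None:
            self.value = value
            for op in self.links:  # the new value flows out along every link
                op.activate()


class PlusOperator:
    """Hypothetical operator maintaining a + b = total in any direction."""

    def __init__(self, a, b, total):
        self.a, self.b, self.total = a, b, total
        for v in (a, b, total):
            v.links.append(self)

    def activate(self):
        a, b, t = self.a.value, self.b.value, self.total.value
        if a is not None and b is not None:
            self.total.set(a + b)
        elif a is not None and t is not None:
            self.b.set(t - a)
        elif b is not None and t is not None:
            self.a.set(t - b)


x, y, z = Variable("x"), Variable("y"), Variable("z")
PlusOperator(x, y, z)
z.set(10)       # nothing can be inferred yet
x.set(3)        # now the operator can push a value back into y
print(y.value)  # 7
```

Setting `x` and `y` first would instead push a value forward into `z`; the same piece of structure serves both directions.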
A Common Substrate
[Diagram label: PARSE]
The basic elements of Active Structure can also be seen as Entities, Relations and States.
These three elements are adequate to model everything - including the grammar of language and the world of objects.
The Reading Process
A document is read, paragraph by paragraph, sentence by sentence, word by word.
As the words are read, they are turned into objects that can be manipulated - objects that have the properties both of words and of the objects they represent - a ligand, a gene
The word objects are assembled through grammar into larger objects - receptor or gene structure
And into larger structures, using the relations between the objects provided by nouns and verbs
Transformation
Changes in the conformation of the Tsr dimer induced by serine binding improve methylation efficiency
Building Structure
When a single possible structure match is found, an invocation of the structure is built, leaving a new BRIDGE operator to look for higher level matches
[Diagram labels: Four Word Noun Phrase; Start, The, Red, Car, Stop]
Next Symbol
[Diagram labels: words Start, The, Hat, Was, On, Of, Head, Man, Stop; structures NounPhrase (three), VerbPhrase, PrepPhrase, Preposition (two)]
The Next Symbol depends on the local structure - run down from the current symbol, then run up again if other structure exists, otherwise jump across a PARSE
Harpooning the Model
When the noun phrase is recognised, the objects it joins are searched for a connection. One is found for animal and colour through ATTRIBUTE, so the same relation joining the objects is searched for in the model, and a unique match is “harpooned” for use with relations. The type of object changes the grammar.
Automatic Phasing
A BRIDGE operator doing a long match may find that not all the information is available.
[Diagram labels: Four Word Noun Phrase; Start, The, Red, Car, Stop]
If so, it puts a connection on the missing information and waits to be re-activated.
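The wait-and-reactivate behaviour described for the BRIDGE operator can be sketched roughly as follows. This is an illustrative assumption, not the actual operator: the `Slot` and `Bridge` classes, the `waiters` list and the three-slot pattern are all hypothetical names chosen for the example.

```python
class Slot:
    """Hypothetical place in the structure that may or may not hold a word yet."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.waiters = []  # operators that asked to be woken when this fills

    def fill(self, value):
        self.value = value
        waiting, self.waiters = self.waiters, []
        for op in waiting:  # re-activate everything that was waiting here
            op.activate()


class Bridge:
    """Hypothetical stand-in for a BRIDGE operator matching a run of slots."""

    def __init__(self, label, slots):
        self.label = label
        self.slots = slots
        self.result = None

    def activate(self):
        if self.result is not None:
            return  # the match has already been built
        for slot in self.slots:
            if slot.value is None:
                # Information is missing: connect to it and go dormant.
                if self not in slot.waiters:
                    slot.waiters.append(self)
                return
        self.result = (self.label, [s.value for s in self.slots])


slots = [Slot("determiner"), Slot("adjective"), Slot("noun")]
bridge = Bridge("NounPhrase", slots)
slots[0].fill("The")
bridge.activate()      # adjective missing, so the bridge waits on it
slots[2].fill("Car")   # nobody is waiting on the noun; nothing happens
slots[1].fill("Red")   # wakes the bridge, which now completes its match
print(bridge.result)   # ('NounPhrase', ['The', 'Red', 'Car'])
```

No central scheduler decides when the match is retried; the arrival of the missing information itself triggers the retry, which is one way to read "automatic phasing".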
In the Process of Building
Part of a sentence under construction - hundreds of different active structures are cooperating in the process - building up, cutting out, reversing connections
Tight Integration
The structure combines lexical information, grammar and semantics - we pick up the fact that a word is a noun because it is an Entity, we know something isn’t a Material because the Verb says not.
This tight and immediate interweaving of lexical, grammatical and semantic analysis allows us to do things that are not possible with a static sequential approach.
Scientific Sentences Are Complex
The synergistic effect of serine and CheW binding to Tsr is attributed to distinct influences on receptor structure; changes in the conformation of the Tsr dimer induced by serine binding improve methylation efficiency, and CheW binding changes the arrangement among Tsr dimers, which increases access to methylation sites.
Grammar Is Not Enough
Grammar alone would turn meaningful scientific text into sludge - a participial phrase “induced by...” has to be anchored on the right object, a relative pronoun “which” has to be anchored on the relation
The reading process demands that domain knowledge be available at every turn - knowledge that is held in object hierarchies and relations, and which is seamlessly intermingled with grammatical knowledge during the parsing
What Does It Rely On
The paradigm relies on dynamic construction and destruction of active structure, where operators in the structure respond to their local environment by changing the local topology, and then respond to the changed environment, and so on. Each operator can only transmit information through its links, change its connections, add structure or destroy itself. Their interaction suffices to cause all the necessary processes to proceed in parallel, in an opportunistic and synergistic manner.
Typical Domain Knowledge Model
Earthquake Event
Attenuation: acceleration attenuation based on magnitude, distance and local site conditions
Greece Info (GIS): find distance between site and epicentre, local conditions, etc.
Intensity/Damage: relations between acceleration, intensity and damage ratio
Frequency/Amplification: relations between magnitude and frequency, building type, number of floors and natural frequency
The model is built out of the same variables, operators, links as the grammatical and semantic structures, so it can interact with them
Genetic Knowledge
ABCA1: ATP-binding cassette, sub-family A (ABC1), member 1
LocusID: 19
Overview
The membrane-associated protein encoded by this gene is a member of the superfamily of ATP-binding cassette (ABC) transporters. ABC proteins transport various molecules across extra- and intracellular membranes. ABC genes are divided into seven distinct subfamilies (ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White). This protein is a member of the ABC1 subfamily. With cholesterol as its substrate, this protein functions as a cholesterol efflux pump in the cellular lipid removal pathway. Mutations in this gene have been associated with Tangier's disease and familial high-density lipoprotein deficiency.
Family: ABC (transporter across membranes)
Subfamily: ABC1 (members ABC1, MDR/TAP, MRP, ALD, OABP, GCN20, White)
Gene: ABCA1
Protein: NP_005493
Substrate: Cholesterol
Function: cholesterol efflux pump, associated lipid removal pathway
Mutation: causes Tangier’s disease, familial high-density lipoprotein deficiency
Chromosome: 9
Cytogenetic: 9q31.1
[Diagram: hierarchies labelled Genes, Proteins, Diseases and Anatomy; node labels include NP_0001, NP_0047, FFP, FIC, ABC, HKT, PKC, Brain, Cortex, Liver, Eye, Kidney]
The structure is used to understand the text - then the text is used to extend the structure
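As a rough illustration of the extracted ABCA1 facts held as entities and relations, the sketch below stores them as subject-relation-object triples that can be queried and then extended as more text is read. The `KnowledgeStore` class and the relation names (`member_of`, `encodes`, `substrate`, `mutation_causes`) are assumptions introduced for the example; a real Active Structure model would use active operators rather than a passive table.

```python
from collections import defaultdict


class KnowledgeStore:
    """Hypothetical passive store of (subject, relation, object) triples."""

    def __init__(self):
        self._by_subject = defaultdict(list)

    def add(self, subject, relation, obj):
        triple = (subject, relation, obj)
        if triple not in self._by_subject[subject]:
            self._by_subject[subject].append(triple)

    def objects(self, subject, relation):
        # .get avoids creating empty entries for unknown subjects
        triples = self._by_subject.get(subject, [])
        return [o for (_, r, o) in triples if r == relation]


kb = KnowledgeStore()
# Facts taken from the ABCA1 record; relation names are illustrative only.
kb.add("ABCA1", "member_of", "ABC1 subfamily")
kb.add("ABCA1", "encodes", "NP_005493")
kb.add("NP_005493", "substrate", "Cholesterol")
kb.add("ABCA1", "mutation_causes", "Tangier's disease")

# Existing structure is used to interpret a query...
print(kb.objects("ABCA1", "mutation_causes"))

# ...and further text extends the same structure.
kb.add("ABCA1", "mutation_causes", "familial high-density lipoprotein deficiency")
print(len(kb.objects("ABCA1", "mutation_causes")))  # 2
```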
Why Do This
The automated process of Information Extraction needs to be in the same state as a knowledgeable human reader at every point in the text, so inferences about alternatives and anaphora are made on the same basis - the basis on which the writer expects them to be made.
The automated process also needs the ability to backtrack when reading more text refutes assumptions already built into any part of the structure.
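One simple way to picture backtracking is a trail of tentative links that can be unwound to an earlier checkpoint. The sketch below is only an assumption about how this could look; the `Structure` class, its method names and the `modifies` relation are hypothetical, and the example attachment of "induced by serine binding" is chosen to echo the sentence discussed earlier.

```python
class Structure:
    """Hypothetical structure whose tentative links can be undone."""

    def __init__(self):
        self.links = set()
        self._trail = []  # links in the order they were assumed

    def checkpoint(self):
        return len(self._trail)

    def assume(self, a, relation, b):
        link = (a, relation, b)
        if link not in self.links:
            self.links.add(link)
            self._trail.append(link)

    def backtrack(self, mark):
        # Remove every link assumed since the checkpoint, newest first.
        while len(self._trail) > mark:
            self.links.discard(self._trail.pop())


s = Structure()
mark = s.checkpoint()
# First guess: attach the participial phrase to the nearest noun.
s.assume("induced by serine binding", "modifies", "Tsr dimer")
# Later text (or domain knowledge) refutes the guess, so undo and re-attach.
s.backtrack(mark)
s.assume("induced by serine binding", "modifies", "changes in conformation")
print(sorted(s.links))
```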
Is It Really So Different
We are asserting that knowledge can only be captured in active structure - structure that is capable of adapting itself to its environment.
Efforts at capturing knowledge in static structure founder on two reefs - the pieces of structure will not fit together statically, and an algorithm that could manage their combination would be more complex than the combination of the pieces, and is thus unmanageable.
Active Structure avoids both problems - the pieces adapt to each other, and the behavior of the combination is managed by the interaction of the pieces.