TRANSCRIPT
Knowledge Representation in Practice: Project Halo and the Semantic Web
SRI AI Center Seminar
Dr. Mark Greaves, Vulcan, Inc.
[email protected] +1 (206) 342-2276
KR&R Systems, Scaling, and the Google Property
We seek KR&R systems that have the “Google Property”: they get (much) better as they get bigger
– Google’s PageRank™ yields better relevance judgments when it indexes more pages
– Current KR&R systems have the antithesis of this property

So what are the components of a scalable KR&R system?
– Distributed, robust, reliable infrastructure
– Multiple linked ontologies and points of view
• Single ontologies are feasible only at the program/agency level
– Mixture of deep and shallow knowledge repositories
– Simulations and procedural knowledge components
• “Knowing how” and “knowing that”
– Embrace uncertainty, defaults, and nonmonotonicity in all components
– Uncertainty in the KB: you don’t know what you know, things go away, contradiction is rampant, resource-aware computing (linear logic), surveying the KB is not possible
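The “Google Property” can be made concrete with a toy PageRank power iteration — a minimal sketch for illustration only; the function and the four-page link graph below are invented, not part of the talk:

```python
# Minimal PageRank power iteration (illustrative sketch; the toy link
# graph and the function below are invented for this example).
def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:  # each page passes rank to its outlinks
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # "c" collects the most inbound rank
```

The point of the slide is the scaling behavior: adding more pages (assertions) gives the iteration more evidence and sharpens the ranking, whereas adding assertions to a conventional KR&R system tends to slow or break inference.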
[Chart: Speed & Quality of Answers vs. KR&R System Scale (number of assertions, number of ontologies/contexts, number of rules, linkages to other KBs, reasoning engine types, …), contrasting “KR&R now”, “KR&R Goals”, and “Ideal KR&R”]
Scalable KR&R systems should look just like the Web! (coupled with great question-answering technology)
Envisioning the Digital Aristotle for Scientific Knowledge
Inspired by Dickson’s Final Encyclopedia, the HAL-9000, and the broad SF vision of computing
– The “Big AI” Vision
The volume of scientific knowledge has outpaced our ability to manage it
– This volume is too great for researchers in a given domain to keep abreast of all the developments
– Research results may have cross-domain implications that are not apparent due to terminology and knowledge volume
“Shallow” information retrieval and keyword indexing systems are not well suited to scientific knowledge management because they cannot reason about the subject matter
– Example: “What are the reaction products if metallic copper is heated strongly with concentrated sulfuric acid?” (Answer: Cu2+, SO2(g), and H2O)
Response to a query should supply the answer (possibly coupled with conceptual navigation) rather than simply list 1000s of possibly relevant documents
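The contrast can be sketched in a few lines of Python — a toy illustration in which the documents, the function names, and the single rule are all invented, and the rule simply hard-codes the copper/sulfuric-acid reaction above rather than deriving it from general chemistry:

```python
# Toy contrast: keyword retrieval returns documents; a (hand-written,
# illustrative) rule returns the answer itself. Everything here is
# invented for illustration, not from any Halo system.
DOCS = {
    "doc1": "copper sulfate is blue",
    "doc2": "heating copper with concentrated sulfuric acid",
}

def keyword_search(query):
    terms = set(query.lower().split())
    return [d for d, text in DOCS.items() if terms & set(text.split())]

RULES = {("Cu", "H2SO4", "hot concentrated"): ["Cu2+", "SO2(g)", "H2O"]}

def reason(metal, acid, conditions):
    return RULES.get((metal, acid, conditions))

print(keyword_search("copper sulfuric acid"))     # documents, not answers
print(reason("Cu", "H2SO4", "hot concentrated"))  # the reaction products
```

The keyword system can only hand back `doc2` for the user to read; the rule-based system supplies the products directly, which is what “supply the answer” means here.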
How do we get to the Digital Aristotle?
Vulcan’s Goals for its KR&R Research Program
– Address the problem of scale in Knowledge Bases
• Scaling by web-style participation
• Incorporate large numbers of people in KB construction and maintenance
– Have high impact
• Show that the Digital Aristotle is possible
• Change our experience of the Web
• Have quantifiable, explainable metrics
– Be a commercializable approach
Project Halo is one concrete program that addresses these goals
[Chart: KB effort (cost, people, …) vs. KB size (number of assertions, complexity, …), showing Vulcan’s goal of moving from “Now” to “Future”]
The Project Halo Pilot

In 2004, Vulcan funded a six-month effort to determine the state of the art in fielded “deep reasoning” systems
– Can these systems support reasoning in scientific domains?
– Can they answer novel questions?
– Can they produce domain-appropriate answer justifications?

Three teams were selected, and used their available technology
– SRI, with Boeing Phantom Works and UT-Austin
– Cycorp
– Ontoprise GmbH

No NLP in the Pilot

[Diagram: English → NLP → FL English → QA System → Answer & Justification]
The Halo Domain
50 pages from the AP-chemistry syllabus (Stoichiometry, Reactions in aqueous solutions, Acid-Base equilibria)
– Small and self-contained enough to be do-able in a short period of time, but large enough to create many novel questions
– Complex “deep” combinations of rules
– Standardized exam with well-understood scores (AP1-AP5)
– Chemistry is an exact science, more “monotonic”
– No undue reliance on graphics (e.g., free-body diagrams)
– Availability of experts for exam generation and grading

Example: Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination
(a) C4H10 + O2 → CO2 + H2O
(b) KClO3 → KCl + O2
(c) CH3CH2OH + O2 → CO2 + H2O
(d) P4 + O2 → P2O5
(e) N2O5 + H2O → HNO3
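Questions like (a)-(e) are mechanizable: balancing a reaction reduces to finding the integer nullspace of the element-count matrix (reactant counts positive, product counts negative). The sketch below is one standard way to do this, not any Halo team’s actual method, and the `balance` helper is invented for illustration:

```python
# Balance a chemical reaction by solving M x = 0 over the rationals,
# then scaling to the smallest integer coefficients. Illustrative
# sketch only; assumes exactly one degree of freedom (one free column).
from fractions import Fraction
from functools import reduce
from math import gcd

def balance(species, elements):
    # species: per-species element counts; reactants positive, products negative
    n = len(species)
    rows = [[Fraction(sp.get(el, 0)) for sp in species] for el in elements]
    pivots, r = [], 0
    for c in range(n):  # Gaussian elimination to reduced row-echelon form
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    x = [Fraction(0)] * n
    x[n - 1] = Fraction(1)        # set the free variable to 1 ...
    for row, c in zip(rows, pivots):
        x[c] = -row[n - 1]        # ... and back-substitute
    lcm = reduce(lambda a, b: a * b // gcd(a, b), [f.denominator for f in x])
    coeffs = [int(f * lcm) for f in x]
    g = reduce(gcd, coeffs)
    return [c // g for c in coeffs]

# (a) C4H10 + O2 -> CO2 + H2O, with product counts negated
print(balance(
    [{"C": 4, "H": 10}, {"O": 2}, {"C": -1, "O": -2}, {"H": -2, "O": -1}],
    ["C", "H", "O"]))  # -> [2, 13, 8, 10]
```

For (a) this yields 2 C4H10 + 13 O2 → 8 CO2 + 10 H2O; the same routine balances (b) as 2 KClO3 → 2 KCl + 3 O2. Classifying the reaction type (combustion, decomposition, combination) is then a separate rule over the reactants and products.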
Halo Pilot Evaluation Process
Evaluation
– Teams were given 4 months to formulate the knowledge in 70 pages from the AP Chemistry syllabus
– Systems were sequestered and run by Vulcan against 100 novel AP-style questions (hand-coded queries)
– Exams were graded by chemistry professors using AP methodology

Metrics
– Coverage: the ability of the system to answer novel questions from the syllabus
• What percentage of the questions was the system capable of answering?
– Justification: the ability to provide concise, domain-appropriate explanations
• What percentage of the answer justifications were acceptable to domain evaluators?
– Query encoding: the ability to faithfully represent queries
– Brittleness: What were the major causes of failure? How can these be remedied?
Halo Pilot Results
[Charts: Challenge Answer Scores and Challenge Justification Scores (%), by grader (SME1-SME3), for Cycorp, Ontoprise, and SRI]
Best scoring system scored the equivalent of an AP3 (on our restricted syllabus)
Cyc had issues with answer justification and question focus
Full Details in AI Magazine 25:4, “Project Halo: Towards a Digital Aristotle”
From the Halo Pilot to Halo 2
Halo Pilot Results
– Much better than expected results on a very tough evaluation
– Most failures attributed to modeling errors due to contractors’ lack of domain knowledge
– Expensive: O($10,000) per page, per team

Halo 2 Goal: To determine whether document-rooted tools can be built to facilitate robust knowledge formulation, query, and evaluation by domain experts, with ever-decreasing reliance on knowledge engineers
– Halo 2 Questions:
• Can SMEs build robust question-answering systems that demonstrate excellent coverage of a given syllabus, the ability to answer novel questions, and produce readable, domain-appropriate justifications using reasonable computational resources?
• Will SMEs be capable of posing questions and complex problems to these systems?
• Do these systems address key failure, scalability, and cost issues encountered in the Pilot?
Halo 2 Technical Subgoals
Goal 1: Creation of KR technology that is expressive enough to capture AP knowledge in an expanded syllabus, and sufficiently tractable to automate synthesis and question answering
– Representation: conceptual knowledge, tables, equations, processes
– Reasoning: matching, qualitative reasoning, equational reasoning, problem-setup reasoning, chaining
– Integration of multiple KR approaches

Goal 2: Creation of technology to enable scientists to build, extend, and maintain the knowledge bases of the Digital Aristotle
– Piecewise-entered knowledge must be maintained in a life cycle of entry, provenance, validation, synthesis with others, modification, use, expansion, and deletion
– A systems approach to AI

Goal 3: A user interface that is so intuitive it requires (almost) no training
– Covers knowledge lifecycle, problem setup, terminology mapping, question dialogs
– Subject matter experts lack knowledge of the KB terminology (Bell Labs 20% rule)
– Real NLP in the system
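The “chaining” item in Goal 1 can be illustrated with a minimal backward chainer over ground Horn rules — a toy sketch with invented chemistry predicates, not the reasoning engine of either Halo team:

```python
# A minimal backward chainer over ground Horn rules (toy sketch; the
# rule base and predicates are invented for illustration).
RULES = {
    # goal: list of alternative subgoal sets (any one set suffices)
    "conducts_electricity(CuSO4_aq)": [["ionic(CuSO4)", "dissolved(CuSO4)"]],
    "ionic(CuSO4)": [["metal(Cu)", "nonmetal_group(SO4)"]],
}
FACTS = {"dissolved(CuSO4)", "metal(Cu)", "nonmetal_group(SO4)"}

def prove(goal):
    # A goal holds if it is a known fact, or if some rule body
    # for it can be recursively proven.
    if goal in FACTS:
        return True
    return any(all(prove(g) for g in body) for body in RULES.get(goal, []))

print(prove("conducts_electricity(CuSO4_aq)"))  # True, via two chained rules
```

A real system adds variables, unification, and loop detection; the point here is only the “deep combination of rules” that distinguishes Halo-style reasoning from retrieval.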
Halo 2 Programmatics
Scope: Selected portions of the AP syllabi for chemistry, biology, and physics
– This allows us to expand the types of reasoning addressed by Halo

Two competing teams/approaches (F-Logic, Concept Maps/KM)

30-month, 3-stage effort focused on building and evaluating SME tools
– Currently in month 24; will probably expand to 36 months
Team SRI Halo 2 Intermediate Evaluation
Halo Pilot System vs. Team SRI Halo 2

Halo Pilot System:
– Professional KE KBs
– No natural language
– ~$10K per syllabus page
– Percent correct: Ontoprise 47%, SRI 44%, Cycorp 37%

Team SRI Halo 2:
– Science grad student KBs
– Extensive natural language
– ~$100 per syllabus page

No other system has EVER achieved this performance level with SME-entered knowledge

Percentage correct:
Domain | Number of questions | SME1 | SME2 | Avg | KE
Biol | 146 | 52% | 24% | 38% | 51%
Chem | 86 | 42% | 33% | 37.5% | 40%
Phy | 131 | 16% | 22% | 19% | 21%

Knowledge Formulation
– Time for KF: Concept ~20 min for all SMEs; Equation ~70 s (Chem) to ~120 s (Physics); Table ~10 min (Chem); Reaction ~3.5 min (Chem); Constraint 14 s (Bio), 88 s (Chem)
– SME need for help: 68 requests over 480 person-hours (33%/55%/12%) = 1/day

Question Formulation
– Avg time for SME to formulate a question: 2.5 min (Bio); 4 min (Chem); 6 min (Physics); avg 6 reformulation attempts
– Usability: SMEs requested no significant help; pipelined errors dominated failure analysis

System Responsiveness
– Biology: 90% of answers < 10 sec; Chem: 60% < 10 sec; Physics: 45% < 10 sec

Domain | Interpretation (Median/Max) | Answer (Median/Max)
Bio | 3s / 601s | 1s / 569s
Chem | 7s / 493s | 7s / 485s
Phy | 34s / 429s | 14s / 252s
Team Ontoprise Halo 2 Intermediate Evaluation
Question accuracy by score (1/2/3/M):

| Total questions | Question Accuracy (1/2/3/M) | DMS Answer (1/2/3/M) | Explanation (1/2/3/M)
Chem SME | 18 | -/1/6/11 | 1/-/7/10 | -/-/1/17
Chem KE | 18 | -/1/-/17 | 1/-/-/17 | -/-/-/18
Bio SME | 14 | -/6/4/4 | -/2/4/8 | -/-/-/14
Bio KE | 14 | -/7/4/3 | 5/2/4/3 | -/2/6/6
Phy SME | 54 | 26/8/20/- | 21/8/25/- | 20/1/7/26
Phy KE | 54 | 6/2/4/42 | 6/-/-/48 | 4/1/-/49
Knowledge Formulation
– 149 help threads (Phy: Equations; Bio: Processes; Chem: Rules)
– Knowledge formulation times (Median / Max, seconds):

| Bio | Chem | Physics
Rule | 124 / 2517 | 372 / 1268 | 877 / 3369
Relation | 95 / 1280 | 39 / 1628 | 21 / 1442
Attribute | 64 / 636 | 84 / 1734 | 31 / 1289
Instance | 18 / 1312 | 143 / 978 | 21 / 495
Concept | 34 / 2774 | 92 / 1906 | 10 / 844

System Responsiveness
– Reasoning engine times not measured due to logging bug; SME feedback and video analysis show substantial (10 min) waits

Domain | WYSIWYM (Median/Max) | Test/Debug (Median/Max)
Biology | 5s / 3240s | 2s / 1460s
Chemistry | 7s / 69s | 3s / 934s
Physics | 5s / 56s | 3s / 47s

Question Formulation
– QF SMEs had difficulty with NLP
– QF was halted for 3 days for patches and new training
– Speed issues severely limited QF SMEs’ iterations
– SMEs still exhibited very poor performance on the AP questions
Halo 2 Refinement Stage Programmatics
Team SRI – Refining Aura
– Improvements and redesigns
• Fix Aura’s Class I and Class II software bugs and desired enhancements
• Extensive UI and language improvements
• Increase performance by a factor of 5-10
– Specific brittleness areas
• Process representation
• Multiple precision representation
• Process/Event questions and reasoning
• Concept inheritance and identification
• Knowledge revision and truth maintenance

Team Ontoprise – Semantic Wikis for Science
– Tasks
• Build semantic wikis for knowledge acquisition
• Use RDF/OWL to translate into Aura, and OntoMap to create concept mappings in Aura
• Integrate wikis with explanation generation
– Better: more authors and wiki editing for quality
– Faster: Halo users will be able to import knowledge rather than hand-creating each instance
– Cheaper: simple knowledge is created by unpaid wiki authors

Halo Acceleration
– Refining and Hardening Aura (Rich Question Answering)
– Semantic Wikis for Science (Scaling up Participation)
– Digital Aristotle Construction (Scaling up the KB)
Halo Refinement Stage Core Goals
Demonstrate a 75% score for correctness and explanation on the intermediate evaluation questions
– Current scores range from 16% to 52%

Median number of SME question reformulation attempts will be 5 or less (end-to-end)
– Current numbers are 5 (Chem), 7 (Physics), and 1 (Bio, constrained by limited question types)

Performance
– Complete 75% of the knowledge formulation operations in 5 sec or less
– For 75% of the final evaluation questions, the mean response time for interpreting a question and answering a question will be less than 10 sec
– For 90% of the questions, the mean system response time for answering the question will be less than 1 minute
Digital Aristotle Construction
Contracted KB construction
– Not looking for Java coders, but for SME/knowledge engineer hybrids
– Aura tested at IJCAI with IIIT-Hyderabad students
– Soliciting proposals:
• IIIT-Hyderabad
– Faculty will guide students; Raj Reddy is a trustee and is supportive
– Students will work for us 4 hours/day at low cost
– Could be 100s of students
• Tata Consultancy Services iLabs
– Traditional Indian consulting/outsourcing company
• A couple of others
– Next steps
• Gather bids and select a performer
• Pilot with the implementation-phase syllabus (~160 hours); compare to reference and US results
Refining Halo 2: the Semantic Web
Halo 2’s knowledge acquisition design is classic AI
– Halo systems (SRI, Ontoprise) are logically self-contained
– Knowledge acquisition use cases are single-author expert systems

But Vulcan’s goal is the Digital Aristotle
– Large knowledge bases in support of human inquiry
• Scale beyond single authors
– Social issues surrounding real KR&R systems
• Disciplinary approval of the KB
• Non-formal annotations of KB material (historical material, examples, different pedagogical approaches)
• Transparency of motivation for KB modeling choices

Programmatic changes in Halo
– Expand the knowledge acquisition approach
• RDF/OWL import and export (for DL-expressible fragments)
• Use Semantic Wikis (specifically, AIFB’s Semantic MediaWiki)
• Basic support for collaboration
– Leverage European research vigor
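The shape of the RDF/OWL export path can be sketched as emitting triples in Turtle syntax — a hand-rolled illustration with an invented `ex:` namespace and invented facts; a real exporter would of course use an RDF library rather than string formatting:

```python
# Hand-rolled sketch of exporting a few KB facts as RDF/Turtle
# (illustrative only; the ex: namespace and facts are invented).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/chem#",
}
TRIPLES = [
    ("ex:Copper", "rdf:type", "ex:Metal"),
    ("ex:Metal", "rdfs:subClassOf", "ex:Element"),
    ("ex:Copper", "ex:symbol", '"Cu"'),
]

def to_turtle(prefixes, triples):
    # Emit @prefix declarations, then one statement per triple.
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in prefixes.items()]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)

print(to_turtle(PREFIXES, TRIPLES))
```

Triples in this form are what a Semantic MediaWiki annotation boils down to, and the DL-expressible fragment of a Halo KB can round-trip through this representation.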
Halo 2 Way Forward
We haven’t made our problem easier...
– Alignment of networked ontology and instance information between Semantic Wiki(s) and Halo KR tools
• Leverage Ontoprise’s participation in the NeOn Program (EC IST Framework Program 6; ~€15M, ~15 partners, kickoff March 2006)
– Explanation generation from multiple sources

So, we will build a prototype to investigate the initial problem
– Use our three domains
– Target an average of 30% reuse (use of wiki-entered facts) in knowledge
– Incorporate Semantic MediaWiki for knowledge capture by multiple authors
– Incorporate combination of different knowledge sources, by mapping the Wiki content to the Aura KB
– Design a little, build a little, test a little, and repeat
– Evaluation is being designed