TRANSCRIPT
Knowledge Representation in Practice: Project Halo and the Semantic Web
SRI AI Center Seminar
Dr. Mark Greaves, Vulcan, Inc.
[email protected] +1 (206) 342-2276
KR&R Systems, Scaling, and the Google Property
We seek KR&R systems that have the “Google Property”: they get (much) better as they get bigger
– Google’s PageRank™ yields better relevance judgments when it indexes more pages
– Current KR&R systems have the antithesis of this property

So what are the components of a scalable KR&R system?
– Distributed, robust, reliable infrastructure
– Multiple linked ontologies and points of view
• Single ontologies are feasible only at the program/agency level
– Mixture of deep and shallow knowledge repositories
– Simulations and procedural knowledge components
• “Knowing how” and “knowing that”
– Embrace uncertainty, defaults, and nonmonotonicity in all components
– Uncertainty in the KB: you don’t know what you know, things go away, contradiction is rampant, resource-aware computing (linear logic), surveying the KB is not possible
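The “Google Property” can be made concrete with a toy PageRank power iteration — a minimal sketch for illustration only; the function and the four-page link graph below are invented, not part of the talk:

```python
# Minimal PageRank power iteration (illustrative sketch; the toy link
# graph and the function below are invented for this example).
def pagerank(links, damping=0.85, iters=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}
        for p, outs in links.items():
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:  # each page passes rank to its outlinks
                    new[q] += share
            else:  # dangling page: spread its rank uniformly
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # "c" collects the most inbound rank
```

The point of the slide is the scaling behavior: adding more pages (assertions) gives the iteration more evidence and sharpens the ranking, whereas adding assertions to a conventional KR&R system tends to slow or break inference.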
[Chart: Speed & Quality of Answers vs. KR&R System Scale (number of assertions, number of ontologies/contexts, number of rules, linkages to other KBs, reasoning engine types, …), contrasting “KR&R now”, “KR&R Goals”, and “Ideal KR&R”]
Scalable KR&R systems should look just like the Web! (coupled with great question-answering technology)
Envisioning the Digital Aristotle for Scientific Knowledge
Inspired by Dickson’s Final Encyclopedia, the HAL-9000, and the broad SF vision of computing
– The “Big AI” Vision
The volume of scientific knowledge has outpaced our ability to manage it
– This volume is too great for researchers in a given domain to keep abreast of all the developments
– Research results may have cross-domain implications that are not apparent due to terminology and knowledge volume
“Shallow” information retrieval and keyword indexing systems are not well suited to scientific knowledge management because they cannot reason about the subject matter
– Example: “What are the reaction products if metallic copper is heated strongly with concentrated sulfuric acid?” (Answer: Cu2+, SO2(g), and H2O)
Response to a query should supply the answer (possibly coupled with conceptual navigation) rather than simply list 1000s of possibly relevant documents
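The contrast can be sketched in a few lines of Python — a toy illustration in which the documents, the function names, and the single rule are all invented, and the rule simply hard-codes the copper/sulfuric-acid reaction above rather than deriving it from general chemistry:

```python
# Toy contrast: keyword retrieval returns documents; a (hand-written,
# illustrative) rule returns the answer itself. Everything here is
# invented for illustration, not from any Halo system.
DOCS = {
    "doc1": "copper sulfate is blue",
    "doc2": "heating copper with concentrated sulfuric acid",
}

def keyword_search(query):
    terms = set(query.lower().split())
    return [d for d, text in DOCS.items() if terms & set(text.split())]

RULES = {("Cu", "H2SO4", "hot concentrated"): ["Cu2+", "SO2(g)", "H2O"]}

def reason(metal, acid, conditions):
    return RULES.get((metal, acid, conditions))

print(keyword_search("copper sulfuric acid"))     # documents, not answers
print(reason("Cu", "H2SO4", "hot concentrated"))  # the reaction products
```

The keyword system can only hand back `doc2` for the user to read; the rule-based system supplies the products directly, which is what “supply the answer” means here.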
How do we get to the Digital Aristotle?
Vulcan’s Goals for its KR&R Research Program
– Address the problem of scale in Knowledge Bases
• Scaling by web-style participation
• Incorporate large numbers of people in KB construction and maintenance
– Have high impact
• Show that the Digital Aristotle is possible
• Change our experience of the Web
• Have quantifiable, explainable metrics
– Be a commercializable approach
Project Halo is one concrete program that addresses these goals
[Chart: KB effort (cost, people, …) vs. KB size (number of assertions, complexity, …), showing Vulcan’s goal of moving from “Now” to “Future”]
The Project Halo Pilot

In 2004, Vulcan funded a six-month effort to determine the state of the art in fielded “deep reasoning” systems
– Can these systems support reasoning in scientific domains?
– Can they answer novel questions?
– Can they produce domain-appropriate answer justifications?

Three teams were selected, and used their available technology
– SRI, with Boeing Phantom Works and UT-Austin
– Cycorp
– Ontoprise GmbH

No NLP in the Pilot

[Diagram: English → NLP → FL English → QA System → Answer & Justification]
The Halo Domain
50 pages from the AP-chemistry syllabus (Stoichiometry, Reactions in aqueous solutions, Acid-Base equilibria)
– Small and self-contained enough to be do-able in a short period of time, but large enough to create many novel questions
– Complex “deep” combinations of rules
– Standardized exam with well-understood scores (AP1-AP5)
– Chemistry is an exact science, more “monotonic”
– No undue reliance on graphics (e.g., free-body diagrams)
– Availability of experts for exam generation and grading

Example: Balance the following reactions, and indicate whether they are examples of combustion, decomposition, or combination
(a) C4H10 + O2 → CO2 + H2O
(b) KClO3 → KCl + O2
(c) CH3CH2OH + O2 → CO2 + H2O
(d) P4 + O2 → P2O5
(e) N2O5 + H2O → HNO3
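Questions like (a)-(e) are mechanizable: balancing a reaction reduces to finding the integer nullspace of the element-count matrix (reactant counts positive, product counts negative). The sketch below is one standard way to do this, not any Halo team’s actual method, and the `balance` helper is invented for illustration:

```python
# Balance a chemical reaction by solving M x = 0 over the rationals,
# then scaling to the smallest integer coefficients. Illustrative
# sketch only; assumes exactly one degree of freedom (one free column).
from fractions import Fraction
from functools import reduce
from math import gcd

def balance(species, elements):
    # species: per-species element counts; reactants positive, products negative
    n = len(species)
    rows = [[Fraction(sp.get(el, 0)) for sp in species] for el in elements]
    pivots, r = [], 0
    for c in range(n):  # Gaussian elimination to reduced row-echelon form
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [v / rows[r][c] for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    x = [Fraction(0)] * n
    x[n - 1] = Fraction(1)        # set the free variable to 1 ...
    for row, c in zip(rows, pivots):
        x[c] = -row[n - 1]        # ... and back-substitute
    lcm = reduce(lambda a, b: a * b // gcd(a, b), [f.denominator for f in x])
    coeffs = [int(f * lcm) for f in x]
    g = reduce(gcd, coeffs)
    return [c // g for c in coeffs]

# (a) C4H10 + O2 -> CO2 + H2O, with product counts negated
print(balance(
    [{"C": 4, "H": 10}, {"O": 2}, {"C": -1, "O": -2}, {"H": -2, "O": -1}],
    ["C", "H", "O"]))  # -> [2, 13, 8, 10]
```

For (a) this yields 2 C4H10 + 13 O2 → 8 CO2 + 10 H2O; the same routine balances (b) as 2 KClO3 → 2 KCl + 3 O2. Classifying the reaction type (combustion, decomposition, combination) is then a separate rule over the reactants and products.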
Halo Pilot Evaluation Process
Evaluation
– Teams were given 4 months to formulate the knowledge in 70 pages from the AP Chemistry syllabus
– Systems were sequestered and run by Vulcan against 100 novel AP-style questions (hand-coded queries)
– Exams were graded by chemistry professors using AP methodology

Metrics
– Coverage: the ability of the system to answer novel questions from the syllabus
• What percentage of the questions was the system capable of answering?
– Justification: the ability to provide concise, domain-appropriate explanations
• What percentage of the answer justifications were acceptable to domain evaluators?
– Query encoding: the ability to faithfully represent queries
– Brittleness: What were the major causes of failure? How can these be remedied?
Halo Pilot Results
[Charts: Challenge Answer Scores and Challenge Justification Scores (%), by grader (SME1-SME3), for Cycorp, Ontoprise, and SRI]
Best scoring system scored the equivalent of an AP3 (on our restricted syllabus)
Cyc had issues with answer justification and question focus
Full Details in AI Magazine 25:4, “Project Halo: Towards a Digital Aristotle”
From the Halo Pilot to Halo 2
Halo Pilot Results
– Much better than expected results on a very tough evaluation
– Most failures attributed to modeling errors due to contractors’ lack of domain knowledge
– Expensive: O($10,000) per page, per team

Halo 2 Goal: To determine whether document-rooted tools can be built to facilitate robust knowledge formulation, query, and evaluation by domain experts, with ever-decreasing reliance on knowledge engineers
– Halo 2 Questions:
• Can SMEs build robust question-answering systems that demonstrate excellent coverage of a given syllabus, the ability to answer novel questions, and produce readable, domain-appropriate justifications using reasonable computational resources?
• Will SMEs be capable of posing questions and complex problems to these systems?
• Do these systems address key failure, scalability, and cost issues encountered in the Pilot?
Halo 2 Technical Subgoals
Goal 1: Creation of KR technology that is expressive enough to capture AP knowledge in an expanded syllabus, and sufficiently tractable to automate synthesis and question answering
– Representation: conceptual knowledge, tables, equations, processes
– Reasoning: matching, qualitative reasoning, equational reasoning, problem-setup reasoning, chaining
– Integration of multiple KR approaches

Goal 2: Creation of technology to enable scientists to build, extend, and maintain the knowledge bases of the Digital Aristotle
– Piecewise-entered knowledge must be maintained in a life cycle of entry, provenance, validation, synthesis with others, modification, use, expansion, and deletion
– A systems approach to AI

Goal 3: A user interface that is so intuitive it requires (almost) no training
– Covers knowledge lifecycle, problem setup, terminology mapping, question dialogs
– Subject matter experts lack knowledge of the KB terminology (Bell Labs 20% rule)
– Real NLP in the system
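The “chaining” item in Goal 1 can be illustrated with a minimal backward chainer over ground Horn rules — a toy sketch with invented chemistry predicates, not the reasoning engine of either Halo team:

```python
# A minimal backward chainer over ground Horn rules (toy sketch; the
# rule base and predicates are invented for illustration).
RULES = {
    # goal: list of alternative subgoal sets (any one set suffices)
    "conducts_electricity(CuSO4_aq)": [["ionic(CuSO4)", "dissolved(CuSO4)"]],
    "ionic(CuSO4)": [["metal(Cu)", "nonmetal_group(SO4)"]],
}
FACTS = {"dissolved(CuSO4)", "metal(Cu)", "nonmetal_group(SO4)"}

def prove(goal):
    # A goal holds if it is a known fact, or if some rule body
    # for it can be recursively proven.
    if goal in FACTS:
        return True
    return any(all(prove(g) for g in body) for body in RULES.get(goal, []))

print(prove("conducts_electricity(CuSO4_aq)"))  # True, via two chained rules
```

A real system adds variables, unification, and loop detection; the point here is only the “deep combination of rules” that distinguishes Halo-style reasoning from retrieval.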
Halo 2 Programmatics
Scope: Selected portions of the AP syllabi for chemistry, biology, and physics
– This allows us to expand the types of reasoning addressed by Halo

Two competing teams/approaches (F-Logic, Concept Maps/KM)

30-month, 3-stage effort focused on building and evaluating SME tools
– Currently in month 24; will probably expand to 36 months
Team SRI Halo 2 Intermediate Evaluation
Halo Pilot System vs. Team SRI Halo 2

Halo Pilot System:
– Professional KE KBs
– No natural language
– ~$10K per syllabus page
– Percent correct: Ontoprise 47%, SRI 44%, Cycorp 37%

Team SRI Halo 2:
– Science grad student KBs
– Extensive natural language
– ~$100 per syllabus page

No other system has EVER achieved this performance level with SME-entered knowledge

Percentage correct:
Domain | Number of questions | SME1 | SME2 | Avg | KE
Biol | 146 | 52% | 24% | 38% | 51%
Chem | 86 | 42% | 33% | 37.5% | 40%
Phy | 131 | 16% | 22% | 19% | 21%

Knowledge Formulation
– Time for KF: Concept ~20 min for all SMEs; Equation ~70 s (Chem) to ~120 s (Physics); Table ~10 min (Chem); Reaction ~3.5 min (Chem); Constraint 14 s (Bio), 88 s (Chem)
– SME need for help: 68 requests over 480 person-hours (33%/55%/12%) = 1/day

Question Formulation
– Avg time for SME to formulate a question: 2.5 min (Bio); 4 min (Chem); 6 min (Physics); avg 6 reformulation attempts
– Usability: SMEs requested no significant help; pipelined errors dominated failure analysis

System Responsiveness
– Biology: 90% of answers < 10 sec; Chem: 60% < 10 sec; Physics: 45% < 10 sec

Domain | Interpretation (Median/Max) | Answer (Median/Max)
Bio | 3s / 601s | 1s / 569s
Chem | 7s / 493s | 7s / 485s
Phy | 34s / 429s | 14s / 252s
Team Ontoprise Halo 2 Intermediate Evaluation
Question accuracy by score (1/2/3/M):

| Total questions | Question Accuracy (1/2/3/M) | DMS Answer (1/2/3/M) | Explanation (1/2/3/M)
Chem SME | 18 | -/1/6/11 | 1/-/7/10 | -/-/1/17
Chem KE | 18 | -/1/-/17 | 1/-/-/17 | -/-/-/18
Bio SME | 14 | -/6/4/4 | -/2/4/8 | -/-/-/14
Bio KE | 14 | -/7/4/3 | 5/2/4/3 | -/2/6/6
Phy SME | 54 | 26/8/20/- | 21/8/25/- | 20/1/7/26
Phy KE | 54 | 6/2/4/42 | 6/-/-/48 | 4/1/-/49
Knowledge Formulation
– 149 help threads (Phy: Equations; Bio: Processes; Chem: Rules)
– Knowledge formulation times (Median / Max, seconds):

| Bio | Chem | Physics
Rule | 124 / 2517 | 372 / 1268 | 877 / 3369
Relation | 95 / 1280 | 39 / 1628 | 21 / 1442
Attribute | 64 / 636 | 84 / 1734 | 31 / 1289
Instance | 18 / 1312 | 143 / 978 | 21 / 495
Concept | 34 / 2774 | 92 / 1906 | 10 / 844

System Responsiveness
– Reasoning engine times not measured due to logging bug; SME feedback and video analysis show substantial (10 min) waits

Domain | WYSIWYM (Median/Max) | Test/Debug (Median/Max)
Biology | 5s / 3240s | 2s / 1460s
Chemistry | 7s / 69s | 3s / 934s
Physics | 5s / 56s | 3s / 47s

Question Formulation
– QF SMEs had difficulty with NLP
– QF was halted for 3 days for patches and new training
– Speed issues severely limited QF SMEs’ iterations
– SMEs still exhibited very poor performance on the AP questions
Halo 2 Refinement Stage Programmatics
Team SRI – Refining Aura
– Improvements and redesigns
• Fix Aura’s Class I and Class II software bugs and desired enhancements
• Extensive UI and language improvements
• Increase performance by a factor of 5-10
– Specific brittleness areas
• Process representation
• Multiple precision representation
• Process/Event questions and reasoning
• Concept inheritance and identification
• Knowledge revision and truth maintenance

Team Ontoprise – Semantic Wikis for Science
– Tasks
• Build semantic wikis for knowledge acquisition
• Use RDF/OWL to translate into Aura, and OntoMap to create concept mappings in Aura
• Integrate wikis with explanation generation
– Better: more authors and wiki editing for quality
– Faster: Halo users will be able to import knowledge rather than hand-creating each instance
– Cheaper: simple knowledge is created by unpaid wiki authors

Halo Acceleration
– Refining and Hardening Aura (Rich Question Answering)
– Semantic Wikis for Science (Scaling up Participation)
– Digital Aristotle Construction (Scaling up the KB)
Halo Refinement Stage Core Goals
Demonstrate a 75% score for correctness and explanation on the intermediate evaluation questions
– Current scores range from 16% to 52%

Median number of SME question reformulation attempts will be 5 or less (end-to-end)
– Current numbers are 5 (Chem), 7 (Physics), and 1 (Bio, constrained by limited question types)

Performance
– Complete 75% of the knowledge formulation operations in 5 sec or less
– For 75% of the final evaluation questions, the mean response time for interpreting a question and answering a question will be less than 10 sec
– For 90% of the questions, the mean system response time for answering the question will be less than 1 minute
Digital Aristotle Construction
Contracted KB construction
– Not looking for Java coders, but for SME/knowledge engineer hybrids
– Aura tested at IJCAI with IIIT-Hyderabad students
– Soliciting proposals:
• IIIT-Hyderabad
– Faculty will guide students; Raj Reddy is a trustee and is supportive
– Students will work for us 4 hours/day at low cost
– Could be 100s of students
• Tata Consultancy Services iLabs
– Traditional Indian consulting/outsourcing company
• A couple of others
– Next steps
• Gather bids and select a performer
• Pilot with the implementation-phase syllabus (~160 hours); compare to reference and US results
Refining Halo 2: the Semantic Web
Halo 2’s knowledge acquisition design is classic AI
– Halo systems (SRI, Ontoprise) are logically self-contained
– Knowledge acquisition use cases are single-author expert systems

But Vulcan’s goal is the Digital Aristotle
– Large knowledge bases in support of human inquiry
• Scale beyond single authors
– Social issues surrounding real KR&R systems
• Disciplinary approval of the KB
• Non-formal annotations of KB material (historical material, examples, different pedagogical approaches)
• Transparency of motivation for KB modeling choices

Programmatic changes in Halo
– Expand the knowledge acquisition approach
• RDF/OWL import and export (for DL-expressible fragments)
• Use Semantic Wikis (specifically, AIFB’s Semantic MediaWiki)
• Basic support for collaboration
– Leverage European research vigor
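The shape of the RDF/OWL export path can be sketched as emitting triples in Turtle syntax — a hand-rolled illustration with an invented `ex:` namespace and invented facts; a real exporter would of course use an RDF library rather than string formatting:

```python
# Hand-rolled sketch of exporting a few KB facts as RDF/Turtle
# (illustrative only; the ex: namespace and facts are invented).
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/chem#",
}
TRIPLES = [
    ("ex:Copper", "rdf:type", "ex:Metal"),
    ("ex:Metal", "rdfs:subClassOf", "ex:Element"),
    ("ex:Copper", "ex:symbol", '"Cu"'),
]

def to_turtle(prefixes, triples):
    # Emit @prefix declarations, then one statement per triple.
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in prefixes.items()]
    lines += [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(lines)

print(to_turtle(PREFIXES, TRIPLES))
```

Triples in this form are what a Semantic MediaWiki annotation boils down to, and the DL-expressible fragment of a Halo KB can round-trip through this representation.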
Halo 2 Way Forward
We haven’t made our problem easier...
– Alignment of networked ontology and instance information between Semantic Wiki(s) and Halo KR tools
• Leverage Ontoprise’s participation in the NeOn Program (EC IST Framework Program 6; ~€15M, ~15 partners, kickoff March 2006)
– Explanation generation from multiple sources

So, we will build a prototype to investigate the initial problem
– Use our three domains
– Target an average of 30% reuse (use of wiki-entered facts) in knowledge
– Incorporate Semantic MediaWiki for knowledge capture by multiple authors
– Incorporate combination of different knowledge sources, by mapping the Wiki content to the Aura KB
– Design a little, build a little, test a little, and repeat
– Evaluation is being designed