
WebGrid: Knowledge Modeling and Inference through the World Wide Web

Brian R. Gaines and Mildred L. G. Shaw

Knowledge Science Institute
University of Calgary
Alberta, Canada T2N 1N4

{mildred, gaines}@cpsc.ucalgary.ca

Abstract: WebGrid is a knowledge acquisition and inference server on the World Wide Web that uses an extended repertory grid system for knowledge acquisition, inductive inference for knowledge modeling, and an integrated knowledge-based system shell for inference. This demonstration shows WebGrid modeling a standard dataset for the NASA autolander problem which illustrates the system’s capability for open-class reasoning with incompletely specified cases.

1 Introduction

A description of WebGrid, associated applications and example applications, can be found in our associated paper (Gaines and Shaw, 1996). WebGrid is a port of our KSS0/RepGrid knowledge acquisition tools to operate as a server on the World Wide Web, allowing a web client on any platform world-wide to be used for knowledge modeling and inference. The system is interesting for a number of reasons:

• It provides widely available access to knowledge-based system development tools
• It is open in its architecture and designed to support integration with other systems
• The repertory grid technology is extended to support data types other than rating scales, such as categorical data, integers, floats and dates
• The inductive modeling methodology can generate rules, rules with exceptions, factored rules with exceptions (EDAG’s) and ripple-down rules
• The performance engine is integrated so that test cases may be checked and, if appropriate, corrected and posted into the dataset to change the model
• The acquisition, modeling and inference tools are designed to reason correctly with open data having don’t care or unknown values

This article demonstrates some of these features using a dataset that has been widely analyzed in the literature.

2 The NASA Autolander Problem

Michie (1989) cites, as an example of the successful application of machine learning, the development of a program to advise the pilot of a space shuttle on whether to use its autolander system. He reports that an attempt to develop an algorithm through conventional programming failed after several months of effort, but that the use of an inductive modeling package produced a solution very rapidly.

Figure 1 shows the dataset used for induction. It comprises 16 cases characterized by 4 binary attributes and 2 four-valued attributes, leading to a binary decision. The 16 cases are interesting because they involve large numbers of “don’t care” values such that they cover all 256 possible situations (2^4 x 4^2 = 256). Michie notes that the first 15 cases were elicited in the first knowledge acquisition phase, and the 16th case was added specifically to give such full coverage.

Case  stab  errors  sign      wind  mag     vis  auto
1     -     -       -         -     -       no   use
2     no    -       -         -     -       yes  not
3     yes   lx      -         -     -       yes  not
4     yes   xl      -         -     -       yes  not
5     -     -       -         -     out     yes  not
6     yes   ss      -         -     light   yes  use
7     yes   ss      -         -     med     yes  use
8     yes   ss      -         -     strong  yes  use
9     yes   mm      negative  tail  -       yes  not
10    yes   mm      positive  head  light   yes  use
11    yes   mm      positive  head  med     yes  use
12    yes   mm      positive  tail  light   yes  use
13    yes   mm      positive  tail  med     yes  use
14    yes   mm      positive  head  strong  yes  not
15    yes   mm      positive  tail  strong  yes  use
16    yes   mm      negative  head  -       yes  not

Figure 1 NASA autolander dataset (Michie, 1989)
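The coverage claim can be checked mechanically. The following Python sketch transcribes the 16 cases of Figure 1, treats each "-" as a don't care that matches any value, and enumerates every fully specified situation. The names DOMAINS, CASES, ANY, matches and coverage are illustrative assumptions and are not part of WebGrid; the value labels are those shown in Figure 1.

```python
from itertools import product

# Attribute domains, using the value labels of Figure 1.
DOMAINS = {
    "stab": ["yes", "no"],
    "errors": ["ss", "mm", "lx", "xl"],
    "sign": ["positive", "negative"],
    "wind": ["head", "tail"],
    "mag": ["light", "med", "strong", "out"],
    "vis": ["yes", "no"],
}

ANY = None  # stands for a "-" (don't care) entry in Figure 1

# (stab, errors, sign, wind, mag, vis, auto) for cases 1..16 of Figure 1.
CASES = [
    (ANY, ANY, ANY, ANY, ANY, "no", "use"),
    ("no", ANY, ANY, ANY, ANY, "yes", "not"),
    ("yes", "lx", ANY, ANY, ANY, "yes", "not"),
    ("yes", "xl", ANY, ANY, ANY, "yes", "not"),
    (ANY, ANY, ANY, ANY, "out", "yes", "not"),
    ("yes", "ss", ANY, ANY, "light", "yes", "use"),
    ("yes", "ss", ANY, ANY, "med", "yes", "use"),
    ("yes", "ss", ANY, ANY, "strong", "yes", "use"),
    ("yes", "mm", "negative", "tail", ANY, "yes", "not"),
    ("yes", "mm", "positive", "head", "light", "yes", "use"),
    ("yes", "mm", "positive", "head", "med", "yes", "use"),
    ("yes", "mm", "positive", "tail", "light", "yes", "use"),
    ("yes", "mm", "positive", "tail", "med", "yes", "use"),
    ("yes", "mm", "positive", "head", "strong", "yes", "not"),
    ("yes", "mm", "positive", "tail", "strong", "yes", "use"),
    ("yes", "mm", "negative", "head", ANY, "yes", "not"),
]


def matches(case, situation):
    """True if every specified attribute of the case equals the situation's value."""
    return all(c is ANY or c == s for c, s in zip(case[:6], situation))


def coverage(cases):
    """Map each fully specified situation to the set of decisions of matching cases."""
    return {
        situation: {case[6] for case in cases if matches(case, situation)}
        for situation in product(*DOMAINS.values())
    }


cov = coverage(CASES)
uncovered = [s for s, decisions in cov.items() if not decisions]
conflicting = [s for s, decisions in cov.items() if len(decisions) > 1]
print(len(cov), len(uncovered), len(conflicting))  # 256 0 0
```

Under this transcription every one of the 256 situations is matched by at least one case, and no situation is matched by cases that disagree on the decision.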

Induction of a minimal decision tree leads to a tree with 15 root nodes, and induction of rules leads to 13 rules with 38 clauses. When Induct is run on this dataset in KSSn it models it with the EDAG shown in Figure 2, which captures the essential algorithm: use the autolander unless the visibility is yes and one of a number of exception conditions holds.

-> class = use auto
    vis = yes
        stab = no
        errors = lx
        errors = xl
        mag = out
        errors = mm
            sign = negative
            wind = head and mag = strong
        -> class = not auto

Figure 2 EDAG produced by Induct from NASA autolander dataset
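Read as a default conclusion with exceptions, the EDAG of Figure 2 can be written as a short function. The sketch below is one plausible reading of the figure, not code produced by Induct or WebGrid; the function name use_autolander and the DOMAINS table are assumptions introduced for illustration.

```python
from itertools import product

# Attribute domains in the argument order of use_autolander (labels from Figure 1).
DOMAINS = [
    ["yes", "no"],                      # stab
    ["ss", "mm", "lx", "xl"],           # errors
    ["positive", "negative"],           # sign
    ["head", "tail"],                   # wind
    ["light", "med", "strong", "out"],  # mag
    ["yes", "no"],                      # vis
]


def use_autolander(stab, errors, sign, wind, mag, vis):
    """Return True for "use auto" and False for "not auto", following Figure 2."""
    if vis != "yes":
        return True  # default conclusion: no exception applies without visibility
    # Exceptions that apply directly once there is visibility.
    if stab == "no" or errors in ("lx", "xl") or mag == "out":
        return False
    # Exceptions nested under errors = mm.
    if errors == "mm" and (sign == "negative" or (wind == "head" and mag == "strong")):
        return False
    return True


# Tally the conclusions over every fully specified situation.
outcomes = [use_autolander(*situation) for situation in product(*DOMAINS)]
print(sum(outcomes), len(outcomes) - sum(outcomes))  # 145 111
```

For example, use_autolander("yes", "mm", "positive", "head", "strong", "yes") returns False, in agreement with case 14 of Figure 1.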

A rational reconstruction of this dataset has been used to exemplify the operation of WebGrid.

3 Eliciting the Autolander Data in WebGrid

Figure 3 shows the initial screen of WebGrid. The HTML form requests the usual data required to initiate grid elicitation: user name; domain and context; terms for elements and constructs; default rating scale; data types allowed; and a list of initial elements. It also allows the subsequent screens to be customized with an HTML specification of background and text colors, ruler line, and a header and trailer (not shown). The capability to include links to multimedia web data is also used to allow annotation, text and pictures, to be attached to elements.


Figure 3 WebGrid initial screen

The knowledge engineer has entered the data noted and the names of 6 initial stereotypical cases and clicks on “Done”. WebGrid generates the triadic elicitation screen shown in Figure 4 which asks the expert in what way one case differs from the other two. The obvious answer in this case is that one should use the autolander in case 1 but not use it in cases 2 and 4. The expert clicks on the radio button for the case which is different, enters its attribute and the contrasting one for the other two cases, and clicks on “Done”.


Figure 4 Construct elicitation from a triad

Note some features of the screen of Figure 4. The horizontal rules and background color have been customized as specified at the bottom of Figure 3. More significantly, there are options at the bottom of the screen to enter categorical data, name the attribute, give it a weight in clustering, a priority in data requests, and to specify whether it is an input to inductive modeling or an output to be predicted. WebGrid has generated these options because the knowledge engineer selected the radio button “+Categories” in the initial screen of Figure 3.

If the default of “Ratings” had been left selected then none of the options would have appeared and WebGrid would act as a simple repertory grid tool. Knowledge elicitation can be commenced in this simple mode and changed to offer more data types at any time. This enables the elicitation process to be kept as simple as possible and more features to be added through a graceful upgrade as the expert gains confidence in the use of the tool.

When the expert clicks on “Done” WebGrid generates the screen of Figure 5 which allows all the elements to be rated on the new construct. Note that the name of the construct, the extreme values, the rating scale, and so on, can all be changed. The system is highly non-modal, enabling errors to be corrected and improvements made at any time.


Figure 5 Rating elements on a construct

Elements are rated by using popup menus as shown in Figure 6. The menu provides a natural rating scale replacing the special rating widgets developed for KSS0/RepGrid.

Figure 6 Popup menus used for rating scale entry


The expert can rate an element or leave it as a “?” which is taken as a “don’t care” value in later modeling. When the rating is complete he clicks on “Done” and WebGrid generates the main status screen shown in Figure 7 which shows the elements and constructs, allowing them to be selected if required, and offers various context-dependent options.

Figure 7 Main status screen showing elements, constructs and options

4 Entering Categorical Data in WebGrid

To illustrate the entering of categorical data, consider the triadic elicitation screen of Figure 8. The expert has noted that case 5 differs from the others in that the turbulence is out of range whereas for case 6 it is light. He also realizes that medium and strong turbulence may be significant for other cases and decides to enter these as categories, clicking on the “Categories” radio button in the bottom part of the screen. He gives the category the name “turbulence”. The primary use of such naming is disambiguation when more than one construct with rating or categorical data has similar names for its values. In the later knowledge modeling the values will be referenced as “turbulence = light” rather than just “light”.


Figure 8 Entering a category

The expert has the option to enter all the category values in the list box next to “Categories”. If he does not, the values entered in response to the questions will be taken as extremal and the others as interpolated between them. Thus, in the example given, the ordering will be: “out of range”, “strong”, “medium”, “light”. Again the system is highly non-modal and more category values can be added later in the elicitation, or categories re-ordered, and WebGrid adjusts existing values so that no data re-entry is required.

When the expert clicks on “Done” WebGrid generates the data entry screen shown in Figure 9 where popup menus are again used to allow elements to be assigned values on constructs, but now from a list of categories rather than a rating scale.


Figure 9 Entering categorical data

Figure 10 shows the upper part of the main status screen when all the cases from Figure 1 have been entered. Note the detailed feedback that WebGrid has generated suggesting the addition of further elements and constructs to break matches. As shown in Figure 7, this status screen also provides the capability of deleting, editing and adding more constructs and elements, annotating elements with multimedia notes that will be displayed in the elicitation process, analysis of the data, saving it, and so on.


Figure 10 Main status screen when all the data from Figure 1 has been entered


6 Knowledge Modeling in WebGrid

Figure 11 shows the output returned when the “FOCUS” button is used to sort the grid to bring similar elements and similar constructs together. The results of analysis are graphed, converted to GIF format and returned to the client where they can be examined and saved if required. Note that the grid itself is a mixture of rating and categorical data, and that the construct clusters show that the use of the autolander is associated with visibility, stability and small errors.

Figure 11 FOCUS clustering of NASA autolander data

Figure 12 shows the output returned when the “PrinCom” button is used to provide a principal components analysis of the grid by rotating it in vector space to give maximum separation of elements in two dimensions. The results of analysis are graphed, converted to GIF format and returned to the client where they can be examined and saved if required. On the right of the graph it can be seen that the use of the autolander is counter-indicated when the wind is head, there is visibility, errors are large, turbulence is out of range, attitude is negative or the shuttle is unstable.


Figure 12 Principal components clustering of NASA autolander data

The cluster analyses provide the expert with feedback that enables him to check whether the model being developed appears correct, but they do not capture fine details and idiosyncratic exceptions. Inductive modeling provides a more precise account of the logical structure that accounts for the data. Figure 13 shows the rules returned when the “Induct” button is clicked.

When the knowledge engineer selects a factored EDAG in the control panel at the bottom of Figure 13 and clicks on “Induct” to run it again, the EDAG shown in Figure 14 is returned. This is precisely that of Figure 2, taking into account the slightly different vocabulary used.


Figure 13 Induct modeling of NASA autolander data through rules

Figure 14 Induct modeling of NASA autolander data as EDAG


7 Integrated Inference in WebGrid

New cases may be tested against the rules by clicking the “Test” button under the list of elements in Figure 10. This results in the data entry screen shown in Figure 15 which allows the attributes of a test case to be entered and the rules used to infer a conclusion. The WebGrid inference engine uses open class reasoning to make correct inferences with data that has missing values. The current inference is that it is open whether to use the autolander or not.

Figure 15 Test case data entry and inference

The user enters data through the popup menus, say that there is visibility, the vehicle is unstable and the wind is medium, and clicks on “Infer”. There is enough data to produce a definite conclusion even though some attributes are unspecified, and WebGrid returns the screen of Figure 16 showing that it can be inferred that one should not use the autolander.
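One way to picture inference with missing values is to consider every way of filling in the unspecified attributes and to report a definite conclusion only when all completions agree. The Python sketch below does this for the autolander model; it is an illustrative stand-in under stated assumptions and not WebGrid's actual inference engine. The names DOMAINS, decide and infer are hypothetical, the decision logic is one reading of the EDAG of Figure 2, and the "medium" value from the example is entered on the mag (turbulence) attribute, which is an assumption since the text attaches it to the wind.

```python
from itertools import product

# Attribute domains, using the value labels of Figure 1.
DOMAINS = {
    "stab": ["yes", "no"],
    "errors": ["ss", "mm", "lx", "xl"],
    "sign": ["positive", "negative"],
    "wind": ["head", "tail"],
    "mag": ["light", "med", "strong", "out"],
    "vis": ["yes", "no"],
}


def decide(case):
    """Conclusion for a fully specified case (one reading of the EDAG of Figure 2)."""
    if case["vis"] != "yes":
        return "use auto"
    if case["stab"] == "no" or case["errors"] in ("lx", "xl") or case["mag"] == "out":
        return "not auto"
    if case["errors"] == "mm" and (
        case["sign"] == "negative"
        or (case["wind"] == "head" and case["mag"] == "strong")
    ):
        return "not auto"
    return "use auto"


def infer(known):
    """Return a definite conclusion if every completion of the known values
    agrees, and "open" otherwise. Attributes absent from `known` are unknown."""
    unknown = [a for a in DOMAINS if a not in known]
    conclusions = set()
    for values in product(*(DOMAINS[a] for a in unknown)):
        case = {**known, **dict(zip(unknown, values))}
        conclusions.add(decide(case))
    return conclusions.pop() if len(conclusions) == 1 else "open"


print(infer({}))                                           # open
print(infer({"vis": "yes", "stab": "no", "mag": "med"}))   # not auto
```

With no data entered the conclusion is open, and once visibility and instability are specified the conclusion is definite even though errors, sign and wind remain unknown. Enumerating completions is exponential in the number of unknown attributes, which is acceptable here because there are at most 256 situations.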

The expert can continue to adjust the test case data and run inference until he is either satisfied that the system is correct or he finds a case for which the inference is incorrect. He can then correct the conclusion and click on the “Add” button to enter the data as a new case. When Induct is run again it will generate a new model that takes account of the additional case and, if possible, will then be correct on the existing cases together with the new one. Thus, knowledge acquisition can be integrated with performance.


Figure 16 Test case data entry and inference

8 Conclusions

This demonstration has shown how WebGrid provides an interactive knowledge modeling system through the Internet. The main paper (Gaines and Shaw, 1996) gives more details of related systems and of other capabilities in WebGrid.

Acknowledgments

Financial assistance for this work has been made available by the Natural Sciences and Engineering Research Council of Canada.

URLs

WebGrid can be accessed at http://tiger.cpsc.ucalgary.ca/WebGrid/

Related papers on WebGrid can be accessed through http://ksi.cpsc.ucalgary.ca/articles/

References

Gaines, B.R. and Shaw, M.L.G. (1996). A networked, open architecture knowledge management system. Gaines, B.R. and Musen, M.A., Ed. Proceedings of Tenth Knowledge Acquisition Workshop.

Michie, D. (1989). Problems of computer-aided concept formation. Quinlan, J.R., Ed. Applications of Expert Systems Volume 2. pp.310-333. Sydney, Addison-Wesley.
