knowledge in learning copyright, 1996 © dale carnegie & associates, inc. chapter 19 spring 2004

Knowledge in Learning

Copyright, 1996 © Dale Carnegie & Associates, Inc.

Chapter 19

Spring 2004

CS 471/598 by H. Liu 2

A logical formulation of learning

What’re Goal and Hypotheses Goal predicate Q - WillWait

Learning is to find an equivalent logical expression we can classify examplesEach hypothesis proposes such an expression - a candidate definition of Q r WillWait(r) Pat(r,Some)

Pat(r,Full) Hungry(r)Type(r,French) …

CS 471/598 by H. Liu 3

Hypothesis space is the set of all hypotheses the learning algorithm is designed to entertain.One of the hypotheses is correct:H1 V H2 V…V Hn

Each Hi predicts a certain set of examples - the extension of the goal predicate.Two hypotheses with different extensions are logically inconsistent with each other, otherwise, they are logically equivalent.

CS 471/598 by H. Liu 4

What are ExamplesAn example is an object of some logical description to which the goal concept may or may not apply. Alt(X1)^!Bar(X1)^!Fri/Sat(X1)^…

Ideally, we want to find a hypothesis that agrees with all the examples.The relation between f and h are: ++, --, +- (false negative), -+ (false positive). If the last two occur, example I and h are logically inconsistent.

CS 471/598 by H. Liu 5

Current-best hypothesis search

Maintain a single hypothesisAdjust it as new examples arrive to maintain consistency (Fig 19.1) Generalization for positive examples Specialization for negative examplesAlgorithm (Fig 19.2, page 681) Need to check for consistency with all

existing examples each time taking a new example

CS 471/598 by H. Liu 6

Example of WillWaitFig 18.3

Problems: nondeterministic, no guarantee for simplest and correct h, need backtrack

CS 471/598 by H. Liu 7

Least-commitment searchKeeping one h as its best guess is the problem -> Can we keep as many as possible?Version space (candidate elimination) Algo incremental least-commitment

From intervals to boundary sets G-set and S-set Everything between is guaranteed to be

consistent wit examples.

CS 471/598 by H. Liu 8

Version spaceGeneralization and specialization (Fig 19.4)

False positive for Si, too general, discard it False negative for Si, too specific, generalize it minimally False positive for Gi, too general, specialize it minimally False negative for Gi, too specific, discard it

When to stop One concept left (Si = Gi) The version space collapses Run out of examples

One major problem: can’t handle noise

CS 471/598 by H. Liu 9

Using prior knowledgeFor DT and logical description learning, we assume no prior knowledgeWe do have some prior knowledge, so how can we use it?We need a logical formulation as opposed to the function learning.

CS 471/598 by H. Liu 10

Inductive learning in the logical setting

The objective is to find a hypothesis that explains the classifications of the examples, given their descriptions.Hypothesis ^ Description |= Classifications Descriptions - the conjunction of all the

example descriptions Classifications - the conjunction of all the

example classifications

CS 471/598 by H. Liu 11

A cumulative learning process

Fig 19.6 (p 687)The new approach is to design agents that already know something and are trying to learning some more.Intuitively, this should be faster and better than without using knowledge, assuming what’s known is always correct.How to implement this cumulative learning with increasing knowledge?

CS 471/598 by H. Liu 12

Some examples of using knowledge

One can leap to general conclusions after only one observation. Your such experience?

Traveling to Brazil: Language and name ?

A pharmacologically ignorant but diagnostically sophisticated medical student … ?

CS 471/598 by H. Liu 13

Some general schemes

Explanation-based learning (EBL) Hypothesis^Description |= Classifications Background |= Hypothesis

doesn’t learn anything factually new from instance

Relevance-based learning (RBL) Hypothesis^Descriptions |= Classifications Background^Descrip’s^Class |= Hypothesis

deductive in nature

Knowledge-based inductive learning (KBIL) Background^Hypothesis^Descrip’s |=

Classifications

CS 471/598 by H. Liu 14

Inductive logical programming (ILP)

ILP can formulate hypotheses in general first-order logic Others like DT are more restricted

languages

Prior knowledge is used to reduce the complexity of learning: prior knowledge further reduces the H space prior knowledge helps find the shorter H Again, assuming prior knowledge is correct

CS 471/598 by H. Liu 15

Explanation-based learning

A method to extract general rules from individual observationsThe goal is to solve a similar problem faster next time.Memoization - speed up by saving results and avoiding solving a problem from scratchEBL does it one step further - from observations to rules

CS 471/598 by H. Liu 16

Why EBL?Explaining why something is a good idea is much easier than coming up with the idea.Once something is understood, it can be generalized and reused in other circumstances.

Extracting general rules from examplesEBL constructs two proof trees simultaneously by variablization of the constants in the first treeAn example (Fig 19.7)

CS 471/598 by H. Liu 17

Basic EBLGiven an example, construct a proof tree using the background knowledge In parallel, construct a generalized proof tree for the variabilized goalConstruct a new rule (leaves => the root)Drop any conditions that are true regardless of the variables in the goal

CS 471/598 by H. Liu 18

Efficiency of EBLChoosing a general rule too many rules -> slow inference aim for gain - significant increase in speed as general as possible

Operationality - A subgoal is operational means it is easy to solve Trade-off between Operationality and

Generality

Empirical analysis of efficiency in EBL study

CS 471/598 by H. Liu 19

Learning using relevant information

Prior knowledge: People in a country usually speak the same language

Observation: Given Fernando is Brazilian & speaks Portuguese

We cab logically conclude via resolution

CS 471/598 by H. Liu 20

Functional dependenciesWe have seen a form of relevance: determination - language (Portuguese) is a function of nationality (Brazil)

Determination is really a relationship between the predicatesThe corresponding generalization follows logically from the determinations and descriptions.

CS 471/598 by H. Liu 21

We can generalize from Fernando to all Brazilians, but not to all nations. So, determinations can limit the H space to be considered.Determinations specify a sufficient basis vocabulary from which to construct hypotheses concerning the target predicate.A reduction in the H space size should make it easier to learn the target predicate.

CS 471/598 by H. Liu 22

Learning using relevant information

A determination P Q says if any examples match on P, they must also match on QFind the simplest determination consistent with the observations Search through the space of determinations

from one predicate, two predicates Algorithm - Fig 19.8 (page 696) Time complexity is n choosing p.

CS 471/598 by H. Liu 23

Combining relevance based learning with decision tree learning -> RBDTLIts learning performance improves (Fig 19.9).Other issues noise handling using other prior knowledge from attribute-based to FOL

CS 471/598 by H. Liu 24

Inductive logic programmingIt combines inductive methods with FOL.ILP represents theories as logic programs.ILP offers complete algorithms for inducing general, first-order theories from examples.It can learn successfully in domains where attribute-based algorithms fail completely.An example - a typical family tree (Fig 19.11)

CS 471/598 by H. Liu 25

Inverse resolutionIf Classifications follow from B^H^D, then we can prove this by resolution with refutation (completeness).If we run the proof backwards, we can find a H such that the proof goes through.Generating inverse proofs A family tree example (Fig 19.13)

CS 471/598 by H. Liu 26

Inverse resolution involves search Each inverse resolution step is

nondeterministic For any C and C1, there can be many C2

Discovering new knowledge with IR It’s not easy - a monkey and a typewriter

Discovering new predicates with IR Fig 19.14

The ability to use background knowledge provides significant advantages

CS 471/598 by H. Liu 27

Top-down learning (FOIL)

A generalization of DT induction to the first-order case by the same author of C4.5 Starting with a general rule and specialize it to fit

data Now we use first-order literals instead of attributes,

and H is a set of clauses instead of a decision tree.Example: =>grandfather(x,y) (page 701) positive and negative examples adding literals one at a time to the left-hand side e.g., Father (x,y) => Grandfather(x,y) How to choose literal? (Algorithm on page 702)

CS 471/598 by H. Liu 28

SummaryUsing prior knowledge in cumulative learningPrior knowledge allows for shorter H’s.Prior knowledge plays different logical roles as in entailment constraintsEBL, RBL, KBILILP generate new predicates so that concise new theories can be expressed.

knowledge in learning copyright, 1996 © dale carnegie & associates, inc. chapter 19 spring 2004

Documents

new examples

existing examples

french slide

certain set of examples

learning algorithm

correct h

false negative

example classifications