
Uncertainty in AI

• Birds can fly, right?
• Seems like common sense knowledge.

• The problem with First-Order Logic is that universal rules are too strict

• ∀x bird(x) → flies(x) means all birds must fly
• but there are many exceptions...

• If we asserted:
• penguin(opus)
• ∀x penguin(x) → ¬flies(x)
• ∀x penguin(x) → bird(x)

• this would make the KB inconsistent

[Image: Opus, Tweety, Big Bird, Woodstock]

• we can't enumerate all possible exceptions
• what about a robin with a broken wing?
• what about birds that are made out of plastic?
• what about Big Bird?
• this is known as the "qualification problem"


• Uncertainty in actions:
• If a gun is loaded and you pull the trigger, the gun will fire, right?
• ...unless it is a toy gun
• ...unless it is underwater
• ...unless the barrel is filled with concrete
• ...unless it is defective

• what we might want to say is...
• "most birds fly" or
• "some birds do not fly" or
• "tweety possibly flies"

• We cannot express these in FOL
• there are no degrees of truth
• "some birds do not fly" becomes ∃x bird(x) ∧ ¬flies(x)

• solutions:
• Default (non-monotonic) Logics
• Fuzzy Logics
• procedural approaches
• Probability

Default Logic

• extensions of FOL that allow for exceptions
• FOL is monotonic:
• anything that is entailed by a set of sentences will still be entailed by a superset of sentences
• α ⊨ γ implies α ∧ β ⊨ γ
• bird(opus) ⊨ flies(opus)
• bird(opus) ∧ penguin(opus) ⊨ flies(opus) [necessarily]

• Default Logics are non-monotonic
• old conclusions can get retracted due to new information
• bird(opus) ∧ penguin(opus) ⊭ flies(opus)

• Default Logics require alternative model theory, such as possible worlds (or Kripke) semantics
• without going into the complexities, most default logics are based on drawing conclusions from "minimal models" that make the least assumptions necessary

• Default Logic system I
• new type of "soft" implication
• bird(x) ⇝ flies(x)
• "⇝" means "...and if it is not inconsistent to believe so, then..."
• example:

• penguin(opus)
• bird(tweety)
• ∀x penguin(x) → ¬flies(x)
• ∀x penguin(x) → bird(x)
• ∀x bird(x) ⇝ flies(x)

• model: m0 = <U, D, R>
• U = {a, b}
• D = {opus→a, tweety→b}
• R: bird = {a, b}, penguin = {a}, flies = {b}
• because opus is a penguin, 'a' cannot be in the set of things that fly
• this makes it inconsistent to apply the "birds fly" rule for opus
• this model is the "minimal" consistent model of the sentences

• Circumscription - another approach to Defeasible Reasoning
• add 'abnormal' predicates to default sentences
• penguin(opus)
• bird(tweety)
• ∀x penguin(x) → ¬flies(x)
• ∀x penguin(x) → bird(x)
• ∀x bird(x) ∧ ¬abnormal1(x) → flies(x)

• find models that make the fewest abnormal assumptions while remaining consistent
• U = {a, b}
• D = {opus→a, tweety→b}
• R: bird = {a, b}, penguin = {a}, flies = {b}, abnormal1 = {a}
• there is no reason to assume tweety is abnormal
• but opus needs to be abnormal to explain why it does not fly (see the sketch below)
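A minimal brute-force sketch of the minimal-model idea in Python, using the tiny domain above: enumerate candidate extensions for abnormal1 and keep the smallest ones consistent with the hard rules. Circumscription is defined model-theoretically; the enumeration here is only for illustration.

```python
from itertools import combinations

# Tiny domain and facts from the example above
individuals = ["opus", "tweety"]
bird    = {"opus", "tweety"}
penguin = {"opus"}

def consistent(abnormal1):
    """Check the hard rules against a candidate extension of abnormal1."""
    # default rule: bird(x) and not abnormal1(x) -> flies(x)
    flies = {x for x in bird if x not in abnormal1}
    # hard rule: penguin(x) -> not flies(x)
    return not (penguin & flies)

# Prefer models with the fewest abnormal individuals (circumscription)
for size in range(len(individuals) + 1):
    minimal = [set(c) for c in combinations(individuals, size)
               if consistent(set(c))]
    if minimal:
        for abnormal1 in minimal:
            flies = {x for x in bird if x not in abnormal1}
            print("abnormal1 =", abnormal1, "  flies =", flies)
        break   # smaller extensions were all inconsistent, so these are minimal
```

For this domain the only minimal model marks opus as abnormal and lets tweety fly, matching the R given above.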

• Procedural Solutions
• Semantic networks

• 'isa' links encode inheritance relationships

• to answer queries, follow pointers until property is defined

• Closed World Assumption in Prolog
• anything you cannot prove is assumed to be false
• this makes it more concise to write KBs by asserting only true things
• flies(X) :- bird(X), not penguin(X), not too_big_to_fly(X).
• bird(tweety).
• bird(opus).
• bird(big_bird).
• penguin(opus).
• too_big_to_fly(big_bird).

• note that in FOL, we could not conclude flies(tweety) because ¬penguin(tweety) would have to be explicitly asserted as a fact
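A rough Python sketch of how negation as failure behaves under the closed-world assumption, mirroring the Prolog KB above; this is only an illustrative approximation, not how a Prolog engine is actually implemented.

```python
# Only these atoms are asserted; under the CWA everything else is false
facts = {
    ("bird", "tweety"), ("bird", "opus"), ("bird", "big_bird"),
    ("penguin", "opus"),
    ("too_big_to_fly", "big_bird"),
}

def provable(pred, x):
    """A ground atom is provable iff it was asserted."""
    return (pred, x) in facts

def flies(x):
    # flies(X) :- bird(X), not penguin(X), not too_big_to_fly(X).
    return (provable("bird", x)
            and not provable("penguin", x)          # negation as failure
            and not provable("too_big_to_fly", x))  # negation as failure

for name in ("tweety", "opus", "big_bird"):
    print(name, "->", flies(name))
# tweety -> True, opus -> False, big_bird -> False
```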

[Diagram: semantic network. opus isa penguin (canfly: F), penguin isa bird (canfly: T), tweety isa bird]

• Rule strengths or priorities in Expert Systems
• ...an old ad-hoc approach (with unclear semantics)
• penguin(x) →0.9 ¬flies(x)
• bird(x) →0.5 flies(x)

• example:
• can assign 'salience' (integers) to defrules in Jess
• this will determine the order in which rules fire during forward chaining (see the sketch below)
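A rough Python sketch of the priority idea (not Jess syntax): when several rules match, the one with the highest salience fires first. The rule set and numbers simply restate the bird/penguin example; `not_flies` is a made-up predicate standing in for the negative conclusion.

```python
facts = {"bird(tweety)", "bird(opus)", "penguin(opus)"}

# Each rule: (salience, condition, conclusion); higher salience fires first
rules = [
    (0.9, lambda x: f"penguin({x})" in facts, lambda x: f"not_flies({x})"),
    (0.5, lambda x: f"bird({x})" in facts,    lambda x: f"flies({x})"),
]

# Naive forward chaining: for each individual, fire the highest-salience
# rule whose condition matches, then stop (the conflict-resolution step)
for x in ("tweety", "opus"):
    for salience, cond, concl in sorted(rules, key=lambda r: r[0], reverse=True):
        if cond(x):
            facts.add(concl(x))
            break

print(sorted(f for f in facts if "flies" in f))
# ['flies(tweety)', 'not_flies(opus)']
```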

Probability

• encode knowledge in the form of prior probabilities and conditional probabilities
• P(x speaks Portuguese | x is from Brazil) = 0.9
• P(x speaks Portuguese) = 0.012
• P(x is from Brazil) = 0.007
• P(x flies | x is a bird) = 0.8 (?)

• inference is done by calculating posterior probabilities given evidence
• compute P(cavity | toothache, flossing, dental history, recent consumption of candy...)
• compute P(fed will raise interest rate | employment=5%, inflation=0.5%, GDP=2%, recent geopolitical events...)

• Many modern knowledge-based systems are based on probabilistic inference
• including Bayesian networks, Hidden Markov Models, Markov Decision Problems
• example: Bayesian networks are used for inferring user goals or help needs from actions like mouse clicks in an automated software help system (think 'Clippy')
• Decision Theory combines utilities with probabilities of outcomes to decide which actions to take

• the challenge is capturing all the numbers needed for the prior and conditional probabilities
• objectivists (frequentists) - probabilities represent outcomes of trials/experiments
• subjectivists - probabilities are degrees of belief

• probability and statistics is at the core of many Machine Learning algorithms

• causal vs. diagnostic knowledge
• causal: P(x has a toothache | x has a cavity) = 0.9
• diagnostic: P(x has a cavity | x has a toothache) = 0.5

• typically it is easier to articulate knowledge in the causal direction, but we often want to use it in a diagnostic way to make inferences from observations

• product rule: joint P(A,B) = P(A|B) * P(B)
• Bayes' Rule: convert between causal and diagnostic directions (see the worked example below)
• P(H|E) = P(E|H) * P(H) / P(E)
• H = hypothesis (cause, disease)
• E = evidence (effect, symptoms)
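For example, the Brazil/Portuguese numbers from the earlier slide can be turned around with Bayes' Rule; a minimal sketch:

```python
# Numbers from the earlier slide (causal direction plus priors)
p_portuguese_given_brazil = 0.9    # P(E|H)
p_brazil                  = 0.007  # P(H)
p_portuguese              = 0.012  # P(E)

# Diagnostic direction via Bayes' Rule: P(H|E) = P(E|H) * P(H) / P(E)
p_brazil_given_portuguese = p_portuguese_given_brazil * p_brazil / p_portuguese
print(round(p_brazil_given_portuguese, 3))   # 0.525
```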

• Joint probability table (JPT)
• you can calculate the answer to any question from the JPT
• the problem is that there is an exponential number of entries (2^N, where N is the number of binary random variables)


P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache)
                      = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064)
                      = 0.4
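The mechanics of answering a query from a JPT are easy to show in code. The full table is not reproduced in this transcript, so the sketch below uses a small hypothetical two-variable table whose toothache=true entries match the calculation above; the toothache=false entries are placeholders chosen only so the table sums to 1.

```python
# Hypothetical joint probability table over two binary variables.
# The toothache=True row matches the worked example above (0.08 and 0.12);
# the toothache=False entries are placeholders so the table sums to 1.
jpt = {
    # (cavity, toothache): probability
    (True,  True):  0.08,
    (False, True):  0.12,
    (True,  False): 0.10,
    (False, False): 0.70,
}

def prob(cavity=None, toothache=None):
    """Sum the JPT entries matching a (possibly partial) assignment."""
    return sum(p for (c, t), p in jpt.items()
               if (cavity is None or c == cavity)
               and (toothache is None or t == toothache))

# P(cavity | toothache) = P(cavity, toothache) / P(toothache)
print(round(prob(cavity=True, toothache=True) / prob(toothache=True), 3))   # 0.4
```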

Solution to reduce complexity: employ the Independence Assumption. Most variables are not strictly independent, in that there is a statistical association (but which is cause? which is effect?). However, many variables are conditionally independent:

P(A,B|C) = P(A|C) P(B|C)

Bayesian networks

note: absence of link between Burglary and JohnCalls means JohnCalls is conditionally independent of Burglary given Alarm
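A small sketch of that conditional independence, using only a three-node fragment Burglary → Alarm → JohnCalls; the structure follows the slide, but the CPT values below are assumed purely for illustration. Once Alarm is fixed, conditioning on Burglary does not change P(JohnCalls).

```python
# Three-node chain: Burglary -> Alarm -> JohnCalls
# (structure from the slide; CPT values assumed for illustration)
P_B = {True: 0.001, False: 0.999}   # P(Burglary)
P_A = {True: 0.95, False: 0.01}     # P(Alarm=True | Burglary=b)
P_J = {True: 0.90, False: 0.05}     # P(JohnCalls=True | Alarm=a)

def joint(b, a, j):
    """Chain rule along the network: P(b, a, j) = P(b) P(a|b) P(j|a)."""
    pa = P_A[b] if a else 1 - P_A[b]
    pj = P_J[a] if j else 1 - P_J[a]
    return P_B[b] * pa * pj

def p_john_given(a, b):
    """P(JohnCalls=True | Alarm=a, Burglary=b), computed by enumeration."""
    return joint(b, a, True) / sum(joint(b, a, j) for j in (True, False))

# No direct Burglary -> JohnCalls link: once Alarm is known,
# Burglary tells us nothing more about JohnCalls
print(round(p_john_given(a=True, b=True), 3))    # 0.9
print(round(p_john_given(a=True, b=False), 3))   # 0.9
```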
