Source: cgi.di.uoa.gr/~ys02/siteai2005/lectures/ai2004-2pp.pdf
What is AI?
AI is the field of science and engineering which attempts to build
intelligent systems.
But what are intelligent systems?
What is AI (cont’d)
Definitions found in AI textbooks tend to fall into the following
categories:
• AI is the field of science and engineering which attempts to
build systems that act like humans.
• ... that think like humans.
• ... that think rationally.
• ... that act rationally.
Acting Like Humans: The Turing Test Approach
To pass the Turing test a computer should have the following
capabilities:
• natural language processing
• knowledge representation
• automated reasoning
• machine learning
• computer vision
• robotics
Within AI, there has not been a big effort to pass the Turing test.
Thinking Like Humans:
The Cognitive Modelling Approach
How do humans think?
There are two ways to find out:
• Introspection
• Psychological experiments
Example: The GPS program by Newell and Simon
In this tradition, psychology and cognitive science are very
relevant.
Thinking Rationally: The Laws of Thought Approach
What are the laws of thought? This question goes back to the
syllogisms of the Greek philosopher Aristotle.
The logicist tradition in AI has followed this approach.
Example: Early work on theorem proving
The emphasis in this tradition is on correct inference. As a result, related
work from philosophy and logic is very important.
Acting Rationally: The Rational Agent Approach
In this approach the design of rational agents is the main
problem.
What is an agent?
Agents
An agent is anything that can be viewed as perceiving its
environment through sensors and acting upon that environment
through effectors or actuators.
[Figure: an agent interacting with its environment. Percepts flow from the environment through sensors into the agent; actions flow from the agent through effectors back to the environment.]
Examples of Agents
• Human agents
• Robotic agents
• Software agents (or software robots or softbots).
Rational Agents
A rational agent is one that acts so as to achieve the best
outcome or, when there is uncertainty, the best expected outcome.
Other properties of agents:
• autonomy
• social ability
• situatedness
• adaptivity
• ...
Acting Rationally (cont’d)
The study of AI as rational agent design is
• more general than the laws of thought approach
• “easier” than approaches based on human thought or human
behaviour
This is the approach that we will take in this course. We will
concentrate on general principles of rational agents and on
components for constructing them.
Foundations of AI
The following disciplines have contributed ideas, viewpoints and
techniques to AI.
• Philosophy
• Mathematics
• Economics
• Neuroscience
• Psychology and cognitive science
• Computer science and engineering
• Control theory and cybernetics
• Linguistics
History of AI
• Gestation (1943-1955)
Models of artificial neurons (McCulloch and Pitts, 1943).
Hebbian learning (Hebb, 1949).
The article “Computing Machinery and Intelligence” by Alan
Turing (1950).
SNARC: the first neural network computer (Minsky and
Edmonds, 1951).
• Birth (1956)
The Dartmouth workshop in the summer of 1956 (McCarthy,
Minsky, Newell, Simon).
History of AI (cont’d)
• Early enthusiasm, great expectations (1952-1969)
Logic Theorist, General Problem Solver, Geometry Theorem
Prover, game playing, Lisp, theorem proving, Shakey the robot,
micro-worlds, adalines, perceptrons.
• A dose of reality (1966-1973)
Programs with no domain knowledge, intractability problems.
Cancellation of big projects on machine translation (US),
Lighthill report (UK).
History of AI (cont’d)
• Knowledge-based systems (1969-1979)
The role of domain specific knowledge, expert systems.
Representation and reasoning languages (e.g., Prolog and frame-based
languages).
• AI becomes an industry (1980-present)
The first successful expert system: R1 (McDermott, DEC).
The Japanese 5th generation project (1981) and its emphasis on logic
programming.
Microelectronics and Computer Technology Corporation (MCC) in the
U.S.
Alvey report in the U.K.
• The return of neural networks (1986-present)
Connectionism.
History of AI (cont’d)
• AI becomes a science (1987-present)
Neats vs. scruffies.
Knowledge representation, speech recognition, neural networks
and data mining, Bayesian networks, robotics, computer vision.
• Intelligent agents (1995-present)
See the conference AAMAS
(http://www.aamas-conference.org/)
• Semantic Web (1998-present)
See the site http://www.semanticweb.org/.
State of the Art
• Autonomous planning and scheduling
See NASA’s Remote Agent
(http://ic.arc.nasa.gov/projects/remote-agent/).
• Game Playing
See IBM’s Deep Blue
(http://www.research.ibm.com/deepblue/).
• Autonomous control
See CMU’s NavLab computer controlled minivan
(http://www.ri.cmu.edu/labs/lab_28.html).
See DARPA’s grand challenge in autonomous ground vehicles
(http://www.stanfordracing.org/).
State of the Art (cont’d)
• Constraint solving software
See solvers by ILOG (http://www.ilog.com).
Readings
Chapters 1 and 2 (not in depth) of AIMA.
Solving Problems by Searching
• Agents, Goal-Based Agents, Problem-Solving Agents
• Search Problems
• Blind Search Strategies
Agents
[Figure: an agent interacting with its environment. Percepts flow from the environment through sensors into the agent; actions flow from the agent through effectors back to the environment.]
Definition. An agent is anything that can be viewed as
perceiving its environment through sensors and acting upon
that environment through effectors or actuators.
Examples of Agents
• Human agents
• Robotic agents
• Software agents (or software robots or softbots).
How Should Agents Act?
The behavior of an agent depends on the following:
• The environment. This is the world where the agent lives.
• The percept sequence. This is the complete history of everything the
agent has ever perceived.
The agent can be described (almost completely) by an agent function
that maps every given percept sequence to an action.
The agent function will be implemented by an agent program.
• The performance measure. This is the objective criterion for
success of an agent’s behavior. It is imposed by the agent designer. It
might not be easy to define the performance measure.
Example: The performance measure of an automatic taxi driver
should be ....
Goal-Based Agents
[Figure: schematic of a goal-based agent. Sensors deliver percepts from the environment; the agent combines its internal state (what the world is like now), its knowledge of how the world evolves and of what its actions do, and its goals to determine what the world will be like if it does action A and, from that, what action it should do now, which is sent to the effectors.]
The behavior of an agent also depends on:
• The agent’s goals. A goal specifies what states of the
environment are desirable for the agent.
Problem-Solving Agents
Problem-solving agents are a class of goal-based agents.
Problem solving agents decide what to do by finding sequences of
actions that lead to desirable states.
Example: Consider an agent in the city of Arad, Romania. How
can it get to Bucharest the next day, on time for its flight?
Problem-Solving Agents
Problem-solving agents work by carrying out the following tasks
repeatedly:
• Goal formulation: decide what the objective is.
• Problem formulation: decide what actions and states to
consider in order to meet the objective.
• Search: find a sequence of actions that achieves the goal.
• Execution: execute the chosen sequence of actions.
Example: Route Finding in Romania
[Figure: road map of Romania showing the cities Arad, Bucharest, Craiova, Dobreta, Eforie, Fagaras, Giurgiu, Hirsova, Iasi, Lugoj, Mehadia, Neamt, Oradea, Pitesti, Rimnicu Vilcea, Sibiu, Timisoara, Urziceni, Vaslui and Zerind, and the roads between them.]
Our First Agent Program
function SimpleProblemSolvingAgent(percept) returns an
action
static seq, state, goal, problem
state← UpdateState(state, percept)
if seq is empty then
goal← FormulateGoal(state)
problem← FormulateProblem(state, goal)
seq ← Search(problem)
end
action← First(seq)
seq ← Rest(seq)
return action
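The pseudocode above can be sketched in Python. The helper functions are passed in as parameters (our own framing, not the slides'), so the skeleton stays problem-independent; `state` and `seq` play the role of the pseudocode's static variables.

```python
# A sketch of SimpleProblemSolvingAgent as a closure over static state.
def make_agent(update_state, formulate_goal, formulate_problem, search):
    state, seq = None, []

    def agent(percept):
        nonlocal state, seq
        state = update_state(state, percept)
        if not seq:                       # old plan used up: plan again
            goal = formulate_goal(state)
            problem = formulate_problem(state, goal)
            seq = search(problem)
        action, seq = seq[0], seq[1:]     # First(seq), Rest(seq)
        return action

    return agent

# Toy usage: a 'search' stub whose plans are always [A, B].
agent = make_agent(
    update_state=lambda s, p: p,
    formulate_goal=lambda s: "goal",
    formulate_problem=lambda s, g: (s, g),
    search=lambda problem: ["A", "B"],
)
print(agent("p1"), agent("p2"), agent("p3"))  # A B A
```

Note how the agent replans only once the current action sequence is exhausted, exactly as in the pseudocode.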
Structure of Agents
Agent = Architecture + Program
The architecture makes the percepts from the sensors available to
the program, runs the program, and feeds the program’s action
choices to the effectors as they are generated.
We will only deal with agent programs in this course.
Search Problems
The basic elements of a search problem are:
• The initial state.
• The set of available actions. To specify the available actions
we use a successor function Succ which, given a state x,
returns a set of ordered pairs (action, successor state). This
set tells us what actions are possible in x and what states are
reachable from x by executing these actions.
The initial state and the successor function define the state space
of a search problem: the set of all states reachable from the
initial state by any sequence of actions.
A path in the state space is any sequence of states connected
by a sequence of actions.
Search Problems (cont’d)
• The goal to be achieved. The goal is a set of world states
called the goal states. Goals can be specified implicitly by a
goal test i.e., a test which can be applied to a state to
determine if it is a goal state.
• A path cost function is a function (usually denoted by g)
which assigns a numeric cost to each path. The path cost will
usually be the sum of the costs of the individual actions along
the path.
The step cost of taking action a to go from state x to state y
is denoted by c(x, a, y).
A solution to a search problem is a path from the initial state to a
goal state. A solution is optimal if it has the lowest path cost
among all solutions.
An Example
The route finding problem from Arad to Bucharest can formally be
specified as follows:
• The states specify the city we are in e.g., In(Arad).
• The only available action is GoTo e.g., GoTo(Sibiu).
• For every city the successor function gives us a set of pairs
(GoTo(.), In(.)). For example,
S(Arad) = {(GoTo(Sibiu), In(Sibiu)),
(GoTo(Timisoara), In(Timisoara)), (GoTo(Zerind), In(Zerind))}.
• The initial state is In(Arad). The goal state is In(Bucharest).
• The path cost can be the road distance in kilometers.
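This formulation can be written down directly as data plus a successor function (a sketch; only a fragment of the road map is included, with distances as in the AIMA map).

```python
# Road-map fragment: roads[x][y] is the step cost of driving from x to y.
roads = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Oradea": 151, "Rimnicu Vilcea": 80},
}

def successors(city):
    """Succ(In(city)): the set of (GoTo(c), In(c)) pairs for neighbours c."""
    return {(f"GoTo({c})", f"In({c})") for c in roads.get(city, {})}

print(sorted(successors("Arad")))
```

Each returned pair is an (action, successor state), matching the definition of the successor function on the previous slide.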
The 8-puzzle
[Figure: a start state and the goal state of the 8-puzzle: tiles 1-8 and a blank arranged on a 3x3 board.]
The 8-puzzle (cont’d)
Formal specification:
• States: a state description specifies the location of each tile and
blank
• Actions: blank moves L, R, U, D
• Goal state
• Path cost: length of the path
The 8-queens problem
The 8-queens problem (cont’d)
Formal specification:
• States: any arrangement of 0 to 8 queens on board
• Actions: add a queen to any square
• Goal test: 8 queens on board, none attacked
• Path cost: zero
Size of state space: 64^8
The 8-queens problem (cont’d)
Alternative specification:
• States: arrangements of 0 to 8 queens with none attacked
• Actions: place a queen in the left-most empty column such
that it is not attacked by any other queen.
• Goal test: 8 queens on board, none attacked
• Path cost: zero
Size of state space: 8^8
Search Problems in the Real World
• Route finding
• Touring problems e.g., travelling salesman
• Robot navigation
• Automatic assembly sequencing
• Protein design
• Query optimisation problems in DBMS
• Internet searching
• Automatic workflow generation
Computational Complexity Note
Almost all of the problems presented above have NP-hard or
worse computational complexity.
Thus, we should not expect simple search algorithms to be
efficient on these problems. This is a serious difficulty, and we will
try to find ways to tackle it!
Searching for Solutions
Example:
(a) The initial state: Arad.
(b) After expanding Arad: children Sibiu, Timisoara and Zerind.
(c) After expanding Sibiu: children Arad, Fagaras, Oradea and Rimnicu Vilcea.
[Figure: the corresponding partial search trees.]
Searching for Solutions (cont’d)
Comments:
• Finding a solution is done by searching through the state space.
The trick is to maintain and extend a set of partial solutions.
• The choice of which state to expand next is determined by the
search strategy.
• The search process is building up a search tree that is
superimposed over the state space.
• It is important to distinguish between the state space and the
search tree.
Searching for Solutions (cont’d)
function TreeSearch(problem, strategy)
returns a solution or failure
initialize the search tree using the initial state of problem
loop do
if there are no candidates for expansion then
return failure
choose a leaf node for expansion according to strategy
if the node contains a goal state then
return the corresponding solution
else expand the node and add the resulting nodes
to the search tree
end
Search Tree Nodes
Nodes in the search tree can be represented by a data structure with
five components:
• State: the state to which the node corresponds.
• ParentNode: the node in the search tree that generated this
node.
• Action: the action that was applied to generate the node.
• PathCost: the cost of the path from the initial state to the node.
• Depth: the number of steps along the path from the root to this
node.
The Fringe or Frontier
The set of nodes awaiting expansion is called the fringe or
frontier. It can be implemented as a queue with operations:
• MakeQueue(Elements)
• Empty?(Queue)
• RemoveFront(Queue)
• QueuingFn(Elements,Queue)
The General Tree Search Algorithm
function TreeSearch(problem,QueuingFn)
returns a solution, or failure
fringe← MakeQueue(MakeNode(InitialState[problem]))
loop do
if fringe is empty then return failure
node← RemoveFront(fringe)
if GoalTest[problem] applied to State[node] succeeds then
return node
fringe← QueuingFn(fringe,Expand(node, problem))
end
The function Expand is responsible for calculating each of the
components of the nodes it generates.
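The TreeSearch skeleton can be made concrete in a few lines of Python (a sketch; the node representation and the toy tree are our own, not from the slides). A node is a (state, path) pair, and the queuing function alone distinguishes BFS from DFS.

```python
from collections import deque

# A runnable sketch of the general TreeSearch loop.
def tree_search(initial, successors, is_goal, queuing_fn):
    fringe = deque([(initial, [initial])])
    while fringe:
        state, path = fringe.popleft()              # RemoveFront
        if is_goal(state):
            return path                             # the solution path
        children = [(s, path + [s]) for s in successors(state)]
        queuing_fn(fringe, children)                # Expand + enqueue
    return None                                     # failure

bfs = lambda fringe, nodes: fringe.extend(nodes)                # at the end
dfs = lambda fringe, nodes: fringe.extendleft(reversed(nodes))  # at the front

tree = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
succ = lambda s: tree.get(s, [])
print(tree_search("A", succ, lambda s: s == "D", bfs))  # ['A', 'B', 'D']
```

Like the pseudocode, this version does not detect repeated states, so it can loop forever on a state space with cycles.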
Search Algorithms
We will consider two kinds of search algorithms:
• Uninformed or blind
• Informed or heuristic
Evaluation criteria for a search algorithm:
• Completeness
• Optimality
• Time complexity
• Space complexity
Uninformed Search Methods
• Breadth-first search
• Uniform-cost search
• Depth-first search
• Depth-limited search
• Iterative deepening search
• Bidirectional search
Breadth-first search (BFS)
function BreadthFirstSearch(problem)
returns a solution or failure
return TreeSearch(problem,EnqueueAtEnd)
Example:
Breadth-first search (cont’d)
Evaluation:
• Complete? Yes, if the branching factor b is finite.
• Time: O(b^(d+1)) where b is the branching factor and d is the
depth of the solution.
• Space: O(b^(d+1)). This is the biggest problem with BFS.
• Optimal? Yes, under the assumption that the path cost is a
non-decreasing function of the depth of the node (e.g., when all
actions have identical cost).
Note: BFS finds the shallowest goal state.
Uniform-cost search (UCS)
Modifies BFS by always expanding the lowest cost node on the
fringe (as measured by the path cost).
Example:
[Figure: uniform-cost search on a small graph with start S and goal G. Step costs: S to A is 1, S to B is 5, S to C is 15, A to G is 10, B to G is 5, C to G is 5. Expanding by lowest path cost first finds S-B-G with cost 10, even though S-A-G (cost 11) is generated earlier.]
Uniform-cost search (cont’d)
Evaluation:
• Complete? Yes.
• Time: O(b^⌈C*/ε⌉) where b is the branching factor, C* is the
cost of the optimal solution, and every action costs at least
ε > 0.
• Space: same as time.
• Optimal? Yes.
Completeness and optimality hold under the assumption that the
branching factor is finite and the cost never decreases as we go
along a path i.e., g(Successor(n)) ≥ g(n) for every node n. The
last condition holds e.g., when each action costs at least ε > 0.
Note: BFS is UCS with g(n) = Depth(n).
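The lowest-cost-first rule is naturally implemented with a priority queue. The sketch below uses Python's heapq; the example graph and its step costs follow the small S/A/B/C/G example on the earlier slide (our reading of it, stated as an assumption).

```python
import heapq

# Uniform-cost search: the fringe is a priority queue ordered by g.
def uniform_cost_search(start, goal, step_costs):
    fringe = [(0, start, [start])]               # entries are (g, state, path)
    while fringe:
        g, state, path = heapq.heappop(fringe)   # cheapest node first
        if state == goal:
            return g, path
        for nxt, c in step_costs.get(state, {}).items():
            heapq.heappush(fringe, (g + c, nxt, path + [nxt]))
    return None

# S reaches A, B, C with costs 1, 5, 15; each of them reaches G.
graph = {"S": {"A": 1, "B": 5, "C": 15},
         "A": {"G": 10}, "B": {"G": 5}, "C": {"G": 5}}
print(uniform_cost_search("S", "G", graph))  # (10, ['S', 'B', 'G'])
```

The goal node generated first (G via A, cost 11) is not returned; the cheaper G via B (cost 10) is popped from the queue before it.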
Depth-first search (DFS)
Depth-first search always expands one of the nodes at the deepest
level of the search tree.
Example:
Depth-first search (cont’d)
function DepthFirstSearch(problem)
returns a solution, or failure
return TreeSearch(problem,EnqueueAtFront)
Evaluation:
• Complete? No (it can descend forever along an infinite path).
• Time: O(b^m) where b is the branching factor and m is the
maximum depth of the search tree.
• Space: O(bm), i.e., only linear in the depth.
• Optimal? No
Depth-limited search (DLS)
Like DFS but imposes a depth limit on search. E.g., for the
“driving to Bucharest” example, a good depth-limit is 19 (we have
20 cities).
Evaluation:
• Complete? Yes, iff l ≥ d where l is the depth limit and d the
depth of a solution.
• Time: O(b^l)
• Space: O(bl)
• Optimal? No
Question: Can we always find a good depth-limit?
Iterative-deepening search (IDS)
IDS sidesteps the issue of choosing the best depth-limit by trying
all possible ones: 0,1,2 and so on.
function IterativeDeepeningSearch(problem)
returns a solution sequence
for depth← 0 to ∞ do
if DepthLimitedSearch(problem, depth) succeeds then
return its result
end
return failure
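The loop above can be sketched directly, with a recursive depth-limited DFS as the inner routine (the toy tree is our own illustration).

```python
# Depth-limited DFS: search no deeper than 'limit' steps below 'state'.
def depth_limited(state, successors, is_goal, limit, path=None):
    path = (path or []) + [state]
    if is_goal(state):
        return path
    if limit == 0:
        return None                       # cutoff reached
    for s in successors(state):
        result = depth_limited(s, successors, is_goal, limit - 1, path)
        if result:
            return result
    return None

# IDS: retry with limits 0, 1, 2, ... (capped here so the sketch terminates).
def iterative_deepening(start, successors, is_goal, max_depth=50):
    for depth in range(max_depth + 1):
        result = depth_limited(start, successors, is_goal, depth)
        if result:
            return result
    return None                           # failure

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"], "F": ["G"]}
succ = lambda s: tree.get(s, [])
print(iterative_deepening("A", succ, lambda s: s == "G"))  # ['A', 'C', 'F', 'G']
```

Because each round restarts from the root, shallow nodes are re-generated many times; the next slide argues that this overhead is smaller than it looks.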
Iterative-deepening search (cont’d)
Example:
[Figure: iterative-deepening search on a binary tree, shown for depth limits 0, 1, 2 and 3.]
Iterative-deepening search (cont’d)
Question: Is IDS wasteful?
Answer: No!
Let us assume that the solution is found when the last node at level
d is expanded. Then the number of nodes generated in a BFS to
depth d is
1 + b + b^2 + · · · + b^d + (b^(d+1) − b)
The number of nodes generated in an IDS to depth d is
(d+1) + d·b + (d−1)·b^2 + · · · + 2·b^(d−1) + 1·b^d
Using the above formulas we can see that BFS can actually be a lot
more wasteful than IDS (e.g., try b = 10 and d = 5).
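Plugging the suggested numbers b = 10, d = 5 into the two formulas makes the point concrete:

```python
# Node counts for BFS and IDS at b = 10, d = 5, per the slide's formulas.
b, d = 10, 5

# BFS: 1 + b + b^2 + ... + b^d + (b^(d+1) - b)
bfs_nodes = sum(b**i for i in range(d + 1)) + (b**(d + 1) - b)

# IDS: (d+1) + d*b + (d-1)*b^2 + ... + 1*b^d
ids_nodes = sum((d + 1 - i) * b**i for i in range(d + 1))

print(bfs_nodes, ids_nodes)  # 1111101 123456
```

So BFS generates roughly nine times as many nodes as IDS here; the extra level of generated-but-unexpanded children dominates the cost of IDS's repeated shallow work.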
Iterative-deepening search (cont’d)
Evaluation:
• Complete? Yes, under the assumptions for BFS.
• Time: O(b^d)
• Space: O(bd)
• Optimal? Yes, under the assumptions for BFS.
IDS is the search algorithm of choice when the search space is large
and the depth of the search is not known.
Bidirectional search
Idea: Search both forward from the initial state and backward
from the goal. Stop when the two searches meet.
Motivation: b^(d/2) + b^(d/2) ≪ b^d
Problems:
• What does it mean to search backwards from the goal?
• What if we have many possible goal states?
• Can we check efficiently that the two searches meet?
• What kind of search do we do in each half?
Bidirectional search (cont’d)
Evaluation:
• Complete? Yes, if the branching factor is finite and both
directions use BFS.
• Time: O(b^(d/2))
• Space: O(b^(d/2))
• Optimal? Yes, if both directions use BFS and under the
assumptions for BFS.
Avoiding Repeated States
Example:
[Figure: a state space with states A, B, C, D in which each state can be reached along several paths; in the corresponding search tree, B is generated twice and C four times.]
Avoiding Repeated States (cont’d)
• In this case the state space is a graph.
• A solution is to avoid generating any state that was generated
before. This can be enforced by keeping a list of the generated
states called the closed list. In this case the fringe of
unexpanded nodes is called the open list.
The closed list can be implemented by a hash-table for retrieval
in constant time. However, there is no easy way to avoid the
space penalty!
The General Graph Search Algorithm
function GraphSearch(problem,QueuingFn)
returns a solution, or failure
closed← an empty set
fringe← MakeQueue(MakeNode(InitialState[problem]))
loop do
if fringe is empty then return failure
node← RemoveFront(fringe)
if GoalTest[problem] applied to State[node] succeeds then
return node
if State[node] is not in closed then
add State[node] to closed
fringe← QueuingFn(fringe,Expand(node, problem))
end
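The GraphSearch idea is a small change to the tree-search loop: a closed set backed by a hash set, so membership tests are constant time. A sketch, shown here with a BFS fringe (the cyclic toy graph is our own):

```python
from collections import deque

# BFS with repeated-state elimination, following the GraphSearch pseudocode.
def graph_search_bfs(initial, successors, is_goal):
    closed = set()                              # states already expanded
    fringe = deque([(initial, [initial])])      # the open list
    while fringe:
        state, path = fringe.popleft()
        if is_goal(state):
            return path
        if state not in closed:
            closed.add(state)
            fringe.extend((s, path + [s]) for s in successors(state))
    return None

# A cyclic state space; plain tree search would loop forever here.
graph = {"A": ["B"], "B": ["A", "C"], "C": []}
print(graph_search_bfs("A", lambda s: graph[s], lambda s: s == "C"))
# ['A', 'B', 'C']
```

The closed set is exactly the space penalty mentioned above: in the worst case it grows to hold every reachable state.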
Summary
• Agents, Goal-Based Agents, Problem-Solving Agents
• Search Problems
• Blind Search Strategies
Readings: Chapter 3, Sections 3.1-3.5 of AIMA
Informed (or Heuristic) Search Methods
• Heuristics
• Best-first search
• The algorithm A∗
• Properties of heuristic functions
• Branch-and-bound
Heuristics
All blind search algorithms that we discussed have time complexity
of order O(b^d) or something similar. This is unacceptable in real
problems!
In large search spaces, one can do a lot better by using
domain-specific information to speed-up search.
Heuristics are “rules of thumb” for selecting the next node to be
expanded by a search algorithm.
Best-First Search
A blind search algorithm could be improved if we knew the best (or
“seemingly best”) node to expand.
function BestFirstSearch(problem,EvalFn)
returns a solution sequence
QueuingFn← a function that orders nodes in
ascending order of EvalFn
return TreeSearch(problem,QueuingFn)
The function EvalFn is called the evaluation function.
Note: GraphSearch can be used instead of TreeSearch.
Evaluation Functions and Heuristic Functions
There is a whole family of best-first search algorithms with
different evaluation functions.
A key component of many of these algorithms is a heuristic
function h such that
h(n) = estimated cost of the cheapest path from the state
at node n to a goal state
h can be any function such that h(n) = 0 if n is a goal node. But in
order to find a good heuristic function, we need domain specific
information.
Greedy Best-First Search
GreedyBestFirstSearch tries to expand the node that is
closest to the goal, on the grounds that this is likely to lead to a
solution quickly. Thus nodes are evaluated using the heuristic
function h, i.e., f(n) = h(n).
function GreedyBestFirstSearch(problem)
returns a solution or failure
return BestFirstSearch(problem, h)
The algorithm is greedy because it prefers to take the biggest
possible bite out of the remaining cost to reach the goal.
Example - On the Road to Bucharest
[Figure: road map of Romania with distances in km, including Arad-Zerind 75, Arad-Sibiu 140, Arad-Timisoara 118, Sibiu-Oradea 151, Sibiu-Fagaras 99, Sibiu-Rimnicu Vilcea 80, Rimnicu Vilcea-Pitesti 97, Pitesti-Bucharest 101 and Fagaras-Bucharest 211.]
Example (cont’d)
hSLD(n) = the straight-line distance between n and the goal location.
Distances to Bucharest are shown below:
Arad            366     Mehadia         241
Bucharest         0     Neamt           234
Craiova         160     Oradea          380
Dobreta         242     Pitesti         100
Eforie          161     Rimnicu Vilcea  193
Fagaras         176     Sibiu           253
Giurgiu          77     Timisoara       329
Hirsova         151     Urziceni         80
Iasi            226     Vaslui          199
Lugoj           244     Zerind          374
Example (cont’d)
(a) The initial state: Arad (366).
(b) After expanding Arad: Sibiu (253), Timisoara (329), Zerind (374).
(c) After expanding Sibiu: Arad (366), Fagaras (176), Oradea (380), Rimnicu Vilcea (193).
(d) After expanding Fagaras: Sibiu (253), Bucharest (0).
[Figure: the corresponding search trees, with the hSLD value at each node.]
Greedy Best-First Search (cont’d)
Evaluation:
• Complete? No (consider the problem of getting from Iasi to
Fagaras; search can oscillate between Iasi and Neamt).
• Time: O(b^m) where m is the maximum depth of the search
space.
• Space: O(b^m)
• Optimal? No (greedy search follows Arad-Sibiu-Fagaras-Bucharest,
although Arad-Sibiu-Rimnicu Vilcea-Pitesti-Bucharest is the
optimal path).
A good choice of h can reduce space and time substantially.
The A∗ Search Algorithm
Greedy Best-First Search:
• Searches by minimizing the estimated cost h(n) to the goal
• Neither optimal nor complete
Uniform Cost Search:
• Searches by minimizing the cost g(n) of the path so far
• Optimal, complete
Can we combine the two algorithms?
The A∗ Algorithm (cont’d)
A∗ is a best-first search algorithm with evaluation function
f(n) = g(n) + h(n).
In this case f(n) is the estimated cost of the cheapest solution
through n.
function A∗Search(problem) returns a solution or failure
return BestFirstSearch(problem, g+ h)
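The whole algorithm fits in a few lines (a sketch with a closed set; the road fragment and straight-line distances are taken from the AIMA figures reproduced on the surrounding slides).

```python
import heapq

# A*: best-first search with f = g + h; the fringe is ordered by f.
def a_star(start, goal, step_costs, h):
    fringe = [(h(start), 0, start, [start])]    # entries are (f, g, state, path)
    closed = set()
    while fringe:
        f, g, state, path = heapq.heappop(fringe)
        if state == goal:
            return g, path
        if state in closed:
            continue                            # a cheaper path got here first
        closed.add(state)
        for nxt, c in step_costs.get(state, {}).items():
            heapq.heappush(fringe, (g + c + h(nxt), g + c, nxt, path + [nxt]))
    return None

roads = {
    "Arad": {"Sibiu": 140, "Timisoara": 118, "Zerind": 75},
    "Sibiu": {"Arad": 140, "Fagaras": 99, "Oradea": 151, "Rimnicu Vilcea": 80},
    "Rimnicu Vilcea": {"Sibiu": 80, "Pitesti": 97, "Craiova": 146},
    "Fagaras": {"Sibiu": 99, "Bucharest": 211},
    "Pitesti": {"Rimnicu Vilcea": 97, "Bucharest": 101, "Craiova": 138},
    "Craiova": {"Rimnicu Vilcea": 146, "Pitesti": 138},
}
h_sld = {"Arad": 366, "Sibiu": 253, "Timisoara": 329, "Zerind": 374,
         "Oradea": 380, "Fagaras": 176, "Rimnicu Vilcea": 193,
         "Pitesti": 100, "Craiova": 160, "Bucharest": 0}

print(a_star("Arad", "Bucharest", roads, h_sld.get))
# (418, ['Arad', 'Sibiu', 'Rimnicu Vilcea', 'Pitesti', 'Bucharest'])
```

Because hSLD is consistent, discarding repeated states via the closed set is safe here, as discussed later in this lecture.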
A∗ Goes to Bucharest
See illustration in accompanying file astar-progress.ps or in
AIMA.
The A∗ Algorithm (cont’d)
Let us assume that A∗ uses TreeSearch as its main subroutine
and also:
• The function h is chosen such that it never overestimates the
cost to reach a goal. Such an h is called an admissible
heuristic.
If h is admissible then f(n) never overestimates the actual cost
of the best solution through n.
• The branching factor b is finite.
• Every action costs at least δ > 0.
The A∗ Algorithm (cont’d)
Evaluation (under the previous assumptions):
• Complete? Yes.
• Time: exponential, unless the error in the heuristic function h
grows no faster than the logarithm of the actual path cost.
For most heuristics used in practice, the error is at least
proportional to the path cost.
But even when A∗ takes exponential time, it offers a huge
improvement compared to blind search.
The A∗ Algorithm (cont’d)
Evaluation (cont’d):
• Space: O(b^d).
This is the main drawback of A∗. The algorithm IDA∗
addresses the large space requirements of A∗.
• Optimal? Yes.
Optimality and Completeness of A∗
Proposition. A∗ is optimal.
Proof: Let us assume that the cost of the optimal solution is C∗
and a suboptimal goal node G2 appears on the fringe. Then
because G2 is suboptimal and h(G2) = 0, we have:
f(G2) = g(G2) + h(G2) = g(G2) > C∗
Now consider a fringe node n which is on the optimal path.
Because h does not overestimate the cost to the goal, we have:
f(n) = g(n) + h(n) ≤ C∗
So G2 will not be chosen for expansion!
Optimality and Completeness of A∗ (cont’d)
The proof of optimality breaks down when A∗ uses GraphSearch
as its main subroutine because GraphSearch can discard the
optimal path to a repeated state if it is not the first one to be
generated.
To guarantee optimality in this case, we have two options:
• Modify GraphSearch so that, when two paths reach the same
state, it keeps the cheaper path and discards the more
expensive one.
• Impose an extra requirement of consistency or monotonicity
on h.
Consistent Heuristics
Definition. A heuristic h is called consistent if for all nodes n, n′
such that n′ is a successor of n generated by any action a,
h(n) ≤ c(n, a, n′) + h(n′).
This is a form of the general triangle inequality: the sum of the
lengths of any two sides of a triangle is greater than the length of
the remaining side.
Proposition. Every consistent heuristic is also admissible.
Most admissible heuristics that one can think of are also consistent
(e.g., hSLD)!
Optimality and Completeness of A∗ (cont’d)
Proposition. If h is consistent then the values of f for nodes
expanded by A∗ along any path are non-decreasing.
Proof: Let n be a node and n′ its successor. Then
g(n′) = g(n) + c(n, a, n′)
for some action a, and we have
f(n′) = g(n′)+h(n′) = g(n)+c(n, a, n′)+h(n′) ≥ g(n)+h(n) = f(n).
Thus we can conceptually draw contours in the state space like
contours in a topographic map.
The behaviour of A∗
[Figure: the Romania map with contours of f = 380, 400 and 420 drawn around Arad. With a consistent heuristic, A* expands nodes in bands of non-decreasing f.]
Optimality and Completeness of A∗ (cont’d)
A∗ search is complete: as we add contours of increasing f , we
must eventually reach a contour where f is equal to the cost of the
path to a goal state.
In fact, A∗ works as follows:
• It expands all nodes with f(n) < C∗
• It may then expand some of the nodes right on the “goal
contour”, for which f(n) = C∗, before selecting a goal node.
Optimality and Completeness of A∗ (cont’d)
A∗ expands no nodes with cost f(n) > C∗ where C∗ is the cost of
the optimal solution.
There is no other optimal algorithm that is guaranteed to expand
fewer nodes than A∗.
Heuristic Functions
What is a good heuristic for the 8-puzzle problem?
[Figure: a start state and the goal state of the 8-puzzle: tiles 1-8 and a blank arranged on a 3x3 board.]
The 8-puzzle Problem
Formal specification:
• States: a state description specifies the location of each tile and
blank
• Actions: blank moves L, R, U, D
• Goal state
• Path cost: length of the path
Heuristic Functions (cont’d)
Heuristics for the 8-puzzle:
• h1 = the number of tiles in the wrong position
• h2 = the sum of the horizontal and vertical distances of all tiles
from their goal positions (Manhattan distance).
Both heuristics are admissible. Which one is better?
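Both heuristics are short functions. In the sketch below a state is a tuple of length 9 listing the tile in each cell, row by row, with 0 for the blank; the particular goal layout is our assumption, since the slides fix it only in a figure.

```python
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)   # assumed goal layout, blank first

def h1(state):
    """Number of tiles (not counting the blank) out of place."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def h2(state):
    """Sum of Manhattan distances of the tiles from their goal cells."""
    total = 0
    for i, t in enumerate(state):
        if t != 0:
            gi = GOAL.index(t)       # where tile t belongs
            total += abs(i // 3 - gi // 3) + abs(i % 3 - gi % 3)
    return total

state = (1, 0, 2, 3, 4, 5, 6, 7, 8)  # blank and tile 1 swapped
print(h1(state), h2(state))  # 1 1
```

Since each misplaced tile is at Manhattan distance at least 1 from its goal cell, h2(n) ≥ h1(n) for every state, which previews the answer to the question above.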
Heuristic Functions (cont’d)
A way of characterizing the quality of a heuristic is to find its
effective branching factor b∗.
If the total number of nodes expanded by A∗ for a particular
problem is N, and the solution depth is d, then
N = 1 + b* + (b*)^2 + · · · + (b*)^d.
Usually, b* is fairly constant over a large number of instances. A
well-designed heuristic has a value of b* close to 1.
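Since the sum is increasing in b*, the equation can be solved numerically, e.g. by bisection (a sketch; the sample values come from the comparison table that follows):

```python
# Solve N = 1 + b* + (b*)^2 + ... + (b*)^d for b* by bisection on [1, 10].
def effective_branching_factor(n_nodes, depth):
    lo, hi = 1.0, 10.0
    for _ in range(100):                 # ample precision
        mid = (lo + hi) / 2
        if sum(mid**i for i in range(depth + 1)) < n_nodes:
            lo = mid
        else:
            hi = mid
    return lo

# One table row: A* with h2 at depth 4 expanded 12 nodes.
print(round(effective_branching_factor(12, 4), 2))  # 1.45
```

The same computation reproduces the other entries, e.g. 10 nodes at depth 2 gives b* ≈ 1.79.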
Comparing A∗ and IDS
              Search Cost               Effective Branching Factor
 d       IDS   A*(h1)  A*(h2)          IDS   A*(h1)  A*(h2)
 2        10        6       6         2.45     1.79    1.79
 4       112       13      12         2.87     1.48    1.45
 6       680       20      18         2.73     1.34    1.30
 8      6384       39      25         2.80     1.33    1.24
10     47127       93      39         2.79     1.38    1.22
12    364404      227      73         2.78     1.42    1.24
14   3473941      539     113         2.83     1.44    1.23
16         –     1301     211            –     1.45    1.25
18         –     3056     363            –     1.46    1.26
20         –     7276     676            –     1.47    1.27
22         –    18094    1219            –     1.48    1.28
24         –    39135    1641            –     1.48    1.26
Heuristic Functions (cont’d)
If h2(n) ≥ h1(n) for all nodes n, then we say that h2 dominates h1
(or that h2 is more informed than h1).
Example: In the 8-puzzle h2 dominates h1.
Theorem: If h2 dominates h1 then A∗ using h2 will expand fewer
nodes, on average, than A∗ using h1.
Lesson: It is always better to use an admissible heuristic
function with higher values.
Heuristic Functions: How do we find them?
It is possible to find heuristic functions by considering relaxed
versions of the given problem.
The cost of an optimal solution to a relaxed problem is an
admissible heuristic for the original problem.
Relaxed problems can sometimes be generated automatically and
then heuristics can be discovered automatically! Otherwise, we
have to consider the problem at hand carefully and use our brain!
Heuristic Functions: How do we find them?
If we have admissible heuristics h1, . . . , hn such that no one
dominates any of the others then we can choose
h = max(h1, . . . , hn).
Final note: The cost of computing the heuristic function for each
node must be taken into account.
Extensions of A∗
The main problem with A∗ is its excessive use of memory for large
problems. Several algorithms have been invented to tackle this
problem: IDA∗, RBFS, MA∗, SMA∗. See AIMA for more details.
Branch-and-Bound
Another class of traditional intelligent search algorithms, pioneered
in the Operations Research community, is
branch-and-bound.
Branch-and-bound has been designed for optimization
problems. The main idea is to eliminate parts of the search
space where we know that a solution cannot be found.
In Operations Research courses, branch-and-bound is usually
presented in the context of solving integer linear
programming problems. We will present a general formulation of
branch-and-bound and examples from the following book:
Christos Papadimitriou and Kenneth H. Steiglitz.
Combinatorial Optimization - Algorithms and Complexity.
Prentice-Hall, 1982.
Branch-and-Bound (cont’d)
In branch-and-bound the search space is organized as a tree with
the following two features:
• Branching or partitioning. Each node represents a set of
solutions which can be partitioned into mutually exclusive sets.
Each subset in the partition is represented by a child of the
node.
• Lower bounding. There is an algorithm for computing a
lower-bound on the cost of each solution in a given subset (i.e.,
obtained as a child of a node). Actually, we need a
lower-bound if we are minimizing but an upper bound if
we are maximizing.
Branch-and-Bound (cont’d)
The tree can be searched in any way we choose (e.g., DFS or BFS).
However, if we have already found a solution with cost c by
traversing the tree, and we are at a node with lower bound ≥ c,
then we can safely ignore (prune) this branch of the tree and
carry on our search with another branch.
This is the step in branch-and-bound where heuristic knowledge
about the problem domain is used.
Notice the differences with the Artificial Intelligence terminology
we have used so far:
• Partition the current solution set – refine – branch (OR).
• Extend a partial solution – create – generate-and-test (AI).
Branch-and-Bound (cont’d)
algorithm BranchAndBound(problem)
activeset := {problem}
U :=∞; currentbest := anything
while activeset is not empty do
choose a branching node k ∈ activeset
remove node k from activeset
generate the children of node k: child i, i = 1, . . . , nk,
and the corresponding lower bounds zi
for i = 1, . . . , nk do
if zi ≥ U then kill child i
else if child i is a complete solution and zi < U then
U := zi; currentbest := i
else add child i to activeset
end
end
Example
The shortest-path problem for directed weighted graphs.
Definition. Let G = (V,E) be a directed graph with non-negative
weight cj ≥ 0 associated with each arc ej ∈ E. The shortest-path
problem is to find a directed path from a distinguished source
node s to distinguished terminal node t, with the minimum total
weight.
Note that this is just an example. Dijkstra's algorithm for the
shortest-path problem is more efficient than branch-and-bound and
runs in time O(n^2), where n is the number of nodes in the graph.
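As an illustration, here is a sketch of branch-and-bound specialised to shortest paths, following the algorithm above (the graph G is a small made-up example, not the one in the figure):

```python
import heapq

def bnb_shortest_path(graph, s, t):
    """Branch-and-bound shortest path.

    Each active node is a partial path from s; its lower bound is the
    cost accumulated so far (arc weights are non-negative, so extending
    a path can only add cost).  A branch is pruned when its bound is no
    better than the best complete s-t path found so far.
    """
    best_cost, best_path = float("inf"), None
    # Always choosing the active node with the smallest bound makes this
    # behave like uniform-cost search, as noted on a later slide.
    active = [(0, [s])]
    while active:
        bound, path = heapq.heappop(active)
        if bound >= best_cost:           # prune: cannot beat incumbent
            continue
        node = path[-1]
        if node == t:
            best_cost, best_path = bound, path
            continue
        for succ, w in graph.get(node, []):
            if succ not in path:         # branch: extend by one more arc
                heapq.heappush(active, (bound + w, path + [succ]))
    return best_cost, best_path

# A small made-up graph (not the one in the figure).
G = {"s": [("a", 2), ("b", 4)],
     "a": [("b", 1), ("t", 7)],
     "b": [("t", 3)]}
print(bnb_shortest_path(G, "s", "t"))
```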
Example Graph
[Figure: a directed weighted graph with source node s and terminal
node t; the intermediate nodes a–r are connected by arcs with weights
between 1 and 9.]
Applying Branch-and-Bound
[Figure: the branch-and-bound search tree for the example graph,
rooted at s. Each node is labelled with the graph node reached and the
lower bound, i.e. the length of the corresponding partial path from s.]
The Search Tree with Pruned Branches Shown
[Figure: the same search tree with the pruned branches shown:
branches whose lower bound is at least the cost of the best s–t path
already found are cut off.]
Applying Branch-and-Bound (cont’d)
At each node in the search tree of this example the following is true:
• Branching is determined by considering which arc to choose
to continue the path. I.e., a subset of the feasible solutions
corresponds to all paths from s to t that start by the choices
already made.
• The lower bound used is the cumulative length of the partial
path up to the current node.
Note: if we always choose as branching node the one with the
shortest partial path, we have an algorithm similar to UCS (or A∗
with h = 0).
Example: k-way Number Partitioning (kNUMP)
Definition. Let S be a finite bag (multi-set) of positive integers.
Partition S into k bags A1, . . . , Ak ⊆ S so as to minimize the
following difference:
∆(A1, . . . , Ak) = max_i ∑_{x∈Ai} x − min_i ∑_{x∈Ai} x
Let us deal with 2NUMP for simplicity.
Example: How do we partition the bag of numbers {8, 7, 6, 5, 4}
into two bags so that the difference of the sums of the numbers in
the two bags is minimized?
A Greedy Algorithm
Order the given numbers in descending order. Initially, the two
subsets are empty. Then, repeatedly take the next input number
and assign it to the subset with the smallest sum so far.
For the input set {8, 7, 6, 5, 4}, this greedy algorithm will return the
subsets
{8, 5} and {7, 6, 4}
with difference 4 which is not optimal. The optimal difference is 0.
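A sketch of this greedy algorithm in Python (depending on how ties are broken, the returned subsets may differ from the ones above, but the resulting difference is the same):

```python
def greedy_partition(numbers):
    """Greedy 2-way partitioning: take the numbers in descending order
    and assign each one to the subset with the smaller current sum."""
    a, b = [], []
    for x in sorted(numbers, reverse=True):
        (a if sum(a) <= sum(b) else b).append(x)
    return a, b

a, b = greedy_partition([8, 7, 6, 5, 4])
print(a, b, abs(sum(a) - sum(b)))   # difference 4, as in the example
```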
The Search Tree
[Figure: the search tree for 2NUMP on {8, 7, 6, 5, 4}. The root {}{}
branches on where to put 8. For example, the path
{8}{} → {8}{7} → {8}{7,6} → {8,5}{7,6} → {8,5,4}{7,6} leads to the
greedy partition, while {8,7}{} → {8,7}{6} → {8,7}{6,5} →
{8,7}{6,5,4} leads to the optimal one.]
Executing the Search Algorithm
• We can search the tree in a DFS fashion.
• We can order the search by always trying first the branch
where the next number is put in the smallest subset.
• This search strategy will actually return the greedy solution
first. This makes it an anytime algorithm: it starts with the
greedy solution and, given more time, improves it until it
finds (and proves) the optimal solution.
Pruning the Search Tree by Branch and Bound
• Let us assume that we have already found a solution with
difference d and we are at a search tree node n with current
subset sum difference d′, sum of remaining numbers s and
d′ > s.
If d′ − s ≥ d then there is no need to explore the tree below n,
because no completion can achieve a difference smaller than d′ − s,
which is no better than d.
If d′ − s < d, we can add all the remaining numbers to the smaller
sum; this immediately yields a better solution than the current one.
Pruning (cont’d)
• If at any point in the search we find a perfect partition then we
terminate the search.
A partition will be called perfect if it gives difference 0 when
the sum of the given numbers is even and 1 if the sum of the
given numbers is odd.
The difference corresponding to a perfect partition is a
lower-bound on any other possible difference.
Pruning (cont’d)
• The first number should be assigned only to one subset.
• The last number should only be assigned to the subset with the smaller sum.
• When the two current subsets have equal sums, the next
number should only be assigned to one subset.
Note: The above algorithm does not correspond exactly to the
algorithm BranchAndBound as given earlier. Thus
branch-and-bound should be understood to be a family of
algorithms with the features we presented (as opposed to a single
fixed algorithm).
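Putting the bound and the pruning rules together, here is a sketch of one member of this family for 2NUMP (the symmetry-breaking rule for the first number is included; the other two pruning rules above are omitted for brevity):

```python
def partition_bnb(numbers):
    """Branch-and-bound for 2-way number partitioning (2NUMP).

    DFS over assignments of the numbers (largest first) to two subsets.
    A node with subset-sum difference d' and remaining sum s is pruned
    when d' - s >= best difference found so far, since the remaining
    numbers cannot close the gap by more than s.
    """
    xs = sorted(numbers, reverse=True)
    total = sum(xs)
    perfect = total % 2            # perfect difference: 0 if even, 1 if odd
    best = {"diff": float("inf")}

    def dfs(i, s1, s2):
        if best["diff"] == perfect:        # perfect partition found: stop
            return
        rest = total - s1 - s2
        d = abs(s1 - s2)
        if d - rest >= best["diff"]:       # bound: prune this branch
            return
        if i == len(xs):
            best["diff"] = d
            return
        # Symmetry breaking: the first number goes to subset 1 only.
        dfs(i + 1, s1 + xs[i], s2)
        if i > 0:
            dfs(i + 1, s1, s2 + xs[i])

    dfs(0, 0, 0)
    return best["diff"]

print(partition_bnb([8, 7, 6, 5, 4]))
```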
Readings
• Chapter 4 of AIMA (Sections 4.1 and 4.2).
• Section 18.2 from
Christos Papadimitriou and Kenneth H. Steiglitz.
Combinatorial Optimization - Algorithms and
Complexity. Prentice-Hall, 1982.
Local Search and Optimization Problems
• Hill-climbing
• Simulated annealing
• Local beam search
• Genetic Algorithms
Local Search Algorithms
In many optimization problems the path to a goal state is
irrelevant. The goal state itself is the solution.
Example:
• Finding a configuration satisfying certain constraints, e.g., the
8-queens problem or a job-shop scheduling problem.
In such cases, we can use iterative improvement: start with a
single current state, and try to improve it!
The same framework is applicable to problems where the path
appears to be of interest (e.g., TSP), if these problems can be
cast in a more appropriate (but equivalent) way.
Local Search Algorithms (cont’d)
Local search algorithms work as follows:
• Pick a “solution” from the search space and evaluate it. Define
this as the current solution.
• Apply a transformation to the current solution to generate and
evaluate a new solution.
• If the new solution is better than the current solution, then
make it the current solution; otherwise discard the
new solution.
• Repeat steps 2 and 3 until no transformation in the given set
improves the current solution.
Local Search Algorithms (cont’d)
Thus local search algorithms operate using a single current state
(rather than multiple paths as e.g., A∗) and generally move only to
neighbours of that state.
At each step of a local search algorithm we have a complete but
imperfect solution to a search problem. Other algorithms we saw
previously (e.g., A∗) work with partial solutions and extend
them to complete ones.
Good properties of local search algorithms:
• Constant space
• Suitable for on-line as well as off-line problems.
• Can find reasonable solutions in large solution spaces
where exhaustive search would fail miserably.
Iterative Improvement Algorithms
Idea: Start with a “solution” and make modifications until you
reach a solution. Graphically:
[Figure: an evaluation curve over states with the current state
marked; modifications move the current state towards higher
evaluation.]
Example: TSP
TSP: Let G be a (directed or undirected) graph of n nodes with
each edge assigned a non-negative cost. Find the lowest-cost tour
of G that visits each node exactly once and returns to a given initial
node.
Algorithm 2-Opt
Let us view the solution set for TSP as the set of permutations of
the n cities.
Algorithm:
Start with an arbitrary complete tour T (i.e., a random
permutation).
The neighbourhood of T is defined as the set of all tours that can
be reached by exchanging two non-adjacent edges (this move is
called a 2-interchange).
Search in the neighbourhood of T for a new tour T ′. If this tour is
better than T (i.e., it has lower cost), then replace T with T ′.
If you cannot find a better tour, terminate. The resulting
permutation is called 2-optimal.
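A sketch of 2-opt in Python; the segment reversal implements the 2-interchange move. The distance matrix D is a made-up 4-city example:

```python
import random

def tour_cost(tour, dist):
    """Total cost of a closed tour over the distance matrix."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def two_opt(dist, seed=0):
    """Local search for TSP: repeatedly reverse a segment of the tour
    (a 2-interchange) while doing so lowers the cost; the final tour
    is 2-optimal.  Recomputing the full cost each time is wasteful but
    keeps the sketch short."""
    rng = random.Random(seed)
    n = len(dist)
    tour = list(range(n))
    rng.shuffle(tour)                    # arbitrary starting permutation
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                new = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                if tour_cost(new, dist) < tour_cost(tour, dist):
                    tour, improved = new, True
    return tour, tour_cost(tour, dist)

# Symmetric distance matrix for four made-up cities.
D = [[0, 1, 4, 2],
     [1, 0, 1, 5],
     [4, 1, 0, 1],
     [2, 5, 1, 0]]
print(two_opt(D))
```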
Example: 4-Queens
Idea: Start with 4 queens placed arbitrarily on the board. Then repeatedly
move a single queen to another square within its column.
The State Space Landscape
[Figure: a one-dimensional state-space landscape: the objective
function plotted over the state space, with the current state, a
shoulder, a "flat" local maximum, a local maximum and the global
maximum marked.]
Hill-Climbing Search (Gradient Steepest Ascent)
function Hill-Climbing(problem)
returns a state that is a local maximum
inputs: problem, a problem
local variables: current, a node
neighbour, a node
current← MakeNode(RandomState[problem])
loop do
neighbour ← a highest-valued successor of current
if Value[neighbour] ≤ Value[current] then return current
current← neighbour
end
Hill-Climbing Search (cont’d)
• Successors are searched in a systematic way.
• When choosing a highest-valued successor, break ties randomly.
• Change “highest-valued” to “lowest-valued” and ≤ to ≥ to get
“gradient steepest descent” (applicable to minimization
problems).
Example: 8-queens
Formal specification:
• States: any arrangement of 8 queens on board
• Actions: Move a queen within its column
• Goal test: 8 queens on board, none attacked
• Evaluation function (cost): Number of pairs of queens that
attack each other.
Thus we have a minimization problem: find a state that
minimizes the evaluation function.
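The minimization version of hill-climbing for this formulation can be sketched as follows (the state representation and helper names are my own; ties are broken deterministically rather than randomly, a simplification):

```python
import random

def conflicts(state):
    """Number of attacking queen pairs; state[i] = row of the queen in
    column i (same row, or same diagonal, means an attack)."""
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if state[i] == state[j]
               or abs(state[i] - state[j]) == j - i)

def hill_climb(n=8, seed=0):
    """Steepest-descent hill-climbing: move one queen within its column
    to the neighbour with the fewest conflicts; stop at a local
    (possibly global) minimum."""
    rng = random.Random(seed)
    state = [rng.randrange(n) for _ in range(n)]
    while True:
        neighbours = [state[:col] + [row] + state[col + 1:]
                      for col in range(n) for row in range(n)
                      if row != state[col]]
        best = min(neighbours, key=conflicts)
        if conflicts(best) >= conflicts(state):
            return state, conflicts(state)
        state = best

solution, cost = hill_climb()
print(cost)   # 0 if this run reached a solution, otherwise the cost
              # of the local minimum it got stuck in
```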
Example: 8-queens (cont’d)
[Figure: an 8-queens board; each square is labelled with the value the
cost function would take if the queen in that column were moved there.
The labels range from 12 to 18.]
The value of cost for the above state is 17. The numbers in the squares
show the new costs if a queen is moved within its column.
Hill-Climbing (cont’d)
Problems:
• Local optima
• Plateaux (flat local optima or shoulders)
• Ridges
How can we cope with these problems? The proper choice might be
problem dependent.
Example: 8-queens (cont’d)
The value of cost for the above state is 1. All the neighbours of this
state have cost > 1 thus we have a local minimum.
Hill-Climbing (cont’d)
[Figure: the state-space landscape again, showing the local and global
maxima, the "flat" local maximum and the shoulder.]
Example: Ridges
Hill-Climbing for 8-queens
Let us start with a randomly generated 8-queens state. Then,
steepest ascent hill-climbing performs as follows:
• It solves 14% of the problems within 4 steps on average.
• It gets stuck in local optima 86% of the time within 3 steps on
average.
Reminder: Total state space 8^8 ≈ 17 million states.
Sideways Moves
When hill-climbing reaches a plateau and there are no uphill moves
then it stops. Alternatively, we could resort to a sideways move:
a move to a state which has the same value as the current one.
However, we have to be careful so that we do not go into an infinite
loop (i.e., when we are on a plateau that is not a shoulder). An
idea that works in some cases is to limit the number of consecutive
sideways moves.
Example: If we limit the number of consecutive sideways moves to
100 in the 8-queens problem, this raises the percentage of solved
problems to 94%.
Variations of Hill-Climbing
• First-choice hill climbing: Generates successors randomly
until one is generated that is better than the current state.
This is a good strategy for states with many (e.g., thousands)
of successors.
• Stochastic hill climbing: Chooses randomly among the
uphill moves. The probability of selection can vary with
steepness.
This variation converges more slowly than steepest ascent, but
in some state landscapes it finds better solutions.
How do we Avoid Local Optima?
We will present two algorithms that avoid local optima:
• Random-Restart Hill-Climbing
• Simulated Annealing
Random-Restart Hill-Climbing
Advice: If at first you don’t succeed, try, try again!
Random-restart hill-climbing conducts a series of hill-climbing
searches from randomly generated initial states, stopping when a
goal is found.
With probability approaching 1, we will eventually generate a goal
state as the initial state.
If each hill-climbing search has a probability p of success, then the
expected number of restarts required to reach a solution is 1/p.
Random-Restart Hill-Climbing (cont’d)
Example: 8-queens
p ≈ 0.14
In this case we need roughly 7 iterations (6 failures and 1 success).
Expected number of steps: the number of steps of one successful
iteration plus (1 − p)/p times the number of steps of a failed
iteration, i.e., roughly 4 + (0.86/0.14) × 3 ≈ 22 steps in our case.
The success of random-restart hill-climbing depends very much on
the shape of the state space (there are practical problems with
state spaces with very bad shapes).
Simulated Annealing
A hill-climbing algorithm that never makes “downhill” moves
towards states with lower value can be incomplete.
A random walk, i.e., moving to a successor chosen uniformly at
random from the set of successors, is complete (proof?) but
extremely inefficient.
How can we combine both?
This is a classical tradeoff between exploration of the search space
and exploitation of the imperfect solution at hand. How do we
resolve this tradeoff?
Simulated Annealing (cont’d)
Physical analogue: Annealing of metals is the process used to
temper or harden metals by heating them to a high temperature and
then gradually cooling them, thus allowing the material to coalesce into
a low-energy crystalline state.
The discovery of the simulated annealing algorithm is an instance of the
use of ideas from statistical mechanics (an area of condensed matter
physics) to solving large and complex optimization problems.
Statistical mechanics concentrates on analyzing aggregate properties of
large numbers of atoms to be found in samples of liquid or solid matter.
See the paper on simulated annealing by Kirkpatrick et al. in Science,
Volume 220, Number 4598, May 1983.
Statistical Mechanics and Optimization
Physical System Optimization Problem
state feasible solution
energy evaluation function
ground state optimal solution
quenching local search
temperature control parameter T
careful annealing simulated annealing
Simulated Annealing (cont’d)
Simulated annealing resolves the tradeoff between exploration and
exploitation as follows.
At every iteration, a random move is chosen. If it improves the
situation then the move is accepted, otherwise it is accepted with some
probability less than 1.
The probability decreases exponentially with the badness of the move.
It also decreases with respect to a temperature parameter T .
Simulated annealing starts with a high value of T and then T is
gradually reduced. At high values of T , simulated annealing is like pure
random search. Towards the end of the algorithm when the values of T
are quite small, simulated annealing resembles ordinary hill-climbing.
Simulated Annealing (cont’d)
function Simulated-Annealing(problem, schedule)
returns a solution state
inputs: problem, a problem
schedule, a mapping from time to “temperature”
local variables: current, a node
                 next, a node
                 T, the temperature
current← MakeNode(RandomState[problem])
for t← 1 to ∞ do
T ← schedule[t]
if T = 0 then return current
next← a randomly selected successor of current
∆E ← Value[next]− Value[current]
if ∆E > 0 then current← next
else current← next only with probability e∆E/T
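A direct Python transcription of this pseudocode; the toy objective, successor set and cooling schedule below are my own choices for illustration:

```python
import math
import random

def simulated_annealing(value, random_state, successors, schedule, seed=0):
    """Simulated annealing as in the pseudocode: always accept improving
    moves; accept a worsening move with probability exp(dE / T)."""
    rng = random.Random(seed)
    current = random_state(rng)
    t = 1
    while True:
        T = schedule(t)
        if T <= 0:
            return current
        nxt = rng.choice(successors(current))
        dE = value(nxt) - value(current)
        if dE > 0 or rng.random() < math.exp(dE / T):
            current = nxt
        t += 1

# Toy maximisation problem (made up): maximise -(x - 3)^2 over the
# integers, with successors x - 1 and x + 1 and geometric cooling.
result = simulated_annealing(
    value=lambda x: -(x - 3) ** 2,
    random_state=lambda rng: rng.randrange(-50, 50),
    successors=lambda x: [x - 1, x + 1],
    schedule=lambda t: 10.0 * 0.95 ** t if t < 2000 else 0.0,
)
print(result)
```

At high T the acceptance probability is close to 1 (random walk); as T decays the algorithm behaves like hill-climbing and settles near the optimum x = 3.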
Example
Let us assume that the current and next point in a search space differ by 13
(i.e., ∆E = −13). Then:
T        e^(∆E/T)
1        0.000002
5        0.0743
10       0.2725
20       0.52
50       0.77
10^10    0.9999...
Thus, at high values of T , simulated annealing behaves like a random walk;
at low values of T , it behaves like hill-climbing.
Simulated Annealing (cont’d)
Simulated annealing finds a global optimum with probability
approaching 1 if the schedule lowers T slowly enough.
The exact cooling schedule for T (as a function of t) is usually
problem dependent. Thus we need to experiment heavily with
every new problem at hand to see whether simulated annealing
makes a difference.
Simulated annealing is a very popular algorithm and has been
used to solve various classes of interesting optimization problems
(e.g., VLSI layout problems, job-shop scheduling problems etc.)
Local Beam Search
Idea: Why not keep more than just one state (e.g., k states) in
memory?
At each iteration, all the successors of the k states are generated. If
one of them is a solution then we halt. Otherwise k states are
selected and the process is repeated. We expect good successors to
“attract the attention”.
Diversity is important so we don’t get stuck in bad regions of the
search space.
Stochastic beam search chooses k successors at random, with
the probability of choosing a successor being an increasing function
of its value.
Similar to natural selection?
Genetic Algorithms
A genetic algorithm (GA) is a variant of stochastic beam search
in which successor states are generated by combining two parent
states (sexual reproduction).
Concepts:
• Individuals represent states. They are denoted by strings over
an alphabet usually {0, 1}.
• Populations are sets of individuals.
• Fitness function is an evaluation function for rating each
individual.
Genetic Algorithms
Operations:
• Reproduction: a new individual is born by combining two
parents.
• Mutation: a new individual is slightly modified.
Example: 8-queens
[Figure: two parent 8-queens boards combined (crossover) to produce a
child board.]
Example: 8-queens
[Figure: the genetic algorithm for 8-queens in five stages.
(a) Initial population: 24748552, 32752411, 24415124, 32543213.
(b) Fitness function: 24, 23, 20, 11, giving selection shares of
31%, 29%, 26%, 14%.
(c) Selection: pairs of parents chosen with probability proportional
to fitness.
(d) Cross-over: e.g. 24748552 and 32752411 produce 32748552 and
24752411; 32752411 and 24415124 produce 32752124 and 24415411.
(e) Mutation: single digits changed at random, e.g.
32748552 → 32748152 and 24415411 → 24415417.]
A Genetic Algorithm
function Genetic-Algorithm(population, Fitness-Fn) returns an individual
inputs: population, a set of individuals
Fitness-Fn, a function that measures the fitness of an individual
repeat
new population← ∅
loop for i from 1 to Size(population) do
x← Random-Selection(population,Fitness-Fn)
y ← Random-Selection(population,Fitness-Fn)
child← Reproduce(x, y)
if (small random probability) then child← Mutate(child)
add child to new population
population← new population
until some individual is fit enough, or enough time has elapsed
return the best individual in population, according to Fitness-Fn
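A compact Python version of this pseudocode for the 8-queens encoding used above (population size, generation count and mutation rate are arbitrary choices of mine):

```python
import random

def genetic_algorithm(population, fitness, generations=200, p_mut=0.1, seed=0):
    """Skeleton GA matching the pseudocode: fitness-proportional
    selection, one-point crossover, occasional point mutation."""
    rng = random.Random(seed)
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]
        new_pop = []
        for _ in range(len(population)):
            x = rng.choices(population, weights=weights)[0]
            y = rng.choices(population, weights=weights)[0]
            c = rng.randrange(1, len(x))            # crossover point
            child = x[:c] + y[c:]
            if rng.random() < p_mut:                # mutate one position
                i = rng.randrange(len(child))
                child = child[:i] + rng.choice("12345678") + child[i + 1:]
            new_pop.append(child)
        population = new_pop
    return max(population, key=fitness)

# Individuals are 8-digit strings: digit i is the row of the queen in
# column i.  Fitness counts non-attacking pairs (28 is a solution).
def fitness(ind):
    n = len(ind)
    attacks = sum(1 for i in range(n) for j in range(i + 1, n)
                  if ind[i] == ind[j]
                  or abs(int(ind[i]) - int(ind[j])) == j - i)
    return 28 - attacks

rng = random.Random(1)
pop = ["".join(rng.choice("12345678") for _ in range(8)) for _ in range(20)]
best = genetic_algorithm(pop, fitness)
print(best, fitness(best))
```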
Genetic Algorithms (continued)
Intuitively, the advantage of genetic algorithms comes from the
ability of crossover to combine large blocks of letters that have
evolved independently to produce useful functions.
The theory of genetic algorithms explains how this works using the
idea of a schema.
Example: 246*****
Representation of instances is very important in genetic algorithms.
Readings
• Chapter 4, Section 4.3 of AIMA.
• Parts of Sections 3 and 5 of the book:
Z. Michalewicz and D. B. Fogel. How to Solve it:
Modern Heuristics. Springer, 2000.
Constraint Satisfaction Problems
• Constraint satisfaction problems
• Backtracking algorithms for CSP
• Heuristics
• Local search for CSP
• Problem structure and difficulty of solving
Search Problems
The formalism of search problems we have discussed so far is a very
powerful formalism that depends on the notion of state.
From the point of view of a search algorithm, a state is a black
box with no discernible internal structure.
A state can be represented by an arbitrary data structure and can
be accessed only by problem specific functions: successor, goal test,
heuristics etc.
Constraint Satisfaction Problems
The framework we will now present (constraint satisfaction
problems) admits a very simple standard representation.
This allows us to define search algorithms that take advantage of
this very simple representation and use general purpose
heuristics to enable solution of large problems.
The simple structure also allows us to define methods for problem
decomposition and offers us an intimate connection between the
structure of a problem and the difficulty of solving it.
Constraint Satisfaction Problems - Definitions
A constraint satisfaction problem (CSP) is defined by:
• A set of variables X1, . . . , Xn. Each variable has a domain Di
of possible values.
• A set of constraints C1, . . . , Cm. Each constraint involves
some subset of the variables and specifies the allowable
combinations of values for that subset.
Formally, a (k-ary) constraint C on a set of variables
X1, . . . , Xk is a subset of the Cartesian product D1 × · · · × Dk.
Definitions (cont’d)
• A solution to a CSP is a complete assignment of values to
variables such that all the constraints are satisfied.
• A CSP is called consistent if it has a solution; otherwise it is
called inconsistent.
Example: Map Coloring
[Figure: the map of Australia, showing Western Australia, Northern
Territory, South Australia, Queensland, New South Wales, Victoria and
Tasmania.]
Example: Formal Definition
• Variables: WA,NT, SA,Q,NSW, V, T
• Domain (same for all variables): { red, green, blue }
• Constraints:
C(WA,NT ) = { (red, green), (red, blue), (green, red),
(blue, red), (blue, green) }
More succinctly, WA ≠ NT. Similarly for the other pairs of
variables corresponding to adjacent regions.
Constraint Graphs
[Figure: the constraint graph for the map-colouring problem: nodes
WA, NT, SA, Q, NSW, V and T, with an edge between every pair of
adjacent regions; T is isolated.]
Example: The 8-queens problem
Example: Formal Definition
• Variables:
Let variable Xi (i = 1, . . . , 8) represent the column that the
i-th queen occupies in the i-th row. If columns are represented
by numbers 1 to 8 then the domain of every variable Xi is
Di = {1, 2, . . . , 8}.
Example: Formal Definition
• Constraints:
There is a binary constraint C(Xi, Xj) for each pair of
variables. These constraints can be specified succinctly as
follows:
– For all variables Xi and Xj , Xi ≠ Xj .
– For all variables Xi and Xj , if Xi = a and Xj = b then
i − j ≠ a − b and i − j ≠ b − a.
Example: Cryptarithmetic
[Figure: (a) the cryptarithmetic puzzle TWO + TWO = FOUR; (b) its
constraint hypergraph, including the carry variables X1, X2, X3.]
Example: Formal Definition
• Variables and domains:
F, T, U,W,R,O ∈ {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
X1, X2, X3 ∈ {0, 1}
• Constraints:
alldiff(F, T, U,W,R,O)
O + O = R+ 10X1
X1 +W +W = U + 10X2
X2 + T + T = O + 10X3
X3 = F
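These constraints can be checked by brute force; summing TWO + TWO directly is equivalent to the carry equations above. A sketch (the convention that the leading digits F and T are non-zero is an extra assumption, not stated on the slide):

```python
from itertools import permutations

def solve_two_two_four():
    """Brute-force search over digit assignments for TWO + TWO = FOUR.
    Distinctness of F, T, U, W, R, O is enforced by using permutations,
    which is exactly the alldiff constraint."""
    for t, w, o, f, u, r in permutations(range(10), 6):
        if f == 0 or t == 0:              # leading digits are non-zero
            continue
        two = 100 * t + 10 * w + o
        four = 1000 * f + 100 * o + 10 * u + r
        if two + two == four:
            yield two, four

for two, four in solve_two_two_four():
    print(f"{two} + {two} = {four}")
```

One of the assignments printed is 734 + 734 = 1468.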
Constraints in Real Life
In many practical problems there are soft constraints in addition
to hard constraints. Soft constraints encode preferences.
Example: Timetabling for a University. What are the hard
constraints? What are the soft constraints?
Other Examples of CSPs
• The satisfiability problem in Boolean logic (also a CSP with
finite domains).
• Temporal reasoning (infinite domains).
• Timetabling
• Job-shop scheduling, airline crew scheduling
• Spatial reasoning
• Integer, linear and non-linear programming (operations
research).
CSP Technology: Practical, Successful and AI!
CSPs are certainly the most successful example of ideas from AI
with many applications. There are currently several companies
marketing such technology:
• www.ilog.com
• www.cosytec.com
• www.parc-technologies.com
• ...
A Taxonomy of CSPs
• Discrete vs. continuous variables
• Finite vs. infinite domains
• Explicit enumeration of allowed combinations of values vs.
constraint languages
• Linear vs. non-linear constraints
• Unary vs. binary vs. ... constraints
In the rest of this presentation, we will concentrate on search
algorithms for binary finite domain CSPs.
It is possible to reduce every higher-order finite-domain constraint
to a set of binary constraints if enough auxiliary variables are
introduced.
Search Algorithms for CSPs
Let us apply a general search algorithm to CSP:
• Initial state: all variables are unassigned.
• Actions: Assign to any unassigned variable Xi any value from
Di.
• Goal test: All variables are assigned and the constraints are
satisfied.
The branching factor in this case is ∑_{i=1}^n |Di|, where |Di| is the
cardinality of domain Di. The last level of the tree has
(∑_{i=1}^n |Di|)^n nodes.
Search Algorithms for CSPs (cont’d)
Better approach: Order the variables! CSPs are commutative
search problems i.e., the order of application of actions does not
matter.
Characteristics:
• The size of the search space is finite. If the variables are
ordered as X1, . . . , Xn, the number of nodes in the search tree is
1 + ∑_{i=1}^n (|D1| · · · |Di|).
When is the number of nodes in the search tree maximal?
Minimal?
• The depth of the search tree is fixed.
• There are similar subtrees.
Search Algorithms for CSPs (cont’d)
Which of the search algorithms presented so far is appropriate for
solving CSPs:
• BFS?
No! BFS will not be effective because goal states are located at
the leaves of the search tree.
• DFS?
Better than BFS. But it wastes time searching when
constraints are already violated.
Search Algorithms for CSPs (cont’d)
We will present variants of DFS for CSPs. These algorithms are
based on the idea of backtracking search: choose values for one
variable at a time, and backtrack when there are no more legal
values to assign.
We will see the following algorithms and their variants/hybrids:
• Simple or chronological backtracking (BT)
• Forward checking (FC)
• Backjumping (BJ)
• Conflict-directed Backjumping (CBJ)
• Maintaining Arc Consistency (MAC)
BT in Operation: Example
[Figure: the first levels of the BT search tree for map colouring: the
root branches into WA = red, WA = green, WA = blue; under WA = red the
tree branches on NT (e.g. NT = blue, NT = green), and under
WA = red, NT = green it branches into Q = red and Q = blue.]
Chronological Backtracking (BT)
The basic idea in any backtracking algorithm is to start with a partial
solution and to extend it until we reach a complete solution.
BT follows this general method. Additionally, when it reaches a dead-end,
it always backtracks to the last decision made (hence its name!).
function Backtracking-Search(csp)
returns a solution or failure
return Recursive-Backtracking({}, csp)
BT (cont’d)
function Recursive-Backtracking(assignment, csp)
returns a solution or failure
if assignment is complete then return assignment
var ← Select-Unassigned-Variable(Variables[csp], assignment, csp)
for each value in Order-Domain-Values(var, assignment, csp) do
if value is consistent with assignment according to Constraints(csp)
then
add {var = value} to assignment
result← Recursive-Backtracking(assignment, csp)
if result ≠ failure then return result
remove {var = value} from assignment
return failure
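A Python sketch of Recursive-Backtracking, specialised to inequality constraints (which is all map colouring needs); the data-structure layout is my own:

```python
def backtracking_search(variables, domains, neighbours):
    """Chronological backtracking for a binary CSP whose only
    constraints are inequalities between neighbouring variables."""
    def consistent(var, value, assignment):
        return all(assignment.get(n) != value for n in neighbours[var])

    def recurse(assignment):
        if len(assignment) == len(variables):
            return assignment
        var = next(v for v in variables if v not in assignment)  # static order
        for value in domains[var]:
            if consistent(var, value, assignment):
                assignment[var] = value
                result = recurse(assignment)
                if result is not None:
                    return result
                del assignment[var]           # undo and try the next value
        return None                           # dead-end: backtrack

    return recurse({})

# The Australia map-colouring CSP from the slides.
neighbours = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
variables = list(neighbours)
domains = {v: ["red", "green", "blue"] for v in variables}
solution = backtracking_search(variables, domains, neighbours)
print(solution)
```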
BT (cont’d)
Evaluation:
• Complete? Yes
• Time: O(e · d^n) where d is the maximum domain size, n is the
number of variables, and e is the number of constraints.
• Space: O(nd). This is the amount of space needed for storing
the domains of the variables.
The above time and space complexity bounds are based on the
assumption that constraints can be stored using a constant amount
of space, and constraint checks can be done in constant time.
BT (cont’d)
Backtracking can be improved if we give clever answers to the
following questions:
1. Which variable should be assigned next, and in what order
should its values be tried?
2. What are the implications of the current variable assignments
to the other unassigned variables?
3. When a path fails, can the search avoid this failure in
subsequent paths?
Variable Ordering Heuristics
By default the function Select-Unassigned-Variable selects
the next unassigned variable from the list Variables[csp]. This is
static variable assignment and seldom results in efficient search.
The minimum remaining values (MRV) heuristic: choose the
variable with the fewest remaining legal values (dynamic test).
Question: In the map coloring problem for Australia, what is the
variable to be assigned after the assignments
WA = red, NT = green
have been made?
Answer: SA because only the value blue is possible.
Example: Map Coloring
[Figure: the map of Australia (repeated for reference).]
Variable Ordering Heuristics (cont’d)
For coloring the map of Australia, how do we start our search?
The degree heuristic: choose the variable involved in the largest
number of constraints with other unassigned variables (static
test).
Question: In the map coloring problem for Australia, what is the
variable to be assigned first?
Answer: SA, which has degree 5. All the other variables have degree
2 or 3, except T, which has degree 0.
Variable Ordering Heuristics (cont’d)
MRV is usually more powerful than the degree heuristic. The two can
be used together, with the latter playing the role of a tie-breaker
when the former cannot make a distinction.
We will use just MRV to refer to this combination of heuristics.
Both heuristics enforce the well-known fail-first principle.
Evaluation
Problem    BT          BT+MRV      FC          FC+MRV   Min-con
USA        (>1000K)    (>1000K)    2K          60       64
n-Queens   (>40000K)   13500K      (>40000K)   817K     4K
Zebra      3859K       1K          35K         0.5K     2K
Value Ordering Heuristics
Once a variable has been selected, the algorithm must decide on
the order in which to examine values.
The least-constraining-value heuristic (LCV): prefer the value
that rules out the fewest choices for neighbouring variables in the
constraint graph. In other words, leave maximum flexibility for
subsequent variable assignments.
LCV is not useful if we are looking for all solutions or the problem
has no solution.
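A small Python sketch of LCV for the map-coloring CSP. The counting scheme (one eliminated option per neighbour whose current legal set contains the value) and all names are illustrative:

```python
# LCV: sort a variable's legal values so that the value ruling out the
# fewest choices for unassigned neighbours comes first.

NEIGHBOURS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
DOMAIN = ["red", "green", "blue"]

def legal_values(var, assignment):
    return [v for v in DOMAIN
            if all(assignment.get(n) != v for n in NEIGHBOURS[var])]

def order_lcv(var, assignment):
    def ruled_out(value):
        # How many neighbour options would assigning `value` eliminate?
        return sum(1 for n in NEIGHBOURS[var]
                   if n not in assignment
                   and value in legal_values(n, assignment))
    return sorted(legal_values(var, assignment), key=ruled_out)

# After WA = red, NT = green: red conflicts only with NSW, while blue
# would also wipe out SA's last legal value, so red is tried first.
print(order_lcv("Q", {"WA": "red", "NT": "green"}))  # ['red', 'blue']
```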
Value Ordering Heuristics (cont’d)
Question: In the map coloring problem for Australia, what is the
value to be assigned to variable Q after the assignments
WA = red, NT = green
have been made?
Answer: We can only have blue or red.
The value blue would be a bad choice according to LCV: it
eliminates the last legal value for SA.
red is better because it has only 1 conflict: with value red for
NSW .
Example: Map Coloring
(Figure: the map of Australia showing Western Australia, Northern
Territory, South Australia, Queensland, New South Wales, Victoria,
and Tasmania.)
Constraint Propagation
The main idea behind constraint propagation is to consider the
given constraints early in the search or even before the search has
started!
For example, we can prune the search space by examining the
consequences of partial assignments. The algorithms that have
been proposed to achieve this are sometimes referred to as
look-ahead algorithms.
Forward Checking
Forward Checking (FC) belongs to the family of backtracking
algorithms based on constraint propagation and maintains the
following invariant:
For every unassigned variable, there exists at least one
value in its domain which is compatible with the values
that have been assigned to other variables.
Forward Checking
FC works as follows: every time a value v is assigned to a variable,
FC will remove all values which are inconsistent with v from the
domains of the unassigned variables. If the domain of any of the
unassigned variables is reduced to an empty set, then v will be
rejected.
FC is typically used together with the MRV heuristic since
all the machinery required for the implementation of MRV is used
by FC as well.
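A minimal Python sketch of this pruning step, assuming the same inequality-constrained map-coloring CSP; function and variable names are illustrative:

```python
# Forward checking: after assigning `value` to `var`, prune that value
# from the current domains of unassigned neighbours; reject the
# assignment if some neighbour's domain is wiped out.

NEIGHBOURS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def forward_check(var, value, domains, assignment):
    # Returns pruned copies of the domains, or None if `value` is rejected.
    new = {v: list(d) for v, d in domains.items()}
    new[var] = [value]
    for n in NEIGHBOURS[var]:
        if n not in assignment:
            new[n] = [v for v in new[n] if v != value]
            if not new[n]:
                return None
    return new

# Replaying WA = red, then Q = green, then V = blue:
d0 = {v: ["red", "green", "blue"] for v in NEIGHBOURS}
d1 = forward_check("WA", "red", d0, {})
d2 = forward_check("Q", "green", d1, {"WA": "red"})
d3 = forward_check("V", "blue", d2, {"WA": "red", "Q": "green"})
print(d2["SA"], d3)  # ['blue'] None -- SA's domain would become empty
```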
FC in Operation: Example
               WA    NT    Q     NSW   V     SA    T
Initial        RGB   RGB   RGB   RGB   RGB   RGB   RGB
After WA=red   R     GB    RGB   RGB   RGB   GB    RGB
After Q=green  R     B     G     RB    RGB   B     RGB
After V=blue   R     B     G     R     B     ∅     RGB
Evaluation
Problem    BT          BT+MRV      FC          FC+MRV   Min-con
USA        (>1000K)    (>1000K)    2K          60       64
n-Queens   (>40000K)   13500K      (>40000K)   817K     4K
Zebra      3859K       1K          35K         0.5K     2K
Example: Map Coloring with FC
(Figure: the map of Australia showing Western Australia, Northern
Territory, South Australia, Queensland, New South Wales, Victoria,
and Tasmania.)
WA = red, Q = green implies NT = blue, SA = blue.
This inconsistency is not detected by FC.
Example (cont’d)
The partial assignment
WA = red, Q = green
together with the problem constraints
WA ≠ NT, WA ≠ SA, Q ≠ NT, Q ≠ SA
NT, SA ∈ {red, blue, green}
imply
NT = blue, SA = blue.
These implied constraints together with the problem constraint
NT ≠ SA are inconsistent.
FC's machinery does not allow it to make this second constraint
propagation step.
Arc Consistency
We can improve this behavior of FC by devising more sophisticated
propagation steps. For example, the constraint propagation step of
forward checking can actually become stronger by using the
concept of arc consistency.
Definition. Let X,Y be variables of a CSP P and (X,Y ) be a
directed arc in the constraint graph for P . The arc (X,Y ) is
called consistent if for every value x of X, there is some value y of
Y such that x is consistent with y.
Arc Consistency in Operation: Example
               WA    NT    Q     NSW   V     SA    T
Initial        RGB   RGB   RGB   RGB   RGB   RGB   RGB
After WA=red   R     GB    RGB   RGB   RGB   GB    RGB
After Q=green  R     B     G     RB    RGB   B     RGB
After V=blue   R     B     G     R     B     ∅     RGB
Example (cont’d)
Consider the third row of the previous table for the problem of coloring
the map of Australia using FC. If the current domains of nodes SA and
NSW are { blue } and { red, blue } then arc (SA,NSW ) is
consistent.
Arc (NSW,SA) is inconsistent because assignment NSW = blue does
not have a consistent assignment for SA. In this case we should delete
the value blue from the domain of NSW to make the arc consistent.
Example (cont’d)
Now consider the arc (SA,NT ). This arc is inconsistent. To make
the arc consistent, we remove the value blue from the domain of SA,
leaving this domain empty.
Thus applying arc consistency resulted in earlier detection of an
inconsistency during search (a path that would not lead to a
solution).
Arc Consistency (cont’d)
In a backtracking algorithm, arc consistency can be applied as:
• A preprocessing step before the search starts.
• A constraint propagation step after each assignment,
repeated until no more arc inconsistencies remain. This
algorithm is known as MAC, which stands for maintaining
arc consistency.
Example: If we use MAC after the assignment
WA = red, Q = green
in our map coloring example, we immediately discover the
impossibility of extending this assignment because arc (SA,NT ) is
inconsistent.
The Algorithm AC-3
function AC-3(csp) returns the CSP, possibly with reduced domains
    inputs: csp, a binary CSP with variables {X1, X2, . . . , Xn}
    local variables: queue, a queue of arcs, initially all the arcs in csp
    while queue is not empty do
        (Xi, Xj) ← Remove-First(queue)
        if Remove-Inconsistent-Values(Xi, Xj) then
            for each Xk in Neighbors[Xi] do
                add (Xk, Xi) to queue
The Algorithm AC-3 (cont’d)
function Remove-Inconsistent-Values(Xi, Xj) returns true iff we remove a value
    removed ← false
    for each x in Domain[Xi] do
        if no value y in Domain[Xj] allows (x, y) to satisfy
                the constraint between Xi and Xj then
            delete x from Domain[Xi]; removed ← true
    return removed
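The two functions above translate directly to Python for the special case of binary "not equal" constraints (the map-coloring case); the data layout is illustrative:

```python
# AC-3 for inequality constraints: keep a queue of arcs and re-queue
# the neighbours of Xi whenever Xi's domain loses a value.

from collections import deque

NEIGHBOURS = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def remove_inconsistent_values(domains, xi, xj):
    removed = False
    for x in list(domains[xi]):
        # x survives only if some y in Domain[Xj] satisfies x != y.
        if not any(x != y for y in domains[xj]):
            domains[xi].remove(x)
            removed = True
    return removed

def ac3(domains):
    queue = deque((xi, xj) for xi in NEIGHBOURS for xj in NEIGHBOURS[xi])
    while queue:
        xi, xj = queue.popleft()
        if remove_inconsistent_values(domains, xi, xj):
            for xk in NEIGHBOURS[xi]:
                queue.append((xk, xi))
    return domains

# Current domains after WA = red, Q = green under forward checking:
domains = {"WA": ["red"], "NT": ["blue"], "Q": ["green"],
           "NSW": ["red", "blue"], "V": ["red", "green", "blue"],
           "SA": ["blue"], "T": ["red", "green", "blue"]}
ac3(domains)
print(domains["SA"])  # [] -- the inconsistency is detected
```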
Stronger Notions of Consistency
There are stronger notions of consistency (3-consistency,
4-consistency ..., n-consistency) that can be employed in
backtracking algorithms in a similar way.
At each step of the search process, these consistency steps bring
to light the implications of the problem constraints by
examining 3 variables, 4 variables, ..., n variables at a time.
Important: The cost of constraint propagation steps needs
to be carefully balanced with their benefits. One way to determine
this is by carrying out experiments.
Intelligent Backtracking
Backtracking algorithms can become more effective by backtracking
in a more sophisticated way (intelligently!). The techniques
proposed to achieve this are sometimes referred to as look-back
techniques.
The idea here is to avoid backtracking chronologically like BT,
but rather backtrack in a clever way to the actual cause of the
failure.
Chronological Backtracking
(Figure: the map of Australia showing Western Australia, Northern
Territory, South Australia, Queensland, New South Wales, Victoria,
and Tasmania.)
Chronological Backtracking (cont'd)
Let us consider BT in the problem of map coloring with a fixed
variable ordering: Q,NSW, V, T, SA,WA,NT . Suppose we have
generated the following partial assignment:
Q = red, NSW = green, V = blue, T = red
When we try the next variable, SA, we see that every value
violates a constraint. Now BT tells us to backtrack and try a new
color for Tasmania! This is not a good idea!
Backjumping
Backjumping (BJ) is an intelligent backtracking algorithm.
When a dead-end occurs at variable x, BJ does not backtrack to
the previous variable like BT. Instead, it backtracks to the deepest
variable in the search tree (also called the most recent variable)
which caused a value in the domain of x to be eliminated.
The set of these variables is called the conflict set of x.
Example with BJ
(Figure: the map of Australia showing Western Australia, Northern
Territory, South Australia, Queensland, New South Wales, Victoria,
and Tasmania.)
BJ (cont’d)
Let us consider BJ in the problem of map coloring with a fixed
variable ordering:
Q,NSW, V, T, SA,WA,NT
Suppose we have generated the following partial assignment:
Q = red, NSW = green, V = blue, T = red
When we try the next variable, SA, we see that every value
violates a constraint.
The variables that cause elimination of all possible values for SA
are {Q, NSW, V }. Now BJ tells us to backtrack to V .
BJ vs. FC
When backjumping occurs, all values of a domain are in conflict
with the current assignment. This would have already been
detected by FC!
Proposition. Every branch of a search tree pruned by BJ is also
pruned by FC.
Thus BJ is redundant in a search using FC or a stronger constraint
propagation algorithm such as MAC.
Example
(Figure: the map of Australia showing Western Australia, Northern
Territory, South Australia, Queensland, New South Wales, Victoria,
and Tasmania.)
From BJ to CBJ
Let us consider again BJ in the problem of map coloring with the
fixed variable ordering
WA,NSW, T,NT,Q, V, SA.
Suppose we have generated the following partial assignment:
WA = red, NSW = red
This assignment cannot lead us to a solution.
But let us assign T = red, and continue with NT,Q, V, SA. This is
not going to work and, eventually, we run out of values at NT .
Where should we backtrack?
From BJ to CBJ (cont’d)
BJ cannot tell us anything useful because the conflict set of NT is
empty (i.e., NT has values that are consistent with all previous
variables).
In this case, it is really the variables
NT,Q, V, SA
taken together that conflict with the previous variables.
This leads to a deeper notion of a conflict set of a variable x: it is
that set of preceding variables that caused x, together with any
subsequent variables, to lead to failure.
Under this definition, the conflict set for NT is {WA,NSW} and
we should backtrack to NSW .
This is the idea behind conflict-directed backjumping.
Conflict-Directed Backjumping (CBJ)
In CBJ every variable has a conflict set as in BJ.
When a consistency check between an instantiation vi of the
current variable xi and an instantiation vk of a previously assigned
variable xk fails, then xk is added to the conflict set of xi.
When there are no more values to be tried for xi, CBJ backtracks
to the deepest variable in the conflict set of xi. At the same time,
the variables in the conflict set of xi (except xk itself) are added to
the conflict set of xk so that no information about conflicts is
lost. This is the important difference with BJ.
Hybrid Algorithms
The ideas presented in the previous algorithms can be combined to
create hybrid algorithms e.g., FC-CBJ or MAC-CBJ.
Heuristics are usually combined with these hybrid algorithms as
well.
Evaluating Backtracking Algorithms
Criteria:
1. Worst case time/space complexity
2. Run time
3. Number of nodes visited in the search tree
4. Number of consistency checks performed
5. Number of backtracks
Evaluating Backtracking Algorithms (cont’d)
Results:
• BT, FC, BJ, CBJ, MAC and their variants have exponential
worst case time complexity.
• Run time can vary depending on implementation details.
• Visited nodes
FC-CBJ ≤ FC-BJ ≤ FC ≤ BJ ≤ BT
CBJ ≤ BJ
MAC-CBJ ≤ MAC-BJ ≤ MAC
MAC ≤ FC
Evaluating Backtracking Algorithms (cont’d)
• Consistency checks
CBJ ≤ BJ ≤ BT
FC-CBJ ≤ FC-BJ ≤ FC
FC may perform more or fewer consistency checks than BJ and
BT, depending on the problem.
• Experimental results have shown that in most cases a good
constraint propagation algorithm (like MAC or FC) with a
good set of heuristics (like MRV and LCV) can go a long way
in solving difficult CSP problems.
Local Search Algorithms
All the search algorithms we have presented up to now (for general
search problems or CSPs) are systematic: they explore the search
space carefully keeping track of each path explored until they find a
solution.
We have already seen how to solve n-queens problems using local
search. Can we solve arbitrary CSPs using local search
algorithms?
Local Search Algorithms (cont’d)
Idea: Start with a “solution” and make modifications until you
reach a solution. Graphically:
(Figure: a landscape of the evaluation function plotted against the
current state.)
Local Search Algorithms for CSPs
Local search is particularly useful for CSPs. A local search algorithm starts
with a random assignment of values to variables and then modifies
(repairs) this assignment until it becomes a solution.
These algorithms are also referred to as heuristic repair algorithms in the
literature.
We have already seen the application of hill-climbing to the 8-queens
problem.
The Min-Conflicts Heuristic
In choosing a new value for a variable, a useful heuristic is to
choose the one that would cause the minimum number of
conflicts with the current assignment to the other
variables.
The above heuristic is called the min-conflicts heuristic and it is
surprisingly powerful for many CSPs.
Min-Conflicts and N-Queens
When given a reasonable initial state, it can solve problems with
millions of queens in around 50 steps (in fact, its running time is
roughly independent of the problem size).
We can compute an initial state by iterating through the rows,
placing each queen in the column where it conflicts with the least
number of previously placed queens.
The repair can be accomplished in O(n) time by keeping a list of
all queens that are in conflict (i.e., are attacked by others) together
with counters showing the number of attacking queens for each
alternative position of these queens.
Min-Conflicts
function Min-Conflicts(csp, max-steps) returns a solution or failure
    inputs: csp, a constraint satisfaction problem
            max-steps, the number of steps allowed before giving up
    local variables: current, a complete assignment
                     var, a variable
                     value, a value for a variable
    current ← an initial complete assignment for csp
    for i = 1 to max-steps do
        if current is a solution for csp then return current
        var ← a randomly chosen, conflicted variable from Variables[csp]
        value ← the value v for var that minimizes Conflicts(var, v, current, csp)
        set var = value in current
    return failure
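A Python sketch of this loop for n-queens. The state representation (one queen per column, indexed by row) and the random tie-breaking are implementation choices, not dictated by the pseudocode:

```python
# Min-conflicts for n-queens: a randomly chosen conflicted queen is
# moved to a row of minimum conflicts in its column (ties at random).

import random

def conflicts(rows, col, row):
    # Queens attack along rows and diagonals; columns are distinct by design.
    return sum(1 for c, r in enumerate(rows)
               if c != col and (r == row or abs(r - row) == abs(c - col)))

def min_conflicts(n, max_steps=100_000, seed=0):
    rng = random.Random(seed)
    rows = [rng.randrange(n) for _ in range(n)]   # initial complete assignment
    for _ in range(max_steps):
        conflicted = [c for c in range(n) if conflicts(rows, c, rows[c]) > 0]
        if not conflicted:
            return rows                           # current is a solution
        col = rng.choice(conflicted)
        best = min(conflicts(rows, col, r) for r in range(n))
        rows[col] = rng.choice(
            [r for r in range(n) if conflicts(rows, col, r) == best])
    return None                                   # failure

solution = min_conflicts(8)
print(solution)
```

The initial assignment here is random; the greedy initialization described above would reduce the number of repair steps further.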
Example
(Figure: an 8-queens board under min-conflicts: each candidate square
in the chosen queen's column is annotated with the number of conflicts
it would cause, and the queen is moved to a square with the minimum
value, here 0.)
Evaluation
Problem    BT          BT+MRV      FC          FC+MRV   Min-con
USA        (>1000K)    (>1000K)    2K          60       64
n-Queens   (>40000K)   13500K      (>40000K)   817K     4K
Zebra      3859K       1K          35K         0.5K     2K
Min-Conflicts and Scheduling Problems
The min-conflicts heuristic has been used in observation scheduling
algorithms for the Hubble Space Telescope, reducing the scheduling time
from 3 weeks to 10 minutes!
Local search algorithms are important in problems such as
scheduling because on-line re-scheduling is a frequent operation,
and it can be carried out gracefully by local search.
The local search techniques we have discussed (hill climbing,
simulated annealing etc.) can also be applied to constraint
optimization problems.
The Structure of Problems
There are ways to exploit the structure of a CSP to find
solutions quickly. For example:
• Independent subproblems
• Tree CSPs and directed arc consistency
• Tree decomposition
Readings
Chapter 5 of AIMA.
Knowledge-Based Agents
Knowledge-based agents are best understood as agents that
know about their world and reason about their courses of action.
Basic concepts:
• The knowledge-base (KB): a set of representations of facts
about the world.
• The knowledge representation language: a language
whose sentences represent facts about the world.
Knowledge-Based Agents (cont’d)
• TELL and ASK interface: operations for adding new
sentences to the KB and querying what is known. This is
similar to updating and querying in databases.
• The inference mechanism: a mechanism for determining
what follows from what has been TELLed to the knowledge
base. The ASK operation utilizes this inference mechanism.
A Generic Knowledge-based Agent
function KB-Agent(percept) returns an action
    static: KB, a knowledge base
            t, a counter, initially 0, indicating time
    Tell(KB, Make-Percept-Sentence(percept, t))
    action ← Ask(KB, Make-Action-Query(t))
    Tell(KB, Make-Action-Sentence(action, t))
    t ← t + 1
    return action
This agent design is similar to the one for agents with internal
state.
Knowledge-based Agents (cont’d)
We can describe a knowledge-based agent at three levels:
• The knowledge level: In this level the agent is specified by
saying what it knows about the world and what its goals are.
• The logical level: This is the level at which the knowledge is
encoded into sentences of some logical language.
• The implementation level: This is the level where sentences
are implemented. This level runs on the agent architecture.
Note: Declarative vs. procedural way of system building
Knowledge-based Agents (cont’d)
Example:
• Knowledge level or epistemological level:
The automated taxi driver knows that Golden Gate Bridge
links San Francisco and Marin County.
• Logical level:
The automated taxi driver has the FOL sentence
Links(GGBridge, SF,Marin) in its KB.
• Implementation level:
The sentence Links(GGBridge, SF,Marin) is implemented by
a Pascal record (or a C structure).
Knowledge-Based Agents (cont’d)
We can build a knowledge-based agent by TELLing it what it
needs to know before it starts perceiving the world.
We can also design learning mechanisms that output general
knowledge about the environment given a series of percepts.
Autonomous agent=Knowledge-based agent + Learning mechanism
The Wumpus World (WW)
(Figure: the wumpus world, a 4x4 grid of rooms. The agent starts at
square (1,1); the grid contains three pits, the wumpus, and the gold.
Squares adjacent to a pit are breezy and squares adjacent to the
wumpus have a stench.)
The WW (cont’d)
• Environment: 4x4 grid of rooms with agent, wumpus, gold
and pits.
• Actuators: The agent can move forward, turn left or turn
right. The agent dies if it enters a room with a pit or a live
wumpus.
The agent has the actions Grab and Shoot (one arrow only) at its
disposal.
• Sensors: The percept is a list of 5 symbols:
(Stench, Breeze, Glitter, Bump, Scream)
Any of the above values can be None.
Reasoning and Acting in the WW
(Figure: two snapshots of the agent's knowledge of the 4x4 grid:
(a) the initial situation, with square (1,1) marked OK; (b) after
moving to (2,1) and perceiving a breeze, with (2,2) and (3,1) marked
P? (possible pit). Legend: A = agent, B = breeze, G = glitter/gold,
P = pit, S = stench, W = wumpus, OK = safe square, V = visited.)
Reasoning and Acting in the WW (cont’d)
(Figure: two later snapshots: (a) after perceiving a stench at (1,2),
the agent infers W! (wumpus) at (1,3) and P! (pit) at (3,1); (b) after
further safe moves the agent perceives a glitter and locates the gold.
Same legend as in the previous figure.)
KR languages: Syntax and Semantics
A KR language is defined by specifying its syntax and semantics.
The syntax of a KR language specifies the well-formed formulas
and sentences.
The semantics of a KR language defines a correspondence
between formulas/sentences of the language and facts in the world
to which these formulas/sentences refer.
A sentence of a KR language does not mean anything by itself.
The semantics or meaning of a sentence must be provided by its
writer by means of an interpretation.
Truth and Entailment
Truth. A sentence will be called true under a particular
interpretation if the state of affairs it represents is the case.
Entailment. We will write KB |= α to denote that whenever the
sentences of KB are true, then the sentence α is also true. In this
case we will say that the sentences of KB entail the sentence α.
Given a knowledge-base KB and a sentence α, how do we design
an algorithm that verifies whether KB |= α?
Entailment (cont’d)
(Figure: at the representation level, sentences entail a sentence; at
the world level, facts follow from facts; semantics connects each
sentence to the fact it represents.)
Inference, proof and proof-theory
Inference is the process of mechanically deriving sentences
entailed by a knowledge-base. If sentence α is derived from KB
using inference mechanism i then we will write KB `i α.
An inference mechanism is called sound if it derives only sentences
that are entailed.
An inference mechanism is called complete if it derives all the
sentences that are entailed.
The sequence of steps used to derive a sentence α from a set of
sentences KB is called a proof.
A proof theory is a set of rules for deriving the entailments of a
set of sentences.
Logic
The KR languages we will consider will be based on propositional
logic and first-order logic.
In general, a logic is a formal system consisting of:
• Syntax
• Semantics
• Proof theory
Why do we use logical languages for KR? Why don’t we use
natural language or programming languages?
Propositional Logic (PL): Syntax
The symbols of PL are:
• A countably infinite set of proposition symbols P1, P2, . . ..
This set will be denoted by P.
• The logical connectives ¬, ∧ , ∨ , =⇒ and ⇐⇒ .
• Parentheses: (, ).
Note: Logicians usually introduce only the connectives ¬ and ∨
and define the rest in terms of them.
PL: Syntax (cont’d)
The following context-free grammar defines the well-formed
sentences of propositional logic.
Sentence→ AtomicSentence | ComplexSentence
AtomicSentence→ True | False | Symbol
Symbol→ P1 | P2 | · · ·
ComplexSentence → (Sentence) | ¬Sentence
| Sentence BinaryConnective Sentence
BinaryConnective→ ∧ | ∨ | =⇒ | ⇐⇒
Precedence: ¬, ∧ , ∨ , =⇒ and ⇐⇒ .
PL: Semantics
A proposition symbol can mean anything we want. Its
interpretation can be any arbitrary fact. This fact will be either
true or false in the world. This is not the same in other logics (e.g.,
fuzzy logic!).
This is formalized by introducing the notion of interpretation.
Definition. Let P be the set of proposition symbols. An
interpretation for P is a mapping
I : P → {true, false}.
PL: Semantics (cont’d)
The notion of interpretation can be extended to arbitrary
well-formed sentences of PL using the following recursive
definitions:
• I(True) = true.
• I(False) = false.
• I(¬φ) = true if I(φ) = false; otherwise it is false.
• I(φ1 ∧ φ2) = true if I(φ1) = true and I(φ2) = true; otherwise
it is false.
• I(φ1 ∨ φ2) = true if I(φ1) = true or I(φ2) = true; otherwise it
is false.
PL: Semantics (cont’d)
• I(φ1 =⇒ φ2) = true if I(φ1) = false or I(φ2) = true;
otherwise it is false.
Explanation: If φ1 and φ2 are both true then most people
would agree that φ1 =⇒ φ2 (φ1 implies φ2) should be true.
Example: For all integers, if x is even then x + 2 is even. If we
take x to be 6 then this says: “If 6 is even then 6+2 is even”.
But what about cases where the truth value of φ1 is false?
Example: If we take x to be 7 then the above formula says:
“If 7 is even then 7+2 is even”. Is this sentence true or false?
This is an instance of a “false implies false” implication.
PL: Semantics (cont’d)
We will take the above sentence to be true although some of us
might find it disconcerting. It would be wrong to take it to be false
given that the more general sentence of which it is an instance is
true.
We have similar difficulties for “false implying true”.
Example: If 1+1=3 then Athens is the capital of Greece.
The case “true implying false” is easier: most people would
accept such an implication to be false.
Thus we have taken “implication” to have the semantics of
material implication.
PL: Semantics (cont’d)
• I(φ1 ⇐⇒ φ2) = true if I(φ1) = I(φ2); otherwise it is false.
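The recursive definition of I translates directly into code. A small sketch, with sentences encoded as nested tuples (an encoding chosen here purely for illustration):

```python
# Evaluate a PL sentence under an interpretation I (a dict mapping
# proposition symbols to truth values). A sentence is True, False,
# a symbol name, or a tuple such as ("and", s1, s2) or ("not", s).

def evaluate(sentence, I):
    if isinstance(sentence, bool):            # the constants True / False
        return sentence
    if isinstance(sentence, str):             # a proposition symbol
        return I[sentence]
    op, *args = sentence
    if op == "not":
        return not evaluate(args[0], I)
    a, b = (evaluate(s, I) for s in args)
    if op == "and":     return a and b
    if op == "or":      return a or b
    if op == "implies": return (not a) or b   # material implication
    if op == "iff":     return a == b
    raise ValueError(f"unknown connective {op!r}")

I = {"P": True, "Q": False}
print(evaluate(("implies", "Q", "P"), I))       # True ("false implies true")
print(evaluate(("iff", "P", ("not", "Q")), I))  # True
```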
Compositionality of PL
A language is called compositional when the meaning of a
sentence is a function of the meaning of the parts.
Compositionality is a desirable property in formal languages.
The Ontological Commitments of PL
Ontological commitments have to do with the nature of reality.
PL assumes that the world consists of facts that either hold
or not hold.
Other logics, for example FOL, make more elaborate and detailed
ontological commitments.
Satisfaction and Models
Definition. Let φ be a PL sentence. If I is an interpretation such
that I(φ) = true then we say that I satisfies φ or I is a model of
φ.
Satisfiability
Definition. A sentence φ of PL is satisfiable if there is an
interpretation I such that I(φ) = true.
Examples: P, P ∨ Q, (P ∧ R) ∨ Q
Definition. A sentence φ of PL is unsatisfiable if there is no
interpretation I such that I(φ) = true.
Example: P ∧ ¬P
Validity
Definition. A sentence φ of PL is valid if for all interpretations
I, I(φ) = true.
Examples: P ∨ ¬P , ((P ∨ H) ∧ ¬H) =⇒ P
Valid statements in PL are also called tautologies.
Theorem. Let φ be a sentence of PL. If φ is unsatisfiable then its
negation ¬φ is valid. Proof?
Entailment
Definition. Let φ and ψ be sentences of PL. We will say that φ
entails ψ (denoted by φ |= ψ) if for all interpretations I such that
I(φ) = true then I(ψ) = true.
Example: P ∧ Q |= P
The deduction theorem. Let φ and ψ be sentences of PL.
Then φ |= ψ iff φ =⇒ ψ is valid. Proof?
Example: (P ∧ Q) =⇒ P is a valid sentence.
Entailment and Unsatisfiability
Theorem. Let φ and ψ be sentences of PL. Then φ |= ψ iff
φ ∧ ¬ψ is unsatisfiable. Proof?
Example: P ∧ Q |= P
The above theorem is the essence of proofs by contradiction or
refutation.
Equivalence
Definition. Let φ and ψ be sentences of PL. We will say that φ
is equivalent to ψ (denoted by φ ≡ ψ) if φ |= ψ and ψ |= φ.
Example: ¬(P ∧ ¬Q) ≡ ¬P ∨ Q
Some Useful Equivalences
• (α ∧ β) ≡ (β ∧ α) commutativity of ∧
• (α ∨ β) ≡ (β ∨ α) commutativity of ∨
• ((α ∧ β) ∧ γ) ≡ (α ∧ (β ∧ γ)) associativity of ∧
• ((α ∨ β) ∨ γ) ≡ (α ∨ (β ∨ γ)) associativity of ∨
• ¬(¬α) ≡ α double-negation elimination
• (α⇒ β) ≡ (¬β ⇒ ¬α) contraposition
Some Useful Equivalences (cont’d)
• (α⇒ β) ≡ (¬α ∨ β) implication elimination
• (α⇔ β) ≡ ((α⇒ β) ∧ (β ⇒ α)) biconditional elimination
• ¬(α ∧ β) ≡ (¬α ∨ ¬β) de Morgan law
• ¬(α ∨ β) ≡ (¬α ∧ ¬β) de Morgan law
• (α ∧ (β ∨ γ)) ≡ ((α ∧ β) ∨ (α ∧ γ)) distribution of ∧ over ∨
• (α ∨ (β ∧ γ)) ≡ ((α ∨ β) ∧ (α ∨ γ)) distribution of ∨ over ∧
Truth Tables
A      ¬A
true   false
false  true

A      B      A ∧ B   A ∨ B   A =⇒ B   A ⇐⇒ B
false  false  false   false   true     true
false  true   false   true    true     false
true   false  false   true    false    false
true   true   true    true    true     true
Why are truth tables useful?
Truth tables (cont’d)
Example: A truth table for showing the validity of sentence
((P ∨ H) ∧ ¬H) =⇒ P .
P      H      P ∨ H   (P ∨ H) ∧ ¬H   ((P ∨ H) ∧ ¬H) =⇒ P
false  false  false   false          true
false  true   true    false          true
true   false  true    true           true
true   true   true    false          true
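Checking validity is exactly filling in the truth table: enumerate every interpretation of the symbols and evaluate. A self-contained sketch, with sentences encoded as nested tuples (an encoding chosen for illustration):

```python
# Validity = true under every interpretation. Enumerate all 2^k
# assignments to the k proposition symbols, as a truth table does.

from itertools import product

def evaluate(sentence, I):
    if isinstance(sentence, bool):
        return sentence
    if isinstance(sentence, str):
        return I[sentence]
    op, *args = sentence
    if op == "not":
        return not evaluate(args[0], I)
    a, b = (evaluate(s, I) for s in args)
    if op == "and":     return a and b
    if op == "or":      return a or b
    if op == "implies": return (not a) or b
    if op == "iff":     return a == b
    raise ValueError(op)

def symbols(sentence):
    # Collect the proposition symbols occurring in a sentence.
    if isinstance(sentence, str):
        return {sentence}
    if isinstance(sentence, bool):
        return set()
    return set().union(*(symbols(s) for s in sentence[1:]))

def is_valid(sentence):
    syms = sorted(symbols(sentence))
    return all(evaluate(sentence, dict(zip(syms, vals)))
               for vals in product([False, True], repeat=len(syms)))

# ((P ∨ H) ∧ ¬H) =⇒ P, the sentence from the table above:
s = ("implies", ("and", ("or", "P", "H"), ("not", "H")), "P")
print(is_valid(s))  # True
```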
PL Satisfiability as a CSP
The satisfiability problem for PL is fundamental. The entailment
and validity problems can be rephrased as satisfiability problems.
Notice that the satisfiability problem for PL can be phrased as a
CSP. What are the variables, values and constraints?
Complexity of PL Satisfiability and Validity
Theorem. The problem of determining whether a sentence of PL
is satisfiable is NP-complete (Cook, 1971).
Corollary. The problem of determining whether a sentence of PL
is valid is co-NP-complete.
The above results mean that it is highly unlikely that we will ever
find a polynomial time algorithm for these problems.
Horn Sentences
Definition. A PL sentence will be called Horn if it is in one of
the following two forms:
Q
or
P1 ∧ P2 ∧ . . . ∧ Pn =⇒ Q
The second of the above forms is equivalent to
¬P1 ∨ ¬P2 ∨ . . . ∨ ¬Pn ∨ Q.
Theorem. If φ is a conjunction of Horn sentences then the
satisfiability of φ can be decided in polynomial time.
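One way to see why: forward chaining. The sketch below represents each Horn sentence as a (premises, conclusion) pair, where a fact Q has no premises. It also allows a conclusion of False (a purely negative clause — a mild extension of the two forms above), since otherwise every conjunction is trivially satisfiable. The loop fires each rule at most once per derivable atom, hence runs in polynomial time:

```python
# Forward chaining over Horn sentences given as (premises, conclusion)
# pairs. Repeatedly fire rules whose premises are all known true; the
# conjunction is unsatisfiable iff a False conclusion is ever derived.

def horn_satisfiable(rules):
    true_atoms = set()
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= true_atoms and conclusion not in true_atoms:
                if conclusion is False:
                    return False        # a negative clause is violated
                true_atoms.add(conclusion)
                changed = True
    return True

# P, P =⇒ Q, P ∧ Q =⇒ R is satisfiable; adding R =⇒ False is not.
kb = [([], "P"), (["P"], "Q"), (["P", "Q"], "R")]
print(horn_satisfiable(kb))                      # True
print(horn_satisfiable(kb + [(["R"], False)]))   # False
```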
Inference Rules for PL
An inference rule is a rule of the form

    α1, α2, . . . , αn
    ------------------
            β

where α1, α2, . . . , αn are sentences called conditions and β is a
sentence called the conclusion.
Whenever we have a set of sentences that match the conditions of
an inference rule, we can conclude the sentence in the
conclusion.
Inference Rules for PL (cont’d)
• Modus Ponens: from α =⇒ β and α, infer β.
• And-Elimination: from α1 ∧ α2 ∧ ... ∧ αn, infer αi.
• And-Introduction: from α1, α2, ..., αn, infer α1 ∧ α2 ∧ ... ∧ αn.
• Or-Introduction: from αi, infer α1 ∨ α2 ∨ ... ∨ αn.
• Double-Negation Elimination: from ¬¬α, infer α.
• Unit Resolution: from α ∨ β and ¬β, infer α.
• Resolution: from α ∨ β and ¬β ∨ γ, infer α ∨ γ.
Why Is Inference Important?
(Figure: a user TELLs the agent input sentences about the world and
ASKs for conclusions; inference is what produces those conclusions.)
Formalizing the WW in PL
(Figure: a wumpus-world situation in which the agent has visited
(1,1), (2,1), and (1,2), perceiving a breeze at (2,1) and a stench at
(1,2); P! marks the pit at (3,1) and W! the wumpus at (1,3). Legend
as in the earlier figures.)
We can formalize the above situation in PL and use inference to
conclude that the wumpus is in room (1,3). Do it as an exercise!
Formalizing the WW in PL (cont’d)
Consider the following WW rule:
If a square has no smell, then neither the square nor any of
its adjacent squares can house a Wumpus.
How can we formalize this rule in PL?
We have to write one rule for every relevant square! For example:
¬S11 =⇒ ¬W11 ∧ ¬W12 ∧ ¬W21
This is a very disappointing feature of PL. There is no way in PL
to make a statement referring to all objects of some kind (e.g., to
all squares).
Not to worry: this can be done in first order logic!
A knowledge-based agent using PL
function Propositional-KB-Agent(percept) returns an action
static KB, a knowledge-base
t, a counter, initially 0, indicating time
Tell(KB,Make-Percept-Sentence(percept, t))
for each action in the list of possible actions do
if Ask(KB,Make-Action-Query(t, action)) then
Tell(KB,Make-Action-Sentence(action, t))
t ← t + 1
return action
end
Readings
Chapter 7 of AIMA: Logical Agents, Sections 7.1 to 7.5.
First-Order Logic (FOL)
Ontological commitments of FOL:
• The world consists of objects, i.e., things with individual
identities. Objects have properties that distinguish them
from other objects.
• Objects participate in relations with other objects. Some of
these relations are functions. Relations hold or do not hold.
These ontological commitments make FOL more powerful than PL.
FOL is here to stay!
FOL: Syntax
The symbols of FOL (with equality) are the following:
• Parentheses: (, ).
• The logical connectives ¬, ∧ , ∨ , =⇒ and ⇐⇒ .
• A countably infinite set of variables. This set will be denoted
by Vars.
Examples: x, y, v, . . .
FOL: Syntax (cont’d)
• The quantifier symbols: ∀, ∃
• A countably infinite set of constant symbols.
Examples: John, Mary, 5, 6, Ball, . . .
• The equality symbol: =
• Predicate symbols: For each positive integer n, some set
(possibly empty) of symbols, called n-place predicate symbols.
Examples: Happy(.), Brother(., .), Arrives(., ., .), . . .
• Function symbols: For each positive integer n, some set
(possibly empty) of symbols, called n-place function symbols.
Examples: FatherOf(.), Cosine(.), . . .
Logicians usually introduce only the connectives ¬ and ∨ and one
of the quantifiers.
FOL: Syntax (cont’d)
Terms are expressions of FOL that refer to objects. The set of all
terms will be denoted by Terms.
The following BNF grammar gives the syntax of terms:
Term → ConstantSymbol | Variable
| FunctionSymbol(Term, . . . , Term)
Examples:
John, x, FatherOf(John), WifeOf(FatherOf(x)), . . .
FOL: Syntax (cont’d)
Atomic formulas are expressions of FOL that refer to simple
facts.
The following BNF grammar gives the syntax of atomic formulas:
AtomicFormula → Term = Term
| PredicateSymbol(Term, . . . , Term)
Examples:
John = ElderSonOf(FatherOf(John)), Happy(John),
Lives(John, London), Arrives(John,Athens,Monday)
FOL: Syntax (cont’d)
Well-formed formulas (wffs) are the most complex kind of
expressions in FOL. They can be used to refer to any complicated state
of affairs.
The following BNF grammar gives the syntax of wffs:
Wff → AtomicFormula | ( Wff ) | ¬ Wff
| Wff BinaryConnective Wff
| ( Quantifier Variable ) Wff
FOL: Syntax (cont’d)
Examples of wffs:
• ¬Loves(Tony,Mary)
• Loves(Tony, Paula) ∨ Loves(Tony, Fiona)
• Loves(John, Paula) ∧ Loves(John, Fiona)
• (∀x)(SportsCar(x) ∧ HasDriven(Mike, x) =⇒ Likes(Mike, x))
• (∃x)(SportsCar(x) ∧ Owns(John, x))
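Grammars like these are easy to mirror as nested data structures. A sketch of one possible encoding (the tuple tags `var`, `const`, `fn`, `pred`, and so on are our own convention, not part of FOL):

```python
# FOL syntax trees as nested tuples (a sketch mirroring the BNF grammars
# above): a variable is ("var", name), a constant ("const", name), a
# function application ("fn", symbol, args...), an atom
# ("pred", symbol, args...); compound wffs use the connective or
# quantifier as the head.

John   = ("const", "John")
x      = ("var", "x")
father = lambda t: ("fn", "FatherOf", t)

wff = ("forall", x,
       ("implies",
        ("pred", "SportsCar", x),
        ("pred", "Likes", ("const", "Mike"), x)))

def pretty(e):
    """Render a term or wff back into the textbook notation."""
    head = e[0]
    if head in ("var", "const"):
        return e[1]
    if head in ("fn", "pred"):
        return e[1] + "(" + ", ".join(pretty(a) for a in e[2:]) + ")"
    if head == "not":
        return "¬" + pretty(e[1])
    if head == "forall":
        return "(∀" + pretty(e[1]) + ")" + pretty(e[2])
    if head == "exists":
        return "(∃" + pretty(e[1]) + ")" + pretty(e[2])
    op = {"and": " ∧ ", "or": " ∨ ", "implies": " =⇒ "}[head]
    return "(" + pretty(e[1]) + op + pretty(e[2]) + ")"

print(pretty(father(John)))   # FatherOf(John)
print(pretty(wff))            # (∀x)(SportsCar(x) =⇒ Likes(Mike, x))
```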
Free Variables
The following recursive definition defines the notion of free variables
of a wff.
• If φ is an atomic formula, then x occurs free in φ iff x occurs in φ.
• x occurs free in ¬φ iff x occurs free in φ.
• x occurs free in φ ∧ ψ iff x occurs free in φ or in ψ. Similarly for
the remaining binary connectives.
• x occurs free in (∀v)φ iff x occurs free in φ and x is different
from v. Similarly for ∃.
The opposite of free is bound.
Definition. If no variable occurs free in the wff φ, then φ is a
sentence.
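The recursive definition above can be transcribed directly into code. A sketch over a tuple encoding of FOL syntax (the encoding itself, with tags such as `var` and `forall`, is our own assumption):

```python
# Free variables by structural recursion (a sketch over our own tuple
# encoding: ("var", x), ("const", c), ("fn"/"pred", symbol, args...),
# ("not", φ), ("and"/"or"/"implies", φ, ψ),
# ("forall"/"exists", ("var", v), φ)).

def free_vars(e):
    head = e[0]
    if head == "var":
        return {e[1]}
    if head == "const":
        return set()
    if head in ("fn", "pred"):            # atomic case: all variables occurring in it
        return set().union(*[free_vars(a) for a in e[2:]], set())
    if head == "not":
        return free_vars(e[1])
    if head in ("and", "or", "implies"):  # free in φ or in ψ
        return free_vars(e[1]) | free_vars(e[2])
    if head in ("forall", "exists"):      # the quantified variable becomes bound
        return free_vars(e[2]) - {e[1][1]}

def is_sentence(wff):
    return not free_vars(wff)

print(free_vars(("pred", "Brother", ("var", "x"), ("const", "John"))))  # {'x'}
print(is_sentence(("forall", ("var", "x"),
                   ("implies", ("pred", "Cat", ("var", "x")),
                               ("pred", "Mammal", ("var", "x"))))))     # True
```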
Free Variables (cont’d)
Examples:
• x is free in Brother(x, John) but not in
(∀x)(Cat(x) =⇒ Mammal(x)).
• y is free in
(∀x)(Friend(x, y) =⇒ Loves(x, y))
but not in
(∀x)(∀y)(Friend(x, y) =⇒ Loves(x, y)).
FOL: Semantics
The meaning of FOL formulas is provided by interpretations.
An interpretation is a mapping between symbols of FOL and
objects, functions or relations in the world. More precisely:
• An interpretation maps each constant symbol to an object in
the world.
Example: In one particular interpretation the symbol John
might refer to John Major, the British PM. In another
interpretation it might refer to the evil King John, king of
England from 1199 to 1216.
FOL: Semantics (cont’d)
• An interpretation maps each predicate symbol to a relation in
the world.
Example: In one particular interpretation the symbol
Brother(., .) might refer to the relation of brotherhood. In a
world with three objects, King John, John Major, and Richard
the Lionheart, the relation of brotherhood is defined by the
following set of tuples:
{ 〈King John,Richard the Lionheart〉,
〈Richard the Lionheart,King John〉 }
FOL: Semantics (cont’d)
• An interpretation always maps the equality symbol to the
identity relation in the world. The identity relation is:
id = {〈o, o〉 : o is an object in the world}
• An interpretation maps each function symbol to a functional
relation (or function) in the world.
Example: In one particular interpretation the symbol
FatherOf(.) might refer to the relation of fatherhood.
FOL: Formal Semantics
An interpretation I is a function which makes the following
assignments to the symbols of FOL:
1. I assigns to the quantifier symbol ∀ a non-empty set |I| called
the universe or domain of I.
2. I assigns to each constant symbol c a member cI of the
universe |I|.
3. I assigns to each n-place predicate symbol P an n-ary relation
P I ⊆ |I|^n; i.e., P I is a set of n-tuples of members of the
universe.
4. I assigns to each n-place function symbol f an n-ary function
f I on |I|; i.e., f I : |I|^n → |I|.
Satisfaction
Definition. A variable assignment for an interpretation I is a
function s : Vars → |I| from the set of variables Vars into the
universe of I.
Let φ be a wff of FOL, I an interpretation and s : Vars → |I| a
variable assignment.
We will define what it means for I to satisfy φ with variable
assignment s. This will be denoted by
|=I φ[s].
Intuitively |=I φ[s] if and only if the state of affairs denoted by φ is
true according to I (where any variable x which occurs in φ, stands
for s(x) wherever it occurs free).
Satisfaction (cont’d)
The formal definition of satisfaction proceeds as follows:
Terms. We define the function
s̄ : Terms → |I|
from the set of all terms Terms into the universe |I|. This function
is an extension of s, and maps each FOL term to the object in the
universe denoted by this term:
• For each variable x, s̄(x) = s(x).
• For each constant symbol c, s̄(c) = cI .
• If t1, . . . , tn are terms and f is an n-place function symbol, then
s̄(f(t1, . . . , tn)) = f I(s̄(t1), . . . , s̄(tn)).
Satisfaction (cont’d)
Atomic formulas. The definition of satisfaction for atomic
formulas is as follows:
• For atomic formulas involving the equality symbol,
|=I t1 = t2[s] iff s̄(t1) is identical to s̄(t2).
• For an n-place predicate symbol P ,
|=I P (t1, . . . , tn)[s] iff 〈s̄(t1), . . . , s̄(tn)〉 ∈ P I .
Satisfaction (cont’d)
Other wffs.
• |=I ¬φ[s] iff ⊭I φ[s] (i.e., iff |=I φ[s] is not the case).
• |=I (φ ∧ ψ)[s] iff |=I φ[s] and |=I ψ[s].
• |=I (φ ∨ ψ)[s] iff |=I φ[s] or |=I ψ[s].
• |=I (φ =⇒ ψ)[s] iff ⊭I φ[s] or |=I ψ[s].
• |=I (φ ⇐⇒ ψ)[s] iff |=I φ[s] and |=I ψ[s], or ⊭I φ[s] and
⊭I ψ[s].
Satisfaction (cont’d)
• |=I (∀x)φ [s] iff for all d ∈ |I|, we have |=I φ[s(x|d)].
The function s(x|d) is defined as follows:
s(x|d)(y) = s(y) if y ≠ x, and s(x|d)(y) = d if y = x.
• |=I (∃x)φ [s] iff there exists d ∈ |I| such that |=I φ[s(x|d)].
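When the universe is finite, the satisfaction clauses above are directly executable as a recursive model checker. A sketch (the tuple encoding of formulas and the dictionary form of interpretations are our own assumptions):

```python
# A recursive model checker implementing |=I φ[s] over a finite universe
# (a sketch). I is a dict with keys "universe", "consts", "funcs" (mapping
# to Python functions) and "preds" (mapping to sets of tuples); a variable
# assignment s is a dict.

def term_val(t, I, s):                     # the extension s̄ of s to all terms
    if t[0] == "var":   return s[t[1]]
    if t[0] == "const": return I["consts"][t[1]]
    return I["funcs"][t[1]](*[term_val(a, I, s) for a in t[2:]])

def satisfies(phi, I, s):                  # |=I phi[s]
    head = phi[0]
    if head == "eq":
        return term_val(phi[1], I, s) == term_val(phi[2], I, s)
    if head == "pred":
        return tuple(term_val(a, I, s) for a in phi[2:]) in I["preds"][phi[1]]
    if head == "not":     return not satisfies(phi[1], I, s)
    if head == "and":     return satisfies(phi[1], I, s) and satisfies(phi[2], I, s)
    if head == "or":      return satisfies(phi[1], I, s) or satisfies(phi[2], I, s)
    if head == "implies": return (not satisfies(phi[1], I, s)) or satisfies(phi[2], I, s)
    if head == "forall":  # {**s, v: d} is exactly s(x|d)
        return all(satisfies(phi[2], I, {**s, phi[1][1]: d}) for d in I["universe"])
    if head == "exists":
        return any(satisfies(phi[2], I, {**s, phi[1][1]: d}) for d in I["universe"])

# A two-object world where everything is happy:
I = {"universe": {"a", "b"}, "consts": {"A": "a"}, "funcs": {},
     "preds": {"Happy": {("a",), ("b",)}}}
print(satisfies(("forall", ("var", "x"), ("pred", "Happy", ("var", "x"))), I, {}))  # True
```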
Example: the WW in FOL
[Figure: the standard 4×4 wumpus world. The agent starts at square
(1,1); pits, the gold and the wumpus occupy various squares, and
breezes and stenches mark the squares adjacent to the pits and the
wumpus.]
Example (cont’d)
If we want to formalize the WW in FOL, we can use the following
symbols:
• Constant symbols:
Agent,Wumpus,Gold,Breeze, Stench,Rm11, Rm12, . . . , Rm44
• Function symbols:
– The unary function symbol NorthOf to denote the unique
room which is north of the room denoted by the argument
of the function. For example, the room north of room 11 is
room 21.
– The unary function symbols SouthOf,WestOf,EastOf
with similar meanings.
Example (cont’d)
• Predicate symbols:
– The binary predicate Location will be used to denote the
location (i.e. room) of each object (agent, wumpus and
gold).
– The binary predicate Percept will be used to denote the
percept (i.e., breeze or stench) in each room.
– The unary predicate Bottomless will be used to denote that
a room contains a pit.
Example (cont’d)
Let us now provide an interpretation I of the above symbols which
corresponds to the previous picture:
• The universe of I is the objects we see in the picture:
|I| = {agent, wumpus, gold, breeze, stench, rm11, . . . , rm44}.
• I makes the following assignments to constant symbols:
AgentI = agent, WumpusI = wumpus, GoldI = gold,
BreezeI = breeze, StenchI = stench,
Rm11I = rm11, . . . , Rm44I = rm44
Example (cont’d)
• I assigns to the unary function symbol NorthOf the function
NorthOf I : |I| → |I| which is defined as follows:
NorthOf I(rm11) = rm21,
NorthOf I(rm21) = rm31, . . . , NorthOf I(rm34) = rm44
• I assigns to the unary function symbols
SouthOf,WestOf,EastOf the functions
SouthOf I ,WestOf I , EastOf I that are defined similarly to
NorthOf I .
Example (cont’d)
• I assigns to the unary predicate symbol Bottomless the
following relation:
{〈rm13〉, 〈rm33〉, 〈rm44〉}
• I assigns to the binary predicate symbol Location the following
relation:
{〈agent, rm11〉, 〈wumpus, rm31〉, 〈gold, rm32〉}
Example (cont’d)
• I assigns to the binary predicate symbol Percept the following relation:
{〈rm12, breeze〉, 〈rm14, breeze〉, 〈rm21, stench〉, 〈rm23, breeze〉,
〈rm32, breeze〉, 〈rm32, stench〉, 〈rm34, breeze〉, 〈rm41, stench〉,
〈rm43, breeze〉}
Note: To describe interpretation I we used words like agent, breeze, etc.
which start with a lowercase letter. These are not symbols of our FOL
language; they are just English words referring to what is in the picture.
Instead, we could have drawn little pictures to describe the elements of the
universe of the interpretation.
Example (cont’d)
We now give examples of satisfaction:
• |=I x = y[s] for any variable assignment s which maps x and y
to identical objects of the universe (e.g.,
s(x) = s(y) = wumpus). Why?
Because if s(x) = s(y) = wumpus then s̄(x) = s(x) = wumpus
is identical to s̄(y) = s(y) = wumpus.
• |=I Agent = Agent[s] for any variable assignment s.
This is trivial.
Example (cont’d)
• |=I Rm21 = NorthOf(Rm11)[s] for any variable assignment s.
Why?
Because s̄(Rm21) = Rm21I = rm21 and
s̄(NorthOf(Rm11)) = NorthOf I(s̄(Rm11)) =
= NorthOf I(Rm11I) = NorthOf I(rm11) = rm21.
• |=I Rm21 = NorthOf(x)[s] for any variable assignment s such
that s(x) = rm11. Why?
Because s̄(Rm21) = Rm21I = rm21 and
s̄(NorthOf(x)) = NorthOf I(s̄(x)) =
= NorthOf I(s(x)) = NorthOf I(rm11) = rm21.
Example (cont’d)
• |=I Bottomless(x)[s] for any variable assignment s such that
s(x) = rm13 or s(x) = rm33 or s(x) = rm44. Why?
Because if s(x) = rm13 then
〈s̄(x)〉 = 〈s(x)〉 = 〈rm13〉 ∈ BottomlessI .
Similarly, for the other cases.
• |=I Location(Agent, Rm11)[s] for any variable assignment s. Why?
Because
〈s̄(Agent), s̄(Rm11)〉 = 〈AgentI , Rm11I〉 = 〈agent, rm11〉 ∈ LocationI .
Example (cont’d)
• |=I ¬Location(Gold,Rm44)[s] for any variable assignment s. Why?
Because
〈s̄(Gold), s̄(Rm44)〉 = 〈GoldI , Rm44I〉 = 〈gold, rm44〉 ∉ LocationI ,
therefore ⊭I Location(Gold,Rm44)[s] for any variable assignment s.
• |=I Location(Gold,Rm32) ∨ Location(Gold,Rm44)[s] for any
variable assignment s. Why?
Because
〈s̄(Gold), s̄(Rm32)〉 = 〈GoldI , Rm32I〉 = 〈gold, rm32〉 ∈ LocationI ,
therefore |=I Location(Gold,Rm32)[s] for any variable assignment s.
Example (cont’d)
• |=I (∃x)Location(x,Rm11)[s] for any variable assignment s. Why?
Because
〈s̄(Agent), s̄(Rm11)〉 = 〈AgentI , Rm11I〉 = 〈agent, rm11〉 ∈ LocationI ,
thus |=I Location(x,Rm11)[s(x|agent)].
• ⊭I (∀x)Location(Wumpus, x)[s] for any variable assignment s.
Why?
Because
〈s̄(Wumpus), s̄(Rm11)〉 = 〈WumpusI , Rm11I〉 =
〈wumpus, rm11〉 ∉ LocationI ,
thus ⊭I Location(Wumpus, x)[s(x|rm11)].
Satisfaction (cont’d)
When we want to verify whether or not an interpretation satisfies a
wff φ with s, we do not really need all of the (infinite amount of)
information that s gives us. All that matters are the values of the
function s at the (finitely many) variables which occur free in φ. In
particular, if φ is a sentence, then s does not matter at all. This is
made formal by the following theorem.
Theorem. Let s1 and s2 be variable assignments from Vars into
|I| which agree at all variables (if any) which occur free in the wff
φ. Then
|=I φ[s1] iff |=I φ[s2].
Satisfaction (cont’d)
The previous theorem has the following corollary.
Corollary. Let φ be a sentence and I an interpretation. Then,
either
(a) I satisfies φ with every variable assignment s : Vars → |I|, or
(b) I does not satisfy φ with any variable assignment.
The above corollary allows us to ignore variable assignments
whenever we talk about satisfaction of sentences. Thus if φ is a
sentence and I an interpretation we can just say that I satisfies
(or does not satisfy) φ.
Satisfiability
Definition. A formula φ is called satisfiable iff there exists an
interpretation I and variable assignment s such that |=I φ[s].
Otherwise, the formula is called unsatisfiable.
Examples: The formulas
Location(Wumpus,Rm31), Location(Agent, Rm11), (∃x)R(y, x)
are satisfiable. The following formulas are unsatisfiable:
P (x) ∧ ¬P (x), (∀x)P (x) ∧ ¬P (A)
Can you write an algorithm which discovers whether a given wff is
satisfiable?
Truth and Models
Definition. Let φ be a sentence and I an interpretation. If I
satisfies φ then we will say that φ is true in I or I is a model of φ.
Example: The interpretation I defined in the WW example is a
model of the following sentences:
Location(Wumpus,Rm31), Location(Agent, Rm11),
(∃x)Percept(Breeze, x)
Definition. An interpretation I is a model of a set of sentences
KB iff it is a model of every member of KB.
Entailment
Definition. Let KB be a set of wffs, and φ a wff. Then KB entails
φ, denoted by KB |= φ, iff for every interpretation I and every variable
assignment s : Vars → |I| such that I satisfies every member of KB
with s, I also satisfies φ with s.
Examples:
{ Happy(John), (∀x)(Happy(x) =⇒ Laughs(x)) } |= Laughs(John)
{WellPaid(John), ¬WellPaid(John) ∨ Happy(John) } |= Happy(John)
Can you give an algorithm that discovers whether a set of wffs
entails a wff?
Validity and Equivalence
Definition. A wff φ is valid iff for every interpretation I and
every variable assignment s : Vars → |I|, I satisfies φ with s.
Examples: The formulas
P (A) ∨ ¬P (A), P (A) =⇒ P (A), (∀x)P (x) =⇒ (∃x)P (x)
are valid.
Can you write an algorithm which discovers whether a given wff is
valid?
Definition. Two formulas φ and ψ will be called logically
equivalent, denoted by φ ≡ ψ, iff φ |= ψ and ψ |= φ.
Satisfiability, Equivalence and Validity
Theorem. φ |= ψ iff φ =⇒ ψ is valid. Proof?
Theorem. φ is unsatisfiable iff ¬φ is valid. Proof?
Theorem. φ ≡ ψ iff φ ⇐⇒ ψ is valid. Proof?
Some Important Logical Equivalences
Let φ and ψ be wffs. Then:
1. ¬(φ ∧ ψ) ≡ ¬φ ∨ ¬ψ
2. ¬(φ ∨ ψ) ≡ ¬φ ∧ ¬ψ
3. φ ∧ ψ ≡ ¬(¬φ ∨ ¬ψ)
4. φ ∨ ψ ≡ ¬(¬φ ∧ ¬ψ)
5. φ =⇒ ψ ≡ ¬φ ∨ ψ
6. φ ⇐⇒ ψ ≡ (φ =⇒ ψ) ∧ (ψ =⇒ φ)
Proofs?
Some Important Logical Equivalences (cont’d)
1. (∀x)φ ≡ ¬(∃x)¬φ
2. (∃x)φ ≡ ¬(∀x)¬φ
3. (∀x)¬φ ≡ ¬(∃x)φ
4. (∃x)¬φ ≡ ¬(∀x)φ
Proofs?
Some Important Logical Equivalences (cont’d)
1. (∃x)(φ ∨ ψ) ≡ (∃x)φ ∨ (∃x)ψ
2. (∃x)(φ ∧ ψ) |= (∃x)φ ∧ (∃x)ψ
3. (∀x)φ ∨ (∀x)ψ |= (∀x)(φ ∨ ψ)
4. (∀x)(φ ∧ ψ) ≡ (∀x)φ ∧ (∀x)ψ
Proofs?
An Exercise
Prove that
(∃x)(φ(x) ∧ ψ(x)) |= (∃x)φ(x) ∧ (∃x)ψ(x).
Proof: Let I be an interpretation such that
|=I (∃x)(φ(x) ∧ ψ(x)).
Then according to the definition of satisfaction for existential
statements, there exists a variable assignment s and d ∈ |I| such
that
|=I (φ(x) ∧ ψ(x))[s(x|d)].
Then according to the definition of satisfaction for conjunctive
statements, we have
|=I φ(x)[s(x|d)]
and
|=I ψ(x)[s(x|d)].
Now from the definition of satisfaction for existential statements
again, we have
|=I (∃x)φ(x)
and
|=I (∃x)ψ(x).
Now from the definition of satisfaction for conjunctive statements
we have:
|=I (∃x)φ(x) ∧ (∃x)ψ(x).
The proof is now finished.
Representing Knowledge Using FOL
Definition. In knowledge representation, a domain is a section
of the world about which we wish to express some knowledge.
• The domain of family relationships.
• The domain of sets.
• The wumpus domain.
• The domain of web resources (HTML pages, images, programs
etc. on the WWW)
• ...
Knowledge Engineering
The process of knowledge-base construction is called knowledge
engineering.
A knowledge engineer is someone who investigates a particular
domain, determines what concepts are important in that domain,
and creates a formal representation of the objects and relations in
that domain.
You will become knowledge engineers for some of the exercises!
Other Logics in Computer Science
FOL is certainly the most important logic in use today by
computer scientists. But there are others too:
• Second-order logic
• Modal logic (with operators such as “possible” and “certain”)
• Temporal logic (with operators such as “in the past”, etc.)
• Logics of knowledge and belief
• Logics for databases
• ....
Readings
Chapter 8 of AIMA: First-Order Logic
Other formal presentations of FOL can be found in:
1. M.R. Genesereth and N.J. Nilsson, “Logical Foundations of
Artificial Intelligence”, Morgan Kaufmann, 1987.
2. Any mathematical logic textbook. Most of the formal material
in these notes is from:
H.B. Enderton, “A Mathematical Introduction to Logic”,
Academic Press, 1972.
Inference in First-Order Logic
Inference (or reasoning) is the process of mechanically deriving
sentences entailed by other sentences.
We would like to find an inference mechanism i such that
KB |= α iff KB `i α
for any set of FOL sentences KB and any sentence α.
If this inference mechanism can be implemented by a program then
it could form the basis of any knowledge-based agent!
A Brief History of Reasoning
450 B.C. Stoics: propositional logic, inference (maybe)
322 B.C. Aristotle: “syllogisms” (inference rules), quantifiers
1847 Boole: propositional logic (again)
1879 Frege: first-order logic
1922 Wittgenstein: proof by truth tables
1930 Gödel: ∃ complete algorithm for proofs in FOL
1930 Herbrand: complete algorithm for proofs in FOL
(reduce to propositional)
1931 Gödel: ¬∃ complete algorithm for arithmetic proofs
1960 Davis/Putnam: “practical” algorithm for propositional logic
1965 Robinson: “practical” algorithm for FOL (resolution)
Inference rules for FOL
The following inference rules of PL are valid for FOL as well:
• Modus Ponens: from α and α =⇒ β, infer β
• And-Elimination: from α1 ∧ α2 ∧ ... ∧ αn, infer αi
• And-Introduction: from α1, α2, ..., αn, infer α1 ∧ α2 ∧ ... ∧ αn
• Or-Introduction: from αi, infer α1 ∨ α2 ∨ ... ∨ αn
• Double-Negation Elimination: from ¬¬α, infer α
• Unit Resolution: from α ∨ β and ¬β, infer α
• Resolution: from α ∨ β and ¬β ∨ γ, infer α ∨ γ
The Concept of Substitution
Definition. A substitution θ is a finite set of the form
{v1/t1, . . . , vn/tn} where
• each vi is a variable and each ti is a term distinct from vi,
• the variables v1, . . . , vn are distinct, and
• no variable vi occurs in any of the ti’s.
Each element ti is called a binding for vi. The variables with
bindings are called bound.
Definition. A substitution is called ground if the terms ti
contain no variables (i.e., they are ground terms).
The Concept of Substitution (cont’d)
The empty substitution will be denoted by {}.
Examples: The sets
{x/John, y/Mary} and {x/John, y/MotherOf(z)}
are substitutions.
The sets
{x/F (x)} and {x/G(y), y/F (x)}
are not substitutions.
Substitution (cont’d)
Definition. Let θ = {v1/t1, . . . , vn/tn} be a substitution and α
be any FOL term or formula without quantifiers. Then
SUBST (θ, α) is the expression obtained from α by replacing each
occurrence of the variable vi in α by the term ti (i = 1, . . . , n).
Example:
SUBST ({x/John, y/Mary}, Loves(x, y)) = Loves(John,Mary)
SUBST ({x/John, y/HouseOf(z)}, Likes(x, y)) =
Likes(John,HouseOf(z))
Note: SUBST ({}, α) = α for any FOL formula α.
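SUBST is a one-pass recursive replacement over a quantifier-free expression. A sketch over a tuple encoding of expressions (the encoding and the dict form of substitutions are our own assumptions):

```python
# SUBST over a tuple encoding of quantifier-free expressions (a sketch):
# ("var", v), ("const", c), ("fn"/"pred", symbol, args...). A substitution
# {v1/t1, ...} is represented as the dict {v1: t1, ...}.

def subst(theta, e):
    if e[0] == "var":
        return theta.get(e[1], e)     # replace v by its binding, if any
    if e[0] == "const":
        return e
    return (e[0], e[1]) + tuple(subst(theta, a) for a in e[2:])

loves = ("pred", "Loves", ("var", "x"), ("var", "y"))
theta = {"x": ("const", "John"), "y": ("const", "Mary")}
print(subst(theta, loves))
# ('pred', 'Loves', ('const', 'John'), ('const', 'Mary'))
print(subst({}, loves) == loves)      # SUBST({}, α) = α, so True
```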
Inference rules for FOL
The three new inference rules are the following:
• Universal Elimination: For any sentence α, variable v and
ground term g:
(∀v)α
SUBST ({v/g}, α)
Example: From (∀x)Likes(x, IceCream), we can use the
substitution {x/Ben} and infer Likes(Ben, IceCream).
Inference rules for FOL (cont’d)
• Existential Elimination: For any sentence α, variable v and
brand new constant symbol k:
(∃v)α
SUBST ({v/k}, α)
Example: From (∃x)Likes(x, IceCream), we can infer
Likes(Somebody, IceCream) as long as Somebody is a brand
new constant that has not been used before.
Inference rules for FOL (cont’d)
• Existential Introduction: For any sentence α, variable v
that does not occur in α, and ground term g that does occur in
α:
α
(∃v)SUBST ({g/v}, α)
Example: From Likes(John, IceCream), we can infer
(∃x)Likes(John, x).
An Example Proof
Let us consider the following text:
The law says that it is a crime for an American to sell
weapons to hostile nations. The country Nono, an enemy
of America, has some missiles, and all of its missiles were
sold to it by Colonel West, who is an American.
How can we formalize this text in FOL, and use the above inference
rules to conclude that West is a criminal?
Example: Formalization in FOL
• “... it is a crime for an American to sell weapons to hostile
nations”:
(∀x, y, z) (American(x) ∧ Weapon(y) ∧ Nation(z) ∧
Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x))
• “Nono ... has some missiles”:
(∃x) (Owns(Nono, x) ∧ Missile(x))
• “All of its missiles were sold to it by Colonel West”:
(∀x) (Owns(Nono, x) ∧Missile(x) =⇒ Sells(West,Nono, x))
Example: Formalization in FOL (cont’d)
• Missiles are weapons:
(∀x) (Missile(x) =⇒ Weapon(x))
• An enemy of America is a “hostile nation”:
(∀x) (Enemy(x,America) =⇒ Hostile(x))
• “West, who is an American”: American(West)
• “The country Nono ...”: Nation(Nono)
• “Nono, an enemy of America ...”:
Enemy(Nono,America), Nation(America)
Example: Proof
• From (∃x) (Owns(Nono, x) ∧ Missile(x)) and Existential
Elimination:
Owns(Nono,M1) ∧ Missile(M1)
• From Owns(Nono,M1) ∧ Missile(M1) and And-Elimination:
Owns(Nono,M1), Missile(M1)
• From (∀x) (Missile(x) =⇒ Weapon(x)) and Universal
Elimination:
Missile(M1) =⇒ Weapon(M1)
Example: Proof (cont’d)
• From Missile(M1),Missile(M1) =⇒ Weapon(M1) and
Modus Ponens:
Weapon(M1)
• From
(∀x) (Owns(Nono, x) ∧Missile(x) =⇒ Sells(West,Nono, x))
and Universal Elimination:
Owns(Nono,M1) ∧Missile(M1) =⇒ Sells(West,Nono,M1)
Example: Proof (cont’d)
• From Owns(Nono,M1) ∧ Missile(M1),
Owns(Nono,M1) ∧Missile(M1) =⇒ Sells(West,Nono,M1)
and Modus Ponens:
Sells(West,Nono,M1)
• From (∀x, y, z) (American(x) ∧ Weapon(y) ∧ Nation(z) ∧
Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x)) and Universal
Elimination (three times):
American(West) ∧ Weapon(M1) ∧ Nation(Nono) ∧
Hostile(Nono) ∧ Sells(West,Nono,M1) =⇒ Criminal(West)
Example: Proof (cont’d)
• From (∀x) (Enemy(x,America) =⇒ Hostile(x)) and
Universal Elimination:
Enemy(Nono,America) =⇒ Hostile(Nono)
• From Enemy(Nono,America),
Enemy(Nono,America) =⇒ Hostile(Nono)
and Modus Ponens:
Hostile(Nono)
Example: Proof (cont’d)
• From
American(West), Weapon(M1), Nation(Nono),
Hostile(Nono), Sells(West,Nono,M1)
and And-Introduction:
American(West) ∧ Weapon(M1) ∧ Nation(Nono) ∧
Hostile(Nono) ∧ Sells(West,Nono,M1)
Example: Proof (cont’d)
• From
American(West) ∧ Weapon(M1) ∧ Nation(Nono) ∧
Hostile(Nono) ∧ Sells(West,Nono,M1),
American(West) ∧ Weapon(M1) ∧ Nation(Nono) ∧
Hostile(Nono) ∧ Sells(West,Nono,M1) =⇒ Criminal(West)
and Modus Ponens:
Criminal(West)
Finding a Proof: a Search Problem
Finding a proof can be formalized as a search problem:
• Initial state: the initial KB.
• Operators: applicable inference rules
• Goal test: the KB contains the sentence we are trying to prove.
Then any search algorithm can be used to find a proof for a given
sentence. Unfortunately the search space is infinite!
Composition of Substitutions
Definition. Let θ1 = {u1/s1, . . . , um/sm} and
θ2 = {v1/t1, . . . , vn/tn} be substitutions such that no variable
bound in θ1 occurs anywhere in θ2. The composition
COMPOSE(θ1, θ2) of θ1 and θ2 is the substitution
{u1/SUBST (θ2, s1), . . . , um/SUBST (θ2, sm), v1/t1, . . . , vn/tn}.
Note: Other authors denote the composition of θ1 and θ2 by θ1θ2.
Examples of Composition
• Let θ1 = {x/y, z/G(w)} and θ2 = {y/A,w/D}. Then
COMPOSE(θ1, θ2) = {x/A, z/G(D), y/A,w/D}.
• Let θ1 = {x/y, z/G(w, y)} and θ2 = {y/v, w/D}. Then
COMPOSE(θ1, θ2) = {x/v, z/G(D, v), y/v, w/D}.
• Let θ1 = {x/y, z/G(w)} and θ2 = {y/x}. The composition is
not defined in this case because x is bound in θ1 and occurs in
θ2.
• Let θ1 = {x/F (y), z/y, w/D} and θ2 = {y/A, v/E}. Then
COMPOSE(θ1, θ2) = {x/F (A), z/A,w/D, y/A, v/E}.
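COMPOSE can be computed exactly as the definition states: apply θ2 to each binding of θ1, then append the bindings of θ2. A sketch reusing the same tuple/dict encodings as before (our own convention):

```python
# COMPOSE(θ1, θ2) per the definition above (a sketch over the dict
# encoding {variable: term}; terms are nested tuples).

def subst(theta, e):
    if e[0] == "var":
        return theta.get(e[1], e)
    if e[0] == "const":
        return e
    return (e[0], e[1]) + tuple(subst(theta, a) for a in e[2:])

def compose(t1, t2):
    # apply θ2 to every binding of θ1, then append the bindings of θ2
    out = {v: subst(t2, t) for v, t in t1.items()}
    out.update({v: t for v, t in t2.items() if v not in t1})
    return out

# The first example from the slide: COMPOSE({x/y, z/G(w)}, {y/A, w/D})
t1 = {"x": ("var", "y"), "z": ("fn", "G", ("var", "w"))}
t2 = {"y": ("const", "A"), "w": ("const", "D")}
print(compose(t1, t2))
# {'x': ('const', 'A'), 'z': ('fn', 'G', ('const', 'D')),
#  'y': ('const', 'A'), 'w': ('const', 'D')}
```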
Properties of Composition
Theorem. Let θ1, θ2 and θ3 be substitutions and φ a FOL
expression. Then
1. COMPOSE(θ1, {}) = COMPOSE({}, θ1) = θ1
2. SUBST (θ2, SUBST (θ1, φ)) = SUBST (COMPOSE(θ1, θ2), φ)
whenever the composition is defined.
3. COMPOSE(COMPOSE(θ1, θ2), θ3) =
COMPOSE(θ1, COMPOSE(θ2, θ3))
whenever the compositions are defined.
Unification
Definition. A set of expressions {φ1, . . . , φn} is unifiable if and
only if there is a substitution θ that makes the expressions
identical; i.e.,
SUBST (θ, φ1) = · · · = SUBST (θ, φn).
In such a case, θ is called a unifier for the set.
Examples:
• The set of expressions {P (A, y, z), P (x,B, z)} is unifiable. The
substitution {x/A, y/B, z/C} is a unifier. There are other
unifiers as well; e.g., {x/A, y/B, z/F (w)}, {x/A, y/B} etc.
• The set of expressions {P (F (x), A), P (y, F (w))} is not
unifiable.
Most General Unifiers
Definition. A most general unifier, or mgu, of a set of
expressions S is a unifier γ with the following property. For each
unifier σ of S, there exists a substitution θ such that
σ = COMPOSE(γ, θ).
Examples:
• A mgu of the set {P (A, y, z), P (x,B, z)} is {x/A, y/B}.
Notice that
{x/A, y/B, z/F (w)} = COMPOSE({x/A, y/B}, {z/F (w)}).
• A mgu of the set {P (F (x), z), P (y,A)} is {y/F (x), z/A}.
Most General Unifiers (cont’d)
Example: A mgu of P (x) and P (y) is {x/y}. Another mgu is
{y/x}.
A most general unifier is unique up to variable renaming. This is
why we usually speak of the most general unifier of a set of
expressions.
We will now present algorithm Unify which finds the most general
unifier of two input FOL expressions.
A Unification Algorithm
function Unify(x, y) returns the mgu of x and y
if x = y then return {}
if Variable(x) then return Unify-Var(x, y)
if Variable(y) then return Unify-Var(y, x)
if Constant(x) or Constant(y) then return failure
if not(Length(x)=Length(y)) then return failure
i← 0; γ ← {}
tag: if i = Length(x) then return γ
σ ← Unify(Part(x, i),Part(y, i))
if σ = failure then return failure
γ ← COMPOSE(γ, σ)
x← SUBST (γ, x)
y ← SUBST (γ, y)
i← i + 1
goto tag
A Unification Algorithm (cont’d)
function Unify-Var(x, y) returns a substitution
if x occurs in y then return failure
return { x/y }
A Unification Algorithm (cont’d)
• Unify was first presented by Robinson in 1965.
• You should use the above algorithm to compute the mgu whenever
the given expressions are complex.
• The inputs of Unify can be constants, variables, terms or atomic
formulas.
• The length of a term or an atomic formula is the number of its
arguments.
• The top-level function or relation symbol in a term or atomic
formula is its 0-th part, and the arguments are the other parts.
• The condition in the if-statement in function Unify-Var is called
the occurs-check. It ensures that terms such as z and F (z) do
not unify.
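The algorithm above, occurs-check included, can be written compactly in Python. A sketch over a tuple encoding of terms and atoms (the encoding is our own; like the pseudocode, it accumulates the mgu and applies it as it goes):

```python
# A Unify-style algorithm over tuple-encoded terms and atoms (a sketch):
# ("var", v), ("const", c), ("fn"/"pred", symbol, args...).

def subst(theta, e):
    if e[0] == "var":
        return subst(theta, theta[e[1]]) if e[1] in theta else e
    if e[0] == "const":
        return e
    return (e[0], e[1]) + tuple(subst(theta, a) for a in e[2:])

def occurs(v, e):
    if e[0] == "var":
        return e == v
    return e[0] != "const" and any(occurs(v, a) for a in e[2:])

def unify_var(v, e, theta):
    if occurs(v, e):                      # the occurs-check: z vs F(z) fails
        return None
    return {**theta, v[1]: e}

def unify(x, y, theta=None):
    """Return an mgu of x and y as a dict, or None on failure."""
    if theta is None:
        theta = {}
    x, y = subst(theta, x), subst(theta, y)
    if x == y:
        return theta
    if x[0] == "var":
        return unify_var(x, y, theta)
    if y[0] == "var":
        return unify_var(y, x, theta)
    if x[0] == "const" or y[0] == "const":
        return None
    if (x[0], x[1], len(x)) != (y[0], y[1], len(y)):
        return None                       # different symbol or arity
    for xa, ya in zip(x[2:], y[2:]):      # unify argument by argument
        theta = unify(xa, ya, theta)
        if theta is None:
            return None
    return theta

P1 = ("pred", "P", ("const", "A"), ("var", "y"))
P2 = ("pred", "P", ("var", "x"), ("const", "B"))
print(unify(P1, P2))   # {'x': ('const', 'A'), 'y': ('const', 'B')}
```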
A Unification Algorithm (cont’d)
The worst-case time complexity of Unify is exponential in the size
of the input expressions.
Example: If we unify the following two terms
H(x1, x2, . . . , xn, F (y0, y0), . . . , F (yn−1, yn−1), yn)
H(F (x0, x0), F (x1, x1), . . . , F (xn−1, xn−1), y1, . . . , yn, xn)
each xi and yi will be bound to a term with 2^(i+1) − 1 symbols.
The problem is that the mgu of these two terms contains many
duplicate copies of the same subterms.
There are better (linear time!) algorithms for unification. The
main idea in these algorithms is to use good data structures for
representing FOL expressions and to apply substitutions carefully.
Generalized Modus Ponens (GMP)
For atomic formulas pi, pi′ and q, where there is a substitution θ
such that SUBST (θ, pi′) = SUBST (θ, pi) for all i:
from p1′, p2′, . . . , pn′ and (p1 ∧ p2 ∧ · · · ∧ pn =⇒ q),
infer SUBST (θ, q).
Example: From Missile(M1), Owns(Nono,M1) and
Missile(x) ∧ Owns(Nono, x) =⇒ Sells(West,Nono, x)
we can infer Sells(West,Nono,M1).
The substitution θ in this case is {x/M1}.
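GMP is ordinary pattern matching of the premises against known facts. A sketch for the common case where the facts are ground (the tuple encoding and the lowercase-means-variable convention are our own assumptions):

```python
# Generalized Modus Ponens for ground facts (a sketch). Facts and rule
# premises are tuples like ("Missile", "M1"); by our own convention,
# an argument string starting with a lowercase letter is a variable.

def is_var(s):
    return isinstance(s, str) and s[0].islower()

def match(pattern, fact, theta):
    """Extend θ so that SUBST(θ, pattern) = fact, or return None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if theta.get(p, f) != f:     # a variable must bind consistently
                return None
            theta = {**theta, p: f}
        elif p != f:
            return None
    return theta

def gmp(premises, conclusion, facts):
    """Yield every instance of the conclusion supported by the facts."""
    def extend(theta, remaining):
        if not remaining:                # all premises matched: emit SUBST(θ, q)
            yield tuple(theta.get(a, a) for a in conclusion)
            return
        for fact in facts:
            t2 = match(remaining[0], fact, theta)
            if t2 is not None:
                yield from extend(t2, remaining[1:])
    yield from extend({}, premises)

facts = [("Missile", "M1"), ("Owns", "Nono", "M1")]
rule_premises = [("Missile", "x"), ("Owns", "Nono", "x")]
rule_conclusion = ("Sells", "West", "Nono", "x")
print(list(gmp(rule_premises, rule_conclusion, facts)))
# [('Sells', 'West', 'Nono', 'M1')]
```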
Generalized Modus Ponens (cont’d)
Comments:
• GMP does in one step what would otherwise require an
And-Introduction, Universal Elimination, and Modus-Ponens.
• GMP applies only to Horn formulas.
Literals
Definition. A literal is an atomic formula or a negated atomic
formula. In the first case we have a positive literal and in the
second a negative literal.
Examples:
Drives(John,BMW ), ¬Drives(John,BMW ), Drives(x,BMW ),
Loves(Mary, FatherOf(Mary)), ¬P (x, F (G(x)))
Horn Formulas
Definition. A FOL formula will be called Horn if it is a
disjunction of literals of which at most one is positive.
In other words, a Horn formula is in one of the following three
forms:
q
¬p1 ∨ ¬p2 ∨ . . . ∨ ¬pn ∨ q (or p1 ∧ p2 ∧ . . . ∧ pn =⇒ q)
¬p1 ∨ ¬p2 ∨ . . . ∨ ¬pn
where p1, . . . , pn, q are atomic formulas.
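The three cases can be checked mechanically. A small sketch, assuming a clause is represented as a list of (sign, atom) pairs, True for a positive literal and False for a negative one (an assumed encoding):

```python
# A sketch classifying a clause by the three Horn forms above.
# A clause is a list of (sign, atom) pairs (assumed encoding).

def horn_kind(clause):
    positives = sum(1 for sign, _ in clause if sign)
    if positives > 1:
        return None        # more than one positive literal: not Horn
    if positives == 0:
        return "query"     # ¬p1 ∨ ... ∨ ¬pn
    return "fact" if len(clause) == 1 else "rule"
```

This mirrors the terminology of the next slide: a single positive literal is a fact, a positive literal plus negative ones is a rule, and all-negative clauses serve as queries or integrity constraints.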
Horn Formulas (cont’d)
Horn formulas of the first kind are also called facts.
Horn formulas of the second kind are also called rules. In this case
q is called the head of the rule and p1 ∧ p2 ∧ . . . ∧ pn is
called the body of the rule.
Horn formulas of the last kind can be used as queries or integrity
constraints in logic programming systems.
In a Horn formula the (free) variables of p1, . . . , pn, q are
interpreted as being universally quantified in front of the formula.
If a Horn formula has exactly one positive literal, it is called
definite.
Examples
The formulas
Drives(John,BMW )
Drives(x,BMW )
Person(x) =⇒ Animal(x)
Person(x) ∧ Knows(John, x) =⇒ Loves(John, x)
¬Person(x) ∨ ¬Knows(John, x)
are Horn formulas. The first two are facts, the next two are
rules, and the last one is a query/integrity constraint.
The formula
Drives(John,BMW ) ∨ Drives(John, Porsche)
is not Horn.
Horn Formulas (cont’d)
Definition. A KB is in Horn form iff it consists only of Horn
formulas.
Can we solve our crime problem using GMP? Yes, if it is possible
to transform our KB into Horn form.
Later on, we will give a general algorithm for this transformation.
Example: Horn Formulas
• “... it is a crime for an American to sell weapons to hostile
nations”:
American(x) ∧ Weapon(y) ∧ Nation(z) ∧
Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x)
• “Nono ... has some missiles”:
Owns(Nono,M1), Missile(M1)
• “All of its missiles were sold to it by Colonel West”:
Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x)
Example (cont’d)
• Missiles are weapons:
Missile(x) =⇒ Weapon(x)
• An enemy of America is a “hostile nation”:
Enemy(x,America) =⇒ Hostile(x)
• “West, who is an American”: American(West)
• “The country Nono ...”: Nation(Nono)
• “Nono, an enemy of America ...”:
Enemy(Nono,America), Nation(America)
Example: Proof
• From Missile(M1) and Missile(x) =⇒ Weapon(x), using
GMP:
Weapon(M1)
• From Enemy(x,America) =⇒ Hostile(x) and
Enemy(Nono,America), using GMP:
Hostile(Nono)
• From Owns(Nono,M1), Missile(M1) and
Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x)
using GMP:
Sells(West,Nono,M1)
Example: Proof
• From
American(West),Weapon(M1), Nation(Nono), Hostile(Nono),
Sells(West,Nono,M1) and
American(x) ∧ Weapon(y) ∧ Nation(z) ∧
Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x),
using GMP:
Criminal(West)
This proof illustrates that reasoning using GMP is natural and
easy to follow. We will now present two important reasoning
algorithms based on GMP. Because they use GMP these algorithms
are applicable only to KBs in Horn form.
Standardization of Variables
Example:
Likes(x,Mary)
Likes(John, x) =⇒ GreetsCheerfully(John, x)
We would expect the above KB to entail:
GreetsCheerfully(John,Mary)
But GMP cannot be used to infer it because Likes(x,Mary) and
Likes(John, x) do not unify!
Standardization of Variables (cont’d)
The previous problem can be solved by standardization of variables:
1. No variable should occur in more than one formula in the
initial KB.
2. Whenever we apply GMP we rename the variables in the
formulas involved. The new names must not include any
variable names already in the KB.
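Renaming can be sketched as a simple recursive traversal that maps every variable to a fresh name. The term representation (variables as lowercase strings, compound terms as tuples) is an assumption for illustration:

```python
# A sketch of standardizing variables apart: before a formula is used,
# every variable is renamed with a fresh index not used anywhere else.
# Encoding (assumed): variables are lowercase strings, compound terms
# and atomic formulas are tuples.

import itertools

fresh = itertools.count(1)          # global supply of fresh indices

def standardize(t, mapping):
    if isinstance(t, str) and t[:1].islower():
        if t not in mapping:        # first occurrence: invent a new name
            mapping[t] = "x%d" % next(fresh)
        return mapping[t]
    if isinstance(t, tuple):
        return tuple(standardize(part, mapping) for part in t)
    return t                        # constants and symbols are unchanged
```

Renaming Likes(x, Mary) and Likes(John, x) with separate mappings gives the two formulas different variables, after which they unify as expected.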
Forward Chaining
Forward chaining is an inference method that starts with
sentences in the knowledge base, and generates new conclusions
using GMP. These conclusions in turn can allow more inferences to
be made.
The following algorithm triggers forward chaining after the
addition of a new fact p in KB:
procedure Forward-Chain(KB, p)
if there is a sentence in KB that is a renaming of p then return
Add p to KB
for each (p1 ∧ . . . ∧ pn =⇒ q) in KB such that
for some i, Unify(pi, p) = θ succeeds do
Find-And-Infer(KB, [p1, . . . , pi−1, pi+1, . . . , pn], q, θ)
end
Forward Chaining (cont’d)
procedure Find-And-Infer(KB, premises, conclusion, θ)
if premises = [ ] then
Forward-Chain(KB,SUBST (θ, conclusion))
else for each p′ in KB such that
Unify(p′, SUBST (θ,First(premises))) = θ2 do
Find-And-Infer(KB,Rest(premises),
conclusion, COMPOSE(θ, θ2))
end
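The control flow of Forward-Chain and Find-And-Infer can be sketched in the ground (propositional) case, where Unify degenerates to equality. This illustrates only the control structure, not the full first-order algorithm with substitutions; facts are ground atoms (strings) and rules are (premises, conclusion) pairs, an assumed encoding:

```python
# A propositional sketch of the Forward-Chain control flow above.

def forward_chain(rules, facts, p):
    if p in facts:                  # a renaming of p is already in the KB
        return
    facts.add(p)
    for premises, conclusion in rules:
        # A rule fires when the new fact matches one premise and the
        # remaining premises are already known (the Find-And-Infer part).
        if p in premises and all(q in facts for q in premises):
            forward_chain(rules, facts, conclusion)

facts = {"Missile(M1)"}
rules = [(("Owns(Nono,M1)", "Missile(M1)"), "Sells(West,Nono,M1)")]
forward_chain(rules, facts, "Owns(Nono,M1)")
```

After the call, facts contains Sells(West,Nono,M1), mirroring Example 1 below.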
Forward Chaining (cont’d)
Comments:
• Forward-chaining is a data-driven or data-directed
procedure.
• Forward chaining needs a policy for standardization of
variables like the one we discussed for GMP. This policy is
implicit in the previous algorithm.
Forward Chaining: Example 1
Let KB be
Missile(M1)
Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x)
and p be Owns(Nono,M1).
Forward-Chain(KB, p) will work as follows:
• Forward-Chain adds Owns(Nono,M1) to KB.
• Owns(Nono,M1) unifies with the first premise of the rule
Owns(Nono, x1) ∧ Missile(x1) =⇒ Sells(West,Nono, x1)
with mgu θ = {x1/M1} (note the standardization of variables!).
Find-And-Infer(KB, [Missile(M1)], Sells(West,Nono, x1), {x1/M1})
is called now.
Example 1 (cont’d)
• First([Missile(M1)])=Missile(M1) matches with
Missile(M1) with mgu θ2 = {} so
Find-And-Infer(KB, [ ], Sells(West,Nono, x1), {x1/M1}) is
called.
• In this call of Find-And-Infer we have premises = [ ] thus
Forward-Chain(KB,Sells(West,Nono,M1)) is called.
• Forward-Chain adds Sells(West,Nono,M1) to the KB. No
unifications are possible now, thus the procedure returns.
Example 2
Let KB be
Q(x) =⇒ Q(F (x))
and p be Q(A).
Forward-Chain(KB, p) will work as follows:
• Forward-Chain adds Q(A) to KB.
• Q(A) unifies with the premise of the rule
Q(x1) =⇒ Q(F (x1))
with mgu θ = {x1/A} (note the standardization of variables!).
Find-And-Infer(KB, [ ], Q(F (x1)), {x1/A}) is called now.
• In this call of Find-And-Infer we have premises = [ ] thus
Forward-Chain(KB,Q(F (A))) is called.
Example 2 (cont’d)
• Forward-Chain adds Q(F (A)) to the KB. Q(F (A)) unifies
with the premise of the rule
Q(x2) =⇒ Q(F (x2))
with mgu θ = {x2/A} (note the standardization of variables!).
At this point the algorithm goes into an infinite loop! It will keep
on adding to KB the following facts:
Q(F (F (A))), Q(F (F (F (A)))), . . .
Backward Chaining
Backward chaining is an inference method which starts with
something that we want to prove, finds implications that would
allow us to conclude it, and then attempts to establish their
premises in turn.
The following algorithm Back-Chain takes as input a knowledge
base KB and an atomic formula q (with free variables), and
returns a set of substitutions θ such that SUBST (θ, q) can be
inferred, using GMP, from the formulas in KB.
function Back-Chain(KB, q) returns a set of substitutions
Back-Chain-List(KB, [q], {})
Backward Chaining (cont’d)
function Back-Chain-List(KB, qlist, θ) returns a set of substitutions
inputs: KB, a knowledge base,
qlist, a list of conjuncts forming a query (θ already applied)
θ the current substitution
local: answers, a set of substitutions, initially empty
if qlist is empty then return {θ}
q ← First(qlist)
for each q′i in KB such that θi ← Unify(q, q′i) succeeds do
Add COMPOSE(θ, θi) to answers
end
Backward Chaining (cont’d)
for each sentence (p1 ∧ . . . ∧ pn =⇒ q′i) in KB
such that θi ← Unify(q, q′i) succeeds do
answers← answers ∪
Back-Chain-List (KB, SUBST (θi, [p1, . . . , pn]), COMPOSE(θ, θi))
end
return the union of Back-Chain-List (KB, SUBST (θ,(Rest(qlist))), θ)
for each θ ∈ answers
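The control flow of Back-Chain-List can also be sketched in the ground (propositional) case: facts are ground atoms, rules are (premises, conclusion) pairs, and instead of a set of substitutions we simply report whether the goal list is provable. The encoding is an assumption for illustration, and, like the pseudocode, this sketch can loop forever on cyclic rules (see Example 4 below):

```python
# A propositional sketch of Back-Chain-List's control flow.

def back_chain(facts, rules, goals):
    if not goals:
        return True                   # qlist is empty: success
    q, rest = goals[0], goals[1:]
    if q in facts and back_chain(facts, rules, rest):
        return True                   # q matches a fact in the KB
    # otherwise try every rule whose conclusion matches q
    return any(
        back_chain(facts, rules, list(premises) + rest)
        for premises, conclusion in rules
        if conclusion == q
    )
```

On the crime KB this proves Sells(West,Nono,M1) by reducing it to the subgoals Owns(Nono,M1) and Missile(M1), as in Example 3 below.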
Backward Chaining (cont’d)
Comments:
• Backward chaining needs a policy for standardization of
variables like the one we discussed for GMP. This policy is
implicit in the previous algorithm.
• The call BackChain(KB, q) can be used to find all answers to
a query q posed to any knowledge base KB.
The answer to a query q is a set of substitutions which is
formed by considering all substitutions returned by
Back-Chain, and keeping only the bindings for the variables
of q.
All logic programming languages (e.g., Prolog) and all
deductive database systems are based on this observation.
Backward Chaining: Example 1
Let KB be
Q(A), Q(B)
and let the query q be Q(x).
Back-Chain(KB, q) will work as follows:
• Back-Chain calls Back-Chain-List with arguments
KB, [Q(x)] and {}.
• Q(x) unifies with Q(A) with mgu {x/A}, and Q(B) with mgu
{x/B}. Thus variable answers is assigned the set
{{x/A}, {x/B}}.
Example 1 (cont’d)
• The last line of Back-Chain-List generates the calls
Back-Chain-List(KB, [ ], {x/A}) and
Back-Chain-List(KB, [ ], {x/B}).
The first of the above calls returns {{x/A}} and the second
returns {{x/B}}. Thus finally the function returns the result
{{x/A}, {x/B}}.
This list of substitutions is the answer to query Q(x).
Example 2
Let KB be
Q(A), Q(B), R(C), R(D)
and qlist be [Q(x), R(y)].
Then Back-Chain-List(KB, qlist, {}) will work as follows:
• Q(x) unifies with Q(A) with mgu {x/A}, and Q(B) with mgu
{x/B}. Thus variable answers is assigned the set
{{x/A}, {x/B}}.
Example 2 (cont’d)
• The last line of Back-Chain-List generates the calls
Back-Chain-List(KB, [R(y)], {x/A}) and
Back-Chain-List(KB, [R(y)], {x/B}).
The first of the above calls returns
{{x/A, y/C}, {x/A, y/D}}
and the second returns
{{x/B, y/C}, {x/B, y/D}}.
Thus finally the function returns the result
{{x/A, y/C}, {x/A, y/D}, {x/B, y/C}, {x/B, y/D}}.
Example 2 (cont’d)
Let us now see why Back-Chain-List(KB, [R(y)], {x/A}) returns
{{x/A, y/C}, {x/A, y/D}}.
We can deal similarly with the call
Back-Chain-List(KB, [R(y)], {x/B})
• R(y) unifies with R(C) with mgu {y/C}, and R(D) with
mgu {y/D}. Thus variable answers is assigned the set
{{x/A, y/C}, {x/A, y/D}}.
Example 2 (cont’d)
• The last line of Back-Chain-List generates the calls
Back-Chain-List(KB, [ ], {x/A, y/C}) and
Back-Chain-List(KB, [ ], {x/A, y/D}).
The first of the above calls returns {{x/A, y/C}} and the
second returns {{x/A, y/D}}. Thus finally the function returns
the result
{{x/A, y/C}, {x/A, y/D}}.
Example 3
Let KB be
Missile(M1), Owns(Nono,M1),
Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x)
and the query q be Sells(West,Nono, y).
Back-Chain(KB, q) will work as follows:
• Back-Chain calls Back-Chain-List with arguments
KB, [Sells(West,Nono, y)] and {}.
• Sells(West,Nono, y) does not unify with any atomic formula in
KB. However it unifies with the conclusion of the rule
Owns(Nono, x1) ∧ Missile(x1) =⇒ Sells(West,Nono, x1)
with substitution θi = {y/x1} (note the standardization of
variables!).
Example 3 (cont’d)
• Variable answers is assigned its old value (empty set) union
the value returned by
Back-Chain-List(KB, [Owns(Nono, x1),Missile(x1)], {y/x1}).
This call returns {{y/M1, x1/M1}}.
• The last line of Back-Chain-List calls Back-Chain-List
again with arguments KB, [ ] and {y/M1, x1/M1}. This call
returns {{y/M1, x1/M1}} therefore the final result is
{{y/M1, x1/M1}}.
Thus the answer to the query Sells(West,Nono, y) is
{{y/M1}}
Example 3 (cont’d)
Let us now see why
Back-Chain-List(KB, [Owns(Nono, x1),Missile(x1)], {y/x1})
returns {{y/M1, x1/M1}}.
• Owns(Nono, x1) unifies with Owns(Nono,M1) with
substitution θi = {x1/M1}. Thus
COMPOSE({y/x1}, {x1/M1}) = {y/M1, x1/M1} is added to
answers (currently empty).
• The last line of Back-Chain-List calls Back-Chain-List
again with arguments KB, [Missile(M1)] and {y/M1, x1/M1}.
This call returns {{y/M1, x1/M1}} as we can see easily. Thus
this is also the result of
Back-Chain-List(KB, [Owns(Nono, x1),Missile(x1)], {y/x1}).
Example 4
Let KB be
p(3) =⇒ p(3)
and the query q be p(x).
Back-Chain(KB, q) will work as follows:
• Back-Chain calls Back-Chain-List with arguments
KB, [p(x)] and {}.
• p(x) unifies with the conclusion of the rule
p(3) =⇒ p(3)
with mgu θi = {x/3}. Then Back-Chain-List is called again
with arguments KB, [p(3)] and {x/3}.
Example 4 (cont’d)
• p(3) unifies with the conclusion of the rule
p(3) =⇒ p(3)
with mgu θi = {x/3}. Then Back-Chain-List is called again
with arguments KB, [p(3)] and {x/3}. At this point the
algorithm goes into an infinite loop!
This example shows that “bad” knowledge bases can make
algorithm Back-Chain go into an infinite loop.
Proof Trees for Backward Chaining
Example: (slightly modified this time!)
• “... it is a crime for an American to sell weapons to hostile
nations”:
American(x) ∧ Weapon(y) ∧
Sells(x, y, z) ∧ Hostile(z) =⇒ Criminal(x)
• Missiles are weapons:
Missile(x) =⇒ Weapon(x)
• “Nono ... has some missiles”:
Owns(Nono,M1), Missile(M1)
Proof Trees (cont’d)
• “All of its missiles were sold to it by Colonel West”:
Missile(x) ∧ Owns(Nono, x) =⇒ Sells(West,Nono, x)
• An enemy of America is a “hostile nation”:
Enemy(x,America) =⇒ Hostile(x)
• “West, who is an American”: American(West)
• “Nono, an enemy of America ...”:
Enemy(Nono,America)
Proof Trees (cont’d)
[Proof tree (figure): Criminal(West) at the root, with subgoals
American(West) (mgu { }), Weapon(y) solved via Missile(y) with
{y/M1}, Sells(West,M1,z) solved via Missile(M1) and Owns(Nono,M1)
with {z/Nono}, and Hostile(Nono) solved via Enemy(Nono,America).]
Soundness and Completeness of GMP
An inference rule i is called sound if it derives only sentences that
are entailed. In other words, if KB `i α then KB |= α.
An inference mechanism is called complete if it derives all the
sentences that are entailed. In other words, if KB |= α then
KB `i α.
Is GMP a sound and complete inference rule for FOL?
Soundness and Completeness of GMP (cont’d)
Theorem. GMP is a sound inference rule. Proof?
Example:
(∀x) P (x) =⇒ Q(x)
(∀x) P (x) ∨ R(x)
(∀x) Q(x) =⇒ S(x)
(∀x) R(x) =⇒ S(x)
The above KB entails S(A) but GMP will not be able to infer it.
Thus GMP is an incomplete inference rule for FOL.
Readings
• AIMA, Chapter 9.
Sound and Complete Inference Rules in FOL
An inference procedure i is called sound if
KB |= α whenever KB `i α
An inference procedure i is called complete if
KB `i α whenever KB |= α
Generalised Modus-Ponens (equivalently, forward or backward
chaining) is sound and complete for Horn KBs but incomplete
for general first-order logic.
Example
Let us consider the following formulas:
PhD(x) =⇒ HighlyQualified(x)
¬PhD(x) =⇒ EarlyEarnings(x)
HighlyQualified(x) =⇒ Rich(x)
EarlyEarnings(x) =⇒ Rich(x)
From the above we should be able to infer Rich(Me), but GMP
won’t do it!
Is there a complete inference procedure for FOL?
The Resolution Inference Rule
Basic propositional version:
α ∨ β, ¬β ∨ γ
α ∨ γ
or equivalently
¬α =⇒ β, β =⇒ γ
¬α =⇒ γ
The Resolution Inference Rule - FOL version
p1 ∨ . . . pj . . . ∨ pm, q1 ∨ . . . qk . . . ∨ qn
SUBST (σ, (p1 ∨ . . . pj−1 ∨ pj+1 . . . pm ∨ q1 . . . qk−1 ∨ qk+1 . . . ∨ qn))
where UNIFY (pj ,¬qk) = σ.
Note: σ is the most general unifier (MGU) of pj and ¬qk. The literals pj
and qk are called complementary literals because each one unifies with
the negation of the other. The resulting disjunction is called a resolvent.
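A single resolution step is easy to sketch in the propositional case, representing a clause as a frozenset of (sign, atom) literals (an assumed encoding); the first-order rule additionally needs the mgu σ of the complementary pair:

```python
# A sketch of one propositional resolution step: resolve on each
# complementary pair of literals and collect the resolvents.

def resolve(c1, c2):
    resolvents = set()
    for sign, atom in c1:
        if (not sign, atom) in c2:   # complementary literals
            resolvents.add((c1 - {(sign, atom)}) | (c2 - {(not sign, atom)}))
    return resolvents

# (α ∨ β) and (¬β ∨ γ) resolve to (α ∨ γ):
c1 = frozenset({(True, "alpha"), (True, "beta")})
c2 = frozenset({(False, "beta"), (True, "gamma")})
```

Here resolve(c1, c2) returns the single resolvent {alpha, gamma}, i.e., α ∨ γ.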
Examples
¬Rich(x) ∨ Unhappy(x), Rich(Me)
Unhappy(Me)
with MGU σ = {x/Me}
Examples (cont’d)
PhD(x) =⇒ HighlyQualified(x)
¬PhD(x) =⇒ EarlyEarnings(x)
HighlyQualified(x) =⇒ Rich(x)
EarlyEarnings(x) =⇒ Rich(x)
Let us try resolution to infer Rich(Me)!
The standard way of showing that KB ` φ by resolution is to add
¬φ to the KB and show that we can reach the empty clause by
repeated application of the resolution rule.
In our case, we add ¬Rich(Me).
Examples (cont’d)
Let us first write all our formulas as disjunctions:
¬PhD(x) ∨ HighlyQualified(x)
PhD(x) ∨ EarlyEarnings(x)
¬HighlyQualified(x) ∨ Rich(x)
¬EarlyEarnings(x) ∨ Rich(x)
¬Rich(Me)
Now we can apply resolution repeatedly.
Examples (cont’d)
From
¬Rich(Me)
and
¬HighlyQualified(z) ∨ Rich(z)
with MGU σ = {z/Me}, we infer
¬HighlyQualified(Me).
Examples (cont’d)
From
¬Rich(Me)
and
¬EarlyEarnings(w) ∨ Rich(w)
using MGU σ = {w/Me}, we infer
¬EarlyEarnings(Me).
Examples (cont’d)
From
¬PhD(x) ∨ HighlyQualified(x)
and
PhD(y) ∨ EarlyEarnings(y)
with MGU σ = {x/y}, we infer
HighlyQualified(y) ∨ EarlyEarnings(y).
Examples (cont’d)
From
HighlyQualified(v) ∨ EarlyEarnings(v)
and
¬EarlyEarnings(Me)
using MGU σ = {v/Me}, we infer
HighlyQualified(Me).
Examples (cont’d)
From
HighlyQualified(Me)
and
¬HighlyQualified(Me)
using MGU σ = {}, we infer the empty clause. Thus we have
reached a contradiction!
Conjunctive Normal Form
To be able to do resolution, the given formulas have to be in
conjunctive normal form.
Definition. A literal is an atomic formula or the negation of an
atomic formula. An atomic formula is also called a positive
literal, and the negation of an atomic formula is called a negative
literal. A clause is a disjunction of literals. There is a special
clause called empty which is equivalent to false.
Definition. A FOL formula is in conjunctive normal form
(CNF) if it is a conjunction of disjunctions of literals (equivalently
if it is a set of clauses).
Proposition. Every FOL formula can be converted to a set of clauses
that is satisfiable iff the formula is. (The conversion below uses
Skolemization, which preserves satisfiability but not equivalence.)
Conversion to CNF
1. Eliminate equivalences and implications using the laws:
(φ ⇐⇒ ψ) ≡ (φ =⇒ ψ ∧ ψ =⇒ φ)
φ =⇒ ψ ≡ ¬φ ∨ ψ
2. Move ¬ inwards using the equivalences
¬(φ ∨ ψ) ≡ ¬φ ∧ ¬ψ
¬(φ ∧ ψ) ≡ ¬φ ∨ ¬ψ
¬(∀x)φ ≡ (∃x)¬φ
¬(∃x)φ ≡ (∀x)¬φ
¬¬φ ≡ φ
Conversion to CNF (cont’d)
3. Rename variables so that each quantifier has a unique
variable.
4. Eliminate existential quantifiers (Skolemization).
If an existential quantifier does not occur in the scope of a
universal quantifier, we simply drop the quantifier and replace
all occurrences of the quantified variable by a new constant
called a Skolem constant.
If an existential quantifier ∃x is within the scope of universal
quantifiers ∀y1, . . . , ∀yn, we drop the quantifier and replace all
occurrences of the variable x by the term f(y1, . . . , yn),
where f is a new function symbol called a Skolem function.
Conversion to CNF (cont’d)
5. Drop all universal quantifiers.
6. Distribute ∧ over ∨ using the equivalence
(φ ∧ ψ) ∨ θ ≡ (φ ∨ θ) ∧ (ψ ∨ θ)
7. Flatten nested conjunctions or disjunctions. Then write each
disjunction on a separate line and standardize variables apart
(i.e., make sure disjunctions use different variables).
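Step 6 can be sketched on quantifier-free formula trees, where a formula is either a literal (a string) or a tuple ("and"/"or", left, right); the representation is assumed for illustration:

```python
# A sketch of step 6, distributing ∧ over ∨ on formula trees.

def distribute(f):
    if isinstance(f, str):
        return f
    op, left, right = f
    left, right = distribute(left), distribute(right)
    if op == "or":
        # (φ ∧ ψ) ∨ θ  ≡  (φ ∨ θ) ∧ (ψ ∨ θ)   (and symmetrically)
        if isinstance(left, tuple) and left[0] == "and":
            return ("and", distribute(("or", left[1], right)),
                           distribute(("or", left[2], right)))
        if isinstance(right, tuple) and right[0] == "and":
            return ("and", distribute(("or", left, right[1])),
                           distribute(("or", left, right[2])))
    return (op, left, right)
```

For instance, distributing over ¬P ∨ (Q ∧ ¬R), written as ("or", "notP", ("and", "Q", "notR")), yields ("and", ("or", "notP", "Q"), ("or", "notP", "notR")), i.e., (¬P ∨ Q) ∧ (¬P ∨ ¬R), as in step 6 of the worked example that follows.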
Example
Let us convert to CNF the following sentence:
(∀x)((∀y)P (x, y) =⇒ ¬(∀y)(Q(x, y) =⇒ R(x, y)))
1. Eliminate implications:
(∀x)(¬(∀y)P (x, y) ∨ ¬(∀y)(¬Q(x, y) ∨ R(x, y)))
2. Move ¬ inwards:
(∀x)((∃y)¬P (x, y) ∨ (∃y)(Q(x, y) ∧ ¬R(x, y)))
Example (cont’d)
3. Rename variables:
(∀x)((∃y)¬P (x, y) ∨ (∃z)(Q(x, z) ∧ ¬R(x, z)))
4. Skolemize:
(∀x)(¬P (x, F1(x)) ∨ (Q(x, F2(x)) ∧ ¬R(x, F2(x))))
5. Drop universal quantifiers:
¬P (x, F1(x)) ∨ (Q(x, F2(x)) ∧ ¬R(x, F2(x)))
6. Distribute ∧ over ∨ :
(¬P (x, F1(x)) ∨ Q(x, F2(x))) ∧ (¬P (x, F1(x)) ∨ ¬R(x, F2(x)))
7. Final form:
¬P (x, F1(x)) ∨ Q(x, F2(x))
¬P (x, F1(x)) ∨ ¬R(x, F2(x))
Resolution: Soundness and Refutation-Completeness
Theorem. (Soundness)
Let KB be a knowledge base. If φ can be proved from KB using
resolution then KB |= φ.
Theorem. (Refutation-completeness)
If a set ∆ of clauses is unsatisfiable then resolution will derive the
empty clause from ∆.
Note: The above theorem holds only if ∆ does not involve equality.
Methodology: If we are asked to prove KB |= α then we negate α
and show that KB ∧ ¬α is unsatisfiable using resolution.
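The methodology can be sketched as a saturation loop in the ground (propositional) case, again representing a clause as a frozenset of (sign, atom) literals (an assumed encoding): add the negated goal to the clause set and resolve until the empty clause appears or nothing new can be derived.

```python
# A sketch of resolution refutation by saturation (ground case only).

def resolve(c1, c2):
    out = set()
    for sign, atom in c1:
        if (not sign, atom) in c2:
            out.add((c1 - {(sign, atom)}) | (c2 - {(not sign, atom)}))
    return out

def refute(clauses):
    clauses = set(clauses)
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for r in resolve(c1, c2):
                    if not r:
                        return True       # derived the empty clause
                    new.add(r)
        if new <= clauses:
            return False                  # saturated: no contradiction found
        clauses |= new
```

With the Rich(Me) clauses of the earlier example grounded at Me, plus the negated goal ¬Rich(Me), refute reaches the empty clause and returns True.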
Example 1
The crime example we saw in a previous lecture:
The law says that it is a crime for an American to sell
weapons to hostile nations. The country Nono, an enemy
of America, has some missiles, and all of its missiles were
sold to it by Colonel West, who is an American.
Use resolution to conclude that West is a criminal.
Example 1: Formalization in FOL
• “... it is a crime for an American to sell weapons to hostile nations”:
(∀x, y, z) (American(x) ∧ Weapon(y) ∧ Nation(z) ∧
Hostile(z) ∧ Sells(x, z, y) =⇒ Criminal(x))
• “Nono ... has some missiles”:
(∃x) (Owns(Nono, x) ∧ Missile(x))
• “All of its missiles were sold to it by Colonel West”:
(∀x) (Owns(Nono, x) ∧ Missile(x) =⇒ Sells(West,Nono, x))
Example 1: Formalization in FOL (cont’d)
• Missiles are weapons:
(∀x) (Missile(x) =⇒ Weapon(x))
• An enemy of America is a “hostile nation”:
(∀x) (Enemy(x,America) =⇒ Hostile(x))
• “West, who is an American”: American(West)
• “The country Nono ...”: Nation(Nono)
• “Nono, an enemy of America ...”:
Enemy(Nono,America), Nation(America)
Example 1: CNF form
• “... it is a crime for an American to sell weapons to hostile
nations”:
¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z)∨
¬Hostile(z) ∨ Criminal(x)
• “Nono ... has some missiles”:
Owns(Nono,M1), Missile(M1)
• “All of its missiles were sold to it by Colonel West”:
¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sells(West, x,Nono)
Example 1: CNF form (cont’d)
• Missiles are weapons:
¬Missile(x) ∨ Weapon(x)
• An enemy of America is a “hostile nation”:
¬Enemy(x,America) ∨Hostile(x)
• “West, who is an American”:
American(West)
• “The country Nono ...”:
Nation(Nono)
• “Nono, an enemy of America ...”:
Enemy(Nono,America), Nation(America)
Example 1: Proof
[Resolution proof (figure): resolving the clause ¬American(x) ∨
¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x)
successively with American(West); with Weapon(M1), obtained from
¬Missile(x) ∨ Weapon(x) and Missile(M1); with Sells(West,M1,Nono),
obtained from ¬Missile(x) ∨ ¬Owns(Nono, x) ∨ Sells(West, x,Nono),
Missile(M1) and Owns(Nono,M1); and with Hostile(Nono), obtained
from ¬Enemy(x,America) ∨ Hostile(x) and Enemy(Nono,America),
yields Criminal(West).]
Example 2
Let us assume that we know the following:
Everyone who loves animals is loved by someone.
Anyone who kills an animal is loved by no one.
Jack loves all animals.
Either Jack or Curiosity killed the cat, who is named Tuna.
From the above facts, can we prove that Curiosity killed Tuna?
Example 2: Formalization in FOL
• Everyone who loves animals is loved by someone.
(∀x)((∀y)(Animal(y) =⇒ Loves(x, y)) =⇒ (∃y)Loves(y, x) )
• Anyone who kills an animal is loved by no one.
(∀x)((∃y)(Animal(y) ∧ Kills(x, y)) =⇒ (∀z)¬Loves(z, x))
• Jack loves all animals.
(∀x)(Animal(x) =⇒ Loves(Jack, x))
Example 2: Formalization in FOL
• Either Jack or Curiosity killed the cat ...
Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna)
• ... who is named Tuna.
Cat(Tuna)
We will also need the formula
(∀x)(Cat(x) =⇒ Animal(x))
which is background knowledge.
The negation of the formula to be proved is:
¬Kills(Curiosity, Tuna)
Example 2: Proof
[Resolution refutation (figure): ¬Kills(Curiosity, Tuna) resolves
with Kills(Jack, Tuna) ∨ Kills(Curiosity, Tuna) to give
Kills(Jack, Tuna); Cat(Tuna) and ¬Cat(x) ∨ Animal(x) give
Animal(Tuna); with ¬Animal(z) ∨ ¬Kills(x, z) ∨ ¬Loves(y, x) these
give ¬Loves(y, Jack); meanwhile ¬Animal(x) ∨ Loves(Jack, x) and
the Skolemized clauses Animal(F(x)) ∨ Loves(G(x), x) and
¬Loves(x, F(x)) ∨ Loves(G(x), x) give Loves(G(Jack), Jack),
which contradicts ¬Loves(y, Jack), yielding the empty clause.]
Resolution, Validity and Unsatisfiability
Questions:
• How do we use resolution to prove that the following formula is
valid?
Happy(John) ∨ ¬Happy(John)
• How do we use resolution to prove that the following formula is
unsatisfiable?
Happy(John) ∧ ¬Happy(John)
Fill-in-the-Blank Questions
So far we have used resolution to see that something follows from a
KB. We can also use resolution to answer questions about facts
that follow from a KB. In the previous example, we can use
resolution to find the answer to the question: Who killed Tuna?
This can be expressed using a free variable and writing the
fill-in-the-blank query Kills(x, Tuna).
Definition. An answer literal for a fill-in-the-blank query φ is an
atomic formula of the form Ans(v1, . . . , vn) where the variables
v1, . . . , vn are the free variables in φ.
Fill-in-the-Blank Questions (cont’d)
To answer the fill-in-the-blank query φ we form the disjunction
Ans(v1, . . . , vn) ∨ ¬φ
and convert it to CNF.
Then we use resolution and terminate our search when we reach a
clause containing only answer literals (instead of terminating when
we reach the empty clause).
Fill-in-the-Blank Questions (cont’d)
For fill-in-the-blank questions, we can have:
• Termination with a clause which is a single answer literal
Ans(c1, . . . , cn). In this case, the constants c1, . . . , cn give us
an answer to the query. There might be more answers
depending on whether there are more resolution refutations of
Ans(v1, . . . , vn) ∨ ¬φ. We can go on looking for more answers
but we can never be sure that we have found them all.
• Termination with a clause which is a disjunction of more than
one answer literal. In this case, one of the answer literals
contains the answer but we cannot say which one for sure.
Dealing with Equality
If we want to use equality in our resolution proofs, we can do it in
two ways:
• Add appropriate formulas that axiomatize equality in our
KB. What are these formulas?
• Use special versions of resolution that take equality into
account (e.g., paramodulation).
The same is true for other special predicates such as arithmetic
ones <, ≤ etc.
Computational Complexity and Resolution
Resolution proofs can in general be exponentially long as the
following theorem demonstrates.
Theorem (Haken, 1985). There is a sequence of PL formulas
p1, p2, p3, . . ., each a tautology, such that the number of symbols of
¬pn when converted to CNF is O(n^3), but the shortest resolution
refutation of it contains at least c^n symbols (for a fixed c > 1).
There are various strategies that can be applied to make resolution
more efficient (unit preference, set of support, input resolution,
subsumption).
Other Normal Forms: DNF
Definition. A FOL formula is in disjunctive normal form
(DNF) if it is a disjunction of conjunctions of literals.
Proposition. Every FOL formula is equivalent to a formula in
DNF.
Other Normal Forms: PNF
Definition. A FOL formula is in prenex normal form (PNF) if
all its quantifiers appear at the front of the formula.
Proposition. Every FOL formula is equivalent to a formula in
PNF.
Conversion to Prenex Normal Form
• Steps 1 and 2 of conversion to CNF.
• Move quantifiers to the front of the formula using the
equivalences
(∀x)(φ ∧ ψ) ≡ (∀x)φ ∧ ψ
(∀x)(φ ∨ ψ) ≡ (∀x)φ ∨ ψ
(∃x)(φ ∧ ψ) ≡ (∃x)φ ∧ ψ
(∃x)(φ ∨ ψ) ≡ (∃x)φ ∨ ψ
The above equivalences hold only if x does not appear free in ψ.
Step 1 and 2 are not necessary if we introduce equivalences for the
rest of the connectives.
A Brief History of Reasoning
450 B.C.  Stoics        propositional logic, inference (maybe)
322 B.C.  Aristotle     “syllogisms” (inference rules), quantifiers
1847      Boole         propositional logic (again)
1879      Frege         first-order logic
1922      Wittgenstein  proof by truth tables
1930      Gödel         ∃ complete algorithm for proofs in FOL
1930      Herbrand      complete algorithm for proofs in FOL
                        (reduce to propositional)
1931      Gödel         ¬∃ complete algorithm for arithmetic proofs
1960      Davis/Putnam  “practical” algorithm for propositional logic
1965      Robinson      “practical” algorithm for FOL: resolution
Soundness and Completeness of FOL Inference
Theorem. (Gödel, 1930)
KB |= φ iff KB ` φ.
Theorem. Checking entailment (equivalently: validity or
unsatisfiability or provability) of a FOL formula is a recursively
enumerable problem.
Informal Definitions
A yes/no problem P is called recursive or decidable if there is an
algorithm that, given input x, outputs “yes” and terminates
whenever x ∈ P , and outputs “no” and terminates whenever x ∉ P .
A yes/no problem P is called recursively enumerable or
semi-decidable if there is an algorithm that, given input x,
outputs “yes” and terminates whenever x ∈ P , but computes forever
when x ∉ P .
Such an algorithm is not very useful in practice because, if it has
not terminated, we cannot know for sure whether we have waited long
enough to get an answer.
Gödel’s Incompleteness Theorem
Theorem. (Gödel, 1931)
For any set A of true sentences of number theory, and, in particular,
any set of basic axioms, there are other true sentences of
number theory that cannot be proved from A.
Sad conclusion: We can never prove all the theorems of
mathematics within any given system of axioms.
Soundness and Completeness (cont’d)
Theorem. (Herbrand, 1930)
If a finite set ∆ of clauses is unsatisfiable then the Herbrand base of
∆ is unsatisfiable.
Theorem. (Robinson, 1965)
Soundness of Resolution. If there is a resolution refutation of a
clause φ from a set of clauses KB then KB |= φ.
Theorem. (Robinson, 1965)
Completeness of Resolution. If a set of clauses KB is
unsatisfiable then there is a resolution refutation of the empty
clause from KB.
Soundness and Completeness (cont’d)
Question: How can we use a complete proof procedure to
determine whether a sentence φ is entailed by a set of sentences
KB?
Answer: We can negate φ, add it to KB and then use resolution.
But we will not know whether KB |= φ until resolution finds a
contradiction and returns.
While resolution has not returned, we do not know whether the
system has gone into a loop or a proof is about to be found.
Some Good News
There are many interesting subsets of FOL that are decidable (e.g.,
monadic logic, Horn logic etc.).
Many practical problems can be encoded in these subsets!
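Propositional Horn logic, for example, can be decided by simple forward chaining, which always terminates because only finitely many atoms can ever be derived. A minimal Python sketch (the rule set is invented for illustration):

```python
def horn_entails(facts, rules, goal):
    """Forward chaining for propositional Horn clauses.
    rules: list of (body_atoms, head_atom) pairs."""
    known = set(facts)
    changed = True
    while changed:                  # terminates: finitely many atoms
        changed = False
        for body, head in rules:
            if head not in known and set(body) <= known:
                known.add(head)
                changed = True
    return goal in known

rules = [(['rains'], 'wet'), (['wet', 'cold'], 'ice')]
print(horn_entails({'rains', 'cold'}, rules, 'ice'))   # True
print(horn_entails({'rains'}, rules, 'ice'))           # False
```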
Knowledge-Based Agents
function KB-Agent(percept) returns an action
  static: KB, a knowledge base
          t, a counter, initially 0, indicating time
  Tell(KB, Make-Percept-Sentence(percept, t))
  action ← Ask(KB, Make-Action-Query(t))
  Tell(KB, Make-Action-Sentence(action, t))
  t ← t + 1
  return action
Using the FOL machinery we presented, how can we implement
knowledge-based agents?
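A direct transcription of this pseudocode into Python might look as follows. The Tell/Ask interface here is a placeholder sketch (a real agent would back Ask with a theorem prover over the FOL machinery above; the fixed 'noop' action is our own stand-in):

```python
class KBAgent:
    """Skeleton of the generic knowledge-based agent; the KB is
    just a set of sentence strings with a trivial Ask."""
    def __init__(self):
        self.kb = set()       # the knowledge base
        self.t = 0            # time counter

    def tell(self, sentence):
        self.kb.add(sentence)

    def ask_action(self, t):
        # Placeholder inference: a real agent would derive the best
        # action from the KB; here we just return a fixed action.
        return 'noop'

    def agent_program(self, percept):
        self.tell(f'Percept({percept}, {self.t})')
        action = self.ask_action(self.t)
        self.tell(f'Action({action}, {self.t})')
        self.t += 1
        return action

agent = KBAgent()
print(agent.agent_program('breeze'))   # 'noop'
```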
Logical Reasoning Systems
• Logic programming languages (most notably Prolog).
Prolog was developed in 1972 by Alain Colmerauer and it is based on the
idea of backward chaining. Prolog’s motto (after Kowalski) is:
Algorithm = Logic + Control
Logic programming and Prolog were the basis of much exciting research and
development in the 70’s and 80’s.
Logic programming and its extensions are still a very lively area of research
that has been applied in many areas (databases, natural language processing,
expert systems etc.). Of particular importance is constraint logic
programming (CLP), which integrates logic programming with CSPs. CLP has
been used with success in many combinatorial optimisation
applications (e.g., scheduling, planning, etc.).
See www.afm.sbu.ac.uk/logic-prog/ for various Prolog implementations.
Logical Reasoning Systems (cont’d)
• Production systems based on the idea of forward-chaining (where
the conclusion of an implication is interpreted as an action to be
executed).
Production systems were used a lot in early AI work (particularly in
rule-based expert systems).
There are various implemented production systems such as OPS5 or
CLIPS (see http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/
areas/expert/systems/clips/0.html).
Logical Reasoning Systems (cont’d)
• Theorem provers are more powerful tools than Prolog since they can
deal with full first-order logic.
Examples: OTTER, PTTP, etc.
Theorem provers have come up with novel mathematical results (lattice
theory, a formal proof of Gödel’s incompleteness theorem, the Robbins
algebra conjecture).
They are also used in verification and synthesis of both hardware and
software because both domains can be given correct axiomatizations.
Readings
• AIMA, Chapter 9.
• M. Genesereth and N. Nilsson. “Logical Foundations of
Artificial Intelligence”, Chapter 4.
Advanced FOL for KR
• Power and limitations
• Non-monotonic reasoning
• FOL and relational databases
The Power and Limitations of FOL
The syntax, semantics and proof-theory of pure FOL offer us a
general, flexible and powerful framework for KR.
FOL has weaknesses too.
Because FOL is very general and it is based on very primitive concepts
(constants, variables, function symbols, predicates and quantifiers), it
offers no explicit help for defining higher-level abstractions:
• taxonomic information
• physical composition
• measurements
• events, actions, processes, plans, time, space, causality
The Power and Limitations of FOL (cont’d)
FOL does not allow:
• non-monotonicity
• belief revision
• uncertainty
This is a serious weakness of FOL and has been addressed by more
appropriate KR formalisms (in many cases extensions of FOL
itself).
Some formalisms for non-monotonic reasoning are presented later.
Taxonomic Information in FOL
[Figure: a taxonomy graph. Female Persons and Male Persons are
SubsetOf Persons, which is SubsetOf Mammals. Mary is MemberOf
Female Persons and John is MemberOf Male Persons; Mary is a
SisterOf John. The figure also records numbers of Legs and a
HasMother relation.]
Taxonomic Information
The concept of a category or class is an important abstraction in
knowledge representation and reasoning. Categories can be
organized into taxonomies.
Taxonomies have been used profitably for centuries in various
technical fields (biology, medicine, library science etc.).
Taxonomic information plays a central role in various database and
object-oriented models.
Taxonomies are very important in modern web applications:
knowledge management, information retrieval and dissemination,
information integration, e-commerce, e-science, etc.
Categories in FOL
FOL offers us two ways to talk about categories:
• Predicates. For example: Person(x) or Basketball(x)
• Constants (through reification). For example: Persons or
Basketballs.
In this case we also need predicates for membership and
subclass: MemberOf (or ∈) and SubsetOf (or ⊂).
Both of the above ways are needed! But we have to be careful
when defining the semantics of the resulting languages.
Other issues: inheritance, disjointness and partitioning
Examples
• An object is a member of a category.
MemberOf(BB12, BasketBalls)
• A category is a subclass of another category.
SubsetOf(BasketBalls, Balls)
• All members of a category have some properties.
(∀x)(MemberOf(x,BasketBalls)⇒ Round(x))
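Over a finite knowledge base, these reified sentences support simple mechanical inference. A Python sketch (the facts mirror the examples above; the helper names are our own) that derives Round(BB12) by climbing the SubsetOf hierarchy:

```python
member_of = {('BB12', 'BasketBalls')}
subset_of = {('BasketBalls', 'Balls')}
properties = {'BasketBalls': {'Round'}}      # all members are Round

def categories_of(x):
    """All categories x belongs to, following SubsetOf upwards."""
    cats = {c for (m, c) in member_of if m == x}
    changed = True
    while changed:
        changed = False
        for (sub, sup) in subset_of:
            if sub in cats and sup not in cats:
                cats.add(sup)
                changed = True
    return cats

def holds_property(x, prop):
    """Property inheritance: x has prop if some category of x does."""
    return any(prop in properties.get(c, set()) for c in categories_of(x))

print(categories_of('BB12'))            # {'BasketBalls', 'Balls'}
print(holds_property('BB12', 'Round'))  # True
```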
Examples (cont’d)
• All members of a category can be recognized by some
properties.
(∀x)(Orange(x) ∧ Round(x) ∧ Diameter(x) = 9.5′′
∧MemberOf(x,Balls)⇒MemberOf(x,BasketBalls))
• A category as a whole has some properties.
MemberOf(Dogs,DomesticatedSpecies)
In this case DomesticatedSpecies is a category of
categories.
Examples (cont’d)
Can we have categories of categories of categories? Are they
useful?
In various OO modeling frameworks (e.g., Telos) we have 4+ levels
of data modeling:
• Instances (e.g., John)
• Classes (e.g., Person)
• Meta-classes (e.g., the class of all classes with no instances).
• Meta-meta-classes (e.g., the class of all meta-classes we have
defined).
Examples (cont’d)
In the Information Resource Dictionary Standard (IRDS) we have
4 levels of data description:
• Level 1: Application data (e.g., code).
• Level 2: Data dictionary for application data.
• Level 3: Schema of the data dictionary.
• Level 4: Different types of IRDS schemas.
Other Relations Among Categories
Often we want to say that two categories are disjoint or that they
form an exhaustive decomposition of some other category or
that they form a partition of some other category.
Examples:
Disjoint({Animals, V egetables})
ExhaustiveDecomposition({Americans, Canadians,Mexicans},
NorthAmericans)
Partition({Males, Females}, Animals)
Definitions
The three predicates used above can be defined as follows:
Disjoint(s) ≡
(∀c1, c2)(c1 ∈ s ∧ c2 ∈ s ∧ c1 ≠ c2 ⇒ Intersection(c1, c2) = {})
ExhaustiveDecomposition(s, c) ≡ (∀i)(i ∈ c⇒ (∃c2)(c2 ∈ s ∧ i ∈ c2))
Partition(s, c) ≡ Disjoint(s) ∧ ExhaustiveDecomposition(s, c)
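For finite category extensions these three definitions translate directly into set operations. A Python sketch (the example sets are invented):

```python
def disjoint(s):
    """s: a collection of sets (category extensions); true iff no
    two of them share an element."""
    cats = list(s)
    return all(not (cats[i] & cats[j])
               for i in range(len(cats)) for j in range(i + 1, len(cats)))

def exhaustive_decomposition(s, c):
    """Every member of c belongs to some category in s."""
    return all(any(i in c2 for c2 in s) for i in c)

def partition(s, c):
    return disjoint(s) and exhaustive_decomposition(s, c)

males, females = {'john', 'bob'}, {'mary', 'liz'}
animals = males | females
print(partition([males, females], animals))   # True
print(disjoint([males, {'john'}]))            # False: 'john' is shared
```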
Categories and Definitions
Some categories can be given “if and only if” definitions.
Example: An object is a triangle if and only if it is a polygon
with three sides.
Natural kind categories cannot be defined in this way.
Example: Try to define tomatoes with an “if and only if”
definition.
For natural kind categories, we can write down “if and only if”
definitions that hold for typical instances.
Physical Composition
The idea that one object is part of another is an important one in
many applications (e.g., engineering design or e-commerce
catalogs). We use the general predicate PartOf to represent such
information.
Example:
PartOf(Athens,Greece), PartOf(Greece,WesternEurope)
PartOf(WesternEurope, Europe), PartOf(Europe, Earth)
The relation PartOf is irreflexive and transitive:
(∀x)(¬PartOf(x, x))
(∀x, y, z)(PartOf(x, y) ∧ PartOf(y, z)⇒ PartOf(x, z))
Thus we can conclude: PartOf(Athens,Earth).
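The conclusion PartOf(Athens, Earth) is simply membership in the transitive closure of the base facts, which can be computed by a fixpoint loop:

```python
part_of = {('Athens', 'Greece'), ('Greece', 'WesternEurope'),
           ('WesternEurope', 'Europe'), ('Europe', 'Earth')}

def transitive_closure(pairs):
    """Smallest transitive relation containing the given pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

print(('Athens', 'Earth') in transitive_closure(part_of))   # True
```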
Physical Composition (cont’d)
Categories of composite objects are often characterized by the
structure of those objects i.e., the parts and how the parts relate
to the whole.
Example: How can we define a biped?
Defining a biped
Biped(a) ≡
(∃l1, l2, b)(Leg(l1) ∧ Leg(l2) ∧ Body(b) ∧
PartOf(l1, a) ∧ PartOf(l2, a) ∧ PartOf(b, a) ∧
Attached(l1, b) ∧ Attached(l2, b) ∧
l1 ≠ l2 ∧ (∀l3)(Leg(l3) ∧ PartOf(l3, a) ⇒ (l3 = l1 ∨ l3 = l2)))
Description logics are a particular kind of logics for KR that
allow us to write definitions such as the above more easily.
The Power and Limitations of FOL (Revisited)
We mentioned that FOL provides no explicit support for the
definition of:
1. taxonomic information and categories
2. physical composition
3. measurements
4. events, actions, processes, plans, time, space and causality
We have now shown how FOL can be used to represent knowledge
for the first two of the above cases. The reader interested in Cases
3 and 4 should see Chapters 10-13 of the book AIMA2ed.
Monotonicity of FOL
Theorem. Let KB be a set of FOL formulas and α, β two
arbitrary FOL formulas. If KB |= α then KB ∪ {β} |= α.
The above theorem captures the monotonicity property of FOL.
The monotonicity of FOL becomes a very awkward feature in the
following cases:
• Closed world reasoning.
• When we want to represent defaults, exceptions or
qualifications.
• When we want to revise our beliefs in the presence of new
knowledge.
Closed World Reasoning
Example: Imagine the course schedule of a university department
available on the Web. How would you represent all relevant
information about who teaches what course in FOL?
You might have something like:
Teaches(Alex, CS100), Teaches(Bob, P100),
Teaches(Charlie, P200)
Now answer the following question:
• Who is teaching CS100?
The answer to this question is “Alex” as we can see from the above
KB.
Closed World Reasoning (cont’d)
Now answer the following questions:
• Is Bob teaching CS100?
• Is Alex teaching CS200?
Assuming that the schedule is complete, the answer to both of
these questions is “no” but this is not explicit in the schedule KB!
Here we have a situation where in the absence of information
to the contrary we assume that Bob is not teaching CS100 and
Alex is not teaching CS200.
This is how we interpret answers to queries in relational databases
as well.
This kind of reasoning is called non-monotonic and cannot be
supported directly by FOL.
The Closed World Assumption
In traditional relational databases and many knowledge bases it is
natural to make the assumption that available information is
complete.
Let KB be a knowledge base and φ a ground atomic
sentence. If KB ⊭ φ then assume φ to be false.
The above assumption is usually called the closed world
assumption (CWA) originally proposed by Ray Reiter in 1978.
CWA is a non-monotonic KR feature.
The CWA More Formally
Let KB be a knowledge base (i.e., a set of FOL formulas).
Let Closure(KB) be the closure of KB under logical entailment:
Closure(KB) = {φ : KB |= φ}
Let
KBasm = {¬ψ : ψ is a ground atomic sentence and KB ⊭ ψ}
denote the set of assumptions. Then the completion of KB
under the CWA is defined as follows:
CWA(KB) = Closure(KB) ∪ KBasm
Exercise: Apply the CWA to the course KB of the previous
example.
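When the KB consists only of ground atomic facts, KB ⊭ ψ reduces to ψ not being one of the facts, so the assumption set KBasm is directly computable. A Python sketch over the course KB (listing the finite domains explicitly is our simplifying assumption):

```python
kb = {('Teaches', 'Alex', 'CS100'),
      ('Teaches', 'Bob', 'P100'),
      ('Teaches', 'Charlie', 'P200')}

def cwa_assumptions(kb, teachers, courses):
    """All negated ground Teaches-atoms not entailed by the KB.
    For an atomic KB, 'KB |= psi' reduces to 'psi is in KB'."""
    return {('not', 'Teaches', t, c)
            for t in teachers for c in courses
            if ('Teaches', t, c) not in kb}

asm = cwa_assumptions(kb, {'Alex', 'Bob', 'Charlie'},
                      {'CS100', 'P100', 'P200'})
print(('not', 'Teaches', 'Bob', 'CS100') in asm)   # True: assumed false
print(('not', 'Teaches', 'Alex', 'CS100') in asm)  # False: an actual fact
```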
Problems with the CWA
The CWA can result in inconsistencies (this depends critically on
syntactic features e.g., what formulas we have in the KB).
Example: Let KB be
Professor(John) ∨ Professor(Mary).
Then the assumption set KBasm is
{¬Professor(John), ¬Professor(Mary)}
and CWA(KB) = Closure(KB) ∪ KBasm is inconsistent.
Theorem. If the CNF of KB consists only of Horn clauses and KB is
consistent, then CWA(KB) is consistent.
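The inconsistency in the example can be verified mechanically by enumerating all truth assignments: none satisfies the disjunction together with both assumptions. A brute-force Python sketch (formulas encoded as predicates over an assignment):

```python
from itertools import product

def satisfiable(formulas, atoms):
    """formulas: functions from an assignment dict to bool."""
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if all(f(v) for f in formulas):
            return True
    return False

atoms = ['ProfJohn', 'ProfMary']
kb = [lambda v: v['ProfJohn'] or v['ProfMary']]         # the disjunction
asm = [lambda v: not v['ProfJohn'], lambda v: not v['ProfMary']]

print(satisfiable(kb, atoms))         # True: KB alone is consistent
print(satisfiable(kb + asm, atoms))   # False: KB + CWA assumptions is not
```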
Revising our Beliefs
Now assume that we have just learned that Alex teaches the course
CS200 as well.
The KB should now become:
Teaches(Alex, CS100), Teaches(Bob, P100),
Teaches(Charlie, P200), Teaches(Alex, CS200)
Now answer the following question:
• Is Alex teaching CS200?
The answer to this question now is “yes” and it is different than
the answer we got previously.
This kind of reasoning is non-monotonic.
Defaults
Example: By default, persons have two legs.
How do we represent this information in FOL?
The sentence
(∀x)(Person(x)⇒ Legs(x, 2))
is an approximation. It is not entirely appropriate because it
talks about all persons.
Now answer the following question:
• How many legs does John have?
The answer to this question is “two” since this is what the above
KB gives us.
Revising our Beliefs
Now assume that we have just learned that John actually has one
leg only.
We should now be able to update the previous KB and in this way
revise our beliefs about John.
As a result, the answer to the previous question should become
“one” and it is different than the answer we got previously.
This kind of reasoning is non-monotonic.
Defaults, Exceptions or Qualifications
So how do we update the KB?
The sentence
(∀x)(Person(x)⇒ Legs(x, 2))
could be modified to become
(∀x)(Person(x) ∧ x ≠ John ⇒ Legs(x, 2)).
If we adopt this representation, we need to write down an
exception for every atypical person. This is usually called the
qualification problem.
Defaults, Exceptions or Qualifications (cont’d)
Problem: How do we represent default information and at the
same time deal with exceptions and belief revision in a graceful
way?
This problem has been studied in detail in the area of AI called
non-monotonic reasoning, and non-monotonic logics have
been invented to address it.
FOL and Relational Databases - Example
TEACHER     COURSE     STUDENT
NAME        NUMBER     NAME
Alex        CS100      John
Bob         CS200      Mary
Charlie     P100       Pam
            P200       Paul
Example (cont’d)
TEACHES
NAME NUMBER
Alex CS100
Alex CS200
Bob P100
Charlie P200
ENROLLED
NAME NUMBER
John CS100
John P100
Mary CS100
Pam P100
Paul CS200
Paul P200
FOL and Relational Databases
How can we use concepts of FOL to understand the theory of
relational databases?
Two perspectives have been developed in the literature:
model-theoretic and proof-theoretic.
The Model-Theoretic Perspective
For a given database DB, we can define a FO database language
LDB as follows:
• For each relation R in DB, we have a corresponding predicate
symbol PR of the same arity in LDB.
• For each attribute value v in a relation of DB, we have a
corresponding constant Cv in LDB.
• LDB has no function symbols.
The Model-Theoretic Perspective (cont’d)
The given database DB is considered to be an interpretation
IDB of LDB with the following properties:
• The universe of the interpretation is the set of all values in the
database.
• Each constant Cv is mapped to attribute value v.
• The interpretation of each predicate PR is given by the relation
R.
Queries and Integrity Constraints
The language LDB can be used to write queries and integrity
constraints.
Queries:
x : Teacher(x) ∧ Teaches(x,CS100)
: Teaches(Charlie, CS100)
Integrity Constraints:
(∀x)(Course(x)⇒ (∃y)(Teacher(y) ∧ Teaches(y, x)))
(∀x)(Course(x)⇒ (∃y)(Student(y) ∧ Enrolled(y, x)))
Queries and Integrity Constraints (cont’d)
Answering a query q is equivalent to determining whether the
interpretation IDB satisfies q.
Verifying that an integrity constraint C holds is equivalent
to determining whether the interpretation IDB satisfies C.
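In this perspective, then, query answering is plain evaluation over a finite interpretation. A Python sketch with relations represented as sets of tuples, mirroring the example database:

```python
teacher = {'Alex', 'Bob', 'Charlie'}
course = {'CS100', 'CS200', 'P100', 'P200'}
teaches = {('Alex', 'CS100'), ('Alex', 'CS200'),
           ('Bob', 'P100'), ('Charlie', 'P200')}

# Query: x such that Teacher(x) and Teaches(x, CS100)
answer = {x for x in teacher if (x, 'CS100') in teaches}
print(answer)                         # {'Alex'}

# Integrity constraint: every course is taught by some teacher
ic_holds = all(any((y, x) in teaches for y in teacher) for x in course)
print(ic_holds)                       # True
```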
The Proof-Theoretic Perspective
Let DB be a given database. As in the model-theoretic perspective we
can define a FO language LDB .
We can now write a FO theory (i.e., a set of FO sentences) TDB that
corresponds to DB.
Example:
Teacher(Alex), Teacher(Bob), Teacher(Charlie)
Course(CS100), Course(CS200), Course(P100), Course(P200)
Teaches(Alex, CS100), Teaches(Alex, CS200)
Teaches(Bob, P100), Teaches(Charlie, P200)
...
Queries and Integrity Constraints
In the proof-theoretic perspective, the language LDB can again be
used to write queries and integrity constraints.
Queries:
x : Teacher(x) ∧ Teaches(x,CS100)
: Teaches(Charlie, CS100)
Integrity Constraints:
(∀x)(Course(x)⇒ (∃y)(Teacher(y) ∧ Teaches(y, x)))
(∀x)(Course(x)⇒ (∃y)(Student(y) ∧ Enrolled(y, x)))
Now answering a query q could be done by determining whether
q logically follows (equivalently: can be proven) from TDB . Let
us try ...
Example
Database:
Teacher(Alex), Teacher(Bob), Teacher(Charlie)
Course(CS100), Course(CS200), Course(P100), Course(P200)
Teaches(Alex, CS100), Teaches(Alex, CS200)
Teaches(Bob, P100), Teaches(Charlie, P200)
Queries:
: Teacher(Alex)
: (∃x)Course(x)
: (∃x, y)(Teacher(x) ∧ Course(y) ∧ Teaches(x, y))
We can use resolution to see that the answer to all of these queries
is “yes”.
Example (cont’d)
Database:
Teacher(Alex), Teacher(Bob), Teacher(Charlie)
Course(CS100), Course(CS200), Course(P100), Course(P200)
Teaches(Alex, CS100), Teaches(Alex, CS200)
Teaches(Bob, P100), Teaches(Charlie, P200)
Queries:
: Teacher(CS100)
: ¬Teacher(CS100)
The answers to these queries are “no” and “yes” respectively. But
resolution will not help us in this case (try it!). What is the
problem?
Implicit Assumptions in Databases
In our database we have made two assumptions silently:
• The information in the database is complete.
• Different constants name different objects (i.e., object CS100 is
different than objects Alex, Bob and Charlie).
How can we solve the problem formally?
We can use predicate completion, the unique names
assumption and some axioms for equality to capture the
above assumptions.
Predicate Completion
Let us consider the following simple KB:
Teacher(Alex)
This KB can be written equivalently as:
(∀x)(x = Alex⇒ Teacher(x))
The above formula can be taken as the “if” part of the definition
for predicate Teacher. The assumption that there are no other
teachers can now be captured by writing the “only if” part of the
definition:
(∀x)(Teacher(x)⇒ x = Alex)
Predicate Completion (cont’d)
If our KB was
Teacher(Alex), Teacher(Bob)
then the “if” and “only if” forms can be combined as follows:
(∀x)(x = Alex ∨ x = Bob⇔ Teacher(x))
For a knowledge base KB and predicate P, we will denote the
completion of KB with respect to P as COMP(KB;P).
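When a predicate is defined only by ground facts, its completion can be generated mechanically. A Python sketch that emits the biconditional as a string (ASCII connectives used for readability; the function name is our own):

```python
def complete(predicate, constants):
    """Completion of a predicate defined only by ground facts:
    (forall x)(P(x) <=> x = c1 v ... v x = cn)."""
    disjunction = ' v '.join(f'x = {c}' for c in constants)
    return f'(forall x)({predicate}(x) <=> {disjunction})'

print(complete('Teacher', ['Alex', 'Bob']))
# (forall x)(Teacher(x) <=> x = Alex v x = Bob)
```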
The Unique Names Assumption
In many knowledge bases it is also natural to assume that distinct
names refer to distinct objects. This is usually called the
unique names assumption (UNA).
Example: Let KB be:
Teaches(Alex, CS100), Teaches(Bob, P100)
Then UNA(KB) is:
Alex ≠ Bob, CS100 ≠ P100,
CS100 ≠ Bob, CS100 ≠ Alex,
P100 ≠ Bob, P100 ≠ Alex
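The UNA sentences are just the pairwise inequalities over the distinct constants, so they too can be generated automatically:

```python
from itertools import combinations

def una(constants):
    """All pairwise inequalities demanded by the unique names
    assumption, rendered as strings."""
    return {f'{a} != {b}' for a, b in combinations(sorted(constants), 2)}

print(una({'Alex', 'Bob', 'CS100', 'P100'}))
# 6 inequalities: 'Alex != Bob', 'Alex != CS100', ...
```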
Example: Completion+UNA+Equality
Database:
Teacher(Alex), Teacher(Bob), Teacher(Charlie)
Course(CS100), Course(CS200), Course(P100), Course(P200)
Teaches(Alex, CS100), Teaches(Alex, CS200)
Teaches(Bob, P100), Teaches(Charlie, P200)
Completion:
(∀x)(Teacher(x)⇔ (x = Alex ∨ x = Bob ∨ x = Charlie))
(∀x)(Course(x)⇔ (x = CS100 ∨ x = CS200 ∨ x = P100 ∨ x = P200))
UNA:
Alex ≠ Bob, Alex ≠ Charlie, . . . , P100 ≠ P200
Example (cont’d): Equality Axioms
Reflexivity:
(∀x)(x = x)
Symmetry:
(∀x, y)(x = y ⇒ y = x)
Transitivity:
(∀x, y, z)(x = y ∧ y = z ⇒ x = z)
Example (cont’d)
Database:
Teacher(Alex), Teacher(Bob), Teacher(Charlie)
Course(CS100), Course(CS200), Course(P100), Course(P200)
Teaches(Alex, CS100), Teaches(Alex, CS200)
Teaches(Bob, P100), Teaches(Charlie, P200)
Query:
: (∀x)(Teacher(x) ∨ Course(x))
The answer to this query is yes but resolution will not give it to us.
Also, predicate completion, the UNA and equality axioms will not
help us. What is the problem now?
The Domain Closure Assumption
In many knowledge bases it is natural to assume that the only
objects in the domain are the ones that can be named
using the constants and function symbols of the language.
This is usually called the domain closure assumption (DCA).
Example: Let KB be:
Teaches(Alex, CS100), Teaches(Bob, P100)
Then DCA(KB) is:
(∀x)(x = Alex ∨ x = Bob ∨ x = CS100 ∨ x = P100)
Queries and Answers: Proof-Theoretic Perspective
Let DB be a database expressed as a FO theory TDB and q be a
query. To answer q we can decide whether q logically follows
(equivalently: can be proven) from:
• The completion of theory TDB .
• UNA
• DCA
• The equality axioms for reflexivity, symmetry and
transitivity.
Predicate Completion in General
Definition. Let KB be a set of clauses. We will say that KB is
solitary in P if each clause with a positive occurrence of P has at
most one occurrence of P.
Example:
Q(A) ∨ P(A) ∨ R(A), Q(A) ∨ ¬P(B) ∨ P(A)
The first clause is solitary in P but the second is not.
Predicate Completion in General (cont’d)
We will define predicate completion for P only for clauses solitary
in P . We can write each such solitary clause as
(∀y)(Q1 ∧ · · · ∧ Qm ⇒ P(t))
where t is an n-tuple (t1, . . . , tn) of terms.
There may be no Qi in which case the clause is just P (t). The Qi
and t may contain variables, let us say the tuple of variables y.
Predicate Completion in General (cont’d)
The above formula is equivalent to
(∀x)(∀y)(x = t ∧ Q1 ∧ · · · ∧ Qm ⇒ P(x))
where x is a tuple of variables not occurring in t and x = t is an
abbreviation for the conjunction
x1 = t1 ∧ · · · ∧ xn = tn.
Predicate Completion in General (cont’d)
Since the variables y now occur only in the antecedent of the
implication, the above is equivalent to:
(∀x)((∃y)(x = t ∧ Q1 ∧ · · · ∧ Qm) ⇒ P(x))
Predicate Completion in General (cont’d)
Let us suppose we have exactly n clauses solitary in P in our
knowledge base. Then we will transform these clauses as above to
arrive at:
(∀x)(E1 ⇒ P(x))
(∀x)(E2 ⇒ P(x))
...
(∀x)(En ⇒ P(x))
or equivalently
(∀x)(E1 ∨ E2 ∨ . . . ∨ En ⇒ P(x))
This is the “if” part of the definition of P .
Predicate Completion in General (cont’d)
The “only if” part of the completion of P then is:
(∀x)(P(x) ⇒ E1 ∨ E2 ∨ . . . ∨ En)
Definition. Let KB be a set of clauses, all of them solitary in
predicate P. The completion of P in KB (denoted by
COMP(KB;P)) is defined as follows:
KB ∧ (∀x)(E1 ∨ E2 ∨ . . . ∨ En ⇔ P(x))
Example
(∀x)(Ostrich(x)⇒ Bird(x))
Bird(Tweety)
¬Ostrich(Sam)
The above knowledge base KB represents the following
information:
All ostriches are birds. Tweety is a bird. Sam is not an
ostrich.
Example (cont’d)
Then COMP(KB;Bird) allows us to assume that the only birds are
the ones that the KB tells us about.
Thus we can conclude ¬Bird(Sam) because
COMP(KB;Bird) + UNA + DCA + Equality Axioms |= ¬Bird(Sam).
This conclusion can later on be retracted if we discover that Sam
is actually a bird. Thus predicate completion allows us to do useful
non-monotonic reasoning even in situations where we have a
KB which is more complex than a relational DB.
Predicate completion provides the basis for the semantics of
negation-as-failure in logic programming, e.g., Prolog (Clark,
1978).
Readings
• Stuart Russell and Peter Norvig. Artificial Intelligence: A
Modern Approach, Prentice Hall, 2nd edition (2002).
www.cs.berkeley.edu/~russell/aima.html.
Chapter 10.
• Michael R. Genesereth and Nils J. Nilsson. Logical
Foundations of Artificial Intelligence, Morgan Kaufmann, 1987.
Chapter 6.
• Ray Reiter. Towards a Logical Reconstruction of Relational
Database Theory. In M. L. Brodie, J. Mylopoulos and J. W.
Schmidt (eds.) On Conceptual Modelling: Perspectives from
Artificial Intelligence, Databases and Programming Languages.
Springer-Verlag, 1984.
An Introduction to Prolog
• The programming language Prolog
• Examples of programs in Prolog
• Prolog and FOL machinery: entailment and inference
Prolog and Logic Programming
• Prolog stands for “programming in logic”. Prolog is the first
and the most widely used logic programming language.
• Logic programming is the programming language paradigm
that is based on the following view:
A problem should be formalised in logic (i.e., in a
“declarative” way as opposed to the procedural way we
see in languages such as C). Inference processes can be
run to solve the problem.
Prolog and Logic Programming (cont’d)
Logic programming took off in the 70’s based on pioneering work
by Robert Kowalski. Prolog itself was invented by Alain
Colmerauer in 1972.
This idea is summed up in the famous slogan:
Algorithm = Logic + Control
Logic programming was a very influential field of research in the
80’s and 90’s fuelled particularly by Japan’s 5th generation project.
Prolog
• Prolog is a programming language centered around a small set
of basic mechanisms, including unification, tree-based data
structures and backtracking.
• It is a great programming language for symbolic,
non-numeric computation.
• It is well suited for problems that involve objects and relations
between them.
What is a Prolog program?
A Prolog program is simply a set of Horn formulas (or Horn
clauses or simply clauses in the Prolog terminology).
A Horn clause is a FOL formula in any of the following forms:
• An atomic formula (also called fact in the Prolog terminology).
• A formula of the form
q:- p1, p2, ..., pn.
where p1, p2,..., pn, q are atomic formulas.
Such formulas are called rules in the Prolog terminology.
Example: Defining family relations
parent(pam, bob). parent(tom, bob). parent(tom, liz).
parent(bob, ann). parent(bob, pat). parent(pat, jim).
This is the “hello world” program in Prolog.
Prolog Programs: Facts
• The fact that Tom is parent of Bob can be written in Prolog as:
parent(tom, bob).
parent is a predicate; tom and bob are constants.
The fact parent(tom, bob) represents symbolically an instance
of the “parenthood” relation in our world.
• Prolog syntax: Predicates, constants and functions in Prolog
are written in lowercase.
Prolog Programs: Queries
The previous Prolog program is essentially a relational database
defining a relation PARENT. This is the reason we often speak of
Prolog databases.
We can use Prolog to pose queries about “parenthood” (in
database terminology: to query the relation PARENT).
Examples of Queries
• Is Bob a parent of Pat?
?- parent(bob, pat).
Answer: yes
• Is Liz a parent of Pat?
?- parent(liz, pat).
Answer: no
Prolog answers a query without variables with either yes, or no.
Examples of Queries (cont’d)
We can also have queries with variables.
• Who are Liz’s parents?
?- parent(X, liz).
Answer: X=tom
• Who are Bob’s children?
?- parent(bob, X).
Answer: X=ann; X=pat
Prolog Queries
A query is an expression of the form
?-p1, p2, ..., pn.
where p1, p2, ..., pn are atomic formulas (possibly with free
variables).
Comments:
• p1, p2, ..., pn are also called goals.
• When the query has no free variables, its answer is yes or no.
• When we have variables in a query, Prolog will return all the
values of these variables such that the query logically follows
from the program.
• Prolog syntax: Variables in Prolog are in upper case.
More Examples
We can also ask more complicated queries.
• Who are the grandparents of Pat?
?- parent(Y, pat), parent(X, Y).
Answer: Y=bob, X=pam; Y=bob, X=tom
• Who are Jim’s great grandparents?
?- parent(Y, jim), parent(X, Y), parent(Z, X).
Answer: Y=pat, X=bob, Z=pam; Y=pat, X=bob, Z=tom
The Example Revisited
Let us add information on people’s sex:
female(pam). male(tom). male(bob). female(liz). female(pat).
female(ann). male(jim).
The Example Revisited (cont’d)
An alternative representation would be:
sex(pam, feminine). sex(tom, masculine). sex(bob, masculine). ...
The relation sex is binary.
The Example Revisited (cont’d)
Let us introduce the predicate offspring as the inverse of the
predicate parent.
We could provide the list of simple facts about the offspring
relation. For example: offspring(liz, tom).
Alternative: why not utilize the information available in the
predicate parent?
Rule: For all X and Y, Y is an offspring of X if X is a parent of Y.
The Example Revisited (cont’d)
The corresponding Prolog rule is:
offspring(Y, X) :- parent(X, Y).
Prolog rules have:
• a condition part or body (the right-hand side of the rule) and
• a conclusion part or head (the left-hand side of the rule).
The meaning of a rule is: If the body holds, the head holds as well.
The Example Revisited (cont’d)
• The predicate mother can be defined by:
mother(X, Y) :- parent(X, Y), female(X).
The predicate grandparent can be defined by:
grandparent(X, Z) :- parent(X, Y), parent(Y, Z).
The predicate sister can be defined by:
sister(X, Y) :-
parent(Z, X),
parent(Z, Y),
female(X),
different(X, Y).
Recursive Rules
Prolog rules can be recursive. A rule is recursive if the predicate
in its head also appears in its body.
Example: The following rules define the predicate predecessor.
predecessor(X, Z) :-
parent(X, Z).
predecessor(X, Z) :-
parent(X, Y),
predecessor(Y, Z).
The second rule is recursive.
The Final Program
parent(pam, bob). % Pam is a parent of Bob
parent(tom, bob). parent(tom, liz). parent(bob, ann).
parent(bob, pat). parent(pat, jim).
female(pam). % Pam is female
male(tom). % Tom is male
male(bob). female(liz). female(ann). female(pat). male(jim).
The Final Program (cont’d)
offspring(Y, X) :- % Y is an offspring of X if
parent(X, Y). % X is a parent of Y
mother(X, Y) :- % X is the mother of Y if
parent(X, Y), % X is a parent of Y and
female(X). % X is female
grandparent(X, Z) :- % X is a grand parent of Z if
parent(X, Y), % X is a parent of Y and
parent(Y, Z). % Y is a parent of Z
The Final Program (cont’d)
sister(X, Y) :- % X is a sister of Y if
parent(Z, X),
parent(Z, Y), % X and Y have the same parent and
female(X), % X is female and
different(X, Y). % X and Y are different
predecessor(X, Z) :- % Rule 1
parent(X, Z).
predecessor(X, Z) :- % Rule 2
parent(X, Y),
predecessor(Y, Z).
Rules vs. Views
Rules can be understood to define views over the database
relations defined by other rules or facts.
Example: The predicate offspring defines a view over predicate
parent.
Prolog syntax:
offspring(Y, X) :-
parent(X, Y).
SQL syntax:
CREATE VIEW OFFSPRING AS SELECT * FROM PARENT
Entailment and Inference in Prolog
• Prolog clauses are a proper subset of FOL (Horn formulas).
• What happens with entailment and inference in this subset?
'
&
$
%
Entailment and Prolog queries
Proposition. Let q be a query with no free variables posed over a
Prolog database DB. The answer to q is yes iff DB |= q.
Proposition. Let q be a query with no free variables posed over a
Prolog database DB. The answer to q is yes iff Generalized Modus
Ponens (forward chaining!) will infer q after it is applied to DB a
finite number of times.
'
&
$
%
Entailment and Prolog queries with Free Variables
Proposition. Let q be a query over a Prolog database DB. Let θ be a
substitution over the variables of q. The answer to q contains θ iff
DB |= SUBST (θ, q).
Proposition. Let q be a query over a Prolog database DB. Let θ be a
substitution over the variables of q. The answer to q contains θ iff
Generalized Modus Ponens (forward chaining!) will infer SUBST (θ, q)
after it is applied to DB a finite number of times.
'
&
$
%
Prolog, Backward Chaining and Resolution
Prolog uses backward chaining to compute answers to queries. More
precisely, it uses a certain form of resolution called SLD resolution.
Example:
predecessor(X, Z) :- %pr1
parent(X, Z).
predecessor(X, Z) :- %pr2
parent(X, Y),
predecessor(Y, Z).
parent(pam, bob). parent(bob, ann). parent(tom, bob).
parent(bob, pat). parent(tom, liz). parent(pat, jim).
'
&
$
%
Proof Tree for predecessor(pam,bob)
predecessor(pam, bob)
    | by rule pr1, MGU {X/pam, Z/bob}
parent(pam, bob)
    | matches the fact parent(pam, bob)
yes
'
&
$
%
Proof Tree for predecessor(pam,ann)
predecessor(pam, ann)
    | by rule pr1, MGU {X/pam, Z/ann}:
    |     parent(pam, ann)  --> no such fact: fail
    | by rule pr2, MGU {X/pam, Z/ann}:
          parent(pam, Y), predecessor(Y, ann)
'
&
$
%
Proof Tree for predecessor(pam,ann) (cont’d)
predecessor(pam, ann)
    | by rule pr1, MGU {X/pam, Z/ann}:
    |     parent(pam, ann)  --> fail
    | by rule pr2, MGU {X/pam, Z/ann}:
          parent(pam, Y), predecessor(Y, ann)
              | by fact parent(pam, bob), {Y/bob}:
                    predecessor(bob, ann)
                        | by rule pr1, MGU {X/bob, Z/ann}:
                              parent(bob, ann)  --> fact: yes
'
&
$
%
Declarative vs. Procedural Programming
In Prolog we can understand the meaning of a program in two ways:
• Declarative meaning: This determines what the output of
the program will be and can be defined in terms of entailment
in FOL.
• Procedural meaning: This determines how the output of
the program is obtained and can be defined in terms of
backward chaining and proof trees.
'
&
$
%
Execution of Prolog Programs
The procedure execute takes as input a program and a goal list, and
returns a success/failure indicator together with an instantiation of
the variables of the goals.
'
&
$
%
Execution of Prolog Programs: the Algorithm
procedure execute(Program, GoalList, Success);
begin
  if empty(GoalList) then Success := true
  else begin
    Goal := head(GoalList);
    OtherGoals := tail(GoalList);
    Satisfied := false;
    while not Satisfied and "more clauses in program" do begin
      Let the next clause in Program be H :- B1, ..., Bn.
      Construct a variant of this clause H' :- B1', ..., Bn'.
      unify(Goal, H', UnificationOK, Instant);
      if UnificationOK then begin
        NewGoals := append([B1', ..., Bn'], OtherGoals);
        NewGoals := substitute(Instant, NewGoals);
        execute(Program, NewGoals, Satisfied)
      end
    end;
    Success := Satisfied
  end
end;
'
&
$
%
Goal and Clause Order Matters!
• Goals are processed left to right.
• Clauses are selected from top to bottom.
'
&
$
%
Infinite Loops
Example program:
p :- p.
Example query:
?- p.
'
&
$
%
Infinite Loops (cont’d)
Consider our earlier predecessor example.
Version 1:
pred1(X,Z):-
parent(X,Z).
pred1(X,Z):-
parent(X,Y), pred1(Y,Z).
'
&
$
%
Infinite Loops (cont’d)
Version 2: Swap clauses
pred2(X,Z):-
parent(X,Y), pred2(Y,Z).
pred2(X,Z):-
parent(X,Z).
What happens if we pose the query ?-pred2(tom,pat) ?
'
&
$
%
Infinite Loops (cont’d)
Version 3: Swap goals in the second clause
pred3(X,Z):-
parent(X,Z).
pred3(X,Z):-
pred3(Y,Z), parent(X,Y).
What happens if we pose the query ?-pred3(tom,pat) ?
'
&
$
%
Infinite Loops (cont’d)
Version 4: Swap clauses and goals in the second clause
pred1(X,Z):-
pred1(Y,Z), parent(X,Y).
pred1(X,Z):-
parent(X,Z).
What happens if we pose the query ?-pred1(tom,pat) ?
'
&
$
%
Control in Prolog
In the slogan
Algorithm = Logic + Control
both logic and control are important!
Prolog offers two facilities for control:
• Ordering of clauses and goals.
• The cut operator ! to control backtracking. The cut operator
will be introduced later.
'
&
$
%
Prolog Systems
There are various nice Prolog systems available for many platforms.
For your project we propose that you choose:
• SICStus Prolog (available from http://www.sics.se/sicstus/)
'
&
$
%
Readings
• Leon Sterling and Ehud Shapiro. The Art of Prolog. MIT
Press.
• Ivan Bratko, Prolog Programming for Artificial Intelligence,
2nd edition.
Chapters 1 and 2.
• SICStus Prolog manual.
'
&
$
%
More Features of Prolog
• Data objects
• Lists
• Operators and arithmetic
• The cut operator
• Negation as failure
'
&
$
%
Data Objects in Prolog
The data objects in Prolog are called terms. Terms in Prolog are
like terms in FOL.
A term is a constant, a variable or a compound term.
A constant is an atom, an integer or a float.
'
&
$
%
Atoms
Atoms in Prolog can be constructed in three ways:
• Strings of letters, digits and underscores starting with a
lowercase letter.
Examples: anna, nil, x25, x_25
• Strings of special characters (depending on the
implementation).
• Strings of characters enclosed in quotes.
Example: ’Oliver Twist’
'
&
$
%
Numbers
Integers and floats.
Details can vary depending on the implementation.
Note that Prolog is not a language aimed at arithmetic
calculations.
'
&
$
%
Variables
Variables are strings of letters, digits and underscores. Variables always
start with an upper case letter or an underscore.
Examples:
hasachild(X):- parent(X,Y).
hasachild(_p):- parent(_p,_c).
hasachild(X):- parent(X,_).
?- parent(X,_).
Variables consisting of a single underscore are called anonymous
variables. Anonymous variables are useful when they appear only in
the body of a clause or in a query as shown above.
The lexical scope of a variable is one clause.
'
&
$
%
Structured Objects
Structured objects in Prolog are represented by compound terms.
Example:
location(bridge, segment(point(1,1),point(2,3))).
location(factory, triangle(point(4,2),point(6,4),point(7,1))).
The above facts represent geographic knowledge about the location of a
bridge and a factory.
'
&
$
%
Compound Terms
A compound term consists of a functor (called the principal
functor of a term) and a sequence of one or more terms called
arguments.
A functor is characterized by its name, which is an atom, and its
arity (i.e., the number of its arguments).
Example: point, segment and triangle are called functors.
Compound terms can be pictorially represented as trees.
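For instance, with the geographic facts of the previous slide, unification decomposes a compound term directly in a query:

```prolog
?- location(bridge, segment(P, Q)).
P = point(1,1),
Q = point(2,3)
```

Here segment is a functor of arity 2, with two point terms (themselves compound) as arguments.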
'
&
$
%
Lists
Because Prolog is a symbolic computation language, the list data
structure is very important.
A list is an ordered sequence of any number of items.
Example: [ann, tennis, tom, skiing]
Lists in Prolog are just another type of structured object and are
defined formally as follows. A list is
• either an empty list which has no elements and is represented
by [], or
• a structure that has two components
– the first element, called the head.
– the remaining elements (also a list), called the tail.
'
&
$
%
Lists (cont’d)
Lists are structures built using the functor . (dot) with arguments
the head and tail of the list. Using this notation the list
[ann, tennis, tom, skiing]
can be represented as the term
.(ann, .(tennis, .(tom, .(skiing, []))))
'
&
$
%
Lists (cont’d)
Lists in Prolog can be represented as follows:
• The convenient notation using brackets.
Example: [ann, tom]
• The cumbersome notation using the functor dot.
Example: .(ann, .(tom, []))
• The useful notation using the vertical bar.
Examples:
[ann | [tom]], [ann, tom | []]
[ann, tennis | [tom, skiing]]
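All of these notations denote the same terms, as unification at the top level shows:

```prolog
?- [a, b | T] = [a, b, c].
T = [c]
?- [H | T] = [ann, tennis, tom, skiing].
H = ann,
T = [tennis, tom, skiing]
```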
'
&
$
%
Programs for Lists: Membership
member(X, [X|Tail]).
member(X, [Head|Tail]):-
    member(X, Tail).
Declarative semantics: member(X,L) is true if element X occurs
in list L.
The predicate member can also be used for listing the members of a
list!
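For example, querying with an uninstantiated first argument enumerates the elements by backtracking:

```prolog
?- member(X, [ann, tennis, tom]).
X = ann ;
X = tennis ;
X = tom ;
no
```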
'
&
$
%
Programs for Lists: Concatenation
conc([], L, L).
conc([X|L1], L2, [X|L3]):-
    conc(L1, L2, L3).
Declarative semantics: conc(L1,L2,L3) is true if list L3 is the
concatenation of lists L1 and L2.
'
&
$
%
Programs for Lists: Concatenation (cont’d)
The above program can be used for concatenating two given lists:
?- conc([a,b,c], [1,2,3], X).
X=[a,b,c,1,2,3]
'
&
$
%
Programs for Lists: Concatenation (cont’d)
The same program can be used for decomposing a given list into two lists:
?- conc(L1,L2,[a,b,c]).
L1=[], L2=[a,b,c];
L1=[a], L2=[b,c];
L1=[a,b], L2=[c];
L1=[a,b,c], L2=[]; no
'
&
$
%
Membership via Concatenation
The following is another program for membership:
member1(X,L):-
conc(L1,[X|L2],L).
or equivalently
member1(X,L):-
conc(_,[X|_],L).
'
&
$
%
Adding an Element
To add an element to a list L, it is easiest to put it in front of the
list so that it becomes its new head: [X|L].
If you need a program for this, it is the following:
add(X, L, [X|L]).
'
&
$
%
Deleting an Element
The following Prolog program deletes an element from a list:
delete(X, [X|Tail], Tail).
delete(X, [Y|Tail], [Y|Tail1]):-
    delete(X, Tail, Tail1).
The program fails if the given element is not in the list.
'
&
$
%
Deleting an Element (cont’d)
The previous program can be used as follows:
• Non-deterministically, to delete any occurrence of the given
element in the list by backtracking.
?- delete(a, [a,b,a,a], L).
L=[b,a,a];
L=[a,b,a];
L=[a,b,a];
no
'
&
$
%
Deleting an Element (cont’d)
The previous program can also be used as follows:
• delete can be used in the inverse direction to add an element
anywhere in a list.
?- delete(a, L, [1,2,3]).
L=[a,1,2,3];
L=[1,a,2,3];
L=[1,2,a,3];
L=[1,2,3,a]; no
'
&
$
%
Sublists
The following program sublist(S,L) checks whether list S occurs
within list L as its sublist.
sublist(S,L):-
conc(L1,L2,L),
conc(S,L3,L2).
The program can also be used to find all sublists of a given list.
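For example (the second query fails because b and d are not contiguous in the list):

```prolog
?- sublist([b,c], [a,b,c,d]).
yes
?- sublist([b,d], [a,b,c,d]).
no
```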
'
&
$
%
Permutations
The following program permutation(L,P) generates by backtracking all
permutations P of a given list L.
permutation([], []).
permutation(L, [X|P]):-
    delete(X, L, L1),
    permutation(L1, P).
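For example (the order of the answers is determined by the backtracking behaviour of delete):

```prolog
?- permutation([a,b,c], P).
P = [a,b,c] ;
P = [a,c,b] ;
P = [b,a,c] ;
...
```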
'
&
$
%
Operators in Prolog
Operators in Prolog can be prefix, infix or postfix.
Each operator has a precedence and an associativity.
Operators in Prolog are merely a notational convenience.
Internally Prolog will represent expressions involving operators as
terms (e.g., 2*x+3*y will be represented as the term
+(*(2,x), *(3,y))).
'
&
$
%
How to Define Operators
Prolog allows the definition of operators using a special type of
clause called a directive.
The syntax for operator directives is
:- op(Precedence, Associativity, Name)
where
• Name is the name of the operator (e.g., ==>).
• Precedence is a number (in SICStus Prolog it is between 1 and
1200) giving the precedence of the operator.
• Associativity is a specification of the associativity of the
operator.
'
&
$
%
Example
The addition/subtraction operators +/- could have been defined by
the directive
:- op(500, yfx, [+, -])
where yfx specifies that they are left-associative.
You can find the details of precedence/associativity of all built-in
Prolog operators in any Prolog book or in the SICStus Prolog
manual.
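As a hypothetical illustration (the operator name ==> and the precedence 800 are arbitrary choices, not built-ins), a user-defined operator makes facts read more naturally while remaining ordinary compound terms underneath:

```prolog
:- op(800, xfx, ==>).

rule(rain ==> wet_grass).

% The fact above is just the term rule(==>(rain, wet_grass)):
?- rule(X ==> Y).
X = rain,
Y = wet_grass
```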
'
&
$
%
Arithmetic
Arithmetic in Prolog is performed with special built-in arithmetic
predicates.
An arithmetic expression is a term involving numbers (integers and
floats), variables, and functors representing arithmetic functions.
Arithmetic expressions are simply data structures. Evaluation of
arithmetic expressions is performed using appropriate built-in
predicates.
'
&
$
%
Some Built-in Arithmetic Predicates
The built-in predicate is is used when we want to evaluate an expression
and unify the result with a variable. Notice the difference with the built-in
predicate = which unifies two terms.
?- X is 1+2.
X = 3 ; no
?- X = 1+2.
X = 1+2 ; no
?- X is 1+2, Y=X.
X = 3, Y = 3 ; no
'
&
$
%
Some Built-in Arithmetic Predicates (cont’d)
The built-in predicates
X =:= Y X =\= Y X < Y X > Y X =< Y X >= Y
are also used when we want their arguments to be evaluated.
Example:
?- X is 8/4, X =:= (3+2+1)/3.
X = 2.0 ; no
'
&
$
%
Examples
The following predicate gcd(X,Y,D) is true if and only if D is the
greatest common divisor of X and Y.
gcd(X, X, X).
gcd(X, Y, D):-
    X < Y,
    Y1 is Y-X,
    gcd(X, Y1, D).
gcd(X, Y, D):-
    Y < X,
    gcd(Y, X, D).
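For example, the second clause reduces the pair (12,18) to (12,6), the third clause swaps it to (6,12), and so on until both arguments are equal:

```prolog
?- gcd(12, 18, D).
D = 6
```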
'
&
$
%
Examples (cont’d)
The following predicate length(L, N) is true if and only if N is the
length of list L.
length([], 0).
length([_|Tail], N):-
    length(Tail, N1),
    N is N1 + 1.
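For example (note that most Prolog systems already provide a built-in length/2, so you may need to rename this definition to experiment with it):

```prolog
?- length([a,b,c], N).
N = 3
```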
'
&
$
%
The Cut Operator (!)
The cut operator can be used to reduce the search space of Prolog
computations by dynamically pruning the search tree. The cut can
be used to prevent Prolog from following fruitless computation
paths that the programmer knows could not produce solutions.
The use of cut is controversial. Many of its uses can only be
interpreted procedurally, in contrast to the declarative
programming style we advocate. Used sparingly, however, it can
improve the efficiency of programs without compromising their
clarity.
'
&
$
%
Green Cuts: expressing determinism
Consider the following program for computing the maximum of two
numbers. Predicate max(X,Y,Z) is true if and only if Z is the
maximum of X and Y.
max(X, Y, X):- X >= Y.
max(X, Y, Y):- X < Y.
Finding the maximum of two numbers is a deterministic operation.
Only one of the two max clauses applies in a given computation
because the tests X >= Y and X < Y are mutually exclusive.
'
&
$
%
Green Cuts: expressing determinism (cont’d)
We can use the cut operator to express the mutually exclusive
nature of the tests in the max predicate:
max(X, Y, X):- X >= Y, !.
max(X, Y, Y):- X < Y.
or
max(X, Y, X):- X >= Y, !.
max(X, Y, Y):- X < Y, !.
Operationally the cut is handled as follows. The goal succeeds and
commits Prolog to all the choices made since the parent goal was
unified with the head of the clause the cut occurs in.
'
&
$
%
Green Cuts: expressing determinism (cont’d)
Let us consider the following clause:
A :- B1, B2, ..., Bk, !, C1, C2, ..., Cm
If the current goal G unifies with the head A of the above clause and
B1, ..., Bk further succeed, the cut has the following effects:
• The program is committed to the choice of the above clause for
reducing G; any alternative clauses for A that might unify with
G are ignored.
• Further, should any of the C1, C2, ..., Cm fail, backtracking
goes back only as far as the cut. Other choices remaining in the
computation of B1, ..., Bk are pruned from the search tree.
If backtracking actually reaches the cut, the cut fails, and the
search proceeds from the last choice made before the above
clause was chosen for goal G.
'
&
$
%
Red Cuts: omitting explicit conditions
We can go one step further in the use of cut. If we take into
account the execution model of Prolog, we can rewrite the program
for max in the following form:
max(X, Y, X):- X >= Y, !.
max(X, Y, Y).
The above program still works as expected but now we have
modified the declarative semantics of the original program.
For example, the fact max(5,1,1) follows from the second clause.
So, read declaratively, this is a false logic program, yet it behaves correctly!
Cuts whose presence in a program changes the meaning of the
program are called red cuts. Using red cuts should be avoided if
possible since it is error-prone.
'
&
$
%
Red Cuts: omitting explicit conditions (cont’d)
Consider the program for member:
member(X, [X|L]).
member(X, [Y|L]):-
    member(X, L).
Now consider a new version of member where we have used the cut
operator to obtain efficiency.
member(X, [X|L]):- !.
member(X, [Y|L]):-
    member(X, L).
Is this a good Prolog program?
'
&
$
%
Red Cuts: omitting explicit conditions (cont’d)
If the semantics of member are:
member(X,L) is true if and only if X is a member of list L
then the above program is not correct (red cut!) because the goal
member(X,[1,2]) has only the solution X=1. Perhaps the above
program should be called memberCheck with appropriate semantics.
'
&
$
%
Negation as Failure
Prolog allows a limited form of negation called negation as
failure.
Negation as failure can be implemented by a built-in predicate not
which can be defined by the following Prolog program.
not(G):- G, !, fail.
not(G).
In other words, the goal not(G) succeeds if and only if the goal G
fails. fail is a built-in predicate that simply fails. The cut used
above is a red one because the meaning of the program is different
when the cut is removed.
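For example, with the member predicate defined earlier:

```prolog
?- not(member(b, [a,c])).
yes
?- not(member(a, [a,c])).
no
```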
'
&
$
%
Negation as Failure (cont’d)
Negation as failure comes very handy in writing various Prolog
rules or queries.
The following program defines a predicate disjoint(L,T) which is
true if lists L and T have no common elements.
disjoint(L,T) :- not(commonMembers(L,T)).
commonMembers(L,T):-
member(X,L),
member(X,T).
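For example:

```prolog
?- disjoint([a,b], [c,d]).
yes
?- disjoint([a,b], [b,c]).
no
```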
'
&
$
%
Negation as Failure (cont’d)
In some Prolog systems different notation is used to express
negation as failure.
In SICStus Prolog the appropriate operator is \+. Thus the above
rules should be written as:
disjoint(L,T) :- \+(commonMembers(L,T)).
commonMembers(L,T):-
member(X,L),
member(X,T).
'
&
$
%
Readings
• Leon Sterling and Ehud Shapiro. The Art of Prolog. MIT
Press.
• Ivan Bratko. Prolog Programming for Artificial Intelligence.
2nd edition. Addison-Wesley.
Chapters 3 and 5.
• SICStus Prolog manual.