
Upload: ladaaki-kudi

Post on 28-Nov-2014


Page 1: Syllabus

Syllabus

Artificial Intelligence

Introduction to AI, AI Tasks, AI Techniques, Knowledge Representation, Approaches to

Knowledge Representation, Procedural Vs. Declarative Knowledge, Inferential knowledge,

Issues in knowledge representation, Frame Problem, Forward Vs. Backward Reasoning.

First order predicate logic, Propositional Logic, Predicate Logic, Inference Rules, Resolution,

Unification, Structured Knowledge Representation, Semantic Network, Conceptual graph,

Frame Structure, Conceptual Dependency, Script.

AI Problem, Space and Search, Means-end Analysis, Breadth first search, Depth first Search,

Hill Climbing Search, Best first Search, A* Algorithm

Learning, Rote Learning, Learning by taking advice, learning by problem solving, inductive

learning, Explanation based learning.

Expert System, Application area of Expert System, Structure of Expert System,

Characteristics of Expert System, MYCIN Case study.

Fuzzy Logic, Memory Organisation, Neural Network, Genetic Algorithm, Matching.

TABLE OF CONTENTS

UNIT 1 : Introduction to AI

1.1 What is Artificial Intelligence?

1.2 Is AI Possible

1.3 Some AI Tasks

1.4 What we can do with AI?

1.5 AI Techniques

1.5.1 Knowledge Representation

1.5.2 Search

1.6 The Underlying Assumption

UNIT 2 : Knowledge Representation

2.1 What to Represent?

2.2 Application of Knowledge in AI

2.3 Properties for Knowledge Representation Systems

2.4 Approaches to Knowledge Representation

2.4.1 Simple Relational Knowledge

2.4.2 Inheritable Knowledge

2.5 Inferential Knowledge

2.6 Procedural Vs. Declarative Knowledge

2.7 Issues in Knowledge Representation

2.8 The Frame Problem

2.9 Forward Vs Backward Reasoning

UNIT 3 : First Order Predicate Logic

3.1 Logic

3.2 Introduction to Propositional Logic

3.3 Predicate Logic

3.3.1 Introduction to Predicate Logic

3.3.2 Predicate Logic: Semantics

3.4 Quantification

3.5 Well-Formed Formula For First Order Predicate Logic

3.6 From Wff to Proposition

3.7 Transcribing English to Predicate Logic Wffs

3.8 Properties of Statements

3.9 Inference Rules

3.10 Resolution

3.11 Conversion to Clausal Form

3.12 Unification

Unit 4 : Structured Knowledge Representation

4.1 Semantic Network

4.2 Conceptual Graph

4.3 Frame Structures

4.4 Conceptual Dependency

4.5 Scripts

Unit 5 : Problem, Problem Space and Search

5.1 Search and Control Strategies

5.2 Preliminary Concepts

5.3 Water Container Problem

5.4 Production System

5.5 Problem Characteristics

5.6 Means-end analysis

5.7 Problem Reduction

5.8 Uninformed or Blind Search

5.8.1 Breadth-First Search

5.8.2 Depth-First Search

5.9 Informed Search

5.9.1 Hill Climbing Methods

5.9.2 Best First Search

5.9.3 A* Algorithm

Unit 6 : Learning

6.1 What is Learning?

6.2 Types of Learning

6.2.1 Rote Learning

6.2.2 Learning by Taking Advice

6.2.3 Learning by Problem Solving

6.2.4 Inductive Learning

6.2.5 Explanation Based Learning

Unit 7 : Expert System

7.1 What is Expert System?

7.2 Expert System Application Area

7.3 Expert System Structure

7.4 Expert System Characteristics

7.5 Conventional Vs Expert Systems

7.6 Participants in Expert Systems Development

7.7 Tools For Development of Expert System

7.8 MYCIN

Unit 8 : Matching and Reasoning

8.1 Fuzzy Logic

8.1.1 What is Fuzziness?

8.1.2 Current Application of Fuzzy Logic

8.1.3 Overview of Fuzzy Logic

8.1.4 Fuzzy Sets

8.1.5 Hedges

8.1.6 Fuzzy Set Operations

8.1.7 Fuzzy Inference

8.2 Memory Organisation

8.3 Neural Networks and Parallel Computation

8.3.1 Neural Network Architectures

8.4 Genetic Algorithm

8.5 Matching

8.5.1 Variable Matching

UNIT 1 INTRODUCTION TO AI

1.1 What is Artificial Intelligence?

1.2 Is AI Possible

1.3 Some AI Tasks

1.4 What we can do with AI?

1.5 AI Techniques

1.5.1 Knowledge Representation

1.5.2 Search

1.6 The Underlying Assumption

1.1 What is Artificial Intelligence?

Artificial intelligence (AI) is a broad field, and means different things to different people. It is

concerned with getting computers to do tasks that require human intelligence. However,

having said that, there are many tasks, which we might reasonably think require intelligence

- such as complex arithmetic, which computers can do very easily. Conversely, there are

many tasks that people do without even thinking - such as recognizing a face - which are

extremely complex to automate. AI is concerned with these difficult tasks, which seem to

require complex and sophisticated reasoning processes and knowledge.

People might want to automate human intelligence for a number of different reasons. One

reason is simply to understand human intelligence better. For example, we may be able to

test and refine psychological and linguistic theories by writing programs, which attempt to

simulate aspects of human behavior. Another reason is simply so that we have smarter

programs. We may not care if the programs accurately simulate human reasoning, but by

studying human reasoning we may develop useful techniques for solving difficult problems.

AI is a field that overlaps with computer science rather than being a strict subfield. Different

areas of AI are more closely related to psychology, philosophy, logic, linguistics, and even

neurophysiology. However, as this is a CS course I'll emphasize the computational

techniques used, and put less emphasis on psychological modeling or philosophical issues.

I'll just briefly touch on some of the widely discussed philosophical issues below:

What is AI Exactly? As a beginning I offer the following definition:

AI is the branch of computer science concerned with the study and creation of computer

systems that exhibit some form of intelligence: systems that learn new concepts and tasks,

systems that can reason and draw useful conclusions about the world around us, systems

that can understand a natural language or perceive and comprehend a visual scene, and

systems that perform other types of feats that require human types of intelligence.

Like other definitions of complex topics, an understanding of AI requires an understanding of

related terms such as intelligence, knowledge, reasoning, thought, cognition, learning, and a

number of computer-related terms. While we lack precise scientific definitions for many of

these terms, we can give general definitions of them. And, of course, one of the objectives of

this text is to impart special meaning to all of the terms related to AI, including their

operational meanings.

Dictionaries define intelligence as the ability to acquire, understand and apply knowledge, or

the ability to exercise thought and reason. Of course, intelligence is more than this. It

embodies all of the knowledge and feats, both conscious and unconscious, which we have

acquired through study and experience: highly refined sight and sound perception; thought;

imagination; the ability to converse, read, write, drive a car, memorize and recall facts,

express and feel emotions; and much more.

Intelligence is the integrated sum of those feats, which gives us the ability to remember a

face not seen for thirty or more years, or to build and send rockets to the moon. It is those

capabilities, which set Homo sapiens apart from other forms of living things. And as we shall

see, the food for this intelligence is knowledge.

Can we ever expect to build systems, which exhibit these characteristics? The answer to this

question is yes! Today with the advent of the computer and 50 years of research into AI

programming techniques, the dream of smart machines is becoming a reality. Researchers

are creating systems which can mimic human thought, understand speech, beat the best

human chess player, and countless other feats never before possible.

It is not my aim to surprise or shock you--but the simplest way I can summarize is to say

that there are now in the world machines that can think, that can learn and that can create.

Moreover, their ability to do these things is going to increase rapidly until--in a visible

future--the range of problems they can handle will be coextensive with the range to which

the human mind has been applied. --Herbert Simon

In spite of these impressive achievements, we still have not been able to produce coordinated, autonomous systems which possess some of the basic abilities of a three-year-old child. These include the ability to recognize and remember numerous diverse objects in a scene, to learn new sounds and associate them with objects and concepts, and to adapt readily to many diverse new situations. These are the challenges now facing researchers in AI, and they are not easy ones. They will require important breakthroughs before we can expect to equal the performance of our three-year-old.

1.2 Is AI Possible?

Before we embark on a course in Artificial Intelligence, we should consider for a moment

whether automating intelligence is really possible!

Artificial intelligence research makes the assumption that human intelligence can be

reduced to the (complex) manipulation of symbols, and that it does not matter what medium

is used to manipulate these symbols - it does not have to be a biological brain! This

assumption does not go unchallenged among philosophers etc. Some argue that true

intelligence can never be achieved by a computer, but requires some human property,

which cannot be simulated. There are endless philosophical debates on this issue (some on

comp.ai.philosophy), brought recently to public attention again in Penrose's book.

The most well known contributions to the philosophical debate are Turing's ``Turing test''

paper, and Searle's ``Chinese room''. Very roughly, Turing considered how you would be

able to conclude that a machine was really intelligent. He argued that the only reasonable

way was to do a test. The test involves a human communicating with a human and with a

computer in other rooms, using a computer for the communication. The first human can ask

the other human/computer any questions they like, including very subjective questions like

``What do you think of this poem?''. If the computer answers so well that the first human

can't tell which of the two others is human, then we say that the computer is intelligent.

Searle argued that just behaving intelligently wasn't enough. He tried to demonstrate this by

suggesting a thought experiment (the ``Chinese room''). Imagine that you don't speak any

Chinese, but that you have a huge rule book which allows you to look up Chinese sentences

and tells you how to reply to them in Chinese. You don't understand Chinese, but can

behave in an apparently intelligent way. He claimed that computers, even if they appeared

intelligent, wouldn't really be, as they'd be just using something like the rule book of the

Chinese room.

Many people go further than Searle, and claim that computers will never even be able to

appear to be really intelligent (so will never pass the Turing test). There are therefore a

number of positions that you might adopt:

Computers will never even appear to be really intelligent, though they might do a few

useful tasks that conventionally require intelligence.

Computers may eventually appear to be intelligent, but in fact they will just be

simulating intelligent behavior, and not really be intelligent.

Computers will eventually be really intelligent.

Computers will not only be intelligent, they'll be conscious and have emotions.

My view is that, though computers can clearly behave intelligently in performing certain

limited tasks, full intelligence is a very long way off and hard to imagine (though I don't see

any fundamental reason why a computer couldn't be genuinely intelligent.) However, these

philosophical issues rarely impinge on AI practice and research. It is clear that AI techniques

can be used to produce useful programs that conventionally require human intelligence, and

that this work helps us understand the nature of our own intelligence. This is as much as we

can expect from AI for now, and it still makes it a fascinating topic!

1.3 Some AI Tasks

Human intelligence involves both ``mundane'' and ``expert'' reasoning. By mundane

reasoning I mean all those things which (nearly) all of us can routinely do (to various

abilities) in order to act and interact in the world. This will include:

Vision: The ability to make sense of what we see.

Natural Language: The ability to communicate with others in English or another

natural language.

Planning: The ability to decide on a good sequence of actions to achieve your goals.

Robotics: The ability to move and act in the world, possibly responding to new

perceptions.

By expert reasoning I mean things that only some people are good at, and which require

extensive training. It can be especially useful to automate these tasks, as there may be a

shortage of human experts. Expert reasoning includes:

Medical diagnosis.

Equipment repair.

Computer configuration.

Financial planning.

Expert Systems are concerned with the automation of these sorts of tasks.

AI research is concerned with automating both these kinds of reasoning. It turns out,

however, that it is the mundane tasks that are by far the hardest to automate.

1.4 What we can do with AI?

We have been studying this issue of AI application for quite some time now and know all the

terms and facts. But what we all really need to know is what can we do to get our hands on

some AI today. How can we as individuals use our own technology? We hope to discuss this

in depth (but as briefly as possible) so that you the consumer can use AI as it is intended.

First, we should be prepared for a change. Our conservative ways stand in the way of progress. AI is a new step that is very helpful to society. Machines can do jobs that require detailed instructions to be followed and mental alertness. AI, with its learning capabilities, can accomplish those tasks, but only if the world's conservatives are ready to change and allow this to be a possibility. It makes us think of how early man finally accepted the wheel as a good invention, not something taking away from his heritage or tradition.

Secondly, we must be prepared to learn about the capabilities of AI. The more use we get out of the machines, the less work is required of us; in turn, there are fewer injuries and less stress for human beings. Human beings are a species that learns by trying, and we must be prepared to give AI a chance, seeing it as a blessing, not an inhibition.

Finally, we need to be prepared for the worst of AI. Something as revolutionary as AI is sure to have many kinks to work out. There is always the fear that, if AI is learning-based, machines will learn that being rich and successful is a good thing and then wage war against economic powers and famous people. There are so many things that can go wrong with a new system, so we must be as prepared as we can be for this new technology.

However, even though the fear of the machines is there, their capabilities are vast. Whatever we teach AI systems, they will build on in the future if a positive outcome arises from it. AI systems are like children that need to be taught to be kind, well-mannered, and intelligent. If they are to make important decisions, they should be wise. We as citizens need to make sure AI programmers are keeping things on the level, and that they are doing the job correctly, so that no future accidents occur.

1.5 AI Techniques

There are various techniques that have evolved that can be applied to a variety of AI tasks -

these will be the focus of this course. These techniques are concerned with how we

represent, manipulate and reason with knowledge in order to solve problems.

Knowledge Representation

Search

1.5.1 Knowledge Representation

Knowledge representation is crucial. One of the clearest results of artificial intelligence

research so far is that solving even apparently simple problems requires lots of knowledge.

Really understanding a single sentence requires extensive knowledge both of language and

of the context. For example, today's (4th Nov) headline ``It's President Clinton'' can only be

interpreted reasonably if you know it's the day after the American elections. [Yes, these

notes are a bit out of date]. Really understanding a visual scene similarly requires

knowledge of the kinds of objects in the scene. Solving problems in a particular domain

generally requires knowledge of the objects in the domain and knowledge of how to reason

in that domain - both these types of knowledge must be represented.

Knowledge must be represented efficiently, and in a meaningful way. Efficiency is important,

as it would be impossible (or at least impractical) to explicitly represent every fact that you

might ever need. There are just so many potentially useful facts, most of which you would

never even think of. You have to be able to infer new facts from your existing knowledge, as

and when needed, and capture general abstractions, which represent general features of

sets of objects in the world.

Knowledge must be meaningfully represented so that we know how it relates back to the

real world. A knowledge representation scheme provides a mapping from features of the

world to a formal language. (The formal language will just capture certain aspects of the

world, which we believe is important to our problem - we may of course miss out crucial

aspects and so fail to really solve our problem, like ignoring friction in a mechanics

problem). Anyway, when we manipulate that formal language using a computer we want to

make sure that we still have meaningful expressions, which can be mapped back to the real

world. This is what we mean when we talk about the semantics of representation languages.

In other words, we can say that an AI technique is a method that exploits knowledge, which should be represented in such a way that:

The knowledge captures generalizations. In other words, it is not necessary to represent each individual situation separately. Instead, situations that share important properties are grouped together. If knowledge does not have this property, inordinate amounts of memory and updating will be required, so we usually call something without this property “data” rather than knowledge.

It can be understood by people who must provide it. Although for many programs, the

bulk of the data can be acquired automatically (for example, by taking readings from a

variety of instruments), in many AI domains, most of the knowledge a program has

must ultimately be provided by people in terms they understand.

It can easily be modified to correct errors and to reflect changes in the world and in our

world view.

It can be used in a great many situations even if it is not totally accurate or complete.

It can be used to help overcome its own sheer bulk by helping to narrow the range of

possibilities that must usually be considered.

1.5.2 Search

Another crucial general technique required when writing AI programs is search. Often there

is no direct way to find a solution to some problem. However, you do know how to generate

possibilities. For example, in solving a puzzle you might know all the possible moves, but not

the sequence that would lead to a solution. When working out how to get somewhere you

might know all the roads/buses/trains, just not the best route to get you to your destination

quickly. Developing good ways to search through these possibilities for a good solution is

therefore vital. Brute force techniques, where you generate and try out every possible solution, may work, but are often very inefficient, as there are just too many possibilities to try. Heuristic techniques are often better: you only try the options which you think (based on your current best guess) are most likely to lead to a good solution.
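The contrast can be made concrete with a small sketch (not from the original notes): a greedy best-first search that always expands the state the heuristic rates as closest to the goal, instead of trying every possibility. The toy route-finding problem, the town numbering, and the distance heuristic are all invented for the example:

```python
import heapq

def best_first_search(start, goal, neighbours, h):
    """Greedy best-first search: always expand the state that the
    heuristic h scores as closest to the goal."""
    frontier = [(h(start), start)]        # priority queue ordered by h
    parent = {start: None}
    visited = set()
    while frontier:
        _, state = heapq.heappop(frontier)
        if state == goal:
            path = []                      # walk back through parents
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        if state in visited:
            continue
        visited.add(state)
        for nxt in neighbours(state):
            if nxt not in parent:          # first time we reach this state
                parent[nxt] = state
                heapq.heappush(frontier, (h(nxt), nxt))
    return None                            # no route found

# Toy route-finding problem: towns 1..6, heuristic = distance to town 6.
roads = {1: [2, 3], 2: [4], 3: [4, 5], 4: [6], 5: [6], 6: []}
path = best_first_search(1, 6, lambda s: roads[s], lambda s: abs(6 - s))
```

A brute-force search would enumerate every route; here the heuristic steers the search straight towards the goal, at the risk of missing a better route elsewhere.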

Although AI techniques must be designed in keeping with the constraints imposed by AI problems, there is some degree of independence between problems and problem-solving techniques. It is possible to solve an AI problem without using AI techniques (although those solutions are not likely to be very good), and it is possible to apply AI techniques to the solution of non-AI problems. This is likely to be a good thing to do for problems that possess many of the same characteristics as AI problems.

1.6 The Underlying Assumption

One of the assumptions underlying work in Artificial Intelligence is that intelligent behavior

can be achieved through the manipulation of symbol structures (representing bits of

knowledge). These symbols can be represented on any medium - in principle, we could

develop a (very slow) intelligent machine made out of empty beer cans (plus something to

move the beer cans around). However, computers provide the representational and

reasoning powers whereby we might realistically expect to make progress towards

automating intelligent behavior.

So, the main question now is how we can represent knowledge as symbol structures and use

that knowledge to intelligently solve problems. The next few lectures will concentrate on

how we represent knowledge, using particular knowledge representation languages. These

are high-level representation formalisms, and can in principle be implemented using a whole

range of programming languages. The remaining lectures will concentrate more on how we

solve problems, using general knowledge of problem solving and domain knowledge.

In AI, the crucial thing about knowledge representation languages is that they should

support inference. We can't represent explicitly everything that the system might ever need

to know - some things should be left implicit, to be deduced by the system as and when

needed in problem solving. For example if we were representing facts about a particular CS3

Honours student (say Fred Bloggs) we don't want to have to explicitly record the fact that

Fred's studying AI. All CS3 Honours students are, so we should be able to deduce it.

Similarly, you probably wouldn't explicitly represent the fact that I'm not the president of the

United States, or that I have an office in Lilybank Gardens. You can deduce these things from

your general knowledge about the world.

Representing everything explicitly would be extremely wasteful of memory. For our CS3

example, we'd have 100 statements representing the fact that each student studies AI. Most

of these facts would never be used. However, if we DO need to know if Fred Bloggs studies

AI we want to be able to get at that information efficiently. We also would like to be able to

make more complex inferences - maybe that Fred should be attending a lecture at 12am on

Tuesday Feb 9th, so won't be able to have a supervision then. However, there is a tradeoff

between inferential power (what we can infer) and inferential efficiency (how quickly we can

infer it), so we may choose to have a language where simple inferences can be made

quickly, though complex ones are not possible.

In general, a good knowledge representation language should have at least the following

features:

It should allow you to express the knowledge you wish to represent in the language. For

example, suppose you want to represent the fact that ``Richard knows how old he is''.

This turns out to be difficult to express in some languages.

It should allow new knowledge to be inferred from a basic set of facts, as discussed

above.

It should be clear, and have a well-defined syntax and semantics. We want to know what

the allowable expressions are in the language, and what they mean. Otherwise we won't

be sure if our inferences are correct, or what the results mean. For example, if we have a

fact grey(elephant) we want to know whether it means all elephants are grey, some

particular one is grey, or what.

Some of these features may be present in recent non-AI representation languages, such as

deductive and object oriented databases. In fact, these systems have been influenced by

early AI research on knowledge representation, and there is some promise of further cross-

fertilization of ideas, to allow robust, multi-user knowledge/data bases with well-defined

semantics and flexible representation and inference capabilities. However, at present the

fields are still largely separate, and we will only be discussing basic AI approaches here.

Broadly speaking, there are three main approaches to knowledge representation in AI. The

most important is arguably the use of logic. A logic, almost by definition, has a well-defined

syntax and semantics, and is concerned with truth preserving inference. However, using

logic to represent things has problems. On the one hand, it may not be very efficient - if we

just want a very restricted class of inferences, we may not want the full power of a logic-based theorem prover, for example. On the other hand, representing some common-sense

things in a logic can be very hard. For example in first order predicate logic we can't

conclude that something is true one minute, and then later decide that it is not true after all.

If we did this it would lead to a contradiction, from which we could prove anything at all! We

could decide to use more complex logics, which allow this kind of reasoning - there are all

sorts of logics out there, such as default logics, temporal logics and modal logics. However,

another approach is to abandon the constraints that the use of a logic imposes and use a

less clean, but more flexible knowledge representation language.

Two such ``languages'' are structured objects and production systems. The idea of

structured objects is to represent knowledge as a collection of objects and relations, the

most important relations being the subclass and instance relations. The subclass relation (as

you might expect) says that one class is a subclass of another, while the instance relation

says that some individual belongs to some class. We'll use them so that ``X subclass Y''

means that X is a subclass of Y, not that X has a subclass Y. (Some books/approaches use

the relation ``is a'' to refer to the subclass relation.) So Fred Bloggs is an instance of the class

representing AI3 students, while the class of AI3 students is a subclass of the class of third

year students. We can then define property inheritance, so that, by default, Fred inherits all

the typical attributes of AI3 students, and AI3 students inherit typical attributes of 3rd yr

students. We'll go into this in much more detail below.

Production systems consist of a set of if-then rules, and a working memory. The working

memory represents the facts that are currently believed to hold, while the if-then rules

typically state that if certain conditions hold (e.g. certain facts are in the working memory),

then some action should be taken (e.g., other facts should be added or deleted). If the only

action allowed is to add a fact to working memory then rules may be essentially logical

implications, but generally greater flexibility is allowed. Production rules capture (relatively)

procedural knowledge in a simple, modular manner.
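A minimal production system along these lines might look as follows. This is a sketch under simplifying assumptions: the facts and rules are invented, and the only action allowed is adding a fact to working memory:

```python
# Working memory holds the facts currently believed to hold; each rule
# fires when its condition facts are all present, adding its conclusion.
rules = [
    ({"raining"}, "ground_wet"),
    ({"ground_wet"}, "roads_slippery"),
]

def run(working_memory, rules):
    """Forward chaining: keep firing rules until nothing new is added."""
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= working_memory and conclusion not in working_memory:
                working_memory.add(conclusion)   # the rule's action
                changed = True
    return working_memory

facts = run({"raining"}, rules)   # both rules fire in turn
```

Starting from the single fact "raining", the first rule fires and deposits "ground_wet", which in turn lets the second rule fire: the rules behave like chained logical implications.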


UNIT 2 KNOWLEDGE REPRESENTATION

2.1 What to Represent?

2.2 Application of Knowledge in AI

2.3 Properties for Knowledge Representation Systems

2.4 Approaches to Knowledge Representation

2.4.1 Simple Relational Knowledge

2.4.2 Inheritable Knowledge

2.5 Inferential Knowledge

2.6 Procedural Vs. Declarative Knowledge

2.7 Issues in Knowledge Representation

2.8 The Frame Problem

2.9 Forward Vs Backward Reasoning

2.1 What to Represent?

Let us first consider what kinds of knowledge might need to be represented in AI systems:

Objects

-- Facts about objects in our world domain. e.g. Guitars have strings, trumpets are

brass instruments.

Events

-- Actions that occur in our world. e.g. Steve Vai played the guitar in Frank Zappa's

Band.

Performance

-- A behavior like playing the guitar involves knowledge about how to do things.

Meta-knowledge

-- Knowledge about what we know. e.g. Bobrow's robot who plans a trip. It knows

that it can read street signs along the way to find out where it is.

Thus in solving problems in AI we must represent knowledge and there are two entities to

deal with:

Facts

-- truths about the real world and what we represent. This can be regarded as the

knowledge level

Representation of the facts

which we manipulate. This can be regarded as the symbol level since we usually

define the representation in terms of symbols that can be manipulated by programs.

We can structure these entities at two levels

the knowledge level

-- at which facts are described

the symbol level

-- at which representations of objects are defined in terms of symbols that can be

manipulated in programs (see the following figure)

Fig. Two Entities in Knowledge Representation

English or natural language is an obvious way of representing and handling facts.

Logic enables us to represent the following fact:

Spot is a dog

as

dog(Spot)

We could then express the fact that all dogs have tails with:

∀x: dog(x) → hasatail(x)

We can then deduce:

hasatail(Spot)

Using an appropriate backward mapping function we could generate the English sentence

Spot has a tail.

The available mapping functions are not always one-to-one but rather many-to-many, which is a characteristic of English representations. The sentences ``All dogs have tails'' and ``Every dog has a tail'' both say that each dog has a tail, but the first could mean that each dog has more than one tail (try substituting ``teeth'' for ``tails''). When an AI program manipulates the internal representation of facts, these new representations should also be interpretable as new representations of facts.
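The deduction above can be mechanised in a few lines. This is an illustrative sketch, not from the original notes: each fact is stored as a (predicate, individual) pair, and the universally quantified rule is applied to every individual the premise holds for; the extra individuals Rex and Tom are invented for the example:

```python
# Facts stored as (predicate, individual) pairs: dog(Spot), dog(Rex), cat(Tom).
facts = {("dog", "Spot"), ("dog", "Rex"), ("cat", "Tom")}

def apply_forall(facts, premise, conclusion):
    """Apply the rule  forall x: premise(x) -> conclusion(x)
    to every individual the premise holds for."""
    derived = {(conclusion, x) for (p, x) in facts if p == premise}
    return facts | derived

facts = apply_forall(facts, "dog", "hasatail")
# hasatail(Spot) and hasatail(Rex) are now present; Tom the cat is untouched.
```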

Consider the classic problem of the mutilated chessboard.

Problem: In a normal chess board the two opposite corner squares have been eliminated. The given task is to cover all the squares on the remaining board with dominoes so that each domino covers two squares. No overlapping of dominoes is allowed. Can it be done? Consider three data structures.

Fig. Mutilated Checkerboard

the first two are illustrated in the diagrams above and the third data structure is the number

of black squares and the number of white squares. The first diagram loses the colour of the

Page 15: Syllabus

squares and a solution is not easy to see; the second preserves the colours but produces no easier path, whereas counting the number of squares of each colour, giving 32 black and 30 white, yields an immediate answer of NO: a domino must cover one white square and one black square, so the numbers of each colour must be equal for a positive solution.
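The third data structure above amounts to a parity count, which a few lines of code can reproduce. The colour convention chosen here is arbitrary; only the imbalance matters:

```python
# Count black and white squares on the mutilated board. Squares are
# (row, col) pairs, 0..7; call a square black when row+col is even.
removed = {(0, 0), (7, 7)}  # two opposite corners, which share a colour

squares = [(r, c) for r in range(8) for c in range(8) if (r, c) not in removed]
black = sum(1 for r, c in squares if (r + c) % 2 == 0)
white = len(squares) - black

# A domino always covers one black and one white square, so a perfect
# cover exists only if the two counts are equal.
print(black, white, "NO" if black != white else "maybe")
```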

2.2 Application of Knowledge in AI

We have briefly mentioned where knowledge is used in AI systems. Let us consider a little further what applications it serves and how it may be used.

Learning

-- acquiring knowledge. This is more than simply adding new facts to a knowledge base. New data may have to be classified prior to storage for easy retrieval, and must interact and be checked against existing facts to avoid redundancy and replication in the knowledge, and also so that facts can be updated.

Retrieval

-- The representation scheme used can have a critical effect on the efficiency of the

method. Humans are very good at it.

Many AI methods have tried to model human retrieval (see the lecture on distributed reasoning).

Reasoning

-- Infer facts from existing data.

If a system only knows:

Miles Davis is a Jazz Musician.

All Jazz Musicians can play their instruments well.

If questions like "Is Miles Davis a Jazz Musician?" or "Can Jazz Musicians play their instruments well?" are asked, then the answer is readily obtained from the data structures and procedures.

However, a question like "Can Miles Davis play his instrument well?" requires reasoning.

The above are all related. For example, it is fairly obvious that learning and reasoning

involve retrieval etc.

2.3 Properties for Knowledge Representation Systems

The following properties should be possessed by a knowledge representation system.

Representational Adequacy

-- the ability to represent the required knowledge;

Inferential Adequacy

- the ability to manipulate the knowledge represented to produce new knowledge

corresponding to that inferred from the original;

Inferential Efficiency

- the ability to direct the inferential mechanisms into the most productive directions

by storing appropriate guides;

Acquisitional Efficiency

- the ability to acquire new knowledge using automatic methods wherever possible

rather than reliance on human intervention.

To date no single system optimises all of the above.


2.4 Approaches to Knowledge Representation

We briefly survey some representation schemes. We will look at some in more detail in further chapters.

Simple relational knowledge

Inheritable knowledge

2.4.1 Simple relational knowledge

The simplest way of storing facts is to use a relational method where each fact about a set

of objects is set out systematically in columns. This representation gives little opportunity for

inference, but it can be used as the knowledge basis for inference engines.

Simple way to store facts.

Each fact about a set of objects is set out systematically in columns (shown below).

Little opportunity for inference.

Knowledge basis for inference engines.

Figure: Simple Relational Knowledge

We can ask things like:

Who is dead?

Who plays Jazz/Trumpet etc.?

This sort of representation is popular in database systems.
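Such a column-wise fact table and the queries above can be sketched as follows; the rows are illustrative stand-ins for the figure's actual data:

```python
# Facts about a set of musicians, one row per object, one column per
# attribute (the rows are illustrative).
musicians = [
    {"name": "Miles Davis", "genre": "Jazz", "instrument": "Trumpet", "dead": True},
    {"name": "John Zorn",   "genre": "Jazz", "instrument": "Sax",     "dead": False},
]

def who(table, **criteria):
    """Return the names of all rows matching every given column value."""
    return [row["name"] for row in table
            if all(row.get(k) == v for k, v in criteria.items())]

print(who(musicians, dead=True))                           # "Who is dead?"
print(who(musicians, genre="Jazz", instrument="Trumpet"))  # "Who plays Jazz/Trumpet?"
```

As the text notes, a scan like this answers direct queries but offers little opportunity for inference on its own.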

2.4.2 Inheritable knowledge

Relational knowledge is made up of objects consisting of

attributes

corresponding associated values.

We extend the base more by allowing inference mechanisms:

Property inheritance

o elements inherit values from being members of a class.

o data must be organised into a hierarchy of classes (Fig. given below).

Page 17: Syllabus

Fig. Property Inheritance Hierarchy

Boxed nodes -- objects and values of attributes of objects.

Values can be objects with attributes and so on.

Arrows -- point from object to its value.

This structure is known as a slot and filler structure, semantic network or a collection of

frames.

The algorithm to retrieve a value for an attribute of an instance object:

1. Find the object in the knowledge base.

2. If there is a value for the attribute, report it.

3. Otherwise, look for a value of the instance attribute; if there is none, fail.

4. Otherwise, go to that node and find a value for the attribute, then report it.

5. Otherwise, search upward through isa links until a value is found for the attribute.
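The retrieval steps above can be sketched over a hypothetical slot-and-filler hierarchy; the object names and links here are assumptions for illustration:

```python
# A slot-and-filler hierarchy: "instance" and "isa" links carry
# property inheritance, as in the numbered steps above.
kb = {
    "Musician":      {"can_play": True},
    "Jazz-Musician": {"isa": "Musician", "genre": "Jazz"},
    "Miles-Davis":   {"instance": "Jazz-Musician", "instrument": "Trumpet"},
}

def get_value(obj, attribute):
    """Report a local value if present, else climb instance/isa links."""
    while obj is not None:
        node = kb[obj]
        if attribute in node:          # value found at this node
            return node[attribute]
        # follow the instance link first, then isa, until a value appears
        obj = node.get("instance", node.get("isa"))
    return None                        # no value anywhere: fail

print(get_value("Miles-Davis", "genre"))     # inherited from Jazz-Musician
print(get_value("Miles-Davis", "can_play"))  # inherited from Musician
```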

2.5 Inferential Knowledge

Represent knowledge as formal logic:

All dogs have tails

∀x: dog(x) → hasatail(x)

Advantages:

A set of strict rules.

o Can be used to derive more facts.

o Truths of new statements can be verified.

o Guaranteed correctness.


Many inference procedures are available to implement standard rules of logic.

Popular in AI systems, e.g. automated theorem proving.

2.6 Procedural Vs. Declarative Knowledge

Declarative knowledge representation:

Static representation -- knowledge about objects, events, etc., and their relationships and states is given.

Requires a program to know what to do with knowledge and how to do it.

Procedural representation:

control information necessary to use the knowledge is embedded in the knowledge itself.

e.g. how to find relevant facts, make inferences etc.

Requires an interpreter to follow instructions specified in knowledge.

Knowledge encoded in some procedures

o small programs that know how to do specific things, how to proceed.

o e.g. a parser in a natural language understander has the knowledge that a noun

phrase may contain articles, adjectives and nouns. It is represented by calls to

routines that know how to process articles, adjectives and nouns.

Advantages:

Heuristic or domain specific knowledge can be represented.

Extended logical inferences, such as default reasoning facilitated.

Side effects of actions may be modelled. Some rules may become false in time. Keeping

track of this in large systems may be tricky.

Disadvantages:

Completeness -- not all cases may be represented.

Consistency -- not all deductions may be correct.

e.g. if we know that Fred is a bird, we might deduce that Fred can fly. Later we might discover that Fred is an emu.

Modularity is sacrificed. Changes in knowledge base might have far-reaching effects.

Cumbersome control information.

2.7 Issues in Knowledge Representation

Below are listed issues that should be raised when using a knowledge representation

technique:

Important Attributes

-- Are there any attributes that occur in many different types of problem?

There are two: instance and isa. Each is important because each supports property inheritance.

Relationships

-- What about the relationships between the attributes of an object, such as inverses, existence, techniques for reasoning about values, and single-valued attributes? We can consider an example of an inverse in

band(John Zorn,Naked City)


This can be treated as John Zorn plays in the band Naked City or John Zorn's band is Naked

City.

Another representation is band = Naked City

band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron,

Granularity

At what level should the knowledge be represented, and what are the primitives? Primitives are fundamental concepts such as holding, seeing, or playing, and as English is a very rich language with over half a million words, it is clear we will find difficulty in deciding which words to choose as our primitives in a given set of situations.

If Tom feeds a dog then it could become:

feeds(tom, dog)

If Tom gives the dog a bone like:

gives(tom, dog, bone)

Are these the same? In any sense does giving an object food constitute feeding?

If give(x, food) → feed(x), then we are making progress.

But we need to add certain inferential rules.

In the famous program on relationships

Louise is Bill's cousin

How do we represent this?

louise = daughter(brother_or_sister(father_or_mother(bill)))

Suppose the cousin is Chris; then we do not know whether Chris is male or female, and so son applies as well as daughter.

Clearly the separate levels of understanding require different levels of primitives and these

need many rules to link together apparently similar primitives.

Obviously there is a potential storage problem and the underlying question must be what

level of comprehension is needed.

2.8 The Frame Problem

So far in this chapter we have seen several methods for representing knowledge that allow us to form complex state descriptions for a search program. Another issue concerns how to represent efficiently the sequences of problem states that arise from a search process. For complex, ill-structured problems, this can be a serious matter.

Consider the world of a household robot. There are many objects and relationships in the

world, and a state description must somehow include facts like

on(Plant12, Table34),

under(Table34,Window13), and

in(Table34, Room15).

One strategy is to represent each state description as a list of such facts. But what happens during the problem-solving process if each of those descriptions is very long? Most of the facts will not change from one state to another, yet each fact will be represented once at every node, and we will quickly run out of memory. Furthermore, we will spend the majority of our time creating these nodes and copying these facts, most of which do not change often, from


one node to another. For example, in the robot world, we could spend a lot of time recording

above(Ceiling, Floor)

at every node. All of this is, of course, in addition to the real problem of figuring out which

facts should be different at each node.

This whole problem of representing the facts that change as well as those that do not is

known as the frame problem. In some domains, the only hard part is representing all the

facts. In others, though, figuring out which ones change is nontrivial. For example, in the

robot world, there might be a table with a plant on it under the window. Suppose we move

the table to the center of the room. We must also infer that the plant is now in the center of

the room but that the window is not.

To support this kind of reasoning, some systems make use of an explicit set of axioms called

frame axioms, which describe all the things that do not change when a particular operator is

applied in state n to produce state n+1. (The things that do change must be mentioned as

part of the operator itself.) Thus, in the robot domain, we might write axioms such as

color(x, y, s1) ∧ move(x, s1, s2) → color(x, y, s2)

which can be read as, “if x has color y in state s1 and the operation of moving x is applied in

state s1 to produce state s2, then the color of x in s2 is still y.” Unfortunately, in any

complex domain, a huge number of these axioms becomes necessary. An alternative

approach is to make the assumption that the only things that change are the things that

must. By “must” here we mean that the change is either required explicitly by the axioms

that describe the operator or that it follows logically from some change that is asserted

explicitly. This idea of circumscribing the set of unusual things is a very powerful one; it can

be used as a partial solution to the frame problem and as a way of reasoning with

incomplete knowledge.

But now let’s return briefly to the problem of representing a changing problem state. We

could do it by simply starting with a description of the initial state and then making changes

to that description as indicated by the rules we apply. This solves the problem of the wasted

space and time involved in copying the information for each node. And it works fine until the

first time the search has to backtrack. Then, unless all the changes that were made can

simply be ignored (as they could be if, for example, they were simply additions of new

theorems), we are faced with the problem of backing up to some earlier node. But how do

we know what changes in the problem state description need to be undone? For example,

what do we have to change to undo the effect of moving the table to the center of the room?

There are two ways this problem can be solved:

Do not modify the initial state description at all. At each node, store an indication of the specific changes that should be made at this node. Whenever it is necessary to refer to the description of the current problem state, look at the initial state description and also look back through all the nodes on the path from the start state to the current state.

Modify the initial state description as appropriate, but also record at each node an indication of what to do to undo the move should it ever be necessary to backtrack through the node. Then, whenever it is necessary to backtrack, check each node along the way and perform the indicated operations on the state description.
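The second scheme (modify one shared description, but record how to undo each move) can be sketched like this; the facts and the operator are illustrative:

```python
# One shared state description, plus an undo record per applied operator.
state = {"on(Plant12, Table34)", "under(Table34, Window13)"}
undo_stack = []  # one (added, removed) pair per node on the current path

def apply_op(added, removed):
    """Apply an operator's effects and remember how to reverse them."""
    state.difference_update(removed)
    state.update(added)
    undo_stack.append((added, removed))

def backtrack():
    """Undo the most recent operator using its record."""
    added, removed = undo_stack.pop()
    state.difference_update(added)
    state.update(removed)

apply_op({"in-center(Table34)"}, {"under(Table34, Window13)"})  # move the table
backtrack()                                                     # back up
print("under(Table34, Window13)" in state)  # True: earlier state restored
```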


Sometimes, even these solutions are not enough. We might want to remember, for example,

in the robot world, that before the table was moved, it was under the window and after

being moved, it was in the center of the room. This can be handled by adding to the

representation of each fact a specific indication of the time at which that fact was true. This

indication is called a state variable. But to apply the same technique to a real-world problem, we would need, for example, separate facts to indicate all the times at which the Statue of Liberty is in New York.

2.9 Forward Vs. Backward Reasoning

The object of a search procedure is to discover a path through a problem space from an initial state to a goal state. There are actually two directions in which such a search could proceed.

Forward, from the start states.

Backward, from the goal states.

The production system model of the search process provides an easy way of viewing forward

and backward reasoning as symmetric processes. Consider the problem of solving a

particular instance of the 8-puzzle. The rules to be used for solving the puzzle can be written

as shown in following figure.

Assume the areas of the tray are numbered:

Square 1 empty and Square 2 contains tile n → Square 2 empty and Square 1 contains tile n

Square 1 empty and Square 4 contains tile n → Square 4 empty and Square 1 contains tile n

Square 2 empty and Square 1 contains tile n → Square 1 empty and Square 2 contains tile n

Using those rules we could attempt to solve the puzzle shown below.

Reason forward from the initial states. Begin by building a tree of move sequences that

might be solutions by starting with the initial configuration(s) at the root of the tree.

Generate the next level of the tree by finding all the rules whose left sides match the root

node and using their right sides to create the new configurations. Generate the next level by

taking each node generated at the previous level and applying to it all of the rules whose

left sides match it. Continue until a configuration that matches the goal state is generated.

Reason backward from the goal states. Begin building a tree of move sequences that might

be solutions by starting with the goal configuration(s) at the root of the tree. Generate the

next level of the tree by finding all the rules whose right sides match the root node. These


are all the rules that, if only we could apply them, would generate the state we wanted. Use

the left sides of the rules to generate the nodes at this second level of the tree. Generate

the next level of the tree by taking each node at the previous level and finding all the rules

whose right sides match it. Then using the corresponding left sides to generate the new

nodes. Continue until a node that matches the initial state is generated. This method of

reasoning backward from the desired final state is often called goal-directed reasoning.

Four factors influence the question of whether it is better to reason forward or backward:

Are there more possible start states or goal states? We want to move from the smaller set to the larger.

In which direction is the branching factor smaller? We want to proceed in the direction of the smaller branching factor.

Will the program be asked to justify its reasoning process to the user? If so, proceed in the direction that corresponds to the way the user thinks.

What kind of event is going to trigger a problem-solving episode? If a new fact triggers reasoning, forward chaining is better; if a query triggers reasoning, backward chain.
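The two directions can be contrasted on a toy rule base; the rules and fact strings below are illustrative, not the 8-puzzle rules themselves:

```python
# Rules map a set of premise facts to one conclusion (all illustrative).
rules = [({"jazz-musician(miles)"}, "plays-well(miles)"),
         ({"musician(miles)"}, "jazz-musician(miles)")]

def forward(facts):
    """Data-driven: fire every rule whose premises hold, to a fixpoint."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

def backward(goal, facts):
    """Goal-driven: a goal holds if it is known, or if some rule that
    concludes it has premises that all hold recursively."""
    if goal in facts:
        return True
    return any(conclusion == goal and all(backward(p, facts) for p in premises)
               for premises, conclusion in rules)

print(forward({"musician(miles)"}))                        # derives both facts
print(backward("plays-well(miles)", {"musician(miles)"}))  # True
```

Forward chaining matches rule left sides against known facts; backward chaining matches rule right sides against the current goal, exactly as described above.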


UNIT 3 FIRST ORDER PREDICATE LOGIC

3.1 Logic

3.2 Introduction to Propositional Logic

3.3 Predicate Logic

3.3.1 Introduction to Predicate Logic

3.3.2 Predicate Logic: Semantics

3.4 Quantification

3.5 Well-Formed Formula For First Order Predicate Logic

3.6 From Wff to Proposition

3.7 Transcribing English to Predicate Logic Wffs

3.8 Properties of Statements

3.9 Inference Rules

3.10 Resolution

3.11 Conversion to Clausal Form

3.12 Unification

3.1 Logic

Logic is a language for reasoning. It is a collection of rules we use when doing logical

reasoning. Human reasoning has been observed over centuries from at least the times of

Greeks, and patterns appearing in reasoning have been extracted, abstracted, and

streamlined. The foundation of the logic we are going to learn here was laid down by a

British mathematician George Boole in the middle of the 19th century, and it was further

developed and used in an attempt to derive all of mathematics by Gottlob Frege, a German

mathematician, towards the end of the 19th century. A British philosopher/mathematician,

Bertrand Russell, found a flaw in the basic assumptions of Frege's attempt, but together with Alfred North Whitehead he developed Frege's work further and repaired the damage. The logic we

study today is more or less along this line.

In logic we are interested in the truth or falsehood of statements, and in how the truth/falsehood of a statement can be determined from other statements. However, instead of dealing with

individual specific statements, we are going to use symbols to represent arbitrary

statements so that the results can be used in many similar but different cases. The

formalization also promotes the clarity of thought and eliminates mistakes.

There are various types of logic such as logic of sentences (propositional logic), logic of

objects (predicate logic), logic involving uncertainties, logic dealing with fuzziness, temporal

logic etc. Here we are going to be concerned with propositional logic and predicate logic,

which are fundamental to all types of logic.

3.2 Introduction to Propositional Logic

Propositional logic is a logic at the sentential level. The smallest unit we deal with in

propositional logic is a sentence. We do not go inside individual sentences and analyze or

discuss their meanings. We are going to be interested only in true or false of sentences, and

major concern is whether or not the truth or falsehood of a certain sentence follows from


those of a set of sentences, and if so, how. Thus the sentences considered in this logic are not arbitrary sentences but ones that are either true or false, but not both. These kinds of sentences are called propositions.

If a proposition is true, then we say it has a truth value of "true"; if a proposition is false,

its truth value is "false".

For example, "Grass is green", and "2 + 5 = 5" are propositions. The first proposition has

the truth value of "true" and the second "false".

But "Close the door", and "Is it hot outside?" are not propositions.

Also "x is greater than 2", where x is a variable representing a number, is not a proposition,

because unless a specific value is given to x we can not say whether it is true or false, nor

do we know what x represents.

Similarly "x = x" is not a proposition because we don't know what "x" represents hence what

"=" means. For example, while we understand what "3 = 3" means, what does "Air is equal

to air" or "Water is equal to water" mean? Does it mean a mass of air is equal to another

mass, or that the concept of air is equal to the concept of air? We don't quite know what "x = x" means. Thus we cannot say whether it is true or not. Hence it is not a proposition.

Simple sentences which are true or false are basic propositions. Larger and more complex

sentences are constructed from basic propositions by combining them with connectives.

Thus propositions and connectives are the basic elements of propositional logic. Though

there are many connectives, we are going to use the following five basic connectives

here:

        NOT,  AND,  OR,  IF_THEN (or IMPLY),  IF_AND_ONLY_IF.

They are also denoted by the symbols:

          ¬,  ∧,  ∨,  →,  ↔,

respectively.

Truth Table

Often we want to discuss properties/relations common to all propositions. In such a case

rather than stating them for each individual proposition we use variables representing an

arbitrary proposition and state properties/relations in terms of those variables. Those

variables are called a propositional variable. Propositional variables are also

considered a proposition and called a proposition since they represent a proposition

hence they behave the same way as propositions. A proposition in general contains a

number of variables. For example (P Q) contains variables P and Q each of which

represents an arbitrary proposition. Thus a proposition takes different values depending on

the values of the constituent variables. This relationship of the value of a proposition and

those of its constituent variables can be represented by a table. It tabulates the value of a

proposition for all possible values of its variables and it is called a truth table.

For example, the following table shows the relationship between the values of P, Q and P ∨ Q:

OR

P  Q  (P ∨ Q)
F  F  F
F  T  T
T  F  T
T  T  T


In the table, F represents the truth value false and T true. The table shows that P ∨ Q is false if P and Q are both false, and that it is true in all the other cases.
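A truth table can be generated mechanically by enumerating all assignments of truth values to the variables; a small sketch:

```python
from itertools import product

def truth_table(variables, formula):
    """Yield one row (value of each variable, then the result) per
    assignment of truth values, F before T as in the table above."""
    for values in product([False, True], repeat=len(variables)):
        yield values + (formula(*values),)

# Reproduce the table for P OR Q.
for row in truth_table(("P", "Q"), lambda p, q: p or q):
    print(row)  # (False, False, False) ... (True, True, True)
```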

Meaning of the Connectives

Meaning of connectives: NOT, AND, OR, IMPLIES, IF AND ONLY IF

Let us define the meaning of the five connectives by showing the relationship between

the truth value (i.e. true or false) of composite propositions and those of their component

propositions. They are going to be shown using truth tables. In the tables P and Q represent

arbitrary propositions, and true and false are represented by T and F, respectively.

The first table shows that if P is true, then (¬P) is false, and that if P is false, then (¬P) is true.

The second shows that (P ∧ Q) is true if both P and Q are true, and that it is false in any other case.

Similarly for the rest of the tables.

NOT

P  ¬P
T  F
F  T

AND

P  Q  (P ∧ Q)
F  F  F
F  T  F
T  F  F
T  T  T

OR

P  Q  (P ∨ Q)
F  F  F
F  T  T
T  F  T
T  T  T


IMPLIES

P  Q  (P → Q)
F  F  T
F  T  T
T  F  F
T  T  T

When P → Q is always true, we express that by P ⇒ Q. That is, P ⇒ Q is used when proposition P always implies proposition Q regardless of the values of the variables in them.

IF AND ONLY IF

P  Q  (P ↔ Q)
F  F  T
F  T  F
T  F  F
T  T  T

When P ↔ Q is always true, we express that by P ⇔ Q. That is, ⇔ is used when two propositions always take the same value regardless of the values of the variables in them.


Construction of Complex Propositions

Syntax of propositions

First it is informally shown how complex propositions are constructed from simple ones.

Then more general way of constructing propositions is given.

In everyday life we often combine propositions to form more complex propositions without

paying much attention to them. For example combining "Grass is green", and "The sun is

red" we say something like "Grass is green and the sun is red", "If the sun is red, grass is

green", "The sun is red and the grass is not green" etc. Here "Grass is green", and "The sun

is red" are propositions, and from them, using the connectives "and", "if... then..." and "not", slightly more complex propositions are formed. These new propositions can in turn be

combined with other propositions to construct more complex propositions. They then can be

combined to form even more complex propositions. This process of obtaining more and

more complex propositions can be described more generally as follows:

Let X and Y represent arbitrary propositions. Then

[¬X],   [X ∧ Y],   [X ∨ Y],   [X → Y],   and   [X ↔ Y]

are propositions.

Note that X and Y here represent arbitrary propositions. This is actually part of a more rigorous definition of proposition.

Example: [P → [Q ∨ R]] is a proposition, and it is obtained by first constructing [Q ∨ R] by applying [X ∨ Y] to propositions Q and R, considering them as X and Y respectively, then by applying [X → Y] to the two propositions P and [Q ∨ R], considering them as X and Y respectively.

Note: Rigorously speaking, X and Y above are placeholders for propositions, and so they are not exactly propositions. They are called propositional variables, and propositions formed from them using connectives are called propositional forms. However, we are not going to distinguish them here, and both specific propositions such as "2 is greater than 1" and propositional forms such as (P ∨ Q) are going to be called propositions.

For the proposition P → Q, the proposition Q → P is called its converse, and the proposition ¬Q → ¬P is called its contrapositive.

For example for the proposition "If it rains, then I get wet",

Converse: If I get wet, then it rains.

Contrapositive: If I don't get wet, then it does not rain.

The converse of a proposition is not necessarily logically equivalent to it, that is they may or

may not take the same truth value at the same time.



On the other hand, the contrapositive of a proposition is always logically equivalent to the

proposition. That is, they take the same truth value regardless of the values of their

constituent variables. Therefore, "If it rains, then I get wet." and "If I don't get wet, then it does not rain." are logically equivalent. If one is true then the other is also true, and vice

versa.
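This equivalence, and the non-equivalence of the converse, can be checked exhaustively over the four assignments:

```python
from itertools import product

def implies(a, b):
    return (not a) or b  # truth-table meaning of a -> b

rows = list(product([False, True], repeat=2))
contrapositive_ok = all(implies(p, q) == implies(not q, not p) for p, q in rows)
converse_ok = all(implies(p, q) == implies(q, p) for p, q in rows)

print(contrapositive_ok)  # True: P -> Q is equivalent to not-Q -> not-P
print(converse_ok)        # False: the converse Q -> P is not equivalent
```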

As we are going to see in the next section, reasoning is done on propositions using inference

rules. For example, if the two propositions "if it snows, then the school is closed", and "it

snows" are true, then we can conclude that "the school is closed" is true. In everyday life,

that is how we reason.

To check the correctness of reasoning, we must check whether or not rules of inference have

been followed to draw the conclusion from the premises. However, for reasoning in English

or in general for reasoning in a natural language, that is not necessarily straightforward and often encounters difficulties. Firstly, connectives are not necessarily easily identified, as we saw in the previous topic on variations of if-then statements.

Secondly, if the argument becomes complicated involving many statements in a number of

different forms twisted and tangled up, it can easily get out of hand unless it is simplified in

some way.

One solution for that is to use symbols (and mechanize it). Each sentence is represented by

symbols representing building block sentences, and connectives. For example, if P

represents "it snows" and Q represents "the school is closed", then the previous argument

can be expressed as

[[P → Q] ∧ P] → Q,

or

P → Q

P

-----------------------------

Q

This representation is concise, much simpler, and much easier to deal with. In addition, today there are a number of automatic reasoning systems, and we can verify our arguments in symbolic form using them. One such system, called TPS, is used for reasoning exercises in this course. For example, we can check the correctness of our argument using it.

To convert English statements into a symbolic form, we restate the given statements using

the building block sentences and the connectives of propositional logic (not, and, or, if_then,

if_and_only_if), and then substitute the symbols for the building blocks and the connectives.

For example, let P be the proposition "It is snowing", Q be the proposition "I will go to the

beach", and R be the proposition "I have time". Then first "I will go to the beach if it is not

snowing" is restated as "If it is not snowing, I will go to the beach". Then symbols P and Q

are substituted for the respective sentences to obtain ¬P → Q. Similarly, "It is not snowing

and I have time only if I will go to the beach" is restated as "If it is not snowing and I have

time, then I will go to the beach", and it is translated as (¬P ∧ R) → Q.

3.3 Predicate Logic


The propositional logic is not powerful enough to represent all types of assertions that are

used in computer science and mathematics, or to express certain types of relationship

between propositions such as equivalence.

For example, the assertion "x is greater than 1", where x is a variable, is not a proposition

because you cannot tell whether it is true or false unless you know the value of x. Thus

propositional logic can not deal with such sentences. However, such assertions appear quite

often in mathematics and we want to do inferencing on those assertions.

Also the pattern involved in the following logical equivalences cannot be captured by the

propositional logic:

"Not all birds fly" is equivalent to "Some birds don't fly".

"Not all integers are even" is equivalent to "Some integers are not even".

"Not all cars are expensive" is equivalent to "Some cars are not expensive".

Each of those propositions is treated independently of the others in propositional logic. For

example, if P represents "Not all birds fly" and Q represents "Some birds don't fly", then there is no mechanism in propositional logic to find out that P is equivalent to Q. Hence, to

be used in inferencing, each of these equivalences must be listed individually rather than being derived from a general formula that covers all of them collectively, instantiated as necessary, if only propositional logic is used.

Predicate logic is a development of propositional logic, which should be familiar to you. In

propositional logic a fact such as ``Alison likes waffles'' would be represented as a simple atomic proposition. Let's call it P. We can build up more complex expressions (sentences) by combining atomic propositions with the logical connectives ∧, ∨, ¬, → and ↔. So if we had the proposition Q representing the fact ``Alison eats waffles'' we could have the facts:

P ∨ Q : ``Alison likes waffles or Alison eats waffles''

P ∧ Q : ``Alison likes waffles and Alison eats waffles''

¬Q : ``Alison doesn't eat waffles''

P → Q : ``If Alison likes waffles then Alison eats waffles''.

In general, if X and Y are sentences in propositional logic, then so are X ∧ Y, X ∨ Y, ¬X, X → Y, and X ↔ Y. So the following are valid sentences in the logic:

P ∨ ¬Q

P ∧ (P → Q)

(Q ∨ ¬R) → P

Propositions can be true or false in the world. An interpretation function assigns, to each

proposition, a truth value (i.e., true or false). This interpretation function says what is true in

the world. We can determine the truth value of arbitrary sentences using truth tables which

define the truth values of sentences with logical connectives in terms of the truth values of

their component sentences. The truth tables provide a simple semantics for expressions in

propositional logic. As sentences can only be true or false, truth tables are very simple.

In order to infer new facts in a logic we need to apply inference rules. The semantics of the

logic will define which inference rules are universally valid. One useful inference rule is the

following (called modus ponens) but many others are possible:

assertion : a
implication : a → b
---
conclusion : b

This rule just says that if a àb is true, and a is true, then b is necessarily true. We could

prove that this rule is valid using truth tables.
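That truth-table proof can be carried out by brute force: the rule is valid if, in every row where both premises hold, the conclusion holds too. A sketch of this check (the helper implies() is our own name):

```python
from itertools import product

def implies(a, b):
    # a -> b is false only when a is true and b is false.
    return (not a) or b

# Modus ponens is valid iff every truth assignment that makes both
# premises (a, and a -> b) true also makes the conclusion b true.
valid = all(b
            for a, b in product([True, False], repeat=2)
            if a and implies(a, b))
print(valid)  # True
```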

Thus we need a more powerful logic to deal with these and other problems. Predicate logic is one such logic, and it addresses these issues among others.

3.3.1 Introduction to Predicate Logic

The most important knowledge representation language is arguably predicate logic (or

strictly, first order predicate logic - there are lots of other logics out there to distinguish

between). Predicate logic allows us to represent fairly complex facts about the world, and to

derive new facts in a way that guarantees that, if the initial facts were true then so are the

conclusions. It is a well understood formal language, with well-defined syntax, semantics

and rules of inference.

The trouble with propositional logic is that it is not possible to write general statements in it,

such as ``Alison eats everything that she likes''. We'd have to have lots of rules, for every

different thing that Alison liked. Predicate logic makes such general statements possible.

Sentences in predicate calculus are built up from atomic sentences (not to be confused with

Prolog atoms). Atomic sentences consist of a predicate name followed by a number of

arguments. These arguments may be any term. Terms may be:

Constant symbols

such as ``alison''.

Variable symbols

such as ``X''. For consistency with Prolog we'll use capital letters to denote variables.

Function expressions

such as ``father(alison)''. Function expressions consist of a function followed by a

number of arguments, which can be arbitrary terms.

This should all seem familiar from our description of Prolog syntax. However, although

Prolog is based on predicate logic the way we represent things is slightly different, so the

two should not be confused.

So, atomic sentences in predicate logic include the following:

friends(alison, richard)

friends(father(fred), father(joe))

likes(X, richard)

Sentences in predicate logic are constructed (much as in propositional logic) by combining

atomic sentences with logical connectives, so the following are all sentences in predicate

calculus:

friends(alison, richard) → likes(alison, richard)

likes(alison, richard) ∨ likes(alison, waffles)

((likes(alison, richard) ∨ likes(alison, waffles)) ∧ ~likes(alison, waffles)) → likes(alison, richard)


Sentences can also be formed using quantifiers to indicate how any variables in the sentence are to be treated. The two quantifiers in predicate logic are ∀ and ∃, so the following are valid sentences:

∃X bird(X) ∧ ~flies(X)

i.e., there exists some bird that doesn't fly.

∀X (person(X) → ∃Y loves(X, Y))

i.e., every person has something that they love.

A sentence should have all its variables quantified. So strictly, an expression like ``∀X loves(X, Y)'', though a well formed formula of predicate logic, is not a sentence. Formulae with all their variables quantified are also called closed formulae.

3.3.2 Predicate Logic: Semantics

The semantics of predicate logic is defined (as in propositional logic) in terms of the truth

values of sentences. Like in propositional logic, we can determine the truth value of any

sentence in predicate calculus if we know the truth values of the basic components of that

sentence. An interpretation function defines the basic meanings/truth values of the basic

components, given some domain of objects that we are concerned with.

In propositional logic we saw that this interpretation function was very simple, just assigning

truth values to propositions. However, in predicate calculus we have to deal with predicates,

variables and quantifiers, so things get much more complex.

Predicates are dealt with in the following way. If we have, say, a predicate P with 2

arguments, then the meaning of that predicate is defined in terms of a mapping from all

possible pairs of objects in the domain to a truth value. So, suppose we have a domain with

just three objects in: fred, jim and joe. We can define the meaning of the predicate father in

terms of all the pairs of objects for which the father relationship is true - say fred and jim.

The meaning of ∀ and ∃ is defined again in terms of the set of objects in the domain. ∀X S means that for every object X in the domain, S is true. ∃X S means that for some object X in the domain, S is true. So, ∀X father(fred, X), given our world (domain) of 3 objects (fred, jim, joe), would only be true if father(fred, X) was true for each object. In our interpretation of the father relation this only holds for X=jim, so the whole quantified expression will be false in this interpretation.

This only gives a flavor of how we can give a semantics to expressions in predicate logic. The details are best left to logicians. The important thing is that everything is very precisely defined, so if we use predicate logic we should know exactly where we are and what inferences are valid.

We can explain the whole concept in simpler terms as follows.

To cope with deficiencies of propositional logic we introduce two new features: predicates

and quantifiers.

A predicate is a verb phrase template that describes a property of objects, or a relationship

among objects represented by the variables.

For example, the sentences "The car Tom is driving is blue", "The sky is blue", and "The

cover of this book is blue" come from the template "is blue" by placing an appropriate

noun/noun phrase in front of it. The phrase "is blue" is a predicate and it describes the


property of being blue. Predicates are often given a name. For example any of "is_blue",

"Blue" or "B" can be used to represent the predicate "is blue" among others. If we adopt B

as the name for the predicate "is_blue", sentences that assert an object is blue can be

represented as "B(x)", where x represents an arbitrary object. B(x) reads as "x is blue".

Similarly the sentences "John gives the book to Mary", "Jim gives a loaf of bread to Tom",

and "Jane gives a lecture to Mary" are obtained by substituting an appropriate object for

variables x, y, and z in the sentence "x gives y to z". The template "... gives ... to ..." is a

predicate and it describes a relationship among three objects. This predicate can be

represented by Give( x, y, z ) or G( x, y, z ), for example.

Note: The sentence "John gives the book to Mary" can also be represented by another

predicate such as "gives a book to". Thus if we use B( x, y ) to denote this predicate, "John

gives the book to Mary" becomes B( John, Mary ). In that case, the other sentences, "Jim

gives a loaf of bread to Tom", and "Jane gives a lecture to Mary", must be expressed with

other predicates.

3.4 Quantification --- Forming Propositions from Predicates

Subjects to be Learned

universe

universal quantifier

existential quantifier

free variable

bound variable

scope of quantifier

order of quantifiers

Contents

A predicate with variables is not a proposition. For example, the statement x > 1 with

variable x over the universe of real numbers is neither true nor false since we don't know

what x is. It can be true or false depending on the value of x.

For x > 1 to be a proposition either we substitute a specific number for x or change it to

something like "There is a number x for which x > 1 holds", or "For every number x, x > 1

holds".

More generally, a predicate with variables (called an atomic formula) can be made a

proposition by applying one of the following two operations to each of its variables:

1. assign a value to the variable

2. quantify the variable using a quantifier (see below).

For example, x > 1 becomes 3 > 1 if 3 is assigned to x, and it becomes a true statement,

hence a proposition.

In general, a quantification is performed on formulas of predicate logic (called wffs), such as x > 1 or P(x), by using quantifiers on variables. There are two types of quantifiers: the universal quantifier and the existential quantifier.

The universal quantifier turns, for example, the statement x > 1 into "for every object x in the universe, x > 1", which is expressed as "∀x (x > 1)". This new statement is true or false in the universe of discourse. Hence it is a proposition once the universe is specified.


Similarly the existential quantifier turns, for example, the statement x > 1 into "for some object x in the universe, x > 1", which is expressed as "∃x (x > 1)". Again, it is true or false in the universe of discourse, and hence it is a proposition once the universe is specified.

Universe of Discourse

The universe of discourse, also called universe, is the set of objects of interest. The

propositions in the predicate logic are statements on objects of a universe. The universe is

thus the domain of the (individual) variables. It can be the set of real numbers, the set of

integers, the set of all cars on a parking lot, the set of all students in a classroom etc. The

universe is often left implicit in practice. But it should be obvious from the context.

The Universal Quantifier

The expression ∀x P(x) denotes the universal quantification of the atomic formula P(x). Translated into the English language, the expression is understood as: "For all x, P(x) holds", "for each x, P(x) holds" or "for every x, P(x) holds". ∀ is called the universal quantifier, and ∀x means all the objects x in the universe. If this is followed by P(x) then the meaning is that P(x) is true for every object x in the universe. For example, "All cars have wheels" could be transformed into the propositional form, ∀x P(x), where:

P(x) is the predicate denoting: x has wheels, and

the universe of discourse is only populated by cars.

Universal Quantifier and Connective AND

If all the elements in the universe of discourse can be listed, then the universal quantification ∀x P(x) is equivalent to the conjunction: P(x1) ∧ P(x2) ∧ P(x3) ∧ ... ∧ P(xn).

For example, in the above example of ∀x P(x), if we knew that there were only 4 cars in our universe of discourse (c1, c2, c3 and c4) then we could also translate the statement as:

P(c1) ∧ P(c2) ∧ P(c3) ∧ P(c4)

The Existential Quantifier

The expression ∃x P(x) denotes the existential quantification of P(x). Translated into the English language, the expression could be understood as: "There exists an x such that P(x)" or "There is at least one x such that P(x)". ∃ is called the existential quantifier, and ∃x means at least one object x in the universe. If this is followed by P(x) then the meaning is that P(x) is true for at least one object x of the universe. For example, "Someone loves you" could be transformed into the propositional form, ∃x P(x), where:

P(x) is the predicate meaning: x loves you,

The universe of discourse contains (but is not limited to) all living creatures.

Existential Quantifier and Connective OR

If all the elements in the universe of discourse can be listed, then the existential quantification ∃x P(x) is equivalent to the disjunction: P(x1) ∨ P(x2) ∨ P(x3) ∨ ... ∨ P(xn).

For example, in the above example of ∃x P(x), if we knew that there were only 5 living creatures in our universe of discourse (say: me, he, she, rex and fluff), then we could also write the statement as: P(me) ∨ P(he) ∨ P(she) ∨ P(rex) ∨ P(fluff)
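Both reductions are immediate to check on a finite universe: Python's built-in all() is exactly the finite conjunction, and any() the finite disjunction. A small sketch, with an invented universe and predicate:

```python
universe = [1, 3, 5]

def P(x):
    # P(x): "x is greater than 0" -- an example predicate.
    return x > 0

# "forall x P(x)" over this finite universe is P(1) and P(3) and P(5).
forall_P = all(P(x) for x in universe)
# "exists x P(x)" over this finite universe is P(1) or P(3) or P(5).
exists_P = any(P(x) for x in universe)

print(forall_P, exists_P)  # True True
```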

An appearance of a variable in a wff is said to be bound if either a specific value is assigned to it or it is quantified. If an appearance of a variable is not bound, it is called free. The extent of the application (effect) of a quantifier, called the scope of the quantifier, is indicated by square brackets [ ]. If there are no square brackets, then the scope is understood to be the smallest wff following the quantification. For example, in ∀x P(x, y), the variable x is bound while y is free. In ∀x [ ∃y P(x, y) ∨ Q(x, y) ], x and the y in P(x, y) are bound, while y in Q(x, y) is free, because the scope of ∃y is P(x, y). The scope of ∀x is [ ∃y P(x, y) ∨ Q(x, y) ].

How to read quantified formulas

When reading quantified formulas in English, read them from left to right. ∀x can be read as "for every object x in the universe the following holds" and ∃x can be read as "there exists an object x in the universe which satisfies the following" or "for some object x in the universe the following holds". These do not necessarily give us good English expressions, but they are where we can start. Get the correct reading first, then polish your English without changing the truth values.

For example, let the universe be the set of airplanes and let F(x, y) denote "x flies faster than y". Then ∀x ∀y F(x, y) can be translated initially as "For every airplane x the following holds: x is faster than every (any) airplane y". In simpler English it means "Every airplane is faster than every airplane (including itself!)".

∀x ∃y F(x, y) can be read initially as "For every airplane x the following holds: for some airplane y, x is faster than y". In simpler English it means "Every airplane is faster than some airplane".

∃x ∀y F(x, y) represents "There exists an airplane x which satisfies the following: (or such that) for every airplane y, x is faster than y". In simpler English it says "There is an airplane which is faster than every airplane" or "Some airplane is faster than every airplane".

∃x ∃y F(x, y) reads "For some airplane x there exists an airplane y such that x is faster than y", which means "Some airplane is faster than some airplane".

Order of Application of Quantifiers

When more than one variable is quantified in a wff, such as ∃y ∀x P(x, y), the quantifiers are applied from the inside, that is, the one closest to the atomic formula is applied first. Thus ∃y ∀x P(x, y) reads ∃y [ ∀x P(x, y) ], and we say "there exists a y such that for every x, P(x, y) holds" or "for some y, P(x, y) holds for every x".

The positions of the same type of quantifiers can be switched without affecting the truth value as long as there are no quantifiers of the other type between the ones to be interchanged.

For example ∃x ∃y ∃z P(x, y, z) is equivalent to ∃y ∃x ∃z P(x, y, z), ∃z ∃y ∃x P(x, y, z), etc. It is the same for the universal quantifier.

However, the positions of different types of quantifiers can not be switched.

For example ∀x ∃y P(x, y) is not equivalent to ∃y ∀x P(x, y). For let P(x, y) represent x < y for the set of numbers as the universe, for example. Then ∀x ∃y P(x, y) reads "for every number x, there is a number y that is greater than x", which is true, while ∃y ∀x P(x, y) reads "there is a number that is greater than every (any) number", which is not true.
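The non-equivalence of mixed quantifier orders can also be seen concretely on a tiny finite universe. In the sketch below (our own illustrative predicate, chosen so the two orders disagree), ∀x ∃y P(x, y) holds while ∃y ∀x P(x, y) fails:

```python
universe = [0, 1]

def P(x, y):
    # P(x, y): "x equals y" -- picked so the two quantifier orders differ.
    return x == y

# forall x exists y P(x, y): every x equals some y (take y = x) -- true.
forall_exists = all(any(P(x, y) for y in universe) for x in universe)

# exists y forall x P(x, y): one single y equal to every x -- false,
# since no one element equals both 0 and 1.
exists_forall = any(all(P(x, y) for x in universe) for y in universe)

print(forall_exists, exists_forall)  # True False
```

Note how the nesting of all() and any() mirrors the inside-out application of the quantifiers.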

3.5 Well-Formed Formula for First Order Predicate Logic --- Syntax Rules

Subjects to be Learned

wff (well formed formula)


atomic formula

syntax of wff

Contents

Not all strings can represent propositions of the predicate logic. Those which produce a proposition when their symbols are interpreted must follow the rules given below, and they are called wffs (well-formed formulas) of the first order predicate logic.

Rules for constructing Wffs

A predicate name followed by a list of variables such as P(x, y), where P is a predicate name,

and x and y are variables, is called an atomic formula.

Wffs are constructed using the following rules:

1. True and False are wffs.

2. Each propositional constant (i.e. specific proposition), and each propositional variable (i.e. a variable representing propositions) are wffs.

3. Each atomic formula (i.e. a specific predicate with variables) is a wff.

4. If A and B are wffs, then so are ~A, (A ∧ B), (A ∨ B), (A → B), and (A ↔ B).

5. If x is a variable (representing objects of the universe of discourse), and A is a wff, then so are ∀x A and ∃x A.

(Note: More generally, arguments of predicates are something called a term. Also variables representing predicate names (called predicate variables) with a list of variables can form atomic formulas. But we do not get into that here.)

For example, "The capital of Virginia is Richmond." is a specific proposition. Hence it is a wff

by Rule 2.

Let B be a predicate name representing "being blue" and let x be a variable. Then B(x) is an atomic formula meaning "x is blue". Thus it is a wff by Rule 3 above. By applying Rule 5 to B(x), ∀x B(x) is a wff and so is ∃x B(x). Then by applying Rule 4 to them, ∀x B(x) → ∃x B(x) is seen to be a wff. Similarly, if R is a predicate name representing "being round", then R(x) is an atomic formula. Hence it is a wff. By applying Rule 4 to B(x) and R(x), a wff B(x) ∨ R(x) is obtained.

In this manner, larger and more complex wffs can be constructed following the rules given

above.

Note, however, that strings that can not be constructed by using those rules are not wffs. For example, ∀x B(x)R(x) and B(∀x) are NOT wffs, NOR are B(R(x)) and B(∃x R(x)).

One way to check whether or not an expression is a wff is to try to state it in

English. If you can translate it into a correct English sentence, then it is a wff.

More examples: To express the fact that Tom is taller than John, we can use the atomic formula taller(Tom, John), which is a wff. This wff can also be part of some compound statement such as taller(Tom, John) ∨ ~taller(John, Tom), which is also a wff.

If x is a variable representing people in the world, then taller(x, Tom), ∀x taller(x, Tom), ∃x taller(x, Tom), ∀x ∃y taller(x, y) are all wffs among others.

However, taller(∀x, John) and taller(Tom Mary, Jim), for example, are NOT wffs.

3.6 From Wff to Proposition

Subjects to be Learned

interpretation


satisfiable wff

invalid wff (unsatisfiable wff)

valid wff

equivalence of wffs

Contents

Interpretation

A wff is, in general, not a proposition. For example, consider the wff ∀x P(x). Assume that P(x) means that x is non-negative (greater than or equal to 0). This wff is true if the universe is the set {1, 3, 5}, the set {2, 4, 6} or the set of natural numbers, for example, but it is not true if the universe is the set {-1, 3, 5}, or the set of integers, for example. Furthermore, the wff ∀x Q(x, y), where Q(x, y) means x is greater than y, for the universe {1, 3, 5} may be true or false depending on the value of y.

As one can see from these examples, the truth value of a wff is determined by the universe,

specific predicates assigned to the predicate variables such as P and Q, and the values

assigned to the free variables. The specification of the universe and predicates, and

an assignment of a value to each free variable in a wff is called an interpretation for

the wff.

For example, specifying the set {1, 3, 5} as the universe and assigning 0 to the variable y is an interpretation for the wff ∀x Q(x, y), where Q(x, y) means x is greater than y. ∀x Q(x, y) with that interpretation reads, for example, "Every number in the set {1, 3, 5} is greater than 0".

As can be seen from the above example, a wff becomes a proposition when it is given

an interpretation.
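This dependence on the interpretation is easy to demonstrate: fix the predicate and vary the universe, and the truth value changes. A sketch, reusing the universes from the example above:

```python
def P(x):
    # P(x): "x is non-negative", as in the example above.
    return x >= 0

# The wff "forall x P(x)" is true under the first two interpretations
# (choices of universe) below, and false under the third.
for universe in ([1, 3, 5], [2, 4, 6], [-1, 3, 5]):
    print(universe, all(P(x) for x in universe))
```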

There are, however, wffs which are always true or always false under any interpretation.

Those and related concepts are discussed below.

Satisfiable, Unsatisfiable and Valid Wffs

A wff is said to be satisfiable if there exists an interpretation that makes it true, that is if

there are a universe, specific predicates assigned to the predicate variables, and an

assignment of values to the free variables that make the wff true.

For example, ∀x N(x), where N(x) means that x is non-negative, is satisfiable. For if the universe is the set of natural numbers, the assertion ∀x N(x) is true, because all natural numbers are non-negative. Similarly ∃x N(x) is also satisfiable.

However, ∀x [ N(x) ∧ ~N(x) ] is not satisfiable because it can never be true. A wff is called invalid or unsatisfiable if there is no interpretation that makes it true.

A wff is valid if it is true for every interpretation*. For example, the wff ∀x P(x) ∨ ~∀x P(x) is valid for any predicate name P, because ~∀x P(x) is the negation of ∀x P(x).

However, the wff ∀x N(x) is satisfiable but not valid.

Note that the negation of a valid wff is unsatisfiable, for a valid wff is equivalent to true; hence its negation is false under every interpretation.


Equivalence

Two wffs W1 and W2 are equivalent if and only if W1 ↔ W2 is valid, that is, if and only if W1 ↔ W2 is true for all interpretations.

For example ~∀x P(x) and ∃x ~P(x) are equivalent for any predicate name P. So are ∀x [ P(x) ∧ Q(x) ] and [ ∀x P(x) ∧ ∀x Q(x) ] for any predicate names P and Q.

*To be precise, it is not for every interpretation but for the ones that "make sense". For example you don't consider the set of people as the universe for the predicate x > 1 as an interpretation.

Also an interpretation assigns a specific predicate to each predicate variable. A rigorous

definition of interpretation etc. are, however, beyond the scope of this course.

3.7 Transcribing English to Predicate Logic wffs

Subjects to be Learned

Translating English sentences to wff

Contents

English sentences appearing in logical reasoning can be expressed as a wff. This makes the

expressions compact and precise. It thus eliminates possibilities of misinterpretation of

sentences. The use of symbolic logic also makes reasoning formal and mechanical,

contributing to the simplification of the reasoning and making it less prone to errors.

Transcribing English sentences into wffs is sometimes a non-trivial task. In this course we

are concerned with the transcription using given predicate symbols and the universe.

To transcribe a proposition stated in English using a given set of predicate symbols, first

restate in English the proposition using the predicates, connectives, and quantifiers. Then

replace the English phrases with the corresponding symbols.

Example: Given the sentence "Not every integer is even", the predicate "E(x)" meaning x is even, and that the universe is the set of integers, first restate it as "It is not the case that every integer is even" or "It is not the case that for every object x in the universe, x is even." Then "it is not the case" can be represented by the connective "~", "every object x in the universe" by "∀x", and "x is even" by E(x). Thus altogether the wff becomes ~∀x E(x). This given sentence can also be interpreted as "Some integers are not even". Then it can be restated as "For some object x in the universe, x is not even". Then it becomes ∃x ~E(x).

More examples: A few more sentences with corresponding wffs are given below. The

universe is assumed to be the set of integers, E(x) represents x is even, and O(x), x is odd.

"Some integers are even and some are odd" can be translated as

∃x E(x) ∧ ∃x O(x)

"No integer is even" can go to

∀x ~E(x)

"If an integer is not even, then it is odd" becomes

∀x [ ~E(x) → O(x) ]

"2 is even" is

E(2)


More difficult translations: In these translations, properties and relationships are mentioned for a certain type of element in the universe, such as relationships between integers in the universe of numbers rather than the universe of integers. In such a case the element type is specified as a precondition using an if_then construct.

Examples: In the examples that follow, the universe is the set of numbers including real numbers and complex numbers, with I(x), E(x) and O(x) representing "x is an integer", "x is even", and "x is odd", respectively.

"All integers are even" is transcribed as

∀x [ I(x) → E(x) ]

It is first restated as "For every object in the universe (meaning for every number in this case), if it is an integer, then it is even". Here we are interested not in any arbitrary object (number) but in a specific type of object, that is, integers. But if we write ∀x it means "for any object in the universe". So we must say "For any object, if it is an integer ..." to narrow it down to integers.

"Some integers are odd" can be restated as "There are objects that are integers and odd", which is expressed as

∃x [ I(x) ∧ O(x) ]

"A number is even only if it is an integer" becomes

∀x [ E(x) → I(x) ]

"Only integers are even" is equivalent to "If it is even, then it is an integer". Thus it is translated to

∀x [ E(x) → I(x) ]
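These if-then transcriptions can be sanity-checked over a small finite universe. In the sketch below (the mixed universe is an invented example), the precondition I(x) correctly narrows the universal claim to the integers:

```python
universe = [2, 4, 2.5, 6]   # numbers, not all of them integers

def I(x):
    # I(x): "x is an integer"
    return isinstance(x, int)

def E(x):
    # E(x): "x is even" (only meaningful for integers here)
    return I(x) and x % 2 == 0

# forall x [ I(x) -> E(x) ]: non-integers satisfy the implication
# vacuously, so only the integers 2, 4 and 6 are actually tested.
all_integers_even = all((not I(x)) or E(x) for x in universe)
print(all_integers_even)  # True
```

Writing ∧ instead of → here would wrongly require every object, 2.5 included, to be an even integer.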

Proving Things in Predicate Logic

To prove things in predicate calculus we need two things. First we need to know what

inference rules are valid - we can't keep going back to the formal semantics when trying to

draw a simple inference! Second we need to know a good proof procedure that will allow us

to prove things with the inference rules in an efficient manner.

When discussing propositional logic we noted that a much used inference rule was modus

ponens:

A,
A → B
---
B

This rule is a sound rule of inference for predicate logic. Given the semantics of the logic, if the premises are true then the conclusions are guaranteed true. Other sound inference rules include modus tollens (if A → B is true and B is false then conclude ~A), and-elimination (if A ∧ B is true then conclude both A is true and B is true), and lots more.

In predicate logic we need to consider how to apply these rules if the expressions involved have variables. For example we would like to be able to use the facts ∀X (man(X) → mortal(X)) and man(socrates) and conclude mortal(socrates). To do this we can use modus ponens, but allow universally quantified sentences to be matched with other sentences (like in Prolog). So, if we have a sentence ∀X A → B and a sentence C, then if A and C can be matched or unified then we can apply modus ponens.
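The matching step can be sketched very roughly in code. Below is a toy unifier for flat atoms only; the representation and names are our own, and a real unifier also handles nested terms and performs an occurs check. It matches man(X) against man(socrates) and instantiates the conclusion mortal(X):

```python
# Atoms are tuples like ("man", "X"); capitalized argument names are
# treated as variables (a Prolog-style convention adopted here).
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def unify(a, b, subst):
    if a[0] != b[0] or len(a) != len(b):   # predicate name and arity must match
        return None
    for s, t in zip(a[1:], b[1:]):
        s, t = subst.get(s, s), subst.get(t, t)
        if s == t:
            continue
        if is_var(s):
            subst[s] = t
        elif is_var(t):
            subst[t] = s
        else:
            return None                    # two different constants: fail
    return subst

rule_if, rule_then = ("man", "X"), ("mortal", "X")  # forall X man(X) -> mortal(X)
fact = ("man", "socrates")

subst = unify(rule_if, fact, {})
# Apply the substitution {X -> socrates} to the consequent (modus ponens).
conclusion = (rule_then[0],) + tuple(subst.get(t, t) for t in rule_then[1:])
print(conclusion)  # ('mortal', 'socrates')
```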


Representing Things in Predicate Logic

Your average AI programmer/researcher may not need to know the details of predicate logic

semantics or proof theory, but they do need to know how to represent things in predicate

logic, and what expressions in predicate logic mean. Formally we've already gone through

what expressions mean, but it may make more sense to give a whole bunch of examples.

This section will just give a list of logical expressions paired with English descriptions, then

some unpaired logical or English expressions - you should try and work out for yourself how

to represent the English expressions in Logic, and what the Logic expressions mean in

English. There may be exam questions of this sort.

∃x Table(x) ∧ ~numberoflegs(x, 4)

``There is some table that doesn't have 4 legs''

∀x (macintosh(x) → ~realcomputer(x))

``No macintosh is a real computer'' or

``If something is a macintosh then it's not a real computer''

∀x glaswegian(x) → (supports(x, rangers) ∨ supports(x, celtic))

``All Glaswegians support either Celtic or Rangers''

∃x small(x) ∧ on(x, table)

``There is something small on the table''

``All elephants are grey''

``Every apple is either green or yellow''

``There is some student who is intelligent''

∀x red(x) ∧ on(x, table) → small(x)

~∃x brusselsprout(x) ∧ lazy(x)

[Note: When asked to translate English statements into predicate logic you should NOT use set expressions. So I don't want things like ∀x : x ∈ carrots : orange(x)]

3.8 Properties of Statements

Satisfiable: A statement is satisfiable if there is some interpretation for which it is true. For example, P is satisfiable because assigning true to P makes it true.

Contradiction: A sentence is contradictory (unsatisfiable) if there is no interpretation for which it is true. For example, P & ~P is always a contradiction since every interpretation results in a value of false.

Valid: A sentence is valid if it is true for every interpretation. Valid sentences are also called tautologies. For example, P V ~P is always valid since every interpretation results in a value of true.

Equivalence: Two sentences are equivalent if they have the same truth value under every interpretation. For example, P and ~(~P) are equivalent since each has the same truth values under every interpretation.

Logical consequence: A sentence is a logical consequence of another if it is satisfied by all interpretations which satisfy the first. For example, P is a logical consequence of (P & Q), since for any interpretation for which (P & Q) is true, P is also true.

SOME EQUIVALENCE LAWS

Idempotent Laws: P ∧ P ≡ P
P ∨ P ≡ P

Commutative Laws: P ∧ Q ≡ Q ∧ P
P ∨ Q ≡ Q ∨ P

Distributive Laws: P ∧ (Q ∨ R) ≡ (P ∧ Q) ∨ (P ∧ R)
P ∨ (Q ∧ R) ≡ (P ∨ Q) ∧ (P ∨ R)

Associative Laws: P ∧ (Q ∧ R) ≡ (P ∧ Q) ∧ R
P ∨ (Q ∨ R) ≡ (P ∨ Q) ∨ R

Absorptive Laws: P ∨ (P ∧ Q) ≡ P
P ∧ (P ∨ Q) ≡ P

De Morgan's Laws: ~(P ∧ Q) ≡ ~P ∨ ~Q
~(P ∨ Q) ≡ ~P ∧ ~Q

Conditional elimination: P → Q ≡ ~P ∨ Q

Bi-conditional elimination: P ↔ Q ≡ (P → Q) ∧ (Q → P)
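Each of these laws can be verified exhaustively, since a law over n propositional variables only has 2^n truth assignments to check. A sketch for De Morgan's law and bi-conditional elimination (the variable names are ours):

```python
from itertools import product

assignments = list(product([True, False], repeat=2))

# De Morgan: ~(P & Q) has the same truth value as ~P V ~Q in every row.
demorgan = all((not (p and q)) == ((not p) or (not q))
               for p, q in assignments)

# Bi-conditional elimination: P <-> Q agrees with (P -> Q) & (Q -> P),
# writing each implication as its conditional elimination ~P V Q.
bicond = all((p == q) == (((not p) or q) and ((not q) or p))
             for p, q in assignments)

print(demorgan, bicond)  # True True
```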

TRUTH TABLE FOR EQUIVALENT SENTENCES

P      Q      ~P     (~P V Q)  (P → Q)  (Q → P)  (P → Q) & (Q → P)
true   true   false  true      true     true     true
true   false  false  false     false    true     false
false  true   true   true      true     false    false
false  false  true   true      true     true     true

3.9 Inference Rules

The inference rules of PL provide the means to perform logical proofs or deductions. There are two categories of inference rules.

A. Deductive inference rules are those inference rules which are certain. The following are the deductive inference rules:

1. Modus ponens: From P and P → Q, infer Q. This is sometimes written as

Assertion: P
Implication: P → Q
Conclusion: Q

Here is an example of an argument that fits the form modus ponens:

If democracy is the best system of government, then everyone should vote.
Democracy is the best system of government.
Therefore, everyone should vote.

2. Modus tollens: From ~Q and P → Q, infer ~P. That is,

Assertion: ~Q
Implication: P → Q
Conclusion: ~P

If there is fire here, then there is oxygen here.
There is no oxygen here.
Therefore, there is no fire here.

3. Chain rule: From P → Q and Q → R, infer P → R. Or

P → Q
Q → R
---
P → R

For example,

Given: (programmer likes LISP) → (programmer hates COBOL)
And: (programmer hates COBOL) → (programmer likes recursion)
Conclusion: (programmer likes LISP) → (programmer likes recursion)

4. Substitution: If s is a valid sentence, then s', derived from s by consistent substitution of propositions in s, is also valid. For example, the sentence P V ~P is valid, so substituting Q for P, the sentence Q V ~Q is also valid.

B. Non-deductive inference rules are those inference rules which are not certain.

1. Abductive inference: Abductive inference is based on the use of known causal knowledge to explain or justify a (possibly invalid) conclusion. Given the truth of proposition Q and the implication P → Q, conclude P. For example, people who have had too much to drink tend to stagger when they walk. Therefore, it is not unreasonable to conclude that a person who is staggering is drunk, even though this may be an incorrect conclusion. People may stagger when they walk for other reasons, including dizziness from twirling in circles or from some physical problem.

We may represent abductive inference with the following, where the c over the implication arrow is meant to imply a possible causal relationship.

Assertion: Q
Implication: P → Q
Conclusion: P

Abductive inference is useful when known causal relations are likely and deductive inferencing is not possible for lack of facts.

2. Inductive inference: Inductive inference is based on the assumption that a recurring pattern, observed for some events or entities, implies that the pattern is true for all entities in the class. For example, after seeing a few white swans, we incorrectly infer that all swans are white.

We can represent inductive inference using the following description:

P(a1), ..., P(ak)
---
∀x P(x)

Inductive inference, of course, is not a valid form of inference, since it is not usually the case that all objects of a class can be verified as having a particular property. Even so, this is an important and commonly used form of inference.


3. Analogical Inference: Analogical inference is a form of experiential inference. Situations or entities that are alike in some respects tend to be similar in other respects. Thus, when we find that situation (object) A is related in certain ways to B, and A' is similar in some context to A, we conclude that B' has a similar relation to A' in this context. We depict this form of inference with the following description, where the r above the implication symbol means is related to.

P →r Q
P' →r Q'

Analogical inference, like abductive and inductive inference, is a useful but invalid form of commonsense inference.

3.10 RESOLUTION

The most well-known general proof procedure for predicate calculus is resolution. Resolution

is a sound proof procedure for proving things by refutation - if you can derive a contradiction

from ~P then P must be true. In resolution theorem proving, all statements in the logic are

transformed into a normal form involving disjunctions of atomic expressions or negated

atomic expressions (e.g., ~dog(X) V animal(X)). This allows new expressions to be deduced

using a single inference rule. Basically, if we have an expression A1 v A2 ...v An v ~C and an

expression B1 v B2 ...v Bm v C then we can deduce a new expression A1 v A2 ...v An v B1 v

B2 ...v Bm. This single inference rule can be applied in a systematic proof procedure.

Resolution is a sound proof procedure. If we prove something using it we can be sure it is a

valid conclusion. However, there are many other things to worry about when looking at a

proof procedure. It may not be complete (i.e., we may not be able to always prove

something is true even if it is true) or decidable (the procedure may never halt when trying

to prove something that is false). Variants of resolution may be complete, but no proof

procedure based on predicate logic is decidable. And of course, it may just not be

computationally efficient. It may eventually prove something, but take such a long time that

it is just not usable. The efficiency of a proof will often depend as much on how you

formulate your problem as on the general proof procedure used, but it is still an important

issue to bear in mind.

Resolution is very simple. Given two clauses C1 and C2 with no variables in common, if there

is a literal l1 in C1 which is a complement of a literal l2 in C2, both l1 and l2 are deleted and

a new clause C is formed as the disjunction of the remaining reduced clauses. The clause C is called the

resolvent of C1 and C2. For example, to resolve the two clauses

(~P V Q) and (~Q V R)

we write

~P V Q, ~Q V R

~P V R
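The resolution rule just shown can be sketched for the propositional case; the clause representation (frozensets of literal strings) is an assumption made for illustration.

```python
# A minimal propositional resolution step, assuming a clause is a frozenset
# of literals, where a literal is a string and "~P" is the negation of "P".

def negate(literal):
    return literal[1:] if literal.startswith("~") else "~" + literal

def resolve(c1, c2):
    """Return all resolvents obtainable from the two clauses."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            # delete the complementary pair and disjoin what remains
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# Resolving (~P v Q) with (~Q v R) yields the resolvent (~P v R), as in the text.
print(resolve(frozenset({"~P", "Q"}), frozenset({"~Q", "R"})))
```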

Several types of resolution are possible depending on the number and types of parents. We

define a few of these types below.

Binary Resolution : Two clauses having complementary literals are combined as

disjuncts to produce a single clause after deleting the complementary literals. For example,

the binary resolvent of

~P(x,a) V Q(x) and ~Q(b) V R(x)


is just

~P(b,a) V R(b)

Unit resulting (UR) resolution : A number of clauses are resolved simultaneously to

produce a unit clause. All except one of the clauses are unit clauses, and that one clause has

exactly one more literal than the total number of unit clauses. For example, resolving the set

{~MARRIED(x,y) V ~MOTHER(x,z) V FATHER(y,z),

MARRIED(sue,joe), ~FATHER(joe,bill)}

where the substitution {sue/x, joe/y, bill/z} is used, resulting in the unit clause

~MOTHER(sue,bill).

Linear resolution: When each resolved clause Ci is a parent to the clause Ci+1 (i = 1, 2, …, n−1), the process is called linear resolution. For example, given a set S of clauses with C0 ∈ S, Cn is derived by a sequence of resolutions: C0 with some clause B0 to get C1, then C1 with some clause B1 to get C2, and so on until Cn has been derived.

Linear input resolution: If one of the parents in linear resolution is always from the original set of clauses (the Bi), we have linear input resolution. For example, given the set of clauses S = {P V Q, ~P V Q, P V ~Q, ~P V ~Q}, let C0 = (P V Q). Choosing B0 = ~P V Q from the set S and resolving this with C0, we obtain the resolvent Q = C1. B1 must now be chosen from S, and the resolvent of C1 and B1 becomes C2, and so on.

3.11 CONVERSION TO CLAUSAL FORM

As noted earlier, we are interested in mechanical inference by programs using symbolic

FOPL expressions. One method we shall examine is called resolution. It requires that all

statements be converted into a normalized clausal form.

To transform a sentence into clausal form requires the following steps:

Step 1. Eliminate all implication and equivalency connectives (use ~P V Q in place of P → Q, and (~P V Q) & (~Q V P) in place of P ↔ Q).

Step 2. Move all negations in to immediately precede an atom (use P in place of ~(~P), and DeMorgan's laws: ∃x ~F[x] in place of ~(∀x) F[x], and ∀x ~F[x] in place of ~(∃x) F[x]).

Step 3. Rename variables, if necessary, so that all quantifiers have different variable assignments; that is, rename variables so that variables bound by one quantifier are not the same as variables bound by a different quantifier, as in the expression ∀x (P(x) → (∃y Q(y))).

Step 4. Skolemize by replacing all existentially quantified variables with Skolem functions

as described below, and deleting the corresponding existential quantifiers.

We describe the process of eliminating the existential quantifiers through a substitution

process. This process requires that all such variables be replaced by something called

Skolem functions, arbitrary functions which can always assume a correct value required of

an existentially quantified variable.

For simplicity in what follows, assume that all quantifiers have been properly moved to the

left side of the expression, and each quantifies a different variable. Skolemization, the

replacement of existentially quantified variables with Skolem function and deletion of the

respective quantifiers, is then accomplished as follows:

1. If the first (leftmost) quantifier in an expression is an existential quantifier, replace all

occurrences of the variable it quantifies with an arbitrary constant not appearing


elsewhere in the expression, and delete the quantifier. This same procedure should be followed for all other existential quantifiers not preceded by a universal quantifier, in each case using different constant symbols in the substitution.

2. For each existential quantifier that is preceded by one or more universal quantifiers

( is within the scope of one or more universal quantifiers), replace all occurrences of

the existentially quantified variable by a function symbol not appearing elsewhere in

the expression. The arguments assigned to the function should match all the

variables appearing in each universal quantifier which precedes the existential

quantifier. This existential quantifier should then be deleted. The same process

should be repeated for each remaining existential quantifier using a different function

symbol and choosing function arguments that correspond to all universally quantified

variables that precede the existentially quantified variable being replaced.

An example will help to clarify this process. Given the expression

∃u ∀v ∀x ∃y (P(f(u),v,x,y) → Q(u,v,y))

the Skolem form is determined as

"v "x P(f(a),v,x,g(v,x))à Q(a,v,g(v,x)).

In making the substitutions, it should be noted that the variable u appearing after the first existential quantifier has been replaced in the second expression by the arbitrary constant a. This constant did not appear elsewhere in the first expression. The variable y has been replaced by the function symbol g having the variables v and x as arguments, since both of these variables are universally quantified to the left of the existential quantifier for y. Replacement of y by an arbitrary function with arguments v and x is justified on the basis that y, following v and x, may be functionally dependent on them.

Step 5. Move all universal quantifiers to the left of the expressions and put the expression

on the right into CNF.

Step 6. Eliminate all universal quantifiers and conjunctions since they are retained

implicitly. The resulting expressions (the expressions previously connected by the

conjunctions) are clauses and the set of such expressions is said to be in clausal form.

As an example of this process, let us convert the expression

∃x ∀y (∀z P(f(x),y,z) → (∃u Q(x,u) & ∃v R(y,v))).

into clausal form. We have after application of step 1

∃x ∀y (~(∀z) P(f(x),y,z) V (∃u Q(x,u) & (∃v) R(y,v))).

After application of step 2 we obtain

∃x ∀y (∃z ~P(f(x),y,z) V (∃u Q(x,u) & (∃v) R(y,v))).

After application of step 4 (step 3 is not required)

"y (~P(f(a),y,g(y)) V Q(a,h(y)) & R(y,l(y))).

After application of step 5 the result is

"y((~P(f(a),y,g(y)) V Q(a,h(y)) & (~P(f(a),y,g(y)) V R(y,l(y))).

Finally, after application of step 6 we obtain the clausal form

~P(f(a),y,g(y)) V Q(a,h(y))

~P(f(a),y,g(y)) V R(y,l(y))
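Steps 1 and 2 of the conversion can be sketched in code for the propositional part of the logic; the tuple encoding of formulas below is an illustrative assumption, and quantifiers are not handled.

```python
# A sketch of steps 1 and 2: eliminate -> and <->, then push negations
# inward. Formulas are nested tuples such as ("->", "P", "Q"); atoms are
# plain strings. Quantifiers and Skolemization are omitted here.

def eliminate_implications(f):
    if isinstance(f, str):
        return f
    op = f[0]
    if op == "->":                            # P -> Q  becomes  ~P v Q
        return ("|", ("~", eliminate_implications(f[1])),
                eliminate_implications(f[2]))
    if op == "<->":                           # P <-> Q becomes (~P v Q) & (~Q v P)
        a, b = eliminate_implications(f[1]), eliminate_implications(f[2])
        return ("&", ("|", ("~", a), b), ("|", ("~", b), a))
    return (op,) + tuple(eliminate_implications(x) for x in f[1:])

def to_nnf(f):
    if isinstance(f, str):
        return f
    op = f[0]
    if op == "~":
        g = f[1]
        if isinstance(g, str):
            return f
        if g[0] == "~":                       # ~~P  becomes  P
            return to_nnf(g[1])
        if g[0] in ("&", "|"):                # DeMorgan's laws
            dual = "|" if g[0] == "&" else "&"
            return (dual,) + tuple(to_nnf(("~", x)) for x in g[1:])
    return (op,) + tuple(to_nnf(x) for x in f[1:])

# P -> Q becomes ~P v Q:
print(to_nnf(eliminate_implications(("->", "P", "Q"))))  # ('|', ('~', 'P'), 'Q')
```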


3.12 UNIFICATION

Resolution works on the principle of identifying complementary literals in two clauses and deleting them, thereby forming a new clause (the resolvent). The process is simple and

straightforward when one has identical literals. In other words, for clauses containing no

variables, resolution is easy.

There are three major types of substitutions, viz.

1. Substitution of a variable by a constant.

2. Substitution of a variable by another variable.

3. Substitution of a variable by a function that does not contain the same variable.

A substitution that makes two clauses resolvable is called a unifier, and the process of identifying such a unifier is carried out by the unification algorithm. We can also define the same in another way: any substitution that makes two or more expressions equal is called a unifier for the expressions.

The unification algorithm tries to find out the Most General Unifier (MGU) between a given

set of atomic formulae. For example, to unify P(f(a,x),y,y) and P(x,b,z) we first rename variables so that the two predicates have no variables in common. This can be done by replacing the x in the second predicate with u to give P(u,b,z). Next, we compare the two predicates symbol by symbol from left to right until a disagreement is found. A disagreement can be between two different variables, a nonvariable term and a variable, or two nonvariable terms. If no disagreement is found, the two are identical and we have succeeded.

If a disagreement is found and both are nonvariable terms, unification is impossible, so we have failed. If both are variables, one is replaced throughout by the other. Finally, if the disagreement is a variable and a nonvariable term, the variable is replaced by the entire term. Of course, in this last step, replacement is possible only if the term does not contain the variable that is being replaced. This matching process is repeated until the two are unified or until a failure occurs. For the two predicates P above, a disagreement is first found between the term f(a,x) and the variable u. Since f(a,x) does not contain the variable u, we replace u with f(a,x) everywhere it occurs in the literal. This gives a substitution set of { f(a,x)/u } and the partially matched predicates P(f(a,x),y,y) and P(f(a,x),b,z).

Proceeding with the match, we find the next disagreement pair, y and b, a variable and

term, respectively. Again, we replace the variable y with the term b and update the

substitution list to get { f(a,x)/u, b/y }. The final disagreement pair is two variables.

Replacing the variable in the second literal with the first we get the substitution set

{ f(a,x)/u, b/y, y/z } or, equivalently, { f(a,x)/u, b/y, b/z }. Note that this procedure always gives the most general unifier.
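The matching procedure described above can be sketched as follows; the term encoding (strings beginning with "?" for variables, tuples for function and predicate applications) is an assumption made for illustration.

```python
# A compact sketch of the unification algorithm. Terms are strings for
# constants, strings starting with "?" for variables, and tuples like
# ("f", "a", "?x") for function or predicate applications.

def occurs(var, term, subst):
    """The occurs check: does var appear anywhere inside term?"""
    if term == var:
        return True
    if isinstance(term, str) and term in subst:
        return occurs(var, subst[term], subst)
    if isinstance(term, tuple):
        return any(occurs(var, t, subst) for t in term[1:])
    return False

def unify(t1, t2, subst=None):
    """Return the most general unifier as a dict, or None on failure."""
    if subst is None:
        subst = {}
    if t1 == t2:
        return subst
    if isinstance(t1, str) and t1.startswith("?"):
        return unify_var(t1, t2, subst)
    if isinstance(t2, str) and t2.startswith("?"):
        return unify_var(t2, t1, subst)
    if (isinstance(t1, tuple) and isinstance(t2, tuple)
            and len(t1) == len(t2) and t1[0] == t2[0]):
        for a, b in zip(t1[1:], t2[1:]):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None  # two different nonvariable terms: failure

def unify_var(var, term, subst):
    if var in subst:
        return unify(subst[var], term, subst)
    if occurs(var, term, subst):   # the term must not contain the variable
        return None
    return {**subst, var: term}

# Unifying P(f(a,x), y, y) with P(u, b, z), after renaming, as in the text:
print(unify(("P", ("f", "a", "?x"), "?y", "?y"), ("P", "?u", "b", "?z")))
# {'?u': ('f', 'a', '?x'), '?y': 'b', '?z': 'b'}
```

The result matches the substitution set { f(a,x)/u, b/y, b/z } derived in the text.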


UNIT 4 STRUCTURED KNOWLEDGE REPRESENTATION

4.1 Semantic Network

4.2 Conceptual Graph

4.3 Frame Structures

4.4 Conceptual Dependency

4.5 Scripts

4.1 SEMANTIC NETWORK

Network representation gives a pictorial presentation of objects, their attributes and

relationships that exist between them and other entities.

A semantic network or a semantic net is a structure for representing knowledge as a pattern

of interconnected nodes and arcs. It is also defined as a graphical representation of

knowledge. The objects under consideration serve as nodes, and the relationships with other nodes give the arcs.

The following rules about nodes and arcs generally apply to most of the semantic networks.

1. Nodes in the semantic net represent either

o Entities

o Attributes

o State or

o Events

2. Arcs in the net give the relationship between the nodes and labels on the arc

specify what type of relationship actually exists.

Using a simple semantic net, it is possible to add more knowledge by linking other objects

with different relationships. The following figure shows this:

[Figure: semantic network of a Moving-vehicle, with is-a links from Scooter and Motorbike to Two-wheeler and from Two-wheeler to Moving-vehicle, and has links from Moving-vehicle to Engine, Fuel-system, Electrical-system and Brakes]

From this, it is possible for us to say that a scooter is a two-wheeler and it is a moving

vehicle. The network also shows that a moving vehicle needs an engine (could be petrol, diesel or any engine), a fuel system to sustain the engine running, an electrical system for

its lights, horn and also for initial ignition (in case of petrol vehicles) and brakes (of course,

very important).


Unlike FOPL, there is neither generally accepted syntax nor semantics for associative

networks. Such rules tend to be designer dependent and vary greatly from one

implementation to another.

Classification of Nodes in a Semantic Net

Generally, the nodes in the semantic net are classified as

Generic nodes.

Individual or instance nodes.

A Generic node is a very general node. In the above figure, for the semantic network of Moving-vehicle, the Two-wheeler is a generic node because many two-wheelers exist. In contrast, individual or instance nodes explicitly represent specific entities. In the above figure Scooter is an individual node. It is a very specific instance of the two-wheeler.

A number of arc relations have become common among users. They include such predicates as is-a, member-of, subset-of, ako (a kind of), has-parts, instance-of, agent, etc.

Less common arcs have also been used to express modality relations (time, manner, mood), linguistic case relations (theme, source, goal), logical connectives (or, not, and, implies), quantifiers (all, some), set relations (superset, subset, member), attributes, and quantification (ordinal, count).

One particular arc or link, the is-a link, has taken on a special meaning. It signifies that

scooter is a two-wheeler and motorbike is a two-wheeler. Is-a relationship occurs in many

representations of worlds. Bill is a student, a cat is a furry animal, a tree is a plant, and so on. The is-a link is most often used to represent the fact that an object is of a certain type

(predication) or to express the fact that one type is a subtype of another (for example,

conditional quantification).

Semantic Network structure permits the implementation of property inheritance, a form of

inference. Nodes, which are members or subsets of other nodes, may inherit properties from

their higher-level ancestor nodes. For example, from the following figure it is possible to infer

that a mouse has hair and drinks milk.

[Figure: inheritance hierarchy in which mouse is-a rodent, rodent is-a mammal, and a mammal has hair and drinks milk]
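This kind of property inheritance can be sketched with simple dictionaries; the data below encodes the mouse example, and the function name is illustrative.

```python
# A sketch of property inheritance over is-a links, assuming the network
# is stored as two plain dictionaries.

is_a = {"mouse": "rodent", "rodent": "mammal"}
properties = {"mammal": {"has": "hair", "drinks": "milk"}}

def inherited(node, attribute):
    """Walk up the is-a chain until the attribute is found."""
    while node is not None:
        if attribute in properties.get(node, {}):
            return properties[node][attribute]
        node = is_a.get(node)
    return None

# A mouse inherits 'hair' and 'milk' from its ancestor node 'mammal'.
print(inherited("mouse", "has"), inherited("mouse", "drinks"))  # hair milk
```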

4.2 CONCEPTUAL GRAPHS

Although there are no commonly accepted standards for a syntax and semantics for

associative networks, we present an approach in this section, which we feel may at least

become a de facto standard in the future. It is based on the use of the conceptual graph as a

primitive building block for semantic networks.

A conceptual graph is a graphical portrayal of a mental perception which consists of basic or

primitive concepts and the relationships that exist between the concepts. A single

conceptual graph is roughly equivalent to a graphical diagram of a natural language


sentence where the words are depicted as concepts and relationships. Conceptual graphs

may be regarded as formal building blocks for semantic network which, when linked

together in a coherent way, form a more complex knowledge structure. An example of such

a graph which represents the sentence “Ram is eating soup with a spoon” is depicted as

A plumber is carrying a pipe.”

[Figure: conceptual graph with concept boxes [PERSON: ram], [EAT], [FOOD: soup] and [SPOON], linked through the relations (AGENT), (OBJECT) and (INSTRUMENT)]

In this figure concepts are enclosed in boxes and relations between the concepts are

enclosed in ovals. The direction of the arrow corresponds to the order of the arguments in

the relation they connect. The last or nth arc (argument) points away from the circle relation

and all other arcs point toward the relation.

Concept symbols refer to entities, actions, properties, or events in the world. A concept may

be individual or generic. Individual concepts have a type field followed by a referent field.

The concept [PERSON: ram] has type PERSON and referent ram. Referents like ram and soup in the figure are called individual concepts since they refer to specific entities. EAT and SPOON

have no referent fields since they are generic concepts which refer to unspecified entities.

Concepts like AGENT, OBJECT, INSTRUMENT, and PART are obtained from a collection of

standard concepts. New concepts and relations can also be defined from these basic ones.

A linear conceptual graph, which is easier to present as text can also be given. The linear

form equivalent to the above sentence is

[PERSON: ram] <- (AGENT) <- [EAT]-
   (OBJECT) -> [FOOD: soup]
   (INSTRUMENT) -> [SPOON]

where square brackets have replaced concept boxes and parentheses have replaced relation

circles.

Some Examples of Conceptual Graphs are

Example 1. "John is going to Boston by bus."

In DF, concepts are represented by rectangles: [Go], [Person: John], [City: Boston], and

[Bus]. Conceptual relations are represented by circles or ovals: (Agnt) relates [Go] to the


agent John, (Dest) relates [Go] to the destination Boston, and (Inst) relates [Go] to the

instrument bus.

Above figure could be read as three English sentences:

Go has an agent which is a person John.

Go has a destination which is a city Boston.

Go has an instrument which is a bus.

The linear form for CGs is intended as a more compact notation than DF, but with good

human readability. It is exactly equivalent in expressive power to the abstract syntax and

the display form. Following is the LF for above Figure :

[Go]-

(Agnt)->[Person: John]

(Dest)->[City: Boston]

(Inst)->[Bus].

In this form, the concepts are represented by square brackets instead of boxes, and the

conceptual relations are represented by parentheses instead of circles. A hyphen at the end

of a line indicates that the relations attached to the concept are continued on subsequent

lines.

Example 2. A person is between a rock and a hard place.

In LF, the above figure may be represented in the following form:

[Person]<-(Betw)-

<-1-[Rock]

<-2-[Place]->(Attr)->[Hard].

Example 3. Tom believes that Mary wants to marry a sailor.

Fig. A conceptual graph containing a nest of two contexts


In the above Figure, Tom is the experiencer (Expr) of the concept [Believe], which is linked

by the theme relation (Theme) to a proposition that Tom believes. The proposition box

contains another conceptual graph, which says that Mary is the experiencer of [Want], which

has as theme a situation that Mary hopes will come to pass. That situation is described by

another nested graph, which says that Mary (represented by the concept [⊤]) marries a sailor. The dotted line, called a coreference link, shows that the concept [⊤] in the situation box refers to the same individual as the concept [Person: Mary] in the proposition

box. Following is the linear form of the above figure:

[Person: Tom]<-(Expr)<-[Believe]->(Thme)-

[Proposition: [Person: Mary *x]<-(Expr)<-[Want]->(Thme)-

[Situation: [?x]<-(Agnt)<-[Marry]->(Thme)->[Sailor] ]].

4.3 FRAME STRUCTURES

Frames were first introduced by Marvin Minsky (1975) as a data structure to represent a

mental model of a stereotypical situation such as driving a car, attending a meeting, or

eating in a restaurant. Knowledge about an object or event is stored together in memory as

a unit. Then, when a new situation is encountered, an appropriate frame is selected from

memory for use in reasoning about the situation.

Frames are general record-like structures which consist of a collection of slots and slot values. The slots may be of any size and type. Slots typically have names and values or subfields called facets. Facets may also have names and any number of values.

A general frame template structure is illustrated in the following figure:

(<frame name>
   (<slot1> (<facet1> <value1> … <valuek1>)
            (<facet2> <value1> … <valuek2>)
            … )
   (<slot2> (<facet1> <value1> … <valuekm>)
            … )
   … )

An example of a simple frame for Ram is depicted in the following figure:

(ram

(PROFESSION (VALUE professor))

(AGE (VALUE 40))

(WIFE (VALUE sita))

(CHILDREN (VALUE love kush))

(ADDRESS (STREET (VALUE 7))

(CITY (VALUE audhya))

(STATE (VALUE uttarpradesh))

(ZIP (VALUE 124507))))

From the above figure it will be seen that a frame may have any number of slots, and a slot

may have any number of facets, each with any number of values. This gives a very general

framework from which to build a variety of knowledge structures.

The slots in a frame specify general or specific characteristics of the entity that the frame represents, and sometimes they include instructions on how to apply or use

the slot values. Typically, a slot contains information such as attribute value pairs, default

values, conditions for filling a slot, pointers to other related frames, and procedures that are

activated when needed for different purposes. Facets (subslots) describe some knowledge or

procedures about the attribute in the slot.

Facets may take many forms, such as:

a constraint value; for example, the slot 'age' could be constrained so that age has to be an integer between 0 and 120

a default value; for example, unless there is contrary evidence it is assumed that all people like sambal belacan

If-added Procedure: Executes when new information is placed in the

slots.

If-removed Procedure: Executes when information is deleted from the

slot.

If-needed Procedure: Executes when new information is needed from

the slot, but the slot is empty.

If-changed Procedure: Executes when information changes.

Procedural attachments are called demons. They are used to derive slot values.

An important aspect of procedural attachments is that they can be used to direct the reasoning process.

Taking another example

(ford

(AKO (VALUE car))

(COLOR (VALUE silver))

(MODEL (VALUE 4-door))

(GAS-MILEAGE (DEFAULT fget))

(RANGE (VALUE if-needed))


(WEIGHT (VALUE 2500))

(FUEL-CAPACITY (VALUE 18)))

The Ford frame illustrated in the above figure has attribute-value slots (COLOR: silver, MODEL: 4-door, and the like), a slot which takes a default value for GAS-MILEAGE, and a slot with an attached if-needed procedure.

The value fget in the GAS-MILEAGE slot is a function call to fetch a default value from another frame, such as the general car frame, for which ford is a-kind-of (AKO). When the value of this slot is evaluated, the fget function is activated. When fget finds no value for gas mileage it recursively looks for a value from ancestor frames until a value is found.

The if-needed value in the Range slot is a procedure name that, when called,

computes the driving range of the Ford as a function of gas mileage and fuel capacity. Slots

with attached procedures such as fget and if-needed are called procedural attachments or

demons. They are executed automatically when a value is needed but not provided for in a slot.
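The fget mechanism can be sketched with nested dictionaries; the frames below follow the Ford example, with an assumed default gas mileage of 15 in the general car frame (a value invented for illustration).

```python
# A sketch of frames as nested dictionaries with an fget-like demon:
# when a slot has no VALUE facet, the AKO chain is searched for a default.

frames = {
    "car":  {"GAS-MILEAGE": {"DEFAULT": 15}},      # assumed default value
    "ford": {"AKO": {"VALUE": "car"},
             "COLOR": {"VALUE": "silver"},
             "WEIGHT": {"VALUE": 2500},
             "FUEL-CAPACITY": {"VALUE": 18}},
}

def fget(frame, slot):
    """Fetch a slot value, falling back to defaults up the AKO chain."""
    while frame is not None:
        facets = frames.get(frame, {}).get(slot, {})
        if "VALUE" in facets:
            return facets["VALUE"]
        if "DEFAULT" in facets:
            return facets["DEFAULT"]
        frame = frames.get(frame, {}).get("AKO", {}).get("VALUE")
    return None

# An if-needed style demon computing RANGE from other slots when asked:
def range_if_needed(frame):
    return fget(frame, "GAS-MILEAGE") * fget(frame, "FUEL-CAPACITY")

print(fget("ford", "COLOR"))      # silver
print(range_if_needed("ford"))    # 270  (15 * 18)
```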

4.4 CONCEPTUAL DEPENDENCY

Conceptual Dependency was originally developed to represent knowledge acquired from natural

language input.

The goals of this theory are:

To help in the drawing of inference from sentences.

To be independent of the words used in the original input.

That is to say: For any two or more sentences that are identical in meaning there

should be only one representation of that meaning.

In CD theory different types of basic building blocks are distinguished. Each of these types,

in turn, has several subtypes. The types are made up of entities, actions, conceptual cases,

conceptual tenses.

ENTITIES

Picture producer (PP) are actors or physical objects (including human memory) that perform

different acts.

Picture aiders (PA) are supporting properties or attributes of producers.

ACTIONS

Primitive actions (ACTs), as listed below.

Action aiders (AA) are properties or attributes of primitive actions.

Examples of Primitive Acts are:

ATRANS

-- Transfer of an abstract relationship. e.g. give.

PTRANS

-- Transfer of the physical location of an object. e.g. go.

PROPEL

-- Application of a physical force to an object. e.g. push.

MTRANS

-- Transfer of mental information. e.g. tell.

MBUILD

-- Construct new information from old. e.g. decide.

SPEAK


-- Utter a sound. e.g. say.

ATTEND

-- Focus a sense on a stimulus. e.g. listen, watch.

MOVE

-- Movement of a body part by owner. e.g. punch, kick.

GRASP

-- Actor grasping an object. e.g. clutch.

INGEST

-- Actor ingesting an object. e.g. eat.

EXPEL

-- Actor getting rid of an object from body. e.g. ????.

CONCEPTUAL CASES (ALL ACTIONS INVOLVE ONE OR MORE OF THESE)

O

-- Objective Case

R

-- Recipient – Donor Case.

I

-- Instrumental Case e.g. eat with a spoon.

D

-- Destination or Directive Case e.g. going home.

CONCEPTUAL TENSES (TIME OF ACTION OR STATE OF BEING)

Conditional (c)

Continuing (k)

Finished Transition (tf)

Future (f)

Interrogative (?)

Negative (/)

Past (p)

Present (nil)

Start Transition (ts)

Timeless (delta)

Transition (t)

CONCEPTUAL DEPENDENCIES

Semantic rules for the formation of dependency structures include, for example, the relationship between an actor and an event, or between a primitive action and an instrument. In the linearized diagrams below, ⇔ links an actor to an act (or an object to its attribute), ←(o) marks the object case, and p marks past tense.

1. Bird flew.
   Bird ⇔(p) PTRANS          (PP ⇔ ACT)

2. Ram is a student.
   Ram ⇔ student             (PP ⇔ PP)

3. Shyam pushed the door.
   Shyam ⇔(p) PROPEL ←(o) door

4. Joe gave Sue a flower.
   Joe ⇔(p) ATRANS ←(o) flower, with the recipient case (r) running to Sue from Joe

5. Joe ate some soup.
   Joe ⇔(p) INGEST ←(o) soup, with a spoon as the instrument case (i)

Advantages of CD:

Using these primitives involves fewer inference rules.

Many inference rules are already represented in CD structure.

The holes in the initial structure help to focus on the points still to be established.

Disadvantages of CD:

Knowledge must be decomposed into fairly low level primitives.

Impossible or difficult to find correct set of primitives.

A lot of inference may still be required.

Representations can be complex even for relatively simple actions. Consider:

Dave bet Frank five pounds that Wales would win the Rugby World Cup.

Complex representations require a lot of storage

Applications of CD:

MARGIE

(Meaning Analysis, Response Generation and Inference on English) -- a model of natural language understanding.

SAM

(Script Applier Mechanism) -- Scripts to understand stories. See next section.

PAM

(Plan Applier Mechanism) -- Plans to understand stories.

Schank et al. developed all of the above.

4.5 SCRIPTS


A script is a structure that prescribes a set of circumstances which could be expected to

follow on from one another.

It is similar to a thought sequence or a chain of situations which could be anticipated.

It could be considered to consist of a number of slots or frames but with more specialised

roles.

Scripts are beneficial because:

Events tend to occur in known runs or patterns.

Causal relationships between events exist.

Entry conditions exist which allow an event to take place

Prerequisites exist for events taking place, e.g. when a student progresses through a degree scheme or when a purchaser buys a house.

The components of a script include:

Entry Conditions

-- these must be satisfied before events in the script can occur.

Results

-- Conditions that will be true after events in script occur.

Props

-- Slots representing objects involved in events.

Roles

-- Persons involved in the events.

Track

-- Variations on the script. Different tracks may share components of the same script.

Scenes

-- The sequence of events that occur. Events are represented in conceptual

dependency form.

Scripts are useful in describing certain situations such as robbing a bank. This might involve:

Getting a gun.

Holding up a bank.

Escaping with the money.

Here the Props might be

Gun, G.

Loot, L.

Bag, B

Get away car, C.

The Roles might be:

Robber, S.

Cashier, M.

Bank Manager, O.

Policeman, P.

The Entry Conditions might be:

S is poor.

S is destitute.

The Results might be:

S has more money.


O is angry.

M is in a state of shock.

P is shot.

There are 3 scenes: obtaining the gun, robbing the bank and the getaway.

Fig. Simplified Bank Robbing Script

Some additional points to note on Scripts:



If a particular script is to be applied it must be activated, and the activation depends on its significance.

If a topic is mentioned in passing then a pointer to that script could be held.

If the topic is important then the script should be opened.

The danger lies in having too many active scripts much as one might have too many

windows open on the screen or too many recursive calls in a program.

Provided events follow a known trail we can use scripts to represent the actions

involved and use them to answer detailed questions.

Different trails may be allowed for different outcomes of Scripts ( e.g. The bank

robbery goes wrong).

Advantages of Scripts:

Ability to predict events.

A single coherent interpretation may be built up from a collection of observations.

Disadvantages:

Less general than frames.

May not be suitable to represent all kinds of knowledge.
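The bank-robbery script above can be sketched as a plain data structure that a story-understanding program could activate and query; the slot names and the query function are illustrative.

```python
# The simplified bank-robbery script as a data structure. Once activated,
# its slots can be read off to answer detailed questions about the story.

script = {
    "name": "bank-robbery",
    "props": ["gun G", "loot L", "bag B", "getaway car C"],
    "roles": ["robber S", "cashier M", "bank manager O", "policeman P"],
    "entry_conditions": ["S is poor", "S is destitute"],
    "scenes": ["obtaining the gun", "robbing the bank", "the getaway"],
    "results": ["S has more money", "O is angry",
                "M is in a state of shock", "P is shot"],
}

def answer(script, question_key):
    """Answer a detailed question by reading the matching script slot."""
    return script.get(question_key, [])

# Once the script is activated, unstated details can be filled in:
print(answer(script, "scenes"))
# ['obtaining the gun', 'robbing the bank', 'the getaway']
```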


UNIT 5 PROBLEMS, PROBLEM SPACES, AND SEARCH

5.1 Search and Control Strategies

5.2 Preliminary Concepts

5.3 Water Container Problem

5.4 Production System

5.5 Problem Characteristics

5.6 Means-end analysis

5.7 Problem Reduction

5.8 Uninformed or Blind Search

5.8.1 Breadth-First Search

5.8.2 Depth-First Search

5.9 Informed Search

5.9.1 Hill Climbing Methods

5.9.2 Best First Search

5.9.3 A* Algorithm

5.1 Search and Control Strategies

Search is one of the operational tasks that characterize AI programs best. Almost every AI

program depends on a search procedure to perform its prescribed functions. Problems are

typically defined in terms of states, and solutions correspond to goal states. Solving a

problem then amounts to searching through the different states until one or more of the

goal states are found. In this chapter we investigate search techniques that will be referred

to often in subsequent chapters.

5.2 Preliminary Concepts

A problem can be characterized as a space consisting of a set of states and a set of operators that map from one state to other states. Three types of states may be distinguished: one

or more initial states, a number of intermediate states, and one or more goal states. A

solution to a problem is a sequence of operators that map an initial state to a goal state. A

“best” or good solution is one that requires the fewest operations or the least cost to map

from an initial state to a goal state. The performance of a particular solution method is

judged by the amount of time and memory space required to complete the mapping. Thus, a

solution based on some algorithm A1 is considered better than one using algorithm A2 if the

time and space complexity of A1 is less than that of A2.

It is customary to represent a search space as a diagram of a directed graph or a tree. Each

node or vertex in the graph corresponds to a problem state, and arcs between nodes

correspond to transformations or mappings between the states. The immediate successors of

a node are referred to as its children or offspring, and predecessor nodes are its

ancestors. An immediate ancestor of a node is its parent; children of the same parent are siblings.

Search can be characterized as finding a path through a graph or tree structure. This

requires moving from node to node after successively expanding and generating connected

nodes. Node generation is accomplished by computing the identification or representation


code of children nodes from a parent node. Once this is done, a child is said to be generated

and the parent is said to be explored. The process of generating all of the children of a

parent is also known as expanding the node. A search procedure is a strategy for selecting

the order in which nodes are generated and a given path selected.

Search problem may be classified by the information used to carry out a given strategy. In

blind or uninformed search, no preference is given to the order of successor node

generation and selection. The path selected is blindly or mechanically followed. No

information is used to determine the preference of one child over another.

In informed or directed search, some information about the problem space is used to

compute a preference among the children for exploration and expansion. Before proceeding

with a comparison of strategies, we consider next some typical search problems.

5.3 Water Container Problem

There is a 4l container and a 3l container; neither has any measuring markers on it. There is a

pump that can be used to fill the containers with water. The problem to solve is to get exactly two

liters of water in the 4l container.

SOLUTION

Move from the initial state to the goal state through an appropriate sequence of moves or actions, such as filling and emptying the containers. The content of the two containers at any given time is a problem state.

Let :

                  x - content of the 4l container

                  y - content of the 3l container

        Then :

         (x,y) - problem state represented by an ordered pair.

The set of all ordered pairs is the space of problem states or the state-space of the

problem .

        State-space :  { (x,y) | x = 0,1,2,3,4  y = 0,1,2,3 }

        Data structure to represent the state-space can be :

o vectors

o sets

o arrays

o lists

etc...

      Problem statement :

  initial state (0,0)

goal state   (2,y) where y = any possible number.

 Moves transform from one state into another state.

 Operators determine the moves.

Operators for the problem state-space :

1. Fill the 4l container

2. Fill the 3l container

3. Empty the 4l container

4. Empty the 3l container


5. Pour water from the 3l container into the 4l container until the 4l container is full

6. Pour water from 4l container into the 3l container until the 3l

container is full

7. Pour all the water from 3l container into the 4l container

8. Pour all the water from 4l container into the 3l container

       Preconditions need to be satisfied before an operator can be applied.

      

EXAMPLE : 

           Operator 1 can be applied if there is less than 4l of water in the 4l container:

          IF there is less than 4l in the 4l container THEN fill the 4l container.

          Adding pre-conditions to operators => generation of production rules.

         Forwarded form of rule # 1 :

           IF (x,y | x<4) THEN (4,y)

        The forwarded set of production rules :

R1 IF (x,y | x<4) THEN (4,y)

R2 IF (x,y | y<3) THEN (x,3)

R3 IF (x,y | x>0) THEN (0,y)

R4 IF (x,y | y>0) THEN (x,0)

R5 IF (x,y | x+y>=4 and y>0 and x<4) THEN (4, y-(4-x))

R6 IF (x,y | x+y>=3 and x>0 and y<3) THEN (x-(3-y), 3)

R7 IF (x,y | x+y<=4 and y>0) THEN (x+y, 0)

R8 IF (x,y | x+y<=3 and x>0) THEN (0, x+y)

 In certain states, more than one rule can be applied.

 EXAMPLE: 

       (4,0) satisfies the preconditions of R2, R3 and R6
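The rule set above can be run as a small production system. The following sketch (function and variable names are our own, and the breadth-first control strategy is an assumed choice that anticipates Section 5.8.1) searches for a state with exactly 2 litres in the 4l container:

```python
from collections import deque

# Production rules R1-R8 as (precondition, action) pairs over states (x, y),
# where x is the content of the 4l container and y of the 3l container.
RULES = [
    (lambda x, y: x < 4,                          lambda x, y: (4, y)),            # R1 fill 4l
    (lambda x, y: y < 3,                          lambda x, y: (x, 3)),            # R2 fill 3l
    (lambda x, y: x > 0,                          lambda x, y: (0, y)),            # R3 empty 4l
    (lambda x, y: y > 0,                          lambda x, y: (x, 0)),            # R4 empty 3l
    (lambda x, y: x + y >= 4 and y > 0 and x < 4, lambda x, y: (4, y - (4 - x))),  # R5
    (lambda x, y: x + y >= 3 and x > 0 and y < 3, lambda x, y: (x - (3 - y), 3)),  # R6
    (lambda x, y: x + y <= 4 and y > 0,           lambda x, y: (x + y, 0)),        # R7
    (lambda x, y: x + y <= 3 and x > 0,           lambda x, y: (0, x + y)),        # R8
]

def solve(start=(0, 0)):
    """Breadth-first search for a goal state (2, y)."""
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if x == 2:                       # goal: 2 litres in the 4l container
            return path
        for pre, act in RULES:           # fire every applicable rule
            if pre(x, y):
                nxt = act(x, y)
                if nxt not in seen:      # never revisit a state
                    seen.add(nxt)
                    frontier.append((nxt, path + [nxt]))
    return None
```

Calling solve() returns a shortest sequence of states from (0,0) to a state whose first component is 2.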

5.4 Production System

Since search forms the core of many intelligent processes, it is useful to structure AI

programs in a way that facilitates describing and performing the search process. Production

systems provide such structures. A definition of a production system is given below. Do not

be confused by other uses of the word production, such as to describe what is done in

factories. A production system consists of :

1. A set of rules, each consisting of a left side (a pattern) that determines the applicability of the rule, and a right side that describes the operation to be performed if the rule is applied.

2. One or more knowledge/databases that contain whatever information is appropriate for the particular task. Some parts of the database may be permanent, while other parts of it may pertain only to the solution of the current problem. The information in these databases may be structured in any appropriate way.

3. A control strategy that specifies the order in which the rules will be compared to the database and a way of resolving the conflicts that arise when several rules match at once.

4. A rule applier.


So far, our definition of a production system has been very general. It encompasses a great many systems, including the water jug problem solver. It also encompasses a family of general production system interpreters, including:

1. Basic production system languages, such as OPS5 and ACT.

2. More complex, often hybrid systems called expert system shells, which provide complete (relatively speaking) environments for the construction of knowledge-based expert systems.

3. General problem-solving architectures like SOAR, a system based on a specific set of cognitively motivated hypotheses about the nature of problem solving.

All of these systems provide the overall architecture of a production system and allow the programmer to write rules that define particular problems to be solved.

5.5 Problem Characteristics

In order to choose the most appropriate method (or combination of methods) for a particular

problem, it is necessary to analyze the problem along several key dimensions:

1. Is the problem decomposable into a set of (nearly) independent smaller or easier sub-problems?

2. Can solution steps be ignored or at least undone if they prove unwise?

3. Is the problem's universe predictable?

4. Is a good solution to the problem obvious without comparison to all other possible solutions?

5. Is the desired solution a state of the world or a path to a state?

6. Is a large amount of knowledge absolutely required to solve the problem, or is knowledge important only to constrain the search?

7. Can a computer that is simply given the problem return the solution, or will the solution of the problem require interaction between the computer and a person?

5.6 Means-end Analysis

The problem space of means-end analysis has an initial state and one or more goal states, a

set of operators Ok with given preconditions for their application, and a difference function

that computes the difference between two states Si and Sj. A problem is solved using

means-end analysis by

1. Comparing the current state Si to a goal state Sg and computing the difference Dig.

2. Selecting an operator Ok to reduce the difference Dig.

3. Applying the operator Ok if possible. If not, the current state is saved, a subgoal is created, and means-end analysis is applied recursively to reduce the subgoal.

4. If the subgoal is solved, the saved state is restored and work is resumed on the original problem.


Example

R & (~P → Q)

(~P → Q) & R

(~~P V Q) & R

(P V Q) & R

(Q V P) & R

As a simple example, we suppose General Problem Solver is given the initial propositional

logic object Li = (R & (~P → Q)) and goal object Lg = ((Q V P) & R). To determine Lg from Li

requires a few simple transformations. The system first determines the difference between

the two expressions and then systematically reduces these differences until Lg is obtained

from Li or failure occurs. For example, a comparison of Li and Lg reveals the difference that

R is on the left in Li but on the right in Lg. This causes a subgoal to be set up to reduce this

difference. The subgoal, in turn, calls for an application of the reduction method, namely to

rewrite Li in the equivalent form Li' = ((~P → Q) & R). The rest of the solution process

follows the path indicated in the derivation shown above.

The Key Idea in Means-Ends Analysis is to Reduce Differences

The purpose of means-ends analysis is to identify a procedure that causes a transition from

the current state to the goal state, or at least to an intermediate state.

Here is the general Procedure

To perform means-ends analysis,

Until the goal is reached or no more procedures are available:

1. Describe the current state, the goal state, and the difference between the two.

2. Use the difference between the current state and the goal state, possibly with the description of the current or goal state, to select a promising procedure.

3. Apply the promising procedure and update the current state.

If an acceptable solution is found, announce it; otherwise, announce failure.
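The loop above can be sketched as follows. The difference-procedure table is reduced to a list of (difference test, procedure, name) triples over a toy numeric state; all names and the toy domain are our own:

```python
def means_ends(state, goal, procedures):
    """Repeatedly describe the difference between the current state and
    the goal, and apply a procedure that reduces that difference.
    procedures is a toy difference-procedure table of
    (reduces_difference, apply_op, name) triples."""
    steps = []
    while state != goal:
        for reduces, apply_op, name in procedures:
            if reduces(state, goal):       # this procedure handles the difference
                state = apply_op(state)
                steps.append(name)
                break
        else:
            return None                    # no procedure reduces the difference
    return steps

# Toy table: big differences are reduced in jumps of 5, small ones in steps of 1.
PROCS = [
    (lambda s, g: g - s >= 5,    lambda s: s + 5, "add5"),
    (lambda s, g: 0 < g - s < 5, lambda s: s + 1, "add1"),
]
```

For example, means_ends(0, 7, PROCS) first applies "add5" and then "add1" twice.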

Difference-Procedure Tables Often Determine the Means

> The key idea in means-ends analysis is to reduce differences.

> Means-ends analysis is often mediated via difference-procedure tables.

The difference-procedure table determines what to do, leaving descriptions of the

current state and destination state with no purpose other than to specify the origin and

destination for the appropriate procedure.

5.7 Problem-Reduction


Sometimes, it is possible to convert difficult goals into one or more easier-to-achieve subgoals. Each subgoal, in turn, may be divided still more finely into one or more lower-level subgoals. The most typical example is in computer programming, where problem reduction is ubiquitous: most real-world programs consist of a collection of specialized procedures, and each time one specialized procedure calls another, it effects a problem-reduction step.

The Key Idea in Problem Reduction is to Explore a Goal Tree

A goal tree is a semantic tree in which nodes represent goals and branches indicate how you can achieve goals by solving one or more subgoals.

A goal tree consists of:

AND goals, all of which must be satisfied, and

OR goals, one of which must be satisfied.
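Evaluating such an AND/OR goal tree can be sketched as follows; the tuple encoding of goals is an assumed representation, not from the text:

```python
def satisfied(goal):
    """Evaluate a goal tree: AND goals need every subgoal satisfied,
    OR goals need at least one; a leaf is ('LEAF', achievable)."""
    kind, subgoals = goal
    if kind == "LEAF":
        return subgoals              # True if this goal is directly achievable
    results = [satisfied(g) for g in subgoals]
    return all(results) if kind == "AND" else any(results)

# The TV-set tree: acquiring a TV set is an OR goal over stealing one
# (assumed unachievable here) or earning some money and buying one.
acquire_tv = ("OR", [("LEAF", False),                 # steal TV set
                     ("AND", [("LEAF", True),         # earn some money
                              ("LEAF", True)])])      # buy TV set
```

Here satisfied(acquire_tv) is True because the AND branch (earn money, buy set) is fully achievable.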

5.8 Uninformed or Blind Search

As noted earlier, search problems can be classified by the amount of information that is

available to the search process. Such information might relate to the problem space as a

whole or to only some states. It may be available a priori or only after a node has been

expanded. In a worst case situation the only information available will be the ability to

distinguish goal from nongoal nodes. When no further information is known a priori, a search

program must perform a blind or uninformed search. A blind or uninformed search algorithm

is one that uses no information other than the initial state, the search operators, and a test

for a solution. A blind search should proceed in a systematic way by exploring nodes in some

predetermined order or simply by selecting nodes at random.

5.8.1 Breadth-First Search

Breadth-first searches are performed by exploring all nodes at a given depth before

proceeding to the next level. This means that all immediate children of nodes are explored

before any of the children’s children are considered. Breadth-first tree search is illustrated in

the figure given below. It has the obvious advantage of always finding a minimal path length

solution when one exists. However, a great many nodes may need to be explored before a

solution is found, especially if the tree is very full.

An algorithm for the breadth-first search is quite simple. It uses a queue structure

to hold all generated but still unexplored nodes.

Algorithm

1. Place the starting node s on the queue.

2. If the queue is empty, return failure and stop.

[Figure (belongs with Section 5.7): a goal tree for "Acquire TV set", with subgoals "Steal TV set" and "Buy TV set", the latter requiring "Earn some money".]


3. If the first element on the queue is a goal node g, return success and stop. Otherwise,

4. Remove and expand the first element from the queue and place all the children at the end of the queue in any order.

5. Return to step 2.

The time complexity of breadth-first search is O(b^d), where b is the branching factor and d is the depth of the solution. The space complexity is also O(b^d).
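The five steps above can be sketched as follows; the names and the path bookkeeping are our own, and a visited set is added so that states are not regenerated:

```python
from collections import deque

def breadth_first(start, goal_test, successors):
    """Breadth-first search: a FIFO queue of (node, path) pairs."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:                        # step 2: empty queue means failure
        node, path = queue.popleft()    # step 4: remove the first element
        if goal_test(node):             # step 3: goal found
            return path
        for child in successors(node):
            if child not in visited:    # avoid re-exploring states
                visited.add(child)
                queue.append((child, path + [child]))   # children go at the end
    return None
```

On a binary counting tree where node n has children 2n and 2n+1, breadth_first(1, lambda n: n == 5, ...) returns the minimal-length path [1, 2, 5].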

5.8.2 Depth-First Search

Depth-first searches are performed by diving downward into a tree as quickly as possible. It

does this by always generating a child node from the most recently expanded node, then

generating that child’s children, and so on until a goal is found or some cutoff depth point d

is reached. If a goal is not found when a leaf node is reached or at the cutoff point, the

program backtracks to the most recently expanded node and generates another of its

children. This process continues until a goal is found or failure occurs. Depth-first tree search

is illustrated in the figure given below.

Algorithm:

1. Place the starting node s on the queue.

2. If the queue is empty, return failure and stop.

3. If the first element on the queue is a goal node g, return success and stop. Otherwise,

4. Remove and expand the first element, and place the children at the front of the queue (in any order).

5. Return to step 2.

The depth-first search is preferred over the breadth-first when the search tree is known to

have a plentiful number of goals. Otherwise, depth-first may never find a solution.

The time complexity is the same as that for breadth-first, O(b^d). It is less demanding in space

requirements, however, since only the path from the starting node to the current node

needs to be stored. Therefore, if the depth cutoff is d, the space complexity is just O(d).
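A sketch of the algorithm follows; since children are placed at the front, the "queue" behaves as a stack, and the cutoff parameter (our own addition) realises the depth cutoff d mentioned above:

```python
def depth_first(start, goal_test, successors, cutoff=20):
    """Depth-first search with a depth cutoff."""
    stack = [(start, [start])]
    while stack:                       # empty stack means failure
        node, path = stack.pop()       # take the most recently generated node
        if goal_test(node):
            return path
        if len(path) < cutoff:         # stop diving at the cutoff depth
            for child in successors(node):
                if child not in path:  # avoid cycling along the current path
                    stack.append((child, path + [child]))
    return None
```

On the same binary counting tree as before, the search dives down one branch before backtracking, so the path found need not be the shortest one.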


5.9 Informed Search

When more information than the initial state, the operators, and the goal test is available,

the size of the search space can usually be constrained. When this is the case, the better the

information available, the more efficient the search process will be. Such methods are

known as informed search methods. They often depend on the use of heuristic information

i.e. information about the problem (the nature of the states, the cost of transforming from

one state to another, the promise of taking a certain path, and the characteristics of the

goals) can sometimes be used to help guide the search more efficiently. This information

can often be expressed in the form of a heuristic evaluation function f(n,g), a function of the

nodes n and/or the goals g.

Recall that a heuristic is a rule of thumb or judgmental technique that leads to a

solution some of the time but provides no guarantee of success. Heuristics play an important

role in search strategies because of the exponential nature of most problems. They help to

reduce the number of alternatives from an exponential number to a polynomial number and,

thereby, obtain a solution in a tolerable amount of time.


Generally, two categories of problems use heuristics:

1. Problems for which no exact algorithms are known and one needs to find an approximate and satisfying solution, e.g., computer vision, speech recognition, etc.

2. Problems for which exact solutions are known but computationally infeasible, e.g., chess, etc.

The following algorithms make use of heuristic evaluation functions.

5.9.1 Hill Climbing Methods

Hill climbing is like depth-first searching where the most promising child is selected for

expansion. When the children have been generated, alternative choices are evaluated using

some type of heuristic function. The path that appears most promising is then chosen and

no further reference to the parent or other children is retained. This process continues from

node-to-node with previously expanded nodes being discarded.

In fact, there is practically no difference between hill-climbing and depth-first

search except that the children of the node that has been expanded are sorted by the

remaining distance.

The algorithm for hill-climbing is given below:

Step 1 : Put the initial node on a list START

Step 2 : If (START is empty) or (START = GOAL) terminate search

Step 3 : Remove the first node from START. Call this node a

Step 4 : If (a = GOAL) terminate search with success.

Step 5 : Else if node a has successors, generate all of them. Find out how far they are from

the goal node. Sort them by the remaining distance from the goal and add them

to the beginning of START.

Step 6 : Goto Step 2.
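Steps 1 to 6 can be sketched as follows; names are our own, and h is the assumed heuristic function giving the remaining distance to the goal:

```python
def hill_climb(start, goal_test, successors, h):
    """Hill climbing as in steps 1-6: children of the expanded node are
    sorted by remaining distance h and added to the beginning of START."""
    START = [start]
    while START:                        # step 2: empty START means failure
        a = START.pop(0)                # step 3: remove the first node
        if goal_test(a):                # step 4: goal reached
            return a
        kids = sorted(successors(a), key=h)  # step 5: sort by remaining distance
        START = kids + START            # best child is tried first
    return None
```

With states as integers, a goal of 5 and h(n) = |5 - n|, the search always dives toward the child that looks closest to the goal.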

A typical path is illustrated in the following figure. [Figure: a search tree with a heuristic value at each node; at every level the child with the smallest remaining distance is expanded next.]

The hill-climbing technique is used in some activity or other in our day-to-day chores. Some

of them are:

1. While listening to somebody playing the flute on the radio, the tone and volume

controls are adjusted in a way that makes the music melodious.


2. While tuning the carburetor of a scooter, the accelerator is raised to its maximum

once and the carburetor is tuned so that the engine keeps on running for a

considerably long period of time.

3. An electronics expert, while making a transistor radio for the first time, tunes the

set at mid-afternoon, when the signal is weak, for proper reception.

Problem of Hill-Climbing Technique

1. Local maximum: a state that is better than all its neighbors but not better than

states that are farther away.

Local Maximum

2. Plateau: A flat area of the search space, in which all neighbors have the same value.

Plateau

3. Ridge: Described as “a long and narrow stretch of elevated ground or a narrow

elevation or raised path running along or across a surface “ by the Oxford English

Dictionary, this is an area in the path which must be traversed very carefully because

movement in any direction might maintain one at the same level or result in fast

descent.


Ridge

In order to overcome these problems, adopt one of the following or a combination of the

following methods.

1. Backtracking for local maxima: backtracking helps in undoing what has been done so far and permits trying a totally different path to attain the global peak.

2. A big jump is the solution for escaping from a plateau. A huge jump is recommended because in a plateau all neighboring points have the same value.

3. Trying different paths at the same time is the solution for circumventing ridges.

The problems encountered with hill climbing can be avoided by using a best-first search

approach.

5.9.2 Best-First Search

Best-first search also depends on the use of a heuristic to select the most promising paths to the

goal node. Unlike hill climbing, however, this algorithm retains all estimates computed for

previously generated nodes and makes its selection based on the best among them all.

Thus, at any point in the search process, Best-first search moves forward from the most

promising of all nodes generated so far. In so doing, it avoids the potential traps

encountered in hill climbing. The best-first process is illustrated in the figure given below.

[Figure: a search graph rooted at S. S's children are A (3), B (6) and C (5); A's children are D (9) and E (8); B's children are F (12) and G (14); C's child is H (7); H's children are I (5) and J (6); I's children are K (1), L (0) and M (2), where L is the goal node.]

First, the start node S is expanded. It has three children A,B and C with values 3,6 and 5

respectively. These values approximately indicate how far they are from the goal node. The

child with minimum value namely A is chosen. The children of A are generated. They are D

and E with values 9 and 8. The search process now has four nodes to search, i.e., node D

with value 9, node E with value 8, node B with value 6 and node C with value 5. Of them,

node C has the minimal value, and it is expanded to give node H with value 7. At this

point, the nodes available for search are (D : 9), (E : 8), (B : 6) and (H : 7) where (a : b)

indicates that (a) is the node and b is its evaluation value. Of these, B is minimal and hence

B is expanded to give (F : 12), (G : 14).


At this juncture, the nodes available for search are (D : 9), (E : 8), (H : 7), (F : 12) and

(G : 14) out of which (H : 7) is minimal and is expanded to give (I : 5), (J : 6).

The entire steps of the search process are given in table below.

As you can see, best-first search “jumps all around” in the search graph to identify the node

with minimal evaluation function value.

The algorithm is given below

1. Place the starting node s on the queue.

2. If the queue is empty, return failure and stop.

3. If the first element on the queue is a goal node g, return success and stop. Otherwise,

4. Remove the first element from the queue, expand it and compute the estimated goal distances for each child. Place the children on the queue (at either end) and arrange all queue elements in ascending order of goal distance from the front of the queue.

5. Return to step 2.
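The sorted queue can be kept as a priority queue ordered on the heuristic estimate alone. The sketch below (names are our own) reproduces the trace discussed above when given the example's values, finding the goal L along the path S, C, H, I, L:

```python
import heapq

def best_first(start, goal_test, successors, h):
    """Best-first search: always expand the most promising node
    generated so far, judged by the heuristic estimate h alone."""
    frontier = [(h(start), start, [start])]
    visited = {start}
    while frontier:
        est, node, path = heapq.heappop(frontier)   # smallest estimate overall
        if goal_test(node):
            return path
        for child in successors(node):
            if child not in visited:
                visited.add(child)
                heapq.heappush(frontier, (h(child), child, path + [child]))
    return None
```

Note how the estimates of previously generated nodes are retained on the queue, so the search can "jump all around" the graph, unlike hill climbing.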

Step  Node expanded  Children generated   Available nodes                              Node chosen
1     S              (A:3), (B:6), (C:5)  (A:3), (B:6), (C:5)                          (A:3)
2     A              (D:9), (E:8)         (B:6), (C:5), (D:9), (E:8)                   (C:5)
3     C              (H:7)                (B:6), (D:9), (E:8), (H:7)                   (B:6)
4     B              (F:12), (G:14)       (D:9), (E:8), (H:7), (F:12), (G:14)          (H:7)
5     H              (I:5), (J:6)         (D:9), (E:8), (F:12), (G:14), (I:5), (J:6)   (I:5)
6     I              (K:1), (L:0), (M:2)  (D:9), (E:8), (F:12), (G:14), (J:6),         search stops:
                                          (K:1), (L:0), (M:2)                          (L:0) is the goal

5.9.3 A* Algorithm


In best-first search, we brought in a heuristic value called evaluation function value. It is a

value that estimates how far a particular node is from the goal. Apart from the evaluation

function values, one can also bring in cost functions. Cost functions indicate how much

resources like time, energy, money etc. have been spent in reaching a particular node from

the start. While evaluation function values deal with the future, cost function values deal

with the past. Since cost function values are really expended, they are more concrete than

evaluation function values.

If it is possible for one to obtain the evaluation function values and the cost function values,

then the A* algorithm can be used. The basic principle is to sum the cost and evaluation

function values for a state to get its "goodness", and use this as a yardstick instead of

the evaluation function value in best-first search. The sum of the evaluation function value

and the cost along the path leading to that state is called fitness number.

Consider the following figure again with the same evaluation function values. Now associated

with each node are three numbers: the evaluation function value, the cost function value and

the fitness number.

The fitness number, as stated earlier, is the total of the evaluation function value and the

cost-function value. For example, consider node K, the fitness number is 20, which is

obtained as follows:

(Evaluation function of K) +

(Cost function involved from start node S to node K)

= 1 + (Cost function from S to C + Cost function from C to H + Cost function from H to I +

Cost function from I to K)

= 1 + 6 + 5 + 7 + 1 = 20.

While best-first search uses the evaluation function value only for expanding the best node, A* uses the fitness number, i.e. the evaluation function value plus the cost incurred so far.

[Figure: the same search graph with three numbers attached to each node: the evaluation function value, the cost function value and the fitness number. For example, node K has evaluation function value 1 and fitness number 20.]

The algorithm for A* is as follows:

1. Put the initial node on a list START

2. If (START is empty) or (START = GOAL) terminate search

3. Remove the first node from START. Call this node a

4. If (a = GOAL) terminate search with success.

5. Else if node a has successors, generate all of them. Estimate the fitness number of the successors by totaling the evaluation function value and the cost function value. Sort the list by fitness number.

6. Name the new list START 1

7. Replace START with START 1

8. Goto Step 2.
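A sketch of A* follows, with the fitness number f = cost so far + evaluation function value. Here successors returns (child, step-cost) pairs; the names and the cheaper-path check are our own:

```python
import heapq

def a_star(start, goal_test, successors, h):
    """A* search ordered by fitness number f = g + h, where g is the cost
    spent so far and h the evaluation function value."""
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}                 # cheapest known cost to each state
    while frontier:
        f, g, node, path = heapq.heappop(frontier)  # smallest fitness number
        if goal_test(node):
            return path, g
        for child, cost in successors(node):
            g2 = g + cost
            if g2 < best_g.get(child, float("inf")):  # found a cheaper route
                best_g[child] = g2
                heapq.heappush(frontier, (g2 + h(child), g2, child, path + [child]))
    return None
```

On a small weighted graph, A* prefers the route S-A-B-C of total cost 3 over the direct but more expensive alternatives.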


UNIT 6 LEARNING

6.1 What is Learning?

6.2 Types of Learning

6.2.1 Rote Learning

6.2.2 Learning by Taking Advice

6.2.3 Learning by Problem Solving

6.2.4 Inductive Learning

6.2.5 Explanation Based Learning

6.1 What is Learning?

Learning is an area of AI that focuses on processes of self-improvement. Information

processes that improve their performance or enlarge their knowledge bases are said to

learn.

Intelligence implies that an organism or machine must be able to adapt to new

situations.

It must be able to learn to do new things.

This requires knowledge acquisition, inference, updating/refinement of knowledge

base, acquisition of heuristics, applying faster searches, etc. [Figure: a simple model of a learning system.]

Simon [1983] has proposed that learning denotes

...changes in the system that are adaptive in the sense that they enable the system to do

the same task or tasks drawn from the same population more efficiently and more

effectively the next time.

As thus defined, learning covers a wide range of phenomena. At one end of the spectrum is

skill refinement (i.e. people get better at many tasks simply by practicing). At the other end

of the spectrum lies knowledge acquisition. Many AI programs draw heavily on knowledge as

their source of power.

How can we learn?

Many approaches have been taken to attempt to provide a machine with learning

capabilities. This is because learning tasks cover a wide range of phenomena.

Listed below are a few examples of how one may learn. We will look at these in detail shortly.

Skill refinement

-- one can learn by practicing, e.g. playing the piano.

Knowledge acquisition

-- one can learn by experience and by storing the experience in a knowledge base.

One basic example of this type is rote learning.

Taking advice


-- Similar to rote learning although the knowledge that is input may need to be

transformed (or operationalised) in order to be used effectively.

Problem Solving

-- if we solve a problem one may learn from this experience. The next time we see a

similar problem we can solve it more efficiently. This does not usually involve

gathering new knowledge but may involve reorganisation of data or remembering

how to achieve the solution.

Induction

-- One can learn from examples. Humans often classify things in the world without

knowing explicit rules. Usually involves a teacher or trainer to aid the classification.

Discovery

-- Here one learns knowledge without the aid of a teacher.

Analogy

-- If a system can recognise similarities in information already stored then it may be

able to transfer some knowledge to improve the solution of the task at hand.

6.2 Types of Learning

6.2.1 Rote Learning

Rote learning is basically memorisation:

Saving knowledge so it can be used again.

Retrieval is the only problem.

No repeated computation, inference or query is necessary.

A simple example of rote learning is caching

Store computed values (or large piece of data)

Recall this information when required by computation.

Significant time savings can be achieved.

Many AI programs (as well as more general ones) have used caching very effectively.

Memorisation is a key necessity for learning:

It is a basic necessity for any intelligent program -- is it a separate learning process?

Memorisation can be a complex subject -- how best to store knowledge?
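Caching as rote learning can be sketched with a memoising wrapper; the names are our own:

```python
def cached(fn):
    """Rote learning as caching: store computed values so that
    retrieval replaces repeated computation."""
    memory = {}
    def wrapper(*args):
        if args not in memory:          # compute only on a cache miss
            memory[args] = fn(*args)
        return memory[args]             # otherwise retrieval is the only work
    wrapper.memory = memory
    return wrapper

@cached
def fib(n):
    """Naive Fibonacci, made fast by remembering earlier results."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)
```

Without the cache this recursion takes exponential time; with it, each value is computed exactly once.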

6.2.2 Learning by Taking Advice

The idea of advice taking in AI-based learning was proposed as early as 1958 (McCarthy).

However, very few attempts were made at creating such systems until the late 1970s. Expert

systems provided a major impetus in this area.

There are two basic approaches to advice taking:

1. Take high-level, abstract advice and convert it into rules that can guide performance elements of the system; automate all aspects of advice taking.

2. Develop sophisticated tools such as knowledge base editors and debuggers. These are used to aid an expert in translating his expertise into detailed rules. Here the expert is an integral part of the learning system. Such tools are important in the expert systems area of AI.


Automated Advice Taking

The following steps summarise this method:

Request

-- This can be a simple question asking for general advice, or something more complicated, such as

identifying shortcomings in the knowledge base and asking for a remedy.

Interpret

-- Translate the advice into an internal representation.

Operationalise

-- Translated advice may still not be usable so this stage seeks to provide a

representation that can be used by the performance element.

Integrate

-- When knowledge is added to the knowledge base, care must be taken so that bad side-effects, e.g. the introduction of redundancy and contradictions, are avoided.

Evaluate

-- The system must assess the new knowledge for errors, contradictions etc.

The steps can be iterated.

6.2.3 Learning by Problem Solving

There are three basic methods by which a system can learn from its own experiences:

Learning by Parameter Adjustment

Learning by Macro Operators

Learning by Chunking

Learning by Parameter Adjustment

Many programs rely on an evaluation procedure to summarise the state of search etc. Game

playing programs provide many examples of this.

However, many programs have a static evaluation function.

In learning, a slight modification of the formulation of the evaluation function is

required.

Here the problem has an evaluation function that is represented as a polynomial of a form

such as:

c1t1 + c2t2 + c3t3 + …

The t terms are values of features and the c terms are weights.

In designing programs it is often difficult to decide on the exact value to give each weight

initially.

So the basic idea of parameter adjustment is to:

Start with some estimate of the correct weight settings.

Modify the weight in the program on the basis of accumulated experiences.

Features that appear to be good predictors will have their weights increased and bad

ones will be decreased.
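One adjustment step for the polynomial evaluation function above can be sketched as follows. The delta-rule style update is an assumed concrete choice, not prescribed by the text:

```python
def evaluate(weights, features):
    """Static evaluation: c1*t1 + c2*t2 + c3*t3 + ..."""
    return sum(c * t for c, t in zip(weights, features))

def adjust(weights, features, outcome, predicted, rate=0.1):
    """One parameter-adjustment step (assumed delta-rule update):
    weights of features that predicted the outcome well are increased,
    and those that predicted badly are decreased."""
    error = outcome - predicted
    return [c + rate * error * t for c, t in zip(weights, features)]
```

Starting from an all-zero estimate, one application of adjust moves the evaluation toward the observed outcome.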

Learning by Macro Operators

The basic idea here is similar to Rote Learning:

Avoid expensive recomputation


Macro-operators can be used to group a whole series of actions into one.

For example: making dinner can be described as lay the table, cook dinner, serve dinner. We

could treat laying the table as one action even though it involves a sequence of actions.

The STRIPS problem solver employed macro-operators in its learning phase.

Consider a blocks world example in which ON(C,B) and ON(A,TABLE) are true.

STRIPS can achieve ON(A,B) in four steps:

UNSTACK(C,B), PUTDOWN(C), PICKUP(A), STACK(A,B)

STRIPS now builds a macro-operator MACROP with preconditions ON(C,B), ON(A,TABLE),

postconditions ON(A,B), ON(C,TABLE) and the four steps as its body.

MACROP can now be used in future operations.

But it is not very general. The above can be easily generalised with variables used in place

of the blocks.
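The MACROP example can be sketched with states as sets of facts. The representation, and the simplification that applying the macro simply swaps its preconditions for its postconditions (real STRIPS operators carry separate add and delete lists), are our own:

```python
# The four-step STRIPS plan from the text, packaged as one macro-operator.
MACROP = {
    "preconditions":  {("ON", "C", "B"), ("ON", "A", "TABLE")},
    "postconditions": {("ON", "A", "B"), ("ON", "C", "TABLE")},
    "body": ["UNSTACK(C,B)", "PUTDOWN(C)", "PICKUP(A)", "STACK(A,B)"],
}

def applicable(macro, state):
    """A macro-operator fires only when all its preconditions hold."""
    return macro["preconditions"] <= state

def apply_macro(macro, state):
    """Simplified effect: replace the preconditions by the postconditions."""
    return (state - macro["preconditions"]) | macro["postconditions"]
```

Generalising the block names C, A, B into variables would make MACROP reusable, as noted above.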

Learning by Chunking

Chunking involves similar ideas to Macro Operators and originates from psychological ideas

on memory and problem solving.

The computational basis is in production systems (studied earlier).

SOAR is a system that uses production rules to represent its knowledge. It also employs

chunking to learn from experience.

Basic Outline of SOAR's Method

As SOAR solves problems, it fires productions; these are stored in long-term memory.

Some firings turn out to be more useful than others.

When SOAR detects a useful sequence of firings, it creates chunks.

A chunk is essentially a large production that does the work of an entire sequence of

smaller ones.

Chunks may be generalised before storing.

6.2.4 Inductive Learning

This involves the process of learning by example -- where a system tries to induce a general

rule from a set of observed instances.

This involves classification -- assigning, to a particular input, the name of a class to which it

belongs. Classification is important to many problem solving tasks.

A learning system has to be capable of evolving its own class descriptions:

Initial class definitions may not be adequate.

The world may not be well understood or rapidly changing.

The task of constructing class definitions is called induction or concept learning.

6.2.5 Explanation Based Learning (EBL)

Humans appear to learn quite a lot from one example.

Basic idea: use the results of one example's problem-solving effort the next time around.

An EBL system accepts four kinds of input:

A training example


-- what the learner sees in the world.

A goal concept

-- a high level description of what the program is supposed to learn.

An operational criterion

-- a description of which concepts are usable.

A domain theory

-- a set of rules that describe relationships between objects and actions in a domain.

From these, EBL computes a generalisation of the training example that is sufficient not only to describe the goal concept but also to satisfy the operational criterion.

This has two steps:

Explanation

-- the domain theory is used to prune away all unimportant aspects of the training

example with respect to the goal concept.

Generalisation

-- the explanation is generalised as far as possible while still describing

the goal concept.


UNIT 7 EXPERT SYSTEM

7.1 What is Expert System?

7.2 Expert System Application Area

7.3 Expert System Structure

7.4 Expert System Characteristics

7.5 Conventional Vs Expert Systems

7.6 Participants in Expert Systems Development

7.7 Tools For Development of Expert System

7.8 MYCIN

7.1 WHAT IS AN EXPERT SYSTEM?

Who is an Expert?

An expert is a person who has expertise and knowledge in a certain area.

Through experience, the expert develops skills that enable him/her to solve problems effectively and efficiently (these skills are called heuristics).

Definition of Expert Systems:

Prof. Edward Feigenbaum of Stanford University, a leading researcher in expert systems, has produced the following definition:

" . . . An intelligent computer program that uses knowledge and inference procedures to

solve problems that are difficult enough to require significant human expertise for their

solution."

Simply put, an expert system is a computer program designed to model the problem-solving ability of a human expert.

IMPORTANT POINT!

In the process of emulating the behavior of a human expert, an expert system must be able

to supply users with the same services and facilities that the human expert does.

WHY BUILD EXPERT SYSTEMS

The answer comes from comparing an expert system with a human expert, as illustrated in the following table.

Factor              Human Expert     Expert System

Time Availability   Workday          Always

Geographic          Local            Anywhere

Safety              Irreplaceable    Replaceable

Perishable          Yes              No

Performance         Variable         Consistent

Speed               Variable         Consistent

Cost                High             Affordable

 We build expert systems for two reasons:

    1.      to replace an expert

    2.      to assist an expert

  Reasons for replacing an expert:


Make available expertise after hours or in other locations

Automate a routine task requiring an expert.

Expert is retiring or leaving

Expert is expensive

Reasons for assisting an expert:

Aiding expert in some routine task to improve productivity

Aiding expert in some difficult task to effectively manage the complexities

Making available to the expert information that is difficult to recall

7.2 EXPERT SYSTEM APPLICATION AREA

Major Application Areas: agriculture, business, chemistry, communications, computer systems, education, information management, law, military, etc.

Types of problems solved by expert systems:

Problem-solving Paradigm   Description

Control          Governing system behavior to meet specifications

Design           Configuring objects under constraints

Diagnosis        Inferring system malfunctions from observables

Instruction      Diagnosing, debugging and repairing student behavior

Interpretation   Inferring situation descriptions from data

Monitoring       Comparing observations to expectations

Planning         Designing actions

Prediction       Inferring likely consequences of a given situation

Selection        Identifying the best choice from a list of possibilities

Prescription     Recommending solutions to system malfunctions

Current Applications

1.      Financial applications

The types of systems in common use include:

a. Systems that aid bank managers when deciding whether or not to grant a loan to a particular customer

b. Systems that give advice as to whether or not to grant a mortgage

c. Systems that advise insurance companies on the risk involved in insuring a particular individual or item

d. Systems used by credit card companies to help them decide whether or not to issue an individual with a credit card

e. Systems devised to recognize and guard against computer fraud.

2.      Industry, manufacturing and military

The types of systems in common use include:

a. Systems capable of diagnosing various industrial faults, such as faults in aircraft, gas turbines and helicopters

b. Systems used to design and make small mechanical parts

c. Systems that identify targets and potential threats to security.


7.3 EXPERT SYSTEM STRUCTURE

[Figure: human expert problem solving compared with expert system problem solving]

1. The Knowledge Base (LTM)

The key bottleneck in developing an expert system.

Contains everything necessary for understanding, formulating and solving a problem.

It contains facts and heuristics.

The most popular approach to representing domain knowledge is using production rules.

Rule 1

IF car won't start

THEN problem in electrical system

Rule 2

IF problem in electrical system

AND battery voltage is below 10 volts

THEN bad battery
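The way rules like these fire against facts in working memory can be sketched as a tiny forward-chaining loop (an illustrative Python sketch; the rule format and fact strings are invented for this example, not the syntax of any actual shell):

```python
# Knowledge base: each rule is (list of condition facts, conclusion fact).
rules = [
    (["car won't start"], "problem in electrical system"),
    (["problem in electrical system", "battery voltage below 10V"],
     "bad battery"),
]

# Working memory: facts supplied by the user during the consultation.
working_memory = {"car won't start", "battery voltage below 10V"}

# Inference engine: repeatedly fire any rule whose conditions are all
# present in working memory, adding its conclusion as a new fact.
changed = True
while changed:
    changed = False
    for conditions, conclusion in rules:
        if all(c in working_memory for c in conditions) \
                and conclusion not in working_memory:
            working_memory.add(conclusion)
            changed = True

print("bad battery" in working_memory)  # True
```

Rule 1 fires first and its conclusion enters working memory, which then lets Rule 2 fire: exactly the two-step inference of the consultation below.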

 2. Working Memory (STM)

Contains facts about a problem that are discovered during consultation with the expert

system.

System matches this information with knowledge contained in the knowledge base to infer

new facts.

Conclusions reached will also enter working memory.

Thus, working memory contains information either supplied by the user or inferred by the system.

3. Inference Engine

The knowledge processor, modeled after the expert's reasoning power.

The processor in an expert system that matches the facts contained in the working memory with the domain knowledge contained in the knowledge base, to draw conclusions about the problem.

It taps the knowledge base and working memory to derive new information and solve

problems

4. The User Interface

The user communicates with the expert system through the user interface.

It allows the user to query the system, supply information and receive advice.

The aim is to provide the same form of communication facilities as those provided by the expert, but it normally has less capability for understanding natural language and general knowledge.

Example of inference process:

Consultation with Mr. Mech (MM), an ES based on the previous rules:

MM: Will the car not start?

User: YES

User asserts: The car won't start.

MM asserts: Problem in electrical system.

MM: Is the battery voltage below 10 volts?

User: YES

User asserts: The battery voltage is below 10 volts.

MM asserts: Bad battery.

The Explanation Facility

A trademark of expert systems: ability to explain their reasoning.

An additional component of an expert system.

ES can provide explanation on:

1.        WHY it is asking the question

2.        HOW it reached some conclusion.

Explaining WHY

Example:

MM: Will the car not start?


Person: WHY

MM: If I know that the car won't start then I usually assume the problem is the electrical

system.

The expert responds with what they might conclude from the answer.

An ES responds to a WHY query by displaying the rule it is currently pursuing.

Explaining HOW

Besides giving final results, expert system can explain how it arrived at a result.

Conventional programs work on well-defined problems, but expert systems do not; therefore, justification of the system's findings is needed.

Example:

MM: The battery is bad

Person : HOW

MM: Since your car won't start, I assumed there was a problem with the electrical system.

Once I found the battery voltage was below 10 volts, I knew the battery was bad.

The ES responds by tracing back through the rules that fired to reach the conclusion.

This trace is a map of the system's line of reasoning.

7.4 EXPERT SYSTEM CHARACTERISTICS

An expert system is usually designed to have the following general characteristics:

1. High-level Expertise

The most useful characteristic of an expert system.

This expertise can represent the best thinking of top experts in the field, leading to problem

solutions that are imaginative, accurate, and efficient.

2. Adequate Response Time

The system must also perform in a reasonable amount of time, comparable to or better than

the time required by an expert to solve a problem.

3. Permits Inexact Reasoning

These types of applications are characterized by information that is uncertain, ambiguous, or

unavailable and by domain knowledge that is inherently inexact.

4. Good Reliability

The system must be reliable and not prone to crashes, otherwise it will not be used.

5. Comprehensibility

The system should be able to explain the steps of its reasoning while executing so that it is

understandable.

The system should have an explanation capability, in the same way that human experts are supposed to be able to explain their reasoning.

6. Flexibility

Because of the large amount of knowledge that an expert system may have, it is important

to have an efficient mechanism for modifying the knowledge base.

7. Symbolic Reasoning

Expert systems represent knowledge symbolically, as sets of symbols that stand for problem concepts.

These symbols can be combined to express relationships between them. When these relationships are represented in a program, they are called symbol structures.


For example :-

Assert: Ahmad has a fever

Rule: IF person has fever THEN take panadol

Conclusion: Ahmad takes panadol

8. Reasons Heuristically

Experts are adept at drawing on their experience to help them efficiently solve the current problem.

Typical heuristics used by experts:

I always check the electrical first.

People rarely get a cold during the summer.

If I suspect cancer, then I always check the family history.

9. Makes Mistakes

Expert systems can make mistakes.

Since the knowledge of the expert is captured as closely as possible, the expert system, like its human counterpart, can make mistakes.

10. Thrives on Reasonable Complexity

The problem should be reasonably complex, not too easy or too difficult.

11. Focuses Expertise

Most experts are skillful at solving problems within their narrow area of expertise, but have

limited ability outside this area.

7.5 CONVENTIONAL VS EXPERT SYSTEMS

Conventional: Knowledge and processing are combined in one sequential program.
Expert: The knowledge base is clearly separated from the processing (inference) mechanism (knowledge rules are separated from the control).

Conventional: Programs do not make mistakes (only programmers do).
Expert: The program may make mistakes.

Conventional: Do not usually explain why input data are needed or how conclusions were drawn.
Expert: Explanation is a part of most expert systems.

Conventional: The system operates only when it is completed.
Expert: The system can operate with only a few rules (as a first prototype).

Conventional: Execution is done on a step-by-step (algorithmic) basis.
Expert: Execution is done by using heuristics and logic.

Conventional: Needs complete information to operate.
Expert: Can operate with incomplete or uncertain information.

Conventional: Effective manipulation of large databases.
Expert: Effective manipulation of large knowledge bases.

Conventional: Representation and use of data.
Expert: Representation and use of knowledge.

Conventional: Efficiency is a major goal.
Expert: Effectiveness is a major goal.

Conventional: Easily deals with quantitative data.
Expert: Easily deals with qualitative data.

7.6 PARTICIPANTS IN EXPERT SYSTEMS DEVELOPMENT

The main participants in the process of building an expert system are:

1.        the domain expert

2.        the knowledge engineer

3.        the user

The Domain Expert

Is a person who has the special knowledge, judgment, experience, skills and methods, to

give advice and solve problems in a manner superior to others.

Although an expert system usually models one or more experts, it may also contain

expertise from other sources such as books and journal articles.

Qualification needed by the Domain Expert:

Has expert knowledge

Has efficient problem-solving skills

Can communicate the knowledge

Can devote time

Must be cooperative

The Knowledge Engineer

A person who designs, builds and tests an expert system.

Qualifications needed by Knowledge Engineer:

Has knowledge engineering skills (the art of building expert systems)

Has good communication skills

Can match problems to software

Has expert system programming skills

The User

Is a person who uses the expert system once it is developed.

Can aid in knowledge acquisition (giving broad understanding of the problems)

Can aid in system development

7.7 TOOLS FOR DEVELOPMENT OF EXPERT SYSTEMS

An expert system developer can choose between three different approaches in building an expert system:

1.      use a programming language (usually an AI language)

2.      use a shell


3.      use an AI environment (or toolkit)

 1. Languages

Expert systems may be written in symbolic languages, such as LISP or PROLOG, or in conventional high-level languages such as FORTRAN, C and PASCAL.

LISP

All the large early expert systems were developed in LISP (List Processing) or a tool written

in LISP.

LISP deals with symbols.

PROLOG

Research on logic programming culminated in the seventies with the invention of the PROLOG language.

PROLOG means Programming in Logic.

A PROLOG program can be thought of as a database of facts and rules.

 

2. Expert System Shells

A shell is a program that can be used to build expert systems.

 An expert system shell performs three major functions:

1. Assists in building the knowledge base by allowing the developer to insert

knowledge into knowledge representation structures

2. Provides methods of inference or deduction that reason on the basis of

information in the knowledge base and new facts input by the user

3. Provides an interface that allows the user to set up reasoning task and query the

system about its reasoning strategy

 

3. AI Environments or Toolkits

More expensive and powerful than either languages or shells.

Advantage of using toolkits:

They provide a variety of knowledge representation techniques such as rules and frames

(inheritance)

 The following are actual figures for the different development tools used by expert system builders in the UK:

Conventional Languages 11%

AI languages 23%

Expert system Shells 56%

Toolkits 11%

7.8 MYCIN: AN ES FOR THE TREATMENT AND DIAGNOSIS OF MENINGITIS AND BACTEREMIA INFECTIONS

Developed at Stanford University in the mid-1970s, MYCIN was the first large expert system to perform at the level of a human expert. It is used as a benchmark by expert system developers.

MYCIN provides consultative advice about bacteremia and meningitis. Bacteremia is an infection that involves bacteria in the blood. Meningitis is an infection that inflames the membranes that envelop the brain and spinal cord. Both can be fatal and need a quick response, but positive identification normally takes 24-48 hours, so doctors must normally begin treatment in the absence of lab results. The problem is very complex, and the doctor needs the advice of an expert in bacteremia and meningitis.

 How does MYCIN work?

MYCIN begins by initiating a dialogue.

The physician responds to its questions.

MYCIN provides a diagnosis and prescription.

 How does MYCIN reason? It uses:

Laboratory results of body fluid analyses

Symptoms that the patient is displaying

Characteristics of the patient, such as age, sex, etc.

A MYCIN consultation proceeds in two phases:

     1.      A diagnosis is made to identify the most likely infectious organisms

2.      One or more drugs (antibiotics) are prescribed

  A consultation With MYCIN

  1.      Please enter information about the patient.

  Name Age Sex Race

Sally 42 years Female Melayu

2.      Are there any positive cultures obtained from SALLY?

No

3.      Are there any pending cultures of SALLY?

Yes

 4.      Site: date collected: stain examined: method

CSF 3-Jun-75 Yes N/A

5.      Stain of smear:

(type NONE if no organism was seen on the smear)

None

Unable to reach a conclusion from the lab tests, MYCIN, like a physician, continues asking questions.

6.      Has Sally recently had symptoms of persistent headache or other abnormal neurologic symptoms (dizziness, lethargy, etc.)?

Yes

7.      Has SALLY recently had objective evidence of abnormal neurologic signs documented by physician observation or examination?

Yes

Questions 6 and 7 follow the traditional approach of diagnosing by symptoms.

From the answers, MYCIN gives its first conclusion:

The CSF cultures will be considered to be associated with MENINGITIS.

The following questions identify the particular organism causing the infection.

 

8.      Is the meningitis a hospital-acquired infection?

Yes

9.      Is SALLY's clinical history consistent with EPIGLOTTITIS?

No


10.  Is Sally's clinical history consistent with OTITIS-MEDIA?

No

11.  Has the patient ever been to an area that is endemic for coccidiomycoses?

Unknown

12.  Is Sally a burn patient?

Yes (Why)

 

The type of infection may be bacterial.

How?

The rules that conclude the infection may be bacterial are listed.

The final phase is the prescription of antibiotics.

 

13.  Does SALLY have clinically significant allergic reaction to any antimicrobial agent?

No

14.  Is Sally pregnant or breast-feeding?

No

15.  Do you have reason to suspect that SALLY may have impaired renal functions?

No

16.  SALLY's weight in kilograms?

51.4

My preferred therapy recommendation is as follows:

Give the following combination:

  1.      ETHAMBUTAL

Dose: 1.28g (13.0 100mg tablets) q24h PO for 60 days

then 770 mg (7.5 100 mg tablets) q24h PO.

Comments: periodic vision screening tests are recommended for

optic neuritis.

2.      INH

Dose: 513 mg (5.0 100mg-tablets) q24h PO

3.      RIFAMPIN

Dose: 600 mg PO q24h

Comments: Administer dose on empty stomach.

HOW GOOD AN EXPERT IS MYCIN

In one complex evaluation, 8 independent evaluators compared MYCIN with 9 human diagnosticians on 10 difficult cases of meningitis.

The task used was the selection of drugs for cases of meningitis before the causative agents had been identified.

Two phases of the evaluation:

1.     MYCIN and 9 human experts evaluate 10 cases

2.     Each of them prescribes medications

Two evaluative criteria were used to see whether the prescriptions:

1.    Would be effective against the actual bacteria after it was finally identified.

2.    Adequately covered for other possible bacteria while avoiding over prescribing.

Result:


Criterion 1: MYCIN and 3 of the human experts consistently prescribed therapy that would have been effective for all 10 cases.

Criterion 2: MYCIN received higher ratings: 65% correct across all the cases, whereas the human experts scored 42.5% to 62.5%.

 MYCIN's strength is based on four factors:

1.    MYCIN's knowledge base is extremely detailed because it was acquired from the best human practitioners.

2.    MYCIN does not overlook anything or forget any details. It considers every possibility.

3.    MYCIN never jumps to conclusions or fails to ask for key pieces of information.

4.    MYCIN is maintained at a major medical center and is consequently completely current.

MYCIN represents 50 man-years of effort.


UNIT 8 MATCHING AND REASONING

8.1 Fuzzy Logic

8.1.1 What is Fuzziness?

8.1.2 Current Application of Fuzzy Logic

8.1.3 Overview of Fuzzy Logic

8.1.4 Fuzzy Sets

8.1.5 Hedges

8.1.6 Fuzzy Set Operations

8.1.7 Fuzzy Inference

8.2 Memory Organisation

8.3 Neural Networks and Parallel Computation

8.3.1 Neural Network Architectures

8.4 Genetic Algorithm

8.5 Matching

8.5.1 Variable Matching

8.1 FUZZY LOGIC

In our daily life we constantly encounter attributes which are not precise, yet the human brain processes such imprecise terms with ease. If a doctor asks a patient "how are you?", the patient may reply "almost OK". Here the medium of communication is not mathematics but something that is not well modeled in any formal way. The hedge "almost" is a vague term. Yet, interestingly, the doctor processes this information and takes some action, which could take a form like:

i) the doctor smiles;

ii) the doctor expresses satisfaction by some movement of the eyes or face;

iii) the doctor thinks of further treatment, if any.

Such information processing is done all the time by the world's biggest computer, the human brain. To make this processing faster than the human brain, we need to make use of a machine (computer). A machine has no intuition or intelligence, only circuits and devices; it can be fed only data that it can process. Thus it can be made artificially intelligent; it can be made an expert in some area, say medical diagnosis, chess, washing clothes, or robot movement. For this reason, a mathematical model of vague concepts, vague knowledge, imprecise data or ill-defined information is necessary. An important tool, probably one of the most important tools, for such information processing is fuzzy set theory. This unit discusses fuzzy sets and their basic operations. Applications of fuzzy sets in artificial intelligence, expert systems, etc., you will learn at a later stage.

8.1.1 What is FUZZINESS?

According to the OXFORD DICTIONARY, FUZZY means blurred, fluffy, frayed or indistinct.

Fuzziness is deterministic uncertainty. Fuzziness is concerned with the degree to which events occur rather than the likelihood of their occurrence (probability).


For example: the degree to which a person is young is a fuzzy event rather than a random event.

Suppose you have been in a desert for a week without a drink and you come upon two bottles, A and B, marked with the following information:

P(A belongs to the set of drinkable liquids) = 0.9

µ(B belongs to the fuzzy set of drinkable liquids) = 0.9

Which one would you choose?

Some unrealistic and realistic quotes:

Q: How was the weather yesterday in San Francisco?

A1: Oh! The temperature was -5.5 degrees centigrade

A2: Oh! It was really cold.

Experts rely on common sense to solve problems.

This type of knowledge is exposed when experts describe problems with vague terms.

Example of vague terms:

When it is really/quite hot ...

If a person is very tall he is suitable for ...

Only a very small person can enter that hole

I am quite young

Mr. Azizi drives his car moderately fast

How can we represent and reason with vague terms in a computer?

Use FUZZY LOGIC!!

8.1.2 Current Applications of Fuzzy Logic

Some examples of how fuzzy logic has been applied in reality:

Camera aiming for telecast of sporting events

Expert system for assessment of stock exchange activities

Efficient and stable control of car-engines

Cruise control for automobiles

Medicine technology: cancer diagnosis

Recognition of hand-written symbols with pocket computers

Automatic motor-control for vacuum cleaners

Back light control for camcorders

Single button control for washing machines

Flight aids for helicopters

Controlling of subway systems in order to improve driving comfort, precision halting and

power economy

Improved fuel-consumption for automobiles

Expert systems also utilize fuzzy logic, since the domain is often inherently fuzzy. Some examples:

decision support systems

financial planners

diagnosing systems for determining soybean pathology

a meteorological expert system in China for determining areas in which to establish rubber

tree orchards


8.1.3 OVERVIEW OF FUZZY LOGIC

Fuzzy logic studies the mathematical representation of fuzzy terms such as old, tall, heavy, etc.

These terms do not have a strict true/false truth value.

Instead, truth values are extended to all real numbers in the range 0 to 1.

These real numbers are used to represent the possibility that a given statement is true or false (Possibility Theory).

E.g. the possibility that a person 6 ft tall is really tall might be set to 0.9, signifying that it is very likely that the person is tall.

Zadeh (1965) extended this work and brought together a collection of valuable concepts for working with fuzzy terms, called Fuzzy Logic.

Definition of Fuzzy Logic

A branch of logic that uses degrees of membership in sets rather than strict true/false membership.

Linguistic Variables

Fuzzy terms are called linguistic variables (or fuzzy variables).

Definition of Linguistic Variable

A term used in our natural language to describe some concept that usually has vague or fuzzy values.

Example of Linguistic Variables With Typical Values

Linguistic Variable Typical Values

Temperature hot, cold

Height short, medium, tall

Weight light, heavy

Speed slow, creeping, fast

Fuzzy rules in expert systems:

R1: IF Speed is slow

THEN make the acceleration high

R2: IF Temperature is low

AND pressure is medium

THEN Make the speed very slow

IF the water is very hot

THEN add plenty of cold water

Fact: The water is moderately hot

Conclusion: Add a little cold water

The set of possible numerical values of a linguistic variable is called its UNIVERSE OF DISCOURSE.

Example:


The Universe of Discourse for the linguistic variable speed in R1 is in the range

[0,100mph].

Thus, the phrase "speed is slow" occupies a section of the variable's Universe of Discourse: it is a fuzzy set (slow).

8.1.4 Fuzzy Sets

Traditional set theory views the world as black and white.

For example, the set of young people, i.e. children.

A person is either a member or a non-member. A member is given the value 1 and a non-member 0; this is called a crisp set.

Fuzzy logic, in contrast, interprets "young people" more reasonably, using a fuzzy set.

HOW?

By assigning membership values between 0 and 1.

Example: consider young people (age <= 10).

If a person's age is 5, assign membership value 0.9;

if 13, a value of 0.1.

Age = linguistic variable

young = one of its fuzzy sets

Other fuzzy sets: old and middle-aged.

Definition: Fuzzy Sets

Let X be the universe of discourse, with elements of X denoted as x. A fuzzy set A is characterized by a membership function mA(x) that associates each element x with its degree of membership in A.

Probability theory relies on assigning probabilities to a given event, whereas fuzzy logic relies on assigning membership values to a given element x using the membership function:

mA(x): X → [0,1]

This value represents the degree (possibility) to which element x belongs to fuzzy set A:

mA(x) = Degree(x ∈ A)

Membership values are bounded by:

0 <= mA(x) <= 1

FORMING FUZZY SETS

How do we represent a fuzzy set in a computer?

We need to define its membership function.

One approach is:

Poll a group of people, asking them about the fuzzy term that we want to represent.

For example: The term tall person.

What height is considered tall for a given person?

We average out the results and use the resulting function to assign a membership value to a given individual's height.

Can use the same method for other height description such as short or medium.

Multiple fuzzy sets on the same universe of discourse are referred to as fuzzy subsets.

Thus, membership value of a given object will be assigned to each set. (refer to fig. 2)


An individual with height 5.5 is a medium person with membership value 1.

At the same time, he is a member of the short and tall sets with membership value 0.25.

A single object can be a partial member of multiple sets.

FUZZY SET REPRESENTATION

How do we represent a fuzzy set formally?

Assume we have a universe of discourse X and a fuzzy set A defined on it.

X = {x1,x2,x3,x4,x5...xn}

Fuzzy set A defines the membership function mA(x) that maps elements xi of X to degree of

membership in [0,1].

A = {a1,a2,a3...an}

where

ai = m A(xi)

For a clearer representation, we include the symbol "/", which associates membership value ai with xi:

A = {a1/x1,a2/x2....an/xn}

Consider Fuzzy set of tall, medium and short people:

TALL = {0/5, 0.25/5.5, 0.7/6, 1/6.5, 1/7}

MEDIUM = {0/4.5, 0.5/5, 1/5.5, 0.5/6, 0/6.5}

SHORT = { }
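The ai/xi notation maps naturally onto a dictionary from domain elements to membership values (an illustrative Python sketch of the TALL and MEDIUM sets above):

```python
# Fuzzy sets over the universe of discourse of heights (in feet),
# written as {element: membership value}.
TALL   = {5: 0.0, 5.5: 0.25, 6: 0.7, 6.5: 1.0, 7: 1.0}
MEDIUM = {4.5: 0.0, 5: 0.5, 5.5: 1.0, 6: 0.5, 6.5: 0.0}

# Degree to which a 6-ft person is tall:
print(TALL[6])  # 0.7
```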

8.1.5 HEDGES

We have learned how to capture and represent vague linguistic terms using fuzzy sets.

In normal conversation, we add additional vagueness by using adverbs such as very, slightly, or somewhat.

What is an adverb? A word that modifies a verb, an adjective, another adverb, or a whole sentence.

Example: Adverb modifying an adjective.

The person is very tall

How do we represent this new fuzzy set?

We use a technique called HEDGES.

A hedge mathematically modifies an existing fuzzy set to account for the added adverb.

Concentration (very)

Concentration further reduces the membership values of those elements that have smaller membership values:

m CON(A)(x) = (m A(x))^2

Given fuzzy set of tall persons, can create a new set of very tall person.

Example:

Tall = {0/5, 0.25/5.5, 0.76/6, 1/6.5, 1/7}

Very tall = { /5, /5.5, /6, /6.5, /7}

Dilation (somewhat)

Page 93: Syllabus

Dilation dilates a fuzzy set by increasing the membership values, affecting elements with small membership values more than elements with high membership values:

m DIL(A)(x) = (m A(x))^0.5

Example:

Tall = {0/5, 0.25/5.5, 0.76/6, 1/6.5, 1/7}

somewhat tall = { /5, /5.5, /6, /6.5, /7}

Intensification (indeed)

Intensification intensifies the meaning of a phrase by increasing membership values above 0.5 and decreasing those below 0.5:

m INT(A)(x) = 2(m A(x))^2 for 0 <= m A(x) <= 0.5

m INT(A)(x) = 1 - 2(1 - m A(x))^2 for 0.5 < m A(x) <= 1

Example:

short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7}

indeed short = { /5, /5.5, /6, /6.5, /7}

Power (very very)

Power is an extension of the concentration operation:

m POW(A)(x) = (m A(x))^n

Example:

Create fuzzy set of very very tall person with n=3

Tall = {0/5, 0.25/5.5, 0.76/6, 1/6.5, 1/7}

Very very tall = { /5, /5.5, /6, /6.5, /7}
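The four hedge operations can be applied elementwise to the dictionary representation of a fuzzy set (an illustrative Python sketch; the function names are our own):

```python
TALL = {5: 0.0, 5.5: 0.25, 6: 0.76, 6.5: 1.0, 7: 1.0}

def con(fs):          # concentration: "very"
    return {x: m ** 2 for x, m in fs.items()}

def dil(fs):          # dilation: "somewhat"
    return {x: m ** 0.5 for x, m in fs.items()}

def intensify(fs):    # intensification: "indeed"
    return {x: 2 * m ** 2 if m <= 0.5 else 1 - 2 * (1 - m) ** 2
            for x, m in fs.items()}

def power(fs, n):     # power: "very very" with n = 3
    return {x: m ** n for x, m in fs.items()}

very_tall = con(TALL)
print(very_tall[5.5])  # 0.0625
```

These functions fill in the blanks in the exercises above: for example, the "very tall" membership of 5.5 is 0.25^2 = 0.0625.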

8.1.6 FUZZY SET OPERATIONS

Intersection

In classical set theory, the intersection of two sets contains the elements common to both.

In fuzzy sets, an element may be partially in both sets:

m A∩B (x) = min(m A(x), m B(x)) for all x ∈ X

Example:

Tall = {0/5, 0.2/5.5, 0.5/6, 0.8/6.5, 1/7}

Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7}

m TALL∩SHORT =

Tall and short can mean ________

The result is highest in the middle and lowest at both ends.

Union

The union of two sets comprises those elements that belong to one or both sets:

m A∪B (x) = max(m A(x), m B(x)) for all x ∈ X

Example:

Tall = {0/5, 0.2/5.5, 0.5/6, 0.8/6.5, 1/7}

Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7}

m TALL∪SHORT =

The result attains its highest values at the limits and lowest in the middle.

Tall or short can mean ________

Complementation (not)

Find the complement ~A by using the following operation:

m ~A(x) = 1 - m A(x)

Short = {1/5, 0.8/5.5, 0.5/6, 0.2/6.5, 0/7}

Not short = { /5, /5.5, /6, /6.5, /7}
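The three set operations can be sketched over the same dictionary representation (an illustrative Python sketch using the tall and short sets above):

```python
TALL  = {5: 0.0, 5.5: 0.2, 6: 0.5, 6.5: 0.8, 7: 1.0}
SHORT = {5: 1.0, 5.5: 0.8, 6: 0.5, 6.5: 0.2, 7: 0.0}

def fuzzy_and(a, b):  # intersection: pointwise min
    return {x: min(a[x], b[x]) for x in a}

def fuzzy_or(a, b):   # union: pointwise max
    return {x: max(a[x], b[x]) for x in a}

def fuzzy_not(a):     # complement: 1 - membership
    return {x: 1 - m for x, m in a.items()}

print(fuzzy_and(TALL, SHORT))  # highest in the middle, lowest at the ends
print(fuzzy_or(TALL, SHORT))   # highest at the ends, lowest in the middle
```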

8.1.7 FUZZY INFERENCE

Fuzzy proposition: a statement that asserts a value for some linguistic variable, such as 'height is tall'.

Fuzzy rule: a rule that refers to one or more fuzzy variables in its conditions and a single fuzzy variable in its conclusion.

General form: IF X is A THEN Y is B

Specific form: IF height is tall THEN weight is heavy

Associations of two fuzzy sets are stored in a matrix M called a Fuzzy Associative Memory (FAM).

Rules are applied to fuzzy variables by a process called propagation (the inference process).

When a rule is applied, it looks at the degrees of membership in the condition part and calculates the degree of membership in the conclusion part.

The calculation depends upon the connectives: AND, OR or NOT.

A fuzzy Logic program can be viewed as a 3 stage process:

1. FuzzificationThe crisp values input are assigned to the appropriate input fuzzy variables

and converted to the degree of membership.

2. Propagation (Inference)

Fuzzy rules are applied to the fuzzy variables: degrees of membership computed in

the condition part are propagated to the fuzzy variables in the conclusion part (max-min

and max-product inference).

3. De-fuzzification

The resultant degrees of membership for the fuzzy variables are converted back into crisp

values.
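The three stages can be sketched as follows; the membership functions, numeric ranges, and the crisp prototype value for the conclusion are assumptions chosen for illustration, not taken from the text:

```python
# Sketch of fuzzification -> propagation -> de-fuzzification for a single
# rule: IF distance is small AND speed is high THEN brake hard.

def mu_small_distance(d):
    # Fuzzification: crisp distance (metres, assumed range) -> degree.
    return max(0.0, min(1.0, (50 - d) / 50))

def mu_high_speed(s):
    # Fuzzification: crisp speed (km/h, assumed range) -> degree.
    return max(0.0, min(1.0, (s - 40) / 60))

def infer_brake(distance, speed):
    # Propagation: AND is taken as min (max-min inference); the rule
    # strength then scales an assumed crisp prototype value for
    # "brake hard" (a simple, singleton-style de-fuzzification).
    strength = min(mu_small_distance(distance), mu_high_speed(speed))
    hard_brake_value = 100.0
    return strength * hard_brake_value

print(infer_brake(distance=10, speed=100))  # close and fast -> brake hard
print(infer_brake(distance=45, speed=50))   # far and slow  -> brake gently
```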

Example:

Assume two cars traveling at the same speed along a straight road. The distance between the cars

becomes one of the factors for the second driver to brake his car to avoid collision. The

following rule might be used by the second driver:

IF the distance between cars is very small

AND the speed of car is high

THEN brake very hard for speed reduction.

IF distance between cars is slightly long

AND the speed of car is not too low


THEN brake moderately hard to reduce speed

The rules in the above example feature:

linguistic variables: distance, speed, braking force

fuzzy subsets: small, long, high, low, hard

connectives: AND

hedges: very, slightly, not too, moderately

Diagram:

Fuzzy logic controllers are built up from four main components:

Fuzzifier

Fuzzy rule base (FAM)

Fuzzy inference engine

Defuzzifier

8.2 Memory Organisation

Memory is central to common sense behaviour. Human memory contains an immense

amount of knowledge about the world. So far, we have only discussed a tiny fraction of that

knowledge. Memory is also the basis for learning. A system that cannot learn cannot, in

practice, possess common sense.

A complete theory of human memory has not yet been discovered, but we do have

a number of facts at our disposal. Some of these facts come from neurobiology while others

are psychological in nature. Computer models of neural memory are interesting, but they do

not serve as theories about how memory is used in everyday, commonsense reasoning.

Psychology and AI both seek to address these issues.

Psychological studies suggest several distinctions in human memory. One

distinction is between short-term memory (STM) and long-term memory (LTM). We know

that a person can only hold a few items at a time in STM, but the capacity of LTM is very

large. LTM storage is also fairly permanent. The production system is one computer model of

the STM-LTM structure. Perceptual information is stored directly in STM, also called working

memory. Production rules, stored in LTM, match themselves against items in STM.

Productions fire, modify STM, and repeat.

LTM is often divided into episodic memory and semantic memory. Episodic memory

contains information about past personal experiences, usually stored from an

autobiographical point of view. For example, a college graduation, a wedding, or a concert

may all form episodic memories. Semantic memory, on the other hand, contains facts like

“Birds fly”. These facts are no longer connected with personal experiences. Semantic

memory is especially useful in programs that understand natural language.

Models for episodic memory grew out of research on scripts. Recall that a script is a

stereotyped sequence of events, such as those involved in going to the dentist. One obvious

question to ask is: How are scripts acquired? Surely they are acquired through personal

experience. But a particular experience often includes details that we do not want to include

in a script. For example, just because we once saw The New Yorker magazine in a dentist’s

waiting room, that doesn’t mean that The New Yorker should be part of the dentist script.

The problem is that if a script contains too many details, it will not be matched and retrieved

correctly when new, similar situations arise.


In general, it is difficult to know which script to retrieve. One reason for this is

that scripts are too monolithic. It is hard to do any kind of partial matching. It is also hard to

modify a script. More recent work reduces scripts to individual scenes, which can be shared

across multiple structures. Stereotypical sequences of scenes are strung together into

memory organization packets (MOPs). Usually, three distinct MOPs encode knowledge about an

event sequence. One MOP represents the physical sequence of events, such as entering a

dentist’s office, sitting in the waiting room, reading a magazine, sitting in the dentist’s chair,

etc. Another MOP represents the set of social events that take place. These are events that

involve personal interactions. A third MOP revolves around the goals of the person in the

particular episode. Any of these MOPs may be important for understanding new situations.

MOPs organize scenes, and they themselves are further organized into higher-level

MOPs. For example, the MOP for visiting the office of a professional may contain a sequence

of abstract general scenes, such as talking to an assistant, waiting, and meeting. High-level

MOPs contain no actual memories, so where do they come from?

New MOPs are created upon expectation failures. When we use scripts for story

understanding, we are able to locate interesting parts of the story by noticing places where

events do not conform to the script’s expectations. In a MOP-based system, if an expectation

is repeatedly violated, then the MOP is generalized or split. Eventually, episodic memories

can fade away, leaving only a set of generalized MOPs. These MOPs look something like

scripts, except that they share scenes with one another.

Let’s look at an example. The first time you go to the dentist, you must determine

how things work from scratch since you have no prior experience. In doing so, you store

detailed accounts of each scene and string them together into a MOP. The next time you

visit the dentist, that MOP provides certain expectations, which are mostly met. You are able

to deal with the situation easily and make inferences that you could not make the first time.

If any expectation fails, this provides grounds for modifying the MOP. Now, suppose you

later visit a doctor’s office. As you begin to store episodic scenes, you notice similarities

between these scenes and scenes from the dentist MOP. Such similarities provide a basis for

using the dentist MOP to generate expectations. Multiple trips to the doctor will result in a

doctor MOP that is slightly different from the dentist MOP. Later experiences with visiting

lawyers and government officials will result in other MOPs. Ultimately, the structures shared

by all of these MOPs will cause a generalized MOP to appear. Whenever you visit a

professional’s office in the future, you can use the generalized MOP to provide expectations.

With MOPs, memory is both a constructive and reconstructive process. It is

constructive because new experiences create new memory structures. It is reconstructive

because even if the details of a particular episode are lost, the MOP provides information

about what was likely to have happened. The ability to do this kind of reconstruction is an

important feature of human memory.

There are several MOP-based computer programs. CYRUS is a program that contains

episodes taken from the life of a particular individual. CYRUS can answer questions that

requires a significant amount of memory reconstruction. The IPP program accepts stories about

terrorist attacks and stores them in an episodic memory. As it notices similarities in the

stories, it creates general memory structures. These structures improve its ability to


understand. MOPTRANS uses a MOP-based memory to understand sentences in one

language and translate them into another.

8.3 Neural Networks and Parallel Computation

The human brain is made up of a web of billions of cells called neurons, and understanding

its complexities is seen as one of the last frontiers in scientific research. It is the aim of AI

researchers who prefer this bottom-up approach to construct electronic circuits that act as

neurons do in the human brain. Although much of the working of the brain remains

unknown, the complex network of neurons is what gives humans intelligent characteristics.

By itself, a neuron is not intelligent, but when grouped together, neurons are able to pass

electrical signals through networks.

The neuron "firing", passing a signal to the next in the chain.

Research has shown that a signal received by a neuron travels through the dendrite region,

and down the axon. Separating nerve cells is a gap called the synapse. In order for the

signal to be transferred to the next neuron, the signal must be converted from electrical to

chemical energy. The signal can then be received by the next neuron and processed.

After completing medical school at Yale, Warren McCulloch, along with the mathematician

Walter Pitts, proposed a hypothesis to explain the fundamentals of how neural networks

made the brain work. Based on experiments with neurons, McCulloch and Pitts showed that

neurons might be considered devices for processing binary numbers. An important element of

mathematical logic, binary numbers (represented as 1s and 0s, or true and false) were also


the basis of the electronic computer. This link is the basis of computer-simulated neural

networks, also known as parallel computing.

A century earlier the true / false nature of binary numbers was theorized in 1854 by George

Boole in his postulates concerning the Laws of Thought. Boole's principles make up what is

known as Boolean algebra, the collection of logic concerning AND, OR, NOT operands. For

example, according to the Laws of Thought (for this example, consider all

apples red):

Apples are red-- is True

Apples are red AND oranges are purple-- is False

Apples are red OR oranges are purple-- is True

Apples are red AND oranges are NOT purple-- is also True

Boole also assumed that the human mind works according to these laws: it performs logical

operations that can be reasoned about. Ninety years later, Claude Shannon applied Boole's

principles in circuits, the blueprint for electronic computers. Boole's contribution to the

future of computing and Artificial Intelligence was immeasurable, and his logic is the basis of

neural networks.

McCulloch and Pitts, using Boole's principles, wrote a paper on neural network theory. The

thesis dealt with how the networks of connected neurons could perform logical operations. It

also stated that, on the level of a single neuron, the release or failure to release an impulse

was the basis by which the brain makes true / false decisions. Using the idea of feedback

theory, they described the loop which existed between the senses ---> brain ---> muscles,

and likewise concluded that memory could be defined as the signals in a closed loop of

neurons. Although we now know that logic in the brain occurs at a level higher than

McCulloch and Pitts theorized, their contributions were important to AI because they showed

how the firing of signals between connected neurons could cause the brain to make

decisions. The McCulloch and Pitts theory is the basis of artificial neural network theory.

Using this theory, McCulloch and Pitts then designed electronic replicas of neural networks,

to show how electronic networks could generate logical processes. They also stated that

neural networks may, in the future, be able to learn, and recognize patterns. The results of

their research and two of Wiener's books served to increase enthusiasm, and laboratories of

computer simulated neurons were set up across the country.

Two major factors have inhibited the development of full-scale neural networks. The first is

the expense of constructing a machine to simulate neurons: it was costly even to

construct neural networks with the number of neurons in an ant. Although the cost of

components has decreased, the computer would have to grow thousands of times larger to

be on the scale of the human brain. The second factor is current computer architecture. The

standard von Neumann computer, the architecture of nearly all computers, lacks an adequate

number of pathways between components. Researchers are now developing alternate

architectures for use with neural networks.

Even with these inhibiting factors, artificial neural networks have presented some impressive

results. Frank Rosenblatt, experimenting with computer simulated networks, was able to

create a machine that could mimic the human thinking process, and recognize letters. But,

with new top-down methods becoming popular, parallel computing was put on hold. Now

neural networks are making a return, and some researchers believe that with new computer


architectures, parallel computing and the bottom-up theory will be a driving factor in

creating artificial intelligence.

8.3.1 Neural Network Architectures

Neural networks are large networks of simple processing elements or nodes which process

information dynamically in response to external inputs. The nodes are simplified models of

neurons. The knowledge in a neural network is distributed throughout the network in the

form of inter-node connections and weighted links which form the inputs to the nodes. The

link weights serve to enhance or inhibit the input stimuli values which are then added

together at the nodes. If the sum of all the inputs to a node exceeds some threshold value T,

the node executes and produces an output which is passed on to other nodes or is used to

produce some output response. In the simplest case, no output is produced if the total input

is less than T. In more complex models, the output will depend on a nonlinear activation

function.

Neural networks were originally inspired as being models of the human nervous system.

They are greatly simplified models, to be sure (neurons are known to be fairly complex

processors). Even so, they have been shown to exhibit many “intelligent” abilities, such as

learning, generalization, and abstraction.

A single node is illustrated in the following figure. The inputs to the node are the values x1, x2,

…, xn, which typically take on values of –1, 0, 1, or real values within the range (–1, 1). The

weights w1, w2, …, wn correspond to the synaptic strengths of a neuron. They serve to

increase or decrease the effects of the corresponding xi input values. The sum of the

products xi * wi, i = 1, 2, …, n, serves as the total combined input to the node. If this sum is

large enough to exceed the threshold amount T,

(Figure: a single node with inputs x1, x2, x3, weights w1, w2, w3, and output y = Σ xi * wi.)

the node fires, and produces an output y, an activation function value placed on the node’s

output links. This output may then be the input to other nodes or the final output response

from the networks.
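The node's behaviour described above can be sketched as a simple threshold unit (the input, weight, and threshold values are illustrative):

```python
# A single threshold node: fire (output 1) if sum(x_i * w_i) exceeds T.
def node_output(xs, ws, T):
    total = sum(x * w for x, w in zip(xs, ws))  # total combined input
    return 1 if total > T else 0                # simple step activation

# Example with three inputs and weights (values chosen for illustration):
print(node_output([1, 0, 1], [0.5, 0.9, 0.3], T=0.6))  # 0.8 > 0.6 -> fires: 1
print(node_output([1, 0, 1], [0.5, 0.9, 0.3], T=0.9))  # 0.8 < 0.9 -> no fire: 0
```

In the simplest case the output is binary, as here; the more complex models mentioned in the text would replace the step with a nonlinear activation function.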

The following figure illustrates three layers of a number of interconnected nodes. The first layer

serves as the input layer, receiving inputs from some set of stimuli. The second layer (called

the hidden layer) receives input from the first layer and produces a pattern of inputs to the

third layer, the output layer. The pattern of outputs from the final layer is the network’s

response to the input stimuli patterns. Input links to layer j (j = 1, 2, 3) have weight wij for i

= 1, 2, …, n.

General multilayer networks having n nodes (number of rows) in each of m layers (number

of columns of nodes) will have weights represented as an n * m matrix W. Using this

representation, nodes having no interconnecting links will have a weight value of zero.


Networks consisting of more than three layers would, of course, be correspondingly more

complex than the network depicted in the following figure.

A neural network can be thought of as a black box that transforms the input vector x to the

output vector y where the transformation performed is the result of the pattern of

connections and weights, that is, according to the values of the weight matrix W.

Consider the vector product x · w = Σ xi wi

(Figure: a three-layer network with inputs x1, x2, x3 and weights wij linking layer 1, layer 2, and layer 3.)

There is a geometric interpretation for this product. It is equivalent to projecting one vector

onto the other vector in n-dimensional space. This notion is depicted in the following figure for

the two-dimensional case.

The magnitude of the resultant vector is given by

x · w = |x||w| cos θ

where |x| denotes the norm, or length, of the vector x. Note that this product is a maximum

when both vectors point in the same direction, that is, when θ = 0. The product is a

minimum when both point in opposite directions, that is, when θ = 180 degrees. This illustrates

how the vectors in the weight matrix W influence the inputs to the nodes in a neural

network.
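A small numeric check of the relation x · w = |x||w| cos θ, with vectors chosen for illustration:

```python
import math

# The dot product equals |x||w| cos(theta): largest when the vectors
# point the same way, smallest when they point in opposite directions.
def dot(x, w):
    return sum(a * b for a, b in zip(x, w))

def angle_between(x, w):
    norm = lambda v: math.sqrt(sum(a * a for a in v))
    return math.degrees(math.acos(dot(x, w) / (norm(x) * norm(w))))

print(dot([1, 0], [1, 0]), angle_between([1, 0], [1, 0]))    # aligned: theta = 0
print(dot([1, 0], [-1, 0]), angle_between([1, 0], [-1, 0]))  # opposed: theta = 180
```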

8.4 Genetic Algorithm

Genetic algorithm learning methods are based on models of natural adaptation and evolution.

These learning systems improve their performance through processes which model

population genetics and survival of the fittest.

In the field of genetics, a population is subjected to an environment which places demands

on the members. The members which adapt well are selected for mating and reproduction.

The offspring of these better performers inherit genetic traits from both their parents.

Members of this second generation of offspring which also adapt well are then selected for

mating and reproduction and the evolutionary cycle continues. Poor performers die off

without leaving offspring. Good performers produce good offspring and they, in turn, perform

well. After some number of generations, the resultant population will have adapted optimally

or at least very well to the environment.

Genetic algorithm systems start with a fixed size population of data structures which are

used to perform some given tasks. After requiring the structures to execute the specified

tasks some number of times, the structures are rated on their performance, and a new

generation of data structures is then created. The new generation is created by mating the

higher performing structures to produce offspring. These offspring and their parents are then


retained for the next generation while the poorer performing structures are discarded. The

basic cycle is illustrated in following figure.

(Figure: the basic genetic algorithm cycle)

1. Generate the initial population S0.

2. Structures perform the given tasks repeatedly.

3. Performance utility values are assigned to the knowledge structures.

4. A new population is generated from the best performing structures.

5. The process is repeated until the desired performance is reached.

Mutations are also performed on the best performing structures to ensure that the full space

of possible structures is reachable. This process is repeated for a number of generations

until the resultant population consists of only the highest performing structures.

Data structures which make up the population can represent rules or any other suitable type

of knowledge. Typical structures are fixed-length binary strings such as the eight-bit string

11010001. An initial population of these eight-bit strings would be generated randomly or

with the use of heuristics at time zero. These strings, which might be simple condition and

action rules, would then be assigned some tasks to perform (like predicting the weather

based on certain physical and geographic conditions or diagnosing a fault in a piece of

equipment).

After multiple attempts at executing the tasks, each of the participating structures would be

rated and tagged with a utility value u commensurate with its performance. The next

population would then be generated using the higher performing structures as parents and

the process would be repeated with the newly produced generation. After many generations

the remaining population structures should perform the desired tasks well.

Mating between two strings is accomplished with the crossover operation, which randomly

selects a bit position in the eight-bit string and concatenates the head of one parent to the

tail of the second parent to produce the offspring. Suppose the two parents are designated

as xxxxxxxx and yyyyyyyy respectively, and suppose the third bit position has been selected

as the crossover point (at the position of the colon in the structure xxx:xxxxx). After the

crossover operation is applied, two offspring are then generated, namely xxxyyyyy and

yyyxxxxx. Such offspring and their parents are then used to make up the next generation of

structures.


A second genetic operation often used is called inversion. Inversion is a transformation

applied to a single string. A bit position is selected at random, and when applied to a

structure, the inversion operation concatenates the tail of the string to the head of the same

string. Thus, if the sixth position were selected (x1x2x3x4x5x6x7x8), the inverted string

would be x7x8x1x2x3x4x5x6.

A third operator, mutation, is used to ensure that all locations of the rule space are

reachable, that is, that every potential rule in the rule space is available for evaluation. This ensures

that the selection process does not get caught in a local minimum. For example, it may

happen that use of the crossover and inversion operators will only produce a set of

structures that are better than all local neighbors but not only produce a set of structures

that are better than all local neighbors but not optimal in a global sense. This can happen

since crossover and inversion may not be able to produce some undiscovered structures.

The mutation operator can overcome this by simply selecting any bit position in a string at

random and changing it. This operator is typically used only infrequently to prevent random

wandering in the search space.
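The three genetic operators described above can be sketched as string manipulations (the parent strings and the crossover and inversion points follow the examples in the text):

```python
import random

# The three genetic operators on eight-bit strings.
def crossover(p1, p2, point):
    """Head of one parent joined to the tail of the other, and vice versa."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def inversion(s, point):
    """Tail of the string concatenated to the head of the same string."""
    return s[point:] + s[:point]

def mutation(s):
    """Flip one randomly chosen bit, keeping the whole space reachable."""
    i = random.randrange(len(s))
    flipped = '1' if s[i] == '0' else '0'
    return s[:i] + flipped + s[i + 1:]

print(crossover('xxxxxxxx', 'yyyyyyyy', 3))  # ('xxxyyyyy', 'yyyxxxxx')
print(inversion('12345678', 6))              # '78123456'
print(mutation('11010001'))                  # one bit flipped at random
```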

8.5 Matching

Matching is the process of comparing two or more structures to discover their likenesses or

differences. The structures may represent a wide range of objects including physical entities,

words or phrases in some language, complete classes of things, general concepts, relations

between complex entities, and the like. The representations will be given in one or more of

the formalisms like FOPL, networks, or some other scheme, and matching will involve

comparing the component parts of such structures.

Matching is used in a variety of programs for different reasons. It may serve to control the

sequence of operations, to identify or classify objects, to determine the best of a number of

different alternatives, or to retrieve items from a database. It is an essential operation in

such diverse programs as speech recognition, natural language understanding, vision,

learning, automated reasoning, planning, automatic programming, and expert systems, as

well as many others.

In its simplest form, matching is just the process of comparing two structures or patterns for

equality. The match fails if the patterns differ in any aspect. For example, a match between

the two character strings acdebfba and acdebeba fails on an exact match since the strings

differ in the sixth character position.

In more complex cases the matching process may permit transformations in the patterns in

order to achieve an equality match. The transformation may be a simple change of some

variables to constants, or it may amount to ignoring some components during the match

operation. For example, a pattern matching variable such as ?x may be used to permit

successful matching between the two patterns (a b (c d) e) and (a b ?x e) by binding ?x to (c

d). Such matchings are usually restricted in some way, however, as is the case with the

unification of two clauses where only consistent bindings are permitted. Thus, two patterns

such as

(a b (c d) e f) and (a b ?x e ?x )


would not match since ?x could not be bound to two different constants.
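The consistent-binding behaviour described above can be sketched as a simple top-level matcher (a minimal illustration; a full unifier would also handle nested patterns and variables on both sides):

```python
# Top-level pattern matching with open variables such as ?x, enforcing
# consistent bindings: the same variable must match the same value
# everywhere it appears.
def match(pattern, data, bindings=None):
    bindings = dict(bindings or {})
    if len(pattern) != len(data):
        return None
    for p, d in zip(pattern, data):
        if isinstance(p, str) and p.startswith('?'):
            if p in bindings and bindings[p] != d:
                return None          # inconsistent rebinding -> fail
            bindings[p] = d          # record the new binding
        elif p != d:
            return None              # constants must match exactly
    return bindings

print(match(['a', 'b', '?x', 'e'], ['a', 'b', ['c', 'd'], 'e']))
# binds ?x to (c d)
print(match(['a', 'b', '?x', 'e', '?x'], ['a', 'b', ['c', 'd'], 'e', 'f']))
# None: ?x cannot be bound to two different constants
```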

In some extreme cases, a complete change of representational form may be required in

either one or both structures before a match can be attempted. This will be the case, for

example, when one visual object is represented as a vector of pixel gray levels and objects

to be matched are represented as descriptions in predicate logic or some other high level

statements. A direct comparison is impossible unless one form has been transformed into

the other.

In subsequent chapters we will see examples of many problems where exact matches are

inappropriate, and some form of partial matching is more meaningful. Typically in such

cases, one is interested in finding a best match between pairs of structures. This will be the

case in object classification problems, for example, when object descriptions are subject to

corruption by noise or distortion. In such cases, a measure of the degree of match may also

be required.

Other types of partial matching may require finding a match between certain key elements

while ignoring all other elements in the pattern. For example, a human language input unit

should be flexible enough to recognize any of the following three statements as expressing a

choice of preference for the low-calorie food item.

I prefer the low-calorie choice.

I want the low-calorie item.

The low-calorie one please.

Recognition of the intended request can be achieved by matching against key words in a

template containing “low-calorie” and ignoring other words except, perhaps, negative

modifiers.

Finally, some problems may call for a form of fuzzy matching, where an entity’s

degree of membership in one or more classes is appropriate. Some classification problems

will apply here if the boundaries between the classes are not distinct, and an object may

belong to more than one class.

The following figure illustrates the general match process, where an input description is being

compared with other descriptions. As stressed earlier, the term object is used here in a

general sense. It does not necessarily imply physical objects. All objects will be represented

in some formalism such as a vector of attribute values, propositional logic or FOPL

statements, rules, frame-like structures, or other scheme. Transformations, if required, may

involve simple instantiations or unifications among clauses or more complex operations such

as transforming a two-dimensional scene to a description in some formal language. Once the

descriptions have been transformed into the same schema, the matching process is

performed element-by-element using a relational or other test (like equality or ranking). The

test results may then be combined in some way to provide an overall measure of similarity.

The choice of measure will depend on the match criteria and representation scheme

employed.


(Figure: the general match process. Each input object passes through a representation step and any needed transformations; a comparator then performs the match, guided by a metric, and produces the result.)

The output of the matcher is a description of the match. It may be a simple yes or no

response or a list of variable bindings, or as complicated as a detailed annotation of the

similarities and differences between the matched objects. To summarize then, matching

may be exact, used with or without pattern variables, partial, or fuzzy, and any matching

algorithm will be based on such factors as

Choice of representation scheme for the objects being matched,

Criteria for matching (exact, partial, fuzzy, and so on),

Choice of measure required to perform the match in accordance with the chosen criteria,

and

Type of match description required for output.

8.5.1 Variables Matching

All of the structures we shall consider here are constructed from basic atomic elements:

numbers and characters. Character string elements may represent either constants or

variables. If variables, they may be classified by either the type of match permitted or by

their value domains.

We can classify match variables by the number of items that can replace them (one or more

than one). An open variable can be replaced by a single item, while a segment variable can

be replaced by zero or more items. Open variables are labeled with a preceding question

mark (?x, ?y, ?class). They may match or assume the value of any single string element or

word, but they are sometimes subject to consistency constraints. For example, to be

consistent, the variable ?x can be bound only to the same top level element in any single

structure. Thus (a ?x d ?x e) may match (a b d b e), but not (a b d a e). Segment variable

types will be preceded with an asterisk. Thus, the pattern (*x d (e f) *y) will match the patterns

(a (b c) d (e f) g h), (d (e f) (g))

or other similar patterns. Segment variables may also be subject to consistency constraints

similar to open variables.

Variables may also be classified by their value domains. This distinction will be useful when

we consider similarity measures below. The variables may be either quantitative, having a

meaningful origin or zero point and a meaningful interval difference between two values, or

Page 105: Syllabus

they may be qualitative, in which case there is no origin and no meaningful interval value difference.

These two types may be further subdivided as follows.

Nominal variables. Qualitative variables whose values or states have no order or rank. It is

only possible to distinguish equality or inequality between two such objects. Of course each

state can be given a numerical code. For example, “marital status” has states of married,

single, divorced, or widowed. These states have no numerical significance and no

particular order nor rank. The states could be assigned numerical codes however, such as

married = 1, single = 2, divorced = 3, and widowed = 4.

Ordinal variables. Qualitative variables whose states can be arranged in a rank order, but

the difference between two distinct values has no significance. Ordinal variables may also be

assigned numerical values. For example, the states very tall, tall, medium, short, and very

short can be arranged in order from tallest to shortest and be assigned an arbitrary scale of

5 to 1. However, the difference between successive values does not necessarily have any

quantitative meaning.

Binary variables. Qualitative discrete variables which may assume only one of two values,

such as 0 or 1, good or bad, yes or no, high or low.

Interval variables. Quantitative variables which take on numeric values and for which

equal differences between values have the same significance. For example, real numbers

corresponding to temperature or integers corresponding to an amount of money are

considered as interval variables.