
©Copyright 2017 by Chandra S. Amaravadi. All rights reserved. Electronic reproduction or distribution is strictly prohibited.

IS 524 CORPORATE INFORMATION SYSTEMS

SUPPLEMENTARY NOTES

Chandra S. Amaravadi April 26, 2017

Part III

Management Support Systems I Management Support Systems II Artificial Intelligence Knowledge Management


MANAGEMENT SUPPORT SYSTEMS-I

Introduction

Business complexity has been increasing since the 1970s. There are now more products and payment options than ever before. Competition aggravates the situation by forcing organizations to change strategies more frequently. Greater globalization means that more co-ordination is necessary. All of these imply a greater degree of reliance on management support systems. The evolution of information systems has been discussed in a previous chapter. DSS evolved from reporting systems, which grew out of transaction processing systems. DSS allow data to be analyzed using models. The DSS concept led to GIS, GDSS and BI systems. GIS overlay statistical data on a geography, the typical example being automobile sales shown on a map of the country (BI systems provide a similar capability). BI systems are based on warehouses, which evolved out of DBMS technologies. GDSS evolved into collaborative systems. Thus DSS, GDSS, GIS & BI systems are all grouped under MSS.

Fig 1. MSS Evolution

Management Support Systems

Management support systems are those which support managerial activity. This raises the question: what is managerial activity? Decision making is obviously one important managerial activity, and IT support for managerial decisions can have its benefits. Hence MSS is an important area of Information Systems. MSS have some characteristics – they tend to be interactive and customizable. Interactive means that the decision maker is actively involved with the system and can change parameters, models and data. Contemporary businesses cannot operate without the use of MSS. Some MSS are model based, in the sense that there are algorithms or mathematical equations for solving the problems.

Types of Decisions:

MSS are intended to support all managerial decisions. The types of decisions faced in organizations range from those that are structured to those that are unstructured. Generally speaking, structured decisions are those where the assumptions are known, the problem is known and the solution method is known. In unstructured decisions, some or all of these are not known. For example, whether to approve a loan, whether to promote an employee or how much raw material to order are structured decisions. A company could be experiencing a decline in sales and management may not be able to identify the reason – whether it is due to product quality, pricing or consumer whims. These unknowns lead to an unstructured decision situation and become hypotheses to be tested. Unstructured decisions involve more variables and are therefore more complex than structured decisions. They also have a greater impact on the organization.


Decision Making Styles:

Just as there are structured and unstructured decisions, there are also different decision making styles. Decision making styles vary on a continuum from analytical to intuitive. Analytical decision makers follow a rational decision making style. They analyze a problem, develop alternatives and identify the alternative that maximizes the decision maker's utility. Intuitive decision makers make use of no known models; rather, they gather information and opinions by talking to colleagues, going around company facilities and meeting vendors, and then abruptly produce a decision. If they are asked to explain their reasoning, they cannot.

Fig 2. Decision Making Styles

Simon’s phases of decision making (IDC):

Herbert Simon postulated that rational decision makers move through a systematic process while making decisions. This is characterized by IDC: 'Intelligence', 'Design', 'Choice'. In the intelligence phase, the decision maker collects information on the problem. During the design phase, the decision maker develops viable alternatives for the problem. Finally, during the choice phase, the decision maker selects the choice that maximizes his or her decision making utility. It could be reducing costs, increasing profits, market share etc. To give an example, let us say a manager wants to buy a plane for his/her company. The first step is to gather information about the problem – how many people will need to travel, what range, domestic/international travel, budget etc. The next step is to identify alternatives. Alternatives could involve outright purchase, lease or fractional ownership. The last step is to identify which alternative maximizes the utility to the company. In this case the utility is to fulfill the needs at the lowest cost. A DSS allows decision makers to list and compare these alternatives and evaluate the decision.

Fig 3. Individual model of Decision Making/ Rational model

Organizational Models of Decision Making:

Just as there is an individual decision making model, there is also an organizational model, or the organization's way of making decisions. This is the overall decision making style that is prevalent in the organization. You are already familiar with the rational model. This means gathering information about the problem in a systematic way, identifying alternatives and choosing an alternative that maximizes the decision maker's utility. Then there is the bureaucratic model. You should already be familiar with this. In the bureaucratic model, decisions are made under constraints. For example, consider a department at a state university needing a faculty member, and let's say that a faculty member from the department goes to a conference to recruit candidates. If the faculty member found a good candidate, he/she cannot hire them on the spot. The search has to be advertised in national media to reach all candidates. These candidates need to be selected and ranked. Reasons have to be provided for the selection and ranking for the purpose of affirmative action. The selected candidates are then called for an interview. But unfortunately, top candidates also apply elsewhere and interview at other places (because they are rational). The candidates still need to be brought on campus, wined and dined and interviewed. Until all of the top three candidates are interviewed, none of them can be given an offer. This illustrates decision making under constraints. The political model is one where decision making is clouded by politics (a tug of war between decision makers) or decision makers act to maximize their own interests. The classic example of this is Jim Barksdale of Netscape who, according to one magazine, took the company public because he had to make a down payment on his yacht. Here an individual acted on personal interest; otherwise it would have been advantageous for the company to wait a few more years before the IPO.

Biases in decision making (No ideal decisions)

Decision making is fraught with human biases.

a. Bounded rationality – Because of human cognitive limitations, decision makers can process only a limited amount of information at any given time. For example, a manager may not be able to review more than 100 applications in a day. We have this problem with computers too: in linear programming, the technology is unable to handle problems with more than 120 constraints.

b. Anchoring and adjustment biases – In unstructured decision making situations, many problem variables are not known and need to be estimated. When estimating, we make an estimate based on the most recent value. If we are asked to estimate the value of the Nasdaq in 20 years, we might start with 4,849. Since that value is a little inflated, the estimate may be completely wrong.

c. Localized search/uncertainty avoidance – It has been found that decision makers search for solutions in their area of comfort and avoid as much uncertainty as possible. If a company has a software package in existence, it will try to continue using it rather than acquire different software.

d. Heuristics – Heuristics are short cuts in decision making. For example, if a person is asked to estimate the Nasdaq in 20 years, he/she may use a short cut such as four times the current value.

e. Constraints – Sometimes time or resource constraints force a decision. Examples: candidates need to be hired before a certain time, or contracts awarded before a deadline.

Decision Support Systems

A DSS is a system that supports structured and semi-structured decision making by managers in their own personalized way, e.g. to decide on projects, investments etc. DSS provide support for decisions where there is some degree of structure. They do not support unstructured decisions, simply because such decisions are difficult to model, let alone support with computer programs. DSS are intended for use at the operational and tactical levels (lower and middle management, but not strategic management).


Fig 4. Overview of DSS

The diagram above gives an overview of DSS concepts. We have seen that decisions can be structured, semi-structured or unstructured. Unstructured decisions cannot be supported with a DSS, as noted earlier, since the variables and solution method may not be known. Regardless of the type of decision, the decision has variables or parameters – interest rates, initial investment, degree of risk, etc. Decision parameters are entered into the DSS through an interface called Dialogue Management. The Dialogue Management facility allows problems to be entered. The decision is represented in an abstract fashion through a formula, referred to as a decision model. Note that models need not always be mathematical. We can have qualitative models, such as simulation models, where a definite outcome cannot be calculated. Decision utility concerns the outcome valued by the decision maker, such as costs, profits, etc.

Classical DSS Architecture:

The classical DSS architecture was introduced in the late seventies and has remained unchanged over the years. A decision support system has three components, as follows:

Fig 5. Classical DSS Architecture


Dialog management: Dialog management allows the decision problem to be represented and modeled. It is essentially the user interface. For example, certain types of decisions, such as investment and cash flow, lend themselves to being modeled in a spreadsheet. Here the interface is two dimensional and the data is generally numerical. So what types of problems cannot be represented in a spreadsheet?

Model management: Models are abstract problem representations. Examples of models include NPV – Net Present Value and EOQ – Economic Order Quantity. The model management facility allows decision makers to create and link models with the parameters/decision data (in the process of making a decision). Another way to think about this is creating a formula using the decision data.

Data management: Manages decision data. It allows us to sort, filter and select data for the decision, just as with a database.

DSS Capabilities:

DSS capabilities are the analysis features provided by the system. This terminology is so popular that it has become part of the business vocabulary.

What if – change one or more variables. What if sales were to increase by 10% instead of 15% and costs were to increase by 8% instead of 5%?

Sensitivity – change one variable, such as the interest rate or growth rate, and see the impact on the bottom line. If changing a variable can change the solution by a great amount, the variable is said to be sensitive.

Goal seeking – find a solution that satisfies constraints. What level of sales would satisfy the profit target?

Optimization – find the best solution under a given set of constraints. What level of sales would maximize profit?
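To make these capabilities concrete, here is a minimal sketch in Python; the profit model and all figures are assumed for illustration, not taken from any particular DSS:

# Toy profit model: profit = sales * margin - fixed costs (all figures assumed).
def profit(sales, margin=0.30, fixed_costs=50_000):
    return sales * margin - fixed_costs

# What if: change one or more variables and observe the outcome.
print(profit(sales=200_000))               # base case
print(profit(sales=220_000, margin=0.28))  # sales up 10%, margin squeezed

# Goal seeking: what level of sales satisfies a profit target?
# sales * margin - fixed_costs = target  =>  sales = (target + fixed_costs)/margin
target = 25_000
print((target + 50_000) / 0.30)            # required sales level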

Classification of Models:

While decisions are characterized by degree of structure, DSS models are classified based on the degree of certainty in the decision. This refers to how much information we have about the decision situation. Accordingly, we have decision making under certainty (e.g. LP), decision making under risk (e.g. decision trees, Bayesian analysis, queuing) and decision making under uncertainty (e.g. causal models and influence diagrams). Uncertainty or risk in the decision is generally modeled with probability.

D.M. under certainty: linear programming (LP), integer programming, non-linear programming models, graph models (e.g. PERT).

D.M. under risk: decision trees, Bayesian analysis, queuing, discrete event simulation, Markov models.

D.M. under uncertainty: causal models, influence diagrams, Strategic Assumption Surfacing & Testing (SAST).


Linear Programming

This is an example of a model for solving problems that have constraints. The constraints are typically expressed as linear equations or inequalities (of the form y = mx + c). Suppose a company is manufacturing three products, 'A', 'B' and 'C'. If 'a', 'b' and 'c' represent the quantities of A, B and C that should be produced, and there is a maximum capacity of 200 units a day, this is expressed as a constraint:

a + b + c <= 200

If A takes 8 hours to produce, B takes 5 hours and C takes 3 hours, but the maximum number of production hours available is 150, this is expressed as:

8a + 5b + 3c <= 150

For a given problem there will be many such constraints, modeled by linear equations and solved by an LP program.
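As a sketch of how such a problem is handed to an LP solver, the Python fragment below uses SciPy's linprog with the two constraints above; the per-unit profits of 40, 30 and 20 for A, B and C are assumed for illustration:

from scipy.optimize import linprog

# Maximize 40a + 30b + 20c (assumed profits); linprog minimizes, so negate.
c = [-40, -30, -20]
A_ub = [[1, 1, 1],    # a + b + c <= 200 (capacity constraint)
        [8, 5, 3]]    # 8a + 5b + 3c <= 150 (production hours constraint)
b_ub = [200, 150]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x)     # optimal quantities of A, B and C
print(-res.fun)  # the maximum profit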

Decision Tree

A decision tree is one of the models for decision making under risk. In a decision tree, a decision is shown as a tree with branches; the branches represent the outcomes of a decision. Uncertainty is modeled with probability, the chance that an outcome will occur. Consider for example a person who does an MBA (see diagram below) – let us say that there is a 90% chance of them joining a company and a 10% chance of them going into teaching. These are shown as outcomes of doing an MBA. Out of the people who work in companies, 10% become managers – how many become co-ordinators? These are shown as sub-branches. Out of the people who work in academia, some decide to go for a Ph.D. while others go directly into teaching. Note the use of squares for decisions and circles for outcomes.

Fig 6. An MBA Example of Decision Tree

D.M. Under Risk – Case of SS. Kuniang

This case illustrates the complexity of real world decision making. The Kuniang ran aground off the coast of Florida in a storm. The owners wanted to sell it, but the Coast Guard had the authority for selling it. The scrap value was estimated at $5 million and the repair costs at $15 million.

New England Electric System

NEES is a utility company on the East Coast. It uses coal for power generation. Its demand was for 4 million tons/year, but it already had a General Dynamics vessel that cost $70 million and could transport 2.5 million tons/year. It needed additional capacity for transporting 1.5 million tons. NEES thought it could refurbish the Kuniang and use it to transport coal. The problem is whether or not to bid for the Kuniang, and how much to bid. This is not a simple matter. It is tied in with issues of the source of coal and turnaround times.

Decision Complications

There are some decision complications. The first is whether to use Egyptian coal or PA coal. Egyptian coal has less sulfur content than PA coal, but it takes 80 days for a round trip. Further, there is the complication of the Jones Act, which gives priority to U.S. ships in U.S. ports. If there are ten ships in line at a U.S. port and the third ship is a U.S. ship, it gets into port first. The only exception is if the repair cost exceeds 3 times the value of the ship; then it will be considered a Jones Act ship. How is the value to be judged? That depends on how the Coast Guard values the ship. It could value it as scrap or at the highest bid. Depending on that, the repair costs necessary to make it a Jones Act ship will change. Adding a crane adds to the cost (helping meet the Jones Act requirement) but reduces turnaround time.

Decision Options

The decision alternatives are the Kuniang, the Kuniang with a crane, the General Dynamics vessel, or a tug barge. The next slide shows the data for the four options. The cost of the General Dynamics vessel is $70m, the tug barge is $32m, the Kuniang without the crane is the bid price plus $15m (the cost of repairs) and the Kuniang with the crane costs the bid price plus $36 million. Out of this, $15 million is for the repair of the ship and $21 million is for the crane. Note that with the crane, the cargo capacity of the Kuniang decreases by 5,750 tons. The round trip takes 5.15 days with the General Dynamics vessel, 7.15 days with the tug barge, 8.18 days with the Kuniang without the crane and 5.39 days with the crane.

Decision Tree of How Much to Bid

A decision tree shows different decision alternatives and decision outcomes. In the chart, squares are decisions and circles are outcomes. The decision utility is shown on the right hand side of the decision tree. Here the utility is the total cost and net present value. If the company bids $7m, there is a 50% chance of winning and a 50% chance of losing. If it wins, the company estimated that there is a 70% chance that the ship will be valued as scrap and a 30% chance that it will be valued at the bid price. The net present value is highest for the Kuniang without the crane – the gearless option. So this is the preferred decision option. The company ended up being more conservative in its bid, but lost to a bid of $10m. Why does the decision tree approach make sense in this situation? Think about it!
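The arithmetic of 'folding back' a decision tree is simple expected value. The sketch below (Python) uses the probabilities from the case, but the NPV payoffs are placeholder numbers, not the case figures:

# Expected value at a chance node: sum of probability * value over its branches.
def expected_value(outcomes):
    return sum(p * v for p, v in outcomes)

# Chance node after winning the $7m bid: how the Coast Guard values the ship.
win = expected_value([(0.7, 12.0),   # valued as scrap (placeholder NPV, $m)
                      (0.3, 9.0)])   # valued at the bid price (placeholder NPV, $m)

# Chance node for the bid itself: 50% win, 50% lose.
bid_7m = expected_value([(0.5, win),
                         (0.5, 4.0)])  # lose and fall back (placeholder NPV, $m)
print(bid_7m)  # compare this expected NPV across the decision alternatives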

Causal Model

For decision making under uncertainty there are no models to support the entire decision process (why?). There are some to represent the causes and effects (causal models), influences (influence diagrams) and decision assumptions (SAST). We will briefly discuss the causal model. The causal model depicts causes and effects in a decision situation. Consider the following situation: if we assume that 'training', 'satisfaction' with co-workers and 'better pay' lead to better 'on-the-job' performance, this can be modeled as a causal map.


Fig 7. Causal model

Note the use of '+' and '-'. '+' means 'is proportional to' and '-' means 'is inversely proportional to'. For example, the link between 'on the job performance' and 'domestic problems' is negative. This means that if 'domestic problems' increase, 'on the job performance' would decrease. According to the model, if pay is increased, on-the-job performance increases. See if you can research other factors affecting 'on the job performance' and add them here! How does a causal model help the decision maker?

DSS Applications

Decision support systems have been developed in various areas including cash forecasting, stock selection, event scheduling etc. Following are some examples:

Cash forecasting – forecasting cash requirements for an organization

Fire fighting – deciding on a fire fighting technique to use

Portfolio selection – deciding on stocks to select for a portfolio

Lending risk evaluation – deciding whether or not to approve a loan

Event scheduling – scheduling games at the Olympics

School location – deciding where to locate schools

Police beat – deciding how to patrol neighborhoods based on crime patterns

Movie forecasting – forecasting the success of movies with a DSS called movieGuru

Extensions to DSS

As discussed earlier, the DSS concept has been extended to BI, GIS, Collaborative Systems, Expert Systems

and Data Mining, but only BI and Collaborative Systems are discussed here. Other topics are discussed

further downstream.

BI systems:

BI systems are systems that provide information to executives on the business environment. Summary information is provided in an easy-to-use, drill-down format from operational databases. Drill-down capability refers to the ability to view data in increasing levels of detail. Yearly sales of a product, for example, can be broken down into monthly and weekly sales. The screens and reports in a BI system can be easily tailored to the needs of the individual executive.

Dashboard:

A dashboard is an interface that displays the information needed to effectively run an enterprise; the dashboard is a metaphor for the manner in which the information is displayed. Just as a car has a dashboard which shows the vital signs of the car (oil level, battery level, engine RPM), a BI dashboard is designed to give an overview of the health of the enterprise in a graphical, intuitive and easy to comprehend manner. Data can be provided in a variety of formats including bar graphs, trend lines, line graphs, pie charts etc.

Fig 8. BI architecture

BI Architecture and capabilities:

A BI system draws on data from a number of operational databases. These data are typically stored in a warehouse and periodically updated. The system provides a dashboard, an interface as described above. It also enables variables to be measured against score cards or metrics. These are defined by the individual company. For example, one company may define the minimum volume of sales per sales person per month as 500 units. BI systems also support trend analysis, such as whether sales have increased or decreased relative to previous years. Finally, BI systems can provide reports that can be customized in a million different ways. A magazine editor may want to know what articles on diets were printed between 2010-2015. An example of customization is that product sales data may be listed row wise or column wise. Access to data is obviously an important assumption for BI, as seen in the diagram above. Data is drawn into the warehouse from operational databases. Although it is advantageous (for decision makers) to have current data in the warehouse, this is not always possible due to the volume and time required for updates. So generally there is a time lag between the warehouse data and the operational databases.

BI features can be summarized as:

Dashboard

Score cards/metrics

Analysis

Reports

One limitation of BI systems is that detailed information may not always be available, and therefore drill-down cannot be provided in those instances. A second limitation is that if the data is not favorable, such as when there is a cost increase, the BI system is unable to explain it. For this we may need Artificial Intelligence capabilities such as data mining.

BI Technologies:

OLAP consists of tools to analyze data in a warehouse for decision support. The analysis is carried out mainly by data summarization ('aggregation') and slice and dice (cross-sectionalization). The data is logically organized in memory in the form of a cube (also called multi-dimensional organization). This organization permits queries such as 'how many leather jackets were sold in the Tucson store in the month of December 2015?' Consider the sales table below: how many Cam rods were sold in the northeast region? The process we go through to answer this question is called aggregation.

Sales

part month region units

Cam rod Apr SW 35

Cam rod Feb SW 52

Cam rod Jun NE 10

Cam rod Mar SW 35

Cam rod May SE 52

Cam shafts Jun MW 43

Cam shafts Mar MW 52

Fig 9. Table of Sales
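As a sketch of aggregation, the Python/pandas fragment below (pandas is assumed tooling here, not part of the notes) answers the Cam rod question from the table:

import pandas as pd

sales = pd.DataFrame(
    [("Cam rod", "Apr", "SW", 35), ("Cam rod", "Feb", "SW", 52),
     ("Cam rod", "Jun", "NE", 10), ("Cam rod", "Mar", "SW", 35),
     ("Cam rod", "May", "SE", 52), ("Cam shafts", "Jun", "MW", 43),
     ("Cam shafts", "Mar", "MW", 52)],
    columns=["part", "month", "region", "units"])

# Aggregation: total Cam rod units per region (NE = 10, SE = 52, SW = 122).
print(sales[sales.part == "Cam rod"].groupby("region")["units"].sum())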

Dimensions and concept hierarchies.

A dimension is an aspect of the data; it is a characteristic of a variable, such as location for the sales variable. It is an attribute that can vary along fixed values. For example, location could be Tucson, Chicago, Los Angeles etc. Other examples of dimensions include product, manufacturer, time etc. Dimensions can have hierarchies (or various levels of aggregation). A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. A concept hierarchy permits data summarization (aggregation) along one or more dimensions.

Fig 10. Example of a Concept Hierarchy

In the example above, the concept hierarchy shows cellular sales broken down across the manufacturer and product dimensions. The manufacturer can take on the values 'Samsung', 'Apple' and 'LG'. Samsung has the product models 'GAL S7' and 'GAL S6', Apple has the 'iPhone7' and 'iPhone6' and LG has the 'G5', 'G4' and 'G3' models. Thus to find the total sales of Samsung, the sales of 'GAL S7' and 'GAL S6' need to be added: 10 mil + 38 mil = 48 mil units.

Multi-Dimensional Organization

The cube organization of data permits slice and dice operations. In this case we can see LG sales across different regions (Midwest, Southwest, Southeast etc.), different products and different time periods (January, February, March etc.). This data cube allows managers to ask questions such as 'what were the sales of "G4" models in the Midwest for the first quarter?'

Fig 11. LG sales across different regions

Collaborative Systems

Collaborative systems are a collection of technologies to support group activity. This includes support for design, strategic planning, software development, research, reports etc. This technology evolved out of Group Decision Support Systems, which were an extension of decision support technologies. The primary purpose of GDSS was to support group decision making. Collaborative systems technologies are broader and include those that support decision making, document sharing and design.

The advantages of collaborative systems are that they support the 21st century organization, reduce travel, save time and could ultimately improve the end product as a result of input from multiple parties.

Social Media

Technology supported social interaction has become a popular phenomenon in recent times and is an important source of marketing information for many retailers. Twitter, Facebook, Whatsapp and LinkedIn are among the prime examples of social media. These can be used in marketing in a number of ways:

1) They can be used to advertise directly through the medium itself, as in the case of Twitter (i.e. tweets) and the iPhone (text messaging).

2) Many organizations also have internal social networking sites for employees to interact. This is valuable for an organization to promote if it can gain work related information from the social media site or if the (internal) networking site hosts work interactions. Companies like W.R. Grace and GoreTex use internal social networking sites to publicize their projects or get feedback on them.

3) Social media can be used to create a standing presence, such as having a "company page" on Facebook.

4) Social media can also be used to mine data on brands and brand preferences, as well as to predict demand. The latest trend is to analyze sentiments regarding products and services, i.e. whether a product/brand is positively or negatively mentioned. Sentiments are a good predictor of future sales.


MANAGEMENT SUPPORT SYSTEMS—II

The concept of Management Support Systems has been extended through the use of data for business improvement. This has been an increasing trend since the late 1990s. Technologies such as data warehousing, OLAP and data mining are available to collect and analyze data for improving sales and marketing. These technologies are collectively known as 'business analytics' and sometimes, incorrectly, as 'big data'.

Data Analytics: This is a broad umbrella term that describes the tools and technologies used to draw meaningful conclusions from data. It uses both traditional statistical techniques, such as correlation, regression, chi-square and t-tests, as well as more advanced techniques (used in mining), as listed below:

Table 1. Examples of statistical techniques used in Data Analytics

Correlation – Technique to find the relationship between numeric variables.

Chi-squared test – Tests whether two random variables are independent, i.e. whether the probability distribution of one variable influences the other.

T-test – Evaluates whether two groups differ by comparing the mean and variance of each group.

Analysis of Variance – Statistical model to analyze the differences among (two or more) group distributions by comparing the mean and variance of each group.

Logistic regression – Looks for a relationship between a binary variable and one or more nominal variables.

Table 2. Examples of mining techniques used in Data Analytics

Machine learning – Field of computer science that deals with pattern & speech recognition and text analytics. There are two types: supervised and unsupervised.

Naïve Bayes classifier – Probabilistic technique for constructing classifiers.

K-means clustering – Used to find groupings in the data that are not pre-defined.

Time series analysis – Analysis of data that has time (day, month, year) as a component.

Association rules – Uncovered relationships among items are represented in the form of association rules.

Text analysis – Identifies relationships in unstructured data such as blogs and tweets.

Table 3. Examples of tools used in Data Analytics

R – Open source programming language with a focus on statistical analysis.

Python – General purpose programming language with a large number of libraries for data analysis.

Julia – High level dynamic programming language for technical computing.

SAS – Commercial language used for business intelligence.

SPSS – A product of IBM for statistical analysis, used to analyze survey data.

Matlab, Octave – Tools used for research; Octave is an open source version of Matlab.

CRISP-DM (Cross Industry Standard Process for Data Mining)

CRISP-DM is an industry standard methodology that is widely used. The process starts with understanding the business objectives of the mining. What exactly needs to be found? Is it gas usage, sales trends, or blow-outs of transformers? The next step is to understand the data in terms of what needs to be extracted into the data warehouse (note that the diagram shows a database). Operational databases have hundreds of tables; those relevant to the business objectives will need to be extracted. Data often needs to be integrated from different sources, cleaned and transformed. For example, if the dollar amount of sales per sales person is required, it needs to be calculated. This is called data preparation. In the modelling stage, an analyst will select a suitable model, for example K-means for clustering (note that decision trees can also be used) or A-priori for associative analysis. The evaluation stage is concerned with seeing whether the business objectives have been achieved. If they have been achieved, the results are shared (the deployment stage); if the results are not satisfactory, the study is repeated (perhaps with different models and different data sets).

Fig 1. The CRISP-DM Cycle for Data Mining (the same cycle applies for analytics)


Data Warehouses:

A data warehouse is a large collection of historical data that is organized specifically for use in decision support (i.e. OLAP, data mining). Please note that a data warehouse is simply a 'bare bones' database designed specifically for volume and speed. It has facilities for selecting and loading large volumes of business data – typically transaction data from Point of Sale systems (e.g. retail checkout). Generally the data has to be on the order of terabytes to be considered a warehouse, and larger than a petabyte to be considered "big data."

Data life cycle:

Data is brought into the warehouse from various sources; it could come from external or internal sources. Data can be collected from outside the organization and shared with it. For example, a credit card company could share transaction information with hotels and restaurants. Most frequently, data is internally generated through sales transactions. Since a data warehouse is too unwieldy, the data can be divided up into data marts, which are subsets of the warehouse. For example, a warehouse could have sales data going back 20 years while managers are only interested in the last ten years of data. A data mart that holds the last ten years of data could be created to meet the needs of these managers. Once data is extracted to a data mart, two technologies could be deployed: a) OLAP and b) data mining. OLAP uses summarization operations to provide a high level summary of the data – for example, "how many Volvo S70s had transmission problems in the last ten years?" Data mining, in contrast, looks for patterns in the data – "what model had more transmission problems than other models?" Data visualization refers to the presentation of information via a dashboard (which is one part of a BI system).

Fig 2. Data Life Cycle


Characteristics of a warehouse (FYI):

A data warehouse has some defining characteristics: it is subject-oriented, integrated, time-variant and non-volatile.

Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product and sales. You should recognize these as entity classes.

Integrated: A data warehouse is usually constructed by integrating data from multiple heterogeneous sources, such as relational databases, flat files and on-line transaction records.

Time-variant: This means that time is a dimension or attribute of the data. In the case of sales data, the time when the sale occurred is often captured with the data.

Nonvolatile: The data in a warehouse is permanent and does not change.

Design of warehouse:

The design of a warehouse is similar to that of a database. In relational databases, cross reference keys are used to relate tables together. For example, consider the two tables below, Accounts and Transactions. What key is used to link these tables together? What is the purpose of linking tables together?

Table 4. Tables of Account and Transactions

Accounts

acct# name date opened balance

8898444 Smith 6/7/15 $35,000

8898454 Farley 4/22/16 $300

Transactions

tid t_type t_amt account#

45940 interest 12.43 8898444

45941 deposit 200.00 8898454
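As a sketch of what linking achieves (using pandas as assumed tooling): the account# column in Transactions cross-references acct# in Accounts, so the two tables can be joined to answer questions that span both:

import pandas as pd

accounts = pd.DataFrame({"acct#": [8898444, 8898454],
                         "name": ["Smith", "Farley"],
                         "balance": [35_000, 300]})
transactions = pd.DataFrame({"tid": [45940, 45941],
                             "t_type": ["interest", "deposit"],
                             "t_amt": [12.43, 200.00],
                             "account#": [8898444, 8898454]})

# Join on the cross reference key to see each customer's transactions.
joined = transactions.merge(accounts, left_on="account#", right_on="acct#")
print(joined[["name", "t_type", "t_amt"]])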

The design of the warehouse is not very different from databases, but a warehouse is designed for queries. Warehouse data is converted to a multi-dimensional organization at run time – this is what OLAP is.

W/H organization:

The warehouse can be organized into star, snowflake and constellation schemas.

Star schema: Consists of a large central table and a set of smaller tables, one for each dimension.

Snowflake schema: A variant of the star schema, where some dimension tables are normalized, thereby splitting the data into additional tables.

Constellation schema: A collection of stars.


Fig 3. Example of Warehouse and Marts

Shown above is an example of a warehouse and data marts. The warehouse contains the detailed sales data of an organization between 2010-2015. A data mart is an extract or subset of the warehouse. In this case, there are two data marts: weekly sales listed by state for the years 2010-2015 (i.e. across all products) and weekly sales listed by product for 2012-2015 (across all states). These can be aggregated into sales by region and sales by product line, as seen in the top level of the diagram. In theory, any number of data marts is possible from a warehouse. The type of data mart chosen depends on the information (query) needs of managers.

OLAP:

OLAP consists of tools to analyze data in a warehouse for decision support. The analysis is carried out mainly by data summarization ('aggregation') and slice and dice (cross-sectionalization). The data is logically organized in memory in the form of a cube (also called multi-dimensional organization). This organization permits queries such as 'how many leather jackets were sold in the Tucson store in the month of December 2015?' Consider the sales table below: how many Cam rods were sold in the southwest region? The process we go through to answer this question is called aggregation.

Table 5. Table of Sales

Sales

part month region units

Cam rod Apr SW 35

Cam rod Feb SW 52

Cam rod Jun NE 10

Cam rod Mar SW 35

Cam rod May SE 52

Cam shafts Jun MW 43

Cam shafts Mar MW 52


Data mining:

Data mining is the application of statistical and AI techniques to identify patterns that exist in large databases but are hidden in the vast amounts of data, for example: "how long do patients stay at a hospital following hip surgery?", "how much do people spend on average at a shopping mall?", "how much meat is consumed at a fast food restaurant on a weekend vs a weekday?" This information is utilized by management in various ways – utility companies can set rates, insurance companies can judge claims as being reasonable or not, and retail companies can stock merchandise according to how much customers are likely to buy. Harrah's casinos found that its most profitable customers were not high rollers but middle income people, and started catering more to their needs, for example by having bus shuttles from parking lots to casinos. The techniques used are usually sequence, association, classification and clustering.

Data Mining Techniques/Types of analysis

There are a number of techniques used in data mining, stemming from statistics and AI. Some of these are discussed as follows:

Sequence: Sequence analysis is concerned with purchasing activities occurring one after another. For example, consumers who buy a house are typically in the market for appliances and home owner's insurance. Similarly, a person who finishes a degree and is entering the job market may be in the market for a credit card and a car. Time series analysis, neural networks and genetic algorithms are techniques used in prediction.

Associative analysis:

Associative analysis is used to predict purchasing activities that occur at the same time, i.e. items purchased together. This is most often used in deciding retail store layouts. A-priori is one of the popular algorithms for mining associations.

Classification:

In classification, data is placed in one or another of a set of pre-defined categories, e.g. whether a person is a sports fan or not, or whether a person is likely to be in the market for a cellular package. The techniques used include discriminant analysis, Bayesian classification and others.

Clustering:

Clustering is concerned with identifying the natural groupings of data. The technique is based on the notion of a centroid. A centroid is the center of gravity, or equivalently the 'center of mass', of a two dimensional object. It is the point from which the distance to every other point is minimized. Since there is no direct algorithm for finding the best cluster centers, clustering is a trial and error technique. Clustering is tried with a trial number of cluster centers (e.g. 4) and the inter-cluster/intra-cluster distances are examined for goodness of fit. If the fit is not good, clustering is repeated with a different number of cluster centers.

The example below illustrates the clustering of grocery store customers. The cluster centers are actual grocery stores and the dots are customer addresses. The clustering exercise shows that customers are served well by the existing grocery stores.


Fig 4. An Example of Clustering of Grocery Store Customers
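A minimal clustering sketch in Python (scikit-learn is assumed tooling; the customer coordinates are made up): addresses become 2-D points, and the fit is compared across trial numbers of cluster centers:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
customers = rng.uniform(0, 10, size=(200, 2))  # assumed customer coordinates

for k in (3, 4, 5):                            # trial numbers of cluster centers
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(customers)
    print(k, km.inertia_)                      # within-cluster distance: lower is a tighter fit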

Associative analysis using a-priori algorithm

The A-priori algorithm is used to mine associations in a set of transactions:

1. List all items – create a 1-item set C1.

2. Filter items by minimum transaction support to get the set L1.

3. Identify 2-item sets by pairing each element of L1 with every other element of L1 to get C2: (L1 * L1) = C2.

4. Filter the item sets by minimum transaction support to generate L2.

5. Repeat with 3-item and 4-item sets...

The minimum transaction support is the minimum number (sometimes given as a %) of transactions in which the item must occur. Here it is given as 2.

Consider the following table showing purchases at a grocery store. Each purchase/transaction is identified by a transaction ID (TID) and followed by the list of items purchased, here abbreviated A, B, C, D, E (e.g. eggs, oranges, jam etc.).

Table 6. Table of Purchases identified by a Transaction ID

TID Items

100 A C D

200 B C E

300 A B C E

400 B E


From this, a one item set C1 is created by taking each item and assessing its frequency. Item 'A' occurs twice (in TIDs 100 and 300); item 'B' occurs three times (in TIDs 200, 300 and 400); and so on.

Table 7. Table of items and frequencies

Item Set Sup.

{A} 2

{B} 3

{C} 3

{D} 1

{E} 3

The items meeting the minimum support form the set L1 (here 'D' is dropped). From L1, the set L1 * L1 = C2 is created, consisting of all possible pairings. The resulting item sets are tested for support.

Table 8. Frequency tally of two item pairs

Itemset Sup.

{A B} 1

{A C} 2

{A E} 1

{B C} 2

{B E} 3

{C E} 2

In this case, 'AB' and 'AE' do not meet the minimum support. The resulting table answers the question: which two items are most frequently purchased together? (It is 'BE'.)
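The steps above are small enough to run directly. A sketch in plain Python (minimum support = 2, transactions from Table 6):

from itertools import combinations

transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
MIN_SUP = 2

def support(itemset):
    return sum(itemset <= t for t in transactions)  # count transactions containing the set

# C1 -> L1: keep single items meeting minimum support ('D' is dropped).
items = sorted(set().union(*transactions))
L1 = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUP]

# C2 -> L2: pair the survivors and filter again ('AB' and 'AE' are dropped).
C2 = [a | b for a, b in combinations(L1, 2)]
L2 = {tuple(sorted(s)): support(s) for s in C2 if support(s) >= MIN_SUP}
print(L2)  # {('A','C'): 2, ('B','C'): 2, ('B','E'): 3, ('C','E'): 2} -- 'BE' is most frequent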


Probability:

Probability is the chance of an event happening. It is a number between 0 and 1:

1 – the event occurs with certainty

0 – the event will certainly not occur

P(sun will rise tomorrow) = 1; P(rain) = 0.2; P(Elvis is alive!) = 0

Prior probabilities:

Prior probabilities are knowledge of other events that may help improve prediction. According to one financial blog, the probability of an IPO (Initial Public Offering) being successful is 0.33, and if a big company (e.g. McDonald's) is behind the IPO, the chance of it being successful is 0.99. These are expressed as follows:

P(IPO success) = 0.33

P(IPO success | big company) = 0.99 (this should be read as 'the probability of IPO success given that a big company is behind the IPO')

In this case, knowledge of other events improves prediction success. Prior probabilities are therefore useful for prediction, and there is a theorem that involves prior probabilities. How would you write in formula form: "the probability that it will rain given that it is sunny is 0.3"?

Bayes Theorem

Bayes' theorem involves prior probabilities and their inverse:

P(A|B) = [P(B|A) * P(A)] / P(B), where A and B are events.

Bayes' theorem can be used in classification. If there are two classes 'A' and 'B', we can decide which class an item belongs to based solely on probability – whichever class has the higher probability is the most likely outcome. If a state in the U.S. has 60% declared Republicans, 30% declared Democrats and 10% undecided, the outcome of an election is most likely to be Republican, since this probability is 0.6 and is higher than the others. The use of conditional probabilities increases the chances of successful prediction, so Bayes' theorem can often be exploited for calculating conditional probabilities.

A Problem involving Bayes Theorem

The following problem illustrates Bayes' theorem. We are interested in P(person becoming a manager | he/she is doing an MBA). We are given the following assumptions:

300 m population; 100 m employees

500,000 are managers

10,000 managers go to college for an MBA

20 m go to college

100,000 do an MBA

First we write out Bayes' formula:

P(becoming mgr | doing MBA) = [P(doing MBA | is a mgr) * P(mgr)] / P(doing MBA)

Please note that the formula always needs to be written in terms of the problem (rather than P(A) or P(B)).

P(doing MBA | is a mgr) = 10,000/500,000 {10,000 mgrs out of 500,000 go for an MBA}

P(mgr) = 500,000/300 m {there are 500,000 mgrs out of a total population of 300 m}

P(doing MBA) = 100,000/300 m {100,000 students do an MBA out of a population of 300 m}

P(becoming mgr | doing MBA) = [(10,000/500,000) * (500,000/300 m)] / (100,000/300 m) = 0.10 or 10%
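A quick check of this arithmetic in Python:

p_mba_given_mgr = 10_000 / 500_000      # P(doing MBA | manager)
p_mgr = 500_000 / 300_000_000           # P(manager)
p_mba = 100_000 / 300_000_000           # P(doing MBA)

print(p_mba_given_mgr * p_mgr / p_mba)  # 0.1, i.e. 10%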

Using Bayes Theorem for Classification

Bayes' theorem can be exploited for classification. As noted earlier, placing a data item in Class I or Class II is done on the basis of probability; the target class is the class for which the probability is higher. We illustrate this via an example of classifying birds as eagles or hawks based on their wingspan. Since these are birds of prey, it is not possible to get close to them – the only distinguishing feature is their wingspan. The average eagle is bigger than the average hawk. The probability of birds being eagles or hawks was calculated by the observer as follows:

P(eagle) = n_eagle/N = 0.8 (from observations of birds, where N is the total number of birds)

P(hawk) = n_hawk/N = 0.2 (from observations of birds, where N is the total number of birds)

P(eagle|x) = [P(x|eagle) * P(eagle)] / P(x) ... (1)

P(hawk|x) = [P(x|hawk) * P(hawk)] / P(x) ... (2)

It is assumed that the birds observed are only eagles and hawks, since only these birds fly high. The unknown bird will be classified depending on whichever probability is higher, P(eagle|x) or P(hawk|x). Since these are unknown, Bayes' theorem will need to be used. Further, since all we need is a comparison of probabilities, the common term P(x) can be dropped. In equations (1) and (2), only P(x|eagle) and P(x|hawk) are not known. These can be calculated from the probability density function (PDF). The PDFs for eagles and hawks are shown below. The PDF shows the probability of a certain wingspan given an eagle, or a certain wingspan given a hawk. The PDF for eagles is flatter than that for hawks, showing a greater variation in wingspan. The values for a wingspan of 45 cm, read off the PDF, are:

P(45|eagle) = 2.22 x 10^-2

P(45|hawk) = 1.1 x 10^-2


Fig 5. The PDF for eagles and hawks

To compare P(eagle|x) and P(hawk|x), dropping the common term P(x):

P(eagle|x) ∝ P(x|eagle) * P(eagle) = 2.22 x 10^-2 x 0.8 = 0.01776

P(hawk|x) ∝ P(x|hawk) * P(hawk) = 1.1 x 10^-2 x 0.2 = 0.0022

The probability for eagle is higher, predicting the eagle class.
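A sketch of the comparison in Python, using the density values read off the PDF:

p_eagle, p_hawk = 0.8, 0.2               # priors from the bird observations
p_x_given_eagle = 2.22e-2                # PDF value at wingspan x = 45 cm
p_x_given_hawk = 1.1e-2

score_eagle = p_x_given_eagle * p_eagle  # 0.01776
score_hawk = p_x_given_hawk * p_hawk     # 0.0022
print("eagle" if score_eagle > score_hawk else "hawk")  # predicts eagle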

Big Data

Big data is the name given to large volumes of data generated from diverse sources such as blogs, social media posts, sensor data etc. This data is often generated in real time, and is measured in petabytes (a quadrillion bytes – roughly all the data in the world's academic libraries) and exabytes (a billion billion bytes – roughly all conversations ever spoken). It is useful to analyze it. For example, in 2012 MIT researchers used cell phone data from the parking lots of Macy's in New York to estimate sales in Macy's New York stores on Black Friday. Similarly, they used Google search data to predict real estate sales in Pennsylvania. This proved more accurate than the forecast of the National Realtors' Association, which is produced only on an annual basis.

The technical framework for Big Data is called Hadoop; Hadoop is thus an (IS) environment for Big Data. Housing big data requires the use of low cost servers called Hadoop cluster servers. Since the data is spread over multiple servers, a distributed file system called HDFS is used. This file system is aware of how the data is spread over the servers. The data/queries have to be processed in parallel and the answers integrated; the software that does this is called MapReduce.
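A minimal sketch of the MapReduce idea in plain Python (not Hadoop's actual API): each record is mapped to (key, value) pairs, the pairs are grouped by key, and a reduce step combines the values per key. On Hadoop, the map and reduce calls run in parallel across the cluster:

from collections import defaultdict

records = ["SW 35", "SW 52", "NE 10"]   # assumed shards of sales records

def map_fn(record):                     # map: record -> (region, units)
    region, units = record.split()
    yield region, int(units)

grouped = defaultdict(list)             # shuffle: group values by key
for record in records:
    for key, value in map_fn(record):
        grouped[key].append(value)

print({key: sum(values) for key, values in grouped.items()})  # reduce: {'SW': 87, 'NE': 10}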


ARTIFICIAL INTELLIGENCE

Major milestones

The progress of any complex field is marked by a series of unrelated developments, and the same is true for Artificial Intelligence; we discuss some of the developments in AI here. In 1950, Alan Turing, the famous British mathematician, wrote the paper "Computing Machinery and Intelligence" in which he proposed a test for intelligence, now known as the Turing test. To put it simply, the test compares a machine with a human to see if it can perform as well as the human.

The next major event in the field was when the Rockefeller Foundation funded the Dartmouth conference in 1956, a three day conference in Dartmouth, New Hampshire. AI was formally recognized as a field of study here, and programs like the Logic Theorist and GPS were exhibited. The goal of AI is to endow machines with intelligence so as to make them capable of human like behavior.

In 1958 the LISP language was introduced by John McCarthy. In this language the basic data structure is a list, and it was thought to be useful for symbolic reasoning, i.e. reasoning with words. Subsequently, in 1965, two computer programs were introduced which caught the attention of both the business and scientific communities. These are now known as expert systems: Dendral could identify chemical compounds from spectrographs and Mycin could diagnose ailments. 1972 was a banner year for AI, since two languages, Smalltalk and Prolog, were introduced. Smalltalk is the mother of all object oriented languages; Prolog is a language based on predicate logic and is still used today. In 1981 the Japanese introduced the Fifth Generation project. Part of the Fifth Generation project was to develop natural language interfaces for computers. The idea was that personal computers could be made more user friendly if they could communicate in natural language, and if PCs became more user friendly they could spur demand for semiconductor chips, over which the Japanese had dominance at that time. Unfortunately the Fifth Generation project did not succeed, but it spurred R & D funding in other countries like the U.K. and the USA.

In 1995, Honda of Japan introduced its humanoid robot, originally a 500 lb machine and since shrunk to the size of a child so as to be less intimidating. Called Asimo (after Isaac Asimov, the science fiction writer), it can walk up stairs and play football, much like a human child. In 2004, DARPA (the Defense Advanced Research Projects Agency) held a competition for driverless vehicles on a 100 mile course in the desert, laden with natural and artificial barriers, which none of the vehicles could finish; the maximum distance completed was 7.5 miles. The competition was run again the following year, and Stanford's entry won. Subsequently, an Italian vehicle (a separate venture) travelled 8,000 km to prove the concept of driverless vehicles.

Early research:

Logic – a system of mathematical symbols for representing statements of truth, known as facts or assertions. Below it is asserted that if X is a dog, X likes to go on walks (in Prolog notation, ':-' reads as 'if'). Given the fact that 'fido' is a dog, the conclusion that fido likes to go on walks can be reached:

likes_walks(X) :- dog(X).
dog(fido).

?- likes_walks(fido).
true.


Perceptrons – machines based on a simple version of the neural model of the brain. In those days, the brain was thought of as consisting of neurons that turned on or off in response to stimuli, much like semiconductor memory. Perceptrons were trained to recognize animals (dogs and cats) as well as to distinguish between males and females from photographs. This work was discontinued following a published critique by Minsky and Papert, but later continued in the form of neural networks.

Chess – chess is a game played on a board of eight by eight squares, but it has real-world complexity. The game uses 32 pieces, but there are about 2^120 possible combinations of these pieces. Researchers thought that if they could conquer chess they could solve problems in the real world. Unfortunately this was not the case; however, the result of this research was work on search algorithms. Solving a problem was viewed as searching for a solution in a space of solutions, i.e. the solution space.

Blocks World – blocks world is an artificial world consisting of building blocks. Programs were developed to move blocks around and to stack them, with the idea that this work could be transferred to construction in the real world. Unfortunately, this was also not the case: in the real world there is gravity, and there are rules for stacking blocks. For this reason, blocks world programs did not transfer well to the real world.

The outcome of early research was search strategies, like breadth-first search, depth-first search, heuristic search and hill climbing, as well as the realization that common sense (what happens if we throw a stone up?) is needed for real world programs. To get an idea of search strategies, imagine one has a van full of hungry children and has just arrived in a major city. Because it is raining heavily, assume the cell phone map is not working. How would one go about finding a pizza parlor? One could explore all possible roads one by one – a depth first search. Do you have a better method (other than asking someone)?

Nature of intelligence:

Knowledge + reasoning power = intelligence

Expert systems (recall from this intro as well as from the intro to IS) have given AI researchers one theory about achieving intelligence: to achieve intelligence, programs should be endowed with knowledge and reasoning power. The reasoning is usually provided via a program that can understand the knowledge and draw the required conclusions or perform some action. The problem of AI, then, is one of giving machines knowledge. Knowledge can be represented in various forms, but this is not always easy or amenable to computation. Assuming knowledge can be given, how do we know whether a machine has become intelligent or not?

The test for machine intelligence:

The test for machine intelligence is called the Turing test. The machine/program to be tested is isolated from the tester, who simply asks questions through a computer to avoid bias. If, by simply asking questions, the tester cannot make out whether he or she is dealing with a human or a machine, the machine is said to have passed the Turing test. To date, the Turing test has not been passed, despite IBM's Deep Thought.

We give machines knowledge and the ability to think with it. Knowledge is not easily defined. In the AI context, it is information organized for problem solving, and can consist of facts, goals, problems and procedures. For example: 1) The Eiffel tower has a height of 320 m. 2) If you want to get to the top, you have to take a lift up to the second stage and another lift from there. Which of these is factual and which is procedural knowledge?


Fig 1. The Test for Machine Intelligence

Knowledge

Knowledge can be declarative (factual) or procedural. Declarative knowledge is factual, like item #1 above. Procedural knowledge is 'how to' knowledge, illustrated by item #2. The author recently had a leaky shower tap; how to fix the leak is procedural knowledge.

Predicate Logic

Predicate logic is based on logic but uses predicates to make or prove assertions (statements). Predicates are labels signifying relationships. If we say:

partner(tom, mary).

this states that the partner of 'tom' is 'mary'. Note the use of lower case for values (constants). Below we assert that X is a mammal if X is warm blooded; 'mammal' and 'warmblooded' are the predicates:

mammal(X) :- warmblooded(X).

Predicate logic is a very versatile general purpose representation scheme.

Frames

Frames are record-like structures used to hold an assortment of facts about an object or a situation. A frame is characterized as a 'slot' and 'filler' notation: slots correspond to attributes, and fillers correspond to attribute values. Fillers can be values, procedures or is-a links; this is how frames differ from records.

Is_a: sports car

Model: Cayman

Manufacturer: Porsche

Retail price: $52,600

Accessories: check_with_dealer

Fig 2. A Representation of A Car Frame

Above is a representation of a car frame. The 'is_a' slot indicates that the frame is a subtype or subclass of sports car. The 'model' is Cayman, the 'manufacturer' is Porsche and the 'retail price' is $52,600. The 'accessories' slot has the value 'check_with_dealer', which is a procedure. Frames are useful in classification (this should ring a bell!), such as for ailments or rocks.
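A sketch of the car frame as a slot-and-filler structure in Python (the dictionary form is an illustration, not a standard frame library):

def check_with_dealer():                 # a procedural filler
    return "ask dealer for the current accessory list"

car_frame = {
    "is_a": "sports car",                # structural (class/subclass) link
    "model": "Cayman",
    "manufacturer": "Porsche",
    "retail_price": 52_600,
    "accessories": check_with_dealer,    # filler is a procedure, run on demand
}

filler = car_frame["accessories"]
print(filler() if callable(filler) else filler)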


Scripts

Scripts are descriptions of the actions in a pre-defined situation. For example, there are scripts for when a person goes to a restaurant (the restaurant script), goes to a party (the party script) or attends a wedding (the wedding script). A regular sequence of actions occurs in these situations. The method originated in the film industry, where a movie script defines what the actors do or say in the different scenes. Similarly, we can expect a restaurant script to consist of the following actions:

Walk into restaurant

Wait to be seated

Hostess asks how many

Shows you a table or booth

Asks if it is ok?

…………….

A formalized description of these events constitutes a script in knowledge representation. Scripts were used in natural language understanding, to understand stories (e.g. Red Riding Hood). They are necessary for a program to understand stories since some of the knowledge in stories is implicit, such as why someone would take a cake to visit grandma's house.
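The start of the restaurant script might be formalized as follows (an assumed Python sketch; event names paraphrase the list above):

restaurant_script = [
    "walk into restaurant",
    "wait to be seated",
    "hostess asks how many",
    "hostess shows a table or booth",
    "hostess asks if it is ok",
    # ... ordering, eating and paying scenes would follow
]

def expected_next(event):
    # a story-understanding program can infer the implicit next step
    return restaurant_script[restaurant_script.index(event) + 1]

print(expected_next("wait to be seated"))  # prints: hostess asks how many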

Semantic nets

Semantic nets are a representation scheme based on associative memory. It is thought that the human brain stores knowledge of related objects together for efficiency. Thus one would expect 'wealth' to be stored adjacent to neurons storing big estates, luxury vehicles, boats and exotic vacations. A semantic net is said to be a 'node + link' formalism. The nodes can represent concepts or values. Only single concepts or values are permitted; thus '64 in TV' is not permitted as a node value (it has to be just 'TV' with an attribute of size). Links typically represent relationships and are of two types, structural and descriptive. Structural links describe the structure of the objects, such as class/subclass (the link is labeled 'is-a') and part/subpart (the link is labeled 'has-a'). Descriptive links describe properties (or attributes – what does an attribute mean?) and can have any label (e.g. 'max capacity', 'color', 'manufacturer'). In the example below, eagle is both a bird and a bird-of-prey. It has a part, wings (see the 'has-a' link). The max wingspan is given as 1.5 m. Note that 'bird-of-prey', 'bird', 'eagle' and 'wings' are concepts and '1.5 m' is a value. Semantic networks are useful in expressing knowledge that has many relationships, e.g. relationships between various entities of the government.

Fig 3. Eagle Example of Semantic Networks
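The eagle example in Fig 3 might be encoded as follows (an assumed Python sketch; each link is a from-node, label, to-node triple):

links = [
    ("eagle", "is-a", "bird"),             # structural links
    ("eagle", "is-a", "bird-of-prey"),
    ("eagle", "has-a", "wings"),
    ("wings", "max wingspan", "1.5 m"),    # descriptive link to a value
]

def related(node, label):
    return [t for (f, l, t) in links if f == node and l == label]

print(related("eagle", "is-a"))            # prints: ['bird', 'bird-of-prey']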


Rules

Rules are a representation scheme based on the way we react to situations. Ultimately these are derived from the S-R (stimulus-response) paradigm in psychology. A rule has the following format:

If A then B else C {else part is optional}

where A is a condition or situation and B/C is a conclusion or an action. Conditions can be combined, or 'And-ed', together, but they have to be expressed in terms of variables, such as 'if temperature is high' or 'if coolant level is low'. Consider another example: 'if patient has a high temperature and spots on the face then patient is suffering from measles.' This is expressed as a rule as follows:

If patient_temperature = high and spots_on_face = yes then patient_illness = measles.

Rules are useful in diagnosis/recommendation problems in narrowly defined domains such as selecting stocks,

currency trading, locomotive fault diagnosis or CIM configuration.
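The measles rule above might be sketched as follows (a minimal illustration, assuming Python; the notes do not prescribe a language):

def diagnose(patient_temperature, spots_on_face):
    # If patient_temperature = high and spots_on_face = yes
    # then patient_illness = measles {else part is optional}
    if patient_temperature == "high" and spots_on_face == "yes":
        return "measles"
    return "unknown"                       # the optional else part

print(diagnose("high", "yes"))             # prints: measles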

Branches of AI

AI is concerned with the principles and mechanisms for achieving intelligent behavior in machines. Initial

research on AI branched into a number of areas as follows:

Natural language processing

Natural language processing is concerned with understanding speech and written text. When a system can

understand natural language inputs, it can carry out a number of tasks such as taking dictation, translating

from one language to another etc.

Robotics

Robotics is concerned with giving machines human-like movement so they can perform useful tasks such as assembly and driving. This is one of the more developed branches of AI, as industrial robots are very common and currently carry out 20% of manufacturing work. Driverless vehicles will be deployed in the near future.

Vision/perception systems

Vision/perception systems are concerned with giving machines human-like vision, so they can perform tasks such as object recognition or machine inspection. Typical applications are recognizing employees at a factory gate or examining finished parts in a factory. A more spectacular application is the driverless vehicle.

Expert systems

Expert systems are programs that incorporate human expertise and function like human experts. Some of

the early systems diagnosed medical problems (what is the name of this?) and identified chemical compounds

(what is the name of this?). Contemporary systems can perform any number of tasks, including adjusting roller height in a paper mill, restoring power in a ship under attack, configuring CIM etc.

Machine learning

It’s a branch of AI concerned with automatic acquisition of knowledge by machines from successful and

unsuccessful cases. It can be used to predict exchange rates, detect spam from regular mail, recognize a

stressed traveler and many other applications.


Expert Systems

An expert system mimics a human expert by incorporating his/her knowledge. It can make

judgements/recommendations in the same way that a human expert can. Generally, knowledge in an expert

system is incorporated in the form of rules such as ‘if a person has a fair credit history but a poor bank

balance then he/she is a poor credit risk’.

Expert Systems Architecture

The expert systems architecture is shown in the diagram below.

User Interface: This is the interface through which the user enters information about the problem. For a credit

application, a person might enter information about his/her bank balance, his/her credit history etc.

Inference Engine: The inference engine is used to draw conclusions from rules. Consider a car that does not start and whose headlights do not turn on, and the following two rules:

Rule1: If engine does not crank then problem could be battery or starter motor.

Rule2: If headlights do not turn on then problem is battery.

Based on this information and the two rules, the inference engine can conclude that the problem is the battery (a sketch of this inference appears after Fig 4 below).

Knowledge Base: The knowledge base consists of knowledge about the situation, generally in the form of

rules.

KA Subsystem: This is essentially an editor to enter rules in a user-friendly way (imagine an email message, but with prompts for 'if', 'then' and 'else' rather than 'from', 'to' and 'subject').

Fig 4. Expert Systems Architecture
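The car example above might be sketched as follows (an assumed Python illustration of how an inference engine matches rule conditions against facts, not the architecture's actual implementation):

facts = {"engine does not crank", "headlights do not turn on"}

rules = [
    ({"engine does not crank"}, "problem is battery or starter motor"),
    ({"headlights do not turn on"}, "problem is battery"),
]

# fire every rule whose conditions are all present in the facts
conclusions = [result for conditions, result in rules if conditions <= facts]
# both rules fire; the conclusion consistent with both is the battery
print(conclusions)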

Neural Networks:

Neural networks are mathematical models that simulate the neural structure of the brain; they are often used in applications requiring pattern recognition, e.g. crime, fraud and intrusion detection.


Fig 5. Neural Networks

The diagram on the left (above) shows that the human brain consists of neurons, or nerve cells, that are connected by microscopic fibers called dendrites. A neuron is 1/100th the thickness of a human hair. Neurons are activated by signals received from the senses. If a neuron is activated and, under certain conditions, activates its neighbors, it is said to fire. When this happens on a repeated basis, a pattern is formed. Recognition, then, is the matching of the incoming signal to this pattern through the process of firing neurons. The conditions of activation are the strength of the signal and the frequency of association. For example, we are more easily able to recognize common objects such as cars and computers than less frequent objects such as hydraulic presses or capacitors. The diagram below models an individual neuron. Individual neurons have an Activation Function (AF) that is set by the designers of the neural net. An AF typically has variables and weights. Suppose we set an AF of X + Y - 2 > 0, where X and Y are inputs with weights of 1 and 1, and 0 is the threshold.

Fig 6. The Model of an Individual Neuron

For X = 1, Y = 1, the output of the neuron will be 1 + 1 - 2 = 0, so the neuron will not fire. If X = 2 and Y = 1, the output will exceed the threshold and the neuron will fire. Thus individual neurons are modeled. With a neural net program, the weights are automatically adjusted by the program so that inputs match outputs for a particular application. Applications could be in predicting currency rates, selecting stocks or predicting water flow into a dam. For each of these applications, the neural net has to be trained to recognize the pattern.
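The individual neuron just described might be sketched as follows (assumed Python), using the activation function X + Y - 2 with a threshold of 0:

def neuron_fires(x, y, w_x=1, w_y=1, threshold=0):
    activation = w_x * x + w_y * y - 2    # the AF: X + Y - 2
    return activation > threshold          # True means the neuron fires

print(neuron_fires(1, 1))                  # prints: False (1 + 1 - 2 = 0, not above 0)
print(neuron_fires(2, 1))                  # prints: True  (2 + 1 - 2 = 1, fires)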

For example, a neural network can be trained for stock selection by being given a large number of successful and unsuccessful stocks. In practice, layers of neurons are required for the program to establish a connection between inputs and outputs. In the example below, the neural net is intended to associate demographics and calling habits with whether a cellular customer is 'loyal', a 'hopper' (a person who goes for a better deal) or 'lost'. Note that there is a layer of neurons between the inputs and outputs and that the system requires a large number of cases before it can perform the task of classifying customers.


Fig 7. A Neural Net Example Associating Demographics with Customer Behavior

Challenges of AI

While the popular press claims that the Turing test has been passed, technologies such as Honda's robot, IBM's Deep Thought and Siri give only a semblance of intelligence in specific situations. There are fundamental challenges in natural language processing, knowledge representation, speech and object recognition, to name a few. Research online for some of these.

Business applications of AI

Business applications of AI are its uses in organizations. AI applications are found in all functional areas, including marketing, production, accounting and finance, although there are more applications in production because of the (mostly) repetitive nature of the work.

Marketing applications include data/text mining (what might be examples of these?), systems to support

pricing and negotiation; and automated response systems for customer service.

Production applications include design of machinery such as turbines, robotics to handle materials and

repetitive processes like painting and scheduling of autonomous cranes at a busy harbor.

Accounting and financial applications include expert systems for detecting accounting irregularities, neural

nets to predict currency exchange price movements, stock selection and credit approval.

Industrial applications of AI

Industrial applications of AI are the non-business applications such as driverless vehicles, facial recognition,

pothole detection and crime prevention.


KNOWLEDGE MANAGEMENT

KNOWLEDGE-CENTRIC DRIVERS

Organizations are under competitive pressures to improve productivity and earn a profit. For example, there

are 282 car manufacturers in the world competing against one another for the customer’s purse. Previously

the approach to productivity was to attempt improvements to the technical system. What would be an

example of this? A new method of improving productivity in business is to leverage the intellectual

assets/intellectual capital of the organization. Thus a new driver of productivity in organizations is to reap

benefits from intellectual capital such as employees' knowledge, patents, reports and customer de-briefings rather than to strive for mechanical efficiency improvements. Knowledge is regarded as the new organizational resource.

KNOWLEDGE VS INFORMATION VS DATA

We are already familiar with one difference between data and information. We stated earlier in the course

that data are raw facts while information is organized data or elaboration of data. Now one should think of

information as processed data, where data could be filtered, contextualized, summarized and/or categorized

and knowledge as processed information. A company could receive customer feedback on its computers (what is

this – data, information or knowledge?) and classify it as: a) monitor problems, b) keyboard problems, c) disk

drive problems etc. This classified data is of more value to the organization and to decision makers than the

raw feedback.

Fig 1. Information Vs Data

DIFFERENCE BETWEEN INFORMATION & KNOWLEDGE?

There is also a corresponding difference between information and knowledge. Knowledge can be thought of

as human-processed information. When information is interpreted, analyzed, refined and annotated, it

becomes knowledge. An executive could review financial statements (information) and conclude that the

company is losing money in operations (knowledge).


Fig 2. Information Vs Knowledge

CASE OF TI

Texas Instruments, like any large organization, deals with a lot of paper. There are 3,100 data sheets relating to

its semiconductor products, each averaging about 12 pages in length. TI produces and maintains about 50

user guides, each of which averages 250 pages. TI supports its products with 400 application notes, each of

which is between 2 to 100 pages in length. The application notes support the product by providing additional

information. TI revises about 90,000 pages of documentation every year. The company has an army of 100

technical writers, 5 illustrators, and 10 team leaders who collectively manage this process. Is this data,

information or knowledge?

ANOTHER VIEW OF DATA, INFORMATION & KNOWLEDGE

The following highlights the difference between data, information and knowledge. Details about ethylene

such as chemical formula, density, molar mass, solubility in water are chemical properties of ethylene and

would be considered data (see first diagram below). On the other hand, the fact that it is used in the production of ethylene oxide, ethylene dichloride, ethylbenzene and polyethylene (its applications) is information. It

is useful to place ethylene in a manufacturing context (see second diagram below). Finally, how reactions of

ethylene are carried out or how Ethylene Oxide is produced are examples of knowledge because this can be

used to manufacture the product. Here knowledge can be thought of as refinement of information or alternatively

as actionable information (see third diagram below).


Fig 3. The difference between Data, Information and Knowledge

DEFINING KM & ORGANIZATIONAL KNOWLEDGE

In this section, we will attempt to gain a greater understanding of knowledge management and organizational

knowledge.

KNOWLEDGE MANAGEMENT (KM)

The explicit management of organizational knowledge, including tools and processes to create, store, access

and disseminate organizational knowledge is called Knowledge Management. So KM efforts imply that the

organization consciously cultivates and manages its knowledge as a resource. A simple example of KM is to

manage a company’s ‘patent real estate’ using a simple database tool. This allows employees to search and

reuse the patent knowledge. Use of IT to support KM, though desirable, is not always possible. KM

approaches can be classified into hard and soft approaches depending on whether they are technology oriented

or management oriented.

ORGANIZATIONAL KNOWLEDGE

Knowledge is defined as a mixture of experience and judgements that can form the basis for action. There are

two types of organizational knowledge, tacit and explicit. Tacit knowledge is knowledge embedded in human

agents, cognitive & production processes. Examples of these include: how to detect when it is time to

concede in a negotiation, how to detect when a chemical reaction has been completed, to know if an

interview candidate will make a good hire, who will be a good distribution partner, how to speak a language.

Explicit knowledge is knowledge encoded in the form of memos, procedure descriptions etc. Examples are

procedure manuals, TI’s ‘application notes’, information from past projects etc. (i.e. assuming they have

annotations and judgements). Explicit knowledge is more abundant and amenable to AI approaches, whereas

tacit knowledge is hidden in the minds of employees and is the more valuable of the two. A major task of knowledge

management is to make tacit knowledge explicit. This is often carried out by identifying experts, capturing their

knowledge and encoding it. This is discussed in the next section.


KM CYCLE & APPROACHES

The Knowledge Management Cycle defines how knowledge is identified, captured, transferred and measured.

The cycle consists of:

a) Defining strategy – this is concerned with identifying the subject area to focus on for the KM effort. In

the case of HP (discussed later) the focus was on ERP consulting activity and for Rolls Royce it was the

design, development and maintenance of the supersonic engine. Usually it is carried out with the help of a

knowledge map or a high level view of the organization’s knowledge. (What phase of the software development

cycle does defining strategy correspond with?). The knowledge map indicates the scope of the KM effort

(this should be somewhat similar to what?)

Fig 4. An example of a knowledge map

b) Identifying sources of the knowledge viz. subject matter experts (SME) – these are individuals with

knowledge about the domain. They are identified by surveying department heads and making a shortlist from the responses.

c) Generating the knowledge – since knowledge is in the tacit form it has to be made explicit via debriefings,

interviews etc.

d) Capture and codify – when knowledge is generated, it has to be captured in a suitable form to support

retrieval at a later stage. Knowledge could be captured in the form of documents, using multi-media or with

the help of artificial intelligence approaches. This issue is discussed subsequently.

e) Transfer/absorb – knowledge that is captured is made available to employees who need it. This can be done through web sites, database approaches or customized applications.

f) Use/measure – knowledge usage is monitored to determine the effectiveness of the KM system. Rewards

can also be given. We will now discuss selected aspects of this cycle – Expert identification, knowledge

generation, capture and codification, usage and assessment.


Fig 5. The Knowledge Management Cycle

CAPTURE AND CODIFICATION

Knowledge can be captured using various technologies such as:

Social networking -- Knowledge is gathered from posts presumably made by experts on social media

e.g. LinkedIn, CodeSnipp etc.

Document based – In this technology first popularized by Lotus Notes (a software for sharing

notes), each knowledge item is treated as a 'document'. A user can create a document (such as one about

how to design aircraft engines to be quiet) and others can respond to it (just like in an email).

Ontology based – An ontology is a formal classification of knowledge such as in welding (‘arc

welding’ ‘gas welding’..). Ontology based approaches provide a categorization of knowledge and use

it to store and retrieve knowledge.

AI based – In these approaches traditional knowledge representation techniques are used to store

knowledge.

DOCUMENT BASED KM SYSTEMS

Document based KM systems store knowledge as documents, e.g. policy knowledge, problem resolutions etc. This is a relatively simple approach in which an intranet or a collaborative tool such as 'IBM Notes/Domino' (an OIS) is used to store knowledge as documents. With such a system an employee can create a document using a template. For example, a software bug can be reported with the fields 'created by', 'date', 'type of', 'keywords' and the actual bug report. It is a simple approach that can be implemented easily, but it suffers from the same problem that search engines have: knowledge is retrieved by using keywords. Suppose the query were 'find an alloy that is lighter than titanium and has at least three quarters of its strength'; typing the keyword 'alloy' could lead to hundreds of results. Also, effort is required to keep the knowledge up-to-date – a maintenance challenge (where did we first come across the term maintenance?).


Fig 6. Store Knowledge as Documents.
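A sketch of the approach (assumed Python; field values such as 'jdoe' are hypothetical), showing why keyword retrieval falls short:

documents = [
    {"created_by": "jdoe", "date": "2017-03-01", "type_of": "bug report",
     "keywords": ["alloy", "fatigue"],
     "body": "Crack observed in an alloy bracket after 500 cycles."},
    # ... hundreds of other documents also tagged 'alloy'
]

def search(keyword):
    return [d for d in documents if keyword in d["keywords"]]

# keyword search can find every 'alloy' document, but cannot express
# 'lighter than titanium with three quarters of its strength'
print(len(search("alloy")))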

ONTOLOGY BASED APPROACHES

Ontology is a scheme for characterizing and classifying knowledge so that it can be accessed easily. For

example engines could be classified into a) human powered, b) steam powered, c) gas powered etc. The best

way to understand ontology is to consider the Dewey decimal system. This is a system for properly shelving

books in libraries. In this scheme, each subject is given a number, e.g. 000 for 'general works, computers and information', 200 for 'religion', 600 for 'technology', 800 for 'literature' etc. Thus "The Adventures of

Sherlock Holmes” by Arthur Conan Doyle will have a call# of 823.91, being in literature. Ontology is a

scheme to similarly categorize knowledge items so that they can be retrieved easily with a standardized set of

keywords. (In what way is it an improvement over document based approaches?). Another way to think of

an ontology is as a formal description of a domain (e.g. welding).

THE DAST ONTOLOGY

DAST is a simple ontology to retrieve government resources online. It uses the keywords, ‘Data type,’

‘Agency’, ‘Action’, ‘Subject’ ‘Location’ (not shown below) and ‘Time’ as shown in the diagram and discussed

below.

Data Type: This describes the fundamental nature of the data, whether it is regulatory, descriptive, statistical,

ownership related etc. Thus information about an overseas embassy would be descriptive while the date a

license was issued falls into the sub-category of “ownership.”

Action: Describes the action associated with the subject that we are interested in. If there is none, the default

is “stative.” Thus in “pork consumption”, “consumption” is an action associated with the subject. Multiple

subjects will be associated with multiple actions. In queries involving regulations or ownership of property,

the action would be “stative.”

Agency: This describes the organization or organizations involved in the query. Examples include Microsoft,

the Federal Trade Commission, National Space Center etc. The default is “any.”


Subject: This describes the subject or main focus of the query and corresponds to the grammatical notion of

the concept. Subjects can be abstract or physical. There can be more than one subject.

Time: This attribute is used if there is a time dimension for the subject such as a license valid for a certain

duration or an expiry time for a tax credit. The time could be a specific point such as August 3rd,

2013 or a range such as 2000-2013.

Location: This attribute is used if there is a physical location associated with the subject. There can be more

than one location associated with a subject as in a ship going to multiple ports.

Fig 7. The DAST Ontology

EXAMPLE KNOWLEDGE ITEMS

Item1 is concerned with per capita meat consumption in the U.S. Its most prominent qualities are its time period and data type: the data covers the decade 2002-2012 and is statistical in nature. When querying for this data, a template similar to that shown below (Fig 9) may be used.

ITEM 1: Excerpted from the U.S. Department of Agriculture Economic Research


Fig 8. The Per Capita Meat Consumption in the U.S

Fig 9. A template for querying Item#1

The template above indicates that a person is searching for 'statistical' data from the 'USDA' concerning 'meat' during the time period '2002-2012'.

Item2 is concerned with the minimum wage and overtime pay for hourly employees. Its only prominent quality is the type of data, which is regulatory in nature. When querying for this item, a template similar to that shown below (Fig 10) may be used.

ITEM 2: Excerpted from the U.S. Department of Labor, Fair Labor Standards Act [http://www.dol.gov/elaws/esa/flsa/screen5.asp]. "The FLSA has been amended on many occasions since 1938. Currently, workers covered by the FLSA are entitled to the minimum wage of $8.15 per hour and overtime pay at a rate of not less than one and one-half times their regular rate of pay after 40 hours of work in a workweek."

Fig 10. A template for querying Item#2

The template above indicates that a person is searching for 'regulatory' knowledge from 'any' organization concerning 'minimum wage' or 'overtime'.
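The two templates might be represented as follows (an assumed Python sketch; attribute names follow the DAST keywords described above):

item1_query = {
    "data_type": "statistical",
    "agency": "USDA",
    "action": "consumption",               # the action associated with the subject
    "subject": "meat",
    "time": "2002-2012",
}

item2_query = {
    "data_type": "regulatory",
    "agency": "any",                       # the default agency
    "action": "stative",                   # regulations default to 'stative'
    "subject": ["minimum wage", "overtime"],
    "time": None,                          # no time dimension
}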


ANOTHER EXAMPLE OF AN ONTOLOGY (STEEL INDUSTRY CASE STUDY)

Fig 11. Terminology diagram

The diagram above (Fig 11) shows the terminology; the diagram below (Fig 12) shows the ontology for a steel coil. There are two types of coil that are ordered, dual phase and HSLA. For quality purposes, a steel coil is sampled at points sp1, sp2, etc. The length scope is the distance between sampling points. Defects can appear at sampling points, as indicated by d1, d2 etc. Defects can be mechanical or material defects (only mechanical are shown above). Mechanical defects include temperature, skin pass, yield-strength etc. Defects can cause phenomena such as discoloration or stress that can extend into the coil.

Legend: x – x axis; sp – sampling point; d – defect; ls – length scope

Fig 12. Ontology for steel coil

AI BASED APPROACHES

Artificial Intelligence based approaches use AI technologies to manage organizational knowledge. The great

advantage is that knowledge can be accessed by querying the knowledge base in a natural language (i.e.

spoken or written commands). Unfortunately, in this approach, even simple examples of professional

knowledge such as “Insurance is the obligation to compensate the insured in the event of a loss” are


surprisingly difficult to represent. The problem is that concepts such as ‘obligation’, ‘compensate’, ‘event of a

loss’, and ‘loss’ are abstract and therefore challenging and awkward to represent. At the present time the state

of the art of AI technology permits only representation of administrative support knowledge. This can be

used to represent knowledge such as "Van leaves DBSS at 8:00 am" or that "DBSS cannot own fixed assets." The approach uses systems developed in AI languages such as Visual Prolog.

THE AEI-3 SCHEME

A specialized representation scheme known as AEI-3 has been developed to store administrative support

knowledge. There are two types of links, structural (“S”) and descriptive links (“D”). ‘S’ links can designate

any structural relationship such as part-subpart (“has_a”) or class-instance (“is_a”), while “D” links designate

attributes and are defined between classes, instances and/or “extensions”. In the example below ‘Rangarajan’

is asserted as an instance of ‘Instructor’ through the ‘s:is_a’ link. Note the use of a double walled rectangle

for class and a single bordered rectangle for an instance. There is a ‘D:’ link between ‘Rangarajan’ and ‘C++’

showing that ‘Rangarajan’ teaches ‘C++’. There are no restrictions on link descriptors (“travel arrangements”)

except that they be minimalist.

Fig 13. An Example of AEI-3 Scheme

The second diagram below shows that Susan is an ‘instructor’ and that she is ‘available’ in ‘August’ which is

an instance of ‘month’. The collection of such items becomes a knowledge base. Such information cannot

be captured in conventional/rule-based knowledge bases. Note that it is necessary to capture, in this

knowledge base, common knowledge that ‘August’ is a ‘month’.

Fig 14. An Example of AEI-3 Scheme
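The two AEI-3 examples might be stored as follows (an assumed Python sketch; the link labels 'teaches' and 'available' are illustrative, not prescribed by the scheme):

kb = [
    ("Rangarajan", "s:is_a", "Instructor"),
    ("Rangarajan", "d:teaches", "C++"),
    ("Susan", "s:is_a", "Instructor"),
    ("Susan", "d:available", "August"),
    ("August", "s:is_a", "Month"),        # common knowledge must also be captured
]

def describe(node):
    # return the descriptive ('d:') links for a node
    return [(label, target) for (src, label, target) in kb
            if src == node and label.startswith("d:")]

print(describe("Susan"))                   # prints: [('d:available', 'August')]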


KM USAGE – ASSESSMENT

The objective of the assessment is to gauge the effectiveness of the KM program. One approach is to assess knowledge levels in different areas of the organization. The assessment is carried out by department managers and/or senior managers. Weaknesses, if identified, become the targets of next year's KM focus.

EVALUATE KNOWLEDGE

Knowledge is evaluated in areas such as ‘customer/market’, ‘employee/organization’, ‘product/service’. In

these areas, progress in knowledge is evaluated along the dimensions of ‘experience’, ‘image’, ‘fixed form’ i.e.

some outputs, and 'system'. Managers are asked, 'Compared to the previous year's levels, how do you rate

customer knowledge?’ and similar questions for other items of knowledge. They assign a number based on a

base of 100 (not maximum of 100).

As the chart below illustrates, experience knowledge seems to have decreased significantly in all three areas

i.e. customer/market’, ‘employee/organization’, ‘product/service’. Customer image has decreased

significantly in one area, slightly in another area and slightly increased in the third area. ‘Fixed form’ and

‘system’ have increased across all three areas. The knowledge assessment highlights the problem areas.

Fig 15. An Example of a Knowledge Audit

CASE STUDY OF KM INITIATIVE AT HP CONSULTING

The following case study describes the implementation of Knowledge Management (KM) initiatives at HP

Consulting in the early 2000’s. The company provides global consulting in managing IT services, PC support,

CRM and ERP. The KM Initiative was motivated by client demands for expertise, competitive pressures and

the global nature of the business. The objectives were to deliver more value and expertise to clients without

increasing the number of hours worked and to make knowledge sharing part of the corporate culture.

A pilot program was undertaken at two sites in North America. Management regarded these sites as good

candidates because only a few consultants there had expertise in the area and it had to be shared quickly. A

four step change model was utilized. Employees would be first mobilized by educating them about the


initiative and its vision. The vision emphasized the benefits and values of the initiative (such as knowledge sharing being a valued behavior). This was operationalized through programs designed to share knowledge

including Learning Communities, Project Snapshots and Knowledge Mapping. In the transition stage, the KM

culture was propagated to the rest of employees in two day workshops. Despite initial resistance (due to

perceptions that it was 'fluff' and a waste of time), the workshops were successful in introducing knowledge-sharing relationships among participants. They quickly formed learning communities and began to share their

expertise. Anecdotes regarding knowledge-sharing helped accelerate this trend. Outcomes included reduced

consulting time, re-use of trade material, increased levels of expertise and enthusiasm for the project.

With the success of the pilot programs, management of intellectual capital became an essential part of HP

Consulting’s strategy and was quickly incorporated into its knowledge processes.

CASE STUDY OF SIEMENS SHARENET

This case study concerns the Siemens Sharenet, an IT-based KM system. Siemens is a large telecommunications conglomerate with headquarters in Munich and sales of over €60b. It employs over

400,000 people in 190 countries. Up until the mid-1990s, Siemens dominated its markets, but

telecommunications deregulation and change in data traffic worldwide forced it to change from a simple

manufacturer of ‘boxes’ to a provider of customer communications solutions. As part of its re-organization,

the carrier and enterprise branches of Siemens were merged into a new group called the Information and

Communications Network (ICN). The new group had 65,000 employees and € 12b in sales. The case is

focused on this group. Since the company was shifting its emphasis from a "products" provider to a "solutions" provider, it needed to leverage the expertise of its employees.

The president of strategy, Joachim Doring, invited a select group of top sales persons and developed a

detailed map of the selling process and identified broad categories of business knowledge relevant to each

stage. They planned for the system to use familiar technologies such as intranet, email, library and a forum.

It differed from other KM systems by being a multi-platform approach. The core part of the system consisted of a library holding knowledge objects. There were two types: 'environment objects' and 'solution objects.'

The environment objects held knowledge about competitors, markets, projects, customers etc. – "how did you

approach a customer?,” “what factors drive the customer’s decision process?” while the solution objects focused

on technical and functional solutions to specific situations – “how do you lay a cable in the jungle?” “how do

we set addresses for an IP-based phone system?" To enter knowledge objects, employees filled in a series of

structured questions, “What type of technology is it?” “What are the key features?”. A forum was used for

urgent requests, for example "How dangerous is it to lay cable in the jungle?"

After several meetings, the first version of Sharenet was developed by a web development company in April

1999 and pilots were run in China, Malaysia, Portugal and Australia. The pilots were not successful as the

system was difficult to use. Subsequently, another company was hired and it revamped Sharenet so it was

easier to use even in places where bandwidth was limited. The next challenge was getting employees to

use the system. To accomplish this, 2-3 day workshops were held. Participants were asked to identify

unsolved work problems and encouraged to post their problem in Sharenet. To their amazement, most

participants received solutions in a day or less. To further encourage usage, "evangelist" managers in other

countries were recruited to help propagate Sharenet. Technical support was provided by consultants at the

Munich headquarters, who also ensured that the content that was submitted was useful and clear. By 2000

there were 3,800 users and the $7.8 m operating cost was absorbed by headquarters. One of the problems

encountered was the reluctance to share knowledge. To address this issue, a system of reward points called 'Sharenet shares' was introduced. An employee could earn 3 shares by answering an urgent request or 20 shares for providing a success story/technical solution. Each share was equivalent to a Euro and could be

redeemed for company products, textbooks or business trips. This resulted in an increase in contributions.

Control of content was also shifted back to the functional areas. Each area of the organization had a

designated “global editor” who reviewed and published the content. The total number of objects published

worldwide was approximately 20,000. One of the success stories was the winning of a $460,000

telecommunications contract for a hospital that made use of the knowledge in Sharenet to prove that the

Siemens network was more reliable. By July 2002, Sharenet had 19,000 subscribers.

The system saved time for employees as well as for Siemens and provided visibility for “unsung” experts in

the organization. Employees earned rewards for contributing their knowledge. Approximately 2.5 million

“shares” were earned by employees. The system played a key role in a total of twenty seven projects

representing € 120 million. The system also earned an award from the American Productivity and Quality

Center. Sharenet was unfortunately discontinued in 2003 due to financial problems in the parent company

and the unwillingness of division executives to share in the costs.

CASE STUDIES OF TECHNICAL APPROACHES

Fig 16. Case Studies of Technical Approaches