©Copyright 2017 by Chandra S. Amaravadi. All rights reserved. Electronic reproduction or distribution is strictly prohibited.
IS 524 CORPORATE INFORMATION SYSTEMS
SUPPLEMENTARY NOTES
Chandra S. Amaravadi April 26, 2017
Part III
Management Support Systems I Management Support Systems II Artificial Intelligence Knowledge Management
MANAGEMENT SUPPORT SYSTEMS-I
Introduction
Business complexity has been increasing since the 1970s. There are now more products and payment
options than ever before. Competition aggravates the situation by forcing organizations to change strategies
more frequently. Greater globalization means that more co-ordination is necessary. All of these imply a
greater reliance on management support systems. The evolution of information systems has been discussed
in a previous chapter. DSS evolved from reporting systems, which grew out of transaction processing systems.
A DSS allows data to be analyzed using models. The DSS concept led to GIS, GDSS and BI systems. A GIS
overlays statistical data on a geography, the typical example being automobile sales shown on a map of the
country (BI systems provide similar capability). BI systems are based on warehouses, which evolved out of
DBMS technologies. GDSS evolved into collaborative systems. Thus DSS, GDSS, GIS and BI systems are all
grouped under MSS.
Fig 1. MSS Evolution
Management Support Systems
Management support systems are those which support managerial activity. This raises the question: what is
managerial activity? Decision making is obviously one important managerial activity, and IT support for
managerial decisions can have its benefits. Hence MSS is an important area of Information Systems. MSS
have some characteristics – they tend to be interactive and customizable. Interactive means that the decision
maker is actively involved with the system and can change parameters, models and data. Contemporary
businesses cannot operate without the use of MSS. Some MSS are model-based in the sense that there are
algorithms or mathematical equations for solving the problems.
Types of Decisions:
MSS are intended to support all managerial decisions. The types of decisions faced in organizations range
from those that are structured to those that are unstructured. Generally speaking, structured decisions are those
where the assumptions, the problem and the solution method are all known. In unstructured decisions, some
or all of these are not known. For example, whether to approve a loan, whether to promote an employee or
how much raw material to order are structured decisions. A company could be experiencing a decline in sales
and management may not know the reason – whether it is due to product quality, pricing or consumer whims.
These unknowns lead to an unstructured decision situation and become hypotheses to be tested. Unstructured
decisions involve more variables and are therefore more complex than structured decisions. They also have
a greater impact on the organization.
Decision Making Styles:
Just as there are structured and unstructured decisions, there are also different decision-making styles.
These vary on a continuum from analytical to intuitive. Analytical decision makers follow a rational
decision-making style: they analyze a problem, develop alternatives and identify the alternative that
maximizes the decision maker's utility. Intuitive decision makers use no explicit models; rather, they gather
information and opinions by talking to colleagues, walking around company facilities and meeting vendors,
and then abruptly produce a decision. If asked to explain their reasoning, they cannot.
Fig 2. Decision Making Styles
Simon’s phases of decision making (IDC):
Herbert Simon postulated that rational decision makers move through a systematic process while making
decisions. This is characterized by IDC: 'Intelligence', 'Design' and 'Choice'. In the intelligence phase, the
decision maker collects information on the problem. During the design phase, the decision maker develops
viable alternatives for the problem. Finally, during the choice phase, the decision maker selects the alternative
that maximizes his or her decision-making utility – reducing costs, increasing profits, gaining market share, etc.
For example, suppose a manager wants to buy a plane for his or her company. The first step is to gather
information about the problem – how many people will need to travel, what range, domestic or international
travel, budget, etc. The next step is to identify alternatives, which could include outright purchase, lease or
fractional ownership. The last step is to identify which alternative maximizes the utility to the company – in
this case, fulfilling the needs at the lowest cost. A DSS allows decision makers to list and compare these
alternatives and evaluate the decision.
Fig 3. Individual model of Decision Making/ Rational model
Organizational Models of Decision Making:
Just as there is an individual decision-making model, there is also an organizational model – the organization's
way of making decisions. This is the overall decision-making style that is prevalent in the organization. You
are already familiar with the rational model: gathering information about the problem in a systematic way,
identifying alternatives and choosing the alternative that maximizes the decision maker's utility. Then there
is the bureaucratic model, in which decisions are made under constraints. For example, consider a department
at a state university needing a faculty member, and let's say that a faculty member from the department goes
to a conference to recruit candidates. Even if the faculty member finds a good candidate, he or she cannot
hire them on the spot. The search has to be advertised in national media to attract all candidates. These
candidates need to be selected and ranked, and reasons have to be provided for the selection and ranking for
the purposes of affirmative action. The selected candidates are then called for an interview. But unfortunately,
top candidates also apply elsewhere and interview at other places (because they are rational). The candidates
still need to be brought on campus, wined and dined, and interviewed. Until all of the top three candidates
are interviewed, none of them can be given an offer. This illustrates decision making under constraints. The
political model is one where decision making is clouded by politics (a tug of war between decision makers) or
decision makers act to maximize their own interests. The classic example is Jim Barksdale of Netscape who,
according to one magazine, took the company public because he had to make a down payment on his yacht.
Here an individual acted on personal interest; otherwise it would have been advantageous for the company
to wait a few more years before its IPO.
Biases in decision making (No ideal decisions)
Decision making is fraught with human biases.
a. Bounded rationality – Because of human cognitive limitations, decision makers can process only a limited
amount of information at any given time. For example, a manager may not be able to review more than
100 applications in a day. Computers have this problem too: in linear programming, the technology is
unable to handle problems with more than 120 constraints.
b. Anchoring and adjustment biases – In unstructured decision-making situations, many problem variables are
not known and need to be estimated. When estimating, we base our estimate on the most recent value.
If we are asked to estimate the value of the Nasdaq in 20 years, we might start with 4,849. Since that
value is somewhat inflated, the estimate may be completely wrong.
c. Localized search/uncertainty avoidance – It has been found that decision makers search for solutions in
their area of comfort and avoid as much uncertainty as possible. If a company already has software in
place, it will try to continue using it rather than acquire different software.
d. Heuristics – Heuristics are shortcuts in decision making. For example, if a person is asked to estimate the
Nasdaq in 20 years, he or she may use a shortcut such as four times the current value.
e. Constraints – Sometimes time or resource constraints force a decision. For example, candidates need to be
hired before a certain date and contracts awarded before a deadline.
Decision Support Systems
A DSS is a system that supports structured and semi-structured decision making by managers in their own
personalized way, e.g. to decide on projects, investments, etc. DSS provide support for decisions where there
is some degree of structure. They do not support unstructured decisions, simply because such decisions are
difficult to model, let alone support with computer programs. DSS are intended for use at the operational
and tactical levels (lower and middle management, but not strategic).
Fig 4. Overview of DSS
The diagram above gives an overview of DSS concepts. We have seen that decisions can be structured,
semi-structured or unstructured. Unstructured decisions cannot be supported with a DSS, as noted earlier,
since the variables and solution method may not be known. Regardless of the type of decision, the decision
has variables or parameters – interest rates, initial investment, degree of risk, etc. Decision parameters are
entered into the DSS through an interface called Dialogue Management, which allows problems to be
entered. The decision is represented in an abstract fashion through formulae, referred to as a decision model.
Note that models need not always be mathematical; we can have qualitative models, such as simulation
models, where a definite outcome cannot be calculated. Decision utility concerns the outcome valued by the
decision maker, such as costs, profits, etc.
Classical DSS Architecture:
The classical DSS architecture was introduced in the late seventies and has remained unchanged over the
years. A decision support system has three components, as follows:
Fig 5. Classical DSS Architecture
Dialog management: Dialog management allows the decision problem to be represented and modeled. It is
essentially the user interface. For example, certain types of decisions, such as investment and cash flow, lend
themselves to being modeled in a spreadsheet. Here the interface is two-dimensional and the data is generally
numerical. So what types of problems cannot be represented in a spreadsheet?
Model management: Models are abstract problem representations. Examples include NPV (Net Present
Value) and EOQ (Economic Order Quantity). The model management facility allows decision makers to
create and link models with the parameters/decision data in the process of making a decision. Another way
to think about this is as creating a formula using the decision data.
Data management: Manages decision data. It allows us to sort, filter and select data for the decision, just as
with a database.
DSS Capabilities:
DSS capabilities are the analysis features provided by the system. This terminology is so popular that it has
become part of the business vocabulary.
What if – change one or more variables. What if sales were to increase by 10% instead of 15%, and costs by
8% instead of 5%?
Sensitivity – change one variable, such as the interest rate or growth rate, and see the impact on the bottom
line. If changing a variable can change the solution by a great amount, the variable is said to be sensitive.
Goal seeking – find a solution to satisfy constraints. What level of sales would satisfy the profit target?
Optimization – find the best solution under a given set of constraints. What level of sales would maximize profit?
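These capabilities can be illustrated on a small NPV model. The following is a minimal Python sketch, not a real DSS: the cash flows, rates and the bisection-based goal seek are invented for illustration and do not come from the notes.

```python
# A sketch of "what if", sensitivity and goal-seeking analysis on a
# hypothetical NPV model. All figures below are assumed.

def npv(rate, cash_flows):
    """Net present value of cash_flows, discounted at `rate`.
    cash_flows[0] is the (negative) initial investment at time 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

flows = [-1000, 400, 400, 400]   # initial outlay, then three years of returns

# What-if: change one or more variables and recompute the outcome.
base = npv(0.08, flows)
what_if = npv(0.08, [-1000, 440, 440, 440])   # returns up 10%

# Sensitivity: vary a single variable (the discount rate) and watch the result.
for r in (0.05, 0.08, 0.12):
    print(f"rate={r:.2f}  NPV={npv(r, flows):,.2f}")

# Goal seeking: what discount rate makes NPV exactly zero (the IRR)?
# A simple bisection search stands in for a DSS goal-seek feature.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if npv(mid, flows) > 0:
        lo = mid
    else:
        hi = mid
print(f"break-even rate ~ {lo:.4f}")
```

Here "what if" recomputes the whole model after a scenario change, while goal seeking searches backwards from a target outcome to the input that produces it.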
Classification of Models:
While decisions are characterized by degree of structure, DSS models are classified based on the degree of
certainty in the decision, i.e. how much information we have about the decision situation. Accordingly we
have decision making under certainty (e.g. LP), decision making under risk (e.g. decision trees, Bayesian
analysis, queuing) and decision making under uncertainty (e.g. causal models and influence diagrams).
Uncertainty or risk in the decision is generally modelled with probability.
D.M. under certainty
Linear programming (LP)
Integer Programming
Non-linear programming models
Graph models (e.g. PERT)
D.M. under risk
Decision tree
Bayesian Analysis
Queuing, Discrete Event Simulation
Markov
D.M. under uncertainty.
Causal models
Influence diagrams
Strategic Assumption Surfacing & Testing (SAST)
Linear Programming
This is an example of a model for solving problems that have constraints. The constraints are typically
expressed as linear equations (y = mx + c). Suppose a company is manufacturing three products, ‘A’, ‘B’ and
‘C’. If ‘a’, ‘b’, ‘c’ represent quantities of A, B, C that should be produced and if there is a maximum capacity
of 200 units a day, this is expressed as a constraint:
a + b + c <= 200
If 'A' takes 8 hours to produce, 'B' takes 5 hours and 'C' takes 3 hours, but the maximum number of
production hours available is 150, this is expressed as:
8 a + 5 b + 3 c <= 150
For a given problem there will be many such constraints modelled by linear equations and solved by an LP
program.
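The constraints above can be explored with a brute-force sketch in Python. A real LP program would use a solver (e.g. the simplex method); here integer production quantities are simply enumerated. The per-unit profits (5, 4, 3) are assumed for illustration and do not appear in the notes.

```python
# Brute-force search over the LP constraints from the text:
#   a + b + c <= 200        (capacity)
#   8a + 5b + 3c <= 150     (production hours)
# Objective (assumed): maximize profit = 5a + 4b + 3c.

def feasible(a, b, c):
    return a + b + c <= 200 and 8 * a + 5 * b + 3 * c <= 150

best_profit, best_plan = -1, None
for a in range(0, 19):            # 8a <= 150  ->  a <= 18
    for b in range(0, 31):        # 5b <= 150  ->  b <= 30
        for c in range(0, 51):    # 3c <= 150  ->  c <= 50
            if feasible(a, b, c):
                profit = 5 * a + 4 * b + 3 * c
                if profit > best_profit:
                    best_profit, best_plan = profit, (a, b, c)

print(best_plan, best_profit)   # the production mix that maximizes profit
```

With these assumed profits, product C earns the most per production hour, so the search puts all 150 hours into C.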
Decision Tree
A decision tree is one of the models for decision making under risk. A decision is shown as a tree with
branches; the branches represent outcomes of the decision. Uncertainty is modeled with probability, the
chance that an outcome will occur. Consider, for example, a person who does an MBA (see diagram below).
Let us say that there is a 90% chance of them joining a company and a 10% chance of them going into
teaching. These are shown as outcomes of doing an MBA. Of the people who work in companies, 10%
become managers – and how many become co-ordinators? These are shown as sub-branches. Of the people
who go into academia, some decide to pursue a Ph.D. while others go directly into teaching. Note the use of
squares for decisions and circles for outcomes.
Fig 6. An MBA Example of Decision Tree
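The MBA tree can be sketched as nested outcome probabilities. The 90/10 split and the 10% manager figure come from the notes (so co-ordinators get the remaining 90%); the Ph.D. vs direct-teaching split (30/70) is assumed for illustration, since the notes give no numbers there.

```python
# The MBA decision tree as nested branch probabilities.
# Probability of a complete path = product of the branch probabilities.

tree = {
    "company":  (0.90, {"manager": 0.10, "coordinator": 0.90}),
    "academia": (0.10, {"phd": 0.30, "teaching": 0.70}),   # 30/70 assumed
}

paths = {}
for branch, (p, subtree) in tree.items():
    for outcome, q in subtree.items():
        paths[(branch, outcome)] = p * q

# P(manager) = 0.9 * 0.1 = 0.09, i.e. 9% of MBAs become managers.
print(paths[("company", "manager")])
```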
D.M. Under Risk – Case of SS. Kuniang
This case illustrates the complexity of real-world decision making. The Kuniang ran aground off the coast of
Florida in a storm. The owners wanted to sell it, but the Coast Guard had the authority to sell it. The scrap
value was estimated at $5 million and the repair costs at $15 million.
New England Electric System
NEES is a utility company on the East Coast that uses coal for power generation. Its demand was 4 million
tons/year, but it already had a General Dynamics vessel that cost $70 million and could transport
2.5 million tons/year. It needed additional capacity to transport 1.5 million tons. The company thought it
could refurbish the Kuniang and use it to transport coal. The problem was whether or not to bid for the
Kuniang, and how much to bid. This is not a simple matter; it is tied in with issues of the source of coal and
turnaround times.
Decision Complications
There are some decision complications. The first is whether to use Egyptian coal or PA coal. Egyptian coal
has less sulfur content than PA coal, but the round trip takes 80 days. Further, there is the complication of
the Jones Act, which gives priority to U.S. ships in U.S. ports: if there are ten ships in line at a U.S. port and
the third ship is a U.S. ship, it gets into port first. The only exception is if the repair cost exceeds 3 times the
value of the ship; then it will be considered a Jones Act ship. How is the value to be judged? That depends
on how the Coast Guard values the ship – as scrap, or at the highest bid. Depending on that, the repair costs
necessary to make it a Jones Act ship will change. Adding a crane adds to the cost (helping meet the Jones
Act requirement) but reduces turnaround time.
Decision Options
The decision alternatives are the Kuniang, the Kuniang with a crane, a General Dynamics vessel, or a tug
barge. The next slide shows the data for the four options. The cost of the General Dynamics vessel is $70m
and the tug barge $32m; the Kuniang without the crane costs the bid price plus $15m for repairs, and the
Kuniang with the crane costs the bid price plus $36 million – $15 million for the repair of the ship and
$21 million for the crane. Note that with the crane, the cargo capacity of the Kuniang decreases by 5,750
tons. The round trip takes 5.15 days with the General Dynamics vessel, 7.15 days with the tug barge, 8.18
days with the Kuniang without the crane and 5.39 days with the crane.
Decision Tree of How Much to Bid
A decision tree shows different decision alternatives and decision outcomes. In the chart, squares are
decisions and circles are outcomes. The decision utility is shown on the right-hand side of the decision tree;
here the utility is the total cost and net present value. If the company bids $7m, there is a 50% chance of
winning and a 50% chance of losing. If it wins, it is estimated that there is a 70% chance that the ship will be
valued as scrap and a 30% chance that it will be valued at the bid price. The net present value is highest for
the Kuniang without the crane – the gearless option – so this is the preferred decision option. The company
ended up being more conservative in its bid, and lost to a bid of $10m. Why does the decision tree approach
make sense in this situation? Think about it!
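The roll-back arithmetic behind such a tree can be sketched briefly. The 50/50 win probability and the 70/30 valuation split come from the case as summarized above; the NPV payoffs used below are placeholders for illustration, not the actual case figures.

```python
# Rolling expected values back through a two-level decision tree.

def expected_value(outcomes):
    """outcomes: list of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)

# If the $7m bid wins, the Coast Guard values the ship as scrap (70%)
# or at the bid price (30%); each valuation leads to a different NPV.
value_if_win = expected_value([(0.70, 12.0), (0.30, 6.0)])   # assumed NPVs ($m)

# Roll back one level: the bid's value is the win/lose expectation.
bid_value = expected_value([(0.50, value_if_win), (0.50, 0.0)])
print(bid_value)
```

The same two-step roll-back, repeated for each bid level and each vessel option, is what produces the NPVs shown on the right-hand side of the tree.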
Causal Model
For decision making under uncertainty, there are no models to support the entire decision process (why?).
There are models to represent the causes and effects (causal model), influences (influence diagram) and
decision assumptions (SAST). We will briefly discuss the causal model, which depicts causes and effects in a
decision situation. Consider the following situation: if we assume that 'training', 'satisfaction' with co-workers
and 'better pay' lead to better 'on-the-job' performance, this can be modeled as a causal map.
Fig 7. Causal model
Note the use of '+' and '-': '+' means 'is proportional to' and '-' means 'is inversely proportional to'. For
example, the link between 'on-the-job performance' and 'domestic problems' is negative. This means that if
'domestic problems' increase, 'on-the-job performance' decreases. According to the model, if pay is increased,
on-the-job performance increases. See if you can research other factors affecting 'on-the-job performance'
and add them here! How does a causal model help the decision maker?
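The qualitative reasoning a causal map supports can be sketched in a few lines. The edge signs follow the figure above; note the model predicts only the direction of change, not its magnitude.

```python
# Signed edges into 'on-the-job performance', as in the causal map:
# +1 means 'is proportional to', -1 means 'is inversely proportional to'.

edges = {
    "training":          +1,
    "satisfaction":      +1,
    "better pay":        +1,
    "domestic problems": -1,
}

def effect_on_performance(factor, direction):
    """direction: +1 if the factor increases, -1 if it decreases.
    Returns +1 (performance rises) or -1 (performance falls)."""
    return edges[factor] * direction

# If domestic problems increase, on-the-job performance decreases.
print(effect_on_performance("domestic problems", +1))   # -1
```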
DSS Applications
Decision support systems have been developed in various areas including cash forecasting, stock selection,
event scheduling etc. Following are some examples:
Cash forecasting -- forecasting cash requirements for an organization
Fire-fighting – deciding on a fire fighting technique to use
Portfolio selection – decide on stocks to select for a portfolio
Evaluate lending risk – decide whether to approve a loan or not
Event scheduling – to schedule games at Olympics
School location – decide where to locate schools
Police beat – decide on how to patrol neighborhoods based on crime patterns
Movie forecasting – forecasts the success of movies with a DSS called movieGuru.
Extensions to DSS
As discussed earlier, the DSS concept has been extended to BI, GIS, Collaborative Systems, Expert Systems
and Data Mining, but only BI and Collaborative Systems are discussed here. Other topics are discussed
further downstream.
BI systems:
BI systems are systems that provide information to executives on the business environment. Summary
information is provided in an easy-to-use drill-down format from operational databases. Drill down capability
refers to the ability to view data in increasing levels of detail. Yearly sales of a product for example can be
broken down into monthly and weekly sales. The screens and reports in a BI system can be easily tailored to
the needs of the individual executive.
Dashboard:
An interface that displays the information needed to effectively run an enterprise. A dashboard is a metaphor
for the manner in which the information is displayed. Just as a car has a dashboard which shows the vital
signs of the car (oil level, battery level, engine RPM), a BI dashboard is designed to give an overview of the
health of the enterprise in a graphical, intuitive and easy-to-comprehend manner. Data can be provided in a
variety of formats including bar graphs, trend lines, line graphs, pie charts, etc.
Fig 8. BI architecture
BI Architecture and capabilities:
A BI system draws on data from a number of operational databases. These data are typically stored in a
warehouse and periodically updated.
The system provides a dashboard, or interface, as described above. It also enables variables to be measured
according to scorecards or metrics, which are defined by the individual company. For example, one company
may define the minimum volume of sales per salesperson per month as 500 units. BI systems also support
trend analysis, such as whether sales have increased or decreased relative to previous years. Finally, a BI
system can provide reports that can be customized in a million different ways. A magazine editor may want
to know what articles on diets were printed between 2010 and 2015. Another example of customization is
that product sales data may be listed row-wise or column-wise. Access to data is obviously an important
assumption for BI, as seen in the diagram above. Data is drawn into the warehouse from operational
databases. Although it is advantageous for decision makers to have current data in the warehouse, this is not
always possible due to the volume and time required for updates, so generally there is a time lag between the
warehouse data and the operational databases.
BI features can be summarized as:
Dashboard
Score cards/metrics
Analysis
Reports
Some limitations of BI systems are that detailed information may not always be available, and therefore
drill-down cannot be provided in those instances. A second limitation is that if the data is unfavorable – for
instance, if there is a cost increase – the BI system is unable to explain it. For this we may need Artificial
Intelligence capabilities such as data mining.
BI Technologies:
OLAP consists of tools to analyze data in a warehouse for decision support. The analysis is carried out
mainly by data summarization ('aggregation') and slice-and-dice (cross-sectionalization). The data is logically
organized in memory in the form of a cube (also called multi-dimensional organization). This organization
permits queries such as 'how many leather jackets were sold in the Tucson store in the month of December
2015?' Consider the sales table below: how many Cam rods were sold in the northern region? The process
we go through to answer this question is called aggregation.
Sales
part month region units
Cam rod Apr SW 35
Cam rod Feb SW 52
Cam rod Jun NE 10
Cam rod Mar SW 35
Cam rod May SE 52
Cam shafts Jun MW 43
Cam shafts Mar MW 52
Fig 4. Table of Sales
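The aggregation behind the question above can be sketched directly from the rows of the Sales table. (The notes say "northern region"; the northern value in the table is 'NE'.)

```python
# Aggregating the Sales table along the (part, region) dimensions.
from collections import defaultdict

sales = [
    ("Cam rod",    "Apr", "SW", 35),
    ("Cam rod",    "Feb", "SW", 52),
    ("Cam rod",    "Jun", "NE", 10),
    ("Cam rod",    "Mar", "SW", 35),
    ("Cam rod",    "May", "SE", 52),
    ("Cam shafts", "Jun", "MW", 43),
    ("Cam shafts", "Mar", "MW", 52),
]

totals = defaultdict(int)
for part, month, region, units in sales:
    totals[(part, region)] += units

print(totals[("Cam rod", "NE")])   # 10 units sold in the northern region
```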
Dimensions and concept hierarchies.
A dimension is an aspect of the data; it is a characteristic of a variable, such as location for the sales variable.
It is an attribute that can vary along fixed values. For example, location could be Tucson, Chicago, Los
Angeles, etc. Other examples of dimensions include product, manufacturer, time, etc.
Dimensions can have hierarchies (various levels of aggregation).
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more
general concepts. A concept hierarchy permits data summarization (aggregation) along one or more
dimensions.
Fig 5. Example of a Concept Hierarchy
In the example above, the concept hierarchy shows cellular sales broken down across the manufacturer and
product dimensions. The manufacturer can take on the values 'Samsung', 'Apple' and 'LG'. Samsung has the
product models 'GAL S7' and 'GAL S6', Apple has the 'iPhone7' and 'iPhone6', and LG has the 'G5', 'G4'
and 'G3' models. Thus, to find the total sales of Samsung, the sales of 'GAL S7' and 'GAL S6' need to be
added: 10 mil + 38 mil = 48 mil units.
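Rolling sales up the hierarchy is a one-line aggregation once the hierarchy is represented as nested data. The Samsung model figures (10m and 38m) appear above; the Apple and LG figures below are placeholders.

```python
# A concept hierarchy as nested dictionaries: manufacturer -> model -> units.
hierarchy = {
    "Samsung": {"GAL S7": 10, "GAL S6": 38},     # millions of units (from the notes)
    "Apple":   {"iPhone7": 45, "iPhone6": 30},   # assumed
    "LG":      {"G5": 8, "G4": 6, "G3": 4},      # assumed
}

# Roll-up along the manufacturer dimension: sum the leaf-level models.
by_manufacturer = {m: sum(models.values()) for m, models in hierarchy.items()}

print(by_manufacturer["Samsung"])   # 10 + 38 = 48 million units
```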
Multi-Dimensional Organization
The cube organization of data permits slice-and-dice operations. In this case we can see LG sales across
different regions (Midwest, Southwest, Southeast, etc.), different products and different time periods
(January, February, March, etc.). The data cube allows managers to ask questions such as "what were the
sales of 'G4' models in the Midwest for the first quarter?"
Fig 6. LG sales across different regions
Collaborative Systems
Collaborative systems are a collection of technologies to support group activity, including support for design,
strategic planning, software development, research, reports, etc. This technology evolved out of Group
Decision Support Systems (GDSS), which were an extension of decision support technologies. The primary
purpose of GDSS was to support group decision making. Collaborative systems technologies are broader
and include those that support decision making, document sharing and design.
The advantages of collaborative systems are that they support the 21st-century organization, reduce travel,
save time and could ultimately improve the end product as a result of input from multiple parties.
Social Media
Technology-supported social interaction has become a popular phenomenon in recent times and is an
important source of marketing information for many retailers. Twitter, Facebook, WhatsApp and LinkedIn
are among the prime examples of social media. These can be used in marketing in a number of ways:
1) They can be used to advertise directly through the medium itself, as in the case of Twitter (i.e. tweets)
and the iPhone (text messaging).
2) Many organizations also have internal social networking sites for employees to interact. This is
valuable for organizations to promote if they can gain work-related information from the social media
site or if the (internal) networking site hosts work interactions. Companies like W.R. Grace and
GoreTex use internal social networking sites to publicize their projects or get feedback on them.
3) Social media can be used to create a standing presence, such as having a "company page" on
Facebook.
4) Social media can also be used to mine data on brands and brand preferences, as well as to predict
demand. The latest trend is to analyze sentiments regarding products and services, i.e. whether a
product or brand is mentioned positively or negatively. Sentiments are a good predictor of
future sales.
MANAGEMENT SUPPORT SYSTEMS—II
The concept of Management Support Systems has been extended through the use of data for business
improvement, an increasing trend since the late 1990s. Technologies such as data warehousing, OLAP and
data mining are available to collect and analyze data for improving sales and marketing. These technologies
are collectively known as 'business analytics' and sometimes, incorrectly, as 'big data'.
Data Analytics: This is a broad umbrella term that describes the tools and technologies used to draw
meaningful conclusions from the data. It uses both traditional statistical techniques such as correlation,
regression, chi-square, T-test etc. as well as more advanced techniques (used in mining) as listed below:
Table 1. Examples of statistical techniques used in Data Analytics
Technique Brief Description
Correlation Technique to find the relationship between numeric variables.
Chi-squared test Tests whether two random variables are independent, i.e. whether the probability
distribution of one variable influences the other.
T-test Evaluates whether there are differences between two group distributions by comparing the
mean and variance of each group.
Analysis of Variance Statistical model to analyze the differences among group distributions by comparing
the mean and variance of each group.
Logistic regression Looks for a relationship between a binary variable and one or more nominal variables.
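The first technique in the table, correlation, can be computed from first principles in a few lines. The sample data below (ad spend vs. sales) is invented for illustration.

```python
# Pearson correlation between two numeric variables, from first principles:
# covariance of the deviations divided by the product of their magnitudes.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ad_spend = [10, 20, 30, 40, 50]   # e.g. advertising dollars (000s) -- assumed
unit_sales = [12, 24, 33, 46, 55]  # e.g. units sold -- assumed

# A value near +1 indicates a strong positive relationship.
print(round(pearson(ad_spend, unit_sales), 3))
```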
Table 2. Examples of mining and machine-learning techniques used in Data Analytics
Technique Brief Description
Machine learning Field of computer science that deals with pattern and speech recognition and text
analytics. There are two types: supervised and unsupervised.
Naïve Bayes classifier Probabilistic technique for constructing classifiers.
K-means clustering Used to find groupings in the data that are not pre-defined.
Time series analysis Analysis of data that has time (day, month, year) as a component.
Association rules Uncovers relationships in the data, which can be represented in the form of
association rules.
Text analysis Identifies relationships in unstructured data such as blogs and tweets.
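Of the techniques listed, K-means is simple enough to sketch in full. This is a minimal one-dimensional version with assumed data; real analytics tools cluster in many dimensions with smarter initialization.

```python
# A minimal 1-D k-means sketch: alternately assign points to the
# nearest centroid, then move each centroid to its cluster's mean.

def kmeans_1d(points, k, iters=20):
    centroids = sorted(points)[:k]   # naive initialization (assumed choice)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:             # assign each point to its nearest centroid
            i = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[i].append(p)
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Two obvious groupings: small purchases vs. large purchases (assumed data).
points = [1, 2, 3, 20, 21, 22]
centroids, clusters = kmeans_1d(points, k=2)
print(sorted(round(c) for c in centroids))   # [2, 21]
```

The groupings are not pre-defined: the algorithm discovers the two clusters (around 2 and around 21) from the data alone.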
Table 3. Examples of tools used in Data Analytics
Tool Brief Description
R programming language Open-source programming language with a focus on statistical analysis.
Python General-purpose programming language with a large number of libraries for data analysis.
Julia High-level dynamic programming language for technical computing.
SAS Commercial language used for business intelligence.
SPSS Used to analyze survey data; a product of IBM for statistical analysis.
Matlab, Octave Both are tools used for research; Octave is an open-source version of Matlab.
CRISP-DM (Cross-Industry Standard Process for Data Mining)
CRISP-DM is an industry-standard methodology that is widely used. The process starts with understanding
the business objectives of the mining: what exactly needs to be found? Is it gas usage, sales trends, blow-outs
of transformers? The next step is to understand the data in terms of what needs to be extracted into the data
warehouse (note that the diagram shows a database). Operational databases have hundreds of tables; those
relevant to the business objectives will need to be extracted. Data often needs to be integrated from different
sources, cleaned and transformed. For example, if the dollar amount of sales per salesperson is required, it
needs to be calculated. This is called data preparation. In the modelling stage, an analyst selects a suitable
model, e.g. K-means for clustering (note that decision trees can also be used) or Apriori for association
analysis. The evaluation stage is concerned with seeing whether the business objectives have been achieved.
If they have, the results are shared (the deployment stage); if the results are not satisfactory, the study is
repeated (perhaps with different models and different data sets).
Fig 1. The Crisp Cycle for Data Mining (same for analytics too)
Data Warehouses:
A data warehouse is a large collection of historical data that is organized specifically for use in decision
support (i.e. OLAP, data mining). Please note that a data warehouse is simply a 'bare bones' database
designed specifically for volume and speed. It has facilities for selecting and loading large volumes of
business data – typically transaction data from point-of-sale systems (e.g. retail checkout). Generally the data
has to be on the order of terabytes to be considered a warehouse, and larger than a petabyte to be considered
"big data."
Data life cycle:
Data is brought into the warehouse from external or internal sources. Data can be collected from outside the
organization and shared with it; for example, a credit card company could share transaction information with
hotels and restaurants. Most frequently, data is internally generated through sales transactions. Since a data
warehouse can be too unwieldy, data may be divided up into data marts, which are subsets of the warehouse.
For example, a warehouse could have sales data going back 20 years while managers are only interested in the
last ten years of data; a data mart with the last ten years of data could be created to meet their needs. Once
data is extracted to a data mart, two technologies can be deployed: a) OLAP and b) data mining. OLAP uses
summarization operations to provide a high-level summary of the data – for example, "how many Volvo
S70s had transmission problems in the last ten years?" Data mining, in contrast, looks for patterns in the
data – "which model had more transmission problems than other models?" Data visualization refers to the
presentation of information via a dashboard (which is one part of a BI system).
Fig 2. Data Life Cycle
Characteristics of a warehouse (FYI):
A data warehouse has some defining characteristics: it is subject-oriented, integrated, time-variant and non-volatile.
Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product,
and sales. You should recognize these as entity classes.
Integrated: A data warehouse is usually constructed by integrating data from multiple heterogeneous sources,
such as relational databases, flat files, and on-line transaction records.
Time-variant: This means that time is a dimension or attribute of the data. In the case of sales data, the time
when the sale occurred is often captured with the data.
Nonvolatile: The data in a warehouse is permanent and does not change.
Design of warehouse:
The design of a warehouse is similar to that of a database. In relational databases, cross-reference keys are used to relate tables together. For example, consider two tables, accounts and transactions. What key is used to link these tables together? What is the purpose of linking tables together?
Table 4. Tables of Account and Transactions
Accounts
acct# name date opened balance
8898444 Smith 6/7/15 $35,000
8898454 Farley 4/22/16 $300
Transactions
tid t_type t_amt account#
45940 interest 12.43 8898444
45941 deposit 200.00 8898454
The design of a warehouse is not very different from that of a database, but a warehouse is designed for queries. Warehouse data is converted to a multi-dimensional organization at run time – this is the basis of OLAP.
W/H organization:
The warehouse can be organized into star, snowflake and constellation schemas.
Star schema: Consists of a large central table and a set of smaller tables, one for each dimension.
Snowflake schema: A variant of the star schema, where some dimension tables are normalized, thereby splitting the data into additional tables.
Constellation schema: A collection of star schemas.
Fig 3. Example of Warehouse and Marts
Shown above is an example of a warehouse and data marts. The warehouse contains detailed sales data of an organization from 2010 to 2015. A data mart is an extract or subset of the warehouse. In this case there are two data marts: weekly sales listed by state for the years 2010-2015 (i.e. across all products) and weekly sales listed by product for 2012-2015 (across all states). These can be aggregated into sales by region and sales by product line, as seen in the top level of the diagram. In theory, any number of data marts is possible from a warehouse. The type of data mart chosen depends on the information (query) needs of managers.
OLAP:
OLAP consists of tools to analyze data in a warehouse for decision support. The analysis is carried out mainly by data summarization ('aggregation') and slice and dice (cross-sectioning). The data is logically organized in memory in the form of a cube (also called multi-dimensional organization). This organization permits queries such as 'how many leather jackets were sold in the Tucson store in the month of December 2015?' Consider the sales table below: how many Cam rods were sold in the southwest region? The process we go through to answer this question is called aggregation.
Table 5. Table of Sales
Sales
part month region units
Cam rod Apr SW 35
Cam rod Feb SW 52
Cam rod Jun NE 10
Cam rod Mar SW 35
Cam rod May SE 52
Cam shafts Jun MW 43
Cam shafts Mar MW 52
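The aggregation above can be sketched in plain Python; the values are copied from Table 5 (a real OLAP engine would operate on an in-memory cube rather than a row list):

```python
# Illustrative sketch of OLAP-style aggregation over Table 5 in plain Python.
sales = [
    # (part, month, region, units)
    ("Cam rod",    "Apr", "SW", 35),
    ("Cam rod",    "Feb", "SW", 52),
    ("Cam rod",    "Jun", "NE", 10),
    ("Cam rod",    "Mar", "SW", 35),
    ("Cam rod",    "May", "SE", 52),
    ("Cam shafts", "Jun", "MW", 43),
    ("Cam shafts", "Mar", "MW", 52),
]

def aggregate(rows, part=None, region=None):
    """Sum units over rows matching the given slice ('slice and dice')."""
    return sum(units for p, month, r, units in rows
               if (part is None or p == part) and (region is None or r == region))

print(aggregate(sales, part="Cam rod", region="SW"))   # 122
```

Summing the three SW rows (35 + 52 + 35) answers the question: 122 Cam rods were sold in the southwest region.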
Data mining:
Data mining is the application of statistical and AI techniques to identify patterns that exist in large databases but are hidden in the vast amounts of data, e.g. "how long do patients stay at a hospital following a hip surgery?", "how much do people spend on average at a shopping mall?", "how much meat is consumed in a fast food restaurant on a weekend vs a weekday?" This information is utilized by management in various ways: utility companies can set rates, insurance companies can judge whether claims are reasonable, and retail companies can stock merchandise according to how much customers are likely to buy. Harrah's casinos found that its most profitable customers were not high rollers but middle-income people and started catering more to their needs, for example by running bus shuttles from parking lots to casinos. The techniques are usually sequence, association, classification and clustering.
Data Mining Techniques/Types of analysis
There are a number of techniques used in data mining, stemming from statistics and AI; some of these are discussed below.
Sequence: Sequence analysis is concerned with purchasing activities occurring one after another. For example, consumers who buy a house are typically in the market for appliances and home owner's insurance. Similarly, a person who finishes a degree and is entering the job market may be in the market for a credit card and a car. Time series analysis, neural networks and genetic algorithms are techniques used in prediction.
Associative analysis:
Associative analysis is used to predict purchasing activities that occur at the same time, i.e. items purchased together. This is most often used in deciding retail store layouts. A-priori is one of the popular algorithms for mining associations.
Classification:
In classification, data is placed in one or another of several pre-defined categories, e.g. whether a person is a sports fan or not, or whether a person is likely to be in the market for a cellular package. The techniques used include discriminant analysis, Bayesian classification and others.
Clustering:
Clustering is concerned with identifying the natural groupings in data. The technique is based on the notion of a centroid. A centroid is the center of gravity, or equivalently the 'center of mass', of a two-dimensional object: the point from which the distance to every other point is minimized. Since the appropriate number of groupings is not known in advance, clustering is a trial-and-error technique. Clustering is tried with a trial number of cluster centers (e.g. 4) and the inter-cluster/intra-cluster distances are examined for goodness of fit. If the fit is not good, clustering is repeated with a different number of cluster centers.
The example below illustrates clustering of grocery store customers. The cluster centers are actual grocery
stores. The dots are customer addresses. The clustering exercise shows that customers are served well by
existing grocery stores.
Fig 4. An Example of Clustering of Grocery Store Customers
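The trial-and-error procedure above can be sketched as a minimal K-means in plain Python; the one-dimensional data points and the choice of two cluster centers are illustrative assumptions, not from the notes:

```python
# A minimal K-means in plain Python: assign points to the nearest center,
# recompute centers as cluster means, and repeat.
def kmeans(points, k, iters=10):
    centers = points[:k]                     # naive initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                     # assignment step
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]   # update step
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.5, 2.0, 10.0, 11.0, 12.0]   # two obvious natural groupings
centers, clusters = kmeans(points, k=2)
print(sorted(centers))   # [1.5, 11.0]
```

In practice the run would be repeated with different values of k, comparing the inter-cluster and intra-cluster distances for goodness of fit as described above.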
Associative analysis using a-priori algorithm
The A-priori algorithm is used to mine associations in a set of transactions:
1. List all items – create a 1-item set C1.
2. Filter items by minimum transaction support to get set L1.
3. Identify 2-item sets by pairing each element of L1 with every other element in L1 to get C2 (L1 * L1 = C2).
4. Filter item sets by minimum transaction support to generate L2.
5. Repeat with 3-item and 4-item sets...
The minimum transaction support is the minimum number (sometimes given as a %) of transactions in which the item must occur. Here it is given as 2.
Consider the following table showing purchases at a grocery store; each purchase/transaction is identified by
a transaction ID and followed by a list of items purchased such as eggs, oranges, jam etc.
Table 6. Table of Purchases identified by a Transaction ID
TID Items
100 A C D
200 B C E
300 A B C E
400 B E
From this a one-item set C1 is created by taking each item and assessing its frequency. Item 'A' occurs twice (in TID 100 and 300). Item 'B' occurs three times, in TID 200, 300 and 400, etc.
Table 7. Table of items and frequencies
Item Set Sup.
{A} 2
{B} 3
{C} 3
{D} 1
{E} 3
From this set L1 the set L1 * L1 = C2 is created, consisting of all possible pairings. The resulting items are
tested for support.
Table 8. Frequency tally of two item pairs
Itemset Sup.
{A B} 1
{A C} 2
{A E} 1
{B C} 2
{B E} 3
{C E} 2
In this case, {A B} and {A E} do not meet the minimum support. The resulting table answers the question: which two items are most frequently purchased together? (It is {B E}.)
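The steps above can be sketched in plain Python on the same transactions (a minimal illustration of the C1 → L1 → C2 → L2 filtering, not a full A-priori implementation):

```python
# The A-priori steps on the transactions of Table 6, with minimum support 2.
from itertools import combinations

transactions = {100: {"A", "C", "D"}, 200: {"B", "C", "E"},
                300: {"A", "B", "C", "E"}, 400: {"B", "E"}}
MIN_SUP = 2

def support(itemset):
    """Number of transactions containing every item in the set."""
    return sum(1 for t in transactions.values() if itemset <= t)

# C1 -> L1: count single items and keep those meeting minimum support
items = sorted(set().union(*transactions.values()))
L1 = [i for i in items if support({i}) >= MIN_SUP]

# C2 -> L2: pair the survivors (L1 * L1) and filter by support again
L2 = {frozenset(pair): support(set(pair))
      for pair in combinations(L1, 2) if support(set(pair)) >= MIN_SUP}

best = max(L2, key=L2.get)
print(sorted(best), L2[best])   # ['B', 'E'] 3
```

As in the worked tables, item D drops out of L1, pairs {A B} and {A E} drop out of L2, and {B E} emerges as the most frequently co-purchased pair.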
Probability:
Probability is the chance of an event happening. It is a number between 0 and 1.
1 – the event occurs with certainty
0 – the event will certainly not occur
P(sun will rise tomorrow) = 1; P(rain) = 0.2.
P(Elvis is alive!) = 0
Prior probabilities:
Prior probabilities are knowledge of other events that may help improve prediction. According to one financial blog, the probability of an IPO (Initial Public Offering) being successful is 0.33, and if a big company (e.g. McDonald's) is behind the IPO, the chance of it being successful is 0.99. These are expressed as follows:
P(IPO success) = 0.33
P(IPO success | big company) = 0.99 (this should be read as 'the probability of IPO success given a big company is behind the IPO').
In this case knowledge of previous events improves prediction success. Prior probabilities are therefore
useful for prediction and there is a theorem that involves prior probabilities. How would you write in
formulae form: “the probability that it will rain given that it is sunny is 0.3”?
Bayes Theorem
Bayes theorem involves prior probabilities and their inverse
P(A|B) = [P(B|A) * P(A)]/P(B) where A and B are some events
Bayes theorem can be used in classification. If there are two classes 'A' and 'B', we can decide the class based solely on probability – whichever class has the higher probability is the most likely outcome. If a state in the U.S. has 60% declared Republicans, 30% declared Democrats and 10% undecided, the outcome of an election is most likely Republican, since this probability (0.6) is higher than the others. The use of conditional probabilities increases the accuracy of prediction, so Bayes theorem can often be exploited for calculating conditional probabilities.
A Problem involving Bayes Theorem
The following problem illustrates Bayes theorem:
We are interested in P(person becoming a manager | he/she is doing an MBA). We are given the following assumptions.
300 m population; 100 m employees
500,000 are managers
10,000 managers go to college for an MBA
20 m go to college
100,000 do MBA
First we write out Bayes formula:
P(person becoming mgr|doing MBA) = [P(doing MBA|is a mgr) * P(mgr)]/P(doing MBA)
Please note that the formula always needs to be written in terms of the problem (rather than P(A) or P(B)..)
P(doing MBA|person is a mgr) = 10,000/500,000 {10,000 mgrs out of 500,000 go for MBA}
P(mgr) = 500,000/300 m {there are 500,000 mgrs out of a total population of 300 m}
P(doing MBA) = 100,000/300m {100,000 students do an MBA out of a population of
300m}
P(person becoming mgr|doing MBA) = [(10,000/500,000) * (500,000/300m)]/(100,000/300m)
= 0.10 or 10%
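The calculation can be checked in plain Python:

```python
# The Bayes calculation above, checked in plain Python.
p_mba_given_mgr = 10_000 / 500_000         # P(doing MBA | manager)
p_mgr           = 500_000 / 300_000_000    # P(manager)
p_mba           = 100_000 / 300_000_000    # P(doing MBA)

p_mgr_given_mba = p_mba_given_mgr * p_mgr / p_mba
print(round(p_mgr_given_mba, 2))   # 0.1, i.e. 10%
```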
Using Bayes Theorem for Classification
Bayes theorem can be exploited for classification. As noted earlier, the decision to place a data item in Class I or Class II is made on the basis of probability: the target class is the class for which the probability is higher. We illustrate this via an example of classifying birds as eagles or hawks based on their wingspan. Since these are birds of prey it is not possible to get close to them – the only distinguishing feature is their wingspan. The average eagle is bigger than the average hawk. The probability of birds being eagles or hawks was calculated by the observer as follows:
P(eagle) = n_eagle/N = 0.8 (from observations of birds, where N is the total number of birds)
P(hawk) = n_hawk/N = 0.2 (from observations of birds, where N is the total number of birds)
P(eagle|x) = [P(x|eagle). P(eagle)]/P(x). ………….. (1)
P(hawk|x) = [P(x|hawk). P(hawk)]/P(x) ……………(2)
It is assumed that the birds observed are only eagles and hawks, since only these birds fly high. The unknown bird will be classified depending on whichever probability is higher, P(eagle|x) or P(hawk|x). Since these are unknown, Bayes theorem will need to be used. Further, since all we need is a comparison of probabilities, the common term P(x) can be dropped. In equations (1) and (2) only P(x|eagle) and P(x|hawk) are not known. These can be calculated from the probability density function (PDF). The PDF for eagles and hawks is shown below. The PDF shows the probability of a certain wingspan given an eagle, or a certain wingspan given a hawk. The PDF for eagles is flatter than for hawks, showing a greater variation in wingspan. The numbers for a wingspan of 45 cm, taken from the PDF, are:
P(45|eagle) = 2.22 x 10^-2
P(45|hawk) = 1.1 x 10^-2
Fig 5. The PDF for eagles and hawks
To compare P(eagle|x) and P(hawk|x), dropping the common term P(x):
P(eagle|x) ~ P(x|eagle) . P(eagle) = 2.22 x 10^-2 x 0.8 = 0.01776
P(hawk|x) ~ P(x|hawk) . P(hawk) = 1.1 x 10^-2 x 0.2 = 0.0022
The value for eagle is higher, predicting the eagle class.
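The comparison can be checked in plain Python; P(x) is dropped because it is common to both sides and only the ordering of the two scores matters:

```python
# The eagle/hawk Bayes comparison in plain Python.
p_eagle, p_hawk = 0.8, 0.2
p_x_given_eagle = 2.22e-2    # P(45 | eagle), read off the PDF
p_x_given_hawk  = 1.1e-2     # P(45 | hawk)

score_eagle = p_x_given_eagle * p_eagle   # 0.01776
score_hawk  = p_x_given_hawk  * p_hawk    # 0.0022
prediction = "eagle" if score_eagle > score_hawk else "hawk"
print(prediction)   # eagle
```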
Big Data
Big data is the name given to large volumes of data generated from diverse sources such as blogs, social media posts, sensor data etc. This data is often generated in real time and measured in Petabytes (a quadrillion bytes, roughly the contents of all the world's academic libraries) and Exabytes (a quintillion bytes, roughly all the words ever spoken). It is useful to analyze it. For example, in 2012 MIT researchers used cell phone data from the parking lots of Macy's in New York to estimate sales in Macy's New York stores on Black Friday. Similarly, they used Google search data to predict real estate sales in Pennsylvania. This proved more accurate than the National Realtors' Association forecast, which is produced only on an annual basis.
The technical framework for Big Data is called Hadoop; Hadoop is thus an (IS) environment for Big Data. Housing big data requires the use of low-cost servers, called Hadoop cluster servers. Since the same data is spread over multiple servers, a distributed file system called HDFS is used; this file system is aware of how the data is spread over the servers. The data/queries have to be processed in parallel and the answers integrated; the software that does this is called MapReduce.
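The MapReduce idea can be illustrated with a word count in plain Python (the documents are made up for illustration; real MapReduce distributes the map and reduce steps across the cluster servers):

```python
# An illustrative word count in the MapReduce style, in plain Python.
from itertools import groupby

docs = ["big data big servers", "data spread over servers"]

# Map: emit a (word, 1) pair for every word in every document
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group the pairs by key (word)
mapped.sort(key=lambda kv: kv[0])
grouped = {key: [v for _, v in group]
           for key, group in groupby(mapped, key=lambda kv: kv[0])}

# Reduce: integrate the per-key counts into a single answer
counts = {key: sum(values) for key, values in grouped.items()}
print(counts)   # {'big': 2, 'data': 2, 'over': 1, 'servers': 2, 'spread': 1}
```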
ARTIFICIAL INTELLIGENCE
Major milestones
The progress of any complex field is marked by a series of unrelated developments. The same is true for Artificial Intelligence; we discuss some of the developments in AI here. In 1950, Alan Turing, the famous British mathematician, wrote the paper "Computing Machinery and Intelligence", in which he proposed a test for intelligence now known as the Turing test. To put it simply, the test is to compare a machine with a human to see if it can perform as well as a human.
The next major event in the field was when the Rockefeller Foundation funded the Dartmouth conference of 1956, held at Dartmouth College in New Hampshire. AI was formally recognized as a field of study there, and programs like the Logic Theorist and GPS were exhibited. The goal of AI is to endow machines with intelligence so as to make them capable of human-like behavior.
In 1958 the LISP language was introduced by John McCarthy. In this language the basic data structure is a list, which was thought to be useful for symbolic reasoning, i.e. reasoning with words. Subsequently, in 1965, two computer programs were introduced which caught the attention of both the business and scientific communities; these are now known as expert systems. Dendral could identify chemical compounds from spectrographs and Mycin could diagnose ailments. 1972 was a banner year for AI, since two languages, Smalltalk and Prolog, were introduced. Smalltalk is the mother of all object-oriented languages; Prolog is a language based on predicate logic and is still used today. In 1981 the Japanese introduced the Fifth Generation project. Part of the project was to develop natural language interfaces with computers: the idea was that personal computers could be made more user friendly if they could communicate in natural language, and more user-friendly PCs would spur demand for semiconductor chips, over which the Japanese had dominance at that time. Unfortunately the Fifth Generation project did not succeed, but it spurred R & D funding in other countries such as the U.K. and USA.
In 1995, Honda introduced its humanoid robot, originally a 500 lb machine, now shrunk to the size of a child so as to be less intimidating. Called Asimo (after Isaac Asimov, the science fiction writer), it can climb stairs and play football, much like a human child. In 2004, DARPA (the Defense Advanced Research Projects Agency) held a competition for driverless vehicles. It was a 100-mile course in the desert, laden with natural and artificial barriers, which none of the vehicles could finish; the maximum distance completed was 7.5 miles. The following year Stanford's entry won the DARPA competition. Subsequently, an Italian version of this vehicle (a separate venture) travelled 8,000 km to prove the concept of driverless vehicles.
Early research:
Logic – a system of mathematical symbols for representing statements of truth, known as facts or assertions. Here it is asserted that if X is a dog, it likes to go on walks. Given the fact that 'Fido' is a dog, the conclusion that Fido likes to go on walks can be reached. In Prolog-style notation (where the left-hand side holds if the right-hand side holds):
likes_walks(X) :- dog(X).
dog(fido).
likes_walks(fido). {the derived conclusion}
Perceptrons – machines based on a simple version of the neural model of the brain. In those days the brain was thought of as consisting of neurons that turned on or off in response to stimuli, much like semiconductor memory. Perceptrons were trained to recognize animals (dogs and cats) as well as to distinguish between males and females from photographs. This work was discontinued following a published critique by Minsky and Papert, but later continued in the form of neural networks.
Chess – Chess is played on a board of just 64 squares but has real-world complexity. The game uses 32 pieces, but there are some 2^120 possible combinations of these pieces. Researchers thought that if they could conquer chess they could solve problems in the real world, but this unfortunately was not the case. However, the result of this research was work on search algorithms. Solving a problem was viewed as searching for a solution in a space of solutions, i.e. the solution space.
Blocks World – Blocks world is an artificial world consisting of building blocks. Programs were developed to move blocks around and to stack them, with the idea that this work could be transferred to construction in the real world. Unfortunately, this was also not the case: in the real world there is gravity and there are rules for stacking blocks, so blocks-world programs did not transfer well to the real world.
The outcome of early research was search strategies, such as breadth-first search, depth-first search, heuristic search and hill-climbing, as well as the realization that common sense (what happens if we throw a stone up?) is needed for real-world programs. To get an idea of search strategies, imagine one has a van full of hungry children and has just arrived at a major city. Because it is raining heavily, assume the cell phone map is not working. How would one go about finding a pizza parlor? One could explore all possible roads one by one, a depth-first search. Do you have a better method (other than asking someone)?
Nature of intelligence:
Knowledge + reasoning power = intelligence
Expert systems (recall from this intro as well as from the intro to IS) have given AI researchers one theory about achieving intelligence: to achieve intelligence, programs should be endowed with knowledge and reasoning power. The reasoning is usually given via a program that can understand knowledge and draw the required conclusions or perform some action. The problem of AI is then one of giving machines knowledge. Knowledge can be represented in various forms, but representing it is not always easy. Assuming knowledge can be given, how do we know whether a machine has become intelligent or not?
The test for machine intelligence:
The test for machine intelligence is called the Turing Test. The machine/program to be tested is isolated from the tester. The tester simply asks questions through a computer to avoid bias. If, by simply asking questions, the tester cannot make out whether he or she is dealing with a human or a machine, the machine is said to have passed the Turing test. To date, the Turing test has not been passed, despite systems such as IBM's Deep Thought.
We give machines knowledge and the ability to think with it. Knowledge is not easily defined. In the AI context, it is information organized for problem solving and can consist of facts, goals, problems and procedures. For example: 1) The Eiffel tower has a height of 320 m. 2) If you want to get up to the top, you have to take a lift up to the second stage and take another lift from there.
Which of these is factual and which is procedural knowledge?
Fig 1. The Test for Machine Intelligence
Knowledge
Knowledge can be declarative (factual) or procedural. Declarative knowledge is factual, like item #1 above. Procedural knowledge is 'how-to' knowledge, illustrated by item #2. The author recently had a leaky shower tap; how to fix the leak is procedural knowledge.
Predicate Logic
Predicate logic is based on logic but uses predicates to make or prove assertions (statements). Predicates are labels signifying relationships. If we say:
Partner(tom,mary).
this states that the partner of 'tom' is 'mary'. Note the use of lower case for constant values. Below we assert that X is a mammal if it is warm blooded; 'Mammal' and 'warmblooded' are the predicates:
Mammal(X) :- warmblooded(X).
Predicate logic is a very versatile, general-purpose representation scheme.
Frames
Frames are record-like structures used to hold an assortment of facts about an object or a situation. The scheme is characterized as a 'slot' and 'filler' notation: slots correspond to attributes and fillers correspond to attribute values. Fillers can be values, procedures or is-a links; frames differ from records in this way.
Is_a: sports car
Model: Cayman
Manufacturer: Porsche
Retail price: $52,600
Accessories: check_with_dealer
Fig 2. A Representation of A Car Frame
Above is a representation of a car frame. The ‘IS-A’ slot indicates that the frame is a subtype or subclass of
sports car. The ‘Model’ is Cayman, ‘Manufacturer’ is Porsche and ‘retail price’ is $52,600. The ‘Accessories’
slot has a value ‘check_with_dealer’ which is a procedure. Frames are useful in classification (this should ring
a bell!) such as for ailments or rocks.
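The car frame above can be sketched as a Python dictionary, with slots as keys and fillers as values (the body of the check_with_dealer procedure is a hypothetical stand-in):

```python
# The car frame as a Python dictionary: slots are keys, fillers are values,
# and a filler may be a procedure (check_with_dealer is hypothetical).
def check_with_dealer():
    return "contact dealer for the current accessory list"

car_frame = {
    "is_a":         "sports car",        # structural link to the parent class
    "model":        "Cayman",
    "manufacturer": "Porsche",
    "retail_price": 52_600,
    "accessories":  check_with_dealer,   # a procedure as a filler
}

filler = car_frame["accessories"]
print(filler() if callable(filler) else filler)
```

Retrieving the 'accessories' slot runs the stored procedure rather than returning a stored value, which is exactly how frames differ from plain records.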
Scripts
Scripts are descriptions of actions in a pre-defined situation. For example, there are scripts for when a person goes to a restaurant (restaurant script), goes to a party (party script) or attends a wedding (wedding script); a regular sequence of actions occurs in these situations. The method originated in the film industry, where a movie script defines what actors do or say in the different scenes. Similarly, we can expect a restaurant script to consist of the following actions:
Walk into restaurant
Wait to be seated
Hostess asks how many
Shows you a table or booth
Asks if it is ok?
…………….
A formalized description of these events constitutes a script in knowledge representation. Scripts were used in natural language understanding, to understand stories (e.g. Red Riding Hood). They are necessary for a program to understand stories, since some of the knowledge in stories is implicit, such as why someone would take a cake to grandma's house.
Semantic nets
Semantic nets are a representation scheme based on associative memory. It is thought that the human brain stores knowledge of related objects together for efficiency; thus one would expect 'wealth' to be stored adjacent to neurons storing big estates, luxury vehicles, boats and exotic vacations. A semantic net is said to be a 'node + link' formalism. The nodes represent concepts or values; only single concepts or values are permitted, so '64 in TV' is not permitted as a node value (it has to be just 'TV' with an attribute of size). Links typically represent relationships and are of two types, structural and descriptive. Structural links describe the structure of the objects, such as class/subclass (the link is labeled 'is-a') and part/subpart (the link is labeled 'has-a'). Descriptive links describe properties (or attributes – what does an attribute mean?) and can have any label (e.g. 'max capacity', 'color', 'manufacturer'). In the example below, eagle is both a bird and a bird-of-prey. It has a part, wings (see the 'has-a' link), and the max wingspan is given as 1.5 m. Note that 'bird-of-prey', 'bird', 'eagle' and 'wings' are concepts and '1.5m' is a value. Semantic networks are useful in expressing knowledge that has many relationships, e.g. relationships between various entities of the government.
Fig 3. Eagle Example of Semantic Networks
Rules.
Rules are a representation scheme based on the way we react to situations; ultimately they are derived from the S-R (stimulus-response) paradigm in psychology. A rule has the following format:
If A then B else C {the else part is optional}
where A is a condition or situation and B/C is a conclusion or an action. Conditions can be combined, or 'AND-ed', together, but they have to be expressed in terms of variables, such as 'temperature is high' or 'coolant level is low'. Consider another example: 'if the patient has a high temperature and spots on the face, then the patient is suffering from measles.' This is expressed as a rule as follows:
If patient_temperature = high and spots_on_face = yes then patient_illness = measles.
Rules are useful in diagnosis/recommendation problems in narrowly defined domains such as selecting stocks,
currency trading, locomotive fault diagnosis or CIM configuration.
Branches of AI
AI is concerned with the principles and mechanisms for achieving intelligent behavior in machines. Initial
research on AI branched into a number of areas as follows:
Natural language processing
Natural language processing is concerned with understanding speech and written text. When a system can
understand natural language inputs, it can carry out a number of tasks such as taking dictation, translating
from one language to another etc.
Robotics
Robotics is concerned with giving machines human-like movement so they can perform useful tasks such as assembly and driving. This is one of the more developed branches of AI, as industrial robots are very common and currently carry out some 20% of manufacturing work. Driverless vehicles will be deployed in the near future.
Vision/perception systems
Vision/perception systems are concerned with machines having human like vision, so they can perform tasks
such as object recognition or machine inspection. Typical applications are recognizing employees at a factory
gate or examining finished parts in a factory. A more spectacular application is a driverless vehicle.
Expert systems
Expert systems are programs that incorporate human expertise and function like human experts. Some of the early systems diagnosed medical problems (what is the name of this one?) and identified chemical compounds (what is the name of this one?). Contemporary systems can perform any number of tasks, including adjusting roller height in a paper mill, restoring power in a ship under attack, CIM configuration, etc.
Machine learning
Machine learning is a branch of AI concerned with the automatic acquisition of knowledge by machines from successful and unsuccessful cases. It can be used to predict exchange rates, detect spam from regular mail, recognize a stressed traveler, and in many other applications.
Expert Systems
An expert system mimics a human expert by incorporating his/her knowledge. It can make judgements/recommendations in the same way a human expert can. Generally, knowledge in an expert system is incorporated in the form of rules, such as 'if a person has a fair credit history but a poor bank balance then he/she is a poor credit risk'.
Expert Systems Architecture
The expert systems architecture is shown in the diagram below.
User Interface: This is the interface through which the user enters information about the problem. For a credit application, a person might enter information about his/her bank balance, credit history, etc.
Inference Engine: The inference engine is used to draw conclusions from rules. Consider a car that does not start, whose headlights do not turn on, and the following two rules:
Rule 1: If the engine does not crank then the problem could be the battery or the starter motor.
Rule 2: If the headlights will not turn on then the problem is the battery.
Based on this information and the two rules, the inference engine can conclude that the problem is the battery.
Knowledge Base: The knowledge base consists of knowledge about the situation, generally in the form of
rules.
KA Subsystem: This is essentially an editor to enter rules in a user-friendly way (imagine an email message, but with prompts for 'if', 'then' and 'else' rather than 'from', 'to' and 'subject').
Fig 4. Expert Systems Architecture
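The way the inference engine narrows the two car rules down to the battery can be sketched as a minimal forward-chaining step in plain Python (the condition strings are illustrative; a real inference engine also handles variables and conflict resolution):

```python
# A minimal forward-chaining step over the two car-diagnosis rules.
rules = [
    ({"engine does not crank"},     {"battery", "starter motor"}),
    ({"headlights do not turn on"}, {"battery"}),
]
facts = {"engine does not crank", "headlights do not turn on"}

candidates = None
for conditions, conclusions in rules:
    if conditions <= facts:   # every condition of the rule is satisfied
        candidates = conclusions if candidates is None else candidates & conclusions
print(candidates)   # {'battery'}
```

Both rules fire, and intersecting their conclusions leaves the battery as the single diagnosis.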
Neural Networks:
Neural networks are mathematical models that simulate the neural structure of the brain. They are often used in applications requiring pattern recognition, e.g. crime, fraud and intrusion detection.
Fig 5. Neural Networks
The diagram on the left (above) shows that the human brain consists of neurons, or nerve cells, connected by microscopic fibers called dendrites. A neuron is about 1/100th the thickness of a human hair. Neurons are activated by signals received from the senses. If a neuron is activated and, under certain conditions, activates its neighbors, it is said to fire. When this happens on a repeated basis a pattern is formed. Recognition, then, is the matching of an incoming signal to this pattern through the process of firing neurons. The conditions of activation are the strength of the signal and the frequency of association. For example, we more easily recognize common objects such as cars and computers than less frequent objects such as hydraulic presses or capacitors. The diagram below models an individual neuron. Individual neurons have an activation function (AF) that is set by the designers of the neural net. An AF typically has variables and weights. If we set an AF of X + Y - 2 > 0, where X and Y are inputs with weights of 1 and 1 and 2 is the threshold,
Fig 6. The Model of an Individual Neuron
for X = 1, Y = 1 the output of the neuron will be 1+1-2 = 0, so the neuron will not fire. If X = 2 and Y = 1, the output will exceed the threshold and the neuron will fire. Thus individual neurons are modeled. In a neural net program, the weights are adjusted automatically by the program so that inputs match outputs for a particular application. Applications include predicting currency rates, selecting stocks or predicting water flow into a dam; for each of these, the neural net has to be trained to recognize the pattern. For example, a neural network can be trained for stock selection by being given a large number of successful and unsuccessful stocks. In practice, layers of neurons are required for the program to establish a connection between inputs and outputs. In the example below, the neural net is intended to associate demographics and calling habits with whether a cellular customer is 'loyal', a 'hopper' (a person who goes for a better deal) or 'lost'. Note that there is a layer of neurons between the inputs and outputs and that the system requires a large number of cases before it can perform the task of classifying customers.
Fig 7. A Neural Net Example Associating Demographics with Customer Behavior
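The single-neuron model described above can be sketched in plain Python:

```python
# The single-neuron model: fire (output 1) when the weighted sum of the
# inputs exceeds the threshold, per the activation function X + Y - 2 > 0.
def neuron(x, y, wx=1, wy=1, threshold=2):
    return 1 if wx * x + wy * y - threshold > 0 else 0

print(neuron(1, 1))   # 0 -- the sum equals the threshold, so no firing
print(neuron(2, 1))   # 1 -- the sum exceeds the threshold, so the neuron fires
```

Training a network amounts to adjusting the weights (here wx and wy) automatically until the inputs produce the desired outputs.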
Challenges of AI
While the popular press claims that the Turing test has been passed, technologies such as Honda's robot, IBM's Deep Thought and Siri give only a semblance of intelligence in specific situations. There are fundamental challenges in natural language processing, knowledge representation, and speech and object recognition, to name a few. Research online for some of these.
Business applications of AI
Business applications of AI are its uses within business organizations. AI applications are used in all functional areas
including marketing, production, accounting and finance, although there are more applications in production
because of the (mostly) repetitive nature of the work.
Marketing applications include data/text mining (what might be examples of these?), systems to support
pricing and negotiation; and automated response systems for customer service.
Production applications include design of machinery such as turbines, robotics to handle materials and
repetitive processes like painting and scheduling of autonomous cranes at a busy harbor.
Accounting and financial applications include expert systems for detecting accounting irregularities, neural
nets to predict currency exchange price movements, stock selection and credit approval.
Industrial applications of AI
Industrial applications of AI are the non-business applications such as driverless vehicles, facial recognition,
pothole detection and crime prevention.
KNOWLEDGE MANAGEMENT
KNOWLEDGE-CENTRIC DRIVERS
Organizations are under competitive pressures to improve productivity and earn a profit. For example there
are 282 car manufacturers in the world competing against one another for the customer’s purse. Previously
the approach to productivity was to attempt improvements to the technical system. What would be an
example of this? A new method of improving productivity in business is to leverage the intellectual
assets/intellectual capital of the organization. Thus a new driver of productivity in organizations is to reap
benefits from intellectual capital such as employee’s knowledge, patents, reports and customer de-briefings
rather than to strive for mechanical efficiency improvements. Knowledge is regarded as the new
organizational resource.
KNOWLEDGE VS INFORMATION VS DATA
We are already familiar with one difference between data and information. We stated earlier in the course
that data are raw facts while information is organized data or elaboration of data. Now one should think of
information as processed data, where data could be filtered, contextualized, summarized and/or categorized
and knowledge as processed information. A company could receive customer feedback on its computers (what is
this – data, information or knowledge?) and classify it as: a) monitor problems, b) keyboard problems, c) disk
drive problems etc. This classified data is of more value to the organization and to decision makers than the
raw feedback.
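The classification step described above can be sketched in Python. The category keywords below are illustrative assumptions, not taken from the text; the point is only to show how raw feedback (data) becomes categorized feedback (information).

```python
# Toy sketch: turning raw customer feedback (data) into categorized
# feedback (information). The keyword lists are assumed for illustration.
CATEGORIES = {
    "monitor problems": ["monitor", "screen", "display"],
    "keyboard problems": ["keyboard", "key"],
    "disk drive problems": ["disk", "drive"],
}

def classify(feedback):
    text = feedback.lower()
    for category, keywords in CATEGORIES.items():
        if any(kw in text for kw in keywords):
            return category
    return "other"

print(classify("My screen flickers constantly"))   # monitor problems
print(classify("Two keys on the keyboard stick"))  # keyboard problems
```

The classified output is of more value to decision makers than the raw feedback, which is exactly the data-to-information step in Fig 1.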
Fig 1. Information Vs Data
DIFFERENCE BETWEEN INFORMATION & KNOWLEDGE?
There is also a corresponding difference between information and knowledge. Knowledge can be thought of
as human-processed information. When information is interpreted, analyzed, refined and annotated, it
becomes knowledge. An executive could review financial statements (information) and conclude that the
company is losing money in operations (knowledge).
Fig 2. Information Vs Knowledge
CASE OF TI
Texas Instruments, like any large organization, deals with a lot of paper. There are 3,100 data sheets relating to
its semiconductor products, each averaging about 12 pages in length. TI produces and maintains about 50
user guides, each of which averages 250 pages. TI supports its products with 400 application notes, each of
which is between 2 to 100 pages in length. The application notes support the product by providing additional
information. TI revises about 90,000 pages of documentation every year. The company has an army of 100
technical writers, 5 illustrators, and 10 team leaders who collectively manage this process. Is this data,
information or knowledge?
ANOTHER VIEW OF DATA, INFORMATION & KNOWLEDGE
The following highlights the difference between data, information and knowledge. Details about ethylene
such as chemical formula, density, molar mass, solubility in water are chemical properties of ethylene and
would be considered data (see first diagram below). On the other hand, the fact that it is used in the
production of ethylene oxide, ethylene dichloride, ethylbenzene and polyethylene (its applications) is information. It
is useful to place ethylene in a manufacturing context (see second diagram below). Finally, how reactions of
ethylene are carried out or how Ethylene Oxide is produced are examples of knowledge because this can be
used to manufacture the product. Here knowledge can be thought of as refinement of information or alternatively
as actionable information (see third diagram below).
Fig 3. The difference between Data, Information and Knowledge
DEFINING KM & ORGANIZATIONAL KNOWLEDGE
In this section, we will attempt to gain a greater understanding of knowledge management and organizational
knowledge.
KNOWLEDGE MANAGEMENT (KM)
The explicit management of organizational knowledge, including tools and processes to create, store, access
and disseminate organizational knowledge is called Knowledge Management. So KM efforts imply that the
organization consciously cultivates and manages its knowledge as a resource. A simple example of KM is to
manage a company’s ‘patent real estate’ using a simple database tool. This allows employees to search and
reuse the patent knowledge. Use of IT to support KM, though desirable, is not always possible. KM
approaches can be classified into hard and soft approaches depending on whether they are technology oriented
or management oriented.
ORGANIZATIONAL KNOWLEDGE
Knowledge is defined as a mixture of experience and judgements that can form the basis for action. There are
two types of organizational knowledge, tacit and explicit. Tacit knowledge is knowledge embedded in human
agents, cognitive & production processes. Examples of these include: how to detect when it is time to
concede in a negotiation, how to detect when a chemical reaction has been completed, to know if an
interview candidate will make a good hire, who will be a good distribution partner, how to speak a language.
Explicit knowledge is knowledge encoded in the form of memos, procedure descriptions etc. Examples are
procedure manuals, TI’s ‘application notes’, information from past projects etc. (i.e. assuming they have
annotations and judgements). Explicit knowledge is more abundant and amenable to AI approaches, whereas
tacit knowledge is hidden in the minds of employees, which makes it more valuable. A major task of knowledge
management is to make tacit knowledge explicit. This is often carried out by identifying experts, capturing their
knowledge and encoding it. This is discussed in the next section.
KM CYCLE & APPROACHES
The Knowledge Management Cycle defines how knowledge is identified, captured, transferred and measured.
The cycle consists of:
a) Defining strategy – this is concerned with identifying the subject area to focus on for the KM effort. In
the case of HP (discussed later) the focus was on ERP consulting activity and for Rolls Royce it was the
design, development and maintenance of the supersonic engine. Usually it is carried out with the help of a
knowledge map or a high level view of the organization’s knowledge. (What phase of the software development
cycle does defining strategy correspond with?). The knowledge map indicates the scope of the KM effort
(this should be somewhat similar to what?)
Fig 4. An example of a knowledge map
b) Identifying sources of the knowledge viz. subject matter experts (SMEs) – these are individuals with
knowledge about the domain. They are identified via surveys of department heads, from which a shortlist
is made.
c) Generating the knowledge – since knowledge is in the tacit form it has to be made explicit via debriefings,
interviews etc.
d) Capture and codify – when knowledge is generated, it has to be captured in a suitable form to support
retrieval at a later stage. Knowledge could be captured in the form of documents, using multi-media or with
the help of artificial intelligence approaches. This issue is discussed subsequently.
e) Transfer/absorb – knowledge that is captured is made available to employees who need it; this can be done
through web sites, database approaches or customized applications.
f) Use/measure – knowledge usage is monitored to determine the effectiveness of the KM system. Rewards
can also be given. We will now discuss selected aspects of this cycle – Expert identification, knowledge
generation, capture and codification, usage and assessment.
Fig 5. The Knowledge Management Cycle
CAPTURE AND CODIFICATION
Knowledge can be captured using various technologies such as:
Social networking -- Knowledge is gathered from posts presumably made by experts on social media
e.g. Linked-In, CodeSnipp etc.
Document based – In this technology first popularized by Lotus Notes (a software for sharing
notes), each knowledge item is treated as a ‘document’. A user can create a document (such as about
how to design aircraft engines to be quiet) and others can respond to it (just like in an email).
Ontology based – An ontology is a formal classification of knowledge such as in welding (‘arc
welding’ ‘gas welding’..). Ontology based approaches provide a categorization of knowledge and use
it to store and retrieve knowledge.
AI based – In these approaches traditional knowledge representation techniques are used to store
knowledge.
DOCUMENT BASED KM SYSTEMS
Document based KM systems store knowledge as documents, e.g. policy knowledge, problem resolutions etc. This is a relatively simple
approach in which an intranet or a collaborative tool such as ‘IBM Notes/Domino’ (an OIS) is used to store
knowledge as documents. With such a system an employee can create a document using a template. For example, a
software bug can be reported with the fields ‘created by’, ‘date’, ‘type of’, ‘keywords’ and the actual bug
report. It is a simple approach that can be implemented easily, but it suffers from the same problem that
search engines have: knowledge is retrieved by using keywords. If the query were ‘find an alloy that is
lighter than titanium and having at least three quarters of its strength’, typing the keyword ‘alloy’ could lead
to hundreds of results. Effort is also required to keep the knowledge up-to-date – a maintenance challenge
(where did we first come across the term maintenance?).
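The keyword-retrieval limitation can be sketched in Python. The documents below are made up for illustration; the point is that keyword matching returns every document tagged 'alloy' and cannot express a condition like "lighter than titanium with three quarters of its strength".

```python
# Minimal sketch of keyword retrieval in a document based KM system.
# The document store is a hypothetical example.
documents = [
    {"keywords": ["alloy", "titanium"], "text": "Notes on titanium alloys..."},
    {"keywords": ["alloy", "aluminum"], "text": "Aluminum alloy test results..."},
    {"keywords": ["welding"],           "text": "Arc welding procedure..."},
]

def search(keyword):
    # Returns every document tagged with the keyword, relevant or not.
    return [d["text"] for d in documents if keyword in d["keywords"]]

# 'alloy' matches every alloy document; with hundreds of documents this
# would return hundreds of results, as noted in the text.
print(search("alloy"))
```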
Fig 6. Store Knowledge as Documents.
ONTOLOGY BASED APPROACHES
Ontology is a scheme for characterizing and classifying knowledge so that it can be accessed easily. For
example engines could be classified into a) human powered, b) steam powered, c) gas powered etc. The best
way to understand ontology is to consider the Dewey decimal system. This is a system for properly shelving
books in libraries. In this scheme, each subject is given a number, e.g. 000 for ‘general works, computers
and information’, 200 for ‘religion’, 600 for ‘technology’, 800 for ‘literature’ etc. Thus “The Adventures of
Sherlock Holmes” by Arthur Conan Doyle will have a call# of 823.91, being in literature. An ontology is a
scheme to similarly categorize knowledge items so that they can be retrieved easily with a standardized set of
keywords. (In what way is it an improvement over document based approaches?). Another way to think of
an ontology is as a formal description of a domain (e.g. welding).
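The Dewey decimal idea above can be sketched as a small lookup table in Python. The class numbers restate the examples in the text; the 100-number span per class and the helper function are simplifying assumptions.

```python
# Sketch of the Dewey decimal idea: each subject class gets a standardized
# number, so items can be shelved and retrieved consistently.
DEWEY = {
    "general works, computers and information": 0,
    "religion": 200,
    "technology": 600,
    "literature": 800,
}

def shelf_range(subject):
    # Assumption for illustration: each main class spans 100 call numbers.
    start = DEWEY[subject]
    return (start, start + 99)

# "The Adventures of Sherlock Holmes" (call# 823.91) falls in literature.
start, end = shelf_range("literature")
print(start <= 823.91 <= end)  # True
```

An ontology categorizes knowledge items in the same spirit, so that a standardized set of keywords retrieves them.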
THE DAST ONTOLOGY
DAST is a simple ontology to retrieve government resources online. It uses the keywords ‘Data type’,
‘Agency’, ‘Action’, ‘Subject’, ‘Location’ (not shown below) and ‘Time’, as shown in the diagram and discussed
below.
Data Type: This describes the fundamental nature of the data, whether it is regulatory, descriptive, statistical,
ownership related etc. Thus information about an overseas embassy would be descriptive while the date a
license was issued falls into the sub-category of “ownership.”
Action: Describes the action associated with the subject that we are interested in. If there is none, the default
is “stative.” Thus in “pork consumption”, “consumption” is an action associated with the subject. Multiple
subjects will be associated with multiple actions. In queries involving regulations or ownership of property,
the action would be “stative.”
Agency: This describes the organization or organizations involved in the query. Examples include Microsoft,
the Federal Trade Commission, National Space Center etc. The default is “any.”
Subject: This describes the subject or main focus of the query and corresponds to the grammatical notion of
the concept. Subjects can be abstract or physical. There can be more than one subject.
Time: This attribute is used if there is a time dimension for the subject such as a license valid for a certain
duration or an expiry time for a tax credit. The time point could be a specific time point such as August 3rd,
2013 or a range such as 2000-2013.
Location: This attribute is used if there is a physical location associated with the subject. There can be more
than one location associated with a subject as in a ship going to multiple ports.
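The DAST attributes above can be sketched as query templates in Python. The two items mirror the examples discussed next in the text; representing them as dictionaries, and the matching rule, are assumptions for illustration.

```python
# Sketch of DAST-style knowledge items and query templates as dictionaries.
items = [
    {"data_type": "statistical", "agency": "USDA", "action": "consumption",
     "subject": "meat", "time": "2002-2012", "location": "U.S."},
    {"data_type": "regulatory", "agency": "any", "action": "stative",
     "subject": "minimum wage", "time": None, "location": None},
]

def match(query, item):
    # An item matches when every attribute specified in the query agrees;
    # unspecified attributes default to "any" and are simply omitted.
    return all(item.get(k) == v for k, v in query.items())

query = {"data_type": "regulatory", "subject": "minimum wage"}
print([i["subject"] for i in items if match(query, i)])  # ['minimum wage']
```

Because every item is described with the same small set of attributes, retrieval uses a standardized vocabulary instead of free-form keywords.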
Fig 7. The DAST Ontology
EXAMPLE KNOWLEDGE ITEMS
Item 1 is concerned with the per capita meat consumption in the U.S. Its most prominent qualities are time
period and data type. The data is available for the decade 2002-2012 and it is statistical in nature. When
querying for this data, a template similar to Fig 9 below may be used, which indicates that a person is searching for ‘statistical data’ from the ‘USDA’ concerning ‘meat’ during the time period ‘2002-2012’.
ITEM 1: Excerpted from the U.S. Department of Agriculture Economic Research
Fig 8. The Per Capita Meat Consumption in the U.S
Fig 9. A template for querying Item#1
Item 2 is concerned with the minimum wage and overtime pay for hourly employees. Its only salient quality is the type of data, which is regulatory in nature.
ITEM 2: Excerpted from the U.S. Department of Labor, Fair Labor Standards Act [http://www.dol.gov/elaws/esa/flsa/screen5.asp]. “The FLSA has been amended on many occasions since 1938. Currently, workers covered by the FLSA are entitled to the minimum wage of $8.15 per hour and overtime pay at a rate of not less than one and one-half times their regular rate of pay after 40 hours of work in a workweek. “
Fig 10. A template for querying Item#2
When querying for this, a template similar to the above template may be used, which indicates that a person is searching for ‘regulatory’ knowledge from ‘any organization’ concerning ‘minimum wage’ or ‘overtime’.
ANOTHER EXAMPLE OF AN ONTOLOGY (STEEL INDUSTRY CASE STUDY)
Fig 11. Terminology diagram
The diagram above shows the ontology, while the diagram below shows the terminology for a steel coil. There are two types of coil that are ordered, dual phase and HSLA. For quality purposes, a steel coil is sampled at points sp1, sp2, etc. The length scope is the distance between sampling points. Defects can appear at sampling points, as indicated by d1, d2 etc. Defects can be mechanical or material defects (only mechanical are shown above). Mechanical defects include temperature, skin pass, yield-strength etc. Defects can cause phenomena such as discoloration or stress that can extend into the coil.
x – x axis sp – sampling point d – defect ls – length scope
Fig 12. Ontology for steel coil
AI BASED APPROACHES
Artificial Intelligence based approaches use AI technologies to manage organizational knowledge. The great
advantage is that knowledge can be accessed by querying the knowledge base in a natural language (i.e.
spoken or written commands). Unfortunately, in this approach, even simple examples of professional
knowledge such as “Insurance is the obligation to compensate the insured in the event of a loss” are
surprisingly difficult to represent. The problem is that concepts such as ‘obligation’, ‘compensate’, ‘event of a
loss’, and ‘loss’ are abstract and therefore challenging and awkward to represent. At the present time the state
of the art of AI technology permits only representation of administrative support knowledge. This can be
used to represent knowledge such as “Van leaves DBSS at 8:00 am” or that “DBSS cannot own fixed
assets.” The approach uses systems developed using AI languages such as Visual Prolog.
THE AEI-3 SCHEME
A specialized representation scheme known as AEI-3 has been developed to store administrative support
knowledge. There are two types of links, structural (“S”) and descriptive links (“D”). ‘S’ links can designate
any structural relationship such as part-subpart (“has_a”) or class-instance (“is_a”), while “D” links designate
attributes and are defined between classes, instances and/or “extensions”. In the example below ‘Rangarajan’
is asserted as an instance of ‘Instructor’ through the ‘s:is_a’ link. Note the use of a double walled rectangle
for class and a single bordered rectangle for an instance. There is a ‘D:’ link between ‘Rangarajan’ and ‘C++’
showing that ‘Rangarajan’ teaches ‘C++’. There are no restrictions on link descriptors (“travel arrangements”)
except that they be minimalist.
Fig 13. An Example of AEI-3 Scheme
The second diagram below shows that Susan is an ‘instructor’ and that she is ‘available’ in ‘August’ which is
an instance of ‘month’. The collection of such items becomes a knowledge base. Such information cannot
be captured in conventional/rule-based knowledge bases. Note that it is necessary to capture, in this
knowledge base, common knowledge that ‘August’ is a ‘month’.
Fig 14. An Example of AEI-3 Scheme
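The AEI-3 idea can be sketched in Python as a set of labeled links. The triples restate the two figures above; representing links as tuples and the query helper are assumptions for illustration, not part of the AEI-3 scheme itself.

```python
# Sketch of AEI-3 knowledge as labeled links: 'S' links are structural
# (is_a / has_a), 'D' links are descriptive attributes between items.
links = [
    ("Rangarajan", "S:is_a", "Instructor"),
    ("Rangarajan", "D:teaches", "C++"),
    ("Susan", "S:is_a", "Instructor"),
    ("Susan", "D:available", "August"),
    ("August", "S:is_a", "Month"),   # common knowledge must also be captured
]

def query(subject=None, link=None, obj=None):
    # Return every link matching the parts that were specified.
    return [(s, l, o) for (s, l, o) in links
            if (subject is None or s == subject)
            and (link is None or l == link)
            and (obj is None or o == obj)]

# Which instances are instructors?
print([s for s, _, _ in query(link="S:is_a", obj="Instructor")])
```

A collection of such links forms the knowledge base; note that the fact that ‘August’ is a ‘month’ has to be asserted explicitly, as the text points out.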
KM USAGE – ASSESSMENT
The objective of the assessment is to assess the effectiveness of the KM program. One approach is to assess
knowledge levels in different areas of the organization. The assessment is carried out by department
managers and/or senior managers. Weaknesses if identified, become the targets of next year’s KM focus.
EVALUATE KNOWLEDGE
Knowledge is evaluated in areas such as ‘customer/market’, ‘employee/organization’, ‘product/service’. In
these areas, progress in knowledge is evaluated along the dimensions of ‘experience’, ‘image’, ‘fixed form’ i.e.
some outputs, and ‘system’. Managers are asked, ‘Compared to the previous year’s levels, how do you rate
customer knowledge?’ and similar questions for other items of knowledge. They assign a number relative to a
base of 100 (not a maximum of 100).
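The base-100 rating can be made concrete with a short sketch in Python. The ratings are hypothetical; the point is that 100 represents the previous year's level, not a maximum score.

```python
# Sketch of a base-100 knowledge rating: the previous year's level is 100,
# so a rating of 85 means a 15% perceived decline and 110 a 10% gain.
def change_from_base(rating, base=100):
    return (rating - base) / base  # fractional change vs. the previous year

print(change_from_base(85))   # -0.15: knowledge in this area declined
print(change_from_base(110))  # 0.1: knowledge in this area improved
```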
As the chart below illustrates, ‘experience’ knowledge seems to have decreased significantly in all three areas,
i.e. ‘customer/market’, ‘employee/organization’ and ‘product/service’. ‘Image’ knowledge has decreased
significantly in one area, slightly in another area and slightly increased in the third area. ‘Fixed form’ and
‘system’ have increased across all three areas. The knowledge assessment highlights the problem areas.
Fig 15. An Example of a Knowledge Audit
CASE STUDY OF KM INITIATIVE AT HP CONSULTING
The following case study describes the implementation of Knowledge Management (KM) initiatives at HP
Consulting in the early 2000’s. The company provides global consulting in managing IT services, PC support,
CRM and ERP. The KM Initiative was motivated by client demands for expertise, competitive pressures and
the global nature of the business. The objectives were to deliver more value and expertise to clients without
increasing the number of hours worked and to make knowledge sharing part of the corporate culture.
A pilot program was undertaken at two sites in North America. Management regarded these sites as good
candidates because only a few consultants there had expertise in the area and it had to be shared quickly. A
four step change model was utilized. Employees would be first mobilized by educating them about the
initiative and its vision. The vision emphasized the benefits and values of the initiative, such as knowledge
sharing being a valued behavior. This was operationalized through programs designed to share knowledge
including Learning Communities, Project Snapshots and Knowledge Mapping. In the transition stage, the KM
culture was propagated to the rest of employees in two day workshops. Despite initial resistance (due to
perceptions that it was ‘fluff’ and a waste of time), the workshops were successful in introducing knowledge
relationships among participants. They quickly formed learning communities and began to share their
expertise. Anecdotes regarding knowledge-sharing helped accelerate this trend. Outcomes included reduced
consulting time, re-use of trade material, increased levels of expertise and enthusiasm for the project.
With the success of the pilot programs, management of intellectual capital became an essential part of HP
Consulting’s strategy and was quickly incorporated into its knowledge processes.
CASE STUDY OF SIEMENS SHARENET
This case study concerns the Siemens Sharenet, an IT based KM system. Siemens is a large
Telecommunications conglomerate with headquarters in Munich and sales of over € 60b. It employs over
400,000 people in 190 countries. Up until the mid 1990’s Siemens dominated its markets, but
telecommunications deregulation and change in data traffic worldwide forced it to change from a simple
manufacturer of ‘boxes’ to a provider of customer communications solutions. As part of its re-organization,
the carrier and enterprise branches of Siemens were merged into a new group called the Information and
Communications Network (ICN). The new group had 65,000 employees and € 12b in sales. The case is
focused on this group. Since the company was shifting its emphasis from a “products” provider to a
“solution’ provider, it needed to leverage the expertise of its employees.
The president of strategy, Joachim Doring, invited a select group of top sales persons and developed a
detailed map of the selling process and identified broad categories of business knowledge relevant to each
stage. They planned for the system to use familiar technologies such as intranet, email, library and a forum.
It differed from other KM systems by being a multi-platform approach. The core part of the system consists
of the library that has knowledge objects. There were two types ‘environment objects’ and ‘solution objects.’
The environment objects have knowledge about competitors, markets, projects, customers etc. – “how did you
approach a customer?,” “what factors drive the customer’s decision process?” while the solution objects focused
on technical and functional solutions to specific situations – “how do you lay a cable in the jungle?” “how do
we set addresses for an IP based phone system?” To enter knowledge objects, employees filled in a series of
structured questions, “What type of technology is it?” “What are the key features?”. A forum was used for
urgent requests, for example “How dangerous is it to lay cable in the jungle?”
After several meetings, the first version of Sharenet was developed by a web development company in April
1999 and pilots were run in China, Malaysia, Portugal and Australia. The pilots were not successful as the
system was difficult to use. Subsequently, another company was hired and it revamped Sharenet so it was
easier to use even in places where bandwidth was low. The next challenge was in getting employees to
use the system. To accomplish this, 2-3 day workshops were held. Participants were asked to identify
unsolved work problems and encouraged to post them in Sharenet. To their amazement, most
participants received solutions in a day or less. To further encourage usage, “evangelist” managers in other
countries were recruited to help propagate Sharenet. Technical support was provided by consultants at the
Munich headquarters, who also ensured that the content that was submitted was useful and clear. By 2000
there were 3,800 users and the $7.8 m operating cost was absorbed by headquarters. One of the problems
encountered was the reluctance to share knowledge. To address this issue, a system of reward points called
Sharenet shares was introduced. An employee could earn 3 shares by answering an urgent request or 20
shares for providing a success story/technical solution. Each share was equivalent to a Euro and could be
redeemed for company products, textbooks or business trips. This resulted in an increase in contributions.
Control of content was also shifted back to the functional areas. Each area of the organization had a
designated “global editor” who reviewed and published the content. The total number of objects published
worldwide was approximately 20,000. One of the success stories was the winning of a $460,000
telecommunications contract for a hospital that made use of the knowledge in Sharenet to prove that the
Siemens network was more reliable. By July 2002, Sharenet had 19,000 subscribers.
The system saved time for employees as well as for Siemens and provided visibility for “unsung” experts in
the organization. Employees earned rewards for contributing their knowledge. Approximately 2.5 million
“shares” were earned by employees. The system played a key role in a total of twenty seven projects
representing € 120 million. The system also earned an award from the American Productivity and Quality
Center. Sharenet was unfortunately discontinued in 2003 due to financial problems in the parent company
and the unwillingness of division executives to share in the costs.
CASE STUDIES OF TECHNICAL APPROACHES
Fig 16. Case Studies of Technical Approaches