business intelligence and applications
TRANSCRIPT
[email protected] Page 1
Difference Between GDSS and DSS
• GDSS focuses on group decisions, while DSS focuses on the decisions of an individual.
• GDSS has a networking structure or technology that DSS does not.
• DSS relies on a knowledge base to some degree, while a GDSS does not.
• GDSS requires a working connection between users, while DSS does not.
• GDSS is applicable to a broader range of situations than DSS.
Groupware
is a term for technology designed to help people collaborate, and it covers a wide range of applications. Wikipedia defines three handy
categories of groupware:
Communication tools: tools for sending messages and files, including email, web publishing, wikis, file sharing, etc.
Conferencing tools: e.g. video/audio conferencing, chat, forums, etc.
Collaborative management tools: tools for managing group activities, e.g. project management systems, workflow systems,
information management systems, etc.
The best known groupware system is Lotus Notes.
If designed and implemented properly, groupware systems are very useful when it comes to supporting knowledge management (KM). They can
greatly facilitate explicit knowledge sharing through publishing and communication tools. They can support the knowledge creation process with
collaborative management tools, although this process is still very much about people interacting and experimenting. Finally, they have some
limited benefit to tacit knowledge transfer by supporting socialization through tools like video conferencing and informal communication. Expert
finders are also beneficial for facilitating the location of tacit sources of knowledge (i.e. the right expert).
Data mart
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset
of the data warehouse that is usually oriented to a specific business line or team. Data marts are small slices of the data warehouse. Whereas
data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each
department or business unit is considered the owner of its data mart, including all the hardware, software and data.[1] This enables each
department to use, manipulate and develop its data any way it sees fit, without altering information inside other data marts or the data
warehouse. In other deployments where conformed dimensions are used, this business-unit ownership does not hold true for shared dimensions
like customer, product, etc.
The primary use for a data mart is business intelligence (BI) applications. BI is used to gather, store, access and analyze
data. Smaller businesses can use a data mart to exploit the data they have accumulated. A data mart can be less expensive
than implementing a data warehouse, making it more practical for a small business, and it can be set up in much
less time, often in under 90 days. Since most small businesses need only a small number of BI applications, the low
cost and quick setup of a data mart make it a suitable method for storing data.
Reasons for creating a data mart
Easy access to frequently needed data
[email protected] Page 2
Creates collective view by a group of users
Improves end-user response time
Ease of creation
Lower cost than implementing a full data warehouse
Potential users are more clearly defined than in a full data warehouse
Contains only business essential data and is less cluttered
Data manipulation language
A data manipulation language (DML) is a family of syntax elements similar to a computer programming language used for inserting,
deleting and updating data in a database. Performing read-only queries of data is sometimes also considered a component of DML.
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in
a relational database.[1] Other forms of DML are those used by IMS/DLI, CODASYL databases, such as IDMS and others.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a
verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
The purely read-only SELECT query statement is classed with the 'SQL-data' statements[2] and so is considered by the standard to be
outside of DML. The SELECT ... INTO form is considered to be DML because it manipulates (i.e. modifies) data. In common practice though, this
distinction is not made and SELECT is widely considered to be part of DML.[3]
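The four DML verbs listed above can be exercised end to end with Python's built-in sqlite3 module. The `employees` table and its contents are hypothetical, invented purely to illustrate the statements.

```python
import sqlite3

# In-memory database with a hypothetical employees table (illustrative only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# INSERT INTO ... VALUES ...
conn.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", ("Asha", 50000))
conn.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", ("Ravi", 45000))

# UPDATE ... SET ... WHERE ...
conn.execute("UPDATE employees SET salary = 55000 WHERE name = 'Asha'")

# DELETE FROM ... WHERE ...
conn.execute("DELETE FROM employees WHERE name = 'Ravi'")

# SELECT ... FROM ... WHERE ... (the read-only statement discussed above)
rows = conn.execute("SELECT name, salary FROM employees WHERE salary > 40000").fetchall()
print(rows)  # [('Asha', 55000.0)]
conn.close()
```

Note that only the INSERT, UPDATE and DELETE statements modify data; the final SELECT merely reads it back, which is exactly the distinction the standard draws.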
Most SQL database implementations extend their SQL capabilities by providing imperative, i.e. procedural languages. Examples of
these are Oracle's PL/SQL and DB2's SQL PL.
Data manipulation languages tend to have many different flavors and capabilities between database vendors. There have been a
number of standards established for SQL by ANSI,[1] but vendors still provide their own extensions to the standard while not implementing the
entire standard.
Data manipulation languages are divided into two types: procedural and declarative.
Data manipulation languages were initially only used within computer programs, but with the advent of SQL have come to be used
interactively by database administrators.
Data Warehousing
It is a process: a technique for assembling and managing data from various sources for the purpose of answering business
questions, thus enabling decisions that were not previously possible.
A decision support database maintained separately from the organization’s operational database
[email protected] Page 3
A data warehouse is a
subject-oriented
integrated
time-variant
non-volatile
collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse 1996
Advantages of Warehousing
• High query performance
• Queries not visible outside warehouse
• Local processing at sources unaffected
• Can operate when sources unavailable
• Can query data not stored in a DBMS
• Extra information at warehouse
◦ Modify, summarize (store aggregates)
◦ Add historical information
UNIT-1
[email protected] Page 4
Decision Support System
Stages of Decision Making
(From Herbert Simon)
• Intelligence (in the military sense of gathering information)
• Design (Identifying the alternatives, structuring how the decision will be made)
• Choice (Picking an alternative or making the judgment)
• [Implementation – later added by other authors]
• [Evaluation]
• Each stage can be Structured (automated) or Unstructured
• “Structured” means that there is an algorithm, mathematical formula, or decision rule to accomplish the entire
stage. The algorithm can be implemented manually or it can be computerized, but the steps are so detailed
that little or no human judgment is needed.
• Any decision stage that is not structured is unstructured
• In a structured decision all three stages are structured
• In a non-structured decision all three stages are unstructured
• A semi-structured decision is one in which part, but not all, of the decision is structured.
• The realm of Decision Support Systems is Semi-Structured and unstructured Decisions…
• The type of decisions that can benefit from “decision support” but the human decision maker is still involved.
• A decision support system (DSS) is a computer-based application that collects, organizes and analyzes business
data to facilitate quality business decision-making for management, operations and planning.
• A DSS is a computer system at the management level of an organization that combines data, sophisticated
analytical tools & user friendly software to support semi-structured & unstructured decision making.
Capabilities of a Decision Support System (1)
• Support for problem-solving phases--Intelligence, design, choice, implementation, monitoring
• Support for different decision frequencies
Ad hoc DSS: Concerned with decisions that come up rarely, perhaps once in five years (e.g., where should a company open
a new distribution center?)
Institutional DSS: Concerned with decisions that repeat (e.g., what should the company invest in?)
• Support for different problem structures
– Highly structured problems: Known facts and relationships
– Semi-structured problems: Facts unknown or ambiguous, relations vague
[email protected] Page 5
– E.g., which person to hire for a position?
• Support for various decision-making levels
– Operational level: Daily decisions
– Tactical level: Planning and control
– Strategic level: Long-term decisions
Support for Various Decision-Making Levels
Selected DSS Applications
[email protected] Page 6
Components of DSS
• A typical Decision support systems has four components: data management, model management, knowledge
management and user interface management.
Data Management Component
The data management component performs the function of storing and maintaining the information that you
want your Decision Support System to use. The data management component, therefore, consists of both the
Decision Support System information and the Decision Support System database management system.
Model Management Component
The model management component consists of both the Decision Support System models and the Decision
Support System model management system.
[email protected] Page 7
Decision Support Systems help in various decision-making situations by utilizing models that allow you to
analyze information in many different ways.
For example, you would use what-if analysis to see what effect changing one or more variables has
on other variables, or optimization to find the most profitable solution given operating restrictions and limited
resources. Spreadsheet software such as Excel can be used as a Decision Support System for what-if analysis.
The model management system stores and maintains the Decision Support System's models. Its function of
managing models is similar to that of a database management system. The model management component cannot
select the best model for a particular problem (that still requires your expertise), but it can help you create
and manipulate models quickly and easily.
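As a small sketch of what the what-if analysis described above looks like, the following compares a baseline against a changed scenario in a tiny profit model. The model and every number in it are made up for illustration.

```python
def profit(units_sold, price, unit_cost, fixed_cost):
    """A hypothetical profit model: revenue minus variable and fixed costs."""
    return units_sold * (price - unit_cost) - fixed_cost

# Baseline scenario
base = profit(units_sold=1000, price=20.0, unit_cost=12.0, fixed_cost=5000.0)

# What-if: raise the price by 10% but lose 5% of sales volume
scenario = profit(units_sold=950, price=22.0, unit_cost=12.0, fixed_cost=5000.0)

print(base, scenario)  # 3000.0 4500.0
```

This is exactly the kind of variable-by-variable exploration a spreadsheet supports; a DSS model management component packages such models for reuse.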
User Interface Management Component
The user interface management component allows you to communicate with the Decision Support System. It
consists of the user interface management system. This is the component that allows you to combine your know-how
with the storage and processing capabilities of the computer.
The user interface is the part of the system through which you enter information, commands, and models.
Knowledge Management System
The Knowledge-Based Management Subsystem can support any of the other components or act as an independent
component.
The knowledge management component, like that in an expert system, provides information about the relationship
among data that is too complex for a database to represent. It consists of rules that can constrain possible solutions as
well as alternative solutions and methods for evaluating them.
Types Of DSS
• A communication-driven DSS supports more than one person working on a shared task; examples include
integrated tools like Microsoft's NetMeeting and e-mail.
• A data-driven DSS or data-oriented DSS emphasizes access to and manipulation of a time series of internal
company data and, sometimes, external data.
• A document-driven DSS manages, retrieves, and manipulates unstructured information in a variety of
electronic formats.
• A knowledge-driven DSS provides specialized problem-solving expertise stored as facts, rules, procedures, or
in similar structures.
• A model-driven DSS emphasizes access to and manipulation of a statistical, financial, optimization, or
simulation model.
[email protected] Page 8
GDSS – Group Decision Support Systems
Group Decision Support Systems (GDSS) - An interactive, computer-based system that facilitates solution of
unstructured problems by a set of decision-makers working together as a group. It aids groups, especially
groups of managers, in analyzing problem situations and in performing group decision making tasks.
Group Support Systems has come to mean computer software and hardware used to support group functions
and processes.
Group Decision Support Systems (GDSS)
• Class of electronic meeting systems, designed to support meetings and group work.
• Electronic meeting system (EMS) is a type of computer software that facilitates creative problem solving and
decision-making of groups within or across organizations
• Mainly through anonymization and parallelization of input, electronic meeting systems overcome many
inhibitive features of group work.
Process in GDSS
• Similar to a web conference, a host invites the participants to an electronic meeting via email.
• Each user typically has his or her own computer and can contribute to the same shared object at the
same time.
• Thus, nobody needs to wait for a turn to speak, so people don't forget what they want to say while
waiting for the floor.
• The group can focus on the content and meaning of ideas, rather than on their sources.
Similarities Between GDSS and DSS
Both use models, data and user-friendly software
Both are interactive with “what-if” capabilities
Both use internal and external data
Both allow the decision maker to take an active role
Both have flexible systems
Both have graphical output
Characteristics of a GDSS (1)
Special design:
◦ Effective communication
◦ Group decision making
Ease of use
Flexibility
◦ Accommodate different perspectives
[email protected] Page 9
Anonymous input
◦ Individuals’ names are not exposed
Parallel communication
Decision-making support
◦ Delphi approach: structured rounds of input from decision makers who may be scattered around the globe
◦ Brainstorming: say ideas as they occur to you (think out loud)
◦ Group consensus approach: the group reaches a unanimous decision (everybody agrees)
◦ Nominal group technique: structured idea generation followed by voting
Reduction of negative group behavior
◦ A trained meeting facilitator to help with sidetracking
Automated record keeping
Examples of GDSS Software
Lotus Notes
◦ Store, manipulate, distribute memos
Microsoft Exchange
◦ Keep individual schedules
◦ Decide on meeting times
NetDocuments Enterprise
◦ Two people can review the same document together
GDSS Time/Place Environment
GDSS Alternatives
[email protected] Page 10
Advantages of GDSS
Anonymity – drives out fear, leading to better decisions from a diverse hierarchy of decision makers
Parallel communication – eliminates monopolizing of the discussion, increasing participation and improving decisions
Automated record keeping – no need to take notes; they're automatically recorded
Ability for virtual meetings – only hardware, software and connected people are needed
Portability – can be set up to be portable, e.g. on a laptop
Global potential – people can be connected across the world
No need for a computer guru – although some basic experience is a must
Disadvantages of GDSS
Cost – infrastructure costs for hardware, software, rooms and network connectivity can be very high
Security – especially when companies rent GDSS facilities; also, the facilitator may be a lower-level
employee who could leak information to peers
Technical failure – power loss or loss of connectivity; the system relies heavily on bandwidth and LAN/WAN
infrastructure, though a properly set up system should minimize this risk
Keyboarding skills – frustration may reduce participation
Training – there is a learning curve for users, varying by situation
Perception of messages – tone and intent can be misread without verbal cues
Typical GDSS Process
1) Group leader (and facilitator) select software and develop an agenda
2) Participants meet (in a decision room or over the Internet) and are given a task
3) Participants generate ideas – brainstorming anonymously
4) Facilitator organizes ideas into categories (different for user-driven software)
5) Discussion and prioritization – may involve ranking by some criteria and/or rating on the facilitator's scale
6) Repeat steps 3, 4 and 5 as necessary
7) Reach a decision
8) Recommended: provide feedback on the decision and results to all involved
Future Implications of GDSS
Integrating into existing corporate framework
◦ GDSS brings changes which must be managed
GDSS will incorporate Artificial Intelligence and Expert Systems – the software will “learn” and help the users
make better decisions
Decreasing cost will allow more organizations to use GDSS
Increasing implementation of GDSS with the customer
◦ Customers voice their needs in a non-threatening environment
Choosing The Right GDSS
Consider the following:
◦ Decision Task Type
◦ Group Size
◦ Location of members of the group
EXPERT SYSTEMS
Components of an Expert System
Knowledge base
◦ Stores all relevant information, data, rules, cases, and relationships used by the expert system
Inference engine
◦ Seeks information and relationships from the knowledge base and provides answers, predictions, and
suggestions in the way a human expert would
Rule
◦ A conditional statement that links given conditions to actions or outcomes
Fuzzy logic
◦ A specialty research area in computer science that allows shades of gray and does not require
everything to be simply yes/no, or true/false
Backward chaining
[email protected] Page 12
◦ A method of reasoning that starts with conclusions and works backward to the supporting facts
Forward chaining
◦ A method of reasoning that starts with the facts and works forward to the conclusions
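Forward chaining can be sketched in a few lines: rules fire whenever all their conditions are known facts, and their conclusions become new facts, until nothing more can be derived. The rules and facts below are hypothetical, invented only to show the mechanism.

```python
# Each rule: (set of required conditions, conclusion to add)
rules = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "refer_to_doctor"),
]

def forward_chain(facts, rules):
    """Repeatedly fire any rule whose conditions are all known facts,
    adding its conclusion, until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = forward_chain({"has_fever", "has_cough", "short_of_breath"}, rules)
print("refer_to_doctor" in result)  # True
```

Backward chaining would run the same rules in the opposite direction: start from the goal "refer_to_doctor" and search for rules whose conclusions support it.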
Limitations of Expert Systems
Not widely used or tested
Limited to relatively narrow problems
Cannot readily deal with “mixed” knowledge
Possibility of error
Cannot refine own knowledge base
Difficult to maintain
May have high development costs
Raise legal and ethical concerns
UNIT-2
Data Warehousing
A data warehouse (DW) is a collection of integrated databases designed to support a DSS
An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired
raw data.
A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users
(rather than the entire firm)
[email protected] Page 13
The metadata is information that is kept about the warehouse
Online Analytical Processing (OLAP) is the broad category of software technology that enables multidimensional analysis
of enterprise data
[email protected] Page 14
• Data sources often store only current data, not historical data
• Corporate decision making requires a unified view of all organizational data, including historical data
• A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a
unified schema, at a single site
◦ Greatly simplifies querying, permits study of historical trends
◦ Shifts decision support query load away from transaction processing systems
Data Warehouse vs. Data Mart
1. Data warehouse: a large store of data accumulated from a wide range of sources within a company and used to guide management decisions. Data mart: one of the access layers of the data warehouse environment, used to get data out to the users; data marts are small slices of the data warehouse.
2. Data warehouse: corporate/enterprise-wide. Data mart: department-specific in nature.
3. Data warehouse: the union of all data marts. Data mart: a subset of a data warehouse.
4. Data warehouse: data is received from the staging area. Data mart: follows the star-join approach (i.e. facts and dimensions are connected).
5. Data warehouse: structure is more suitable for a corporate view of data. Data mart: structure is more suitable for a departmental view of data.
6. Data warehouse: handles queries on the presentation resource. Data mart: data is technologically optimal for data access and analysis.
Design Issues
• When and how to gather data
◦ Source-driven architecture: data sources transmit new information to the warehouse, either continuously
or periodically (e.g. at night)
◦ Destination-driven architecture: the warehouse periodically requests new information from data sources
◦ Keeping the warehouse exactly synchronized with data sources (e.g. using two-phase commit) is too
expensive
◦ It is usually acceptable for warehouse data to be slightly out of date
◦ Data/updates are periodically downloaded from online transaction processing (OLTP) systems
• What schema to use
◦ Schema integration
• Data cleansing
◦ E.g. correct mistakes in addresses (misspellings, zip code errors)
◦ Merge address lists from different sources and purge duplicates
• How to propagate updates
◦ The warehouse schema may be a (materialized) view of the schema from data sources
• What data to summarize
◦ Raw data may be too large to store on-line
◦ Aggregate values (totals/subtotals) often suffice
◦ Queries on raw data can often be transformed by the query optimizer to use aggregate values
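The star-join approach mentioned for data marts can be sketched with Python's sqlite3: a fact table holds measures keyed to small dimension tables, and queries join them together. All table and column names here are invented for illustration.

```python
import sqlite3

# A toy star schema: one fact table joined to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'steak'), (2, 'charcoal');
INSERT INTO dim_store   VALUES (1, 'Delhi'), (2, 'Mumbai');
INSERT INTO fact_sales  VALUES (1, 1, 300.0), (2, 1, 50.0), (1, 2, 200.0);
""")

# Star join: facts and dimensions connected through foreign keys
rows = conn.execute("""
    SELECT p.name, s.city, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_store   s ON f.store_id   = s.store_id
    GROUP BY p.name, s.city
    ORDER BY p.name, s.city
""").fetchall()
print(rows)
conn.close()
```

The dimensions (product, store) give the business context; the fact table carries the numbers to be aggregated.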
[email protected] Page 16
Architecture of Data Warehouse
[email protected] Page 17
[email protected] Page 18
[email protected] Page 19
Data Marts
• Data warehouses can support all of an organization's information
• Data marts hold subsets of an organization-wide data warehouse
• Data mart – a subset of a data warehouse in which only a focused portion of the data warehouse information is kept
[email protected] Page 20
[email protected] Page 21
[email protected] Page 22
Unit-3
Data Mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD)
• Definition: “the analysis of data to discover previously unknown relationships that provide useful information”
(Hand et al.)
• Data mining makes use of statistical and visualisation techniques to discover and present information in a form
that is easily comprehensible
• Data mining can be applied to tasks such as decision support, forecasting, estimation, and uncovering and
understanding relationships among data elements
• Traditionally the task of identifying and utilising information hidden in data has been achieved through some
form of traditional statistical methods
• Typically, this involves a user formulating a guess about a possible relationship in the data and evaluating this
hypothesis via a statistical test. This is a largely time-intensive, user-driven, top-down approach to data
analysis.
• With data mining, the interrogation of the data is done by the data mining algorithm rather than by the user
[email protected] Page 23
• Data mining is a self-organising, data-influenced, bottom-up approach to data analysis
• Simply put, what data mining does is sort through masses of data to uncover patterns and relationships, then
build models to predict behaviours
Verification vs. Knowledge Data Discovery
• In the past, decision support activities were primarily based on the concept of verification
• This required a great deal of prior knowledge on the decision-maker’s part in order to verify a suspected
relationship
• With the advance of technology, the concept of verification began to turn into knowledge data discovery
Knowledge Data Discovery
• Knowledge data discovery (KDD) techniques include: statistical analysis, neural or fuzzy logic, intelligent
agents, data visualisation
• KDD techniques not only discover useful patterns in the data, but also can be used to develop predictive
models
Data mining
Data mining involves six common classes of tasks:[1]
Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that
might be interesting, or of data errors that require further investigation.
Association rule learning (Dependency modeling) – Searches for relationships between variables. For example
a supermarket might gather data on customer purchasing habits. Using association rule learning, the
supermarket can determine which products are frequently bought together and use this information for
marketing purposes. This is sometimes referred to as market basket analysis.
Clustering – is the task of discovering groups and structures in the data that are in some way or another
"similar", without using known structures in the data.
Classification – is the task of generalizing known structure to apply to new data. For example, an e-mail
program might attempt to classify an e-mail as "legitimate" or as "spam".
Regression – attempts to find a function which models the data with the least error.
Summarization – providing a more compact representation of the data set, including visualization and report
generation.
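As a minimal sketch of the anomaly-detection task above, the following flags records whose z-score (distance from the mean in standard deviations) exceeds a cutoff. The sales figures and the threshold of 2.0 are arbitrary choices for illustration.

```python
import statistics

def z_score_outliers(values, threshold=2.0):
    """Flag values whose z-score exceeds the threshold (hypothetical cutoff)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

daily_sales = [100, 98, 103, 97, 101, 99, 500]  # one suspicious record
print(z_score_outliers(daily_sales))  # [500]
```

Real anomaly detectors are usually multivariate, but the principle is the same: measure how far each record sits from the bulk of the data.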
Data Mining Technologies
[email protected] Page 24
k-nearest neighbors algorithm
In pattern recognition, the k-nearest neighbors algorithm (k-NN for short) is a non-parametric method used
for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature
space. The output depends on whether k-NN is used for classification or regression:
o In k-NN classification, the output is a class membership. An object is classified by a majority vote of its
neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a
positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single
nearest neighbor.
o In k-NN regression, the output is the property value for the object. This value is the average of the values
of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all
computation is deferred until classification. The k-NN algorithm is among the simplest of all machine
learning algorithms.
Both for classification and regression, it can be useful to weight the contributions of the neighbors, so that the nearer
neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme
consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.[2]
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value
(for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training
step is required.
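The classification case described above can be written from scratch in a few lines: sort the training examples by distance to the query and take a majority vote among the k nearest. The toy two-class data below is invented for illustration.

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Classify a query point by majority vote among its k nearest
    training examples (Euclidean distance). `training` is a list of
    ((feature, ...), label) pairs."""
    by_distance = sorted(
        training,
        key=lambda item: math.dist(query, item[0]),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B"),
]
print(knn_classify((1.1, 1.0), training))  # A
print(knn_classify((5.1, 5.0), training))  # B
```

Note there is no training step, only the sort at query time, which is exactly why k-NN is called a lazy learner. Distance weighting (e.g. weight 1/d) would replace the plain vote with a weighted one.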
• Statistics – the most mature data mining technologies, but are often not applicable because they need clean
data. In addition, many statistical procedures assume linear relationships, which limits their use.
• Neural networks, genetic algorithms, fuzzy logic – these technologies are able to work with complicated and
imprecise data. Their broad applicability has made them popular in the field.
• Decision trees – these technologies are conceptually simple and have gained in popularity as better tree
growing software was introduced. Because of the way they are used, they are perhaps better called
“classification” trees.
A decision tree is a predictive model
A decision tree is a diagram that a decision maker can create to help select the best of several alternative
courses of action.
Problem, Options, Outcome of each option.
How to draw a Decision Tree
Start with the decision that needs to be made – draw a box
Draw lines to the right for the possible options
At the end of each line, if the result is uncertain, draw a circle; if another decision follows, draw a box
From the further decision boxes, draw lines for the options that can be taken
Estimate the probability of each uncertain outcome
Assign a value to each outcome and calculate which decision has the greatest expected worth to you
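The final steps amount to an expected-value calculation: weight each uncertain outcome's payoff by its probability and pick the option with the greatest worth. The decision, payoffs and probabilities below are all hypothetical.

```python
# Hypothetical decision: launch a new product or keep the current line.
# Payoffs and probabilities are invented for illustration.
options = {
    "launch_new_product": [(200000, 0.5), (-60000, 0.5)],  # (payoff, probability)
    "keep_current_line":  [(80000, 0.75), (20000, 0.25)],
}

def expected_value(outcomes):
    """Weight each outcome's payoff by its probability and sum."""
    return sum(payoff * prob for payoff, prob in outcomes)

values = {name: expected_value(outs) for name, outs in options.items()}
best = max(values, key=values.get)
print(values)  # {'launch_new_product': 70000.0, 'keep_current_line': 65000.0}
print(best)    # launch_new_product
```

Each option here is a box with two circle (chance) branches; larger trees simply nest more boxes and circles before the payoffs.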
Data Mining Techniques
• Paralleling the popularity of data mining itself, the development of new techniques is exploding as well
• Many innovations are vendor-specific, which sometimes does little to advance the state of the art
• Regardless, data-mining techniques tend to fall into four major categories:
– classification
– association
– sequencing
– clustering
Classification Methods
• The goal is to discover rules that define whether an item belongs to a particular subset or class of data
[email protected] Page 26
• For example, if we are trying to determine which households will respond to a direct mail campaign, we
want rules that separate the "probables" from the "not probables".
• These IF-THEN rules often are portrayed in a tree-like structure
Sequencing Methods
• These methods are applied to time series data in an attempt to find hidden trends
• If found, these can be useful predictors of future events
• For example, customer groups that tend to purchase products tied-in with hit movies would be targeted with
promotional campaigns timed to release dates
Clustering Techniques
• Clustering techniques attempt to create partitions in the data according to some “distance” metric
• Clustering aims to segment a diverse group into a number of similar subgroups or clusters
• The clusters formed are data grouped together simply by their similarity to their neighbours
• By examining the characteristics of each cluster, it may be possible to establish rules for classification
• In clustering, there are no predefined classes and no examples. The records are grouped together on the basis
of self-similarity.
Association Methods
• These techniques search all transactions from a system for patterns of occurrence
• A common method is market basket analysis, in which the sets of products purchased by thousands of
consumers are examined
– It finds affinity groupings that discover what items are usually purchased with others, predicting the
frequency with which certain items are purchased at the same time
• Results are then portrayed as percentages; for example, “30% of the people that buy steaks also buy charcoal”
Association: Market Basket Analysis
• This is the most widely used and, in many ways, most successful data mining algorithm
• It essentially determines what products people purchase together
• Retailers can use this information to place these products in the same area
• Direct marketers can use this information to determine which new products to offer to their current
customers
• Inventory policies can be improved if reorder points reflect the demand for the complementary products
Market Basket Analysis Method
• We first need a list of transactions to see what was purchased. This can be easily obtained from cash registers
/ POS devices.
[email protected] Page 27
• Next, we choose a list of products to analyse, and tabulate how many times each was purchased with the
others …
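The tabulation step can be sketched as follows, assuming each transaction is just the set of item names from one receipt (the baskets below are invented):

```python
from itertools import combinations
from collections import Counter

# Hypothetical point-of-sale transactions (each a set of items bought together)
transactions = [
    {"steak", "charcoal", "beer"},
    {"steak", "charcoal"},
    {"bread", "milk"},
    {"steak", "beer"},
    {"charcoal", "beer"},
]

# Tabulate how often each pair of products appears in the same basket
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Report each frequent pair's support: the share of baskets containing it
for pair, count in pair_counts.most_common(3):
    print(pair, f"{count / len(transactions):.0%}")
```

Dividing a pair's count by the number of baskets containing just one of its items would give the conditional percentage quoted in the text ("30% of the people that buy steaks also buy charcoal").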
[email protected] Page 28
Data Mining: Some Applications
• Pharmaceuticals: Massive amounts of biological and clinical information can be analysed with data mining
methods to discover new uses for existing drugs
• Healthcare: Hospitals are using data mining to perform utilisation analysis and pricing analysis, to estimate
outcomes, to improve preventive care, and to detect fraud and questionable practices
• Banking: Data mining tools help banks to understand customer behaviour, conduct profitability analysis,
improve cross-selling efforts, identify credit risk, identify customers for loan campaigns, tailor financial
products to meet customer needs, seek new customers, and enhance customer service
• Credit card companies: Predictors for credit card customer attrition and fraud are frequently identified via
data mining. Successful users of data mining include American Express and Citibank.
• Financial services: Security analysts are using data mining extensively to analyse large volumes of financial
data in order to build trading and risk models for developing investment strategies
• Telemarketing and direct marketing: In this sector, companies have gained big savings and are able to target
customers more accurately by using data mining. Direct marketers are configuring and mailing their product
catalogs based on customers' purchase history and demographic data.
[email protected] Page 29
• Airlines: As competition in the airline business increases, understanding customers' needs has become
imperative. Airlines capture customer data in order to make strategic moves such as expanding their
services into new routes.
• Manufacturers: Data mining is widely used in manufacturing industries to control and schedule technical
production processes.
• Insurance companies: The insurance industry is data intensive. Data mining has recently provided insurers
with a wealth of useful information extracted from huge databases for decision making.
• Telecommunications: By applying the insights learned through data mining, telecommunications companies
can identify products and services that maximise value and then use this information to establish marketing
campaigns to improve market share. A common example in this industry is identifying factors that influence
customer retention. In the US, telephone companies were famous for their price-cutting strategy in the past,
but the new strategy is to know their customers better. Using data mining, telephone companies are able to
provide customers with a great variety of new services they are likely to purchase.
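As a minimal illustration of the retention example above, the sketch below compares churn rates across customer groups; the records and the contract-type attribute are entirely hypothetical, and a real analysis would examine many more factors:

```python
from collections import defaultdict

# Hypothetical customer records: (contract_type, churned) -- illustrative only.
customers = [
    ("monthly", True), ("monthly", True), ("monthly", False), ("monthly", True),
    ("annual", False), ("annual", False), ("annual", True), ("annual", False),
]

# Compare churn rate per contract type: a first step towards identifying
# factors that influence customer retention.
totals = defaultdict(int)
left = defaultdict(int)
for contract, churned in customers:
    totals[contract] += 1
    if churned:
        left[contract] += 1

churn_rate = {c: left[c] / totals[c] for c in totals}
```

A large gap between the groups' churn rates would flag contract type as a factor worth investigating further.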
• Distribution and retailing: With the huge amount of consumer data flowing in daily from different sources,
especially from e-commerce Web sites, data mining helps companies learn more about their customers and
develop insights into their buying habits. Knowing the behaviours (e.g. likes and dislikes) of customers leads to
better customer service and allows companies to create one-to-one relationships with customers, hopefully
prolonging loyalty and prompting repeat business. As such, data mining is used extensively in the area of
customer relationship management. Large users of data mining in the retail industry include Wal-Mart and
Victoria's Secret.
• Remotely sensed data: Huge amounts of remotely sensed data are captured every day from satellite images
and other related sources. Data mining is used in weather prediction and in monitoring and reasoning about
ozone depletion, among other applications.
Advantages of Data Mining
• Provide better information to achieve competitive edge
– This advantage is the primary motivation for data mining. Data mining has a powerful analytical ability
to generate information, which allows an organisation to better understand itself, its customers, and
the marketplace it competes in. When used as a marketing tool, data mining often results in a sharper
competitive edge, an evidence-based selling approach, a customer-oriented marketing plan, shorter
selling cycles, and reduced operational costs.
• Add value to a data warehouse
– A data warehouse by itself is just a large repository of raw data; data mining is the process of
analysing that data and transforming it into useful information. Organisations have reported
paybacks of 10 to 70 times their data warehouse investment after data mining components
are added.
• Increase operating efficiency
– Data mining's ability to quickly organise and analyse large pools of data has dramatically increased
workplace efficiency. It allows users to create complex financial statements in minutes, compared
with the weeks required by traditional methods.
• Provide flexibility in using data
[email protected] Page 30
– With data mining, users gain control over the data. Instead of letting the system push the data, users
are now able to pull the data they need. Users can let their imagination run and manipulate data in
various ways to answer their questions. The easy-to-use interface of data mining tools and
client/server technology has made the information directly accessible by individual users.
• Reduce operating costs
– Modern data mining tools combine highly sophisticated hardware and software components, which
allow them to analyse massive data sets efficiently at reduced operating cost (e.g. the high
costs faced by public sector organisations such as healthcare providers when asked to answer a
“parliamentary question” raised in the Oireachtas could be reduced by the use of data warehouses
and data mining).
• Ready-to-use
– Unlike traditional data analysis methods, data mining hardly requires pre-processing of data prior to
analysis. It can use a mixture of numeric, categorical, and date data, and can tolerate missing and
noisy data. The results are in the form of ready-to-use business rules with almost no statistical
expertise and guesswork needed.
• Solve research bottleneck
– In many social science and business situations, conducting real experiments is almost impossible. Data
mining can provide such research with a narrower set of working hypotheses for further
investigation, based on large, unstructured data sets.
Disadvantages of Data Mining
• No definitive answer
– Data mining yields useful insights and clues but no definitive answers. The definitive answers need to
be achieved through much more rigorous scientific experimentation. Experiences from Wall Street
have shown that this technology may not outperform traditional methods. Therefore, users should
have a realistic expectation of the results of data mining.
• High cost
– The cost of implementing data mining is quite high; thus, it may not be appropriate in some business
environments. The return on investment therefore needs to be justified by a cost-benefit analysis.
• Complex and lengthy project
– Experience from data mining system developers has shown that it takes a long time to get the project
right. Developers suggest focusing on incremental development and benefits.
• Privacy
– The detailed data about individuals used in data mining might involve a violation of privacy. This
problem worsens when the World Wide Web is involved, because detailed personal information is
easily accessible and can fall into wrong hands.
• Knowledge requirement of user
[email protected] Page 31
– Despite its increasingly simple interfaces and automation of parts of the analysis process, data mining
remains more suitable for people with statistics, operations research, or management science
backgrounds. Ease of use is a critical factor in attracting more businesses to invest in this technology.
• Unmanageable database
– Many authors have suggested that organisations must increase the size of their databases
tremendously in order to do data mining. However, some are concerned that this will result in
unmanageable and unnecessary databases.
• Wrong information from errors in data
– The massive data used in data mining inevitably contains mistakes caused by human errors.
Information generated should be used with caution to avoid lawsuits in areas such as hiring. Experts
suggest using only relevant information for mining to reduce such risks.
Unit-4
Knowledge Management
KM is a process that helps organizations identify, select, organize, disseminate, and transfer important
information and expertise that are part of the organization’s memory.
Structuring knowledge enables effective and efficient problem solving, dynamic learning, strategic
planning, and decision making.
Knowledge is very distinct from data and information.
Knowledge is information that is contextual, relevant, and actionable.
Data are a collection of facts, measurements, and statistics.
Information is processed data that are timely and accurate.
[email protected] Page 33
• Capture knowledge
• Refine knowledge
• Store knowledge
• Manage knowledge
• Disseminate knowledge
Knowledge Assets
What are knowledge assets?
Codified human expertise, stored in a digital/electronic format, used to create organizational value, owned by the
organization, not vulnerable to memory loss, and commonly deployed via intranets.
Knowledge management can be interpreted as managing information.
Knowledge can be viewed as an asset, forming a major part of an organization's value.
Knowledge assets are the resources that organizations wish to cultivate.
Knowledge assets can be human, such as a person or a network of people; structural, such as business
processes; or market assets, such as the brand name of a product.
The concept of intellectual capital (IC) helps managers identify and classify the knowledge components
of an organization.
The expression “intellectual capital statement” refers to “capital”, emphasizing the accounting value.
IC helps managers manage and evaluate company performance.
A knowledge asset creates, modifies, stores and/or disseminates knowledge objects.
For example, a person is a knowledge asset that can create new ideas, learnings, and proposals (knowledge
objects).
What are the characteristics of knowledge assets?
1. promote understanding
2. provide guidance for decision-making
3. record facts about critical decisions
4. create metaknowledge about how work changes
Why are knowledge assets valuable?
• Better Decisions. Make decisions faster and at lower levels. Assets allow work to be accomplished with less
supervision and intervention.
[email protected] Page 34
• Mass Mentoring. Selective, pass-along elements of the corporate memory are actualized. Training is built-in.
• Access. Immediately access the organization’s best knowledge anywhere in the world by anyone who needs it, when
they need it.
• Time Savings. Assets help individuals simultaneously understand, learn, perform, and record performance -- all in a
single action.
Who participates in creating knowledge assets?
Customers, employees, experts, advisors, vendors, academia, regulators.
How do we begin?
• Assess readiness.
• Define your strategy.
• Create strategic measurements.
• Establish a glossary.
• Organize the CMM team.
• Select a project for Knowledge Harvesting.
Knowledge Utilization
Knowledge utilization is one of four activities integral to knowledge management, alongside knowledge
creation, retention, and transfer.
It is necessary to bridge the knowledge gap and helps in achieving organizational goals and objectives.
Stages of knowledge utilization
[email protected] Page 35
Knowledge Generation
Information, Knowledge and Actions
– Based on Experiences, Values, Rules
• Conscious and intentional knowledge generation
• Five modes of knowledge generation:
– Acquisition
– Dedicated resources
– Fusion
– Adaptation
– Knowledge networking
[email protected] Page 36
[email protected] Page 37
Technologies
Early KM technologies included online corporate yellow pages as expertise locators and document management
systems.[24] Combined with the early development of collaborative technologies (in particular Lotus Notes), KM
technologies expanded in the mid-1990s.[24] Subsequent KM efforts leveraged semantic technologies for search and
retrieval and the development of e-learning tools for communities of practice (Capozzi 2007).[49] Knowledge
management systems can thus be categorized as falling into one or more of the following groups: groupware,
document management systems, expert systems, semantic networks, relational and object-oriented databases,
simulation tools, and artificial intelligence.[9]
More recently, the development of social computing tools (such as bookmarks, blogs, and wikis) has allowed more
unstructured, self-governing or ecosystem approaches to the transfer, capture and creation of knowledge, including
the development of new forms of communities, networks, or matrixed organisations.[33][50] However, such tools are
for the most part still based on text and code, and thus represent explicit knowledge transfer.[51] These tools face
challenges in distilling meaningful, re-usable knowledge and in ensuring that their content is transmissible through
diverse channels.