business intelligence and applications
TRANSCRIPT
[email protected] Page 1
Difference Between GDSS and DSS
• GDSS focuses on group decisions, while DSS focuses on the decisions of an individual.
• GDSS has a networking structure or technology that DSS does not.
• DSS relies on a knowledge base to some degree, while a GDSS does not.
• GDSS requires a working connection between users, while DSS does not.
• GDSS is applicable to a broader range of situations than DSS.
Groupware
is a term for technology designed to help people collaborate, and it covers a wide range of applications. Wikipedia defines three handy
categories of groupware:
Communication tools: tools for sending messages and files, including email, web publishing, wikis, file sharing, etc.
Conferencing tools: e.g. video/audio conferencing, chat, forums, etc.
Collaborative management tools: tools for managing group activities, e.g. project management systems, workflow systems,
information management systems, etc.
The best known groupware system is Lotus Notes.
If designed and implemented properly, groupware systems are very useful when it comes to supporting knowledge management (KM). They can
greatly facilitate explicit knowledge sharing through publishing and communication tools. They can support the knowledge creation process with
collaborative management tools, although this process is still very much about people interacting and experimenting. Finally, they have some
limited benefit to tacit knowledge transfer by supporting socialization through tools like video conferencing and informal communication. Expert
finders are also beneficial for facilitating the location of tacit sources of knowledge (i.e. the right expert).
Data mart
A data mart is the access layer of the data warehouse environment that is used to get data out to the users. The data mart is a subset
of the data warehouse that is usually oriented to a specific business line or team. Data marts are small slices of the data warehouse. Whereas
data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department. In some deployments, each
department or business unit is considered the owner of its data mart, including all the hardware, software and data.[1] This enables each
department to use, manipulate and develop its data any way it sees fit, without altering information inside other data marts or the data
warehouse. In other deployments where conformed dimensions are used, this business-unit ownership does not hold true for shared dimensions
like customer, product, etc.
The primary use for a data mart is business intelligence (BI) applications. BI is used to gather, store, access and analyze
data. Smaller businesses can use a data mart to exploit the data they have accumulated. A data mart can be less expensive
than implementing a data warehouse, making it more practical for a small business, and it can be set up in much
less time, often in under 90 days. Since most small businesses need only a small number of BI applications, the low
cost and quick setup of a data mart make it a suitable method for storing data.
Reasons for creating a data mart
Easy access to frequently needed data
[email protected] Page 2
Creates collective view by a group of users
Improves end-user response time
Ease of creation
Lower cost than implementing a full data warehouse
Potential users are more clearly defined than in a full data warehouse
Contains only business essential data and is less cluttered
Data manipulation language
A data manipulation language (DML) is a family of syntax elements similar to a computer programming language used for inserting,
deleting and updating data in a database. Performing read-only queries of data is sometimes also considered a component of DML.
A popular data manipulation language is that of Structured Query Language (SQL), which is used to retrieve and manipulate data in
a relational database.[1] Other forms of DML are those used by IMS/DLI, CODASYL databases, such as IDMS and others.
Data manipulation languages have their functional capability organized by the initial word in a statement, which is almost always a
verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...
The purely read-only SELECT query statement is classed with the 'SQL-data' statements[2] and so is considered by the standard to be
outside of DML. The SELECT ... INTO form is considered to be DML because it manipulates (i.e. modifies) data. In common practice though, this
distinction is not made and SELECT is widely considered to be part of DML.[3]
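The four DML verbs listed above can be exercised end to end with Python's built-in sqlite3 module. The `employees` table and its contents are hypothetical, invented purely to illustrate the statements.

```python
import sqlite3

# In-memory database with a hypothetical employees table (illustrative only)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# INSERT INTO ... VALUES ...
conn.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", ("Asha", 50000))
conn.execute("INSERT INTO employees (name, salary) VALUES (?, ?)", ("Ravi", 45000))

# UPDATE ... SET ... WHERE ...
conn.execute("UPDATE employees SET salary = 55000 WHERE name = 'Asha'")

# DELETE FROM ... WHERE ...
conn.execute("DELETE FROM employees WHERE name = 'Ravi'")

# SELECT ... FROM ... WHERE ... (the read-only statement discussed above)
rows = conn.execute("SELECT name, salary FROM employees WHERE salary > 40000").fetchall()
print(rows)  # [('Asha', 55000.0)]
conn.close()
```

Note that only the INSERT, UPDATE and DELETE statements modify data; the final SELECT merely reads it back, which is exactly the distinction the standard draws.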
Most SQL database implementations extend their SQL capabilities by providing imperative, i.e. procedural languages. Examples of
these are Oracle's PL/SQL and DB2's SQL PL.
Data manipulation languages tend to have many different flavors and capabilities between database vendors. There have been a
number of standards established for SQL by ANSI,[1] but vendors still provide their own extensions to the standard while not implementing the
entire standard.
Data manipulation languages are divided into two types: procedural and declarative.
Data manipulation languages were initially only used within computer programs, but with the advent of SQL have come to be used
interactively by database administrators.
Data Warehousing
It is a process: a technique for assembling and managing data from various sources for the purpose of answering business
questions, thus enabling decisions that were not previously possible.
A decision support database maintained separately from the organization’s operational database
[email protected] Page 3
A data warehouse is a
subject-oriented
integrated
time-variant
non-volatile
collection of data that is used primarily in organizational decision making.
-- Bill Inmon, Building the Data Warehouse 1996
Advantages of Warehousing
• High query performance
• Queries not visible outside warehouse
• Local processing at sources unaffected
• Can operate when sources unavailable
• Can query data not stored in a DBMS
• Extra information at warehouse
◦ Modify, summarize (store aggregates)
◦ Add historical information
UNIT-1
[email protected] Page 4
Decision Support System
Stages of Decision Making
(From Herbert Simon)
• Intelligence (in the military sense of gathering information)
• Design (Identifying the alternatives, structuring how the decision will be made)
• Choice (Picking an alternative or making the judgment)
• [Implementation – later added by other authors]
• [Evaluation]
• Each stage can be Structured (automated) or Unstructured
• “Structured” means that there is an algorithm, mathematical formula, or decision rule to accomplish the entire
stage. The algorithm can be implemented manually or it can be computerized, but the steps are so detailed
that little or no human judgment is needed.
• Any decision stage that is not structured is unstructured
• In a structured decision all three stages are structured
• In a non-structured decision all three stages are unstructured
• A semi-structured decision is one in which part, but not all, of the decision is structured.
• The realm of Decision Support Systems is Semi-Structured and unstructured Decisions…
• The type of decisions that can benefit from “decision support” but the human decision maker is still involved.
• A decision support system (DSS) is a computer-based application that collects, organizes and analyzes business
data to facilitate quality business decision-making for management, operations and planning.
• A DSS is a computer system at the management level of an organization that combines data, sophisticated
analytical tools & user friendly software to support semi-structured & unstructured decision making.
Capabilities of a Decision Support System (1)
• Support for problem-solving phases--Intelligence, design, choice, implementation, monitoring
• Support for different decision frequencies
Ad hoc DSS: Concerned with decisions that come up rarely, perhaps once in five years (e.g., where should a company open
a new distribution center?)
Institutional DSS: Concerned with decisions that repeat (e.g., what should the company invest in?)
• Support for different problem structures
– Highly structured problems: Known facts and relationships
– Semi-structured problems: Facts unknown or ambiguous, relations vague
[email protected] Page 5
– E.g., which person to hire for a position?
• Support for various decision-making levels
– Operational level: Daily decisions
– Tactical level: Planning and control
– Strategic level: Long-term decisions
Support for Various Decision-Making Levels
Selected DSS Applications
[email protected] Page 6
Components of DSS
• A typical Decision support systems has four components: data management, model management, knowledge
management and user interface management.
Data Management Component
The data management component performs the function of storing and maintaining the information that you
want your Decision Support System to use. The data management component, therefore, consists of both the
Decision Support System information and the Decision Support System database management system.
Model Management Component
The model management component consists of both the Decision Support System models and the Decision
Support System model management system.
[email protected] Page 7
Decision Support Systems help in various decision-making situations by utilizing models that allow you to
analyze information in many different ways.
For example, you would use what-if analysis to see what effect changing one or more variables has
on other variables, or optimization to find the most profitable solution given operating restrictions and limited
resources. Spreadsheet software such as Excel can be used as a Decision Support System for what-if analysis.
The model management system stores and maintains the Decision Support System's models. Its function of
managing models is similar to that of a database management system. The model management component cannot
select the best model for a particular problem (that still requires your expertise), but it can help you create
and manipulate models quickly and easily.
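As a small sketch of what the what-if analysis described above looks like, the following compares a baseline against a changed scenario in a tiny profit model. The model and every number in it are made up for illustration.

```python
def profit(units_sold, price, unit_cost, fixed_cost):
    """A hypothetical profit model: revenue minus variable and fixed costs."""
    return units_sold * (price - unit_cost) - fixed_cost

# Baseline scenario
base = profit(units_sold=1000, price=20.0, unit_cost=12.0, fixed_cost=5000.0)

# What-if: raise the price by 10% but lose 5% of sales volume
scenario = profit(units_sold=950, price=22.0, unit_cost=12.0, fixed_cost=5000.0)

print(base, scenario)  # 3000.0 4500.0
```

This is exactly the kind of variable-by-variable exploration a spreadsheet supports; a DSS model management component packages such models for reuse.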
User Interface Management Component
The user interface management component allows you to communicate with the Decision Support System. It
consists of the user interface management system. This is the component that allows you to combine your know-how
with the storage and processing capabilities of the computer.
The user interface is the part of the system through which you enter information, commands, and models.
Knowledge Management System
The Knowledge-Based Management Subsystem can support any of the other components or act as an independent
component.
The knowledge management component, like that in an expert system, provides information about the relationship
among data that is too complex for a database to represent. It consists of rules that can constrain possible solutions as
well as alternative solutions and methods for evaluating them.
Types Of DSS
• A communication-driven DSS supports more than one person working on a shared task; examples include
integrated tools like Microsoft's NetMeeting and e-mail.
• A data-driven DSS or data-oriented DSS emphasizes access to and manipulation of a time series of internal
company data and, sometimes, external data.
• A document-driven DSS manages, retrieves, and manipulates unstructured information in a variety of
electronic formats.
• A knowledge-driven DSS provides specialized problem-solving expertise stored as facts, rules, procedures, or
in similar structures.
• A model-driven DSS emphasizes access to and manipulation of a statistical, financial, optimization, or
simulation model.
[email protected] Page 8
GDSS – Group Decision Support Systems
Group Decision Support Systems (GDSS) - An interactive, computer-based system that facilitates solution of
unstructured problems by a set of decision-makers working together as a group. It aids groups, especially
groups of managers, in analyzing problem situations and in performing group decision making tasks.
Group Support Systems has come to mean computer software and hardware used to support group functions
and processes.
Group Decision Support Systems (GDSS)
• Class of electronic meeting systems, designed to support meetings and group work.
• Electronic meeting system (EMS) is a type of computer software that facilitates creative problem solving and
decision-making of groups within or across organizations
• Mainly through anonymization and parallelization of input, electronic meeting systems overcome many
inhibitive features of group work.
Process in GDSS
• Similar to a web conference, a host invites the participants to an electronic meeting via email.
• Each user typically has his or her own computer and can contribute to the same shared object at the
same time.
• Thus, nobody needs to wait for a turn to speak, so people don't forget what they want to say while
waiting for the floor.
• The group can focus on the content and meaning of ideas, rather than on their sources.
Similarities Between GDSS and DSS
Both use models, data and user-friendly software
Both are interactive with “what-if” capabilities
Both use internal and external data
Both allow the decision maker to take an active role
Both have flexible systems
Both have graphical output
Characteristics of a GDSS (1)
Special design:
◦ Effective communication
◦ Group decision making
Ease of use
Flexibility
◦ Accommodate different perspectives
[email protected] Page 9
Anonymous input
◦ Individuals’ names are not exposed
Parallel communication
Decision-making support
◦ Delphi approach: structured rounds of input from decision makers who may be scattered around the globe
◦ Brainstorming: say ideas as they occur to you (think out loud)
◦ Group consensus approach: the group reaches a unanimous decision (everybody agrees)
◦ Nominal group technique: structured idea generation followed by voting
Reduction of negative group behavior
◦ A trained meeting facilitator to help with sidetracking
Automated record keeping
Examples of GDSS Software
Lotus Notes
◦ Store, manipulate, distribute memos
Microsoft Exchange
◦ Keep individual schedules
◦ Decide on meeting times
NetDocuments Enterprise
◦ Two people can review the same document together
GDSS Time/Place Environment
GDSS Alternatives
[email protected] Page 10
Advantages of GDSS
Anonymity – drives out fear, leading to better decisions from a diverse hierarchy of decision makers
Parallel communication – eliminates monopolizing of the discussion, increasing participation and improving decisions
Automated record keeping – no need to take notes; they're automatically recorded
Ability for virtual meetings – only hardware, software and connected people are needed
Portability – can be set up to be portable, e.g. on a laptop
Global potential – people can be connected across the world
No need for a computer guru – although some basic experience is a must
Disadvantages of GDSS
Cost – infrastructure costs for hardware, software, rooms and network connectivity can be very high
Security – especially when companies rent GDSS facilities; also, the facilitator may be a lower-level
employee who could leak information to peers
Technical failure – power loss or loss of connectivity; the system relies heavily on bandwidth and LAN/WAN
infrastructure, though a properly set up system should minimize this risk
Keyboarding skills – frustration may reduce participation
Training – there is a learning curve for users, varying by situation
Perception of messages – tone and intent can be misread without verbal cues
Typical GDSS Process
1) Group leader (and facilitator) select software and develop an agenda
2) Participants meet (in a decision room or over the Internet) and are given a task
3) Participants generate ideas – brainstorming anonymously
4) Facilitator organizes ideas into categories (different for user-driven software)
5) Discussion and prioritization – may involve ranking by some criteria and/or rating on the facilitator's scale
6) Repeat steps 3, 4 and 5 as necessary
7) Reach a decision
8) Recommended: provide feedback on the decision and results to all involved
Future Implications of GDSS
Integrating into existing corporate framework
◦ GDSS brings changes which must be managed
GDSS will incorporate Artificial Intelligence and Expert Systems – the software will “learn” and help the users
make better decisions
Decreasing cost will allow more organizations to use GDSS
Increasing implementation of GDSS with the customer
◦ Customers voice their needs in a non-threatening environment
Choosing The Right GDSS
Consider the following:
◦ Decision Task Type
◦ Group Size
◦ Location of members of the group
EXPERT SYSTEMS
Components of an Expert System
Knowledge base
◦ Stores all relevant information, data, rules, cases, and relationships used by the expert system
Inference engine
◦ Seeks information and relationships from the knowledge base and provides answers, predictions, and
suggestions in the way a human expert would
Rule
◦ A conditional statement that links given conditions to actions or outcomes
Fuzzy logic
◦ A specialty research area in computer science that allows shades of gray and does not require
everything to be simply yes/no, or true/false
Backward chaining
[email protected] Page 12
◦ A method of reasoning that starts with conclusions and works backward to the supporting facts
Forward chaining
◦ A method of reasoning that starts with the facts and works forward to the conclusions
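Forward chaining can be sketched in a few lines: rules fire whenever all their conditions are known facts, and their conclusions become new facts, until nothing more can be derived. The rules and facts below are hypothetical, invented only to show the mechanism.

```python
# Each rule: (set of required conditions, conclusion to add)
rules = [
    ({"has_fever", "has_cough"}, "possible_flu"),
    ({"possible_flu", "short_of_breath"}, "refer_to_doctor"),
]

def forward_chain(facts, rules):
    """Repeatedly fire any rule whose conditions are all known facts,
    adding its conclusion, until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

result = forward_chain({"has_fever", "has_cough", "short_of_breath"}, rules)
print("refer_to_doctor" in result)  # True
```

Backward chaining would run the same rules in the opposite direction: start from the goal "refer_to_doctor" and search for rules whose conclusions support it.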
Limitations of Expert Systems
Not widely used or tested
Limited to relatively narrow problems
Cannot readily deal with “mixed” knowledge
Possibility of error
Cannot refine own knowledge base
Difficult to maintain
May have high development costs
Raise legal and ethical concerns
UNIT-2
Data Warehousing
A data warehouse (DW) is a collection of integrated databases designed to support a DSS
An operational data store (ODS) stores data for a specific application. It feeds the data warehouse a stream of desired
raw data.
A data mart is a lower-cost, scaled-down version of a data warehouse, usually designed to support a small group of users
(rather than the entire firm)
[email protected] Page 13
The metadata is information that is kept about the warehouse
Online Analytical Processing (OLAP) is the broad category of software technology that enables multidimensional analysis
of enterprise data
[email protected] Page 14
• Data sources often store only current data, not historical data
• Corporate decision making requires a unified view of all organizational data, including historical data
• A data warehouse is a repository (archive) of information gathered from multiple sources, stored under a
unified schema, at a single site
◦ Greatly simplifies querying, permits study of historical trends
◦ Shifts decision support query load away from transaction processing systems
Data Warehouse vs. Data Mart
1. Data warehouse: a large store of data accumulated from a wide range of sources within a company and used to guide management decisions. Data mart: one of the access layers of the data warehouse environment, used to get data out to the users; data marts are small slices of the data warehouse.
2. Data warehouse: corporate/enterprise-wide. Data mart: department-specific in nature.
3. Data warehouse: the union of all data marts. Data mart: a subset of a data warehouse.
4. Data warehouse: data is received from the staging area. Data mart: follows the star-join approach (i.e. facts and dimensions are connected).
5. Data warehouse: structure is more suitable for a corporate view of data. Data mart: structure is more suitable for a departmental view of data.
6. Data warehouse: handles queries on the presentation resource. Data mart: data is technologically optimal for data access and analysis.
Design Issues
• When and how to gather data
◦ Source-driven architecture: data sources transmit new information to the warehouse, either continuously
or periodically (e.g. at night)
◦ Destination-driven architecture: the warehouse periodically requests new information from data sources
◦ Keeping the warehouse exactly synchronized with data sources (e.g. using two-phase commit) is too
expensive
◦ It is usually acceptable for warehouse data to be slightly out of date
◦ Data/updates are periodically downloaded from online transaction processing (OLTP) systems
• What schema to use
◦ Schema integration
• Data cleansing
◦ E.g. correct mistakes in addresses (misspellings, zip code errors)
◦ Merge address lists from different sources and purge duplicates
• How to propagate updates
◦ The warehouse schema may be a (materialized) view of the schema from data sources
• What data to summarize
◦ Raw data may be too large to store on-line
◦ Aggregate values (totals/subtotals) often suffice
◦ Queries on raw data can often be transformed by the query optimizer to use aggregate values
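The star-join approach mentioned for data marts can be sketched with Python's sqlite3: a fact table holds measures keyed to small dimension tables, and queries join them together. All table and column names here are invented for illustration.

```python
import sqlite3

# A toy star schema: one fact table joined to two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales  (product_id INTEGER, store_id INTEGER, amount REAL);

INSERT INTO dim_product VALUES (1, 'steak'), (2, 'charcoal');
INSERT INTO dim_store   VALUES (1, 'Delhi'), (2, 'Mumbai');
INSERT INTO fact_sales  VALUES (1, 1, 300.0), (2, 1, 50.0), (1, 2, 200.0);
""")

# Star join: facts and dimensions connected through foreign keys
rows = conn.execute("""
    SELECT p.name, s.city, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON f.product_id = p.product_id
    JOIN dim_store   s ON f.store_id   = s.store_id
    GROUP BY p.name, s.city
    ORDER BY p.name, s.city
""").fetchall()
print(rows)
conn.close()
```

The dimensions (product, store) give the business context; the fact table carries the numbers to be aggregated.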
[email protected] Page 16
Architecture of Data Warehouse
[email protected] Page 17
[email protected] Page 18
[email protected] Page 19
Data Marts
• Data warehouses can support all of an organization's information
• Data marts hold subsets of an organization-wide data warehouse
• Data mart – a subset of a data warehouse in which only a focused portion of the data warehouse information is kept
[email protected] Page 20
[email protected] Page 21
[email protected] Page 22
Unit-3
Data Mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD)
• Definition: “the analysis of data to discover previously unknown relationships that provide useful information”
(Hand et al.)
• Data mining makes use of statistical and visualisation techniques to discover and present information in a form
that is easily comprehensible
• Data mining can be applied to tasks such as decision support, forecasting, estimation, and uncovering and
understanding relationships among data elements
• Traditionally the task of identifying and utilising information hidden in data has been achieved through some
form of traditional statistical methods
• Typically, this involves a user formulating a guess about a possible relationship in the data and evaluating this
hypothesis via a statistical test. This is a largely time-intensive, user-driven, top-down approach to data
analysis.
• With data mining, the interrogation of the data is done by the data mining algorithm rather than by the user
[email protected] Page 23
• Data mining is a self-organising, data-influenced, bottom-up approach to data analysis
• Simply put, what data mining does is sort through masses of data to uncover patterns and relationships, then
build models to predict behaviours
Verification vs. Knowledge Data Discovery
• In the past, decision support activities were primarily based on the concept of verification
• This required a great deal of prior knowledge on the decision-maker’s part in order to verify a suspected
relationship
• With the advance of technology, the concept of verification began to turn into knowledge data discovery
Knowledge Data Discovery
• Knowledge data discovery (KDD) techniques include: statistical analysis, neural or fuzzy logic, intelligent
agents, data visualisation
• KDD techniques not only discover useful patterns in the data, but also can be used to develop predictive
models
Data mining
Data mining involves six common classes of tasks:[1]
Anomaly detection (outlier/change/deviation detection) – The identification of unusual data records that
might be interesting, or of data errors that require further investigation.
Association rule learning (Dependency modeling) – Searches for relationships between variables. For example
a supermarket might gather data on customer purchasing habits. Using association rule learning, the
supermarket can determine which products are frequently bought together and use this information for
marketing purposes. This is sometimes referred to as market basket analysis.
Clustering – is the task of discovering groups and structures in the data that are in some way or another
"similar", without using known structures in the data.
Classification – is the task of generalizing known structure to apply to new data. For example, an e-mail
program might attempt to classify an e-mail as "legitimate" or as "spam".
Regression – attempts to find a function which models the data with the least error.
Summarization – providing a more compact representation of the data set, including visualization and report
generation.
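As a minimal sketch of the anomaly-detection task above, the following flags records whose z-score (distance from the mean in standard deviations) exceeds a cutoff. The sales figures and the threshold of 2.0 are arbitrary choices for illustration.

```python
import statistics

def z_score_outliers(values, threshold=2.0):
    """Flag values whose z-score exceeds the threshold (hypothetical cutoff)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

daily_sales = [100, 98, 103, 97, 101, 99, 500]  # one suspicious record
print(z_score_outliers(daily_sales))  # [500]
```

Real anomaly detectors are usually multivariate, but the principle is the same: measure how far each record sits from the bulk of the data.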
Data Mining Technologies
[email protected] Page 24
k-nearest neighbors algorithm
In pattern recognition, the k-nearest neighbors algorithm (k-NN for short) is a non-parametric method used
for classification and regression.[1] In both cases, the input consists of the k closest training examples in the feature
space. The output depends on whether k-NN is used for classification or regression:
o In k-NN classification, the output is a class membership. An object is classified by a majority vote of its
neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a
positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single
nearest neighbor.
o In k-NN regression, the output is the property value for the object. This value is the average of the values
of its k nearest neighbors.
k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all
computation is deferred until classification. The k-NN algorithm is among the simplest of all machine
learning algorithms.
Both for classification and regression, it can be useful to weight the contributions of the neighbors, so that the nearer
neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme
consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.[2]
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value
(for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training
step is required.
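The classification case described above can be written from scratch in a few lines: sort the training examples by distance to the query and take a majority vote among the k nearest. The toy two-class data below is invented for illustration.

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """Classify a query point by majority vote among its k nearest
    training examples (Euclidean distance). `training` is a list of
    ((feature, ...), label) pairs."""
    by_distance = sorted(
        training,
        key=lambda item: math.dist(query, item[0]),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
    ((5.0, 5.0), "B"), ((5.2, 4.8), "B"), ((4.9, 5.1), "B"),
]
print(knn_classify((1.1, 1.0), training))  # A
print(knn_classify((5.1, 5.0), training))  # B
```

Note there is no training step, only the sort at query time, which is exactly why k-NN is called a lazy learner. Distance weighting (e.g. weight 1/d) would replace the plain vote with a weighted one.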
• Statistics – the most mature data mining technologies, but are often not applicable because they need clean
data. In addition, many statistical procedures assume linear relationships, which limits their use.
• Neural networks, genetic algorithms, fuzzy logic – these technologies are able to work with complicated and
imprecise data. Their broad applicability has made them popular in the field.
• Decision trees – these technologies are conceptually simple and have gained in popularity as better tree
growing software was introduced. Because of the way they are used, they are perhaps better called
“classification” trees.
A decision tree is a predictive model
A decision tree is a diagram that a decision maker can create to help select the best of several alternative
courses of action.
Problem, Options, Outcome of each option.
How to draw a Decision Tree
Start with the decision that needs to be made – draw a box
Draw lines to the right for the possible options
At the end of each line, if the result is uncertain, draw a circle; if another decision follows, draw a box
From the further decision boxes, draw lines for the options that can be taken
Estimate the probability of each uncertain outcome
Assign a value to each outcome and calculate which decision has the greatest expected worth to you
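The final steps amount to an expected-value calculation: weight each uncertain outcome's payoff by its probability and pick the option with the greatest worth. The decision, payoffs and probabilities below are all hypothetical.

```python
# Hypothetical decision: launch a new product or keep the current line.
# Payoffs and probabilities are invented for illustration.
options = {
    "launch_new_product": [(200000, 0.5), (-60000, 0.5)],  # (payoff, probability)
    "keep_current_line":  [(80000, 0.75), (20000, 0.25)],
}

def expected_value(outcomes):
    """Weight each outcome's payoff by its probability and sum."""
    return sum(payoff * prob for payoff, prob in outcomes)

values = {name: expected_value(outs) for name, outs in options.items()}
best = max(values, key=values.get)
print(values)  # {'launch_new_product': 70000.0, 'keep_current_line': 65000.0}
print(best)    # launch_new_product
```

Each option here is a box with two circle (chance) branches; larger trees simply nest more boxes and circles before the payoffs.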
Data Mining Techniques
• Paralleling the popularity of data mining itself, the development of new techniques is exploding as well
• Many innovations are vendor-specific, which sometimes does little to advance the state of the art
• Regardless, data-mining techniques tend to fall into four major categories:
– classification
– association
– sequencing
– clustering
Classification Methods
• The goal is to discover rules that define whether an item belongs to a particular subset or class of data
[email protected] Page 26
• For example, if we are trying to determine which households will respond to a direct mail campaign, we
want rules that separate the "probables" from the "not probables".
• These IF-THEN rules often are portrayed in a tree-like structure
Sequencing Methods
• These methods are applied to time series data in an attempt to find hidden trends
• If found, these can be useful predictors of future events
• For example, customer groups that tend to purchase products tied-in with hit movies would be targeted with
promotional campaigns timed to release dates
Clustering Techniques
• Clustering techniques attempt to create partitions in the data according to some “distance” metric
• Clustering aims to segment a diverse group into a number of similar subgroups or clusters
• The clusters formed are data grouped together simply by their similarity to their neighbours
• By examining the characteristics of each cluster, it may be possible to establish rules for classification
• In clustering, there are no predefined classes and no examples. The records are grouped together on the basis
of self-similarity.
Association Methods
• These techniques search all transactions from a system for patterns of occurrence
• A common method is market basket analysis, in which the sets of products purchased by thousands of
consumers are examined
– It finds affinity groupings that discover what items are usually purchased with others, predicting the
frequency with which certain items are purchased at the same time
• Results are then portrayed as percentages; for example, “30% of the people that buy steaks also buy charcoal”
Association: Market Basket Analysis
• This is the most widely used and, in many ways, most successful data mining algorithm
• It essentially determines what products people purchase together
• Retailers can use this information to place these products in the same area
• Direct marketers can use this information to determine which new products to offer to their current
customers
• Inventory policies can be improved if reorder points reflect the demand for the complementary products
Market Basket Analysis Method
• We first need a list of transactions to see what was purchased. This can be easily obtained from cash registers
/ POS devices.
[email protected] Page 27
• Next, we choose a list of products to analyse, and tabulate how many times each was purchased with the
others …
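The tabulation step can be sketched as follows, assuming each transaction is just the set of item names from one receipt (the baskets below are invented):

```python
from itertools import combinations
from collections import Counter

# Hypothetical point-of-sale transactions (each a set of items bought together)
transactions = [
    {"steak", "charcoal", "beer"},
    {"steak", "charcoal"},
    {"bread", "milk"},
    {"steak", "beer"},
    {"charcoal", "beer"},
]

# Tabulate how often each pair of products appears in the same basket
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Report each frequent pair's support: the share of baskets containing it
for pair, count in pair_counts.most_common(3):
    print(pair, f"{count / len(transactions):.0%}")
```

Dividing a pair's count by the number of baskets containing just one of its items would give the conditional percentage quoted in the text ("30% of the people that buy steaks also buy charcoal").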
[email protected] Page 28
Data Mining: Some Applications
• Pharmaceuticals: Massive amounts of biological and clinical information can be analysed with data mining
methods to discover new uses for existing drugs
• Healthcare: Hospitals are using data mining to perform utilisation analysis and pricing analysis, to estimate
outcomes, to improve preventive care, and to detect fraud and questionable practices
• Banking: Data mining tools help banks to understand customer behaviour, conduct profitability analysis,
improve cross-selling efforts, identify credit risk, identify customers for loan campaigns, tailor financial
products to meet customer needs, seek new customers, and enhance customer service
• Credit card companies: Predictors for credit card customer attrition and fraud are frequently identified via
data mining. Successful users of data mining include American Express and Citibank.
• Financial services: Security analysts are using data mining extensively to analyse large volumes of financial
data in order to build trading and risk models for developing investment strategies
• Telemarketing and direct marketing: In this sector, companies have gained big savings and are able to target
customers more accurately by using data mining. Direct marketers are configuring and mailing their product
catalogs based on customers' purchase history and demographic data.
[email protected] Page 29
• Airlines: As competition in the airline business increases, understanding customers' needs has become
imperative. Airlines capture customer data in order to make strategic moves such as expanding their
services into new routes.
• Manufacturers: Data mining is widely used in manufacturing industries to control and schedule technical
production processes.
• Insurance companies: The insurance industry is data intensive. Data mining has recently provided insurers
with a wealth of useful information extracted from huge databases for decision making.
• Telecommunications: By applying the insights learned through data mining, telecommunications companies
can identify products and services that maximise value and then use this information to establish marketing
campaigns to improve market share. A common example in this industry is identifying factors that influence
customer retention. In the US, telephone companies were famous for their price-cutting strategy in the past,
but the new strategy is to know their customers better. Using data mining, telephone companies are able to
provide customers with a great variety of new services they are likely to purchase.
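As a minimal illustration of the retention example above, the sketch below compares churn rates across customer groups; the records and the contract-type attribute are entirely hypothetical, and a real analysis would examine many more factors:

```python
from collections import defaultdict

# Hypothetical customer records: (contract_type, churned) -- illustrative only.
customers = [
    ("monthly", True), ("monthly", True), ("monthly", False), ("monthly", True),
    ("annual", False), ("annual", False), ("annual", True), ("annual", False),
]

# Compare churn rate per contract type: a first step towards identifying
# factors that influence customer retention.
totals = defaultdict(int)
left = defaultdict(int)
for contract, churned in customers:
    totals[contract] += 1
    if churned:
        left[contract] += 1

churn_rate = {c: left[c] / totals[c] for c in totals}
```

A large gap between the groups' churn rates would flag contract type as a factor worth investigating further.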
• Distribution and retailing: With the huge amount of consumer data flowing in daily from different sources,
especially from e-commerce Web sites, data mining helps companies learn more about their customers and
develop insights into their buying habits. Knowing the behaviours (e.g. likes and dislikes) of customers leads to
better customer service and allows companies to create one-to-one relationships with customers, hopefully
prolonging loyalty and prompting repeat business. As such, data mining is used extensively in the area of
customer relationship management. Large users of data mining in the retail industry include Wal-Mart and
Victoria's Secret.
• Remotely sensed data: Huge amounts of remotely sensed data are captured every day from satellite images
and other related sources. Data mining is used in weather prediction and in monitoring and reasoning about
ozone depletion, among other applications.
Advantages of Data Mining
• Provide better information to achieve competitive edge
– This advantage is the primary motivation for data mining. Data mining has a powerful analytical ability
to generate information, which allows an organisation to better understand itself, its customers, and
the marketplace it competes in. When used as a marketing tool, data mining often results in a sharper
competitive edge, an evidence-based selling approach, a customer-oriented marketing plan, shorter
selling cycles, and reduced operational costs.
• Add value to a data warehouse
– A data warehouse by itself is just a large repository of raw data; data mining is the process of
analysing that data and transforming it into useful information. Organisations have reported
paybacks of 10 to 70 times their data warehouse investment after data mining components
are added.
• Increase operating efficiency
– Data mining's ability to quickly organise and analyse large pools of data has dramatically increased
workplace efficiency. It allows users to create complex financial statements in minutes, compared
with the weeks required by traditional methods.
• Provide flexibility in using data
[email protected] Page 30
– With data mining, users gain control over the data. Instead of letting the system push the data, users
are now able to pull the data they need. Users can let their imagination run and manipulate data in
various ways to answer their questions. The easy-to-use interface of data mining tools and
client/server technology has made the information directly accessible by individual users.
• Reduce operating costs
– Modern data mining tools combine highly sophisticated hardware and software components, which
allow them to analyse massive data sets efficiently at reduced operating cost (e.g. the high
costs faced by public sector organisations such as healthcare providers when asked to answer a
“parliamentary question” raised in the Oireachtas could be reduced by the use of data warehouses
and data mining).
• Ready-to-use
– Unlike traditional data analysis methods, data mining hardly requires pre-processing of data prior to
analysis. It can use a mixture of numeric, categorical, and date data, and can tolerate missing and
noisy data. The results are in the form of ready-to-use business rules with almost no statistical
expertise and guesswork needed.
• Solve research bottleneck
– In many social science and business situations, conducting real experiments is almost impossible. Data
mining can provide such research with a narrower set of working hypotheses for further
investigation, based on large, unstructured data sets.
Disadvantages of Data Mining
• No definitive answer
– Data mining yields useful insights and clues but no definitive answers. The definitive answers need to
be achieved through much more rigorous scientific experimentation. Experiences from Wall Street
have shown that this technology may not outperform traditional methods. Therefore, users should
have a realistic expectation of the results of data mining.
• High cost
– The cost of implementing data mining is quite high; thus, it may not be appropriate in some business
environments. The return on investment therefore needs to be justified by a cost-benefit analysis.
• Complex and lengthy project
– Experience from data mining system developers has shown that it takes a long time to get the project
right. Developers suggest focusing on incremental development and benefits.
• Privacy
– The detailed data about individuals used in data mining might involve a violation of privacy. This
problem worsens when the World Wide Web is involved, because detailed personal information is
easily accessible and can fall into wrong hands.
• Knowledge requirement of user
[email protected] Page 31
– Despite its increasingly simple interfaces and automation of parts of the analysis process, data mining
remains more suitable for people with statistics, operations research, or management science
backgrounds. Ease of use is a critical factor in attracting more businesses to invest in this technology.
• Unmanageable database
– Many authors have suggested that organisations must increase the size of their databases
tremendously in order to do data mining. However, some are concerned that this will result in
unmanageable and unnecessary databases.
• Wrong information from errors in data
– The massive data used in data mining inevitably contains mistakes caused by human errors.
Information generated should be used with caution to avoid lawsuits in areas such as hiring. Experts
suggest using only relevant information for mining to reduce such risks.
Unit-4
Knowledge Management
KM is a process that helps organizations identify, select, organize, disseminate, and transfer important
information and expertise that are part of the organization’s memory.
Structuring knowledge enables effective and efficient problem solving, dynamic learning, strategic
planning, and decision making.
Knowledge is very distinct from data and information.
Knowledge is information that is contextual, relevant, and actionable.
Data are a collection of facts, measurements, and statistics.
Information is processed data that are timely and accurate.
[email protected] Page 33
• Capture knowledge
• Refine knowledge
• Store knowledge
• Manage knowledge
• Disseminate knowledge
Knowledge Assets
What are knowledge assets?
Codified human expertise, stored in a digital/electronic format, used to create organizational value, owned by the
organization, not vulnerable to memory loss, and commonly deployed via intranets.
Knowledge management can be interpreted as managing information.
Knowledge can be viewed as an asset, forming a major part of an organization's value.
Knowledge assets are the resources that organizations wish to cultivate.
Knowledge assets can be human, such as a person or a network of people; structural, such as business
processes; or market assets, such as the brand name of a product.
The concept of intellectual capital (IC) helps managers identify and classify the knowledge components
of an organization.
The expression “intellectual capital statement” refers to “capital”, emphasizing the accounting value.
IC helps managers manage and evaluate company performance.
A knowledge asset creates, modifies, stores and/or disseminates knowledge objects.
For example, a person is a knowledge asset that can create new ideas, learnings, and proposals (knowledge
objects).
What are the characteristics of knowledge assets?
1. promote understanding
2. provide guidance for decision-making
3. record facts about critical decisions
4. create metaknowledge about how work changes
Why are knowledge assets valuable?
• Better Decisions. Make decisions faster and at lower levels. Assets allow work to be accomplished with less
supervision and intervention.
[email protected] Page 34
• Mass Mentoring. Selective, pass-along elements of the corporate memory are actualized. Training is built-in.
• Access. Immediately access the organization’s best knowledge anywhere in the world by anyone who needs it, when
they need it.
• Time Savings. Assets help individuals simultaneously understand, learn, perform, and record performance -- all in a
single action.
Who participates in creating knowledge assets?
Customers, employees, experts, advisors, vendors, academia, regulators.
How do we begin?
• Assess readiness.
• Define your strategy.
• Create strategic measurements.
• Establish a glossary.
• Organize the CMM team.
• Select a project for Knowledge Harvesting.
Knowledge Utilization
Knowledge utilization is one of four activities integral to knowledge management, alongside knowledge
creation, retention, and transfer.
It is necessary to bridge the knowledge gap and helps in achieving organizational goals and objectives.
Stages of knowledge utilization
[email protected] Page 35
Knowledge Generation
Information, Knowledge and Actions
– Based on Experiences, Values, Rules
• Conscious and intentional knowledge generation
• Five modes of knowledge generation:
– Acquisition
– Dedicated resources
– Fusion
– Adaptation
– Knowledge networking
[email protected] Page 36
[email protected] Page 37
Technologies
Early KM technologies included online corporate yellow pages as expertise locators and document management
systems.[24] Combined with the early development of collaborative technologies (in particular Lotus Notes), KM
technologies expanded in the mid-1990s.[24] Subsequent KM efforts leveraged semantic technologies for search and
retrieval and the development of e-learning tools for communities of practice (Capozzi 2007).[49] Knowledge
management systems can thus be categorized as falling into one or more of the following groups: groupware,
document management systems, expert systems, semantic networks, relational and object-oriented databases,
simulation tools, and artificial intelligence.[9]
More recently, the development of social computing tools (such as bookmarks, blogs, and wikis) has allowed more
unstructured, self-governing or ecosystem approaches to the transfer, capture and creation of knowledge, including
the development of new forms of communities, networks, or matrixed organisations.[33][50] However, such tools are
for the most part still based on text and code, and thus represent explicit knowledge transfer.[51] These tools face
challenges in distilling meaningful, re-usable knowledge and in ensuring that their content is transmissible through
diverse channels.