u nit 6 by sachin dhande march 11, 2016 1 data mining & dataware housing

107
UNIT 6 BY SACHIN DHANDE J u n e 2 1 , 2 0 2 2 1 D a t a M i n i n g & D a t a w a r e h o u s i n g

Upload: sharyl-hutchinson

Post on 20-Jan-2018

217 views

Category:

Documents


0 download

DESCRIPTION

W HY D ATA M INING ? The Explosive Growth of Data: from terabytes to petabytes Data collection and data availability Automated data collection tools, database systems, Web, computerized society Major sources of abundant data Business: Web, e-commerce, transactions, stocks, … Science: Remote sensing, bioinformatics, scientific simulation, … Society and everyone: news, digital cameras, YouTube We are drowning in data, but starving for knowledge! “Necessity is the mother of invention”—Data mining—Automated analysis of massive data sets March 11, Data Mining & Dataware housing

TRANSCRIPT

Page 1: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

UNIT 6

BY SACHIN DHANDE

May 14, 2023

1

Data Mining & Dataware housing

Page 2: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

CHAPTER 1. INTRODUCTION Motivation: Why data mining? What is data mining? Data Mining: On what kind of data? Data mining functionality Classification of data mining systems Top-10 most popular data mining algorithms Major issues in data mining Overview of the course

May 14, 2023

2

Data Mining & Dataware housing

Page 3: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

WHY DATA MINING? The Explosive Growth of Data: from terabytes to petabytes

Data collection and data availability Automated data collection tools, database systems, Web,

computerized society Major sources of abundant data

Business: Web, e-commerce, transactions, stocks, … Science: Remote sensing, bioinformatics, scientific

simulation, … Society and everyone: news, digital cameras, YouTube

We are drowning in data, but starving for knowledge! “Necessity is the mother of invention”—Data mining—

Automated analysis of massive data sets

May 14, 2023

3

Data Mining & Dataware housing

Page 4: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

WHAT IS DATA MINING?

Data mining (knowledge discovery from data) Extraction of interesting (non-trivial, implicit, previously

unknown and potentially useful) patterns or knowledge from huge amount of data

Data mining: a misnomer? Alternative names

Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Watch out: Is everything “data mining”? Simple search and query processing (Deductive) expert systems

May 14, 2023

4

Data Mining & Dataware housing

Page 5: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

KNOWLEDGE DISCOVERY (KDD) PROCESS

Data mining—core of knowledge discovery process

May 14, 2023

5

Data Mining & Dataware housing

Data Cleaning

Data Integration

Databases

Data Warehouse

Task-relevant Data

Selection

Data Mining

Pattern Evaluation

Page 6: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

DATA MINING AND BUSINESS INTELLIGENCE M

ay 14, 2023

6

Data Mining & Dataware housing

Increasing potentialto supportbusiness decisions End User

Business Analyst

DataAnalyst

DBA

Decision

MakingData PresentationVisualization Techniques

Data MiningInformation Discovery

Data ExplorationStatistical Summary, Querying, and Reporting

Data Preprocessing/Integration, Data WarehousesData Sources

Paper, Files, Web documents, Scientific experiments, Database Systems

Page 7: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

DATA MINING: CONFLUENCE OF MULTIPLE DISCIPLINES M

ay 14, 2023

7

Data Mining & Dataware housing

Data Mining

Database Technology Statistics

MachineLearning

PatternRecognition

AlgorithmOther

Disciplines

Visualization

Page 8: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

WHY NOT TRADITIONAL DATA ANALYSIS?

Tremendous amount of data Algorithms must be highly scalable to handle such as tera-

bytes of data High-dimensionality of data

Micro-array may have tens of thousands of dimensions High complexity of data

Data streams and sensor data Time-series data, temporal data, sequence data Structure data, graphs, social networks and multi-linked data Heterogeneous databases and legacy databases Spatial, spatiotemporal, multimedia, text and Web data Software programs, scientific simulations

New and sophisticated applications

May 14, 2023

8

Data Mining & Dataware housing

Page 9: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

MULTI-DIMENSIONAL VIEW OF DATA MINING Data to be mined

Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW

Knowledge to be mined Characterization, discrimination, association, classification,

clustering, trend/deviation, outlier analysis, etc. Multiple/integrated functions and mining at multiple levels

Techniques utilized Database-oriented, data warehouse (OLAP), machine learning,

statistics, visualization, etc. Applications adapted

Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

May 14, 2023

9

Data Mining & Dataware housing

Page 10: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

DATA MINING: ON WHAT KINDS OF DATA?

Database-oriented data sets and applications Relational database, data warehouse, transactional database

Advanced data sets and advanced applications Data streams and sensor data Time-series data, temporal data, sequence data (incl. bio-sequences) Structure data, graphs, social networks and multi-linked data Object-relational databases Heterogeneous databases and legacy databases Spatial data and spatiotemporal data Multimedia database Text databases The World-Wide Web

May 14, 2023

10

Data Mining & Dataware housing

Page 11: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

DATA MINING: CLASSIFICATION SCHEMES

General functionality Descriptive data mining Predictive data mining

Different views lead to different classifications Data view: Kinds of data to be mined Knowledge view: Kinds of knowledge to be discovered Method view: Kinds of techniques utilized Application view: Kinds of applications adapted

May 14, 2023

11

Data Mining & Dataware housing

Page 12: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

DATA MINING FUNCTIONALITIES

Multidimensional concept description: Characterization and discrimination Generalize, summarize, and contrast data characteristics,

e.g., dry vs. wet regions Frequent patterns, association, correlation vs. causality

Beer Chips [0.5%, 75%] (Correlation or causality?) Classification and prediction

Construct models (functions) that describe and distinguish classes or concepts for future prediction E.g., classify countries based on (climate), or classify

cars based on (gas mileage) Predict some unknown or missing numerical values

May 14, 2023

12

Data Mining & Dataware housing

Page 13: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

DATA MINING FUNCTIONALITIES (2) Cluster analysis

Class label is unknown: Group data to form new classes, e.g., cluster houses to find distribution patterns

Maximizing intra-class similarity & minimizing interclass similarity Outlier analysis

Outlier: Data object that does not comply with the general behavior of the data

Noise or exception? Useful in fraud detection, rare events analysis Trend and evolution analysis

Trend and deviation: e.g., regression analysis Sequential pattern mining: e.g., digital camera large SD

memory Periodicity analysis Similarity-based analysis

May 14, 2023

13

Data Mining & Dataware housing

Page 14: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

14SUPERVISED VS. UNSUPERVISED LEARNING

Supervised learning (classification)Supervision: The training data (observations,

measurements, etc.) are accompanied by labels indicating the class of the observations

New data is classified based on the training set Unsupervised learning (clustering)

The class labels of training data is unknownGiven a set of measurements, observations, etc.

with the aim of establishing the existence of classes or clusters in the data

Page 15: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

15

Classification predicts categorical class labels (discrete or nominal)classifies data (constructs a model) based on the

training set and the values (class labels) in a classifying attribute and uses it in classifying new data

Numeric Prediction models continuous-valued functions, i.e., predicts

unknown or missing values Typical applications

Credit/loan approval:Medical diagnosis: if a tumor is cancerous or benignFraud detection: if a transaction is fraudulentWeb page categorization: which category it is

PREDICTION PROBLEMS: CLASSIFICATION VS. NUMERIC PREDICTION

Page 16: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

16CLASSIFICATION—A TWO-STEP PROCESS

Model construction: describing a set of predetermined classes Each tuple/sample is assumed to belong to a predefined class, as

determined by the class label attribute The set of tuples used for model construction is training set The model is represented as classification rules, decision trees,

or mathematical formulae Model usage: for classifying future or unknown objects

Estimate accuracy of the model The known label of test sample is compared with the classified

result from the model Accuracy rate is the percentage of test set samples that are

correctly classified by the model Test set is independent of training set (otherwise overfitting)

If the accuracy is acceptable, use the model to classify new data Note: If the test set is used to select models, it is called validation

(test) set

Page 17: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

17PROCESS (1): MODEL CONSTRUCTION

TrainingData

NAME RANK YEARS TENUREDMike Assistant Prof 3 noMary Assistant Prof 7 yesBill Professor 2 yesJim Associate Prof 7 yesDave Assistant Prof 6 noAnne Associate Prof 3 no

ClassificationAlgorithms

IF rank = ‘professor’OR years > 6THEN tenured = ‘yes’

Classifier(Model)

Page 18: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

18PROCESS (2): USING THE MODEL IN PREDICTION

Classifier

TestingData

NAME RANK YEARS TENUREDTom Assistant Prof 2 noMerlisa Associate Prof 7 noGeorge Professor 5 yesJoseph Assistant Prof 7 yes

Unseen Data

(Jeff, Professor, 4)

Tenured?

Page 19: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

19

DECISION TREE INDUCTION: AN EXAMPLE

age?

overcast

student? credit rating?

<=30 >40

no yes yes

yes

31..40

no

fairexcellentyesno

age income student credit_rating buys_computer<=30 high no fair no<=30 high no excellent no31…40 high no fair yes>40 medium no fair yes>40 low yes fair yes>40 low yes excellent no31…40 low yes excellent yes<=30 medium no fair no<=30 low yes fair yes>40 medium yes fair yes<=30 medium yes excellent yes31…40 medium no excellent yes31…40 high yes fair yes>40 medium no excellent no

Training data set: Buys_computer The data set follows an example of

Quinlan’s ID3 (Playing Tennis) Resulting tree:

Page 20: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

20WHAT IS CLUSTER ANALYSIS?

Cluster: A collection of data objects similar (or related) to one another within the same group dissimilar (or unrelated) to the objects in other groups

Cluster analysis (or clustering, data segmentation, …) Finding similarities between data according to the characteristics found

in the data and grouping similar data objects into clusters

Unsupervised learning: no predefined classes (i.e., learning by observations vs. learning by examples: supervised)

Typical applications As a stand-alone tool to get insight into data distribution As a preprocessing step for other algorithms

Page 21: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

ARCHITECTURE: TYPICAL DATA MINING SYSTEM

May 14, 2023

21

Data Mining & Dataware housing

data cleaning, integration, and selection

Database or Data Warehouse Server

Data Mining Engine

Pattern Evaluation

Graphical User Interface

Knowledge-Base

Database Data Warehouse

World-WideWeb

Other InfoRepositories

Page 22: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

MAJOR ISSUES IN DATA MINING Mining methodology

Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web

Performance: efficiency, effectiveness, and scalability Pattern evaluation: the interestingness problem Incorporation of background knowledge Handling noise and incomplete data Parallel, distributed and incremental mining methods Integration of the discovered knowledge with existing one: knowledge

fusion User interaction

Data mining query languages and ad-hoc mining Expression and visualization of data mining results Interactive mining of knowledge at multiple levels of abstraction

Applications and social impacts Protection of data security, integrity, and privacy

May 14, 2023

22

Data Mining & Dataware housing

Page 23: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

SUMMARY

Data mining: Discovering interesting patterns from large amounts of data

A natural evolution of database technology, in great demand, with wide applications

A KDD process includes data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation

Mining can be performed in a variety of information repositories Data mining functionalities: characterization, discrimination,

association, classification, clustering, outlier and trend analysis, etc.

Data mining systems and architectures Major issues in data mining

May 14, 2023

23

Data Mining & Dataware housing

Page 24: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

What is a data warehouse? A multi-dimensional data model Data warehouse architecture

May 14, 2023

Data warehousig &

OLAP

24 DATA WAREHOUSING AND OLAP TECHNOLOGY: AN OVERVIEW

Page 25: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Defined in many different ways, but not rigorously. A decision support database that is maintained separately

from the organization’s operational database Support information processing by providing a solid

platform of consolidated, historical data for analysis. “A data warehouse is a subject-oriented, integrated, time-

variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon

Data warehousing: The process of constructing and using data warehouses

May 14, 2023

Data warehousig &

OLAP

25

WHAT IS DATA WAREHOUSE?

Page 26: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Organized around major subjects, such as customer, product, sales

Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing

Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process

May 14, 2023

Data warehousig &

OLAP

26DATA WAREHOUSE—SUBJECT-ORIENTED

Page 27: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Constructed by integrating multiple, heterogeneous data sources relational databases, flat files, on-line transaction

records Data cleaning and data integration techniques are

applied. Ensure consistency in naming conventions, encoding

structures, attribute measures, etc. among different data sources E.g., Hotel price: currency, tax, breakfast covered, etc.

When data is moved to the warehouse, it is converted.

May 14, 2023

Data warehousig &

OLAP

27

DATA WAREHOUSE—INTEGRATED

Page 28: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

The time horizon for the data warehouse is significantly longer than that of operational systemsOperational database: current value dataData warehouse data: provide information from a

historical perspective (e.g., past 5-10 years) Every key structure in the data warehouse

Contains an element of time, explicitly or implicitlyBut the key of operational data may or may not contain

“time element”

May 14, 2023

Data warehousig &

OLAP

28

DATA WAREHOUSE—TIME VARIANT

Page 29: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

A physically separate store of data transformed from the operational environment

Operational update of data does not occur in the data warehouse environment Does not require transaction processing, recovery, and

concurrency control mechanisms Requires only two operations in data accessing:

initial loading of data and access of data

May 14, 2023

Data warehousig &

OLAP

29DATA WAREHOUSE—NONVOLATILE

Page 30: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Traditional heterogeneous DB integration: A query driven approach◦ Build wrappers/mediators on top of heterogeneous databases ◦ When a query is posed to a client site, a meta-dictionary is used

to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer set

◦ Complex information filtering, compete for resources Data warehouse: update-driven, high performance

◦ Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis

May 14, 2023

Data warehousig &

OLAP

30DATA WAREHOUSE VS. HETEROGENEOUS DBMS

Page 31: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

OLTP (on-line transaction processing) Major task of traditional relational DBMS Day-to-day operations: purchasing, inventory, banking,

manufacturing, payroll, registration, accounting, etc. OLAP (on-line analytical processing)

Major task of data warehouse system Data analysis and decision making

Distinct features (OLTP vs. OLAP): User and system orientation: customer vs. market Data contents: current, detailed vs. historical, consolidated Database design: ER + application vs. star + subject View: current, local vs. evolutionary, integrated Access patterns: update vs. read-only but complex queries

May 14, 2023

Data warehousig &

OLAP

31DATA WAREHOUSE VS. OPERATIONAL DBMS

Page 32: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

OLTP VS. OLAP OLTP OLAP users clerk, IT professional knowledge worker function day to day operations decision support DB design application-oriented subject-oriented data current, up-to-date

detailed, flat relational isolated

historical, summarized, multidimensional integrated, consolidated

usage repetitive ad-hoc access read/write

index/hash on prim. key lots of scans

unit of work short, simple transaction complex query # records accessed tens millions #users thousands hundreds DB size 100MB-GB 100GB-TB metric transaction throughput query throughput, response

May 14, 2023 Data warehousig & OLAP 32

Page 33: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

High performance for both systems◦ DBMS— tuned for OLTP: access methods, indexing,

concurrency control, recovery◦ Warehouse—tuned for OLAP: complex OLAP queries,

multidimensional view, consolidation Different functions and different data:

◦ missing data: Decision support requires historical data which operational DBs do not typically maintain

◦ data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources

◦ data quality: different sources typically use inconsistent data representations, codes and formats which have to be reconciled

Note: There are more and more systems which perform OLAP analysis directly on relational databases

May 14, 2023

Data warehousig &

OLAP

33

WHY SEPARATE DATA WAREHOUSE?

Page 34: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

A data warehouse is based on a multidimensional data model which views data in the form of a data cube

A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions Dimension tables, such as item (item_name, brand,

type), or time(day, week, month, quarter, year) Fact table contains measures (such as dollars_sold) and

keys to each of the related dimension tables In data warehousing literature, an n-D base cube is called a

base cuboid. The top most 0-D cuboid, which holds the highest-level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

May 14, 2023

Data warehousig &

OLAP

34A MULTIDIMENSIONAL DATA MODELFROM TABLES AND SPREADSHEETS TO DATA CUBES

Page 35: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data warehousig &

OLAP

35

CUBE: A LATTICE OF CUBOIDS

time,item

time,item,location

time, item, location, supplier

all

time item location supplier

time,location

time,supplier

item,location

item,supplier

location,supplier

time,item,supplier

time,location,supplier

item,location,supplier

0-D(apex) cuboid

1-D cuboids

2-D cuboids

3-D cuboids

4-D(base) cuboid

Page 36: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Modeling data warehouses: dimensions & measuresStar schema: A fact table in the middle connected to a

set of dimension tables Snowflake schema: A refinement of star schema where

some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake

Fact constellations: Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation

May 14, 2023

Data warehousig &

OLAP

36CONCEPTUAL MODELING OF DATA WAREHOUSES

Page 37: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data warehousig &

OLAP

37

EXAMPLE OF STAR SCHEMA

time_keydayday_of_the_weekmonthquarteryear

time

location_keystreetcitystate_or_provincecountry

location

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_salesMeasures

item_keyitem_namebrandtypesupplier_type

item

branch_keybranch_namebranch_type

branch

Page 38: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data warehousig &

OLAP

38

EXAMPLE OF SNOWFLAKE SCHEMA

time_keydayday_of_the_weekmonthquarteryear

time

location_keystreetcity_key

location

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_sales

Measures

item_keyitem_namebrandtypesupplier_key

item

branch_keybranch_namebranch_type

branch

supplier_keysupplier_type

supplier

city_keycitystate_or_provincecountry

city

Page 39: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data warehousig &

OLAP

39

EXAMPLE OF FACT CONSTELLATION

time_keydayday_of_the_weekmonthquarteryear

time

location_keystreetcityprovince_or_statecountry

location

Sales Fact Table

time_key

item_key

branch_key

location_key

units_sold

dollars_sold

avg_salesMeasures

item_keyitem_namebrandtypesupplier_type

item

branch_keybranch_namebranch_type

branch

Shipping Fact Table

time_key

item_key

shipper_key

from_location

to_location

dollars_cost

units_shipped

shipper_keyshipper_namelocation_keyshipper_type

shipper

Page 40: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Sales volume as a function of product, month, and region

May 14, 2023

Data warehousig &

OLAP

40MULTIDIMENSIONAL DATA

Prod

uct

Reg

ion

Month

Dimensions: Product, Location, TimeHierarchical summarization paths

Industry Region Year

Category Country Quarter

Product City Month Week

Office Day

Page 41: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data warehousig &

OLAP

41

A SAMPLE DATA CUBE

Total annual salesof TV in U.S.A.Date

Produ

ct

Cou

ntrysum

sum TV

VCRPC

1Qtr 2Qtr 3Qtr 4Qtr

U.S.A

Canada

Mexico

sum

Page 42: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data warehousig &

OLAP

42CUBOIDS CORRESPONDING TO THE CUBE

all

product date country

product,date product,country date, country

product, date, country

0-D(apex) cuboid

1-D cuboids

2-D cuboids

3-D(base) cuboid

Page 43: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Visualization OLAP capabilities Interactive manipulation

May 14, 2023

Data warehousig &

OLAP

43

BROWSING A DATA CUBE

Page 44: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Roll up (drill-up): summarize databy climbing up hierarchy or by dimension reduction

Drill down (roll down): reverse of roll-upfrom higher level summary to lower level summary or

detailed data, or introducing new dimensions Slice and dice: project and select Pivot (rotate):

reorient the cube, visualization, 3D to series of 2D planes Other operations

drill across: involving (across) more than one fact tabledrill through: through the bottom level of the cube to its

back-end relational tables (using SQL)

May 14, 2023

Data warehousig &

OLAP

44TYPICAL OLAP OPERATIONS

Page 45: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data warehousig &

OLAP

45Fig. 3.10 Typical OLAP Operations

Page 46: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

46

Classification predicts categorical class labels (discrete or

nominal)classifies data (constructs a model) based on the

training set and the values (class labels) in a classifying attribute and uses it in classifying new data

Prediction models continuous-valued functions, i.e., predicts

unknown or missing values Typical applications

Credit approvalTarget marketingMedical diagnosisFraud detection

CLASSIFICATION VS. PREDICTION

Page 47: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

47CLASSIFICATION—A TWO-STEP PROCESS

Model construction: describing a set of predetermined classes Each tuple/sample is assumed to belong to a predefined

class, as determined by the class label attribute The set of tuples used for model construction is training set The model is represented as classification rules, decision

trees, or mathematical formulae Model usage: for classifying future or unknown objects

Estimate accuracy of the model The known label of test sample is compared with the

classified result from the model Accuracy rate is the percentage of test set samples that

are correctly classified by the model Test set is independent of training set, otherwise over-

fitting will occur If the accuracy is acceptable, use the model to classify data

tuples whose class labels are not known

Page 48: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

48PROCESS (1): MODEL CONSTRUCTION

TrainingData

NAME RANK YEARS TENUREDMike Assistant Prof 3 noMary Assistant Prof 7 yesBill Professor 2 yesJim Associate Prof 7 yesDave Assistant Prof 6 noAnne Associate Prof 3 no

ClassificationAlgorithms

IF rank = ‘professor’OR years > 6THEN tenured = ‘yes’

Classifier(Model)

Page 49: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

49PROCESS (2): USING THE MODEL IN PREDICTION

Classifier

TestingData

NAME RANK YEARS TENUREDTom Assistant Prof 2 noMerlisa Associate Prof 7 noGeorge Professor 5 yesJoseph Assistant Prof 7 yes

Unseen Data

(Jeff, Professor, 4)

Tenured?

Page 50: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

50SUPERVISED VS. UNSUPERVISED LEARNING

Supervised learning (classification)Supervision: The training data (observations,

measurements, etc.) are accompanied by labels indicating the class of the observations

New data is classified based on the training set Unsupervised learning (clustering)

The class labels of training data is unknownGiven a set of measurements, observations,

etc. with the aim of establishing the existence of classes or clusters in the data

Page 51: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

51DECISION TREE INDUCTION: TRAINING DATASET

age income student credit_rating buys_computer<=30 high no fair no<=30 high no excellent no31…40 high no fair yes>40 medium no fair yes>40 low yes fair yes>40 low yes excellent no31…40 low yes excellent yes<=30 medium no fair no<=30 low yes fair yes>40 medium yes fair yes<=30 medium yes excellent yes31…40 medium no excellent yes31…40 high yes fair yes>40 medium no excellent no

This follows an example of Quinlan’s ID3 (Playing Tennis)

Page 52: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

52OUTPUT: A DECISION TREE FOR “BUYS_COMPUTER”

age?

overcast

student? credit rating?

<=30 >40

no yes yes

yes

31..40

no

fairexcellentyesno

Page 53: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

53USING IF-THEN RULES FOR CLASSIFICATION

Represent the knowledge in the form of IF-THEN rulesR: IF age = youth AND student = yes THEN buys_computer = yes Rule antecedent/precondition vs. rule consequent

Assessment of a rule: coverage and accuracy ncovers = # of tuples covered by R ncorrect = # of tuples correctly classified by Rcoverage(R) = ncovers /|D| /* D: training data set */accuracy(R) = ncorrect / ncovers

If more than one rule is triggered, need conflict resolution Size ordering: assign the highest priority to the triggering rules that has

the “toughest” requirement (i.e., with the most attribute test) Class-based ordering: decreasing order of prevalence or

misclassification cost per class Rule-based ordering (decision list): rules are organized into one long

priority list, according to some measure of rule quality or by experts

Page 54: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

54

age?

student? credit rating?

<=30 >40

no yes yes

yes

31..40

no

fairexcellentyesno

Example: Rule extraction from our buys_computer decision-treeIF age = young AND student = no THEN buys_computer = noIF age = young AND student = yes THEN buys_computer = yesIF age = mid-age THEN buys_computer = yesIF age = old AND credit_rating = excellent THEN buys_computer = yesIF age = young AND credit_rating = fair THEN buys_computer = no

RULE EXTRACTION FROM A DECISION TREE

Rules are easier to understand than large trees

One rule is created for each path from the root to a leaf

Each attribute-value pair along a path forms a conjunction: the leaf holds the class prediction

Rules are mutually exclusive and exhaustive

Page 55: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

55WHAT IS PREDICTION?

(Numerical) prediction is similar to classification construct a model use model to predict continuous or ordered value for a given

input Prediction is different from classification

Classification refers to predict categorical class label Prediction models continuous-valued functions

Major method for prediction: regression model the relationship between one or more independent or

predictor variables and a dependent or response variable Regression analysis

Linear and multiple regression Non-linear regression Other regression methods: generalized linear model, Poisson

regression, log-linear models, regression trees

Page 56: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data Mining: Concepts and Techniques

56

LINEAR REGRESSION Linear regression: involves a response variable y and a single

predictor variable xy = w0 + w1 x

where w0 (y-intercept) and w1 (slope) are regression coefficients Method of least squares: estimates the best-fitting straight line

Multiple linear regression: involves more than one predictor variable Training data is of the form (X1, y1), (X2, y2),…, (X|D|, y|D|) Ex. For 2-D data, we may have: y = w0 + w1 x1+ w2 x2

Solvable by extension of least square method or using SAS, S-Plus

Many nonlinear functions can be transformed into the above

||

1

2

||

1

)(

))((

1 D

ii

D

iii

xx

yyxxw xwyw 10

Page 57: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data

Mining: Concepts

and Technique

s

57

Some nonlinear models can be modeled by a polynomial function

A polynomial regression model can be transformed into linear regression model. For example,

y = w0 + w1 x + w2 x2 + w3 x3

convertible to linear with new variables: x2 = x2, x3= x3

y = w0 + w1 x + w2 x2 + w3 x3

Other functions, such as power function, can also be transformed to linear model

Some models are intractable nonlinear (e.g., sum of exponential terms)possible to obtain least square estimates through

extensive calculation on more complex formulae

NONLINEAR REGRESSION

Page 58: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

CLUSTERINGSKNCOE

Page 59: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data Mining:

Concepts and Techniques

59CLUSTERING: RICH APPLICATIONS AND MULTIDISCIPLINARY EFFORTS

Pattern Recognition Spatial Data Analysis

Create thematic maps by clustering feature spacesDetect spatial clusters or for other spatial mining

tasks Image Processing Economic Science (especially market research) WWW

Document classificationCluster Weblog data to discover groups of similar

access patterns

Page 60: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data Mining:

Concepts and Techniques

60EXAMPLES OF CLUSTERING APPLICATIONS

Marketing: Help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs

Land use: Identification of areas of similar land use in an earth observation database

City-planning: Identifying groups of houses according to their house type, value, and geographical location

Earth-quake studies: Observed earth quake epicenters should be clustered along continent faults

Page 61: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data Mining:

Concepts and Techniques

61QUALITY: WHAT IS GOOD CLUSTERING?

A good clustering method will produce high quality clusters withhigh intra-class similarity low inter-class similarity

The quality of a clustering result depends on both the similarity measure used by the method and its implementation

The quality of a clustering method is also measured by its ability to discover some or all of the hidden patterns

Page 62: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data Mining:

Concepts and Techniques

62MEASURE THE QUALITY OF CLUSTERING

Dissimilarity/Similarity metric: Similarity is expressed in terms of a distance function, typically metric: d (i, j)

There is a separate “quality” function that measures the “goodness” of a cluster.

The definitions of distance functions are usually very different for interval-scaled, boolean, categorical, ordinal ratio, and vector variables.

Weights should be associated with different variables based on applications and data semantics.

It is hard to define “similar enough” or “good enough” the answer is typically highly subjective.

Page 63: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data Mining:

Concepts and Techniques

63MAJOR CLUSTERING APPROACHES (I)

Partitioning approach: Construct various partitions and then evaluate them by some

criterion, e.g., minimizing the sum of square errors Typical methods: k-means, k-medoids, CLARANS

Hierarchical approach: Create a hierarchical decomposition of the set of data (or objects)

using some criterion Typical methods: Diana, Agnes, BIRCH, ROCK, CAMELEON

Density-based approach: Based on connectivity and density functions Typical methods: DBSACN, OPTICS, DenClue

Page 64: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

May 14, 2023

Data Mining:

Concepts and Techniques

64MAJOR CLUSTERING APPROACHES (II)

Model-based: A model is hypothesized for each of the clusters and tries to find

the best fit of that model to each other Typical methods: EM, SOM, COBWEB

Frequent pattern-based: Based on the analysis of frequent patterns Typical methods: pCluster

User-guided or constraint-based: Clustering by considering user-specified or application-specific

constraints Typical methods: COD (obstacles), constrained clustering

Page 65: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

INTRODUCTION TO MACHINE LEARNINGSKNCOE

Page 66: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Branch of artificial intelligence that allows us to make our application intelligent without being explicitly programmed

Concepts are used to enable applications to take a decision from the available datasets.

INTRODUCTION

Page 67: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

spam mail detectors self-driven cars speech recognition face recognition online transactional fraud-activity detection Recommender Systems

APPLICATIONS

Page 68: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

1.2

3

TYPES OF MACHINE LEARNING

Page 69: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

a) Linear regression

b) Logistic regression

1.SUPERVISED MACHINE LEARNING

Page 70: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Predicting and forecasting values based on historical information

Identify the linear relationship between target variables and explanatory variables.

Variables that are going to be predicted are considered as Target variables

Variables that are going to help predict the target variables are called explanatory variables

LINEAR REGRESSION

Page 71: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

LINEAR REGRESSION

Page 72: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing
Page 73: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing
Page 74: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Sales forecasting Predicting optimum product price Predicting the next online purchase from

various sources and campaigns

APPLICATIONS OF LINEAR REGRESSION

Page 75: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Type of probabilistic classification model. Used in medical & social science.

Binary logistic regression deals with situations in which the outcome for a dependent variable can have two possible types

Multinomial logistic regression deals with situations where the outcome can have three or more possible types.

It provides a classification boundary to classify the outcome variable.

2.LOGISTIC REGRESSION

Page 76: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing
Page 77: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

1. Predicting the likelihood of an online purchase2. Detecting the presence of diabetes

APPLICATIONS OF LOGISTIC REASONING

Page 78: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Algorithms used are…

1. Clustering2. Artificial neural networks3. Vector quantization

2.UNSUPERVISED MACHINE LEARNING

Page 79: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

CLUSTERING

Clustering Algorithms : K-means,k-medoid, hierarchy & density based clustering.

Page 80: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

1. Market segmentation2. Social network analysis3. Organizing computer network4. Astronomical data analysis

APPLICATIONS OF CLUSTERING

Page 81: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

A machine-learning technique to predict what new items a user would like based on associations with the user's previous items

When a customer is looking for a Samsung Galaxy S5 mobile phone on Amazon, the store will also suggest other mobile phones similar to this one, presented in the Customers Who Bought This Item Also Bought window.

3.RECOMMENDATION ALGORITHMS

Page 82: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

1.User Based Recommendation

2.Item Based Recommendation

TYPES OF RECOMMENDATIONS

Page 83: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

USER BASED RECOMMENDATIONUSERS SIMILAR TO THE CURRENT USER ARE DETERMINEDBASED ON SMILARITY THEIR LIKED/USED PRODUCT CAN BE RECOMENDED

Page 84: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

items similar to the items that are being currently used by a user are determined

Eg:

ITEM BASED RECOMMENDATION

Page 85: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

STEPS IN R TO GENEARATE RECOMMENDATIONS

Page 86: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

E- commerce Increasing the sales and growing the business Customer satisfaction

APPLICATIONS /USES OF RECOMMENDATIONS

Page 87: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

Bussiness Bussiness Intelligence Intelligence

Page 88: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

CHANGING BUSINESS ENVIRONMENTCHANGING BUSINESS ENVIRONMENT

CISB594 – Business IntelligenceCISB594 – Business Intelligence

The environment in which organizations operate today is The environment in which organizations operate today is becoming more and more complexbecoming more and more complex

The complexity creates opportunities on one hand and problems The complexity creates opportunities on one hand and problems on the other. on the other.

Business environment factors are divided into four major Business environment factors are divided into four major categories:categories: markets, markets, consumer demands, consumer demands, technology,technology, societalsocietal

The intensity of these factors increases with time, hence more The intensity of these factors increases with time, hence more pressures, more competition, more management pressures, more competition, more management problemsproblems

Page 89: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

BUSINESS ENVIRONMENT BUSINESS ENVIRONMENT FACTORSFACTORS

FACTORFACTOR DESCRIPTIONDESCRIPTIONMarketsMarkets Strong competitionStrong competition

Expanding global marketsExpanding global marketsBlooming electronic markets on the InternetBlooming electronic markets on the InternetInnovative marketing methodsInnovative marketing methodsOpportunities for outsourcing with IT supportOpportunities for outsourcing with IT support

Need for real-time, on-demand transactionsNeed for real-time, on-demand transactionsConsumer Consumer Desire for customizationDesire for customization demanddemand Desire for quality, diversity of products, and speed of deliveryDesire for quality, diversity of products, and speed of delivery Customers getting powerful and less loyalCustomers getting powerful and less loyal TechnologyTechnology More innovations, new products, and new servicesMore innovations, new products, and new services

Increasing obsolescence rateIncreasing obsolescence rateIncreasing information overloadIncreasing information overload

Social networking, Web 2.0 and beyondSocial networking, Web 2.0 and beyondSocietalSocietal Growing government regulations and deregulationGrowing government regulations and deregulation

Workforce more diversified, older, and composed of more womenWorkforce more diversified, older, and composed of more women Prime Prime concerns of homeland security and terrorist attacksconcerns of homeland security and terrorist attacks

Increasing social responsibility of companiesIncreasing social responsibility of companiesGreater emphasis on sustainabilityGreater emphasis on sustainability

– – Business IntelligenceBusiness Intelligence

Page 90: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

DECISION MAKING IN BUSINESSDECISION MAKING IN BUSINESS Management Management Decision Making Decision Making Decision making means selecting the best Decision making means selecting the best

solution from two or more alternativessolution from two or more alternatives Management was considered an art because a Management was considered an art because a

variety of individual styles could be used in variety of individual styles could be used in addressing problemsaddressing problems

Often based on creativity, judgment, intuition, Often based on creativity, judgment, intuition, experience rather than on a scientific approach.experience rather than on a scientific approach.

Studies suggest that managers roles can be Studies suggest that managers roles can be classified into 3 major categories:classified into 3 major categories: Interpersonal – figurehead, leaderInterpersonal – figurehead, leader Informational- spokesperson, disseminatorInformational- spokesperson, disseminator Decisional- negotiator, resource allocatorDecisional- negotiator, resource allocator

Page 91: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

THE IDEATHE IDEA

The right decision = Intelligence + InformationThe right decision = Intelligence + InformationIntelligence = The capacity to acquire and apply knowledgeIntelligence = The capacity to acquire and apply knowledgeInformation = is used to tell stories, to discover things, to keep Information = is used to tell stories, to discover things, to keep

track of track of things, to provide answer and eventually will lead to innovationthings, to provide answer and eventually will lead to innovation

Business Intelligence = Business Intelligence = The right information + The right time + From the Right Resources The right information + The right time + From the Right Resources

Using information effectively to make better decisions Using information effectively to make better decisions (Gautner, 1989)(Gautner, 1989)

Page 92: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

WHAT IS BUSINESS WHAT IS BUSINESS INTELLIGENCE?INTELLIGENCE?

Business Intelligence (BI) refers to computer-based techniques used in spotting, digging-out, and analyzing business data, such as sales revenue by products and/or departments or associated costs and incomes

(Wikipedia,2010)

Business Intelligence (BI) helps business people make more informed decisions by providing them timely, data-driven answers to their business questions. BI analyzes data stored in data warehouses, operational databases, and/or ERP systems (i.e. SAP®, Oracle, JD Edwards, Peoplesoft) and transforms it into attractive and easy to understand dashboards and reports. BI delivers the insight needed to make strategic planning decisions, improve operational efficiencies, and optimize business processes.

(Microstrategy.com)

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Page 93: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

A WHAT IS BUSINESS WHAT IS BUSINESS INTELLIGENCE?INTELLIGENCE?

An umbrella term that combines architectures, tools, databases, applications and methodologies in order to enable interactive access to data, to enable manipulation of data and to give business managers the ability to make more informed and better business decisions

(Turban, 2010)

Business intelligence uses knowledge management, data warehouse[ing], data mining and business analysis to identify, track and improve key processes and data, as well as identify and monitor trends in corporate, competitor and market performance.”

(bettermanagement.com)

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Page 94: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

BUSINESS INTELLIGENCE MAIN BUSINESS INTELLIGENCE MAIN OBJECTIVESOBJECTIVES

Enable interactive access to data (sometimes in real time) Enable manipulation of data to allow appropriate analysis by

managers Provide valuable insights to produce informed and better

decisions The process of BI is based on transformation of data to

information, then to decisions and finally to actions Facilitate closing the strategy gap of an organization

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Page 95: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

VARIOUS TOOLS AND TECHNIQUES VARIOUS TOOLS AND TECHNIQUES IN BIIN BI

CISB594 – Business IntelligenceCISB594 – Business IntelligenceMost sophisticated BI products include most of the above

Page 96: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

DECISION MAKING IN BUSINESSDECISION MAKING IN BUSINESS

Will require informationWill require information

Page 97: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

THE ARCHITECTURE OF BUSINESS THE ARCHITECTURE OF BUSINESS INTELLIGENCEINTELLIGENCE

Four major componentsFour major components

Page 98: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

4 MAJOR COMPONENTS OF 4 MAJOR COMPONENTS OF BUSINESS INTELLIGENCE BUSINESS INTELLIGENCE ARCHITECTUREARCHITECTURE

1. The data warehouse is a special database or repository of data that had been prepared to support decision making applications ranging from simple reporting to complex optimization

Business IntelligenceBusiness Intelligence

Page 99: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

4 MAJOR COMPONENTS OF BUSINESS 4 MAJOR COMPONENTS OF BUSINESS INTELLIGENCE ARCHITECTUREINTELLIGENCE ARCHITECTURE

2. Business analytics are the software tools that allow users to create on-demand reports, queries and conduct analysis of data Originally they appear under the name online analytical processing (OLAP)Data Mining - A class of information analysis based on databases that looks for hidden patterns in a collection of data which can be used to predict future behaviore.g. Amazon.com uses data mining to predict the behaviour of their customersAutomated Decision Systems - Rule-based system that provide solution usually in one functional area to a specific repetitive managerial problems

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Page 100: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

4 MAJOR COMPONENTS OF BUSINESS 4 MAJOR COMPONENTS OF BUSINESS INTELLIGENCE ARCHITECTUREINTELLIGENCE ARCHITECTURE

3. Business performance management (BPM) based on balanced scorecard methodology – a framework for defining, implementing, and managing an enterprise’s business strategy by linking objectives with factual measures

Objective is to optimize overall performance of an organization. A real-time system that alert managers to potential opportunities, impending problems, and threats, and then empowers them to react through models and collaboration

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Page 101: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

THE ARCHITECTURE OF BUSINESS THE ARCHITECTURE OF BUSINESS INTELLIGENCEINTELLIGENCE

4. User interface allows access and easy manipulation of other BI components

Tools used to broadcast information

Data visualization provides graphical, animation, or video

presentation of data and the results of data analysis The ability to quickly identify important trends in corporate and market data can provide competitive advantage

CISB594 – Business IntelligenceCISB594 – Business Intelligence

Page 102: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

BUSINESS MODEL

May 14, 2023

102

Data Mining & Dataware housing

Page 103: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

6-103

WHAT IS A BUSINESS MODEL? Model

A model is a plan or diagram that is used to make or describe something.

Business ModelA firm’s business model is its plan or diagram for how it

competes, uses its resources, structures its relationships, interfaces with customers, and creates value to sustain itself on the basis of the profits it generates.

The term “business model” is used to include all the activities that define how a firm competes in the marketplace.

Page 104: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

6-104

BUSINESS MODELS Timing of Business Model Development

The development of a firm’s business model follows the feasibility analysis stage of launching a new venture but comes before writing a business plan.

If a firm has conducted a successful feasibility analysis and knows that it has a product or service with potential, the business model stage addresses how to surround it with a core strategy, a partnership network, a customer interface, distinctive resources, and an approach to creating value that represents a viable business.

Page 105: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

6-105

IMPORTANCE OF A BUSINESS MODELHaving a clearly articulated business

model is important because it does the following:

• Serves as an ongoing extension of feasibility analysis. A business model continually asks the question, “Does this business make sense?”• Focuses attention on how all the elements of a business fit together and constitute a working whole.• Describes why the network of participants needed to make a business idea viable are willing to work together.• Articulates a company’s core logic to all stakeholders, including the firm’s employees.

Page 106: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

6-106COMPONENTS OF A BUSINESS MODEL

Four Components of a Business Model

Page 107: U NIT 6 BY SACHIN DHANDE March 11, 2016 1 Data Mining & Dataware housing

6-107RECAP: THE IMPORTANCE OF BUSINESS MODELS

Business ModelsIt is very useful for a new venture to look at itself in a

holistic manner and understand that it must construct an effective “business model” to be successful.

Everyone that does business with a firm, from its customers to its partners, does so on a voluntary basis. As a result, a firm must motivate its customers and its partners to play along.

Close attention to each of the primary elements of a firm’s business model is essential for a new venture’s success.