Organizational intelligence technologies Chapter 15

Upload: tess98

Post on 22-Nov-2014


Page 1: Document15

Organizational intelligence technologies

Chapter 15

Page 2: Document15

Too many companies are data rich but information poor.

Organizational intelligence is the outcome of an organization’s efforts to collect, store, process, and interpret data from internal and external sources.

Intelligence in the sense of gathering and distributing information.

Organizational intelligence

Page 3: Document15

TPS handles common business tasks such as accounting, inventory, purchasing, and sales; can generate huge volumes of data.

TPS data is usually highly fragmented:

  Different systems

  Different database technologies

  Different locations

Result: an underused intelligence system containing undetected key facts

Transaction processing systems

Page 4: Document15

The data warehouse

A repository of organizational data

Can be measured in terabytes

Analysis tools: DSS, EIS, OLAP, SQL, data mining

Management tools: data extractor, transformation engine, cleanser, loader, scheduler, metadata manager

[Figure: Data warehouse, with the analysis tools and management tools listed above.]

Page 5: Document15

Extraction: Pulling data from existing systems.

Transformation: Data must be standardized and follow consistent coding schemes.

Cleaning: Removing errors, inconsistencies, or redundancies.

Loading: Copying operational data into the data warehouse (archival, current, or ongoing).

Scheduling: Refreshing the warehouse.

Metadata: Data dictionary containing facts about the data in the warehouse.

Managing the data warehouse
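
The management steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the source rows, the REGION_CODES mapping, and every function name are hypothetical, the "warehouse" is a plain list, and scheduling is left out.

```python
# Hypothetical raw rows from two operational systems with inconsistent coding.
SOURCE_A = [
    {"cust": "C1", "region": "N", "amount": "100.50"},
    {"cust": "C2", "region": "S", "amount": "20.00"},
]
SOURCE_B = [
    {"cust": "C2", "region": "South", "amount": "20.00"},  # same fact as a SOURCE_A row
    {"cust": "C3", "region": "north", "amount": "n/a"},    # invalid amount
    {"cust": "C4", "region": "North", "amount": "75.25"},
]

# Assumed coding scheme: every region spelling maps to one standard label.
REGION_CODES = {"n": "North", "north": "North", "s": "South", "south": "South"}


def extract(*sources):
    """Extraction: pull rows from each existing system."""
    for source in sources:
        yield from source


def transform(rows):
    """Transformation: apply one consistent coding scheme for regions."""
    for row in rows:
        region = REGION_CODES.get(row["region"].lower(), row["region"])
        yield {**row, "region": region}


def clean(rows):
    """Cleaning: drop rows with invalid amounts and redundant duplicates."""
    seen = set()
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # error in the source data
        key = (row["cust"], row["region"], amount)
        if key in seen:
            continue  # redundancy across systems
        seen.add(key)
        yield {"cust": row["cust"], "region": row["region"], "amount": amount}


def load(rows, warehouse):
    """Loading: copy the prepared rows into the warehouse (here, a list)."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(clean(transform(extract(SOURCE_A, SOURCE_B))), [])

# Metadata: a tiny data dictionary describing what the warehouse holds.
metadata = {"columns": ["cust", "region", "amount"], "row_count": len(warehouse)}
```

Transformation runs before cleaning here so that duplicates coded differently in the two systems ("S" and "South") can be recognized as the same fact.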

Page 6: Document15

Warehouse architectures

Server architectures

DBMS

Data Warehouse Technology

Page 7: Document15

Centralized

Federated

Tiered

Warehouse architectures

Page 8: Document15

[Figure: Centralized data warehouse. Labels: mainframe; corporate data warehouse (corporate, financial, marketing, manufacturing, distribution); server; analysts.]

Centralized data warehouse

The centralized data warehouse gives processing efficiency and lowers support costs.

Page 9: Document15

[Figure: Federated data warehouse. Labels: mainframe; corporate data warehouse; financial; marketing; manufacturing; distribution; analysts.]

Federated data warehouse

The data warehouse may appear as one logical structure, but to reduce response time it is physically dispersed across several related databases.

Page 10: Document15

[Figure: Tiered data warehouse. Labels: analyst; workstation; local data mart; mainframe; corporate data warehouse; Tier 1 (highly summarized data); Tier 2 (summarized data); Tier 3 (detailed data).]

Tiered data warehouse

A tiered architecture houses highly aggregated data on an analyst’s workstation, with more detailed summaries on a second server, and most detailed data on a third server.
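
The relationship between the three tiers can be illustrated by deriving each summary level from the detailed data. The sketch below is an assumption-laden toy: the sales rows are made up, and the summarize helper and tier variable names are hypothetical.

```python
from collections import defaultdict

# Tier 3: detailed data (hypothetical sales transactions).
tier3_detail = [
    {"region": "North", "product": "Redblob", "year": 1996, "sales": 10},
    {"region": "North", "product": "Redblob", "year": 1996, "sales": 5},
    {"region": "North", "product": "Blueblob", "year": 1997, "sales": 7},
    {"region": "South", "product": "Redblob", "year": 1997, "sales": 3},
    {"region": "South", "product": "Blueblob", "year": 1996, "sales": 8},
]


def summarize(rows, keys):
    """Total the sales figure for each distinct combination of the given keys."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row["sales"]
    return dict(totals)


# Tier 2: summarized data, one total per region and product.
tier2_summary = summarize(tier3_detail, ["region", "product"])

# Tier 1: highly summarized data, one total per region.
tier1_summary = summarize(tier3_detail, ["region"])
```

Each higher tier holds fewer, coarser rows, which is why it can sit closer to the analyst.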

Page 11: Document15

Single processor

Symmetric multiprocessor

Massively parallel processor

Nonuniform memory access

Server architectures

Page 12: Document15

[Figure: Single processor. Labels: processor, memory, databases.]

Single processor

The simplest option. Easy to manage, but limited processing power and scalability.

Page 13: Document15

[Figure: Symmetric multiprocessing. Labels: processor, memory, databases.]

Symmetric multiprocessor

SMP has multiple processors sharing memory and disks. Very scalable, but memory bus can become congested. OS must be designed for multiprocessing.

Page 14: Document15

[Figure labels: processor, memory, databases.]

Massively parallel processor

MPP connects an array of processors that each have their own memory and disks. Applications must be designed to work in parallel (e.g., the "parallel version" of DB2 or Oracle is needed).
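
What "designed to work in parallel" means can be sketched as follows: the data is partitioned so that each node works only on its own share, and a coordinator merges the partial results. This is a simulation, not an MPP system: threads stand in for nodes, the sales list is made up, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, node_count):
    """Assign each row to exactly one node, so no data is shared between nodes."""
    partitions = [[] for _ in range(node_count)]
    for index, row in enumerate(rows):
        partitions[index % node_count].append(row)
    return partitions


def local_total(rows):
    """Work done independently by one node on its own partition."""
    return sum(amount for _, amount in rows)


def parallel_total(rows, node_count=4):
    """Coordinator: hand out the partitions, then merge the partial totals."""
    partitions = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_totals = list(pool.map(local_total, partitions))
    return sum(partial_totals)


# Hypothetical orders: (order id, amount) with amounts 1 through 100.
sales = [(f"order-{i}", i) for i in range(1, 101)]
total = parallel_total(sales)
```

A total works well here because partial sums can simply be added; operations that need rows from several partitions at once are harder to spread across nodes.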

Page 15: Document15

[Figure labels: processor, memory, databases.]

Non-uniform memory access

NUMA joins multiple SMP nodes into a single, distributed memory pool with a single OS. The OS must be designed to work with NUMA. Not widely used in commercial environments.

Page 16: Document15

DBMS choices

DBMS types (columns): Relational; Super-relational; Multidimensional (logical); Multidimensional (physical); Object-relational

Features/functions (rows): Normalized data structures; Abstract data types; Parallelism; Multidimensional structures; Drill-down; Rotation; Data-dependent operations

Page 17: Document15

[Figure: Sales hypercube. Labels: year (1996, 1997); product (Redblob, Blueblob); region (North, South).]

MDDB: Data in a hypercube

Whereas the relational world is two-dimensional, a multidimensional database (MDDB) allows the representation of multiple dimensions.
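
A hypercube of this kind can be sketched as a mapping from one coordinate per dimension to a measure. The sketch below assumes three dimensions (year, product, region) taken from the figure labels; the sales values are invented, and slice_cube and roll_up are hypothetical helper names.

```python
# Hypothetical sales figures keyed by (year, product, region).
cube = {
    (1996, "Redblob", "North"): 10,
    (1996, "Redblob", "South"): 4,
    (1996, "Blueblob", "North"): 6,
    (1996, "Blueblob", "South"): 8,
    (1997, "Redblob", "North"): 12,
    (1997, "Redblob", "South"): 3,
    (1997, "Blueblob", "North"): 7,
    (1997, "Blueblob", "South"): 9,
}


def slice_cube(cube, year=None, product=None, region=None):
    """Keep only the cells that match every coordinate that was given."""
    wanted = (year, product, region)
    return {
        cell: value
        for cell, value in cube.items()
        if all(w is None or w == c for w, c in zip(wanted, cell))
    }


def roll_up(cube, axis):
    """Total the measure along one dimension (0=year, 1=product, 2=region)."""
    totals = {}
    for cell, value in cube.items():
        totals[cell[axis]] = totals.get(cell[axis], 0) + value
    return totals


north_1997 = slice_cube(cube, year=1997, region="North")
sales_by_year = roll_up(cube, axis=0)
sales_by_region = roll_up(cube, axis=2)
```

Changing the axis passed to roll_up views the same data from a different dimension, which is the idea behind rotation.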

Page 18: Document15

Verification vs. Discovery

OLAP

Data mining

Analysis Tools

Page 19: Document15

Verification: What is the average sale for in-store and catalog customers?

Discovery: What is the best predictor of sales?

Verification: What is the average high school GPA of students who graduate from college compared to those who do not?

Discovery: What are the best predictors of college graduation?

Verification and discovery

The verification approach to data analysis is driven by a hypothesis about some relationship in the data.

The discovery approach to data analysis sifts through the data in search of frequently occurring patterns and trends.
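
The difference between the two approaches can be sketched on a toy dataset. Every record below is made up, the function names are hypothetical, and ranking attributes by absolute Pearson correlation is just one simple stand-in for "sifting through the data"; real data mining tools use far richer methods.

```python
from math import sqrt

# Hypothetical customer records; all values are invented for illustration.
customers = [
    {"channel": "in-store", "visits": 1, "age": 40, "sale": 10.0},
    {"channel": "catalog", "visits": 2, "age": 22, "sale": 20.0},
    {"channel": "in-store", "visits": 3, "age": 35, "sale": 30.0},
    {"channel": "catalog", "visits": 4, "age": 51, "sale": 40.0},
    {"channel": "in-store", "visits": 5, "age": 28, "sale": 50.0},
    {"channel": "catalog", "visits": 6, "age": 33, "sale": 60.0},
]


def average_sale_by_channel(rows):
    """Verification: answer a question the analyst already has in mind."""
    sales = {}
    for row in rows:
        sales.setdefault(row["channel"], []).append(row["sale"])
    return {channel: sum(v) / len(v) for channel, v in sales.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_predictor(rows, target, candidates):
    """Discovery: scan candidate attributes for the strongest relationship."""
    target_values = [row[target] for row in rows]
    scores = {
        name: abs(pearson([row[name] for row in rows], target_values))
        for name in candidates
    }
    return max(scores, key=scores.get), scores


averages = average_sale_by_channel(customers)
best, scores = best_predictor(customers, "sale", ["visits", "age"])
```

In the verification case the analyst chooses the grouping up front; in the discovery case the code decides which attribute matters.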

Page 20: Document15

The relational model was not designed for data synthesis, analysis, and consolidation.

This is the role of other special-purpose software, such as OLAP.

OLAP tools give fast, flexible, shared access to analytical information.

OLAP tools support the "verification approach".

OLAP

Page 21: Document15

TPS                                     | OLAP
Optimize for transaction volume         | Optimize for data analysis
Process a few records at a time         | Process summarized data
Real time update as transactions occur  | Batch update (e.g., daily)
Based on tables                         | Based on hypercubes
Raw data                                | Aggregated data
SQL is widely used                      | No common query language

TPS versus OLAP

Page 22: Document15

Data mining is the search for relationships and patterns that exist in large databases but are hidden in the vast amounts of data.

Multiple applications:

  Database marketing

  Predicting bad loans

  Detecting flaws in VLSI chips

  Identifying quasars

Data mining tools support the "discovery approach".

Data mining

Page 23: Document15

Associations: 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta.

Sequential patterns: 32 percent of female customers who order a red jacket buy a gray skirt within six months.

Classifying: Identification of the attributes that discriminate between different groups.

Clustering: Divides a dataset into mutually exclusive groups.

Predicting: Predict the revenue value of a new customer based on that person’s demographic variables.

Data mining functions
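
An association such as "customers who buy wine also buy pasta" is commonly quantified with support and confidence. The sketch below assumes those two standard measures; the baskets are made up and the function names are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"wine", "pasta", "bread"},
    {"wine", "pasta"},
    {"wine", "cheese"},
    {"pasta", "tomatoes"},
    {"wine", "pasta", "cheese"},
]


def count(baskets, items):
    """Number of baskets that contain every one of the given items."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket)


def support(baskets, items):
    """Share of all baskets that contain the given items."""
    return count(baskets, items) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Of the baskets with the antecedent, the share that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return count(baskets, both) / count(baskets, antecedent)


wine_pasta_support = support(baskets, {"wine", "pasta"})
wine_to_pasta = confidence(baskets, {"wine"}, {"pasta"})
```

The "85 percent" figure in the association example corresponds to a confidence value: the share of wine buyers who also buy the pasta.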

Page 24: Document15

Data management is an evolving discipline.

Data managers have a dual responsibility:

  Manage data to be in business today

  Manage data to be in business tomorrow

Data managers now need to support organizational intelligence technologies

Conclusion