organizational intelligence technologies

Post on 18-Mar-2016

Organizational intelligence technologies

There are three kinds of intelligence: one kind understands things for itself, the other appreciates what others can understand, the third understands neither for itself nor through others. This first kind is excellent, the second good, and the third kind useless.

Machiavelli, The Prince, 1513.

Organizational intelligence

• Organizational intelligence is the outcome of an organization’s efforts to collect, store, process, and interpret data from internal and external sources
• Intelligence in the sense of gathering and distributing information

Types of information systems

Type of information system | System’s purpose
Transaction processing system (TPS) | Collects and stores data from routine transactions
Management information system (MIS) | Converts data from a TPS into information for planning, controlling, and managing an organization
Decision support system (DSS) | Supports managerial decision making by providing models for processing and analyzing data
Executive information system (EIS) | Provides senior management with information necessary to monitor organizational performance, and develop and implement strategies
On-line analytical processing (OLAP) | Presents a multidimensional, logical view of data to the analyst with no requirements as to how the data are stored
Data mining | Uses statistical analysis and artificial intelligence techniques to identify hidden relationships in data

The information systems cycle

Transaction processing systems

• Can generate huge volumes of data
• A telephone company may generate 200 million records per day
• Raw material for organizational intelligence

The problem

• Organizational memory is fragmented
  • Different systems
  • Different database technologies
  • Different locations
• An underused intelligence system containing undetected key facts about customers

The data warehouse

• A repository of organizational data
• Can be measured in terabytes

Managing the data warehouse

• Extraction
• Transformation
• Cleaning
• Loading
• Scheduling
• Metadata

Extraction

• Pulling data from existing systems
• Operational systems were not designed for extraction to load into a data warehouse
• Applications are often independent entities
• Time consuming and complex
• An ongoing process

Transformation

• Encoding: m/f, male/female to M/F
• Unit of measure: inches to cm
• Field: sales-date to salesdate
• Date: dd/mm/yy to yyyy/mm/dd
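The four transformation types listed for this slide can be sketched in a few lines of Python. This is only an illustration: the record layout, the `height_in` field, and the `GENDER_CODES` mapping are assumptions made for the example and do not come from the presentation.

```python
from datetime import datetime

INCHES_TO_CM = 2.54  # exact by definition

# Hypothetical mapping of source gender codes to the warehouse standard.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def transform(record: dict) -> dict:
    """Return a warehouse-ready copy of one source record."""
    sold = datetime.strptime(record["sales-date"], "%d/%m/%y")
    return {
        # Encoding: m/f, male/female -> M/F
        "gender": GENDER_CODES[record["gender"].strip().lower()],
        # Unit of measure: inches -> centimeters
        "height_cm": round(record["height_in"] * INCHES_TO_CM, 2),
        # Field rename and date format: sales-date (dd/mm/yy) -> salesdate (yyyy/mm/dd)
        "salesdate": sold.strftime("%Y/%m/%d"),
    }


# Hypothetical source record.
source = {"gender": "female", "height_in": 70, "sales-date": "05/06/98"}
print(transform(source))
```

Note that a two-digit year forces a guess about the century; Python's `%y` maps 69 to 99 onto the 1900s, which may or may not suit a given source system.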

Cleaning

• Same record stored in different departments
• Multiple records for a company
• Multiple entries for the same organization
• Misuse of data entry fields
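One common cleaning task, finding multiple entries for the same organization, might be approached as follows. This is a rough sketch: the suffix list and the sample names are hypothetical, and real cleaning tools use far more robust matching.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes to ignore when matching names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "ltd", "limited"}


def match_key(name: str) -> str:
    """Normalize an organization name into a crude matching key."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)


def find_duplicates(names: list[str]) -> list[list[str]]:
    """Group names that share a matching key; return groups with two or more members."""
    groups = defaultdict(list)
    for name in names:
        groups[match_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical customer names as they might appear in different departments.
records = ["Acme Corp.", "ACME Corporation", "Acme, Inc.", "Zenith Ltd"]
print(find_duplicates(records))
```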

Loading

• Archival
  • May be too costly
• Current
  • From operational systems
• Ongoing
  • Continual updating of the warehouse

Scheduling

• A trade-off
  • Too frequent is costly
  • Too infrequent means old data

Metadata

A data dictionary containing additional facts about the data in the warehouse

• Description of each data type
• Format
• Coding standards
• Meaning
• Operational system source
• Transformations
• Frequency of extracts
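A data-dictionary entry holding the facts listed for this slide could be modeled as a simple record. The class name, field names, and sample values below are assumptions for illustration, not a schema from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """One data-dictionary entry for a data element in the warehouse."""
    name: str
    description: str
    data_format: str
    coding_standard: str
    meaning: str
    source_system: str
    transformations: list[str] = field(default_factory=list)
    extract_frequency: str = "daily"


# Hypothetical entry; every value here is illustrative only.
salesdate = MetadataEntry(
    name="salesdate",
    description="Date of sale",
    data_format="yyyy/mm/dd",
    coding_standard="Four-digit year, numeric month and day",
    meaning="Calendar date on which the sale was recorded",
    source_system="Order entry TPS",
    transformations=["renamed from sales-date", "dd/mm/yy to yyyy/mm/dd"],
)
print(salesdate)
```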

Warehouse architectures

• Centralized
• Federated
• Tiered

Centralized data warehouse

Federated data warehouse

Tiered data warehouse

Server options

• Single processor
• Symmetric multiprocessor
• Massively parallel processor
• Nonuniform memory access

Single processor

Symmetric multiprocessor

Massively parallel processor

Nonuniform memory access

DBMS choices

DBMS types (columns)
• Relational
• Super-relational
• Multidimensional (logical)
• Multidimensional (physical)
• Object-relational

Features/functions (rows)
• Normalized data structures
• Abstract data types
• Parallelism
• Multidimensional structures
• Drill-down
• Rotation
• Data-dependent operations

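Decision matrix

For these environments …
• Business requirements: Scope: departmental; Uses: data analysis
• Client population: Small; single location
• Systems support: Minimal local; average central
Choose …
• Architecture: Consolidate; turnkey package
• Server: Single-processor or SMP
• DBMS: MDDB

For these environments …
• Business requirements: Scope: departmental; Uses: analysis plus informational
• Client population: Large; analysis at single location; informational users dispersed
• Systems support: Minimal local; average central
Choose …
• Architecture: Tiered; detail at central; summary at local
• Server: Clustered SMP for central; SP or SMP for local
• DBMS: RDBMS for central; MDDB for local

For these environments …
• Business requirements: Scope: enterprise; Uses: analysis plus informational
• Client population: Large; geographically dispersed
• Systems support: Strong central
Choose …
• Architecture: Centralized
• Server: Clustered SMP
• DBMS: Object-relational; Web support

For these environments …
• Business requirements: Scope: departmental; Uses: exploratory
• Client population: Small; few sites
• Systems support: Strong central
Choose …
• Architecture: Centralized
• Server: MPP
• DBMS: RDBMS with parallel support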

The decision

• Selecting a server architecture and selecting a DBMS are not independent decisions
• Parallelism may be an option only for some RDBMSs
• Need to find the fit that meets organizational goals

Exploiting data stores

• Verification and discovery
• Data mining
• OLAP

Verification and discovery

Verification | Discovery
What is the average sale for in-store and catalog customers? | What is the best predictor of sales?
What is the average high school GPA of students who graduate from college compared to those who do not? | What are the best predictors of college graduation?

OLAP

• Relational model was not designed for data synthesis, analysis, and consolidation
• This is the role of spreadsheets and other special purpose software
• Need to complement RDBMS technology with a multidimensional view of data

TPS versus OLAP

TPS | OLAP
Optimize for transaction volume | Optimize for data analysis
Process a few records at a time | Process summarized data
Real time update as transactions occur | Batch update (e.g., daily)
Based on tables | Based on hypercubes
Raw data | Aggregated data
SQL is widely used | MDX becoming a standard

ROLAP

• A relational OLAP
• A multidimensional model is imposed on a relational structure
• Relational is a mature technology with extensive data management features
• Not as efficient as OLAP

The star structure

The snowflake structure

Rotation

Drill down

Region | Sales variance
Africa | 105%
Asia | 57%
Europe | 122%
North America | 97%
Pacific | 85%
South America | 163%

Nation | Sales variance
China | 123%
Japan | 52%
India | 87%
Singapore | 95%
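Drill-down can be pictured as moving from a summary level to the detail beneath one of its members. The sketch below uses the sales variance figures from the two tables on this slide; treating the nation table as the breakdown of Asia is an assumption, and the function names are made up for the example.

```python
# Sales variance (percent) by region, from the first table on the slide.
REGION_VARIANCE = {
    "Africa": 105, "Asia": 57, "Europe": 122,
    "North America": 97, "Pacific": 85, "South America": 163,
}

# Assumption: the nation table is the detail behind the Asia row.
NATION_VARIANCE = {
    "Asia": {"China": 123, "Japan": 52, "India": 87, "Singapore": 95},
}


def drill_down(region: str) -> dict:
    """Return the nation-level figures behind one region (empty if none are held)."""
    return NATION_VARIANCE.get(region, {})


def below_target(figures: dict, target: int = 100) -> list[str]:
    """List the members whose variance is below the target, in alphabetical order."""
    return sorted(name for name, pct in figures.items() if pct < target)


print(below_target(REGION_VARIANCE))      # regions that missed the target
print(below_target(drill_down("Asia")))   # nations behind the weakest region
```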

A hypercube

A three-dimensional hypercube display

Page: Region: North
Columns: Sales (Red blob, Blue blob, Total)
Rows: Year (1996, 1997, Total)

A six-dimensional hypercube

Dimension | Example
Brand | Mt. Airy
Store | Atlanta
Customer segment | Business
Product group | Desks
Period | January
Variable | Units sold
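A hypercube cell is addressed by one value per dimension. As a minimal sketch, a cube can be held as a mapping from coordinate tuples to values; the dimension names follow the table above, while the `Cell` type, the helper functions, and all numbers are hypothetical.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    """Coordinates of one cell in the six-dimensional cube."""
    brand: str
    store: str
    segment: str
    product_group: str
    period: str
    variable: str


# Hypothetical cell values; only the dimension members come from the slides.
cube = {
    Cell("Mt. Airy", "Atlanta", "Business", "Desks", "January", "Units sold"): 120,
    Cell("Mt. Airy", "Atlanta", "Business", "Chairs", "January", "Units sold"): 310,
    Cell("Mt. Airy", "Boston", "Business", "Desks", "January", "Units sold"): 95,
    Cell("Carolina", "Atlanta", "Business", "Desks", "March", "Units sold"): 80,
}


def slice_cube(cube: dict, **fixed: str) -> dict:
    """Keep only the cells whose coordinates match every fixed dimension."""
    return {
        cell: value
        for cell, value in cube.items()
        if all(getattr(cell, dim) == member for dim, member in fixed.items())
    }


def total(cube: dict, **fixed: str) -> int:
    """Sum the values in a slice (meaningful only within a single variable)."""
    return sum(slice_cube(cube, **fixed).values())


print(total(cube, brand="Mt. Airy", period="January"))
```

Fixing some dimensions and leaving the rest free is what the page, row, and column selections of a hypercube display do.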

A six-dimensional hypercube display

Page: Month: March; Segment: Business
Columns: Product group (Desks, Chairs), each with Variable (Units, Revenue)
Rows: Brand (Carolina, Mt. Airy), each with Store (Atlanta, Boston), plus Totals

The link between RDBMS and MDDB

MDDB design

Key concepts
• Variable dimensions
  • What is tracked
  • Sales
• Identifier dimensions
  • Tagging what is tracked
  • Time, product, and store of sale

Prompts for identifying dimensions

Prompt | Example
When? | June 5, 1998
Where? | Paris
What? | Tent
How? | Catalog
Who? | Young adult woman
Outcome? | Revenue of 6,000 FF
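The prompts map naturally onto a record in which the first five answers are identifier dimensions and the outcome is the variable. A small sketch, with the class and field names chosen only for this example:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SaleFact:
    # Identifier dimensions: they tag what is tracked.
    when: date
    where: str
    what: str
    how: str
    who: str
    # Variable dimension: what is tracked (revenue in French francs).
    revenue_ff: float


def identifiers(fact: SaleFact) -> tuple:
    """Return the identifier values that locate this fact in the cube."""
    return (fact.when, fact.where, fact.what, fact.how, fact.who)


# The example row from the prompts table.
sale = SaleFact(date(1998, 6, 5), "Paris", "Tent", "Catalog", "Young adult woman", 6000.0)
print(identifiers(sale), sale.revenue_ff)
```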

Variables and identifiers

Identifier: time (hour) | Variable: sales (dollars)
10:00 | 523
11:00 | 789
12:00 | 1,256
13:00 | 4,128
14:00 | 2,634

Identifier: hit | Variable: time (hh:mm:ss)
1 | 9:34:45
2 | 9:34:57
3 | 9:36:12
4 | 9:41:56

Analysis and variable type

Variable dimension | Identifier dimension: Continuous | Identifier dimension: Nominal or ordinal
Continuous | Regression and curve fitting (e.g., sales by quarter) | Analysis of variance (e.g., sales by store)
Nominal or ordinal | Logistic regression (e.g., customer response (yes or no) to the level of advertising) | Contingency table analysis (e.g., number of sales by region)
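The matrix amounts to a lookup from the pair (variable type, identifier type) to an analysis technique. A minimal sketch of that lookup, with the function name and error handling added for illustration:

```python
# Keys are (variable dimension type, identifier dimension type).
TECHNIQUES = {
    ("continuous", "continuous"): "regression and curve fitting",
    ("continuous", "nominal or ordinal"): "analysis of variance",
    ("nominal or ordinal", "continuous"): "logistic regression",
    ("nominal or ordinal", "nominal or ordinal"): "contingency table analysis",
}


def choose_technique(variable_type: str, identifier_type: str) -> str:
    """Return the technique the matrix suggests for a pair of dimension types."""
    key = (variable_type.strip().lower(), identifier_type.strip().lower())
    if key not in TECHNIQUES:
        raise ValueError(f"unknown dimension types: {key}")
    return TECHNIQUES[key]


# Sales (continuous) by store (nominal).
print(choose_technique("Continuous", "Nominal or ordinal"))
```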

Data mining

• The search for relationships and patterns
• Applications
  • Database marketing
  • Predicting bad loans
  • Detecting flaws in VLSI chips
  • Identifying quasars

Data mining functions

• Associations
  • 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta
• Sequential patterns
  • 32 percent of female customers who order a red jacket within six months buy a gray skirt
• Classifying
  • Frequent customers as those with incomes about $50,000 and having two or more children
• Clustering
  • Market segmentation
• Predicting
  • Predict the revenue value of a new customer based on that person’s demographic variables
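An association such as "85 percent of customers who buy a certain brand of wine also buy a certain type of pasta" is usually expressed as the confidence of a rule. The sketch below computes support and confidence over a handful of hypothetical baskets; the data and function names are invented for the example.

```python
def support(baskets: list[set], items: set) -> float:
    """Fraction of all baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets: list[set], antecedent: set, consequent: set) -> float:
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    with_antecedent = [b for b in baskets if antecedent <= b]
    if not with_antecedent:
        return 0.0
    both = antecedent | consequent
    return sum(both <= b for b in with_antecedent) / len(with_antecedent)


# Hypothetical market baskets.
baskets = [
    {"wine", "pasta", "bread"},
    {"wine", "pasta"},
    {"wine", "cheese"},
    {"pasta", "bread"},
    {"wine", "pasta", "cheese"},
]

print(confidence(baskets, {"wine"}, {"pasta"}))  # share of wine buyers who also buy pasta
print(support(baskets, {"wine", "pasta"}))       # share of all baskets with both items
```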

Data mining technologies

• Decision trees
• Genetic algorithms
• K-nearest neighbor method
• Neural networks
• Data visualization
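Of the technologies listed, the k-nearest neighbor method is the simplest to show in code: a new case takes the majority label of the k closest known cases. This is a bare-bones sketch with hypothetical customer data; it uses unscaled Euclidean distance, which a real application would likely replace with normalized features.

```python
import math
from collections import Counter


def knn_classify(train: list[tuple], query: tuple, k: int = 3) -> str:
    """Label `query` with the majority label among its k nearest training points."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical customers: (income in $1000s, number of children) -> label.
train = [
    ((62, 2), "frequent"),
    ((58, 3), "frequent"),
    ((75, 2), "frequent"),
    ((30, 0), "occasional"),
    ((41, 1), "occasional"),
    ((35, 2), "occasional"),
]

print(knn_classify(train, (60, 2)))
print(knn_classify(train, (33, 1)))
```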

SQL-99 and OLAP

• SQL can be tedious and inefficient
• The following questions require four queries
  • Find the total revenue
  • Report revenue by location
  • Report revenue by channel
  • Report revenue by location and channel
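To make the "four queries" point concrete, the sketch below runs them against an in-memory SQLite table from Python. The `exped` table name and its `location`, `channel`, and `revenue` columns come from the queries in this section; the rows are a stand-in, one per location and channel, loaded with the totals reported in the ROLLUP output, since the original table's detail rows are not shown.

```python
import sqlite3

# Stand-in rows for exped: (location, channel, revenue), one per combination.
ROWS = [
    ("London", "Catalog", 50310), ("London", "Store", 151015), ("London", "Web", 13009),
    ("New York", "Catalog", 8712), ("New York", "Store", 28060), ("New York", "Web", 2351),
    ("Paris", "Catalog", 32166), ("Paris", "Store", 104083), ("Paris", "Web", 7054),
    ("Sydney", "Catalog", 5471), ("Sydney", "Store", 21769), ("Sydney", "Web", 2749),
    ("Tokyo", "Catalog", 12103), ("Tokyo", "Store", 42610), ("Tokyo", "Web", 2003),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE exped (location TEXT, channel TEXT, revenue INTEGER)")
conn.executemany("INSERT INTO exped VALUES (?, ?, ?)", ROWS)

# One query per question; each scans the table separately.
QUERIES = {
    "total": "SELECT SUM(revenue) FROM exped",
    "by_location": "SELECT location, SUM(revenue) FROM exped GROUP BY location ORDER BY location",
    "by_channel": "SELECT channel, SUM(revenue) FROM exped GROUP BY channel ORDER BY channel",
    "by_location_and_channel": (
        "SELECT location, channel, SUM(revenue) FROM exped "
        "GROUP BY location, channel ORDER BY location, channel"
    ),
}

results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
for name, rows in results.items():
    print(name, rows)
```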

SQL-99 extensions

• GROUP BY extended with
  • GROUPING SETS
  • ROLLUP
  • CUBE

GROUPING SETS

SELECT location, channel,
  DECIMAL(SUM(revenue),9)
FROM exped
GROUP BY GROUPING SETS (location, channel);

GROUPING SETS

Location | Channel | Revenue
null | Catalog | 108762
null | Store | 347537
null | Web | 27166
London | null | 214334
New York | null | 39123
Paris | null | 143303
Sydney | null | 29989
Tokyo | null | 56716

ROLLUP

SELECT location, channel,
  DECIMAL(SUM(revenue),9)
FROM exped
GROUP BY ROLLUP (location, channel);

ROLLUP

Location | Channel | Revenue
null | null | 483465
London | null | 214334
New York | null | 39123
Paris | null | 143303
Sydney | null | 29989
Tokyo | null | 56716
London | Catalog | 50310
London | Store | 151015
London | Web | 13009
New York | Catalog | 8712
New York | Store | 28060
New York | Web | 2351
Paris | Catalog | 32166
Paris | Store | 104083
Paris | Web | 7054
Sydney | Catalog | 5471
Sydney | Store | 21769
Sydney | Web | 2749
Tokyo | Catalog | 12103
Tokyo | Store | 42610
Tokyo | Web | 2003

CUBE

SELECT location, channel,
  DECIMAL(SUM(revenue),9)
FROM exped
GROUP BY CUBE (location, channel);

Location | Channel | Revenue
null | Catalog | 108762
null | Store | 347537
null | Web | 27166
null | null | 483465
London | null | 214334
New York | null | 39123
Paris | null | 143303
Sydney | null | 29989
Tokyo | null | 56716
London | Catalog | 50310
London | Store | 151015
London | Web | 13009
New York | Catalog | 8712
New York | Store | 28060
New York | Web | 2351
Paris | Catalog | 32166
Paris | Store | 104083
Paris | Web | 7054
Sydney | Catalog | 5471
Sydney | Store | 21769
Sydney | Web | 2749
Tokyo | Catalog | 12103
Tokyo | Store | 42610
Tokyo | Web | 2003
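The relationship between the three extensions can be seen by emulating them in plain Python: GROUPING SETS aggregates once per listed set of columns, ROLLUP uses the sets (location, channel), (location), and (), and CUBE uses every subset of the columns. This is a sketch of the semantics, not of how a DBMS implements them; the stand-in rows reuse the per-location, per-channel totals from the output above, and `None` plays the role of null.

```python
from collections import defaultdict
from itertools import combinations

COLUMNS = ("location", "channel")

# Stand-in rows for exped: (location, channel, revenue).
ROWS = [
    ("London", "Catalog", 50310), ("London", "Store", 151015), ("London", "Web", 13009),
    ("New York", "Catalog", 8712), ("New York", "Store", 28060), ("New York", "Web", 2351),
    ("Paris", "Catalog", 32166), ("Paris", "Store", 104083), ("Paris", "Web", 7054),
    ("Sydney", "Catalog", 5471), ("Sydney", "Store", 21769), ("Sydney", "Web", 2749),
    ("Tokyo", "Catalog", 12103), ("Tokyo", "Store", 42610), ("Tokyo", "Web", 2003),
]


def grouping_sets(rows, sets):
    """Sum revenue once per grouping set; columns outside the set become None."""
    out = []
    for cols in sets:
        totals = defaultdict(int)
        for location, channel, revenue in rows:
            values = {"location": location, "channel": channel}
            key = tuple(values[c] if c in cols else None for c in COLUMNS)
            totals[key] += revenue
        out.extend((*key, total) for key, total in totals.items())
    return out


def rollup(rows):
    """ROLLUP (location, channel): drop columns from the right, down to the grand total."""
    return grouping_sets(rows, [COLUMNS[:n] for n in range(len(COLUMNS), -1, -1)])


def cube(rows):
    """CUBE (location, channel): aggregate over every subset of the columns."""
    sets = [c for n in range(len(COLUMNS), -1, -1) for c in combinations(COLUMNS, n)]
    return grouping_sets(rows, sets)


print(len(grouping_sets(ROWS, [("location",), ("channel",)])), len(rollup(ROWS)), len(cube(ROWS)))
```

The row counts match the three result tables: 8 for GROUPING SETS, 21 for ROLLUP, and 24 for CUBE, with CUBE adding the by-channel subtotals that ROLLUP omits.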

CUBE

SQL OLAP extensions

• Useful
• Not as powerful as MDDB tools
• Use CUBE as the default

Conclusion

• Data management is an evolving discipline
• Data managers have a dual responsibility
  • Manage data to be in business today
  • Manage data to be in business tomorrow
• Data managers now need to support organizational intelligence technologies
