Organizational intelligence technologies
Chapter 15
Organizational intelligence
Too many companies are data rich but information poor.
Organizational intelligence is the outcome of an organization’s efforts to collect, store, process, and interpret data from internal and external sources.
Intelligence is meant here in the sense of gathering and distributing information.
Transaction processing systems
A TPS handles common business tasks such as accounting, inventory, purchasing, and sales, and can generate huge volumes of data.
TPS data is usually highly fragmented across:
Different systems
Different database technologies
Different locations
Result: an underused intelligence system containing undetected key facts.
The data warehouse
A repository of organizational data. Its size can be measured in terabytes.
[Figure: the data warehouse, maintained by management tools and accessed through analysis tools]
Analysis tools: DSS, EIS, OLAP, SQL, data mining
Management tools: data extractor, transformation engine, cleanser, loader, scheduler, metadata manager
Managing the data warehouse
Extraction: Pulling data from existing systems.
Transformation: Standardizing data so that it follows consistent coding schemes.
Cleaning: Removing errors, inconsistencies, or redundancies.
Loading: Copying operational data into the data warehouse (archival, current, or ongoing).
Scheduling: Refreshing the warehouse.
Metadata: A data dictionary containing facts about the data in the warehouse.
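The management steps above can be sketched as a small Python pipeline. This is a minimal illustration, assuming two hypothetical source systems that code gender differently ('M'/'F' in one, 1/2 in the other); the record layouts, field names, and the `metadata` dictionary are invented for the example and do not correspond to any real warehouse product.

```python
from datetime import date

# Hypothetical source systems with inconsistent coding schemes (assumption).
SALES_SYSTEM = [
    {"cust": "C1", "gender": "M", "amount": "120.50"},
    {"cust": "C2", "gender": "F", "amount": "80.00"},
    {"cust": "C2", "gender": "F", "amount": "80.00"},  # redundant duplicate
]
CATALOG_SYSTEM = [
    {"customer_id": "C3", "sex": 2, "total": 45.0},
    {"customer_id": "C4", "sex": 9, "total": 60.0},    # invalid gender code
]

def extract():
    """Extraction: pull raw rows from each existing system, tagged by source."""
    return ([("sales", r) for r in SALES_SYSTEM]
            + [("catalog", r) for r in CATALOG_SYSTEM])

def transform(source, row):
    """Transformation: map each source layout onto one consistent coding scheme."""
    if source == "sales":
        return {"customer": row["cust"], "gender": row["gender"],
                "amount": float(row["amount"])}
    gender = {1: "M", 2: "F"}.get(row["sex"], "?")  # unknown codes become "?"
    return {"customer": row["customer_id"], "gender": gender,
            "amount": float(row["total"])}

def clean(rows):
    """Cleaning: drop rows with invalid codes and exact duplicates."""
    seen, result = set(), []
    for r in rows:
        key = (r["customer"], r["gender"], r["amount"])
        if r["gender"] == "?" or key in seen:
            continue
        seen.add(key)
        result.append(r)
    return result

def load(warehouse, rows, metadata):
    """Loading: append cleaned rows and record facts about them as metadata."""
    warehouse.extend(rows)
    metadata["row_count"] = len(warehouse)
    metadata["last_refresh"] = date.today().isoformat()
    metadata["columns"] = {"customer": "str", "gender": "M/F", "amount": "float"}

warehouse, metadata = [], {}
rows = clean([transform(src, r) for src, r in extract()])
load(warehouse, rows, metadata)
print(metadata["row_count"])  # 3
```

Scheduling is not modeled here; in practice a scheduler would rerun this pipeline periodically to refresh the warehouse, and the metadata would be updated on each run.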
Data warehouse technology
Warehouse architectures
Server architectures
DBMS
Warehouse architectures
Centralized
Federated
Tiered
Centralized data warehouse
[Figure: a corporate data warehouse on a mainframe holding corporate, financial, marketing, manufacturing, and distribution data, accessed by analysts through a server]
The centralized data warehouse gives processing efficiency and lowers support costs.
Federated data warehouse
[Figure: a corporate data warehouse on a mainframe alongside separate financial, marketing, manufacturing, and distribution databases, each used by analysts]
The data warehouse may appear as one logical structure, but to reduce response time it is physically dispersed across several related databases.
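To make the "one logical structure, several physical databases" idea concrete, here is a minimal Python sketch under stated assumptions: each physical database is simulated as an in-memory list of rows, and a hypothetical `FederatedWarehouse` class presents them to the analyst as a single queryable table. Real federated systems push queries down to each site over a network; this only illustrates the logical view. It assumes Python 3.9 or later for the built-in generic type hints.

```python
from typing import Callable

Row = dict

class FederatedWarehouse:
    """One logical warehouse backed by several physically separate stores (simulated)."""

    def __init__(self) -> None:
        self.sites: dict[str, list[Row]] = {}

    def attach(self, name: str, rows: list[Row]) -> None:
        # Each site keeps its own rows; nothing is copied into a central store.
        self.sites[name] = rows

    def query(self, predicate: Callable[[Row], bool]) -> list[Row]:
        # Apply the same filter at every site and merge the partial results,
        # so the analyst sees a single result set.
        results: list[Row] = []
        for rows in self.sites.values():
            results.extend(r for r in rows if predicate(r))
        return results

fw = FederatedWarehouse()
fw.attach("financial", [{"dept": "financial", "year": 1997, "cost": 500}])
fw.attach("marketing", [{"dept": "marketing", "year": 1997, "cost": 300},
                        {"dept": "marketing", "year": 1996, "cost": 250}])
rows_1997 = fw.query(lambda r: r["year"] == 1997)
print(sum(r["cost"] for r in rows_1997))  # 800
```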
Tiered data warehouse
[Figure: an analyst workstation, a local data mart, and the corporate data warehouse on a mainframe, with tiers labeled Tier 1 (highly summarized data), Tier 2 (summarized data), and Tier 3 (detailed data)]
A tiered architecture houses highly aggregated data on an analyst’s workstation, with more detailed summaries on a second server and the most detailed data on a third server.
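A minimal sketch of the three tiers in Python, assuming hypothetical detailed sales rows (the values are invented for illustration). Tier 3 keeps every transaction, Tier 2 is derived from it by summarizing per region and year, and Tier 1 is derived from Tier 2 by summarizing per year, the kind of small, highly aggregated result an analyst's workstation might hold.

```python
from collections import defaultdict

# Tier 3: detailed data (hypothetical transactions: region, year, amount).
tier3 = [
    ("North", 1996, 100), ("North", 1996, 50), ("South", 1996, 70),
    ("North", 1997, 120), ("South", 1997, 90), ("South", 1997, 30),
]

def summarize(rows, key_fn):
    """Sum the last field of each row, grouped by the key that key_fn extracts."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += row[-1]
    return dict(totals)

# Tier 2: summarized data (per region and year), e.g., on a second server.
tier2 = summarize(tier3, lambda r: (r[0], r[1]))
# Tier 1: highly summarized data (per year), built from Tier 2, not from the details.
tier1 = summarize([(*key, total) for key, total in tier2.items()], lambda r: r[1])

print(tier2[("North", 1996)], tier1[1997])  # 150 240
```

Building each tier from the one below it keeps the heavy scan of detailed data on the server that already holds it.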
Server architectures
Single processor
Symmetric multiprocessor
Massively parallel processor
Non-uniform memory access
Single processor
[Figure: processor, memory, and databases]
The simplest option: easy to manage, but with limited processing power and scalability.
Symmetric multiprocessor
[Figure: symmetric multiprocessing with processors, memory, and databases]
SMP has multiple processors sharing memory and disks. It is very scalable, but the memory bus can become congested. The OS must be designed for multiprocessing.
Massively parallel processor
[Figure: processors, memory, and databases]
MPP connects an array of processors, each with its own independent memory and disks. Applications must be designed to work in parallel (e.g., they need the parallel version of DB2 or Oracle).
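The requirement that MPP applications be designed to work in parallel can be illustrated with a shared-nothing aggregation sketch. It assumes each "node" owns a private partition of the rows: every node computes a partial result on its own data, and a coordinator merges the partials. Threads from Python's `concurrent.futures` stand in for independent processors; this simulates the query pattern only and says nothing about how DB2 or Oracle implement parallelism. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_nodes):
    """Round-robin rows into disjoint slices, one per node (its 'own disk')."""
    parts = [[] for _ in range(n_nodes)]
    for i, row in enumerate(rows):
        parts[i % n_nodes].append(row)
    return parts

def local_aggregate(part):
    """Work done on one node using only its local data: a (count, sum) pair."""
    return len(part), sum(r["amount"] for r in part)

def parallel_average(rows, n_nodes=4):
    parts = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, parts))
    # Coordinator: merge (count, sum) pairs; averaging the node averages would be wrong
    # whenever partitions differ in size.
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return total / count

sales = [{"customer": f"C{i}", "amount": float(i)} for i in range(1, 101)]
print(parallel_average(sales))  # 50.5
```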
Non-uniform memory access
[Figure: processors, memory, and databases]
NUMA joins multiple SMP nodes into a single, distributed memory pool with a single OS. The OS must be designed to work with NUMA, which is not widely used in commercial environments.
DBMS choices
DBMS types: relational, super-relational, multidimensional (logical), multidimensional (physical), object-relational
Features/functions compared: normalized data structures, abstract data types, parallelism, multidimensional structures, drill-down, rotation, data-dependent operations
[Table: support for each feature/function by DBMS type]
MDDB: Data in a hypercube
[Figure: a sales hypercube with product (Red blob, Blue blob), year (1996, 1997), and region (North, South) dimensions]
Whereas the relational world is two-dimensional, a multidimensional database (MDDB) can represent multiple dimensions.
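A minimal Python sketch of the hypercube in the figure, assuming the three dimensions are product, year, and region. The sales figures are invented for illustration. It shows a slice (fixing one dimension), a roll-up that sums out a dimension, a drill-down that expands a rolled-up cell back into its detail, and a rotation that swaps which dimensions form the rows and columns of a 2-D view.

```python
from itertools import product

PRODUCTS = ["Red blob", "Blue blob"]
YEARS = [1996, 1997]
REGIONS = ["North", "South"]

# Hypothetical cell values: cube[(product, year, region)] = sales.
cube = dict(zip(product(PRODUCTS, YEARS, REGIONS), [10, 20, 30, 40, 50, 60, 70, 80]))

def slice_year(year):
    """Slice: fix the year, leaving a 2-D (product, region) view."""
    return {(p, r): cube[(p, year, r)] for p in PRODUCTS for r in REGIONS}

def roll_up_region():
    """Roll-up: sum out the region dimension, giving (product, year) totals."""
    return {(p, y): sum(cube[(p, y, r)] for r in REGIONS)
            for p in PRODUCTS for y in YEARS}

def drill_down(p, y):
    """Drill-down: expand one rolled-up (product, year) cell into per-region detail."""
    return {r: cube[(p, y, r)] for r in REGIONS}

def rotate(view):
    """Rotation (pivot): swap the row and column dimensions of a 2-D view."""
    return {(col, row): value for (row, col), value in view.items()}

view_1997 = slice_year(1997)
print(view_1997[("Blue blob", "South")])          # 80
print(roll_up_region()[("Red blob", 1996)])       # 30
print(drill_down("Red blob", 1996))               # {'North': 10, 'South': 20}
print(rotate(view_1997)[("South", "Blue blob")])  # 80
```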
Analysis tools
Verification vs. discovery
OLAP
Data mining
Verification and discovery
Verification | Discovery
What is the average sale for in-store and catalog customers? | What is the best predictor of sales?
What is the average high school GPA of students who graduate from college compared to those who do not? | What are the best predictors of college graduation?
The verification approach to data analysis is driven by a hypothesis about some relationship in the data.
The discovery approach to data analysis sifts through the data in search of frequently occurring patterns and trends.
OLAP
The relational model was not designed for data synthesis, analysis, and consolidation.
This is the role of other special-purpose software, such as OLAP.
OLAP tools give fast, flexible, shared access to analytical information.
OLAP tools support the verification approach.
TPS versus OLAP
TPS | OLAP
Optimize for transaction volume | Optimize for data analysis
Process a few records at a time | Process summarized data
Real-time update as transactions occur | Batch update (e.g., daily)
Based on tables | Based on hypercubes
Raw data | Aggregated data
SQL is widely used | No common query language
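The contrast in the table can be sketched with Python's built-in `sqlite3` module. The assumption is that one table, `sales`, stands in for a TPS that records each sale as it occurs, and a second hypothetical table, `daily_sales`, stands in for the aggregated data an OLAP tool would read, rebuilt in a batch step (e.g., nightly). The function names are illustrative; SQLite is not an OLAP server, so this shows only the data flow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, day TEXT, region TEXT, amount REAL)")
conn.execute("CREATE TABLE daily_sales (day TEXT, region TEXT, total REAL, n INTEGER)")

def record_sale(day, region, amount):
    """TPS side: insert one transaction as soon as it happens (real-time update)."""
    with conn:  # commits the transaction on success
        conn.execute("INSERT INTO sales (day, region, amount) VALUES (?, ?, ?)",
                     (day, region, amount))

def nightly_batch():
    """OLAP side: rebuild the aggregated table from raw data in one batch."""
    with conn:
        conn.execute("DELETE FROM daily_sales")
        conn.execute("""INSERT INTO daily_sales (day, region, total, n)
                        SELECT day, region, SUM(amount), COUNT(*)
                        FROM sales GROUP BY day, region""")

for sale in [("1997-03-01", "North", 20.0), ("1997-03-01", "North", 30.0),
             ("1997-03-01", "South", 15.0)]:
    record_sale(*sale)
nightly_batch()
print(conn.execute("SELECT region, total, n FROM daily_sales ORDER BY region").fetchall())
# [('North', 50.0, 2), ('South', 15.0, 1)]
```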
Data mining
Data mining is the search for relationships and patterns that exist in large databases but are hidden in the vast amounts of data.
Multiple applications:
Database marketing
Predicting bad loans
Detecting flaws in VLSI chips
Identifying quasars
Data mining tools support the discovery approach.
Data mining functions
Associations: 85 percent of customers who buy a certain brand of wine also buy a certain type of pasta.
Sequential patterns: 32 percent of female customers who order a red jacket within six months buy a gray skirt.
Classifying: Identifying the attributes that discriminate between different groups.
Clustering: Dividing a dataset into mutually exclusive groups.
Predicting: Predicting the revenue value of a new customer based on that person’s demographic variables.
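Two of these functions are sketched below in plain Python on invented data. The association part computes the confidence of a rule such as "wine => pasta", that is, the share of baskets containing wine that also contain pasta, which is the kind of figure behind the 85 percent example. The clustering part is a basic one-dimensional k-means with fixed starting centroids that splits customers into mutually exclusive groups by annual spend. Both are illustrative assumptions, not the algorithms of any particular data mining product.

```python
# Associations: confidence of the rule "antecedent => consequent".
def confidence(baskets, antecedent, consequent):
    with_antecedent = [b for b in baskets if antecedent <= b]  # subset test
    if not with_antecedent:
        return 0.0
    hits = sum(1 for b in with_antecedent if consequent <= b)
    return hits / len(with_antecedent)

baskets = [
    {"wine", "pasta", "bread"}, {"wine", "pasta"}, {"wine", "cheese"},
    {"pasta", "bread"}, {"wine", "pasta", "cheese"},
]
print(confidence(baskets, {"wine"}, {"pasta"}))  # 0.75

# Clustering: 1-D k-means; every value ends up in exactly one cluster.
def kmeans_1d(values, centroids, iterations=10):
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to its cluster mean (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters

spend = [120, 150, 130, 900, 950, 1000]
print(kmeans_1d(spend, centroids=[0.0, 500.0]))  # [[120, 150, 130], [900, 950, 1000]]
```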
Conclusion
Data management is an evolving discipline.
Data managers have a dual responsibility:
Manage data to be in business today
Manage data to be in business tomorrow
Data managers now need to support organizational intelligence technologies.