Introduction to Business Intelligence and Data Warehousing
Enterprise Information Systems
Umberto Nanni
Master Degree Programme in Management Engineering
Dipartimento di Ingegneria Informatica, Automatica e Gestionale "Antonio Ruberti" (Department of Computer, Control and Management Engineering)
Business Intelligence Architecture
[Figure: BI architecture. Operational systems (ERP systems, service systems, internet/extranet) and external data sources feed, through ETL systems, the data warehouse and its data marts (Datamart-1, Datamart-2, Datamart-3); reporting, OLAP and data mining serve applications such as KPI, DSS, MKT, HR, CRM; goals and results link the management system with the operational system.]
What is Data Warehousing
A collection of methods, technologies and tools that assist the "knowledge worker" (manager, analyst) in conducting data analysis aimed at supporting decision-making and/or improving the management of information assets.
What is a Data Warehouse
A data warehouse is a collection of data
• integrated (even beyond the boundaries of the organization)
• consistent (despite their heterogeneous origin)
• focused (on a defined area of interest)
• historical (covering a consistent timeframe)
• permanent (never delete your data!)
Purpose of a Data Warehouse
A data warehouse helps (or enables) you:
• to make decisions
• to identify and interpret phenomena
• to make predictions about the future
• to control a complex system
Value and quantity of information
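[Figure: value vs. quantity of information. Primary information sources (the DB: competitors, marketing, prices, sales, logistics) are largest in quantity; reports and selected information follow; strategic information ($$$$) is smallest in quantity and highest in value.]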
OLTP & OLAP
OLTP (On-Line Transaction Processing)
– realm of (read and/or write) transactions, recovery, consistency
– many, fast and frequent operations
– high level of concurrency
– access to a small amount of data
– on-the-fly data update
OLAP (On-Line Analytical Processing)
– read-only
– few operations
– low level of concurrency
– access to huge amounts of data
– historical but essentially static data
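The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed names: the `sales` table and its rows are hypothetical, and a real system would keep the two workloads on separate databases. The OLTP operation is a short transaction that updates one row by key; the OLAP operation is a read-only query that scans and aggregates many rows.

```python
import sqlite3

# Hypothetical toy table, used here by both kinds of workload.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, store TEXT, product TEXT, "
    "year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Rome", "pasta", 2013, 10.0),
        (2, "Rome", "wine", 2013, 25.0),
        (3, "Milan", "pasta", 2013, 12.0),
        (4, "Milan", "wine", 2014, 30.0),
        (5, "Rome", "pasta", 2014, 11.0),
    ],
)
conn.commit()

# OLTP-style: a short read-write transaction touching a single row by key.
with conn:  # commits on success, rolls back if an exception is raised
    conn.execute("UPDATE sales SET amount = amount + 1 WHERE id = ?", (3,))

# OLAP-style: a read-only query that scans and aggregates many rows.
totals = conn.execute(
    "SELECT store, year, SUM(amount) FROM sales "
    "GROUP BY store, year ORDER BY store, year"
).fetchall()
print(totals)
# [('Milan', 2013, 13.0), ('Milan', 2014, 30.0), ('Rome', 2013, 35.0), ('Rome', 2014, 11.0)]
```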
Separation between Operational Database & Data Warehouse
• different computational load
• different needs:
– DB: dynamic data, asynchronous updates
– DW: static data, periodic updates
• integration with business activity:
– DB: supporting operations (focused, timely)
– DW: supporting decisions (descriptive, historical)
• data collection:
– DB: minimal
– DW: maximal
Two issues with different perspectives
• Data redundancy
– OLTP (DB): to be avoided, since it leads to inconsistency and/or inefficiency on updates
– OLAP (DW): redundancy avoids recomputation and shortens response time
• Indexing
– OLTP (DB): good for searching, bad for updating, so a trade-off is needed
– OLAP (DW): the more, the better
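On the DW side, both choices can be sketched in SQLite: a deliberately redundant summary table stores pre-aggregated totals, and an extra index speeds up lookups on it. This is a hypothetical example (table and index names are invented for illustration); in an operational database the same redundancy would have to be kept consistent on every update, and every index would slow down writes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("Rome", 2013, 10.0), ("Rome", 2013, 25.0), ("Milan", 2013, 12.0),
     ("Milan", 2014, 30.0), ("Rome", 2014, 11.0)],
)

# Deliberate redundancy: precomputed totals, so analytical queries read a few
# summary rows instead of re-scanning and re-aggregating the detailed data.
conn.execute(
    "CREATE TABLE sales_by_store_year AS "
    "SELECT store, year, SUM(amount) AS total FROM sales GROUP BY store, year"
)

# Extra index on the summary table: cheap in a (mostly read-only) DW.
conn.execute("CREATE INDEX idx_summary_store ON sales_by_store_year (store)")

rome = conn.execute(
    "SELECT year, total FROM sales_by_store_year WHERE store = ? ORDER BY year",
    ("Rome",),
).fetchall()
print(rome)  # [(2013, 35.0), (2014, 11.0)]

# Inspect how SQLite plans the lookup (the detail text varies across versions).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT year, total FROM sales_by_store_year WHERE store = ?",
    ("Rome",),
).fetchall()
print([row[-1] for row in plan])
```

The summary table is only correct as of the moment it was built, which is acceptable in a DW refreshed periodically but not in an operational DB with continuous updates.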
Some Data Warehouse Systems
• Oracle
• IBM InfoSphere
• Microsoft SQL Server 2014 – Analysis Services
• Sybase IQ
• Hyperion (bought by Oracle)
• Teradata (formerly a division of NCR)
• Netezza – Cognos (bought by IBM)
• Business Objects (bought by SAP)
• ...
A comparison by Gartner
Mark A. Beyer, Roxane Edjlali, Magic Quadrant for Data Warehouse Database Management Systems, Gartner RAS Core Research Note G00255860, 7 March 2014
[Figure: 2014 Magic Quadrant; vendors shown: SAP, IBM, Microsoft, 1010data, Amazon Web Services, HP, Kognitio, Pivotal (Greenplum), Cloudera, Exasol, Actian, Infobright, InfiniDB (formerly Calpont), MarkLogic, Teradata, Oracle.]
A comparison by Gartner (some years ago)
Donald Feinberg, Mark A. Beyer, Magic Quadrant for Data Warehouse Database Management Systems, Gartner RAS Core Research Note G00173535, 28 January 2010
[Figure: 2010 Magic Quadrant]
Architectures for Data Warehousing: issues
• separating OLTP & OLAP
• scalability
• extensibility
• security
• administrability
The architecture for Data Warehousing:
‒ is determined by design choices
‒ is determined by, and in turn determines, the choice of a software system
‒ determines the cost and enables future integration (quantitative and/or qualitative)
‒ affects the cost of data processing
Data Mart
A collection of data focused on a particular user profile or on a particular analysis target
Alternatives:
1. dependent Data Mart: a subset and/or an aggregation of the data in the primary DW
→ DM extracted from a DW
2. independent Data Mart: a subset and/or an aggregation of the data in the operational DB
→ $DW = \bigcup_i DM_i$, that is, the DW is the union of the DMs
3. hybrid solution, combining 1 and 2
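The first two alternatives differ only in where each Data Mart is fed from. A minimal sketch with plain Python lists, assuming hypothetical operational records and a department attribute used to select each mart's data:

```python
from collections import defaultdict

# Hypothetical operational records (stand-in for the operational DB).
operational = [
    {"dept": "sales", "region": "North", "amount": 100},
    {"dept": "sales", "region": "South", "amount": 150},
    {"dept": "hr", "region": "North", "amount": 40},
    {"dept": "hr", "region": "South", "amount": 60},
]


def aggregate(records, key):
    """Sum 'amount' grouped by the given attribute."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec[key]] += rec["amount"]
    return dict(totals)


# 1. Dependent DM: build the primary DW first, then derive the DM from it
#    (here a subset of the DW, aggregated by region).
dw = list(operational)  # stand-in for the integrated DW
sales_dm_dependent = aggregate([r for r in dw if r["dept"] == "sales"], "region")

# 2. Independent DMs: each DM is fed directly from operational data,
#    and the DW is simply the union of the DMs (disjoint subsets here,
#    so list concatenation is the union).
sales_dm = [r for r in operational if r["dept"] == "sales"]
hr_dm = [r for r in operational if r["dept"] == "hr"]
dw_as_union = sales_dm + hr_dm

print(sales_dm_dependent)  # {'North': 100, 'South': 150}
print(len(dw_as_union))  # 4
```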
DW architecture: 1 Level
• there is only one data level: the operational DB (or a copy of it)
• the DW is virtual (no OLTP-OLAP separation)
• data coincide with the operational DB
• integration with other sources is difficult
[Figure: sources (operational DB or a copy of it, external sources; data level 1) → middleware (warehouse layer) → analysis]
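A virtual (1-level) warehouse can be sketched as a SQL view defined directly over operational tables: nothing is copied, so every analytical query runs on live operational data. The `orders` table and the `dw_customer_totals` view below are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational table, updated by day-to-day transactions (hypothetical schema).
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Acme", 50.0), ("Acme", 70.0), ("Beta", 20.0)],
)

# The "warehouse" is only a view: no data is materialized.
conn.execute(
    "CREATE VIEW dw_customer_totals AS "
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer"
)
before = conn.execute("SELECT * FROM dw_customer_totals ORDER BY customer").fetchall()
print(before)  # [('Acme', 120.0), ('Beta', 20.0)]

# An operational update is immediately visible to the analysis:
# OLTP and OLAP share the same data and the same computational load.
conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("Beta", 5.0))
after = conn.execute("SELECT * FROM dw_customer_totals ORDER BY customer").fetchall()
print(after)  # [('Acme', 120.0), ('Beta', 25.0)]
```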
DW architecture: 2 Levels – dependent DMs
• data sources complemented with external sources
• running on a dedicated software platform
• ETL: Extraction, Transformation, Loading
• materialization of the DW
• materialization of Data Marts
[Figure: sources (operational DB, external DB; data level 1) → feeding via ETL → warehouse (DW and Data Marts; data level 2) → analysis]
DW architecture: 2 Levels – independent DMs
• Data Marts are materialized by feeding
• DW = union of DMs
[Figure: sources (operational DB, external DB; data level 1) → feeding via ETL → warehouse (Data Marts only; data level 2) → analysis]
DW architecture: 3 Levels
• a level of "reconciled" data (operational data store) is introduced
• separation of ETL activities into two phases:
1. extraction / transformation
2. loading
[Figure: sources (operational DB, external DB; data level 1) → ET(L) → reconciled data (data level 2) → loading → warehouse (DW and Data Marts; data level 3) → analysis]
ETL: Extraction, Transformation, Loading
• extraction
• cleaning - validation - filtering
• transformation
• loading
[Figure: operational data and external data → reconciled data → data warehouse]
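The four ETL steps can be chained as plain functions. This is a minimal sketch under stated assumptions: the sources, field names and rules (drop duplicates and records without an amount, normalize dates to ISO format) are hypothetical, and the "warehouse" is just an in-memory list.

```python
from datetime import datetime

# Hypothetical sources: an operational table and an external feed that use
# different date formats; amounts arrive as strings.
operational = [
    {"customer": "ACME srl", "date": "2014-03-07", "amount": "120.50"},
    {"customer": "ACME srl", "date": "2014-03-07", "amount": "120.50"},  # duplicate
    {"customer": "Beta spa", "date": "2014-03-08", "amount": None},      # missing amount
]
external = [{"customer": "Gamma srl", "date": "08/03/2014", "amount": "75"}]


def extract(*sources):
    """Extraction: copy the raw records out of every source."""
    return [dict(rec) for src in sources for rec in src]


def clean(records):
    """Cleaning / validation / filtering: drop exact duplicates and records without an amount."""
    seen, out = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if rec["amount"] is None or key in seen:
            continue
        seen.add(key)
        out.append(rec)
    return out


def transform(records):
    """Transformation: bring every record to one format (ISO dates, float amounts)."""
    out = []
    for rec in records:
        fmt = "%Y-%m-%d" if "-" in rec["date"] else "%d/%m/%Y"
        out.append({
            "customer": rec["customer"],
            "date": datetime.strptime(rec["date"], fmt).date().isoformat(),
            "amount": float(rec["amount"]),
        })
    return out


def load(warehouse, records):
    """Loading: append the reconciled records to the (in-memory) warehouse."""
    warehouse.extend(records)
    return warehouse


reconciled = transform(clean(extract(operational, external)))
dw = load([], reconciled)
print(dw)
# [{'customer': 'ACME srl', 'date': '2014-03-07', 'amount': 120.5},
#  {'customer': 'Gamma srl', 'date': '2014-03-08', 'amount': 75.0}]
```

Keeping `reconciled` as a separate intermediate result mirrors the reconciled-data level: extraction/transformation and loading can then run as two independent phases.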
Extraction
• initial extraction:
– targeted at the creation of the DW
• further extractions:
– static (replaces the whole DW)
– incremental
• log (journal)
• timestamp
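Timestamp-based incremental extraction can be sketched in SQLite, assuming (hypothetically) that each operational row carries a `last_modified` column and that the previous run saved a watermark. Log-based extraction reads the DBMS journal instead and is product-specific, so it is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table with a last-modification timestamp per row.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, last_modified TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2014-03-01T09:00:00"),
    (2, 20.0, "2014-03-05T12:30:00"),
    (3, 30.0, "2014-03-07T08:15:00"),
])

# Static extraction: read everything (used to rebuild the whole DW).
full = conn.execute("SELECT id, amount, last_modified FROM orders ORDER BY id").fetchall()


def extract_incremental(conn, since):
    """Incremental extraction: only rows changed after the previous run.

    ISO-8601 strings in one fixed format compare correctly as text.
    """
    return conn.execute(
        "SELECT id, amount, last_modified FROM orders "
        "WHERE last_modified > ? ORDER BY id",
        (since,),
    ).fetchall()


last_run = "2014-03-04T00:00:00"  # hypothetical watermark saved by the previous run
delta = extract_incremental(conn, last_run)
print(delta)  # [(2, 20.0, '2014-03-05T12:30:00'), (3, 30.0, '2014-03-07T08:15:00')]
new_watermark = max(row[2] for row in delta)  # to be stored for the next run
```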
Cleaning
• changing VALUES
• duplicates
• inconsistencies
– domain violation
– functional dependency violation
• null values
• misuse of fields
• spelling errors
• inconsistent (non-homogeneous) abbreviations
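Cleaning changes *values*. The sketch below handles a few of the listed problems on hypothetical customer records: duplicates (same `id`), a domain violation (age outside an assumed 0-120 range), a violation of an assumed functional dependency zip → city, a null city filled from that dependency, and inconsistent naming ("Roma" vs "Rome"). The lookup tables are invented for illustration.

```python
# Hypothetical customer records extracted from two sources.
records = [
    {"id": 1, "name": "Mario Rossi", "city": "Roma", "zip": "00185", "age": 34},
    {"id": 1, "name": "Mario Rossi", "city": "Roma", "zip": "00185", "age": 34},  # duplicate
    {"id": 2, "name": "Anna Bianchi", "city": "Rome", "zip": "00185", "age": 230},  # bad age
    {"id": 3, "name": "Luca Verdi", "city": None, "zip": "20121", "age": 51},     # null city
]

# Hypothetical lookup tables used to homogenize names and check zip -> city.
CITY_SYNONYMS = {"Roma": "Rome", "Milano": "Milan"}
ZIP_TO_CITY = {"00185": "Rome", "20121": "Milan"}


def clean(records):
    out, seen, issues = [], set(), []
    for rec in records:
        rec = dict(rec)
        if rec["id"] in seen:                          # duplicates
            issues.append((rec["id"], "duplicate"))
            continue
        seen.add(rec["id"])
        if not 0 <= rec["age"] <= 120:                # domain violation
            issues.append((rec["id"], "age out of domain"))
            rec["age"] = None
        rec["city"] = CITY_SYNONYMS.get(rec["city"], rec["city"])  # naming / spelling
        if rec["city"] is None:                        # null value: use zip -> city
            rec["city"] = ZIP_TO_CITY.get(rec["zip"])
        elif ZIP_TO_CITY.get(rec["zip"], rec["city"]) != rec["city"]:
            issues.append((rec["id"], "zip/city mismatch"))  # dependency violation
        out.append(rec)
    return out, issues


cleaned, issues = clean(records)
print(issues)  # [(1, 'duplicate'), (2, 'age out of domain')]
print([r["city"] for r in cleaned])  # ['Rome', 'Rome', 'Milan']
```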
Transformation
• changing FORMATS:
– misalignment of formats
– field overloading
– non-homogeneous coding
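Transformation changes *formats* rather than values. A minimal sketch of the three cases, assuming hypothetical source conventions: dates as DD/MM/YYYY or YYYY-MM-DD and amounts with an Italian-style decimal comma (format misalignment), a code such as `IT-2014-0042` packing country, year and serial into one field (field overloading), and gender coded as `1`/`2`, `M`/`F` or words (non-homogeneous coding).

```python
from datetime import datetime

# Hypothetical mapping between the codings used by different sources.
GENDER_CODES = {"M": "M", "F": "F", "1": "M", "2": "F", "male": "M", "female": "F"}


def to_iso_date(value):
    """Format misalignment: accept DD/MM/YYYY or YYYY-MM-DD, emit YYYY-MM-DD."""
    for fmt in ("%d/%m/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {value!r}")


def to_float(value):
    """Format misalignment: '1.234,50' (decimal comma) or '1234.50' -> 1234.5."""
    if "," in value:
        value = value.replace(".", "").replace(",", ".")
    return float(value)


def split_code(code):
    """Field overloading: split a hypothetical 'IT-2014-0042' code into its parts."""
    country, year, serial = code.split("-")
    return {"country": country, "year": int(year), "serial": int(serial)}


def transform(rec):
    out = {
        "date": to_iso_date(rec["date"]),
        "amount": to_float(rec["amount"]),
        "gender": GENDER_CODES[rec["gender"]],  # non-homogeneous coding
    }
    out.update(split_code(rec["code"]))
    return out


result = transform({"date": "07/03/2014", "amount": "1.234,50",
                    "gender": "2", "code": "IT-2014-0042"})
print(result)
# {'date': '2014-03-07', 'amount': 1234.5, 'gender': 'F', 'country': 'IT', 'year': 2014, 'serial': 42}
```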
Loading
• Refresh:
full reload of the whole DW from scratch (ex novo)
• Update:
differential updates (only the changes since the last load are applied)
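The two loading policies can be contrasted on a toy warehouse modeled as a dictionary keyed by (date, store). This is a hypothetical sketch: a real DW would load into tables, but the difference is the same, since refresh rebuilds everything while update applies only the delta.

```python
def refresh(dw, full_snapshot):
    """Refresh: discard the current content and reload the whole DW."""
    dw.clear()
    dw.update(full_snapshot)


def update(dw, delta):
    """Update: apply only the new or changed facts since the last load."""
    for key, value in delta.items():
        dw[key] = value


dw = {}
refresh(dw, {("2014-03-06", "Rome"): 100.0, ("2014-03-06", "Milan"): 80.0})
update(dw, {("2014-03-07", "Rome"): 120.0})
print(sorted(dw.items()))
# [(('2014-03-06', 'Milan'), 80.0), (('2014-03-06', 'Rome'), 100.0), (('2014-03-07', 'Rome'), 120.0)]
```

Refresh is simpler but its cost grows with the size of the whole DW; update is cheaper per run but requires the extraction step to identify the changes (e.g. via log or timestamp).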
Metadata
• internal metadata
– concerning the administration of the DW (e.g., sources, transformations, schemas, users, etc.)
• external metadata
– of interest to users (e.g., measurement units, possible combinations)
• Standards
• CWM - Common Warehouse Metamodel (OMG), based on:
– UML (Unified Modeling Language)
– XML (eXtensible Markup Language)
– XMI (XML Metadata Interchange)
OMG = Object Management Group, also responsible for CORBA (Common Object Request Broker Architecture), UML (Unified Modeling Language) and MDA (Model-Driven Architecture)