Metadata in Business Intelligence
DESCRIPTION
This presentation is part of my work for the course 'Heterogeneous and Distributed Information Systems' at TU Berlin within the IT4BI (Information Technology for Business Intelligence) master programme.
TRANSCRIPT
Metadata in Business Intelligence
Jose Luis Lopez Pino
Database Systems and Information Management
Technische Universität Berlin
January 28, 2014
v1.2
Table of Contents
1 Metadata
   What is Metadata?
   Metadata for Information Systems
2 Business Intelligence
   What is Business Intelligence?
   Business Intelligence in a Nutshell
   The Dimensional Fact Model
   Data Warehousing
3 Metadata in BI
   Motivation
   Classification
   The Four Commandments of BI Metadata
4 Examples
   ROLAP and Metadata
   Oracle Administration Tool
5 Research
   Metadata and Interoperability
   Platform-Independent Models
   Metadata in Multiversion DWH
6 Big Data
   Examples
   Some Thoughts about Metadata and Hadoop
7 Conclusions
   10 Reasons why Metadata matters in BI
   Final Conclusions
Metadata Business Intelligence Metadata in BI Examples Research Big Data Conclusions
Metadata
What is Metadata?
“Metadata is a set of data that describes and gives information about other data.”
— Oxford Dictionary
“Metadata is explicitly managed data describing other data or system elements to support their documentation, reusability and interoperation.” 1
1 Susanne Busse, Ralf-Detlef Kutsche, Ulf Leser, and Herbert Weber. Federated information systems: Concepts, terminology and architectures. Citeseer, 1999
Metadata for Information Systems
• Technical metadata: describes the technical access mechanisms of components.
• Logical metadata: relates to the schemas and their logical relationships.
• Metamodels: support the interoperability of schemas in different data models.
• Semantic metadata: helps to describe the semantics of concepts.
• Quality-related metadata: describes source-specific properties of information systems regarding their quality.
• Infrastructure metadata: helps users to find relevant data.
• User-related metadata: describes responsibilities and preferences of the users.
Business Intelligence
What is Business Intelligence?
Processing and organizing data in order to extract information and using this information to make business decisions.
“Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”
— Gartner
Why Data Analysis?
Business Intelligence in a Nutshell I
• OLTP: information system oriented to small, interactive operations.
• ETL: process that consists of extracting, transforming and loading data.
• Data warehouse: central repository of data used for reporting and analysis.
• Data mart: contains a subset of the information of a data warehouse, tailored to a single business view.
• OLAP: technique to analyse multidimensional data.
• ROLAP: OLAP analysis performed on a relational database.
• MDX: query language for multidimensional data.
• Data mining: discovering patterns in data.
Business Intelligence in a Nutshell II
• Data visualization: representation of data to make it more meaningful and/or attractive.
• Decision support: tools that facilitate making a decision based on data.
• Data-driven business: companies led by a strategy based on data.
The Dimensional Fact Model I
• Fact: an event that is relevant to the decision-making process.
• Measure: a numerical attribute of the fact.
• Dimensions: categorize the data into a finite number of slots.
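The three concepts above can be sketched in a few lines of Python. This is only an illustration: the sales facts, the dimension names (`year`, `product`) and the measure (`amount`) are made up for the example and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical facts: each one is a sale event with two dimensions and one measure.
facts = [
    {"year": 2012, "product": "laptop", "amount": 1200.0},
    {"year": 2012, "product": "phone", "amount": 600.0},
    {"year": 2013, "product": "laptop", "amount": 1100.0},
    {"year": 2013, "product": "laptop", "amount": 900.0},
]


def roll_up(facts, dimensions, measure):
    """Sum a measure over the slots defined by the chosen dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        slot = tuple(fact[d] for d in dimensions)  # one slot per dimension value combination
        totals[slot] += fact[measure]
    return dict(totals)


by_year = roll_up(facts, ["year"], "amount")
by_year_product = roll_up(facts, ["year", "product"], "amount")
```

Choosing fewer dimensions gives a coarser view of the same facts, which is the idea behind rolling up a cube.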
The Dimensional Fact Model II
Cube
Data Warehousing
Copyright 2013 Toon Calders http://goo.gl/ds8nZc
Metadata Management in Data Warehousing
Copyright 2014 LINGARO http://goo.gl/Wfxsni
Metadata in BI
Motivation: Quotes I
“Metadata is a vital element of the data warehouse.”
— William Inmon2
“Metadata is the DNA of the data warehouse.”
— Ralph Kimball3
“Metadata is analogous to the data warehouse encyclopedia.”
— Ralph Kimball3
2 William H. Inmon. Metadata in the Data Warehouse. Morgan Kaufmann, 2000
3 Ralph Kimball. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. Wiley, 1998
Motivation: Quotes II
“The fact that metadata drives the warehouse is the literal truth. If you think you won't use metadata, you are mistaken.”
— Ralph Kimball4
“In the scope of data warehousing, meta-data plays an essential role because it specifies source, values, usage and features of data warehouse data and defines how data can be changed and processed at every architecture layer.”
— Matteo Golfarelli, Stefano Rizzi5
4 Ralph Kimball. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. Wiley, 1998
5 M. Golfarelli and S. Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009
Metadata is everywhere!
• Meaning of the objects.
• User profiles.
• Security permissions.
• Usage statistics.
• Logical model.
• Relation between physical and logical objects.
• DBMS metadata: tables, indexes, FKs, PKs, etc.
• Reporting / data analysis objects.
• Transformations of the data.
• Data sources and data targets.
• Query logs.
• ETL logs.
• Materialized information.
Classification
1. Technical metadata:
   - Describes the physical objects that make up the data warehouse.
   - Tables, fields, indexes, sources, targets, transformations, etc.
2. Business metadata:
   - Describes the contents of the data warehouse in an accessible way to conduct the day-to-day business.6
   - Facts, dimensions, logical relationships, etc.
3. Process metadata:
   - Describes operations executed on the warehouse and their results.
   - Results of the ETL process, query logging, etc.
6 William H. Inmon, Bonnie O'Neil, and Lowell Fryman. Business Metadata: Capturing Enterprise Knowledge. Morgan Kaufmann, 2010
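As a rough illustration of the classification, a metadata catalog could tag every entry with one of the three kinds. The sketch below is an assumption made for this example, not a structure taken from the cited book: the `Kind` enum, the `MetadataEntry` class and all three sample entries are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    TECHNICAL = "technical"
    BUSINESS = "business"
    PROCESS = "process"


@dataclass(frozen=True)
class MetadataEntry:
    kind: Kind
    subject: str      # the object or operation being described
    description: str  # what the metadata says about it


# Hypothetical catalog with one entry per kind.
catalog = [
    MetadataEntry(Kind.TECHNICAL, "sales_fact.amount", "numeric column, indexed"),
    MetadataEntry(Kind.BUSINESS, "Total Sales", "sum of invoiced amounts per period"),
    MetadataEntry(Kind.PROCESS, "nightly_etl", "last run finished without errors"),
]


def entries_of(kind):
    """Return the subjects described by metadata of the given kind."""
    return [entry.subject for entry in catalog if entry.kind is kind]
```

A business user would typically browse only `entries_of(Kind.BUSINESS)`, while an administrator would also look at the technical and process entries.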
The Four Commandments of BI Metadata
A data warehouse's likelihood of success is greatly increased by following Ralph Kimball's advice:7
1. Be aware of what metadata you keep.
2. Centralize it where possible.
3. Track your metadata.
4. Keep it up to date.
7 Ralph Kimball. The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses. Wiley, 1998
Examples
ROLAP and Metadata
Figure: PostgreSQL’s ROLAP server translates an MDX query into SQL
ROLAP and Metadata
SELECT
  Expenses."Expenses per day" saw_0,
  Expenses."Days with expenses" saw_1,
  Expenses."Total Expenses" saw_2,
  Period."Year" saw_3
FROM "HR - Travel Expenses"
ORDER BY saw_3
Figure: MDX Query
ROLAP and Metadata
select
  sum(case when T1757.ZD_NUM = 0 then 0
           else (T1757.ZMDTACE_NAC_IM + T1757.ZMDTACO_NAC_IM + T1757.ZD_NAC_IM
                 + T1757.ZCOMD_NAC_IM + T1757.ZCOMDDIC_IM + T1757.ZMDTACE_EXT_IM
                 + T1757.ZMDTACO_EXT_IM + T1757.ZD_EXT_IM + T1757.ZCOMD_EXT_IM)
                / nullif(T1757.ZD_NUM, 0)
      end) as c1,
  sum(T1757.ZD_NUM) as c2,
  sum(T1757.ZCLV_032 + T1757.ZCLV_132) as c3,
  T623.YEAR as c4
from
  SYSADM.PS_ZOBI_CALENDA_VW T623,
  SYSADM.PS_ZOBI_DS_TBL T1757
where (T623.MONTH_OF_YEAR = T1757.MONTH_OF_YEAR
       and T1757.ZID_COL = 'T'
       and T623.MONTH_OF_YEAR <= 201206
       and T623.YEAR between 2012 - 2 and 2012)
group by T623.YEAR
order by c4
Figure: SQL Query
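The step from the logical query to the physical one is driven by metadata. A minimal sketch of that idea follows, assuming a toy mapping and a toy schema: the dictionaries `LOGICAL_TO_PHYSICAL` and `SUBJECT_AREAS`, the tables `expenses_fact` and `calendar_dim`, and the sample rows are all hypothetical and far simpler than what a real ROLAP server keeps. It uses Python's standard `sqlite3` module only to show that the generated SQL runs.

```python
import sqlite3

# Hypothetical business metadata: logical column -> physical SQL expression.
LOGICAL_TO_PHYSICAL = {
    'Expenses."Total Expenses"': "sum(f.amount)",
    'Period."Year"': "c.year",
}
# Hypothetical technical metadata: subject area -> physical tables and join.
SUBJECT_AREAS = {
    "HR - Travel Expenses": "expenses_fact f join calendar_dim c on f.month_id = c.month_id",
}


def translate(logical_columns, subject_area):
    """Build a physical SQL query from a logical request using the metadata."""
    physical = [LOGICAL_TO_PHYSICAL[col] for col in logical_columns]
    select_list = ", ".join(f"{expr} as c{i}" for i, expr in enumerate(physical))
    # Every non-aggregated expression becomes a grouping (and ordering) column.
    group_by = [expr for expr in physical if not expr.startswith("sum(")]
    sql = f"select {select_list} from {SUBJECT_AREAS[subject_area]}"
    if group_by:
        sql += " group by " + ", ".join(group_by)
        sql += " order by " + ", ".join(group_by)
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table calendar_dim (month_id integer primary key, year integer);
    create table expenses_fact (month_id integer, amount real);
    insert into calendar_dim values (201111, 2011), (201205, 2012), (201206, 2012);
    insert into expenses_fact values (201111, 100.0), (201205, 40.0), (201206, 60.0);
""")
sql = translate(['Expenses."Total Expenses"', 'Period."Year"'], "HR - Travel Expenses")
rows = conn.execute(sql).fetchall()
```

The analyst only names business concepts; table names, joins and aggregation rules stay in the metadata, so the physical schema can change without touching the logical query.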
Oracle Administration Tool
Figure: The physical layer stores the technical metadata, while the other two layers store the business metadata.
Advantages
• Abstraction: data analysts do not need to know the complex data sources involved in the system. They only worry about the business question, not about how to answer it.
• Portability: changes to the physical model don’t affect the logical model.
• Security: defining a strong security policy allows administrators to restrict users' access to information they must not see.
• Customization: the information is adapted to the user.
Azriel Marla and Bob Ertl. Oracle Fusion Middleware Metadata Repository Builder's Guide for Oracle Business Intelligence Enterprise Edition, 11g Release 1 (11.1.1), 2011
Research
Metadata and Interoperability
• The BI environment is composed of a wide variety of tools.
• Complex bridges are crucial to integrate metadata among them.
• It is necessary to define a standard to facilitate interoperability and integration.
• Some attempts:
   - Open Information Model (OIM) by the Meta Data Coalition.
   - Common Warehouse Metamodel (CWM) by OMG.
   - OIM was integrated into CWM.
• Suggestion: use domain ontologies to establish semantic mappings between different data marts.
Stefano Rizzi, Alberto Abello, Jens Lechtenborger, and Juan Trujillo. Research in data warehouse modeling and design: dead or alive? In Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP, pages 3–10. ACM, 2006
How Standards Proliferate
Figure: XKCD http://xkcd.com/927/
OIM vs. CWM
• Both are metadata standards for data warehousing.
I OIM’s scope is wider, not only for metadata.
I Good for technical metadata, not for business metadata.
I OIM is limited to relational data.
I Using CWM, metadata exchange between tools that use the XMI standard is automatic.
Thomas Vetterli, Anca Vaduva, and Martin Staudt. Metadata standards for data warehousing: open information model vs. common warehouse metadata. ACM SIGMOD Record, 29(3):68–75, 2000
Platform-Independent Models
• The problem: you have to provide OLAP metadata to bridge the gap between the conceptual and logical model. This metadata depends on the platform.
• The solution:
   - Define an OLAP algebra that provides semantics in multidimensional models.
   - It derives the logical design automatically, for any platform.
   - Model Driven Architecture: derive the metadata from the conceptual model.
Jesus Pardillo, Jose-Norberto Mazon, and Juan Trujillo. Bridging the semantic gap in OLAP models: platform-independent queries. In Proceedings of the ACM 11th International Workshop on Data Warehousing and OLAP, pages 89–96. ACM, 2008
Jesus Pardillo, Jose-Norberto Mazon, and Juan Trujillo. Towards the automatic generation of analytical end-user tools metadata for data warehouses. In Sharing Data, Information and Knowledge, pages 203–206. Springer, 2008
Metadata in Multiversion DWH
• Multiversion DWH:
   - It keeps track of the changes in the schema and the data.
   - Metadata becomes more complex and more useful in these systems.
• Proposal:
   - Use a metamodel to manage different versions of the DWH.
   - Use a metamodel to detect changes in the external data sources.
Robert Wrembel and Bartosz Bebel. Metadata management in a multiversion data warehouse. In On the Move to Meaningful Internet Systems 2005: CoopIS, DOA, and ODBASE, pages 1347–1364. Springer, 2005
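To give a feeling for change detection based on metadata, the sketch below compares two schema snapshots. It is a simplified stand-in and not the metamodel proposed in the cited paper: the snapshot format `{table: {column: type}}`, the function `diff_schema` and the `customer` example are assumptions made for illustration.

```python
def diff_schema(old, new):
    """Compare two schema snapshots of the form {table: {column: type}}."""
    changes = []
    for table in sorted(set(old) | set(new)):
        old_cols = old.get(table, {})
        new_cols = new.get(table, {})
        for col in sorted(set(old_cols) | set(new_cols)):
            if col not in old_cols:
                changes.append(("added", table, col))
            elif col not in new_cols:
                changes.append(("removed", table, col))
            elif old_cols[col] != new_cols[col]:
                changes.append(("retyped", table, col))
    return changes


# Hypothetical snapshots of one external source taken at two points in time.
v1 = {"customer": {"id": "int", "name": "text"}}
v2 = {"customer": {"id": "int", "name": "varchar", "country": "text"}}
changes = diff_schema(v1, v2)
```

Each detected change could then trigger the creation of a new warehouse version instead of silently breaking the ETL process.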
Big Data
Examples: HDFS
I The NameNode stores all the metadata in a single point.
I It keeps all the metadata in memory.
I It might be problematic when we store a vast number of small files.14
14 Grant Mackey, Saba Sehrish, and Jun Wang. Improving metadata management for small files in HDFS. In Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on, pages 1–4. IEEE, 2009
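A back-of-the-envelope calculation shows why small files hurt. The sketch assumes a commonly quoted rule of thumb of roughly 150 bytes of NameNode memory per namespace object (file or block); that figure, the function name and the file counts are assumptions for illustration, not measurements from the cited paper.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measured value


def namenode_heap_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode memory: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# 100 million small files, each fitting in a single block.
small_files = namenode_heap_bytes(100_000_000)
# The same 100 million blocks packed into 100,000 large files.
large_files = namenode_heap_bytes(100_000, blocks_per_file=1000)
```

Under these assumptions the small-file layout needs about twice the memory for the same number of blocks, and every additional file adds to a structure that must fit in the memory of a single machine.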
Examples: Query Planner
Figure: Apache Drill architecture: http://goo.gl/icZctF
Examples: Table and Storage Management Layer
Figure: HCatalog http://goo.gl/7E1xLc
Examples: Authorization to Data and Metadata
Figure: Apache Sentry: http://goo.gl/zAsIyk
Some Thoughts about Metadata and Hadoop
I Technical metadata is necessary.
I Hadoop is rapidly becoming a mature platform, and hence metadata will be more relevant in the coming years.
I Metadata seems to be a perfect fit for the heterogeneous Hadoop ecosystem.
Conclusions
10 Reasons why Metadata matters in BI
1. It’s everywhere!
2. It meets the disparate needs of the data warehouse's technical, administrative, and business user groups.
3. It contains information at least as valuable as regular data.
4. It is used to describe the semantics of concepts.
5. It facilitates the extraction, transformation and load process.
6. It improves data security.
7. It hides implementation details.
8. We can customize how the user sees the data.
9. It helps interoperability among systems.
10. It allows us to design portable solutions.
Final Conclusions
1. Metadata matters
2. Metadata is everywhere. You can’t get out of Dodge
3. Research is alive
4. Metadata management is less painful when using the right tools
5. Big data challenges are eased by metadata