What is a Data Catalog?
Dr. Markus Eurich
CDQ Advent Calendar 2019
What is a data catalog?
» A Data Catalog is an integrated platform for data curation that matches data supply and demand. It offers users functions to register data, to retrieve and use data, and to assess and analyze data. «
Picture: unsplash.com
As data evolves into an important asset, data catalogs proliferate to democratize data
DATA CATALOGS
• satisfy the increasing demand for data from different user groups
• make data FAIR (findable, accessible, interoperable, reusable)
• help to democratize data inside enterprises
FAIR Principles
A Data Catalog matches data supply and demand
[Figure: Data sources (internal: System A, System B, System C; external: Open Data, Commercial Data) are connected through the Data Catalog to data users (internal: Data Owners, Data Analysts, Data Stewards, Data Citizens; external: Customers / Suppliers, Business Partners, Government agencies). Data Catalog function blocks: Data Inventory (data supply), Data Discovery (data demand), Data Curation, Data Assessment, Data Collaboration, Administration, Data Analytics, Data Governance, Automation & Machine-learning, Data Visualization. Flows: metadata; new or modified metadata; new or modified data.]
Data Inventory manages the data supply by registering and describing data in various ways
Like books in a library, data needs to be registered and described so that different user groups can easily retrieve it. The Data Inventory block therefore contains functions to organize and document all data in the Data Catalog, without moving the data out of its storage system.
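A minimal sketch of what registering and describing data could look like, assuming a catalog that stores only descriptive metadata plus a pointer to where the data physically lives. The names CatalogEntry, DataInventory, location, and tags are hypothetical illustrations, not the schema of any specific catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata about one data asset (hypothetical schema for illustration)."""
    name: str
    source_system: str  # system where the data stays, e.g. "System A"
    location: str       # pointer into that system (table, path, URI); data is not copied
    owner: str
    description: str = ""
    tags: set[str] = field(default_factory=set)


class DataInventory:
    """Registers and describes data assets by keeping metadata only."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Registering an existing name replaces its metadata (new or modified metadata).
        self._entries[entry.name] = entry

    def describe(self, name: str) -> CatalogEntry | None:
        return self._entries.get(name)

    def names(self) -> list[str]:
        return sorted(self._entries)


inventory = DataInventory()
inventory.register(CatalogEntry(
    name="customer_master",
    source_system="System A",
    location="crm.customers",
    owner="Customer data owner",
    description="Golden record of customers",
    tags={"customer", "master data"},
))
print(inventory.describe("customer_master").source_system)  # System A
```

The design choice worth noting is that the entry holds a location rather than the data itself, which mirrors the idea of documenting data without moving it from its storage system.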
Data Curation functions help organizations assess, secure and analyze data assets
A Data Catalog should support the assessment of data resources with regard to relevant aspects such as quality, usage, risk or value, which helps organizations manage data as a strategic resource. In addition, the platform approach allows the respective tasks to be assigned and their status to be tracked transparently. Because the catalog is central, access to and usage of data can be controlled at the enterprise level.
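Quality is one of the assessment aspects named above. As a hedged illustration, the sketch below computes completeness (the share of records in which a field is filled), which is one commonly used data quality dimension. The choice of metric, the function name field_completeness, the field names, and the sample records are assumptions for illustration only.

```python
def field_completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Return, per field, the share of records whose value is neither None nor ''."""
    result = {}
    for f in fields:
        filled = sum(1 for r in records if r.get(f) not in (None, ""))
        # An empty dataset has no filled values, so report 0.0 instead of dividing by zero.
        result[f] = filled / len(records) if records else 0.0
    return result


# Hypothetical sample of customer records
customers = [
    {"id": 1, "email": "a@example.org", "country": "CH"},
    {"id": 2, "email": "", "country": "DE"},
    {"id": 3, "country": None},
    {"id": 4, "email": "d@example.org", "country": "AT"},
]
print(field_completeness(customers, ["email", "country"]))
# {'email': 0.5, 'country': 0.75}
```

A catalog could store such scores as metadata next to the asset so that users see the assessment before requesting the data.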
Data Discovery manages the general data demand and provides functionalities to find and obtain relevant data
The Data Catalog provides different functionalities to find and obtain relevant data. It can be assumed that the faster and more precisely relevant data is found, the higher the usage and usefulness of the Data Catalog. Furthermore, obtaining data must be managed in a compliant yet self-service manner. These functionalities are crucial components of a Data Catalog because they enable efficient and effective data consumption.
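To make "finding relevant data" concrete, here is a minimal keyword-search sketch over catalog metadata. It assumes simple substring matching on hypothetical name, description, and tag fields and ranks entries by how many query terms they match; a real catalog would more likely rely on a full-text search engine, so treat this only as an illustration of relevance ranking.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical minimal catalog entry used only for this search sketch."""
    name: str
    description: str
    tags: set[str] = field(default_factory=set)


def search(entries: list[Entry], query: str) -> list[str]:
    """Rank entries by how many query terms occur in their name, description, or tags."""
    terms = query.lower().split()
    scored = []
    for e in entries:
        text = " ".join([e.name, e.description, *sorted(e.tags)]).lower()
        score = sum(term in text for term in terms)  # substring match per term
        if score:
            scored.append((score, e.name))
    # Most matching terms first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored]


catalog = [
    Entry("customer_master", "Golden record of customers", {"customer"}),
    Entry("supplier_list", "Approved suppliers", {"supplier", "procurement"}),
    Entry("sales_orders", "Orders per customer and month", {"sales"}),
]
print(search(catalog, "customer orders"))  # ['sales_orders', 'customer_master']
```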
Automation & Machine-learning technologies support data management tasks in various ways
All Data Catalog function groups can leverage machine learning and can be automated to a certain degree. For instance, the Data Catalog can learn to label data or to recommend data based on previous usage behavior. Some functions can even be fully automated.
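Recommending data based on previous usage behavior can be approached in many ways. The sketch below uses item co-occurrence, chosen here purely as a simple illustration and not necessarily what any catalog uses: it suggests assets that other users frequently used together with the assets a user already works with. The usage log, the asset names, and the functions build_cooccurrence and recommend are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(usage_log: dict[str, set[str]]) -> Counter:
    """Count, for every ordered pair of assets, how many users used both."""
    pairs: Counter = Counter()
    for assets in usage_log.values():
        for a, b in combinations(sorted(assets), 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    return pairs


def recommend(pairs: Counter, used: set[str], k: int = 3) -> list[str]:
    """Suggest up to k assets not yet used, scored by co-usage with the assets in `used`."""
    scores: Counter = Counter()
    for (a, b), n in pairs.items():
        if a in used and b not in used:
            scores[b] += n
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [asset for asset, _ in ranked[:k]]


# Hypothetical usage log: user -> data assets accessed in the catalog
usage_log = {
    "alice": {"customer_master", "sales_orders"},
    "bob": {"customer_master", "sales_orders", "supplier_list"},
    "carol": {"customer_master", "supplier_list"},
    "dave": {"supplier_list", "spend_cube"},
}
pairs = build_cooccurrence(usage_log)
print(recommend(pairs, {"customer_master"}))  # ['sales_orders', 'supplier_list']
```

Co-occurrence needs no training step, which makes it a reasonable first approximation; learned models could replace the scoring function without changing how recommendations are surfaced.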
Visualization capabilities support users in diverse data management tasks
A Data Catalog should contain visualization capabilities that help users get an overview of and understand data, data models and data lineage via dashboards, reports, flow graphics, etc.
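Lineage is easiest to visualize once it is held as a graph. The sketch below assumes lineage is recorded as a hypothetical mapping from each derived asset to its direct sources; it collects all upstream sources of an asset with a breadth-first traversal and emits Graphviz DOT text, which a tool such as Graphviz could render as a flow graphic. Asset names and function names are illustrative only.

```python
from collections import deque


def upstream(lineage: dict[str, list[str]], asset: str) -> set[str]:
    """Return every asset that `asset` is directly or indirectly derived from."""
    seen: set[str] = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for parent in lineage.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def to_dot(lineage: dict[str, list[str]]) -> str:
    """Render the lineage as Graphviz DOT text (edges point from source to derived asset)."""
    lines = ["digraph lineage {"]
    for derived, parents in sorted(lineage.items()):
        for parent in parents:
            lines.append(f'  "{parent}" -> "{derived}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical lineage: derived asset -> assets it is built from
lineage = {
    "sales_dashboard": ["sales_orders", "customer_master"],
    "sales_orders": ["erp_orders"],
    "customer_master": ["crm_accounts"],
}
print(sorted(upstream(lineage, "sales_dashboard")))
# ['crm_accounts', 'customer_master', 'erp_orders', 'sales_orders']
print(to_dot(lineage))
```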
Learn about our CC CDQ member benefits
Examples of CC CDQ member benefits include:
• detailed information about data catalog functions
• access to eight data catalog user role definitions
• a generic metadata model including definitions of metadata objects, a list of metadata attributes, and information about the link between metadata objects and attributes
• further data catalog insights on implementation approaches, vendor solutions, and detailed usage scenarios
• four CC CDQ workshops, several web conferences, and tool vendor demonstrations each year on relevant data management topics
• access to working reports, the latest CDQ presentations, and scientific publications on the CC CDQ knowledge base
Join our 2020 data catalogs activities
Unstructured data (Type: Focus Day)
Experience exchange on
• How to characterize and classify unstructured data?
• How to manage the lifecycle (e.g. retention periods) across various platforms?
• How to apply data management principles to unstructured data (documentation, quality, etc.)?
Deliverables
• Classification of unstructured data
• Approach(es) and principles to manage unstructured data

Data catalogs (Type: Focus Day)
Experience exchange on
• How to build up and scale a data catalog (incl. usage scenarios)?
• How to measure and communicate the successes of data catalogs?
Deliverables
• Usage scenarios (update) and good practices
• Success metrics & business value of data catalogs

Managing the enterprise analytics platform (EAP) (Type: Co-Innovation)
Experience exchange on
• How to prepare and ingest data for analytics (onboarding: push vs. pull)?
• How to make data available and FAIR to data scientists (data catalog)?
• How to define and measure data quality?
• What are good practices and maturity stages in EAP management?
Deliverables
• Data excellence model / maturity model for the EAP
• Case studies and good practices
I look forward to discussing with you!
Dr. Markus Eurich
Head of Knowledge Management
Competence Center Corporate Data Quality (CC CDQ)