What is a Data Catalog? Dr. Markus Eurich CDQ Advent Calendar 2019




What is a data catalog?

» A Data Catalog is an integrated platform for data curation that matches data supply and demand. It offers users functions to register data, to retrieve and use data, and to assess and analyze data. «

Picture: unsplash.com


As data evolves into an important asset, data catalogs proliferate to democratize data

DATA CATALOGS

• satisfy the increasing demand for data from different user groups

• make data FAIR (findable, accessible, interoperable, reusable)

• help to democratize data inside enterprises

FAIR Principles


A Data Catalog matches data supply and demand

DATA SOURCES
• Internal Data Sources: System A, System B, System C
• External Data Sources: Open Data, Commercial Data

DATA CATALOG
• Data Supply: Data Inventory
• Data Curation: Data Assessment, Data Governance, Data Analytics, Data Collaboration, Administration
• Data Demand: Data Discovery
• Supporting functions: Automation & Machine-learning, Data Visualization
• Flows: metadata; new or modified metadata; new or modified data

DATA USERS
• Internal Users: Data Owners, Data Analysts, Data Stewards, Data Citizens
• External Users: Customers / Suppliers, Business Partners, Government agencies


Data Inventory manages the data supply by registering and describing data in various ways

Like books in a library, data needs to be registered and described so that different user groups can retrieve it easily. The Data Inventory block should therefore contain functions to organize and document all data in the Data Catalog, without moving the data out of its data storage system.
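To make the Data Inventory idea more concrete, here is a minimal Python sketch that assumes a simple in-memory store. `DatasetEntry`, `DataInventory` and the location `crm://system-a/customers` are hypothetical names chosen for illustration. They are not part of any CDQ model or vendor product. The sketch shows that the catalog keeps only metadata and a pointer to the source system, and the data itself is never copied.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """Hypothetical metadata record; the data itself stays in its source system."""
    name: str
    source_system: str        # e.g. "System A"
    location: str             # pointer to where the data lives, not a copy of it
    owner: str
    description: str = ""
    tags: set[str] = field(default_factory=set)


class DataInventory:
    """Hypothetical in-memory inventory: registers and describes data, never moves it."""

    def __init__(self) -> None:
        self._entries: dict[str, DatasetEntry] = {}

    def register(self, entry: DatasetEntry) -> None:
        # Registering stores only metadata, keyed by dataset name.
        self._entries[entry.name] = entry

    def describe(self, name: str, description: str, *tags: str) -> None:
        # Enrich an already registered entry with documentation and tags.
        entry = self._entries[name]
        entry.description = description
        entry.tags.update(tags)

    def get(self, name: str) -> DatasetEntry:
        return self._entries[name]

    def __len__(self) -> int:
        return len(self._entries)


# Hypothetical example: register a customer dataset that lives in "System A".
inventory = DataInventory()
inventory.register(DatasetEntry("customer_master", "System A",
                                "crm://system-a/customers", owner="Sales"))
inventory.describe("customer_master", "Golden record of customers",
                   "customer", "master data")
print(inventory.get("customer_master").location)  # prints crm://system-a/customers
```

Separating registration from description mirrors the slide's two verbs. A dataset can be made findable first and documented in more detail later.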


Data Curation functions help organizations assess, secure and analyze data assets

A Data Catalog should support the assessment of data resources with regard to relevant aspects such as quality, usage, risk or value. This helps organizations manage data as a strategic resource. Because the catalog is a shared platform, the corresponding tasks can also be assigned and their status tracked transparently. And because it is central, access to and usage of data can be controlled at the enterprise level.
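Data quality is one of the assessment aspects named above, and a completeness metric can illustrate it. The sketch below is purely illustrative. The `completeness` function, the required-field list and the sample records are hypothetical, and completeness is only one of many possible quality dimensions.

```python
def completeness(records: list[dict[str, object]], required: list[str]) -> float:
    """Share of required fields that are filled (not None and not an empty string).

    Hypothetical metric for illustration; an empty input counts as complete
    by convention here.
    """
    if not records or not required:
        return 1.0
    filled = sum(
        1
        for rec in records
        for field_name in required
        if rec.get(field_name) not in (None, "")
    )
    return filled / (len(records) * len(required))


# Hypothetical customer records with one empty name and one missing country.
customers = [
    {"id": 1, "name": "ACME", "country": "CH"},
    {"id": 2, "name": "", "country": "DE"},
    {"id": 3, "name": "Globex", "country": None},
]
print(round(completeness(customers, ["id", "name", "country"]), 3))  # prints 0.778
```

A score like this could be stored as metadata next to the dataset entry, so that users see the assessment result when they discover the data.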


Data Discovery manages the general data demand and provides functionalities to find and obtain relevant data

The Data Catalog provides various functionalities to find and obtain relevant data. It can be assumed that the faster and more precisely users find relevant data, the more the Data Catalog is used and the more useful it becomes. Furthermore, obtaining data must be managed in a way that is both compliant and self-service. These functionalities are crucial components of a Data Catalog because they enable efficient and effective data consumption.
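Here is a minimal sketch of metadata-based discovery, assuming a naive term-matching score. `CatalogRecord`, `search` and the scoring rule are hypothetical. A production catalog may rely on a full-text search engine with more refined relevance ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogRecord:
    """Hypothetical searchable metadata record."""
    name: str
    description: str
    tags: frozenset[str]


def search(records: list[CatalogRecord], query: str) -> list[str]:
    """Rank records by how often the query terms occur in name, description or tags.

    Assumed scoring rule: each query term scores 1 point per field it appears in
    (substring match, case-insensitive). Records scoring 0 are dropped.
    """
    terms = query.lower().split()
    scored = []
    for rec in records:
        fields = [rec.name.lower(), rec.description.lower(), " ".join(rec.tags).lower()]
        score = sum(term in f for term in terms for f in fields)
        if score > 0:
            scored.append((score, rec.name))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


# Hypothetical catalog content.
catalog = [
    CatalogRecord("customer_master", "Golden record of customers",
                  frozenset({"customer", "master data"})),
    CatalogRecord("supplier_master", "Supplier addresses and bank data",
                  frozenset({"supplier", "master data"})),
    CatalogRecord("sales_orders", "Orders placed by customers", frozenset({"sales"})),
]
print(search(catalog, "customer master"))
# prints ['customer_master', 'supplier_master', 'sales_orders']
```

Searching metadata instead of the data itself connects discovery back to the inventory. Users can only find what has been registered and described.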


Automation & Machine-learning technologies support data management tasks in various ways

All Data Catalog function groups can leverage machine learning and can be automated to a certain degree. For instance, the Data Catalog can learn how to label data or how to recommend data based on previous usage behavior. In some cases, functions can even be fully automated.
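Recommending data based on previous usage behavior can be sketched with a simple co-usage count ("datasets often used together"). In this sketch, the count stands in for a learned model and is not machine learning in the strict sense. The session log, the dataset names and the `recommend` function are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_usage_counts(sessions: list[set[str]]) -> Counter:
    """Count how often each ordered pair of datasets appears in the same session."""
    counts: Counter = Counter()
    for used in sessions:
        for a, b in combinations(sorted(used), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(dataset: str, sessions: list[set[str]], k: int = 2) -> list[str]:
    """Return up to k datasets most often used together with `dataset`."""
    counts = co_usage_counts(sessions)
    candidates = [(n, other) for (src, other), n in counts.items() if src == dataset]
    # Most frequent first; ties broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in candidates[:k]]


# Hypothetical usage log: each set lists the datasets one user session accessed.
usage_log = [
    {"customer_master", "sales_orders"},
    {"customer_master", "sales_orders", "invoices"},
    {"customer_master", "invoices"},
    {"customer_master", "sales_orders"},
    {"supplier_master", "invoices"},
]
print(recommend("customer_master", usage_log))  # prints ['sales_orders', 'invoices']
```

Replacing the counts with a trained model would not change the interface. The catalog still maps a dataset and a usage history to a ranked list of suggestions.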


Visualization capabilities help users in diverse data management tasks

A Data Catalog should contain visualization capabilities, such as dashboards, reports and flow graphics, that help users get an overview of and understand data, data models and data lineage.
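Data lineage is naturally a directed graph. The following sketch assumes that lineage is stored as a mapping from each dataset to the datasets it is derived from. It collects all upstream sources with a breadth-first traversal and exports the graph in Graphviz DOT syntax, which is one possible input for a flow graphic. All dataset names and both functions are hypothetical.

```python
from collections import deque


def upstream_lineage(edges: dict[str, list[str]], target: str) -> list[str]:
    """Return every dataset that `target` is directly or indirectly derived from.

    Breadth-first traversal; the `seen` set guards against cycles.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(edges.get(target, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        queue.extend(edges.get(node, []))
    return order


def to_dot(edges: dict[str, list[str]]) -> str:
    """Render the lineage graph in Graphviz DOT syntax (source -> derived)."""
    lines = ["digraph lineage {"]
    for derived, sources in edges.items():
        for src in sources:
            lines.append(f'  "{src}" -> "{derived}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical lineage: dataset -> datasets it is derived from.
lineage = {
    "revenue_dashboard": ["sales_report"],
    "sales_report": ["sales_orders", "customer_master"],
    "sales_orders": ["erp_raw_orders"],
}
print(upstream_lineage(lineage, "revenue_dashboard"))
# prints ['sales_report', 'sales_orders', 'customer_master', 'erp_raw_orders']
print(to_dot(lineage))
```

Because the traversal and the rendering are separate, the same lineage data could feed a dashboard, a report or a flow graphic.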


Learn about our CC CDQ member benefits

Examples of CC CDQ member benefits include:
• detailed information about data catalog functions
• access to eight data catalog user role definitions
• a generic metadata model including definitions of metadata objects, a list of metadata attributes, and information about the links between metadata objects and attributes
• further insights into data catalog implementation approaches, vendor solutions, and detailed usage scenarios
• four CC CDQ workshops, several web conferences, and tool vendor demonstrations each year on relevant data management topics
• access to working reports, the latest CDQ presentations, and scientific publications in the CC CDQ knowledge base


Join our 2020 data catalog activities

Unstructured data (Type: Focus Day)
Experience exchange on:
• How to characterize and classify unstructured data?
• How to manage the lifecycle (e.g., retention periods) across various platforms?
• How to apply data management principles (documentation, quality, etc.) to unstructured data?
Deliverables:
• Classification of unstructured data
• Approach(es) and principles to manage unstructured data

Data catalogs (Type: Focus Day)
Experience exchange on:
• How to build up and scale a data catalog (incl. usage scenarios)?
• How to measure and communicate the successes of data catalogs?
Deliverables:
• Usage scenarios (update) and good practices
• Success metrics & business value of data catalogs

Managing the enterprise analytics platform (EAP) (Type: Co-Innovation)
Experience exchange on:
• How to prepare and ingest data for analytics (onboarding: push vs. pull)?
• How to make data available and FAIR to data scientists (data catalog)?
• How to define and measure data quality?
• What are good practices and maturity stages in EAP management?
Deliverables:
• Data excellence model / maturity model for the EAP
• Case studies and good practices


I look forward to discussing this with you!

[email protected]

Dr. Markus Eurich
Head of Knowledge Management, Competence Center Corporate Data Quality (CC CDQ)