harshal vora - bnp paribas

CDO FORUM NYC 2016

ARE WE WORKING BACK TO FRONT?

A USE BASED APPROACH TO DEFINING CRITICAL DATA ELEMENTS

Harshal Vora

CDO, BNP Paribas Americas

June 23, 2016

Agenda

Data & Analytics Landscape

Why shouldn't we boil the ocean?

Key Data Elements & Prioritization

Storage – Moving beyond single source of data

Agenda

Data & Analytics Landscape

Data: its application in business decision making and the complexity of managing it

Data – Its Impact, Scale & Complexity

Data is both a boon and a bane to any organization. It is a critical factor in making business decisions and in transforming industries and business models. At the same time, it has the power to overwhelm systems and stymie growth.

Over the years, in most organizations, data proliferation, data replication and the morphing of data for specific business objectives have increased the complexity of managing data effectively.

Objective of the CDO: to give business and organization executives clear direction in creating greater value from their organization’s data through strategy, policies and processes to govern the data.

There is also a growing need to derive business insights from a variety of data. Consuming structured and unstructured data from internal sources as well as social media, IoT and other venues has rapidly led to high volumes of data consumption and storage, creating an urgent need for effective data management at larger volumes.

Data has historically been siloed, but lately it is recognized as an enterprise asset and viewed from an enterprise point of view. Hence the approach to identifying critical data elements should not result in boiling the ocean, but should be driven by a selective approach to data quality governance.

Source: Analytics: A blueprint for value – Converting big data and analytics insights into results. IBM Institute for Business Value, IBM, 2013.

Agenda

Why shouldn’t we boil the ocean?

Data Quality is vast and can be difficult to manage unless controls are enabled on a focused set of items

Data Management Journey

Items to consider when beginning the journey

Data Quality Framework

Data Management – Data Quality Governance

Quality is a nebulous term. Every stakeholder in the organization sees quality from their own perspective. Certain data aspects that are critical to one stakeholder may not be critical to others.

Data warehouses are intended to bring data from multiple sources (including historical data) and store it centrally; hence they inherit a variety of data quality issues consolidated from the different sources.

Accepting that not all data can be managed with the same level of rigor, cost and resource expense is important in driving the data quality effort.

Identification of Critical/Key Data Elements is the critical first step in positioning business and IT to meet data governance objectives for internal and external stakeholders, regulators and others.

FR Y-9C Analysis

o 90 KDEs analyzed & identified

o 1835 Report Line Items

o 3200 Fed Edit Checks

However, there is an increasing need for CDOs to effectively manage data and make it fit for use in the organization for:

o Regulatory Compliance such as BCBS 239, Basel, EPS

o Financial Impact

o Business Impact

o Customer Impact

The key is to find the right balance: determine what data is critical to manage, and focus energy and resources on governing that data.

The Data Management Journey

Define DG Target Operating Model & Roadmap

Establish Foundation & Initiate Program

Initiate Stakeholder Engagements

Implement Data Governance over the normal course of business

Scale across additional Data Domains & bring data under control

Items to consider

To improve data quality and manage data effectively, it is important to consider the following factors:

Understand the usage of data

Identify where data has major impact – Financial, Regulatory, Business, Customer

In most organizations where data is consolidated from various systems into a data warehouse to achieve specific decision-making objectives, it is quite possible to find several data quality issues.

Before enabling data quality controls, it is important to identify key areas of focus, then articulate the data quality dimensions pertinent to your quality use cases and how you will measure them.

Although data quality issues may be identified in downstream systems such as a data warehouse, it is always important to address the issues at origination, making business and data owners accountable.

Data quality programs approached from the business point of view have been sustained. Most programs that approached data quality from a purely technical point of view have failed, owing to an overwhelming number of data quality issues to resolve, a shortage of resources, and key drivers and stakeholders losing their passion over time.

Provide a forum for users to share data quality issues across the organization. This can be done by giving users the ability to log in to an issue management system via a front-end application. This allows for a shared perspective on how users view the data and helps enhance standards/controls on relevant data.

Establish Framework & Process

Establish a framework to approach data quality with the following objectives:

Establish Organization Structure - Identify Key Stakeholders

Identify Key Business Elements – Business focused

Explore Critical Data Elements (at the system level)

Define Linkage

Define Data Quality Controls

Determine Metrics

Enable Data Quality Monitoring
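As a rough illustration of the "Define Data Quality Controls" and "Determine Metrics" steps, the sketch below measures two commonly used data quality dimensions, completeness and validity, for a single data element. The sample records, the `customer_id` and `balance` fields, and the ID pattern are all hypothetical and are not taken from the presentation.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"customer_id": "C001", "balance": 1200.0},
    {"customer_id": "", "balance": 50.0},
    {"customer_id": "X-77", "balance": None},
    {"customer_id": "C004", "balance": 310.5},
]

def is_populated(value):
    """A value counts as populated if it is neither None nor an empty string."""
    return value is not None and value != ""

def completeness(rows, field):
    """Share of rows in which the field is populated."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if is_populated(r.get(field))) / len(rows)

def validity(rows, field, pattern):
    """Share of populated values that fully match the expected pattern."""
    values = [r.get(field) for r in rows if is_populated(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if re.fullmatch(pattern, str(v))) / len(values)

# One metric per (data element, dimension) pair; the pattern is an assumption.
metrics = {
    "customer_id.completeness": completeness(records, "customer_id"),
    "customer_id.validity": validity(records, "customer_id", r"C\d{3}"),
    "balance.completeness": completeness(records, "balance"),
}

for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
```

In a monitoring setup, metrics like these would typically be compared against thresholds agreed with the business and data owners; the thresholds themselves are outside the scope of this sketch.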

Agenda

Key Data Elements & Prioritization

Why defining Key Data Elements is an important first step in enabling Data Quality controls

Deep Dive into Key Data Elements

o Identification

o Prioritization

Data Quality Program Establishment – Key Data Elements

Note: The above representation is an example only

Key Data Elements (KDE)

Identifying KDEs:

Prepare a core team of Business and IT members who are close to business processes, systems and data

Prepare a questionnaire based on the criteria for determining KDEs, to evaluate data elements against KDE qualification

KDEs are identified and then prioritized for business importance / impact: Tier 1, Tier 2 and Tier 3

1. Inventory Data Domains: review Target Operating Model; determine data domains in scope

2. Determine Subject Areas: review Enterprise Information Model & subject areas in scope

3. Inventory Candidate Data Elements: review business glossary & STTM models for candidate data elements; determine data elements in scope for BIOs

4. Evaluate Data Elements and Identify KDEs: review data lineage to identify key data elements; determine DQ metrics, data definition, unique ID; determine DQ measurement location (at source, at target)

5. Prioritize KDEs: evaluate breadth of reach, business impact, visibility and severity of failure for KBE; establish data quality rules against metrics; publish prioritized KBEs

Define KDEs

Evaluate & Measure

Remediate

KDE Identification Criteria

The following identification criteria provide guidelines for Data Stewards to consider when identifying KDEs:

Dimension, then Criteria: Definition

Visibility

o Regulator / Auditor Identification: Data element has been specifically mentioned by internal or external auditors/regulators as a key data element, the quality of which has been mentioned as a potential risk factor

o BCBS Key Risk Indicators / Regulatory Risk: Data element has a significant regulatory risk impact as per the BCBS Key Risk Indicators

o Materiality per Reporting: Data element is frequently used for internal and external reporting pertaining to a portfolio considered as material to business

o Key Business Investment / Relationship Decisions: Data element is important for key business decisions around investments and client relationship management

Severity of Failure

o Level of Controls: Data element is required for implementing effective controls associated with inherent risk in key business processes (potentially manual in nature)

o Unique Identifier / Segmentation: Data element is used as a unique identifier, or as a segmentation, contextual or filtering variable that is important to the Data Domain

o Relationship to Other KDEs: Data element is used as an input to “derive” (i.e., transformed or aggregated to measure specific business outcome) a DE that is critical to business

Breadth of Reach

o Multiple Business Process: Data element is created or consumed by multiple business processes within the data domain, and must be correct & consistent across systems for accurate reporting

Other

o Expert Judgment: Identification as KDE by a business SME within the Data Domain

KDE Prioritization Criteria

A comprehensive risk-based approach for KDE prioritization has been developed to drive the highest “Business Value” across the following dimensions:

o Visibility (material impact to business)

o Severity of failure (likelihood of inherent risk)

o Breadth of reach (usage across business processes and domains)

The long-term strategic implementation around data governance and data management of KDEs should also consider the “Implementation Complexity” by evaluating:

o Required cost and resources

o Number of dependencies

o Overall implementation risk

Prioritization Criteria

Visibility

o Related identification criteria: Regulator / Auditor Identification; BCBS Key Risk Indicators / Regulatory Risk; Materiality per Reporting; Key Business Investment / Relationship Decisions

o Tier 1 (3 pts): The KDE is leveraged for external reporting of financial / operating performance or regulatory requirements pertaining to a portfolio considered as material to business

o Tier 2 (2 pts): The KDE is leveraged for internal financial or operating performance reporting

o Tier 3 (1 pt): The KDE is leveraged for the management of operational activities

Severity of Failure

o Related identification criteria: Level of Controls; Unique Identifier / Segmentation; Relationship to Other KDEs

o Tier 1 (3 pts): Failure in KDE data quality will result in significant risk to the enterprise including customer / market share loss

o Tier 2 (2 pts): Potential for increased risk resulting from deficiencies in KDE data quality

o Tier 3 (1 pt): Impact from deficiencies in data quality is limited or none

Breadth of Reach

o Related identification criteria: Multiple Business Process

o Tier 1 (3 pts): The KDE is used for reporting and/or decision making across more than three Data Domains or business processes

o Tier 2 (2 pts): The KDE is used for reporting and/or decision making across two or three Data Domains or business processes

o Tier 3 (1 pt): The KDE is used for reporting and/or decision making within one Data Domain or business process

The points for each rank will be added up to calculate the total prioritization score of the KDE:

KDEs with total scores ranging from 7 to 9: Priority Tier 1 KDEs
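The scoring rule on this slide can be sketched in a few lines. The point values (3, 2, 1) and the 7 to 9 band for Priority Tier 1 come from the slide; the function names and dictionary keys are hypothetical, and the score bands for the lower priority tiers are not given in this text, so the sketch deliberately does not assume them.

```python
# Points per tier, as stated on the slide: Tier 1 = 3, Tier 2 = 2, Tier 3 = 1.
TIER_POINTS = {1: 3, 2: 2, 3: 1}

# The three prioritization dimensions named on the slide (key names are illustrative).
DIMENSIONS = ("visibility", "severity_of_failure", "breadth_of_reach")

def prioritization_score(tiers):
    """Sum the points for the tier assigned to a KDE on each dimension."""
    missing = [d for d in DIMENSIONS if d not in tiers]
    if missing:
        raise ValueError(f"missing tier for: {', '.join(missing)}")
    return sum(TIER_POINTS[tiers[d]] for d in DIMENSIONS)

def is_priority_tier_1(score):
    """Scores from 7 to 9 are Priority Tier 1; other bands are not assumed here."""
    return 7 <= score <= 9

# Hypothetical KDE assessed as Tier 1, Tier 1 and Tier 2 on the three dimensions.
example = {"visibility": 1, "severity_of_failure": 1, "breadth_of_reach": 2}
score = prioritization_score(example)
print(score, is_priority_tier_1(score))
```

With three dimensions the total always falls between 3 and 9, so a KDE reaches the 7 to 9 band only if it is rated Tier 1 on at least one dimension.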

Agenda

Storage – Moving beyond single source of data

Why are volume and storage also important aspects to consider besides Data Quality & KDEs?

Increasing Volumes and Cost of Storage

The original objective of ECM (Enterprise Content Management) was to consolidate corporate information into a single, central repository. Although this was the philosophy, most organizations end up with multiple repositories, driven by the adoption of different vendor technologies or by acquisitions/mergers, leading to an increase in data silos.

2009: 800,000 petabytes

2020: 35 zettabytes

44x as much data & content over the coming decade

80% of the world’s data is unstructured

The key objectives today for any organization are:

Managing the cost of storage (lowering TCO)

Making integrated, actionable data available to the members in real time or near real time

Cloud now allows enterprises to have scalable infrastructure that lets them expand their storage to massive volumes at low TCO.

Private and hybrid clouds now enable organizations to store a variety of data from internal as well as external sources, independent of structure, in a single virtual repository.

With the advent of the web, smartphones, tablets & IoT, people became information creators and consumers, leading to multiplied data sources and increased integration challenges.

Moving beyond single source of data

Solutions such as the Data Lake have lately become more prominent, allowing a variety of data to be ingested into a structure-independent repository that the organization can leverage and use as it sees fit.

With the increasing availability of data virtualization and big data technologies, enterprises no longer need to wrangle with the challenges of creating a single integrated repository.

These technologies provide virtual integrated environments that allow users from various departments and business lines to:

Ingest data from multiple sources

Have a unified view of data

Use metadata capabilities to link the sources for meaningful search and retrieval

Although these virtual integration layers provide opportunities, they also present challenges. Hence there is a need for proper governance to:

Enable appropriate controls on the relevant data, in the context of its usage

Reconcile the various sources to ensure they are consistent with the system of record and allow accurate reporting

Catalog metadata so that users can fetch the right, relevant data via indexing and search capabilities with proper security enforcement
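One way to picture the reconciliation control mentioned on this slide is a keyed comparison between the system of record and another source. The sketch below is only an illustration under simple assumptions: each source is reduced to a mapping from a record key to an amount, and the `reconcile` function, the account keys and the sample values are all hypothetical.

```python
def reconcile(system_of_record, other_source, tolerance=0.0):
    """Compare two {key: amount} mappings and report where they disagree."""
    # Keys present in the system of record but absent from the other source.
    missing = sorted(k for k in system_of_record if k not in other_source)
    # Keys present in the other source that the system of record does not know.
    unexpected = sorted(k for k in other_source if k not in system_of_record)
    # Keys present in both whose amounts differ by more than the tolerance.
    mismatched = sorted(
        k for k in system_of_record
        if k in other_source
        and abs(system_of_record[k] - other_source[k]) > tolerance
    )
    return {"missing": missing, "unexpected": unexpected, "mismatched": mismatched}

# Hypothetical balances keyed by account identifier.
ledger = {"A1": 100.0, "A2": 250.0, "A3": 75.0}
warehouse = {"A1": 100.0, "A2": 255.0, "A4": 10.0}

result = reconcile(ledger, warehouse)
print(result)
```

Each of the three lists points to a different kind of follow-up: records that never arrived, records with no known origin, and values that changed on the way.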

Questions
