CDO FORUM NYC 2016
ARE WE WORKING BACK TO FRONT?
A USE BASED APPROACH TO DEFINING CRITICAL DATA ELEMENTS
Harshal Vora
CDO, BNP Paribas Americas
June 23, 2016
Agenda
Data & Analytics Landscape
Why we shouldn't boil the ocean?
Key Data Elements & Prioritization
Storage – Moving beyond single source of data
Data & Analytics Landscape
Data – its application in business decision making and the complexity of managing it
Data – Its Impact, Scale & Complexity
Data is both a boon and a bane to any organization. Data is extremely powerful as a critical factor in making business decisions and in transforming industries and business models. At the same time, it has the power to overwhelm systems and stymie growth
Over the years, in most organizations, data proliferation, data replication and data morphing for specific business objectives have increased the complexity of managing data effectively
Objective of CDO: To provide business and organization executives a clear direction in ensuring that they create greater value from their organization’s data through strategy, policies and processes to govern the data
Also, the growing need to derive business insights from a variety of data – consuming structured and unstructured data from internal sources as well as social media, IoT and other venues – has rapidly led to high volumes of data consumption and storage, creating an urgent need for effective data management on larger volumes of data
Data has historically been siloed, but lately data is recognized as an enterprise asset and is viewed from an enterprise point of view. Hence the approach to identifying critical data elements should not result in boiling the ocean but should be driven by a selective approach to data quality governance
Source: Analytics: A blueprint for value – Converting big data and analytics insights into results. IBM Institute for Business Value, IBM, 2013
Why we shouldn’t boil the ocean?
Data Quality is vast and can be difficult to manage if controls are not enabled on focused items
Data Management Journey
Items to consider when beginning the journey
Data Quality Framework
Data Management – Data Quality Governance
Quality is a nebulous term. Every stakeholder in the organization sees quality from their own perspective. Certain
data aspects which are critical to one stakeholder may not be critical for others
Data Warehouses are intended to bring data from multiple sources (incl. historical data) and store it centrally; hence they inherit a variety of data quality issues consolidated from different sources
Acceptance that not all data can be managed with the same level of rigor, cost and resource expense is important in driving the data quality effort
Identification of Critical/Key Data Elements is the critical first step in positioning business and IT to meet the data governance objectives of internal and external stakeholders, regulators and others
FR Y-9C Analysis: 90 KDEs analyzed & identified, 1,835 report line items, 3,200 Fed edit checks
However, there is an increasing need for CDOs to effectively manage data and make it fit for use in the organization for:
o Regulatory Compliance such as BCBS 239, Basel, EPS
o Financial Impact
o Business Impact
o Customer Impact
The key is to find the right balance and determine what data is critical to manage and focus energy and resources in governing that data
The Data Management Journey
Define DG Target Operating Model & Roadmap
Establish Foundation & Initiate Program
Initiate Stakeholder Engagements
Implement Data Governance over the normal course of business
Scale across additional Data Domains & bring data under control
Items to consider
In order to improve data quality and effectively manage data, it is important to consider the following factors:
Understand the usage of data
Identify where data has major impact – Financial, Regulatory, Business, Customer
In most organizations where data is consolidated from various systems into a data warehouse to achieve specific decision-making objectives, it is quite possible to find several data quality issues
Before thinking of enabling data quality controls, it is important to identify key areas of focus, then articulate
data quality dimensions pertinent to your quality use cases and how you will measure them
Although data quality issues may be identified at downstream systems such as a data warehouse, it is always
important to address the issues at origination, making business and data owners accountable
Data quality programs approached from the business point of view have been sustained. Most data quality programs that approached data quality from a purely technical point of view have failed, owing to the overwhelming number of data quality issues to resolve, a shortage of resources, and the loss of passion among the key drivers and stakeholders over time.
Provide a forum for users to share data quality issues across organizations. This can be done by providing
users an ability to log in to an issue management system via a front end application. This allows for a shared
perspective of how users view the data and enhance standards/controls on relevant data
Establish Framework & Process
Establish a framework to approach data quality with the following objectives:
Establish Organization Structure - Identify Key Stakeholders
Identify Key Business Elements – Business focused
Explore Critical Data Elements (at the system level)
Define Linkage
Define Data Quality Controls
Determine Metrics
Enable Data Quality Monitoring
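As an illustration of the "Define Data Quality Controls", "Determine Metrics" and "Enable Data Quality Monitoring" steps, the sketch below computes two common data quality metrics for one data element and checks them against a threshold. The choice of metrics (completeness and uniqueness), the function names and the 0.95 threshold are assumptions made for this example; the slides do not prescribe any of them.

```python
from typing import Iterable, List, Optional


def completeness(values: Iterable[Optional[str]]) -> float:
    """Share of values that are populated (not None and not blank)."""
    items: List[Optional[str]] = list(values)
    if not items:
        return 1.0  # assumption: an empty column is treated as trivially complete
    populated = sum(1 for v in items if v is not None and v.strip() != "")
    return populated / len(items)


def uniqueness(values: Iterable[Optional[str]]) -> float:
    """Share of populated values that are distinct."""
    populated = [v for v in values if v is not None and v.strip() != ""]
    if not populated:
        return 1.0  # assumption: nothing populated means nothing duplicated
    return len(set(populated)) / len(populated)


def control_passes(metric: float, threshold: float = 0.95) -> bool:
    """A control passes when the metric meets or exceeds its threshold."""
    return metric >= threshold


# Hypothetical sample of a customer identifier column.
customer_ids = ["A1", "A2", None, "A2"]
print("completeness:", completeness(customer_ids))
print("uniqueness:", uniqueness(customer_ids))
print("completeness control passes:", control_passes(completeness(customer_ids)))
```

In practice each Key Data Element would carry its own metrics and thresholds, agreed with the business and data owners.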
Key Data Elements & Prioritization
Why defining Key Data Elements is an important first step in enabling Data Quality controls
Deep Dive into Key Data Elements
o Identification
o Prioritization
Data Quality Program Establishment – Key Data Elements
Note: The above representation is an example only
Key Data Elements (KDE)
Identifying KDEs:
Prepare Core Team with Business and IT members who are close to Business processes, systems and data
Prepare questionnaire based on the criteria for determining KDEs to evaluate data elements against KDE qualification
KDEs are identified and then prioritized for business importance / impact: Tier 1, Tier 2 and Tier 3
1 Inventory Data Domains: Review Target Operating Model; Determine data domains in scope
2 Determine Subject Areas: Review Enterprise Information Model & subject areas in scope
3 Inventory Candidate Data Elements: Review business glossary & STTM models for candidate data elements; Determine data elements in scope for BIOs
4 Evaluate Data Elements and Identify KDEs: Review data lineage to identify key data elements; Determine DQ metrics, data definition, unique ID; Determine DQ measurement location (at source, at target)
5 Prioritize KDEs: Evaluate breadth of reach, business impact, visibility and severity of failure for KBE; Establish data quality rules against metrics; Publish prioritized KBEs
Define KDEs, Evaluate & Measure, Remediate
KDE Identification Criteria
The following identification criteria provide a guideline for consideration by the Data Stewards when identifying KDEs:
Dimension | Criteria | Definition
Visibility | Regulator / Auditor Identification | Data element has been specifically mentioned by internal or external auditors/regulators as a key data element, the quality of which has been mentioned as a potential risk factor
Visibility | BCBS Key Risk Indicators / Regulatory Risk | Data element has a significant regulatory risk impact as per the BCBS Key Risk Indicators
Visibility | Materiality per Reporting | Data element is frequently used for internal and external reporting pertaining to a portfolio considered as material to business
Visibility | Key Business Investment / Relationship Decisions | Data element is important for key business decisions around investments and client relationship management
Severity of Failure | Level of Controls | Data element is required for implementing effective controls associated with inherent risk in key business processes (potentially manual in nature)
Severity of Failure | Unique Identifier / Segmentation | Data element is used as a unique identifier or as a segmentation, contextual, or filtering variable that is important to the Data Domain
Severity of Failure | Relationship to Other KDEs | Data element is used as an input to “derive” (i.e., transformed or aggregated to measure specific business outcome) a DE that is critical to business
Breadth of Reach | Multiple Business Process | Data element is created or consumed by multiple business processes within the data domain, and must be correct & consistent across systems for accurate reporting
Other | Expert Judgment | Identification as KDE by a business SME within the Data Domain
KDE Prioritization Criteria
A comprehensive risk-based approach for KDE prioritization has been developed to drive the highest “Business Value” across the following dimensions:
Visibility (Material impact to business)
Severity of failure (Likelihood of inherent risk)
Breadth of reach (Usage across business processes and domains)
Prioritization Criteria
Dimension: Visibility
Related Identification Criteria: Regulator / Auditor Identification; BCBS Key Risk Indicators / Regulatory Risk; Materiality per Reporting; Key Business Investment / Relationship Decisions
Tier 1 (3 pts): The KDE is leveraged for external reporting of financial / operating performance or regulatory requirements pertaining to a portfolio considered as material to business
Tier 2 (2 pts): The KDE is leveraged for internal financial or operating performance reporting
Tier 3 (1 pt): The KDE is leveraged for the management of operational activities
Dimension: Severity of Failure
Related Identification Criteria: Level of Controls; Unique Identifier / Segmentation; Relationship to Other KDEs
Tier 1 (3 pts): Failure in KDE data quality will result in significant risk to the enterprise including customer / market share loss
Tier 2 (2 pts): Potential for increased risk resulting from deficiencies in KDE data quality
Tier 3 (1 pt): Impact from deficiencies in data quality is limited or none
Dimension: Breadth of Reach
Related Identification Criteria: Multiple Business Process
Tier 1 (3 pts): The KDE is used for reporting and/or decision making across more than three Data Domains or business processes
Tier 2 (2 pts): The KDE is used for reporting and/or decision making across two or three Data Domains or business processes
Tier 3 (1 pt): The KDE is used for reporting and/or decision making within one Data Domain or business process
The points for each rank will be added up to calculate the total prioritization score of the KDE:
For KDEs with total scores ranging from 7 to 9: Priority Tier 1 KDEs
For KDEs with total scores ranging from 5 to 6: Priority Tier 2 KDEs
For KDEs with total scores ranging from 3 to 4: Priority Tier 3 KDEs
The long-term strategic implementation around data governance and data management of KDEs should also consider the “Implementation Complexity” by evaluating:
Required cost and resources
Number of dependencies
Overall implementation risk
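The scoring rule on this slide (3, 2 or 1 points per dimension, summed into a total between 3 and 9 and mapped to a priority tier) can be sketched as follows. This is a minimal illustration only: the class and function names and the example data element are hypothetical, and the sketch assumes every KDE is ranked on exactly the three dimensions shown.

```python
from dataclasses import dataclass

# Points per rank, as stated on the slide: Tier 1 = 3 pts, Tier 2 = 2 pts, Tier 3 = 1 pt.
POINTS_BY_RANK = {1: 3, 2: 2, 3: 1}


@dataclass
class KdeRanking:
    """Rank (1, 2 or 3) assigned to one KDE on each prioritization dimension."""
    name: str  # hypothetical data element name
    visibility: int
    severity_of_failure: int
    breadth_of_reach: int


def total_score(ranking: KdeRanking) -> int:
    """Add up the points for the three dimension ranks."""
    return (
        POINTS_BY_RANK[ranking.visibility]
        + POINTS_BY_RANK[ranking.severity_of_failure]
        + POINTS_BY_RANK[ranking.breadth_of_reach]
    )


def priority_tier(score: int) -> int:
    """Map a total score to a priority tier using the slide's ranges."""
    if 7 <= score <= 9:
        return 1
    if 5 <= score <= 6:
        return 2
    if 3 <= score <= 4:
        return 3
    raise ValueError(f"score {score} is outside the expected range 3 to 9")


# Hypothetical example: Tier 1 on visibility and breadth, Tier 2 on severity.
example = KdeRanking("counterparty_id", visibility=1, severity_of_failure=2, breadth_of_reach=1)
score = total_score(example)
print(example.name, "score:", score, "priority tier:", priority_tier(score))
```

Because each dimension contributes at least 1 and at most 3 points, every valid ranking falls into exactly one of the three score ranges.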
Storage – Moving beyond single source of data
Why are volume and storage also important aspects to consider besides Data Quality & KDEs?
Increasing Volumes and Cost of Storage
The original objective of ECM (Enterprise Content Management) was to consolidate corporate information into a single, central
repository. Although this was the philosophy, most organizations end up with multiple repositories driven by adoption of different
vendor technologies or acquisitions/mergers leading to increase in these data silos.
2009: 800,000 petabytes
2020: 35 zettabytes
44x as much data & content over the coming decade
80% of the world’s data is unstructured
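The 44x figure is consistent with the two volumes quoted on the slide, as the short check below shows. It assumes decimal (SI) prefixes, so that one zettabyte equals one million petabytes; the variable names are illustrative.

```python
# Assumption: SI prefixes, so 1 zettabyte = 10**21 bytes and 1 petabyte = 10**15 bytes.
PETABYTES_PER_ZETTABYTE = 10**6

volume_2009_pb = 800_000  # petabytes, from the slide
volume_2020_zb = 35       # zettabytes, from the slide

# Express the 2020 volume in petabytes, then divide by the 2009 volume.
growth_factor = (volume_2020_zb * PETABYTES_PER_ZETTABYTE) / volume_2009_pb

print(f"growth factor: {growth_factor} (about {round(growth_factor)}x)")
```

The exact ratio is 43.75, which rounds to the 44x shown on the slide.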
The key objectives today for any organization are:
Managing the cost of storage (lowering TCO)
Real Time/Near Real Time integrated, actionable
data made available to the members
Cloud allows enterprises now to have scalable infrastructures that allow
them to expand their storage to massive volumes with low TCO.
Private and Hybrid Clouds now enable organizations to store a variety of data from internal as well as external sources, independent of structure, in a single virtual repository.
With the advent of the web, smartphones, tablets & IoT, people became information creators and consumers, leading to multiplied data sources and increased challenges of integration
Moving beyond single source of data
Solutions such as Data Lake have lately become more prominent in allowing a variety of data to be ingested into a structure-independent repository that the organization can leverage and use as it sees fit.
Enable appropriate controls on the relevant data and in context of usage
Reconcile across various sources to ensure they are consistent with the system of record and support reporting accuracy
Catalog metadata enabling users to fetch right and relevant data via indexing and search capabilities with proper security
enforcement
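The reconciliation control mentioned above can be sketched as a key-by-key comparison of a source against the system of record. This is a minimal illustration under stated assumptions: records are held in memory as dictionaries keyed by a unique identifier, and the function name, field names and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def reconcile(
    system_of_record: Dict[str, Record],
    source: Dict[str, Record],
    fields: List[str],
) -> Dict[str, list]:
    """Compare a source against the system of record on the given fields."""
    missing = sorted(k for k in system_of_record if k not in source)
    unexpected = sorted(k for k in source if k not in system_of_record)
    mismatched: List[Tuple[str, str, object, object]] = []
    for key in sorted(system_of_record.keys() & source.keys()):
        for field in fields:
            expected = system_of_record[key].get(field)
            actual = source[key].get(field)
            if expected != actual:
                mismatched.append((key, field, expected, actual))
    return {
        "missing_in_source": missing,
        "unexpected_in_source": unexpected,
        "mismatched": mismatched,
    }


# Hypothetical sample data: a system of record and a downstream data lake extract.
sor = {
    "C1": {"country": "US", "rating": "A"},
    "C2": {"country": "FR", "rating": "B"},
    "C3": {"country": "CA", "rating": "A"},
}
lake = {
    "C1": {"country": "US", "rating": "A"},
    "C2": {"country": "FR", "rating": "C"},
    "C4": {"country": "BR", "rating": "B"},
}
print(reconcile(sor, lake, fields=["country", "rating"]))
```

At real volumes this comparison would typically run inside the data platform rather than in memory, but the three outputs (missing, unexpected and mismatched records) are the same.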
With the increasing availability of data virtualization and big data technologies, enterprises no longer need to wrangle with the challenges of creating a single integrated repository.
These technologies provide virtual integrated environments that allow users from various departments and business lines to:
Ingest data from multiple sources
Have a unified view of data
Use metadata capabilities to link the sources for meaningful search and retrieval
Although these virtual integration layers provide opportunities, they also present challenges. Hence there is a need for proper governance, such as the controls, reconciliations and metadata cataloging listed above