©2016 FI Consulting. All rights reserved.
DATA ESSENTIALS FOR ANALYTICS AND MODELING
September 14th, 2016
Why Does Data Matter?
“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”
– Jim Barksdale, Former CEO of Netscape
Why Does Data Matter?
25% of critical data at the world's top companies is flawed
37% were not satisfied with quality and availability of appropriate data for BI and Analytics
Bad data costs U.S. businesses $600 billion a year.
The Art of Data Curation
Data Lifecycle
Collect → Prepare → Store → Analyze/Report → Decide
80% of Time
Early Stages of Data Analysis
Legend: C = Collect, P = Prepare, S = Store, A = Analyze/Report, D = Decide
Operations: C P S A D
Finance: C P S A D
Compliance: C P S A D
Centralized Data Management
Legend: C = Collect, P = Prepare, S = Store, A = Analyze/Report, D = Decide
Shared across business units: C P S
Operations: A D
Finance: A D
Compliance: A D
Audience Poll #1
What framework best represents the current state of your organization’s data analysis?
A) We process data and perform analytics in siloes.
B) Our data is centralized, but business units perform significant additional data cleaning and preparation.
C) Data management is a centralized function.
D) All the data I use is on my desktop.
Challenges
Data Proximity and Ownership
Complexity vs. Flexibility
Continued Duplication of Efforts
Keys to Success
Data Governance
Transparency
Communication
Data Curation in the Age of Big Data
Audience Poll #2
Does your organization currently leverage big data?
A) Yes, we use big data in some analysis
B) Yes, we use big data in all of our analysis
C) Yes, but we are just getting started
D) No, we do not currently use big data
Big Data
“Datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze”
“The ability of society to harness information in novel ways to produce useful insights or goods and services of significant value” and “…things one can do at a large scale that cannot be done at a smaller one, to extract new insights or create new forms of value.”
http://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#35b0bfc221a9
How big is a petabyte?
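For a rough sense of scale, the arithmetic can be sketched as follows. It assumes decimal (SI) units, where one petabyte is 10^15 bytes; the 5 MB photo size is a hypothetical figure chosen purely for illustration.

```python
# Decimal (SI) storage units; the binary pebibyte (2**50 bytes) is slightly larger.
BYTES_PER_GB = 10**9
BYTES_PER_TB = 10**12
BYTES_PER_PB = 10**15

terabytes_per_pb = BYTES_PER_PB // BYTES_PER_TB
gigabytes_per_pb = BYTES_PER_PB // BYTES_PER_GB

# Hypothetical example: how many 5 MB photos fit in one petabyte?
PHOTO_BYTES = 5 * 10**6  # assumed photo size, for illustration only
photos_per_pb = BYTES_PER_PB // PHOTO_BYTES

print(f"1 PB = {terabytes_per_pb:,} TB = {gigabytes_per_pb:,} GB")
print(f"1 PB holds about {photos_per_pb:,} photos of 5 MB each")
```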
Big Data
Key enabler of big data is Hadoop
“parallelizes large data sets across low‐cost commodity hardware for easy scale and dramatically reduces the cost of petabyte environments.”
http://www.forbes.com/sites/ciocentral/2012/04/16/the-big-cost-of-big-data/#4afd59456a21
Big Data + Data Curation
[Diagram labels]
Hadoop / Unstructured Data: Email Data, Call Center Data, Mobile Clickstream Data, Product Data
Data Scientists & Analysts: R, Hive, Python
Learnings from Data Science
Summary & Aggregation Models
Business User Analytical Stores
BI/Data Visualization Tools
Business Users
http://analytics-magazine.org/theres-no-such-thing-as-unstructured-data/
Data Curation
The process of turning independently created data sources (structured and semi‐structured data) into unified data sets ready for analytics, using domain experts to guide the process.
Why is data curation important?
Data is growing in size and complexity
Data in siloes spread throughout organization
Time and money spent on data preparation
Need to answer business questions more quickly
Self-service analytics
Credit risk analyst needs additional metric for analysis
Permission to database
Different unique identifier for borrower
Nuances to data aggregation
With data curation process…
Dataset identified as potentially useful since it has borrower financial information
Borrower unique identifier has been standardized across database
Data dictionary provides information on how data should be aggregated
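A minimal sketch of what a standardized borrower identifier makes possible. The two sources, their field names (`borrower_id`, `balance`, `annual_income`), and the normalization rule (strip punctuation, upper-case, drop leading zeros) are all hypothetical; a real curation process would take such rules from its own data dictionary.

```python
import re

def standardize_borrower_id(raw_id: str) -> str:
    """Normalize an identifier: keep letters/digits, upper-case, drop leading zeros."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", raw_id).upper()
    return cleaned.lstrip("0") or "0"

# Hypothetical loan system and borrower financials that format the same ID differently.
loans = [
    {"borrower_id": "00123-ab", "balance": 250_000},
    {"borrower_id": "456cd", "balance": 90_000},
]
financials = [
    {"borrower_id": "123AB", "annual_income": 85_000},
    {"borrower_id": "0456-CD", "annual_income": 40_000},
]

# Index the financials by the standardized key.
income_by_id = {
    standardize_borrower_id(row["borrower_id"]): row["annual_income"]
    for row in financials
}

# Attach the additional metric to each loan via the standardized key.
joined = []
for loan in loans:
    key = standardize_borrower_id(loan["borrower_id"])
    joined.append({
        "borrower_id": key,
        "balance": loan["balance"],
        "annual_income": income_by_id.get(key),  # None when no match exists
    })

for row in joined:
    print(row)
```

Once both sources agree on the key, the analyst's extra metric becomes a simple lookup rather than a one-off reconciliation exercise.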
Challenges to an effective data curation process
Data Integration
Separate business units / legal entities
Firms acquiring other firms
Data quality from legacy systems
Curating Golden Source Data
Data Documentation
Essential part of data governance
Mitigates two types of risks:
1. Misinterpret and misuse the data
2. Unaware of data availability
Data Documentation
Data Dictionary:
  Plain language definitions
  Data types (text, numeric, date, etc.)
  Valid values
Source‐to‐Target Mapping:
  Trace to original source
  Show transformations
  Maintain transparency and accuracy
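One way to make a data dictionary and a source-to-target mapping machine-readable is sketched below. Every field name, source column, and transformation here is hypothetical and chosen for illustration; the slide does not prescribe a format.

```python
from datetime import date, datetime

# Hypothetical data dictionary: plain-language definition, data type, valid values.
DATA_DICTIONARY = {
    "loan_status": {
        "definition": "Current repayment status of the loan",
        "type": str,
        "valid_values": {"CURRENT", "DELINQUENT", "DEFAULT"},
    },
    "origination_date": {
        "definition": "Date the loan was funded",
        "type": date,
        "valid_values": None,  # any date is accepted
    },
}

# Hypothetical source-to-target mapping: target field -> (source column, transformation).
SOURCE_TO_TARGET = {
    "loan_status": ("LN_STAT_CD", lambda v: v.strip().upper()),
    "origination_date": ("ORIG_DT", lambda v: datetime.strptime(v, "%m/%d/%Y").date()),
}

def map_record(source_row: dict) -> dict:
    """Apply each documented transformation, then check the result against the dictionary."""
    target = {}
    for field, (source_col, transform) in SOURCE_TO_TARGET.items():
        value = transform(source_row[source_col])
        spec = DATA_DICTIONARY[field]
        if not isinstance(value, spec["type"]):
            raise TypeError(f"{field}: expected {spec['type'].__name__}")
        if spec["valid_values"] is not None and value not in spec["valid_values"]:
            raise ValueError(f"{field}: {value!r} is not a valid value")
        target[field] = value
    return target

print(map_record({"LN_STAT_CD": " current ", "ORIG_DT": "09/14/2016"}))
```

Keeping the mapping as data rather than burying it in scripts means each target field can be traced back to its source column and transformation.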
Audience Poll #3
Which statement best describes your use of data dictionaries?
A) Most data sources have data dictionaries that are well‐documented and regularly used
B) We have data dictionaries, but they are unhelpful or outdated
C) Data dictionaries do not exist, but would be helpful
D) My work does not require data dictionaries
Data Quality Assurance
Formally defined and documented
Performed closer to source
Automated in ETL tools
Exception handling
Leverage Data Dictionaries and STMs
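The points above might translate into an automated check along the following lines: rules are declared once, run as data is loaded, and failing rows are routed to an exception list instead of being silently dropped. The rule names, fields, and thresholds are hypothetical.

```python
# Hypothetical quality rules: (rule name, predicate that must hold for a valid row).
RULES = [
    ("balance_non_negative", lambda row: row["balance"] >= 0),
    ("rate_in_range", lambda row: 0 < row["interest_rate"] < 1),
    ("borrower_id_present", lambda row: bool(row.get("borrower_id"))),
]

def run_quality_checks(rows):
    """Split rows into clean records and exceptions annotated with the failed rules."""
    clean, exceptions = [], []
    for row in rows:
        failed = []
        for name, predicate in RULES:
            try:
                ok = predicate(row)
            except (KeyError, TypeError):
                ok = False  # a missing or mistyped field counts as a failure
            if not ok:
                failed.append(name)
        if failed:
            exceptions.append({"row": row, "failed_rules": failed})
        else:
            clean.append(row)
    return clean, exceptions

rows = [
    {"borrower_id": "123AB", "balance": 250_000, "interest_rate": 0.045},
    {"borrower_id": "", "balance": -10, "interest_rate": 0.05},
    {"borrower_id": "789EF", "balance": 1_000},  # interest_rate is missing
]
clean, exceptions = run_quality_checks(rows)
print(len(clean), "clean;", len(exceptions), "exceptions")
for exc in exceptions:
    print(exc["failed_rules"])
```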
Data Merging, Enrichment, Deduplication
Combine data from multiple sources
Exact matching by common elements
Fuzzy matching
  No unique identifier (e.g. SSN)
  Names and addresses
  Spelling, punctuation, and abbreviations (J. Smith vs. John Smith vs. John D. Smith)
  Requires a minimum certainty threshold
Remove duplicates
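A minimal sketch of threshold-based fuzzy matching, using the standard library's `difflib.SequenceMatcher` as a stand-in for whatever matching engine a real ETL tool provides. The 0.8 threshold and the sample names are assumptions for illustration. Name similarity alone can also merge distinct people, so a production process would typically compare addresses or other fields as well.

```python
import re
from difflib import SequenceMatcher

def normalize(name: str) -> str:
    """Lower-case and strip punctuation so 'J. Smith' compares as 'j smith'."""
    return re.sub(r"[^a-z ]", "", name.lower()).strip()

def similarity(a: str, b: str) -> float:
    """Similarity score between 0.0 and 1.0 on the normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def deduplicate(names, threshold=0.8):
    """Keep a name only if it scores below the threshold against every name already kept."""
    kept = []
    for name in names:
        if all(similarity(name, existing) < threshold for existing in kept):
            kept.append(name)
    return kept

names = ["John Smith", "J. Smith", "John D. Smith", "Mary Jones"]
for name in names[1:]:
    print(f"{name!r} vs 'John Smith': {similarity(name, 'John Smith'):.2f}")
print(deduplicate(names))
```

This greedy pass depends on input order, since each name is compared only against those already kept; raising the threshold reduces false merges at the cost of missing more true duplicates.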
Feedback Loop for User‐Driven Enhancements
Two‐way process
End users perform high‐value analytics (e.g. data mining, network graphs, clustering, segmentation, model diagnostics)
Detect outliers and anomalies
Incorporate changes at source
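As one illustration of the outlier detection step, an end user might screen a field with the interquartile-range (IQR) rule and report the flagged values back to the data team. The 1.5 multiplier is the conventional default for this rule, and the sample balances are hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Hypothetical balances; the last value looks like a data-entry error.
balances = [120, 135, 128, 140, 131, 125, 138, 9_999]
print(iqr_outliers(balances))
```

A flagged value is only a candidate anomaly; whether it is corrected at the source is a decision for the data owner.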
Tools For Your Easel
Databases
Programming Languages
ETLs (Extract – Transform – Load)
What’s next?
Early Stage, Siloed Data Analysis: Data Assessment
Moving towards Centralized Data Management: Implement Data Curation
Centralized Data Management In Place: Data Governance Review
Q&A
Speakers
Christina Prevalsky
Senior [email protected]
Greg Steck
Senior [email protected]
Vadim Bondarenko
Senior [email protected]
FI Consulting, Inc.
1500 Wilson Blvd
Arlington, VA 2009
(571) 255 - 6900
www.ficonsulting.com