how to source good data
TRANSCRIPT
Sourcing Good Data: 10 best practices
Welcome
Agenda:
Why is data quality important?
Our 10 best practices
Data Quality Story
Overbooked 10,000 tickets for event
Manual spreadsheet error
- telegraph.co.uk
Your data has reach…
Where data from a report is used: inter-departmental 69%, within department 31%
% of data in spreadsheets that influences CEO: 42%
* Panko and Port, 2012
Just how much of an issue is data quality?
1 in 10 organisations rate their data quality as “excellent”
Poor data quality accounts for 20% of business process costs
$611bn The cost of poor data quality to US companies each year
* Gartner, TDWI
And we want more…
2009 – enough data to fill a stack of DVDs to the moon and back
2020 – Grow by 44x
Less than 1% of available data is analysed
93% of execs believe they are losing revenue as a result of not fully leveraging the information they collect
* IDC, Oracle and EMC
What is data quality?
HOW RELIABLE IS YOUR DATA?
TRUSTED AND CREDIBLE
Complete
Accurate
Available
Consistent
Why is data quality important?
“It gives us accurate and timely information to manage our business”
“It supports accountability”
“It ensures the best use of our resources”
“It increases our efficiency”
“It reduces the cost of rework”
“It can increase customer satisfaction”
“It ensures we have the best possible understanding of our customers and employees”
“It improves the success rate of enterprise initiatives like Business Intelligence…”
Building high quality “supply chains” of data
MEASURE FOR QUALITY
GET THE RIGHT DATA
BE AGILE
1 Focus on the outcome
Issues:
Analysis paralysis
Letting data dictate what is “important”
Limited time and energy to focus
1 Focus on the outcome
Recommendations:
Start with the outcome… then the data.
Focus on what matters
2 Profile your data
Issues:
Data supplier doesn’t know your data needs
The data you source is as good as the information you provide to the supplier…
2 Profile your data
Recommendations:
Write your data profile: structure, format, frequency, age, delivery method
Communicate it to data providers
Opportunity to identify issues and gaps
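A data profile can also be written down in a form that is checked automatically when a delivery arrives. The sketch below is illustrative only: the `DataProfile` fields mirror the five items on the slide, while the column names, the `check_delivery` helper and the example values are assumptions, not part of the original talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataProfile:
    """What we ask a data provider to deliver (all values are illustrative)."""
    columns: tuple        # structure: expected column names
    file_format: str      # format, e.g. "csv"
    frequency: str        # how often data is delivered, e.g. "daily"
    max_age_days: int     # age: oldest acceptable data
    delivery_method: str  # e.g. "sftp"

def check_delivery(profile, columns, file_format, age_days):
    """Return a list of gaps between one delivery and the agreed profile."""
    issues = []
    missing = [c for c in profile.columns if c not in columns]
    if missing:
        issues.append(f"missing columns: {missing}")
    if file_format != profile.file_format:
        issues.append(f"format is {file_format}, expected {profile.file_format}")
    if age_days > profile.max_age_days:
        issues.append(f"data is {age_days} days old, limit is {profile.max_age_days}")
    return issues

# Hypothetical profile and a delivery that is missing a column and too old.
profile = DataProfile(("customer_id", "order_date", "amount"), "csv", "daily", 1, "sftp")
for issue in check_delivery(profile, ["customer_id", "amount"], "csv", 3):
    print(issue)
```

Sharing the same profile with the provider gives both sides one definition to check against.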
3 Get as close to the source as possible
Issues:
When your source data is somebody else’s spreadsheet…
Human error risk
Unexpected changes
Additional effort and complexity
Availability of data
3 Get as close to the source as possible
Recommendations:
CAUTION: Be cautious of manual spreadsheets
Skip the spreadsheet as a source
PLAN: Communicate and measure for quality
4 Streamline data sources
Issues:
Using multiple sources
Redundant data
Increased complexity and quality risk
4 Streamline data sources
Recommendations:
Identify redundant data
Focus on the essentials
Cut out the stuff you don’t need
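One way to identify redundant data is to list which fields arrive from more than one source. This is a minimal sketch; the source names and field names are made up for illustration, and real redundancy checks would also compare the values, not just the field names.

```python
from collections import defaultdict

def find_redundant_fields(sources):
    """Map each field supplied by more than one source to those sources."""
    providers = defaultdict(set)
    for source_name, fields in sources.items():
        for field in fields:
            providers[field].add(source_name)
    return {f: sorted(s) for f, s in providers.items() if len(s) > 1}

# Hypothetical inventory of what each source supplies.
sources = {
    "crm_export": ["customer_id", "email", "region"],
    "billing_system": ["customer_id", "amount", "email"],
    "sales_spreadsheet": ["region", "amount"],
}

for field, names in sorted(find_redundant_fields(sources).items()):
    print(field, "<-", ", ".join(names))
```

Each field in the result is a candidate for picking one authoritative source and cutting the rest.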
5 Set data quality expectations
Issues:
Perfectionism burnout
You can’t expect to focus on everything
5 Set data quality expectations
Recommendations:
Focus on high impact data
Employ tolerances and ranges for quality and accuracy
RELAX (a little)
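Tolerances and ranges can be expressed as simple checks, so that small deviations pass and only material ones are flagged. The 2% tolerance and the example figures below are assumptions chosen for illustration.

```python
def within_tolerance(actual, expected, tolerance=0.02):
    """True if actual is within a relative tolerance of expected (2% by default)."""
    return abs(actual - expected) <= tolerance * abs(expected)

def within_range(value, low, high):
    """True if value lies in the inclusive range [low, high]."""
    return low <= value <= high

# A daily total that is 1.5% off the control figure is accepted; 5% off is not.
print(within_tolerance(1015, 1000))
print(within_tolerance(1050, 1000))
# A plausible-age check using an assumed valid range of 0 to 120.
print(within_range(37, 0, 120))
```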
6 Catch data quality issues early
Issues:
1-10-100 Rule:
$1 if found at the start of the journey (early)
$10 if found in the middle of the journey
$100 if found at the end of the journey (late)
* Total Quality Management
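The 1-10-100 rule can be turned into a quick worked example. The unit costs come from the slide; the issue counts are invented purely to show the arithmetic.

```python
# Relative cost of fixing one issue, by where in the journey it is found.
COST_PER_ISSUE = {"start": 1, "middle": 10, "end": 100}

def total_cost(issues_found):
    """Total cost for a dict of {stage: number of issues found at that stage}."""
    return sum(COST_PER_ISSUE[stage] * count for stage, count in issues_found.items())

# The same 50 issues, caught early versus caught late (hypothetical counts).
caught_early = {"start": 50}
caught_late = {"end": 50}
print(total_cost(caught_early))  # 50
print(total_cost(caught_late))   # 5000
```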
6 Catch data quality issues early
Recommendations:
Implement quality measures near the start of the data supply chain
Use the “start” as a reference point when checking data further down the journey
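Using the “start” as a reference point could look like the sketch below: take a small snapshot (row count and a control total) where data enters the supply chain, then compare the same figures further down. The `amount` field and the sample rows are assumptions for illustration.

```python
def snapshot(rows, amount_key="amount"):
    """Summarise a batch of rows as a row count and a control total."""
    return {
        "row_count": len(rows),
        "amount_total": round(sum(r[amount_key] for r in rows), 2),
    }

def compare_to_start(start, later):
    """Return {measure: (start_value, later_value)} for every measure that changed."""
    return {k: (start[k], later[k]) for k in start if start[k] != later[k]}

# Hypothetical batch at the start of the journey, and the same batch downstream
# after one row was lost along the way.
at_start = [{"amount": 10.0}, {"amount": 20.5}, {"amount": 5.25}]
downstream = [{"amount": 10.0}, {"amount": 20.5}]

print(compare_to_start(snapshot(at_start), snapshot(downstream)))
```

An empty result means the downstream data still reconciles to the starting point.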
7 Actively measure quality
Issues:
No simple way to identify if data is correct
Invalid assumption: if the data meets our expectations today, it will continue to do so going forward
What happens when we do find an issue?
7 Actively measure quality
Recommendations:
[Graphic: quality rating labelled OK, GOOD, NOT GOOD]
Define metrics for your data quality
Measure for quality on a consistent basis
Address consistent issues with strategic solutions (e.g. data cleansing)
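A data quality metric can be as simple as the share of records with a required field filled in, rated against agreed thresholds. In this sketch the completeness metric, the 98% and 90% thresholds, and the `email` field are all assumptions; the three rating labels are the ones shown on the slide.

```python
def completeness(rows, field):
    """Fraction of rows where the field is present and not empty."""
    if not rows:
        return 0.0
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def rate(score, good=0.98, ok=0.90):
    """Translate a 0..1 score into a rating using hypothetical thresholds."""
    if score >= good:
        return "GOOD"
    if score >= ok:
        return "OK"
    return "NOT GOOD"

# Hypothetical batch: one of four records has no email address.
rows = [{"email": "a@x.com"}, {"email": ""}, {"email": "b@x.com"}, {"email": "c@x.com"}]
score = completeness(rows, "email")
print(score, rate(score))
```

Running the same measurement on every delivery is what turns it into a trend rather than a one-off check.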
8 Expect Change. Embrace It.
Issues:
We all know change is coming
Business activity, changes in strategies and systems
So rigid that you need to “reset”
8 Expect Change. Embrace It.
Recommendations:
[Graphic: likelihood vs. impact matrix, each axis running from low (L) to high (H)]
Focus on high likelihood/impact changes
Score and rank potential changes
Have a plan in place for high risk items
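Scoring and ranking potential changes can be sketched as likelihood multiplied by impact. The 1 to 5 scales, the example changes and the high-risk threshold of 12 are assumptions for illustration, not figures from the talk.

```python
HIGH_RISK_THRESHOLD = 12  # hypothetical cut-off on a 1..25 scale

def rank_changes(changes):
    """Score each (name, likelihood, impact) as likelihood * impact, highest first."""
    scored = [(name, likelihood * impact) for name, likelihood, impact in changes]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical potential changes, each rated 1 (low) to 5 (high).
changes = [
    ("Source system replaced", 2, 5),
    ("New column added to feed", 4, 2),
    ("Supplier changes file format", 4, 4),
]

for name, score in rank_changes(changes):
    flag = "plan needed" if score >= HIGH_RISK_THRESHOLD else "monitor"
    print(f"{score:>2}  {name}  ({flag})")
```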
9 Plan for change
Issues:
A change occurs, then what?
Lack of clear policies and rules on who needs to do what…
Knowledge resting in the minds of key individuals
9 Plan for change
Recommendations:
CAUTION: In the event of a change the following people will…
Policies and rules
Tracking changes
Documentation
10 Controlled human interaction
Issues:
Value of human interaction with data…
… at the cost of data quality
Uncontrolled manipulation of data
10 Controlled human interaction
Recommendations:
Avoid uncontrolled manipulation
Facilitate controlled and discrete changes
Make sure it is traceable
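A controlled, traceable change could be sketched as a single function that every manual correction must go through, recording who changed what and why. The record layout, the `apply_correction` helper and the example values are hypothetical.

```python
from datetime import datetime, timezone

def apply_correction(record, field, new_value, user, reason, audit_log):
    """Return a corrected copy of the record and append an entry to the audit log."""
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "field": field,
        "old_value": record.get(field),
        "new_value": new_value,
        "reason": reason,
    })
    updated = dict(record)  # the original record is left untouched
    updated[field] = new_value
    return updated

# Hypothetical correction of a single field, with the reason captured.
audit_log = []
original = {"customer_id": 42, "region": "Nroth"}
corrected = apply_correction(original, "region", "North", "jsmith", "typo fix", audit_log)
print(corrected)
print(audit_log[0]["old_value"], "->", audit_log[0]["new_value"])
```

Because each call changes exactly one field and leaves the source record intact, any correction can be reviewed or reversed later.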
Recap
1 Focus on the outcome
2 Profile your data
3 Get close to the source
4 Streamline data sources
5 Set data quality expectations
6 Catch data quality issues early
7 Measure quality
8 Expect and embrace change
9 Plan for change
10 Controlled human interaction
Thank You