how to recognize and fix data faults that can make or...
TRANSCRIPT
![Page 1: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/1.jpg)
Cary S. Shaw & Associates, LLC
How to Recognize
and Fix Data Faults
That Can Make or Break Analytics
Data
Interpretation
Sampling
By Cary S. Shaw & Associates, LLC
For further information contact (203)505-3180
© Copyright Cary S. Shaw 2013
For dissemination, permission required from copyright holder.
![Page 2: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/2.jpg)
Cary S. Shaw & Associates, LLC
Get the Data Quality Right
Business depends on data.
Whether you are:
Requestor
Developer
Producer
Interpreter
Analytics Consultant
Analytics Recipient
![Page 3: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/3.jpg)
Cary S. Shaw & Associates, LLC
We will cover:
Qualify
Sniff, Define, Frequency Check
Interpret
Lurk, Miss, Population
Sample
Methods, Application
Summary
Tools, Lessons
![Page 4: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/4.jpg)
Cary S. Shaw & Associates, LLC
How many products are in company’s
data base?
Marketing client sent request to IT
Detail result provided in Excel.
Marketing client relayed file: Answer: 1,000.
![Page 5: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/5.jpg)
Cary S. Shaw & Associates, LLC
How many products are in company’s
data base?
Marketing client sent request to IT
Detail result provided in Excel.
Marketing client relayed file: Answer: 1,000.
What happened: SQL Navigator timed out.
Real answer: 3,646.
![Page 6: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/6.jpg)
Cary S. Shaw & Associates, LLC
Spreadsheet data
received in this form.
Business Name Street City State Zip Phone
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx xxxxx (xxx)xxx-xxxx
xxxxxxxxxx xxxxxxxxxx xxxxx xx
![Page 7: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/7.jpg)
Cary S. Shaw & Associates, LLC
Assume data is in error.
Search for errant values.
Frequency tables
Outliers - Confidence intervals
Outliers - One variable graphed/estimated against
another
Referential Integrity (cross table connections within an
integrated database)
Sniff test
Don’t dismiss data if it is different; only if it is wrong.
![Page 8: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/8.jpg)
Cary S. Shaw & Associates, LLC
How unique are our customer phone
numbers?
Are we really reaching the person / business we intend?
Are we alienating by multiple calls?
Are we missing multiple clients?
Practical question: How frequently does the same phone
number appear on our customer contact file?
![Page 9: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/9.jpg)
Cary S. Shaw & Associates, LLC
Exploring Dividend Yields
Found company with 7.2 % dividend yield
What’s going on?
![Page 10: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/10.jpg)
Cary S. Shaw & Associates, LLC
Could a ‘9’ or ‘0’
mean something else?
![Page 11: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/11.jpg)
Cary S. Shaw & Associates, LLC
How do we want to select business
vs. home prospects?
H = Home
B = Business
And how do we want to convert this to a numeric
factor?
![Page 12: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/12.jpg)
Cary S. Shaw & Associates, LLC
Select B (business) or Reject H
(home)? Different impact.
Other values have impact
Is field up-to-date?
Is client (blank) different
from proprietary (blank)?
Is (blank) or misspelling a
proxy for other
characteristics?
How about Duns Industry
sector?
Business, Home Frequency
(blank) 98,947
B 5,032
b 159
Business 49
Busn 1
H 24,891
h 475
Home 63
Hume 1
Hme 11
![Page 13: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/13.jpg)
Cary S. Shaw & Associates, LLC
Riddles
Product Usage a predictor of retention
Optimum student loan amount
Effect of sales channel on retention
Longevity dependency on location
![Page 14: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/14.jpg)
Cary S. Shaw & Associates, LLC
Fitting a line: effect of product usage on retention
Large database
![Page 15: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/15.jpg)
Cary S. Shaw & Associates, LLC
Effect of Product Usage (Line fitted to
data points) Effect of Usage on Cancel Rate??
0 100 200 300 400 500
Meter Usage
Can
cel R
ate
Cancel Rate
Linear (Cancel Rate)
Cancel Rate = f(Usage) = constant
![Page 16: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/16.jpg)
Cary S. Shaw & Associates, LLC
Is ‘zero’ really ‘missing’? Impact
on bining and fitting. Effect of Usage on Cancel Rate??
0 100 200 300 400 500
Meter Usage
Can
cel R
ate
Cancel Rate
Linear (Cancel Rate)
![Page 17: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/17.jpg)
Cary S. Shaw & Associates, LLC
Student loans and loan delinquency
![Page 18: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/18.jpg)
Cary S. Shaw & Associates, LLC
Do small loans result in high
delinquencies? Dismissal factor? Effect of Loan Amount on Delinquency
0.0000%
1.0000%
2.0000%
3.0000%
4.0000%
5.0000%
6.0000%
7.0000%
2000 4000 6000 8000 10000 12000 14000 16000 18000 20000 22000 24000 26000
Loan Amount
De
lin
qu
en
cy
Rate
.
Loans in Collection
![Page 19: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/19.jpg)
Cary S. Shaw & Associates, LLC
1 out of 10 in the original
population
![Page 20: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/20.jpg)
Cary S. Shaw & Associates, LLC
Looks like 1 out of 4 in the
remaining population
![Page 21: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/21.jpg)
Cary S. Shaw & Associates, LLC
Watch the whole population of interest
(affects the denominator) Effect of Loan Amount on Delinquency
0.0000%
1.0000%
2.0000%
3.0000%
4.0000%
5.0000%
6.0000%
7.0000%
2000 4000 6000 8000 10000 12000 14000 16000 18000 20000 22000 24000 26000
Loan Amount
De
lin
qu
en
cy
Rate
.
Loans in Collection
Original Loans in Collection
![Page 22: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/22.jpg)
Cary S. Shaw & Associates, LLC
Where you live can affect your life span.
Connecticut ?
![Page 23: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/23.jpg)
Cary S. Shaw & Associates, LLC
Where is Mortality Lowest? Mortality (Prob of Dying Next Yr) As % of U.S. Rate
Alaska............, 57.0
Connecticut......., 102.5
50.0
60.0
70.0
80.0
90.0
100.0
110.0
Reported Mortality Index
Ind
ex (
US
= 1
00)
Alaska............
Connecticut.......
![Page 24: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/24.jpg)
Cary S. Shaw & Associates, LLC
Is Connecticut Mortality Above Avg? Connecticut Mortality (Prob of Dying Next Yr) As % of U.S. Rate
Reported Total
Under 1
1-4
5-14 15-24
25-34
35-44
45-54
55-64
65-74
75-84
85 years
50.0
60.0
70.0
80.0
90.0
100.0
110.0
Age Group
Ind
ex (
US
=100)
Connecticut.......
![Page 25: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/25.jpg)
Cary S. Shaw & Associates, LLC
Compare with Alaska by Age Mortality (Prob of Dying Next Yr) As % of U.S. Rate
Under 1
1-4
5-14
15-24
25-34
35-44
45-54
55-64
65-7475-84
85 years
Under 1
1-4
5-14 15-24
25-34 35-4445-54 55-64
65-7475-84
85 years
0.0
20.0
40.0
60.0
80.0
100.0
120.0
140.0
160.0
180.0
200.0
Age Group
Ind
ex (
US
=1
00
)
Alaska............
Connecticut.......
![Page 26: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/26.jpg)
Cary S. Shaw & Associates, LLC
Compare Observed with Age-Adjusted Mortality Rates by State as Percent of U.S.
40.0
60.0
80.0
100.0
120.0
140.0
160.0
180.0
50.0 60.0 70.0 80.0 90.0 100.0 110.0 120.0 130.0 140.0 150.0
Reported Mortality Index
Ag
e A
dju
ste
d M
ort
ality
In
dex
Series1
Linear (Series1)
Alaska
CT
![Page 27: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/27.jpg)
Cary S. Shaw & Associates, LLC
Compare Rankings State Ranks Compared
0
13
26
39
52
0 13 26 39 52
Reported State Rank
Ad
jus
ted
Sta
te R
an
k
RankAdjusted
Linear (RankAdjusted)
Alaska
CT
![Page 28: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/28.jpg)
Cary S. Shaw & Associates, LLC
For full table identifying mortality rate in each state,
contact Cary S. Shaw & Associates, LLC
![Page 29: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/29.jpg)
Cary S. Shaw & Associates, LLC
Riddles
Product Usage a predictor of retention
Optimum student loan amount
Effect of sales channel on retention
Longevity dependency on location
![Page 30: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/30.jpg)
Cary S. Shaw & Associates, LLC
Lessons
Separate cause and effect by chronology.
Find the initial populations under study.
Distinguish between missing and zero values.
Search for significant lurking variables.
Creatively test your own ideas.
![Page 31: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/31.jpg)
Cary S. Shaw & Associates, LLC
Why Sample?
Examine large databases
Rich on detail, economical on resources
Check own analytics
Cross check IT
Cross check vendors
May enable GREATER accuracy because
Enables correction of representative points
Enables understanding of representative points
![Page 32: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/32.jpg)
Cary S. Shaw & Associates, LLC
Which sample is most
representative?
Beginning Middle
![Page 33: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/33.jpg)
Cary S. Shaw & Associates, LLC
Which sample is most
representative?
Beginning Middle Random
![Page 34: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/34.jpg)
Cary S. Shaw & Associates, LLC
Which sample is most
representative?
Beginning Middle Random Stratified Random
![Page 35: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/35.jpg)
Cary S. Shaw & Associates, LLC
Stratified Random is making sure
the sample is representative.
Method A:
Divide data into strata (categories, deciles, etc.)
Pick target number from each strata
Method B:
Sort file by factors known to be important (such as
income, age, even ID number)
Pick every n th record, starting with record n/2
![Page 36: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/36.jpg)
Cary S. Shaw & Associates, LLC
What happens when you match
samples?
How big is the resulting sample?
Year 1 Year 2 Year 3
10% 10% 10% = 0.1%
Customer Product Sales$ Supplies A/R
10% 10% . 10% 10% 10%
= 0.001%
![Page 37: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/37.jpg)
Cary S. Shaw & Associates, LLC
There’s a way to take samples that
match even as the data is changing
(don’t lose by inner joins)
![Page 38: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/38.jpg)
Cary S. Shaw & Associates, LLC
Examples of Record IDs
Social Security Number
Credit Card Number
Account ID
Develop algorithm to transform Record ID into
Random Number.
Then select from each file based on Random Number
range.
Samples from different files will then match.
![Page 39: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/39.jpg)
Cary S. Shaw & Associates, LLC
Example: Duns File Update
How did NAICS code change?
Jan NAICS = April NAICS
Jan NAICS <> April NAICS
No January Record
No April Record
![Page 40: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/40.jpg)
Cary S. Shaw & Associates, LLC
Example: Duns File Update
How did NAICS code change?
Jan NAICS = April NAICS 1006
Jan NAICS <> April NAICS 0
No January Record 0
No April Record 0
![Page 41: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/41.jpg)
Cary S. Shaw & Associates, LLC
Example: Duns File Update
How did NAICS code change?
Jan NAICS = April NAICS 829
Jan NAICS <> April NAICS 107
No January Record 64
No April Record 70
![Page 42: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/42.jpg)
Cary S. Shaw & Associates, LLC
We covered:
Qualify
Sniff, Define, Frequency Check
Interpret Lurk, Miss, Population
Sample
Methods, Application
Summary Tools, Lessons
![Page 43: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/43.jpg)
Cary S. Shaw & Associates, LLC
Lessons
Perform the sniff test.
Check number of records.
Do frequency counts (highest, lowest).
Look for missing values (distinguish from zeros and
duplicates).
Investigate lurking interactive variables.
Analyze what happened to original populations, not
final populations.
Examine representative samples (stratified preferred).
Assume errors and look for them. (Apply G.I.V.O.).
![Page 44: How to Recognize and Fix Data Faults That Can Make or ...nymetro.chapter.informs.org/prac_cor_pubs/01-2013 Cary...Analytics Consultant Analytics Recipient Cary S. Shaw & Associates,](https://reader034.vdocument.in/reader034/viewer/2022051808/600c61911435d5129d422582/html5/thumbnails/44.jpg)
Cary S. Shaw & Associates, LLC
- FIN -
Avoid the pitfalls of found
data and reach for the Stars
For further information contact:
Cary S. Shaw & Associates, LLC
(203)505-3180