
Common Sense Validation Using SAS

Lisa Eckler Lisa Eckler Consulting Inc.

TASS Interfaces, December 2015


• Holistic approach
• Allocate most effort to what’s most important
• Avoid or automate repetitive tasks
• Ask ourselves the right questions


Defining terms: QA

Data quality assurance is the process of profiling the data to discover inconsistencies and other anomalies in the data, and performing data cleansing activities to improve the data quality.

– Wikipedia


Defining terms: Verification

Verification is the act of reviewing, inspecting, testing, etc. to establish and document that a product, service, or system meets the regulatory, standard, or specification requirements.

Does it meet the structural requirements? Is it complete?


Defining terms: Validation

Validation refers to meeting the needs of the intended end-user or customer. 

Does it answer the user’s question? Does it meet all of the needs?

Structure and completeness, data integrity, appropriateness


“Computers are useless. They can only give you answers.”

– Pablo Picasso


How do I know if I got it right?


Is Validation a programming task?

Yes – mostly

The routine parts can and should be automated and repeatable

That leaves more resources for the parts which require human attention


• PROC COMPARE
• PROC CONTENTS
• PROC CONTENTS with compare (using PROC COMPARE or TRANSPOSE, MERGE and flag)
• PROC FREQ +/- PROC FORMAT
• PROC SUMMARY
• PROC SUMMARY + compare (using PROC COMPARE or TRANSPOSE, MERGE and flag)


Wrap a macro around this and you have a flexible, re-usable tool!


Does this mean writing more SAS code after I thought I was finished writing SAS code?

• Yes… and no
• We can save time and improve the quality of results by using code that isn’t part of the final program.
• Don’t think of it as disposable, though: this code can be set up once and saved to use for all future validation efforts.

Additional benefits
• Automated validation provides a log
• Easily repeatable


What are the questions?
• Should this be a replication of something I have seen before? If not, is it similar to something I’ve done before?
• Is it – or some part of it – supposed to be different from anything I’ve seen before?
• Is the result packaged properly?


Mantra for Validation
• Check your assumptions
• Confirm similarities
• Focus on differences


How is this result expected to compare with what we’ve seen before?

• Entirely different
• Some overlap
• Complete overlap
• Subset


Some possibilities – not an exhaustive list!


** This is the simplest form of **;
** comparison between two sets of data **;
proc compare
        compare = SHOES
        base = OLD_SHOES;
run;


** PROC CONTENTS gives us metadata **;
proc contents
        data = OLD_SHOES;
run;


** CONTENTS with select facts saved to **;
** a data set --> a table of metadata **;
proc contents
        data = OLD_SHOES
        out = CONTENTS_OLD_SHOES(keep=name type length);
run;


** Same as previous slide except for the **;
** new data set **;
proc contents
        data = NEW_SHOES
        out = CONTENTS_NEW_SHOES(keep=name type length);
run;


** Comparing metadata tables rather than **;
** data tables **;
proc compare
        compare = CONTENTS_OLD_SHOES
        base = CONTENTS_NEW_SHOES;
run;

  


proc contents
        data = OLD_SHOES
        out = CONTENTS1(keep=name type length);
run;
proc contents
        data = NEW_SHOES
        out = CONTENTS2(keep=name type length);
run;
proc compare
        compare = CONTENTS1
        base = CONTENTS2;
run;

 


%macro COMPARE_STRUCTURE1;
proc contents
        data = OLD_SHOES
        out = CONTENTS1(keep=name type length);
run;
proc contents
        data = NEW_SHOES
        out = CONTENTS2(keep=name type length);
run;
proc compare
        compare = CONTENTS1
        base = CONTENTS2;
run;
%mend COMPARE_STRUCTURE1;

%COMPARE_STRUCTURE1;


%macro COMPARE_STRUCTURE(DS1,DS2);
proc contents
        data = &DS1
        out = CONTENTS1(keep=name type length);
run;
proc contents
        data = &DS2
        out = CONTENTS2(keep=name type length);
run;
proc compare
        compare = CONTENTS1
        base = CONTENTS2;
run;
%mend COMPARE_STRUCTURE;

%COMPARE_STRUCTURE(OLD_SHOES, NEW_SHOES);

 


 


** We've just built a generic tool for comparing **;
** the STRUCTURE of any two SAS data sets **;

%COMPARE_STRUCTURE( <any SAS data set name>, <any other SAS data set name> );
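One possible refinement, not taken from the slides: match the metadata rows on variable name, so that an added or dropped column does not shift every later row out of alignment. This is a sketch under assumptions. The macro name COMPARE_STRUCTURE_BY_NAME is hypothetical, and the two test tables are simply copies of SASHELP.SHOES, one with a column dropped.

```sas
** Hypothetical test data: NEW_SHOES is missing one column **;
data OLD_SHOES;
   set sashelp.shoes;
run;

data NEW_SHOES;
   set sashelp.shoes(drop=Returns);
run;

%macro COMPARE_STRUCTURE_BY_NAME(DS1, DS2);
   proc contents data = &DS1 noprint
        out = CONTENTS1(keep=name type length);
   run;
   proc contents data = &DS2 noprint
        out = CONTENTS2(keep=name type length);
   run;

   ** Sort explicitly so the ID statement can match rows by name **;
   proc sort data = CONTENTS1; by name; run;
   proc sort data = CONTENTS2; by name; run;

   ** LISTALL reports names found in only one of the two tables **;
   proc compare base = CONTENTS1 compare = CONTENTS2 listall;
      id name;
   run;
%mend COMPARE_STRUCTURE_BY_NAME;

%COMPARE_STRUCTURE_BY_NAME(OLD_SHOES, NEW_SHOES);
```

Matching on name is case-sensitive here, so names that differ only in case would be reported as unmatched.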

Reasonableness: complete overlap

 


“_character_” gives the list of ALL variables in the table with data type character, which may include some variables with too many distinct values to tabulate usefully
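A minimal sketch of such a call, assuming OLD_SHOES is a copy of SASHELP.SHOES (an assumption made so the example runs on its own). The NLEVELS option is added here to show how many distinct values each variable has, which helps spot the variables that are not really categorical.

```sas
** Hypothetical test data **;
data OLD_SHOES;
   set sashelp.shoes;
run;

** One frequency table per character variable, plus a count of levels **;
proc freq data = OLD_SHOES nlevels;
   tables _character_ / missing;
run;
```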


This code also gives a list of ALL vars in the table with data type character
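One way to build such a list, sketched under assumptions: query the DICTIONARY.COLUMNS metadata view with PROC SQL and store the names in a macro variable. The macro variable name CHAR_VARS is hypothetical, and OLD_SHOES is again assumed to be a copy of SASHELP.SHOES in the WORK library.

```sas
** Hypothetical test data **;
data OLD_SHOES;
   set sashelp.shoes;
run;

** Library and member names are stored in upper case in the dictionary **;
proc sql noprint;
   select name
      into :CHAR_VARS separated by ' '
      from dictionary.columns
      where libname = 'WORK'
        and memname = 'OLD_SHOES'
        and type    = 'char';
quit;

%put Character variables: &CHAR_VARS;

proc freq data = OLD_SHOES;
   tables &CHAR_VARS;
run;
```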


The above code lets us customize our list to exclude non-categorical character columns and include the others
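A sketch of how the list might be customized, assuming a DICTIONARY.COLUMNS query is used to build it. The choice to exclude Subsidiary, the macro variable name CATEGORY_VARS, and the use of a copy of SASHELP.SHOES are all illustrative assumptions.

```sas
** Hypothetical test data **;
data OLD_SHOES;
   set sashelp.shoes;
run;

** Keep character columns, minus those we do not treat as categorical **;
proc sql noprint;
   select name
      into :CATEGORY_VARS separated by ' '
      from dictionary.columns
      where libname = 'WORK'
        and memname = 'OLD_SHOES'
        and type    = 'char'
        and upcase(name) not in ('SUBSIDIARY');
quit;

proc freq data = OLD_SHOES;
   tables &CATEGORY_VARS;
run;
```

The exclusion list is the only part that needs maintaining when the tool is reused on a different table.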


Similar to the way we compared the structure of two tables, we can compare the frequency counts of values in two tables
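A minimal sketch of that idea, under assumptions: save the PROC FREQ counts for one column from each table and hand the two count tables to PROC COMPARE. The table names FREQ_OLD and FREQ_NEW are hypothetical, and NEW_SHOES deliberately omits one region so that there is something to report.

```sas
** Hypothetical test data: NEW_SHOES lacks the rows for one region **;
data OLD_SHOES;
   set sashelp.shoes;
run;

data NEW_SHOES;
   set sashelp.shoes;
   where Region ne 'Canada';
run;

** Save the frequency counts instead of printing them **;
proc freq data = OLD_SHOES noprint;
   tables Region / out = FREQ_OLD(keep=Region count);
run;

proc freq data = NEW_SHOES noprint;
   tables Region / out = FREQ_NEW(keep=Region count);
run;

** Match counts by value; LISTALL shows values present in only one table **;
proc compare base = FREQ_OLD compare = FREQ_NEW listall;
   id Region;
run;
```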


Data correctness: complete overlap

proc compare
        compare = OLD_SHOES
        base = NEW_SHOES;
run;

Judicious use of unrestricted PROC COMPARE -- after confirming reasonableness


If we are expecting a result that is a complete replication of something that already exists:
• Confirm that the structure is identical
• Confirm that the data is the same at a high level
• Confirm that the data is the same at a detailed level

All three checks can be fully automated.
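The three checks could be chained in a single macro along these lines. This is a sketch, not the presenter's code: the macro name VALIDATE_REPLICA and the work table names are hypothetical, column sums stand in for "high level", and two copies of SASHELP.SHOES are used as test data.

```sas
** Hypothetical test data: two identical copies **;
data OLD_SHOES;
   set sashelp.shoes;
run;

data NEW_SHOES;
   set sashelp.shoes;
run;

%macro VALIDATE_REPLICA(DS1, DS2);
   ** 1. Structure: compare the metadata **;
   proc contents data = &DS1 noprint
        out = CONTENTS1(keep=name type length);
   run;
   proc contents data = &DS2 noprint
        out = CONTENTS2(keep=name type length);
   run;
   proc compare base = CONTENTS1 compare = CONTENTS2;
   run;

   ** 2. High level: row count and sums of all numeric columns **;
   proc summary data = &DS1;
      var _numeric_;
      output out = SUMMARY1(drop=_type_) sum=;
   run;
   proc summary data = &DS2;
      var _numeric_;
      output out = SUMMARY2(drop=_type_) sum=;
   run;
   proc compare base = SUMMARY1 compare = SUMMARY2;
   run;

   ** 3. Detailed level: compare every value **;
   proc compare base = &DS1 compare = &DS2;
   run;
%mend VALIDATE_REPLICA;

%VALIDATE_REPLICA(OLD_SHOES, NEW_SHOES);
```

Running the cheap checks first means a structural mismatch is reported before the full row-by-row comparison produces a long listing.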


Data correctness: completely new

What if we don’t have an existing results table to compare to?
• Similar SAS data in an existing table or produced by someone else?
• Similar data in some other format that can be imported into SAS for comparison?
• Do we have a data requirements document?
• The truly original data will require much greater attention to validation and the involvement of a subject matter expert


Packaging: completely new

Assuming we have a Requirements “document”…
• Import REQUIREMENTS into SAS data set
• Run PROC CONTENTS on new data set to get CONTENTS_NEW_SHOES
• Run PROC COMPARE, comparing CONTENTS_NEW_SHOES to REQUIREMENTS

OR
• Join REQUIREMENTS with CONTENTS_NEW_SHOES and flag non-matching rows
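The join-and-flag option might look like the sketch below. Everything about the REQUIREMENTS table here is an assumption: its layout (name, type, length, with type coded 1 = numeric and 2 = character to match PROC CONTENTS), its rows, and the made-up Discount column that the data does not contain. The output table name STRUCTURE_CHECK and the flag wording are also hypothetical.

```sas
** Hypothetical requirements, typed in rather than imported **;
data REQUIREMENTS;
   length name $32;
   input name $ type length;
   datalines;
Inventory 1 8
Product 2 14
Region 2 25
Returns 1 8
Sales 1 8
Stores 1 8
Subsidiary 2 12
Discount 1 8
;
run;

** Hypothetical new data set and its metadata **;
data NEW_SHOES;
   set sashelp.shoes;
run;

proc contents data = NEW_SHOES noprint
     out = CONTENTS_NEW_SHOES(keep=name type length);
run;

** Full join on name, flagging anything that does not line up **;
proc sql;
   create table STRUCTURE_CHECK as
   select coalesce(r.name, c.name) as name,
          r.type   as req_type,   c.type   as actual_type,
          r.length as req_length, c.length as actual_length,
          case
             when c.name is missing then 'MISSING FROM DATA'
             when r.name is missing then 'NOT IN REQUIREMENTS'
             when r.type ne c.type or r.length ne c.length
                then 'ATTRIBUTE MISMATCH'
             else 'OK'
          end as flag
   from REQUIREMENTS as r
        full join CONTENTS_NEW_SHOES as c
        on upcase(r.name) = upcase(c.name);
quit;

proc print data = STRUCTURE_CHECK;
   where flag ne 'OK';
run;
```

Compared with PROC COMPARE, the join produces a table that can itself be saved, filtered or reported on.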


 

Reasonableness: completely new


What if part of our result should be the same as an existing result but there should be some differences?
• Treat it as a hybrid and split the validation exercise into two parts
  • Expected same (by rows, columns, data or metadata)
  • Expected different (by rows, columns, data or metadata)


• For each of the two parts
  • Confirm (expected) similarities
  • Focus efforts on (expected) differences
  • Run the validation procedures we’ve already looked at as appropriate for the “same” and “different” aspects


Recall the scenario where our data sets should be identical

record_id  a  b  c
1          *  *  *
2          *  *  *
3          *  *  *

record_id  a  b  c
1          *  *  *
2          *  *  *
3          *  *  *


When some columns should be the same

record_id  a  b  c
1          *  *
2          *  *
3          *  *

record_id  a  b  d
1          *  *
2          *  *
3          *  *
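For the case where only some columns should match, PROC COMPARE can be restricted with a VAR statement. A minimal sketch with made-up values: the table names OLD_DATA and NEW_DATA and all of the numbers are hypothetical, with columns a and b expected to be the same and columns c and d expected to differ.

```sas
** Hypothetical data: a and b are shared, c and d are not **;
data OLD_DATA;
   input record_id a b c;
   datalines;
1 10 20 30
2 11 21 31
3 12 22 32
;
run;

data NEW_DATA;
   input record_id a b d;
   datalines;
1 10 20 40
2 11 21 41
3 12 22 42
;
run;

** Match rows on record_id and compare only the shared columns **;
proc compare base = OLD_DATA compare = NEW_DATA;
   id record_id;
   var a b;
run;
```

Columns c and d are left for separate, more focused checking.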


When some “cells” (parts of rows and columns) should be the same

record_id  a  b  c
1          *  *
2          *  *
3          *  *

record_id  a  b  d
1          *  *
3          *  *
4
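When only some rows and some columns overlap, an ID statement plus a VAR statement limits the value comparison to the shared cells. This is a sketch with made-up values: OLD_DATA holds records 1 to 3 and NEW_DATA holds records 1, 3 and 4, and both the table names and the numbers are hypothetical.

```sas
** Hypothetical data: records 1 and 3 appear in both tables **;
data OLD_DATA;
   input record_id a b c;
   datalines;
1 10 20 30
2 11 21 31
3 12 22 32
;
run;

data NEW_DATA;
   input record_id a b d;
   datalines;
1 10 20 40
3 12 22 42
4 13 23 43
;
run;

** Values are compared only for matching record_id values;     **;
** LISTOBS lists the records that appear in just one table     **;
proc compare base = OLD_DATA compare = NEW_DATA listobs;
   id record_id;
   var a b;
run;
```

The records listed as unmatched are the ones that need to be confirmed as expected differences.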


Summary:

• Ask the right questions
• Confirm similarities with known things – quickly and programmatically – then focus time and effort on validating “unknown” or new things
• Basic Base SAS procedures are enough for validation
• Vary the technique based on how much is similar/different from what you’ve validated previously and what types of data are involved


You can find my related conference papers at www.lexjansen.com

• Don’t Forget About Small Data (SESUG 2015)
• When Good Looks Aren’t Enough (NESUG 2009)

If you have comments or questions…

[email protected]