Defensive Programming Laura Schild 3/18/2013


• Defensive programming is an approach to improving source code in terms of:
– General quality: reducing the number of software bugs and problems.
– Making the source code comprehensible: the source code should be readable and understandable.
– Making the software behave in a predictable manner despite unexpected inputs or user actions.

Defensive Programming

• Don’t trust your data
• Don’t trust that your data hasn’t or won’t change over time
• Don’t trust your programming skills
• Don’t trust that you will always look at your logs/reports as well as you should

Defensive Programming

• Weigh need for accuracy with need to complete the project within the allowed time.

• Add additional defensive logic for programs that will be run multiple times (e.g., a monthly intervention) or by multiple sites.

Don’t trust your data – Really get to know your data

• Frequencies
• Means
• Detail reports
• Record keys / duplicate records

Frequencies: Checking Percent Missing

• Example: You want to know the percentage of Rx records with a missing GPI. You could run a frequency with the missing option, but that produces 58 pages of output.

• Solution: Run PROC FREQ with a format that collapses all non-missing values into a single category.

proc format;
  value $missing
    ' '   = 'Missing'
    other = 'Valued';
run;

proc freq data=&_vdw_rx;
  where year(rxdate) = 2013;
  tables KPGA_GPI / missing;
  format KPGA_GPI $missing.;
run;

Creating Error Messages Based on Frequencies

proc freq data=&_vdw_rx;
  where year(rxdate) = 2013;
  tables KPGA_GPI / missing out=rx2013_freq;
  format KPGA_GPI $missing.;
run;

data _null_;
  set rx2013_freq;
  where KPGA_GPI = ' ';
  if percent > 1.0 then
    put 'ERROR: GPI is missing on more than 1% of the Rx records.';
run;

ERROR: GPI is missing on more than 1% of the Rx records.
NOTE: There were 1 observations read from the data set WORK.RX2013_FREQ.
      WHERE KPGA_GPI=' ';
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
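As a variation on the check above, the percentage could also be computed directly and compared against a threshold held in a macro variable, which may be convenient when the acceptable level changes between runs. This is only a sketch: the macro variable names max_pct_missing and pct_missing are hypothetical, and it assumes KPGA_GPI and rxdate behave as in the example above.

```sas
/* Hypothetical threshold, in percent; not part of the original example */
%let max_pct_missing = 1.0;

proc sql noprint;
  /* missing() returns 1 for a blank GPI, so the sum counts missing records */
  select 100 * sum(missing(KPGA_GPI)) / count(*)
    into :pct_missing
  from &_vdw_rx
  where year(rxdate) = 2013;
quit;

data _null_;
  pct = &pct_missing;   /* resolves to a missing value if no records were selected */
  if pct > &max_pct_missing then
    put 'ERROR: GPI is missing on ' pct 5.1 '% of the 2013 Rx records.';
run;
```

If no 2013 records exist at all, the division yields a missing value and no message is written, so a separate check for an empty extract would still be needed.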

Checking for Duplicates

• If you believe your file should only have one record per member (or some other key), check to make sure that is true.

• You could do a simple PROC SORT with the NODUPKEY option and check the log to see whether any records were eliminated.

• It might be better, however, to create an error report.

• Example: Can a member have more than one Rx record for the same drug with the same fill date?

proc sql;
  create table Rx2012 as
  select a.*
  from &_vdw_rx as a
  where year(rxdate) = 2012;
quit;

proc sort data=Rx2012;
  by mrn rxdate kpga_chem_name;
run;

* Keep only records whose key occurs more than once;
data dup;
  set Rx2012;
  by mrn rxdate kpga_chem_name;
  if first.kpga_chem_name and last.kpga_chem_name then delete;
run;

proc print data=dup (obs=20);
  by mrn rxdate kpga_chem_name;
  id mrn rxdate kpga_chem_name;
  var rxsup ndc kpga_source_file_code;
  title 'Duplicate Rx';
run;
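The same question can also be turned into a log message rather than a printed report. The following is a minimal sketch, not the approach used in the study: it assumes the Rx2012 table and key variables from the example above, and dup_keys and n_records are hypothetical names.

```sas
/* Sketch only: list each key combination that occurs more than once */
proc sql;
  create table dup_keys as
  select mrn, rxdate, kpga_chem_name, count(*) as n_records
  from Rx2012
  group by mrn, rxdate, kpga_chem_name
  having count(*) > 1;
quit;

data _null_;
  /* NOBS= is filled in at compile time, so no rows need to be read */
  if 0 then set dup_keys nobs=n_dups;
  if n_dups > 0 then
    put 'ERROR: ' n_dups 'duplicate key combination(s) found in Rx2012.';
  stop;
run;
```

Writing the message with an ERROR: prefix means it would be picked up by any later scan of the log for error lines.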

Checking for Duplicates (Example Output)

Duplicate Rx

MRN       RXDATE      KPGA_CHEM_NAME             RXSUP  NDC          KPGA_SOURCE_FILE_CODE
xxxxxx01  04/05/2012  METRONIDAZOLE                  7  00179148214  2
                                                     7  50111033401  5
xxxxxx02  06/22/2012  SULFAMETHOXAZOLE-TRIMETHO      3  00179195814  2
                                                     3  53489014605  5
xxxxxx05  06/14/2012  CYCLOBENZAPRINE HCL           10  00179005730  2
                                                    10  59746017710  5
xxxxxx07  02/27/2012  CIPROFLOXACIN HCL              5  00172531260  5
                                                     5  00179198214  2
xxxxxx08  12/03/2012  FAMOTIDINE                    90  00172572960  5
                                                    90  00179147160  2
xxxxxx09  05/17/2012  INSULIN SYRINGE/NEEDLE U-     30  08290328438  2
                                                    30  08290328468  2
xxxxxx10  07/10/2012  METFORMIN HCL                 30  00179197960  2
                                                    30  65862000805  5

Don’t trust that your data hasn’t or won’t change over time

Example 1: Verify Data Is Current

*-------------------------------------------------------------------------------------;
* Use PROC CONTENTS to check if VDW file is current. If not, print error message to   ;
* log, and abort the program.                                                         ;
*-------------------------------------------------------------------------------------;
%include 'P:\SAS\Scripts\ActivateSASAce.sas';

options obs=1;
proc contents data=&_vdw_rx noprint out=rx_contents (keep=CRDATE MODATE);
run;

data _null_;
  set rx_contents;
  if datepart(crdate) < (today()-1) then do;
    put "ERROR: VDW Rx File was last updated on " crdate;
    abort abend;
  end;
run;
endrsubmit;

<LOG>
12982      put "ERROR: VDW Rx File was last updated on " crdate;
12983      abort abend;
12984    end;
12985  run;

ERROR: VDW Rx File was last updated on 04FEB13:05:27:38
ERROR: Execution terminated by an ABORT statement at line 12983 column 17, it specified the ABEND option.

Example 2: Missing Data Source
PATIENT Study - No External Pharmacy Claims

%macro Checkobs(data=,where=);
  %global obs;

  proc sql noprint;
    select count(*) into :obs
    from &data
    &where;
  quit;

  %if &obs eq 0 %then %put WARNING: *** No obs in &data &where ;
  %else %put &data has %trim(%left(&obs)) observation(s) &where ;
%mend;

%checkobs(data=&_vdw_rx,where=where KPGA_SOURCE_FILE_CODE in ('1','3') and RXDATE ge '01Jan2013'd);

WARNING: *** No obs in __vdw3.rx where KPGA_SOURCE_FILE_CODE in ('1','3') and RXDATE ge '01Jan2013'd

Compare Two Frequencies

*---------------------------------------------------;
* Compare Frequencies for Two Files                 ;
*---------------------------------------------------;
%macro compareFreq (file1, file2, var, fmt, percent);

  proc freq data=&file1;
    tables &var / missing out=freq_&file1;
    format &var &fmt;
  run;

  proc freq data=&file2;
    tables &var / missing out=freq_&file2;
    format &var &fmt;
  run;

Compare Two Frequencies (continued)

  data combine (keep=&var percent1 percent2 PctDiff message);
    merge freq_&file1 (in=a rename=(percent=percent1))
          freq_&file2 (in=b rename=(percent=percent2));
    by &var;

    length variable $30;
    if "&fmt" = ' ' then variable = &var;
    else variable = put(&var,&fmt);

    length message $70;
    PctDiff = percent1 - percent2;

    if a and not b then do;
      message = "Warning: No records for &var = "||variable;
      put "Warning: &file2 does not contain any records where &var = " variable;
    end;
    else if b and not a then do;
      message = "Warning: New value for &var: "||variable;
      put "Warning: &file2 contains the following new value for &var = " variable;
    end;

Compare Two Frequencies (continued)

    else if a and b then do;
      if PctDiff > &percent then do;
        message = "Warning: Decreased "||PctDiff||" %";
        put "Warning: Decreased percentage of records on &file2 where &var = "
            variable ". % on &file1 was " percent1 ". % on &file2 was " percent2;
      end;
      if PctDiff < -&percent then do;
        message = "Warning: Increased "||PctDiff||" %";
        put "Warning: Increased percentage of records on &file2 where &var = "
            variable ". % on &file1 was " percent1 ". % on &file2 was " percent2;
      end;
    end;
  run;

  proc print data=combine;
    var &var percent1 percent2 PctDiff message;
    title "Comparison of Values for &var on &file1 and &file2";
  run;
%mend compareFreq;

%compareFreq (rx_201102,rx_201302,KPGA_SOURCE_FILE_CODE,$rx_src.,5);
%compareFreq (rx_201201,rx_201302,KPGA_SOURCE_FILE_CODE,$rx_src.,3);

Compare Two Frequencies (continued)

%compareFreq (rx_201102,rx_201302,KPGA_SOURCE_FILE_CODE,$rx_src.,5);

Warning: rx_201302 does not contain any records where KPGA_SOURCE_FILE_CODE = 1 PRESC_CLAIMS
Warning: rx_201302 does not contain any records where KPGA_SOURCE_FILE_CODE = 3 PRESC_CLAIMS_HT
Warning: rx_201302 contains the following new value for KPGA_SOURCE_FILE_CODE = 5 PRESC_CLMS_MI

%compareFreq (rx_201201,rx_201302,KPGA_SOURCE_FILE_CODE,$rx_src.,3);

Warning: Increased percentage of records on rx_201302 where KPGA_SOURCE_FILE_CODE = 2 PRESC_FILLS . % on rx_201201 was 82.701184026 . % on rx_201302 was 86.376546479

Warning: Decreased percentage of records on rx_201302 where KPGA_SOURCE_FILE_CODE = 5 PRESC_CLMS_MI . % on rx_201201 was 17.298815974 . % on rx_201302 was 13.623453521

Example 3: Use ‘Other’ Option for PROC FORMAT

PATIENT Study - Phonetic Drug Names

proc format;
  value $phonetic
    'LISINOPRIL'                = 'ly-sino-pril'
    'ENALAPRIL MALEATE'         = 'e-nalo-pril'
    'CAPTOPRIL'                 = 'cap-TOE-pril'
    'RAMIPRIL'                  = 'ram-opril'
    'BENAZEPRIL HCL'            = 'BUH-Nasal-pril'
    'LOSARTAN POTASSIUM'        = 'low-sar-tan'
    'LOSARTAN POTASSIUM & HYDR' = 'Combination -- low-sar-tan -- WITH -- HyDroCloro-thigh-uh-zide'
    'SIMVASTATIN'               = 'simva-statin'
    'LOVASTATIN'                = 'lova-statin'
    'PRAVASTATIN SODIUM'        = 'prava-statin'
    'EZETIMIBE-SIMVASTATIN'     = 'vy-torin'
    'ATORVASTATIN CALCIUM'      = 'lip-it-tor'
    'ROSUVASTATIN CALCIUM'      = 'crest-tor'
    'FLUVASTATIN SODIUM'        = 'Less-call'
    'NIACIN-LOVASTATIN'         = 'Add-vih-core'
    'NIACIN-SIMVASTATIN'        = 'Sim-core'
    'PITAVASTATIN CALCIUM'      = 'Liv-VAH-lo'
    other                       = 'Error';
run;

Example 3: PATIENT Study Phonetic Drug Names (continued)

data phonetic error;
  set drugs_2012;
  phonetic = put(KPGA_CHEM_NAME,$phonetic.);
  if phonetic = 'Error' then output error;
  else output phonetic;
run;

proc freq data=error;
  tables KPGA_CHEM_NAME KPGA_GPI / missing;
  title 'Generic Drug Name Not On Phonetics Table';
run;

Example 3: PATIENT Study Phonetic Drug Names (continued)

Generic Drug Name Not On Phonetics

The FREQ Procedure

CHEM_NAME

                                                    Cumulative  Cumulative
KPGA_CHEM_NAME             Frequency    Percent     Frequency     Percent
--------------------------------------------------------------------------
LISINOPRIL & HYDROCHLOROT      33122     100.00         33122      100.00

GPI

                                         Cumulative  Cumulative
KPGA_GPI        Frequency    Percent     Frequency     Percent
---------------------------------------------------------------
36991802550305       8782      26.51          8782       26.51
36991802550310      12985      39.20         21767       65.72
36991802550320      11355      34.28         33122      100.00

Don’t Trust Your Programming Skills

• No matter how easy you think the program code was, you should check your work.

• Example: On the CRANE study, I needed to flag anyone who died within 60 days after the index date. I wrote: if deathdt < (indexdt + 60) then dflag = 1;

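• Just by running a simple frequency on dflag, I would have quickly realized that my logic failed to account for the fact that, in SAS, a missing value always compares as less than any non-missing value, so members with no death date were flagged as well.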
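One way the flag could be written so that missing death dates are excluded, together with the frequency check described above, is sketched below. The data set names crane_cohort and flagged are hypothetical; the comparison itself is the one quoted in the example.

```sas
/* Sketch only: crane_cohort is a hypothetical input data set */
data flagged;
  set crane_cohort;
  /* A missing deathdt compares as less than any date, so rule it out first */
  if not missing(deathdt) and deathdt < (indexdt + 60) then dflag = 1;
  else dflag = 0;
run;

/* Check the result: the count of dflag = 1 should be plausible */
proc freq data=flagged;
  tables dflag / missing;
run;

/* Every record with dflag = 1 should have a non-missing death date */
proc means data=flagged n nmiss min max;
  class dflag;
  var deathdt;
run;
```

Whether deaths before the index date should also be excluded depends on the study definition and is not addressed here.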

Select Random Sample of IDs to Test and Create Format

proc surveyselect data=summary method=srs n=5 out=SampleSRS;
run;

data sample;
  set SampleSRS;
  start   = mrn;
  label   = 'keep';
  fmtname = 'testids';
  type    = 'c';
run;

proc format cntlin=sample;
run;

Use Proc Format to Print Test Records

proc print data=summary;
  where put(mrn,$testids.) = 'keep';
  var mrn rxdate_1st rxdate_last drugs fills;
  title 'Drug Summary for Randomly Selected IDs';
run;

proc print data=drugs_2012;
  where put(mrn,$testids.) = 'keep';
  by mrn;
  id mrn;
  var KPGA_CHEM_NAME rxdate rxsup;
  title 'Drug Detail for Randomly Selected IDs';
run;

Use Proc Format to Print Test Records

Drug Summary for Randomly Selected IDs

  Obs  MRN       rxdate_1st  rxdate_last  drugs  fills
   98  AAAAAAAA  01/18/2012  11/05/2012       1      4
12416  BBBBBBBB  04/23/2012  12/17/2012       2      6

Drug Detail for Randomly Selected IDs

MRN       KPGA_CHEM_NAME             RXDATE      RXSUP
AAAAAAAA  LISINOPRIL & HYDROCHLOROT  01/18/2012     90
          LISINOPRIL & HYDROCHLOROT  04/25/2012     90
          LISINOPRIL & HYDROCHLOROT  07/25/2012     90
          LISINOPRIL & HYDROCHLOROT  11/05/2012     90

BBBBBBBB  LISINOPRIL                 04/23/2012     90
          LISINOPRIL                 09/11/2012     90
          LISINOPRIL                 12/17/2012     90
          SIMVASTATIN                06/27/2012     90
          SIMVASTATIN                09/11/2012     90
          SIMVASTATIN                12/17/2012     90

Automate Dates When Possible

If your program includes date logic involving a date that might change over time, you can:

Bad: Hard-code the date(s) in the program. You then need to remember to update the date(s) each time the program is run and/or trust other sites to update the date(s) correctly for their site.

Better: Define macro variables at the beginning of the program. You still need to remember to update the macro variables each time the program is run, but at least they are at the top of the code where you can easily find them and are more likely to remember to update them.

Best: Automate the calculation of the date(s) if possible based on the run date or a date in your data.
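The difference between the "Better" and "Best" options can be sketched as follows. The macro variable names and the hard-coded value are hypothetical, and the automated version assumes the cutoff of interest is the last day of the month before the run date.

```sas
/* Better: one macro variable at the top of the program, still updated by hand */
%let cutoff_manual = 31DEC2012;   /* hypothetical value */

/* Best: derive the cutoff from the run date (last day of the previous month) */
%let cutoff_auto = %sysfunc(intnx(month, %sysfunc(today()), -1, end), date9.);

/* Write both values to the log so each run documents the dates it used */
%put NOTE: Manual cutoff    = &cutoff_manual;
%put NOTE: Automated cutoff = &cutoff_auto;
```

Either variable can then be used as a date literal, for example "&cutoff_auto"d in a WHERE statement.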

Date Automation Example

• You want to include the most recent year of complete utilization data. You believe the last 3 months of utilization data might not be as complete as needed.

• Step #1: Determine the last date in the utilization file:

proc sql noprint;
  select distinct ("'"||put(max(adate),date9.)||"'d")
    into :vdw3max
  from &_vdw_utilization;
quit;

Date Automation Example (continued)

• Step #2: Calculate the extract start and end dates based on the utilization max date and create macro variables:

data _null_;
  call symput('lastdate' ,put(intnx('month',&vdw3max, -3,'end'      ),date9.));
  call symput('firstdate',put(intnx('month',&vdw3max,-14,'beginning'),date9.));
run;

Date Automation Example (continued)

• Step #3: Use the macro variable dates in your program:

proc freq data=&_vdw_utilization;
  where "&firstdate"d <= adate <= "&lastdate"d;
  tables adate / nocum;
  format adate yymmd7.;
  title1 "VDW Utilization Extract";
  title2 "VDW Max Date = &vdw3max";
  title3 "Extract Start Date = &firstdate";
  title4 "Extract End Date = &lastdate";
run;

VDW Utilization Extract
VDW Max Date = '31DEC2012'd
Extract Start Date = 01OCT2011
Extract End Date = 30SEP2012

The FREQ Procedure

ADATE     Frequency    Percent
--------------------------------
2011-10      280700       8.20
2011-11      272056       7.95
2011-12      268559       7.84
2012-01      290088       8.47
2012-02      289494       8.46
2012-03      306967       8.97
2012-04      279971       8.18
2012-05      298172       8.71
2012-06      281735       8.23
2012-07      276801       8.08
2012-08      306963       8.97
2012-09      272342       7.95

Don’t Trust that You will Always Check Your Logs as Well as You Should

Save Your Log

FILENAME MyLOG "G:\YourPath\YourLogName_&sysdate..LOG";

PROC PRINTTO LOG=MyLOG NEW;
RUN;

* . . . Your SAS Code . . .;

PROC PRINTTO;
RUN;

Extract Error and Warning Messages from LOGs

data logerrors (keep=type message where=(message ne ''));
  retain noerr 0;
  length string1 Message $200 Type $10;
  infile MyLog pad length=len missover end=eof;
  input @01 string1 $varying200. len;

  if substr(string1,1,5) = 'ERROR' then do;
    message = strip(string1);
    type = 'ERROR';
    noerr + 1;
  end;

Extract Error and Warning Messages from LOGs (continued)

  if substr(string1,1,7) = 'WARNING' then do;
    message = strip(string1);
    type = 'WARNING';
    noerr + 1;
  end;

  if eof then do;
    if noerr = 0 then do;
      message = 'NO MESSAGES OF CONCERN IN LOG';
      type = 'NONE';
    end;
  end;
run;

Create an Error Report

proc tabulate data=logerrors;
  class type;
  table type*n;
run;

proc tabulate data=logerrors;
  where type = 'ERROR';
  class type message;
  table message all, type*n;
run;

Send an Automatic Email using SAS

Basic Syntax to Send Email

• To include record counts:

%let dsid = %sysfunc(open(deceased,i));
%let nobsx = %sysfunc(attrn(&dsid,NOBS));

• Assign Filename:

FILENAME outmail EMAIL
  SUBJECT = "PATIENT Study: Recently Deceased Members"
  FROM    = "[email protected]"
  TO      = "[email protected]"
  CC      = ("[email protected]" "[email protected]" "[email protected]");

Data Step to Compose Email

DATA _NULL_;
  FILE outmail;
  PUT "We identified &nobsx recently deceased study participant(s), see:";
  PUT ;
  PUT " ...\PATIENT\Mailings\&mailingMonth.\Source Data\Deceased_&SYSDATE..htm";
  PUT ;
RUN;

%let rc=%sysfunc(close(&dsid));

Including an Attachment

• If the SAS output contains any HIPAA Protected Health Information (PHI), think about whether or not sending an email attachment is the best way to distribute.

• If so, make sure you include “[PHI]” in the subject line.

• Add the following to the FILENAME statement:

attach = "C:\My Folder\myfile.html";

Set-up #1

• From the Desktop, right-click My Computer and click Properties.

• On the Advanced tab, click Environment Variables.

Set-up #1 (continued)

• In the System variables list, select Path, and then click Edit. Note: You may have to get the help desk to do this for you.

• On the Edit System Variable page, in the Variable Value text box, add a semi-colon (;) to the end of the existing path, and then type the path to the folder where the Lotus Notes executable files are installed. For example, type:

;C:\Program Files\IBM\Lotus\Notes

• Click OK.

Set-up #2

• Add the following to your SAS configuration file (i.e. C:\Program Files\SAS92\SASFoundation\9.2\nls\en\SASV9.CFG)

/* Setup Automatic Email Parameters */
-EMAILHOST=LOCALHOST
-EMAILAUTHPROTOCOL=NONE
-EMAILSYS=VIM
-EMAILID='Laura A Schild/GA/KAIPERM'
-EMAILPW='YourEmailPassword'
-EMAILDLG=NATIVE
-EMAILPORT=25

Route SAS Results Directly to Email

FILENAME outbox EMAIL
  SUBJECT = 'My SAS Report'
  FROM    = '[email protected]'
  TO      = '[email protected]'
  TYPE    = 'text/html';

ODS HTML Body = Outbox;
Proc Print Data = Work.Data noobs;
Run;
Ods HTML close;

References

http://www2.sas.com/proceedings/sugi26/p074-26.pdf
Taking Control and Keeping It: Creating and Using Conditionally Executable SAS® Code
Justina M. Flavin, Pfizer Global Research & Development, La Jolla Laboratories, San Diego, CA
Arthur L. Carpenter, California Occidental Consultants, Oceanside, CA

http://www.phusewiki.org/docs/2006/P008.pdf
Defensive programming techniques
John Woods, ICON Clinical Research, Dublin, Ireland
Jennie McGuirk, ICON Clinical Research, Dublin, Ireland

https://ideabook.kp.org/message/11954#11954

http://www.sas.com/offices/NA/canada/downloads/presentations/Calgary09/Email.pdf
Automated Emails with Attachments - Base SAS or SAS Enterprise Guide

http://www2.sas.com/proceedings/sugi31/128-31.pdf
You’ve Got E-Mail: Automatic Log Checking Via E-mail Notification
Aaron Augustine, Deloitte & Touche LLP
Prasenjit Dutta, Deloitte & Touche Audit Services India Private Limited

Conclusion