Post on 29-Jan-2021
-
© 2020. All rights reserved. IQVIA® is a registered trademark of IQVIA Inc. in the United States, the European Union, and various other countries.
Lisa Mendez, PhD
Andrew T. Kuligowski
Using Base SAS® to Automate Quality Checks of Excel® Workbooks that have Multiple Worksheets
-
1
• The Process
• Determine how to identify the smaller problems within the
larger, overwhelming problem
• Solve each problem using SAS code
• Putting it all together
• Implementing the code
• Lessons Learned
Overview
-
2
• Unfamiliar with the data – thrown into the deep end
• Five Markets
- ADHD, BNZD, CNNB, CDNE, and PAIN
• Each market had seven (7) Excel workbooks to be checked
• Each workbook had multiple worksheets (variables were the same, but
worksheet names were different for each market)
- ADHD – 7 worksheets
- BNZD – 24 worksheets
- CNNB – 7 worksheets
- CDNE – 5 worksheets
- PAIN – 55 worksheets
Background
-
3
Background
• Sample (note: the number of workbooks has increased since the writing of the original paper
and presentation)
-
4
Background
• Sample of variable names (partial) and worksheet names (partial)
-
5
• Let’s do the math…
• 5 markets multiplied by 7 workbooks (35 workbooks) that had a total of 98
worksheets that needed to be checked
- That is 3,430 worksheets
- For 27 quarters!!!!
- For a grand total of 92,610 worksheets
• That can be just a little bit overwhelming!
The Overarching Problem
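The multiplication on this slide can be reproduced in a short DATA step. This is only a sketch that takes the slide's figures (5 markets, 7 workbooks, 98 worksheets, 27 quarters) as given; the variable names are illustrative.

```sas
/* Reproduce the slide's arithmetic, using its figures as stated */
data _null_;
  workbooks   = 5 * 7;             /* markets x workbooks per market */
  per_quarter = workbooks * 98;    /* slide's worksheet count per quarter */
  total       = per_quarter * 27;  /* across 27 quarters */
  put workbooks= per_quarter= total= comma10.;
run;
```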
-
6
1. lexjansen.com (more information at the end of the discussion)
2. SAS Communities
3. SAS Support
4. University websites
Research
-
7
• XLSX Engine
- Allows you to read and write Microsoft Excel files as if they were data sets in a
library
- Advantage is that it accesses the XLSX file directly - does not use the Microsoft
data APIs as a go-between
- You have to have a license for SAS/ACCESS to PC Files to utilize the XLSX
engine
- In SAS University Edition, the SAS/ACCESS product is part of the package
Getting the Data into SAS
libname Cadhd1 XLSX "C:\Users\lmendez\Documents\RMPDC\Deliverables2017_Q2\ADHD\RMPD_Patient Tracking_ADHD_NDW_2018Q2.xlsx";
-
8
Getting the Data into SAS
libname Cadhd1 XLSX "C:\Users\lmendez\Documents\RMPDC\Deliverables2017_Q2\ADHD\RMPD_Patient Tracking_ADHD_NDW_2018Q2.xlsx";
The libname statement sets up the datasets, and you will see them in the cadhd1 library, but the datasets will be empty
Names of datasets are the names of the worksheets
-
9
• Using PROC SQL and SAS Dictionary Tables
Loading the Data
-
10
Loading the Data
Note: the libref must be in all caps: where libname="CADHD1"
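A minimal sketch of the kind of dictionary-table query this slide refers to, assuming the CADHD1 libref from the earlier libname statement is already assigned. The output table name work.sheet_names is an illustrative choice, not taken from the presentation.

```sas
/* List the worksheets (members) that the XLSX engine exposes in CADHD1. */
/* Assumes the CADHD1 libref has already been assigned.                  */
proc sql;
  create table work.sheet_names as
    select libname, memname
    from dictionary.tables
    where libname = "CADHD1"   /* librefs are stored in upper case */
    order by memname;
quit;
```

If the libref is written in lower or mixed case in the WHERE clause, the query returns no rows, which is why the slide calls out the capitalization.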
-
11
Loading the Data
-
12
Loading the Data
-
13
• Macro variables (will be used in the macro)
Loading the Data
Output from the log:
53 %put &snamlist_1; /* show the macro variable snamlist in the log */
LOOKUP*STATE_SUBGRP*STATE_SUPERGRP*ZIP_SUBGRP_AMPH*ZIP_SUBGRP_METH*ZIP_SUBGRP_OTH_ANAL*ZIP_SUBGRP_OTH_ANTI*ZIP_SUPER
54 %put &n_1; /* show the macro variable n_1 in the log */
8
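One way macro variables like snamlist_1 and n_1 could be built is with SELECT ... INTO in PROC SQL. This is a sketch under the assumption that the CADHD1 libref is assigned and that the asterisk is the list delimiter, as the log output suggests; the presentation's actual code is not shown in the transcript.

```sas
/* Initialize so the %put statements resolve even if no rows are found */
%let snamlist_1 = ;
%let n_1 = 0;

proc sql noprint;
  select memname
    into :snamlist_1 separated by '*'   /* asterisk-delimited list of worksheet names */
    from dictionary.tables
    where libname = "CADHD1";
quit;

%let n_1 = &sqlobs;   /* number of rows returned by the last query */

%put &snamlist_1;
%put &n_1;
```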
-
14
• SAS Macro
Loading the Data
54 %put &n_1; /* show the macro variable n_1 in the log */
8
LOOKUP*STATE_SUBGRP*STATE_SUPERGRP*ZIP_SUBGRP_AMPH*ZIP_SUBGRP_METH*ZIP_SUBGRP_OTH_ANAL*ZIP_SUBGRP_OTH_ANTI*ZIP_SUPER
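A macro that walks an asterisk-delimited list and copies each member into WORK might look like the sketch below. The macro name load_sheets and its parameters are hypothetical, and the example assumes every name in the list is a valid SAS name. The demonstration call uses SASHELP tables so it runs without a workbook.

```sas
%macro load_sheets(lib=, list=, n=);
  %local i sheet;
  %do i = 1 %to &n;
    /* Pull the i-th name from the asterisk-delimited list */
    %let sheet = %scan(&list, &i, *);
    /* Copy that member into WORK under the same name */
    data work.&sheet;
      set &lib..&sheet;
    run;
  %end;
%mend load_sheets;

/* Demonstration with two SASHELP tables */
%load_sheets(lib=sashelp, list=CLASS*CARS, n=2)
```

With the macro variables shown in the log, the equivalent call would be `%load_sheets(lib=cadhd1, list=&snamlist_1, n=&n_1)`.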
-
15
• Lessons Learned
- When consulted, Vince DelGobbo suggested using PROC DATASETS to copy
in a library
- Different ways to deal with invalid worksheet names
› Rename the dataset
› Delete the invalid SAS dataset
Loading the Data
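The PROC DATASETS suggestion could be sketched as follows, assuming the CADHD1 libref is assigned and that all worksheet names are valid SAS member names. How the XLSX engine handles invalid names during the copy is not covered here.

```sas
/* Copy every member of the CADHD1 library into WORK in one step. */
/* Assumes the CADHD1 libref has already been assigned.           */
proc datasets library=work nolist;
  copy in=cadhd1 out=work;
run;
quit;
```

Compared with a macro loop, this avoids building a list of worksheet names just to load the data.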
-
16
Loading the Data
-
17
• Need templates to compare
• Load templates each quarter
- Ensure permanent template library (libname statement)
- By Market
› List of variable names
› List of worksheet names
Validating Worksheet and Variable Names
-
18
• Once templates are loaded, compare worksheet names
Validating Worksheet Names
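A comparison of template and current worksheet names could be written as a full join, as in this sketch. The two small input datasets are made-up stand-ins (one name is deliberately misspelled), and the dataset and column names are illustrative rather than the presentation's.

```sas
/* Made-up template list */
data work.template_sheets;
  length memname $32;
  input memname $;
  datalines;
LOOKUP
STATE_SUBGRP
STATE_SUPERGRP
;

/* Made-up current list with one misspelled worksheet name */
data work.current_sheets;
  length memname $32;
  input memname $;
  datalines;
LOOKUP
STATE_SUBGRP
STATE_SUPRGRP
;

/* Keep only names that appear on one side but not the other */
proc sql;
  create table work.sheet_errors as
    select coalesce(t.memname, c.memname) as sheet_name length=32,
           case
             when t.memname is null then "Not in template"
             when c.memname is null then "Missing from workbook"
             else "Match"
           end as error_msg length=25
    from work.template_sheets as t
         full join work.current_sheets as c
           on upcase(t.memname) = upcase(c.memname)
    where t.memname is null or c.memname is null;
quit;
```

When every name matches, work.sheet_errors has no observations.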
-
19
• Dataset created after PROC SQL compare for Worksheet Names
• All worksheet names match – no errors
Validating Worksheet Names
-
20
• Create an error report
Validating Worksheet Names
-
21
• Sample of Worksheet Error
Exporting Error Report to Excel
-
22
• Once templates are loaded, compare variable names
• Use Proc Contents to get a current list of variable names
Validating Variable Names
-
23
• Dataset created after PROC SQL compare for Variable Names
• Everything Matches
• Note: change variable names either before PROC SQL, or in the PROC SQL statement
Validating Variable Names
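The variable-name check can be sketched with PROC CONTENTS followed by a PROC SQL compare. To keep the example runnable without a workbook, it uses SASHELP.CLASS as the "current" data and a made-up template list. Names are upcased inside the SQL step, which is one reading of the note about changing variable names in the PROC SQL statement.

```sas
/* Current variable names, taken from SASHELP.CLASS for illustration */
proc contents data=sashelp.class noprint
    out=work.current_vars(keep=memname name);
run;

/* Made-up template: BMI is not in the data, WEIGHT is not in the template */
data work.template_vars;
  length name $32;
  input name $;
  datalines;
NAME
SEX
AGE
HEIGHT
BMI
;

/* Keep only names found on one side but not the other */
proc sql;
  create table work.var_errors as
    select coalesce(t.name, upcase(c.name)) as var_name length=32,
           case
             when t.name is null then "Not in template"
             else "Missing from data"
           end as error_msg length=20
    from work.template_vars as t
         full join work.current_vars as c
           on t.name = upcase(c.name)
    where t.name is null or c.name is null;
quit;
```

Here work.var_errors would contain two rows, one for BMI and one for WEIGHT.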
-
24
Exporting Error Report to Excel
The macro variable ‘x’ is used to number the reports that correspond with each workbook
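A sketch of how a numbered worksheet might be written with PROC EXPORT, assuming SAS/ACCESS to PC Files is licensed. The error dataset, the output path, and the sheet-name pattern are all hypothetical; only the idea of using the macro variable x to number the sheets comes from the slide.

```sas
/* Hypothetical error dataset standing in for the name comparison result */
data work.sheet_errors;
  length sheet_name $32 error_msg $40;
  sheet_name = "STATE_SUPRGRP";
  error_msg  = "Not in template";
run;

%let x = 1;                                  /* report number for this workbook */
%let outfile = C:\path\to\ADHD_errors.xlsx;  /* hypothetical output path */

/* Write the errors to a worksheet whose name carries the report number */
proc export data=work.sheet_errors
    outfile="&outfile"
    dbms=xlsx
    replace;
  sheet="Workbook_&x";
run;
```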
-
25
• Used within a macro
• One Excel file per Market
• Multiple worksheets for each
workbook checked
• No errors for this workbook
Exporting Error Report to Excel
Each worksheet corresponds to a workbook
-
26
• Lessons Learned:
- Do not output if there are no errors, or output “no error” message, because
most of the workbooks do not have variable name or worksheet name errors
• Found code to deal with data sets with no observations in order to not write out to
Excel
- See next two slides
Exporting Error Report to Excel
-
27
Export only datasets that have error messages
Code continues on next slide
-
28
Export only datasets that have error messages
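One common way to skip empty datasets is to count the observations first and export conditionally inside a macro. The sketch below is not the code from the slides; the macro name export_if_errors and its parameters are hypothetical.

```sas
%macro export_if_errors(ds=, file=, sheet=);
  %local nobs;
  %let nobs = 0;

  /* Count the observations in the error dataset */
  proc sql noprint;
    select count(*) into :nobs trimmed from &ds;
  quit;

  /* Export only when at least one error row exists */
  %if &nobs > 0 %then %do;
    proc export data=&ds outfile="&file" dbms=xlsx replace;
      sheet="&sheet";
    run;
  %end;
  %else %put NOTE: &ds has no observations, so nothing was exported.;
%mend export_if_errors;

/* Demonstration with an empty dataset: nothing is written */
data work.no_errors;
  length error_msg $40;
  stop;
run;

%export_if_errors(ds=work.no_errors, file=C:\path\to\errors.xlsx, sheet=Workbook_1)
```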
-
29
• A macro variable was created, using the same methods as before for all the
worksheet/dataset names
• The macro variable was used in conjunction with a macro to execute a data step
multiple times to check all the data within a worksheet/dataset
Validating Data
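The pattern described above, a macro that runs the same DATA step once per dataset in a list, might look like this sketch. The transcript does not show which checks were performed, so the missing-value test is a placeholder assumption, and the macro name check_all is hypothetical. SASHELP tables are used so the example runs as written.

```sas
%macro check_all(lib=, list=);
  %local i ds;
  %let i = 1;
  %let ds = %scan(&list, &i, *);
  %do %while (%length(&ds) > 0);
    /* Placeholder check: keep rows that have any missing value */
    data work.chk_&ds;
      set &lib..&ds;
      if cmiss(of _all_) > 0;
    run;
    %let i = %eval(&i + 1);
    %let ds = %scan(&list, &i, *);
  %end;
%mend check_all;

/* Demonstration with two SASHELP tables */
%check_all(lib=sashelp, list=CLASS*HEART)
```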
-
30
Validating Data
-
31
• Similar code was written to check the products within a workbook
• A pre-loaded template was used to ensure the correct products were in the
correct worksheet/dataset
• A macro was used, along with a data step, and a PROC SQL step to compare
product names in the pre-loaded template with the product names of the current
data
• An exception report was created for the values check
• Utilized lesson learned from previous Excel export
- For these exception reports, an MS Excel workbook was created for a
worksheet only if errors were found
Validating Data
-
32
• Run multiple SAS programs from one “Main” program
“Main” Program
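A "Main" program of this kind is typically a series of %INCLUDE statements. In the sketch below the folder and program file names are hypothetical placeholders, so it will only run once they are replaced with real ones.

```sas
/* Hypothetical folder and program names; substitute the real ones */
%let pgmdir = C:\path\to\programs;

/* Run each step of the quality check in order */
%include "&pgmdir\01_load_workbook.sas";
%include "&pgmdir\02_check_names.sas";
%include "&pgmdir\03_check_data.sas";
%include "&pgmdir\04_export_reports.sas";
```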
-
33
• Many macros are used to create many datasets in the process of checking one
workbook
• To ensure there is enough space in the SAS session, PROC Datasets is used to
clean up the libraries used in the program
• To delete all files in a SAS data library at one time use the KILL option
• CAUTION: The KILL option deletes all members of the library immediately after
the statement is submitted
Deleting Datasets
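A minimal sketch of the KILL option, shown here against the WORK library. Any other libref could be substituted, with the same caution that every member is removed.

```sas
/* Delete all members of the WORK library at once */
proc datasets library=work kill nolist;
run;
quit;
```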
-
34
Deleting Datasets
-
35
• When faced with an overwhelming task, break it down
• Solve one problem at a time
• Doing research online may help provide different solutions
- Find one that works for your problem and that YOU prefer
- Don’t be afraid to code your program and do some steps that are not as efficient
(“down and dirty”)
• When utilizing macros, get the program to work before coding the macro(s)
• Enhance your program for efficiency when you have more time
Conclusion
-
36
LexJansen.com
-
Lisa Mendez
lisa.mendez@iqvia.com
Andrew T. Kuligowski
kuligowskiconference@gmail.com
Thank you!