data integrity verification - chapters site county/iia oc presentation... · data integrity...

Post on 21-Apr-2018

217 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Data Integrity Verification

Michael Kano, ACDA

1Data Integrity Verification

IIA Orange County Chapter

November 13, 2015

Excel Transformed My Data!

Data Integrity Verification2

BEFORE AFTER

101122001XIOB00260002 10112210000000

3

Michael KanoSenior Manager, Data Analytics

Sunera LLC

Michael is a Senior Manager with Sunera’s national data analytics practice. Michael has 20 years of experience in data analytics and internal audit with organizations in the USA, Canada, and Kuwait.

He has 20 years of experience with ACL software, including 8 years as the leader of ACL Services Ltd.’s global training team. During his tenure at ACL Services, Michael helped drive the training business to new levels of revenues and profits by actively supporting the Sales team in pre-sales discussions.

Michael’s most recent experience consists of four years with eBay, Inc.’s internal audit team as Manager, Audit Analysis. He was tasked with integrating data analytics into the audit workflow on strategic and tactical levels. This included developing quality and documentation standards, training users, and providing analytics support on numerous audits in the IT, PayPal, and eBay marketplaces business areas. He also provided support to non-IA teams such as the Business Ethics Office and Enterprise Risk Management teams.

During his years at eBay, Michael supported audits throughout the organization in the IT, compliance, operations, vendor management, revenue assurance, T&E, and human resources areas.

Michael also has 7 years of experience with Arbutus Software, and has managed the transition to Arbutus from other data analysis tools. He is a proficient user of Tableau, Microsoft Access, and Teradata SQL Assistant.

AGENDA

� Defining data integrity verification (DIV)

� Sources of integrity erosion

� File-level testing

� Field-level testing

Data Integrity Verification4

Defining Data Integrity Verification

5Data Integrity Verification

Data Integrity Verification (DIV)

• The process by which the data analyst tests

the data to determine whether it is acceptable

for analysis

• Tests should be carried out at both the file

level and the field level before conducting any

analytics.

Data Integrity Verification6

The Risks of Integrity Erosion

• Lost time

• Incorrect conclusions

• Revenue/cost

• Security

• Professional standing

7Data Integrity Verification

Evidence of data integrity erosion

• Missing records

• Excess records

• Duplicates

• Shifted fields

• Skewed records

Data Integrity Verification8

• Blank/invalid entries

in key fields

• Incorrect/invalid

formatting

• Invalid characters in

data

Shifted Fields

Data Integrity Verification9

Skewed Records

Data Integrity Verification10

Sources of Integrity Erosion

11Data Integrity Verification

Processing…

Data Integrity Verification12

The Process

13Data Integrity Verification

Sources of data integrity errors

• Miscommunication of requirements

• Extraction

• Conversion

• Transmission

• Import

• Manual edits

• Data definition

Data Integrity Verification14

Miscommunication

• "All AP transactions between April and June,

including all important fields."

• "All AP payments and reversals between

4/1/2015 and 6/30/2015 (inclusive) including

the following fields: <field list>. The output

should be in a tab-delimited text file, and at

no point should it pass through a spreadsheet

or be opened in a spreadsheet application."

15Data Integrity Verification

Conversion

• Dropping leading zeros (ID numbers)

• Converting date to numeric

• Removing alphas from alphanumeric field

• Use of delimiter that is included within a text

field

• Insertion of blank lines in Excel

Data Integrity Verification16

Date Conversion

Data Integrity Verification17

Manual Edits

• Inadvertent/deliberate editing

• How does that happen?

– Sorting

– Formatting

– Copy/pasting

18Data Integrity Verification

Data Definition

• Record length

• Field position

• Formatting (date fields)

Data Integrity Verification19

File-Level Testing

20Data Integrity Verification

File-Level Testing

• Structure

• Content

Data Integrity Verification21

Structure

• Review metadata

• Send table layout to a table in Arbutus/ACL

• Compare field type/length/format to

metadata

22Data Integrity Verification

Content

• Completeness

– Run COUNT to document number of records

– Run TOTAL on numeric fields for control totals

• Uniqueness: Run DUPLICATES command

selecting all fields to identify duplicate

records

• Validity: Run VERIFY against numeric and

date fields

Data Integrity Verification23

Field-Level Testing: Numerics

24Data Integrity Verification

Numeric Fields: What to look for

Data Integrity Verification25

• Field total • Lowest value

•Highest value •Average

•Second-highest value •Range

•Ratio of 2nd highest to highest •Absolute value

•Median •Number of zeros

•Number of positives •Number of negatives

•Number of corrupt entries

Testing Numeric Fields

• Run STATISTICS against all numeric fields

– Look for zeros, negatives, bounds,

highest/second-highest

• Recalculate computed value with computed

fields (e.g, Total_Amount = Price * Quantity)

Data Integrity Verification26

Scripted Solution

Data Integrity Verification27

•Shows table/field names, and test date-time in a table

•Provides comprehensive, standard test results

•Faster and less error-prone than manual execution

•2 million records, 4 numeric fields in ~45 seconds

•Also saves table layout for file with _TL suffix

Script Results: Numerics

28Data Integrity Verification

Field-Level Testing: Dates

29Data Integrity Verification

Date Fields: What to look for

Data Integrity Verification30

•Oldest •Weekends

•Most recent •Blanks

•Span of valid dates •Invalid non-blank dates

Testing Date Fields

• Run STATISTICS against all date fields

– Blanks/invalids/weekends

– Bounds

• Test related fields, e.g., PO_Date <=

Invoice_Date

• Test for completeness (24/7 data) with GAPS

command

Data Integrity Verification31

Blank Dates & Formatting

• Entire date column is blank = Incorrect format

in field definition.

• Edit >> Table Layout to review and correct

format

Data Integrity Verification32

Formatting Date Fields

Data Integrity Verification33

Dates: Scripted Solution

34Data Integrity Verification

Field-Level Testing: Characters

35Data Integrity Verification

Character Fields: What to look for

Data Integrity Verification36

Item Functionality

Blanks ISBLANK(<key>)

Invalid entries CLASSIFY ON <key>

CLASSIFY ON FORMAT(<key>)

Duplicates DUPLICATES ON <key>

Character Fields: Formats

• Verify that format is valid

• May need to scrub

• PO numbers, customer IDs, phone numbers,

zip codes

• Use FORMAT() function in CLASSIFY to

display list of unique formats

CLASSIFY ON FORMAT(<field name>) TO "<output file>" OPEN

Data Integrity Verification37

Output of CLASSIFY + Format()

Data Integrity Verification38

•1 record per format

•Shows frequency

x= lower-case alpha

X = upper-case alpha

9 = numeric

Blanks/special characters

Mitigating Integrity Risk

39Data Integrity Verification

Key Items

• Know your data

• Obtain data independently (SQL?)

• Short chain from extraction to analysis

• Automated DIV

40Data Integrity Verification

The Process

41Data Integrity Verification

The New Process

42Data Integrity Verification

Benefits

• Independence

• Confidence

• Shorter time

• Comprehensive DIV

Data Integrity Verification43

Any questions?

Michael Kano, ACDA

mkano@sunera.com

Data Integrity Verification44

top related