audito tools

Detectlets for Better Fraud Detection

Conan C. Albrecht, PhD

Marriott School of Management

Brigham Young University

Today’s Presentation

• Give a few fraud stories

• Outline the Detectlet vision and Picalo architecture

• Show example code and working products

• Describe future research directions and solicit help

Two Types of Fraud

• Fraud on behalf of an organization

– Financial statement manipulation to make the company look better to stockholders

– Also called management fraud

• Fraud against an organization

– Stealing assets, information, etc.

– Also called employee or consumer fraud

ACFE Report to the Nation on Occupational Fraud and Abuse

• 2½-year study of 2,608 frauds totaling $15 million

– Fraud costs U.S. organizations more than $400 billion annually.

– Fraud and abuse costs employers an average of $9 a day per employee

– The average organization loses about 6 percent of its total annual revenue to fraud and abuse admitted to by its own employees

Ernst & Young Fraud Study 2002 (Europe)

• One in five workers are aware of fraud in their workplace

• 80% would be willing to turn in a colleague but only 43% have

• Employers lost 20 cents on every dollar to workplace fraud

• Types of fraud

– Theft of office items: 37%

– Claiming extra hours worked: 16%

– Inflating expense accounts: 7%

– Taking kickbacks from suppliers: 6%

Revenues      $100   100%
Expenses        90    90%
Net Income    $ 10    10%
Fraud            1
Remaining     $  9

To restore income to $10, the company needs $10 more in revenue to generate $1 more in income.

Cost of Fraud

• Fraud Losses Reduce Net Income $ for $

• If Profit Margin is 10%, Revenues Must Increase by 10 times Losses to Recover Effect on Net Income

– Losses: $1 Million

– Revenue: $10 Million

Fraud Cost: Two Examples

• Automobile Manufacturer

– $436 Million Fraud

– Profit Margin = 10%

– $4.36 Billion in Revenues Needed

– At $20,000 per Car, 218,000 Cars

• Large Bank

– $100 Million Fraud

– Profit Margin = 10%

– $1 Billion in Revenues Needed

– At $100 per year per Checking Account, 10 Million New Accounts
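The arithmetic behind the two examples can be checked with a short sketch. The function name `revenue_needed` is an illustrative choice, not something from the presentation, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction

def revenue_needed(fraud_loss, profit_margin):
    """Revenue required to generate net income equal to the fraud loss."""
    return Fraction(fraud_loss) / Fraction(profit_margin)

MARGIN = Fraction(1, 10)  # 10% profit margin, as stated on the slide

# Automobile manufacturer: $436 million fraud, $20,000 per car
auto_revenue = revenue_needed(436_000_000, MARGIN)
cars = auto_revenue / 20_000

# Large bank: $100 million fraud, $100 per year per checking account
bank_revenue = revenue_needed(100_000_000, MARGIN)
accounts = bank_revenue / 100

print(int(auto_revenue), int(cars))
print(int(bank_revenue), int(accounts))
```

Both results match the slide figures: $4.36 billion and 218,000 cars, $1 billion and 10 million accounts.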

[Chart: fraud amount by year, Year 1 through Year 9; vertical axis from 0 to 3,000,000,000]

Some of the organizations involved: Merrill Lynch, Chase, J.P. Morgan, Union Bank of Switzerland, Crédit Lyonnais, Sumitomo, and others.

A Recent Fraud

• Large Fraud of $2.6 Billion over 9 years

– Year 1 $600K

– Year 3 $4 million



– Year 9 $2.6 billion

• In years 8 and 9, four of the world’s largest banks were involved and lost over $500 million

Every Person Has A Price

• Abraham Lincoln once threw a man out of his office, angrily turning down a substantial bribe. “Every man has his price,” explained Lincoln, “and he was getting close to mine.”

Examples of Data-Based Detection

Superhuman Workers

• Summed all hours (normal, OT, DT) per two-week period, regardless of invoice or timecard

• Workers were logging hours on two timecards for simultaneous jobs
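A minimal sketch of this test in plain Python. The row layout, the sample data, and the 168-hour ceiling (12 hours a day for 14 days) are illustrative assumptions, not values from the actual case.

```python
from collections import defaultdict

# Hypothetical timecard rows, invented for illustration:
# (worker, pay_period, timecard_id, normal_hours, overtime_hours, doubletime_hours)
TIMECARDS = [
    ("W-101", "P01", "TC-1", 80, 10, 0),
    ("W-101", "P01", "TC-2", 80, 20, 8),
    ("W-102", "P01", "TC-3", 80, 4, 0),
]

def total_hours(rows):
    """Sum normal + OT + DT hours per (worker, period), ignoring the timecard."""
    totals = defaultdict(float)
    for worker, period, _timecard, normal, overtime, doubletime in rows:
        totals[(worker, period)] += normal + overtime + doubletime
    return dict(totals)

def superhuman_workers(rows, max_hours=168.0):
    """Return (worker, period, hours) for totals above max_hours, largest first."""
    flagged = [
        (worker, period, hours)
        for (worker, period), hours in total_hours(rows).items()
        if hours > max_hours
    ]
    return sorted(flagged, key=lambda item: item[2], reverse=True)

flagged = superhuman_workers(TIMECARDS)
print(flagged)
```

Summing across timecards is the key step: each timecard looks reasonable alone, and only the per-worker total exposes the overlap.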

The Family Business

Work Orders Authorized By Purchaser

The Family Business

Invoice Charges Authorized By Purchaser

The Family Business

Work Orders Given To Contractor Crew

The Family Business

• Tip stated that kickbacks were occurring with a certain company

• We researched the company and determined which purchaser authorized the work

• A contractor crew and company purchaser were family

Systematic Increases In Spending

Unexpected Peaks In Spending

Increases In Only Part Of A Trend

Caught by his Pool…

Research Background

Accounting History

• 1940 SEC Statement: “Accountants can be expected to detect gross overstatements of assets and profits whether resulting from collusive fraud or otherwise” (Accounting Series Release 1940)

• 1961: “If the ten (auditing) standards now accepted were satisfactory for their purpose we would not have the pleas for guidance on the extent of (auditors’) responsibility for the detection of irregularities we now find in our professional literature.” (Mautz & Sharaf 1961)

• 1997 - SAS 82

• 2002 - SAS 99

Expectation Gap

Historical Fraud Research

• Excellent literature review by Nieschwietz, Schultz, & Zimbelman (2000)

– Who commits fraud

– Red flags

– Expectation gap

– Auditor expectations

– Game theory between auditors and management

– Auditor-client relationships

– Risk assessment, decision aids

– Management factors affecting fraud

FS Fraud using Ratio Analysis

• Hansen et al. (1996) developed a generalized qualitative-response model from internal sources

• Green and Choi (1997) used neural networks to classify fraudulent cases

• Summers and Sweeney (1998) identified FS fraud using external and internal information

• Beneish (1999) developed a probit model using ratios for fraud identification

• Bell and Carcello (2000) developed a logistic regression model to identify fraud

• Current work by McKee, by Cecchini, and by Albrecht

• None have found the “silver bullet” in using external information to identify fraud

– Management (FS) fraud is very difficult to find

What are the Big 4 Doing?

• Each firm seems to have different groups working on fraud detection

– No best practices model has emerged

• IT auditors perform control testing on company systems, not fraud detection

• Meeting with Bill Titera of EY

Why Don’t “They” Find Fraud?

• Limited time

– Our most precious resource is our attention

• History

– Heavy use of sampling; lack of detail

– Lack of historical fraud detection instruction

• Lack of fraud symptom expertise

• Lack of fraud-specific tools

• Lack of analysis skills

• Lack of expertise in technology

• Auditors do find 20-30 percent of fraud

» ACFE 2004 Report to the Nation

Isn’t there a better way?

Reasonable time requirements

Within reach of most auditors (highly technical skills not required)

Cost effective

Integrate easily into different database schemas

Integrate AI and auto-detection

Initial Thoughts

• A small “manual” about frauds

– CliffsNotes-style summaries of different types of fraud

– Describes the scheme

– Describes the indicators of the scheme

• Worldwide repository with contributions from many different industries

• Primary focus was training

Detectlets

• A detectlet encodes:

– Background information on a scheme

– Detail on a specific indicator of the scheme

– Wizard interface to walk the user through input selection

– Algorithm coded in standard format

– “How to interpret results” follow-up

• Input is one or more table objects

• Output is one or more table objects
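One way to picture the parts of a detectlet is as a small record whose algorithm takes tables in and hands tables back. This is a sketch only: the `Detectlet` class, its field names, and the duplicate-invoice example are hypothetical and do not reproduce Picalo's actual format. A table is stood in for by a list of row dictionaries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Table = List[Dict[str, object]]  # stand-in for a table object

@dataclass
class Detectlet:
    name: str
    background: str            # background information on the scheme
    indicator: str             # the specific indicator this detectlet tests
    input_prompts: List[str]   # questions a wizard would ask to select inputs
    algorithm: Callable[..., List[Table]]
    interpretation: str        # "how to interpret results" follow-up

    def run(self, *tables: Table) -> List[Table]:
        """Apply the encoded algorithm to one or more input tables."""
        return self.algorithm(*tables)

def duplicate_invoices(invoices: Table) -> List[Table]:
    """Return one table holding every repeat of a (vendor, invoice_no) pair."""
    seen, repeats = set(), []
    for row in invoices:
        key = (row["vendor"], row["invoice_no"])
        if key in seen:
            repeats.append(row)
        seen.add(key)
    return [repeats]

detectlet = Detectlet(
    name="Duplicate invoices",
    background="A vendor or insider submits the same invoice twice.",
    indicator="Same vendor and invoice number appear more than once.",
    input_prompts=["Which table holds the invoices?"],
    algorithm=duplicate_invoices,
    interpretation="Each returned row repeats an earlier invoice; verify before paying.",
)

INVOICES = [
    {"vendor": "Acme", "invoice_no": "1001", "amount": 250.00},
    {"vendor": "Acme", "invoice_no": "1002", "amount": 310.00},
    {"vendor": "Acme", "invoice_no": "1001", "amount": 250.00},
]
results = detectlet.run(INVOICES)
print(results[0])
```

Because the output is again a list of tables, the result of one detectlet could be fed to another.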

Detectlet Demonstration

• Bid rigging where one person prepares all bids

Item  BidderAUnit  BidderATotal  BidderBUnit  BidderBTotal  BidderCUnit  BidderCTotal
1.1.10  1829.85  1829.65  2100.00  1895.00
1.1.20  1256.99  1256.99  1380.00  1301.88
1.1.30  3467.52  3467.52  3900.00  3591.36
1.1.40  4.21  421.00  4.65  465.00  4.36  436.00
1.1.50  1.91  229.20  2.10  252.00  1.98  237.00
1.1.60  13328.00  13328.00  15100.00  13804.00
1.1.70  3360.00
1.2.10  32.48  162.40  35.60  178.00  33.62  168.20
1.2.20  13.22  661.00  14.50  725.00  13.69  684.50
1.2.30  13.89  694.00  15.25  762.50  14.38  719.00
1.2.40  9.97  229.10  10.95  328.50  10.32  309.60
1.3.10  124.43  373.29  136.65  409.95  128.88  386.64
1.3.20  139.63  279.26  153.35  306.70  144.62  289.24
1.3.30  34.12  102.36  37.45  112.35  35.34  106.02
1.3.40  124.43  622.15  136.65  683.25  128.88  644.40
1.3.50  26.82  536.40  29.45  589.00  27.78  655.60
1.3.60  20.80  416.00  22.85  457.00  21.54  430.80
1.3.70  39.66  793.20  43.55  871.00  41.08  821.60
1.3.80  51.48  1287.00  56.55  1413.75  53.32  1333.00
1.3.90  52.96  1324.00  58.10  1452.60  54.85  1371.25
1.3.100  52.96  847.36  58.10  929.60  54.85  877.60
1.3.110  277.28  11091.20  304.50  12180.00  287.19  11487.60
1.3.120  203.53  223.50  210.80
1.3.130  45.99  2759.40  50.50  3030.00  47.63  2857.80
1.3.140  12.19  487.60  13.40  536.00  12.63  505.20
1.3.150  11.70  468.00  12.85  514.00  12.12  484.80
1.3.160  12.49  249.80  13.70  274.00  12.94  258.80
1.3.170  2.45  24.50  2.70  27.00  2.54  25.40
1.3.180  326.39  326.39  358.00  338.05
1.4.10  9541.68  9541.62  10480.00  10480.00  9882.46  9882.46
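One indicator that a single person prepared all bids is that competing bids sit at a nearly constant multiple of a base bid across line items. The sketch below tests for that pattern using unit prices from six line items of the bid data (1.1.40, 1.2.20, 1.3.10, 1.3.30, 1.3.60, 1.3.140). The function names, the 1% tolerance, and the "independent" comparison bidder are assumptions made for illustration; this is not the detectlet's actual algorithm.

```python
from statistics import mean, stdev

# Unit prices for six line items, taken from the bid data
BIDDER_A = [4.21, 13.22, 124.43, 34.12, 20.80, 12.19]
BIDDER_B = [4.65, 14.50, 136.65, 37.45, 22.85, 13.40]
BIDDER_C = [4.36, 13.69, 128.88, 35.34, 21.54, 12.63]
# Hypothetical bidder pricing each item independently (invented for contrast)
INDEPENDENT = [3.90, 15.80, 110.00, 40.10, 19.75, 14.60]

def ratio_profile(base_bids, other_bids):
    """Return (mean ratio, relative spread) of other/base across line items."""
    ratios = [other / base for base, other in zip(base_bids, other_bids)]
    average = mean(ratios)
    return average, stdev(ratios) / average

def looks_derived(base_bids, other_bids, tolerance=0.01):
    """True when the ratios are so uniform that one bid looks scaled from the other."""
    _average, spread = ratio_profile(base_bids, other_bids)
    return spread < tolerance

for label, bids in [("B", BIDDER_B), ("C", BIDDER_C), ("independent", INDEPENDENT)]:
    average, spread = ratio_profile(BIDDER_A, bids)
    print(label, round(average, 3), round(spread, 4), looks_derived(BIDDER_A, bids))
```

Bidders B and C come out at roughly 10% and 3.6% above Bidder A on every item, which is the pattern an auditor would want to follow up on.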

Potential Supporting Platforms

• MS Access

• ACL or IDEA

• Build an application from the ground up

– Allows total control over platform

– Stays with open source rather than tying the program to a particular platform

• For example, consider PowerBuilder

– Supports Windows, Unix, Linux, Mac

– Allows embedded use within a greater platform

– Personal preference was Python

Picalo: The Supporting Platform

Central Detectlet Repository

How Detectlets Address the Problem

• Limited Time: Detectlets provide a wizard interface for quick execution; they can be chained and automated into a larger system

• High Cost: Detectlets are built on open source software, putting them within reach of small and large accounting firms; they also create a community environment for fraud detection

• Lack of fraud symptom expertise: Detectlets provide a large library of available routines to both train and walk auditors through the detection process

• Lack of fraud-specific tools: Picalo provides an open solution that we can improve over time; it puts a fraud-specific toolkit in the hands of auditors

• Lack of analysis skills: Detectlets encode full algorithms and code, allowing the auditor to stay at the conceptual level rather than the implementation level

• Lack of expertise in technology: Detectlets provide a wizard-based solution that is easy to use; Picalo provides an Excel-like user interface

Picalo Level 1 API

Data Structures

The Table object is the basic data structure. Nearly all routines both input and return tables, allowing them to be chained. Its methods include sorting, column operations, row operations, and import/export from delimited text and Excel formats.

Column types include Boolean, Integer, Floating Point, Date, DateTime, String, etc.

Simple Module

Provides joining, matching, fuzzy matching, and selection.

col_join, col_left_join, col_right_join, col_match, col_match_same, col_match_diff, compare_records, custom_match, custom_match_same, custom_match_diff, describe, expression_match, find_duplicates, find_gaps, fuzzysearch, fuzzymatch, fuzzycoljoin, get_unordered, join, left_join, right_join, select, select_by_value, select_outliers, select_outliers_z, select_nonoutliers, select_nonoutliers_z, select_records, soundex, soundexcol, sort, etc.
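To illustrate the kind of selection a z-score outlier routine performs, here is a minimal sketch in plain Python. It is not Picalo's `select_outliers_z`: the function names, the list-of-dictionaries table, and the threshold of 2.0 are all assumptions.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standard score of each value relative to the whole column."""
    average = mean(values)
    deviation = pstdev(values)
    if deviation == 0:
        return [0.0 for _ in values]  # no variation, so nothing stands out
    return [(value - average) / deviation for value in values]

def outliers_by_z(rows, column, threshold=2.0):
    """Return the rows whose value in `column` lies beyond the z-score threshold."""
    scores = z_scores([row[column] for row in rows])
    return [row for row, score in zip(rows, scores) if abs(score) > threshold]

# Hypothetical payment amounts, invented for illustration
PAYMENTS = [{"id": i, "amount": amount}
            for i, amount in enumerate([100, 102, 98, 101, 99, 103, 97, 100, 500])]

unusual = outliers_by_z(PAYMENTS, "amount")
print(unusual)
```

Note that a single extreme value inflates the standard deviation itself, so thresholds need care on small tables.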

Benfords Module

calc_benford: Calculates probability for a single digit

get_expected: Calculates probability for a full number

analyze: Analyzes an entire data set and calculates summarized results
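For reference, Benford's law gives the expected share of numbers whose leading digit is d as log10(1 + 1/d). The sketch below computes that probability and compares it with observed leading digits. The function names are hypothetical and are not the module's `calc_benford` or `analyze`; the sample amounts are a handful of totals borrowed from the bid data purely as input.

```python
import math
from collections import Counter

def benford_first_digit(digit):
    """Benford's law probability that a number's leading digit is `digit` (1-9)."""
    if digit not in range(1, 10):
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / digit)

def leading_digit(value):
    """First significant digit of a nonzero number."""
    # Scientific notation always places the leading digit first.
    return int(f"{abs(value):.15e}"[0])

def compare_to_benford(values):
    """Return {digit: (observed_share, expected_share)}, skipping zeros."""
    digits = [leading_digit(value) for value in values if value != 0]
    if not digits:
        raise ValueError("no nonzero values to analyze")
    counts = Counter(digits)
    return {d: (counts[d] / len(digits), benford_first_digit(d)) for d in range(1, 10)}

AMOUNTS = [1829.85, 1256.99, 3467.52, 421.00, 229.20, 13328.00, 162.40, 661.00]
for digit, (observed, expected) in compare_to_benford(AMOUNTS).items():
    print(digit, round(observed, 3), round(expected, 3))
```

Eight values are far too few for a meaningful comparison; in practice the test is run over a full ledger or invoice file.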

Crosstable Module

pivot: Similar to Excel’s pivot table function

pivot_table: Pivots and keeps detail in each cell

pivot_map: Pivots and keeps results in a dictionary rather than a grid

pivot_map_detail: Pivots and keeps results in a very detailed fashion using a dictionary

Database Module

OdbcConnection: Connects to any ODBC-compliant database

PostgreSQLConnection: Connects to PostgreSQL

MySQLConnection: Connects to MySQL

Also includes various query helper functions, such as query creation, results analysis, etc.

Financial Module

Calculates various financial ratios to help in financial statement analysis:

Current ratio

Quick ratio

Net working capital

Return on assets

Return on equity

Return on common equity

Profit margin

Earnings per share

Asset turnover

Inventory turnover

Debt to equity

Price earnings
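A few of these ratios, written out using their common textbook definitions. The function names and the sample figures are illustrative assumptions; the module's own signatures are not shown in the presentation.

```python
def current_ratio(current_assets, current_liabilities):
    """Current assets divided by current liabilities."""
    return current_assets / current_liabilities

def quick_ratio(current_assets, inventory, current_liabilities):
    """Current assets excluding inventory, divided by current liabilities."""
    return (current_assets - inventory) / current_liabilities

def net_working_capital(current_assets, current_liabilities):
    """Current assets minus current liabilities."""
    return current_assets - current_liabilities

def profit_margin(net_income, revenue):
    """Net income as a share of revenue."""
    return net_income / revenue

def debt_to_equity(total_liabilities, total_equity):
    """Total liabilities divided by total equity."""
    return total_liabilities / total_equity

# Hypothetical balances, invented for illustration
print(current_ratio(500, 250))
print(quick_ratio(500, 200, 250))
print(net_working_capital(500, 250))
print(profit_margin(10, 100))
print(debt_to_equity(600, 400))
```

Tracking such ratios across periods, rather than reading one period in isolation, is what makes them useful for spotting manipulated statements.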

Grouping Module

Stratification gives the details behind SQL GROUP BY. It keeps the detail tables rather than summarizing them.

stratify: Stratifies a table into N tables

stratify_by_expression: Stratifies a table using an arbitrary expression

stratify_by_value: Stratifies on unique values

stratify_by_step: Stratifies based on a set numerical range

stratify_by_date: Stratifies based on a date range

Summarizing is similar to SQL GROUP BY, but it allows any type of function to be used for summarization (GROUP BY generally only allows sum, stdev, mean, etc.).

This can be done in the same ways as stratification.
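The difference between stratifying and summarizing can be sketched in a few lines. This is plain Python over a list of row dictionaries, not the module's implementation; the function names mirror the ideas above but their signatures are assumptions.

```python
from collections import defaultdict

def stratify_by_value(rows, column):
    """Split rows into one detail table per unique value of `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)

def summarize_by_value(rows, column, func):
    """Apply an arbitrary function to each stratum, one result per unique value."""
    return {key: func(group) for key, group in stratify_by_value(rows, column).items()}

# Hypothetical purchase rows, invented for illustration
PURCHASES = [
    {"purchaser": "A", "amount": 100},
    {"purchaser": "B", "amount": 40},
    {"purchaser": "A", "amount": 300},
]

def amount_range(group):
    """A summary SQL aggregates do not offer directly: spread between extremes."""
    amounts = [row["amount"] for row in group]
    return max(amounts) - min(amounts)

strata = stratify_by_value(PURCHASES, "purchaser")
summary = summarize_by_value(PURCHASES, "purchaser", amount_range)
print(strata["A"])
print(summary)
```

Stratifying keeps every detail row available for follow-up, while summarizing reduces each group to whatever the supplied function returns.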

Trending Module

Various ways of analyzing trends and patterns over time.

cusum, highlow_slope, average_slope, regression, handshake_slope
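As an illustration of one of these techniques, a cumulative sum (CUSUM) adds up each observation's deviation from a target so that a sustained shift shows up as a steadily climbing line. The sketch below is a generic version with an assumed signature, not the module's `cusum`; the spending figures are invented.

```python
from itertools import accumulate
from statistics import mean

def cusum(values, target=None):
    """Running total of deviations from `target` (defaults to the series mean)."""
    if target is None:
        target = mean(values)
    return list(accumulate(value - target for value in values))

# Hypothetical monthly spending: steady at 10, then drifting upward
SPENDING = [10, 10, 10, 14, 15, 16]

against_budget = cusum(SPENDING, target=10)
against_mean = cusum(SPENDING)
print(against_budget)
print(against_mean)
```

Measured against a fixed budget of 10, the running total stays flat and then rises from the fourth month, which marks where the systematic increase begins.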

Python Libraries

Powerful yet easy language with a significant online community

Full object-oriented support (classes, inheritance, etc.)

Text manipulation and analysis routines

Web site spidering routines

Email analysis routines

Random number generation

Connection to nearly all databases

Web site development and maintenance

Countless libraries available online (almost all are open source)

Research Directions

Level 1 Research

• Foundation routines for fraud detection

– Development, testing, empirical use, field studies

• Connections to production software

– Standard SAP, Oracle, PeopleSoft, JD Edwards, etc. modules

• Application of CS, statistics, other techniques to fraud detection

– Time series analysis

– Pattern recognition for fraud detection

Level 2 Research

• Studies about detectlet presentation, user interface

• Creation and testing of detectlets for industries, data schemas, etc.

• Detectlets for financial statement fraud detection

• Testing of detectlet vs. traditional ACL-type fraud detection

• Patterns of detectlet development, best practices

Level 3 Research

• Automatic mapping of field schemas to a common schema

• Application of expert system, learning models for automatic detection

– Decision trees

– Classification models

• Meta-detectlets to combine various Level 2 detectlets into higher-level logic

Other Research

• Group-oriented processes for the central repository

– Searching, categorization

– Testing, rating systems

• Marketplaces for detectlets

• Development of Picalo itself

My Hope

• In 5 years we’ll have a large repository of detectlets to:

– Support both external and internal auditors

– Teach students in fraud classes

– Conduct theoretical and empirical research

http://www.picalo.org/
