© 2015 The MathWorks, Inc.
Fraud Detection with MATLAB
Ian McKenna, Ph.D.
Agenda
Introduction: Background on Fraud Detection
Challenges: Knowing your Risk
Overview of the MATLAB Solution
– Connect to financial data sources
– Calculate fraud indicators
– Classify funds with machine learning
– Generate reports & deploy applications
Questions & Answers
Fraud Detection
Detecting when people intentionally act secretly to deprive another of something of value
Types
– Returns Forensics
– Linguistic Based Cues
http://nakedshorts.typepad.com/files/madoff_fairfieldsentry3x.pdf
Types of Fraud
Corporate
– Financial statement falsification
Securities and commodities
– Hedge Fund returns manipulation
– Stock markets manipulation, regulation compliance
Healthcare
Mortgage
Identity theft (credit card)
Insurance
Mass marketing
Asset forfeiture/money laundering
Hedge Fund Returns Manipulation
More prone to fraud due to decreased regulation
– SEC stats indicate 1% misbehave
Scenarios
– Misbehavior: hedge fund managers have some discretion in valuing illiquid investments. Academics have devised methods to analyze and flag potentially “manipulated” fund returns.
– Outright fraud: quantitative screening and dedicated algorithms can save a lot of time
Return-Based Analysis
The number of negative monthly returns is used to judge a manager’s performance
Managers can attract investors by misreporting returns
Distortion is possible for returns valued at the manager’s discretion
– Illiquid assets, complex assets
E.g. a discontinuity exists at zero in the distribution of monthly returns but disappears if returns are computed bimonthly
“Suspicious Patterns in Hedge Fund Returns and the Risk of Fraud”. Bollen, Nicolas P.B. and Veronika K. Pool (2012) Review of Financial Studies 25, 2673-2702.
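The discontinuity idea can be illustrated with a simple bin comparison. The sketch below is written in Python for illustration only and is not the MATLAB code behind the slides; the function name `zero_discontinuity`, the 0.5% bin width, and the "average of the two neighbouring bins" benchmark are assumptions chosen to keep the example small. It counts returns in the bin just below zero and compares that count with the mean count of the bins on either side.

```python
def zero_discontinuity(returns, width=0.005):
    """Compare the bin just below zero with the mean of its two neighbours.

    `width` is a hypothetical bin width (0.5% per bin); returns are decimals.
    Returns (observed count just below zero, count expected from neighbours).
    """
    def count(lo, hi):
        # Number of returns r with lo <= r < hi
        return sum(1 for r in returns if lo <= r < hi)

    below_zero = count(-width, 0.0)               # small losses
    left_neighbour = count(-2 * width, -width)    # larger losses
    right_neighbour = count(0.0, width)           # small gains
    expected = (left_neighbour + right_neighbour) / 2
    return below_zero, expected


# Toy data: few small losses relative to the neighbouring bins
sample = [-0.008, -0.007, -0.006, -0.002, 0.001, 0.002, 0.003, 0.004]
observed, expected = zero_discontinuity(sample)
print(observed, expected)
```

An observed count well below the expected count is the kind of kink at zero the slide refers to; a real test would also need a standard error for the difference.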
Returns Distribution Discontinuity
Benford’s Law
Describes the frequency distribution of leading digits in many real-life sources of data:
– Electricity bills
– Street addresses
– Stock prices
– Population numbers
– Death rates
– Physical and mathematical constants
– Processes described by power laws
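Benford's Law gives the expected share of each leading digit d as log10(1 + 1/d), so a 1 leads about 30.1% of the time and a 9 about 4.6%. The Python sketch below is an illustration rather than the presenter's MATLAB code; the helper names `first_digit`, `benford_probs`, and `first_digit_frequencies` are hypothetical. It tabulates the expected shares and the observed leading-digit shares of a data set.

```python
import math
from collections import Counter


def first_digit(x):
    """Leading nonzero digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(x):e}"[0])


def benford_probs():
    """Expected leading-digit probabilities under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit_frequencies(values):
    """Observed share of each leading digit 1..9, ignoring exact zeros."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


expected = benford_probs()
observed = first_digit_frequencies([0.012, 0.019, -0.25, 3.1, 0.0])
for d in range(1, 10):
    print(d, round(expected[d], 3), observed[d])
```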
Stock Market Returns First Digit Frequency
Source: “Checking Financial Markets via Benford's Law”, Marco Corazza, Andrea Ellero, and Alberto Zorzi
Challenges in Fraud Detection
Cost/Economics
– Most cases not fraud
– Manual analysis
Data
– Huge data sets
– Complex data types
– Data integration
Change
– Evolutionary
– Secrecy in detection methods
Challenges Faced During Model Development
Traditional Approach | Challenge
Off-the-shelf software | Inability to work with custom and complex data
In-house development with traditional languages | Adapting requires long development times
Spreadsheets, Excel | Limited data size
Combination of the above | Inefficiencies in Integration & Automation
Computational Finance Workflow
[Workflow diagram]
Access: Files, Databases, Datafeeds
Research and Quantify: Data Analysis & Visualization, Financial Modeling, Application Development
Share: Reporting, Applications, Production
Automate
The Desired Report
Three funds to analyze and report:
– Gateway Fund
– American Funds Growth Fund
– Fairfield Sentry (known fraudulent Madoff fund)
Implemented Methods – Returns Based
Returns distribution and discontinuity at 0 | Check for a discontinuity at 0 in the distribution of monthly returns
Low correlation with other assets | Regress fund returns on a combination of style factors that maximizes the explanatory power of the analysis
Unconditional serial correlation | Check whether monthly returns are serially correlated, i.e. correlated with their previous month's value. Managers investing in illiquid securities with no end-of-month quoted price may smooth their returns relative to all available market information
Conditional serial correlation | Using the optimal factor model constructed in “Low correlation with other assets”, check for serial correlation occurring especially after a down month (i.e. when a suspicious manager has the highest incentive to “catch up”)
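As an illustration of the unconditional serial correlation check, the following Python sketch computes the lag-1 sample autocorrelation of a return series. It is not the MATLAB implementation from the talk; the function name `lag1_autocorr` is hypothetical, and the conditional variant (which depends on the fitted factor model) is deliberately left out.

```python
from statistics import mean


def lag1_autocorr(returns):
    """Lag-1 sample autocorrelation: how strongly each month tracks the previous one."""
    m = mean(returns)
    # Cross-products of consecutive deviations from the mean
    num = sum((returns[t] - m) * (returns[t - 1] - m) for t in range(1, len(returns)))
    # Total sum of squared deviations
    den = sum((r - m) ** 2 for r in returns)
    return num / den


# A smoothly trending series gives a positive value, an alternating one a negative value
print(lag1_autocorr([1, 2, 3, 4, 5]))
print(lag1_autocorr([1, -1, 1, -1, 1, -1]))
```

A clearly positive value on monthly returns is consistent with return smoothing, although it can also arise from genuinely illiquid holdings, so it works as a flag rather than proof.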
Implemented Methods – Returns Based
Number of returns equal to 0 | Calculate the theoretical number of returns equal to 0, using the cumulative distribution function and binomial coefficients, for a time series with the same characteristics (average return and variance) as the fund. Then compare that number with the actual count.
Number of negative returns | Calculate the theoretical number of negative returns as above. Then compare that number with the actual count.
Number of unique returns / length of identical recurring series | Calculate the theoretical value of each pattern. Unique returns is the number of unique values in the time series, and length of identical series is the number of consecutive observations that are identical. Then compare these theoretical values with the actual counts.
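The negative-returns comparison can be sketched as follows. This Python example is illustrative only and is not the talk's MATLAB code: it assumes returns are normally distributed with the fund's own mean and standard deviation (the slide does not name the distribution), and the function name `negative_return_check` is hypothetical. The normal CDF gives the per-month probability of a negative return, and binomial coefficients give the probability of seeing so few negative months.

```python
from math import comb
from statistics import NormalDist, mean, stdev


def negative_return_check(returns):
    """Return (observed negatives, expected negatives, P(X <= observed)).

    Assumption: monthly returns are i.i.d. normal with the sample mean and
    standard deviation of the fund.
    """
    n = len(returns)
    # Probability that a single month is negative under the fitted normal
    p_neg = NormalDist(mean(returns), stdev(returns)).cdf(0.0)
    observed = sum(1 for r in returns if r < 0)
    # Binomial probability of `observed` or fewer negative months out of n
    p_value = sum(
        comb(n, k) * p_neg**k * (1 - p_neg) ** (n - k) for k in range(observed + 1)
    )
    return observed, n * p_neg, p_value


print(negative_return_check([-0.02, -0.01, 0.01, 0.02]))
```

A very small probability means the fund reports far fewer losing months than its own mean and variance would suggest.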
Implemented Methods – Returns Based
Sample distribution of the last digit | Check whether the last digit of the returns is uniformly distributed, using a goodness-of-fit test
Sample distribution of the first digit | Check whether the first digit of the returns follows Benford’s Law, using a goodness-of-fit test
Supervised classification methods | Using machine learning tools (such as neural networks and other classification methods), train a model to identify potential fraudsters. The input variables consist of all of the indicators described so far, computed for previously identified fraudulent and non-fraudulent funds. Apply the fitted model to a new fund to obtain its classification.
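A goodness-of-fit check on the last digit might look like the Python sketch below. It is an illustration, not the presenter's MATLAB code: the Pearson chi-square statistic is one common choice of goodness-of-fit test (the slide does not say which test was used), returns are assumed to be reported as decimals to four places, and the names `last_digits`, `chi_square_stat`, and `last_digit_uniformity` are hypothetical.

```python
from collections import Counter


def last_digits(returns, decimals=4):
    """Last reported digit of each return, assuming `decimals` decimal places."""
    return [round(abs(r) * 10**decimals) % 10 for r in returns]


def chi_square_stat(observed_counts, expected_probs):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    n = sum(observed_counts)
    return sum(
        (o - n * p) ** 2 / (n * p) for o, p in zip(observed_counts, expected_probs)
    )


def last_digit_uniformity(returns, decimals=4):
    """Chi-square statistic of last digits against a uniform distribution on 0..9."""
    counts = Counter(last_digits(returns, decimals))
    observed = [counts.get(d, 0) for d in range(10)]
    return chi_square_stat(observed, [0.1] * 10)


print(last_digit_uniformity([0.0015] * 10))  # every return ends in 5
```

With ten digit classes the statistic has 9 degrees of freedom, so values above roughly 16.9 reject uniformity at the 5% level. The same `chi_square_stat` helper applies to the first-digit test by passing Benford probabilities instead of uniform ones.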
Text Based Indicators
Idea from published research in criminal investigation
Hypothesis - deceptive senders display:
– Higher quantity
– Higher expressivity
– Higher informality
– Higher uncertainty
– Higher nonimmediacy
– Lower complexity
– Lower diversity
– Lower specificity
“Automating Linguistics-Based Cues for Detecting Deception in Text-based Asynchronous Computer-Mediated Communication”. Lina Zhou (Department of Information Systems, University of Maryland, Baltimore County, MD, USA), Judee K. Burgoon, Jay F. Nunamaker, Jr. and Doug Twitchell (Center for the Management of Information, University of Arizona, Tucson, AZ, USA) (2004) Group Decision and Negotiation 13, 81–106.
Implemented Methods – Text Based
Measure of Complexity
– Average number of statements (average concepts per sentence)
– Average sentence length (average complexity of structures)
– Vocabulary complexity (average word length)
Measure of Uncertainty
– Average use of modifiers (number of adjectives/adverbs per sentence)
– Average reference to others (number of he, they, …)
Measure of Expressivity
– Emotiveness (number of adjectives compared to nouns)
Measure of Diversity
– Lexical diversity (number of unique words)
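The indicators that do not need part-of-speech tags can be computed with plain tokenization. The Python sketch below is illustrative and is not the talk's MATLAB code; the function name `text_indicators` and the regular-expression tokenizer are assumptions, and lexical diversity is normalized here as unique words divided by total words so that texts of different lengths can be compared. Modifier counts and emotiveness are omitted because they require a tagger.

```python
import re


def text_indicators(text):
    """Compute simple complexity and diversity cues from raw text."""
    # Split on sentence-ending punctuation and drop empty fragments
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased alphabetic tokens (apostrophes kept inside words)
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "avg_sentence_length": len(words) / len(sentences),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "lexical_diversity": len(set(words)) / len(words),
    }


print(text_indicators("The fund is safe. The fund is very safe!"))
```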
Classifying Words
Java POS Tagger
Reference online dictionary
Only a few lines of code
Comparison: American Growth Fund
Comparison: Madoff
Next Steps: Machine Learning with MATLAB
To learn more, visit: www.mathworks.com/machine-learning
– Basket Selection using Stepwise Regression
– Classification in the presence of missing data
– Regression with Boosted Decision Trees
– Hierarchical Clustering
MATLAB Solutions
Traditional Approach | Challenge | Solution
Off-the-shelf software | Inability to work with custom and complex data | Flexible Modeling: work with structured/unstructured
In-house development with traditional languages | Adapting requires long development times | Rapid Prototyping: Advanced
Spreadsheets, Excel | Limited data size | Work with Big Data Sets: Database/Hadoop
Combination of the above | Inefficiencies in Integration & Automation | Easy to Integrate & Deploy: automated reports, encrypted models
Financial Modeling Workflow
[Workflow diagram]
Access: Files, Databases, Datafeeds
Research and Quantify: Data Analysis and Visualization, Financial Modeling, Application Development
Share: Reporting, Applications, Production
Products shown: MATLAB, Parallel Computing, MATLAB Distributed Computing Server, Financial, Statistics & Machine Learning, Optimization, Financial Instruments, Econometrics, Datafeed, Database, Spreadsheet Link EX, Trading, Report Generator, MATLAB Compiler, MATLAB Compiler SDK, Production Server
Q&A