DERIVING INSIGHTS FROM BIG DATA
Presented by: Solon Angel
Product Manager
CaseWare IDEA Inc.
November 13, 2012
• Introduction
• What is BIG DATA?
• Impact on Audit
• Analytics & Collaboration
• Best Practices
• Questions & Answers
Agenda
BIG DATA
Megabytes
Gigabytes
Terabytes
Petabytes
Increasing Data Variety & Complexity
Web Logs
Sales transactions
Offer history
Affiliate Networks
Search Marketing
Behavioral Targeting
Sensors / RFID / Devices
Mobile Web
User Click Stream
Sentiment
User Generated Content
Social Interactions & Feeds
Spatial & GPS Coordinates
Business Data Feeds
Speech to Text
Product Service Logs
SMS/MMS
Purchase
Detail Purchase Record
Payment Record
Support contacts
External Demographics
HD Audio, Video, Images
ERP
CRM
WEB
Automated reports
Offer details
Printed reports
AP / AR
What is BIG DATA?
Devil in the Data
ATMs
ERPs
Transactional data
CRM, accounting
databases, new compliance
requirements, new media, etc.
Exabyte(s)
TENFOLD GROWTH OBSERVED IN FIVE YEARS
• In 2011, the volume of digital data was 10 times what it was in 2006
• Data sets exceed the ability of standard tools to process them
• 44-fold growth expected in the next ten years
• Data growth cannot be ignored
• Requires a new approach to enable insights and process
optimization
Growing Challenge
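As a rough worked example of the growth figures above, the sketch below converts each fold-increase into an implied compound annual rate. It assumes smooth compound growth, which the slide does not state, and the function name is illustrative.

```python
def annual_growth_rate(fold: float, years: float) -> float:
    """Implied compound annual growth rate for a given fold-increase."""
    return fold ** (1.0 / years) - 1.0

# Figures from the slide: tenfold over five years (2006 to 2011),
# and a projected 44-fold over the next ten years.
observed = annual_growth_rate(10, 5)
projected = annual_growth_rate(44, 10)

print(f"Observed:  {observed:.1%} per year")
print(f"Projected: {projected:.1%} per year")
```

Under this assumption the observed rate works out to roughly 58% per year and the projected rate to roughly 46% per year.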
Poll 1
• What is the size of the biggest data file
you’ve worked with?
• 100 MB – 1 GB
• 1 GB – 500 GB
• 500 GB – 1 TB (terabyte)
• More than 1 TB
Impact on Audit
• The big data problem
Higher volumes mean longer analysis time
A larger variety of data types increases audit complexity
Fast-changing record sets turn audit-focused data into
a moving target
Providing insights becomes difficult on desktops
Higher Volume Impact
• The problem with big data:
Higher volumes mean longer analysis time, or no
analysis!
• Example: Medicare
Higher Volume Impact
• Medicare
Medicare data spans states, dozens of agencies,
private companies, and datacenters
Record set is extremely fragmented
It is impossible to transfer all the data to one location
for processing
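When data cannot be moved to one location, one common pattern is to summarize at each site and transfer only the small partial results. The sketch below illustrates that idea with made-up per-site claim amounts; the site names, figures, and function names are hypothetical and not drawn from Medicare data.

```python
from dataclasses import dataclass

@dataclass
class PartialSummary:
    """Small per-site result that is cheap to transfer."""
    count: int = 0
    total: float = 0.0

def summarize_site(amounts: list[float]) -> PartialSummary:
    """Runs where the data lives; only the summary leaves the site."""
    return PartialSummary(count=len(amounts), total=sum(amounts))

def combine(parts: list[PartialSummary]) -> PartialSummary:
    """Merge per-site summaries into one overall summary."""
    return PartialSummary(
        count=sum(p.count for p in parts),
        total=sum(p.total for p in parts),
    )

# Hypothetical claim amounts held at three separate sites.
sites = {
    "site_a": [120.0, 80.0, 45.5],
    "site_b": [300.0, 10.0],
    "site_c": [75.25],
}

overall = combine([summarize_site(rows) for rows in sites.values()])
print(overall.count, overall.total, overall.total / overall.count)
```

Counts and sums merge exactly this way; statistics such as medians need a different approach, because they cannot be rebuilt from per-site totals alone.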
More Variety Impact
Yesterday/Today’s data:
Files
Databases
Tables
Columns
Today/Tomorrow’s data:
Large PDFs
Automated feeds
Raw data extracts
Unstructured data
Scanned data
Audio files
Video
More Variety Impact
• Cause of complex problems and gaps
Data Sources:
Transactional systems
Data warehouses
Online databases
Client files
Printed reports
Analytics / Audit Tests:
Extract
Aging
Sort
Search
Group
Stratify
Standards
Gaps
Duplicates
Sampling
Statistics
Join
Append
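Two of the audit tests named on this slide, duplicate detection and gap detection, can be sketched in a few lines. This is a generic illustration on made-up invoice numbers, not how IDEA or any other CAAT implements these tests; all names are hypothetical.

```python
from collections import Counter

def find_duplicates(keys: list[int]) -> list[int]:
    """Return keys that appear more than once, in sorted order."""
    counts = Counter(keys)
    return sorted(k for k, n in counts.items() if n > 1)

def find_gaps(keys: list[int]) -> list[int]:
    """Return numbers missing from the range min(keys)..max(keys)."""
    present = set(keys)
    return [n for n in range(min(keys), max(keys) + 1) if n not in present]

# Hypothetical invoice numbers from a client file.
invoice_numbers = [1001, 1002, 1002, 1004, 1005, 1007, 1007]

duplicates = find_duplicates(invoice_numbers)
gaps = find_gaps(invoice_numbers)
print("Duplicates:", duplicates)
print("Gaps:", gaps)
```

Here the duplicates are 1002 and 1007, and the gaps are 1003 and 1006.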
Velocity Impact
• The problem with Velocity:
Fast-changing record sets turn audit-focused data into a
moving target
Speed Imports Scalable
Velocity
Volume Variety
Value
Impact
Poll 2
• What problems do Big Data pose for Audit?
• Higher volumes mean longer analysis time
• Larger variety of data types increases audit
complexity
• Fast-changing record sets turn audit-focused data
into a moving target
Analytics & Collaboration
1. Import from ERPs, CRM, other data files (35%)
2. Prepare the data (10%)
3. Analyze (30%)
4. Create report as PDF, Word, Excel… (5%)
5. Send emails / file sharing (5%)
6. Meet to discuss (15%)
Typical Day in Audit
Consider the Following
• Senior Auditor A spends a lot of time requesting datasets from IT.
• The data he is given lags the source systems by 3 days.
• Because the datasets arrive in IT’s format, he spends a considerable
amount of time cleaning them into a workable database.
• At the same time, Senior Auditor B asks for similar datasets, but his
data was acquired by IT 5 days later. He also needs to spend time
cleaning the datasets.
• Hours are spent duplicating effort, only to get different results!
“Garbage in, garbage out”
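The scenario above turns on two auditors working from extracts taken on different days. A minimal sketch of one way to detect that, assuming each extract is available as a list of rows: fingerprint the content so two extracts can be compared without inspecting every record. The row layout, sample values, and function name are hypothetical.

```python
import hashlib

def extract_fingerprint(rows: list[tuple]) -> str:
    """Order-independent SHA-256 fingerprint of an extract's rows."""
    digest = hashlib.sha256()
    # Sort the serialized rows so row order does not affect the result.
    for line in sorted(repr(row) for row in rows):
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()

# Hypothetical extracts of the same table taken on different days.
extract_day3 = [(1, "ACME", 250.00), (2, "Globex", 99.90)]
extract_day5 = [(1, "ACME", 250.00), (2, "Globex", 99.90), (3, "Initech", 12.00)]

same_data = extract_fingerprint(extract_day3) == extract_fingerprint(extract_day5)
print("Extracts match:", same_data)
```

A mismatch only says that the extracts differ, not where; a row-level comparison would still be needed to find the changed records.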
Scenario
PROJECT A
PROJECT B
PROJECT C
PROJECT D
PROJECT E
PROJECT F
PROJECT G
PROJECT H
Day 3
Day 5
Risks Associated
• Data acquisition is cumbersome
• Risk of inaccurate data sources from IT
• Duplication of effort
• No visibility of the team’s activity
Server Scenario
PROJECTS A-H
Auditor A
Auditor B
Auditor C
Auditor D
Network backup
1. Import from ERPs, CRM, other data files (35%)
2. Prepare the data (10%)
3. Analyze (30%)
4. Create report as PDF, Word, Excel… (5%)
5. Send emails / file sharing (5%)
6. Meet to discuss (15%)
Accelerate the audit process by 50%
Impact on Audit
1. Data is available from data sources
2. Analyze and share easily
3. Meet to discuss
Streamline the audit process by 50%
Keeping Audit Relevant
Poll 3
• What are advantages of a collaborative
approach to analytics?
• Less duplication of tasks between individuals
• Tackling problems that require group intelligence
• Retaining analytical process of all audits
Best Practices
• Modern Science
Human DNA code
Protein folding is one of the
hardest computational
problems in biology
In Today’s World
Popular Mechanics 2012
• Traditionally requires:
Mathematicians and developers able to write algorithms
Highly qualified scientists capable of interpreting results
• Modern Science - How did they do it?
New approach, new tools:
― Distributed computing grid based on
commodity hardware
― Easy-to-use interface providing a
single view of the problem, without
the need to interpret data
― Enabling collaboration of thousands
of individuals (as a game)
In Today’s World
In Today’s World
“You don’t find many soloists among the top scorers.”
Global Game Moderator
Popular Mechanics 2012
• Modern Science
• Enabling collaboration is key to solving big data problems
• Collaborate between teams
Less duplication of time spent on acquiring data
Easy to repeat success on a larger scale
• Applied knowledge transfer is greater and more effective
Retain the analytical process of all audits – keep
expertise
Collaboration vs. Big Data
Bridging the gap between Desktop and Data Center
Collaborative Server Platform
Traditional
Hard To Manage
Costly
Limited
Distributed computing
Self-Managed
Cost Efficient
Scalable
Accelerate Performance
Tests                       Desktop (hrs.)   Server (mins.)   Gain
Summarization               2:29:01          0:08:14          1810%
Duplicate Key Detection     1:09:56          0:07:01          3139%
Stratification              5:03:01          0:11:32          2626%
Random Sample 2.5 million   1:12:26          0:08:13          881%
TESTS PERFORMED WITH BANKING DATA
3.2 MILLION RECORDS, 300+ FIELDS, 20 GIGABYTES
Results
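The Gain column can be checked against the two timing columns. The sketch below assumes Gain means desktop time divided by server time, expressed as a percentage; the slide does not define it, and the function names are illustrative.

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def gain_percent(desktop: str, server: str) -> float:
    """Desktop time as a percentage of server time (assumed definition)."""
    return 100.0 * to_seconds(desktop) / to_seconds(server)

# Timings copied from the results table.
timings = {
    "Summarization": ("2:29:01", "0:08:14"),
    "Duplicate Key Detection": ("1:09:56", "0:07:01"),
    "Stratification": ("5:03:01", "0:11:32"),
    "Random Sample 2.5 million": ("1:12:26", "0:08:13"),
}

gains = {test: gain_percent(d, s) for test, (d, s) in timings.items()}
for test, gain in gains.items():
    print(f"{test}: {gain:.0f}%")
```

Under that assumption the computed values land within about one percentage point of the listed gains for Summarization, Stratification, and Random Sample. Duplicate Key Detection comes out near 997% rather than the listed 3139%, which suggests one figure in that row was mistyped on the slide.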
Poll 4
• What are the advantages of server based
processing?
• Enabling efficient collaboration
• Run tasks faster
• More secure data
• Turning a foe into a friend
Involving IT
Audit team
Datacenter
Data (fraud)
IT
IT wants:
Secured data
Control of access
ROI for investments
Disaster recovery
Audit needs:
Consistent data access
Data integrity
Speed
On demand analytics
Give to Get
• Remove friction points in audit process
• Look to identify best practices and scale them
• Enable auditors to be in control, help each other
• Leverage the latest technologies in datacenters
• If you know how to use CAATs (IDEA, ACL, etc.), you are
ready!
• Let the data speak: start with a pilot process; management is
quick to approve when it sees success and immediate ROI
Recommendations