
  • DERIVING INSIGHTS FROM BIG DATA

    Presented by: Solon Angel

    Product Manager

    CaseWare IDEA Inc.

    November 13, 2012

  • Agenda

    • Introduction

    • What is BIG DATA?

    • Impact on Audit

    • Analytics & Collaboration

    • Best Practices

    • Questions & Answers

  • What is BIG DATA?

    [Figure: data volume (megabytes, gigabytes, terabytes, petabytes) plotted against increasing data variety and complexity, with tiers labeled ERP, CRM, WEB, and BIG DATA.]

    Example data sources shown: web logs, sales transactions, offer history, offer details, affiliate networks, search marketing, behavioral targeting, sensors/RFID/devices, mobile web, user click stream, sentiment, user-generated content, social interactions and feeds, spatial and GPS coordinates, business data feeds, speech to text, product service logs, SMS/MMS, purchase detail and purchase records, payment records, support contacts, external demographics, HD audio/video/images, automated reports, printed reports, AP/AR

  • Devil in the Data

    ATMs, ERPs, transactional data, CRM and accounting databases, new compliance requirements, new media, etc.

    Exabyte(s): TENFOLD GROWTH OBSERVED IN FIVE YEARS

  • Growing Challenge

    • In 2011, the volume of digital data was 10 times what it was in 2006

    • Data sets exceed what standard tools can process

    • Data volume is expected to grow 44-fold over the next ten years

    • Data growth cannot be ignored

    • A new approach is required to enable insights and process optimization
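To put these multiples in perspective, a growth multiple over a period can be converted into an implied compound annual growth rate: multiple^(1/years) - 1. The sketch below assumes steady compounding, which the slide does not claim; the function name is hypothetical.

```python
def implied_annual_growth(multiple: float, years: float) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years` years."""
    return multiple ** (1.0 / years) - 1.0

# Figures from the slide: 10x between 2006 and 2011, 44x over the next ten years.
past = implied_annual_growth(10, 5)
future = implied_annual_growth(44, 10)
print(f"2006-2011: {past:.1%} per year")
print(f"next ten years: {future:.1%} per year")
```

Under that assumption, tenfold growth in five years works out to roughly 58.5% per year, and 44-fold growth in ten years to roughly 46% per year.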

  • Poll 1

    • What is the size of the biggest data file you’ve worked with?

    • 100 MB – 1 GB

    • 1 GB – 500 GB

    • 500 GB – 1 TB (terabyte)

    • More than 1 TB

  • Impact on Audit

    • The big data problem

      – Higher volumes mean longer analysis time

      – A larger variety of data types increases audit complexity

      – Fast-changing record sets turn audit-focused data into a moving target

      – Providing insights becomes difficult on desktops

  • Higher Volume Impact

    • The problem with big data:

      – Higher volumes mean longer analysis time, or no analysis!

    • Example: Medicare

  • Higher Volume Impact

    • Medicare

      – Medicare data spans states, dozens of agencies, private companies, and data centers

      – The record set is extremely fragmented

      – It is impossible to transfer all the data to one location for processing
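When data cannot be moved to one location, a common workaround is to compute small partial summaries where each fragment lives and combine only those summaries centrally. The sketch below illustrates that idea under stated assumptions: the site names, claim amounts, and the PartialSummary type are hypothetical and do not describe any actual Medicare system or CaseWare product.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class PartialSummary:
    """Summary statistics computed locally at one site (hypothetical)."""
    count: int = 0
    total: float = 0.0
    maximum: float = float("-inf")

    def merge(self, other: "PartialSummary") -> "PartialSummary":
        # Partial summaries combine associatively, so sites can be merged in any order.
        return PartialSummary(
            count=self.count + other.count,
            total=self.total + other.total,
            maximum=max(self.maximum, other.maximum),
        )

def summarize_site(amounts: Iterable[float]) -> PartialSummary:
    """Runs where the data lives; only this small summary leaves the site."""
    s = PartialSummary()
    for a in amounts:
        s.count += 1
        s.total += a
        s.maximum = max(s.maximum, a)
    return s

# Hypothetical claim amounts held at three separate locations.
sites = {
    "state_a": [120.0, 75.5, 980.0],
    "state_b": [45.0, 310.25],
    "agency_c": [2200.0],
}

combined = PartialSummary()
for name, amounts in sites.items():
    combined = combined.merge(summarize_site(amounts))

print(combined.count, combined.total, combined.maximum)
print("mean:", combined.total / combined.count)
```

Only counts, totals, and maxima travel between sites here; statistics such as medians cannot be combined this simply and would need a different approach.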

  • More Variety Impact

    • Yesterday/Today’s data:

      – Files

      – Databases

      – Tables

      – Columns

    • Today/Tomorrow’s data:

      – Large PDFs

      – Automated feeds

      – Raw data extracts

      – Unstructured data

      – Scanned data

      – Audio files

      – Video

  • More Variety Impact

    • Cause of complex problems and gaps

    [Figure: Data Sources → Analytics → Audit Tests]

    Data sources: transactional systems, data warehouses, online databases, client files, printed reports

    Analytics: extract, aging, sort, search, group, stratify, standards, gaps, duplicates, sampling, statistics, join, append
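Two of the analytics listed above, duplicate and gap detection, are classic audit tests on sequentially numbered documents such as invoices. A minimal sketch, assuming the document numbers are integers already extracted into a list; the function names and sample data are hypothetical and are not IDEA's own functions.

```python
from collections import Counter

def find_duplicates(values):
    """Return values that occur more than once, sorted."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)

def find_gaps(values):
    """Return (start, end) ranges of integers missing from a numeric sequence."""
    ordered = sorted(set(values))
    gaps = []
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > 1:
            gaps.append((prev + 1, curr - 1))
    return gaps

# Hypothetical invoice numbers from a client extract.
invoice_numbers = [1001, 1002, 1003, 1003, 1006, 1007, 1010]

print("duplicates:", find_duplicates(invoice_numbers))
print("gaps:", find_gaps(invoice_numbers))
```

Here invoice 1003 appears twice, and invoices 1004-1005 and 1008-1009 are missing from the sequence.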

  • Velocity Impact

    • The problem with velocity:

      – A fast-changing record set turns audit-focused data into a moving target

  • Impact

    [Figure: the four Vs of big data (Volume, Variety, Velocity, Value), annotated with "Speed", "Imports", and "Scalable".]

  • Poll 2

    • What problems does big data pose for audit?

    • Higher volumes mean longer analysis time

    • A larger variety of data types increases audit complexity

    • Fast-changing record sets turn audit-focused data into a moving target

  • Analytics & Collaboration

  • Typical Day in Audit

    1. Import from ERPs, CRM, other data files (35%)

    2. Prepare the data (10%)

    3. Analyze (30%)

    4. Create reports as PDF, Word, Excel… (5%)

    5. Send emails / share files (5%)

    6. Meet to discuss (15%)

  • Consider the Following

    • Senior Auditor A spends a lot of time requesting datasets from IT.

    • The data he is given lags the source systems by 3 days.

    • Because the datasets are formatted by IT, he spends a considerable amount of time cleaning them into a workable database.

    • At the same time, Senior Auditor B asks for similar datasets, but his data was pulled by IT with a 5-day delay. He also has to spend time cleaning the datasets.

    • Hours are spent duplicating effort, only to produce different results!
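One way to catch the situation above before analysis starts is to fingerprint each extract and compare the fingerprints. The sketch below is an illustration under stated assumptions: rows are tuples of simple values, and the fingerprint helper and sample invoices are hypothetical, not a CaseWare feature.

```python
import hashlib

def fingerprint(rows):
    """Return (row_count, digest) for an extract, independent of row order."""
    # Hash each row on its own, then hash the sorted row hashes, so the
    # result does not change if IT exports the same rows in a different order.
    # Note: joining fields with "|" assumes no field itself contains "|".
    row_hashes = sorted(
        hashlib.sha256("|".join(map(str, row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    digest = hashlib.sha256("".join(row_hashes).encode("ascii")).hexdigest()
    return len(row_hashes), digest

# Hypothetical extracts of the same invoice table, pulled on different days.
extract_day3 = [("INV-1001", "2012-11-01", 250.00), ("INV-1002", "2012-11-02", 99.90)]
extract_day5 = extract_day3 + [("INV-1003", "2012-11-04", 410.00)]

count_a, digest_a = fingerprint(extract_day3)
count_b, digest_b = fingerprint(extract_day5)
if digest_a == digest_b:
    print("Both auditors are working from identical data.")
else:
    print(f"Extracts differ: {count_a} rows vs {count_b} rows.")
```

Recording the fingerprint alongside each analysis also makes it possible to say later exactly which version of the data a result was based on.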

  • “Garbage in, garbage out” Scenario

    [Figure: projects A–H with data delivered on Day 3 and Day 5.]

  • Associated Risks

    • Data acquisition is cumbersome

    • Risk of inaccurate data sources from IT

    • Duplication of effort

    • No visibility into the team’s activity

  • Server Scenario

    [Figure: Auditors A, B, C, and D connected over the network to a shared server holding projects A–H, with backup.]
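The benefit of a shared server in this scenario is that a dataset imported once can be reused by every auditor. The sketch below models that idea with a hypothetical in-memory registry keyed by source system, table, and cut-off date. It is not CaseWare's server API. A real platform would also handle persistence, permissions, and refresh policies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]  # (source system, table, cut-off date)

@dataclass
class SharedDatasetRegistry:
    """Hypothetical in-memory stand-in for a shared server-side dataset store."""
    _store: Dict[Key, List[tuple]] = field(default_factory=dict)
    imports_performed: int = 0

    def get(self, key: Key, importer: Callable[[], List[tuple]]) -> List[tuple]:
        # Import only if no team member has already imported this exact dataset.
        if key not in self._store:
            self._store[key] = importer()
            self.imports_performed += 1
        return self._store[key]

def import_from_erp() -> List[tuple]:
    # Placeholder for a slow, IT-assisted extraction (hypothetical data).
    return [("INV-1001", 250.00), ("INV-1002", 99.90)]

registry = SharedDatasetRegistry()
key = ("ERP", "invoices", "2012-10-31")
rows_a = registry.get(key, import_from_erp)  # Auditor A triggers the import
rows_b = registry.get(key, import_from_erp)  # Auditor B reuses the same copy
print(registry.imports_performed, rows_a is rows_b)
```

Keying on the cut-off date means both auditors analyze the same snapshot. This avoids the Day 3 versus Day 5 divergence in the earlier scenario.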

  • Impact on Audit

    1. Import from ERPs, CRM, other data files (35%)

    2. Prepare the data (10%)

    3. Analyze (30%)

    4. Create reports as PDF, Word, Excel… (5%)

    5. Send emails / share files (5%)

    6. Meet to discuss (15%)

    Accelerate the audit process by 50%

  • Keeping Audit Relevant

    1. Data is available from data sources

    2. Analyze and share easily

    3. Meet to discuss

    Streamline the audit process by 50%
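The deck does not show how the 50% figure is derived. As a hedged illustration only, the arithmetic below makes two assumptions. First, it assumes the percentages on the "Typical Day in Audit" slide map to the six steps in order. Second, it assumes a shared server removes importing, data preparation, and emailing entirely. Under those assumptions the removed share comes to 50%.

```python
# Share of a typical audit day per step, read from the slide in order (assumption).
typical_day = {
    "import": 35,
    "prepare": 10,
    "analyze": 30,
    "report": 5,
    "email_share": 5,
    "meet": 15,
}
assert sum(typical_day.values()) == 100

# Hypothetical: steps a shared server platform is assumed to remove entirely.
eliminated = {"import", "prepare", "email_share"}

saved = sum(pct for step, pct in typical_day.items() if step in eliminated)
print(f"time saved: {saved}%")
```

Other combinations of partially shortened steps could produce the same total. This is one consistent reading, not the presenter's stated method.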

  • Poll 3

    • What are the advantages of a collaborative approach to analytics?

    • Less duplication of tasks between individuals

    • Tackling problems that require group intelligence

    • Retaining the analytical process of all audits

  • Best Practices

  • In Today’s World

    • Modern Science

      – Human DNA code

      – Protein folding is one of the hardest computational problems in biology

    Popular Mechanics 2012

    • Traditionally requires:

      – Mathematicians and developers able to write algorithms

      – Highly qualified scientists capable of interpreting results

  • In Today’s World

    • Modern Science: How did they do it?

      – New approach, new tools:

        – Distributed computing grid based on commodity hardware

        – Easy-to-use interface providing a single view of the problem, without the need to interpret data

        – Enabling collaboration among thousands of individuals (as a game)

  • In Today’s World

    • Modern Science

    “You don't find many soloists among the top scorers.”

    Global Game Moderator, Popular Mechanics 2012

  • Collaboration vs. Big Data

    • Enabling collaboration is key to solving big data problems

    • Collaborate between teams

      – Less duplicated time spent acquiring data

      – Easy to repeat success on a larger scale

    • Applied knowledge transfer is greater and more effective

      – Retain the analytical process of all audits and keep the expertise

  • Bridging the gap between Desktop and Data Center

    Collaborative Server Platform
