  • Industry Perspective: Big Data and Big Data Analytics

    David Barnes, Program Director, Emerging Internet Technologies, IBM Software Group

  • What is Big Data?

  • The Adjacent Possible

  • Inexpensive disk + Increased processing power + Data Warehouse + The Web + X = Big Data

    X = sensors used to gather climate information, posts to social media sites, digital pictures and videos, transaction records, cell phone GPS signals, and more.

  • © 2010 IBM Corporation

    161 exabytes of data were created in 2006, 3 million times the amount of information contained in all the books ever written.

    In 2010 the number reached 988 exabytes.

    IDC estimates that 1.8 zettabytes were created and replicated in 2011.
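As a rough consistency check, the 2006 and 2010 figures above imply a compound annual growth rate. This is a minimal sketch that is not part of the original deck; `cagr` is a hypothetical helper name for the standard formula (end / start)^(1 / years) - 1.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that turns start into end.

    Hypothetical helper name; it implements the standard CAGR formula.
    """
    if start <= 0 or years <= 0:
        raise ValueError("start and years must be positive")
    return (end / start) ** (1 / years) - 1


# Figures quoted on the slide: 161 EB created in 2006, 988 EB in 2010.
rate = cagr(161, 988, 2010 - 2006)
print(f"Implied growth: {rate:.0%} per year")  # about 57% per year
```

The result, roughly 57% per year, is in line with the ~60% compound annual growth cited on the “Today’s Problem” slide.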

    Every day, people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions, and social networks.

    Every month people send one billion Tweets and post 30 billion messages on Facebook.

    90% (or more) of the world’s data is unstructured.
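The daily figure can be converted to a yearly volume and compared with the totals quoted earlier. This sketch assumes decimal SI units (1 EB = 10^18 bytes) and the short-scale quintillion (10^18); the constant names are illustrative only.

```python
EXABYTE = 10**18    # decimal (SI) exabyte, in bytes
ZETTABYTE = 10**21  # decimal (SI) zettabyte, in bytes

# "2.5 quintillion bytes" per day, assuming short-scale quintillion = 10**18
bytes_per_day = 2.5 * 10**18
bytes_per_year = bytes_per_day * 365

print(f"{bytes_per_day / EXABYTE:.1f} EB per day")
print(f"{bytes_per_year / EXABYTE:.1f} EB per year (~{bytes_per_year / ZETTABYTE:.1f} ZB)")
```

That works out to about 0.9 ZB per year, the same order of magnitude as the 988 exabytes reported for 2010.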

  • The true nature of information

  • Is noisy

    Is often dirty

    Is often full of valuable information

    Unstructured Data

    Big Data has swept into every industry and business function.

    Businesses need to put the power of Big Data analytics in the hands of their business employees; the label “Data Scientist” is somewhat misleading, because the work cannot be confined to a few specialists.

    “Leaders in every sector will have to grapple with the implications of big data, not just a few data-oriented managers.” – McKinsey Global Institute

    The Big Data Imperative

    Big Data Business Patterns

    Computational Journalism

    Chief Legal Officer

    Retail Business Planner

    IT Systems Management

    Pharma - Clinical Trials

    Business Fraud Detection

    Evidence Based Medicine

    Web Archiving

    . . .

    Today’s Problem

    Data is growing at a compound annual growth rate of 60% per year

    Storage capacity continues to increase dramatically

    Storage access speeds have not kept up

    At a transfer speed of 500 MB/sec, reading 1 terabyte of data from a single drive takes ~30 minutes
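The read-time figure follows from dividing data size by sustained throughput. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), purely sequential reads with no seek or controller overhead, and data split evenly across drives that are read concurrently; `read_time_seconds` is a hypothetical helper name.

```python
TERABYTE = 10**12  # decimal terabyte, in bytes


def read_time_seconds(data_bytes: float, mb_per_sec: float, drives: int = 1) -> float:
    """Seconds to read data_bytes split evenly across `drives`, all read in parallel.

    Hypothetical helper: ignores seek time, bus contention and other overheads.
    """
    bytes_per_sec = mb_per_sec * 10**6  # decimal megabytes per second
    return data_bytes / drives / bytes_per_sec


single = read_time_seconds(TERABYTE, 500)
print(f"1 TB from one drive at 500 MB/s: {single:.0f} s (~{single / 60:.0f} min)")
```

One drive needs 2,000 seconds, about 33 minutes, which matches the slide's "~30 mins". Passing `drives=100` gives 20 seconds, the divide-and-conquer figure quoted under Map/Reduce.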

    Enter Map/Reduce

    • Automates the mechanisms of large-scale distributed computation (i.e., work distribution, load balancing, replication, failure/recovery)

    • Divide & Conquer: 1 terabyte split among 100 drives takes ~20 seconds to read

    • The M/R parallel processing model provides a cost-effective framework for a new generation of analytic applications on unstructured or semi-structured data
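The Map/Reduce programming model can be illustrated with a minimal, single-process sketch. Word counting is our choice of example workload; the slide names no specific application. It shows only the map, shuffle and reduce phases. The framework mechanics listed above (work distribution, load balancing, replication, failure recovery) are what a real system such as Hadoop provides, and they are not modeled here.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def map_reduce(splits: list[str]) -> dict[str, int]:
    # On a cluster each split would be mapped on a different node and reducers
    # would run in parallel; this sketch runs the phases sequentially.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


splits = ["big data big analytics", "data growing fast", "big drives slow reads"]
print(map_reduce(splits))
```

Because each map call touches only its own split and each reduce call only one key, both phases parallelize naturally, which is what makes the divide-and-conquer read pattern above pay off.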

    Requirement: A New Class of Big Data Applications

    Big Data analytics must be brought to the line-of-business user.

    • Leverage easy-to-use manipulation metaphors

    • Use natural language technologies for analytics

    • Provide rich visualizations to quickly identify insights

  • Demo: Buyer Sentiment Analysis

    Social Media: Chilean Earthquake 2010

    The 2010 Chilean earthquake was the fifth-largest earthquake in recorded history

    The affected areas suffered major devastation - buildings, airports, hospitals, prisons, bridges, and roads were severely damaged

    Land-based communications systems suffered major outages

    The wireless 3G infrastructure remained intact and operational

    Social networking over wireless networks became a major form of communication

    Extreme Blue students collected 226 million Tweets, then analyzed and categorized them by incident type and location

    Tweets included questions such as “Can I get food?”, “Can I get gas?” and “Are the bridges down?”, as well as images

    The results were visualized

    The project was completed in ~12 weeks
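The slide does not say how the Tweets were categorized. The sketch below shows the general shape of tallying messages by incident type and location, assuming simple keyword rules. The categories, keywords and sample Tweets are all hypothetical, and a production pipeline would more likely use the natural language technologies mentioned earlier in the deck.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical keyword rules, loosely based on the example questions on the slide.
CATEGORY_KEYWORDS = {
    "food": {"food", "water"},
    "fuel": {"gas", "fuel"},
    "infrastructure": {"bridge", "bridges", "road", "roads"},
}


def categorize(text: str) -> str:
    """Return the first category whose keywords appear in the Tweet, else 'other'."""
    words = {w.strip("?!.,").lower() for w in text.split()}
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "other"


def tally(tweets: list[tuple[str, str]]) -> Counter:
    """Count Tweets per (category, location) pair; each Tweet is (text, location)."""
    return Counter((categorize(text), location) for text, location in tweets)


# Made-up sample data with placeholder locations.
sample = [
    ("Can I get food?", "region-a"),
    ("Can I get gas?", "region-b"),
    ("Are the bridges down?", "region-a"),
    ("Any food left?", "region-a"),
]
print(tally(sample).most_common())
```

The (category, location) counts are the kind of aggregate that can then be plotted on a map to visualize where each type of need is concentrated.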

    Big Data = Volume, Variety and Velocity

    • Volume - Scale from terabytes to zettabytes

    • Variety - Relational and non-relational data types from an ever-expanding variety of sources

    • Velocity - Streaming data and large volume data movement

  • The supercomputer is based on over 1,200 high-powered IBM System X servers and can perform 150 trillion calculations per second, equivalent to 30 million calculations per Danish citizen per second.

    Vestas expects its data sets will grow to 20-plus petabytes over the next four years.
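The per-citizen comparison can be checked by simple division. This sketch only reuses the numbers on the slide. It implies a population of about 5 million, roughly in line with Denmark's population (about 5.5 million at the time). It also implies an average of at most ~125 billion calculations per second per server, an upper bound because the slide says "over" 1,200 servers.

```python
calcs_per_second = 150e12  # "150 trillion calculations per second"
per_citizen = 30e6         # "30 million calculations per Danish citizen per second"
servers = 1_200            # "over 1,200" servers, so the per-server figure is an upper bound

implied_population = calcs_per_second / per_citizen
per_server = calcs_per_second / servers

print(f"Implied population: {implied_population / 1e6:.1f} million")
print(f"Per server: at most {per_server / 1e9:.0f} billion calculations per second")
```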

  • © 2011 IBM Corporation

    Seton Healthcare Family: Reducing CHF readmission to improve care

    Business Challenge: Seton Healthcare strives to reduce the occurrence of high-cost Congestive Heart Failure (CHF) readmissions by proactively identifying patients likely to be readmitted on an emergent basis.

    What’s Smart? The IBM Content and Predictive Analytics for Healthcare solution will help to better target and understand high-risk CHF patients for care management programs by:

    • Utilizing natural language processing to extract key elements from unstructured History and Physical, Discharge Summaries, Echocardiogram Reports, and Consult Notes

    • Leveraging predictive models that have demonstrated high positive predictive value against extracted elements of structured and unstructured data

    • Providing an interface through which providers can intuitively navigate, interpret and take action

    Smarter Business Outcomes

    • Seton will be able to proactively target care management and reduce re-admission of CHF patients.

    • By teaming unstructured content with predictive analytics, Seton will be able to identify patients likely to be re-admitted and introduce early interventions to reduce cost and mortality.

    IBM solution

    • IBM Content and Predictive Analytics for Healthcare

    • IBM Cognos Business Intelligence

    • IBM BAO solution services

    “IBM Content and Predictive Analytics for Healthcare uses the same type of natural language processing as IBM Watson, enabling us to leverage information in new ways not possible before. We can access an integrated view of relevant clinical and operational information to drive more informed decision making and optimize patient and operational outcomes.”
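The Seton slide credits its predictive models with high positive predictive value (PPV). PPV is the fraction of patients flagged as high-risk who actually go on to be readmitted: PPV = TP / (TP + FP). The sketch below computes it; the function name and the counts are hypothetical and are not Seton's results.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV (precision): share of flagged patients who were actually readmitted."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no patients were flagged, so PPV is undefined")
    return true_positives / flagged


# Hypothetical counts: 80 patients flagged as high-risk, 60 of them readmitted.
print(positive_predictive_value(true_positives=60, false_positives=20))  # 0.75
```

Unlike sensitivity, PPV depends on how common readmission is in the population being scored, so a model's PPV is best judged against that base rate.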

    IBM Content and Predictive Analytics for Healthcare: The Seton CHF Readmission Solution

    Raw Information
    • Unstructured Data (Cerner Clinical Documentation: History and Physical, Discharge Summary, Echocardiogram.)
    • Structured Data (Avega Cost Data, DSS Admission History, DSS Procedure History, Cerner Clinical Events)

    Dynamic Multimode Interaction
    • Search and Visually Explore (Mine)
    • Monitor, Dashboard and Report (Cognos BI)
    • Question and Answer*
    • Custom Solutions

    IBM Content and Predictive Analytics
    • Content Analytics: Natural Language Processing; Medical Fact and Relationship Extraction (Annotation); Trend, Pattern, Anomaly, Deviation Analysis
    • Predictive Analytics: Predictive Scoring and Probability Analysis

    Analyzed and Visualized Information

    Health Integration Framework
    • Data Warehouse and Model
    • Master Data Management
    • Advanced Case Management
    • Business Analytics Pa
