David Douglas Department of Information Systems

Upload: theresa-kelley

Post on 23-Dec-2015

TRANSCRIPT

  • Slide 1
  • Slide 2
  • David Douglas Department of Information Systems
  • Slide 3
  • Introducing
  • Slide 4
  • 12 Definitions of Big Data; 25 Big Data Facts
  • Slide 5
  • Short Big Data Video: Short Video
  • Slide 6
  • Leveraging Big Data in Today's Enterprise
  • Slide 7
  • Big Data Via the Three Vs
  • Slide 8
  • Slide 9
  • Volume
  • Slide 10
  • SAS adds a V: Visualization. Of course, the objective is to get Value, the ultimate V. As many as 12 Vs have been reported.
  • Slide 11
  • Definition of Big Data
      • "Big Data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery and/or analysis." (IDC)
      • "Data is the new oil." (European Consumer Commissioner Meglena Kuneva)
  • Slide 12
  • Slide 13
  • The Human Face of Big Data
  • Slide 14
  • Digital Universe
  • Slide 15
  • Gartner Technology Hype Cycle
  • Slide 16
  • Gartner Technology Hype Cycle
  • Slide 17
  • Big Data Technologies
  • Slide 18
  • Hadoop/MapReduce
      • Was driven by the need to index the web
      • Existing technology did not scale
      • MapReduce framework developed at Google
      • Yahoo! built Hadoop on the Map/Reduce framework
      • Note: a recent survey indicates that only 16% of companies use the Hadoop/MapReduce environment, which is dominated by online companies
  • Slide 19
  • Hadoop Is: The Storage Layer (HDFS)
      • Hadoop Distributed File System: software to distribute data across multiple computing nodes
      • Typically runs on top of Linux
      • Stores each block 3 times, hopefully with one copy on a node in a different rack
      • Sequential access: write once, read many
      • Optimized for streaming; no random access
      • No predefined schema; any data type
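The "store each block 3 times, hopefully with one on a different rack" idea can be sketched in a few lines of Python. This is a simplified illustration only, not HDFS's actual placement policy; the function name `place_replicas`, the cluster layout, and the node names are all hypothetical.

```python
import random


def place_replicas(nodes_by_rack, replication=3, rng=None):
    """Pick `replication` distinct nodes, spreading over two racks when possible.

    Simplified sketch; real HDFS placement also considers the writer's
    location, free space, and node load.
    """
    rng = rng or random.Random()
    all_nodes = [(rack, node)
                 for rack, nodes in nodes_by_rack.items()
                 for node in nodes]
    if len(all_nodes) < replication:
        raise ValueError("not enough nodes for the requested replication")

    chosen = [rng.choice(all_nodes)]
    while len(chosen) < replication:
        remaining = [n for n in all_nodes if n not in chosen]
        used_racks = {rack for rack, _ in chosen}
        if len(used_racks) == 1:
            # All copies so far share one rack: prefer a node elsewhere.
            off_rack = [n for n in remaining if n[0] not in used_racks]
            remaining = off_rack or remaining
        chosen.append(rng.choice(remaining))
    return chosen


if __name__ == "__main__":
    # Hypothetical two-rack cluster.
    cluster = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
    print(place_replicas(cluster, rng=random.Random(0)))
```

Keeping one copy off-rack means a single rack failure (for example, a switch outage) does not make the block unavailable.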
  • Slide 20
  • Hadoop (cont): The Execution Layer (Map/Reduce)
      • Responsible for running a batch job in parallel on many servers
      • Typically runs on top of Linux
      • Works with (key, value) pairs
      • For a job:
          • Mappers pull data from their respective files
          • Mappers feed the Shuffle (may not be needed)
          • Shuffle feeds the Reducer, which summarizes and returns the result
      • Java is the native language
  • Slide 21
  • Map Reduce Example
      • Five files, each with two columns holding (key, value) pairs of (city, max temperature). Example: Toronto, 20; Whitby, 25
      • Problem: find the maximum temperature for each city
      • Break down into 5 mapper tasks; the results of the mapper tasks are:
          • (Toronto, 20) (Whitby, 25) (New York, 22) (Rome, 33)
          • (Toronto, 18) (Whitby, 27) (New York, 32) (Rome, 37)
          • (Toronto, 32) (Whitby, 20) (New York, 33) (Rome, 38)
          • (Toronto, 22) (Whitby, 19) (New York, 20) (Rome, 31)
          • (Toronto, 31) (Whitby, 22) (New York, 19) (Rome, 30)
      • Mapper task results feed into reduce tasks, which combine the input results and output a single value for each city: (Toronto, 32) (Whitby, 27) (New York, 33) (Rome, 38)
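The max-temperature job above can be imitated in plain Python to show the map, shuffle, and reduce steps. This is a single-process sketch, not Hadoop code (a real Hadoop job would normally be written in Java against the Hadoop API); the function names are hypothetical, and the five mapper outputs from the slide stand in for the five input files.

```python
from collections import defaultdict

# Mapper outputs copied from the slide, one list per mapper task.
MAPPER_RESULTS = [
    [("Toronto", 20), ("Whitby", 25), ("New York", 22), ("Rome", 33)],
    [("Toronto", 18), ("Whitby", 27), ("New York", 32), ("Rome", 37)],
    [("Toronto", 32), ("Whitby", 20), ("New York", 33), ("Rome", 38)],
    [("Toronto", 22), ("Whitby", 19), ("New York", 20), ("Rome", 31)],
    [("Toronto", 31), ("Whitby", 22), ("New York", 19), ("Rome", 30)],
]


def map_max(records):
    """Mapper: emit the highest temperature seen per city in one file."""
    best = {}
    for city, temp in records:
        if city not in best or temp > best[city]:
            best[city] = temp
    return list(best.items())


def shuffle(mapper_outputs):
    """Shuffle: group every emitted value under its key (the city)."""
    grouped = defaultdict(list)
    for output in mapper_outputs:
        for city, temp in output:
            grouped[city].append(temp)
    return dict(grouped)


def reduce_max(city, temps):
    """Reducer: collapse all values for one city into a single maximum."""
    return city, max(temps)


def run_job(files):
    mapped = [map_max(records) for records in files]
    grouped = shuffle(mapped)
    return dict(reduce_max(city, temps) for city, temps in grouped.items())


if __name__ == "__main__":
    print(run_job(MAPPER_RESULTS))
```

Because taking a maximum is associative, each mapper can pre-aggregate its own file, so only one pair per city per file has to cross the network to the reducers.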
  • Slide 22
  • Hadoop Ecosystem
  • Slide 23
  • Technology for Big Data
  • Slide 24
  • In-Memory Computing: Speed
      • RAM latency: 70 nanoseconds, 1400 MPH
      • Disk latency: 5 milliseconds, 0.003 MPH
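A short worked calculation puts the two latency figures on the same scale. It uses only the 70 ns and 5 ms values from the slide; the one-million-access workload is a made-up illustration, and the MPH figures are the slide's own analogy and are not derived here.

```python
RAM_LATENCY_S = 70e-9   # 70 nanoseconds, from the slide
DISK_LATENCY_S = 5e-3   # 5 milliseconds, from the slide


def latency_ratio(slow_s, fast_s):
    """How many times longer the slow access takes than the fast one."""
    return slow_s / fast_s


def total_time_s(accesses, latency_s):
    """Time for `accesses` dependent, one-after-another random accesses."""
    return accesses * latency_s


if __name__ == "__main__":
    ratio = latency_ratio(DISK_LATENCY_S, RAM_LATENCY_S)
    print(f"Disk access is about {ratio:,.0f} times slower than RAM access")

    n = 1_000_000  # hypothetical workload size
    print(f"{n:,} accesses in RAM:  {total_time_s(n, RAM_LATENCY_S):.2f} s")
    print(f"{n:,} accesses on disk: {total_time_s(n, DISK_LATENCY_S) / 60:.1f} min")
```

On these figures a disk access is roughly 71,000 times slower than a RAM access, which turns a fraction of a second of random lookups into more than an hour.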
  • Slide 25
  • HANA In-Memory Computing Speed Demo
      • Backdrop
          • Oracle World demo: put all of Wikipedia into Oracle 12c with the in-memory option
          • SAP Tech Ed, a few weeks later: put all of Wikipedia into an SGI HANA box (250 billion rows)
      • Query and plot of Wikipedia page views of AIDS versus Ebola by date
      • Forecast of Wikipedia page views of AIDS versus Ebola by date
  • Slide 26
  • Slide 27
  • Slide 28
  • Myth or Reality
      • RAM is so inexpensive, it is a no-brainer to move to in-memory computing?
      • In-memory computing is an expected evolution in the digital universe?
      • In-memory computing tenet: RAM is the new DISK; DISK is the new TAPE
  • Slide 29
  • Cases
      • On-line gambling: increasing the number of online bets per second from 20,000 to 150,000 (Bwin.Party)
      • Education: near real-time analytics driving intervention for improving retention (University of Kentucky)
      • Health care: intersection of smart devices, electronic health care records and in-memory analytics to provide real-time diagnostics and treatment (McKinsey & Company)
      • Global package company: move to real-time tracking of packages (MarketWatch)
  • Slide 30
  • Thoughts on In-Memory Computing
      • In-memory computing makes Big Data possible
      • Insight at the speed of thought
      • IMDBMS reduces data footprint
      • Eliminates aggregates
      • Compression is higher for columns than for rows
      • Optimized for RAM instead of optimized for disk
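The claim that columns compress better than rows can be illustrated with run-length encoding, one simple compression scheme. This is a toy sketch under stated assumptions: the table contents are invented, and nothing here is meant to describe how HANA or any particular in-memory DBMS actually compresses data.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


# Hypothetical table: (country, status, year), sorted by country.
ROWS = [
    ("US", "open", 2015),
    ("US", "open", 2015),
    ("US", "closed", 2015),
    ("CA", "closed", 2015),
    ("CA", "closed", 2015),
    ("CA", "open", 2015),
]


def row_store_runs(rows):
    """Row layout: fields of one record sit next to each other."""
    flat = [field for row in rows for field in row]
    return run_length_encode(flat)


def column_store_runs(rows):
    """Column layout: all values of one attribute sit next to each other."""
    return [run_length_encode(column) for column in zip(*rows)]


if __name__ == "__main__":
    print("row store runs:   ", len(row_store_runs(ROWS)))
    print("column store runs:", sum(len(c) for c in column_store_runs(ROWS)))
```

In the row layout neighbouring values come from different attributes and rarely repeat, while in the column layout similar values are adjacent, so the same data needs far fewer runs.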
  • Slide 31
  • Two Factors Will Drive In-Memory Computing Faster than Planned
      • Automated decision-making
      • Mobile computing
  • Slide 32
  • A Data Scientist
  • Slide 33
  • Another View of a Data Scientist
  • Slide 34
  • So How Do I Find One?
  • Slide 35
  • Big Data is disruptive in the following ways
      • It brings grid and in-memory computing to business
      • Software is being moved to the data instead of moving the data to the software
      • Transition from analytics at rest to analytics in motion
      • Will create new demand for workers with analytics skills
  • Slide 36
  • Big Data is really about Analytics
  • Slide 37
  • A View of Analytics (Source: mu-sigma)
  • Slide 38
  • Another View of Analytics (Source: Rose Business Technologies)
  • Slide 39
  • Another View of Analytics (Achieving Success with Business Analytics)
      • Levels and the question each one answers:
          • Basic Reporting: What happened?
          • Ad Hoc Reporting: How many, how often, where?
          • Dynamic Reporting: Where exactly are the problems?
          • Reporting with Early Warning: What actions are needed?
          • Basic Statistical Analysis: Why is this happening?
          • Forecasting: What if these trends continue?
          • Predictive Modeling: What will happen next?
          • Decision Optimization: What is the best decision?
      • Chart labels: Competitive Advantage; Data, Information, Intelligence; Advanced Analytics, Basic Analytics, Reporting; Decision Support, Decision Guidance
  • Slide 40
  • Another View of Analytics
  • Slide 41
  • Another View of Analytics
  • Slide 42
  • Cognitive Computing? Watson gains eyes, ears and a voice
  • Slide 43
  • The Importance of Big Data and Analytics
      • Wall Street Journal, 9/16/13
          • 44% of CIOs consider Business Intelligence the top priority for technology spending
          • 51% of the companies plan to increase spending on Business Intelligence and Analytics software this year
      • A recent McKinsey report
          • Considers Big Data "the next frontier for competition"
          • The United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise and 1.5 million managers and analysts with the skills to understand and make decisions based on the analysis of big data
      • Do you need a Data Scientist?
  • Slide 44
  • The Importance of Big Data and Analytics
  • Slide 45
  • Analytics, The New Path to Value (MIT research report: 30 industries, 100 countries)
      • Analytics is the differentiator for the top-performing companies (chart on next slide)
      • Data is not the problem
      • Data-driven decisions
  • Slide 46
  • Slide 47
  • 5 Stages of Big Data and Analytics Maturity
  • Slide 48
  • Current State
  • Slide 49
  • There is a Journal: Big Data Journal
      • Word (Tag) Cloud
      • Word Cloud with Images
      • Easy Text Manipulation
      • http://www.ibm.com/analytics/watson-analytics/
      • https://ace.ng.bluemix.net/
      • http://www.biography.com/people/warren-buffett-9230729#synopsis
  • Slide 50
  • Slide 51
  • Of Interest
      • Social Bakers
      • Amazing Twitter Stats
      • Google Trends
      • Social media location adds considerable opportunity
  • Slide 52
  • Implications
  • Slide 53
  • The Analytics
      • At rest (static): models, including predictive models, using historical data
      • In motion (real-time): using models on a stream feed
      • Combination: uses models on a stream feed; the stream feed goes into the data at rest to update models
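The three modes above can be sketched with a deliberately tiny model. This is an illustrative assumption, not a method from the deck: `ThresholdModel`, the threshold rule, the refit interval, and the sample numbers are all hypothetical.

```python
from statistics import mean, pstdev


class ThresholdModel:
    """Hypothetical model: flag values far from the historical mean."""

    def __init__(self, history, k=2.0):
        self.k = k
        self.fit(history)

    def fit(self, history):
        # Analytics at rest: learn from stored, historical data.
        self.center = mean(history)
        self.spread = pstdev(history)

    def is_anomaly(self, value):
        # Analytics in motion: score one event as it arrives.
        return abs(value - self.center) > self.k * self.spread


def process_stream(history, stream, refit_every=3):
    """Combination: score the stream, store it, and periodically refit."""
    history = list(history)
    model = ThresholdModel(history)
    flags = []
    for count, value in enumerate(stream, start=1):
        flags.append(model.is_anomaly(value))
        history.append(value)          # stream feed lands in the data at rest
        if count % refit_every == 0:
            model.fit(history)         # the stored data updates the model
    return flags, model


if __name__ == "__main__":
    flags, model = process_stream([10, 11, 9, 10, 10], [10, 30, 11])
    print(flags, model.center)
```

Scoring uses only the current model state, so it stays fast per event; the heavier refit step runs on the accumulated history at whatever interval the application can afford.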
  • Slide 54
  • Analytics at Rest; Analytics in Motion
  • Slide 55
  • Thoughts
      • It is not a matter of if but when you get into Big Data analytics
      • Purpose is to provide enablement for users
      • Choices
          • Pure plays like Cloudera, Hortonworks, MapR, Pivotal, etc.
          • NoSQL databases (key-value, documents, networks)
          • Major computing players like IBM, Oracle, etc.
          • In-memory computing
      • Should not be a new silo
  • Slide 56
  • Terms
      • IoT: Internet of Things
      • IoE: Internet of Everything
      • IoN: Internet of Nothing
      • The vast majority of the billions of things connected to the internet on Cisco's website, for instance, are not the toasters, refrigerators, thermostats, smoke detectors, pace-makers and insulin pumps that the IoT's true believers enthuse about. Almost exclusively, they are existing smartphones, tablets, computers and routers, plus a surprising number of industrial components used to beam performance statistics back to corporate headquarters. Without any hoopla, operators of power stations, passenger jets, railways, refineries, chemical plants, oil platforms and other industrial equipment have been doing this for ages.
  • Slide 57
  • EMC Digital Universe with Research & Analysis by IDC
      • The Digital Universe of Opportunities: Rich Data and the Increasing Value of the Internet of Things, April 2014
      • 40% of data created and consumed by consumers
  • Slide 58
  • Slide 59
  • We Live in an Era of Change
  • Slide 60
  • Gartner's Magic Quadrant: BI & Analytics
  • Slide 61
  • Slide 62
  • Good Reading iPad App
  • Slide 63