big data a big deal?

BIG DATA…A BIG DEAL? Organized by: Andrew Waitman

Upload: andrew-waitman

Post on 22-Jan-2015

Category: Technology


DESCRIPTION

Why all the fuss over Big Data? And why now? What CIOs and CEOs should understand about Big Data and how it may impact their business.

TRANSCRIPT

  • 1. BIG DATA…A BIG DEAL? Organized by: Andrew Waitman

  • 2. Big Data, Small Sound Bytes
      2009/2012 Pythian All Rights Reserved

  • 3. Big Data, Small Sound Bytes

  • 4. Big Data, Small Sound Bytes

  • 5. Why Big Data Now? VOLUME
      1. All online digital activity creates artifacts, or metadata, which at terabyte-to-petabyte volumes or more is being called BIG DATA.
      2. Unstructured metadata is collected whenever digital activity occurs.
      3. Digital metadata volume has exploded with growing internet usage and has accelerated with recent smartphone and iPad usage driving global mobile and social activity.

  • 6. Why Big Data Now? HUMAN VOLUME
      1. In 1998 Google handled 3.6 million searches in the year.
      2. In 2011 Google ran 1,722,071,000,000 searches in the year.
      3. In August 2008 there were 100 million Facebook users.
      4. In December 2012 there will be over 1 billion Facebook users.
      5. In August 2012 Twitter reached over 500,000,000 users.
      The digital volume of user online metadata has exploded with growing internet, mobile and social use.

  • 7. Why Big Data Now? DEVICE VOLUME
      1. In 2005 there were 1.5 billion RFID tags.
      2. In 2012 there are 30 billion RFID tags.
      3. 350 billion smart meter transactions per year.
      4. 1 billion smartphones with location sensors by 2015.
      Digital sensor data volume has exploded with growing machine usage of sensor and measurement reporting.

  • 8. Why Big Data Now? ZEITGEIST
      1. Data-driven decision making is mainstream thinking. Think Moneyball by Michael Lewis.
      2. Google demonstrated the value and importance of mining Big Data for search, ad placement, language translation and a myriad of other computing challenges with economic benefit.
      3. Data trumps smarter algorithms. It is the dawning of the Age of Real Time & Near Real Time BIG Impact Analytics.

  • 9. Why Big Data Now? ECONOMICS
      1. Collection and analysis of large volumes of metadata is now relatively simple, low cost and potentially highly valuable.
      2. Storage and computing power are relatively low cost, enabling the mining of massive metadata volumes in real time, near real time or later.
      3. The economic benefit or value of the insights can far exceed the costs of acquiring and storing the data.
      4. Big Data infrastructure tools have become simpler and more accessible.

  • 10. Purpose of Data Analysis
      The analysis of data is required to understand (a) why consumers purchase a particular product, (b) how consumers purchase the product, (c) the demographics and psychographics of the purchaser of the product and (d) the ultimate user of the product.

  • 11. An Alternative Perspective
      "Big Data is just the new rallying cry for the same old stuff BI companies have been producing all along." - Stephen Few, Perceptual Edge
      AVOID CONFUSING ABUNDANCE WITH INSIGHT
      This seems obvious, but almost no attention is being given to building the skills and technologies that help us glean insights from data more effectively. As Richards J. Heuer, Jr. argued in the Psychology of Intelligence Analysis (1999), the primary failures of analysis are less due to insufficient data than to flawed thinking. To succeed analytically, we must invest a great deal more of our resources in training people to think effectively and we must equip them with tools that support that effort. Heuer spent 45 years supporting the work of the CIA. Identifying a potential terrorist plot requires an analyst to sift through a lot of data (perhaps Big Data), but more importantly, it relies on their ability to connect the dots. Contrary to Heuer's emphasis on thinking skills, big data is merely about more, more, more; not smarter or better.

  • 12. Is Big Data really new? NO
      What is new is that access to insights now comes at a cost, and with tools, available to almost anyone. Saving all data is now economically viable for everyone.
      Large public and private sector (Global 2000) enterprises have always generated, stored, processed and analyzed large volumes and a variety of structured and unstructured data:
      1. Particle Physics Research - Large Hadron Collider generates 1 petabyte per second.
      2. Oil Exploration - Seismic sensor data
      3. Bioinformatics - Human Genome Project

  • 13. BIG DATA VS TRADITIONAL DATA
      TRADITIONAL DATA
      • Gigabytes to terabytes
      • SQL, structured
      • Engineered systems
      • Data model/schema
      • Selected data stored
      • Complexity at design/architecture stage
      • Simplicity at usage stage
      • Majority of $$ investment up front
      BIG DATA
      • Petabytes at 1/10th the cost of pre-engineered storage
      • Semi-structured
      • Variety of sources
      • Store everything
      • Raw data
      • No data model/schema
      • Parallelize to handle volume
      • Simplicity at design/architecture stage
      • Complexity at insight stage

  • 14. Big Data is BI at Scale
      • PHASE 1: Capture & Store (petabyte scale, 300 terabytes/rack)
      • PHASE 2: Speculate and Investigate (data science, analytics)
      • PHASE 3: Exploit Insights (real time, NRT decisions)
      • MAP-R

  • 15. Big Data Phase 1 - Capture & Store
      Is the value of potential insights much greater than the cost of searching for them?
      BUSINESS QUESTIONS
      • What types of semi-structured data do you plan to store, and how?
      • What questions are you attempting to answer?
      • What data analysis is currently being done?
      • What are people asking questions about?
      • What DR? What compression?
      • What storage is possible? Flash vs disk?
      • What capacity, and how fast to access?
      • How many people can access simultaneously?
      KNOW THE DATA? SOURCE? RATE OF GENERATION?

  • 16. Big Data Phase 1 - Capture & Store
      Is the value of potential insights much greater than the cost of searching for them?
      STORAGE REQUIREMENTS
      • Be scalable
      • Provide tiered storage
      • Be self managing
      • Ensure content is highly available
      • Ensure content is widely accessible
      • Support both analytical and content applications
      • Support workflow automation
      • Integrate with legacy applications
      • Enable integration with public, private and hybrid cloud ecosystems
      • Be self healing

  • 17. Big Data Phase 2 - Speculate and Investigate
      Is the value of potential insights much greater than the cost of searching for them?
      BUSINESS QUESTIONS
      • What type of semi-structured data do I have?
      • What type of questions am I trying to answer? Statistical? Correlation? Causal? Patterns?
      • How do I need to manipulate, translate, transform, cleanse, organize, visualize the data?
      • How much time do I have for analysis?
      • What tools do I have to perform transformation and analysis?

  • 18. Big Data Phase 3 - Exploit Insights
      Is the value of potential insights much greater than the cost of searching for them?
      BUSINESS QUESTIONS
      • Are discovered patterns/insights available in real time, near real time or further out?
      • How do I systematically find patterns/insights going forward?
      • How do I integrate them into business-impacting decision processes?

  • 19. Top 10 Reasons: Why all the Hype around Big Data now?
      1. At tera & peta bytes it really does get interesting.
      2. All the Cool Kids are doing it. Once the Four Digerati Horsemen (Google, Facebook, Twitter, Amazon) say it's important, then it really is.
      3. BI folks needed a new marketing moniker.
      4. CLOUD hype was already annoying and slowing.
      5. Gartner says it's near its peak!
      6. The term went viral!
      7. People thought you said "Big Deal"!
      8. "Voluminous data" could not be pronounced.
      9. User data mining is next to voyeurism.
      10. It's Google's Vault!

  • 20. What is considered Big Data? VOLUME & VARIETY
      1. Any data stored digitally and at scale (terabytes+) with potential for providing practical, useful insights, potentially with economic benefits.
      2. Very large volume of unstructured information/data.
      3. Big Data is characterized by the volume, velocity and variety of large data sets.
      Every connected person or connected device is potentially a data generator.

  • 21. What is considered Big Data? DIFFICULT & TIMELY
      1. Big Data, by the nature of its volume, hides or obscures valuable insights: a lot of noise, but with critical and potentially valuable signals buried within.
      2. Often the signal value perishes rapidly, requiring real-time or near-real-time analysis and action.
      Big Data is the quintessential signal vs noise problem.

  • 22. Examples of Big Data?
      • Local/regional weather information
      • Web traffic information
      • User search behavior
      • Social information: who connected to whom, who poked whom, etc.
      • Mobile user information: preferences, likes, habits
      • Application usage information
      • E-commerce transaction information
      • Physical retail customer transaction data

  • 23. Who are the Top 15 Big Data Players?
      1. Google
      2. Amazon
      3. Apple
      4. Yahoo
      5. Facebook
      6. Salesforce
      7. Twitter
      8. Cloudera
      9. LinkedIn
      10. Netflix
      11. Microsoft
      12. IBM
      13. Hortonworks
      14. Zynga
      15. eBay

  • 24.
      1. www.kaggle.com
      2. www.indeed.com
      3. www.recordedfuture.com
      4. www.datamarket.com
      5. www.climate.com
      6. www.manybills.com
      7. www.electrion.twitter.com
      8. www.consensu.gov
      9. www.coursera.com
      10. www.data.gov

  • 25. What is the size of the BIG DATA Market?
      • Deloitte pegs the size of the big data market at about $1.3-$1.5 billion in 2012.
      • In March, IDC released a statement predicting that the worldwide big data technology services market would reach $16.9 billion in 2015.
      • The 2012 global BI software market is $35 billion.

  • 26. Where do BI and Big Data co-exist?
      PREDICTIVE ANALYTICS

  • 27. How do Machine Learning and Big Data relate?
      PREDICTIVE ANALYTICS

  • 28. When is Big Data valuable?
      1. When better business decisions result from practical insights provided by data that were unavailable to expert judgment or unknown to experts.
      2. When time-to-insight results in big returns or benefit, e.g. real-time book recommendations.
      3. Where precision of analysis results in specific alternative decisions.
      4. Where patterns from heterogeneous or seemingly disparate data sources provide material competitive insights/advantage versus the competition.

  • 29. What is unique about Big Data Technology?
      MASSIVE PARALLELISM, AFFORDABLE HARDWARE, LOCAL PROCESSING
      1. The tools do not require the data to be first structured in a particular schema, as is required in relational databases.
      2. Data is analyzed in its native format, closest to where it is stored, dramatically reducing the time and effort for retrieval and restore.

  • 30. Visualization may unlock the key to Big Data Insights

  • 31. What skills do I need in my organization for Big Data?
      1. Data scientists: Identify what analysis makes sense in context. Typical background in math and statistics, as well as artificial intelligence and natural language processing.
      2. Data architects: Create the data model and identify required data sources and analytical tools.
      3. Data visualizers: Use visualizations to explore what the data means and present how it will impact the company.
      4. Data change agents: Good communicators with a Six Sigma background. Understand how to apply statistics.

  • 32. What skills do I need in my organization for Big Data?
      5. Data engineers/operators: Big Data infrastructure operations. Develop architecture that helps analyze and supply data in the way the business needs, and make sure systems are performing smoothly.
      6. Data stewards: Ensure that data sources are properly accounted for, and may also maintain a centralized repository as part of a Master Data Management approach, in which there is one gold copy of enterprise data to be referenced.
      7. Data virtualization/cloud specialists: Build and maintain a virtualized data service layer that can draw data from any source and make it available across organizations in a consistent, easy-to-access manner.
      8. Systems administrators

  • 33. Six Steps to Big Data alchemy?
      1. Select the right data sets: Identify rich data sources which may contain insights into a particular problem you are trying to solve or an insight you are trying to gain. Social media data is providing incredible insights into changes in brand positioning and new product introductions.
      2. Join the various sets of data: Combine rich, unstructured and sometimes incomplete data into a new set for manipulation and analysis.
      3. Clean the new large data set: Begin to discover important and relevant patterns, signatures, anomalies, correlations and outliers using advanced analytic models.
      4. Create models: These models predict outcomes using the data. Iterate on your hypothesis and keep experimenting.
      5. Use visualization tools: Visualization may assist in discovery or presentation of key insights from the data.
      6. Iterate: Keep varying your models and data sets to assist future planning or decision making.

  • 34. How is Big Data providing Value today?
      • WEB SURF: Online media and social sites mine user-behavior Big Data for what interests whom, when, why and how. Big Data gives sites insights into what people are interested in, with whom they share that information, and how long they stay engaged online.
      • Online retailers mine Big Data to predict consumers' buying behavior, purchase preferences and high-impact offers to drive up total spend per session.
      • Insurance companies mining Big Data can improve their overall performance by facilitating greater pricing accuracy, deeper relationships with customers, and more effective and efficient loss prevention.

  • 35. How can Pythian help you with Big Data?
      1. First, get informed.
      2. Second, get started. Recognize an opportunity for competitive advantage within your company.
      3. Third, get the right team of people involved. Organize an internal task force to drive the Big Data initiative. Don't forget to find the critical Data Scientist: the person who will understand the data sources and know what questions to pose.
      4. Fourth, identify the key sources of Big Data, both external and internal.
      5. Fifth, with Pythian's assistance, evaluate the tools and technology that will help your Big Data program.

  • 36. Key Questions for Executives
      • What does the data say?
      • Where did the data come from?
      • Has the data been sufficiently cleaned?
      • How was the data analyzed?
      • How confident can we be in our analysis?
      • Can we distinguish correlation from causality?
      • How much will the data influence the key decision makers?

  • 37. A compelling balanced perspective on Big Data
      Stephen Few - Perceptual Edge

  • 38. Archive Slides

  • 39. Big Data Start-ups
      • WeatherBill (which compiles large amounts of weather data from a variety of sources, then sells insurance based on statistical analysis), Klout (a controversial startup that processes large amounts of data to create every user's social influence score) or Wonga (which crunches data to grant financial loans) are some early examples of startups with big data as their core DNA.
      • John Partridge, the president and CEO of Tokutek Inc., a Lexington company founded in 2006 that makes databases run faster.
      • Trifacta raised $4.3 million from Accel's Big Data fund for a solution that doesn't just visualize insight, but also the analytics tools that produce it.
      • Platfora is a software company based in San Mateo, California, building a revolutionary BI and analytics platform that democratizes and simplifies use of big data and Hadoop. The company was founded by Ben Werther, former product head of Greenplum, an analytical database company acquired by EMC. Platfora is assembling a superb team of data and distributed systems architects/engineers, UI and UX developers, and data scientists.

  • 40. Big Data Start-ups
      • About MapR Technologies: MapR delivers on the promise of Hadoop, making managing and analyzing Big Data a reality for more business users. MapR enables customers to harness the power of Big Data analytics. Leading companies including Amazon, Cisco, EMC and Google partner with MapR to deliver an enterprise-grade Hadoop solution. Investors include Lightspeed Venture Partners, NEA and Redpoint Ventures.
      • Alteryx provides indispensable analytic solutions for enterprise and SMB companies making critical decisions about how to expand and grow. Our product, Alteryx Strategic Analytics, is a desktop-to-cloud Agile BI and analytics solution designed for data artisans and business leaders that brings together the market knowledge, location insight, and business intelligence today's organizations require. For more than a decade, Alteryx has enabled strategic planning executives to identify and seize market opportunities, outsmart their competitors, and drive more revenue.
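
As a rough illustration of the six steps listed on slide 33 (select, join, clean, model, visualize, iterate), the sketch below walks a tiny toy data set through each step using only the Python standard library. Everything in it is an assumption made for illustration and not part of the presentation: the sample records, the field names, and the helper functions `join_sets`, `clean`, `fit_line` and `text_chart` are hypothetical. Here "clean" is read narrowly as dropping incomplete records, and the "model" is a plain least-squares line; a real Big Data workflow would run on distributed tooling and far richer models.

```python
from statistics import mean

# Step 1: select data sets (hypothetical toy data, not from the slides).
mentions = [  # weekly social media mentions of a brand
    {"week": 1, "mentions": 120},
    {"week": 2, "mentions": 150},
    {"week": 3, "mentions": None},  # incomplete record
    {"week": 4, "mentions": 210},
    {"week": 5, "mentions": 240},
]
sales = [  # weekly unit sales
    {"week": 1, "units": 30},
    {"week": 2, "units": 36},
    {"week": 3, "units": 41},
    {"week": 4, "units": 48},
    {"week": 5, "units": 54},
]


def join_sets(left, right, key):
    """Step 2: inner-join two lists of records on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def clean(rows, fields):
    """Step 3: drop rows in which any required field is missing."""
    return [r for r in rows if all(r.get(f) is not None for f in fields)]


def fit_line(xs, ys):
    """Step 4: least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def text_chart(rows, field, scale=10):
    """Step 5: crude text visualization, one bar of '#' per row."""
    return [f"week {r['week']}: " + "#" * (r[field] // scale) for r in rows]


joined = join_sets(mentions, sales, "week")
usable = clean(joined, ["mentions", "units"])
slope, intercept = fit_line(
    [r["mentions"] for r in usable], [r["units"] for r in usable]
)

# Step 6 would repeat the steps with other data sets or models; this
# sketch only applies the fitted line to one what-if input.
predicted_units = slope * 300 + intercept

for line in text_chart(usable, "units"):
    print(line)
print(f"units ~ {slope:.3f} * mentions + {intercept:.2f}")
print(f"predicted units at 300 mentions: {predicted_units:.1f}")
```

The fitted line only summarizes how the two toy series move together; as slide 36 asks, a correlation like this does not by itself establish causality.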