“Big Data” and Data-Intensive Science (eScience)


Post on 16-Feb-2016







“Big Data” and Data-Intensive Science (eScience). Ed Lazowska, Bill & Melinda Gates Chair in Computer Science & Engineering, University of Washington. July 2013. Exponential improvements in technology and algorithms are enabling the “big data” revolution.



“Big Data” and Data-Intensive Science (eScience)
Ed Lazowska
Bill & Melinda Gates Chair in Computer Science & Engineering
University of Washington
July 2013

Exponential improvements in technology and algorithms are enabling the “big data” revolution

A proliferation of sensors
  Think about the sensors on your phone
More generally, the creation of almost all information in digital form
  It doesn’t need to be transcribed in order to be processed
Dramatic cost reductions in storage
  You can afford to keep all the data
Dramatic increases in network bandwidth
  You can move the data to where it’s needed
Dramatic cost reductions and scalability improvements in computation
  With Amazon Web Services, or Google App Engine, or Microsoft Azure, 1000 computers for 1 day cost the same as 1 computer for 1000 days!
Dramatic algorithmic breakthroughs
  Machine learning, data mining: fundamental advances in computer science and statistics

Some examples of “big data” in action

Collaborative filtering

Fraud detection

Price prediction

Hospital re-admission prediction

Travel time prediction under specific circumstances


Home energy monitoring

Larry Smarr, UCSD
Gordon Bell, Microsoft Research
John Guttag & Collin Stultz, MIT

Google self-driving car

Speech recognition

Machine translation
  Speech -> text
  Text -> text translation
  Text -> speech in speaker’s voice

http://www.youtube.com/watch?v=Nu-nlQqFCKg&t=7m30s (7:30 to 8:40)

Scientific discovery

Ocean Observatories Initiative
Gene Sequencing
Large Hadron Collider
Large Synoptic Survey Telescope

Presidential campaigning

Electoral forecasting

Real data-driven decision-making (vs. MBA baloney) for every sector!

eScience: Sensor-driven (data-driven) science and engineering

Transforming science (again!)
Jim Gray




Theory
Experiment
Observation
[John Delaney, University of Washington]




eScience is driven by data more than by cycles

Massive volumes of data from sensors and networks of sensors

Apache Point telescope, SDSS
  80TB of raw image data (80,000,000,000,000 bytes) over a 7-year period

Large Synoptic Survey Telescope (LSST)
  40TB/day (an SDSS every two days), 100+PB in its 10-year lifetime
  400mbps sustained data rate between Chile and NCSA

Large Hadron Collider
  700MB of data per second, 60TB/day, 20PB/year
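The volume and rate figures quoted for SDSS, LSST, and the LHC can be cross-checked with simple unit arithmetic. The sketch below is only a back-of-the-envelope check, assuming decimal units (1 TB = 10^12 bytes) and a constant data rate around the clock; the function and variable names are illustrative and do not come from the slides. Under those assumptions, 40TB/day works out to roughly 463 MB per second, which is close to the slide's "400" figure if that figure is read as megabytes per second (an interpretation, not something the slide states).

```python
# Back-of-the-envelope conversions between data volumes and sustained rates.
# Assumption: decimal (SI) units, i.e. 1 MB = 10**6 bytes, 1 TB = 10**12 bytes.

SECONDS_PER_DAY = 24 * 60 * 60
MB, TB, PB = 10**6, 10**12, 10**15


def sustained_rate_mb_per_s(bytes_per_day: float) -> float:
    """Average rate in megabytes per second needed to move a daily volume."""
    return bytes_per_day / SECONDS_PER_DAY / MB


def daily_volume_tb(bytes_per_second: float) -> float:
    """Daily volume in terabytes produced by a constant byte rate."""
    return bytes_per_second * SECONDS_PER_DAY / TB


# LSST: 40 TB/day expressed as a sustained rate.
lsst_mb_per_s = sustained_rate_mb_per_s(40 * TB)
lsst_gbit_per_s = lsst_mb_per_s * 8 / 1000

# LHC: 700 MB/s expressed per day and per year.
lhc_tb_per_day = daily_volume_tb(700 * MB)
lhc_pb_per_year = lhc_tb_per_day * 365 * TB / PB

# SDSS collected 80 TB in total; at 40 TB/day LSST matches that in 2 days.
days_for_lsst_to_match_sdss = (80 * TB) / (40 * TB)

print(f"LSST: {lsst_mb_per_s:.0f} MB/s (about {lsst_gbit_per_s:.1f} Gbit/s)")
print(f"LHC:  {lhc_tb_per_day:.2f} TB/day, {lhc_pb_per_year:.1f} PB/year")
print(f"LSST gathers an SDSS-sized data set every {days_for_lsst_to_match_sdss:.0f} days")
```

The LHC numbers come out at about 60 TB/day and about 22 PB/year, in line with the rounded figures on the slide.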

Illumina HiSeq 2000 Sequencer
  ~1TB/day

Major labs have 25-100 of these machines

Regional Scale Nodes of the NSF Ocean Observatories Initiative
  1000 km of fiber optic cable on the seafloor, connecting thousands of chemical, physical, and biological sensors

The Web
  20+ billion web pages x 20KB = 400+TB
  One computer can read 30-35 MB/sec from disk => 4 months just to read the web
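The "4 months" estimate follows directly from dividing the total size of the web by a single disk's read rate. The sketch below reproduces that arithmetic, assuming decimal units (1 KB = 10^3 bytes) and uninterrupted sequential reads; the function names and the 1000-machine cluster are hypothetical, added only to show how the same calculation changes when the reading is spread across many computers.

```python
# Rough estimate of how long it takes to read the whole web from disk.
# Assumption: decimal units and a constant, uninterrupted read rate.

SECONDS_PER_DAY = 24 * 60 * 60
KB, MB, TB = 10**3, 10**6, 10**12


def web_size_bytes(pages: int, bytes_per_page: int) -> int:
    """Total size of the web as page count times average page size."""
    return pages * bytes_per_page


def days_to_read(total_bytes: float, bytes_per_second: float, machines: int = 1) -> float:
    """Days needed to read total_bytes when the work is split evenly across machines."""
    return total_bytes / (bytes_per_second * machines) / SECONDS_PER_DAY


total = web_size_bytes(20 * 10**9, 20 * KB)  # 20 billion pages x 20 KB

slow_days = days_to_read(total, 30 * MB)  # one computer at 30 MB/sec
fast_days = days_to_read(total, 35 * MB)  # one computer at 35 MB/sec

# Hypothetical cluster size, chosen only for illustration.
cluster_days = days_to_read(total, 30 * MB, machines=1000)

print(f"Web size: {total / TB:.0f} TB")
print(f"One computer: {fast_days:.0f} to {slow_days:.0f} days")
print(f"1000 computers: {cluster_days * 24:.1f} hours")
```

One computer needs roughly 132 to 154 days, which is about four to five months. Splitting the same read evenly across 1000 machines brings it down to a few hours, which is why cluster computing appears in the list of eScience technologies.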

eScience is about the analysis of data

The automated or semi-automated extraction of knowledge from massive volumes of data
  There’s simply too much of it to look at
It’s not just a matter of volume
  Volume
  Rate
  Complexity / dimensionality

eScience utilizes a spectrum of computer science techniques and technologies

Sensors and sensor networks
Backbone networks
Databases
Data mining
Machine learning
Data visualization
Cluster computing at enormous scale

eScience will be pervasive

Simulation-oriented computational science has been transformational, but it has been a niche
  As an institution (e.g., a university), you didn’t need to excel in order to be competitive
eScience capabilities must be broadly available in any institution
  If not, the institution will simply cease to be competitive