this month in data science - april edition

Upload: pivotal

Post on 11-Aug-2014

Category:

Data & Analytics


DESCRIPTION

During the month of April, the growing impact of Big Data and data-driven insight on our daily lives became increasingly apparent. While pundits debated the merits of this massive sea change in data collection and analysis, its value and results were borne out this month in intriguing and surprising ways, from revealing why UPS trucks never turn left to exploring whether there are time travelers living among us.

TRANSCRIPT

Page 1: This Month in Data Science - April Edition

THIS MONTH IN DATA SCIENCE APRIL

PRESENTS

Page 2: This Month in Data Science - April Edition

Learn more about how data science is changing your world at blog.gopivotal.com

Why UPS Trucks Don’t Turn Left

Among data geeks, UPS’s 2004 announcement that its delivery vehicles would avoid taking left turns to conserve fuel has long been a source of curiosity. As this Priceonomics post explains, the company’s idiosyncratic yet data-driven policy has yielded significant efficiency gains, using simple algorithms to map routes that maximize right turns. According to the company, since 2012 the policy has “saved around 10 million gallons of gas and reduced emissions by the equivalent of taking 5,300 cars off the road for a year.”

http://priceonomics.com/why-ups-trucks-dont-turn-left/?imm_mid=0bae5b&cmp=em-strata-na-na-newsltr_20140416_elist

Page 3: This Month in Data Science - April Edition

Data Science: From Half-Baked Ideas to Data-Driven Insights

In this post for the Wall Street Journal’s CIO Journal, Irving Wladawsky-Berger provides executives with a high-level overview of the growing importance of data science within the enterprise, a field he describes as “one of the most exciting new professions and academic disciplines.”

http://blogs.wsj.com/cio/2014/04/11/data-science-from-half-baked-ideas-to-data-driven-insights/

Page 4: This Month in Data Science - April Edition

The Backlash Against Big Data, Continued

Big Data may have entered the hype cycle’s dreaded trough of disillusionment if the recent media backlash is any indication, even though many critiques lack a sophisticated understanding of the tools and methodologies involved. Mike Loukides at O’Reilly Media pushes back against the backlash. He acknowledges that data scientists must be ever-vigilant and skeptical when considering the limitations of particular methodologies and data sources, but emphasizes that the Big Data revolution is well underway and powers a great number of technologies we rely on daily, a trend that will only continue to grow in future years.

http://radar.oreilly.com/2014/04/the-backlash-against-big-data-continued.html?imm_mid=0ba721&cmp=em-strata-na-na-newsltr_20140409_elist

Page 5: This Month in Data Science - April Edition

Is There a Wonk Bubble?

The semi-concurrent launch of Nate Silver’s 538, Vox, and a slew of data-driven “explainer” sites from big media outlets like the Washington Post and the New York Times has driven much debate this month about the value and potential limitations of data journalism. In this Politico essay, Felix Salmon argues that the boom in data journalism is actually a good thing for the news industry and media junkies alike.

http://www.politico.com/magazine/story/2014/04/is-there-a-wonk-bubble-105473.html?imm_mid=0bb43a&cmp=em-strata-na-na-newsltr_20140423_elist#

Page 6: This Month in Data Science - April Edition

The Internet of Things is Great for Chipmakers And a Challenge for Intel

GigaOM details how the Internet of Things has the potential to revitalize the chip industry, noting the number of new opportunities and challenges that will arise as companies attempt to bring everyday physical objects into the connected world.

http://gigaom.com/2014/04/16/the-internet-of-things-is-great-for-chipmakers-and-a-challenge-for-intel/

Page 7: This Month in Data Science - April Edition

900 Years of Tree Diagrams, the Most Important Data Viz Tool in History

It may be the new hotness in boardrooms and shareable viral content, but data visualization is a centuries-old practice. In this fun post, Wired looks back at the past 900 years of tree diagrams, which emerged in the Middle Ages amid an explosion of new knowledge that needed to be categorized and communicated, and draws parallels with today’s Big Data explosion.

http://www.wired.com/2014/04/tree-diagrams-the-most-important-data-viz-tool-in-history/

Page 8: This Month in Data Science - April Edition

The Big Data Challenge to Legacy Data Management Companies

The New York Times explores how big data software companies are threatening the profitability of legacy hardware vendors such as Oracle, IBM, Teradata, and others. It compares the current industry shift to the way microprocessor-based computing drove mainframe prices into the ground.

http://bits.blogs.nytimes.com/2014/04/07/the-big-data-challenge-to-legacy-data-management-companies/?_php=true&_type=blogs&_php=true&_type=blogs&_php=true&_type=blogs&emc=edit_tu_20140407&nl=technology&nlid=7804711&_r=2

Page 9: This Month in Data Science - April Edition

Researchers Search for Time Travelers Using Internet Tools, Clever Statistical Analysis

A capricious group of Michigan Technological University researchers used data mining and statistical analysis to trawl the web for signs of time travelers lurking in our midst. Unfortunately for the sci-fi minded among us, the researchers came up empty, but in the process illuminated the lighter side of data analysis.

http://www.geek.com/science/new-statistical-research-asks-do-time-travelers-walk-among-us-1581169/?imm_mid=0bb43a&cmp=em-strata-na-na-newsltr_20140423_elist

Page 10: This Month in Data Science - April Edition

THIS MONTH IN PIVOTAL DATA SCIENCE

Page 11: This Month in Data Science - April Edition

Pivotal’s New Big Data Suite Redefines the Economics of Big Data Including UNLIMITED Hadoop to Enterprises

This month, Pivotal changed the economics of Big Data forever, launching the Pivotal Big Data Suite. It is an annual subscription-based software, support, and maintenance package that bundles Pivotal Greenplum Database, Pivotal GemFire, Pivotal SQLFire, Pivotal GemFire XD, and Pivotal HAWQ into a flexible pool of big and fast data products.

http://blog.gopivotal.com/pivotal/products/pivotals-new-big-data-suite-redefines-the-economics-of-big-data-including-unlimited-hadoop-to-enterprises

Page 12: This Month in Data Science - April Edition

Pivotal Debuts at ApacheCon North America: Thanks For Having Us!

Pivotal’s Roman Shaposhnik reviews ApacheCon 14, which took place last week in Denver. At Pivotal’s self-described “coming out party” to the Apache Software Foundation, we worked to make an impression by starting off with a keynote, presenting and attending various sessions, and even hosting a cocktail party. In this review of the event, Shaposhnik also points community members to some of the newer technologies he believes are hot to watch and use right now.

http://blog.gopivotal.com/pivotal/features/pivotal-debuts-at-apachecon-north-america-2014-thanks-for-having-us

Page 13: This Month in Data Science - April Edition

Big Data & Brews Video Explains How Pivotal’s Hadoop Distribution Is Different

In a video interview for the Big Data & Brews series, Pivotal’s Chief Scientist Milind Bhandarkar shares a beer with Datameer’s CEO Stefan Groschupf and provides an overview of the many features that differentiate Pivotal’s Hadoop distribution from the rest.

http://blog.gopivotal.com/pivotal/products/big-data-brews-video-explains-how-pivotals-hadoop-distribution-is-different

Page 14: This Month in Data Science - April Edition

DSC Webinar Series: Data Science for the 99%: Open Source Software for Machine Learning and Analytics

In this webinar, now available to view at Data Science Central, Pivotal’s Woo J. Jung, Sarah Aerni, and Srivatsan Ramanujam discuss some of the open source tools in their arsenal. They introduce and provide details on the variety of open source tools (such as MADlib, PL/R, PL/Python, PivotalR, PyMADlib, and a host of others) that they have used and extended for customer engagements.

http://www.datasciencecentral.com/video/dsc-webinar-series-data-science-for-the-99-open-source-software

Page 15: This Month in Data Science - April Edition

Time Series Analysis Part 3: Resampling and Interpolation

The previous blog posts in this series introduced how window functions can be used for many types of ordered data analysis. This post shows how these techniques can be extended to handle time series resampling and interpolation.

http://blog.gopivotal.com/tag/data-science#sthash.RCeaWqlT.dpuf
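
For readers unfamiliar with the terms, here is a minimal sketch of what resampling and interpolation mean for a time series. It is not taken from the blog post, which works with SQL window functions; this version uses plain Python instead, and the function name `resample_linear` and the sample readings are hypothetical. It places irregularly spaced readings onto a regular grid and fills the gaps by linear interpolation between the neighboring observations.

```python
from bisect import bisect_left


def resample_linear(samples, start, stop, step):
    """Resample sorted (timestamp, value) pairs onto a regular grid.

    Grid points between two observations are linearly interpolated.
    Grid points outside the observed range get None (no extrapolation).
    """
    times = [t for t, _ in samples]
    out = []
    t = start
    while t <= stop:
        i = bisect_left(times, t)  # index of first observation at or after t
        if i < len(times) and times[i] == t:
            out.append((t, samples[i][1]))  # exact observation, keep as-is
        elif i == 0 or i == len(times):
            out.append((t, None))  # before the first or after the last reading
        else:
            (t0, v0), (t1, v1) = samples[i - 1], samples[i]
            out.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += step
    return out


# Hypothetical readings taken at irregular times 0, 4, and 10.
readings = [(0, 10.0), (4, 18.0), (10, 30.0)]
grid = resample_linear(readings, start=0, stop=10, step=2)
```

In a database, the same idea is typically expressed by looking up the previous and next observed rows for each grid point, which is where window functions come in.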

Page 16: This Month in Data Science - April Edition

UPCOMING EVENTS

Page 17: This Month in Data Science - April Edition

Parquet: Open-Source Columnar Format For Hadoop

Thursday, May 15, 2014, 5:45 PM to 8:30 PM

Pivotal Labs, San Francisco, CA

Twitter’s Dmitriy Ryaboy and Pivotal’s Milind Bhandarkar discuss Parquet, an open source project implementing columnar storage that supports deeply nested structures, efficient encoding and column compression schemes, and is designed to be compatible with a variety of higher-level type systems. In this talk, they will go over the Parquet design, use cases, and performance numbers.

http://www.meetup.com/Pivotal-Open-Source-Hub/events/177942192/
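
As background for the talk, the following is a rough sketch of why a columnar layout helps with encoding and compression. It does not use Parquet or reproduce its file format, and it ignores nested structures entirely; the helper names (`to_columns`, `rle_encode`, `rle_decode`) and the sample rows are hypothetical. It only shows the general idea: once values from the same column sit next to each other, repeated values form runs that a simple scheme such as run-length encoding can shrink.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts (all with the same keys) into one list per column."""
    return {key: [row[key] for row in rows] for key in rows[0]}


def rle_encode(values):
    """Run-length encode a column as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical row-oriented records.
rows = [
    {"country": "US", "clicks": 3},
    {"country": "US", "clicks": 7},
    {"country": "US", "clicks": 1},
    {"country": "DE", "clicks": 4},
]

columns = to_columns(rows)
encoded_country = rle_encode(columns["country"])
```

Here the four `country` values collapse into two pairs, while a query that only needs `clicks` can read that column without touching `country` at all.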

Page 18: This Month in Data Science - April Edition

Hadoop Cluster Configuration and Performance Tuning

Tuesday, May 20, 2014, 5:45 PM to 8:30 PM

Pivotal Labs, San Francisco, CA

Configuring and operating a Hadoop cluster is still not a trivial task and requires special consideration. In this talk, Pivotal’s Suhas Gogate will share tips for configuring a Hadoop cluster and for analyzing and tuning the performance of Map/Reduce applications. He will also demo “Hadoop Vaidya”, a performance advisor for Hadoop M/R, which he submitted as a Hadoop contrib project.

http://www.meetup.com/Pivotal-Open-Source-Hub/events/178861422/

Page 19: This Month in Data Science - April Edition

2014 Hadoop Summit

June 3–5, 2014

San Jose, CA

The 7th Annual Hadoop Summit will feature many Apache Hadoop thought leaders, who will showcase successful Hadoop use cases, share development and administration tips and tricks, and educate organizations about how best to leverage Apache Hadoop as a key component in their enterprise data architecture.

http://hadoopsummit.org/san-jose/