"Data Science in Future Tense": keynote talk for GalvanizeU Launch in SF
![Page 1: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/1.jpg)
Data Science in Future Tense
GalvanizeU Launch, 2014-10-29
gulaunch.splashthat.com
Paco Nathan @pacoid
![Page 2: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/2.jpg)
Whither Data Science?
![Page 4: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/4.jpg)
Whither Data Science?
twitter.com/josh_wills/status/198093512149958656
FLAWED
issue: aristotelian perspectives in a non-linear world…
![Page 7: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/7.jpg)
Whither Data Science?
Circa 2008, at a large ad-tech firm running one of the largest Hadoop instances in the cloud, execs said “Don’t bother” when it came to digging into analysis of geo, clustering, time series, etc.
We did anyway:
• people in SF don’t click online travel ads much; however, people in Dodge City do… a lot!
• largest customer segment: flag poles, portable generators, hammocks, sea salt, mail-order steaks, defibrillators
FLAWED
![Page 8: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/8.jpg)
Whither Data Science?
primary sources for the notion:
Cleveland, W. S., “Data Science: an Action Plan for Expanding the Technical Areas of the Field of Statistics,” International Statistical Review (2001), 69, 21-26. http://cm.bell-labs.com/stat/doc/datascience.ps
Breiman L., “Statistical modeling: the two cultures”, Statistical Science (2001), 16:199-231. http://projecteuclid.org/euclid.ss/1009213726
…also good to mention John Tukey
![Page 9: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/9.jpg)
Whither Data Science?
we have a long, long way yet to go:
So many problems that we encounter in industry can be represented as graphs…
Tensors provide means for representing multiple-edge graphs, ostensibly solving for a general case…
Even so, how much time have you spent working with tensors for data science apps?
wikipedia.org
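To make the tensor idea concrete, here is a minimal sketch, not taken from the talk: a multiple-edge graph stored as a rank-3 adjacency tensor, with one node-by-node slice per edge type. The node names, edge types, and helper functions are all hypothetical, and plain nested lists stand in for what would more likely be a sparse tensor library in practice.

```python
# Hypothetical toy data: node names and edge types are illustrative only.
NODES = ["alice", "bob", "carol"]
EDGE_TYPES = ["follows", "emails"]


def build_tensor(edges):
    """Return T where T[k][i][j] == 1 if an edge of type k runs from node i to node j."""
    n = len(NODES)
    tensor = [[[0] * n for _ in range(n)] for _ in EDGE_TYPES]
    for src, dst, kind in edges:
        k = EDGE_TYPES.index(kind)
        i = NODES.index(src)
        j = NODES.index(dst)
        tensor[k][i][j] = 1
    return tensor


def out_degree(tensor, node):
    """Count outgoing edges from a node, reported separately per edge type."""
    i = NODES.index(node)
    return {kind: sum(tensor[k][i]) for k, kind in enumerate(EDGE_TYPES)}


EDGES = [
    ("alice", "bob", "follows"),
    ("alice", "bob", "emails"),    # a second edge between the same pair of nodes
    ("bob", "carol", "follows"),
    ("alice", "carol", "follows"),
]
TENSOR = build_tensor(EDGES)
```

An ordinary adjacency matrix cannot hold both alice-to-bob edges at once; the extra tensor axis is what keeps them apart, and `out_degree(TENSOR, "alice")` reports them per edge type.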
![Page 10: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/10.jpg)
Historical Arc 1: The Alchemists…
“Who has the crystal ball?”
![Page 11: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/11.jpg)
Arc 1: Who has the crystal ball?
TL;DR: Nods to some people who envisaged and modeled our shared future…
![Page 12: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/12.jpg)
Arc 1: Who has the crystal ball?
Theory, Eight Decades Ago: what can be computed?
• Haskell Curry (haskell.org)
• Alonzo Church (wikipedia.org)
Praxis, Four Decades Ago: algebra for applicative systems
• John Backus (acm.org)
• David Turner (wikipedia.org)
Reality, Two Decades Ago: web apps, ML, machine data
• Pattie Maes (MIT Media Lab)
![Page 13: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/13.jpg)
A Brief History: Functional Programming for Big Data
spark.apache.org
![Page 14: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/14.jpg)
A Brief History: Smashing The Previous Petabyte Sort Record
databricks.com/blog/2014/10/10/spark-petabyte-sort.html
spark.apache.org
![Page 15: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/15.jpg)
Historical Arc 2: An Oblivoir Of Origins…
“Why are we here?”
![Page 16: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/16.jpg)
Arc 2: Why are we here?
TL;DR: We share the delightful role of… speaking truth to power
![Page 17: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/17.jpg)
Arc 2: Why are we here?
Reason 1: early 19th c. Prussian/Napoleonic “General Staff” organization => corporate IT silos
translated: too many people saying “That is not my concern.”
action: interdisciplinary teams tear down silos, surfacing insights
![Page 18: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/18.jpg)
Arc 2: Why are we here?
Reason 2: 19th-20th c. statistics emphasized defensibility in lieu of predictability
translated: defend one’s job, not boost top-line revenue
action: focus on predictability; if you need to defend your job, you should be working elsewhere
![Page 19: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/19.jpg)
Arc 2: Why are we here?
Reason 3: machine learning derives from several disciplines, but ultimately is a subset of optimization
translated: those disciplines couldn’t talk to each other very much, so we have difficulty understanding them collectively
action: learn to leverage optimization theory, thoroughly
![Page 20: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/20.jpg)
Arc 2: Why are we here?
Reason 4: university math curricula are still tilted toward Cold War priorities
translated: 2-3 years of calculus filters for the mechanical engineering candidates who can build the most cost-effective ICBMs
action: leadership must embrace how to leverage advanced math for business use cases
![Page 21: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/21.jpg)
Arc 2: Why are we here?
Reason 5: brogrammers tend to emphasize logical reasoning over analytic reasoning
translated: left-brained lopsidedness wins temporarily, then fails spectacularly
action: ask security to walk the brogrammers back to their cave
![Page 22: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/22.jpg)
Arc 2: Why are we here?
Reason 6: people can make intuitive decisions in ~4 dimensions at most, period
translated: product managers as Steve Jobs wannabes are poisonous
action: leverage data science, visualization, machine learning with distributed systems at scale to address the high dimensionality of data
![Page 23: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/23.jpg)
Arc 2: Why are we here?
Reason 7: embracing perpetual learning curves represents a Promethean challenge
translated: learning is hard, and many organizations go to great lengths to minimize it
action: learn efficiently, continually, with a great thirst
![Page 24: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/24.jpg)
Historical Arc 3: Be There Then…
“What happens next?”
![Page 25: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/25.jpg)
Arc 3: What happens next?
TL;DR: Brace yourselves…
![Page 26: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/26.jpg)
Arc 3: What happens next?
• Full stack… no, really
• You’ll work with functional programming and cloud-based notebooks
• Shift from modeling based on variance (batch) towards probabilistic approximation
• Early data scientists displace the old-school product managers
• IoT, drones, microsats: several orders of magnitude more data up ahead
• leave SF – the more interesting data science work to be accomplished is not here
![Page 27: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/27.jpg)
Arc 3: What happens next?
Full stack… no, really
from visualization to virtualization, all points in-between
source: Microsoft
![Page 29: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/29.jpg)
Arc 3: What happens next?
You’ll work with functional programming and cloud-based notebooks
http://databricks.com/product
![Page 31: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/31.jpg)
Arc 3: What happens next?
Shift from modeling based on variance (batch) towards probabilistic approximation
highlyscalable.wordpress.com/2012/05/01/probabilistic-structures-web-analytics-data-mining/
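As one illustration of probabilistic approximation, here is a minimal sketch of a Count-Min sketch, a structure that estimates item frequencies in a stream using fixed memory. It is not code from the talk or the linked article: the class name, the default `width` and `depth`, and the SHA-256 based hashing scheme are all assumptions chosen for readability. Estimates can overcount when items collide, but they never undercount.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter with fixed memory (width * depth cells)."""

    def __init__(self, width=256, depth=4):
        # Assumed defaults for illustration; real deployments size these
        # from the error bounds they need.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived by salting the hash with the row index.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).hexdigest()
            yield row, int(digest, 16) % self.width

    def add(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only ever inflate a cell, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(self.table[row][col] for row, col in self._columns(item))
```

The trade-off is the one the slide points at: instead of an exact count over a complete batch, the answer is a bounded approximation that can be updated one event at a time.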
![Page 33: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/33.jpg)
Arc 3: What happens next?
Early data scientists displace the old-school product managers
![Page 35: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/35.jpg)
Arc 3: What happens next?
IoT, drones, microsats: several orders of magnitude more data up ahead
Layered Sensing Networks
• microsats: e.g., Planet Labs, 400 km
• airships: e.g., JP Aerospace, 40 km
• atmosats: e.g., Titan Aerospace, 20 km
• drones: e.g., HoneyComb, 120 m
• robots: e.g., Blue River, 1 m
• sensors: e.g., Hortau, -0.3 m
![Page 37: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/37.jpg)
Arc 3: What happens next?
leave SF – the more interesting data science work to be accomplished is not here
![Page 38: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/38.jpg)
Summary?
![Page 39: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/39.jpg)
Vector Quantization:
After we’ve cleaned up data, formulated workflows in terms of monoids, used graph representation, and parallelized with a wealth of linear algebra, much of the heavy lifting that remains on the clusters is in optimization.
For example, deep learning @Google uses many layers of neural nets trained with gradient descent optimization.
Taming Latency Variability and Scaling Deep Learning, Jeff Dean @Google (2013): youtu.be/S9twUcX1Zp0
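For readers who have not met gradient descent in code, here is a minimal sketch on a toy problem: fitting a line by repeatedly stepping against the gradient of the mean squared error. It is an illustration only, not the training setup described in the talk cited above; the function name, learning rate, step count, and data are all assumptions.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ≈ w * x + b by minimizing mean squared error with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step downhill; the learning rate here is a hand-picked assumption.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical toy data generated from y = 2x + 1.
XS = [0.0, 1.0, 2.0, 3.0, 4.0]
YS = [1.0, 3.0, 5.0, 7.0, 9.0]
W, B = gradient_descent(XS, YS)
```

Every step touches the whole dataset to compute the gradient. Stochastic gradient descent (SGD) replaces that full pass with a sample, which is why the gradient computation is the part of the cost that grows with data and model size.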
![Page 40: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/40.jpg)
Vector Quantization:
One potential advantage of quantum algorithms is running large gradient descent problems in constant time…
Once high-ROI apps are reworked to leverage lots of ML and large clusters, SGD represents the datacenter cost basis, notably the part that scales…
Want to slash costs exponentially? Plug in quantum for a game-changer, maybe
Fast quantum algorithm for numerical gradient estimation, Stephen P. Jordan, Phys. Rev. Lett. 95, 050501 (2005): arxiv.org/abs/quant-ph/0405146
dwavesys.com
![Page 41: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/41.jpg)
Vector Quantization:
Proposal: let’s drop clusters of quantum devices into lunar polar craters, so we can handle massive vector quantization workloads
• micro-kelvin environs
• near perpetual sunlight for energy sources
• park routers at L4
• approx. $15B to finance, i.e., ~6 days DoD budget
![Page 42: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/42.jpg)
Vector Quantization:
We’ll just put this here… a couple o’ Googly projects in progress:
qCraft: Quantum Physics In Minecraft
plus.google.com/u/1/+QuantumAILab/posts/grMbaaDGChH
“We’re going back to the Moon. For good.”
lunar.xprize.org
![Page 43: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/43.jpg)
Resources
![Page 44: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/44.jpg)
Apache Spark community:
• spark.apache.org/community.html
• databricks.com/spark-training
• oreilly.com/go/sparkcert
![Page 45: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/45.jpg)
events:
• Strata EU, Barcelona, Nov 19-21: strataconf.com/strataeu2014
• Data Day Texas, Austin, Jan 10: datadaytexas.com
• Strata CA, San Jose, Feb 18-20: strataconf.com/strata2015
• Spark Summit East, NYC, Mar 18-19: spark-summit.org/east
• Spark Summit 2015, SF, Jun 15-17: spark-summit.org
![Page 46: Data Science in Future Tense](https://reader036.vdocument.in/reader036/viewer/2022062319/557d60bad8b42aba3d8b501c/html5/thumbnails/46.jpg)
presenter:
Just Enough Math, O’Reilly, 2014
justenoughmath.com
preview: youtu.be/TQ58cWgdCpA
Enterprise Data Workflows with Cascading, O’Reilly, 2013
shop.oreilly.com/product/0636920028536.do
monthly newsletter for updates, events, conf summaries, etc.: liber118.com/pxn/