Python for Data: Past, Present, Future (PyCon JP 2017 Keynote)
© 2016 Continuum Analytics - Confidential & Proprietary
Python for Data: Past, Present, and Future
Peter Wang, CTO and Co-founder, Anaconda / Continuum Analytics
© 2017 Anaconda, Inc.
• Our Journey with Anaconda
• Why Python for Data?
• The Future
Agenda
My Journey with Anaconda
• Degree in Physics (Cornell Univ.)
• Computer graphics developer (C, C++)
• Scientific Python developer and consultant (Chaco, Traits, …)
• Founded Continuum Analytics in 2012 with Travis Oliphant
• Launched / Created: PyData conferences and community, Anaconda distribution, conda package manager, Bokeh web visualization, Blaze data library
• Think a lot about the future of Python for data+science, machine learning
About Peter
When we started 5 years ago…
The birth of conda…
“Guido, please help convince core dev to work with us to solve
the packaging problem!”
“Meh. Feel free to solve it yourselves.”
• 500+ Popular Python Packages
• Optimized & Compiled
• Free for Everyone
• Extensible via Conda Package Manager
• Sandbox Packages & Libraries
• Cross-Platform – Windows, Linux, Mac
• Not just Python - over 230 R packages
Anaconda & Miniconda Downloads
[Chart: monthly downloads in thousands (y-axis 0 to 3,500), 2015/1 through 2017/7; series: Anaconda, Miniconda]
Over 20 Million Downloads
The Growth of Data Science - Python Leading the Way
https://stackoverflow.blog/2017/09/06/incredible-growth-python/
Other Problems in 2012…
• Performance: You had to choose between a vectorized system like NumPy, going to Cython, or wrapping C code. There was no nice JIT like Julia's.
  • We created Numba
• No system for building simple data-driven web apps, like Shiny for R.
  • We created Bokeh, to serve as both Shiny and D3 for Python
• No easy parallelism, or intrinsic parallel primitives like Spark.
  • We created Dask, which has parallel arrays and dataframes.
  • Also solves the “data doesn't fit in RAM” problem.
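To make the performance point concrete, here is a minimal sketch of the kind of explicit loop that is slow in plain CPython and that a JIT such as Numba can compile. The function name `sum_of_squares` is made up for illustration and does not come from the talk. Numba is treated as optional: if it is not installed, a stand-in decorator (an assumption of this sketch, not part of Numba) leaves the function as ordinary Python so the example still runs.

```python
# Sketch: one loop, optionally JIT-compiled. Only `njit` is a real Numba
# name; everything else here is illustrative.
try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    def njit(func):
        # Hypothetical fallback, NOT part of Numba: return the function
        # unchanged so the code runs (uncompiled) without Numba installed.
        return func


@njit
def sum_of_squares(n):
    # An explicit Python loop. Without a JIT, the alternatives were to
    # vectorize it with NumPy or rewrite it in Cython / C.
    total = 0.0
    for i in range(n):
        total += i * i
    return total
```

Calling `sum_of_squares(10)` returns the same value with or without Numba; only the execution speed on large `n` would differ, and any speedup depends on the machine and the Numba version.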
• Everyone is learning it; major universities are teaching it
• Proven in production at Serious Places, not merely hip startups
• Vastly outstrips scripting language rivals like Ruby, Perl
• Growing faster than pure analysis langs like R, SAS, Matlab
• Data science and machine learning applications are taking off like a rocket
• Python is the most popular language for Deep Learning, the most rapidly innovating area of machine learning
• Python 2 vs 3 rift is less of an issue for most people
Python in 2017
https://www.youtube.com/watch?v=nU09j2gGHYg
Why Python for Data?
[Figure: timeline image; recoverable labels: SQL, Numeric; years shown: 1968, 1973, 1974, 1981, 1991, 1993, 1996, 2005]
Python & ABC
It is interactive, structured, high-level, and intended to be used instead of BASIC, Pascal, or AWK.
It is not meant to be a systems-programming language but is intended for teaching or prototyping.
Analyst
• Uses graphical tools
• Can call functions, cut & paste code
• Can change some variables
Gets paid for: Insight
Excel, VB, Tableau, Python
Analyst / Data Developer
• Builds simple apps & workflows
• Used to be "just an analyst"
• Likes coding to solve problems
• Doesn't want to be a "full-time programmer"
Gets paid (like a rock star) for: Code that produces insight
SAS, R, Matlab, Python
Programmer
• Creates frameworks & compilers
• Uses IDEs
• Degree in CompSci
• Knows multiple languages
Gets paid for: Code
C, C++, Java, JS, Python
• VERY common misconception
• Python is probably the most misunderstood language
• There are “tribes” and ecosystems in Python: web dev, scipy, pydata, embedded, scripting, 3D graphics, etc.
• But businesses tend to pigeonhole it:
  • IT/software/data engineering view: competes with Java, C#, Ruby…
  • Analytics, stats, data science view: competes with R, SAS, Matlab, SPSS, BI systems
Data science != Software Development
• Data exploration and analysis are going to be a new kind of literacy that will be required to do great work in any field.
• Language is a human instinct and is a natural path to insight. We see this in our interaction with Python/PyData users, whose passion chiefly stems from this expressiveness and agility.
• An analytical language is “thoughtware”, not “software”.
Era of Data Literacy
What’s Next?
• Python will become a preferred way to develop cognitive applications: online model learning and training
• There will be a steady income stream for people who want to maintain Python 2.x codebases
• Multi-language interoperability will be greatly improved once people adopt the Apache Arrow format for storing data. This means Python code running alongside Java/Scala/JVM will not be a second-class citizen.
• Constant improvements in memory and storage, as well as GPUs, mean that people will continue doing lots of Python locally on big workstations.
A Few Predictions
• Not about licenses
• Empowering people & communities to innovate
• Aligns us with users, customers, innovators
• “Software is eating the world”
• Open source is eating software
Open Source and Developers
• Not about cost of software (“capital expense”)
• Not even about maintenance of software (“operational expense”)
• Core business goals:
  • Avoid lock-in
  • Harness innovation
Open Source and Businesses
5 Years, 25+ Conferences, 100s of talks
Questions?