TRANSCRIPT
On the code of data science
Gaël Varoquaux

import data_science
data_science.discover()

Disclaimer: a bit of a Python bias; the key ideas carry over.
What is data science?
G Varoquaux 2
Data science = statistics + code
Larger datasets
Automating insights
Data science is disruptive when it enables real-time or personalized decisions.
G Varoquaux 3
1 Writing code for data science
2 Machine learning in Python
3 Outlook: infrastructure
Make you more productive as a data scientist
G Varoquaux 4
1 Writing code for data science: productivity & quality
G Varoquaux 5
1 A data-science workflow
Work based on intuition and experimentation: a loop of conjecture and experiment.
→ Interactive & framework-less
Yet it needs consolidation, while keeping flexibility.
G Varoquaux 6
1 A design pattern in computational experiments
The MVC pattern, from Wikipedia:
- Model: manages the data and rules of the application
- View: output representation; possibly several views
- Controller: accepts input and converts it to commands for model and view
Photo-editing software: filters (model), canvas (view), tool palettes (controller)
Typical web application: database (model), web pages (view), URLs (controller)
For data science:
- Model: numerical, data-processing & experimental logic → a module with functions
- View: results, as files (data & plots) → a post-processing script over CSV & data files
- Controller: an imperative API (avoid input as files: not expressive) → a script with for loops
A recipe, 3 types of files: modules, command scripts, post-processing scripts, plus CSVs & intermediate data files.
Separate computation from analysis/plotting.
Code and text (and data) → version control.
Decouple steps. Goals: reuse code, mitigate compute time.
G Varoquaux 7
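The recipe above can be sketched in code. For brevity the three roles live in one file here; in practice each is its own file. The function names and the toy "experiment" are illustrative, not from the talk.

```python
# A minimal sketch of the three-file recipe: module / command script /
# post-processing script, with a CSV as the intermediate data file.
import csv, io, statistics

# --- Module with functions (the "model"): numerical & experimental logic ---
def run_experiment(n_samples):
    """A stand-in computation: return a (parameter, result) pair."""
    return n_samples, sum(i * i for i in range(n_samples))

# --- Command script (the "controller"): for loops over conditions, dumps CSV ---
def command_script(param_grid, out_file):
    writer = csv.writer(out_file)
    writer.writerow(["n_samples", "result"])
    for n in param_grid:
        writer.writerow(run_experiment(n))

# --- Post-processing script (the "view"): reads the CSV, analyses/plots ---
def post_process(in_file):
    rows = list(csv.DictReader(in_file))
    return statistics.mean(int(r["result"]) for r in rows)

buf = io.StringIO()          # stands in for an intermediate CSV file on disk
command_script([10, 20, 30], buf)
buf.seek(0)
summary = post_process(buf)
```

Because computation and analysis only communicate through the CSV, the plotting side can be re-run endlessly without re-paying the compute time.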
1 How I work: progressive consolidation
Start with a script, playing to understand the problem.
Identify blocks/operations → move them to a function.
As they stabilize, move them to a module.
Clean: delete code & files (you have version control).
Why is it hard? Long compute times make us unadventurous.
Know your tools: a refactoring editor, version control.
G Varoquaux 8
Use functions. Obstacle: local scope requires identifying input and output variables. That's a good thing.
Solution for interactive debugging/understanding inside a function: %debug in IPython.
Functions are the basic reusable abstraction.
Modules enable sharing between experiments (→ avoid 1000-line scripts with commented-out code) and enable testing.
Fast experiments as tests → confidence, hence refactorings.
Attentional load makes it impossible to find or understand things. Where's Waldo?
1 Caching to tame computation time: the memoize pattern

mem = joblib.Memory(cachedir='.')
g = mem.cache(f)
b = g(a)  # computes b using f
c = g(a)  # retrieves the result from memory

For data science: a & b can be big, a & b are arbitrary objects, no change in workflow, results stored on disk.
G Varoquaux 9
Also: it fits in the experimentation loop and helps decrease re-run times.
Downside: black-boxy, persistence is only implicit.
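The memoize-to-disk pattern itself is simple; here is a minimal standard-library sketch of what joblib.Memory does. Unlike joblib it does not hash large arrays efficiently, version the function's code, or survive changes to it; the names `disk_cache` and `calls` are illustrative.

```python
# Minimal disk memoization: hash the call, pickle the result, reload on hit.
import functools, hashlib, os, pickle, tempfile

def disk_cache(cachedir):
    os.makedirs(cachedir, exist_ok=True)
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args):
            key = hashlib.sha1(pickle.dumps((f.__name__, args))).hexdigest()
            path = os.path.join(cachedir, key + ".pkl")
            if os.path.exists(path):            # cache hit: load from disk
                with open(path, "rb") as fh:
                    return pickle.load(fh)
            result = f(*args)                   # cache miss: compute & store
            with open(path, "wb") as fh:
                pickle.dump(result, fh)
            return result
        return wrapper
    return decorator

calls = []                                      # track actual computations

@disk_cache(tempfile.mkdtemp())
def f(x):
    calls.append(x)
    return x ** 2

b = f(3)   # computes
c = f(3)   # retrieved from disk; f is not re-run
```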
Adopting software-engineering best practices
G Varoquaux 10
1 The ladder of code quality (increasing cost as you climb):
- Use linting/code analysis in your editor (seriously)
- Coding conventions, good naming
- Version control: use git + GitHub
- Code review
- Unit testing: if it's not tested, it's broken or soon will be
- Make a package: controlled dependencies, compilation...
G Varoquaux 11
Avoid premature software engineering.
Over- versus under-engineering: when the goal is generating insights, experimentation develops intuitions → new ideas. As the path becomes clear, consolidate. Heavy engineering too early freezes bad ideas.
G Varoquaux 11
1 Libraries: at the top of the ladder of code quality sits a library.
G Varoquaux 12
2 Machine learning in Python
scikit-learn
G Varoquaux 13
2 Tradeoffs
Experimentation ⇄ Production
scikit-learn
G Varoquaux 14
2 My stack for data science
Python, what else? A general-purpose language, interactive, easy to read and write.
G Varoquaux 15
2 My stack for data science
The scientific Python stack: numpy arrays.
Mostly a float**: no annotation or structure, universal across applications, easily shared across languages.
[figure: a large array of numbers]
G Varoquaux 15
2 My stack for data science
The scientific Python stack: numpy arrays, connecting to
- pandas: columnar data
- scikit-image: images
- scipy: numerics, signal processing, ...
G Varoquaux 15
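A brief illustration of why numpy arrays are the lingua franca of this stack (assuming numpy is installed): homogeneous float data, vectorized operations, and zero-copy views.

```python
import numpy as np

X = np.array([[0., 1., 2.],
              [3., 4., 5.]])    # a samples-by-features block of floats

# Homogeneous, unannotated storage: dtype is float64, shape is (2, 3)
dtype, shape = X.dtype, X.shape

col_means = X.mean(axis=0)      # vectorized: no Python-level loop
view = X[:, :2]                 # a view: shares memory with X, no copy
```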
2 Machine learning in a nutshell
Machine learning is about making predictions from data, e.g. learning to distinguish apples from oranges.
"Prediction is very difficult, especially about the future." (Niels Bohr)
Learn as much as possible from the data, but not too much.
G Varoquaux 16
[figure: two candidate fits of y against x] Which model do you prefer?
Minimizing train error does not imply generalization: overfit.
Adapting model complexity to the data: regularization.
G Varoquaux 16
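The "not too much" point can be made concrete with a standard-library toy: a model that simply memorizes the training set has zero train error but fails on new points, while a smoother model (a plain linear fit here) trades a little train error for generalization. All names and data below are illustrative.

```python
# Overfitting in miniature: memorizer vs. a simple linear fit.
def fit_linear(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

# Noisy samples of roughly y = 2x
x_train, y_train = [0, 1, 2, 3], [0.1, 1.9, 4.2, 5.8]
x_test, y_test = [4, 5], [8.0, 10.1]

memorizer = dict(zip(x_train, y_train)).get   # overfit: a lookup table
linear = fit_linear(x_train, y_train)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_err_memo = mse(memorizer, x_train, y_train)  # exactly 0: overfit
# The memorizer has nothing to say about unseen x; the linear fit does.
linear_generalizes = all(abs(linear(x) - y) < 1.0
                         for x, y in zip(x_test, y_test))
```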
2 Machine learning without learning the machinery
A library, not a program: more expressive and flexible, easy to include in an ecosystem. Let's disrupt something new.
As easy as py:

from sklearn import svm
classifier = svm.SVC()
classifier.fit(X_train, y_train)
y_test = classifier.predict(X_test)

G Varoquaux 17
2 Show me your data: the samples × features matrix
Data input: a 2D numerical array, samples as rows, features as columns. Requires transforming your problem.
With text documents: sklearn.feature_extraction.text.TfidfVectorizer
[figure: a samples × features count matrix]
G Varoquaux 18
[figure: a documents × words count matrix; columns are words such as "the", "Python", "performance", "profiling", "module", "is", "code", "can", "a"]
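What the vectorizer does can be sketched in miniature with plain Python: turn documents into rows of a documents-by-words count matrix. The real TfidfVectorizer additionally applies tf-idf weighting and returns a sparse matrix; the `vectorize` function below is a toy stand-in.

```python
# A toy count vectorizer: documents in, documents-by-words matrix out.
def vectorize(documents):
    vocabulary = sorted({w for doc in documents for w in doc.lower().split()})
    index = {w: j for j, w in enumerate(vocabulary)}
    matrix = [[0] * len(vocabulary) for _ in documents]
    for i, doc in enumerate(documents):
        for word in doc.lower().split():
            matrix[i][index[word]] += 1       # count occurrences
    return vocabulary, matrix

docs = ["the Python profiling module",
        "profiling Python code"]
vocab, counts = vectorize(docs)
```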
"Big" data: engineering efficient processing pipelines.
Many samples (a tall samples × features matrix) or many features (a wide one).
See also: http://www.slideshare.net/GaelVaroquaux/processing-biggish-data-on-commodity-hardware-simple-python-patterns
G Varoquaux 19
2 Many samples: on-line algorithms

estimator.partial_fit(X_train, y_train)

[figure: the data matrix processed as a stream of chunks]
G Varoquaux 20
2 Many samples: on-line algorithms

estimator.partial_fit(X_train, y_train)

Supervised models (predicting): sklearn.naive_bayes..., sklearn.linear_model.SGDRegressor, sklearn.linear_model.SGDClassifier
Clustering (grouping samples): sklearn.cluster.MiniBatchKMeans, sklearn.cluster.Birch
Linear decompositions (finding new representations): sklearn.decomposition.IncrementalPCA, sklearn.decomposition.MiniBatchDictionaryLearning, sklearn.decomposition.LatentDirichletAllocation
G Varoquaux 20
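The partial_fit pattern itself is worth seeing in miniature: an estimator that is fed the data chunk by chunk, never holding the full set in memory. The `RunningMean` class below is an illustrative stand-in, not a scikit-learn estimator; SGDClassifier and friends update their weights in the same incremental style.

```python
# On-line estimation in miniature: state updated one mini-batch at a time.
class RunningMean:
    def __init__(self):
        self.n, self.total = 0, 0.0

    def partial_fit(self, X_chunk):
        """Update sufficient statistics from one chunk of data."""
        self.n += len(X_chunk)
        self.total += sum(X_chunk)
        return self

    @property
    def mean_(self):
        return self.total / self.n

est = RunningMean()
for chunk in ([1.0, 2.0], [3.0, 4.0], [5.0]):   # a stream of mini-batches
    est.partial_fit(chunk)
```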
2 Many features: on-the-fly data reduction
→ Reduce the data as it is loaded:

X_small = estimator.transform(X_big, y)

G Varoquaux 21
2 Many features: on-the-fly data reduction
Random projections (sklearn.random_projection): random linear combinations of the features.
Fast clustering of features (sklearn.cluster.FeatureAgglomeration): on images, a super-pixel strategy.
Hashing when observations have varying size, e.g. words (sklearn.feature_extraction.text.HashingVectorizer): stateless, so it can be used in parallel.
G Varoquaux 21
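The random-projection idea reduces to multiplying the data by a fixed random matrix, i.e. taking random linear combinations of the features. A pure-Python sketch follows; sklearn.random_projection does this with proper scaling and Johnson-Lindenstrauss guarantees, and the function name here is illustrative.

```python
# Random projection in miniature: n_features -> n_components.
import random

def random_projection(X, n_components, seed=0):
    rng = random.Random(seed)
    n_features = len(X[0])
    # A fixed random matrix of shape (n_features, n_components)
    R = [[rng.gauss(0, 1) for _ in range(n_components)]
         for _ in range(n_features)]
    # Each output feature is a random linear combination of the inputs
    return [[sum(x[j] * R[j][k] for j in range(n_features))
             for k in range(n_components)]
            for x in X]

X_big = [[1.0] * 100, [0.0] * 100]       # 2 samples, 100 features
X_small = random_projection(X_big, n_components=5)
```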
More gems in scikit-learn
SAG: linear_model.LogisticRegression(solver='sag'): a fast linear model on biggish data.
PCA == RandomizedPCA (0.18): a heuristic switches PCA to randomized linear algebra. Huge speed gains for biggish data (and it fights global warming).
G Varoquaux 22
More gems in scikit-learn
New cross-validation objects (0.18).
Before (sklearn.cross_validation):

from sklearn.cross_validation import StratifiedKFold
cv = StratifiedKFold(y, n_folds=2)
for train, test in cv:
    X_train = X[train]
    y_train = y[train]

Now (sklearn.model_selection):

from sklearn.model_selection import StratifiedKFold
cv = StratifiedKFold(n_folds=2)
for train, test in cv.split(X, y):
    X_train = X[train]
    y_train = y[train]

Data-independent → better nested cross-validation.
G Varoquaux 22
More gems in scikit-learn
Outlier detection and isolation forests (0.18)
G Varoquaux 22
3 Outlook: infrastructure
No strings attached
G Varoquaux 23
3 Dataflow is key to scale
Array computing (CPU): [figure: a dense array]
Data parallel: [figure: the array split into independent blocks]
Streaming: [figure: the array processed chunk by chunk]
Parallel computing: data + code transfer, out-of-memory persistence.
These patterns can yield horrible code.
G Varoquaux 24
3 Parallel-computing engine: joblib
sklearn.Estimator(n_jobs=2)
Under the hood: joblib

>>> from joblib import Parallel, delayed
>>> Parallel(n_jobs=2)(delayed(sqrt)(i**2)
...                    for i in range(8))
[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

Threading and multiprocessing modes.
New: distributed computing backends: Yarn, dask.distributed, IPython.parallel

import distributed.joblib
from joblib import Parallel, parallel_backend
with parallel_backend('dask.distributed',
                      scheduler_host='HOST:PORT'):
    # normal joblib code

G Varoquaux 25
Algorithmic plain-Python code, with optional bells and whistles.
G Varoquaux 25
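For comparison, the same embarrassingly-parallel map can be written with only the standard library's concurrent.futures; threads are used below to keep the sketch self-contained. What joblib adds on top, per the slide, is lazy dispatch, the pluggable backends shown above, and efficient handling of large arrays.

```python
# Stdlib analogue of Parallel(n_jobs=2)(delayed(sqrt)(i**2) for i in range(8))
from concurrent.futures import ThreadPoolExecutor
from math import sqrt

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(sqrt, (i ** 2 for i in range(8))))
```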
3 Persistence to disk
Persisting any Python object, fast and with little overhead.
Streamed compression; I/O from/to open file handles → works in S3, HDFS.
G Varoquaux 26
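A minimal standard-library sketch of that idea: pickle for arbitrary Python objects, gzip for streamed compression, an open file handle as the target (which is what lets the same code write to S3 or HDFS handles). joblib's dump/load add fast handling of large numpy arrays on top of this; the object below is illustrative.

```python
# Streamed, compressed persistence to an open file handle.
import gzip, io, pickle

obj = {"coef": [0.1, 0.2], "model": "sketch"}

buffer = io.BytesIO()                     # could be a handle into S3/HDFS
with gzip.GzipFile(fileobj=buffer, mode="wb") as fh:
    pickle.dump(obj, fh)                  # streamed, compressed write

buffer.seek(0)
with gzip.GzipFile(fileobj=buffer, mode="rb") as fh:
    restored = pickle.load(fh)            # streamed read back
```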
3 joblib.Memory as a storage pool (in development)
S3/HDFS/cloud backend: joblib.Memory('uri', backend='s3')
https://github.com/joblib/joblib/pull/397
Cache replacement:

mem = joblib.Memory('uri', bytes_limit='1G')
mem.reduce_size()  # Remove oldest entries

Out-of-memory computing:

>>> result = mem.cache(g).call_and_shelve(a)
>>> result
MemorizedResult(cachedir="...", func="g", argument_hash="...")
>>> c = result.get()

G Varoquaux 27
3 joblib as bricks of a compute engine: a vision
Parallel on a cloud/cluster; Memory + shelving on distributed stores, as out-of-core & common computation.
Simple algorithmic code, using infrastructure without buying into it.
G Varoquaux 28
Time to wrap up
Code, code, code
Scipy-lectures: learning numerical Python
Many problems are better solved by documentation than by new code.
A comprehensive document covering numpy, scipy, ...:
1. Getting started with Python for science
2. Advanced topics
3. Packages and applications
With code examples.
http://scipy-lectures.org
On the code of data science
@GaelVaroquaux
I believe in code without compromises:
Libraries, with side-effect-free code, tight APIs, and decoupling of analysis from plotting.
Software engineering: version control.
Code empowers: it enables automation and opens new applications. Disrupt something new. We use scikit-learn for markers of neuropsychiatric diseases.
Software purity holds back: interactivity is brittle and costly, but necessary for insights; refusing syntax shortcuts → verbosity.
The cost is worth the benefit in the long run