scikit-learn: Machine Learning in Python

Data Tuesday - Feb. 26 2013 - Paris

Sunday, 24 February 2013

• Library of Machine Learning models

• Simple fit / predict / transform API

• Python / NumPy / SciPy / Cython & wrappers for libsvm / liblinear

• Model Assessment, Selection & Ensembles

• Some support for multi-core
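The fit / predict / transform convention listed above can be illustrated with a minimal sketch. The dataset (iris), the estimator choices, and the parameter values are assumptions made for illustration, not taken from the slides. The import paths follow recent scikit-learn releases and may differ from the 0.13 release discussed in this talk.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small bundled dataset, chosen only for illustration.
X, y = load_iris(return_X_y=True)

# Transformers learn their parameters with fit() and apply them with transform().
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Predictors learn with fit() and produce labels with predict().
# SVC is backed by libsvm.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_scaled, y)
predictions = clf.predict(X_scaled)

# Model assessment: 5-fold cross-validation of the scaler + SVM pipeline.
# n_jobs controls how many processes are used; -1 would use all cores.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X, y, cv=5, n_jobs=1)
print("mean cross-validated accuracy: %.3f" % scores.mean())
```

Wrapping the scaler and the classifier in one pipeline means the scaling statistics are re-estimated on each training fold, so the cross-validated score is not inflated by information from the held-out fold.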


Possible Applications

• Text Classification / Sequence Tagging (NLP)

• Computer Vision / Robotics

• Learning To Rank - IR and advertisement

• Statistical Analysis of the Brain: fMRI / MEG

• Astronomy, Biology, Social Sciences...


Example: Training a Model for Face Recognition


Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7

Extracting the top 150 eigenfaces from 966 faces
done in 0.466s
Projecting the input data on the eigenfaces orthonormal basis
done in 0.056s
Fitting the SVM classifier to the training set
done in 18.549s
Predicting people's names on the test set
done in 0.062s

                   precision    recall  f1-score   support

     Ariel Sharon       0.90      0.75      0.82        12
     Colin Powell       0.78      0.94      0.85        62
  Donald Rumsfeld       0.86      0.72      0.78        25
    George W Bush       0.89      0.96      0.92       141
Gerhard Schroeder       0.92      0.74      0.82        31
      Hugo Chavez       0.90      0.53      0.67        17
       Tony Blair       0.81      0.74      0.77        34

      avg / total       0.86      0.86      0.86       322
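The steps in the log above (extract principal components, project the data, fit an SVM, report per-class scores) can be sketched as follows. This is not the code behind the slide: to stay self-contained it swaps the face images for the small digits dataset bundled with scikit-learn, and the number of components, the SVM parameters, and the train/test split are assumptions. Import paths follow recent scikit-learn releases.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images flattened to 64 features (not the faces dataset).
digits = load_digits()
X, y = digits.data, digits.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: extract the top principal components from the training images.
# 30 components is an arbitrary choice for this smaller dataset.
pca = PCA(n_components=30, whiten=True, random_state=0).fit(X_train)

# Step 2: project both splits onto the learned orthonormal basis.
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

# Step 3: fit an SVM classifier on the projected training set.
clf = SVC(kernel="rbf", C=10.0, class_weight="balanced")
clf.fit(X_train_pca, y_train)

# Step 4: predict on the test set and print precision / recall / f1 per class.
y_pred = clf.predict(X_test_pca)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print(report)
```

Each row of `pca.components_` can be reshaped back to the image dimensions (8 x 8 here) and displayed, which is how eigenface-style pictures are produced.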


Learned Eigenfaces


Contributors

• GitHub-centric contribution workflow

• each pull request needs 2 x [+1] reviews

• code + tests + doc + example

• 92% test coverage / Continuous Integration

• 4 major releases per year + 4 bugfix releases

• 66 contributors for release 0.13


Users

• We support users on Stack Overflow & the mailing list

• 200+ questions tagged with [scikit-learn]

• Many competitors + benchmarks

• 500+ answers on ongoing user survey

• 60% academics / 40% from industry

• Some data-driven startups use sklearn


Thank you!

• http://scikit-learn.org - Main Project + doc

• @ogrisel on twitter

• http://ogrisel.com - ML Consultancy (soon)


Backup Slides


Caveat Emptor

• Domain specific tooling kept to a minimum

• Some feature extraction for Bag of Words Text Analysis

• Some functions for extracting image patches

• Domain integration is the responsibility of the user or 3rd party libraries
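The two domain-specific helpers mentioned above, bag-of-words text features and image patch extraction, might be used roughly as follows. The toy sentences and the tiny synthetic image are assumptions made for illustration; `CountVectorizer` and `extract_patches_2d` are the scikit-learn utilities being sketched.

```python
import numpy as np
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.feature_extraction.text import CountVectorizer

# Bag of words: each document becomes a row of token counts.
docs = ["the cat sat on the mat", "the dog sat on the log"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)
vocabulary = sorted(vectorizer.vocabulary_)
print(vocabulary)
print(counts.toarray())

# Image patches: every 2x2 window of a 6x6 synthetic grayscale image.
image = np.arange(36, dtype=float).reshape(6, 6)
patches = extract_patches_2d(image, (2, 2))
print(patches.shape)
```

Anything beyond such generic helpers (tokenizers for a specific language, image loading, domain file formats) is left to the user or to third-party libraries, as the slide states.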
