Could You Be a Data Scientist? Carlo Torniai, Ph.D. @carlotorniai

Upload: carlo-torniai

Post on 27-Jan-2015

DESCRIPTION

Short presentation about my final project at Zipfian Academy about quantifying Data Scientist profiles using Linkedin data. The prototype web app is available at: bit.ly/cybads

TRANSCRIPT

Page 1: Could You be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and Linkedin API

Could You Be a Data Scientist?

Carlo Torniai, Ph.D. @carlotorniai

Page 2:

Goal

• Quantify the features of data scientist profiles
• Analyze aspiring data scientist profiles
• Provide useful feedback

Page 3:

Why is this relevant?

• A quantitative characterization of data scientist profiles can help close the loop between job seekers and recruiters

Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg

Page 4:

Data Collection | Data Analysis | Feature Extraction | Model Testing | Data Product

• Linkedin API:
  – General information
  – Past work history
  – Education

• Web scraping:
  – Skills

• 1500 profiles:
  – Data Scientists
  – Software Engineers
  – Business Analysts
  – Mathematicians
  – Statisticians
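One way the collected skills and education fields could be turned into model inputs is a binary "has this token" encoding. This is a minimal sketch under assumptions: the record layout, the field names (`category`, `skills`, `education`) and the sample values are hypothetical and not taken from the project's data, and the slides do not say which encoding was actually used.

```python
# Hypothetical records: field names and values are illustrative only,
# not taken from the collected Linkedin data.
profiles = [
    {"category": "Data Scientist",
     "skills": ["Python", "Machine Learning"],
     "education": ["PhD Physics"]},
    {"category": "Software Engineer",
     "skills": ["Java", "Python"],
     "education": ["BS Computer Science"]},
    {"category": "Statistician",
     "skills": ["R", "Statistics"],
     "education": ["MS Statistics"]},
]


def tokens(record):
    """Return the set of prefixed, lower-cased skill and education tokens."""
    skill_tokens = {"skill:" + s.lower() for s in record["skills"]}
    edu_tokens = {"edu:" + e.lower() for e in record["education"]}
    return skill_tokens | edu_tokens


def build_vocabulary(records):
    """Collect every distinct token, sorted so the column order is stable."""
    vocabulary = set()
    for record in records:
        vocabulary |= tokens(record)
    return sorted(vocabulary)


def encode(record, vocabulary):
    """Binary vector: 1 if the profile has the token, 0 otherwise."""
    present = tokens(record)
    return [1 if token in present else 0 for token in vocabulary]


vocabulary = build_vocabulary(profiles)
X = [encode(p, vocabulary) for p in profiles]   # feature matrix, one row per profile
y = [p["category"] for p in profiles]           # class labels
```

Prefixing tokens with `skill:` or `edu:` keeps a skill and a degree with the same name in separate columns; the resulting `X` and `y` have the shape that most classifiers expect.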

Page 5:

[Figure: one panel per profile category: Business Analysts, Data Scientists, Software Engineers, Statisticians, Mathematicians]

Page 6:

[Figure: Number of PhDs by topic and profiles. Topics: Bioinformatics, Biology, Computer Science, Economics, Electronics, Astronomy, Math, Neuroscience, Other, Physics, Psychology, Stats, Engineering]

Page 7:

For this project I trained the following models on skills and education features:

Random Forest
• Classify the profile

Naïve Bayes
• Multi-class probabilities to assess profile background components

K-means
• Suggest similar and relevant profiles

Page 8:

For this project I trained the following models on skills and education features:

Model          Training set                          Purpose
Random Forest  All 5 categories                      Classify the profile
Naïve Bayes    4 classic categories: SE, BA, MT, ST  Assess profile background components with multi-class probabilities
K-means        All 5 categories                      Identify similar profiles
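The Naïve Bayes row can be illustrated with a small hand-rolled Bernoulli Naïve Bayes that returns one probability per classic category. This is a sketch under assumptions: the project used scikit-learn, while this version is written in plain Python to stay dependency-free; the four feature columns and the toy training rows are invented for illustration; and the slides do not say which Naïve Bayes variant was used.

```python
import math

# Hypothetical binary features, in column order: java, excel, proofs, r
X_train = [
    [1, 0, 0, 0], [1, 0, 0, 0],   # SE (software engineers)
    [0, 1, 0, 0], [0, 1, 0, 1],   # BA (business analysts)
    [0, 0, 1, 0], [0, 0, 1, 0],   # MT (mathematicians)
    [0, 0, 0, 1], [0, 0, 1, 1],   # ST (statisticians)
]
y_train = ["SE", "SE", "BA", "BA", "MT", "MT", "ST", "ST"]


def train_bernoulli_nb(X, y, alpha=1.0):
    """Return {class: (log prior, per-feature P(feature = 1 | class))}."""
    model = {}
    for label in sorted(set(y)):
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        probs = [(sum(col) + alpha) / (len(rows) + 2 * alpha)
                 for col in zip(*rows)]
        model[label] = (math.log(len(rows) / len(X)), probs)
    return model


def predict_proba(model, x):
    """Normalized posterior probability of each class for one profile."""
    log_scores = {}
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for value, p in zip(x, probs):
            score += math.log(p if value else 1.0 - p)
        log_scores[label] = score
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    total = sum(exp_scores.values())
    return {label: v / total for label, v in exp_scores.items()}


model = train_bernoulli_nb(X_train, y_train)
candidate = [1, 0, 0, 1]            # a profile listing java and r
background = predict_proba(model, candidate)
best_match = max(background, key=background.get)
```

Training on the four classic categories only, as in the table, means a profile from outside those categories is described as a mix of them, which is what makes the probabilities usable as background components.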

Page 9:

bit.ly/cybads

Page 10:

Naïve Bayes: multi-class probabilities

Random Forest

Page 11:

K-means clustering
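A rough sketch of how K-means clusters could be used to suggest similar profiles: profiles that land in the same cluster are treated as similar. Everything here is an assumption for illustration. The project used scikit-learn, while this is a plain-Python version of Lloyd's algorithm; the profile ids and feature vectors are made up; and the farthest-first initialization is a simple deterministic stand-in, not necessarily what the project used.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest chosen centroid.
        farthest = max(points, key=lambda p: min(squared_distance(p, c)
                                                 for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its closest centroid.
        new_labels = [min(range(k),
                          key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                   # assignments stable, so converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:             # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical profile ids and binary skill vectors.
profile_ids = ["p0", "p1", "p2", "p3", "p4", "p5"]
points = [
    [1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1],
]
labels, centroids = kmeans(points, k=2)


def similar_profiles(profile_id):
    """Other profiles assigned to the same cluster as profile_id."""
    cluster = labels[profile_ids.index(profile_id)]
    return [pid for pid, lab in zip(profile_ids, labels)
            if lab == cluster and pid != profile_id]
```

For a new profile, the same idea applies: assign it to the nearest centroid and return the members of that cluster.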

Page 12:

Next Steps

Get more data:
- Other websites
- Indeed
- User input on web app

- Fine-grained parsing of education
- Experiment with additional features (industry, years of experience)

• Extend feature set and test more models
• Fuzzy C-means
• Add interactive data collection
• Personalized links for skills
• Explanation of similarity results

Close the loop by analyzing job offers and suggesting matching profiles

Page 13:

Thank you!

Technologies

Web App: Flask, jQuery, Vega, MongoDB

NMF, HC, RF, DT, NB, K-means models: scikit-learn

Visualizations: Vincent, Vega, NetworkX, Gephi

Acknowledgement

yatish27: Ruby Linkedin public profile web scraper
ozgut: Linkedin API Python wrapper