data analysis of data science job ads · 1. anaconda is a free and open source distribution of the...

18
Data Analysis of Data Science Job Ads with suggestions Liuhua Peng Lecturer, School of Mathematics and Statistics 1

Upload: others

Post on 20-May-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Data Analysis of Data Science Job Ads

with suggestions

Liuhua Peng

Lecturer, School of Mathematics and Statistics

1

Page 2: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Thanks !

Fiona Simpson

Careers & Industry ConsultantAcademic Engagement

Faculty of Science

The University of Melbourne

2

Peter Karutz

Senior Advisor Experiential Learning, Industry

Global Leadership and Employability, Academic Services

University Services

The University of Melbourne

Kathy Ryan

Academic Services and Registrar

University Services

The University of Melbourne

Copyright:

Page 3: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

An Increasing Market ! ! ! ! !

3

666

1150

1287

0

200

400

600

800

1000

1200

1400

2016 2017 Last 365 Days

Number of Job Ads with data science OR data scientist in the Title

Title with : data science OR data scientist

Last 365 Days: May 03, 2017 – May 02, 2018

Occupations

• Job Titles

• Employers

Skills

• Baseline Skills

• Software and Programming Skills

• Specialized Skills

Education and Experience

• Degree

• Experience

Salaries

Page 4: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Trend in Advertised Occupations

Occupation (Job Title) # Job Ads in 2016 # Job Ads in 2017 # Job Ads in Last 365 Days

Data Science 616 1030 1165

Data Analyst 13 38 32

Software Programmers 8 4 4

Database Administrator 5 9 17

University Lecturer 5 14 14

Software Engineer 3 7 8

Business Intelligence Analyst 2 9 8

Business Manager 1 2 4

4

Page 5: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Employers

5

Page 6: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Degree and Experience

6

2

1

51

9

101

49

12

4 6

0 20 40 60 80 100 120 140 160 180

Bachelors and Honours Degrees

Postgraduate Degree

Levels of Education and Levels of Experience Requested

No Experience Required 0 to 2 Years of Experience 3 to 5 Years of Experience 6 to 8 Years of Experience 9+ Years of Experiences

Page 7: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Top Skills: Baseline and Specialized Skills

40.55

32.96 32.36

16.82 16.31 15.1911.48 10.27 10.01

0

5

10

15

20

25

30

35

40

45

Baseline Skills

Percent of Jobs Requested

7

62.38 61.95

34.8629.68

22.35 21.4

11.82 9.84 8.89 8.8 8.71

0

10

20

30

40

50

60

70

Specialized Skills

Percent of Jobs Requested

Page 8: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Top Skills: Software and Programming Skills

54

.15

43

.58 38

.05

37

.89

19

.35 1

4.8

23

.74

14

.63

5.8

5

11

.38

10

.24

17

.56

6.3

58

.09

41

.11

27

.14

23

.82 19

.5

16

.38

13

.97

11

.86 6

.33

8.6

4

6.7

3

6.5

3

6.2

3

59

.28

41

.33

25

.11

22

.26 18

.9 15

.1

14

.67

10

.53 6.8

2

6.5

6

6.5

6

6.4

7

6.3

0

10

20

30

40

50

60

70

Python SQL R SAS ApacheHadoop

Tableau JAVA MATLAB NoSQL C++ MicrosoftExcel

SPSS Teradata

Percent of Job Ads with specific software and programming skills requested

2016 2017 Last 365 Days

8

Page 9: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Salaries ! ! ! ! ! ! ! ! !

0

9

2831

16

0

5

10

15

20

25

30

35

Salary Range

Number of job ads within each salarycategory

9Mean salary: $112,000Median salary: $108,000

4

18

47

57

27

0

10

20

30

40

50

60

Salary Range

Number of job ads within each salarycategory

2

20

50

84

36

0

10

20

30

40

50

60

70

80

90

Salary Range

Number of job ads within each salarycategory

2016 2017 Last 365 Days

Mean salary: $109,000Median salary: $100,000

Mean salary: $110,000Median salary: $100,000

Page 10: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Salaries ! ! ! ! ! ! ! ! !

0%

9, 11%

28, 33%

31, 37%

16, 19%

Salary Range

$35,000 to $49,999

$50,000 to $74,999

$75,000 to $99,999

$100,000 to $149,999

More than $150,000

10

2%

18, 12%

47, 31%

57, 37%

27, 18%

Salary Range

$35,000 to $49,999

$50,000 to $74,999

$75,000 to $99,999

$100,000 to $149,999

More than $150,000

1%

20, 10%

50, 26%

84, 44%

36, 19%

Salary Range

$35,000 to $49,999

$50,000 to $74,999

$75,000 to $99,999

$100,000 to $149,999

More than $150,000

2016 2017 Last 365 Days

Mean salary: $112,000Median salary: $108,000

Mean salary: $109,000Median salary: $100,000

Mean salary: $110,000Median salary: $100,000

Page 11: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Suggestions and Tips

Page 12: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

The Data Science Process

12Schutt and O’Neil (2013), Doing Data Science. O’Reilly.

Page 13: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Useful Resources

13

1. Kaggle https://www.kaggle.com/2. Coursera https://www.coursera.org/3. Amazon Web Services https://aws.amazon.com/

The Elements of Statistical Learninghttps://web.stanford.edu/~hastie/ElemStatLearn/

An Introduction to Statistical Learningwith Applications in Rhttp://www-bcf.usc.edu/~gareth/ISL/

Page 14: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

1. Numpy http://www.numpy.org/

2. Scipy https://www.scipy.org/

3. Pandas https://pandas.pydata.org/

4. Matplotlib https://matplotlib.org/

5. Seaborn https://seaborn.pydata.org/

6. Bokeh https://bokeh.pydata.org/

7. Scikit-learn http://scikit-learn.org/

8. Theano http://deeplearning.net/software/theano/

9. Tensorflow https://www.tensorflow.org/

10.BeautifulSoup4, Urllib2, Selenium, Scrapy

Top Python Data Analysis Libraries

14

Page 15: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

1. Anaconda is a free and open source distribution of the Python and R for data science.

2. It aims to simplify package management and deployment.

3. Use Jupyter Notebook as IDE

4. Anaconda Distribution is used by over 6 million users, and it includes more than 250 popular data science packages suitable for Windows, Linux, and MacOS.

5. https://www.anaconda.com/download/

Anaconda

15

Page 16: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

1. Rcpp, RcppArmadillo, RcppEigen: integrate R with C++

2. Parallel Computing in R:

✓ parallel, foreach: execute the for loop in parallel

✓ SNOW, doSNOW

✓ H2o: facilitates machine learning (e.g. random forests, GBM) in a parallel environment

✓ Rhadoop, RHIPE: interface between R and Hadoop

✓ Rmpi, pbdMPI: interface MPI in R

✓ rgpu, gcbd, OpenCl: GPU programming

✓ data.table, ff, bigmemory: large memory and out-of-memory data

https://cran.r-project.org/web/views/HighPerformanceComputing.html

High Performance Computing in R

16

Page 17: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

1. tidyverse https://www.tidyverse.org/

2. rvest

3. dplyr, tidyr, stringr, purrr

4. lubridate

5. ggplot2, maptools, rgdal

6. leaflet

7. caret, e1071, kernlab, nnet, rpart, xgboost, tensorflow

https://cran.r-project.org/web/views/MachineLearning.html

8. shiny http://shiny.rstudio.com/

9. knitr

Useful Packages in R/Rstudio

17

Page 18: Data Analysis of Data Science Job Ads · 1. Anaconda is a free and open source distribution of the Python and R for data science. 2. It aims to simplify package management and deployment

Thank you

Questions?

Liuhua Peng

Lecturer, School of Mathematics and Statistics