Data Science Presentation, 2nd CI Day

CIJ is Sponsored By: Career of Future 10/13/2015 1

Upload: mohammed-barakat

Post on 17-Feb-2017


Category:

Data & Analytics



TRANSCRIPT


About Me

Mohammed K. Barakat

• Industrial Engineer, The University of Jordan

• Business Excellence Manager-FINE Hygienic Paper Company

• Professional Engineer in Industrial Engineering (PE), (JCPQA-JEA)

• Project Management Professional (PMP), (PMI)

• Risk Management Professional (PMI-RMP), (PMI)

• Certified Six Sigma Black Belt (CSSBB), (ASQ)

• Certified Six Sigma Green Belt (CSSGB), (ASQ)

• Microsoft Certified Technology Specialist (MCTS), (Microsoft)

• Microsoft Certified Trainer (MCT), (Microsoft)

mohammedbarakat

MohdBarakat

MohdKBarakat


Data Science: Career of the Future


http://www.wired.com/insights/2014/06/tell-kids-data-scientists-doctors/

…Did you hear that? Data scientists earning more than doctors…

…But salary is not the only reason…

…data scientists will have a measurable impact on the future of healthcare.

Why Data Science?


http://www.economist.com/node/15579717

…the quantity of information in the world is soaring

…150 exabytes (billion gigabytes) of data in 2005. This year, it will create 1,200 exabytes…

…keeping up with this flood, and storing the bits that might be useful, is difficult enough…

…Analyzing it, to spot patterns and extract useful information, is harder…

…Even so, the data deluge is already starting to transform business…

Why is “Data Scientist” a hugely important profession in the next decade?


“I keep saying that the sexy job in the next 10 years will be statisticians,” said Hal Varian, chief economist at Google. “And I’m not kidding.”

https://www.youtube.com/watch?v=pi472Mi3VLw

Why is “Data Scientist” a hugely important profession in the next decade?

• …ability to take the data


• …extract value from it

• …understand the process

• …visualize it

• …Not only at the professional level

• …communicate it

• …Ubiquitous data…but

• …Statisticians are just part of it

• …Scarcity in ability to understand data and extract value from it

• …Managers need to access and understand the data themselves

• …No army behind the scenes to digest the information for you

What is Data Science?


“Data Science is the extraction of knowledge from large volumes of data that are structured or unstructured”

It often requires sorting through a great amount of information and writing algorithms to extract insights from this data.

What is Big Data?


"Big Data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization."

The 3 Vs of Big Data:

Volume: amount of data

Velocity: speed of data in and out

Variety: range of data types and sources

The Data Science Process


The Data Scientist Toolbox


R Software

A software environment for statistical computing and graphics

The Data Scientist Toolbox


RStudio

Open-source software that makes it easy for anyone to analyze data with R

The Data Scientist Toolbox


You’ve got to do a lot of coding!

The Data Scientist Toolbox


You’ve got to work out a lot of statistics!

The Data Scientist Toolbox


GitHub.com: Share your results and code

RPubs.com: Publish your full report and build a personal brand

The Data Scientist Toolbox


RPubs.com

You’d be a Data Scientist…

…evidence-based results

…reproducible research

The Data Science process explained


STEP 1: Getting and Cleaning Data

Downloading files

Reading data

Raw vs. Tidy data

Merging data

Reshaping data

Summarizing data

Data ‘Housekeeping’
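The operations listed in Step 1 can be sketched on a toy example. The deck names R as its tool; Python's standard library is used here purely for illustration. The two small tables (`sales_csv`, `regions_csv`) and all function names are hypothetical and not taken from the presentation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data: one column per quarter ("wide"), stray spaces, one missing value.
sales_csv = """store, q1, q2
A, 100, 120
B, 80,
C, 95, 105
"""
regions_csv = """store,region
A,North
B,South
C,North
"""

def read_rows(text):
    """Reading data: parse CSV text into a list of dicts with whitespace stripped."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{k.strip(): (v or "").strip() for k, v in row.items()} for row in reader]

def tidy(rows):
    """Reshaping: turn wide rows into tidy rows, one observation per row."""
    out = []
    for row in rows:
        for quarter in ("q1", "q2"):
            if row[quarter] == "":  # housekeeping: skip missing observations
                continue
            out.append({"store": row["store"], "quarter": quarter,
                        "sales": int(row[quarter])})
    return out

def merge(tidy_rows, region_rows):
    """Merging: attach each store's region (a left join on 'store')."""
    lookup = {r["store"]: r["region"] for r in region_rows}
    return [dict(r, region=lookup.get(r["store"])) for r in tidy_rows]

def summarize(rows):
    """Summarizing: total sales per region."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["sales"]
    return dict(totals)

tidy_sales = tidy(read_rows(sales_csv))
merged = merge(tidy_sales, read_rows(regions_csv))
totals = summarize(merged)
print(totals)  # {'North': 420, 'South': 80}
```

Keeping each operation in its own small function mirrors the list above: every step takes the previous step's output, so the cleaning pipeline can be rerun from the raw files at any time.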

The Data Science process explained


STEP 2: Exploratory Data Analysis

understand data properties

find patterns in data

communicate results

Exploratory graphs are made quickly

Many are made

The goal is personal understanding
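A quick look at a variable's properties might be sketched as follows, in Python for illustration only (the deck itself works in R). The `mpg` values are made up, and `five_number_summary` and `text_histogram` are hypothetical helper names.

```python
import statistics
from collections import Counter

# Hypothetical sample: fuel economy (miles per gallon) for ten cars.
mpg = [21, 23, 21, 19, 18, 14, 24, 23, 19, 18]

def five_number_summary(values):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, q2, q3, max(values)

def text_histogram(values, bin_width):
    """Return a rough text histogram: one '#' per observation in each bin."""
    bins = Counter(int(v // bin_width) * bin_width for v in values)
    return [f"{start:>3}-{start + bin_width:<3} {'#' * bins[start]}"
            for start in sorted(bins)]

summary = five_number_summary(mpg)
print(summary)
for line in text_histogram(mpg, bin_width=5):
    print(line)
```

The plot is deliberately crude: it takes seconds to produce, and its only job is to show the analyst where the values cluster before any formal modelling.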

The Data Science process explained


STEP 3: Perform Statistical Inference

“Statistical inference is the process of drawing formal conclusions from data.”

Some techniques and concepts:

Sampling

Randomization

Hypothesis Testing

Confidence Intervals (uncertainty)

Experimental Design
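Two of the concepts above, hypothesis testing through randomization and confidence intervals through resampling, can be sketched with Python's standard library (chosen for illustration; the deck uses R). The measurements in `group_a` and `group_b` are invented, and the permutation test and percentile bootstrap are just one reasonable choice of technique, not necessarily the one the presenter had in mind.

```python
import random
import statistics

# Hypothetical measurements for two groups, e.g. a process before and after a change.
group_a = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7]
group_b = [12.6, 12.9, 12.4, 13.0, 12.7, 12.8, 12.5, 13.1]

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomization: reassign group labels at random
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

def bootstrap_ci(values, level=0.95, n_boot=2000, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.mean(rng.choices(values, k=len(values))) for _ in range(n_boot)
    )
    lo = means[round((1 - level) / 2 * n_boot)]
    hi = means[round((1 + level) / 2 * n_boot) - 1]
    return lo, hi

p_value = permutation_p_value(group_a, group_b)
ci_low, ci_high = bootstrap_ci(group_a)
print(f"p-value: {p_value:.4f}")
print(f"95% CI for the mean of group A: ({ci_low:.2f}, {ci_high:.2f})")
```

A small p-value says the observed difference would be unusual if the group labels did not matter; the interval expresses the uncertainty around one group's mean.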

The Data Science process explained


STEP 4: Perform Regression Modelling

“a statistical process for estimating the relationships among variables”

Helps you understand how the value of the dependent variable changes when any one of the independent variables is varied.

Widely used for prediction (next step)
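As a minimal sketch of the idea, the simplest case is one independent variable fitted by ordinary least squares. Python is used for illustration only; the `weight` and `mpg` values are invented and deliberately lie on an exact line so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Predicted value of the dependent variable at x."""
    return intercept + slope * x

# Hypothetical data: car weight (1000 lb) and fuel economy (mpg).
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [30.0, 27.0, 24.0, 21.0, 18.0]

intercept, slope = fit_line(weight, mpg)
print(f"mpg = {intercept:.1f} + ({slope:.1f}) * weight")
print(f"predicted mpg at weight 3.2: {predict(intercept, slope, 3.2):.1f}")
```

The slope is the quantity the slide describes: the change in the dependent variable (mpg) when the independent variable (weight) increases by one unit.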

The Data Science process explained


STEP 5: Perform Machine Learning

Machine learning “is a computer's way of learning from examples by using algorithms that take in data and improve themselves to predict on new data”

Example:

The spam filter working in the background to block your junk email.
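The spam-filter example can be sketched as a toy naive Bayes classifier. This is one common technique for the task, assumed here for illustration; the deck does not say which algorithm a real filter uses. The code is Python rather than the deck's R, and the four training messages are invented.

```python
import math
from collections import Counter

def train(examples):
    """Learn from examples: count words per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def predict(model, text):
    """Return the more probable label under naive Bayes with add-one smoothing."""
    word_counts, label_counts = model
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # prior for the label
        for word in text.lower().split():
            # Smoothed likelihood of the word given the label.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training examples ("ham" means legitimate mail).
examples = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train(examples)
print(predict(model, "free prize money"))     # spam
print(predict(model, "monday team meeting"))  # ham
```

Feeding `train` more labelled messages changes the word counts and therefore the predictions, which is the "improve themselves" part of the definition above.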

The Data Science process explained


STEP 6: Make your research Reproducible

“Make analytic data and code available so that others may reproduce findings”

Why?!

To provide scientific evidence of your findings.

http://www.rpubs.com/mohammedkb/TransMPGAnalysis
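In code, reproducibility often comes down to two habits: fixing any source of randomness and recording exactly which data went in. The sketch below shows both in Python (for illustration; the linked report is written in R). The data, the seed value, and the `run_analysis` and `fingerprint` helpers are hypothetical.

```python
import hashlib
import json
import random

def run_analysis(data, seed):
    """Stand-in analysis: mean of a random half of the data, fixed by the seed."""
    rng = random.Random(seed)
    sample = rng.sample(data, k=len(data) // 2)
    return sum(sample) / len(sample)

def fingerprint(data):
    """SHA-256 of the data, so readers can confirm they start from the same input."""
    return hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest()

# Hypothetical input data.
data = [14, 18, 18, 19, 19, 21, 21, 23, 23, 24]

# Publish this record next to the code: anyone rerunning it should get the same values.
record = {
    "seed": 2015,
    "data_sha256": fingerprint(data),
    "result": run_analysis(data, seed=2015),
}
print(json.dumps(record, indent=2))
```

Anyone with the same code, data, and seed can rerun the analysis and compare their output against the published record.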

What it takes to be a good Data Scientist


• Business skills

• Communications skills

• Analytical skills

• Computer science

• Statistics

• Creativity

• Scientific Mindset

• Passion & Perseverance

What to do next?


Start learning about Data Science

Take a Massive Open Online Course (MOOC)

o Coursera/Data Science

o DataCamp
