data science - wordpress.com · 8/2/2018  · traditional data mining life cycle crisp-dm...

Data Science Life Cycle DR. SYED IMTIYAZ HASSAN Assistant Professor, Deptt. of CSE, Jamia Hamdard (Deemed to be University), New Delhi, India. http://www.jamiahamdard.edu https://Syedimtiyazhassan.org [email protected]



Data Science Life Cycle

DR. SYED IMTIYAZ HASSAN
Assistant Professor, Deptt. of CSE, Jamia Hamdard (Deemed to be University), New Delhi, India.
http://www.jamiahamdard.edu
https://Syedimtiyazhassan.org
[email protected]


Data Science Vs Databases

Basis | Databases | Data Science
Data Value | “Precious” | “Cheap”
Data Volume | Modest | Massive
Examples | Bank records, Personnel records, Census, Medical records | Online clicks, GPS logs, Tweets, Building sensor readings
Priorities | Consistency, Error recovery, Auditability | Speed, Availability, Query richness
Structured | Strongly (Schema) | Weakly or none (Text)
Properties | Transactions, ACID* | CAP* theorem, eventual consistency
Realizations | SQL | NoSQL: Apache River, MongoDB, CouchDB, HBase, Cassandra, …
For | Querying the past | Querying the future

* ACID = Atomicity, Consistency, Isolation and Durability; CAP = Consistency, Availability, Partition Tolerance
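To make the ACID entry in the comparison above concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the account names, and the transfer helper are hypothetical and exist only for illustration; they are not taken from the slides.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a {name: balance} dict."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

# Hypothetical in-memory database with two accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds and is committed
failed = transfer(conn, "alice", "bob", 500)  # overdraws, so both updates are undone
print(ok, failed, balances(conn))  # True False {'alice': 70, 'bob': 80}
```

The second transfer changes both rows before it fails, yet neither change survives: the transaction is applied entirely or not at all, which is the "A" in ACID.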


(Un) Structured Data

https://www.edureka.co/blog/what-is-data-science/


Data Science Vs Scientific Computing

Scientific Modeling | Data-Driven Approach
Physics-based models | General inference engine replaces model
Problem-structured | Structure not related to problem
Mostly deterministic, precise | Statistical models handle true randomness and unmodeled complexity
Run on supercomputer or high-end computing cluster | Run on cheaper computer clusters (EC2)
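The contrast between the two approaches can be sketched with a toy example. The free-fall problem, the noise level, and the choice of a k-nearest-neighbour average as the "general inference engine" are all assumptions made for illustration; none of them come from the slides.

```python
import random

G = 9.81  # m/s^2, standard gravity (assumed constant for this sketch)

def physics_model(t):
    """Problem-structured model: distance fallen from rest after t seconds."""
    return 0.5 * G * t * t

def knn_predict(samples, t, k=2):
    """Generic inference: average the k observations whose time is nearest to t.

    Nothing in this function knows about gravity; its structure is unrelated
    to the problem and it relies only on the observed data.
    """
    nearest = sorted(samples, key=lambda s: abs(s[0] - t))[:k]
    return sum(d for _, d in nearest) / len(nearest)

# Hypothetical noisy measurements taken every 0.1 s for 3 s.
rng = random.Random(0)
samples = [(i / 10, physics_model(i / 10) + rng.gauss(0, 0.05)) for i in range(31)]

t_query = 1.55
print(physics_model(t_query), knn_predict(samples, t_query))
```

The physics model is exact but only for this one problem; the nearest-neighbour predictor is approximate and noisy but would work unchanged on any other set of (input, output) observations.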


Data Science Vs Machine Learning

Data Science | Machine Learning
Explore many models, build and tune hybrids | Develop new (individual) models
Understand empirical properties of models | Prove mathematical properties of models
Develop/use tools that can handle massive datasets | Improve/validate on a few, relatively clean, small datasets
Take action! | Publish a paper


Data Science Vs Data Analysis

Data Science (Analytics) | Data Analysis
Provides strategic, actionable insights into the world | Provides operational observations into issues
Mathematical, technical and strategic knowledge is mandatory | Data analysis and visualization skills are required
Deals with big data | Does not necessarily deal with big data


Data Analytics

http://www.rosebt.com/blog/descriptive-diagnostic-predictive-prescriptive-analytics


Traditional Data Mining Life Cycle

CRISP-DM methodology: Cross-industry standard process for data mining

As a methodology, it includes descriptions of the typical phases of a project, the tasks involved with each phase, and an explanation of the relationships between these tasks.

As a process model, CRISP-DM provides an overview of the data mining life cycle.


Traditional Data Mining Life Cycle

CRISP-DM methodology: Cross-industry standard process for data mining

https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.crispdm.help/crisp_overview.htm


SEMMA Methodology

Data mining model by Suite of Analytics Software (SAS)

1. Sample
2. Explore
3. Modify
4. Model
5. Assess
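The five SEMMA steps can be walked through on a toy data set. This is only a sketch: the synthetic data, the function names, and the choice of a least-squares line as the model are assumptions for illustration and are not part of SEMMA itself.

```python
import random
import statistics

def sample(rows, fraction, rng):
    """1. Sample: draw a random subset of the full data set."""
    return rng.sample(rows, int(len(rows) * fraction))

def explore(rows):
    """2. Explore: summarise the variables before modelling."""
    xs, ys = zip(*rows)
    return {"n": len(rows), "mean_x": statistics.mean(xs), "mean_y": statistics.mean(ys)}

def modify(rows, mean_x):
    """3. Modify: transform variables (here, centre x on the sample mean)."""
    return [(x - mean_x, y) for x, y in rows]

def model(rows):
    """4. Model: fit y = slope * x + intercept by ordinary least squares."""
    xs, ys = zip(*rows)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def assess(rows, fit):
    """5. Assess: mean absolute error of the fitted line on the given rows."""
    slope, intercept = fit
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in rows)

# Hypothetical population: y is roughly 2x + 1 with Gaussian noise.
rng = random.Random(42)
population = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in range(200)]

train = sample(population, 0.5, rng)
chosen = set(train)
holdout = [row for row in population if row not in chosen]

summary = explore(train)
train_mod = modify(train, summary["mean_x"])
holdout_mod = modify(holdout, summary["mean_x"])
slope, intercept = model(train_mod)
mae = assess(holdout_mod, (slope, intercept))
print(summary["n"], round(slope, 2), round(mae, 2))
```

Assessing on rows that were not sampled for training keeps the error estimate honest; the same centring value is applied to both sets so the model sees consistent inputs.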


Data Science Lifecycle

Microsoft

1. Business Understanding
2. Data Acquisition and Understanding
3. Modeling
4. Deployment
5. Customer Acceptance


Data Science Lifecycle

https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview


Data Science Process (Generic)

OPD Data Science Process

Organise Data: involves the physical storage and format of data and incorporates best practices in data management.

Package Data: involves logically manipulating and joining the underlying raw data into a new representation and package.

Deliver Data: involves ensuring that the message the data carries reaches those who need to hear it.


Ben Fry Visualizing Data Process

1. Acquire

2. Parse

3. Filter

4. Mine

5. Represent

6. Refine

7. Interact
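The seven stages above can be read as a chain of small functions, each consuming the previous stage's output. The sketch below is one possible reading: the CSV data, the function names, and the text bar chart standing in for a visual representation are hypothetical and chosen only to keep the example self-contained.

```python
import csv
import io
from collections import Counter

# Hypothetical raw data; in practice this would come from a file or a service.
RAW = "city,year,visits\nDelhi,2017,120\nDelhi,2018,180\nMumbai,2018,90\nPune,2018,\n"

def acquire():
    """1. Acquire: obtain the raw data."""
    return RAW

def parse(text):
    """2. Parse: give the raw text structure (one dict per row)."""
    return list(csv.DictReader(io.StringIO(text)))

def filter_rows(records, year):
    """3. Filter: keep only the rows of interest (one year, non-empty counts)."""
    return [r for r in records if r["year"] == year and r["visits"]]

def mine(records):
    """4. Mine: aggregate visits per city."""
    totals = Counter()
    for r in records:
        totals[r["city"]] += int(r["visits"])
    return totals

def represent(totals, scale=10):
    """5. Represent: a basic text bar chart, one '#' per `scale` visits."""
    return [f"{city} {'#' * (n // scale)}" for city, n in totals.items()]

def refine(totals, scale=10):
    """6. Refine: sort by size, align the labels, and append the exact value."""
    width = max(len(city) for city in totals)
    ordered = sorted(totals.items(), key=lambda kv: -kv[1])
    return [f"{city:<{width}} {'#' * (n // scale)} {n}" for city, n in ordered]

def interact(year):
    """7. Interact: let the viewer choose which year to display."""
    return refine(mine(filter_rows(parse(acquire()), year)))

print("\n".join(represent(mine(filter_rows(parse(acquire()), "2018")))))
print("\n".join(interact("2018")))
```

Keeping each stage as its own function makes it easy to revisit one step, for example a different filter or a clearer representation, without touching the others. Note that this sketch assumes the chosen year has at least one usable row.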


Tools


Python or R?

Countries are color-coded for their relative preference for Python (red/purple) or R (blue) as a Data Science tool. 167 out of 171 countries (98%) demonstrate a value > 1, indicating a preference for Python over R.


The State of Data Science

Kaggle Survey 2017

An industry-wide survey to establish a comprehensive view of the state of data science and machine learning.

Received over 16,000 responses.


https://www.kaggle.com/surveys/2017


Summary

https://www.edureka.co/blog/what-is-data-science/


Questions?