data science - wordpress.com · 8/2/2018 · traditional data mining life cycle crisp-dm...
Data Science Life Cycle
Dr. Syed Imtiyaz Hassan
Assistant Professor, Deptt. of CSE, Jamia Hamdard (Deemed to be University),
New Delhi, India.
http://www.jamiahamdard.edu
https://[email protected]
Data Science Vs Databases
Basis | Databases | Data Science
Data Value | “Precious” | “Cheap”
Data Volume | Modest | Massive
Examples | Bank records, Personnel records, Census, Medical records | Online clicks, GPS logs, Tweets, Building sensor readings
Priorities | Consistency, Error recovery, Auditability | Speed, Availability, Query richness
Structured | Strongly (Schema) | Weakly or none (Text)
Properties | Transactions, ACID* | CAP* theorem, eventual consistency
Realizations | SQL | NoSQL: Apache River, MongoDB, CouchDB, HBase, Cassandra, …
For | Querying the past | Querying the future
* ACID = Atomicity, Consistency, Isolation and Durability
* CAP = Consistency, Availability, Partition Tolerance
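The "Transactions, ACID" property on the database side can be made concrete with a small sketch. It uses Python's built-in sqlite3 module; the accounts table and the transfer and balances helpers are hypothetical names chosen for illustration, not something taken from the slides. The point it tries to show is atomicity: if any step of a transfer fails, the whole transfer is rolled back.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failing debit leaves something to roll back.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was kept.
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))

# Illustrative in-memory database with a consistency rule: no negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()
```

Calling transfer(conn, "alice", "bob", 500) should return False and leave both balances untouched, because the credit to bob is undone when the debit from alice violates the constraint. Many NoSQL stores relax this kind of guarantee in exchange for speed and availability.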
(Un) Structured Data
https://www.edureka.co/blog/what-is-data-science/
Data Science Vs Scientific Computing
Scientific Modeling:
  Physics-based models
  Problem-structured
  Mostly deterministic, precise
  Run on a supercomputer or high-end computing cluster
Data-Driven Approach:
  General inference engine replaces model
  Structure not related to problem
  Statistical models handle true randomness and unmodeled complexity
  Run on cheaper computer clusters (EC2)
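The contrast between the two approaches can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not part of the original slides: the free-fall formula stands in for a physics-based model, and a k-nearest-neighbour average (knn_predict, a hypothetical helper) stands in for a general inference engine whose structure has nothing to do with the problem. The measurements are synthetic.

```python
import random

G = 9.81  # m/s^2, standard gravity (rounded)

def physics_model(t):
    """Problem-structured, deterministic model: free-fall distance d = g*t^2/2."""
    return 0.5 * G * t * t

def knn_predict(samples, t, k=5):
    """Generic data-driven predictor: average the k observations nearest to t."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - t))[:k]
    return sum(d for _, d in nearest) / len(nearest)

# Synthetic "measurements": the physics plus random noise (illustrative only).
rng = random.Random(42)
samples = [
    (i / 100, physics_model(i / 100) + rng.gauss(0, 0.05))
    for i in range(301)
]

t_query = 1.5
print("physics model:", physics_model(t_query))
print("data-driven  :", knn_predict(samples, t_query))
```

The data-driven predictor never sees the formula, yet with enough observations it should land close to the physics-based answer; the noise term plays the role of the randomness and unmodeled complexity mentioned above.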
Data Science Vs Machine Learning
Data Science:
  Explore many models, build and tune hybrids
  Understand empirical properties of models
  Develop/use tools that can handle massive datasets
  Take action!
Machine Learning:
  Develop new (individual) models
  Prove mathematical properties of models
  Improve/validate on a few, relatively clean, small datasets
  Publish a paper
Data Science Vs Data Analysis
Data Science (Analytics):
  Provides strategic, actionable insights into the world
  Mathematical, technical and strategic knowledge are mandatory
  Deals with big data
Data Analysis:
  Provides operational observations into issues
  Data analysis and visualization skills required
  Does not necessarily deal with big data
Data Analytics
http://www.rosebt.com/blog/descriptive-diagnostic-predictive-prescriptive-analytics
Traditional Data Mining Life Cycle
CRISP-DM methodology: Cross-Industry Standard Process for Data Mining
As a methodology, it includes descriptions of the typical phases of a project, the tasks involved with each phase, and an explanation of the relationships between these tasks.
As a process model, CRISP-DM provides an overview of the data mining life cycle.
https://www.ibm.com/support/knowledgecenter/en/SS3RA7_15.0.0/com.ibm.spss.crispdm.help/crisp_overview.htm
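The idea of phases and the relationships between them can be sketched as a small data structure. The six phase names and the arrows between them are not listed on the slides; they follow the commonly reproduced CRISP-DM reference diagram and should be read as an assumption. The names Phase, NEXT and is_valid_path are hypothetical.

```python
from enum import Enum

class Phase(Enum):
    BUSINESS_UNDERSTANDING = "Business Understanding"
    DATA_UNDERSTANDING = "Data Understanding"
    DATA_PREPARATION = "Data Preparation"
    MODELING = "Modeling"
    EVALUATION = "Evaluation"
    DEPLOYMENT = "Deployment"

# Allowed moves between phases, as usually drawn in the CRISP-DM diagram
# (assumed here; the slides only refer to the figure).
NEXT = {
    Phase.BUSINESS_UNDERSTANDING: {Phase.DATA_UNDERSTANDING},
    Phase.DATA_UNDERSTANDING: {Phase.BUSINESS_UNDERSTANDING, Phase.DATA_PREPARATION},
    Phase.DATA_PREPARATION: {Phase.MODELING},
    Phase.MODELING: {Phase.DATA_PREPARATION, Phase.EVALUATION},
    Phase.EVALUATION: {Phase.BUSINESS_UNDERSTANDING, Phase.DEPLOYMENT},
    Phase.DEPLOYMENT: set(),
}

def is_valid_path(path):
    """True if every consecutive pair of phases follows an allowed arrow."""
    return all(b in NEXT[a] for a, b in zip(path, path[1:]))

# The straight-through life cycle, in definition order.
print(is_valid_path(list(Phase)))
```

The backward arrows (for example from Evaluation to Business Understanding) are what make the life cycle iterative rather than a one-way pipeline.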
SEMMA Methodology
Data mining model
By Suite of Analytics Software (SAS)
1. Sample
2. Explore
3. Modify
4. Model
5. Assess
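The five SEMMA steps can be walked through on a toy problem. This is a minimal sketch using only the Python standard library, not SAS tooling; the data are synthetic and the choice of a least-squares line as the "model" is an assumption made for illustration.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical raw data: y is roughly 3*x + 2 plus noise.
population = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in range(1000)]

# 1. Sample: draw a manageable subset and hold part of it out for assessment.
sample = rng.sample(population, 200)
train, holdout = sample[:150], sample[150:]

# 2. Explore: summarise the sampled variables.
mean_x = statistics.mean(x for x, _ in train)
mean_y = statistics.mean(y for _, y in train)

# 3. Modify: centre the variables (a simple transformation step).
centred = [(x - mean_x, y - mean_y) for x, y in train]

# 4. Model: fit a least-squares line through the centred data.
slope = sum(x * y for x, y in centred) / sum(x * x for x, _ in centred)
intercept = mean_y - slope * mean_x

# 5. Assess: mean absolute error on the held-out records.
mae = statistics.mean(abs(y - (slope * x + intercept)) for x, y in holdout)
print(f"slope={slope:.2f} intercept={intercept:.2f} holdout MAE={mae:.2f}")
```

In a real project each step would be far richer (stratified sampling, plots, feature engineering, several candidate models), but the order of the steps is the same.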
Data Science Lifecycle
Microsoft
1. Business Understanding
2. Data Acquisition and Understanding
3. Modeling
4. Deployment
5. Customer Acceptance
https://docs.microsoft.com/en-us/azure/machine-learning/team-data-science-process/overview
Data Science Process (Generic)
OPD Data Science Process
Organise Data:
  Involves the physical storage and format of data, and incorporates best practices in data management.
Package Data:
  Involves logically manipulating and joining the underlying raw data into a new representation and package.
Deliver Data:
  Involves ensuring that the message the data carries reaches those who need to hear it.
Ben Fry Visualizing Data Process
1. Acquire
2. Parse
3. Filter
4. Mine
5. Represent
6. Refine
7. Interact
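The seven steps can be traced through a very small text-only example. The sketch below is an illustration under stated assumptions: the city temperatures are made-up data, the inline string stands in for a real file or URL, and visualize is a hypothetical function name.

```python
import csv
import io

# 1. Acquire: obtain the data (an inline string stands in for a file or URL).
RAW = """city,temp_c
Delhi,41
Mumbai,33
Shimla,
Chennai,36
"""

def visualize(min_temp=0):
    """Return a text bar chart of the hypothetical temperature readings."""
    # 2. Parse: give the raw text some structure.
    rows = list(csv.DictReader(io.StringIO(RAW)))
    # 3. Filter: drop records without a reading.
    rows = [r for r in rows if r["temp_c"]]
    # 4. Mine: derive the values needed to scale the chart.
    temps = {r["city"]: int(r["temp_c"]) for r in rows}
    hottest = max(temps.values())
    lines = []
    # 6. Refine: sort the bars and align the labels for readability.
    for city, t in sorted(temps.items(), key=lambda kv: -kv[1]):
        # 7. Interact: min_temp lets the viewer control what is shown.
        if t >= min_temp:
            # 5. Represent: a basic bar, at most 20 characters wide.
            bar = "#" * round(20 * t / hottest)
            lines.append(f"{city:<8}{bar} {t}")
    return lines

print("\n".join(visualize()))
```

Calling visualize(35) instead would show only the cities at or above 35 degrees, which is the simplest possible form of the Interact step.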
Tools
Python or R?
Countries are color-coded by their relative preference for Python (red/purple) or R (blue) as a data science tool. 167 of 171 countries (98%) show a value greater than 1, indicating a preference for Python over R.
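The quoted percentage can be checked with a line of arithmetic. In the sketch below the counts come from the caption; the prefers_python helper is hypothetical and assumes the plotted value is a Python-to-R ratio, which the slide does not define.

```python
total_countries = 171
python_countries = 167  # countries whose value is greater than 1

share = python_countries / total_countries
print(f"{share:.1%}")  # 97.7%, which rounds to the 98% quoted in the caption

def prefers_python(value):
    """Hypothetical rule: a value above 1 means Python is preferred over R."""
    return value > 1
```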
The State of Data Science
Kaggle Survey 2017
An industry-wide survey to establish a comprehensive view of the state of data science and machine learning.
Received over 16,000 responses
https://www.kaggle.com/surveys/2017
Summary
https://www.edureka.co/blog/what-is-data-science/
Questions?