WHAT IS DATA SCIENCE? Timo Aho // Data Scientist, PhD // [email protected] // Twitter: @ahotimom Data Science Tampere Meetup 29.9.2015




THIS IS SOLITA

› Turnover 2014: 38.6 million euros
› Over 340 professionals
› Over 18 years
› Working in 3 offices
› Over 1000 projects
› Over 97 % customer satisfaction
› Ranked 6th in Great Place to Work in Finland
› Ranked 43rd in European Best Workplaces


Strategic planning

Pre-studies

Road maps

Service concepts

Service design

Visual design

User experience design

Usability design

Architecture design

Solution implementation

Continuous services

Hosting services

Understand & concept Pilot & implement Maintain & develop


DIGITAL BUSINESS SOLUTIONS

› ONLINE AND ECOMMERCE
› INFORMATION MANAGEMENT AND BIG DATA
› UTILIZATION AND VISUALIZATION OF INFORMATION
› SOFTWARE DEVELOPMENT
› PREDICTIVE ANALYTICS
› DIGITAL STRATEGY AND TRANSFORMATION
› PUBLIC SECTOR ONLINE SERVICES
› BUSINESS PLANNING AND MANAGEMENT
› INTEGRATION SERVICES


OUR CUSTOMERS


RETAIL SERVICES PUBLIC


OPEN FINLAND CHALLENGE

› An open data contest where you can win prizes!

› Solita is offering a challenge on predictive traffic analytics

› See more: http://openfinlandchallenge.fi/

› The site is unfortunately mostly in Finnish


AGENDA

1. Data science vs Big data

2. Use case examples

3. Data science process

4. Data science methods


BIG DATA CONCEPTS

Big data

Data analysis

Data science

Knowledge discovery in databases (KDD)

Data mining

Machine learning

High Volume

Predictive analytics

High Velocity

High Variety

Cloud computation

NoSQL

Cloud storage

Hadoop MapReduce

Batch vs. Real time

Structured

Unstructured

Semi-structured

Spark

Internet of things

Sensory data

Business analytics

Business intelligence


DEFINITION FOR BIG DATA?

› Narrow:
• Infrastructure for processing exceptionally large or rapidly produced data

› Broad:
• All data storage, processing and analysis
• (The data does not necessarily fit into computer memory)
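
The remark about data that does not fit into memory can be illustrated with a minimal sketch: a mean computed one chunk at a time, so that only a small part of the data is held at once. The helper names `read_chunks` and `streaming_mean`, the chunk size, and the generated readings are hypothetical and only serve the illustration.

```python
from typing import Iterable, Iterator, List


def read_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive chunks so that at most `chunk_size` values are held at once."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # last, possibly shorter chunk
        yield chunk


def streaming_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute the mean without materializing the whole data set."""
    total = 0.0
    count = 0
    for chunk in read_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


if __name__ == "__main__":
    # A generator stands in for a source that is too large to load at once.
    readings = (float(i % 10) for i in range(1_000_000))
    print(streaming_mean(readings, chunk_size=10_000))
```

The same pattern (read a piece, update a small summary, discard the piece) is roughly what batch frameworks distribute over many machines.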


CASE: SANOMA OYJ

A personalized user experience on the most popular web services by analyzing 200 million new events daily


CASE: DIGITAL SERVICE PROVIDER

› Predicting:

• Customer churn

• Cross-selling

› The information is available in all customer contacts

• When the customer contacts support

• When marketing contacts the customer

• When meeting in shops, by phone, on the web
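
Churn prediction of this kind can be framed as a classification task. The following is a minimal sketch using a k-nearest-neighbours vote, not the method used in the case itself. The two features (support contacts in the last three months, months since contract start), the labels and all data values are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def knn_predict(
    train_x: List[Sequence[float]],
    train_y: List[str],
    query: Sequence[float],
    k: int = 3,
) -> str:
    """Return the majority label among the k training points closest to `query`."""
    if len(train_x) != len(train_y):
        raise ValueError("features and labels must have the same length")
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank training points by squared Euclidean distance to the query.
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical customers: (support contacts, months since contract start).
    customers = [(5, 2), (4, 3), (6, 1), (0, 24), (1, 30), (0, 18)]
    outcome = ["churned", "churned", "churned", "stayed", "stayed", "stayed"]
    print(knn_predict(customers, outcome, (5, 4)))
    print(knn_predict(customers, outcome, (1, 20)))
```

In practice the features would usually be scaled to comparable ranges first, since an unscaled distance is dominated by the feature with the largest numeric range.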


CASE: RETAIL / SERVICE PROVIDER

› Customers act in waves: service demand is high for a couple of weeks at a time

› Analysis

• Segment customers according to behavior

• Predict customer action timing and high demand times

• Influence customers to make the demand level steadier, without peaks


DATA ANALYSIS PROCESS

[Figure: CRISP-DM process diagram with percentage annotations on the phases (50-70%, 10-20%, 20-30%, 10-20%, 10-20%, 5-10%). Source: CRISP-DM, Image: Wikipedia]


WHAT DOES A DATA SCIENTIST DO?

Steps: Business goals, Discovering available data, Data preprocessing, Analytics modeling, Analytics result, Information exploitation, Service layer

› Ex. 1
• Business goals: Reduce churn
• Discovering available data: Leaving customers, Billing, Contracts, Contacts, Service quality, Demography, Failures
• Analytics modeling: Training: 80 variables per leaving customer, three times more current customers
• Analytics result: Churn prediction for each customer
• Information exploitation: Getting the predictions to databases in source systems
• Service layer: Optimizing communication to prevent churn. Customer service sees the churn prediction for the current customer.

› Ex. 2
• Business goals: Increase manufacturing quality
• Discovering available data: Raw materials, Machine parameters, Manufacturing sensors, End-product quality measurements, Failures
• Analytics modeling: Training: tens of starting, intermediate and ending variables
• Analytics result: Prediction for the optimal parameter values for quality
• Information exploitation: Hint for good parameter values, indication if suboptimal ones are selected
• Service layer: Process controller either uses the recommended parameter values or tunes them. Also creates real-time reports on process quality.

› Data preprocessing (both examples)
• Database connections
• Abnormal data forms
• Bringing to matrix form
• Cleaning or highlighting outliers and exceptions
• Handling missing information
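
Two of the preprocessing steps named on this slide, handling missing information and highlighting outliers, can be sketched as follows. This is a minimal illustration under stated assumptions: median imputation and a median-absolute-deviation rule are one common choice, not necessarily what was used in the examples, and the billing column is hypothetical data.

```python
from statistics import median
from typing import List, Optional


def impute_missing(column: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the median of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = median(observed)
    return [fill if v is None else v for v in column]


def flag_outliers(column: List[float], k: float = 5.0) -> List[bool]:
    """Flag values further than k median absolute deviations from the median."""
    center = median(column)
    mad = median(abs(v - center) for v in column)
    if mad == 0:
        # No spread to compare against; this sketch then flags nothing.
        return [False] * len(column)
    return [abs(v - center) > k * mad for v in column]


if __name__ == "__main__":
    # Hypothetical monthly bills in euros; None marks a missing value.
    billing = [30.0, None, 32.0, 29.0, 500.0, 31.0]
    filled = impute_missing(billing)
    print(filled)
    print(flag_outliers(filled))
```

Flagging rather than deleting keeps the decision about exceptions visible, which matches the slide's "cleaning or highlighting" wording.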


NATURE OF THE DATA

› Structured

› Semi-structured

› Unstructured

{"cod":"200","message":0.0032,
 "city":{"id":1851632,"name":"Shuzenji","coord":{"lon":138.933334,"lat":34.966671},"country":"JP"},
 "cnt":10,
 "list":[{
   "dt":1406080800,
   "temp":{"day":297.77,"min":293.52,"max":297.77,"night":293.52,"eve":297.77,"morn":297.77},
   "pressure":925.04,"humidity":76
 }]}
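
A semi-structured record like the one above is typically flattened into rows before analysis. The sketch below parses the sample with Python's standard `json` module and produces one flat row per entry in `list`. The function name `flatten_forecast` and the chosen output column names are assumptions made for the illustration.

```python
import json
from typing import Any, Dict, List

RAW = """
{"cod":"200","message":0.0032,
 "city":{"id":1851632,"name":"Shuzenji",
         "coord":{"lon":138.933334,"lat":34.966671},"country":"JP"},
 "cnt":10,
 "list":[{
   "dt":1406080800,
   "temp":{"day":297.77,"min":293.52,"max":297.77,
           "night":293.52,"eve":297.77,"morn":297.77},
   "pressure":925.04,"humidity":76
 }]}
"""


def flatten_forecast(raw: str) -> List[Dict[str, Any]]:
    """Turn the nested document into flat rows, one per forecast entry."""
    doc = json.loads(raw)
    city = doc["city"]
    rows = []
    for entry in doc["list"]:
        rows.append({
            "city": city["name"],
            "country": city["country"],
            "lat": city["coord"]["lat"],
            "lon": city["coord"]["lon"],
            "dt": entry["dt"],
            "temp_day": entry["temp"]["day"],
            "temp_min": entry["temp"]["min"],
            "temp_max": entry["temp"]["max"],
            "pressure": entry["pressure"],
            "humidity": entry["humidity"],
        })
    return rows


if __name__ == "__main__":
    for row in flatten_forecast(RAW):
        print(row)
```

Each resulting row has the same keys, so a list of such rows is already close to the matrix form that modeling algorithms expect.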


HISTORY OR FUTURE?

› Descriptive
• What happened?
• What is happening?
• Methods: Reporting, Data warehouses, Master data

› Predictive
• What will probably happen?
• Methods: Statistical modeling, Machine learning

› Prescriptive
• What should be done for optimal outcome?
• Methods: Optimizing, Machine learning, Simulation, Real-time analytics

Most organizations are here


WHAT DO ALGORITHMS EAT?

               Feature 1   Feature 2   Feature 3   Feature 4
Data point 1
Data point 2
Data point 3
Data point 4
Data point 5


DESCRIPTIVE MODELING METHODS

› Visualizations
• High dimension?
› Statistical values, dependencies
› Clustering
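
Clustering, the last descriptive method listed, can be sketched with a plain k-means loop. This is a minimal illustration, assuming a naive initialization that takes the first `k` points as starting centroids; the two features (visits per month, average purchase) and all values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Return (centroids, labels) after alternating assignment and update steps."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = list(points[:k])  # naive, deterministic initialization
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical customers: (visits per month, average purchase).
    customers = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (8.2, 7.9), (0.8, 1.1), (7.9, 8.3)]
    centroids, labels = k_means(customers, k=2)
    print(labels)
    print(centroids)
```

The result depends on the initialization, so library implementations usually run several random restarts and keep the best one.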


[Figure not reproduced. Source: Wikipedia]


WHAT DO ALGORITHMS EAT?

               Feature 1   Feature 2   Feature 3   Target feature
Data point 1
Data point 2
Data point 3
Data point 4
Data point 5


PREDICTIVE MODELING METHODS

› Regression
› Classification

[Figure: example target values for regression (1 €, 2 €, 4,5 €, 1,5 €, 1,3 €, 2 €) and for classification (A, A, A, A, O, P)]
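
Regression, where the target feature is a number, can be sketched with an ordinary least-squares line fit on a single feature. This is a minimal illustration; the function names and the small data set are hypothetical. Classification is the analogous case in which the target is a class label instead of a number.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x: float, slope: float, intercept: float) -> float:
    """Predict the numeric target for a new feature value."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical data: one feature and a price in euros as the target.
    feature = [1.0, 2.0, 3.0, 4.0]
    price = [1.5, 2.0, 2.5, 3.0]
    slope, intercept = fit_line(feature, price)
    print(slope, intercept)
    print(predict(5.0, slope, intercept))
```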


WHY IS DATA SCIENCE RELEVANT?

› More data available

› A lot of software tools available

• R, Python, Weka, RapidMiner, Tableau, SPSS, SAS

• Hadoop, Spark, NoSQL databases

• Cloud tools

› Business understanding on how to apply?


Twitter @SolitaOy

www.solita.fi

THANK YOU!

TIMO AHO

Data Scientist, PhD

[email protected]
