  • WHAT IS DATA SCIENCE?

    Timo Aho // Data Scientist, PhD // [email protected] // Twitter: @ahotimom

    Data Science Tampere Meetup 29.9.2015

  • THIS IS SOLITA

    › Turnover 2014: 38.6 million euros
    › Over 340 professionals
    › Over 18 years in business
    › Working in 3 offices
    › Over 1000 projects
    › Over 97% customer satisfaction
    › Ranked 6th in Great Place to Work in Finland
    › Ranked 43rd in European Best Workplaces

  • Understand & concept → Pilot & implement → Maintain & develop

    › Strategic planning
    › Pre-studies
    › Road maps
    › Service concepts
    › Service design
    › Visual design
    › User experience design
    › Usability design
    › Architecture design
    › Solution implementation
    › Continuous services
    › Hosting services

  • DIGITAL BUSINESS SOLUTIONS

    › Online and ecommerce
    › Information management and big data
    › Utilization and visualization of information
    › Software development
    › Predictive analytics
    › Digital strategy and transformation
    › Public sector online services
    › Business planning and management
    › Integration services

  • OUR CUSTOMERS

    › Retail, services, public
    › Insurance, media & telecom, manufacturing

    http://www.metso.com/corporation/home_eng.nsf/WebFrontPage/$First?OpenDocument http://www.fazer.fi/

  • OPEN FINLAND CHALLENGE

    › An open data contest where you can win prizes!

    › Solita is offering a challenge on predictive traffic analytics

    › See more: http://openfinlandchallenge.fi/

    › The site is unfortunately mostly in Finnish

  • AGENDA

    1. Data science vs Big data

    2. Use case examples

    3. Data science process

    4. Data science methods

  • AGENDA: 1. DATA SCIENCE VS BIG DATA

  • BIG DATA CONCEPTS

    › Big data, data analysis, data science, knowledge discovery in databases (KDD), data mining, machine learning, predictive analytics
    › High volume, high velocity, high variety
    › Cloud computation, cloud storage, NoSQL, Hadoop, MapReduce, Spark
    › Batch vs. real time
    › Structured, unstructured, semi-structured
    › Internet of things, sensory data
    › Business analytics, business intelligence

  • DEFINITION OF BIG DATA?

    › Narrow:

    • Infrastructure for processing exceptionally large or rapidly produced data

    › Broad:

    • All storing, processing and analyzing of data

    • (The data does not necessarily fit into computer memory)
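The note that the data does not necessarily fit into computer memory has a standard practical answer: process it in pieces. A minimal Python sketch, assuming the data arrives as an iterable of numbers: it reads fixed-size chunks, summarizes each chunk as a small partial result (a "map" step in the spirit of MapReduce), and merges the partial results (a "reduce" step), so only one chunk is held in memory at a time. The function names and chunk size are illustrative, not from the talk.

```python
from __future__ import annotations

from functools import reduce
from itertools import islice
from typing import Iterable, Iterator


def chunks(values: Iterable[float], size: int) -> Iterator[list[float]]:
    """Yield consecutive lists of at most `size` items, never the whole input at once."""
    it = iter(values)
    while block := list(islice(it, size)):
        yield block


def partial_stats(block: list[float]) -> tuple[int, float]:
    """Map step: summarize one chunk as (count, sum)."""
    return len(block), float(sum(block))


def combine(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Reduce step: merge two partial summaries into one."""
    return a[0] + b[0], a[1] + b[1]


def streaming_mean(values: Iterable[float], chunk_size: int = 10_000) -> float:
    """Mean of a possibly huge stream, holding at most one chunk in memory."""
    count, total = reduce(combine, map(partial_stats, chunks(values, chunk_size)), (0, 0.0))
    if count == 0:
        raise ValueError("no data")
    return total / count


# A generator stands in for a file or stream that is too large for memory.
print(streaming_mean((x for x in range(1_000_001)), chunk_size=4096))  # 500000.0
```

The same split into independent per-chunk summaries and a cheap merge is what lets frameworks such as Hadoop or Spark spread the work over many machines.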

  • AGENDA: 2. USE CASE EXAMPLES

  • CASE: SANOMA OYJ

    A personalized user experience on the most popular web services by analyzing 200 million new events daily
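For scale, a back-of-the-envelope calculation turns the daily figure into an average rate. It assumes, purely for illustration, that the events are spread evenly over the 86,400 seconds of a day; real traffic is uneven, so peak rates are higher than this average.

```latex
\frac{200\,000\,000\ \text{events}}{24 \times 3600\ \text{s}}
  = \frac{200\,000\,000\ \text{events}}{86\,400\ \text{s}}
  \approx 2\,315\ \text{events/s}
```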

  • CASE: DIGITAL SERVICE PROVIDER

    › Predicting:

    • Customer churn

    • Cross-selling

    › The information is available in all customer contacts:

    • When the customer contacts support

    • When marketing contacts the customer

    • When meeting the customer in shops, by phone or on the web

  • CASE: RETAIL / SERVICE PROVIDER

    › Customers act in waves, with high service demand for a couple of weeks at a time

    › Analysis

    • Segment customers according to behavior

    • Predict the timing of customer actions and the high-demand periods

    • Influence customers to make the demand level steadier, without peaks
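A simple first step toward finding the high-demand times is to flag days whose demand rises well above the recent norm. A minimal Python sketch, assuming daily demand counts in a list: a day counts as a peak when it exceeds the mean of the preceding `window` days by more than `k` standard deviations. The window length, the factor `k` and the sample numbers are hypothetical, and this only detects past peaks; predicting them would need a forecasting model on top.

```python
from statistics import mean, pstdev


def flag_peaks(demand: list[float], window: int = 7, k: float = 2.0) -> list[int]:
    """Return indices of days whose demand exceeds trailing mean + k * std."""
    peaks = []
    for i in range(window, len(demand)):
        history = demand[i - window:i]  # the `window` days before day i
        threshold = mean(history) + k * pstdev(history)
        if demand[i] > threshold:
            peaks.append(i)
    return peaks


daily = [10, 11, 9, 10, 12, 10, 11, 30, 32, 12, 10, 11, 9, 10]  # hypothetical
print(flag_peaks(daily))  # [7, 8]
```

Days 7 and 8 form one "wave"; once such waves are labeled per customer segment, their timing becomes a target that a predictive model can learn.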

  • AGENDA: 3. DATA SCIENCE PROCESS

  • DATA ANALYSIS PROCESS

    [Figure: CRISP-DM data analysis process diagram (Source: CRISP-DM, Image: Wikipedia), annotated with the percentages 50-70%, 10-20%, 20-30%, 10-20%, 10-20% and 5-10%]

  • WHAT DOES A DATA SCIENTIST DO?

    Business goals
    › Ex. 1: Reduce churn
    › Ex. 2: Increase manufacturing quality

    Discovering available data
    › Ex. 1: Leaving customers, billing, contracts, contacts, service quality, demography, failures
    › Ex. 2: Raw materials, machine parameters, manufacturing sensors, end-product quality measurements, failures

    Data preprocessing
    › Database connections, abnormal data forms, bringing the data to matrix form, cleaning or highlighting outliers and exceptions, handling missing information

    Analytics modeling
    › Ex. 1: Training: 80 variables per leaving customer; three times as many current customers
    › Ex. 2: Training: tens of starting, intermediate and ending variables

    Analytics result
    › Ex. 1: Churn prediction for each customer
    › Ex. 2: Prediction of the optimal parameter values for quality

    Information exploitation
    › Ex. 1: Getting the predictions into the databases of the source systems
    › Ex. 2: Hints for good parameter values, and an indication if suboptimal ones are selected

    Service layer
    › Ex. 1: Optimizing communication to prevent churn. Customer service sees the churn prediction for the current customer.
    › Ex. 2: The process controller either uses the recommended parameter values or tunes them. Real-time reports on process quality are also created.
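Three of the data preprocessing steps listed for this process (bringing records to matrix form, handling missing information, and highlighting outliers) can be sketched in plain Python. This is a minimal illustration under assumed inputs: records arrive as dictionaries where `None` or a missing key means "unknown", missing cells are filled with the column median, and outliers are only flagged, not removed, using the modified z-score rule (0.6745 × |x − median| / MAD > 3.5). The field names, sample values and the 3.5 cutoff are hypothetical choices.

```python
from statistics import median

# Hypothetical raw customer records; None or a missing key means "unknown".
records = [
    {"monthly_bill": 30.0, "contacts": 1, "failures": 0},
    {"monthly_bill": 32.0, "contacts": None, "failures": 1},
    {"monthly_bill": 29.0, "contacts": 2},
    {"monthly_bill": 31.0, "contacts": 1, "failures": 0},
    {"monthly_bill": 400.0, "contacts": 3, "failures": 2},
]
COLUMNS = ["monthly_bill", "contacts", "failures"]


def to_matrix(rows, columns):
    """Matrix form: one row per record, one column per feature (None if absent)."""
    return [[row.get(col) for col in columns] for row in rows]


def impute_median(matrix):
    """Fill missing cells with the median of the known values in the same column."""
    n_cols = len(matrix[0])
    medians = [median(r[j] for r in matrix if r[j] is not None) for j in range(n_cols)]
    return [[medians[j] if r[j] is None else r[j] for j in range(n_cols)] for r in matrix]


def flag_outliers(matrix, cutoff=3.5):
    """Return (row, column) cells whose modified z-score exceeds `cutoff`."""
    flagged = []
    for j in range(len(matrix[0])):
        col = [r[j] for r in matrix]
        med = median(col)
        mad = median(abs(x - med) for x in col)  # median absolute deviation
        if mad == 0:
            continue  # no spread in this column to judge against
        for i, x in enumerate(col):
            if 0.6745 * abs(x - med) / mad > cutoff:
                flagged.append((i, j))
    return flagged


X = impute_median(to_matrix(records, COLUMNS))
print(X)
print(flag_outliers(X))  # [(4, 0)]
```

Flagging instead of deleting keeps the decision with the analyst: a 400 € bill may be an entry error or a genuinely important customer.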

  • AGENDA: 4. DATA SCIENCE METHODS

  • NATURE OF THE DATA

    › Structured

    › Semi-structured

    › Unstructured

    Example of semi-structured data (JSON):

    {
      "cod": "200",
      "message": 0.0032,
      "city": {
        "id": 1851632,
        "name": "Shuzenji",
        "coord": {"lon": 138.933334, "lat": 34.966671},
        "country": "JP"
      },
      "cnt": 10,
      "list": [{
        "dt": 1406080800,
        "temp": {
          "day": 297.77, "min": 293.52, "max": 297.77,
          "night": 293.52, "eve": 297.77, "morn": 297.77
        },
        "pressure": 925.04,
        "humidity": 76
      }]
    }
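Before most algorithms can use semi-structured data like the JSON sample, it has to be flattened into structured rows. A minimal Python sketch using the standard `json` module; it assumes the sample is a weather forecast response whose temperatures are in kelvins (consistent with values around 297) and whose `dt` is a Unix timestamp in seconds. The output column names and the choice of fields are illustrative.

```python
import json

raw = """
{"cod": "200", "message": 0.0032,
 "city": {"id": 1851632, "name": "Shuzenji",
          "coord": {"lon": 138.933334, "lat": 34.966671}, "country": "JP"},
 "cnt": 10,
 "list": [{"dt": 1406080800,
           "temp": {"day": 297.77, "min": 293.52, "max": 297.77,
                    "night": 293.52, "eve": 297.77, "morn": 297.77},
           "pressure": 925.04, "humidity": 76}]}
"""


def kelvin_to_celsius(k: float) -> float:
    """Assumes the input temperature is in kelvins."""
    return round(k - 273.15, 2)


def flatten_forecast(doc: dict) -> list[dict]:
    """Turn the nested document into one flat row per forecast entry."""
    city = doc["city"]
    rows = []
    for entry in doc["list"]:
        rows.append({
            "city": city["name"],
            "country": city["country"],
            "lat": city["coord"]["lat"],
            "lon": city["coord"]["lon"],
            "timestamp": entry["dt"],  # assumed Unix time in seconds
            "temp_day_c": kelvin_to_celsius(entry["temp"]["day"]),
            "temp_min_c": kelvin_to_celsius(entry["temp"]["min"]),
            "pressure": entry["pressure"],
            "humidity": entry["humidity"],
        })
    return rows


rows = flatten_forecast(json.loads(raw))
print(rows[0]["temp_day_c"], rows[0]["humidity"])  # 24.62 76
```

Each flat row now has a fixed set of columns, which is the matrix form the later "What do algorithms eat?" slides describe.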

  • HISTORY OR FUTURE?

    Descriptive

    • What happened? What is happening?

    • Reporting, data warehouses, master data

    Predictive

    • What will probably happen?

    • Statistical modeling, machine learning

    Prescriptive

    • What should be done for an optimal outcome?

    • Optimizing, machine learning, simulation, real-time analytics

    Most organizations are here
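The three levels can be contrasted on a single toy problem. A minimal Python sketch, assuming a hypothetical weekly sales history and a hypothetical cost model, none of which come from the talk: the descriptive step summarizes the past, the predictive step fits a straight-line trend by ordinary least squares and extrapolates one week ahead, and the prescriptive step picks the stock level with the lowest cost for that forecast.

```python
from statistics import mean

sales = [100, 104, 108, 113, 115, 121]  # hypothetical weekly sales

# Descriptive: what happened?
print("average:", mean(sales), "latest:", sales[-1])

# Predictive: what will probably happen? Least-squares trend y = a + b * t.
t = list(range(len(sales)))
t_mean, y_mean = mean(t), mean(sales)
b = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, sales)) / sum(
    (ti - t_mean) ** 2 for ti in t
)
a = y_mean - b * t_mean
forecast = a + b * len(sales)  # next week, t = 6
print("forecast:", round(forecast, 1))  # forecast: 124.5


# Prescriptive: what should be done? Hypothetical costs:
# 1 per unsold item, 3 per missed sale.
def cost(stock: int, demand: float) -> float:
    return 1.0 * max(stock - demand, 0) + 3.0 * max(demand - stock, 0)


best_stock = min(range(80, 161), key=lambda s: cost(s, forecast))
print("order:", best_stock)  # order: 125
```

The asymmetric cost is what turns a forecast into a decision: missing a sale is assumed to cost more than an unsold item, so the recommendation rounds up rather than to the nearest unit.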

  • WHAT DO ALGORITHMS EAT?

    A table with one row per data point (Data point 1 to Data point 5) and one column per feature (Feature 1 to Feature 4)

  • DESCRIPTIVE MODELING METHODS

    › Visualizations

    • How to visualize high-dimensional data?

    › Statistical values, dependencies

    › Clustering

  • [Figure. Source: Wikipedia]
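Two of the descriptive methods listed above, dependencies between variables and clustering, fit in a few lines of plain Python. This sketch assumes numeric two-dimensional data points; it computes the Pearson correlation coefficient between the two coordinates and runs basic k-means (Lloyd's algorithm) with a fixed number of iterations and random initial centers. The sample points, `k`, the iteration count and the seed are illustrative.

```python
import math
import random


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def kmeans(points, k, iterations=20, seed=0):
    """Basic k-means: assign each point to its nearest center, move centers to cluster means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to become empty.
        centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
print(pearson([p[0] for p in points], [p[1] for p in points]))
centers, labels = kmeans(points, k=2)
print(labels)
```

Note that k-means needs the number of clusters up front and its result can depend on the initial centers; in practice it is run several times or with a smarter initialization.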

  • WHAT DO ALGORITHMS EAT?

    A table with one row per data point (Data point 1 to Data point 5) and one column per feature (Feature 1 to Feature 3), plus a Target feature column

  • PREDICTIVE MODELING METHODS

    › Regression

    › Classification

    [Figure: example target values 1 €, 2 €, 4.5 €, 1.5 €, 1.3 €, 2 €]
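The two predictive method families differ in what the target feature holds: a number for regression, a category for classification. A minimal Python sketch on hypothetical toy data: one-feature ordinary least-squares regression, and a k-nearest-neighbors classifier. k-NN is only one of many classification methods and is used here because it fits in a few lines; the "stay"/"churn" labels are hypothetical and merely echo the churn case.

```python
import math
from collections import Counter


def fit_line(xs, ys):
    """Regression: least-squares intercept a and slope b for y ≈ a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def knn_predict(train_X, train_y, query, k=3):
    """Classification: majority label among the k training points nearest to `query`."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Regression: the target is a number (hypothetical feature values and prices in euros).
a, b = fit_line([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print("predicted value for x = 5:", round(a + b * 5.0, 2))

# Classification: the target is a category (hypothetical two-feature customer data).
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
train_y = ["stay", "stay", "stay", "churn", "churn", "churn"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # stay
print(knn_predict(train_X, train_y, (8.5, 8.5)))  # churn
```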

  • WHY IS DATA SCIENCE RELEVANT?

    › More data available

    › Many software tools available

    • R, Python, Weka, RapidMiner, Tableau, SPSS, SAS

    • Hadoop, Spark, NoSQL databases

    • Cloud tools

    › Business understanding of how to apply it?

  • Twitter @SolitaOy

    www.solita.fi

    THANK YOU!

    TIMO AHO

    Data Scientist, PhD

    [email protected]
