data science 101 - presentation · 2019-09-15 · data marts data scientist data mining and...

Data Science 101 Arik Pelkey Pentaho Senior Director – Product Marketing, Hitachi Vantara Scott Cooley Pentaho Data Scientist, Hitachi Vantara

Post on 29-May-2020

Page 1: Data Science 101 - Presentation · 2019-09-15 · Data Marts Data Scientist Data Mining and Predictive Maintenance Local Equipment sensor and Server Data Fleet Data via Satellite

Data Science 101

Arik Pelkey, Pentaho Senior Director – Product Marketing, Hitachi Vantara

Scott Cooley, Pentaho Data Scientist, Hitachi Vantara

Page 2

Agenda

This session will provide an introduction to data science fundamentals.

• What is Data Science?
• Common Use Cases and Algorithms
• The Data Science Process
• Building a Data Science Team
• The Future

Page 3

AI, Machine Learning, and Deep Learning

Image from https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/.

• AI: Getting machines to do what humans are good at
• Machine Learning: Feeding an algorithm data to learn and predict something
• Deep Learning: A type of machine learning

Page 4

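Data Science: Solving Problems with Data

Diagram from Drew Conway: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram.

[Venn diagram of three overlapping skill areas]

• HACKING SKILLS: Computer science, data engineering and wrangling, coding
• MATH AND STATISTICS KNOWLEDGE: Understanding of the underlying assumptions; algorithms and numerical techniques to derive insights
• SUBSTANTIVE EXPERIENCE: Domain knowledge, business acumen, experience, value to the business

Overlaps:

• Hacking Skills + Math and Statistics Knowledge: Machine Learning
• Math and Statistics Knowledge + Substantive Experience: Traditional Research
• Hacking Skills + Substantive Experience: Danger Zone!
• All three: DATA SCIENCE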

Page 5

What's all the fuss? This stuff was created many, many years ago

• Bayes' Theorem: Thomas Bayes, mid 1700s
• Regression: Legendre and Gauss, early 1800s; Galton, late 1800s
• Neural Networks: McCulloch and Pitts, early 1940s

Page 6

Think about All Our Data and Compute

https://www.computerworld.com.au/article/392735/ska_telescope_generate_more_data_than_entire_internet_2020/.

SKA - 2020 (Square Kilometer Array Telescope)

Will generate as much data in a day as the entire PLANET does in a year!

It is still GROWING!

Page 7

Types of Machine Learning

Regression – Looking for a statistical relationship across variables that may give us an estimate of a particular outcome.

Classification – Similar to regression, but looking for separations in the data given predefined classes. (Supervised)

Clustering – No predefined classes; trying to find groups or sets based upon the data at hand. (Unsupervised)

Anomaly Detection – Identification of outliers based upon expected ranges of data.
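As a rough illustration of the anomaly detection idea (outliers relative to an expected range), here is a minimal sketch. The sensor values, the function names, and the choice of "mean plus or minus three standard deviations" as the expected range are all assumptions made for this example, not something taken from the slides.

```python
from statistics import mean, stdev


def expected_range(baseline, k=3.0):
    """Return (low, high) bounds as mean +/- k standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread


def find_anomalies(readings, low, high):
    """Return the readings that fall outside the expected range."""
    return [r for r in readings if r < low or r > high]


# Hypothetical sensor temperatures recorded during normal operation.
baseline = [70.1, 69.8, 70.4, 70.0, 69.9, 70.2, 70.3, 69.7]
low, high = expected_range(baseline)

# New readings to screen; 85.0 is deliberately far from the baseline.
new_readings = [70.0, 69.9, 85.0, 70.2]
anomalies = find_anomalies(new_readings, low, high)
print(anomalies)
```

The range is learned from data assumed to be normal and then applied to new readings, so a single extreme value in the new batch cannot widen the range it is judged against.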

Page 8

Labelled vs Unlabelled

Let's say we want to Classify Houses by Size

Given Features or Feature Set              Label
FullBath  HalfBath  Bedrooms  HomeAge      Size
1         0         2         56           M
1         1         3         59           L
2         1         3         20           M
2         1         3         19           S

Supervised Learning

Use the labels to build a model. The model is then used to classify a new house's size based ONLY on the known feature set.

Unsupervised

SIZE is missing! We need to look for similarities in the data and group them into clusters.
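The contrast between the two settings can be sketched on the four houses from the slide. This is only an illustration: the pairing of the Size labels with rows in listed order is an assumption, the new house is hypothetical, the supervised part uses a 1-nearest-neighbour rule, and the unsupervised part uses a simple threshold-based grouping (sometimes called leader clustering) rather than any method named in the presentation.

```python
import math

# Feature rows from the slide: (FullBath, HalfBath, Bedrooms, HomeAge).
features = [
    (1, 0, 2, 56),
    (1, 1, 3, 59),
    (2, 1, 3, 20),
    (2, 1, 3, 19),
]
# Size labels; pairing them with rows in this order is an assumption.
labels = ["M", "L", "M", "S"]


def distance(a, b):
    """Euclidean distance between two feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(house, features, labels):
    """Supervised: return the label of the closest labelled house (1-NN)."""
    nearest = min(range(len(features)), key=lambda i: distance(house, features[i]))
    return labels[nearest]


def cluster(features, threshold):
    """Unsupervised: put each house in the first group whose first member
    is closer than the threshold, else start a new group. No labels used."""
    clusters = []
    for row in features:
        for group in clusters:
            if distance(row, group[0]) < threshold:
                group.append(row)
                break
        else:
            clusters.append([row])
    return clusters


new_house = (2, 1, 3, 18)  # hypothetical unseen house
predicted = classify(new_house, features, labels)
groups = cluster(features, threshold=10.0)
print(predicted, len(groups))
```

Because the features are not rescaled, HomeAge dominates the distance here; a real model would normally standardize the columns first.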

Page 9

More on Machine Learning

Machine Learning is a methodology to create a model based on sample data and use the model to make a prediction or strategy using a more algorithmic approach.

SUPERVISED LEARNING MODEL

• Historical records that contain square feet, number of bathrooms, zip code….
• Records that contain the price the house sold for
• Iterate the algorithm over the combined data to train the model
• Use the trained model to predict outcome on new records
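The train-then-predict loop described above can be sketched with a one-feature linear model fitted by batch gradient descent. Everything specific here is an assumption for illustration: the four historical records are made up, only square feet is used as a feature, and the learning rate and epoch count are arbitrary values that happen to converge on this data.

```python
# Hypothetical historical records: (square feet, sale price in dollars).
history = [(1000, 200_000), (1500, 300_000), (2000, 400_000), (2500, 500_000)]

# Rescale so both columns are in convenient units (thousands).
xs = [sqft / 1000 for sqft, _ in history]
ys = [price / 1000 for _, price in history]


def train(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit price = slope * sqft + intercept by batch gradient descent
    on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


def predict(sqft, slope, intercept):
    """Predict a sale price in dollars for a new record."""
    return (slope * sqft / 1000 + intercept) * 1000


slope, intercept = train(xs, ys)
estimate = predict(1800, slope, intercept)
print(round(estimate))
```

The loop inside `train` is the "iterate the algorithm over the combined data" step: each pass compares predictions with the known sale prices and nudges the parameters to reduce the error.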

Page 10

The Data Science Process: Getting from Raw Data to Outcomes

The Data Science Workflow

Created by Joe Blitzstein and Hanspeter Pfister for the Harvard Data Science course.

Formal Framework

CRISP-DM: Cross Industry Standard Process for Data Mining

Page 11

Specialist Traditional Data Science Team

Data Scientist (DS) – Prepares data, engineers features; most valuable skill: training models.

Data Engineer (DE) – Data acquisition focus. Builds data pipelines. Not uncommon to have a 5:1 ratio of DE:DS.

Data Analyst (DA) – Assists DS with data prep.

Application Architect (AA) – Designs the complete solution; deploys and maintains models in production.

Page 12

Mythical Creatures

Page 13

Trends

• Automation

• Tools for Citizen Data Scientists

• Pre-trained models in the cloud

Page 14

Hiring Guidance

Page 15

Defining Success

• Easy for the tangible
  – Search order optimization
  – Recommendation engine or CTR
• Hard for others
  – Lead scoring
  – Attrition
• Try to measure direct outcomes
• Rarely a silver bullet
• Think ROI
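For the tangible cases, the arithmetic behind "measure direct outcomes" and "think ROI" is short enough to sketch. The formulas are the standard definitions of click-through rate and return on investment; the click counts, gain, and cost below are hypothetical numbers chosen only to make the example concrete.

```python
def click_through_rate(clicks, impressions):
    """CTR: fraction of impressions that resulted in a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def roi(gain, cost):
    """ROI: net gain relative to cost."""
    if cost == 0:
        raise ValueError("cost must be non-zero")
    return (gain - cost) / cost


# Hypothetical before/after numbers for a recommendation engine.
baseline_ctr = click_through_rate(clicks=200, impressions=10_000)
model_ctr = click_through_rate(clicks=260, impressions=10_000)
lift = (model_ctr - baseline_ctr) / baseline_ctr

# Hypothetical project economics.
project_roi = roi(gain=150_000, cost=100_000)
print(f"CTR lift: {lift:.0%}, ROI: {project_roi:.0%}")
```

Outcomes such as lead scoring or attrition are harder precisely because there is no equally direct counter to put in the numerator.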

Page 16

Typical Data Science Project

[Figure: project workflow; each step is annotated with the roles involved (DS, DE, DA, AA)]

• Understand business objectives
• ID and procure training data
• Prepare data and build new features
• Train model
• Deploy models
• Update models

Page 17

Preventive Maintenance: Caterpillar

Page 18

Marine Asset Intelligence

[Figure: architecture diagram with the following components]

• Local Equipment Sensor and Server Data
• Fleet Data via Satellite
• Cross Department Operations Data, Scheduling/ERP
• Data Integration (appears twice in the diagram)
• Data Marts
• Data Scientist: Data Mining and Predictive Maintenance
• Dashboards and Reports on Machine Performance (Onboard and Onshore)
• Business User (COO): Reporting on Operations and Efficiency

Page 19

The Future

• Scaling up / enabling more data scientists
• Model management
• Improved productivity
• Support for containerized applications

Page 20

Pentaho ML Orchestration

• Makes data science teams more productive
• Broad support for open source libraries in various languages

Page 21

Summary

• What is Data Science
• Common Use Cases and Algorithms
• The Data Science Process
• Building a Data Science Team
• The Future

Page 22

Next Steps

Want to learn more?

• Schedule a Meet the Expert
• Read Mark Hall's Machine Learning with Pentaho Blog
