Becoming a Data-Centric Engineering Team: Keeping Up with the Data Deluge

Paula Poza

© 2015 The MathWorks, Inc.




A path for how your team can better work with and use data.


Data Science Maturity Levels

Ad-hoc Individual Analysis
Generally Useful Tools for Analysis
Common Infrastructure; Tested and Documented

Overhead when asking a new question
Ease of scaling to more people


Data Science Maturity Levels

• Goal is to be fast: reduce time to insight


Getting Started: Exploring a New Dataset


Getting Started: Exploring a New Dataset

Missing Data: ismissing, rmmissing, fillmissing
Outliers: isoutlier, rmoutliers, filloutliers
Change Points: ischange
Noisy Data: smoothdata
and more…

https://www.mathworks.com/help/matlab/preprocessing-data.html
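The preprocessing functions listed above can be chained in a few lines. The sketch below is illustrative only: the signal values are made up, and the choice of linear interpolation and a 3-point moving average is an assumption, not something the slides prescribe.

```matlab
% Synthetic signal with one gap (NaN) and one spike; values are made up.
x = [1 2 NaN 4 5 100 7 8 9 10];

% Missing data: locate the gap, then fill it by linear interpolation.
missingMask = ismissing(x);
xFilled = fillmissing(x, 'linear');

% Outliers: flag points far from the median (the default detection rule),
% then replace the flagged points by linear interpolation.
outlierMask = isoutlier(xFilled);
xClean = filloutliers(xFilled, 'linear');

% Noisy data: smooth with a 3-point moving average (window is an assumption).
xSmooth = smoothdata(xClean, 'movmean', 3);
```

Keeping the masks (missingMask, outlierMask) alongside the cleaned signal makes it possible to report how much of the data was modified.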


Getting Started: Exploring a New Dataset

geoplot
geoscatter
geobubble
geodensityplot

https://www.mathworks.com/help/matlab/geographic-plots.html
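As a rough illustration of the geographic plotting functions above, the following sketch draws a few points with geoscatter. The coordinates are approximate city locations and the measurement values are invented for the example; neither comes from the presentation.

```matlab
% Approximate coordinates for three cities (Madrid, Barcelona, Seville).
lat = [40.4168 41.3874 37.3891];
lon = [-3.7038  2.1686 -5.9845];

% Made-up measurement at each location, used for marker size and color.
value = [10 25 15];

figure
s = geoscatter(lat, lon, 20 + 5*value, value, 'filled');
title('Illustrative measurements by location')
```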


Data Science Maturity Levels

• Explore and understand data
• Document analysis
• Tools will be re-used in next steps


Data Science Maturity Levels

• Apply to different datasets
• Functions/Scripts
• MATLAB Apps
• Trend: work with big data


Overview of Flight Data

▪ 35 unique aircraft

▪ 180,000 unique flights

▪ 300 GB of data

▪ Source:

– NASA DASHlink: Sample Flight Data

– https://c3.nasa.gov/dashlink/projects/85/


Big Data Creates Opportunities

Find rare events, then dive deeper

Build and validate test scenarios that match real-world conditions

Perform fleet-wide calculations


Big Data Requires New Tools

Built-In Datastores

General: datastore, spreadsheetDatastore, tabularTextDatastore, fileDatastore
Database: databaseDatastore
Image: imageDatastore, denoisingImageDatastore, randomPatchExtractionDatastore, pixelLabelDatastore, augmentedImageDatastore
Audio: audioDatastore
Predictive Maintenance: fileEnsembleDatastore, simulationEnsembleDatastore
Simulink: SimulationDatastore
Automotive: mdfDatastore
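To show how one of the built-in datastores is typically used, here is a minimal sketch with tabularTextDatastore that reads a file in small chunks instead of loading it all at once. The file name, the column names (id, speed), and the chunk size are hypothetical and chosen only so the example is self-contained.

```matlab
% Write a tiny CSV file to a temporary folder (stand-in for a large dataset).
folder = tempname;
mkdir(folder);
T = table((1:6)', [10; 20; 30; 40; 50; 60], 'VariableNames', {'id', 'speed'});
writetable(T, fullfile(folder, 'flights.csv'));

% Point a datastore at the folder and read two rows per call.
ds = tabularTextDatastore(folder);
ds.ReadSize = 2;

% Accumulate a running sum chunk by chunk; only one chunk is in memory at a time.
total = 0;
while hasdata(ds)
    chunk = read(ds);
    total = total + sum(chunk.speed);
end
```

The same hasdata/read loop applies when the folder holds many large files, which is the case the datastore pattern is meant for.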


Big Data Requires New Tools

Custom Datastore

▪ Customize a datastore to work with your dataset
▪ Gives you control over how data is loaded and formatted
▪ MATLAB subclass: “fill-in-the-blanks”
▪ Build a piece of infrastructure, then re-use it in your analyses

function [data,info] = read(ds)
    ...
end

function tf = hasdata(ds)
    ...
end

function reset(ds)
    ...
end

function p = progress(ds)
    ...
end

function data = readall(ds)
    ...
end
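To make the "fill-in-the-blanks" idea concrete, the sketch below subclasses matlab.io.Datastore and fills in hasdata, read, reset, and progress. It is not the flight-data datastore from the presentation: the class name ChunkedMatrixDatastore and its behavior (serving rows of an in-memory matrix in fixed-size chunks) are hypothetical, chosen to keep the example short. The class would be saved in its own file, ChunkedMatrixDatastore.m.

```matlab
classdef ChunkedMatrixDatastore < matlab.io.Datastore
    % Hypothetical example: serves rows of an in-memory matrix in chunks.

    properties (Access = private)
        Data        % numeric matrix, one observation per row
        ChunkSize   % number of rows returned by each call to read
        NextRow     % index of the next unread row
    end

    methods
        function ds = ChunkedMatrixDatastore(data, chunkSize)
            ds.Data = data;
            ds.ChunkSize = chunkSize;
            reset(ds);
        end

        function tf = hasdata(ds)
            % True while unread rows remain.
            tf = ds.NextRow <= size(ds.Data, 1);
        end

        function [data, info] = read(ds)
            % Return the next chunk and advance the read position.
            if ~hasdata(ds)
                error('ChunkedMatrixDatastore:noMoreData', 'No more data to read.');
            end
            lastRow = min(ds.NextRow + ds.ChunkSize - 1, size(ds.Data, 1));
            data = ds.Data(ds.NextRow:lastRow, :);
            info = struct('Rows', [ds.NextRow lastRow]);
            ds.NextRow = lastRow + 1;
        end

        function reset(ds)
            % Rewind to the first row.
            ds.NextRow = 1;
        end
    end

    methods (Hidden = true)
        function frac = progress(ds)
            % Fraction of rows already read, between 0 and 1.
            frac = min((ds.NextRow - 1) / max(size(ds.Data, 1), 1), 1);
        end
    end
end
```

Once the class is on the path, it is used like any other datastore: construct it with ds = ChunkedMatrixDatastore(magic(4), 3), then loop with while hasdata(ds), chunk = read(ds); end. A real custom datastore would replace the in-memory matrix with file reading and parsing logic.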


A Custom Datastore for Flight Data


Find Rare Events, then Dive Deeper


Perform Fleet-Wide Calculations
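One possible way to express both a rare-event search and a fleet-wide calculation is with tall arrays on top of a datastore. This is an assumption: the slides do not say which mechanism was used in the demonstration. The file names, column names (aircraft, airspeed), values, and the 400 threshold below are all made up for illustration.

```matlab
% Create two small CSV files, one per hypothetical aircraft.
folder = tempname;
mkdir(folder);
names = {'aircraft', 'airspeed'};
writetable(table([1; 1; 1], [200; 210; 450], 'VariableNames', names), ...
    fullfile(folder, 'aircraft1.csv'));
writetable(table([2; 2; 2], [190; 205; 215], 'VariableNames', names), ...
    fullfile(folder, 'aircraft2.csv'));

% A tall table defers computation until gather is called.
ds = tabularTextDatastore(folder);
flights = tall(ds);

% Rare events: rows where airspeed exceeds an arbitrary threshold.
rareEvents = flights(flights.airspeed > 400, :);

% Fleet-wide calculation: mean airspeed across all files.
fleetMean = mean(flights.airspeed);

% Evaluate both deferred results together, so the data is read only once.
[rareEvents, fleetMean] = gather(rareEvents, fleetMean);
```

Gathering several results in one call matters for large datasets, because each separate gather would otherwise trigger its own pass over the files.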


Data Science Maturity Levels

• Make it easy to navigate the data
• Re-use each time you analyze the dataset


Data Science Maturity Levels

• Collaborate: Work with others on a common code base
• Verify: Write well-tested software
• Share: Build tools for others


MATLAB Projects


Testing
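A small function-based test file gives a feel for what testing an analysis tool can look like with the MATLAB unit test framework. The function under test, cleanSignal, is hypothetical and defined at the bottom of the same file so the sketch is self-contained; in a real project it would live in its own file. The test file would be saved as cleanSignalTest.m.

```matlab
function tests = cleanSignalTest
% Function-based test file; collects the local test functions below.
tests = functiontests(localfunctions);
end

function testGapIsInterpolated(testCase)
% A single missing value should be filled by linear interpolation.
actual = cleanSignal([1 NaN 3]);
verifyEqual(testCase, actual, [1 2 3], 'AbsTol', 1e-12);
end

function testSpikeIsReplaced(testCase)
% An isolated spike should be replaced by the interpolated value.
actual = cleanSignal([1 2 3 4 5 100 7 8 9 10]);
verifyEqual(testCase, actual, 1:10, 'AbsTol', 1e-12);
end

function y = cleanSignal(x)
% Hypothetical function under test: fill gaps, then replace outliers.
y = filloutliers(fillmissing(x, 'linear'), 'linear');
end
```

The tests are run with results = runtests('cleanSignalTest'), which returns one result per test function.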


Creating a Toolbox


Data Science Maturity Levels

• Scale out to a larger group of users
• Easier to maintain and share


What’s Next?

Advanced Analytics and Machine Learning
Build and Test Algorithms for Embedded Systems
Deploy Apps and Analytics to Enterprise IT Systems