becoming a data-centric engineering team - mathworks · control data import with importoptions 2....

57
1 © 2015 The MathWorks, Inc. Becoming a Data-Centric Engineering Team Paul Peeling

Upload: others

Post on 15-Oct-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

1© 2015 The MathWorks, Inc.

Becoming a Data-Centric Engineering Team

Paul Peeling

Page 2: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

2

A path for how your team can

better explore, understand

and analyze data.

Page 3: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

3

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Generally Useful

Tools for

Analysis

Common

Infrastructure;

Tested and

Documented

Cost of asking a

question

Ease of scaling

to more people

Page 4: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

4

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

Generally Useful

Tools for

Analysis

• Goal is to be fast: reduce time to insight

Page 5: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

5

Overview of Flight Data

▪ 35 unique aircraft

▪ 180,000 unique flights

▪ 300 GB of data

▪ Source:

– NASA Dash Link: Sample Flight Data

– https://c3.nasa.gov/dashlink/projects/85/

Page 6: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

6

Big Data Creates Opportunities

Find rare events, then dive deeper

Build and validate test scenarios that match real-world conditions

Perform fleet-wide calculations

Page 7: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

7

Exploring a New Dataset

1. Control data import with ImportOptions

2. Working with table and timetable datatypes

3. Interacting with data in the Live Editor

4. Filling in outliers with a Live Task

5. Synchronizing time-based data

6. New geographic visualizations

Missing Dataismissing

rmmissing

fillmissing

Outliersisoutlier

rmoutliers

filloutliers

Change Pointsischange

Noisy Datasmoothdata

and more…

https://www.mathworks.com/help/

matlab/preprocessing-data.html

geoplot

geoscatter

geobubble

geodensityplot

https://www.mathworks.com/help/

matlab/geographic-plots.html

Page 8: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

8

Getting Started: Exploring a New Dataset

Page 9: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

9

Getting Started: Exploring a New Dataset

Page 10: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

10

Getting Started: Exploring a New Dataset

Page 11: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

11

Getting Started: Exploring a New Dataset

Page 12: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

12

Getting Started: Exploring a New Dataset

Page 13: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

13

Getting Started: Exploring a New Dataset

Page 14: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

14

Getting Started: Exploring a New Dataset

Page 15: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

15

Getting Started: Exploring a New Dataset

Page 16: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

16

Getting Started: Exploring a New Dataset

Page 17: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

17

Getting Started: Exploring a New Dataset

Page 18: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

18

Getting Started: Exploring a New Dataset

Page 19: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

19

Getting Started: Exploring a New Dataset

Page 20: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

20

Getting Started: Exploring a New Dataset

Page 21: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

21

Getting Started: Exploring a New DatasetMissing Dataismissing

rmmissing

fillmissing

Outliersisoutlier

rmoutliers

filloutliers

Change Pointsischange

Noisy Datasmoothdata

and more…

https://www.mathworks.com/help/

matlab/preprocessing-data.html

Page 22: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

22

Getting Started: Exploring a New Dataset

Page 23: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

23

Getting Started: Exploring a New Dataset

Page 24: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

24

Getting Started: Exploring a New Dataset

Page 25: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

25

Getting Started: Exploring a New Dataset

Page 26: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

26

Getting Started: Exploring a New Dataset

Page 27: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

27

Getting Started: Exploring a New Datasetgeoplot

geoscatter

geobubble

geodensityplot

https://www.mathworks.com/help/

matlab/geographic-plots.html

Page 28: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

28

Getting Started: Exploring a New Dataset

Page 29: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

29

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Explore and understand data

• Document analysis

• Tools will be re-used in next steps

Generally Useful

Tools for

Analysis

Page 30: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

30

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Apply to different datasets

• Functions/Scripts

• MATLAB Apps

• Trend: Work with BIG DATA

Generally Useful

Tools for

Analysis

Page 31: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

31

Reusable Tools for Big Data Analysis

1. Custom datastore for reading all the data

2. Extension of the datastore for specific queries

3. Extending datastore with tall variables

Page 32: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

32

Big Data Requires New Tools

Built-In Datastores

General datastore

spreadsheetDatastore

tabularTextDatastore

fileDatastore

Database databaseDatastore

Image imageDatastore

denoisingImageDatastore

randomPatchExtractionDatastore

pixelLabelDatastore

augmentedImageDatastore

Audio audioDatastore

Predictive

Maintenance

fileEnsembleDatastore

simulationEnsembleDatastore

Simulink SimulationDatastore

Automotive mdfDatastore

Page 33: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

33

Big Data Requires New Tools

▪ Customize a datastore to work with

your dataset

▪ Gives you control over how data is

loaded and formatted

▪ MATLAB subclass: “fill-in-the-blanks”

▪ Build a piece of infrastructure, then re-

use it in your analyses

function [data,info] = read(ds)

...

end

function tf = hasdata(ds)

...

end

function reset(ds)

...

end

function p = progress(ds)

...

end

function data = readall(ds)

...

end

Custom Datastore

Page 34: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

34

A Custom Datastore for Flight Data

Page 35: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

35

A Custom Datastore for Flight Data

Page 36: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

36

A Custom Datastore for Flight Data

Page 37: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

37

A Custom Datastore for Flight Data

Page 38: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

38

A Custom Datastore for Flight Data

Page 39: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

39

Find Rare Events, then Dive Deeper

Page 40: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

40

Find Rare Events, then Dive Deeper

Page 41: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

41

Find Rare Events, then Dive Deeper

Page 42: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

42

Find Rare Events, then Dive Deeper

Page 43: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

43

Big Data Requires New Tools

▪ When to use tall

– the function has already been implemented

– the output of the operation does not fit in memory

– multiple independent outputs are required

– the algorithm works over a moving window

Tall Arrays

Page 44: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

44

Perform Fleet-Wide Calculations

Page 45: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

45

Perform Fleet-Wide Calculations

Page 46: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

46

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Make it easy to navigate the data

• Re-use each time you analyze the dataset

Generally Useful

Tools for

Analysis

Page 47: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

47

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

Generally Useful

Tools for

Analysis

• Collaborate: Work with others on a common code base

• Verify: Write well-tested software

• Share: Build tools for others

Page 48: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

48

MATLAB for Data Science Teams

1. MATLAB Projects

2. Unit Testing

3. Toolbox Packaging

Page 49: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

49

MATLAB Projects

Page 50: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

50

MATLAB Projects

Page 51: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

51

MATLAB Projects

Page 52: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

52

Testing

Page 53: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

53

Testing

Page 54: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

54

Creating a Toolbox

Page 55: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

55

Creating a Toolbox

Page 56: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

56

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Scale-out to larger group of users

• Easier to maintain and share

Generally Useful

Tools for

Analysis

Page 57: Becoming a Data-Centric Engineering Team - MathWorks · Control data import with ImportOptions 2. Working with table and timetable datatypes 3. Interacting with data in the Live Editor

57

Takeaways

▪ MATLAB has many new tools to help

you better work with and utilize your

data

▪ Create tools for you / your team /

your organization to explore and

analyze data

▪ Increasing maturity with data science

is a journey; we’re here to help