convertirse en un equipo de ingenieros centrado en datos: … · 4 data science maturity levels...

Post on 09-Jul-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

2© 2015 The MathWorks, Inc.

Convertirse en un equipo de ingenieros centrado

en datos: ponerse al día con el aluvión de datos

Paula Poza

3

A path for how your team can

better work with and utilize

data.

4

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Generally Useful

Tools for

Analysis

Common

Infrastructure;

Tested and

Documented

Overhead when asking

a new question

Ease of scaling

to more people

5

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

Generally Useful

Tools for

Analysis

• Goal is to be fast: reduce time to insight

6

Getting Started: Exploring a New Dataset

7

Getting Started: Exploring a New Dataset

8

Getting Started: Exploring a New Dataset

9

Getting Started: Exploring a New Dataset Missing Dataismissing

rmmissing

fillmissing

Outliersisoutlier

rmoutliers

filloutliers

Change Pointsischange

Noisy Datasmoothdata

and more…

https://www.mathworks.com/help/

matlab/preprocessing-data.html

10

Getting Started: Exploring a New Dataset

11

Getting Started: Exploring a New Datasetgeoplot

geoscatter

geobubble

geodensityplot

https://www.mathworks.com/help/

matlab/geographic-plots.html

12

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Explore and understand data

• Document analysis

• Tools will be re-used in next steps

Generally Useful

Tools for

Analysis

13

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Apply to different datasets

• Functions/Scripts

• MATLAB Apps

• Trend: Work with BIG DATA

Generally Useful

Tools for

Analysis

14

Overview of Flight Data

▪ 35 unique aircraft

▪ 180,000 unique flights

▪ 300 GB of data

▪ Source:

– NASA Dash Link: Sample Flight Data

– https://c3.nasa.gov/dashlink/projects/85/

15

Big Data Creates Opportunities

Find rare events, then dive deeper

Build and validate test scenarios that match real-world conditions

Perform fleet-wide calculations

16

Big Data Requires New Tools

Built-In Datastores

General datastore

spreadsheetDatastore

tabularTextDatastore

fileDatastore

Database databaseDatastore

Image imageDatastore

denoisingImageDatastore

randomPatchExtractionDatastore

pixelLabelDatastore

augmentedImageDatastore

Audio audioDatastore

Predictive

Maintenance

fileEnsembleDatastore

simulationEnsembleDatastore

Simulink SimulationDatastore

Automotive mdfDatastore

17

Big Data Requires New Tools

▪ Customize a datastore to work with

your dataset

▪ Gives you control over how data is

loaded and formatted

▪ MATLAB subclass: “fill-in-the-blanks”

▪ Build a piece of infrastructure, then re-

use it in your analyses

function [data,info] = read(ds)

...

end

function tf = hasdata(ds)

...

end

function reset(ds)

...

end

function p = progress(ds)

...

end

function data = readall(ds)

...

end

Custom Datastore

18

A Custom Datastore for Flight Data

19

Find Rare Events, then Dive Deeper

20

Perform Fleet-Wide Calculations

21

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Make it easy to navigate the data

• Re-use each time you analyze the dataset

Generally Useful

Tools for

Analysis

22

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

Generally Useful

Tools for

Analysis

• Collaborate: Work with others on a common code base

• Verify: Write well-tested software

• Share: Build tools for others

23

MATLAB Projects

24

Testing

25

Creating a Toolbox

26

Data Science Maturity Levels

Ad-hoc

Individual

Analysis

Common

Infrastructure;

Tested and

Documented

• Scale-out to larger group of users

• Easier to maintain and share

Generally Useful

Tools for

Analysis

27

What’s Next?

Advanced Analytics and Machine Learning

Build and Test Algorithms for

Embedded Systems

Deploy Apps and Analytics to

Enterprise IT Systems

top related