Better Data, Better Science! [ Better Science through Better Data Management ]

Todd D. O’Brien, NOAA – NMFS - COPEPOD

“BETTER DATA” is …

• Easily Accessible

• Well Documented

• Integrated / Interlinked

• The Best Quality possible

Oops! (When Data Management Fails)

WHY QC?

• To find errors in the data …

– To detect instrument failure or sampling problems

– To detect phenomena of scientific interest

    • Natural physical or biological events

    • Something “new”

WHY QC?

• To find errors in the data … that were not present in the original data?!

– Data Pathway errors

    • human error

    • computer error
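Pathway errors of this kind can be caught by comparing a transferred or reformatted copy against the original, value by value. Below is a minimal Python sketch, assuming both versions can be read as CSV rows that share a `Station` key; the function name `compare_records`, the tolerance, and the two sample files (with a hypothetical dropped minus sign) are illustrative assumptions, not the author's procedure.

```python
import csv
import io


def compare_records(original, transferred, key="Station", tol=1e-6):
    """List (key, field, original_value, transferred_value) for every difference.

    Numeric strings are compared within `tol`; anything else must match exactly.
    Rows present on only one side are reported with field=None.
    Fields that exist only in the transferred copy are not checked.
    """
    old = {row[key]: row for row in original}
    new = {row[key]: row for row in transferred}
    problems = []
    for k in sorted(old.keys() | new.keys()):
        if k not in new:
            problems.append((k, None, "present", "missing"))
        elif k not in old:
            problems.append((k, None, "missing", "present"))
        else:
            for field, a in old[k].items():
                b = new[k].get(field)
                try:
                    same = abs(float(a) - float(b)) <= tol
                except (TypeError, ValueError):  # non-numeric text or a missing field
                    same = a == b
                if not same:
                    problems.append((k, field, a, b))
    return problems


# Hypothetical original file and a reformatted copy that lost a minus sign
ORIGINAL = "Station,Lon,Lat,Time\n1,-69.30732,39.86233,7:00\n2,-68.93825,38.70241,8:00\n"
COPY = "Station,Lon,Lat,Time\n1,-69.30732,39.86233,7:00\n2,68.93825,38.70241,8:00\n"

before = list(csv.DictReader(io.StringIO(ORIGINAL)))
after = list(csv.DictReader(io.StringIO(COPY)))
print(compare_records(before, after))
```

The comparison works on values rather than bytes, so a legitimate change of file format (delimiter, column order, number formatting) does not raise false alarms, while a changed sign or a lost row does.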

WHAT TO QC?

• Individual values (the measurements)?

• Profile of multiple values?

• Cruise of multiple profiles?

• Project of multiple cruises?

• Region or Ocean of multiple Projects?

• Entire World of multiple Regions?

What software, tools, and skills are available?

Station  Lon        Lat       Time   SPEED
1        -69.30732  39.86233  7:00
2        -68.93825  38.70241  8:00   29.21
3        -68.54282  37.30523  9:00   34.85
4        -67.96285  35.5917   10:00  43.42
5        -66.56567  33.1664   11:00  67.18
6        -66.11751  32.45462  12:00  20.19
7        -67.54106  34.58994  13:00  61.59
8        -65.03667  30.87291  14:00  107.57
9        -64.11399  30.84654  15:00  22.15
10       -63.56039  31.37378  16:00  18.35
11       -65.64299  34.53722  18:00  45.45
12       -67.35653  38.46515  19:00  102.85
13       -60.89783  38.14881  19:15  620.78
14       -67.67287  39.41418  20:00  220.55
15       -68.25284  40.38957  21:00  27.23

[Figure: station positions plotted with latitude 30 to 45 against longitude -75 to -60]

[Figure: chart with a vertical axis of 0 to 1000, bins from 0 to 12 in steps of 0.2 plus a “More” bin, and a second axis of 0% to 100%]

Let’s get started …

QC OF THE “WHAT & HOW”

• First understand the methods, variables, and units of the data before trying to QC the data

– Are all labels clear and unambiguous?

– Are methods provided (or a reference)?

– What are the value units?
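Part of this checklist can be automated when the data arrive with a per-variable data dictionary. A minimal sketch, assuming a hypothetical dictionary format in which each variable should carry `long_name`, `units`, and `method` entries (the field names and sample variables are invented for illustration). It only finds gaps; judging whether a label is actually unambiguous still needs a person.

```python
# Hypothetical minimum metadata each variable should have before its values are QC'd
REQUIRED = ("long_name", "units", "method")


def missing_metadata(variables):
    """Map each variable name to the required fields that are absent or blank."""
    report = {}
    for name, meta in variables.items():
        gaps = [f for f in REQUIRED
                if meta.get(f) is None or not str(meta[f]).strip()]
        if gaps:
            report[name] = gaps
    return report


# Hypothetical data dictionary for three columns
variables = {
    "TEMP": {"long_name": "sea water temperature", "units": "degC", "method": "CTD"},
    "BIOMASS": {"long_name": "zooplankton biomass", "units": " ", "method": "net tow"},
    "CHL": {"long_name": "chlorophyll a"},
}
print(missing_metadata(variables))
```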

QC OF THE “WHEN & WHERE”

• Primary Data:

– First, check the master ship record

– Then check the PI (principal investigator) files

• Simple Range Checks

– Time (0-23? 1-24?)

    • What is the time zone?

– Lat +/- 90, Lon +/- 180

    • Are hemisphere signs present (E/W) or described?
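These range checks are simple to script. A minimal sketch under stated assumptions: positions are in signed decimal degrees, hours are integers on either a 0-23 or a 1-24 convention (selected by the hypothetical `hour_base` parameter), and the expected hemisphere is known from the metadata. The function names are illustrative. A range check cannot answer the time-zone question; that has to come from the documentation.

```python
def range_problems(lat, lon, hour, minute=0, hour_base=0):
    """Messages for a position/time that fails basic range checks.

    hour_base=0 expects hours 0-23; hour_base=1 expects hours 1-24.
    """
    problems = []
    if not -90 <= lat <= 90:
        problems.append(f"latitude {lat} outside -90..90")
    if not -180 <= lon <= 180:
        problems.append(f"longitude {lon} outside -180..180")
    if not hour_base <= hour <= hour_base + 23:
        problems.append(f"hour {hour} outside {hour_base}..{hour_base + 23}")
    if not 0 <= minute <= 59:
        problems.append(f"minute {minute} outside 0..59")
    return problems


def hemisphere_mismatches(lons, hemisphere="W"):
    """Indices of longitudes whose sign contradicts the documented hemisphere."""
    if hemisphere == "W":
        return [i for i, x in enumerate(lons) if x > 0]
    return [i for i, x in enumerate(lons) if x < 0]


print(range_problems(39.86233, -69.30732, 7))   # a plausible station
print(range_problems(95.0, -200.0, 24))          # fails the latitude, longitude, and hour checks
print(hemisphere_mismatches([-69.30732, 68.93825, -68.54282]))
```

With `hour_base=1`, an hour of 24 passes and an hour of 0 fails, which is why the convention has to be established before the check is run.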

• Map the Cruise Track

– sorted by station sequence

– sorted by sampling time
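Plotting the track both ways is the real check, but the core comparison can also be scripted: if the stations sorted by number and sorted by time do not come out in the same order, at least one station number or time is suspect. A minimal sketch, assuming times are same-day `HH:MM` strings (a cruise that crosses midnight would need full timestamps); the times are those of the 15-station example table, and the 15:30 value is a hypothetical error.

```python
def to_minutes(hhmm):
    """'19:15' -> 1155 minutes after midnight (same-day times only)."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)


def out_of_order(stations):
    """Station numbers that hold a different rank in time order than in station order.

    `stations` is a list of (station_number, minutes_after_midnight) pairs.
    """
    by_station = [s for s, _ in sorted(stations, key=lambda r: r[0])]
    by_time = [s for s, _ in sorted(stations, key=lambda r: r[1])]
    return [a for a, b in zip(by_station, by_time) if a != b]


TIMES = ["7:00", "8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00",
         "15:00", "16:00", "18:00", "19:00", "19:15", "20:00", "21:00"]
stations = [(n, to_minutes(t)) for n, t in enumerate(TIMES, start=1)]
print(out_of_order(stations))   # station order and time order agree

# Hypothetical error: station 11 logged at 15:30 instead of 18:00
bad = [(n, to_minutes("15:30") if n == 11 else m) for n, m in stations]
print(out_of_order(bad))
```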

• Calculate ship speed (distance/time) between stations

Station  Lon        Lat       Time   SPEED
1        -69.30732  39.86233  7:00
2        -68.93825  38.70241  8:00   29.21
3        -68.54282  37.30523  9:00   34.85
4        -67.96285  35.5917   10:00  43.42
5        -66.56567  33.1664   11:00  67.18
6        -66.11751  32.45462  12:00  20.19
7        -67.54106  34.58994  13:00  61.59
8        -65.03667  30.87291  14:00  107.57
9        -64.11399  30.84654  15:00  22.15
10       -63.56039  31.37378  16:00  18.35
11       -65.64299  34.53722  18:00  45.45
12       -67.35653  38.46515  19:00  102.85
13       -60.89783  38.14881  19:15  620.78
14       -67.67287  39.41418  20:00  220.55
15       -68.25284  40.38957  21:00  27.23
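The SPEED column can be reproduced, to the two decimals shown, as the straight-line distance in decimal degrees divided by the elapsed time in days (what a spreadsheet gives when times are stored as fractions of a day). That is an inference from the numbers, not something the slides state. Because a degree of longitude shrinks with latitude, the sketch below also computes a great-circle speed in knots. The example positions are far apart relative to the hourly times, so a fixed limit set from a research vessel's top speed would flag nearly every leg. The sketch therefore flags legs faster than three times the median leg speed, a hypothetical rule; with real cruise data, a limit based on the vessel's actual top speed is the more natural test.

```python
import math
from statistics import median

# (Station, Lon, Lat, Time) copied from the table above
ROWS = [
    (1, -69.30732, 39.86233, "7:00"), (2, -68.93825, 38.70241, "8:00"),
    (3, -68.54282, 37.30523, "9:00"), (4, -67.96285, 35.5917, "10:00"),
    (5, -66.56567, 33.1664, "11:00"), (6, -66.11751, 32.45462, "12:00"),
    (7, -67.54106, 34.58994, "13:00"), (8, -65.03667, 30.87291, "14:00"),
    (9, -64.11399, 30.84654, "15:00"), (10, -63.56039, 31.37378, "16:00"),
    (11, -65.64299, 34.53722, "18:00"), (12, -67.35653, 38.46515, "19:00"),
    (13, -60.89783, 38.14881, "19:15"), (14, -67.67287, 39.41418, "20:00"),
    (15, -68.25284, 40.38957, "21:00"),
]


def hours(hhmm):
    """'19:15' -> 19.25 (assumes all times fall on the same day)."""
    h, m = hhmm.split(":")
    return int(h) + int(m) / 60


def degree_speed(a, b):
    """Straight-line distance in decimal degrees divided by elapsed time in days."""
    days = (hours(b[3]) - hours(a[3])) / 24
    return math.hypot(b[1] - a[1], b[2] - a[2]) / days


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


def knots(a, b):
    """Speed over the leg a -> b in nautical miles (1.852 km) per hour."""
    km = haversine_km(a[1], a[2], b[1], b[2])
    return km / 1.852 / (hours(b[3]) - hours(a[3]))


# One entry per leg, labelled by the station the leg ends at
legs = [(b[0], degree_speed(a, b), knots(a, b)) for a, b in zip(ROWS, ROWS[1:])]
typical = median(kn for _, _, kn in legs)
# Hypothetical rule: flag legs faster than 3x the cruise's median speed
flagged = [stn for stn, _, kn in legs if kn > 3 * typical]

for stn, deg_day, kn in legs:
    mark = "  <-- check" if stn in flagged else ""
    print(f"{stn:>2}  {deg_day:8.2f} deg/day  {kn:7.1f} kn{mark}")
```

The two flagged legs are the ones into and out of station 13, which points at that station's position or time rather than at the ship.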

QC OF THE “HOW MUCH”

• First, look at the background environment

– Check for depth inversions

– Check for density inversions

– Look at the T vs. S (temperature vs. salinity) plot
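The two inversion checks reduce to comparing each sample with the one above it. A minimal sketch, assuming a single profile listed in sampling order and densities already computed from temperature and salinity (for example with a TEOS-10 implementation); the tolerance and the sample values are hypothetical. The T vs. S plot is still best inspected by eye.

```python
def depth_inversions(depths):
    """Indices i where depths[i] is not deeper than depths[i - 1] (inversion or repeat)."""
    return [i for i in range(1, len(depths)) if depths[i] <= depths[i - 1]]


def density_inversions(depths, densities, tol=0.03):
    """Indices i where density decreases by more than `tol` from sample i - 1 to i.

    Requires strictly increasing depths, so the comparison runs top to bottom.
    """
    if depth_inversions(depths):
        raise ValueError("depths are not strictly increasing; fix the depth order first")
    return [i for i in range(1, len(densities))
            if densities[i] < densities[i - 1] - tol]


print(depth_inversions([0, 10, 25, 20, 50]))
# Hypothetical sigma-theta values (kg/m^3) at 0, 10, 20, 30, and 50 m
print(density_inversions([0, 10, 20, 30, 50], [24.9, 25.0, 25.3, 25.1, 25.6]))
```

Checking depth order first matters: a swapped pair of depths would otherwise show up as a spurious density inversion.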

• Look at the variable vs. depth

• Check against basic value ranges

• Check for excessive gradients (spikes) between values at adjacent depths

[Figure: profile of Measurement (0 to 15) against Depth (0 to 160)]
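Both checks can be written in a few lines. A minimal sketch, assuming one profile sorted by increasing depth; the plausible range and the maximum gradient are hypothetical placeholders, since real limits depend on the variable, the region, and the depth (operational schemes such as Argo real-time QC define their own). Here a spike is a sample whose gradients to both neighbors exceed the limit with opposite signs, so a single bad value is flagged once rather than at both of its edges.

```python
def range_flags(values, lo, hi):
    """Indices of values outside the plausible range [lo, hi]."""
    return [i for i, v in enumerate(values) if not lo <= v <= hi]


def spike_flags(depths, values, max_gradient):
    """Indices of interior samples that jump away from both neighbors.

    A sample is flagged when the gradients (value change per unit depth) to the
    sample above and to the sample below both exceed max_gradient in magnitude
    and have opposite signs. Depths must increase strictly.
    """
    flags = []
    for i in range(1, len(values) - 1):
        up = (values[i] - values[i - 1]) / (depths[i] - depths[i - 1])
        down = (values[i + 1] - values[i]) / (depths[i + 1] - depths[i])
        if abs(up) > max_gradient and abs(down) > max_gradient and up * down < 0:
            flags.append(i)
    return flags


# Hypothetical profile: depths in m, values in the variable's own units
depths = [0, 20, 40, 60, 80, 100, 120]
values = [12.0, 11.8, 11.5, 4.0, 10.9, 10.2, 41.0]
print(range_flags(values, 0.0, 35.0))     # hypothetical plausible range
print(spike_flags(depths, values, 0.2))   # hypothetical limit: 0.2 units per m
```

The end points have only one neighbor, so the spike test cannot judge them; the range check still catches the bad value at the bottom of this profile.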

Expert / Specialist Data Centers

• Can provide guidance on

– Metadata (standards, minimum requirements)

– Data Formats (format suggestions / review)

– Tools and Methods

• May have advanced visualization or QC methods available for your data.

Empirical Comparisons with Historical Observations (ECHO)

Expert / Specialist Data Centers (just a few examples)

• CCHDO - CLIVAR Carbon & Hydrographic Data Office

• BCO-DMO - Biological and Chemical Oceanography Data Management Office

• BODC - British Oceanographic Data Centre

• COPEPOD - Coastal & Oceanic Plankton Ecology, Production & Observation Database

The Conclusions

Some Conclusions

• Each additional layer of QC and examination may highlight issues that were previously undetected.

• Each transfer or reformatting of the data is a chance to introduce new errors (or data loss).

• The comprehensiveness of the co-stored metadata will determine the extent to which the data are still usable/understandable 10+ years after the project.

“BETTER DATA” is …

• Easily Accessible

• Well Documented

• Integrated / Interlinked

• The Best Quality possible