ecoinformatics: managing data, rescuing data and …lessons learned 3. build a database partnership...

62
Ecoinformatics: Managing Data, Rescuing Data and Changing the Scientific Culture William Michener LTER Network Office Department of Biology University of New Mexico

Upload: others

Post on 04-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Ecoinformatics:Managing Data, Rescuing Data and

Changing the Scientific Culture

William MichenerLTER Network Office

Department of BiologyUniversity of New Mexico

Page 2: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Today’s Road Map

• Science Challenges• Ecoinformatics• Data Rescue• Changing the Culture of Science

Page 3: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Today’s Road Map

• Science Challenges• Ecoinformatics• Data Rescue• Changing the Culture of Science

Page 4: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Abstraction of Phenomena

Date Site Species Density 10/1/1993 N654 Picea

rubens 13

10/3/1994 N654 Picea rubens

14.5

10/1/1993 N654 Betula papyifera

3

10/31/1993 1 Picea rubens

13.5

10/31/1993 1 Betula papyifera

1.6

11/14/1994 1 Picea rubens

8.4

11/14/1994 1 Betula papyifera

1.8

(Michener, 2000)

Page 5: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Characteristics of Ecological Data

Complexity/Metadata RequirementsComplexity/Metadata Requirements

SatelliteImages

DataDataVolumeVolume(per(perdataset)dataset)

LowLow

HighHigh

HighHigh

Soil CoresSoil Cores

PrimaryPrimaryProductivityProductivity

GISGIS

Population DataPopulation Data

BiodiversityBiodiversitySurveysSurveys

Gene Sequences

Business Data

WeatherStations Most EcologicalMost Ecological

DataData

Most Most SoftwareSoftware

Page 6: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Data Integration

• Syntax and Schema transformations• Semantic conversion

Page 7: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Semantics—Linking Taxonomic Semantics to Ecological Data

Elliot 1816

Rhynchospora plumosa s.l.

Gray 1834

Kral 1998

Peet 2002?

R. plumosaTaxon concepts change over time (and space)Multiple competing concepts coexistNames are re-used for multiple concepts Chapman

1860

R. plumosa

R. Plumosav. intermedia

R. plumosav. plumosa

R. Plumosav. interrupta

Date Species # 1830 R.plumosa 39 1840 R.plumosa 49 1900 R.plumosa 42 1985 R.plumosa 48 1995 R.plumosa 22 2000 R.plumosa 19

0

10

20

30

40

50

60

1/ 1/ 00 1/ 2/ 00 1/ 3/ 00 1/ 4/ 00 1/ 5/ 00 1/ 6/ 00

R. intermedia

R. pineticola R. plumosa

R. plumosav. pinetcola

R. plumosav. plumosa R. sp. 1

A B Cfrom R. Peet

Page 8: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Data EntropyIn

form

atio

n C

onte

nt

Time

Specific details

General details

Accident

Retirement or career change

Death

(Michener et al. 1997)

Time of publication

Page 9: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Today’s Road Map

• Science Challenges• Ecoinformatics• Data Rescue• Changing the Culture of Science

Page 10: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Ecoinformatics

• “a broad interdisciplinary science that incorporates conceptual approaches and practical tools for the generation, processing, understanding and dissemination of ecological data and information.”

Page 11: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Experimental DesignMethods

Data DesignData Forms

Field Computer EntryElectronically

Interfaced FieldEquipment

ElectronicallyInterfaced Lab

Equipment

Raw Data File

Quality Assurance Checks

Data Contamination

Data verified?

Archive Data File

yes

Research ProgramInvestigators

StudiesQuality Control

Data Entry

no

Summary AnalysesMetadataData Validated

Archival Mass Storage Off-site StorageMagnetic Tape / Optical Disk / PrintoutsInvestigators Secondary UsersAccess Interface

PublicationSynthesis

Page 12: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Data Design

• Conceptualize and implement a logical structure within and among data sets that will facilitate data acquisition, entry, storage, retrieval and manipulation.

Page 13: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

7 Habits of Highly Effective Data Set Design

1. Assign descriptive file names2. Use consistent and stable file formats3. Define the parameters4. Use consistent data organization5. Perform basic quality assurance6. Assign descriptive data set titles7. Provide comprehensive documentation

(metadata)

Adapted from Cook et al. 2000

Page 14: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

1. Assign descriptive file names

• File names should be unique and reflect the file contents– Bad file names

• Mydata• 2001_data

– A better file name• Sevilleta_LTER_NM_2001_NPP.asc

– Sevilleta_LTER is the project name– NM is the state abbreviation– 2001 is the calendar year– NPP represents Net Primary Productivity data– asc stands for the file type--ASCII

Page 15: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

2. Use consistent and stable file formats

• Use ASCII file formats – avoid proprietary formats• Be consistent in formatting

– don’t change or re-arrange columns– include header rows (first row should contain file name, data

set title, author, date, and companion file names)– column headings should describe content of each column,

including one row for parameter names and one for parameter units

– within the ASCII file, delimit fields using commas, pipes (|), tabs, or semicolons (in order of preference)

Page 16: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

3. Define the parameters

• Use commonly accepted parameter names that describe the contents (e.g., precip for precipitation)

• Use consistent capitalization (e.g., not temp, Temp, and TEMP in same file)

• Explicitly state units of reported parameters in the data file and the metadata (SI units are recommended)

• Choose a format for each parameter, explain the format in the metadata, and use that format throughout the file– e.g., use yyyymmdd; January 2, 1999 is 19990102

Page 17: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

4. Use consistent data organization (one good approach)

-99991919961003HOGI

31419961002HOGI

01219961001HOGI

mmCYYYYMMDDUnits

PrecipTempDateStation

Note: -9999 is a missing value code for the data set

Page 18: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

4. Use consistent data organization (a second good approach)

mm3Precip19961002HOGI

mm0Precip19961001HOGI

C14Temp19961002HOGI

C12Temp19961001HOGI

UnitValueParameterDateStation

Page 19: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

5. Perform basic quality assurance

• Assure that data are delimited and line up in proper columns

• Check that there no missing values for key parameters

• Scan for impossible and anomalous values• Perform and review graphical & statistical

summaries• Map location data (lat/long) and assess errors• Verify automated data transfers• For manual data transfers, consider double

keying data and comparing 2 data sets

Page 20: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

6. Assign descriptive data set titles

• Data set titles should ideally describe the type of data, time period, location, and instruments used (e.g., Landsat 7).

• Titles should be restricted to 80 characters.• Data set title should be similar to names of

data files– Good: “Shrub Net Primary Productivity at the

Sevilleta LTER, New Mexico, 2000-2001”– Bad: “Productivity Data”

Page 21: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

7. Provide comprehensive documentation (metadata)

Page 22: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

What are metadata?

“Data about data”

or, more appropriately,

“the information necessary to understand and effectively use the data”

Metadata?

Page 23: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Metadata helps you decide which can you would like to eat !

Page 24: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Metadata Content Specifications

• Dublin Core• NBII Biological Data Profile / CSDGM• ISO CD 19115, Geographic information

- metadata • Darwin Core • Ecological Metadata Language (EML)

Page 25: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

ESRI ArcCatalog Metadata creation/import selections

Metadata Tab

MetadataSections

MetadataParts

Catalog list

Page 26: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

An Example EML Document<?xml version="1.0"?><eml:eml packageId="piscoUCSB.5.20" system="knb" xmlns:eml="eml://ecoinformatics.org/eml-2.0.0"><dataset>

<shortName>Alegria Temperatures</shortName><title>PISCO: Intertidal Temperature Data:

Alegria, California: 1996-1997</title><creator id="C.Blanchette">

<individualName><givenName>Carol</givenName><surName>Blanchette</surName>

</individualName><organizationName>PISCO</organizationName>

<address><deliveryPoint>UCSB Marine Science

Institute</deliveryPoint><city>Santa Barbara</city><administrativeArea>CA</administrativeArea><postalCode>93106</postalCode>

</address></creator><abstract>

<para>These temperature data were collected at Alegria Beach, California, and were ...

</para></abstract><keywordSet>

<keyword>OceanographicSensorData</keyword><keyword>Thermistor</keyword>

<keywordThesaurus>PISCOCategories

</keywordThesaurus></keywordSet><intellectualRights><para>Please contact the

authors for permission to use these data. Please also acknowledge the authors in any publications.</para>

</intellectualRights><contact>

<references>C.Blanchette</references></contact>

</dataset></eml:eml>

Transform

Page 27: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

MorphoEcological Metadata Management Software

Page 28: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

How much work is this going to

be???

Page 29: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Rules of Thumb (Michener 2000)

• the more comprehensive the metadata, the greater the longevity (and value) of the data

• structured metadata can greatly facilitate data discovery, encourage “best metadata practices” and support data and metadata use by others

• metadata implementation takes time!!!• start implementing metadata for new data

collection efforts and then prioritize “legacy”and ongoing data sets that are of greatest benefit to the broadest user community

Page 30: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Make metadata implementation a team effort! Include: • Team Leader

• GIS Specialist

• Field Personnel

• Database Manager

• Laboratory Specialist

• Voucher/Repository Specialist

• And others as appropriate….

Page 31: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

An Idealized Information Environment

National/Regional Systems

“Value-Added” or “Integrated”Infobases

ResearchersResearchers

Individual datasetsIndividual datasets

Project or SiteProject or Site--Based SystemsBased Systems

Page 32: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Individual Datasets

Metadata Data

Page 33: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Site & Project Systems

Page 34: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Regional & National

Page 35: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

• Climate database integrates data from a number of sites

Integrated “Value Added” Systems

Page 36: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Key Elements Needed at each level

• Site/Project– Metadata – ideally, standards-based

• National or Network– Consistent keyword vocabularies– Standards for metadata content

• “Value Added” – Domain Expertise– Need for structured metadata– Standards for data products

Page 37: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Lessons Learned

1. Keep data close to home

• Experts on a data set are those who collected the data

Page 38: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Lessons Learned

2. Build a modular system

• Traditional approach of building a single, large system is unwieldy and inflexible

• Each module should be able to take advantage of new technologies

Page 39: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Lessons Learned

3. Build a database partnership

• Successful design and implementation depends on combining expertise of:– Scientists (what is needed)– IM’s (how to make it work)– Computer scientists (software tools)In that order!

Page 40: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Lessons Learned

4. Make extensive use of prototypes

• Describing a system as “user friendly” does not make it so—testing is required

• Prototypes lend themselves to the exploitation of new technologies and ideas; prototypes are inexpensive so you can have lots of them.

Page 41: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Lessons Learned

5. Data should be expected to far outlive the database

• Databases are ephemeral• Have a clear exit strategy

Page 42: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Lessons Learned

6. Give something back

• If you build it, it does not mean that they will come

• Data contributors should receive something of value back– “If we do this, we both

benefit.”

Page 43: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Today’s Road Map

• Science Challenges• Ecoinformatics• Data Rescue• Changing the Culture of Science

Page 44: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

What is responsible for the apparent

rise/decrease in ______???

Page 45: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

What relevant data exist?

Page 46: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Where are those data?

Page 47: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

What do the data mean?

Page 48: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

(How) Can I use the data?

Page 49: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

(How) Can I integrate the various data sources?

Weeks > Months > Years

Page 50: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Finding Data & Metadata

• Colleagues• Scientific literature• WWW searches• Data and metadata registries

– Global Change Master Directory• Metadata Clearinghouses

– National Biological Information Infrastructure– NODC

Page 51: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –
Page 52: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –
Page 53: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Today’s Road Map

• Science Challenges• Ecoinformatics• Data Rescue• Changing the Culture of Science

Page 54: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Planning

Analysis and

modeling

Cycles of Research“A Conventional View”

Publication

s Data

ProblemCollection

Page 55: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Cycles of Research“A New View”

Planning

Analysis and

modeling

Planning

Selection andextraction

OriginalObservations

SecondaryObservations

Publication

sArchive of Data

Collection

Problem Definition(Research Objectives)

Page 56: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Reasons to Not Share Data:

My data will be misinterpreted ...

I will get to publishing on

it later …

Someone will find errors …

I will be scooped …

Page 57: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Benefits of Data Sharing• Publicity, accolades, media attention• Renewed or increased funding • Teaching:

• long-term data sets adapted for teaching & texts

• Archival: back-up copy of critical data sets • Research:

• new synthetic studies • peer-reviewed publications

• Document global and regional change• Conservation and resource management:

• species and natural areas protection• new environmental laws

Page 58: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Culture Change

• Scientific societies• Funding agencies and programs• Universities and science/resource

management agencies

Page 59: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Increasing value of data over time

Dat

a V

alu

e

Time

SerendipitousDiscovery

Inter-siteSynthesis

Gradual IncreaseIn Data Equity

Methodological Flaws, Instrumentation

Obsolescence

Non-scientific Monitoring

Page 60: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Thanks !!!

Page 61: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

References

Brunt (2000) Ch. 2 in Michener and Brunt (2000)

Porter (2000) Ch. 3 in Michener and Brunt (2000)

Edwards (2000) Ch. 4 in Michener and Brunt (2000)

Michener (2000) Ch. 7 in Michener and Brunt (2000)

Cook, R.B., R.J. Olson, P. Kanciruk, and L.A. Hook. 2000. Best practices for preparing ecological and ground-based data sets to share and archive. (online at http://www.daac.ornl.gov/cgi-bin/MDE/S2K/bestprac.html)

Page 62: Ecoinformatics: Managing Data, Rescuing Data and …Lessons Learned 3. Build a database partnership • Successful design and implementation depends on combining expertise of: –

Metadata Resources• Michener, W.K. 2000. Metadata. In: Ecological Data: Design,

Management and Processing. (eds. W.K. Michener and J.W. Brunt), pp. 92-116. Blackwell Science, Oxford, United Kingdom.

• Michener, W.K., J.W. Brunt, J.J. Helly, T.B. Kirchner, and S.G. Stafford. 1997. Nongeospatial metadata for the ecological sciences. Ecological Applications 7(1):330-342.

• http://knb.ecoinformatics.org and • http://seek.ecoinformatics.org -- for ecologists• http://www.w3.org/DesignIssues -- for technologists