Building a Usable Infrastructure for e-Science: An Information Perspective

Christine L. Borgman
Professor & Presidential Chair in Information Studies
University of California, Los Angeles

e-Science All Hands Meeting, Nottingham, 20 September 2005

These slides are available under a Creative Commons Non-commercial Attribution License, Christine L. Borgman, 2005: http://creativecommons.org



e-Science Goals

• to enable new forms of science that are
  – information-intensive
  – data-intensive
  – distributed
  – collaborative
  – multi-disciplinary

• to use information technology to
  – leverage data as a form of science capital
  – manage the “data deluge”
  – improve access to scientific information



[Figure: e-Science infrastructure: Layered Model. Layers shown: applications and content space (digital libraries, scientific databases, user interfaces & tools); information & knowledge layer; middleware services layer; ICT infrastructure (processors, memory, network).]

Slide courtesy of Stephen Griffin, NSF, and Norman Wiseman, JISC


Usability

• Screen displays

• Work practices

• Culture of science

• Incentives of scientists

• Economics, law, policy, and institutions of science



An infrastructure OF or FOR information?

– OF information:
  • A framework to support any kind of information
  • Bits, objects, independent of context

– FOR information:
  • Fits into work practices
  • Facilitates communication between groups
  • Provides context for interpretation, use, re-use of information
  • Reflects the incentives of scientists
  • Provides permanent access to information



Value chain of information

• Relationships
  – Scientific basis
  – Sources
  – Methods
  – History
  – Provenance

• Networks of
  – publications
  – data
  – composite objects


Image: http://www.indexgeo.com.au/tech/asdd/discover.gif


[Figure: eBank Project. The entire e-Science cycle, encompassing experimentation, analysis, publication, research, learning. Components shown: Grid; e-Experimentation; data, metadata & ontologies; certified experimental results & analyses; preprints & metadata; peer-reviewed journal & conference papers; technical reports; reprints; institutional archive; local web publisher; holdings; digital library; virtual learning environment. Users shown: e-scientists, graduate students, undergraduate students.]

Slide courtesy of Liz Lyon, UKOLN


Crystallographic e-Prints: direct access to raw data from scientific papers

Raw data sets can be very large; these are stored at the National Datastore using an SRB (Storage Resource Broker) server.

Slide courtesy of Jeremy Frey & Tony Hey


[Figure: British Atmospheric Data Centre and British Oceanographic Data Centre, with simulations and assimilation]

Complexity + Volume + Remote Access = Grid Challenge

Slide courtesy of Bryan Lawrence & Tony Hey


[Image: Roman Forum, Western End, ca. 400 AD, copyright Regents of the University of California]


Role of publications in science

• Product of research

• Cumulative, historical record of science

• Input to research

• Value chain: Network of documents linked via citations


Image: http://www.bronxville.k12.ny.us/Library/Good_Library_person.jpg


Access to scientific publications

• Libraries
  – Paper journals via subscription
  – Electronic journals via leased access
  – Control via bibliographic records (metadata)

• Colleagues
  – Pre-prints in disciplinary repositories
  – Private circulation


Image: http://siggy.chem.ucla.edu/Visit_UCLA/Visit_UCLA.html


Role of data in e-Science

• Data-centric collaboration
• Data as product of research?
• Data as input to research?
• Value chain
  – Data to data links?
  – Provenance?
  – Data to publication links?


Image: http://quake.wr.usgs.gov/research/deformation/twocolor/lvnet.gif


What are data in science?

• Ecology: weather, ground water, sensor readings, historical record
• Medicine: x-rays
• Chemistry: protein structures
• Astronomy: spectral surveys
• Biology: specimens
• Physics: events, objects
• Documentation: lab and field notebooks, spreadsheets


Image: http://cdiac.ornl.gov/oceans/NAtl_map.jpg


When are data?

• Instrument readings or scientific fact?

• Events or findings?

• When to trust data

• Factual status
  – What to release
  – When to release


CENS Image: New York Times


Monitoring habitat with sensor networks

[Figure: “backbone” network; Contaminant Transport Group; adapted from CA DWR website]

• Multimedia, multiscale problems (time and space)
• Multidisciplinary (current and as yet unknown) problems
• Management, visualization, exploration of massive, heterogeneous data streams


How are data documented?

• Standards
  – Metadata standards within fields
  – Ontologies within fields

• Practices
  – Project-specific data models
  – Instrument-specific models
  – Researcher-specific models

• Current data
  – Born digital

• Legacy data
  – Born digital in other formats
  – Paper, other media
  – Documented by project, instrument, researcher…

Image source: http://www.medscape.com/content/2004/00/46/81/468129/art-mgm468129.fig1.jpg


What data are retained for re-use?

• Genomics: deposit expected

• Physics: shared by collaborators, not openly published

• Chemistry: highly contentious

• Ecology: many small, local projects, local data


Image source: http://www.bbc.co.uk/schools/gcsebitesize/img/ict04datastorage.gif


Under what conditions can data be shared, re-used?

• Funding source
  – Public: access may be mandatory
  – Private: access may be limited
  – Public-private partners: negotiated

• Economic (resale) value of data
  – Chemistry: very high
  – Stock market, geospatial: time-dependent
  – Particle physics: low


Image: http://www.britishcouncil.org/global-common-330x220-pound-sign.jpg


Under what conditions can data be shared, re-used?

• Privacy, confidentiality
  – Sciences (e.g., atoms, molecules, genomes): low
  – Sciences (e.g., endangered species): high
  – Medicine (e.g., patient records): high
  – Social sciences (e.g., interviews, observations): high

• Security
  – Authorizing access
  – Security practices


Image: Christine L. Borgman, 2005


Who controls, who owns data?

• Ownership vs. control
  – Own and control
  – Control but not own
  – Own but not control

• Who can authorize release of data?
  – Investigator
  – University intellectual property office
  – Funding agency
  – Collaboration partner

• What intellectual property practices, rules, laws govern?


Image source: http://www.nelsonmullins.com/legal-practice-area/Practice_Insets/Intellectual-Property-Inter.jpg


How to access scientific data?

• Private access
  – Request data from another investigator
  – Train researcher in a novel method or technique
  – Barter data for research funds, access to labs, corporate partnership…

• Open, public access
  – Data repositories: BODC, BADC, BIRN, NEON, GEON, UKDA…
  – Data posted on local portal, website


Image: Christine L. Borgman, 1995


What data deserve to be permanently accessible?

• What are the scientific criteria for preservation?

• What is the equivalent of peer review for data?

• Whose data do you trust?
• What data will be re-used?
• How much to invest?
• Who will add the value?


Image: Christine L. Borgman, 2005


Incentives to share data

• Tradition of “open science”

• Replicate, compare results

• Ask new questions

• Form multi-disciplinary alliances

• Required by funding agency or journal


Image source: www.buffaloworks.us/images/sharing%20orangs.jpg


Incentives not to share data

• Rewards for publication, not for data management

• Effort to document data

• Concern for “free riders”

• Risks of misinterpretation of data

• Risks of losing control over data

• Risks of loss of intellectual property


Image source: www.buildingsrus.co.uk/.../ target1.htm


Content and context

• Scholarly publications provide context
  – Literature review, history of problem, definitions of terms
  – Theory, hypotheses, goals
  – Research method, discussion of results
  – Cumulation of scientific knowledge

• Datasets, repositories remove context
  – Data elements, names of variables
  – Instrument readings
  – Numerical, textual data
  – Images, descriptions of artifacts



Constructing the value chain

• Links between data, documents, and objects
  – Based on common standards
  – Robust over time
  – Robust over migration of software and hardware

• Metadata
  – Based on common standards
  – Describe what can be done with them
  – Describe conditions for use

• Permanent access
  – Incentives
  – Curatorial expertise
  – Institutional models



An infrastructure for information

• Support work practices
  – Tools and services to capture, retain, document

• Facilitate communication, exchange
  – Methods to describe, cite documents and data
  – Methods to represent and use composite objects
  – Robust linking

• Reflect scientific incentives
  – Rewards for data contribution
  – Rewards for data management

Image source: http://clubs.myams.org/gvca/images/Context-Meaning_web.gif


Involve the stakeholders

• Scientists
• Technologists
• Social scientists
• Librarians, archivists
• Universities
• Funding agencies
• Corporate partners
• Publishers

Image source:http://www.ox.ac.uk/


“May all your problems be technical”

Jim Gray, ACM Turing Award winner