1
Building a Usable Infrastructure for e-Science:
An Information Perspective
Christine L. Borgman
Professor & Presidential Chair in Information Studies
University of California, Los Angeles
e-Science All Hands Meeting, Nottingham
20 September 2005
These slides are available under Creative Commons Non-commercial Attribution License, Christine L. Borgman, 2005
http://creativecommons.org
2
e-Science Goals
• to enable new forms of science that are
– information-intensive
– data-intensive
– distributed
– collaborative
– multi-disciplinary
• to use information technology to
– leverage data as a form of science capital
– manage the “data deluge”
– improve access to scientific information
3
e-Science infrastructure: Layered Model
[Diagram labels: Applications Space; Content; Digital Libraries; Scientific DBs; User Interfaces & Tools; Information & knowledge layer; Middleware services layer; ITC Infrastructure (processors, memory, network)]
Slide courtesy of Stephen Griffin, NSF, and Norman Wiseman, JISC
4
Usability
• Screen displays
• Work practices
• Culture of science
• Incentives of scientists
• Economics, law, policy, and institutions of science
5
An infrastructure OF or FOR information?
– OF information:
• A framework to support any kind of information
• Bits, objects, independent of context
– FOR information:
• Fits into work practices
• Facilitates communication between groups
• Provides context for interpretation, use, re-use of information
• Reflects the incentives of scientists
• Provides permanent access to information
6
Value chain of information
• Relationships
– Scientific basis
– Sources
– Methods
– History
– Provenance
• Networks of
– publications
– data
– composite objects
Image: http://www.indexgeo.com.au/tech/asdd/discover.gif
7
eBank Project
Entire E-Science Cycle: encompassing experimentation, analysis, publication, research, learning
[Diagram labels: Grid; E-Scientists; E-Experimentation; Institutional Archive; Local Web; Publisher Holdings; Digital Library; Graduate Students; Undergraduate Students; Virtual Learning Environment; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses; Data, Metadata & Ontologies]
Slide courtesy of Liz Lyon, UKOLN
8
Crystallographic e-Prints
Direct Access to Raw Data from scientific papers
Raw data sets can be very large and these are stored at National Datastore using SRB server
Slide courtesy Jeremy Frey & Tony Hey
9
British Atmospheric Data Centre
British Oceanographic Data Centre
Simulations
Assimilation
Complexity + Volume + Remote Access = Grid Challenge
Slide courtesy Bryan Lawrence & Tony Hey
10
Image: Roman Forum, Western End, ca. 400 AD, copyright Regents of the University of California
11
Role of publications in science
• Product of research
• Cumulative, historical record of science
• Input to research
• Value chain: Network of documents linked via citations
Image: http://www.bronxville.k12.ny.us/Library/Good_Library_person.jpg
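The idea of a value chain as a network of documents linked via citations can be sketched as a small directed graph. This is only an illustrative sketch: the paper identifiers and the helper functions `cited_by` and `ancestry` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical citation data: each key cites the papers in its list.
citations = {
    "paper_c": ["paper_a", "paper_b"],
    "paper_b": ["paper_a"],
    "paper_a": [],
}

def cited_by(citations):
    """Invert the 'cites' relation: map each paper to the papers citing it."""
    inverse = defaultdict(list)
    for citing, cited_list in citations.items():
        for cited in cited_list:
            inverse[cited].append(citing)
    return {paper: sorted(inverse[paper]) for paper in citations}

def ancestry(paper, citations):
    """Return every paper reachable by following citations from `paper`."""
    seen = set()
    stack = list(citations.get(paper, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(citations.get(current, []))
    return seen
```

Following the links forward (`ancestry`) traces the inputs to a piece of research, while following them backward (`cited_by`) shows where a product of research was later used.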
12
Access to scientific publications
• Libraries
– Paper journals via subscription
– Electronic journals via leased access
– Control via bibliographic records (metadata)
• Colleagues
– Pre-prints in disciplinary repositories
– Private circulation
Image: http://siggy.chem.ucla.edu/Visit_UCLA/Visit_UCLA.html
13
Role of data in e-Science
• Data-centric collaboration
• Data as product of research?
• Data as input to research?
• Value chain
– Data to data links?
– Provenance?
– Data to publication links?
Image: http://quake.wr.usgs.gov/research/deformation/twocolor/lvnet.gif
14
What are data in science?
• Ecology: weather, ground water, sensor readings, historical record
• Medicine: x-rays
• Chemistry: protein structures
• Astronomy: spectral surveys
• Biology: specimens
• Physics: events, objects
• Documentation: Lab and field notebooks, spreadsheets
Image: http://cdiac.ornl.gov/oceans/NAtl_map.jpg
15
When are data?
• Instrument readings or scientific fact?
• Events or findings?
• When to trust data
• Factual status
– What to release
– When to release
CENS Image: New York Times
Monitoring habitat with sensor networks
• Multimedia, multiscale problems (time and space)
• Multidisciplinary (current and as yet unknown) problems
• Management, visualization, exploration of massive, heterogeneous data streams
[Figure labels: Contaminant Transport Group; “backbone” network; adapted from CA DWR website]
17
How are data documented?
• Standards
– Metadata standards within fields
– Ontologies within fields
• Practices
– Project-specific data models
– Instrument-specific models
– Researcher-specific models
• Current data
– Born digital
• Legacy data
– Born digital in other formats
– Paper, other media
– Documented by project, instrument, researcher…
Image source: http://www.medscape.com/content/2004/00/46/81/468129/art-mgm468129.fig1.jpg
18
19
What data are retained for re-use?
• Genomics: deposit expected
• Physics: shared by collaborators, not openly published
• Chemistry: highly contentious
• Ecology: many small, local projects, local data
Image source: http://www.bbc.co.uk/schools/gcsebitesize/img/ict04datastorage.gif
20
Under what conditions can data be shared, re-used?
• Funding source
– Public: access may be mandatory
– Private: access may be limited
– Public-private partners: negotiated
• Economic (resale) value of data
– Chemistry: very high
– Stock market, geospatial: time dependent
– Particle physics: low
Image: http://www.britishcouncil.org/global-common-330x220-pound-sign.jpg
21
Under what conditions can data be shared, re-used?
• Privacy, confidentiality
– Sciences (e.g., atoms, molecules, genomes): low
– Sciences (e.g., endangered species): high
– Medicine (e.g., patient records): high
– Social sciences (e.g., interviews, observations): high
• Security
– Authorizing access
– Security practices
Image: Christine L. Borgman, 2005
22
Who controls, who owns data?
• Ownership vs control
– Own and control
– Control but not own
– Own but not control
• Who can authorize release of data?
– Investigator
– University intellectual property office
– Funding agency
– Collaboration partner
• What intellectual property practices, rules, laws govern?
Image source: http://www.nelsonmullins.com/legal-practice-area/Practice_Insets/Intellectual-Property-Inter.jpg
23
How to access scientific data?
• Private access
– Request data from another investigator
– Train researcher in a novel method or technique
– Barter data for research funds, access to labs, corporate partnership…
• Open, public access
– Data repositories: BODC, BADC, BIRN, NEON, GEON, UKDA…
– Data posted on local portal, website
Image: Christine L. Borgman, 1995
24
25
26
27
What data deserve to be permanently accessible?
• What are the scientific criteria for preservation?
• What is the equivalent of peer review for data?
• Whose data do you trust?
• What data will be re-used?
• How much to invest?
• Who will add the value?
Image: Christine L. Borgman, 2005
28
29
Incentives to share data
• Tradition of “open science”
• Replicate, compare results
• Ask new questions
• Form multi-disciplinary alliances
• Required by funding agency or journal
Image source: www.buffaloworks.us/images/sharing%20orangs.jpg
30
Incentives not to share data
• Rewards for publication, not for data management
• Effort to document data
• Concern for “free riders”
• Risks of misinterpretation of data
• Risks of losing control over data
• Risks of loss of intellectual property
Image source: www.buildingsrus.co.uk/.../ target1.htm
31
Content and context
• Scholarly publications provide context
– Literature review, history of problem, definitions of terms
– Theory, hypotheses, goals
– Research method, discussion of results
– Cumulation of scientific knowledge
• Datasets, repositories remove context
– Data elements, names of variables
– Instrument readings
– Numerical, textual data
– Images, descriptions of artifacts
32
Constructing the value chain
• Links between data, documents and objects
– Based on common standards
– Robust over time
– Robust over migration of software and hardware
• Metadata
– Based on common standards
– Describe what can be done with them
– Describe conditions for use
• Permanent access
– Incentives
– Curatorial expertise
– Institutional models
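One way to picture metadata that links data to documents and states what can be done with a dataset is a small record type. This is a minimal sketch under stated assumptions: the class names, field names, relation labels, and identifiers are all hypothetical illustrations, not a standard or anything defined in the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Link:
    """A typed link from one object to another (hypothetical relation names)."""
    relation: str   # e.g. "is_described_by", "derived_from"
    target_id: str  # identifier of the linked document, dataset, or object

@dataclass
class DatasetRecord:
    """Illustrative metadata record for a dataset in the value chain."""
    identifier: str
    title: str
    permitted_uses: list = field(default_factory=list)  # what can be done with the data
    conditions_for_use: str = ""                         # terms under which re-use is allowed
    links: list = field(default_factory=list)            # links to documents, data, objects

    def related(self, relation):
        """Return identifiers of all objects linked by the given relation."""
        return [link.target_id for link in self.links if link.relation == relation]

# Hypothetical example record; none of these identifiers refer to real objects.
record = DatasetRecord(
    identifier="dataset:example-001",
    title="Example sensor readings",
    permitted_uses=["replication", "comparison"],
    conditions_for_use="cite the describing publication",
    links=[
        Link("is_described_by", "doc:example-paper"),
        Link("derived_from", "dataset:example-raw"),
    ],
)
```

Keeping links as identifiers rather than file paths or URLs tied to one system is one possible way to keep them robust over migration of software and hardware, provided the identifiers themselves are maintained.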
33
An infrastructure for information
• Support work practices
– Tools and services to capture, retain, document
• Facilitate communication, exchange
– Methods to describe, cite documents and data
– Methods to represent and use composite objects
– Robust linking
• Reflect scientific incentives
– Rewards for data contribution
– Rewards for data management
Image source: http://clubs.myams.org/gvca/images/Context-Meaning_web.gif
34
Involve the stakeholders
• Scientists
• Technologists
• Social scientists
• Librarians, archivists
• Universities
• Funding agencies
• Corporate partners
• Publishers
Image source: http://www.ox.ac.uk/
35
“May all your problems be technical”
Jim Gray, ACM Turing award winner