eBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton

Post on 28-Mar-2015


Page 1

Making Scientific Data Openly Available

Simon Coles

School of Chemistry,

University of Southampton

Page 2

Scientific Data Overload!

[Figure: chemical structure diagrams; only the atom labels (Cl, O, N, N+) survive in the transcript]

Page 3

CombeChem: eScience testbed

[Diagram labels: Properties, X-Ray e-Lab, Analysis, Properties e-Lab, Simulation, Video, Diffractometer, Grid Middleware, Structures Database]

Page 4

Chemistry Publications

Ideas and interpretations

Hooks into the literature

Results & derived data

Raw data!

Page 5

Page 6

Establishing common ground

• Understand the data creation process
• Terminology and definitions
  – Data
  – Metadata
  – Datafile
  – Dataset
  – Data holding
• Different views
  – Digital library researchers, computer scientists, chemists
  – Generic vs specific
  – Modeller vs practitioner
• Aim for a common ontology
• Modelling the domain
• Creating a metadata schema

Page 7

Crystallography workflow

RAW DATA DERIVED DATA RESULTS DATA

Page 8

Crystallography datasets

Within a data holding are the following datasets:

• Initialisation: mount new sample on diffractometer & set up data collection
• Collection: collect data
• Processing: process and correct images
• Solution: solve structure
• Refinement: refine structure
• CIF: produce CIF (Crystallographic Information File format)
• Report: generate Crystal Structure Report
• Validation: generate report from structure checks
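The data holding and its eight datasets could be modelled roughly as follows. This is only a sketch: the class names, the `identifier` field and the example filenames are assumptions made for illustration, not part of the eBank schema. Each dataset is treated simply as a list of datafile names, following the terminology on the earlier slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Stage(Enum):
    """Dataset stages, in the order given on the slide."""
    INITIALISATION = "Initialisation"
    COLLECTION = "Collection"
    PROCESSING = "Processing"
    SOLUTION = "Solution"
    REFINEMENT = "Refinement"
    CIF = "CIF"
    REPORT = "Report"
    VALIDATION = "Validation"


@dataclass
class DataHolding:
    """One archive entry: datasets keyed by stage, each a list of datafile names."""
    identifier: str  # hypothetical local identifier, not an eBank field
    datasets: Dict[Stage, List[str]] = field(default_factory=dict)

    def add_datafile(self, stage: Stage, filename: str) -> None:
        """Attach a datafile to the dataset for the given stage."""
        self.datasets.setdefault(stage, []).append(filename)

    def missing_stages(self) -> List[Stage]:
        """Stages that have no datafiles yet, in workflow order."""
        return [s for s in Stage if s not in self.datasets]


# Hypothetical example entry with two of the eight datasets populated.
holding = DataHolding(identifier="example-001")
holding.add_datafile(Stage.COLLECTION, "frames_001.img")
holding.add_datafile(Stage.CIF, "structure.cif")
print([s.value for s in holding.missing_stages()])
```

Keeping the stages in an ordered enumeration makes it easy to report which datasets an entry still lacks, which is the kind of information the metadata is meant to expose.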

Page 9

Publishing, Informatics & Schemas

• Current schema is for publishing / advertising only
• eCrystals publishing requires lightweight schema only
• eBank harvesting requires lightweight schema only
• Aggregation and Linking requires a comprehensive schema
• Data management, Information delivery and Searching services require a very rich schema

Page 10

Deposition into the archive

Page 11

An Archive entry

ecrystals.chem.soton.ac.uk

Page 12

Access to the underlying data

Page 13

Some metadata issues

• Using simple and qualified Dublin Core
• Additional chemical information in schema for harvesting e.g. empirical formula
• Schema contains International Chemical Identifier (InChI)
• Specifies which ‘datasets’ are present in an entry
• Links to ePrints (and other published literature) derived from the data
• Using vocabularies specific to crystallography
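A record combining Dublin Core elements with the extra chemical fields might look something like the sketch below. The `dc` namespace URI is the standard Dublin Core element set; everything under the `ebank` prefix (the namespace URI and the element names `empiricalFormula` and `dataset`) is a hypothetical placeholder, since the slides do not show the actual schema. The benzene values are illustrative only.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set
EBANK_NS = "http://example.org/ebank_dc/"   # hypothetical placeholder, not the real schema URI

ET.register_namespace("dc", DC_NS)
ET.register_namespace("ebank", EBANK_NS)


def build_record(title, creator, formula, inchi, datasets):
    """Build a minimal metadata record as an XML element tree."""
    record = ET.Element("record")
    ET.SubElement(record, f"{{{DC_NS}}}title").text = title
    ET.SubElement(record, f"{{{DC_NS}}}creator").text = creator
    # The InChI is carried here as a plain Dublin Core identifier (an assumption).
    ET.SubElement(record, f"{{{DC_NS}}}identifier").text = inchi
    # Chemistry-specific additions; element names are assumed for illustration.
    ET.SubElement(record, f"{{{EBANK_NS}}}empiricalFormula").text = formula
    for name in datasets:
        ET.SubElement(record, f"{{{EBANK_NS}}}dataset").text = name
    return record


record = build_record(
    title="Crystal structure of benzene (illustrative)",
    creator="Example, A.",
    formula="C6H6",
    inchi="InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    datasets=["Collection", "Refinement", "CIF"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Listing the datasets explicitly in the record is one way a harvester could tell which stages of the workflow an entry actually contains without fetching the files themselves.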

Page 14

Harvesting: OAIster

Page 15

Linking and aggregating

Page 16

Embedded in a science portal

Page 17

Current situation

• Version 2.0 eBank metadata schema
• Pilot institutional e-data repository for harvesting (raw, derived, results data) using EPrints software
• Exports records as ebank_dc and oai_dc
• Validation of schema & discussion with International Union of Crystallography for developments (and wider deployment)
• Pilot eBank UK aggregator service
• Developing search interface Version 1.0
• Testing with PSIgate physical sciences portal – embedding eBank UK
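Since the repository exports records as ebank_dc and oai_dc, a harvester would typically ask for them through an OAI-PMH `ListRecords` request with the matching `metadataPrefix`. The sketch below only builds the request URLs and makes no network call. The host is the one shown on the archive slide, but the endpoint path `/cgi/oai2` is an assumption, as is the helper name `list_records_url`.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the host appears in the slides, the path is assumed.
BASE_URL = "http://ecrystals.chem.soton.ac.uk/cgi/oai2"


def list_records_url(base_url, metadata_prefix, from_date=None):
    """Build an OAI-PMH ListRecords request URL for one metadata format."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting by datestamp (YYYY-MM-DD).
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# One request per exported format named on the slide.
for prefix in ("oai_dc", "ebank_dc"):
    print(list_records_url(BASE_URL, prefix))
```

Requesting oai_dc gives the generic Dublin Core view that any harvester understands, while ebank_dc would carry the additional chemistry-specific fields for an aggregator that knows the schema.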

Page 18

What’s next?

• Generic metadata schema vs Subject specific schema
• Validation against other schema (CCLRC Model)
• (Eprints.org software: allow for more generic scientific data and schemas?)
• Metadata enhancement: keywords based on knowledge of keywords in related publications?
• Investigate identifiers: International Chemical Identifier
• Explore context sensitive linking
• Embedding into chemical and crystallographic research and publishing
• e-Learning embedding and pedagogic evaluation
• Feasibility study in related domains

Page 19

Crystallography Schema Breakout

• Describing non dc: terms
  – METS
  – SET container
• Rights
  – IPR
  – Copyright
  – Publisher
  – Funder
• Linking
  – DOI
  – Keyword ontology
  – Identifiers
• Data validation
  – Add validation dataset
  – Other forms of validation: Mogul
• Chemical representation
  – Naming conventions
  – Empirical formula representation
• Relationship between repositories and harvesters
  – Registration / subscription
• Syndication
  – FRIENDS container
  – RSS feeds