A centre of expertise in digital information management
www.ukoln.ac.uk
UKOLN is supported by:
C21st Scholarship: Data as an Agent for Change
Dr Liz Lyon, Director, UKOLN, University of Bath, UK
Associate Director, UK Digital Curation Centre
3rd Bloomsbury Conference, London, June 2009
This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0
Perspectives
1. The 21stC Scholar: Team Science in the Cloud
2. Chemical Crystallography: Data Publishing Showcase
3. The Future: a Transformational Agenda
What does the C21st research(er) look like?
• “From users to choosers” (Yanosky)
• Pro-sumers (Toffler)
• Digital nomads
• Work on the Webtop
http://www.flickr.com/photos/shankrad/2905938179/
• Multi-scale & complex
• Highly data-intensive
• Increasingly “open”
http://www.flickr.com/photos/stormsriver/2286011597/
What do we mean by Team Science?
• Science as a social activity
Tweet, Blog, Comment, Rate, Vote, Recommend, Tag, Share, Mash
• Highly collaborative
• Multi-disciplinary
• Core team skills
• Trust is key
• Inter-institutional collaboration – better science (Brian Uzzi, 2008)
A new digital economy?
• Data is:
– On demand
– A utility
– Commoditised
– Un-differentiated
– “Publish then filter” (Shirky)
– Traded
• “Cloud” model?
• Brokers & aggregators are key roles
• Free, pay per use, pay as you grow…
http://www.flickr.com/photos/will-lion/2738252562/
• Economies of scale
• Network effects
• New data publishing business models
http://www.flickr.com/photos/thomasreichart/2130018485/sizes/l/
Chemical Crystallography : Data Publishing Showcase
Data Deluge
A bottleneck: the primary cause is the current data publication process, which is tied to journal articles and peer review
“40 years ago a PhD student would determine about 3 crystal structures for their thesis – this can now be easily achieved in a day”
[Figure values: 35 million; 2.5 million; 0.5 million; ‘few thousand’]
Slide: Dr Simon Coles, Univ Soton
eCrystals Team
Domain (Chemists): Simon Coles, Mike Hursthouse, Jeremy Frey, Cameron Neylon, Andrew Milsted, Richard Stephenson, Jamie Robinson, Steven Wilson, Andrew Bailey, Mark Borkum
Computer science: Dave DeRoure, Les Carr, Monica Schraefel, Chris Gutteridge, Tim Myles-Board, Arouna Woukei, Dave Tarrant, Stuart Middleton
Informatics: Liz Lyon, Manjula Patel, Rachel Heery, Monica Duke, Michael Day, Traugott Koch, Pete Cliff
eCrystals Data Repository
• Quick & simple to deposit
• Software tools
• Laboratory archive
• Community involvement
• ‘Embargo’ facility
• Structured foundations
• Discoverable & harvestable
http://ecrystals.chem.soton.ac.uk
Trust
Standards
Audit and certification tools
• TRAC
• DRAMBORA
• PLATTER
• NESTOR
• Data Seal of Approval
eCrystals Curation Reports (3)
• Preservation metadata
• PREMIS Data Dictionary
• OAIS
• Representation Information
• Registry/Repository RRORI
Data sustainability
Data Discovery & Access
“Community Criteria for Interoperability” (Scaling Up Report 2008)
• Domain data format standard: CIF
• Domain data validation standard: CheckCIF
• Metadata schema: eCrystals Application Profile
http://www.ukoln.ac.uk/projects/ebank-uk/schemas/
• Crystallography Data Commons: TIDCC Data Model in development
• Embargo & Rights http://ecrystals.chem.soton.ac.uk/rights.html
• Domain identifier: International Chemical Identifier
• Citation & linking: DOI
http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
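To illustrate the citation and linking point, here is a minimal sketch of how a DOI link like the one above breaks down into a registrant prefix and an item suffix, and how the resolver URL is rebuilt from them. The function names `split_doi_url` and `doi_to_url` are hypothetical and only for illustration; this is not how eCrystals itself mints or resolves DOIs, and no network request is made.

```python
from urllib.parse import urlparse

# Resolver base as it appears on the slide.
RESOLVER = "http://dx.doi.org/"


def split_doi_url(url: str) -> tuple[str, str]:
    """Split a DOI resolver URL into (prefix, suffix).

    A DOI has the form <prefix>/<suffix>, where the prefix starts with "10.".
    """
    doi = urlparse(url).path.lstrip("/")
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(prefix: str, suffix: str) -> str:
    """Rebuild the citable resolver link from the two DOI parts."""
    return f"{RESOLVER}{prefix}/{suffix}"


example = "http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145"
prefix, suffix = split_doi_url(example)
```

Here the prefix identifies the registrant and the suffix identifies the individual dataset record, so a citation can carry the stable DOI rather than the repository's current web address.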
Slide of data services: CrystalEye, Crystal Web, ChemXSeer etc. Search structures; check PMR stuff; aggregate, syndicate, filter etc.
New Web service to aggregate published crystallography data...
[Diagram labels: Data (capture); Semantic Graph (storage); Mash-up (reuse); text, experiments, measurements, documents, data, molecules, scientists]
oreChem – The Chemical Semantic Web
• University of Cambridge
• Cornell University
• Indiana University
• Penn State University
• University of Queensland
• University of Southampton
• At-source capture of chemistry data
• Chemical structure search
• Compound object authoring
• Retrospective harvesting of chemistry data
• Reuse through common ORE data model
• Semantic authoring
• Virtualized triple storage
Slide: Dr Simon Coles, Univ Soton
We need to understand the value and benefits of data publishing and associated data curation / management... and articulate them clearly
• Values & benefits may be:
– political
– economic
– societal...
• DCC Research Data Management Forum 3
Some issues and challenges.....
1. Research quality
• Publications based on closed peer review
• Maintain reputation
• Demonstrate provenance
• Open pilots – Nature
• Use collective intelligence
• Ratings, polls, recommender systems
• Data publishing policy?
2. Research sustainability
• Ensure curation & preservation of long term scientific record including the data
• Requires significant investment in infrastructure
• Assure data security
• Demonstrate resilience & robustness
• Establish trust
• New business models
• Understand full costs
3. Research capacity & capability
• Multi-disciplinary team
• Hybrid skills
• New field - data informatics
• New roles for information professionals?
IJDC 2009 (in press)
• Increase capacity & capability
• Embed skills in LIS curriculum
• Develop career paths, incentivise
Take homes
1. Team science is a social activity
2. We need to advocate the value & benefits of data publishing
3. Data informatics underpins C21st scholarship