Second Open Economics Workshop - Thoughts from the Biosciences
Post on 06-May-2015
Thoughts from the Biomedical Sciences
Philip E. Bourne, UCSD
pbourne@ucsd.edu
Second Open Economics Workshop, June 11, 2013
My Perspective is Drawn from Being:
• A data producer and a data user*
• An overseer of data curation efforts
• A database provider (PDB & IEDB)
• Suspicious of workshop reports, data standards bodies …
• A supporter of data publication
• An open access journal founder
• Opinionated
The Big Picture
The Good News:
– NLM – Entrez – A Great Job
– Open data/software/papers have spawned science and jobs
– Success stories: Encode, PDB
– D2K?
The Bad News:
– We have resources but now they are perceived as silos
– Lack of reproducibility revealed
– Sustainability is unsolved
– Failures: CaBIG, DataNet
– D2K?
The Big Picture – What is the Way Forward?
• Driven by scientific outcomes – not "build it and they will come"
• Community, community – which means:
– A simple vision that many stakeholders can buy into
– Transparency
– Shared ownership
– A code of conduct
– A reward system for individuals and teams
– Strategic policies, e.g. open access, data sharing plans
– Use resources as drivers – funding bodies, societies, institutions have a role here
– Building trust through quality data/software
Worldwide Protein Data Bank
www.wwpdb.org
Personal Experiences to Support My Big Picture View
It's All About Trust
PDB
Trust in the data is perhaps our biggest achievement
It's All About Trust
• Trust is like compound interest
• Comes from listening
• Comes from engaging the community in every aspect of the process
• Comes from data consistency and level of annotation
• Comes from responsiveness
• Comes from the quality of the delivery service
Data Quality Begets Trust
About 25% of our budget has been spent on data remediation
Support for versioning hence the copy of record
Our ontology/data model has been a critical component of our workflow and data accuracy
Until recently the same data model was too complex to facilitate wide adoption by others that use our data
http://collections.plos.org/ploscompbiol/biocurators.php
It's All About People
Curators are the Unsung Heroes
• They really should do more to promote themselves
• Institutions must do more to respect their efforts
It's All About People
The Users
Constantly striving to have the user distinguish raw from derived data
All data are not created equal but the user thinks so
It's All About People
The Global Personalities
It's NOT All About Institutions
As far as I am aware no data standards body has directly influenced anything we have done in 15 years of running the PDB
The structural biology community created a very successful data sharing plan long before funding bodies did
It is About Openness
There are no restrictions on the usage of the data beyond attribution
The PDB runs exclusively on open source software
We maintain and contribute to the Biojava repository
We need to be transparent about data usage
So What Needs to Change re Data?
That All Data Are Created Equal Must End
We need to understand how data are used
Sustainability is not more money from the funding agencies; it's about business models
Reductionism is not a dirty word – Reference Data!
We need to do more with the long tail
On the Future of Genomic Data. Science, 11 February 2011: vol. 331 no. 6018, 728-729
Institutions That Generate Data Must Play a Greater Role
We need institutional data sharing plans
We need data scientists to be better recognized by institutions – it's not all about papers – this implies new metrics
Acknowledgements
www.force11.org
– Tim Clark
– Ivan Herman
– Paul Groth
– Ed Hovy
– Maryann Martone
– Cameron Neylon
– David Shotton
– Anita de Waard
www.plos.org
Beyond the PDF
Many others
Funding Agencies: NSF, NIGMS, DOE, NLM, NCI, NCRR, NIBIB, NINDS, NIDDK
The {Lack of} Distinction Between Data and Knowledge Needs to be Better Appreciated
• The PDB paper has been cited 14,000 times
• No one has ever read it
• Some PDB datasets have thousands of downloads
• These data are not associated with publications