Database Publishing at Nature. Timo Hannay, Nature Publishing Group, 7 October 2005


Page 1: Database Publishing at Nature

Database Publishing at Nature

Timo Hannay

Nature Publishing Group

7 October 2005

Page 2: Database Publishing at Nature

Overview

Publishing collaborations: Making databases more like journals

NPG New Technology: Making journals more like databases

Tagging and social bookmarking: New methods of annotation and navigation

Page 3: Database Publishing at Nature

Database publishing at NPG

The AfCS-Nature Signaling Gateway (http://www.signaling-gateway.org/)

The CMC-Nature Cell Migration Gateway (http://www.cellmigration.org/)

Forthcoming collaborations with NCI and several other groups

Page 4: Database Publishing at Nature

The AfCS-Nature Signaling Gateway

A freely available online resource for anyone interested in cellular signalling

A collaboration with the research community through the Alliance for Cellular Signaling

An experiment in the next generation of online, database-driven scientific publications

Page 5: Database Publishing at Nature

The Signaling Gateway

Hardware & software hosted at San Diego Supercomputer Center

• Home, Info & News
• Molecule Pages: facts and figures on major cell signaling proteins (3,700+); continually updated by selected experts (~1000); peer review run by NPG
• Signaling Update: news & comment written and commissioned by NPG editors
• AfCS Data Center: repository for raw experimental data from AfCS; tools for viewing and analyzing data (online & offline)

Page 12: Database Publishing at Nature

The Molecule Pages

Comprehensive, structured data for 3,700+ proteins involved in cellular signalling

Some information automatically fed in from other online databases and updated monthly

Other information entered by selected expert authors and updated annually

Author-entered data peer-reviewed by NPG

Fully citable using digital object identifiers (DOIs)
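The two update cycles described above (automatic feeds refreshed monthly, expert-authored content revised annually) can be sketched as a simple staleness check. This is a minimal illustration, not the Signaling Gateway's actual implementation: the `Section` class, the `"automatic"`/`"expert"` labels and the 30- and 365-day approximations of "monthly" and "annually" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical intervals approximating the policy on the slide:
# automatically imported data monthly, expert-authored data annually.
UPDATE_INTERVALS = {
    "automatic": timedelta(days=30),
    "expert": timedelta(days=365),
}


@dataclass
class Section:
    """One part of a Molecule Page (hypothetical model)."""
    name: str
    source: str          # "automatic" or "expert"
    last_updated: date


def is_due(section: Section, today: date) -> bool:
    """True once the section has gone at least one full interval without an update."""
    return today - section.last_updated >= UPDATE_INTERVALS[section.source]


def sections_due(sections, today):
    """Names of sections that should be re-imported or sent back to their author."""
    return [s.name for s in sections if is_due(s, today)]


if __name__ == "__main__":
    page = [
        Section("sequence data", "automatic", date(2005, 8, 1)),
        Section("expert summary", "expert", date(2005, 3, 1)),
    ]
    print(sections_due(page, date(2005, 10, 7)))
```

Keeping the interval per source, rather than per page, mirrors the slide's distinction: a single page mixes automatically fed data with expert text, and each part ages on its own schedule.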

Page 19: Database Publishing at Nature

Using Digital Object Identifiers

Nature 409, 860-921 (2001)

doi:10.1038/35057062

• Allows unambiguous identification of the paper
• Allows readers to find the paper online
• Allows publishers to cross-link reference lists
• Guaranteed not to change (even if the publisher changes)

http://dx.doi.org/10.1038/35057062 → IDF/CrossRef databases → correct URL at publisher’s website
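The resolution chain above starts from a string the reader already has. A minimal Python sketch of the client side: split a DOI into its prefix and suffix and build the proxy link from the slide. The lookup in the IDF/CrossRef databases and the redirect to the publisher happen on the resolver's side and are not reproduced; the regular expression is a simplified assumption that accepts the slide's example, not the full DOI syntax, and `split_doi`/`resolver_url` are hypothetical helper names.

```python
import re

# Simplified DOI pattern: optional "doi:" label, a prefix starting "10.",
# then "/" and a non-empty suffix. Real DOI syntax is more permissive.
DOI_PATTERN = re.compile(r"^(?:doi:)?(10\.[^/\s]+)/(\S+)$", re.IGNORECASE)

# Proxy resolver named on the slide.
RESOLVER = "http://dx.doi.org/"


def split_doi(text: str) -> tuple[str, str]:
    """Return (prefix, suffix) for a DOI, with or without a 'doi:' label."""
    match = DOI_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a DOI: {text!r}")
    return match.group(1), match.group(2)


def resolver_url(text: str) -> str:
    """Build the stable link that a reader or a reference list would use."""
    prefix, suffix = split_doi(text)
    return f"{RESOLVER}{prefix}/{suffix}"


if __name__ == "__main__":
    print(split_doi("doi:10.1038/35057062"))
    print(resolver_url("doi:10.1038/35057062"))
```

Because the link is built only from the DOI, it stays valid when the article moves: only the registry entry behind the resolver has to change.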

Page 20: Database Publishing at Nature

The Molecule Pages: A scientific publication

Characteristics compared across a traditional journal, a traditional database and the Molecule Pages:

• Recognised serial publication with an ISSN
• Authored by recognised scientific experts (?)
• Subjected to full anonymous peer review
• Maintained indefinitely (with errata and addenda)
• Formally citable and fully integrated into CrossRef
• Structured and highly queryable

The Molecule Pages have all the features of a traditional journal; in addition, the information they contain is highly structured and queryable.

Page 21: Database Publishing at Nature

Overview

Publishing collaborations: Making databases more like journals

NPG New Technology: Making journals more like databases

Tagging and social bookmarking: New methods of annotation and navigation

Page 22: Database Publishing at Nature

Great underestimated technologies of our age

Technology | Purported use | Eventual impact
Steam engines (early 1700s) | Pumping water from coal mines | The Industrial Revolution
Alternating current (1880s) | Executing criminals | The electrically powered society
Web-based scientific publishing (2004) | A new charging model for scientific papers | Redefining the concept of the scientific paper

Page 23: Database Publishing at Nature

Scientific papers as structured data objects

Print journal → Online facsimile (circa 2000) → structured data objects (circa 2006):

• Article metadata database
• Structured data sets
• Structured, interactive and queryable figures and text

Formats illustrated: RDF, SVG

Page 24: Database Publishing at Nature

Experimental article metadata database

Initial data to be included:

• Author and institute details
• Scientific:
  - Molecules (InChI)
  - Genes (Entrez Gene)
  - Proteins (UniProt)
  - Cellular processes, functions, locations (GO)
  - Species (NCBI)
• Citation annotations (controlled vocabulary)
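Storing these identifiers per article is what allows questions such as "which papers mention this protein?" to be answered without text search. A minimal sketch, assuming an in-memory store: the `ArticleMetadata` record, the scheme keys (`"uniprot"`, `"ncbi_taxon"`) and the example DOIs are hypothetical; the UniProt accessions are the ones used later in this presentation, and 9606 is the NCBI Taxonomy identifier for human.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ArticleMetadata:
    """Hypothetical metadata record for one article, keyed by its DOI."""
    doi: str
    authors: list[str]
    # Identifier scheme -> identifiers found in the article.
    entities: dict[str, set[str]] = field(default_factory=dict)


def build_index(records):
    """Reverse index: (scheme, identifier) -> set of DOIs that mention it."""
    index = defaultdict(set)
    for record in records:
        for scheme, identifiers in record.entities.items():
            for identifier in identifiers:
                index[(scheme, identifier)].add(record.doi)
    return index


def articles_mentioning(index, scheme, identifier):
    """Sorted DOIs for one identifier; .get avoids adding empty index entries."""
    return sorted(index.get((scheme, identifier), set()))


if __name__ == "__main__":
    records = [
        ArticleMetadata("10.1038/hypothetical-1", ["A. Author"],
                        {"uniprot": {"P03378", "Q05397"}, "ncbi_taxon": {"9606"}}),
        ArticleMetadata("10.1038/hypothetical-2", ["B. Author"],
                        {"uniprot": {"Q05397"}}),
    ]
    index = build_index(records)
    print(articles_mentioning(index, "uniprot", "Q05397"))
```

Keying the index on (scheme, identifier) pairs keeps identical strings from different vocabularies apart, which matters once InChI, Entrez Gene, GO and taxonomy identifiers share one store.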

Page 26: Database Publishing at Nature

Support for structured data sets

• Preview in browser
• Download to desktop software
• Search for more data

Developing support for:
• Systems Biology Markup Language
• CellML
• Chemical Markup Language
• Others
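A browser "preview" of a structured data set could start by reading the model file and summarising what it contains. The sketch below parses a tiny, hand-written SBML Level 2 model with Python's standard library; the model itself (`toy_model`, species `A` and `B`, reaction `r1`) and the `summarise` helper are invented for illustration. A production system would more likely use a dedicated library such as libSBML.

```python
import xml.etree.ElementTree as ET

# XML namespace used by SBML Level 2 documents.
NS = {"sbml": "http://www.sbml.org/sbml/level2"}

# Hypothetical example model: reaction r1 converts species A into species B.
EXAMPLE_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_model">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell" initialConcentration="1"/>
      <species id="B" compartment="cell" initialConcentration="0"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="r1">
        <listOfReactants><speciesReference species="A"/></listOfReactants>
        <listOfProducts><speciesReference species="B"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>"""


def summarise(sbml_text):
    """Return the model id, its species ids and each reaction's (reactants, products)."""
    root = ET.fromstring(sbml_text)
    model = root.find("sbml:model", NS)
    if model is None:
        raise ValueError("no <model> element found")
    species = [s.get("id") for s in model.iterfind("sbml:listOfSpecies/sbml:species", NS)]
    reactions = {}
    for reaction in model.iterfind("sbml:listOfReactions/sbml:reaction", NS):
        reactants = [r.get("species") for r in
                     reaction.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)]
        products = [p.get("species") for p in
                    reaction.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)]
        reactions[reaction.get("id")] = (reactants, products)
    return {"model": model.get("id"), "species": species, "reactions": reactions}


if __name__ == "__main__":
    print(summarise(EXAMPLE_SBML))
```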

Page 27: Database Publishing at Nature

SVG: Figures as interactive data objects

• Plot graph on axes of choice
• Overlay data sets of choice
• Click to download raw data
• Zoom and pan to view detail
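These interactions depend on a figure that is generated from its raw data rather than stored as a flat image. A minimal sketch, assuming data arrive as named series of (x, y) pairs: the hypothetical `render_svg` helper (not an NPG tool) places all series on shared axes, can switch the y axis to a base-10 log scale, and keeps each raw value in a `<title>` tooltip. Zoom and pan would typically be left to the viewer (for example through the SVG `viewBox` attribute) and are not shown.

```python
import math
from xml.sax.saxutils import quoteattr

WIDTH, HEIGHT, MARGIN = 400, 300, 40  # canvas size and padding, in SVG user units


def _normalise(values, log=False):
    """Map values onto [0, 1]; with log=True, use a base-10 log axis."""
    if log:
        if any(v <= 0 for v in values):
            raise ValueError("log axis needs strictly positive values")
        values = [math.log10(v) for v in values]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [(v - lo) / span for v in values]


def render_svg(series, log_y=False):
    """Render named series of (x, y) points on shared axes as an SVG string.

    Each point becomes a <circle> whose class is its series name (so a viewer
    can show or hide series) and whose <title> holds the raw values.
    """
    points = [(name, x, y) for name, data in series.items() for x, y in data]
    xs = _normalise([x for _, x, _ in points])
    ys = _normalise([y for _, _, y in points], log=log_y)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{WIDTH}" height="{HEIGHT}">']
    for (name, x, y), nx, ny in zip(points, xs, ys):
        cx = MARGIN + nx * (WIDTH - 2 * MARGIN)
        cy = HEIGHT - MARGIN - ny * (HEIGHT - 2 * MARGIN)  # SVG y axis points down
        parts.append(
            f'<circle class={quoteattr(name)} cx="{cx:.1f}" cy="{cy:.1f}" r="3">'
            f"<title>{x}, {y}</title></circle>"
        )
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    data = {"control": [(1, 1), (2, 10), (3, 100)], "treated": [(1, 2), (2, 20)]}
    print(render_svg(data, log_y=True))
```

Re-plotting "on axes of choice" then means calling the renderer again with different options, since the raw numbers travel with the figure instead of being baked into pixels.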

Page 29: Database Publishing at Nature

Automated scientific markup and linking

Page 30: Database Publishing at Nature

Increasing structure in text markup (1)

The old way (no semantic markup):
<p>...gp120 binding to CXCR4 or CCR5 activates PYK2 and FAK…</p>

Now (key entities and concepts marked up):
<p>...<protein id="urn:lsid:uniprot.org:uniprot:P03378">gp120</protein>
  <action id="urn:lsid:geneontology.org:go:000548">binding</action> to
  <protein id="urn:lsid:uniprot.org:uniprot:P48061">CXCR4</protein> or
  <protein id="urn:lsid:uniprot.org:uniprot:P10147">CCR5</protein>
  <action id="urn:lsid:geneontology.org:go:0008047">activates</action>
  <protein id="urn:lsid:uniprot.org:uniprot:O43150">PYK2</protein> and
  <protein id="urn:lsid:uniprot.org:uniprot:Q05397">FAK</protein>…</p>
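Once entities are tagged inline, software can pull them out while readers still see the same sentence. A short sketch using Python's standard XML parser on the marked-up paragraph from the slide (identifiers reproduced as given there, without verification); `extract_entities` and `plain_text` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# The marked-up sentence from the slide; the leading "..." and trailing "…" are literal text.
MARKED_UP = (
    '<p>...'
    '<protein id="urn:lsid:uniprot.org:uniprot:P03378">gp120</protein> '
    '<action id="urn:lsid:geneontology.org:go:000548">binding</action> to '
    '<protein id="urn:lsid:uniprot.org:uniprot:P48061">CXCR4</protein> or '
    '<protein id="urn:lsid:uniprot.org:uniprot:P10147">CCR5</protein> '
    '<action id="urn:lsid:geneontology.org:go:0008047">activates</action> '
    '<protein id="urn:lsid:uniprot.org:uniprot:O43150">PYK2</protein> and '
    '<protein id="urn:lsid:uniprot.org:uniprot:Q05397">FAK</protein>…</p>'
)


def extract_entities(paragraph_xml):
    """Return (element type, identifier, surface text) for each tagged entity."""
    root = ET.fromstring(paragraph_xml)
    return [(el.tag, el.get("id"), el.text) for el in root if el.get("id")]


def plain_text(paragraph_xml):
    """Recover the human-readable sentence: the markup adds meaning, not words."""
    return "".join(ET.fromstring(paragraph_xml).itertext())


if __name__ == "__main__":
    for entity in extract_entities(MARKED_UP):
        print(entity)
    print(plain_text(MARKED_UP))
```

Because `plain_text` returns exactly the "old way" sentence, the semantic markup costs readers nothing while giving machines an identifier for every named entity.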

Page 31: Database Publishing at Nature

Increasing structure in text markup (2)

The new way (full RDF/XML):
<p>...
  <rdf:Graph xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:go="urn:lsid:geneontology.org:go:"
             xmlns:uniprot="urn:lsid:uniprot.org:uniprot:">
    <go:000548>
      <uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P03378"/>
      <uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P48061"/>
      <go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:O43150"/>
      <go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:Q05397"/>
    </go:000548>
    <go:000548>
      <uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P03378"/>
      <uniprot:Protein rdf:resource="urn:lsid:uniprot.org:uniprot:P10147"/>
      <go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:O43150"/>
      <go:0008047 rdf:resource="urn:lsid:uniprot.org:uniprot:Q05397"/>
    </go:000548>
    <rdf:label>gp120 binding to CXCR4 or CCR5 activates PYK2 and FAK</rdf:label>
  </rdf:Graph>
…</p>

With RDF markup, the article XML itself becomes a machine-queryable database of statements (subject-predicate-object triples), not just formatted text
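The idea that marked-up text becomes a database can be made concrete by flattening the example sentence into subject-predicate-object triples and querying them by pattern. This is a plain-Python sketch, not an RDF library and not a faithful translation of the slide's RDF/XML: the blank nodes `_:binding1`/`_:binding2` and the predicate names `type`, `participant` and `activates` are hypothetical simplifications of the GO terms used there. A production system would more likely use an RDF store queried with SPARQL.

```python
UNIPROT = "urn:lsid:uniprot.org:uniprot:"

# LSIDs as given on the slide, named by the labels used in the sentence.
GP120 = UNIPROT + "P03378"
CXCR4 = UNIPROT + "P48061"
CCR5 = UNIPROT + "P10147"
PYK2 = UNIPROT + "O43150"
FAK = UNIPROT + "Q05397"

# One (subject, predicate, object) triple per statement. Each binding event is a
# blank node; the predicate names are hypothetical stand-ins for GO terms.
TRIPLES = set()
for event, receptor in (("_:binding1", CXCR4), ("_:binding2", CCR5)):
    TRIPLES |= {
        (event, "type", "binding"),
        (event, "participant", GP120),
        (event, "participant", receptor),
        (event, "activates", PYK2),
        (event, "activates", FAK),
    }


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )


def activated_by(protein):
    """Proteins activated by any binding event in which `protein` takes part."""
    events = {s for s, _, _ in match(TRIPLES, p="participant", o=protein)}
    return {o for e in events for _, _, o in match(TRIPLES, s=e, p="activates")}


if __name__ == "__main__":
    print(sorted(activated_by(GP120)))
    print(match(TRIPLES, o=CCR5))  # every statement whose object is CCR5
```

Pattern queries like `match(TRIPLES, o=CCR5)` are the same shape as the request "show me statements about the hedgehog gene", only with a gene identifier in place of CCR5.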

Page 32: Database Publishing at Nature

Why go to all this effort?

Discoverability and recontextualisation

“Show me statements about the hedgehog gene.”

“Find claims that disagree with this.”

Transparency and flexibility

“Plot this graph on a different scale, with error bars added and with these two extra data sets overlaid.”

Specificity and completeness

“Give me a full description of this mathematical model that I can run on my own computer.”

Reuse and interoperability

“Provide the raw data set used in this analysis in a form that allows me to merge it with my own data.”

Page 33: Database Publishing at Nature

Views from the database side

“Before the end of the next decade, pathway databases will become scientific journals and journals will become databases. Biologists will be greatly empowered, and bioinformatics will continue its long evolution.”

Lincoln Stein (Reactome)

“Is a biological database any different than a biological journal? I am working toward reaching an answer of, no, there is no difference.”

Phil Bourne (Protein Data Bank)

Page 34: Database Publishing at Nature

Overview

Publishing collaborations: Making databases more like journals

NPG New Technology: Making journals more like databases

Tagging and social bookmarking: New methods of annotation and navigation

Page 49: Database Publishing at Nature

A few uses for Connotea

• Keeping bookmarks and references in order
• Sharing links and ideas within a team (perhaps geographically dispersed)
• Providing readers with a (dynamic) list of further or related reading
• Encouraging readers to share relevant links with the author and with each other
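Most of these uses rest on one mechanism: users attach free-text tags to bookmarks, and shared tags connect related items. A minimal sketch, assuming an in-memory store; the `BookmarkStore` class and its ranking rule (number of shared tags, ties broken alphabetically) are hypothetical illustrations, not Connotea's actual design or API.

```python
from collections import defaultdict


class BookmarkStore:
    """Minimal social-bookmarking model (hypothetical; not Connotea's schema)."""

    def __init__(self):
        self.tags_by_url = defaultdict(set)   # url -> tags from all users
        self.users_by_url = defaultdict(set)  # url -> users who saved it
        self.urls_by_tag = defaultdict(set)   # tag -> urls carrying that tag

    def add(self, user, url, tags):
        """Record that `user` bookmarked `url` with the given free-text tags."""
        self.users_by_url[url].add(user)
        for tag in tags:
            tag = tag.strip().lower()  # so "Signalling" and "signalling" match
            self.tags_by_url[url].add(tag)
            self.urls_by_tag[tag].add(url)

    def related(self, url, limit=5):
        """Other URLs ranked by number of shared tags (ties broken alphabetically)."""
        counts = defaultdict(int)
        for tag in self.tags_by_url.get(url, ()):
            for other in self.urls_by_tag[tag]:
                if other != url:
                    counts[other] += 1
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        return [other for other, _ in ranked[:limit]]


if __name__ == "__main__":
    store = BookmarkStore()
    store.add("alice", "http://www.signaling-gateway.org/", ["Signalling", "database"])
    store.add("bob", "http://www.cellmigration.org/", ["database", "cell migration"])
    print(store.related("http://www.signaling-gateway.org/"))
```

Because tags from every user feed the same index, a reader's "related reading" list grows as other people bookmark and tag, which is what makes it dynamic rather than a fixed list chosen by the author.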