[acm press the 4th international workshop - london, united kingdom (2011.12.07-2011.12.09)]...



Supporting Nanopublication Provenance: PMID2DOI Converter

Christine Chichester, Hailiang Mei, Kees Burger, Barend Mons

Netherlands Bioinformatics Centre

260 NBIC, P.O. Box 9101, 6500 HB Nijmegen, The Netherlands

+31 24 36 19 500

{christine.chichester, kees.burger, hailiang.mei, barend.mons}@nbic.nl

ABSTRACT

A major challenge of linked data is resolving the many different identifiers that represent the same object. Interconnecting the data requires mappings between the vocabularies and identifiers used in different data sets. To help with this issue, we have developed a service, which we use specifically for nanopublication provenance, that converts between two types of identifiers: the PubMed Identifier (PMID), a unique number assigned to PubMed citations of life science journal articles, and the Digital Object Identifier™ (DOI), which is used for identifying digital content. DOIs are used to provide current information, including where the content (or information about the content) can be found on the Internet. DOIs are a very useful identifier because they often give a direct link back to the full-text scientific article. We provide SOAP and REST web services for the conversion data. In addition, there is a SPARQL endpoint for querying the mappings. http://www.pmid2doi.org/

Keywords

Nanopublication, ontology, provenance, PMID, DOI, linked data, citation, standardization.

1. BACKGROUND AND PURPOSE

A nanopublication can be defined as the smallest unit of publishable information: an assertion about anything that can be uniquely identified and attributed to its author [1]. Nanopublications support fine-grained attribution to authors and institutions with the intention of incentivizing data interoperability and rapid dissemination. In general, nanopublications should be citable, and their impact on the scientific community should be monitored, much as the h-index attempts to measure both the productivity and the impact of a scientist's published work. Detailing the provenance information of a nanopublication is therefore of utmost importance, and any tool that aids in this process can be considered beneficial.

Nanopublications may emerge from published scientific articles that presently appear in PubMed. The provenance of a nanopublication should give the details necessary to evaluate the origin and quality of the assertion. Since PubMed is a major source of information, with more than 21 million citations for biomedical literature, life science journals, and online books, the PMID is often used as the identifier of the citation. However, DOI names are the only widely adopted persistent identifier for scholarly works, mainly because each DOI name is unique and serves as a stable link to the full text of an electronic item on the Internet. We have therefore developed web services that expose PMID-to-DOI mappings, so that a given set of PMIDs can be simply and quickly associated with the corresponding DOIs.

2. METHOD

2.1 Generating mapping

For all PubMed entries that are not associated with a DOI, we extract the following fields: PMID, journal title, volume, issue, start page, publication date, and last name of the last author. Excluding the PMID, we use the remaining six fields to build a query for the OpenURL resolver from CrossRef. A returned result is considered correct only when there is exactly one exact match. The query runs continuously, and the mapping results are automatically stored in a database. The updated PMID-to-DOI mappings are then used to support our pmid2doi web services.
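The mapping step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Citation` class, the function names, the base URL, and the OpenURL parameter names (`pid`, `title`, `volume`, `issue`, `spage`, `date`, `aulast`) are assumptions that should be checked against the current CrossRef documentation. The sketch only builds the query string and applies the "exactly one match" acceptance rule; it performs no network access.

```python
from dataclasses import dataclass
from typing import Optional, Sequence
from urllib.parse import urlencode

# Assumed resolver location and parameter names (not taken from the text).
OPENURL_BASE = "https://www.crossref.org/openurl"


@dataclass(frozen=True)
class Citation:
    """Fields extracted from a PubMed entry that has no DOI yet."""
    pmid: str
    journal_title: str
    volume: str
    issue: str
    start_page: str
    publication_date: str
    last_author_last_name: str


def build_openurl_query(citation: Citation, pid: str) -> str:
    """Build a lookup URL from the six bibliographic fields.

    The PMID is deliberately left out of the query; it is only used later
    as the key under which an accepted DOI is stored.
    """
    params = {
        "pid": pid,  # assumed: account identifier expected by the resolver
        "title": citation.journal_title,
        "volume": citation.volume,
        "issue": citation.issue,
        "spage": citation.start_page,
        "date": citation.publication_date,
        "aulast": citation.last_author_last_name,
    }
    return OPENURL_BASE + "?" + urlencode(params)


def accept_mapping(candidate_dois: Sequence[str]) -> Optional[str]:
    """Return the DOI only when the resolver gave exactly one match."""
    if len(candidate_dois) == 1:
        return candidate_dois[0]
    # Zero matches or several matches: the mapping is not trusted.
    return None
```

Rejecting every ambiguous result trades coverage for precision: a PMID with several candidate DOIs simply stays unmapped rather than risking a wrong provenance link.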

2.2 Web Services

SOAP and REST web services are available to query the mappings database. A SPARQL endpoint is also offered. http://www.pmid2doi.org/
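As an illustration of how a client might use a REST service of this kind for a batch of PMIDs, the sketch below may help. The text does not give the actual request paths or response format, so the URL template (a placeholder on example.org), the JSON shape `{"pmid": ..., "doi": ...}`, and all function names are hypothetical. The fetch function is injectable so the lookup logic can be exercised without a live service.

```python
import json
from typing import Callable, Dict, Iterable, Optional
from urllib.request import urlopen

# HYPOTHETICAL template: the real pmid2doi REST paths are not given in the text.
REST_TEMPLATE = "https://example.org/pmid2doi/{pmid}"


def http_get(url: str) -> str:
    """Fetch a URL and return the body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def lookup_dois(
    pmids: Iterable[str],
    fetch: Callable[[str], str] = http_get,
    template: str = REST_TEMPLATE,
) -> Dict[str, Optional[str]]:
    """Map each PMID to a DOI, or to None when no usable answer comes back."""
    result: Dict[str, Optional[str]] = {}
    for raw in pmids:
        pmid = str(raw).strip()
        if not pmid.isdigit():
            raise ValueError(f"not a PMID: {raw!r}")
        try:
            # Assumed response shape: a JSON object with a "doi" key.
            payload = json.loads(fetch(template.format(pmid=pmid)))
        except (OSError, ValueError):
            # Network failure or a body that is not valid JSON.
            result[pmid] = None
            continue
        result[pmid] = payload.get("doi") if isinstance(payload, dict) else None
    return result
```

A caller would pass the list of PMIDs found in nanopublication provenance and keep only the entries whose value is not `None`.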

3. POSTER DESCRIPTION

3.1 Figure 1: Schema for Nanopublications

Figure 1 shows the anatomy of a second-generation nanopublication, adapted and extended from [1]. Paul Groth is the author of the nanopublication ontology. The ontology can also be found here: http://www.nanopub.org/nschema

3.2 Figure 2: Nanopublication RDF

Figure 2 shows a representation of the nanopublication RDF model with provenance. This example was generated from the iHOP database for the Open PHACTS project and is courtesy of Miguel Vazquez, Jose Maria Fernandez Gonzalez, Victor de la Torre and Alfonso Valencia from the CNIO, Madrid, Spain.

3.3 Figure 3: Outline of conversion method

Figure 3a diagrams the method used for converting from PMIDs to DOIs, and Figure 3b shows the page used for accessing the web services.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

SWAT4LS-2011, December 7-9, 2011 London, UK Copyright © 2011 ACM 978-1-4503-1076-5/11/12... $10.00

REFERENCES

[1] Groth, P., Gibson, A., Velterop, J. 2010. The anatomy of a nanopublication. Information Services and Use. 30, 1-2, 51-56. DOI= http://dx.doi.org/10.3233/ISU-2010-0613