data publication coasp 2012. publications 26 million abstracts 2.2 million full text articles...
Data Publication
COASP 2012
Publications
26 million abstracts
2.2 million full text articles
Citation networks
Database links
Text-mining
2006 2011 2012 2016?
Europe PubMed Central
How many open access articles in UKPMC?
PubMed (995K)
UKPMC (18%, 182K)
OA (9.6%, 96K)
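The percentages on this slide can be checked from the counts. The sketch below assumes that both percentages use the PubMed figure (995K) as the denominator, which the slide does not state explicitly; the OA share of UKPMC full text is a derived figure, not one shown on the slide.

```python
# Counts from the slide, in thousands of articles.
pubmed = 995
ukpmc_full_text = 182
open_access = 96


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: both slide percentages are relative to the PubMed count.
ukpmc_pct = share(ukpmc_full_text, pubmed)
oa_pct = share(open_access, pubmed)

# Derived figure (not on the slide): OA share of the UKPMC full-text set.
oa_within_ukpmc_pct = share(open_access, ukpmc_full_text)

print(ukpmc_pct, oa_pct, oa_within_ukpmc_pct)
```

Under that assumption, roughly half of the UKPMC full-text articles in this sample are open access.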
Managing the public data ecosystem
[Diagram labels: Big Data: Deposition; Primary; Research articles; Big Data: Curated; Annotation; Unstructured Data]
Literature citation from data (data annotation)
Links from Literature to Databases
• Proteins
• Nucleotides
• OMIM
• Chemicals
• Structure
• Clinical reviews
• Protein families
• Protein-protein interactions
• Gene expression experiments
800 K
370 K
110 K
Database crosslinks
Bibliography from P25106
Data citation from literature (provenance)
Text Mining in UKPMC (2.2 million articles)

| Semantic Type | Unique Terms | Articles | Annotations |
| --- | --- | --- | --- |
| Accession No. | 233,017 | 66,356 | 387,787 |
| Chemical | 76,712 | 1,694,385 | 83,923,066 |
| Disease | 171,692 | 1,768,214 | 57,821,871 |
| Gene/Protein | 227,318 | 1,310,382 | 77,189,022 |
| GO Terms | 32,664 | 1,832,294 | 65,061,579 |
| Organism | 180,637 | 1,713,280 | 70,832,222 |
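One way to read the text-mining figures is as annotation density: how many annotations of each semantic type occur per article that contains at least one. The sketch below derives that ratio from the slide's numbers; the ratio itself is not shown on the slide, and treating the "Articles" column as "articles with at least one annotation of that type" is an assumption.

```python
# (unique terms, articles, annotations) per semantic type, copied from the slide.
stats = {
    "Accession No.": (233_017, 66_356, 387_787),
    "Chemical": (76_712, 1_694_385, 83_923_066),
    "Disease": (171_692, 1_768_214, 57_821_871),
    "Gene/Protein": (227_318, 1_310_382, 77_189_022),
    "GO Terms": (32_664, 1_832_294, 65_061_579),
    "Organism": (180_637, 1_713_280, 70_832_222),
}

# Annotations per annotated article, rounded to one decimal place.
per_article = {
    name: round(annotations / articles, 1)
    for name, (_terms, articles, annotations) in stats.items()
}

# The semantic type with the lowest density.
sparsest = min(per_article, key=per_article.get)

for name, density in per_article.items():
    print(f"{name}: {density} annotations per article")
```

On these numbers accession numbers are by far the sparsest type, both in how many articles mention them and in how often they appear within an article.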
Accession numbers stories: data citation in OA articles
Senay Kafkas, Jee-Hyub Kim
Annotation of accession numbers (OA)
[Figure: two bar charts, y-axis 0 to 100. Publisher-annotated (~10,000 articles): gen, pdb, sprot, genpept, geo, omim, pir, emblalign, pubchem, pmc. Text-mined (>25,000 articles): gen, pdb, sprot, arrayexpress, pfam, interpro.]
• Névéol A, Wilbur WJ, Lu Z (2012) Improving links between literature and biological data with text mining: a case study with GEO, PDB and MEDLINE. Database 2012:bas026 (PMC3371192)
• Névéol A, Wilbur WJ, Lu Z (2011) Extraction of data deposition statements from the literature: a method for automatically tracking research results. Bioinformatics 27, 3306-3312 (PMC3223368)
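Text-mined accession tagging can be illustrated with a few regular expressions. The following is a minimal sketch and not the UKPMC pipeline: the patterns are simplified assumptions, the function name `find_accessions` is hypothetical, and the GEO and PDB identifiers in the example sentence are invented for illustration (only AY387398 appears in the slides).

```python
import re

# Simplified, illustrative patterns (assumptions, not the rules used by UKPMC).
PATTERNS = {
    "gen": re.compile(r"\b[A-Z]{1,2}\d{5,6}\b"),   # nucleotide-style, e.g. AY387398
    "geo": re.compile(r"\bGSE\d+\b"),              # GEO series
    # PDB IDs are too short to match safely without a context cue such as "PDB ID".
    "pdb": re.compile(r"\bPDB(?: ID| code)?:? (\d[A-Za-z0-9]{3})\b"),
}


def find_accessions(text):
    """Return (database, accession) pairs found in text, grouped by pattern order."""
    hits = []
    for db, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Use the capture group when the pattern defines one, else the whole match.
            accession = match.group(1) if pattern.groups else match.group(0)
            hits.append((db, accession))
    return hits


# Hypothetical sentence; GSE12345 and 1ABC are made-up identifiers.
sentence = (
    "Sequences were deposited in GenBank under accession AY387398; "
    "expression data are available as GSE12345 and the structure as PDB ID 1ABC."
)
hits = find_accessions(sentence)
print(hits)
```

Patterns this loose are ambiguous: a UniProt accession such as P25106 also matches the nucleotide-style pattern, which is one reason surrounding context matters when assigning a database to a match.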
Efficacy of Accession number tagging (OA)
Most publisher tags:
BMC Genomics
BMC Evolutionary Biology
The Journal of Cell Biology
Virology Journal
BMC Microbiology
The Journal of Experimental Medicine
BMC Bioinformatics
BMC Plant Biology
The Journal of Biological Chemistry
BMC Molecular Biology
Most articles:
• PLoS One
Acta Crystallographica Section E:
British Journal of Cancer
The Journal of Cell Biology
Environmental Health Perspectives
• Nucleic Acids Research
The Journal of Experimental Medicine
Critical Care
• Emerging Infectious Diseases
BMC Bioinformatics
Most text-mined tags:
• PLoS One
• Nucleic Acids Research
BMC Genomics
BMC Evolutionary Biology
The Journal of Cell Biology
PLoS Pathogens
BMC Bioinformatics
Virology Journal
BMC Microbiology
• Emerging Infectious Diseases
BMC Genomics: 1,484 TM tags*, 4,337 articles
PLoS One: 4,226 TM tags*, 42,888 articles
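The two journal figures are easier to compare once normalized by article count. This sketch computes text-mined (TM) tags per 100 articles; the footnote behind the asterisk did not survive in the transcript, so treating the numbers as raw tag counts is an assumption.

```python
# (text-mined tags, articles) per journal, copied from the slide.
# Assumption: "TM tags" are raw counts of text-mined accession tags.
journals = {
    "BMC Genomics": (1_484, 4_337),
    "PLoS One": (4_226, 42_888),
}

# Text-mined tags per 100 articles, rounded to one decimal place.
tags_per_100 = {
    name: round(100 * tags / articles, 1)
    for name, (tags, articles) in journals.items()
}

for name, rate in tags_per_100.items():
    print(f"{name}: {rate} TM tags per 100 articles")
```

Under that reading, PLoS One contributes more text-mined tags in absolute terms, while BMC Genomics has the higher rate per article.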
Why is this important? Implications
Scientific:
• Linking articles that cite the same data
Citation:
• Data Citation as measure of impact (Thomson: Data citation index)
• Context of data citation: submission, reuse, analysis
Operational:
• Services for publishers to improve Accession number tagging
• Editorial policies and adherence
• Extension of NLM DTD
• Lessons learned for considering unstructured data
That we can perform this analysis at all highlights a benefit of Open Access
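The scientific point, linking articles that cite the same data, amounts to building an inverted index from accession numbers to articles. A minimal sketch follows, assuming accession numbers have already been extracted per article; the function name and article IDs are hypothetical, while the two accession numbers are the ones that appear on the slides.

```python
from collections import defaultdict
from itertools import combinations


def link_by_shared_accession(article_accessions):
    """Map each pair of article IDs to the set of accession numbers both cite."""
    # Inverted index: accession number -> articles that cite it.
    index = defaultdict(set)
    for article, accessions in article_accessions.items():
        for accession in accessions:
            index[accession].add(article)

    # Every pair of articles sharing an accession number gets a link.
    links = defaultdict(set)
    for accession, articles in index.items():
        for first, second in combinations(sorted(articles), 2):
            links[(first, second)].add(accession)
    return dict(links)


# Hypothetical article IDs and citation sets, for illustration only.
example = {
    "article-A": {"AY387398", "P25106"},
    "article-B": {"AY387398"},
    "article-C": {"P25106", "AY387398"},
}
links = link_by_shared_accession(example)
for pair, shared in sorted(links.items()):
    print(pair, sorted(shared))
```

The same index also answers the reverse question, which articles cite a given data record, which is the basis for counting data citations.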
AY387398: needle in a haystack
Unstructured data
Articles with supplemental data (UKPMC)
• 235,000 articles (50K+ in 2011)
• 718,511 files
• 459 extensions
• 0.8 TB (1200 CDs)
• (However most data in ~60 extension types)
[Figure: % by publication year]
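The supplemental-data figures imply a few averages that give a sense of scale. The sketch below derives them from the slide's numbers; the averages are not on the slide, and reading "0.8 TB" as decimal terabytes is an assumption.

```python
# Figures from the slide.
articles = 235_000
files = 718_511
total_bytes = 0.8e12  # 0.8 TB, assuming decimal units (1 TB = 10**12 bytes)
cds = 1_200

# Derived averages (not stated on the slide).
files_per_article = round(files / articles, 2)
mb_per_file = round(total_bytes / files / 1e6, 2)
mb_per_cd = round(total_bytes / cds / 1e6)

print(files_per_article, mb_per_file, mb_per_cd)
```

The per-CD figure is consistent with the capacity of a standard CD, so the "1200 CDs" comparison matches the 0.8 TB total.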
Managing the public data ecosystem
[Diagram labels: Big Data: Deposition; Primary; Research articles; Big Data: Curated; Annotation; Structured links; Unstructured Data; reuse; analysis; provenance]
• Open
• Citable
• Discoverable
• Reusable
People
• Paula Buttery
• Andrew Caines
• Norman Cobley
• Yuci Gou
• Senay Kafkas
• Jyothi Katuri
• Oliver Kilian
• Jee-Hyub Kim
• Nikos Marinos
• Jo McEntyre
• Xingjun Pi
• Philip Rossiter
• Rebholz Group
• Peter Stoehr
• University of Manchester
• British Library
• OpenAIRE/OpenAIRE Plus
• NCBI, NLM