ii-sdv 2016 michael iarrobino - improving text mining results with access to full-text scientific...
TRANSCRIPT
Improving Text Mining Results with Access to Full-Text Scientific Articles
Mike Iarrobino, Product Manager, CCC
Introduction
Mike Iarrobino
Product Manager
RightFind™ XML for Mining
Copyright Clearance Center
Making Copyright Work – CCC and RightsDirect
Rightsholders Content Users
• Licensing Solutions
• Rights Management
• Content Delivery
• Copyright Education
950+ million rights from:
• Publishers
• Authors
• Agents
• Creators
• 35,000 companies
• Workers worldwide
• 1,200 colleges and universities
• Publishers and Authors
CCC and Text Mining
Rightsholders Content Users
Servicing many text mining license and content requests
Managing text mining feeds
Negotiating text mining rights with multiple publishers
“Text mining” is the process of deriving high-quality information from text materials using software.
Text Mining Non-Patent Literature
• Mining limited to abstracts
• High cost to obtain formatted full-text content and permission from multiple publishers
• Multiple formats
• Researchers can’t mine content to which they are not subscribed
What is the Benefit of Full Text?
Volume Timeliness Quality
Catherine Blake. “Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles.” Journal of Biomedical Informatics Volume 43, Issue 2, April 2010, Pages 173–189
Elsevier (2015) Harnessing the Power of Content - Extracting value from scientific literature: the power of mining full-text articles for pathway analysis. Available at www.elsevier.com/__data/assets/pdf_file/0016/83005/R_D-Solutions_Harnessing-Power-of-Content_DIGITAL.pdf
Enrique Bernal-Delgado and Elliot S Fisher. “Abstracts in high profile journals often fail to report harm.” BMC Medical Research Methodology (2008); 8:14
Volume and Recall
December 2015
(Abstract: "tau hyperphosphorylation" AND Abstract: kinase OR (GSK3β OR (CDK5 OR (MAPK1 OR (MARK1 OR (MARK2 OR (MARK3 OR MARK4))))))) AND (Abstract: alzheimer OR alzheimer's)
content:"tau hyperphosphorylation kinase"~25 OR "tau hyperphosphorylation GSK3β "~25 OR "tau hyperphosphorylation CDK5"~25 OR "tau hyperphosphorylation MAPK1"~25 OR "tau hyperphosphorylation MARK1"~25 OR "tau hyperphosphorylation MARK2"~25 OR "tau hyperphosphorylation MARK3"~25 OR "tau hyperphosphorylation MARK4"~25
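The full-text query above repeats one anchor phrase against a list of kinase terms, each within a 25-word proximity window. A minimal sketch of generating such a query from a term list follows. The function name and parameters are hypothetical, and the `"..."~25` form is assumed to follow Lucene-style proximity syntax; the slide does not name the search engine.

```python
def build_proximity_query(anchor, terms, slop=25, field="content"):
    """Build one proximity clause per term and join the clauses with OR.

    Hypothetical helper: `anchor`, `terms`, `slop`, and `field` are
    illustrative names, not part of any CCC or search-engine API.
    """
    clauses = [f'"{anchor} {term}"~{slop}' for term in terms]
    return f"{field}:" + " OR ".join(clauses)


# Terms taken from the query shown on the slide.
KINASE_TERMS = ["kinase", "GSK3β", "CDK5", "MAPK1",
                "MARK1", "MARK2", "MARK3", "MARK4"]

query = build_proximity_query("tau hyperphosphorylation", KINASE_TERMS)
print(query)
```

Keeping the term list separate from the query template makes it easier to add a kinase or change the proximity window without retyping every clause.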
Volume and Recall - Results
[Bar chart: number of articles (y-axis, 0 to 800) for the BTK and tau hyperphosphorylation queries, comparing Abstract and Full text results]
Text Mining Today – Example Workflow
Search
Get permission
Download PDFs
Convert PDFs
Import into text mining software
• Perform search
• Obtain permission from publishers to mine full text for commercial use
• Requires automated tool or custom software to download in bulk
• Requires text mining permission from multiple publishers
• Requires content storage and feed management
• PDF is converted to a “blob of text”
• No tags
• Loss of metadata
• Low fidelity of content
• References induce noise
• Requires structuring text into XML
• Article text does not have “fields”
• Combining content from multiple sources takes time to normalize the metadata
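To illustrate the conversion and import steps above, here is a minimal sketch that takes the "blob of text" produced by PDF conversion, drops the reference list, and wraps the remainder in simple XML fields. The element names, the heading pattern, and the sample text are assumptions made for illustration; they do not describe any particular publisher's or vendor's schema.

```python
import re
import xml.etree.ElementTree as ET

# Assumed pattern: a line consisting only of "References" or "Bibliography"
# marks the start of the reference list. Real converted PDFs vary widely.
REFERENCES_HEADING = re.compile(
    r"^\s*(references|bibliography)\s*$", re.IGNORECASE | re.MULTILINE
)


def strip_references(text):
    """Return the text before the first reference-list heading, if any."""
    match = REFERENCES_HEADING.search(text)
    return text[:match.start()].rstrip() if match else text


def to_article_xml(title, source, body_text):
    """Wrap unstructured article text in hypothetical XML fields."""
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    ET.SubElement(article, "source").text = source
    ET.SubElement(article, "body").text = strip_references(body_text)
    return ET.tostring(article, encoding="unicode")


# Made-up sample standing in for converted PDF output.
blob = "Tau is phosphorylated by GSK3β.\nReferences\n1. Example citation."
xml_doc = to_article_xml("Example title", "Example journal", blob)
print(xml_doc)
```

Even this small step has to be repeated and tuned per source, which is part of why the manual workflow is slow.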
Search
Get permission
Download PDFs
Convert PDFs
Import into text mining software
TEXT MINING TOOLS
Run queries
View results
MANUAL WORK: Typically takes 4-8 weeks
CCC’s RightFind™ XML for Mining Service
Build a corpus of full-text articles in XML format for mining
Text Mining Software
CCC’s Text Mining Service
XML for Mining
• Rapid inventory growth
• MEDLINE abstract corpus
• Purchase unsubscribed articles through a cost optimization process
• MeSH article tagging and flat synonym list
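One benefit of an XML corpus is that a query can be restricted to a field, which plain converted text does not allow. The sketch below checks whether a term appears in the abstract or only in the body of an article. The element names and the sample article are made up for illustration and are not the RightFind XML for Mining schema.

```python
import xml.etree.ElementTree as ET

# Made-up sample article with hypothetical element names.
SAMPLE = """<article>
  <title>Example article</title>
  <abstract>Tau pathology is a hallmark of Alzheimer's disease.</abstract>
  <body>
    <sec><title>Results</title><p>CDK5 drives tau hyperphosphorylation in neurons.</p></sec>
  </body>
</article>"""


def field_text(article, tag):
    """Collect all text under the first child element named `tag`."""
    node = article.find(tag)
    if node is None:
        return ""
    # Join nested text and collapse runs of whitespace.
    return " ".join("".join(node.itertext()).split())


def mentions(article_xml, term):
    """Report, per field, whether `term` occurs (case-insensitive)."""
    article = ET.fromstring(article_xml)
    needle = term.lower()
    return {
        "abstract": needle in field_text(article, "abstract").lower(),
        "body": needle in field_text(article, "body").lower(),
    }


hits = mentions(SAMPLE, "CDK5")
print(hits)
```

In this made-up sample the kinase is named only in the body, so an abstract-only search would miss the article.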
Market Observations and Future Vision
ACCESS
AUTOMATION
Thank you!
Mike Iarrobino
Product Manager, [email protected]