Text and Data Mining (TDM): Tools to make it easier, by Chuck Koscher
TRANSCRIPT
![Page 1: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/1.jpg)
Chuck Koscher, Director of Technology
Ejournal Press Roundtable, Washington DC
February 5, 2014
Text and Data Mining (TDM): Tools to make it easier
![Page 2: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/2.jpg)
Geoffrey Bilder, Director of Strategic Initiatives
Taking the tedium out of TDM….
![Page 3: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/3.jpg)
X 4079 CrossRef publishers
![Page 4: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/4.jpg)
• Subscription-based publishers find it impractical to negotiate multiple bilateral agreements with thousands of researchers and institutions in order to authorize TDM of subscribed content.
• Researchers find it impractical to negotiate multiple bilateral agreements with hundreds of subscription-based publishers in order to authorize TDM of subscribed content.
• All parties would benefit from support of standard APIs and data representations in order to enable TDM across both open access and subscription-based publishers.
![Page 5: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/5.jpg)
Prospect Working Group
• AAAS: Walter Jones, Stewart Wills, Deborah Rivera-Wienhold
• American Institute of Physics: Evan Owens
• American Physical Society: Mark Doyle
• Elsevier: Chris Shillum, Ale de Vries
• HighWire: John Sack, Craig Jurney
• Institute of Physics Publishing: Graham McCann, James Walker
• Springer: Chinchu Ann Belarmin, Michiel van der Heyden
• Taylor & Francis: Gillian Howcroft
• Walter de Gruyter: Bettina de Keijzer
• Wiley: Edward Wates, Alan Bacon
• CrossRef: Geoffrey Bilder, Chuck Koscher, Ed Pentz, Carol Meyer, Kirsty Meddings.
![Page 6: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/6.jpg)
There is no fee
Text and Data Mining (TDM)
![Page 7: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/7.jpg)
Text & Data Mining (TDM)
IS
• Ways to automate the acceptance, and the verification of acceptance, of terms-of-use licenses
• Standardized techniques for navigating to the content
IS NOT
• Access control to content *
• Actual delivery of content *
* These are under the control of, and are the responsibility of, the publisher
![Page 8: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/8.jpg)
Text & Data Mining (TDM)
1)
2)
![Page 9: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/9.jpg)
Based on Content Negotiation
DOI
10.1371/journal.pone.0031314
![Page 11: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/11.jpg)
curl -L -iH "Accept: text/turtle" http://dx.doi.org/10.5555/515151
Based on Content Negotiation
http://help.crossref.org/#content_negotiation
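The curl call above can be mirrored in a script. The following is a minimal Python sketch, assuming the same dx.doi.org resolver and the text/turtle media type used in the slide; the function names are illustrative and not part of any CrossRef tool.

```python
import urllib.request

# Resolver taken from the curl example in the slides.
DOI_RESOLVER = "http://dx.doi.org/"


def build_cn_request(doi, media_type="text/turtle"):
    """Build (but do not send) a content-negotiation request for a DOI."""
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        headers={"Accept": media_type},  # same header as curl -H "Accept: ..."
    )


def fetch_metadata(doi, media_type="text/turtle"):
    """Send the request and return the response body as text.

    urlopen follows redirects by default, which corresponds to curl -L.
    """
    with urllib.request.urlopen(build_cn_request(doi, media_type)) as resp:
        return resp.read().decode("utf-8")


# Example (performs a network call):
# print(fetch_metadata("10.5555/515151"))
```

Keeping request construction separate from the network call makes the header logic easy to check without contacting the resolver.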
![Page 12: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/12.jpg)
What do publishers need to do
• Deposit text-mining-specific metadata with CrossRef
MUST
• Distribute mineable data
MUST
• Register licenses and validate users’ requests for data
Might want to
![Page 13: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/13.jpg)
What do researchers need to do
• Use content negotiation to retrieve an article’s metadata
• Extract license and mineable URL
• Use this data to retrieve the mineable content
MUST
• If required by the publisher, log in to the license registry, then review and accept the applicable licenses.
MUST
• Be nice when retrieving mineable content
Might want to
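One way to "be nice" is to space out requests to a publisher's platform. The sketch below is only an illustration in Python: the class name and the idea of a fixed minimum interval are assumptions, since the slides leave the acceptable request frequency to each publisher.

```python
import time


class PoliteThrottle:
    """Enforce a minimum interval between successive requests (hypothetical helper)."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._last = None      # time of the previous request, None before the first

    def wait(self):
        """Sleep if the previous call was too recent; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval_s - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


# Example: call throttle.wait() before each full-text download.
# throttle = PoliteThrottle(min_interval_s=2.0)
# for url in urls:
#     throttle.wait()
#     download(url)  # download() is a placeholder, not defined here
```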
![Page 14: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/14.jpg)
TDM driven by specific metadata (that must be deposited to CrossRef)
1) Tell the researcher where to go to get the content
<collection property="text-mining">
  <item>
    <resource mime_type="application/pdf">
      http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.pdf
    </resource>
  </item>
  <item>
    <resource mime_type="application/xml">
      http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.xml
    </resource>
  </item>
</collection>
For example, IF the publisher’s content delivery platform abides by Accept headers:
<collection property="text-mining">
  <item>
    <resource mime_type="application/pdf">
      http://dx.doi.org/10.5555/515151
    </resource>
  </item>
  <item>
    <resource mime_type="application/xml">
      http://dx.doi.org/10.5555/515151
    </resource>
  </item>
</collection>
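A researcher's tool could pull the mineable URLs out of a text-mining collection roughly as follows. This is a Python sketch under two assumptions: the fragment arrives as plain XML without namespaces (real deposits may be namespaced), and the helper name mineable_links is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample fragment copied from the slide, with straight quotes.
SAMPLE = """<collection property="text-mining">
  <item>
    <resource mime_type="application/pdf">
      http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.pdf
    </resource>
  </item>
  <item>
    <resource mime_type="application/xml">
      http://annalsofpsychoceramics.labs.crossref.org/fulltext/10.5555/515151.xml
    </resource>
  </item>
</collection>"""


def mineable_links(xml_text):
    """Return {mime_type: url} for resources inside text-mining collections."""
    root = ET.fromstring(xml_text)
    links = {}
    # iter() also visits the root element itself, so a bare <collection> works.
    for coll in root.iter("collection"):
        if coll.get("property") != "text-mining":
            continue  # ignore collections deposited for other purposes
        for res in coll.iter("resource"):
            mime = res.get("mime_type")
            url = (res.text or "").strip()
            if mime and url:
                links[mime] = url
    return links


# Example:
# mineable_links(SAMPLE)["application/xml"]
```

When both entries point at the same DOI URL, the returned MIME type is what the client would send in its Accept header.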
![Page 15: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/15.jpg)
2) Tell the researcher what TDM licenses apply to the article
Creative Commons CC-BY license:
<program name="AccessIndicators">
  <license_ref>
    http://creativecommons.org/licenses/by/3.0/deed.en_US
  </license_ref>
</program>
Publisher’s proprietary license:
<program name="AccessIndicators">
  <license_ref>
    http://www.annalsofpschoceramics.org/art_license.html
  </license_ref>
</program>
1 year embargo license:
<program name="AccessIndicators">
  <license_ref start_date="2013-02-03">
    http://www.crossref.org/license
  </license_ref>
</program>
<program name="AccessIndicators">
  <license_ref start_date="2014-02-03">
    http://creativecommons.org/licenses/by/3.0/deed.en_US
  </license_ref>
</program>
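The embargo case suggests that a client has to compare start_date values with the date of access. The Python sketch below works under stated assumptions: the license in effect is taken to be the one with the latest start_date on or before the given day, a license_ref without start_date is treated as always in effect, and the wrapping <metadata> element and the function name are hypothetical. The slides do not spell out these rules.

```python
import datetime as dt
import xml.etree.ElementTree as ET

# Embargo example from the slide, wrapped in a made-up root so it parses as one document.
EMBARGO = """<metadata>
  <program name="AccessIndicators">
    <license_ref start_date="2013-02-03">http://www.crossref.org/license</license_ref>
  </program>
  <program name="AccessIndicators">
    <license_ref start_date="2014-02-03">http://creativecommons.org/licenses/by/3.0/deed.en_US</license_ref>
  </program>
</metadata>"""


def license_in_effect(xml_text, on):
    """Return the license URL assumed to apply on date `on`, or None."""
    root = ET.fromstring(xml_text)
    current = None  # (start_date, url) of the best candidate so far
    for ref in root.iter("license_ref"):
        raw = ref.get("start_date")
        # Assumption: no start_date means the license has always applied.
        start = dt.date.fromisoformat(raw) if raw else dt.date.min
        url = (ref.text or "").strip()
        if start <= on and (current is None or start >= current[0]):
            current = (start, url)
    return current[1] if current else None


# Example:
# license_in_effect(EMBARGO, dt.date(2013, 6, 1))
```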
![Page 16: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/16.jpg)
Researchers should (must) understand and agree to the terms of the license
Can this be validated?
![Page 21: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/21.jpg)
Researcher queries DOI using CN + API token
Publisher verifies API token with Prospect
If token verified AND access control allows, publisher returns full text
(frequency at publisher discretion)
![Page 22: Text and Data Mining (TDM):Tools to make it easier by Chuck Koscher](https://reader035.vdocument.in/reader035/viewer/2022070319/55835c42d8b42aa3798b533e/html5/thumbnails/22.jpg)
curl -k -H "CR-Prospect-Client-Token: hZqJDbcbKSSRgRG_PJxSBAx" "https://psychoceramics.org/fulltext/515151" -D - -L -O
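The same request can be issued from a script. This Python sketch assumes only what the curl line shows, namely the CR-Prospect-Client-Token header and the example URL; the function names are illustrative. Unlike curl -k, it leaves TLS certificate verification switched on.

```python
import urllib.request

# Header name taken from the curl example in the slides.
TOKEN_HEADER = "CR-Prospect-Client-Token"


def build_fulltext_request(url, token):
    """Build (but do not send) a full-text request carrying the client token."""
    return urllib.request.Request(url, headers={TOKEN_HEADER: token})


def download_fulltext(url, token, dest_path):
    """Fetch the full text and write it to dest_path; return the HTTP status.

    Redirects are followed automatically, as with curl -L.
    """
    with urllib.request.urlopen(build_fulltext_request(url, token)) as resp:
        with open(dest_path, "wb") as out:
            out.write(resp.read())
        return resp.status


# Example (performs a network call; the token is the sample value from the slide):
# download_fulltext("https://psychoceramics.org/fulltext/515151",
#                   "hZqJDbcbKSSRgRG_PJxSBAx", "515151")
```

Whether the publisher returns the content still depends on its own token check and access control.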