velterop 2 a ssp arlington may 2015

Big Journal Literature

Big Usage

Jan Velterop – SSP – Arlington, May 28, 2015

11,135,542

More than 2 added every minute of 2014

Number of abstracts in PubMed
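A quick back-of-the-envelope check of what "more than 2 every minute" implies over the whole of 2014. This is only a lower bound worked out from the rate quoted on the slide, not an exact count from PubMed; the variable names are made up for the illustration.

```python
from datetime import date

# Minutes in calendar year 2014 (not a leap year).
days_in_2014 = (date(2015, 1, 1) - date(2014, 1, 1)).days
minutes_in_2014 = days_in_2014 * 24 * 60

# "More than 2 added every minute" only gives a lower bound for the year.
lower_bound_added_2014 = 2 * minutes_in_2014

print(minutes_in_2014)         # 525600
print(lower_bound_added_2014)  # 1051200
```

In other words, the quoted rate corresponds to more than a million new abstracts in a single year.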

Information overload!

Is that Overload?

Or rapidly increasing knowledge…

…making a world of difference that can change the course of scientific thought?

Dissemination of knowledge

Optimal dissemination for

Lamp post research

Looking merely at the literature that one can read – which is not necessarily all the literature that is potentially important to one’s research

Lamp post research:

Big Usage

But not in the way we’re used to

So, what to do?

Every problem has its solution

Possible strategies:

1. Publish a smaller number of papers

2. Accept that an ever smaller proportion of the available papers is actually being read

3. Capture the knowledge contained in all papers and map it in such a way that you can navigate that knowledge


1. Publish a smaller number of papers

Maybe, but if it means less information, it’s ludicrous






How to choose, though?


In any event:

l’embarras du choix (spoilt for choice)





Yes! Helps to see trends and what to choose!

First create an overview…

…only then start digging

How might we create overviews?

“As the rate of publishing accelerates, the need for computational support to work out which articles to read, and how to interpret, reproduce and validate the claims they contain is growing.”

Quote from ‘Lazarus’: http://www.bbsrc.ac.uk/pa/grants/AwardDetails.aspx?FundingReference=BB/L005298/1


Extract Key Insights


Imagine you had a paper that concluded:

“On hot days, it turns out that aspirin decreases the chances of blood clots, but increases the chances of heart attack in humans; the effect wasn't observed in rats at all; simulations of dogs seem to suggest that the effect is present but independent of temperature unless the dog is accompanied by a human”

With the significant terms highlighted: hot days, aspirin, decreases, blood clots, increases, heart attack, humans, rats, dogs, temperature, dog, human

Significant concepts:

[CHEMBL25] (aspirin)
[EFO_0001702] ('temperature' from the experimental factors ontology)
[Canis lupus familiaris]
[Homo sapiens]
[Mus musculus]

Headline Interactions (in the form of Triples):

[ASPIRIN] [DECREASES] [THROMBOSIS]
[ASPIRIN] [INCREASES] [MYOCARDIAL INFARCTION]
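One way such a summary could be held in machine-readable form is sketched below. The identifiers and triples are copied from the slide; the `Triple` class, the field names and the `summary_block` helper are assumptions made for this illustration and are not part of Lazarus or Utopia Documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A single subject-predicate-object assertion (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str

    def as_text(self) -> str:
        return f"[{self.subject}] [{self.predicate}] [{self.obj}]"


# Concepts exactly as listed on the slide: identifier or name -> note.
significant_concepts = {
    "CHEMBL25": "aspirin",
    "EFO_0001702": "'temperature' from the experimental factors ontology",
    "Canis lupus familiaris": "",
    "Homo sapiens": "",
    "Mus musculus": "",
}

headline_interactions = [
    Triple("ASPIRIN", "DECREASES", "THROMBOSIS"),
    Triple("ASPIRIN", "INCREASES", "MYOCARDIAL INFARCTION"),
]


def summary_block(concepts: dict, triples: list) -> str:
    """Render the concepts and triples as plain text for an abstract."""
    lines = ["Significant concepts:"]
    for identifier, note in concepts.items():
        lines.append(f"[{identifier}] ({note})" if note else f"[{identifier}]")
    lines.append("Headline Interactions (in the form of Triples):")
    lines.extend(t.as_text() for t in triples)
    return "\n".join(lines)


print(summary_block(significant_concepts, headline_interactions))
```

Keeping the triples as structured objects rather than free text is what makes them comparable across articles later on.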


Add this to the article’s abstract (after it’s been validated by the author):

Most efficient: if publishers were to do this (doesn’t cost much, and makes articles far more useful)

In case publishers don’t, alternative ways are being developed outside publishers’ control

Currently: publishing data in articles equals burying data (R.I.P.)

Via Utopia Documents, LAZARUS ‘resurrects’ knowledge from being buried in articles:

• entities (‘concepts’, incl. synonyms, e.g. proteins)
• phrases, statements, assertions (e.g. triples)
• molecules (incl. Markush structure groups)
• graphs
• tables

http://utopiadocs.com


These are captured, with their provenance (e.g. DOI), in a ‘Knowledge Graph’ of their relationships.

When assertions are captured, they are compared to the Knowledge Graph and labelled as ‘new’ (to the Graph) or ‘already found earlier’.

Should be interesting for the peer reviewer of a newly submitted article
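The capture-and-compare step could look roughly like the sketch below: each assertion is stored with the DOIs it came from, and a newly captured assertion is labelled by checking whether the graph has seen it before. The `KnowledgeGraph` class, its method names and the DOIs are hypothetical; how Lazarus actually stores and matches assertions is not described here.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy store of assertions with provenance (illustrative only)."""

    def __init__(self):
        # (subject, predicate, object) -> list of DOIs asserting it
        self._provenance = defaultdict(list)

    def add_assertion(self, triple: tuple, doi: str) -> str:
        """Record the assertion and return its label relative to the graph."""
        # Membership test on a defaultdict does not create the key.
        label = "already found earlier" if triple in self._provenance else "new"
        if doi not in self._provenance[triple]:
            self._provenance[triple].append(doi)
        return label

    def provenance(self, triple: tuple) -> list:
        """Return the DOIs recorded for this assertion (empty if unknown)."""
        return list(self._provenance.get(triple, []))


graph = KnowledgeGraph()
claim = ("ASPIRIN", "DECREASES", "THROMBOSIS")

# The DOIs below are placeholders, not real articles.
print(graph.add_assertion(claim, "10.1234/example.1"))  # new
print(graph.add_assertion(claim, "10.1234/example.2"))  # already found earlier
print(graph.provenance(claim))
```

Exact-match comparison is the simplest possible choice; a real system would presumably also have to reconcile synonyms before deciding that two assertions are the same.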

“Lazarus to harness the crowd reading life-science articles to resurrect the swathes of legacy data buried in charts, tables, diagrams and free-text, to liberate processable data into a shared resource that benefits the community.”


“…activities currently carried out anyway by individuals for their own purposes (annotating, cross-referencing articles with databases, organising collections of articles).”


Works on any PDF, from paywalled and open sources alike


VHL protein binds to HIF-α which is ubiquitinated and tagged for degradation in the proteasome.

‘Assertions’ and ‘significant concepts’ extracted from articles (either by the publisher or by others, like Utopia’s LAZARUS), are added to a growing ‘knowledge graph’ which can be analysed for trends, clusters, areas of intensive activity, etc.
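A minimal sketch of what analysing such a graph for trends and clusters might involve: counting how often each concept is mentioned per year, and grouping concepts that are connected through assertions. The sample assertions (including the two loosely derived from the VHL sentence), the years and the function names are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical assertions: (subject, predicate, object, publication year).
assertions = [
    ("VHL", "BINDS", "HIF-ALPHA", 2013),
    ("ASPIRIN", "DECREASES", "THROMBOSIS", 2013),
    ("ASPIRIN", "INCREASES", "MYOCARDIAL INFARCTION", 2014),
    ("HIF-ALPHA", "DEGRADED_IN", "PROTEASOME", 2014),
]


def mentions_per_year(items):
    """Trend view: how many assertions mention each concept in each year."""
    counts = Counter()
    for subject, _predicate, obj, year in items:
        counts[(subject, year)] += 1
        counts[(obj, year)] += 1
    return counts


def clusters(items):
    """Cluster view: groups of concepts linked by at least one assertion."""
    neighbours = defaultdict(set)
    for subject, _predicate, obj, _year in items:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)

    seen, result = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first walk over connected concepts
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        result.append(component)
    return result


print(mentions_per_year(assertions)[("ASPIRIN", 2014)])
for group in clusters(assertions):
    print(sorted(group))
```

With these four sample assertions the concepts fall into two groups, one around aspirin and one around VHL and HIF-α; on a real graph the same idea would point to areas of intensive activity.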

Getting the picture from a large amount of data

What we need is information extracted from as many articles as possible

The more we have, the ‘sharper’ the knowledge picture

Getting a better picture from even more assertions

Homing in, i.e. making the choice of what to read in detail

BRAIN — Bio Relations And Intelligence Network

“Recombinant Knowledge”

Once researchers have identified the articles they really need to read, it should be made very easy to do so

Ergo, what publishers should do, too, is to make all articles available in all formats: HTML, XML, PDF and ePub, even print on demand.

Also on mobile devices

For instance:

Easier than you might think

(www.researchpad.co)


Build collection of favourites

Read full text

Inspect metrics

Share with others


In their words:



ResearchPad Launch Process

Project Definition

Branding

Publishing

Go Live

Turnaround Time - 8 weeks

Slide borrowed from:

What ResearchPad can do for publishers who want it, at no extra cost*, is to integrate a publisher’s content with anything from elsewhere that’s freely available with open access, so that this open access material can be accessed from within the publisher’s platform

* personal communication




Thank you

Jan Velterop – 28 May 2015

