text mining full text for molecular targets

Upload: ann-marie-roche

Post on 16-Jul-2015

Category:

Science


TRANSCRIPT

Page 1: Text mining full text for molecular targets

Country Long Distance

Australia +61 3 8488 8993

Austria +43 (0) 7 2088 2171

Belgium +32 (0) 42 68 0164

Canada +1 (647) 497-9386

Denmark +45 (0) 89 88 04 43

Finland +358 (0) 931 58 4587

France +33 (0) 182 880 933

Germany +49 (0) 692 5736 7304

Ireland +353 (0) 19 036 186

Text Mining Full Text for Molecular Targets with George Jiang, Ph.D., M.B.A

Our Webinar will begin in a few minutes

Country Long Distance

Italy +39 0 294 75 15 36

Netherlands +31 (0) 108 080 115

New Zealand +64 (0) 9 801 0293

Norway +47 21 03 72 89

Spain +34 911 23 4247

Sweden +46 (0) 852 500 292

Switzerland +41 (0) 435 0824 40

United Kingdom +44 (0) 330 221 9921

United States +1 (646) 307-1726

TO USE YOUR COMPUTER'S AUDIO: When the webinar begins, you will be connected to audio using your computer's microphone and speakers (VoIP). A headset is recommended.

--OR--

TO USE YOUR TELEPHONE: If you prefer to use your phone, you must select "Use Telephone" after joining the webinar and call in using the numbers below.

Dial your country’s number and then use Access Code: 655-028-479

Page 2: Text mining full text for molecular targets

Text Mining Full Text for Molecular Targets

George Jiang, PhD, MBA

Product Manager, Text Mining

[email protected]

March 31, 2015

Page 3: Text mining full text for molecular targets

George Jiang

Product Manager, Text Mining

Trained scientist with several years of experience in text analytics, data integration, and scientific software development

• Currently, Product Manager with Elsevier working on text mining projects and semantic search products, based out of Rockville, MD

• Previously, worked at the US National Center for Biotechnology Information (NCBI) on the Discovery Initiative to understand users' needs, crosslink data, and expose it to make research information more discoverable

Text Mining Full Text for Molecular Targets – March 31, 2015 – George Jiang

Page 4: Text mining full text for molecular targets

World Leader in Digital Information Solutions

• Founded over 130 years ago

• Published over 330,000 articles in 2013

• Received over 1 million submissions in 2013

• Work with over 30 million scientists, students, health & information professionals

• Over 53 million items indexed by Scopus

[Diagram labels: CONTENT, CAPABILITIES, PLATFORMS, SOLUTIONS]

CONTENT

• Publishes over 2,200 online journals & over 26,000 books (e + print)

• Elsevier eBooks, Online Journals, Databases

SOLUTIONS

• Elsevier R+D Solutions: Helps corporate researchers, R+D professionals, and engineers improve how they interact with, share, and apply information to solve problems using our digital workflow tools, analytics, and data

• Elsevier Research Intelligence: Provides universities, governments, and research institutions with the resources and insights to improve institutional research strategy, management, and performance.

• Elsevier Clinical Solutions: Helps medical professionals apply trusted data and sophisticated tools to make better clinical decisions, deliver better care, and produce better healthcare outcomes.

• Elsevier Education: Helps educate highly-skilled, effective healthcare professionals, using the most advanced pedagogical tools and reference works.

Page 5: Text mining full text for molecular targets

Working With Text is A Big Data Challenge

Text is everywhere! We’ve already covered 100s of terms in this presentation.

Twitter – 58M tweets/day x 14.98 words/tweet => 868M words/day => 6B

Average journal article = 10, 150, 6000 words in title, abstract, full text

Abstracts – 2.4B words (24M abstracts @ PubMed x 100 words/abstract)

Full text – 144B words (if comparable set from PubMed, 24M x 6000 words/article)

The information deluge of scientific content and how to manage and/or leverage this information is a big data challenge

Information seeking challenges can be addressed with automation assistance and text mining for greater insight
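The word-count estimates on this slide can be reproduced with a few multiplications. The sketch below is only a back-of-the-envelope check: the constant names are invented for illustration, and it assumes the full-text set is the same 24M PubMed records used for the abstracts estimate, since 24M x 6000 is what yields the quoted 144B.

```python
# Back-of-the-envelope word counts, using only figures quoted on the slide.
TWEETS_PER_DAY = 58_000_000
WORDS_PER_TWEET = 14.98
PUBMED_RECORDS = 24_000_000      # assumed to apply to abstracts and full text alike
WORDS_PER_ABSTRACT = 100         # figure used in the slide's 2.4B estimate
WORDS_PER_FULL_TEXT = 6_000


def total_words(items, words_per_item):
    """Total words for a collection of equally sized items."""
    return items * words_per_item


twitter_words_per_day = total_words(TWEETS_PER_DAY, WORDS_PER_TWEET)
abstract_words = total_words(PUBMED_RECORDS, WORDS_PER_ABSTRACT)
full_text_words = total_words(PUBMED_RECORDS, WORDS_PER_FULL_TEXT)

print(f"Twitter:   {twitter_words_per_day / 1e6:.0f}M words/day")
print(f"Abstracts: {abstract_words / 1e9:.1f}B words")
print(f"Full text: {full_text_words / 1e9:.0f}B words")
print(f"Full text is {full_text_words // abstract_words}x the abstract volume")
```

Under these assumptions, full text carries 60 times as many words as the abstracts for the same set of articles.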

Page 6: Text mining full text for molecular targets

Summary

• Text mining can help to sift through large amounts of scientific literature and other textual content

• Text mining can help to increase project team efficiency to find precise statements and relationships

• Full text articles provide richer result sets that can be useful in finding additional insights that cannot be garnered just using abstracts

• Several hurdles still exist to implement text mining but the value can outweigh costs

Text mining full text can be used to help find molecular targets of interest quickly that may be missed if relying on abstracts and keyword searching

Page 7: Text mining full text for molecular targets

Agenda

• Introduction to Text Mining

• The Value of Full Text Articles

• Illustration of Text Mining Full Text Articles

• Recap

• Q&A

Page 8: Text mining full text for molecular targets

What is Text Mining?

Text Mining

• Refers to the process of deriving high-quality structured information (facts) from documents

[Diagram: documents → facts, e.g. "A Does B", "X Inhibits Y", "G Stops D", "I Drink T"]

Why Text Mining?

• Text mining can yield better results and increase team efficiency

• Text mining techniques can be applied to solve business problems

Page 9: Text mining full text for molecular targets

Example of Getting Structured Information (Facts)

documents

paragraph

Tumour cells show greater dependency on glycolysis so providing a sufficient and rapid energy supply for fast growth. In many breast cancers, estrogen, progesterone and epidermal growth factor receptor-positive cells proliferate in response to growth factors and growth factor antagonists are a mainstay of treatment. However, triple negative breast cancer (TNBC) cells lack receptor expression, are frequently more aggressive and are resistant to growth factor inhibition. Downstream of growth factor receptors, signal transduction proceeds via phosphatidylinositol 3-kinase (PI3k), Akt and FOXO3a inhibition, the latter being partly responsible for coordinated increases in glycolysis and apoptosis resistance. FOXO3a may be an attractive therapeutic target for TNBC. Therefore we have undertaken a systematic review of FOXO3a as a target for breast cancer therapeutics.

sentence

Triple negative breast cancer (TNBC) cells lack receptor expression, are frequently more aggressive and are resistant to growth factor inhibition

fact(s)

TNBC cells lack receptor expression

TNBC cells are more aggressive

TNBC cells resist growth factor inhibition
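The step from one sentence to several facts can be sketched in a few lines. This is a deliberately naive illustration, not the method used by any Elsevier product: the `Fact` type and the `split_coordinated_predicates` helper are hypothetical, the subject is supplied by hand, and the predicates are split on commas and "and" only. It also keeps each predicate verbatim rather than rephrasing it as the slide does.

```python
import re
from typing import List, NamedTuple


class Fact(NamedTuple):
    """A minimal fact: a subject plus one statement about it."""
    subject: str
    statement: str


def split_coordinated_predicates(subject: str, predicate_text: str) -> List[Fact]:
    """Split a comma/'and'-coordinated predicate list into one Fact per predicate."""
    parts = re.split(r",\s*|\s+and\s+", predicate_text)
    return [Fact(subject, part.strip()) for part in parts if part.strip()]


facts = split_coordinated_predicates(
    "TNBC cells",
    "lack receptor expression, are frequently more aggressive "
    "and are resistant to growth factor inhibition",
)

for fact in facts:
    print(f"{fact.subject} {fact.statement}")
```

A production system would rely on linguistic parsing and an ontology rather than punctuation, because "and" also joins nouns ("estrogen, progesterone and epidermal growth factor") that should not be split into separate facts.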

tokens

Word cloud plotted with Wordle.net

Excerpt from Taylor et al. Evaluating the evidence for targeting FOXO3a in breast cancer: a systematic review.

Text analytics and visualizations

Page 10: Text mining full text for molecular targets

What is Text Mining Being Used For?

Use cases include:

• Target identification and prioritization

• Biomarker discovery

• Drug repurposing

• Drug safety and finding adverse events

• Clinical study design and site selection

• Competitive intelligence

[Pipeline stages: Discovery → Pre-clinical → Clinical → Post-launch, spanning Basic Research to Applied Research]

Text mining can be used to support several research and development areas

• Information retrieval and analysis of biomedical literature for target identification, systematic reviews, etc.

• Searching clinical trial data or electronic health records to find signals in patient populations

• Triage of news and papers for literature curation and regulatory reporting

• Identifying relevant items for meta-analysis of specific research results

• Text mining article submissions for curation assistance in publishing

Page 11: Text mining full text for molecular targets

How to Text Mine?

• Content

• Ontology

• Software solution(s)

• Expertise

Several pieces and steps are often needed to get results from text mining

• PDF -> XML

• XML quality differs

• XML uniformity, e.g. dealing with sources, types, etc.

Default or custom ontology

• Text mining the corpus

• Balancing expectations of

precision and recall
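Precision and recall have standard definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The sketch below computes both from two sets of document IDs; the function name and the IDs are made up for illustration.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a retrieved set against a relevant set."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document IDs, for illustration only.
retrieved_docs = {"doc1", "doc2", "doc3", "doc4"}
relevant_docs = {"doc2", "doc4", "doc5"}

precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Tightening a query usually raises precision at the cost of recall, and loosening it does the opposite, which is the balance the bullet refers to.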

1. Aggregate

2. Structure

3. Normalize

4. Integrate
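The four steps can be pictured as a small pipeline. What follows is a toy sketch under stated assumptions, not a description of any vendor's software: the `ONTOLOGY` dictionary, the function names, and the sample documents are all invented, sentence splitting is a naive regex, and "integrate" is reduced to counting concept mentions.

```python
import re
from collections import Counter

# Hypothetical mini-ontology: lowercase surface form -> preferred concept name.
ONTOLOGY = {
    "tnbc": "triple negative breast cancer",
    "triple negative breast cancer": "triple negative breast cancer",
    "foxo3a": "FOXO3a",
    "pi3k": "PI3K",
}


def aggregate(*sources):
    """1. Aggregate: pool documents from several sources into one corpus."""
    return [doc for source in sources for doc in source]


def structure(corpus):
    """2. Structure: split each document into sentences (naive splitter)."""
    sentences = []
    for doc in corpus:
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip())
    return sentences


def normalize(sentence):
    """3. Normalize: map surface forms found in a sentence to ontology concepts."""
    lowered = sentence.lower()
    return {
        concept
        for term, concept in ONTOLOGY.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def integrate(sentences):
    """4. Integrate: count how many sentences mention each concept."""
    counts = Counter()
    for sentence in sentences:
        counts.update(normalize(sentence))
    return counts


# Invented sample documents standing in for two content sources.
journals = ["FOXO3a may be an attractive therapeutic target for TNBC. "
            "Signal transduction proceeds via PI3k."]
internal = ["Triple negative breast cancer cells lack receptor expression."]

sentences = structure(aggregate(journals, internal))
counts = integrate(sentences)
print(counts.most_common())
```

Because "TNBC" and "triple negative breast cancer" map to the same concept, the two mentions are counted together, which is the point of the normalization step.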

Text Mining solutions & Professional Services

Page 12: Text mining full text for molecular targets

Elsevier Offers Several Text Mining Solutions

[Diagram: content in (Journals and Books, Internal content, Patents, Public data sources, Other) plus User Questions → Software solution (UI / API) performing 1. Aggregate, 2. Structure, 3. Normalize, 4. Integrate → facts and data out, to support downstream applications and activities]

Software solutions and Professional Services available for text mining and semantic searching

Page 13: Text mining full text for molecular targets

Agenda

• Introduction to Text Mining

• The Value of Full Text Articles

• Illustration of Text Mining Full Text Articles

• Recap

• Q&A

Page 14: Text mining full text for molecular targets

Abstracts vs Full Text

Summary of main differences

Abstracts

• Concise summaries

• Readily accessible

• Relatively uniform

Full Text

• Complete documents

• May not be as accessible

• Information within can vary

Page 15: Text mining full text for molecular targets

Benefits of Using Full Text

Full text provides richer result sets

• Distribution of keywords, facts and relations – more keywords, facts and relations are found in full text

• Concept under-representation in abstracts – specific entities may not be mentioned in abstracts but primarily in full text sections, e.g. biological functions

• Missing negative data – negative results or non-significant data are often missing from abstracts

• Citations per article – full text sections are more cited vs abstracts

• Timeliness – relevant facts and relationships can be found in full text first, before any mentions in abstracts, as researchers surmise in

Page 16: Text mining full text for molecular targets

Additional Reading

Articles highlighting the differences between abstracts and full text

• Information extraction from full text scientific articles: where are the keywords? BMC Bioinformatics. 2003 May 29;4:20. Epub 2003 May 29.

• Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles. J Biomed Inform. 2010 Apr;43(2):173-89. doi: 10.1016/j.jbi.2009.11.001. Epub 2009 Nov 10.

• Do Peers See More in a Paper Than Its Authors? Adv Bioinformatics. 2012;2012:750214. doi: 10.1155/2012/750214. Epub 2012 Nov 27.

• Is searching full text more effective than searching abstracts? BMC Bioinformatics. 2009 Feb 3;10:46. doi: 10.1186/1471-2105-10-46.

• Challenges for automatically extracting molecular interactions from full-text articles. BMC Bioinformatics. 2009 Sep 24;10:311. doi: 10.1186/1471-2105-10-311.

• Semi-Automatic Indexing of Full Text Biomedical Articles. AMIA Annu Symp Proc. 2005:271-5.

• Discovering implicit associations between genes and hereditary diseases. Pac Symp Biocomput. 2007:316-27.

• The structural and content aspects of abstracts versus bodies of full text journal articles are different. BMC Bioinformatics. 2010 Sep 29;11:492. doi: 10.1186/1471-2105-11-492.

• Abstracts in high profile journals often fail to report harm. BMC Med Res Methodol. 2008 Mar 27;8:14. doi: 10.1186/1471-2288-8-14.

• Quality of abstracts of original research articles in CMAJ in 1989. CMAJ. 1991 Feb 15;144(4):449-53.

• Accuracy of data in abstracts of published research articles. JAMA. 1999 Mar 24-31;281(12):1110-1.

Page 17: Text mining full text for molecular targets

Abstract vs Full Text Example

Concise abstracts cannot contain all details whereas full text will contain all the relevant information

Abstract

Significant advances have been made in the treatment of human immunodeficiency virus (HIV) infection over the past two decades. Improved therapy has prolonged survival and improved clinical outcome for HIV-infected children and adults. Sixteen antiretroviral (ART) medications have been approved for use in pediatric HIV infection. The Department of Health and Human Services (DHHS) has issued “Guidelines for the Use of Antiretroviral Agents in Pediatric HIV Infection”, which provide detailed information on currently recommended antiretroviral therapies (ART). However, consultation with an HIV specialist is recommended as the current therapy of pediatric HIV therapy is complex and rapidly evolving.

Full Text

Elvitegravir is a once daily integrase inhibitor being studied in adults.

Children with treatment failure should be evaluated for medication adherence, drug intolerance, and possible drug interactions which may lessen the efficacy of the therapeutic regimen.

Challenges

• Sifting through more information!

• Finding the right results

Page 18: Text mining full text for molecular targets

Agenda

• Introduction to Text Mining

• The Value of Full Text Articles

• Illustration of Text Mining Full Text Articles

• Recap

• Q&A

Page 19: Text mining full text for molecular targets

Methods

• Use Elsevier Text Mining solution to search against corpus of biomedical literature

  • Abstracts – MEDLINE/PubMed (24M)

  • Full text – PubMed Central, Elsevier and partner publishers (4M)

• Refine results corpus, redefine search / text mining output

• Review and analyze data

• Create visual data reports using other tools available

Page 20: Text mining full text for molecular targets

Text Mining Abstracts vs Full Text

Word clouds suggest insight differences between abstracts and full text

Search against scientific literature corpus for sentences related to efficacy

[Word clouds: Full text | Abstracts only]

If looking for details, one really needs to look at the full text results

Page 21: Text mining full text for molecular targets

Finding Molecular Targets in Full Text

Word clouds illustrating differences in point mutations mentioned

Full text provides insights into the specific mutations implicated in differential enzymatic efficacy of a particular drug class

[Word clouds: Full text | Abstracts only]

Full text gives insight into the mutations implicated in changes in efficacy.

No mutations mentioned in abstracts of comparable document set.
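Counting point-mutation mentions, the kind of tally a word cloud is drawn from, can be approximated with a regular expression. The sketch below is an assumption-laden illustration rather than the approach behind the slide: it matches only the common single-letter protein notation (for example "A123V"), the two sample texts and the mutation names in them are invented, and the pattern will also match unrelated tokens of the same shape.

```python
import re
from collections import Counter

# One-letter amino acid code, position, one-letter amino acid code (e.g. "A123V").
AMINO = "ACDEFGHIKLMNPQRSTVWY"
POINT_MUTATION = re.compile(rf"\b[{AMINO}]\d+[{AMINO}]\b")


def count_point_mutations(text: str) -> Counter:
    """Count each point-mutation-like token found in the text."""
    return Counter(POINT_MUTATION.findall(text))


# Invented sample texts; the mutation names are placeholders, not real findings.
abstract_text = "The drug class showed reduced efficacy against resistant enzyme variants."
full_text = (
    "Reduced efficacy was observed for enzymes carrying A123V or G45D. "
    "A123V was the most frequently reported substitution."
)

abstract_counts = count_point_mutations(abstract_text)
full_text_counts = count_point_mutations(full_text)
print("abstract: ", dict(abstract_counts))
print("full text:", dict(full_text_counts))
```

In this toy case the abstract yields no mutation mentions while the full text yields two distinct ones, mirroring the contrast between the two word clouds.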

Page 22: Text mining full text for molecular targets

Finding Molecular Targets in Full Text

Example searching for cancer immunity checkpoint proteins

Full text provides insights into additional protein targets that may be of interest for cancer immunology research on immune checkpoints

[Word clouds: Full text | Abstracts only]

Page 23: Text mining full text for molecular targets

Text Mining Results Can Then Be Used For Analyses

• Review results. Not just keyword matching anymore …

  • Identifying more relevant documents for review

  • Identifying relationships and precise statements

  • Identifying other targets/content of interest

• Link data to other items of interest

• Analytics, visualization and system/network analysis, e.g. Pathway Studio, Cytoscape

• Integrate text mining data and process into different workflows for project quality and efficiency

Text mining results can be used to improve scientific research and to address business problems

Page 24: Text mining full text for molecular targets

Text Mining Finds Answers Faster & Increases Efficiency

An Example Project Comparison

Keyword searching:

• Finds 1,408 articles, many of them not relevant

• 176 person-days to review @ 20 min/article

VS

Text Mining:

• Identifies 142 relevant articles

• 5 person-days to review @ 20 min/article

Savings:

• Text mining robustly identifies the relevant articles

• Savings of 171 person-days per project

• Allows more projects/higher quality with same staff

Example project:

Writing comprehensive state of the science review article on the chemical toxicity of a particular

substance
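The savings figure follows directly from the two review estimates. The short sketch below takes the slide's article counts and person-day figures as given and derives the rest; the variable names are invented for illustration.

```python
# Figures quoted on the slide, taken as given.
KEYWORD_ARTICLES = 1_408
TEXT_MINING_ARTICLES = 142
KEYWORD_REVIEW_DAYS = 176
TEXT_MINING_REVIEW_DAYS = 5

person_days_saved = KEYWORD_REVIEW_DAYS - TEXT_MINING_REVIEW_DAYS
effort_reduction = person_days_saved / KEYWORD_REVIEW_DAYS
article_reduction = 1 - TEXT_MINING_ARTICLES / KEYWORD_ARTICLES

print(f"Person-days saved per project: {person_days_saved}")
print(f"Review effort reduced by {effort_reduction:.1%}")
print(f"Articles to review reduced by {article_reduction:.1%}")
```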

Page 25: Text mining full text for molecular targets

Example of Visual Insights of Text Mining Results

Intersecting adverse events between two anti-TNF drugs

[Figure: relationship map built from Elsevier Text Mining (NLP) results, visualized in Cytoscape]

Page 26: Text mining full text for molecular targets

Summary

• Text mining can help to sift through large amounts of scientific literature and other textual content

• Text mining can help to increase project team efficiency to find precise statements and relationships

• Full text articles provide richer result sets that can be useful in finding additional insights that cannot be garnered just using abstracts

• Several hurdles still exist to implement text mining but the value can outweigh costs

Text mining full text can be used to help find molecular targets of interest quickly that may be missed if relying on abstracts and keyword searching

Page 27: Text mining full text for molecular targets

Thank you for joining our webinar today:

Text Mining Full Text for Molecular Targets with George Jiang, Ph.D., M.B.A

If you have any questions for our speaker, please type them into the CHAT window.

If you would like more information you can contact:

George Jiang

[email protected]