Extracting Chemical-Protein Interactions using Long Short-Term Memory Networks

Sérgio Matos
http://bioinformatics.ua.pt

BioCreative VI Workshop, Bethesda, 18-20 October 2017



Problem

• Detect relations between chemical compounds/drugs and genes/proteins in PubMed abstracts

• Gold-standard entities provided

• Five relation classes

Methods: Overview

• Multi-class classification

• Each possible chemical-gene pair in each sentence considered as an instance

• Labeled as one of five classes or negative

• Dependency features + linear sentence features
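The instance-generation step described above can be sketched as follows. This is a minimal illustration, not the author's code: the `Entity` type, the identifiers, the entity names and the single gold relation are all made up for the example.

```python
from itertools import product
from typing import NamedTuple


class Entity(NamedTuple):
    ident: str  # entity identifier (hypothetical format)
    kind: str   # "CHEMICAL" or "GENE"
    text: str   # surface form of the mention


def candidate_pairs(entities):
    """Return every (chemical, gene) pair among the entities of one sentence."""
    chemicals = [e for e in entities if e.kind == "CHEMICAL"]
    genes = [e for e in entities if e.kind == "GENE"]
    return list(product(chemicals, genes))


def label_pairs(pairs, gold):
    """Attach the gold relation class to each pair, or NEGATIVE if none is annotated."""
    return [(c, g, gold.get((c.ident, g.ident), "NEGATIVE")) for c, g in pairs]


# Invented sentence content, for illustration only.
sentence_entities = [
    Entity("T1", "CHEMICAL", "diclofenac"),
    Entity("T2", "CHEMICAL", "celecoxib"),
    Entity("T3", "GENE", "COX-2"),
]
gold_relations = {("T2", "T3"): "CPR:4"}  # invented annotation

instances = label_pairs(candidate_pairs(sentence_entities), gold_relations)
for chem, gene, label in instances:
    print(chem.text, gene.text, label)
```

Each sentence with m chemicals and n genes yields m × n instances, which is why negatives can outnumber the five positive classes.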


Methods: Preprocessing

• Extract each chemical-protein pair
• Extract shortest path
  – TEES (https://github.com/jbjorne/TEES) - BLLIP parser
  – Datasets preprocessed using command line tool: XML output
  – Create sentence graph and extract SP (NetworkX)
  – Gold-standard entity annotations mapped to tokens
  – Use head word for multi-word entity mentions
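A rough sketch of the shortest-path step, under stated assumptions. The slides use NetworkX for this (its `networkx.shortest_path(G, source, target)` function would be the natural call); the version below swaps in a plain breadth-first search so that it runs with the standard library only. The edge list is hand-built for illustration, not parser output, and words are used as node keys for brevity where real code would use token indices.

```python
from collections import deque

# Hand-built (head, dependent, label) edges for the sentence
# "the effects of CHEMICAL were compared with those of diclofenac,
#  a nonselective GENE inhibitor". Illustrative only, not TEES/BLLIP output.
EDGES = [
    ("effects", "the", "det"),
    ("effects", "CHEMICAL", "prep_of"),
    ("compared", "effects", "nsubjpass"),
    ("compared", "were", "auxpass"),
    ("compared", "those", "prep_with"),
    ("those", "diclofenac", "prep_of"),
    ("diclofenac", "inhibitor", "appos"),
    ("inhibitor", "a", "det"),
    ("inhibitor", "nonselective", "amod"),
    ("inhibitor", "GENE", "nn"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> list of (neighbour, label)."""
    graph = {}
    for head, dep, label in edges:
        graph.setdefault(head, []).append((dep, label))
        graph.setdefault(dep, []).append((head, label))
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns (tokens, labels) along the path, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt, label in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, label)
                queue.append(nxt)
    if target not in parent:
        return None
    tokens, labels = [target], []
    node = target
    while parent[node] is not None:
        node, label = parent[node]
        tokens.append(node)
        labels.append(label)
    return tokens[::-1], labels[::-1]


words, deps = shortest_path(build_graph(EDGES), "CHEMICAL", "GENE")
print(words)
print(deps)
```

The graph is treated as undirected because the path between two entities usually climbs to a common head and descends again.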

Methods: Features

1) words in shortest path
   – entities blinded
2) POS tags of words in shortest path
3) shortest path dependencies
4) up to 30 words before the first entity
5) the words between the entities
6) up to 30 words after the second entity

Methods: Features

1) chemical effects compared those diclofenac inhibitor gene
2) NN NNS VBN DT NN NN NN
3) prep_of nsubjpass prep_with prep_of appos nn
4) the effects of
5) were compared with those of diclofenac a nonselective
6) inhibitor
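Features 4 to 6 (the linear sentence features) and the entity blinding can be sketched as below. This is an assumption-laden illustration: the function names are hypothetical, the original entity names in `raw` are invented, and whether blinding was applied before or after windowing is not stated on the slides.

```python
def blind_entities(tokens, chem_idx, gene_idx):
    """Replace the two entity tokens with generic placeholders."""
    blinded = list(tokens)
    blinded[chem_idx] = "chemical"
    blinded[gene_idx] = "gene"
    return blinded


def sentence_features(tokens, first, second, window=30):
    """Split a tokenised sentence around two entity positions.

    first and second are the token indices of the two entity head words.
    Returns (left, middle, right): up to `window` words before the first
    entity, all words between the entities, up to `window` words after
    the second entity.
    """
    if not 0 <= first < second < len(tokens):
        raise ValueError("expected 0 <= first < second < len(tokens)")
    left = tokens[max(0, first - window):first]
    middle = tokens[first + 1:second]
    right = tokens[second + 1:second + 1 + window]
    return left, middle, right


# Invented entity names ("celecoxib", "COX"); the rest follows the slide example.
raw = ("the effects of celecoxib were compared with those of "
       "diclofenac a nonselective COX inhibitor").split()
tokens = blind_entities(raw, chem_idx=3, gene_idx=12)
left, middle, right = sentence_features(tokens, first=3, second=12)
print(left)
print(middle)
print(right)
```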


Data

             Training  Development  Test
Documents    1020      612          3334
Sentences    10309     6175         33854
Chemical     13017     8004         44066
Gene         12735     7563         41072

Instances                   Training  Development  Test
Total                       11953     7653         40887
upregulation/activation     595       432
downregulation/inhibition   1827      941
agonist                     140       104
antagonist                  206       182
substrate                   624       401


Deep learning classifier

[Figure: network architecture. Three parallel branches take the shortest-path words, the shortest-path POS tags and the shortest-path dependencies. Each branch has an embedding layer (word, POS or dependency), a dropout layer and a Bi-LSTM. The branches feed a fully connected layer followed by the output layer. Labels on each branch: "64 units, dropout=0.1" and "0.2".]


Embeddings

• Word embeddings
  – Word2vec (Gensim)
  – 15 million MEDLINE abstracts
  – simple tokenization
  – ~775k words
  – 6 model parameter combinations
    • window = 5/20/50
    • vector size = 100/300

• POS embeddings (size 200, random init)

• Dependency embeddings (300, random init)
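The six word-embedding models follow from crossing the three window sizes with the two vector sizes. A small sketch of that grid is below; the dictionary keys mirror Gensim's `Word2Vec` keyword arguments (`window`, and `vector_size` in Gensim 4, which older releases called `size`), but no model is trained here and the remaining training settings are not given on the slides.

```python
from itertools import product

WINDOWS = (5, 20, 50)      # context window sizes from the slide
VECTOR_SIZES = (100, 300)  # embedding dimensions from the slide

# One configuration per (window, vector size) combination.
configs = [
    {"window": window, "vector_size": size}
    for window, size in product(WINDOWS, VECTOR_SIZES)
]

for cfg in configs:
    print(cfg)
print(len(configs), "configurations")
```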


Run configurations

Columns: Run | Dependency features (Word, POS, Dep) | Sentence features (Left, Middle, Right) | Class weights

1 x x x x

2 x x x x x

3 x x x

4 x x x x

5 x x x x x x x
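The slides do not say how the class weights were set. As a hedged illustration only, the sketch below applies one common scheme, inverse class frequency (the same formula scikit-learn uses for `class_weight="balanced"`), to the training counts from the Data slide. It assumes that the first count on each class row is the training count and that the negative count equals the total minus the five positive classes.

```python
# Training instance counts as read from the Data slide (assumed column).
TRAIN_TOTAL = 11953
TRAIN_COUNTS = {
    "upregulation/activation": 595,
    "downregulation/inhibition": 1827,
    "agonist": 140,
    "antagonist": 206,
    "substrate": 624,
}

# Assumption: every instance not in a positive class is a negative.
counts = dict(TRAIN_COUNTS)
counts["negative"] = TRAIN_TOTAL - sum(TRAIN_COUNTS.values())


def balanced_weights(class_counts):
    """weight(c) = n_samples / (n_classes * count(c))."""
    n_samples = sum(class_counts.values())
    n_classes = len(class_counts)
    return {c: n_samples / (n_classes * n) for c, n in class_counts.items()}


weights = balanced_weights(counts)
for name, weight in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{name:28s} {weight:.3f}")
```

With this scheme rare classes such as agonist receive much larger weights than the negative class, which tends to trade precision for recall.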


Results

      Development                   Test
Run   Precision  Recall  F-Score    Precision  Recall  F-Score
1     0.6547     0.5403  0.5919     0.6419     0.2577  0.3677
2     0.4856     0.6221  0.5449     0.5156     0.4670  0.4901
3     0.6334     0.5126  0.5664     0.5919     0.2403  0.3418
4     0.4310     0.6092  0.5047     0.4024     0.4193  0.4107
5     0.4999     0.6074  0.5470     0.5738     0.4722  0.5181
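As a quick check on how the columns relate, the F-score is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes it for the test columns of the table; the values are copied from the slide and the helper name is my own.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-score) on the test set, per run.
TEST_RUNS = {
    1: (0.6419, 0.2577, 0.3677),
    2: (0.5156, 0.4670, 0.4901),
    3: (0.5919, 0.2403, 0.3418),
    4: (0.4024, 0.4193, 0.4107),
    5: (0.5738, 0.4722, 0.5181),
}

recomputed = {run: f_score(p, r) for run, (p, r, _) in TEST_RUNS.items()}
for run, (p, r, reported) in TEST_RUNS.items():
    print(run, f"{recomputed[run]:.4f}", reported)
```

Runs 1 and 3 have the highest test precision but recall near 0.25, which is what pulls their F-scores below those of runs 2, 4 and 5.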


Linear SVM: ~0.25 F-score using 1+2 grams of the same features


Results

        CPR:0  CPR:3  CPR:4  CPR:5  CPR:6  CPR:9
CPR:0          292    433    46     77     234
CPR:3   147    258    30     0      1      1
CPR:4   271    19     661    1      4      5
CPR:5   40     1      0      70     0      0
CPR:6   24     0      2      2      122    0
CPR:9   158    3      5      0      0      223


Results

        CPR:0  CPR:3  CPR:4  CPR:5  CPR:6  CPR:9
CPR:0          292    433    46     77     234
CPR:3   147    64%    30     0      1      1
CPR:4   271    19     71%    1      4      5
CPR:5   40     1      0      64%    0      0
CPR:6   24     0      2      2      84%    0
CPR:9   158    3      5      0      0      59%

Precision: 59% - 84%
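One reading that reproduces the diagonal percentages: divide each diagonal count by the sum of itself and the CPR:0 count in the same row. This is inferred from the numbers, not stated on the slides, and it leaves out the other off-diagonal cells of each row. The counts below are copied from the confusion matrix.

```python
LABELS = ["CPR:0", "CPR:3", "CPR:4", "CPR:5", "CPR:6", "CPR:9"]

# Rows of the confusion matrix for the five relation classes.
ROWS = {
    "CPR:3": [147, 258, 30, 0, 1, 1],
    "CPR:4": [271, 19, 661, 1, 4, 5],
    "CPR:5": [40, 1, 0, 70, 0, 0],
    "CPR:6": [24, 0, 2, 2, 122, 0],
    "CPR:9": [158, 3, 5, 0, 0, 223],
}

percentages = {}
for label, row in ROWS.items():
    diagonal = row[LABELS.index(label)]  # count on the matrix diagonal
    vs_cpr0 = row[0]                     # count in the CPR:0 column
    percentages[label] = round(100 * diagonal / (diagonal + vs_cpr0))

for label, pct in percentages.items():
    print(label, f"{pct}%")
```

Under this reading, confusion with CPR:0 accounts for most of the errors in every row, while confusion between the five relation classes is comparatively rare.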


Results

        CPR:0  CPR:3  CPR:4  CPR:5  CPR:6  CPR:9
CPR:0          53%    40%    40%    39%    51%
CPR:3   147    258    30     0      1      1
CPR:4   271    19     661    1      4      5
CPR:5   40     1      0      70     0      0
CPR:6   24     0      2      2      122    0
CPR:9   158    3      5      0      0      223

Recall: 47% - 61%


Conclusions / Future

• Error analysis

• Different kernels

• Network topology

• Hyper-parameters


Thank you!
