Imputation of assay activity data using deep learning · gjc29/talks/basel2019.pdf · 2019-01-25 ·...


Imputation of assay activity data using deep learning

Tom Whitehead, Peter Hunt, Matt Segall, Gareth Conduit

Alchemite™ machine learning tool to

Reduce the need for experiments and accelerate drug discovery

Utilise all available information: computer simulations and real-life measurements

Impute values from sparse data

Broadly applicable with proven applications in drug design and materials discovery

Novartis dataset to benchmark machine learning

159 kinase proteins, 10000 compounds, data 5% complete

[Figure: sparse data matrix, 159 proteins by 10000 compounds]
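As a quick check on the scale of the problem, the counts implied by the quoted figures can be worked out directly. This is plain arithmetic that takes the slide's numbers at face value; the variable names are illustrative only.

```python
# Figures quoted on the slide, taken at face value
n_proteins = 159
n_compounds = 10_000
fraction_complete = 0.05

total_entries = n_proteins * n_compounds             # size of the full activity matrix
measured = round(total_entries * fraction_complete)  # entries that hold a measurement
missing = total_entries - measured                   # entries left to impute

print(total_entries, measured, missing)
```

Roughly nineteen out of every twenty entries are therefore empty, which is why imputation from sparse data is the central task.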

Data from ChEMBL

Martin, Polyakov, Tian, and Perez, J. Chem. Inf. Model. 57, 2077 (2017)

Validate imputation of missing entries

Realistically split holdout data set, extrapolate to new chemical space

Impute missing entries in new chemical space

[Figure: random versus realistic split of the data into training and validation sets]
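The difference between the two splits can be sketched in a few lines. The slide does not say how the realistic split was constructed, so holding out whole structural clusters is an assumption made here for illustration, and the compound names and cluster labels are hypothetical.

```python
import random

def random_split(compounds, holdout_fraction, seed=0):
    """Hold out individual compounds chosen uniformly at random."""
    rng = random.Random(seed)
    shuffled = list(compounds)
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]  # (training, validation)

def cluster_split(cluster_of, holdout_clusters):
    """Hold out whole clusters, so validation compounds have no
    cluster-mates in training (a stand-in for 'new chemical space')."""
    training = [c for c, k in cluster_of.items() if k not in holdout_clusters]
    validation = [c for c, k in cluster_of.items() if k in holdout_clusters]
    return training, validation

# Hypothetical toy data: six compounds in three structural clusters
cluster_of = {"c1": "A", "c2": "A", "c3": "B", "c4": "B", "c5": "C", "c6": "C"}

train_r, valid_r = random_split(cluster_of, holdout_fraction=0.5)
train_c, valid_c = cluster_split(cluster_of, holdout_clusters={"C"})
print(train_c, valid_c)
```

With a random split, a held-out compound usually has close analogues in the training set, so the model is asked to interpolate. Holding out whole clusters forces it to extrapolate, which is the harder test the slide describes.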

QSAR: quantitative structure-activity relationships

[Figure: example molecule and its descriptors, with counts x3, x1, x1 and molecular weight = 183 Da]

Train on one column at a time

Standard methods learn descriptor-protein correlations

Train and predict one column at a time
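A minimal sketch of the column-at-a-time approach follows, assuming a single numeric descriptor and a straight-line model so that the example stays self-contained. Real QSAR models use many descriptors and more flexible learners; the data and function names here are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x for one descriptor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def train_per_column(descriptor, activities):
    """Fit one independent model per assay column, using only the
    rows where that column has a measured value (None = missing)."""
    models = []
    n_columns = len(activities[0])
    for j in range(n_columns):
        rows = [i for i, row in enumerate(activities) if row[j] is not None]
        xs = [descriptor[i] for i in rows]
        ys = [activities[i][j] for i in rows]
        models.append(fit_line(xs, ys))
    return models

# Hypothetical toy data: one descriptor per compound, two assay columns
descriptor = [1.0, 2.0, 3.0, 4.0]
activities = [
    [2.0, None],
    [4.0, 1.0],
    [None, 0.0],
    [8.0, -1.0],
]

models = train_per_column(descriptor, activities)
a0, b0 = models[0]
prediction = a0 + b0 * descriptor[2]  # fill the gap in column 0, row 2
print(models, prediction)
```

Each column's model sees only the descriptor and that column's own measurements. Nothing measured in a neighbouring column can influence the prediction.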

Alchemite™ uses all available data

Include protein-protein correlations
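To illustrate what a protein-protein correlation adds, the sketch below fills a gap in one assay column from a measured value in another column for the same compound. This is not Alchemite™'s algorithm, which the slides do not describe; it is a deliberately simple linear stand-in, with hypothetical data and function names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def impute_from_column(activities, target, source):
    """Fill gaps in column `target` from column `source`, using a line
    fitted on compounds where both assays were measured (None = missing)."""
    both = [r for r in activities if r[target] is not None and r[source] is not None]
    a, b = fit_line([r[source] for r in both], [r[target] for r in both])
    filled = []
    for row in activities:
        row = list(row)  # copy, so the input matrix is left unchanged
        if row[target] is None and row[source] is not None:
            row[target] = a + b * row[source]
        filled.append(row)
    return filled

# Hypothetical toy data: two correlated assays, one gap in the first
activities = [
    [5.0, 6.0],
    [6.0, 7.0],
    [7.0, 8.0],
    [None, 9.0],
]

filled = impute_from_column(activities, target=0, source=1)
print(filled[3])
```

A per-column descriptor model would ignore the measured 9.0 in the second assay entirely. Using it as an input is the kind of information the slide refers to as protein-protein correlation.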

Validate imputation of missing entries

Realistically split holdout data set, extrapolate to new chemical space, and calculate the accuracy
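The slides do not state which accuracy metric is used. One common choice for regression benchmarks, and one consistent with an accuracy axis that extends below zero, is the coefficient of determination R². The sketch below assumes that choice and uses hypothetical values.

```python
def r_squared(measured, predicted):
    """Coefficient of determination: 1 is a perfect fit, 0 matches a
    model that always predicts the mean, and negative values are worse."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out measurements and two sets of imputed values
measured = [5.0, 6.0, 7.0, 8.0]
good = [5.5, 6.0, 7.0, 7.5]
mean_only = [6.5, 6.5, 6.5, 6.5]

print(r_squared(measured, good), r_squared(measured, mean_only))
```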

Alchemite™ outperforms other methods

[Figure: accuracy (axis from -0.2 to 1) versus % of missing data imputed (axis ticks at 1, 10, 100) for Alchemite™, random forest, pQSAR2, neural net, and matrix factorization]

Calculate probability distribution

[Figure: probability density versus activity, with the mean prediction marked]

Less confident prediction

[Figure: probability density versus activity for a less confident prediction, with the mean prediction marked]
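One way to obtain a distribution rather than a single number is to collect several predictions for the same entry, for example from an ensemble of models, and summarise them by their mean and spread. The slides do not say how the distribution is computed, so the ensemble here is an assumption and the sample values are hypothetical.

```python
import statistics

def summarize(samples):
    """Return (mean prediction, standard deviation) for one matrix entry."""
    return statistics.mean(samples), statistics.stdev(samples)

# Hypothetical ensemble outputs for two different compound-assay entries
confident = [6.9, 7.0, 7.1, 7.0]
uncertain = [5.0, 9.0, 6.0, 8.0]

mean_c, sigma_c = summarize(confident)
mean_u, sigma_u = summarize(uncertain)
print(mean_c, sigma_c, mean_u, sigma_u)
```

Both entries have a mean prediction of about 7, but the second has a much wider distribution, so it is the less confident of the two.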

Focus on most confident predictions

[Figure: probability density versus activity, with the mean prediction marked]

Reporting on only most confident predictions

[Figure: accuracy (axis from -0.2 to 1) versus % of missing data imputed (axis ticks at 1, 10, 100) for Alchemite™, random forest, pQSAR2, neural net, and matrix factorization]
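Reporting only the most confident predictions amounts to ranking the imputed entries by predicted uncertainty and scoring the top fraction. The sketch below assumes R² as the accuracy metric, which the slides do not confirm, and uses hypothetical predictions.

```python
def r_squared(measured, predicted):
    """Coefficient of determination (assumed accuracy metric)."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

def most_confident(predictions, fraction):
    """Keep the given fraction of predictions with the smallest uncertainty
    (at least two, so that R² is defined)."""
    ranked = sorted(predictions, key=lambda p: p[2])
    n_keep = max(2, round(len(ranked) * fraction))
    return ranked[:n_keep]

def accuracy(predictions):
    """Score a list of (measured, predicted mean, uncertainty) tuples."""
    return r_squared([p[0] for p in predictions], [p[1] for p in predictions])

# Hypothetical held-out entries: (measured, predicted mean, predicted uncertainty)
predictions = [
    (5.0, 5.1, 0.1),
    (8.0, 6.5, 1.0),
    (6.0, 5.9, 0.2),
    (7.0, 8.0, 0.9),
]

r2_all = accuracy(predictions)
r2_top = accuracy(most_confident(predictions, 0.5))
print(r2_all, r2_top)
```

In this toy case the entries with large predicted uncertainty are also the ones with large errors, so dropping them raises the score. That only works when the predicted uncertainties are well calibrated.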

Select performance level

[Figure: accuracy (axis from -0.2 to 1) versus % of missing data imputed (axis ticks at 1, 10, 100) for Alchemite™, random forest, pQSAR2, neural net, and matrix factorization]
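Selecting a performance level can be read as choosing a target accuracy and then reporting the largest share of imputed values that still meets it. The sketch below shows that lookup on an accuracy curve. The curve values are hypothetical and are not read from the plot, and the function name is illustrative.

```python
def largest_share_meeting(curve, target_accuracy):
    """Return the largest % imputed whose accuracy meets the target,
    or None if no point on the curve is accurate enough."""
    eligible = [pct for pct, acc in curve if acc >= target_accuracy]
    return max(eligible) if eligible else None

# Hypothetical accuracy curve: (% of missing data imputed, accuracy)
curve = [(1, 0.95), (10, 0.85), (50, 0.70), (100, 0.55)]

print(largest_share_meeting(curve, 0.8))
print(largest_share_meeting(curve, 0.99))
```

Raising the target accuracy shrinks the set of values that can be reported, so the choice is a trade-off between coverage and reliability.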

Taking Alchemite™ to market

Summary

Alchemite™ trains across all endpoints to capture activity-activity correlations

Understand and exploit probability distribution to focus on most confident results

Impute results of missing assays to high accuracy, enabling computational screening of compounds to identify new hits

Take Alchemite™ to market with Optibrium

gareth@intellegens.ai
