Imputation of assay activity data using deep learning · gjc29/talks/basel2019.pdf · 2019-01-25


Page 1

Imputation of assay activity data using deep learning

Tom Whitehead, Peter Hunt, Matt Segall, Gareth Conduit

Page 2

Alchemite™ machine learning tool to

Reduce the need for experiments and accelerate drug discovery

Utilise all available information: computer simulations and real-life measurements

Impute values from sparse data

Broadly applicable with proven applications in drug design and materials discovery

Page 3

Novartis dataset to benchmark machine learning

159 kinase proteins, 10000 compounds, data 5% complete

[Figure: sparse activity matrix, 10000 compounds (rows) × 159 proteins (columns)]

Data from ChEMBL. Martin, Polyakov, Tian, and Perez, J. Chem. Inf. Model. 57, 2077 (2017)
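To give a sense of how sparse this matrix is, here is the arithmetic implied by the figures on the slide, taking "5% complete" at face value (the exact number of measured values is not stated):

```python
# Dimensions quoted on the slide
n_compounds = 10_000
n_proteins = 159
percent_complete = 5  # "data 5% complete", treated as exact for this estimate

total_cells = n_compounds * n_proteins
measured = total_cells * percent_complete // 100  # entries with a known activity
missing = total_cells - measured                  # entries left to impute
per_protein = measured // n_proteins              # average measurements per protein

print(f"matrix cells:       {total_cells:,}")
print(f"measured (~5%):     {measured:,}")
print(f"missing:            {missing:,}")
print(f"measured / protein: {per_protein:,}")
```

So roughly 79,500 measured activities sit in a matrix of about 1.6 million cells, an average of about 500 measurements per protein.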

Page 4

Validate imputation of missing entries

Split off a realistic holdout data set and extrapolate to new chemical space

Page 5

Impute missing entries in new chemical space

[Figure: activity matrix divided into training and validation entries, "Random" split versus "Realistic" split]

Data from ChEMBL. Martin, Polyakov, Tian, and Perez, J. Chem. Inf. Model. 57, 2077 (2017)
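The slide contrasts a random split with a "realistic" one, but does not spell out how the realistic split in the cited benchmark was built. A common way to approximate the idea is to hold out whole clusters of structurally similar compounds, so that validation compounds come from chemical space the model never saw. The sketch below assumes every compound already carries a cluster label; the `cluster_of` mapping and both function names are hypothetical, and cluster hold-out is a stand-in, not necessarily the benchmark's procedure.

```python
import random

def random_split(compound_ids, frac_valid, seed=0):
    """Random split: any compound may land in validation."""
    rng = random.Random(seed)
    ids = list(compound_ids)
    rng.shuffle(ids)
    n_valid = round(frac_valid * len(ids))
    return set(ids[n_valid:]), set(ids[:n_valid])  # (train, valid)

def cluster_split(compound_ids, cluster_of, frac_valid, seed=0):
    """Cluster hold-out split (assumed stand-in for a 'realistic' split):
    whole clusters go to validation until at least frac_valid of the
    compounds are held out, so no cluster straddles the split."""
    rng = random.Random(seed)
    clusters = sorted({cluster_of[c] for c in compound_ids})
    rng.shuffle(clusters)
    target = frac_valid * len(compound_ids)
    valid = set()
    for cl in clusters:
        if len(valid) >= target:
            break
        valid.update(c for c in compound_ids if cluster_of[c] == cl)
    return set(compound_ids) - valid, valid  # (train, valid)

# Hypothetical example: 20 compounds in 5 clusters of 4
compounds = list(range(20))
cluster_of = {c: c // 4 for c in compounds}
train, valid = cluster_split(compounds, cluster_of, 0.25)
print("validation compounds:", sorted(valid))
```

With a random split, near-duplicates of validation compounds usually remain in training, which tends to flatter the measured accuracy; holding out whole clusters removes that shortcut.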

Page 6

QSAR: quantitative structure-activity relationships

[Figure: example molecule described by substructure counts (×3, ×1, ×1) and molecular weight = 183 Da]
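In QSAR each compound is turned into a vector of numerical descriptors, such as substructure counts and molecular weight, and a model maps that vector to an activity. A minimal sketch with a linear model fitted by least squares; the descriptor columns and all numbers are made up for illustration (the first row only loosely echoes the slide), and the linear model is an assumption rather than the method used in the talk.

```python
import numpy as np

# Toy descriptor matrix, one row per compound.
# Illustrative columns: count of fragment A, fragment B, fragment C,
# molecular weight in Da.
X = np.array([
    [3, 1, 1, 183.0],
    [2, 0, 1, 150.0],
    [1, 2, 0, 210.0],
    [0, 1, 2, 175.0],
    [4, 0, 0, 160.0],
    [2, 2, 2, 230.0],
])

# Toy activities generated from known weights so the fit can be checked
true_w = np.array([0.5, -0.3, 0.2, 0.01])
true_b = 4.0
y = X @ true_w + true_b

# Append a column of ones for the intercept and solve by least squares
A = np.hstack([X, np.ones((X.shape[0], 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = coef[:-1], coef[-1]

# Predict the activity of a new (hypothetical) compound from its descriptors
new_compound = np.array([1, 1, 1, 190.0])
print("predicted activity:", new_compound @ w + b)
```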

Page 7

Train on one column at a time

Standard methods learn descriptor-protein correlations

Page 8

Train and predict one column at a time
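The column-at-a-time approach can be sketched as follows: for each protein, fit a model on the compounds that have a measured value, using descriptors only, then predict that protein's missing entries. This assumes missing activities are stored as NaN, and ordinary least squares stands in for whichever single-task model is used (random forest, neural net, and so on); the function name is hypothetical.

```python
import numpy as np

def impute_column_by_column(X, Y):
    """Fill NaNs in the activity matrix Y (compounds x proteins) with one
    independent model per column, trained on the descriptors X only.
    Least squares is an assumed stand-in for any single-task QSAR model."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # descriptors + intercept
    Y_filled = Y.copy()
    for j in range(Y.shape[1]):
        known = ~np.isnan(Y[:, j])
        if not known.any():
            continue  # no measurements for this protein, nothing to learn
        coef, *_ = np.linalg.lstsq(A[known], Y[known, j], rcond=None)
        Y_filled[~known, j] = A[~known] @ coef  # other columns are never used
    return Y_filled
```

Each model sees only the descriptors, so a measured activity against one kinase never informs the prediction for another.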

Page 9

Alchemite™ uses all available data

Include protein-protein correlations
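Including protein-protein correlations means the model for one column may also take the measured, or currently estimated, activities in the other columns as inputs alongside the descriptors. The sketch below is a generic iterative regression imputation in that spirit, not Alchemite™ itself: it fills gaps with column means, then repeatedly refits one linear model per column on descriptors plus all other columns, overwriting only the originally missing entries. The function name, the linear model, and the fixed iteration count are all assumptions.

```python
import numpy as np

def impute_with_cross_column(X, Y, n_iter=20):
    """Iteratively impute NaNs in Y (compounds x proteins) using descriptors X
    plus current estimates of every other column (assumed linear models)."""
    missing = np.isnan(Y)
    # Initial guess: replace each missing entry by its column mean
    Y_cur = np.where(missing, np.nanmean(Y, axis=0), Y)
    ones = np.ones((X.shape[0], 1))
    for _ in range(n_iter):
        for j in range(Y.shape[1]):
            others = np.delete(Y_cur, j, axis=1)  # other proteins' activities
            A = np.hstack([X, others, ones])      # descriptors + activities + intercept
            known = ~missing[:, j]
            coef, *_ = np.linalg.lstsq(A[known], Y[known, j], rcond=None)
            # Only originally missing entries are overwritten
            Y_cur[missing[:, j], j] = A[missing[:, j]] @ coef
    return Y_cur
```

In the toy case where one protein's activity tracks another's but the descriptors carry no signal, a descriptor-only model cannot recover the gaps, while this cross-column model can.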

Page 10

Validate imputation of missing entries

Split off a realistic holdout data set, extrapolate to new chemical space, and calculate the accuracy
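The slides do not define "accuracy". The accuracy axes in the results plots extend below zero (to -0.2), which is consistent with the coefficient of determination, R², computed on the held-out entries; treating accuracy as R² is an assumption here. A minimal implementation:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed accuracy metric).
    1 means perfect predictions, 0 means no better than predicting the mean,
    and negative values mean worse than predicting the mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(r_squared([1, 2, 3], [1, 2, 3]))
print(r_squared([1, 2, 3], [2, 2, 2]))
print(r_squared([1, 2, 3], [3, 2, 1]))
```

For true values 1, 2, 3, a perfect prediction scores 1, always predicting the mean scores 0, and the reversed prediction 3, 2, 1 scores -3.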

Page 11

Alchemite™ outperforms other methods

[Figure: accuracy (-0.2 to 1) against % of missing data imputed (1 to 100, logarithmic axis) for Alchemite™, random forest, pQSAR2, neural net, and matrix factorization]

Page 12

Calculate probability distribution

[Figure: probability density against activity, with the mean prediction marked]
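The slide does not say how the distribution is obtained. One common approach, assumed here, is to train an ensemble of models (for example on bootstrap resamples of the data) and summarise each prediction by the mean and standard deviation across ensemble members, treated as a normal distribution. A sketch of that summary step; the function names and the five ensemble outputs are hypothetical.

```python
import math
import statistics

def predictive_distribution(member_predictions):
    """Summarise an ensemble's predictions for one compound/protein entry as
    (mean, standard deviation) of an assumed normal distribution."""
    mu = statistics.fmean(member_predictions)
    sigma = statistics.stdev(member_predictions)
    return mu, sigma

def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution at activity x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical predictions of one activity value from five ensemble members
mu, sigma = predictive_distribution([6.1, 6.3, 5.9, 6.2, 6.0])
print(f"mean prediction {mu:.2f}, sigma {sigma:.3f}")
print(f"density at the mean: {normal_pdf(mu, mu, sigma):.3f}")
```

The mean gives the point prediction marked on the plot; the standard deviation sets the width of the curve.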

Page 13

Less confident prediction

[Figure: probability density against activity, with the mean prediction marked]

Page 14

Focus on most confident predictions

[Figure: probability density against activity, with the mean prediction marked]
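A wider distribution signals a less confident prediction. Focusing on the most confident predictions then amounts to ranking entries by their predicted uncertainty and keeping only the narrowest fraction. A sketch, assuming each prediction is summarised by a mean and a standard deviation and that the standard deviation is the confidence measure; the function name and entry labels are hypothetical.

```python
def most_confident(predictions, fraction):
    """Return the given fraction of predictions with the smallest uncertainty.

    predictions: list of (entry_id, mean, sigma) tuples
    fraction:    value in (0, 1], e.g. 0.1 keeps the most confident 10%
    """
    ranked = sorted(predictions, key=lambda p: p[2])  # narrowest distribution first
    n_keep = max(1, round(fraction * len(ranked)))    # always keep at least one
    return ranked[:n_keep]

# Hypothetical imputed entries: (compound/protein, mean activity, sigma)
preds = [
    ("cpd1/kinaseA", 6.1, 0.15),
    ("cpd2/kinaseA", 5.2, 0.90),
    ("cpd3/kinaseB", 7.4, 0.30),
    ("cpd4/kinaseB", 4.8, 1.20),
]
print(most_confident(preds, 0.5))
```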

Page 15

Reporting only the most confident predictions

[Figure: accuracy (-0.2 to 1) against % of missing data imputed (1 to 100, logarithmic axis) for Alchemite™, random forest, pQSAR2, neural net, and matrix factorization]
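A curve of accuracy against % of missing data imputed can be traced by sorting held-out predictions from most to least confident and computing the accuracy over the top k% for increasing k. The sketch assumes accuracy is R² and uncertainty is a per-prediction standard deviation, neither of which the slides state; the data are synthetic and the function name is hypothetical.

```python
import numpy as np

def accuracy_vs_fraction(y_true, y_pred, sigma, percents=(1, 10, 100)):
    """R^2 over the most confident k% of predictions, for each k in percents
    (R^2 as accuracy and sigma as confidence are assumptions)."""
    order = np.argsort(sigma)  # most confident (smallest sigma) first
    y_true = np.asarray(y_true, float)[order]
    y_pred = np.asarray(y_pred, float)[order]
    curve = {}
    for pct in percents:
        n = max(2, int(round(pct / 100 * len(order))))  # R^2 needs >= 2 points
        t, p = y_true[:n], y_pred[:n]
        ss_res = np.sum((t - p) ** 2)
        ss_tot = np.sum((t - t.mean()) ** 2)
        curve[pct] = 1.0 - ss_res / ss_tot
    return curve

# Synthetic held-out set where prediction error grows with the reported sigma
rng = np.random.default_rng(0)
n = 1000
y_true = rng.normal(6.0, 1.0, size=n)
sigma = rng.uniform(0.05, 1.5, size=n)
y_pred = y_true + rng.normal(0.0, sigma)
for pct, r2 in accuracy_vs_fraction(y_true, y_pred, sigma).items():
    print(f"top {pct:>3}% most confident: R^2 = {r2:.2f}")
```

If the uncertainty estimates are informative, accuracy falls as more of the missing data is imputed, which is the trade-off the plot displays.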

Page 16

Select performance level

[Figure: accuracy (-0.2 to 1) against % of missing data imputed (1 to 100, logarithmic axis) for Alchemite™, random forest, pQSAR2, neural net, and matrix factorization]
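Selecting a performance level turns the question around: given a target accuracy, how large a share of the missing data can be imputed while still meeting it? A sketch under the same assumptions (R² as accuracy, standard deviation as confidence); the function name, `target_r2`, and the scan step are hypothetical. It scans from the largest share downwards and returns the first that meets the target.

```python
import numpy as np

def largest_fraction_meeting(y_true, y_pred, sigma, target_r2, step_pct=1):
    """Largest percentage k (scanned in steps of step_pct) such that R^2 over
    the k% most confident predictions is at least target_r2; None if none."""
    order = np.argsort(sigma)  # most confident first
    t_all = np.asarray(y_true, float)[order]
    p_all = np.asarray(y_pred, float)[order]
    for pct in range(100, 0, -step_pct):
        n = max(2, int(round(pct / 100 * len(order))))
        t, p = t_all[:n], p_all[:n]
        r2 = 1.0 - np.sum((t - p) ** 2) / np.sum((t - t.mean()) ** 2)
        if r2 >= target_r2:
            return pct
    return None

# Toy case: the five most confident predictions are exact, the rest are far off
y_true = np.arange(10.0)
sigma = np.arange(10)
y_pred = y_true.copy()
y_pred[5:] += 10.0
print(largest_fraction_meeting(y_true, y_pred, sigma, 0.99, step_pct=10))
```

In this toy case only the most confident half of the predictions meets the target, so the returned share is 50%.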

Page 17

Taking Alchemite™ to market

Page 18

Summary

Alchemite™ trains across all endpoints to capture activity-activity correlations

Understand and exploit probability distribution to focus on most confident results

Impute results of missing assays to high accuracy, enabling computational screening of compounds to identify new hits

Take Alchemite™ to market with Optibrium

[email protected]