leveraging network biology to drive drug discovery...may 20, 2020 · drug discovery why waste time...

Leveraging network biology to drive drug discovery

CBDD Consortium focused on implementation of systems biology algorithms

Alex Ivliev, PhD, Director Bioinformatics

Alex Ishkin, PhD, Senior Research Scientist

© 2020 Clarivate

Computational biology tools for drug discovery

• Academic groups around the world develop a large number of advanced computational biology methods

• Application of such methods can substantially accelerate further academic research

[Diagram: Multiple academic groups; Algorithm development and publication; Tools]

[Diagram, extended: Multiple academic groups; Algorithm development and publication; Tools; Network databases; Omics data; Academia & pharma; New discoveries]

Challenges:

• Too many algorithms are published every month to keep track of, benchmark and select for further use
• Steep learning curve
• “Quick and dirty” implementations
• Disparate languages
• Poor documentation


This creates a significant gap between the publication of algorithms and their adoption and use to drive actual research

Why waste time and resources on bridging this gap individually when a global consortium model offers an effective solution for all?

Use cases:

• Target ID
• Biomarker ID
• Patient stratification
• MoA reconstruction
• Drug combinations
• Indication discovery
• Drug repositioning

Bridging the gap:

• Algorithm selection
• Implementation
• Benchmarking
• Documentation




• Broad coverage of data types and use cases
• Supported and used by leading pharma companies




• 300+ algorithms evaluated and 59 algorithms implemented
• 5 years of FTE time in development


CBDD consortium members

• 3 companies in 2014
• 17 companies in 2020

https://cbdd.clarivate.com

CBDD deliverables

R package:
• Algorithms
• Documentation
• Tutorials
• Datasets

Documentation:
• User manual
• Tutorials and case studies
• Performance evaluation

Training:
• Workshops
• Case studies

Additional:
• CBDD website


Development approach and data model

• Accessibility: main dissemination form is an R package
• Performance: computationally intensive parts are done in Java
• General approaches & flexibility: “Swiss army knife” of systems biology
• IO (input & output) unification & generality:
– Consistent inputs and outputs to easily build seamless workflows
– Works with public and proprietary networks and datasets


CBDD use cases

Value of CBDD implementations over original implementations

Algorithms and real-life applications


CBDD structure and status

Build network:
• Existing networks or pathways
• Data-driven networks
• Network adjustments and merging
• Import and prepare omics data

Analyze data:
• Node prioritization
• Subnetwork prioritization
• Pathway prioritization
• Integration
• Unsupervised
• Crosstalk
• Heterogeneous

Interpret results:
• Visualization
• Benchmark & compare
• Network comparison

Beyond the R package:
• CBDD web application
• GUI
• API
• Community
• Learning materials
• Documentation


Import and export capabilities

• Tab-delimited text files
• XGMML
• CX (Cytoscape JSON)
• Clarivate’s MetaBase
• Reactome and other public pathway databases
• NDEx network warehouse
• Advanced visualization


Scaffold (curated) networks

• Typically gathered from literature (MetaBase™, IntAct)
• Pro:
– Contain curated interactions (high confidence)
• Contra:
– Inspection bias
– Contain all interactions ever found in any biological context. Are we sure that node X is ever expressed in tissue Y?

Existing data sets

• Preprocessed networks supplied with package
• Generic text file import
– Network content from other vendors
– Internal content
– Public interaction DBs
• MetaBase (if available)


Data-driven reconstruction

• Builds on the similarities of gene behavior (e.g. co-expression)
• Pro: approximately unbiased (it might discover interactions where experimenters did not even think to look)
• Contra: false positives (e.g. indirect interactions) – the biggest problem and an area of active research

Data-driven networks:
• Correlation network
• ARACNE
• Kueffner et al. (networkAnova)
• SCODE (single-cell)
• Ocone et al. (single-cell)

[Diagram: Gene expression data; Expression correlations = Interactions]
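As an illustration of the simplest data-driven approach listed above, here is a minimal sketch of a correlation network in plain Python. It is not the CBDD implementation: the function names, the toy gene names and the 0.8 threshold are assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def correlation_network(expression, threshold=0.8):
    """Keep an undirected edge for every gene pair with |r| >= threshold.

    `threshold` is an illustrative cut-off, not a recommended value.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Toy data: hypothetical genes measured in four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.1, 3.9, 6.2, 8.0],
    "GENE_C": [5.0, 1.0, 4.0, 2.0],
}
network = correlation_network(expression)
print(network)
```

Only the GENE_A / GENE_B pair passes the threshold here. Note that a plain correlation cut-off keeps indirect interactions, which is the false-positive problem named above.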


Data-driven networks: ARACNE

• Input: Gene expression data
• Workflow:
– Compute mutual information between expression profiles
– Prune indirect edges (no need to keep edges between nodes connected via high-confidence paths)
• Output: purely data-driven, undirected network

PMID: 16723010
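The pruning step can be sketched as follows. This is a simplified illustration, not the CBDD or original ARACNE code. It assumes mutual information has already been computed and applies the data processing inequality idea: in every fully connected triplet, the weakest edge is treated as indirect. The function name, the `tolerance` parameter and the toy values are assumptions.

```python
from itertools import combinations


def dpi_prune(mi, tolerance=0.0):
    """Remove the weakest edge of every triangle.

    mi: dict mapping frozenset({a, b}) -> mutual information (assumed precomputed).
    tolerance: hypothetical slack; 0.0 removes every strictly weakest edge.
    """
    nodes = sorted({n for pair in mi for n in pair})
    to_remove = set()
    for a, b, c in combinations(nodes, 3):
        triangle = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        if not all(edge in mi for edge in triangle):
            continue  # only fully connected triplets are examined
        weakest = min(triangle, key=lambda edge: mi[edge])
        others = [edge for edge in triangle if edge != weakest]
        if mi[weakest] < min(mi[edge] for edge in others) * (1 - tolerance):
            to_remove.add(weakest)
    # Edges are marked first and removed afterwards, so the result does not
    # depend on the order in which triplets are visited.
    return {edge: value for edge, value in mi.items() if edge not in to_remove}


# Toy mutual-information values for hypothetical genes.
mi = {
    frozenset(("TF1", "G1")): 0.9,
    frozenset(("G1", "G2")): 0.8,
    frozenset(("TF1", "G2")): 0.3,  # likely indirect: TF1 - G1 - G2
    frozenset(("G2", "G3")): 0.5,
}
pruned = dpi_prune(mi)
print(sorted(tuple(sorted(edge)) for edge in pruned))
```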

Integrate network datasets

Best of all worlds: gene coexpression, scaffold network (PPI database) and text mining

• The STRING database is a good example
– Predicted interactions integrating diverse evidence
– Full networks from STRING for human, mouse and rat are supplied with CBDD

Network adjustment (contextualization)

Adjust the (scaffold) network using the status of nodes (i.e. expressed or not in a given biological system)

Option 1: node filtering
• Remove genes that are not expressed in a particular context
• This may be too harsh

Option 2: edge reweighting
• Weight interactions based on the presence of their adjacent nodes
• Less radical changes
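The two options can be contrasted in a few lines of Python. This is a sketch under stated assumptions: the slide does not give a weighting formula, so the product-of-scores scheme and the `floor` parameter below are hypothetical choices made only to show the difference in behavior.

```python
def filter_nodes(edges, expressed):
    """Option 1: keep only edges whose two endpoints are both expressed."""
    return {
        (u, v): w
        for (u, v), w in edges.items()
        if u in expressed and v in expressed
    }


def reweight_edges(edges, expression_score, floor=0.1):
    """Option 2: scale each edge weight by the expression scores of its endpoints.

    The multiplicative scheme and the `floor` value are assumptions for
    illustration; edges to unexpressed nodes are down-weighted, not removed.
    """
    reweighted = {}
    for (u, v), w in edges.items():
        su = max(expression_score.get(u, 0.0), floor)
        sv = max(expression_score.get(v, 0.0), floor)
        reweighted[(u, v)] = w * su * sv
    return reweighted


# Toy scaffold network with hypothetical nodes; C is not expressed.
edges = {("A", "B"): 1.0, ("B", "C"): 1.0}
filtered = filter_nodes(edges, expressed={"A", "B"})
reweighted = reweight_edges(edges, {"A": 1.0, "B": 1.0, "C": 0.0})
print(filtered)
print(reweighted)
```

Filtering drops the B-C edge entirely, while reweighting keeps it at a reduced weight.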








Node prioritization

• Find network regulators for a set of nodes of interest using a guilt-by-association approach
• Inputs:
– Network + nodes of interest
• Outputs:
– Ranked list of nodes

What are they used for?

Research applications:
• Protein function prediction
• Disease-related gene prediction

Industry applications:
• Drug target prediction
• Drug repositioning
• Drug combinations
• Biological mechanism reconstruction

What kinds of algorithms are there?

Tools | Description | Example
‘Local’ approaches | Use local neighborhood of start nodes | MARINa
‘Global’ approaches | Use entire topology of graph to prioritize regulators | Network propagation
‘Causal’ approaches | Use directed network to infer causal regulation of differential expression | Causal reasoning


Node prioritization: MARINa

• Inputs
– Network(s); preferably directed, weighted and with edge types
– Start nodes (with expression changes)
• Workflow
– For each regulator:
– Get ‘regulons’: directly regulated nodes (activated by TF and repressed by TF)
– Compute GSEA score for enrichment with start nodes
– Make correction for overlapping target spaces (shadowing; synergy)
• Results
– Ranked list of ‘master’ regulators

PMID: 20531406

Research use case: Mechanism reconstruction in cancer

• Goal:
– FGFR2 is a GWAS risk locus for breast cancer, and the study explored mechanisms of FGF signaling in the disease.
• Workflow:
– ARACNe was applied to build a network of transcription regulation edges
– Overconnectivity and MARINa were applied to find TFs responsible for FGFR2’s effects in cancer.
• Outcome:
– The SPDEF, ESR1, GATA3 and FOXA1 regulons are consistently FGFR2-responsive
– Findings validated by shRNA experiments


Network propagation example: simple, global topology approaches

• Initially created for identification of protein functions and new disease genes from already known data
• Inputs:
– Network
– Disease-causing genes from OMIM (or any list of start nodes)
– Disease-disease similarities are used in the original publication (but not in CBDD)
• Scoring: flow is propagated through the network from start nodes until a steady state is reached
• Results: ranked disease-associated nodes (by flow level)

[Figure panels: First iteration; Steady state]

Vanunu et al., PLoS Comput Biol, 2009
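The propagation step can be sketched in plain Python. The update rule used here, F ← αW′F + (1 − α)Y with a degree-normalized adjacency W′, is our reading of the cited network propagation paper and is an assumption, not a description of the CBDD code. The function name, the default `alpha` and the toy network are illustrative.

```python
from math import sqrt


def propagate(edges, start_nodes, alpha=0.5, tol=1e-9, max_iter=1000):
    """Iterate F <- alpha * W' F + (1 - alpha) * Y until a steady state.

    edges: dict (u, v) -> weight of an undirected network.
    W' is the adjacency normalized by sqrt(degree(u) * degree(v)).
    Y is 1.0 for start nodes and 0.0 elsewhere.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    degree = {n: sum(neigh.values()) for n, neigh in adj.items()}
    prior = {n: (1.0 if n in start_nodes else 0.0) for n in adj}
    flow = dict(prior)
    for _ in range(max_iter):
        new_flow = {}
        for n in adj:
            incoming = sum(
                w / sqrt(degree[n] * degree[m]) * flow[m]
                for m, w in adj[n].items()
            )
            new_flow[n] = alpha * incoming + (1 - alpha) * prior[n]
        delta = max(abs(new_flow[n] - flow[n]) for n in adj)
        flow = new_flow
        if delta < tol:
            break  # steady state reached
    # Rank nodes by flow level, highest first.
    return sorted(flow.items(), key=lambda item: item[1], reverse=True)


# Toy path network S - A - B - C with a single hypothetical start node S.
edges = {("S", "A"): 1.0, ("A", "B"): 1.0, ("B", "C"): 1.0}
ranking = propagate(edges, start_nodes={"S"})
for node, score in ranking:
    print(node, round(score, 4))
```

On this path network the flow decays with distance from the start node, so nodes are ranked S, A, B, C.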

Industry use case: Establishing a pipeline for systematic indication prioritization

Client: Boehringer Ingelheim (BI)

Aim
• Before the project, Boehringer Ingelheim (BI) prioritized indications manually and not for every target of interest
• We aimed to build an analytical pipeline for indication selection

Solution
• Using its indication prioritization methods, Clarivate helped BI’s team to develop an analytical pipeline for automated and systematic indication prioritization
• The pipeline combined: (1) network and pathway information from MetaBase™, (2) disease biomarker content from Integrity℠ and public sources, (3) OMICs data and (4) advanced analytics developed within CBDD

Results
• Indication prioritization is automated and is run systematically
• It is mandatory for the BI computational biology team to perform systematic indication prioritization once a project reaches the lead optimization stage

Presented by Dr Elia Stupka at Bio-IT, 2017


Causal reasoning: fancier node prioritization example

• Inputs:
– Network (directed, with effects of edges)
– Start nodes and directions of change
• For each node H in the network:
– Make an assumption about the role of H (say, H is aberrantly activated in phenotype X)
– Predict activity changes for genes which are influenced by H
– Compare actual activity changes (e.g. expression fold change) with predictions
• Scoring criteria:
– Score (#correct - #incorrect predictions)
– The more disease-specific genes explained by H, the better (Enrichment)
– The more consistent the predictions and real data, the better (Concordance)

Pollard et al., Diabetes Technology & Therapeutics 2005.
Chindelevitch et al., Bioinformatics 2012.
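The basic score (#correct - #incorrect) can be illustrated with a short sketch. As a simplifying assumption, it only follows direct (one-step) edges from the hypothesized regulator. The enrichment and concordance statistics from the cited papers are left out, and all names and toy data are hypothetical.

```python
def causal_score(regulator, hypothesis, signed_edges, observed):
    """Score one hypothesis about a regulator against observed changes.

    hypothesis: +1 if the regulator is assumed activated, -1 if inhibited.
    signed_edges: dict (source, target) -> +1 (activation) or -1 (inhibition).
    observed: dict gene -> +1 (up) or -1 (down) for the start nodes.
    Returns #correct - #incorrect predictions over direct targets only.
    """
    correct = incorrect = 0
    for (source, target), sign in signed_edges.items():
        if source != regulator or target not in observed:
            continue  # no prediction can be checked for this edge
        predicted = hypothesis * sign
        if predicted == observed[target]:
            correct += 1
        else:
            incorrect += 1
    return correct - incorrect


# Toy directed, signed network with hypothetical nodes H and K.
signed_edges = {
    ("H", "G1"): +1,
    ("H", "G2"): -1,
    ("H", "G3"): +1,
    ("K", "G1"): -1,
}
observed = {"G1": +1, "G2": -1, "G3": -1}

h_up = causal_score("H", +1, signed_edges, observed)
h_down = causal_score("H", -1, signed_edges, observed)
k_up = causal_score("K", +1, signed_edges, observed)
print(h_up, h_down, k_up)
```

Here the hypothesis "H is activated" explains two of three observed changes, so it scores higher than the opposite hypothesis or than an activated K.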


Subnetwork / pathway prioritization

• Identify sub-networks/pathways based on network analysis of phenotype-specific data (OMICs, small-scale experiments)
• Inputs (usual):
– Network + nodes of interest
• Outputs:
– Ranked subnetworks

Research applications:
• Protein complex identification
• Biological mechanism reconstruction

Industry applications:
• Biomarker discovery/stratification
• Biological mechanism reconstruction

Tools | Description | Example
‘Dense’ approaches | Find densely connected subnetworks | DENSE
‘Enrichment’ approaches | Find subnetworks/pathways enriched with nodes of interest | Active modules
‘Biomarker’ approaches | Find subnetworks/pathways which reliably differ in expression between two phenotypes | Subnetwork markers




Subnetwork algorithm example: ActiveModules

• ‘Grandfather’ of subnetwork algorithms
• Inputs:
– Network
– Start nodes with differential expression p-values
• Workflow:
– Assign nodes z-scores
– Use simulated annealing to find a subnetwork with a high overall score
• Output: high-scoring subnetwork
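The scoring part of this workflow can be sketched as below. The slide does not state the formulas, so the sketch assumes the commonly described scheme: each p-value is converted to a z-score with the inverse normal CDF, and a subnetwork of k nodes is scored as the sum of its z-scores divided by √k. Background calibration and the simulated annealing search are omitted, and the function names and toy p-values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def node_z(p_value):
    """Convert a differential expression p-value (0 < p < 1) to a z-score."""
    return NormalDist().inv_cdf(1.0 - p_value)


def subnetwork_score(nodes, p_values):
    """Aggregate z-score of a candidate subnetwork: sum(z_i) / sqrt(k)."""
    z_scores = [node_z(p_values[n]) for n in nodes]
    return sum(z_scores) / sqrt(len(z_scores))


# Toy p-values for hypothetical genes.
p_values = {"A": 0.001, "B": 0.01, "C": 0.5, "D": 0.9}

score_ab = subnetwork_score(["A", "B"], p_values)
score_abc = subnetwork_score(["A", "B", "C"], p_values)
score_abcd = subnetwork_score(["A", "B", "C", "D"], p_values)
print(round(score_ab, 3), round(score_abc, 3), round(score_abcd, 3))
```

Adding uninformative nodes (C, D) lowers the aggregate score, which is what lets a search procedure such as simulated annealing prefer compact, strongly changed subnetworks.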



Case study for the drug combinations services: Identification of synergistic drug combinations using subnetworks

Client: Takeda

Aim
• Establish an analysis pipeline that predicts (i) the efficacy of drug combinations better than known methods and (ii) the potential mechanism of action

Solution
• Identified treatment-specific networks for individual drugs
• Combined individual networks to predict the effect of pairwise combinations

Results
• Pipeline that outperforms the best method reported by the DREAM benchmarking challenge (Bansal et al., Nat. Biotechnol. 2014)
• Experimental validation: prediction was consistent with Takeda experimental results
• Takeda uses the pipeline to prioritize drug combinations for further experimental validation

[Diagram: Inputs (data for each drug: 1) pre- and post-treatment gene expression data, 2) treatment response data); Pipeline (network and statistical analysis; Drug 1, Drug 2, Combination); Results (prioritized combinations; potential MOA)]

https://www.ncbi.nlm.nih.gov/pubmed/25419740

Subnetwork biomarkers

Step 1: identify phenotype-specific subnetworks
Step 2: use subnetworks as features for classification

Case study for the patient stratification and biomarker discovery services: Identification of multiple sclerosis molecular subtypes

Aim
• Identify MS subtypes and biomarkers for the subtypes
• Challenge: MS is a heterogeneous disease

Solution
• As input we used gene expression profiles from ~200 MS patients from the CLIMB longitudinal study
• Applied statistical and pathway analysis of the input data

Results
• 4 molecular patient subtypes significantly associated with disease prognosis
• Identification of disease prognosis biomarkers

Read more about the project here: Orion Bionetworks - Turning Big Data Knowledge
http://www.msdiscovery.org/news/blogs/19971-orion-conference-turning-big-data-knowledge

Multi-omics analysis

• Identification of phenotype-specific sub-networks/pathways connecting the data points from distinct phenotype-specific data sets (OMICs, small-scale experiments)

Research applications:
• Oncology: ‘drivers’
• Exploration of disease heterogeneity
• eQTL mechanism elucidation

Industry applications:
• Biomarker discovery/stratification
• Biological mechanism reconstruction
• Drug target identification
• Stratification
• Drug combinations

Tools | Description | Example
‘Cause-consequence’ approaches | Find subnetwork connecting two sets of omics-derived nodes of interest | TieDie; ResponseNet
‘Stratification’ approaches | Joint clustering of multiple omics data sets | PARADIGM; CIMLR




TCGA

• Huge collection of consistently profiled multi-omics data from cancer patients
• Applications
– Driver gene and pathway discovery
– Disease heterogeneity exploration
• Motivated or inspired many tools for:
– Patient stratification
– Multi-omics data analysis


TieDie: simple integration algorithm example

• Input:
– Two start sets: causes and targets (cause-consequence philosophy)
• Workflow:
– Directed approach: downstream network propagation from causes, upstream network propagation from targets
• Output:
– Modules of interest, connecting causes and targets

Paull et al., Bioinformatics, 2013

Use case: stratification + multi-omics analysis in cancer

• Workflow:
– NBS (network-based stratification of somatic mutation patterns) to identify two thyroid carcinoma subtypes
– Identification of likely drivers for them (RAS family vs. BRAF)
– Differential expression analysis between subtypes
– GSEA on TF target gene sets to find expression regulators
– TieDie connects 1) driver mutations to 2) differentially expressed proteins from the RPPA data set and 3) master regulators of differentially expressed genes.
• Result:
– A signaling pathway highlighting ERK signaling and the RHEB protein is reported as a crucial difference between subtypes.


PARADIGM: example of an integration algorithm

• Inputs:
– Network/pathway (directed, with effects and mechanisms of interactions)
– Data matrices for several data types
• Workflow:
– Build a probabilistic graphical model based on pathway structure
– Use the trained Bayesian network (BN) to infer the true (‘hidden’) activity changes of genes and proteins given observed data
• Output:
– Trained model
– Matrix with node activity estimates for each patient and each pathway

Vaske et al., Bioinformatics, 2010

Single-cell specific tools

Algorithms | Function
netSmooth | Data imputation
scID | Cell type mapping
Ocone et al.; SCODE | Network inference
SCENIC; ACTION | Network inference; subnetwork identification; cell clustering
PAGODA; ClusterMine | Pathway analysis; cell clustering
CellPhoneDB; PyMINEr; iTALK | Cell-cell communication

Case study: examples on real data + infrastructure for preprocessing of single-cell data (e.g. normalization / clustering / dimensionality reduction / lineage tracing). There are plenty of tools available for that.


Single-cell subtype: cell-cell communications

• Tools:
– iTALK
– PyMINEr
– CellPhoneDB
• Similar workflow in all of them:
– Take clusters of cells
– Take a DB of ligand-receptor interactions
– Evaluate pairs of ligand (expressed in cell type A) vs. receptor (expressed in cell type B)
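The shared workflow can be sketched generically. This is not the scoring used by iTALK, PyMINEr or CellPhoneDB. The mean-of-two-expressions score, the `min_expr` cut-off and the function name are assumptions chosen to keep the example short, and no significance testing is done.

```python
def ligand_receptor_scores(cluster_means, lr_pairs, min_expr=0.5):
    """Score ligand-receptor pairs between every ordered pair of clusters.

    cluster_means: dict cluster -> dict gene -> mean expression in that cluster.
    lr_pairs: list of (ligand, receptor) tuples from an interaction database.
    min_expr: hypothetical cut-off for calling a gene expressed.
    """
    scores = {}
    for sender, sender_expr in cluster_means.items():
        for receiver, receiver_expr in cluster_means.items():
            for ligand, receptor in lr_pairs:
                lig = sender_expr.get(ligand, 0.0)
                rec = receiver_expr.get(receptor, 0.0)
                if lig >= min_expr and rec >= min_expr:
                    # Illustrative score: mean of ligand and receptor expression.
                    scores[(sender, receiver, ligand, receptor)] = (lig + rec) / 2
    return scores


# Toy cluster-level means (values are made up for the example).
cluster_means = {
    "T_cell": {"CD40LG": 2.0, "CD40": 0.1},
    "B_cell": {"CD40LG": 0.0, "CD40": 3.0},
}
lr_pairs = [("CD40LG", "CD40")]

scores = ligand_receptor_scores(cluster_means, lr_pairs)
print(scores)
```

Only the T_cell to B_cell direction is reported, because the ligand is expressed in the sender cluster and the receptor in the receiver cluster.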


Algorithms dealing with GWAS data

Algorithms | Function
DEPICT | Node prioritization; pathway prioritization
DAPPLE | Subnetwork identification; node prioritization
PASCAL | Pathway analysis

Case study: examples on real data + infrastructure for preprocessing of GWAS results:
• SNP to gene mapping
• Dealing with LD and reference populations


Case study for the target identification and drug repositioning services: Drug target discovery from GWAS loci in Parkinson’s disease

Aim
• Build a pipeline for the identification of causal genes and drug targets within GWAS loci using PD as a test case

Solution
• Combined use of multiple approaches such as:
– Network connectivity with known disease drivers
– Pathway analysis
– Protein subcellular localization
– Human and mouse phenotypes
– Co-expression with known disease drivers
– Gene differential expression
– Programmatic literature mining
– Disease biomarker knowledge
• Applied data integration through machine learning analysis

Results
• Causal genes and targets with detailed supportive evidence

[Diagram label: GWAS Loci]



Network viewer

Technology:
• htmlwidgets R package
• cytoscape.js

• Context-specific highlight
• Control of:
– Edge and vertex attributes
– Layout
• Annotation support
• Data overlay support (support for multiple data layers)
• Biological interpretation support

Also:
• Export (PNG, XGMML)
• Node grouping features

Thank you!

For questions on CBDD:

Taylor [email protected]
(646) 585-1742
https://clarivate.com/cortellis/cbdd/

© 2020 Clarivate. All rights reserved. Republication or redistribution of Clarivate content, including by framing or similar means, is prohibited without the prior written consent of Clarivate. Clarivate and its logo, as well as all other trademarks used herein are trademarks of their respective owners and used under license.
