leveraging network biology to drive drug discovery...may 20, 2020 · drug discovery why waste time...
TRANSCRIPT
-
Leveraging network biology to drive drug discovery
CBDD Consortium focused on implementation of systems biology algorithms
Alex Ivliev, PhD, Director Bioinformatics
Alex Ishkin, PhD, Senior Research Scientist
© 2020 Clarivate
-
Computational biology tools for drug discovery
• Academic groups around the world develop a large number of advanced computational biology methods
• Application of such methods can substantially accelerate further academic research
[Diagram labels: Multiple academic groups; Algorithm development and publication; Tools]
Consulting Services © 2020 Clarivate
-
Computational biology tools for drug discovery
• Academic groups around the world develop a large number of advanced computational biology methods
• Application of such methods can substantially accelerate further academic research
[Diagram labels: Multiple academic groups; Algorithm development and publication; Tools; Network databases; Omics data; Academia & pharma; New discoveries]
-
Computational biology tools for drug discovery
Challenges:
• Too many algorithms are published every month to keep track of, benchmark and select for further use
• Steep learning curve
• “Quick and dirty” implementations
• Disparate languages
• Poor documentation
[Diagram labels: Multiple academic groups; Algorithm development and publication; Tools; Network databases; Omics data; Academia & pharma; New discoveries]
-
Computational biology tools for drug discovery
Challenges:
• Too many algorithms are published every month to keep track of, benchmark and select for further use
• Steep learning curve
• “Quick and dirty” implementations
• Disparate languages
• Poor documentation
This creates a significant gap between the publication of algorithms and their adoption and use to drive actual research
[Diagram labels: Multiple academic groups; Network databases; Omics data; Academia & pharma; New discoveries]
-
Computational biology tools for drug discovery
Challenges:
• Too many algorithms are published every month to keep track of, benchmark and select for further use
• Steep learning curve
• “Quick and dirty” implementations
• Disparate languages
• Poor documentation
Why waste time and resources on bridging this gap individually when a global consortium model offers an effective solution for all?
-
Computational biology tools for drug discovery
Use cases:
• Target ID
• Biomarker ID
• Patient stratification
• MoA reconstruction
• Drug combinations
• Indication discovery
• Drug repositioning
• Algorithm selection
• Implementation
• Benchmarking
• Documentation
[Diagram labels: Multiple academic groups; Algorithm development and publication; Network databases; Omics data; Academia & pharma; New discoveries]
-
Computational biology tools for drug discovery
Use cases:
• Target ID
• Biomarker ID
• Patient stratification
• MoA reconstruction
• Drug combinations
• Indication discovery
• Drug repositioning
• Algorithm selection
• Implementation
• Benchmarking
• Documentation
Broad coverage of data types and use cases
Supported and used by leading pharma companies
[Diagram labels: Multiple academic groups; Algorithm development and publication; Network databases; Omics data; Academia & pharma; New discoveries]
-
Computational biology tools for drug discovery
Use cases:
• Target ID
• Biomarker ID
• Patient stratification
• MoA reconstruction
• Drug combinations
• Indication discovery
• Drug repositioning
• Algorithm selection
• Implementation
• Benchmarking
• Documentation
300+ algorithms evaluated and 59 algorithms implemented
5 years of FTE time in development
[Diagram labels: Multiple academic groups; Algorithm development and publication; Network databases; Omics data; Academia & pharma; New discoveries]
-
CBDD consortium members
• 3 companies in 2014
• 17 companies in 2020
https://cbdd.clarivate.com
-
CBDD deliverables
R package
• Algorithms
• Documentation
• Tutorials
• Datasets
Documentation
• User manual
• Tutorials and case studies
• Performance evaluation
Training
• Workshops
• Case studies
Additional
• CBDD website
-
Development approach and data model
Accessibility
• Main dissemination form is an R package
Performance
• Computationally intensive parts are done in Java
General approaches & flexibility
• “Swiss army knife” of systems biology
IO unification & generality (IO = input & output)
• Consistent inputs and outputs to easily build seamless workflows
• Works with public and proprietary networks and datasets
-
CBDD use cases
-
Value of CBDD implementations over original implementations
-
Algorithms and real-life applications
-
CBDD structure and status
Build network: Existing networks or pathways; Data-driven networks; Network adjustments and merging; Import and prepare omics data
Analyze data: Node prioritization; Subnetwork prioritization; Pathway prioritization; Integration; Unsupervised; Crosstalk; Heterogeneous
Interpret results: Visualization; Benchmark & compare; Network comparison
Beyond the R package: CBDD web application; GUI; API; Community; Learning materials; Documentation
-
Import and export capabilities
Tab-delimited text files
XGMML
CX (Cytoscape JSON)
Clarivate’s MetaBase
Reactome and other public pathway databases
NDEx network warehouse
Advanced visualization
-
Scaffold (curated) networks
• Typically gathered from literature (MetaBase™, IntAct)
• Pro:
– Contain curated interactions (high confidence)
• Contra:
– Inspection bias
• They also contain all interactions ever found in any biological context. Are we sure that node X is ever expressed in tissue Y?
Existing data sets
• Preprocessed networks supplied with package
• Generic text file import
– Network content from other vendors
– Internal content
– Public interaction DBs
• MetaBase (if available)
-
Data-driven reconstruction
• Builds on the similarities of gene behavior (e.g. co-expression)
• Pro: approximately unbiased (it might discover interactions where experimenters did not even think to look)
• Contra: false positives (e.g. indirect interactions) – the biggest problem and an area of active research
Data-driven networks
• Correlation network
• ARACNE
• Kueffner et al. (networkAnova)
• SCODE (SC)
• Ocone et al. (SC)
[Figure: gene expression data; expression correlations = interactions]
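The simplest entry in the list above, a correlation network, can be sketched as follows. This is a minimal illustration in Python rather than the CBDD R implementation; the function names, the toy expression values and the 0.8 cut-off are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def correlation_network(expression, threshold=0.8):
    """Keep an undirected edge for every gene pair with |r| >= threshold.

    `expression` maps gene -> list of values across samples (hypothetical layout).
    """
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Toy data: A and B are co-expressed, C is not.
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.1, 5.9, 8.0],
    "C": [4.0, 1.0, 3.5, 2.0],
}
network = correlation_network(expression)
```

Note that a high correlation does not distinguish direct from indirect interactions, which is the false-positive problem named on the slide.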
-
Data-driven networks: ARACNE
• Input: Gene expression data
• Workflow:
– Compute mutual information between expression profiles
– Prune indirect edges (no need to keep edges between nodes connected via high-confidence paths)
• Output: purely data-driven, undirected network
PMID: 16723010
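The two workflow steps can be sketched roughly as below. This is a simplified illustration and not the CBDD or the published implementation: it assumes expression profiles already discretized into levels, uses a plain plug-in mutual information estimate, and prunes the weakest edge of every fully connected triplet without any tolerance parameter. All names and the threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two discretized profiles."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def aracne_sketch(profiles, mi_threshold=0.05):
    """Return undirected edges that survive thresholding and triplet pruning."""
    genes = sorted(profiles)
    mi = {
        frozenset(pair): mutual_information(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(genes, 2)
    }
    edges = {edge for edge, value in mi.items() if value > mi_threshold}
    indirect = set()
    for a, b, c in combinations(genes, 3):
        triangle = [frozenset((a, b)), frozenset((b, c)), frozenset((a, c))]
        if all(edge in edges for edge in triangle):
            # The weakest edge of a closed triplet is treated as indirect.
            indirect.add(min(triangle, key=mi.get))
    return edges - indirect


# Toy chain A -> B -> C: C follows B, which follows A, with a little noise each step.
profiles = {
    "A": [0] * 8 + [1] * 8,
    "B": [1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "C": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
}
aracne_edges = aracne_sketch(profiles)
```

In the toy data the A–C edge passes the threshold on its own but is the weakest side of the triangle, so it is removed as an indirect interaction.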
-
Integrate network datasets
Best of all worlds
[Diagram labels: Gene coexpression; Scaffold network (PPI database); Text mining]
STRING database is a good example
• Predicted interactions integrating diverse evidence
– Full networks from STRING for human, mouse and rat are supplied with CBDD
-
Network adjustment (contextualization)
Adjust the (scaffold) network using the status of nodes (i.e. expressed or not in a given biological system)
Option 1: node filtering
• Remove genes that are not expressed in a particular context
• This may be too harsh
Option 2: edge reweighting
• Weight interactions based on the presence of their adjacent nodes
• Less radical changes
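The difference between the two options can be illustrated with a short sketch. The weighting scheme here (multiplying each edge by the lower of its two endpoint expression levels, with a floor so that no edge disappears completely) is an assumption chosen for illustration; the slide does not specify the formula used in CBDD.

```python
def filter_nodes(edges, expressed):
    """Option 1: drop every edge that touches a non-expressed gene."""
    return {
        (u, v): w for (u, v), w in edges.items()
        if u in expressed and v in expressed
    }


def reweight_edges(edges, expression, floor=0.1):
    """Option 2: scale each edge by the weaker of its two endpoints.

    `expression` maps gene -> level scaled to [0, 1] (hypothetical input);
    `floor` keeps edges of silent genes in the network at a low weight.
    """
    reweighted = {}
    for (u, v), w in edges.items():
        support = min(expression.get(u, 0.0), expression.get(v, 0.0))
        reweighted[(u, v)] = w * max(floor, support)
    return reweighted


# Toy scaffold network; SOS1 is not expressed in the context of interest.
edges = {("EGFR", "GRB2"): 1.0, ("GRB2", "SOS1"): 1.0, ("SOS1", "KRAS"): 1.0}
expression = {"EGFR": 1.0, "GRB2": 0.9, "SOS1": 0.0, "KRAS": 0.8}
expressed = {gene for gene, level in expression.items() if level > 0.5}

filtered = filter_nodes(edges, expressed)
reweighted = reweight_edges(edges, expression)
```

Filtering disconnects KRAS from the rest of the toy network, while reweighting keeps the path at a reduced weight, which is the "less radical" behaviour described above.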
-
CBDD structure and status
Build network: Existing networks or pathways; Data-driven networks; Import and prepare data; Network modifications; Existing datasets
Analyze data: Node prioritization; Subnetwork prioritization; Pathway prioritization; Integration; Unsupervised; Crosstalk; Heterogeneous
Interpret results: Visualization; Benchmark & compare; Network comparison
Beyond the R package: CBDD web application; GUI; API; Community; Learning materials; Documentation
-
Node prioritization
• Find network regulators for a set of nodes of interest using a guilt-by-association approach
• Inputs:
– Network + nodes of interest
• Outputs:
– Ranked list of nodes
What are they used for?
Research applications:
• Protein function prediction
• Disease-related gene prediction
Industry applications:
• Drug target prediction
• Drug repositioning
• Drug combinations
• Biological mechanism reconstruction
What kinds of algorithms are there?
Tools | Description | Example
‘Local’ approaches | Use local neighborhood of start nodes | MARINa
‘Global’ approaches | Use entire topology of graph to prioritize regulators | Network propagation
‘Causal’ approaches | Use directed network to infer causal regulation of differential expression | Causal reasoning
-
Node prioritization: MARINa
• Inputs
– Network(s); preferably directed, weighted and with edge types
– Start nodes (with expression changes)
• Workflow
– For each regulator:
– Get ‘regulons’: directly regulated nodes (activated by TF and repressed by TF)
– Compute GSEA score for enrichment with start nodes
– Make correction for overlapping target spaces (shadowing; synergy)
• Results
– Ranked list of ‘master’ regulators
PMID: 20531406
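The enrichment step of the workflow above can be approximated with an unweighted, GSEA-style running-sum statistic, as in the sketch below. This is a deliberate simplification: it ignores the activated/repressed split of each regulon and the shadowing and synergy corrections, and it does not estimate significance. The function names and toy regulons are assumptions for illustration only.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment of `gene_set` in a ranked gene list.

    Walks down the ranking, stepping up on a hit and down on a miss, and
    returns the largest deviation from zero (positive = enriched at the top).
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, extreme = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme


def rank_regulators(ranked_genes, regulons):
    """Score each regulator's regulon and sort from most to least enriched."""
    scores = {
        regulator: enrichment_score(ranked_genes, targets)
        for regulator, targets in regulons.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Genes ordered from most up-regulated to most down-regulated (toy ranking).
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
regulons = {"TF_A": {"g1", "g2"}, "TF_B": {"g3", "g6"}}
regulator_ranking = rank_regulators(ranked_genes, regulons)
```

TF_A comes out on top because both of its targets sit at the head of the ranking, while the targets of TF_B are scattered.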
-
Research use case: Mechanism reconstruction in cancer
• Goal:
– FGFR2 is a GWAS risk locus for breast cancer, and the study was exploring mechanisms of FGF signaling in disease.
• Workflow:
– ARACNe applied to build network of transcription regulation edges
– Overconnectivity and MARINa were applied to find TFs responsible for FGFR2’s effects in cancer.
• Outcome
– The SPDEF, ESR1, GATA3 and FOXA1 regulons are consistently FGFR2-responsive
– Findings validated by shRNA experiments
-
Network propagation example
Simple, global topology approaches
• Initially created for identification of protein functions and new disease genes from already known data
• Inputs:
– Network
– Disease-causing genes from OMIM (or any list of start nodes)
– Disease-disease similarities are used in the original publication (but not in CBDD)
• Scoring: flow is propagated through the network from start nodes, until a steady state is reached
• Results: ranked disease-associated nodes (by flow level)
[Figure panels: First iteration; Steady state]
Vanunu et al., PLoS Comput Biol, 2009
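The scoring step can be sketched as an iterative update in which each node keeps a share of its prior and receives degree-normalized flow from its neighbours until the values stop changing. The sketch below assumes an unweighted, undirected network and a retention parameter `alpha` of 0.8; these choices and all names are illustrative and are not taken from the CBDD implementation.

```python
from math import sqrt


def propagate(adjacency, start_nodes, alpha=0.8, tol=1e-9, max_iter=1000):
    """Propagate flow from `start_nodes` until a steady state is reached.

    `adjacency` maps node -> set of neighbours (symmetric). Each iteration
    computes flow = alpha * (normalized neighbour flow) + (1 - alpha) * prior.
    Returns nodes ranked by final flow level, highest first.
    """
    nodes = sorted(adjacency)
    degree = {n: len(adjacency[n]) for n in nodes}
    prior = {n: 1.0 if n in start_nodes else 0.0 for n in nodes}
    flow = dict(prior)
    for _ in range(max_iter):
        updated = {}
        for n in nodes:
            incoming = sum(
                flow[m] / sqrt(degree[n] * degree[m]) for m in adjacency[n]
            )
            updated[n] = alpha * incoming + (1 - alpha) * prior[n]
        change = max(abs(updated[n] - flow[n]) for n in nodes)
        flow = updated
        if change < tol:
            break
    return sorted(flow.items(), key=lambda item: item[1], reverse=True)


# Toy path network A - B - C - D with a single start node.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
ranking = propagate(adjacency, start_nodes={"A"})
```

On this toy path the flow level falls with distance from the start node, so the ranking follows the order of the chain.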
-
Industry use case: Establishing a pipeline for systematic indication prioritization
Client: [logo]
Aim
• Before the project, Boehringer Ingelheim (BI) prioritized indications manually and not for every target of interest
• We aimed to build an analytical pipeline for indication selection
Solution
• Using its indication prioritization methods, Clarivate helped BI’s team to develop an analytical pipeline for automated and systematic indication prioritization
• The pipeline combined: (1) network and pathway information from MetaBase™, (2) disease biomarker content from Integrity℠ and public sources, (3) OMICs data and (4) advanced analytics developed within CBDD
Results
• Indication prioritization is automated and is run systematically
• It is mandatory for the BI computational biology team to perform systematic indication prioritization once a project reaches the lead optimization stage
Presented by Dr Elia Stupka at Bio-IT, 2017
-
Fancier node prioritization example: Causal reasoning
• Inputs:
– Network (directed, with effects of edges)
– Start nodes and directions of change
• For each node H in network:
– Make assumption about role of H (say, H is aberrantly activated in phenotype X)
– Predict activity changes for genes which are influenced by H
– Compare actual activity changes (e.g. expression fold change) with predictions
• Scoring criteria
– Score (#correct - #incorrect predictions)
– The more disease-specific genes explained by H, the better (Enrichment)
– The more consistent predictions and real data, the better (Concordance)
Pollard et al., Diabetes Technology & Therapeutics 2005. Chindelevitch et al., Bioinformatics 2012.
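The "#correct - #incorrect" score can be illustrated with a small sketch. It is restricted to direct targets of each candidate regulator (the published methods also follow longer paths) and leaves out the enrichment and concordance statistics; the edge encoding and names are assumptions made for the example.

```python
def causal_scores(edges, observed):
    """Score both hypotheses (regulator up, regulator down) for each regulator.

    `edges`: (regulator, target, effect) with effect +1 = activation, -1 = inhibition.
    `observed`: gene -> +1 (up-regulated) or -1 (down-regulated).
    Returns (regulator, hypothesis, score) tuples, best score first, where
    score = number of correct minus number of incorrect predictions.
    """
    targets = {}
    for regulator, target, effect in edges:
        targets.setdefault(regulator, []).append((target, effect))

    results = []
    for regulator, regulated in targets.items():
        for hypothesis in (+1, -1):
            correct = incorrect = 0
            for target, effect in regulated:
                if target not in observed:
                    continue  # no data for this target, no prediction to check
                if hypothesis * effect == observed[target]:
                    correct += 1
                else:
                    incorrect += 1
            results.append((regulator, hypothesis, correct - incorrect))
    return sorted(results, key=lambda row: row[2], reverse=True)


# Toy signed network and observed expression changes.
edges = [
    ("TF1", "G1", +1), ("TF1", "G2", +1), ("TF1", "G3", -1),
    ("TF2", "G1", +1), ("TF2", "G4", +1),
]
observed = {"G1": +1, "G2": +1, "G3": -1, "G4": -1}
hypotheses = causal_scores(edges, observed)
```

Here "TF1 is activated" explains all three of its targets, whereas either hypothesis about TF2 gets one target right and one wrong.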
-
Subnetwork / pathway prioritization
• Identify sub-networks/pathways based on network analysis of phenotype-specific data (OMICs, small-scale experiment)
• Inputs (usual):
– Network + nodes of interest
• Outputs:
– Ranked subnetworks
What are they used for?
Research applications:
• Protein complex identification
• Biological mechanism reconstruction
Industry applications:
• Biomarker discovery/stratification
• Biological mechanism reconstruction
What kinds of algorithms are there?
Tools | Description | Example
‘Dense’ approaches | Find densely connected subnetworks | DENSE
‘Enrichment’ approaches | Find subnetworks/pathways enriched with nodes of interest | Active modules
‘Biomarker’ approaches | Find subnetworks/pathways which reliably differ in expression between two phenotypes | Subnetwork markers
-
Subnetwork algorithm example: ActiveModules
• ‘Grandfather’ of subnetwork algorithms
• Inputs:
– network;
– start nodes with differential expression p-values
• Workflow:
– Assign nodes z-scores
– Use simulated annealing to find subnetwork with high overall score
• Output: high-scoring subnetwork
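A compact sketch of the two workflow steps follows. It converts p-values to z-scores with the inverse normal CDF, scores a node set by the sum of z-scores divided by the square root of its size, and runs a basic simulated annealing search over connected node sets. The cooling schedule, step count and the omission of any background calibration of scores are simplifying assumptions, and the names are hypothetical.

```python
import random
from math import exp, sqrt
from statistics import NormalDist


def z_scores(p_values):
    """Convert p-values (strictly between 0 and 1) to z-scores."""
    inverse_cdf = NormalDist().inv_cdf
    return {gene: inverse_cdf(1.0 - p) for gene, p in p_values.items()}


def module_score(nodes, z):
    """Aggregate z-score of a node set: sum(z) / sqrt(size)."""
    return sum(z[n] for n in nodes) / sqrt(len(nodes)) if nodes else 0.0


def is_connected(nodes, adjacency):
    """True if `nodes` is non-empty and forms one connected component."""
    nodes = set(nodes)
    if not nodes:
        return False
    seen, stack = set(), [next(iter(nodes))]
    while stack:
        n = stack.pop()
        if n in seen:
            continue
        seen.add(n)
        stack.extend((adjacency[n] & nodes) - seen)
    return seen == nodes


def anneal(adjacency, z, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Toggle one random node per step; accept worse states with falling probability."""
    rng = random.Random(seed)
    all_nodes = sorted(adjacency)
    current = {max(all_nodes, key=z.get)}
    best = set(current)
    for i in range(steps):
        temperature = t_start * (t_end / t_start) ** (i / steps)
        candidate = current ^ {rng.choice(all_nodes)}
        if not is_connected(candidate, adjacency):
            continue
        gain = module_score(candidate, z) - module_score(current, z)
        if gain > 0 or rng.random() < exp(gain / temperature):
            current = candidate
            if module_score(current, z) > module_score(best, z):
                best = set(current)
    return best


# Toy path network A - B - C - D - E; A, B and C are strongly differential.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
z = z_scores({"A": 0.001, "B": 0.002, "C": 0.01, "D": 0.6, "E": 0.9})
active_module = anneal(adjacency, z)
```

Dividing by the square root of the module size keeps large modules from winning simply by accumulating many weakly significant nodes.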
-
Case study catalogue
Case study for the drug combinations services
Identification of synergistic drug combinations using subnetworks
Client: [logo]
Aim
Establish an analysis pipeline that predicts (i) the efficacy of drug combinations better than known methods and (ii) the potential mechanism of action
Solution
• Identified treatment-specific networks for individual drugs
• Combined individual networks to predict the effect of pairwise combinations
Results
• Pipeline that outperforms the best method reported by the DREAM benchmarking challenge (Bansal et al., Nat. Biotechnol. 2014)
• Experimental validation: prediction was consistent with Takeda experimental results
• Takeda uses the pipeline to prioritize drug combinations for further experimental validation
[Figure: Inputs (data for each drug: 1) pre- and post-treatment gene expression data, 2) treatment response data); Pipeline: network and statistical analysis; Results: prioritized combinations, potential MOA. Panels: Drug 1; Drug 2; Combination]
https://www.ncbi.nlm.nih.gov/pubmed/25419740
-
Subnetwork biomarkers
Step 1: identify phenotype-specific subnetworks
Step 2: use subnetworks as features for classification
-
Case study for the patient stratification and biomarker discovery services
Identification of multiple sclerosis molecular subtypes
Client: [logo]
Aim
• Aim: Identify MS subtypes and biomarkers for the subtypes
• Challenge: MS is a heterogeneous disease
Solution
• As input we used gene expression profiles from ~200 MS patients from the CLIMB longitudinal study
• Applied statistical and pathway analysis of the input data
Results
• 4 molecular patient subtypes significantly associated with disease prognosis
• Identification of disease prognosis biomarkers
Read more about the project here: Orion Bionetworks - Turning Big Data Knowledge
http://www.msdiscovery.org/news/blogs/19971-orion-conference-turning-big-data-knowledge
-
Multi-omics analysis
• Identification of phenotype-specific sub-networks/pathways connecting the data points from distinct phenotype-specific data sets (OMICs, small-scale experiment)
What are they used for?
Research applications:
• Oncology: ‘drivers’
• Exploration of disease heterogeneity
• eQTL mechanism elucidation
Industry applications:
• Biomarker discovery/stratification
• Biological mechanism reconstruction
• Drug target identification
• Stratification
• Drug combinations
What kinds of algorithms are there?
Tools | Description | Example
‘Cause-consequence’ | Find subnetwork connecting two sets of omics-derived nodes of interest | TieDie; ResponseNet
‘Stratification’ approaches | Joint clustering of multiple omics data sets | PARADIGM; CIMLR
-
TCGA
• Huge collection of consistently profiled multi-omics data from cancer patients
• Applications
– Driver gene and pathway discovery
– Disease heterogeneity exploration
• Invented or inspired many tools for:
– Patient stratification
– Multi-omics data analysis
-
Simple integration algorithm example: TieDie
• Input:
– Two start sets: causes and targets (cause-consequence philosophy)
• Workflow:
– Directed approach: downward network propagation from causes, upward network propagation from targets
• Output:
– Modules of interest, connecting sources and targets
Paull et al., Bioinformatics, 2013
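The two-directional idea can be sketched as follows: propagate once from the causes along edge direction, once from the targets against edge direction, and keep the nodes that receive flow from both sides. The simple restart-style propagation, the use of the minimum as the linking score and the fixed threshold are assumptions made for this illustration and do not reproduce the published diffusion kernel.

```python
def diffuse(successors, sources, alpha=0.5, iterations=50):
    """Push flow from `sources` along directed edges for a fixed number of rounds.

    `successors` maps node -> list of downstream nodes. Each round a node keeps
    (1 - alpha) of its prior and passes alpha of its flow, split evenly, downstream.
    """
    nodes = set(successors) | {t for ts in successors.values() for t in ts}
    prior = {n: (1.0 / len(sources) if n in sources else 0.0) for n in nodes}
    flow = dict(prior)
    for _ in range(iterations):
        updated = {n: (1 - alpha) * prior[n] for n in nodes}
        for node, targets in successors.items():
            for target in targets:
                updated[target] += alpha * flow[node] / len(targets)
        flow = updated
    return flow


def reverse(successors):
    """Flip every directed edge."""
    reversed_graph = {}
    for node, targets in successors.items():
        for target in targets:
            reversed_graph.setdefault(target, []).append(node)
    return reversed_graph


def tied_diffusion(successors, causes, targets, threshold=0.01):
    """Keep nodes reached both downward from causes and upward from targets."""
    downward = diffuse(successors, causes)
    upward = diffuse(reverse(successors), targets)
    nodes = set(downward) | set(upward)
    linking = {n: min(downward.get(n, 0.0), upward.get(n, 0.0)) for n in nodes}
    return {n for n, score in linking.items() if score >= threshold}


# Toy directed network: MUT -> KIN -> TF, plus a dead end (OFF) and a side input (SIDE).
successors = {"MUT": ["KIN", "OFF"], "KIN": ["TF"], "SIDE": ["TF"]}
module = tied_diffusion(successors, causes={"MUT"}, targets={"TF"})
```

OFF is reachable from the cause but does not lead to the target, and SIDE leads to the target but is not downstream of the cause, so neither ends up in the connecting module.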
-
Use case: stratification + multi-omics analysis in cancer
• Workflow:
– NBS (network-based stratification of somatic mutation patterns) to identify two thyroid carcinoma subtypes
– Identification of likely drivers for them (RAS family vs. BRAF)
– Differential expression analysis between subtypes
– GSEA on TF target gene sets to find expression regulators
– TieDie connects 1) driver mutations to 2) differentially expressed proteins from RPPA data set and 3) master regulators of differentially expressed genes.
• Result:
– A signaling pathway highlighting ERK signaling and RHEB protein is reported as a crucial difference between subtypes.
-
Example of an integration algorithm: PARADIGM
• Inputs:
– Network/pathway (directed, with effects and mechanisms of interactions)
– Data matrices for several data types
• Workflow:
– Build probabilistic graphical model based on pathway structure
– Use trained BN to infer the true (‘hidden’) activity changes of genes and proteins given observed data
• Output:
– Trained model;
– Matrix with node activity estimates for each patient and each pathway
Vaske et al., Bioinformatics, 2010
-
Single-cell specific tools
Algorithms | Function
netSmooth | Data imputation
scID | Cell type mapping
Ocone et al.; SCODE | Network inference
SCENIC; ACTION | Network inference; Subnetwork identification; Cell clustering
PAGODA; ClusterMine | Pathway analysis; Cell clustering
cellPhoneDB; PyMINEr; iTALK | Cell-cell communication
Case study: examples on real data + infrastructure for preprocessing of single-cell data (e.g. normalization / clustering / dimensionality reduction / lineage tracing). There are plenty of tools available for that.
-
Single-cell subtype: cell-cell communications
• Tools:
– iTALK
– PyMINEr
– CellPhoneDB
• Similar workflow in all of them:
– Take clusters of cells
– Take DB of ligand-receptor interactions
– Evaluate pairs of ligand (expressed in cell type A) vs. receptor (expressed in cell type B)
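The shared workflow can be sketched in a few lines. The sketch assumes per-cluster mean expression values as input, scores a pair by the mean of ligand and receptor expression, and applies a simple expression cut-off; it leaves out the statistical testing that the listed tools add on top. All names, values and the cut-off are hypothetical.

```python
def score_interactions(mean_expression, ligand_receptor_pairs, min_expression=1.0):
    """Evaluate every ligand-receptor pair between every ordered pair of clusters.

    `mean_expression` maps cluster -> {gene: mean expression in that cluster}.
    Returns (sender, receiver, ligand, receptor, score) tuples, best first.
    """
    results = []
    for ligand, receptor in ligand_receptor_pairs:
        for sender, sender_expr in mean_expression.items():
            for receiver, receiver_expr in mean_expression.items():
                ligand_level = sender_expr.get(ligand, 0.0)
                receptor_level = receiver_expr.get(receptor, 0.0)
                if ligand_level >= min_expression and receptor_level >= min_expression:
                    score = (ligand_level + receptor_level) / 2
                    results.append((sender, receiver, ligand, receptor, score))
    return sorted(results, key=lambda row: row[4], reverse=True)


# Toy cluster averages and a two-entry ligand-receptor database.
mean_expression = {
    "T cell": {"IFNG": 5.0, "PDCD1": 3.0},
    "Macrophage": {"IFNGR1": 4.0, "CD274": 2.0},
}
ligand_receptor_pairs = [("IFNG", "IFNGR1"), ("CD274", "PDCD1")]
interactions = score_interactions(mean_expression, ligand_receptor_pairs)
```

The result is directional: the IFNG–IFNGR1 pair is reported from T cell to macrophage, and the CD274–PDCD1 pair in the opposite direction.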
-
Algorithms dealing with GWAS data
Algorithms | Function
DEPICT | Node prioritization; Pathway prioritization
DAPPLE | Subnetwork identification; Node prioritization
PASCAL | Pathway analysis
Case study: examples on real data + infrastructure for preprocessing of GWAS results:
• SNP to gene mapping
• Dealing with LD and reference populations
-
Case study for the target identification and drug repositioning services
Drug target discovery from GWAS loci in Parkinson’s disease
Client: [logo]
Aim
Build a pipeline for the identification of causal genes and drug targets within GWAS loci using PD as a test case
Solution
• Combined use of multiple approaches such as:
– Network connectivity with known disease drivers
– Pathway analysis
– Protein subcellular localization
– Human and mouse phenotypes
– Co-expression with known disease drivers
– Gene differential expression
– Programmatic literature mining
– Disease biomarker knowledge
• Applied data integration through machine learning analysis
Results
Causal genes and targets with detailed supportive evidence
[Figure: GWAS loci]
-
CBDD structure and status
Build network: Existing networks or pathways; Data-driven networks; Import and prepare data; Network modifications; Existing datasets
Analyze data: Node prioritization; Subnetwork prioritization; Pathway prioritization; Integration; Unsupervised; Crosstalk; Heterogeneous
Interpret results: Visualization; Benchmark & compare; Network comparison
Beyond the R package: CBDD web application; GUI; API; Community; Learning materials; Documentation
-
Network viewer
Technology:
• htmlwidgets R package
• cytoscape.js
Context-specific highlight
Control of:
• Edge and vertex attributes
• Layout
Annotation support
Data overlay support (support for multiple data layers)
Biological interpretation support
Also:
• Export (PNG, XGMML)
• Node grouping features
-
Thank you!
For questions on CBDD:
Taylor [email protected]
(646) 585-1742
https://clarivate.com/cortellis/cbdd/
© 2020 Clarivate. All rights reserved. Republication or redistribution of Clarivate content, including by framing or similar means, is prohibited without the prior written consent of Clarivate. Clarivate and its logo, as well as all other trademarks used herein are trademarks of their respective owners and used under license.