Leveraging network biology to drive drug discovery · May 20, 2020


  • Leveraging network biology to drive drug discovery: CBDD, a consortium focused on implementation of systems biology algorithms

    Alex Ivliev, PhD, Director, Bioinformatics

    Alex Ishkin, PhD, Senior Research Scientist

    © 2020 Clarivate

  • 2

    Computational biology tools for drug discovery

    • Academic groups around the world develop a large number of advanced computational biology methods
    • Application of such methods can substantially accelerate further academic research

    Diagram labels: multiple academic groups; algorithm development and publication; tools

    Consulting Services © 2020 Clarivate

  • 3

    Computational biology tools for drug discovery

    Diagram labels: network databases; omics data; multiple academic groups; algorithm development and publication; tools; academia & pharma; new discoveries

  • 4

    Computational biology tools for drug discovery

    Challenges:
    • Too many algorithms are published every month to keep track of, benchmark and select for further use
    • Steep learning curve
    • “Quick and dirty” implementations
    • Disparate languages
    • Poor documentation

  • 5

    Computational biology tools for drug discovery

    These challenges create a significant gap between the publication of algorithms and their adoption and use to drive actual research

  • 6

    Computational biology tools for drug discovery

    Why waste time and resources on bridging this gap individually when a global consortium model offers an effective solution for all?

  • 7

    Computational biology tools for drug discovery

    Use cases:
    • Target ID
    • Biomarker ID
    • Patient stratification
    • MoA reconstruction
    • Drug combinations
    • Indication discovery
    • Drug repositioning

    Diagram steps:
    • Algorithm selection
    • Implementation
    • Benchmarking
    • Documentation

  • 8

    Computational biology tools for drug discovery

    • Broad coverage of data types and use cases
    • Supported and used by leading pharma companies

  • 9

    Computational biology tools for drug discovery

    • 300+ algorithms evaluated and 59 algorithms implemented
    • 5 years of FTE time in development

  • 10

    CBDD consortium members

    • 3 companies in 2014
    • 17 companies in 2020

    https://cbdd.clarivate.com

  • 11

    CBDD deliverables

    R package:
    • Algorithms
    • Documentation
    • Tutorials
    • Datasets

    Documentation:
    • User manual
    • Tutorials and case studies
    • Performance evaluation

    Training:
    • Workshops
    • Case studies

    Additional:
    • CBDD website

  • 12

    Development approach and data model

    Accessibility: the main dissemination form is an R package

    Performance: computationally intensive parts are done in Java

    General approaches & flexibility: a “Swiss army knife” of systems biology

    IO (input & output) unification & generality:
    • Consistent inputs and outputs to easily build seamless workflows
    • Works with public and proprietary networks and datasets

  • 13

    CBDD use cases

  • 14

    Value of CBDD implementations over original implementations

  • 15

    Algorithms and real-life applications

  • 16

    CBDD structure and status

    Build network:
    • Existing networks or pathways
    • Data-driven networks
    • Network adjustments and merging
    • Import and prepare omics data

    Analyze data:
    • Node prioritization
    • Subnetwork prioritization
    • Pathway prioritization
    • Integration
    • Unsupervised
    • Crosstalk
    • Heterogeneous

    Interpret results:
    • Visualization
    • Benchmark & compare
    • Network comparison

    Beyond the R package:
    • CBDD web application
    • GUI
    • API
    • Community
    • Learning materials
    • Documentation

  • 17

    Import and export capabilities

    • Tab-delimited text files
    • XGMML
    • CX (Cytoscape JSON)
    • Clarivate’s MetaBase
    • Reactome and other public pathway databases
    • NDEx network warehouse
    • Advanced visualization

  • 18

    Scaffold (curated) networks

    • Typically gathered from literature (MetaBase™, IntAct)
    • Pro: contain curated (high-confidence) interactions
    • Con: inspection bias
    • They also contain every interaction ever found in any biological context, so there is no guarantee that node X is ever expressed in tissue Y

    Existing data sets:
    • Preprocessed networks supplied with the package
    • Generic text file import
      – Network content from other vendors
      – Internal content
      – Public interaction DBs
    • MetaBase (if available)

  • 19

    Data-driven reconstruction

    • Builds on similarities in gene behavior (e.g. co-expression)
    • Pro: approximately unbiased (it may discover interactions where experimenters never thought to look)
    • Con: false positives (e.g. indirect interactions), the biggest problem and an area of active research

    Diagram: gene expression data; expression correlations = interactions

    Data-driven networks:
    • Correlation network
    • ARACNE
    • Kueffner et al. (networkAnova)
    • SCODE (single-cell)
    • Ocone et al. (single-cell)
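
    The simplest entry in the list above, a correlation network, can be sketched as follows. This is a minimal illustration and not the CBDD implementation: the function names, the Pearson measure and the 0.8 cutoff are all assumptions chosen for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def correlation_network(expression, threshold=0.8):
    """Keep an undirected edge for every gene pair with |r| >= threshold.

    `threshold` is a hypothetical cutoff picked for illustration.
    """
    edges = {}
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges[(gene_a, gene_b)] = r
    return edges


# Toy data: gene -> expression across four samples (made-up numbers).
expression = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.1],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 1.0, 3.0],
}
network = correlation_network(expression)
```

    Note that the edge between B and C passes the cutoff only because both genes track A, which is the kind of indirect interaction the slide lists as the main weakness of this approach.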

  • 20

    Data-driven networks: ARACNE

    • Input: gene expression data
    • Workflow:
      – Compute mutual information between expression profiles
      – Prune indirect edges (an edge is dropped when its two nodes are already connected via a higher-confidence path)
    • Output: purely data-driven, undirected network

    PMID: 16723010
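
    The two workflow steps can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the published ARACNE code: expression is assumed to be already discretized, mutual information is a plain plug-in estimate, and the pruning rule drops the strictly weakest edge of every fully connected gene triplet with no tolerance parameter.

```python
from collections import Counter
from itertools import combinations
from math import log


def mutual_information(x, y):
    """Plug-in mutual information (in nats) of two discretized profiles."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def prune_indirect_edges(profiles, mi_threshold=0.0):
    """Build an MI network, then drop the weakest edge of each triangle."""
    genes = sorted(profiles)
    mi = {
        frozenset(pair): mutual_information(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(genes, 2)
    }
    edges = {edge for edge, value in mi.items() if value > mi_threshold}
    to_remove = set()
    for trio in combinations(genes, 3):
        triangle = [frozenset(pair) for pair in combinations(trio, 2)]
        if all(edge in edges for edge in triangle):
            weakest = min(triangle, key=lambda edge: mi[edge])
            others = [edge for edge in triangle if edge != weakest]
            # Prune only when the edge is strictly weaker than both others.
            if all(mi[weakest] < mi[other] for other in others):
                to_remove.add(weakest)
    return {tuple(sorted(edge)): mi[edge] for edge in edges - to_remove}


# Toy chain A -> B -> C: B copies A with one flipped sample,
# C copies B with another flipped sample (made-up binary states).
profiles = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 0, 1, 1, 1, 1, 1],
    "C": [0, 0, 1, 1, 1, 1, 1, 1],
}
pruned = prune_indirect_edges(profiles)
```

    In the toy chain the A-C edge carries the least mutual information of the three, so it is treated as indirect and removed, leaving A-B and B-C.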

  • 21

    Integrate network datasets

    Best of all worlds: gene coexpression; scaffold network (PPI database); text mining

    The STRING database is a good example:
    • Predicted interactions integrating diverse evidence
    • Full networks from STRING for human, mouse and rat are supplied with CBDD

  • 22

    Network adjustment (contextualization)

    Adjust the (scaffold) network using the status of nodes (i.e. expressed or not in a given biological system)

    Option 1: node filtering
    • Remove genes that are not expressed in a particular context
    • This may be too harsh

    Option 2: edge reweighting
    • Weight interactions based on the presence of their adjacent nodes
    • Less radical changes
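
    The two options can be contrasted with a small sketch. The function names, the `presence` scale in [0, 1] and the `floor` parameter are hypothetical; the slide does not specify the reweighting formula, so the product rule below is only one plausible choice.

```python
def filter_nodes(edges, expressed):
    """Option 1: keep only edges whose two endpoints are both expressed."""
    return {
        (u, v): weight
        for (u, v), weight in edges.items()
        if u in expressed and v in expressed
    }


def reweight_edges(edges, presence, floor=0.1):
    """Option 2: scale each edge by the presence of its endpoints.

    `presence` maps a gene to a value in [0, 1]; `floor` keeps edges to
    absent genes in the network with a small weight instead of removing them.
    Both are assumptions made for this illustration.
    """
    return {
        (u, v): weight
        * max(floor, presence.get(u, 0.0))
        * max(floor, presence.get(v, 0.0))
        for (u, v), weight in edges.items()
    }


# Toy scaffold network with edge confidences (made-up values).
edges = {("TP53", "MDM2"): 1.0, ("MDM2", "GENE_X"): 0.8}
filtered = filter_nodes(edges, expressed={"TP53", "MDM2"})
reweighted = reweight_edges(
    edges, presence={"TP53": 1.0, "MDM2": 1.0, "GENE_X": 0.0}
)
```

    Filtering removes the edge to the unexpressed gene entirely, while reweighting keeps it at a tenth of its weight, which matches the "less radical" behavior described for option 2.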

  • 23

    CBDD structure and status

    Build network:
    • Existing networks or pathways
    • Data-driven networks
    • Import and prepare data
    • Network modifications
    • Existing datasets

    Analyze data:
    • Node prioritization
    • Subnetwork prioritization
    • Pathway prioritization
    • Integration
    • Unsupervised
    • Crosstalk
    • Heterogeneous

    Interpret results:
    • Visualization
    • Benchmark & compare
    • Network comparison

    Beyond the R package:
    • CBDD web application
    • GUI
    • API
    • Community
    • Learning materials
    • Documentation

  • 24

    Node prioritization

    • Find network regulators for a set of nodes of interest using a guilt-by-association approach
    • Inputs: network + nodes of interest
    • Outputs: ranked list of nodes

    What are they used for?

    Research applications:
    • Protein function prediction
    • Disease-related gene prediction

    Industry applications:
    • Drug target prediction
    • Drug repositioning
    • Drug combinations
    • Biological mechanism reconstruction

    What kinds of algorithms are there?

    Tools / Description / Example
    • ‘Local’ approaches: use the local neighborhood of start nodes (example: MARINa)
    • ‘Global’ approaches: use the entire topology of the graph to prioritize regulators (example: network propagation)
    • ‘Causal’ approaches: use a directed network to infer causal regulation of differential expression (example: causal reasoning)

  • Node prioritization: MARINa

    • Inputs:
      – Network(s), preferably directed, weighted and with edge types
      – Start nodes (with expression changes)
    • Workflow, for each regulator:
      – Get its ‘regulon’: the directly regulated nodes (activated by the TF and repressed by the TF)
      – Compute a GSEA score for enrichment with start nodes
      – Correct for overlapping target spaces (shadowing; synergy)
    • Results:
      – Ranked list of ‘master’ regulators

    PMID: 20531406
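
    The enrichment step can be illustrated with an unweighted, GSEA-style running-sum score. This is a deliberately reduced sketch and not MARINa itself: it ignores the sign of regulation (activated vs. repressed targets), uses no permutation-based significance, and skips the shadowing and synergy corrections. All names and the toy regulons are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment of `gene_set` in a ranked list.

    Walks down the ranking, stepping up on a hit and down on a miss, and
    returns the signed maximum deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        return 0.0
    running = best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best


def rank_regulators(ranked_genes, regulons):
    """Score every regulator's regulon and sort by absolute enrichment."""
    scores = {
        regulator: enrichment_score(ranked_genes, targets)
        for regulator, targets in regulons.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Genes ordered from most up-regulated to most down-regulated (toy data).
ranked_genes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
regulons = {
    "TF_A": {"g1", "g2", "g3"},  # targets cluster at the top of the ranking
    "TF_B": {"g2", "g5", "g8"},  # targets are spread across the ranking
}
master_regulators = rank_regulators(ranked_genes, regulons)
```

    TF_A ranks first because all of its targets sit at the top of the list, so the running sum reaches its maximum before any miss is encountered.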

  • 26

    Research use case: mechanism reconstruction in cancer

    • Goal:
      – FGFR2 is a GWAS risk locus for breast cancer, and the study explored mechanisms of FGF signaling in the disease
    • Workflow:
      – ARACNe was applied to build a network of transcription regulation edges
      – Overconnectivity and MARINa were applied to find TFs responsible for FGFR2’s effects in cancer
    • Outcome:
      – The SPDEF, ESR1, GATA3 and FOXA1 regulons are consistently FGFR2-responsive
      – Findings were validated by shRNA experiments

  • 27

    Network propagation example: simple, global topology approaches

    • Initially created to identify protein functions and new disease genes from already known data
    • Inputs:
      – Network
      – Disease-causing genes from OMIM (or any list of start nodes)
      – Disease-disease similarities are used in the original publication (but not in CBDD)
    • Scoring: flow is propagated through the network from the start nodes until a steady state is reached
    • Results: disease-associated nodes ranked by flow level

    Figure panels: first iteration; steady state

    Vanunu et al., PLoS Comput Biol, 2009
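
    The scoring step can be sketched as an iterative update of the form F = alpha * W' F + (1 - alpha) * Y, where W' is the degree-normalized adjacency matrix and Y marks the start nodes. This is a minimal pure-Python illustration, not the CBDD code; the function name, the value alpha = 0.6 and the stopping tolerance are assumptions.

```python
from math import sqrt


def propagate(edges, seeds, alpha=0.6, tol=1e-9, max_iter=1000):
    """Propagate flow from seed nodes until the scores stop changing.

    `edges` maps undirected (u, v) pairs to positive weights. Each weight is
    normalized by sqrt(degree(u) * degree(v)); `alpha` balances flow received
    from neighbors against the prior on the seeds.
    """
    nodes = sorted({node for edge in edges for node in edge})
    degree = {node: 0.0 for node in nodes}
    for (u, v), weight in edges.items():
        degree[u] += weight
        degree[v] += weight
    neighbors = {node: [] for node in nodes}
    for (u, v), weight in edges.items():
        normalized = weight / sqrt(degree[u] * degree[v])
        neighbors[u].append((v, normalized))
        neighbors[v].append((u, normalized))

    prior = {node: 1.0 if node in seeds else 0.0 for node in nodes}
    score = dict(prior)
    for _ in range(max_iter):
        updated = {
            node: alpha * sum(w * score[nb] for nb, w in neighbors[node])
            + (1.0 - alpha) * prior[node]
            for node in nodes
        }
        change = max(abs(updated[node] - score[node]) for node in nodes)
        score = updated
        if change < tol:  # steady state reached
            break
    return sorted(score.items(), key=lambda item: item[1], reverse=True)


# Toy path network A - B - C - D with a single start node A.
edges = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0}
ranking = propagate(edges, seeds={"A"})
```

    On the toy path the steady-state scores fall off with distance from the start node, so the ranking is A, B, C, D.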

  • 28

    Industry use case: establishing a pipeline for systematic indication prioritization

    Client: Boehringer Ingelheim (BI)

    Aim:
    • Before the project, BI prioritized indications manually and not for every target of interest
    • The aim was to build an analytical pipeline for indication selection

    Solution:
    • Using its indication prioritization methods, Clarivate helped BI’s team develop an analytical pipeline for automated and systematic indication prioritization
    • The pipeline combined: (1) network and pathway information from MetaBase™, (2) disease biomarker content from Integrity℠ and public sources, (3) omics data and (4) advanced analytics developed within CBDD

    Results:
    • Indication prioritization is automated and run systematically
    • BI’s computational biology team is required to perform systematic indication prioritization once a project reaches the lead optimization stage

    Presented by Dr Elia Stupka at Bio-IT, 2017

  • 29

    Causal reasoning: fancier node prioritization example

    • Inputs:
      – Network (directed, with effects of edges)
      – Start nodes and directions of change
    • For each node H in the network:
      – Make an assumption about the role of H (say, H is aberrantly activated in phenotype X)
      – Predict activity changes for the genes influenced by H
      – Compare actual activity changes (e.g. expression fold change) with the predictions
    • Scoring criteria:
      – Score (#correct − #incorrect predictions)
      – Enrichment: the more disease-specific genes explained by H, the better
      – Concordance: the more consistent the predictions and the real data, the better

    Pollard et al., Diabetes Technology & Therapeutics, 2005. Chindelevitch et al., Bioinformatics, 2012.
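
    The basic score (#correct − #incorrect) can be sketched as below. For brevity the sketch only follows direct edges from the hypothesized regulator, whereas published causal reasoning methods also follow longer signed paths and attach significance estimates; the function names and toy network are hypothetical.

```python
def score_hypothesis(signed_edges, observed, regulator, direction=1):
    """Return #correct - #incorrect predictions for one hypothesis.

    `signed_edges` maps (source, target) to +1 (activation) or -1
    (inhibition); `observed` maps genes to +1 (up) or -1 (down);
    `direction` is the assumed change of the regulator itself.
    """
    correct = incorrect = 0
    for (source, target), sign in signed_edges.items():
        if source != regulator or target not in observed:
            continue
        predicted = direction * sign
        if predicted == observed[target]:
            correct += 1
        else:
            incorrect += 1
    return correct - incorrect


def rank_hypotheses(signed_edges, observed):
    """Score 'regulator up' and 'regulator down' for every source node."""
    regulators = {source for source, _ in signed_edges}
    results = [
        (reg, direction, score_hypothesis(signed_edges, observed, reg, direction))
        for reg in regulators
        for direction in (1, -1)
    ]
    # Highest score first; name and direction only break ties deterministically.
    return sorted(results, key=lambda r: (-r[2], r[0], -r[1]))


# Toy signed network and observed expression changes (made-up values).
signed_edges = {
    ("H", "g1"): 1,
    ("H", "g2"): 1,
    ("H", "g3"): -1,
    ("K", "g1"): -1,
    ("K", "g4"): 1,
}
observed = {"g1": 1, "g2": 1, "g3": -1, "g4": 1}
hypotheses = rank_hypotheses(signed_edges, observed)
```

    Here "H is activated" explains all three of its targets and scores 3, while both hypotheses about K get one prediction right and one wrong and score 0.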

  • 30

    Subnetwork / pathway prioritization

    • Identify subnetworks/pathways based on network analysis of phenotype-specific data (omics, small-scale experiments)
    • Inputs (usual): network + nodes of interest
    • Outputs: ranked subnetworks

    What are they used for?

    Research applications:
    • Protein complex identification
    • Biological mechanism reconstruction

    Industry applications:
    • Biomarker discovery/stratification
    • Biological mechanism reconstruction

    What kinds of algorithms are there?

    Tools / Description / Example
    • ‘Dense’ approaches: find densely connected subnetworks (example: DENSE)
    • ‘Enrichment’ approaches: find subnetworks/pathways enriched with nodes of interest (example: Active modules)
    • ‘Biomarker’ approaches: find subnetworks/pathways which reliably differ in expression between two phenotypes (example: Subnetwork markers)

  • 31

    ActiveModules: subnetwork algorithm example

    • ‘Grandfather’ of subnetwork algorithms
    • Inputs:
      – Network
      – Start nodes with differential expression p-values
    • Workflow:
      – Assign z-scores to nodes
      – Use simulated annealing to find a subnetwork with a high overall score
    • Output: high-scoring subnetwork
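
    A toy version of this workflow is sketched below. It is an assumption-laden illustration rather than the original algorithm: the aggregate score is the sum of z-scores divided by the square root of the module size, without the calibration against random gene sets used in the original method, and the linear cooling schedule, step count and connectivity constraint are choices made for the example.

```python
import random
from math import exp, sqrt
from statistics import NormalDist


def z_scores(p_values):
    """Convert p-values in (0, 1) to z-scores; small p gives large z."""
    return {gene: NormalDist().inv_cdf(1.0 - p) for gene, p in p_values.items()}


def module_score(module, z):
    """Aggregate z-score of a node set: sum(z) / sqrt(size)."""
    return sum(z[gene] for gene in module) / sqrt(len(module)) if module else 0.0


def is_connected(module, adjacency):
    """True if the non-empty node set forms one connected component."""
    if not module:
        return False
    start = next(iter(module))
    seen, stack = {start}, [start]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor in module and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == module


def find_active_module(adjacency, p_values, steps=2000, start_temp=1.0, seed=0):
    """Simulated annealing over connected node sets, toggling one node per step."""
    rng = random.Random(seed)
    z = z_scores(p_values)
    nodes = sorted(adjacency)
    current = {max(nodes, key=lambda gene: z[gene])}
    best = set(current)
    for step in range(steps):
        temperature = start_temp * (1.0 - step / steps) + 1e-6
        candidate = current ^ {rng.choice(nodes)}  # add or remove one node
        if not is_connected(candidate, adjacency):
            continue
        gain = module_score(candidate, z) - module_score(current, z)
        # Always accept improvements; accept losses with decreasing probability.
        if gain > 0 or rng.random() < exp(gain / temperature):
            current = candidate
            if module_score(current, z) > module_score(best, z):
                best = set(current)
    return best, module_score(best, z)


# Toy path network A - B - C - D - E with made-up p-values.
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
p_values = {"A": 0.001, "B": 0.002, "C": 0.6, "D": 0.9, "E": 0.004}
module, score = find_active_module(adjacency, p_values)
```

    Because the search starts from the single most significant node and only ever replaces the best module with a better one, the returned score is never lower than that node's own z-score.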

  • 32

    Case study catalogue: case study for the drug combinations services

    Identification of synergistic drug combinations using subnetworks

    Client: Takeda

    Aim: establish an analysis pipeline that predicts (i) the efficacy of drug combinations better than known methods and (ii) the potential mechanism of action

    Solution:
    • Identified treatment-specific networks for individual drugs
    • Combined individual networks to predict the effect of pairwise combinations

    Results:
    • The pipeline outperforms the best method reported by the DREAM benchmarking challenge (Bansal et al., Nat. Biotechnol. 2014)
    • Experimental validation: the prediction was consistent with Takeda’s experimental results
    • Takeda uses the pipeline to prioritize drug combinations for further experimental validation

    Diagram:
    • Inputs, for each drug: (1) pre- and post-treatment gene expression data, (2) treatment response data
    • Pipeline: network and statistical analysis (drug 1, drug 2, combination)
    • Results: prioritized combinations; potential MoA

    https://www.ncbi.nlm.nih.gov/pubmed/25419740

  • 33

    Subnetwork biomarkers

    Step 1: identify phenotype-specific subnetworks

    Step 2: use subnetworks as features for classification

  • 34

    Case study catalogue: case study for the patient stratification and biomarker discovery services

    Identification of multiple sclerosis molecular subtypes

    Aim:
    • Identify MS subtypes and biomarkers for the subtypes
    • Challenge: MS is a heterogeneous disease

    Solution:
    • Input: gene expression profiles from ~200 MS patients from the CLIMB longitudinal study
    • Applied statistical and pathway analysis to the input data

    Results:
    • 4 molecular patient subtypes significantly associated with disease prognosis
    • Identification of disease prognosis biomarkers

    Read more about the project here: Orion Bionetworks - Turning Big Data Knowledge

    http://www.msdiscovery.org/news/blogs/19971-orion-conference-turning-big-data-knowledge

  • 35

    Multi-omics analysis

    • Identification of phenotype-specific subnetworks/pathways connecting data points from distinct phenotype-specific data sets (omics, small-scale experiments)

    What are they used for?

    Research applications:
    • Oncology: ‘drivers’
    • Exploration of disease heterogeneity
    • eQTL mechanism elucidation

    Industry applications:
    • Biomarker discovery/stratification
    • Biological mechanism reconstruction
    • Drug target identification
    • Stratification
    • Drug combinations

    What kinds of algorithms are there?

    Tools / Description / Example
    • ‘Cause-consequence’ approaches: find a subnetwork connecting two sets of omics-derived nodes of interest (examples: TieDie; ResponseNet)
    • ‘Stratification’ approaches: joint clustering of multiple omics data sets (examples: PARADIGM; CIMLR)

  • 36

    TCGA

    • Huge collection of consistently profiled multi-omics data from cancer patients
    • Applications:
      – Driver gene and pathway discovery
      – Disease heterogeneity exploration
    • Gave rise to or inspired many tools for:
      – Patient stratification
      – Multi-omics data analysis

  • 37

    TieDie: simple integration algorithm example

    • Input:
      – Two start sets: causes and targets (cause-consequence philosophy)
    • Workflow:
      – Directed approach: downward network propagation from causes, upward network propagation from targets
    • Output:
      – Modules of interest connecting causes and targets

    Paull et al., Bioinformatics, 2013
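
    The two-directional idea can be sketched as follows. Note the substitution: the published method uses a heat diffusion kernel, while this illustration swaps in a simple random walk with restart to stay dependency-free. The function names, the `restart` value and the `threshold` are hypothetical.

```python
def diffuse(successors, seeds, nodes, restart=0.5, iterations=100):
    """Random walk with restart along the given successor lists."""
    prior = {node: 1.0 / len(seeds) if node in seeds else 0.0 for node in nodes}
    heat = dict(prior)
    for _ in range(iterations):
        updated = {node: restart * prior[node] for node in nodes}
        for node in nodes:
            outgoing = successors.get(node, [])
            for target in outgoing:
                updated[target] += (1.0 - restart) * heat[node] / len(outgoing)
        heat = updated
    return heat


def linking_nodes(edges, causes, targets, threshold=0.01):
    """Keep nodes reached both downstream of causes and upstream of targets."""
    nodes = sorted({node for edge in edges for node in edge})
    forward, backward = {}, {}
    for source, target in edges:
        forward.setdefault(source, []).append(target)
        backward.setdefault(target, []).append(source)
    downstream = diffuse(forward, causes, nodes)   # follows edge direction
    upstream = diffuse(backward, targets, nodes)   # follows reversed edges
    linker = {node: min(downstream[node], upstream[node]) for node in nodes}
    return {node for node, value in linker.items() if value >= threshold}


# Toy directed network: a mutated gene signals to a TF through a kinase;
# X is a dead-end branch and Y is an unrelated upstream regulator.
edges = [("MUT", "KIN"), ("KIN", "TF"), ("MUT", "X"), ("Y", "TF")]
module = linking_nodes(edges, causes={"MUT"}, targets={"TF"})
```

    Taking the minimum of the two diffusion scores is what discards X (reachable from the cause but not leading to the target) and Y (upstream of the target but unreachable from the cause).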

  • 38

    Use case: stratification + multi-omics analysis in cancer

    • Workflow:
      – NBS (network-based stratification of somatic mutation patterns) to identify two thyroid carcinoma subtypes
      – Identification of likely drivers for them (RAS family vs. BRAF)
      – Differential expression analysis between subtypes
      – GSEA on TF target gene sets to find expression regulators
      – TieDie connects (1) driver mutations to (2) differentially expressed proteins from an RPPA data set and (3) master regulators of differentially expressed genes
    • Result:
      – A signaling pathway highlighting ERK signaling and the RHEB protein is reported as the crucial difference between subtypes

  • 39

    PARADIGM: example of an integration algorithm

    • Inputs:
      – Network/pathway (directed, with effects and mechanisms of interactions)
      – Data matrices for several data types
    • Workflow:
      – Build a probabilistic graphical model based on the pathway structure
      – Use the trained model to infer the true (‘hidden’) activity changes of genes and proteins given the observed data
    • Output:
      – Trained model
      – Matrix with node activity estimates for each patient and each pathway

    Vaske et al., Bioinformatics, 2010

  • 40

    Single-cell specific tools

    Algorithms / Function
    • netSmooth: data imputation
    • scID: cell type mapping
    • Ocone et al., SCODE: network inference
    • SCENIC, ACTION: network inference, subnetwork identification, cell clustering
    • PAGODA, ClusterMine: pathway analysis, cell clustering
    • CellPhoneDB, PyMINEr, iTALK: cell-cell communication

    Case study: examples on real data + infrastructure for preprocessing of single-cell data (e.g. normalization, clustering, dimensionality reduction, lineage tracing); there are plenty of tools available for that

  • Single-cell subtype: cell-cell communications

    • Tools:
      – iTALK
      – PyMINEr
      – CellPhoneDB
    • Similar workflow in all of them:
      – Take clusters of cells
      – Take a DB of ligand-receptor interactions
      – Evaluate pairs of ligand (expressed in cell type A) vs. receptor (expressed in cell type B)
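
    The shared workflow can be sketched generically. This does not reproduce the scoring of any of the three tools: the score (mean of the ligand and receptor cluster averages), the `min_expr` cutoff and all names are assumptions, and per-cluster mean expression is taken as already computed.

```python
def score_interactions(cluster_expr, ligand_receptor_pairs, min_expr=0.5):
    """Score ligand-receptor pairs between every ordered pair of clusters.

    `cluster_expr` maps a cluster label to {gene: mean expression}. A pair is
    reported when the ligand in the sending cluster and the receptor in the
    receiving cluster both reach `min_expr` (a hypothetical cutoff).
    """
    results = []
    for sender, sender_expr in cluster_expr.items():
        for receiver, receiver_expr in cluster_expr.items():
            for ligand, receptor in ligand_receptor_pairs:
                ligand_level = sender_expr.get(ligand, 0.0)
                receptor_level = receiver_expr.get(receptor, 0.0)
                if ligand_level >= min_expr and receptor_level >= min_expr:
                    score = (ligand_level + receptor_level) / 2.0
                    results.append((sender, receiver, ligand, receptor, score))
    return sorted(results, key=lambda row: row[4], reverse=True)


# Toy cluster-level mean expression and a one-entry interaction database.
cluster_expr = {
    "T cell": {"CD40LG": 2.0, "CD40": 0.1},
    "B cell": {"CD40LG": 0.0, "CD40": 3.0},
}
ligand_receptor_pairs = [("CD40LG", "CD40")]
interactions = score_interactions(cluster_expr, ligand_receptor_pairs)
```

    Only the T cell to B cell direction is reported, since the ligand is absent from the B cell cluster and the receptor is below the cutoff in the T cell cluster.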

  • 42

    Algorithms dealing with GWAS data

    Algorithms / Function
    • DEPICT: node prioritization, pathway prioritization
    • DAPPLE: subnetwork identification, node prioritization
    • PASCAL: pathway analysis

    Case study: examples on real data + infrastructure for preprocessing of GWAS results:
    • SNP-to-gene mapping
    • Dealing with LD and reference populations
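
    The SNP-to-gene mapping step can be illustrated with the simplest positional rule: assign a SNP to every gene whose body, padded by a fixed window, contains it. This sketch is an assumption for illustration only; it ignores LD entirely, and the function name, the 10 kb window, the identifiers and the coordinates are all made up.

```python
def map_snps_to_genes(snps, genes, window=10_000):
    """Map each SNP to the genes whose padded interval contains it.

    `snps` maps a SNP id to (chromosome, position); `genes` maps a gene name
    to (chromosome, start, end). `window` is a hypothetical padding in bp.
    """
    mapping = {}
    for snp_id, (snp_chrom, position) in snps.items():
        mapping[snp_id] = sorted(
            gene
            for gene, (chrom, start, end) in genes.items()
            if chrom == snp_chrom and start - window <= position <= end + window
        )
    return mapping


# Made-up gene coordinates and SNP positions.
genes = {
    "GENE_A": ("chr1", 100_000, 120_000),
    "GENE_B": ("chr1", 125_000, 140_000),
    "GENE_C": ("chr2", 100_000, 120_000),
}
snps = {
    "rs_demo1": ("chr1", 122_000),  # falls between GENE_A and GENE_B
    "rs_demo2": ("chr2", 50_000),   # far from any gene
    "rs_demo3": ("chr1", 95_000),   # just upstream of GENE_A
}
snp_to_genes = map_snps_to_genes(snps, genes)
```

    A SNP can map to several genes or to none, which is one reason the slide pairs this step with LD handling and downstream prioritization.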

  • 43

    Case study catalogue: case study for the target identification and drug repositioning services

    Drug target discovery from GWAS loci in Parkinson’s disease

    Aim: build a pipeline for the identification of causal genes and drug targets within GWAS loci, using PD as a test case

    Solution:
    • Combined use of multiple approaches, such as:
      – Network connectivity with known disease drivers
      – Pathway analysis
      – Protein subcellular localization
      – Human and mouse phenotypes
      – Co-expression with known disease drivers
      – Gene differential expression
      – Programmatic literature mining
      – Disease biomarker knowledge
    • Applied data integration through machine learning analysis

    Results: causal genes and targets with detailed supporting evidence

  • 44

    CBDD structure and status (same overview as slide 23)

  • 45

    Network viewer

    Technology:
    • htmlwidgets R package
    • cytoscape.js

    Features:
    • Context-specific highlight
    • Control of edge and vertex attributes and of layout
    • Annotation support
    • Data overlay support (support for multiple data layers)
    • Biological interpretation support

    Also:
    • Export (PNG, XGMML)
    • Node grouping features

  • 46

    Thank you!

    For questions on CBDD:

    Taylor [email protected]

    (646) 585-1742

    https://clarivate.com/cortellis/cbdd/

    © 2020 Clarivate. All rights reserved. Republication or redistribution of Clarivate content, including by framing or similar means, is prohibited without the prior written consent of Clarivate. Clarivate and its logo, as well as all other trademarks used herein are trademarks of their respective owners and used under license.
