Big Data Training for Translational Omics Research

Principles of Biomarker

Discovery and Development

In Translational Medicine

Liu

6/7/2017

Class 1

10:15am

Unit 3; Session 1


Breakdown

Learning objectives

Biomarker and Precision Medicine

Biomarker in preclinical and clinical studies

Principles of Biomarker Discovery: Overview

Principles of Biomarker Discovery: Data Collection

Principles of Biomarker Discovery: Data Analysis

Principles of Biomarker Discovery: Validation


Philosophy of Translational Research

• As a biomedical researcher, how can I make something that benefits patients?

• I am working on cell lines and mice. How can the omics approach help me understand the mechanism, especially causality?

• Can the key molecule(s) I identified in cells and animals be used in humans?

Lab researchers, grant writers, physicians…


Key Words

• Biomarker: A characteristic that is objectively measured and evaluated as an indicator of normal biological processes, pathogenic processes, or pharmacologic responses to a therapeutic intervention.

NIH Biomarkers Definition Working Group

• Translational: Translational research aims to aid in the transformation of biological knowledge into solutions that can be applied in a clinical setting.

Atkinson, et al., Clin Pharm Ther, 2001.

Azuaje F. Bioinformatics and Biomarker Discovery, 2010


Why Biomarker?


A Core Question in Modern Medicine

How to Address Patient Heterogeneity?


Patient Heterogeneity


Biomarker → Personalized Medicine

• CML patients, Gleevec: 90% RR
• All breast cancer patients, Herceptin: 10–15% RR
• HER2+ breast cancer patients, Herceptin: 35–45% RR
• All NSCLC patients, Iressa: 10–15% RR
• EGFR MT+ NSCLC patients, Iressa: 60–70% RR

Slamon et al. NEJM 2001; Kantarjian et al. NEJM 2002; Vogel et al. JCO 2002. 20:3; Douillard et al. JCO 2010.

Biomarkers are especially important in diseases with low response rates in the overall population


Biomarker → Personalized Medicine

[Figure: Discovery → Drug development → Implementation. Cancer: EGFR (Gefitinib), KRAS (ARS-853?), ALK (Crizotinib), HER2 (Herceptin), BRAF (Vemurafenib). Other common diseases: Gene A, Gene B, Gene C, Gene D, Gene E; precision molecules.]


Precision Medicine

To deliver the right treatment to the right patient with the right dose and at the right time


Clinical Application of Biomarker

• Deal with patient heterogeneity
– Early risk assessment
– Disease prevention
– Assist diagnosis
– Optimize treatment: high effectiveness, low risk
– Match the patient to therapeutic strategy
– Monitor therapy success/disease recurrence
– Long-term management

Risk → Diagnosis → Treatment → Monitoring


Biomarker in Preclinical Studies

• To characterize the phenotype

• To monitor the response

• To identify potential translational biomarkers for humans


Omics Approach in Basic Research

• Explore molecular mechanism

• Hypothesis generating

• Identify therapeutic targets and strategies

• Establish intermediate phenotypes


Type of Biomarkers

• Prognostic marker (a): before treatment

• Predictive marker (b): before treatment

• Pharmacodynamic marker (c): after treatment

• Surrogate marker (d): during treatment

Gosho, et al. Sensors 2012, 12, 8966-8986


Prognostic Marker

• Signature separates a population with respect to the outcome (risk)
• Regardless of the types of therapies or treatments
– Markers associated with overall survival regardless of treatment
• Distinguish outcome (poor or good) following the test and standard treatments
• Cannot guide the choice of a particular treatment
• Can determine the aggressiveness of treatment

Ballman KL, JCO. 2015.63.3651


Predictive Biomarker

Ballman KL, JCO. 2015.63.3651

• Predicts the differential outcome of a particular therapy or treatment
• Prospectively identifies patients who are likely to have a favorable clinical outcome from a specific treatment
• A predictive biomarker can therefore guide the choice of treatment


Prognostic and Predictive Markers

Ballman KL, JCO. 2015.63.3651

• Some biomarkers are both prognostic (disease susceptibility or progression) and predictive (outcome of certain treatments)
• ER status and breast cancer: prognostic
• ER status and antiestrogen therapy: predictive


Pharmacodynamic Markers

• PD biomarkers provide information about the pharmacologic effects of a drug on its target
• Measured after treatment
• A clinical endpoint to be measured
• Application:
– Proof of mechanism, i.e., does the drug hit its intended target?
– Proof of concept, i.e., does hitting the drug target alter the biology of the tumor?
– Selection of optimal biologic dosing
– Understanding response/resistance mechanisms
• Examples:
– Protein phosphorylation markers, e.g. p-EGFR, p-ERK, to evaluate changes in target protein phosphorylation or the activation status of downstream signaling/adapter molecules
– Apoptosis (TUNEL assay) to assess pharmacologic effect on cell death


Surrogate Biomarker

• Substitute for a clinical endpoint
• Expected to predict clinical benefit (or lack of benefit, or harm) based on epidemiologic, therapeutic, pathophysiologic, or other scientific evidence
• During or after treatment
• Examples:
– Glucose level for monitoring the treatment of diabetes
– Imaging-based measurement for anti-cancer therapy


Questions

What kind of biomarker is HOXB13:IL17BR in the first case paper?

What kind of biomarker is the blood concentration of R-/S-methadone in the second case paper?


Examples of FDA Approved Biomarkers

Gosho, et al. Sensors 2012, 12, 8966-8986


Gosho, et al. Sensors 2012, 12, 8966-8986

Examples of FDA Approved Biomarkers


Biomarker Discovery and Development in the Omics Era

[Figure: timeline spanning the 1970s, 1980s, 1990s, and >2005]


Biomarker Discovery and Development in the Omics Era

Genomics

Transcriptomics

miRNomics

lncRNomics

Epigenomics

Proteomics

Metabolomics

Lipidomics

Exposomics


Prognostic-diagnostic Markers

• Genes for ~50% of rare diseases identified

Nature Reviews Genetics 14, 681–691 (2013)


Prognostic-Diagnostic Markers

• 11,907 SNPs strongly associated with common diseases


Pharmacogenomic Markers

• 166 FDA approved PGx markers for drug treatment


Transcriptomic Biomarkers

• MammaPrint test
– Agendia
– 70-gene signature for breast cancer prognosis
• Oncotype Dx test
– Genomic Health
– 21 gene-expression biomarkers for predicting recurrence in breast cancer patients and response to both chemotherapy and radiation therapy
• H/I test
– AviaraDx
– 2-gene signature used to estimate the risk of recurrence and response to therapy in breast cancer patients


Biomarker Development Pipeline

[Figure: pipeline stages: Discovery → Confirmation → Assay development → Validation/Refinement → Clinical validation → Clinical adoption
• Data sources: Genomics, Transcriptomics, Proteomics, Metabolomics, Lipidomics, Epigenomics, Exposomics, Imaging
• Technical development: integrated technologies and platforms; multi-analyte assays; robust validated assays; clinical grade assays (accurate, specific, reproducible, reliable); instruments
• Trends along the pipeline: number of analytes, number of samples
• Other labels: Target selection, Lead identification, Preclinical, Retrospective, Clinical trials, Marketing, clinical use
Source: https://is.muni.cz]


Institute of Medicine Roadmap for omics-based tumor biomarker test development

Hayes BMC Medicine 2013, 11:221


Institute of Medicine Roadmap for omics-based tumor biomarker test development

Hayes BMC Medicine 2013, 11:221


Data Acquisition Strategies

• Retrospective:
– Clinical samples are collected before the design of the biomarker study and before comparison with control samples
– Looks back at past, recorded data to find evidence of marker-disease relationships
– Inexpensive, rapid
– Potentially biased, noisy
– Weak evidence
• Prospective:
– The biomarker-based prediction or classification model is applied to patients at the time of enrolment
– Clinical outcomes or disease occurrence are unknown at the time of enrolment
– Less biased
– Strong evidence
– Expensive, time-consuming
• Pro-retrospective

FDA approval!!


Study Design Consideration

• Biomarker discovery studies require careful planning and design

• Study style: retrospective, prospective, pro-retrospective

• Sample collection

• Phenotype

• Sample size and power estimation

• Other covariates

• Data collection

• Platform

• Replication, validation and application

• Data analysis plan


Sample Collection, Assay Design, Data Analysis Plan

• Establish methods
– Specimen collection
– Processing
– Storage
• Establish criteria
– Quantity and quality
– Minimum amount
• Feasibility
– Obtaining specimens
• Assay design
– Communication with core/service provider
• Data analysis
– Communication with biostatistician and bioinformatician


Sample and Materials

• Biospecimen
– Tissue
– Blood
– Oral swab
– Hair
– Tear
– Urine
– Feces
– Saliva
– …
• Test materials
– DNA
– RNA
– Protein
– Small molecules
– Lipids
• Principles:
– Non-invasive
– Reproducible
– Reliable
– Specific
– Accurate
– Inexpensive
– Point-of-care

[Figure label: invasiveness]


Ethical, Legal, and Regulatory Issues

• Establish communication with regulatory agencies, e.g. IRB, FDA

• Regulatory approvals

• Documents:

– Informed consent

– Study protocol

• Intellectual property issues

• CLIA-lab based test for clinical trials involving patient selection


Sample Size and Power Estimation

• Power setting: 0.8
• Statistical significance:
– Discovery: multiple hypotheses (p corrected according to the number of tests)
– Validation: usually one hypothesis (p<0.05)
• Input parameters: previous publication or pilot study
• Online tools:
– piface.jar by Lenth (2006)
  • http://homepage.stat.uiowa.edu/~rlenth/Power/
– Microarray power/sample size estimation
  • http://sph.umd.edu/department/epib/sample-size-and-power-calculations-microarray-studies
– RNA-seq data:
  • Scotty: http://bioinformatics.bc.edu/marthlab/scotty/scotty.php
  • RnaSeqSampleSize: https://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/
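To make the power setting and the corrected significance level concrete, here is a minimal Python sketch of a sample size estimate. It is not the method used by any of the tools listed above; it assumes a two-sided comparison of two group means, a normal approximation, and a Bonferroni-adjusted alpha. The function name `samples_per_group` and its parameters are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(effect_size, alpha=0.05, power=0.8, n_tests=1):
    """Approximate sample size per group for comparing two means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2,
    where d is the standardized effect size (mean difference / SD),
    taken from a previous publication or a pilot study.
    alpha is divided by n_tests (Bonferroni) for discovery studies.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z = NormalDist()
    alpha_adjusted = alpha / n_tests
    z_alpha = z.inv_cdf(1 - alpha_adjusted / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Validation setting: one hypothesis, p < 0.05, power 0.8
n_validation = samples_per_group(effect_size=1.0)
# Discovery setting: the same effect size tested across 1000 markers
n_discovery = samples_per_group(effect_size=1.0, n_tests=1000)
```

The discovery estimate is larger than the validation estimate because the per-test significance level is stricter. Exact t-based or count-based (RNA-seq) calculations will give somewhat different numbers.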


Key Principles: Big Data in Biomarker

[Figure: Phenotype (“digits”) × Molecular profiles (“digits”), connected through statistics, bioinformatics, and network analysis]


Always Start Your Design and Analysis From Data Evaluation!

• What kind of phenotypic and marker data do I have, and what should I use or collect?
• Are my data normally distributed?
• What kind of models should I choose?
• What factors may confound my analyses?
• How might covariate data be correlated with my phenotype?


Phenotype to Digits

• Nominal data: no order
– Yes or no (binary): disease vs normal, response vs no response
– Cancer type: breast, lung, colon…
• Ordinal data: some order
– Pathologic: tumor stage: I, II, III
– Disease progression: no, mild, severe, death
• Continuous data:
– Glucose level, LDL, drug concentration, gene expression
• Survival data: time to event
– Death, occurrence of disease, onset of toxicity, in hr, day, wk, month, yr, etc.


Molecular Data Collection

[Figure: Platform (Genomics, Transcriptomics, miRNomics, lncRNomics, Epigenomics, Proteomics, Metabolomics, Lipidomics) → Raw data → “Digits”: ordinal data (0, 1, 2) or continuous variables (-1.2, -1.1, 0.58, 1.09, 2.34)]


Basic Statistical Methods

[Figure: Phenotype (numerical data: nominal, ordinal, continuous, survival) × Molecular profiles (numerical data: nominal, ordinal, continuous). Descriptive and exploratory association: chi-square test, t-test, ANOVA, correlation, log rank. Statistic models.]


Basic Statistical Methods

• Continuous data
– Normally distributed: parametric method
– Non-normal distribution/ordinal data: non-parametric method
• Winsorization
• Log transformation: log2

Parametric | Non-parametric
t-test | Mann-Whitney rank-sum test
Paired t-test | Wilcoxon signed-rank test
ANOVA | Kruskal-Wallis test
Pearson correlation | Spearman correlation
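Winsorization and log2 transformation are two common ways to tame skewed data before a parametric test. The sketch below illustrates both in plain Python. The quantile rule (a truncated index into the sorted values) and the pseudocount of 1 are simplifying assumptions, and the function names are hypothetical.

```python
from math import log2


def winsorize(values, lower=0.05, upper=0.95):
    """Clip extreme values to the given lower and upper quantiles.

    Quantiles are located with a simple truncated index into the
    sorted data, which is an assumption made to keep the sketch short.
    """
    ordered = sorted(values)
    n = len(ordered)
    low_cut = ordered[int(lower * (n - 1))]
    high_cut = ordered[int(upper * (n - 1))]
    return [min(max(v, low_cut), high_cut) for v in values]


def log2_transform(values, pseudocount=1.0):
    """Apply log2(x + pseudocount); the pseudocount avoids log2(0)."""
    return [log2(v + pseudocount) for v in values]


raw = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]  # one extreme outlier
clipped = winsorize(raw)                  # the outlier is pulled in to 9
logged = log2_transform([0, 1, 3])        # [0.0, 1.0, 2.0]
```

Whether to winsorize, transform, or switch to a non-parametric test depends on the data; checking the distribution first is the step the slide is pointing at.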


Statistical Models

• Univariate models
– Logistic regression: binary/categorical phenotype
– Linear regression: continuous phenotype
– Kaplan-Meier (KM) method: survival phenotype
• Multivariate models
– Multivariate regressions: linear or logistic
– Cox regression: survival phenotype

• Other sophisticated models
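As an illustration of the Kaplan-Meier method for a survival phenotype, the following is a minimal product-limit estimator in plain Python. It is a teaching sketch with made-up follow-up data, not a replacement for a survival analysis package; the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events at time t
        leaving = 0   # subjects leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            leaving += 1
            i += 1
        if observed > 0:
            # Product-limit step: multiply by the fraction surviving t
            survival *= 1 - observed / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Illustrative data: five subjects, two of them censored
km_curve = kaplan_meier(times=[1, 2, 3, 4, 5], events=[1, 0, 1, 1, 0])
```

Censored subjects do not lower the curve but they do shrink the risk set, which is why later events produce larger drops.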


Multiple Testing Issue

• Example
– P value cutoff = 0.05
– 1000 genes: 50 genes expected by chance (error) at this significance level
– If 60 genes have p<0.05, many might be due to noise (false positives)
• Common correction methods
– Bonferroni correction
  • Adjusted p value: p × n, e.g. p=0.0005, n=1000 genes, adjusted p = 0.0005 × 1000 = 0.5
  • Equivalently, corrected p value cutoff = 0.05/n
  • Explanation: among all genes selected, the probability of at least one false positive is ≤0.05
– False discovery rate (FDR)
  • FDR=0.1 means that among all genes selected (e.g. 100), we would expect 10 to be false positives
  • FDR as high as 0.5 may be acceptable to biologists
  • Several different approaches to estimate it (Benjamini & Hochberg, B&H, is the most popular)
• Data filtering in the processing step can also reduce the number of genes tested
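The two corrections named on this slide can be sketched in a few lines of plain Python. This is a minimal illustration with made-up p values; the function names are hypothetical, and the FDR procedure shown is the Benjamini-Hochberg step-up adjustment.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p values: p * n, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value to the smallest so that adjusted
    # values never increase as the raw p values decrease.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]
bonf = bonferroni(pvals)         # approximately [0.04, 0.16, 0.12, 0.02]
bh = benjamini_hochberg(pvals)   # approximately [0.02, 0.04, 0.04, 0.02]
```

With these four tests, Bonferroni keeps two markers below 0.05 while Benjamini-Hochberg keeps all four, which shows why FDR control is usually preferred in discovery work with many markers.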


Azuaje F. Bioinformatics and Biomarker Discovery, 2010

Basic Biomarker Discovery Pipeline


Data Processing

• Data pre-processing
– Data filtering and QC
  • Remove samples with failed experiments
  • Exclude markers with very low variance
  • Exclude markers with very low expression levels, e.g. RNA-seq
– Data normalization
  • To transform the data into a format that is compatible or comparable between different samples or assays
  • To level potential differences caused by experimental factors, such as labelling and hybridization
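The filtering step can be sketched as follows. This is a minimal Python illustration, assuming the data are held as a mapping from marker name to per-sample values; the cutoffs `min_mean` and `min_variance` are arbitrary placeholders that would need to be chosen for the platform at hand, and normalization is not shown.

```python
from statistics import mean, pvariance


def filter_markers(matrix, min_mean=1.0, min_variance=0.01):
    """Keep markers whose mean and variance across samples pass cutoffs.

    matrix : dict mapping marker name -> list of values, one per sample
    """
    return {
        name: values
        for name, values in matrix.items()
        if mean(values) >= min_mean and pvariance(values) >= min_variance
    }


expression = {
    "gene_A": [5.0, 5.0, 5.0, 5.0],  # no variance: uninformative
    "gene_B": [0.1, 0.2, 0.1, 0.3],  # very low expression
    "gene_C": [2.0, 8.0, 3.0, 9.0],  # passes both filters
}
kept = filter_markers(expression)
```

Filtering before testing also reduces the number of hypotheses, which softens the multiple testing penalty.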


Why Remove Genes with Low Variance?

[Figure: gene expression (y-axis, 0 to 4) in case vs control for two example genes; p=0.004 and p=0.008]


Data Reduction

• Focus on smaller sets of potentially novel and interesting data patterns (e.g. groups of samples or gene sets).

• Confirm initial hypothesis about the relevance of the features available and to guide future experimental and computational analysis

• Exploratory univariate analyses

– T-test

– Chi-square test

– Correlation

– Univariate regression


Data Matrix

• Data matrix
• Color-coded representations of absolute or relative expression levels

[Figure: color-coded data matrix; axis labels: Expression, Samples]


Data Visualization

[Figure: dendrogram]

• Statistical plotting: GraphPad
• Dendrogram and heatmap: R, GENE-E, Gitools


Exploratory Analysis

• Univariate analysis

• Single marker vs phenotype

• Multiple-hypotheses testing corrections

– DEG

– Fold change

– Statistical model: t-test, correlation, univariate regression

– P values and other cut-off

• Unsupervised classification (clustering) and visualization

• Filtering: to remove uninformative, highly noisy or redundant markers for subsequent analyses

• Supervised classification
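For the single marker vs phenotype step, fold change and a t statistic are the usual first numbers to compute. The sketch below shows both in plain Python with made-up values. It uses Welch's form of the t statistic as an assumption; turning it into a p value requires the t distribution, which is left to a statistics package.

```python
from math import log2, sqrt
from statistics import mean, variance


def log2_fold_change(case, control):
    """log2 of the ratio of group means (values must be positive)."""
    return log2(mean(case) / mean(control))


def welch_t(case, control):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = sqrt(
        variance(case) / len(case) + variance(control) / len(control)
    )
    return (mean(case) - mean(control)) / standard_error


case_values = [7.0, 8.0, 9.0]
control_values = [1.0, 2.0, 3.0]
lfc = log2_fold_change(case_values, control_values)  # 2.0, a 4-fold increase
t_stat = welch_t(case_values, control_values)
```

In practice both a fold-change cutoff and a corrected p value cutoff are applied, since either one alone tends to select noisy markers.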


Data Integration

• Further reduction
• Which markers should be chosen for constructing the predictive model?
• To estimate the potential relevance of the identified markers and relationships
• To discover other significant genes and relationships (e.g. gene-gene or gene-disease) not found in previous data-driven analysis steps
• Tools:
– Human gene annotation databases (e.g. GO)
– Metabolic pathway databases (e.g. KEGG)
– Gene-disease association extractors from public databases (e.g. Endeavour)
– Other functional catalogues
– IPA
• Resulting data- and knowledge-driven findings, patterns or predictions provide a selected catalogue of genes, pathways and (gene-gene and gene-disease) relationships relevant to the phenotype classes investigated


Don’t Forget Covariates!

• Don’t forget these:
– Demographic: age, gender, race (often a PCA component), smoking, drinking, lifestyle, etc.
– Physiological: BMI, weight, height, etc.
– Clinical: blood tests, urine tests, other analytes
• Integrate information
– Molecular data
– Knowledge-driven data
– Covariates
• Multivariate regression
– Model training
– Model validation
– Model assessment: ROC


Data Integration is Critical

• Provide more reliable information

• Increase the prediction value

• Insight into the mechanism

• Reliable hypothesis generating

• But can be biased as well

[Figure: DNA → (transcription) → RNA → (translation) → Protein → (catalysis) → Metabolites, corresponding to Genome, Transcriptome, Proteome, Metabolome/Lipidome, and leading to the clinical endpoint; labels: genetic effect, environmental effect, dysregulation]


Examples of Cardiovascular Biomarkers with Integrated Data

Vasan, 2006; Gerszten and Wang, 2008


Building Predictive Models

If …Then…

Build up a model based on selected markers

Discovery set

validation set

Pro-retrospective set

Prospective set

$$Y = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \hat{\beta}_2 X_2 + \dots + \hat{\beta}_i X_i$$
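Once the coefficients have been estimated on the discovery set, applying the model to a new patient is a weighted sum of marker values. The sketch below shows that step in plain Python, with a logistic transform added as an assumption for a binary phenotype. All coefficient and marker values are made up for illustration, and the function names are hypothetical.

```python
from math import exp


def linear_predictor(intercept, coefficients, markers):
    """Compute beta_0 + beta_1 * X_1 + ... + beta_i * X_i."""
    if len(coefficients) != len(markers):
        raise ValueError("one coefficient is needed per marker")
    return intercept + sum(b * x for b, x in zip(coefficients, markers))


def predicted_probability(intercept, coefficients, markers):
    """Logistic transform of the linear predictor (binary phenotype)."""
    score = linear_predictor(intercept, coefficients, markers)
    return 1 / (1 + exp(-score))


# Hypothetical fitted model with two markers
score = linear_predictor(1.0, [2.0, -1.0], [3.0, 4.0])   # 1 + 6 - 4 = 3.0
probability = predicted_probability(1.0, [2.0, -1.0], [3.0, 4.0])
# "If ... then ..." rule: call the patient positive above a chosen cutoff
is_positive = probability >= 0.5
```

The cutoff of 0.5 is only a placeholder; in practice it is tuned on the discovery set and then frozen before validation.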


Predictive Models

• Multivariable models

– Linear regression

• Continuous data

– Logistic regression

• Presence/absence of disease

– Cox regression

• Survival data

• Algorithmic models: machine learning

– Support vector machines (SVM)

– Artificial neural networks (ANN)


Validation Strategies

• Internal validation

– Cross-validation

– Random or non-random split of samples into training and test sets

• External validation

– Independent sample and dataset
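Internal validation by cross-validation can be sketched as follows. This minimal Python example only produces the index splits, assuming a random assignment of samples to k folds; the function name `k_fold_indices` and the fixed seed are hypothetical choices. Model fitting on each training set is left out.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random split, reproducible
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(n_samples=10, k=5))
for train, test in splits:
    # Fit the model on `train`, evaluate on `test` (not shown)
    pass
```

Every sample is used for testing exactly once. Feature selection must be repeated inside each training fold, otherwise the test folds leak into the model. External validation still needs an independent dataset.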


Assessment of Performance

• Basic parameters

– Sensitivity: the proportion of the true positive outcomes (e.g. truly diseased subjects) that are predicted to be positive

– Specificity: the proportion of the true negative outcomes (e.g. truly disease-free subjects) that are predicted to be negative
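Both parameters follow directly from counting the four cells of a confusion table. A minimal Python sketch with made-up labels, coding 1 as diseased or predicted positive and 0 otherwise (the coding and function name are assumptions):

```python
def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for binary 0/1 labels."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    return tp / (tp + fn), tn / (tn + fp)


truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(truth, predicted)
# 3 of 4 diseased subjects are detected; 4 of 6 disease-free are cleared
```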


Assessment of Performance

• Receiver Operating Characteristic (ROC) curve

• Area under the curve (AUC), also written “AUROC”
– AUC=0.5: no association (chance level)
– AUC=1: perfect association
– AUC<0.6: no medical value
– AUC>0.75: reasonable
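The AUC can be read as the probability that a randomly chosen positive subject scores higher than a randomly chosen negative one. The sketch below computes it that way in plain Python by comparing all positive-negative pairs, counting ties as one half. The scores are made up, and the pairwise approach is chosen for clarity rather than speed.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    scores : model output, higher meaning more likely positive
    labels : 1 for truly positive subjects, 0 for truly negative
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


perfect = auc([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0])  # 1.0
chance = auc([0.9, 0.2, 0.8, 0.3], [1, 1, 0, 0])   # 0.5
```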