Lecture 3 Case Studies

Upload: dgrapov

Posted on 14-Apr-2018

  • 7/29/2019 Lecture 3 Case Studies

    1/32

    Metabolomic Data Analysis

    Case Studies

    Dmitry Grapov, PhD

  • 2/32

    Case Studies

    1. Data Exploration and Analysis Planning: Lung Cancer

    2. Multifactorial Design: Mouse Cerebellum

    3. Time Course: OGTT Metabolomics

  • 3/32

    Analysis Planning

    DOD Lung Cancer Plasma (CARET)

    Summary

    Analysis of plasma primary metabolites to identify circulating markers related to lung cancer histology type.

    Methods

    Exploratory data analysis using principal components analysis (PCA)

    Analysis of covariance (ANCOVA)

    Orthogonal partial least squares discriminant analysis (OPLS-DA)

    Hierarchical cluster analysis (HCA) and multidimensional scaling (MDS)

  • 4/32

    Lung Cancer: Exploratory Analysis

    Purpose

    Overview of the data variance structure

    Methods

    Singular value decomposition (SVD) on autoscaled data

    PC1 and PC2 (14% of variance explained) display 2 clusters of points

    Cluster structure could not be explained by histology or any other metadata

    Cluster structure is best explained by instrument acquisition date (point colors):

    Black: 110629 to 110701

    Red: 110702 to 110705
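    The SVD-based PCA step above can be sketched in a few lines of numpy. This is a minimal illustration, not the lecture's own code: the function names `autoscale` and `pca_svd` are hypothetical, and the matrix is simulated (samples in rows, metabolites in columns). Coloring the resulting score plot by metadata such as acquisition date is how batch structure like the one described above becomes visible.

```python
import numpy as np


def autoscale(X):
    """Center each column (metabolite) to mean 0 and scale it to unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def pca_svd(X, n_components=2):
    """PCA of a samples x metabolites matrix via singular value decomposition."""
    Z = autoscale(X)
    # Z = U @ diag(S) @ Vt: sample scores are U * S, loadings are the rows of Vt
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    # fraction of the total variance captured by each component
    explained = (S**2 / np.sum(S**2))[:n_components]
    return scores, loadings, explained


# simulated data standing in for a plasma metabolite table
rng = np.random.default_rng(0)
X = rng.lognormal(size=(40, 300))
scores, loadings, explained = pca_svd(X, n_components=2)
print(f"PC1 + PC2 explain {explained.sum():.1%} of the variance")
```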

  • 5/32

    Lung Cancer: Analysis Planning

    Purpose

    Identify significant changes in metabolites while adjusting for the noted batch effect and the gender and smoking status covariates.

    Methods

    Shifted logarithm (natural) transformed data

    ANCOVA: batch + gender + smoking

    False discovery rate (FDR) correction and estimation

    PCA used to overview the covariate-adjusted data structure

    Cluster structure in the adjusted data suggests that there is another unexplained covariate

    OPLS-DA was used to evaluate covariate adjustments and hypothesis testing strategies

    Figure panels: modeling histology (control in green); modeling control/cancer and histology
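    A minimal sketch of the transformation and covariate adjustment, assuming a shift of 1 inside the natural logarithm (the slides do not state the shift) and 0/1 coding of batch, gender and smoking. It removes covariate effects by ordinary least squares and keeps the residuals, which is one common way to produce covariate-adjusted data; it is not necessarily the exact ANCOVA implementation used here, and the function names are hypothetical.

```python
import numpy as np


def shifted_log(X, shift=1.0):
    """Natural log of X + shift; the shift (assumed 1 here) keeps zeros finite."""
    return np.log(np.asarray(X, float) + shift)


def covariate_adjust(Y, covariates):
    """Remove linear covariate effects from every column of Y.

    Y: samples x metabolites (already transformed).
    covariates: samples x k numeric design, e.g. 0/1 codes for batch,
    gender and smoking status (hypothetical coding).
    """
    n = Y.shape[0]
    D = np.column_stack([np.ones(n), covariates])   # intercept + covariates
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)     # one OLS fit per metabolite
    # residuals plus column means keep the adjusted data on the log scale
    return Y - D @ beta + Y.mean(axis=0)


# simulated example: intensities shifted upward in batch 1
rng = np.random.default_rng(1)
n = 60
batch, gender, smoking = (rng.integers(0, 2, n) for _ in range(3))
raw = rng.lognormal(size=(n, 5)) * (1 + 2 * batch[:, None])
Y = shifted_log(raw)
adjusted = covariate_adjust(Y, np.column_stack([batch, gender, smoking]))
```

    After adjustment each metabolite is uncorrelated with the batch code. With more than two batches, each batch would need its own dummy column in the design.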

  • 6/32

    Lung Cancer: ANCOVA

    Summary

    The optimal testing strategy was identified as using covariate-adjusted data (~ batch + gender + smoking) to test for differences between control and cancer (adenocarcinoma, NSCLC and squamous).

    Figure: OPLS-DA overview of the optimized modeling strategy

    Identified 24 (8%) significantly changed species (3 after FDR correction)
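    The drop from 24 nominal hits to 3 after correction can be illustrated with the Benjamini-Hochberg procedure. The slides do not name the FDR method, so this choice is an assumption, and the p-values below are made up for illustration.

```python
import numpy as np


def bh_adjust(p):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array of p-values."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
q = bh_adjust(pvals)
print(np.round(q, 3))
```

    With these made-up values, five p-values fall below 0.05 but only two adjusted values do.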

  • 7/32

    Lung Cancer: Correlation Analysis

    Purpose

    Identify relationships between known and unknown metabolic features.

    Methods

    Hierarchical cluster analysis (Euclidean distances from Spearman's correlations, linked by Ward's method)

    Summary

    Top features could be grouped into 8 major correlated clusters

    Top changed unknown metabolites could be linked to named species (unknown ID: linked species):

    223566: tryptophan
    225405: 1/ beta-alanine
    274174: methionine, glucuronic acid
    228377: tryptophan
    362112: tryptophan
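    One reading of the clustering method above: describe each feature by its vector of Spearman correlations with all features, take Euclidean distances between those vectors, and link them with Ward's method. The sketch below implements that reading with SciPy on simulated data; the interpretation, the function name and the use of `fcluster` to cut the tree into a fixed number of clusters are assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def correlation_clusters(X, n_clusters):
    """Cluster the columns (features) of a samples x features matrix X.

    Requires at least 3 features so that spearmanr returns a matrix.
    """
    rho, _ = spearmanr(X)                       # features x features correlations
    dist = pdist(rho, metric="euclidean")       # distances between correlation profiles
    tree = linkage(dist, method="ward")         # Ward's minimum-variance linkage
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# simulated data: 12 features driven by 3 hidden sources (4 features each)
rng = np.random.default_rng(2)
sources = rng.normal(size=(50, 3))
X = np.repeat(sources, 4, axis=1) + 0.3 * rng.normal(size=(50, 12))
labels = correlation_clusters(X, n_clusters=3)
print(labels)
```

    Unknowns that fall in the same cluster as a named metabolite are candidates for being related to it, which is the kind of link reported above.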

  • 8/32

    Lung Cancer

    Conclusions

    Metabolic data contained batch effects, which could be partly explained by data acquisition date.

    Univariate analyses were limited by the effects of outliers.

    Multivariate modeling was used to identify 64 features (21%) which best explain differences in plasma metabolites from patients with or without lung cancer.

    Hydroxylamine, aspartic acid, and tryptophan displayed patterns of change consistent with differences in patient cancer histology.

    Correlation analysis was used to link many significant changes in unknowns to tryptophan.

  • 9/32

    Multifactorial Design

    Mouse Cerebellum Metabolomics

    Summary

    Analysis of mice carrying a mutation in ERCC6, the gene encoding the Cockayne syndrome B (CSB) protein. Cockayne syndrome is a rare autosomal recessive congenital disorder related to premature aging. Mutant animals display altered glycolytic and mitochondrial metabolism, which is improved by a high-fat diet.

    Study Design

    2 genotypes (WT, CSB; n = 20 per genotype)

    4 diets per genotype (standard diet SD, resveratrol Resv, caloric restriction CR, high-fat diet HFD; n = 5 per group)

    Analysis

    principal components analysis (PCA)

    two-way analysis of variance (ANOVA)

    orthogonal partial least squares discriminant analysis (OPLS-DA)

    network mapping

    http://en.wikipedia.org/wiki/Cockayne_syndrome

  • 10/32

    Mouse Cerebellum: PCA

    Method

    Conducted on autoscaled data using SVD.

    Findings

    Identified 6 possible outliers, all of which are in the WT genotype

  • 11/32

    Mouse Cerebellum: Outliers

    Methods

    Use PLS-DA to determine whether the outlier samples persist when the model maximizes the difference between WT and CSB animals.

    Findings

    Noted outliers in WT should be removed or analyzed separately

    Figure panels: PCA; PLS-DA
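    The slides do not describe the PLS-DA algorithm. A minimal sketch, assuming a two-class Y coded 0/1 (for example WT = 0, CSB = 1) and the NIPALS form of PLS1 on autoscaled data, shows how class-supervised scores are obtained; samples that remain extreme in these scores as well as in the PCA scores are the ones the outlier check above targets. The function name and data are hypothetical.

```python
import numpy as np


def plsda_scores(X, y, n_components=2):
    """Two-class PLS-DA scores via NIPALS PLS1.

    X: samples x features; y: class labels coded 0/1.
    Returns scores T (samples x n_components) and weights W (features x n_components).
    """
    X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale
    y = np.asarray(y, float)
    y = y - y.mean()
    T, W = [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight: direction of maximal covariance with y
        t = X @ w                       # sample scores
        p = X.T @ t / (t @ t)           # X loadings
        c = (y @ t) / (t @ t)           # y loading
        X = X - np.outer(t, p)          # deflate X
        y = y - c * t                   # deflate y
        T.append(t)
        W.append(w)
    return np.column_stack(T), np.column_stack(W)


# simulated two-group data: group 1 is shifted in the first 5 features
rng = np.random.default_rng(4)
groups = np.repeat([0, 1], 10)
X = rng.normal(size=(20, 30))
X[groups == 1, :5] += 2.0
T, W = plsda_scores(X, groups)
```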

  • 12/32

    Mouse Cerebellum: ANOVA

    Methods

    shifted log transformed data

    two-way ANOVA (genotype, diet)

    Findings

    Identification of significant changes in metabolites due to genotype, diet (treatment) and the interaction between genotype and diet

    Figure panels: genotype effect; treatment effect; interaction effect
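    For a balanced design like this one (2 genotypes x 4 diets x 5 mice), the two-way ANOVA with interaction can be computed from cell and marginal means. The sketch below does this for one metabolite with numpy and SciPy's F distribution; the function name and toy values are hypothetical, and in practice it would be applied to each metabolite column in turn.

```python
import numpy as np
from scipy.stats import f as f_dist


def two_way_anova(y, a, b):
    """Balanced two-way ANOVA with interaction for one metabolite.

    y: responses; a, b: factor labels (e.g. genotype, diet) of equal length.
    Assumes every a x b cell has the same number of replicates.
    Returns {effect: (F, p)} for factor a, factor b and the a:b interaction.
    """
    y = np.asarray(y, float)
    a, b = np.asarray(a), np.asarray(b)
    la, lb = np.unique(a), np.unique(b)          # sorted factor levels
    grand = y.mean()
    n = len(y) // (len(la) * len(lb))            # replicates per cell
    ma = np.array([y[a == i].mean() for i in la])                 # factor a means
    mb = np.array([y[b == j].mean() for j in lb])                 # factor b means
    mab = np.array([[y[(a == i) & (b == j)].mean() for j in lb] for i in la])
    ss = {
        "a": n * len(lb) * np.sum((ma - grand) ** 2),
        "b": n * len(la) * np.sum((mb - grand) ** 2),
        "a:b": n * np.sum((mab - ma[:, None] - mb[None, :] + grand) ** 2),
    }
    df = {"a": len(la) - 1, "b": len(lb) - 1, "a:b": (len(la) - 1) * (len(lb) - 1)}
    fitted = mab[np.searchsorted(la, a), np.searchsorted(lb, b)]  # cell means
    df_error = len(y) - len(la) * len(lb)
    ms_error = np.sum((y - fitted) ** 2) / df_error
    out = {}
    for key in ss:
        F = (ss[key] / df[key]) / ms_error
        out[key] = (F, f_dist.sf(F, df[key], df_error))
    return out


# toy values (not study data): 2 genotypes x 2 diets x 2 replicates
genotype = ["WT"] * 4 + ["CSB"] * 4
diet = ["SD", "SD", "HFD", "HFD"] * 2
y = [1, 3, 2, 4, 5, 7, 6, 8]
res = two_way_anova(y, genotype, diet)
for effect, (F, p) in res.items():
    print(f"{effect}: F = {F:.2f}, p = {p:.3g}")
```

    For these toy values the genotype effect gives F = 16 on (1, 4) degrees of freedom, the diet effect F = 1, and there is no interaction.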

  • 13/32

    Mouse Cerebellum: Multivariate Modeling

    Methods

    autoscaled data

    classification of sample genotype (OSC-PLS-DA/OPLS-DA)

    Figure panels: OSC-PLS-DA/OPLS-DA; validation
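    OSC and OPLS both remove variation in X that is uncorrelated with Y before the discriminant model is fit. A minimal sketch of the single-response OPLS filtering step, following the commonly described NIPALS-style algorithm, is shown below on simulated, mean-centered data. It is not the lecture's implementation, model validation (for example permutation testing or cross-validation) is not included, and the function name is hypothetical.

```python
import numpy as np


def opls_filter(X, y, n_orth=1):
    """Remove n_orth y-orthogonal components from X (single-response OPLS).

    X: mean-centered samples x features matrix.
    y: mean-centered 1-D response, e.g. genotype coded 0/1 then centered.
    Returns the filtered X and the orthogonal scores (samples x n_orth).
    """
    X = X.copy()
    w = X.T @ y / (y @ y)
    w /= np.linalg.norm(w)              # predictive weight vector
    T_orth = []
    for _ in range(n_orth):
        t = X @ w                       # predictive scores
        p = X.T @ t / (t @ t)           # predictive loadings
        w_o = p - (w @ p) * w           # part of p orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = X @ w_o                   # orthogonal scores, uncorrelated with y
        p_o = X.T @ t_o / (t_o @ t_o)
        X -= np.outer(t_o, p_o)         # remove the y-orthogonal variation
        T_orth.append(t_o)
    return X, np.column_stack(T_orth)


# simulated data: a small genotype effect plus a large y-unrelated trend
rng = np.random.default_rng(3)
y = np.repeat([0.0, 1.0], 10)                      # WT = 0, CSB = 1
X = rng.normal(size=(20, 8)) + np.outer(y, np.r_[1.0, np.zeros(7)])
X += np.outer(3 * rng.normal(size=20), rng.normal(size=8))
Xc, yc = X - X.mean(axis=0), y - y.mean()
Xf, T_orth = opls_filter(Xc, yc, n_orth=1)
```

    The filtered matrix keeps its covariance with y unchanged, so in OPLS-DA the discriminant component is then fitted to Xf.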

  • 14/32

    Mouse Cerebellum: Multivariate Modeling

    Methods

    autoscaled data

    classification of sample genotype and diet (OPLS-DA)

    evaluation of Y construction (separate and combined)

    Figure panels: multiple Y; single Y
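    One way to read "multiple Y" versus "single Y" is as two ways of dummy-coding the class matrix for a genotype x diet design: separate indicator blocks for each factor, or one indicator column per genotype:diet combination. That interpretation is an assumption; the sketch below only builds the two Y matrices with a hypothetical helper `dummy` and does not fit a model.

```python
import numpy as np


def dummy(labels):
    """One-hot (dummy) matrix with one column per level; levels are sorted."""
    labels = np.asarray(labels)
    levels = np.unique(labels)
    return (labels[:, None] == levels[None, :]).astype(float), levels


# one toy sample per genotype x diet cell (the study had 5 per cell)
genotype = np.repeat(["WT", "CSB"], 4)
diet = np.tile(["SD", "Resv", "CR", "HFD"], 2)

# "multiple Y": separate indicator blocks for genotype and diet
Y_geno, _ = dummy(genotype)
Y_diet, _ = dummy(diet)
Y_multiple = np.hstack([Y_geno, Y_diet])      # 8 samples x (2 + 4) columns

# "single Y": one indicator column per genotype:diet class
combined = [f"{g}:{d}" for g, d in zip(genotype, diet)]
Y_single, classes = dummy(combined)           # 8 samples x 8 columns
print(classes)
```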

  • 15/32

    Mouse Cerebellum: Multivariate Modeling

    Methods

    autoscaled data

    classification of diet (treatment) effects independently in each genotype

    Figure panels: WT; CSB

  • 16/32

    Mouse Cerebellum: Network Analysis

    Methods

    generate a biochemical and chemical similarity network

    map statistical and OPLS-DA model results to the network

    Analyze

    the genotype network

    treatment networks in WT and CSB separately
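    A minimal sketch of such a network as an edge list: biochemical edges come from metabolite pairs assumed to be connected by known reactions, and chemical-similarity edges link pairs whose structural fingerprints have a Tanimoto similarity at or above a cutoff. The 0.7 cutoff, the fingerprint bit sets and the names `M1` to `M3` are hypothetical placeholders; real fingerprints would come from a cheminformatics toolkit and reaction pairs from a pathway database.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similarity_network(fingerprints, reaction_pairs, cutoff=0.7):
    """Edge list combining biochemical and chemical-similarity links.

    fingerprints: {metabolite: set of on-bits}
    reaction_pairs: iterable of (metabolite, metabolite) linked by a reaction
    Returns {frozenset({u, v}): "biochemical" | "chemical"}.
    """
    edges = {}
    for u, v in reaction_pairs:
        edges[frozenset((u, v))] = "biochemical"
    for u, v in combinations(sorted(fingerprints), 2):
        key = frozenset((u, v))
        # add a chemical edge only where no biochemical edge exists
        if key not in edges and tanimoto(fingerprints[u], fingerprints[v]) >= cutoff:
            edges[key] = "chemical"
    return edges


fingerprints = {
    "M1": {1, 2, 3, 4, 5, 6, 7},
    "M2": {1, 2, 3, 4, 5, 6, 8},
    "M3": {1, 9, 10, 11},
}
edges = similarity_network(fingerprints, reaction_pairs=[("M2", "M3")])
for pair, kind in edges.items():
    print(sorted(pair), kind)
```

    Statistical or OPLS-DA results would then be attached to the nodes, for example as values that drive node color or size.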

  • 17/32

    Mouse Cerebellum: Genotype Network

  • 18/32

    Mouse Cerebellum: WT Treatment Network

  • 19/32

    Mouse Cerebellum: CSB Treatment Network

  • 20/32

    Mouse Cerebellum

    Conclusions

    Major differences between CSB and WT: elevation of 2-hydroxyglutaric acid in CSB

    2-hydroxyglutaric aciduria can be either autosomal recessive or autosomal dominant

    Perturbations in methionine and (potentially) single-carbon metabolism. Increases in the related species methionine, homoserine and serine, and a decrease in adenosine-5'-phosphate, may point to decreased S-adenosylmethionine (SAM-e) synthesis. Reduced SAM-e could have detrimental effects on single-carbon metabolism and methylation reactions, which, through a systemic reduction in choline, would impact phosphatidylcholine synthesis.

    Independent of genotype, treatment effects can be ranked on a continuum of metabolic change: CR > HFD > Resv > SD.

    Treatment-related changes in citrulline depended on genotype (strong genotype/treatment interaction).

    Similar treatment-related changes in both genotypes (e.g. 1,5-anhydroglucitol) may reflect diet composition rather than biology.

    http://en.wikipedia.org/wiki/2-Hydroxyglutaric_aciduria

  • 21/32

    Time Course

    Oral Glucose Tolerance Test Metabolomics

    Summary

    Analysis of changes in plasma primary metabolites during an oral glucose tolerance test (OGTT) before and after a 14-week diet and exercise intervention.

    Study Design

    Overweight women (12-15; obese, sedentary; glucose 100-128 mg/dL)

    Pre- and post-intervention

    Clinical panel: insulin, glucose, lipids

    Primary metabolites at 0, 30, 60, 90, 120 minutes

    Analysis

    principal components analysis (PCA)

    two-way analysis of variance (ANOVA)

    orthogonal partial least squares discriminant analysis (OPLS-DA)

    network mapping

  • 22/32

    OGTT: Data Properties

    Figure panels: excursion; baseline and area under the curve (AUC)

  • 23/32

    Time Course: Options

    Baseline-adjusted vs AUC

    Figure: raw (top) vs baseline-adjusted (bottom)
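    The two summaries compared above can be computed per subject as follows. This sketch assumes the study's sampling times (0, 30, 60, 90 and 120 minutes), subtracts each subject's t = 0 value for baseline adjustment, and integrates with the trapezoidal rule; the concentrations are toy numbers and the function names are hypothetical. This is the net AUC, in which values below baseline count as negative area.

```python
import numpy as np

TIMES = np.array([0, 30, 60, 90, 120])  # OGTT sampling times in minutes


def baseline_adjust(C):
    """Subtract each subject's t = 0 value (rows = subjects, columns = TIMES)."""
    C = np.asarray(C, float)
    return C - C[:, [0]]


def auc(C, times=TIMES):
    """Trapezoidal area under each subject's curve."""
    C = np.asarray(C, float)
    return np.sum((C[:, 1:] + C[:, :-1]) / 2 * np.diff(times), axis=1)


C = np.array([[90, 150, 140, 120, 100]])   # one toy subject
print(baseline_adjust(C))
print(auc(C), auc(baseline_adjust(C)))
```

    The raw AUC mixes the fasting level with the response, while the baseline-adjusted AUC isolates the excursion: here the adjusted value equals the raw AUC minus the baseline times the 120-minute window.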

  • 24/32

    OGTT: Data Analysis

    Identification of OGTT effects

    significant metabolomic excursions (one-sample t-test on AUC): pre-intervention, post-intervention, or both

    intervention-adjusted PLS model

    OGTT biochemical/chemical similarity network

    Identification of treatment effects

    univariate statistics: two-way ANOVA (time and intervention) and mixed effects modeling (intervention as the main effect and individual subjects as random effects)

    PLS-DA modeling and feature selection of changes in baseline (t = 0), AUC, and combined baseline and AUC

    analysis of correlations
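    The excursion test listed above can be sketched as a one-sample t-test of each subject's baseline-adjusted AUC against zero, using SciPy's `ttest_1samp`. The sampling times follow the study design; the simulated curves and the function name are assumptions, and the AUC is the net trapezoidal area.

```python
import numpy as np
from scipy.stats import ttest_1samp

TIMES = np.array([0, 30, 60, 90, 120])  # minutes


def excursion_test(C, times=TIMES):
    """Test whether the mean baseline-adjusted AUC of one metabolite differs from 0.

    C: subjects x time points matrix of concentrations.
    """
    C = np.asarray(C, float)
    adjusted = C - C[:, [0]]                                        # remove baseline
    area = np.sum((adjusted[:, 1:] + adjusted[:, :-1]) / 2 * np.diff(times), axis=1)
    return ttest_1samp(area, popmean=0.0)


# simulated metabolite that rises and returns toward baseline in 12 subjects
rng = np.random.default_rng(5)
baseline = rng.normal(100, 10, size=(12, 1))
curves = baseline + np.array([0, 20, 15, 8, 2]) + rng.normal(0, 1, size=(12, 5))
result = excursion_test(curves)
print(result.statistic, result.pvalue)
```

    Running the test separately on pre- and post-intervention OGTTs gives the "pre, post or both" classification.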

  • 25/32

    OGTT: effects on primary metabolism

    Figure panels: PCA; PLS-DA (intervention-adjusted data, modeling time)

  • 26/32

    OGTT: effects network

  • 27/32

    OGTT: Treatment Effects

    PLS-DA

  • 28/32

    OGTT: Treatment Effects

    Learning from the positions of sample scores

  • 29/32

    OGTT: Treatment Effects

    Feature Selection on Loadings

    Figure: Variable Loadings
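    The slides do not give the selection rule. A minimal sketch, assuming features are ranked by the absolute value of their loading on one model component and kept either above a cutoff or among the top k, is shown below; the function name, metabolite names and loading values are hypothetical.

```python
import numpy as np


def select_by_loading(loadings, names, cutoff=None, top_k=None):
    """Return feature names ordered by |loading|, optionally filtered.

    cutoff: keep features with |loading| >= cutoff.
    top_k: keep at most the k strongest features.
    """
    loadings = np.asarray(loadings, float)
    order = np.argsort(-np.abs(loadings))          # strongest first
    if cutoff is not None:
        order = order[np.abs(loadings[order]) >= cutoff]
    if top_k is not None:
        order = order[:top_k]
    return [names[i] for i in order]


names = ["m1", "m2", "m3", "m4"]
loadings = [0.10, -0.60, 0.45, -0.05]
print(select_by_loading(loadings, names, cutoff=0.3))
print(select_by_loading(loadings, names, top_k=1))
```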

  • 30/32

    OGTT: Linking biology with our experiment

  • 31/32

    OGTT: Analysis of Correlations

  • 32/32

    Conclusion

    Each data analysis is unique.

    Which method should be used is defined by how the data look and by the goal of the analysis.

    Different analysis techniques are used to get independent perspectives on the data.

    Combining similar evidence from different techniques defines a robust explanation of the experiment.