statistical analysis methods


Post on 25-Feb-2016


Page 1: Statistical analysis methods

An International Centre for Mouse Genetics

STATISTICAL ANALYSIS METHODS

Hugh Morgan

Page 2: Statistical analysis methods

Introduction

• Role of statistics
• Current Methods
  • EuroPhenome
    • Numerical Parameters
    • Categorical Parameters
  • MGP
• Problems with these methods and alternatives
• Worked Example
• Tasks

Page 3: Statistical analysis methods

Role of statistics

• To determine the effect of the genomic alteration on the phenotype of the animal
• Distinguish effect from substantial multi-factorial noise
• Provide an estimate of the confidence in the veracity of the effect

Page 4: Statistical analysis methods

Current Methods

• EuroPhenome
  • Numerical Parameters - Wilcoxon rank-sum test
  • Categorical Parameters – Fisher's Exact or Chi-Squared
  • p-value threshold: 0.0001 (equivalent to a 4% chance of a false positive in 400 measured parameters)
• Sanger Mouse Portal / MGP
  • Numerical Parameters – Reference Range
  • Categorical Parameters – Fisher's Exact with absolute change threshold
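The 4% figure for the EuroPhenome threshold can be checked with a few lines of R. This is only a sketch: the "exact" value assumes the 400 tests are independent, which the slides do not state, and the variable names are illustrative.

```r
alpha    <- 1e-4   # per-parameter p-value threshold from the slide
n_params <- 400    # number of measured parameters from the slide

# Simple upper bound on the chance of at least one false positive
bonferroni_bound <- alpha * n_params

# Chance of at least one false positive if all tests were independent (assumption)
exact_independent <- 1 - (1 - alpha)^n_params

cat("Upper bound:", bonferroni_bound, "\n")
cat("If independent:", signif(exact_independent, 3), "\n")
```

Both numbers are close to 0.04, which is where the "4% chance" in the slide comes from.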

Page 5: Statistical analysis methods

Do them yourself

• All commands are at:
  • http://mrcmousenetwork.har.mrc.ac.uk/r-commands-mrc-mouse-network-training
• Get data:
  • Akt2, Fat mass, View Data, Get as CSV, Save Page
• Install R (if required, google R)
  • akt2Fat=read.csv("akt2Fat.csv")
  • summary(akt2Fat)
• Wilcoxon rank-sum test
  • wilcox.test(Value~Genotype, data = akt2Fat)
  • W = 1, p-value = 6.252e-06
• T Test
  • t.test(Value~Genotype, data = akt2Fat)
  • t = -9.5627, df = 23.909, p-value = 1.212e-09
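If the CSV is not to hand, the same two tests can be tried on made-up numbers. The sketch below uses invented fat-mass values (not the EuroPhenome data) with the same column names, Value and Genotype, and compares each p-value with the 0.0001 threshold. The data frame name akt2Demo is hypothetical.

```r
# Invented values for illustration only: 15 baseline and 14 mutant animals
akt2Demo <- data.frame(
  Genotype = factor(rep(c("baseline", "Akt2"), times = c(15, 14))),
  Value    = c(seq(8.0, 10.8, length.out = 15),   # baseline animals
               seq(4.0, 6.6,  length.out = 14))   # mutant animals
)

# Rank-based test: makes no normality assumption
wt <- wilcox.test(Value ~ Genotype, data = akt2Demo)

# t.test defaults to the Welch test (unequal variances),
# which is why its degrees of freedom are usually not a whole number
tt <- t.test(Value ~ Genotype, data = akt2Demo)

threshold <- 1e-4
cat("Wilcoxon p =", wt$p.value, " below threshold:", wt$p.value < threshold, "\n")
cat("Welch t  p =", tt$p.value, " below threshold:", tt$p.value < threshold, "\n")
```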

Page 15: Statistical analysis methods

Do them yourself

• Get data:
  • Abcd4, Touch escape
• R
  • abcd4Touch=matrix(c(122,9,2,8),2)
• Fisher's Exact Test
  • fisher.test(abcd4Touch)

Fisher's Exact Test for Count Data

data: abcd4Touch
p-value = 3.052e-07
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 8.491575 550.552750
sample estimates:
odds ratio
 50.40908
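To make the 2x2 table easier to read, the counts can be given row and column labels before testing. The labels below are assumptions, since the slides do not say which count belongs to which cell. The sketch also runs the Chi-Squared alternative and shows why Fisher's test is the safer choice for counts this small.

```r
# Same counts as abcd4Touch; the dimnames are assumed, not from the slides
touch <- matrix(c(122, 9, 2, 8), nrow = 2,
                dimnames = list(Response = c("normal", "abnormal"),
                                Group    = c("baseline", "Abcd4")))
print(touch)

ft <- fisher.test(touch)

# chisq.test warns here because one expected count is well below 5,
# so the chi-squared approximation is unreliable for this table
cs <- suppressWarnings(chisq.test(touch))

cat("Fisher p      =", ft$p.value, "\n")
cat("Chi-squared p =", cs$p.value, "\n")
cat("Smallest expected count =", round(min(cs$expected), 2), "\n")
```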

Page 18: Statistical analysis methods

Sanger Mouse Portal / MGP

• Numerical Parameters – Reference Range
  • Calculate the range of values that encompasses 95% of the baseline dataset
  • Call a line phenodeviant in a parameter if 60% or more of the animals fall outside of that range
• Categorical Parameters – Fisher's Exact with absolute change threshold
  • Fisher's Exact test gives p-value < 5% AND
  • Absolute change of proportion > 60%
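Both MGP rules are simple enough to sketch in R. Several details here are assumptions rather than the portal's actual implementation: the reference range is taken as the 2.5th to 97.5th percentile of the baseline, the "absolute change of proportion" is read as the difference in the proportion of abnormal animals between baseline and mutant, and all numbers except the Abcd4 counts from the Fisher example are invented.

```r
# Numerical rule (invented values) ------------------------------------------
baseline <- seq(80, 120, length.out = 401)
mutant   <- c(95, 105, 130, 135, 140, 70, 125, 100, 128, 133)

# Assumed definition: central 95% of the baseline values
ref_range <- quantile(baseline, c(0.025, 0.975))
outside   <- mean(mutant < ref_range[1] | mutant > ref_range[2])
numeric_call <- outside >= 0.60
cat("Fraction outside range:", outside, " phenodeviant:", numeric_call, "\n")

# Categorical rule ------------------------------------------------------------
# tab: 2x2 counts; assumed layout is rows = normal/abnormal,
# columns = baseline/mutant
categorical_call <- function(tab, p_max = 0.05, min_change = 0.60) {
  p <- fisher.test(tab)$p.value
  prop_abnormal <- tab[2, ] / colSums(tab)
  change <- abs(prop_abnormal[2] - prop_abnormal[1])
  unname(p < p_max && change > min_change)
}

cat_call <- categorical_call(matrix(c(122, 9, 2, 8), 2))
cat("Categorical phenodeviant:", cat_call, "\n")
```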

Page 21: Statistical analysis methods

Problems with these methods and alternatives

• Local structure / Lack of independence
• Numerical Parameters - Wilcoxon rank-sum test
• Categorical Parameters – Fisher's Exact or Chi-Squared
• MGP
  • Numerical Parameters – Reference Range
  • Categorical Parameters – Fisher's Exact with absolute change threshold

Page 22: Statistical analysis methods

Problems with these methods and alternatives

• Local structure / Lack of independence
  • Inter-day variance greater than intra-day variance
  • 2 measurements on the same day are likely to be more similar than 2 measurements on different days
  • Cause
    • ?
  • Solution
    • Model the structure
    • Linear Mixed Model

Page 23: Statistical analysis methods

Mixed Model

• Model data as sum of 2 normal distributions, plus a number of fixed effects
  • Normally distributed
    • Inter-animal difference
    • Inter-day difference
  • Fixed
    • Gender
    • Other parameters such as Weight
    • Genomic alteration (Genotype)
    • Gender / Genotype effect
• Calculate the p-value under the null hypothesis that the Genotype effect is zero
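One compact way to write the model described on this slide is given below, as a sketch. The symbols are notation introduced here, not taken from the slides: y_ij is the measurement for animal i on day (batch) j, G and S are indicator variables for Genotype and Gender, and Weight or other covariates would enter as further fixed terms.

```latex
y_{ij} = \beta_0 + \beta_G G_{ij} + \beta_S S_{ij} + \beta_{GS}\, G_{ij} S_{ij} + b_j + \varepsilon_{ij},
\qquad b_j \sim \mathcal{N}(0, \sigma_{\mathrm{day}}^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_{\mathrm{animal}}^2)
```

In this notation the p-value for Genotype is computed under the null hypothesis \beta_G = 0.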

Page 24: Statistical analysis methods

Do them yourself

• Get data:
  • Ptk7, Grip-Strength, Forelimb grip strength measurement mean, View Data, Get as CSV, Save File
• R
  • ptk7GS=read.csv("ptk7GS.csv")
  • summary(ptk7GS)

Centre   Strain       Genotype     Zygosity  Gender   Parameter
WTSI:29  129/SvEv:29  Akt2    :14     :15    Male:29  Fat mass:29
                      baseline:15  Hom:14

Page 25: Statistical analysis methods

Do them yourself

• Linear Model (no batch effect modeled)
  • ptk7GSLM=lm(Value~Genotype + Gender + Genotype*Gender, ptk7GS, na.action="na.omit")
  • summary(ptk7GSLM)

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)               68.777      2.475  27.794  < 2e-16 ***
GenotypePtk7             -14.134      5.891  -2.399  0.01777 *
GenderMale                11.454      4.011   2.855  0.00497 **
GenotypePtk7:GenderMale    1.987      8.966   0.222  0.82496
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 20.7 on 136 degrees of freedom
Multiple R-squared: 0.1222, Adjusted R-squared: 0.1028
F-statistic: 6.311 on 3 and 136 DF, p-value: 0.0004862
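The four coefficient rows follow directly from the model formula. In R, Genotype*Gender already expands to both main effects plus their interaction, so the formula used in the slide is equivalent to the shorter Value~Genotype*Gender. The sketch below checks this on a tiny hypothetical data frame (toy), used only to inspect the design matrix columns.

```r
# Hypothetical 2x2 layout, only used to look at the design matrix
toy <- expand.grid(
  Genotype = factor(c("baseline", "Ptk7"), levels = c("baseline", "Ptk7")),
  Gender   = factor(c("Female", "Male"))
)

full  <- colnames(model.matrix(~ Genotype + Gender + Genotype * Gender, data = toy))
short <- colnames(model.matrix(~ Genotype * Gender, data = toy))

print(full)                    # one column per coefficient in the summary table
same_terms <- identical(full, short)
cat("Same terms:", same_terms, "\n")
```

The interaction column (GenotypePtk7:GenderMale) is what lets the genotype effect differ between females and males.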

Page 26: Statistical analysis methods

Do them yourself

• Look at Fit
  • ptk7GSLMRes<-residuals(ptk7GSLM)
  • qqnorm(scale(ptk7GSLMRes))
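A QQ plot is read by eye: points close to a straight line suggest roughly normal residuals. The sketch below repeats the same steps on simulated data (not the Ptk7 data) and adds a reference line plus a rough numerical summary. The correlation summary is an illustrative addition, not something the slides use.

```r
set.seed(1)
# Simulated data standing in for a fitted model
x    <- rnorm(100)
demo <- data.frame(x = x, y = 2 * x + rnorm(100))
fit  <- lm(y ~ x, data = demo)

res <- as.numeric(scale(residuals(fit)))   # standardised residuals

# plot.it = FALSE returns the theoretical (x) and sample (y) quantiles
qq     <- qqnorm(res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)                  # near 1 when the points lie on a line
cat("QQ correlation:", round(qq_cor, 3), "\n")

if (interactive()) {                       # draw the plot only in an interactive session
  qqnorm(res)
  qqline(res)                              # reference line through the quartiles
}
```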

Page 27: Statistical analysis methods

Do them yourself

• Mixed Model
  • Excel
    • load ptk7GS.csv
    • =LEFT(H2,(SEARCH("_",H2)-2))
    • Save ptk7GSLitter.csv
  • R
    • library(nlme)
    • ptk7GSLitter=read.csv("ptk7GSLitter.csv")
    • ptk7GSMM=lme(Value~Genotype + Gender + Genotype*Gender, random=~1|Litter, ptk7GSLitter, na.action="na.omit")
    • summary(ptk7GSMM)
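The Excel step can also be done in R. The formula keeps everything up to two characters before the first underscore in column H, and the result is used as the Litter column. The sketch below applies the same rule to made-up identifiers. The contents of column H are not shown in the slides, so the example strings are hypothetical.

```r
# Hypothetical identifiers; the real values in column H are not shown
ids <- c("L123a_4", "L7b_12")

# Position of the first underscore, same role as SEARCH("_", H2)
underscore_pos <- as.integer(regexpr("_", ids, fixed = TRUE))

# Keep the first (position - 2) characters, same role as LEFT(H2, position - 2)
litter <- substr(ids, 1, underscore_pos - 2)
print(litter)
```

With real data, ids would be replaced by the relevant column of the data frame read from ptk7GS.csv, and the result assigned to a new Litter column.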

Page 28: Statistical analysis methods
Page 29: Statistical analysis methods

Do them yourself

• Mixed Model
  • R
    • library(nlme)
    • ptk7GSLitter=read.csv("ptk7GSLitter.csv")
    • ptk7GSMM=lme(Value~Genotype + Gender + Genotype*Gender, random=~1|Litter, ptk7GSLitter, na.action="na.omit")
    • summary(ptk7GSMM)

Linear mixed-effects model fit by REML

Fixed effects: Value ~ Genotype + Gender + Genotype * Gender
                            Value Std.Error DF   t-value p-value
(Intercept)              67.02067  3.377184 85 19.845137  0.0000
GenotypePtk7            -12.05973  7.461470 85 -1.616267  0.1097
GenderMale               12.59607  4.403984 85  2.860154  0.0053
GenotypePtk7:GenderMale   1.42342  8.819061 85  0.161403  0.8722
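Why the Genotype p-value can move once litter is modelled may be easier to see on simulated data. The sketch below invents a dataset with a litter effect and no true genotype effect, then fits both models. Every number and name in it (sim, the litter and residual standard deviations, the "Mutant" label) is an assumption for illustration, not a property of the Ptk7 data.

```r
library(nlme)   # provides lme; shipped with standard R installations

set.seed(42)
n_litters  <- 30
per_litter <- 5
litter <- factor(rep(seq_len(n_litters), each = per_litter))

# Genotype is constant within a litter, so litter-to-litter noise
# can be mistaken for a genotype effect when litter is ignored
genotype <- factor(rep(rep(c("baseline", "Mutant"), length.out = n_litters),
                       each = per_litter),
                   levels = c("baseline", "Mutant"))
gender <- factor(sample(c("Female", "Male"), n_litters * per_litter, replace = TRUE))

litter_effect <- rnorm(n_litters, sd = 10)[as.integer(litter)]
value <- 70 + 10 * (gender == "Male") + litter_effect +
         rnorm(n_litters * per_litter, sd = 10)
sim <- data.frame(Value = value, Genotype = genotype, Gender = gender, Litter = litter)

simLM <- lm(Value ~ Genotype * Gender, data = sim)
simMM <- lme(Value ~ Genotype * Gender, random = ~ 1 | Litter, data = sim)

se_lm <- summary(simLM)$coefficients["GenotypeMutant", "Std. Error"]
se_mm <- summary(simMM)$tTable["GenotypeMutant", "Std.Error"]
cat("Genotype std. error, linear model:", round(se_lm, 2), "\n")
cat("Genotype std. error, mixed model: ", round(se_mm, 2), "\n")
```

When animals in the same litter resemble each other, the mixed model would typically report a larger standard error for Genotype, in line with the change from p = 0.01777 to p = 0.1097 in the Ptk7 outputs.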

Page 30: Statistical analysis methods

Do them yourself

• Mixed Model
  • ptk7GSMMRes<-residuals(ptk7GSMM)
  • qqnorm(scale(ptk7GSMMRes))