statistical analysis methods

Post on 25-Feb-2016

An International Centre for Mouse Genetics

STATISTICAL ANALYSIS METHODS

Hugh Morgan


Introduction

• Role of statistics
• Current Methods
  • EuroPhenome
    • Numerical Parameters
    • Categorical Parameters
  • MGP
• Problems with these methods and alternatives
• Worked Example
• Tasks


Role of statistics

• To determine the effect of the genomic alteration on the phenotype of the animal
• Distinguish effect from substantial multi-factorial noise
• Provide an estimate of the confidence in the veracity of the effect


Current Methods

• EuroPhenome
  • Numerical Parameters – Wilcoxon rank-sum test
  • Categorical Parameters – Fisher's Exact or Chi-Squared
  • p-value threshold: 0.0001 (equivalent to roughly a 4% chance of at least one false positive across 400 measured parameters)
• Sanger Mouse Portal / MGP
  • Numerical Parameters – Reference Range
  • Categorical Parameters – Fisher's Exact with absolute change threshold


Do them yourself

• All commands are at:
  • http://mrcmousenetwork.har.mrc.ac.uk/r-commands-mrc-mouse-network-training
• Get data:
  • Akt2, Fat mass, View Data, Get as CSV, Save Page
• Install R (if required, google R)
• R
  • akt2Fat=read.csv("akt2Fat.csv")
  • summary(akt2Fat)
• Wilcoxon rank-sum test
  • wilcox.test(Value~Genotype, data = akt2Fat)
  • W = 1, p-value = 6.252e-06
• T Test
  • t.test(Value~Genotype, data = akt2Fat)
  • t = -9.5627, df = 23.909, p-value = 1.212e-09
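For readers without the CSV to hand, the two tests can be tried on stand-in data. The sketch below assumes a data frame with the same `Value` and `Genotype` columns as the slide's commands; the numbers themselves are invented and will not reproduce the W, t or p-values quoted above.

```r
# Illustrative stand-in for akt2Fat.csv: the values are made up, only the
# column names (Value, Genotype) follow the commands on the slide.
akt2Fat <- data.frame(
  Value = c(seq(20, 46, by = 2) / 10,   # 14 hypothetical Akt2 animals
            seq(50, 78, by = 2) / 10),  # 15 hypothetical baseline animals
  Genotype = factor(rep(c("Akt2", "baseline"), times = c(14, 15)),
                    levels = c("Akt2", "baseline"))
)

# Rank-based test: makes no normality assumption
w <- wilcox.test(Value ~ Genotype, data = akt2Fat)

# Welch two-sample t-test (R's default, which allows unequal variances)
tt <- t.test(Value ~ Genotype, data = akt2Fat)

# W counts the (Akt2, baseline) pairs in which the Akt2 value is larger
print(w$statistic)
print(c(wilcoxon = w$p.value, t = tt$p.value))
```

The non-integer degrees of freedom in the slide's t-test output (df = 23.909) are consistent with this Welch variant rather than the pooled-variance test.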



Do them yourself

• Get data:
  • Abcd4, Touch escape
• R
  • abcd4Touch=matrix(c(122,9,2,8),2)
• Fisher's Exact Test
  • fisher.test(abcd4Touch)

        Fisher's Exact Test for Count Data

    data:  abcd4Touch
    p-value = 3.052e-07
    alternative hypothesis: true odds ratio is not equal to 1
    95 percent confidence interval:
       8.491575 550.552750
    sample estimates:
    odds ratio
      50.40908
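The sketch below reruns the test on the counts from the slide and compares the reported odds ratio with the plain cross-product ratio. The slide does not say which margin of the table is genotype and which is the touch-escape outcome, so the code leaves the rows and columns unlabelled.

```r
# Counts copied from the slide; matrix() fills column by column, giving
#   122  2
#     9  8
abcd4Touch <- matrix(c(122, 9, 2, 8), 2)

ft <- fisher.test(abcd4Touch)

# Plain cross-product odds ratio, for comparison with the estimate that
# fisher.test() reports (a conditional maximum likelihood estimate)
sample_or <- (abcd4Touch[1, 1] * abcd4Touch[2, 2]) /
             (abcd4Touch[1, 2] * abcd4Touch[2, 1])

print(ft$p.value)
print(c(conditional_mle = unname(ft$estimate), cross_product = sample_or))
```

The two odds ratios differ slightly because `fisher.test()` conditions on the table margins; the wide confidence interval reflects the small counts in the second column.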


Sanger Mouse Portal / MGP

• Numerical Parameters – Reference Range
  • Calculate the range of values that encompasses 95% of the baseline dataset
  • Call a line phenodeviant in a parameter if 60% or more of the animals fall outside of that range
• Categorical Parameters – Fisher's Exact with absolute change threshold
  • Fisher's Exact test gives p-value < 5% AND
  • Absolute change of proportion > 60%
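The two MGP rules above can be sketched in R as follows. This is an illustration under stated assumptions, not the portal's implementation: the data are invented, the 95% range is taken as the central 2.5% to 97.5% quantiles of the baseline, and `categorical_call` is a hypothetical helper that expects a 2x2 table with rows baseline/mutant and columns normal/abnormal.

```r
# Hypothetical baseline measurements and measurements from one mutant line
baseline <- 1:100
mutant   <- c(0, 1, 2, 3, 50, 60, 98, 99, 100, 101)

# Reference range: central 95% of the baseline (central quantiles assumed)
ref <- quantile(baseline, c(0.025, 0.975))

# Fraction of mutant animals outside the range, and the 60% rule
outside <- mean(mutant < ref[1] | mutant > ref[2])
phenodeviant_num <- outside >= 0.60

# Categorical rule: Fisher p-value < 5% AND absolute change in proportion > 60%
# tab: 2x2 counts, rows = (baseline, mutant), columns = (normal, abnormal);
# this layout is an assumption made for the sketch.
categorical_call <- function(tab, p_max = 0.05, min_change = 0.60) {
  p <- fisher.test(tab)$p.value
  prop_abnormal <- tab[, 2] / rowSums(tab)
  change <- abs(prop_abnormal[2] - prop_abnormal[1])
  p < p_max && change > min_change
}

# Hypothetical counts: 5 of 100 baseline and 18 of 20 mutant animals abnormal
tab <- matrix(c(95, 2, 5, 18), 2)
phenodeviant_cat <- categorical_call(tab)

print(c(outside = outside, numerical = phenodeviant_num,
        categorical = phenodeviant_cat))
```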


Problems with these methods and alternatives

• Local structure / Lack of independence
• EuroPhenome
  • Numerical Parameters – Wilcoxon rank-sum test
  • Categorical Parameters – Fisher's Exact or Chi-Squared
• MGP
  • Numerical Parameters – Reference Range
  • Categorical Parameters – Fisher's Exact with absolute change threshold


Problems with these methods and alternatives

• Local structure / Lack of independence
  • Inter-day variance is greater than intra-day variance
  • 2 measurements on the same day are likely to be more similar than 2 measurements on different days
• Cause
  • ?
• Solution
  • Model the structure
  • Linear Mixed Model
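A small simulation shows what this kind of structure looks like in data. Everything in the sketch is invented for illustration: the number of days, the between-day and within-day standard deviations, and the seed.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

n_days  <- 20   # hypothetical number of measurement days
per_day <- 10   # hypothetical number of animals measured per day
day <- factor(rep(seq_len(n_days), each = per_day))

# Each day gets its own shift (assumed SD 5); animals then vary around
# that day's level (assumed SD 2).
day_shift <- rnorm(n_days, mean = 0, sd = 5)
value <- 70 + day_shift[as.integer(day)] + rnorm(n_days * per_day, sd = 2)

# Average variance within a day versus variance of the daily means
within_var  <- mean(tapply(value, day, var))
between_var <- var(tapply(value, day, mean))

print(c(within = within_var, between = between_var))
```

When the between-day figure dominates, measurements taken on the same day are not independent draws, which is the assumption that the rank-sum and Fisher tests rely on.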


Mixed Model

• Model data as the sum of 2 normally distributed terms, plus a number of fixed effects
• Normally distributed
  • Inter-animal difference
  • Inter-day difference
• Fixed
  • Gender
  • Other parameters such as Weight
  • Genomic alteration (Genotype)
  • Gender / Genotype effect
• Calculate the p-value under the hypothesis that the Genotype effect is zero
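One way to write the model described above is given below. The notation is an assumption introduced here, not taken from the slides: y_ij is the measurement for animal i on day j, d_j is the inter-day term, a_ij is the inter-animal term, and the beta coefficients are the fixed effects.

```latex
\begin{aligned}
y_{ij} &= \beta_0
        + \beta_G\,\mathrm{Genotype}_{ij}
        + \beta_S\,\mathrm{Gender}_{ij}
        + \beta_{GS}\,\mathrm{Genotype}_{ij}\,\mathrm{Gender}_{ij}
        + \beta_W\,\mathrm{Weight}_{ij}
        + d_j + a_{ij}, \\
d_j &\sim \mathcal{N}(0, \sigma_d^2), \qquad
a_{ij} \sim \mathcal{N}(0, \sigma_a^2), \\
H_0 &: \beta_G = 0 .
\end{aligned}
```

The reported p-value is the probability of an estimate at least as extreme as the observed one when H_0 holds.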


Do them yourself

• Get data:
  • Ptk7, Grip-Strength, Forelimb grip strength measurement mean, View Data, Get as CSV, Save File
• R
  • ptk7GS=read.csv("ptk7GS.csv")
  • summary(ptk7GS)

Example summary() output (the counts shown on the slide are those of the Akt2 fat mass data):

    Centre    Strain        Genotype      Zygosity  Gender   Parameter
    WTSI:29   129/SvEv:29   Akt2    :14      :15    Male:29  Fat mass:29
                            baseline:15   Hom:14


Do them yourself

• Linear Model (no batch effect modeled)
  • ptk7GSLM=lm(Value~Genotype + Gender + Genotype*Gender, ptk7GS, na.action="na.omit")
  • summary(ptk7GSLM)

    Coefficients:
                            Estimate Std. Error t value Pr(>|t|)
    (Intercept)               68.777      2.475  27.794  < 2e-16 ***
    GenotypePtk7             -14.134      5.891  -2.399  0.01777 *
    GenderMale                11.454      4.011   2.855  0.00497 **
    GenotypePtk7:GenderMale    1.987      8.966   0.222  0.82496
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 20.7 on 136 degrees of freedom
    Multiple R-squared:  0.1222,  Adjusted R-squared:  0.1028
    F-statistic: 6.311 on 3 and 136 DF,  p-value: 0.0004862
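To see how the coefficient rows map onto group means, the same kind of model can be fitted to a small invented dataset. The data below are hypothetical and balanced on purpose, so each coefficient can be read off directly; the factor levels are set explicitly so that baseline and Female are the reference groups, as in the output above.

```r
# Hypothetical balanced design: 5 animals in each Genotype x Gender cell
dat <- expand.grid(Genotype = c("baseline", "Ptk7"),
                   Gender   = c("Female", "Male"),
                   rep      = 1:5,
                   stringsAsFactors = FALSE)
dat$Genotype <- factor(dat$Genotype, levels = c("baseline", "Ptk7"))
dat$Gender   <- factor(dat$Gender,   levels = c("Female", "Male"))

# Invented cell means (70, 56, 81, 67) plus a spread that sums to zero
# within every cell, so the fitted cell means are exact
spread <- c(-2, -1, 0, 1, 2)
dat$Value <- 70 - 14 * (dat$Genotype == "Ptk7") +
             11 * (dat$Gender == "Male") + spread[dat$rep]

# Genotype * Gender already expands to both main effects plus their
# interaction, so this is the same model as on the slide
fit <- lm(Value ~ Genotype * Gender, data = dat)
print(coef(fit))
```

The intercept is the mean of baseline females, `GenotypePtk7` is the genotype shift in females, `GenderMale` is the male shift at baseline, and the interaction term is how much the genotype shift differs in males.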


Do them yourself

• Look at Fit
  • ptk7GSLMRes<-residuals(ptk7GSLM)
  • qqnorm(scale(ptk7GSLMRes))
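A reference line makes the Q-Q plot easier to judge. The sketch below uses simulated stand-in residuals (not the Ptk7 data) and adds `qqline()`; points that bend away from the line in the tails suggest the residuals are not well described by a normal distribution.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Stand-in residuals; in practice use residuals(ptk7GSLM)
res <- rnorm(140)

# scale() centres and standardises; as.numeric() drops the matrix shape
z <- as.numeric(scale(res))

qq <- qqnorm(z, main = "Normal Q-Q plot of scaled residuals")
qqline(z)  # line through the first and third quartiles
```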


Do them yourself

• Mixed Model
• Excel
  • load ptk7GS.csv
  • =LEFT(H2,(SEARCH("_",H2)-2)) in a new column named Litter
  • Save ptk7GSLitter.csv
• R
  • library(nlme)
  • ptk7GSLitter=read.csv("ptk7GSLitter.csv")
  • ptk7GSMM=lme(Value~Genotype + Gender + Genotype*Gender, random=~1|Litter, ptk7GSLitter, na.action="na.omit")
  • summary(ptk7GSMM)


    Linear mixed-effects model fit by REML

    Fixed effects: Value ~ Genotype + Gender + Genotype * Gender
                                Value Std.Error DF   t-value p-value
    (Intercept)              67.02067  3.377184 85 19.845137  0.0000
    GenotypePtk7            -12.05973  7.461470 85 -1.616267  0.1097
    GenderMale               12.59607  4.403984 85  2.860154  0.0053
    GenotypePtk7:GenderMale   1.42342  8.819061 85  0.161403  0.8722
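In the outputs quoted in these slides, the Genotype p-value moves from 0.01777 in the plain linear model to 0.1097 once litter is modelled as a random effect. The sketch below walks through the same steps on invented data: it derives a litter label in R instead of Excel and fits both models. The column name `AnimalName`, the naming pattern, the effect sizes and the seed are all assumptions made for illustration; `lme()` comes from the nlme package, which ships with standard R installations.

```r
library(nlme)  # provides lme()

set.seed(7)  # arbitrary seed, for reproducibility only

# Hypothetical data: 12 litters of 6 animals; litters 9-12 carry Ptk7
n_litters <- 12
per_litter <- 6
litter_no <- rep(seq_len(n_litters), each = per_litter)
dat <- data.frame(
  # Hypothetical names of the form <litter><letter>_<number>, e.g. L01a_1
  AnimalName = sprintf("L%02d%s_%d", litter_no,
                       rep(letters[1:per_litter], times = n_litters), 1L),
  Genotype = factor(ifelse(litter_no > 8, "Ptk7", "baseline"),
                    levels = c("baseline", "Ptk7")),
  Gender = factor(rep(c("Female", "Male"), length.out = n_litters * per_litter),
                  levels = c("Female", "Male"))
)

# R counterpart of =LEFT(H2,(SEARCH("_",H2)-2)): keep the characters up to
# two positions before the first underscore
dat$Litter <- factor(substr(dat$AnimalName, 1,
                            regexpr("_", dat$AnimalName, fixed = TRUE) - 2))

# Littermates share a random shift (assumed SD 8) on top of animal noise
litter_shift <- rnorm(n_litters, sd = 8)
dat$Value <- 68 - 14 * (dat$Genotype == "Ptk7") + 11 * (dat$Gender == "Male") +
  litter_shift[litter_no] + rnorm(nrow(dat), sd = 10)

fit_lm <- lm(Value ~ Genotype * Gender, data = dat)
fit_mm <- lme(Value ~ Genotype * Gender, random = ~ 1 | Litter, data = dat)

print(summary(fit_lm)$coefficients)
print(summary(fit_mm)$tTable)
```

Because genotype is shared by whole litters here, the mixed model effectively judges the genotype effect against litter-to-litter variation, which is why its standard error tends to be larger than the one from `lm()`.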


Do them yourself

• Mixed Model
  • ptk7GSMMRes<-residuals(ptk7GSMM)
  • qqnorm(scale(ptk7GSMMRes))
