
Page 1: Statistics in Science: Best Practices

Gladstone Bioinformatics Core
Kirsten E. Eilertson

Page 2: Our Goal / Our Challenges

Our Goal
• "thoughtful", "insightful", "rigorous" statistical analyses
• Meaningful and solid inference that can be the basis of future work

Our Challenges
• Every application is different
• Precedents can be a problem
• P-value-centric publication system
• Reproducibility

Page 3: Reporting and Reproducibility

Reproducibility crisis! Dr. Ioannidis (2005), PLoS Medicine

Page 4: Discussion Today

Our Goal
• "thoughtful", "insightful", "rigorous" statistical analyses
• Meaningful and solid inference that can be the basis of future work

Our Challenges
• Every application is different
• Precedents can be a problem
• P-value-centric publication system
• Reproducibility

Discussion Today:
• Reporting results
• Power and experimental design
• Outliers

Page 5: Guidelines for Reporting

Resources:
• Annals of Internal Medicine: http://www.people.vcu.edu/~albest/Guidance/guidelines_for_statistical_reporting.htm
• American Physiological Society: http://physiolgenomics.physiology.org/content/18/3/249.full

'Describe the statistical methods with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results' (Bailar & Mosteller, 1988, p. 266)

Page 6: Guidelines for Reporting

'The design of an experiment, the analysis of its data, and the communication of the results are intertwined. In fact, design drives analysis and communication.'

• Always report the test statistic, its degrees of freedom, its observed value, and the P-value, i.e. the probability of obtaining a result at least as extreme by chance under the null hypothesis.
• Report how assumptions were checked (e.g. histograms of residuals, tests of normality).
• Provide a clear description of the design of your study or experiment; state the null and alternative hypotheses.
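
The slide names no software, so purely as a minimal sketch: in Python with SciPy (an assumed choice), a two-sample comparison could be reported with its statistic, degrees of freedom, P-value, and a normality check on each group. The data below are simulated for illustration only.

    # Sketch: reporting a Welch two-sample t-test with statistic, df, P-value,
    # plus a normality check. Data are simulated for illustration only.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    control = rng.normal(loc=10.0, scale=2.0, size=12)   # hypothetical group 1
    treated = rng.normal(loc=12.0, scale=2.0, size=12)   # hypothetical group 2

    # Welch's t-test (does not assume equal variances)
    t_stat, p_value = stats.ttest_ind(control, treated, equal_var=False)

    # Welch-Satterthwaite degrees of freedom
    v1 = control.var(ddof=1) / len(control)
    v2 = treated.var(ddof=1) / len(treated)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(control) - 1) + v2 ** 2 / (len(treated) - 1))

    # Check the normality assumption in each group (Shapiro-Wilk)
    _, p_norm_control = stats.shapiro(control)
    _, p_norm_treated = stats.shapiro(treated)

    print(f"Welch t({df:.1f}) = {t_stat:.2f}, P = {p_value:.3g}")
    print(f"Shapiro-Wilk P (control, treated): {p_norm_control:.2f}, {p_norm_treated:.2f}")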

Page 7: Guidelines for Reporting

• Control for multiple comparisons.
• Report variability using a standard deviation (not a standard error).
• Avoid sole reliance on statistical hypothesis testing, such as the use of P values, which fails to convey important quantitative information.
• Report uncertainty about scientific importance using a confidence interval.
• Interpret each main result by assessing the numerical bounds of the confidence interval and by considering the precise P value.
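
As a sketch of two of these points, assuming Python with SciPy and statsmodels (no software is named on the slide): a confidence interval reported alongside the estimated mean difference, and a multiple-comparison adjustment over a family of P-values.

    # Sketch: a 95% confidence interval for a mean difference, and a
    # Benjamini-Hochberg adjustment across several tests. Values are made up.
    import numpy as np
    from scipy import stats
    from statsmodels.stats.multitest import multipletests

    rng = np.random.default_rng(2)
    a = rng.normal(5.0, 1.0, 20)
    b = rng.normal(5.6, 1.0, 20)

    # Pooled-variance t interval for the difference in means
    diff = b.mean() - a.mean()
    df = len(a) + len(b) - 2
    sp2 = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / df
    se = np.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
    ci = diff + np.array([-1, 1]) * stats.t.ppf(0.975, df) * se
    print(f"difference = {diff:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")

    # Adjust a family of P-values for multiple comparisons (FDR control)
    pvals = [0.001, 0.02, 0.04, 0.20, 0.50]
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
    print("adjusted P-values:", np.round(p_adj, 3), "reject:", reject)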

Page 9: A Stochastic Process

Error Statistics blog discussion: not an "argument from intentions" but really the "probative capacity of the test".

Page 10: Power Analysis

Uses:
• Pilot studies! (don't forget to control for multiple comparisons)
• Detectable effect size (when reporting a non-significant result)
  - Consider confidence intervals instead

From The American Statistician (2001)
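
A minimal sketch of a prospective power calculation, assuming Python with statsmodels (the slide names no tool): solve for the sample size needed to detect a given standardized effect, or for the detectable effect size at a fixed sample size.

    # Sketch: prospective power analysis for a two-sample t-test.
    # Effect-size, alpha, and power targets below are illustrative.
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()

    # Sample size per group to detect a standardized effect of 0.5
    # with 80% power at alpha = 0.05
    n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(f"n per group: {n_per_group:.1f}")

    # Detectable effect size with 20 subjects per group at the same power
    detectable = analysis.solve_power(nobs1=20, alpha=0.05, power=0.8)
    print(f"detectable effect size (Cohen's d): {detectable:.2f}")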

Page 11: Outliers: Measurement or Model Error?

Reasons for concern:
• Increases the estimated standard deviation
• May indicate the model (e.g. the assumption of normality) is not correct
• May lead to model misspecification
• Biased parameter estimation
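
To make the first point concrete (a made-up numeric illustration, not from the slides), a single aberrant value can noticeably inflate the sample standard deviation and shift the mean:

    # Sketch: effect of one outlier on the sample mean and standard deviation.
    import numpy as np

    clean = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3])
    with_outlier = np.append(clean, 25.0)   # one aberrant measurement

    print(f"clean:        mean={clean.mean():.2f}, sd={clean.std(ddof=1):.2f}")
    print(f"with outlier: mean={with_outlier.mean():.2f}, sd={with_outlier.std(ddof=1):.2f}")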

Page 12: Methods: Detection

Detection
• Visual inspection
• Grubbs' test (assumes normality)
• Chauvenet's criterion (assumes normality)
• Dixon's Q test (assumes normality)
• Rules based on the interquartile range
• Flagging points more than 2 standard deviations from the mean
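
Two of these screens, sketched in Python with NumPy/SciPy on made-up data (the library and data are assumptions, not from the slides): the interquartile-range rule, which needs no distributional assumption, and Grubbs' test for a single extreme value, which assumes normality.

    # Sketch: IQR rule and Grubbs' test on a small made-up sample.
    import numpy as np
    from scipy import stats

    x = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0])

    # Interquartile-range rule: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    iqr_flags = (x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)

    # Grubbs' test (two-sided) for the single most extreme observation
    n = len(x)
    g = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
    t_crit = stats.t.ppf(1 - 0.05 / (2 * n), n - 2)
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit**2 / (n - 2 + t_crit**2))

    print("IQR flags:", x[iqr_flags])
    print(f"Grubbs G = {g:.2f}, critical value = {g_crit:.2f}, outlier: {g > g_crit}")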

Page 13: Methods: Analysis

Analysis
• Delete the outlier
• Trimmed mean / Winsorized mean
• Weighted regression techniques
• Do nothing
• Report results with and without the outliers

Arguments for keeping the point:
• Having methods for identifying outliers does not make the practice of deleting them scientifically or methodologically sound
• It has minimal effect on the estimates/model (low influence)
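
A sketch of the robust summaries mentioned above, assuming Python with SciPy on made-up data: the trimmed mean drops a fraction of each tail, while the Winsorized mean clamps the tails instead of dropping them.

    # Sketch: trimmed and Winsorized means next to the ordinary mean.
    import numpy as np
    from scipy import stats
    from scipy.stats import mstats

    x = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0])

    trimmed = stats.trim_mean(x, proportiontocut=0.2)     # drop 20% from each tail
    winsorized = mstats.winsorize(x, limits=0.2).mean()   # clamp 20% from each tail

    print(f"mean={x.mean():.2f}, trimmed={trimmed:.2f}, winsorized={winsorized:.2f}")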

Page 14: Influential Points

• Outlier: an unusual response value
• Leverage: unusual predictor values
• Influence: roughly, outlyingness combined with leverage; removing the point would noticeably change the fit
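
A sketch of regression diagnostics for these three ideas, assuming Python with statsmodels and simulated data (neither is prescribed by the slide): hat values measure leverage, and Cook's distance measures influence.

    # Sketch: leverage (hat values) and influence (Cook's distance) for a
    # simple regression with one deliberately influential point.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = np.linspace(0, 10, 30)
    y = 2.0 + 0.5 * x + rng.normal(0, 1, 30)
    x[-1], y[-1] = 25.0, 2.0          # high-leverage, influential point

    model = sm.OLS(y, sm.add_constant(x)).fit()
    infl = model.get_influence()

    leverage = infl.hat_matrix_diag     # unusual predictor values
    cooks_d = infl.cooks_distance[0]    # how much the point moves the fit

    worst = int(np.argmax(cooks_d))
    print(f"most influential point: index {worst}, "
          f"leverage={leverage[worst]:.2f}, Cook's D={cooks_d[worst]:.2f}")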

Page 15: Outliers: When Is Deletion Warranted?

Decide whether or not deleting data points is warranted:
• Do not delete data points just because they do not fit your preconceived model.
• You must have a good, objective reason for deleting data points: the value is implausible, there was an inaccuracy in measurement, or the point comes from a different population.
• If you delete any data after you've collected it, justify and describe the deletion in your reports.
• If you are not sure what to do about a data point, analyze the data twice, once with and once without the data point, and report the results of both analyses (see the sketch below).
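
A minimal sketch of that last recommendation, assuming Python with statsmodels and simulated data: fit the same model with and without the questionable observation and report both results.

    # Sketch: analyze the data twice, with and without a suspect point.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    x = np.linspace(0, 10, 25)
    y = 1.0 + 0.8 * x + rng.normal(0, 0.5, 25)
    y[10] = 15.0                        # the questionable observation

    keep = np.ones(len(x), dtype=bool)
    keep[10] = False

    fit_all = sm.OLS(y, sm.add_constant(x)).fit()
    fit_without = sm.OLS(y[keep], sm.add_constant(x[keep])).fit()

    print("slope with the point:   ", round(fit_all.params[1], 3))
    print("slope without the point:", round(fit_without.params[1], 3))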