Introduction to Statistics by Harry

8/14/2019 Introduction to Statistics by Harry http://slidepdf.com/reader/full/introduction-to-statistics-by-harry 1/24

Upload: uzama

Post on 30-May-2018

Resources
•  Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley.
•  Gentle, J (2002) Elements of Computational Statistics. Springer.
•  Gonick, L, and Smith, W (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

Who am I?
•  Dr. Harry Erwin BS MA PhD MIET MBCS
•  My PhD was awarded in bioinformatics. Although my research interests are in neuroscience, I’ve had the coursework and understand current research directions in computational biology and statistics. I’ve also had the coursework for a PhD in mathematics.
•  I teach computing and neuroscience here at the University of Sunderland.

Doing Statistics
•  Usually you do statistics to explore the structure of data. The questions you might ask are rather open-ended. Your understanding is facilitated by a model.
•  A model embodies what you currently know about the data. You can formulate it either as a data-generating process or a set of rules for processing the data.
•  We’ll look at modelling in detail later.

Statistical Models
•  Often expressed as a set of equations relating data elements.
•  Can include probability distributions for the elements. If this is the case, you have a stochastic model.
•  The model should be free to evolve based on data mining.

Common Stochastic Models
•  Parameterized statistical distributions, such as the normal distribution, binomial distribution, or the chi-squared distribution.
•  Sometimes more complicated, where you might need to use simulation, resampling, and visualization to determine the parameters of the model.
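As a minimal sketch of both cases, the Python below first draws data from a parameterized model (a normal distribution with an assumed mean of 10 and standard deviation of 2) and estimates the parameters from the sample, then uses bootstrap resampling to gauge the uncertainty in the estimated mean. The seed, the sample size, and the helper name `bootstrap_means` are illustrative choices, not taken from the slides.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples, rng):
    """Return the means of n_resamples bootstrap resamples of sample."""
    size = len(sample)
    return [
        statistics.fmean(rng.choices(sample, k=size))  # draw with replacement
        for _ in range(n_resamples)
    ]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Data-generating process: a normal model with assumed parameters.
true_mean, true_sd = 10.0, 2.0
sample = [rng.gauss(true_mean, true_sd) for _ in range(200)]

# Parameter estimates taken directly from the sample.
est_mean = statistics.fmean(sample)
est_sd = statistics.stdev(sample)

# Resampling: the spread of the bootstrap means approximates the
# standard error of the estimated mean.
boot = bootstrap_means(sample, 1000, rng)
boot_se = statistics.stdev(boot)

print(f"estimated mean {est_mean:.2f}, estimated sd {est_sd:.2f}")
print(f"bootstrap standard error of the mean {boot_se:.3f}")
```

For a simple normal model the standard error could be computed directly; resampling earns its keep when the model is too complicated for a closed-form answer.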

Visualization
•  Multiple views are necessary, particularly for multivariate data.
•  Be able to zoom in on the data, as a few points can obscure the interesting structure.
•  Scaling of the axes may be necessary, since our eyes are not perfect tools for detecting structure.
•  Watch out for time-ordered or location-ordered data, particularly if time or location is not explicitly reported.

Plots
•  Use simple plots to start with.
•  Watch for rounded data, shown by horizontal strata in the data. That often signals other problems.
•  There are a number of plotting tutorials; consult them.

Statistical Activities
•  Data collection (ideally the statistician has a say in how the data are collected)
•  Description of a dataset
 –  Averages
 –  Spreads
 –  Extreme points
•  Inference within a model or collection of models
•  Model selection
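The description step (averages, spreads, extreme points) can be sketched with the Python standard library alone. The data values and the helper name `describe` below are hypothetical, chosen only to show how one extreme point affects the summaries.

```python
import statistics

def describe(values):
    """Summarise a dataset by its averages, spreads, and extreme points."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "range": max(values) - min(values),
        "min": min(values),
        "max": max(values),
    }

# Hypothetical measurements; the last value is an obvious extreme point.
data = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.3, 12.4]
summary = describe(data)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

Here the single large value pulls the mean (6.15) above the median (5.35), which is one reason to report more than one kind of average.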

How to Do It
•  Start by determining what sort of statistical analysis you will be doing. You need to know:
 –  Which variable is the response variable?
 –  Which are the explanatory variables?
 –  What kind are the explanatory variables?
 –  What kind of response variable do you have?
•  If you have multiple response variables, you need to do multivariate analysis (more advanced).

Basic Methods
•  If all explanatory variables are continuous, plan on a regression analysis.
•  If all explanatory variables are categorical, plan for an analysis of variance (ANOVA).
•  If you have a mix, plan for an analysis of covariance (ANCOVA).

Effect of the Response Variable
•  If the response variable is continuous, then plan on a normal regression, ANOVA, or ANCOVA.
•  If the response variable is a proportion, do a logistic regression.
•  If a count, you need a log linear model.
•  If binary, you need a binary logistic analysis.
•  If time to event or time at death, you will be doing a survival analysis.
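The choices on the Basic Methods and Effect of the Response Variable slides amount to a small decision procedure. A rough sketch follows; the function name `plan_analysis` and the string labels for variable kinds are assumptions made for this illustration, and real analyses will need more judgment than a lookup.

```python
def plan_analysis(response_kind, explanatory_kinds):
    """Suggest an analysis from the kinds of variables involved.

    response_kind: "continuous", "proportion", "count", "binary",
        or "time to event" (labels assumed for this sketch).
    explanatory_kinds: list of "continuous" / "categorical" labels.
    """
    by_response = {
        "proportion": "logistic regression",
        "count": "log-linear model",
        "binary": "binary logistic analysis",
        "time to event": "survival analysis",
    }
    if response_kind in by_response:
        return by_response[response_kind]
    if response_kind != "continuous":
        raise ValueError(f"unknown response kind: {response_kind!r}")

    # Continuous response: the explanatory variables decide the method.
    kinds = set(explanatory_kinds)
    if not kinds or not kinds <= {"continuous", "categorical"}:
        raise ValueError("explanatory kinds must be 'continuous' or 'categorical'")
    if kinds == {"continuous"}:
        return "regression"
    if kinds == {"categorical"}:
        return "ANOVA"
    return "ANCOVA"

print(plan_analysis("continuous", ["continuous", "categorical"]))  # ANCOVA
print(plan_analysis("count", ["categorical"]))                     # log-linear model
```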

Variation
•  You want to understand how the response is dependent on variation in the explanatory variables, but you are also interested in lack of dependence.
•  Design the simplest model that explains the data adequately.

Significance
•  You have to determine what the probability of a false alarm will be, that is, the chance that you will think something is significant which really is not.
•  Typical values are 5%, 1%, and 0.1%.
•  Don’t test every hypothesis. Some will appear significant by chance alone.
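To see why testing every hypothesis is risky, consider a short calculation. It assumes the tests are independent and that every null hypothesis is actually true; both are simplifying assumptions for this sketch, and the function name is hypothetical.

```python
def chance_of_false_alarm(alpha, n_tests):
    """Probability of at least one false alarm among n_tests independent
    tests, each run at significance level alpha, when every null
    hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** n_tests

for n_tests in (1, 5, 20, 100):
    p = chance_of_false_alarm(0.05, n_tests)
    print(f"{n_tests:>3} tests at 5%: {p:.1%} chance of at least one false alarm")
```

Under these assumptions, 20 tests at the 5% level give roughly a 64% chance of at least one spurious "significant" result.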

Good and Bad Hypotheses
•  ‘There are vultures in the local park.’
•  ‘There are no vultures in the local park.’
•  Which is testable?
•  Discuss…

Answer
•  The ‘null hypothesis’ is testable.
•  ‘There are no vultures in the local park.’
•  You test it by taking measurements and showing that if the null hypothesis were true, the chance of those measurements would be close to zero.
•  Discuss further…
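The logic of the test can be shown with a small exact calculation. The example is hypothetical and not from the slides: the null hypothesis is that a coin is fair, and the measurement is 9 heads in 10 tosses. The sketch computes the chance of a result at least that extreme if the null hypothesis were true.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of a result at least
    as extreme as k successes if the null hypothesis (success rate p)
    were true."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical data: 9 heads in 10 tosses; null hypothesis: the coin is fair.
p_value = binomial_tail(10, 9)
print(f"one-sided p-value: {p_value:.4f}")
```

The result is 11/1024, about 0.011, so a fair coin would rarely produce these measurements. That is below a 5% threshold but not below 1%.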

Experimental Design
•  Replication
 –  Increases reliability, so be thorough. Often the answer is ‘30’.
 –  Discuss why.
•  Randomization
 –  Reduces systematic bias, so do it properly.
 –  Almost never done properly.
 –  Discuss why.
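One common way to randomize is to let a seeded random number generator, rather than the experimenter, decide which units receive the treatment. The sketch below assumes 20 hypothetical plots and an even split; the names `randomize` and `plots` are illustrative.

```python
import random

def randomize(units, rng):
    """Randomly split units into treatment and control groups
    (control gets the extra unit if the count is odd)."""
    shuffled = list(units)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(2024)  # seed recorded so the allocation can be audited
plots = [f"plot{i:02d}" for i in range(1, 21)]  # 20 hypothetical plots
treatment, control = randomize(plots, rng)
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Recording the seed makes the allocation reproducible, which guards against quietly reassigning units after seeing them.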

Controls
•  “No controls, no conclusions.”
•  A ‘control experiment’ is one where you don’t apply the treatment or don’t enable the part of your experiment that is supposed to produce the different outcome.
•  You’re comparing the results when the treatment is applied to the results with no treatment.

Replication
•  Must be independent
•  Not part of a time series
•  Not grouped together in space
•  Of an appropriate spatial scale
•  Covers the normal variation in initial conditions.

Error Types
                        Null hypothesis           Null hypothesis
                        actually true             actually false
Accept null hypothesis  Correct (no paper but     Type II (β) error
                        no embarrassment)         (further experiments
                                                  can change this)
Reject null hypothesis  Type I (α) error (can     Correct (a publishable
                        result in a paper you     paper)
                        have to withdraw)

Typical α and β values
•  You usually want the probability of rejecting the null hypothesis when it is true (α) to be less than 5%.
•  You usually want the probability of accepting the null hypothesis when it is false (β) to be less than 20%.
•  The power of a test is 1 − β, or greater than 80% in this case.
•  Rule of thumb: the number of replicates needed to reject a false null hypothesis with probability 80% is about 8s²/d², where s² is the variance in the response and d is the size of the difference to be detected in a single sample.
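The rule of thumb is easy to apply in code. In the sketch below, the function name and the example numbers are hypothetical, and the result is rounded up because replicates come in whole units.

```python
import math

def replicates_needed(variance, difference):
    """Rule-of-thumb replicates: about 8 * s^2 / d^2, rounded up."""
    if variance < 0 or difference == 0:
        raise ValueError("need variance >= 0 and a nonzero difference")
    return math.ceil(8 * variance / difference**2)

# Hypothetical numbers: response variance 4.0, difference to detect 2.0.
print(replicates_needed(4.0, 2.0))  # 8
print(replicates_needed(4.0, 1.0))  # 32
```

Because d is squared, halving the difference you want to detect quadruples the number of replicates required.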

Inference
•  Strong inference
 –  A clear hypothesis
 –  An acceptable test
•  Weak inference
 –  Natural experiments
•  Conclusions from natural experiments are hypotheses. Can still produce good papers.
•  Discuss

How Long to Go On?
•  To stop the experiment as soon as a pleasing result is obtained?
•  To keep going until the theoretically correct result is obtained?
•  Discuss.
•  Gregor Mendel’s experiments with peas.
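As one input to the discussion, the simulation below estimates what happens to the false alarm rate if you stop as soon as a pleasing result appears. It is a sketch under stated assumptions, none of which come from the slides: the null hypothesis is true, the data are normal with known standard deviation 1, and a two-sided z-test at the 5% level is applied either once at the end or after every observation.

```python
import math
import random

def significant(total, n):
    """Two-sided z-test of mean 0 with known sd 1 at the 5% level.
    The sum of n standard normal draws has standard deviation sqrt(n)."""
    return abs(total / math.sqrt(n)) > 1.96

def false_alarm_rates(n_experiments, n_max, n_min, rng):
    """Simulate experiments in which the null hypothesis is true and
    compare testing once at n_max with peeking after every observation
    from n_min onward and stopping at the first significant result."""
    fixed_hits = peeking_hits = 0
    for _ in range(n_experiments):
        total = 0.0
        ever_significant = False
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)  # null is true: the mean really is 0
            if n >= n_min and significant(total, n):
                ever_significant = True   # a 'pleasing' result at some point
        fixed_hits += significant(total, n_max)
        peeking_hits += ever_significant
    return fixed_hits / n_experiments, peeking_hits / n_experiments

rng = random.Random(1)
fixed, peeking = false_alarm_rates(2000, 100, 10, rng)
print(f"test once at n=100:         {fixed:.1%} false alarms")
print(f"stop at first significance: {peeking:.1%} false alarms")
```

Testing once keeps the false alarm rate near the nominal 5%, while stopping at the first significant result pushes it well above that, even though each individual test looks legitimate.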