stats workshop2010

MCT Mathematics & Statistics Paul Garthwaite [email protected] http://statistics.open.ac.uk/advisory.html Introduction to Statistical Analysis

Upload: anesah

Post on 11-May-2015

DESCRIPTION

Paul Garthwaite's Presentation on Statistics

TRANSCRIPT

Page 1: Stats Workshop2010

MCT Mathematics & Statistics

Paul Garthwaite

[email protected]

http://statistics.open.ac.uk/advisory.html

Introduction to Statistical Analysis

Page 2: Stats Workshop2010

The Scientific Method

• Deductive reasoning:

– from the general to the specific ("top-down" approach)

Page 3: Stats Workshop2010

Theory: In a pig’s digestive system, all phosphate ions are the same, regardless of what they were bound with.

Theory: If you are a diabetic, losing weight will help you live longer.

Page 4: Stats Workshop2010

Study Design (deductive reasoning)

Page 5: Stats Workshop2010

Hypothesis testing is like a court of law: You aim to disprove the null hypothesis.

The hypothesis of a court: The person in the dock is innocent.

The aim is to gather evidence that is inconsistent with this hypothesis. We reject the hypothesis (and decide the person is guilty) if the evidence makes the hypothesis unlikely (beyond all reasonable doubt).
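The court analogy can be made concrete with a small calculation. The sketch below is illustrative only: the coin-tossing scenario, the function name `binomial_p_value`, and the numbers are assumptions, not taken from the slides. The null hypothesis ("the coin is fair") plays the role of innocence, and the p-value measures how surprising the evidence would be if that hypothesis were true.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Two-sided exact p-value for a binomial count.

    Adds up, under the null hypothesis, the probability of every outcome
    that is no more likely than the outcome actually observed.
    """
    probs = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = probs[successes]
    # Small tolerance so outcomes tied with the observed one are included.
    total = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical evidence: 16 heads in 20 tosses of a supposedly fair coin.
p = binomial_p_value(successes=16, trials=20)
print(f"p-value = {p:.4f}")  # a small value is evidence against the null
```

A small p-value corresponds to evidence that is hard to reconcile with the null hypothesis; how small is "beyond reasonable doubt" is a convention (often 5% or 1%), not something the calculation decides.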

Page 6: Stats Workshop2010

Inductive Reasoning

• From a set of specific observations to broader generalizations and theories ("bottom-up" approach)

Page 7: Stats Workshop2010

Observational Study (inductive reasoning)

Page 8: Stats Workshop2010

Observational studies could feed into inductive reasoning.

Pilot studies have a place in forming hypotheses.

Some disciplines (e.g. psychology) seem to disapprove of observational studies. Presumably such studies are written up as if the hypotheses were decided before gathering the data. (A dangerous practice!)

Page 9: Stats Workshop2010

Statistical Design

• Study can be:

– Observational: analyse existing data (Inductive)

– Experimental: produce new data (Deductive)

• Relies on random sampling

– Obtain information about the whole from analysing the part (inferential statistics)

• Experimental design:

– randomly allocates conditions/treatments to subjects to observe their response
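Random allocation of treatments to subjects can be sketched in a few lines. This is a minimal illustration, not a prescribed procedure: the function name `randomly_allocate`, the subject labels, and the three treatment names are all hypothetical.

```python
import random


def randomly_allocate(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn,
    so that group sizes differ by at most one."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Hypothetical example: 12 pigs allocated to three phosphate compounds.
pigs = [f"pig{n:02d}" for n in range(1, 13)]
treatments = ["compound A", "compound B", "compound C"]
allocation = randomly_allocate(pigs, treatments, seed=2010)
for treatment, members in allocation.items():
    print(treatment, sorted(members))
```

Letting chance decide who gets which treatment guards against the experimenter, consciously or not, putting the healthier subjects in one group.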

Page 10: Stats Workshop2010

Warning

Poor designs can lead to:

• Inefficient use of collected data

• Difficult statistical analysis

• Inability to draw meaningful conclusions

Page 11: Stats Workshop2010

Use Common Sense

• Think about questions your research might answer.

• Can you gather data related to those questions?

• Using common sense, would the data answer those questions?

Pigs and phosphates: feed pigs different phosphate compounds and see if their bone strengths differ?

Diabetes and diet: use patient notes to get age at death, age at diagnosis, and weight loss in first year after diagnosis.

Page 12: Stats Workshop2010

• In many ways, statistics just makes common sense rigorous.

• Think about what covariates may be relevant and try to measure them (gender and age in many social contexts; smoking in medical studies; etc.)

• Try to reduce random variation.

Page 13: Stats Workshop2010

Gather lots of data

• A decent experiment will generally form about a quarter of a PhD (perhaps more) – four papers are enough for a PhD in most disciplines.

• Designing an experiment, collecting data, analysing it, writing a paper, revising the paper, and so on, will take several months.

• People typically do not spend enough time gathering data. The data drives the conclusions you can reach.

More data = Firmer conclusions

Page 14: Stats Workshop2010

How much data? (My rules of thumb.)

• In a controlled experiment where the quantity of interest is a measurement, forty or so independent observations will typically enable modest-sized differences to be identified.

• With observational data and questionnaire data, gathering 150 or more observations should typically be the aim: you want 25 observations in each category of interest.

• More data is needed with counts than measurements.

• More data is needed with binary quantities (yes/no; cured/not cured; success/failure) than with Likert scores.
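One way to explore rules of thumb like these is a rough power calculation. The sketch below uses a normal approximation for comparing the means of two equal-sized groups. The function name `approx_power`, the 5% significance level, and the standardised effect sizes are assumptions chosen for illustration; the slides do not say what counts as a "modest-sized" difference.

```python
from statistics import NormalDist


def approx_power(n_per_group: int, effect_size: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample comparison of means.

    effect_size is the true difference in means divided by the common
    standard deviation. Uses the normal (z-test) approximation, which is
    slightly optimistic for small samples.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the test statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Illustrative grid: how power grows with the number of observations.
for effect in (0.5, 1.0):
    for n in (10, 20, 40, 80):
        print(f"effect={effect:.1f}  n per group={n:3d}  "
              f"power={approx_power(n, effect):.2f}")
```

The point of the exercise is the trend rather than any single number: for a fixed true difference, more observations give a higher chance of detecting it.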

Page 15: Stats Workshop2010

Questionnaires

Likert scales are good:

strongly agree / weakly agree / indifferent / disagree / strongly disagree.

Having five points on a Likert scale is often about right. Code the values as 1, 2, 3, 4, 5 and it is usually OK to treat them as measurements.
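Coding Likert responses as numbers might look like the following sketch. The exact label wording, the direction of the coding (here "strongly disagree" = 1), and the sample responses are assumptions made for illustration.

```python
from statistics import mean, stdev

# Assumed coding: higher numbers mean stronger agreement.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "indifferent": 3,
    "weakly agree": 4,
    "strongly agree": 5,
}

# Hypothetical answers to one questionnaire item.
responses = [
    "strongly agree", "weakly agree", "indifferent",
    "weakly agree", "disagree", "strongly agree",
]

codes = [LIKERT_CODES[r] for r in responses]
print("codes:", codes)
print(f"mean = {mean(codes):.2f}, standard deviation = {stdev(codes):.2f}")
```

Once coded, the scores can be summarised and compared like any other measurement, which is what makes Likert items easier to analyse than free text.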

Open-ended questions are hard to analyse.

Page 16: Stats Workshop2010

Statistical Data Analysis

• Turning data into information: First produce summary statistics (means, percentages, standard deviations), graphs, bar-charts, cross-tabulations.

• Try to get a feel for your data – what does it tell you? (If you feel you are non-numerate, work at becoming numerate.)

• Try to form quantitative hypotheses that you think the data will refute. (e.g. “The proportions in the ‘strongly agree’ category are the same in these two sub-populations” or “As this quantity changes, the average value of this other quantity does not change”.)
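A hypothesis such as "the proportions in the ‘strongly agree’ category are the same in these two sub-populations" can be checked with a two-proportion z-test. The sketch below is one standard large-sample version; the function name `two_proportion_z_test` and the counts are hypothetical.

```python
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Large-sample test that two population proportions are equal.

    x1 out of n1 and x2 out of n2 are the observed counts.
    Returns (z statistic, two-sided p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under the null
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    if se == 0:
        raise ValueError("all responses identical; the test is undefined")
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 of 80 'strongly agree' in one group, 18 of 90 in the other.
z, p_value = two_proportion_z_test(30, 80, 18, 90)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The normal approximation behind this test is reasonable when each group has a fair number of responses in and out of the category; with very small counts an exact method would be a safer choice.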

Page 17: Stats Workshop2010

Common fundamental statistical methods

• t-tests

• Comparison of proportions

• Contingency tables

• Regression

• Analysis of variance

It is worth knowing when these are useful.
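As an illustration of the first method in the list, the sketch below computes a two-sample t statistic in Welch's form, which does not assume equal variances. This is one common variant, not necessarily the one intended in the slides; the function name `welch_t` and the bone-strength figures are made up.

```python
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    # Estimated variance of each sample mean.
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va**2 / (na - 1) + vb**2 / (nb - 1))
    return t, df


# Hypothetical bone strengths for pigs fed two phosphate compounds.
compound_a = [41.2, 39.8, 43.1, 40.5, 42.0, 38.9]
compound_b = [44.0, 42.7, 45.3, 43.8, 41.9, 46.1]
t, df = welch_t(compound_a, compound_b)
print(f"t = {t:.2f} on about {df:.1f} degrees of freedom")
```

The statistic is then compared with a t distribution on `df` degrees of freedom, using tables or a statistical package, to obtain a p-value.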

Page 18: Stats Workshop2010

Regression

• In many ways regression is the most useful statistical method.

• It lets you test whether one variable affects another (while controlling for other covariates if necessary).

• It also describes the relationship.

• Stepwise methods help you find/test which variables are important.

• Generalised linear models add flexibility.

survival time = a + b × (weight change) + c × age + d × gender + e × BMI + f × IHD + g × (blood pressure)
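A regression of this kind, with survival time as the response and several covariates, can be fitted by least squares. The sketch below solves the normal equations in plain Python for a cut-down model with two covariates. The function names, the data, and the "true" coefficients used to generate the data are all hypothetical; in practice a statistical package would be used and would also report standard errors and p-values.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariates are collinear; coefficients are not unique")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(rows, y):
    """Least-squares coefficients [intercept, slope_1, slope_2, ...]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 gives the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up (weight change, age) pairs; survival time is generated without
# noise from assumed coefficients so the fit can be checked by eye.
rows = [(2, 50), (5, 60), (1, 55), (8, 70), (4, 45), (6, 65)]
survival = [10 + 0.5 * w - 0.1 * age for w, age in rows]
coefficients = fit_linear_regression(rows, survival)
print([round(c, 3) for c in coefficients])
```

Each slope describes how the response changes with one variable while the other covariates are held fixed, which is the sense in which regression "controls for" them.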

Page 19: Stats Workshop2010

• There is an advisory service that can help with:

– Designing an experiment

– How to approach the analysis of data

– Choosing appropriate techniques

– Interpreting results

– Understanding outputs from statistical packages

• Too few people ask for advice before gathering data.

Page 20: Stats Workshop2010

Statistical Software

• Packages are only tools (‘number crunchers’)

The most important thing is to choose an adequate method for your problem

Remember:

Garbage in, garbage out

Page 21: Stats Workshop2010

Some Statistical Packages

• General software (e.g. spreadsheets)

• Specialised:

– Genstat, Minitab, SAS, Statistica

– SPSS

• wide range of statistical procedures

• good graphical capability

• fairly easy to use (menu driven option)

• good help facility with case studies

Page 22: Stats Workshop2010

Statistics Courses

• M248: Analysing Data

– Exploratory data analysis. Models for data. Estimation. Confidence intervals. Hypothesis testing. Regression and two-variable problems. (Minitab)

• M249: Practical Modern Statistics

– Medical statistics. Time series analysis. Multivariate statistics. Bayesian methods.

– Focus on applications: SPSS and WinBUGS.

Page 23: Stats Workshop2010

Statistics Courses

• M343: Applications of Probability

– Models to describe patterns in time and space. Epidemiological models. Genetics and stock market price applications.

• M346: Linear Statistical Modelling

– ANOVA. Design of experiments. Linear regression. Generalized linear models. Diagnostic checking. Log-linear models. (GenStat)

Page 24: Stats Workshop2010

The Stats-Advisory Service

• Drop-in sessions

– Mondays: 2:00 – 4:00 (M216)

– Thursdays: 10:30 – 12:20 (M214)

(Both in Maths and Computing Building)

• Web:

– http://statistics.open.ac.uk/advisory.html

• E-mail:

[email protected]