BEST PRACTICES FOR STATISTICS

Upload: keaton

Post on 06-Feb-2016



TRANSCRIPT

Page 1: Best Practices for Statistics

BEST PRACTICES FOR STATISTICS

Page 2: Best Practices for Statistics

Know what you know and what you don’t know

Have a comparison group

Use validated measures

Have a Data Entry Plan

Get to know your data

If it doesn’t fit, change it

Place your bets before you collect the data

Use the best methods of analysis for your question & your data

Go beyond the p-value

BEST PRACTICES

Page 3: Best Practices for Statistics

What is Statistics?

• Study of Data
• Collecting
• Organizing
• Summarizing
• Analyzing
• Presenting
• Storing & Sharing

Why is it Important?

• Make sense of the data

• Explain what happens and (possibly) why

• Make sound decisions

• To know how close we are to the truth.

Page 4: Best Practices for Statistics

Results

Bias?

Sampling Error?

Invalid Measures?

Random Error?

Other Factors?

PURPOSE OF STATISTICS

Page 5: Best Practices for Statistics

BEST PRACTICE: KNOW WHAT YOU ALREADY KNOW, WHAT YOU WANT TO KNOW AND WHAT YOU DON’T KNOW

Page 6: Best Practices for Statistics

How do users differ when (searching, finding, selecting) (articles, books, Web sites)?

What are the effects of ___________ on ____________?

Which is better at improving _________?

How are people (finding, selecting, using) _______?

What are factors associated with ___________?

STARTING WITH YOUR RESEARCH QUESTION

Page 7: Best Practices for Statistics

KINDS OF VARIABLES

Independent

Subjects

Factors

Effects of…

Dependent

Objects

Outcomes

Effects on…

Page 8: Best Practices for Statistics

Nominal
• Counts by category
• No meaning between the categories (Blue is not better than Red)

Ordinal
• Ranks
• Scales
• Space between ranks is subjective

Interval
• Numeric values with no true zero (baseline)
• Space between values is equal and objective

Ratio
• Interval data with a true zero (baseline)
• Ratios between values are meaningful

LEVELS OF MEASUREMENT (NOIR)

Page 9: Best Practices for Statistics

Qualitative
• Counts by Categories
• Ranks
• Scales

Quantitative
• Measurements
• Composite scores
• Simple Counts

ANOTHER WAY

Page 10: Best Practices for Statistics

LIKERT-TYPE SCALE?

Arbitrary

Few Levels

Individual Questions

Ordinal?

Symmetrical

Many Levels

Composite Score

Interval?

Page 11: Best Practices for Statistics

BEST PRACTICE: HAVE A COMPARISON GROUP

Page 12: Best Practices for Statistics

WAYS OF COMPARING…

Time Periods

Other Libraries

National Surveys

Patron Types

Material Types

Page 13: Best Practices for Statistics

Expected ranks or ratios
• Qualitative
• Comparison

Two variables
• Quantitative
• Correlations

Samples or Groups
• Quantitative or Qualitative
• Paired or Not Paired

KINDS OF COMPARISON

Page 14: Best Practices for Statistics

BEST PRACTICE: USE A VALID MEASURE

Page 15: Best Practices for Statistics

Are you actually measuring what you are trying to measure?

VALIDITY OF MEASURES

Page 16: Best Practices for Statistics

USE A TOOL WITH ESTABLISHED VALIDITY

Approaches and Study Skills Inventory for Students (ASSIST)

User Engagement Scale (UES)

Page 17: Best Practices for Statistics

ESTABLISH VALIDITY OF MEASURES

Reliability
• Consistency

Content or Face Validity
• Common sense

Construct Validity
• Based on theory

Criterion Validity
• Comparison with other valid measures

Page 18: Best Practices for Statistics

BEST PRACTICE: HAVE A DATA PLAN

Page 19: Best Practices for Statistics

GOAL OF DATA COLLECTION IN STATISTICS

Reliability

Bias

Page 20: Best Practices for Statistics

BIAS

Systematic (not random) deviation from the true value (Statistics.com)

Selection Bias

Measurement
• Observer Bias
• Non-response Bias

Analysis Bias

Page 21: Best Practices for Statistics

DATA INPUT

Have a data entry plan

Train the inputters

Use data validation tricks

Double-entry

Page 22: Best Practices for Statistics

BEST PRACTICE: GET TO KNOW YOUR DATA

Page 23: Best Practices for Statistics

Central Tendency

Spread

Error

EXPLORATORY DATA ANALYSIS

Page 24: Best Practices for Statistics

Mean
• Average
• For Quantitative data
• Excel function: =Average(range)

Median
• Middle
• For Quantitative or Rank data
• Excel function: =Median(range)

Mode
• Most common
• Primarily for Qualitative data
• Excel function: =Mode(range)

MEASURES OF CENTRAL TENDENCY
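
The three measures can also be computed outside Excel. The following is a minimal sketch using Python's standard `statistics` module; the data values and the names `years` and `patron_types` are made up for illustration and do not come from the slides.

```python
from statistics import mean, median, multimode

# Hypothetical quantitative data: years of service for nine staff members.
years = [2, 3, 3, 5, 7, 8, 12, 15, 35]

# Hypothetical qualitative data: patron types seen at a service desk.
patron_types = ["student", "faculty", "student", "staff", "student"]

years_mean = mean(years)                # the average, like =Average(range)
years_median = median(years)            # the middle value, like =Median(range)
common_types = multimode(patron_types)  # most common value(s), as a list

print("mean:", years_mean)
print("median:", years_median)
print("mode:", common_types)
```

In this made-up sample the single large value (35) pulls the mean above the median, which is one reason to look at both.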

Page 25: Best Practices for Statistics

SPREAD & DISTRIBUTION

Page 26: Best Practices for Statistics

DISTRIBUTION OR SPREAD OF QUALITATIVE DATA

Tables
• Counts
• Percentages/Ratios
• Averages of Counts

Excel
• Pivot Tables

Page 27: Best Practices for Statistics

PIVOT TABLES IN EXCEL

Select Data
• Highlight table
• Insert->Pivot Table

Select Variables
• Categories (Row Labels)
• Values

Change Settings
• Percentage of Grand Total
• Average
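
As a rough equivalent of the pivot table settings above, the sketch below counts records by category and expresses each count as a percentage of the grand total. It uses only the Python standard library; the `records` list is hypothetical.

```python
from collections import Counter

# Hypothetical records: one patron type per transaction.
records = ["student", "faculty", "student", "staff", "student",
           "faculty", "student", "staff", "student", "student"]

counts = Counter(records)            # counts by category (the row labels)
grand_total = sum(counts.values())

# Each category's share of the grand total, as a percentage.
percentages = {category: 100 * n / grand_total
               for category, n in counts.items()}

for category, n in counts.most_common():
    print(f"{category:<8} {n:>3} {percentages[category]:5.1f}%")
```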

Page 28: Best Practices for Statistics

DEMONSTRATION OF PIVOT TABLES FOR SPREAD OF QUALITATIVE DATA

Page 29: Best Practices for Statistics

GRAPH & CHART RULES OF THUMB

Trends
• Connection across the X-axis

Categorical Comparisons
• Grouped
• Stacked
• Relative Stacked

Categorical
• Few Categories
• Differences are Wide

Page 30: Best Practices for Statistics

QUANTITATIVE DISTRIBUTIONS

Stem & Leaf

Histogram

Distribution graphs

Page 31: Best Practices for Statistics

John W. Tukey

Exploratory Data Analysis

Examining your data visually.
• Stem & Leaf
• Hinges
• Box plots
• Scatter plots, etc.

EXPLORATORY DATA ANALYSIS

Page 32: Best Practices for Statistics

STEM-AND-LEAF

Stem (first digit(s))   Leaf (last digit)
0                       01112222222222222233333344445556666677788899
1                       0000000011122223333356778899
2                       00122234444799
3                       0245

Years at UNT

0    5    13
1    6    13
1    6    13
1    6    13
2    6    15
2    6    16
2    7    17
2    7    17
2    7    18
2    8    18
2    8    19

3    11   29
4    11   29
4    12   30
4    12   32
4    12   34
5    12   35
5    13
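
A stem-and-leaf display is easy to build by hand or in code: split each value into its first digit(s) and its last digit, then group the last digits by stem. The sketch below assumes non-negative whole numbers; the function name `stem_and_leaf` and the sample values are hypothetical and are not the slide's data set.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: string of leaves} for non-negative whole numbers."""
    rows = defaultdict(str)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # first digit(s), last digit
        rows[stem] += str(leaf)
    return dict(rows)

# Hypothetical years-of-service values.
years = [0, 1, 1, 2, 5, 6, 11, 12, 13, 13, 17, 29, 30, 35]

plot = stem_and_leaf(years)
for stem in sorted(plot):
    print(f"{stem} | {plot[stem]}")
```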

Page 33: Best Practices for Statistics

FROM STEM-AND-LEAF TO HISTOGRAMS

Page 34: Best Practices for Statistics

Stem   Leaf                              Count
0      1122223334445555666666677777899   31
1      000011122222222333346677889       27
2      0122234468                        10
3      1112355888                        11
4      12                                2

Range   Count
0-9     31
10-19   27
20-29   10
30-39   11
40-49   2

[Figure: Histogram of Years at UNT; x-axis: ranges 0-9, 10-19, 20-29, 30-39, 40-49; y-axis: count, 0 to 40]
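
Moving from a stem-and-leaf display to a histogram amounts to counting how many values fall in each equal-size range. A minimal sketch, assuming whole-number values and a bin width of 10; `bin_counts` and the sample data are illustrative names and values only.

```python
from collections import Counter

def bin_counts(values, width=10):
    """Count values in equal-width ranges such as 0-9, 10-19, ..."""
    tally = Counter(value // width for value in values)
    return {f"{b * width}-{b * width + width - 1}": tally[b]
            for b in sorted(tally)}

# Hypothetical values, not the slide's data set.
years = [0, 1, 1, 2, 5, 6, 11, 12, 13, 13, 17, 29, 30, 35]

histogram = bin_counts(years)
for label, count in histogram.items():
    print(f"{label:>5} {'#' * count} {count}")
```

Ranges that contain no values are simply left out of the result in this sketch.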

Page 35: Best Practices for Statistics

HISTOGRAMS IN EXCEL

Analysis Toolpak
• Options
• Add-ins
• Manage Add-ins

Set ranges
• Equal Size Ranges
• Ceiling (“more”)

Create Histogram
• Data
• Data Analysis
• Histogram

Create Graph
• Insert Bar Chart
• Highlight histogram
• Select bars & Format Selection
• Gap Width=0%

Ranges for Histogram: 9, 19, 29, 39, 49

Page 36: Best Practices for Statistics

DEMONSTRATION OF HISTOGRAM IN EXCEL

Page 37: Best Practices for Statistics

SPREAD OF QUANTITATIVE DATA

How variable is the data?

Range

Quantiles

Standard Deviation

Page 38: Best Practices for Statistics

RANGE & QUARTILES

Page 39: Best Practices for Statistics

Box plots
• Median
• Upper & lower quartiles
• Outliers

PRESENTATION OF SPREAD

Page 40: Best Practices for Statistics

Measure of dispersion of data

Square root of the average squared deviation from the mean

STANDARD DEVIATION

Page 41: Best Practices for Statistics

Greater variation, less certainty

Lower variation, more certainty

WHAT DOES THE SD TELL YOU?

Page 42: Best Practices for Statistics

Range
• Min(range)
• Max(range)

Quantiles
• Percentile.inc(range, %)
• Quartile.inc(range, {1,2,3,4})

Standard Deviation
• STDEV.S(range)

SPREAD IN EXCEL
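
The same measures of spread are available in Python's standard `statistics` module. This is a sketch on made-up data; the `method="inclusive"` option is assumed here to be the closest counterpart to Excel's `.inc` functions, and `stdev` is the sample standard deviation (dividing by n - 1), like STDEV.S.

```python
from statistics import quantiles, stdev

# Hypothetical data: years of service for nine staff members.
years = [2, 3, 3, 5, 7, 8, 12, 15, 35]

# Range: smallest and largest value.
lowest, highest = min(years), max(years)

# Quartiles: the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(years, n=4, method="inclusive")

# Sample standard deviation.
sd = stdev(years)

print("range:", lowest, "to", highest)
print("quartiles:", q1, q2, q3)
print("standard deviation:", round(sd, 2))
```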

Page 45: Best Practices for Statistics

DEMONSTRATION OF DISTRIBUTIONS

Distribution of the Population

The “Truth”

N is the # of samples

n is the number of items in each sample

Watch the cumulative mean & medians slowly merge to the population
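
The demonstration can be imitated with a small simulation: draw N samples of n items each from a known population and track the running mean of the sample means. Everything in this sketch is an assumption made for illustration, including the skewed population, the seed, and the values of `N` and `n`.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical skewed "population" of 10,000 values (the "truth").
population = [random.expovariate(1 / 10) for _ in range(10_000)]
population_mean = mean(population)

N = 500   # number of samples drawn
n = 30    # number of items in each sample

sample_means = []
cumulative_means = []
for _ in range(N):
    sample = random.sample(population, n)
    sample_means.append(mean(sample))
    # Running mean of all sample means collected so far.
    cumulative_means.append(mean(sample_means))

print(f"population mean:       {population_mean:.2f}")
print(f"after 1 sample:        {cumulative_means[0]:.2f}")
print(f"after {N} samples:     {cumulative_means[-1]:.2f}")
```

With more samples the running mean should settle close to the population mean, even though any single sample can be well off.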

Page 46: Best Practices for Statistics

BEST PRACTICE: IF IT DOESN’T FIT, CHANGE IT

Transformation of data

Page 47: Best Practices for Statistics

WHY TRANSFORM?

[Figure: histogram of Years at UNT; x-axis: ranges 0-9, 10-19, 20-29, 30-39; y-axis: count, 0 to 50]

[Figure: histogram of Log10(Years at UNT); x-axis: bins 0.1 to 1.6 and “More”; y-axis: count, 0 to 16]

Page 48: Best Practices for Statistics

Y = a + bx

Log(Y) = Log(a + bx)

1/Y = 1/(a + bx)

HOW TRANSFORMATION WORKS
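
A log transformation can be tried in a few lines. The sketch below uses made-up, right-skewed values and compares the gap between mean and median before and after taking Log10; a smaller gap is only a rough sign of a more symmetric shape. Values must be greater than zero, so zeros would need separate handling (for example, adding a constant first), which is not shown here.

```python
import math
from statistics import mean, median

# Hypothetical right-skewed values; all must be > 0 for a log transform.
years = [1, 1, 2, 2, 3, 4, 5, 8, 12, 20, 35]

log_years = [math.log10(y) for y in years]

# Skew pulls the mean away from the median; compare the gap on each scale.
raw_gap = mean(years) - median(years)
log_gap = mean(log_years) - median(log_years)

# Back-transform the mean of the logs to the original units
# (this is the geometric mean of the raw values).
geometric_mean = 10 ** mean(log_years)

print(f"raw mean - median: {raw_gap:.2f}")
print(f"log mean - median: {log_gap:.2f}")
print(f"geometric mean:    {geometric_mean:.2f}")
```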

Page 50: Best Practices for Statistics

BEST PRACTICE: PLACE YOUR BETS BEFORE YOU START

Page 51: Best Practices for Statistics

INFERENTIAL STATISTICS

Tests of hypotheses
• Associations
• Expectations

Accounts for uncertainty
• Random error
• Confidence interval

Page 52: Best Practices for Statistics

Your Hypothesis (H1)

Null Hypothesis (H0)

HYPOTHESIS TESTING

Page 53: Best Practices for Statistics

EXAMPLE HYPOTHESIS

>=75%* <75%*

*…of journal articles cited by UNT PACS faculty in journal articles published between 2008-2011.

UNT Libraries provides access to…
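
One way a hypothesis like this could be tested is with an exact one-sided binomial test against the 75% threshold. This is a sketch under stated assumptions, not the method used in the original study: the sample counts are invented, and treating "more than 75%" as the research hypothesis (with 75% as the null value) is an assumption.

```python
from math import comb

def binomial_upper_tail(successes, trials, p0):
    """P(X >= successes) when X ~ Binomial(trials, p0)."""
    return sum(comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
               for k in range(successes, trials + 1))

# Hypothetical sample: 200 cited articles checked, 162 of them available.
trials, available = 200, 162

observed_share = available / trials
# Chance of seeing at least this many available articles if the true
# share were exactly 75%.
p_value = binomial_upper_tail(available, trials, p0=0.75)

print(f"observed share: {observed_share:.2f}")
print(f"one-sided p-value: {p_value:.4f}")
```

The threshold and the significance level should be fixed before the data are collected, in keeping with "place your bets before you start".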

Page 54: Best Practices for Statistics

p

Sample Size

Central Tendency

Spread

Distribution

Significance Level

HYPOTHESIS TESTING

Page 55: Best Practices for Statistics

TESTING HYPOTHESES

Page 56: Best Practices for Statistics

BEST PRACTICE: CHOOSE THE BEST METHOD FOR YOUR QUESTION AND DATA

Page 57: Best Practices for Statistics

Assumptions

Limitations

Appropriate data type

What the test tests

KNOW THE TESTS

Page 58: Best Practices for Statistics

Variable Type

What is being compared

Independence of units

Underlying variance in the population

Distribution

Sample size

Number of comparison groups

FACTORS ASSOCIATED WITH CHOICE OF STATISTICAL METHOD

Page 59: Best Practices for Statistics

USE A FLOW CHART

Page 60: Best Practices for Statistics

BEST PRACTICE: GOING BEYOND THE P-VALUE

Page 61: Best Practices for Statistics

AND THE P-VALUE SAYS…

Much about the distributions

More about the H0 than H1

Little about size of differences

Page 62: Best Practices for Statistics

MORE USEFUL STATISTICS

Effect Sizes
• Tell the real story

Confidence Intervals
• State your certainty

Page 63: Best Practices for Statistics

Correlations

• Cohen’s guidelines for Pearson’s r

Differences from the mean
• Standardized
• Weighted against the standard deviation
• Cohen’s d

EFFECT SIZES OF QUANTITATIVE DATA

Effect Size   r >
Small         .10
Medium        .30
Large         .50
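
Both effect sizes can be computed directly from their definitions. The sketch below uses only the standard library and made-up data; Cohen's d is computed with a pooled standard deviation, which is one common variant, and the helper names `pearson_r`, `cohens_d` and `label_r` are hypothetical. The labels follow the thresholds in the table above.

```python
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * stdev(group_a) ** 2 +
                  (nb - 1) * stdev(group_b) ** 2) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

def label_r(r):
    """Label the size of r using the thresholds .10, .30 and .50."""
    size = abs(r)
    if size > 0.50:
        return "large"
    if size > 0.30:
        return "medium"
    if size > 0.10:
        return "small"
    return "below small"

# Hypothetical data: hours of training and test scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [70, 72, 75, 74, 80]
r = pearson_r(hours, scores)

# Hypothetical data: scores for a trained and an untrained group.
trained = [78, 85, 90, 82, 88]
untrained = [70, 75, 80, 72, 78]
d = cohens_d(trained, untrained)

print(f"r = {r:.2f} ({label_r(r)})")
print(f"d = {d:.2f}")
```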

Page 64: Best Practices for Statistics

Based on Contingency table

Odds ratio
• Odds of event A divided by odds of event B
• Case-control studies

Relative risk
• Uses probabilities rather than odds
• Experiments, RCTs

EFFECT SIZES OF QUALITATIVE DATA

Test A/B Yes No Total

Yes 10 15 25

No 50 25 75

Totals 60 40 100
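
Using the counts in the table above, both effect sizes can be worked through step by step. The sketch assumes the rows are Test A (the grouping) and the columns are Test B (the outcome); the slide does not say which is which, so that reading and the variable names are assumptions.

```python
# Cell counts from the table. Rows = Test A, columns = Test B (assumed).
a_yes_b_yes, a_yes_b_no = 10, 15
a_no_b_yes, a_no_b_no = 50, 25

# Odds of "B = Yes" within each row, then their ratio.
odds_a_yes = a_yes_b_yes / a_yes_b_no     # 10 / 15
odds_a_no = a_no_b_yes / a_no_b_no        # 50 / 25
odds_ratio = odds_a_yes / odds_a_no

# Probability (risk) of "B = Yes" within each row, then their ratio.
risk_a_yes = a_yes_b_yes / (a_yes_b_yes + a_yes_b_no)   # 10 / 25
risk_a_no = a_no_b_yes / (a_no_b_yes + a_no_b_no)       # 50 / 75
relative_risk = risk_a_yes / risk_a_no

print(f"odds ratio:    {odds_ratio:.2f}")
print(f"relative risk: {relative_risk:.2f}")
```

The two numbers differ because odds compare "yes" to "no", while risk compares "yes" to the row total.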

Page 65: Best Practices for Statistics

Point estimates
• Single value
• Mean

Intervals
• Degree of uncertainty
• Range of certainty around the point estimate

Based on
• Point estimate (e.g. mean)
• Confidence level (usually .95)
• Standard deviation

Expressed as:
• The mean score of the students who had the IL training was 83.5 with a 95% CI of 78.3 to 89.4.

CONFIDENCE INTERVALS
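
A confidence interval for a mean can be sketched from the three ingredients listed above plus the sample size. The example below uses a normal approximation (mean plus or minus z times the standard error); for small samples a t-based interval would be somewhat wider. The scores and the function name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation CI for the mean: mean +/- z * sd / sqrt(n)."""
    n = len(values)
    point = mean(values)
    # z is about 1.96 when confidence is 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(values) / sqrt(n)
    return point, point - margin, point + margin

# Hypothetical test scores for ten students.
scores = [72, 85, 90, 78, 88, 95, 81, 79, 86, 91]

point, low, high = mean_confidence_interval(scores)
print(f"mean {point:.1f}, 95% CI {low:.1f} to {high:.1f}")
```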

Page 66: Best Practices for Statistics

Noise

Signal

STATISTICAL ANALYSIS

Page 67: Best Practices for Statistics

Know what you know and what you don’t know

Have a comparison group

Use validated measures

Have a Data Entry Plan

Get to know your data

If it doesn’t fit, change it

Place your bets before you collect the data

Use the best methods of analysis for your question & your data

Go beyond the p-value

BEST PRACTICES