BEST PRACTICES FOR STATISTICS
Know what you know and what you don’t know
Have a comparison group
Use validated measures
Have a Data Entry Plan
Get to know your data
If it doesn’t fit, change it
Place your bets before you collect the data
Use the best methods of analysis for your question & your data
Go beyond the p-value
BEST PRACTICES
What is Statistics?
• Study of data
• Collecting
• Organizing
• Summarizing
• Analyzing
• Presenting
• Storing & sharing
Why is it Important?
• Make sense of the data
• Explain what happens and (possibly) why
• Make sound decisions
• Know how close we are to the truth
Results
• Bias?
• Sampling error?
• Invalid measures?
• Random error?
• Other factors?
PURPOSE OF STATISTICS
BEST PRACTICE: KNOW WHAT YOU ALREADY KNOW, WHAT YOU WANT TO KNOW, AND WHAT YOU DON’T KNOW
How do users differ when (searching, finding, selecting) (articles, books, Web sites)?
What are the effects of ___________ on ____________?
Which is better at improving _________?
How are people (finding, selecting, using) _______?
What are factors associated with ___________?
STARTING WITH YOUR RESEARCH QUESTION
KINDS OF VARIABLES
Independent variables
• Subjects
• Factors
• "Effects of…"
Dependent variables
• Objects
• Outcomes
• "Effects on…"
Nominal
• Counts by category
• No order or meaning between the categories (blue is not better than red)
Ordinal
• Ranks
• Scales
• Space between ranks is subjective
Interval
• Space between values is equal and objective
• No true zero (baseline), so ratios are not meaningful (e.g., temperature in °C)
Ratio
• Interval data with a true zero (baseline)
• Ratios are meaningful (e.g., 20 years at UNT is twice as long as 10 years)
LEVELS OF MEASUREMENT (NOIR)
Qualitative
• Counts by categories
• Ranks
• Scales
Quantitative
• Measurements
• Composite scores
• Simple counts
ANOTHER WAY
LIKERT-TYPE SCALE?
Ordinal?
• Arbitrary spacing
• Few levels
• Individual questions
Interval?
• Symmetrical
• Many levels
• Composite score
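The ordinal-versus-interval question can be made concrete. This is a minimal Python sketch that assumes hypothetical responses from four people to three 5-point Likert-type items. Each individual item is treated as ordinal and summarized with its median. The summed composite score is treated as interval data and summarized with its mean.

```python
from statistics import mean, median

# Hypothetical responses: one row per respondent, one column per item (1-5 scale).
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 4, 3],
]

# Individual items: treat as ordinal and report the median of each item.
item_medians = [median(column) for column in zip(*responses)]

# Composite score: sum the items for each respondent, then treat the sum as interval data.
composites = [sum(row) for row in responses]
composite_mean = mean(composites)

print("Item medians:", item_medians)
print("Composite scores:", composites, "mean =", composite_mean)
```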
BEST PRACTICE: HAVE A COMPARISON GROUP
WAYS OF COMPARING…
Time Periods
Other Libraries
National Surveys
Patron Types
Material Types
Expected ranks or ratios
• Qualitative
• Comparison
Two variables
• Quantitative
• Correlations
Samples or groups
• Quantitative or qualitative
• Paired or not paired
KINDS OF COMPARISON
BEST PRACTICE: USE A VALID MEASURE
Are you actually measuring what you are trying to measure?
VALIDITY OF MEASURES
USE A TOOL WITH ESTABLISHED VALIDITY
Approaches and Study Skills Inventory for Students (ASSIST)
User Engagement Scale (UES)
ESTABLISH VALIDITY OF MEASURES
• Reliability: consistency
• Content or face validity: common sense
• Construct validity: based on theory
• Criterion validity: comparison with other valid measures
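Reliability (consistency) is often checked with an internal-consistency statistic. The slide does not name one, so Cronbach's alpha is swapped in here as a common example. This is a minimal sketch that assumes hypothetical item scores, with rows as respondents and columns as items. Values closer to 1 indicate items that vary together more consistently.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])                      # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))              # regroup the scores by item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical scores: 4 respondents x 3 items on a 1-5 scale.
scores = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 4, 3],
]
alpha = cronbach_alpha(scores)
print(f"Cronbach's alpha = {alpha:.3f}")
```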
BEST PRACTICE: HAVE A DATA PLAN
GOAL OF DATA COLLECTION IN STATISTICS
Reliability
Bias
BIAS
Systematic (not random) deviation from the true value (Statistics.com)
Selection bias
Measurement bias
• Observer bias
• Non-response bias
Analysis Bias
DATA INPUT
Have a data entry plan
Train the inputters
Use data validation tricks
Double-entry
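Double-entry means two people type in the same forms independently and then compare the results. This is a minimal sketch of the comparison step. It assumes each pass is a list of records stored as dictionaries, and the field names and values are hypothetical. Every disagreement is flagged so that someone can check it against the paper source.

```python
def find_entry_mismatches(first_pass, second_pass):
    """Compare two independent entries of the same records; return every disagreement."""
    if len(first_pass) != len(second_pass):
        raise ValueError("the two passes contain different numbers of records")
    mismatches = []
    for row_num, (row_a, row_b) in enumerate(zip(first_pass, second_pass), start=1):
        # Check every field that appears in either entry of this record.
        for field in sorted(row_a.keys() | row_b.keys()):
            value_a, value_b = row_a.get(field), row_b.get(field)
            if value_a != value_b:
                mismatches.append((row_num, field, value_a, value_b))
    return mismatches

# Hypothetical records typed in twice by two different inputters.
entry_1 = [{"id": 1, "years": 12, "dept": "PACS"}, {"id": 2, "years": 7, "dept": "LIB"}]
entry_2 = [{"id": 1, "years": 12, "dept": "PACS"}, {"id": 2, "years": 17, "dept": "LIB"}]

for row_num, field, a, b in find_entry_mismatches(entry_1, entry_2):
    print(f"record {row_num}, field '{field}': {a!r} vs {b!r}, check the source form")
```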
Central tendency
Spread
Error
EXPLORATORY DATA ANALYSIS
Mean
• Average
• For quantitative data
• Excel function: =AVERAGE(range)
Median
• Middle value
• For quantitative or rank data
• Excel function: =MEDIAN(range)
Mode
• Most common value
• Primarily for qualitative data
• Excel function: =MODE(range) (ignores text, so categories must be coded as numbers)
MEASURES OF CENTRAL TENDENCY
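Readers working outside Excel can use close equivalents from Python's standard `statistics` module. This is a minimal sketch with hypothetical data. Unlike Excel's MODE, `multimode` accepts text categories directly and returns every value tied for most common.

```python
from statistics import mean, median, mode, multimode

years_at_unt = [1, 3, 3, 5, 8, 12, 20]                     # hypothetical quantitative data
patron_types = ["Faculty", "Student", "Student", "Staff"]  # hypothetical categories

avg = mean(years_at_unt)                # like =AVERAGE(range)
mid = median(years_at_unt)              # like =MEDIAN(range)
most_common = mode(years_at_unt)        # like =MODE(range)
top_category = multimode(patron_types)  # works on text labels directly

print(avg, mid, most_common, top_category)
```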
SPREAD & DISTRIBUTION
DISTRIBUTION OR SPREAD OF QUALITATIVE DATA
Tables
• Counts
• Percentages/ratios
• Averages of counts
Excel
• Pivot tables
PIVOT TABLES IN EXCEL
Select data
• Highlight the table
• Insert > PivotTable
Select variables
• Categories (Row Labels)
• Values
Change settings
• Show values as a percentage of the grand total
• Summarize values by average
DEMONSTRATION OF PIVOT TABLES FOR SPREAD OF QUALITATIVE DATA
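The three pivot-table summaries can also be reproduced in plain Python: counts per category, percentage of the grand total, and the average of a count variable. This is a minimal sketch that assumes hypothetical records of the form (patron type, number of checkouts).

```python
from collections import Counter, defaultdict

# Hypothetical records: (patron type, number of checkouts).
records = [("Faculty", 12), ("Student", 3), ("Student", 5), ("Staff", 4), ("Faculty", 8)]

counts = Counter(patron for patron, _ in records)                 # count of rows
total = sum(counts.values())
pct_of_total = {p: 100 * n / total for p, n in counts.items()}    # % of grand total

checkout_sums = defaultdict(int)
for patron, checkouts in records:
    checkout_sums[patron] += checkouts
avg_checkouts = {p: checkout_sums[p] / counts[p] for p in counts}  # average of values

for patron in sorted(counts):
    print(f"{patron:8} n={counts[patron]}  {pct_of_total[patron]:5.1f}%  "
          f"avg checkouts={avg_checkouts[patron]:.1f}")
```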
GRAPH & CHART RULES OF THUMB
Trends
• Connection across the x-axis
Categorical comparisons
• Grouped
• Stacked
• Relative stacked
Categorical
• Few categories
• Differences are wide
QUANTITATIVE DISTRIBUTIONS
Stem & Leaf
Histogram
Distribution graphs
John W. Tukey, Exploratory Data Analysis
• Examining your data visually
• Stem & leaf
• Hinges
• Box plots
• Scatter plots, etc.
EXPLORATORY DATA ANALYSIS
STEM-AND-LEAF
Stem | Leaf
0 | 01112222222222222233333344445556666677788899
1 | 0000000011122223333356778899
2 | 00122234444799
3 | 0245
Stem = first digit(s); leaf = last digit
Years at UNT (raw data)
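A stem-and-leaf display simply splits each value into its leading digit(s) and its last digit. This is a minimal sketch for non-negative whole numbers, using hypothetical "Years at UNT" values.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers into {stem: [leaves]}; the leaf is the last digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 24 -> stem 2, leaf 4
        groups[stem].append(leaf)
    return dict(groups)

def print_stem_and_leaf(values):
    groups = stem_and_leaf(values)
    for stem in range(min(groups), max(groups) + 1):   # show empty stems too
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        print(f"{stem} | {leaves}")

years = [0, 1, 3, 3, 12, 15, 17, 20, 24, 31]   # hypothetical "Years at UNT" values
print_stem_and_leaf(years)
```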
FROM STEM-AND-LEAF TO HISTOGRAMS
Stem | Leaf | Count
0 | 1122223334445555666666677777899 | 31
1 | 000011122222222333346677889 | 27
2 | 0122234468 | 10
3 | 1112355888 | 11
4 | 12 | 2

Range | Count
0-9 | 31
10-19 | 27
20-29 | 10
30-39 | 11
40-49 | 2
Figure: Histogram of Years at UNT (counts by range, 0-9 through 40-49)
HISTOGRAMS IN EXCEL
Analysis ToolPak
• Options > Add-ins > Manage Add-ins (enable the Analysis ToolPak)
Set ranges
• Equal-size ranges
• Ceiling ("More")
Create histogram
• Data > Data Analysis > Histogram
Create graph
• Highlight the histogram
• Insert a bar chart
• Select the bars & Format Selection
• Gap Width = 0%
Bin ranges for the histogram: 9, 19, 29, 39, 49
DEMONSTRATION OF HISTOGRAM IN EXCEL
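The Analysis ToolPak counts a value in a bin when it is greater than the previous bin limit and less than or equal to that bin's upper limit. Anything above the last limit goes into a final "More" bin. This is a minimal Python sketch of that counting rule, assuming hypothetical data and the 9, 19, 29, 39, 49 bin limits.

```python
from bisect import bisect_left

def histogram_counts(values, bin_limits):
    """Count values per bin: (previous limit, limit], plus a final 'More' bin."""
    counts = [0] * (len(bin_limits) + 1)
    for v in values:
        # bisect_left finds the first limit >= v, i.e. the bin whose upper edge includes v.
        counts[bisect_left(bin_limits, v)] += 1
    return counts

bin_limits = [9, 19, 29, 39, 49]                  # must be sorted ascending
years = [0, 5, 9, 10, 19, 25, 33, 41, 49, 52]     # hypothetical data
labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "More"]

for label, n in zip(labels, histogram_counts(years, bin_limits)):
    print(f"{label:>6} {'#' * n}")
```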
SPREAD OF QUANTITATIVE DATA
How variable is the data?
Range
Quantiles
Standard deviation
RANGE & QUARTILES
Box plots
• Median
• Upper & lower quartiles
• Outliers
PRESENTATION OF SPREAD
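A box plot shows the median and quartiles and flags outliers. A common rule, usually attributed to Tukey, flags points that lie more than 1.5 × IQR beyond the quartiles. This is a minimal sketch with hypothetical data. It uses quartiles from `statistics.quantiles` in place of Tukey's hinges, and the two can differ slightly for small samples.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles and 1.5 x IQR fences; points beyond the fences are flagged as outliers."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": q2, "q3": q3,
            "fences": (low_fence, high_fence), "outliers": outliers}

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])   # hypothetical data
print(summary)
```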
Measure of dispersion of data
Square root of the average squared deviation from the mean (for a sample, divide by n - 1 instead of n)
STANDARD DEVIATION
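Written out, the sample standard deviation (the quantity Excel's STDEV.S returns) for values $x_1, \dots, x_n$ with mean $\bar{x}$ is:

```latex
s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```

Dividing by $n$ instead of $n - 1$ gives the population standard deviation (Excel's STDEV.P).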
Greater variation, less certainty
Lower variation, more certainty
WHAT DOES THE SD TELL YOU?
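Range
• =MIN(range)
• =MAX(range)
Quantiles
• =PERCENTILE.INC(range, k), where k is a fraction from 0 to 1
• =QUARTILE.INC(range, quart), where quart is 0 (minimum), 1, 2 (median), 3, or 4 (maximum)
Standard deviation
• =STDEV.S(range) (sample standard deviation)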
SPREAD IN EXCEL
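The same spread measures are available in Python's `statistics` module. This is a minimal sketch with hypothetical data. Passing `method="inclusive"` to `quantiles` uses the same linear interpolation as Excel's QUARTILE.INC and PERCENTILE.INC, and `stdev` is the sample (n - 1) version, like STDEV.S.

```python
from statistics import quantiles, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values

low, high = min(data), max(data)                       # =MIN(range), =MAX(range)
q1, q2, q3 = quantiles(data, n=4, method="inclusive")  # =QUARTILE.INC(range, 1..3)
p90 = quantiles(data, n=10, method="inclusive")[8]     # =PERCENTILE.INC(range, 0.9)
sd = stdev(data)                                       # =STDEV.S(range)

print(f"range {low}-{high}, quartiles {q1}, {q2}, {q3}, "
      f"90th percentile {p90:.1f}, SD {sd:.3f}")
```

`quantiles` only returns evenly spaced cut points (quartiles, deciles, percentiles with n=100), so an arbitrary k such as 0.37 needs n=100 and some rounding.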
NORMAL DISTRIBUTION
SKEWED DISTRIBUTIONS
DEMONSTRATION OF DISTRIBUTIONS
Distribution of the population: the "truth"
• N is the number of samples
• n is the number of items in each sample
• Watch the cumulative means & medians slowly converge on the population values
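The demonstration can be imitated in a few lines. This is a minimal sketch under two assumptions. First, the population is a hypothetical right-skewed (exponential) distribution. Second, "cumulative mean" means the running average of the sample means, and likewise for the medians. As N grows, both running averages settle close to the population values.

```python
import random
from statistics import mean, median

random.seed(1)   # fixed seed so the run is repeatable

# Hypothetical right-skewed "population" standing in for the truth.
population = [random.expovariate(1 / 10) for _ in range(10_000)]
pop_mean, pop_median = mean(population), median(population)

N, n = 200, 30   # N = number of samples, n = items in each sample
sample_means, sample_medians = [], []
for k in range(1, N + 1):
    sample = random.sample(population, n)
    sample_means.append(mean(sample))
    sample_medians.append(median(sample))
    if k in (1, 10, 50, N):
        print(f"{k:3d} samples: avg of means {mean(sample_means):6.2f} (pop {pop_mean:.2f}), "
              f"avg of medians {mean(sample_medians):6.2f} (pop {pop_median:.2f})")
```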
Transformation of data
BEST PRACTICE: IF IT DOESN’T FIT, CHANGE IT
WHY TRANSFORM?
Figure: Histograms of Years at UNT (raw, 10-year ranges) and Log10(Years at UNT)
$$Y = a + bx \qquad \log(Y) = \log(a + bx) \qquad \frac{1}{Y} = \frac{1}{a + bx}$$
HOW TRANSFORMATION WORKS
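This is a minimal sketch of checking whether a transformation helps, assuming hypothetical right-skewed "Years at UNT" values. Because log10(0) is undefined and the data include 0, the sketch adds 1 before taking the log. That offset is our assumption; the slides do not specify one. Skewness near 0 suggests a more symmetric shape, closer to normal.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: average cubed z-score (0 for a symmetric shape)."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

# Hypothetical right-skewed "Years at UNT" values, including a 0.
years = [0, 1, 1, 2, 2, 3, 4, 5, 6, 8, 10, 13, 17, 24, 35, 48]

# log10(0) is undefined, so this sketch adds 1 before transforming (an assumption).
log_years = [math.log10(y + 1) for y in years]

print(f"skewness of raw years:        {skewness(years):.2f}")
print(f"skewness of log10(years + 1): {skewness(log_years):.2f}")
```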
• Evaluate the distribution of the raw data
• Select a transformation method
• Transform the data
• Normally distributed?
• Statistically test the transformed data
HOW TO BECOME NORMAL
Express the result in terms of the transformation
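Expressing a result in terms of the transformation matters most when reporting averages. The mean of log10 values, transformed back with 10**x, is the geometric mean, not the arithmetic mean. This is a minimal sketch with hypothetical, strictly positive values. If 1 was added before taking the log, subtract it again after back-transforming.

```python
import math
from statistics import geometric_mean, mean

values = [1, 10, 100]   # hypothetical, strictly positive

log_mean = mean(math.log10(v) for v in values)   # average on the log10 scale
back_transformed = 10 ** log_mean                # report this on the original scale

print(f"mean of log10 values: {log_mean}")                       # 1.0
print(f"back-transformed (geometric mean): {back_transformed}")  # 10.0
print(f"arithmetic mean, for contrast: {mean(values)}")          # 37
```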
BEST PRACTICE: PLACE YOUR BETS BEFORE YOU START
INFERENTIAL STATISTICS
Tests of hypotheses
• Associations
• Expectations
Accounts for uncertainty
• Random error
• Confidence intervals
Your hypothesis (H1)
Null hypothesis (H0)
HYPOTHESIS TESTING
EXAMPLE HYPOTHESIS
UNT Libraries provides access to…
• ≥75%*
• <75%*
*…of journal articles cited by UNT PACS faculty in journal articles published between 2008 and 2011.
p-value
• Sample size
• Central tendency
• Spread
• Distribution
• Significance level
HYPOTHESIS TESTING
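The example hypothesis can be tested with a one-sample z-test for a proportion. This is a minimal sketch with hypothetical counts (140 of 200 sampled citations available). It assumes H0: p = 0.75 against the one-sided alternative H1: p < 0.75; the slide does not say which direction is tested. The normal approximation is reasonable when n·p0 and n·(1 - p0) are both comfortably above about 10.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """z-test of H0: p = p0 against H1: p < p0 (normal approximation)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error assuming H0 is true
    z = (p_hat - p0) / se
    p_value = NormalDist().cdf(z)       # lower-tail probability
    return p_hat, z, p_value

p_hat, z, p_value = one_sample_proportion_z(successes=140, n=200, p0=0.75)
print(f"observed {p_hat:.1%}, z = {z:.2f}, one-sided p = {p_value:.3f}")
```

With these made-up counts, p lands just above .05. This is a reminder to report the observed proportion and its interval alongside the p-value.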
TESTING HYPOTHESES
BEST PRACTICE: CHOOSE THE BEST METHOD FOR YOUR QUESTION AND DATA
• Assumptions
• Limitations
• Appropriate data type
• What the test tests
KNOW THE TESTS
• Variable type
• What is being compared
• Independence of units
• Underlying variance in the population
• Distribution
• Sample size
• Number of comparison groups
FACTORS ASSOCIATED WITH CHOICE OF STATISTICAL METHOD
USE A FLOW CHART
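A flow chart is essentially a chain of yes/no questions. This is a deliberately simplified, hypothetical sketch that uses widely taught textbook pairings. It covers only comparisons between groups; correlations and regression are left out. It also ignores sample size and unequal variances, which a real flow chart would weigh.

```python
def suggest_test(outcome, groups, paired, normal=True):
    """Return a textbook test name for comparing groups (simplified).

    McNemar and Cochran's Q assume a yes/no (binary) outcome.
    """
    if groups < 2:
        raise ValueError("need at least two comparison groups")
    if outcome == "qualitative":
        if paired:
            return "McNemar test" if groups == 2 else "Cochran's Q test"
        return "chi-square test of independence"
    if outcome != "quantitative":
        raise ValueError("outcome must be 'qualitative' or 'quantitative'")
    if groups == 2:
        if paired:
            return "paired t-test" if normal else "Wilcoxon signed-rank test"
        return "independent-samples t-test" if normal else "Mann-Whitney U test"
    if paired:
        return "repeated-measures ANOVA" if normal else "Friedman test"
    return "one-way ANOVA" if normal else "Kruskal-Wallis test"

print(suggest_test("quantitative", groups=2, paired=False))                # independent-samples t-test
print(suggest_test("quantitative", groups=3, paired=False, normal=False))  # Kruskal-Wallis test
```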
BEST PRACTICE: GOING BEYOND THE P-VALUE
AND THE P-VALUE SAYS…
• Much about the distributions
• More about H0 than H1
• Little about the size of differences
MORE USEFUL STATISTICS
Effect sizes
• Tell the real story
Confidence intervals
• State your certainty
Correlations
• Cohen’s guidelines for Pearson’s r
Differences between means
• Standardized (weighted against the standard deviation)
• Cohen’s d
EFFECT SIZES OF QUANTITATIVE DATA
Effect size | r >
Small | .10
Medium | .30
Large | .50
Based on a contingency table
Odds ratio
• Odds of the event in group A divided by the odds of the event in group B
• Case-control studies
Relative risk
• Uses probabilities (risks) rather than odds
• Experiments, RCTs
EFFECT SIZES OF QUALITATIVE DATA
Test A \ Test B | Yes | No | Total
Yes | 10 | 15 | 25
No | 50 | 25 | 75
Total | 60 | 40 | 100
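Both ratios can be computed directly from the 2 × 2 table. This minimal sketch reads Test A (rows) as the grouping and Test B = Yes (columns) as the event; that reading of the layout is our assumption.

```python
def odds_ratio(a, b, c, d):
    """2x2 table with rows = groups, columns = event yes/no: [[a, b], [c, d]]."""
    return (a / b) / (c / d)               # odds in group 1 / odds in group 2

def relative_risk(a, b, c, d):
    return (a / (a + b)) / (c / (c + d))   # risk in group 1 / risk in group 2

# Cells from the table: Test A = Yes row (10, 15), Test A = No row (50, 25).
a, b, c, d = 10, 15, 50, 25
print(f"odds ratio = {odds_ratio(a, b, c, d):.2f}")        # 0.33
print(f"relative risk = {relative_risk(a, b, c, d):.2f}")  # 0.60
```

The two measures differ here (0.33 vs. 0.60) because the event is common. Odds ratios approximate relative risks only when the event is rare.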
Point estimates
• Single value
• Mean
Intervals
• Degree of uncertainty
• Range of certainty around the point estimate
Based on
• Point estimate (e.g., mean)
• Confidence level (usually .95)
• Standard deviation and sample size
Expressed as
• "The mean score of the students who had the IL training was 83.5, with a 95% CI of 78.3 to 89.4."
CONFIDENCE INTERVALS
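A confidence interval for a mean combines the point estimate, the confidence level, the standard deviation, and the sample size. This is a minimal sketch with hypothetical scores that uses the large-sample normal approximation. For small samples like this one, a t critical value (for example from `scipy.stats.t.ppf`) gives a somewhat wider and more appropriate interval.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(values, confidence=0.95):
    """Large-sample interval: mean +/- z * (s / sqrt(n))."""
    n = len(values)
    m = mean(values)
    se = stdev(values) / math.sqrt(n)                # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return m, m - z * se, m + z * se

scores = [80, 85, 90, 75, 88]   # hypothetical test scores
m, low, high = mean_confidence_interval(scores)
print(f"mean {m:.1f}, 95% CI {low:.1f} to {high:.1f}")
```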
Noise
Signal
STATISTICAL ANALYSIS
Know what you know and what you don’t know
Have a comparison group
Use validated measures
Have a Data Entry Plan
Get to know your data
If it doesn’t fit, change it
Place your bets before you collect the data
Use the best methods of analysis for your question & your data
Go beyond the p-value
BEST PRACTICES
RESOURCES
Rice Virtual Lab in Statistics
Excel Tutorials for Statistical Analysis
Khan Academy - videos
Basic Research Methods for Librarians
Descriptive Statistical Techniques for Librarians