TRANSCRIPT
The following lecture has been approved for
University Undergraduate Students
This lecture may contain information, ideas, concepts and discursive anecdotes that may be thought provoking and challenging
It is not intended for the content or delivery to cause offence
Any issues raised in the lecture may require the viewer to engage in further thought, insight, reflection or critical evaluation
Background to Statistics for non-statisticians
Craig Jackson, Prof. Occupational Health Psychology
Faculty of Education, Law & Social Sciences, BCU
[email protected]@bcu.ac.uk
Keep it simple
“Some people hate the very name of statistics but.....their power of dealing with complicated phenomena is extraordinary. They are the only tools by which an opening can be cut through the formidable thicket of difficulties that bars the path of those who pursue the science of man.”
Sir Francis Galton, 1889
How Many Make a Sample?
“8 out of 10 owners who expressed a preference, said their cats preferred it.”
How confident can we be about such statistics?
8 out of 10?
80 out of 100?
800 out of 1000?
80,000 out of 100,000?
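One way to put a number on "how confident" is a 95% confidence interval for the proportion. The sketch below is an illustration only: the lecture does not name a method, so the Wilson score interval used here is an assumption, as is the helper name wilson_interval.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a proportion (Wilson score method)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# The four sample sizes from the slide, all with the same 80% preference rate.
for successes, n in [(8, 10), (80, 100), (800, 1000), (80_000, 100_000)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes} out of {n}: roughly {low:.3f} to {high:.3f}")
```

The observed proportion is 0.8 every time, but the interval narrows as the sample grows, which is the point the slide is making.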
Types of Data / Variables
Continuous: BP, Height, Weight, Age
Discrete: Children, Age last birthday, Colds in last year
Ordinal: Grade of condition, Positions (1st, 2nd, 3rd), “Better - Same - Worse”, Height groups, Age groups
Nominal: Sex, Hair colour, Blood group, Eye colour
Conversion & Re-classification
Easier to summarise Ordinal / Nominal data
Cut-off Points (who decides this?)
Allows Continuous variables to be changed into Nominal variables
BP > 90mmHg = Hypertensive
BP <= 90mmHg = Normotensive
Easier clinical decisions
Categorisation reduces quality of data
Statistical tests may be more “sensational”
Good for summaries, bad for “accuracy”
BMI: Obese vs Underweight
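A minimal sketch of the cut-off idea, using the 90 mmHg threshold from the slide. The function name classify_bp and the readings are invented for illustration.

```python
def classify_bp(bp_mmhg, cutoff=90):
    """Turn a continuous BP reading into the slide's two nominal categories."""
    return "Hypertensive" if bp_mmhg > cutoff else "Normotensive"

readings = [72, 89, 90, 91, 110]  # made-up readings, for illustration only
for bp in readings:
    print(bp, "mmHg ->", classify_bp(bp))
```

Readings of 89 and 91 end up in different categories while 91 and 110 share one, which is how categorisation makes decisions easier but discards information.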
Types of statistics / analyses
DESCRIPTIVE STATISTICS: Describing a phenomenon
Frequencies: How many…
Basic measurements: Metres, seconds, cm³, IQ
INFERENTIAL STATISTICS: Inferences about phenomena
Hypothesis Testing: Proving or disproving theories
Confidence Intervals: If sample relates to the larger population
Correlation: Associations between phenomena
Significance testing: e.g. diet and health
Multiple Measurement
or…. why statisticians and love don’t mix
25 cells
22 cells
24 cells
21 cells
Total = 92 cells
Mean = 23 cells
SD = 1.8 cells
[Figure: axis scale running from 20 to 26]
N       Age      IQ
1       20       100
2       20       100
3       20       100
4       20       100
5       20       100
6       20       100
7       20       100
8       20       100
9       20       100
10      20       100
Total   200      1000
Mean    20       100
SD      0        0

N       Age      IQ
1       18       100
2       20       110
3       22       119
4       24       101
5       26       105
6       21       113
7       19       120
8       25       119
9       20       114
10      21       101
Total   216      1102
Mean    21.6     110.2
SD      ± 4.2    ± 19.2

N       Age      IQ
1       18       100
2       20       110
3       22       119
4       24       101
5       26       105
6       21       113
7       19       120
8       25       119
9       20       114
10      45       156
Total   240      1157
Mean    24       115.7
SD      ± 8.5    ± 30.2

Small samples spoil research
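The effect of the one unusual participant can be checked directly. This sketch uses the Age columns of the second and third tables. Comparing the mean with the median is an added illustration, not something the slides do.

```python
from statistics import mean, median

ages_before = [18, 20, 22, 24, 26, 21, 19, 25, 20, 21]  # second table
ages_after = [18, 20, 22, 24, 26, 21, 19, 25, 20, 45]   # third table: participant 10 is now 45

for label, ages in [("before", ages_before), ("after", ages_after)]:
    print(label, "mean =", mean(ages), "median =", median(ages))
```

With only ten people, a single 45-year-old moves the mean age from 21.6 to 24, while the median barely shifts.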
Central Tendency
Mode   Median   Mean
Patient comfort rating
Rating      10   9    8    7     6     5     4     3    2    1
Frequency   31   27   70   121   140   129   128   90   80   62
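As a rough sketch, the three measures of central tendency can be worked out from the frequency table. It assumes the frequencies line up with ratings 10 down to 1 in the order shown on the slide.

```python
# Frequencies read off the slide, assumed to match ratings 10 down to 1 in order.
ratings = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
frequencies = [31, 27, 70, 121, 140, 129, 128, 90, 80, 62]
counts = dict(zip(ratings, frequencies))

n = sum(counts.values())

# Mode: the rating given most often.
mode = max(counts, key=counts.get)

# Mean: total of all ratings divided by the number of patients.
mean = sum(rating * freq for rating, freq in counts.items()) / n

# Median: the middle value once every individual rating is laid out in order.
expanded = sorted(r for r, f in counts.items() for _ in range(f))
middle = n // 2
median = expanded[middle] if n % 2 else (expanded[middle - 1] + expanded[middle]) / 2

print(f"N = {n}, mode = {mode}, median = {median}, mean = {mean:.2f}")
```

Under that assumption the mode is 6 while the median and mean sit close to 5, so the three measures need not agree even on the same data.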
Dispersion
Range: Spread of data
Mean: Arithmetic average
Median: Location
Mode: Frequency
SD: Spread of data about the mean
Range 50-112 mmHg, Mean 82 mmHg, Median 82 mmHg, Mode 82 mmHg, SD ± 10 mmHg
Dispersion
An individual score therefore possesses a deviation (away from the mean), which can be positive or negative, depending on which side of the mean the score is
If you add the positive and negative deviations together, the total equals zero (the positives and negatives cancel out)
[Figure: distribution of heights from 5’6” to 6’4”, marking the central value (mean), negative deviation, positive deviation, the range, and the 1st, 5th, 25th, 50th, 75th, 95th and 99th percentiles]
Dispersion
Range
The interval between the highest and lowest measures
Limited value as it involves the two most extreme (likely faulty) measures
Percentile
The value below / above which a particular percentage of values fall (median is the 50th percentile)
e.g. 5th percentile - 5% of values fall below it, 95% of values fall above it.
A series of percentiles (1st, 5th, 25th, 50th, 75th, 95th, 99th) gives a good general idea of the scatter and shape of the data
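A small sketch of the percentile idea. Several conventions for computing percentiles exist, and the "nearest-rank" rule used here is just one assumption. The heights are invented for the example.

```python
def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    # Integer arithmetic for ceil(pct * n / 100), avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

# Hypothetical heights in inches (5'6" to 6'4"), invented for this example.
heights = [66, 67, 68, 68, 69, 69, 69, 70, 70, 70,
           70, 71, 71, 71, 72, 72, 73, 74, 75, 76]

for pct in (1, 5, 25, 50, 75, 95, 99):
    print(f"{pct}th percentile: {percentile(heights, pct)} in")
print("range:", min(heights), "to", max(heights))
```

The range depends only on the two extreme values, whereas the series of percentiles describes where the bulk of the data sits.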
Standard Deviation
To get around this, we square each of the deviations from the mean
Makes all the values positive (a minus times a minus….)
Then sum all those squared deviations and divide by N – 1 to get their average
This gives the variance - where every deviation is squared
Need to take the square root of the variance, to get the standard deviation
$SD = \sqrt{\dfrac{\sum x^2 - (\sum x)^2 / N}{N - 1}}$
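The formula can be checked on the four cell counts from the earlier slide (25, 22, 24, 21). This is a sketch that assumes the slide's formula is the usual sample SD with N – 1 in the denominator. The function name sd_from_sums is made up for the example.

```python
import math
from statistics import stdev

def sd_from_sums(values):
    """Sample SD via the sums formula: sqrt((sum of x^2 - (sum of x)^2 / N) / (N - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(x * x for x in values)
    variance = (sum_x_squared - sum_x**2 / n) / (n - 1)
    return math.sqrt(variance)

cells = [25, 22, 24, 21]  # the four cell counts from the earlier slide
deviations = [x - sum(cells) / len(cells) for x in cells]

# Positive and negative deviations cancel, which is why they are squared first.
print("deviations:", deviations, "sum =", sum(deviations))
print("SD from sums formula:", round(sd_from_sums(cells), 1))
print("SD from statistics.stdev:", round(stdev(cells), 1))
```

Both routes give the SD of 1.8 cells quoted on that slide.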
Non Normal Distribution
Some distributions fail to be symmetrical
If the tail on the left is longer than the right, the distribution is negatively skewed (to the left)
If the tail on the right is longer than the left, the distribution is positively skewed (to the right)
Grouped Data
Normal Distribution
SD is useful because of the shape of many distributions of data.
Symmetrical, bell-shaped / normal / Gaussian distribution
[Figure: bell curve with the central value (mean) at 0 SD and the axis marked at 1, 2 and 3 SD either side]
Normal Distributions
Standard Normal Distribution has a mean of 0 and a standard deviation of 1
The total area under the curve amounts to 100% / unity of the observations
Proportions of observations within any given range can be obtained from the distribution by using statistical tables of the standard normal distribution
95% of measurements / observations lie within 1.96 SDs either side of the mean
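Instead of printed tables, the same proportions can be read from Python's standard library. A minimal sketch, where the helper name share_within is an invention for this example:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)  # mean 0, SD 1

def share_within(k):
    """Proportion of a normal distribution lying within k SDs either side of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The 1.96 line comes out at 95.0%, matching the figure on the slide.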
balls dropped through a succession of metal pins…..
…..a normal distribution of balls
do not have a normal distribution here. Why?
Quincunx machine 1877
The distribution derived from the quincunx is not perfect
It was only made from 18 balls
Normal & Non-normal distributions
[Figure: distribution of height (5’6” to 6’4”) against % of population]
Distributions
Sir Francis Galton (1822-1911), alumnus of Birmingham University
9 books and > 200 papers
(Fingerprints, correlational calculus, twins, neuropsychology, blood transfusions, travel in undeveloped countries, criminality and meteorology)
Deeply concerned with improving standards of measurement
Normal & Non-normal distributions
Galton’s quincunx machine ran with hundreds of balls: a more “perfect” shaped normal distribution.
Obvious implications for the size of samples of populations used
The more lead shot runs through the quincunx machine, the smoother the distribution
in the long run . . . . .
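The quincunx can be imitated in a few lines. This is a simulation sketch under stated assumptions: 12 rows of pins and a 50/50 bounce at each pin are choices made for the example (the slides do not give the machine's dimensions), and the function names are invented.

```python
import random
from math import comb

def run_quincunx(n_balls, n_rows=12, seed=0):
    """Drop n_balls through n_rows of pins; each pin sends the ball left or right at random."""
    rng = random.Random(seed)
    bins = [0] * (n_rows + 1)
    for _ in range(n_balls):
        rights = sum(rng.random() < 0.5 for _ in range(n_rows))
        bins[rights] += 1
    return bins

def max_gap_from_ideal(bins):
    """Largest gap between the observed share in a bin and the ideal binomial share."""
    n_rows = len(bins) - 1
    n_balls = sum(bins)
    ideal = [comb(n_rows, k) / 2**n_rows for k in range(n_rows + 1)]
    return max(abs(count / n_balls - p) for count, p in zip(bins, ideal))

for n_balls in (18, 200, 20_000):
    bins = run_quincunx(n_balls)
    print(f"{n_balls:>6} balls: {bins}  max gap from ideal = {max_gap_from_ideal(bins):.3f}")
```

With 18 balls the pile is lumpy. With thousands it settles towards the ideal bell-like shape, which is the sample-size lesson of the slide.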
Presentation of data
Table of means
             Exposed          Controls         T      P
             n=197            n=178
Age (yrs)    45.5 (± 9.4)     48.9 (± 7.3)     2.19   0.07
I.Q          105 (± 10.8)     99 (± 8.7)       1.78   0.12
Speed (ms)   115.1 (± 13.4)   94.7 (± 12.4)    3.76   0.04
Presentation of data
Category tables
           Exposed   Controls   Total
Healthy    50        150        200
Unwell     147       28         175
Total      197       178        375
Chi square (test of association) shows:
Chi square = 7.2, P = 0.02
[Figure: anatomy of a graph, labelling the title of graph, y-axis (ordinate), y-axis label, scale, x-axis (abscissa), data display area, groups, and legend key]
Bar Charts
A set of measurements can be presented either as a table or as a figure
Graphs are not always as accurate as tables, but portray trends more easily
[Figure: bar chart, “Movie goers’ ratings for both movies”; x-axis: User rating (1 to 10); y-axis: Votes (0 to 7000); groups: Vacation, Empire]
Some Real Data
A combination of distributions is acceptable to facilitate comparisons
With a scatter diagram, each individual observation becomes a point on the scatter plot, based on two co-ordinates, measured on the abscissa and the ordinate
Two perpendicular lines are drawn through the medians - dividing the plot into quadrants
If the two variables are unrelated, each quadrant should contain about 25% of all observations
Correlation and Association
[Figure: scatter plot with the ordinate and abscissa labelled]
Correlation is a numerical expression between 1 and -1 (extending through all points in between). Properly called the Correlation Coefficient.
A decimal measure of association (not necessarily causation) between variables
Correlation of 1 - Maximal: any value of one variable precisely determines the other. Perfect +ve
Correlation of -1 - Any value of one variable precisely determines the other, but in an opposite direction to a correlation of 1. As one value increases, the other decreases. Perfect -ve
Correlation of 0 - No relationship between the variables. Totally independent of each other. “Nothing”
Correlation of 0.5 - A moderate relationship between the variables: one variable accounts for about a quarter of the variation in the other (0.5 squared = 0.25), and the rest can’t be predicted from it. Medium +ve
Correlations between 0 and 0.3 are weak
Correlations between 0.4 and 0.7 are moderate
Correlations between 0.8 and 1 are strong
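A rough sketch of how a correlation coefficient is computed and labelled. The lecture does not say which coefficient it means, so Pearson's r is an assumption here. The data are invented, and the strength bands close the gaps the slide leaves between 0.3 and 0.4 and between 0.7 and 0.8.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Label the size of r using the slide's bands (gaps closed by assumption)."""
    size = abs(r)
    if size < 0.4:
        return "weak"
    if size < 0.8:
        return "moderate"
    return "strong"

hours = [1, 2, 3, 4, 5]       # invented data
rising = [2, 4, 6, 8, 10]     # rises in step with hours
falling = [10, 8, 6, 4, 2]    # falls in step with hours

for label, ys in [("rising", rising), ("falling", falling)]:
    r = pearson_r(hours, ys)
    print(f"{label}: r = {r:.2f} ({strength(r)})")
```

The two invented series give the perfect positive and perfect negative cases described above. Real data will land somewhere in between.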
Correlation and Association
How can the above variables be correlated?
Sampling Keywords
POPULATIONS: Can be mundane or extraordinary
SAMPLE: Must be representative
INTERNAL VALIDITY OF SAMPLE: Sometimes validity is more important than generalizability
SELECTION PROCEDURES: Random, Opportunistic, Conscriptive, Quota
Sampling Keywords
THEORETICAL: Developing, exploring, and testing ideas
EMPIRICAL: Based on observations and measurements of reality
NOMOTHETIC: Rules pertaining to the general case (nomos - Greek)
PROBABILISTIC: Based on probabilities
CAUSAL: How causes (treatments) affect the outcomes
Clinical Research
Types of clinical research
Experimental vs. Observational
Longitudinal vs. Cross-sectional
Prospective vs. Retrospective
Experimental
  Longitudinal
    Prospective: Randomised Controlled Trial
Observational
  Longitudinal
    Retrospective: Case control studies
    Prospective: Cohort studies
  Cross-sectional: Survey
Experimental Designs
Between subjects studies
patients -> Treatment group -> Outcome measured
patients -> Control group -> Outcome measured
Within subjects studies
patients -> Outcome measured #1 -> Treatment -> Outcome measured #2
Observational studies
Cohort (prospective)
cohort -> end point measured (prospectively measure risk factors)
aetiology, prevalence, development, odds ratios
Case-Control (retrospective)
start point measured <- cases (retrospectively measure risk factors)
aetiology, odds ratios, prevalence, development
Case-Control Study – Smoking & Cancer
“Cases” have Lung Cancer
“Controls” could be other hospital patients (other disease) or “normals”
Matched Cases & Controls for age & gender
Option of 2 Controls per Case
Smoking years of Lung Cancer cases and controls (matched for age and sex)
                Cases           Controls       F     P
                n=456           n=456
Smoking years   13.75 (± 1.5)   6.12 (± 2.1)   7.5   0.04
Cohort Study: Methods
Volunteers in 2 groups e.g. exposed vs non-exposed
All complete health survey every 12 months
End point at 5 years: groups compared for Health Status
Comparison of general health between users and non-users of mobile phones
                    ill    healthy   Total
mobile phone user   292    108       400
non-phone user      89     313       402
Total               381    421       802
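One way to summarise a table like this is with a relative risk and an odds ratio. The sketch below uses the counts from the slide. The lecture lists odds ratios among the outputs of observational studies, but which statistic was applied to this table is not stated, so both calculations are illustrations only.

```python
# Counts from the slide's table: rows are exposure groups, columns are outcomes.
phone_ill, phone_healthy = 292, 108
no_phone_ill, no_phone_healthy = 89, 313

# Risk = share of each group that is ill at the end point.
risk_phone = phone_ill / (phone_ill + phone_healthy)
risk_no_phone = no_phone_ill / (no_phone_ill + no_phone_healthy)

relative_risk = risk_phone / risk_no_phone
odds_ratio = (phone_ill * no_phone_healthy) / (phone_healthy * no_phone_ill)

print(f"risk of ill health, users:     {risk_phone:.1%}")
print(f"risk of ill health, non-users: {risk_no_phone:.1%}")
print(f"relative risk = {relative_risk:.2f}, odds ratio = {odds_ratio:.2f}")
```

Both figures describe an association in these counts. On their own they say nothing about whether phone use causes ill health.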
Randomized Controlled Trials in GP & Primary Care
90% consultations take place in GP surgery
50 years old
Potential problems
2 Key areas: Recruitment Bias, Randomisation Bias
Over-focus on failings of RCTs
RCT Deficiencies
Trials too small
Trials too short
Poor quality
Poorly presented
Address wrong question
Methodological inadequacies
Inadequate measures of quality of life (changing)
Cost-data poorly presented
Ethical neglect
Patients given limited understanding
Poor trial management
Politics
Marketeering
Why still the dominant model?
Quantitative Data Summary
• What data is needed to answer the larger-scale research question
• Combination of quantitative and qualitative?
• Cleaning, re-scoring, re-scaling, or re-formatting
• Measurement of both IVs and DVs is complex but can be simplified
• Binary measurement makes analysis easier but less meaningful
• Binary data needs clear parameters e.g. exposed vs controls
• Collecting good quality data at source is vital
Quantitative Data Summary
• Continuous & Discrete data can also be converted into Binary data
• Normal distribution of participants / data points desirable
• Means - age, height, weight, BMI, IQ, attitudes
• Frequencies / Classifications - job type, sick vs. healthy, dead vs alive
• Means must be followed by Standard Deviation (SD or ±)
• Presentation of data must enhance understanding or be redundant
If you or anyone you know has been affected by any of the issues
covered in this lecture, you may need a statistician’s help:
www.statistics.gov.uk
Further Reading
Abbott, P., & Sapsford, R.J. (1988). Research methods for nurses and the caring professions. Buckingham: Open University Press.
Altman, D.G. (1991). Designing Research. In D.G. Altman (ed.), Practical Statistics For Medical Research (pp. 74-106). London: Chapman and Hall.
Bland, M. (1995). The design of experiments. In M. Bland (ed.), An introduction to medical statistics (pp. 5-25). Oxford: Oxford Medical Publications.
Bowling, A. (1994). Measuring Health. Milton Keynes: Open University Press.
Daly, L.E., & Bourke, G.J. (2000). Epidemiological and clinical research methods. In L.E. Daly & G.J. Bourke (eds.), Interpretation and uses of medical statistics (pp. 143-201). Oxford: Blackwell Science Ltd.
Jackson, C.A. (2002). Research Design. In F. Gao-Smith & J. Smith (eds.), Key Topics in Clinical Research (pp. 31-39). Oxford: BIOS Scientific Publications.
Jackson, C.A. (2002). Planning Health and Safety Research Projects. Health and Safety at Work Special Report 62, (pp. 1-16).
Jackson, C.A. (2003). Analyzing Statistical Data in Occupational Health Research. Management of Health Risks Special Report 81, (pp. 2-8).
Kumar, R. (1999). Research Methodology: a step by step guide for beginners. London: Sage.
Polit, D., & Hungler, B. (2003). Nursing research: Principles and methods (7th ed.). Philadelphia: Lippincott, Williams & Wilkins.