Statistics for the Health Scientist: Basic Statistics I
DESCRIPTION
An introduction to medical statistics - Part 1. An introduction to statistics, data and variables
TRANSCRIPT
Topic 1: An Introduction to Statistics
Dr Luke Kane
April 2014
OK… Rules – Very serious!
• YOU MUST ASK QUESTIONS
• If you don't understand - let's work it out!
• Otherwise – no rules
What is Statistics?
• Statistics is the science of learning from data, and of measuring, controlling, and communicating uncertainty;
• it thereby provides the navigation essential for controlling the course of scientific and societal advances
Outline
• Describing variables and data
• Descriptive statistics
– Tables
– Charts
– Shapes
Objectives
• Define variable
• Define data
• Classify variables as quantitative or categorical
• Sub-classify quantitative variables into discrete or continuous
• Sub-classify categorical variables into nominal or ordinal
• Use the type of variable to determine which table and chart to display it with
• Understand the normal distribution and other shapes
What is a Variable?
• A variable is something whose value can vary
• Examples (many!):
What is Data?
• Data are the values you get when you measure variables
• Example:
Types of Variable
• Lots of different ways of thinking about variables:
– Categorical vs. Metric
– Continuous vs. Categorical
I like this one:
Quantitative vs. Categorical
Categorical Variables - "What type?"
• "Categories"
• Nominal:
– Unordered, order not important
– Male or female, dead/alive, blood group (A, B, AB, O)
• Ordinal:
– Ordered, order is important
– Grade of breast cancer; agree / neither agree nor disagree / disagree
Categorical Variables
– Types of houses
– Days of the week
– Opinions/viewpoints
– Hair colour
– Malaria positive or negative
Quantitative Variables - "How much?"
• Also known as metric
• Quantitative variables can be:
• Continuous:
– The values come from measuring
– Have units of measurement
– Good for analysis
• Discrete:
– The values come from counting
– The values are usually integers (whole numbers)
Quantitative Variables
• Weight, height
• Number of cigarettes per day
• Blood pressure
• How many malaria parasites in the blood
• Number of workers with malaria
Variables: A Summary Table

| Type | Sub-type | Examples |
| --- | --- | --- |
| Quantitative | Continuous | Blood pressure, height, weight, age |
| Quantitative | Discrete | Number of children; number of asthma attacks per child |
| Categorical | Ordinal | Grade of breast cancer; better, same, worse; disagree, neutral, agree |
| Categorical | Nominal | Sex (male or female); alive or dead; blood group |
Variables – more!
• It is easy to summarise categorical variables
• You can convert quantitative variables into categorical variables
– For example, in diabetes it is dangerous when blood sugar is very low
– So a blood sugar of 1.6 mmol/l is the quantitative measurement
– You can place this in a low, normal or high range (which makes it a categorical variable)
– 1.6 is low - patient needs treatment (sugar!)
• Continuous variables allow better analysis as they keep the actual measured values
• Tests have more power if used on continuous variables
• So it is better to use continuous variables for statistical analysis
• Better to use categorical variables for summarising results and presentation
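As a minimal sketch of the conversion described above, the Python function below places a blood sugar reading into a low, normal or high range. The cut-offs (`LOW_BELOW`, `HIGH_ABOVE`) and the function name `classify_blood_sugar` are assumptions made up for illustration; they are not from the lecture and are not clinical guidance. Only the 1.6 mmol/l reading comes from the slide.

```python
# Hypothetical cut-offs in mmol/l, chosen only for illustration.
LOW_BELOW = 4.0
HIGH_ABOVE = 7.0


def classify_blood_sugar(mmol_per_l: float) -> str:
    """Convert a quantitative reading into an ordinal category."""
    if mmol_per_l < LOW_BELOW:
        return "low"
    if mmol_per_l > HIGH_ABOVE:
        return "high"
    return "normal"


# The reading from the slide.
print(classify_blood_sugar(1.6))
```

Note that the category keeps less information than the measurement: 1.6 and 3.9 would both be reported as "low" under these assumed cut-offs.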
Descriptive Statistics
• This is taking the raw data and consolidating it into a table or chart
Descriptive Statistics
• Frequency tables
• Relative frequency tables
• Grouping the data
• Open-ended groups
• Cumulative frequency
• Cross-tabulation
Frequency Tables
• Nominal categorical variables
• Start with the largest category
• Tell the reader what the total number is (n = X)

| Category (hair colour) | Frequency (number of adults), n = 116 |
| --- | --- |
| Black | 85 |
| Brown | 17 |
| Blonde | 8 |
| Red / Ginger | 4 |
| Other (e.g. blue, green) | 2 |
Relative Frequency

| Category (hair colour) | Frequency (number of adults), n = 116 | Relative frequency (%) |
| --- | --- | --- |
| Black | 85 | 73.3 |
| Brown | 17 | 14.7 |
| Blonde | 8 | 6.9 |
| Red / Ginger | 4 | 3.4 |
| Other (e.g. blue, green) | 2 | 1.7 |
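A relative frequency is just the category count divided by the total, expressed as a percentage. The sketch below recomputes the column from the hair colour counts in the table; the names `hair_colour_counts` and `relative_frequencies` are illustrative, and rounding to one decimal place is an assumption about how the slide's figures were produced.

```python
# Counts taken from the hair colour frequency table (n = 116).
hair_colour_counts = {
    "Black": 85,
    "Brown": 17,
    "Blonde": 8,
    "Red / Ginger": 4,
    "Other": 2,
}


def relative_frequencies(counts):
    """Return each count as a percentage of the total, to 1 decimal place."""
    total = sum(counts.values())
    return {
        category: round(100 * n / total, 1)
        for category, n in counts.items()
    }


for category, percent in relative_frequencies(hair_colour_counts).items():
    print(f"{category}: {percent}%")
```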
Ordinal Categorical Variables
• Hair colour is a nominal categorical variable so does not need to be ordered.
• Satisfaction is an ordinal categorical variable so you can make a frequency table but you must put the categories in order.
• For example: How would you put these in order?
– Unsatisfied
– Very satisfied
– Satisfied
– Extremely unsatisfied
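One way to keep an ordinal variable in order is to state the order once and build the frequency table from it. The sketch below does that for the four satisfaction levels; the survey responses are invented for the example, and `ordinal_frequency_table` is a hypothetical helper name.

```python
from collections import Counter

# The order is stated explicitly, from least to most satisfied.
SATISFACTION_ORDER = [
    "Extremely unsatisfied",
    "Unsatisfied",
    "Satisfied",
    "Very satisfied",
]

# Hypothetical survey responses, invented for this example.
responses = [
    "Satisfied", "Very satisfied", "Unsatisfied", "Satisfied",
    "Extremely unsatisfied", "Satisfied", "Very satisfied",
]


def ordinal_frequency_table(values, order):
    """Count each category and return the rows in the given order."""
    counts = Counter(values)
    return [(category, counts[category]) for category in order]


for category, frequency in ordinal_frequency_table(responses, SATISFACTION_ORDER):
    print(f"{category}: {frequency}")
```

Sorting the labels alphabetically, or by frequency as for a nominal variable, would lose the ordering that makes the variable ordinal.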
Continuous Data
• Not practical to display all of the raw data
• The table is too big even with the small sample size.
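• Easier to group the data
• Then make a frequency table

| Pig number (n = 21) | Weight of pig at market / kg |
| --- | --- |
| 1 | 120 |
| 2 | 210 |
| 3 | 110 |
| 4 | 209 |
| 5 | 205 |
| 6 | 164 |
| 7 | 145 |
| 8 | 177 |
| 9 | 185 |
| 10 | 184 |
| 11 | 180 |
| 12 | 183 |
| 13 | 182 |
| 14 | 190 |
| 15 | 198 |
| 16 | 134 |
| 17 | 140 |
| 18 | 156 |
| 19 | 154 |
| 20 | 201 |
| 21 | 200 |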
Grouped Frequency Table
• If we put the data into groups of roughly equal width, we get a grouped frequency distribution

| Weight of pigs at market / kg | Number of pigs (frequency), n = 21 |
| --- | --- |
| 110-130 | 2 |
| 131-150 | 3 |
| 151-170 | 3 |
| 171-190 | 7 |
| 191-210 | 6 |
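The grouping can be sketched in a few lines of Python: count how many of the 21 pig weights fall inside each group. The weights and group boundaries are those on the slides; the names `pig_weights_kg` and `grouped_frequencies` are illustrative.

```python
# The 21 pig weights (kg) from the raw data table.
pig_weights_kg = [
    120, 210, 110, 209, 205, 164, 145, 177, 185, 184, 180,
    183, 182, 190, 198, 134, 140, 156, 154, 201, 200,
]

# Inclusive (lowest, highest) bounds of each group.
groups = [(110, 130), (131, 150), (151, 170), (171, 190), (191, 210)]


def grouped_frequencies(values, bounds):
    """Count the values that fall inside each inclusive group."""
    return {
        f"{low}-{high}": sum(low <= v <= high for v in values)
        for low, high in bounds
    }


for group, frequency in grouped_frequencies(pig_weights_kg, groups).items():
    print(f"{group}: {frequency}")
```

The groups must not overlap and must cover every value, so that each pig is counted exactly once and the frequencies add up to n.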
Outliers
• This is fine if all the data are close together
– i.e. if all the pigs weigh about the same
– But what do you do if there are some giant pigs and some tiny pigs?
– For example, if you added two extra pigs to our data set:
• a small pig weighing 54 kg
• a big pig weighing 327 kg

| Weight of pigs at market / kg | Number of pigs (frequency), n = 23 |
| --- | --- |
| 51-70 | 1 |
| 71-90 | 0 |
| 91-110 | 1 |
| 111-130 | 1 |
| 131-150 | 3 |
| 151-170 | 3 |
| 171-190 | 7 |
| 191-210 | 6 |
| 211-230 | 0 |
| 231-250 | 0 |
| 251-270 | 0 |
| 271-290 | 0 |
| 291-310 | 0 |
| 311-330 | 1 |
Open Ended Groups
• The very big and very small pigs are called outliers.
• To make things easier you can use open-ended groups at the top and bottom

| Weight of pigs at market / kg | Number of pigs (frequency), n = 23 |
| --- | --- |
| ≤110 | 2 |
| 111-130 | 1 |
| 131-150 | 3 |
| 151-170 | 3 |
| 171-190 | 7 |
| 191-210 | 6 |
| ≥211 | 1 |
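Open-ended groups can be sketched by leaving one bound of the first and last group unset. In the Python below, `None` stands for "no limit" (a convention chosen for this example); the data are the 21 original pig weights plus the two extra pigs (54 kg and 327 kg), and `open_ended_frequencies` is a hypothetical helper name.

```python
# The 21 original pig weights (kg) plus the two outliers from the slide.
pig_weights_kg = [
    120, 210, 110, 209, 205, 164, 145, 177, 185, 184, 180,
    183, 182, 190, 198, 134, 140, 156, 154, 201, 200,
    54, 327,
]

# None means the group has no limit on that side (open-ended).
groups = [
    (None, 110), (111, 130), (131, 150), (151, 170),
    (171, 190), (191, 210), (211, None),
]


def group_label(low, high):
    """Build a readable label such as '≤110', '111-130' or '≥211'."""
    if low is None:
        return f"≤{high}"
    if high is None:
        return f"≥{low}"
    return f"{low}-{high}"


def open_ended_frequencies(values, bounds):
    """Count values per group, treating None as an open end."""
    table = {}
    for low, high in bounds:
        table[group_label(low, high)] = sum(
            (low is None or v >= low) and (high is None or v <= high)
            for v in values
        )
    return table


for group, frequency in open_ended_frequencies(pig_weights_kg, groups).items():
    print(f"{group}: {frequency}")
```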
Symbols
• > More than
• < Less than
• ≥ Equal to or more than
• ≤ Equal to or less than
Cumulative Frequency
• Adding up (cumulate) the frequencies as you go along
• Enables you to make a nice chart - see later
• For example, the lengths of snakes below

| Length of snake / cm | Frequency (number of snakes), n = 61 | Cumulative frequency of snakes |
| --- | --- | --- |
| ≤30 | 10 | 10 |
| 31-60 | 17 | 27 (= 10 + 17) |
| 61-90 | 19 | 46 (= 10 + 17 + 19) |
| 91-120 | 12 | 58 (= 10 + 17 + 19 + 12) |
| ≥121 | 3 | 61 (= 10 + 17 + 19 + 12 + 3) |
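The running total can be sketched with Python's standard `itertools.accumulate`, which adds up the frequencies as it goes along. The frequencies are the snake counts from the table, in table order (shortest group first); the variable names are illustrative.

```python
from itertools import accumulate

# Number of snakes in each length group, shortest group first.
snake_frequencies = [10, 17, 19, 12, 3]

# Each entry is the sum of all frequencies up to and including that group.
cumulative_frequencies = list(accumulate(snake_frequencies))

print(cumulative_frequencies)
```

The last cumulative frequency should always equal the sample size n, which is a quick check that no group was missed.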
Cross-Tabulation
• Everything so far has been a table of a single variable
• Sometimes you want to look at how two variables relate to each other within one sample
• A cross-tab (cross-tabulation) is a table that combines two variables
Cross Tab Example
• Does drinking alcohol affect the number of accidents people have on motorbikes?
• What are the two variables?
• The two variables are accidents and drinking
• If there was a big party and you breathalysed 500 people leaving, you could determine if they were above or below the drink-drive limit. You could then ask them the next day if there was an accident on their way home.
Cross Tab Example

| Accident on way home? | Above the alcohol limit: Yes | Above the alcohol limit: No |
| --- | --- | --- |
| Yes | 40 | 2 |
| No | 116 | 342 |
| Total | 156 | 344 |
Cross Tab Example
• You can then convert this into percentages by dividing each count by its column total.
• It is very easy to see that over 99% of the sober drivers did not have accidents and more than 1 in 4 of the drunk drivers had accidents
• Don't drink and drive!

| Accident on way home? | Above the alcohol limit: Yes | Above the alcohol limit: No |
| --- | --- | --- |
| Yes | 25.6% | 0.6% |
| No | 74.4% | 99.4% |
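The percentages in this table are column percentages: each count is divided by the total for its drinking group (156 above the limit, 344 below). A minimal sketch, using the counts from the cross-tab; the dictionary layout and the name `column_percentages` are illustrative choices.

```python
# Counts from the cross-tab, keyed by drinking group and then by outcome.
crosstab = {
    "above limit": {"accident": 40, "no accident": 116},
    "below limit": {"accident": 2, "no accident": 342},
}


def column_percentages(table):
    """Express each count as a percentage of its own group's total."""
    result = {}
    for group, outcomes in table.items():
        total = sum(outcomes.values())
        result[group] = {
            outcome: round(100 * n / total, 1)
            for outcome, n in outcomes.items()
        }
    return result


for group, outcomes in column_percentages(crosstab).items():
    print(group, outcomes)
```

Dividing by the column total rather than the grand total of 500 is what lets the two groups be compared even though they are very different sizes.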
Charts
• Charts are a good way of describing data
• Categorical data is easily plotted as:
– Pie chart
– Bar chart
– Clustered bar chart
– Stacked bar chart
Pie Charts
• Good: for categorical nominal data, easy to make, easy to understand
• Bad: Can only use one variable - need separate pie chart for each variable, confusing if many categories used
Simple Bar Chart
• Good: for categorical nominal data, easy to make, easy to understand
• Bad: only one variable
• Note: there must be spaces between the bars, and the bars must be of equal width
Clustered Bar Chart
• Very similar to a simple bar chart but allows you to compare sub-groups, e.g. boys and girls
• Good for comparing category sizes between groups, e.g. blonde boys and blonde girls
Stacked Bar Chart
• Good for comparing total number of subjects in each group, e.g. all boys and all girls
Quantitative Charts
• Bar Charts can also be used to graph discrete quantitative data
• But for continuous quantitative data it is better to use a histogram
• Cumulative quantitative data can be charted with a step chart or a frequency curve
Histograms
• Frequency histogram
• Uses data that is grouped together to save space
• There are no gaps between the bars - it is a continuous variable
• Bad: can only use one variable at a time
Frequency Curve
• For cumulative data you can make a frequency curve
• Continuous quantitative data is assumed to have a smooth continuum of values
• This should make a nice, smooth curve - the cumulative frequency curve
• This is also known as an ogive
Frequency Curve - Snakes
• So if we take the snakes:

| Length of snake / cm | Frequency (number of snakes), n = 61 | Cumulative frequency of snakes | % cumulative frequency of snakes |
| --- | --- | --- | --- |
| ≤30 | 10 | 10 | 10/61 = 16.4% |
| 31-60 | 17 | 27 | 27/61 = 44.3% |
| 61-90 | 19 | 46 | 46/61 = 75.4% |
| 91-120 | 12 | 58 | 58/61 = 95.1% |
| ≥121 | 3 | 61 | 61/61 = 100% |
[Chart: % cumulative frequency of snakes (y-axis) against length of snake / cm (x-axis, one point per length group)]
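The percentage column plotted on the curve is each cumulative frequency divided by the total number of snakes. A short sketch, using the cumulative counts from the table; rounding to one decimal place is an assumption made to match the slide's figures.

```python
# Cumulative number of snakes up to each length group, from the table.
cumulative_counts = [10, 27, 46, 58, 61]

# The last cumulative count is the sample size n.
n = cumulative_counts[-1]

# Percentage of all snakes at or below each length group.
percent_cumulative = [round(100 * c / n, 1) for c in cumulative_counts]

print(percent_cumulative)
```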
Shapes
• OK now we have charts – how do you describe data from the shape of the graph?
• A uniform distribution is evenly distributed
• A normal distribution:
– "A normal curve represents perfectly symmetrical distribution"
– Also known as a "bell shape"
• Then you have "skews" to the left or right
– Left skews are negatively skewed
– Right skews are positively skewed
• Bimodal distributions have two humps
Normal distribution
Skew
• A measure of the asymmetry of a distribution
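To make "negative" and "positive" skew concrete, the sketch below computes the moment coefficient of skewness (the third central moment divided by the second central moment raised to the power 1.5). This is one common definition and is an assumption here; the lecture does not say which measure it has in mind. The two small data sets are invented for the example.

```python
def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


# Hypothetical data with a long tail to the right (positive skew).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]

# The mirror image has a long tail to the left (negative skew).
left_skewed = [11 - x for x in right_skewed]

print(skewness(right_skewed) > 0)
print(skewness(left_skewed) < 0)
```

A perfectly symmetrical data set, such as 1, 2, 3, 4, 5, gives a skewness of zero under this definition.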
Bimodal distribution
Summary so far…
• Types of data and variables
• Ways to put this data in tables
• Ways to put this data in charts
• Ways to examine the shape of the data
• Next: TOPIC 2
– Using numbers to summarise the data
– Prevalence and Incidence
References
• This lecture is based on David Bowers, "Medical Statistics from Scratch: An Introduction for Health Professionals"
• Bowers, D. (2008) Medical Statistics from Scratch: An Introduction for Health Professionals. USA: Wiley-Interscience.