basic statistical concepts and methods
Post on 27-Jan-2015
Ahmed-Refat-ZU
Basic Statistical Concepts and Methods
Ahmed-Refat AG Refat
FOM-ZU
Definition of Statistics
Statistics is the science of dealing with numbers. It is used for the collection, summarization, presentation and analysis of data.
Statistics provides a way of organizing data to obtain information on a wider and more formal (objective) basis than relying on personal experience (subjective).
Uses of medical statistics
Medical statistics are used in:
1- Planning, monitoring and evaluating community health care programs.
2- Epidemiological research studies.
3- Diagnosis of community health problems.
4- Comparison of health status and diseases in different countries, and in one country over the years.
5- Forming standards for biological measurements such as weight and height.
6- Differentiating between diseased and normal groups.
Types of data
Any characteristic of an individual that is measured is called a variable. Variables are either
1- quantitative or 2- qualitative.
1- Quantitative data: numerical data.
Discrete data: usually whole numbers, such as the number of cases of a certain disease or the number of hospital beds (no decimal fraction).
Continuous data: measured on a continuous scale, e.g. height, weight, age (a decimal fraction can be present).
1- Quantitative data
Quantitative data are numerical data. There are two types:
A- Discrete data: usually whole numbers, such as the number of cases of a certain disease or the number of hospital beds (no decimal fraction).
B- Continuous data: measured on a continuous scale, e.g. height, weight, age (a decimal fraction can be present).
2- Qualitative data
Qualitative data are non-numerical data and are subdivided into two types:
A- Categorical (nominal) data: purely descriptive and imply no ordering of any kind, such as sex or area of residence.
B- Ordinal data: imply some kind of ordering, such as:
- level of education
- socio-economic status
- degree of severity of disease
Presentation Of Data
The first step in statistical analysis is to present the data in a form that is easy to understand.
The two basic ways of presenting data are:
1. Tabular presentation.
2. Graphical presentation.
Tabulation
Some rules for constructing tables:
1- The table must be self-explanatory.
2- Title: written at the top of the table to define precisely the content, the place and the time.
3- Clear headings for the columns and rows, with units of measurement.
4- The size of the table depends on the number of classes, usually between 2 and 10 rows (classes). The choice depends on the form of the data and the requirements of the distribution: too few classes may obscure information, and too many make the table little different from the raw data.
Types of tables
For qualitative data, draw a simple table, e.g. a list table: count the number of observations (frequencies) in each category.
For quantitative data, form a frequency distribution table.
List tables (2 columns: one value for each measured variable)
Frequency distribution tables
Types of tables
List: a table consisting of two columns, the first identifying the observational unit and the second giving the value of the variable for that unit. Example: the number of patients in each hospital department:
Department       Number of patients
Medicine         100
Surgery          80
ENT              28
Ophthalmology    30
Frequency Distribution tables
FDTs are used to present qualitative (and discrete quantitative) data by recording the number of observations in each category.
These counts are called frequencies.
(No classes and no intervals are needed.)
An FDT for continuous quantitative data consists of a series of classes (intervals) together with the number of observations (frequency) whose values fall within each class interval.
Frequency Distribution tables
EXAMPLE (1): Assume we have a group of 20 individuals whose blood groups were as follows: A, AB, AB, O, B, A, A, B, B, AB, O, AB, AB, A, B, B, B, A, O, A. We want to present these data in a table.
What type of data is this?
How to Construct a Frequency Distribution Table: Four Steps
(Title, Table, Number, %)
1- Put a title.
2- Draw columns and rows.
3- Enumerate the individuals in each category.
4- Calculate the relative frequency (%).
How to Construct a Frequency Distribution Table: Four Steps
1- Put a title, e.g. "Distribution of the studied individuals according to their blood group."
2- Draw a table (columns and rows): first column heading, the studied variable "Blood group"; second column heading "Frequency (number)"; third column heading "Percentage (%)".
Frequency Distribution tables
3- Enumerate the individuals in each blood group: individuals with blood group A are 6, with blood group B are 6, with AB are 5, and with O are 3.
Make sure that the total number of individuals in all blood groups is 20 (the size of the studied group).
Frequency Distribution tables
4- Calculate the relative frequency (%) of each blood group by dividing the frequency of that group by the total number of individuals and multiplying by 100, i.e. the percentage of group A = 6 / 20 × 100 = 30%, and likewise group B = 6 / 20 × 100 = 30%, group AB = 5 / 20 × 100 = 25% and group O = 3 / 20 × 100 = 15%. The final table will be:
Blood group    Frequency (number)    Percentage (%)
A              6                     30
B              6                     30
AB             5                     25
O              3                     15
Total          20                    100
Frequency Distribution tables: What is your conclusion?
Frequency Distribution tables
We can conclude from this table that blood groups A and B are the most common groups and the rarest is group O (based on the percentage of each group).
So presenting data in a table helps in deducing facts and simplifies the information compared with the raw data.
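The four construction steps can be sketched in Python for the blood groups of Example (1). This is an illustrative sketch, not part of the original method: `frequency_table` is a hypothetical helper, and listing the most frequent category first is a presentation choice.

```python
from collections import Counter

# Blood groups of the 20 individuals in Example (1)
blood_groups = ["A", "AB", "AB", "O", "B", "A", "A", "B", "B", "AB",
                "O", "AB", "AB", "A", "B", "B", "B", "A", "O", "A"]


def frequency_table(values):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    # Relative frequency = frequency / total x 100
    return [(category, n, n * 100 / total) for category, n in counts.most_common()]


table = frequency_table(blood_groups)
for category, freq, pct in table:
    print(f"{category:<6}{freq:>4}{pct:>8.1f}%")
print(f"{'Total':<6}{len(blood_groups):>4}{100:>8.1f}%")
```

The printed rows reproduce the final table: A and B 30% each, AB 25%, O 15%.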
Frequency Distribution tables
EXAMPLE (3): The following data are systolic blood pressure measurements (mmHg) of 30 patients with hypertension. Present these data in a frequency table:
150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178, 195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186, 177, 174, 155, 164, 163, 172, 160.
What type of data is this?
Frequency Distribution tables
Four steps:
1- Put a title, e.g. "Frequency distribution of blood pressure measurements (mmHg) among a group of hypertensive patients."
2- Draw a table (columns and rows): first column heading, the studied variable "Blood pressure (mmHg)"; second column heading "Frequency (number)"; third column heading "Percentage (%)".
Frequency Distribution tables
3- In the first column we have to classify blood pressure into categories (classes), because we have a relatively large sample (N = 30) and the measured variable is continuous (not discrete, as in the previous examples).
Frequency Distribution tables
Construction of classes:
- Calculate the range of the observations: subtract the lowest blood pressure value from the highest (the highest was 200 and the lowest was 150); the difference is 50.
- Determine the number of classes and the class width: let the class interval be 10, so we will have 50 / 10 = 5 classes.
- Enumerate the frequencies by the tally method.
- Calculate the exact frequency and the relative frequency.
Frequency Distribution tables
Construction of classes:
- Determine the number of classes you want to display (not too few, about 2, and not too many, more than 8); it is a matter of trial and judgement. With a class interval of 10 mmHg we will have 5 classes; if we choose 5 mmHg as the class width we will obtain 10 classes (too long a table).
- Keep the same width for all intervals.
- Choose the lower and upper limits of the classes, starting with the lowest value (150), and list the intervals in order every 10 mmHg.
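A minimal Python sketch of the class construction for Example (3), assuming classes 150-, 160-, 170-, 180- and 190- of width 10 mmHg, with the top value (200) placed in the last class so that 5 classes cover the whole range. The class limits are an assumption, since the final table of the slides is not reproduced here, and `grouped_frequency` is a hypothetical helper name.

```python
# Systolic blood pressure (mmHg) of the 30 hypertensive patients in Example (3)
sbp = [150, 155, 160, 154, 162, 170, 165, 155, 190, 186,
       180, 178, 195, 200, 180, 156, 173, 188, 173, 189,
       190, 177, 186, 177, 174, 155, 164, 163, 172, 160]


def grouped_frequency(values, start, width, n_classes):
    """Count values in equal-width classes start-, start+width-, ...
    Assumes every value is >= start; values at or above the upper
    boundary of the last class (here only 200) go into the last class."""
    counts = [0] * n_classes
    for v in values:
        index = min(int((v - start) // width), n_classes - 1)
        counts[index] += 1
    return counts


print("Range:", max(sbp) - min(sbp))  # 50
counts = grouped_frequency(sbp, start=150, width=10, n_classes=5)
for i, f in enumerate(counts):
    lower = 150 + 10 * i
    print(f"{lower}-  {f:>3}  {f * 100 / len(sbp):5.1f}%")
```

With these limits the class frequencies are 6, 6, 8, 6 and 4 (total 30).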
2-Graphical Presentation
The diagram should be:
Simple
Easy to understand
Save a lot of words
Self explanatory
Has a clear title indicating its content
Fully labeled
The y axis (vertical) is usually used for frequency.
2-Graphical Presentation
Graphic presentations are used to illustrate and clarify information. Tables are essential in the presentation of scientific data, and diagrams complement them by summarizing these tables in an easy, attractive and simple way.
Graphical Presentation 1-Bar chart
It is used for presenting discrete or qualitative data. It represents the measured value (or %) by separate rectangles of constant width whose lengths are proportional to the frequency.
Types: simple, multiple, component.
Graphical Presentation 1-Bar chart- Simple
[Figure: simple bar chart, "Mean maternal age of three studied groups"; x axis: the studied groups (group I, group II, group III); y axis: mean age in years (24 to 27)]
Graphical Presentation 1-Bar chart
Multiple bar chart: each observation has more than one value, represented by a group of bars. Examples: percentage of males and females in different countries; percentage of deaths from heart disease in old and young age; mode of delivery (cesarean or vaginal) in different female age groups.
Graphical Presentation 1-Bar chart- Multiple
[Figure: multiple bar chart comparing males and females for cancer and anemia]
Graphical Presentation 1-Bar chart
Component bar chart: a single bar is subdivided into sections according to their relative proportions, to show the composition of the total.
Graphical Presentation 1-Bar chart
Component bar chart: for example, two countries are compared in their socio-economic standard of living. Each bar represents one country and its height is 100%; it is divided horizontally into 3 components (low, moderate and high socio-economic classes), each represented by a different color or pattern.
Graphical Presentation 1-Bar chart- Component
[Figure: component bar chart, "Comparison between Egypt and USA in socio-economic standard of living"; x axis: Egypt, USA; y axis: percentage of population (0% to 100%); components: low, moderate, high]
Graphical Presentation 2-Pie diagram:
Consists of a circle whose area represents the total frequency (100%), divided into segments.
Each segment represents a proportional part of the total frequency.
Graphical Presentation 2-Pie diagram:
[Figure: pie diagram, "Percentage of causes of child death in Egypt": diarrhea 50%, chest infection 30%, congenital 10%, accident 10%]
Graphical Presentation 3- Histogram:
It is very similar to the bar chart with the difference that the rectangles or bars are adherent (without gaps).
It is used for presenting a class frequency table (continuous data).
Each bar represents a class: its height represents the frequency (number of cases) and its width represents the class interval.
Graphical Presentation 3- Histogram:
[Figure: histogram, "Distribution of studied group according to their height"; x axis: height in cm (classes 100-, 110-, 120-, 130-, 140-, 150-); y axis: number of individuals (0 to 30)]
Graphical Presentation 4 -Frequency Polygon
Derived from a histogram by connecting the midpoints of the tops of its rectangles. The line connecting the centers of the histogram rectangles is called the frequency polygon. We can draw the polygon without the rectangles to get a simpler form of line graph.
A special type of frequency polygon is the normal distribution curve.
Graphical Presentation 5 - Scatter diagram
- It is useful for representing the relationship between two numeric measurements, each observation being represented by a point corresponding to its value on each axis.
This scatter diagram shows a positive (direct) relationship between NAG and the albumin/creatinine ratio among diabetic patients.
[Figure: scatter diagram, "Correlation between NAG and albumin creatinine ratio in group of early diabetics"; x axis: albumin creatinine ratio (0 to 0.35); y axis: NAG (0 to 35)]
In negative correlation, the points are scattered in a downward direction, meaning that the relation between the two studied measurements is inverse, i.e. if one measure increases the other decreases, as shown in the following graph.
[Figure: scatter diagram, "Correlation between Doppler velocimetry (RI) and baby birth weight"; x axis: baby weight in kg (1.5 to 4.5); y axis: RI (0 to 1)]
Graphical Presentation 6- Line graph:
It is a diagram showing the relationship between two numeric variables (like the scatter diagram), but the points are joined together to form a line (either a broken line or a smooth curve).
[Figure: line graph, "Changes in body temperature of a patient after use of antibiotic"; x axis: time in hours (1 to 7); y axis: temperature (36 to 39.5)]
Normal Distribution Curve
The NDC is a graphical presentation (a frequency polygon) of many quantitative biologic variables.
The normal distribution curve is the frequency polygon of a quantitative variable measured in a large number of individuals. It is a form of presentation of the frequency distribution of biologic variables such as weight, height, hemoglobin level and blood pressure, or other continuous data.
It plays a major role in the techniques of statistical analysis.
Characteristics of Normal Distribution curve
1- It is a bell-shaped, continuous curve.
2- It is symmetrical, i.e. it can be divided vertically into two equal halves.
3- The tails never touch the baseline but extend to infinity in either direction.
4- The mean, median and mode coincide.
5- It is described by two parameters: the arithmetic mean, which determines the location of the center of the curve, and the standard deviation, which represents the scatter around the mean.
Areas under the normal curve
Mean ± 1 SD covers about 68% of the area (about 34% on each side of the mean).
Mean ± 2 SD covers about 95% of the area.
Mean ± 3 SD covers about 99.7% of the area.
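These percentages follow from the standard normal distribution: the proportion of the area within mean ± k SD equals erf(k / √2). A short check using only Python's standard library (`area_within` is an illustrative name):

```python
import math


def area_within(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"mean ± {k} SD: {area_within(k) * 100:.1f}% of the area")
# prints 68.3%, 95.4% and 99.7% for k = 1, 2, 3
```

The exact multiplier for 95% is 1.96 SD; 2 SD is the usual rounded value.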
Skewed data
If we represent collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then these data are not normally distributed.
Causes of a Skewed Curve (Not Normally Distributed Data)
The curve may be skewed to the right or to the left.
This may be because the data were collected from:
1. a certain heterogeneous group, or
2. a diseased or abnormal population;
therefore the results obtained from these data cannot be applied to or generalized to the whole population.
The NDC can be used to distinguish normal from abnormal measurements.
Example: suppose we have an NDC for hemoglobin levels in a population of normal adult males with mean ± SD = 11 ± 1.5.
We obtain a hemoglobin reading of 8.1 for an individual and want to know whether he is normal or anemic. If the reading lies within the central 95% of the area under the curve (i.e. mean ± 2 SD), he is considered normal; if his reading is lower, he is anemic.
The normal range for hemoglobin in this example is: upper limit 11 + 2 (1.5) = 14; lower limit 11 − 2 (1.5) = 8.
i.e. the normal range of hemoglobin for adult males is from 8 to 14.
Our reading (8.1) lies within the central 95% of this population.
Therefore this individual is normal, because his reading lies within the central 95% of his population.
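The decision rule of the hemoglobin example can be written as a small Python sketch. The function names are hypothetical, and the mean ± 2 SD rule assumes the reference population is normally distributed, as in the example.

```python
def normal_range(mean, sd, k=2):
    """Reference range mean +/- k*SD (k = 2 covers about 95% of a normal population)."""
    return mean - k * sd, mean + k * sd


def classify(reading, mean, sd):
    """Compare one reading with the mean +/- 2 SD reference range."""
    low, high = normal_range(mean, sd)
    if reading < low:
        return "below normal range (anemic in this example)"
    if reading > high:
        return "above normal range"
    return "within normal range"


low, high = normal_range(11, 1.5)
print(low, high)                # 8.0 14.0
print(classify(8.1, 11, 1.5))   # within normal range
```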
Data Summarization
To summarize data, we need one or two parameters that describe them:
1. Measures of central tendency, which describe the center of the data,
2. and measures of dispersion, which show how the data are scattered around their center.
Measures of central tendency
A variable usually has a central point around which the observed values lie. These averages are also called measures of central tendency. The three most commonly used averages are:
1. The arithmetic mean
2. The median
3. The mode
1- The arithmetic mean:
The sum of the observations divided by the number of observations:
$\bar{x} = \frac{\sum x}{n}$
where $\bar{x}$ = the mean, $\sum$ denotes "the sum of", $x$ = the values of the observations, and $n$ = the number of observations.
Example: in a study, the ages of 5 students were 12, 15, 10, 17 and 13.
Mean = sum of observations / number of observations
Then the mean $\bar{x}$ = (12 + 15 + 10 + 17 + 13) / 5 = 67 / 5 = 13.4 years.
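The same calculation in Python, as a minimal sketch (`arithmetic_mean` is an illustrative name; the standard library's `statistics.mean` gives the same result):

```python
def arithmetic_mean(values):
    """Sum of the observations divided by their number."""
    return sum(values) / len(values)


ages = [12, 15, 10, 17, 13]  # ages of the 5 students in the example
print(arithmetic_mean(ages))  # 13.4
```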
Calculation of Mean For frequency Distribution Data
For frequency distribution data we calculate the mean by this equation:
$\bar{x} = \frac{\sum f x}{n}$, where $f$ = frequency and $n = \sum f$.
For example, we want to calculate the mean incubation period of this group.
Calculation of Mean for Frequency Distribution Data with Class Intervals
If the data are presented in a frequency table with class intervals, we calculate the mean by the same equation, $\bar{x} = \frac{\sum f x_m}{n}$, where $x_m$ denotes the midpoint of each class interval.
Example: calculate the mean blood pressure of the following group:
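The slide's blood pressure table is not reproduced here, so this Python sketch applies the midpoint formula to an assumed grouping of the 30 systolic readings of Example (3): classes 150-, 160-, 170-, 180-, 190- with midpoints 155 to 195 and frequencies 6, 6, 8, 6, 4. `grouped_mean` is a hypothetical helper; passing the observed values instead of midpoints gives the frequency-data mean sum(f × x) / n.

```python
# Assumed grouping of the Example (3) readings (class limits are an assumption)
midpoints = [155, 165, 175, 185, 195]
frequencies = [6, 6, 8, 6, 4]


def grouped_mean(mids, freqs):
    """Mean of frequency-distribution data: sum(f * x) / sum(f)."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, mids)) / n


print(round(grouped_mean(midpoints, frequencies), 2))  # 173.67
```

The result (about 173.7 mmHg) is close to, but not identical with, the mean of the 30 individual readings (5177 / 30 ≈ 172.6 mmHg), because each reading is replaced by the midpoint of its class.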
2- Median
It is the middle observation in a series of observations after arranging them in ascending or descending order.
The rank of the median is (n + 1) / 2 if the number of observations is odd; if the number is even, the median is the mean of the observations at ranks n / 2 and n / 2 + 1.
Calculate the median of the following data: 5, 6, 8, 9, 11 (n = 5, odd).
- The rank of the median = (n + 1) / 2, i.e. (5 + 1) / 2 = 3.
The median is the third value when the data are arranged in ascending (or descending) order.
- So the median is 8 (the third value).
- If the number of observations is even, the median is calculated as follows, e.g. 5, 6, 8, 9 (n = 4).
- The rank of the median = n / 2, i.e. 4 / 2 = 2, so the median is the second value. If the data are arranged in ascending order the second value is 6, and if arranged in descending order it is 8; therefore the median is the mean of both observations, i.e. (6 + 8) / 2 = 7.
For simplicity, we can apply the same equation used for odd numbers, i.e. (n + 1) / 2. The median rank will be (4 + 1) / 2 = 2½, i.e. the median lies between the second and third values (6 and 8); taking their mean gives 7.
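A minimal Python sketch of the rank rule above (`median` here is an illustrative helper; `statistics.median` in the standard library behaves the same way for these inputs):

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                     # rank (n + 1) / 2, counting from 1
    return (data[mid - 1] + data[mid]) / 2   # ranks n / 2 and n / 2 + 1


print(median([5, 6, 8, 9, 11]))  # 8
print(median([5, 6, 8, 9]))      # 7.0
```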
3- Mode
The mode is the most frequently occurring value in the data and is found as follows:
Example: 5, 6, 7, 5, 10. The mode in these data is 5, since 5 is repeated twice. Sometimes there is more than one mode, and sometimes there is no mode, especially in a small set of observations.
Example: 20, 18, 14, 20, 13, 14, 30, 19. There are two modes, 14 and 20.
Example: 300, 280, 130, 125, 240, 270 has no mode.
(Data sets may thus be unimodal, bimodal, or have no mode.)
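A small Python sketch that reports every mode, following the convention used above that a data set in which no value repeats has no mode. `modes` is a hypothetical helper; the standard library's `statistics.multimode` differs here, since it returns all values when none repeats.

```python
from collections import Counter


def modes(values):
    """Return every most-frequent value; an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([5, 6, 7, 5, 10]))                  # [5]
print(modes([20, 18, 14, 20, 13, 14, 30, 19]))  # [14, 20]
print(modes([300, 280, 130, 125, 240, 270]))    # []
```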
Advantages and disadvantages of the measures of central tendency:
- Mean: the preferred measure of central tendency, since it takes every individual observation into account, but its main disadvantage is that it is affected by extreme values.
- Median: a useful descriptive measure if there are one or two extremely high or low values.
- Mode: seldom used.
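To illustrate the effect of extreme values, this Python sketch reuses the ages from the arithmetic mean example and adds one hypothetical extreme value (60), which is not part of the original data:

```python
from statistics import mean, median

ages = [12, 15, 10, 17, 13]
ages_with_outlier = ages + [60]  # hypothetical extreme value, for illustration only

print(mean(ages), median(ages))
print(mean(ages_with_outlier), median(ages_with_outlier))
```

Adding the single extreme value moves the mean from 13.4 to about 21.2, while the median moves only from 13 to 14.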
Measures of Dispersion
The measures of dispersion describe the degree of variation (scatter) of the data around their central value (dispersion = variation = spread = scatter):
1. Range (R)
2. Variance (V)
3. Standard deviation (SD)
4. Coefficient of variation (CV)
1- Range:
It is the difference between the largest and smallest values, and is the simplest measure of variation.
Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between them.
Also, it tends to increase as the sample size increases.
2- Variance
If we want the average of the differences between the mean and each observation, we subtract each value from the mean, sum these differences and divide by the number of observations:
$\frac{\sum (\bar{x} - x)}{n}$
However, the value of $\frac{\sum (\bar{x} - x)}{n}$ is always zero, because the differences between each value and the mean have negative and positive signs that cancel out on algebraic summation.
To overcome this, we square the difference between the mean and each value so that the sign is always positive. Thus we get:
$V = \frac{\sum (\bar{x} - x)^2}{n - 1}$
(The divisor n − 1 is used when the variance is calculated from a sample.)
3- Standard Deviation SD
The main disadvantage of the variance is that it is expressed in the square of the units used. So it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD). Therefore $SD = \sqrt{V}$,
i.e. $SD = \sqrt{\frac{\sum (\bar{x} - x)^2}{n - 1}}$
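A minimal Python sketch of the variance and SD formulas, using the observations 5, 7, 10, 12 and 16 from the coefficient of variation example. It also shows why the deviations must be squared: their plain sum is zero. Function names are illustrative; `statistics.variance` and `statistics.stdev` use the same n − 1 divisor.

```python
import math


def deviations_sum(values):
    """Sum of (mean - x): always 0 (up to rounding), which is why deviations are squared."""
    m = sum(values) / len(values)
    return sum(m - x for x in values)


def sample_variance(values):
    """V = sum((mean - x)^2) / (n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((m - x) ** 2 for x in values) / (n - 1)


def standard_deviation(values):
    """SD = square root of the variance, in the original units."""
    return math.sqrt(sample_variance(values))


data = [5, 7, 10, 12, 16]
print(deviations_sum(data))                # 0.0
print(sample_variance(data))               # 18.5
print(round(standard_deviation(data), 1))  # 4.3
```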
4- Coefficient of variation CoV
The coefficient of variation expresses the standard deviation as a percentage of the sample mean:
$CV = \frac{SD}{\bar{x}} \times 100$
CV is useful when we are interested in the relative size of the variability in the data. Example: for the observations 5, 7, 10, 12 and 16, the mean is 50 / 5 = 10, SD = √((25 + 9 + 0 + 4 + 36) / (5 − 1)) = √(74 / 4) = 4.3 and CV = 4.3 / 10 × 100 = 43%.
Example
Calculate the mean, variance, SD and CV of the following measurements: 5, 7, 10, 12 and 16.
Mean = (5 + 7 + 10 + 12 + 16) / 5 = 10.
Variance = (25 + 9 + 0 + 4 + 36) / (5 − 1) = 74 / 4 = 18.5.
SD = √18.5 = 4.3.
CV = 4.3 / 10 × 100 = 43%.
Another set of observations is 2, 2, 5, 10 and 11. Their mean = 30 / 5 = 6, SD = √((16 + 16 + 1 + 16 + 25) / (5 − 1)) = √(74 / 4) = 4.3 and CV = 4.3 / 6 × 100 = 71.7%. Both sets have the same SD but differ in CV, because the data in the first set are relatively homogeneous (so CV is not high), while the data in the second set are heterogeneous (so CV is high).
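Both CV calculations can be checked with Python's standard `statistics` module; `coefficient_of_variation` is an illustrative helper name:

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = SD / mean * 100, using the sample SD (divisor n - 1)."""
    return stdev(values) / mean(values) * 100


set_1 = [5, 7, 10, 12, 16]
set_2 = [2, 2, 5, 10, 11]
for name, values in (("first", set_1), ("second", set_2)):
    print(f"{name}: mean={mean(values)}, SD={stdev(values):.1f}, "
          f"CV={coefficient_of_variation(values):.1f}%")
```

Both sets print SD = 4.3, while the CV is 43.0% for the first set and 71.7% for the second.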
Example: in a study where age was recorded, the observed values were 6, 8, 9, 7 and 6 (5 observations). Calculate the mean, SD, range, mode and median.
- The mean = sum of observations / their number = (6 + 8 + 9 + 7 + 6) / 5 = 36 / 5 = 7.2
The variance = sum of the squared differences (mean minus observation) / (number of observations − 1):
$V = \frac{(7.2 - 6)^2 + (7.2 - 8)^2 + (7.2 - 9)^2 + (7.2 - 7)^2 + (7.2 - 6)^2}{5 - 1} = \frac{(1.2)^2 + (-0.8)^2 + (-1.8)^2 + (0.2)^2 + (1.2)^2}{4} = \frac{6.8}{4} = 1.7$
- So the variance = 1.7
- The SD = √1.7 = 1.3
- Range = 9 − 6 = 3
- The mode is 6
- The median: first arrange the data in ascending order, i.e. 6, 6, 7, 8, 9.
The rank of the median = (n + 1) / 2, i.e. (5 + 1) / 2 = 3; therefore the median is the third value, i.e. median = 7.
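The whole worked example can be reproduced with Python's standard `statistics` module (`multimode` requires Python 3.8 or later):

```python
from statistics import mean, median, multimode, stdev, variance

ages = [6, 8, 9, 7, 6]
print("mean    ", mean(ages))             # 7.2
print("variance", variance(ages))         # 1.7 (divisor n - 1)
print("SD      ", round(stdev(ages), 1))  # 1.3
print("range   ", max(ages) - min(ages))  # 3
print("mode    ", multimode(ages))        # [6]
print("median  ", median(ages))           # 7
```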
Inferential statistics
Inference involves making a generalization about a larger group of individuals on the basis of a subset, or sample.
Inferential statistics: Hypothesis Testing
In hypothesis testing we want to find out whether the observed variation among samples is explained by chance alone (i.e. random sampling variation) or is due to a real difference between groups.
Hypothesis Testing
It involves conducting a test of statistical significance, quantifying the probability that random sampling variation accounts for the observed results. In hypothesis testing we ask, for example, whether the sample mean is consistent with a certain hypothesized value of the population mean.
Hypothesis Testing
The method of assessing hypotheses is known as a significance test.
Significance testing is a method for assessing whether a result is likely to be due to chance or to a real effect.
Hypothesis Testing –Steps
>>> Formulate your hypothesis
>>> Collect the data
>>> Test your hypothesis
>>> Accept or reject your hypothesis
Null and alternative hypotheses
In hypothesis testing, specific hypotheses (the null and the alternative hypothesis) are formulated and tested. The null hypothesis H0 means $X_1 = X_2$, or $X_1 - X_2 = 0$; this means that there is no difference between X1 and X2.
The alternative hypothesis H1 means $X_1 > X_2$ or $X_1 < X_2$.
Null and alternative hypotheses
The alternative hypothesis H1 means $X_1 > X_2$ or $X_1 < X_2$, i.e. there is a difference between X1 and X2. If we reject the null hypothesis, we conclude that there is a difference between the two readings: either X1 < X2 or X1 > X2. In other words, the null hypothesis is rejected because X1 is different from X2.
General principles of significance tests
1. Set up a null hypothesis and its alternative.
2. Find the value of the test statistic.
3. Refer the value of the test statistic to the known distribution that it would follow if the null hypothesis were true.
General principles of significance tests
4- Conclude whether the data are consistent or inconsistent with the null hypothesis.
If the data are not consistent with the null hypothesis, the difference is said to be statistically significant. If the data are consistent with the null hypothesis, we accept (do not reject) it, i.e. the difference is statistically insignificant.
General principles of significance tests P<0.05
In medicine we usually consider differences significant if the probability is less than 0.05. This means that if the null hypothesis is true, we will wrongly reject it fewer than 5 times in 100.
Tests of significance
The selection of the test of significance depends essentially on the type of data that we have.
1- Quantitative data (means and SD): t test, paired t test and ANOVA.
2- Qualitative data: chi-squared test and z test.
Tests of significance
Comparison of means:
1- Comparing two means of large samples using the normal distribution (z test, or SND: standard normal deviate). If we have a large sample size (60 or more) and the data follow a normal distribution, we use the z test:
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$
If |z| > 2 (more precisely 1.96), there is a significant difference at the 0.05 level.
Tests of significance
The threshold of 2 is used because the normal range for any biological reading lies between the population mean ± 2 SD (this range includes about 95% of the area under the normal distribution curve).
Student’s t-test
2- Comparing two means of small samples using the t test:
If we have a small sample size (less than 60), we use the t distribution instead of the normal distribution:
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$
The value of t is compared with the values in the t distribution table at the appropriate degrees of freedom. If the absolute value of t is less than the tabulated value, the difference between the samples is insignificant.
If it is larger than the tabulated value, the difference is significant, i.e. the null hypothesis is rejected.
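A minimal Python sketch of the t statistic as written above. The two samples are hypothetical (they reuse the two small data sets from the coefficient of variation example), `t_statistic` is an illustrative name, and the critical value must still be read from a t table.

```python
import math
from statistics import mean, variance


def t_statistic(sample_1, sample_2):
    """t = (mean1 - mean2) / sqrt(SD1^2/n1 + SD2^2/n2)."""
    se = math.sqrt(variance(sample_1) / len(sample_1) +
                   variance(sample_2) / len(sample_2))
    return (mean(sample_1) - mean(sample_2)) / se


# Hypothetical small samples, for illustration only
group_1 = [5, 7, 10, 12, 16]
group_2 = [2, 2, 5, 10, 11]
t = t_statistic(group_1, group_2)
print(round(t, 2))
```

Here t = 4 / √7.4 ≈ 1.47. With 8 degrees of freedom (n1 + n2 − 2, the usual choice when the two variances are assumed equal), the tabulated two-sided value at the 0.05 level is 2.306, so this difference would be insignificant.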
3- Paired t test:
If we are comparing repeated observations on the same individuals, or differences between paired data, we use the paired t test, in which the analysis is carried out using the mean and standard deviation of the differences between the members of each pair.
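A Python sketch of the paired t test described above: t is the mean of the within-pair differences divided by their standard error, SD / √n, with n − 1 degrees of freedom. The before/after readings are hypothetical, and `paired_t` is an illustrative name.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """t = mean(d) / (SD(d) / sqrt(n)), where d is the within-pair difference."""
    d = [a - b for a, b in zip(after, before)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))


# Hypothetical readings on the same 5 individuals before and after a treatment
before = [150, 160, 155, 170, 165]
after = [140, 152, 150, 158, 160]
print(round(paired_t(before, after), 2))
```

The resulting t (about −5.8 here) is compared with the t table at n − 1 = 4 degrees of freedom.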
4-comparing several means:
Sometimes we need to compare more than two means. This could be done with several t tests, which is not only tedious but can lead to spurious significant results. Therefore we use the analysis of variance (ANOVA).
4- Comparing several means: there are two main types, one-way and two-way analysis of variance. One-way analysis of variance is appropriate when the subgroups to be compared are defined by just one factor, for example a comparison between the means of different socio-economic classes. Two-way analysis of variance is used when the subdivision is based on more than one factor.
The main idea of the analysis of variance is to take into account the variability within the groups and between the groups; the value of F is the ratio of the between-groups mean square to the within-groups mean square:
$F = \frac{\text{between-groups MS}}{\text{within-groups MS}}$
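A minimal Python sketch of the one-way F ratio, computing the between-groups and within-groups mean squares explicitly. The three groups are hypothetical values for illustration, and `one_way_anova_f` is an illustrative name; the F value must still be compared with an F table at (k − 1, n − k) degrees of freedom.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F = between-groups mean square / within-groups mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)  # df = k - 1
    ms_within = ss_within / (n - k)    # df = n - k
    return ms_between / ms_within


# Hypothetical measurements in three socio-economic classes
low = [10, 12, 11, 13]
middle = [14, 15, 13, 16]
high = [18, 17, 19, 20]
print(round(one_way_anova_f([low, middle, high]), 2))  # 29.6
```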
b-Qualitative variables:
1) Chi-squared test:
Qualitative data are arranged in a table formed by rows and columns: the categories of one variable define the rows and the categories of the other variable define the columns.
A chi-squared test is used to test whether there is an association between the row variable and the column variable or, in other words, whether the distribution of individuals among the categories of one variable is independent of their distribution among the categories of the other.
$\chi^2 = \sum \frac{(O - E)^2}{E}$
Degrees of freedom = (rows − 1) × (columns − 1)
O = observed value in the table
E = expected value, calculated as follows:
$E = \frac{R_t \times C_t}{GT}$, i.e. row total × column total / grand total
From the table of χ² significance at degrees of freedom (3 rows − 1) × (3 columns − 1) = 2 × 2 = 4, the critical value at the 0.05 level with d.f. = 4 is 9.49. Therefore we conclude that there is a significant relation between socio-economic level and the degree of intelligence (because the value of χ² is greater than the tabulated value).
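The slides' socio-economic table is not reproduced here, so this Python sketch applies the expected-value rule and the χ² formula to the anemia counts of the z test example (group 1: 5 anemic out of 50; group 2: 20 anemic out of 60) arranged as a 2 × 2 table. `chi_squared` is an illustrative helper name.

```python
def chi_squared(observed):
    """Chi-squared statistic for an r x c table of observed counts.
    E = row total * column total / grand total; df = (r - 1)(c - 1)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Rows: group 1, group 2; columns: anemic, not anemic
table = [[5, 45],
         [20, 40]]
chi2, df = chi_squared(table)
print(round(chi2, 2), df)  # 8.45 1
```

χ² ≈ 8.45 with 1 degree of freedom exceeds the tabulated 3.84 at the 0.05 level, which agrees with the z test conclusion for the same data.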
2) Z test for comparing two percentages:
$z = \frac{p_1 - p_2}{\sqrt{p_1 q_1/n_1 + p_2 q_2/n_2}}$
where p1 = percentage in the 1st group, p2 = percentage in the 2nd group, q1 = 100 − p1, q2 = 100 − p2, n1 = sample size of group 1 and n2 = sample size of group 2. The z test is significant (at the 0.05 level) if |z| > 2.
Example: in group 1, which includes 50 patients, 5 are anemic, and in group 2, which contains 60 patients, 20 are anemic. To find whether groups 1 and 2 differ significantly in the prevalence of anemia, we calculate the z test.
p1 = 5 / 50 = 10%, p2 = 20 / 60 = 33%, q1 = 100 − 10 = 90, q2 = 100 − 33 = 67
$z = \frac{10 - 33}{\sqrt{10 \times 90/50 + 33 \times 67/60}} = \frac{-23}{\sqrt{18 + 36.85}} = \frac{-23}{7.4} = -3.1$, so |z| = 3.1.
Therefore there is a statistically significant difference between the percentages of anemia in the studied groups (because |z| > 2).
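The same z calculation in Python, as a minimal sketch (`z_two_percentages` is an illustrative name). Without rounding p2 to 33%, |z| comes out slightly larger than on the slide:

```python
import math


def z_two_percentages(x1, n1, x2, n2):
    """z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with p and q in percent."""
    p1 = x1 / n1 * 100
    p2 = x2 / n2 * 100
    q1, q2 = 100 - p1, 100 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


z = z_two_percentages(5, 50, 20, 60)
print(round(z, 2))  # -3.15
```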
c-Correlation and regression:
Correlation measures the closeness of the association between two continuous variables, while linear regression gives the equation of the straight line that best describes and enables the prediction of one variable from the other.
1-Correlation: the closeness of the association is measured by the correlation coefficient, r. The values of r range between +1 and −1. A value of +1 or −1 means perfect correlation, while 0 means no correlation. If r is near zero, the correlation is weak, while near one (in absolute value) it is strong. The signs − and + denote the direction of the correlation.
1-Correlation:
A positive correlation means that when one variable increases the other one increases too, while a negative correlation means that when one variable increases the other one decreases.
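A minimal Python sketch of Pearson's correlation coefficient r, the usual measure for two continuous variables. The data are hypothetical, and `pearson_r` is an illustrative name (Python 3.10+ also offers `statistics.correlation`).

```python
import math


def pearson_r(x, y):
    """r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775 (positive correlation)
```

Reversing the sign of every y value gives r ≈ −0.775, the same strength with a negative direction.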
2- Linear regression:
Similar to correlation, linear regression is used to determine the relation between variables and to predict the change in one variable due to changes in another. For linear regression, the independent variable has to be distinguished from the dependent variable.
2- Linear regression: linear regression not only allows assessment of the association between the independent and dependent variables but also allows prediction of the dependent variable for a particular value of the independent variable. However, a regression equation should not be used for prediction outside the range of the original data. A t test is also used to assess the level of significance. The dependent variable in linear regression must be continuous.
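A minimal least-squares sketch in Python giving the intercept a and slope b of the line y = a + bx. The data are hypothetical, `least_squares_line` is an illustrative name, and the prediction is made only inside the observed x range, as the text recommends.

```python
def least_squares_line(x, y):
    """Return (a, b) for the line y = a + b*x that minimises the squared vertical errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b


# Hypothetical data: x is the independent variable, y the dependent one
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(a, b)          # intercept about 2.2, slope 0.6
print(a + b * 3.5)   # predicted y at x = 3.5 (inside the observed x range)
```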
[Figure: scatter diagram, "Correlation between Doppler velocimetry (RI) and baby birth weight"; x axis: baby weight in kg (1.5 to 4.5); y axis: RI (0 to 1)]
3-Multiple regression:
Situations frequently occur in which we are interested in the dependence of a dependent variable on several independent variables, not just one. The test of significance used is the analysis of variance (F test).
1. How would you select a representative sample of 100 students from a primary school? Use all possible methods of sample selection.
2. How would you select a primary school from a rural area and another school from an urban area in Egypt?
What type of sample is each of the following?
1. A lottery to select a winner
2. Hospitalized patients with SLE
3. Every 6th patient coming to an outpatient clinic
4. A random 20 females and 20 males out of a group of 100 persons
5. All workers in a factory chosen from all factories in a certain governorate
Present the following data by a suitable table & graph
Infant mortality rates in 2006 in some countries were as follows: Egypt 25/1000, USA 10/1000, Sweden 12/1000 and Pakistan 30/1000.
Present the following data by a suitable table & graph
The body weights (kg) of a group of male children were as follows:
12, 22, 18, 17, 28, 20, 16, 21, 19, 16, 27, 21 kg, and of a group of female children:
16, 23, 19, 29, 18, 22, 17, 15, 21, 21, 24 kg
The weight (Kg ) of a pregnant