basic statistical concepts and methods

Ahmed-Refat-ZU

Basic Statistical Concepts and Methods

Ahmed-Refat AG Refat

FOM-ZU

Definition of Statistics

Statistics is the science of dealing with numbers.

It is used for the collection, summarization, presentation and analysis of data.

Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).

Uses of medical statistics

Medical statistics are used in:

1- Planning, monitoring and evaluating community health care programs.
2- Epidemiological research studies.
3- Diagnosis of community health problems.
4- Comparison of health status and diseases in different countries, and in one country over the years.
5- Forming standards for biological measurements such as weight and height.
6- Differentiating between diseased and normal groups.

Types of data

Any aspect of an individual that is measured is called a variable. Variables are either:

1- Quantitative, or
2- Qualitative.


1- Quantitative data

Quantitative data are numerical data. There are two types:

A- Discrete data: usually whole numbers, such as the number of cases of a certain disease or the number of hospital beds (no decimal fraction).

B- Continuous data: measurements on a continuous scale, e.g. height, weight, age (a decimal fraction can be present).

2- Qualitative data

Qualitative data are non-numerical data, subdivided into two types:

A- Categorical data: purely descriptive data that imply no ordering of any kind, such as sex or area of residence.

B- Ordinal data: data that imply some kind of ordering, such as:
- Level of education
- Socio-economic status
- Degree of severity of disease

Presentation of Data

The first step in statistical analysis is to present the data in a way that is easy to understand.

The two basic ways of presenting data are:

1. Tabular presentation.
2. Graphical presentation.

Tabulation

Some rules for the construction of tables:

1- The table must be self-explanatory.
2- Title: written at the top of the table to define precisely the content, the place and the time.
3- Clear headings of the columns and rows, with units of measurement.
4- The size of the table depends on the number of classes, usually between 2 and 10 rows or classes. The choice depends on the form of the data and the requirements of the distribution: too few classes may obscure some information, and too many will not differ from the raw data.

Types of tables

For qualitative data, draw a simple table (e.g. a list table) by counting the number of observations (frequencies) in each category.

For quantitative data, we have to form a frequency distribution table.

- List tables (2 columns: one value for each measured variable)
- Frequency distribution tables

List Table

A list table consists of two columns: the first identifies the observational unit and the second gives the value of the variable for that unit. Example: the number of patients in each hospital department:

Department       Number of patients
Medicine         100
Surgery          80
ENT              28
Ophthalmology    30

Frequency Distribution tables

Frequency distribution tables (FDTs) are used to present qualitative (and quantitative discrete) data by recording the number of observations in each category. These counts are called frequencies. No classes or intervals are needed for this type of data.

An FDT for quantitative continuous data consists of a series of classes (intervals) together with the number of observations (frequency) whose values fall within each class interval.

EXAMPLE (1): Assume we have a group of 20 individuals whose blood groups were as follows: A, AB, AB, O, B, A, A, B, B, AB, O, AB, AB, A, B, B, B, A, O, A. We want to present these data in a table.

What type of data is this?

How to Construct a Frequency Distribution Table: Four Steps

1- Put a title.
2- Draw the columns and rows.
3- Enumerate the individuals in each category.
4- Calculate the relative frequency (%).

1- Put a title, e.g.:
Distribution of the studied individuals according to their blood group.

2- Draw a table (columns and rows): the first column heading is the studied variable ("Blood group"), the second column heading is "Frequency (number)" and the third column heading is "Percentage (%)".

3- Enumerate the individuals in each blood group, i.e. individuals with blood group A are 6, those with blood group B are 6, AB are 5 and O are 3.

Make sure that the total number of individuals in all blood groups is 20 (the number of the studied group).

4- Calculate the relative frequency (%) of each blood group by dividing the frequency of that group by the total number of individuals and multiplying by 100, i.e. the percentage of group A = 6 / 20 x 100 = 30%, and likewise group B = 6 / 20 x 100 = 30%, group AB = 5 / 20 x 100 = 25% and group O = 3 / 20 x 100 = 15%. The final table will be:

Distribution of the studied individuals according to their blood group

Blood group    Frequency (number)    Percentage (%)
A              6                     30
B              6                     30
AB             5                     25
O              3                     15
Total          20                    100

What is your conclusion?

We can conclude from this table that blood groups A and B are the most common groups and the rarest is group O (based on the percentage of each group).

So presenting data in a table makes it easier to deduce facts and simplifies the information compared with raw data.
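The four steps above can be checked with a short Python sketch that counts each category and converts the counts to percentages. This is a minimal illustration using only the standard library; the function name `frequency_table` is hypothetical and not part of any particular package.

```python
from collections import Counter


def frequency_table(values, order=None):
    """Return (rows, total); each row is (category, frequency, percentage)."""
    counts = Counter(values)
    total = sum(counts.values())
    categories = order if order is not None else sorted(counts)
    rows = []
    for category in categories:
        freq = counts.get(category, 0)
        # Relative frequency (%) = frequency / total x 100
        rows.append((category, freq, freq / total * 100))
    return rows, total


blood_groups = ["A", "AB", "AB", "O", "B", "A", "A", "B", "B", "AB",
                "O", "AB", "AB", "A", "B", "B", "B", "A", "O", "A"]

rows, total = frequency_table(blood_groups, order=["A", "B", "AB", "O"])
print(f"{'Blood group':<12}{'Frequency':>10}{'%':>8}")
for group, freq, pct in rows:
    print(f"{group:<12}{freq:>10}{pct:>8.1f}")
print(f"{'Total':<12}{total:>10}{100:>8.1f}")
```

The printed table reproduces the frequencies 6, 6, 5 and 3 and the percentages 30, 30, 25 and 15.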


EXAMPLE (3): The following data are systolic blood pressure measurements (mmHg) of 30 patients with hypertension. Present these data in a frequency table:

150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178, 195, 200, 180, 156, 173, 188, 173, 189, 190, 177, 186, 177, 174, 155, 164, 163, 172, 160.

What type of data is this?

Four steps:

1- Put a title, e.g.:
Frequency distribution of blood pressure measurements (mmHg) among a group of hypertensive patients.

2- Draw a table (columns and rows): the first column heading is the studied variable ("Blood pressure, mmHg"), the second column heading is "Frequency (number)" and the third column heading is "Percentage (%)".

3- In the first column we have to classify blood pressure into categories or classes, because we have a large sample (N = 30) and the measured variable is continuous (not discrete as in the previous examples).

Construction of classes:

- Calculate the range of the observations: subtract the lowest blood pressure value from the highest (the highest was 200 and the lowest was 150), so the range is 50.
- Determine the number of classes and the width of the class intervals: let the class interval be 10, so we will have 50 / 10 = 5 classes.
- Enumerate the frequency by the tally method.
- Calculate the exact frequency and the relative frequency.

Construction of classes: determine the number of classes you want to display (not too few, about 2, and not too many, more than 8); it is a matter of trial and judgment. Let the class interval be 10 mmHg; we will then have 5 classes. If we choose 5 mmHg as the class width, we will obtain 10 classes (too long a table).

We must keep a constant width for all intervals. Choose the lower and upper limits of the classes starting with the lowest value, i.e. 150, and list the intervals in order every 10 mmHg (150-, 160-, 170-, 180-, 190-). The last class must also include the highest value, 200.
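A minimal Python sketch of the class construction and tally for Example (3). It assumes equal-width classes starting at the lowest value (150 mmHg) and, as a stated assumption, closes the last class (190-) at the top so that the maximum, 200 mmHg, is counted in it rather than opening a sixth class. The helper name `class_frequency_table` is hypothetical.

```python
def class_frequency_table(values, start, width, n_classes):
    """Tally observations into equal-width classes start-, start+width-, ...

    Assumption: the last class is closed at the top, so the maximum value
    (200 mmHg here) falls into the last class instead of a new one.
    """
    upper_limit = start + n_classes * width
    counts = [0] * n_classes
    for v in values:
        if v < start or v > upper_limit:
            raise ValueError(f"{v} is outside {start}-{upper_limit}")
        index = min(int((v - start) // width), n_classes - 1)
        counts[index] += 1
    total = len(values)
    rows = []
    for i, freq in enumerate(counts):
        label = f"{start + i * width}-"
        # Row: class label, tally marks, frequency, relative frequency (%)
        rows.append((label, "|" * freq, freq, freq / total * 100))
    return rows


sbp = [150, 155, 160, 154, 162, 170, 165, 155, 190, 186, 180, 178, 195, 200, 180,
       156, 173, 188, 173, 189, 190, 177, 186, 177, 174, 155, 164, 163, 172, 160]

data_range = max(sbp) - min(sbp)       # 200 - 150 = 50
width = 10                             # chosen class interval (mmHg)
n_classes = data_range // width        # 50 / 10 = 5 classes

for label, tally, freq, pct in class_frequency_table(sbp, min(sbp), width, n_classes):
    print(f"{label:<6}{tally:<10}{freq:>3}{pct:>7.1f}%")
```

With these data the counts come out as 6, 6, 8, 6 and 4 for the classes 150-, 160-, 170-, 180- and 190-, totalling 30.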


2- Graphical Presentation

The diagram should:
- be simple
- be easy to understand
- save a lot of words
- be self-explanatory
- have a clear title indicating its content
- be fully labeled

The y axis (vertical) is usually used for frequency.

Graphic presentations are used to illustrate and clarify information. Tables are essential in the presentation of scientific data, and diagrams complement them by summarizing these tables in an easy, attractive and simple way.

Graphical Presentation 1- Bar chart

It is used for presenting discrete or qualitative data. It represents the measured value (or %) by separated rectangles of constant width whose lengths are proportional to the frequency.

Types: simple, multiple and component.

Graphical Presentation 1- Bar chart: Simple

[Figure: Mean maternal age of three studied groups (x axis: the studied groups, group I, group II, group III)]

Multiple bar chart: each observation has more than one value, represented by a group of bars. Examples: the percentage of males and females in different countries, the percentage of deaths from heart disease in old and young age, and the mode of delivery (cesarean or vaginal) in different female age groups.

Graphical Presentation 1- Bar chart: Multiple

[Figure: multiple bar chart (labels include cancer, anemia and females)]

Component bar chart: a single bar is subdivided into sections according to their relative proportions to indicate the composition of the total.

Component bar chart: for example, two countries are compared in their socio-economic standard of living. Each bar represents one country, and the height of each bar is 100 (%). It is divided horizontally into 3 components (low, moderate and high socio-economic (SE) classes), each represented by a different color or shading.

Graphical Presentation 1- Bar chart: Component

[Figure: Comparison between Egypt and USA in socio-economic standard of living (one bar per country, divided into low, moderate and high classes)]

Graphical Presentation 2- Pie diagram:

It consists of a circle whose area represents the total frequency (100%), divided into segments. Each segment represents its proportion of the total frequency.

[Figure: Percentage of causes of child death in Egypt: diarrhea 50%, chest infection 30%, congenital 10%, accident 10%]

Graphical Presentation 3- Histogram:

It is very similar to the bar chart, except that the rectangles (bars) are adjacent (without gaps).

It is used for presenting a class frequency table (continuous data).

Each bar represents a class: its height represents the frequency (number of cases) and its width represents the class interval.

[Figure: Distribution of the studied group according to their height (x axis: height in cm; classes 100-, 110-, 120-, 130-, 140-, 150-)]

Graphical Presentation 4- Frequency Polygon

It is derived from a histogram by connecting the midpoints of the tops of the rectangles. The line connecting these midpoints is called the frequency polygon. We can draw the polygon without the rectangles to get a simpler form of line graph.

A special type of frequency polygon is the normal distribution curve.

Graphical Presentation 5- Scatter diagram

It is useful to represent the relationship between two numeric measurements, each observation being represented by a point corresponding to its value on each axis.

This scatter diagram shows a positive (direct) relationship between NAG and the albumin/creatinine ratio among diabetic patients.

[Figure: Correlation between NAG and albumin/creatinine ratio in a group of early diabetics (x axis: albumin/creatinine ratio)]

In negative correlation, the points are scattered in a downward direction, meaning that the relation between the two studied measurements is inverse, i.e. if one measurement increases the other decreases, as shown in the following graph:

[Figure: Correlation between Doppler velocimetry (RI) and baby birth weight (x axis: baby weight in kg)]

Graphical Presentation 6- Line graph:

It is a diagram showing the relationship between two numeric variables (as in the scatter diagram), but the points are joined together to form a line (either a broken line or a smooth curve).

[Figure: Changes in body temperature of a patient after use of an antibiotic (x axis: time in hours)]

Normal Distribution Curve

The normal distribution curve (NDC) is a graphical presentation (a frequency polygon) of quantitative biologic variables.

It is the frequency polygon of a quantitative variable measured in a large number of individuals. It is a form of presentation of the frequency distribution of biologic variables such as weight, height, hemoglobin level and blood pressure, or of any continuous data.

It plays a major role in the techniques of statistical analysis.

Characteristics of Normal Distribution curve

1- It is a bell-shaped, continuous curve.
2- It is symmetrical, i.e. it can be divided vertically into two equal halves.
3- The tails never touch the baseline but extend to infinity in either direction.
4- The mean, median and mode coincide.
5- It is described by two parameters: the arithmetic mean, which determines the location of the center of the curve, and the standard deviation, which represents the scatter around the mean.

Areas under the normal curve

Mean ± 1 SD covers about 68% of the area under the curve (about 34% on each side of the mean).

Mean ± 2 SD covers about 95% of the area (more precisely, mean ± 1.96 SD covers 95%).

Mean ± 3 SD covers about 99.7% of the area.
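These areas can be checked with the standard library: for a normal distribution, the proportion lying within mean ± k SD equals erf(k / √2). A short sketch; `area_within` is a hypothetical helper name.

```python
import math


def area_within(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    # For a standard normal variable Z, P(|Z| <= k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))


for k in (1, 1.96, 2, 3):
    print(f"mean +/- {k} SD: {area_within(k) * 100:.1f}% of the area")
```

Running it gives about 68.3%, 95.0%, 95.4% and 99.7% for k = 1, 1.96, 2 and 3.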


Skewed data

If we represent the collected data by a frequency polygon and the resulting curve does not resemble the normal distribution curve (with all its characteristics), then these data are not normally distributed.

Causes of a Skewed Curve (Not Normally Distributed Data)

The curve may be skewed to the right or to the left. This is because the data collected are from:

1. a certain heterogeneous group, or
2. a diseased or abnormal population.

Therefore, the results obtained from these data cannot be applied or generalized to the whole population.

The NDC can be used to distinguish normal from abnormal measurements.

Example: suppose we have an NDC for hemoglobin levels in a population of normal adult males with mean ± SD = 11 ± 1.5. We obtain a hemoglobin reading of 8.1 for an individual and want to know whether he is normal or anemic. If this reading lies within the central 95% of the area under the curve (i.e. mean ± 2 SD), he is considered normal; if his reading is lower, he is anemic.

The normal range for hemoglobin in this example will be: upper limit = 11 + 2 (1.5) = 14; lower limit = 11 - 2 (1.5) = 8.

i.e. the normal range of hemoglobin of adult males is from 8 to 14.

Our sample reading (8.1) lies within the central 95% of his population; therefore this individual is normal.
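The mean ± 2 SD rule can be written as a small function. This is a sketch assuming the hemoglobin figures of the example (mean 11, SD 1.5); the function names are hypothetical, and a reading below the lower limit corresponds to "anemic" in the example.

```python
def normal_range(mean, sd, k=2):
    """Return (lower, upper) = mean -/+ k SD (about 95% of the area for k = 2)."""
    return mean - k * sd, mean + k * sd


def classify(reading, mean, sd, k=2):
    """Place a single reading relative to the mean +/- k SD range."""
    lower, upper = normal_range(mean, sd, k)
    if reading < lower:
        return "below normal range"
    if reading > upper:
        return "above normal range"
    return "within normal range"


low, high = normal_range(11, 1.5)
print(low, high)               # 8.0 14.0
print(classify(8.1, 11, 1.5))  # within normal range
```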


Data Summarization

To summarize data, we use one or two parameters that describe the data:

1. Measures of central tendency, which describe the center of the data.
2. Measures of dispersion, which show how the data are scattered around the center.

Measures of central tendency

A variable usually has a central point around which the observed values lie. Averages describing this point are also called measures of central tendency. The three most commonly used averages are:

1. The arithmetic mean
2. The median
3. The mode

1- The arithmetic mean:

The sum of the observations divided by the number of observations:

$\bar{x} = \frac{\sum x}{n}$

where $\bar{x}$ = mean, $\sum$ denotes "the sum of", $x$ = the values of the observations and $n$ = the number of observations.

Example: in a study, the ages of 5 students were 12, 15, 10, 17 and 13 years.

Mean = sum of observations / number of observations

Then the mean $\bar{x}$ = (12 + 15 + 10 + 17 + 13) / 5 = 67 / 5 = 13.4 years.
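The formula can be written directly in Python; the standard library's `statistics.mean` gives the same result. A minimal sketch with a hypothetical helper name:

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("the mean requires at least one observation")
    return sum(values) / len(values)


ages = [12, 15, 10, 17, 13]
print(arithmetic_mean(ages))  # 13.4
```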


Calculation of the Mean for Frequency Distribution Data

In the case of frequency distribution data, we calculate the mean by this equation:

$\bar{x} = \frac{\sum f x}{n}$

where $f$ = the frequency of each value $x$, and $n = \sum f$.

For example, we want to calculate the mean incubation period of this group.

If the data are presented in a frequency table with class intervals, we calculate the mean by the same equation, $\bar{x} = \frac{\sum f x_1}{n}$, where $x_1$ denotes the midpoint of each class interval.

Example: calculate the mean blood pressure of the following group with class intervals:
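A sketch of the grouped-data formula using class midpoints. Because the blood pressure table referred to here is not reproduced, the example reuses, purely as an illustration, the classes and frequencies obtained for the systolic blood pressure data of Example (3) (classes 150-, 160-, 170-, 180-, 190- with frequencies 6, 6, 8, 6, 4). The function name is hypothetical.

```python
def grouped_mean(class_limits, frequencies):
    """Mean of grouped data: sum(f * midpoint) / sum(f).

    class_limits is a list of (lower, upper) tuples; each class is
    represented by its midpoint x1 = (lower + upper) / 2.
    """
    midpoints = [(lower + upper) / 2 for lower, upper in class_limits]
    n = sum(frequencies)
    return sum(f * x for f, x in zip(frequencies, midpoints)) / n


# Illustration only: the systolic blood pressure classes of Example (3)
classes = [(150, 160), (160, 170), (170, 180), (180, 190), (190, 200)]
freqs = [6, 6, 8, 6, 4]
print(round(grouped_mean(classes, freqs), 1))  # 173.7
```

The grouped mean (about 173.7 mmHg) differs slightly from the mean of the 30 raw values (5177 / 30, about 172.6 mmHg), because each observation is replaced by the midpoint of its class.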


2- Median

It is the middle observation in a series of observations after arranging them in ascending or descending order.

The rank of the median is (n + 1) / 2 if the number of observations is odd. If the number is even, the median is the mean of the two middle observations, of ranks n / 2 and n / 2 + 1.

Example: calculate the median of the following data: 5, 6, 8, 9, 11 (n = 5, odd).

- The rank of the median = (n + 1) / 2, i.e. (5 + 1) / 2 = 3.
- The median is the third value when the data are arranged in ascending (or descending) order.
- So the median is 8 (the third value).

- If the number of observations is even, the median is calculated as follows, e.g. 5, 6, 8, 9 (n = 4):
- The rank n / 2 = 4 / 2 = 2 gives the second value. If the data are arranged ascendingly, the second value is 6; if arranged descendingly, it is 8. Therefore the median is the mean of these two middle observations, i.e. (6 + 8) / 2 = 7.

For simplicity, we can apply the same equation used for odd numbers, i.e. (n + 1) / 2. The median rank will be (4 + 1) / 2 = 2½, i.e. the median lies between the second and third values, 6 and 8, so we take their mean = 7.
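The rank rule above translates into a few lines of Python; `statistics.median` from the standard library applies the same rule and serves as a cross-check. The helper name `median_by_rank` is hypothetical.

```python
import statistics


def median_by_rank(values):
    """Median from the sorted data using the (n + 1) / 2 rank rule."""
    data = sorted(values)  # arrange in ascending order
    n = len(data)
    if n == 0:
        raise ValueError("the median requires at least one observation")
    if n % 2 == 1:
        # Odd n: the median is the observation of rank (n + 1) / 2
        return data[(n + 1) // 2 - 1]
    # Even n: rank (n + 1) / 2 falls between ranks n/2 and n/2 + 1,
    # so take the mean of those two middle observations
    return (data[n // 2 - 1] + data[n // 2]) / 2


print(median_by_rank([5, 6, 8, 9, 11]))  # 8
print(median_by_rank([5, 6, 8, 9]))      # 7.0
print(statistics.median([5, 6, 8, 9]))   # 7.0 (standard library, same rule)
```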

3- Mode

The most frequently occurring value in the data is the mode.

Example: 5, 6, 7, 5, 10. The mode in this data is 5, since the number 5 is repeated twice. Sometimes there is more than one mode, and sometimes there is no mode, especially in a small set of observations.

Example: 20, 18, 14, 20, 13, 14, 30, 19. There are two modes, 14 and 20.

Example: 300, 280, 130, 125, 240, 270. There is no mode.
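A sketch of the mode rules above using `collections.Counter`. It follows the convention used here that data in which no value repeats have no mode; the function name `modes` is hypothetical. (The standard library's `statistics.multimode` would instead return every value in that case.)

```python
from collections import Counter


def modes(values):
    """Return all most frequent values, or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value occurs only once: no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([5, 6, 7, 5, 10]))                  # [5]
print(modes([20, 18, 14, 20, 13, 14, 30, 19]))  # [14, 20]
print(modes([300, 280, 130, 125, 240, 270]))    # []
```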

[Figure: unimodal, bimodal and no-mode data]

Advantages and disadvantages of the measures of central tendency:

- Mean: the preferred measure of central tendency, since it takes every individual observation into account; its main disadvantage is that it is affected by extreme values.
- Median: a useful descriptive measure when there are one or two extremely high or low values, since it is not affected by them.
- Mode: seldom used.

Measures of Dispersion

Measures of dispersion describe the degree of variation (scatter) of the data around their central value (dispersion = variation = spread = scatter):

1. Range (R)
2. Variance (V)
3. Standard deviation (SD)
4. Coefficient of variation (C.V.)

1- Range:

It is the difference between the largest and smallest values, and it is the simplest measure of variation.

Disadvantages: it is based on only two of the observations and gives no idea of how the other observations are arranged between these two. Also, it tends to increase as the sample size increases.

2- Variance

If we want the average of the differences between the mean and each observation in the data, we subtract each value from the mean, sum these differences and divide by the number of observations:

$V = \frac{\sum (\bar{x} - x_i)}{n}$

However, the value of this equation will always be zero, because the differences between the values and the mean have negative and positive signs that cancel out on algebraic summation.

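To overcome this, we square the difference between the mean and each value so that the sign is always positive. For a sample, the sum of squares is divided by n - 1 (the degrees of freedom) rather than n, which gives an unbiased estimate of the population variance. Thus we get:

$V = \frac{\sum (\bar{x} - x_i)^2}{n - 1}$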

3- Standard Deviation SD

The main disadvantage of the variance is that it is expressed in the square of the units used. So it is more convenient to express the variation in the original units by taking the square root of the variance. This is called the standard deviation (SD):

$SD = \sqrt{V} = \sqrt{\frac{\sum (\bar{x} - x_i)^2}{n - 1}}$

4- Coefficient of Variation (C.V.)

The coefficient of variation expresses the standard deviation as a percentage of the sample mean:

$C.V. = \frac{SD}{\bar{x}} \times 100$

C.V. is useful when we are interested in the relative size of the variability in the data.

Example

Calculate the mean, variance, SD and C.V. for the following measurements: 5, 7, 10, 12 and 16.

Mean = (5 + 7 + 10 + 12 + 16) / 5 = 50 / 5 = 10.

Variance = (25 + 9 + 0 + 4 + 36) / (5 - 1) = 74 / 4 = 18.5.

SD = √18.5 = 4.3.

C.V. = 4.3 / 10 x 100 = 43%.

Another set of observations is 2, 2, 5, 10 and 11. Their mean = 30 / 5 = 6; SD = √[(16 + 16 + 1 + 16 + 25) / (5 - 1)] = √(74 / 4) = 4.3; C.V. = 4.3 / 6 x 100 ≈ 71.7%.

Both sets of observations have the same SD, but they differ in C.V.: relative to its mean, the first set is more homogeneous (so its C.V. is lower), while the second set is more heterogeneous (so its C.V. is higher).
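The range, variance (with the n - 1 divisor), SD and C.V. can be computed together. This is a minimal standard-library sketch; the function name `dispersion` and the dictionary keys are hypothetical.

```python
import math


def dispersion(values):
    """Range, sample variance (n - 1 divisor), SD and coefficient of variation (%)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    value_range = max(values) - min(values)
    # Sum of squared deviations from the mean, divided by n - 1
    variance = sum((mean - x) ** 2 for x in values) / (n - 1)
    sd = math.sqrt(variance)
    cv = sd / mean * 100
    return {"mean": mean, "range": value_range,
            "variance": variance, "sd": sd, "cv": cv}


for data in ([5, 7, 10, 12, 16], [2, 2, 5, 10, 11]):
    d = dispersion(data)
    print(f"{data}: mean={d['mean']:.1f} SD={d['sd']:.1f} CV={d['cv']:.1f}%")
```

For the two sets above this prints the same SD (4.3) but C.V. values of about 43.0% and 71.7%.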


Example: in a study where age was recorded, the observed values were 6, 8, 9, 7 and 6 (n = 5). Calculate the mean, variance, SD, range, mode and median.

- The mean = sum of observations / their number = 36 / 5 = 7.2

- The variance = sum of the squared differences (mean minus observation) / (number of observations - 1):

$V = \frac{(7.2 - 6)^2 + (7.2 - 8)^2 + (7.2 - 9)^2 + (7.2 - 7)^2 + (7.2 - 6)^2}{5 - 1} = \frac{(1.2)^2 + (-0.8)^2 + (-1.8)^2 + (0.2)^2 + (1.2)^2}{4} = \frac{6.8}{4} = 1.7$

- So the variance = 1.7

- The S.D. = √ 1.7 = 1.3

- Range = 9 – 6 = 3

- The mode is 6

- The median: first arrange the data ascendingly, i.e. 6, 6, 7, 8, 9. The rank of the median = (n + 1) / 2, i.e. (5 + 1) / 2 = 3; therefore the median is the third value, i.e. median = 7.

Inferential statistics

Inference involves making a generalization about a larger group of individuals on the basis of a subset or sample.

Inferential Statistics: Hypothesis Testing

In hypothesis testing we want to find out whether the observed variation among samples is explained by chance alone (i.e. random sampling variation) or is due to a real difference between groups.

Hypothesis Testing

It involves conducting a test of statistical significance, quantifying the probability that random sampling variation accounts for the observed results. In hypothesis testing we ask, for example, whether the sample mean is consistent with a certain hypothesized value of the population mean.

Hypothesis Testing

The method of assessing the hypothesis is known as the significance test.

Significance testing is a method for assessing whether a result is likely to be due to chance or to a real effect.

Hypothesis Testing: Steps

1- Formulate your hypothesis.
2- Collect the data.
3- Test your hypothesis.
4- Accept or reject your hypothesis.

Null and alternative hypotheses

In hypothesis testing, specific hypotheses (the null and alternative hypotheses) are formulated and tested. The null hypothesis H0 states $x_1 = x_2$, or $x_1 - x_2 = 0$; this means that there is no difference between $x_1$ and $x_2$.

The alternative hypothesis H1 states $x_1 > x_2$ or $x_1 < x_2$.

If we reject the null hypothesis, i.e. there is a difference between the two readings, then either $x_1 < x_2$ or $x_1 > x_2$; in other words, the null hypothesis is rejected because $x_1$ is different from $x_2$.

General principles of significance tests

1. Set up a null hypothesis and its alternative.

2. Find the value of the test statistic.

3. Refer the value of the test statistic to a known distribution which it would follow if the null hypothesis were true.

4. Conclude that the data are consistent or inconsistent with the null hypothesis.

If the data are not consistent with the null hypothesis, the difference is said to be statistically significant. If the data are consistent with the null hypothesis, we accept it, i.e. the difference is statistically insignificant.

General principles of significance tests: P < 0.05

In medicine, we usually consider differences significant if the probability (P value) is less than 0.05. This means that if the null hypothesis is true, we will wrongly reject it fewer than 5 times in a hundred.

Tests of significance

The selection of the test of significance depends essentially on the type of data we have:

1- Quantitative data (means and SD): t test, paired t test and ANOVA.
2- Qualitative data: chi-squared test and Z test.

a- Comparison of means:

1- Comparing two means of large samples using the normal distribution (z test, or SND: standard normal deviate): if we have a large sample size, i.e. 60 or more, and the data follow a normal distribution, we use the z test. The difference is divided by its standard error (SE):

$z = \frac{\mu - \bar{x}}{SD / \sqrt{n}}$ for a sample mean compared with a population mean $\mu$, or $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$ for two sample means.

If |z| > 2, there is a significant difference (at the 0.05 level). This is because the normal range for any biological reading lies between the population mean ± 2 SD, a range that includes about 95% of the area under the normal distribution curve.
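A minimal sketch of the two-sample z statistic from summary figures (means, SDs and sample sizes). The input numbers below are hypothetical, chosen only to show the calculation, and the function names are not from any library.

```python
import math


def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """z for the difference between two means of large samples."""
    # Standard error of the difference between the two means
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se


def is_significant(z, critical=2.0):
    """|z| > 2 (1.96 more precisely) corresponds to P < 0.05 (two-sided)."""
    return abs(z) > critical


# Hypothetical summary figures, for illustration only
z = z_two_means(120, 10, 100, 116, 12, 100)
print(f"z = {z:.2f}, significant: {is_significant(z)}")
```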


Student’s t-test

2- Comparing two means of small samples using the t test:

If we have a small sample size (less than 60), we use the t distribution instead of the normal distribution:

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{SD_1^2/n_1 + SD_2^2/n_2}}$

The calculated value of t is compared with the tabulated value in the t distribution table at the appropriate degrees of freedom. If the value of t is less than the tabulated value, the difference between the samples is insignificant.

If the t value is larger than the tabulated value, the difference is significant, i.e. the null hypothesis is rejected.
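A sketch of the t statistic exactly as written above (each sample keeps its own SD). Many textbooks use a pooled SD with n1 + n2 - 2 degrees of freedom for small samples; that variant is not shown. The example reuses, purely as an illustration, the two sets of observations from the C.V. examples; the function name is hypothetical.

```python
import math
import statistics


def t_statistic(sample1, sample2):
    """t = (mean1 - mean2) / sqrt(SD1^2/n1 + SD2^2/n2)."""
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance uses the n - 1 divisor, i.e. SD squared
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    se = math.sqrt(var1 / len(sample1) + var2 / len(sample2))
    return (mean1 - mean2) / se


# Illustration only: the two sets of observations from the C.V. examples
t = t_statistic([5, 7, 10, 12, 16], [2, 2, 5, 10, 11])
print(f"t = {t:.2f}")
```

The calculated t (about 1.47) would then be compared with the tabulated value at the chosen degrees of freedom; the standard library has no t table, so that lookup is left to printed tables or a statistics package.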


3- Paired t test:

If we are comparing repeated observations in the same individuals, or differences between paired data, we use the paired t test, in which the analysis is carried out using the mean and standard deviation of the differences between the members of each pair.
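A minimal sketch of the paired t test: take the difference within each pair, then divide the mean difference by its standard error. The before/after readings are hypothetical, and the function name is not from any library.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t: mean of the differences divided by its standard error."""
    if len(before) != len(after):
        raise ValueError("paired data must have equal lengths")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # SD of the differences (n - 1 divisor)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1                 # t and its degrees of freedom


# Hypothetical before/after readings on the same four individuals
t, df = paired_t([10, 12, 9, 11], [12, 13, 11, 14])
print(f"t = {t:.2f} with {df} degrees of freedom")
```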


4- Comparing several means:

Sometimes we need to compare more than two means. This could be done with several t tests, but that is not only tedious; it can also lead to spurious significant results. Therefore we use analysis of variance (ANOVA).

There are two main types: one-way and two-way analysis of variance. One-way analysis of variance is appropriate when the subgroups to be compared are defined by just one factor, for example a comparison between the means of different socio-economic classes. Two-way analysis of variance is used when the subdivision is based on two factors.

The main idea of analysis of variance is to take into account the variability within the groups and between the groups. The value of F is the ratio of the between-groups mean square (MS) to the within-groups mean square:

$F = \frac{MS_{between}}{MS_{within}}$
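A sketch of the one-way ANOVA F ratio, computing the between-groups and within-groups mean squares from their sums of squares and degrees of freedom. The three groups are hypothetical, and the function name is not from any library; the F value would still be compared with an F table.

```python
def one_way_anova_f(groups):
    """F = between-groups mean square / within-groups mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between groups: spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within groups: spread of each observation around its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)


# Hypothetical data for three groups, for illustration only
f, df = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F = {f:.2f}, df = {df}")
```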


b- Qualitative variables:

1) Chi-squared test:

Qualitative data are arranged in a table formed by rows and columns. The categories of one variable define the rows, and the categories of the other variable define the columns.

A chi-squared test is used to test whether there is an association between the row variable and the column variable, or, in other words, whether the distribution of individuals among the categories of one variable is independent of their distribution among the categories of the other.

$\chi^2 = \sum \frac{(O - E)^2}{E}$

In this formula, O = the observed value in each cell of the table, and E = the expected value, calculated as:

$E = \frac{R_t \times C_t}{GT}$, i.e. (total of the row x total of the column) / grand total.

Degrees of freedom = (number of rows - 1) x (number of columns - 1).

From the tables of $\chi^2$ significance, the degrees of freedom for a table of 3 rows and 3 columns = (3 - 1) x (3 - 1) = 2 x 2 = 4. The tabulated value at the 0.05 level with d.f. = 4 is 9.49. Therefore we conclude that there is a significant relation between socio-economic level and the degree of intelligence (because the calculated $\chi^2$ value is greater than the tabulated value).
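A sketch of the chi-squared calculation for any r x c table of counts, computing each expected value as row total x column total / grand total. The small table of 0.05 critical values covers d.f. 1 to 4 only. The illustration reuses the anemia counts from the Z test example as a 2 x 2 table; the function and constant names are hypothetical.

```python
# Upper 5% points of the chi-squared distribution for small d.f.
CRITICAL_005 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def chi_squared(table):
    """Chi-squared statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # E = Rt x Ct / GT
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Illustration: anemic / not anemic counts from the Z test example
observed = [[5, 45],   # group 1: 5 anemic out of 50
            [20, 40]]  # group 2: 20 anemic out of 60
chi2, df = chi_squared(observed)
print(f"chi2 = {chi2:.2f}, d.f. = {df}, significant: {chi2 > CRITICAL_005[df]}")
```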


2) Z test for comparing two percentages:

$z = \frac{p_1 - p_2}{\sqrt{p_1 q_1 / n_1 + p_2 q_2 / n_2}}$

where $p_1$ = percentage in the 1st group, $p_2$ = percentage in the 2nd group, $q_1 = 100 - p_1$, $q_2 = 100 - p_2$, $n_1$ = sample size of group 1 and $n_2$ = sample size of group 2. The Z test is significant (at the 0.05 level) if |z| > 2.

Example: the number of anemic patients in group 1 (50 patients) is 5, and the number of anemic patients in group 2 (60 patients) is 20. To find whether groups 1 and 2 differ significantly in the prevalence of anemia, we calculate the z test:

p1 = 5/50 = 10%, p2 = 20/60 = 33% (rounded), q1 = 100 - 10 = 90, q2 = 100 - 33 = 67

$z = \frac{10 - 33}{\sqrt{10 \times 90 / 50 + 33 \times 67 / 60}} = \frac{-23}{\sqrt{18 + 36.85}} = \frac{-23}{7.4} \approx -3.1$, so |z| = 3.1.

Therefore, there is a statistically significant difference between the percentages of anemia in the studied groups (because |z| > 2).
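The same calculation as a short Python sketch, with p and q expressed in percent as in the formula. It does not round p2 to 33%, so the result is about -3.15 rather than -3.1; the conclusion is unchanged. The function name is hypothetical.

```python
import math


def z_two_percentages(cases1, n1, cases2, n2):
    """z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2), with p and q in percent."""
    p1 = cases1 / n1 * 100
    p2 = cases2 / n2 * 100
    q1, q2 = 100 - p1, 100 - p2
    return (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)


z = z_two_percentages(5, 50, 20, 60)
print(f"{z:.1f}")  # -3.1
print("significant" if abs(z) > 2 else "not significant")
```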


c-Correlation and regression:

Correlation measures the closeness of the association between two continuous variables, while linear regression gives the equation of the straight line that best describes and enables the prediction of one variable from the other.


1- Correlation:

In correlation, the closeness of the association is measured by the correlation coefficient, r. The value of r ranges between +1 and -1. A value of 1 (+1 or -1) means perfect correlation, while 0 means no correlation. If r is near zero, the correlation is weak; if it is near one, the correlation is strong. The sign (+ or -) denotes the direction of the correlation: a positive correlation means that when one variable increases, the other also increases, while a negative correlation means that when one variable increases, the other decreases.
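A minimal sketch of the Pearson correlation coefficient r, computed from the deviations of each variable from its mean. The paired measurements are hypothetical; on Python 3.10 and later, `statistics.correlation` gives the same value.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (n >= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired measurements, for illustration only
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfect positive)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative)
```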


2- Linear regression:

Like correlation, linear regression is used to determine the relation between two variables, and it also predicts the change in one variable due to changes in the other. In linear regression, the independent variable has to be distinguished from the dependent variable.

Linear regression not only allows assessment of the presence of an association between the independent and dependent variables but also allows prediction of the dependent variable for a particular value of the independent variable. However, regression should not be used for prediction outside the range of the original data. A t test is also used to assess the level of significance. The dependent variable in linear regression must be continuous.
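A sketch of a least-squares regression line y = a + b x, with a prediction helper that refuses to extrapolate beyond the observed x values, as recommended above. The data are hypothetical and the function names are not from any library; on Python 3.10 and later, `statistics.linear_regression` returns the same slope and intercept. The significance test of the slope is not shown.

```python
def linear_regression(x, y):
    """Least-squares line y = a + b*x, where x is the independent variable."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx    # slope
    a = my - b * mx  # intercept
    return a, b


def predict(a, b, x_new, x_observed):
    """Predict y, refusing to extrapolate outside the observed x range."""
    if not min(x_observed) <= x_new <= max(x_observed):
        raise ValueError("prediction outside the range of the original data")
    return a + b * x_new


# Hypothetical data, for illustration only
x, y = [1, 2, 3, 4], [3, 5, 7, 9]
a, b = linear_regression(x, y)
print(a, b)                   # 1.0 2.0
print(predict(a, b, 2.5, x))  # 6.0
```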


[Figure: Correlation between Doppler velocimetry (RI) and baby birth weight (x axis: baby weight in kg)]

3-Multiple regression:

Situations frequently occur in which we are interested in how a dependent variable depends on several independent variables, not just one. The test of significance used is the analysis of variance (F test).

1. How would you select a representative sample of 100 students from a primary school? Use all possible methods of sample selection.

2. How would you select a primary school from a rural area and another school from an urban area in Egypt?

What type of sample is this?

1. A lottery to select a winner
2. Hospitalized patients with SLE
3. Every 6th patient coming to an outpatient clinic
4. 20 females and 20 males selected randomly out of a group of 100 persons
5. All workers in a factory chosen from all factories in a certain governorate

Present the following data in a suitable table and graph:

Infant mortality rates in 2006 in some countries were as follows: Egypt = 25/1000, USA = 10/1000, Sweden = 12/1000 and Pakistan = 30/1000.

Present the following data in a suitable table and graph:

The body weights (kg) of a group of male children were as follows: 12, 22, 18, 17, 28, 20, 16, 21, 19, 16, 27, 21 kg, and those of a group of female children were: 16, 23, 19, 29, 18, 22, 17, 15, 21, 21, 24 kg.

The weight (Kg ) of a pregnant
