msc unit 2


Upload: karthi-shanmugam

Post on 05-Feb-2016


Biostatistics UNIT – 2

CHAPTER 8 DESCRIPTIVE STATISTICS

Measures of central tendency
• They describe the central theme of the data and summarize the characteristics of an entire mass of data.
• The most common and useful measure of central tendency is the arithmetic mean.
• The other measures are median, mode, geometric mean, harmonic mean and weighted mean.

Measures of dispersion
• They describe the extent of scatter of the values around a measure of central tendency (how far from or how near to an average the values are).
• Standard deviation is the most important and common measure of dispersion.
• The other measures of dispersion are range, quartile deviation, decile range and mean deviation.

Chapter 9 Measures of central tendency - averages

Averages
• Definition: An average is a value that summarizes the characteristics of an entire mass of data.
• Objectives: (i) To present a huge mass of statistical data in a simple and concise manner. (ii) To make the central theme of the data readily understandable. (iii) To make comparison easier.
• Types of averages: (i) arithmetic mean (ii) median (iii) mode (iv) geometric mean (v) harmonic mean

Arithmetic mean / Mean
• It is defined as the sum of all the variates of a variable divided by the total number of items in the sample.
• It should be expressed in the same unit in which the data is given.

Median
• It is the value of the middle item of a given series of data arranged in ascending order of magnitude.
• It should be expressed in the same unit in which the data is given.

Mode
• Mode is defined as the value which occurs most frequently in a sample.

• A sample with a single mode is referred to as unimodal. If a sample has two modes, it is called bimodal. Multimodal or polymodal samples also occur. A sample with no mode is called a non-modal sample, or a sample with an ill-defined mode.

Geometric mean
• It is defined as the nth root of the product of the n items in ungrouped data.
• It is used when the average of a rate of change is required.

Harmonic mean
• It is defined as the reciprocal of the arithmetic mean of the reciprocals of the given data.
• It is an appropriate measure for averaging rates, such as speed.

Weighted averages

Properties of Mean:
• The arithmetic mean possesses certain properties, some desirable and some not so desirable. These properties include the following:
1. Uniqueness. For a given set of data there is one and only one arithmetic mean.
2. Simplicity. The arithmetic mean is easily understood and easy to compute.
3. Sensitivity to every value. Since each and every value in a set of data enters into the computation of the mean, it is affected by each value. Extreme values therefore influence the mean and, in some cases, can distort it so much that it becomes undesirable as a measure of central tendency.

Merits and Demerits of arithmetic mean:
• Merits:
  • It is easy to understand and easy to compute.
  • It is rigidly defined.
  • It is based upon all the observations.
• Demerits:
  • It cannot be computed if even a single value is missing.
  • It is not suitable for open-end classes.
  • It is not suitable for qualitative phenomena.

Properties of Median
• Uniqueness. As is true with the mean, there is only one median for a given set of data.
• Simplicity. The median is easy to calculate.
• It is not as drastically affected by extreme values as the mean is.

Relation between AM, GM, HM
• For positive values, AM ≥ GM ≥ HM, with equality only when all the values are equal.
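The relation between the three means can be checked numerically. The following is a minimal sketch using a made-up sample of positive values; the function names and the data are illustrative assumptions, not part of the original notes.

```python
import math

def arithmetic_mean(values):
    # Sum of the items divided by the number of items
    return sum(values) / len(values)

def geometric_mean(values):
    # nth root of the product, computed through logarithms
    # (all values are assumed to be positive)
    return math.exp(sum(math.log(v) for v in values) / len(values))

def harmonic_mean(values):
    # Reciprocal of the arithmetic mean of the reciprocals
    return len(values) / sum(1 / v for v in values)

data = [2.0, 4.0, 8.0]  # hypothetical sample

am = arithmetic_mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)

print(f"AM = {am:.4f}, GM = {gm:.4f}, HM = {hm:.4f}")
print("AM >= GM >= HM holds:", am >= gm >= hm)
```

For this sample the three means are all different because the values are not all equal; with identical values they would coincide.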

Comparison of Mean, Median, Mode
• A distribution in which the values of mean, median and mode coincide is known as a symmetrical distribution. When the values of mean, median and mode are not equal, the distribution is known as asymmetrical or skewed.
• Karl Pearson's empirical formula expresses this relationship as follows: Mode = 3 Median - 2 Mean (this formula is used to find the mode for ill-defined distributions; it is an approximation for moderately skewed data).
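As a rough illustration of the empirical formula, the sketch below compares the observed mode of a small, made-up right-skewed sample with the value given by Mode = 3 Median - 2 Mean. The data are hypothetical, and the two values are not expected to agree exactly because the formula is only an approximation.

```python
import statistics

data = [2, 3, 3, 4, 5, 7, 11]  # hypothetical, right-skewed sample

mean = statistics.mean(data)
median = statistics.median(data)
observed_mode = statistics.mode(data)

# Karl Pearson's empirical relation
empirical_mode = 3 * median - 2 * mean

print("mean =", mean, "median =", median, "observed mode =", observed_mode)
print("empirical mode =", empirical_mode)
```

Here mean > median > mode, so the three averages do not coincide and the sample is skewed rather than symmetrical.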

Chapter 10. Measures of Dispersion

Measures of dispersion
• Dispersion is defined as an absolute or relative measure of the differences of the values of the various items from a measure of central tendency of these items.
• The difference between the value of an item and a measure of central tendency is called a 'deviation'.
• An average of the deviations of the values of various items from a measure of central tendency is called a measure of dispersion.
• The different measures of dispersion are range, quartile deviation, decile range, standard deviation and mean deviation.

Range
• It is defined as the difference between the maximum value and the minimum value of the given series of data.

Quartile Deviation
• The given data (in ascending order) is divided into four equal parts by the quartiles.
• Q1: first quartile or lower quartile
• Q2: second quartile or middle quartile or median
• Q3: third quartile or upper quartile
• Quartile deviation = (Q3 - Q1) / 2
• Coefficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)

Decile Range
• The given data (in ascending order) is divided into 10 equal parts by the deciles (D1, D2, ..., D9).
• Decile range = D9 - D1

Mean Deviation from mean, median or mode
• It is the arithmetic mean of the absolute deviations of the various items from a measure of central tendency (mean, median or mode).

Standard deviation (SD)
• SD is defined as the square root of the arithmetic mean of the squared deviations of the various items from the arithmetic mean.
• Variance is defined as the arithmetic mean of the squared deviations of the various items from the arithmetic mean.
• Relation between SD and variance: SD = square root of variance
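The measures defined above can be sketched for raw data as follows. This is only an illustration on made-up numbers: the helper `quartile` is a hypothetical name, and it assumes the common textbook convention that the k-th quartile sits at position k(n + 1)/4 in the sorted data, with linear interpolation between neighbours.

```python
import math

def quartile(sorted_values, k):
    # Assumed convention: k-th quartile at 1-based position k*(n+1)/4,
    # interpolating linearly between neighbouring items
    n = len(sorted_values)
    pos = k * (n + 1) / 4
    lower = min(max(math.floor(pos), 1), n)
    upper = min(lower + 1, n)
    frac = max(pos - lower, 0.0)
    low_val, up_val = sorted_values[lower - 1], sorted_values[upper - 1]
    return low_val + frac * (up_val - low_val)

data = sorted([2, 4, 6, 8, 10, 12, 14])  # hypothetical sample
n = len(data)
mean = sum(data) / n

value_range = max(data) - min(data)
q1, q3 = quartile(data, 1), quartile(data, 3)
quartile_deviation = (q3 - q1) / 2
coeff_quartile_deviation = (q3 - q1) / (q3 + q1)

# Mean deviation about the mean: average of absolute deviations
mean_deviation = sum(abs(x - mean) for x in data) / n

# Variance and SD as defined above (dividing by n, not n - 1)
variance = sum((x - mean) ** 2 for x in data) / n
sd = math.sqrt(variance)

print(value_range, q1, q3, quartile_deviation, coeff_quartile_deviation)
print(round(mean_deviation, 4), variance, sd)
```

Note that the variance here divides by n, following the definition in the notes; many software packages divide by n - 1 by default for sample data.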

Coefficient of variation (CV)
• The relative measure of standard deviation is called the coefficient of variation.
• It is used to study the variability or consistency of the data.
• Higher CV => less consistent
• Lower CV => more consistent
• CV = (SD / mean) × 100

Chapter 11 Skewness and kurtosis

Skewness
Skewness is used to study the lack of symmetry in the shape of the frequency curve.

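Coefficient of skewness $= \dfrac{\text{mean} - \text{mode}}{\text{SD}}$ (Karl Pearson's)

(or)

$= \dfrac{Q_3 + Q_1 - 2\,\text{median}}{Q_3 - Q_1}$ (Bowley's)

(or)

$\beta_1 = \dfrac{\mu_3^2}{\mu_2^3}$ (method of moments; since $\beta_1$ cannot be negative, the direction of skewness is taken from the sign of $\mu_3$)

Coefficient of skewness < 0 - negatively skewed
Coefficient of skewness > 0 - positively skewed
Coefficient of skewness = 0 - symmetric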

(i) If mean = median = mode, symmetrical distribution (ii) If mean > median > mode, positively skewed distribution (iii) If mean < median < mode, negatively skewed distribution
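The coefficient of variation and the three skewness coefficients can be computed side by side. The sketch below uses a made-up sample and Python's standard `statistics` module; `statistics.quantiles` with its default "exclusive" method is assumed here as one reasonable quartile convention, and the SD divides by n.

```python
import statistics

data = [2, 3, 3, 4, 5, 7, 11]  # hypothetical, right-skewed sample
n = len(data)

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
sd = statistics.pstdev(data)  # population SD (divides by n)

# Coefficient of variation, in percent
cv = sd / mean * 100

# Karl Pearson's coefficient of skewness
pearson_skew = (mean - mode) / sd

# Bowley's coefficient of skewness from the quartiles
q1, q2, q3 = statistics.quantiles(data, n=4)
bowley_skew = (q3 + q1 - 2 * median) / (q3 - q1)

# Method of moments: beta_1 = mu_3^2 / mu_2^3
mu2 = sum((x - mean) ** 2 for x in data) / n
mu3 = sum((x - mean) ** 3 for x in data) / n
beta1 = mu3 ** 2 / mu2 ** 3

print(f"CV = {cv:.2f}%")
print(f"Pearson = {pearson_skew:.4f}, Bowley = {bowley_skew:.4f}")
print(f"beta1 = {beta1:.4f}, sign of mu3 = {'+' if mu3 > 0 else '-'}")
```

All three coefficients are positive for this sample, which agrees with mean > median > mode indicating a positively skewed distribution.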

Kurtosis
Kurtosis is the degree of peakedness of a frequency polygon.

$\beta_2 = \dfrac{\mu_4}{\mu_2^2}$

Types of Kurtosis (i) if $\beta_2 > 3$, leptokurtic (ii) if $\beta_2 < 3$, platykurtic (iii) if $\beta_2 = 3$, mesokurtic.
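A minimal sketch of the kurtosis classification, assuming the moment ratio $\beta_2 = \mu_4 / \mu_2^2$ with central moments that divide by n. The function names, the tolerance and both samples are illustrative assumptions.

```python
def central_moment(values, r):
    # r-th moment about the arithmetic mean, dividing by n
    mean = sum(values) / len(values)
    return sum((v - mean) ** r for v in values) / len(values)

def beta2(values):
    # Moment coefficient of kurtosis: mu_4 / mu_2^2
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def kurtosis_type(b2, tol=1e-9):
    # Classify against the reference value 3
    if abs(b2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"

flat_sample = [1, 2, 3, 4, 5]                        # hypothetical
peaked_sample = [0, 0, 0, 0, 10, -10, 0, 0, 0, 0]    # hypothetical

for sample in (flat_sample, peaked_sample):
    b2 = beta2(sample)
    print(f"beta2 = {b2:.2f} -> {kurtosis_type(b2)}")
```

The evenly spread sample gives a value below 3, while the sample with most values at the centre and two extreme values gives a value above 3.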

[Figure: frequency curves A - mesokurtic, B - platykurtic, C - leptokurtic]

STEM-AND-LEAF DIAGRAM
• A simple technique to visualize the nature of the population using the data from a sample of that population is the stem-and-leaf diagram. It is one of the exploratory data analysis (EDA) tools, which can be constructed easily and quickly. A stem-and-leaf diagram is constructed as a series of horizontal rows of numbers. The first number of each row is the label of that row and is called the stem. The remaining numbers in a row following the stem number are called the leaves.

Construction of a Stem-and-Leaf Diagram
• Step 1: At least five stems are chosen from the given data. Usually the first one or two digits of the numbers in the given data are chosen as the stems.
• Step 2: The rows are labelled using the stem numbers.
• Step 3: If the first one or two digits do not provide a sufficient number of stems to visualize the shape of the distribution, each stem may be used twice. The first of the twin stems takes the lower leaves 0, 1, 2, 3 and 4, and the second takes the higher leaves 5, 6, 7, 8 and 9.
• Step 4: Scanning the data, the digit following the stem number is reproduced as a leaf on the appropriate stem.
• Step 5: The diagram is turned on its side to visualize how the numbers are distributed. Specifically, the following aspects are considered:
  • Whether there is any tendency for the leaves to cluster close to a particular stem or stems.
  • Whether there is any tendency for the data to taper towards one end or the other.
  • Whether a smooth curve drawn across the top of the diagram forms a rough bell-shaped curve. If so, whether the curve is symmetric, flat or peaked.
• Step 6: The observations from the stem-and-leaf diagram with reference to the above aspects throw light on the nature of the population, such as its pattern and symmetry.

BOX PLOT
• The box plot is a diagrammatic representation of a data series that gives visual information about measures of central tendency, dispersion and direction of skewness.

Chapter 12 Inferential Statistics

Inferential Statistics
• To reach decisions about a large body of data by examining only a small part of the data.
[Inferential Statistics: A decision, estimate, prediction, or generalization about a population, based on a sample.]
• Any descriptive measure of a sample is called a sample statistic; any descriptive measure of a population is called a population parameter.
• Inferential statistics includes "hypothesis testing" and "tests of significance".

Chapter 13 Probability

Probability - Basic Definitions
• A random experiment is an experiment whose outcomes cannot be predicted in advance.

• Performing a random experiment is known as a trial.
• Each possible outcome of an experiment is called an event.
• All possible events of a trial are known as exhaustive events.
• The sample space of a random experiment is the collection of all possible outcomes.
• A group of events is said to be equiprobable if there is an equal chance for each event in the group to occur.
• Two or more events are said to be mutually exclusive if the occurrence of one event excludes the occurrence of the others.
• Two or more events are said to be independent if the occurrence of any one of them does not affect the occurrence of the others.

Simple choice, Permutation and Combination
• If one operation can be performed in m ways and a second operation can be performed in n ways, then the two operations can be performed together in m × n ways.
• Permutation means arrangement.
• Combination means selection.

Types of Probability
• We can distinguish two types of probability for the purpose of computing the probability of occurrence of an event: a priori probability (mathematical probability) and a posteriori probability (statistical probability).
• A priori probability: The probability is computed from established facts. If an event can happen in a ways and fail to happen in b ways, and all these ways are mutually exclusive and equiprobable, then the probability of occurrence of the event is p = a / (a + b) and the probability of failure of the event is q = b / (a + b). It can also be described as: a priori probability (p) = Number of favourable cases / Total number of possible cases
• A posteriori probability: A posteriori probability (statistical or empirical probability) is the ratio of the number of occurrences of an event to the total number of trials. P = Number of times the event occurred / Total number of trials

Theorems or rules of probability
• Addition Rule: If A and B are mutually exclusive events, then P(A or B) = P(A) + P(B)
• Product Rule: If A and B are independent events, then P(A and B) = P(A) · P(B)
• Conditional probability: If A and B are dependent events, then (i) P(A | B) = P(A and B) / P(B) (ii) P(B | A) = P(A and B) / P(A)

Application of Principles of Probability to biological problems
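Before turning to the problems, the three rules stated above can be tried on a simple genetic setting. The sketch below is an illustration only: it assumes a monohybrid cross Aa × Aa in which each parent passes on A or a with equal chance, so the four allele combinations are equiprobable, and it uses exact fractions. The helper `prob` and the event names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Aa x Aa: four equiprobable allele combinations AA, Aa, aA, aa
offspring = [a + b for a, b in product("Aa", repeat=2)]

def prob(event):
    # A priori probability: favourable cases / total possible cases
    favourable = sum(1 for genotype in offspring if event(genotype))
    return Fraction(favourable, len(offspring))

def is_AA(g): return g == "AA"
def is_aa(g): return g == "aa"
def is_carrier(g): return g in ("Aa", "aA")
def is_normal(g): return "A" in g  # dominant phenotype

# Addition rule (mutually exclusive events): P(AA or aa)
p_homozygous = prob(is_AA) + prob(is_aa)

# Product rule (independent events): two children, both aa
p_two_affected = prob(is_aa) * prob(is_aa)

# Conditional probability: P(carrier | normal phenotype)
p_carrier_and_normal = prob(lambda g: is_carrier(g) and is_normal(g))
p_carrier_given_normal = p_carrier_and_normal / prob(is_normal)

print(p_homozygous, p_two_affected, p_carrier_given_normal)
```

The conditional result, 2/3, is the kind of value needed whenever a phenotypically normal person is known to have two carrier parents.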

• Problem 1: Phenylketonuria (PKU) is inherited as a simple autosomal recessive trait. (i) What is the probability that 2 normal persons will produce a PKU child if we know that both sets of grandparents are carriers? (ii) Suppose the couple has 3 children. What is the probability that at least one of the 3 children will be afflicted with PKU?
• Problem 2: A man and a lady are getting married. The man has a brother who is afflicted with a sort of mental retardation that is inherited as a simple recessive trait. The man and his parents are all normal. The lady is normal and no one in her family has this form of mental abnormality. (i) What is the probability that this couple will have a mentally afflicted child? The probability that any individual, picked at random from the population, is a heterozygous carrier of this mental retardation gene is 1/100. (ii) Suppose the man also has no family history of this abnormality. What is the probability of this couple having an afflicted child?
• Problem 3: A woman has a haemophilic brother. She is married to a normal man. Her parents were normal. What is the probability that any son born to her will be haemophilic?
• Problem 4: A dihybrid cross between AaBb and AaBb, where A and B are dominant, produces 3 offspring. What is the probability that at least two of the three will be of the genotype aabb?

Venn Diagrams
• A Venn diagram (J. Venn) is a diagrammatic representation of a sample space enclosing all possible events associated with an experiment.

Chapter 14 Theoretical Probability Distributions

Theoretical Probability Distributions
• Binomial
• Poisson
• Normal
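Problem 4 above can be treated as a binomial calculation, which gives a first taste of the binomial distribution. The sketch below is one possible way to work it, under two stated assumptions: the two genes assort independently, so P(aabb) = 1/4 × 1/4 = 1/16 for each offspring, and the three offspring are independent trials. The function name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    # Probability of exactly k successes in n independent trials
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Assumption: independent assortment, so P(aa) and P(bb) multiply
p_aabb = Fraction(1, 4) * Fraction(1, 4)

n_offspring = 3
# "At least two of three" means exactly 2 or exactly 3
p_at_least_two = sum(binomial_pmf(k, n_offspring, p_aabb) for k in (2, 3))

print("P(aabb) per offspring =", p_aabb)
print("P(at least two of three are aabb) =", p_at_least_two)
```

Exact fractions are used so that the result can be compared directly with a hand calculation.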

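Formulas by measure for individual observations (raw data), discrete series and continuous series. Here $N = \sum f_i$, and "mid $x_i$" is the mid-value of the i-th class.

AM
• Individual observations / raw data: $\bar{x} = \dfrac{\sum x_i}{n}$
• Discrete: $\bar{x} = \dfrac{\sum f_i x_i}{N}$, $N = \sum f_i$
• Continuous: $\bar{x} = \dfrac{\sum f_i \,\text{mid}\,x_i}{N}$, $N = \sum f_i$

GM
• Individual observations / raw data: $\text{Antilog}\left(\dfrac{\sum \log x_i}{n}\right)$
• Discrete: $\text{Antilog}\left(\dfrac{\sum f_i \log x_i}{N}\right)$
• Continuous: $\text{Antilog}\left(\dfrac{\sum f_i \log \text{mid}\,x_i}{N}\right)$

HM
• Individual observations / raw data: $\dfrac{n}{\sum \frac{1}{x_i}}$
• Discrete: $\dfrac{N}{\sum \frac{f_i}{x_i}}$
• Continuous: $\dfrac{N}{\sum \frac{f_i}{\text{mid}\,x_i}}$

Median
• Individual observations / raw data: size of the $\left(\dfrac{n+1}{2}\right)$th position
• Discrete: size of the $\left(\dfrac{N+1}{2}\right)$th position; the value corresponds to the next higher cumulative frequency.
• Continuous: size of the $\left(\dfrac{N+1}{2}\right)$th position gives the median class; Median $= L + \dfrac{\frac{N}{2} - cf}{f} \times h$, where L is the lower limit of the median class, cf the cumulative frequency before it, f its frequency and h the class width.

Mode
• Individual observations / raw data: the value repeated the maximum number of times
• Discrete: the value corresponding to the maximum frequency
• Continuous: the modal class corresponds to the maximum frequency; Mode $= L + h \times \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2}$, where $f_1$ is the frequency of the modal class and $f_0$, $f_2$ are the frequencies of the preceding and succeeding classes.

Range
• Individual observations / raw data: Highest value - Lowest value
• Discrete: Highest value - Lowest value
• Continuous: Highest value - Lowest value

Q1
• Individual observations / raw data: size of the $\left(\dfrac{n+1}{4}\right)$th position
• Discrete: size of the $\left(\dfrac{N+1}{4}\right)$th position
• Continuous: size of the $\left(\dfrac{N+1}{4}\right)$th position

Q3
• Individual observations / raw data: size of the $3\left(\dfrac{n+1}{4}\right)$th position
• Discrete: size of the $3\left(\dfrac{N+1}{4}\right)$th position
• Continuous: size of the $3\left(\dfrac{N+1}{4}\right)$th position

Inter Quartile range
• Individual observations / raw data: $Q_3 - Q_1$
• Discrete: $Q_3 - Q_1$
• Continuous: $Q_3 - Q_1$

QD
• Individual observations / raw data: $\dfrac{Q_3 - Q_1}{2}$
• Discrete: $\dfrac{Q_3 - Q_1}{2}$
• Continuous: $\dfrac{Q_3 - Q_1}{2}$

MD about 3 measures
• Individual observations / raw data: $\dfrac{\sum \lvert x - (\text{mean or median or mode}) \rvert}{n}$
• Discrete: $\dfrac{\sum f \lvert x - (\text{mean or median or mode}) \rvert}{N}$
• Continuous: $\dfrac{\sum f \lvert \text{mid}\,x - (\text{mean or median or mode}) \rvert}{N}$

SD
• Individual observations / raw data: $\sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$
• Discrete: $\sqrt{\dfrac{\sum f (x - \bar{x})^2}{N}}$
• Continuous: $\sqrt{\dfrac{\sum f (x - \bar{x})^2}{N}}$ (with x taken as the class mid-value)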
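The continuous-series formulas can be sketched on a small frequency table. Everything below is illustrative: the class limits and frequencies are made up, the function names are hypothetical, and the median class is located with the N/2 cumulative-frequency rule that appears in the interpolation formula.

```python
import math

# Hypothetical continuous frequency table: (lower limit, upper limit, frequency)
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]

def total_frequency(table):
    return sum(f for _, _, f in table)

def grouped_mean(table):
    # AM = sum(f * mid) / N
    return sum(f * (lo + hi) / 2 for lo, hi, f in table) / total_frequency(table)

def grouped_median(table):
    # Median = L + ((N/2 - cf) / f) * h, using the first class whose
    # cumulative frequency reaches N/2
    half = total_frequency(table) / 2
    cf = 0
    for lo, hi, f in table:
        if cf + f >= half:
            return lo + (half - cf) / f * (hi - lo)
        cf += f

def grouped_mode(table):
    # Mode = L + h * (f1 - f0) / (2*f1 - f0 - f2) for the modal class
    # (assumes a single modal class with a nonzero denominator)
    i = max(range(len(table)), key=lambda k: table[k][2])
    lo, hi, f1 = table[i]
    f0 = table[i - 1][2] if i > 0 else 0
    f2 = table[i + 1][2] if i < len(table) - 1 else 0
    return lo + (hi - lo) * (f1 - f0) / (2 * f1 - f0 - f2)

def grouped_sd(table):
    # SD = sqrt(sum(f * (mid - mean)^2) / N)
    mean = grouped_mean(table)
    squared = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in table)
    return math.sqrt(squared / total_frequency(table))

print("mean   =", grouped_mean(classes))
print("median =", round(grouped_median(classes), 4))
print("mode   =", round(grouped_mode(classes), 4))
print("SD     =", round(grouped_sd(classes), 4))
```

Working from class mid-values means these results are approximations to what the raw observations would give.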