bio statistics notes

Upload: gopi-cool

Post on 06-Apr-2018


  • 8/3/2019 Bio Statistics Notes


    Biostatistics 2010


    INTRODUCTION AND IMPORTANCE

    The subject of Statistics is not a new discipline; it is as old as human society itself. The word Statistics seems to have been derived from the Latin word Status, the Italian word Statista, the German word Statistik or the French word Statistique, each of which means a political state. In ancient times the scope of Statistics was primarily limited to the collection of the following data by the Government for assessing manpower and framing fiscal policies:

    (i) Age-wise and sex-wise population of the country
    (ii) Property and wealth of the country

    In India, an efficient system of collecting official and administrative statistics existed over 2000 years ago, in particular during the reign of Chandragupta Maurya (324-300 B.C.). From Kautilya's Arthashastra it is known that even before 300 B.C. a very good system of collecting vital statistics and registering births and deaths was in vogue. During Akbar's reign (1556-1605 A.D.), Raja Todarmal, the then land and revenue minister, maintained good records of land and agricultural statistics.

    The seventeenth century saw the origin of Vital Statistics. Captain John Graunt of London (1620-1674), known as the father of vital statistics, was the first man to study the statistics of births and deaths. The theoretical development of so-called modern statistics came during the mid-seventeenth century with the introduction of the Theory of Probability and the Theory of Games and Chance, the chief contributors being mathematicians and gamblers of France, Germany and England.

    Karl Pearson (1857-1936), the founder of the greatest statistical laboratory in England (1911), is the pioneer in correlational analysis. His discovery of the Chi-square test, the first and most important of the modern tests of significance, won for statistics a place as a science. In 1908, the discovery of Student's t-distribution by W. S. Gosset, who wrote under the pen name "Student", ushered in an era of exact sample tests. Sir Ronald A. Fisher (1890-1962), known as the father of statistics, placed statistics on a very sound footing by applying it to various diversified fields such as genetics, biometry, education and agriculture.


    Definition of Statistics

    Different authors have given different definitions of statistics. Although no single definition of statistics is satisfactory for the purpose, the following statement will be useful.

    Statistics is the study of methods and procedures for collecting, classifying, summarizing, and analyzing data and for making scientific inferences from such data.

    Definitions by A. L. Bowley: "Statistics are numerical statements of facts in any department of enquiry placed in relation to each other." Bowley also said that statistics may be called the science of counting. This is obviously an incomplete definition, as it takes into account only the aspect of collection and ignores other aspects such as analysis, presentation and interpretation. Bowley gives another definition, which states that statistics may rightly be called the science of averages. This definition is also incomplete: although averages play an important role in understanding and comparing data, statistics provides many more measures.

    Definition by Croxton and Cowden: "Statistics may be defined as the science of collection, presentation, analysis and interpretation of numerical data from the logical analysis." It is clear that the definition of statistics by Croxton and Cowden is the most scientific and realistic one.

    Definition by Horace Secrist: "Statistics may be defined as the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner, for a predetermined purpose and placed in relation to each other."

    Development of Biostatistics

    Biostatistics is defined as the application of statistical methods to the problems of biology, including human biology, medicine, and public health. It is also known as Biometrics or Biometry (literally meaning biological measurement).

    Perhaps the earliest important figure in biostatistical thought was Adolphe Quetelet (1796-1874), a Belgian astronomer and mathematician, who in his work combined the theory and practical methods of statistics and applied them to the problems of biology, medicine and sociology. Francis Galton (1822-1911) has been called the father of biostatistics and eugenics, the two subjects that he studied interrelatedly.

    Some definitions concerning statistical inference

    Unit: The smallest object or individual that can be investigated; the source of the basic information. In surveys, the units are often called sampling units; in experiments, experimental units. (e.g.) individual animals in a farm

    Population or Universe: A very large (possibly infinite) group of units concerning which scientific inferences are to be made. (e.g.) animals in a farm

    Sample: When a few units are selected from a population, the selected group is called a sample. (e.g.) animals of a particular breed in a farm

    Variable: A quantitative or numerical characteristic of the data is called a variable. (e.g.) body weight of goats

    Continuous variable: A variable that can potentially take any value within a range is called a continuous variable. (e.g.) daily milk yield of a cow

    Discrete or discontinuous variable: If a variable takes only integral values, it is called a discrete or discontinuous variable. (e.g.) blood cell count

    Attribute: It refers to a qualitative character of the items chosen. (e.g.) colour of an animal

    Constant: It is a numerical value which is the same for all the units in the population. (e.g.) number of chromosomes for sheep

    Parameter: A statistical measure pertaining to a population is called a parameter. (e.g.) mean, standard deviation of the population


    Statistic: A statistical measure pertaining to a sample is called a statistic. (e.g.) mean, standard deviation of the sample

    Functions of statistics

    - presents facts in a definite form
    - simplifies a mass of figures
    - facilitates comparison
    - helps in formulating hypotheses
    - helps in testing hypotheses
    - helps in prediction
    - helps in the formulation of suitable policies

    Limitations of statistics

    - Statistics is not suitable for the study of qualitative phenomena
    - Statistics does not study individuals
    - Statistical laws are not exact
    - Statistical tables may be misused
    - Statistics is only one of the methods of studying a problem

    Misuse of Statistics

    The following are some of the common ways in which Statistics can be misused:

    1. Quoting figures without their context.
    2. Comparing entirely different sets of figures because of some superficial similarity.
    3. Enumerating only the figures favourable to an argument.
    4. Arguing from effect to cause.
    5. Generalizing from the part to the whole without any basis.

    COLLECTION OF DATA

    Before starting collection of data, one should take the following into consideration:


    - One should have a definite objective.
    - One should have a clear idea about the information to be collected.

    A statistical investigation always begins with collection of data. The investigator can collect the data either personally or from available records. The data are of two kinds:

    1. Primary data
    2. Secondary data

    Primary Data

    The data collected by the investigator himself from the sample or population are called primary data. The source from which one gathers primary data is called the primary source.

    Methods of collecting primary data

    Direct personal observation: This method consists in the collection of data personally by the investigator from the sources concerned. In other words, the investigator has to go to the field personally to make enquiries and solicit information from the informants or respondents.

    Indirect personal observation: The investigator collects data from a third person (called a witness) who knows about the data being gathered.

    Data collection through agents, local reporters etc.: Here the investigator appoints some person, called an agent, to collect information on his behalf. In this method the schedule (the name usually applied to a set of questions which are asked and filled in during a face-to-face situation with another person), which elicits comprehensive information, is framed by the chief investigator with the help of other experts, based on the objective of the survey.

    Data collection through questionnaires: The method of sending questionnaires (a questionnaire is a device for seeking answers to questions by using a form which the respondent fills in himself) by post and collecting the replies also by post should be employed if it is not feasible to appoint enumerators to cover the whole ground. A distinction is often made between a schedule and a questionnaire. A schedule is filled in by the interviewer in a face-to-face situation with the informant. A questionnaire is filled in by the informant, who receives and returns it by post. The questionnaire is mailed to the respondents with a request for a quick response within a specified time. A very polite covering note is attached, explaining in detail the aim and object of collecting the information and the operational definitions of the various terms and concepts used in the questionnaire. Respondents are also requested to extend full cooperation by furnishing correct replies and returning the questionnaires duly filled in on time. Respondents are taken into confidence by assuring them that the information they supply in the questionnaire will be kept strictly confidential. In order to ensure a quick and better response, the return postage is usually borne by the investigator, who sends a self-addressed stamped envelope. Preparing a questionnaire is a technical job and requires a great amount of skill, expertise and practice.

    Characteristics of a good questionnaire:

    1. The number of questions should be kept to a minimum.
    2. Questions should be in logical order, moving from easy to more difficult questions.
    3. Questions should be short and simple. Technical terms and vague expressions capable of different interpretations should be avoided.
    4. Questions fetching YES or NO answers are preferable. There may be some multiple-choice questions; questions requiring lengthy answers are to be avoided.
    5. Personal questions and questions which require memory power and calculations should also be avoided.
    6. Questions should enable cross-checking, so that deliberate or unconscious mistakes can be detected to an extent.
    7. Questions should be carefully framed so as to cover the entire scope of the survey.
    8. The wording of the questions should be proper, without hurting feelings or arousing resentment.
    9. As far as possible, confidential information should not be sought.
    10. The physical appearance should be attractive, and sufficient space should be provided for answering each question.

    Before the actual survey, a pilot survey is conducted. The questionnaire/schedule is pretested in the pilot survey: a few among the people from whom actual information is needed are asked to reply. If they misunderstand a question, find it difficult to answer or do not like its wording, it is to be altered. Further, it is to be ensured that every question fetches the desired answer.

    Merits and demerits of the methods

    Direct personal observation
      Merits: It is very accurate. Intensive details can be collected.
      Demerits: Expensive in terms of time and money. Not suitable when the field of enquiry is large.

    Indirect personal observation
      Merits: It saves time.
      Demerits: Witnesses should possess thorough knowledge of the facts regarding the problem of investigation. Witnesses must be willing to give information.

    Data collection through agents and local reporters etc.
      Merits: It saves time.
      Demerits: The agents will collect information in their own fashion. Only approximate results can be obtained. It is expensive.

    Data collection through questionnaires
      Merits: Large areas can be covered. It is less expensive. It saves time.
      Demerits: It cannot be used if the informants are illiterate. Response may be poor. Possibility of vague or inaccurate answers.

    Secondary data

    The data collected from available sources like published reports, documents, journals etc. are called secondary data. The source from which the secondary data are collected is called the secondary source of data. While primary data are collected for a specific purpose, secondary data are gathered from sources that were compiled for some other purpose.


    Sources of obtaining secondary data

    (i) Published Source:

    (a) Official publications of the Central Government (to mention a few):
        - Directorate of Economics and Statistics, Ministry of Agriculture and Irrigation
        - National Sample Survey Organization (NSSO), Department of Statistics, Ministry of Planning
        - Central Statistical Organization (CSO), Department of Statistics, Ministry of Planning
    (b) Publications of semi-government statistical organizations:
        - Statistics Department of Reserve Bank of India, Bombay
        - Economics Department of Reserve Bank of India, Bombay
        - The Institute of Economic Growth, Delhi
        - The Institute of Foreign Trade, New Delhi
    (c) Publications of research institutes:
        - Indian Statistical Institute, Calcutta
        - Indian Council of Agricultural Research, New Delhi
        - Indian Agricultural Statistics Research Institute, New Delhi
        - National Council of Educational Research and Training
        - National Council of Applied Economic Research
    (d) Publications of commercial and financial institutions
    (e) Reports of various committees and commissions appointed
    (f) Newspapers and periodicals
    (g) International publications
        - United Nations Organizations

    (ii) Unpublished Source:

    Statistical data need not always be published. There are various sources of unpublished statistical material, such as the records maintained by private firms or business enterprises, who may not like to release their data to any outside agency; the various departments and offices of the Central and State Governments; and the research carried out by individual research scholars in universities or research institutions.

    Merits of Secondary data

    - It saves time, labour and money

    Demerits of Secondary data

    - It may not be very accurate
    - All the data needed may not be available
    - It might have been collected by some improper method and under some abnormal condition

    CLASSIFICATION OF DATA

    Classification of data is the next step after collection of data. It is the process of arranging data into homogeneous classes according to similarities.

    Objectives (uses) of classification

    1. To remove unnecessary details
    2. To bring out explicitly the significant features in the data
    3. To make comparisons and draw inferences

    TYPES OF CLASSIFICATION

    1. Numerical classification
       Classification of data according to quantitative characters. (e.g.) classification of animals in a farm according to their weight

    2. Descriptive classification
       Classification according to attributes, i.e. qualitative characters. (e.g.) classification of animals according to breed

    3. Spatial classification
       Classification according to geographical area. (e.g.) district-wise livestock population in Tamil Nadu


    4. Temporal or chronological classification
       Classification according to time. (e.g.) livestock population in different years

    TABULATION

    Tabulation is the process of summarizing classified or grouped data in the form of a table so that it is easily understood and an investigator is quickly able to locate the desired information. A table is a systematic arrangement of classified data in columns and rows. Thus, a statistical table makes it possible for the investigator to present a huge mass of data in a detailed and orderly form. It facilitates comparison and often reveals certain patterns in data which are otherwise not obvious. Classification and tabulation are, as a matter of fact, not two distinct processes; they go together. Before tabulation, data are classified and then displayed under different columns and rows of a table.

    Advantages of Tabulation

    Statistical data arranged in a tabular form serve the following objectives:

    1. It simplifies complex data, and the data presented are easily understood.
    2. It facilitates comparison of related facts.
    3. It facilitates computation of various statistical measures like averages, dispersion, correlation etc.
    4. It presents facts in the minimum possible space, and unnecessary repetitions and explanations are avoided. Moreover, the needed information can be easily located.
    5. Tabulated data are good for reference, and they make it easier to present the information in the form of graphs and diagrams.

    Preparing a Table

    The making of a compact table is itself an art. The table should contain all the information needed within the smallest possible space. The purpose of the tabulation and how the tabulated information is to be used are the main points to be kept in mind while preparing a statistical table. An ideal table should consist of the following main parts:


    1. Table number
    2. Title of the table
    3. Captions or column headings
    4. Stubs or row designations
    5. Body of the table
    6. Footnotes
    7. Sources of data

    A model structure of a table is given below:

    Table Number: Title of the Table

    +---------------+------------------------------+-------+
    | Stub Heading  |       Caption Headings       | Total |
    |               +------------------------------+       |
    |               |     Caption Sub-Headings     |       |
    +---------------+------------------------------+-------+
    | Stub          |                              |       |
    | Sub-Headings  |             Body             |       |
    +---------------+------------------------------+-------+
    | Total         |                              |       |
    +---------------+------------------------------+-------+
    Footnotes:
    Source note:

    Requirements of a Good Table

    A good statistical table is not merely a careless grouping of columns and rows; it should summarize the total information in an easily accessible form in the minimum possible space. Thus, while preparing a table, one must have a clear idea of the information to be presented, the facts to be compared and the points to be stressed.

    Though there is no hard and fast rule for forming a table, a few general points should be kept in mind:


    1. A table should be formed in keeping with the objects of the statistical enquiry.
    2. A table should be carefully prepared so that it is easily understandable.
    3. A table should be formed so as to suit the size of the paper, but such an adjustment should not be at the cost of legibility.
    4. If the figures in the table are large, they should be suitably rounded or approximated. The method of approximation and the units of measurement should also be specified.
    5. Rows and columns in a table should be numbered, and figures to be stressed may be put in a box or circle or in bold letters.
    6. The arrangement of rows and columns should be in a logical and systematic order. This arrangement may be alphabetical, chronological or according to size.
    7. The rows and columns are separated by single, double or thick lines to represent the various classes and subclasses used. The corresponding proportions or percentages should be given in adjoining rows and columns to enable comparison. A vertical expansion of the table is generally more convenient than a horizontal one.
    8. The averages or totals of the different rows should be given at the right of the table and those of the columns at the bottom of the table. Totals for every subclass should also be given.
    9. In case it is not possible to accommodate all the information in a single table, it is better to have two or more related tables.

    Types of Tables

    Tables can be classified according to their purpose, stage of enquiry, nature of data or number of characteristics used. On the basis of the number of characteristics, tables may be classified as follows:

    1. Simple or one-way table
    2. Two-way table
    3. Manifold table

    Simple or one-way Table

    A simple or one-way table is the simplest table, which contains data on one characteristic only. A simple table is easy to construct and simple to follow.


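    For example, the blank table given below may be used to show the number of adults among different types of animals in a locality.

    The number of adults among different types of animals in a locality

    +----------------+---------------+
    | Type of animal | No. of adults |
    +----------------+---------------+
    |                |               |
    +----------------+---------------+
    | Total          |               |
    +----------------+---------------+

    Two-way Table

    A table which contains data on two characteristics is called a two-way table. In such a case, either the stub or the caption is divided into two coordinate parts. In the table above, for example, the caption may be further divided in respect of sex. This subdivision is shown in the two-way table below, which now contains two characteristics, namely type of animal and sex.

    +----------------+-----------------+-------+
    | Type of animal |  No. of adults  | Total |
    |                +--------+--------+       |
    |                |  Male  | Female |       |
    +----------------+--------+--------+-------+
    |                |        |        |       |
    +----------------+--------+--------+-------+
    | Total          |        |        |       |
    +----------------+--------+--------+-------+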
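The blank two-way table above can be filled mechanically from individual records. The following is a minimal sketch, assuming a small set of made-up `(type of animal, sex)` records; the animal names and counts are hypothetical and serve only to show how the body, the row totals and the column totals are obtained.

```python
from collections import Counter

# Hypothetical records: one (type of animal, sex) pair per adult animal.
records = [
    ("Cattle", "Male"), ("Cattle", "Female"), ("Cattle", "Female"),
    ("Goat", "Female"), ("Goat", "Female"), ("Sheep", "Male"),
]

cell_counts = Counter(records)  # frequency of each (animal, sex) cell
animals = sorted({animal for animal, _ in records})
sexes = ["Male", "Female"]

# Body of the table: one row per type of animal, one column per sex.
table = {
    animal: [cell_counts[(animal, sex)] for sex in sexes]
    for animal in animals
}

# Row totals go at the right of the table, column totals at the bottom.
row_totals = {animal: sum(counts) for animal, counts in table.items()}
column_totals = [sum(table[a][j] for a in animals) for j in range(len(sexes))]

print(f"{'Type of animal':<16}{'Male':>6}{'Female':>8}{'Total':>7}")
for animal in animals:
    male, female = table[animal]
    print(f"{animal:<16}{male:>6}{female:>8}{row_totals[animal]:>7}")
print(f"{'Total':<16}{column_totals[0]:>6}{column_totals[1]:>8}{sum(column_totals):>7}")
```

Adding a third field to each record (for example marital status) and counting on the triple would give the cells of a manifold table in the same way.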

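    Manifold Table

    More and more complex tables can be formed by including other characteristics. For example, we may further classify the caption sub-headings in the above table in respect of marital status, religion, socio-economic status etc. A table which has more than two characteristics of data is considered a manifold table. For instance, the table shown below has three characteristics, namely occupation, sex and marital status.

    +------------+-----------------------------------+-------+
    | Occupation |           No. of adults           | Total |
    |            +-----------------+-----------------+       |
    |            |      Male       |     Female      |       |
    |            +----+----+-------+----+----+-------+       |
    |            | M  | U  | Total | M  | U  | Total |       |
    +------------+----+----+-------+----+----+-------+-------+
    |            |    |    |       |    |    |       |       |
    +------------+----+----+-------+----+----+-------+-------+
    | Total      |    |    |       |    |    |       |       |
    +------------+----+----+-------+----+----+-------+-------+
    Footnote: M stands for married and U stands for unmarried.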


    Manifold tables, though complex, are good in practice, as they enable full information to be incorporated and facilitate analysis of all related facts. Still, as a normal practice, not more than four characteristics should be represented in one table, to avoid confusion. Other related tables may be formed to show the remaining characteristics.

    FREQUENCY DISTRIBUTION

    A frequency distribution is a series in which observations with similar or closely related values are put in separate bunches or groups, the groups being arranged in order of magnitude. It is simply a table in which the data are grouped into classes and the number of cases which fall in each class is recorded. It shows the frequency of occurrence of the different values of a single phenomenon.

    A frequency distribution is constructed for three main reasons:

    1. To facilitate the analysis of data.
    2. To estimate frequencies of the unknown population distribution from the distribution of sample data.
    3. To facilitate the computation of various statistical measures.

    Discrete or Ungrouped Frequency Distribution

    In this form of distribution, the frequency refers to a discrete value. The data are presented in such a way that the exact measurements of the units are clearly indicated. There are definite differences between the values of different groups of items. Each class is distinct and separate from the other classes; there is no continuity from one class to another. Examples are data such as the number of rooms in a house, the number of companies registered in a country, the number of children in a family, etc.

    In this method, the observations are arranged in a systematic way, in ascending order of magnitude; observations arranged in this natural order are called an array. From the arrayed figures, the frequency of each value can be obtained.

    E.g. Weights of ten eggs in grams: 51, 58, 49, 52, 55, 61, 59, 55, 45, 48

    Array form    45  48  49  51  52  55  58  59  61
    No. of eggs    1   1   1   1   1   2   1   1   1
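The egg example can be reproduced in a few lines. This is only an illustrative sketch using Python's standard `collections.Counter`; the variable names are chosen here and are not part of the notes.

```python
from collections import Counter

# Egg weights in grams, as given in the example.
weights = [51, 58, 49, 52, 55, 61, 59, 55, 45, 48]

# The array: observations arranged in ascending order of magnitude.
array = sorted(weights)

# Ungrouped frequency distribution: how often each distinct value occurs.
frequency = Counter(array)

for value in sorted(frequency):
    print(value, frequency[value])
```

Each distinct weight gets its own row, which is why this form condenses the data very little when most values occur only once.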


    This representation, though better than an array, does not condense the data much, and it is quite cumbersome to go through when the data are large.

    Grouped Frequency Distribution

    When the data are grouped into classes of an appropriate interval, showing the number of observations in each class, we get a grouped frequency distribution.

    (e.g.) The following frequency table shows the distribution of chicks in different weight classes:

    Class (weight in grams)    Frequency (no. of chicks)
    30 - 34                     2
    34 - 38                     7
    38 - 42                     8
    42 - 46                     3
    Total                      20

    Raw data and grouped data: The observed data given as such are known as raw data. When the observed data are arranged into groups or classes, they are known as grouped data.

    Class limits: Class limits are the limits within which the class interval lies. Thus each class interval has two limits, the upper and the lower limit.

    Frequency: The frequency of a class is the number of observations falling in that class.

    Width or length of the class (class interval): The width of a class is the difference between the upper boundary and the lower boundary of the same class. The width of a class is known as the class interval.

    Class mark: The midpoint of a class is called the class mark.

    Rules to be followed in forming a frequency distribution

    - The class intervals should be of equal width and of such a size that the characteristic features of the distribution are displayed.


    - Classes should be neither too large nor too small. If they are too large, considerable error is involved in assuming that the midpoint of a class interval is the average of that class. If they are too small, there will be many classes with zero or very small frequency. There are, however, certain types of data which may require the use of unequal or varying class intervals. When the data are irregular, with widely fluctuating gaps among the values, varying class intervals are to be taken; otherwise some classes may have no observations falling in them.
    - The range of the classes should cover the entire range of the data, and the classes must be continuous.
    - It is convenient to have the midpoint of the class interval be an integer. As a general rule, the number of classes should be in the range of 6 to 16 and never more than 30.

    Formation of Class Intervals

    First we have to form the class interval. Let L be the lowest value in the data to be classified and H the highest value. Find the difference, i.e. difference = H - L. The class interval is then

    $$ i = \frac{H - L}{k} $$

    where k is the number of required classes. The number of required classes can be calculated using Sturges' rule,

    $$ k = 1 + 3.322 \log_{10} n $$

    where n is the total number of observations.
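As a quick check of the two formulas, here is a minimal sketch in Python. Rounding the number of classes and the class width up to whole numbers is an assumption made here for convenience, not a rule stated in the notes, and the function names are hypothetical.

```python
import math


def sturges_classes(n):
    """Number of classes k = 1 + 3.322 * log10(n), rounded up (assumed rounding)."""
    return math.ceil(1 + 3.322 * math.log10(n))


def class_width(lowest, highest, k):
    """Class interval i = (H - L) / k, rounded up to a whole number (assumed rounding)."""
    return math.ceil((highest - lowest) / k)


# Illustration: 50 observations whose values range from 32 to 64.
k = sturges_classes(50)
i = class_width(32, 64, k)
print(k, i)
```

For 50 observations the rule gives 1 + 3.322 x 1.699, which is about 6.64, so seven classes are used.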

    Choice of the Class Interval

    The following are the different types of class intervals that are followed.


    Type a     Type b             Type c   Type d          Type e     Type f
    0 - 10     0 and under 10      5       Less than 20    0 - 10     0 - 9.9
    10 - 20    10 and under 20    15       20 - 30         10 - 30    10 - 19.9
    20 - 30    20 and under 30    25       More than 30    30 - 70    20 - 29.9

    In type d the end classes are open. In type e the class intervals are unequal. In type c the midpoints of the class intervals are given. In type f the class limits are exactly defined. Type b is good, but it is not commonly used. We often use type a, even though its class limits are not clearly expressed: the difficulty is where to include 10, 20 etc. We usually include 10 in the second class, 20 in the third class and so on; that is, we define the first class as 0 and up to (but not including) 10, the second class as 10 and up to 20, and so on. Depending on the need and the situation, an appropriate type of class interval should be chosen.

    Formation of frequency distribution

    1. Method of Tally Marks

    After forming the class intervals, each is written one below the other, and for each item in the collected data a stroke is marked against the class interval in which it falls. Usually, after every four strokes in a class interval, the fifth item is indicated by drawing a diagonal line through the previous four strokes. The strokes are then counted. This is called formation of a frequency distribution by the method of tally marks.

    Example: Let us consider the weights in kg of 50 college students.

    42 62 46 54 41 37 54 44 32 45
    47 50 58 49 51 42 46 37 42 39
    54 39 51 58 47 64 43 48 49 48
    49 61 41 40 58 49 59 57 57 34
    56 38 45 52 46 40 63 41 51 41

    Construct a frequency distribution.

    Here n = 50, L = 32 and H = 64. The size of the class interval as per Sturges' rule is obtained as follows:

    $$ k = 1 + 3.322 \log_{10} 50 \approx 6.64, \qquad i = \frac{H - L}{k} = \frac{64 - 32}{6.64} \approx 4.8 \approx 5 $$

    Thus the number of classes is 7 and the size of each class is 5.

    The required frequency distribution is prepared using tally marks as given below:


    Classes (weight in kg)    Tally marks           Frequency
    30 - 35                   ||                     2
    35 - 40                   ||||/                  5
    40 - 45                   ||||/ ||||/ |         11
    45 - 50                   ||||/ ||||/ |||       13
    50 - 55                   ||||/ |||              8
    55 - 60                   ||||/ ||               7
    60 - 65                   ||||                   4
    Total                                           50
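The tallying procedure can be checked by machine. The sketch below counts the 50 weights into seven classes of width 5 starting at 30, assuming (as in the type a convention) that each class includes its lower limit and excludes its upper limit. The `tally` helper and the `||||/` notation for a completed group of five are choices made here for plain-text output.

```python
# Weights in kg of the 50 college students from the example.
weights = [
    42, 62, 46, 54, 41, 37, 54, 44, 32, 45,
    47, 50, 58, 49, 51, 42, 46, 37, 42, 39,
    54, 39, 51, 58, 47, 64, 43, 48, 49, 48,
    49, 61, 41, 40, 58, 49, 59, 57, 57, 34,
    56, 38, 45, 52, 46, 40, 63, 41, 51, 41,
]

LOWER, WIDTH, NUM_CLASSES = 30, 5, 7

# One stroke per observation: class index = (value - lowest limit) // width.
counts = [0] * NUM_CLASSES
for w in weights:
    counts[(w - LOWER) // WIDTH] += 1


def tally(count):
    """Render a count as tally marks; '||||/' stands for a crossed group of five."""
    groups, rest = divmod(count, 5)
    return " ".join(["||||/"] * groups + (["|" * rest] if rest else []))


for index, count in enumerate(counts):
    low = LOWER + index * WIDTH
    print(f"{low} - {low + WIDTH:<6}{tally(count):<20}{count:>3}")
print("Total", sum(counts))
```

A value that lies exactly on a boundary, such as 40 or 45, is placed in the class that starts at that value.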

    2. Array Method

    An array is an orderly arrangement of the data by magnitude, in ascending or descending order. First arrange the given data in ascending order of magnitude and form the class intervals. From the array, count the number of observations belonging to each class and write it against that class. This method is not easy when the number of observations is large; we can adopt it in cases where the number of observations is less than 30.

    PRESENTATION OF DATA

    Classification and tabulation reduce the complexity of vast and complicated statistical data, but it is still not easy to interpret tabulated data. Diagrams and graphs catch the eye more easily than tables, which provide only an array of figures. A glance at a graph or diagram enables any layman (without statistical knowledge) to get an idea of the essential characteristics of the tabulated data without much strain or effort. The presentation of data in the form of diagrams and graphs is also called visual presentation of data.


    Functions of diagrams and graphs

    - They attract the attention of a large number of persons.
    - They leave a bird's-eye-view impression in the human mind.
    - They save a lot of valuable time when data are presented as suitable charts and graphs instead of pages of numerical figures.
    - They facilitate comparison between two or more sets of data.
    - Prediction equations can be represented by graphs, and these are of much use in forecasting.

    Limitations of diagrams and graphs

    - They are approximate indicators. Exact and accurate information can be obtained only from the original tabular information.
    - They cannot substitute for the tabular information.
    - They fail to disclose small differences when large figures are involved.

    GRAPHICAL REPRESENTATION OF DATA

    The discussion of frequency distribution has shown that tabular presentation of data tends to preserve numerical accuracy, while graphic representation fosters comparison and quick communication of major features. The general rules for constructing graphs are as follows:

    1. Title and footnotes: The graph must bear a concise and self-explanatory title and must contain appropriate footnotes.

    2. Selection of scale: Proper care should be taken in the selection of the scale so that the graph is neither too big nor too small. In addition, the scales used on the X-axis and the Y-axis should be mentioned clearly.


    3.Neatness:

    The graph should be neat and clear. If a number of diagrams are to beprepared, it is desirable to number them for the purpose of reference.4.Attractive:The graph should be attractive so that it invites the attention of thereader immediately. To make the diagram attractive, leave reasonablemargin on all sides of the diagram.5.Falsebaseline:

    Generally the vertical scale (ie Yaxis) starts from zero. However if theminimum value to be portrayed on the Yaxis is large, it would be difficult todraw the graph. In such cases use is made of, what is known as false baseline. For this space between the origin point and the maximum value isreduced by drawing two zigzag horizontal lines for the space between theminimum value and the origin.6.Depictionofmorethanonevariable:If more than one variable is to be depicted on a graph, they should beshown by different types of lines. Colours and shades should be used toexhibit various components of a diagram and a key be provided.With the data classified in the form of grouped frequency distribution,we can have the following graphical representation.1. Histogram2. Frequency polygon3. Frequency curve4. Ogive5. Lorenz Curve

    The constructions of the above graphical representations are furnished hereunder.

    1. Histogram:

    In drawing the histogram of a given grouped frequency distribution, we first mark off all the classes along the X-axis on a suitable scale. With the class intervals as bases, draw rectangles whose areas are proportional to the frequencies of the classes. For equal class intervals, the heights of the rectangles will be proportional to the frequencies, while for unequal class intervals, the heights will be proportional to the ratios of the frequencies to the widths of the classes.

    Example 10: Draw a histogram for the following data.

    Wages (in Rs)    Number of workers
    0-50             8
    50-100           16
    100-150          27
    150-200          19
    200-250          10
    250-300          6

    2. Frequency polygon:

    A polygon is a closed figure with many sides. For a grouped frequency distribution, the abscissae of the points are the mid values of the classes. For equal class intervals, the frequency polygon can be obtained by joining the middle points of the upper sides of the adjacent rectangles of the histogram by means of straight lines. If the class intervals are of small width, the polygon can be approximated by a smooth curve. This form of graphical representation also depicts clearly the features of the distribution.


    Example: Draw a frequency polygon for the following data.

    Weight (in kg) Number of Students

    30-35 4

    35-40 7

    40-45 10

    45-50 18

    50-55 14

    55-60 8

    60-65 3

    3. Frequency curve:

    The construction is the same as for the frequency polygon, but the frequency curve is obtained by drawing a smooth freehand curve through the vertices of the frequency polygon.

    Example: Draw a frequency curve for the following data.

    Weight (in kg) Number of Students

    30-35 4

    35-40 7

    40-45 10

    45-50 18

    50-55 14

    55-60 8

    60-65 3

    [Figure: graph of the data above. X-axis: Weight (in Kg), 30 to 65; Y-axis: No. of students, 0 to 20.]


    4. Ogive:

    This is a cumulative frequency curve. It is obtained by making use of the cumulative frequencies instead of the simple frequencies. It runs more regularly than the ordinary frequency curve. It is particularly useful for finding the median and the quartiles. The mode can also be obtained by finding the X-value at the steepest part of the curve. The curve is smoothed out if necessary. As it assumes the form of an arch, it is called an Ogive or Ogee. The point to be particularly noted in drawing an Ogive is that a cumulative frequency curve is plotted at the upper limits (less than Ogive) or the lower limits (greater than Ogive) of the classes, and not at the midpoints as is done in the case of the frequency polygon or curve.

    Less than Ogive: If the points are plotted with the upper limits of the classes on the X-axis and the corresponding (less than) cumulative frequencies on the Y-axis, the figure formed by joining these points with a smooth freehand curve is known as the less than cumulative frequency curve.

    Greater than Ogive: If the lower limits of the classes are plotted against the corresponding (greater than) cumulative frequencies in the same way, the figure obtained is the greater than cumulative frequency curve.

    Formation of the table of (less than and greater than) Ogives:

    Example: Draw the Ogives for the following data.

    [Figure: graph of the weight data from the previous example. X-axis: Weight (in Kg), 30 to 65; Y-axis: No. of students, 0 to 20.]


    Class interval Frequency

    20-30 4

    30-40 6

    40-50 13

    50-60 25

    60-70 32

    70-80 19

    80-90 8

    90-100 3

    Solution:

    Class limit    Less than Ogive    More than Ogive
    20             0                  110
    30             4                  106
    40             10                 100
    50             23                 87
    60             48                 62
    70             80                 30
    80             99                 11
    90             107                3
    100            110                0
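The table above can be reproduced with a short sketch. It builds the less than and more than cumulative frequencies from the class frequencies, then estimates the median as the X-value where the less than curve reaches N/2 (the same height at which the two ogives cross). The helper name `ogive_median` is hypothetical, and the sketch assumes the plotted points are joined by straight lines rather than a smoothed freehand curve.

```python
from itertools import accumulate

# Class limits and frequencies from the example table.
limits = [20, 30, 40, 50, 60, 70, 80, 90, 100]
freqs = [4, 6, 13, 25, 32, 19, 8, 3]

total = sum(freqs)
# Less-than cumulative frequency at each class limit (0 at the first limit).
less_than = [0] + list(accumulate(freqs))
# More-than cumulative frequency is whatever remains above each limit.
more_than = [total - c for c in less_than]


def ogive_median(limits, less_than, total):
    """X-value where the less-than ogive reaches N/2 (straight segments assumed)."""
    half = total / 2
    for i in range(1, len(limits)):
        lo, hi = less_than[i - 1], less_than[i]
        if hi > lo and lo <= half <= hi:
            width = limits[i] - limits[i - 1]
            return limits[i - 1] + (half - lo) / (hi - lo) * width
    raise ValueError("no class contains the N/2 point")


median = ogive_median(limits, less_than, total)
for row in zip(limits, less_than, more_than):
    print(*row)
print("median estimate:", median)
```

With straight segments both curves pass through the height N/2 = 55 at the same X-value, which is why the intersection gives the median.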

    From the point of intersection of the less than and greater than cumulative frequency curves, a perpendicular is drawn to the base. The median value can be ascertained by measuring the length along the X-axis from the origin to the foot of the perpendicular. In other words, the median value is the X-value of the point of intersection of the greater than and less than Ogives.

    [Figure: less than and greater than Ogives. X-axis: Class limit, 0 to 100; Y-axis: Cumulative frequency, 0 to 120.]

    5. Lorenz Curve

    The Lorenz curve is a graphical method of studying dispersion. It was introduced by Max O. Lorenz, an economist and statistician, to study the distribution of wealth and income. It is also used to study the variability in the distribution of profits, wages, revenue, etc. It is specially used to study the degree of inequality in the distribution of income and wealth between countries or between different periods. The cumulative percentages of one variable are plotted against the cumulative percentages of the other variable, and the Lorenz curve is drawn through these points.

    DIAGRAMMATIC REPRESENTATION OF DATA

    Significance:

    Even after proper classification and tabulation, a mass of data does not lend itself to a ready grasp of the information contained in it. Data expressed diagrammatically have a better visual effect, which is apt to create interest even in a casual observer. The visual effect is found to be lasting, and it enables the observer to hold in mind a large number of facts without much strain or effort.

    Essential requisites of a good diagram:

    1. A diagram should be well planned out.
    2. It should be drawn with utmost care.
    3. It should be neat and drawn to scale.
    4. It should have a good visual effect.
    5. The design of the diagram should be simple but impressive.
    6. It should involve less time, labour and cost and should have maximum utility.

    Bar Diagram:

    The bar diagram is the simplest of statistical diagrams. It consists of a series of bars of equal width (all horizontal or all vertical) standing on a common base line at equal intervals, the lengths of the bars being proportional to the magnitudes of the variables that they represent.

    Component Bar Diagram:

    In certain cases, the variable is capable of being subdivided. The bars are then divided into parts, which are marked in different colours or in some other distinct way to show the component parts.

    Percentage Bar Diagram:

    When the component parts are expressed as percentages of the whole, the resulting bar diagram is called a percentage bar diagram. In this case all bars are of equal length.

    [Figure: bar diagram. X-axis: Year (1995, 2000, 2005, 2010); Y-axis: Average milk yield (Kg).]

    [Figure: component bar diagram. X-axis: Year (1995, 2000, 2005, 2010); Y-axis: Average milk yield (Kg); components: Goat, Buffalo, Cow.]


    Multiple Bar Diagram:

    Bar diagrams may sometimes be superimposed or placed in juxtaposition for comparative purposes.

    Pie Diagram:

    Instead of representing the variables by bars (or rectangles), they can be represented by circles whose areas are proportional to the values of the variables. This presentation is known as a pie diagram. The component parts of the different variables can then be represented by sectors of these circles.

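    [Figure: percentage bar diagram. X-axis: Year (1995, 2000, 2005, 2010); Y-axis: Average milk yield, 0% to 100%; components: Goat, Buffalo, Cow.]

    [Figure: multiple bar diagram. X-axis: Year (1995, 2000, 2005, 2010); Y-axis: Average milk yield (Kg); series: Cow, Buffalo, Goat.]

    [Figure: pie diagram titled "Population". Sectors: Cow, Buffalo, Goat, Others; values shown: 8.2, 3.2, 1.4, 1.2.]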


    Percentage Pie Diagram:

    If the component parts are expressed as percentages of the whole, the variables can be represented by circles of equal radii, each of which is divided into sectors showing the component parts. This representation may be called the percentage pie diagram.

    Pictograph:

    These consist of actual pictures representing the variable, the size of the pictures being proportional to the value of the variable.

    [Figure: percentage pie diagrams for 1995 and 2005. Sectors: Cow, Buffalo, Goat, Others.]

    [Figure: pictograph. X-axis: Year (2005 to 2008); Y-axis: Average no. of computers, 0 to 5.]


    Statistical Maps:

    Statistical maps are widely used to represent spatial distributions such as areas under different crops, population and its density, etc. On the map, the magnitude of the data is shown

    a. by points, dots or crosses,
    b. by writing the actual figures, or
    c. by using different colours,

    and it is explained by the key shown in one of the corners of the map.

    SUMMATION NOTATIONS

    If there are n observations and their values are given by x1, x2, ..., xn, then the sum of the observations, x1 + x2 + ... + xn, can be written using the symbol $\Sigma$ (read as sigma) as follows:

    $$\sum_{i=1}^{n} x_i$$

    which means the sum of the $x_i$'s, i taking values from 1 to n.

    When no ambiguity is likely to arise, the suffix i can be removed and we can write $x_1 + x_2 + \dots + x_n = \sum x$. Using the same notation, $\sum_{i=1}^{n} f_i x_i = f_1 x_1 + f_2 x_2 + \dots + f_n x_n$, which can be written as $\sum fx$.


    Similarly, $\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + \dots + x_n^2$, which can be written as $\sum x^2$.

    MEASURES OF CENTRAL TENDENCY / MEASURES OF LOCATION / AVERAGES

    A statistical average condenses a frequency distribution or raw data and presents it as one single representative number. Thus a single expression representing the whole group is selected, which may convey a fairly adequate idea about the whole group. This single expression is known in statistics as the average. Such a value is neither the smallest nor the largest one but lies somewhere between them, and so an average is usually referred to as a measure of central tendency. It is located at a point around which most of the other values tend to cluster, and therefore it is also termed a measure of location. It is considered a measure of description because it describes the main characteristics of the data.

    Objectives of averaging or need for calculating averages:

    1. Describing the distribution in a concise manner: An average condenses the mass of data into a single value, and this enables us to form an idea about the entire distribution.

    2. Comparing two or more distributions: When averages for two or more distributions are calculated, the task of comparison becomes easy.

    3. In computing other statistical measures: Complete study of a distribution requires calculation of various statistical measures like dispersion, skewness, kurtosis, etc. The computation of many of these measures requires, as a first step, the computation of an average value.

    4. In carrying out other statistical analysis: To compare the mean performance of two distributions, the mean values have to be computed first.

    Average is a general term. There are different types of averages.


    Types of averages

    1. Arithmetic Mean (AM) or Mean or Common Mean
    2. Geometric Mean (GM)
    3. Harmonic Mean (HM)
    4. Median
    5. Mode

    1. Arithmetic Mean (AM)

    For ungrouped / raw data:

    The Arithmetic Mean is the value arrived at by dividing the sum of the observations by the total number of observations. The mean of a population is denoted by $\mu$ (read as 'mu'), whereas for a sample it is denoted by $\bar{x}$ (read as 'x bar'). The total number of observations is denoted by 'N' for the population and by 'n' for the sample. If we denote the 'n' observations in a series by x1, x2, x3, ..., xn, then

    $$\text{Arithmetic Mean: } \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x_i}{n}$$

    Example: The following are the body weights (Kg) of 10 Merino Rams. Calculate the A.M.

    52, 58, 65, 70, 65, 55, 74, 68, 75, 80

    A.M = (52 + 58 + 65 + 70 + 65 + 55 + 74 + 68 + 75 + 80) / 10

    = 662 / 10 = 66.2 Kg.
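The calculation above can be checked with a minimal sketch. The function name `arithmetic_mean` is only illustrative; the data are the ten body weights from the example.

```python
# Body weights (Kg) of the 10 Merino Rams from the example.
weights_kg = [52, 58, 65, 70, 65, 55, 74, 68, 75, 80]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_kg = arithmetic_mean(weights_kg)
print(sum(weights_kg), len(weights_kg), mean_kg)
```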


    Grouped data:

    In the case of a frequency distribution, if the class marks of the 'k' classes are denoted by x1, x2, x3, ..., xk and the corresponding frequencies by f1, f2, f3, ..., fk, then the mean of the data is

    $$\bar{x} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i} = \frac{\sum f_i x_i}{N}$$

    where
    $f_i$ = frequency of the i-th class
    $x_i$ = mid value of the i-th class
    k = number of classes
    N = $\sum f_i$ = total number of observations

    Shortcut method (grouped data):

    $$\bar{x} = A + \frac{\sum f_i d_i}{N} \times C, \qquad d_i = \frac{x_i - A}{C}$$

    where
    $f_i$ = frequency of the i-th class
    $x_i$ = mid value of the i-th class
    C = class interval
    N = total number of observations
    A = an arbitrary point, also called the Assumed Mean or Provisional Mean

    The assumed mean can be the middlemost mid value, or it may be the mid value for which the frequency is maximum.

    E.g. Find the A.M. for the following grouped frequency distribution of birth weight of Nilagiri Ram lambs.

    Classes    Mid value (xi)   fi    fi xi    di    fi di
    1.8-2.0    1.9              2     3.8      -5    -10
    2.0-2.2    2.1              1     2.1      -4    -4
    2.2-2.4    2.3              2     4.6      -3    -6
    2.4-2.6    2.5              3     7.5      -2    -6
    2.6-2.8    2.7              9     24.3     -1    -9
    2.8-3.0    2.9              11    31.9     0     0
    3.0-3.2    3.1              11    34.1     1     11
    3.2-3.4    3.3              4     13.2     2     8
    3.4-3.6    3.5              4     14.0     3     12
    3.6-3.8    3.7              1     3.7      4     4
    3.8-4.0    3.9              2     7.8      5     10
    Total                       50    147.0          10

    $$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{147}{50} = 2.94 \text{ Kg}$$

    Shortcut method (grouped data), with A = 2.9 and C = 0.2:

    $$\bar{x} = A + \frac{\sum f_i d_i}{N} \times C = 2.9 + \frac{10}{50} \times 0.2 = 2.94 \text{ Kg}$$
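As a cross-check, the direct and shortcut calculations can be sketched as follows. The function names are hypothetical, and the step deviations are recovered as d = (x - A)/C with A = 2.9 and C = 0.2, as in the table.

```python
# Mid values and frequencies from the birth-weight table.
mids = [1.9, 2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 3.9]
freqs = [2, 1, 2, 3, 9, 11, 11, 4, 4, 1, 2]


def grouped_mean(mids, freqs):
    """Direct method: sum(f * x) / N."""
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


def grouped_mean_shortcut(mids, freqs, assumed_mean, width):
    """Shortcut method: A + (sum(f * d) / N) * C with d = (x - A) / C."""
    n = sum(freqs)
    # Rounding removes floating-point noise; d is an integer for equal classes.
    steps = [round((x - assumed_mean) / width) for x in mids]
    sum_fd = sum(f * d for f, d in zip(freqs, steps))
    return assumed_mean + sum_fd / n * width


direct = grouped_mean(mids, freqs)
shortcut = grouped_mean_shortcut(mids, freqs, assumed_mean=2.9, width=0.2)
print(round(direct, 4), round(shortcut, 4))
```

Both methods give the same mean; the shortcut only reduces the arithmetic when working by hand.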

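    COMBINED MEAN

    If $\bar{x}_1$ is the mean of the first group of $n_1$ items and $\bar{x}_2$ is the mean of the second group of $n_2$ items, then the combined mean of the two groups is

    $$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

    where

    $$\bar{x}_1 = \frac{\text{Sum of all the observations in the first group}}{n_1} \;\Rightarrow\; n_1 \bar{x}_1 = \text{Sum of all the observations in the first group}$$

    $$\bar{x}_2 = \frac{\text{Sum of all the observations in the second group}}{n_2} \;\Rightarrow\; n_2 \bar{x}_2 = \text{Sum of all the observations in the second group}$$

    so that $n_1 \bar{x}_1 + n_2 \bar{x}_2$ is the sum of all the observations in the two groups, of combined size $n_1 + n_2$.

    Extending the above result, if $\bar{x}_i$ is the mean of the i-th group of $n_i$ observations, then

    $$\bar{x} = \frac{\sum n_i \bar{x}_i}{\sum n_i}$$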

    WEIGHTED ARITHMETIC MEAN:

    In computing the simple AM, it was assumed that all the items are of equal importance. This may not always be true. When items vary in importance, they must be assigned weights in proportion to their relative importance. Thus, a weighted mean is the mean of weighted items. In calculating the weighted A.M., each item is multiplied by its weight and the products so derived are summed up. This total is divided by the total of the weights (and not by the number of items) to get the weighted mean. Symbolically, if x1, x2, ..., xn are the different items with weights w1, w2, ..., wn respectively, then the weighted mean is given by

    $$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$
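The weighted mean described above can be sketched as follows. The notes give no data for this case, so the values and weights below are purely hypothetical and serve only to show that the divisor is the total weight, not the number of items.

```python
def weighted_mean(values, weights):
    """Sum of (weight * value) divided by the total of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)


# Hypothetical illustration: three items with weights 1, 2 and 3.
values = [60, 70, 80]
weights = [1, 2, 3]
simple = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(simple, round(weighted, 2))
```

The weighted mean is pulled towards the items that carry the larger weights, whereas the simple mean treats all items alike.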


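    where N = total frequency = $\sum f_i$ and $x_i$ is the midpoint of the i-th class with frequency $f_i$.

    Simplifying by taking logarithms on both sides,

    $$\log \mathrm{GM} = \frac{1}{N} \log\left(x_1^{f_1} \, x_2^{f_2} \cdots x_k^{f_k}\right) = \frac{1}{N}\left(f_1 \log x_1 + f_2 \log x_2 + \dots + f_k \log x_k\right) = \frac{1}{N} \sum f_i \log x_i$$

    Therefore,

    $$\mathrm{GM} = \text{antilog}\left[\frac{1}{N} \sum f_i \log x_i\right]$$

    where $x_i$ is the mid value of the class whose frequency is $f_i$.

    3. Harmonic mean (HM)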

    Harmonic mean is the total number of items of a variable divided by the sum of the reciprocals of the items. If x1, x2, ..., xn are the n observations and HM represents the harmonic mean, then

    $$\mathrm{HM} = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \dots + \dfrac{1}{x_n}} = \frac{n}{\sum \dfrac{1}{x_i}} = \frac{1}{\text{A.M. of the reciprocals}}$$

    Harmonic mean is the reciprocal of the arithmetic mean of the reciprocal values.

    In the case of a frequency distribution, HM is obtained by using the formula

    $$\mathrm{HM} = \frac{N}{\sum \dfrac{f_i}{x_i}}$$

    where $x_i$ is the mid value of the class whose frequency is $f_i$ and N is the total frequency.
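The GM and HM formulas above can be sketched in a few lines. The notes work no numerical example here, so the sketch reuses the Merino Ram body weights from the arithmetic mean example; the function names and the optional `freqs` argument (for grouped data) are assumptions made for illustration, and natural logarithms are used in place of log tables.

```python
import math


def geometric_mean(values, freqs=None):
    """antilog[(1/N) * sum(f * log x)]; all values must be positive."""
    freqs = freqs or [1] * len(values)
    n = sum(freqs)
    return math.exp(sum(f * math.log(x) for f, x in zip(freqs, values)) / n)


def harmonic_mean(values, freqs=None):
    """N / sum(f / x); all values must be non-zero."""
    freqs = freqs or [1] * len(values)
    n = sum(freqs)
    return n / sum(f / x for f, x in zip(freqs, values))


weights_kg = [52, 58, 65, 70, 65, 55, 74, 68, 75, 80]
am = sum(weights_kg) / len(weights_kg)
gm = geometric_mean(weights_kg)
hm = harmonic_mean(weights_kg)
print(round(am, 3), round(gm, 3), round(hm, 3))
```

For positive values that are not all equal, the three averages come out in the order AM > GM > HM.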

    4. Median

    It is the value which has an equal number of observations on either side when the items are arranged in ascending or descending order of magnitude. The median divides the series into two equal parts; one part consists of all values less than the median and the other part of all values greater than the median.

    For ungrouped (raw) data:

    Case a: When n is odd,

    $$\text{Median} = \text{size of the } \left(\frac{n+1}{2}\right)\text{th item}$$

    after arranging the data in ascending or descending order of magnitude.

    E.g. Find the median value of body weight of Merino Rams.


    52, 58, 65, 70, 65, 55, 74, 68, 75

    Here n = 9 (odd)

    First, arrange the body weight in the ascending order of magnitude

    52, 55, 58, 65, 65, 68, 70, 74, 75

    Median term = (n + 1) / 2th term in the array

    i.e. (9 + 1) / 2th term = 5th term in the array

    Median = 65 Kg. (5th term in the array)

    Case b: When n is even,

    $$\text{Median} = \text{average of the } \left(\frac{n}{2}\right)\text{th and } \left(\frac{n}{2} + 1\right)\text{th items in the array}$$

    E.g. To find the median value of body weight of 10 Merino Rams:

    52, 58, 65, 70, 65, 55, 74, 68, 75, 80

    First arrange the body weights in the ascending order of magnitude

    52, 55, 58, 65, 65, 68, 70, 74, 75, 80

    Here n = 10 (even)

    Median = average of the (n/2)th and (n/2 + 1)th items in the array

    = average of the (10/2)th and ((10/2) + 1)th terms in the array

    = average of the 5th and 6th terms in the array

    = (65 + 68) / 2 = 133 / 2 = 66.5 Kg.
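Both cases can be covered by one small sketch. The function name `raw_median` is only illustrative; it sorts the data and then applies case a (odd n) or case b (even n).

```python
def raw_median(values):
    """Median of raw data: middle item (odd n) or mean of the two middle items (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Case a: the ((n + 1) / 2)th item, i.e. index mid in 0-based terms.
        return ordered[mid]
    # Case b: average of the (n / 2)th and (n / 2 + 1)th items.
    return (ordered[mid - 1] + ordered[mid]) / 2


nine_rams = [52, 58, 65, 70, 65, 55, 74, 68, 75]
ten_rams = [52, 58, 65, 70, 65, 55, 74, 68, 75, 80]
print(raw_median(nine_rams), raw_median(ten_rams))
```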

    Grouped data:

    In the case of a frequency distribution, the median is the value which has an equal number of frequencies on either side, i.e. the value which corresponds to the cumulative frequency of N/2. It is obtained by

    $$\text{Median} = l + \frac{\frac{N}{2} - m}{f} \times C$$

    where
    l = lower limit of the median class
    f = frequency of the median class
    C = class interval
    m = cumulative frequency of the class preceding the median class
    Median class = the class whose cumulative frequency just exceeds N/2

    Note:
    1. The median class is the class corresponding to the cumulative frequency equal to or just greater than N/2.
    2. The median can be computed using the Ogive. It is the x-coordinate of the point of intersection of the less than and greater than cumulative frequency curves.

    E.g. Find the median for the following frequency distribution of birth weight of Nilagiri lambs.

    Classes (birth weight)    Frequency (fi)    Cumulative frequency
    1.8-2.0                   2                 2
    2.0-2.2                   1                 3
    2.2-2.4                   2                 5
    2.4-2.6                   3                 8
    2.6-2.8                   9                 17
    2.8-3.0                   11                28
    3.0-3.2                   11                39
    3.2-3.4                   4                 43
    3.4-3.6                   4                 47
    3.6-3.8                   1                 48
    3.8-4.0                   2                 50
    Total                     50

    Median class = the class whose cumulative frequency just exceeds N/2

    = the class whose cumulative frequency just exceeds 50/2, i.e. 25

    = 2.8-3.0

    l = 2.8; C = 0.2; f = 11; N/2 = 50/2 = 25; m = 17

    Median = 2.8 + (0.2 / 11) (25 - 17)

    = 2.945 Kg.
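The worked example can be reproduced with a short sketch. The function name `grouped_median` is hypothetical; it accumulates the frequencies until the running total reaches N/2 and then applies the formula with m taken as the cumulative frequency before the median class.

```python
def grouped_median(lower_limits, freqs, width):
    """Median = l + ((N/2 - m) / f) * C for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cum_before = 0  # m: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= half:
            return lower + (half - cum_before) / f * width
        cum_before += f
    raise ValueError("median class not found")


lower_limits = [1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8]
freqs = [2, 1, 2, 3, 9, 11, 11, 4, 4, 1, 2]
median_kg = grouped_median(lower_limits, freqs, width=0.2)
print(round(median_kg, 3))
```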


    5. Mode

    It is the size of the most frequent item in a large set of data. Thus the mode is the value of the variable which occurs most frequently, or repeats itself the greatest number of times.

    E.g. Find the mode for the body weight of Merino Rams.

    52, 58, 65, 70, 65, 55, 74, 68, 75, 80

    FORM UNGROUPED FREQUENCY DISTRIBUTION:

    Value of the variable 52 55 58 65 68 70 74 75 80

    Frequency 1 1 1 2 1 1 1 1 1

    Mode = 65 Kg.

    In the case of grouped data, the mode can be calculated by

    $$\text{Mode} = l + \frac{f_2}{f_1 + f_2} \times C$$

    where
    l = lower limit of the modal class
    $f_1$ = frequency of the class which precedes (comes earlier than) the modal class
    $f_2$ = frequency of the class which succeeds (comes after) the modal class
    C = class interval
    Modal class = the class which has the maximum frequency

    E.g. Calculate the mode for the following frequency distribution of birth weight of Nilagiri lambs.

    Classes (birth weight)    Frequency (fi)
    1.8-2.0                   2
    2.0-2.2                   1
    2.2-2.4                   2
    2.4-2.6                   3
    2.6-2.8                   9
    2.8-3.0                   12
    3.0-3.2                   10
    3.2-3.4                   4
    3.4-3.6                   4
    3.6-3.8                   1
    3.8-4.0                   2
    Total                     50
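As a cross-check of the grouped-data formula stated before the table, the calculation can be sketched as follows. The function name `grouped_mode` is hypothetical. Falling back to the midpoint of the modal class when both neighbouring frequencies are zero is an assumption of this sketch, not something the notes prescribe.

```python
def grouped_mode(lower_limits, freqs, width):
    """Mode = l + f2 / (f1 + f2) * C, using the classes next to the modal class."""
    # Index of the modal class (first class with the maximum frequency).
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[i - 1] if i > 0 else 0               # class before the modal class
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # class after the modal class
    if f1 + f2 == 0:
        # Assumption: use the midpoint of the modal class when the formula is undefined.
        return lower_limits[i] + width / 2
    return lower_limits[i] + f2 / (f1 + f2) * width


lower_limits = [1.8, 2.0, 2.2, 2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6, 3.8]
freqs = [2, 1, 2, 3, 9, 12, 10, 4, 4, 1, 2]
mode_kg = grouped_mode(lower_limits, freqs, width=0.2)
print(round(mode_kg, 3))
```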


    Modal class: 2.8-3.0

    l = 2.8; f1 = 9; f2 = 10; C = 0.2

    Mode = 2.8 + (10/19)(0.2) = 2.905 Kg.

    Note:
    1. The mode can be computed from the histogram. It is the x-coordinate of the point of intersection of the two diagonals drawn from the top corners of the modal class rectangle to the adjacent top corners of the pre-modal and post-modal class rectangles.
    2. As a first approximation, the midpoint of the modal class may be taken as the value of the mode; this is called the crude mode.
    3. In a moderately asymmetrical distribution, mean - mode = 3 (mean - median) approximately, so that Mode = 3 median - 2 mean (approximately). This is the empirical mode.
    4. A distribution can have more than one mode. If it has one mode, it is called a unimodal distribution; if it has two modes, a bimodal distribution; if it has three modes, a trimodal distribution; and if it has more than three modes, a multimodal or polymodal distribution.

    Properties of arithmetic mean

    1. The sum of the deviations of the items from the mean is equal to zero, i.e. $\sum (x_i - \bar{x}) = 0$.
    2. The sum of the squared deviations of the items from the mean is smaller than the sum of the squared deviations from any other value, i.e. $\sum (x_i - \bar{x})^2$ is minimum.
    3. The product of the mean and the number of observations gives the total of the original data, i.e. $n\bar{x} = \sum x_i$.
    4. If $\bar{x}_1$, $\bar{x}_2$ are the means of two groups with $n_1$ and $n_2$ observations respectively, the mean of the combined group is given by

    $$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

    5. AM > GM > HM (for positive values that are not all equal).
    6. When all the values are equal, AM = GM = HM.
    7. i. For a symmetrical distribution, AM = median = mode.
       ii. For a positively skewed distribution, AM > median > mode (short tail on the left).
       iii. For a negatively skewed distribution, AM < median < mode (short tail on the right).
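Properties 1 to 3 can be checked numerically with a small sketch. It reuses the Merino Ram body weights from the earlier examples; the helper name `sum_sq_dev` and the comparison values 60, 65 and 66.5 are arbitrary choices for illustration.

```python
weights_kg = [52, 58, 65, 70, 65, 55, 74, 68, 75, 80]
n = len(weights_kg)
mean_kg = sum(weights_kg) / n


def sum_sq_dev(values, origin):
    """Sum of squared deviations of the values from a chosen origin."""
    return sum((x - origin) ** 2 for x in values)


# Property 1: deviations from the mean add up to zero (up to rounding error).
sum_dev = sum(x - mean_kg for x in weights_kg)

# Property 2: squared deviations are smallest when taken about the mean.
about_mean = sum_sq_dev(weights_kg, mean_kg)
about_others = [sum_sq_dev(weights_kg, a) for a in (60, 65, 66.5)]

# Property 3: n times the mean recovers the total.
total_from_mean = n * mean_kg

print(round(sum_dev, 10), round(about_mean, 2), [round(v, 2) for v in about_others])
```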

    Properties of geometric mean

    1. GM will be zero if one or more of the values are zero.
    2. GM < AM and GM > HM.

    Properties of harmonic mean

    1. HM < GM < AM.

    Properties of median

    1. The mean deviation taken about the median as the origin is the minimum.
    2. i. If the distribution is symmetrical, median = mean = mode.
       ii. Median < mean and median > mode, if the distribution is positively skewed.
       iii. Median > mean and median < mode, if the distribution is negatively skewed.

    Properties of mode

    1. i. If the distribution is symmetrical, mode = median = mean.
       ii. If the distribution is positively skewed, mode < median < mean.
       iii. If the distribution is negatively skewed, mode > median > mean.
    2. If the distribution is moderately asymmetrical, then Mode = 3 median - 2 mean (approximately).

    Choice of an average

    The selection of an average is a difficult one. It should be made after giving consideration to the nature and type of the enquiry taken up and also to the object of the statistical investigation. No one average can be good for all purposes, as different forms of averages have different characteristics. Thus, in selecting an average, the chief characteristics and limitations of the various averages must be considered. Most averages suffer from one limitation or another, and each has its own advantages and disadvantages.

    An ideal average should possess the following qualities:

    1. It should be rigidly defined.
    2. It should be based on all observations.
    3. It should be simple to understand and easy to calculate.
    4. It should have minimum influence of extreme values.
    5. It should possess sampling stability.
    6. It should be capable of further algebraic treatment.
    7. It should not be affected by open-end classes.

    Let us see how the different averages satisfy these qualities:

    Arithmetic mean

    It is rigidly defined, based on all observations, simple to understand and easy to calculate, and capable of further algebraic treatment. It possesses sampling stability to some extent.

    It is affected much by extreme values and also by open-end classes.

    Geometric mean

    It is rigidly defined, based on all observations and capable of further algebraic treatment.

    It is not simple to understand or easy to calculate, as it involves logarithms, and it does not possess sampling stability. It is affected to some extent by extreme values. It is affected by open-end classes.

    Harmonic mean

    It is rigidly defined, based on all observations and capable of further algebraic treatment.

    It is not simple to understand or easy to calculate, as it involves reciprocals, and it does not possess sampling stability. It is affected to some extent by extreme values. It is affected by open-end classes.

    Median

    It is simple to understand and easy to calculate, and it is not affected by extreme values. It can be calculated for distributions with open-end classes.


    It is not rigidly defined, not based on all observations, does not possess sampling stability and is not capable of algebraic treatment.

    Mode

    It is simple to understand and easy to calculate, and it is (normally) not affected by extreme values or open-end classes.

    It is not rigidly defined, not based on all observations, does not possess sampling stability and is not capable of algebraic treatment.

    Quality                                           AM        GM        HM        Median    Mode
    1. Rigidly defined                                Yes       Yes       Yes       No        No
    2. Based on all observations                      Yes       Yes       Yes       No        No
    3. Simple to understand and easy to calculate     Yes       No        No        Yes       Yes
    4. Minimum influence of extreme values            No        Partly    Partly    Yes       Yes
    5. Sampling stability                             Partly    No        No        No        No
    6. Further algebraic treatment                    Yes       Yes       Yes       No        No
    7. Not affected by open-end classes               No        No        No        Yes       Yes

    Thus we see that the qualities essential for a good average are satisfied in varying degrees by the different measures of central tendency that have been seen. AM possesses the above properties to a greater extent than any other type of average. It is the most popular device in practice; hence it is called the common average. Though the median and mode are more easily computed than the others, they are indeterminate in many cases and are not capable of algebraic manipulation.

    Situations where different averages are used

    AM is generally applicable to all sorts of data. It should be used when the distribution is reasonably symmetrical and further statistical analysis, such as the computation of the standard deviation, is to be carried out, and also when algebraic manipulation is to follow.

    GM is used when it is desired to give more weight to small items and less weight to large items, and in the case of ratios, percentages and the growth of microorganisms.

    HM is used in averaging certain types of ratios and rates and in problems involving time. It gives more weight to small items.


    The median is to be used when the attributes of the data are not directly measurable. As it can be easily located by mere inspection, it can be calculated even when the data are incomplete. Use the median when the distribution is highly skewed and the extreme items may have distorting effects on the mean.

    The mode can be used to find the most typical value or the most common item. It is also used when the quickest estimate of centrality is required.

    MEASURES OF DISPERSION

    The measures of central tendency indicate only the central position. They have their own limitations and do not throw light on the formation of the series of data. Sometimes they may give misleading results too. For example, consider the following three series.

    Series A: 100, 100, 100, 100

    Series B: 100, 106, 98, 92, 93, 109, 102

    Series C: 1, 79, 220

    They have the same mean, 100. Hence we may conclude that these series are alike in nature. But a close examination reveals that the distributions differ widely from one another. In one distribution the values may be closely packed and in another they may be widely scattered. Such variation is called scatter, spread or dispersion. Hence an average is more meaningful when it is examined in the light of dispersion. When dispersion is not significant, the average appears to be a true representative figure of the series; when dispersion is significant, the average is far from being a true representative figure. The measurement of the scattering of the items in a distribution about the average is called a measure of variation or dispersion. Measures of dispersion also enable comparison of two or more distributions with regard to their variability or consistency.

    Objectives of measures of dispersion

    To determine the reliability of an average: Study of dispersion helps us in understanding how far an average is representative of the mass.

    To serve as a basis for control of the variability: It helps us to determine the nature and causes of dispersion with a view to controlling variability.


    To compare two or more series with regard to their variability: Measures of dispersion help in comparing the variability of two or more series.

    Absolute and relative dispersion

    When dispersion is expressed in terms of the original units of the series, e.g. weight in kg or income in rupees, it is called absolute dispersion. If dispersion is expressed as a pure number, free from units of measurement, it is called relative dispersion. A relative measure of dispersion is an absolute measure of dispersion divided by an average.

    Different measures of dispersion

    1. Range

    It is the difference between the highest and lowest values in the raw data. For grouped data, the range is the difference between the lower limit of the first class and the upper limit of the last class. It is a very simple measure of dispersion. It is useful in the study of variation in money rates and rates of exchange, weather forecasts, etc.

    The relative measure of dispersion for the range is the ratio of range (R.R), which is given by

    $$\text{R.R} = \frac{\text{Highest value} - \text{Lowest value}}{\text{Highest value} + \text{Lowest value}}$$


    2. Quartile Deviation (QD)

    It is also known as the semi-interquartile range. It is based on the quartiles, which are the points that divide the data into four equal parts. The lower or first quartile (Q1) divides the lower half of the distribution into two equal parts, i.e. it is the value below which 25% of the observations lie and above which 75% of the observations lie. Similarly, the upper or third quartile (Q3) divides the upper half of the distribution into two equal parts, i.e. it is the value below which 75% of the observations lie and above which 25% of the observations lie. The difference Q3 - Q1 is called the interquartile range, and QD is given by (Q3 - Q1)/2.

    For grouped data, Q1 is the value which corresponds to the cumulative frequency of N/4 and Q3 is the value which corresponds to the cumulative frequency of 3N/4. QD is used in the case of open-end distributions.

    Formula to compute QD

    In the case of raw data, after arranging the data in ascending order,

    $$Q_1 = \text{size of the } \left(\frac{n+1}{4}\right)\text{th item}, \qquad Q_3 = \text{size of the } \left(\frac{3(n+1)}{4}\right)\text{th item}$$

    Then,

    $$\mathrm{QD} = \frac{Q_3 - Q_1}{2}$$

    In the case of a frequency distribution or grouped data,

    $$Q_1 = L_1 + \frac{\frac{N}{4} - m_1}{f_1} \times C$$


    where $L_1$ is the lower boundary of the first quartile class, $m_1$ is the cumulative frequency up to the first quartile class (i.e. of the class preceding it), $f_1$ is the frequency of the first quartile class and C is the width of the class interval.

    $$Q_3 = L_3 + \frac{\frac{3N}{4} - m_3}{f_3} \times C$$

    where $L_3$ is the lower boundary of the third quartile class, $m_3$ is the cumulative frequency up to the third quartile class (i.e. of the class preceding it), $f_3$ is the frequency of the third quartile class and C is the width of the class interval.

    Then,

    $$\mathrm{QD} = \frac{Q_3 - Q_1}{2}$$

    The relative measure of QD is known as the quartile coefficient of dispersion (QC):

    $$\mathrm{QC} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

    Note:

    The second quartile (Q2) is the median. Q1, Q2, Q3 are the three quartiles which divide the series into 4 equal parts.

    We have 9 deciles which divide the entire range into 10 equal parts; they are denoted by D1, D2, ..., D9.

    We have 99 percentiles which divide the entire range into 100 equal parts; they are denoted by P1, P2, ..., P99.

    In the case of a symmetrical distribution, $Q_2 = (Q_1 + Q_3)/2$ and therefore

    $$Q_3 - Q_2 = Q_2 - Q_1 = \frac{Q_3 - Q_1}{2} = \mathrm{QD}$$

    3. Mean deviation (MD)

    Mean deviation or average deviation in a series is the AM of thedeviations of the various items from an average (mean, median or mode) ofthe series taking all deviations as positive.For raw data, | |

    |x A|n WhereA is Mean or Median or ModeFor grouped data,

    f|x A|N

    WhereA is Mean or Median or Modexiis midpoint of the ithclass with frequencyfiThe relative measure of Mean Deviation is known as mean coefficientof dispersion or coefficient of mean deviation and is obtained by dividing theMD by the average from which it is computed.

    % 100
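As a small illustration of the raw-data formula, the sketch below computes the mean deviation about the mean and about the median. The observations are hypothetical and `mean_deviation` is a name chosen for this sketch only.

```python
from statistics import median


def mean_deviation(data, origin):
    """Arithmetic mean of the absolute deviations from the chosen origin."""
    return sum(abs(x - origin) for x in data) / len(data)


values = [2, 3, 5, 8, 12]                 # hypothetical observations
mean = sum(values) / len(values)          # 6.0
med = median(values)                      # 5

md_mean = mean_deviation(values, mean)    # MD about the mean
md_median = mean_deviation(values, med)   # MD about the median
coeff_md_median = md_median / med * 100   # coefficient of MD, in per cent

print(md_mean, md_median, coeff_md_median)
```

For these values the MD about the median (3.0) is smaller than the MD about the mean (3.2), in line with the property that the mean deviation is least when taken about the median.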


Note: In actual practice, MD is calculated from either the mean or the median; the mode is not used because its value is often indeterminate. The median is preferred to the mean because the mean deviation taken about the median is the minimum.

4. Standard deviation (SD)

Karl Pearson introduced the concept of standard deviation in 1893. It is the most important measure of dispersion and is widely used in many statistical formulae. Standard deviation is also called the root mean square deviation, because it is the square root of the mean of the squared deviations from the arithmetic mean. It gives an accurate result. The square of the standard deviation is called the variance.

It is defined as the positive square root of the arithmetic mean of the squares of the deviations of the given observations from their arithmetic mean. The standard deviation is denoted by the Greek letter $\sigma$ (sigma).

For raw data,

$$\sigma = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$$

To simplify the above,

$$\sigma = \sqrt{\frac{1}{n}\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right]}$$

Eg. The following are the crimps of the greasy fleece yield of the Nilagiri breed. Find the S.D.: 6, 5, 3, 4, 2, 3

Xi    Xi²
6     36
5     25
3     9
4     16
2     4
3     9

ΣXi = 23    ΣXi² = 99

$$\sigma = \sqrt{\frac{1}{6}\left[99 - \frac{23^2}{6}\right]} = 1.34 \text{ no.}$$
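The worked example can be checked with a short Python sketch. It assumes the divisor is n (not n - 1), which is what the quoted answer of 1.34 implies; the function names are illustrative only.

```python
from math import sqrt


def sd_definition(data):
    """Root of the mean of squared deviations from the arithmetic mean."""
    n = len(data)
    mean = sum(data) / n
    return sqrt(sum((x - mean) ** 2 for x in data) / n)


def sd_simplified(data):
    """Same quantity computed from the sums of x and x squared."""
    n = len(data)
    total = sum(data)                    # sum of Xi   (23 in the example)
    total_sq = sum(x * x for x in data)  # sum of Xi^2 (99 in the example)
    return sqrt((total_sq - total ** 2 / n) / n)


crimps = [6, 5, 3, 4, 2, 3]
print(round(sd_definition(crimps), 2), round(sd_simplified(crimps), 2))  # 1.34 1.34
```

Both forms give the same value; the simplified form avoids computing each deviation when the mean is fractional.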

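For grouped data (with Sheppard's correction)

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 - \frac{C^2}{12}}$$

To simplify the above,

$$\sigma = \sqrt{\frac{1}{N}\left[\sum f_i x_i^2 - \frac{(\sum f_i x_i)^2}{N}\right] - \frac{C^2}{12}}$$

where $f_i$ = frequency of the i-th class, $x_i$ = mid value of the i-th class, $C$ = class interval, $k$ = number of classes, $N = \sum f_i$, and $C^2/12$ = Sheppard's correction factor.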


Eg. Calculate the S.D. for the following frequency distribution of birth weight of Nilagiri lambs.

Classes    Mid value (Xi)   Frequency (fi)   (Xi - X̄)²   fi(Xi - X̄)²
1.8-2.0    1.9              2                1.0816      2.1632
2.0-2.2    2.1              1                0.7056      0.7056
2.2-2.4    2.3              2                0.4096      0.8192
2.4-2.6    2.5              3                0.1936      0.5808
2.6-2.8    2.7              9                0.0576      0.5184
2.8-3.0    2.9              11               0.0016      0.0176
3.0-3.2    3.1              11               0.0256      0.2816
3.2-3.4    3.3              4                0.1296      0.5184
3.4-3.6    3.5              4                0.3136      1.2544
3.6-3.8    3.7              1                0.5776      0.5776
3.8-4.0    3.9              2                0.9216      1.8432
Total                       50                           9.2800

With Sheppard's correction,

$$\sigma = \sqrt{\frac{9.28}{50} - \frac{(0.2)^2}{12}} = \sqrt{0.1856 - 0.0033} = 0.427 \text{ kg}$$
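The lamb birth-weight example can be reproduced with the sketch below. It assumes a divisor of N = 50 and a class width of 0.2 kg, which together give the quoted 0.427 kg; `grouped_sd` is an illustrative name, not part of any library.

```python
from math import sqrt

mid_values = [1.9, 2.1, 2.3, 2.5, 2.7, 2.9, 3.1, 3.3, 3.5, 3.7, 3.9]
frequencies = [2, 1, 2, 3, 9, 11, 11, 4, 4, 1, 2]
class_width = 0.2  # kg


def grouped_sd(mids, freqs, width, sheppard=True):
    """Return (mean, SD) of grouped data, optionally with Sheppard's correction."""
    total = sum(freqs)                                        # N
    mean = sum(f * x for f, x in zip(freqs, mids)) / total    # weighted mean
    variance = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / total
    if sheppard:
        variance -= width ** 2 / 12                           # subtract C^2 / 12
    return mean, sqrt(variance)


mean, sd_corrected = grouped_sd(mid_values, frequencies, class_width)
_, sd_uncorrected = grouped_sd(mid_values, frequencies, class_width, sheppard=False)
print(round(mean, 2), round(sd_corrected, 3), round(sd_uncorrected, 3))
```

The correction is small here (0.0033 against a variance of 0.1856), so the corrected and uncorrected values differ only in the third decimal place.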

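Shortcut method

$$\sigma = C \times \sqrt{\frac{1}{N}\left[\sum f_i d_i^2 - \frac{(\sum f_i d_i)^2}{N}\right]}, \qquad d_i = \frac{x_i - A}{C}$$

where $f_i$ = frequency of the i-th class, $x_i$ = mid value of the i-th class, $C$ = class interval, $N = \sum f_i$ = total number of observations, and $A$ is an arbitrary point (assumed mean or provisional mean).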


Sheppard's correction

In computing the standard deviation, a grouping error may arise because the data are grouped into classes. To adjust for this grouping error, Sheppard suggested a correction to be deducted from the variance of the grouped data, which is given by $C^2/12$:

$$\text{Corrected variance} = \text{Variance of grouped data} - \frac{C^2}{12}$$

Coefficient of Variation

The standard deviation is an absolute measure of dispersion. It is expressed in the units in which the original figures are collected and stated. The standard deviation of the heights of students cannot be compared with the standard deviation of their weights, as the two are expressed in different units, i.e. heights in centimetres and weights in kilograms. Therefore the standard deviation must be converted into a relative measure of dispersion for the purpose of comparison. The relative measure is known as the coefficient of variation.

The coefficient of variation is obtained by dividing the standard deviation by the mean and multiplying by 100. Symbolically,

$$C.V. = \frac{\sigma}{\bar{x}} \times 100\,\%$$

If we want to compare the variability of two or more series, we can use the C.V. The series or group with the greater C.V. is more variable, less stable, less uniform, less consistent or less homogeneous. If the C.V. is smaller, the group is less variable, more stable, more uniform, more consistent or more homogeneous.

Variance

The square of the standard deviation is called the variance. It is the mean square deviation: the sum of the squared deviations of the individual observations from the mean divided by the number of observations. It is denoted by $\sigma^2$.
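To see why a relative measure is needed, the following sketch compares two hypothetical series that have the same standard deviation but different means. The height and weight figures are invented for illustration, and the SD uses the divisor n as in the definition of variance given in these notes.

```python
from math import sqrt


def coefficient_of_variation(data):
    """C.V. = (SD / mean) * 100, with SD computed using divisor n."""
    n = len(data)
    mean = sum(data) / n
    sd = sqrt(sum((x - mean) ** 2 for x in data) / n)
    return sd / mean * 100


heights_cm = [150, 160, 170, 180, 190]   # hypothetical heights
weights_kg = [50, 60, 70, 80, 90]        # hypothetical weights

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)
print(round(cv_heights, 2), round(cv_weights, 2))
```

Both series have an SD of about 14.14 in their own units, yet the C.V. of the weights (about 20.2%) is much larger than that of the heights (about 8.3%), so the weights are the more variable series in relative terms.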


Standard error (SE)

The mean of a random sample may be taken as representative of the population mean. The difference between the sample mean and the population mean is due to sampling and is called the sampling error; its size is measured by the standard error. The standard error is defined as the SD of the means of different samples taken from the population.

If we study only one sample, then

$$SE = \frac{\sigma}{\sqrt{n}}$$

where $n$ is the size of the sample and $\sigma$ is the standard deviation of the sample.

Probable error (PE)

$$PE = 0.6745 \times SE$$

Properties of QD

1. For a normal distribution, QD is approximately (2/3) SD.
2. Mean ± QD will cover 50% of the cases.

Properties of MD

1. For a normal distribution, MD is approximately (4/5) SD.
2. MD about the median as the origin is the minimum.

Properties of SD

1. SD is greater than MD, QD and PE.
2. The mean square deviation is a minimum if the deviations are taken from the AM as the origin.
3. For a normal distribution,
   Mean ± 1 SD will cover 68.27% of the items,
   Mean ± 2 SD will cover 95.45% of the items,
   Mean ± 3 SD will cover 99.73% of the items.
4. Adding a constant to, or subtracting a constant from, all the observations leaves the SD unaltered.
5. If $\bar{x}_1$, $\bar{x}_2$ are the means of two samples of sizes $n_1$, $n_2$ respectively, with SDs $\sigma_1$, $\sigma_2$, then the combined SD ($\sigma$) is given by

$$(n_1 + n_2)\,\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2$$

where $\bar{x}$ is the combined mean, $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.
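The combined-SD property can be verified numerically. The sketch below assumes SDs computed with divisor n (under which the identity is exact) and takes d1 and d2 as the deviations of the two sample means from the combined mean; the second sample is hypothetical and the function names are illustrative.

```python
from math import sqrt


def mean_sd(data):
    """Return (mean, SD) with SD computed using divisor n."""
    n = len(data)
    mean = sum(data) / n
    return mean, sqrt(sum((x - mean) ** 2 for x in data) / n)


def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD from (n1+n2)s^2 = n1*s1^2 + n2*s2^2 + n1*d1^2 + n2*d2^2."""
    combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1 = mean1 - combined_mean
    d2 = mean2 - combined_mean
    total = n1 * sd1 ** 2 + n2 * sd2 ** 2 + n1 * d1 ** 2 + n2 * d2 ** 2
    return sqrt(total / (n1 + n2))


first = [6, 5, 3, 4, 2, 3]   # crimp counts from the earlier example
second = [7, 9, 8, 10]       # hypothetical second sample

m1, s1 = mean_sd(first)
m2, s2 = mean_sd(second)
via_formula = combined_sd(len(first), m1, s1, len(second), m2, s2)
_, direct = mean_sd(first + second)   # SD of the pooled observations
print(round(via_formula, 4), round(direct, 4))
```

The two printed values agree, which shows that the combined SD can be obtained from the summary figures alone, without access to the individual observations.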

Merits and demerits of dispersion measures

The essential requisites of a good measure of dispersion are the same as those of averages.

Range
  Merits: It is easy to calculate and simple to understand.
  Demerits: It is not rigidly defined. It is not based on all observations. It is affected much by extreme values and open-end classes. It does not possess sampling stability. It is not amenable to further mathematical treatment.

Quartile Deviation
  Merits: It is easy to calculate and simple to understand. It is not affected by extreme items and open-end classes.
  Demerits: It is not rigidly defined. It is not based on all observations. It does not possess sampling stability. It is not amenable to further mathematical treatment.

Mean Deviation
  Merits: It is based on all observations. It is rigidly defined. It is easy to calculate and simple to understand.
  Demerits: It is affected much by extreme values and open-end classes. It does not possess sampling stability. It is not amenable to further mathematical treatment.

Standard Deviation
  Merits: It has a rigid formula. It is based on all observations. It is capable of further algebraic treatment. It is less affected by sampling fluctuations.
  Demerits: It is affected much by extreme values and open-end classes.

Range, QD, MD and SD are compared on the following criteria:

1. Rigidly defined
2. Based on all observations
3. Simple to understand and easy to calculate
4. Minimum influence of extreme values
5. Sampling stability
6. Further algebraic treatment
7. Not affected by open-end classes

Choice of dispersion measure

We see that the standard deviation satisfies more of the ideal qualities than the other measures of dispersion. It is the most reliable and a better summary descriptive measure. Among the other measures, the range is unstable and its value depends upon the extreme items in the data. The range fails to consider the central tendency of the data, whereas the quartile deviation excludes half of the items from consideration. The mean deviation suffers from the mathematically illogical defect of neglecting algebraic signs (particularly negative signs). The standard deviation is free from all these defects to a great extent and is the most useful and most popularly used measure of dispersion.

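MOMENTS

Moments can be defined as the arithmetic mean of various powers of deviations taken from the mean of a distribution. These moments are known as central moments. The first four moments about the arithmetic mean (central moments) are defined below, for an individual series and for a discrete series.

First moment about the mean:
  Individual series: $\mu_1 = \frac{\sum (x - \bar{x})}{n} = 0$
  Discrete series: $\mu_1 = \frac{\sum f(x - \bar{x})}{N} = 0$

Second moment about the mean:
  Individual series: $\mu_2 = \frac{\sum (x - \bar{x})^2}{n} = \sigma^2$
  Discrete series: $\mu_2 = \frac{\sum f(x - \bar{x})^2}{N} = \sigma^2$

Third moment about the mean:
  Individual series: $\mu_3 = \frac{\sum (x - \bar{x})^3}{n}$
  Discrete series: $\mu_3 = \frac{\sum f(x - \bar{x})^3}{N}$

Fourth moment about the mean:
  Individual series: $\mu_4 = \frac{\sum (x - \bar{x})^4}{n}$
  Discrete series: $\mu_4 = \frac{\sum f(x - \bar{x})^4}{N}$

r-th moment about the mean:
  Individual series: $\mu_r = \frac{\sum (x - \bar{x})^r}{n}$
  Discrete series: $\mu_r = \frac{\sum f(x - \bar{x})^r}{N}$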
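The definitions of the central moments for an individual series translate directly into code. The sketch below uses the six crimp counts from the standard-deviation example; `central_moment` is an illustrative name.

```python
def central_moment(data, r):
    """r-th moment about the arithmetic mean of an individual series."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


crimps = [6, 5, 3, 4, 2, 3]
mu1 = central_moment(crimps, 1)   # always zero, up to rounding error
mu2 = central_moment(crimps, 2)   # equals the variance
mu3 = central_moment(crimps, 3)
mu4 = central_moment(crimps, 4)
print(round(mu1, 10), round(mu2, 4), round(mu3, 4), round(mu4, 4))
```

The first moment vanishes because positive and negative deviations from the mean cancel, and the second moment (about 1.8056) is the square of the SD of 1.34 found earlier.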

If the mean is a fractional value, it becomes difficult to work out the moments. In such cases, we can calculate moments about a working origin and then convert them into moments about the actual mean. The moments about an origin are known as raw moments.

SKEWNESS

It has been seen that the measures of central tendency indicate the central position of the frequency distribution, and the measures of dispersion indicate the extent to which the items cluster around or scatter away from the central tendency. But none of these measures indicates the form or type of the distribution.

Skewness refers to the lack of symmetry or departure from symmetry. We study skewness to get an idea of the shape of the curve which we can draw with the help of the given data. Symmetry means that the number of values above the mode and below the mode is the same. If in a distribution mean = median = mode, then that distribution is known as a symmetrical distribution. If in a distribution mean ≠ median ≠ mode, then it is not a symmetrical distribution; it is called a skewed distribution, and such a distribution could be either positively skewed or negatively skewed.

Evidently, in the case of a symmetrical distribution the two tails of the curve are of equal size, and in the case of an asymmetrical distribution one tail of the curve is longer than the other. A distribution is said to be skewed in the direction of the excess tail. Thus if the right tail is longer than the left, the distribution is positively skewed; if the left tail is longer than the right, the distribution is negatively skewed.


Consider the following two distributions:

Class    Frequency      Class    Frequency
0-5      10             0-5      10
5-10     30             5-10     40
10-15    60             10-15    30
15-20    60             15-20    90
20-25    30             20-25    20
25-30    10             25-30    10
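The two tabulated distributions can be summarised with a short sketch. It assumes class mid-points of 2.5, 7.5, ..., 27.5 and a divisor of N; the third central moment is included as one possible way to show the difference in shape, and `summarise` is an illustrative name.

```python
from math import sqrt

mid_points = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]
left = [10, 30, 60, 60, 30, 10]    # left-hand distribution
right = [10, 40, 30, 90, 20, 10]   # right-hand distribution


def summarise(freqs, mids=mid_points):
    """Return (mean, SD, third central moment) of a frequency distribution."""
    total = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / total
    mu2 = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / total
    mu3 = sum(f * (x - mean) ** 3 for f, x in zip(freqs, mids)) / total
    return mean, sqrt(mu2), mu3


left_mean, left_sd, left_mu3 = summarise(left)
right_mean, right_sd, right_mu3 = summarise(right)
print(left_mean, round(left_sd, 2), left_mu3)
print(right_mean, round(right_sd, 2), right_mu3)
```

Both distributions come out with mean 15 and SD about 6.02, but the third central moment is 0 for the first and -37.5 for the second, so the mean and SD alone do not describe the shape.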

The above two distributions have the same mean (= 15) and SD (≈ 6), yet they are not identical distributions. The distribution on the left-hand side (LHS) is symmetrical, whereas the distribution on the right-hand side (RHS) is asymmetrical or skewed.

Measures of skewness

The important measures of skewness are

1. Karl Pearson's coefficient of skewness
2. Bowley's coefficient of skewness
3. Measure of skewness based on moments

1. Karl Pearson's coefficient of skewness

According to Karl Pearson, the absolute measure of skewness = mean − mode. This measure is not suitable for making valid comparisons of the skewness in two or more distributions, because the unit of measurement may be different in different series. To avoid this difficulty we use a relative measure of skewness, called Karl Pearson's coefficient of skewness, given by:

$$\text{Karl Pearson's coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}}$$


In case the mode is ill defined, the coefficient can be determined by the formula:

$$\text{Coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}}$$

2. Bowley's coefficient of skewness

In Karl Pearson's method of measuring skewness the whole of the series is needed. Prof. Bowley suggested a formula based on the relative position of the quartiles. In a symmetrical distribution, the quartiles are equidistant from the median, i.e. Median − Q1 = Q3 − Median. But in a skewed distribution, the quartiles will not be equidistant from the median. Hence Bowley suggested the following formula:

$$\text{Bowley's coefficient of skewness } (sk) = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$

3. Measure of skewness based on moments

The measure of skewness based on moments is denoted by $\beta_1$ and is given by:

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

Since $\mu_3^2$ and $\mu_2^3$ are both non-negative, $\beta_1$ itself is never negative; the direction of skewness is given by the sign of $\mu_3$. If $\mu_3$ is negative, the distribution is negatively skewed.

KURTOSIS

The expression kurtosis is used to describe the peakedness of a curve. The three measures, central tendency, dispersion and skewness, describe the characteristics of frequency distributions. But these measures alone will not give us a clear picture of the characteristics of a distribution.

As far as the measurement of shape is concerned, we have two characteristics: skewness, which refers to the asymmetry of a series, and kurtosis, which measures the peakedness of a curve relative to the normal curve. Frequency curves show different degrees of flatness or peakedness. This characteristic of a frequency curve is termed kurtosis. Measures of kurtosis denote the shape of the top of a frequency curve. A measure of kurtosis tells us the extent to which a distribution is more peaked or more flat-topped than the normal curve. The normal curve, which is symmetrical and bell-shaped, is designated as Mesokurtic. If a curve is relatively narrower and more peaked at the top, it is designated as Leptokurtic. If the frequency curve is flatter than the normal curve, it is designated as Platykurtic.

L = Leptokurtic, M = Mesokurtic, P = Platykurtic

Measure of Kurtosis

The measure of kurtosis of a frequency distribution based on moments is denoted by $\beta_2$ and is given by

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

If $\beta_2 = 3$, the distribution is said to be normal and the curve is Mesokurtic.
If $\beta_2 > 3$, the distribution is said to be more peaked and the curve is Leptokurtic.
If $\beta_2 < 3$, the distribution is said to be flat-topped and the curve is Platykurtic.
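The moment-based measures can be sketched as below, taking β1 = μ3²/μ2³ and β2 = μ4/μ2² (the standard moment definitions, stated here as an assumption because the formulas are not legible in this copy of the notes). The frequencies are those of the right-hand distribution tabulated in the skewness section, with assumed class mid-points; the function names and the tolerance used for the comparison with 3 are illustrative choices.

```python
mid_points = [2.5, 7.5, 12.5, 17.5, 22.5, 27.5]   # assumed class mid-points
frequencies = [10, 40, 30, 90, 20, 10]            # right-hand distribution


def grouped_central_moment(mids, freqs, r):
    """r-th central moment of a frequency distribution (divisor N)."""
    total = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / total
    return sum(f * (x - mean) ** r for f, x in zip(freqs, mids)) / total


def classify_kurtosis(beta2, tolerance=1e-9):
    """Name the curve according to how beta2 compares with 3."""
    if abs(beta2 - 3) <= tolerance:
        return "Mesokurtic"
    return "Leptokurtic" if beta2 > 3 else "Platykurtic"


mu2 = grouped_central_moment(mid_points, frequencies, 2)
mu3 = grouped_central_moment(mid_points, frequencies, 3)
mu4 = grouped_central_moment(mid_points, frequencies, 4)

beta1 = mu3 ** 2 / mu2 ** 3   # size of skewness; direction comes from the sign of mu3
beta2 = mu4 / mu2 ** 2        # kurtosis
direction = "negative" if mu3 < 0 else "positive" if mu3 > 0 else "none"
print(round(beta1, 4), direction, round(beta2, 3), classify_kurtosis(beta2))
```

For this distribution β1 is small and μ3 is negative, so the skewness is slight and negative, while β2 is below 3, which places the curve in the Platykurtic class.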


and strength of relationship may be examined by correlation and regression analysis.

Meaning of correlation

Correlation refers to the relationship between two or more variables, e.g. the relation between the heights of father and son, yield and rainfall, wage and price index, shares and debentures, etc. Correlation is a statistical analysis which measures and analyses the degree or extent to which two variables fluctuate with reference to each other. The word relationship is important: it indicates that there is some connection between the variables, and correlation measures the closeness of that relationship. Correlation does not indicate a cause-and-effect relationship. Price and supply, income and expenditure are correlated.

Definitions

"Correlation analysis attempts to determine the degree of relationship between variables" (Ya-Lun Chou).

"Correlation is an analysis of the covariation between two or more variables" (A.M. Tuttle).

Correlation expresses the interdependence of two sets of variables upon each other. One variable may be called the subject (independent) variable and the other the relative (dependent) variable. The relative variable is measured in terms of the subject variable.

Uses of correlation

1. It is used in physical and social sciences.
2. It is useful for economists to study the relationship between variables like price, quantity etc. Businessmen estimate costs, sales, price etc. using correlation.
3. It is helpful in measuring the degree of relationship between variables like income and expenditure, price and supply, supply and demand etc.
4. Sampling error can be calculated.
5. It is the basis for the concept of regression.

Types of Correlation

Correlation is classified into various types. The most important ones are
i) Positive and negative.
ii) Linear and non-linear.
iii) Partial and total.
iv) Simple and multiple.

Positive and Negative Correlation

It depends upon the direction of change of the variables. If the two variables tend to move together in the same direction, i.e. an increase in the value of one variable is accompanied by an increase in the value of the other, or a decrease in the value of one variable is accompanied by a decrease in the value of the other, then the correlation is called positive or direct correlation. Price and supply, height and weight, yield and rainfall are some examples of positive correlation.

If the two variables tend to move in opposite directions, so that an increase (or decrease) in the value of one variable is accompanied by a decrease (or increase) in the value of the other variable, then the correlation is called negative or inverse correlation. Price and demand, yield of crop and price are examples of negative correlation.

Linear and Non-linear correlation

If the ratio of change between the two variables is constant, then there is linear correlation between them. Consider the following.

X    2    4    6    8    10    12
Y    3    6    9    12   15    18
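The constant ratio of change in the tabulated X and Y values can be confirmed with a few lines of Python. The variable names are illustrative, and the intercept and slope are simply read off from the first pair of values.

```python
x = [2, 4, 6, 8, 10, 12]
y = [3, 6, 9, 12, 15, 18]

# Ratio of change in Y to change in X between successive pairs
ratios = [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

b = ratios[0]          # slope of the straight line y = a + b*x
a = y[0] - b * x[0]    # intercept
is_linear = all(abs(r - b) < 1e-12 for r in ratios)

print(is_linear, a, b)  # True 0.0 1.5
```

Every successive ratio equals 1.5, so the points lie on the straight line y = 0 + 1.5x.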

Here the ratio of change between the two variables is the same. If we plot these points on a graph we get a straight line, and the functional relationship is represented by y = a + bx, where a and b are constants. If the amount of change in one variable does not bear a constant ratio to the amount of change in the other, then the relation is called curvilinear (or non-linear) correlation. The graph will be a curve.

Simple and Multiple correlations

When we study only two variables, the relationship is a simple correlation. For example, feed intake and growth of animals, birth weight and number of piglets, demand and price. In a multiple correlation we study more than two variables simultaneously. The relationship of milk yield with first lactation period, food supplied, age etc. is an example of multiple correlation.


Partial and total correlation

The study of two variables excluding some other variables is called partial correlation. For example, the correlation between the weight of broilers and feed intake, assuming other factors like area provided, labour used, medicinal cost etc. to be constant.

If there is no relationship between the two variables, they are said to be independent or uncorrelated.

Real and Spurious correlation

When there is a real correlation between two variables, it may be that a change in one variable is the cause of the change in the other. There is covariation based on logical relationships and causation.

Sometimes, even if two variables are independent of each other, there may be a high degree of correlation between them. Such a correlation indicates a relationship with no logical basis, e.g. rainfall in Tamil Nadu and yield in Karnataka, or cattle numbers and the number of human illiterates. Such a correlation is called spurious or nonsense correlation.

Computation of correlation

When there exists some relationship between two variables, we have to measure the degree of relationship. This measure is called the measure of correlation (or correlation coefficient) and it is denoted by r.

Co-variation:

The covariation between the variables x and y is defined as

$$\operatorname{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$$

where $\bar{x}$, $\bar{y}$ are respectively the means of x and y and n is the number of pairs of observations.

Methods of studying correlation

1. Scatter diagram
2. Correlation graph
3. Karl Pearson's coefficient of correlation
4. Concurrent deviation method
5. Rank method


Scatter diagram

A scatter diagram (also called a scattergram, scatterplot or dot diagram) is a chart prepared to represent graphically the relationship between two variables. Take one variable on the horizontal axis and the other on the vertical axis, and mark a point corresponding to each pair of the given observations after choosing a suitable scale. The figure which contains the collection of dots or points is called a scatter diagram. The way in which the dots lie on the scatter diagram shows the type of correlation. If the dots show some trend, either upward or downward, the two variables are correlated. If the dots do not show any trend, there is no correlation between the two variables.


Merits

1. It is the simplest and an attractive method of finding the nature of correlation between the two variables.
2. It is a non-mathematical method of studying correlation. It is easy to understand.
3. It is not affected by extreme items.
4. It is the first step in finding out the relation between the two variables.
5. We can have a rough idea at a glance whether it is a positive correlation or a negative correlation.

Demerits

1. By this method we cannot get the exact degree of correlation between the two variables.

Correlation graph

In this method, curves are plotted for the data on the two variables. By examining the direction and closeness of the two curves so drawn, we can infer whether or not the variables are related. If both the curves drawn on the graph are moving in the same direction (either upward or downward), the correlation is said to be positive. On the other hand, if the curves are moving in opposite directions, the correlation is said to be negative.

This method is normally used for time series data. However, like the scatter diagram, this method does not offer any numerical value for the coefficient of correlation.


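Karl Pearson's coefficient of correlation

Karl Pearson, a great biometrician and statistician, suggested a mathematical method for measuring the magnitude of the linear relationship between two variables. It is the most widely used method in practice and is known as the Pearsonian coefficient of correlation. It is denoted by r. It is also called the product moment formula. It is given by

$$r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \, \sigma_y}$$

where $\sigma_x$, $\sigma_y$ are the S.D. of x and y respectively, and $\operatorname{Cov}(x, y) = \frac{1}{n}\sum (x - \bar{x})(y - \bar{y})$.

If simplified,

$$r = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sqrt{\left[\sum x^2 - \frac{(\sum x)^2}{n}\right]\left[\sum y^2 - \frac{(\sum y)^2}{n}\right]}}$$

where n is the number of pairs of observations.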
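A minimal sketch of the product moment formula follows, written both as covariance divided by the product of the SDs and in the simplified sums form. It assumes divisor n throughout (the divisor cancels in r); the second data set is hypothetical and the function names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """r = Cov(x, y) / (SD of x * SD of y), all computed with divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sd_x * sd_y)


def pearson_r_simplified(xs, ys):
    """Same coefficient computed from the sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (sxy - sx * sy / n) / sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))


# Perfectly linear data from the linear-correlation table
r_linear = pearson_r([2, 4, 6, 8, 10, 12], [3, 6, 9, 12, 15, 18])

# Hypothetical data with an imperfect positive relationship
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_def = pearson_r(x, y)
r_simple = pearson_r_simplified(x, y)
print(round(r_linear, 4), round(r_def, 4), round(r_simple, 4))
```

The linear data give r = 1, while the second data set gives r of about 0.77 by either form, a fairly strong but imperfect positive correlation.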

Concurrent deviation method

This method of studying correlation is the simplest of all the methods. What is to be found in this method is the direction of change of the x and y variables. The stepwise procedure is:

Step i. Find out the direction of change of the x variable, i.e. as compared with the first value, whether the second value is increasing, decreasing or constant. If it is increasing, put a + sign; if it is decreasing, put a − sign; and if it is constant, put zero. Similarly, as compared to the second value, find out whether the third value is increasing, decreasing or constant. Repeat the same process for the other values also. Denote the column as Dx.

Step ii. In the same way, find out the direction of change of the y variable and denote this column as Dy.

Step iii. Multiply Dx by Dy and determine the value of c, the number of concurrent deviations, i.e. the number of positive signs obtained after multiplying Dx by Dy.

Step iv. Then apply the formula

$$r = \pm\sqrt{\pm\frac{2c - n}{n}}$$

where n is the number of pairs of deviations compared (one less than the number of pairs of observations). The sign, both inside and outside the square root, is taken as that of (2c − n).

Rank method

It is studied when no assumption about the parameters of the population is made. This method is based on ranks. It is useful for studying qualitative attributes like honesty, colour, beauty, intelligence, character, morality etc. The individuals in the group can be arranged in order, thereby obtaining for each individual a number showing his/her rank in the group. This method was developed by Edw