Chapter 3: Introduction to Statistics


Upload: dokhanh

Post on 09-Mar-2018


Page 1: CHAPTER 3 INTRO. TO STATISTICS -1 - Wikispaces3+… · A statisstatisstatistical exercise tical exercisetical exercise normally consists of 4 ... -Use probability, that is ... Percentage

SHF1124


CHAPTER 3: INTRODUCTION TO STATISTICS

The Statistical Process

3.1 Introduction

Statistics:
• A field of study concerned with collecting, presenting, analyzing and interpreting data as a basis for explanation, description and comparison.
• Used to analyze the results of surveys and as a tool in scientific research to make decisions based on controlled experiments.
• Also useful for operations research, quality control, estimation and prediction.

Population: a collection or set of individuals, objects or events whose properties are to be analyzed.

Sample: a group of subjects selected from the population. A sample is a subset of a population.

Stages of the statistical process:
• Plan the Investigation: What? How? Who? Where?
• Statistical POPULATION: the collection of data we wish to gather information about.
  - E.g.: All students of CFS IIUM
• Collect the Sample
• SAMPLE: data collected from the population.
  - E.g.: Students of Dept. of Science
• Analyze the Data: organize, describe and present them.
• Sample Statistics:
  - Graphic: e.g. histogram, ogive, frequency polygon
  - Numeric: e.g. mean, standard deviation
• Make Inferences: determine what the statistics tell us about the population.


Data: a set of recorded observations or values. Any quantity that can have a number of values is a variable. Variables whose values are determined by chance are called random variables.

Data set: a collection of data values. Each value in the data set is called a data value or a datum.

Variable: a characteristic or attribute that can assume different values.

A statistical exercise normally consists of 4 stages:
i) Collection of data by counting or measuring.
ii) Ordering and presentation of the data in a convenient form.
iii) Analysis of the collected data.
iv) Interpretation of the results and conclusions formulated.

3.1.1 Two Branches of Statistics

DESCRIPTIVE STATISTICS
Consists of the collection, organization, summarization and presentation of data.
- Describes a situation. Data are presented in the form of charts, graphs or tables.
- Makes use of graphical techniques and numerical descriptive measures such as the average to summarize and present the data.
- E.g.: The national census conducted by the Malaysian government every 5 years or 10 years. The results of this census give some information regarding average age, income and other characteristics of the Malaysian population.

INFERENTIAL STATISTICS
Consists of generalizing from samples to populations, performing hypothesis tests, determining relationships among variables and making predictions.
- Inferences are made from samples to populations.
- Uses probability, that is, the chance of an event occurring.
- The area of inferential statistics called hypothesis testing is a decision-making process for evaluating claims about a population, based on information obtained from samples.
- E.g.: A researcher may want to know if a new skin lotion containing aloe vera will reduce skin problems in children. For this study, two groups of young children would be selected. One group would be given the lotion containing aloe vera and the other would be given a normal lotion without aloe vera. The result is then observed by experts to see the effectiveness of the new product.


3.1.2 Variables and Types of Data

• Statisticians gain information about a particular situation by collecting data for random variables.

TYPES OF DATA (VARIABLES)
- QUALITATIVE
- QUANTITATIVE
  - DISCRETE
  - CONTINUOUS

LEVEL OF MEASUREMENT
- NOMINAL
- ORDINAL
- INTERVAL
- RATIO


• Types of Data (Variables)

1) Qualitative variables
• Variables that can be placed into distinct categories according to some characteristic or attribute.
• Nonnumeric categories.
• E.g.: gender, color, religion, workplace, etc.

2) Quantitative variables
• Numerical in nature and can be ordered or ranked.
• A quantitative variable may be one of two kinds:
  - Discrete variable: a variable that can be counted or for which there is a fixed set of values. Example: the number of children in a family, the number of students in a class, etc.
  - Continuous variable: a variable that can be measured on a continuous scale, the result depending on the precision of the measuring instrument or the accuracy of the observer. A continuous variable can assume all values between any two specific values. Example: temperatures, heights, weights, time taken, etc.

• Variables can be classified by how they are categorized, counted or measured. Data/variables can be classified according to the LEVEL OF MEASUREMENT as follows:

1) Nominal Level Data: classifies data (persons/objects) into two or more categories. Whatever the basis for classification, a person can only be in one category, and members of a given category have a common set of characteristics.
• The lowest level of measurement.
• No ranking/order can be placed on the data.
• E.g.: Gender (Male/Female), Type of school (Public/Private), Height (Tall/Short), etc.

2) Ordinal Level Data: classifies data into categories that can be ranked; however, precise differences between the ranks do not exist.
• This type of measuring scale puts the data/subjects in order from highest to lowest, from most to least. It does not indicate how much higher or how much better. Intervals between ranks are not equal.
• E.g.: Letter grades (A, B, C, D, E, F); man's build (small, medium or large), where large variation exists among the individuals in each class.


3) Interval Level Data: has all the characteristics of the nominal and ordinal scales and, in addition, is based upon predetermined equal intervals. It has no true zero point (ratios between numbers on the scale are not meaningful).
E.g.:
• Achievement tests, aptitude tests, IQ tests. There is a meaningful one-point difference between an IQ of 110 and an IQ of 111.
• The Fahrenheit scale is a clear example of the interval scale of measurement. Thus, 60 degrees Fahrenheit or -10 degrees Fahrenheit represent interval data. Measurement of sea level is another example of an interval scale. With each of these scales there are direct, measurable quantities with equality of units. In addition, zero does not represent the absolute lowest value. Rather, it is a point on the scale with numbers both above and below it (for example, -10 degrees Fahrenheit).

4) Ratio Level Data: possesses all the characteristics of the interval scale and, in addition, has a meaningful (true) zero point. True ratios exist when the same variable is measured on two different members of the population.
• The highest, most precise level of measurement.
• E.g.: weight, number of calls received, height.

3.1.3 Data Collection and Sampling Techniques

• Sampling is the process of selecting a number of individuals for a study in such a way that the individuals represent the larger group from which they were selected.
• The purpose of sampling is to use a sample to gain information about a population.
• In order to obtain samples that are unbiased, statisticians use 4 basic methods of sampling:
i) Random Sampling: subjects are selected by random numbers.
ii) Systematic Sampling: subjects are selected by using every kth number after the first subject is randomly selected from 1 through k.
iii) Stratified Sampling: subjects are selected by dividing up the population into groups (strata), and subjects within groups are randomly selected.
  - E.g.: We divide the population into 5 groups, then we take subjects from each group to become our sample.
iv) Cluster Sampling: subjects are selected by using an intact group that is representative of the population.
  - E.g.: We divide the population into 5 groups, then we take 2 groups to become our sample. That means 2 groups of subjects represent 5 groups of subjects.
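The four sampling methods can be sketched in Python as below. This is only an illustration under stated assumptions: the population of 100 numbered subjects, the 5 groups of 20, the sample sizes and the function names are all hypothetical, and choosing the clusters at random is one possible way to pick the intact groups, not something the notes prescribe.

```python
import random

def simple_random_sample(population, n, rng):
    """Random sampling: pick n subjects using random numbers."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Systematic sampling: random start among the first k, then every kth."""
    start = rng.randrange(k)          # 0-based position of the first subject
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified sampling: randomly select subjects within every group."""
    sample = []
    for group in strata:
        sample.extend(rng.sample(group, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sampling: keep every member of a few whole groups.
    Picking the groups at random is an assumption of this sketch."""
    chosen = rng.sample(clusters, n_clusters)
    return [subject for group in chosen for subject in group]

rng = random.Random(0)                      # fixed seed, only for repeatability
population = list(range(1, 101))            # hypothetical subject IDs 1..100
groups = [population[i:i + 20] for i in range(0, 100, 20)]  # 5 groups of 20

srs = simple_random_sample(population, 10, rng)
sys_s = systematic_sample(population, 10, rng)
strat = stratified_sample(groups, 2, rng)   # 2 subjects from each of 5 groups
clus = cluster_sample(groups, 2, rng)       # 2 whole groups out of 5
```

Note the contrast between the last two: the stratified sample draws a few subjects from every group, while the cluster sample takes all subjects from only some groups.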


Exercise:

A) Classify each set of data as discrete or continuous.
1) The number of suitcases lost by an airline.
2) The height of corn plants.
3) The number of ears of corn produced.
4) The number of green M&M's in a bag.
5) The time it takes for a car battery to die.
6) The production of tomatoes by weight.

B) Identify the following as nominal level, ordinal level, interval level, or ratio level data.
1) Percentage scores on a Math exam.
2) Letter grades on an English essay.
3) Flavors of yogurt.
4) Instructors classified as: Easy, Difficult or Impossible.
5) Employee evaluations classified as: Excellent, Average, Poor.
6) Religions.
7) Political parties.
8) Commuting times to school.
9) Years (AD) of important historical events.
10) Ages (in years) of statistics students.
11) Ice cream flavor preference.
12) Amount of money in savings accounts.
13) Students classified by their reading ability: Above average, Below average, Normal.


3.2 HISTOGRAMS, FREQUENCY POLYGONS AND OGIVES

Example: For 108 randomly selected college applicants, the following frequency distribution for entrance exam scores was obtained.

Class Limit    Frequency
90 – 98        6
99 – 107       22
108 – 116      43
117 – 125      28
126 – 134      9

Construct:
1. Histogram
   i) x-axis: class boundary; y-axis: frequency
   ii) x-axis: class boundary; y-axis: relative frequency
2. Frequency Polygon
   i) x-axis: class midpoint; y-axis: frequency
   ii) x-axis: class midpoint; y-axis: relative frequency
3. Ogive
   i) x-axis: class boundary; y-axis: cumulative frequency
   ii) x-axis: class boundary; y-axis: cumulative relative frequency

Relative frequency = $\frac{f}{\sum f}$

Cumulative relative frequency = $\frac{\text{cumulative frequency}}{\sum f}$

or add the relative frequency in each class to the running total of relative frequencies.
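The columns needed for the three graphs can be tabulated with a short Python sketch. It uses the entrance-exam distribution above; the function name and dictionary keys are illustrative, and the class boundaries are taken as the class limits minus and plus 0.5, which assumes whole-number limits.

```python
# Class limits and frequencies from the entrance-exam example.
classes = [(90, 98), (99, 107), (108, 116), (117, 125), (126, 134)]
freqs = [6, 22, 43, 28, 9]

def frequency_table(classes, freqs):
    """Return one row per class with the values plotted on each graph."""
    total = sum(freqs)
    rows = []
    cum = 0
    for (lower, upper), f in zip(classes, freqs):
        cum += f
        rows.append({
            "boundaries": (lower - 0.5, upper + 0.5),  # histogram, ogive x-axis
            "midpoint": (lower + upper) / 2,           # frequency polygon x-axis
            "f": f,
            "rel_f": f / total,                        # f / sum(f)
            "cum_f": cum,
            "cum_rel_f": cum / total,                  # cumulative f / sum(f)
        })
    return rows

rows = frequency_table(classes, freqs)
for row in rows:
    print(row)
```

The last row should show a cumulative frequency equal to the total (108) and a cumulative relative frequency of 1, which is a quick check on the table.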


Note: Graphing

Given the frequency distribution below:

Class Limit    Class Boundary    f     Cf
0 – 19         -0.5 – 19.5       13    13
20 – 39        19.5 – 39.5       18    31

The first value on the x-axis is -0.5. The axis can be drawn in either of two ways.

[Figure: two alternative x-axes, each marked with the boundaries -0.5, 19.5 and 39.5]

All graphs must be drawn on the right side of the y-axis. Omit the questions on analyzing the graph in the exercises.

Exercise:

1. In a class of 35 students, the following grade distribution was found. Construct a histogram, frequency polygon and ogive for the data. (A=4, B=3, C=2, D=1, F=0)

Grade    Frequency
0        3
1        6
2        9
3        12
4        5

2. Using the histogram shown below, construct
i) A frequency distribution
ii) A frequency polygon
iii) An ogive

[Figure: histogram; y-axis: frequency, scaled from 1 to 7; x-axis: class boundaries 21.5, 24.5, 27.5, 30.5, 33.5, 36.5, 39.5, 42.5]


3. Below is a data set for the duration (in minutes) of a random sample of 24 long-distance phone calls:

1 20 10 20 12 23 3 7 18 12 4 5
15 7 29 10 18 10 10 23 4 12 8 6

a) Construct a frequency distribution table for the data using the classes "1 to 5", "6 to 10", etc.
b) Construct a cumulative frequency distribution table and use it to draw up an ogive.

4. The following table refers to the 2003 average income (in thousand Ringgit) per year for 20 employees of company A.

Income ('000 Ringgit)    Frequency
5 – 9                    6
10 – 14                  3
15 – 19                  2
20 – 24                  4
25 – 29                  3
30 – 34                  2

a) Draw the histogram and frequency polygon for the above data.
b) Construct the cumulative frequency table. Hence, draw up an ogive for the above data.

3.3 DATA DESCRIPTION

3.3.1 MEASURES OF CENTRAL TENDENCY

• Mean, Median and Mode for Ungrouped Data

• Mean (arithmetic average)
  Symbol for sample: $\bar{X}$
  Symbol for population: $\mu$
  (Syllabus focuses on the sample formula)
  Mean, $\bar{X} = \frac{\sum X}{n}$

• Median (the middle point in an ordered data set)
  - arrange the data in order, ascending or descending
  - select the middle point, or use the formula $T = \frac{n + 1}{2}$, where n is the number of data values
  - Then, the median is:
    • the value at location T (for an odd number of data values)
    • the average of the two values at the locations on either side of T (for an even number of data values, where T ends in .5)

• Mode: the value that occurs most often in the data set
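The three measures for ungrouped data can be sketched in plain Python as follows. The data are the burglary counts used in the first example of this section; the function names are illustrative, and `modes` returns a list because a data set may have more than one most frequent value.

```python
from collections import Counter

def mean(data):
    """Arithmetic average: sum of the values divided by n."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the ordered data, at location T = (n + 1) / 2."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

def modes(data):
    """All values that occur most often."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

burglaries = [61, 11, 1, 3, 2, 30, 18, 3, 7]
print(mean(burglaries), median(burglaries), modes(burglaries))
```

For this data set the single large value (61) pulls the mean well above the median, which is one reason the notes present all three measures.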


Example:
1) The following data are the number of burglaries reported for a specific year for nine western Pennsylvania universities. Find mean, median and mode.
   61, 11, 1, 3, 2, 30, 18, 3, 7
2) Twelve major earthquakes had Richter magnitudes shown here. Find mean, median and mode.
   7.0, 6.2, 7.7, 8.0, 6.4, 6.2, 7.2, 5.4, 6.4, 6.5, 7.2, 5.4
3) The number of hospitals for the five largest hospital systems is shown here. Find mean, median and mode.
   340, 75, 123, 259, 151

• Mean, Median and Mode for Ungrouped Frequency Distribution

• Mean, $\bar{X} = \frac{\sum fX}{\sum f}$

• Median:
  - find the cumulative frequency
  - Location of median $= \frac{\sum f}{2}$

• Mode: the value with the largest frequency

Example:
4) A survey was taken in a restaurant. This ungrouped frequency distribution of the number of cups of coffee consumed with each meal was obtained. Find mean, median and mode.

Number of cups    Frequency
0                 5
1                 8
2                 10
3                 2
4                 3
5                 2
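A minimal Python sketch of these formulas, applied to the coffee example above. The function names are illustrative. The median function simply returns the value whose cumulative frequency first reaches the location $\frac{\sum f}{2}$; it does not average two neighbouring values when the location falls exactly at the end of one value's run, which is a simplification of this sketch.

```python
def freq_mean(values, freqs):
    """Mean = sum(f * X) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def freq_median(values, freqs):
    """Value whose cumulative frequency first reaches the location sum(f) / 2."""
    location = sum(freqs) / 2
    cum = 0
    for x, f in sorted(zip(values, freqs)):
        cum += f
        if cum >= location:
            return x

def freq_mode(values, freqs):
    """Value with the largest frequency (the first one if there is a tie)."""
    return max(zip(values, freqs), key=lambda pair: pair[1])[0]

cups = [0, 1, 2, 3, 4, 5]
freqs = [5, 8, 10, 2, 3, 2]
print(freq_mean(cups, freqs), freq_median(cups, freqs), freq_mode(cups, freqs))
```

Here the cumulative frequencies are 5, 13, 23, ..., so the location 15 falls among the meals with 2 cups.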


• Mean, Median and Mode for Grouped Frequency Distribution

• Mean, $\bar{X} = \frac{\sum f X_m}{\sum f}$
  where $X_m$ = class midpoint
  (Students must show the working, i.e. find the midpoint and $f X_m$)

• Median:
  - find the cumulative frequency
  - find the location of the median class $= \frac{\sum f}{2}$
  - Median $= L_m + \left( \frac{\frac{\sum f}{2} - \sum f_{m-1}}{f_m} \right) \times c$
  where
  $L_m$: lower boundary of the median class
  $\sum f_{m-1}$: cumulative frequency up to the point $L_m$ (before the median class)
  $f_m$: frequency of the median class
  c: class width of the median class

• Mode:
  - find the location of the modal class: the class with the largest frequency
  - Mode $= L_{mo} + \left( \frac{a}{a + b} \right) \times c$
  where
  $L_{mo}$: lower boundary of the modal class
  a: difference between the frequencies of the modal class and the class before
  b: difference between the frequencies of the modal class and the class after
  c: class width of the modal class

Example:
5) These numbers of books were read by each of the 28 students in a literature class. Find mean, median and mode.

Number of books    Frequency
0 – 2              2
3 – 5              6
6 – 8              12
9 – 11             5
12 – 14            3
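The grouped-data formulas can be checked with a short Python sketch on the books example. Several things here are assumptions of the sketch rather than part of the notes: the function name, the class boundaries (class limits minus and plus 0.5), treating a missing neighbouring class as frequency 0 when the modal class is first or last, and taking the first class on a tie for the largest frequency.

```python
def grouped_stats(boundaries, freqs):
    """boundaries: list of (lower, upper) class boundaries; freqs: frequencies."""
    n = sum(freqs)
    mids = [(lo + hi) / 2 for lo, hi in boundaries]
    mean = sum(f * m for f, m in zip(freqs, mids)) / n

    # Median class: first class whose cumulative frequency reaches n / 2.
    cum_before = 0
    for (lo, hi), f in zip(boundaries, freqs):
        if cum_before + f >= n / 2:
            median = lo + (n / 2 - cum_before) / f * (hi - lo)
            break
        cum_before += f

    # Modal class: class with the largest frequency (first one on ties).
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    lo, hi = boundaries[i]
    a = freqs[i] - (freqs[i - 1] if i > 0 else 0)               # vs. class before
    b = freqs[i] - (freqs[i + 1] if i + 1 < len(freqs) else 0)  # vs. class after
    mode = lo + a / (a + b) * (hi - lo)
    return mean, median, mode

books = [(-0.5, 2.5), (2.5, 5.5), (5.5, 8.5), (8.5, 11.5), (11.5, 14.5)]
mean, median, mode = grouped_stats(books, [2, 6, 12, 5, 3])
print(mean, median, mode)
```

For the books data the median class and the modal class are both 6 – 8 (boundaries 5.5 to 8.5), so all three measures land inside that class.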


6) Eighty randomly selected light bulbs were tested to determine their lifetimes (in hours). This frequency distribution was obtained. Find mean, median and mode.

Class Boundaries    Frequency
52.5 – 63.5         6
63.5 – 74.5         12
74.5 – 85.5         25
85.5 – 96.5         18
96.5 – 107.5        14
107.5 – 118.5       5

3.3.2 MEASURES OF VARIATION

Variance and Standard Deviation (the spread of a data set)

Group A              Group B
80                   55
81                   88
82                   100
$\bar{X}$ = 81       $\bar{X}$ = 81
Variance, $s^2$ = 1  Variance, $s^2$ = 543

[Figure: number line showing the Group A values 80, 81, 82 close together and the Group B values 55, 88, 100 spread apart]

Even though the average for both groups is the same, the spread or variation of the data in Group B is larger than in Group A.


(Syllabus focuses on the sample formula)

Sample variance and standard deviation

• For Ungrouped Data

Variance, $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$

Standard deviation, $s = \sqrt{s^2} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$

where X = individual value, $\bar{X}$ = sample mean, n = sample size

OR

Variance, $s^2 = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$

Standard deviation, $s = \sqrt{s^2} = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}}$

(Note: $\sum X^2$ is not the same as $(\sum X)^2$)

Variance
- Population variance, $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
- Sample variance, $s^2$

Standard deviation
- Population standard deviation, $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\sigma^2}$
- Sample standard deviation, s
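The two forms of the sample variance give the same result, which a short Python sketch can confirm on Group A and Group B from the start of this subsection. The function names are illustrative.

```python
import math

def sample_variance_definition(data):
    """s^2 = sum((X - mean)^2) / (n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)

def sample_variance_shortcut(data):
    """s^2 = (sum(X^2) - (sum(X))^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)                    # sum(X), squared below
    sum_x2 = sum(x * x for x in data)    # sum(X^2): square first, then add
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

group_a = [80, 81, 82]
group_b = [55, 88, 100]

for group in (group_a, group_b):
    s2 = sample_variance_definition(group)
    print(s2, sample_variance_shortcut(group), math.sqrt(s2))
```

The variables `sum_x2` and `sum_x ** 2` make the note above concrete: the first squares each value before adding, the second adds before squaring.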


Example:
1) The normal daily temperatures (in degrees Fahrenheit) in January for 10 selected cities are as follows. Find the variance and standard deviation.
   50 37 29 54 30 61 47 38 34 61
2) Twelve students were given an arithmetic test and the times (in minutes) to complete it were
   10 9 12 11 8 15 9 7 8 6 12 10
   Find the variance and standard deviation.

• For Grouped Data

Variance, $s^2 = \frac{\sum f X^2 - \frac{(\sum f X)^2}{\sum f}}{(\sum f) - 1}$

Standard deviation, $s = \sqrt{s^2} = \sqrt{\frac{\sum f X^2 - \frac{(\sum f X)^2}{\sum f}}{(\sum f) - 1}}$

where X is the class midpoint ($X_m$ in the grouped mean formula).
(Students must show the working, i.e. find $\sum f X$ and $\sum f X^2$)

Example:
3) In a class of 29 students, this distribution of quiz scores was recorded. Find variance and standard deviation.

Grade      Frequency
0 – 2      1
3 – 5      3
6 – 8      5
9 – 11     14
12 – 14    6
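The grouped variance formula can be sketched in Python using the quiz-score distribution above. The function name is illustrative, and X is taken to be the class midpoint, as in the grouped mean formula.

```python
import math

def grouped_variance(limits, freqs):
    """s^2 = (sum(f X^2) - (sum(f X))^2 / sum(f)) / (sum(f) - 1), X = midpoint."""
    mids = [(lo + hi) / 2 for lo, hi in limits]
    n = sum(freqs)
    sum_fx = sum(f * m for f, m in zip(freqs, mids))
    sum_fx2 = sum(f * m * m for f, m in zip(freqs, mids))
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

quiz_limits = [(0, 2), (3, 5), (6, 8), (9, 11), (12, 14)]
quiz_freqs = [1, 3, 5, 14, 6]

s2 = grouped_variance(quiz_limits, quiz_freqs)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))
```

The intermediate sums are the working the notes ask students to show: for this table $\sum f X = 266$ and $\sum f X^2 = 2708$.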


4) Eighty randomly selected light bulbs were tested to determine their lifetimes (in hours). This frequency distribution was obtained. Find variance and standard deviation.

Class Boundaries    Frequency
52.5 – 63.5         6
63.5 – 74.5         12
74.5 – 85.5         25
85.5 – 96.5         18
96.5 – 107.5        14
107.5 – 118.5       5

5) These data represent the scores (in words per minute) of 25 typists on a speed test. Find variance and standard deviation.

Class Limit    Frequency
54 – 58        2
59 – 63        5
64 – 68        8
69 – 73        0
74 – 78        4
79 – 83        5
84 – 88        1

3.3.3 MEASURES OF POSITION

Standard scores, percentiles, deciles and quartiles are used to locate the relative position of a data value in the data set.

• Standard score / z-score

The z-score represents the number of standard deviations the data value is above or below the mean.

$z = \frac{X - \bar{X}}{s}$

• If the z-score is positive, the score is above the mean.
• If the z-score is negative, the score is below the mean.
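A minimal Python sketch of the z-score, using the means and standard deviations quoted in the examples of this subsection. The function name is illustrative.

```python
def z_score(x, mean, s):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / s

# Data value 83 in a data set with mean 75 and s = 5.
z_83 = z_score(83, 75, 5)

# Comparing one student's marks across two tests with different spreads.
z_math = z_score(65, 53.3, 10.4)
z_bio = z_score(75, 75, 5)
print(z_83, z_math, z_bio)
```

A mark equal to the class mean gives z = 0, so a lower raw mark can still have the higher relative position.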


Example:
1) Let the data set be 65, 70, 75, 80, 85, with $\bar{X}$ = 75 and s = 5.

Data value    65               70              75          80              85
Position      $\bar{X} - 2s$   $\bar{X} - s$   $\bar{X}$   $\bar{X} + s$   $\bar{X} + 2s$
z-score       -2               -1              0           1               2

For data value 83: $z = \frac{83 - 75}{5} = 1.6$

2) Test marks are shown here. A student scored 65 in Math and 75 in Biology. On which test did she perform better?

Math marks: 65, 50, 45; $\bar{X}$ = 53.3, s = 10.4
Biology marks: 80, 75, 70; $\bar{X}$ = 75, s = 5

$z_M = \frac{65 - 53.3}{10.4} \approx 1.12$

$z_B = \frac{75 - 75}{5} = 0$

Since $z_M > z_B$, her relative position in the math class is higher than her relative position in the biology class. She performs better in the math paper than in the biology paper. (Her mark in the biology paper is higher than her mark in the mathematics paper, but we cannot compare the marks directly because the papers are different, i.e. in the number of questions, the standard of the questions and so on. That is why we have to compare the relative positions.)

• Quartiles, Deciles and Percentiles

For Ungrouped Data

• Quartiles: divide the distribution into four groups, separated by Q1, Q2, Q3

Smallest data | 25% | Q1 | 25% | Q2 (Median) | 25% | Q3 | 25% | Largest data

• Arrange the data in order.
• Find the location of the quartile, $c = \frac{n \times q}{4}$, where n = total number of values and q = quartile.


i) If c is not a whole number, round up to the next whole number.
ii) If c is a whole number, take the average of the cth and (c+1)th values.

Example:
1) The weights in pounds in the data set. Find Q1, Q2, Q3.
   16 18 22 19 3 21 17 20
2) The test scores in the data set. Find Q1, Q2, Q3.
   42 35 28 12 47 50 49

• Deciles: divide the distribution into 10 groups

Smallest data | D1 | D2 | D3 | D4 | D5 (Median) | D6 | D7 | D8 | D9 | Largest data
(each group contains 10% of the data)

• Arrange the data in order.
• Find the location of the decile, $c = \frac{n \times d}{10}$, where n = total number of values and d = decile.
i) If c is not a whole number, round up to the next whole number.
ii) If c is a whole number, take the average of the cth and (c+1)th values.

Example:
1) (from previous example) Find D5.
   16 18 22 19 3 21 17 20
2) (from previous example) Find D7.
   42 35 28 12 47 50 49

• Percentiles: divide the distribution into 100 equal groups

Smallest data | P1 | P2 | P3 | ... | P97 | P98 | P99 | Largest data
(each group contains 1% of the data)

D1, D2, D3, …, D9 correspond to P10, P20, P30, …, P90
Q1, Q2, Q3 correspond to P25, P50, P75
Median = Q2 = D5 = P50
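The location rule is the same for quartiles and deciles, and the correspondences above suggest using it with a denominator of 100 for percentiles as well. The Python sketch below writes it once; the function name and the `parts` parameter are illustrative, and it assumes k is strictly between 0 and `parts`.

```python
def position_value(data, k, parts):
    """k-th division point: parts=4 for quartiles, 10 for deciles, 100 for percentiles."""
    ordered = sorted(data)
    n = len(ordered)
    if (n * k) % parts != 0:
        c = n * k // parts + 1               # c not whole: round up
        return ordered[c - 1]                # c is a 1-based location
    c = n * k // parts                       # c whole: average cth and (c+1)th
    return (ordered[c - 1] + ordered[c]) / 2

weights = [16, 18, 22, 19, 3, 21, 17, 20]
scores = [42, 35, 28, 12, 47, 50, 49]

print([position_value(weights, q, 4) for q in (1, 2, 3)])
print([position_value(scores, q, 4) for q in (1, 2, 3)])
print(position_value(weights, 5, 10), position_value(scores, 7, 10))
```

Using integer arithmetic for the whole-number check avoids floating-point surprises when n × k is an exact multiple of the denominator. Other textbooks and software use different conventions for these locations, so results may differ slightly from library functions.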


� arrange the data in order � Find location of quartiles, [ = nZtXrr where ; n = total number of values p =percentile v) If cccc is not whole numbernot whole numbernot whole numbernot whole number, round up to the next whole number vi) If cccc is a whole numbera whole numbera whole numbera whole number, take average of cth and (c+1)th Example: 1) (from previous example) Find P33. 16 18 22 19 3 21 17 20 2) (from previous example)Find P60. 42 35 28 12 47 50 49 Finding percentile corresponding to given value, XXXX

$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \times 100\%$$

Example:
1) Data set: 1 1 3 4 5. Find the percentile for 4.

$$\text{Percentile} = \frac{3 + 0.5}{5} \times 100\% = 70\%$$

   So 4 = P70 (round off the answer).
2) (from previous example) Find the percentile rank for each test score in the data set.
   42 35 28 12 47 50 49

(The data value 47 has percentile rank 64, yet P60 found earlier is also the data value 47. The two methods need not agree exactly: P60 falls on the nearest data value, 47, whose rank is P64.)

For Grouped Data

METHOD 1: (USE PERCENTILE GRAPH)
x-axis: class boundaries
y-axis: relative cumulative frequency (percentage)

$$\text{Cumulative relative frequency (\%)} = \frac{\text{cumulative frequency}}{\text{total frequency}} \times 100\%$$
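For ungrouped data, the percentile-rank formula (number of values below X, plus 0.5, over the total number of values) can be sketched in Python as follows. The function name `percentile_rank` is hypothetical, and the rounding uses Python's built-in `round`, which sends exact halves to the nearest even integer. That tie rule is an assumption and may differ from the rounding convention intended by the notes.

```python
def percentile_rank(data, x):
    """Percentile rank of x: (values below x + 0.5) / total * 100, rounded off."""
    below = sum(1 for v in data if v < x)          # number of values strictly below x
    return round(100 * (below + 0.5) / len(data))  # round off to a whole percentile


print(percentile_rank([1, 1, 3, 4, 5], 4))         # 70

scores = [42, 35, 28, 12, 47, 50, 49]
for s in sorted(scores):
    print(s, percentile_rank(scores, s))
# 12 7
# 28 21
# 35 36
# 42 50
# 47 64
# 49 79
# 50 93
```

The score 47 comes out at rank 64, which matches the remark that 47 corresponds to P64 even though the location rule for P60 also lands on 47.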


Graph:
(i) Percentile graph: relative cumulative frequency (%) on the y-axis, from 0 to 100; P25 is read off at 25%.
(ii) Ogive using relative frequency: relative cumulative frequency on the y-axis, from 0 to 1.0; P25 is read off at 0.25.
(iii) Ogive: cumulative frequency on the y-axis; for a total frequency of 75, P25 is read off at 25% × 75 = 18.75.

METHOD 2: (USE FORMULA)

$$P_n = L + \frac{\frac{n}{100}\sum f - F}{f}\, c$$

where L is the lower boundary of the class containing $P_n$, F is the cumulative frequency before that class, f is the frequency of that class and c is the class width.
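A minimal Python sketch of the grouped-data formula is given here as an illustration, not as the method prescribed by the notes. The function name `grouped_percentile` and its parameter names are hypothetical. The class boundaries and frequencies are those of the fifth-grade weights distribution (total frequency 75), and the sketch assumes equal class widths.

```python
from itertools import accumulate


def grouped_percentile(lower_boundaries, frequencies, width, n):
    """Estimate the n-th percentile of grouped data: L + ((n/100)*sum(f) - F) / f * c."""
    total = sum(frequencies)
    target = n * total / 100                      # position (n/100) * sum(f)
    cumulative = list(accumulate(frequencies))
    for i, cum in enumerate(cumulative):
        if target <= cum:                         # first class whose cumulative frequency reaches the target
            before = cumulative[i - 1] if i > 0 else 0   # F: cumulative frequency before this class
            return lower_boundaries[i] + (target - before) / frequencies[i] * width
    raise ValueError("n must lie between 0 and 100")


boundaries = [52.5, 55.5, 58.5, 61.5, 64.5]       # lower class boundaries (pounds)
freqs = [9, 12, 17, 22, 15]                       # class frequencies, sum = 75

p25 = grouped_percentile(boundaries, freqs, 3, 25)
print(round(p25, 2))                              # 57.94
```

Here the target position is 25% × 75 = 18.75, which falls in the class 55.5 – 58.5 (F = 9, f = 12), so P25 = 55.5 + (18.75 − 9)/12 × 3 = 57.9375.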

Example: This distribution represents the data for weights of fifth-grade boys.

Weights (pounds)    Frequency
52.5 – 55.5         9
55.5 – 58.5         12
58.5 – 61.5         17
61.5 – 64.5         22
64.5 – 67.5         15

1) Find the approximate weights corresponding to each percentile given by constructing a percentile graph.
   (i) Q1 (ii) D8 (iii) Median (iv) P95


2) Find the approximate percentile ranks of the following weights.
   (i) 57 pounds (ii) 64 pounds (iii) 62 pounds (iv) 59 pounds
3) Find P63 by using the formula.

EXERCISE CHAPTER 3

1. What type of sampling is being employed if a country is divided into economic classes and a sample is chosen from each class to be surveyed?

2. Given a set of data 5, 2, 8, 14, 10, 5, 7, 10, m, n where $\bar{X} = 7$ and mode = 5. Find the possible values of m and n. (ans: m = 5, n = 4 or m = 4, n = 5)

3. Find the value that corresponds to the 30th percentile of the following data set: 78 82 86 88 92 97 (ans: P30 = 82)

4. Given the variance of the set of 8 data x1, x2, x3, ..., x8 is 5.67. If $\sum X^2 = 944.96$, find the mean of the data. (ans: 11.09)

5. Find Q3 for the given data set: 18, 22, 50, 15, 13, 6, 5, 12 (ans: 20)

6. The number of credits in business courses that eight applicants took is 9, 12, 15, 27, 33, p, 63, 72. Given the value that corresponds to the 75th percentile is 54, find p. (ans: 45)

7. The mean of 5, 10, 26, 30, 45, 32, x, y is 25 where x and y are constants. If x = 16, find the median. (ans: 28)

a) Construct a frequency distribution by using 7 classes (use 3 as lower limit of the first

class)

b) Find the mean, mode and standard deviation. (ans: 28.15 , 31.3 , 14.63)

c) Draw an ogive by using relative frequency and estimate the median from the graph.


EXERCISE

1. In four successive history tests, a student received grades of 45, 73, 77 and 86. Which of the following conclusions can be obtained from these figures by the descriptive method, and which by the inferential method? Explain your answer.
   a) Only one of the grades exceeds 85.
   b) The student’s grades increased from each test to the next.
   c) The student must have studied harder for each successive test.
   d) The difference between the highest and the lowest grade is 41.

2. State whether the following are nominal, ordinal, interval or ratio data.
   a) A statistics test which a student took was easy, difficult or very difficult, and these alternatives are coded 1, 2 and 3.
   b) The temperature of different kilns at the factory.
   c) The bottles on a Chemistry laboratory shelf are numbered 1, 2, 3 and 4, representing sulfuric acid, hydrochloric acid, nitric acid and sodium hydroxide.
   d) The race of the students in a university campus.
   e) The normal operating temperature of a car engine.
   f) Classification of students using an academic program.
   g) Speaker of a seminar rated as excellent, good, average or poor.
   h) Number of hours parents spend with their children per day.


EXERCISE CHAPTER 3

SEM 3, 07/08

1. (a) Name the two main areas of statistics.

(b) State whether the variable from the following statements is discrete or continuous.

(i) The number of calls received by a switchboard operator each day for a month.

(ii) Lifetime (in hours) of 12 flashlight batteries.

(iii) Actual cost of a student’s textbook for a given semester.

2. (a) The following is the systolic blood pressure, in mm Hg, of 10 patients in a hospital

165 135 151 155 158 146 149 124 162 173

Find:

(i) The number of patients whose systolic blood pressures exceed one standard score

above or below the mean.

(ii) The data value that corresponds to the third quartile, Q3.

(b) The table indicates the scores obtained by a group of students in a mathematics quiz.

Score 0 1 2 3 4 5

Number of students 8 1 1 0 x 3

If the median is 1.

(i) Find the value of x.

(ii) Hence, find the mode.

SEM 2, 07/08

1. State whether the following are nominal, ordinal, interval or ratio data.

(a) A statistics test which a student took was easy, difficult or very difficult and these

alternatives are coded 1, 2, and 3.

(b) The temperature of different kilns at the factory.

(c) The bottles on a Chemistry laboratory shelf are numbered 1,2,3 and 4 representing

sulfuric acid, hydrochloric acid, nitric acid and sodium hydroxide.

(d) The race of the students in a university campus.


2. (a) The mean of the ages of a group of 8 people is 40 years and the variance is 50 years². Two other people, aged 30 years and 79 years, join the group. Calculate the mean and standard deviation of the ages of the 10 people.

(b) The number of hand phones that are sold in a week by 15 representatives in a town is

as follows:

5, 10, 8, 7, 25, 12, 5, 14, 11, 10, 21, 9, 8, 11, 18

Find:

(i) The number of representatives whose number of hand phones sold is above the

median of the data set.

(ii) The data value that corresponds to the 63rd percentile.

3. The distribution of the weights of 133 mineral specimens collected on a field trip is given

below:

Weights (grams)    Number of specimens
20-34              8
35-49              27
50-64              42
65-79              31
80-94              17
95-109             8

(a) Find the median and mode.

(b) Construct a percentile graph (use graph paper). Then, find:

(i) The percentile rank for the weight of the mineral specimen of 45 grams.

(ii) The value of k if the weight of 25% of the specimens is at least k grams.


SEM 1, 07/08

1. State whether each of the following statements is true or false.

(a) A study of statistics can be divided into two sections: qualitative and quantitative

methods.

(b) Ordinal scales permit comparison of scores or categories in terms of smaller or larger,

higher or lower, or, best or worst.

(c) The method of dividing the population elements into two groups based on income

level and then selecting a simple random sample from each group is called cluster

sampling.

(d) The highest level of measurement is the interval level.

(e) The weight of pumpkins is considered to be a continuous variable.

2. (a) Given a set of numbers $\{x_1, x_2, \ldots, x_8\}$ with $\sum (x - \bar{x})^2 = 46.08$, find

(I) The variance of the set of numbers

(II) The mean if $\sum x^2 = 944.96$

(b) The following data shows the number of television sets sold by a firm in a period of 10 weeks:

15 21 5 6 7 29 9 10 14 12

Find the percentile rank for 10 television sets sold by a firm.

3.

(b) The relative frequency distribution shown in the following table refers to the marks

for statistics test obtained by a group of matriculation students. The mean for the

distribution of marks is 51.9

Marks                 0-19    20-39    40-59    60-79    80-99
Relative frequency    0.07    0.20     0.36     0.28     0.09

(i) If the students consist of 55% male students and 45% female students and the mean mark obtained by the male students is 51, find the mean mark obtained by the female students.

(ii) Draw an ogive using relative frequency.

(iii) If a student who scores 85 marks or more is given a grade A, estimate the percentage of students who obtained grade A.

(iv) Find D8.


SEM 3, 06/07

1. (a) Name the two areas of statistics.

(b) Identify each of the following as examples of qualitative or quantitative variables:

(i) The breaking strength of a given type of string.

(ii) The hair color of children auditioning for the theatre Ali Baba.

(iii) Whether or not a tap is defective.

(iv) The length of time required to answer a telephone call at an office.

2. (a) Farid scores 60 on an English test that has a mean of 54 and a standard deviation of 3, and he scores 81 on a History test with a mean of 76 and a standard deviation of 2. On which test did he perform better? Give your reason.

(b) The following data shows the number of books sold by ABC publisher in the Kuala

Lumpur Book Fair in a period of 11 days. Find the percentile rank of selling 12 books.

23 15 13 6 12 5

10 22 18 11 25

3. The income (in thousands of RM) of 28 managers, grouped by class mid-points are as

follows:

Mid-point 40 45 50 55 60

Number of managers 5 7 10 4 2

(a) Construct a frequency distribution

(b) Find the median

(c) Find the standard deviation

(d) Draw an ogive using relative frequency. (Use the graph paper.)

SEM 2, 06/07

1. (a) Classify the two groups of quantitative variables.

(b) State whether the data from the following statements is nominal, ordinal, interval or ratio.

(i) The normal operating temperature of a car engine.

(ii) Classification of students using an academic program.

(iii) Speaker of a seminar rated as excellent, good, average or poor.

(iv) Number of hours parents spend with their children per day.


2. (a) The following data shows the number of television sets sold by a firm in a period of 9

weeks. Given that mode=m.

15 21 5 6 7 29 m 10 14

Find the possible value(s) of m if D7=m.

(b) A statistics test was conducted for 100 male students and 120 female students taking a pre-university course. The mean and standard deviation of the marks obtained by the male and female students are as follows.

Students    Mean    Standard deviation
Male        55.8    6.5
Female      57.1    5.3

Calculate the mean and standard deviation of the marks obtained by all students.

ANSWER

Sem. 3, 07/08
1. (a) Descriptive, Inferential
   (b) (i) Discrete (ii) Continuous (iii) Continuous
2. (a) (i) No. of patients = 7 (ii) Q3 = 162
   (b) (i) x = 4 or 5 (ii) Mode = 0

Sem. 2, 07/08
1. (a) Nominal (b) Interval (c) Nominal (d) Nominal
2. (a) New mean = 42.9; Standard deviation = 14.48
   (b) (i) Median = 10; no. of representatives = 7 (ii) P63 = 11
3. (a) Median = 60.75; Mode = 58.15
   (b) (i) 45 grams corresponds to the 20th-21st percentile (ii) Q3 ≈ 75-76 g


Sem. 1, 07/08
1. (a) False (b) True (c) False (d) False (e) True
2. (a) (i) Variance = 6.58 (ii) Mean = 10.6
   (b) P45 = 10
3. (b) (i) Mean female = 53 (iii) 6-7 (iv) 70-72

Sem. 3, 06/07
1. (a) Descriptive, Inferential
   (b) (i) Quantitative (ii) Qualitative (iii) Qualitative (iv) Quantitative
2. (a) Farid did better in History since his z-score is higher there than in English.
   (b) P41 = 12
3. (b) Median = 48.5 (c) Standard deviation = 5.78

Sem. 2, 06/07
1. (a) Continuous, Discrete
   (b) (i) Interval (ii) Nominal (iii) Ordinal (iv) Ratio
2. (a) m = 15 or 21
   (b) Mean = 56.51; Standard deviation = 5.90