chapter 1 notes for stat400
7/21/2019 Chapter 1 Notes for STAT400
STAT400. Chapter 1. Overview and Descriptive Statistics
Natalia Tchetcherina
January 26, 28
Natalia Tchetcherina STAT400. Chapter 1. Overview and Descriptive Statistics
Populations, Samples, and Processes.
  Populations and Samples
  Random Variables and Statistical Populations
  Branches of statistics.
  Types of data: categorical data; numerical data.
Describing data by tables and graphs.
  Categorical data; discrete data; continuous data.
Measures of Location
Measures of Variability
Statistics.
Statistics as a subject provides a body of principles and methodology for designing the process of data collection, summarizing and interpreting the data, and drawing conclusions or generalities.
Examples.
Employment. Monthly, as part of the Current Population Survey, the Bureau of the Census collects information about employment status from a sample of about 65,000 households. Households are contacted on a rotating basis, with three-fourths of the sample remaining the same for any two consecutive months. The survey data are analyzed by the Bureau of Labor Statistics, which reports monthly unemployment rates.
Gallup Poll. The best known of the national polls, it produces estimates of the percentage of the popular vote for each candidate based on interviews with a minimum of 1500 adults. Beginning several months before the presidential election, results are regularly published. These reports help predict winners and track changes in voter preferences.
Making decisions in medical research studies. Heart disease is the most common cause of death in the industrialized nations. In the US and Canada nearly 30% of deaths each year are due to heart disease, mainly heart attacks. Does regular aspirin intake reduce deaths from heart attacks? The Harvard Medical School conducted a landmark study to investigate. The people participating in the study regularly took either aspirin or a placebo (a tablet with no active ingredient). Of those who took aspirin, 0.9% suffered heart attacks during the study. Of those who took the placebo, 1.7% had heart attacks. Can we conclude that it is beneficial for people to take aspirin?
Populations, Units and Characteristics
A population is a well-defined collection of objects or subjects. Studies involve the investigation of certain characteristic(s) of the members (called units) of the population(s). Examples:
All water in a lake. Characteristic: mercury concentration.
All items of a certain manufactured product (that have been, or will be, produced). Characteristic: proportion of defectives.
All students enrolled in Big Ten universities during the 2013-14 academic year. Characteristics: favorite type of music; political affiliation.
Two types of cleaning products. Characteristic: cleaning effectiveness.
Variable = a Numerical Characteristic
In most examples considered above, the characteristics we considered are quantities that can be measured and expressed as numbers, e.g. thermal expansion of a metal, hardness of cement, mercury concentration. Such characteristics are called quantitative.
Examples of non-quantitative characteristics are gender, make of car, eye color, strength category, and political affiliation. Such characteristics are called categorical or qualitative.
Because statistical procedures are applied to numerical data sets, the categories of a categorical characteristic are labeled with arbitrarily chosen numbers (e.g. male = -1, female = +1). A characteristic expressed as a number is called a variable.
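A minimal sketch of labeling categories with arbitrary numbers, assuming the coding from the example above (male = -1, female = +1). The list of responses is made up for illustration only; any two distinct numbers would serve equally well as labels.

```python
# Coding scheme from the notes' example; the numbers are labels, not measurements.
codes = {"male": -1, "female": +1}

# Hypothetical responses, for illustration only.
responses = ["female", "male", "male", "female", "female"]

# Replace each category by its numeric label.
encoded = [codes[r] for r in responses]
print(encoded)  # [1, -1, -1, 1, 1]
```

Because the labels are arbitrary, arithmetic on them (for example, "averaging" genders) has no meaning as a measurement.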
Branches of statistics.
Experimental design. Planning of data collection.
Descriptive statistics. Summarizing and describing the prominent features of data.
Inferential statistics. Evaluation of the information present in data (drawing conclusions).
Categorical (qualitative) data
When the characteristic under study concerns a qualitative trait that is only classified in categories and not numerically measured, the resulting data are called categorical data.
Examples.
Hair color: blond, brown, red, black
Employment status: employed, unemployed
Blood type: O, A, B, AB
Numerical (measurement) data
If the characteristic is measured on a numerical scale, the resulting data consist of a set of numbers and are called measurement data. We will use the term (numerical) variable to refer to a characteristic that is measured on a numerical scale.
Examples.
Shoe size of an adult male
Height of a 1-year-old pine seedling
The time in line at an automated teller
The number of offspring in an animal litter.
Frequency table.
$$\text{Relative frequency of a category} = \frac{\text{Frequency in the category}}{\text{Total number of observations}}.$$
Opinion poll on new dorm regulations.
Response   Frequency   Relative Frequency
Support    152         152/280 = .543
Neutral    77          77/280 = .275
Oppose     51          51/280 = .182
Total      280         1.000
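The relative-frequency formula can be checked against the opinion-poll table with a short sketch. The frequencies are the ones in the table; the variable names are illustrative choices.

```python
# Frequencies from the opinion poll on new dorm regulations.
freq = {"Support": 152, "Neutral": 77, "Oppose": 51}
total = sum(freq.values())  # total number of observations: 280

# Relative frequency = frequency in the category / total number of observations.
rel_freq = {cat: count / total for cat, count in freq.items()}

for cat, rf in rel_freq.items():
    print(f"{cat:8s} {freq[cat]:4d} {rf:.3f}")
print(f"{'Total':8s} {total:4d} {sum(rel_freq.values()):.3f}")
```

The relative frequencies always sum to 1 (up to rounding), which is a quick consistency check on any frequency table.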
Frequency table.
Daily numbers (x) of internet system crashes.
Data: 1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0, 1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0.
Value x   Frequency   Relative Frequency
0         9           .300
1         10          .333
2         5           .167
3         3           .100
4         2           .067
5         0           .000
6         1           .033
Total     30          1.000
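A sketch that rebuilds the frequency table above directly from the 30 recorded values, using `collections.Counter` from the Python standard library. Values between 0 and the maximum that never occurred (here x = 5) are listed with frequency 0, as in the table.

```python
from collections import Counter

# Daily numbers of internet system crashes (data from the notes).
data = [1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0,
        1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0]

counts = Counter(data)  # maps each observed value to its frequency
n = len(data)

# Every value from 0 to the maximum; Counter returns 0 for values never seen.
table = [(x, counts[x], counts[x] / n) for x in range(max(data) + 1)]

for x, f, rf in table:
    print(f"{x} {f:3d} {rf:.3f}")
print(f"Total {n} {sum(rf for _, _, rf in table):.3f}")
```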
Histogram and line diagram
Constructing a Frequency Distribution for a Continuous Variable
1. Find the minimum and the maximum values in the data set.
2. Choose intervals, or cells, of equal length that cover the range between the minimum and the maximum without overlapping. These are called class intervals, and their endpoints are called class boundaries.
3. Count the number of observations in the data that belong to each class interval. The count in each class is the class frequency or cell frequency.
4. Calculate the relative frequency of each class by dividing the class frequency by the total number of observations in the data:
$$\text{Relative frequency} = \frac{\text{Class frequency}}{\text{Total number of observations}}.$$
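The four steps can be sketched as a small function. This is a minimal sketch under stated assumptions: `frequency_distribution` is a hypothetical helper, the caller picks the starting boundary, class width and number of classes after looking at the minimum and maximum (steps 1 and 2), and, as in the bookstore example, each class includes its left endpoint but excludes its right endpoint. The sales amounts are made up, since the notes do not list the raw bookstore data.

```python
def frequency_distribution(data, start, width, k):
    """Hypothetical helper: classes [start + i*width, start + (i+1)*width), i = 0..k-1.

    Left endpoints are included and right endpoints excluded; the classes
    must cover every observation, otherwise a ValueError is raised.
    """
    n = len(data)
    freqs = [0] * k
    for x in data:
        i = int((x - start) // width)  # step 3: index of the class containing x
        if not 0 <= i < k:
            raise ValueError(f"{x} lies outside the class intervals")
        freqs[i] += 1
    rows = []
    for i, f in enumerate(freqs):
        lo = start + i * width
        rows.append((lo, lo + width, f, f / n))  # step 4: relative frequency
    return rows


# Hypothetical sales amounts (not the bookstore data from the notes).
sales = [12.5, 130.0, 260.0, 305.75, 499.99, 500.0, 87.2, 240.0]
print("min:", min(sales), "max:", max(sales))  # step 1
for lo, hi, f, rf in frequency_distribution(sales, start=0, width=125, k=5):
    print(f"[{lo}, {hi})  {f}  {rf:.3f}")
```

Note how the value 500.0 falls in the class [500, 625), not [375, 500), because right endpoints are excluded.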
Example.
Frequency Distribution for Bookstore Sales Data (left endpoints included, but right endpoints excluded)
Class Interval ($)   Frequency   Relative Frequency
0-125                5           5/40 = .125
125-250              8           8/40 = .200
250-375              13          13/40 = .325
375-500              11          11/40 = .275
500-625              3           3/40 = .075
Total                40          1.000
Population Averages and Sample Averages
Consider a population of $N$ units, and let $v_1, v_2, \ldots, v_N$ denote the statistical population corresponding to some variable. Then the population average or population mean, denoted by $\mu$, is the arithmetic average of all values in the statistical population. Thus,
$$\mu = \frac{1}{N}\sum_{i=1}^{N} v_i.$$
If the random variable $X$ denotes the value of the variable of a randomly selected population unit, then a synonymous terminology for the population mean is expected value of $X$, or mean value of $X$, and it is denoted by $\mu_X$ or $E(X)$.
EXAMPLE: In a population of 500 tin plates, the numbers of plates with 0, 1 and 2 scratches are $N_0 = 190$, $N_1 = 160$ and $N_2 = 150$. Thus, in the statistical population $v_1, \ldots, v_{500}$, 190 of the $v_i$ equal 0, 160 equal 1, and 150 equal 2. The population mean is
$$\mu = \frac{1}{500}\sum_{i=1}^{500} v_i = 0\cdot\frac{N_0}{500} + 1\cdot\frac{N_1}{500} + 2\cdot\frac{N_2}{500} = 0.92.$$
If a tin plate is selected at random and $X$ is the rv denoting the number of scratches, the mean value of $X$ is 0.92. (We write $\mu_X = 0.92$, or $E(X) = 0.92$.)
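A sketch of the tin-plate computation, showing that grouping the 500 values by the number of scratches gives the same population mean as averaging the full statistical population. It uses only `statistics.fmean` from the Python standard library; the variable names are illustrative.

```python
import statistics

# Tin-plate population from the notes: number of plates with 0, 1 and 2 scratches.
counts = {0: 190, 1: 160, 2: 150}
N = sum(counts.values())  # 500 units

# mu = (1/N) * sum_i v_i, grouped by value: sum over v of v * N_v / N
mu = sum(v * n_v for v, n_v in counts.items()) / N

# The same mean from the full statistical population v_1, ..., v_500.
population = [v for v, n_v in counts.items() for _ in range(n_v)]
mu_direct = statistics.fmean(population)

print(N, mu, mu_direct)
```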
If a sample of size $n$ is taken, and $x_1, x_2, \ldots, x_n$ denote the variable values of the sample units, then the sample average or sample mean, denoted by $\bar{x}$, is
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$
Under simple random (s.r.) sampling, a sample mean approximates, but in general is different from, the population mean.
EXAMPLE: If a s.r. sample of $n = 100$ is taken from the 500 tin plates, it could be that there are $n_0 = 40$, $n_1 = 34$ and $n_2 = 26$ plates with 0, 1 and 2 scratches. In this case, $\bar{x} = (0\cdot 40 + 1\cdot 34 + 2\cdot 26)/100 = 0.86$.
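The sample-mean example can be reproduced with a few lines, using the hypothetical sample counts given in the notes (40, 34 and 26 plates with 0, 1 and 2 scratches).

```python
# Hypothetical s.r. sample of n = 100 tin plates, as in the notes' example.
counts = {0: 40, 1: 34, 2: 26}
n = sum(counts.values())  # sample size: 100

# xbar = (1/n) * sum_i x_i, grouped by value
xbar = sum(x * n_x for x, n_x in counts.items()) / n
print(xbar)  # 0.86
```

Here $\bar{x} = 0.86$ differs from the population mean $\mu = 0.92$, as expected under s.r. sampling.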
Median
The sample median of a set of $n$ measurements $x_1, x_2, \ldots, x_n$ is the middle value when the measurements are arranged from smallest to largest. It is denoted by $\tilde{x}$.
How to compute the median:
1. Order the data from smallest to largest.
2. When the number of observations n is ODD, the median is the middle observation of the ordered sample.
3. When the number of observations n is EVEN, two observations from the ordered sample fall in the middle, and the median is their average.
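The three steps translate directly into a short function. This is a sketch (the name `median` is our choice); the standard library's `statistics.median` follows the same rule. The crash data from the frequency-table example is used as input.

```python
def median(xs):
    """Sample median computed by the three steps in the notes."""
    s = sorted(xs)                      # step 1: order the data
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                      # step 2: n odd -> the middle observation
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2    # step 3: n even -> average of the two middle values


# Daily numbers of internet system crashes (data from the notes), n = 30.
crashes = [1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0,
           1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0]
print(median(crashes))  # 1.0 (the 15th and 16th ordered values are both 1)
```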
Comparing mean and median
The median is not affected by a few very small or very large observations, whereas the presence of such extremes can have a considerable effect on the mean. For extremely asymmetrical distributions, the median is likely to be a more sensible measure of center than the mean.
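A small demonstration of this robustness, using the crash data from the notes. The second data set is hypothetical: it replaces the largest observation (6) by a far more extreme value (60) to show that the mean moves while the median does not.

```python
import statistics

# Daily numbers of internet system crashes (data from the notes).
crashes = [1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0,
           1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0]

# Hypothetical change: make the largest observation far more extreme (6 -> 60).
extreme = [60 if x == 6 else x for x in crashes]

for name, xs in (("original", crashes), ("extreme", extreme)):
    print(name, round(statistics.mean(xs), 3), statistics.median(xs))
# original 1.433 1.0
# extreme 3.233 1.0
```

Even in the original right-skewed data, the single large value pulls the mean (43/30) above the median.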
Sample Quartiles
Lower (first) quartile (designated Q1) = 25th percentile
Second quartile (median) (designated Q2) = 50th percentile
Upper (third) quartile (designated Q3) = 75th percentile
Population Variance and Sample Variance
Let $v_1, v_2, \ldots, v_N$ be a statistical population with mean $\mu$.
DEFINITION: The population variance, $\sigma^2$, is defined as
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(v_i - \mu)^2.$$
The standard deviation is the positive square root of the variance: $\sigma = \sqrt{\sigma^2}$.
If the rv $X$ denotes a randomly selected value from the statistical population, then a synonymous terminology for the population variance is variance of $X$, and it is denoted by $\sigma_X^2$ or $\mathrm{Var}(X)$.
Similarly, the standard deviation of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.
A simpler computational formula for the variance is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} v_i^2 - \mu^2.$$
EXAMPLE: Consider the tin plate example, so the statistical population $v_1, \ldots, v_{500}$ has 190 of the $v_i$ equal to 0, 160 equal to 1, 150 equal to 2, and $\mu = 0.92$. Then,
$$\sigma^2 = 0\cdot\frac{190}{500} + 1\cdot\frac{160}{500} + 4\cdot\frac{150}{500} - 0.92^2 = 0.6736.$$
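A sketch checking that the definition and the computational formula give the same population variance for the tin plates, with a cross-check against `statistics.pvariance` (the standard library's population variance, divisor $N$).

```python
import math
import statistics

# Tin-plate population from the notes.
counts = {0: 190, 1: 160, 2: 150}
N = sum(counts.values())
mu = sum(v * c for v, c in counts.items()) / N  # 0.92

# Definition: sigma^2 = (1/N) * sum (v_i - mu)^2
var_def = sum(c * (v - mu) ** 2 for v, c in counts.items()) / N

# Computational formula: sigma^2 = (1/N) * sum v_i^2 - mu^2
var_comp = sum(c * v ** 2 for v, c in counts.items()) / N - mu ** 2

# Cross-check on the full statistical population v_1, ..., v_500.
population = [v for v, c in counts.items() for _ in range(c)]
var_lib = statistics.pvariance(population)

sigma = math.sqrt(var_def)  # population standard deviation
print(round(var_def, 4), round(var_comp, 4), round(var_lib, 4))  # 0.6736 0.6736 0.6736
```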
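If $x_1, x_2, \ldots, x_n$ denotes a sample from the statistical population, the sample variance and its computational formula are:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right].$$
The sample standard deviation is $S = \sqrt{S^2}$. Under s.r. sampling, a sample variance approximates, but in general is different from, the population variance.
EXAMPLE: Consider the s.r. sample of $n = 100$ tin plates, which has 40, 34 and 26 plates with 0, 1 and 2 scratches. Here $\sum x_i = 34 + 2\cdot 26 = 86$ and $\sum x_i^2 = 34 + 4\cdot 26 = 138$, so
$$S^2 = \frac{1}{99}\left[138 - \frac{86^2}{100}\right] = \frac{1}{99}\left[138 - 73.96\right] = 0.647.$$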
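A sketch reproducing the sample-variance example with both formulas and a cross-check against `statistics.variance`, which uses the $n - 1$ divisor. The counts are the hypothetical sample from the notes.

```python
import statistics

# s.r. sample of n = 100 tin plates from the notes' example.
counts = {0: 40, 1: 34, 2: 26}
n = sum(counts.values())                             # 100
sum_x = sum(x * c for x, c in counts.items())        # 86
sum_x2 = sum(x * x * c for x, c in counts.items())   # 138
xbar = sum_x / n                                     # 0.86

# Definition: S^2 = (1/(n-1)) * sum (x_i - xbar)^2
s2_def = sum(c * (x - xbar) ** 2 for x, c in counts.items()) / (n - 1)

# Computational formula: S^2 = (1/(n-1)) * [sum x_i^2 - (sum x_i)^2 / n]
s2_comp = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Cross-check on the expanded sample.
sample = [x for x, c in counts.items() for _ in range(c)]
s2_lib = statistics.variance(sample)

print(round(s2_def, 3), round(s2_comp, 3), round(s2_lib, 3))  # 0.647 0.647 0.647
```

Using `statistics.pvariance` here instead would divide by $n = 100$ rather than $n - 1 = 99$, which is the population formula, not the sample variance defined above.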
Other measures of variation
Sample range = Largest observation - Smallest observation
Sample interquartile range = Third quartile - First quartile
Five-number summary:
(minimum, Q1, Q2 (median), Q3, maximum).
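A sketch computing the sample range, interquartile range and five-number summary for the crash data. The notes do not fix a rule for sample quartiles, so this assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the middle observation when n is odd. Other textbooks and software (for example, the `method` options of Python's `statistics.quantiles`) may give slightly different quartiles. The function names are our own.

```python
import statistics


def quartiles(xs):
    """Return (Q1, Q2, Q3) under the assumed 'median of each half' convention."""
    s = sorted(xs)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = s[:half]              # lower half
    upper = s[half + n % 2:]      # upper half (skips the middle value when n is odd)
    return statistics.median(lower), statistics.median(s), statistics.median(upper)


def five_number_summary(xs):
    q1, q2, q3 = quartiles(xs)
    return min(xs), q1, q2, q3, max(xs)


# Daily numbers of internet system crashes (data from the notes).
crashes = [1, 3, 1, 1, 0, 1, 0, 1, 1, 0, 2, 2, 0, 0, 0,
           1, 2, 1, 2, 0, 0, 1, 6, 4, 3, 3, 1, 2, 4, 0]

mn, q1, q2, q3, mx = five_number_summary(crashes)
print("five-number summary:", (mn, q1, q2, q3, mx))
print("range:", mx - mn)   # largest - smallest observation
print("IQR:", q3 - q1)     # third quartile - first quartile
```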
Boxplot