business statistics chapter 1 - ca sri lanka
1-1 1-1
Business Statistics
Chapter 1
By:
Chinthaka Amila Kankanamge
1-2
GOALS:
Explain what is meant by statistics.
Explain what is meant by descriptive statistics and inferential statistics.
Distinguish between a qualitative variable and a quantitative variable; discrete variable and a continuous variable.
Define the terms mutually exclusive and exhaustive.
Distinguish among the nominal, ordinal, interval, and ratio levels of measurement.
1-3
What is meant by Statistics?
• Statistics is the science of data. It involves collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical information to assist in making effective decisions.
1-4
Types of Statistics
Descriptive Statistics: Methods of organizing, summarizing, and
presenting data in an informative way.
Inferential Statistics: A decision, estimate, prediction, or generalization about a population, based on a sample.
1-6
Types of Statistics (examples of inferential statistics)
Eg 1: TV networks constantly monitor the popularity of their programs by hiring Nielsen and other organizations to sample the preferences of TV viewers.
Eg 2: The accounting department of a large firm will select a sample of invoices to check the accuracy of all the invoices of the company.
Eg 3: Wine tasters sip a few drops of wine to make a decision with respect to all the wine waiting to be released for sale.
1-7
Types of Statistics
A population is a collection of all possible individuals, objects, or measurements of interest. A parameter is a descriptive measure of the entire population of all observations of interest.
A sample is a portion, or part, of the population of interest. A statistic describes a sample and serves as an estimate of the corresponding population parameter.
1-8
Types of Variables
DATA
  Qualitative or attribute (type of car owned)
  Quantitative or numerical
    Discrete (number of children)
    Continuous (time taken for an exam)
1-9
Types of Variables
For a Qualitative or Attribute variable the characteristic being studied is nonnumeric.
Examples: gender, religious affiliation, type of automobile owned, state of birth, eye color.
In a Quantitative variable information is reported numerically.
Examples: balance in your checking account, minutes remaining in class, or number of children in a family.
Quantitative variables can be classified as either discrete or continuous.
1-10
Discrete variables can only assume certain values, and there are usually “gaps” between values.
Examples: the number of bedrooms in a house, or the number of hammers sold at the local Home Depot (1, 2, 3, …, etc).
Continuous variables can assume any value within a specified range.
Examples: the pressure in a tire, the weight of a pork chop, or the height of students in a class.
1-11
Sources of Data
Primary data :
Collected for specific purpose
Direct Observation
Questionnaires
Interviewing
Secondary Data :
Collected for another purpose
11
1-12
Levels of Measurement
Nominal level:
Data that is classified into categories and cannot be arranged in any particular order.
Examples: eye color, gender, religious affiliation.
Mutually exclusive:
An individual, object, or measurement is included in only one category.
Level of Data: Nominal, Ordinal, Interval, Ratio
1-13
Levels of Measurement
13
Ordinal level:
involves data arranged in some order, but the differences between data values cannot be determined or are meaningless.
During a taste test of 4 soft drinks, Mellow Yellow was ranked number 1, Sprite number 2, Seven-up number 3, and Orange Crush number 4.
Exhaustive: Each individual, object, or measurement must appear in one of the categories.
1-14
Levels of Measurement (Cont..)
Interval level:
Similar to the ordinal level, with the additional property that meaningful amounts of differences between data values can be determined. There is no natural zero point.
Example: temperature on the Fahrenheit scale.
1-15
15
Ratio level:
The interval level with an inherent zero starting point. Differences and ratios are meaningful for this level of measurement.
Monthly income of surgeons, or distance traveled by manufacturer’s representatives per month.
1-16
Level of data   Property                                       Example
Nominal         Data may only be classified                    Classification of students by district
Ordinal         Data are ranked                                Your rank for this course module
Interval        Meaningful difference between values           Temperature
Ratio           Meaningful 0 point and ratio between values    Number of study hours
1-17
17
Who Uses Statistics?
Statistical techniques are used extensively by marketing, accounting, quality control, consumers, hospital administrators, educators, politicians, physicians, etc...
1-18
Sources of Statistical Data
Researching problems usually requires published data. Statistics on these problems can be found in published articles, journals, and magazines.
Published data is not always available on a given subject. In such cases, information will have to be collected and analyzed.
One way of collecting data is via questionnaires.
What are the other data collection methods?
1-19
19
Chapter Two
Describing Data: Frequency Distributions and Graphic
Presentation
1-20
Presentation of Data
Raw data reveals very little.
Data presentation shows how a large data set can be organized and managed to provide a quick visual interpretation of the message the data convey.
Methods of Data Presentation
i. Data Array
ii. Tabulation of Data
iii. Stem-and-Leaf display
iv. Frequency Distribution
1-21
Data Array
Arrange data in a systematic way (ascending data array and descending data array).
Tabulation of Data
Arrange data in a table (rows and columns).
            1st Year   2nd Year
Physical    320        160
Bio         246        126
1-22
Frequency Distribution
A Frequency distribution is a grouping of data
into mutually exclusive categories showing the
number of observations in each class.
22
Construction of a Frequency Distribution
1-23
Frequency Distribution
Class midpoint: A point that divides a class into two equal parts. This is the average of the upper and lower class limits.
23
Class frequency:
The number of observations in each class.
Class interval:
The class interval is obtained by subtracting the lower
limit of a class from the lower limit of the next class.
1-24
Eg 1: Dr. Tillman is Dean of the School of Business at Socastee University. He wishes to prepare a report showing the number of hours per week students spend studying. He selects a random sample of 30 students and determines the number of hours each student studied last week.
15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7, 17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9, 10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6.
Organize the data into a frequency distribution.
1-25
Eg 1: continued
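There are 30 observations.
Two raised to the fifth power is 32, which exceeds 30.
Therefore, we should have at least 5 classes.
The range is 23.5 hrs, found by 33.8 hrs – 10.3 hrs.
We choose an interval of 5 hrs.
The lower limit of the first class is 7.5 hrs.
With this interval and starting point, five classes would only reach 32.5 hrs, so it turns out we will need 6 classes to include the largest value, 33.8 hrs.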
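The class-count reasoning above can be sketched in Python. This is a minimal illustration, assuming the "2 to the k" rule means choosing the smallest k with 2^k greater than the number of observations; the function name is hypothetical.

```python
def suggested_number_of_classes(n_observations: int) -> int:
    """Smallest k such that 2**k exceeds n_observations (assumed '2 to the k' rule)."""
    k = 1
    while 2 ** k <= n_observations:
        k += 1
    return k


n = 30
k = suggested_number_of_classes(n)

# Range of the study-hours data: highest value minus lowest value.
value_range = 33.8 - 10.3

# A starting point for the class interval; the slides round this up to 5 hours.
suggested_interval = value_range / k

print("suggested classes:", k)
print("range:", round(value_range, 1))
print("suggested interval:", round(suggested_interval, 2))
```

The rule only gives a minimum; the final number of classes also depends on the interval and lower limit chosen.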
1-26
Hours studying      Frequency, f
 7.5 up to 12.5      1
12.5 up to 17.5     12
17.5 up to 22.5     10
22.5 up to 27.5      5
27.5 up to 32.5      1
32.5 up to 37.5      1
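As a check on the table, the tally can be reproduced with a short Python sketch. It assumes each class includes its lower limit and excludes its upper limit ("up to"); the function name is illustrative only.

```python
hours = [
    15.0, 23.7, 19.7, 15.4, 18.3, 23.0, 14.2, 20.8, 13.5, 20.7,
    17.4, 18.6, 12.9, 20.3, 13.7, 21.4, 18.3, 29.8, 17.1, 18.9,
    10.3, 26.1, 15.7, 14.0, 17.8, 33.8, 23.2, 12.9, 27.1, 16.6,
]


def frequency_distribution(values, lower, width, n_classes):
    """Count how many values fall in each class [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


counts = frequency_distribution(hours, lower=7.5, width=5.0, n_classes=6)

for i, f in enumerate(counts):
    lo = 7.5 + i * 5.0
    print(f"{lo:4.1f} up to {lo + 5.0:4.1f}  {f:2d}")
```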
1-27
Suggestions on Constructing a Frequency
Distribution
The class intervals used in the frequency distribution should be equal.
Determine a suggested class interval by using the formula:
$i \geq \frac{\text{Highest value} - \text{Lowest value}}{\text{Number of classes}}$
Use the computed suggested class interval to construct the frequency distribution.
1-28
Suggestions on Constructing a Frequency
Distribution
Note: this is a suggested class interval; if the computed class interval is 97, it may be better to use 100.
Count the number of values in each class.
A relative frequency distribution shows the percent of observations in each class.
1-29
EXAMPLE: Mr. Jayatissa wishes to prepare a report showing the number of hours per week students spend studying. He selects a random sample of 30 students and determines the number of hours each student studied last week.
15.0, 23.7, 19.7, 15.4, 18.3,
23.0, 14.2, 20.8, 13.5, 20.7,
17.4, 18.6, 12.9, 20.3, 13.7,
21.4, 18.3, 29.8, 17.1, 18.9,
10.3, 26.1, 15.7, 14.0, 17.8,
33.8, 23.2, 12.9, 27.1, 16.6.
Organize the data into a frequency distribution.
1-30
Relative Frequency Distribution
Hours               f     Relative Frequency
 7.5 up to 12.5      1     1/30 = .0333
12.5 up to 17.5     12    12/30 = .4000
17.5 up to 22.5     10    10/30 = .3333
22.5 up to 27.5      5     5/30 = .1667
27.5 up to 32.5      1     1/30 = .0333
32.5 up to 37.5      1     1/30 = .0333
TOTAL               30    30/30 = 1
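The relative frequencies are each class frequency divided by the total number of observations. A minimal Python sketch using the class frequencies from the table:

```python
# Class frequencies for the hours-studying example, in class order.
counts = [1, 12, 10, 5, 1, 1]

total = sum(counts)

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in counts]

for f, r in zip(counts, relative):
    print(f"{f:2d}/{total} = {r:.4f}")

print("sum of relative frequencies:", round(sum(relative), 4))
```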
1-31
Stem-and-leaf Displays
Stem-and-leaf display:
A statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits the leaf.
31
Note: an advantage of the stem-and-leaf display over a frequency distribution is we do not lose the identity of each observation.
1-32
Eg 2 : Colin achieved the following scores on his
twelve accounting quizzes this semester:
86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85.
Construct a stem-and-leaf chart.
stem | leaf
   6 | 9
   7 | 8 9
   8 | 2 3 4 5 6 8
   9 | 1 2 6
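The display can be built mechanically. A short Python sketch, assuming two-digit scores so that the tens digit is the stem and the units digit is the leaf (the function name is illustrative):

```python
from collections import defaultdict

scores = [86, 79, 92, 84, 69, 88, 91, 83, 96, 78, 82, 85]


def stem_and_leaf(values):
    """Group values by tens digit (stem); leaves are the units digits in ascending order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return dict(stems)


display = stem_and_leaf(scores)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Every original score can be read back from the display, which is the advantage noted earlier over a frequency distribution.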
1-33
Graphic Presentation of a Frequency
Distribution
The three commonly used graphic forms are
histograms, frequency polygons, and a cumulative
frequency distribution.
33
A Histogram is a graph in which the classes are
marked on the horizontal axis and the class
frequencies on the vertical axis.
The class frequencies are represented by the heights
of the bars and the bars are drawn adjacent to each
other.
1-34
Graphic Presentation of a Frequency
Distribution
A frequency polygon consists of line
segments connecting the points formed by
the class midpoint and the class frequency.
34
A cumulative frequency distribution is used
to determine how many or what proportion of
the data values are below or above a certain
value.
1-35
Histogram for Hours Spent Studying
[Figure: histogram. Horizontal axis: hours spent studying (10 to 35). Vertical axis: frequency (0 to 14).]
1-36
Frequency Polygon for Hours Spent Studying
[Figure: frequency polygon. Horizontal axis: hours spent studying (10 to 35). Vertical axis: frequency (0 to 14).]
1-37
Cumulative Frequency Distribution For Hours Studying
[Figure: cumulative frequency curve. Horizontal axis: hours spent studying (10 to 35). Vertical axis: frequency (0 to 35).]
1-38
Bar Chart
A bar chart can be used to depict any of the
levels of measurement (nominal, ordinal,
interval, or ratio).
Eg 3: Construct a bar chart for the number of unemployed per 100,000 population for selected cities during 2001.
City                 Number of unemployed per 100,000 population
Atlanta, GA          7300
Boston, MA           5400
Chicago, IL          6700
Los Angeles, CA      8900
New York, NY         8200
Washington, D.C.     8900
1-39
Bar Chart for the Unemployment Data
[Figure: bar chart. Horizontal axis: cities (Atlanta, Boston, Chicago, Los Angeles, New York, Washington). Vertical axis: number unemployed per 100,000 (0 to 10,000). Bar values: 7300, 5400, 6700, 8900, 8200, 8900.]
1-40
Pie Chart
A pie chart is useful for displaying a relative frequency distribution. A circle is divided proportionally to the relative frequency and portions of the circle are allocated for the different groups.
Eg 4: A sample of 200 runners were asked to indicate their favorite type of running shoe. Draw a pie chart based on the following information.
Type of shoes    # of runners
Nike             92
Adidas           49
Reebok           37
Asics            13
Other             9
1-41
Pie Chart for Running Shoes
[Figure: pie chart with slices for Nike, Adidas, Reebok, Asics, and Other.]
1-42
42
Chapter Three
Describing Data: Measures of Central Tendency
1-43
Characteristics of the Mean
The arithmetic mean is the most widely used measure of location.
It is calculated by summing the values and dividing by the number of values.
The major characteristics of the mean are:
It requires the interval scale.
All values are used.
It is unique.
The sum of the deviations from the mean is 0.
1-44
Population Mean
For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:
$\mu = \frac{\sum X}{N}$
where µ is the population mean.
N is the total number of observations.
X is a particular value.
Σ indicates the operation of adding.
1-45
Eg 1: The Kiers family owns four cars. The following is the current mileage on each of the four cars:
56,000, 23,000, 42,000, 73,000
Find the mean mileage for the cars.
$\mu = \frac{\sum X}{N} = \frac{56{,}000 + \cdots + 73{,}000}{4} = 48{,}500$
A Parameter is a measurable characteristic of a population.
1-46
Sample Mean
• For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:
$\bar{X} = \frac{\sum X}{n}$
where n is the total number of values in the sample.
1-47
Eg 2: A sample of five executives received the following bonus last year ($000):
14.0, 15.0, 17.0, 16.0, 15.0
$\bar{X} = \frac{\sum X}{n} = \frac{14.0 + \cdots + 15.0}{5} = \frac{77}{5} = 15.4$
A statistic is a measurable characteristic of a sample.
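The sample mean calculation, and the property that deviations from the mean sum to zero, can be checked with a few lines of Python. This is only a sketch; the helper name is illustrative.

```python
bonuses = [14.0, 15.0, 17.0, 16.0, 15.0]  # in $000


def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


x_bar = mean(bonuses)

# Sum of deviations from the mean; zero up to floating-point rounding.
deviations_sum = sum(x - x_bar for x in bonuses)

print("sample mean:", round(x_bar, 2))
print("sum of deviations:", round(deviations_sum, 10))
```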
1-48
Properties of the Arithmetic Mean
• Every set of interval-level and ratio-level data
has a mean.
• All the values are included in computing the
mean.
• A set of data has a unique mean.
• The mean is affected by unusually large or small
data values.
• The arithmetic mean is the only measure of
central tendency where the sum of the deviations
of each value from the mean is zero.
1-49
Eg 3: Consider the set of values: 3, 8, and 4. The mean is 5. Illustrating the fifth property:
$\sum (X - \bar{X}) = (3 - 5) + (8 - 5) + (4 - 5) = 0$
Weighted Mean
The weighted mean of a set of numbers X1, X2, ..., Xn, with corresponding weights w1, w2, ..., wn, is computed from the following formula:
$\bar{X}_w = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n}$
1-50
Eg 6: During a one hour period on a hot Saturday afternoon cabana boy Chris served fifty drinks. He sold five drinks for $0.50, fifteen for $0.75, fifteen for $0.90, and fifteen for $1.15. Compute the weighted mean of the price of the drinks.
$\bar{X}_w = \frac{5(\$0.50) + 15(\$0.75) + 15(\$0.90) + 15(\$1.15)}{5 + 15 + 15 + 15} = \frac{\$44.50}{50} = \$0.89$
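The weighted mean can be sketched in Python as follows. The prices use the $1.15 figure that appears in the slide's own calculation, and the function name is illustrative.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


prices = [0.50, 0.75, 0.90, 1.15]   # price per drink, in dollars
drinks_sold = [5, 15, 15, 15]       # weights: number of drinks sold at each price

mean_price = weighted_mean(prices, drinks_sold)
print("weighted mean price:", round(mean_price, 2))
```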
1-51
The Median
• The Median is the midpoint of the values after
they have been ordered from the smallest to the
largest. There are as many values above the median as below it in the data array.
For an even set of values, the median will be the arithmetic average of the two middle numbers.
1-52
Eg 4 : The ages for a sample of five college students are:
21, 25, 19, 20, 22
Arranging the data in ascending order gives: 19, 20, 21, 22, 25. Thus the median is 21.
Eg 5 : The heights of four basketball players, in
inches, are:
76, 73, 80, 75
Arranging the data in ascending order gives:
73, 75, 76, 80. Thus the median is 75.5
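Both cases, an odd and an even number of values, can be handled by one small Python function. A minimal sketch (the function name is illustrative):

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


ages = [21, 25, 19, 20, 22]
heights = [76, 73, 80, 75]

print("median age:", median(ages))
print("median height:", median(heights))
```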
1-53
Properties of the Median
• There is a unique median for each data set.
• It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.
• It can be computed for ratio-level, interval-level, and ordinal-level data.
• It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.
1-54
The Mode
• The mode is the value of the observation
that appears most frequently.
Eg 5: The exam scores for ten students are: 81, 93,
84, 75, 68, 87, 81, 75, 81, 87. Because the score
of 81 occurs the most often, it is the mode.
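Counting how often each score occurs gives the mode directly. A short Python sketch that returns every value tied for the highest count (the function name is illustrative):

```python
from collections import Counter

exam_scores = [81, 93, 84, 75, 68, 87, 81, 75, 81, 87]


def modes(values):
    """Return all values that occur most frequently, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


print("mode(s):", modes(exam_scores))
```

Returning a list allows for data sets with more than one mode.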
1-55
Geometric Mean
• The geometric mean (GM) of a set of n numbers is defined as the nth root of the product of the n numbers. The formula is:
$GM = \sqrt[n]{(X_1)(X_2)(X_3)\cdots(X_n)}$
The geometric mean is used to average percents, indexes, and relatives.
1-56
• The interest rates on three bonds were 5, 21, and 4 percent.
• The geometric mean is $GM = \sqrt[3]{(5)(21)(4)} = 7.49$
• The arithmetic mean is (5+21+4)/3 = 10.0
• The GM gives a more conservative profit figure because it is not heavily weighted by the rate of 21 percent.
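The two averages for the bond rates can be compared with a short Python sketch. It assumes Python 3.8 or later for `math.prod`; the function name is illustrative.

```python
import math


def geometric_mean(values):
    """nth root of the product of n positive numbers."""
    return math.prod(values) ** (1 / len(values))


rates = [5, 21, 4]  # interest rates in percent

gm = geometric_mean(rates)
am = sum(rates) / len(rates)

print("geometric mean:", round(gm, 2))
print("arithmetic mean:", round(am, 2))
```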
1-57
Geometric Mean continued
• Another use of the geometric mean is to determine the percent increase in sales, production or other business or economic series from one time period to another.
$GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1$
1-58
Eg 8 : The total number of females enrolled in American colleges increased from 755,000 in 1992 to 835,000 in 2000.
$GM = \sqrt[8]{\frac{835{,}000}{755{,}000}} - 1 = .0127$
That is, the geometric mean rate of increase is 1.27%.
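The rate-of-increase formula translates directly into Python. A minimal sketch, with an illustrative function name, applied to the enrollment figures over the 8 years from 1992 to 2000:

```python
def mean_rate_of_increase(start_value, end_value, periods):
    """Geometric mean rate of change per period: (end / start) ** (1 / periods) - 1."""
    return (end_value / start_value) ** (1 / periods) - 1


rate = mean_rate_of_increase(755_000, 835_000, periods=2000 - 1992)

print(f"geometric mean rate of increase: {rate:.2%}")
```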
1-59
Eg 9 : There are many flights from Houston to Little Rock, AR each day. The data below shows the number of minutes a flight was late (or early) in arriving in Little Rock for a sample of 5 flights. To explain, a positive number means the flight was late, a value of 0 indicates it arrived on time, and a negative number indicates it was early. So the first flight was 4 minutes late and the last flight 10 minutes early.
4 12 -9 6 -10
a. Determine the mean amount flights were late (or early).
b. Determine the median amount flights were late (or early).
1-60
a. Mean: $\bar{X} = \frac{3}{5} = 0.60$ minutes late.
b. Arranging the data in ascending order gives -10, -9, 4, 6, 12, so the median is 4 minutes late.
1-61
Eg10: Suppose your cousin started a management job with Ford Motor Company in 1990 at $30,000 per year. In the year 2002 her salary was $65,000. What was the geometric mean rate of increase per year for the period?
$GM = \sqrt[12]{\frac{\$65{,}000}{\$30{,}000}} - 1 = 1.06655 - 1.0 = .06655$
Eg 11: For a sample of 50 stocks traded yesterday on the American Stock Exchange, 10 showed a decline of $1.00, 15 showed no change, and 25 increased by $2.00. Find the weighted mean.
$\bar{X}_w = \frac{10(-\$1.00) + 15(\$0) + 25(\$2.00)}{10 + 15 + 25} = \$0.80$
1-62
The Mean of Grouped Data
The mean of a sample of data organized in a frequency distribution is computed by the following formula:
$\bar{X} = \frac{\sum fX}{n}$
where X is the class midpoint, f is the class frequency, and n is the total number of observations.
Eg12: A sample of ten movie theaters in a large metropolitan area tallied the total number of movies showing last week. Compute the mean number of movies showing.
1-63
Eg 12 : continued
Movies showing   Frequency f   Class midpoint X   (f)(X)
1 up to 3        1              2                  2
3 up to 5        2              4                  8
5 up to 7        3              6                 18
7 up to 9        1              8                  8
9 up to 11       3             10                 30
Total            10                               66
$\bar{X} = \frac{\sum fX}{n} = \frac{66}{10} = 6.6$
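The grouped-mean calculation can be reproduced in Python. In this sketch each class is written as a (lower limit, upper limit, frequency) tuple, which is an illustrative representation rather than anything prescribed by the slides.

```python
# (lower limit, upper limit, frequency) for each class of movies showing
movie_classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]


def grouped_mean(classes):
    """Approximate the mean using class midpoints weighted by class frequencies."""
    n = sum(f for _, _, f in classes)
    total = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    return total / n


mean_movies = grouped_mean(movie_classes)
print("mean number of movies showing:", round(mean_movies, 1))
```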
1-64
The Median of Grouped Data
• The median of a sample of data organized in a frequency distribution is computed by:
$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i)$
where L is the lower limit of the median class, CF is the cumulative frequency preceding the median class, f is the frequency of the median class, and i is the median class interval.
1-65
Finding the Median Class
To determine the median class for grouped
data:
– Construct a cumulative frequency distribution.
– Divide the total number of data values by 2.
– Determine which class will contain this value.
For example, if n=50, 50/2 = 25, then
determine which class will contain the 25th
value.
1-66
Eg13 :
Movies showing   Frequency   Cumulative Frequency
1 up to 3        1            1
3 up to 5        2            3
5 up to 7        3            6
7 up to 9        1            7
9 up to 11       3           10
From the table, L=5, n=10, f=3, i=2, CF=3
$\text{Median} = L + \frac{\frac{n}{2} - CF}{f}(i) = 5 + \frac{\frac{10}{2} - 3}{3}(2) = 6.33$
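The steps for locating the median class and applying the formula can be sketched in Python. Classes are written as (lower limit, upper limit, frequency) tuples, an illustrative representation; the function name is also hypothetical.

```python
def grouped_median(classes):
    """Median = L + ((n/2 - CF) / f) * i, where the median class holds the (n/2)th value."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the distribution has no observations")
    half = n / 2
    cumulative = 0  # CF: cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")


movie_classes = [(1, 3, 1), (3, 5, 2), (5, 7, 3), (7, 9, 1), (9, 11, 3)]

median_movies = grouped_median(movie_classes)
print("median number of movies showing:", round(median_movies, 2))
```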
1-67
The Mode of Grouped Data
• The mode for grouped data is approximated
by the midpoint of the class with the largest
class frequency.
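The modes in Eg 13 are 6 and 10, the midpoints of the classes "5 up to 7" and "9 up to 11", which each have the largest frequency (3). When two values occur a large number of times, the distribution is called bimodal, as in Eg 13.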
1-68
Eg14: The following frequency distribution reports the number of students enrolled in each of the 50 sections of various courses taught in the College of Business last summer.
Students      Frequency
0 up to 10     3
10 up to 20    8
20 up to 30   16
30 up to 40   10
40 up to 50    9
50 up to 60    4
Total         50
a. Determine the mean number of students per section.
$\bar{X} = \frac{\sum fX}{n} = \frac{1510}{50} = 30.2$
b. Determine the median number of students per section.
$\text{Median} = 20 + \frac{25 - 11}{16}(10) = 28.75$
1-69
Symmetric Distribution
zero skewness mode = median = mean
1-70
Right Skewed Distribution
positively skewed: Mean and Median are to
the right of the Mode.
Mode < Median < Mean
1-71
Left Skewed Distribution
Negatively Skewed: Mean and Median are to the
left of the Mode.
Mean < Median < Mode
1-72
End of the Chapter
• Thank you
• Questions
72