chap 2-1 ef 507 quantitative methods for economics and finance fall 2008 chapter 2 describing data:...
Post on 20-Dec-2015
219 views
TRANSCRIPT
Chap 2-1
EF 507
QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE
FALL 2008
Chapter 2
Describing Data: Graphical
Chap 2-2
Chapter Goals
After completing this chapter, you should be able to: Identify types of data and levels of measurement Create and interpret graphs to describe categorical variables:
bar chart, pie chart Create a line chart to describe time-series data Create and interpret graphs to describe numerical variables:
Histogram Construct and interpret graphs to describe relationships between
variables Describe appropriate and inappropriate ways to display data
graphically
Chap 2-3
Types of Data
Data
Categorical Numerical
Discrete Continuous
Examples:
Marital Status Are you registered to
vote? Eye Color (Defined categories or
groups)
Examples:
Number of Children Defects per hour (Counted items)
Examples:
Weight Voltage (Measured characteristics)
Chap 2-4
Measurement Levels
Interval Data
Ordinal Data
Nominal Data
Quantitative Data
Qualitative Data
Categories (no ordering or direction)
Ordered Categories (rankings, order, or scaling)
Differences between measurements but no true zero
Ratio DataDifferences between measurements, true zero exists
Chap 2-5
Graphical Presentation of Data
Data in raw form are usually not easy to use for decision making
Some type of organization is needed Table Graph
The type of graph to use depends on the variable being summarized
Chap 2-6
Graphical Presentation of Data
Techniques reviewed in this chapter:
CategoricalVariables
NumericalVariables
• Bar chart• Pie chart• Pareto diagram
• Line chart• Histogram• Scatter plot
(continued)
Chap 2-7
Tables and Graphs for Categorical Variables
Categorical Data
Graphing Data
Pie Chart
Bar Chart
Chap 2-8
The Frequency Distribution Table
Example: Hospital Patients by Unit
Hospital Unit Number of Patients
Cardiac Care 1,052
Emergency 2,245
Intensive Care 340
Maternity 552
Surgery 4,630(Variables are categorical)
Summarize data by category
Chap 2-9
Bar and Pie Charts
Bar charts and Pie charts are often used for qualitative (category) data
Height of bar or size of pie slice shows the frequency or percentage for each category
Chap 2-10
Bar Chart Example
Hospital Patients by Unit
0
1000
2000
3000
4000
5000
Car
dia
cC
are
Em
erg
ency
Inte
nsi
veC
are
Mat
ern
ity
Su
rger
y
Nu
mb
er
of
pa
tie
nts
pe
r y
ea
r
Hospital Number Unit of Patients
Cardiac Care 1,052
Emergency 2,245
Intensive Care 340
Maternity 552
Surgery 4,630
Chap 2-11
Hospital Patients by Unit
Emergency25%
Maternity6%
Surgery53%
Cardiac Care12%
Intensive Care4%
Pie Chart Example
(Percentages are rounded to the nearest percent)
Hospital Number % of Unit of Patients Total
Cardiac Care 1,052 11.93Emergency 2,245 25.46Intensive Care 340 3.86Maternity 552 6.26Surgery 4,630 52.50
Chap 2-12
Pareto Diagram
Used to portray categorical data
A bar chart, where categories are shown in
descending order of frequency
A cumulative polygon is often shown in the
same graph
Used to separate the “vital few” from the “trivial
many”
Chap 2-13
Example: 400 defective items are examined for cause of defect:
Source of Manufacturing Error Number of defects
Bad Weld 34
Poor Alignment 223
Missing Part 25
Paint Flaw 78
Electrical Short 19
Cracked case 21
Total 400
Pareto Diagram Example
Chap 2-14
Step 1: Sort by defect cause, in descending orderStep 2: Determine % in each category
Source of Manufacturing Error Number of defects % of Total Defects
Poor Alignment 223 55.75
Paint Flaw 78 19.50
Bad Weld 34 8.50
Missing Part 25 6.25
Cracked case 21 5.25
Electrical Short 19 4.75
Total 400 100%
Pareto Diagram Example(continued)
Chap 2-15
Pareto Diagram Examplecu
mu
lative % (lin
e grap
h)%
of
def
ects
in
eac
h c
ateg
ory
(b
ar g
rap
h)
Pareto Diagram: Cause of Manufacturing Defect
0%
10%
20%
30%
40%
50%
60%
Poor Alignment Paint Flaw Bad Weld Missing Part Cracked case Electrical Short
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Step 3: Show results graphically
(continued)
Chap 2-16
Graphs for Time-Series Data
A line chart (time-series plot) is used to show the values of a variable over time
Time is measured on the horizontal axis
The variable of interest is measured on the vertical axis
Chap 2-17
Line Chart Example
Magazine Subscriptions by Year
0
50
100
150
200
250
300
350
19
90
19
91
19
92
19
93
19
94
19
95
19
96
19
97
19
98
19
99
20
00
20
01
20
02
20
03
20
04
20
05
20
06
Th
ou
sa
nd
s o
f s
ub
sc
rib
ers
Chap 2-18
Numerical Data
Histogram
Frequency Distributions and
Cumulative Distributions
Graphs to Describe Numerical Variables
Chap 2-19
Histogram
A graph of the data in a frequency distribution is called a histogram
The interval endpoints are shown on the horizontal axis
the vertical axis is either frequency, relative frequency, or percentage
Bars of the appropriate heights are used to represent the number of observations within each class
Chap 2-20
Histogram : Daily High Tem perature
0
3
6
5
4
2
00
1
2
3
4
5
6
7
0 10 20 30 40 50 60
Fre
qu
ency
Temperature in Degrees
Histogram Example
(No gaps between
bars)
Interval
10 but less than 20 3
20 but less than 30 6
30 but less than 40 5
40 but less than 50 4
50 but less than 60 2
Frequency
0 10 20 30 40 50 60 70
Chap 2-22
Choose Histogram
2
3
Input data range and bin range (bin range is a cell range containing the upper interval endpoints for each class grouping)
Select Chart Output and click “OK”
Histograms in Excel(continued)
(
Chap 2-23
Questions for Grouping Data into Intervals
1. How wide should each interval be? (How many classes should be used?)
2. How should the endpoints of the intervals be determined?
Often answered by trial and error, subject to user judgment
The goal is to create a distribution that is neither too "jagged" nor too "blocky”
Goal is to appropriately show the pattern of variation in the data
Chap 2-24
How Many Class Intervals?
Many (Narrow class intervals) may yield a very jagged distribution
with gaps from empty classes Can give a poor indication of how
frequency varies across classes
Few (Wide class intervals) may compress variation too much and
yield a blocky distribution can obscure important patterns of
variation. 0
2
4
6
8
10
12
0 30 60 More
TemperatureF
req
ue
nc
y
0
0.5
1
1.5
2
2.5
3
3.5
4 8
12 16 20 24 28 32 36 40 44 48 52 56 60
Mor
e
Temperature
Fre
qu
ency
(X axis labels are upper class endpoints)
Chap 2-25
Distribution Shape
The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center.
Symmetric Distribution
0123456789
10
1 2 3 4 5 6 7 8 9
Fre
qu
ency
Chap 2-26
Distribution Shape
The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.
(continued)
Positively Skewed Distribution
0
2
4
6
8
10
12
1 2 3 4 5 6 7 8 9
Fre
qu
ency
Negatively Skewed Distribution
0
2
4
6
8
10
12
1 2 3 4 5 6 7 8 9
Fre
qu
ency
A positively skewed distribution (skewed to the right) has a tail that extends to the right in the direction of positive values.
A negatively skewed distribution (skewed to the left) has a tail that extends to the left in the direction of negative values.
Chap 2-27
Relationships Between Variables
Graphs illustrated so far have involved only a single variable
When two variables exist other techniques are used:
Categorical(Qualitative)
Variables
Numerical(Quantitative)
Variables
Cross tables Scatter plots
Chap 2-28
Scatter Diagrams are used for paired observations taken from two numerical variables
The Scatter Diagram: one variable is measured on the vertical
axis and the other variable is measured on the horizontal axis
Scatter Diagrams
Chap 2-29
Scatter Diagram Example
Cost per Day vs. Production Volume
0
50
100
150
200
250
0 10 20 30 40 50 60 70
Volume per Day
Cos
t per
Day
Volume per day
Cost per day
23 125
26 140
29 146
33 160
38 167
42 170
50 188
55 195
60 200
Chap 2-30
Scatter Diagrams in Excel
Select the chart wizard
1
2Select XY(Scatter) option,
then click “Next”
When prompted, enter the data range, desired legend, and desired destination to complete the scatter diagram
3
Chap 2-31
Side by side bar charts
Graphing Multivariate Categorical Data
Comparing Investors
0 10 20 30 40 50 60
S toc k s
B onds
CD
S avings
Inves tor A Inves tor B Inves tor C
Chap 2-32
Side-by-Side Chart Example Sales by quarter for three sales territories:
0
10
20
30
40
50
60
1st Qtr 2nd Qtr 3rd Qtr 4th Qtr
EastWestNorth
1st Qtr 2nd Qtr 3rd Qtr 4th QtrEast 20.4 27.4 59 20.4West 30.6 38.6 34.6 31.6North 45.9 46.9 45 43.9
Chap 2-33
Data Presentation Errors
Goals for effective data presentation:
Present data to display essential information
Communicate complex ideas clearly and
accurately
Avoid distortion that might convey the wrong
message
Chap 2-34
Unequal histogram interval widths Compressing or distorting the
vertical axis Providing no zero point on the
vertical axis Failing to provide a relative basis
in comparing data between groups
Data Presentation Errors(continued)
Chap 2-35
Chapter Summary
Reviewed types of data and measurement levels Data in raw form are usually not easy to use for decision
making -- Some type of organization is needed:
Table Graph
Techniques reviewed in this chapter:
Frequency distribution Bar chart Pie chart Pareto diagram
Line chart Frequency distribution Histogram Scatter plot Side-by-side bar charts
Chap 2-36
Which of the following variables is an example of a categorical variable?
A. The amount of money you spend on eating out each month.
B. The time it takes you to write a test.
C. The geographic region of the country in which you live.
D. The weight of a cereal box.
Chap 2-37
The data in the time series plot below represents monthly sales for two years of beanbag animals at a local retail store (Month 1 represents January and Month 12 represents December). Do you see any obvious patterns in the data? Explain.
This is a representation of seasonal data. There seems to be a small increase in months 3, 4, and 5 and a large increase at the end of the year. The sales of this item seem to peak in December and have a significant drop off in January.
Chap 2-38
At a large company, the majority of the employees earn from $20,000 to $30,000 per year. Middle management employees earn between $30,000 and $50,000 per year while top management earn between $50,000 and $100,000 per year. A histogram of all salaries would have which of the following shapes?
a. Symmetrical
b. Uniform
c. Skewed to right
d. Skewed to left