chap 2-1 ef 507 quantitative methods for economics and finance fall 2008 chapter 2 describing data:...

38
Chap 2-1 EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE FALL 2008 Chapter 2 Describing Data: Graphical

Post on 20-Dec-2015

219 views

Category:

Documents


1 download

TRANSCRIPT

Chap 2-1

EF 507

QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE

FALL 2008

Chapter 2

Describing Data: Graphical

Chap 2-2

Chapter Goals

After completing this chapter, you should be able to: Identify types of data and levels of measurement Create and interpret graphs to describe categorical variables:

bar chart, pie chart Create a line chart to describe time-series data Create and interpret graphs to describe numerical variables:

Histogram Construct and interpret graphs to describe relationships between

variables Describe appropriate and inappropriate ways to display data

graphically

Chap 2-3

Types of Data

Data

Categorical Numerical

Discrete Continuous

Examples:

Marital Status Are you registered to

vote? Eye Color (Defined categories or

groups)

Examples:

Number of Children Defects per hour (Counted items)

Examples:

Weight Voltage (Measured characteristics)

Chap 2-4

Measurement Levels

Interval Data

Ordinal Data

Nominal Data

Quantitative Data

Qualitative Data

Categories (no ordering or direction)

Ordered Categories (rankings, order, or scaling)

Differences between measurements but no true zero

Ratio DataDifferences between measurements, true zero exists

Chap 2-5

Graphical Presentation of Data

Data in raw form are usually not easy to use for decision making

Some type of organization is needed Table Graph

The type of graph to use depends on the variable being summarized

Chap 2-6

Graphical Presentation of Data

Techniques reviewed in this chapter:

CategoricalVariables

NumericalVariables

• Bar chart• Pie chart• Pareto diagram

• Line chart• Histogram• Scatter plot

(continued)

Chap 2-7

Tables and Graphs for Categorical Variables

Categorical Data

Graphing Data

Pie Chart

Bar Chart

Chap 2-8

The Frequency Distribution Table

Example: Hospital Patients by Unit

Hospital Unit Number of Patients

Cardiac Care 1,052

Emergency 2,245

Intensive Care 340

Maternity 552

Surgery 4,630(Variables are categorical)

Summarize data by category

Chap 2-9

Bar and Pie Charts

Bar charts and Pie charts are often used for qualitative (category) data

Height of bar or size of pie slice shows the frequency or percentage for each category

Chap 2-10

Bar Chart Example

Hospital Patients by Unit

0

1000

2000

3000

4000

5000

Car

dia

cC

are

Em

erg

ency

Inte

nsi

veC

are

Mat

ern

ity

Su

rger

y

Nu

mb

er

of

pa

tie

nts

pe

r y

ea

r

Hospital Number Unit of Patients

Cardiac Care 1,052

Emergency 2,245

Intensive Care 340

Maternity 552

Surgery 4,630

Chap 2-11

Hospital Patients by Unit

Emergency25%

Maternity6%

Surgery53%

Cardiac Care12%

Intensive Care4%

Pie Chart Example

(Percentages are rounded to the nearest percent)

Hospital Number % of Unit of Patients Total

Cardiac Care 1,052 11.93Emergency 2,245 25.46Intensive Care 340 3.86Maternity 552 6.26Surgery 4,630 52.50

Chap 2-12

Pareto Diagram

Used to portray categorical data

A bar chart, where categories are shown in

descending order of frequency

A cumulative polygon is often shown in the

same graph

Used to separate the “vital few” from the “trivial

many”

Chap 2-13

Example: 400 defective items are examined for cause of defect:

Source of Manufacturing Error Number of defects

Bad Weld 34

Poor Alignment 223

Missing Part 25

Paint Flaw 78

Electrical Short 19

Cracked case 21

Total 400

Pareto Diagram Example

Chap 2-14

Step 1: Sort by defect cause, in descending orderStep 2: Determine % in each category

Source of Manufacturing Error Number of defects % of Total Defects

Poor Alignment 223 55.75

Paint Flaw 78 19.50

Bad Weld 34 8.50

Missing Part 25 6.25

Cracked case 21 5.25

Electrical Short 19 4.75

Total 400 100%

Pareto Diagram Example(continued)

Chap 2-15

Pareto Diagram Examplecu

mu

lative % (lin

e grap

h)%

of

def

ects

in

eac

h c

ateg

ory

(b

ar g

rap

h)

Pareto Diagram: Cause of Manufacturing Defect

0%

10%

20%

30%

40%

50%

60%

Poor Alignment Paint Flaw Bad Weld Missing Part Cracked case Electrical Short

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Step 3: Show results graphically

(continued)

Chap 2-16

Graphs for Time-Series Data

A line chart (time-series plot) is used to show the values of a variable over time

Time is measured on the horizontal axis

The variable of interest is measured on the vertical axis

Chap 2-17

Line Chart Example

Magazine Subscriptions by Year

0

50

100

150

200

250

300

350

19

90

19

91

19

92

19

93

19

94

19

95

19

96

19

97

19

98

19

99

20

00

20

01

20

02

20

03

20

04

20

05

20

06

Th

ou

sa

nd

s o

f s

ub

sc

rib

ers

Chap 2-18

Numerical Data

Histogram

Frequency Distributions and

Cumulative Distributions

Graphs to Describe Numerical Variables

Chap 2-19

Histogram

A graph of the data in a frequency distribution is called a histogram

The interval endpoints are shown on the horizontal axis

the vertical axis is either frequency, relative frequency, or percentage

Bars of the appropriate heights are used to represent the number of observations within each class

Chap 2-20

Histogram : Daily High Tem perature

0

3

6

5

4

2

00

1

2

3

4

5

6

7

0 10 20 30 40 50 60

Fre

qu

ency

Temperature in Degrees

Histogram Example

(No gaps between

bars)

Interval

10 but less than 20 3

20 but less than 30 6

30 but less than 40 5

40 but less than 50 4

50 but less than 60 2

Frequency

0 10 20 30 40 50 60 70

Chap 2-21

Histograms in Excel

Select

Tools/Data Analysis

1

Chap 2-22

Choose Histogram

2

3

Input data range and bin range (bin range is a cell range containing the upper interval endpoints for each class grouping)

Select Chart Output and click “OK”

Histograms in Excel(continued)

(

Chap 2-23

Questions for Grouping Data into Intervals

1. How wide should each interval be? (How many classes should be used?)

2. How should the endpoints of the intervals be determined?

Often answered by trial and error, subject to user judgment

The goal is to create a distribution that is neither too "jagged" nor too "blocky”

Goal is to appropriately show the pattern of variation in the data

Chap 2-24

How Many Class Intervals?

Many (Narrow class intervals) may yield a very jagged distribution

with gaps from empty classes Can give a poor indication of how

frequency varies across classes

Few (Wide class intervals) may compress variation too much and

yield a blocky distribution can obscure important patterns of

variation. 0

2

4

6

8

10

12

0 30 60 More

TemperatureF

req

ue

nc

y

0

0.5

1

1.5

2

2.5

3

3.5

4 8

12 16 20 24 28 32 36 40 44 48 52 56 60

Mor

e

Temperature

Fre

qu

ency

(X axis labels are upper class endpoints)

Chap 2-25

Distribution Shape

The shape of the distribution is said to be symmetric if the observations are balanced, or evenly distributed, about the center.

Symmetric Distribution

0123456789

10

1 2 3 4 5 6 7 8 9

Fre

qu

ency

Chap 2-26

Distribution Shape

The shape of the distribution is said to be skewed if the observations are not symmetrically distributed around the center.

(continued)

Positively Skewed Distribution

0

2

4

6

8

10

12

1 2 3 4 5 6 7 8 9

Fre

qu

ency

Negatively Skewed Distribution

0

2

4

6

8

10

12

1 2 3 4 5 6 7 8 9

Fre

qu

ency

A positively skewed distribution (skewed to the right) has a tail that extends to the right in the direction of positive values.

A negatively skewed distribution (skewed to the left) has a tail that extends to the left in the direction of negative values.

Chap 2-27

Relationships Between Variables

Graphs illustrated so far have involved only a single variable

When two variables exist other techniques are used:

Categorical(Qualitative)

Variables

Numerical(Quantitative)

Variables

Cross tables Scatter plots

Chap 2-28

Scatter Diagrams are used for paired observations taken from two numerical variables

The Scatter Diagram: one variable is measured on the vertical

axis and the other variable is measured on the horizontal axis

Scatter Diagrams

Chap 2-29

Scatter Diagram Example

Cost per Day vs. Production Volume

0

50

100

150

200

250

0 10 20 30 40 50 60 70

Volume per Day

Cos

t per

Day

Volume per day

Cost per day

23 125

26 140

29 146

33 160

38 167

42 170

50 188

55 195

60 200

Chap 2-30

Scatter Diagrams in Excel

Select the chart wizard

1

2Select XY(Scatter) option,

then click “Next”

When prompted, enter the data range, desired legend, and desired destination to complete the scatter diagram

3

Chap 2-31

Side by side bar charts

Graphing Multivariate Categorical Data

Comparing Investors

0 10 20 30 40 50 60

S toc k s

B onds

CD

S avings

Inves tor A Inves tor B Inves tor C

Chap 2-32

Side-by-Side Chart Example Sales by quarter for three sales territories:

0

10

20

30

40

50

60

1st Qtr 2nd Qtr 3rd Qtr 4th Qtr

EastWestNorth

1st Qtr 2nd Qtr 3rd Qtr 4th QtrEast 20.4 27.4 59 20.4West 30.6 38.6 34.6 31.6North 45.9 46.9 45 43.9

Chap 2-33

Data Presentation Errors

Goals for effective data presentation:

Present data to display essential information

Communicate complex ideas clearly and

accurately

Avoid distortion that might convey the wrong

message

Chap 2-34

Unequal histogram interval widths Compressing or distorting the

vertical axis Providing no zero point on the

vertical axis Failing to provide a relative basis

in comparing data between groups

Data Presentation Errors(continued)

Chap 2-35

Chapter Summary

Reviewed types of data and measurement levels Data in raw form are usually not easy to use for decision

making -- Some type of organization is needed:

Table Graph

Techniques reviewed in this chapter:

Frequency distribution Bar chart Pie chart Pareto diagram

Line chart Frequency distribution Histogram Scatter plot Side-by-side bar charts

Chap 2-36

Which of the following variables is an example of a categorical variable?

A. The amount of money you spend on eating out each month.

B. The time it takes you to write a test.

C. The geographic region of the country in which you live.

D. The weight of a cereal box.

Chap 2-37

The data in the time series plot below represents monthly sales for two years of beanbag animals at a local retail store (Month 1 represents January and Month 12 represents December). Do you see any obvious patterns in the data? Explain.

This is a representation of seasonal data. There seems to be a small increase in months 3, 4, and 5 and a large increase at the end of the year. The sales of this item seem to peak in December and have a significant drop off in January.

Chap 2-38

At a large company, the majority of the employees earn from $20,000 to $30,000 per year. Middle management employees earn between $30,000 and $50,000 per year while top management earn between $50,000 and $100,000 per year. A histogram of all salaries would have which of the following shapes?

a. Symmetrical

b. Uniform

c. Skewed to right

d. Skewed to left