Statistics for Decision Making STA 253

Post on 24-Feb-2016


DESCRIPTION

Dr. Ginner W. Hudson, Covenant College. Statistics for Decision Making STA 253. 1.1 Examining Distributions - Intro. A statistical analysis starts with a set of … data. We construct a set of data by first deciding what cases or individuals we want to study.

TRANSCRIPT

1

Statistics for Decision Making STA 253

Dr. Ginner W. Hudson, Covenant College

2

1.1 Examining Distributions - Intro

A statistical analysis starts with a set of … data.

We construct a set of data by first deciding what cases or individuals we want to study.

For each case/individual we record information about characteristics that we call variables.

3

Constructing Our Data Set

Looking at data …

Individuals, cases, records: the WHO

Observation takes place

Variable – a characteristic of a case: the WHAT

4

Important terms

Individuals (cases, records): Objects described by the data. Ex: customers, cities, patients, cars

Variable: A characteristic of a case. Ex: profit, duration of a service call, number of customers, gender. Different cases can have different values for the variables. Some variables may be a label to distinguish the different cases.

Distribution of a variable: the values the variable takes and how often it takes them.
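
As a small illustration of this definition, the distribution of a categorical variable can be tabulated by counting how often each value occurs. The sketch below is in Python; the data (card brands for ten customers) and the function name `distribution` are hypothetical and not part of the course material.

```python
from collections import Counter

def distribution(values):
    """Return {value: (count, proportion)} for a list of observed values."""
    counts = Counter(values)
    n = len(values)
    return {value: (count, count / n) for value, count in counts.items()}

# Hypothetical data: one categorical variable recorded for ten cases.
card_brand = ["Visa", "Visa", "MasterCard", "Amex", "Visa",
              "MasterCard", "Visa", "Amex", "Visa", "MasterCard"]

dist = distribution(card_brand)
for value, (count, proportion) in dist.items():
    # One row per value: the value, how often it occurs, and its share.
    print(f"{value:<12}{count:>3}{proportion:>8.0%}")
```

The values and their counts (or percents) together are the distribution; the same table is the starting point for a bar graph or pie chart.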

5

To better understand a data set, ask:

Who? What cases (individuals) do the data describe? How many cases (individuals)?

Think of an assembly line with the WHO passing by on the conveyor belt and the variables of interest being observed.

6

To better understand a data set, ask:

Who? What cases do the data describe? How many cases?

What? How many variables? What is the exact definition of each variable? What is the unit of measurement for each variable?

Why? What is the purpose of the data? What questions are being asked? Are the variables suitable?

7

Types of variables

Quantitative Variable: Takes numerical values for which we can do arithmetic.

Ex: credit card balance, number of employees, time until customer is served, age

Discrete or continuous?

Categorical Variable: Places a case into one of several groups or categories.

Ex: gender, brand of credit card, own a home (yes/no)

8

Example: An iTunes playlist

9

Example: Grade book data for statistics course

10

Example: The FAA

The Federal Aviation Administration (FAA) monitors airlines for safety and customer service. For each flight the carrier must report the type of aircraft, number of passengers, whether or not the flights departed and arrived on schedule, and any mechanical problems.

Identify the WHO.
• The FAA
• The airline carriers
• The passengers
• The flights
• None of the above

11

12

Example: The common cold

Scientists at a major pharmaceutical firm conducted an experiment to study the effectiveness of an herbal compound to treat the common cold. They exposed volunteers to a cold virus, then gave them either the herbal compound or a useless sugar solution. Several days later they assessed each patient’s condition using a cold severity scale ranging from 0-5.

Identify the WHO.
• Scientists
• Volunteers
• The pharmaceutical firm
• The herbal compound
• None of the above

13

14

Displaying distributions with graphs

Ways to chart categorical data
• Bar/column graphs (called Pareto charts when ordered)
• Pie charts

Ways to chart quantitative data
• Histograms
• Stemplots
• Time plots

15

Law firm example

A law firm studies the gender of their clients. They find 55% are males and 45% are females.

Cases:
Variable:
Distribution:
Values: Male, Female
How often: 55% and 45%, respectively

Are the data (the variable) categorical or quantitative?

16

17

Credit card example

A credit card company studies the spending behavior of their 21- to 25-year-old customers with a $1000 credit limit. They randomly select 100 of them and record the following variables for each person. For each item identify the type of variable.

• Average balance on their card over the last year
• Whether customer has ever made late payments
• Which day of the week their card is used the most
• Customer’s age (in years)

18

Credit card example

For each item, give its possible values.

Average balance on their card over the last year
Quantitative: $0.00 through $1000.00

Whether customer has ever made late payments
Categorical: Yes, No

Which day of the week their card is used the most
Categorical: Sunday, Monday, Tuesday, …, Saturday

Customer’s age (in years)
Quantitative: 21, 22, 23, 24, 25 years

19

Displaying categorical data

Purpose: Summarize the data so the reader can grasp the distribution quickly.

Process: List the categories. Give either the count or the percent of cases that fall into each category.

Methods: Tables, pie charts, bar/column graphs, Pareto charts

20

Ways to chart categorical data
Because the variable is categorical, the data in the graph can be ordered any way we want (alphabetical, by increasing value, by year, by personal preference, etc.).

Bar graphs
Each category is represented by a bar.

Pie charts
The slices must represent the parts of one whole.

23

Automobile accidents per day of the week

Bar graph sorted by rank (Pareto chart): easy to analyze

Bar graph sorted chronologically: much less useful
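
The difference between the two orderings can be sketched in a few lines of Python. The accident counts below are made up for illustration (the slide's actual figures are not in the transcript), and `pareto_order` is a hypothetical helper name.

```python
def pareto_order(counts):
    """Sort (category, count) pairs from the largest count to the smallest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

# Hypothetical accident counts, listed chronologically.
accidents = {"Mon": 22, "Tue": 18, "Wed": 19, "Thu": 21,
             "Fri": 35, "Sat": 30, "Sun": 15}

ranked = pareto_order(accidents)
for day, count in ranked:
    # A text bar: one '#' per accident.
    print(f"{day} {'#' * count} {count}")
```

With the bars ranked, the days with the most accidents are at the top and can be read off without comparing bar heights across the whole chart.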

24

Ways to chart quantitative data

Histograms and stemplots
These are summary graphs for a single variable. They are very useful to understand the pattern of variability in the data.

Line graphs: time plots
Use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.

Histograms

The range of values that a variable can take is divided into equal-size intervals.

The histogram shows the number of individual data points that fall in each interval.

Example: Histogram of the December 2004 unemployment rates in the 50 states and Puerto Rico.

26

How to create a histogram

It is an iterative process – try and try again.

What bin size should you use?
• Not too many bins with either 0 or 1 counts
• Not so summarized that you lose all the information
• Not so detailed that it is no longer a summary

Rule of thumb: start with 5 to 10 bins.

Look at the distribution and refine your bins.

(There isn’t a unique or “perfect” solution.)
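
One way to see the "try and try again" process is to bin the same data several ways and check how many bins end up with 0 or 1 values. The Python sketch below does this; the call durations and the helper name `bin_counts` are hypothetical, and the code assumes the data values are not all equal.

```python
def bin_counts(data, num_bins):
    """Count how many values fall into each of num_bins equal-width intervals."""
    low, high = min(data), max(data)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for x in data:
        # The maximum value is placed in the last bin rather than a new one.
        index = min(int((x - low) / width), num_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical data: service call durations in minutes.
durations = [3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 9, 10, 11, 12, 15]

for num_bins in (3, 6, 12):
    counts = bin_counts(durations, num_bins)
    sparse = sum(1 for c in counts if c <= 1)
    print(f"{num_bins:>2} bins: {counts}  (bins with 0 or 1 values: {sparse})")
```

For these made-up values, 3 bins lump most of the data into one interval, 12 bins leave many intervals nearly empty, and 6 bins sit in between. Which choice is best remains a judgment call.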

Interpreting histograms

When describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a histogram by its shape, center, and spread.

Histogram with a line connecting each column: too detailed

Histogram with a smoothed curve: highlights the overall pattern of the distribution

28

Common distribution patterns (shapes)

Symmetric: Left and right sides are mirror images of each other (or close)

29

Common distribution patterns (shapes)

Skewed left: Left side extends farther out than the right side

30

Common distribution patterns (shapes)

Skewed right: Right side extends farther out than the left side
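
A rough numerical counterpart to "one side extends farther out" is to compare how far the smallest and largest values lie from the median. The Python sketch below is only a crude check (a single extreme value decides the answer, so it is no substitute for looking at the histogram). The data and the helper name `longer_tail` are hypothetical.

```python
from statistics import median

def longer_tail(data):
    """Report which side of the median extends farther: 'left', 'right', or 'neither'."""
    m = median(data)
    left_reach = m - min(data)
    right_reach = max(data) - m
    if right_reach > left_reach:
        return "right"
    if left_reach > right_reach:
        return "left"
    return "neither"

# Hypothetical call lengths in minutes: most are short, a few are very long.
call_lengths = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9, 14]
print(longer_tail(call_lengths))
```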

31

Common distribution patterns (shapes)

Many shapes are bimodal or complex.

Two peaks

First part symmetric; flat in the middle; increasing at the end

32

Outliers

An important kind of deviation is an outlier.

Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

33


Outliers

The overall pattern is fairly symmetrical except for two states that clearly do not belong to the main trend. Alaska and Florida have an unusual representation of the elderly in their populations.

A large gap in the distribution is typically a sign of an outlier.
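
The idea of looking for a large gap can be sketched by sorting the values and finding the widest space between neighbors. The Python code below uses hypothetical percentages (not the state data from the slide), and `largest_gap` is an assumed helper name. A wide gap is a prompt to look closer, not proof of an outlier.

```python
def largest_gap(data):
    """Return (gap, lower, upper) for the widest gap between neighboring sorted values."""
    ordered = sorted(data)
    gaps = [(upper - lower, lower, upper)
            for lower, upper in zip(ordered, ordered[1:])]
    return max(gaps)

# Hypothetical percentages of elderly residents in ten regions.
percent_elderly = [12.1, 12.8, 13.0, 13.4, 11.9, 12.5, 18.3, 13.9, 5.6, 12.2]

gap, lower, upper = largest_gap(percent_elderly)
print(f"Largest gap: {gap:.1f} (between {lower} and {upper})")
```

Here the widest gap separates the lowest value from the rest, so that value is the first one to try to explain.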

34

IMPORTANT NOTE: Your data are the way they are. Do not try to force them into a particular shape.

Example: US Female Population 1997

35

It is a common misconception that if you have a large enough data set, the data will eventually turn out nice and symmetrical.

Example: Dry Days per Month 1995

Histogram of dry days in 1995

36

Example: Customer Service Center Call Lengths

37

Example: Customer Service Center Call Lengths

Why were there so many calls lasting 10 seconds or less?

38

Example: Customer Service Center Call Lengths

The inappropriate actions by customer service reps were hidden in this histogram where the software chose the classes (bin intervals).

42

Example: Constructing a Histogram

Class Exercise: GDP by Country

2005 Gross Domestic Product (GDP) Growth Rates for 30 Industrialized Countries

Country           Growth Rate %
Turkey            7.4
Czech Republic    6.1
Slovakia          6.1
Hungary           4.1
South Korea       4.0
Luxembourg        4.0
Greece            3.7
Poland            3.4
Spain             3.4
Denmark           3.2
United States     3.2
Mexico            3.0
Canada            2.9
Finland           2.9
Sweden            2.7
Japan             2.6
Australia         2.5
New Zealand       2.3
Norway            2.3
Austria           2.0
Switzerland       1.9
United Kingdom    1.9
Belgium           1.5
Netherlands       1.5
France            1.2
Germany           0.9
Portugal          0.4
Italy             0.0
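
One possible way to work the exercise is sketched below in Python, using the growth rates exactly as listed in the table above (the transcript lists 28 rows). The class width of 1 percentage point is an assumption made for this sketch. Other widths are equally legitimate, as the earlier slide on bin size notes.

```python
import math

# Growth rates (%) in the order listed in the table above.
growth_rates = [7.4, 6.1, 6.1, 4.1, 4.0, 4.0, 3.7, 3.4, 3.4, 3.2,
                3.2, 3.0, 2.9, 2.9, 2.7, 2.6, 2.5, 2.3, 2.3, 2.0,
                1.9, 1.9, 1.5, 1.5, 1.2, 0.9, 0.4, 0.0]

# Assumed class choice: intervals of width 1, each including its left endpoint.
counts = {}
for rate in growth_rates:
    lower = math.floor(rate)
    counts[lower] = counts.get(lower, 0) + 1

# Print a text histogram, including classes that turn out to be empty.
for lower in range(0, 8):
    n = counts.get(lower, 0)
    print(f"{lower} to < {lower + 1}: {'*' * n} ({n})")
```

With this choice of classes, most countries fall between 1% and 4%, and the class from 5 to under 6 is empty, which separates the three fastest-growing countries from the rest.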


43

StatTutor

StatsPortal

Example: T-bill interest rates

What is this type of plot called?

What is a Time Series?

Time series -- observations collected over time

Time plot -- plot of the data over time

Identifying Trends in the Data

Trend -- gradual increases or decreases over time

[Time plot: Annual Sales – XYZ Company. Horizontal axis: Year, 1990 to 2004. Vertical axis: sales in millions, 0 to 70.]

Other Common Components Of Time Series

[Time plot: Seasonality. Horizontal axis: Quarter (1st through 4th, repeating). Vertical axis: 0 to 35.]

[Time plot: Cycles. Horizontal axis: Year ('80 to '99). Vertical axis: 0 to 70.]

Line Graphs: Time Plots

Retail Price of Fresh Oranges over Time

This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing most likely due to similar seasonal variations in the production of fresh oranges. There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry.

Time is on the horizontal, x axis. The variable of interest—here “retail price of fresh oranges”— goes on the vertical, y axis.
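
The two features described above, an upward trend and a repeating seasonal pattern, can be separated with simple averages. The Python sketch below uses hypothetical quarterly prices in cents (not the orange price data from the slide): averaging within each year shows the trend, and averaging each quarter across years shows the seasonal pattern.

```python
from statistics import mean

# Hypothetical quarterly prices (cents per pound) for three consecutive years.
prices = {
    2001: [60, 70, 90, 80],
    2002: [70, 80, 100, 90],
    2003: [80, 90, 110, 100],
}

# Trend: the average price in each year.
yearly_mean = {year: mean(values) for year, values in prices.items()}

# Seasonal pattern: the average price in each quarter across the years.
quarterly_mean = [mean(values[q] for values in prices.values()) for q in range(4)]

for year, value in yearly_mean.items():
    print(f"{year}: average {value} cents")
for q, value in enumerate(quarterly_mean, start=1):
    print(f"Quarter {q}: average {value} cents")
```

In this made-up series the yearly averages rise steadily while the third quarter is the most expensive in every year, which is the kind of pattern a time plot makes visible at a glance.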
