Introduction to Data Analysis and Decision Making

Upload: henry-barker

Post on 20-Jan-2016


Page 1: Introduction to Data Analysis and Decision Making

Introduction to Data Analysis and Decision Making

Page 2: Introduction to Data Analysis and Decision Making

Data Analysis

• Describing data and datasets

• Making inferences from data and datasets

• Searching for relationships in data and datasets

Page 3: Introduction to Data Analysis and Decision Making

Decision Making

• Optimization

• Decision analysis with uncertainty

• Sensitivity Analysis

Page 4: Introduction to Data Analysis and Decision Making

Uncertainty

• Measuring uncertainty

• Modeling and simulation

Page 5: Introduction to Data Analysis and Decision Making

What is Management Science?

• Logical, systematic approach to decision making using quantitative methods.

• Science: scientific methods used to solve business-related problems.

• Goal for this class: logically approach and solve many different problems.

Page 6: Introduction to Data Analysis and Decision Making

Management Science Approach to Problem Solving

• Observation

• Definition of the Problem

• Constructing the Model

• Solving the Model/problem

• Implementation of Solution

(process is never really complete)

Page 7: Introduction to Data Analysis and Decision Making

Observation

• Identify the problem

• Problem does not imply that there is something wrong with the process

• “Problem” could imply need for improvement

Page 8: Introduction to Data Analysis and Decision Making

Definition of the Problem

• Clearly define problem

• Prevents incorrect/inappropriate solution

• Listing goals could be helpful

Page 9: Introduction to Data Analysis and Decision Making

Constructing the Model

• Represents the problem in abstract form

• Schematic, scale, mathematical relationship between variables (equation)

• Ex: Income = Hours Worked * Pay

Page 10: Introduction to Data Analysis and Decision Making

Components of the Model

• Variables/Decision Variables
– Independent
– Dependent

• Objective Function

• Parameter

• Constraints

Page 11: Introduction to Data Analysis and Decision Making

Model Solution

• Same as solving the problem:

• Ex: Z = $20X – 5X
subject to 4X = 100

• Solution: X = 25, Z = $375
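The example above can be checked with a few lines of Python. This is only a minimal sketch: the function name and the parameter names (which read the 20 as a unit price, the 5 as a unit cost, and the 4 as resource use per unit) are illustrative assumptions, not part of the slides.

```python
def solve_model(unit_price=20.0, unit_cost=5.0,
                resource_per_unit=4.0, resource_available=100.0):
    """Solve Z = unit_price*X - unit_cost*X subject to
    resource_per_unit*X = resource_available."""
    # The equality constraint pins down X directly.
    x = resource_available / resource_per_unit
    # Evaluate the objective function at that X.
    z = unit_price * x - unit_cost * x
    return x, z


x, z = solve_model()
print(f"X = {x:g}, Z = ${z:g}")  # X = 25, Z = $375
```

Because the constraint is an equality, there is nothing to optimize here: the constraint fixes X and the objective is simply evaluated at that value.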

Page 12: Introduction to Data Analysis and Decision Making

Implementation of Solution

• Solution aids us in making a decision but does not constitute the actual decision making.

Page 13: Introduction to Data Analysis and Decision Making

Example

Blue Ridge Hot Tubs manufactures and sells hot tubs. The company needs to decide how many hot tubs to produce during the next production cycle. The company buys prefabricated fiberglass hot tub shells from a local supplier and adds a pump and tubing to the shells to create its hot tubs. The company has 200 pumps available. Each hot tub requires 9 hours of labor. The company expects to have 1,566 production labor hours during the next production cycle. A profit of $350 will be earned on each hot tub sold. The company is confident that all of the hot tubs will sell. The question is, how many should be produced if the company wants to maximize profits during the next production cycle?

Page 14: Introduction to Data Analysis and Decision Making

Msci Approach to Problem Solving

• Problem: Determine # of hot tubs to produce

• Definition: Maximize profit within the constraints of the labor hours and materials available

• Model: Max Z = $350X
subject to 9X ≤ 1,566 labor hours

• Solution: X = 174; Z = 350(174) = $60,900

• Implementation: Recommend making 174 hot tubs
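The hot tub solution can be reproduced with the short sketch below. It uses the figures from the problem statement (200 pumps, 1,566 labor hours, 9 hours per tub, $350 profit per tub); the function name is hypothetical, and the one-pump-per-tub rule is an assumption read from the problem description.

```python
def plan_hot_tubs(profit_per_tub=350, pumps=200,
                  labor_hours=1566, hours_per_tub=9):
    """Return (tubs to build, total profit) for the one-variable model."""
    max_by_pumps = pumps                         # assumes one pump per tub
    max_by_labor = labor_hours // hours_per_tub  # whole tubs only
    # Profit per tub is positive, so build as many as the tightest limit allows.
    tubs = min(max_by_pumps, max_by_labor)
    return tubs, profit_per_tub * tubs


tubs, profit = plan_hot_tubs()
print(tubs, profit)  # 174 60900
```

Labor is the binding constraint here: the pumps would allow 200 tubs, but the labor hours allow only 174.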

Page 15: Introduction to Data Analysis and Decision Making

A Generic Mathematical Model

Y = f(X1, X2, …, Xk)

Where:

Y = dependent variable (a bottom line performance measure)

Xi = independent variables (inputs having an impact on Y)

f(.) = function defining the relationship between the Xi and Y

Page 16: Introduction to Data Analysis and Decision Making

Categories of Mathematical Models

Category      Form of f(.)          Independent Variables                    OR/MS Techniques
Prescriptive  known, well-defined   known or under decision maker’s control  LP, Networks, IP, CPM, EOQ, NLP, GP, MOLP
Predictive    unknown, ill-defined  known or under decision maker’s control  Regression Analysis, Time Series Analysis, Discriminant Analysis
Descriptive   known, well-defined   unknown or uncertain                     Simulation, PERT, Queueing, Inventory Models

Page 17: Introduction to Data Analysis and Decision Making

Example – Spring Mills

• 280 observations

• Three variables per observation

• Relatively large dataset

Page 18: Introduction to Data Analysis and Decision Making

Background Information

• Spring Mills produces and distributes a wide variety of manufactured goods. It has a large number of customers.

• Spring Mills classifies these customers as small, medium, or large, depending on the volume of business each does with them.

• Recently they have noticed a problem with accounts receivable. They are not getting paid by their customers in as timely a manner as they would like. This obviously costs them money.

Page 19: Introduction to Data Analysis and Decision Making

RECEIVE.XLS

• Spring Mills has gathered data on 280 customer accounts.

• For each of these accounts the data set lists three variables:
– Size - The size of the customer (coded 1 for small, 2 for medium, 3 for large).
– Days - The number of days since the customer was billed.
– Amount - The amount the customer owes.

• What information can we obtain from this data?
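One way to start answering this question is to summarize Days and Amount separately for each customer size. The sketch below is illustrative only: the function name and the record layout are assumptions, and the four demo rows are made up, not taken from RECEIVE.XLS.

```python
from collections import defaultdict
from statistics import mean

SIZE_LABELS = {1: "small", 2: "medium", 3: "large"}  # coding from the slide


def summarize_by_size(records):
    """records: iterable of (size, days, amount) tuples, size coded 1/2/3."""
    groups = defaultdict(list)
    for size, days, amount in records:
        groups[size].append((days, amount))
    summary = {}
    for size, rows in sorted(groups.items()):
        summary[SIZE_LABELS[size]] = {
            "count": len(rows),
            "mean_days": mean(d for d, _ in rows),
            "mean_amount": mean(a for _, a in rows),
        }
    return summary


# Made-up rows for illustration only; NOT values from RECEIVE.XLS.
demo = [(1, 10, 250), (1, 12, 300), (2, 20, 500), (3, 25, 1000)]
for label, stats in summarize_by_size(demo).items():
    print(label, stats)
```

With the real 280 accounts loaded in the same (size, days, amount) form, the per-group counts and means would support comparisons like those drawn on the following slides.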

Page 20: Introduction to Data Analysis and Decision Making

Summary Measures for Combined Data

Page 21: Introduction to Data Analysis and Decision Making

Scatterplot: Amount vs Days, All Customers

Page 22: Introduction to Data Analysis and Decision Making

Scatterplot: Amount vs Days, Small Customers

Page 23: Introduction to Data Analysis and Decision Making

Scatterplot: Amount vs Days, Medium Customers

Page 24: Introduction to Data Analysis and Decision Making

Scatterplot: Amount vs Days, Large Customers

Page 25: Introduction to Data Analysis and Decision Making

Analysis -- continued

• There is obviously a lot going on here, and it is evident from the charts.

We point out the following:

– there are considerably fewer large customers than small or medium customers.

– the large customers tend to owe considerably more than small or medium customers.

– the small customers do not tend to be as long overdue as the large and medium customers.

– there is no relationship between Days and Amount for the small customers, but there is a definite positive relationship between these variables for the medium and large customers.

Page 26: Introduction to Data Analysis and Decision Making

Findings

• If Spring Mills really wants to decrease receivables, it might want to target the medium-sized customer group, from which it is losing the most interest.

• Or it could target the large customers because they owe the most on average.

• The most appropriate action depends on the cost and effectiveness of targeting any particular customer group. However, the analysis presented here gives the company a much better picture of what’s currently going on.

Page 27: Introduction to Data Analysis and Decision Making

Modeling and Models

• Graphical models

• Algebraic models

• Spreadsheet models

Page 28: Introduction to Data Analysis and Decision Making

The Modeling Process

• Define the problem

• Collect and summarize data

• Formulate a model

• Verify the model

• Select one or more suitable decisions

• Present the results to the organization

• Implement the model and update through time

Page 29: Introduction to Data Analysis and Decision Making

Describing Data: The Basics

Page 30: Introduction to Data Analysis and Decision Making

Descriptive vs Inferential Statistics

• Descriptive statistics:
– The process of applying a method of analysis to a set of data in order to better understand the information contained within.

• Inferential statistics:
– Using a (sub)set of data (a sample) to predict behavior of a larger set of data (the population).

Page 31: Introduction to Data Analysis and Decision Making

Population

• Definition:
– Set of existing units (usually people, objects, transactions, or events); or
– Every element in a group that is the subject of interest
– Depends upon the problem or situation

• Examples:
– College students, Honda Accords, cash sales

Page 32: Introduction to Data Analysis and Decision Making

Population Parameters and Sample Statistics

A population parameter is a number calculated from all the population measurements that describes some aspect of the population.

The population mean, denoted μ, is a population parameter and is the average of the population measurements.

A point estimate is a one-number estimate of the value of a population parameter.

A sample statistic is a number calculated using sample measurements that describes some aspect of the sample.

Page 33: Introduction to Data Analysis and Decision Making

Measures of Central Tendency

Mean, μ: The average or expected value

Median, Md: The middle point of the ordered measurements

Mode, Mo: The most frequent value

Page 34: Introduction to Data Analysis and Decision Making

The Mean

Population: X1, X2, …, XN

Population Mean:

$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$

Sample: x1, x2, …, xn

Sample Mean:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
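The sample mean formula, together with the median and mode from the previous slide, can be computed directly. The sketch below uses a small set of illustrative values that are not from the slides, and relies only on Python's standard `statistics` module.

```python
from statistics import mean, median, mode

sample = [2, 4, 4, 5, 10]  # illustrative values, not from the slides

# Sample mean written out as in the formula: sum of the x_i divided by n.
x_bar = sum(sample) / len(sample)
print(x_bar)           # 5.0

# The standard library gives the same mean, plus the median and mode.
print(mean(sample))
print(median(sample))  # 4
print(mode(sample))    # 4
```

Here the single large value (10) pulls the mean above the median and mode, which is the kind of relationship the next slide illustrates.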

Page 35: Introduction to Data Analysis and Decision Making

Relationships Among Mean, Median and Mode

Page 36: Introduction to Data Analysis and Decision Making

Variables

• Definition:
– Characteristic or property of an individual population unit
– Particular characteristics or properties may vary among units in a population

• Examples:
– Starting salary of MBA college graduates
– Price of peanut butter at grocery stores

Page 37: Introduction to Data Analysis and Decision Making

Measurement

• Definition:
– The process of quantifying information

• Quantitative variables:
– Test scores, product and process measurements, survey results, etc.

• Qualitative variables:
– Product rating, arbitrary scales, etc.

Page 38: Introduction to Data Analysis and Decision Making

Sample

• Definition:
– Subset of the units of the population

• Example:
– 100 GPAs from all finance majors
– Tool wear on 3 machines out of 45 machines

• Notes:
– A random sample implies no statistical bias
– A census includes all population members

Page 39: Introduction to Data Analysis and Decision Making

Statistical Inference

• Definition:
– Estimation, prediction, or other generalizations about a population based on information contained in a sample.

• Example:
– Based on a 5-year sample of similar weather patterns, predicting the chance of rain today.

Page 40: Introduction to Data Analysis and Decision Making

Reliability of the Inference

• Four items discussed thus far allow for statistical inference:
– A population, variable(s) of interest, a sample, and an inference.

• Fifth Item: A measure of the reliability of the inference.
– How good the inference is, i.e. how much confidence can we place in the inference?

Page 41: Introduction to Data Analysis and Decision Making

Example

• The approval rating of the President; what does it really mean?

• Uses a sample from the population to infer the percentage of the population that approves of his overall performance.

• Implies that 55% of the population approves of the president’s performance plus or minus 5%, i.e. between 50% and 60%.
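The "plus or minus" figure is the reliability measure for this inference. The sketch below first turns the reported 55% and 5% into the interval from the slide, then shows one common way such a margin can arise. The normal-approximation formula, the 1.96 multiplier (roughly 95% confidence), and the sample size of 400 are all assumptions for illustration; the slides do not say how the poll's margin was computed.

```python
import math


def proportion_interval(p_hat, margin):
    """Interval for an estimate reported as p_hat plus or minus margin."""
    return p_hat - margin, p_hat + margin


def approx_margin(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    Assumes a simple random sample of size n; z = 1.96 corresponds to
    roughly 95% confidence.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


low, high = proportion_interval(0.55, 0.05)
print(f"{low:.2f} to {high:.2f}")  # 0.50 to 0.60

# n = 400 is a hypothetical sample size; the slides do not give one.
print(round(approx_margin(0.55, 400), 3))
```

Under these assumptions a sample of about 400 respondents yields a margin close to the 5 percentage points quoted on the slide.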

Page 42: Introduction to Data Analysis and Decision Making

Process Statistics

• A process transforms inputs into outputs:
– A manufacturing process which transforms aluminum sheet into aluminum cans.
– A service process which offers financial advice based on a customer’s input.

• Samples are obtained from a process and statistical procedures can then be applied to make inferences about the process itself.

Page 43: Introduction to Data Analysis and Decision Making

Process

A sequence of operations that takes inputs (labor, raw materials, methods, machines, and so on) and turns them into outputs (products, services, and the like).

Inputs → Process → Outputs

Sampling a Process

A process is in statistical control if it displays constant level and constant variation.

Page 44: Introduction to Data Analysis and Decision Making

Types of Data

• Data can be classified into four types:
– Nominal
– Ordinal
– Interval
– Ratio

Page 45: Introduction to Data Analysis and Decision Making

Nominal Data

• Classify the members of the sample into categories (Categorical Data).

• Examples:
– An individual’s religious affiliation
– Gender of applicants
– An individual’s political party affiliation

• No mathematical properties, i.e. numerical values are only codes.

Page 46: Introduction to Data Analysis and Decision Making

Ordinal Data

• Units of the sample can be ordered with respect to the variable of interest.

• Examples:
– Size of rental cars.
– Ranking of microbrews with respect to taste.
– Ranking of consumer preferences for a product.

• No mathematical properties in that the difference between ranking values is meaningless.

Page 47: Introduction to Data Analysis and Decision Making

Interval Data

• Sample measurements enable comparisons between members of the sample, i.e. the differences between measurements have meaning.

• Examples:
– Temperature or pressure readings.
– Machine speeds

• Can add and subtract but cannot multiply or divide; origin has no meaning.

Page 48: Introduction to Data Analysis and Decision Making

Ratio Data

• Equal distances between numbers imply equal distances between the values of the characteristic being measured, and zero represents the absence of the characteristic being measured.

• Examples:
– Sales revenue for a product or service.
– Unemployment rate.

Page 49: Introduction to Data Analysis and Decision Making

Classes of Data

• Data can be classified as either being:
– Qualitative data - nominal, ordinal, or
– Quantitative data - interval, ratio.

• Numerical data can also be discrete (countable) or continuous.

• Spreadsheet (or Database)
– Variable (or Field)
– Observation (or Record)

Page 50: Introduction to Data Analysis and Decision Making

Describing Data: Graphs and Tables

Page 51: Introduction to Data Analysis and Decision Making

Displaying Data

• For both Qualitative and Quantitative Data:
– Pie Charts
– Bar Graphs (Bar Charts)
– Histograms
– Frequency Tables
– Stem and Leaf Diagrams

Page 52: Introduction to Data Analysis and Decision Making

Pie Chart Example

• 1999 Cigarette Sales (in billions) by company
– Philip Morris, 211.8
– Reynolds, 189.7
– Brown and Williamson, 69.1
– Lorillard, 48.6
– American, 43.9
– Liggett, 29.8

[Pie chart: 1999 Cigarette Sales (Billions of Cigarettes), one slice per company]

Page 53: Introduction to Data Analysis and Decision Making

Bar Graph Example

• 1999 Cigarette Sales (in billions) by company
– Philip Morris, 211.8
– Reynolds, 189.7
– Brown and Williamson, 69.1
– Lorillard, 48.6
– American, 43.9
– Liggett, 29.8

[Bar graph: 1999 Cigarette Sales (Billions of Cigarettes), one bar per company, sales axis from 0 to 300]

Page 54: Introduction to Data Analysis and Decision Making

Histogram Example

• Percentage of Sales Revenue spent on Advertising for a sample of 35 Fortune 500 companies:
– 1% to 3% (4)
– 3% to 5% (9)
– 5% to 7% (11)
– 7% to 9% (8)
– 9% to 11% (3)

[Histogram: bar heights 4, 9, 11, 8, 3 for the five classes, frequency axis from 0 to 12]

Page 55: Introduction to Data Analysis and Decision Making

Measurement Classes

• Intervals are called measurement classes:
– A count of the members of a measurement class is the frequency.
– The proportion of members in a measurement class is the relative frequency. For a given interval, this proportion is calculated by dividing the frequency of the measurement class by the sample size.

Page 56: Introduction to Data Analysis and Decision Making

Relative Frequency

• Sample:

Company  Sales Revenue  Company  Sales Revenue
1        3.1            19       6.2
2        7.4            20       8.4
3        2.2            21       1.9
4        10.9           22       5.8
5        4.5            23       4.9
6        8.6            24       6.4
7        3.7            25       3.6
8        6.3            26       7.9
9        7.6            27       3.2
10       5.4            28       8.5
11       2.3            29       6.2
12       5.8            30       9.7
13       4.2            31       7.1
14       6.1            32       5.9
15       9.1            33       5.7
16       5.5            34       4.4
17       4.8            35       2.9
18       8.9

• Frequency Table:
– Divide range into intervals of equal size.
– Count the number of sample members that fall within the ranges.

Range      Count  Proportion
1% to 3%   4      0.114
3% to 5%   9      0.257
5% to 7%   11     0.314
7% to 9%   8      0.229
9% to 11%  3      0.086
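The frequency table above can be reproduced from the 35 sample values. The sketch below is a minimal illustration: the function and variable names are hypothetical, and classes are treated as half-open intervals (lower bound included, upper bound excluded), which is an assumption that does not matter for this sample because no value falls exactly on a class boundary.

```python
# The 35 sample values from the slide, in company order.
ad_pct = [3.1, 7.4, 2.2, 10.9, 4.5, 8.6, 3.7, 6.3, 7.6, 5.4, 2.3, 5.8,
          4.2, 6.1, 9.1, 5.5, 4.8, 8.9, 6.2, 8.4, 1.9, 5.8, 4.9, 6.4,
          3.6, 7.9, 3.2, 8.5, 6.2, 9.7, 7.1, 5.9, 5.7, 4.4, 2.9]


def frequency_table(values, start, width, n_classes):
    """Return rows of (low, high, count, proportion) for equal-width classes."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class [low, high)
        if 0 <= k < n_classes:         # values outside the range are skipped
            counts[k] += 1
    n = len(values)
    return [(start + k * width, start + (k + 1) * width, c, round(c / n, 3))
            for k, c in enumerate(counts)]


for low, high, count, prop in frequency_table(ad_pct, start=1, width=2, n_classes=5):
    print(f"{low}% to {high}%  {count:>2}  {prop:.3f}")
```

Each proportion is the class count divided by the sample size of 35, matching the relative frequency definition on the previous slide.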

Page 57: Introduction to Data Analysis and Decision Making

Relative Frequency Histogram Example

• Percentage of Sales Revenue spent on Advertising for a sample of 35 Fortune 500 companies:
– 1 to 3% (4/35=0.114)
– 3 to 5% (9/35=0.257)
– 5 to 7% (11/35=0.314)
– 7 to 9% (8/35=0.229)
– 9 to 11% (3/35=0.086)

[Relative frequency histogram: relative frequency axis from 0.00 to 0.35]

Page 58: Introduction to Data Analysis and Decision Making

Stem and Leaf Diagrams

• Data is displayed graphically:
– The stem is the portion of the data to the left of the decimal point.
– The leaf is the portion of data to the right of the decimal point.

• Graphical representation much like Histogram.

• From our previous data:

Stem  Leaf
 1    9
 2    2 3 9
 3    1 2 6 7
 4    2 4 5 8 9
 5    4 5 7 8 8 9
 6    1 2 2 3 4
 7    1 4 6 9
 8    4 5 6 9
 9    1 7
10    9

Key: Leaf units are tenths.
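A stem and leaf diagram like the one above can be built by splitting each value at the decimal point. The sketch below is illustrative (the function name is hypothetical) and assumes the data have exactly one decimal place, as the 35 advertising percentages from the earlier sample do.

```python
from collections import defaultdict

# The 35 sample values from the relative frequency slide.
ad_pct = [3.1, 7.4, 2.2, 10.9, 4.5, 8.6, 3.7, 6.3, 7.6, 5.4, 2.3, 5.8,
          4.2, 6.1, 9.1, 5.5, 4.8, 8.9, 6.2, 8.4, 1.9, 5.8, 4.9, 6.4,
          3.6, 7.9, 3.2, 8.5, 6.2, 9.7, 7.1, 5.9, 5.7, 4.4, 2.9]


def stem_and_leaf(values):
    """Map each stem (integer part) to its sorted leaves (tenths digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)           # work in whole tenths to avoid float noise
        stem, leaf = divmod(tenths, 10)  # e.g. 58 -> stem 5, leaf 8
        plot[stem].append(leaf)
    return dict(plot)


for stem, leaves in stem_and_leaf(ad_pct).items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

The length of each row of leaves equals the frequency of a class of width 1, which is why the diagram reads like a histogram turned on its side.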

Page 59: Introduction to Data Analysis and Decision Making

The Effect of Measurement Class Size on a Histogram

• A Histogram showing greater detail can be obtained by:
– Decreasing class size (which increases the number of classes), or
– Increasing sample size (which increases the number of members in each class).

[Histogram with smaller classes: bar heights 1, 3, 4, 5, 6, 5, 4, 4, 2, 1, frequency axis from 0 to 7]
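The effect of class size can be seen by counting the same 35 advertising percentages with two different class widths. This is a minimal sketch with a hypothetical helper name; classes are assumed to be half-open intervals starting at 1%.

```python
# The 35 sample values from the relative frequency slide.
ad_pct = [3.1, 7.4, 2.2, 10.9, 4.5, 8.6, 3.7, 6.3, 7.6, 5.4, 2.3, 5.8,
          4.2, 6.1, 9.1, 5.5, 4.8, 8.9, 6.2, 8.4, 1.9, 5.8, 4.9, 6.4,
          3.6, 7.9, 3.2, 8.5, 6.2, 9.7, 7.1, 5.9, 5.7, 4.4, 2.9]


def class_counts(values, start, width, n_classes):
    """Count how many values fall in each class [low, low + width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts


print(class_counts(ad_pct, start=1, width=2, n_classes=5))   # [4, 9, 11, 8, 3]
print(class_counts(ad_pct, start=1, width=1, n_classes=10))  # [1, 3, 4, 5, 6, 5, 4, 4, 2, 1]
```

Halving the class width doubles the number of classes and reveals more of the shape of the distribution, at the cost of fewer members in each class.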

Page 60: Introduction to Data Analysis and Decision Making

Excel and StatPro Add-in Demonstration

• Frequency tables

• Histograms

• Scatterplots

• Time series plots