IBM-09: Six Sigma – Tools and Techniques
Measure Phase
Introduction to Basic Statistics
A. Ramesh, PhD
Department of Management Studies
Indian Institute of Technology [email protected]
Measure Phase – Basic Statistics
What is Data?
Data are facts, usually collected as the result of experience, observation, or measurement
Consist of numbers, words, or images
Lowest level of abstraction from which information and knowledge are derived
Data → Information → Knowledge
“I believe in God - rest bring data!” – A famous quote
WHAT DO THESE WORDS & NUMBERS MEAN TO YOU ?
2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85, 2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81, 2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85, 2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86, 2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88
PROBABLY NOTHING MUCH !
ROOF, GLASS, TREE, SKY, LAMP, FLOWERS, WIRES, WINDOWS, GRASS, POLE, TREE, TILES
Statistics helps in summarizing and understanding the data
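As a rough illustration of how a few summary figures make the 50 readings listed earlier easier to interpret, here is a minimal Python sketch using only the standard library. The variable names are illustrative and not part of the original slides.

```python
import statistics

# The 50 readings shown on the earlier slide
readings = [
    2.87, 2.85, 2.88, 2.85, 2.86, 2.85, 2.81, 2.82, 2.83, 2.85,
    2.84, 2.84, 2.85, 2.86, 2.85, 2.84, 2.85, 2.85, 2.87, 2.81,
    2.85, 2.82, 2.83, 2.85, 2.85, 2.86, 2.85, 2.86, 2.89, 2.85,
    2.84, 2.84, 2.85, 2.85, 2.83, 2.82, 2.86, 2.83, 2.85, 2.86,
    2.85, 2.84, 2.84, 2.87, 2.85, 2.86, 2.85, 2.84, 2.90, 2.88,
]

# A handful of summary measures that condense the raw list
summary = {
    "count": len(readings),
    "smallest": min(readings),
    "largest": max(readings),
    "mean": statistics.mean(readings),
    "median": statistics.median(readings),
    "mode": statistics.mode(readings),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Read this way, the list is no longer "nothing much": the readings cluster tightly around one central value within a narrow band.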
How about the following pictures?
Can Statistics Be Trusted?
“There are three kinds of lies: lies, damned lies, and statistics.” – Mark Twain
“It is easy to lie with statistics. But it is easier to lie without them.” – Frederick Mosteller
“Figures won’t lie but liars will figure.” – Charles Grosvenor
Statistics..
• Plays an important role in many facets of human endeavour
• Occurs remarkably frequently in our everyday lives
• It is often incorrectly thought of as just a collection of data, graphs and diagrams
What is Statistics?
• Science of gathering, analyzing, interpreting, and presenting data
• Branch of mathematics
• Facts and figures
• Statistics is the scientific method that enables us to make decisions as responsibly as possible.
Statistics: Science of variability..?
• Virtually everything varies
• Variation occurs among individuals
• Variation occurs within any one individual as time passes
Concept of Variation:
Variation is natural and cannot be avoided.
The customer experiences the variation, not the average.
The lower the variation, the better the customer experience.
What the customer wants
What the customer experiences (variation)
What we measure (average)
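To make the point about average versus variation concrete, here is a small Python sketch. The two sets of delivery times are hypothetical numbers chosen for illustration; they do not come from the slides.

```python
import statistics

# Hypothetical delivery times in days for two processes (illustrative data only)
process_a = [4.8, 5.0, 5.1, 4.9, 5.2, 5.0]
process_b = [2.0, 8.0, 3.5, 6.5, 1.0, 9.0]

# What we measure: the average
mean_a = statistics.mean(process_a)
mean_b = statistics.mean(process_b)

# What the customer experiences: the spread around that average
spread_a = statistics.stdev(process_a)
spread_b = statistics.stdev(process_b)

print(f"Process A: mean {mean_a:.2f}, standard deviation {spread_a:.2f}")
print(f"Process B: mean {mean_b:.2f}, standard deviation {spread_b:.2f}")
```

Both processes report essentially the same average, yet a customer of process B may wait anywhere from 1 to 9 days, which is why the average alone is a poor description of the experience.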
Sources of Variation:
Any change to the above factors directly affects process performance and is a cause of variation.
Variation – Example:
A man wants to reach his workplace at exactly 7:00 a.m.
One day he arrives a little early (6:55).
Another day he arrives a little late (7:05).
Can we identify the cause of this variation?
No, we may not be able to. It may be affected by factors which:
• affect the time he takes to travel
• he cannot control
• vary randomly
E.g., the normal traffic he encounters in the normal course of travel.
This variation is called inherent variation, common cause variation, or white noise.
Variation – Example:
TODAY HE IS VERY EARLY! (6:00)
Can we find out the cause of this?
Yes, we can. It may be because of specific circumstances that do not occur in the normal scheme of things, e.g.:
• His watch was running fast
• He got a lift
• He had a client call
• He had some important work to finish before 7:30
These variations are called special cause variation or black noise.
Measure Phase – Basic Statistics
Reacting to common cause vs. special cause:
Sampling:
Sampling is the process of collecting only a portion of the data that is available or could be available, and drawing conclusions about the total population (statistical inference).
[Figure: a population of N = 567 days and a sample of n = 3 days drawn from it]
What is the average resolution time?
From the sample, we infer that the average resolution time ($\bar{x}$) is 1.2 days*
*Within a certain confidence band or margin of error
Population Versus Sample
• Population: the whole
– a collection of persons, objects, or items under study
– the entire group of individuals in a statistical study that we want information about
• Census: gathering data from the entire population
• Sample: a portion of the whole
– a subset of the population
– a part of the population from which we actually collect information, used to draw conclusions about the whole (statistical inference)
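The distinction between a census and a sample can be sketched in Python as follows. The population here is simulated: the resolution times, the sample size of 30, and the random seed are all assumptions made for illustration, with only the population size of 567 borrowed from the sampling slide.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: resolution times in days for 567 cases
population = [round(random.uniform(0.5, 2.0), 1) for _ in range(567)]

# Census: use every item in the population
census_mean = statistics.mean(population)

# Sample: use only a randomly selected portion of the population
sample = random.sample(population, 30)
sample_mean = statistics.mean(sample)

print(f"Population size N = {len(population)}, census mean = {census_mean:.2f}")
print(f"Sample size n = {len(sample)}, sample mean = {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the census mean, which is why conclusions drawn from a sample are stated with a margin of error.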
Statistics can be split into two broad categories
1. Descriptive statistics
2. Statistical inference
Descriptive Statistics
Collect data, e.g. survey
Present data, e.g. tables and graphs
Characterize data, e.g. sample mean $= \frac{\sum X_i}{n}$
Descriptive statistics..
• Encompasses the following:
– Graphical or pictorial display
– Condensation of large masses of data into a form such as tables
– Preparation of summary measures to give a concise description of complex information (e.g. an average figure)
– Exhibition of patterns that may be found in sets of information
Inferential Statistics
Drawing conclusions and/or making decisions concerning a population based on sample results.
Estimation, e.g. estimate the population mean weight using the sample mean weight
Hypothesis testing, e.g. test the claim that the population mean weight is 120 pounds
Inferential Statistics..
• Especially relates to:
– Determining whether characteristics of a situation are unusual or if they have happened by chance
– Estimating values of numerical quantities and determining the reliability of those estimates
– Using past occurrences to attempt to predict the future
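The estimation and hypothesis-testing ideas above can be sketched with the weight example. The sample of weights below is hypothetical, and the interval uses a normal approximation as a simplifying assumption; for a sample this small, a t-based critical value is commonly preferred.

```python
import math
import statistics

# Hypothetical sample of weights in pounds (illustrative data only)
weights = [118, 124, 121, 115, 127, 122, 119, 125, 120, 123, 117, 126]

n = len(weights)
sample_mean = statistics.mean(weights)
sample_sd = statistics.stdev(weights)
standard_error = sample_sd / math.sqrt(n)

# Estimation: approximate 95% interval for the population mean,
# using the normal critical value (an assumption of this sketch)
z = statistics.NormalDist().inv_cdf(0.975)
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

# Hypothesis testing: how many standard errors separate the
# sample mean from the claimed population mean of 120 pounds?
claimed_mean = 120
test_statistic = (sample_mean - claimed_mean) / standard_error

print(f"Sample mean: {sample_mean:.2f}")
print(f"Approximate 95% interval: {lower:.2f} to {upper:.2f}")
print(f"Test statistic against 120: {test_statistic:.2f}")
```

A test statistic near zero means the sample is consistent with the claim; a large absolute value suggests the difference is unlikely to have happened by chance.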
Process of Inferential Statistics
1. Start with the population, described by a parameter such as $\mu$.
2. Select a random sample.
3. Calculate $\bar{x}$ (a statistic) from the sample.
4. Use $\bar{x}$ to estimate $\mu$.
Population vs. Sample
Population Sample
Measures used to describe the population are called parameters
Measures computed from sample data are called statistics
Parameter vs. Statistic
• Parameter: descriptive measure of the population
– Usually represented by Greek letters
• Statistic: descriptive measure of a sample
– Usually represented by Roman letters
Symbols for Population Parameters
$\mu$ denotes population mean
$\sigma^2$ denotes population variance
$\sigma$ denotes population standard deviation
Symbols for Sample Statistics
$\bar{x}$ denotes sample mean
$S^2$ denotes sample variance
$S$ denotes sample standard deviation
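The population and sample symbols map onto different functions in Python's standard `statistics` module. The sketch below uses hypothetical data and relies on the standard convention, not stated on the slides, that the population variance divides by N while the sample variance divides by n - 1.

```python
import statistics

# Hypothetical data, treated first as a whole population and then as a sample
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population parameters (Greek letters)
mu = statistics.mean(values)
sigma_squared = statistics.pvariance(values)  # divides by N
sigma = statistics.pstdev(values)

# Sample statistics (Roman letters)
x_bar = statistics.mean(values)
s_squared = statistics.variance(values)       # divides by n - 1
s = statistics.stdev(values)

print(f"Population: mean {mu}, variance {sigma_squared}, standard deviation {sigma}")
print(f"Sample: mean {x_bar}, variance {s_squared:.3f}, standard deviation {s:.3f}")
```

The mean is computed the same way in both cases; only the measures of spread differ, and the sample versions come out slightly larger.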
Types of Variables
Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.”
Numerical (quantitative) variables have values that represent quantities.
Types of Variables
Data
• Categorical (defined categories). Examples: marital status, political party, eye color
• Numerical
– Discrete (counted items). Examples: number of children, defects per hour
– Continuous (measured characteristics). Examples: weight, voltage
Levels of Data Measurement
• Nominal: lowest level of measurement
• Ordinal
• Interval
• Ratio: highest level of measurement
Levels of Measurement
A nominal scale classifies data into distinct categories in which no ranking is implied.
Categorical Variable            Categories
Personal Computer Ownership     Yes / No
Type of Stocks Owned            Growth, Value, Other
Internet Provider               Microsoft Network / AOL
Levels of Measurement
An ordinal scale classifies data into distinct categories in which ranking is implied.
Categorical Variable              Ordered Categories
Student class designation         Freshman, Sophomore, Junior, Senior
Product satisfaction              Satisfied, Neutral, Unsatisfied
Faculty rank                      Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student grades                    A, B, C, D, F
Levels of Measurement
An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.
Interval and Ratio Scales
Data Level, Operations, and Statistical Methods
Data Level    Meaningful Operations                                Statistical Methods
Nominal       Classifying and counting                             Nonparametric
Ordinal       All of the above plus ranking                        Nonparametric
Interval      All of the above plus addition and subtraction       Parametric
Ratio         All of the above plus multiplication and division    Parametric
Data preparation rules
• Data presented must be
– factual
– relevant
Before presentation always check:
• the source of the data
• that the data has been accurately transcribed
• that the figures are relevant to the problem
Methods of visual presentation of data
• Table
         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      90        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9
Methods of visual presentation of data
• Graphs
[Figure: graph of East, West, and North by quarter (1st Qtr to 4th Qtr); vertical axis: 0 to 90]
Methods of visual presentation of data
• Pie chart
[Figure: pie chart with slices for 1st Qtr, 2nd Qtr, 3rd Qtr, and 4th Qtr]
Methods of visual presentation of data
• Multiple bar chart
[Figure: horizontal bar chart of North, West, and East for each quarter; value axis: 0 to 100]
Methods of visual presentation of data
• Simple pictogram
[Figure: pictogram of East, North, and West by quarter; vertical axis: 0 to 100]
Frequency distributions
• Frequency tables
Observation Table
Class Interval   Frequency   Cumulative Frequency
< 20             13          13
< 40             18          31
< 60             25          56
< 80             15          71
< 100             9          80
Frequency diagrams
[Figure: frequency chart; horizontal axis: < 20, < 40, < 60, < 80, < 100; vertical axis: Frequency, 0 to 30]
[Figure: cumulative frequency chart; horizontal axis: < 20, < 40, < 60, < 80, < 100; vertical axis: Cumulative Frequency, 0 to 90]
Ungrouped Versus Grouped Data
• Ungrouped data
– have not been summarized in any way
– are also called raw data
• Grouped data
– have been organized into a frequency distribution
Example of Ungrouped Data
Ages of a Sample of Managers from XYZ
42 30 53 50 52 30 55 49 61 74
26 58 40 40 28 36 30 33 31 37
32 37 30 32 23 32 58 43 30 29
34 50 47 31 35 26 64 46 40 43
57 30 49 40 25 50 52 32 60 54
Frequency Distribution of Ages
Class Interval   Frequency
20-under 30       6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70       3
70-under 80       1
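Grouping the ungrouped ages into this frequency distribution can be sketched in Python as follows. The class width of 10 and the starting point of 20 are taken from the table; the variable names are illustrative.

```python
from collections import Counter

# Ages of the sample of managers, as listed in the ungrouped data
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

class_width = 10

# Map each age to the lower limit of its class, e.g. 37 -> 30 ("30-under 40")
counts = Counter(age // class_width * class_width for age in ages)

# Build the table rows in class order: (lower limit, upper limit, frequency)
frequency_table = [
    (lower, lower + class_width, counts[lower]) for lower in range(20, 80, class_width)
]

for lower, upper, frequency in frequency_table:
    print(f"{lower}-under {upper}: {frequency}")
```

Because each class is "lower-under upper", an age of exactly 30 falls in the 30-under 40 class, not in 20-under 30.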
Data Range
For the ages of the sample of managers:
Smallest = 23
Largest = 74
Range = Largest - Smallest = 74 - 23 = 51
Number of Classes and Class Width
• The number of classes should be between 5 and 15.
– Fewer than 5 classes cause excessive summarization.
– More than 15 classes leave too much detail.
• Class Width
– Divide the range by the number of classes for an approximate class width
– Round up to a convenient number
Approximate Class Width $= \frac{51}{6} = 8.5$
Class Width $= 10$
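The class-width rule can be written out as a short Python sketch. Treating "a convenient number" as the next multiple of 10 is an assumption made here; the slides only say to round up.

```python
import math

# Smallest and largest ages from the sample of managers
smallest, largest = 23, 74
number_of_classes = 6

data_range = largest - smallest
approximate_width = data_range / number_of_classes

# Assumption: "convenient" is taken to mean the next multiple of 10
class_width = math.ceil(approximate_width / 10) * 10

print(f"Range: {data_range}")
print(f"Approximate class width: {approximate_width}")
print(f"Class width used: {class_width}")
```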
Class Midpoint
Class Midpoint $= \frac{\text{beginning class endpoint} + \text{ending class endpoint}}{2} = \frac{30 + 40}{2} = 35$
Class Midpoint $= \text{class beginning point} + \frac{1}{2}(\text{class width}) = 30 + \frac{1}{2}(10) = 35$
Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30       6          .12
30-under 40      18          .36
40-under 50      11          .22
50-under 60      11          .22
60-under 70       3          .06
70-under 80       1          .02
Total            50         1.00
For example, $.12 = \frac{6}{50}$ and $.36 = \frac{18}{50}$.
Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30       6           6
30-under 40      18          24
40-under 50      11          35
50-under 60      11          46
60-under 70       3          49
70-under 80       1          50
Total            50
For example, $24 = 18 + 6$ and $35 = 11 + 24$.
Class Midpoints, Relative Frequencies, and Cumulative Frequencies
Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30       6          25         .12                   6
30-under 40      18          35         .36                  24
40-under 50      11          45         .22                  35
50-under 60      11          55         .22                  46
60-under 70       3          65         .06                  49
70-under 80       1          75         .02                  50
Total            50                    1.00
Cumulative Relative Frequencies
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30       6          .12                   6                      .12
30-under 40      18          .36                  24                      .48
40-under 50      11          .22                  35                      .70
50-under 60      11          .22                  46                      .92
60-under 70       3          .06                  49                      .98
70-under 80       1          .02                  50                     1.00
Total            50         1.00
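The midpoint, relative, cumulative, and cumulative relative columns can all be derived from the class limits and frequencies. The following Python sketch shows one way to do it; the variable names are illustrative.

```python
from itertools import accumulate

# Class lower limits and frequencies from the ages frequency distribution
lower_limits = [20, 30, 40, 50, 60, 70]
frequencies = [6, 18, 11, 11, 3, 1]
class_width = 10

total = sum(frequencies)

# Midpoint = class beginning point + half the class width
midpoints = [lower + class_width / 2 for lower in lower_limits]

# Relative frequency = class frequency / total frequency
relative = [f / total for f in frequencies]

# Cumulative frequency = running total of the frequencies
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency = cumulative frequency / total frequency
cumulative_relative = [c / total for c in cumulative]

rows = zip(lower_limits, frequencies, midpoints, relative, cumulative, cumulative_relative)
for lower, f, mid, rel, cum, cum_rel in rows:
    print(f"{lower}-under {lower + class_width}: {f:>2} {mid:>4.0f} {rel:.2f} {cum:>2} {cum_rel:.2f}")
```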
Common Statistical Graphs
• Histogram: vertical bar chart of frequencies
• Frequency Polygon: line graph of frequencies
• Ogive: line graph of cumulative frequencies
• Pie Chart: proportional representation for categories of a whole
• Stem and Leaf Plot
• Pareto Chart
• Scatter Plot
Histogram
[Figure: histogram of the frequency distribution of ages (20-under 30: 6, 30-under 40: 18, 40-under 50: 11, 50-under 60: 11, 60-under 70: 3, 70-under 80: 1); horizontal axis: Years, 0 to 80; vertical axis: Frequency, 0 to 20]
Histogram Construction
[Figure: construction of the same histogram from the frequency distribution of ages; horizontal axis: Years, 0 to 80; vertical axis: Frequency, 0 to 20]
Frequency Polygon
[Figure: frequency polygon of the same frequency distribution; horizontal axis: Years, 0 to 80; vertical axis: Frequency, 0 to 20]
Ogive
[Figure: ogive of the cumulative frequencies (20-under 30: 6, 30-under 40: 24, 40-under 50: 35, 50-under 60: 46, 60-under 70: 49, 70-under 80: 50); horizontal axis: Years, 0 to 80; vertical axis: Frequency, 0 to 60]
Relative Frequency Ogive
[Figure: ogive of the cumulative relative frequencies (20-under 30: .12, 30-under 40: .48, 40-under 50: .70, 50-under 60: .92, 60-under 70: .98, 70-under 80: 1.00); horizontal axis: Years, 0 to 80; vertical axis: Cumulative Relative Frequency, 0.00 to 1.00]
Complaints by Passengers
COMPLAINT            NUMBER    PROPORTION   DEGREES
Stations, etc.       28,000     .40         144.0
Train Performance    14,700     .21          75.6
Equipment            10,500     .15          54.0
Personnel             9,800     .14          50.4
Schedules, etc.       7,000     .10          36.0
Total                70,000    1.00         360.0
Complaints by Passengers
[Figure: pie chart of complaints: Stations, etc. 40%; Train Performance 21%; Equipment 15%; Personnel 14%; Schedules, etc. 10%]
Second Quarter Truck Production
Company   2d Quarter Truck Production
A         357,411
B         354,936
C         160,997
D          34,099
E          12,747
Totals    920,190
Pie Chart Calculations for Company A
Company   2d Quarter Truck Production   Proportion   Degrees
A         357,411                        .388         140
B         354,936                        .386         139
C         160,997                        .175          63
D          34,099                        .037          13
E          12,747                        .014           5
Totals    920,190                       1.000         360
For Company A: $\frac{357{,}411}{920{,}190} = .388$ and $.388 \times 360 \approx 140$
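The pie chart calculation shown for Company A can be repeated for every company with a few lines of Python. This is a minimal sketch using the production figures from the table; the dictionary layout is an illustrative choice.

```python
# Second quarter truck production by company, from the table
production = {"A": 357411, "B": 354936, "C": 160997, "D": 34099, "E": 12747}

total = sum(production.values())

# Proportion = company production / total production
proportions = {company: units / total for company, units in production.items()}

# Degrees = proportion of the whole circle (360 degrees)
degrees = {company: share * 360 for company, share in proportions.items()}

for company in production:
    print(f"{company}: proportion {proportions[company]:.3f}, degrees {degrees[company]:.0f}")
```

The degrees always sum to 360 before rounding, which is a quick check that no category has been left out.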
Pareto Chart
[Figure: Pareto chart with categories Poor Wiring, Short in Coil, Defective Plug, and Other; left vertical axis: Frequency, 0 to 100; right vertical axis: cumulative percentage, 0% to 100%]
Scatter Plot
Registered Vehicles (1000's)   Gasoline Sales (1000's of Gallons)
 5                              60
15                             120
 9                              90
15                             140
 7                              60
[Figure: scatter plot of the data above; horizontal axis: Registered Vehicles, 0 to 20; vertical axis: Gasoline Sales, 0 to 200]
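A scatter plot shows visually whether two variables move together. One numeric companion, which the slides do not cover and which is offered here only as an illustration, is the correlation coefficient computed from the same pairs of values.

```python
import math

# Paired observations from the scatter plot table
vehicles = [5, 15, 9, 15, 7]       # registered vehicles (1000's)
gasoline = [60, 120, 90, 140, 60]  # gasoline sales (1000's of gallons)

n = len(vehicles)
mean_x = sum(vehicles) / n
mean_y = sum(gasoline) / n

# Sums of squared deviations and cross-products around the means
sxx = sum((x - mean_x) ** 2 for x in vehicles)
syy = sum((y - mean_y) ** 2 for y in gasoline)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(vehicles, gasoline))

# Correlation coefficient: close to +1 means the points rise together
r = sxy / math.sqrt(sxx * syy)

print(f"Correlation between registered vehicles and gasoline sales: {r:.3f}")
```

A value close to 1 matches what the plot suggests: areas with more registered vehicles tend to have higher gasoline sales.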
Principles of Excellent Graphs
• The graph should not distort the data.
• The graph should not contain unnecessary adornments (sometimes referred to as chart junk).
• The scale on the vertical axis should begin at zero.
• All axes should be properly labeled.
• The graph should contain a title.
• The simplest possible graph should be used for a given set of data.
Graphical Errors: Chart Junk
[Figure: Bad Presentation: Minimum Wage, labelled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80]
[Figure: Good Presentation: Minimum Wage; horizontal axis: 1960, 1970, 1980, 1990; vertical axis: $, 0 to 4]
Graphical Errors: Compressing the Vertical Axis
[Figure: Bad Presentation: Quarterly Sales; horizontal axis: Q1 to Q4; vertical axis: $, 0 to 200]
[Figure: Good Presentation: Quarterly Sales; horizontal axis: Q1 to Q4; vertical axis: $, 0 to 50]
Graphical Errors: No Zero Point on the Vertical Axis
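Graphing the first six months of sales
[Figure: Bad Presentation: Monthly Sales; horizontal axis: J, F, M, A, M, J; vertical axis: $, 36 to 45, with no zero point]
[Figure: Good Presentation: Monthly Sales; horizontal axis: J, F, M, A, M, J; vertical axis: $, starting at 0, with ticks at 36, 39, 42, 45]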
Thank You
• http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html
• http://www.ilir.uiuc.edu/courses/lir593/