part ii stat 305: chapter 1 - github pages · wh y engi n eers study statistics chapter 1: intro...

33
STAT 305: Chapter 1 STAT 305: Chapter 1 Part II Part II Amin Shirazi Amin Shirazi Course page: Course page: ashirazist.github.io/stat305.github.io ashirazist.github.io/stat305.github.io 1 / 33 1 / 33

Upload: others

Post on 13-Jun-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

STAT 305: Chapter 1STAT 305: Chapter 1

Part IIPart II

Amin ShiraziAmin Shirazi

Course page:Course page:ashirazist.github.io/stat305.github.ioashirazist.github.io/stat305.github.io

1 / 331 / 33

Page 2: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

Why Engineers Study StatisticsWhy Engineers Study Statistics

Chapter 1: Introduction, ContinuedChapter 1: Introduction, Continued

Chapter 2: Data CollectionChapter 2: Data Collection

2 / 332 / 33

Page 3: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

Section 1.2Basic Terminology, Continued

3 / 33

Page 4: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Types of Data StructuresThe most basic way to think about data is to imagine howthe the raw observations could be organized oncecollected.

Collected data can be referred to as a data set. If the dataset is simple enough, we can store it in a data table or flatfile. Traditional data tables store values relating to a singleobservation/unit/individual as a row of the table. Eachcolumn in the table represents a value for some observedcharacterstic observed.

Example: Failure time of lightbulbs

A single brand and model of lightbulb is being examinedfor average failure time. Five bulbs were run until theyburned out and their lifetime was recorded in hours. Thefirst bulb lasted 521.4 hours, the second bulb lasted 501.2hours, the third bulb lasted 541.8 hours, the fourth bulblasted 498.1 hours, and the fifth bulb lasted 528.2 hours.

4 / 33

Page 5: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Types of Data StructuresExample: Failure time of lightbulbs, continued

Assembling the results in a data table could look like this:

Bulb Number Failure Time (hours)1 521.42 501.23 541.84 498.15 528.2

Each bulb tested gets its own row - which row is attachedto which bulb is identified by the first column. The onlyfeature being observed is failure time - so only one columnof observations are recorded for each bulb.

Notice:

Failure Time is a quantitative continuous variable.This is a univariate data set.

5 / 33

Page 6: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Types of Data StructuresExample: Type of bill, date of payment, and paymentamount for Mediacom

Customer Type Date AmountJohn Doe Internet 01-05-2015 110.00John Doe Phone 01-15-2015 10.00John Doe Internet 02-05-2015 110.00John Doe Phone 02-15-2015 10.00John Doe Internet 03-05-2015 110.00John Doe Phone 03-15-2015 10.00... ... ... ...John Doe Internet 01-05-2016 110.00John Doe Phone 01-15-2016 10.00Jane Doe Internet 04-12-2015 90.00Jane Doe Internet 05-12-2015 90.00... ... ... ...Jane Doe Internet 01-12-2016 90.00

Notice:

Type of bill is is a Qualitative variable.Amount paid is quantitative discrete. 6 / 33

Page 7: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Types of Data StructuresExample: Machine Parts

Suppose we get a shipment of 5000 machineparts and would like to verify that the shipmentmeets the standards the machinist agreed to.We take out 100 parts and examine themcarefully. To verify that the parts are as strongas we anticipated, we measure the "Rockwellhardness" with a machine that is accurate tothe first decimal place. We also examine eachpart for scratches and record it weight. Further,we run the part in a test machine to determineif it works correctly.

In this case, we are gathering 4 values on each part. So forinstance, the first of the 100 parts we examine could havea measured Rockwell hardness of 3.2, no scratches, aweight of 1.7562 g, and it works correctly. The second ofthe 100 parts we examine could have a measuredRockwell hardness of 3.1, no scratches, a weight of 1.7901g, and does not work correctly.

7 / 33

Page 8: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Types of Data StructuresThe data as recorded by the researcher might look like this

Part identifier: 1/100 Rockwell Hardness: 3.2 scratches: no weight (g): 1.7562 functioning: yes

Part identifier: 2/100 Rockwell Hardness: 3.1 scratches: no weight (g): 1.7901 functioning: no

...

Part identifier: 100/100 Rockwell Hardness: 3.4 scratches: no weight (g): 1.7651 functioning: yes

8 / 33

Page 9: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Types of Data StructuresWhich we could turn into structured data table like this:The data as recorded by the researcher might look like this

part rockwell_hardness weight scratches functioning1 3.2 1.7562 no yes2 3.1 1.7901 no no. . . . .. . . . .. . . . .100 3.4 1.7651 no yes

When data is arranged like this, with each sampling uniton its own row, the data is said to be in wide format.

9 / 33

Page 10: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Types of Data StructuresHowever, we could also structure a data table like this:

part measurement value1 Rockwell 3.21 weight 1.75621 scratches no1 functioning yes2 Rockwell 3.12 weight 1.79012 scratches no2 functioning no. . .. . .. . .100 functioning yes

When data is arranged like this, with each sampling uniton its own row, the data is said to be in long format.

10 / 33

Page 11: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Factorial StudiesFactorial Studies involve scenarios in whichseveral process variables are indentified asbeing of interest and data are collected underdifferent settings of these process variables.

We call the process variables factors and thepossible settings for a process variable itslevels

Complete Factorial Studies are factorialstudies where data is collected from eachpossible combination of the levels of the factors(also known as Full Factorial Studies).

Partial(Fractional) Factorial Studies arefactorial studies where data is collected fromsome (but not all) possible combinations of thelevels of the factors.

11 / 33

Page 12: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Factorial Studies ExampleA pair of chemists, Walter and Jessie, areattempting to synthesize a chemical productand consider purity to be the most importantquality. There are three environments availableto them ( RV, a basement, and a laboratory) andtwo precursors (Chemical compound)(pseudoephedrine/methylamine). They are bothwilling to try all their options in order to get thebest results.

What parts of this synthesis are being treated asvariables which can be controlled at the start of theexperiment?

What are the possible values for each of thesevariables?

How many ways can the variables be combined?

12 / 33

Page 13: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Factorial Studies Example, cont

Here are all the possible combinations of the factors:

cook environment precursor walter RV psuedoephedrine walter RV methylamine walter basement psuedoephedrine walter basement methylamine walter lab psuedoephedrine walter lab methylamine jessie RV psuedoephedrine jessie RV methylamine jessie basement psuedoephedrine jessie basement methylamine jessie lab psuedoephedrine

(# of Cooks) ⋅ (# of Environments) ⋅ (# of Precursors) = 2 ⋅ 3 ⋅ 2 = 12

13 / 33

Page 14: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Factorial Studies Example, cont

If we collect data from each of these combinations, wehave performed a A Complete Factorial Study

14 / 33

Page 15: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Factorial Studies Example, cont

After testing each scenario, Walter and Jessie decide thatthe best combination to use is Walt as cook in the lab withmethylamine. However, a new "chemist" Victor has joinedthe group and is going to try to be the cook and "follow therecipe" in the lab. Jessie also tries a new environment,South America.

If we consider the all the past combinations to be partof this new study, how many combinations of factorlevels are now possible?

15 / 33

Page 16: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Victor never works in the RV, the basement, or SouthAmerica.

Walter never works in South America.

16 / 33

Page 17: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Factorial Studies Example, cont

cook env precursor1. walt RV pseudo2. walt RV methylamine3. walt basement pseudo4. walt basement methylamine5. walt lab pseudo6. walt lab methylamine7. jessie RV pseudo8. jessie RV methylamine9. jessie basement pseudo10. jessie basement methylamine11. jessie lab pseudo12. jessie lab methylamine13. jessie so. am. methylamine14. jessie so. am. pseudo15. victor lab methylamine16. victor lab pseudo 17 / 33

Page 18: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

DataStructures

Factorial Studies Example, cont

In this case, we would have a Fractional Factorial Study -a factorial study in which no data is collected for somepossible combinations.

18 / 33

Page 19: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

Section 1.3Section 1.3Measurement: It's Importance and Di�cultyMeasurement: It's Importance and Di�culty

19 / 3319 / 33

Page 20: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

Key Words

If You Can't Measure, You Can't DoStatistics

Or Engineering For That Matter

Success in statistical engineering studies requires theability to measureMethods of measurements are available for somephysical properties (length, mass, temperature, ...)

Often, the behavior of anengineering system canbe adequately characterized in terms of suchproperties

If it cannot, engineers must carefully define what isabout the system that needs observing and then createa suitable method of measurement.

20 / 33

Page 21: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

Key Words

If You Can't Measure, You Can't DoStatistics

Example:

Two students wanted to conduct a factorial studycomparing joint strengths for combinations of threedifferent woods and three glues.

Didn't know how to have access to strength-testingequipment, so invented their own.Suspend a large container from one of the pieces ofwood involved and poured water into it until theweight was sufficient to break the joint.Knowing the volume of the water poured into thecontainer and the density of the water, they coulddetermine the force required to break the joint.

21 / 33

Page 22: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

Key Words

If You Can't Measure, You Can't DoStatistics

22 / 33

Page 23: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

Key Words

If You Can't Measure, You Can't DoStatistics

Measurement and its importance and di�culty

Validity: appropriately represent the feature ofinterest.

Variation is always present in collecting data.

Some come from the the objects under study asthey are never alike(that might be of interest tosee if the variation is due to the object)Some of it is due to the fact that the measurementprocesses have their own inherent variablity.

23 / 33

Page 24: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

Key Words

If You Can't Measure, You Can't DoStatistics

Measurement and its importance and di�culty

Precision: the amount of variation in repeatedmeasurement of the same object

A measurement system is called precise if it producessmall variation in repreated measurement of the sameobject

Precision is the internal consistency of a measurementsystem: typically, it can be improved only with basicchanges in the configuration of the system.

24 / 33

Page 25: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

Key Words

If You Can't Measure, You Can't DoStatistics

Measurement and its importance and di�culty

Precision of a measurement is important, but for manypurposes it alone is not adequate.

Accuracy: or Unbiasedness; how close ameasurement is to the true value "on average".

Accuracy is the agreement of a measuring system withsome external standard. It can be changed withoutextensive physical change in a measurement method. So,we calibrate to improve accuracy.

25 / 33

Page 26: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

Key Words

If You Can't Measure, You Can't DoStatistics

Measurement and its importance and di�culty

Calibration of a system against the standard (bringingthe measurement system in line with the standard)can be

As simple as comparing the measurement systemto a standardDeveloping an appropriate conversion schemeand then using converted values in place ofrecording observed measurements.

26 / 33

Page 27: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

Key Words

If You Can't Measure, You Can't DoStatistics

Accuracy VS. Precision

Accuracy: how close a measurement is to the truevalue "on average"Precision: the amount of variation in repeatedmeasurement of the same objectComparing measurement to target shooting

27 / 33

Page 28: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

Section 1.4Section 1.4Mathematical ModelsMathematical Models

28 / 3328 / 33

Page 29: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

MathModels

Mathematical Models and Data AnalysisA discussion on the relationships of mathematics to thephysical words and to engineering statistics.

Mathematical Model: A description of aphysical system using mathematical conceptsand language (in terms of symbols, equations,numbers, and the like)

Identifying mathematical relationships between partsof a system allows us to describe complexity in simpleterms.An effective mathematical model is the one which issimple and has predictive ability.

29 / 33

Page 30: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

MathModels

Mathematical Models and Data AnalysisExample: Height of an Object in Projectile Motion

We can describe the relationship between height of aprojectile and time as

where

is the initial height, is the initial vertical velocity, and

is the (constant) acceleration due to gravity

h t

h = h0 + vh ⋅ t − gt2,  t ≥ 0,1

2

h0vh

g

30 / 33

Page 31: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

MathModels

Mathematical Models and Data AnalysisExample: Height of an Object in Projectile Motion

31 / 33

Page 32: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

MathModels

Example: Height of an Object in Projectile Motion, cont.

However, this is not what we see in real life for a varietyof reasons. This model assumes

1. is constant as the ball falls, while actually dependson the distance between the object and earth,

2. is a known to infinite accuracy, while we would beusing a value that is estimated,

3. Gravity is the only force acting on the object, ignoringdrag force, electrical attractions, etc.

4. There are no other changes in the system (forinstance, changes in air pressure)

We can fix these by writing a better relationship or we canaccept that some things won't be known and use astochastic model - a mathematical model that specificallyallows for variation (or "randomness"). Understandinghow these stochastic models work is a major focus of thiscourse.

g g

g

32 / 33

Page 33: Part II STAT 305: Chapter 1 - GitHub Pages · Wh y Engi n eers Study Statistics Chapter 1: Intro duction, Conti nued Chapter 2: Data Collection Section 1.2 Basic Term i n ology, Conti

What andWhy

Terms

Measure

MathModels

What's my pointWe cannot say there is no variation in themeasurement or the relation under study is justaffected by the components defined in themathematical model.

We can control some parts of the variation byplanning the data collection process

There is always some error out of control which arestochastic (random)

Statistical methods help to deal with this randomness

33 / 33