Week 2 Basic Statistical Concepts, Part II

Post on 05-Jul-2020


Week 2 Objectives

1 Data presentation through numerical and graphical summaries using R:

sample mean, variance and percentiles; the box plot, histogram, stem and leaf diagram, the pie chart and bar graph, the scatter plot and scatter plot matrix.

2 The basics of comparative studies including randomization, confounding and Simpson’s paradox.

3 Statistical experiments vs observational studies, and their relevance for establishing causation.

4 Factorial design concepts: main effects and interactions.

5 Use of R for comparative graphics, the interaction plot, and for computing the main effects and interactions.


1 Data Presentation (Lab 1)

Basic Statistics and the Boxplot

Pie Charts, Bar Graphs, and Histograms

Scatterplots, Scatterplot Matrices and 3D Scatterplots

2 Comparative Studies

Randomization, Confounding and Simpson’s Paradox

3 Causation: Experiments and Observational Studies

Factorial Experiments, Main and Interaction Effects

4 Comparative Graphics (Lab 2)


Mean, Variance and Standard Deviation

• With the data set in the R object x, use:

mean(x) # for the mean
var(x); sd(x) # for the variance and standard deviation

• If the population is categorical,

table(x); table(x)/length(x)

return the sizes and proportions of the categories, respectively.

• If v contains the statistical population, use var(v)*(length(v)-1)/length(v) for the population variance.


Example

The productivity of each of the N = 10,000 employees of a company is rated on a scale from 1 to 5. Let the statistical population v1, v2, . . . , v10,000 be

vi = 1, i = 1, . . . , 300,

vi = 2, i = 301, . . . , 1,000,

vi = 3, i = 1,001, . . . , 5,000,

vi = 4, i = 5,001, . . . , 9,000,

vi = 5, i = 9,001, . . . , 10,000.

Find the population proportions for each rating category, the average rating, and the population variance and standard deviation of the rating.


Solution. Set the statistical population in v:

v=c(rep(1,300),rep(2,700),rep(3,4000),rep(4,4000),rep(5,1000))

Compute the proportions:

table(v)/10000

Compute the mean, variance and standard deviation:

mean(v); var(v)*(length(v)-1)/length(v)

sqrt(var(v)*(length(v)-1)/length(v))
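
As a cross-check on the commands above, the same population quantities can be obtained from the category proportions alone: the mean is the sum of k times p_k, and the variance is the sum of k squared times p_k, minus the squared mean. The sketch below uses only the vector v from the solution; the names p, k, mu and sigma2 are introduced here for illustration and are not part of the original slides.

```r
# Population from the example: ratings 1-5 with the stated counts
v <- c(rep(1, 300), rep(2, 700), rep(3, 4000), rep(4, 4000), rep(5, 1000))

p <- as.numeric(table(v)) / length(v)   # population proportions of ratings 1..5
k <- 1:5                                # the rating values

mu     <- sum(k * p)                    # population mean
sigma2 <- sum(k^2 * p) - mu^2           # population variance

# Both should agree with the var()-based formula used in the solution
all.equal(mu, mean(v))
all.equal(sigma2, var(v) * (length(v) - 1) / length(v))
sqrt(sigma2)                            # population standard deviation
```

By hand, the mean is 0.03(1) + 0.07(2) + 0.40(3) + 0.40(4) + 0.10(5) = 3.47, and the variance is 12.81 − 3.47² = 0.7691.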


Example

Take a simple random sample of size n = 500 from the population of employees from the previous example, and compute the sample proportions of the different ratings, the average rating, and the sample variance and standard deviation.

Solution.

x=sample(v, size = 500)
table(x)/500
mean(x); var(x); sd(x)
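
Because sample() draws at random, each run of the solution gives slightly different numbers. A minimal sketch, assuming reproducible output is wanted: fixing the seed with set.seed() (the value 401 is arbitrary) and placing the sample quantities next to their population counterparts. The object name props is introduced here for illustration only.

```r
v <- c(rep(1, 300), rep(2, 700), rep(3, 4000), rep(4, 4000), rep(5, 1000))

set.seed(401)                 # arbitrary seed, chosen only for reproducibility
x <- sample(v, size = 500)    # simple random sample, drawn without replacement

# Sample proportions next to the population proportions
props <- rbind(sample     = as.vector(table(factor(x, levels = 1:5))) / length(x),
               population = as.vector(table(v)) / length(v))
colnames(props) <- 1:5
props

# Sample mean and variance next to their population counterparts
c(sample = mean(x), population = mean(v))
c(sample = var(x),  population = var(v) * (length(v) - 1) / length(v))
```

The sample values should be close to, but not equal to, the population values.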


Sample Percentiles

With the data set in the object x, the R commands for percentiles

median(x)
quantile(x,0.25)
quantile(x,c(0.3,0.7,0.9))
summary(x)

give, respectively, the median, the 25th percentile, the 30th, 70th and 90th percentiles, and a five number summary of the data consisting of x(1), q1, x̃, q3, and x(n). [summary(x) also gives the sample mean.]
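
The percentile commands can be tried on a small vector before using real data. The values below are hypothetical and chosen only for illustration.

```r
x <- c(12, 7, 3, 9, 15, 4, 10, 8)   # hypothetical data, for illustration only

median(x)                        # sample median
quantile(x, 0.25)                # 25th percentile
quantile(x, c(0.3, 0.7, 0.9))    # several percentiles at once

s <- summary(x)                  # five number summary plus the mean
names(s)                         # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
```

Note that quantile() supports several interpolation rules through its type argument, so percentiles computed by hand with a different rule may differ slightly from R's default.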


Example

Scientists have been monitoring the ozone hole since 1980. See the images shown at http://ozonewatch.gsfc.nasa.gov. The 14 ozone measurements (Dobson units) given in the OzoneData set were taken in 2002 from the lower stratosphere, between 9 and 12 miles altitude. Obtain the five number summary as well as the 70th, 80th and 90th percentiles.

Solution: Read the data into the R object oz using

oz = read.table("http://media.pearsoncmg.com/cmg/pmmg_mml_shared/mathstatsresources/Akritas/OzoneData.txt", header = T)

and use

x=oz$OzoneData; summary(x); quantile(x, c(0.7, 0.8, 0.9))

NOTE: By typing the commands oz; x you can see the difference between the data frame oz and the data column x. Not all commands accept both: summary(oz) works but quantile(oz, c(0.7, 0.8, 0.9)) does not work.


Example

Sort the ozone measurements in increasing order and determine the sample percentile each ordered observation corresponds to.

Solution: The commands

sort(x); 100*(1:length(x) - 0.5)/length(x)

return the order statistics and the percentile each order statistic estimates.
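
The rule used in the solution is that the i-th smallest of n observations estimates the 100(i − 0.5)/n-th percentile. A small sketch with four hypothetical values (not the ozone data) makes the arithmetic easy to verify by hand:

```r
x <- c(310, 295, 330, 302)          # hypothetical values, not the ozone data
n <- length(x)

ordered <- sort(x)                   # order statistics x(1) <= ... <= x(n)
pct     <- 100 * (1:n - 0.5) / n     # ":" binds tighter than "-", so this is (1:n) - 0.5

data.frame(order.statistic = ordered, percentile = pct)
# percentile column: 12.5 37.5 62.5 87.5
```

With n = 4 the smallest observation estimates the 12.5th percentile and the largest the 87.5th, so no observation is treated as the 0th or 100th percentile.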


The Boxplot

The five number summary given by the “summary” command is the basis for the boxplot.

A boxplot displays the central 50% of the data with a box:

the lower and upper edges are at q1 and q3, respectively,
a line inside the box represents the median.

Extending from each edge of the box are whiskers:

The lower (upper) whisker extends from q1 (q3) until the smallest (largest) observation within 1.5 interquartile ranges from q1 (q3).
Observations further from the box than the whisker ends (i.e., smaller than q1 − 1.5 × IQR or larger than q3 + 1.5 × IQR) are called outliers, and are plotted individually.
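
The outlier rule can be checked numerically. The following is a minimal sketch on hypothetical data; the names lower, upper and outliers are introduced here for illustration. It computes the two cut-offs from the quartiles and compares the result with what R's own boxplot machinery reports.

```r
x <- c(1, 10, 11, 12, 13, 14, 15, 16, 30)   # hypothetical data, for illustration only

q1  <- unname(quantile(x, 0.25))   # lower quartile
q3  <- unname(quantile(x, 0.75))   # upper quartile
iqr <- q3 - q1                     # interquartile range, same as IQR(x)

lower <- q1 - 1.5 * iqr            # cut-off below the box
upper <- q3 + 1.5 * iqr            # cut-off above the box

outliers <- x[x < lower | x > upper]
outliers                           # 1 and 30 for this data

# Whisker ends: the most extreme observations still inside the cut-offs
range(x[x >= lower & x <= upper])

# The points that boxplot() would plot individually
boxplot.stats(x)$out
```

One caveat: boxplot() places the box edges at the hinges returned by fivenum(), which can differ slightly from the quartiles given by quantile() for some sample sizes. For this data set the two coincide.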


The R command boxplot

Example

Construct the box plot for the ozone data. Are there any outliers?

Solution: The ozone data are already in the object x. Use the command

boxplot(x, col="grey")

There are two outliers.


Pie Charts and Bar Graphs

Pie charts and bar graphs are used with count data to display the percentage of each category in the sample.
For example, counts (or percentages or proportions) of different ethnic, education or income categories, the market share of different car companies, and so on.
The pie chart is popular in the mass media and is one of the most widely used statistical charts in the business world.
It is a circular chart, where the sample is represented by a circle divided into sectors whose sizes represent proportions.


It has been pointed out that it is difficult to compare different sections of a given pie chart.
According to Stevens' power law, length is a better scale to use than area.
The bar graph uses bars whose heights are proportional to the proportions they represent.

Remark: When the bars are arranged in decreasing order of height, the bar graph is also called a Pareto chart. The Pareto chart is one of the key tools for quality control, where it is often used to represent the most common sources of defects in a manufacturing process, the most frequent reasons for customer complaints, etc.
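
A Pareto chart of the kind described in the remark can be sketched in base R by sorting the counts before plotting. The defect categories and counts below are invented purely for illustration.

```r
# Hypothetical defect counts, invented purely for illustration
defects <- c(Scratch = 12, Dent = 31, Misalignment = 7, Discoloration = 19)

pareto <- sort(defects, decreasing = TRUE)   # largest category first

barplot(pareto, las = 2, col = "grey",
        ylab = "Number of defects", main = "Pareto chart of defect sources")
```

Sorting first means the tallest bar, and hence the most common source of defects, appears at the left.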


R commands for the pie chart and bar graph

Example

The MarketShareLightVeh data set displays the November 2011 light vehicle market share of car companies. Import the data set into the R data frame lv, and construct a pie chart and bar graph.

Solution: The data frame lv has two columns with labels Company and Percent containing the names of companies and their percent market share, respectively. (You can see that by typing the command lv after importing the data.) The R commands for the pie chart and bar graph are:

attach(lv)
pie(Percent, labels=Company, col=rainbow(length(Percent)))
barplot(Percent, names.arg=Company, col=rainbow(length(Percent)), las=2)
detach(lv)

The option las=2 in the barplot command is what causes the company names to be written vertically.


Histograms

In histograms the range of the data is divided into bins, and a box is constructed above each bin.
The height of each box is the bin’s frequency. Alternatively, the heights can be adjusted so the histogram’s area is one.
R will automatically choose the number of bins but it also allows user specified intervals. Moreover, R offers the option of constructing a smooth histogram.
In stem and leaf plots each observation gets split into its stem, which is the beginning digit(s), and its leaf, which is the first of the remaining digits.
They retain more information about the original data but do not offer as much flexibility in selecting the bins.


The R data set faithful

x = faithful$eruptions # set the eruption duration data in x

hist(x) # basic frequency histogram

hist(x, freq = FALSE) # histogram area = 1

plot(density(x)) # basic smooth histogram

hist(x, freq = F) ; lines(density(x)) # superimposes the two

stem(x) # basic stem and leaf plot

hist(x, freq = F, col="grey", main="Histogram of Old Faithful eruption durations", xlab="Eruption durations"); lines(density(x), col="red")
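
The commands above let R choose the bins. User-specified intervals can be supplied through the breaks argument of hist(). The sketch below uses half-minute bins from 1.5 to 5.5, a choice made here only for illustration, and also checks the claim that the bar areas add up to one when freq = FALSE.

```r
x <- faithful$eruptions                 # eruption durations, built-in data set

# User-specified bin boundaries; they must span the whole range of x
brk <- seq(1.5, 5.5, by = 0.5)

h <- hist(x, breaks = brk, freq = FALSE,
          col = "grey", main = "Old Faithful, half-minute bins",
          xlab = "Eruption durations")

# With freq = FALSE the bar areas (height times width) add up to one
sum(h$density * diff(h$breaks))
```

hist() returns its bin information invisibly, which is why the result can be stored in h and inspected.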


Scatterplot with gender identification

• With the bear measurements data in the data frame br, a basic chest girth and weight scatterplot can be constructed by:

attach(br); plot(Chest.G, Weight)

• An enhanced chest girth and weight scatterplot with gender differentiation can be constructed by:

plot(Chest.G, Weight, pch=21, bg=c("red","green")[unclass(Sex)])

legend(x=22, y=400, pch = c(21,21), col = c("red","green"), legend = c("Female", "Male"))
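
The colouring in the enhanced scatterplot relies on unclass(Sex) returning the integer codes 1 and 2, which happens only when Sex is stored as a factor. A self-contained sketch of the same indexing idea follows; the data frame br_demo and its values are hypothetical stand-ins for the bear data, and the explicit factor() call is an added precaution, since recent versions of R (4.0.0 and later) no longer convert text columns to factors by default when reading data.

```r
# Hypothetical stand-in for the bear data; values are invented for illustration
br_demo <- data.frame(
  Chest.G = c(26, 45, 54, 31, 38, 49),
  Weight  = c(80, 344, 416, 120, 166, 365),
  Sex     = c("F", "M", "M", "F", "F", "M")
)

# unclass() gives integer codes only for factors, so convert explicitly
sex  <- factor(br_demo$Sex, levels = c("F", "M"))
cols <- c("red", "green")[unclass(sex)]   # "red" for F (code 1), "green" for M (code 2)

plot(br_demo$Chest.G, br_demo$Weight, pch = 21, bg = cols,
     xlab = "Chest girth", ylab = "Weight")
legend("topleft", pch = 21, pt.bg = c("red", "green"),
       legend = c("Female", "Male"))
```

Here pt.bg fills the legend symbols with the same colours as the plotted points, and "topleft" places the legend without needing plot coordinates.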


Scatterplot matrix with gender identification

• For more than two variables, a scatterplot matrix arranges all pairwise scatterplots in a matrix form. With the bear measurements in the data frame br use the command:

pairs(br[4:8], pch=21, bg=c("red", "green")[unclass(Sex)]) # br[4:8] is a data frame consisting of columns 4-8

• (∗) For a variation, which gives histograms on the diagonal and additional information, use the commands:

install.packages("psych") # installs the package psych

library(psych) # it activates the package

pairs.panels(br[4:8], pch=21, bg=c("red", "green")[unclass(Sex)])


(∗) 3D Scatterplots

With the bear measurements data in the data frame br (use install.packages("scatterplot3d") if not installed before) use:

library(scatterplot3d); scatterplot3d(br[6:8]) # for the basic 3D scatterplot

scatterplot3d(br[6:8], angle=35, col.axis="blue", col.grid="lightblue", color="red") # angle and color controls

scatterplot3d(br[6:8], angle=35, col.axis="blue", col.grid="lightblue", color="red", type="h", box=F) # vertical lines, no box

scatterplot3d(br[6:8], pch=21, bg=c("red","green")[unclass(br$Sex)]) # with gender differentiation

detach(br)


Output options(∗)

The figure can be saved as pdf, or jpg etc. Alternatively:

pdf("Desktop/HistOF.pdf") # saves figure in Desktop/HistOF.pdf

hist(x, freq = F, col="grey"); lines(density(x), col="red")

dev.off() # this must be done before opening the pdf file.

To save it as a jpg file replace pdf("Desktop/HistOF.pdf") in the above set of commands by jpeg("Desktop/HistOF.jpg").
To save text output to a txt file, for example the stem and leaf plot, copy and paste, or use:

sink("Desktop/StemOF.txt"); stem(x); sink(file=NULL)


Randomization, Confounding and Simpson’s Paradox

• Comparative studies aim at discerning and explaining differences between two or more populations. Examples include:

the comparison of two methods of cloud seeding for hail and fog suppression at international airports,
the comparison of the survival times of a type of root system under different watering regimens,
the comparison of the effectiveness of three cleaning products in removing four different types of stains.


• Some common terms used in comparative studies are:

Experimental units: These are the subjects or objects on which measurements are made.
Response variable: The variable being measured.
One-factor studies: factor levels; treatments; populations.
Multi-factor studies: factor level combinations; treatments; populations.

• The notions of factor(s), factor levels and factor level combinations are explained in the two examples that follow.


Example

To compare the effect of four different watering regimens on the survival times of a type of root system,

The roots are the experimental units.
The response variable is the survival time.
Watering is the factor.
The different watering regimens are the factor levels or treatments. Treatments correspond to populations.


Example

In the same root survival study as above, it is desired to also study the effect of depth on the survival of the root systems. Two different depths are to be considered. This is now a two-factor study:

Factor A is depth with two levels. Factor B is watering with four levels.
Treatments, or populations, are the different factor level combinations. There are 2 × 4 = 8 treatments.
As before, the root systems are the experimental units, and the survival time is the response variable.


• The following table shows the eight factor level combinations of the above two-factor study:

              Factor B
Factor A    1      2      3      4
   1       Tr11   Tr12   Tr13   Tr14
   2       Tr21   Tr22   Tr23   Tr24
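
The table of factor level combinations can also be generated in R. The sketch below uses expand.grid() to list every combination of the two factors; the object names treatments and layout are introduced here for illustration.

```r
# Factor A: depth (2 levels); Factor B: watering regimen (4 levels)
treatments <- expand.grid(A = 1:2, B = 1:4)

# Label each combination as in the table, e.g. Tr23 = level 2 of A with level 3 of B
treatments$label <- paste0("Tr", treatments$A, treatments$B)

nrow(treatments)    # 2 x 4 = 8 treatments

# Arrange the labels with Factor A in rows and Factor B in columns
layout <- matrix(treatments$label, nrow = 2,
                 dimnames = list(A = c("1", "2"), B = c("1", "2", "3", "4")))
layout
```

The same call extends to more factors or more levels by adding arguments to expand.grid().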


Contrasts

Comparisons of treatments, or populations, typically focus on differences (e.g., of means, or proportions). Such differences are called contrasts.

For example, the comparison of two different cloud seeding methods may focus on the simple contrast µ1 − µ2.

In one-factor studies where the factor has more than two levels, a number of different contrasts may be of interest. An example follows.
In multi-factor studies interest lies in more specialized contrasts, which are discussed in the section on Factorial Experiments.


Example

In a study to compare the mean tread life of four types of high performance tires, possible sets of contrasts of interest are

1 µ1 − µ2, µ1 − µ3, µ1 − µ4 (control vs treatment)

2 (µ1 + µ2)/2 − (µ3 + µ4)/2 (brand A vs brand B)

3 µ1 − µ, µ2 − µ, µ3 − µ, µ4 − µ (tire effects)
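
To make the three sets of contrasts concrete, they can be evaluated for a set of made-up means. The numbers below are hypothetical, and the sketch assumes that µ in the third set denotes the average of the four tire means, which the slide does not state explicitly.

```r
# Hypothetical mean tread lives (thousands of miles), invented for illustration
mu <- c(mu1 = 40, mu2 = 42, mu3 = 45, mu4 = 47)

# 1. Control vs treatment: tire 1 against each of the others
control_vs_trt <- mu["mu1"] - mu[c("mu2", "mu3", "mu4")]

# 2. Brand A (tires 1 and 2) vs brand B (tires 3 and 4)
brand_contrast <- unname((mu["mu1"] + mu["mu2"]) / 2 - (mu["mu3"] + mu["mu4"]) / 2)

# 3. Tire effects, ASSUMING mu denotes the average of the four means
tire_effects <- mu - mean(mu)

control_vs_trt    # -2 -5 -7
brand_contrast    # -5
tire_effects      # -3.5 -1.5 1.5 3.5
```

Under that assumption the tire effects always sum to zero, whatever the four means are.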


• To avoid comparing apples with oranges, the experimental units for the different treatments must be homogeneous.

If fabric age affects the effectiveness of cleaning products then, unless the fabrics used in different treatments are age-homogeneous, the comparison of treatments will be distorted.

• To mitigate the distorting effects, or confounding, of other possible factors, called lurking variables, it is recommended that the allocation of units to treatments be randomized.


Randomizing the allocation of fabric pieces to the different treatments (cleaning product and stain) avoids confounding with the factor age of fabric.
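
One way such a randomized allocation might be carried out in R is sketched below. The number of fabric pieces (two per treatment, 24 in total), the labels P1 to P3 and S1 to S4, and the seed are all assumptions made for illustration.

```r
set.seed(2)   # arbitrary seed, used only to make the allocation reproducible

# 3 cleaning products x 4 stain types = 12 treatments
treatments <- expand.grid(product = paste0("P", 1:3), stain = paste0("S", 1:4))

# 24 hypothetical fabric pieces, 2 per treatment (the replication is an assumption)
n_per_trt <- 2
pieces <- 1:(n_per_trt * nrow(treatments))

# Shuffle the treatment labels so each piece receives its treatment at random
assigned <- sample(rep(seq_len(nrow(treatments)), each = n_per_trt))

allocation <- data.frame(piece = pieces, treatments[assigned, ])
head(allocation)

table(assigned)   # every treatment receives exactly 2 pieces
```

Because the shuffle ignores fabric age, older and newer pieces are spread across treatments by chance rather than by any systematic pattern.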

• The distortion caused by lurking variables in the comparison of proportions is called Simpson’s Paradox.


Example

The success rates of two treatments, Treatments A and B, for kidney stones are:

Treatment A      Treatment B
78% (273/350)    83% (289/350)

The obvious conclusion is that Treatment B is more effective. The lurking variable here is the size of the kidney stone.


Example (Kidney Stone Example Continued)

When the size of the treated kidney stone is taken into consideration, the success rates are as follows:

        Small             Large             Combined
Tr.A    81/87 or .93      192/263 or .73    273/350 or .78
Tr.B    234/270 or .87    55/80 or .69      289/350 or .83

Now we see that Treatment A has a higher success rate for both small and large stones.
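
The reversal can be reproduced directly from the counts in the table. The sketch below recomputes the rates and also shows how each treatment's patients split between small and large stones, which is what drives the paradox; the object names are introduced here for illustration.

```r
# Successes and totals from the kidney stone table
success <- matrix(c(81, 192,
                    234, 55), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Tr.A", "Tr.B"), c("Small", "Large")))
total   <- matrix(c(87, 263,
                    270, 80), nrow = 2, byrow = TRUE,
                  dimnames = dimnames(success))

by_size  <- success / total                      # success rate within each stone size
combined <- rowSums(success) / rowSums(total)    # success rate ignoring stone size

round(by_size, 2)
round(combined, 2)

# Share of each treatment's patients who had small stones
total[, "Small"] / rowSums(total)
```

Treatment A was applied mostly to large stones (263 of 350 cases), which are harder to treat, while Treatment B was applied mostly to small stones (270 of 350 cases). The combined rates therefore weight the two stone sizes very differently for the two treatments.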


Example (Batting Averages)
The overall batting averages of baseball players Derek Jeter and David Justice during the years 1995 and 1996 were 0.310 and 0.270, respectively. But looking at each year separately we get a different picture:

           1995               1996               Combined
Jeter      12/48 or .250      183/582 or .314    195/630 or .310
Justice    104/411 or .253    45/140 or .321     149/551 or .270

Justice had a higher batting average than Jeter in both 1995 and 1996.

Factorial Experiments, Main and Interaction Effects

Definition
A study is called a statistical experiment if the investigator controls the allocation of units to treatments or factor-level combinations, and this allocation is done in a randomized fashion. Otherwise the study is called observational.

• Causation can be established directly only via a statistical experiment. Thus, an observed relation between salary increase and productivity does not by itself imply that salary increases cause increased productivity.
• Observational studies cannot establish causation, unless there is additional corroborating evidence. Thus, the link between smoking and health has been established through observational studies with the use of additional corroborating evidence.

Outline
1 Data Presentation (Lab 1)
    Basic Statistics and the Boxplot
    Pie Charts, Bar Graphs, and Histograms
    Scatterplots, Scatterplot Matrices and 3D Scatterplots
2 Comparative Studies
    Randomization, Confounding and Simpson’s Paradox
3 Causation: Experiments and Observational Studies
    Factorial Experiments, Main and Interaction Effects
4 Comparative Graphics (Lab 2)

A statistical experiment involving several factors is called a factorial experiment if all factor-level combinations are considered. Thus,

            Factor B
Factor A    1       2       3       4
1           Tr11    Tr12    Tr13    Tr14
2           Tr21    Tr22    Tr23    Tr24

is a factorial experiment if all 8 treatments are included in the study.
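As an illustration, the 2 × 4 layout above can be generated in R with expand.grid, and units can then be allocated to the treatments at random. This is only a sketch: the 24 units, the 3 units per treatment, and the seed are assumptions made for the example, not part of the course material.

```r
# All factor-level combinations for factor A (2 levels) and factor B (4 levels)
design <- expand.grid(A = factor(1:2), B = factor(1:4))
design$treatment <- paste0("Tr", design$A, design$B)
design

# Randomized allocation of 24 hypothetical units, 3 per treatment
set.seed(401)   # arbitrary seed, for reproducibility only
allocation <- data.frame(unit = 1:24,
                         treatment = sample(rep(design$treatment, each = 3)))
table(allocation$treatment)   # every treatment receives 3 units
```

Leaving out any row of design would mean the study is no longer a full factorial experiment in the sense defined above.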

Main Effects and Interactions

In factorial experiments it is not enough to consider differences between the levels within each factor separately. Possible synergistic effects are also of interest.

Definition
If there are synergistic effects between two different factors, i.e., when a change in the level of factor A has different effects on the response depending on the level of factor B, we say that there is interaction between the two factors. The absence of interaction is called additivity.

Example
An experiment considers two types of corn, used for bio-fuel, and two types of fertilizer. The following two tables give possible population mean yields for the four combinations of seed type and fertilizer type.

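                       Fertilizer                      Row              Main Row
                       I             II              Averages         Effects
Seed A                 µ11 = 107     µ12 = 111       µ1· = 109        α1 = −0.25
Seed B                 µ21 = 109     µ22 = 110       µ2· = 109.5      α2 = 0.25
Column Averages        µ·1 = 108     µ·2 = 110.5     µ·· = 109.25
Main Column Effects    β1 = −1.25    β2 = 1.25

Here the factors interact: moving from fertilizer I to fertilizer II raises the mean yield by 4 for seed A (107 to 111) but only by 1 for seed B (109 to 110).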

                       Fertilizer                  Row            Main Row
                       I           II            Averages       Effects
Seed A                 µ11 = 107   µ12 = 111     µ1· = 109      α1 = −1
Seed B                 µ21 = 109   µ22 = 113     µ2· = 111      α2 = 1
Column Averages        µ·1 = 108   µ·2 = 112     µ·· = 110
Main Column Effects    β1 = −2     β2 = 2

Here the factors do not interact: moving from fertilizer I to fertilizer II raises the mean yield by 4 for both seed types.

Under additivity:
• There is an indisputably best level for each factor, and
• The best factor level combination is that of the best level of factor A with the best level of factor B.
What is the best level of each factor in the above design?

Under additivity, the comparison of the levels within each factor is based on the factor’s main effects:

αi = µi· − µ··, βj = µ·j − µ··

Under additivity, µij = µ·· + αi + βj

When the factors interact, the cell means are not given in terms of the main effects as above. The difference

γij = µij − (µ·· + αi + βj)

quantifies the interaction effect. For example, in the above non-additive design,

γ11 = µ11 − µ·· − α1 − β1
    = 107 − 109.25 + 0.25 + 1.25
    = −0.75.
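The main effects and interaction effects of the two corn tables can be reproduced with a few lines of R. This is a sketch only; the helper two_way_effects and the object names are hypothetical and introduced here for illustration.

```r
# Hypothetical helper: main effects and interactions from a matrix of cell means
two_way_effects <- function(mu) {
  overall <- mean(mu)                    # mu..
  alpha <- rowMeans(mu) - overall        # main row effects
  beta  <- colMeans(mu) - overall        # main column effects
  gamma <- mu - overall - outer(alpha, beta, "+")   # interaction effects
  list(overall = overall, alpha = alpha, beta = beta, gamma = gamma)
}

# Cell means of the non-additive and the additive corn tables
mu_interact <- matrix(c(107, 111,
                        109, 110), nrow = 2, byrow = TRUE,
                      dimnames = list(seed = c("A", "B"),
                                      fertilizer = c("I", "II")))
mu_additive <- matrix(c(107, 111,
                        109, 113), nrow = 2, byrow = TRUE,
                      dimnames = dimnames(mu_interact))

e_interact <- two_way_effects(mu_interact)
e_additive <- two_way_effects(mu_additive)

e_interact$gamma   # entry [1, 1] is the gamma11 computed by hand above
e_additive$gamma   # all entries are zero under additivity
```

In both tables the interaction effects sum to zero across each row and each column, which follows from how the main effects are defined.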

Data Versions of Main Effects and Interactions

Data from a two-factor factorial experiment use three subscripts:

            Factor B
Factor A    1                         2                         3
1           x11k, k = 1, ..., n11     x12k, k = 1, ..., n12     x13k, k = 1, ..., n13
2           x21k, k = 1, ..., n21     x22k, k = 1, ..., n22     x23k, k = 1, ..., n23

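Sample versions of main effects and interactions are defined using the cell sample means

$$\bar{x}_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} x_{ijk}$$

instead of µij, with the row, column and overall averages of the cell sample means written as $\bar{x}_{i\cdot}$, $\bar{x}_{\cdot j}$ and $\bar{x}_{\cdot\cdot}$:

Sample main row and column effects:

$$\hat{\alpha}_i = \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}, \qquad \hat{\beta}_j = \bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot}$$

Sample interaction effects:

$$\hat{\gamma}_{ij} = \bar{x}_{ij} - \left( \bar{x}_{\cdot\cdot} + \hat{\alpha}_i + \hat{\beta}_j \right)$$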

• Sample versions of main effects and interactions estimate their population counterparts but, in general, they are not equal to them.
• Thus, even if the data have come from an additive design, the sample interaction effects will generally not be exactly zero.
• The interaction plot is a graphical technique that can help assess whether the sample interaction effects are significantly different from zero.
• For each level of, say, factor B, the interaction plot traces the cell means along the levels of factor A. See http://personal.psu.edu/acq/401/fig/CloudSeedInterPlot.pdf for an example.
• For data coming from additive designs, these traces (or profiles) should be approximately parallel.
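A small simulation may help make these points concrete. The sketch below generates data from an additive 2 × 2 design and then computes the sample interaction effects and draws the interaction plot. The population values, the noise level, the 5 replicates per cell and the seed are all assumptions made for illustration.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# Additive population cell means: mu_ij = 110 + alpha_i + beta_j
alpha <- c(-1, 1)
beta  <- c(-2, 2)
sim <- expand.grid(rep = 1:5, A = factor(1:2), B = factor(1:2))
sim$y <- 110 + alpha[as.integer(sim$A)] + beta[as.integer(sim$B)] +
  rnorm(nrow(sim), sd = 1.5)   # noise sd is an arbitrary choice

xbar  <- tapply(sim$y, sim[, c("A", "B")], mean)   # sample cell means
a_hat <- rowMeans(xbar) - mean(xbar)               # sample main row effects
b_hat <- colMeans(xbar) - mean(xbar)               # sample main column effects
g_hat <- xbar - mean(xbar) - outer(a_hat, b_hat, "+")
g_hat   # sample interaction effects: small, but not exactly zero

# One trace per level of B, drawn along the levels of A
interaction.plot(sim$A, sim$B, sim$y, xlab = "Factor A",
                 ylab = "Cell means", trace.label = "Factor B")
```

With data like these the two traces are typically close to parallel without being exactly so, which is the pattern to look for when judging additivity from the plot.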

• In this unit we will see the comparative boxplot, the comparative bar graph, and the interaction plot:
  - The comparative boxplot consists of side-by-side individual boxplots for the data sets from each population. It provides a visual impression of differences in the median and percentiles of the levels in one-factor studies.
  - The comparative bar graph provides visual comparison of the categories’ proportions for two populations.
  - The interaction plot provides a visual aid for assessing the presence of interactions in two-factor studies.

• R commands for computing the main effects and interactions for two-factor designs will also be given in this unit.

The Comparative Boxplot

Example
Iron concentration measurements from four ore formations are given in the FeData data set. Read this data into the R data frame fe, and construct a comparative boxplot.

Solution: With the data set read into the data frame fe, use the commands

fe[1:3,] # to see what the data frame looks like

boxplot(fe$conc ~ fe$ind, col=rainbow(4))
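For readers without the FeData file at hand, the same command can be tried on made-up data. The sketch below assumes a data frame with the two columns used above (conc for the measurement, ind for the formation). The simulated values and the name fe_demo are hypothetical and do not reproduce the real data.

```r
set.seed(2)   # arbitrary seed, for reproducibility only

# Hypothetical stand-in for FeData: 'conc' = iron concentration,
# 'ind' = ore formation label (values are simulated, not real)
fe_demo <- data.frame(
  conc = c(rnorm(10, 27, 3), rnorm(10, 30, 3),
           rnorm(10, 33, 3), rnorm(10, 36, 3)),
  ind  = factor(rep(1:4, each = 10))
)

bp <- boxplot(conc ~ ind, data = fe_demo, col = rainbow(4),
              xlab = "Ore formation", ylab = "Iron concentration")
bp$stats[3, ]   # the four group medians drawn in the plot
```

Using the data argument instead of fe$conc ~ fe$ind is a stylistic choice that keeps the axis labels free of the fe$ prefix; both forms draw the same boxes.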

The Notched Boxplot

Notched boxplots provide additional information through the notches: If notches do not overlap we may, as an informal test, conclude that the population medians differ.

Import the steel strength (SteelStrengthData.txt) and the robot reaction times (RobotReactTime.txt) data in the data frames ss and rt, respectively, use attach(ss); attach(rt), and compare the notched boxplots produced by

boxplot(Value ~ Sample, col=rainbow(2), notch=TRUE)

boxplot(Time ~ Robot, col=rainbow(2), notch=TRUE); detach(ss); detach(rt)
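The notch limits can also be inspected numerically, since boxplot returns them in the conf component of its (invisible) result. The sketch below uses simulated data; the data frame demo, its column names and the group means are assumptions for illustration and are unrelated to the steel or robot data sets.

```r
set.seed(3)   # arbitrary seed, for reproducibility only

# Two hypothetical samples of 40 observations each
demo <- data.frame(
  value  = c(rnorm(40, mean = 50, sd = 5), rnorm(40, mean = 56, sd = 5)),
  sample = factor(rep(c("S1", "S2"), each = 40))
)

nb <- boxplot(value ~ sample, data = demo, col = rainbow(2), notch = TRUE)
nb$conf   # row 1: lower notch limits, row 2: upper notch limits; one column per group

# Do the two notch intervals overlap?
overlap <- nb$conf[1, 2] <= nb$conf[2, 1] && nb$conf[1, 1] <= nb$conf[2, 2]
overlap   # FALSE would suggest, informally, that the population medians differ
```

This is the same informal comparison described above, carried out on the numbers rather than by eye.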

The Comparative Bar Graph

Example
The light vehicle market share of car companies for the month of November in 2010 and 2011 is given in the data file MarketShareLightVehComp.txt. Construct a comparative bar graph.

Solution: With the data set read into the data frame lv2, use the commands

m=rbind(lv2$Percent_2010, lv2$Percent_2011)

barplot(m, names.arg=lv2$Company, ylim=c(0,20), col=c("darkblue", "red"), legend.text=c("2010","2011"), beside=TRUE, las=2)
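To see how the two commands fit together, here is a self-contained sketch on made-up numbers. The data frame lv_demo, the company names and the percentages are hypothetical; the column names Percent_2010 and Percent_2011 are assumed to match those in the data file.

```r
# Hypothetical shares (percent); not the values in MarketShareLightVehComp.txt
lv_demo <- data.frame(
  Company      = c("Firm A", "Firm B", "Firm C"),
  Percent_2010 = c(18.0, 15.5, 12.0),
  Percent_2011 = c(17.0, 16.5, 13.5)
)

# One row per year, one column per company
m <- rbind("2010" = lv_demo$Percent_2010, "2011" = lv_demo$Percent_2011)

# beside = TRUE places the two years next to each other for each company
mids <- barplot(m, names.arg = lv_demo$Company, ylim = c(0, 20),
                col = c("darkblue", "red"), legend.text = rownames(m),
                beside = TRUE, las = 2)
dim(mids)   # one bar midpoint per (year, company) pair
```

The key step is rbind: barplot treats each column of the matrix as a group and each row as a bar within the group, so stacking the two years as rows gives the side-by-side comparison.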

The Interaction Plot

• The interaction plot is a useful graphical technique for assessing whether the sample interaction effects are sufficiently different from zero to imply a non-additive design.
• For each level of one factor, say factor B, the interaction plot traces the cell means along the levels of the other factor. If the design is additive, these traces (also called profiles) should be approximately parallel.

Example (Cloud seeding in Tasmania)

The cloud seeding data (CloudSeed2w.txt) was collected to study the effect of the factors seed and season on rainfall (source: Miller, A.J., et al. (1979), Analyzing the results of a cloud-seeding experiment in Tasmania, Communications in Statistics - Theory & Methods, A8(10), 1017-1047). Construct the interaction plot.

Solution: Import the data set into the data frame cs and use attach(cs) and the command

interaction.plot(season, seeded, rain, col=c(2,3), lty=1, xlab="Season", ylab="Cell Means of Rainfall", trace.label="Seeding")

Computation of the main and interaction effects of a two-factor design requires the use of the tapply command in R, and the commands for computing certain means of a matrix. These commands are presented first.

The R function tapply

• The tapply function is useful when we need to break up a set of numbers into subgroups, which are defined by some classifying factor(s), compute a statistic on each subgroup, and return the results in a convenient form.

• We will use the tapply function to compute the sample averages in all factor-level combinations in a × b designs, i.e., we will break up the set of values of the response variable into the subgroups defined by the factor level combinations, compute the sample mean for each subgroup, and return the results in a matrix form. See the example that follows.

Example
Compute the sample averages for all factor-level combinations of the cloud seeding data set.

Solution: With the cloud seeding data set in the data frame cs, use the command

mcm=tapply(cs$rain, cs[,c(2,3)], mean) # matrix of cell means

mcm # to display the results
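The same tapply call can be tried on a tiny made-up data frame, where the cell means are easy to check by hand. The data frame toy and its values are hypothetical; only the layout (one response column, two classifying factors) is meant to mirror the cloud seeding data.

```r
# Hypothetical data: one response column and two classifying factors
toy <- data.frame(
  rain   = c(1.2, 0.8, 2.0, 2.4, 0.5, 0.7, 1.9, 2.1),
  seeded = factor(rep(c("no", "yes"), each = 4)),
  season = factor(rep(c("autumn", "winter"), times = 4))
)

# Rows of the result correspond to 'seeded', columns to 'season'
cell_means <- tapply(toy$rain, toy[, c("seeded", "season")], mean)
cell_means
```

Selecting the factor columns by name, as done here, is an alternative to the positional cs[,c(2,3)] and does not depend on the column order of the data frame.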

Means of a Matrix

• The functions mean, rowMeans, and colMeans, when applied to a matrix, return the mean of all elements of the matrix, the vector of row means, and the vector of column means, respectively.

mean(mcm) # returns the mean of all sample means

rowMeans(mcm) # returns the vector of row means

colMeans(mcm) # returns the vector of column means

Main Effects and Interactions in R

Example
Use R to compute the main and interaction effects for the factors seed and season of the cloud seeding data.

Solution: Use the following commands:

alphas=rowMeans(mcm)-mean(mcm) # main row effects

betas=colMeans(mcm)-mean(mcm) # main column effects

gammas=t(mcm-mean(mcm)-alphas)-betas # matrix of interaction effects (transposed: its rows correspond to the columns of mcm)
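The three commands rely on R recycling a vector down the columns of a matrix, which is why the transpose is needed before subtracting betas. The sketch below applies them to a small hypothetical 2 × 3 matrix of cell means (mcm_demo, with made-up values) and compares the result with an equivalent computation based on outer.

```r
# Hypothetical 2 x 3 matrix of cell means (a = 2 row levels, b = 3 column levels)
mcm_demo <- matrix(c(10, 12, 17,
                     14, 15, 16), nrow = 2, byrow = TRUE,
                   dimnames = list(A = c("a1", "a2"),
                                   B = c("b1", "b2", "b3")))

alphas <- rowMeans(mcm_demo) - mean(mcm_demo)   # main row effects
betas  <- colMeans(mcm_demo) - mean(mcm_demo)   # main column effects

# Same expression as in the solution above
gammas <- t(mcm_demo - mean(mcm_demo) - alphas) - betas
dim(gammas)   # 3 x 2: rows index the column factor

# Same effects in the orientation of mcm_demo, via outer()
gammas_alt <- mcm_demo - mean(mcm_demo) - outer(alphas, betas, "+")
max(abs(t(gammas) - gammas_alt))   # the two computations agree
```

If the interaction effects are wanted in the same layout as the cell means, t(gammas) or the outer-based form gives that directly.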
