Math 80 lects week3 f19 - fredpark.com… · Chap 2: 3,8,12,21,40,41,43,44,45,49-55, 69-72. Math...

Math 80 Lecture 7

Post on 16-Oct-2020

Math 80• Lecture 7

Math 80: Elementary Statistics
Lecture 7

Dr. Fred Park

Graphical Representation of Data Cont’d.

Stem and Leaf Plot: a quick way to look at small amounts of numerical data

Math 80 Test Grades example (grades are a bit high)
percentage grades of 25 students
Draw a stem and leaf plot

Divide each number so that the tens digit is the stem and the ones digit is the leaf:
62 --> 6|2
place on vertical chart

Stems go on the chart (high to low); the leaf 2 is placed to the right of the stem 6.

6|2, 8|7, then the remaining values

sort leaf values horizontally (low to high)
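The steps above (split into stem and leaf, place on the chart, sort the leaves) can be sketched in R. This is only a sketch: the grades below are hypothetical, since the class data is not reproduced in these notes, and the names `stems`, `leaves` and `plot_rows` are made up for illustration.

```r
# Hypothetical grades, NOT the class data from the slide
grades <- c(62, 87, 71, 75, 68, 93, 79, 70, 84, 66, 73, 58, 77)

stems  <- grades %/% 10   # tens digit is the stem
leaves <- grades %% 10    # ones digit is the leaf

# group the leaves by stem and sort each group low to high
plot_rows <- lapply(split(leaves, stems), sort)

# print the stems high to low, with the sorted leaves to the right
for (s in rev(names(plot_rows))) {
  cat(s, "|", paste(plot_rows[[s]], collapse = " "), "\n")
}

# base R's stem() draws a similar display (it lists stems low to high)
stem(grades)
```

The hand-built version keeps the high-to-low stem order used on the slide; `stem()` is the quicker route when the order does not matter.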

Now we can interpret the data:
• somewhat symmetric
• center roughly 70

Graphical Representation of Data Cont’d: Scatter Plots

Q: What if we want to see whether two different variables are related?

ex. Is there a relationship between elevation and temperature on a given day?

prelim: state the random variables
x = altitude
y = high temperature
plot x vs y

Scatter Plots in R

# get the input values
input <- mtcars[, c("wt", "mpg")]

# plot the chart for cars w/ weights between 2.5 and 5
# and mileage between 15 and 30 (no prius)
plot(x = input$wt, y = input$mpg,
     xlab = "weight", ylab = "mileage",
     xlim = c(2.5, 5), ylim = c(15, 30),
     main = "Weight vs Mileage")

• Lecture 8

class exercise:

ex 1. The table contains the value of a house and the amount of rental income the house brings in per year. Create a scatter plot and state whether there is a relationship between the value of the house and the annual rental income.

Use R for this one!

ex 2.

do by hand!

HW#2: Due Next Thursday 10/3
Chap 2: 3,8,12,21,40,41,43,44,45,49-55, 69-72.

Graphical Representation of Data Cont’d: Histograms

Q: What if we want to see the distribution of continuous numerical data?

A histogram consists of adjoining boxes where
• each height is the frequency or relative frequency, shown on one axis
• the other axis shows the binned values or the partition points between values

f = frequency
n = total # of data values
RF = relative frequency = f/n

ex. data = {1,2,2,3,3,3,4,4,5}
bins = {[0.5,1.5), [1.5,2.5), [2.5,3.5), [3.5,4.5), [4.5,5.5)} = {b1,b2,b3,b4,b5}

frequency of 1 in bin 1 = 1
frequency of 2 in bin 2 = 2
frequency of 3 in bin 3 = 3
frequency of 4 in bin 4 = 2
frequency of 5 in bin 5 = 1

R code:
%r
x <- c(1,2,2,3,3,3,4,4,5)
bins <- seq(0.5, 5.5, by = 1)
hist(x, breaks = bins, col = "red")

rel freq of 1 = 1/9
rel freq of 2 = 2/9
rel freq of 3 = 3/9
rel freq of 4 = 2/9
rel freq of 5 = 1/9

R code:
%r
x <- c(1,2,2,3,3,3,4,4,5)
bins <- seq(0.5, 5.5, by = 1)
hist(x, breaks = bins, freq = FALSE, col = "red")
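The frequencies and relative frequencies listed above can also be tabulated without drawing anything. A minimal sketch, assuming the same data and bins; `cut()` with `right = FALSE` is used here to get the left-closed bins [a, b), and the names `f` and `RF` follow the notation f and RF = f/n.

```r
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
bins <- seq(0.5, 5.5, by = 1)

# assign each value to a left-closed bin [a, b) and count per bin
f  <- table(cut(x, breaks = bins, right = FALSE))
n  <- length(x)
RF <- f / n   # relative frequency = f/n

print(data.frame(bin = names(f),
                 f   = as.vector(f),
                 RF  = round(as.vector(RF), 3)))
```

One thing to keep in mind: `hist(..., freq = FALSE)` plots density (relative frequency divided by bin width). With a bin width of 1, as here, the bar heights equal the relative frequencies.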

Difference between a histogram and a bar chart?

Note: for bin boundaries, you can use [ , ) or ( , ] or ( , ) or even [ , ].
The first two are better since a value sitting on a boundary is not double counted.
Which one is used depends on the book.

Software implementations use some combination of these boundaries.
We will use [ , ).
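The boundary convention only matters when a data value lands exactly on a break point. A small sketch in R, using toy data chosen for that purpose (not from the slides): `hist()` uses right-closed bins ( , ] by default, and `right = FALSE` switches to [ , ).

```r
x <- c(1, 2, 2, 3)   # toy data: every value sits on a break point
breaks <- 1:4

# default: right-closed bins (a, b]; the lowest break is included in the first bin
h_right <- hist(x, breaks = breaks, plot = FALSE)

# right = FALSE: left-closed bins [a, b); the highest break is included in the last bin
h_left <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)

print(h_right$counts)
print(h_left$counts)
```

The two calls count the same four values but put them in different bins, which is why offsetting the break points by 0.5 for integer data avoids the issue altogether.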

• Lecture 9

Graphical Representation of Data Cont’d: Histograms

Monthly rent data
Create a histogram by hand and in R

R code:
%r
x <- c(1500,1500,1250,900,1350,1150,600,800,350,1500,610,2550,1200,900,960,495,850,1400,890,1200,900,1100,1325,690)

xmin = min(x)
xmax = max(x)
sprintf("min = %2.1f, max = %2.1f", xmin, xmax)

L = xmax - xmin   # length of the data
print(L)
nBins = 7
# bin width: round up to next integer even if already an integer
if (L %% nBins == 0) {
  bw = L/nBins + 1
} else {
  bw = ceiling(L/nBins)
}
sprintf("bw = %2.0f", bw)
# break points start 1/2 below xmin so no data value lands on a boundary
bin_lim <- seq(xmin - 1/2, xmax + 1/2 + bw, by = bw)
cat("bin limits = ", bin_lim)
# bin midpoints = left break point + bw/2
bin_mpt <- c(507,822,1137,1452,1767,2082,2397)
hist(x, breaks = bin_lim, freq = TRUE, col = "red")

See the example from class on hand construction of a histogram of the data
D = {1,2,2,3,3,3,4,4,5}

• Lecture 10

Graphical Representation of Data Cont’d: Histograms
D = {1,2,2,3,3,3,4,4,5}

xmin = 1, xmax = 5

n = # of data points = 9

L = length of data = xmax-xmin = 5-1 = 4

nb = # of bins = 5 (usually n^(1/2) for larger data sets)

bw = bin width = L/#bins = 4/5 = 0.8, rounded up to 1 (take ceiling if decimal, else +1 if integer)

del = partition offset = 0.5 (if integer valued data; free choice depending on data)

break points = bp = bin boundaries
= {xmin-del, xmin-del+bw, xmin-del+2*bw, ..., xmin-del+nb*bw}
= {1-0.5, 1-0.5+1, 1-0.5+2, 1-0.5+3, 1-0.5+4, 1-0.5+5}
= {0.5, 1.5, 2.5, 3.5, 4.5, 5.5} (note: #bp’s = 6)

bins = {[0.5,1.5), [1.5,2.5), [2.5,3.5), [3.5,4.5), [4.5,5.5)}
= {b1, b2, b3, b4, b5}

#bins = #bp’s-1 = 6-1 = 5

[Figure: number line with the data values 1 to 5 marked, running from xmin-del to xmin-del+5*bw; neighboring break points are bw apart]
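The hand recipe for the break points can be written as a short R function. This is a sketch under the rules stated in these notes (bin width = ceiling of L/nb, or L/nb + 1 when L/nb is already an integer; offset del = 0.5 for integer data); the function name `break_points` is made up for illustration.

```r
# Break points xmin-del, xmin-del+bw, ..., xmin-del+nb*bw
break_points <- function(x, nb, del = 0.5) {
  L  <- max(x) - min(x)                              # length of the data
  q  <- L / nb
  bw <- if (q == floor(q)) q + 1 else ceiling(q)     # round up, +1 if already an integer
  min(x) - del + (0:nb) * bw
}

D  <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)
bp <- break_points(D, nb = 5)
print(bp)
cat("# of bins =", length(bp) - 1, "\n")
```

The result can be passed straight to `hist(D, breaks = bp)`.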

bin         bin center   data value   frequency
[0.5,1.5)   1            1            1
[1.5,2.5)   2            2            2
[2.5,3.5)   3            3            3
[3.5,4.5)   4            4            2
[4.5,5.5)   5            5            1

note: bin boundaries vary with different software platforms, books, interpretations
e.g. can be [ , ] or [ , ) or ( , ] or ( , ) depending on author or implementation

bin centers = (left bp + right bp)/2
e.g. for bin [0.5, 1.5), center = (0.5+1.5)/2 = 1

stunning histogram!!

[Figure: histogram of D; green: break pts, red: bin centers]

Monthly rent data
Create a histogram (1) by hand and (2) in R

Graphical Representation of Data Cont’d: Time Series

Time Series plot: graph showing data measurements in chronological order
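A time series plot can be sketched in R with `ts()` and `plot()`. The monthly values below are hypothetical, made up only to have something to draw; the start date and the axis labels are likewise assumptions.

```r
# Hypothetical monthly measurements (made up for illustration)
values <- c(12, 15, 14, 18, 21, 25, 27, 26, 22, 19, 15, 13)

# ts() attaches the chronological order: 12 observations per year, starting Jan 2019
series <- ts(values, start = c(2019, 1), frequency = 12)

# time runs along the x-axis, the measurement along the y-axis
plot(series, type = "o",
     xlab = "time", ylab = "measurement",
     main = "Time series plot")
```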

Class exercise:
find 2 data sets
1. Data that you can create a histogram for
2. Data that you can create a time series for

Plot both in R

• Lecture 11

Measures of Center of Data

mode: data value that occurs most frequently in the data
find it by looking for the data pt with the highest frequency
e.g. D = {1,2,2,3,3,3,3,4,5,8,8,8,8,8,9,10,11,2}
what’s the mode?

median: data value in the middle of a sorted list
find it by sorting the data and taking the middle value
e.g. D = {2,1,5,3,4} what’s the median?
How about D = {4,2,3,1}?

mean: arithmetic average of the numbers
find the mean of D = {1,2,3,4,5}?
or D = {1,2,1,1,3,4,6}
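The three measures can be checked in R on the example sets above. `mean()` and `median()` are built in; base R has no function for the statistical mode, so the helper `stat_mode` below is a hypothetical one written for this sketch (it returns every value tied for the highest frequency).

```r
# Helper (not part of base R): value(s) with the highest frequency
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

print(stat_mode(c(1,2,2,3,3,3,3,4,5,8,8,8,8,8,9,10,11,2)))

print(median(c(2, 1, 5, 3, 4)))   # odd # of pts: middle value of the sorted list
print(median(c(4, 2, 3, 1)))      # even # of pts: average of the two middle values

print(mean(c(1, 2, 3, 4, 5)))
print(mean(c(1, 2, 1, 1, 3, 4, 6)))
```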

1. Find the mean, median and mode of the data:
D = {6.8, 8.2, 7.5, 9.4, 8.2}
weights of cats in lbs.

2. Find a data set that interests you and calculate the mean, median and mode

ex 1. Find the mean, median and mode of the data:
D = {6.8, 8.2, 7.5, 9.4, 8.2}
weights of cats in lbs.

variable x = weight of the cat

→ mean: (6.8 + 8.2 + 7.5 + 9.4 + 8.2)/5 = 40.1/5 = 8.02 lbs.

→ median: sort list
6.8 7.5 8.2 8.2 9.4
take middle value: median = 8.2 lbs.

→ mode: most frequently occurring value = 8.2 lbs.

ex 2. Find the mean, median and mode of the data:
D = {6.8, 8.2, 7.5, 9.4, 8.2, 6.3} #even number of pts
weights of cats in lbs.

variable x = weight of the cat

→ median: sort list
6.3 6.8 7.5 8.2 8.2 9.4
take middle value

even # pts = no middle data value
so avg the 2 middle pts

median = (7.5+8.2)/2 = 7.85 lbs.

ex 3. Effect of Extreme Values on Mean and Median
D = {6.8, 7.5, 8.2, 8.2, 9.4, 22.1} #even number of pts
weights of cats in lbs.

variable x = weight of the cat

mean = 62.2/6 = 10.37 lbs., median = (8.2+8.2)/2 = 8.2 lbs.
note: mean > median

mean went from 8.02 to 10.37 but median stayed the same
the mean is affected by extreme values much more than the median is

fat cat brought the mean up! → outlier!
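The effect of the extreme value is easy to reproduce in R. A minimal sketch using the cat weights from the examples; the variable names are made up.

```r
weights     <- c(6.8, 7.5, 8.2, 8.2, 9.4)   # original five cats
weights_fat <- c(weights, 22.1)             # same cats plus the 22.1 lb. cat

# the mean is pulled toward the extreme value, the median is not
cat(sprintf("mean:   %.2f -> %.2f\n", mean(weights), mean(weights_fat)))
cat(sprintf("median: %.2f -> %.2f\n", median(weights), median(weights_fat)))
```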

However, because the data are sampled, the mean is a more reliable (i.e. more consistent) measure of the center of the data.

see different distributions of data below:

[Figure: three distributions. Panel 1: mean < median. Panel 2: mean, median, mode all centered. Panel 3: mean > median.]

Average vs Weighted Average?
suppose you take 3 classes Spring 2018:
Math 141A (5 units), grade = A-
COSC 220 (3 units), grade = B
Math 80 (3 units), grade = C

what is the avg gpa for that semester?

method 1 avg: (3.7 + 3 + 2)/3 = 2.9

method 2 weighted avg: (5*3.7 + 3*3 + 3*2)/(5+3+3) = 3.0455

Discrepancy!
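Both methods can be checked in R. A short sketch with the grade points and units from the example; `weighted.mean()` is base R, and the variable names are made up.

```r
grade_points <- c(3.7, 3.0, 2.0)   # A-, B, C
units        <- c(5, 3, 3)

plain    <- mean(grade_points)                        # method 1: every class counts equally
weighted <- sum(units * grade_points) / sum(units)    # method 2: classes weighted by units

print(plain)
print(weighted)
print(weighted.mean(grade_points, units))             # built-in version of method 2

# with equal weights the weighted average reduces to the plain average
print(weighted.mean(grade_points, c(3, 3, 3)))
```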

Average vs Weighted Average?
suppose you take 3 classes Spring 2018:
Math 141A (3 units), grade = A-
COSC 220 (3 units), grade = B
Math 80 (3 units), grade = C

weighted avg gpa = (3*3.7 + 3*3 + 3*2)/(3+3+3) = 3*(3.7+3+2)/(3*(1+1+1)) = (3.7+3+2)/3 = avg gpa

note: if all weights are equal, avg = weighted average
why?

ex. weighted average

Measures of Spread of Data

going back to cat example:
mean = avg weight = 8.02 lbs

were most of the weights close to this weight?
how far off were they?

range of data = highest value – lowest value = max val – min val

cat example: D = {6.8, 8.2, 7.5, 9.4, 8.2}
variable x = weight of a cat
mean = 8.02

range of data = 9.4-6.8 = 2.6

look at distance from data to mean: called deviation

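sum all deviations:
(6.8-8.02) + (8.2-8.02) + (7.5-8.02) + (9.4-8.02) + (8.2-8.02) = -1.22 + 0.18 - 0.52 + 1.38 + 0.18 = 0

why is the sum of deviations = 0?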

better: sum the squares of the deviations
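A quick check in R on the cat weights shows both points. The deviations always sum to zero because the sum of (x - mean) equals the sum of x minus n times the mean, and the mean is the sum of x divided by n. Squaring removes the signs, so the squared deviations no longer cancel. The variable names are made up for this sketch.

```r
x   <- c(6.8, 8.2, 7.5, 9.4, 8.2)   # cat weights in lbs.
dev <- x - mean(x)                  # deviation of each data pt from the mean

print(round(dev, 2))
print(sum(dev))      # zero up to floating-point rounding
print(sum(dev^2))    # squared deviations do not cancel
```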

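avg total of squared deviations (variance): s^2 = sum((x - mean)^2)/(n - 1)

note: n - 1 is 1 less than # data pts

standard deviation: s = sqrt(sum((x - mean)^2)/(n - 1))

standard deviation: roughly the avg (mean) distance from a data pt. to the mean
how much a typical data pt differs from the mean.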
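For the cat weights, the computation with n - 1 in the denominator can be sketched in R and compared against the built-in `var()` and `sd()`, which also divide by n - 1. The names `s2` and `s` are chosen for this sketch.

```r
x <- c(6.8, 8.2, 7.5, 9.4, 8.2)   # cat weights in lbs.
n <- length(x)

s2 <- sum((x - mean(x))^2) / (n - 1)   # avg of squared deviations, n - 1 in the denominator
s  <- sqrt(s2)                         # standard deviation

cat(sprintf("s^2 = %.3f, s = %.3f\n", s2, s))
cat(sprintf("var(x) = %.3f, sd(x) = %.3f\n", var(x), sd(x)))
```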

n-1 is used due to degrees of freedom.
It makes the sample stdv a better approximation of the pop’n stdv.

• Lecture 12

Measures of Spread of Data

Recall:
standard deviation: avg (mean) distance from a data pt. to the mean
how much a typical data pt differs from the mean.

[Tables: squared deviations for training 1; squared deviations for training 2]

R code:

%r
x1 <- c(56,75,48,63,59)
x2 <- c(60,58,66,59,58)

# sample means
x1_bar = sum(x1)/length(x1)
x2_bar = sum(x2)/length(x2)

sprintf("mean of data set #1 = %2.2f", x1_bar)
sprintf("mean of data set #2 = %2.2f", x2_bar)

# sample standard deviations (n-1 in the denominator)
sigma1 = sqrt(sum((x1-x1_bar)^2)/(length(x1)-1))
sigma2 = sqrt(sum((x2-x2_bar)^2)/(length(x2)-1))

sprintf("sigma1 = %2.2f",sigma1)
sprintf("sigma2 = %2.2f",sigma2)

output:
'mean of data set #1 = 60.20'
'mean of data set #2 = 60.20'
'sigma1 = 9.93'
'sigma2 = 3.35'

Ranking

A percentile is a measure of ranking

The kth percentile: data value that has k% of the data at or below that value

e.g. The median is the 50th percentile

If you are in the 90th percentile what does that mean? (no pun intended)

This means that 90% of the scores were at or below your score. So you did the same as or better than 90% of the test takers.
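The "k% of the data at or below that value" definition can be turned around to find the percentile of a given score. A sketch in R with hypothetical test scores (made up for illustration); the helper name `percentile_rank` is also an assumption, not a base R function.

```r
# Hypothetical test scores (not from the slides)
scores <- c(55, 62, 68, 70, 74, 77, 81, 85, 90, 96)

# percent of the data at or below a given value
percentile_rank <- function(value, data) 100 * mean(data <= value)

print(percentile_rank(90, scores))               # a score of 90 in this list
print(percentile_rank(median(scores), scores))   # the median sits at the 50th percentile
```

Base R's `quantile()` goes the other way (from k to a data value), but it interpolates between data points by default, so its answers may differ slightly from a by-hand count.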

Quartiles: split the data into fourths

Interquartile Range (IQR):
IQR = Q3-Q1

[Figure: typical box plot]

example:
Total assets (in billions of AUD) of Australian Banks (2012)

2855 2862 2861 2884 3014 2965
2971 3002 3032 2950 2967 2964

find the 5 number summary and interquartile range IQR

variable x = total assets of Austr. banks
sort the data

min = 2855 billion AUD
max = 3032 billion AUD

median = ?

sorted data:
2855 2861 2862 2884 2950 2964 2965 2967 2971 3002 3014 3032

median = (2964+2965)/2 = 2964.5 billion AUD

Q1? find median of 1st half of list

Q1 = (2862+2884)/2 = 2873 bill. AUD

Q3? find median of 2nd half of list

Q3 = (2971+3002)/2 = 2986.5 bill. AUD

five number summary (in billions of AUD):
min = 2855
Q1 = 2873
median = 2964.5
Q3 = 2986.5
max = 3032

IQR = Q3-Q1 = 2986.5-2873 = 113.5 billion AUD

→ middle 50% of assets were within 113.5 billion AUD of each other
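The median-of-halves procedure used for this example can be sketched in R. The function name `five_number` is made up; for an odd number of points this sketch leaves the middle value out of both halves, which is one common textbook convention and an assumption here.

```r
assets <- c(2855, 2862, 2861, 2884, 3014, 2965,
            2971, 3002, 3032, 2950, 2967, 2964)

five_number <- function(x) {
  x <- sort(x)
  n <- length(x)
  lower <- x[1:floor(n / 2)]             # 1st half of the sorted list
  upper <- x[(ceiling(n / 2) + 1):n]     # 2nd half of the sorted list
  c(min = x[1], Q1 = median(lower), median = median(x),
    Q3 = median(upper), max = x[n])
}

s <- five_number(assets)
print(s)
cat("IQR =", s[["Q3"]] - s[["Q1"]], "\n")

# base R's fivenum() agrees for this data set
print(fivenum(assets))
```

Note that `quantile()` uses a different interpolation rule by default, so its quartiles for the same data may not match the by-hand values.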

Box-and-Whiskers Plot (Box Plot)

[Figure: Box-and-Whiskers Plot of Total Assets of Aust. Banks in 2012]

distribution is skewed right because the right tail is longer
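A box plot of the bank asset data can be drawn in R with `boxplot()`. This is a sketch; the title and axis label are assumptions, and `boxplot.stats()` is printed only to show the five numbers the box and whiskers are drawn from.

```r
assets <- c(2855, 2862, 2861, 2884, 3014, 2965,
            2971, 3002, 3032, 2950, 2967, 2964)

# box spans Q1 to Q3 with a line at the median; whiskers reach to min and max
# (R would draw points beyond 1.5*IQR from the box separately; there are none here)
boxplot(assets, horizontal = TRUE,
        xlab = "total assets (billions of AUD)",
        main = "Total Assets of Australian Banks (2012)")

# the numbers behind the plot: lower whisker, Q1, median, Q3, upper whisker
print(boxplot.stats(assets)$stats)
```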

Create a Box-and-Whiskers Plot (Box Plot) for the following

ex. The life expectancy for a person living in one of 11 countries in a region of South East Asia in 2012 is given below

Find the 5 number summary of the data and the IQR and draw a box-and-whiskers plot.

Starter:
variable x = life expectancy of a person
sort the list
calculate the appropriate medians to split the data into different quartiles