stat 31, section 1, last time
![Page 1: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/1.jpg)
Stat 31, Section 1, Last Time
• Course Organization & Website: https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31sec1Home.html
• What is Statistics?
• Data types and structure
• Get going in EXCEL
• Exploratory Data Analysis
• Bar Graphs
![Page 2: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/2.jpg)
Stat 31, Student Poll Results
[Bar graph: “Stat 31, Sec 1: Majors”. Vertical axis ticks: 0 to 35. Categories: Business, Bus. +, Biology, Public Policy, Environment, Health Pol., Poli Sci, OR - Actuary, Undecided, Other]
As indicated on “Student Info” form:
Big changes from the past:
More biology
More diversity
![Page 3: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/3.jpg)
Stat 31, Student Poll Results
“Have you taken an AP Exam?”
Only ~10% had & grades generally low
So don’t worry if you haven’t…
![Page 4: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/4.jpg)
Major Concept: Distributions
“Distribution” = “Patterns of data”
= “way data is spread out”
e.g. Bar Graph is visual display of categorical “distribution”
![Page 5: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/5.jpg)
Exploratory Data Analysis 3
Visual Display of Quantitative Distributions:
1. Stem and Leaf Plots
Not Recommended
(Main motivation was pencil and paper statistical analysis, but now have better graphical methods readily accessible)
A limited special case of….
![Page 6: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/6.jpg)
Visual Disp: Quantitative Dist’ns
2. Histograms
Idea: Apply bar graph idea,
By creating categories,
Called “class intervals” or “classes” or “bins”
![Page 7: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/7.jpg)
Histograms
Idea: put numbers into “bins”,
bar heights are counts, or “frequencies”
e.g. data 1.3, 3.6, 1.9, 3.1, 1.5 placed into bins with edges 0, 1, 2, 3, 4
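A minimal sketch of the binning idea on this slide, using the five toy values and the bin edges 0 to 4 shown there. The function name `bin_counts` and the choice that each bin includes its left edge but not its right edge are assumptions made for illustration.

```python
data = [1.3, 3.6, 1.9, 3.1, 1.5]   # toy values from the slide
edges = [0, 1, 2, 3, 4]            # bin edges from the slide's axis

def bin_counts(values, edges):
    """Count how many values fall in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

counts = bin_counts(data, edges)

# Print a sideways histogram: one row per bin, one '#' per data point.
for left, right, c in zip(edges, edges[1:], counts):
    print(f"[{left}, {right}): {'#' * c} ({c})")
```

The bar heights (frequencies) are 0, 3, 0, 2. Tools differ on which bin gets a value sitting exactly on an edge, so counts can differ slightly for such values.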
![Page 8: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/8.jpg)
Class Histogram Example
Buffalo, N. Y. (Annual) Snowfall Data
Raw Data: https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg2Raw.xls
63 years, ranging from ~30 - ~120 (inches)
![Page 9: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/9.jpg)
Buffalo Snowfall Data
Histogram Analysis (pre-done): https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg2Done.xls
![Page 10: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/10.jpg)
Buffalo Snowfall Data, I
A. EXCEL default (of bin edges)
• Unround numbers for bin edges
• Data “centered around 90”
• Most data between 50 and 130
• Asymmetric Distribution
![Page 11: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/11.jpg)
Buffalo Snowfall Data, II
B. Smaller bins
• Chosen by me
• Binwidth = 5, << ~13 from EXCEL default
• Nicer edge numbers
• Data centered around 84 (now more precise)
• Bar graph rougher (fewer points in each bin)
• Suggests 3 main groups (called “modes”)
(can’t see this above: bin width counts)
![Page 12: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/12.jpg)
Buffalo Snowfall Data, III
C. Larger bins
• Chosen by me
• Binwidth = 30, >> ~13 from EXCEL default
• Bar graph is “smooth”
(since many points in each bin)
• Only one mode???
• Quite symmetric?
(different from above: bin width counts)
![Page 13: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/13.jpg)
Buffalo Snowfall Data, IV
D. What’s under the hood (how to do this):
i. Tools → Data Analysis → Histogram (& Chart Output)
(may need Data Analysis “Add-in”)
ii. Massage pic (especially bar width)
iii. Sigma min, max
iv. Bin range: create first two & drag
v. Histogram, using input bin edges
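The same steps (find min and max, build evenly spaced bin edges, count with those edges) can be sketched outside EXCEL. This is only an illustration: `make_edges` and `histogram` are hypothetical helper names, the data values are made up (they are not the Buffalo snowfall data), and each bin is assumed to include its left edge.

```python
import math

def make_edges(values, width):
    """Round-number bin edges covering min(values)..max(values)."""
    lo = math.floor(min(values) / width) * width        # at or below the min
    hi = (math.floor(max(values) / width) + 1) * width  # strictly above the max
    n_bins = round((hi - lo) / width)
    # Same idea as "create first two & drag": equally spaced edges.
    return [lo + i * width for i in range(n_bins + 1)]

def histogram(values, edges):
    """Count values in each bin [edges[i], edges[i+1])."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[int((v - edges[0]) // width)] += 1
    return counts

# Made-up values for illustration only; NOT the real snowfall data.
values = [25.0, 39.8, 53.5, 71.4, 78.1, 79.6, 82.4,
          85.5, 89.9, 97.0, 104.5, 110.5, 126.4]

edges = make_edges(values, 10)      # binwidth chosen by hand
counts = histogram(values, edges)

for left, c in zip(edges, counts):
    print(f"{left:>4} | {'#' * c}")
```

Changing the second argument of `make_edges` is the analogue of retyping the bin range in EXCEL, which makes it easy to try several binwidths.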
![Page 14: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/14.jpg)
![Page 15: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/15.jpg)
Histogram HW
HW: 1.21
• Use Excel and histograms
• Get data from CDrom
• Do both:
– Excel Default bins
– Bins set to: 0,10,20,…,240
• Which gives answers closer to answers in back of book?
• Turn in only one page
![Page 16: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/16.jpg)
Histogram Binwidths
Nice example from Webster West, U.S.C.:
http://www.stat.sc.edu/~west/applets/histogram.html
Control Binwidth with slider:
• Undersmoothing?
• About right?
• Oversmoothing?
(critical to visual impression)
![Page 17: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/17.jpg)
Histogram Binwidth Example
Hidalgo Stamp Data
From Mexico in 1800s
How many sources of paper?
How many modes:
1, 2, 5, 7, 10?
![Page 18: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/18.jpg)
Histogram Binwidth Example
How many modes?
Caution: Answer depends on binwidth
(a serious and current statistical research problem)
![Page 19: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/19.jpg)
Stamps Data Histogram
How many modes?
2nd Caution: Answer also depends on bin location
(i.e. “shift” of bins)
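Both cautions can be checked on a tiny example. The sketch below counts "modes" as peaks in the bar heights, for different binwidths and different bin shifts. The data are made up for illustration (they are not the Hidalgo stamp data), and `histogram` and `count_modes` are hypothetical helper names; counting peaks this way is one simple convention, not a standard definition.

```python
def histogram(values, width, start=0.0):
    """Counts for bins [start + k*width, start + (k+1)*width).
    Assumes every value is >= start."""
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts

def count_modes(counts):
    """Number of peaks: runs of equal bars higher than both neighbors."""
    padded = [0] + list(counts) + [0]
    modes = 0
    i = 1
    while i < len(padded) - 1:
        j = i
        # Extend over a plateau of equal bar heights.
        while j + 1 < len(padded) - 1 and padded[j + 1] == padded[i]:
            j += 1
        if padded[i] > padded[i - 1] and padded[i] > padded[j + 1]:
            modes += 1
        i = j + 1
    return modes

# Made-up data: two tight clusters, near 2 and near 4.
data = [1.9, 1.9, 2.1, 2.1, 3.9, 3.9, 4.1, 4.1]

for width, start in [(0.5, 0.0), (1.0, 0.0), (1.0, 0.5), (4.0, 0.0)]:
    counts = histogram(data, width, start)
    print(f"width={width}, shift={start}: bars={counts}, "
          f"modes={count_modes(counts)}")
```

With binwidth 1 and bins starting at 0, the two clusters are each split across neighboring bins and merge into one flat block (one mode). Shifting the same bins by 0.5 separates them again (two modes).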
![Page 20: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/20.jpg)
Histogram Bins
For this course:
Try several binwidths, to “get the idea”
Weakness of EXCEL (we will see several):
This is inconvenient
![Page 21: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/21.jpg)
Comparison of Histograms
Class Example: Study Habits Data
Idea: Compare Study Habits of Males vs. Females (measured by some “survey score”, perhaps of questionable value?)
https://www.unc.edu/%7Emarron/UNCstat31-2005/Stat31Eg4Done.xls
![Page 22: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/22.jpg)
Study Habits Data
EXCEL default histograms:
• Populations look similar???
• Careful: Binwidth very big…
• Careful: Different bin ranges…
• Need smaller binwidths, and common scales
![Page 23: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/23.jpg)
Study Habits Data
Better Choice: Binwidths = 10, same bins for both
• Clear difference, easy to see
• Females higher “on average”
• Males are “more spread”
• 1 “exceptional value”, really true???
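A rough sketch of the "same bins for both" idea: build one set of bin edges from the combined range of the two groups, then count each group with those shared edges. The scores below are made up for illustration (they are not the Study Habits data), `common_edges` and `histogram` are hypothetical names, and the binwidth is assumed to be a whole number.

```python
import math

def common_edges(group_a, group_b, width):
    """One set of bin edges covering both groups (width must be an int)."""
    lo = math.floor(min(min(group_a), min(group_b)) / width) * width
    hi = (math.floor(max(max(group_a), max(group_b)) / width) + 1) * width
    return list(range(lo, hi + 1, width))

def histogram(values, edges):
    """Count values in each bin [edges[i], edges[i+1])."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        counts[(v - edges[0]) // width] += 1
    return counts

# Made-up survey scores, for illustration only.
females = [120, 135, 142, 150, 151, 158, 163, 170]
males = [70, 95, 110, 128, 140, 155, 172, 185]

edges = common_edges(females, males, 10)
f_counts = histogram(females, edges)
m_counts = histogram(males, edges)

# Side-by-side rows share the same bins, so the bars are comparable.
for left, f, m in zip(edges, f_counts, m_counts):
    print(f"{left:>4} | F {'#' * f:<4} | M {'#' * m}")
```

Because both histograms use identical edges, differences in center and spread are visible row by row instead of being hidden by different default bin ranges.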
![Page 24: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/24.jpg)
Things to look for (in histo’s)
1. Population Center Point (Study Habits Data)
2. Population Spread (Study Habits Data)
3. Shape - Symmetric vs. Skewed
Right Skewed:
Left Skewed:
4. Modes - Unexpected clusters
5. Outliers - “unusual data points”
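The first three items have simple numerical counterparts. The sketch below is only a rough illustration on made-up data: it uses the mean and median for center, the standard deviation for spread, and the rule of thumb that a mean above the median hints at right skew. That rule is a heuristic, not a guarantee, and `describe` is a hypothetical helper name.

```python
import statistics

def describe(values):
    """Numerical summaries matching items 1-3 of the checklist."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.stdev(values)   # sample standard deviation
    # Rule of thumb only: a long right tail pulls the mean above the median.
    if mean > median:
        shape = "right-skewed?"
    elif mean < median:
        shape = "left-skewed?"
    else:
        shape = "roughly symmetric?"
    return {"mean": mean, "median": median,
            "stdev": spread, "shape_hint": shape}

# Made-up data with one large value on the right.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
summary = describe(values)
print(summary)
```

A histogram is still needed for items 4 and 5: these summaries say nothing about modes, and the single large value could be an outlier rather than evidence of skewness.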
![Page 25: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/25.jpg)
Comparison of Histograms HW
HW: 1.25b, 1.27, 1.29, 1.22
• Work in this order
• Get data from CDrom
• Use EXCEL and histograms
• Odd answers in back
• You choose the bins
(if you miss something in answers, change this)
• Turn in at most one page for each
![Page 26: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/26.jpg)
Plotting Bivariate Data
Toy Example:
(1,2)
(3,1)
(-1,0)
(2,-1)
[Figure: “Toy Scatterplot, Separate Points”. Horizontal axis: x, -2 to 4. Vertical axis: y, -1.5 to 2.5]
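For a quick look at the toy example without a charting tool, the four points can be drawn on a character grid. This is a minimal sketch that only handles whole-number coordinates; `ascii_scatter` is a hypothetical helper name.

```python
points = [(1, 2), (3, 1), (-1, 0), (2, -1)]   # toy points from the slide

def ascii_scatter(points):
    """One text row per y value (top row = largest y), '*' marks a point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    rows = []
    for y in range(max(ys), min(ys) - 1, -1):
        row = "".join("*" if (x, y) in points else "."
                      for x in range(min(xs), max(xs) + 1))
        rows.append(row)
    return rows

for row in ascii_scatter(points):
    print(row)
```

Each column is one x value from -1 to 3 and each row is one y value from 2 down to -1, so the picture has the same layout as the scatterplot with separate points.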
![Page 27: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/27.jpg)
Plotting Bivariate Data
Sometimes: Can see more insightful patterns by connecting points
[Figure: “Toy Scatterplot, Connected points”. Horizontal axis: x, -2 to 4. Vertical axis: y, -1.5 to 2.5]
![Page 28: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/28.jpg)
Plotting Bivariate Data
Sometimes: Useful to switch off points, and only look at lines/curves
[Figure: “Toy Scatterplot, Lines Only”. Horizontal axis: x, -2 to 4. Vertical axis: y, -1.5 to 2.5]
![Page 29: Stat 31, Section 1, Last Time](https://reader035.vdocument.in/reader035/viewer/2022062422/5681339f550346895d9ab3bd/html5/thumbnails/29.jpg)
Plotting Bivariate Data
Common Name: “Scatterplot”
A look under the hood:
EXCEL: Chart Wizard (colored bar icon)
• Chart Type: XY (scatter)
• Subtype controls points only, or lines
• Later steps similar to above
(can massage the pic!)