chapter 3.4 exploratory data analysis. traditional statistics data are organized by using a...
Chapter 3.4
Exploratory Data Analysis
Traditional Statistics
Data are organized by using a frequency distribution
The distribution is used to create various graphs: histogram, frequency polygon, ogive
Mean and standard deviation are computed to summarize the data
Purpose is to confirm various conjectures about the nature of the data
Exploratory Data Analysis (EDA)
Purpose is to examine data to find out what information can be discovered about the data, such as the center and the spread
Data are organized using a stem and leaf plot
The measure of central tendency is the median; the measure of variation is the interquartile range
Data are represented graphically using a boxplot
Quartiles
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3
Q1 is the same as the 25th percentile
Q2 is the same as the 50th percentile (median)
Q3 is the same as the 75th percentile
For example: 5, 6, 12, 13, 15, 18, 22, 50
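The quartiles of the example data can be worked out with a short sketch. It assumes the median-of-halves convention (Q1 is the median of the values below the median position, Q3 the median of the values above it); the slide does not state which convention it uses, and statistical software often interpolates instead and may report slightly different quartiles. The function name `quartiles` is only illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention (an assumption)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # values below the median position
    upper = data[(n + 1) // 2 :]   # values above the median position
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([5, 6, 12, 13, 15, 18, 22, 50])
print(q1, q2, q3)  # 9.0 14.0 20.0
```

Under this convention Q1 = 9, Q2 = 14, and Q3 = 20, so the interquartile range is 20 - 9 = 11.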
The five-number summary
1. The lowest value of the data set (minimum)
2. Q1
3. The median
4. Q3
5. The highest value of the data set (maximum)
Boxplot
A boxplot is a graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3, with a vertical line inside the box passing through the median (Q2)
Procedure for constructing a boxplot
1. Find the five-number summary for the data values.
2. Draw a horizontal axis with a scale that includes the maximum and the minimum data values.
3. Draw a box whose vertical sides go through Q1 and Q3, and draw a vertical line through the median.
4. Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.
Number of Meteorites Found
The number of meteorites found in 10 states of the U.S. is 89, 47, 164, 296, 30, 215, 138, 78, 48, 39. Construct a boxplot for the data.
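The first step of the construction, the five-number summary, might be computed as in the sketch below. It again assumes the median-of-halves convention for Q1 and Q3, and `five_number_summary` is an illustrative name rather than a library function.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum), median-of-halves convention assumed."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # values below the median position
    upper = data[(n + 1) // 2 :]   # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]
print(five_number_summary(meteorites))  # (30, 47, 83.5, 164, 296)
```

With these values the box runs from 47 to 164 with the median line at 83.5, and the lines extend to 30 on the left and 296 on the right.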
Information obtained from a boxplot
If the median is near the center of the box, the distribution is approximately symmetric.
If the median falls to the left of the center of the box, the distribution is positively (right) skewed.
If the median falls to the right of the center of the box, the distribution is negatively (left) skewed.
If the lines are about the same length, the distribution is approximately symmetric.
If the right line is longer than the left line, the distribution is positively (right) skewed.
If the left line is longer than the right line, the distribution is negatively (left) skewed.
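These rules can be sketched as a small function that takes a five-number summary and returns both readings. The slide does not say how close counts as "near" or "about the same", so the cutoff `tol` below is a hypothetical choice, as is the function name `read_boxplot`. The sample call uses the five-number summary of the meteorite data (30, 47, 83.5, 164, 296, obtained with the median-of-halves convention).

```python
def read_boxplot(minimum, q1, med, q3, maximum, tol=0.1):
    """Return (reading from median position, reading from line lengths).

    tol is a hypothetical cutoff, not from the slides: differences smaller
    than tol * IQR (median) or tol * range (lines) count as "about equal".
    """
    iqr = q3 - q1
    offset = med - (q1 + q3) / 2   # negative: median lies left of the box center
    if abs(offset) <= tol * iqr:
        by_median = "approximately symmetric"
    elif offset < 0:
        by_median = "positively (right) skewed"
    else:
        by_median = "negatively (left) skewed"

    left, right = q1 - minimum, maximum - q3   # lengths of the two lines
    if abs(right - left) <= tol * (maximum - minimum):
        by_lines = "approximately symmetric"
    elif right > left:
        by_lines = "positively (right) skewed"
    else:
        by_lines = "negatively (left) skewed"
    return by_median, by_lines

print(read_boxplot(30, 47, 83.5, 164, 296))
```

For the meteorite data both readings agree: the median sits left of the box center and the right line is much longer, which points to a positively skewed distribution.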
Sodium Content of Cheese
A dietitian is interested in comparing the sodium content of real cheese with the sodium content of a cheese substitute. Compare the distributions using boxplots.
Real Cheese          Cheese Substitute
310 4520 45 40       270 180 250 290
220 240 180 90       130 260 340 310
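A rough numeric comparison of the two samples is sketched below. Two assumptions are made: the first four values in each printed row are read as real cheese and the last four as cheese substitute, and the values are used exactly as printed. Quartiles follow the median-of-halves convention, and `median_and_iqr` is an illustrative name.

```python
from statistics import median

def median_and_iqr(values):
    """Return (median, interquartile range), median-of-halves convention assumed."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])         # median of the lower half
    q3 = median(data[(n + 1) // 2 :])   # median of the upper half
    return median(data), q3 - q1

# Column split is an assumption: first four values per row are real cheese.
real = [310, 4520, 45, 40, 220, 240, 180, 90]
substitute = [270, 180, 250, 290, 130, 260, 340, 310]

print(median_and_iqr(real))        # (200.0, 207.5)
print(median_and_iqr(substitute))  # (265.0, 85.0)
```

On this reading the substitute has the higher median sodium content, while the real cheese values are more spread out. Only the ordering of the real cheese values matters here, so the exact size of the largest one does not change its median or interquartile range.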
Resistant Statistic
A resistant statistic is relatively less affected by outliers than a nonresistant statistic.
The mean and standard deviation are nonresistant statistics.
Sometimes, when a distribution is skewed or contains outliers, the median and interquartile range may summarize the data more accurately than the mean and standard deviation.
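A quick illustration, reusing the quartile example data 5, 6, 12, 13, 15, 18, 22, 50: the sketch below drops the largest value, which sits well above the rest, and compares how far the mean and the median move. The choice of which value to drop is ours, made only for illustration.

```python
from statistics import mean, median

data = [5, 6, 12, 13, 15, 18, 22, 50]
without_largest = [x for x in data if x != 50]  # remove the extreme value

mean_shift = mean(data) - mean(without_largest)        # 17.625 - 13
median_shift = median(data) - median(without_largest)  # 14 - 13

print(f"mean shifts by {mean_shift:.3f}")      # mean shifts by 4.625
print(f"median shifts by {median_shift:.1f}")  # median shifts by 1.0
```

A single extreme value moves the mean by more than four units but the median by only one, which is what "resistant" means in practice.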
Correspondence between traditional and exploratory data analysis
Traditional               Exploratory data analysis
Frequency distribution    Stem and leaf plot
Histogram                 Boxplot
Mean                      Median
Standard deviation        Interquartile range
Try it! Applying the concepts 3-4
Pg. 174