UNIT #1 CHAPTERS BY JEREMY GREEN, ADAM PAQUETTEY, AND MATT STAUB
CHAPTER 2: DATA
• THE FIVE W’S (AND HOW)
• WHO- THE ROWS OF A DATA TABLE, WHICH CORRESPOND TO THE INDIVIDUAL CASES ABOUT WHOM WE RECORD SOME CHARACTERISTICS.
• WHAT- THE CHARACTERISTICS (VARIABLES) RECORDED ABOUT EACH INDIVIDUAL; THESE ARE THE COLUMNS OF A DATA TABLE.
• WHY- REASON DATA WAS COLLECTED
• WHERE- WHERE THE DATA WAS COLLECTED
• WHEN- THE TIME THAT THE DATA WAS COLLECTED
• HOW- HOW THE DATA WAS COLLECTED. IT IS IMPORTANT TO EXPLAIN THE TYPE OF EXPERIMENT, SURVEY, OR STUDY THAT WAS CONDUCTED
• COLLECTED DATA IS ORGANIZED INTO DATA TABLES
      X        Y       Z
A     123      34567   56789
B     123345   789     0987654
CHAPTER 2 CONT.
• VARIABLES
• QUANTITATIVE VARIABLES ARE MEASURED IN UNITS
• CATEGORICAL VARIABLE- ANSWERS QUESTIONS ABOUT HOW CASES FALL INTO CATEGORIES
• QUANTITATIVE VARIABLE- ANSWERS QUESTIONS ABOUT THE QUANTITY OF WHAT IS MEASURED
• TYPES OF INDIVIDUALS (THE WHO)
• SUBJECTS (OR PARTICIPANTS)- PEOPLE WE EXPERIMENT ON
• EXPERIMENTAL UNITS- ANIMALS, PLANTS, WEB SITES, AND OTHER NON-HUMAN SUBJECTS OF AN EXPERIMENT
• RESPONDENTS- INDIVIDUALS WHO ANSWER A SURVEY
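As a rough illustration of the ideas above, the sketch below stores a small data table in which each row is a case (the WHO) and each column is a variable (the WHAT), then labels each variable as categorical or quantitative. The table contents, the column names, and the helper functions `variable_type` and `describe_variables` are all hypothetical and exist only for this example.

```python
# Hypothetical data table: each row is one case (the WHO),
# each column is one variable (the WHAT). All values are made up.
data_table = [
    {"student": "A", "grade_level": "junior", "height_in": 66.0},
    {"student": "B", "grade_level": "senior", "height_in": 70.5},
    {"student": "C", "grade_level": "junior", "height_in": 64.0},
]

def variable_type(values):
    """Label a column quantitative if every value is a number, else categorical."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "categorical"

def describe_variables(table, id_column):
    """Return {variable name: type} for every column except the identifier."""
    names = [name for name in table[0] if name != id_column]
    return {name: variable_type([row[name] for row in table]) for name in names}

variable_types = describe_variables(data_table, id_column="student")
print(variable_types)
```

Checking whether values are numbers is only a rough guide. Numbers used as labels, such as zip codes or jersey numbers, are still categorical, so the final call depends on what the variable means.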
CHAPTER 3: DISPLAYING AND DESCRIBING DATA
• THREE RULES OF DATA ANALYSIS
1. MAKE A PICTURE
2. MAKE A PICTURE
3. MAKE A PICTURE
• FREQUENCY TABLE
• TABLE THAT ORGANIZES COUNTS FOR CATEGORICAL DATA
• RELATIVE FREQUENCY TABLES SHOW PERCENTS
• PERCENTS (PROPORTIONS) MAKE IT POSSIBLE TO COMPARE CATEGORIES EVEN WHEN THE TOTAL NUMBER OF CASES DIFFERS
• AREA PRINCIPLE- THE AREA OCCUPIED BY A PART OF THE GRAPH SHOULD CORRESPOND TO THE MAGNITUDE OF THE VALUE IT REPRESENTS.
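The frequency table and relative frequency table described above can be sketched in a few lines. This is a minimal example, not a prescribed method: the list of letter grades is invented, and the function names `frequency_table` and `relative_frequency_table` are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical letter grades, invented for illustration.
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "F"]

def frequency_table(values):
    """Count how many cases fall into each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Convert the counts to percents of the total number of cases."""
    counts = frequency_table(values)
    total = len(values)
    return {category: 100 * count / total for category, count in counts.items()}

counts = frequency_table(grades)
percents = relative_frequency_table(grades)
for category in sorted(counts):
    print(f"{category}: count={counts[category]}, percent={percents[category]:.1f}%")
```

The percents in a relative frequency table should add up to 100, which is a quick check that no case was dropped or counted twice.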
CHAPTER 3 CONT.
• BAR CHART- DISPLAYS THE DISTRIBUTION OF A CATEGORICAL VARIABLE, SHOWING THE COUNTS FOR EACH CATEGORY NEXT TO EACH OTHER FOR EASY COMPARISON.
• PIE CHART- SHOWS ALL THE CASES AS A CIRCLE, SLICED INTO PIECES WHOSE SIZES ARE PROPORTIONAL TO THE FRACTION OF THE WHOLE IN EACH CATEGORY.
• CONTINGENCY TABLE
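• SHOWS HOW THE CASES ARE DISTRIBUTED ACROSS TWO CATEGORICAL VARIABLES AT ONCE, ONE IN THE ROWS AND ONE IN THE COLUMNS
• MARGINAL DISTRIBUTION- THE DISTRIBUTION OF ONE VARIABLE ALONE, GIVEN BY THE ROW OR COLUMN TOTALS IN THE MARGINS OF THE TABLE
• CONDITIONAL DISTRIBUTION- THE DISTRIBUTION OF ONE VARIABLE (USUALLY AS PERCENTS) FOR ONLY THOSE CASES THAT FALL IN ONE CATEGORY OF THE OTHER VARIABLE
• INDEPENDENCE- WHEN THE CONDITIONAL DISTRIBUTION OF ONE VARIABLE IS THE SAME FOR ALL CATEGORIES OF ANOTHER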
[Figures: Bar Chart and Pie Chart of "AP Stats Grades" (categories A, B, C, D, F)]
Contingency Table
        X       Y
A       5678    234567
B       98765   345678
Total   99999   98765
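To make the contingency table vocabulary concrete, here is a small sketch that computes marginal distributions, conditional distributions, and a simple independence check. The counts are invented for illustration (they are not the numbers shown on the slide), and the function names are assumptions for this example only.

```python
# Hypothetical counts, invented for illustration:
# rows are categories of one variable, columns are categories of another.
table = {
    "A": {"X": 30, "Y": 20},
    "B": {"X": 10, "Y": 40},
}

def row_totals(t):
    """Marginal distribution of the row variable (counts in the right margin)."""
    return {row: sum(cells.values()) for row, cells in t.items()}

def column_totals(t):
    """Marginal distribution of the column variable (counts in the bottom margin)."""
    totals = {}
    for cells in t.values():
        for column, count in cells.items():
            totals[column] = totals.get(column, 0) + count
    return totals

def conditional_distribution(t, row):
    """Percent of each column category among only the cases in one row."""
    total = sum(t[row].values())
    return {column: 100 * count / total for column, count in t[row].items()}

def looks_independent(t, tolerance=1e-9):
    """True if every row has the same conditional distribution."""
    distributions = [conditional_distribution(t, row) for row in t]
    first = distributions[0]
    return all(
        abs(d[column] - first[column]) <= tolerance
        for d in distributions
        for column in first
    )

print("Row totals:", row_totals(table))
print("Column totals:", column_totals(table))
print("Given row A:", conditional_distribution(table, "A"))
print("Given row B:", conditional_distribution(table, "B"))
print("Independent?", looks_independent(table))
```

In this made-up table the split between X and Y is 60/40 in row A but 20/80 in row B, so the two variables are not independent. With real data the conditional distributions rarely match exactly, so in practice you look for distributions that are roughly the same rather than identical.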
CHAPTER 4: DISPLAYING AND SUMMARIZING DATA
• HISTOGRAM- GROUPS THE VALUES OF A QUANTITATIVE VARIABLE INTO BINS OF EQUAL WIDTH AND SHOWS THE COUNT IN EACH BIN AS A BAR.
• RELATIVE FREQUENCY HISTOGRAM- SAME AS HISTOGRAM, REPLACING THE COUNTS ON THE VERTICAL AXIS WITH PERCENTAGES OF THE TOTAL NUMBER OF CASES.
• STEM-AND-LEAF PLOT- SIMILAR TO A HISTOGRAM, BUT IT SHOWS EACH INDIVIDUAL VALUE.
• DOTPLOT- A DOT IS PLACED ALONG AN AXIS FOR EACH CASE IN THE DATA.
• QUANTITATIVE DATA CONDITION- THE DATA ARE VALUES OF A QUANTITATIVE VARIABLE WHOSE UNITS ARE KNOWN. CHECK THIS BEFORE MAKING A HISTOGRAM, STEM-AND-LEAF PLOT, OR DOTPLOT.
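The counting behind a histogram and a relative frequency histogram can be sketched as follows. The test scores, the bin settings, and the function names `histogram_counts` and `relative_frequencies` are hypothetical choices made for this example.

```python
# Hypothetical test scores, invented for illustration.
scores = [52, 58, 61, 67, 70, 71, 74, 78, 83, 95]

def histogram_counts(values, bin_start, bin_width, n_bins):
    """Count the values in each equal-width bin [start, start + width).

    Values outside the range covered by the bins are ignored.
    """
    counts = [0] * n_bins
    for v in values:
        index = int((v - bin_start) // bin_width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

def relative_frequencies(counts):
    """Replace counts with percents of the total number of counted cases."""
    total = sum(counts)
    return [100 * c / total for c in counts]

bin_counts = histogram_counts(scores, bin_start=50, bin_width=10, n_bins=5)
bin_percents = relative_frequencies(bin_counts)

# Print a sideways text histogram, one row per bin.
for i, (count, percent) in enumerate(zip(bin_counts, bin_percents)):
    low = 50 + 10 * i
    print(f"{low}-{low + 10}: {'*' * count} ({count}, {percent:.0f}%)")
```

The shape of the display is the same whether the bars show counts or percents; only the scale on the vertical axis changes.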
CHAPTER 4 CONT.
• THREE THINGS TO DESCRIBE A DISTRIBUTION
1. SHAPE- WHETHER IT IS UNIMODAL OR BIMODAL, SYMMETRIC OR SKEWED, AND WHETHER OR NOT THERE ARE OUTLIERS.
2. CENTER- A TYPICAL VALUE FOR THE DATA. USUALLY REPORTED AS THE MEDIAN.
MEDIAN- THE MIDDLE VALUE, WHICH DIVIDES THE HISTOGRAM INTO TWO EQUAL HALVES.
3. SPREAD- HOW MUCH THE DATA VARY. USUALLY REPORTED AS THE RANGE AND INTERQUARTILE RANGE.
RANGE- THE DIFFERENCE BETWEEN THE MAXIMUM AND THE MINIMUM OF THE DATA.
INTERQUARTILE RANGE (IQR)- THE DIFFERENCE BETWEEN THE UPPER QUARTILE (Q3) AND THE LOWER QUARTILE (Q1).
• 5 NUMBER SUMMARY- REPORTS THE MEDIAN, QUARTILES, MINIMUM, AND THE MAXIMUM OF A DATA SET.
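A minimal sketch of the 5 number summary, range, and IQR follows. It assumes one common quartile convention (when the number of values is odd, the median is left out of both halves); other textbooks and calculators use slightly different rules, so small differences in Q1 and Q3 are possible. The data and the function name `five_number_summary` are hypothetical.

```python
from statistics import median

# Hypothetical data, invented for illustration.
values = [12, 15, 11, 20, 18, 14, 25, 17, 13]

def five_number_summary(data):
    """Return min, Q1, median, Q3, max.

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves, leaving the median out of both halves when n is odd.
    Needs at least two values.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return {
        "min": ordered[0],
        "q1": median(lower),
        "median": median(ordered),
        "q3": median(upper),
        "max": ordered[-1],
    }

summary = five_number_summary(values)
data_range = summary["max"] - summary["min"]
iqr = summary["q3"] - summary["q1"]
print(summary)
print("Range:", data_range, "IQR:", iqr)
```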
CHAPTER 4 CONT.
• MEAN
• FEELS LIKE THE CENTER BECAUSE IT IS THE POINT WHERE THE HISTOGRAM BALANCES.
• CALCULATED BY DIVIDING THE SUM OF THE DATA VALUES BY THE NUMBER OF DATA VALUES.
• USED WHEN THE HISTOGRAM IS SYMMETRIC AND THERE ARE NO OUTLIERS.
• MEDIAN
• IS RESISTANT TO VALUES THAT ARE EXTRAORDINARILY LARGE OR SMALL
• USED WHEN THE DATA IS SKEWED OR HAS OUTLIERS.
• STANDARD DEVIATION
• ACCOUNTS FOR HOW FAR EACH VALUE IS FROM THE MEAN.
• MOST USEFUL FOR SYMMETRIC DATA WITHOUT OUTLIERS, BECAUSE LIKE THE MEAN IT IS NOT RESISTANT TO EXTREME VALUES.
• CALCULATED AS THE SQUARE ROOT OF THE VARIANCE, WHERE THE VARIANCE IS THE SUM OF THE SQUARED DEVIATIONS FROM THE MEAN DIVIDED BY n - 1.
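The calculations above can be written out step by step. This sketch computes the mean, the sample variance (dividing by n - 1), and the standard deviation, and then shows how one extreme value pulls the mean much more than the median. The data values and the function names are hypothetical.

```python
import math
from statistics import median

# Hypothetical data, invented for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

def mean(values):
    """Sum of the data values divided by the number of values."""
    return sum(values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def sample_sd(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(sample_variance(values))

print("Mean:", mean(data))
print("Variance:", round(sample_variance(data), 3))
print("Standard deviation:", round(sample_sd(data), 3))

# Add one extreme value: the mean shifts a lot, the median barely moves.
with_outlier = data + [90]
print("Mean with outlier:", round(mean(with_outlier), 2))
print("Median before and after:", median(data), median(with_outlier))
```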
CHAPTER 5: UNDERSTANDING AND COMPARING DISTRIBUTIONS
• BOXPLOT- A GRAPHICAL REPRESENTATION OF A 5 NUMBER SUMMARY. ALSO, SHOWS OUTLIERS OF THE DATA.
• OUTLIERS
• ANY POINT THAT STANDS FAR FROM THE REST OF THE DATA BECAUSE IT IS EXTREMELY HIGH OR EXTREMELY LOW.
• TO DETERMINE WHETHER OR NOT A POINT IS AN OUTLIER, COMPUTE 1.5 X IQR, THEN SUBTRACT IT FROM THE LOWER QUARTILE AND ADD IT TO THE UPPER QUARTILE. ANY POINT BELOW Q1 - 1.5 X IQR OR ABOVE Q3 + 1.5 X IQR IS AN OUTLIER.
• RE-EXPRESSING OR TRANSFORMING DATA- APPLY A SIMPLE FUNCTION TO MAKE A SKEWED DISTRIBUTION MORE SYMMETRIC. EX: TAKING THE NATURAL LOG OF YOUR DATA.
• SIDE-BY-SIDE BOXPLOTS ALLOW YOU TO COMPARE THE CENTERS AND SPREADS OF MULTIPLE GROUPS OF DATA.
[FIGURE: COMPARING DISTRIBUTIONS]
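The 1.5 x IQR outlier rule from this chapter can be sketched as below. The quartile convention (median left out of both halves when the number of values is odd) is an assumption, as are the data and the function names `quartiles`, `outlier_fences`, and `find_outliers`.

```python
from statistics import median

# Hypothetical data with one unusually large value, invented for illustration.
values = [11, 12, 13, 14, 15, 17, 18, 20, 45]

def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    Assumed convention: the median is left out of both halves when n is odd.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)

def outlier_fences(data):
    """Lower fence = Q1 - 1.5 * IQR, upper fence = Q3 + 1.5 * IQR."""
    q1, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

def find_outliers(data):
    """Return every value that falls outside the fences."""
    low, high = outlier_fences(data)
    return [v for v in data if v < low or v > high]

fences = outlier_fences(values)
outliers = find_outliers(values)
print("Fences:", fences)
print("Outliers:", outliers)
```

A boxplot marks exactly these points individually and draws its whiskers out to the most extreme values that are still inside the fences.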
CHAPTER 6: THE STANDARD DEVIATION AS A RULER AND THE NORMAL MODEL
• STANDARD DEVIATION
• ANSWERS THE QUESTIONS "HOW FAR IS THIS VALUE FROM THE MEAN?" AND "HOW DIFFERENT ARE THESE TWO STATISTICS?"
• STANDARDIZED VALUES OR Z-SCORES MEASURE THE DISTANCE OF EACH DATA VALUE FROM THE MEAN IN STANDARD DEVIATIONS: z = (VALUE - MEAN) / STANDARD DEVIATION. STANDARDIZED VALUES HAVE NO UNITS.
• SHIFTING AND RESCALING DATA
• WHEN WE ADD A CONSTANT TO (OR SUBTRACT A CONSTANT FROM) EACH VALUE, ALL MEASURES OF POSITION (CENTER, PERCENTILES, MIN, AND MAX) WILL INCREASE OR DECREASE BY THAT SAME CONSTANT. THIS LEAVES SPREAD THE SAME.
• WHEN WE MULTIPLY OR DIVIDE EACH VALUE BY A POSITIVE CONSTANT, ALL MEASURES OF POSITION AND SPREAD WILL BE MULTIPLIED OR DIVIDED BY THAT CONSTANT.
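The effect of standardizing, shifting, and rescaling can be checked numerically. In this sketch the heights, the shift of 2 inches, and the inches-to-centimeters factor are illustrative choices, and `z_scores` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Hypothetical heights in inches, invented for illustration.
heights_in = [60, 62, 65, 68, 70]

def z_scores(values):
    """Distance of each value from the mean, in standard deviations."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Shifting: add a constant to every value.
shifted = [h + 2 for h in heights_in]
# Rescaling: multiply every value by a constant (inches to centimeters).
heights_cm = [h * 2.54 for h in heights_in]

print("Original: mean", mean(heights_in), "sd", round(stdev(heights_in), 3))
print("Shifted:  mean", mean(shifted), "sd", round(stdev(shifted), 3))
print("Rescaled: mean", round(mean(heights_cm), 3), "sd", round(stdev(heights_cm), 3))
print("z-scores:", [round(z, 2) for z in z_scores(heights_in)])
```

Shifting moves the mean by 2 but leaves the standard deviation alone, while rescaling multiplies both by 2.54. The z-scores come out the same for inches and centimeters, which is why standardized values have no units.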
CHAPTER 6 CONT.
• NORMAL MODEL
• THE BELL-SHAPED CURVE THAT IS APPROPRIATE FOR DISTRIBUTIONS WHOSE SHAPES ARE UNIMODAL AND SYMMETRIC.
• NUMBERS WE USE TO SPECIFY THIS MODEL (ITS MEAN AND STANDARD DEVIATION) ARE CALLED PARAMETERS.
• SUMMARIES OF THE DATA ARE CALLED STATISTICS.
• A NORMAL MODEL WITH A MEAN OF 0 AND A STANDARD DEVIATION OF 1 IS CALLED THE STANDARD NORMAL MODEL.
• IN ORDER TO USE THIS MODEL THE DATA MUST MEET THE NEARLY NORMAL CONDITION: THE SHAPE OF THE DATA'S DISTRIBUTION IS UNIMODAL AND SYMMETRIC.
• THE 68-95-99.7 RULE- SAYS THAT IN A NORMAL MODEL ABOUT 68% OF THE VALUES FALL WITHIN 1 STANDARD DEVIATION OF THE MEAN, ABOUT 95% FALL WITHIN 2 STANDARD DEVIATIONS, AND ABOUT 99.7% FALL WITHIN 3 STANDARD DEVIATIONS.
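The 68-95-99.7 rule can be checked against the Normal model itself. This sketch uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `fraction_within` and the example model with mean 500 and standard deviation 100 are assumptions for illustration.

```python
from statistics import NormalDist

# The standard Normal model: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k, model=standard_normal):
    """Fraction of a Normal model lying within k standard deviations of its mean."""
    upper = model.cdf(model.mean + k * model.stdev)
    lower = model.cdf(model.mean - k * model.stdev)
    return upper - lower

for k in (1, 2, 3):
    print(f"Within {k} SD: {100 * fraction_within(k):.1f}%")

# The same fractions hold for any Normal model, not just the standard one.
other_model = NormalDist(mu=500, sigma=100)
print(f"Within 2 SD of another Normal model: {100 * fraction_within(2, other_model):.1f}%")
```

The exact values are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated with "about".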
CHAPTER 6 CONT.
• RULES FOR WORKING WITH THE NORMAL MODEL
1. MAKE A PICTURE
2. MAKE A PICTURE
3. MAKE A PICTURE
• NORMAL PROBABILITY PLOT- HELPS YOU JUDGE WHETHER YOUR DATA ARE NEARLY NORMAL: IF THE POINTS LIE ROUGHLY ALONG A STRAIGHT DIAGONAL LINE, THE NORMAL MODEL IS REASONABLE.
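The points of a normal probability plot can be computed by pairing each sorted data value with the z-score a Normal model would expect at that position. The sketch below assumes the plotting position (i - 0.5) / n, which is one common convention among several; the data and the function name `normal_probability_points` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical data, invented for illustration.
values = [61, 64, 66, 67, 68, 69, 70, 72, 75]

def normal_probability_points(data):
    """Pair each sorted value with its expected standard Normal z-score.

    Assumed plotting position: (i - 0.5) / n for the i-th smallest value.
    """
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()
    return [
        (standard_normal.inv_cdf((i - 0.5) / n), y)
        for i, y in enumerate(ordered, start=1)
    ]

points = normal_probability_points(values)
for expected_z, y in points:
    print(f"expected z = {expected_z:6.2f}, data value = {y}")
```

Plotting the data values against the expected z-scores gives the normal probability plot. Points that follow a roughly straight line suggest the Nearly Normal Condition is met, while a clear bend suggests skewness.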