lincoln jiang statistical consultant western michigan university the graduate college graduate...
![Page 1: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/1.jpg)
Lincoln Jiang
Statistical Consultant
Western Michigan University
The Graduate College
Graduate Center for Research and Retention
![Page 2: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/2.jpg)
Definition of Statistics
Statistics is the art of making numerical conjectures about puzzling questions.
--- Statistics, Fourth Edition, by Freedman
![Page 3: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/3.jpg)
Definition of Statistics
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.
---From Wikipedia
![Page 8: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/8.jpg)
Basic Terms
Variables
Characteristics that can take on any number of different values
Values
Possible numbers or categories that a variable can have
Scores
A particular person’s value on a variable
![Page 9: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/9.jpg)
Types of Data
Qualitative data -- nonnumeric
eg: types of material {straw, sticks, bricks}
Quantitative data -- numeric
Discrete data -- numeric data with a countable set of separate possible values (often finitely many)
eg: counting numbers, {1,2,3,4,5}
Continuous data -- numeric data that can take any value in an interval, so there are infinitely many possible values
eg: real numbers
![Page 10: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/10.jpg)
Types of Scale
Nominal --- has no order and thus only gives names or labels to various categories. Variables assessed on a nominal scale are called categorical variables.
Ordinal --- has order, but the interval between measurements is not meaningful.
Interval --- has meaningful intervals between measurements, but there is no true starting point (zero).
Eg: temperature on the Celsius scale
Ratio --- the highest level of measurement. Ratios between measurements as well as intervals are meaningful because there is a true starting point (zero).
Eg: length, time, plane angle, energy
![Page 11: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/11.jpg)
EX
![Page 12: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/12.jpg)
Definition of Statistics
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.
---From Wikipedia
![Page 13: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/13.jpg)
Collecting Data
“Twenty-five percent of Americans doubt that the Holocaust ever occurred.”
--- a news report in 1993
Census
Sample Survey
![Page 14: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/14.jpg)
Why Study Samples?
Often not practical to study an entire population
Instead, researchers attempt to make samples representative of populations
Random selection
Each member of the population has an equal chance of being sampled
Good but difficult
Haphazard selection
Take steps to ensure samples do not differ from the population in systematic ways
Not as good but much more practical
![Page 15: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/15.jpg)
Sample vs. Population
Sample
Relatively small number of instances that are studied in order to make inferences about a larger group from which they were drawn
Population
The larger group from which a sample is drawn
![Page 16: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/16.jpg)
Sample vs. Population Examples
Population
a. pot of beans
b. larger circle
c. histogram
Sample
a. spoonful
b. smaller circle
c. shaded scores
![Page 17: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/17.jpg)
Sampling Methods
Simple random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Other sampling methods: quota sampling, mechanical sampling, and so on
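As a rough illustration of the first three methods listed above, the sketch below draws samples from a made-up population using Python's standard `random` module. The population of IDs, the two strata, and the function names are assumptions chosen for illustration, not material from the slides.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has the same chance of being chosen."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random start, then take every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)                      # fixed seed so the sketch is repeatable
population = list(range(1, 101))            # hypothetical member IDs 1..100
strata = {"low": population[:50], "high": population[50:]}  # hypothetical strata

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
strat = stratified_sample(strata, 5, rng)
```

Cluster sampling would instead pick whole groups at random and study every member of the chosen groups.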
![Page 18: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/18.jpg)
Definition of Statistics
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.
---From Wikipedia
![Page 19: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/19.jpg)
After Collecting… Before Analyzing…
![Page 20: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/20.jpg)
Frequency Tables
Frequency table
Shows how many times each value was used for a particular variable
Can also show the percentage of scores at each value
Grouped frequency table
Shows the frequency for ranges of scores, using several equally sized intervals
![Page 21: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/21.jpg)
Steps for Making a Frequency Table
1. Make a list of each possible value, from highest to lowest
2. Go one by one through the data, making a mark for each data point next to its value on the list
3. Make a table showing how many times each value on your list was used
4. Figure the percentage of data for each value
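The four steps above can be sketched in a few lines of Python. The ratings in `scores` are hypothetical, and `collections.Counter` stands in for the hand tally in steps 2 and 3.

```python
from collections import Counter

scores = [8, 7, 4, 10, 8, 6, 8, 9, 9, 7]    # hypothetical ratings, not from the slides

counts = Counter(scores)                      # steps 2-3: tally how often each value occurs
n = len(scores)

table = []
# Step 1: list every possible value from highest to lowest
for value in range(max(scores), min(scores) - 1, -1):
    freq = counts.get(value, 0)
    percent = round(100 * freq / n, 1)        # step 4: percentage of scores at this value
    table.append((value, freq, percent))

for value, freq, percent in table:
    print(f"{value:>5} {freq:>9} {percent:>7}")
```

Values that never occur (here, a rating of 5) still get a row with a frequency of 0, which keeps gaps in the distribution visible.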
![Page 22: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/22.jpg)
A Frequency Table
Stress rating   Frequency   Percent (%)
10              14           9.3
9               15           9.9
8               26          17.2
7               31          20.5
6               13           8.6
5               18          11.9
4               16          10.6
3               12           7.9
2                3           2.0
1                1           0.7
0                2           1.3
![Page 23: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/23.jpg)
A Grouped Frequency Table
Stress rating interval   Frequency   Percent (%)
10-11                    14           9
8-9                      41          27
6-7                      44          29
4-5                      34          23
2-3                      15          10
0-1                       3           2
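To see how the grouped table follows from the ungrouped one, the sketch below collapses the stress-rating frequencies into intervals of width 2. The frequencies are copied from the ungrouped table; the variable names are illustrative.

```python
# Frequencies copied from the ungrouped stress-rating table (rating -> count)
freq = {10: 14, 9: 15, 8: 26, 7: 31, 6: 13, 5: 18,
        4: 16, 3: 12, 2: 3, 1: 1, 0: 2}
width = 2                                    # interval size used in the grouped table
total = sum(freq.values())                   # number of scores overall

grouped = {}
for rating, count in freq.items():
    low = (rating // width) * width          # lower bound of the interval holding this rating
    grouped[low] = grouped.get(low, 0) + count

# One row per interval, highest interval first, percent rounded to a whole number
rows = [(f"{low}-{low + width - 1}", count, round(100 * count / total))
        for low, count in sorted(grouped.items(), reverse=True)]
```

The resulting rows reproduce the frequencies and rounded percentages of the grouped table.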
![Page 24: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/24.jpg)
Frequency Graphs
Histogram
Depicts information from a frequency table or a grouped frequency table as a bar graph
EX2
![Page 25: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/25.jpg)
Shapes of Frequency Distributions
Unimodal
Having one peak
Bimodal
Having two peaks
Multimodal
Having two or more peaks
Rectangular
Having no peaks
![Page 26: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/26.jpg)
Symmetrical vs. Skewed Frequency Distributions
Symmetrical distribution
Approximately equal numbers of observations above and below the middle
Skewed distribution
One side is more spread out than the other, like a tail
Direction of the skew
Right or left (i.e., positive or negative)
Side with the fewer scores
Side that looks like a tail
![Page 27: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/27.jpg)
Skewed Frequency Distributions
Skewed right (b)
Fewer scores right of the peak
Positively skewed
Can be caused by a floor effect
Skewed left (c)
Fewer scores left of the peak
Negatively skewed
Can be caused by a ceiling effect
![Page 28: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/28.jpg)
Ceiling and Floor Effects
Ceiling effects
Occur when scores can go no higher than an upper limit and “pile up” at the top
e.g., scores on an easy exam, as shown on the right
Cause negative skew
Floor effects
Occur when scores can go no lower than a lower limit and pile up at the bottom
e.g., household income
Cause positive skew
![Page 29: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/29.jpg)
Kurtosis
Degree to which the tails of the distribution are “heavy” or “light”
Heavy tails = higher kurtosis (b)
Light tails = lower kurtosis (c)
Normal distribution = zero kurtosis (a), when kurtosis is reported as excess kurtosis (kurtosis minus 3)
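The direction of skew and the weight of the tails can be put into numbers. The sketch below uses the common moment-based definitions, which are one convention among several, and the small data sets are made up for illustration.

```python
from statistics import fmean

def central_moment(data, k):
    """Average of the k-th power of deviations from the mean."""
    m = fmean(data)
    return fmean([(x - m) ** k for x in data])

def skewness(data):
    """Positive for a tail to the right, negative for a tail to the left."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 10]        # long tail to the right
left_skewed = [-x for x in right_skewed]     # mirror image: long tail to the left
```

A flat, rectangular set of scores such as `symmetric` has lighter tails than a normal curve, so its excess kurtosis comes out negative.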
![Page 30: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/30.jpg)
Measures of Central Tendency
Central tendency = representative or typical value in a distribution
The mean, the median, and the mode can all measure central tendency.
Mean
Computed by
Summing all the scores (sigma, $\sum$)
Dividing by the number of scores (N)
$$M = \frac{\sum X}{N}$$
![Page 31: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/31.jpg)
Measures of Central Tendency
Mean
Often the best measure of central tendency
Most frequently reported in research articles
Think of the mean as the “balancing point” of the distribution
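A small sketch of the mean as a balancing point: the deviations above and below the mean cancel out. The scores are hypothetical.

```python
scores = [2, 4, 4, 5, 10]                 # hypothetical scores

M = sum(scores) / len(scores)             # M = (sum of X) / N
deviations = [x - M for x in scores]      # distance of each score from the mean
balance = sum(deviations)                 # deviations cancel out at the balancing point
```

Here `M` is 5.0 and the deviations are -3, -1, -1, 0, and 5, which sum to zero.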
![Page 32: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/32.jpg)
Measures of Central Tendency
Mode
Most common single number in a distribution
If the distribution is symmetrical and unimodal, the mode = the mean
Typical way of describing central tendency of a nominal variable
![Page 33: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/33.jpg)
Measures of Central Tendency
Median
Middle value in a group of scores
Point at which half the scores are above and half the scores are below
Unaffected by extremity of individual scores, unlike the mean
Preferable as a measure of central tendency when a distribution has some extreme scores
![Page 34: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/34.jpg)
Measures of Central Tendency
Examples of means as balancing points of various distributions
Does not have to be a score exactly at the median
Note that a score’s distance from the balancing point matters in addition to the number of scores above or below it
![Page 35: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/35.jpg)
Measures of Central Tendency
Examples of means and modes
![Page 36: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/36.jpg)
Measures of Central Tendency
Steps to computing the median
1. Line up scores from highest to lowest
2. Figure out how many scores to the middle
Add 1 to number of scores
Divide by 2
3. Count up to middle score
If there is 1 middle score, that’s the median
If there are 2 middle scores, median is their average
Ex3
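The median steps translate directly into code. This is a minimal sketch with a hypothetical function name and made-up scores; the last example shows how one extreme score moves the mean but not the median.

```python
def median_by_steps(scores):
    ordered = sorted(scores, reverse=True)   # step 1: line up from highest to lowest
    middle = (len(ordered) + 1) / 2          # step 2: (number of scores + 1) / 2
    if middle.is_integer():                  # step 3: a single middle score
        return ordered[int(middle) - 1]
    lower = ordered[int(middle) - 1]         # two middle scores: take their average
    upper = ordered[int(middle)]
    return (lower + upper) / 2

odd_case = median_by_steps([3, 9, 1, 7, 5])          # middle score of five
even_case = median_by_steps([1, 2, 3, 4])            # average of the two middle scores

with_outlier = [1, 2, 3, 4, 100]
median_outlier = median_by_steps(with_outlier)       # stays at the middle score
mean_outlier = sum(with_outlier) / len(with_outlier) # pulled toward the extreme score
```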
![Page 37: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/37.jpg)
Measures of Variation
Variation = how spread out the data are
Variance
Measure of variation
Average of the scores’ squared deviations (differences) from the mean
![Page 38: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/38.jpg)
Measures of Variation
Steps to computing the variance
1. Subtract the mean from each data point: $x_i - \bar{x}$
2. Square each deviation value: $(x_i - \bar{x})^2$
3. Add up the squared deviation scores: $\sum (x_i - \bar{x})^2$
4. Divide the sum by the number of scores: $\frac{\sum (x_i - \bar{x})^2}{n}$
Ex4
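The four variance steps, written out one line per step. The data are hypothetical; dividing by n matches the steps above and corresponds to `statistics.pvariance` in the Python standard library.

```python
from statistics import fmean

data = [2, 4, 4, 4, 5, 5, 7, 9]              # hypothetical scores

mean = fmean(data)
deviations = [x - mean for x in data]        # step 1: subtract the mean
squared = [d ** 2 for d in deviations]       # step 2: square each deviation
sum_of_squares = sum(squared)                # step 3: add up the squared deviations
variance = sum_of_squares / len(data)        # step 4: divide by the number of scores
```

Many software packages divide by n - 1 instead when the data are a sample used to estimate a population variance (`statistics.variance` does this), so results can differ slightly from the steps on the slide.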
![Page 39: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/39.jpg)
Measures of Variation
Standard deviation
Another measure of variation, roughly the average amount that scores differ from the mean
Used more widely than variance
Abbreviated as “SD”
To compute standard deviation
Compute variance
Simply take the square root
SD is square root of variance
Variance is SD squared
$$SD^2 = \text{Variance}$$
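A short check of the relationship between SD and variance, using hypothetical scores and the population versions (`pvariance`, `pstdev`) from Python's `statistics` module.

```python
import math
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical scores
variance = pvariance(data)            # average squared deviation (divide by n)
sd = math.sqrt(variance)              # SD is the square root of the variance
```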
![Page 40: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/40.jpg)
Two Branches of Statistical Methods
Descriptive statistics
Summarize and describe a group of numbers such as the results of a research study
Inferential statistics
Allow researchers to draw conclusions and inferences that are based on the numbers from a research study, but go beyond these numbers
![Page 41: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/41.jpg)
The Normal Curve
Often seen in social and behavioral science research and in nature generally
Particular characteristics
Bell-shaped
Unimodal
Symmetrical
Average tails
Bean Machine
![Page 42: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/42.jpg)
Z Scores
A Z score indicates how many standard deviations an observation is above or below the mean
If Z > 0, the data value is above the mean
If Z < 0, the data value is below the mean
Z score of 1.0 is one SD above the mean
Z score of -2.5 is two-and-a-half SDs below the mean
Z score of 0 is at the mean
$$Z = \frac{X - M}{SD}$$
![Page 43: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/43.jpg)
Z Scores
When values in a distribution are converted to Z scores, the distribution will have
Mean of 0
Standard deviation of 1
Useful
Allows variables to be compared to one another
Provides a generalized standard of comparison
![Page 44: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/44.jpg)
Z Scores
To compute a Z score, subtract the mean from a raw score and divide by the SD
To convert a Z score back to a raw score, multiply the Z score by the SD and then add the mean
$$Z = \frac{X - M}{SD}$$
$$X = (Z)(SD) + M$$
Ex5
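Both conversions can be sketched as two small functions. The names `to_z` and `to_raw` and the raw scores are hypothetical; the SD here is the population SD (divide by N).

```python
from statistics import fmean, pstdev

def to_z(x, m, sd):
    """Z = (X - M) / SD"""
    return (x - m) / sd

def to_raw(z, m, sd):
    """X = (Z)(SD) + M"""
    return z * sd + m

raw = [2, 4, 4, 4, 5, 5, 7, 9]               # hypothetical raw scores
M, SD = fmean(raw), pstdev(raw)              # mean 5.0, SD 2.0
z_scores = [to_z(x, M, SD) for x in raw]     # converted scores have mean 0 and SD 1
```

For example, the raw score 9 is two SDs above the mean, so its Z score is 2.0, and `to_raw(2.0, M, SD)` returns 9.0 again.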
![Page 45: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/45.jpg)
Confidence Interval
A confidence interval (CI) is a particular kind of interval estimate of a population parameter.
The confidence level determines how often intervals built this way contain the parameter over repeated sampling
"95% confidence interval"
Animation
ex6
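A minimal sketch of a confidence interval for a mean, assuming a normal approximation (a Z cutoff of about 1.96 for 95%). The sample values and the function name are hypothetical; for small samples a t-based cutoff would give a somewhat wider interval.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation interval: mean +/- z * standard error."""
    n = len(sample)
    m = fmean(sample)
    se = stdev(sample) / math.sqrt(n)            # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 when level is 0.95
    return m - z * se, m + z * se

sample = [72, 75, 68, 80, 77, 74, 69, 71, 76, 78]    # hypothetical pulse rates
low, high = mean_confidence_interval(sample)
low99, high99 = mean_confidence_interval(sample, level=0.99)
```

Raising the confidence level widens the interval: the 99% interval contains the 95% one.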
![Page 46: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/46.jpg)
Correlation
A statistic for describing the relationship between two variables
Examples
Price of a bottle of wine and its quality
Hours of studying and grades on a statistics exam
Income and happiness
Caffeine intake and alertness
![Page 47: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/47.jpg)
Graphing Correlations on a Scatter Diagram
Scatter diagram
Graph that shows the degree and pattern of the relationship between two variables
Horizontal axis
Usually the variable that does the predicting
e.g., price, studying, income, caffeine intake
Vertical axis
Usually the variable that is predicted
e.g., quality, grades, happiness, alertness
![Page 48: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/48.jpg)
Graphing Correlations on a Scatter Diagram
Steps for making a scatter diagram
1. Draw axes and assign variables to them
2. Determine the range of values for each variable and mark the axes
3. Mark a dot for each person’s pair of scores
![Page 49: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/49.jpg)
Correlation
Linear correlation
Pattern on a scatter diagram is a straight line
Example above
Curvilinear correlation
More complex relationship between variables
Pattern in a scatter diagram is not a straight line
Example below
![Page 50: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/50.jpg)
Correlation
Positive linear correlation
High scores on one variable matched by high scores on another
Line slants up to the right
Negative linear correlation
High scores on one variable matched by low scores on another
Line slants down to the right
![Page 51: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/51.jpg)
Correlation
Zero correlation
No line, straight or otherwise, can be fit to the relationship between the two variables
Two variables are said to be “uncorrelated”
![Page 52: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/52.jpg)
Correlation Review
a. Negative linear correlation
b. Curvilinear correlation
c. Positive linear correlation
d. No correlation
![Page 53: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/53.jpg)
Correlation Coefficient
Correlation coefficient, r, indicates the precise degree of linear correlation between two variables
Computed by taking “cross-products” of Z scores
Multiply Z score on one variable by Z score on the other variable
Compute average of the resulting products
Can vary from
-1 (perfect negative correlation) through 0 (no correlation) to +1 (perfect positive correlation)
$$r = \frac{\sum Z_X Z_Y}{N}$$
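The cross-product recipe can be sketched as follows. The paired scores and the function names are hypothetical, and the Z scores use the population SD (divide by N) so that the average cross-product stays between -1 and +1.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Convert raw scores to Z scores using the population SD."""
    m, sd = fmean(values), pstdev(values)
    return [(v - m) / sd for v in values]

def correlation_r(xs, ys):
    """Average of the cross-products of paired Z scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

hours = [1, 2, 3, 4, 5]              # hypothetical hours of studying
grades = [52, 60, 71, 80, 87]        # hypothetical exam grades
r = correlation_r(hours, grades)     # close to +1: strong positive linear correlation
```

Scores that fall exactly on a rising straight line give r = +1, and scores on a falling line give r = -1.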
![Page 54: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/54.jpg)
![Page 55: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/55.jpg)
Correlation and Causality
When two variables are correlated, three possible directions of causality
X->Y
X<-Y
X<-Z->Y
Inherent ambiguity in correlations
Knowing that two variables are correlated tells you nothing about their causal relationship
![Page 56: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/56.jpg)
Prediction
Correlations can be used to make predictions about scores
Predictor
X variable
Variable being predicted from
Criterion
Y variable
Variable being predicted
Sometimes called “regression”
![Page 57: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/57.jpg)
Multiple Correlation and Multiple Regression
Multiple correlation
Association between a criterion variable and two or more predictor variables
Multiple regression
Making predictions about a criterion variable based on two or more predictor variables
Unlike prediction from one variable, the standardized regression coefficient is not the same as the ordinary correlation coefficient
![Page 58: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/58.jpg)
Proportion of Variance Accounted For
Correlation coefficients
Indicate strength of a linear relationship
Cannot be compared directly
e.g., an r of .40 is more than twice as strong as an r of .20
To compare correlation coefficients, square them
An r of .40 yields an $r^2$ of .16; an r of .20 an $r^2$ of .04
Squared correlation indicates the proportion of variance on the criterion variable accounted for by the predictor variable
R-square
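The comparison of r = .40 and r = .20 is plain arithmetic, sketched here with illustrative variable names. Floating-point results are only approximately .16 and .04.

```python
r_strong, r_weak = 0.40, 0.20

var_strong = r_strong ** 2          # proportion of variance accounted for, about .16
var_weak = r_weak ** 2              # about .04
ratio = var_strong / var_weak       # about 4: four times the variance accounted for
```

So doubling r from .20 to .40 quadruples the proportion of variance accounted for, which is why the slide calls .40 "more than twice as strong."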
![Page 59: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/59.jpg)
Most Commonly Used Statistical Techniques
Linear Regression (predicts the value of one numerical variable given another variable)
- How much does the maximum legibility distance of highway signs decrease when age is increased?
![Page 60: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/60.jpg)
Data on winning bid price for 12 Saturn cars on eBay in July 2002
• Simple linear regression is a data analysis technique that tries to find a linear pattern in the data.
• In linear regression, we use all of the data to calculate a straight line which may be used to predict Price based on Miles.
• Since Miles is used to predict Price, Miles is called an `Explanatory (Independent) Variable' while Price is called a `Response (Dependent) Variable'.
![Page 61: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/61.jpg)
• The slope of the line is -.05127, which means that predicted Price tends to drop about 5 cents for every additional mile driven, or about $512.70 for every 10,000 miles.
• The intercept (or Y-intercept) of the line is $8136; this should not be interpreted as the predicted price of a car with 0 mileage because the data provide information only for Saturn cars between 9,300 miles and 153,260 miles.
• We can now use the line to predict the selling price of a car with 60,000 miles. What is the height or Y value of the line at X = 60,000? The answer is
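Using only the slope and intercept quoted above, the height of the line at a given mileage can be sketched as below. The function name and the decision to refuse mileages outside the observed range are illustrative assumptions, and the result reflects the rounded coefficients shown on the slide.

```python
SLOPE = -0.05127                          # dollars per mile, as quoted above
INTERCEPT = 8136.0                        # dollars, as quoted above
MIN_MILES, MAX_MILES = 9_300, 153_260     # mileage range covered by the data

def predicted_price(miles):
    """Height of the fitted line at X = miles."""
    if not MIN_MILES <= miles <= MAX_MILES:
        # The line is only supported by data inside this range
        raise ValueError("mileage is outside the range of the data")
    return INTERCEPT + SLOPE * miles

price_60k = predicted_price(60_000)       # 8136 - 0.05127 * 60000, roughly 5059.80
drop_per_10k = -SLOPE * 10_000            # roughly 512.70 dollars per 10,000 miles
```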
![Page 62: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/62.jpg)
Most Commonly Used Statistical Techniques
T-test (for the means)
- What is the mean time that college students watch TV per day?
- What is the mean pulse rate of women?
![Page 63: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/63.jpg)
Hypothesis Testing
Procedure for deciding whether the outcome of a study supports a particular theory or practical innovation
![Page 64: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/64.jpg)
Core Logic of Hypothesis Testing
Approach can seem curious or even backwards
Researcher considers how probable the observed result would be if the experimental procedure had no effect, that is, if it occurred by chance alone
If that probability is sufficiently low, researcher will…
Reject the notion that experimental procedure had no effect
Affirm the hypothesis that the procedure did have an effect
![Page 65: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/65.jpg)
The Null Hypothesis and the Research Hypothesis
Null hypothesis (H0)
Opposite of desired result
Usually that manipulation had no effect
Research hypothesis (H1)
Also called the “alternative hypothesis”
Opposite of the null hypothesis
What the experimenter desired or expected all along: that the manipulation did have an effect
![Page 66: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/66.jpg)
One-tailed vs. Two-tailed Hypothesis Tests
Directional prediction
Researcher expects experimental procedure to have an effect in a particular direction
One-tailed significance tests may be used
Nondirectional prediction
Researcher expects experimental procedure to have an effect but does not predict a particular direction
Two-tailed significance test appropriate
Takes into account that the sample could be extreme at either tail of the comparison distribution
![Page 67: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/67.jpg)
One-tailed vs. Two-tailed Hypothesis Tests
Two-tailed tests
- More conservative than one-tailed tests
- Some believe that two-tailed tests should always be used, even when an experimenter makes a directional prediction
![Page 68: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/68.jpg)
Significance Level Cutoffs for One- and Two-Tailed Tests
The .05 significance level
The .01 significance level
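The cutoff chart from this slide did not survive in the transcript. As a rough sketch, assuming a standard normal (Z) comparison distribution, which the slide itself does not specify, the cutoffs can be recomputed with Python's standard library. The function name `z_cutoff` is made up for illustration.

```python
from statistics import NormalDist

def z_cutoff(alpha, tails):
    """Critical |Z| for significance level alpha with a 1- or 2-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / tails
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.05, 0.01):
    for tails in (1, 2):
        print(f"alpha={alpha}, {tails}-tailed: |Z| cutoff = {z_cutoff(alpha, tails):.3f}")
```

At the same significance level the two-tailed cutoff lies farther from zero than the one-tailed cutoff, so a sample has to be more extreme to reach significance.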
![Page 69: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/69.jpg)
Decision Errors
When the right procedure leads to the wrong conclusion
Type I Error
- Reject the null hypothesis when it is true
- Conclude that a manipulation had an effect when in fact it did not
Type II Error
- Fail to reject the null when it is false
- Conclude that a manipulation did not have an effect when in fact it did
![Page 70: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/70.jpg)
P-value
The p-value is the probability of obtaining a result at least as extreme as the one that was actually observed, assuming that the null hypothesis is true.
Frequent misunderstandings
For more details, please refer to Wikipedia.
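To make the definition concrete, here is a minimal sketch that turns a Z statistic into a p-value. It assumes the test statistic is standard normal when the null hypothesis is true. The name `p_value_from_z` is hypothetical, and the one-tailed branch assumes the observed direction matches the predicted one.

```python
from statistics import NormalDist

def p_value_from_z(z, tails=2):
    """P(result at least as extreme as z), assuming H0 is true and Z is standard normal."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Area beyond |z| in one tail of the standard normal curve.
    one_tail_area = 1 - NormalDist().cdf(abs(z))
    # A two-tailed test counts equally extreme results in both directions.
    return one_tail_area if tails == 1 else 2 * one_tail_area

print(p_value_from_z(1.96))           # two-tailed
print(p_value_from_z(1.96, tails=1))  # one-tailed
```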
![Page 71: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/71.jpg)
Decision Errors
Setting a strict significance level (e.g., p < .001)
- Decreases the possibility of committing a Type I error
- Increases the possibility of committing a Type II error
Setting a lenient significance level (e.g., p < .10)
- Increases the possibility of committing a Type I error
- Decreases the possibility of committing a Type II error
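The trade-off can be checked numerically. The sketch below assumes a one-tailed Z test with known population standard deviation. The effect size, standard deviation, and sample size are made-up illustration values, and `type_ii_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Probability of a Type II error for an upper-tailed Z test when the true mean shift is `effect`."""
    z = NormalDist()
    cutoff = z.inv_cdf(1 - alpha)      # rejection threshold on the Z scale
    shift = effect * sqrt(n) / sigma   # centre of the Z statistic when H0 is false
    return z.cdf(cutoff - shift)       # chance the statistic still falls below the cutoff

# Made-up scenario: true shift 0.5, sigma 1.0, n = 16.
for alpha in (0.001, 0.05, 0.10):
    beta = type_ii_error(alpha, effect=0.5, sigma=1.0, n=16)
    print(f"alpha={alpha}: Type I risk={alpha}, Type II risk={beta:.3f}")
```

Under these assumptions, the stricter the significance level, the larger the Type II risk becomes.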
![Page 72: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/72.jpg)
Test Statistic
Value computed from sample information
Basis for rejecting/not rejecting the null hypothesis
Used to compute the p-value
Example:
![Page 73: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/73.jpg)
T-test
A t-test is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is unknown and is replaced by an estimate based on the data, the test statistic follows a Student's t distribution.
![Page 74: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/74.jpg)
t-test
One-sample t test
Two-sample t test
- Independent two-sample: equal sample size, equal variance; unequal sample size, equal variance
- Dependent two-sample
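The t statistics for these variants can be sketched with the standard library alone. The function names are hypothetical, and the two-sample version assumes equal population variances (the pooled form), as listed on the slide. Turning t into a p-value needs a t distribution table or a statistics package, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    t = (mean(sample) - mu0) / sqrt(variance(sample) / n)
    return t, n - 1

def pooled_two_sample_t(a, b):
    """Independent two-sample t, assuming equal population variances."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df

def dependent_t(before, after):
    """Dependent (paired) t: a one-sample t test on the pairwise differences."""
    diffs = [y - x for x, y in zip(before, after)]
    return one_sample_t(diffs, 0.0)
```

The pooled formula covers both the equal and the unequal sample size cases; with equal sizes it simply reduces to averaging the two variances.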
![Page 75: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/75.jpg)
The Hypothesis Testing Process
1. Restate the research question as a research hypothesis and a null hypothesis about the populations.
2. Set the level of significance, α.
3. Collect the sample and compute the test statistic.
4. Assume H0 is true, compute the p-value.
5. If p-value < α, reject H0.
6. State your conclusion.
SUMMARY OF HYPOTHESIS TESTS
Ex7,8
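The six steps can be walked through in a few lines. This is only a sketch: it swaps in a two-tailed one-sample Z test with a known population standard deviation so that the p-value can be computed with the standard library, and the scores, the null mean of 100, and sigma of 6 are made-up illustration values, not data from the workshop exercises.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test(sample, mu0, sigma, alpha=0.05):
    """Two-tailed one-sample Z test; returns (z, p_value, reject_h0)."""
    # Step 3: compute the test statistic from the sample.
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    # Step 4: assuming H0 is true, Z is standard normal, so the p-value is the two-tail area.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 5: reject H0 when the p-value is below alpha.
    return z, p_value, p_value < alpha

# Steps 1-2: H0: mu = 100, H1: mu != 100, alpha = .05 (hypothetical values).
scores = [104, 110, 98, 107, 112, 101, 109, 105, 103]
z, p, reject = z_test(scores, mu0=100, sigma=6, alpha=0.05)

# Step 6: state the conclusion.
verdict = "reject H0" if reject else "fail to reject H0"
print(f"z = {z:.2f}, p = {p:.4f}: {verdict}")
```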
![Page 76: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/76.jpg)
Most Commonly Used Statistical Techniques
Analysis of Variance (testing differences of means for 2 or more groups)
- Is GPA related to where a student likes to sit (front, middle, back)?
- Which internet search engine is the fastest?
![Page 77: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/77.jpg)
Analysis of Variance
Abbreviated as “ANOVA”
Used to compare the means of more than two groups
Null hypothesis is that all populations being studied have the same mean
Reject null if at least one population has a mean that differs from the others
Actually works by analyzing variances
![Page 78: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/78.jpg)
Two Different Ways of Estimating Population Variance
Estimate population variance from variation within each group
- Is not affected by whether or not null hypothesis is true
Estimate population variance from variation between groups
- Is affected by whether or not null hypothesis is true
![Page 79: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/79.jpg)
Two Important Questions
1. How to estimate population variance from variation between groups?
2. How is that estimate affected by whether or not the null is true?
![Page 80: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/80.jpg)
Estimate population variance from variation between means of groups
First, variation among means of samples is related directly to the amount of variation within each population from which samples are taken
The more variation within each population, the more variation in means of samples taken from those populations
Note that populations on the right produce means that are more scattered
![Page 81: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/81.jpg)
Estimate population variance from variation between means of groups
And second, when null is false there is an additional source of variation
When null hypothesis is true (left), variation among means of samples is caused by
- Variation within the populations
When null hypothesis is false (right), variation among means of samples is caused by
- Variation within the populations
- And also by variation among the population means
![Page 82: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/82.jpg)
Basic Logic of ANOVA
ANOVA entails a comparison between two estimates of population variance
Ratio of between-groups estimate to within-groups estimate called an F ratio
Compare obtained F value to an F distribution
$$F = \frac{\text{Between Groups}}{\text{Within Groups}}$$
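The F ratio for a one-way design can be computed directly from the two variance estimates. The sketch below uses the usual mean-square formulas, and the function name `f_ratio` and the example groups are made up for illustration. Comparing F against an F distribution requires a table or a statistics package and is not shown.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F: between-groups variance estimate over within-groups estimate."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Between-groups estimate: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups estimate: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within, (k - 1, n_total - k)

# Made-up scores for three groups (e.g., front, middle, back of the room).
f, df = f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f"F{df} = {f:.2f}")
```

When the null hypothesis is true both estimates target the same population variance, so F should be near 1. A large F suggests the between-groups estimate has been inflated by real differences among the population means.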
![Page 83: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/83.jpg)
Assumptions of an ANOVA
Populations follow a normal curve
Populations have equal variances
As for t tests, ANOVAs often work fairly well even when those assumptions are violated
![Page 84: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/84.jpg)
Rejecting the Null Hypothesis
A significant F tells you that at least one of the means differs from the others
- Does not indicate how many differ
- Does not indicate which one(s) differ
For more specific conclusions, a researcher must conduct follow-up t tests
Problem: Running many t tests increases the chances of finding a significant result just by chance (i.e., increases chances beyond p = .05)
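How quickly the chance of a false positive grows can be approximated as follows. This is a simplified sketch: it treats the follow-up tests as independent, which pairwise t tests on shared groups are not, so the numbers are only indicative. The function name is hypothetical.

```python
from math import comb

def chance_of_false_positive(n_groups, alpha=0.05):
    """Approximate chance of at least one Type I error across all pairwise t tests."""
    n_tests = comb(n_groups, 2)  # number of distinct pairs of groups
    # Assumes independent tests, each run at significance level alpha.
    return n_tests, 1 - (1 - alpha) ** n_tests

for k in (3, 4, 5):
    n_tests, risk = chance_of_false_positive(k)
    print(f"{k} groups -> {n_tests} t tests, chance of a false positive = {risk:.3f}")
```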
![Page 85: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/85.jpg)
ANOVA (continued)
Procedure that allows one to examine two or more variables in the same study
- Efficient
- Allows for examination of interaction effects
An ANOVA with only one variable is a one-way ANOVA, an ANOVA with two variables is a two-way ANOVA, and so on
![Page 86: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/86.jpg)
Main Effects vs. Interactions
A main effect refers to the effect of one variable, averaging across the other(s)
An interaction effect refers to a case in which the effect of one variable depends on the level of another variable
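For a 2 x 2 design the two ideas reduce to simple arithmetic on the four cell means. The sketch below assumes equal cell sizes. The variable labels A and B, the function name, and the cell means are made up for illustration.

```python
def two_by_two_effects(cell_means):
    """cell_means[i][j] is the mean for level i of variable A and level j of variable B."""
    (a1b1, a1b2), (a2b1, a2b2) = cell_means
    # Main effect of A: difference between A levels, averaging across B.
    main_a = (a2b1 + a2b2) / 2 - (a1b1 + a1b2) / 2
    # Main effect of B: difference between B levels, averaging across A.
    main_b = (a1b2 + a2b2) / 2 - (a1b1 + a2b1) / 2
    # Interaction: does the effect of B change from one level of A to the other?
    interaction = (a2b2 - a2b1) - (a1b2 - a1b1)
    return main_a, main_b, interaction

print(two_by_two_effects([[10, 20], [30, 40]]))  # effect of B is the same at both A levels
print(two_by_two_effects([[10, 20], [30, 60]]))  # effect of B depends on the level of A
```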
![Page 87: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/87.jpg)
Main Effects vs. Interactions
![Page 88: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/88.jpg)
Most Commonly Used Statistical Techniques
Chi-square test of independence (Relationship of 2 categorical variables)
- With whom is it easier to make friends?
- Does the opinion on legalization of marijuana depend on one’s religion?
![Page 89: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/89.jpg)
Chi-Square Tests
Hypothesis testing procedure for nominal variables
Focus on number of people/items in each category (e.g., hair color, political party, gender)
Compare how well an observed distribution fits an expected distribution
Expected distribution can be based on
- A theory
- Prior results
- Assumption of equal distribution across categories
![Page 90: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/90.jpg)
Chi-Square Test for Goodness of Fit
Single nominal variable
Degrees of freedom = number of categories minus 1
![Page 91: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/91.jpg)
Chi-Square Statistic
Compares observed frequency distribution to expected frequency distribution
Compute difference between observed and expected and square each one
Weight each by its expected frequency
Sum them
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Ex9
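The chi-square statistic, the sum over categories of (O − E)² / E, takes only a few lines to compute. In this sketch the observed counts are made up, and the expected counts assume an equal distribution across categories, one of the options listed earlier. The function name is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Made-up counts for three categories, with equal expected frequencies.
observed = [10, 20, 30]
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1  # goodness of fit: number of categories minus 1
print(f"chi-square = {chi2:.2f}, df = {df}")
```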
![Page 92: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/92.jpg)
Chi-Square Distribution
Compare obtained chi-square to a chi-square distribution
Does mismatch between observed and expected frequency exceed what would be expected by chance alone?
![Page 93: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/93.jpg)
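Chi-Square Test for Independence
Two nominal variables
Independence means no relation between variables
Contingency table
- Lists number of observations for each combination of categories
To determine degrees of freedom…
$$df = (N_{\text{Columns}} - 1)(N_{\text{Rows}} - 1)$$
To determine expected frequencies…
$$E = \left(\frac{R}{N}\right)(C)$$
where R is the number of observations in the cell's row, C is the number in the cell's column, and N is the total number of observations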
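Applied to a whole contingency table, the expected-frequency rule E = (R / N)(C) and the degrees-of-freedom rule (columns − 1)(rows − 1) look like this. It is a minimal sketch with hypothetical function names and a made-up 2 x 2 table.

```python
def expected_frequencies(table):
    """Expected count per cell: (row total / grand total) * column total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r / n * c for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table[0]) - 1) * (len(table) - 1)
    return chi2, df

# Made-up counts: rows are categories of one variable, columns of the other.
table = [[10, 20], [30, 40]]
print(expected_frequencies(table))
print(chi_square_independence(table))
```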
![Page 94: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/94.jpg)
Most Commonly Used Statistical Techniques
Correlation (Relationship of 2 numerical variables)
- Is there a connection between the average verbal SAT and the percent of graduates who took the SAT in a state?
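The slide does not name a specific coefficient. As one common choice, the sketch below computes the Pearson correlation coefficient, using made-up numbers rather than the SAT data mentioned above. The function name `pearson_r` is hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up paired measurements.
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

Values near +1 or −1 indicate a strong linear relationship, and values near 0 indicate little or no linear relationship.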
![Page 95: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/95.jpg)
Other Statistical Techniques
Factor analysis (reducing independent variables which are highly correlated)
Cluster analysis (grouping observations with similar characteristics)
Correspondence Analysis (grouping the levels of 2 or more categorical variables)
Time Series Analysis
And so on…
![Page 96: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/96.jpg)
Inference with highest confidence level
![Page 97: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/97.jpg)
Definition of Statistics
Statistics is a mathematical science pertaining to the collection, analysis, interpretation or explanation, and presentation of data.
---From Wikipedia
![Page 98: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/98.jpg)
Presentation of Data
FOR CATEGORICAL DATA
--- Bar Chart
--- Pie Chart
![Page 99: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/99.jpg)
Presentation of Data
FOR NUMERICAL DATA
--- Stem-and-Leaf Plot
--- Histogram
--- Boxplot
![Page 100: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/100.jpg)
Overview of Statistical Techniques
![Page 101: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/101.jpg)
![Page 102: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/102.jpg)
Questions?
or
Comments?
![Page 103: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/103.jpg)
Upcoming Workshops
10/26/2009 Overview of SPSS
12/02/2009 Overview of SAS
![Page 104: Lincoln Jiang Statistical Consultant Western Michigan University The Graduate College Graduate Center for Research and Retention](https://reader037.vdocument.in/reader037/viewer/2022110205/56649c945503460f94950715/html5/thumbnails/104.jpg)
How to lie with statistics
1. The Sample with Built-in Bias.
2. Well-Chosen Average.
3. The Gee-Whiz Graph.
4. Correlation and Causation.