foundation statistics copyright douglas l. dean, 2015

Foundation Statistics

Copyright Douglas L. Dean, 2015

Types of Variables

1. Input variables: might explain the outcome variable

2. Output variable: the variable you want to explain or predict

Outline

1. One-way ANOVA: Does a category make a significant difference?

2. Bivariate statistics: Are numeric variables related to each other?

3. Plotting bivariate relationships and lines of best fit

4. Simple linear regression

One-way ANOVA

Outlier Thresholds

1.5 x the interquartile range

Interquartile range

Annotated Box-whisker Plot
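The 1.5 x interquartile range rule can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the sample data and the function name `outlier_thresholds` are made up for the example, and the quartiles come from `statistics.quantiles` with the "inclusive" method, which is only one of several common quartile conventions (other tools may give slightly different fences).

```python
import statistics

def outlier_thresholds(values):
    """Return (lower, upper) fences placed 1.5 x IQR beyond the quartiles."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: width of the box in a box plot
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample
lower, upper = outlier_thresholds(data)

# Points outside the fences are flagged as outliers.
outliers = [v for v in data if v < lower or v > upper]
print(lower, upper, outliers)
```

Here the quartiles are 4 and 8, so the interquartile range is 4 and the fences sit at -2 and 14; only the value 30 falls outside them.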

Bivariate descriptive stats

• Slope (m)

• Correlation (r)

• Coefficient of Determination (R²)

Every straight line can be represented by an equation: y = mx + b

The slope ‘m’ describes both the direction and the steepness of the line.

Slope Tree

Zero Slope

Undefined Slope

How slope is calculated
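As a rough sketch, slope is "rise over run" between two points on the line: m = (y2 - y1) / (x2 - x1). The function name `slope` and the sample points below are illustrative assumptions, chosen to show one case of each kind: positive, negative, zero, and undefined.

```python
def slope(p1, p2):
    """Rise over run between two points (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        return None  # vertical line: the run is zero, so slope is undefined
    return (y2 - y1) / (x2 - x1)

positive = slope((1, 2), (3, 6))   # line rises from left to right
negative = slope((0, 5), (2, 1))   # line falls from left to right
zero = slope((0, 4), (5, 4))       # horizontal line
undefined = slope((2, 1), (2, 9))  # vertical line

print(positive, negative, zero, undefined)
```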

A positive slope example

Another example

Negative Slope example

Correlation Coefficient

• The Pearson Correlation Coefficient (r) is a measure of the strength of the linear relationship between two numeric variables.

• The value of r ranges from -1 to 1
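One way to see where r comes from is to compute it by hand from the standard Pearson formula: the co-variation of x and y divided by the square root of the product of their individual variations. The sketch below uses made-up data and an illustrative function name, `pearson_r`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # perfect positive line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # perfect negative line
r_noisy = pearson_r(xs, [2, 1, 4, 3, 5])  # upward trend with scatter

print(r_up, r_down, r_noisy)
```

The two perfect lines give r = 1 and r = -1, the ends of the range. The scattered data gives r = .80: still a positive relationship, but not a perfect one.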

Correlation Coefficient

[Figure: scale of r from -1 to 1, labeled "Stronger" toward both ends]

Which correlation is stronger? r = -.80 or r = .80?

Neither. They are the same strength.

Examples of Perfect Correlation

Examples of Strong Correlation

Examples of Weak Correlation

No Correlation

Ways to get r = 0

• Pure randomness

• Perfectly horizontal straight line (y does not vary, so x explains nothing; strictly, r is undefined here and is treated as 0)

Correlation

• Often the first thing we check to see whether a relationship between variables may exist

• Good place to start, not to finish

• Is a standardized value

– Units of measure are factored out

– So you can have one variable on a small scale and the related variable on a large scale. No matter the scales, the value of r always falls between -1 and 1.

Limitations of Correlation

• Correlation measures linear association, not causality

• Correlation is only one important measure of a possible linear relationship

• Correlations lack statistical control for other possible related variables

Correlation ≠ Causality

The Japanese eat very little fat and drink little red wine and suffer fewer heart attacks than the British or Americans.

The French eat a lot of fat and drink a lot of red wine and suffer fewer heart attacks than the British or Americans.

The Germans drink a lot of beer and eat a lot of sausages and fat and suffer fewer heart attacks than the British or Americans.

Conclusion:

Eat and drink what you like. Apparently it is speaking English that kills you.

Ambiguities in causality abound…

When Y and Z are correlated, the direction of causality might be

Y → Z or Z → Y

Ambiguities in causality abound…

When a correlation exists between Y and Z, causality might be

[Diagram: a third variable X drives both Y and Z]

Statistical control

• The purpose of statistical control is to find the degree of association between two variables after removing the effects of other variables.

• Correlation lacks statistical control

• Many variables may exert influence on the variable being predicted. You cannot control for multiple influences with r alone.

• Some forms of statistics and data mining methods give you statistical control

Sign (+ or -) of Basic statistics

• Slope and r can be positive or negative

– Slope and r have the same sign

› If one is positive, so is the other

› If one is negative, so is the other

• R² is never negative; it ranges from 0 to 1

R-Squared (R²)

• R² is the coefficient of determination

• The proportion of the variance in Y attributable to X (if there is only one predictor) or to the set of X variables (if the model includes more than one predictor).

Calculation of R²

If only one predictor: R² = r²

Example: r = .80, so R² = .80² = .64

With multiple input variables, the math to calculate R² is more complex and is beyond the scope of this course.

Why we need more than just slope

• If we have the slope, why do we need r and R²?

– Relationships are rarely perfectly linear in their ability to predict.

– Slopes of "best-fit" lines do not give us a measure of variability in how x and y relate to each other

• Correlation (r) measures the strength of the linear association: how tightly the points cluster around the line

• R² measures the proportion of variance in y attributable to all of the input variables included in the model
