Data Analysis

Upload: clare-lee

Post on 13-Dec-2015

Data Analysis

A Few Necessary Terms

Categorical Variable: Discrete groups, such as Type of Reach (Riffle, Run, Pool)

Continuous Variable: Measurements along a continuum, such as Flow Velocity

What type of variable would “Mottled Sculpin/meter2” be?

What type of variable is “Substrate Type”?

What type of variable is “% of bank that is undercut”?

A Few Necessary Terms

Explanatory Variable: Independent variable. On x-axis. The variable you use as a predictor.

Response Variable: Dependent variable. On y-axis. The variable that is hypothesized to depend on/be predicted by the explanatory variable.

Statistical Tests: Appropriate Use

For our data, the response variable will always be continuous.

T-test: A categorical explanatory variable with 2 groups.

ANOVA: A categorical explanatory variable with >2 groups.

Regression: A continuous explanatory variable.
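
The rule on this slide can be written as a small lookup. This is only a sketch: the function name `choose_test` and its arguments are made up for illustration, and it assumes, as the slide does, that the response variable is continuous.

```python
def choose_test(explanatory_is_categorical: bool, n_groups: int = 0) -> str:
    """Pick a test for a continuous response variable (hypothetical helper)."""
    if not explanatory_is_categorical:
        # Continuous explanatory variable, e.g. Flow Velocity
        return "Regression"
    if n_groups < 2:
        raise ValueError("a categorical explanatory variable needs at least 2 groups")
    # Exactly 2 groups: T-test. More than 2 groups: ANOVA.
    return "T-test" if n_groups == 2 else "ANOVA"


print(choose_test(True, 2))   # a categorical variable with two groups
print(choose_test(True, 3))   # Type of Reach: Riffle, Run, Pool
print(choose_test(False))     # a continuous explanatory variable
```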

Statistical Tests

Hypothesis Testing: In statistics, we are always testing a Null Hypothesis (H0) against an Alternative Hypothesis (Ha).

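Test Statistic: A number calculated from the sample data (such as t or F) that measures how far the data depart from what the null hypothesis predicts.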

p-value: The probability of observing our data or more extreme data assuming the null hypothesis is correct

Statistical Significance: We reject the null hypothesis if the p-value is below a set value, usually 0.05.
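
One way to see what a p-value means is to simulate the null hypothesis directly. The sketch below uses a permutation test, which is not the test named on these slides, only an illustration of the definition: if the group labels do not matter (H0), shuffling them should often produce a difference in means as large as the one observed. The density values and the names `site_a` and `site_b` are invented for the example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Share of label shuffles with a mean difference at least as extreme as observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # reassign observations to groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for rounding error
            extreme += 1
    return extreme / n_shuffles


# Hypothetical fish densities (per m2) at two sites
site_a = [0.31, 0.28, 0.35, 0.30, 0.33]
site_b = [0.12, 0.15, 0.10, 0.18, 0.14]

p = permutation_p_value(site_a, site_b)
print("p-value:", p)
print("reject H0 at 0.05:", p < 0.05)
```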

Student’s T-Test

Tests the statistical significance of the difference between means from two independent samples

[Figure: Mottled Sculpin/m2 (y-axis) for Cross Plains and Salmo Pond]

Compares the mean of a continuous response variable between the 2 groups of a categorical explanatory variable
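
A minimal sketch of the calculation behind Student's T-Test, assuming equal variances (the pooled form). The data are invented, and the helper names `t_pdf`, `two_sided_p`, and `students_t_test` are illustrative. The p-value is found by numerically integrating the t distribution, which a statistics package would normally do for you.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Probability of a t value at least as extreme as |t| if H0 is true.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be even).
    """
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (total * h / 3)  # area between -|t| and +|t|
    return max(0.0, 1.0 - central_area)


def students_t_test(a, b):
    """Pooled-variance two-sample t-test; returns (t, df, two-sided p)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / df
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(a) - mean(b)) / std_error
    return t, df, two_sided_p(t, df)


# Hypothetical densities (per m2) at two sites
site_a = [0.31, 0.28, 0.35, 0.30, 0.33]
site_b = [0.12, 0.15, 0.10, 0.18, 0.14]

t_stat, df, p_value = students_t_test(site_a, site_b)
print("t =", round(t_stat, 2), "df =", df, "p =", p_value)
```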

Precautions and Limitations

• Meet Assumptions

  • Observations come from a normally distributed population (check with a histogram)

  • Samples are independent

  • Variances of the two samples are assumed equal (check with a boxplot)

  • No other sample biases

• Interpreting the p-value

Analysis of Variance (ANOVA)

Tests the statistical significance of the difference between means from two or more independent samples

ANOVA website

[Figure: Mottled Sculpin/m2 (y-axis) for Riffle, Pool, and Run, with the Grand Mean marked]
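
The role of the grand mean can be sketched by computing the ANOVA F statistic by hand: variation of the group means around the grand mean is compared with variation of observations around their own group mean. The data and the function name `one_way_anova_f` are invented for illustration, and the sketch stops at F (turning F into a p-value requires the F distribution, which is left to a statistics package).

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for group in groups for x in group]
    grand_mean = mean(all_values)
    group_means = [mean(group) for group in groups]

    # Between groups: how far each group mean sits from the grand mean
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    # Within groups: how far each observation sits from its own group mean
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical densities (per m2) by reach type
riffle = [0.30, 0.35, 0.28, 0.33]
pool = [0.10, 0.14, 0.12, 0.09]
run = [0.20, 0.22, 0.18, 0.25]

f_stat, df_b, df_w = one_way_anova_f([riffle, pool, run])
print("F =", round(f_stat, 2), "with", df_b, "and", df_w, "degrees of freedom")
```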

Precautions and Limitations

• Meet Assumptions

  • Observations come from normally distributed populations

  • Samples are independent

  • Variances of the groups are assumed equal

  • No other sample biases

• Interpreting the p-value

• If the ANOVA is significant, follow with pairwise T-tests to find which groups differ

Simple Linear Regression

• What is it? Least squares line

• When is it appropriate to use?

• Assumptions?

• What does the p-value mean? The R-value?

• How to do it in Excel

Simple Linear Regression

[Figure: scatter plot of Brown Trout/Meter^2 (y-axis) against Mottled Sculpin/Meter^2 (x-axis) with a fitted line, R2 = 0.6955]

Tests the statistical significance of a relationship between two continuous variables, Explanatory and Response
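
A minimal sketch of how a least squares line and its R-squared can be computed. The numbers are invented and will not reproduce the R2 = 0.6955 from the slide's data; the names `least_squares`, `sculpin`, and `trout` are illustrative.

```python
from statistics import mean


def least_squares(x, y):
    """Return (slope, intercept, r_squared) for the least squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    fitted = [intercept + slope * xi for xi in x]
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1 - ss_residual / ss_total  # share of variance in y explained by x
    return slope, intercept, r_squared


# Hypothetical densities (per m2): explanatory variable x, response variable y
sculpin = [0.05, 0.10, 0.20, 0.30, 0.45]
trout = [0.04, 0.12, 0.15, 0.28, 0.33]

slope, intercept, r_squared = least_squares(sculpin, trout)
print("slope =", round(slope, 3))
print("intercept =", round(intercept, 3))
print("R-squared =", round(r_squared, 3))
```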

Precautions and Limitations

• Meet Assumptions

  • Residuals are approximately normally distributed

  • Observations are independent

  • Residuals have equal variance across the fitted values

  • Relationship is linear

  • No other sample biases

• Interpret the p-value and R-squared value.

Residual Plots

Residuals are the vertical distances from observed points to the best-fit line (observed value minus fitted value)

For a least squares line with an intercept, residuals always sum to zero

Regression chooses the best-fit line to minimize the sum of squared residuals. It is called the Least Squares Line.
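
Both properties can be checked numerically. The sketch below, on invented data with illustrative helper names, fits a line, confirms that the residuals sum to (approximately) zero, and shows that nudging the slope or the intercept away from the fitted values makes the sum of squared residuals larger.

```python
from statistics import mean


def fit_line(x, y):
    """Least squares slope and intercept."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return slope, y_bar - slope * x_bar


def residuals(x, y, slope, intercept):
    """Observed value minus fitted value for each point."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def sum_squared_residuals(x, y, slope, intercept):
    return sum(r ** 2 for r in residuals(x, y, slope, intercept))


# Hypothetical data
x = [0.05, 0.10, 0.20, 0.30, 0.45]
y = [0.04, 0.12, 0.15, 0.28, 0.33]

slope, intercept = fit_line(x, y)
res = residuals(x, y, slope, intercept)
best = sum_squared_residuals(x, y, slope, intercept)

print("sum of residuals:", sum(res))  # zero up to rounding error
print("SSR at the fitted line:", best)
print("SSR with a steeper slope:", sum_squared_residuals(x, y, slope + 0.1, intercept))
print("SSR with a higher intercept:", sum_squared_residuals(x, y, slope, intercept + 0.05))
```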

[Figure: scatter plot of Brown Trout/Meter^2 (y-axis) against Mottled Sculpin/Meter^2 (x-axis), R2 = 0.6955, with the Residuals labeled]

Residual vs. Fitted Value Plots

[Figure: Residuals (y-axis, -0.15 to 0.15) against Fitted Values (MS_CPUA) (x-axis, 0 to 0.5). Labels: Model Values (Line), Observed Values (Points)]

Residual Plots Can Help Test Assumptions

[Figure: three example residual plots, each centered on 0]

• “Normal” Scatter

• Fan Shape: Unequal Variance

• Curve (linearity)

[Figure: Residuals against Fitted Values (MS_CPUA), together with the scatter plot of Brown Trout/Meter^2 against Mottled Sculpin/Meter^2 (R2 = 0.6955)]

Have we violated any assumptions?

R-Squared and P-value

High R-Squared

Low p-value (significant relationship)

R-Squared and P-value

Low R-Squared

Low p-value (significant relationship)

R-Squared and P-value

High R-Squared

High p-value (NO significant relationship)

R-Squared and P-value

Low R-Squared

High p-value (No significant relationship)

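P-value indicates the strength of the evidence for a relationship between the two variables. A low p-value means the observed relationship would be unlikely if there were truly no relationship; it does not measure how strong the relationship is.

R-Squared indicates how much of the variance in the response variable is explained by the explanatory variable.

You can think of this as a measure of predictability.

If this is low, other variables likely play a role. If this is high, it DOES NOT INDICATE A SIGNIFICANT RELATIONSHIP! Check the p-value as well.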
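
Why R-squared and the p-value can disagree comes down to sample size. For simple linear regression the slope's test statistic is t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom, so the same R-squared gives a different p-value for different n. The sketch below uses invented R-squared values and sample sizes, and illustrative function names; the p-value is obtained by numerically integrating the t distribution.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value for t, using Simpson's rule on the density from 0 to |t|."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2 * (total * h / 3))


def regression_p_value(r_squared, n):
    """p-value for the slope of a simple linear regression with n points."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1) for this sketch")
    df = n - 2
    t = math.sqrt(r_squared * df / (1 - r_squared))
    return two_sided_p(t, df)


# High R-squared, but only 3 points: not significant at 0.05
p_few_points = regression_p_value(0.80, n=3)
# Low R-squared, but 200 points: significant at 0.05
p_many_points = regression_p_value(0.05, n=200)

print("R2 = 0.80, n = 3:   p =", round(p_few_points, 4))
print("R2 = 0.05, n = 200: p =", round(p_many_points, 4))
```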