Chapter 9: Statistical Data Analysis

An Introduction to Scientific Research Methods in Geography

Montello and Sutton

Uploaded by dwain-clarke on 11-Jan-2016




Data Analysis

Data analysis helps us achieve the four scientific goals of description, prediction, explanation, and control.

Statistical data analysis: there are three primary reasons geographers treat data in a statistical fashion.

http://rlv.zcache.com/knowledge_is_power_do_statistics_stats_humor_flyer-p2440846222778564182dwj5_400.jpg


Statistical Description

Descriptive statistics (summaries of a sample) and parameters (the corresponding values for a population)

Central tendency: mode, median, and mean

Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ for a sample, $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$ for a population

When would you use the median or the mode instead of the mean?
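To make the closing question concrete, here is a minimal Python sketch using the standard-library `statistics` module. The income values and land-use labels are hypothetical, invented only for illustration. One extreme value pulls the mean far above the median, which is a common reason to prefer the median for skewed data; `statistics.mode` also accepts non-numeric categories.

```python
import statistics

# Hypothetical household incomes (in $1000s) with one very large value.
incomes = [32, 35, 38, 40, 41, 45, 250]

mean = statistics.mean(incomes)      # pulled upward by the outlier (250)
median = statistics.median(incomes)  # middle value, unaffected by the outlier
print(f"mean = {mean:.1f}, median = {median}")

# The mode is the most frequent value and works for categories as well.
land_use = ["residential", "commercial", "residential", "park", "residential"]
print("mode =", statistics.mode(land_use))
```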


Descriptive Statistics

Variability

Range = largest value − smallest value

Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

Standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
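The three variability measures above can be computed directly from their definitions. This is a minimal sketch on a hypothetical population of five rainfall totals (values invented for illustration); it divides by N, as in the population formulas, and cross-checks against `statistics.pvariance`. Note that `statistics.variance` divides by n − 1 and is the sample version.

```python
import math
import statistics

# Hypothetical population of N = 5 annual rainfall totals (cm).
x = [60.0, 72.0, 55.0, 81.0, 67.0]
N = len(x)
mu = sum(x) / N                                   # population mean

value_range = max(x) - min(x)                     # largest - smallest
variance = sum((xi - mu) ** 2 for xi in x) / N    # divide by N: population variance
std_dev = math.sqrt(variance)                     # population standard deviation

print(value_range, variance, round(std_dev, 3))
print(math.isclose(variance, statistics.pvariance(x)))  # same result as the library
```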


Descriptive Statistics

Form

Modality

Skewness: positive or negative

Symmetry: unimodal and bell-shaped, as in the normal distribution

http://people.eku.edu/falkenbergs/images/skewness.jpg
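The sign of skewness can be checked numerically. This sketch uses the population moment coefficient of skewness (the mean of cubed z-scores), which is one of several common skewness formulas and not necessarily the one used in the textbook; the data are hypothetical. It also prints mean minus median, which tends to be positive for positively skewed data.

```python
import statistics

def moment_skewness(values):
    """Population moment coefficient of skewness: the mean of cubed z-scores."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(((v - mu) / sigma) ** 3 for v in values) / len(values)

right_tailed = [1, 2, 2, 3, 3, 3, 4, 12]   # long tail toward high values (positive skew)
left_tailed = [-v for v in right_tailed]    # mirror image (negative skew)
symmetric = [1, 2, 3, 4, 5]                 # no skew

for name, data in [("positive", right_tailed), ("negative", left_tailed), ("symmetric", symmetric)]:
    gap = statistics.fmean(data) - statistics.median(data)
    print(name, round(moment_skewness(data), 3), "mean - median =", gap)
```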


Descriptive Statistics

Derived Scores

Percentile rank: the highest rank is the 99th percentile. Where is the median?

Z-score: the number of standard deviation units a score lies above or below the mean,

$z = \frac{x - \mu}{\sigma}$
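Both derived scores are short computations. In this sketch the exam scores are hypothetical, and percentile rank is assumed to mean "percentage of scores strictly below x"; textbooks define it in slightly different ways, so treat the helper names `z_score` and `percentile_rank` as illustrative only.

```python
import statistics

def z_score(x, mu, sigma):
    """Standard deviation units above (+) or below (-) the mean."""
    return (x - mu) / sigma

def percentile_rank(x, scores):
    """Assumed definition: percentage of scores strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

scores = [52, 61, 64, 70, 70, 73, 78, 85, 91, 96]  # hypothetical exam scores
mu = statistics.fmean(scores)
sigma = statistics.pstdev(scores)                  # population SD, matching z = (x - mu) / sigma

print("z for 85:", round(z_score(85, mu, sigma), 2))
print("percentile rank of 85:", percentile_rank(85, scores))
```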


Descriptive Statistics

Relationship: a linear relationship can be positive or negative

Relationship strength: weak, strong, or no relationship

Correlation coefficient: ranges between −1 and 1; 0 indicates no linear relationship

Regression analysis: criterion variables (Y) are predicted from predictor variables (X)

http://hosting.soonet.ca/eliris/remotesensing/LectureImages/correlation.gif
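A correlation coefficient and a simple regression line can both be computed from sums of deviations. This sketch implements Pearson's r and an ordinary least-squares line by hand; the elevation and temperature values are hypothetical, and the helper names `pearson_r` and `fit_line` are illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient, always between -1 and 1."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fit_line(x, y):
    """Ordinary least-squares line predicting criterion Y from predictor X."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx

# Hypothetical data: elevation (100s of m) as X, mean July temperature (deg C) as Y.
elevation = [1, 2, 3, 4, 5]
temperature = [24.0, 22.5, 21.5, 19.0, 18.0]

r = pearson_r(elevation, temperature)
slope, intercept = fit_line(elevation, temperature)
print(f"r = {r:.3f}; Y = {intercept:.2f} + ({slope:.2f})X")
```

Here r is strongly negative, a strong negative linear relationship in this invented data set.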


Correlation – Causation?

“Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’.” - XKCD

http://xkcd.com/552/


Statistical Inference

Inferential statistics: statistics computed from a sample are used to infer the parameters of the population.

Sampling error: sample statistics differ from population parameters by chance, so given our sample statistics we infer our parameters and assign probabilities to our guesses.

The power and the difficulty of inferential statistics come from deriving probabilities about how likely it is that patterns in the sample reflect patterns in the population.


Inferential Statistics

Sampling distribution: for example, the sampling distribution of means shows the probability that a single sample would have a mean within some given RANGE of values.

Central limit theorem: as sample size grows, the sampling distribution of sample means approaches a normal distribution with a mean equal to the population mean and a standard deviation equal to the population standard deviation divided by the square root of the sample size, $\sigma_{\bar{X}} = \sigma / \sqrt{n}$.
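The central limit theorem can be checked by simulation. This sketch assumes a strongly right-skewed population (exponential with rate 1, so the population mean and standard deviation are both 1), draws many samples of size n, and compares the spread of the sample means with $\sigma / \sqrt{n}$. The sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1 (right-skewed).
MU, SIGMA = 1.0, 1.0
n = 40               # size of each sample
num_samples = 5000   # number of repeated samples

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print("mean of sample means:", round(statistics.fmean(sample_means), 3))   # close to MU
print("SD of sample means:  ", round(statistics.stdev(sample_means), 3))   # close to sigma/sqrt(n)
print("sigma / sqrt(n):     ", round(SIGMA / n ** 0.5, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.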


Inferential Statistics

Generation of sampling distributions rests on assumptions:

Distributional assumptions: nonparametric vs. parametric (normality, homogeneity of variance)

Independence of scores

Correct specification of models


Estimation and Hypothesis Testing

Estimation: point estimation, or a confidence interval (usually 95%)

Hypothesis testing: the null hypothesis is a hypothesis about the exact (point) value of a parameter or set of parameters. We use sample statistics to make an inference about the probable truth of our null hypothesis.
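A 95% confidence interval for a mean, when the population standard deviation is known, is the point estimate plus or minus 1.96 standard errors. This sketch uses the numbers from the Denver television example in this chapter (sample mean 22 hours, σ = 5.6, n = 50); the helper name `z_confidence_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for a mean, assuming the population SD sigma is known."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% interval
    margin = z_crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Point estimate 22 hours; sigma = 5.6; n = 50 (Denver example values).
low, high = z_confidence_interval(22, 5.6, 50)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

The nationwide value of 25 hours falls outside this interval, which agrees with rejecting the null hypothesis at the .05 level in the example.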


Hypothesis Testing

Alternative hypothesis: the hypothesis that the parameter does not equal the exact value hypothesized in the null. It specifies a range rather than an exact value.

Modus tollens: useful for disconfirming, not confirming!

Valid (modus tollens): If A is true, then B is true. B is not true. Therefore, A is not true.

Invalid (affirming the consequent): If A is true, then B is true. B is true. Therefore, ???


Example

From a recent nationwide study it is known that the typical American watches 25 hours of television per week, with a population standard deviation of 5.6 hours. Suppose 50 Denver residents are randomly sampled, with an average viewing time of 22 hours per week and a standard deviation of 4.8 hours. Are Denver television viewing habits different from nationwide viewing habits?

Step 1: State your null and alternative hypotheses.

$H_0: \mu = 25$

$H_A: \mu \neq 25$

What is this saying?


Example

Step 2: Determine your appropriate test statistic and its sampling distribution, assuming the null is true. We are testing a sample mean, the population standard deviation is known, and n = 50 > 30, so the z distribution can be used.

Step 3: Calculate the test statistic from your sample data, where $\bar{X} = 22$, $s = 4.8$, $n = 50$, $\mu = 25$, and $\sigma = 5.6$:

$z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{22 - 25}{5.6 / \sqrt{50}} = -3.79$

Step 4: Compare the empirically obtained test statistic to the null sampling distribution, using either the p value or the critical value at the .05 significance level, $z = \pm 1.96$.

Critical value: −3.79 is less than −1.96, so reject.

P value: $p \approx .00015$ (two-tailed), which is less than .05 and even .01, so reject.

Decision: Reject the null hypothesis.
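Steps 3 and 4 can be reproduced in a few lines with the standard-library `NormalDist`. This is a minimal sketch assuming, as in the example, that the population standard deviation is known; the helper name `one_sample_z_test` is hypothetical, and the p value is two-tailed because the alternative hypothesis is "not equal".

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-tailed one-sample z test, assuming the population SD sigma is known."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    p_two_tailed = 2 * NormalDist().cdf(-abs(z))   # area in both tails beyond |z|
    return z, p_two_tailed

z, p = one_sample_z_test(sample_mean=22, mu0=25, sigma=5.6, n=50)
alpha = 0.05
print(f"z = {z:.2f}, p = {p:.5f}")
print("reject H0" if p < alpha else "fail to reject H0")
```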


Error

You have made either a correct inference or a mistake.

Type I error: rejecting a null hypothesis that is actually true. Its probability is the rejection (significance) level, α.

Type II error: failing to reject a null hypothesis that is actually false. Its probability is β.

http://www.mirrorservice.org/sites/home.ubalt.edu/ntsbarsh/Business-stat/error.gif
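The claim that the Type I error rate equals α can be checked by simulation. This sketch makes the null hypothesis true by construction (samples are drawn from a normal population whose mean really is 25, reusing the Denver example's σ = 5.6 and n = 50 as hypothetical settings), runs a two-tailed z test at α = .05 many times, and counts rejections. Every rejection is a Type I error, so the observed rate should land near .05.

```python
import math
import random
from statistics import NormalDist, fmean

random.seed(1)  # fixed seed so the run is repeatable

MU0, SIGMA, N = 25.0, 5.6, 50   # the null hypothesis is TRUE in this simulation
ALPHA = 0.05
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)   # about 1.96
trials = 4000

rejections = 0
for _ in range(trials):
    sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU0) / (SIGMA / math.sqrt(N))
    if abs(z) > z_crit:
        rejections += 1   # rejecting a true null: a Type I error

type_i_rate = rejections / trials
print(f"observed Type I error rate: {type_i_rate:.3f} (alpha = {ALPHA})")
```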


Data in Space and Place

Spatiality is a central focus in geography, unlike in other disciplines.

Spatial autocorrelation

First Law of Geography: “Everything is related to everything else, but near things are more related than distant things.”

Positive vs. negative spatial autocorrelation

Spatial autocorrelation is a violation of the important statistical assumption of independence.

Ex: If it's raining in my backyard, I can say with a high degree of confidence that it's raining in my neighbor's backyard, but my confidence that it is raining across town is lower, and 300 miles away lower still.

Variogram

http://www.innovativegis.com/basis/Papers/Other/ASPRSchapter/Default_files/image023.png
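Positive and negative spatial autocorrelation, and the variogram, can be illustrated on a one-dimensional transect of evenly spaced points. This sketch uses Moran's I, a standard autocorrelation index that the slide does not name, with each point's neighbors assumed to be the adjacent points (weight 1), plus the empirical semivariance $\gamma(h)$ that a variogram plots against lag $h$. The transect values are hypothetical.

```python
def morans_i(values):
    """Moran's I on a 1-D transect; neighbors are the adjacent points (weight 1)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    w_total = 0
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                num += dev[i] * dev[j]
                w_total += 1
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)

def semivariance(values, lag):
    """Empirical semivariance: half the mean squared difference between points `lag` steps apart."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((a - b) ** 2 for a, b in pairs) / (2 * len(pairs))

smooth = [1, 2, 3, 4, 5, 6]        # neighbors similar: positive autocorrelation
alternating = [1, 0, 1, 0, 1, 0]   # neighbors dissimilar: negative autocorrelation

print("Moran's I:", round(morans_i(smooth), 3), round(morans_i(alternating), 3))
print("semivariance at lags 1-3:", [semivariance(smooth, h) for h in (1, 2, 3)])
```

For the smooth transect the semivariance grows with lag, the numerical form of "near things are more related than distant things".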


Data in Space and Place

“Spatial data are special” – a special difficulty

Which areal units should be used to analyze geographic data? This is the Modifiable Areal Unit Problem; gerrymandering is an example.

Geographic phenomena are often scale dependent: we must identify the scale of a phenomenon and collect and organize data in units of that size.

Data aggregation issues
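Aggregation changes what the data appear to say even when the underlying values are identical. This sketch averages eight hypothetical areal-unit values into progressively larger zones; the overall mean stays at 5, but the variance across zones shrinks, so any conclusion about variability depends on the unit size chosen. It shows only the scale side of the Modifiable Areal Unit Problem, not the zoning side, and the helper name `aggregate` is illustrative.

```python
import statistics

def aggregate(values, zone_size):
    """Average consecutive units into zones of `zone_size` units each."""
    return [statistics.fmean(values[i:i + zone_size]) for i in range(0, len(values), zone_size)]

units = [2, 8, 3, 9, 1, 7, 4, 6]   # hypothetical values for 8 small areal units

for size in (1, 2, 4):
    zones = aggregate(units, size)
    print(f"zone size {size}: zone means {zones}, variance {statistics.pvariance(zones)}")
```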


Discussion Questions

What measure of central tendency is best for nominal data?

When pollsters tell you that a candidate is favored by 44% of likely voters, plus or minus 3 percent, what is the 44% and what is the plus/minus 3%?

A survey of all users of a park in 1980 found the average number of people per party to be 3.5. In a random sample of 35 parties in 2000, the average was 2.9. If you wanted to test whether the number of persons per party in 2000 was different from the number in 1980, what would your null and alternative hypotheses be?

In the United States, we presume that an accused person is innocent. If a guilty person were found not guilty, what type of error would this be?

A researcher finds that a particular piece of learning software has an effect on students' test scores, when actually it does not. What type of error is this?