statistical methods

Posted on 13-Nov-2014


Statistical Methods.

Why statistics?

• Statistics is used to take the analysis of data one stage beyond what can be achieved with maps and diagrams.

• You can gain a rough insight into patterns at a glance, but mathematical manipulation usually gives greater precision.

• This allows us to discover things that might otherwise go unnoticed.

The need for justification.

• You must be able to justify any mathematical manipulation you carry out.

• Remember that statistics is an aid to analysis and no more.

• Too often, students make statistical calculations in geographical projects without adequate justification.

• Before using statistics, it is essential to ask yourself two questions.

Question 1.

• Why am I using this technique?

• In the exam, be absolutely clear about what a statistical test can prove and how it does so.

Question 2.

• Is the data appropriate to this particular technique?

• Each technique requires data to be arranged in a particular form.

• If the data are not in that form, the technique cannot be used.

• If your data are poor in the first place, using a complex statistical technique will not help you.

“Rubbish in, rubbish out.”

Mean, Mode, Median.

• These measures are used when we are faced with a large amount of data.

• For example, the average daily temperature of a place recorded for two years.

• It makes things far easier when we can summarise such data with a single value.

• This is relatively easy to do, and there are three common methods of achieving it.

1- Mean

• What most people call the average is the mean.

• You find it by adding all the numbers together and then dividing by the total number of data values.

• The mean is shown by the symbol x̄ (“x bar”).

• The mean can be distorted by just one extreme value, which can be a problem.

• However, it is the most commonly used measure because it can be used for further mathematical processing.
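Written as a formula, where ∑x is the sum of all the values and n is the number of values:

$$\bar{x} = \frac{\sum x}{n}$$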

Find the mean of these data values:

• 3, 4, 4, 4, 6, 6, 9.

$$\bar{x} = \frac{3 + 4 + 4 + 4 + 6 + 6 + 9}{7} = \frac{36}{7} \approx 5.1$$
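A minimal Python sketch of the same calculation; the function name `mean` is illustrative. The second part swaps the 9 for a hypothetical extreme value of 90 to show how a single outlier pulls the mean upwards.

```python
# Mean: add all the values together, then divide by the number of values.
def mean(values):
    return sum(values) / len(values)

data = [3, 4, 4, 4, 6, 6, 9]
x_bar = mean(data)
# Prints: sum = 36, n = 7, mean = 5.1
print(f"sum = {sum(data)}, n = {len(data)}, mean = {x_bar:.1f}")

# Hypothetical data: replace the 9 with an extreme value of 90.
with_extreme = data[:-1] + [90]
# Prints: mean with one extreme value = 16.7
print(f"mean with one extreme value = {mean(with_extreme):.1f}")
```

One extreme value more than triples the mean, even though six of the seven values are unchanged.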

2- The Mode.

• The mode is simply the most frequently occurring event.

• If we are using simple numbers then the mode is the most frequently occurring number.

• If we are looking at data on the nominal scale (grouped into categories) the mode is the most common category.

• The mode is very quick to calculate, but it cannot be used for further mathematical processing.

• It is not affected by extreme values.

Find the mode of this data set.

• 3, 4, 4, 4, 6, 9.

Mode (most frequently occurring number) = 4

Find the mode of this nominal data.

Land use      Area (hectares)
Clover        10
Rye           12
Vegetables    15
Fruit         3
Wheat         29
Barley        18
Pasture       17

Mode (most frequently occurring category) = Wheat (the largest area, 29 hectares).
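A short Python sketch of both mode examples, using `collections.Counter` from the standard library. Treating the hectare figures as the "frequency" of each land-use category is an assumption that matches the answer given above.

```python
from collections import Counter

# Mode of simple numbers: the most frequently occurring value.
numbers = [3, 4, 4, 4, 6, 9]
counts = Counter(numbers)
number_mode, frequency = counts.most_common(1)[0]
print(number_mode, frequency)  # 4 3

# Mode of nominal data: the category with the largest total
# (assumption: hectares are used as the measure of "frequency").
land_use = {"Clover": 10, "Rye": 12, "Vegetables": 15, "Fruit": 3,
            "Wheat": 29, "Barley": 18, "Pasture": 17}
category_mode = max(land_use, key=land_use.get)
print(category_mode)  # Wheat
```

`most_common(1)` returns only one value when there is a tie; `statistics.multimode` (Python 3.8 and later) returns every mode if a data set has more than one.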

3- The Median.

• The median is the central value in a series of ranked values.

• If there is an even number of values, the median is the midpoint between the two centrally placed values.

• The median is not affected by extreme values, but it cannot be used for further mathematical processing.

Find the median of this data set.

3, 4, 4, 4, 6, 9.

Median (central value) = 4 (the midpoint of the two central values, 4 and 4).

Now find the median of this data set.

3, 4, 4, 6, 6, 9.

Median (central value) = 5 (the midpoint of the two central values, 4 and 6).
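A minimal Python sketch of the ranking rule described above; the helper name `median` is illustrative, and the result is cross-checked against `statistics.median` from the standard library.

```python
import statistics

def median(values):
    ranked = sorted(values)          # rank the values, smallest to largest
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:                   # odd number of values: take the central one
        return ranked[middle]
    # even number of values: midpoint of the two central values
    return (ranked[middle - 1] + ranked[middle]) / 2

first = [3, 4, 4, 4, 6, 9]
second = [3, 4, 4, 6, 6, 9]
print(median(first), median(second))   # 4.0 5.0
print(statistics.median(second))       # 5.0
```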

Spread around the median and mean.

• The median, mean and mode all give us a summary value for a set of data.

• On their own, however, they give us no idea of the spread of data around the summary value, which can be misleading.

• For example…

• I collected the following rainfall data.

Year    Rainfall (mm)
1990    0
1991    0
1992    3
1993    0
1994    97

• The mean of these data is 20 mm.

• But that gives a misleading picture of what really happened: in four of the five years, rainfall was 3 mm or less.

• There is a great “deviation about the mean”.

• Deviation can be measured statistically as follows.
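To make the "deviation about the mean" concrete, this small Python sketch takes the rainfall figures from the table, computes their mean and prints how far each year lies from it, ignoring sign. It is an illustration only, not one of the formal measures of spread described in this section.

```python
# Rainfall (mm) by year, taken from the table.
rainfall = {1990: 0, 1991: 0, 1992: 3, 1993: 0, 1994: 97}

mean_rain = sum(rainfall.values()) / len(rainfall)
print(f"mean = {mean_rain:.0f} mm")   # mean = 20 mm

for year, mm in rainfall.items():
    # distance of this year's rainfall from the mean, ignoring sign
    print(year, mm, abs(mm - mean_rain))
```

No single year lies within 17 mm of the 20 mm mean, which is why a summary value alone can mislead.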

Spread around the median: the interquartile range.

• The interquartile range is a measure of the spread of the values around their median.

• The greater the spread, the higher the interquartile range.

Method.

• Stage 1: Place the values in rank order, smallest to largest.

• Stage 2: Find the upper quartile. Take the highest 25% of the values and find the midpoint between the lowest of these and the next lowest value.

• Stage 3: Find the lower quartile. Take the lowest 25% of the values and find the midpoint between the highest of these and the next highest value.

• Stage 4: Find the difference between the upper and lower quartiles. This is the interquartile range, a crude index of the spread of the values around the median.

• The higher the interquartile range, the greater the spread.

Over to you.

• Copy out the data on the next slide.

• Then find the interquartile range, remembering to follow all four stages.

Month        Average temperature
January      4
February     5
March        7
April        9
May          12
June         15
July         17
August       17
September    15
October      11
November     7
December     5

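Answer

• Ranked, the data look like this: 4 5 5 7 7 9 11 12 15 15 17 17

Lower quartile   Median   Upper quartile
6                10       15

• Lower quartile: the lowest 25% of the values are 4, 5, 5; the midpoint of 5 and the next highest value, 7, is 6.

• Median: the midpoint of the two central values, 9 and 11, is 10.

• Upper quartile: the highest 25% of the values are 15, 17, 17; the midpoint of 15 and the next lowest value, 15, is 15.

Interquartile range: 15 − 6 = 9.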
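The four stages can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes, as in this exercise, that the number of values is a multiple of 4, so that "the 25% highest values" is a whole number of values; it raises an error otherwise.

```python
def quartiles_slide_method(values):
    """Return (lower quartile, median, upper quartile) using Stages 1-3.

    Assumption: len(values) is a multiple of 4.
    """
    ranked = sorted(values)                  # Stage 1: rank smallest to largest
    n = len(ranked)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes the number of values is a multiple of 4")
    k = n // 4                               # size of each 25% group
    # Stage 2: lowest of the top 25% and the next lowest value below it
    upper = (ranked[n - k] + ranked[n - k - 1]) / 2
    # Stage 3: highest of the bottom 25% and the next highest value above it
    lower = (ranked[k - 1] + ranked[k]) / 2
    # n is even here, so the median is the midpoint of the two central values
    median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2
    return lower, median, upper

temperatures = [4, 5, 7, 9, 12, 15, 17, 17, 15, 11, 7, 5]   # January to December
lq, med, uq = quartiles_slide_method(temperatures)
print(lq, med, uq, uq - lq)   # 6.0 10.0 15.0 9.0  (Stage 4: IQR = 9)
```

Other conventions exist: Python's `statistics.quantiles`, spreadsheet functions and many textbooks interpolate quartiles differently and can give slightly different values for the same data, so state which method you used.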

Spread about the mean: Standard deviation.

• If we want to obtain a measure of the spread of our data about its mean, we calculate its standard deviation.

• Two sets of figures can have the same mean but very different standard deviations.

Method.

• Stage 1: Tabulate the values (x) and their squares (x²). Add up each column (∑x and ∑x²).

• Stage 2: Find the mean of all the values of x (x̄) and square it (x̄²).

• Stage 3: Calculate the standard deviation using the formula

$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$$

where:

σ = standard deviation.

√ = the square root of.

∑ = the sum of.

n = the number of values.

x̄ = the mean of the values.

Over to you.

• Number of vehicles passing a traffic count point.

• Calculate the standard deviation of the following data.

Day    Number of vehicles
1      50
2      75
3      80
4      92
5      60
6      70
7      63
8      42
9      75
10     82

Answer.

x     x²
50    2 500
75    5 625
80    6 400
92    8 464
60    3 600
70    4 900
63    3 969
42    1 764
75    5 625
82    6 724

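Answer

• ∑x = 689

• ∑x² = 49 571

• x̄ = 689 ÷ 10 = 68.9

• x̄² = (68.9)² = 4747.2

• Substituting into the formula:

$$\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{49\,571}{10} - 4747.2} = \sqrt{4957.1 - 4747.2} = \sqrt{209.9} \approx 14.5$$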
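A Python sketch of Stages 1 to 3 for the traffic count data, using the formula σ = √(∑x²/n − x̄²) as given in the method; the variable names are illustrative. The last line cross-checks the result against `statistics.pstdev` from the standard library, which also divides by n.

```python
import math
import statistics

vehicles = [50, 75, 80, 92, 60, 70, 63, 42, 75, 82]   # days 1 to 10

n = len(vehicles)
sum_x = sum(vehicles)                        # Stage 1: ∑x
sum_x2 = sum(x * x for x in vehicles)        # Stage 1: ∑x²
x_bar = sum_x / n                            # Stage 2: mean
sigma = math.sqrt(sum_x2 / n - x_bar ** 2)   # Stage 3: √(∑x²/n − x̄²)

print(sum_x, sum_x2, round(sigma, 1))        # 689 49571 14.5
print(round(statistics.pstdev(vehicles), 1)) # 14.5
```

`statistics.stdev` instead divides by n − 1 (the sample standard deviation) and gives a slightly larger value, so check which version a question expects.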

Phew!!!!!!

• The higher the standard deviation, the greater the spread of data around the mean.

• The standard deviation is the best of the measures of spread as it takes into account all of the values under consideration.

Homework.

• Research the following tests of significance to find out their meaning.

1. The Mann-Whitney U test.

2. The chi-squared (χ²) test.
