stats exam questions!

Upload: aby251188

Post on 02-Jun-2018


    Answering Exam Questions on Statistics

    Examination questions may require the calculation and interpretation of statistical measures and tests. This Factsheet discusses strategies for approaching such questions and gives guidance on common mistakes to avoid. Factsheets 79 and 85 cover the chi-squared test and t-test specifically. A later Factsheet will cover diagrams and their interpretation.

    Bio Factsheet, April 2003, Number 122

    Calculator Tip: To do the sum in the grouped-data mean example below on your calculator, you need to put brackets around all of the top and all of the bottom, like this: (60 + 143 + 160 + 152 + 88) ÷ (6 + 11 + 10 + 8 + 4)

    What can they ask you?

    Exactly what is examinable depends on the specification you are studying, but there are three main categories:

    • basic statistical calculations and their interpretation
    • chi-squared test
    • t-test

    Basic statistical calculations and their interpretation

    All specifications require you to calculate the mean; some also require the

    standard deviation. You need to remember the formula for the mean, but

    will be given it for the standard deviation.

    Calculating the mean

    a) For a list of numbers, just add them all up and divide by how many there

    are.

    b) For a table of grouped data, follow this procedure:

    Step 1. Find the midpoint of each class by adding its endpoints and dividing by two. Add it to the table. Call this column "x".

    Step 2. Add another column, and put in it the values of x × number of individuals (f).

    Step 3. mean = (total of "x × f" column) ÷ (total of "f" column)

    eg: Find the mean of the following data

    Length          Number of          x                       x × f
    (nearest cm)    individuals (f)
    9 - 11          6                  (9 + 11) ÷ 2 = 10       60
    12 - 14         11                 (12 + 14) ÷ 2 = 13      143
    15 - 17         10                 (15 + 17) ÷ 2 = 16      160
    18 - 20         8                  (18 + 20) ÷ 2 = 19      152
    21 - 23         4                  (21 + 23) ÷ 2 = 22      88

    mean = (60 + 143 + 160 + 152 + 88) ÷ (6 + 11 + 10 + 8 + 4) = 603 ÷ 39 = 15.46
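The three steps for grouped data can be sketched in a few lines of Python. This is only an illustration using the table from the example; the function name `grouped_mean` is made up for this sketch.

```python
# Grouped-data mean: each class midpoint is weighted by its frequency.
classes = [(9, 11), (12, 14), (15, 17), (18, 20), (21, 23)]  # class endpoints (cm)
freqs = [6, 11, 10, 8, 4]                                    # number of individuals (f)


def grouped_mean(classes, freqs):
    """Mean of grouped data (hypothetical helper, following Steps 1 to 3)."""
    midpoints = [(low + high) / 2 for low, high in classes]  # Step 1: column "x"
    xf = [x * f for x, f in zip(midpoints, freqs)]            # Step 2: column "x × f"
    return sum(xf) / sum(freqs)                               # Step 3: total xf ÷ total f


mean = grouped_mean(classes, freqs)
print(round(mean, 2))  # 15.46
```

Dividing the two totals in one expression plays the same role as the brackets in the calculator tip: the whole top is added up before it is divided by the whole bottom.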

    Calculating the standard deviation

    The formula for this that you will be given is:

    $\text{standard deviation} = \sqrt{\dfrac{\sum x^2}{n} - \text{mean}^2}$

    Σ means "sum of", so Σx² means "square each value then add them up".

    a) For a list of numbers:

    i) Square each number and add up the squares (this gives Σx²)
    ii) Divide your answer to i) by how many numbers there are (this gives Σx²/n)
    iii) Find the mean and square it
    iv) Take the answer to iii) away from the answer to ii) (this gives everything inside the square root)
    v) Square root the answer to iv) (this gives the standard deviation)

    eg: Find the standard deviation of 2, 5, 6, 7, 8

    i) Σx² = 2² + 5² + 6² + 7² + 8² = 178
    ii) Σx²/n = 178 ÷ 5 = 35.6
    iii) Mean = (2 + 5 + 6 + 7 + 8) ÷ 5 = 5.6, so Mean² = 5.6² = 31.36
    iv) Σx²/n − mean² = 35.6 − 31.36 = 4.24
    v) standard deviation = √4.24 = 2.0591
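As a cross-check on the worked example, here is a minimal Python sketch that follows steps i) to v) one line at a time. The function name is an assumption made for illustration; the formula is the one given above, which divides by n.

```python
import math


def standard_deviation(values):
    """Standard deviation as sqrt(sum of x squared / n - mean squared)."""
    n = len(values)
    sum_of_squares = sum(x * x for x in values)   # i)   Σx²
    mean_of_squares = sum_of_squares / n          # ii)  Σx²/n
    mean = sum(values) / n                        # iii) the mean (squared in the next line)
    inside_root = mean_of_squares - mean ** 2     # iv)  everything inside the square root
    return math.sqrt(inside_root)                 # v)   the standard deviation


sd = standard_deviation([2, 5, 6, 7, 8])
print(round(sd, 4))  # 2.0591
```

Python's built-in `statistics.pstdev` uses the same divide-by-n definition, so it should agree with this result.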

    b) For a table of grouped data

    i) Complete the columns "x" and "x × f" as for finding the mean
    ii) Add another column, which is x² × f
    iii) Find the total of the "x² × f" column (this gives Σx²)
    iv) Divide your answer to iii) by the total of the "f" column (this gives Σx²/n)
    v) Find the mean, as described above, and square it
    vi) Take the answer to v) away from the answer to iv) (this gives everything inside the square root)
    vii) Square root the answer to vi) (this gives the standard deviation)

    eg. Find the standard deviation of the following data

    Length          Number of          x       x × f     x² × f
    (nearest cm)    individuals (f)
    9 - 11          6                  10      60        600
    12 - 14         11                 13      143       1859
    15 - 17         10                 16      160       2560
    18 - 20         8                  19      152       2888
    21 - 23         4                  22      88        1936

    iii) Σx² = 600 + 1859 + 2560 + 2888 + 1936 = 9843
    iv) Total of f column = 6 + 11 + 10 + 8 + 4 = 39
        Σx²/n = 9843 ÷ 39 = 252.3846
    v) Mean² = 15.46² = 239.0116
    vi) Σx²/n − mean² = 252.3846 − 239.0116 = 13.3730
    vii) Standard deviation = √13.3730 = 3.657
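The grouped-data procedure can be sketched the same way. The helper below is hypothetical; its optional `mean_decimals` argument exists only to mimic the worked example, which squares the mean after rounding it to 15.46. Carrying all the figures through gives a slightly smaller answer, which is worth bearing in mind when comparing a calculator result with a hand calculation.

```python
import math

midpoints = [10, 13, 16, 19, 22]   # column "x"
freqs = [6, 11, 10, 8, 4]          # column "f"


def grouped_sd(midpoints, freqs, mean_decimals=None):
    """Standard deviation of grouped data (illustrative helper)."""
    n = sum(freqs)                                               # total of "f" column
    sum_x2f = sum(x * x * f for x, f in zip(midpoints, freqs))   # total of "x² × f" column
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    if mean_decimals is not None:
        mean = round(mean, mean_decimals)   # round the mean early, as a hand calculation might
    return math.sqrt(sum_x2f / n - mean ** 2)


print(round(grouped_sd(midpoints, freqs, mean_decimals=2), 3))  # 3.657
print(round(grouped_sd(midpoints, freqs), 3))                   # 3.65
```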

    Calculator Tip: Most scientific or graphical calculators will allow you to calculate mean and standard deviation automatically. This can save a lot of time! However, not all calculators do it in the same way, so you need to consult your calculator instruction book and practise well in advance of the exam.

    One of the commonest mistakes candidates make when using the calculator is not to clear all the data before starting a new calculation. You can usually do this on a scientific calculator by going into the statistics mode and then pressing SHIFT or 2ND and "AC". To check it works, press the button that you would normally use to get the mean - if it gives you a number, you haven't cleared the data properly!

    www.curriculumpress.co.uk

    For grouped data: mean = (total of "x × f" column) ÷ (total of "f" column)


    Interpreting the mean and standard deviation

    The mean, of course, is the average - but that does not mean half the values are below and half above it, or that it is a common value. For example, the mean of the values 1, 1, 2, 3, 100 is 21.4; this is nowhere near any of the actual values, and four out of the five values are below it!

    The mean also does not distinguish between these two data sets:

    A: 48, 49, 50, 51, 52
    B: 35, 40, 50, 62, 63

    Both sets of data have mean 50, but they are not very similar.

    This is where the standard deviation comes in. This measures how spread out the data are - the bigger the standard deviation, the greater the spread. For example, for data set A above, the standard deviation is 1.414, and for set B, it is 11.296.

    So, for example, if you know the following:

    Data set 1: mean = 45.2 standard deviation = 2.13
    Data set 2: mean = 43.7 standard deviation = 10.03

    We know that data set 2 is more spread out than data set 1. Let's consider which would be more likely to have a value in it above 50, say.

    For data set 1, 50 is more than 2 standard deviations away from the mean (45.2 + 2 × 2.13 = 49.46).
    For data set 2, 50 is less than 1 standard deviation away from the mean (43.7 + 10.03 = 53.73).

    This tells us that 50 is a less "extreme" or "uncommon" value for data set 2 than for data set 1. So data set 2 is more likely to have values above 50.
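The comparison above can be checked with a short Python sketch. It uses the standard library's `statistics.pstdev` (the divide-by-n standard deviation) for sets A and B, and a made-up helper, `sds_from_mean`, to express how far 50 lies from each mean in units of standard deviation.

```python
import statistics

A = [48, 49, 50, 51, 52]
B = [35, 40, 50, 62, 63]

for name, data in (("A", A), ("B", B)):
    # Same mean, very different spread.
    print(name, statistics.mean(data), round(statistics.pstdev(data), 3))


def sds_from_mean(value, mean, sd):
    """How many standard deviations `value` lies above the mean (illustrative helper)."""
    return (value - mean) / sd


print(round(sds_from_mean(50, 45.2, 2.13), 2))   # 2.25 -> more than 2 SDs above the mean
print(round(sds_from_mean(50, 43.7, 10.03), 2))  # 0.63 -> less than 1 SD above the mean
```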

    Statistical tests

    In the exam, you will always be told which statistical test to use if you are being required to do calculations. You will be given any tables you need.

    There are various types of questions:

    • understanding statistical terms like degrees of freedom, significance, etc
    • interpreting results and drawing conclusions
    • doing the calculations according to the test formula
    • finding degrees of freedom
    • using statistical tables

    Some of these are the same for both t-test and chi-squared; others are specific to the test.

    Understanding statistical terms

    Hypotheses: the purpose of a statistical test is to decide between the null hypothesis and the alternative hypothesis. The exact form of these hypotheses depends on the test. When you are carrying out the test, you accept the null hypothesis unless you have convincing evidence otherwise (in a court of law, the "null hypothesis" is that the person is innocent - they are only found guilty if there is enough evidence).

    Test statistic: this is the value calculated from your data. The formula for

    it depends on the test you are doing.

    Critical value: this is the value you compare the test statistic to, to decide

    whether you are going to accept or reject the null hypothesis.

    For both t-test and chi-squared test, you reject the null hypothesis if your test statistic is greater than the critical value.

    Critical values come from statistical tables.

    Significance level: It is possible to reject the null hypothesis even if it is

    true, because "unusual" results can occur by chance (eg it is possible -

    although unlikely - to get 100 heads in succession when tossing a coin).

    The significance level is the chance of rejecting the null hypothesis when it is true. Significance levels may be written as percentages (10%, 5%, 1%) or as decimals (0.1, 0.05, 0.01).

    The normal significance level in science is 5%. Use this unless you

    are told otherwise.

    Degrees of freedom: you do not need to know the exact meaning, although

    you do need to know how to calculate them (see below). The idea is that

    the amount of data you have affects the critical value - this is because you

    are much more likely to get unusual results by chance if you only have a few

    observations, than if you have a lot of observations.

    Interpreting results and drawing conclusions

    You must remember that if the value you calculate (the test statistic) is greater than the value from the tables (the critical value), then you reject the null hypothesis. Otherwise you accept it.

    You then need to relate this back to the original hypotheses; this will be

    discussed in more detail for each test.

    Choose your words carefully - a statistical test does not "prove" a

    hypothesis is true - there is always a chance that a wrong decision could be

    made. It is normal to say "the result is significant at the 5% level" or "the

    alternative hypothesis was accepted at the 5% level".

    The remainder of the section is divided between the chi-squared test and the

    t-test.

    Chi-squared test

    There are two main types of chi-squared test you may have to do:

    a) Testing to see if there is a difference

    b) Testing to see if the theoretical ratios predicted by genetics apply

    The hypotheses for the tests are

    a) H0: there is no difference between the different conditions

    H1: there is a difference between the different conditions

    b) H0: the observations are in accordance with the predictions of genetics

    H1: the observations are not in accordance with the predictions of

    genetics

    Calculations for the test formula

    In chi-squared, you will need to calculate expected frequencies, and then

    the value of chi-squared, using the formula:

    $\chi^2 = \sum \dfrac{(O - E)^2}{E}$

    a) To calculate expected values when you are testing for a difference, you

    just add up all the values and divide by the number of them.

    b) To calculate expected values for genetics, you have to use the genetic ratio. The procedure is:

    i) Add up all the values from the data you are given

    ii) Add up all the numbers in the genetic ratio

    (eg for 9:3:3:1, do 9 + 3 + 3 + 1 = 16)

    This tells you the number of parts you will be dividing your total

    from i) into.

    iii) Find out how much one part is, by dividing your total from i) by your

    total from ii)

    iv) Find out the expected frequencies, by multiplying one part by the

    numbers in the ratio (eg by 9, 3, 3 and 1)

    Once you have calculated the expected frequencies, you substitute into the

    formula above to find the chi-squared value.
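The two ways of finding expected frequencies, and the chi-squared sum itself, can be sketched as below. The function names are illustrative, and the observed counts are invented for this example (they do not come from the Factsheet); the formula assumed is the usual sum of (O − E)² ÷ E over all categories.

```python
def expected_for_difference(observed):
    """a) Testing for a difference: every expected value is the mean of the observed values."""
    mean = sum(observed) / len(observed)
    return [mean] * len(observed)


def expected_from_ratio(observed, ratio):
    """b) Genetics: share the observed total out according to the genetic ratio."""
    total = sum(observed)                  # i)   add up all the values
    parts = sum(ratio)                     # ii)  add up the numbers in the ratio
    one_part = total / parts               # iii) how much one part is
    return [one_part * r for r in ratio]   # iv)  one part × each number in the ratio


def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [90, 30, 28, 12]   # hypothetical counts for a 9:3:3:1 cross
expected = expected_from_ratio(observed, [9, 3, 3, 1])
print(expected)                                   # [90.0, 30.0, 30.0, 10.0]
print(round(chi_squared(observed, expected), 3))  # 0.533
```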

    Finding degrees of freedom

    You need to learn this formula:

    For chi-squared:

    degrees of freedom = number of categories - 1

    In the chi-squared formula, $\sum \dfrac{(O - E)^2}{E}$:
    O is the observed values - the data from the question
    E is the expected values - the ones you calculate
    Σ means "sum of"


    Acknowledgments: This Factsheet was researched and written by Cath Brown.

    Curriculum Press, Unit 305B The Big Peg, 120 Vyse Street, Birmingham B18 6NF

    Bio Factsheets may be copied free of charge by teaching staff or students,

    provided that their school is a registered subscriber.

    No part of these Factsheets may be reproduced, stored in a retrieval system, or

    transmitted, in any other form or by any other means, without the prior

    permission of the publisher.

    ISSN 1351-5136

    Using statistical tables

    All you have to do is to read down to find the number of degrees of freedom

    you have, and across to find the significance level (usually 5% = 0.05).

    t-test

    There are two types of t-test, paired and unpaired. The exam will always make it clear which you should do. You will always be given the relevant formulae.

    The hypotheses for both tests are

    H0: mean 1 = mean 2

    H1: mean 1 ≠ mean 2

    (This is a 2-tailed test - you may also come across 1-tailed tests, but in the exam you will never have to choose between the two)

    Calculations for the test formula

    The calculations for either type of t-test are similar to those for finding means and standard deviations. You also need to be able to substitute into a formula. Provided you can do calculations like the ones in the section on basic statistical calculations, you will not have a problem with these. Remember, you will be given any formulae you require.

    The paired t-test first requires you to find the differences between each pair

    of values. You then work with these differences only.

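    For the paired t-test, t is calculated from the mean of the differences, the standard deviation of the differences and the number of pairs.

    In the unpaired t-test, you will need to use two formulae: one for s and one for t.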

    Exam questions will get you to do these calculations bit by bit and "follow

    through" marks are likely to be awarded - so if you calculate s wrong, for

    example, but use your value correctly to calculate the value of t, then you

    will get the rest of the marks.

    chi-squared tables

              Significance level
    df     0.10     0.05     0.025    0.01     0.005
    1      2.71     3.84     5.02     6.63     7.88
    2      4.61     5.99     7.38     9.21     10.60
    3      6.25     7.81     9.35     11.34    12.84
    4      7.78     9.49     11.14    13.28    14.86

    For a chi-squared test with 1 degree of freedom at a significance level of 5%, the critical (tables) value is 3.84.
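Reading the table and applying the decision rule can be sketched as a simple lookup. The dictionary below copies only the 0.05 column of the table; the function names and the test statistic of 0.533 are invented for illustration.

```python
# Critical values from the 0.05 (5%) column of the chi-squared table, keyed by df.
CHI2_CRITICAL_5PCT = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}


def chi2_degrees_of_freedom(number_of_categories):
    """degrees of freedom = number of categories - 1"""
    return number_of_categories - 1


def reject_null(test_statistic, critical_value):
    """Reject the null hypothesis only if the test statistic exceeds the critical value."""
    return test_statistic > critical_value


df = chi2_degrees_of_freedom(4)        # e.g. four phenotype classes
critical = CHI2_CRITICAL_5PCT[df]      # read down to df, across to 0.05
print(df, critical, reject_null(0.533, critical))  # 3 7.81 False
```

Here the made-up statistic is well below the tables value, so the null hypothesis would be accepted at the 5% level.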

    Common mistakes

    These are some of the commonest errors candidates make:

    • Rounding errors, due to rounding too early. If in doubt, use all the figures. It is useful to keep figures in your calculator, to avoid having to keep writing down and re-entering data. Learn how to use your calculator memory.

    • Calculator errors - putting the correct figures into the calculator wrongly. See the calculator tips in this Factsheet and practise using your calculator well before the exam.

    • Failure to show working - hence throwing away all the marks if there is even one tiny error in calculation.

    • Failure to recall the formulae for degrees of freedom - these have to be learnt. If you get them wrong, they will invalidate your tables value and your conclusion.

    • Not drawing conclusions correctly - you must learn that if your calculated value is larger than the tables value, you reject the null hypothesis.

    • Getting the hypotheses the wrong way round - if your calculated result is greater than the tables value, then:
        - for the t-test, there is a difference between the means
        - for testing for a difference in chi-squared, there is a difference
        - for genetics chi-squared, the results are not as predicted by genetics
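The last two mistakes in the list are about turning a comparison into the right words. As a rough sketch, the mapping can be written out explicitly; the dictionary keys and the function name are made up for this illustration, and the wording follows the list above.

```python
# What rejecting the null hypothesis means for each test (wording from the list above).
IF_REJECTED = {
    "t-test": "there is a difference between the means",
    "chi-squared (difference)": "there is a difference",
    "chi-squared (genetics)": "the results are not as predicted by genetics",
}


def conclusion(test, calculated_value, tables_value, level="5%"):
    """Word a conclusion without claiming that anything has been proved."""
    if calculated_value > tables_value:
        return f"Null hypothesis rejected at the {level} level: {IF_REJECTED[test]}."
    return f"Null hypothesis accepted at the {level} level."


print(conclusion("chi-squared (genetics)", 9.2, 7.81))
print(conclusion("t-test", 1.9, 2.228))
```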

    Symbols used in the t-test formulae

    Paired t-test:
    $\bar{x}$ is the mean of the differences
    n is the number of pairs
    s is the standard deviation of the differences

    Unpaired t-test:
    $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples
    $n_1$ and $n_2$ are the sizes of the two samples
    Σ means "sum of"

    t-table

              Significance level
    df     0.1      0.05     0.01
    7      1.895    2.365    3.499
    8      1.860    2.306    3.355
    9      1.833    2.262    3.250
    10     1.812    2.228    3.169
    11     1.796    2.201    3.106

    For a t-test with 10 degrees of freedom at a significance level of 5%, the critical (tables) value is 2.228.

    Calculator Tips:

    To carry out any calculation that is set out as a fraction, you must put brackets round the top and round the bottom.

    It is probably easier to work out the number inside the square root first, then take the square root, rather than trying to do it all in one go.

    Finding degrees of freedom

    You need to learn these formulae:

    For paired t-test:

    degrees of freedom = number of pairs - 1

    For unpaired t-test:

    degrees of freedom = number in 1st sample + number in 2nd sample - 2

    Using statistical tables

    All you have to do is to read down to find the number of degrees of freedom

    you have, and across to find the significance level (usually 5% = 0.05).
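The two degrees-of-freedom formulae and the table lookup can be sketched as follows. The dictionary copies the 0.05 column of the t-table in this Factsheet; the function names and the sample sizes are assumptions for illustration.

```python
# Critical values from the 0.05 (5%) column of the t-table, keyed by degrees of freedom.
T_CRITICAL_5PCT = {7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}


def paired_df(number_of_pairs):
    """Paired t-test: degrees of freedom = number of pairs - 1"""
    return number_of_pairs - 1


def unpaired_df(n_first_sample, n_second_sample):
    """Unpaired t-test: degrees of freedom = n in 1st sample + n in 2nd sample - 2"""
    return n_first_sample + n_second_sample - 2


print(paired_df(8), T_CRITICAL_5PCT[paired_df(8)])            # 7 2.365
print(unpaired_df(6, 6), T_CRITICAL_5PCT[unpaired_df(6, 6)])  # 10 2.228
```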


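    t-test formulae

    paired t-test:

    $t = \dfrac{\bar{x}\sqrt{n - 1}}{s}$

    unpaired t-test:

    $s = \sqrt{\dfrac{\sum x_1^2 - n_1\bar{x}_1^2 + \sum x_2^2 - n_2\bar{x}_2^2}{n_1 + n_2 - 2}}$

    $t = \dfrac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$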
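To show how the t-test calculations build on the mean and standard deviation steps, here is a hedged Python sketch. It assumes the paired statistic is (mean of differences × √(n − 1)) ÷ s, with s found using the divide-by-n standard deviation formula from earlier in this Factsheet, and the unpaired statistic uses the pooled s with n1 + n2 − 2 on the bottom. All the data are invented, and the function names are illustrative only.

```python
import math


def paired_t(first, second):
    """Paired t: work only with the differences between each pair."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    s = math.sqrt(sum(d * d for d in diffs) / n - mean_d ** 2)  # divide-by-n SD (assumption)
    return mean_d * math.sqrt(n - 1) / s


def unpaired_t(sample1, sample2):
    """Unpaired t: pooled s first, then t."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    top = (sum(x * x for x in sample1) - n1 * mean1 ** 2
           + sum(x * x for x in sample2) - n2 * mean2 ** 2)
    s = math.sqrt(top / (n1 + n2 - 2))
    return (mean1 - mean2) / (s * math.sqrt(1 / n1 + 1 / n2))


# Hypothetical data: 8 pairs (df = 7) and two samples of 6 (df = 10).
before = [12, 15, 11, 14, 13, 16, 10, 12]
after = [14, 18, 12, 17, 13, 19, 12, 14]
print(round(paired_t(before, after), 3))   # 5.292, compare with 2.365 (df 7, 5%)

sample1 = [5, 7, 6, 8, 9, 7]
sample2 = [4, 5, 3, 6, 5, 4]
print(round(unpaired_t(sample1, sample2), 3))  # 3.478, compare with 2.228 (df 10, 5%)
```

Both made-up results exceed the tables value, so in each case the null hypothesis of equal means would be rejected at the 5% level. For a 2-tailed test, a negative t should be compared by its size, ignoring the sign.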