P1: FXS/ABE P2: FXS0521672600Xc04.xml CUAU034-EVANS September 15, 2008 17:27
CHAPTER 4
Bivariate data
What is a scatterplot, how is it constructed and what does it tell us?
What is the q-correlation coefficient, how is it calculated and what does it tell us?
How do we fit a straight line to a scatterplot by eye?
How do we fit a straight line to a scatterplot using the two-mean method?
How do we interpret the intercept and slope of a line fitted to a scatterplot?
How do we use a line fitted to a scatterplot to make predictions?
What is the difference between interpolation and extrapolation?
In Chapter 1, ‘Univariate data’, you learned about the statistical methods we use to analyse
data recorded about a single variable, such as a person’s weight. In this chapter, you will learn
about the statistical methods used to analyse data recorded about two related variables, such as
a person’s weight and height. Such data is called bivariate data (two-variable data).
When we analyse bivariate data, we are interested in how the two variables relate to each
other. We try to answer questions such as: ‘Is there a relationship between these two
variables?’ and ‘Does knowing the value of one of the variables tell us anything about the
value of the second variable?’
For example, let us take as our two variables the mark a student obtained on a test and the
amount of time they spent studying for that test. Since the amount of time spent studying may
affect the mark obtained, we distinguish between the two variables by calling the time spent
studying the independent variable (IV) and the mark obtained the dependent variable (DV).
4.1 Displaying bivariate data
Scatterplots
The first step in investigating the relationship between two numerical variables is to construct a
scatterplot.
We will illustrate the process by constructing a scatterplot to display the marks students
obtained on an examination (the DV) against the times they spent studying for the examination
(the IV).
Cambridge University Press • Uncorrected Sample Pages • 978-0-521-74049-4 2008 © Evans, Lipson, Jones, Avery, TI-Nspire & Casio ClassPad material prepared in collaboration with Jan Honnens & David Hibbard
Student 1 2 3 4 5 6 7 8 9 10
Time (hours) 4 36 23 19 1 11 18 13 18 8
Mark (%) 41 87 67 62 23 52 61 43 65 52
In a scatterplot, each point represents a single case, in this instance a student.
When constructing a scatterplot, it is conventional to use the vertical or y-axis for the
dependent variable (DV) and the horizontal or x-axis for the independent variable (IV). This
will become very important when we come to fitting lines to scatterplots later in the chapter.
The horizontal or x-coordinate of the point represents the time spent studying (the IV).
The vertical or y-coordinate represents the mark obtained (the DV).
The scatterplot below shows the point for Student 1, who studied 4 hours for the examination
and obtained a mark of 41.
[Figure: scatterplot of Mark (%) against Time (hours), with the point for Student 1 plotted at (4, 41).]
The scatterplot is completed by plotting the points for each remaining student, as shown
below.
[Figure: completed scatterplot of Mark (%) against Time (hours) for all 10 students.]
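As a rough illustration only (it is not part of the calculator instructions in this chapter), the way each student's time and mark are paired into a single plotted point can be sketched in Python. The names `times`, `marks` and `points` are assumptions made for this sketch.

```python
# Data from the table above: one entry per student.
times = [4, 36, 23, 19, 1, 11, 18, 13, 18, 8]      # IV, plotted on the x-axis
marks = [41, 87, 67, 62, 23, 52, 61, 43, 65, 52]   # DV, plotted on the y-axis

# Each student becomes one point (x, y) = (time, mark).
points = list(zip(times, marks))

for student, (x, y) in enumerate(points, start=1):
    print(f"Student {student}: ({x}, {y})")
```

The first line printed corresponds to Student 1 at (4, 41), the point plotted first in the text.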
For example, in the scatterplot opposite, the
advertised prices of 12 second-hand cars are
plotted against the cars’ ages (in years).
In this relationship, the car’s price is clearly
the dependent variable (DV) as it depends on
its age, so price is plotted on the vertical
axis. Age, the independent variable (IV), is
plotted on the horizontal axis.
[Figure: scatterplot of Price ($’000) against Age (years) for 12 second-hand cars.]
Using a graphics calculator to construct a scatterplot
While you need to understand the principles of constructing a scatterplot, and perhaps to
construct one by hand for a few points, in practice you will use a graphics calculator to
complete this task.
How to construct a scatterplot using the TI-Nspire CAS
The data below give the marks that students obtained on an examination and the times
they spent studying for the examination.
Time (hours) 4 36 23 19 1 11 18 13 18 8
Mark (%) 41 87 67 62 23 52 61 43 65 52
Use a graphics calculator to construct a scatterplot. Treat time as the independent
(x) variable.
Steps
1 Start a new document (by pressing / + N)
and select 3:Add Lists & Spreadsheet. Enter the data into lists named time and mark.
2 Statistical graphing is done through the Data & Statistics application.
Press and select 5:Data & Statistics.
Note: A random display of dots will appear. This indicates that list data are available for plotting. It is not a statistical plot.
3 To construct a scatterplot
a Move the cursor to the textbox area below the
horizontal (or x-) axis. Press when
prompted and select the variable time (i.e. the
independent variable). Press enter to paste the
variable onto that axis.
b Move the cursor towards the centre of the
vertical (or y-) axis until a textbox appears.
Press when prompted to select the variable
mark.
c Finally, press enter to paste the variable mark
onto that axis and generate the required
scatterplot, which is shown opposite. The plot
is automatically scaled.
How to construct a scatterplot using the ClassPad
The data below give the marks that students obtained on an examination and the
times they spent studying for the examination.
Time (hours) 4 36 23 19 1 11 18 13 18 8
Mark (%) 41 87 67 62 23 52 61 43 65 52
Use a graphics calculator to construct a scatterplot. Treat time as the independent
(x) variable.
Steps
1 Open the Statistics application and
enter the coordinate values into
lists named time and mark, as
shown.
2 Tap from the toolbar to open
the Set StatGraphs dialog box.
Complete the dialog box as given below.
For Draw: select On
For Type: select Scatter ( )
For XList: select main \ time ( )
For YList: select main \ mark ( )
Leave Freq: as 1
Leave Mark: as square
Tap h to confirm your selections.
3 Tapping from the toolbar at
the top of the screen
automatically plots a scaled
graph in the lower-half of the
screen.
Tapping the icon will give a
full-screen sized graph. Tap
again to return to a half-screen.
4 Tapping from the toolbar
places a marker on the first data
point (xc = 4, yc = 41).
Use the horizontal cursor arrow
( ) to move from point to
point.
Exercise 4A
1 Height, x 190 183 176 178 185 165 185 163
Weight, y 77 73 70 65 65 65 74 54
The table above shows the heights and weights of eight people. Use a graphics calculator to
construct a scatterplot with height as the IV (i.e. x variable).
2 Wife’s age 26 29 27 21 23 31 27 20 22 17 22
Husband’s age 29 43 33 22 27 36 26 25 26 21 24
The table above shows the ages at marriage of 11 couples. Use a graphics calculator to
construct a scatterplot with wife’s age as the independent variable.
3 Number of seats 405 296 288 258 240 193 188 148
Airspeed (km/h) 830 797 774 736 757 765 760 718
The table above shows the numbers of seats and airspeeds of eight passenger aircraft. Use a
graphics calculator to construct a scatterplot with number of seats as the independent
variable.
4 Drug dosage (mg) 0.5 1.2 4.0 5.3 2.6 3.7 5.1 1.7 0.3 0.6
Response time (min) 65 35 15 10 22 16 10 18 70 50
The table above shows the response times of 10 patients
given a pain relief drug, and the drug dosages. Use a
graphics calculator to construct a scatterplot using
drug dosage as the independent variable.
5 Time (min) 0 5 10 15 20 25
Number in cinema 87 102 118 123 135 137
The table above shows the numbers of people in a cinema at 5-minute intervals after the
advertisements started. Use a graphics calculator to construct an appropriate scatterplot.
4.2 How to interpret a scatterplot
What features do we look for in a scatterplot that will help us to identify and describe any
relationships present?
Presence of a relationship
First we look to see if there is a clear pattern in the scatterplot.
[Figure: scatterplot of y against x with the points randomly scattered.]
In the example opposite, there is no clear pattern
in the points. The points are randomly scattered across
the plot, so we conclude that there is no relationship.
For the three examples below, there is a clear (but different) pattern in each set of points, so
we conclude that there is a relationship in each case.
[Figure: three scatterplots of y against x, each showing a different clear pattern in the points.]
Having found a clear pattern, there are two main things we look for in the pattern of points:
direction and outliers (if any)
strength of the relationship (amount of scatter).
Direction and outliers
This scatterplot of calf diameter against age
of a group of people is just a random scatter
of points. This suggests that there is no
relationship between the variables calf
diameter and age for this group of
people. However, there is an outlier, the
person with a calf diameter of 22 cm.
[Figure: scatterplot of Calf diameter (cm) against Age (years).]
In contrast, there is a clear pattern in this scatterplot
of the mark students obtained in an exam and the
time they spent studying for the exam.
The two variables, mark and time, are related.
Furthermore, the points seem to drift upwards
from left to right. When this happens, we say
that there is a positive relationship between
the variables. People who spend more time
studying tend to get higher marks, and
vice versa.
In this scatterplot there are no outliers.
[Figure: scatterplot of Mark (%) against Time (hours).]
Likewise, this scatterplot of the price against
age of a number of second-hand cars shows
a clear pattern. The two variables are
related. However, in this case the points seem
to drift downwards from left to right. When
this happens, we say that there is a negative
relationship between the variables. Older
second-hand cars tend to have a lower
price than newer second-hand cars.
In this scatterplot there are no outliers.
[Figure: scatterplot of Price ($’000) against Age (years).]
Strength of a relationship (scatter)
The strength of a relationship is measured by how much scatter there is in a scatterplot.
Strong relationship
When there is a strong relationship between the variables, the points will tend to follow a
single stream. A pattern is clearly seen. There is only a small amount of scatter in the plot.
[Figure: three scatterplots, labelled ‘Strong positive relationship’, ‘Strong positive relationship’ and ‘Strong negative relationship’.]
Moderate relationship
As the amount of scatter in the plot increases, the pattern becomes less clear. This indicates
that the relationship is less strong. In the examples below, we might say that there is a
moderate relationship between the variables.
[Figure: three scatterplots, labelled ‘Moderate positive relationship’, ‘Moderate positive relationship’ and ‘Moderate negative relationship’.]
Weak relationship
As the amount of scatter increases further, the pattern becomes even less clear. This indicates
that any relationship between the variables is weak. The scatterplots below are examples of
weak relationships between the variables.
[Figure: three scatterplots, labelled ‘Weak positive relationship’, ‘Weak positive relationship’ and ‘Weak negative relationship’.]
No relationship
Finally, when all we have is scatter, as seen in the scatterplots below, no pattern can be seen. In
this situation we say that there is no relationship between the variables.
[Figure: three scatterplots, each labelled ‘No relationship’.]
These scatterplots should help you to get a feel for the strength of a relationship, as indicated
by the amount of scatter in a scatterplot. Later in this chapter, you will learn to calculate its
value using the idea of q-correlation. At the moment, you only need be able to estimate the
strength of a relationship as strong, moderate, weak or none, by comparing it with the standard
scatterplots given above.
Exercise 4B
1 For each of the following pairs of variables, indicate whether you expect a relationship to
exist between the variables and, if so, whether you would expect the variables to be
positively or negatively related.
a Fitness level and amount of daily exercise b Foot length and height
c Comfort level and temperature above 30◦C d Foot length and intelligence
e Time taken to get to school and distance travelled
f Weight of an ice cube and surrounding temperature
2 For each of the following scatterplots:
i state whether the variables appear to be related and note any possible outliers.
If the variables appear to be related:
ii state whether the relationship is positive or negative.
iii estimate the strength of the relationship as strong, moderate or weak.
a [Figure: scatterplot of Height (cm) against Age (years).]
b [Figure: scatterplot of Business ($’000) against Advertising ($).]
c [Figure: scatterplot of Daughter’s height (cm) against Mother’s height (cm).]
d [Figure: scatterplot of Reaction time (min) against Drug dosage (mg).]
e [Figure: scatterplot of Score on test against Temperature (°C).]
f [Figure: scatterplot of Wife’s age (years) against Husband’s age (years).]
4.3 The q-correlation coefficient
In the previous section you learned how to estimate the strength of a relationship from a
scatterplot by considering the amount of scatter in the plot. In this section, you will learn how
the q-correlation coefficient (q for quadrant) can be used to give a measure of the strength of
the relationship between two variables.
The idea behind the q-correlation coefficient
From our earlier investigation of the relationship between two variables, we found that for:
positive relationships, high values on one variable tend to go with high values for the
other variable, and vice versa
negative relationships, high values on one variable tend to go with low values for the other
variable, and vice versa.
The q-correlation coefficient gives a measure of the tendency for points in a scatterplot to
follow these patterns.
Example 1 Calculating the q-correlation coefficient
Calculate the q-correlation coefficient for the
scatterplot shown.
[Figure: scatterplot of y against x with 11 points; both axes run from 0 to 10.]
Solution
[Figure: two copies of the scatterplot. The first shows the median of the x-values and the median of the y-values drawn as dotted lines, dividing the plot into quadrants A, B, C and D. The second shows the number of points counted in each quadrant.]
1 Find the median of the x-values. There are
11 points, so the median will be the 6th
point from the left.
2 Draw a vertical dotted line through this point.
3 Find the median of the y-values. There are
11 points, so the median will be the 6th point
up from the bottom of the scatterplot.
4 Draw a horizontal dotted line through this
point.
5 The scatterplot has now been divided into
four quadrants. Label them A, B, C and D,
proceeding anticlockwise from the top right.
6 Count the number of points in each of the
quadrants A, B, C and D.
Call these a, b, c and d respectively.
Any points that lie on the line are omitted.
7 The q-correlation coefficient is given by
$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$
Substitute the values for a, b, c and d and evaluate.
$$\therefore q = \frac{(4 + 3) - (1 + 1)}{4 + 1 + 3 + 1} = \frac{5}{9}$$
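The substitution in step 7 can be checked with a short Python sketch. The function name `q_from_counts` is an assumption made for this illustration; it simply applies the formula above to the four quadrant counts and keeps the answer as an exact fraction.

```python
from fractions import Fraction

def q_from_counts(a, b, c, d):
    """Return q = ((a + c) - (b + d)) / (a + b + c + d) as an exact fraction."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("no points lie off the median lines")
    return Fraction((a + c) - (b + d), total)

# Counts used in the substitution of Example 1.
q = q_from_counts(4, 1, 3, 1)
print(q)  # 5/9
```

Working with `Fraction` rather than a decimal keeps the result in the same form as the worked solution.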
The properties of the q-correlation coefficient are summarised below.
The q-correlation coefficient
The q-correlation coefficient is defined by
$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$
where a, b, c and d are the numbers of points in the four
quadrants of the scatterplot, labelled A, B, C and D respectively.
Any points that lie on the lines are omitted.
[Figure: x and y axes divided into four quadrants, labelled A, B, C and D anticlockwise from the top right.]
We can see that the q-correlation can take both positive and negative values.
Suppose that all the points lie in quadrants A and C, as shown. Then b = 0 and
d = 0 and
$$q = \frac{(a + c) - (0 + 0)}{a + 0 + c + 0} = \frac{a + c}{a + c} = 1$$
[Figure: scatterplot with all points in quadrants A and C.]
Suppose all the points lie in quadrants B and D, as shown.
Then a = 0 and c = 0 and
$$q = \frac{(0 + 0) - (b + d)}{0 + b + 0 + d} = \frac{-(b + d)}{b + d} = -1$$
[Figure: scatterplot with all points in quadrants B and D.]
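When there are an equal number of points in each
quadrant, then a = b = c = d, so a + c = b + d and
$$q = \frac{(a + c) - (b + d)}{a + b + c + d} = \frac{0}{4a} = 0$$
Here there is no relationship (q = 0).
[Figure: scatterplot with an equal number of points in each of the quadrants A, B, C and D.]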
Thus we can see that in general:
−1 ≤ q ≤ 1
If there is a positive relationship then most of the points are in A and C and q is positive.
If there is a negative relationship then most of the points are in B and D and q is negative.
Example 2 Calculating the q-correlation coefficient
Use the scatterplot opposite to calculate the
q-correlation coefficient for reaction time and
drug dosage.
[Figure: scatterplot of Reaction time (min) against Drug dosage (mg).]
Solution
[Figure: the scatterplot of Reaction time (min) against Drug dosage (mg) with both median lines drawn, dividing it into quadrants A, B, C and D, with counts a = 1, b = 4, c = 1 and d = 4.]
1 Draw in the median line for both
variables on the scatterplot.
Since there are 10 points, the median
lines fall between the 5th and
6th points.
2 Count the number of points in each
of the quadrants A, B, C and D. Call
these a, b, c and d respectively.
3 The q-correlation coefficient is given by
$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$
Substitute the values for a, b, c and d and evaluate.
$$q = \frac{(1 + 1) - (4 + 4)}{1 + 4 + 1 + 4} = \frac{-6}{10} = -0.6$$
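The whole procedure (median lines, quadrant counts, formula) can also be sketched in Python, working from data values instead of a graph. The function name `q_correlation` is an assumption made for this sketch. The data are the drug dosage and response time values from Exercise 4A, question 4; it is assumed, not confirmed by the text, that these are the values plotted in Example 2, although they give the same quadrant counts.

```python
from statistics import median

def q_correlation(xs, ys):
    """Return the quadrant counts (a, b, c, d) and q for paired data."""
    mx, my = median(xs), median(ys)
    a = b = c = d = 0
    for x, y in zip(xs, ys):
        if x == mx or y == my:
            continue          # points on a median line are omitted
        if x > mx and y > my:
            a += 1            # quadrant A: top right
        elif x < mx and y > my:
            b += 1            # quadrant B: top left
        elif x < mx and y < my:
            c += 1            # quadrant C: bottom left
        else:
            d += 1            # quadrant D: bottom right
    total = a + b + c + d
    if total == 0:
        raise ValueError("every point lies on a median line")
    return (a, b, c, d), ((a + c) - (b + d)) / total

dosage = [0.5, 1.2, 4.0, 5.3, 2.6, 3.7, 5.1, 1.7, 0.3, 0.6]
response = [65, 35, 15, 10, 22, 16, 10, 18, 70, 50]

counts, q = q_correlation(dosage, response)
print(counts, q)  # (1, 4, 1, 4) -0.6
```

With 10 points, each median falls between the 5th and 6th values, so no point lies on a median line and all 10 points are counted.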
Guidelines for classifying the strength of a linear relationship using the q-correlation coefficient
Earlier, we used the degree of scatter in a
scatterplot to classify the strength of the
relationship observed as weak, moderate
or strong. Using the table opposite, we can
do the same using the q-correlation
coefficient.
For example, a q-correlation coefficient
of q = 0.86 indicates that there is a strong
positive relationship.
In contrast, a q-correlation coefficient of
q = −0.34 indicates that there is a weak
negative relationship.
Strong positive relationship 0.75 ≤ q ≤ 1
Moderate positive relationship 0.5 ≤ q < 0.75
Weak positive relationship 0.25 ≤ q < 0.5
No relationship –0.25 < q < 0.25
Weak negative relationship –0.5 < q ≤ –0.25
Moderate negative relationship –0.75 < q ≤ –0.5
Strong negative relationship –1 ≤ q ≤ –0.75
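The table can be read as a chain of comparisons. The following Python sketch encodes the boundaries exactly as listed above; the function name `classify_q` and the wording of the returned labels are assumptions made for this illustration.

```python
def classify_q(q):
    """Classify a q-correlation coefficient using the table's boundaries."""
    if not -1 <= q <= 1:
        raise ValueError("q must lie between -1 and 1")
    if q >= 0.75:
        return "strong positive"
    if q >= 0.5:
        return "moderate positive"
    if q >= 0.25:
        return "weak positive"
    if q > -0.25:
        return "no relationship"
    if q > -0.5:
        return "weak negative"
    if q > -0.75:
        return "moderate negative"
    return "strong negative"

print(classify_q(0.86))   # strong positive
print(classify_q(-0.34))  # weak negative
```

Note that the boundary values themselves (for example q = 0.75 or q = −0.5) fall in the stronger of the two neighbouring categories, as in the table.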
Correlation and causation
The existence of even a strong relationship between two variables is not, in itself, sufficient to
imply that altering one variable causes a change in the other. It only implies that this may be
the explanation. It may be that both the measured variables are affected by a third and different
variable. For example, if data about the variables crime rates and unemployment in a range of
cities were gathered, a high correlation would be found. But could it be inferred that high
unemployment causes high crime rates? The explanation could be that both of these variables
are dependent on other variables, such as home circumstances, peer group pressure, level of
education or economic conditions, all of which may be related to both unemployment and
crime rates. These two variables may vary together, without one being the direct cause of the
other. Correlations must be interpreted with care.
Exercise 4C
1 Use the table of q-correlation coefficients to classify each of the following.
a q = 0.20 b q = −0.30 c q = −0.85 d q = 0.33
e q = 0.95 f q = −0.75 g q = 0.75 h q = −0.24
i q = −1 j q = 0.25 k q = 1 l q = −0.50
2 Calculate the value of the q-correlation coefficient for each of the following scatterplots.
[Figure: six scatterplots, labelled a to f, each plotting y against x on axes running from 0 to 10.]
3 Calculate the q-correlation coefficient for each pair of variables shown in the following
scatterplots.
a [Figure: scatterplot of Price ($’000) against Age (years).]
b [Figure: scatterplot of Calf diameter (cm) against Age (years).]
c [Figure: scatterplot of Mark (%) against Time (hours).]
4.4 Fitting lines to scatterplots
If the points on the scatterplot tend to lie on a straight line, then we can fit a line to the
scatterplot. The process of fitting a straight line to bivariate data is known as linear
regression. The aim of linear regression is to model the relationship between two numerical
variables by using a simple equation: the equation of a straight line.
In regression, we write the equation of a straight line as
y = a + bx
where:
y is the dependent variable (DV)
x is the independent variable (IV).
a is the y-intercept of the line
b is the slope of the line.
Once we have the equation, we can use it to predict the value of the dependent variable (y) for
different values of the independent variable (x).
Fitting a line by eye
What we want to find is the straight line that ‘best’ fits the data. You met this idea earlier, in the
chapter on linear graphs. There is no one way of finding the line that best fits a set of bivariate
data. There are many ways.
The easiest way to fit a line to bivariate data is to construct a scatterplot and draw the line in
‘by eye’. To do this, place a ruler on the scatterplot in a position that captures the general trend
of the data, and then use the ruler to draw a straight line. This method works best when the
points in the scatterplot are reasonably tightly clustered around a straight line.
Once the line is drawn, we can use the methods you learned in ‘Linear graphs’ (Chapter 3)
to find its equation. The starting point for fitting a line ‘by eye’ is a scatterplot.
Example 3 Fitting a line by eye using the intercept and slope
The scatterplot opposite plots mark against
time spent studying for an examination, for
10 students.
In this plot, mark is the y (or dependent) variable
and time is the x (or independent) variable.
Fit a line to the scatterplot by eye and write its
equation in terms of:
a x and y
b the variables mark and time.
[Figure: scatterplot of Mark (%) against Time (hours).]
Solution
[Figure: the scatterplot of Mark (%) against Time (hours) with a line fitted by eye, marked with rise = 60 and run = 35.]
1 Place a transparent ruler on the scatterplot so
that the points in the scatterplot are reasonably
evenly spread around the line made by the
edge of the ruler.
2 Draw in the line.
3 Find the equation of the line in terms
of y and x.
As the y-intercept can be read from the graph,
use the intercept–slope form of the equation
of a straight line, y = a + bx .
To calculate the slope, choose two
easily read points that are reasonably
widely separated. The points (0, 30)
and (35, 90) are suitable.
Substitute the values for a and b into
the equation.
4 Noting that y represents the variable mark
and x represents the variable time, rewrite
the equation in terms of mark and time.
y = a + bx
a = y-intercept = 30
$$b = \text{slope} = \frac{\text{rise}}{\text{run}} = \frac{60}{35} = 1.7 \text{ (to 1 d.p.)}$$
∴ y = 30 + 1.7x
∴ mark = 30 + 1.7 × time
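The arithmetic in this example, and the use of the fitted line to make a prediction, can be sketched in Python. The names `intercept`, `slope` and `predict_mark`, and the choice of 10 hours as a trial value, are assumptions made for this illustration.

```python
# Values read from the line fitted by eye in Example 3.
intercept = 30          # y-intercept, a
rise, run = 60, 35      # between the points (0, 30) and (35, 90)

slope = round(rise / run, 1)   # b, rounded to 1 decimal place as in the text

def predict_mark(time_hours):
    """Predict a mark (%) from study time using mark = a + b * time."""
    return intercept + slope * time_hours

print(f"mark = {intercept} + {slope} x time")
print(predict_mark(10))
```

Because the line was drawn by eye, a different reader could reasonably arrive at a slightly different intercept and slope, and so at a slightly different prediction.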
Example 4 Fitting a line by eye using the two-point formula
The scatterplot on the right plots weight against
height for eight people. In this plot, height is the
x (or independent) variable, and weight is the y
(or dependent) variable.
Fit a line to the scatterplot by eye and write its
equation in terms of:
a x and y
b the variables weight and height.
[Figure: scatterplot of Weight (kg) against Height (cm).]
Solution
[Figure: the scatterplot of Weight (kg) against Height (cm) with a line fitted by eye.]
1 Place a ruler on the scatterplot
so that the points in the scatterplot
are reasonably evenly spread around
the line.
2 Draw in the line.
3 Find the equation of the line in
terms of y and x.
As the y-intercept cannot be read from
the graph, use the two-point formula,
$$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$$
or use a graphics calculator.
Choose two easily read points that are
reasonably widely separated. The points
(155, 53) and (195, 77) are suitable.
Either substitute these values into the
formula and transform or use a graphics
calculator (see page 109).
4 Noting that y represents the variable
weight and x represents the variable
height, rewrite the equation in terms of
weight and height.
$$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}$$
$$x_1 = 155,\ y_1 = 53;\quad x_2 = 195,\ y_2 = 77$$
$$\frac{y - 53}{x - 155} = \frac{77 - 53}{195 - 155}$$
$$\frac{y - 53}{x - 155} = 0.6$$
y − 53 = 0.6(x − 155)
y − 53 = 0.6x − 93
∴ y = −40 + 0.6x
∴ weight = −40 + 0.6 × height
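The two-point calculation can be checked with a short Python sketch. The function name `line_through` is an assumption made for this illustration, and the sketch assumes integer coordinates so that exact fractions can be used.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (a, b) for the line y = a + bx through two points.

    Assumes integer coordinates, so the arithmetic stays exact.
    """
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x values")
    b = Fraction(y2 - y1, x2 - x1)   # slope
    a = y1 - b * x1                  # y-intercept
    return a, b

a, b = line_through((155, 53), (195, 77))
print(f"y = {float(a)} + {float(b)}x")  # y = -40.0 + 0.6x
```

The intercept of −40 has no practical meaning here (it would correspond to a height of 0 cm); it is simply what is needed to write the line in the form y = a + bx.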
Fitting a line using the two-mean method
While fitting a line by eye is quick and easy, it is not a reliable method for finding the equation
of the line that best fits a scatterplot, as everyone is likely to come up with a slightly different
line.
One method for overcoming this problem is to use the two-mean method. The two-mean
method orders the data by x value, splits it into a lower half and an upper half, finds the mean
point of each half and draws a line through the two mean points.
To fit a line using the two-mean method requires both the scatterplot and the data values.
Example 5 Fitting a line using the two-mean method
The data below give the marks that students obtained on an examination and the times they
spent studying for the examination.
Time (hours), x 4 36 23 19 1 11 18 13 18 8
Mark (%), y 41 87 67 62 23 52 61 43 65 52
Fit a line to the scatterplot using the two-mean method and write its equation in terms of:
a x and y b the variables mark and time.
Solution
1 Rewrite the data pairs in order, according to the x values.
Time, x 1 4 8 11 13 18 18 19 23 36
Mark, y 23 41 52 52 43 61 65 62 67 87
2 Divide the ordered table into two new tables: one for the lower half of data values, the
other for the top half of data values. Find the mean values of x and y for each new table.
Lower half
Time, x 1 4 8 11 13 $\bar{x}_L = 7.4$
Mark, y 23 41 52 52 43 $\bar{y}_L = 42.2$
Upper half
Time, x 18 18 19 23 36 $\bar{x}_U = 22.8$
Mark, y 61 65 62 67 87 $\bar{y}_U = 68.4$
[Figure: scatterplot of Mark (%) against Time (hours) showing the two mean points (7.4, 42.2) and (22.8, 68.4) and the line drawn through them.]
3 Plot the two mean points (7.4, 42.2)
and (22.8, 68.4) on the scatterplot.
4 Draw in the line through the two mean
points to plot the two-mean line.
5 Use the two mean points (7.4, 42.2)
and (22.8, 68.4) to find the equation of the
line in terms of y and x. Use either the
two-point formula or a graphics calculator
(see page 109).
6 Rewrite the equation of the two-mean line
in terms of the variables mark and time.
Equation of the two-mean line :
y = 29.6 + 1.7x
∴ mark = 29.6 + 1.7 × time
It is interesting to note that the equation of the two-mean line is very close to the equation we
got by fitting a line by eye. This is often the case when the points in the scatterplot are
reasonably closely scattered around the line. However, for scatterplots where this is not the
case, the two-mean method is a more reliable technique to use than fitting a line by eye.
To find the equation of the two-mean line:
Order the data pairs according to the x values and divide into two equal-sized groups: lower and upper. If there is an odd number of data points, discard the middle data point.
Find the coordinates of the point $(\bar{x}_L, \bar{y}_L)$, where $\bar{x}_L$ is the mean of the x values in the lower half and $\bar{y}_L$ is the mean of the y values in the lower half.
Find the coordinates of the point $(\bar{x}_U, \bar{y}_U)$, where $\bar{x}_U$ is the mean of the x values in the upper half and $\bar{y}_U$ is the mean of the y values in the upper half.
Mark in the two points on the scatterplot. Draw a line through the two points to display the two-mean line.
Use the two points $(\bar{x}_L, \bar{y}_L)$ and $(\bar{x}_U, \bar{y}_U)$ to find the equation of the line. This can be done using either the two-point formula or a graphics calculator.
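The steps above can also be sketched in Python. This is a minimal illustration, not part of the original text: the function name `two_mean_line` is hypothetical, and data pairs with equal x values that fall either side of the middle are split by their original order, a case the procedure does not address.

```python
def two_mean_line(xs, ys):
    """Return (intercept, slope) of the two-mean line for paired data."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # Order the data pairs according to the x values.
    pairs = sorted(zip(xs, ys), key=lambda pair: pair[0])
    n = len(pairs)
    half = n // 2
    if half == 0:
        raise ValueError("need at least two data points")
    lower = pairs[:half]
    upper = pairs[n - half:]  # the middle pair is left out when n is odd

    def mean_point(group):
        # Mean of the x values and mean of the y values in one half.
        mean_x = sum(x for x, _ in group) / len(group)
        mean_y = sum(y for _, y in group) / len(group)
        return mean_x, mean_y

    x_l, y_l = mean_point(lower)
    x_u, y_u = mean_point(upper)
    if x_u == x_l:
        raise ValueError("the two mean points have the same x value")
    slope = (y_u - y_l) / (x_u - x_l)
    intercept = y_l - slope * x_l
    return intercept, slope


# Data from the worked example: study time (hours) and mark (%).
time = [4, 36, 23, 19, 1, 11, 18, 13, 18, 8]
mark = [41, 87, 67, 62, 23, 52, 61, 43, 65, 52]
a, b = two_mean_line(time, mark)
print(f"mark = {a:.1f} + {b:.1f} × time")
```

For the worked example data this reproduces the equation found by hand, mark = 29.6 + 1.7 × time.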
Exercise 4D
1 Fit a line by eye to the scatterplot opposite.
Write the equation of the line in terms of the
variables infant death rate and female literacy
rate.
[Scatterplot: infant death rate (per 100 000) against female literacy rate (%)]
2 Fit a line by eye to the scatterplot opposite.
Write the equation of the line in terms of the
variables height and age.
[Scatterplot: height (cm) against age (months)]
3 Fit a line by eye to the scatterplot opposite.
Write the equation of the line in terms of the
variables daughter’s height and mother’s height.
[Scatterplot: daughter’s height (cm) against mother’s height (cm)]
4 The data below gives the velocity of a motorbike (in m/s) over a 5-second interval. Also
shown is the scatterplot in which velocity is plotted against time.
Time (s) Velocity (m/s)
0.5 19.3
1 20.4
1.5 18.6
2 22.2
3 22.5
3.5 24.3
4 22.5
5 25.5
[Scatterplot: velocity (m/s) against time (s)]
Find the equation of the two-mean line for this data. Write the equation in terms of the
variables velocity and time.
5 The data below gives the prices and ages of 12 used cars. Also shown is the scatterplot
constructed from this data.
Age (years) Price ($)
2 15 800
3 14 300
3 13 800
4 11 800
4 13 000
4 13 300
5 11 000
6 12 200
6 9 500
7 8 300
7 9 700
8 8 000
[Scatterplot: price ($’000) against age (years)]
Find the equation of the two-mean line for this data. Write the equation in terms of the
variables price and age.
6 The data below gives the airspeed and the number of seats in 8 aircraft. Also shown is the
scatterplot constructed from this data.
Number of seats Airspeed (km/hr)
405 830
296 797
288 774
258 736
240 757
193 765
188 760
148 718
[Scatterplot: airspeed (km/h) against number of seats]
Find the equation of the two-mean line for this data. Write the equation in terms of the
variables airspeed and number of seats.
4.5 Using regression lines to make predictions
As we said earlier, the process of fitting a straight line to bivariate data is known as linear
regression. The aim of linear regression is to model the relationship between two numerical
variables by using the equation of a straight line. Once we have this equation, we can use the
equation to make predictions.
For example, in Example 5 we fitted a line to the data relating students’ marks on an
examination to the time they spent studying for the examination. The equation was
mark = 29.6 + 1.7 × time
Using this equation, and rounding off to the nearest whole number, we would predict that a
student who spent:
0 hours studying would obtain a mark of 30% (mark = 29.6 + 1.7 × 0 = 29.6)
8 hours studying would obtain a mark of 43% (mark = 29.6 + 1.7 × 8 = 43.2)
12 hours studying would obtain a mark of 50% (mark = 29.6 + 1.7 × 12 = 50)
30 hours studying would obtain a mark of 81% (mark = 29.6 + 1.7 × 30 = 80.6)
80 hours studying would obtain a mark of 166%! (mark = 29.6 + 1.7 × 80 = 165.6)
This last result points to one of the limitations of regression lines. We are predicting someone
to get more than 100%. When using a regression line to make predictions, we must remember
that, strictly speaking, the equation only applies to the range of data values used to determine
the equation.
Thus, we are safe using the line to make predictions within this data range. This is called
interpolation.
However, we must be extremely careful about how much faith we put into predictions made
outside the data range. Making predictions outside the data range is called extrapolating.
Predicting within the range of data is called interpolation.
Predicting outside the range of data is called extrapolation.
For example, if we use the regression
line to predict the examination mark for
30 hours of studying time, we would be
interpolating because we would be making
a prediction within the data.
However, if we use the regression
line to predict the examination mark for
50 hours of studying time, we would be
extrapolating because we would be making
a prediction outside the data. Extrapolation
is a less reliable process than interpolation
because we are going beyond the original
data, and we don’t know if the relationship is
still linear there.
[Scatterplot: mark (%) against time (hours), with the regression line extended to 80 hours. Interpolation: line is used to make predictions within the data range. Extrapolation: line is used to make predictions outside the data range.]
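The distinction between interpolation and extrapolation can be sketched in a few lines of Python. The function name `predict_mark` is hypothetical; the default limits of 1 and 36 hours are the smallest and largest study times in the data of Example 5, and the intercept and slope are those of the fitted line.

```python
def predict_mark(time_hours, intercept=29.6, slope=1.7, x_min=1, x_max=36):
    """Predict a mark (%) and label the prediction by where time_hours falls."""
    mark = intercept + slope * time_hours
    # Inside the range of study times used to fit the line: interpolation.
    # Outside that range: extrapolation, which is less reliable.
    if x_min <= time_hours <= x_max:
        kind = "interpolation"
    else:
        kind = "extrapolation"
    return mark, kind


for hours in (8, 30, 50, 80):
    mark, kind = predict_mark(hours)
    print(f"{hours} hours: predicted mark {mark:.1f}% ({kind})")
```

The predictions for 8 and 30 hours fall within the data range, while those for 50 and 80 hours fall outside it and should be treated with caution.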
Exercise 4E
1 Complete the following sentences. Using a regression line to make a prediction:
a within the range of data that was used to derive the equation is called .
b outside the range of data that was used to derive the equation is called .
2 For children between the ages of 36 and 60 months, the equation relating their height (in cm)
to their age (in months) is:
height = 72 + 0.4 × age
Use this equation to predict the height (to the nearest cm) of a child who is:
a 40 months old. Is this interpolation or extrapolation?
b 55 months old. Is this interpolation or extrapolation?
c 70 months old. Is this interpolation or extrapolation?
3 For shoe sizes between 6 and 12, the equation
relating a person’s weight (in kg) to shoe size is:
weight = 48.1 + 2.2 × shoe size
Use this equation to predict the weight (to
the nearest kg) of a person whose shoe size is:
a 5. Is this interpolation or extrapolation?
b 8. Is this interpolation or extrapolation?
c 11. Is this interpolation or extrapolation?
4 When preparing between 25 and 100 meals, a cafeteria’s cost (in dollars) is given by the
equation:
cost = 175 + 5.8 × number of meals
Use this equation to predict the cost (to the nearest dollar) of preparing:
a no meals. Is this interpolation or extrapolation?
b 60 meals. Is this interpolation or extrapolation?
c 89 meals. Is this interpolation or extrapolation?
5 For women of heights from 150 to 180 cm, the equation relating a daughter’s adult height
(in cm) to her mother’s height (in cm) is:
daughter’s height = 18.3 + 0.91 × mother’s height
Use this equation to predict (to the nearest centimetre) the adult height of a woman whose
mother is:
a 168 cm tall. Is this interpolation or extrapolation?
b 196 cm tall. Is this interpolation or extrapolation?
c 155 cm tall. Is this interpolation or extrapolation?
Review
Key ideas and chapter summary
Scatterplot A scatterplot is used to help identify and describe the relationship between two numerical variables.
In a scatterplot, the dependent variable (DV) is plotted on the vertical axis and the independent variable (IV) on the horizontal axis.
[Scatterplot: score on hearing test (DV) against age (IV)]
Identifying relationships between two numerical variables A random cluster of points (no clear pattern) indicates that the variables are unrelated.
A clear pattern in the scatterplot indicates that the variables are related.
Describing relationships in scatterplots Relationships are described in terms of:
direction (positive or negative) and outliers
strength (strong, moderate, weak or none).
q-correlation coefficient The quadrant or q-correlation coefficient is a measure of the strength of the relationship between two numerical variables.
The q-correlation coefficient is defined by
$$q = \frac{(a + c) - (b + d)}{a + b + c + d}$$
where a, b, c and d are the numbers of points in the four quadrants of the scatterplot labelled A, B, C and D respectively.
Any points that lie on the lines are omitted.
[Diagram: axes x and y with origin O, divided into four quadrants labelled A, B, C and D]
q-correlation: strength The q-correlation coefficient can be used to classify the strength of the relationship between two numerical variables as weak, moderate or strong, using the guidelines shown in the table.
Strong positive relationship 0.75 ≤ q ≤ 1
Moderate positive relationship 0.5 ≤ q < 0.75
Weak positive relationship 0.25 ≤ q < 0.5
No relationship −0.25 < q < 0.25
Weak negative relationship −0.5 < q ≤ −0.25
Moderate negative relationship −0.75 < q ≤ −0.5
Strong negative relationship −1 ≤ q ≤ −0.75
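As an illustration, the definition of q and the guidelines in the table can be sketched in Python. The function names are hypothetical, the quadrant counts in the usage lines are made-up values rather than data from the text, and the counts are assumed to exclude any points lying on the dividing lines.

```python
def q_correlation(a, b, c, d):
    """q-correlation coefficient from the counts in quadrants A, B, C and D."""
    total = a + b + c + d
    if total == 0:
        raise ValueError("no points were counted in any quadrant")
    return ((a + c) - (b + d)) / total


def classify_strength(q):
    """Classify q using the guidelines in the table above."""
    if not -1 <= q <= 1:
        raise ValueError("q must lie between -1 and 1")
    size = abs(q)  # the guidelines are symmetric about zero
    if size >= 0.75:
        strength = "strong"
    elif size >= 0.5:
        strength = "moderate"
    elif size >= 0.25:
        strength = "weak"
    else:
        return "no relationship"
    direction = "positive" if q > 0 else "negative"
    return f"{strength} {direction} relationship"


# Hypothetical counts: 4 points in A, 1 in B, 4 in C, 1 in D.
q = q_correlation(4, 1, 4, 1)
print(q, classify_strength(q))
```

With these made-up counts, q = (8 − 2)/10 = 0.6, which the guidelines classify as a moderate positive relationship.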
Fitting lines to scatterplots: linear regression A straight line can be used to model the relationship between two numerical variables when the relationship is linear. This is known as linear regression.
The relationship can then be described by a rule of the form
y = a + bx
where y is the dependent variable (DV), x is the independent variable (IV), a is the y-intercept of the line and b is the slope of the line.
Fitting a line by eye Fitting a line by eye means drawing a line on the scatterplot that
captures the general trend of the data. It is most suitable when there is
minimal scatter in the scatterplot.
Fitting a line using the two-mean method The two-mean method positions the line on the scatterplot by finding the mean point of the lower half and the mean point of the upper half of the data, ordered by x value. A line is then drawn through the two mean points.
Using a regression line to make predictions The regression line y = a + bx enables the value of y to be determined for a given value of x.
Interpolation and extrapolation Predicting within the range of data is called interpolation.
Predicting outside the range of data is called extrapolation.
Skills check
Having completed the current chapter you should be able to:
construct a scatterplot
use a scatterplot to comment on the direction of a relationship (positive or negative)
and possible outliers
calculate and interpret the q-correlation coefficient
determine the equation of a line drawn by eye
determine the equation of a two-mean line
use the equation of the line for prediction
distinguish between interpolation and extrapolation when using a line to make a
prediction.
Multiple-choice questions
1 For which one of the following pairs of variables would it be appropriate to
construct a scatterplot?
A Eye colour (blue, green, brown, other) and hair colour (black, brown, blonde,
red, other)
B Score out of 100 on a test for a group of Year 9 students and a group of Year 11
students
C Political party preference (Labor, Liberal, Other) and age in years
D Age in years and blood pressure in mm Hg
E Height in cm and sex (male, female)
2 For the scatterplot shown, the relationship between the
variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
[Scatterplot: y against x]
3 For the scatterplot shown, the relationship between the
variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
[Scatterplot: y against x]
4 For the scatterplot shown, the relationship between the
variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
[Scatterplot: y against x]
5 For the scatterplot shown, the relationship between the
variables is best described as:
A weak negative
B strong negative
C no relationship
D weak positive
E strong positive
[Scatterplot: y against x]
6 A q-correlation coefficient of 0.32 would describe a relationship classified as:
A weak positive B moderate positive C strong positive
D close to zero E moderately strong
7 For the scatterplot shown, the q-correlation
coefficient is:
A −1
B −0.5
C 0
D 0.5
E 1
[Scatterplot: both axes scaled from 0 to 10]
8 For the scatterplot shown, the q-correlation
coefficient is:
A −1
B −0.5
C 0
D 0.5
E 1
[Scatterplot: both axes scaled from 0 to 10]
9 For the scatterplot shown, the q-correlation
coefficient is:
A 0.2
B 0.4
C 0.6
D 0.8
E 1.0
[Scatterplot: both axes scaled from 0 to 10]
10 For the scatterplot shown, the line drawn by eye
would have an equation closest to:
A velocity = 5 × time
B velocity = 19 + 1 × time
C velocity = 1 + 19 × time
D velocity = 19 + 5 × time
E velocity = 5 + 19 × time
[Scatterplot: velocity (m/s) against time (s)]
11 For the scatterplot shown, the line drawn by
eye would have a slope closest to:
A −2000
B −1000
C −200
D 2000
E 1000
[Scatterplot: price ($’000) against age (years)]
The following information relates to Questions 12 and 13
The weekly income and weekly food costs for a group of 10 university students are given
in the following table.
Income ($) 150 250 300 300 380 450 600 850 950 1000
Food cost ($) 40 60 70 130 150 260 120 460 200 600
12 The equation of the two-mean line would be found by finding the equation of the
line passing through the points:
A (276, 90) and (770, 328) B (300, 70) and (850, 460)
C (90, 276) and (328, 770) D (150, 40) and (1000, 600)
E (276, 84) and (770, 334)
13 The equation of the two-mean line that would enable food cost to be predicted from
weekly income is closest to:
A food cost = 0.48 + 43 × income B food cost = 0.48 − 43 × income
C food cost = −43 + 0.48 × income D food cost = 240 + 1.4 × income
E food cost = 1.4 + 240 × income
The following information relates to Questions 14 and 15
For incomes between $600 and $1200 per week, the equation of a line that relates
weekly expenditure on entertainment (in dollars) to weekly income (in dollars) is given
by:
expenditure = 40 + 0.10 × income
14 The equation predicts that the amount spent on entertainment by a person with an
income of $800 is:
A $40 B $80 C $120 D $160 E $1200
15 The following statements relate to the equation
expenditure = 40 + 0.10 × income
Which statement is not true?
A Expenditure is the dependent variable. B Income is the independent variable.
C The slope of the line is 0.10. D The intercept of the line is 40.
E Using the line to predict the expenditure of a person with an income of $1500
per week is called interpolation.
Short-answer questions
1 The following table gives the number of times the ball was inside the 50 m line in an
AFL football game, and the team’s score in that game.
Inside 50 m 64 57 34 61 51 52 53 51 64 55 58 71
Score (points) 90 134 76 92 93 45 120 66 105 108 88 133
a Construct a scatterplot of score against the number of times the ball was
inside 50 m.
b From the scatterplot, describe any relationship between the two variables.
2 Determine the q-correlation coefficient for the
scatterplot shown.
[Scatterplot: distance (km) against time (min)]
3 The following scatterplot shows the relationship
between height and weight for a group of obese
people. A line by eye has been drawn on the
scatterplot. Find the equation of the line.
[Scatterplot: weight (kg) against height (cm), with a line fitted by eye]
4 The time taken to complete a task, and the number of errors on the task, were
recorded for a sample of 10 primary school children. Determine the equation of the
two-mean line that fits this data.
Time (s) 22.6 21.7 21.7 21.3 19.3 17.6 17.0 14.6 14.0 8.8
Errors 2 3 3 4 5 5 7 7 9 9
Extended-response questions
1 A marketing company wishes to predict the likely number of new clients that each of
its graduates will attract to the business in their first year of employment. It plans to
do this by using the graduates’ scores on a marketing examination in the final year of
their course.
Graduate Examination score Number of new clients
1 65 7
2 72 9
3 68 8
4 85 10
5 74 10
6 61 8
7 60 6
8 78 10
9 70 5
10 82 11
a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of this data.
c Describe the relationship between the number of new clients and the examination
score.
d Determine the value of the q-correlation coefficient for this data, and classify the
strength of the relationship.
e Determine the equation for the two-mean line and write it down in terms of the
variables number of new clients and examination score.
f Use your equation to predict, to the nearest whole number, the number of new
clients for a graduate who scored 100 on the examination.
g In making this prediction, are you interpolating or extrapolating?
2 To investigate the relationship between marks on an assignment and the final
examination mark, a sample of 10 students was taken. The table below indicates the
marks for the assignment and the final exam mark for each student.
Assignment mark (max = 80) 80 77 71 78 65 80 68 64 50 66
Final exam mark (max = 90) 83 83 79 75 68 84 71 69 66 58
a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of this data.
c Describe the relationship between the assignment mark and the final examination
mark.
d Determine the value of the q-correlation coefficient for this data, and classify the
strength of the relationship.
e Use your answer to part d to comment on the statement: ‘Good final exam marks
are the result of good assignment marks.’
f Determine the equation for the two-mean line and write it down in terms of the
variables final exam mark and assignment mark.
g Use your equation to predict the final examination mark for a student who scored
50 on the assignment.
h In making this prediction, are you interpolating or extrapolating?
3 A marketing firm wanted to investigate the relationship between airplay and CD
sales (in the following week) of newly released CDs. The following data was
collected on a random sample of 10 CDs.
Number of times played 47 34 40 34 33 50 28 53 25 46
Weekly sales 3950 2500 3700 2800 2900 3750 2300 4400 2200 3400
a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of this data.
c Describe the association between the number of times the CD was played and
weekly sales.
d Determine the value of the q-correlation coefficient for this data, and classify the
strength of the relationship.
e Determine the equation for the two-mean line and write it down in terms of the
variables number of times played and weekly sales.
f Use your equation to predict the weekly sales for a CD that was played 60 times.
g In making this prediction, are you interpolating or extrapolating?
4 The following table gives the gold-medal winning distance, in metres, for the men’s
long jump for the Olympic games for the years 1896 to 2004. (Some years are
missing owing to the two world wars.)
Year 1896 1900 1904 1908 1912 1920 1924 1928 1932 1936 1948 1952 1956
Distance (m) 6.35 7.19 7.34 7.49 7.59 7.16 7.44 7.75 7.65 8.05 7.82 7.57 7.82
Year 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004
Distance (m) 8.13 8.08 8.92 8.26 8.36 8.53 8.53 8.72 8.67 8.50 8.55 8.59
a Which is the independent variable and which is the dependent variable?
b Construct a scatterplot of these data.
c Describe the association between the distance and year.
d Determine the value of the q-correlation coefficient for these data, and classify the
strength of the relationship.
e Determine the equation for the two-mean line and write it down in terms of the
variables distance and year.
f Use your equation to predict the winning distance in the year 2008.
g How reliable is the prediction made in part f?