Mathematics and Statistics

James Ward and James Abdey

FP0001

2013

International Foundation Programme

This guide was prepared for the University of London International Programmes by:

J.M. Ward, The London School of Economics and Political Science

J.S. Abdey, The London School of Economics and Political Science

This is one of a series of subject guides published by the University. We regret that due to pressure of work the authors are unable to enter into any correspondence relating to, or arising from, the guide. If you have any comments on this subject guide, favourable or unfavourable, please use the online form found on the virtual learning environment.

University of London International Programmes

Publications Office

Stewart House

32 Russell Square

London WC1B 5DN

United Kingdom

www.londoninternational.ac.uk

Published by: University of London

© University of London 2013

The University of London asserts copyright over all material in this subject guide except where otherwise indicated. All rights reserved. No part of this work may be reproduced in any form, or by any means, without permission in writing from the publisher. We make every effort to respect copyright. If you think we have inadvertently used your copyright material, please let us know.

Cover image © Ocean/Corbis

Contents

Introduction
Route map to the guide
Time management
Recommendations for working through the units
Overview of learning resources
The subject guide and textbooks
Online study resources
Virtual Learning Environment (VLE)
Making use of the Online library
Examination advice
Calculators

Part 1 Mathematics

Introduction to Mathematics
Syllabus
Aims of the course
Learning outcomes for the course (Mathematics)
Textbook

1 Review I — A review of some basic mathematics
1.1 Arithmetic
1.1.1 Basic arithmetic
1.1.2 Fractions
1.1.3 Powers
1.2 Algebra
1.2.1 Algebraic expressions
1.2.2 Equations, formulae and inequalities
Learning outcomes
Exercises

2 Review II — Linear equations and straight lines
2.1 Linear equations
2.1.1 Linear equations in one variable
2.1.2 Linear equations in two variables
2.1.3 Visualising the solutions of linear equations in two variables
2.2 Straight lines
2.2.1 Drawing straight lines given their equations
2.2.2 The intercepts of a straight line
2.2.3 The gradient of a straight line
2.2.4 Finding the equation of a straight line
2.2.5 Applications of straight lines
2.3 Simultaneous equations
2.3.1 Visualising the solution to a pair of simultaneous equations
2.3.2 Solving simultaneous equations algebraically
2.3.3 An application of simultaneous equations in economics
Learning outcomes
Exercises

3 Review III — Quadratic equations and parabolae
3.1 Quadratic equations
3.1.1 Factorising
3.1.2 Completing the square
3.1.3 Using the completed square form to solve quadratic equations
3.1.4 Warning!
3.1.5 The quadratic formula
3.2 Parabolae
3.2.1 Sketching parabolae
3.2.2 Where do a parabola and a straight line intersect?
3.2.3 Where do two parabolae intersect?
Learning outcomes
Exercises

4 Functions
4.1 Functions
4.1.1 What is a function?
4.1.2 Some common functions
4.1.3 Combinations of functions
4.1.4 Functions in economics
4.2 Inverse functions
4.2.1 Finding inverse functions
4.2.2 Logarithms
Learning outcomes
Exercises

5 Calculus I — Differentiation
5.1 The gradient of a curve at a point
5.1.1 Tangents to a parabola
5.1.2 Chords of a parabola
5.1.3 Tangents to other curves
5.2 What is differentiation?
5.2.1 Standard derivatives
5.2.2 Two rules of differentiation
5.2.3 Some general points on what we have seen so far
Learning outcomes
Exercises

6 Calculus II — More differentiation
6.1 Three more rules of differentiation
6.1.1 The product rule
6.1.2 The quotient rule
6.1.3 The chain rule
6.1.4 Further applications of the chain rule
6.1.5 Using these rules of differentiation together
6.2 Approximating functions
Learning outcomes
Exercises

7 Calculus III — Optimisation
7.1 What derivatives tell us about functions
7.1.1 When is a function increasing or decreasing?
7.1.2 Stationary points
7.1.3 Second derivatives
7.1.4 What second derivatives tell us about a function
7.1.5 A note on the ‘large x’ behaviour of functions
7.2 Optimisation and curve sketching
7.2.1 Steps 1 and 2: Finding and classifying stationary points
7.2.2 Curve sketching
7.2.3 Step 3: Looking for global maxima and global minima
7.2.4 An economic application: Profit maximisation
Learning outcomes
Exercises

8 Calculus IV — Integration
8.1 Indefinite integrals
8.1.1 Finding simple indefinite integrals
8.1.2 The basic rules of integration
8.2 Definite integrals and areas
Learning outcomes
Exercises

9 Financial Mathematics I — Compound interest and its uses
9.1 Interest
9.1.1 A formula for balances under annually compounded interest
9.1.2 Other compounding intervals
9.1.3 Continuous compounding
9.2 Problems involving interest rates
9.2.1 How much do I need to invest to get...?
9.2.2 What interest rate do I need to get...?
9.2.3 How long do I need to invest to get...?
9.2.4 Annual percentage rates
9.3 Depreciation
Learning outcomes
Exercises

10 Financial Mathematics II — Applications of series
10.1 Sequences and series
10.1.1 Arithmetic sequences and series
10.1.2 Geometric sequences and series
10.2 Financial applications of geometric series
10.2.1 Regular saving plans
10.2.2 Annuities
10.2.3 Future and present values
Learning outcomes
Exercises

Part 2 Statistics

Introduction to Statistics
Syllabus
Aims of the course
Learning outcomes for the course (Statistics)
Textbook

11 Data exploration I — The nature of statistics
11.1 Introduction
11.1.1 Data collection
11.1.2 Data classification
11.1.3 Data summary
11.1.4 Data display
11.1.5 Analysis
11.1.6 Interpretation
11.1.7 Uncertainty
11.1.8 Descriptive and inferential statistics
11.2 Types of data
11.3 The role of statistics in the research process
11.4 Summary
11.5 Key terms and concepts
Learning outcomes
Exercises

12 Data exploration II — Data visualisation
12.1 Grouping data
12.2 Histograms
12.3 Pie charts and bar graphs
12.4 Line graphs
12.5 Scatter plots
12.6 Summary
12.7 Key terms and concepts
Learning outcomes
Exercises

13 Data exploration III — Descriptive statistics: measures of location, dispersion and skewness
13.1 Summation notation
13.2 Measures of location
13.2.1 Which ‘average’ should be used?
13.2.2 Frequency tables
13.3 Measures of dispersion
13.3.1 Variance and standard deviation
13.3.2 Variance using frequency distributions
13.4 Skewness
13.5 Summary
13.6 Key terms and concepts
Learning outcomes
Exercises

14 Probability I — Introduction to probability theory
14.1 Probability theory
14.1.1 Assigning probabilities
14.1.2 The classical method
14.1.3 The relative frequency approach
14.1.4 Subjective probabilities
14.2 Terminology
14.3 Sets
14.4 Independence
14.5 Complementary events
14.6 Additive laws
14.7 Multiplicative laws
14.8 Bayes’ theorem
14.8.1 Version 1
14.8.2 Version 2
14.8.3 Version 3
14.9 Summary — a listing of probability results
14.10 Summary
14.11 Key terms and concepts
Learning outcomes
Exercises

15 Probability II — Probability distributions
15.1 Random variables
15.2 Discrete random variables
15.3 Continuous random variables
15.4 Mathematical expectation
15.5 Functions of a random variable
15.6 Variance
15.7 Discrete uniform distribution
15.8 Bernoulli distribution
15.9 Binomial distribution
15.10 Poisson distribution
15.11 A word on calculators
15.12 Summary
15.13 Key terms and concepts
Learning outcomes
Exercises

16 Probability III — The Normal distribution and sampling distributions
16.1 The Normal distribution
16.1.1 Probabilities for any Normal distribution
16.1.2 Some probabilities around the mean
16.2 Sampling distributions
16.2.1 Estimation
16.2.2 Sampling from a Normal population
16.3 Central Limit Theorem
16.3.1 CLT examples
16.4 Summary
16.5 Key terms and concepts
Learning outcomes
Exercises

17 Sampling and experimentation I — Sampling techniques and contact methods
17.1 Sampling
17.1.1 Non-probability sampling techniques
17.1.2 Probability sampling techniques
17.1.3 Method of contact
17.2 Summary
17.3 Key terms and concepts
Learning outcomes
Exercises

18 Sampling and experimentation II — Bias and the design of experiments
18.1 Introduction
18.2 Types of error
18.3 Bias
18.4 Adjusting for non-response
18.5 Experimental design in the social and medical sciences
18.5.1 Experimental versus observational studies
18.5.2 Randomised controlled clinical trials
18.5.3 Randomised blocks
18.5.4 Multi-factorial experimental designs
18.5.5 Quasi-experiments
18.5.6 Cluster randomised trials
18.5.7 Analysis and interpretation
18.6 Summary
18.7 Key terms and concepts
Learning outcomes
Exercises

19 Fundamentals of regression I — Correlation and the simple linear regression model
19.1 Introduction
19.2 Correlation
19.3 Simple linear regression
19.4 Parameter estimation
19.5 Prediction
19.6 Summary
19.7 Key terms and concepts
Learning outcomes
Exercises

20 Fundamentals of regression II — Interpretation of computer output and assessing model adequacy
20.1 Introduction
20.2 Analysis of variance
20.3 Coefficient of determination, R²
20.4 Computer output
20.5 Several explanatory variables
20.6 Summary
20.7 Key terms and concepts
Learning outcomes
Exercises

Part 3 Appendices

A A sample examination paper

B Solutions to the sample examination paper
Section A: Mathematics
Section B: Statistics

C Cumulative Normal probabilities

Introduction

Welcome to the world of Mathematics and Statistics! These are disciplines which are widely applied in areas such as finance, business, management, economics and other fields in the social sciences. The following units will provide you with the opportunity to grasp the fundamentals of these subjects and will equip you with some of the vital quantitative skills and powers of analysis which are highly sought-after by employers in many sectors.

As Mathematics and Statistics has so many applications, it should not be surprising that it forms the compulsory component of the International Foundation Programme. The analytical skills which you will develop on this course will therefore serve you well in both your future studies and beyond in the real world of work. The material in this course is necessary as preparation for other courses you may study later on as part of a degree programme or diploma; indeed, in many cases a course in Mathematics or Statistics is a compulsory component on the University of London International Programmes’ degrees.

Route map to the guide

This subject guide provides you with a framework for covering the syllabus of the Mathematics and Statistics course in the International Foundation Programme and directs you to additional resources such as readings and the virtual learning environment (VLE).

The following 20 units will introduce you to these disciplines and equip you with the necessary quantitative skills to assist you in further programmes of study. Given the cumulative nature of Mathematics and Statistics, the units are not a series of self-contained topics; rather, they build on each other sequentially. As such, you are strongly advised to follow the subject guide in unit order. There is little point in rushing past material which you have only partially understood in order to reach the final unit.

Once you have completed your work on all of the units, you will be ready for examination revision. A good place to start is the sample examination paper which you will find at the end of the subject guide.

Time management

About one-third of your private study time should be spent reading and the other two-thirds doing problems. (Note the emphasis on practising problems!)

To help your time management, each unit of this course should take a week to study and so you should be spending 10 weeks on Mathematics and 10 weeks on Statistics.

Recommendations for working through the units

The following procedure is recommended for each unit.

i. Read the overview and the aims of the unit.

ii. Now work through each section of the unit, making sure you can understand the examples given, and try the activities as you encounter them. In parallel, watch the accompanying video tutorials for each section on the VLE.

iii. At the end of the unit, review the intended learning outcomes carefully, almost as a checklist. Do you think you have achieved these targets?

iv. Attempt the unit’s self-test quizzes on the VLE. You can treat these as additional activities. This time, though, you will have to think a little about which part of the new material you have learnt is appropriate to each question.

v. Attempt the exercises given at the end of the unit. The solutions can be found on the VLE, but you should only look at these after attempting them yourself!

vi. If you have problems at this point, go back to the subject guide and work through the area you find difficult again. Don’t worry: you will improve your understanding to the point where you can work confidently through the problems.

The last few steps are most important. It is easy to think that you have understood the text after reading it, but working through problems is the crucial test of understanding. Problem-solving should take most of your study time (refer to the ‘Time management’ section above). Note that we have given worked examples and activities to cover each substantive topic in the subject guide. The essential reading examples are added for further consolidation of the whole unit and also to help you work out exactly what the questions are about! One of the problems students sometimes have in an examination is that they waste time trying to understand which part of the syllabus particular questions relate to. These final questions, together with the further explanations on the VLE, aim to help with this before you tackle the sample examination questions at the end of each unit.

Try to be disciplined about this: don’t look up the answers until you have done your best. Some of the ideas you encounter may seem unfamiliar at first, but your attempts at the questions, however dissatisfied you feel with them, will help you understand the material far better than reading and re-reading the prepared answers, honest!

So to conclude, perseverance with problem-solving is your passport to a strong examination performance. Attempting (ideally successfully!) all the cited exercises is of paramount importance.

Overview of learning resources

The subject guide and textbooks

This subject guide for Mathematics and Statistics has been structured so that it is tailored to the specific requirements of the examinable material. It is ‘written to the course’, unlike textbooks which may cover additional material which will not be examinable or may not cover some material that is! Therefore the subject guide should act as your principal resource.

However, a textbook may give an alternative explanation of a topic (which is useful if you have difficulty following something in the subject guide) and so you may want to consult one for further clarification. Additionally, a textbook will contain further examples and exercises which can be used to check and consolidate your understanding. For this course, a useful starting point is

+ Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246]

as this will serve as useful background reading. However, many books are available covering the material frequently found in mathematics and statistics courses like this one and so, if you need a textbook for background reading, you should find one that is appropriate to your level and tastes.

Online study resources

In addition to the subject guide and the Essential reading, it is crucial that you take advantage of the study resources that are available online for this course, including the VLE and the Online Library.

You can access the VLE, the Online Library and your University of London email account via the Student Portal at http://my.londoninternational.ac.uk

You should have received your login details for the Student Portal with your official offer, which was emailed to the address that you gave on your application form. You have probably already logged in to the Student Portal in order to register. As soon as you registered, you will automatically have been granted access to the VLE, Online Library and your fully functional University of London email account.

If you have forgotten these login details, please click on the ‘Forgotten your password’ link on the login page.

Virtual Learning Environment (VLE)

The VLE, which complements this subject guide, has been designed to enhance your learning experience, providing additional support and a sense of community. In addition to making printed materials more accessible, the VLE provides an open space for you to discuss interests and to seek support from other students, working collaboratively to solve problems and discuss subject material. In a few cases, such discussions are driven and moderated by an academic who offers a form of feedback on all discussions. In other cases, video material, such as audio-visual tutorials, is available. These will typically focus on taking you through difficult concepts in the subject guide. For quantitative courses, such as Mathematics and Statistics, fully worked-through solutions of practice examination questions are available. For some qualitative courses, academic interviews and debates will provide you with advice on approaching the subject and examination questions, and will show you how to build an argument effectively.

Past examination papers and Examiners’ commentaries will be available to download in due course (the first examination for this course will be sat in 2014) and these provide advice on how each examination question might best be answered. Self-testing activities allow you to test your knowledge and recall of the academic content of various courses. Finally, a section of the VLE has been dedicated to providing you with expert advice on practical study skills such as preparing for examinations and developing digital literacy skills.

Making use of the Online library

The Online library contains a huge array of journal articles and other resources to help you read widely and extensively.

Essential reading journal articles listed on a number of reading lists are available to download from the Online library.

The easiest way to locate relevant content and journal articles in the Online library is to use the Summon search engine.

If you are having trouble finding an article listed on the reading list, try:

1. removing any punctuation from the title, such as single quotation marks, question marks and colons, and/or

2. putting quotation marks around the title, for example “Why the banking system should be regulated”.

To access the majority of resources via the Online library you will either need to use your University of London Student Portal login details, or you will be required to register and use an Athens login: http://tinyurl.com/ollathens

Examination advice

Important: the information and advice given in the following section are based on the examination structure used at the time this subject guide was written. Please note that subject guides may be used for several years. Because of this, we strongly advise you to check both the current Regulations for relevant information about the examination, and the current Examiners’ commentaries, where you should be advised of any forthcoming changes. You should also carefully check the rubric/instructions on the paper you actually sit and follow those instructions.

The examination is by a two-hour, unseen, written paper. No books may be taken into the examination, but you will be provided with extracts of statistical tables (as reproduced in this subject guide). A calculator may be used when answering questions on this paper, see below, and it must comply in all respects with the specification given in the General Regulations.

The examination comprises two sections, each containing three compulsory questions. Section A covers the mathematics part of the course counting for 50% of the marks, and Section B covers the statistics part of the course for the remaining 50% of the marks. You are required to pass both Sections A and B to pass the examination.

In each section, the first question contains four short questions worth 5 marks each, followed by two longer questions worth 15 marks each. Since the examination will seek to assess a broad cross-section of the syllabus, we strongly advise you to study the whole syllabus. A sample examination paper is provided at the end of this subject guide along with a commentary providing extensive advice on how to answer each question.

Remember, it is important to check the VLE for:

Up-to-date information on examination and assessment arrangements for this course.

Where available, past examination papers and Examiners’ commentaries for the course which give advice on how each question might best be answered.

Calculators

You will need to provide yourself with a basic calculator. It should not be programmable, because such machines are not allowed in the examination by the University. The most important thing is that you should accustom yourself to using your chosen calculator and feel comfortable with it. Your calculator must comply in all respects with the specification given in the General Regulations.

Part 1: Mathematics

Introduction to Mathematics

Syllabus

This half of the course introduces some of the basic ideas and methods of Mathematics with an emphasis on their application. The Mathematics part of this course has the following syllabus.

Arithmetic and algebra: A review of arithmetic (including the use of fractions and decimals) and the manipulation of algebraic expressions (including the use of brackets and the power laws). Solving linear equations and the relationship between linear expressions and straight lines (including the solution of simultaneous linear equations). Solving quadratic equations and the relationship between quadratic expressions and parabolae.

Functions: An introduction to functions. Some common functions (including polynomials, exponentials, logarithms and trigonometric functions). The existence of inverse functions and how to find them. The laws of logarithms and their uses.

Calculus: The meaning of the derivative and how to find it (including the product, quotient and chain rules). Using derivatives to find approximations and solve simple optimisation problems with economic applications. Curve sketching. Integration of simple functions and using integrals to find areas.

Financial mathematics: Compound interest over different compounding intervals. Arithmetic and geometric sequences. The sum of arithmetic and geometric series. Investment schemes and some ways of assessing the value of an investment.

Aims of the course

The aims of the Mathematics part of this course are to provide:

a grounding in arithmetic and algebra;

an overview of functions and the fundamentals of calculus;

an introduction to financial mathematics.

Throughout, the treatment is at an elementary mathematical level but, as you progress through this part of the course, you should develop some quite sophisticated mathematical skills.

Learning outcomes for the course (Mathematics)

At the end of the Mathematics part of the course, you should be able to:

manipulate algebraic expressions;

graph, differentiate and integrate simple functions;

calculate basic quantities in financial mathematics.

Textbook

As previously mentioned in the main introduction, this subject guide has been designed to act as your principal resource. The textbook

+ Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246]

may be useful as ‘background reading’ but it is not essential. However, you might benefit from reading parts of it if you find any of the material difficult to follow at first.

Unit 1: Review I

A review of some basic mathematics

Overview

In this unit we revise some material on arithmetic and algebra which you should have encountered before. Starting with arithmetic, this will involve revising the basic mathematical operations and how they can be combined with and without the use of brackets, how we can manipulate fractions and the use of powers. We then look at some basic algebra and see how to use and manipulate algebraic expressions.

Aims

The aims of this unit are as follows.

To revise the basics of arithmetic, including the use of fractions and powers.

To revise the most basic ideas behind algebra.

Specific learning outcomes can be found near the end of this unit.

1.1 Arithmetic

In this section we revise some material which could be called ‘arithmetic’. The idea behind this revision is to refresh our memories about how things like brackets, fractions and powers work so that our revision of ‘algebra’ in the next section will, hopefully, be easier.

1.1.1 Basic arithmetic

In mathematics we use four basic mathematical operations:

addition denoted by ‘+’ gives us ‘sums’, e.g. 6 + 3 = 9;

subtraction denoted by ‘−’ gives us ‘differences’, e.g. 6 − 3 = 3;

multiplication denoted by ‘×’ or ‘·’ gives us ‘products’, e.g. 6 × 3 = 18 or 6 · 3 = 18;

division denoted by ‘÷’ or a ‘horizontal line’ gives us ‘quotients’, e.g. 6 ÷ 2 = 3 or $\frac{6}{2} = 3$.

In particular, notice that there are two common notations for multiplication and division. For multiplication, the reason for this is that a handwritten ‘×’ can be confused with a handwritten ‘x’ whereas, for division, the reason is that writing expressions that involve division (i.e. ‘÷’) as fractions enables us to manipulate them more easily using the laws of fractions.

Combinations of operations

Often, different mathematical operations will occur in the same expression. For example, we might be asked to work out the values of the expressions

1. 22 − 7 + 12 − 26 + 1,

2. 125 ÷ 25 × 2 × 3 ÷ 15,

3. 22 − 20 × 3 ÷ 4 − 5.

In such cases, we have the following rules.

1. If only addition and subtraction are involved: We work from left to right to get

$$\underbrace{22 - 7} + 12 - 26 + 1 = \underbrace{15 + 12} - 26 + 1 = \underbrace{27 - 26} + 1 = \underbrace{1 + 1} = 2.$$

2. If only multiplication and division are involved: We work from left to right to get

$$\underbrace{125 \div 25} \times 2 \times 3 \div 15 = \underbrace{5 \times 2} \times 3 \div 15 = \underbrace{10 \times 3} \div 15 = \underbrace{30 \div 15} = 2.$$

3. When addition/subtraction and multiplication/division are involved: We work out the multiplications and divisions first (working left to right as necessary) and then we do the additions and subtractions (working left to right as necessary) to get

$$22 - \underbrace{20 \times 3} \div 4 - 5 = 22 - \underbrace{60 \div 4} - 5 = \underbrace{22 - 15} - 5 = \underbrace{7 - 5} = 2.$$
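The three rules can also be checked on a computer. The following is a minimal sketch in Python (our choice of language for illustration; it is not part of the course), which happens to follow the same order of operations. The variable names are ours, and note that Python's `/` always returns a decimal value, so two of the answers are shown as 2.0 rather than 2.

```python
# Rule 1: only addition and subtraction, worked from left to right.
first = 22 - 7 + 12 - 26 + 1

# Rule 2: only multiplication and division, worked from left to right.
second = 125 / 25 * 2 * 3 / 15

# Rule 3: multiplications and divisions first, then additions and subtractions.
third = 22 - 20 * 3 / 4 - 5

print(first, second, third)  # prints: 2 2.0 2.0
```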

Brackets I: Evaluating expressions that involve brackets

If an expression involves brackets, then the operations within the brackets must be performed first. As such, brackets can be used to change the order in which operations are performed. For example, we might be asked to work out the values of the expressions

1. 9 − (4 + 3) as opposed to 9 − 4 + 3,

2. 6 ÷ (2 × 3) as opposed to 6 ÷ 2 × 3,

3. (12 × 3 − 8) × 2 as opposed to 12 × 3 − 8 × 2.

In such cases, we work out the expression in brackets first, i.e. we get

1. working out the expression in brackets first we get

$$9 - (\underbrace{4 + 3}) = 9 - 7 = 2,$$

as opposed to

$$\underbrace{9 - 4} + 3 = 5 + 3 = 8,$$

where we work from left to right.

2. working out the expression in brackets first we get

$$6 \div (\underbrace{2 \times 3}) = 6 \div 6 = 1,$$

as opposed to

$$\underbrace{6 \div 2} \times 3 = 3 \times 3 = 9,$$

where we work from left to right.

3. working out the expression in brackets first, proceeding according to the rules above as necessary, we have

$$(\underbrace{12 \times 3} - 8) \times 2 = (\underbrace{36 - 8}) \times 2 = 28 \times 2 = 56,$$

as opposed to

$$\underbrace{12 \times 3} - \underbrace{8 \times 2} = 36 - 16 = 20,$$

where we multiply first and then subtract.

What if we have two or more sets of brackets? Well, if they are not ‘nested’, for example if we have

$$(12 \times 3 - 8) \times (24 - 14),$$

then we need to work out what is in each of the brackets first, proceeding according to the rules above, i.e.

$$(\underbrace{12 \times 3} - 8) \times (\underbrace{24 - 14}) = (\underbrace{36 - 8}) \times 10 = 28 \times 10 = 280.$$

And, if the brackets are ‘nested’, for example

$$6 + (9 - (4 + 3)),$$

then we start with the innermost set of brackets and work ‘outwards’, i.e.

$$6 + (9 - (\underbrace{4 + 3})) = 6 + (\underbrace{9 - 7}) = 6 + 2 = 8.$$

These rules allow you to work out the values of simple mathematical expressions using brackets. In a moment we shall see another way of dealing with brackets which will be more useful to us in this course.
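As an illustration only, the bracket examples above can be reproduced in Python, which treats round brackets in the same way. This is a sketch under the assumption that you have Python to hand; the variable names are ours.

```python
# Brackets change the order in which operations are performed.
without_brackets = 9 - 4 + 3      # left to right: 5 + 3
with_brackets = 9 - (4 + 3)       # bracket first: 9 - 7

# Two sets of brackets that are not nested: evaluate each bracket first.
two_brackets = (12 * 3 - 8) * (24 - 14)

# Nested brackets: start with the innermost bracket and work outwards.
nested = 6 + (9 - (4 + 3))

print(without_brackets, with_brackets, two_brackets, nested)  # prints: 8 2 280 8
```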

Negative numbers

Consider the following three expressions and their values.

1. 6− 3 = +3,

2. 6− 6 = 0,

3. 6− 9 = −3.

In this case, we can see that subtracting larger and larger numbers from six gives us a positive answer, zero and a negative answer respectively. For simplicity, we usually omit the ‘+’ sign and write ‘+3’, say, as 3.

When we have expressions involving negative numbers, we have the following handy rules.

1. Adding a negative number: This has the same effect as subtracting the corresponding positive number, e.g.

5 + (−3) = 5 − (+3) = 5 − 3 = 2,

and

−5 + (−3) = −5 − (+3) = −5 − 3 = −8.

2. Subtracting a negative number: This has the same effect as adding the corresponding positive number, e.g.

5 − (−3) = 5 + (+3) = 5 + 3 = 8,

and

−5 − (−3) = −5 + (+3) = −5 + 3 = −2.

3. Multiplying a positive number by a negative number: This gives us a negative number, e.g.

(+5) × (−3) = −(5 × 3) = −15,

and

(−5) × (+3) = −(5 × 3) = −15.

This is normally remembered as ‘positive times negative is negative’.

4. Multiplying a negative number by a negative number: This gives us a positive number, e.g.

(−5) × (−3) = +(5 × 3) = +15 = 15.

This is normally remembered as ‘negative times negative is positive’.

5. Dividing a positive number by a negative number (or vice versa): This gives us a negative number, e.g.

(+6) ÷ (−3) = −(6 ÷ 3) = −2,

and

(−6) ÷ (+3) = −(6 ÷ 3) = −2.

This is normally remembered as ‘positive divided by negative is negative’ (or vice versa).

6. Dividing a negative number by a negative number: This gives us a positive number, e.g.

(−6) ÷ (−3) = +(6 ÷ 3) = +2 = 2.

This is normally remembered as ‘negative divided by negative is positive’.

Indeed, notice the similarity between (3) and (5) which can be remembered as ‘multiplying (or dividing) a positive and a negative yields a negative’ and (4) and (6) which can be remembered as ‘multiplying (or dividing) a negative and a negative yields a positive’.
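The six rules can be spot-checked with a short Python sketch. This is an illustration rather than part of the course; the dictionary name `results` and its labels are our own, and Python's `/` returns a decimal value, so the quotients appear as -2.0 and 2.0.

```python
# Each label is an expression from the rules above; each value is what Python computes.
results = {
    "5 + (-3)": 5 + (-3),
    "-5 + (-3)": -5 + (-3),
    "5 - (-3)": 5 - (-3),
    "-5 - (-3)": -5 - (-3),
    "(+5) x (-3)": 5 * (-3),
    "(-5) x (-3)": (-5) * (-3),
    "(+6) / (-3)": 6 / (-3),
    "(-6) / (-3)": (-6) / (-3),
}

for expression, value in results.items():
    print(f"{expression} = {value}")
```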

Brackets II: Removing brackets from expressions

A more useful way of thinking about brackets involves being able to ‘remove’ the brackets from an expression. For example, consider the expression

$$3 + 2 \times (9 - 4).$$

Using the rules above we could work this out by thinking of it as

$$3 + 2 \times (\underbrace{9 - 4}) = 3 + \underbrace{2 \times 5} = 3 + 10 = 13.$$

Alternatively, we can ‘remove’ the brackets by thinking of the ‘2’ in ‘2 × (9 − 4)’ as multiplying everything in the bracket, i.e.

$$2 \times (9 - 4) = (2 \times 9) - (2 \times 4).$$

Using this method we get

$$3 + \underbrace{2 \times (9 - 4)} = 3 + ((\underbrace{2 \times 9}) - (\underbrace{2 \times 4})) = 3 + (\underbrace{18 - 8}) = 3 + 10 = 13,$$

which is the same answer as before.

Activity 1.1 Show that if we worked out 3 + (9 − 4) × 2, we would also get 13.

What if we had to work out 3 − (9 − 4)? We adopt the convention that a minus sign in front of a bracket is the same as adding something that has been multiplied by −1. Using this, and what we saw above, gives us

$$3 - (9 - 4) = 3 + (-1) \times (9 - 4) = 3 + ((-1 \times 9) - (-1 \times 4)) = 3 + (-9 - (-4)) = 3 + (-9 + 4) = 3 + (-5) = -2.$$

Of course, this is what we should expect as $3 - (\underbrace{9 - 4}) = 3 - 5 = -2$.
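To see that ‘removing’ the brackets does not change the answer, the two approaches can be compared side by side. The following Python lines are a small sketch with variable names of our own choosing.

```python
# Evaluating the bracket first versus multiplying everything in the bracket by 2.
bracket_first = 3 + 2 * (9 - 4)
bracket_removed = 3 + (2 * 9 - 2 * 4)

# A minus sign in front of a bracket multiplies everything in the bracket by -1.
minus_bracket_first = 3 - (9 - 4)
minus_bracket_removed = 3 + (-9 + 4)

print(bracket_first, bracket_removed)              # prints: 13 13
print(minus_bracket_first, minus_bracket_removed)  # prints: -2 -2
```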

Absolute values

The magnitude (or absolute value) of a number is found by ignoring the minus sign (if there is one). For example, the magnitude of 6, written |6|, is 6 and the magnitude of −6, written |−6|, is also 6, i.e. we have

|6| = 6 and |−6| = 6.

In a way, the magnitude operation acts like a bracket as we need to evaluate the magnitude of the number inside it before we use it in calculations, e.g.

4 − |2 − 3| = 4 − 1 = 3 as |2 − 3| = |−1| = 1, and

|4 − 2| − 3 = 2 − 3 = −1 as |4 − 2| = |2| = 2.
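In Python, the built-in function `abs` plays the role of the magnitude bars, so the two calculations above can be sketched as follows (the variable names are ours).

```python
# abs(...) is evaluated first, just like a bracket.
first = 4 - abs(2 - 3)   # |2 - 3| = |-1| = 1, so 4 - 1
second = abs(4 - 2) - 3  # |4 - 2| = |2| = 2, so 2 - 3

print(abs(6), abs(-6))  # prints: 6 6
print(first, second)    # prints: 3 -1
```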

Inequalities

We use the symbols ‘<’ and ‘>’ to show that one number is ‘less than’ or ‘greater than’ another number respectively. So, for example, 2 < 3 and 5 > 1. Zero is less than any positive number and greater than any negative number, e.g. 0 < 5 and 0 > −5. As such, any negative number is less than any positive number, e.g. −3 < 2. Negative numbers are larger when they have smaller magnitudes (i.e. when they are closer to zero), e.g. −3 < −2 and −1 > −5. As such, we can say that smaller negative numbers (like −100 compared to −1) have larger absolute values (like 100 compared to 1).
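One way to picture these statements is to put a few numbers in increasing order and look at their magnitudes. The list below is made up purely for illustration, and the Python sketch assumes nothing beyond the built-in `sorted` and `abs` functions.

```python
# An arbitrary list of positive and negative numbers (chosen for illustration).
numbers = [-1, 2, -100, 0, -3, 5]

# Sorting puts the smallest number first; negatives come before zero and positives.
ordered = sorted(numbers)

# Among the negatives, the smaller the number, the larger its absolute value.
magnitudes = [abs(n) for n in ordered]

print(ordered)     # prints: [-100, -3, -1, 0, 2, 5]
print(magnitudes)  # prints: [100, 3, 1, 0, 2, 5]
```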

1.1.2 Fractions

A fraction such as $\frac{3}{2}$ is, using our ‘horizontal line’ notation for division, the same as dividing the number above the line (i.e. 3) by the number below the line (i.e. 2). We call the number above the line the numerator and the number below the line the denominator. If we have two fractions, say

$$\frac{3}{5} \quad \text{and} \quad \frac{4}{2},$$

the number we get by multiplying their denominators together is called the common denominator of these fractions, and this will be 5 × 2 = 10 in this case.

Manipulating fractions

Sometimes we want to manipulate fractions in order to simplify them or to put them in a form where their denominator is the common denominator. The two basic procedures we use to do these two manipulations are as follows.

To simplify a fraction we want to write it in lowest terms,∗ e.g. $\frac{6}{10}$ can be written as

$$\frac{6}{10} = \frac{2 \times 3}{2 \times 5} = \frac{3}{5},$$

by dividing through on top and bottom by the common factor of 2.

Conversely, to write a fraction so that its denominator is a common denominator, e.g. to write $\frac{3}{5}$ so that its denominator is, as above, the common denominator of 10, we note that it can be written as

$$\frac{3}{5} = \frac{2 \times 3}{2 \times 5} = \frac{6}{10},$$

by multiplying top and bottom by 2.

This second technique is especially useful when we add and subtract fractions as we shall now see.

∗That is, so that the numerator and denominator have no common divisors.

Adding and subtracting fractions

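To add or subtract fractions, we first put them over a common denominator, e.g.

$$\frac{4}{5} + \frac{2}{3} = \frac{4 \times 3}{5 \times 3} + \frac{2 \times 5}{3 \times 5} = \frac{12}{15} + \frac{10}{15} = \frac{12 + 10}{15} = \frac{22}{15},$$

and

$$\frac{4}{5} - \frac{2}{3} = \frac{4 \times 3}{5 \times 3} - \frac{2 \times 5}{3 \times 5} = \frac{12}{15} - \frac{10}{15} = \frac{12 - 10}{15} = \frac{2}{15}.$$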
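If you would like to check fraction arithmetic of this kind, Python's standard `fractions` module works with exact fractions and always stores them in lowest terms. The sketch below is an illustration only, with variable names of our own.

```python
from fractions import Fraction

# Fraction(numerator, denominator) is stored in lowest terms automatically.
lowest_terms = Fraction(6, 10)

# Adding and subtracting: Python finds a common denominator for us.
total = Fraction(4, 5) + Fraction(2, 3)
difference = Fraction(4, 5) - Fraction(2, 3)

print(lowest_terms, total, difference)  # prints: 3/5 22/15 2/15
```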

Multiplying fractions

To multiply fractions, we just multiply the numerators and denominators together, e.g.

$$\frac{4}{5} \times \frac{2}{3} = \frac{4 \times 2}{5 \times 3} = \frac{8}{15}.$$

Reciprocals

The reciprocal of a fraction is what we get when we swap the numerator and denominator around, e.g. the reciprocal of $\frac{3}{5}$ is $\frac{5}{3}$. The reciprocal is useful when we come to divide fractions as we shall now see.

Dividing fractions

To divide fractions, we multiply the first fraction by the reciprocal of the second, e.g. if we want to evaluate

$$\frac{4}{5} \div \frac{2}{3},$$

the rule tells us that this is the same as multiplying $\frac{4}{5}$ by the reciprocal of $\frac{2}{3}$, which is $\frac{3}{2}$, and so we have

$$\frac{4}{5} \div \frac{2}{3} = \frac{4}{5} \times \frac{3}{2}.$$

This can now be worked out using the multiplication rule, i.e.

$$\frac{4}{5} \div \frac{2}{3} = \frac{4}{5} \times \frac{3}{2} = \frac{4 \times 3}{5 \times 2} = \frac{12}{10}.$$

Of course, we can simplify this by noting that the numerator and denominator have a common factor of 2, i.e. the answer is $\frac{6}{5}$ in lowest terms.

It is, perhaps, also interesting to note that the reciprocal of a fraction is just one divided by that fraction, e.g. as

$$1 \div \frac{3}{2} = 1 \times \frac{2}{3} = \frac{2}{3},$$

we can see that the reciprocal of $\frac{3}{2}$, i.e. $\frac{2}{3}$, is just one divided by $\frac{3}{2}$.
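The multiplication, reciprocal and division rules can be checked in the same way. Again this is only a sketch using Python's standard `fractions` module, and the variable names are our own.

```python
from fractions import Fraction

# Multiplying: numerators together and denominators together.
product = Fraction(4, 5) * Fraction(2, 3)

# The reciprocal is one divided by the fraction.
reciprocal = 1 / Fraction(3, 2)

# Dividing is the same as multiplying by the reciprocal of the second fraction.
quotient = Fraction(4, 5) / Fraction(2, 3)
same_quotient = Fraction(4, 5) * Fraction(3, 2)

print(product, reciprocal, quotient, same_quotient)  # prints: 8/15 2/3 6/5 6/5
```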

Improper and proper fractions

An improper fraction is one where the numerator is greater in magnitude than the denominator and a proper fraction is one where the numerator is less in magnitude than the denominator, e.g. $\frac{22}{5}$ is an improper fraction and $\frac{4}{5}$ is a proper fraction.

Sometimes it is convenient to be able to write an improper fraction as a whole number plus a proper fraction, e.g. we can write

$$\frac{22}{5} = \frac{20 + 2}{5} = \frac{20}{5} + \frac{2}{5} = 4 + \frac{2}{5},$$

as 5 goes into 20 four times. This can be written as $4\frac{2}{5}$ and we read it as ‘four and two fifths’ to indicate that $\frac{22}{5}$ is the same as four ‘wholes’ and two fifths of a ‘whole’.

However, in this course, we will usually not use this way of writing fractions as, using our convention of writing $4 \times \frac{2}{5}$ as $4 \cdot \frac{2}{5}$, we can easily get confused between ‘four and two fifths’ and ‘four times two fifths’. As such, when the need arises, we will normally stick to improper fractions.
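The split into ‘wholes’ and a proper fraction is just a division with remainder. As a hedged illustration, Python's built-in `divmod` gives both parts at once; the variable names below are ours.

```python
from fractions import Fraction

numerator, denominator = 22, 5

# divmod returns how many times 5 goes into 22 and what is left over: 22 = 4 * 5 + 2.
whole, remainder = divmod(numerator, denominator)

# Putting the two parts back together recovers the improper fraction.
recombined = whole + Fraction(remainder, denominator)

print(whole, Fraction(remainder, denominator))  # prints: 4 2/5
print(recombined)                               # prints: 22/5
```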

Decimals

Often, you will see fractions written as decimals and vice versa, e.g. the fraction $\frac{1}{4}$ is exactly the same as the decimal 0.25. But, be aware that some fractions do not have a nice finite decimal expansion, e.g.

$$\frac{1}{3} \text{ is the decimal } 0.333333\ldots,$$

i.e. there is an infinite number of threes after the decimal point. The problem with this is that, in such cases, using decimals instead of fractions can lead to rounding errors, e.g.

$$3 \times \frac{1}{3} = 1,$$

exactly. But, just keeping the first four threes of the decimal expansion for $\frac{1}{3}$, i.e. rounding $\frac{1}{3}$ to four decimal places, written 4dp, we have 0.3333 and this gives us

$$3 \times \frac{1}{3} \simeq 3 \times 0.3333 = 0.9999,$$

where ‘$\simeq$’ means ‘approximately equal to’. That is, using the decimal rounded to four decimal places gives us an answer which is not exactly one, i.e. there is a rounding error in our calculation, and this is why we generally use fractions instead of decimals.
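The rounding error can be seen directly by doing the calculation both ways. The sketch below uses Python's `fractions` module for the exact version; the variable names are ours, and the computer's own decimal arithmetic adds a further tiny error, which is why the last line rounds the result before printing it.

```python
from fractions import Fraction

# Exact arithmetic with fractions: three thirds make exactly one.
exact = 3 * Fraction(1, 3)

# Rounding one third to 4dp first gives a slightly different answer.
rounded = 3 * 0.3333

# The rounding error is the gap between the two answers, roughly 0.0001.
error = 1 - rounded

print(exact)            # prints: 1
print(round(error, 4))  # prints: 0.0001
```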

Percentages

The percentage sign, i.e. ‘%’, means ‘divide by 100’, e.g. 20% is the same as $\frac{20}{100}$ as a fraction, or 0.2 as a decimal. As such, 20% of 150 is

$$150 \times \frac{20}{100} = \frac{3{,}000}{100} = 30.$$

Knowing this, we can see what it means to increase 150 by 20% or decrease 150 by 20%, i.e.

to increase 150 by 20%, we get

$$150 + 30 = 180,$$

as 30 is 20% of 150. Notice that an increase by 20% can also be seen as 120% of the original, i.e.

$$150 \times \frac{120}{100} = \frac{18{,}000}{100} = 180,$$

as before.

to decrease 150 by 20%, we get

$$150 - 30 = 120,$$

as 30 is 20% of 150. Notice that a decrease by 20% can also be seen as 80% of the original, i.e.

$$150 \times \frac{80}{100} = \frac{12{,}000}{100} = 120,$$

as before.

These ideas will be particularly useful when we come to consider compound interest in Unit 9.
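These percentage calculations follow one pattern: multiply by the percentage and divide by 100. The short Python sketch below illustrates this; `percent_of` is a helper name we have invented for the purpose, not a standard function.

```python
def percent_of(amount, percent):
    """Return the given percentage of an amount (hypothetical helper for illustration)."""
    return amount * percent / 100

twenty_percent = percent_of(150, 20)   # 20% of 150
increased = 150 + twenty_percent       # increase 150 by 20%
decreased = 150 - twenty_percent       # decrease 150 by 20%

# The same answers, seen as 120% and 80% of the original.
print(twenty_percent)                        # prints: 30.0
print(increased, percent_of(150, 120))       # prints: 180.0 180.0
print(decreased, percent_of(150, 80))        # prints: 120.0 120.0
```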

1.1.3 Powers

Another operation that you will have come across before is the idea of ‘raising a number to a certain power’. The number which represents the power can also be called the exponent and the number which is being raised to that power is called the base. For example, we could have $4^2$, $4^{-2}$ or $4^{\frac{1}{2}}$ and, in each case, ‘4’ is the base and the other number, i.e. ‘2’, ‘−2’ or ‘$\frac{1}{2}$’ respectively, is the exponent or power. We often refer to expressions of this form as ‘powers’.

Positive integer powers

The simplest powers to work out are those where the power is a positive integer such as 1, 2, 3, . . . . In such cases, the power just means ‘multiply together that many copies of the base’, e.g.

$$4^1 = 4, \quad 4^2 = 4 \times 4 = 16, \quad 4^3 = 4 \times 4 \times 4 = 64, \quad \ldots.$$

One application of this is standard index form (or scientific notation) where we are able to write large numbers in terms of powers of 10, e.g. we can write three million as

$$3{,}000{,}000 = 3 \times 1{,}000{,}000 = 3 \times 10^6,$$

as 1,000,000 is the same as $10^6$.

Powers and other operations

In terms of combinations of operations, evaluating the effect of a power comes before multiplying and dividing, e.g. we can see that

$$2 \times \underbrace{4^2} + 3 = \underbrace{2 \times 16} + 3 = \underbrace{32 + 3} = 35.$$

Of course, as before, we can also use brackets to change the order in which we do the operations, e.g.

$$(\underbrace{2 \times 4})^2 + 3 = \underbrace{8^2} + 3 = \underbrace{64 + 3} = 67,$$

and

$$2 \times (\underbrace{4^2} + 3) = 2 \times (\underbrace{16 + 3}) = \underbrace{2 \times 19} = 38.$$

In particular, when writing out expressions involving brackets, take care to distinguish between, e.g. $2^3 + 5$ and $2^{3+5}$, as the former is 13 whilst the latter is 256!

Also, similar to what we saw earlier, it is possible to remove the brackets from expressions involving powers by applying the power to all of the numbers in the bracket, provided that the bracket only contains a product or a quotient. For example,

$$(2 \times 3)^4 = 2^4 \times 3^4 = 16 \times 81 = 1{,}296 \quad \text{and} \quad \left(\frac{2}{3}\right)^4 = \frac{2^4}{3^4} = \frac{16}{81}.$$
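Python writes powers with the `**` operator and, like the rules above, works out powers before multiplications and divisions. The following lines are a small sketch (variable names ours) of the examples in this section.

```python
from fractions import Fraction

# Powers are evaluated before multiplication, which comes before addition.
no_brackets = 2 * 4 ** 2 + 3         # 2 x 16 + 3
bracket_base = (2 * 4) ** 2 + 3      # 8 squared, plus 3
bracket_sum = 2 * (4 ** 2 + 3)       # 2 x (16 + 3)

# Take care over what is in the exponent.
power_then_add = 2 ** 3 + 5          # 8 + 5
add_in_exponent = 2 ** (3 + 5)       # 2 to the power 8

# Applying a power to each number in a product or a quotient.
product_power = (2 * 3) ** 4
quotient_power = Fraction(2, 3) ** 4

print(no_brackets, bracket_base, bracket_sum)  # prints: 35 67 38
print(power_then_add, add_in_exponent)         # prints: 13 256
print(product_power, quotient_power)           # prints: 1296 16/81
```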

The power laws

If we have the same base, then the power laws can allow us to simplify expressions that involve multiplying powers, dividing powers and raising to powers. These laws are as follows.

Multiplying powers: If we multiply two powers, we add the powers. For example, if we have $2^4 \times 2^3$, we can write,

$$2^4 \times 2^3 = 2^{4+3},$$

as $2^4 \times 2^3 = 16 \times 8 = 128$ and $2^{4+3} = 2^7 = 128$.

Dividing powers: If we divide two powers, we subtract the power in the denominator from the power in the numerator. For example, if we have $2^4/2^3$, we can write,

$$\frac{2^4}{2^3} = 2^{4-3},$$

as $\frac{2^4}{2^3} = \frac{16}{8} = 2$ and $2^{4-3} = 2^1 = 2$.

Raising to powers: If we raise a power to another power, we multiply the powers. For example, if we have $(2^4)^3$, we can write,

$$(2^4)^3 = 2^{4 \times 3},$$

as $(2^4)^3 = 16^3 = 4{,}096$ and $2^{4 \times 3} = 2^{12} = 4{,}096$.

Notice that, if the bases of the powers are not the same, then we cannot use the power laws. For example, to calculate

$3^4 \times 2^5$ we could use $3^4 \times 2^5 = 81 \times 32 = 2{,}592$, but we could not use the power law.

$\frac{3^4}{2^5}$ we could use $\frac{3^4}{2^5} = \frac{81}{32}$, but we could not use the power law.
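Each power law can be confirmed numerically by working out both sides separately. In the Python sketch below (names ours), `//` is exact whole-number division, which is safe here because 16 divides exactly by 8.

```python
# Multiplying powers with the same base: add the powers.
multiplied = 2 ** 4 * 2 ** 3
added_powers = 2 ** (4 + 3)

# Dividing powers with the same base: subtract the powers.
divided = 2 ** 4 // 2 ** 3
subtracted_powers = 2 ** (4 - 3)

# Raising a power to a power: multiply the powers.
raised = (2 ** 4) ** 3
multiplied_powers = 2 ** (4 * 3)

# Different bases: no power law applies, so we just work out each power.
different_bases = 3 ** 4 * 2 ** 5

print(multiplied, added_powers)       # prints: 128 128
print(divided, subtracted_powers)     # prints: 2 2
print(raised, multiplied_powers)      # prints: 4096 4096
print(different_bases)                # prints: 2592
```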

Negative integer powers

Negative integer powers, such as −1, −2, −3, . . ., mean ‘take the reciprocal of the base raised to the corresponding positive power’. For example,

$$4^{-1} = \frac{1}{4^1} = \frac{1}{4}, \quad 4^{-2} = \frac{1}{4^2} = \frac{1}{16}, \quad 4^{-3} = \frac{1}{4^3} = \frac{1}{64}, \quad \ldots.$$

In particular, note that a power of −1 is the same as the reciprocal, e.g. $4^{-1} = \frac{1}{4}$ which is the reciprocal of 4. Similarly, this means that

$$\left(\frac{3}{5}\right)^{-1} = \frac{5}{3},$$

which is the reciprocal of $\frac{3}{5}$.

Zero powers

We now observe that any non-zero number raised to the power zero is one. For example, as

$$4^1 \times 4^{-1} = 4^{1-1} = 4^0,$$

by the power law, and

$$4^1 \times 4^{-1} = 4 \times \frac{1}{4} = 1,$$

we can see that $4^0 = 1$.
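Negative and zero powers can be checked exactly by combining `**` with Python's `fractions` module, which keeps the answers as fractions rather than decimals. This is an illustrative sketch and the variable names are ours.

```python
from fractions import Fraction

# A negative power is the reciprocal of the corresponding positive power.
quarter = Fraction(4) ** -1
sixteenth = Fraction(4) ** -2

# A power of -1 swaps the numerator and denominator.
flipped = Fraction(3, 5) ** -1

# The argument for the zero power: 4^1 x 4^(-1) is both 4^0 and 1.
zero_power = Fraction(4) ** 0
product = Fraction(4) ** 1 * Fraction(4) ** -1

print(quarter, sixteenth, flipped)  # prints: 1/4 1/16 5/3
print(zero_power, product)          # prints: 1 1
```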

Fractional powers I: Square roots

A square root of a number, say 64, is a number which, when multiplied by itself, gives us 64. So, as

$$8 \times 8 = 64,$$

we can see that 8 is a square root of 64. Indeed, since a negative number times a negative number is positive, we can see that

$$(-8) \times (-8) = 64,$$

and so −8 is also a square root of 64. Thus, we can see that the square roots of 64 are 8 and −8. We often express this by saying ‘the square roots of 64 are ±8’ where the ‘±’ is read ‘plus or minus’. Thus, we can see, by repeating this argument, that every positive number has two square roots, one positive and one negative, and both of the same magnitude.

What about other numbers? Well, since 0 × 0 = 0, we can see that the square root of zero is zero and, moreover, zero is the only square root of zero. And, if we consider negative numbers, say −64, we can see that there are no square roots since there is no way of multiplying a number by itself to get −64.

We often denote the positive square root of a number, say 64, by ‘$\sqrt{64}$’ and so, from the above we can see that $\sqrt{64} = 8$ and $\sqrt{0} = 0$. Of course, as negative numbers have no square roots, something like $\sqrt{-64}$ does not exist.


Going back to our earlier example, as the square root of 64 is a number which, when multiplied by itself, gives us 64 we can see that

$\left(\sqrt{64}\right)^2 = \sqrt{64} \times \sqrt{64} = 64,$

and this is why the square root is so called: squaring the square root gives us the original number. Now, if we think of raising the number 64 to the power $\frac{1}{2}$, we can see that

$\left(64^{\frac{1}{2}}\right)^2 = 64^{\frac{1}{2} \times 2} = 64^1 = 64,$

using the power laws. And, comparing these two expressions, it is natural to think of $64^{\frac{1}{2}}$ as exactly the same thing as $\sqrt{64}$, i.e.

$64^{\frac{1}{2}} = \sqrt{64},$

and so we identify square roots with powers of $\frac{1}{2}$.
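A short Python sketch can illustrate the identification of square roots with the power one half. It uses only the standard `math` module; note that `math.sqrt` works with real numbers, so it raises an error for a negative input, which matches the statement that such roots do not exist here.

```python
import math

root = math.sqrt(64)          # the positive square root
print(root, 64 ** 0.5)        # 8.0 8.0
print(8 * 8, (-8) * (-8))     # 64 64

# A negative number has no (real) square root, so math.sqrt refuses it.
try:
    math.sqrt(-64)
except ValueError:
    print("the square root of -64 does not exist as a real number")
```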

Activity 1.2 Find the square roots of 4, 9, 16, 25, 36 and 49.

Fractional powers II: nth roots

More generally, if n is a positive integer greater than 2, we say that an nth root of a number, say 64, is a number which gives us 64 when raised to the power n. We often denote the nth root of a number, say 64, by $\sqrt[n]{64}$. For example,

the cube root of 64, denoted by $\sqrt[3]{64}$, is 4 as four cubed is 64, i.e.

$4^3 = 64$ and so $\sqrt[3]{64} = 4$.

Notice that 64 has no negative cube root since $(-4)^3 = -64$ and not 64; as such, 64 only has one cube root, i.e. 4. Repeating this argument, we can see that all positive numbers only have one cube root.

In terms of powers, as

$\left(\sqrt[3]{64}\right)^3 = 4^3 = 64$ and $\left(64^{\frac{1}{3}}\right)^3 = 64^{\frac{1}{3} \times 3} = 64^1 = 64,$

comparing these two expressions it is natural to think of $64^{\frac{1}{3}}$ as exactly the same thing as $\sqrt[3]{64}$, i.e.

$64^{\frac{1}{3}} = \sqrt[3]{64},$

and so we identify cube roots with powers of $\frac{1}{3}$.

the sixth root of 64, denoted by $\sqrt[6]{64}$, is 2 as two to the power six is 64, i.e.

$2^6 = 64$ and so $\sqrt[6]{64} = 2$.

Notice that 64 also has a negative sixth root since $(-2)^6 = 64$ and so 64 has two sixth roots, i.e. ±2. Repeating this argument, we can see that all positive numbers will have two sixth roots.


In terms of powers, as

$\left(\sqrt[6]{64}\right)^6 = 2^6 = 64$ and $\left(64^{\frac{1}{6}}\right)^6 = 64^{\frac{1}{6} \times 6} = 64^1 = 64,$

comparing these two expressions it is natural to think of $64^{\frac{1}{6}}$ as exactly the same thing as $\sqrt[6]{64}$, i.e.

$64^{\frac{1}{6}} = \sqrt[6]{64},$

and so we identify sixth roots with powers of $\frac{1}{6}$.

And, more generally, we can write the positive nth root of a number a, or $\sqrt[n]{a}$, as a to the power of $\frac{1}{n}$, i.e. $a^{\frac{1}{n}}$.
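To see how many roots a number has for odd and even n, one option is a brute-force search over the integers. The function `integer_nth_roots` below is a hypothetical helper written for this illustration, and it only finds whole-number roots.

```python
def integer_nth_roots(value, n):
    """Return every integer r with r**n == value (a brute-force search)."""
    bound = abs(value)
    return [r for r in range(-bound, bound + 1) if r ** n == value]

print(integer_nth_roots(64, 2))   # [-8, 8]
print(integer_nth_roots(64, 3))   # [4]
print(integer_nth_roots(64, 6))   # [-2, 2]
print(integer_nth_roots(-64, 2))  # []
```

The even powers give a pair of roots of opposite sign, the cube root of 64 is unique, and the search finds no square root of −64.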

Activity 1.3 Find the cube root of 27 and the fourth roots of 81.

Fractional powers III: powers of nth roots

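Other fractional powers can be evaluated using the rules above, e.g. to evaluate $8^{\frac{2}{3}}$ we can think of it as

$8^{\frac{2}{3}} = 8^{2 \times \frac{1}{3}} = (8^2)^{\frac{1}{3}} = 64^{\frac{1}{3}} = 4,$

or as

$8^{\frac{2}{3}} = 8^{\frac{1}{3} \times 2} = (8^{\frac{1}{3}})^2 = 2^2 = 4,$

using the power laws. Other examples involving fractional roots would be

$(3^{\frac{1}{2}})^4 = 3^{\frac{1}{2} \times 4} = 3^2 = 9,$ and $\frac{4^{\frac{2}{3}}}{4^{\frac{1}{6}}} = 4^{\frac{2}{3} - \frac{1}{6}} = 4^{\frac{4 - 1}{6}} = 4^{\frac{3}{6}} = 4^{\frac{1}{2}} = 2,$

using the power laws.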
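These calculations can be checked numerically. The sketch below does the exponent arithmetic exactly with `fractions.Fraction` and then compares floating-point powers with a tolerance, since Python evaluates fractional powers only approximately.

```python
from fractions import Fraction
import math

# Exponent arithmetic from the quotient example: 2/3 - 1/6 = 1/2.
exponent = Fraction(2, 3) - Fraction(1, 6)
print(exponent)  # 1/2

# Floating-point powers are approximate, so compare with a tolerance.
two_ways = ((8 ** 2) ** (1 / 3), (8 ** (1 / 3)) ** 2)
print(all(math.isclose(v, 4.0) for v in two_ways))  # True
print(math.isclose(4 ** float(exponent), 2.0))      # True
```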

Fractional powers IV: Warnings

When using the above ideas you should also bear the following in mind.

When using the square root and nth root sign, i.e. $\sqrt{\ }$ and $\sqrt[n]{\ }$, always be clear about what parts of the expression are included in the root. For example,

$\sqrt{4 \times 16}$ and $\sqrt{4} \times 16,$

are different expressions (the former is equal to 8 whilst the latter is equal to 32). Generally speaking, you can make your expressions clear by extending the ‘tail’ of the root sign or using brackets.

Be careful when working with powers of negative numbers since even roots of negative numbers do not exist. For example,

$((-2)^2)^{\frac{1}{2}} = 4^{\frac{1}{2}} = 2,$

is fine, but $(-2)^{\frac{1}{2}}$ does not exist and, as such, nor does $\left((-2)^{\frac{1}{2}}\right)^2$.
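In code the scope of a root is always explicit, because the brackets of the function call play the part of the 'tail' of the root sign. A minimal sketch of the first warning, with illustrative variable names:

```python
import math

inside = math.sqrt(4 * 16)    # the root covers the whole product
outside = math.sqrt(4) * 16   # the root covers only the 4
print(inside, outside)        # 8.0 32.0
```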


Recap on combinations of operations

To summarise everything we have seen above about this, operations are done in ‘BEDMAS’ order, i.e.

Brackets, Exponents, Division, Multiplication, Addition, Subtraction.

Otherwise, we work from left to right.
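Python follows the same conventions for brackets, powers (`**`), multiplication, division, addition and subtraction, so it can be used to check hand calculations. The expressions below are illustrative and are not taken from the guide.

```python
# The exponent is applied first, then the multiplication, then the addition.
print(2 + 3 * 4 ** 2)      # 2 + 3 * 16 = 50
# Brackets override the default order.
print((2 + 3) * 4 ** 2)    # 5 * 16 = 80
# Operations of the same rank are worked from left to right.
print(10 - 4 - 3)          # (10 - 4) - 3 = 3
print(24 / 4 / 2)          # (24 / 4) / 2 = 3.0
```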

1.2 Algebra

We use algebra to express and manipulate information about unknown quantities. These unknown quantities are called variables and these are normally represented by letters such as x, y and z. One way to think of this is that numbers are constants, i.e. they always have the same value, whereas variables can take different values depending on the context.

1.2.1 Algebraic expressions

An algebraic expression is a sequence of numbers, variables and operations, e.g. 4x + 3y − 7. In expressions such as this, 4x means 4 × x, i.e. four lots of x. As such, we can see that, for any value of x, we have things like

4x + 3x = 7x,

as four lots of x plus three lots of x is seven lots of x. Note that all of the mathematical operations that we have seen so far can be used in algebraic expressions.

Attributing meaning to algebraic expressions

Often, we use mathematical expressions to represent the value of some quantity. For instance, we can consider the following examples.

1. If you have a job which pays £10 per hour and you work x hours, then your income is given by the algebraic expression £10x.

2. If a firm has a revenue of £x and costs of £y, then its profit is £(x − y).

3. If a firm prices a product at £x per unit and sells x units of this product, then the revenue is £$x^2$. If the costs are £x, then its profit is £$(x^2 - x)$.

As the above examples show, some algebraic expressions contain one variable, such as 4x + 3x, some contain two variables, such as 4x + 3y − 7, and some can contain one variable used several times, such as $x^2 - x$ where x is used twice (i.e. once in an x term and once in an $x^2$ term). Of course, the quantities represented may be more complicated than those given in these examples.


Example 1.1 Suppose that you heat your house with gas for d days per year and on each day you use m cubic metres of gas. This means that you use dm cubic metres of gas each year.

If gas costs £P per cubic metre, this means that the cost of heating your house for a year is £dmP.

Suppose that you must also pay a fixed amount of £81 per year to the gas company. This means that the cost of heating your house for a year is now £(dmP + 81).

Suppose that you pay your gas bill in twelve equal monthly instalments. This means that you must pay

£$\frac{dmP + 81}{12}$

every month.

Activity 1.4 What will the annual payment be if the gas company raises the price of gas by £p per cubic metre? What will the corresponding monthly repayments be?

Evaluating algebraic expressions

Given an algebraic expression, we are sometimes given specific values for each of the variables involved and asked to evaluate it, i.e. find a value for the whole algebraic expression given the values of the variables. So, for example, using our examples above we have the following.

1. With x = 5, you have a job which pays £10 per hour and you work 5 hours, then your income is given by £(10 × 5) = £50.

2. With x = 40 and y = 30, the firm has a revenue of £40 and costs of £30, and so its profit is £(40 − 30) = £10.

3. With x = 10, the firm prices the product at £10 per unit and sells 10 units, i.e. the revenue will be £$10^2$. The costs will be £10, and so its profit is £$(10^2 - 10)$ = £(100 − 10) = £90.

Indeed, we can also look at how this works in our more complicated example.

Example 1.2 Following on from Example 1.1, suppose that when heating your house, gas costs £0.12 per cubic metre and that you use 13 cubic metres of gas per day for 125 days. This means that we have to pay

£$\frac{13 \times 125 \times 0.12 + 81}{12}$ = £$\frac{195 + 81}{12}$ = £$\frac{276}{12}$ = £23

every month.
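The evaluation in Example 1.2 can be reproduced with a small function. This is only a sketch: the name `monthly_payment` and its parameter names are assumptions made for illustration, and exact fractions are used so that £0.12 is not rounded by floating-point arithmetic.

```python
from fractions import Fraction

def monthly_payment(d, m, price, fixed_charge=81):
    """Monthly cost in pounds: (d*m*P + fixed charge) / 12, as in Example 1.1."""
    annual = d * m * Fraction(price) + fixed_charge
    return annual / 12

# Example 1.2: 125 days, 13 cubic metres per day, £0.12 per cubic metre.
print(monthly_payment(125, 13, "0.12"))  # 23
```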


Activity 1.5 What is the cost of heating your house for a year?

What will the annual payment be if the gas company raises the price of gas by 8p per cubic metre? What will the corresponding monthly repayments be?

Simplifying algebraic expressions

As long as we take care to combine ‘like with like’, an algebraic expression can sometimes be simplified, i.e. it can be changed into a form that is easier to evaluate without altering what we will get from an evaluation. For example, we saw earlier that

4x + 3x = 7x,

and so we can write 4x + 3x as 7x, which is simpler. In particular, we can often simplify expressions by removing brackets from an expression and simplifying what remains, e.g. if we have an algebraic expression like 3(2x) we can think of this as ‘three lots of 2x’ which gives us 6x, i.e.

3(2x) = 6x.

However, if we have an algebraic expression like 3(x + 2), which we can think of as ‘three lots of x + 2’, we can remove the brackets by multiplying everything inside the brackets by 3, i.e.

3(x + 2) = 3x + 6,

whereas if we have an algebraic expression like −(2x − 1), we can think of the minus as telling us to multiply everything inside the brackets by −1, i.e.

−(2x − 1) = −2x + 1.

Indeed, we may be able to do some simplifying after we have multiplied out the brackets, e.g.

2(x + 3) + x = 2x + 6 + x = 3x + 6,

where, here, we have multiplied out the brackets and collected ‘like’ terms to get a simpler expression. Some other examples of simplifying algebraic expressions are:

4x− 3x = x,

4(2x)− x = 8x− x = 7x,

3(x+ y) = 3x+ 3y,

3(x+ 1) + 4(x− 1) = 3x+ 3 + 4x− 4 = 7x− 1, and

3(x+ 1)− 4(x− 1) = 3x+ 3− 4x+ 4 = −x+ 7.

Notice that none of these simplifications changes the outcome of any evaluation which we may want to perform, i.e. whatever we get if we evaluate the expression at the start we will also get if we evaluate the expression at the end. In this sense, the expressions may look different, but algebraically they are the same throughout.
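One way to test the claim that a simplification does not change any evaluation is to evaluate both forms at several values of x. The sketch below does this for some of the simplifications above; agreement at sample points supports the algebra but is not a proof of it.

```python
# Each pair holds (original expression, simplified expression) as functions of x.
pairs = [
    (lambda x: 4 * x + 3 * x, lambda x: 7 * x),
    (lambda x: 2 * (x + 3) + x, lambda x: 3 * x + 6),
    (lambda x: 3 * (x + 1) + 4 * (x - 1), lambda x: 7 * x - 1),
    (lambda x: 3 * (x + 1) - 4 * (x - 1), lambda x: -x + 7),
]

# Compare both forms at every integer from -10 to 10.
agree = all(before(x) == after(x) for before, after in pairs for x in range(-10, 11))
print(agree)  # True
```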


Multiplying out two pairs of brackets

Sometimes we will want to multiply out the brackets in more complicated expressions. For example, how would you remove the brackets from (x + 3)(y − 2)? We can think of this in two ways:

Multiplying out the first bracket, everything in the first bracket needs to be multiplied by the second bracket, i.e.

(x + 3)(y − 2) = x(y − 2) + 3(y − 2),

and then simplifying this as before we get

(x + 3)(y − 2) = x(y − 2) + 3(y − 2) = xy − 2x + 3y − 6.

Multiplying out the second bracket, everything in the second bracket needs to be multiplied by the first bracket, i.e.

(x + 3)(y − 2) = (x + 3)y + (x + 3)(−2),

and then simplifying this as before we get

(x + 3)(y − 2) = (x + 3)y + (x + 3)(−2) = xy + 3y − 2x − 6.

But, notice that these are the same expression, and so we can multiply out in either way as long as we make sure that every term in a bracket is multiplied by every term in the other bracket.
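The rule 'every term in one bracket times every term in the other' can be written as a double loop. In this sketch each bracket is a list of (coefficient, variables) pairs; that representation and the function name `expand` are assumptions chosen for illustration.

```python
from collections import Counter

def expand(first, second):
    """Multiply every term of one bracket by every term of the other.

    Each bracket is a list of (coefficient, variables) pairs, e.g. x + 3 is
    [(1, "x"), (3, "")]. Like terms are collected by their variable string.
    """
    result = Counter()
    for coef_a, vars_a in first:
        for coef_b, vars_b in second:
            result["".join(sorted(vars_a + vars_b))] += coef_a * coef_b
    return dict(result)

# (x + 3)(y - 2): the coefficients found are xy: 1, x: -2, y: 3, constant: -6.
print(expand([(1, "x"), (3, "")], [(1, "y"), (-2, "")]))
```

Swapping the two brackets gives the same collection of terms, which mirrors the observation that either way of multiplying out leads to the same expression.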

Activity 1.6 We can write $(x + 3)^2$ as (x + 3)(x + 3). Use this to remove the brackets from the expression $(x + 3)^2$. In a similar manner, remove the brackets from the expression $(2x + 3)^2$.

Factorising

Sometimes we can simplify expressions even further by putting brackets in, e.g. going back to an earlier example, we could write

2(x + 3) + x = 2x + 6 + x = 3x + 6 = 3(x + 2),

as 3(x + 2) = 3x + 6 if we multiply out the brackets. The process of putting brackets into an expression is called factorisation. For our current purposes, we just need to note that we can factorise when every term in our expression has a common factor, such as 3 in the example above. Some other examples, which can be verified by multiplying out the brackets, are:

2x− 6 = 2(x− 3),

−2x− 10 = −2(x+ 5), and

3xy − 12y = 3y(x− 4).

We will return to factorisation in Unit 3.


1.2.2 Equations, formulae and inequalities

So far, we have considered how to manipulate algebraic expressions and what they may be used to express. We now look at the ways in which a pair of algebraic expressions may be related to one another.

Equations

An equation is a mathematical statement which sets two algebraic expressions equal to one another. For example, a = b, $x^2 = 4$ and x + 3 = −2x + 4 are all equations.

A solution to an equation is a value for each variable in the equation which is such that, when we evaluate both expressions with these values substituted for the variables, the expressions are equal. For example, x = 3 is a solution of the equation $x^2 - 3 = 2x$ as, substituting x = 3 into both sides we get the same number, i.e. 6. Sometimes, an equation can have more than one solution. For example, x = −1 is also a solution of $x^2 - 3 = 2x$ as, substituting x = −1 into both sides we get the same number, i.e. −2. Generally speaking, as we shall see in Units 2 and 3, a given equation may have no solutions, one solution or many solutions.

Solving an equation means finding all of its solutions. Sometimes this is easy and sometimes it is not. In the simplest case, we just have to simplify both sides to see the solution. For example, to solve the equation 4x − 3x = 2 + 5, we simplify both sides to see that x = 7.

If this doesn’t work, we can rearrange the equation into a simpler equation that has the same solution(s). To do this, we proceed by performing some well-chosen mathematical operation on both sides at the same time so that the equation is unchanged, but simplified. The mathematical operations that we can use in such cases are:

add (or subtract) an expression from both sides;

multiply (or divide) by a non-zero expression on both sides.

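However, raising both sides to a power can cause problems. If we square both sides, say, the new equation can have extra solutions because a positive expression has two square roots. For example, the equation 4x − 8 = 2x + 4 has the same solutions as the equations

4x − 8 − (4x + 4) = 2x + 4 − (4x + 4),

and

$\frac{4x - 8}{9} = \frac{2x + 4}{9},$

but it does not have the same solutions as the equation

$(4x - 8)^2 = (2x + 4)^2,$

which is also satisfied by $x = \frac{2}{3}$, a value that does not satisfy the original equation.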
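The effect of squaring both sides can be seen by testing candidate values in both equations. The two values tried below, 6 and 2/3, are chosen for illustration; the function names are likewise assumptions.

```python
from fractions import Fraction

def original(x):
    """Is x a solution of 4x - 8 = 2x + 4?"""
    return 4 * x - 8 == 2 * x + 4

def squared(x):
    """Is x a solution of (4x - 8)^2 = (2x + 4)^2?"""
    return (4 * x - 8) ** 2 == (2 * x + 4) ** 2

for x in (Fraction(6), Fraction(2, 3)):
    print(x, original(x), squared(x))
# 6 True True
# 2/3 False True
```

The value 2/3 satisfies the squared equation only, so squaring has changed the set of solutions.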

Bearing this in mind, let’s see how we would actually solve this equation.


Example 1.3 Solve the equation 4x− 8 = 2x+ 4.

We solve this by rearranging it, i.e. performing some well-chosen mathematical operations on both sides at the same time:

4x − 8 = 2x + 4              our equation
4x − 8 − 2x = 2x + 4 − 2x    subtracting 2x from both sides
2x − 8 = 4                   simplifying
2x − 8 + 8 = 4 + 8           adding 8 to both sides
2x = 12                      simplifying
x = 6                        dividing both sides by 2

Thus, the solution to our equation is x = 6.

Lastly, always check that any solution you find is a solution by using it to evaluate both sides of the original equation.

Activity 1.7 Check that x = 6 is a solution to the original equation.

Example 1.4 Solve the equation 3x+ 6 = 5x− 10.

We again proceed by rearranging the equation:

3x + 6 = 5x − 10               our equation
3x + 6 − 3x = 5x − 10 − 3x     subtracting 3x from both sides
6 = 2x − 10                    simplifying
6 + 10 = 2x − 10 + 10          adding 10 to both sides
16 = 2x                        simplifying
8 = x                          dividing both sides by 2

Thus, the solution to our equation is x = 8.
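The rearrangement used in Examples 1.3 and 1.4 follows one pattern: collect the x terms on one side, the constants on the other, and divide. A minimal sketch, assuming the equation is written as ax + b = cx + d with a ≠ c (the function name `solve_linear` is hypothetical):

```python
from fractions import Fraction

def solve_linear(a, b, c, d):
    """Solve a*x + b = c*x + d for x, assuming a != c (one solution)."""
    if a == c:
        raise ValueError("no unique solution when the x coefficients agree")
    # Subtract c*x and b from both sides, then divide by (a - c).
    return Fraction(d - b, a - c)

print(solve_linear(4, -8, 2, 4))    # Example 1.3: 6
print(solve_linear(3, 6, 5, -10))   # Example 1.4: 8
```

Substituting each answer back into both sides of its equation is still the recommended check.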

Activity 1.8 Check that x = 8 is a solution to the equation 3x+ 6 = 5x− 10.

The equations in the last two examples are linear equations and they will be the starting point for a more detailed discussion of equations that will start in Unit 2.

Inequalities

An inequality is a mathematical statement where two algebraic expressions are related by an inequality, such as ‘>’ or ‘<’, so that we know that one of the expressions is greater than or less than the other. Inequalities can be solved by finding the range of values, for each variable, that make it true. For example, the inequality x < 2 is true precisely when x < 2.


As with equations, inequalities can be solved by rearranging them into simpler inequalities that are true for the same range of values. Generally, given an inequality, this means that we can:

add (or subtract) an expression from both sides, or

multiply (or divide) by a positive expression on both sides,

to simplify, but not change, the inequality. For example,

x+ 4 > −1 can be simplified to give x > −5 by subtracting 4 from both sides.

3x > 6 can be simplified to give x > 2 by dividing both sides by 3 (as 3 is positive).

However, if we multiply (or divide) by a negative expression, we must ‘reverse the direction’ of the inequality. For example,

−3x > 6 can be simplified to give x < −2 by dividing both sides by −3 and reversing the direction of the inequality (as −3 is negative).

To see why we need to do this, consider the inequality 2 < 3 which is true. If we multiply by 2 (which is positive) we get 4 < 6 which is still true, but if we multiply the original inequality by −2 (which is negative) we get −4 < −6 which is not true. However, if we reverse the direction of the inequality as well, we get −4 > −6 which is now true.
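The same experiment can be run in Python, starting from the true inequality 2 < 3 and multiplying both sides by a positive and then a negative number. The variable names are illustrative.

```python
left, right = 2, 3
print(left < right)               # True
print(2 * left < 2 * right)       # True: a positive multiplier keeps the direction
print(-2 * left < -2 * right)     # False: -4 < -6 fails
print(-2 * left > -2 * right)     # True: reversing the direction repairs it
```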

Example 1.5 Solve the inequality 4x− 6 < 6x− 2.

We solve this by rearranging it, i.e. performing some well-chosen mathematical operations on both sides at the same time:

4x − 6 < 6x − 2              our inequality
4x − 6 − 4x < 6x − 2 − 4x    subtracting 4x from both sides
−6 < 2x − 2                  simplifying
−6 + 2 < 2x − 2 + 2          adding 2 to both sides
−4 < 2x                      simplifying
−2 < x                       dividing both sides by 2

Thus, the solution to our inequality is −2 < x, or rewriting this, x > −2.

Alternatively, we could have rearranged it by doing some slightly different operations:

4x − 6 < 6x − 2              our inequality
4x − 6 − 6x < 6x − 2 − 6x    subtracting 6x from both sides
−2x − 6 < −2                 simplifying
−2x − 6 + 6 < −2 + 6         adding 6 to both sides
−2x < 4                      simplifying
x > −2                       dividing both sides by −2 and reversing the inequality

Thus, the solution to our inequality is, again, x > −2.
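A solution range such as x > −2 can be spot-checked by comparing it with the original inequality at sample points, including the boundary. This is a sketch on a grid of half-integers chosen for illustration, not a proof.

```python
from fractions import Fraction

def satisfies(x):
    """Does x satisfy the original inequality 4x - 6 < 6x - 2?"""
    return 4 * x - 6 < 6 * x - 2

# Sample points from -5 to 5 in steps of one half, including the boundary x = -2.
samples = [Fraction(k, 2) for k in range(-10, 11)]
print(all(satisfies(x) == (x > -2) for x in samples))  # True
```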


Formulae

A formula is an algebraic expression where a single variable, the subject, is equated to an expression involving other variables. For example, the area, A, of a circle is given in terms of its radius, r, by the well-known formula $A = \pi r^2$. Sometimes we will want to rearrange a formula so that a different variable is the subject. The procedure for doing this is the same as the one we used to solve an equation, but the ‘solution’ will be an algebraic expression rather than a number.

Example 1.6 Following on from Example 1.1, let S denote the amount, in pounds, of our monthly gas payments so that

$S = \frac{dmP + 81}{12}.$

If our monthly repayment, S, is now given, for how many days, d, can we heat our house?

We proceed by rearranging the formula:

$S = \frac{dmP + 81}{12}$       our formula
$12S = dmP + 81$               multiplying both sides by 12
$12S - 81 = dmP$               subtracting 81 from both sides
$\frac{12S - 81}{mP} = d$      dividing both sides by mP

Thus we can see that the number of days is given by $d = \frac{12S - 81}{mP}$.
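A rearranged formula can be checked with a round trip: feed in the monthly payment found in Example 1.2 and confirm that the number of days comes back. The function name `days_heated` is an assumption made for this sketch, and exact fractions avoid rounding of the £0.12 price.

```python
from fractions import Fraction

def days_heated(S, m, price):
    """Rearranged formula d = (12*S - 81) / (m*P) from Example 1.6."""
    return (12 * Fraction(S) - 81) / (m * Fraction(price))

# Figures from Example 1.2: £23 a month, 13 cubic metres a day, £0.12 per cubic metre.
print(days_heated(23, 13, "0.12"))  # 125
```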

Activity 1.9 In a similar manner, find the price, P , per cubic metre of gas.

Identities

An identity is a special kind of mathematical formula that allows us to rewrite onemathematical expression in another way. For instance,

$x(x + 1) = x^2 + x,$

is an identity because reading it from left to right tells us how to multiply out the brackets in ‘x(x + 1)’ and reading it from right to left tells us how to factorise the quadratic $x^2 + x$. In particular, notice that although this looks like an equation, it isn’t really because it is true for all values of x! In fact, throughout this unit we have been reviewing how certain mathematical operations work and, as you have probably realised, many of these can be usefully summarised by using identities. For instance, the following identities allow us to summarise some of the ideas that we encountered when we discussed fractions.


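To add and subtract fractions, we use the rules

$\frac{a}{b} + \frac{c}{d} = \frac{ad + bc}{bd}$ and $\frac{a}{b} - \frac{c}{d} = \frac{ad - bc}{bd},$

where bd is called the common denominator. To multiply fractions we use the rule

$\frac{a}{b} \times \frac{c}{d} = \frac{ac}{bd},$

and we divide fractions by using the rule

$\frac{a}{b} \div \frac{c}{d} = \frac{a}{b} \times \frac{d}{c},$

where d/c is called the reciprocal of c/d.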

Arithmetic with fractions
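The four fraction rules can be spot-checked with Python's exact `Fraction` type. The sample values of a, b, c and d below are arbitrary choices for illustration (b, c and d must be non-zero for every rule to make sense).

```python
from fractions import Fraction

a, b, c, d = 2, 3, 5, 7   # arbitrary sample values; b, c and d are non-zero
x, y = Fraction(a, b), Fraction(c, d)

print(x + y == Fraction(a * d + b * c, b * d))   # True: addition over the common denominator bd
print(x - y == Fraction(a * d - b * c, b * d))   # True: subtraction over bd
print(x * y == Fraction(a * c, b * d))           # True: multiply numerators and denominators
print(x / y == x * Fraction(d, c))               # True: divide by multiplying by the reciprocal
```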

At this stage, we can also usefully summarise some of the ways in which powers work as follows.

The power laws state that

$a^n \cdot a^m = a^{n+m}, \quad \frac{a^n}{a^m} = a^{n-m}, \quad (a^n)^m = a^{nm},$

provided that both sides of these expressions exist. In particular, we have

$a^0 = 1$ and $a^{-n} = \frac{1}{a^n}.$

If it exists, we also define the positive nth root of a, written $\sqrt[n]{a}$, to be $a^{\frac{1}{n}}$.

Power laws
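As a sanity check, the power laws can be tested with exact arithmetic for one choice of base and integer powers. The values of a, n and m below are arbitrary and purely illustrative.

```python
from fractions import Fraction

a, n, m = Fraction(3), 5, 2   # arbitrary non-zero base and integer powers

print(a**n * a**m == a**(n + m))       # True
print(a**n / a**m == a**(n - m))       # True
print((a**n)**m == a**(n * m))         # True
print(a**0 == 1, a**-n == 1 / a**n)    # True True
```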

We can also summarise some of our results concerning brackets by using identities as you can see in the next activity.

Activity 1.10 Write out the identities that arise when you remove the brackets from the following algebraic expressions.

i. a(bc), ii. a(b + c), iii. $(a + b)^2$, iv. (a + b)(c + d).

And, just to be sure that we understand what is going on, try the next activity.

Activity 1.11 Use these identities to simplify the following algebraic expressions.

i. $(x + y)^2 - x(x + y) - y(x + y)$, ii. $\frac{(x + y)^2 - (x - y)^2}{4xy}$, iii. √x+ y−(√x+√y).


Learning outcomes

At the end of this unit, you should be able to:

simplify and evaluate arithmetic expressions including those that involve brackets and powers;

manipulate algebraic expressions including those that involve brackets and powers;

solve simple equations and inequalities;

model certain situations using formulae and be able to rearrange such formulae;

use identities to manipulate arithmetic and algebraic expressions.

Exercises

Exercise 1.1

Evaluate the expressions $3 \cdot 2 + \frac{6}{2} \cdot 7 + 4$ and $\frac{3 - (-3 - (4 - 5) - 2) - 6}{-(-(-1)) - 1}$.

Exercise 1.2

Evaluate the following expressions.

i. |−3| + |−2|, ii. |−3| − |−2|, iii. −|3| + |−2|, iv. −|3| − |−2|, v. −|3| − |2|.

Exercise 1.3

Write the proper fractions $4\frac{2}{7}$, $1\frac{2}{3}$ and $2\frac{1}{4}$ as improper fractions.

Exercise 1.4

Evaluate the following expressions, writing your answers in lowest terms.

i. $\frac{1}{3} + \frac{1}{2}$, ii. $\frac{30}{7} - \frac{5}{3}$, iii. $\frac{2}{5} \cdot \frac{25}{4}$, iv. $\frac{13}{8} \div \frac{9}{4}$.

Exercise 1.5

You deposit £1000 in a bank account that pays 10% interest. What will the balance be after one year? Two years?

After two years, what is the increase in the balance as a percentage of the original deposit?

Exercise 1.6

Evaluate the following expressions.

i. $9^2 - 9^{\frac{1}{2}}$, ii. $16^{-\frac{1}{4}} + 16^{\frac{1}{2}}$, iii. $7^{\frac{1}{3}} \cdot 7^{\frac{2}{3}}$, iv. $\frac{10^{-2}}{2^{-10}}$.


Exercise 1.7

Express the following in the simplest form possible.

i. $\frac{x^2 y}{2xz} + \frac{xy^3 z}{xy}$, ii. $x(y^2 z^3)^{\frac{1}{2}}(xz)^{-2}$, iii. $x(xy)^{-2}(x + z)^{\frac{1}{2}}$.

Exercise 1.8

Multiply out the brackets in the following expressions simplifying your answers as far as possible.

i. (x+ 1)(x− 1), ii. (2y + 3)(y − 2), iii. (x+ 3y)(2x− y), iv. (2x− 3y)(x+ z).

Exercise 1.9

Solve the following equations.

i. −3p = 21, ii. 4q − 1 = 15,

iii. 5z + 4(z − 2) = 1, iv. $\frac{5}{6}k - \frac{2k + 1}{3} = \frac{2}{3}$,

v. 5m − 3(m − 2) = 11(m + 2), vi. 83(w − 1,996) + 17(w − 1,996) = 600.

Exercise 1.10

You hire a car for £20 plus the cost of petrol used. Let x be the distance you travel in miles and p be the price, in pence, of petrol per gallon. If petrol consumption is 30 miles per gallon, write down expressions, in pence, for the amount you spend on petrol and the cost per mile.

Exercise 1.11

Rearrange the formula $y = \frac{z}{2 + x} - 3$ to make x the subject.

Exercise 1.12

Solve the inequality 5− x > 2x− 1.


2. Review II — Linear equations and straight lines

Unit 2: Review II

Linear equations and straight lines

Overview

In this unit we continue our study of equations by looking at linear equations in one and two variables. In particular, we will see that linear equations in two variables represent straight lines. We see how to sketch these lines by finding their intercepts and investigate how their gradient allows us to measure changes. Lastly, we will see how to solve problems that involve simultaneous equations.

Aims

The aims of this unit are as follows.

To see how to solve simple linear equations in one and two variables.

To see how linear equations in two variables represent straight lines.

To see how to sketch straight lines and find their gradient.

To see how to solve simultaneous equations.

Specific learning outcomes can be found near the end of this unit.

2.1 Linear equations

We start with a brief review of how to solve linear equations in one variable and see how such equations can have no solutions, one solution or an infinite number of solutions. We then look at linear equations in two variables, find that they give us an infinite number of solutions, and see how we can represent these solutions in a straightforward way.

2.1.1 Linear equations in one variable

A linear equation in one variable, let’s call it x, is an equation of the form

ax = c,

where a ≠ 0 and c are constants. As in Unit 1, we can solve this by dividing through on both sides by a to get

$x = \frac{c}{a},$


as a ≠ 0 and, in this case, we say that such an equation has a unique solution.∗ Of course, the variable need not be x, as linear equations in one variable may use a different variable, for example, the variable

i. y in 4y = −8, which gives the solution y = −2;

ii. z in 3z = 9, which gives the solution z = 3;

iii. q in 3q = 9, which gives the solution q = 3.

Notice, in particular, that examples ii. and iii. are the same equation written in terms of two different variables.

More generally, a linear equation in one variable can come about through an equation that only involves multiples of the variable and constants. For example, if we consider the equations,

1. 6y + 4 = 2y − 4,

2. 2z − 6 = −z + 3,

3. q − 5 = −2q + 4,

we can rearrange them, as in Unit 1, to yield the linear equations that we saw above. The only exceptions to this are when we have something like

2x − 6 = 2x + 2 which rearranges to 0 = 8,

and this is never true, i.e. an equation like this has no solutions since, whatever value of x we put into the equation, it is never satisfied. Or we have something like

2x + 2 = 2x + 2 which rearranges to 0 = 0,

and this is always true, i.e. an equation like this has an infinite number of solutions since, whatever value of x we put into the equation, it is always satisfied. That is, this equation is actually an identity because it is true for all values of x.
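The three cases described above (one solution, no solutions, infinitely many solutions) can be sketched as a small classifier. It assumes the equation has the form ax + b = cx + d; the function name `classify` and this way of encoding the equation are assumptions made for illustration.

```python
from fractions import Fraction

def classify(a, b, c, d):
    """Describe the solutions of a*x + b = c*x + d."""
    if a != c:
        return f"unique solution x = {Fraction(d - b, a - c)}"
    # Equal x coefficients: the x terms cancel, leaving the statement b = d.
    return "infinitely many solutions" if b == d else "no solutions"

print(classify(6, 4, 2, -4))   # 6y + 4 = 2y - 4: unique solution x = -2
print(classify(2, -6, 2, 2))   # 2x - 6 = 2x + 2: no solutions
print(classify(2, 2, 2, 2))    # 2x + 2 = 2x + 2: infinitely many solutions
```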

2.1.2 Linear equations in two variables

In its simplest form, a linear equation in two variables, say x and y, is an equation of the form

ax + by = c,

where at least one of the constants a and b is non-zero. Unlike the situation with one variable, this will generally have an infinite number of solutions.

Example 2.1 If we have the linear equation in two variables given by

2x+ y = 7,

then we can rearrange this to get

y = 7− 2x.

∗That is, it always has a solution and there is only one solution.


Now, if we substitute any value of x into this equation, it will give us a value of y. For instance, if we take

x = 1 we get y = 5;

x = 2 we get y = 3;

x = 3 we get y = 1;

and so on for any other values of x that we may choose. Furthermore, each of these pairs of numbers is a solution to the equation as putting the x value and its corresponding y value into the equation satisfies it.
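The way Example 2.1 generates solutions can be sketched in a few lines: choose x, compute y from the rearranged equation, and confirm that the pair satisfies the original equation. The function name `y_for` is an assumption made for this illustration.

```python
def y_for(x):
    """Rearranged form of 2x + y = 7, i.e. y = 7 - 2x."""
    return 7 - 2 * x

solutions = [(x, y_for(x)) for x in (1, 2, 3)]
print(solutions)  # [(1, 5), (2, 3), (3, 1)]

# Every generated pair satisfies the original equation.
print(all(2 * x + y == 7 for x, y in solutions))  # True
```

Any other value of x could be used in the same way, which is why the equation has infinitely many solutions.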

Example 2.2 Consider the linear equation in two variables given by x = 2. Notice that this linear equation only contains the variable x, but as we are told that it is a linear equation in two variables, we have to think about what this means for the other variable, which we can call y. The way to think about this is to write it as

x + 0y = 2,

and then notice that, for any value of y, the quantity ‘0y’ is always zero and so we must always get x = 2. That is, among the solutions to this linear equation in two variables we will find the pairs of numbers

x = 2 and y = 1;

x = 2 and y = 2;

x = 2 and y = 3;

and so on for any other value of y as long as we pair it with x = 2.

Activity 2.1 What are the solutions to the linear equation in two variables given by y = 3?

2.1.3 Visualising the solutions of linear equations in two variables

Consider again the equation

2x + y = 7

from Example 2.1 and one of the solutions to this equation that we found there, say, x = 1 and y = 5. We can write such a solution as the ordered pair (1, 5).† In such a pair, we often call the value of x, i.e. 1 in this case, the x-coordinate and the value of y, i.e. 5 in this case, the y-coordinate. Indeed, such ordered pairs, or coordinates, can be represented as a point on a diagram such as the one in Figure 2.1(a). This diagram consists of two axes, the x-axis that runs horizontally and the y-axis that runs vertically. We take the point where the axes cross, labelled O in this diagram, to be the

†As we shall see, this is an ordered pair since the pair (1, 5) is different from the pair (5, 1). That is, the order in which the numbers appear in the brackets matters.


point with coordinates (0, 0) and we call this the origin. We often refer to the ‘space’ which contains all the points with (x, y) coordinates as the ‘xy-plane’.‡ Repeating this for the other two solutions of the equation we found earlier, i.e. those with coordinates (2, 3) and (3, 1), yields three points on our diagram as shown in Figure 2.1(b). This procedure, of representing the solutions to an equation in two variables as points on such a diagram, is known as plotting those points.

Figure 2.1: Plotting points. (a) The point (1, 5) and (b) the points (1, 5), (2, 3) and (3, 1). All of the points plotted here are solutions to the equation 2x + y = 7.

If we were to repeat this procedure, i.e. if we were to plot all the points which represented a solution to our linear equation, we would find that they are all on the straight line shown in Figure 2.2(a). In fact, any linear equation in two variables that has an infinite number of solutions can be represented as a line on such a diagram.

Indeed, the lines which represent the linear equations in two variables given by x = 2 (from Example 2.2) and y = 3 (from Activity 2.1) are illustrated in Figure 2.2(b). In particular, notice that points on the vertical line, which represent the solutions to the equation x = 2, always have coordinates of the form (2, y) where y can take any value.

Activity 2.2 In a similar manner, what can we say about the coordinates of the points on the horizontal line? What are the coordinates of the point at which this horizontal line and this vertical line intersect?

Activity 2.3 If we have the vertical line x = k and the horizontal line y = l where k and l are constants, what can we say about the coordinates of the points on these lines? What are the coordinates of the point at which these two lines intersect?

‡Basically, it’s a ‘plane’ because it is ‘flat and level’. It’s the xy-plane because the points have (x, y) coordinates as determined by the x and y-axes.


Figure 2.2: Drawing straight lines. (a) Each point on this line has coordinates that satisfy the equation 2x + y = 7. (b) Each point on the vertical line has coordinates that satisfy the equation x = 2 and each point on the horizontal line has coordinates that satisfy the equation y = 3.

Activity 2.4 What are the equations of the lines that we use to represent the x and y-axes? What are the coordinates of the points on these lines?

2.2 Straight lines

We now turn our attention to straight lines in general. In particular, we want to be able to draw the straight line that represents the solutions to a given linear equation in two variables and we want to be able to find the linear equation in two variables whose solutions are represented by a given straight line.

2.2.1 Drawing straight lines given their equations

From what we have seen above, there are three kinds of straight line depending on the form of the linear equation in two variables we are dealing with. In particular, if we have the equation

ax + by = c,

then we find that:

If a = 0 and b ≠ 0, then the equation can be written as y = c/b and so, for any value of x, a point with coordinates (x, c/b) is on this line. As in Activity 2.1, where we had y = 3, we see that these equations represent horizontal straight lines as illustrated in Figure 2.2(b). In particular, the line with equation y = 0 is the x-axis.


If a ≠ 0 and b = 0, then the equation can be written as x = c/a and so, for any value of y, a point with coordinates (c/a, y) is on this line. As in Example 2.2, where we had x = 2, we see that these equations represent vertical straight lines as illustrated in Figure 2.2(b). In particular, the line with equation x = 0 is the y-axis.

If a ≠ 0 and b ≠ 0, then the equation cannot be written so simply and any point whose coordinates satisfy this equation will be on this line. As in Example 2.1, where we had 2x + y = 7, we see that these equations represent lines which are neither horizontal nor vertical and we call them oblique straight lines, as illustrated in Figure 2.2(a).

Now, given a linear equation in two variables, when it comes to drawing the straight line that represents its solutions, all we need to do is find at most two points on the line. In particular, on the one hand, if we can see from its equation that the line is horizontal or vertical, we need only one point on the line to draw it. On the other hand, if we can see from its equation that the line is oblique, then we only need to find two points on the line to draw it. That is, if we find any two points whose coordinates satisfy the equation, we can plot these two points on our diagram and the line we seek is the one that goes through these two points.
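The three cases above can be sketched in a few lines of Python. This is only an illustration: the function name classify_line is an assumption of this sketch rather than notation from the guide, and it assumes integer coefficients with a and b not both zero.

```python
from fractions import Fraction

def classify_line(a, b, c):
    """Classify the line ax + by = c as horizontal, vertical or oblique."""
    if a == 0 and b == 0:
        # Assumption of this sketch: such an equation is not treated as a line.
        raise ValueError("a and b cannot both be zero")
    if a == 0:
        return "horizontal", Fraction(c, b)   # the line y = c/b
    if b == 0:
        return "vertical", Fraction(c, a)     # the line x = c/a
    return "oblique", None                    # two points are needed to draw it

print(classify_line(0, 1, 3)[0])   # horizontal
print(classify_line(1, 0, 2)[0])   # vertical
print(classify_line(2, 1, 7)[0])   # oblique
```

For the horizontal and vertical cases the second returned value is the constant c/b or c/a, so a single point is enough to draw the line.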

2.2.2 The intercepts of a straight line

For oblique lines, two extremely easy points to find are the x and y-intercepts. For instance, if the equation of the line is given by

ax + by = c,

and we have a ≠ 0 and b ≠ 0 so that it is oblique, we can find the:

x-intercept, i.e. the value of x where the line crosses the x-axis. But, the x-axis is the line y = 0 and so we are looking for the value of x which occurs when y = 0 in our equation. That is, we want x to be such that

$$ax = c \implies x = \frac{c}{a},$$

and so the coordinates of the x-intercept are $\left(\frac{c}{a}, 0\right)$.

y-intercept, i.e. the value of y where the line crosses the y-axis. But, the y-axis is the line x = 0 and so we are looking for the value of y which occurs when x = 0 in our equation. That is, we want y to be such that

$$by = c \implies y = \frac{c}{b},$$

and so the coordinates of the y-intercept are $\left(0, \frac{c}{b}\right)$.

This general case is illustrated in Figure 2.3(a).


Figure 2.3: The x and y-intercepts of an oblique straight line. (a) In general, the oblique line ax + by = c has a ≠ 0 and b ≠ 0, so the x and y-intercepts are given by the points $\left(\frac{c}{a}, 0\right)$ and $\left(0, \frac{c}{b}\right)$ respectively. (b) The line 2x + y = 4 has x and y-intercepts given by the points (2, 0) and (0, 4) respectively.

Example 2.3 As an example of how this works, consider the linear equation in two variables given by

2x+ y = 4.

This represents an oblique line and so we can find its x and y-intercepts as follows.

For the x-intercept, we set y = 0 to get 2x = 4 and hence x = 2. Thus, the point with coordinates (2, 0) is the x-intercept of this line.

For the y-intercept, we set x = 0 to get y = 4. Thus, the point with coordinates (0, 4) is the y-intercept of this line.

Once we have plotted these two points, the line that we seek is the one that goes through both of them, as illustrated in Figure 2.3(b).
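The intercept formulas can be checked numerically with a short Python sketch. The function name intercepts is hypothetical, and the sketch assumes integer coefficients so that exact fractions can be used.

```python
from fractions import Fraction

def intercepts(a, b, c):
    """Return the x- and y-intercepts of the oblique line ax + by = c."""
    if a == 0 or b == 0:
        raise ValueError("the line must be oblique: a and b both non-zero")
    x_intercept = (Fraction(c, a), Fraction(0))   # set y = 0, so ax = c
    y_intercept = (Fraction(0), Fraction(c, b))   # set x = 0, so by = c
    return x_intercept, y_intercept

# The line 2x + y = 4 from Example 2.3.
x_int, y_int = intercepts(2, 1, 4)
print(x_int == (2, 0), y_int == (0, 4))   # True True
```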

Activity 2.5 Suppose that you are going to spend exactly £3 when buying x apples and y bananas. If apples cost 50p each and bananas cost 30p each, find a linear equation in terms of x and y that gives the combinations of apples and bananas that you can purchase. Draw the straight line that is represented by this linear equation and comment on its economic significance.


2.2.3 The gradient of a straight line

The gradient, or slope, of a straight line is a measure of how ‘steep’ the line is. That is, it can be found by taking two points on the line and dividing the change in y by the change in x as we see in the following definition.

Gradient of a straight line

If (x1, y1) and (x2, y2) are the coordinates of two distinct points on a straight line, then

the change in y is ∆y = y2 − y1, and

the change in x is ∆x = x2 − x1.

The gradient, m, of this straight line is then given by

$$m = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}.$$

In particular, whichever two points on the straight line we use when finding the gradient, we will always get the same value.

Example 2.4 Using the line from Example 2.3 which was illustrated in Figure 2.3(b), we can see that it goes through the points with coordinates (2, 0) and (0, 4). As such, using these two points, we can see that

the change in y is ∆y = 4 − 0 = 4, and

the change in x is ∆x = 0 − 2 = −2,

which means that the gradient, m, of this line is

$$m = \frac{\Delta y}{\Delta x} = \frac{4}{-2} = -2.$$

Notice that the gradient of the line is negative and this means, as we can see in the figure, that along this line the y-coordinate decreases as the x-coordinate increases.
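As a rough check of the definition, the following Python sketch computes the gradient from two points. The name gradient is an assumption of this sketch, which also assumes integer coordinates and two points with different x-coordinates.

```python
from fractions import Fraction

def gradient(p1, p2):
    """Gradient of the line through p1 = (x1, y1) and p2 = (x2, y2)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("change in x is zero, so the ratio cannot be computed")
    return Fraction(y2 - y1, x2 - x1)   # change in y over change in x

# The two intercepts of 2x + y = 4 used in Example 2.4.
print(gradient((2, 0), (0, 4)))   # -2
# Swapping the order of the points does not change the answer.
print(gradient((0, 4), (2, 0)))   # -2
```

Any other pair of points on the same line, such as (1, 2) and (2, 0), gives the same value.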

Activity 2.6 Following on from Example 2.1, find the gradient of the straight line whose equation is 2x + y = 7.

Following on from Example 2.2 and Activity 2.1, what can you say about the gradients of the straight lines whose equations are x = 2 and y = 3?

2.2.4 Finding the equation of a straight line

So far, we have seen how the equation of a straight line allows us to draw it and find its intercepts and gradient. We now consider how we can find the equation of a straight line if we are given some information about it and there are three common cases which can occur.


Given the gradient and the y-intercept

If we know that the line has gradient, m, and y-intercept, k, then the equation of the line is given by

y = mx + k.

For example, if we are told that a line has a gradient of 3 and its y-intercept is the point with coordinates (0, 7), then the equation of the line is

y = 3x + 7

and, if the gradient of the line is zero and the y-intercept is the point with coordinates (0, 5), then the equation of the line is y = 5.

Given the gradient and a point on the line

If we are given the gradient of the line, m, and a point on the line other than the y-intercept, say the point (x1, y1), then we know that for any other point, (x, y), on the line we must have

$$m = \frac{y - y_1}{x - x_1},$$

as the gradient of a line is the same regardless of which pair of points on the line we use to calculate it.

To verify that this formula works, consider again the line which has a gradient of 3 and whose y-intercept is the point with coordinates (0, 7). Using the formula, this yields the equation,

$$3 = \frac{y - 7}{x - 0} \quad\text{which can be rearranged to give}\quad y = 3x + 7,$$

as before. Similarly, in the case of the line which has a gradient of zero and whose y-intercept is the point with coordinates (0, 5), we can see that the formula yields the equation

$$0 = \frac{y - 5}{x - 0} \quad\text{which can be rearranged to give}\quad y = 5,$$

as before.

However, the full power of this formula is when we have to find the equation of the line that, for example, has a gradient of 10 and goes through the point with coordinates (2, 3). Using the formula in this case yields the equation

$$10 = \frac{y - 3}{x - 2} \implies y - 3 = 10(x - 2) \implies y - 3 = 10x - 20,$$

or y = 10x − 17. Indeed, we can verify this is correct since the x-coefficient is the gradient and the point (2, 3) satisfies the equation.
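The rearrangement used above, from m = (y − y1)/(x − x1) to y = mx + (y1 − m·x1), can be sketched in Python as follows. The function name is hypothetical and is introduced only for illustration.

```python
def line_from_gradient_and_point(m, point):
    """Return (m, k) so that y = m*x + k has gradient m and passes through point."""
    x1, y1 = point
    # Rearranging m = (y - y1) / (x - x1) gives y = m*x + (y1 - m*x1).
    k = y1 - m * x1
    return m, k

m, k = line_from_gradient_and_point(10, (2, 3))
print(m, k)   # 10 -17
```

Applying the same function to a gradient of 3 and the point (0, 7) returns (3, 7), in agreement with y = 3x + 7.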

Given two points on the line

If we know that the two distinct points (x1, y1) and (x2, y2) lie on the line, then its gradient is given by

$$m = \frac{y_2 - y_1}{x_2 - x_1}.$$


However, if the point (x, y) is also on the line, then the gradient between it and any other point on the line, say (x1, y1), is given by

$$m = \frac{y - y_1}{x - x_1}.$$

So, as the line has the same gradient regardless of the pairs of points we take, this means that the equation of the line is given by

$$\frac{y - y_1}{x - x_1} = \frac{y_2 - y_1}{x_2 - x_1}.$$

For example, if the points with coordinates (1, −7) and (2, 3) are on the line, then the equation of the line is given by

$$\frac{y - (-7)}{x - 1} = \frac{3 - (-7)}{2 - 1} \implies y + 7 = 10(x - 1) \implies y + 7 = 10x - 10,$$

or y = 10x − 17, i.e. this line is the same as the one we saw earlier.
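The two-point case can be sketched in the same way: compute the gradient from the two points and then use one of them to find the constant term. The name line_through is an assumption of this sketch, which also assumes integer coordinates and points with different x-coordinates.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Return (m, k) for the line y = m*x + k through two distinct points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points need different x-coordinates")
    m = Fraction(y2 - y1, x2 - x1)   # gradient from the two given points
    k = y1 - m * x1                  # chosen so that (x1, y1) lies on the line
    return m, k

m, k = line_through((1, -7), (2, 3))
print(m, k)   # 10 -17
```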

2.2.5 Applications of straight lines

If we have a situation where our two variables represent certain quantities and the relationship between these variables gives us a straight line, we often find that the gradient of the straight line also has a useful interpretation.

Example 2.5 If y is the distance travelled and x is the time taken, then a linear equation that relates these two variables would give us a straight line that represents the distance travelled in terms of the time taken. In this case, the gradient of the line, i.e.

$$m = \frac{\Delta y}{\Delta x},$$

is the speed of the object whose motion we are considering.

If the x variable is time, as in this example, we often call the gradient the rate of change of y. So, in this example, speed is the rate of change of distance, as one might expect. If the x variable is something else, then we call the gradient the rate of change of y with respect to x.

In economics, if x measures the quantity being produced, then gradients are usually referred to as marginals. So, for instance, the rate of change of profit with respect to the amount produced would be the marginal profit and so on. To motivate this, let’s consider another example.

Example 2.6 Suppose that a factory produces a certain product and the profit, when written in terms of the amount produced, is thought to be linear. One year, they produce 40 units and lose £4,000, while the next year, production is doubled to 80 units resulting in a profit of £1,000. What is the equation describing the profit in this case?

One way to start is to denote the profit by π and the amount produced by x. Then since we know that we are looking for a linear expression, we can use the information about change in profit and change in production to calculate the gradient of this linear function. That is, writing the figures as a change from (40, −4000) to (80, 1000), we can see that its gradient will be

$$m = \frac{\Delta\pi}{\Delta x} = \frac{1{,}000 - (-4{,}000)}{80 - 40} = \frac{5{,}000}{40} = 125.$$

This means that our linear relationship between π and x will be given by

π = 125x+ k.

However, we can find k as we know that the point (80, 1000) must satisfy this linear relationship, i.e. we must have

1,000 = 125 × 80 + k ⟹ 1,000 = 10,000 + k ⟹ k = −9,000.

Thus, the linear relationship we seek is

π = 125x − 9,000,

and this can be verified by showing that it is also satisfied by the point (40, −4000).
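The verification mentioned here can be carried out in a few lines of Python. The function name profit is an assumption of this sketch; the numbers are those of Example 2.6.

```python
def profit(x):
    """Profit in pounds when x units are produced, using the line from Example 2.6."""
    return 125 * x - 9000

# Both observed data points should lie on the line.
for units, observed in [(40, -4000), (80, 1000)]:
    assert profit(units) == observed

# The gradient between the two points is the change in profit per extra unit.
marginal_profit = (profit(80) - profit(40)) / (80 - 40)
print(marginal_profit)   # 125.0
```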

So, in this example, the gradient of the straight line is the marginal profit of the factory. That is, when quantities like profit and production are related by a straight line, we say that the marginal profit is the gradient of that line, i.e. the change in profit divided by the change in production.

2.3 Simultaneous equations

So far, we have seen how to identify the points that are on a given line, i.e. they will be the points (x, y) which are solutions to a linear equation in two variables such as

2x + y = 4.

But, what if we want to find the points (x, y) that two lines have in common? That is, what if we have two linear equations, say,

2x + y = 4 and x − y = −1,

and we want to find the points (x, y) that are solutions to both of them? In such cases, we say that we are solving the two equations simultaneously, we call the pair of equations simultaneous equations and we usually denote this by ‘pairing’ them with a curly bracket, i.e.

$$\left.\begin{aligned} 2x + y &= 4 \\ x - y &= -1 \end{aligned}\right\}$$

Sometimes, we will refer to a collection of two or more equations, such as the ones above, as a system of linear equations. We now turn our attention to visualising what the solution to a pair of simultaneous equations is and how to solve them using algebra.


2.3.1 Visualising the solution to a pair of simultaneous equations

Geometrically, if we draw the two lines represented by this pair of equations, as in Figure 2.4(a), then we can see that the solution to these simultaneous linear equations, i.e. the point that the lines they represent have in common, is the point (1, 2) where the two lines intersect. However, using pictures, no matter how well they are drawn, to find such points can be inaccurate and so we want to develop an algebraic method for solving such equations.

Figure 2.4: Finding points of intersection. (a) The lines represented by the linear equations 2x + y = 4 and x − y = −1 intersect at the point (1, 2). (b) The x and y-intercepts of the line represented by the linear equation 2x + y = 4 are its points of intersection with the lines y = 0 (i.e. the x-axis) and x = 0 (i.e. the y-axis) respectively.

In fact, we have already seen examples of such an algebraic method, namely when we found the x and y-intercepts of the line represented by the linear equation 2x + y = 4, as illustrated in Figure 2.3(b). In this case, finding the x-intercept of the line involved finding the point (x, y) that lies on both the line and the x-axis, i.e. this involved solving the simultaneous equations

$$\left.\begin{aligned} 2x + y &= 4 \\ y &= 0 \end{aligned}\right\}$$

whereas finding the y-intercept of the line involved finding the point (x, y) that lies on both the line and the y-axis, i.e. this involved solving the simultaneous equations

$$\left.\begin{aligned} 2x + y &= 4 \\ x &= 0 \end{aligned}\right\}$$

and, geometrically, these intersections are illustrated in Figure 2.4(b).


2.3.2 Solving simultaneous equations algebraically

We shall consider two methods for solving simultaneous equations using algebra. Generally speaking, there is not much difference between the methods and students are encouraged to use the method that they feel most comfortable with.

Method I: Substitution

This method involves rearranging one of the two equations so that it is in the form y = mx + k, say, and then using this to substitute for the y in the other equation. This yields an equation that allows us to solve for x. We can then find y by substituting this value of x back into our equation of the form y = mx + k.

Example 2.7 To solve the simultaneous equations

$$\left.\begin{aligned} 2x + y &= 7 \\ x - 2y &= 1 \end{aligned}\right\}$$

by substitution, we make y the subject of the first equation by rearranging. This yields the equation

y = 7 − 2x,

and then, substituting this into the other equation, we get

x − 2(7 − 2x) = 1 ⟹ x − 14 + 4x = 1 ⟹ 5x = 15,

and so x = 3. Substituting this value into y = 7 − 2x then yields the value of y which, in this case, is y = 1. Thus, the solution to these simultaneous equations is x = 3 and y = 1.
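The substitution steps can be sketched in Python as follows. This is an illustration only: the function name and the convention of passing each equation ax + by = c as a triple (a, b, c) are assumptions of this sketch, which also assumes that y can be made the subject of the first equation.

```python
from fractions import Fraction

def solve_by_substitution(eq1, eq2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by substituting for y."""
    a1, b1, c1 = map(Fraction, eq1)
    a2, b2, c2 = map(Fraction, eq2)
    if b1 == 0:
        raise ValueError("cannot make y the subject of the first equation")
    # Rearrange the first equation into the form y = m*x + k.
    m, k = -a1 / b1, c1 / b1
    # Substitute into the second: a2*x + b2*(m*x + k) = c2.
    coeff = a2 + b2 * m
    if coeff == 0:
        raise ValueError("x is eliminated as well, so there is no unique solution")
    x = (c2 - b2 * k) / coeff
    y = m * x + k
    return x, y

# Example 2.7: 2x + y = 7 and x - 2y = 1.
x, y = solve_by_substitution((2, 1, 7), (1, -2, 1))
print(x, y)   # 3 1
```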

Activity 2.7 Verify that the solution to these simultaneous equations is x = 3 and y = 1 by showing that these values satisfy the two original linear equations.

Activity 2.8 Make x the subject of the second equation in Example 2.7 and use this to find the solution to the given simultaneous equations.

Activity 2.9 Solve the simultaneous equations y = 2x + 4 and y − 3x = 2 using this method.

Method II: Elimination

This method involves multiplying each equation by a specially chosen number, namely the number that makes the coefficients of one of the variables the same in both equations. Then, by subtracting one equation from the other, we can eliminate that variable and hence solve what is left for the other.


Example 2.8 To solve the simultaneous equations

$$\left.\begin{aligned} 2x + y &= 7 \\ x - 2y &= 1 \end{aligned}\right\}$$

by elimination, we want to make the coefficient of x, say, the same in both equations. So, taking the equations individually we have:

2x + y = 7    multiply by 1 to get    2x + y = 7

x − 2y = 1    multiply by 2 to get    2x − 4y = 2

and subtracting gives    5y = 5

which tells us that y = 1. Then, using this value of y in either of the original equations, say the second, we see that x = 3. Thus, the solution is x = 3 and y = 1.
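A minimal Python sketch of the elimination steps is given below. The function name and the (a, b, c) triple convention for ax + by = c are assumptions of this sketch. It scales the first equation by the x-coefficient of the second and vice versa, which is one possible choice of multipliers (in Example 2.8 the multipliers 1 and 2 were used instead), and it assumes both x-coefficients are non-zero.

```python
from fractions import Fraction

def solve_by_elimination(eq1, eq2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by eliminating x."""
    a1, b1, c1 = map(Fraction, eq1)
    a2, b2, c2 = map(Fraction, eq2)
    if a1 == 0 or a2 == 0:
        raise ValueError("this sketch assumes both x-coefficients are non-zero")
    # Scale so that both equations have the same x-coefficient, a1*a2.
    s1 = (a2 * a1, a2 * b1, a2 * c1)
    s2 = (a1 * a2, a1 * b2, a1 * c2)
    # Subtracting the scaled equations removes x and leaves y_coeff * y = rhs.
    y_coeff = s1[1] - s2[1]
    rhs = s1[2] - s2[2]
    if y_coeff == 0:
        raise ValueError("y is eliminated as well, so there is no unique solution")
    y = rhs / y_coeff
    x = (c2 - b2 * y) / a2   # back-substitute into the second original equation
    return x, y

# Example 2.8: 2x + y = 7 and x - 2y = 1.
x, y = solve_by_elimination((2, 1, 7), (1, -2, 1))
print(x, y)   # 3 1
```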

Activity 2.10 Repeat the calculation in this example, but instead of eliminating x as we did above, use your multiplications to eliminate y.

Activity 2.11 Solve the simultaneous equations y = 2x + 4 and y − 3x = 2 using this method.

A warning

With either of these methods, we may find that when we eliminate one variable, we eliminate the other variable as well, ending up with something like 0 = 0 or 2 = 5. In such cases we conclude that:

If we get the former, i.e. we get something which is always true, this means that our simultaneous equations have an infinite number of solutions. This occurs when the two lines that are represented by our simultaneous equations are actually just the same line and so every point on this line is a solution as every point is a point of intersection.

If we get the latter, i.e. we get something which is never true, this means that our simultaneous equations have no solutions. This occurs when the two lines that are represented by our simultaneous equations are parallel, i.e. they never intersect, and so no point on either of the lines can be a solution.

Thus, we can see that in such cases, we will always get parallel lines. And, if the parallel lines are the same we get an infinite number of solutions, and if they are different we get no solutions.

One way of seeing when such cases occur is to note that parallel lines have the same gradient. As such, if you find that your simultaneous equations represent lines with the same gradient, the question is whether they always intersect (e.g. they have the same y-intercept as well) or whether they never intersect (e.g. they have different y-intercepts).
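The three possible outcomes can be told apart without solving the equations, as the following Python sketch suggests. The function name, the (a, b, c) triple convention for ax + by = c and the two parallel-line examples are assumptions made for illustration. Instead of comparing gradients directly, the sketch uses the cross-multiplied test a1·b2 = a2·b1, which also copes with vertical lines, and it assumes each equation really is a line, i.e. a and b are not both zero.

```python
def classify_system(eq1, eq2):
    """Say how many solutions the pair a1*x + b1*y = c1, a2*x + b2*y = c2 has."""
    a1, b1, c1 = eq1
    a2, b2, c2 = eq2
    if a1 * b2 != a2 * b1:
        return "one solution"                  # different gradients: lines cross once
    # Same gradient: check whether one equation is a multiple of the other.
    if a1 * c2 == a2 * c1 and b1 * c2 == b2 * c1:
        return "infinitely many solutions"     # the two lines are the same line
    return "no solutions"                      # distinct parallel lines

print(classify_system((2, 1, 4), (1, -1, -1)))   # one solution
print(classify_system((1, 1, 1), (2, 2, 2)))     # infinitely many solutions
print(classify_system((1, 1, 1), (1, 1, 3)))     # no solutions
```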


2.3.3 An application of simultaneous equations in economics

Simultaneous equations arise in economics when we consider questions of supply and demand. In general:

The level of supply, q, for a product depends on the [per-unit] price, p, of the product. Generally, the level of supply grows as the price increases, and so a line representing this relationship between p and q must have a positive gradient. We generally denote the supply line by S.

The level of demand, q, for a product depends on the [per-unit] price, p, of the product. Generally, the level of demand falls as the price increases, and so a line representing this relationship between p and q must have a negative gradient. We generally denote the demand line by D.

The point where the supply and demand lines intersect is called the equilibrium point. In theory, this is the point where the market stabilises since, at this point, the [per-unit] price is such that the levels of supply and demand are equal. This is illustrated in Figure 2.5.

Figure 2.5: Representing the supply, S, and demand, D, by lines, the equilibrium point is where the two lines intersect. Notice that, at this point, the [per-unit] price, p, is such that the levels of supply and demand, q, are equal.
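To make the idea of an equilibrium point concrete, here is a small Python sketch. The demand line q = 120 − 3p and supply line q = 2p − 30 are hypothetical figures chosen only for illustration, and the function name and the parameterisation q = a − bp (demand), q = c + dp (supply) are assumptions of this sketch.

```python
from fractions import Fraction

def equilibrium(a, b, c, d):
    """Equilibrium of demand q = a - b*p and supply q = c + d*p, with b, d > 0."""
    if b <= 0 or d <= 0:
        raise ValueError("demand must slope down and supply must slope up")
    # Setting a - b*p = c + d*p gives p = (a - c) / (b + d).
    p = Fraction(a - c, b + d)
    q = a - b * p
    return p, q

# Hypothetical market: demand q = 120 - 3p, supply q = -30 + 2p.
p, q = equilibrium(120, 3, -30, 2)
print(p, q)   # 30 30
```

At the price p = 30 both lines give q = 30, which is what it means for supply and demand to be equal.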

Activity 2.12 If demand is given by the equation 2q + 5p = 500 and supply is given by 3q = 25 + 7p, what are the equilibrium price and quantity?


Learning outcomes

At the end of this unit, you should be able to:

solve linear equations in one variable;

use a linear equation in two variables to draw the corresponding straight line;

find the gradient of a straight line;

find the equation of a straight line from supplied information;

solve simultaneous equations;

solve problems in economics that use this material.

Exercises

Exercise 2.1

For the following linear equations, find two points that are solutions to the equation and hence draw the straight line.

i. 3x + 4y = 12;

ii. 2x + y = 10;

iii. x − 2y = 4;

iv. 3y − 2x = 5.

In each case, use your two points to calculate the gradient of the straight line and use the equation of the straight line to verify that your answer for the gradient is correct.

Exercise 2.2

Draw the straight lines that go through the following pairs of points.

i. (1, 2) and (2, 4);

ii. (0, −3) and (3, 0);

iii. (1, 2) and (3, 2);

iv. (−2, 3) and (4, 6).

In each case, find the equation of the line you have drawn.

Exercise 2.3

Find the equations of the lines with the following properties.

i. A line that passes through the point (8,−1) and has a gradient of 14.

ii. A line with a gradient of −6 and a y-intercept with coordinates (0, 45).

iii. A line with a gradient of 7/2 and an x-intercept with coordinates (−11/2, 0).


Exercise 2.4

A company increased its weekly production from 20 to 25 units and found that its costs went up by £800 per week. Assuming that the relationship between costs and production is linear, find the marginal cost of production.

Given that the original cost was £5,000, find the linear equation that relates the costs to production.

If the selling price was £200 per unit, how many more units does the company need to produce in order to break even?

Exercise 2.5

Solve the following sets of simultaneous equations.

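i. $$\left.\begin{aligned} x + 2y &= 7 \\ x - 3y &= -3 \end{aligned}\right\}$$

ii. $$\left.\begin{aligned} 2x + 5y &= 11 \\ 3x + 3y &= 12 \end{aligned}\right\}$$

iii. $$\left.\begin{aligned} 4x + 2y &= 5 \\ 2x + y &= 2 \end{aligned}\right\}$$

iv. $$\left.\begin{aligned} 4x + 2y &= 4 \\ 2x + y &= 2 \end{aligned}\right\}$$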

Exercise 2.6

The demand for a product, q, is related to the price, p, by the equation q = 200 − 2p while suppliers respond to a price of p by supplying an amount, q, given by the equation q = 3p − 200. Find the equilibrium price and the corresponding level of production.


3. Review III — Quadratic equations and parabolae

Unit 3: Review III
Quadratic equations and parabolae

Overview

In this unit we see how algebra allows us to solve quadratic equations by factorising and completing the square. We then see, more generally, that quadratics can represent special curves known as parabolae and we see how to sketch them.

Aims

The aims of this unit are as follows.

To see how to write quadratics in their factorised and completed square forms.

To see how to use these forms to solve quadratic equations.

To see how to sketch a parabola and find various points of interest.

To solve problems in economics that use this material.

Specific learning outcomes can be found near the end of this unit.

3.1 Quadratic equations

A quadratic equation in one variable, let’s call it x, is an equation of the form

ax² + bx + c = 0,

where a ≠ 0, b and c are constants.∗ As such, we refer to expressions like the one on the left-hand side of this equation as quadratics and we call the constants a, b and c the coefficients of the quadratic. In this section we shall investigate several ways in which we can solve such equations.

∗Notice that, if a = 0, then we have bx + c = 0 and this is a linear equation. That is, to be a quadratic equation in x, there must be an x² term in the equation.

3.1.1 Factorising

One way of solving a quadratic equation like

ax² + bx + c = 0

is to factorise it. This involves writing ax² + bx + c as the product of two linear factors, i.e. we want to ‘put brackets in’ so that we can write

ax² + bx + c = (Ax + B)(Cx + D),

for some constants A, B, C and D. If we can do this, we can then rewrite the quadratic equation as

(Ax + B)(Cx + D) = 0,

and this helps us because the product on the left-hand side of the expression can only equal zero if one of the linear factors in the brackets is equal to zero. That is, the solutions to our quadratic equation must be the solutions to the two linear equations

Ax + B = 0 and Cx + D = 0.

Thus, as A, C ≠ 0,† the solutions will be given by

$$x = -\frac{B}{A} \quad\text{and}\quad x = -\frac{D}{C},$$

which we can easily find given the constants A, B, C and D. Consequently, we see that if we can factorise the quadratic in this way, we can easily solve the quadratic equation.

But, how can we go about factorising a quadratic? The basic idea involves the identity

(x + α)(x + β) = x² + (α + β)x + αβ,

which tells us how a certain factorised form, i.e. the left-hand side, is related to a certain quadratic, i.e. the right-hand side. So, reading this the other way, if we have the quadratic

x² + (α + β)x + αβ,

we can factorise it by simply taking the numbers α and β to get the factorised form

(x + α)(x + β).

But, of course, the problem is that we do not know what numbers α and β are! So, in this relatively simple case, where a (the x² coefficient) is one, we will have a quadratic like

x² + bx + c,

and we need to find the numbers α and β which add together to give us b (as we need b = α + β) and which multiply together to give us c (as we need c = αβ). Then, if we can find the numbers α and β that do this, we will have

x² + bx + c = x² + (α + β)x + αβ = (x + α)(x + β),

as the required factorised form. The trick then, is to take the values of b and c, think carefully about which numbers α and β could add and multiply in the right kind of way, and hopefully settle on the ones that will make everything work.

However, generally speaking, factorising any given quadratic can be tricky (especially in the more general case where a, the x² coefficient, is not one) and so this method for solving quadratic equations is not always that useful. Nevertheless, because it is so nice when it works, we will consider some examples below before we move on to some more ‘reliable’ methods.

†This must be the case since AC = a and, as a ≠ 0, neither A nor C can be zero.


Example 3.1 Solve the quadratic equation x² − x − 6 = 0.

We start by factorising the quadratic x² − x − 6, i.e. we need two numbers that add together to give us −1 and multiply together to give us −6. A little thought should convince you that the required numbers are +2 and −3 which means that we have

x² − x − 6 = (x + 2)(x − 3),

as you can easily verify by multiplying out the brackets. This means that we can rewrite the quadratic equation as

(x + 2)(x − 3) = 0,

so that the solutions will be given by

x + 2 = 0 and x − 3 = 0,

i.e. the solutions are x = −2 and x = 3. When this happens, we say that we have two distinct solutions.

Activity 3.1 Verify that x = −2 and x = 3 are solutions to the quadratic equationin Example 3.1 by substituting them into the left-hand side of the equation andshowing that they give zero.

Example 3.2 Solve the quadratic equation $x^2 - 4x + 4 = 0$.

We start by factorising $x^2 - 4x + 4$, i.e. we need two numbers that add together to give us −4 and multiply together to give us +4. A little thought should convince you that the required numbers are −2 and −2, which means that we have

$$x^2 - 4x + 4 = (x - 2)(x - 2),$$

as you can easily verify by multiplying out the brackets. This means that we can rewrite the quadratic equation as

$$(x - 2)(x - 2) = 0,$$

so that the solutions will be given by

$$x - 2 = 0 \quad\text{and}\quad x - 2 = 0,$$

i.e. both solutions are $x = 2$. When this happens, we say that $x = 2$ is a repeated solution.

Activity 3.2 Verify that $x = 2$ is a solution to the quadratic equation in Example 3.2 by substituting it into the left-hand side of the equation and showing that it gives zero.


Activity 3.3 Solve the quadratic equation $x^2 + 7x + 12 = 0$ by factorising.

Unfortunately, as mentioned above, we will meet quadratic equations which are difficult to solve by factorisation. For this reason, we now seek a method that will always work. The method that we will use here requires us to complete the square of the quadratic instead of factorising it. So we now consider how to perform this procedure and then, having done this, we will be able to see how to use it to solve quadratic equations.

3.1.2 Completing the square

We know that, by multiplying out the brackets, we have

$$(x + k)^2 = x^2 + 2kx + k^2,$$

and, since we can write $x^2 + 2kx + k^2$ as $(x + k)^2$ in this way, we say that it is a perfect square. That is, it can be written as something (in this case, $x + k$) squared and nothing else.

Example 3.3 The quadratic $x^2 + 6x + 9$ is a perfect square because we can write

$$x^2 + 6x + 9 = (x + 3)^2.$$

But the quadratic $x^2 + 6x + 10$ is not a perfect square as we can only write it as

$$x^2 + 6x + 10 = (x^2 + 6x + 9) + 1 = (x + 3)^2 + 1,$$

and not as ‘something squared and nothing else’ due to the presence of the ‘+1’ on the right-hand side.

Now imagine that we have a quadratic expression of the form $x^2 + 2kx$ and we want to complete the square. That is, we want to find something that we can add to this expression in order to get a perfect square. The idea is that:

if we add $k^2$ to $x^2 + 2kx$ we get $x^2 + 2kx + k^2$

and, as before, this is now a perfect square because

$$x^2 + 2kx + k^2 = (x + k)^2,$$

meaning that we have ‘completed the square’ on $x^2 + 2kx$ by adding $k^2$ to it.

But what does this tell us about $x^2 + 2kx$, our original quadratic expression? Using this result, we can take the $k^2$ on the left-hand side over to the right-hand side in order to get

$$x^2 + 2kx = (x + k)^2 - k^2,$$

where, on the right-hand side, we now have a perfect square, i.e. $(x + k)^2$, plus something which doesn’t depend on x, i.e. $-k^2$. When we write $x^2 + 2kx$ in this way, we have what we call its completed square form.


Example 3.4 The quadratic expression $x^2 - 4x$ can be made into a perfect square by adding $(-2)^2 = 4$ to it, i.e.

$$x^2 - 4x + 4 = (x - 2)^2.$$

Consequently, we can write $x^2 - 4x$ as

$$x^2 - 4x = (x - 2)^2 - 4,$$

which is its completed square form.

To complete the square on a more complicated quadratic expression, say

$$x^2 + 2kx + c,$$

we can write $x^2 + 2kx$ in completed square form, as before, to get

$$x^2 + 2kx = (x + k)^2 - k^2,$$

and then note that

$$x^2 + 2kx + c = \left[x^2 + 2kx\right] + c = \left[(x + k)^2 - k^2\right] + c = (x + k)^2 + (c - k^2),$$

which is the completed square form of $x^2 + 2kx + c$ since, on the right-hand side, we now have a perfect square, i.e. $(x + k)^2$, plus something which doesn’t depend on x, i.e. $c - k^2$.

Example 3.5 Find the completed square form of $x^2 - 4x + 3$.

We note that, if we just had $x^2 - 4x$, we would just add 4 to it to get

$$x^2 - 4x + 4 = (x - 2)^2,$$

as before. Again, this means that we have

$$x^2 - 4x = (x - 2)^2 - 4,$$

and so we can write

$$x^2 - 4x + 3 = \left[x^2 - 4x\right] + 3 = \left[(x - 2)^2 - 4\right] + 3 = (x - 2)^2 - 1,$$

which is the desired completed square form.

We can find the completed square form of even more complicated quadratic expressions, like $ax^2 + 2kx + c$, by using brackets to break the expression down into simpler parts as we did above.

Example 3.6 Find the completed square form of $-2x^2 + 8x + 10$.


We start by putting in brackets so that we can work with something which is similar to what we saw above, i.e. we want a quadratic expression where the $x^2$ coefficient is one. This means that we want to write

$$-2x^2 + 8x + 10 = -2\left[x^2 - 4x\right] + 10.$$

Now, from the example above we know that

$$x^2 - 4x = (x - 2)^2 - 4,$$

and so this means that we have

$$-2x^2 + 8x + 10 = -2\left[x^2 - 4x\right] + 10 = -2\left[(x - 2)^2 - 4\right] + 10 = \left[-2(x - 2)^2 + 8\right] + 10 = -2(x - 2)^2 + 18,$$

which is the desired completed square form.
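The bracket-by-bracket procedure used in Examples 3.5 and 3.6 can be written out as a short program. This is a minimal sketch rather than anything from the guide: the names `complete_square` and `expand` are hypothetical, and exact fractions are used (an implementation choice) so that no rounding creeps in.

```python
from fractions import Fraction


def complete_square(a, b, c):
    """Return (a, k, d) such that a*x**2 + b*x + c == a*(x + k)**2 + d."""
    if a == 0:
        raise ValueError("not a quadratic: the x^2 coefficient must be non-zero")
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    # Step 1: take a out of the x terms, leaving x^2 + (b/a)x inside the brackets.
    # Step 2: write x^2 + 2kx as (x + k)^2 - k^2, where 2k = b/a.
    k = b / (2 * a)
    # Step 3: multiply the brackets by a again and collect the constant terms.
    d = c - a * k**2
    return a, k, d


def expand(a, k, d):
    """Multiply out a*(x + k)**2 + d and return the coefficients (a, b, c)."""
    return a, 2 * a * k, a * k**2 + d


# Example 3.6: -2x^2 + 8x + 10 should become -2(x - 2)^2 + 18.
a, k, d = complete_square(-2, 8, 10)
print(a, k, d)
# Multiplying out again recovers the original coefficients.
print(expand(a, k, d))
```

The `expand` helper plays the role of ‘multiplying out the brackets’, so it can be used to check any completed square form against the original expression.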

Activity 3.4 Verify that, in the previous examples, the completed square form of the expression is indeed equal to the original expression by multiplying out the brackets.

Activity 3.5 Find the completed square form of $-2x^2 + 4x + 8$ and verify that your answer is correct by multiplying out the brackets.

3.1.3 Using the completed square form to solve quadratic equations

Now that we can complete the square, we can see how we can use it to solve quadratic equations. The advantage is that, unlike with factorising, we will always know how to complete the square and so we will always be able to use it to solve the quadratic equation! The method is probably best illustrated with an example.

Example 3.7 Solve the quadratic equation $x^2 - 4x = 0$ by completing the square.

We saw earlier that we can write $x^2 - 4x$ as

$$x^2 - 4x = (x - 2)^2 - 4,$$

in completed square form. This means that the quadratic equation we have to solve is

$$(x - 2)^2 - 4 = 0.$$

This is easily rearranged to get

$$(x - 2)^2 = 4,$$

and then, if we take the square root of both sides, we get

$$x - 2 = \pm 2.$$

Hence, the solutions to our quadratic equation are given by $x = 2 \pm 2$, i.e. $x = 4$ and $x = 0$.


To verify that these are the solutions, we could substitute these values into the left-hand side of the equation and check that we get zero. Or, alternatively, we can verify our answer by solving this quadratic equation by factorising, i.e.

$$x^2 - 4x = 0 \implies x(x - 4) = 0 \implies x = 0 \text{ and } x = 4,$$

as before.

Example 3.8 Solve the quadratic equation $x^2 - 4x + 3 = 0$ by completing the square.

We saw earlier that we can write $x^2 - 4x + 3$ as

$$x^2 - 4x + 3 = (x - 2)^2 - 1$$

in completed square form. This means that the quadratic equation we have to solve is the same as

$$(x - 2)^2 - 1 = 0.$$

This is easily rearranged to get

$$(x - 2)^2 = 1,$$

and then, if we take the square root of both sides, we get

$$x - 2 = \pm 1.$$

Hence, the solutions to our quadratic equation are given by $x = 2 \pm 1$, i.e. $x = 3$ and $x = 1$.

To verify that these are the solutions, we could substitute these values into the left-hand side of the equation and check that we get zero. Or, alternatively, we can verify our answer by solving this quadratic equation by factorising, i.e.

$$x^2 - 4x + 3 = 0 \implies (x - 1)(x - 3) = 0 \implies x = 1 \text{ and } x = 3,$$

as before.

Example 3.9 Solve the quadratic equation $-2x^2 + 8x + 10 = 0$ by completing the square.

We saw earlier that we can write $-2x^2 + 8x + 10$ as

$$-2x^2 + 8x + 10 = -2(x - 2)^2 + 18$$

in completed square form. This means that the quadratic equation we have to solve is the same as

$$-2(x - 2)^2 + 18 = 0.$$

This is easily rearranged to get

$$(x - 2)^2 = 9,$$

and then, if we take the square root of both sides, we get

$$x - 2 = \pm 3.$$


Hence, the solutions to our quadratic equation are given by $x = 2 \pm 3$, i.e. $x = 5$ and $x = -1$.

To verify that these are the solutions, we could substitute these values into the left-hand side of the equation and check that we get zero. Or, alternatively, we can verify our answer by solving this quadratic equation by factorising, i.e.

$$-2x^2 + 8x + 10 = 0 \implies x^2 - 4x - 5 = 0 \implies (x - 5)(x + 1) = 0 \implies x = 5 \text{ and } x = -1,$$

as before.
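The steps followed in Examples 3.7, 3.8 and 3.9 (complete the square, rearrange, take the square root of both sides) can be sketched in code. This is an illustrative sketch only: the function name `solve_by_completing_square` is hypothetical, and floating-point arithmetic is assumed to be accurate enough for these small examples.

```python
import math


def solve_by_completing_square(a, b, c):
    """Solve a*x**2 + b*x + c == 0 over the real numbers by completing the square."""
    if a == 0:
        raise ValueError("not a quadratic: the x^2 coefficient must be non-zero")
    # Completed square form: a*x^2 + b*x + c = a*(x + k)^2 + d.
    k = b / (2 * a)
    d = c - a * k**2
    # Rearranging a*(x + k)^2 + d = 0 gives (x + k)^2 = rhs.
    rhs = -d / a
    if rhs < 0:
        return []  # no real number has a negative square
    root = math.sqrt(rhs)
    # x + k = +root or -root; the set merges the two when root is zero.
    return sorted({-k - root, -k + root})


print(solve_by_completing_square(1, -4, 0))    # Example 3.7
print(solve_by_completing_square(1, -4, 3))    # Example 3.8
print(solve_by_completing_square(-2, 8, 10))   # Example 3.9
```

The function returns a list so that the number of entries tells us how many distinct solutions were found.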

Activity 3.6 In Examples 3.1 and 3.2, we solved the quadratic equations

$$x^2 - x - 6 = 0 \quad\text{and}\quad x^2 - 4x + 4 = 0$$

by factorising. Verify your answers by solving them by completing the square.

Activity 3.7 In Activity 3.3, you were asked to solve the quadratic equation $x^2 + 7x + 12 = 0$ by factorising. Verify your answer by solving it by completing the square.

3.1.4 Warning!

So far, we have looked at the solutions to several quadratic equations and we have found that there can be either two distinct solutions or one repeated solution. But this is not always the case! Consider the quadratic equation

$$ax^2 + bx + c = 0,$$

for some numbers a, b and c. When written in completed square form this will give us

$$a(x + p)^2 - q = 0,$$

for some numbers p and q. Now, this can be rearranged to get

$$(x + p)^2 = \frac{q}{a},$$

and we would then take the square root of both sides of this equation to find its solutions. But:

If $\frac{q}{a} > 0$, we will get two distinct [real] solutions, i.e. $x = -p \pm \sqrt{\frac{q}{a}}$.

If $\frac{q}{a} = 0$, we will get one repeated [real] solution, i.e. $x = -p \pm 0 = -p$ ‘twice’.

If $\frac{q}{a} < 0$, we will get no [real] solutions as the square root of a negative number does not exist!


Consequently, we can see that a quadratic equation can have two, one or no [real] solutions depending on what happens when we rearrange the completed square form.‡
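As a concrete instance of the ‘no solutions’ case, the lines below give a short worked sketch. It reuses the quadratic $x^2 + 6x + 10$ from Example 3.3, which the guide itself does not go on to solve, so the choice of example is ours.

```latex
\begin{align*}
x^2 + 6x + 10 = 0
  &\implies (x + 3)^2 + 1 = 0 \\
  &\implies (x + 3)^2 = -1 .
\end{align*}
```

Comparing with $a(x + p)^2 - q = 0$, we have $a = 1$, $p = 3$ and $q = -1$, so that $\frac{q}{a} = -1 < 0$ and there are no [real] solutions.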

We shall investigate the consequences of this observation in the following sections.

Activity 3.8 (Hard) By finding the completed square form of $ax^2 + bx + c$, show that when

$$ax^2 + bx + c = a(x + p)^2 - q,$$

the formulae

$$p = \frac{b}{2a} \quad\text{and}\quad q = \frac{b^2}{4a} - c$$

tell us the values of p and q.

3.1.5 The quadratic formula

Another way of solving quadratic equations is by using the quadratic formula, which is as follows.

Quadratic formula

The quadratic equation

$$ax^2 + bx + c = 0,$$

with $a \neq 0$, has solutions given by

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

This formula and its use should be familiar to everyone and so we will only give one example of its use.

Example 3.10 Solve the quadratic equation

$$3x^2 + 22x + 24 = 0,$$

by using the quadratic formula.

Comparing the quadratic equation

$$3x^2 + 22x + 24 = 0 \quad\text{with}\quad ax^2 + bx + c = 0,$$

we see that we have a = 3, b = 22 and c = 24. Putting these numbers into the quadratic formula

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},$$

then gives us

$$x = \frac{-22 \pm \sqrt{22^2 - 4(3)(24)}}{2(3)} = \frac{-22 \pm \sqrt{484 - 288}}{6} = \frac{-22 \pm \sqrt{196}}{6} = \frac{-22 \pm 14}{6},$$

‡We shall hear more about ‘real’ numbers in Unit 4.


so that taking the ‘+’ from the ‘±’ we have

$$x = \frac{-22 + 14}{6} = -\frac{8}{6} = -\frac{4}{3},$$

and taking the ‘−’ from the ‘±’ we have

$$x = \frac{-22 - 14}{6} = -\frac{36}{6} = -6.$$

That is, the solutions to this quadratic equation are $x = -\frac{4}{3}$ and $x = -6$.
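The arithmetic in Example 3.10 can be reproduced with a few lines of code. This is a hedged sketch, not the guide's method: the function name `quadratic_formula` is hypothetical, integer coefficients are assumed, and exact fractions are returned only when $b^2 - 4ac$ happens to be a perfect square (otherwise floating-point values are used).

```python
from fractions import Fraction
from math import isqrt, sqrt


def quadratic_formula(a, b, c):
    """Real solutions of a*x**2 + b*x + c == 0 for integers a, b, c with a != 0.

    Returns [] if there are none, otherwise the '+' solution followed by
    the '-' solution (these coincide for a repeated solution).
    """
    if a == 0:
        raise ValueError("not a quadratic: the x^2 coefficient must be non-zero")
    under_root = b * b - 4 * a * c
    if under_root < 0:
        return []
    r = isqrt(under_root)
    if r * r == under_root:
        # Perfect square under the root: keep the answers as exact fractions.
        return [Fraction(-b + r, 2 * a), Fraction(-b - r, 2 * a)]
    r = sqrt(under_root)
    return [(-b + r) / (2 * a), (-b - r) / (2 * a)]


# Example 3.10: 3x^2 + 22x + 24 = 0.
for solution in quadratic_formula(3, 22, 24):
    print(solution)
```

Using `Fraction` here means that a value such as −8/6 is automatically reduced to lowest terms, mirroring the simplification done by hand in the example.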

Activity 3.9 In Examples 3.7, 3.8 and 3.9, we solved the quadratic equations

$$x^2 - 4x = 0, \quad x^2 - 4x + 3 = 0 \quad\text{and}\quad -2x^2 + 8x + 10 = 0,$$

by completing the square. Use the quadratic formula to verify your answers.

You should also note that this formula comes from our method of solving quadratic equations by completing the square. Indeed, the conditions from Section 3.1.4 for two, one or no solutions can also be written as:

If $b^2 - 4ac > 0$, we will get two distinct [real] solutions from the quadratic formula.

If $b^2 - 4ac = 0$, we will get one repeated [real] solution from the quadratic formula.

If $b^2 - 4ac < 0$, we will get no [real] solutions from the quadratic formula.§

We also note in passing that the quantity $b^2 - 4ac$ is called the discriminant.
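The three conditions on the discriminant translate directly into a small test. The sketch below is illustrative only; the function name `count_real_solutions` is hypothetical, and the equations tried are the ones from Examples 3.1 and 3.2 together with $x^2 + 6x + 10 = 0$ (built from the quadratic in Example 3.3).

```python
def count_real_solutions(a, b, c):
    """Number of distinct real solutions of a*x**2 + b*x + c == 0, with a != 0."""
    if a == 0:
        raise ValueError("not a quadratic: the x^2 coefficient must be non-zero")
    discriminant = b * b - 4 * a * c
    if discriminant > 0:
        return 2  # two distinct solutions
    if discriminant == 0:
        return 1  # one repeated solution
    return 0      # no real solutions


print(count_real_solutions(1, -1, -6))  # x^2 - x - 6 = 0
print(count_real_solutions(1, -4, 4))   # x^2 - 4x + 4 = 0
print(count_real_solutions(1, 6, 10))   # x^2 + 6x + 10 = 0
```

Note that the test only needs the sign of the discriminant, so no square root has to be calculated to decide how many solutions there are.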

Activity 3.10 (Hard) Solve the quadratic equation $a(x + p)^2 - q = 0$ and then, using the results of Activity 3.8, derive the quadratic formula.

3.2 Parabolae

In Unit 2 we saw that if we had a linear equation in two variables, say $ax + by = c$, this represented a straight line. Indeed, we saw that oblique straight lines had equations of the form

$$y = mx + k,$$

with $m \neq 0$. We now turn our attention to the curves which are represented by equations of the form

$$y = ax^2 + bx + c,$$

where $a \neq 0$ so that we can be sure that we are dealing with a quadratic expression in x. Two such curves, called parabolae, are illustrated in Figure 3.1. Observe that, unlike straight lines, parabolae have a minimum, like the point with coordinates

§Again, this is because the square root of a negative number does not exist!


(2, −1) in Figure 3.1(a), or a maximum, like the point with coordinates (1, 4) in Figure 3.1(b). Points like these, where the curve ‘stops going down’ or ‘stops going up’, are called turning points.

[Figure 3.1 consists of two sketches on x and y axes: (a) the parabola $y = x^2 - 4x + 3$ and (b) the parabola $y = -x^2 + 2x + 3$.]

Figure 3.1: Two parabolae and their ‘key features’. (a) The parabola with equation $y = x^2 - 4x + 3$ has a minimum with coordinates (2, −1), the y-intercept is y = 3 and the x-intercepts are x = 1 and x = 3. (b) The parabola with equation $y = -x^2 + 2x + 3$ has a maximum with coordinates (1, 4), the y-intercept is y = 3 and the x-intercepts are x = −1 and x = 3.

3.2.1 Sketching parabolae

In Unit 2, we saw how to plot a straight line if we are given its equation. The idea in the case of plotting is that you calculate the coordinates of some number of points that satisfy the equation and join these points up to get the straight line. However, in this course, we will generally be sketching curves as opposed to plotting them. As its name may suggest, a sketch differs from a plot in that the former need only represent the ‘key features’ of a curve so that we can understand its ‘shape’ and how it is related to our axes (and, if necessary, other curves) whereas the latter will generally be a much more accurate drawing of it (which we could use, say, to infer values of certain quantities). Indeed, as we saw in Unit 2, the ‘key features’ which are needed to sketch a straight line are simply its x and y-intercepts.

In this section we shall see how to sketch parabolae and, in particular, we will see that the ‘key features’ of a parabola are its y-intercept, its x-intercepts (if any) and the coordinates of its maximum or minimum. Indeed, as instructive examples of how to do this, we will go through the calculations that would enable us to draw the sketches in Figure 3.1.


Example 3.11 Sketch the parabola whose equation is $y = x^2 - 4x + 3$.

We start by noting that, following on from Example 3.5, we can write the equation of this parabola as

$$y = (x - 2)^2 - 1,$$

in completed square form. This will enable us to find the ‘key features’ of the parabola as follows.

The y-intercept of the parabola occurs when x = 0 and so, substituting x = 0 into the original form of its equation, we get y = 3 as the y-intercept.

The x-intercepts of the parabola occur when y = 0 and so we have to solve the quadratic equation

$$x^2 - 4x + 3 = 0,$$

which, as we saw in Example 3.8, is easily done if we use the completed square form. Thus, as we saw there, the solutions to the quadratic equation above are x = 1 and x = 3, which means that these values of x are the x-intercepts.

The turning point of the parabola can be found by using the completed square form of its equation. In this case, as we know that $(x - 2)^2 \geq 0$ for all [real] values of x, we can see that

$$(x - 2)^2 \geq 0 \implies (x - 2)^2 - 1 \geq -1 \implies y \geq -1,$$

and so, as y must always be greater than or equal to −1, this must be the minimum value of y, which occurs when $(x - 2)^2 = 0$. Thus, the turning point is a minimum with coordinates (2, −1).

With this information, we can plot the ‘key features’ of the parabola on the axes and draw a nice parabolic shape through them to get the sketch in Figure 3.1(a).

Let’s now consider an example where we haven’t done most of the work before.

Example 3.12 Sketch the parabola whose equation is $y = -x^2 + 2x + 3$.

We start by finding the completed square form of the equation of the parabola. This can be found by writing

$$y = -x^2 + 2x + 3 = -\left[x^2 - 2x\right] + 3,$$

so that, because

$$x^2 - 2x + 1 = (x - 1)^2,$$

we get

$$y = -\left[(x - 1)^2 - 1\right] + 3 = -(x - 1)^2 + 1 + 3 = -(x - 1)^2 + 4,$$

in completed square form. This enables us to find the ‘key features’ of the parabola as follows.

The y-intercept of the parabola occurs when x = 0 and so, substituting x = 0 into the original form of its equation, we get y = 3 as the y-intercept.


The x-intercepts of the parabola occur when y = 0 and so we have to solve the quadratic equation

$$-x^2 + 2x + 3 = 0.$$

But, using the completed square form, this gives us

$$-(x - 1)^2 + 4 = 0 \implies (x - 1)^2 = 4 \implies x - 1 = \pm 2 \implies x = 1 \pm 2.$$

Thus, the solutions to the quadratic equation above are x = 3 and x = −1, which means that these values of x are the x-intercepts.

The turning point of the parabola can be found by using the completed square form of its equation. In this case, as we know that $(x - 1)^2 \geq 0$ for all [real] values of x, we can see that

$$-(x - 1)^2 \leq 0 \implies -(x - 1)^2 + 4 \leq 4 \implies y \leq 4,$$

and so, as y must always be less than or equal to 4, this must be the maximum value of y, which occurs when $(x - 1)^2 = 0$. Thus, the turning point is a maximum with coordinates (1, 4).

With this information, we can plot the ‘key features’ of the parabola on the axes and draw a nice parabolic shape through them to get the sketch in Figure 3.1(b).
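The calculations in Examples 3.11 and 3.12 follow the same pattern each time, so they can be collected into one routine. What follows is a minimal sketch under our own assumptions: the function name `key_features` and the dictionary layout are made up for illustration, and floating-point arithmetic is used throughout.

```python
import math


def key_features(a, b, c):
    """Key features of the parabola y = a*x**2 + b*x + c, with a != 0."""
    if a == 0:
        raise ValueError("not a parabola: the x^2 coefficient must be non-zero")
    # Completed square form: y = a*(x + k)^2 + d, so the turning point is (-k, d).
    k = b / (2 * a)
    d = c - a * k**2
    # x-intercepts: solve a*(x + k)^2 + d = 0, i.e. (x + k)^2 = -d/a.
    rhs = -d / a
    if rhs < 0:
        x_intercepts = []
    else:
        root = math.sqrt(rhs)
        x_intercepts = sorted({-k - root, -k + root})
    return {
        "y_intercept": c,  # the value of y when x = 0
        "x_intercepts": x_intercepts,
        "turning_point": (-k, d),
        "kind": "minimum" if a > 0 else "maximum",
    }


print(key_features(1, -4, 3))   # the parabola in Figure 3.1(a)
print(key_features(-1, 2, 3))   # the parabola in Figure 3.1(b)
```

The routine only reports the features; drawing ‘a nice parabolic shape’ through them is still left to the person making the sketch.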

One thing to note from what we have seen so far is the result of the following activity.

Activity 3.11 (Hard) From Activity 3.8, we know that

$$ax^2 + bx + c = a(x + p)^2 - q,$$

for certain values of p and q. Using this fact, explain why the turning point of the parabola

$$y = ax^2 + bx + c$$

will have coordinates (−p, −q) and why it will be

a minimum if a > 0, and

a maximum if a < 0.

In particular, observe how the sign of a determines whether the parabola has a maximum or a minimum.

Activity 3.12 Sketch the parabola whose equation is $y = -x^2 + 4x$.

3.2.2 Where do a parabola and a straight line intersect?

In Unit 2, we saw how to find the point where two straight lines intersected. We now consider how we would go about finding the point(s) where a line and a parabola intersect by looking at an example.


Example 3.13 Find the points of intersection of the parabola $y = x^2 - 4x + 3$ and the straight line $y = -x + 3$.

To find the points of intersection, we want to find the values of x that make the values of y from both equations the same, i.e. we seek the values of x that satisfy the equation

$$x^2 - 4x + 3 = -x + 3.$$

But, in this case, these are easily found because we can rearrange this to get a quadratic equation which is particularly easy to solve, namely

$$x^2 - 3x = 0 \implies x(x - 3) = 0 \implies x = 0 \text{ or } x = 3.$$

Now we know the values of x, we can substitute them back into either equation to get the corresponding values of y. So, as $y = -x + 3$ is the easier equation, we use this to get

$$x = 0 \implies y = -0 + 3 = 3 \quad\text{and}\quad x = 3 \implies y = -3 + 3 = 0.$$

Thus, the required points of intersection between the parabola and the straight line have coordinates (0, 3) and (3, 0) as illustrated by the ‘•’s in Figure 3.2(a).
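The method of Example 3.13 (set the two expressions for y equal, rearrange into a quadratic equation, solve, then substitute back) can be sketched as follows. The function name `intersect_parabola_line` is hypothetical, and the quadratic formula is used for the ‘solve’ step purely for convenience; the guide solves this particular equation by factorising.

```python
import math


def intersect_parabola_line(a, b, c, m, k):
    """Points where y = a*x**2 + b*x + c meets y = m*x + k, assuming a != 0."""
    if a == 0:
        raise ValueError("not a parabola: the x^2 coefficient must be non-zero")
    # Setting the two values of y equal and rearranging gives
    # a*x^2 + (b - m)*x + (c - k) = 0.
    B, C = b - m, c - k
    discriminant = B * B - 4 * a * C
    if discriminant < 0:
        return []  # the line misses the parabola
    root = math.sqrt(discriminant)
    xs = sorted({(-B - root) / (2 * a), (-B + root) / (2 * a)})
    # Substitute each x back into the (easier) straight line equation.
    return [(x, m * x + k) for x in xs]


# Example 3.13: y = x^2 - 4x + 3 and y = -x + 3.
print(intersect_parabola_line(1, -4, 3, -1, 3))
```

An empty list means that the line and the parabola do not meet, and a list with a single point means that they meet exactly once.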

Activity 3.13 Consider the parabola $y = -x^2 + 4x$ which we sketched in Activity 3.12. Find the point(s) of intersection (if any) of this parabola and the straight lines (a) $y = 2x + 1$ and (b) $y = 2x + 2$. Draw sketches of these curves to illustrate what you find.

3.2.3 Where do two parabolae intersect?

Following on from this, we can also see how we would go about finding the point(s) where two parabolae intersect by looking at an example.

Example 3.14 Find the points of intersection of the two parabolae $y = x^2 - 4x + 3$ and $y = -x^2 + 2x + 3$.

To find the points of intersection, we want to find the values of x that make the values of y from both equations the same, i.e. we seek the values of x that satisfy the equation

$$x^2 - 4x + 3 = -x^2 + 2x + 3.$$

But, in this case, these are easily found because we can rearrange this to get a quadratic equation which is particularly easy to solve, namely

$$2x^2 - 6x = 0 \implies x^2 - 3x = 0 \implies x(x - 3) = 0 \implies x = 0 \text{ or } x = 3.$$

Now we know the values of x, we can substitute them back into either equation to get the corresponding values of y. So, using $y = x^2 - 4x + 3$, we get

$$x = 0 \implies y = 0 - 0 + 3 = 3 \quad\text{and}\quad x = 3 \implies y = 9 - 12 + 3 = 0.$$


[Figure 3.2 consists of two sketches on x and y axes: (a) the parabola $y = x^2 - 4x + 3$ with the straight line $y = -x + 3$ and (b) the parabola $y = x^2 - 4x + 3$ with the parabola $y = -x^2 + 2x + 3$.]

Figure 3.2: Returning to the parabola $y = x^2 - 4x + 3$ first seen in Figure 3.1(a) we can see: (a) its two points of intersection with the straight line $y = -x + 3$ and (b) its two points of intersection with the parabola $y = -x^2 + 2x + 3$ first seen in Figure 3.1(b). In both cases, the points of intersection are indicated by ‘•’s.

Thus, the required points of intersection between the two parabolae have coordinates (0, 3) and (3, 0) as illustrated by the ‘•’s in Figure 3.2(b).
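For two parabolae, the same idea applies, but one extra case can arise that Example 3.14 does not meet: if both parabolae have the same $x^2$ coefficient, the $x^2$ terms cancel and we are left with a linear equation. The sketch below allows for this; the function name `intersect_parabolae` and the use of coefficient triples (a, b, c) are our own assumptions.

```python
import math


def intersect_parabolae(first, second):
    """Points where two parabolae meet; each is a triple (a, b, c) of coefficients."""
    a1, b1, c1 = first
    # Setting the two values of y equal and rearranging gives A*x^2 + B*x + C = 0.
    A, B, C = (u - v for u, v in zip(first, second))
    if A == 0 and B == 0:
        if C == 0:
            raise ValueError("the two equations describe the same curve")
        return []  # one curve is the other shifted up or down: they never meet
    if A == 0:
        xs = [-C / B]  # the x^2 terms cancelled, leaving a linear equation
    else:
        discriminant = B * B - 4 * A * C
        if discriminant < 0:
            return []
        root = math.sqrt(discriminant)
        xs = sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})
    # Substitute each x back into the first equation to get y.
    return [(x, a1 * x * x + b1 * x + c1) for x in xs]


# Example 3.14: y = x^2 - 4x + 3 and y = -x^2 + 2x + 3.
print(intersect_parabolae((1, -4, 3), (-1, 2, 3)))
```

As in the example, it does not matter which of the two equations is used to find the values of y, since both give the same value at a point of intersection.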

Activity 3.14 Consider the parabola $y = -x^2 + 4x$ which we sketched in Activity 3.12. Find the point(s) of intersection (if any) of this parabola and the parabolae (a) $y = x^2 + 2$ and (b) $y = x^2 + 3$. Draw sketches of these curves to illustrate what you find.

Learning outcomes

At the end of this unit, you should be able to:

write a simple quadratic in factorised form;

write any quadratic in completed square form;

solve quadratic equations by factorising, completing the square or using the quadratic formula;

identify the ‘key features’ of a parabola and use these to draw a sketch;

find the points of intersection of a parabola with a straight line or another parabola.


Exercises

Exercise 3.1

Multiply out the following brackets.

i. $(x + 1)(x + 2)$;

ii. $(x + 13)^2$;

iii. $(3x - 2)(5x + 3)$;

iv. $(3x - 5)(3x + 5)$;

v. $(x - 1)(x - 2)(x + 3)$;

vi. $(x - 3y)(2x + 4y)$.

Exercise 3.2

Factorise the following quadratic expressions.

i. $x^2 - x - 2$;

ii. $x^2 + 3x - 18$;

iii. $2x^2 + 2x - 12$;

iv. $-x^2 + x + 2$.

Exercise 3.3

Solve the following equations. Try factorising first and then completing the square. Use the quadratic formula only as a last resort!

i. $x^2 = 5$;

ii. $x^2 + 4x - 5 = 0$;

iii. $x^2 + 2x + 3 = 0$;

iv. $x^2 = -7x$;

v. $2x^2 + 5x = 3$;

vi. $5x^2 - 8x + 2 = 0$.

Exercise 3.4

For each of the following, complete the square and then sketch the graph.

i. $y = x^2 - 6x + 5$;

ii. $y = x^2 - 4x + 5$;

iii. $y = -x^2 - 6x + 6$;

iv. $y = 5x^2 - 4x - 1$.

In each case, you should determine the coordinates of the turning point and the x and y-intercepts.

Exercise 3.5

i. Sketch the parabola given by the equation $y = 6 - x - x^2$ and the straight line given by the equation $y = 2x + 4$ on the same set of axes.

ii. By solving the appropriate equation, find the points where the parabola and the straight line intersect.

iii. Sketch another line, parallel to the first, which only intersects the parabola once and calculate the y-intercept of this second line.


Exercise 3.6

A company sells its products in a market where the price, p, is linked to the quantity sold, q, by the demand equation $p = 120 - 2q$.

i. Calculate the market price, and the revenue, if the company sells 35 units. What is the revenue in terms of q?

The company incurs fixed costs of 400 and an additional cost of 12 for each unit produced.

ii. How much will it cost to produce 35 units? What is the total cost in terms of q?

iii. What profit will the company make from producing and selling 35 units? What is the profit in terms of q?

iv. By completing the square, calculate the number of units that will maximise the profit. What is the corresponding market price?


4. Functions

Unit 4: Functions

Overview

In this unit we introduce the idea of a function. This will play an important role in the rest of this course and, in particular, it bridges the gap between what we have seen so far and what we will see when we look at calculus.

Aims

The aims of this unit are as follows.

To introduce the idea of a function.

To introduce some common functions and look at their properties.

To see how functions can be combined and how they can be used in economics.

To introduce the idea of an inverse and see how it can be found.

Specific learning outcomes can be found near the end of this unit.

4.1 Functions

In this unit, we want to introduce the idea of a function which, at the most basic level, is just a rule that turns an input into an output. In particular, when we talk about inputs and outputs we mean numbers, or more specifically, real numbers. These can be thought of in several ways but, essentially, every number that can be written as a decimal is a real number. Alternatively, we can think of each real number as a point on a number line (and vice versa) as illustrated in Figure 4.1. Of course, in a way, we have already seen real numbers represented in this way as the x-axis is just a real number line which represents all the inputs a rule can have. And, similarly, if we think of the y-axis as another real number line which represents all the outputs a rule can have, we may start to appreciate that the curves we have been sketching in Units 2 and 3 are just ways of visualising how certain rules relate their inputs to their outputs.

[Figure 4.1 shows a number line marked with the integers from −3 to 3 and with the numbers $-\frac{1}{2}$, $\sqrt{2}$, e and π.]

Figure 4.1: The central portion of the real number line and some of the numbers on it. We will encounter the real numbers e and π shortly.


Now that we have an idea of what our inputs and outputs can be, let’s look more closely at the relationship between rules and functions.

4.1.1 What is a function?

A function is a rule that gives exactly one output for each input. If we represent the input by the variable x, and call the function f, we can then use f(x) (read ‘f of x’) to denote the corresponding output. In this way, it is sometimes convenient to think of a function as a machine (or ‘black box’) by writing

$$x \longrightarrow f \longrightarrow f(x),$$

as this indicates how each input, x, is ‘processed’ by the function f to give the output f(x). Indeed, observe that we can use any variable to represent the input and so, if we had used t instead of x, the output would be f(t) and we could write

$$t \longrightarrow f \longrightarrow f(t),$$

to indicate how each input, t, is ‘processed’ by the function f to give the output f(t). Once we have this notation, we can then capture the effect of any given function on each input by using an appropriate formula to express the rule.

Example 4.1 Let’s say that the rule we want to capture is ‘square the number and then add one’. This rule gives us a function, let’s call it f, which can be captured by the formula $f(x) = x^2 + 1$ which tells us how each input, x, is ‘processed’ by f to give the output f(x). In particular, we can see that, if x = 1, the output is $f(1) = 1^2 + 1 = 2$ whereas if the input was x = 2, the output would be $f(2) = 2^2 + 1 = 5$.

Notice also that this rule does define a function because every input, x, gives rise to exactly one output, namely whatever number $x^2 + 1$ turns out to be. And, indeed, if we had chosen to use the variable t instead of x we would now be using the formula $f(t) = t^2 + 1$ to capture the effect of this function.
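The ‘machine’ picture of a function carries over very directly to programming, where a function also turns an input into an output. As a small illustrative sketch (the guide itself contains no code), the rule from Example 4.1 could be written as:

```python
def f(x):
    """The rule 'square the number and then add one'."""
    return x**2 + 1


print(f(1))  # the output when the input is 1
print(f(2))  # the output when the input is 2
```

Just as in the example, the name given to the input is irrelevant: replacing `x` by `t` throughout the definition would give exactly the same function.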

Activity 4.1 Following on from Example 4.1, find the values of $f(0)$, $f(-1)$ and $f(\sqrt{2})$.

However, not all rules will give us a function. For instance, if we had the rule ‘take the square root of the number’, we find that:

Negative numbers do not have square roots and so this rule gives us no outputs when the input is a negative number, i.e. this rule cannot specify a function because we do not get at least one output for these inputs.

Positive numbers have two square roots and so this rule gives us two outputs when the input is a positive number, i.e. this rule cannot specify a function because we do not get at most one output for these inputs.

So, we can see that when looking at whether a rule can define a function, we may need to take some care when specifying what the inputs are and whether the rule itself actually satisfies the ‘exactly one output’ requirement.
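To see the ‘exactly one output’ requirement fail, we can write the square root rule so that it reports every output it produces. This is only a sketch under our own assumptions: the name `square_roots` is hypothetical, and the case of zero (which has just one square root) is not discussed in the text above.

```python
import math


def square_roots(x):
    """Return a list of every real number whose square is x."""
    if x < 0:
        return []           # negative numbers: no outputs at all
    if x == 0:
        return [0.0]        # zero: exactly one output
    root = math.sqrt(x)     # math.sqrt gives the non-negative root only
    return [-root, root]    # positive numbers: two outputs


for value in (-4, 9):
    outputs = square_roots(value)
    print(value, outputs, "one output" if len(outputs) == 1 else "not exactly one output")
```

Note that `math.sqrt` on its own does behave like a function on the non-negative numbers because it has been designed to return only the non-negative root, which is one way of taking the ‘care’ mentioned above.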


In what follows, we will look at some of the most common functions that occur in mathematics and we will also see how these functions can be combined to make new functions.

4.1.2 Some common functions

We have already encountered several functions in this course. For instance, we have seen

constant functions: which take the form f(x) = k for some constant k,

linear functions: which take the form $f(x) = ax + b$ for some constants $a \neq 0$ and b,

quadratic functions: which take the form $f(x) = ax^2 + bx + c$ for some constants $a \neq 0$, b and c.

In particular, we know what all of these functions look like because we saw how to sketch them in Units 2 and 3. More generally, these are examples of polynomial functions because they take the form

\[ f(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0, \]

for some constants $a_n, a_{n-1}, \ldots, a_1, a_0$. Indeed, if $x^n$ is the highest power in the polynomial, we say that it has degree n so that constant, linear and quadratic functions are polynomials of degree zero, one and two respectively. What do polynomial functions look like? We will be able to answer this question more thoroughly when we look at curve sketching in Unit 7.
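As a rough illustration of the general polynomial form, the following Python sketch builds a polynomial function from its list of coefficients. The names `poly` and `degree` are hypothetical, and the sketch assumes the coefficients are listed from the highest power down with a non-zero leading coefficient.

```python
def poly(coeffs):
    """Build f(x) = a_n x^n + ... + a_1 x + a_0 from the list [a_n, ..., a_1, a_0]."""
    def f(x):
        total = 0
        for a in coeffs:
            total = total * x + a   # nested multiplication (Horner's rule)
        return total
    return f

def degree(coeffs):
    """Degree of the polynomial, assuming the leading coefficient is non-zero."""
    return len(coeffs) - 1

constant = poly([5])            # f(x) = 5
linear = poly([2, 3])           # f(x) = 2x + 3
quadratic = poly([1, 0, -4])    # f(x) = x^2 - 4

print(constant(10), linear(4), quadratic(3))
print(degree([5]), degree([2, 3]), degree([1, 0, -4]))
```

The same builder handles the constant, linear and quadratic cases, which is one way of seeing that they are all polynomials.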

Now, however, we want to introduce some new functions and get some idea of what they look like.

Exponential functions

Given a positive number $a \neq 1$, called the base, an exponential function has the form

\[ f(x) = a^x, \]

and, depending on whether 0 < a < 1 or a > 1, they give us curves like the ones illustrated in Figure 4.2. In particular, observe that $a^x \neq 0$ for all values of x.

The most important exponential function occurs when the base is the number e which is approximately 2.71828 (5dp). We will encounter this function, $e^x$, and see some reasons why it is so special in Units 5 and 9.
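To get a feel for the two cases, the following Python sketch tabulates an exponential function with base 2 and one with base 0.5. The helper name `exponential` and the two bases are assumptions chosen purely for illustration.

```python
import math

def exponential(a):
    """Return the function f(x) = a**x for a positive base a different from 1."""
    if a <= 0 or a == 1:
        raise ValueError("the base must be positive and different from 1")
    return lambda x: a ** x

grow = exponential(2)      # a > 1: values increase as x increases
decay = exponential(0.5)   # 0 < a < 1: values decrease as x increases

for x in (-2, -1, 0, 1, 2):
    print(x, grow(x), decay(x))

print(round(math.e, 5))    # the number e to 5dp
```

In both columns the values stay positive and both functions take the value 1 at x = 0.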

Sine and cosine functions

Two other functions that we will be interested in are the sine and cosine functions. You are probably familiar with these from their use in problems involving triangles since we know that

\[ \sin\theta = \frac{\text{opposite}}{\text{hypotenuse}} \quad \text{and} \quad \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}, \]

using the right-angled triangle illustrated in Figure 4.3.

Figure 4.2: The exponential function $y = a^x$ when (a) a > 1 and (b) 0 < a < 1. [Two panels, each with axes x and y and origin O; in both panels the curve crosses the y-axis at 1.]

Figure 4.3: The sine and cosine functions can be defined in terms of the sides of a right-angled triangle. [A right-angled triangle with angle θ and sides labelled hypotenuse, adjacent and opposite.]

In this course, however, when we talk about angles, we will measure them in radians and not degrees. The basic idea here is that π radians, where the number π is approximately 3.142 (3dp), is the same as 180 degrees and, using this, we can convert angles in degrees to angles in radians using the formula

\[ \text{angle in radians} = \frac{\pi}{180} \times \text{angle in degrees}. \]
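The conversion formula can be sketched in Python as follows. The function name `to_radians` is hypothetical; Python's standard library also offers `math.radians`, which is used here only as a cross-check.

```python
import math

def to_radians(degrees):
    """Convert an angle in degrees to radians using pi/180 times the angle."""
    return math.pi / 180 * degrees

for degrees in (30, 45, 60, 90, 180):
    # Print our conversion next to the standard library's for comparison.
    print(degrees, to_radians(degrees), math.radians(degrees))
```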

So, if we use the triangles in Figure 4.4 to determine the most important values of these functions, we would get the results given in the following table.

θ in degrees    θ in radians    sin θ    cos θ
30              π/6             1/2      √3/2
45              π/4             1/√2     1/√2
60              π/3             √3/2     1/2

Activity 4.2 Verify that the results in the table are correct.

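Figure 4.4: The triangles which allow us to find sin θ and cos θ when (a) θ = π/6 or θ = π/3 and (b) θ = π/4. [Triangle (a) is right-angled with sides 1, √3 and 2 and angles π/6 and π/3; triangle (b) is right-angled with sides 1, 1 and √2 and two angles of π/4.]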

More generally, as illustrated in Figure 4.5, we find that these functions are periodic with a period of 2π radians, a fact that we could express mathematically by writing

\[ \sin(x + 2\pi) = \sin x \quad \text{and} \quad \cos(x + 2\pi) = \cos x, \]

and we can also see that the cosine function is just the sine function ‘shifted to the left’ by π/2 radians, a fact that we could express mathematically by writing

\[ \cos x = \sin\left(x + \frac{\pi}{2}\right). \]
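These two identities can be checked numerically, at least at a handful of sample points. The sketch below is only a spot check under floating-point arithmetic (hence the small tolerance), not a proof; the sample points are an arbitrary choice.

```python
import math

# Sample points from -pi to 4*pi in steps of pi/12.
xs = [k * math.pi / 12 for k in range(-12, 49)]

# Periodicity: adding 2*pi to the input leaves sine and cosine unchanged.
periodic = all(
    math.isclose(math.sin(x + 2 * math.pi), math.sin(x), abs_tol=1e-12)
    and math.isclose(math.cos(x + 2 * math.pi), math.cos(x), abs_tol=1e-12)
    for x in xs
)

# Shift: cosine is sine with the input increased by pi/2.
shifted = all(
    math.isclose(math.cos(x), math.sin(x + math.pi / 2), abs_tol=1e-12)
    for x in xs
)

print(periodic, shifted)
```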

Observe, in particular, that some other important values of these functions are given in the following table.

θ in degrees    θ in radians    sin θ    cos θ
0               0               0        1
90              π/2             1        0
180             π               0        −1

Figure 4.5: The sine and cosine functions, y = sin x and y = cos x, for −π ≤ x ≤ 4π. Notice, in particular, that they are both periodic with period 2π and that the cosine function is just the sine function ‘shifted to the left’ by π/2 radians.

Activity 4.3 What are sin x and cos x when x is 3π/2? What about when x is 2π?

4.1.3 Combinations of functions

It is also possible to combine functions in certain ways to get new functions and this generally works in the obvious way. For instance, if we have a function, f, and a constant, k, we can get the new function kf, which is called a constant multiple of f, by using the rule

\[ (kf)(x) = k \cdot f(x), \]

and, similarly, if we have two functions, f and g, we can get the new function f + g, which is called the sum of f and g, by using the rule

\[ (f + g)(x) = f(x) + g(x). \]

This may sound a bit abstract, but the following example should make it clear.


Example 4.2 Suppose that the functions f and g are given by the formulae

\[ f(x) = x^2 - 4 \quad \text{and} \quad g(x) = e^x. \]

In this case, the function 3f would be given by the formula

\[ (3f)(x) = 3 \cdot f(x) = 3(x^2 - 4) = 3x^2 - 12, \]

i.e. it is just three times f(x), whereas the function f + g would be given by the formula

\[ (f + g)(x) = f(x) + g(x) = x^2 - 4 + e^x, \]

i.e. it is just the sum of f(x) and g(x).

Indeed, if we have two functions, f and g, and two constants, k and l, we can get the new function kf + lg, called a linear combination of f and g, by using the rule

\[ (kf + lg)(x) = k \cdot f(x) + l \cdot g(x), \]

which should be fairly obvious given the two rules above.

Example 4.3 Following on from Example 4.2, the function 2f − g would be given by the formula

\[ (2f - g)(x) = 2f(x) + (-1)g(x) = 2(x^2 - 4) + (-1)e^x = 2x^2 - 8 - e^x, \]

as we can think of 2f − g as 2f + (−1)g.
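The linear combination rule can be sketched in Python by treating functions as values that can be passed around. The helper name `linear_combination` is hypothetical; f and g are the functions from Example 4.2.

```python
import math

def linear_combination(k, f, l, g):
    """Return the function kf + lg, defined by (kf + lg)(x) = k*f(x) + l*g(x)."""
    return lambda x: k * f(x) + l * g(x)

def f(x):
    return x**2 - 4

def g(x):
    return math.exp(x)

three_f = linear_combination(3, f, 0, g)         # constant multiple 3f
f_plus_g = linear_combination(1, f, 1, g)        # sum f + g
two_f_minus_g = linear_combination(2, f, -1, g)  # 2f - g, i.e. 2f + (-1)g

# Compare against the formulae found by hand in Examples 4.2 and 4.3.
x = 1.0
print(three_f(x), 3 * x**2 - 12)
print(f_plus_g(x), x**2 - 4 + math.exp(x))
print(two_f_minus_g(x), 2 * x**2 - 8 - math.exp(x))
```

Choosing l = 0 or k = l = 1 recovers the constant multiple and sum rules as special cases of the same helper.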

Activity 4.4 Following on from Example 4.3, find the formulae for the functions $-f$, $\sqrt{2}\,g$, $f - g$ and $-9f + 2g$.

Activity 4.5 Explain how the linear combination rule can be obtained from the constant multiple and sum rules.

If f and g are functions, write down the rule which would give us the new function f − g, called the difference of f and g.

Products and quotients

Two other ways of combining functions are products and quotients. The former, as its name may suggest, is what we get when we have two functions, f and g, and we multiply them together to get the new function f · g, called the product of f and g, by using the rule

\[ (f \cdot g)(x) = f(x) \cdot g(x). \]

Similarly, if we divide f by g we get the new function f/g, called the quotient of f and g, by using the rule

\[ \left(\frac{f}{g}\right)(x) = \frac{f(x)}{g(x)}. \]


It is important to observe, however, that this last rule can only be used at values of x where $g(x) \neq 0$. In particular, if g(x) = 0 at some value of x, the new function f/g is undefined at that value of x because division by zero is never allowed.

Example 4.4 Following on from Example 4.2, the function f · g would be given by the formula

\[ (f \cdot g)(x) = f(x) \cdot g(x) = (x^2 - 4)\,e^x, \]

i.e. it is just f times g, whereas the function f/g would be given by the formula

\[ \left(\frac{f}{g}\right)(x) = \frac{f(x)}{g(x)} = \frac{x^2 - 4}{e^x}, \]

i.e. it is just f divided by g. Notice, in particular, that this quotient is defined for all values of x because $e^x$ is never equal to zero.
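The product and quotient rules can be sketched in the same style. The helper names `product` and `quotient` are hypothetical, and the function `shifted` is an extra assumption introduced only to show what happens when the denominator is zero.

```python
import math

def product(f, g):
    """Return the function f.g, defined by (f.g)(x) = f(x) * g(x)."""
    return lambda x: f(x) * g(x)

def quotient(f, g):
    """Return the function f/g, which is undefined wherever g(x) = 0."""
    def h(x):
        denominator = g(x)
        if denominator == 0:
            raise ValueError("f/g is undefined where g(x) = 0")
        return f(x) / denominator
    return h

def f(x):
    return x**2 - 4

def g(x):
    return math.exp(x)

def shifted(x):
    return x - 1               # hypothetical denominator that is zero at x = 1

print(product(f, g)(1.0))      # (1 - 4) * e
print(quotient(f, g)(1.0))     # (1 - 4) / e

try:
    quotient(f, shifted)(1.0)  # division by zero at x = 1
except ValueError as error:
    print(error)
```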

Activity 4.6 Following on from Example 4.4, verify that the function g · f is the same as the function f · g.

Find the formula for the function g/f . For which inputs is this function not defined?

Compositions

The last way of combining functions that we will consider is their composition. If we have two functions, f and g, then we can get the new function f ◦ g, which is the composition we get when we apply f after applying g, by using the rule

\[ (f \circ g)(x) = f(g(x)), \]

provided that it makes sense to apply the rule for f to each output, g(x), of g. Indeed, to see what is happening here, it is useful to think of these functions as machines again so that we can represent this composition as

\[ x \longrightarrow g \longrightarrow g(x) \longrightarrow f \longrightarrow f(g(x)), \]

and, in this way, we see that it can only make sense if g(x) is giving us an input for f that allows us to get its output f(g(x)).

Of course, in a similar manner, we can also get the new function g ◦ f, which is the composition we get when we apply g after applying f, by using the rule

\[ (g \circ f)(x) = g(f(x)), \]

provided that it makes sense to apply the rule for g to each output, f(x), of f. In this case, we could represent the composition as

\[ x \longrightarrow f \longrightarrow f(x) \longrightarrow g \longrightarrow g(f(x)), \]

and, in this way, we see that it can only make sense if f(x) is giving us an input for g that allows us to get its output g(f(x)). In particular, observe that the functions f ◦ g and g ◦ f are usually different as we can see in the next example.


Example 4.5 Following on from Example 4.2, the function f ◦ g would be given by the formula

\[ (f \circ g)(x) = f(g(x)) = f(e^x) = (e^x)^2 - 4 = e^{2x} - 4, \]

whereas the function g ◦ f would be given by the formula

\[ (g \circ f)(x) = g(f(x)) = g(x^2 - 4) = e^{x^2 - 4}. \]

Notice, in particular, that these functions are not the same!
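The ‘machine’ picture of composition translates directly into a short Python sketch. The helper name `compose` is hypothetical; f and g are the functions from Example 4.2.

```python
import math

def compose(f, g):
    """Return f after g, i.e. the function that sends x to f(g(x))."""
    return lambda x: f(g(x))

def f(x):
    return x**2 - 4

def g(x):
    return math.exp(x)

f_after_g = compose(f, g)   # by hand: e^(2x) - 4
g_after_f = compose(g, f)   # by hand: e^(x^2 - 4)

x = 1.0
print(f_after_g(x), math.exp(2 * x) - 4)
print(g_after_f(x), math.exp(x**2 - 4))
```

Evaluating both compositions at the same input gives different numbers, which is a quick way of seeing that the order of composition matters here.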

Activity 4.7 Suppose that the functions f and g are given by the formulae

\[ f(x) = x - 1 \quad \text{and} \quad g(x) = \sqrt{x}. \]

Find the formulae for the compositions f ◦ g and g ◦ f. For which inputs is the latter function not defined?

A last word on combinations of functions

So far, we have seen how to combine certain functions in different ways to get new functions. However, when we come to look at calculus, we will also need to do this ‘in reverse’, i.e. we will need to be able to look at a function and see how it has been constructed by combining other, simpler functions. This is usually quite straightforward as illustrated in the following example.

Example 4.6 The function given by

\[ e^x \sin x, \]

is the product, f · g, of the functions f and g where

\[ f(x) = e^x \quad \text{and} \quad g(x) = \sin x, \]

whereas the function

\[ (x^2 + 1)^2, \]

is the composition, f ◦ g, of the functions f and g where

\[ f(x) = x^2 \quad \text{and} \quad g(x) = x^2 + 1, \]

since it is just f(g(x)).

Activity 4.8 Find two functions f and g that can be combined to get the functions

(i) $x^2 e^x$, (ii) $\dfrac{x^2}{e^x}$, (iii) $e^{2x}$, (iv) $e^{2x} + 3e^x + 1$.

In each case, also indicate the kind of combination that you have found.


4.1.4 Functions in economics

Functions are widely used in economics and one particularly important example occurs when we consider supply and demand like we did in Section 2.3.3. In particular, if we have a supply equation which can be written in the form $q = q_S(p)$, then we call $q_S$ the supply function whereas if we have a demand equation which can be written in the form $q = q_D(p)$, then we call $q_D$ the demand function. With these functions, we can then see that the equilibrium price, i.e. the price that makes the quantity supplied equal to the quantity demanded, can be found by solving the equation

\[ q_S(p) = q_D(p), \]

and then we can use either of these functions to find the corresponding equilibrium quantity.
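As a rough sketch of this procedure, suppose the supply and demand functions are both linear. The particular functions below are made-up numbers chosen for illustration only (they are not taken from any activity in this guide), and `equilibrium_price` is a hypothetical helper that solves a + bp = c − dp for p.

```python
def supply(p):
    """Hypothetical supply function q_S(p) = -10 + 2p."""
    return -10 + 2 * p

def demand(p):
    """Hypothetical demand function q_D(p) = 50 - p."""
    return 50 - p

def equilibrium_price(a, b, c, d):
    """Solve a + b*p = c - d*p for p, assuming b + d is not zero."""
    return (c - a) / (b + d)

p_star = equilibrium_price(-10, 2, 50, 1)
print(p_star)                              # equilibrium price
print(supply(p_star), demand(p_star))      # either function gives the equilibrium quantity
```

Evaluating both functions at the equilibrium price is a useful check, since they should agree.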

Activity 4.9 In Activity 2.12, supply was given by the equation 3q = 25 + 7p and demand was given by the equation 2q + 5p = 500. Find the supply and demand functions.

Use these functions to find the equilibrium price and quantity.

We will also encounter other functions of economic significance. For instance, if a company manufactures an amount, q, of some product then the money it makes from selling this amount is given by its revenue function, R(q), whereas the money spent on producing this amount is given by its cost function, C(q). The difference between these two functions then gives us the firm’s profit function,

\[ \pi(q) = R(q) - C(q), \]

which, for a given value of q, may be positive or negative meaning that the firm is making a profit or a loss respectively.∗

Activity 4.10 A company sells each unit of its product for £4. What is its revenue function?

If its profit function is given by $\pi(q) = -q^2 + 6q - 4$, what is its cost function?

4.2 Inverse functions

We have seen that a function, f, is a rule that gives exactly one output for each input and we can think of this by writing

\[ x \longrightarrow f \longrightarrow f(x). \]

Now, we want to consider the circumstances under which we can ‘reverse’ this process. That is, under what circumstances can we find a function, which we will call $f^{-1}$, whose job can be thought of by writing

\[ x \longleftarrow f^{-1} \longleftarrow f(x). \]

∗Notice, in particular, that we use the Greek letter ‘π’ to denote the profit function as we are already using ‘p’ to denote prices.

In particular, if we can find such a function, called the inverse of f, we see that it takes the original outputs, f(x), as inputs and gives us the corresponding original inputs, x, as outputs.† Indeed, we will find that some functions have an inverse whereas others do not, unless we take some care with the inputs and outputs that we are considering.

Another thing to notice is that, if an inverse function exists, then the composition of a function and its inverse gives us a function which takes an input and gives us this very same input as its output. To see this, consider that the composition $f^{-1} \circ f$ can be represented as

\[ x \longrightarrow f \longrightarrow f(x) \longrightarrow f^{-1} \longrightarrow x, \]

and so we should always find that, if the inverse exists,

\[ (f^{-1} \circ f)(x) = f^{-1}(f(x)) = x, \]

whereas the composition $f \circ f^{-1}$ can be represented as

\[ y \longrightarrow f^{-1} \longrightarrow f^{-1}(y) \longrightarrow f \longrightarrow y, \]

and so we should always find that, if the inverse exists,

\[ (f \circ f^{-1})(y) = f(f^{-1}(y)) = y. \]

In particular, notice that this is one of the few cases where the composition gives us the same function regardless of the order in which we perform the composition. As we shall see, the fact that the composition of a function and its inverse must behave in this way will provide us with a useful way of verifying that we have found the correct formula for an inverse function!

4.2.1 Finding inverse functions

Suppose that we have a function, f, and given an input x, we take y = f(x) to be the output. Written in this form, the inputs are related to the outputs by the equation y = f(x) which tells us y in terms of x. To find the inverse function, we need to find x in terms of y and, if this gives us exactly one value of x for each value of y that we are considering, then we have found the inverse function, $f^{-1}$. In particular, when written in this new form, we now have the equation $x = f^{-1}(y)$ and so we can identify the formula for $f^{-1}$. Let’s look at an example.

Example 4.7 Suppose that the function f is given by the formula

\[ f(x) = 2x + 3. \]

If we set y = f(x), this gives us

\[ y = 2x + 3, \]

as the equation which relates the inputs, x, of f to its outputs, y. Rearranging this to find x in terms of y, we find that

\[ y = 2x + 3 \implies 2x = y - 3 \implies x = \frac{y - 3}{2}, \]

and this equation now relates the outputs, y, of f to its inputs, x. Moreover, as each value of y will give exactly one value of x, the inverse function exists and so, thinking of this equation as $x = f^{-1}(y)$, we can then deduce that

\[ f^{-1}(y) = \frac{y - 3}{2}, \]

is the inverse of f.

In particular, notice that we can verify that this is correct by noting that

\[ (f^{-1} \circ f)(x) = f^{-1}(f(x)) = f^{-1}(2x + 3) = \frac{(2x + 3) - 3}{2} = \frac{2x}{2} = x, \]

and, indeed, that

\[ (f \circ f^{-1})(y) = f(f^{-1}(y)) = f\left(\frac{y - 3}{2}\right) = 2\left(\frac{y - 3}{2}\right) + 3 = (y - 3) + 3 = y, \]

as we should expect given our discussion above.

†Note that we are using ‘$f^{-1}$’ to denote a new function which does a specific job and it is not to be thought of as ‘f to the power of −1’! In particular, if we wanted to think of ‘f to the power of −1’ we would surely mean ‘1/f(x)’ which we would call the ‘reciprocal of f’ or ‘1/f’. As such, the inverse of a function, say $f^{-1}$, and its reciprocal, say 1/f, are two completely different things!
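The verification in Example 4.7 can also be spot-checked numerically. The sketch below uses the name `f_inverse` for the formula found in the example and tries both compositions on a few arbitrarily chosen inputs; it is a sanity check rather than a proof.

```python
def f(x):
    return 2 * x + 3

def f_inverse(y):
    """The inverse found in Example 4.7: (y - 3) / 2."""
    return (y - 3) / 2

# Applying f and then its inverse should return the original input ...
for x in (-2, 0, 1.5, 10):
    print(x, f_inverse(f(x)))

# ... and so should applying the inverse and then f.
for y in (-1, 0.5, 3, 7):
    print(y, f(f_inverse(y)))
```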

Indeed, thinking back to our discussion of supply and demand functions in Section 4.1.4, we may be able to find their inverses. That is, if we have a supply equation which can be written in the form $p = p_S(q)$, then we call $p_S$ the inverse supply function as it is just $q_S^{-1}$ whereas if we have a demand equation which can be written in the form $p = p_D(q)$, then we call $p_D$ the inverse demand function as it is just $q_D^{-1}$. With these functions, we can then see that the equilibrium quantity, i.e. the quantity that makes the suppliers’ price equal to the consumers’ price, can be found by solving the equation

\[ p_S(q) = p_D(q), \]

and then we can use either of these functions to find the corresponding equilibrium price.

Activity 4.11 Following on from Activity 4.9, where supply was given by the equation 3q = 25 + 7p and demand was given by the equation 2q + 5p = 500, find the inverse supply and demand functions.

Use these functions to find the equilibrium price and quantity.

What can go wrong?

However, situations where inverses do not exist are not hard to find as the next example shows.


Example 4.8 Suppose that the function, f, is given by the formula

\[ f(x) = x^2. \]

If we set y = f(x), this gives us

\[ y = x^2 \]

as the equation which relates the inputs, x, of f to its outputs, y. Rearranging this to find x in terms of y, we find that

\[ y = x^2 \implies x = \pm\sqrt{y}, \]

and this equation now relates the outputs, y, of f to its inputs, x. However, we cannot use this to define an inverse function because we have two problems:

If the output, y, of f is negative this equation gives us no value for the corresponding input, x.

If the output, y, of f is positive this equation gives us two values for the corresponding input, x.

And, of course, we need to get exactly one value of x for each value of y in order to define an inverse function!

Although, having said this, if we take some care with the inputs and outputs that we are considering, then we can often overcome such problems and find an inverse function as the next example shows.

Example 4.9 Suppose that the function, f, is given by the formula

\[ f(x) = x^2 \]

and we only want to consider values of x that are positive, i.e. we have x > 0. If we set y = f(x), this gives us

\[ y = x^2 \]

as the equation which relates the inputs, x, of f to its outputs, y. In particular, as $x^2 > 0$ for all values of x > 0, this can only give us outputs, y, that are positive, i.e. we have y > 0. Rearranging this equation to find x in terms of y, we again find that

\[ y = x^2 \implies x = \pm\sqrt{y}, \]

and this equation again relates the outputs, y, of f to its inputs, x. But now we can find an inverse function since y > 0 means that we can always find a value for $\sqrt{y}$ and x > 0 means that we are only interested in the positive value, $+\sqrt{y}$, we get from the square root, i.e. we can reject the problematic $-\sqrt{y}$ as we know that the values of x that we started with are positive. That is, since f only takes positive numbers as inputs and only gives positive numbers as outputs, we must have

\[ x = \sqrt{y}, \]

for all allowed values of x and y. Consequently, as each allowed value of y will give exactly one allowed value of x, the inverse function exists and so, thinking of this equation as $x = f^{-1}(y)$, we can then deduce that

\[ f^{-1}(y) = \sqrt{y}, \]

is the inverse of f.
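The role played by the restriction x > 0 can be seen in a small Python sketch. The name `f_inverse` stands for the formula found in Example 4.9, and the inputs 3 and −3 are arbitrary choices made for illustration.

```python
import math

def f(x):
    return x**2

def f_inverse(y):
    """The inverse from Example 4.9, which only accepts the allowed outputs y > 0."""
    if y <= 0:
        raise ValueError("only positive values of y are allowed")
    return math.sqrt(y)

print(f_inverse(f(3)))    # an allowed input: we get the input back
print(f_inverse(f(-3)))   # not an allowed input: we do not get -3 back
```

The second line shows why the restriction matters: for an input outside the allowed range, this formula does not recover the original input.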

Activity 4.12 Following on from Example 4.9, verify that the inverse is correct by showing that

\[ (f^{-1} \circ f)(x) = x \quad \text{and} \quad (f \circ f^{-1})(y) = y, \]

as we should expect.

Activity 4.13 Following on from Example 4.9, suppose that the function f is again given by the formula $f(x) = x^2$, but now we only want to consider values of x that are negative. Does f have an inverse? If it does, what is it?

If you do find an inverse, verify that

\[ (f^{-1} \circ f)(x) = x \quad \text{and} \quad (f \circ f^{-1})(y) = y, \]

as we should expect. (Take care here: Remember that x < 0!)

Furthermore, in some cases, this method for finding an inverse function just does not work as we have no useful algebraic way of ‘rearranging’ the relevant equation. Instead, we often have to define an entirely new, but related, function to do the job. This is what happens, for instance, when we have an exponential function of the form

\[ f(x) = a^x, \]

whose inverse is given by an appropriate logarithm. Indeed, as well as giving us the inverse of exponential functions, logarithmic functions are useful in their own right. As such, we now introduce logarithms as the last of our common functions, and take a look at their special properties.

4.2.2 Logarithms

Logarithms are defined as follows.

Logarithms

If $a \neq 1$ is a positive number (called the base) and x is a positive number (called the argument) then the logarithm of x to base a, denoted by $\log_a(x)$, is the number b such that $a^b = x$. That is,

\[ a^b = x \quad \text{means exactly the same thing as} \quad b = \log_a(x). \]

In particular, it is always the case that $a^{\log_a(x)} = x$.

For example, we can use this definition to see that as

\[ 2^2 = 4, \quad 2^1 = 2, \quad 2^0 = 1 \quad \text{and} \quad 2^{-1} = \frac{1}{2}, \]

we must have

\[ \log_2(4) = 2, \quad \log_2(2) = 1, \quad \log_2(1) = 0 \quad \text{and} \quad \log_2\left(\frac{1}{2}\right) = -1, \]

respectively. Notice, in particular, that even though the base and the argument of a logarithm must be positive, the logarithm itself can be negative.
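These base-2 values can be reproduced with Python's standard `math.log2` function, as in the following minimal sketch. For each argument x it prints the logarithm b and then recomputes $2^b$ to confirm that we get x back.

```python
import math

for x in (4, 2, 1, 0.5):
    b = math.log2(x)        # the number b for which 2**b equals x
    print(x, b, 2 ** b)     # argument, logarithm, and 2 raised to that logarithm
```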

Activity 4.14 Suppose that $a \neq 1$ is any positive number. Explain why the following results are true.

i. $\log_a(1) = 0$, ii. $\log_a(a) = 1$, iii. $\log_a\left(\frac{1}{a}\right) = -1$, iv. $\log_a(a^b) = b$.

Activity 4.15 Suppose that, for some positive number, $a \neq 1$, we have the function

\[ f(x) = a^x. \]

Explain why the inverse of f is given by $f^{-1}(y) = \log_a(y)$ as long as y is a positive number.

Why does the inverse of f not exist if a = 1?

The laws of logarithms

As logarithms to base a are closely related to powers of a, the power laws that we saw in Unit 1 allow us to deduce the laws of logarithms. These are as follows.

The laws of logarithms

Logarithms obey some simple laws:

The multiplication law:

\[ \log_a(x \cdot y) = \log_a(x) + \log_a(y), \]

which follows from the fact that $a^u \cdot a^v = a^{u+v}$.

The division law:

\[ \log_a\left(\frac{x}{y}\right) = \log_a(x) - \log_a(y), \]

which follows from the fact that $a^u / a^v = a^{u-v}$.

The power law:

\[ \log_a(x^y) = y \log_a(x), \]

which follows from the fact that $(a^u)^v = a^{u \cdot v}$.

Notice that, when using these laws, all the logarithms have the same base.

Example 4.10 From the examples above we know that

\[ \log_2(4) + \log_2(2) = 2 + 1 = 3. \]

We can verify this by using the laws of logarithms by noting that

\begin{align*}
\log_2(4) + \log_2(2) &= \log_2(4 \cdot 2) && \text{using the multiplication law} \\
&= \log_2(2^3) && \text{as } 4 \cdot 2 = 2^2 \cdot 2 = 2^3 \\
&= 3 \log_2(2) && \text{using the power law} \\
&= 3 && \text{as } \log_2(2) = 1
\end{align*}

as before.
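The three laws can be spot-checked numerically with the numbers from Example 4.10. The helper name `log_a` is hypothetical and wraps Python's two-argument `math.log`; because of floating-point rounding, the comparisons use `math.isclose` rather than exact equality.

```python
import math

def log_a(a, x):
    """Logarithm of x to base a, using Python's two-argument math.log."""
    return math.log(x, a)

a, x, y = 2.0, 4.0, 2.0   # base and arguments taken from Example 4.10

multiplication = math.isclose(log_a(a, x * y), log_a(a, x) + log_a(a, y))
division = math.isclose(log_a(a, x / y), log_a(a, x) - log_a(a, y))
power = math.isclose(log_a(a, x ** y), y * log_a(a, x))

print(multiplication, division, power)
print(log_a(2, 4) + log_a(2, 2))   # the sum evaluated in Example 4.10
```

A check like this only tests particular numbers, of course; the laws themselves follow from the power laws as described above.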

Activity 4.16 Explain why $\log_a(x^2) = 2\log_a(x)$ and $\log_a(x^3) = 3\log_a(x)$ using (i) the power law and (ii) the multiplication law. Can you see how this generalises?

Activity 4.17 (Hard) Use the power laws to derive the laws of logarithms.

Changing base

Generally, when we use logarithms, we use logarithms to the base 10 or base e. As these bases are so special, we have special names for them:

Logarithms to the base 10 are denoted by ‘log’ and are called ‘common logarithms’.

Logarithms to the base e are denoted by ‘ln’ and are called ‘natural logarithms’.

The main reason for emphasising these logarithms is that many calculators have buttons which enable us to easily work them out. But, in this course, the basic calculator which you are allowed to use in the examination does not have these buttons and so the values of any logarithms (which cannot easily be figured out) will be given to an appropriate number of decimal places.

Example 4.11 If we needed the value of log(100), we would say that

\[ \log(100) = 2 \quad \text{because} \quad 100 = 10^2, \]

as this can be easily figured out.

However, if we needed the value of log(101), we might be told its value to an appropriate number of decimal places, say

\[ \log(101) = 2.00432 \text{ (to 5dp)}, \]

as this value is not easy to figure out. Or, we might be told some other information that would allow us to find this value using our basic calculators.


Sometimes, however, it is convenient to work with logarithms to some other base a, i.e. ‘$\log_a$’, and in such cases, when it comes to working them out we need to know how to convert the ‘$\log_a$’ into, say, ‘log’s or ‘ln’s or whatever so that we can use any given values to evaluate them. For such purposes the rule is as follows.

The change of base rule for logarithms

Given two bases a and b, we can convert a logarithm to base a, say $\log_a(x)$, into a logarithm to base b, say $\log_b(x)$, by using

\[ \log_a(x) = \frac{\log_b(x)}{\log_b(a)}. \]

In particular, if we were given the relevant ‘log’s or ‘ln’s we would have to use

\[ \log_a(x) = \frac{\log(x)}{\log(a)} \quad \text{or} \quad \log_a(x) = \frac{\ln(x)}{\ln(a)}, \]

respectively.

To see how this works, let’s say that we wanted to work out $\log_{100}(10{,}000)$ using common logarithms. Given the numbers involved we don’t need a calculator to see that

\[ \log_{100}(10{,}000) = \frac{\log(10{,}000)}{\log(100)} = \frac{\log(10^4)}{\log(10^2)} = \frac{4}{2} = 2, \]

which is what we would expect as $\log_{100}(10{,}000) = \log_{100}(100^2) = 2$. Alternatively, we could have used natural logarithms, in which case we would have had to use a calculator to see that

\[ \log_{100}(10{,}000) = \frac{\ln(10{,}000)}{\ln(100)} = \frac{9.21034\ldots}{4.60517\ldots} = 2, \]

as before.
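The same calculation can be sketched in Python, where `math.log10` plays the role of ‘log’ and `math.log` (with one argument) plays the role of ‘ln’. The helper name `change_of_base` and its `log_b` parameter are hypothetical, introduced only for this illustration.

```python
import math

def change_of_base(x, a, log_b=math.log10):
    """Compute the logarithm of x to base a as log_b(x) / log_b(a)."""
    return log_b(x) / log_b(a)

via_common = change_of_base(10_000, 100)             # uses common logarithms
via_natural = change_of_base(10_000, 100, math.log)  # uses natural logarithms

print(via_common, via_natural)
print(round(math.log(10_000), 5), round(math.log(100), 5))   # the two natural logarithms to 5dp
```

Both routes should agree up to floating-point rounding, which reflects the fact that the choice of the intermediate base b does not matter.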

Activity 4.18 Following on from Example 4.11, use the change of base rule for logarithms to find log(101) to 5dp given that, to 5dp, ln(101) = 4.61512 and ln(10) = 2.30259.

Activity 4.19 (Hard) Derive the change of base rule for logarithms.

Learning outcomes

At the end of this unit, you should be able to:

explain what a function is;

solve problems that involve the given common functions and combinations of them;


explain what an inverse function is;

find inverse functions if they exist or explain why they do not;

solve problems that involve logarithms.

Exercises

Exercise 4.1

The function C(x) = 10x + 315 gives the cost of producing x units of a product. Find the cost when the following numbers of units are produced.

(i) 24, (ii) y, (iii) 3a + 2, (iv) x + y.

Exercise 4.2

Suppose that the functions f , g and h are given by

f(x) = 3x+ 6, g(x) = 2x, and h(x) = sinx.

What are the following functions?

(i) (f + h)(x); (ii) (f · h)(x); (iii) (f ◦ g)(x);

(iv) (g ◦ h)(x); (v) $f^{-1}(x)$; (vi) $g^{-1}(x)$.

Exercise 4.3

To convert a temperature from degrees Fahrenheit to degrees Centigrade we subtract 32 and then multiply by 5/9. If f is the temperature in degrees Fahrenheit and c is the temperature in degrees Centigrade, find the function c(f).

What is the inverse of this function?

Exercise 4.4

The total revenue that a firm receives from selling different levels of output q is given by the function $R(q) = 40q - 4q^2$ for 0 ≤ q ≤ 10. A manager would like to know the inverse function so that they can determine how many products need to be sold in order to obtain a certain revenue. Explain why it is not possible to find this inverse function.

What happens if 0 ≤ q ≤ 5?

Exercise 4.5

Use the laws of logarithms to evaluate the following.

(i) $\log_3(81^2)$; (ii) $\log_5(25 \cdot 125)$;

(iii) $\log(1{,}000{,}000)$; (iv) $\log(100^3) - 2\log(100)$.


Exercise 4.6

Solve the following equations.

(i) $x^2 = 4$, (ii) $2^x = 4$, (iii) $2^{x^2} = 4$.


Unit 5: Calculus I
Differentiation

Overview

In this unit, we start our study of calculus by introducing the notion of differentiation. In particular, we ask how we can find the gradient of a curve at a point and see that differentiation allows us to answer this question in a very easy way. We also see how to differentiate some simple functions using standard derivatives and rules of differentiation.

Aims

The aims of this unit are as follows.

To see how derivatives are related to the gradient of a curve.

To introduce the techniques for finding simple derivatives.

Specific learning outcomes can be found near the end of this unit.

5.1 The gradient of a curve at a point

We have seen that a function is a way of mathematically describing how one quantity depends on another. That is, given x, which we call the independent variable since we are free to specify its value, we can find the value of f(x). If we let y = f(x), then each value of x, through f, gives us the value of y, which we call the dependent variable since the value we get for y depends on the value of x we used. Differentiation, as we shall soon see, is a way of seeing how changes in x are related to changes in y.

Example 5.1 In economics, if we are given a function which links the profit, π, of a firm to its production level, q, we may want to find out how the profit changes if we change the production level.

Indeed, if the profit function is linear, i.e. its graph is a straight line, we have something like

\[ \pi(q) = mq + c, \]

and we can easily see how this works since the gradient of a linear function tells us how changes in profit, Δπ, are related to changes in the production level, Δq. That is, we have

\[ m = \frac{\Delta\pi}{\Delta q}, \]

and so an increase in production level of one unit (i.e. Δq = 1) leads to a change in profit given by Δπ = m.

However, we can only tell such a simple story for linear functions, i.e. straight lines, because the gradient of a straight line is constant. That is, as we saw before, whichever two points on the line we use to calculate the gradient, we always get the same answer. But, unfortunately, this doesn’t work for more complicated curves.

Example 5.2 If we are given the quadratic function $f(x) = x^2$, whose graph is the parabola $y = x^2$, we could try to estimate its gradient at the point (2, 4) by considering the changes in x and y between this point and the points, say, (3, 9) and (4, 16) which also lie on the parabola.

Here, the points (2, 4) and (3, 9) give us a gradient of

\[ \frac{\Delta y}{\Delta x} = \frac{9 - 4}{3 - 2} = \frac{5}{1} = 5, \]

and the points (2, 4) and (4, 16) give us a gradient of

\[ \frac{\Delta y}{\Delta x} = \frac{16 - 4}{4 - 2} = \frac{12}{2} = 6. \]

But, unsurprisingly perhaps, these give us different values and so, clearly, we cannot just use a pair of points on a parabola to find its gradient at the point (2, 4).
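The two calculations in Example 5.2 can be reproduced with a short Python sketch. The helper name `gradient_between` is hypothetical; it computes the change in y divided by the change in x between two points on the parabola.

```python
def f(x):
    return x**2

def gradient_between(x1, x2):
    """Change in y divided by change in x between two points on y = f(x)."""
    return (f(x2) - f(x1)) / (x2 - x1)

print(gradient_between(2, 3))   # uses the points (2, 4) and (3, 9)
print(gradient_between(2, 4))   # uses the points (2, 4) and (4, 16)
```

The answer changes with the choice of the second point, which is exactly the difficulty described in the example.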

So, in the case of non-linear functions, i.e. curves that are not straight lines, since we cannot just look at the changes between a point and any other point on the curve to find the gradient of the curve at that point, we must ask what we can do to find it.

5.1.1 Tangents to a parabola

We start our discussion of how to find the gradient of a curve which isn’t a straight line by considering how we could go about doing this for parabolae. Consider, for example, the parabola $y = x^2$ illustrated in Figure 5.1(a). Let’s say that we want to find the gradient of this curve at the point labelled $A$ in the diagram. As you can see, this is on the curve and has coordinates $(2, 4)$.∗

To do this, we want to find the tangent to the curve at this point. This is the straight line that goes through the point in a ‘special way’ as illustrated by the line labelled $T$ in Figure 5.1(a). Indeed, if we can find this line, then we say that the gradient of the curve at $A$ is given by the gradient of this line. That is, the gradient of the curve at $A$ is defined to be the gradient of the straight line which is the tangent to the curve at $A$.

But, how can we find the gradient of this line? We first notice that there are many lines, with different gradients, that will go through this point. For instance, consider two other straight lines that go through the point $A$, say the lines $L_1$ and $L_2$ in Figure 5.1(b), and observe that

∗Obviously, to find the gradient of a curve at any given point, that point must lie on the curve!


Figure 5.1: (a) The straight line labelled $T$ is the tangent to the parabola $y = x^2$ at the point, $A$, on the curve with coordinates $(2, 4)$. (b) The tangent, $T$, goes through the point, $A$, with coordinates $(2, 4)$ in a special way, namely, unlike other lines through $A$ (such as the lines $L_1$ and $L_2$) it only has one point of intersection with the parabola. (Note that $L_1$ is ‘steeper’ than $T$ and so it cuts the parabola at both $A$ and $B$ whereas $L_2$ is ‘shallower’ than $T$ and so it cuts the parabola at both $A$ and $C$.)

$L_1$ is steeper than $T$; whereas

$L_2$ is shallower than $T$.

That is, we can see that the gradient of $T$ must be somewhere between the gradient of $L_2$ and the gradient of $L_1$. This means that we can try and find the gradient of $T$ by considering the gradients of other lines whose gradients provide us with better estimates of its value and one way of doing this is to look at chords.

5.1.2 Chords of a parabola

Given two points on a curve, we call the line segment joining them a chord. So, in Figure 5.2(a), the line segment $C$ is the chord joining the points $A(2, 4)$ and $B(3, 9)$. We can use chords to estimate the gradient of the tangent to $y = x^2$ at the point $(2, 4)$, i.e. the straight line $T$ in Figure 5.2(a), and once we have this, we have an estimate of the gradient of the curve at that point. Indeed, looking at Figure 5.2(b), we have drawn three chords and these give us the following estimates for the gradient of $T$.


Figure 5.2: (a) $C$, the chord joining the points $A(2, 4)$ and $B(3, 9)$ on the parabola $y = x^2$ and, $T$, the tangent to $y = x^2$ at the point $(2, 4)$. (b) The chords joining the points $(3, 9)$, $\left(\frac{8}{3}, \frac{64}{9}\right)$ and $\left(\frac{7}{3}, \frac{49}{9}\right)$ to the point $(2, 4)$. Observe that, as the x-coordinate of the chord gets closer to $x = 2$, the gradient of the chord gets closer to the gradient of $T$, the tangent to $y = x^2$ at the point $(2, 4)$.

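Now, the chord joining the points $(2, 4)$ and

$(3, 9)$ has a gradient given by $m = \frac{9 - 4}{3 - 2} = \frac{5}{1} = 5$.

$\left(\frac{8}{3}, \frac{64}{9}\right)$ has a gradient given by $m = \frac{\frac{64}{9} - 4}{\frac{8}{3} - 2} = \frac{28/9}{2/3} = \frac{14}{3} = 4\tfrac{2}{3}$.

$\left(\frac{7}{3}, \frac{49}{9}\right)$ has a gradient given by $m = \frac{\frac{49}{9} - 4}{\frac{7}{3} - 2} = \frac{13/9}{1/3} = \frac{13}{3} = 4\tfrac{1}{3}$.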

In particular, notice that as the x-coordinate of the other point on the curve gets closer to $x = 2$, the gradients of the chords get smaller (i.e. the chords get less steep) and get closer to the gradient of $T$. That is, we are getting better estimates of the gradient of $T$ and this is an idea that we can generalise to find the gradient of $T$ itself!

To see how this generalisation works, let $h > 0$ be a real number and consider the chord, $C$, joining the points $(2, 4)$ and $(2 + h, [2 + h]^2)$ on the parabola $y = x^2$ as in Figure 5.3(a). As before, we now look at the gradient of the chord joining these two points and we find that

\[ m = \frac{[2 + h]^2 - 4}{(2 + h) - 2} = \frac{[4 + 4h + h^2] - 4}{h} = \frac{4h + h^2}{h} = 4 + h. \]


Now, if we let $h$ get smaller, i.e. the x-coordinate of $B$ gets closer to $x = 2$, we should get a better estimate of the gradient of $T$.† Indeed, we can see that as $h$ gets closer and closer to zero, $m$ gets closer and closer to four and so, this must be the sought-after gradient of $T$.
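As a quick check of the chord calculations, the following Python sketch computes the gradient of the chord joining $(2, 4)$ and $(2 + h, [2 + h]^2)$ exactly, using rational arithmetic. The function name `chord_gradient` and the two smallest values of $h$ are illustrative choices and do not come from the guide.

```python
from fractions import Fraction

def chord_gradient(x, h):
    """Gradient of the chord of y = x^2 joining (x, x^2) and (x + h, (x + h)^2)."""
    return ((x + h) ** 2 - x ** 2) / h

# The first three values of h match the chords to (3, 9), (8/3, 64/9) and
# (7/3, 49/9); the last two are smaller values chosen for illustration.
steps = [Fraction(1), Fraction(2, 3), Fraction(1, 3), Fraction(1, 10), Fraction(1, 1000)]
gradients = [chord_gradient(Fraction(2), h) for h in steps]

for h, m in zip(steps, gradients):
    print(f"h = {h}: chord gradient = {m}")
```

Each gradient found in this way equals $4 + h$ exactly, so the values approach four as $h$ shrinks.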

Figure 5.3: (a) $C$, the chord joining the points $A(2, 4)$ and $B(2 + h, [2 + h]^2)$ on the parabola $y = x^2$ for some real number $h > 0$ and, $T$, the tangent to $y = x^2$ at the point $(2, 4)$. (b) As we take three successively smaller values of $h$ given by $h_1 > h_2 > h_3$, we see that the gradient of the chord gets closer to the gradient of $T$.

But, of course, we can generalise this method further by asking for the gradient of the tangent at any point $(x, x^2)$ on the curve $y = x^2$. In this case, we want the chord joining the point $(x, x^2)$ with the point $(x + h, [x + h]^2)$ for some real number $h > 0$. The gradient of the chord joining these two points is then given by

\[ m = \frac{[x + h]^2 - x^2}{(x + h) - x} = \frac{[x^2 + 2hx + h^2] - x^2}{h} = \frac{2hx + h^2}{h} = 2x + h. \]

Now, if we let $h$ get smaller, i.e. the x-coordinate of the point $(x + h, [x + h]^2)$ gets closer to the x-coordinate of the point $(x, x^2)$, we should get a better estimate of the gradient of the tangent line at the point $(x, x^2)$. Indeed, we can see that as $h$ gets closer and closer to zero, $m$ gets closer and closer to $2x$ and so, this must be the sought-after gradient of the tangent to $y = x^2$ at the point $(x, x^2)$. Indeed, notice that if we have $x = 2$ as we did above, we find that the gradient of the tangent is $2 \times 2 = 4$ as before!

†This idea is illustrated in Figure 5.3(b) where we take three successively smaller values of $h$ given by $h_1 > h_2 > h_3$ and see how the gradient of the chord gets closer to the gradient of $T$.


5.1.3 Tangents to other curves

We have seen, in the special case where $f(x) = x^2$, that we can find the gradient of the tangent to the curve $y = f(x)$ at the point $(x, f(x))$ by considering the gradient of the chord between the points $(x, f(x))$ and $(x + h, f(x + h))$ for some real number $h > 0$. Indeed, the gradient of this chord is given by

\[ m = \frac{f(x + h) - f(x)}{(x + h) - x} = \frac{f(x + h) - f(x)}{h}, \]

and the gradient of the tangent to the curve $y = f(x)$ at the point $(x, f(x))$ is what this gives us when we let $h$ go to zero.
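The chord gradient for a general function can be explored numerically. The sketch below is only an illustration under stated assumptions: the name `difference_quotient`, the test point $x = 1.5$ and the particular values of $h$ are all chosen here, and floating-point arithmetic means the results are approximations rather than exact values.

```python
def difference_quotient(f, x, h):
    """Gradient of the chord joining (x, f(x)) and (x + h, f(x + h))."""
    return (f(x + h) - f(x)) / h

def square(x):
    return x ** 2

# Shrink h and watch the chord gradient at x = 1.5 settle towards 2 * 1.5 = 3.
estimates = [difference_quotient(square, 1.5, 10.0 ** -k) for k in range(1, 6)]

for k, m in zip(range(1, 6), estimates):
    print(f"h = 1e-{k}: gradient of chord = {m:.6f}")
```

Replacing `square` with another function gives a rough numerical estimate of the gradient of its tangent in the same way.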

Activity 5.1 What is the gradient of the tangent to the curve $y = k$ (where $k$ is a fixed real number) at the point with x-coordinate, $x$?

Activity 5.2 What is the gradient of the tangent to the curve $y = mx + c$ (where $m$ and $c$ are fixed real numbers) at the point with x-coordinate, $x$?

Activity 5.3 What is the gradient of the tangent to the curve $y = ax^2 + bx + c$ (where $a$, $b$ and $c$ are fixed real numbers) at the point with x-coordinate, $x$?

5.2 What is differentiation?

As mentioned above, we want to define the gradient of the curve $y = f(x)$ at the point $(x, f(x))$ to be the gradient of the tangent to the curve at this point. And, we have seen how to find the latter by looking at the chords between the point $(x, f(x))$ and $(x + h, f(x + h))$ for some real number $h > 0$ and then seeing what happens to the quantity

\[ \frac{f(x + h) - f(x)}{h}, \]

as $h$ goes to zero. This procedure is known as differentiation and we denote the new function of $x$ we find from this process by

\[ \frac{df}{dx}, \quad \text{or more compactly,} \quad f'(x), \]

and this notation tells us to differentiate $f(x)$ with respect to the variable $x$. The result of this procedure is called the derivative of $f(x)$ with respect to $x$.

Example 5.3 As we saw above, the gradient of the tangent to $y = f(x)$ with $f(x) = x^2$ at the point $(x, x^2)$ is given by $2x$. Thus, the derivative of $f(x)$ with respect to $x$ can be written as

\[ \frac{df}{dx} = 2x, \quad \text{or more compactly,} \quad f'(x) = 2x. \]


If we want to calculate the derivative at a certain point, say when $x = 2$, we can now evaluate

\[ \left.\frac{df}{dx}\right|_{x=2} = 2 \cdot 2 = 4, \quad \text{or more compactly,} \quad f'(2) = 2 \cdot 2 = 4. \]

By definition, this must be the gradient of the tangent line to the parabola $y = x^2$ at the point where $x = 2$, i.e. the point with coordinates $(2, 4)$, and this is indeed what we found earlier.

Most functions can be differentiated, but we don’t want to use the definition of differentiation every time we need to find a derivative. So, we let other people do the hard work and take note of two different kinds of thing they can tell us, namely:

standard derivatives so that we can differentiate our basic functions, and

rules of differentiation so that we can differentiate combinations of our basic functions.

So, we now start our study of differentiation proper by introducing the most basic standard derivatives and the two easiest rules.

5.2.1 Standard derivatives

We now introduce the standard derivatives which we will be using in this course. We start with functions which are either constants or constant powers of $x$ as these follow on quite naturally from what we have seen so far. We then introduce the standard derivatives for the other basic functions that we need.

Standard derivatives: Constant functions

If $k$ is a constant, then the derivative of the function $f(x) = k$ is

\[ f'(x) = 0. \]

That is, if $k$ is a constant, $f(x) = k$ is a function whose derivative (or gradient) is equal to zero at every point.

Example 5.4 Clearly, this means that:

If f(x) = 5, then f ′(x) = 0.

If f(x) = 0, then f ′(x) = 0.

If f(x) = −5, then f ′(x) = 0.

That is, if we have a constant function (i.e. a function that gives us the same fixed number as its output for any value of its input, $x$) we will get zero when we differentiate it!


Activity 5.4 Thinking geometrically, why is this standard derivative obvious?

We now introduce a more complicated, and more useful, standard derivative.

Standard derivatives: Constant powers of x

If $k \neq 0$ is a constant, then the derivative of the function $f(x) = x^k$ is

\[ f'(x) = kx^{k-1}. \]

Observe, in particular, that if $k = 0$, we have $f(x) = x^0 = 1$, which is a constant function and so its derivative is $f'(x) = 0$ using the previous standard derivative. Also, as

$f(x) = x$ can be written as $f(x) = x^1$, we have $f'(x) = 1x^0 = 1$,

i.e. we have

\[ f(x) = x \implies f'(x) = 1, \]

which is a useful thing to remember.

Example 5.5 Clearly, this means that:

If $f(x) = x^5$, then $f'(x) = 5x^4$.

If $f(x) = x^0$, then $f'(x) = 0$.

If $f(x) = x^{-5}$, then $f'(x) = -5x^{-6}$.

Indeed, using powers we can see that this allows us to differentiate some quite complicated looking functions of $x$:

If $f(x) = \sqrt{x} = x^{\frac{1}{2}}$, then $f'(x) = \frac{1}{2}x^{-\frac{1}{2}}$.

If $f(x) = \frac{1}{\sqrt{x}} = x^{-\frac{1}{2}}$, then $f'(x) = -\frac{1}{2}x^{-\frac{3}{2}}$.

If $f(x) = \sqrt{x^3} = x^{\frac{3}{2}}$, then $f'(x) = \frac{3}{2}x^{\frac{1}{2}}$.

And so, when differentiating, always be on the look-out for functions of $x$ which are constant powers ‘in disguise’.

Standard derivatives: exponential and logarithm functions

The derivative of the exponential function $f(x) = e^x$ is

\[ f'(x) = e^x. \]

Observe, in particular, that this is one of the special properties of the function $e^x$ where $e$ is the exponential constant: it is the function whose value and gradient are the same at every point. (We will see another special property of $e^x$ in Section 9.1.3.)


The derivative of the logarithm function $f(x) = \ln x$ is

\[ f'(x) = \frac{1}{x}. \]

Of course, as the functions $e^x$ and $\ln x$ are related by the fact that one is the inverse of the other, we should expect that there is a relationship between these results for their derivatives. This is indeed the case as you will see in Activity 6.1.

We also have exponential and logarithm functions with bases other than $e$, but these are easily derived from these standard derivatives as you will see in Activity 6.2.

Standard derivatives: sine and cosine functions

The derivative of the sine function $f(x) = \sin x$ is

\[ f'(x) = \cos x, \]

whereas the derivative of the cosine function $f(x) = \cos x$ is

\[ f'(x) = -\sin x. \]

Of course, as the sine and cosine functions are related by the fact that one is just a ‘shift’ of the other, we should expect that there will be a relationship between their derivatives. Maybe you can look at the graphs of these functions (see Figure 4.5 in Section 4.1.2) and convince yourself that their derivatives (i.e. their gradients at each point) are related in this way.

Standard derivatives: summary

In summary, we have the following standard derivatives.

If $k$ is a constant, then $f(x) = k$ gives $f'(x) = 0$.

If $k \neq 0$ is a constant, then $f(x) = x^k$ gives $f'(x) = kx^{k-1}$.

$f(x) = e^x$ gives $f'(x) = e^x$.

$f(x) = \ln x$ gives $f'(x) = \frac{1}{x}$.

$f(x) = \sin x$ gives $f'(x) = \cos x$.

$f(x) = \cos x$ gives $f'(x) = -\sin x$.

Standard derivatives
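The standard derivatives in this summary can be sanity-checked numerically. The sketch below is not the method used in this guide: instead of the one-sided chord with $h > 0$, it uses a symmetric chord through $x - h$ and $x + h$ (a central difference), which tends to be more accurate in floating-point arithmetic. The names `central_difference` and `standard_derivatives`, the step size and the test point are all assumptions made for illustration.

```python
import math

def central_difference(f, x, h=1e-6):
    """Symmetric chord gradient, a numerical stand-in for the derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Each entry pairs a function with the derivative claimed in the summary.
standard_derivatives = {
    "constant 5": (lambda x: 5.0, lambda x: 0.0),
    "x^3": (lambda x: x ** 3, lambda x: 3 * x ** 2),
    "e^x": (math.exp, math.exp),
    "ln x": (math.log, lambda x: 1 / x),
    "sin x": (math.sin, math.cos),
    "cos x": (math.cos, lambda x: -math.sin(x)),
}

x0 = 1.3  # an arbitrary positive test point, so that ln x is defined
errors = {
    name: abs(central_difference(f, x0) - df(x0))
    for name, (f, df) in standard_derivatives.items()
}

for name, err in errors.items():
    print(f"{name}: |numerical - formula| = {err:.2e}")
```

Small discrepancies are to be expected from rounding, so agreement here is a plausibility check rather than a proof.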

We now look at how we can differentiate some simple combinations of these functions.

5.2.2 Two rules of differentiation

In Section 4.1.3, we saw five ways of combining given functions to make new functions. The first two of these were, given a constant $k$ and two functions $f(x)$ and $g(x)$, that we


could find:

a constant multiple of f , which was the function kf where (kf)(x) = k · f(x).

the sum of f and g, which was the function f + g where (f + g)(x) = f(x) + g(x).

The question is, if we can differentiate the functions $f$ and $g$, can we also differentiate the functions $kf$ and $f + g$? Obviously, the answer is ‘yes’, and we do it by using rules of differentiation. Among other things, these rules will allow us to differentiate any polynomial function of $x$.

The constant multiple rule

The constant multiple rule tells us how to differentiate a constant multiple of a function $f(x)$ and it works as follows.

If $k$ is a constant and $f$ is a function, then

\[ \frac{d}{dx}[kf(x)] = k\frac{df}{dx}, \]

or, using our shorthand, $(kf)'(x) = kf'(x)$.

Constant multiple rule

Example 5.6 Clearly, this means that:

If $f(x) = 5x^3$, then $f'(x) = 5(3x^2) = 15x^2$.

If $f(x) = -3x^{-\frac{1}{2}}$, then $f'(x) = -3\left(-\frac{1}{2}x^{-\frac{3}{2}}\right) = \frac{3}{2}x^{-\frac{3}{2}}$.

If $f(x) = -6\sqrt{x^3} = -6x^{\frac{3}{2}}$, then $f'(x) = -6\left(\frac{3}{2}x^{\frac{1}{2}}\right) = -9x^{\frac{1}{2}}$.

So, in these cases, we just differentiate as before and then multiply the answer by the appropriate constant multiple.

The sum rule

The sum rule tells us how to differentiate the sum of two functions $f(x)$ and $g(x)$ and it works as follows.

If $f$ and $g$ are functions, then

\[ \frac{d}{dx}[f(x) + g(x)] = \frac{df}{dx} + \frac{dg}{dx}, \]

or, using our shorthand, $(f + g)'(x) = f'(x) + g'(x)$.

Sum rule


Example 5.7 Clearly, this means that:

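If $f(x) = x + x^3$, then $f'(x) = 1 + 3x^2$.

If $f(x) = x^2 + x^{\frac{1}{2}}$, then $f'(x) = 2x + \frac{1}{2}x^{-\frac{1}{2}}$.

If $f(x) = \sqrt{x} + \frac{1}{\sqrt{x}} = x^{\frac{1}{2}} + x^{-\frac{1}{2}}$, then $f'(x) = \frac{1}{2}x^{-\frac{1}{2}} - \frac{1}{2}x^{-\frac{3}{2}}$.

So, in these cases, we just differentiate as before and then add the answers together.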

5.2.3 Some general points on what we have seen so far

We now take a moment to see what we can and can not do based on what we have seen in this section.

Combining our two rules of differentiation

It should be clear that, taken together, our two rules of differentiation enable us to differentiate functions of the form $kf(x) + lg(x)$ as follows.

If $k$ and $l$ are constants and $f$ and $g$ are functions, then

\[ \frac{d}{dx}[kf(x) + lg(x)] = k\frac{df}{dx} + l\frac{dg}{dx}, \]

or, using our shorthand, $(kf + lg)'(x) = kf'(x) + lg'(x)$.

Linear combination rule

Example 5.8 Clearly, this means that:

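If $f(x) = 5x^2 + 7x^3$, then $f'(x) = 5(2x) + 7(3x^2) = 10x + 21x^2$.

If $f(x) = x^2 - x^{\frac{1}{2}}$, then $f'(x) = 2x - \frac{1}{2}x^{-\frac{1}{2}}$.

If $f(x) = \sqrt{x} - \frac{3}{\sqrt{x}} = x^{\frac{1}{2}} - 3x^{-\frac{1}{2}}$, then

\[ f'(x) = \frac{1}{2}x^{-\frac{1}{2}} - 3\left(-\frac{1}{2}x^{-\frac{3}{2}}\right) = \frac{1}{2}x^{-\frac{1}{2}} + \frac{3}{2}x^{-\frac{3}{2}}. \]

So, in these cases, we just differentiate as before and combine the answers in the obvious way.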
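The way the linear combination rule works term by term can be sketched in Python. Here a function is represented as a list of pairs $(c, k)$, each standing for the term $cx^k$; this representation and the name `differentiate_terms` are assumptions made for this illustration only.

```python
from fractions import Fraction

def differentiate_terms(terms):
    """Differentiate a sum of terms c * x^k given as (c, k) pairs.

    Applies the power rule to each term (constant terms give zero) and
    combines the results with the constant multiple and sum rules.
    """
    result = []
    for coefficient, exponent in terms:
        if exponent == 0:
            continue  # the derivative of a constant is zero
        result.append((coefficient * exponent, exponent - 1))
    return result

# f(x) = 5x^2 + 7x^3 from Example 5.8
polynomial = [(5, 2), (7, 3)]
# f(x) = x^(1/2) - 3x^(-1/2) from Example 5.8
roots = [(1, Fraction(1, 2)), (-3, Fraction(-1, 2))]

print(differentiate_terms(polynomial))
print(differentiate_terms(roots))
```

The first result stands for $10x + 21x^2$ and the second for $\frac{1}{2}x^{-\frac{1}{2}} + \frac{3}{2}x^{-\frac{3}{2}}$, in agreement with Example 5.8.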

Activity 5.5 Show that the constant multiple rule and the sum rule do indeed give the linear combination rule.

Hence, use the linear combination rule to find the derivative of the function $f(x) - g(x)$ in terms of the derivatives of the functions $f(x)$ and $g(x)$.


What we can and can not differentiate

There are some functions, related to the ones that we have seen above, which we can differentiate using what we have seen so far.

Example 5.9 The following functions can be differentiated by simplifying them first.

As $f(x) = (2x)^2 = 4x^2$, we have $f'(x) = 4(2x) = 8x$.

As $f(x) = (4x)^{\frac{1}{2}} = 2x^{\frac{1}{2}}$, we have $f'(x) = 2\left(\frac{1}{2}x^{-\frac{1}{2}}\right) = x^{-\frac{1}{2}}$.

As $f(x) = (2x)^{-3} = \frac{1}{8}x^{-3}$, we have $f'(x) = \frac{1}{8}(-3x^{-4}) = -\frac{3}{8}x^{-4}$.

In particular, be sure that you understand why we get these derivatives and not anything else!

However, there are many other functions that we can not differentiate yet! And, in particular, we now consider some common errors so that we can be sure that we don’t make them in the future!

Example 5.10 Please note that, for two functions $f(x)$ and $g(x)$,

$\frac{d}{dx}[f(x) \cdot g(x)]$ is NOT $\frac{df}{dx} \cdot \frac{dg}{dx}$.

$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right]$ is NOT $\frac{df}{dx} \Big/ \frac{dg}{dx}$.

And, for things like $f(x) = e^{2x}$, we can NOT say what $f'(x)$ is, as even though we can differentiate $e^x$, we don’t yet know how to deal with the ‘2’ in $e^{2x}$.

The correct way of differentiating all of the things listed in this example will be dealt with in Unit 6.
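A small numerical illustration may help to show why these shortcuts fail. The sketch below assumes the particular choice $f(x) = x^2$ and $g(x) = x^3$, which is made here only because the product $x^5$ and the quotient $x^{-1}$ are both powers of $x$ that we can already differentiate; the variable names are likewise illustrative.

```python
x0 = 1.0

# Take f(x) = x^2 and g(x) = x^3, so f'(x) = 2x and g'(x) = 3x^2.
df = 2 * x0
dg = 3 * x0 ** 2

# f(x)g(x) = x^5 and f(x)/g(x) = x^(-1) can be differentiated as powers of x.
product_derivative = 5 * x0 ** 4
quotient_derivative = -1 * x0 ** -2

print("derivative of product:", product_derivative, "but f'g' =", df * dg)
print("derivative of quotient:", quotient_derivative, "but f'/g' =", df / dg)
```

At $x = 1$ the true derivatives and the values given by the tempting shortcuts disagree, so the shortcuts can not be valid rules.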

Activity 5.6 If $k$ is a constant, for each of the following functions, find its derivative or explain why it can not be found using the results in this unit.

(i) $f(x) = e^{x+k}$, (ii) $g(x) = e^{kx}$, (iii) $h(x) = e^{x^k}$.

Activity 5.7 If $k$ is a constant, for each of the following functions, find its derivative or explain why it can not be found using the results in this unit.

(i) $f(x) = \ln(x + k)$, (ii) $g(x) = \ln(kx)$, (iii) $h(x) = \ln(x^k)$.

Differentiating with respect to other variables

So far, we have only been differentiating functions of $x$, like $f(x)$, with respect to $x$. But sometimes, we will want to differentiate functions of other variables with respect to


their variable. The good news is that everything we have seen so far carries over in a straightforward way.

Example 5.11 Given the function $f(y) = y^k$ where $k \neq 0$ is a constant, we can write our standard derivative as

\[ \frac{df}{dy} = ky^{k-1}, \quad \text{or more compactly,} \quad f'(y) = ky^{k-1}, \]

so that we have things like

\[ f(y) = 7y^3 \implies f'(y) = 7(3y^2) = 21y^2. \]

That is, everything stays the same with the exception that the ‘x’s are now replaced with ‘y’s.

Example 5.12 Similarly, if $f(q) = q^2 - 3q + 7$, then $f'(q) = 2q - 3(1) + 0 = 2q - 3$.

Learning outcomes

At the end of this unit, you should be able to:

explain the relationship between the gradient of a curve and the derivative of a function;

find simple derivatives by using the definition of the derivative;

find simple derivatives by using standard derivatives and the rules of differentiation.

Exercises

Exercise 5.1

Consider the parabola $y = x^2$ and the point $(3, 9)$ that is on this curve.

i. Find the gradient of the chords joining this point to the points on the curve with $x = 4$, $x = 3\tfrac{1}{2}$ and $x = 3\tfrac{1}{4}$.

ii. Find the gradient of the chord joining this point to the point on the curve with $x = 3 + h$ where $h > 0$ is a real number. What value does this give you as $h$ goes to zero?

iii. By differentiating the function $f(x)$ given by the curve $y = f(x)$ above, find the gradient of the curve at the point $(3, 9)$.

Note: Your final answers to ii. and iii. should be the same!


Exercise 5.2

Find the derivatives of the following functions.

i. $f(x) = -17$; v. $k(x) = 3 + \ln(x)$;

ii. $g(x) = 27x$; vi. $l(x) = 2\sin(x)$;

iii. $h(x) = 2x^3$; vii. $n(x) = 3 + x + \cos(x)$;

iv. $j(x) = 20x^{253}$; viii. $p(x) = x + 2e^x$.

Exercise 5.3

Find the derivatives of the following functions.

i. $f(x) = \sin(x) + \cos(x)$; vi. $l(x) = 5\sqrt{x}$;

ii. $g(x) = \ln(x) + 4e^x$; vii. $n(x) = 3x^2 - 5x + 7$;

iii. $h(x) = e^x - \cos(x)$; viii. $p(x) = 3x^{10} + 8x^5$;

iv. $j(x) = 3\sin(x) - 3\ln(x)$; ix. $r(x) = 3\sqrt{x^3} - 2x^{-1/2}$;

v. $k(x) = \frac{4}{x^3}$; x. $s(x) = \frac{2}{x^2} + \frac{3}{2x^5}$.

Exercise 5.4

Find the derivatives of the following functions.

i. $f(y) = 6y - 5$; iii. $h(z) = z^2 - \sqrt{z}$;

ii. $g(q) = q^2 - 3q + 2$; iv. $j(p) = -6$.


6. Calculus II — More differentiation

Unit 6: Calculus II
More differentiation

Overview

We start by looking at how to differentiate products, quotients and compositions of functions by introducing three new rules of differentiation. We then consider how derivatives can be used to find approximations to functions and the relevance of this to economics.

Aims

The aims of this unit are as follows.

To introduce the techniques for finding more complicated derivatives.

To see how derivatives allow us to find approximations.

Specific learning outcomes can be found near the end of this unit.

6.1 Three more rules of differentiation

We now consider the other three ways of combining given functions which we saw in Section 4.1.3. These were, given two functions $f(x)$ and $g(x)$, we could find the:

Product of f and g, which was the function f · g where (f · g)(x) = f(x)g(x).

Quotient of f and g, which was the function f/g where (f/g)(x) = f(x)/g(x).∗

Composition of f and g, which was the function f ◦ g where (f ◦ g)(x) = f(g(x)).

Once again, the question is, if we can differentiate $f$ and $g$, can we also differentiate the functions $f \cdot g$, $f/g$ and $f \circ g$? And, once again, the answer is ‘yes’ and we do it by using three new rules of differentiation.

6.1.1 The product rule

The product rule tells us how to differentiate the product of two functions $f(x)$ and $g(x)$ and it works as follows.

∗Provided, of course, that $g(x) \neq 0$.


If $f$ and $g$ are functions, then

\[ \frac{d}{dx}[f(x)g(x)] = \frac{df}{dx}g(x) + f(x)\frac{dg}{dx}, \]

or, using our shorthand, $(f \cdot g)'(x) = f'(x)g(x) + f(x)g'(x)$.

Product rule

Example 6.1 Differentiate the function $h(x) = (x + 1)^2$.

We can write this function as $h(x) = (x + 1)(x + 1)$ and so we have the product of the two functions

\[ f(x) = x + 1 \quad \text{and} \quad g(x) = x + 1, \]

and these give us

\[ f'(x) = 1 \quad \text{and} \quad g'(x) = 1. \]

As such, the product rule tells us that

\[ h'(x) = (1)(x + 1) + (x + 1)(1) = 2(x + 1). \]

Notice that we can check this answer as we can write $h(x) = (x + 1)^2$ as

\[ h(x) = x^2 + 2x + 1, \]

by multiplying out the brackets and, differentiating, this gives us

\[ h'(x) = 2x + 2(1) = 2(x + 1), \]

if we factorise. Clearly, this is the same as the answer we got from the product rule.

Example 6.2 Differentiate the function $h(x) = x e^x$.

This is the product of the two functions

\[ f(x) = x \quad \text{and} \quad g(x) = e^x, \]

and these give us

\[ f'(x) = 1 \quad \text{and} \quad g'(x) = e^x. \]

As such, the product rule tells us that

\[ h'(x) = (1)(e^x) + (x)(e^x) = (1 + x)e^x. \]

Here, we can not check the answer as we can not rewrite the function $h(x) = x e^x$.

Example 6.3 Differentiate the function $h(x) = x \ln(x)$.

This is the product of the two functions

\[ f(x) = x \quad \text{and} \quad g(x) = \ln(x), \]


and these give us

\[ f'(x) = 1 \quad \text{and} \quad g'(x) = \frac{1}{x}. \]

As such, the product rule tells us that

\[ h'(x) = (1)(\ln(x)) + (x)\left(\frac{1}{x}\right) = \ln(x) + 1. \]

Here, we can not check the answer as we can not rewrite the function $h(x) = x \ln(x)$.

Example 6.4 Differentiate the function $h(x) = e^x \ln(x)$.

This is the product of the two functions

\[ f(x) = e^x \quad \text{and} \quad g(x) = \ln(x), \]

and these give us

\[ f'(x) = e^x \quad \text{and} \quad g'(x) = \frac{1}{x}. \]

As such, the product rule tells us that

\[ h'(x) = (e^x)(\ln(x)) + (e^x)\left(\frac{1}{x}\right) = e^x\left(\ln(x) + \frac{1}{x}\right). \]

Here, we can not check the answer as we can not rewrite the function $h(x) = e^x \ln(x)$.
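Even when a product can not be rewritten, an answer from the product rule can still be cross-checked numerically. The sketch below assembles the product rule for Example 6.4 and compares it with a symmetric chord gradient; the helper names `product_rule` and `central_difference`, the step size and the test point $x = 2$ are assumptions made for this illustration, and the numerical comparison is only approximate.

```python
import math

def product_rule(f, df, g, dg):
    """Return the derivative of f(x)g(x) as given by the product rule."""
    return lambda x: df(x) * g(x) + f(x) * dg(x)

def central_difference(func, x, h=1e-6):
    """Symmetric chord gradient, used here only as a numerical cross-check."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Example 6.4: h(x) = e^x ln(x), built from f(x) = e^x and g(x) = ln(x).
h_prime = product_rule(math.exp, math.exp, math.log, lambda x: 1 / x)

x0 = 2.0
from_rule = h_prime(x0)
from_formula = math.exp(x0) * (math.log(x0) + 1 / x0)  # answer stated in Example 6.4
numerical = central_difference(lambda x: math.exp(x) * math.log(x), x0)

print(from_rule, from_formula, numerical)
```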

6.1.2 The quotient rule

The quotient rule tells us how to differentiate the quotient of two functions $f(x)$ and $g(x)$ and it works as follows.

If $f$ and $g$ are functions, then

\[ \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{\frac{df}{dx}\,g(x) - f(x)\,\frac{dg}{dx}}{[g(x)]^2}, \]

or, using our shorthand,

\[ \left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{[g(x)]^2}. \]

Of course, this all assumes that the quotient of the two functions is defined, i.e. it only works for values of $x$ where $g(x) \neq 0$.

Quotient rule


Example 6.5 Differentiate the function $h(x) = \frac{x + 1}{x}$ for $x \neq 0$.

This is the quotient of the two functions

\[ f(x) = x + 1 \quad \text{and} \quad g(x) = x, \]

and these give us

\[ f'(x) = 1 \quad \text{and} \quad g'(x) = 1. \]

As such, the quotient rule tells us that

\[ h'(x) = \frac{(1)(x) - (x + 1)(1)}{x^2} = -\frac{1}{x^2}, \]

for $x \neq 0$. Notice that we can check this answer as we can write $h(x)$ as

\[ h(x) = \frac{x + 1}{x} = \frac{x}{x} + \frac{1}{x} = 1 + \frac{1}{x} = 1 + x^{-1}, \]

and, differentiating, this gives us

\[ h'(x) = 0 + (-x^{-2}) = -\frac{1}{x^2}. \]

Clearly, this is the same as the answer we got from the quotient rule.

Example 6.6 Differentiate the function $h(x) = \frac{e^x}{x}$ for $x \neq 0$.

This is the quotient of the two functions

\[ f(x) = e^x \quad \text{and} \quad g(x) = x, \]

and these give us

\[ f'(x) = e^x \quad \text{and} \quad g'(x) = 1. \]

As such, the quotient rule tells us that

\[ h'(x) = \frac{(e^x)(x) - (e^x)(1)}{x^2} = \frac{x - 1}{x^2}e^x, \]

for $x \neq 0$. Here, we can not check the answer as we can not rewrite the function $h(x) = e^x/x$.

Example 6.7 Differentiate the function $h(x) = \frac{x^3}{\ln(x)}$ for $x \neq 1$.†

This is the quotient of the two functions

\[ f(x) = x^3 \quad \text{and} \quad g(x) = \ln(x), \]

and these give us

\[ f'(x) = 3x^2 \quad \text{and} \quad g'(x) = \frac{1}{x}. \]


As such, the quotient rule tells us that

\[ h'(x) = \frac{(3x^2)(\ln(x)) - (x^3)\left(\frac{1}{x}\right)}{[\ln(x)]^2} = \frac{x^2(3\ln(x) - 1)}{[\ln(x)]^2}, \]

for $x \neq 1$. Here, we can not check the answer as we can not rewrite the function $h(x) = x^3/\ln(x)$.

Example 6.8 Differentiate the function $h(x) = \frac{\ln(x)}{e^x}$.‡

This is the quotient of the two functions

\[ f(x) = \ln(x) \quad \text{and} \quad g(x) = e^x, \]

and these give us

\[ f'(x) = \frac{1}{x} \quad \text{and} \quad g'(x) = e^x. \]

As such, the quotient rule tells us that

\[ h'(x) = \frac{\left(\frac{1}{x}\right)(e^x) - (\ln(x))(e^x)}{[e^x]^2} = \frac{(1 - x\ln(x))e^x}{x e^{2x}} = \frac{1 - x\ln(x)}{x e^x}. \]

Here, we can not check the answer as we can not rewrite the function $h(x) = \ln(x)/e^x$.
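The quotient rule, together with its requirement that $g(x) \neq 0$, can be sketched in Python as follows. The helper name `quotient_rule`, the decision to raise an error where $g(x) = 0$ and the test point $x = 2$ are assumptions made for this illustration; the example checked is Example 6.7.

```python
import math

def quotient_rule(f, df, g, dg):
    """Return the derivative of f(x)/g(x) as given by the quotient rule."""
    def derivative(x):
        if g(x) == 0:
            raise ZeroDivisionError("the quotient is not defined where g(x) = 0")
        return (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2
    return derivative

# Example 6.7: h(x) = x^3 / ln(x), which is undefined at x = 1 because ln(1) = 0.
h_prime = quotient_rule(lambda x: x ** 3, lambda x: 3 * x ** 2,
                        math.log, lambda x: 1 / x)

x0 = 2.0
from_rule = h_prime(x0)
from_formula = x0 ** 2 * (3 * math.log(x0) - 1) / math.log(x0) ** 2  # stated answer

print(from_rule, from_formula)
```

Calling `h_prime(1.0)` raises an error, which mirrors the restriction $x \neq 1$ in Example 6.7.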

6.1.3 The chain rule

The chain rule tells us how to differentiate the composition of two functions $f(x)$ and $g(x)$ and it works as follows.

If $f$ and $g$ are functions, then

\[ \frac{d}{dx}[f(g(x))] = \frac{df}{dg} \cdot \frac{dg}{dx}, \]

or, using our shorthand, $(f \circ g)'(x) = f'(g)\,g'(x)$.

Chain rule

Example 6.9 Differentiate the function $h(x) = (x + 1)^2$.

The function $h(x) = (x + 1)^2$ is the composition of the functions

\[ f(g) = g^2 \quad \text{and} \quad g(x) = x + 1. \]

†Because, if $x = 1$, we have $\ln(x) = 0$!

‡Observe that as $e^x > 0$ for all real numbers, $x$, we don’t have to worry about dividing by zero here.


As such we have

\[ f'(g) = 2g \quad \text{and} \quad g'(x) = 1, \]

and so the chain rule tells us that

\[ h'(x) = (2g)(1) = 2g = 2(x + 1). \]

Notice that this is the same as the answer we found in Example 6.1.

Example 6.10 Differentiate the function $h(x) = (2x + 1)^3$.

The function $h(x) = (2x + 1)^3$ is the composition of the functions

\[ f(g) = g^3 \quad \text{and} \quad g(x) = 2x + 1. \]

As such we have

\[ f'(g) = 3g^2 \quad \text{and} \quad g'(x) = 2, \]

and so the chain rule tells us that

\[ h'(x) = (3g^2)(2) = 6g^2 = 6(2x + 1)^2. \]

Notice that we can check this answer as we can write $h(x) = (2x + 1)^3$ as

\[ h(x) = 8x^3 + 12x^2 + 6x + 1, \]

by multiplying out the brackets and, differentiating, this gives us

\[ h'(x) = 24x^2 + 24x + 6 = 6(4x^2 + 4x + 1) = 6(2x + 1)^2, \]

if we factorise. And, clearly, this is the same as the answer we got from the chain rule.

Example 6.11 Differentiate the function $h(x) = \sqrt{2x + 1}$.

The function $h(x) = \sqrt{2x + 1}$ is the composition of the functions

\[ f(g) = \sqrt{g} = g^{\frac{1}{2}} \quad \text{and} \quad g(x) = 2x + 1. \]

As such we have

\[ f'(g) = \frac{1}{2}g^{-\frac{1}{2}} \quad \text{and} \quad g'(x) = 2, \]

and so the chain rule tells us that

\[ h'(x) = \left(\frac{1}{2}g^{-\frac{1}{2}}\right)(2) = g^{-\frac{1}{2}} = \frac{1}{\sqrt{2x + 1}}. \]

Here, we can not check the answer as we can not rewrite the function $h(x) = \sqrt{2x + 1}$.


Example 6.12 Differentiate the function $h(x) = \sqrt{x^3 + 2}$.

The function $h(x) = \sqrt{x^3 + 2}$ is the composition of the functions

\[ f(g) = \sqrt{g} = g^{\frac{1}{2}} \quad \text{and} \quad g(x) = x^3 + 2. \]

As such we have

\[ f'(g) = \frac{1}{2}g^{-\frac{1}{2}} \quad \text{and} \quad g'(x) = 3x^2, \]

and so the chain rule tells us that

\[ h'(x) = \left(\frac{1}{2}g^{-\frac{1}{2}}\right)(3x^2) = \frac{3x^2}{2\sqrt{g}} = \frac{3x^2}{2\sqrt{x^3 + 2}}. \]

Here, we can not check the answer as we can not rewrite the function $h(x) = \sqrt{x^3 + 2}$.

6.1.4 Further applications of the chain rule

We saw in Activities 5.6 and 5.7 that we could differentiate some quite complicated functions involving logarithms and exponentials by being clever with the power laws and the laws of logarithms. However, we also saw that some of the functions that we wanted to differentiate, such as the functions

\[ \ln(x + k), \quad e^{kx} \quad \text{and} \quad e^{x^k}, \]

where $k$ is a constant, couldn’t be found using such techniques. But, as is hopefully clear, we can now see how to differentiate these functions by using the chain rule. Let’s consider each of these in turn:

The function $h(x) = \ln(x+k)$ is the composition of the functions

$$f(g) = \ln(g) \quad\text{and}\quad g(x) = x+k.$$

As such we have

$$f'(g) = \frac{1}{g} \quad\text{and}\quad g'(x) = 1,$$

and so we get

$$h'(x) = \left(\frac{1}{g}\right)(1) = \frac{1}{g} = \frac{1}{x+k},$$

from the chain rule.

The function $h(x) = e^{kx}$ is the composition of the functions

$$f(g) = e^g \quad\text{and}\quad g(x) = kx.$$

As such we have $f'(g) = e^g$ and $g'(x) = k$,

and so we get $h'(x) = (e^g)(k) = k e^g = k e^{kx}$,

from the chain rule.


The function $h(x) = e^{x^k}$ is the composition of the functions

$$f(g) = e^g \quad\text{and}\quad g(x) = x^k.$$

As such we have $f'(g) = e^g$ and $g'(x) = kx^{k-1}$,

and so we get

$$h'(x) = (e^g)(kx^{k-1}) = kx^{k-1}e^g = kx^{k-1}e^{x^k},$$

from the chain rule.

Activity 6.1 (Hard) Suppose that $y = e^x$ so that $x = \ln y$. Use the standard derivative for $e^x$ and the chain rule to show that

$$\frac{dx}{dy} = \frac{1}{y}.$$

Hence deduce the standard derivative for $\ln x$.

Activity 6.2 (Hard) Suppose that $a \neq 1$ is a positive number. Show that

$$f(x) = a^x \implies f'(x) = a^x \ln a,$$

and

$$f(x) = \log_a(x) \implies f'(x) = \frac{1}{x \ln a}.$$

6.1.5 Using these rules of differentiation together

Sometimes, it is necessary to apply several of the rules of differentiation in order to find a derivative. We now consider two examples that show how this can be done.

Example 6.13 Find the derivative of the function $l(x) = (x^3+1)\ln(x^2+4)$.

This is the product of the two functions

$$f(x) = x^3+1 \quad\text{and}\quad g(x) = \ln(x^2+4),$$

and clearly, $f'(x) = 3x^2$. But, to differentiate $g(x)$, we need to use the chain rule because it is a composition. In this case, we have

$$g(h) = \ln(h) \quad\text{and}\quad h(x) = x^2+4,$$

which gives us

$$g'(h) = \frac{1}{h} \quad\text{and}\quad h'(x) = 2x,$$

so that

$$g'(x) = \left(\frac{1}{h}\right)(2x) = \frac{2x}{h} = \frac{2x}{x^2+4},$$

by the chain rule. Now, putting all of this into the product rule gives us

$$l'(x) = \left(3x^2\right)\left(\ln(x^2+4)\right) + \left(x^3+1\right)\left(\frac{2x}{x^2+4}\right) = 3x^2\ln(x^2+4) + \frac{2x(x^3+1)}{x^2+4},$$

as the sought-after derivative.


Example 6.14 Find the derivative of the function $l(x) = e^{x^2}\ln(x^3+1)$.

This is the product of the two functions

$$f(x) = e^{x^2} \quad\text{and}\quad g(x) = \ln(x^3+1).$$

To differentiate $f(x)$ we need to use the chain rule because it is a composition. In this case, we have

$$f(h) = e^h \quad\text{and}\quad h(x) = x^2,$$

which gives us $f'(h) = e^h$ and $h'(x) = 2x$,

so that

$$f'(x) = (e^h)(2x) = 2xe^h = 2xe^{x^2},$$

by the chain rule. Then, to differentiate $g(x)$ we need to use the chain rule again because it is also a composition. In this case, we have

$$g(h) = \ln(h) \quad\text{and}\quad h(x) = x^3+1,$$

which gives us

$$g'(h) = \frac{1}{h} \quad\text{and}\quad h'(x) = 3x^2,$$

so that

$$g'(x) = \left(\frac{1}{h}\right)(3x^2) = \frac{3x^2}{h} = \frac{3x^2}{x^3+1},$$

by the chain rule. Now, putting all of this into the product rule gives us

$$l'(x) = \left(2xe^{x^2}\right)\left(\ln(x^3+1)\right) + \left(e^{x^2}\right)\left(\frac{3x^2}{x^3+1}\right) = \left(2x\ln(x^3+1) + \frac{3x^2}{x^3+1}\right)e^{x^2},$$

as the sought-after derivative.
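To see the product rule and the chain rule working together mechanically, here is a small sketch that carries a value and its derivative through a calculation side by side. This technique (often called forward-mode automatic differentiation) is not used in this guide; it is brought in purely as an illustration, and the names `Dual`, `to_dual`, `ln`, `exp` and `derivative` are hypothetical helpers written for this sketch. The results are compared with the formulas found in Examples 6.13 and 6.14.

```python
import math


class Dual:
    """A number paired with its derivative with respect to x."""

    def __init__(self, value, deriv):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = to_dual(other)
        # Sum rule: (u + v)' = u' + v'.
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = to_dual(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def to_dual(v):
    """Treat a plain number as a constant, i.e. with derivative zero."""
    return v if isinstance(v, Dual) else Dual(v, 0.0)


def ln(u):
    # Chain rule: (ln u)' = (1/u) u'.
    return Dual(math.log(u.value), u.deriv / u.value)


def exp(u):
    # Chain rule: (e^u)' = e^u u'.
    return Dual(math.exp(u.value), math.exp(u.value) * u.deriv)


def derivative(func, x):
    """Evaluate func'(x) by seeding x with derivative 1."""
    return func(Dual(x, 1.0)).deriv


def l_613(x):          # Example 6.13: (x^3 + 1) ln(x^2 + 4)
    return (x * x * x + 1) * ln(x * x + 4)


def l_614(x):          # Example 6.14: e^(x^2) ln(x^3 + 1)
    return exp(x * x) * ln(x * x * x + 1)


def dl_613(x):         # the answer found in Example 6.13
    return 3 * x ** 2 * math.log(x ** 2 + 4) + 2 * x * (x ** 3 + 1) / (x ** 2 + 4)


def dl_614(x):         # the answer found in Example 6.14
    return (2 * x * math.log(x ** 3 + 1) + 3 * x ** 2 / (x ** 3 + 1)) * math.exp(x ** 2)


print(derivative(l_613, 1.5), dl_613(1.5))
print(derivative(l_614, 1.0), dl_614(1.0))
```

Each method of `Dual` encodes exactly one differentiation rule, so the way the rules combine in the examples is mirrored by the way the operations combine in `l_613` and `l_614`.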

6.2 Approximating functions

Given a function, $f(x)$, we can find its derivative, $f'(x)$, and we know from Unit 5 that the latter function tells us the gradient of the curve $y = f(x)$ at the point on the curve with coordinates $(x, f(x))$. As we shall now see, knowing the gradient of a function at a point gives us a way of finding some useful approximations.

To see why, let’s say we have a cost function, $C(q)$, that tells us the cost of producing a quantity, $q$, of some good. We might be interested in finding the increase in costs, $\Delta C$, caused by changing the quantity produced from, say, $q_0$ to $q_0 + \Delta q$, i.e. an increase in production of $\Delta q$. In this case, the exact expression for the change in costs given this change in production will be

$$\Delta C = C(q_0 + \Delta q) - C(q_0).$$

This can be thought of as the marginal (or additional) cost of producing an extra quantity, $\Delta q$, of our good. But, if $\Delta q$ is small§ we can find an approximation for $\Delta C$ which uses the derivative of the cost function, $C'(q)$, namely

$$\Delta C \simeq C'(q_0)\Delta q.$$

Let’s look at an example to see how the answers from these two approaches compare.

Example 6.15 It costs a firm $C(q) = 100q + 2q^2$ pounds to produce $q$ units of a good. What is the increase in cost that would result from an increase in production from 50 to 51 units?

To find the exact increase in costs, $\Delta C$, we need to find

$$\Delta C = C(51) - C(50) = [100(51) + 2(51)^2] - [100(50) + 2(50)^2] = 302.$$

Or, to find the approximate increase in costs, we note that

$$C'(q) = 100 + 4q,$$

and so, as $\Delta q = 51 - 50 = 1$, we have

$$\Delta C \simeq C'(50)\Delta q = [100 + 4(50)](1) = 300.$$

Either way, we can see that the increase in costs resulting from an increase in production from 50 to 51 units would be about £300.
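The two calculations in Example 6.15 can be reproduced in a few lines. This is a minimal sketch only; the function names `cost`, `marginal_cost`, `exact_change` and `approx_change` are chosen here for illustration and do not come from the guide.

```python
def cost(q):
    """The cost function of Example 6.15, C(q) = 100q + 2q^2, in pounds."""
    return 100 * q + 2 * q ** 2


def marginal_cost(q):
    """Its derivative, C'(q) = 100 + 4q."""
    return 100 + 4 * q


def exact_change(func, x0, dx):
    """Exact change: func(x0 + dx) - func(x0)."""
    return func(x0 + dx) - func(x0)


def approx_change(marginal, x0, dx):
    """Approximate change: marginal(x0) * dx."""
    return marginal(x0) * dx


exact = exact_change(cost, 50, 1)
approx = approx_change(marginal_cost, 50, 1)
print(exact, approx, exact - approx)  # 302 300 2
```

Because `exact_change` and `approx_change` take the function as an argument, the same two helpers could be reused with any other function and its marginal function.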

The reason why we can use the derivative here is that, geometrically, $C'(q_0)$ is the gradient of the tangent line, $T$, to the curve $y = C(q)$ at the point $(q_0, C(q_0))$ and so, looking at this tangent line we can see that

$$C'(q_0) = \left.\frac{dC}{dq}\right|_{q=q_0} \simeq \frac{\Delta C}{\Delta q} \implies \Delta C \simeq C'(q_0)\Delta q,$$

as shown in Figure 6.1. As such, the discrepancy between our exact and approximate values for $\Delta C$ is the difference between the $y$-coordinates of the curve $y = C(q)$ and the tangent line $T$ when $q = q_0 + \Delta q$. Obviously, the smaller $\Delta q$ is, the smaller this discrepancy will be!

In fact, economists often work with marginal quantities and so, given a function $f(x)$, we define the marginal function of $f$ to be $f'(x)$. This will allow us to find the approximate value of $\Delta f$, the change in $f$ associated with a change in $x$ from $x_0$ to $x_0 + \Delta x$, by using the formula

$$\Delta f \simeq f'(x_0)\Delta x.$$

For example, we can define the following important marginal functions from economics.

If $C(q)$ is a cost function, $MC(q) = C'(q)$ is the marginal cost function.

If $R(q)$ is a revenue function, $MR(q) = R'(q)$ is the marginal revenue function.

If $\pi(q)$ is a profit function, $M\pi(q) = \pi'(q)$ is the marginal profit function.

§In a sense that we do not exactly specify in this course!

Figure 6.1: The curve $y = C(q)$ and the tangent, $T$, to this curve at the point $(q_0, C(q_0))$. Looking at an increase in $q$ from $q_0$ to $q_0 + \Delta q$, we can see that the corresponding change in the function $C(q)$, i.e. $\Delta C$, is given exactly by $C(q_0 + \Delta q) - C(q_0)$ and approximately by $C'(q_0)\Delta q$ where $C'(q_0)$ is the gradient of the tangent line.

Indeed, since we are using the derivative to just approximate certain changes in $f$, let’s look at an example of what we can do with marginal functions defined in this way.

Example 6.16 The profit function for a firm is $\pi(q) = 100 + 20q - 2q^2$ pounds when it is selling a quantity $q$. If the quantity sold is increased from 10 to 10.2, what will be the change in profit?

The marginal profit is $M\pi(q) = \pi'(q) = 20 - 4q$,

and so the change in profit will, approximately, be given by

$$\Delta\pi \simeq \pi'(10)\Delta q = [20 - 4(10)](0.2) = -4.$$

Hence, the profit will decrease by approximately £4 if the quantity sold is increased from 10 to 10.2 units.
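It was claimed earlier that the smaller the change in quantity is, the smaller the discrepancy between the exact and approximate changes will be. The sketch below illustrates this for the profit function of Example 6.16 by computing the discrepancy for several values of the change in quantity. The function names and the particular step sizes are assumptions made for illustration. For this quadratic, expanding $\pi(q_0 + \Delta q) - \pi(q_0)$ shows that the discrepancy is exactly $-2(\Delta q)^2$, which is what the printed values should approach up to rounding.

```python
def profit(q):
    """The profit function of Example 6.16, pi(q) = 100 + 20q - 2q^2."""
    return 100 + 20 * q - 2 * q ** 2


def marginal_profit(q):
    """Its derivative, pi'(q) = 20 - 4q."""
    return 20 - 4 * q


def discrepancy(q0, dq):
    """Exact change in profit minus the marginal approximation."""
    exact = profit(q0 + dq) - profit(q0)
    approx = marginal_profit(q0) * dq
    return exact - approx


# The discrepancy shrinks rapidly as the change in quantity gets smaller.
for dq in (1.0, 0.2, 0.05, 0.01):
    print(dq, discrepancy(10, dq))
```

For the change from 10 to 10.2 units the exact change in profit is about -4.08, so the approximation of -4 is out by roughly 8 pence.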

Learning outcomes

At the end of this unit, you should be able to:

use the product, quotient and chain rules to find derivatives;

use derivatives to find approximations.


Exercises

Exercise 6.1

For the following functions, identify the functions $f(x)$ and $g(x)$ such that the function is $f(x)g(x)$ and hence find the derivative of the function using the product rule.

i. $x^2(x+2)$;

ii. $(2x+7)(x^5+2)$;

iii. $3x^4\sqrt{x}$;

iv. $(3x^2+2)\ln(x)$.

Also, in parts i., ii. and iii. check that your answer is correct by rewriting the function and differentiating it without using the product rule. (Note that this check cannot be performed in part iv.)

Exercise 6.2

For the following functions, identify the functions $f(x)$ and $g(x)$ such that the function is $f(x)/g(x)$ and hence find the derivative of the function using the quotient rule.

i. $\dfrac{x+2}{x^2}$;

ii. $\dfrac{32x^5+3}{2x^5}$;

iii. $\dfrac{4x^2+1}{x^3-2x}$;

iv. $\dfrac{2x^2+7}{e^x}$.

Also, in parts i. and ii. check that your answer is correct by rewriting the function and differentiating it without using the quotient rule. (Note that this check cannot easily be performed in parts iii. and iv.)

Exercise 6.3

For the following functions, identify the functions $f(x)$ and $g(x)$ such that the function is $f(g(x))$ and hence find the derivative of the function using the chain rule.

i. $(x+2)^2$;

ii. $(x^3+3x)^2$;

iii. $(x^4+3)^{-1}$;

iv. $3\sqrt{2x-1}$.

Also, in parts i. and ii. check that your answer is correct by rewriting the function and differentiating it without using the chain rule. (Note that this check cannot be performed in parts iii. and iv.)

Exercise 6.4

Differentiate the following functions using the appropriate rules.

i. $x^5(x^8+x^2)$;

ii. $(x^3+3)^3$;

iii. $\ln\left((x-3)^2\right)$;

iv. $\sqrt{\sqrt{x}+x}$;

v. $\dfrac{x^4}{1+2x^6}$;

vi. $6x^2(x^7+6)^{-2}$.


Exercise 6.5

Differentiate the following functions with respect to the independent variable using whichever rule is appropriate.

i. $\dfrac{3}{y+1}$;

ii. $q^3e^q$;

iii. $e^{-2p^2+p}$;

iv. $\dfrac{2z^5}{32z^5+3}$;

v. $\ln(y^3+3y^2+4)$;

vi. $\ln(e^z)$.

Exercise 6.6

The level of demand for a product, $q$, is linked to its price, $p$, by the equation $p^2q = 6{,}000$. By writing $q$ as a function of $p$ and differentiating, estimate how sales will change if the price is increased from £10 to £10.50.

What is the exact value of the change in sales if the price is increased from £10 to £10.50?


Unit 7: Calculus III
Optimisation

Overview

Having seen how to differentiate functions, we now turn our attention to some applications of differentiation. In particular, we are interested in what derivatives can tell us about the behaviour of a function. This will lead on to a study of how we can optimise a function of one variable, i.e. how we can use differentiation to find the maximum and/or minimum values of such a function, and how this information is invaluable when we want to sketch their graphs.

Aims

The aims of this unit are as follows.

To see what derivatives tell us about functions.

To apply this to optimisation and curve sketching problems.

Specific learning outcomes can be found near the end of this unit.

7.1 What derivatives tell us about functions

In Unit 3 we saw how to use the completed square form of a quadratic to find the turning point of a parabola and determine whether it was a maximum or a minimum. But, if we have a curve which arises from a function that is more complicated than a quadratic, this method is useless since we can’t complete the square. As such, we now turn our attention to developing another method for optimising a function of one variable, i.e. finding any maxima or minima that it may have. The advantage of this method is that it will work for any function of one variable and it will rely heavily on differentiation. However, before we detail the method itself, it is useful to discuss some of the ideas behind it.

7.1.1 When is a function increasing or decreasing?

The first thing we want to note is that the sign of the first derivative at a point tells us whether the function is increasing or decreasing at that point if we think of what is happening as $x$ itself is increasing. In particular, we note that:

If $f'(x) > 0$, then the function is increasing at that value of $x$ as illustrated in Figure 7.1(a).

If $f'(x) < 0$, then the function is decreasing at that value of $x$ as illustrated in Figure 7.1(b).

In particular, the key idea is that $f'(x)$ tells us the gradient of the tangent line to the curve at this value of $x$, which is labelled $T$ in Figure 7.1, and if this is positive (or negative) the function itself must be increasing (or decreasing).

Figure 7.1: As $x$ increases, we see that at the indicated value of $x$, the function $f(x)$ is increasing in (a) and decreasing in (b). These correspond to points on the curve where the tangent line has a positive or negative gradient in (a) and (b) respectively. That is, the derivative, i.e. $f'(x)$, of the function at these values of $x$ will be positive or negative respectively.

Quite apart from the application of this idea to optimising functions of one variable, this idea can be useful in economic contexts as the following example shows.

Example 7.1 Consider a firm whose profit function is given by $\pi(q) = 100 + 20q - 2q^2$ pounds when it sells $q$ units. For what values of $q$ is the firm’s profit decreasing with increasing $q$?

The firm’s profit will be decreasing when $\pi'(q) < 0$, i.e. when we have

$$\pi'(q) = 20 - 4q < 0 \implies 20 < 4q \implies 5 < q,$$

i.e. if $q > 5$, then the firm’s profit is decreasing as $q$ increases. As such, it would be unwise for the firm to produce more than five units since this puts them in a position where their profits are decreasing!

7.1.2 Stationary points

We have seen that positive and negative values of $f'(x)$ correspond to points where $f(x)$ is increasing or decreasing. But, what happens when the derivative is zero? In such cases we say that the function is stationary because it is neither decreasing nor increasing as illustrated in Figure 7.2. We say that:

If $f'(x) = 0$, then the function is stationary at that value of $x$. We call such values of $x$ stationary points.

In particular, the key idea is that $f'(x)$ tells us the gradient of the tangent line to the curve at this value of $x$, labelled $T$ in Figure 7.2, and when $f'(x) = 0$ we find that this tangent line must be horizontal.

Figure 7.2: Two stationary points, i.e. points where the derivative is zero. Notice that in (a) the stationary point is a maximum and in (b) it is a minimum.

Indeed, we can see that at the point in

Figure 7.2(a), as $x$ increases through the point, the function increases until it is stationary and then it decreases again, i.e. we have

$$f'(x) > 0 \text{ until } f'(x) = 0 \text{ and then } f'(x) < 0,$$

and in such circumstances we say that the stationary point of the function is a maximum.

Figure 7.2(b), as $x$ increases through the point, the function decreases until it is stationary and then it increases again, i.e. we have

$$f'(x) < 0 \text{ until } f'(x) = 0 \text{ and then } f'(x) > 0,$$

and in such circumstances we say that the stationary point of the function is a minimum.

That is, the maxima and minima of a function, $f(x)$, will be amongst its stationary points, i.e. points where $f'(x) = 0$, and we can identify whether we have found a maximum or minimum by seeing how the sign of $f'(x)$ changes as we move through the stationary point.

A warning: Points of inflection

When looking for stationary points, i.e. finding the values of $x$ where $f'(x) = 0$, there will be cases where what we find will be neither a maximum nor a minimum. In such cases, we will have a stationary point which is a point of inflection. Indeed, if we look at the stationary point in:

Figure 7.3(a), as $x$ increases through the point, the function increases until it is stationary and then it increases again, i.e. we have

$$f'(x) > 0 \text{ until } f'(x) = 0 \text{ and then } f'(x) > 0.$$

Figure 7.3(b), as $x$ increases through the point, the function decreases until it is stationary and then it decreases again, i.e. we have

$$f'(x) < 0 \text{ until } f'(x) = 0 \text{ and then } f'(x) < 0.$$

In both of these cases the stationary point is a point of inflection.

Figure 7.3: Two more stationary points, i.e. points where the derivative is zero. Notice that in both (a) and (b) the stationary point is a point of inflection.

We will sometimes refer to stationary points which are maxima or minima as turning points since the function ‘turns’ (or ‘changes direction’) at these points. However, stationary points that are points of inflection are not turning points.

7.1.3 Second derivatives

So far, given a function of one variable, $f(x)$, we have used differentiation to find a new function of $x$ called its derivative and we denote this by

$$\frac{df}{dx}, \text{ or more compactly, } f'(x).$$

However, this new function is also a function of $x$ and so we can differentiate again to find its derivative. That is, having found

$$\frac{df}{dx} \quad\text{we now work out}\quad \frac{d}{dx}\left(\frac{df}{dx}\right),$$

i.e. we differentiate the derivative again, and we denote the result of doing this by

$$\frac{d^2f}{dx^2}, \text{ or more compactly, } f''(x).$$

Unsurprisingly, perhaps, we call this the second derivative of the original function, $f(x)$. Of course, we could differentiate again to get the third derivative of $f(x)$ and so on, but the third and higher derivatives of $f(x)$ are not necessary for this course.

Example 7.2 Given $f(x) = x^3 + x^2 + x$, its derivative is given by

$$\frac{df}{dx} = 3x^2 + 2x + 1, \text{ or more compactly, } f'(x) = 3x^2 + 2x + 1.$$

Now, if we want to find the second derivative of $f(x)$, we want to differentiate $f'(x)$, i.e. we want

$$\frac{d^2f}{dx^2} = \frac{d}{dx}\left(\frac{df}{dx}\right) = \frac{d}{dx}\left(3x^2 + 2x + 1\right) = 6x + 2,$$

or, more compactly, $f''(x) = 6x + 2$. Indeed, if we want to calculate the second derivative at a certain point, say when $x = 2$, we can now evaluate

$$\left.\frac{d^2f}{dx^2}\right|_{x=2} = 6 \cdot 2 + 2 = 14, \text{ or more compactly, } f''(2) = 6 \cdot 2 + 2 = 14.$$

Of course, we could now differentiate this again to get the third derivative of $f(x)$, but we won’t!
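For polynomials, differentiating twice is easy to mechanise. In the sketch below a polynomial is stored as a list of coefficients, with the entry in position i being the coefficient of the i-th power of x, and the power rule is applied term by term. This representation and the names `differentiate` and `evaluate` are assumptions made for illustration; the numbers are those of Example 7.2.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] meaning c0 + c1*x + c2*x^2 + ...

    The power rule turns c_i * x^i into i * c_i * x^(i-1); the constant term drops out.
    """
    return [i * c for i, c in enumerate(coeffs)][1:]


def evaluate(coeffs, x):
    """Evaluate the polynomial with the given coefficients at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


f = [0, 1, 1, 1]             # f(x) = x + x^2 + x^3
f1 = differentiate(f)        # [1, 2, 3], i.e. f'(x) = 1 + 2x + 3x^2
f2 = differentiate(f1)       # [2, 6],    i.e. f''(x) = 2 + 6x
print(f1, f2, evaluate(f2, 2))   # [1, 2, 3] [2, 6] 14
```

Applying `differentiate` once more would give the third derivative, which is not needed in this course.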

7.1.4 What second derivatives tell us about a function

Second derivatives give us another way of assessing whether a stationary point is a maximum or a minimum. In particular, we note that given a function, $f(x)$, the sign of its second derivative, $f''(x)$, tells us whether the derivative, $f'(x)$, is increasing or decreasing at a given point. So, if $x = a$ is a stationary point of $f(x)$, i.e. a point where $f'(x) = 0$, then we find that:

If $f''(a) < 0$, then $f'(x)$ is decreasing at $x = a$, i.e. we must have

$$f'(x) > 0 \text{ until } f'(x) = 0 \text{ and then } f'(x) < 0,$$

as $x$ increases through the point where $x = a$. But, this means that:

If $f''(a) < 0$, then the stationary point is a maximum.

If $f''(a) > 0$, then $f'(x)$ is increasing at $x = a$, i.e. we must have

$$f'(x) < 0 \text{ until } f'(x) = 0 \text{ and then } f'(x) > 0,$$

as $x$ increases through the point where $x = a$. But, this means that:

If $f''(a) > 0$, then the stationary point is a minimum.


However, if our stationary point is a point of inflection, we find that $f'(x)$ decreases (or increases) until it is zero and then it increases (or decreases) as $x$ increases through the point where $x = a$, i.e. we find that $f'(x)$ is neither increasing nor decreasing when $x = a$, and this means that we will find $f''(a) = 0$.

Warning! But, having said this, do not think that $f''(a) = 0$ implies that a stationary point is a point of inflection! The fact is that, in cases where $f''(a) = 0$, second derivatives fail to tell us anything useful about the nature of a stationary point at $x = a$. In particular, $f''(a) = 0$ is compatible with a stationary point being a maximum or a minimum as well as a point of inflection! To see that this is the case, try the following activity.

Activity 7.1 Consider the functions

$$f(x) = x^4, \quad g(x) = x^3, \quad h(x) = -x^3 \quad\text{and}\quad i(x) = -x^4.$$

Show that all four of these functions have a stationary point at $x = 0$ (i.e. that their first derivatives are zero when $x = 0$) and that their second derivatives are also zero when $x = 0$.

By considering how the derivatives of these four functions change as you go through the stationary point with $x = 0$, determine their nature.

Deduce that $f''(0) = 0$ tells you nothing about the nature of the stationary point of these functions when $x = 0$.

7.1.5 A note on the ‘large x ’ behaviour of functions

Sometimes we will want to see what a function, say $f(x)$, is doing for ‘large $x$’ and, by this, we mean what the function is doing when $|x|$ is large. That is, what happens when:

$x$ is large [in magnitude] and positive, e.g. when $x$ takes values like 1,000,000 and even larger positive numbers and we think of this as telling us what happens to $f(x)$ as $x$ goes to infinity, denoted by $x \to \infty$, which corresponds to places which are far along the $x$-axis in the right-hand direction, or

$x$ is large in magnitude and negative, e.g. when $x$ is $-1{,}000{,}000$ and even larger [in magnitude] negative numbers and we like to think of this as telling us what happens as $x$ goes to minus infinity, denoted by $x \to -\infty$, which corresponds to places which are far along the $x$-axis in the left-hand direction.

In particular, when dealing with polynomials, such as quadratics like

$$ax^2 + bx + c,$$

for constants $a$, $b$ and $c$ where $a \neq 0$ and cubics like

$$ax^3 + bx^2 + cx + d,$$

for constants $a$, $b$, $c$ and $d$ where $a \neq 0$ we can easily determine how these functions behave for ‘large $x$’. The key to this is to isolate the highest power of $x$ in your polynomial (so that’s $ax^2$ in the quadratic above and $ax^3$ in the cubic) and then consider that, if $x^n$ is this highest power, then:

If $n$ is even, your polynomial will become arbitrarily large and positive as $x$ goes to both $+\infty$ and $-\infty$.

If $n$ is odd, your polynomial will become arbitrarily large and positive as $x$ goes to $+\infty$ and arbitrarily large [in magnitude] and negative as $x$ goes to $-\infty$.

Of course, multiplying $x^n$ by a constant $a > 0$ will not change this behaviour, but if we multiply $x^n$ by a constant $a < 0$, then the sign of the large $|x|$ behaviour above will change.
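The rule just described depends only on the sign of the leading coefficient and on whether the highest power is even or odd, so it can be written as a short function. This is a sketch for illustration; the name `large_x_behaviour` and the strings it returns are choices made here, not notation from the guide.

```python
def large_x_behaviour(a, n):
    """Describe where the leading term a * x**n heads for large |x|.

    Returns a pair of strings: (behaviour as x -> +infinity, behaviour as x -> -infinity).
    """
    if a == 0 or n < 1:
        raise ValueError("need a non-zero coefficient and a positive whole-number power")
    # As x -> +infinity, x**n is large and positive, so the sign of a decides.
    right = "+inf" if a > 0 else "-inf"
    # As x -> -infinity, x**n is positive when n is even and negative when n is odd.
    if n % 2 == 0:
        left = right
    else:
        left = "-inf" if a > 0 else "+inf"
    return right, left


print(large_x_behaviour(1, 2))    # even power, a > 0
print(large_x_behaviour(1, 3))    # odd power, a > 0
print(large_x_behaviour(-2, 2))   # even power, a < 0
```

Only the leading term is passed in because, for large enough values of $|x|$, it dominates all of the lower powers in the polynomial.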

7.2 Optimisation and curve sketching

We now summarise the method which we will use to optimise a function of one variable such as $f(x)$. Most of this will follow from what we saw in the previous section, but there will be some points that won’t become clear until we consider some examples of how it all works.

Step 1: Find all the stationary points of the function, i.e. all the values of $x$ that satisfy the equation

$$f'(x) = 0,$$

and, if necessary, their corresponding values of $y$ using $y = f(x)$.

Step 2: Determine the nature of the stationary points that you have found by using one of the following two methods.

Method A: The first-derivative test: If, as $x$ increases through the stationary point, we find that $f'(x)$ changes from:

• positive to positive, then it is a point of inflection,

• positive to negative, then it is a local maximum,

• negative to positive, then it is a local minimum,

• negative to negative, then it is a point of inflection.

This test will always work.

Method B: The second-derivative test: If, at the stationary point, we find that $f''(x)$ is:

• negative, then it is a local maximum,

• positive, then it is a local minimum.

This test fails if we find that $f''(x) = 0$ at the stationary point and, in such cases, the stationary point could be a local maximum, a local minimum or a point of inflection!

Step 3: If necessary, we may need to identify any global maxima or global minima, i.e. the largest or smallest values that the function can take over its domain. This identification will involve some or all of the following:

• Identifying the largest local maxima and the smallest local minima.

• If the domain of the function is:

  • restricted, we must evaluate the value of the function at any endpoint(s).

  • unrestricted, we must consider its behaviour as $x$ becomes arbitrarily large in magnitude (i.e. as ‘$x \to +\infty$’ or ‘$x \to -\infty$’).

In what follows we will see how the first two steps of this method work, how this helps us when we want to sketch curves, examine what is involved in Step 3 and see how this method can be used in economics.
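The first-derivative test of Method A can be sketched in code by sampling the sign of the derivative a little to the left and a little to the right of a stationary point. This is a rough illustration only. The function name `classify_stationary_point` and the size of `offset` are assumptions, and the sketch presumes that the offset is small enough that no other stationary point lies within it. The three sample derivatives belong to the simple functions $x^2$, $-x^2$ and $(x-1)^3$, chosen here for illustration.

```python
def classify_stationary_point(df, a, offset=1e-3):
    """Apply the first-derivative test at a stationary point x = a.

    df is the derivative function f'(x). We look at its sign just before
    and just after x = a, as x increases through the point.
    """
    left = df(a - offset)
    right = df(a + offset)
    if left > 0 and right < 0:
        return "local maximum"         # positive to negative
    if left < 0 and right > 0:
        return "local minimum"         # negative to positive
    if left * right > 0:
        return "point of inflection"   # same sign on both sides
    return "undetermined"              # derivative is zero at a sample point


print(classify_stationary_point(lambda x: 2 * x, 0))              # f(x) = x^2
print(classify_stationary_point(lambda x: -2 * x, 0))             # f(x) = -x^2
print(classify_stationary_point(lambda x: 3 * (x - 1) ** 2, 1))   # f(x) = (x - 1)^3
```

A sign check like this mirrors what we do by hand, where we reason about the sign of the derivative on either side of the point rather than computing it at nearby values.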

7.2.1 Steps 1 and 2: Finding and classifying stationary points

Let’s start by considering a straightforward example of the first two steps of this method in action.

Example 7.3 Find the stationary points of the function

$$f(x) = x^3 - 3x^2,$$

and determine their nature.

For Step 1, we find the stationary points of the function by solving $f'(x) = 0$ and so, as

$$f'(x) = 3x^2 - 6x,$$

we solve the equation

$$3x^2 - 6x = 0 \implies 3x(x-2) = 0 \implies x = 0 \text{ or } x = 2,$$

to see that stationary points occur when $x = 0$ and $x = 2$. Indeed, at these values of $x$, we see that the function itself takes the values

$$f(0) = (0)^3 - 3(0)^2 = 0,$$

when $x = 0$, and

$$f(2) = (2)^3 - 3(2)^2 = 8 - 12 = -4,$$

when $x = 2$.

For Step 2, we can determine the nature of these points by using the second-derivative test. To do this, we see that

$$f''(x) = 6x - 6 = 6(x-1),$$

so that,

when $x = 0$ we have $f''(0) = -6 < 0$ and so this stationary point is a local maximum, and

when $x = 2$ we have $f''(2) = 6 > 0$ and so this stationary point is a local minimum.

Thus, the function $f(x)$ has a local maximum when $x = 0$ and $f(x) = 0$, and a local minimum when $x = 2$ and $f(x) = -4$.
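Steps 1 and 2 of Example 7.3 can be reproduced in code. In this sketch, which is only one of several ways of doing it, the stationary points are found by applying the quadratic formula to $3x^2 - 6x = 0$ (rather than by factorising as above) and each one is then classified with the second-derivative test. The helper names `quadratic_roots` and `second_derivative_test` are hypothetical.

```python
import math


def f(x):
    return x ** 3 - 3 * x ** 2


def d2f(x):
    return 6 * x - 6


def quadratic_roots(a, b, c):
    """Real solutions of a*x^2 + b*x + c = 0 (with a non-zero), in increasing order."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})


def second_derivative_test(value):
    """Classify a stationary point from the value of f'' there."""
    if value < 0:
        return "local maximum"
    if value > 0:
        return "local minimum"
    return "test fails"


# Step 1: solve f'(x) = 3x^2 - 6x = 0.
stationary = quadratic_roots(3, -6, 0)

# Step 2: classify each stationary point using f''(x) = 6x - 6.
for x in stationary:
    print(x, f(x), second_derivative_test(d2f(x)))
```

When `second_derivative_test` reports that the test fails, we would have to fall back on the first-derivative test.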


Of course, we can also tackle similar problems for more complicated functions as you can see if you try the next activity.

Activity 7.2 Find and classify the stationary points of the function $g(x) = x^2e^x$.

And, lastly, if you want to have a look at an example where the second-derivative test fails at one of the stationary points, try the next activity.

Activity 7.3 Find and classify the stationary points of the function $h(x) = x^3e^{-x}$.

7.2.2 Curve sketching

Now that we can find the stationary points of functions that are more complicated than quadratics, we are in a position where we can sketch the curves represented by $y = f(x)$ for such functions, $f(x)$. This is a useful skill in its own right, but it will also be useful for us to have a graphical representation of the three functions we considered above when we come to talk about the third step of our method.

Example 7.4 Sketch the curve with equation $y = f(x)$ where, as above, $f(x) = x^3 - 3x^2$.

We find the ‘key features’ of this curve, namely:

The $y$-intercept of the curve occurs when $x = 0$ and so, substituting this into its equation, we get $y = 0$ as the $y$-intercept.

The $x$-intercepts of the curve occur when $y = 0$ and so we have to solve the equation

$$x^3 - 3x^2 = 0 \implies x^2(x-3) = 0 \implies x = 0 \text{ or } x = 3.$$

Thus, $x = 0$ and $x = 3$ are the $x$-intercepts.

The stationary points of the curve were found above, i.e. we found that it had a local maximum at the point $(0, 0)$ and a local minimum at the point $(2, -4)$.

With this information, we can begin to sketch the curve by roughly indicating these ‘key features’ on some axes as in Figure 7.4(a) and then, joining them up with a nice smooth curve, we get the sketch itself as in Figure 7.4(b). In particular, notice that in our sketch we have:

For $x > 2$, the function is increasing and, as we know that it must cut the $x$-axis at $x = 3$, we expect it to increase in the manner shown as $x$ goes to $+\infty$.

For $x < 0$, the function is decreasing and as such it will continue decreasing, going to more and more negative values of $y$, as $x$ goes to $-\infty$.

As we shall see, it is often useful to spend a moment thinking about what the curve does away from its ‘key features’ so that we can accurately represent it in our sketch.

Figure 7.4: Sketching the curve $y = x^3 - 3x^2$ in Example 7.4. (a) The key features: using what we have discovered about the ‘key features’ of the curve, we can begin to see what it must look like. (b) The sketch: by joining up these ‘key features’ with a nice smooth curve, we get the sketch itself.

Example 7.5

Sketch the curve y = f(x) where f(x) = 2x4 − 4x3 + 2x2.

We find the key features of this curve according to the list given above, namely:

x-intercepts: These occur when y = 0 and so we solve the equation given byf(x) = 0, i.e.

2x4 − 4x3 + 2x2 = 0,

which, on taking out the common factor of 2x2 and factorising the remainingquadratic, gives us

2x2(x2 − 2x+ 1) = 0 =⇒ 2x2(x− 1)2 = 0.

Thus, the x-intercepts occur when x = 0 and x = 1.

y-intercept: This occurs when x = 0 and so using y = f(0) we see that they-intercept occurs when y = 0. Note, in particular, that this means that thecurve goes through the origin (as we should have expected since one of thex-intercepts occurs when x = 0).

Finding the stationary points: These occur when f ′(x) = 0 and so, noting that

f ′(x) = 8x3 − 12x2 + 4x,

we solve the equation8x3 − 12x2 + 4x = 0,

which, on taking out a common factor of 4x and factorising the remainingquadratic, gives us

4x(2x2 − 3x+ 1) = 0 =⇒ 4x(2x− 1)(x− 1) = 0,

122

Page 135: Mathematics and Statistics - WordPress.com and Statistics James Ward and James Abdey FP0001 2013 International Foundation Programme This guide was prepared for the University of London

7. Calculus III — Optimisation

and so the stationary points occur when x = 0, x = 1/2 and x = 1. Then, weuse y = f(x) to find the values of y at these points so that we can locate themon the sketch. Doing this, we find that

• x = 0 gives y = f(0) = 0,

• x = 1/2 gives y = f(1/2) = 1/8, and

• x = 1 gives y = f(1) = 0.

So, the stationary points have coordinates given by (0, 0), (1/2, 1/8) and (1, 0).

Classifying the stationary points: Let’s use the second-order derivative test here.We can see that

f ′′(x) = 24x2 − 24x+ 4,

and so, looking at the stationary points, we have

• f ′′(0) = 4 > 0 and so (0, 0) is a local minimum;

• f ′′(1/2) = −2 < 0 and so (1/2, 1/8) is a local maximum; and

• f ′′(1) = 4 > 0 and so (1, 0) is a local minimum.

Limiting behaviour: The term with the highest power of x in f(x) is 2x4 and sof(x)→∞ as x→∞ and as x→ −∞.

With this information, we begin to sketch this curve by roughly indicating these key features on some axes as in Figure 7.5(a) and then, joining them up with a nice smooth curve, we get the sketch itself as in Figure 7.5(b).

(a) The key features (b) The sketch

Figure 7.5: Sketching the curve $y = 2x^4 - 4x^3 + 2x^2$ in Example 7.5. (a) Using what we have discovered about the key features of the curve, we can begin to see what it must look like. (b) By joining up these key features with a nice smooth curve, we get the sketch itself.
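As a quick check of the working in Example 7.5, the following is a minimal Python sketch that evaluates the curve and applies the second-derivative test at the three stationary points. The function names `f`, `f1`, `f2` and `classify` are our own illustrative choices, and exact fractions are used so that the value 1/8 appears without rounding.

```python
from fractions import Fraction


def f(x):
    """The quartic from Example 7.5."""
    return 2 * x**4 - 4 * x**3 + 2 * x**2


def f1(x):
    """First derivative f'(x) = 8x^3 - 12x^2 + 4x."""
    return 8 * x**3 - 12 * x**2 + 4 * x


def f2(x):
    """Second derivative f''(x) = 24x^2 - 24x + 4."""
    return 24 * x**2 - 24 * x + 4


def classify(x):
    """Apply the second-derivative test at a stationary point x."""
    value = f2(x)
    if value > 0:
        return "local minimum"
    if value < 0:
        return "local maximum"
    return "inconclusive"


# Stationary points found in the example by solving f'(x) = 0.
stationary_xs = [Fraction(0), Fraction(1, 2), Fraction(1)]

summary = [(x, f(x), classify(x)) for x in stationary_xs]
for x, y, kind in summary:
    print(f"x = {x}, y = {y}: {kind}")
```

The printed lines should agree with the coordinates and classifications found by hand above.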

7.2.3 Step 3: Looking for global maxima and global minima

In Step 3, we are concerned with identifying the largest and smallest values a function can attain, i.e. its global maximum and its global minimum respectively, if they exist! Of course, if we have sketched the graph of the function, it should be easy for us to identify any such largest or smallest values of a function. We will use the two curves that we sketched in Examples 7.4 and 7.5 to illustrate these points in the two key cases.


(a) If the domain of the function is unrestricted

In these cases, we are free to consider any value of x and we want to find the largest and smallest values a function can attain, i.e. its global maximum and its global minimum respectively, if they exist! In particular, we will need to consider the value of the function at any stationary points and the behaviour of the function for large |x|.

For the curve sketched in Example 7.4, as sketched in Figure 7.4(b), we see that although there is a local minimum at (2, −4) and a local maximum at (0, 0), there is no global minimum as f(x) can take arbitrarily large [in magnitude] negative values as x goes to −∞ and there is no global maximum as f(x) can take arbitrarily large positive values as x goes to +∞.

For the curve sketched in Example 7.5, as sketched in Figure 7.5(b), we see that although there is a local maximum at (1/2, 1/8), there is no global maximum as f(x) can take arbitrarily large positive values as x goes to −∞ or +∞. We also have local minima at (0, 0) and (1, 0) and as these both give us the smallest value that the function can take (i.e. y = 0), we see that both of these points give us a global minimum.

(b) If the domain of the function is restricted

In these cases, the values of x that we are free to consider are restricted to some interval such as a ≤ x ≤ b and we want to find the largest and smallest values a function can attain, i.e. its global maximum and its global minimum respectively, if they exist! In particular, in these cases, we need to consider the value of the function at any stationary points and its value at the endpoints of the interval.

For the curve sketched in Example 7.4 with x in the interval 1 ≤ x ≤ 3, as sketched in Figure 7.6(a), we see there is a local minimum at (2, −4) and this is the global minimum as y = −4 is the smallest value that the function can take. Also, we see that there is a global maximum at the endpoint where x = 3 as y = f(3) = 0 is the largest value that the function can take. (Of course, this global maximum is an endpoint of the interval but not a stationary point of the function!)

For the curve sketched in Example 7.5 with x in the interval −1/4 ≤ x ≤ 5/4, as sketched in Figure 7.6(b), we see that the local minima at (0, 0) and (1, 0) still both give us the smallest value that the function can take (i.e. y = 0), and so both of these points still give us a global minimum. However, the local maximum at (1/2, 1/8) is not a global maximum on this interval: at the endpoints we have f(−1/4) = f(5/4) = 25/128, which is larger than 1/8, and so the largest value that the function can take occurs at the two endpoints of the interval.
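The comparison of stationary points and endpoints on a restricted domain can be sketched in a few lines of Python. This is only an illustration: the helper `global_extrema` is a hypothetical name of our own, and we assume that the curve of Example 7.4 is y = x^3 − 3x^2, which is consistent with the points (0, 0), (2, −4) and (3, 0) quoted above but is not stated in this section.

```python
def f(x):
    """Assumed form of the Example 7.4 curve: y = x^3 - 3x^2."""
    return x**3 - 3 * x**2


def global_extrema(func, a, b, stationary_xs):
    """Compare func at the endpoints and at stationary points inside [a, b]."""
    candidates = [a, b] + [x for x in stationary_xs if a < x < b]
    values = {x: func(x) for x in candidates}
    x_min = min(values, key=values.get)
    x_max = max(values, key=values.get)
    return (x_min, values[x_min]), (x_max, values[x_max])


# Stationary points of the assumed curve are at x = 0 and x = 2.
lowest, highest = global_extrema(f, 1, 3, [0, 2])
print("global minimum:", lowest)
print("global maximum:", highest)
```

Under that assumption, the global minimum is reported at the stationary point x = 2 and the global maximum at the endpoint x = 3, as in the discussion of the interval 1 ≤ x ≤ 3.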

Figure 7.6: (a) The function from Example 7.4 when we only consider values of x in the interval 1 ≤ x ≤ 3 and (b) the function from Example 7.5 when we only consider values of x in the interval −1/4 ≤ x ≤ 5/4. In particular, the dotted parts of these curves are irrelevant here because they correspond to values of x which are not in the given intervals.

7.2.4 An economic application: Profit maximisation

Optimisation problems are very common in economics and we know one way in which they can arise in that subject, namely when a firm wants to find the level of production that maximises its profit. In particular, when a firm sells an amount, q, it makes a profit given by

π(q) = R(q) − C(q),

where R(q) is the revenue generated by selling this amount and C(q) is the cost of producing this amount. Obviously, when doing this, the firm will want to sell an amount q that will maximise its profit. Indeed, whereas the costs involved are determined by factors intrinsic to the firm, the revenue generated is given by

R(q) = pq,

where p, the price per unit, is determined by the market the firm is selling in.

As an example, consider the case where the firm is a monopoly, i.e. it is the only supplier of this product to the market. Indeed, as it is the only supplier and the amount it is supplying is q, the price that the consumers will be willing to pay for this is given by $p = p^D(q)$ where $p^D(q)$ is, as in Section 4.2.1, the inverse demand function of the market. As such, in this case, the revenue generated by the sale of an amount q is given by

$$R(q) = q\,p^D(q),$$

and this will yield a profit of

$$\pi(q) = q\,p^D(q) - C(q).$$

Thus, in the case of a monopoly, given the firm’s cost function and the inverse demand function for the market, we should be able to determine the amount, q, that the firm should be selling by finding the value of q that maximises the firm’s profit. Let’s look at an example.


Example 7.6 Suppose that a firm is a monopoly with a cost function given by

$$C(q) = q^3 - 10q^2 + 25q + 10,$$

and the inverse demand function for this good is

$$p^D(q) = 10 - q.$$

Find the value of q that will maximise the firm’s profit.

This is a constrained optimisation problem as we must have

q ≥ 0 as q denotes the amount of the good being sold, and

q ≤ 10 as, otherwise, the price that the consumers will pay will be negative.

So, we need to maximise the firm’s profit, i.e.

$$\pi(q) = q\,p^D(q) - C(q) = q(10 - q) - (q^3 - 10q^2 + 25q + 10) = -q^3 + 9q^2 - 15q - 10,$$

given that q is in the interval given by 0 ≤ q ≤ 10.

To do this, we note that π′(q) is given by

$$\pi'(q) = -3q^2 + 18q - 15,$$

and so, as the stationary points occur when π′(q) = 0, we solve the equation

$$-3q^2 + 18q - 15 = 0 \implies q^2 - 6q + 5 = 0 \implies (q - 1)(q - 5) = 0,$$

to see that the stationary points occur when q = 1 and q = 5. We can then see that

π′′(q) = −6q + 18,

which, using the second-derivative test, tells us that when:

q = 1, we have π′′(1) = 12 > 0, and so this is a local minimum.

q = 5, we have π′′(5) = −12 < 0, and so this is a local maximum.

This means that the point we seek, i.e. the maximum of the profit function, must occur at q = 5 or at one of the two endpoints of our interval. But, using the profit function, we see that

π(0) = −10, π(5) = 15 and π(10) = −260,

which means that the maximum occurs at q = 5 because it yields the largest profit. Thus, q = 5 will maximise the firm’s profit.
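The comparison made at the end of Example 7.6 can be reproduced with a short Python sketch. The names `profit`, `candidates` and `best_q` are illustrative; the revenue and cost expressions are the ones given in the example, and the candidate list contains the two endpoints together with both stationary points.

```python
def profit(q):
    """Profit from Example 7.6: revenue q(10 - q) minus cost C(q)."""
    revenue = q * (10 - q)
    cost = q**3 - 10 * q**2 + 25 * q + 10
    return revenue - cost


# Candidates: the endpoints of 0 <= q <= 10 and the stationary points q = 1, 5.
candidates = [0, 1, 5, 10]
profits = {q: profit(q) for q in candidates}
best_q = max(profits, key=profits.get)

print(profits)
print("profit-maximising q:", best_q)
```

Including the local minimum at q = 1 is not strictly necessary, but it shows that checking every candidate is harmless: the largest value still occurs at q = 5.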

Activity 7.4 Sketch the profit function from Example 7.6 and verify that q = 5 does indeed maximise the profit. (Do not try to find the q-intercepts here.)


Learning outcomes

At the end of this unit, you should be able to:

explain what a derivative tells us about a function;

find and classify stationary points;

sketch a curve;

solve optimisation problems.

Exercises

Exercise 7.1

Use differentiation to find the stationary point of the following quadratic functions and determine whether it is a local maximum or a local minimum using (a) the first derivative test and (b) the second derivative test.

i. $f(x) = -3x^2 + 6x - 20$; ii. $g(x) = 3x^2 + 6x + 20$.

Verify your answers by completing the square.

Exercise 7.2

Find the stationary points of the following functions and determine whether they are a local maximum, a local minimum or neither of these. In each case, determine whether any of the points you have found are global.

i. $f(x) = \frac{x^3}{3} - 2x^2 + 3x - 15$; ii. $g(x) = 2x^3 + 3x^2 + 12x - 6$.

Exercise 7.3

A firm has a monopoly on its market and so it can decide the price at which it sells its product. If it sells the product for price p, then demand is given by the equation q = 300 − 2p where q is the amount sold. The cost of producing q is given by the function

$$C(q) = 30 + 30q - \frac{q^2}{10},$$

and the revenue is given by the function R(q) = pq.

i. Find the revenue function, R(q), in terms of q and hence find the profit function, π(q), in terms of q.

ii. Calculate the value of q that will give the firm its maximum profit making sure that you check that this value of q does indeed give you the maximum profit. What is the maximum profit that the firm makes and what price, p, will provide this?

iii. If the firm can produce at most 120 units, what price will maximise the profit?


Unit 8: Calculus IV
Integration

Overview

Our last topic in calculus is integration which can be thought of as the ‘opposite’ of differentiation. In this unit we will see how to find indefinite integrals and explore the relationship between definite integrals and the area under a curve.

Aims

The aims of this unit are as follows.

To see how indefinite integrals are related to derivatives via antiderivatives.

To introduce the techniques for finding simple indefinite integrals.

To examine the relationship between definite integrals and areas.

Specific learning outcomes can be found near the end of this unit.

8.1 Indefinite integrals

In Unit 5, we introduced differentiation and saw that a function, f(x), could be differentiated with respect to x to yield its derivative, which we denoted by

$$\frac{df}{dx} \quad\text{or}\quad f'(x).$$

And, in particular, we saw how to find such derivatives by using the rules of differentiation and some standard derivatives. Now, given a function, f(x), we want to make sense of what it means to integrate it and we start by looking at the indefinite integral of this function with respect to x, which is denoted by

$$\int f(x)\,dx.$$

In such cases, as we are integrating the function f(x) with respect to x, we call it the integrand. And, similarly to what we saw before, we will see how to find such integrals by using the rules of integration and some standard integrals. In particular, the standard integrals will be closely related to our standard derivatives since the key idea behind our method for finding integrals will be the idea that integration is the process that ‘undoes’ (or ‘reverses’) the process of differentiation, i.e. the process of indefinite integration can be thought of as antidifferentiation and the resulting indefinite integral can be thought of as an antiderivative.


Consider the functions F(x) and f(x) where we know that f(x) is the derivative∗ of F(x), i.e.

$$\frac{dF}{dx} = f(x).$$

Now, using the idea that integration ‘undoes’ differentiation, i.e. if we integrate f(x) with respect to x we are looking for a function, F(x), whose derivative is f(x), we can see that

$$\int f(x)\,dx \quad\text{must be, more or less, given by } F(x).$$

In such cases, we say that F(x) is an antiderivative of f(x) as opposed to, say, the indefinite integral.

However, you may wonder why we say that the function, F(x), that we found above is ‘an’, as opposed to ‘the’, antiderivative of f(x). The reason for this is that if, instead of the function F(x) we had the function F(x) + c where c is a constant, then its derivative would still be f(x), i.e.

$$\frac{d}{dx}\big(F(x) + c\big) = f(x),$$

and so, using the reasoning above, we would find that

$$\int f(x)\,dx \quad\text{can also, more or less, be given by } F(x) + c,$$

where c is a constant. That is, F(x) + c is also an antiderivative of f(x) for this constant c.

Example 8.1 Show that $4x^2$ and $4x^2 + 1$ are both antiderivatives of 8x.

$4x^2$ is an antiderivative of 8x as we can differentiate $4x^2$ to get 8x. But, similarly, we can see that $4x^2 + 1$ is also an antiderivative of 8x as we can differentiate $4x^2 + 1$ to get 8x.

As such, because this works for any constant c we add to F(x), we say that the indefinite integral gives us a whole family of antiderivatives which only differ by a constant, i.e. the choice of c. In this way, we say that indefinite integration, i.e. the process of finding

$$\int f(x)\,dx,$$

is antidifferentiation, i.e. it seeks all the functions F(x) + c that can be differentiated to yield f(x) and, as such, every one of these functions will be an antiderivative of f(x).

∗We say that it is the derivative because differentiation always yields exactly one answer.

Example 8.2 What is $\int 8x\,dx$?

We saw in Example 8.1 that $4x^2$ is an antiderivative of 8x. This means that

$$\int 8x\,dx = 4x^2 + c,$$

where c is an arbitrary (i.e. any) constant. Notice that this works because differentiating $4x^2 + c$ we get 8x.
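To see numerically that the constant c makes no difference to the derivative, here is a small Python sketch. It estimates the slope of 4x^2 + c with a central difference, which is our own choice of approximation rather than anything used in this guide; the helper names `derivative` and `make_antiderivative` and the sample values of c and x are illustrative.

```python
def derivative(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


def make_antiderivative(c):
    """Return the function F(x) = 4x^2 + c for a chosen constant c."""
    return lambda x: 4 * x**2 + c


# Every choice of c should give a slope close to 8x.
errors = []
for c in (0, 1, -7.5):
    F = make_antiderivative(c)
    for x in (-2.0, 0.5, 3.0):
        errors.append(abs(derivative(F, x) - 8 * x))

print("largest discrepancy from 8x:", max(errors))
```

The discrepancy should be tiny for every c tried, which is the numerical counterpart of saying that each 4x^2 + c is an antiderivative of 8x.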

Generally speaking then, we have the following.

If F(x) is a function whose derivative is the function f(x), then we have

$$\int f(x)\,dx = F(x) + c,$$

where c is an arbitrary constant. In particular, we call the

function, f(x), the integrand as it is what we are integrating,

function, F(x), an antiderivative as its derivative is f(x),

constant, c, a constant of integration which is completely arbitrary,† and

integral, $\int f(x)\,dx$, an indefinite integral since, in the result, c is arbitrary.

Key concepts in integration

Now that we have the idea, let’s see how we’re going to actually find the indefinite integrals of the functions that commonly occur in this course.

8.1.1 Finding simple indefinite integrals

We have seen how to find indefinite integrals using antiderivatives, but now we want to explore a more convenient way of finding them. The key idea is that, similar to what we saw in Unit 5 when we introduced derivatives, we can introduce standard integrals which tell us how to integrate our basic functions and once we know how to integrate these, the rules of integration will allow us to integrate combinations of these functions.

Standard integrals

In Example 8.2, we used the idea that indefinite integration is antidifferentiation to show that the function f(x) = 8x has an indefinite integral given by

$$\int 8x\,dx = 4x^2 + c,$$

where c is an arbitrary constant. We now state some results that will allow us to find the indefinite integrals of our other basic functions.

†As we can add any constant to F(x) to account for the fact that F(x) + c, for any constant c, is also an antiderivative.


Constant powers of x

If k ≠ −1 is a constant, we have

$$\int x^k\,dx = \frac{x^{k+1}}{k+1} + c,$$

where c is an arbitrary constant and this works because

$$\frac{d}{dx}\left(\frac{x^{k+1}}{k+1} + c\right) = \frac{(k+1)x^k}{k+1} + 0 = x^k.$$

In particular, if k = 0, we have

$$\int 1\,dx = \int x^0\,dx = x + c,$$

and this works because the derivative of x + c is 1.

However, if we have k = −1, we have

$$\int x^{-1}\,dx = \int \frac{1}{x}\,dx = \ln|x| + c,$$

where we need the modulus sign in ln |x| as x may be negative but the logarithm function is only defined for x > 0. This works because, if x > 0, we have |x| = x and so

$$\frac{d\ln|x|}{dx} = \frac{d\ln(x)}{dx} = \frac{1}{x},$$

whereas if x < 0, we have |x| = −x and so

$$\frac{d\ln|x|}{dx} = \frac{d\ln(-x)}{dx} = \frac{-1}{-x} = \frac{1}{x},$$

if we use the chain rule.
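The two cases of this standard integral, k ≠ −1 and k = −1, can be sketched as a single Python function. The names `integrate_power` and `derivative` are hypothetical, the constant of integration is taken to be zero, and the check uses a central-difference estimate of the derivative (our own choice) at a few sample points, including a negative x for the ln |x| case.

```python
import math


def integrate_power(k):
    """Return an antiderivative of x**k, with the constant c taken as 0."""
    if k == -1:
        # The power rule would divide by zero here, so use ln|x| instead.
        return lambda x: math.log(abs(x))
    return lambda x: x ** (k + 1) / (k + 1)


def derivative(func, x, h=1e-6):
    """Central-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


# Differentiating each antiderivative should give back x**k.
checks = []
for k, x in [(2, 1.5), (0, -3.0), (0.5, 4.0), (-1, 2.0), (-1, -2.0)]:
    F = integrate_power(k)
    checks.append(abs(derivative(F, x) - x**k))

print("largest discrepancy:", max(checks))
```

The last two sample points show why the modulus sign matters: the same function ln |x| has slope 1/x on both sides of zero.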

Exponential and logarithm functions

If we are using e, we have

$$\int e^x\,dx = e^x + c,$$

where c is an arbitrary constant and this works because

$$\frac{d}{dx}\left(e^x + c\right) = e^x.$$

However, there is no nice standard integral for ln(x) and so we won’t really discuss the indefinite integral

$$\int \ln(x)\,dx,$$

in this course. But, if you’re interested in what it is, see Exercise 8.2.


Sine and cosine functions

For the sine and cosine functions we find that

$$\int \sin x\,dx = -\cos x + c \quad\text{and}\quad \int \cos x\,dx = \sin x + c,$$

where c is an arbitrary constant. The former works because

$$\frac{d}{dx}\left(-\cos x + c\right) = -(-\sin x) + 0 = \sin x,$$

whereas the latter works because the derivative of sin x is cos x.

Standard integrals: summary

In summary, if c is an arbitrary constant, we have the following standard integrals.

If k ≠ −1 is a constant, then $\int x^k\,dx = \frac{x^{k+1}}{k+1} + c$.

In particular, if k = 0, we have $\int 1\,dx = \int x^0\,dx = x + c$.

$\int x^{-1}\,dx = \ln|x| + c$.

$\int e^x\,dx = e^x + c$.

$\int \sin x\,dx = -\cos x + c$.

$\int \cos x\,dx = \sin x + c$.

Standard integrals

We now look at how we can integrate some simple combinations of these functions.

8.1.2 The basic rules of integration

In Section 4.1.3, we saw that there are five ways of combining given functions to make new ones and, in Section 5.2.2, we saw how the rules of differentiation could be used to differentiate these new functions. Here we will see how we can use rules of integration to integrate some of the simplest new functions that we can make, namely constant multiples and sums.‡

‡In particular, the rules of integration that involve the other new functions that we can create (namely products, quotients and compositions) are beyond the scope of this course!


The constant multiple rule

The constant multiple rule tells us how to integrate a constant multiple of a function f(x) and it works as follows.

If k is a constant and f is a function, then

$$\int kf(x)\,dx = k\int f(x)\,dx.$$

Constant multiple rule

Example 8.3 Clearly, this means that:

$$\int -3x^{-\frac{1}{2}}\,dx = -3\int x^{-\frac{1}{2}}\,dx = -3\left(\frac{x^{\frac{1}{2}}}{\frac{1}{2}}\right) + c = -6x^{\frac{1}{2}} + c.$$

$$\int 2e^x\,dx = 2\int e^x\,dx = 2e^x + c.$$

$$\int \frac{7}{x}\,dx = 7\int x^{-1}\,dx = 7\ln|x| + c.$$

$$\int -4\sin x\,dx = -4\int \sin x\,dx = -4(-\cos x) + c = 4\cos x + c.$$

So, in these cases, we just integrate as before and then multiply the answer by the appropriate constant multiple.

In particular, observe that when using this rule, we integrate to find one of the antiderivatives and then just add on an arbitrary constant, c, to take care of the constant of integration.

Activity 8.1 Use antiderivatives to show that the constant multiple rule works.

The sum rule

The sum rule tells us how to integrate the sum of two functions f(x) and g(x) and it works as follows.

If f and g are functions, then

$$\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx.$$

Sum rule

Example 8.4 Clearly, this means that:

$$\int \left[x^2 + x^{\frac{1}{2}}\right] dx = \int x^2\,dx + \int x^{\frac{1}{2}}\,dx = \frac{x^3}{3} + \frac{x^{\frac{3}{2}}}{\frac{3}{2}} + c = \frac{1}{3}\left(x^3 + 2x^{\frac{3}{2}}\right) + c.$$

$$\int [\sin x + \cos x]\,dx = \int \sin x\,dx + \int \cos x\,dx = -\cos x + \sin x + c.$$

$$\int \left[\frac{1}{x} + e^x\right] dx = \int x^{-1}\,dx + \int e^x\,dx = \ln|x| + e^x + c.$$

So, in these cases, we just integrate as before and then add the answers together.

In particular, observe that when using this rule, we integrate to find the two antiderivatives and then just add on an arbitrary constant, c, to take care of the constant of integration.

Activity 8.2 Use antiderivatives to show that the sum rule works.

The linear combination rule

It should be clear that, taken together, our two rules of integration enable us to integrate functions of the form kf(x) + lg(x) as follows.

If k and l are constants and f(x) and g(x) are functions then

$$\int [kf(x) + lg(x)]\,dx = k\int f(x)\,dx + l\int g(x)\,dx.$$

Linear combination rule

Example 8.5 Clearly, this means that:

$$\int [2x + 5]\,dx = 2\int x\,dx + 5\int 1\,dx = 2\left(\frac{x^2}{2}\right) + 5x + c = x^2 + 5x + c.$$

$$\int [\sin x - \cos x]\,dx = \int \sin x\,dx - \int \cos x\,dx = -\cos x - \sin x + c.$$

$$\int \left[\frac{3}{x} - 4e^x\right] dx = 3\int x^{-1}\,dx - 4\int e^x\,dx = 3\ln|x| - 4e^x + c.$$

So, in these cases, we just integrate as before and then combine the answers in the obvious way.

In particular, observe that when using this rule, we integrate to find the two antiderivatives and then just add on an arbitrary constant, c, to take care of the constant of integration.
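For polynomials, the linear combination rule and the power rule together amount to integrating one term at a time. The following Python sketch illustrates this under our own conventions: a polynomial is stored as a list of coefficients with `coeffs[i]` multiplying x^i, the helper name `integrate_polynomial` is hypothetical, and the constant of integration is taken to be zero.

```python
from fractions import Fraction


def integrate_polynomial(coeffs):
    """Integrate a_0 + a_1 x + a_2 x^2 + ... term by term.

    coeffs[i] is the coefficient of x**i. The result uses the same
    convention, with the constant of integration left as 0.
    """
    result = [Fraction(0)]
    for power, a in enumerate(coeffs):
        # Power rule on each term: a x^n integrates to a x^(n+1) / (n+1).
        result.append(Fraction(a, power + 1))
    return result


# 5 + 2x from Example 8.5 should integrate to 5x + x^2 (plus c).
antiderivative = integrate_polynomial([5, 2])
print(antiderivative)
```

Reading the result back in the same convention gives 0 + 5x + 1x^2, which matches the first integral in Example 8.5 once the arbitrary constant is added.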

Activity 8.3 Show that the constant multiple rule and the sum rule do indeed give the linear combination rule.

Hence, use the linear combination rule to find the integral of the function f(x) − g(x) in terms of the integrals of the functions f(x) and g(x).


Activity 8.4 Use the rules above to find the following integrals.

(a) $\int -3\cos x\,dx$, (b) $\int [e^x + \cos x]\,dx$, (c) $\int \left[3\sin x - \frac{3}{x}\right] dx$.

There are, of course, more rules of integration which correspond to the product and chain rules for differentiation, but these are beyond the scope of this course.

8.2 Definite integrals and areas

So far, we have been looking at indefinite integrals and we have been finding them by using the idea of an antiderivative to deduce standard integrals and rules of integration. We now turn to the geometric interpretation of an integral and this involves introducing the idea of a definite integral and seeing what it represents.

Definite integrals and what they represent

In Section 5.1 we saw that the derivative of a function, f(x), gave us the gradient of the curve y = f(x). We now consider what the integral of a function, f(x), tells us about the curve y = f(x) and see how this comes about through the idea of a definite integral.

What is a definite integral?

Recall that an indefinite integral is so-called since, given a function, f(x), and one of its antiderivatives, F(x), i.e. two functions related by the fact that

$$\frac{dF}{dx} = f(x),$$

we have

$$\int f(x)\,dx = F(x) + c,$$

where c is an arbitrary constant. And, indeed, it is this arbitrary constant that makes this integral indefinite as we do not know what c is. In a similar vein, instead of writing

$$\int f(x)\,dx \quad\text{we could also write}\quad \int_a^b f(x)\,dx,$$

where the constants a and b are called the limits of integration.

In order to work out integrals that look like this we need to know what to do with these limits and the procedure is:

Firstly: Deal with the integral. Integrating f(x), we take one of its antiderivatives, F(x), and then write

$$\int_a^b f(x)\,dx = \Big[F(x)\Big]_a^b.$$

In particular, as we shall see below, observe that we no longer need a constant of integration.


Secondly: Deal with the limits. By definition, we let

$$\Big[F(x)\Big]_a^b = F(b) - F(a),$$

i.e. we subtract the value of the antiderivative at x = a from its value at x = b.

Notice this means that, if F(x) is an antiderivative of f(x), we have

$$\int_a^b f(x)\,dx = F(b) - F(a),$$

i.e. the value of the integral depends only on the value of the antiderivative at the points x = a and x = b. Thus, this is now a definite integral as it no longer involves an arbitrary constant, c.

Activity 8.5 If F(x) is an antiderivative of f(x), show that

$$\int_a^b f(x)\,dx = \Big[F(x) + c\Big]_a^b = F(b) - F(a),$$

if c is a constant. Hence explain why we can omit the constant of integration when evaluating definite integrals.

Another consequence of this discussion is that it allows us to see how to use our basic rules of integration to evaluate definite integrals. For instance, if k and l are constants and f(x) and g(x) are functions, then we can see that the linear combination rule gives us

$$\int_a^b [kf(x) + lg(x)]\,dx = k\int_a^b f(x)\,dx + l\int_a^b g(x)\,dx,$$

if we are using definite integrals.

Activity 8.6 Following what we saw in Section 8.1.2, write down the constant multiple rule and the sum rule for definite integrals.

Activity 8.7 Using what we have seen so far, derive the linear combination rule for definite integrals.

Now that we have the basic idea, let’s see how we can work out a definite integral.

Example 8.6 Evaluate $\int_1^3 (x + 4)\,dx$.

If we follow the two-step procedure above, i.e. integrating to find an antiderivative and then dealing with the limits, we get

$$\int_1^3 (x + 4)\,dx = \left[\frac{x^2}{2} + 4x\right]_1^3 = \left(\frac{3^2}{2} + 4(3)\right) - \left(\frac{1^2}{2} + 4(1)\right) = \left(\frac{9}{2} + 12\right) - \left(\frac{1}{2} + 4\right) = 12,$$

which is the value of this definite integral.

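Alternatively, we could use the linear combination rule to get

$$\int_1^3 (x + 4)\,dx = \int_1^3 x\,dx + \int_1^3 4\,dx = \left[\frac{x^2}{2}\right]_1^3 + \Big[4x\Big]_1^3 = \left(\frac{3^2}{2} - \frac{1^2}{2}\right) + \big(4(3) - 4(1)\big) = \left(\frac{9}{2} - \frac{1}{2}\right) + (12 - 4) = 12,$$

which is the same answer as before.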
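The two-step procedure used in Example 8.6 translates directly into code. The Python sketch below is only an illustration: `F` is the antiderivative x^2/2 + 4x found in the example, while the helper `definite_integral` and its optional constant `c` are names of our own, included to show that any constant added to the antiderivative cancels when the limits are applied.

```python
from fractions import Fraction


def F(x):
    """An antiderivative of x + 4, namely x^2/2 + 4x."""
    return Fraction(x) ** 2 / 2 + 4 * Fraction(x)


def definite_integral(antiderivative, a, b, c=0):
    """Evaluate [antiderivative(x) + c] between the limits x = a and x = b."""
    return (antiderivative(b) + c) - (antiderivative(a) + c)


value = definite_integral(F, 1, 3)
shifted = definite_integral(F, 1, 3, c=100)
print(value, shifted)
```

Both calls give the same value, 12, because the constant appears once with each sign.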

What definite integrals with non-negative integrands represent

Definite integrals are useful because they tell us about the area under a curve. Specifically, if we have the definite integral

$$\int_a^b f(x)\,dx,$$

where f(x) ≥ 0 for all x such that a ≤ x ≤ b,§ we say that we have a non-negative integrand and find that the value of the integral is the area of the region between the curve y = f(x), the x-axis and the vertical lines x = a and x = b as illustrated in Figure 8.1.

Figure 8.1: The hatched region is between the curve y = f(x), the x-axis and the vertical lines x = a and x = b. In cases like this we have a non-negative integrand, i.e. f(x) ≥ 0 for a ≤ x ≤ b, and so the definite integral $\int_a^b f(x)\,dx$ gives us the area of this hatched region.

Example 8.7 Find the area of the region between the line y = 4 − 2x, the x-axis and the vertical lines x = 0 and x = 2 which is illustrated in Figure 8.2(a).

There are two ways to find this area:

As this is just a right-angled triangle, the area is just ‘half times base times height’, i.e.

$$\text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4.$$

Thus, the area of the region is four.

As we have y = f(x) with f(x) = 4 − 2x, we can see from Figure 8.2(a) that f(x) ≥ 0 between x = 0 and x = 2. So, as noted above, the area should be given by

$$\int_0^2 (4 - 2x)\,dx = \Big[4x - x^2\Big]_0^2 = (4 \times 2 - 2^2) - (4 \times 0 - 0^2) = (8 - 4) - 0 = 4,$$

which is, again, four.

Consequently, this confirms that the definite integral does give us the area of the region between the line y = 4 − 2x, the x-axis and the vertical lines x = 0 and x = 2, at least when f(x) ≥ 0 between the vertical lines.

§At the moment we will just accept this caveat. The reason why we need f(x) to be non-negative for values of x between the limits of integration will become clear very soon.

Figure 8.2: Non-negative integrands. (a) For Example 8.7, the region between the line y = 4 − 2x, the x-axis and the vertical lines x = 0 and x = 2. (b) For Example 8.8, the region between the parabola $y = 4 - x^2$, the x-axis and the vertical lines x = −1 and x = 1.

However, generally, we won’t have a simple geometric way of finding the area under a curve and so we will have to use integration.

Example 8.8 Find the area of the region between the parabola $y = 4 - x^2$, the x-axis and the vertical lines x = −1 and x = 1 which is illustrated in Figure 8.2(b).

As we have y = f(x) with $f(x) = 4 - x^2$, we can see from Figure 8.2(b) that f(x) ≥ 0 between x = −1 and x = 1. So, as noted above, the area should be given by

$$\int_{-1}^{1} (4 - x^2)\,dx = \left[4x - \frac{x^3}{3}\right]_{-1}^{1} = \left(4(1) - \frac{(1)^3}{3}\right) - \left(4(-1) - \frac{(-1)^3}{3}\right) = \left(\frac{11}{3}\right) - \left(-\frac{11}{3}\right) = \frac{22}{3},$$

i.e. the area is $7\frac{1}{3}$.
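One way to make the link between the definite integral and the area more concrete is to approximate the region by thin rectangles and add up their areas. The Python sketch below uses a midpoint rule for this, which is our own choice of approximation and not a method from this guide; the helper name `midpoint_sum` and the number of rectangles are illustrative.

```python
def midpoint_sum(func, a, b, n=10_000):
    """Approximate the area under func on [a, b] using n midpoint rectangles."""
    width = (b - a) / n
    heights = (func(a + (i + 0.5) * width) for i in range(n))
    return sum(heights) * width


# The region from Example 8.8: under y = 4 - x^2 between x = -1 and x = 1.
approx = midpoint_sum(lambda x: 4 - x**2, -1, 1)
exact = 22 / 3

print("approximate area:", approx)
print("difference from 22/3:", abs(approx - exact))
```

With this many rectangles the approximation should agree with 22/3 to several decimal places, which supports reading the definite integral as an area when the integrand is non-negative.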


Activity 8.8 Observe that the region in Example 8.8, as illustrated in Figure 8.2(b), is symmetric about the y-axis. Use this observation to explain why the area of this region is two times the area represented by the definite integral,

$$\int_0^1 (4 - x^2)\,dx,$$

and verify that this does indeed give the correct area.

What definite integrals with non-positive integrands represent

We now start to consider what happens to the relationship between definite integrals and areas when we cannot guarantee that the integrand is non-negative. That is, what happens if we do not have f(x) ≥ 0 for all x such that a ≤ x ≤ b? To simplify matters, we will start by asking: What happens when this condition always fails? That is, what happens when the integrand is non-positive as f(x) ≤ 0 for all x such that a ≤ x ≤ b?

Consider the area of the region bounded by the curve y = f(x), the x-axis and the vertical lines x = a and x = b when we have a non-positive integrand, i.e. when f(x) ≤ 0 for a ≤ x ≤ b, as illustrated in Figure 8.3. Now, if we note that

If f(x) ≤ 0 for all a ≤ x ≤ b, then −f(x) ≥ 0 for all a ≤ x ≤ b,

we see that the function −f(x) does give us a non-negative integrand and so, following what we saw above, the area, A, of the region in question is given by

$$A = \int_a^b -f(x)\,dx = -\int_a^b f(x)\,dx \implies \int_a^b f(x)\,dx = -A.$$

That is, for non-positive integrands, the definite integral gives us minus the area. Thus, in the case of non-positive integrands, the area is given by the magnitude of the definite integral. Let’s have a look at an example.

Figure 8.3: The hatched region is between the curve y = f(x), the x-axis and the vertical lines x = a and x = b. In cases like this we have a non-positive integrand, i.e. f(x) ≤ 0 for a ≤ x ≤ b, and so the definite integral $\int_a^b f(x)\,dx$ gives us minus the area of this hatched region.


Example 8.9 Find the area of the region between the line y = 4− 2x, the x-axisand the vertical lines x = 2 and x = 4 which is illustrated in Figure 8.4(a).

There are two ways to find this area:

As this is just a right-angled triangle, the area is just ‘half times base times height’, i.e.

\[ \text{area of triangle} = \frac{1}{2} \times 2 \times 4 = 4. \]

Thus, the area of the region is four.

As we have y = f(x) with f(x) = 4 − 2x, we can see from Figure 8.4(a) that f(x) ≤ 0 between x = 2 and x = 4. So, looking at the definite integral we get

\[ \int_2^4 (4 - 2x)\,dx = \left[ 4x - x^2 \right]_2^4 = (4 \times 4 - 4^2) - (4 \times 2 - 2^2) = (16 - 16) - (8 - 4) = -4, \]

which is minus the answer we would expect. As such, we take the magnitude of this answer and so the area is, again, four.

Consequently, if f(x) ≤ 0 between the vertical lines, the definite integral gives us minus the area and so we take the magnitude of the definite integral to find the area.

Figure 8.4: Negative integrands and their relationship to area. The region between the line y = 4 − 2x, the x-axis and the vertical lines (a) x = 2 and x = 4 for Example 8.9, and (b) x = 0 and x = 4 for Example 8.10.


What definite integrals with general integrands represent

We now consider what happens to the relationship between definite integrals and areas when we cannot guarantee that the integrand is non-positive or non-negative. That is, what happens if f(x) ≥ 0 for some x such that a ≤ x ≤ b but not others? Let’s start by considering the simple case where we have an integrand which is neither non-positive nor non-negative because there is some number c such that a ≤ c ≤ b where

f(x) ≥ 0 for all x such that a ≤ x ≤ c, and

f(x) ≤ 0 for all x such that c ≤ x ≤ b,

as illustrated in Figure 8.5. Indeed, following on from what we saw above, we see that

the area, say $A_1$, of the hatched region between the vertical lines x = a and x = c is given by the definite integral $\int_a^c f(x)\,dx$, i.e.

\[ A_1 = \int_a^c f(x)\,dx, \text{ and} \]

the area, say $A_2$, of the hatched region between the vertical lines x = c and x = b is given by minus the definite integral $\int_c^b f(x)\,dx$, i.e.

\[ A_2 = -\int_c^b f(x)\,dx. \]

As such, the area, say A, of the hatched region between the lines x = a and x = b is now given by

\[ A = A_1 + A_2 = \int_a^c f(x)\,dx + \left| \int_c^b f(x)\,dx \right|. \]

In particular, note that in this case we will need to find two different definite integrals to find the area and not one like we did in the earlier cases!

Figure 8.5: The hatched region is between the curve y = f(x), the x-axis and the vertical lines x = a and x = b. In cases like this, where we have a non-negative integrand for a ≤ x ≤ c and a non-positive integrand for c ≤ x ≤ b, we need to find two different definite integrals to find the area of the region.


Thus, for general integrands, the procedure for finding the area of the region bounded by the curve y = f(x), the x-axis and the vertical lines x = a and x = b is as follows:

Firstly, determine all the points where the curve crosses the x-axis with x-coordinates between x = a and x = b.

Secondly, use these points to determine (possibly via a sketch) where the curve is positive and where the curve is negative.

Thirdly, use this information to determine the areas by finding the appropriate definite integrals (bearing in mind that the integrands will now be either non-negative or non-positive).

Fourthly, add up all the areas to find the total area.
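The four steps above can also be sketched as a short Python function. This is only an illustration under stated assumptions: the function name `area_between_curve_and_axis` is our own, and we assume that an antiderivative F of the integrand and the crossing points have already been found by hand. The two test cases use the line y = 4 − 2x and the parabola y = 1 − x² from Examples 8.10 and 8.11.

```python
def area_between_curve_and_axis(F, a, b, crossings):
    """Total area between y = f(x) and the x-axis for a <= x <= b.

    F is an antiderivative of f (assumed to be found by hand) and
    crossings lists the x-values where the curve crosses the x-axis.
    """
    # Steps 1 and 2: split [a, b] at the crossing points that lie inside it.
    points = [a] + sorted(c for c in crossings if a < c < b) + [b]
    total = 0.0
    for left, right in zip(points, points[1:]):
        signed = F(right) - F(left)  # Step 3: definite integral on one piece
        total += abs(signed)         # Step 4: add the magnitude of each piece
    return total


# Line y = 4 - 2x on [0, 4]: antiderivative 4x - x^2, crossing at x = 2.
line_area = area_between_curve_and_axis(lambda x: 4 * x - x**2, 0, 4, [2])

# Parabola y = 1 - x^2 on [-2, 2]: antiderivative x - x^3/3, crossings at -1 and 1.
parabola_area = area_between_curve_and_axis(lambda x: x - x**3 / 3, -2, 2, [-1, 1])

print(line_area, parabola_area)
```

Taking the magnitude of each piece works here because, between consecutive crossing points, the integrand is either non-negative or non-positive throughout.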

To see how this works, let’s consider a couple of examples.

Example 8.10 Find the area of the region between the line y = 4 − 2x, the x-axis and the vertical lines x = 0 and x = 4 which is illustrated in Figure 8.4(b).

As indicated in Figure 8.4(b), the line y = 4 − 2x crosses the x-axis when x = 2 and this lies between x = 0 and x = 4. We can also see that the function is non-negative for 0 ≤ x ≤ 2 and non-positive for 2 ≤ x ≤ 4. As such, using our earlier workings in Examples 8.7 and 8.9, we split the total region into two sub-regions to see that:

Between x = 0 and x = 2 we evaluate the definite integral,

\[ \int_0^2 (4 - 2x)\,dx, \]

which gives us 4 as we saw in Example 8.7. Thus, the area is four here as we have a non-negative integrand.

Between x = 2 and x = 4 we evaluate the definite integral,

\[ \int_2^4 (4 - 2x)\,dx, \]

which gives us −4 as we saw in Example 8.9. Thus, the area is four here as we have a non-positive integrand.

Consequently, the total area is eight.

We also note, in passing, that the definite integral

\[ \int_0^4 (4 - 2x)\,dx = \left[ 4x - x^2 \right]_0^4 = (4 \times 4 - 4^2) - (4 \times 0 - 0^2) = (16 - 16) - 0 = 0, \]

and, as this is zero, it most definitely is not giving us the area we seek!


Activity 8.9 Verify that the answer to the previous example is correct by finding the areas of the triangles involved.

Example 8.11 Find the area of the region between the parabola $y = 1 - x^2$, the x-axis and the vertical lines x = −2 and x = 2 which is illustrated in Figure 8.6.

As indicated in Figure 8.6, the parabola $y = 1 - x^2$ crosses the x-axis when x = ±1 and these points lie between x = −2 and x = 2. We can also see that the function is non-negative for −1 ≤ x ≤ 1 and non-positive for −2 ≤ x ≤ −1 and 1 ≤ x ≤ 2. As such, we split the total region into three sub-regions to see that:

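Between x = −2 and x = −1 we evaluate the definite integral,

\[ \int_{-2}^{-1} (1 - x^2)\,dx = \left[ x - \frac{x^3}{3} \right]_{-2}^{-1} = \left[ -1 - \frac{(-1)^3}{3} \right] - \left[ -2 - \frac{(-2)^3}{3} \right] = \left[ -1 + \frac{1}{3} \right] - \left[ -2 + \frac{8}{3} \right] = -\frac{4}{3}. \]

Thus, the area is $\frac{4}{3}$ here as we have a non-positive integrand.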

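Between x = −1 and x = 1 we evaluate the definite integral,

\[ \int_{-1}^{1} (1 - x^2)\,dx = \left[ x - \frac{x^3}{3} \right]_{-1}^{1} = \left[ 1 - \frac{1^3}{3} \right] - \left[ -1 - \frac{(-1)^3}{3} \right] = \left[ 1 - \frac{1}{3} \right] - \left[ -1 + \frac{1}{3} \right] = \frac{4}{3}. \]

Thus, the area is $\frac{4}{3}$ here as we have a non-negative integrand.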

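Between x = 1 and x = 2 we evaluate the definite integral,

\[ \int_1^2 (1 - x^2)\,dx = \left[ x - \frac{x^3}{3} \right]_1^2 = \left[ 2 - \frac{2^3}{3} \right] - \left[ 1 - \frac{1^3}{3} \right] = \left[ 2 - \frac{8}{3} \right] - \left[ 1 - \frac{1}{3} \right] = -\frac{4}{3}. \]

Thus, the area is $\frac{4}{3}$ here as we have a non-positive integrand.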

Consequently, the total area is $\frac{4}{3} + \frac{4}{3} + \frac{4}{3}$ which is four.

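We also note, in passing, that the definite integral,

\[ \int_{-2}^{2} (1 - x^2)\,dx = \left[ x - \frac{x^3}{3} \right]_{-2}^{2} = \left[ 2 - \frac{2^3}{3} \right] - \left[ (-2) - \frac{(-2)^3}{3} \right] = \left[ 2 - \frac{8}{3} \right] - \left[ -2 + \frac{8}{3} \right] = -\frac{4}{3}, \]

and this is most definitely not giving us the area we seek!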


Figure 8.6: Negative integrands and their relationship to area (continued). For Example 8.11, the region between the parabola $y = 1 - x^2$, the x-axis and the vertical lines x = −2 and x = 2.

Learning outcomes

At the end of this unit, you should be able to:

explain the relationship between differentiation and indefinite integration;

find simple indefinite integrals by using standard integrals and the rules of integration;

explain the relationship between a definite integral and an area;

find areas using definite integration.

Exercises

Exercise 8.1

Find the following indefinite integrals and use differentiation to verify your answer.

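i. $\int -17\,dx$;

ii. $\int 27x\,dx$;

iii. $\int 2x^3\,dx$;

iv. $\int 5\sqrt{x}\,dx$;

v. $\int \frac{4}{x}\,dx$;

vi. $\int 5e^x\,dx$;

vii. $\int (3x^2 - 5x + 7)\,dx$;

viii. $\int (3x^{10} + 8x^5 + 4e^x)\,dx$;

ix. $\int \left( 3\sqrt{x^3} - 2x^{-\frac{1}{2}} \right) dx$;

x. $\int \left( \frac{2}{x^3} + \frac{3}{2x} \right) dx$.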


Exercise 8.2

Differentiate the function F(x) = x ln(x) − x. (You will need to use the product rule!)

Hence find

\[ \int \ln(x)\,dx, \]

by thinking of the indefinite integral in terms of antiderivatives.

Exercise 8.3

Use the ‘adding powers’ power law and the constant multiple rule to show that

\[ \int e^{x+k}\,dx = e^{x+k} + c, \]

where c is an arbitrary constant.

Exercise 8.4

Find the indefinite integral

\[ \int (2x - 1)^2\,dx, \]

by multiplying out the brackets and integrating term-by-term.

Exercise 8.5

Evaluate the following definite integrals.

i. $\int_7^{15} 2\,dx$;

ii. $\int_0^1 x^5\,dx$;

iii. $\int_2^8 \frac{1}{2x}\,dx$;

iv. $\int_{-1}^0 2e^x\,dx$;

v. $\int_1^4 3\sqrt{x}\,dx$;

vi. $\int_{-1}^2 (4x^3 + 3)\,dx$;

vii. $\int_0^9 x\sqrt{x}\,dx$;

viii. $\int_0^{\pi} \sin(x)\,dx$.

Exercise 8.6

Find the areas between the following curves, the x-axis and the vertical lines x = 1 and x = 3. (You may find it useful to sketch the curve in each case.)

i. $y = x^2 - x - 2$, ii. $y = x^2 - 3$.


Unit 9: Financial Mathematics I
Compound interest and its uses

Overview

In this unit we look at some of the basic ideas behind financial mathematics. The key concept here is compound interest and how this adds value to our savings. We will also look at different compounding intervals and see how we can use Annual Percentage Rates (or APRs) to compare investments with different interest rates and compounding intervals. Lastly, we will see how these ideas also allow us to model the depreciation of assets over time.

Aims

The aims of this unit are as follows.

To see how different kinds of interest work.

To see how certain investments can be compared using APRs.

To see how we can model depreciation using compounding.

Specific learning outcomes can be found near the end of this unit.

9.1 Interest

Suppose you deposit a certain amount, called the principal, in a savings account that offers you a certain rate of interest. Let’s say, for example, that you want to invest £500 in a savings account which pays 12% interest annually (i.e. every year). This means that, after a year, you will receive 12% of £500 in interest. How much will this be? Well, we recall that

\[ 12\% = \frac{12}{100} = 0.12, \]

and so we can see that 12% of £500 is given by

500× 0.12 = 60.

That is, you will accrue (or receive) £60 in interest from investing this principal in this account for a year and the amount in your account, called the balance, will now be £560.

If we were to leave this money in the account for a second year, it then becomes necessary to know how the interest is being calculated and there are two types of interest that we may wish to consider:


Simple interest is where the bank always pays you interest on your principal. That is, even though the balance is £560 at the end of the first year, you still only accrue 12% of £500 in interest. As such, under simple interest, your balance at the end of the second year will be £620 as you will have your original deposit of £500 plus two lots of £60 in interest.

Compound interest is where the bank always pays you interest on your balance. That is, at the end of the second year you will accrue

£560× 0.12 = £67.20,

in interest. As such, under compound interest, your balance at the end of the second year will be £627.20 as you will now have an additional £67.20 to add to your previous balance of £560.

Notice, in particular, that compound interest gives us a higher balance at the end of the second year than simple interest because we also get interest on the interest from the previous year. That is, at the end of the first year our balance is

principal + first year’s interest = £500 + £60,

and, after the second year, we get 12% interest on both of these amounts which gives us an additional £60 from the principal and an additional

£60× 0.12 = £7.20,

from the first year’s interest yielding, as expected from above, a total of £67.20 in interest. In this course, we will mainly focus on compound interest as that is most widely used, but we will occasionally mention simple interest in the activities or when it provides a useful application.
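The difference between the two kinds of interest can also be seen by working through the £500 example year by year. The following Python snippet is a minimal sketch of that calculation; the variable names are our own and are not part of the guide.

```python
principal = 500.0
rate = 0.12  # 12% per year written as a decimal

simple_balance = principal
compound_balance = principal
for year in (1, 2):
    simple_balance += principal * rate           # interest on the principal only
    compound_balance += compound_balance * rate  # interest on the current balance
    print(year, round(simple_balance, 2), round(compound_balance, 2))
```

The only difference between the two updates is what the rate is applied to: the fixed principal for simple interest, and the running balance for compound interest.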

Of course, this process of calculating simple or compound interest can continue for any number of years and so, instead of working these things out year-by-year we want to be able to work with a formula that will tell us the balance of the account after any given number of years. In particular, to find these formulae, we need to generalise our discussion so that we are now dealing with the following variables.

P , the principal, i.e. the amount that is initially invested.

r, the annual interest rate written in decimal form.∗

n, the number of years in which we are interested.

In what follows we will find a formula that will allow us to calculate the balance of an account in terms of these variables under compound interest. We will leave you to find the corresponding formula for simple interest in Activity 9.1.

∗In particular, even though interest rates will usually be given as a percentage, we want to work with the decimal. That is, when speaking generally, we will specify an interest rate of 100r% as this corresponds to the decimal r. (For example, we had 12% above and, as 100(0.12) = 12, this gives us the decimal r = 0.12 that we used in our calculations.)


Activity 9.1 Suppose that you invest P in an account that pays simple interest at a rate of 100r% per year. Explain why the balance of this account will be P(1 + nr) after n years.

9.1.1 A formula for balances under annually compounded interest

We have seen how to calculate the compound interest accrued on £500 over two years at 12% interest per year and we have used this to calculate the balance of the account at the end of this time period. However, this method of calculating the balance is quite tricky to generalise and so, instead of using the method above, we want to look at another way of finding the balance at the end of each year. In particular, observe that the balance at the end of the first year can be written as

\begin{align*}
500 + 12\% \text{ of } 500 &= 500 + 500 \times 0.12 && \text{meaning of ‘12\% of’} \\
&= 500(1 + 0.12) && \text{common factor of 500} \\
&= 500(1.12) && \text{simplifying the bracket}
\end{align*}

which is, again, £560. That is, writing 12% as the decimal 0.12, we can see that the effect of applying interest at 12% per year is the same as multiplying our principal by 1.12. Similarly, since the balance at the end of the first year is £500(1.12), at the end of the second year, we have

\begin{align*}
500(1.12) + 12\% \text{ of } 500(1.12) &= 500(1.12) + 500(1.12) \times 0.12 && \text{meaning of ‘12\% of’} \\
&= 500(1.12)(1 + 0.12) && \text{common factor of 500(1.12)} \\
&= 500(1.12)(1.12) && \text{simplifying the bracket} \\
&= 500(1.12)^2 && \text{combining the brackets}
\end{align*}

which is, again, £627.20. Then, to see how much money will be in the account at the end of three years, we just multiply again by (1.12) to get $500(1.12)^3$, i.e. £702.46 (to the nearest penny) and, clearly, this generalises.

The key, then, is to think of our interest rate of 100r% per year as the decimal number r so that, given a principal P, we can see that:

After one year, the balance of the account will be P from the initial investment plus Pr from the interest accrued on this investment, i.e. after one year we will have

\[ P + Pr = P(1 + r). \]

After two years, the balance of the account will be $P(1 + r)$ from the balance at the end of the first year plus $P(1 + r)r$ from the interest accrued on this balance, i.e. after two years we will have

\[ P(1 + r) + P(1 + r)r = P(1 + r)(1 + r) = P(1 + r)^2. \]

After three years, the balance of the account will be $P(1 + r)^2$ from the balance at the end of the second year plus $P(1 + r)^2 r$ from the interest accrued on this balance, i.e. after three years we will have

\[ P(1 + r)^2 + P(1 + r)^2 r = P(1 + r)^2 (1 + r) = P(1 + r)^3. \]


and so on, until . . .

After n years, the balance of the account will be $P(1 + r)^{n-1}$ from the balance at the end of the (n − 1)th year plus $P(1 + r)^{n-1} r$ from the interest accrued on this balance, i.e. after n years we will have

\[ P(1 + r)^{n-1} + P(1 + r)^{n-1} r = P(1 + r)^{n-1}(1 + r) = P(1 + r)^n. \]

Thus, we have the following result.

A principal, P, invested in an account that pays 100r% interest per year under annual compounding will give a balance of

\[ P(1 + r)^n, \]

after n years.

Annually compounded interest

9.1.2 Other compounding intervals

We have seen how to calculate the balance of an account where interest is compounded annually, but often, the interest is calculated more frequently than this. For example, the interest may be calculated on a quarterly basis and we call this quarterly compounding. To see how this works, let’s consider a variation on our earlier example. Let’s say that we invest £500 in an account which pays 12% interest per year compounded quarterly. To find the balance of this account after a year, we start by dividing the annual interest rate by four to get the quarterly rate, i.e.

\[ \text{quarterly rate} = \frac{\text{annual rate}}{4} = \frac{0.12}{4} = 0.03, \]

as there are four quarters in a year. Now, working this out as before, this means that

after the first quarter the balance is $500 \times (1.03) = 515$,

after the second quarter the balance is $500 \times (1.03)^2 = 530.45$,

after the third quarter the balance is $500 \times (1.03)^3 = 546.36$,

after the fourth quarter the balance is $500 \times (1.03)^4 = 562.75$,

and so £562.75 is the balance of the account after one year (to the nearest penny). Indeed, carrying on with this argument, we see that the balance of this account after n years, given that we are using quarterly compounding, is

\[ 500 \times (1.03)^{4n}, \]

since n years is the same as 4n quarters and in each quarter we compound at the quarterly interest rate. Thus, thinking of this more generally, we get the following result.


A principal, P, invested in an account that pays 100r% interest per year under quarterly compounding will give a balance of

\[ P\left(1 + \frac{r}{4}\right)^{4n}, \]

after n years. Note that r/4 is the quarterly interest rate and 4n is the number of quarters in n years.

Quarterly compounded interest

Activity 9.2 Explain why quarterly compounded interest works in this way.

There are, of course, other periods over which compounding can occur, for example:

monthly compounding uses a monthly rate of r/12 and, after the first year, the balance is

\[ P\left(1 + \frac{r}{12}\right)^{12}, \]

due to the twelve monthly compoundings. After n years this yields a balance of

\[ P\left(1 + \frac{r}{12}\right)^{12n}, \]

as there are 12n months in n years.

weekly compounding uses a weekly rate of r/52 and, after the first year, the balance is

\[ P\left(1 + \frac{r}{52}\right)^{52}, \]

due to the fifty-two weekly compoundings. After n years this yields a balance of

\[ P\left(1 + \frac{r}{52}\right)^{52n}, \]

as there are 52n weeks in n years.

Activity 9.3 Explain why monthly and weekly compounded interest work in this way.

Activity 9.4 Say I invest £500 at 12% interest per year. What is the balance after one year if the interest is compounded annually? Quarterly? Monthly? Weekly? What do you notice about these numbers?

In each case, what is the balance after three years?

In each case, how much interest will you have received after six months?

[Note that, to 5dp, $(1.01)^6 = 1.06152$ and $\left(\frac{1303}{1300}\right)^{26} = 1.06176$.]


Thinking about these examples more generally leads us to the following general result for compounding over a given interval.

A principal, P, in an account that pays 100r% interest per year where interest is compounded over m equal intervals in each year will give a balance of

\[ P\left(1 + \frac{r}{m}\right)^{mn}, \]

after n years. Note that r/m is the interest rate for each compounding and mn is the number of compoundings in n years.

Compound interest over a given interval

And, using this result, we can easily recover all of the compounding results that we have seen so far.
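As a rough illustration, the general result can be written as a single Python function from which the annual, quarterly, monthly and weekly cases follow by choosing m. The function name `balance` and its parameter names are our own choices, not notation from the guide.

```python
def balance(principal, rate, years, periods_per_year=1):
    """Balance after `years` years at annual rate `rate` (as a decimal),
    compounded `periods_per_year` times a year: P(1 + r/m)^(mn)."""
    per_period_rate = rate / periods_per_year
    number_of_compoundings = periods_per_year * years
    return principal * (1 + per_period_rate) ** number_of_compoundings


# The £500 at 12% example under different compounding intervals, after one year.
for label, m in [("annually", 1), ("quarterly", 4), ("monthly", 12), ("weekly", 52)]:
    print(label, round(balance(500, 0.12, 1, m), 2))
```

Setting `periods_per_year=1` recovers the annual formula, and `periods_per_year=4` recovers the quarterly one.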

Activity 9.5 Explain why the general result for compound interest over a given interval works.

Daily compounded interest is always calculated on the assumption that a year has 365 days. But, as we know, in reality, every four years we have a leap year that has 366 days. In the next activity, just for fun, we consider how this would affect the calculation of daily compounded interest if we were to take this into account.

Activity 9.6 Say I invest £1,000,000 at 12% interest per year at the beginning of a common (i.e. non-leap) year. If the interest is compounded daily, what is the balance at the end of the year?

Say I invest £1,000,000 at 12% interest per year at the beginning of a leap year. If the interest is compounded daily, what is the balance at the end of the year?

What is the balance after four years?

[Note that, to 8dp, $\left(\frac{9128}{9125}\right)^{365} = 1.12747462$ and $\left(\frac{3051}{3050}\right)^{366} = 1.12747468$.]

9.1.3 Continuous compounding

As we saw in Activity 9.4, given a fixed principal and a fixed annual interest rate, we get a larger balance if we have a larger number of compoundings in a year. In particular, we have seen that after one year, if we have an interest rate of 100r% per year and we compound m times in a year, the balance of the account will be

\[ P\left(1 + \frac{r}{m}\right)^m, \]

where r/m is the interest applied during each compounding. Indeed, as m increases, even though r/m, the rate at which interest is earned in each period, decreases, the number of periods increases and, overall, the effect of these two changes is an increase in the balance at the end of that year. So, one might wonder what the balance after one year would be if we were to make m, the number of compoundings in a year, arbitrarily large. Would the balance continue to increase? Or would it level off at some maximum value? In fact, it turns out that we get the latter and the maximum value we get involves the number e that we first saw in Section 4.1.2. In particular, we get the following result.

As m gets larger and larger, the value of

\[ \left(1 + \frac{r}{m}\right)^m \]

gets closer and closer to

\[ e^r, \]

where the number e, which we call the exponential constant, is approximately 2.71828 (5dp).

The exponential constant

Thus we can see that if the bank was to compound continuously (or, speaking loosely, if the value of m was ‘infinitely large’ so that the interest was effectively being compounded at ‘every instant’), the balance of the account at the end of

one year would be $P e^r$,

two years would be $(P e^r) e^r = P e^{2r}$,

three years would be $(P e^{2r}) e^r = P e^{3r}$,

and so on until, at the end of

n years we would have $(P e^{(n-1)r}) e^r = P e^{nr}$ in the account.

Thus, in general, we have the following result.

A principal, P, in an account that pays 100r% interest per year under continuous compounding will give a balance of

\[ P e^{nr}, \]

after n years. Here e is the exponential constant.

Continuously compounded interest

Clearly, this means that if I invest £500 at 12% interest per year with continuous compounding, then given that $e^{0.12} = 1.127497$ to 6dp, we can see that the balance of the account after one year will be given by,

\[ 500 e^{0.12} = 500(1.127497) = 563.7485, \]

or £563.75 (to the nearest penny) which is, we note, more than we would get from any finite number of compoundings.
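To get a feel for how the one-year growth factor levels off, we can tabulate $(1 + r/m)^m$ for increasing m and compare it with $e^r$. The following Python sketch does this for r = 0.12; the particular values of m are arbitrary choices made for illustration.

```python
import math

rate = 0.12
compoundings = [1, 4, 12, 52, 365, 1_000_000]

# One-year growth factor (1 + r/m)^m for each number of compoundings m.
factors = [(1 + rate / m) ** m for m in compoundings]
limit = math.exp(rate)  # growth factor under continuous compounding, e^r

for m, factor in zip(compoundings, factors):
    print(m, factor)
print("continuous", limit)
```

Each factor is larger than the one before it, but none of them exceeds $e^{0.12}$.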


Activity 9.7 If I invest £500 at 12% interest per year with continuous compounding, what will be the balance of the account after (i) two years, (ii) six years and (iii) n years?

[Note that, to 6dp, $e^{0.24} = 1.271249$.]

9.2 Problems involving interest rates

The balance of a bank account is specified by four pieces of information:

How much do I invest? This is the principal, P .

What is the interest rate? This is the annual interest rate, 100r%.

How long do I invest for? This is the number of years the investment is going to last for, n.

How often is the interest compounded? This is the number of compoundings in a year, m, if we are compounding a finite number of times every year or, if we are continuously compounding, this is telling us to use $e^r$.

Often, mathematical problems concerning such investments supply you with two of the first three bits of information (together with information about how often the interest is compounded) and ask you to find the third. Let’s look at some examples.

9.2.1 How much do I need to invest to get...?

For example, consider that you are investing in an account which pays 12% interest per year compounded annually and you want to get £10,000 after five years. How much do you need to invest to get this return? In this case, we seek the smallest principal, P, that will satisfy the inequality

\[ P(1.12)^5 \geq 10{,}000 \implies P \geq \frac{10{,}000}{(1.12)^5} = 5{,}674.268, \]

to 3dp if we use the fact that $(1.12)^5 = 1.762342$ to 6dp. This means that, if I invest £5,674.27, we will meet, or rather just exceed, our target.
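The same calculation can be sketched in Python. Rounding up to the next penny with `math.ceil` is our own way of expressing ‘meet, or rather just exceed’; the variable names are also our own.

```python
import math

target = 10_000
rate = 0.12
years = 5

# Smallest principal satisfying P(1 + r)^n >= target, before rounding.
exact_principal = target / (1 + rate) ** years

# Round up to the next whole penny so that the target is not missed.
principal = math.ceil(exact_principal * 100) / 100
print(principal)
```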

9.2.2 What interest rate do I need to get...?

For example, consider that you are investing £5,000 and you want to get £6,000 after a five year period. If the interest is compounded annually, what interest rate do you require the bank to have? In this case, we need to find the smallest interest rate 100r% per year that will satisfy the inequality

\[ 5{,}000(1 + r)^5 \geq 6{,}000, \]

which we can rearrange to get

\[ (1 + r)^5 \geq \frac{6{,}000}{5{,}000} \implies 1 + r \geq \left(\frac{6}{5}\right)^{\frac{1}{5}} \implies r \geq \left(\frac{6}{5}\right)^{\frac{1}{5}} - 1 = 0.0371, \]


to 4dp if we use the fact that $(6/5)^{1/5} = 1.0371$ to 4dp. This means that the interest rate needs to be at least 3.8% (to 1dp) if we want to ensure that we meet, or rather just exceed, our target.
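As a quick check of why the rate is rounded up rather than to the nearest tenth of a percent, the following Python sketch computes the required rate and compares the balances at 3.7% and 3.8%. The variable names are our own.

```python
principal = 5_000
target = 6_000
years = 5

# Smallest annual rate (as a decimal) satisfying P(1 + r)^n >= target.
required_rate = (target / principal) ** (1 / years) - 1
print(round(required_rate, 4))

# Rounding down to 3.7% falls short of the target; rounding up to 3.8% does not.
balance_at_3_7 = principal * (1 + 0.037) ** years
balance_at_3_8 = principal * (1 + 0.038) ** years
print(balance_at_3_7 < target, balance_at_3_8 >= target)
```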

9.2.3 How long do I need to invest to get...?

For example, consider that you are investing £500 at 12% interest per year compounded annually. How long do you need to invest for in order to get a balance of £1,000? In this case, we need to find the smallest number of years, n, that will satisfy the inequality

\[ 500(1.12)^n \geq 1{,}000 \implies (1.12)^n \geq \frac{1{,}000}{500} = 2. \]

That is, we need to find the value of n that makes $(1.12)^n$ greater than or equal to 2, a problem that is easily solved using logarithms. For instance, if we take common logarithms of both sides of this inequality, we get

\[ \log[(1.12)^n] \geq \log(2), \]

and so, using the power law for logarithms on the left-hand side of this inequality, we get

\[ n \log(1.12) \geq \log(2) \implies n \geq \frac{\log(2)}{\log(1.12)}, \]

as log(1.12) > 0. Now, if we were given that log(2) = 0.301 and log(1.12) = 0.049, both to 3dp, this gives us

\[ \frac{\log(2)}{\log(1.12)} = \frac{0.301}{0.049} = 6.14, \]

to 2dp. Thus, as this gives us n ≥ 6.14, we need to invest for at least 6.14 years if we want to ensure that we meet, or rather just exceed, our target. Consequently, as interest is calculated at the end of each year, this means that we must invest for seven years to get the desired return.
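The same reasoning can be sketched in Python using common logarithms. Note that with unrounded logarithms the ratio comes out slightly smaller than the 6.14 obtained from the 3dp values above, but the conclusion of seven years is unchanged. The variable names are our own.

```python
import math

principal = 500
target = 1_000
rate = 0.12

# n >= log(target / principal) / log(1 + r), using common (base 10) logarithms.
exact_years = math.log10(target / principal) / math.log10(1 + rate)

# Interest is only added at the end of each year, so round up to a whole year.
whole_years = math.ceil(exact_years)
print(whole_years)
```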

9.2.4 Annual percentage rates

Suppose that we are given a choice between two bank accounts. One offers an interest rate of 10% per year with daily compounding and the other offers an interest rate of 10.1% per year with quarterly compounding. The question is, which of these accounts will give you the best return on your money? The one with the higher interest rate or the one where the interest is compounded more often?

One way of comparing these accounts is to calculate the Annual Percentage Rate (or APR). This gives us a way of comparing the returns by asking, if I invested £1 in the account for one year, what interest rate with annual compounding would I need to get the same return? That is, by converting the returns into an equivalent interest rate that uses a standard number of compoundings (in this case, one) over one year, we can use the APRs to decide which account gives the higher return.

So, if we let $100r^*\%$ be the APR, in the case of the account where we have an interest rate of 10% per year with daily compounding, investing £1 would give us

\[ \underbrace{\left(1 + \frac{0.1}{365}\right)^{365}}_{\text{return from account}} = \underbrace{1 + r^*}_{\text{return from annual compounding}}, \]


given that we are comparing the investments over one year. Then, if we were given the relevant information, say that

\[ \left(1 + \frac{0.1}{365}\right)^{365} = 1.1052 \]

to 4dp, we would find that

\[ 1 + r^* = 1.1052 \implies r^* = 0.1052 = 10.52\%, \]

is the APR. And, similarly, in the case of the account where we have an interest rate of 10.1% per year with quarterly compounding, investing £1 would give us

\[ \underbrace{\left(1 + \frac{0.101}{4}\right)^{4}}_{\text{return from account}} = \underbrace{1 + r^*}_{\text{return from annual compounding}}, \]

given that we are comparing the investments over one year. Then, if we were given the relevant information, say that

\[ \left(1 + \frac{0.101}{4}\right)^{4} = 1.1049 \]

to 4dp, we would find that

\[ 1 + r^* = 1.1049 \implies r^* = 0.1049 = 10.49\%, \]

is the APR. Thus, as we get a better return (i.e. a higher APR) from the account where I have an interest rate of 10% per year with daily compounding, we should opt for this one. In particular, notice that here, the higher number of compoundings overcompensates for the fact that this account has a slightly lower interest rate! To summarise then, we have the following result.

An account that pays 100r% interest per year where interest is compounded over m equal intervals in each year has an APR of

\[ \left(1 + \frac{r}{m}\right)^m - 1, \]

as a decimal. If the interest is continuously compounded, then the APR is

\[ e^r - 1, \]

as a decimal. Here e is the exponential constant.

Annual percentage rate (APR)
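The APR result can be sketched as two small Python functions and used to compare the two accounts discussed above. The function names `apr` and `apr_continuous` are our own and are used only for illustration.

```python
import math


def apr(rate, periods_per_year):
    """APR (as a decimal) for annual rate `rate` compounded
    `periods_per_year` times a year: (1 + r/m)^m - 1."""
    return (1 + rate / periods_per_year) ** periods_per_year - 1


def apr_continuous(rate):
    """APR (as a decimal) under continuous compounding: e^r - 1."""
    return math.exp(rate) - 1


daily_account = apr(0.10, 365)    # 10% per year, compounded daily
quarterly_account = apr(0.101, 4)  # 10.1% per year, compounded quarterly

# The account with the larger APR gives the better return over one year.
print(daily_account > quarterly_account)
```

Because both rates are converted to the same basis (one compounding per year), the comparison is a direct one between two decimals.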

Activity 9.8 You want to invest some money for a year and are given the choice between two accounts that use monthly compounding. Given that one account offers an interest rate of 5% per year and the other account offers an interest rate of 6% per year for the first three months and 4% per year for the rest of the year, find their APRs and decide which gives the best return.

[Note that, to 5dp, $e^{0.015} = 1.01511$, $e^{0.03} = 1.03045$ and $e^{0.05} = 1.05127$.]


9.3 Depreciation

Often, when you buy an asset, e.g. a car, its value depreciates (or goes down) over time. For example, if you buy a car for £10,000 and you know that a car depreciates at a rate of 5% per year, its value after one year is given by

$$10{,}000 - 5\% \text{ of } 10{,}000 = 10{,}000 - 10{,}000 \times 0.05 = 10{,}000(1 - 0.05) = 10{,}000 \times 0.95,$$

which is £9,500. To find the value after two years we follow a similar procedure to get

$$10{,}000 \times (0.95)^2,$$

which is £9,025. Clearly, this generalises, so that after $n$ years the value of the car is given by $10{,}000(0.95)^n$ pounds.

Thus, the idea behind depreciation is that the rate of depreciation acts like a compound interest rate, but whereas with compound interest we add the effect of the interest rate, when we look at depreciation we need to subtract the effect of the rate of depreciation as the value is decreasing over time. And, as we saw above, this means that we can use the same formulae as before, but with the interest rate replaced by the negative number $-r$, where the positive number $r$ is the rate of depreciation. As such, we have the following result.

If the initial value of an asset is $V$ and it depreciates at a rate of $100r\%$ per year where depreciation is compounded over $m$ equal intervals in each year, then its value will be

$$V\left(1 - \frac{r}{m}\right)^{mn},$$

after $n$ years. Here $r/m$ is the rate of depreciation for each compounding and $mn$ is the number of compoundings in $n$ years. If this asset depreciates continuously, its value after $n$ years is

$$V e^{-nr},$$

where $e$ is the exponential constant.

Compound depreciation

Example 9.1 A computer is bought for £1,000 and its value depreciates continuously at a rate of 40% per year. How much will the computer be worth after six months?

As the value of the computer is depreciating continuously at a rate of 40% for six months, which is half of a year, its value after that time is given by

$$1{,}000\, e^{-(0.5)(0.4)} = 1{,}000\, e^{-0.2} = 1{,}000(0.81873) = 818.73,$$

where we have used the fact that $e^{-0.2} = 0.81873$ to 5dp. That is, the computer will be worth £818.73 after six months.
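The two depreciation formulae can be checked numerically with a short Python sketch such as the one below. The function names and the default of annual compounding (`m=1`) are our own assumptions for illustration.

```python
import math


def depreciated_value(V, r, n, m=1):
    """Value after n years at depreciation rate r, compounded m times a year."""
    return V * (1 - r / m) ** (m * n)


def depreciated_value_continuous(V, r, n):
    """Value after n years when depreciation at rate r is continuous."""
    return V * math.exp(-n * r)


# The car from the start of this section: 5% per year, compounded annually.
print(round(depreciated_value(10_000, 0.05, 1), 2))
print(round(depreciated_value(10_000, 0.05, 2), 2))

# Example 9.1: 40% per year, continuous depreciation, for half a year.
print(round(depreciated_value_continuous(1_000, 0.40, 0.5), 2))
```

Note that `n` need not be a whole number in the continuous case, which is what allows the six-month calculation in Example 9.1.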


Learning outcomes

At the end of this unit, you should be able to:

solve problems that involve simple and compound interest;

use APRs to compare investments;

solve problems that involve depreciation.

Exercises

Exercise 9.1

You invest P pounds in a savings account that pays 5% interest per year using annual compounding.

(i) Write down, in terms of P, the amount that will be in the account after one, two and three years.

(ii) If, after two years, the amount in the account is £1,764, how much did you initially invest?

Exercise 9.2

Find the value of a principal sum of £10,000 invested at an interest rate of 12% per year for three years when the interest is compounded (i) annually, (ii) quarterly, (iii) monthly, and (iv) continuously.

What is the APR of each of these investments?

[Note that, to 7dp, $(1.03)^4 = 1.1255088$, $(1.01)^{12} = 1.1268250$ and $e^{0.12} = 1.1274969$.]

Exercise 9.3

Two investments are made and it is given that the principal of one is 80% of the other. If the smaller principal is put into an account where interest is paid at 5% per year using continuous compounding and the larger principal is put into an account where interest is paid at 2% per year using continuous compounding, how long will it take for the two accounts to have the same balance?

[Note that, to 5dp, ln(0.8) = −0.22314.]

Exercise 9.4

A car is worth £20,000 brand-new, but its value depreciates continuously at a rate of 20% per year.

(i) How much will the car be worth in three years?

(ii) When will it be worth half of its initial value?

[Note that, to 7dp, $e^{-0.2} = 0.8187308$ and $\ln(2) = 0.6931472$.]


10. Financial Mathematics II — Applications of series

Unit 10: Financial mathematics II: Applications of series

Overview

In this final Mathematics unit, we look at some more complicated ideas in financial mathematics. The key concept here is a geometric series and how this allows us to deal with regular savings plans and annuities. We will also see how to compare the value of different investment strategies using the idea of present values.

Aims

The aims of this unit are as follows.

To see how arithmetic and geometric series can be summed.

To see how geometric series can be used to model certain kinds of investment.

To see how certain investments can be compared using present values.

Specific learning outcomes can be found near the end of this unit.

10.1 Sequences and series

In general, a sequence is an ordered list of numbers such as

2, 5, 8, 11, . . .

where here, the list is ordered because we consider 2 to be the first term, 5 to be the second term and so on. Indeed, we could think of this sequence of numbers as what we get when we start with two and then add three to the previous term to get each successive term. As we could continue to do this indefinitely, we use the ‘. . .’ to indicate that this list of numbers goes on forever. In this course, we will be interested in two special types of sequence and what we get when we add up some (or all) of the terms in such a sequence.

10.1.1 Arithmetic sequences and series

An arithmetic sequence is a sequence where each term is found by adding a common difference to the previous term. That is, if the first term is a and the common difference is d, then we get the arithmetic sequence given by

a, a+ d, a+ 2d, a+ 3d, . . .


which is generated by adding the number d to each term to get the next term. Observe that we call d the common difference because we move from one term of the sequence to the next by adding d.

Of course, we have seen this kind of thing before since taking the first term to be P and the common difference to be Pr, we get the arithmetic sequence

P, P + Pr, P + 2Pr, P + 3Pr, . . .

which is, for principal P and an interest rate of 100r% per year, the initial balance followed by the balance after one year, two years, three years, . . . under simple interest. Of course, this means that the balance after n years will be given by P + nPr, or P(1 + nr), as we saw in Activity 9.1.

Summing an arithmetic series

An arithmetic series is what we get when we add up a certain number of successive terms from an arithmetic sequence. For instance, if we were to add up the first three terms of the arithmetic sequence

a, a+ d, a+ 2d, a+ 3d, . . .

we would want to find the sum of the arithmetic series

a+ (a+ d) + (a+ 2d).

We can easily find this sum, let’s call it S, by writing it as

S = a+ (a+ d) + (a+ 2d),

and then rewriting the series in reverse order to get

S = (a+ 2d) + (a+ d) + a,

so that, adding the corresponding terms in these two expressions for S together we get

2S = [a+ (a+ 2d)] + [(a+ d) + (a+ d)] + [(a+ 2d) + a],

which gives us

$$2S = [2a + 2d] + [2a + 2d] + [2a + 2d].$$

Now, since there are three occurrences of $(2a + 2d)$ on the right-hand side of this expression, we get

$$2S = 3[2a + 2d] \implies S = \frac{3}{2}[2a + 2d] = 3[a + d],$$

as the sum of this arithmetic series.

In fact, this procedure can be used to sum any arithmetic series and if we apply it to the first n terms of the arithmetic sequence

a, a+ d, a+ 2d, a+ 3d, . . .


we can find a formula for the sum of any arithmetic series. If we do this, we get the following result.

The sum of the arithmetic series

$$a + (a + d) + (a + 2d) + \cdots + (a + [n-1]d),$$

where $a$ is the first term of the series, $d$ is the common difference and $n$ is the number of terms is

$$\frac{n}{2}\big(2a + [n-1]d\big).$$

Sum of an arithmetic series

A useful way of remembering this formula is to note that we can write

$$2a + [n-1]d \quad\text{as}\quad a + (a + [n-1]d),$$

and so we have

$$a + (a+d) + (a+2d) + \cdots + (a + [n-1]d) = \frac{n}{2}\big(a + (a + [n-1]d)\big) = n\left(\frac{a + (a + [n-1]d)}{2}\right).$$

So, noting that $n$ is the number of terms, $a$ is the first term of the series and $a + [n-1]d$ is the last term in the series, this means that the sum of the arithmetic series

$$a + (a+d) + (a+2d) + (a+3d) + \cdots + (a + [n-1]d),$$

can be thought of as ‘the number of terms multiplied by the average of the first and last terms of the series’.

Example 10.1 Find the sum of the arithmetic series 1 + 4 + 7 + 10 + 13.

Here the first term is 1 and we have to add three to get each successive term, so the common difference is 3. So, as there are five terms in the series, we can use the formula to see that

$$1 + 4 + 7 + 10 + 13 = \frac{5}{2}\big(2(1) + [5-1](3)\big) = \frac{5}{2} \times 14 = 35,$$

as you can easily verify with your calculator.

Alternatively, we see that the first term is 1, the last term is 13 and so their average is 7. Multiplying this by the number of terms, i.e. 5, we again get a sum of 35.
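Instead of a calculator, the check in Example 10.1 can be done with a few lines of Python. The sketch below compares the formula with a term-by-term sum; the function names are hypothetical and chosen only for this illustration.

```python
def arithmetic_sum(a, d, n):
    """Sum of the first n terms of a, a + d, a + 2d, ... using the formula."""
    return n * (2 * a + (n - 1) * d) / 2


def arithmetic_sum_direct(a, d, n):
    """The same sum found by adding up the terms one at a time."""
    return sum(a + k * d for k in range(n))


# Example 10.1: first term 1, common difference 3, five terms.
print(arithmetic_sum(1, 3, 5))
print(arithmetic_sum_direct(1, 3, 5))

# 'Number of terms times the average of the first and last terms'.
first, last, n = 1, 13, 5
print(n * (first + last) / 2)
```

All three lines should agree with the value found in the example.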

Activity 10.1 Find the sum of the whole numbers from 1 to 100.

If n is a whole number, what is the sum of the whole numbers from 1 to n?

Activity 10.2 Suppose that you have an eccentric aunt who, starting in 2000, gives you a cash gift every year and the amount you get (in pounds) is given by the year. (So, in 2000 you get a gift of £2,000 and in 2001 you get a gift of £2,001, etc.) If you save all of these gifts in your money box, how much will you have after you have received the gift in 2013?

How much will you have in your money box after you have received n of these gifts?

Activity 10.3 (Hard) Use the procedure described above to derive the formula for the sum of an arithmetic series.

10.1.2 Geometric sequences and series

A geometric sequence is a sequence where each term is found by multiplying the previous term by a common ratio. That is, if the first term is $a$ and the common ratio is $r$, then we get the geometric sequence given by

$$a, ar, ar^2, ar^3, ar^4, \ldots$$

which is generated by multiplying each term by the common ratio to get the next term. Observe that we call $r$ the common ratio because we move from one term of the sequence to the next by multiplying by $r$.

Of course, we have seen this kind of thing before since taking the first term to be $P$ and the common ratio to be $1 + r$, we get the sequence

$$P, P(1 + r), P(1 + r)^2, P(1 + r)^3, \ldots$$

which is, for a principal $P$ and a $100r\%$ per year interest rate, the initial balance followed by the balance after one year, two years, three years, . . . under annual compounding.

Summing a geometric series with a finite number of terms

A geometric series is what we get when we ‘add up’ a certain number of successive terms from a geometric sequence. For instance, if we were to add up the first three terms of the geometric sequence

$$a, ar, ar^2, ar^3, ar^4, \ldots$$

we would want to find the sum of the geometric series

$$a + ar + ar^2.$$

We can easily find this sum, let’s call it $S$, by writing

$$S = a + ar + ar^2,$$

and then multiplying this whole expression by the common ratio, $r$, to get

$$rS = ar + ar^2 + ar^3,$$

so that, subtracting the second expression from the first we get

$$S - rS = a - ar^3,$$


as all the intermediate terms cancel. Taking out the common factor of $S$ on the left-hand side and $a$ on the right-hand side then gives

$$S(1 - r) = a(1 - r^3).$$

So, assuming that $r \neq 1$,∗ we have

$$S = a\,\frac{1 - r^3}{1 - r},$$

as the sum of this geometric series.

But, what happens if r = 1? This case is not covered by the formula that we have just found and so we must treat it separately. To do this, consider what happens to this geometric series if r = 1 and notice that, in this case, all of the terms just become a, i.e. we just have

S = a+ a+ a = 3a,

and so, this is the sum of this geometric series if r = 1.

In fact, this procedure can be used to sum any geometric series and if we apply it to the first $n$ terms of the geometric sequence

$$a, ar, ar^2, ar^3, ar^4, \ldots$$

we can find a formula for the sum of any geometric series with a finite number of terms. If we do this we get the following result.

The sum of the finite geometric series,

$$a + ar + ar^2 + \cdots + ar^{n-2} + ar^{n-1},$$

where $a$ is the first term, $r$ is the common ratio and $n$ is the number of terms is

$$a\,\frac{1 - r^n}{1 - r},$$

provided that $r \neq 1$. If $r = 1$, the sum is $an$ instead.

Sum of a finite geometric series

Example 10.2 Sum the geometric series $1 + 2 + 2^2 + 2^3 + 2^4 + 2^5$.

This geometric series has six terms where the first term is 1 and the common ratio is 2. Thus, using the formula, we have

$$1 + 2 + 2^2 + 2^3 + 2^4 + 2^5 = 1 \times \frac{1 - 2^6}{1 - 2} = \frac{1 - 2^6}{-1} = 2^6 - 1 = 63,$$

as the sum of this series. This can be verified by adding up the terms on your calculator.

∗So that we are not dividing by zero!


Example 10.3 Sum the geometric series 3 + 6 + 12 + 24 + 48.

We notice that each successive term of this series is multiplied by two and so this geometric series, which has five terms, can be written as

$$3 + 6 + 12 + 24 + 48 = 3 + 3 \times 2 + 3 \times 2^2 + 3 \times 2^3 + 3 \times 2^4,$$

which means that the first term is 3 and the common ratio is 2. Thus, using the formula, we have

$$3 + 6 + 12 + 24 + 48 = 3 \times \frac{1 - 2^5}{1 - 2} = 3 \times \frac{-31}{-1} = 93,$$

as the sum of this series. This can be verified by adding up the terms on your calculator.

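Example 10.4 Sum the geometric series $\frac{1}{2} - \frac{1}{4} + \frac{1}{8} - \frac{1}{16}$.

We note that this geometric series has four terms where the first term is $\frac{1}{2}$ and, as we can write it as

$$\frac{1}{2} - \frac{1}{4} + \frac{1}{8} - \frac{1}{16} = \frac{1}{2} + \frac{1}{2}\left(-\frac{1}{2}\right) + \frac{1}{2}\left(-\frac{1}{2}\right)^2 + \frac{1}{2}\left(-\frac{1}{2}\right)^3,$$

we can see that the common ratio is $-\frac{1}{2}$. Thus, using the formula, we have

$$\frac{1}{2} \times \frac{1 - \left(-\frac{1}{2}\right)^4}{1 - \left(-\frac{1}{2}\right)} = \frac{1}{2} \times \frac{1 - \frac{1}{16}}{\frac{3}{2}} = \frac{1}{3}\left[1 - \frac{1}{16}\right] = \frac{1}{3} \times \frac{15}{16} = \frac{5}{16},$$

as the sum of this series. This can be verified by adding up the terms on your calculator.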
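The three examples above can also be verified with a short Python sketch. It uses the standard-library `Fraction` type so that the answers are exact fractions; the function name `geometric_sum` is our own choice for illustration.

```python
from fractions import Fraction


def geometric_sum(a, r, n):
    """Sum of a + ar + ... + ar^(n-1), treating r = 1 as a separate case."""
    if r == 1:
        return a * n
    return a * (1 - r ** n) / (1 - r)


# Example 10.2: first term 1, common ratio 2, six terms.
print(geometric_sum(Fraction(1), Fraction(2), 6))

# Example 10.3: first term 3, common ratio 2, five terms.
print(geometric_sum(Fraction(3), Fraction(2), 5))

# Example 10.4: first term 1/2, common ratio -1/2, four terms.
print(geometric_sum(Fraction(1, 2), Fraction(-1, 2), 4))
```

The separate branch for `r == 1` mirrors the special case in the boxed result, where the formula would otherwise divide by zero.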

Activity 10.4 (Hard) Use the procedure described above to derive the formula for the sum of a geometric series with a finite number of terms.

Summing a geometric series with an infinite number of terms

Sometimes, we can make sense of what happens when we have an infinite number of terms in our geometric series. In such cases, we want to find the value of

$$a + ar + ar^2 + ar^3 + \cdots,$$

and here, the absence of a last term in the series is supposed to indicate that it ‘goes on forever’ or that it has an infinite number of terms. To see what the sum of such an infinite geometric series would be, we recall that if we just took the first $n$ terms of this series, the sum would be given by

$$a\,\frac{1 - r^n}{1 - r},$$

and we want to see what happens to this formula if we let $n$ go off to infinity.


In particular, if $|r| < 1$, then $r^n$ gets smaller [in magnitude] as $n$ gets larger. This means that, as $n$ goes to infinity, $r^n$ will go to zero, and so our formula will give us

$$a\,\frac{1 - 0}{1 - r} = \frac{a}{1 - r},$$

as the sum of our geometric series with an infinite number of terms.

However, if $|r| > 1$, then $r^n$ gets larger [in magnitude] as $n$ gets larger. This means that, as $n$ goes to infinity, $r^n$ will go to infinity [in magnitude] too and so we will not be able to make any sense of the formula. In such cases, we say that the sum of the infinite geometric series ‘does not exist’.†

To summarise, we have the following formula which allows us to sum an infinite geometric series when $|r| < 1$.

The sum of the infinite geometric series,

$$a + ar + ar^2 + ar^3 + \cdots,$$

where $a$ is the first term and $r$ is the common ratio is

$$\frac{a}{1 - r},$$

provided that $|r| < 1$. If $|r| \geq 1$, the sum of this series does not exist.

Sum of an infinite geometric series

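Example 10.5 Sum the infinite geometric series $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \cdots$.

This geometric series has an infinite number of terms, the first term is $\frac{1}{2}$ and we can write it as

$$\frac{1}{2} + \frac{1}{2}\left(\frac{1}{2}\right) + \frac{1}{2}\left(\frac{1}{2}\right)^2 + \frac{1}{2}\left(\frac{1}{2}\right)^3 + \cdots,$$

so the common ratio is $\frac{1}{2}$. As $\left|\frac{1}{2}\right| < 1$, we can use the formula to see that

$$\frac{\frac{1}{2}}{1 - \frac{1}{2}} = \frac{\frac{1}{2}}{\frac{1}{2}} = 1$$

is the sum of this infinite geometric series.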

Example 10.6 Sum the infinite geometric series $\frac{1}{2} - \frac{1}{4} + \frac{1}{8} - \frac{1}{16} + \cdots$.

This geometric series has an infinite number of terms, the first term is $\frac{1}{2}$ and we can write it as

$$\frac{1}{2} + \frac{1}{2}\left(-\frac{1}{2}\right) + \frac{1}{2}\left(-\frac{1}{2}\right)^2 + \frac{1}{2}\left(-\frac{1}{2}\right)^3 + \cdots,$$

so the common ratio is $-\frac{1}{2}$. As $\left|-\frac{1}{2}\right| < 1$, we can use the formula to see that

$$\frac{\frac{1}{2}}{1 - \left(-\frac{1}{2}\right)} = \frac{\frac{1}{2}}{\frac{3}{2}} = \frac{1}{3}$$

is the sum of this infinite geometric series.

†For reasons we won’t go into here, the sum of an infinite geometric series doesn’t exist when $|r| = 1$ either. The $r = 1$ case is obvious, but the $r = -1$ case is harder to understand.
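To see the idea of ‘letting n go off to infinity’ in action, the Python sketch below prints partial sums of the series in Example 10.6 alongside the value given by the formula. The function names, and the choice to raise an error when |r| ≥ 1, are our own assumptions for illustration.

```python
def geometric_partial_sum(a, r, n):
    """Sum of the first n terms a + ar + ... + ar^(n-1), added one by one."""
    return sum(a * r ** k for k in range(n))


def geometric_infinite_sum(a, r):
    """Sum of the infinite geometric series, which only exists if |r| < 1."""
    if abs(r) >= 1:
        raise ValueError("the sum does not exist when |r| >= 1")
    return a / (1 - r)


# Example 10.6: first term 1/2 and common ratio -1/2.
for n in (5, 10, 20, 40):
    print(n, geometric_partial_sum(0.5, -0.5, n))
print("formula:", geometric_infinite_sum(0.5, -0.5))
```

As n increases, the partial sums should settle down towards the value given by the formula, alternating above and below it because the common ratio is negative.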

10.2 Financial applications of geometric series

We now look at how geometric series can be used to model the value of different investment schemes. We start with a regular saving plan where, after an initial deposit, a saver chooses to invest a certain additional amount every year and we ask, what is his balance after a certain number of years? We then look at annuities. These involve an initial lump sum investment which provides the investor with a certain income at the end of each year. In this case, we are interested in how large this annual income can be for a certain number of years given the size of the initial investment. Lastly, we look at present values which, given a choice of several different investment opportunities, allow us to determine which one is the best.

10.2.1 Regular saving plans

Often, when one invests in a bank account, it is common to invest a certain amount over regular time periods instead of just making one lump sum payment. For example, if you decide to invest £600 at the beginning of each year in an account which pays annually compounded interest at a rate of 12% per year, what would the balance of the account be at the beginning of the fourth year of the investment (just after that year’s £600 has been invested)? We can work this out as follows.

At the end of the first year, the balance of the account is $600(1.12)$.

At the beginning of the second year, another £600 is added to the account making the balance $600 + 600(1.12)$ and so, at the end of the second year, the balance is

$$[600 + 600(1.12)](1.12) = 600(1.12) + 600(1.12)^2.$$

At the beginning of the third year, another £600 is added to the account making the balance $600 + 600(1.12) + 600(1.12)^2$ and so, at the end of the third year, the balance is

$$[600 + 600(1.12) + 600(1.12)^2](1.12) = 600(1.12) + 600(1.12)^2 + 600(1.12)^3.$$

Now, if we add another £600 at the beginning of the fourth year, this means that the balance of the account is now

$$600 + 600(1.12) + 600(1.12)^2 + 600(1.12)^3.$$

This is a geometric series of four terms with a first term of 600 and a common ratio of 1.12 which means that, using the formula above, the balance we seek is given by

$$600 \times \frac{1 - (1.12)^4}{1 - 1.12} = 600 \times \frac{1 - (1.12)^4}{-0.12} = 5{,}000[(1.12)^4 - 1] = 2{,}867.595,$$


or £2,867.60 (to the nearest penny) if we use the fact that, to 6dp, $1.12^4 = 1.573519$.

Similarly, if we wanted to follow this investment scheme for a longer period of time, for example if we wanted to calculate the balance at the beginning of the twenty-sixth year (just after that year’s £600 has been invested), we would need to sum the geometric series

$$600 + 600(1.12) + 600(1.12)^2 + 600(1.12)^3 + \cdots + 600(1.12)^{25},$$

which has 26 terms. So, again using our formula, the balance is given by

$$600 \times \frac{1 - (1.12)^{26}}{1 - 1.12} = 5{,}000[(1.12)^{26} - 1] = 90{,}200.36,$$

pounds (to the nearest penny) if we use the fact that, to 6dp, $1.12^{26} = 19.040072$.

Indeed, more generally, we can see that if we wanted to calculate the balance at the beginning of the $n$th year (just after that year’s £600 has been invested), we would need to sum the geometric series

$$600 + 600(1.12) + 600(1.12)^2 + 600(1.12)^3 + \cdots + 600(1.12)^{n-1},$$

which has $n$ terms. And so, using the formula again, we see that

$$600 \times \frac{1 - (1.12)^n}{1 - 1.12} = 5{,}000[(1.12)^n - 1],$$

is the balance of the account at the beginning of the $n$th year.
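The year-by-year reasoning used for this savings plan can be mirrored in a short Python sketch, which compares a direct simulation of the account with the geometric series formula. The function names and the general parameters `deposit` and `r` are our own assumptions; the guide itself only works with £600 and 12%.

```python
def balance_by_simulation(deposit, r, n):
    """Balance just after the n-th annual deposit, worked out year by year."""
    balance = 0.0
    for _ in range(n):
        # Interest is added to last year's balance, then the deposit is made.
        balance = balance * (1 + r) + deposit
    return balance


def balance_by_formula(deposit, r, n):
    """The same balance from the sum of a geometric series with n terms."""
    return deposit * ((1 + r) ** n - 1) / r


# £600 per year at 12%: start of the fourth and twenty-sixth years.
for n in (4, 26):
    simulated = balance_by_simulation(600, 0.12, n)
    formula = balance_by_formula(600, 0.12, n)
    print(n, round(simulated, 2), round(formula, 2))
```

Here `deposit / r` plays the role of the factor 5,000 in the worked calculation, since 600/0.12 = 5,000.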

10.2.2 Annuities

If we invest a certain amount of money, $P$, in a bank account that pays annually compounded interest at a rate of $100r\%$ per year, we may want to set up an annuity. This is where, at the end of each of the next $n$ years, we receive a payment of $I$ from the account. The question then is, under these circumstances, how much can we afford to withdraw each year? If we withdraw too much or for too long a time, the money in the account will run out. But, if we withdraw too little or for too short a time, we have put too much money in the account. How can we model an annuity so that we can be sure that we are investing in a wise and sustainable way?

For example, suppose that we decide to invest £10,000 in an account which pays annually compounded interest at a rate of 5% per year in order to set up an annuity that will pay £$I$ at the end of each year for the next ten years. What, we may ask, is the balance of the account after this annuity’s last payment?

Well, consider that the balance in the account can be modelled as follows. Given an initial investment of £10,000, we can see that:

At the end of the first year, the balance in the account is $10{,}000(1.05)$ and so, if we make our first withdrawal of $I$, the balance is now $10{,}000(1.05) - I$.

At the end of the second year, the balance of the account is

$$[10{,}000(1.05) - I](1.05) - I = 10{,}000(1.05)^2 - I(1.05) - I,$$

after we have made our second withdrawal of $I$.


At the end of the third year, the balance of the account is

$$[10{,}000(1.05)^2 - I(1.05) - I](1.05) - I = 10{,}000(1.05)^3 - I(1.05)^2 - I(1.05) - I,$$

after we have made our third withdrawal of $I$.

And so on until. . .

At the end of the tenth year, the balance of the account is

$$10{,}000(1.05)^{10} - I(1.05)^9 - I(1.05)^8 - \cdots - I(1.05) - I,$$

after we have made our tenth withdrawal.

Now, this is the balance of the account after the annuity’s last payment and so, if we call this $B$, we have

$$B = 10{,}000(1.05)^{10} - I\left[1.05^9 + 1.05^8 + \cdots + 1.05 + 1\right],$$

and, in particular, if we consider the series in the big square brackets we see that we have

$$1 + 1.05 + \cdots + 1.05^8 + 1.05^9,$$

which is a geometric series with first term one, common ratio 1.05 and ten terms. So, using the formula above, we see that this gives us

$$1 \times \frac{1 - 1.05^{10}}{1 - 1.05} = \frac{1 - 1.05^{10}}{-0.05} = -20(1 - 1.05^{10}),$$

and so the balance we seek is given by

$$B = 10{,}000(1.05)^{10} - I\left[-20(1 - 1.05^{10})\right] = 10{,}000(1.05)^{10} - 20I[1.05^{10} - 1].$$

We can now ask, with this annuity, how big can the withdrawals be? The key to answering this question is to note that if our annual withdrawal, $I$, is too big, then at some point before this ten year period has elapsed, the account will run out of money and the balance will become negative. That is, if $I$ is too big, the bank will stop allowing us to make the withdrawals and the annuity will fail to achieve its purpose. So, we need to see what values of $I$ give us a balance, $B$, which is still non-negative after ten years. But, if we need $B \geq 0$, this means that we must have

$$10{,}000(1.05)^{10} - 20I[1.05^{10} - 1] \geq 0,$$

and this can be rearranged to give us

$$10{,}000(1.05)^{10} \geq 20I[1.05^{10} - 1] \implies \frac{10{,}000(1.05)^{10}}{20[1.05^{10} - 1]} \geq I,$$

as $1.05^{10} - 1 > 0$. This means that we have

$$I \leq \frac{500(1.05)^{10}}{1.05^{10} - 1} = 1{,}295.0453,$$

if we use the fact that, to 6dp, $1.05^{10} = 1.628895$. That is, the maximum withdrawal we can make each year is £1,295.04.
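The annuity model can be explored with the Python sketch below. It assumes that the worked example generalises in the obvious way, so that for a principal P, a rate r and n years the largest withdrawal is P r (1 + r)^n / ((1 + r)^n − 1); with P = 10,000, r = 0.05 and n = 10 this is the expression found above. The function names, and the trial withdrawals of £1,200 and £1,400, are our own choices for illustration.

```python
import math


def max_withdrawal(P, r, n):
    """Largest annual withdrawal keeping the balance non-negative for n years."""
    growth = (1 + r) ** n
    return P * r * growth / (growth - 1)


def final_balance(P, r, n, I):
    """Balance after n years of adding interest and then withdrawing I."""
    balance = P
    for _ in range(n):
        balance = balance * (1 + r) - I
    return balance


limit = max_withdrawal(10_000, 0.05, 10)
# Round down to a whole number of pence so that the account never goes negative.
print(math.floor(limit * 100) / 100)

# A withdrawal below the limit leaves money over; one above it overdraws.
print(final_balance(10_000, 0.05, 10, 1_200) > 0)
print(final_balance(10_000, 0.05, 10, 1_400) < 0)
```

Rounding down rather than to the nearest penny reflects the inequality in the text: the withdrawal must not exceed the limit.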


Activity 10.5 Assuming that we make this maximum withdrawal at the end of each year, what is the balance of the account after the last of these withdrawals?

Activity 10.6 Alternatively, suppose that we want this annuity to pay out £1,500 at the end of each year. How many of these withdrawals will we be able to make?

[Note that, to 2dp, $\log_{1.05}\left(\frac{3}{2}\right) = 8.31$.]

10.2.3 Future and present values

The last thing we want to consider about investments is how to compare them. In particular, we want to be able to compare investments which give us different returns at different times, i.e. different future values, by considering what we shall call their present value. This is the value of an investment at the present time, i.e. now, and the general idea is that the investment with the largest present value is the one that is giving us the best return. Let’s consider how this works.

Suppose that we have a principal, $P$, to invest for $n$ years and the interest rate available to us is $100r\%$ per year compounded annually. In this case, we have

$$V = P(1 + r)^n,$$

and $V$ is the future value of this investment, i.e. how much it will be worth after $n$ years. We can also see that $P$ is the present value of this investment since that is what it is worth to us now.

But, what if we have been promised an amount $V$ at some point in the future? What, we may ask, is this worth to us now? To be more specific, let’s assume that we will get the money after $n$ years and that an interest rate of $100r\%$ per year compounded annually is available to us. To see what it is worth to us now, i.e. its present value, we ask how much we would have to invest now in order to get $V$ at that time in the future. In this case, we would need to invest an amount, $P$, such that

$$V = P(1 + r)^n \quad\text{which means that}\quad P = \frac{V}{(1 + r)^n},$$

is the present value of an amount $V$ available to us after $n$ years assuming that the interest rate is $100r\%$ per year compounded annually.

This is useful to us because present values allow us to compare different amounts of money which we may get at different times in the future by considering what they are worth to us at some specific time, i.e. now. Let’s look at an example of how this works.

Example 10.7 Suppose that you have to choose between a gift of £20,000 in ten years’ time or a gift of £30,000 in twenty years’ time. Which should you choose given that an interest rate of 10% per year compounded annually is available to you?

Given that an interest rate of 10% per year compounded annually is available to you, the present value of £20,000 in ten years’ time is

$$\frac{20{,}000}{\left(1 + \frac{10}{100}\right)^{10}} = \frac{20{,}000}{(1.1)^{10}} = 7{,}710.8672$$

or £7,710.87 (to the nearest penny) if we use the fact that, to 6dp, $1.1^{10} = 2.593742$ whereas the present value of £30,000 in twenty years’ time is

$$\frac{30{,}000}{\left(1 + \frac{10}{100}\right)^{20}} = \frac{30{,}000}{(1.1)^{20}} = 4{,}459.3088$$

or £4,459.31 (to the nearest penny) if we use the fact that, to 6dp, $1.1^{20} = 6.727500$. Thus, you should choose the £20,000 in ten years’ time as it is worth more to you now.‡

Present values can also be used to see what an annuity is worth as we can find the present value of each payment and hence the present value of the annuity as a whole. Let’s look at an example.

Example 10.8 You win a competition and you can claim a prize of £10,000 now or an annuity which pays £1,000 at the end of each year for ten years. Which should you choose given that an interest rate of 5% per year compounded annually is available to you?

The present value of the first annuity payment is $1{,}000/1.05$, the second is $1{,}000/1.05^2$, and so on until the tenth which has a present value of $1{,}000/1.05^{10}$. Thus, the present value of all the annuity payments is

$$\frac{1{,}000}{1.05} + \frac{1{,}000}{1.05^2} + \cdots + \frac{1{,}000}{1.05^{10}}.$$

This is a geometric series with a first term of $1{,}000/1.05$, a common ratio of $1/1.05$ and ten terms which means that, using the formula for the sum of a geometric series, we see that the present value of this annuity is

$$\left(\frac{1{,}000}{1.05}\right)\frac{1 - \left(\frac{1}{1.05}\right)^{10}}{1 - \frac{1}{1.05}} = \left(\frac{1{,}000}{1.05}\right)\frac{1 - \frac{1}{1.05^{10}}}{\frac{0.05}{1.05}} = \left(\frac{1{,}000}{0.05}\right)\left[1 - \frac{1}{1.05^{10}}\right] = 20{,}000\left[1 - \frac{1}{1.05^{10}}\right] = 7{,}721.74$$

pounds (to the nearest penny) if we use the fact that, to 6dp, $1.05^{10} = 1.628895$. As such, when choosing your prize, you should opt for the £10,000 lump sum as that is worth more to you now.

‡For example, you could take the £20,000 in ten years’ time and invest it for the following ten years to get a return of $20{,}000\left(1 + \frac{10}{100}\right)^{10} = 20{,}000(1.1)^{10} = 51{,}874.85$ pounds (to the nearest penny) in twenty years’ time. This is far better than just receiving £30,000 after the same amount of time!

We also observe, in passing, that £51,874.85 is the future value, in twenty years’ time, of getting £20,000 in ten years’ time and investing it. So, in terms of future values over a common period of time, we should, again, opt for the £20,000 in ten years’ time!
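The present value comparisons in Examples 10.7 and 10.8 can be reproduced with the Python sketch below. The function names are our own, and the annuity is valued by adding up the present value of each payment rather than by the geometric series formula, so that the two approaches can be compared.

```python
def present_value(V, r, n):
    """Present value of an amount V received after n years at rate r."""
    return V / (1 + r) ** n


def annuity_present_value(payment, r, n):
    """Present value of n end-of-year payments, added up one by one."""
    return sum(present_value(payment, r, k) for k in range(1, n + 1))


# Example 10.7: £20,000 in ten years versus £30,000 in twenty years at 10%.
print(round(present_value(20_000, 0.10, 10), 2))
print(round(present_value(30_000, 0.10, 20), 2))

# Example 10.8: a £10,000 lump sum versus £1,000 a year for ten years at 5%.
annuity = annuity_present_value(1_000, 0.05, 10)
print(round(annuity, 2), annuity < 10_000)
```

Because the code uses unrounded powers rather than the 6dp values quoted in the examples, its final digit can differ from the worked answers by a penny; the comparisons themselves are unaffected.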


Activity 10.7 Use present values to determine how many years they would have to pay the annuity for in order for it to be a better prize than the lump sum.

[Note that, to 2dp, $\log_{1.05}(2) = 14.21$.]

Activity 10.8 Suppose that the annuity was a perpetuity, i.e. you would get £1,000 at the end of each year forever. What is the present value of this perpetuity?

Activity 10.9 Why is your answer to the previous activity not a surprise?

Learning outcomes

At the end of this unit, you should be able to:

identify an arithmetic sequence and sum an arithmetic series;

identify a geometric sequence and sum a finite geometric series;

find the sum of an infinite geometric series when it exists;

solve problems that involve regular savings plans and annuities;

use present values to compare investments.

Exercises

Exercise 10.1

Find the sums of the following arithmetic series.

i. \(1 + 2 + 3 + \cdots + 10\); ii. \(1 + 2 + 3 + \cdots + n\);

iii. \(5 + 0 - 5 - 10\); iv. \(5 + 0 - 5 - \cdots - 5n\).

Exercise 10.2

Find the sums of the following geometric series.

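i. \(1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3}\);

ii. \(3 - 6 + 12 - 24 + 48 - 96\);

iii. \(3 - 6 + 12 - 24 + \cdots + 3(-2)^n\);

iv. \(1 + \frac{1}{3} + \frac{1}{9} + \frac{1}{27} + \cdots\);

v. \(1 - \frac{1}{2} + \frac{1}{4} - \frac{1}{8} + \cdots\);

vi. \(\frac{1}{4} - \frac{1}{16} + \frac{1}{64} - \frac{1}{256} + \cdots\).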


Exercise 10.3

Suppose that, at the beginning of each year, you pay £500 into a savings account paying 7% interest per year. How much will be in the account at the end of the eighth year?

[Note that, to 6dp, \(1.07^{8} = 1.718186\).]

Exercise 10.4

Suppose that you invest £10,000 in a bank account that pays 5% interest per year. If you want to withdraw £900 at the end of each year, how many years will you be able to do this for?

[Note that, to 2dp, \(\log_{1.05}\left(\frac{9}{4}\right) = 16.62\).]

Exercise 10.5

You win a competition and can choose between the following prizes.

(i) £50,000 now.

(ii) £10,000 at the end of each year for seven years.

(iii) £100,000 in ten years’ time.

Given that an interest rate of 8% per year compounded annually is available to you, which one should you choose?

[Note that, to 6dp, \(1.08^{7} = 1.713824\) and \(1.08^{10} = 2.158925\).]

Exercise 10.6

You borrow £1,200 from your bank which requires that you repay the loan in monthly instalments over two years. If interest is charged at 12% per annum using monthly compounding, how much will you have to pay back each month?

[Note that, to 6dp, \(1.01^{24} = 1.269735\).]


Part 2: Statistics


Introduction to Statistics

Syllabus

This half of the course introduces some of the basic ideas of theoretical statistics, emphasising the applications of these methods and the interpretation of tables and results. The Statistics part of this course has the following syllabus.

Data exploration: The statistics part of the course begins with basic data analysis through the interpretation of graphical displays of data. Univariate, bivariate and categorical situations are considered, including time series plots. Distributions are summarised and compared and their patterns discussed. Descriptive statistics are introduced to explore measures of location and dispersion.

Probability: The world is an uncertain place and probability allows this uncertainty to be modelled. Probability distributions are explored to describe how likely different values of a random variable are expected to be. The Normal distribution is introduced and its importance in statistics is discussed. The concept of a sampling distribution is explored.

Sampling and experimentation: An overview of data collection methods is followed by how to design and conduct surveys and experiments in the social sciences. Particular attention is given to sources of bias and conclusions which can be drawn from observational studies and experiments.

Fundamentals of regression: An introduction to modelling a linear relationship between variables. Interpretation of computer output to assess model adequacy.

Aims of the course

The aims of the Statistics part of this course are to provide:

a basic knowledge of how to summarise, analyse and interpret data

an insight into the concepts of probability and the Normal distribution

an overview of sampling and experimentation in the social sciences

an introduction to modelling a linear relationship between variables.

Treatment is at an elementary mathematical level throughout, so you should be comfortable with the material covered in ‘Unit 1: Review I — A review of some basic mathematics’ of the Mathematics part of the subject guide.


Learning outcomes for the course (Statistics)

At the end of the Statistics part of the course, you should be able to:

interpret and summarise raw data on social science variables graphically and numerically

appreciate the concepts of a probability distribution, modelling uncertainty and the Normal distribution

design and conduct surveys and experiments in a social science context

model a linear relationship between variables and interpret computer output to assess model adequacy.

Textbook

As previously mentioned in the main introduction, this subject guide has been designed to act as your principal resource. The following textbook is referenced throughout the Statistics part of the course.

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246].

This has been indicated as ‘background reading’ meaning it is not essential, but you could benefit from reading it if you find any of the material in the guide difficult to follow.


Unit 11: Data exploration I
The nature of statistics

Overview

We begin the Statistics section of the course with data exploration, arguably the single most important part of any data analysis. To make sense of any data, we must first ‘understand’ the basic features of each variable under consideration. Visualising data communicates a wealth of information to even non-technical audiences. Data exploration offers different ways of presenting data graphically depending on the type of variable(s) being explored. We then move on to descriptive statistics (measures of location and measures of dispersion) which are commonly-used statistics in the social sciences whose roles are to ‘describe’ or ‘summarise’ data numerically.

Aims

This unit explains the nature of statistics, providing a gentle introduction to the discipline. The concept of ‘data’ is explored including the different types of data which may be obtained. The role of statistics in the research process is also discussed. Particular aims are:

to demonstrate how social scientists familiarise themselves with datasets prior to further analysis

to introduce the different types of data that can occur

to explain how statistics can be used to conduct social research.

Background reading

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Statistics’ Chapter 8.

11.1 Introduction

So just what is ‘Statistics’? Well, there are several possible definitions. A good working one is:

‘the study of data, involving the collection, classification, summary, display, analysis and interpretation of numerical information’.

We consider each of these briefly.


Statistics is largely concerned with data. This is a plural noun meaning ‘given things’ or more loosely ‘information’ or ‘facts’.

Sometimes we look at non-numerical data such as sex (‘gender’) or social class, but usually we are concerned with numerical information. The primary objective is to determine what the data tell us about the underlying context in business, economics, society, medicine etc.

11.1.1 Data collection

We can collect data in several ways:

Direct observation, for example driver behaviour on a motorway.

Simulation of data, by computer, using certain assumptions. For example, what is the likely effect on traffic flow if the speed limit is changed?

An experiment, for example, some patients are given an active drug and others a placebo.

A survey to find out more about consumers or voters (or computers or cars).

The main distinction between an experiment and a survey is that in the former case there is some sort of intervention by the researcher. Most, although not all, of the statistics you may go on to carry out (in finance, politics etc.) are likely to be based on survey data.

11.1.2 Data classification

We mention the types of data shortly; the type of data has an important impact on how they should be analysed. It is a very good idea to check and ‘clean’ data in practice to make sure there are no obvious outliers (anomalous values; more on these later in the unit) which may need to be excluded, and to ensure there are no recording errors. Of course, computers play a vital role in all areas of statistics, although they are not used explicitly in this course.

11.1.3 Data summary

This is discussed in more detail later on in this unit. The idea is to get a quick picture of what the ‘typical’ data value is, as measured by an average such as the mean; to assess the spread of the data, as measured typically by the variance; and to see if the data are symmetric, as measured by the skewness.
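Although computers are not used explicitly in this course, the three summaries just mentioned can be illustrated with a minimal Python sketch. The data values are made up for illustration, and the skewness measure used here (the moment coefficient of skewness) is an assumption, since the guide has not yet defined one.

```python
import statistics

data = [2, 3, 3, 4, 5, 7, 11]  # hypothetical values, for illustration only

mean = statistics.mean(data)          # location: the 'typical' value
variance = statistics.variance(data)  # spread: sample variance (divisor n - 1)

# Moment coefficient of skewness: zero for symmetric data,
# positive when the right-hand tail is longer.
n = len(data)
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
skewness = m3 / m2 ** 1.5

print(mean, round(variance, 2), round(skewness, 2))
```

For these values the skewness is positive, reflecting the single large observation of 11 which stretches the right-hand tail.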

11.1.4 Data display

This refers to tables, graphs and charts. The purpose is not to produce a pretty picture but to gain insight into the data and their context. A simple display is often clearest and best. In some cases, the display alone is sufficient; there is no need for any formal mathematical or statistical study.


11.1.5 Analysis

This is the heavy part of Statistics. Most of the time, the methods used are well-established, so it is only necessary to learn the relevant technique. It is important to understand that most methods depend on certain assumptions about the data. If these assumptions fail to hold, the conclusions are likely to be invalid.

11.1.6 Interpretation

Outside a few universities and research institutes, clear interpretation is vital! Interpretation should be understandable by managers and others without formal statistical training. For example, do not say ‘the p-value of 0.02 shows that the result of the t-test is significant’, but rather ‘there is evidence that men and women differ in their attitude to a policy of lowering taxes’.∗

11.1.7 Uncertainty

In general, what is being measured is subject to uncertainty, or random variation. For example, two randomly chosen groups of 100 voters will not give exactly the same outcomes.

We often wish to establish whether a change or a difference (between men and women, left-wing and right-wing voters etc.) can be put down to chance, or whether it is the result of some real effect. We study probability largely in order to measure this uncertainty.
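Random variation of this kind is easy to see in a simulation. The following Python sketch is only an illustration under stated assumptions: the population share in favour of a policy (`TRUE_SHARE`) is a hypothetical 50%, and each group of 100 voters is drawn independently.

```python
import random

rng = random.Random(2013)  # fixed seed so that the run is repeatable
TRUE_SHARE = 0.5           # assumed population share in favour (hypothetical)


def supporters_in_group(size):
    """Number of voters in favour in one randomly chosen group."""
    return sum(rng.random() < TRUE_SHARE for _ in range(size))


group_1 = supporters_in_group(100)
group_2 = supporters_in_group(100)
print(group_1, group_2)

# Repeating the exercise many times shows the spread due to chance alone.
counts = [supporters_in_group(100) for _ in range(1000)]
print(min(counts), max(counts))
```

Even though every group is drawn from the same population, the counts vary from group to group. A difference between two groups therefore needs to be judged against this background of chance variation.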

11.1.8 Descriptive and inferential statistics

It is convenient to distinguish two approaches:

Descriptive statistics comprises those methods concerned with describing a set of data so as to yield meaningful interpretations.

Statistical inference comprises those methods concerned with analysing a subset of data so as to draw conclusions about the entire set of data.

While we are defining things, let us formalise a little. The population is the collection of all individuals or items under consideration. A sample is that part of the population from which information is obtained when inference is used.

Example 11.1

A manufacturer of tyres wants to estimate the average life of a tyre. This is an inferential study: the population consists of all tyres produced, the sample consists of 100 (or 50, or 500, or 5,000) tyres that are examined.

A sports writer wants to list the times taken to run 100m in Olympic Games over 60 years. This is a descriptive study.

A politician wants to know how many votes were cast for her party in her region at a recent election. This is a descriptive study.

An economist estimates the average income of all California residents. This is an inferential study: the population consists of all residents, the sample consists of the subset examined.

∗p-values occur in hypothesis testing, which is a form of statistical inference which is not covered in this course.

Notice that in an inferential study, it is the properties of the population that we wish to determine. You could argue that it would be better to examine all population members. This is known as conducting a census. But this will usually be slower, more costly and may sometimes be impossible. Consider a census of all the trees in the UK, or all the fish in the Atlantic Ocean!

The main thing to ensure is that the sample is representative of the population. This is most easily done using a simple random sample, where each population member has an equal chance of inclusion in the sample, although there are alternatives. We will explore this further in the ‘Sampling and experimentation’ section of the course.
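A simple random sample can be sketched in a few lines of Python. The population here is entirely hypothetical (10,000 simulated tyre lifetimes, echoing Example 11.1), and the variable names are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(11)  # fixed seed so that the run is repeatable

# Hypothetical population: lifetimes (in miles) of 10,000 tyres.
population = [rng.gauss(30000, 5000) for _ in range(10000)]

# Simple random sample: every tyre has the same chance of being chosen.
sample = rng.sample(population, 100)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # the estimate we can compute
print(round(population_mean), round(sample_mean))
```

In a real inferential study only the sample mean would be available; the population mean is shown here purely so that the two can be compared.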

It may not come as a surprise that, generally speaking, descriptive statistics are more easily carried out than inferential statistics. Sadly, descriptive statistics are often poorly done, or even omitted completely in practical contexts, as well as student work. This is a shame because they can tell us a great deal about the data, and can even render inferential statistics unnecessary. As a rule, any data analysis should start with descriptive statistics.

11.2 Types of data

There are several types of data, and it is important to know which one we are dealing with, so that the correct statistical procedure is used.

Categorical data (also known as qualitative data) give information about the discrete groups into which a population or sample is divided. These may be nominal or ordinal.

• Nominal data are unranked. For example, a group of individuals may be classified by gender, eye colour, blood type (A, B, AB, O), or religion etc.

• Ordinal data are ranked. They give information about order or rank on a scale. For example, a group of students may be classified by the grades they receive in an examination (A, B, C etc.). So-called Likert scales are ordinal (this course is ‘very interesting’, ‘interesting’, ‘quite interesting’, ‘not very interesting’, ‘boring’). Investments can be graded by risk on an ordinal scale.

Metric data are numerical values on some continuous scale. They may be interval or ratio data.

• Interval data are measured on a continuous scale and have the property that the differences between numbers have a meaning. For example, centigrade temperatures are interval data: the difference between 150 and 160 is the same as the difference between 250 and 260, but both are different from the difference between 150 and 200. The current time (for example, 19:34) is also measured on an interval scale.

• Ratio data are similar to interval data but now there is an absolute zero, and therefore the ratio of two numbers can be given a meaning. For example, height, weight and the length of time an individual has been alive all constitute ratio data. In each of these cases, there is a fixed zero; nobody can have a negative height or weight, or have lived a negative amount of time. In contrast, the zero for centigrade temperatures or the current time is merely a matter of convention. [Notice that Kelvin temperatures do have an absolute zero and are therefore measured on a ratio scale.]
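The distinction between interval and ratio scales can be made concrete with a small Python sketch. The temperatures of 10 and 20 degrees used for the ratio comparison are an illustrative assumption; the conversion to Kelvin adds 273.15.

```python
def celsius_to_kelvin(celsius):
    """Kelvin has an absolute zero at -273.15 degrees centigrade."""
    return celsius + 273.15


# Differences are meaningful on an interval scale.
print(160 - 150 == 260 - 250)  # True: the two differences are equal

# Ratios are not: the arithmetic gives 2, but 20 degrees centigrade is
# not 'twice as hot' as 10 degrees, because the zero is a convention.
ratio_centigrade = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)
print(ratio_centigrade, round(ratio_kelvin, 3))
```

On the Kelvin scale, which has an absolute zero, the same two temperatures differ by only a few per cent, which is why ratios are only given a meaning for ratio data.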

Example 11.2 In a household survey:

Sex (gender) of the head of household — nominal.

Type of heating used — nominal.

Age of head of household — ratio.

Thermostat setting in winter — interval.

Household income — ratio.

Average monthly electricity bill — ratio.

Time when heating is switched on — interval.

Rating of electricity provider on a 10-point scale — ordinal.

Finally, we mention that many datasets considered are on a single attribute, for example weight. Such data are called univariate data. Sometimes, we wish to consider two variables together, say the height and weight of a group of individuals. Such data are called bivariate data. Multivariate data arise when we consider three or more variables together, perhaps height, weight, age and pulse rate.

There are other ways to classify data and the classification is not always precise. However, in most cases it is fairly clear and is sufficient for most applications, in particular the choice of the correct statistical method.

11.3 The role of statistics in the research process

First some definitions:

Research: trying to answer questions about the world in a systematic (scientific) way.

Empirical research: doing research by first collecting relevant information (data) about the world.

Research may be about almost any topic: physics, biology, medicine, economics, history, literature etc. Most of our examples will be from the social sciences: economics, management, finance, sociology, political science, psychology etc. Research in this sense is not just what universities do. Government, business, and all of us as individuals do it too. Statistics is used in essentially the same way for all of these.

Example 11.3

It all starts with a question:

Can labour regulation hinder economic performance?

Understanding the gender pay gap: what has competition got to do with it?

Does racism affect health?

Children and online risk: powerless victims or resourceful participants?

Refugee protection as a collective action problem: is the EU shirking its responsibilities?

Do directors perform for pay?

Heeding the push from below: how do social movements persuade the rich to listen to the poor?

Does devolution lead to regional inequalities in welfare activity?

The childhood origins of adult socio-economic disadvantage: do cohort and gender matter?

Parent care as unpaid family labour: how do spouses share?

We can think of the empirical research process as having five key stages:

1. Formulating the research question.

2. Research design: deciding what kinds of data to collect, how and from where.

3. Collecting the data.

4. Analysis of the data to answer the research question.

5. Reporting the answer and how it was obtained.

We conclude this section with an example of how statistics can be used to help answer a research question.

Example 11.4 CCTV, crime and fear of crime

Our research question is: What is the effect of closed-circuit television (CCTV) surveillance on

the number of recorded crimes?

the fear of crime felt by individuals?

We illustrate this using part of the following study:

Gill and Spriggs (2005): Assessing the impact of CCTV. Home Office Research Study 292. Available at http://webarchive.nationalarchives.gov.uk/20110218135832/http://rds.homeoffice.gov.uk/rds/pdfs05/hors292.pdf

The research design of the study comprised:

Target area: a housing estate in northern England.

Control area: a second, comparable housing estate.

Intervention: CCTV cameras installed in the target area but not in the control area.

Compare measures of crime and fear of crime in the target and control areas, in the 12 months before and 12 months after the intervention.

The data and data collection:

Level of crime: number of crimes recorded by the police, in the 12 months before and 12 months after the intervention.

Fear of crime: a survey of residents of the areas.

• Respondents: random samples of residents in each of the areas.

• In each area, one sample before the intervention date and one about 12 months after.

• Sample sizes:

                 Before   After
  Target area    172      168
  Control area   215      242

• Question considered here: ‘In general, how much, if at all, do you worry that you or other people in your household will be victims of crime?’ (from 1 = ‘worry all the time’ to 5 = ‘never worry’).

Statistical analysis of the data:

% of respondents who worry ‘sometimes’, ‘often’ or ‘all the time’:

  Target                       Control
  [a]      [b]                 [c]      [d]                          Confidence
  Before   After   Change      Before   After   Change      RES      interval
  26       23      −3          53       46      −7          0.98     0.55–1.74

It is possible to calculate various statistics. For example, the Relative Effect Size RES = ([d]/[c])/([b]/[a]) = 0.98 is a summary measure which compares the changes in the two areas.


RES < 1, which means that the observed change in reported fear of crime has been a bit less good in the target area.

However, there is uncertainty because of sampling: only 168 and 242 individuals were actually interviewed at each time in each area.

Confidence interval for RES includes 1, which means that changes in self-reported fear of crime in the two areas are not statistically significantly different from each other.

Number of (any kind of) recorded crimes:

  Target area                  Control area
  [a]      [b]                 [c]      [d]                          Confidence
  Before   After   Change      Before   After   Change      RES      interval
  112      101     −11         73       88      15          1.34     0.79–1.89

Now RES = 1.34 > 1, which means that the observed change in the number of crimes has been worse in the control area than in the target area.

However, the numbers of crimes in each area are fairly small, which means that these estimates of the changes in crime rates are fairly uncertain.

Confidence interval for RES again includes 1, which means that the changes in crime rates in the two areas are not statistically significantly different from each other.

In summary, this study did not support the claim that introduction of CCTV reduces crime or fear of crime.
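The two RES values quoted in Example 11.4 can be reproduced from the table entries using the formula RES = ([d]/[c])/([b]/[a]). The following Python sketch does only that; the function name is an illustrative assumption, and the confidence intervals are taken from the study, not computed here.

```python
def relative_effect_size(a, b, c, d):
    """RES = ([d]/[c]) / ([b]/[a]).

    a, b: target area before and after; c, d: control area before and after.
    """
    return (d / c) / (b / a)


res_fear = relative_effect_size(26, 23, 53, 46)     # % who worry about crime
res_crime = relative_effect_size(112, 101, 73, 88)  # recorded crimes

print(round(res_fear, 2), round(res_crime, 2))
```

A value below 1 arises when the 'after to before' ratio is smaller in the control area than in the target area, and a value above 1 when it is larger.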

(If you want to read more about research on this question, see Welsh and Farrington (2008). Effects of closed circuit television surveillance on crime. Campbell Systematic Reviews 2008:17. (See http://www.campbellcollaboration.org/library.php))

Many of the statistical terms and concepts mentioned above were not explained. However, the study serves as an interesting example of how statistics can be employed in the social sciences to investigate research questions.

Activities 11.1, 11.2 and 11.3 are not concerned with any technicalities of statistics, and they do not ask you to do any calculations yourself (except, perhaps, a little bit in 11.2). Instead, these exercises invite you to think about various topics related to the use of statistics, and to research design more generally. These include such issues as definition and measurement of variables, selection of subjects for studies, and justifiability of claims about causes and effects.

You are asked to think of answers to the questions, using your own reasoning and common sense. You are welcome to discuss the questions with friends. You do not need to worry about getting the answers right or wrong; the only point is to start thinking!

Activity 11.1 Consider the following statements. Do you think the conclusions are valid? If so, say why. If not, indicate why not: because the logic used is faulty, because any assumptions made are dubious, because the data collection method is inappropriate, or for any other reason.

(a) ‘10% of drivers involved in 100 car accidents had previously taken substance X. A parallel study of drivers not involved in accidents showed that only 1% had taken X. Therefore X is a contributory cause of car accidents.’

(b) ‘Five years ago, the average stay of patients in this hospital was 21 days. Now it is 16 days. We now cure our patients more quickly.’

(c) ‘We wanted to see if the public approved of our plans to transfer resources to elderly patients. We carried out a large-scale survey based on 800 daytime city centre interviews. We found 79% of respondents approved our plans. Therefore we have public backing.’

(d) ‘Nugro is the revolutionary hair restorer for men. A sample of 100 men with thinning hair was selected to apply Nugro lotion every day for a month. Of these, 77 reported new hair growth. Nugro is proven to be effective in the treatment of male baldness.’

Activity 11.2 The following cross-tabulation shows data on the 3,593 people who applied to graduate study at the University of California, Berkeley, in 1973. The table classifies the applicants according to their sex, and whether or not they were admitted to the university.

           Admitted
  Sex      No      Yes     % Yes   Total
  Male     1,180   686     36.8    1,866
  Female   1,259   468     27.1    1,727
  Total    2,439   1,154   32.1    3,593

The table shows that 36.8% of male applicants, but only 27.1% of female applicants, were admitted.

Bob observes this and concludes that in that year Berkeley practised discrimination against female applicants. Amy, however, decides to take another look at the statistics. She adds one more piece of data, the department to which each person applied, and creates cross-tabulations separately for each department (which are labelled A, B, C, D and E). These tables are shown below. For example, the first table cross-classifies the sex and admission status of just those 585 people who applied to department A, and so on.

Amy examines her tables and states that she disagrees with Bob: there is no evidence of discrimination. Why does she conclude this? Why do Amy and Bob come to different conclusions? Which one do you agree with?


                        Admitted
  Department   Sex      No      Yes     % Yes   Total
  A            Male     207     353     63.0    560
               Female   8       17      68.0    25
               Total    215     370     63.2    585

  B            Male     205     120     36.9    325
               Female   391     202     34.1    593
               Total    596     322     35.1    918

  C            Male     279     138     33.1    417
               Female   244     131     34.9    375
               Total    523     269     34.0    792

  D            Male     138     53      27.7    191
               Female   299     94      23.9    393
               Total    437     147     25.2    584

  E            Male     351     22      5.9     373
               Female   317     24      7.0     341
               Total    668     46      6.4     714

  Total                 2,439   1,154   32.1    3,593
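If you would like to check the ‘% Yes’ figures in these tables, the following Python sketch recomputes them from the counts, both within each department and for all departments combined. The data structure and names are illustrative assumptions; the sketch only reproduces the percentages and leaves the interpretation to you.

```python
# (not admitted, admitted) counts copied from the tables in Activity 11.2
applications = {
    "A": {"Male": (207, 353), "Female": (8, 17)},
    "B": {"Male": (205, 120), "Female": (391, 202)},
    "C": {"Male": (279, 138), "Female": (244, 131)},
    "D": {"Male": (138, 53), "Female": (299, 94)},
    "E": {"Male": (351, 22), "Female": (317, 24)},
}


def percent_admitted(rejected, admitted):
    """Percentage of applicants who were admitted."""
    return 100 * admitted / (rejected + admitted)


for department, by_sex in applications.items():
    for sex, (rejected, admitted) in by_sex.items():
        print(department, sex, round(percent_admitted(rejected, admitted), 1))

overall = {}
for sex in ("Male", "Female"):
    rejected = sum(by_sex[sex][0] for by_sex in applications.values())
    admitted = sum(by_sex[sex][1] for by_sex in applications.values())
    overall[sex] = percent_admitted(rejected, admitted)
    print("All departments", sex, round(overall[sex], 1))
```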

Activity 11.3 Each of the statements below mentions a piece of statistical evidence, and a claim based on it. Do you agree with the claims? Why or why not? Are there any fallacies in the claims, or complications which are being glossed over? The questions marked with (†) are a bit more subtle and complex than the rest.

(a) A public consultation exercise on attitudes to genetically modified (GM) food was carried out in the UK in 2002–3. This involved various events where interested members of the public could come and take part in discussions about GM food. After the events, the participants were asked to complete a questionnaire, which was also available on a website. Around 37,000 people completed the questionnaire, and 90% of those expressed opposition to GM foods. Therefore a very large majority of the people in the UK oppose GM foods.

(b) In a study of the ages and professions of people who had died, it was found that the profession with the lowest average age of death was ‘student’. Therefore being a student is the most dangerous of professions.

(c) In 2007, the officially recorded suicide rate in Sweden was 15.8 per 100,000 people per year. This was much higher than in many other countries, some of which even had a rate of 0.0. This indicates that suicide is a much more serious problem in Sweden than in those other countries.


(d) Data on the past 10 years in a country show that the number of deaths from drowning tends to be higher in months when total consumption of ice cream is high. Therefore eating ice cream before going swimming increases the risk of drowning.

(e)† A country has two kinds of secondary schools, private schools and state-owned schools. Statistics show that 40% of students graduating from private schools, but only 20% of those graduating from state schools, go on to study at a university. Therefore, private schools are twice as good as state schools.

(f)† Sociologists conduct a study where they select a random sample of people and ask these people for a list of their close friends. A random sample of the people named as friends is then contacted and the survey is repeated. The people sampled at the second stage have, on average, many more friends than do the people in the original sample. Therefore, your friends have more friends than you do.

11.4 Summary

This introductory unit has outlined the purpose of statistics and the role the discipline plays in the research process. Preliminary considerations of issues relating to data collection and analysis were discussed, as well as the different types of data which exist. Having spent some time thinking about the nature of statistics, you are now ready to start doing statistics, beginning with data visualisation in the next unit.

11.5 Key terms and concepts

Bivariate data            Categorical data
Census                    Data
Descriptive statistics    Direct observation
Experiment                Interval data
Metric data               Multivariate data
Nominal data              Ordinal data
Population                Probability
Ratio data                Research
Sample                    Simulation
Statistical inference     Survey
Univariate data


Learning outcomes

At the end of this unit, you should be able to:

outline issues relating to data collection and analysis

describe the different types of data

explain the role of statistics in the research process

discuss the key terms and concepts introduced in this unit.

Exercises

Exercise 11.1

The given working definition of ‘Statistics’ was:

‘the study of data, involving the collection, classification, summary, display, analysis and interpretation of numerical information’.

What does this mean?

Exercise 11.2

Briefly discuss the distinction between descriptive statistics and inferential statistics.

Exercise 11.3

Explain the different types of data that can occur.

Exercise 11.4

What is the measurement level for each of the following variables?

(a) The quality ranking of a newspaper

(b) The classification of an examination result as ‘Distinction’, ‘Merit’, ‘Pass’, or ‘Fail’

(c) Country of birth

(d) Favourite music

(e) Income measured by percentiles (for example, if someone’s income is above the 20th percentile, this means 20% of the population earn less).


Exercise 11.5

In 2009 the UK government reclassified cannabis from a class C drug to a class B drug, thereby introducing the threat of arrest for possession of the drug. The following table cross-classifies age and agreement with the reclassification.

                  Agree with reclassification (%)
Age             1. No    2. Unsure    3. Yes    Total
18–39             50         30          20      100%
40–59              ?          ?           ?      100%
60 and over        ?          ?           ?      100%

Complete the table in such a way that there is a weak positive association between Age and Agreement. (Assume the measurement scale of Agreement as given in the table is an ordinal one.)


Unit 12: Data exploration II
Data visualisation

Overview

Graphical representations of data provide us with a useful view of the distribution of variables. In this unit, we shall cover a selection of approaches for displaying data visually, each being appropriate in certain situations. In the next unit we consider descriptive statistics, whose main objective is to interpret key features of a dataset numerically. Graphs and charts have little intrinsic value per se; their main function is to bring out interesting features of a dataset. For this reason, simple descriptions should be preferred to complicated graphics.

Aims

This unit explains the importance of data visualisation and its role in communicating the underlying distribution of data. Particular aims are:

to provide a basic knowledge of how to summarise, analyse and interpret data visually

to recommend appropriate graphical methods for different types of variables.

Background reading

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Describing data’ Chapter 1.

12.1 Grouping data

Consider the monthly expenditure, in pounds, on credit cards by 300 individuals.

141.24   −25.00    82.23   233.90     0.00
 79.50     0.00     6.41    59.63   102.71   etc.

The second observation of −£25 indicates negative expenditure, presumably a refund on a previously-purchased item. It is difficult to interpret the data if it is just in the form of a lot of numbers. But we can first group the data into classes, and then find out how many data points are in each class:


Expenditure    Number of individuals    Expenditure       Number of individuals
[−25, 25)               87              [575, 625)                  3
[25, 75)                55              [625, 675)                  2
[75, 125)               30              [675, 725)                  3
[125, 175)              24              [725, 775)                  1
[175, 225)              23              [775, 825)                  4
[225, 275)              22              [825, 875)                  2
[275, 325)               8              [875, 925)                  0
[325, 375)              10              [925, 975)                  0
[375, 425)               7              [975, 1,025)                0
[425, 475)               6              [1,025, 1,075)              3
[475, 525)               3              [1,075, 1,125)              0
[525, 575)               6              [1,125, 1,175)              1

This is much better! We can see, for example, that 172 individuals (slightly over half those surveyed) spend less than £125.

There is some arbitrariness in the grouping used and the choice of classes is often down to common sense, but as a guide:

There should be between 5 and 25 classes.

Each piece of data should belong to one and only one class.

In general, all classes should have the same width (but we can sometimes have open-ended classes at the extreme, such as < 0 or > 1,000).

Some terminology associated with grouping data:

Classes are categories for grouping data.

The frequency is the number of data values in a class.

The frequency distribution is a listing of classes and their frequencies.

The lower class limit is the smallest value that can go in a class.

The upper class limit is the largest value that can go in a class.

The class mark is the midpoint of a class.

The class width is the difference between the upper and lower class limits for a class.
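The grouping step can be sketched in a few lines of Python. This is only an illustration: it uses the ten expenditures quoted in the text (not the full sample of 300), and the names `LOWEST_LIMIT`, `CLASS_WIDTH` and `class_of` are hypothetical helpers chosen to match the classes [−25, 25), [25, 75), and so on.

```python
from collections import Counter

# The ten expenditures quoted in the text (pounds); the full sample has 300 values.
expenditure = [141.24, -25.00, 82.23, 233.90, 0.00,
               79.50, 0.00, 6.41, 59.63, 102.71]

LOWEST_LIMIT = -25   # lower class limit of the first class
CLASS_WIDTH = 50     # every class has the same width


def class_of(value):
    """Return (lower, upper) for the class [lower, upper) containing value."""
    index = int((value - LOWEST_LIMIT) // CLASS_WIDTH)
    lower = LOWEST_LIMIT + index * CLASS_WIDTH
    return (lower, lower + CLASS_WIDTH)


# Frequency distribution: each class paired with its frequency.
frequency = Counter(class_of(x) for x in expenditure)

for (lower, upper), count in sorted(frequency.items()):
    class_mark = (lower + upper) / 2  # midpoint of the class
    print(f"[{lower}, {upper})  frequency = {count}  class mark = {class_mark}")
```

Because each class is closed on the left and open on the right, every value falls into exactly one class, as the guidelines require.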

12.2 Histograms

Diagrams are a particularly useful way of illustrating data as ‘a picture is worth a thousand words.’ We can illustrate the frequency data for the credit cards as shown in Figure 12.1. From the histogram it is clear that most credit card holders (in the sample) spend moderate amounts each month, with a few spending large amounts.

Figure 12.1: Histogram of credit card data. (Title: ‘Histogram of Monthly Credit Card Expenditure’; horizontal axis: expenditure in pounds; vertical axis: frequency.)

Some points to note:

The height of each bar equals the frequency of the class it represents.

Each ‘bar’ extends from the lower class limit of its class to the lower class limit of the next class.

The axes are labelled.

The histogram has an informative title.

Histograms are only used for continuous (i.e. interval or ratio) data.

It is also often useful to calculate and tabulate cumulative frequencies, that is, the total frequency up to and including a given class, as follows.

Expenditure    Frequency    Cumulative    Expenditure       Frequency    Cumulative
                            frequency                                    frequency
[−25, 25)         87            87        [575, 625)            3           284
[25, 75)          55           142        [625, 675)            2           286
[75, 125)         30           172        [675, 725)            3           289
[125, 175)        24           196        [725, 775)            1           290
[175, 225)        23           219        [775, 825)            4           294
[225, 275)        22           241        [825, 875)            2           296
[275, 325)         8           249        [875, 925)            0           296
[325, 375)        10           259        [925, 975)            0           296
[375, 425)         7           266        [975, 1,025)          0           296
[425, 475)         6           272        [1,025, 1,075)        3           299
[475, 525)         3           275        [1,075, 1,125)        0           299
[525, 575)         6           281        [1,125, 1,175)        1           300


We can now quickly see, say, that just under two-thirds of credit card holders spend less than £175.
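As a minimal sketch, the cumulative frequencies can be reproduced as running totals of the class frequencies. The list `frequencies` is copied from the table; `upper_limits` is a helper introduced here on the assumption that the classes run [−25, 25), [25, 75), and so on in steps of 50.

```python
from itertools import accumulate

# Class frequencies for [-25, 25), [25, 75), ..., [1,125, 1,175).
frequencies = [87, 55, 30, 24, 23, 22, 8, 10, 7, 6, 3, 6,
               3, 2, 3, 1, 4, 2, 0, 0, 0, 3, 0, 1]

# Upper class endpoints: 25, 75, 125, ..., 1175.
upper_limits = [25 + 50 * i for i in range(len(frequencies))]

# Running totals: frequency up to and including each class.
cumulative = list(accumulate(frequencies))

# Number (and proportion) of individuals spending less than 175 pounds.
below_175 = cumulative[upper_limits.index(175)]
proportion_below_175 = below_175 / cumulative[-1]
print(below_175, round(proportion_below_175, 3))
```

Here 196 of the 300 individuals (about 65%) fall below £175, which is the ‘just under two-thirds’ quoted in the text.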

Having determined the cumulative frequencies, we can construct a cumulative frequency polygon. The horizontal axis is labelled with the class endpoints and the vertical axis with the cumulative frequencies. A point of zero frequency is placed at the beginning of the first class and a point is plotted at the end of each class interval for the cumulative value. The points are then joined up and, for the credit card data, we get Figure 12.2.

Figure 12.2: Cumulative frequency polygon of credit card data. (Title: ‘Cumulative Frequency Polygon of Monthly Credit Card Expenditure’; horizontal axis: expenditure in pounds; vertical axis: cumulative frequency.)

So, for example, from the graph, we can see that only about 16 of the 300 credit card holders spend more than £600 in a month.

We now look at some other types of graphical display. Recall that the type of diagram used will depend on the type of data, and the objective of any diagram is to illustrate the key features of the dataset.

Histograms (and some other forms of diagram) are suitable for (univariate) interval or ratio data. For categorical data, other alternatives are more appropriate.

Activity 12.1 At a university computing centre, the daily numbers of computer stoppages due to machine malfunction were recorded for a period of 70 successive working days and the following data obtained.

0   0   2   0   0   0   3   3   0   0
1   8   5   0   0   4   3   0   6   2
0   3   1   1   0   1   0   1   1   0
2   2   0   0   0  17   1   2   1   2
0   1   6   4   3   3   1   2   4   0
0   3  15   2   0   0   0   0   0   1
1   0   2   0   2   4   4   0   2   2


(a) Produce a cumulative frequency polygon for these data.

(b) Assuming a year consists of 255 working days, on how many days would you expect 5 or more stoppages to occur?

(c) Discuss in a couple of sentences what the data tell you. What recommendations would you make?

12.3 Pie charts and bar graphs

These two familiar diagrams will often be the methods of choice for categorical data. Both will quickly give the observer essential features of the data in a way that the raw data cannot. Consider information on toothpaste sales in $000s for the 10 top brands in the US in a recent year.

Brand            Sales      Brand         Sales
Crest            370,437    Rembrandt     52,067
Colgate          321,084    Sensodyne     50,133
Aquafresh        177,989    Listerin      40,107
Mentadent        170,630    Closeup       32,009
Arm & Hammer     109,512    Ultrabrite    25,358

We can represent this dataset using a pie chart, as in Figure 12.3.

Figure 12.3: Pie chart of toothpaste sales data. (Title: ‘Pie chart of Toothpaste Sales in $000s in the US’. Shares shown: Crest 27%, Colgate 24%, Aquafresh 13%, Mentadent 13%, Arm & Hammer 8%, Rembrandt 4%, Sensodyne 4%, Listerin 3%, Closeup 2%, Ultrabrite 2%.)
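The percentage attached to each slice of the pie chart is that brand's share of total sales. A short sketch of the calculation, using the sales figures from the table (the dictionary name `sales` and the rounding to whole percentages are choices made here for illustration):

```python
# Toothpaste sales in $000s, as listed in the table.
sales = {
    "Crest": 370437, "Colgate": 321084, "Aquafresh": 177989,
    "Mentadent": 170630, "Arm & Hammer": 109512, "Rembrandt": 52067,
    "Sensodyne": 50133, "Listerin": 40107, "Closeup": 32009,
    "Ultrabrite": 25358,
}

total = sum(sales.values())

# Each brand's share of the total, rounded to the nearest whole percent.
shares = {brand: round(100 * value / total) for brand, value in sales.items()}

for brand, share in shares.items():
    print(f"{brand:<13} {share:>3}%")
```

For these data the rounded shares happen to add up to exactly 100%; in general, rounding can leave the displayed percentages summing to slightly more or less than 100.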

Alternatively, we can construct a bar chart, as in Figure 12.4. It is similar to a histogram except that the bars are separated.

Figure 12.4: Bar chart of toothpaste sales data. (Title: ‘Bar chart of Toothpaste Sales in $000s in the US’; one bar per brand; vertical axis: sales.)

12.4 Line graphs

Histograms, pie charts and bar charts are only a few of the many ways to portray data visually. Fortunately, most displays are based on common sense and are easy to understand. Line graphs are an additional method, but they should generally only be used for time series data, where the horizontal axis represents time. Let us consider the sales of a commodity recorded at three-monthly (seasonal) intervals as follows.

Year   Season    Sales     Year   Season    Sales
1      Spring     8.3      3      Spring     9.5
1      Summer    13.1      3      Summer    14.3
1      Autumn     9.2      3      Autumn    10.4
1      Winter     6.1      3      Winter     7.1
2      Spring     8.9      4      Spring    10.1
2      Summer    13.7      4      Summer    14.9
2      Autumn     9.8      4      Autumn    11.1
2      Winter     6.6      4      Winter     7.4

The line graph of this dataset is shown in Figure 12.5. Note the clear ‘seasonal variation’ and small, but probably significant, upward trend. (What might the commodity be?)
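One simple way to see both features numerically, offered here only as an informal check, is to total the sales within each year (which removes the seasonal pattern) and to note which season is highest in each year. The variable names below are illustrative.

```python
# Quarterly sales in time order: Spring, Summer, Autumn, Winter for years 1 to 4.
sales = [8.3, 13.1, 9.2, 6.1,
         8.9, 13.7, 9.8, 6.6,
         9.5, 14.3, 10.4, 7.1,
         10.1, 14.9, 11.1, 7.4]
seasons = ["Spring", "Summer", "Autumn", "Winter"]

# Split the series into one block of four seasons per year.
years = [sales[i:i + 4] for i in range(0, len(sales), 4)]

# Yearly totals smooth out the seasonal variation and expose the trend.
yearly_totals = [round(sum(year), 1) for year in years]

# The season with the highest sales in each year shows the seasonal peak.
peak_seasons = [seasons[year.index(max(year))] for year in years]

print(yearly_totals)
print(peak_seasons)
```

The yearly totals rise steadily (36.7, 39.0, 41.3, 43.5), while the peak is in summer every year.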

Figure 12.5: Line graph of commodity sales data. (Title: ‘Commodity Sales by Season’; horizontal axis: season number; vertical axis: sales.)

12.5 Scatter plots

Scatter plots are used to illustrate the association between bivariate data points. For example, we might have data on the salary and age of a number of employees of a company, as depicted in Figure 12.6. Think about what this scatter plot tells us about the relationship between salary and age. (Note the anomalous point is called an outlier; more on this later in the unit.)

Figure 12.6: Scatter plot of ‘Salary’ against ‘Age’. (Title: ‘Scatter plot of Salary against Age’; horizontal axis: age; vertical axis: salary in £000s.)

We now consider a slightly more elaborate example which illustrates the potential power of relatively simple descriptive statistics. Assume we have data for advertising and sales (both in £ millions) for 60 companies of similar size in a given year. Each company is in one of three sectors: A, B or C. How is advertising related to sales? First, let us look at the data.

Advertising  Sales  Sector    Advertising  Sales  Sector    Advertising  Sales  Sector
    38        77      A           66        77      B           93        67      C
    10        57      A           43        71      B           86        68      C
    60        65      A           54        73      B           20        47      C
    80        77      A           46        74      B           10        43      C
    68        73      A            6        29      B           37        49      C
    86        55      A           25        64      B           91        87      C
     1        63      A           87        30      B           89        88      C
    41        77      A           59        53      B           68        66      C
    86        70      A           80        26      B            7        32      C
    14        76      A           31        49      B           35        44      C
    25        54      A           10        18      B           42        50      C
     5        49      A           94        26      B           21        40      C
     3        72      A           68        68      B           28        42      C
    16        84      A           41        67      B           77        77      C
    22        76      A           69        72      B           53        60      C
     2        63      A            6        20      B           30        39      C
    29        76      A           93        24      B           24        37      C
    34        77      A            3        19      B           95        91      C
    55        71      A           34        47      B           84        80      C
    36        92      A          100        20      B           66        75      C

Clearly, it is very difficult to say anything interesting about the dataset by looking at the raw data in a table. So, first we plot sales against advertising while ignoring the sector. The scatter plot is shown in Figure 12.7 and this suggests increasing advertising may lead to increasing sales, but it is not very clear.

Figure 12.7: Scatter plot of ‘Sales’ against ‘Advertising’ for 60 companies. (Title: ‘Scatter plot of Sales against Advertising for 60 companies’; horizontal axis: advertising in £ millions; vertical axis: sales in £ millions.)

Suppose we produce scatter plots for each sector separately. These are shown in Figure 12.8. Advertising appears to have no effect on sales in Sector A. In Sector B, advertising appears to have an increasing effect on sales up to a point, after which it has a decreasing effect. This quite often happens: the market has become saturated, or the advertising campaign becomes less effective. Finally, advertising appears to have a steadily increasing effect on sales in Sector C.

Figure 12.8: Scatter plot of ‘Sales’ against ‘Advertising’ for 60 companies, by sector. (Three panels titled ‘Sector A’, ‘Sector B’ and ‘Sector C’; horizontal axes: advertising in £ millions; vertical axes: sales in £ millions.)

Now consider data on sales, in thousands of units, of a small electronics firm over 10 years.

Year      1     2     3     4     5     6     7     8     9    10
Sales   2.51  2.72  3.22  3.19  4.09  4.76  5.23  6.36  7.28  9.28

What can we deduce? First, we plot the data as shown in Figure 12.9.

The data appear to be increasing exponentially (literally, i.e. according to a law of the general form $y = a + b e^{cx}$ for some constants $a$, $b$ and $c$). Note the precise use of the word ‘exponentially’!

But perhaps the data points are increasing according to a quadratic, rather than an exponential, law, so we would be better looking for a relation of the general form $y = a + bx + cx^2$. Statistical modelling can be used to determine the curve best fitting a set of data, according to some criterion. In Unit 19 we consider how to find the best fitting line using a technique called ‘regression’.
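Before any formal modelling, a rough check is possible; it is not a method from this guide, only an informal sketch. Under the simplifying assumption $a = 0$, an exponential law $y = b e^{cx}$ implies that the ratio of each year's sales to the previous year's is the constant $e^{c}$, whereas a quadratic law implies that the year-on-year differences themselves change by a constant amount ($2c$). The code below simply computes both sequences so they can be inspected.

```python
# Annual sales (thousands of units) for years 1 to 10, from the table.
sales = [2.51, 2.72, 3.22, 3.19, 4.09, 4.76, 5.23, 6.36, 7.28, 9.28]

# Year-on-year growth ratios: roughly constant under y = b * exp(c * x).
ratios = [later / earlier for earlier, later in zip(sales, sales[1:])]

# First differences, and the changes in those differences:
# the latter are roughly constant under a quadratic law.
differences = [later - earlier for earlier, later in zip(sales, sales[1:])]
second_differences = [d2 - d1 for d1, d2 in zip(differences, differences[1:])]

print([round(r, 2) for r in ratios])
print([round(d, 2) for d in second_differences])
```

With only ten noisy observations neither sequence is exactly constant, which is why a fitting criterion, as mentioned above, is needed to choose between candidate curves.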

Figure 12.9: Scatter plot of ‘Sales’ against ‘Time’ for a small electronics firm. (Title: ‘Annual sales for a small electronics firm’; horizontal axis: year; vertical axis: sales in 000s.)

12.6 Summary

This unit has looked at different ways of presenting data visually. Which type of diagram is most appropriate will be determined by the type of data being analysed. You should be able to interpret any important features which are apparent from a diagram.

12.7 Key terms and concepts

Bar chart                 Classes
Cumulative frequencies    Cumulative frequency polygon
Distribution              Frequency
Frequency distribution    Line graph
Outlier                   Pie chart
Scatter plot              Time series

Learning outcomes

At the end of this unit, you should be able to:

interpret and summarise raw data on social science variables graphically

distinguish between univariate and bivariate situations

distinguish between categorical and continuous (including time series) variables

discuss the key terms and concepts introduced in this unit.


Exercises

Exercise 12.1

A pie chart is most suitable for a variable measured using which of the following scales: (a) nominal scale; (b) ordinal scale; or (c) interval scale? What about a bar chart? What about a histogram?

Exercise 12.2

Name one possible advantage and one possible disadvantage of histograms.

Exercise 12.3

The table below gives the numbers of people killed or seriously injured in the UK for different categories of road user during 1982 and 1984. These two years, 1982 and 1984, represent a complete year before and a complete year after the introduction of the seatbelt law.

                           1982      1984
Car drivers              19,460    16,421
Front seat passengers     9,458     7,047
Rear seat passengers      4,706     5,062
Pedestrians              18,963    19,168
Cyclists                  5,967     6,506

(a) What is the percentage change in the number of people killed or seriously injured for each category of road user between 1982 and 1984?

(b) What was the percentage of car drivers and car front seat passengers killed or seriously injured, out of all cases, each year?

(c) Write a brief commentary on your findings (a few sentences), with any suggestions as to additional information you would require for a fuller investigation as to why there have been changes.

Exercise 12.4

The following table shows the weekly visits for five health, fitness and nutrition websites. Display the data using a suitable graph and comment on the results, giving possible reasons for any trends that you notice.

Site               Visitors April 2012    Visitors April 2013
eDiets                   472,000                936,000
Weight Watchers          445,000                876,000
WebMD                    524,000                853,000
AOL Health               448,000                713,000
Yahoo! Health            396,000                590,000


Unit 13: Data exploration III
Descriptive statistics: measures of location, dispersion and skewness

Overview

Although data visualisation is useful to get a ‘feel’ for the data, in practice we also need to be able to summarise data numerically. This unit introduces descriptive statistics and distinguishes between measures of location, measures of dispersion and skewness. All these statistics provide useful summaries of raw datasets.

Aims

This unit introduces and explains the importance of descriptive statistics. Particular aims are:

to calculate simple numbers which will summarise the most important characteristics of a dataset

to explain the use and limitations of various descriptive statistics.

Background reading

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Describing data’ Chapter 2.

13.1 Summation notation

Very often in statistics we need to add up a set of numbers. Here we introduce the notation which statisticians use to describe the sum of some numbers. By using this notation we are able to write many things more concisely, and hence they become easier to read. Let us begin with $N$ numbers, denoted as

$$x_1, x_2, \ldots, x_N.$$

Here, $x_1$ is the first number, $x_2$ is the second number, and so on with $x_N$ being the last number in the dataset. For example, if the numbers are 7, 4, 12 and 6, we write

$$x_1 = 7, \quad x_2 = 4, \quad x_3 = 12 \quad \text{and} \quad x_4 = 6.$$

Suppose we want to add up these numbers, i.e. we want to find

$$x_1 + x_2 + x_3 + \cdots + x_N.$$


To shorten this we use the symbol $\sum$ (known as the summation operator), writing

$$\sum_{i=1}^{N} x_i = x_1 + x_2 + x_3 + \cdots + x_N. \tag{13.1}$$

Summation operator

We can ‘translate’ this notation as follows: ‘the sum of the values, whose typical member is $x_i$, beginning with the number $x_1$ and ending with the number $x_N$’. So, using the above example, if $x_1 = 7$, $x_2 = 4$, $x_3 = 12$ and $x_4 = 6$, we have

$$\sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 7 + 4 + 12 + 6 = 29.$$
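For readers who find code helpful, the summation operator corresponds closely to adding up the elements of a list. A minimal sketch in Python, using the four numbers from the example (note that Python lists are indexed from 0, so $x_1$ is `x[0]`):

```python
# The dataset from the example: x_1 = 7, x_2 = 4, x_3 = 12, x_4 = 6.
x = [7, 4, 12, 6]

# Sum of x_i for i = 1, ..., N.
total = sum(x)

# Other expressions follow the same pattern, e.g. the sum of the squares x_i^2.
sum_of_squares = sum(xi ** 2 for xi in x)

print(total, sum_of_squares)
```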

As you might expect, it is possible to write down other expressions involving $\sum$. For example, we might be interested in the sum of the squares of the values $x_1, x_2, x_3, \ldots, x_N$, which would be written as

$$\sum_{i=1}^{N} x_i^2 = x_1^2 + x_2^2 + x_3^2 + \cdots + x_N^2.$$

Quite often the value of $N$ will be clear and in such cases it is common to write simply $\sum x_i$ instead of $\sum_{i=1}^{N} x_i$. With practice, using the summation operator should not pose any difficulties. However, it is essential that you properly understand its interpretation since the summation operator is used extensively in many areas of statistics.

Example 13.1

Suppose $x_1 = 1$, $x_2 = 2$, $x_3 = 3$, $y_1 = 4$, $y_2 = 5$ and $y_3 = 6$. We then have

$$\sum_{i=1}^{3} x_i = 1 + 2 + 3 = 6, \qquad \sum_{i=1}^{3} y_i = 4 + 5 + 6 = 15,$$

$$3\sum_{i=1}^{3} x_i = 3 \times 6 = 18, \qquad \sum_{i=1}^{3} 3x_i = 3 + 6 + 9 = 18,$$

$$\sum_{i=1}^{3} x_i + \sum_{i=1}^{3} y_i = 6 + 15 = 21, \qquad \sum_{i=1}^{3} (x_i + y_i) = (1 + 4) + (2 + 5) + (3 + 6) = 21,$$

$$\sum_{i=1}^{3} x_i^2 = 1^2 + 2^2 + 3^2 = 14, \qquad \left(\sum_{i=1}^{3} x_i\right)^2 = 6^2 = 36,$$

$$\sum_{i=1}^{3} x_i \sum_{i=1}^{3} y_i = 6 \times 15 = 90, \qquad \sum_{i=1}^{3} x_i y_i = (1 \times 4) + (2 \times 5) + (3 \times 6) = 32,$$

as well as

$$\sum_{i=1}^{3} \left(x_i^3 + \frac{60}{y_i}\right) = \left(1^3 + \frac{60}{4}\right) + \left(2^3 + \frac{60}{5}\right) + \left(3^3 + \frac{60}{6}\right) = 16 + 20 + 37 = 73,$$

and

$$\sum_{i=1}^{3} 8 = 8 + 8 + 8 = 24.$$

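Some key points to note:

We saw that $\sum_{i=1}^{3} x_i + \sum_{i=1}^{3} y_i = 21$ and $\sum_{i=1}^{3} (x_i + y_i) = 21$. It is true, in general, that

$$\sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i = \sum_{i=1}^{N} (x_i + y_i).$$

We also saw that $3\sum_{i=1}^{3} x_i = 18$ and $\sum_{i=1}^{3} 3x_i = 18$. It is true, in general, that

$$c\sum_{i=1}^{N} x_i = \sum_{i=1}^{N} c x_i,$$

whatever the value of the constant $c$.

We also saw that $\sum_{i=1}^{3} x_i^2 = 14$ and $\left(\sum_{i=1}^{3} x_i\right)^2 = 36$. It is not true, in general, that

$$\sum_{i=1}^{N} x_i^2 = \left(\sum_{i=1}^{N} x_i\right)^2.$$

We also saw that $\sum_{i=1}^{3} x_i y_i = 32$ and $\sum_{i=1}^{3} x_i \sum_{i=1}^{3} y_i = 90$. It is not true, in general, that

$$\sum_{i=1}^{N} x_i y_i = \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i.$$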
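The four key points can be checked numerically with the values from Example 13.1. The sketch below is only a check on one dataset, not a proof; the variable names are chosen here for readability.

```python
# Values from Example 13.1.
x = [1, 2, 3]
y = [4, 5, 6]
c = 3

# True in general: the sum of sums equals the sum of (x_i + y_i).
sum_then_add = sum(x) + sum(y)
add_then_sum = sum(xi + yi for xi, yi in zip(x, y))

# True in general: a constant can be taken outside the summation.
constant_outside = c * sum(x)
constant_inside = sum(c * xi for xi in x)

# NOT true in general: sum of squares versus square of the sum.
sum_of_squares = sum(xi ** 2 for xi in x)
square_of_sum = sum(x) ** 2

# NOT true in general: sum of products versus product of sums.
sum_of_products = sum(xi * yi for xi, yi in zip(x, y))
product_of_sums = sum(x) * sum(y)

print(sum_then_add, add_then_sum)        # equal
print(constant_outside, constant_inside)  # equal
print(sum_of_squares, square_of_sum)      # different
print(sum_of_products, product_of_sums)   # different
```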

Activity 13.1 A dataset contains the observations 1, 1, 1, 2, 4, 8, 9 (so here, $N = 7$). Find the following.

(a) $\sum_{i=1}^{N} 2x_i$, (b) $\sum_{i=1}^{N} x_i^2$, (c) $\sum_{i=1}^{N} (x_i - 2)$,

(d) $\sum_{i=1}^{N} (x_i - 2)^2$, (e) $\left(\sum_{i=1}^{N} x_i\right)^2$, (f) $\sum_{i=1}^{N} 2$.


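Activity 13.2 Can you explain why

$$\sum_{i=1}^{N} x_i^2 \neq \left(\sum_{i=1}^{N} x_i\right)^2 \quad \text{and} \quad \sum_{i=1}^{N} x_i y_i \neq \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i \, ?$$

What are $\left(\sum_{i=1}^{N} x_i\right)^2$ and $\sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i$? (Consider the case where $N = 2$.)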

13.2 Measures of location

We now consider some ways of summarising a dataset numerically, rather than visually. Although the graphical methods presented earlier are extremely useful for getting a ‘feel’ for and organising the data, they lack precision. Measures of location (also called measures of central tendency) are statistics which provide a typical value for a collection of numbers. Clearly, such a typical, or average, value is an important property of a dataset. We shall encounter three ways of defining the ‘average’. In general, the average is a value typical, or representative, of a dataset.

The (arithmetic) mean is the sum of all the members of a dataset divided by the number of values in the dataset. Sometimes the mean is denoted by $\mu$ (pronounced ‘mew’) and sometimes it is denoted by $\bar{x}$ (pronounced ‘x-bar’). The distinction between $\mu$ and $\bar{x}$ is very important: $\mu$ refers to the mean of a population, whereas $\bar{x}$ refers to the mean of a sample. For now we shall not concern ourselves too much about this distinction; we shall return to it when we cover ‘sampling distributions’ in Unit 16.

Suppose we have a sample of $n$ values, $x_1, x_2, \ldots, x_n$. Using the summation operator, we write the sample mean mathematically as

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}. \tag{13.2}$$

(Arithmetic) sample mean

So, for example, if a student scored 62, 74, 49, 37 and 58 in five tests, the mean mark achieved is

$$\frac{62 + 74 + 49 + 37 + 58}{5} = \frac{280}{5} = 56.$$

Another measure of location is the median. This is the central value of the dataset when the numbers are arranged in ascending order. If no single such central value exists (this occurs when there is an even number of values), then the mean of the two middle numbers is taken.

For 1, 3, 5, 8, 12, the median is 5.

For 2, 5, 10, 3, 7, 6, we first arrange in order to give 2, 3, 5, 6, 7, 10, which gives a median of $\frac{5 + 6}{2} = 5.5$.


The mode of a set of numbers is the most frequently occurring value. In some cases it may not exist, or indeed it may not be unique.

The mode of 3, 3, 3, 3, 3, 7, 7, 8, 9, 9 is 3.

The set 2, 5, 10, 13, 16 has no mode.

The set 4, 4, 30, 50, 50, 90 is bimodal ; there are two modes: 4 and 50.
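The three measures can be reproduced with Python's standard `statistics` module, shown here as a sketch using the examples above. One caveat: `statistics.multimode` returns every value tied for the highest frequency, so for a dataset in which every value occurs once it returns all the values, which corresponds to the ‘no mode’ case in the text.

```python
import statistics

# Mean of the five test marks.
marks = [62, 74, 49, 37, 58]
mean_mark = statistics.mean(marks)

# Median: odd and even numbers of values (the data need not be pre-sorted).
median_odd = statistics.median([1, 3, 5, 8, 12])
median_even = statistics.median([2, 5, 10, 3, 7, 6])

# Mode(s): multimode lists all values sharing the highest frequency.
single_mode = statistics.multimode([3, 3, 3, 3, 3, 7, 7, 8, 9, 9])
two_modes = statistics.multimode([4, 4, 30, 50, 50, 90])
all_tied = statistics.multimode([2, 5, 10, 13, 16])  # every value occurs once

print(mean_mark, median_odd, median_even)
print(single_mode, two_modes, all_tied)
```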

Sometimes datasets have extreme observations, called outliers (a more formal definition is provided shortly). By construction, the median and mode are ‘insensitive’ or resistant to outliers: outliers lie at the end(s) of the ordered dataset (and hence do not affect the median) and typically occur only once (and hence are not modes). However, outliers may have a big influence on the mean and hence give a misleading value.

One possible remedy is to calculate the trimmed mean. This involves dropping t (where t is a number, typically 1 or 2) observations from each end of the ordered dataset and calculating the mean of the remaining observations.

With ordered values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, where $x_{(i)}$ indicates the $i$-th ordered value, the trimmed (sample) mean, denoted $\bar{x}_{tr}$, is

$$\bar{x}_{tr} = \frac{\sum_{i=t+1}^{n-t} x_{(i)}}{n - 2t}. \tag{13.3}$$

Trimmed (sample) mean

So the trimmed mean may be useful if we are concerned about extreme values being present in the dataset.

For example, suppose the ordered dataset is 1, 32, 37, 38, 41, 192. Clearly the largest value is extreme, and to a lesser extent the smallest value is too, so we set $t = 1$ and compute the trimmed mean to be

$$\bar{x}_{tr} = \frac{\sum_{i=t+1}^{n-t} x_{(i)}}{n - 2t} = \frac{32 + 37 + 38 + 41}{6 - 2} = 37.$$
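The trimmed mean in (13.3) can be sketched as a small function. The name `trimmed_mean` and its argument names are illustrative, not taken from any library.

```python
def trimmed_mean(values, t=1):
    """Mean after dropping the t smallest and t largest observations."""
    ordered = sorted(values)
    n = len(ordered)
    if n - 2 * t <= 0:
        raise ValueError("t is too large: no observations would remain")
    kept = ordered[t:n - t]          # x_(t+1), ..., x_(n-t)
    return sum(kept) / (n - 2 * t)


data = [1, 32, 37, 38, 41, 192]

ordinary_mean = sum(data) / len(data)   # pulled upwards by the value 192
trimmed = trimmed_mean(data, t=1)       # ignores 1 and 192

print(round(ordinary_mean, 2), trimmed)
```

For this dataset the ordinary mean is about 56.83, larger than all but one of the observations, whereas the trimmed mean of 37 sits among the central values.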

13.2.1 Which ‘average’ should be used?

Given these three measures of location (mean, median and mode), a natural question to ask is ‘which one should we use?’. The mean is usually the preferred choice but, due to its sensitivity to outliers, it is not always appropriate. If a cricketer scored 15, 4, 0, 9, 148, 2, 0, 3, 6 runs over nine innings he is probably not very good; the high score of 148 may be attributable to a weak opposition. In this case, the median, 4, may be more representative than the mean, 20.8. Similarly, if 25 employees at a small company have annual salaries between £20,000 and £50,000, with a single salary of £200,000 for the managing director, again the median may better reflect typical salaries than the mean.


The mean is the most commonly used ‘average’ and appears in many statistical applications. One positive feature of the mean is that it uses all the data points. However, as we have already seen, the mean is sensitive to extreme values, unlike the median and mode. Therefore, whenever extreme points exist, the median or trimmed mean should be considered instead.

The mode is particularly useful when the data values represent categories. For example, the values 6, 7, 8, . . . might be the sizes of shoes sold in a shop, and we want the typical size of shoe sold.

Activity 13.3 Consider again the data on computer stoppages in Activity 12.1.

(a) Compute the mean, median, mode and a suitable trimmed mean for these data.

(b) Discuss the relative advantages of each of these measures.

13.2.2 Frequency tables

Sometimes we may have data in the form of a frequency table, such as

Observation, x_i    Frequency, f_i
       2                  4
       3                  2
       4                  3
       5                  1

This corresponds to the ordered dataset: 2, 2, 2, 2, 3, 3, 4, 4, 4, 5. So the mean is

$$\frac{(4 \times 2) + (2 \times 3) + (3 \times 4) + (1 \times 5)}{4 + 2 + 3 + 1} = 3.1.$$

This leads to the following more general result. If the numbers $x_1, x_2, x_3, \ldots, x_k$ occur with respective frequencies $f_1, f_2, f_3, \ldots, f_k$, then

$$\bar{x} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \cdots + f_k x_k}{f_1 + f_2 + \cdots + f_k} = \frac{\sum_{i=1}^{k} f_i x_i}{\sum_{i=1}^{k} f_i}. \tag{13.4}$$
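Formula (13.4) translates directly into code. The sketch below uses the small frequency table above, stored as a dictionary mapping each observation to its frequency (a representation chosen here for convenience), and confirms that it agrees with the mean of the expanded dataset.

```python
# Frequency table: observation x_i -> frequency f_i.
frequency_table = {2: 4, 3: 2, 4: 3, 5: 1}

# Numerator: sum of f_i * x_i; denominator: sum of f_i.
weighted_total = sum(f * x for x, f in frequency_table.items())
number_of_observations = sum(frequency_table.values())
mean_from_table = weighted_total / number_of_observations

# Check against the expanded dataset 2, 2, 2, 2, 3, 3, 4, 4, 4, 5.
expanded = [x for x, f in frequency_table.items() for _ in range(f)]
mean_from_raw = sum(expanded) / len(expanded)

print(mean_from_table, mean_from_raw)
```

When the table holds exact values, as here, the two means coincide; with grouped classes represented by their midpoints, the table-based figure is only an estimate.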

Let us now revisit the credit card data from Unit 12. The classes are −25 to 25, 25 to 75 etc. and we can use the frequency table to estimate the mean. You should appreciate that we lose some information when the table is constructed, as we no longer have the raw (original) data. Consequently, if we calculate the mean using (13.4) we expect to lose some precision, but we still expect a reasonable estimate. So we face a trade-off: although we lose some precision (disadvantage), we have the convenience of summarising the data in a frequency table (advantage). So our frequency table is

Expenditure    Number of individuals    Midpoint
[−25, 25)               87                   0
[25, 75)                55                  50
[75, 125)               30                 100
[125, 175)              24                 150
   ...                  ...                 ...

In the table above, the ‘Midpoint’ column is simply the centre value of the interval in the ‘Expenditure’ column and we take this to be the expenditure value for each class. Now we are able to estimate the mean using (13.4). Using the 24 classes and frequencies tabulated in Unit 12, the estimate is

$$\frac{(87 \times 0) + (55 \times 50) + (30 \times 100) + (24 \times 150) + \cdots + (1 \times 1150)}{87 + 55 + 30 + 24 + \cdots + 1} = \frac{50{,}800}{300} = £169.33.$$

Compare this with the ‘true’ mean calculated from the ungrouped data (not provided), which is £168.30. As expected, grouping the data loses some precision, but nevertheless we see the mean estimate is close to the true mean.

13.3 Measures of dispersion

Consider the two sets of numbers

0, 1, 5, 8, 9, 19 and 6.8, 6.9, 6.9, 7.0, 7.1, 7.3.

Both have a mean of 7, but the datasets are clearly very different. The second dataset is more ‘compact’ while the first dataset is more ‘spread out’. Knowing that two datasets have the same mean is not sufficient to describe them fully, as the mean cannot distinguish between datasets with different spreads. So we seek precise ways of measuring spread, or dispersion. Just as there are several measures of location, there are also several measures of dispersion.

We begin with the range, which is defined as the difference between the maximum and minimum observations.

For the dataset 0, 1, 5, 8, 9, 19, the range is 19− 0 = 19.

For the dataset 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, the range is 7.3− 6.8 = 0.5.

The first dataset has a much larger range owing to the greater dispersion.

An alternative is to divide up a dataset into quartiles. In fact, we have already met one quartile: the median. Recall the median splits up a dataset into the bottom 50% of values and the top 50%. But we can also divide a dataset into four equal parts. The first quartile, denoted Q1, is the value which splits the bottom 25% of observations from the top 75%; the second quartile, denoted Q2, is simply the median; the third quartile, denoted Q3, is the value which splits the bottom 75% of observations from the top 25%.


Example 13.2

The marks for 20 students in an introductory statistics class are:

88 67 64 76 86 85 82 39 75 34
90 63 89 89 84 81 96 100 70 96

We first arrange the data in ascending order:

34 39 63 64 67 | 70 75 76 81 82 | 84 85 86 88 89 | 89 90 96 96 100

Hence Q1 = 68.5, Q2 = 83 and Q3 = 89, obtained by taking the average of the numbers either side of each ‘|’.

Finding these quartiles posed no great difficulty here because the number of observations, 20, is divisible by 4. When the number of observations is not divisible by 4, things can become slightly more complicated, although for our purposes it will suffice to take the average of the values either side of where each quartile is located. In practice, most datasets are large, which means the differences between the alternative methods become negligible.
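The averaging rule used in Example 13.2 can be sketched in a few lines. This is only one possible reading of the rule: we assume that, with the ordered data counted from 1, each quartile is the average of the values at positions ⌊nq⌋ and ⌊nq⌋ + 1, where q is 0.25, 0.5 or 0.75. The function name `quartile` is hypothetical, and statistical software often uses other conventions that give slightly different answers.

```python
import math

def quartile(data, q):
    """Quartile by averaging the two ordered values either side of position n*q.

    Assumed rule (our reading of the averaging method, not a library
    definition): average the values at 1-based positions floor(n*q) and
    floor(n*q) + 1. Needs at least 4 observations.
    """
    xs = sorted(data)
    lower = math.floor(len(xs) * q)          # 1-based position just below the cut
    return (xs[lower - 1] + xs[lower]) / 2   # shift to 0-based indexing

marks = [88, 67, 64, 76, 86, 85, 82, 39, 75, 34,
         90, 63, 89, 89, 84, 81, 96, 100, 70, 96]
q1, q2, q3 = (quartile(marks, q) for q in (0.25, 0.5, 0.75))
print(q1, q2, q3)  # 68.5 83.0 89.0
```

Under the same assumed rule, the dataset 0, 1, 5, 8, 9, 19 gives quartiles 0.5, 6.5 and 8.5.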

Analogously to quartiles, it is possible to divide datasets into deciles (10 equal parts) or even percentiles (100 equal parts). For example, we can express the median as Q2, the 5-th decile or even the 50-th percentile. We shall not consider deciles and percentiles any further.

Having introduced quartiles, we are now in a position to discuss another measure of dispersion: the interquartile range (IQR).

Interquartile range

We define the IQR as the difference between the third and first quartiles, that is

$$\text{IQR} = Q_3 - Q_1. \tag{13.5}$$

So for the dataset 0, 1, 5, 8, 9, 19 the median is 6.5. Q1 lies somewhere between 0 and 1 which, taking the midpoint (average) for simplicity, is say 0.5. Similarly Q3 lies somewhere between 8 and 9 which, again taking the midpoint (average) for simplicity, is say 8.5. So the IQR is 8.5 − 0.5 = 8.

For the dataset 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, the quartiles are similarly estimated to be Q1 = 6.85, Q2 = 6.95 and Q3 = 7.05. Hence the IQR is 7.05 − 6.85 = 0.2.

As we found with the range, the first dataset has a greater IQR, reflecting the greater dispersion in the dataset.


5-number summary

The 5-number summary provides a useful set of features for a distribution. It is given by

$$\left( x_{(1)},\ Q_1,\ Q_2,\ Q_3,\ x_{(n)} \right), \tag{13.6}$$

where $x_{(1)}$ and $x_{(n)}$ denote the smallest and largest observations, respectively (which are not necessarily the first and last observations).

We are now able to provide a more formal definition of an outlier. Earlier we described it as an ‘extreme observation’. Now we define an outlier to be a data value that is more than 1.5 times the interquartile range above Q3 or below Q1, that is, less than Q1 − 1.5 × IQR or greater than Q3 + 1.5 × IQR. Extreme outliers are more than 3 times the interquartile range above Q3 or below Q1.

For example, for the dataset 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, we have found Q1 = 6.85, Q3 = 7.05 and IQR = 0.2. So outliers are any data points which are either less than 6.85 − 1.5 × 0.2 = 6.55, or greater than 7.05 + 1.5 × 0.2 = 7.35. Hence there are no outliers.
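The outlier rule can be written as a pair of ‘fences’. The sketch below takes the quartiles found above as given; the function name `outlier_fences` and the parameter `k` are our own labels, with k = 1.5 for outliers and k = 3 for extreme outliers.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [6.8, 6.9, 6.9, 7.0, 7.1, 7.3]
q1, q3 = 6.85, 7.05                      # quartiles found in the text
lower, upper = outlier_fences(q1, q3)
outliers = [x for x in data if x < lower or x > upper]
print(round(lower, 2), round(upper, 2), outliers)  # 6.55 7.35 []
```

Calling `outlier_fences(q1, q3, k=3)` gives the corresponding limits for extreme outliers.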

Another, less often used, measure of dispersion is the mean absolute deviation (MAD). For a dataset containing the points $x_i$, $i = 1, 2, \ldots, n$, we define it to be

$$\text{MAD} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}, \tag{13.7}$$

that is, we use the absolute value of the differences between the observations and the (sample) mean. Using the absolute value sign gives equal weight to values either side of the mean. Although it is easy to calculate the MAD, it is less used in practice than the far more common, and important, measures of dispersion known as the variance and standard deviation.
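To illustrate (13.7), here is a minimal sketch applied to the two datasets introduced at the start of this section. The function name `mean_absolute_deviation` is our own.

```python
def mean_absolute_deviation(data):
    """MAD as in (13.7): the average absolute deviation from the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

first = [0, 1, 5, 8, 9, 19]
second = [6.8, 6.9, 6.9, 7.0, 7.1, 7.3]
print(mean_absolute_deviation(first))             # 5.0
print(round(mean_absolute_deviation(second), 3))  # 0.133
```

As with the range and the IQR, the first dataset has the larger value.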

13.3.1 Variance and standard deviation

Variance, and its square root the standard deviation, are the most popular measures of dispersion. Given (population) data values $x_1, x_2, \ldots, x_N$, with (population) mean $\mu$, the variance is defined by

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}. \tag{13.8}$$

We can think of this as the average squared deviation from the mean. Due to the squared term, data values which are distant from the mean have a correspondingly large value for $(x_i - \mu)^2$ and therefore contribute a great deal to the variance, regardless of whether values lie far above or below the mean. Similarly, data values which lie close to the mean (above or below) contribute comparatively little to the variance.

Note the notation for the variance is $\sigma^2$, which is pronounced ‘sigma-squared’. In fact $\sigma^2$, as defined by (13.8), is the notation used for the population variance, i.e. when the data values cover the entire population under consideration. If, instead, the dataset represents a sample drawn from an underlying population, we refer to the sample variance, which we denote by $s^2$. Clearly, if we only have sample data, not only is the population variance, $\sigma^2$, unknown but so is the population mean, $\mu$.


For a sample of size n, with data values $x_1, x_2, \ldots, x_n$, the sample variance is calculated as follows:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}. \tag{13.9}$$

Notice that (13.9) is similar to (13.8) except we replace the population mean ($\mu$) with the sample mean ($\bar{x}$), and divide by n − 1 instead. It should be intuitively clear why we use $\bar{x}$ ($\mu$ is, of course, unknown). The ‘−1’ in the denominator is present for reasons which are beyond the scope of this course.

Consider again the datasets 0, 1, 5, 8, 9, 19 and 6.8, 6.9, 6.9, 7.0, 7.1, 7.3, each with a mean of 7. We shall treat these as population datasets, hence $\mu = 7$ for each dataset.

The first dataset has deviations about $\mu$ of −7, −6, −2, 1, 2, 12. The squared deviations are therefore 49, 36, 4, 1, 4, 144 with a sum of 238. The variance, using (13.8), is then $\sigma^2 = 238/6 = 39.67$ and the standard deviation is $\sigma = \sqrt{39.67} = 6.30$.

The second dataset has deviations about $\mu$ of −0.2, −0.1, −0.1, 0, 0.1, 0.3. The squared deviations are therefore 0.04, 0.01, 0.01, 0, 0.01, 0.09 with a sum of 0.16. The variance, again using (13.8), is then $\sigma^2 = 0.16/6 = 0.027$ and the standard deviation is $\sigma = \sqrt{0.027} = 0.16$.

As before, the first dataset has a greater variance (and hence standard deviation), due to the greater dispersion in the dataset.
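These calculations can be checked with Python's standard `statistics` module, whose `pvariance` and `pstdev` functions divide by N as in (13.8), while `variance` divides by n − 1 as in (13.9). The sketch below is only a numerical check; the sample variances it prints are not given in the text and are shown purely for comparison.

```python
import statistics

first = [0, 1, 5, 8, 9, 19]
second = [6.8, 6.9, 6.9, 7.0, 7.1, 7.3]

for data in (first, second):
    pop_var = statistics.pvariance(data)    # divides by N, as in (13.8)
    pop_sd = statistics.pstdev(data)        # square root of the above
    samp_var = statistics.variance(data)    # divides by n - 1, as in (13.9)
    print(round(pop_var, 3), round(pop_sd, 2), round(samp_var, 3))
# 39.667 6.3 47.6
# 0.027 0.16 0.032
```

Dividing by n − 1 rather than N always gives the larger of the two values, and the difference matters most for small datasets such as these.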

Clearly, using (13.8) for population datasets and (13.9) for sample datasets becomes onerous when working them out by hand. It can be shown that (13.8) and (13.9) can be equivalently expressed, respectively, as

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 \tag{13.10}$$

$$s^2 = \frac{\left(\sum_{i=1}^{n} x_i^2\right) - n\bar{x}^2}{n - 1}. \tag{13.11}$$

For example, using the dataset 6.8, 6.9, 6.9, 7.0, 7.1, 7.3 we have

$$\sum_{i=1}^{N} x_i^2 = 6.8^2 + 6.9^2 + \cdots + 7.3^2 = 294.16,$$

so the variance is, using (13.10),

$$\sigma^2 = \frac{294.16}{6} - 7^2 = 0.027,$$

as before.
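A numerical check of the shortcut forms (13.10) and (13.11) is sketched below; the function names are our own. One practical caveat, which is our addition rather than the guide's: in floating-point arithmetic the shortcut subtracts two nearly equal numbers, so it can lose accuracy when the variance is tiny relative to the mean.

```python
def pop_variance_shortcut(data):
    """Population variance via (13.10): mean of squares minus squared mean."""
    n = len(data)
    mu = sum(data) / n
    return sum(x * x for x in data) / n - mu ** 2

def sample_variance_shortcut(data):
    """Sample variance via (13.11)."""
    n = len(data)
    xbar = sum(data) / n
    return (sum(x * x for x in data) - n * xbar ** 2) / (n - 1)

first = [0, 1, 5, 8, 9, 19]
second = [6.8, 6.9, 6.9, 7.0, 7.1, 7.3]
print(round(pop_variance_shortcut(first), 2))      # 39.67
print(round(pop_variance_shortcut(second), 3))     # 0.027
print(round(sample_variance_shortcut(first), 2))   # 47.6
```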

Activity 13.4 (Hard)
Show that (13.8) and (13.9) can be equivalently expressed as (13.10) and (13.11), respectively.


13.3.2 Variance using frequency distributions

If data are presented as a frequency distribution with k classes, then the equivalent forms of (13.8) and (13.10) are

$$\sigma^2 = \frac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{N} = \frac{\sum_{i=1}^{k} f_i x_i^2}{N} - \mu^2. \tag{13.12}$$

For example, suppose we have the following frequency distribution for ages of students.

x_i (age)          18  19  20  21  22  23  24  25  26
f_i (frequency)     1   5   8  12  10   7   4   1   2

For these data we first find the (population) mean using (13.4),

$$\mu = \frac{(1 \times 18) + (5 \times 19) + (8 \times 20) + \cdots + (2 \times 26)}{1 + 5 + 8 + \cdots + 2} = 21.58.$$

Using the first method in (13.12) the variance is

$$\sigma^2 = \frac{1 \times (-3.58)^2 + 5 \times (-2.58)^2 + 8 \times (-1.58)^2 + \cdots + 2 \times 4.42^2}{1 + 5 + 8 + \cdots + 2} = 3.20,$$

giving a standard deviation of $\sigma = \sqrt{3.20} = 1.79$. Alternatively, we could use the second expression in (13.12) which gives us

$$\frac{(1 \times 18^2) + (5 \times 19^2) + (8 \times 20^2) + \cdots + (2 \times 26^2)}{50} - 21.58^2 = 3.20,$$

again. The standard deviation would then be $\sigma = \sqrt{3.20} = 1.79$.
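Both expressions in (13.12) can be evaluated for the age data in a few lines. This is a minimal sketch with variable names of our own choosing.

```python
ages = [18, 19, 20, 21, 22, 23, 24, 25, 26]
freqs = [1, 5, 8, 12, 10, 7, 4, 1, 2]

n = sum(freqs)                                                 # N = 50
mu = sum(f * x for x, f in zip(ages, freqs)) / n               # mean, using (13.4)
var_def = sum(f * (x - mu) ** 2 for x, f in zip(ages, freqs)) / n    # first form
var_short = sum(f * x * x for x, f in zip(ages, freqs)) / n - mu ** 2  # second form
sd = var_def ** 0.5

print(n, round(mu, 2), round(var_def, 2), round(var_short, 2), round(sd, 2))
# 50 21.58 3.2 3.2 1.79
```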

13.4 Skewness

We conclude our look at descriptive statistics with one further quantity since the mean and variance, while very useful, do not provide complete information about a dataset.

Skewness of a distribution quantifies the departure from symmetry. By definition a symmetric distribution has zero skewness. Although various methods exist to quantify skewness, for this course we shall only be concerned with describing skewness qualitatively, that is whether the skewness is positive (to the right) or negative (to the left). This can be achieved by either comparing the mean and median, or visually by consulting a distribution plot of a dataset, such as a histogram.

If the mean is greater than the median, then this indicates a positively-skewed distribution (also referred to as ‘right-skewed’).

If the mean is less than the median, then this indicates a negatively-skewed distribution (also referred to as ‘left-skewed’).

If the mean equals the median, then this indicates a symmetric distribution.
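The mean-versus-median comparison can be sketched as follows, using the marks from Example 13.2 as test data. The function name `describe_skew` is hypothetical, and the comparison should be read as a rule of thumb: with real data the mean and median are rarely exactly equal, so a histogram or boxplot is a useful companion check.

```python
import statistics

def describe_skew(data):
    """Describe skewness qualitatively by comparing the mean and the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean > median:
        return "positively skewed"
    if mean < median:
        return "negatively skewed"
    return "symmetric"

marks = [88, 67, 64, 76, 86, 85, 82, 39, 75, 34,
         90, 63, 89, 89, 84, 81, 96, 100, 70, 96]
print(statistics.mean(marks), statistics.median(marks))  # 77.7 83.0
print(describe_skew(marks))                              # negatively skewed
```

Here the two low marks, 34 and 39, pull the mean below the median.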


In the case of skewed distributions, we have already said that the mean is sensitive to outliers and so the mean is ‘pulled’ in that direction leading to the above relationships between mean and median.

Graphically, skewness can be determined by identifying where the long ‘tail’ of the distribution lies. If the long tail is heading toward increasingly positive values on the horizontal axis (i.e. on the right-hand side), then this indicates a positively-skewed (right-skewed) distribution. Similarly, if heading toward increasingly negative values (i.e. on the left-hand side) then this indicates a negatively-skewed (left-skewed) distribution, as illustrated in Figure 13.1.

Figure 13.1: Different types of skewed distributions. (Panels: positively-skewed distribution; negatively-skewed distribution.)

Finally, a boxplot (sometimes called a box-and-whisker plot) is a graph that shows the 5-number summary as well as any outliers and extreme outliers. These are useful plots to display a dataset’s distribution. Unlike histograms, these explicitly depict the quartiles. From a boxplot it is easy to obtain the following: median, quartiles, IQR, range, skewness and outliers. An example of a (not-to-scale) boxplot can be seen in Figure 13.2.

Key features of the boxplot are:

Median = middle line of the ‘box’.

Q1 and Q3 are represented as the ends of the box.

‘Whiskers’ are drawn from the quartiles (Q1 and Q3) to the observations furthest from the median which are not more than 1.5 times the IQR beyond the quartiles (i.e. excluding outliers).

The whiskers are terminated by small lines.

Any points beyond the whiskers (i.e. outliers) are plotted individually.


Figure 13.2: An example of a boxplot (not to scale). Labels in the figure, from top to bottom:

Values more than 3 boxlengths from Q3 (extreme outlier)
Values more than 1.5 boxlengths from Q3 (outlier)
Largest observed value that is not an outlier
Q3
Q2 = Median (50% of cases have values within the box)
Q1
Smallest observed value that is not an outlier
Values more than 1.5 boxlengths from Q1 (outlier)
Values more than 3 boxlengths from Q1 (extreme outlier)

In the example in Figure 13.3, it can be seen that the median (Q2) is around 74, Q1 is about 63 and Q3 is approximately 77. The numerous outliers provide a useful indicator that this distribution is negatively-skewed as the long tail covers lower values of the variable. Note also that Q3 − Q2 < Q2 − Q1.

Activity 13.5 A group of cows was fed one of three experimental diets A, B and C. After two weeks, the gain or loss in weight was recorded in kilograms.

Weight gain  Diet    Weight gain  Diet    Weight gain  Diet
 15          A         5          B        35          C
−10          A        25          B        55          C
  0          A        15          B        30          C
 −5          A         0          B       −15          C
 10          A       −10          B        45          C
 20          A        30          B        35          C
−55          A       −10          B        35          C
  5          A        15          B        20          C
 15          A         5          B        35          C
 25          A         0          B        25          C

(a) Produce a boxplot of the data. (In your diagram there should be three boxplots, one representing each diet.)

(b) Interpret your diagram.


Figure 13.3: A boxplot showing a negatively-skewed distribution.

13.5 Summary

This unit has introduced some quantitative approaches to summarising data, known as descriptive statistics. We have distinguished measures of location, dispersion and skewness. Although descriptive statistics serve as a very basic form of statistical analysis, they nevertheless are extremely useful for capturing the main characteristics of a dataset. Therefore any statistical analysis of data should start with visualising the data (covered in Unit 12) and the calculation of descriptive statistics.

13.6 Key terms and concepts

5-number summary          (Arithmetic) mean
Boxplot                   Frequency table
Interquartile range       Mean absolute deviation
Measures of dispersion    Measures of location
Median                    Mode
Outliers                  Quartiles
Range                     Skewness
Standard deviation        Summation operator
Trimmed mean              Variance

Learning outcomes

At the end of this unit, you should be able to:

interpret and summarise raw data on social science variables numerically

calculate basic measures of location and dispersion

describe the skewness of a distribution and interpret boxplots

discuss the key terms and concepts introduced in this unit.


Exercises

Exercise 13.1

The tables below show the scores of two groups of students in a test question.

(a) Determine the value of x if the median of these marks is 2.5.

Mark         0   1   2   3   4
Frequency    3   x   8   5   10

(b) Determine the value of y if the mean of these marks is 2.5.

Mark         0   1   2   3   4
Frequency    3   y   8   5   10

Exercise 13.2

(a) For variables measured at which measurement level (nominal, ordinal, interval or ratio) is the arithmetic mean the most appropriate?

(b) Asked whether they agree with a proposed increase in university fees, the following counts were obtained from a group of 75 respondents:

1. Strongly disagree             30
2. Disagree                      15
3. Neither agree nor disagree    15
4. Agree                          5
5. Strongly agree                10
Total                            75

i. What are the median and mode of the responses? Using the numerical scores in the left-hand column (i.e. the scores 1 to 5), calculate the mean response. Briefly discuss whether the mean is appropriate for this type of data.

ii. Do these data indicate that there is widespread dissatisfaction with the proposed increase in fees? Justify your answer briefly.

Exercise 13.3

Display the data below using a boxplot and provide the 5-number summary.

3 2 4 8 7 19 2 5 3 4 10 12

Exercise 13.4

(a) Do you expect the income distribution of the UK population to be symmetric, positively skewed, or negatively skewed? Briefly explain your answer.

(b) Discuss the relative merits of the mean, the median and the mode for summarising the income distribution of the UK population.


Exercise 13.5

A service station sells both unleaded petrol and diesel. It has recorded the following frequency distribution for the number of gallons sold per car for the two fuels in a total sample of 1,000 vehicles.

Unleaded (gallons)    Frequency    Diesel (gallons)    Frequency
0–4.99                74           0–4.99              22
5–9.99                192          5–9.99              68
10–14.99              280          10–14.99            153
15–19.99              105          15–19.99            57
20–24.99              23           20–24.99            11
25–29.99              6            25–29.99            9
Total                 680          Total               320

(a) Estimate the mean for these grouped data for unleaded and diesel separately.

(b) Do drivers of unleaded vehicles, or of diesel vehicles, fill up with more fuel, on the whole? Give a possible reason for your answer.

(c) Suppose the service station expects to refuel 240 cars in a day, in the same proportions as given in the above table. Suppose unleaded costs $5.97 a gallon and diesel costs $6.24 a gallon. Estimate the total daily income from the sale of fuel.


Unit 14: Probability I
Introduction to probability theory

Overview

The world around us is an uncertain place. Will GDP growth next year be positive or negative? Which political party will win a general election? What will the weather be tomorrow? These are just a few examples. Yes, we know what could happen (e.g. positive growth, negative growth or no growth) but we do not know with certainty in advance what will happen. ‘Probability’ allows us to model uncertainty and in this unit we explore probability theory.

Aims

This unit introduces the concept of probability and its role in modelling uncertainty. Particular aims are:

to provide an insight into the concept of probability

to apply some common results from probability theory

to introduce statistical independence.

Background reading

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Probability’ Chapter 1.

14.1 Probability theory

Probability theory is used to determine how likely various events are. A few examples include the likelihood of:

a machine in a factory breaking down

a person chosen at random being left-handed

when rolling a die, the upper face showing a ‘6’

when tossing a coin, the upper face showing ‘tails’.

Although probability is an interesting and important subject in its own right, our main interest in probability arises due to its role in statistical inference. Inference is surrounded by uncertainty (for example, to what extent is a sample representative of the population?) and probability provides a sound theoretical basis for quantifying the uncertainty involved.

14.1.1 Assigning probabilities

We would like to be able to carry out calculations with probabilities, and to work out slightly harder probabilities from simpler ones. But first, we need to know how to work out those simpler probabilities. We describe three broad approaches. Fortunately these are consistent: there are no inherent contradictions between them. As we develop these, we shall find some general laws which are true for all probabilities.

14.1.2 The classical method

This method involves an experiment, i.e. a process that produces outcomes, and an event, which is an outcome of an experiment. The probability of an event, E, is the ratio of the number of items in the population containing the event, say f, to the total number of items in the population, say N. We write

$$P(E) = \frac{f}{N}, \tag{14.1}$$

and say ‘the probability of the event E is f/N’.

For example, a factory has 200 workers, of whom 70 are female. The experiment consists of randomly selecting an employee; the event consists of randomly selecting a female employee. In this case we have f = 70 and N = 200. We would write P(female) = 70/200 = 0.35.

If we toss a fair (i.e. unbiased) coin, the experiment is the toss of the coin, and the event is obtaining a tail, say. Now f = 1 and N = 2, so P(tail) = 1/2.
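The classical method in (14.1) amounts to a single division. The sketch below uses Python's `fractions.Fraction` to keep probabilities exact; the function name `classical_probability` is our own.

```python
from fractions import Fraction

def classical_probability(f, N):
    """P(E) = f / N, as in (14.1), kept as an exact fraction."""
    if not 0 <= f <= N:
        raise ValueError("need 0 <= f <= N")
    return Fraction(f, N)

p_female = classical_probability(70, 200)
p_tail = classical_probability(1, 2)
print(p_female, float(p_female), p_tail)  # 7/20 0.35 1/2
```

The check on f inside the function reflects the fact that the number of favourable items can never be negative or exceed N.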

Even at this stage we can draw some general conclusions. Clearly, f ≥ 0 and f ≤ N, i.e. 0 ≤ f ≤ N. It follows that

$$\frac{0}{N} \le \frac{f}{N} \le \frac{N}{N},$$

i.e.,

$$0 \le P(\cdot) \le 1, \tag{14.2}$$

where P (·) denotes the probability of some event. From this, we deduce that:

Any probability lies between 0 and 1.

If an event E is certain to occur, then P (E) = 1 (corresponding to f = N).

If an event E is certain not to occur, then P (E) = 0 (corresponding to f = 0).

It is also true that events which are equally likely to occur or not occur (for example, tossing a fair coin and getting ‘tails’) have probabilities of 0.5. Sometimes probabilities are converted to percentages, such as ‘there is a 70% chance of rain tomorrow’.


14.1.3 The relative frequency approach

This approach depends on historical data. It might be used if the classical method is difficult, or impossible, to apply. The probability of an event is the number of times the event has occurred divided by the number of opportunities for the event to have occurred.

Suppose all the eggs leaving a farm are examined. If 100 are checked and 6 are found to be cracked, we estimate the probability of a cracked egg to be 6/100 = 0.06. We cannot use the classical method. An argument such as ‘there could be 0, 1, 2, . . . , 100 bad eggs, so the probability of 6 bad eggs is 1/101’ would be neglecting the fact that there are more ways to obtain 6 (or 50, say) bad eggs than to obtain 0 (or 1, say) bad eggs. A complete listing of all outcomes contains many more than 101 outcomes.

The relative frequency approach can often be used when the classical method is unsuitable. For example, it could be used to assess the probability of a biased coin landing ‘tails’. If such a coin is tossed 500 times and there are 310 tails, then we estimate the probability of getting tails as 310/500 = 0.62.

It could also be used to estimate the probability of a randomly-selected adult being left-handed. Notice that for a good estimate, the denominator should be large. We would find a better approximation if we checked a thousand eggs (or a million).
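The following sketch computes the two relative frequencies above and then simulates a hypothetical biased coin to illustrate why a large denominator helps. The ‘true’ probability of 0.62 used in the simulation, the seed and the function names are all assumptions made purely for illustration.

```python
import random

def relative_frequency(occurrences, opportunities):
    """Estimate a probability as occurrences / opportunities."""
    return occurrences / opportunities

print(relative_frequency(6, 100))     # 0.06
print(relative_frequency(310, 500))   # 0.62

def simulate_tails(n_tosses, p_tails, seed=0):
    """Toss a hypothetical biased coin n_tosses times and return the
    relative frequency of tails."""
    rng = random.Random(seed)          # seeded so that runs are repeatable
    tails = sum(rng.random() < p_tails for _ in range(n_tosses))
    return relative_frequency(tails, n_tosses)

for n in (100, 10_000, 100_000):
    print(n, simulate_tails(n, p_tails=0.62))
```

The estimates from longer runs tend to sit closer to 0.62, although any single run can still be off by chance.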

14.1.4 Subjective probabilities

This is the method that would be used when neither of the two other approaches is viable. It could be used, for example, to estimate the probabilities that:

Party X wins the election

a student passes an examination

the defendant is guilty of the crime.

Sometimes it will be based on knowledge and experience; sometimes it will be little more than a blind guess.

Such subjective probabilities can, and should, be updated in light of new information. For the examples above new information might be:

that Party X elects a new leader

a student’s performance in a mock examination

a witness giving evidence to the court.

In order to extend our study of probability, we now develop some helpful terms and symbols.

14.2 Terminology

As we have seen, an experiment is a process that produces outcomes: for example,rolling a die or selecting 100 eggs from a farm and seeing how many are cracked.


An event is an outcome of an experiment: obtaining a ‘6’ with the die, obtaining an even number with the die, finding 99 good eggs followed by one cracked egg, finding a total of 99 good eggs.

A sample space is a complete listing of all possible events, called elementary events. The sample space for the roll of a single die is written {1, 2, 3, 4, 5, 6}.

Sometimes, when there are very many or infinitely many possible events, it is difficult or impossible to list all the elements of the sample space, although it can often be described in other ways. But if we can list all of the possible events, the sample space can help us find probabilities.

Example 14.1

Suppose an experiment involves rolling two dice. What is the probability that thesum of the scores is 7?

We can set out the sample space as follows:

(1, 1) (2, 1) (3, 1) (4, 1) (5, 1) (6, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2) (6, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5) (6, 5)
(1, 6) (2, 6) (3, 6) (4, 6) (5, 6) (6, 6)

We can see that there are six outcomes with a sum of 7, namely (6, 1), (5, 2), (4, 3), (3, 4), (2, 5) and (1, 6), out of 36 possible events.

Hence the probability that the sum is 7 is 6/36 = 1/6.
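The sample space in Example 14.1 is small enough to enumerate by computer. This sketch uses `itertools.product` to list all 36 ordered pairs and counts those with a sum of 7; the variable names are our own.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die)
sample_space = list(product(range(1, 7), repeat=2))
favourable = [outcome for outcome in sample_space if sum(outcome) == 7]

p_seven = Fraction(len(favourable), len(sample_space))
print(len(sample_space))  # 36
print(favourable)         # [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(p_seven)            # 1/6
```

Changing the condition `sum(outcome) == 7` gives the probability of any other total in the same way.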

14.3 Sets

Numbers or objects enclosed in braces can be thought of as sets. Sets themselves are simply collections of objects. For example,

The collection of all outcomes when a die is rolled is the set {1, 2, 3, 4, 5, 6}.

The collection of all positive integers is the set {1, 2, 3, . . .}.

The colours of the rainbow form the set {red, orange, yellow, green, blue, indigo, violet}.

Sets can be represented by Venn diagrams, like the one in Figure 14.1. The set A is shown as an oval; the rectangle is the sample space, or ‘universal set’. If A = {1, 2, 3, 4, 5, 6}, the universal set might be all positive integers; if A = {red, orange, yellow, green, blue, indigo, violet}, the universal set might be all possible colours.


Figure 14.1: A Venn diagram showing the set A in the universal set.

Union of sets

The union of two sets X and Y consists of those elements in X or Y or both, and is written

$$X \cup Y.$$

No value is listed more than once in the union. For example, if X = {1, 4, 7, 9} and Y = {2, 3, 4, 5, 6}, then X ∪ Y = {1, 2, 3, 4, 5, 6, 7, 9}. If X and Y are represented by two (overlapping) shaded ovals, the union is represented by the entire shaded region, as shown in Figure 14.2.

Figure 14.2: A Venn diagram showing the union of sets X and Y (the shaded region).

Intersection of sets

The intersection of two sets X and Y consists of those elements common to both X and Y, and is written

$$X \cap Y.$$

No value is listed more than once in the intersection. If X = {1, 4, 7, 9} and Y = {2, 3, 4, 5, 6}, then X ∩ Y = {4}. If X and Y are represented by two (overlapping) ovals, the intersection is where they intersect! Figure 14.3 shows an example. In general, the union is a larger set than the intersection.


Figure 14.3: A Venn diagram showing the intersection of sets X and Y (the shaded region).

We sometimes need to talk about a set with no elements. This is called the empty set and it is denoted ∅. For example,

{1, 3, 5, 7} ∩ {2, 4, 6, 8} = ∅.
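Python's built-in `set` type mirrors these operations, with `|` for union and `&` for intersection. The sketch below reproduces the examples from this section.

```python
X = {1, 4, 7, 9}
Y = {2, 3, 4, 5, 6}

print(sorted(X | Y))   # union: [1, 2, 3, 4, 5, 6, 7, 9]
print(X & Y)           # intersection: {4}

odds = {1, 3, 5, 7}
evens = {2, 4, 6, 8}
print(odds & evens)            # set(), Python's way of writing the empty set
print(odds.isdisjoint(evens))  # True: the two sets share no elements
```

As with sets on paper, a Python set never stores the same value twice, which is why 4 appears only once in the union.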

Two events are mutually exclusive if the existence of one precludes the other. The events ‘male’ and ‘female’ are mutually exclusive when we observe gender. The outcomes ‘cracked’ and ‘not cracked’ are mutually exclusive when we sample eggs.

Mutually exclusive events

In general, if X and Y are mutually exclusive, the event X ∩ Y is certain not to occur. Hence,

$$P(X \cap Y) = 0, \tag{14.3}$$

if X and Y are mutually exclusive. Or, equivalently, P(X ∩ Y) = 0 if X ∩ Y = ∅.

14.4 Independence

Two events are independent if the occurrence or non-occurrence of one does not affect the occurrence or non-occurrence of the other. For example:

Whether an individual is left-handed is independent of whether they have a credit card.

Whether Party X wins the election is independent of whether your car breaks down today.

Coin tosses are independent of each other; the event of getting ‘heads’ on the first toss is independent of getting ‘heads’ on the second toss.

However,

Whether Party X wins the election is not independent of who the party leader is.

Whether a die shows a ‘6’ is not independent of whether it shows an even number.


We let P(X|Y) denote the probability of X occurring given that Y has occurred; this is called a conditional probability. If X and Y are independent, then the probability of X occurring given that Y has occurred is just the probability of X occurring. That is, if X and Y are independent, then

$$P(X \mid Y) = P(X) \quad \text{and} \quad P(Y \mid X) = P(Y). \tag{14.4}$$

Hence, if X and Y are not independent (i.e. are dependent), then P(X|Y) ≠ P(X) and P(Y|X) ≠ P(Y).

For example, a person’s handedness is presumably independent of whether they prefer tea or coffee, so

P(prefers tea | person is right-handed) = P(prefers tea).
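For experiments with equally likely outcomes, (14.4) can be checked by enumeration. In the sketch below we assume that P(X|Y) can be found by restricting the sample space to the outcomes in which Y occurs and applying the classical method there; the helper names `prob` and `conditional` are our own.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """Classical probability of an event over equally likely outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def conditional(event_x, event_y, space):
    """P(X | Y): classical probability of X within the outcomes where Y occurs."""
    given_y = [o for o in space if event_y(o)]
    return prob(event_x, given_y)

# One die: showing a '6' is not independent of showing an even number
die = list(range(1, 7))
six = lambda o: o == 6
even = lambda o: o % 2 == 0
print(prob(six, die), conditional(six, even, die))   # 1/6 1/3

# Two coin tosses: 'heads' on the second toss is independent of the first
tosses = list(product("HT", repeat=2))
first_heads = lambda o: o[0] == "H"
second_heads = lambda o: o[1] == "H"
print(prob(second_heads, tosses),
      conditional(second_heads, first_heads, tosses))  # 1/2 1/2
```

For the die the conditional probability differs from the unconditional one, so the events are dependent; for the coin tosses the two agree, as (14.4) requires.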

14.5 Complementary events

The complement of event A is denoted $A^c$.∗ The complement of A contains all the elementary events that are not in A.

If, in rolling a die, event A is getting an even number, $A^c$ is the event getting an odd number.

If event A is getting a cracked egg from a sample, $A^c$ is finding a good (non-cracked) egg.

If A is the event that it rains tomorrow, $A^c$ is the event that there is no rain tomorrow.

If the occurrence of event A corresponds to one of f elementary events out of a total ofN elementary events, then Ac corresponds to N − f elementary events. Now,

N − fN

=N

N− f

N,

and we deduce the following important result.

If A is some event whose complement is A^c, then

P(A^c) = 1 − P(A). (14.5)

Complementary events

Example 14.2

If the probability of rain tomorrow is 0.6, then the probability of no rain is 1 − 0.6 = 0.4.

If the probability of selecting a female employee from a workforce is 0.35, the probability of selecting a male employee is 1 − 0.35 = 0.65.

∗Other accepted forms of notation for the complement of A include Ā and A′.


14.6 Additive laws

We now look at several ways of combining probabilities. Suppose that event A arises from f elementary events, event B arises from g elementary events, and A and B are mutually exclusive. Then the event A or B, A ∪ B, arises from f + g elementary events. Hence,

P(A ∪ B) = (f + g)/N = f/N + g/N = P(A) + P(B),

if there are N elementary events altogether. From this we deduce the important result that, if A and B are mutually exclusive,

P(A ∪ B) = P(A) + P(B). (14.6)

Example 14.3

We saw previously that the probability of obtaining a sum of 7 with two dice is 1/6.

The probability of obtaining a sum of 12 is 1/36, since only one event, (6, 6), out of 36 will achieve this.

Obviously, totals of 7 and 12 are mutually exclusive, so the probability of 7 or 12 is 1/6 + 1/36 = 7/36.
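The figures in Example 14.3 can be confirmed by listing all 36 outcomes for two dice. This is a minimal sketch assuming two fair dice; the names rolls and prob are illustrative choices, not notation from the guide.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event by counting favourable outcomes."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

p7 = prob(lambda r: sum(r) == 7)
p12 = prob(lambda r: sum(r) == 12)
p7_or_12 = prob(lambda r: sum(r) in (7, 12))

# For mutually exclusive events, counting the union directly
# agrees with adding the two probabilities, as in (14.6).
print(p7, p12, p7_or_12)  # 1/6 1/36 7/36
```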

Example 14.4

Suppose movies are classified at a store.

Let E1 and E2 be the events that the movie rented by the next customer is comedy and horror, respectively.

Suppose P (E1) = 0.26 and P (E2) = 0.18.

Using (14.6), P (E1 ∪ E2) = 0.26 + 0.18 = 0.44.

Hence the probability that it is neither comedy nor horror is 1− 0.44 = 0.56.

(Of course, we are assuming that there are no horror comedies!)

Let us look again at the Venn diagram for the union of two sets. We shall use the notation n(A ∪ B) for the number of elements in A ∪ B, and similarly for n(A), n(B) and n(A ∩ B). Now count n(A ∪ B). It is not n(A) + n(B) because the part in A ∩ B will have been counted twice. If we subtract n(A ∩ B) we will be right. So,

n(A ∪ B) = n(A) + n(B) − n(A ∩ B).

Hence,

n(A ∪ B)/N = n(A)/N + n(B)/N − n(A ∩ B)/N,

where there are N possible events altogether. But n(A ∪ B)/N = P(A ∪ B), n(A)/N = P(A), and so on. We deduce the general law of addition.


For any events A and B,

P (A ∪B) = P (A) + P (B)− P (A ∩B). (14.7)

General law of addition

(14.7) holds for any events A and B, regardless of whether or not they are mutually exclusive. (14.6) is a special case of (14.7) where A ∩ B = ∅.

Example 14.5

What is the probability of drawing a queen or a heart from a well-shuffled pack of 52 cards?

If the two events are denoted Q and H respectively, then P(Q) = 4/52 and P(H) = 13/52.

The answer is not 4/52 + 13/52 because they are not mutually exclusive: consider the Queen of Hearts!

But we can use (14.7), noting that P(Q ∩ H) = 1/52, to obtain

P(Q ∪ H) = P(Q) + P(H) − P(Q ∩ H)
= 4/52 + 13/52 − 1/52
= 16/52
= 4/13.
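Example 14.5 can also be verified by building the pack and counting. The sketch below assumes a standard 52-card pack; the rank and suit labels are arbitrary, and Fraction reduces results to lowest terms (so 4/52 prints as 1/13).

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs

def prob(event):
    """Probability of an event when each card is equally likely."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

p_q = prob(lambda c: c[0] == "Q")                          # P(Q)
p_h = prob(lambda c: c[1] == "hearts")                     # P(H)
p_both = prob(lambda c: c[0] == "Q" and c[1] == "hearts")  # P(Q ∩ H)
p_either = prob(lambda c: c[0] == "Q" or c[1] == "hearts") # P(Q ∪ H)

# The general law of addition (14.7) matches the direct count.
print(p_either, p_q + p_h - p_both)  # 4/13 4/13
```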

Example 14.6

If 16% of the population are left-handed, 30% are overweight, but only 25% of left-handers are overweight, what is the probability of a randomly-selected person being left-handed or overweight?

Let L and O be the events of the person being left-handed and overweight, respectively.

Then P(L) = 0.16 and P(O) = 0.3. Since 25% of the 16% who are left-handed are overweight, P(L ∩ O) = 0.25 × 0.16 = 0.04.

Then applying (14.7) gives

P(L ∪ O) = P(L) + P(O) − P(L ∩ O)
= 0.16 + 0.3 − 0.04
= 0.42.


14.7 Multiplicative laws

Consider the following:

The probability of rolling a ‘6’ with a (fair) die is 1/6.

The probability of tossing a head with a (fair) coin is 1/2.

The probability of getting a ‘6’ and a head is 1/12, because (6, H) is just one elementary outcome out of 12:

(1, H), (2, H), (3, H), (4, H), (5, H), (6, H),
(1, T), (2, T), (3, T), (4, T), (5, T), (6, T).

Notice that 1/6 × 1/2 = 1/12. We have been able to multiply the probabilities because the two events are independent.

More generally, if A and B are independent events, then

P (A ∩B) = P (A)P (B). (14.8)

Independent events

Example 14.7

Suppose 90% of UK adults drive a car and 60% have a broadband connection. What is the probability that a UK adult drives and has a broadband connection?

Well, it is tempting to use (14.8) to say that the probability is 0.9 × 0.6 = 0.54.

But this depends on the two events being independent.

If they are, the answer is correct; if not, we do not have enough information to solve the problem.

Activity 14.1 A chain is formed from n links. The strengths of the links are mutually independent, and the probability that any one link fails under a specified load is q. What is the probability that the chain fails under the load?

Recall that P(A|B) is the probability of event A given that event B has occurred. The event A ∩ B (A and B) will occur if A occurs and if B occurs, so once we know B has occurred with probability P(B) we can use P(A|B) to find the probability that A ∩ B occurs, i.e.

P (A ∩B) = P (B)P (A|B).

We could also argue that P (A ∩B) = P (A)P (B|A).


For any events A and B,

P (A ∩B) = P (A)P (B|A) = P (B)P (A|B). (14.9)

General law of multiplication

(14.8) is a special case of (14.9). If A and B are independent, then P(B|A) = P(B) and P(A|B) = P(A), hence the general law reduces to (14.8).

Example 14.8

Suppose a company has 140 employees, of which 30 are supervisors. 80 of the employees are married, and 20% of the married employees are supervisors. What is the probability that a random employee is a married supervisor?

We let M denote married, S denote supervisor and we require P(M ∩ S).

We know that P(M) = 80/140 = 4/7 and P(S|M) = 20/100 = 1/5.

So, applying (14.9), we obtain

P(M ∩ S) = P(M)P(S|M) = 4/7 × 1/5 = 4/35 ≈ 0.1143.

Do not confuse ‘mutually exclusive’ and ‘independent’! ‘Mutually exclusive’ means two events cannot occur simultaneously. ‘Independent’ means that the occurrence or non-occurrence of one event does not affect the occurrence or non-occurrence of the other. These are not the same thing at all!

Activity 14.2 Suppose A and B are events with P(A) = 0.2, P(B) = p and P(A ∪ B) = 0.6.

(a) Evaluate p and P (A|B) if A and B are mutually exclusive events.

(b) Evaluate p and P (A|B) if A and B are independent events.

14.8 Bayes’ theorem

Bayes’ theorem has far-reaching consequences for statistical inference. We can consider it in two ways:

As a tool for handling conditional probabilities, specifically the connection between P(A|B) and P(B|A).

As a means for modifying probabilities in light of new information.

We state the theorem in three forms, of increasing generality, illustrating both ideas. The first two will be justified, but not the third (the proof is beyond the scope of the course).


14.8.1 Version 1

For any events A and B, where P(A) ≠ 0,

P(B|A) = P(A|B)P(B) / P(A). (14.10)

Bayes’ theorem (version 1)

This is easily justified. We know that P(A ∩ B) = P(A)P(B|A) and also that P(A ∩ B) = P(B)P(A|B). Hence P(A)P(B|A) = P(B)P(A|B). Dividing both sides by P(A) gives the desired result, noting that P(A) ≠ 0.

Example 14.9

Suppose we know that 5% of companies in a certain sector go bankrupt, while 10% of companies in the sector are unable to meet demand. Of those that go bankrupt, 20% have been unable to meet demand. If a company is unable to meet demand, what is the probability of bankruptcy?

We can think of this as an exercise in conditional probability, or as modifying the probability of bankruptcy (5%) in light of the new information about unmet demand for their products.

Define the events: A is a company unable to meet demand and B is a company that goes bankrupt.

We know P(A) = 0.1, P(B) = 0.05 and P(A|B) = 0.2.

We require P(B|A), and this is, using (14.10),

P(B|A) = P(A|B)P(B) / P(A) = (0.2 × 0.05)/0.1 = 0.01/0.1 = 0.1.

So the unconditional probability (5%) has been changed to 10% in light of the new information about unmet demand.
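Version 1 of the theorem is a one-line calculation, so it translates directly into a small function. The sketch below is illustrative: the function name bayes_v1 and its argument names are hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def bayes_v1(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B)P(B) / P(A), as in (14.10)."""
    if p_a == 0:
        raise ValueError("P(A) must be non-zero")
    return p_a_given_b * p_b / p_a

# Example 14.9: A = unable to meet demand, B = goes bankrupt.
p_bankrupt_given_unmet = bayes_v1(
    p_a_given_b=Fraction(20, 100),  # P(A|B) = 0.2
    p_b=Fraction(5, 100),           # P(B) = 0.05
    p_a=Fraction(10, 100),          # P(A) = 0.1
)
print(p_bankrupt_given_unmet, float(p_bankrupt_given_unmet))  # 1/10 0.1
```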

14.8.2 Version 2

Let B be an event and B^c be its complement. If another event A occurs, then

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)]. (14.11)

Bayes’ theorem (version 2)

This follows from the first version, which has denominator P(A). Since either B or B^c must occur, we can deduce that P(A) = P(A ∩ B) + P(A ∩ B^c) (a Venn diagram may make this clear). Also, P(A ∩ B) = P(A|B)P(B) and P(A ∩ B^c) = P(A|B^c)P(B^c) by two applications of (14.9), so P(A) = P(A|B)P(B) + P(A|B^c)P(B^c), hence the result.

Example 14.10

Suppose a department store is considering new arrangements for credit. The credit manager has suggested that credit should be discontinued to any customer who has twice or more been late with repayments. She supports her claim by noting that past records show that 90% of those defaulting were late with their repayments at least twice.

Separate investigations show that 2% of customers actually default on their repayments, and 45% of those not defaulting have had at least two late payments. What is the probability that a customer with two or more late payments actually defaults? Comment on the manager’s proposals.

First translate the problem from ‘English’ to ‘mathematics’.

Let L be a credit customer who is late with repayments at least twice and D be a customer who defaults on payments.

We require P(D|L), so using (14.11) we express this as

P(D|L) = P(L|D)P(D) / [P(L|D)P(D) + P(L|D^c)P(D^c)].

We know that P(L|D) = 0.90, P(D) = 0.02, P(L|D^c) = 0.45 and P(D^c) = 1 − P(D) = 1 − 0.02 = 0.98.

(14.11) now gives us

P(D|L) = (0.90 × 0.02) / (0.90 × 0.02 + 0.45 × 0.98) = 0.0392 ≈ 0.04.

This is a surprising result. If the credit manager’s plan is adopted, only about 1 customer in 25 who loses credit would actually have defaulted. So we would lose 24 good credit customers to detect one defaulter. This is bad business! So we should reject the proposal.
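The arithmetic in Example 14.10 can be reproduced with a short function for version 2. This is a sketch under the example's stated figures; the name bayes_v2 and its parameters are hypothetical.

```python
from fractions import Fraction

def bayes_v2(p_a_given_b, p_b, p_a_given_not_b):
    """Return P(B|A) using (14.11); P(B^c) is found as 1 - P(B)."""
    p_not_b = 1 - p_b
    # Denominator: P(A) = P(A|B)P(B) + P(A|B^c)P(B^c).
    p_a = p_a_given_b * p_b + p_a_given_not_b * p_not_b
    return p_a_given_b * p_b / p_a

# Example 14.10: B = customer defaults (D), A = late at least twice (L).
p_default_given_late = bayes_v2(
    p_a_given_b=Fraction(90, 100),      # P(L|D)
    p_b=Fraction(2, 100),               # P(D)
    p_a_given_not_b=Fraction(45, 100),  # P(L|D^c)
)
print(p_default_given_late, round(float(p_default_given_late), 4))  # 2/51 0.0392
```

The exact value 2/51 is close to 1 in 25, which is the figure quoted in the discussion of the manager's proposal.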

14.8.3 Version 3

Suppose the events X1, X2, . . . , Xn are mutually exclusive and collectively exhaustive, that is, some Xi must occur but no two of the Xi can occur together. Then, for any i = 1, 2, . . . , n,

P(Xi|A) = P(A|Xi)P(Xi) / [P(A|X1)P(X1) + P(A|X2)P(X2) + . . . + P(A|Xn)P(Xn)]. (14.12)

Bayes’ theorem (version 3)


As previously mentioned, we omit the proof of this version. However, note that

P(A) = ∑_{i=1}^{n} P(A|Xi)P(Xi),

if the events X1, X2, . . . , Xn are mutually exclusive and collectively exhaustive.

Example 14.11

Machines A, B and C all produce the same two parts X and Y. Of all the parts produced, machine A produces 60%, machine B produces 30% and machine C produces 10%. In addition, 40% of parts made by A are part X; 50% of parts made by B are part X; 70% of parts made by C are part X. A part is randomly selected and is found to be an X part. With this knowledge, what are the revised probabilities that it came from machines A, B and C?

Let X be the event that we have randomly selected an X part.

We can usefully put calculations in a table:

Event               P(Ei)   P(X|Ei)   P(X|Ei)P(Ei)   P(Ei|X)
E1 (came from A)    0.6     0.4       0.24           0.24/0.46 = 0.52
E2 (came from B)    0.3     0.5       0.15           0.15/0.46 = 0.33
E3 (came from C)    0.1     0.7       0.07           0.07/0.46 = 0.15
                                      P(X) = 0.46

We have used (14.12) in the form

P(Ei|X) = P(X|Ei)P(Ei) / [P(X|E1)P(E1) + P(X|E2)P(E2) + P(X|E3)P(E3)],

and, using the knowledge that the part was an X part, we have been able to find revised probabilities of 0.52, 0.33 and 0.15 of the part having come from A, B and C, respectively, in place of the original probabilities of 0.6, 0.3 and 0.1.

The unmodified and modified probabilities are sometimes called prior and posterior probabilities respectively.
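The table in Example 14.11 follows a fixed recipe: multiply each prior by its conditional probability, add the products to get P(X), then divide each product by that total. A minimal sketch of that recipe follows; the function name posteriors and the dictionary layout are assumptions made for illustration.

```python
from fractions import Fraction

def posteriors(priors, likelihoods):
    """Apply (14.12): return ({event: P(Ei|X)}, P(X))."""
    joint = {e: likelihoods[e] * priors[e] for e in priors}  # P(X|Ei)P(Ei)
    total = sum(joint.values())                              # P(X)
    return {e: p / total for e, p in joint.items()}, total

priors = {"A": Fraction(6, 10), "B": Fraction(3, 10), "C": Fraction(1, 10)}
likelihoods = {"A": Fraction(4, 10), "B": Fraction(5, 10), "C": Fraction(7, 10)}

post, p_x = posteriors(priors, likelihoods)
print(float(p_x))  # 0.46
for machine, p in post.items():
    print(machine, round(float(p), 2))  # A 0.52, B 0.33, C 0.15
```

Because the events are mutually exclusive and collectively exhaustive, the posterior probabilities add up to 1, which is a useful check on any such table.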

14.9 Summary — a listing of probability results

We conclude with a summary of the key probability results presented in this unit.

1. 0 ≤ P (A) ≤ 1, for any event A.

2. When combining events ∪ means ‘or’ and ∩ means ‘and’.

3. P (A ∩B) = 0 if A and B are mutually exclusive events.

4. P(A^c) = 1 − P(A) where A^c is the complement of event A.


5. P (A ∪B) = P (A) + P (B) for mutually exclusive events A and B.

6. P (A ∪B) = P (A) + P (B)− P (A ∩B) for any events A and B.

7. If X and Y are independent, then P (X|Y ) = P (X) and P (Y |X) = P (Y ).

8. P (A ∩B) = P (A)P (B) if A and B are independent.

9. P (A ∩B) = P (A)P (B|A) = P (B)P (A|B) for any events A and B.

10. Bayes’ theorem, in three versions:

Version 1: For any events A and B, where P(A) ≠ 0,

P(B|A) = P(A|B)P(B) / P(A).

Version 2: For events A and B,

P(B|A) = P(A|B)P(B) / [P(A|B)P(B) + P(A|B^c)P(B^c)].

Version 3: For mutually exclusive and collectively exhaustive events X1, X2, . . . , Xn, for any i = 1, 2, . . . , n,

P(Xi|A) = P(A|Xi)P(Xi) / [P(A|X1)P(X1) + P(A|X2)P(X2) + . . . + P(A|Xn)P(Xn)].

Example 14.12

Five bonds are rated A+, A, B+, B or C, depending on the stability of the issuing firm. An inexperienced bond buyer selects two bonds at random from five. (a) What is the probability that she does not buy any rated C? (b) What is the probability that she buys only A+ and A?

(a) • Let X be the event that the first bond is not C, and Y be the event that the second is not C.

• We require P(X ∩ Y) = P(X)P(Y|X), and we know that P(X) = 4/5.

• To find P(Y|X), we note that if the first is not a C, there are four remaining: one is C, the others are not C.

• So P(Y|X) = 3/4, hence

P(X ∩ Y) = P(X)P(Y|X) = 4/5 × 3/4 = 3/5.

(b) • The investor only buys A and A+ if the first one is A and the second A+, or the first is A+ and the second is A.

• The first of these probabilities is 1/5 × 1/4, by a similar argument to the first part. The second is also 1/5 × 1/4.

• So the probability of the investor obtaining these two rated bonds is

1/20 + 1/20 = 1/10.
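Both answers in Example 14.12 can be cross-checked by a different route from the one used above: instead of multiplying conditional probabilities, list every possible pair of bonds and count. The sketch assumes each of the 10 unordered pairs is equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

bonds = ["A+", "A", "B+", "B", "C"]
pairs = list(combinations(bonds, 2))  # 10 equally likely unordered pairs

# (a) Pairs that contain no bond rated C.
p_no_c = Fraction(sum(1 for p in pairs if "C" not in p), len(pairs))

# (b) The single pair made up of A+ and A.
p_a_plus_and_a = Fraction(sum(1 for p in pairs if set(p) == {"A+", "A"}), len(pairs))

print(p_no_c, p_a_plus_and_a)  # 3/5 1/10
```

Counting unordered pairs removes the need to treat ‘A then A+’ and ‘A+ then A’ separately, which is why part (b) gives 1/10 in a single step.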


Example 14.13

A recent survey of 1700 companies showed that 49% performed studies of marketing effectiveness, 61% conducted short-term sales forecasts, and 38% undertook both activities. Define events A and B by:

A: the firm studies marketing effectiveness

B: the firm produces short-term sales forecasts.

Find P (A ∪B) and P (A|B).

Notice that P (A) = 0.49, P (B) = 0.61 and P (A ∩B) = 0.38 directly.

So,

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.49 + 0.61 − 0.38 = 0.72

P(A|B) = P(A ∩ B)/P(B) = 0.38/0.61 ≈ 0.62.

If we wanted to estimate how many of the 1700 firms undertook both A and B, we would say 0.38 × 1700 = 646, or about 650 firms.

Example 14.14

The table below gives the marital status of adults in a country by sex in terms of proportions of the total population.

          Single   Married   Widowed   Divorced   Total
Male      0.116    0.319     0.012     0.028      0.475
Female    0.093    0.325     0.066     0.041      0.525
Total     0.209    0.644     0.078     0.069      1.000

We can make the following observations, comments and deductions.

These are obviously proportions of the whole population.

Many more widowed women than widowed men.

More women than men overall.

More women than men in the ratio 0.525:0.475 = 21:19.

Most people of both sexes are married.

P(Male) = 0.475; i.e. 47.5% of the population are male.

P(Male ∩ Divorced) = 0.028; i.e. 2.8% of adults are divorced men.

P(Divorced | Male) = P(Male ∩ Divorced)/P(Male) = 0.028/0.475 = 0.059; i.e. 5.9% of adult males are divorced.


P(Male | Divorced) = P(Male ∩ Divorced)/P(Divorced) = 0.028/0.069 = 0.406; i.e. 40.6% of divorced adults are male.

If we also knew that the total adult population is 13.6 million, we could convert the proportions to absolute numbers. The resulting table is known as a contingency table.

          Single      Married     Widowed     Divorced   Total
Male      1,577,600   4,338,400   163,200     380,800    6,460,000
Female    1,264,800   4,420,000   897,600     557,600    7,140,000
Total     2,842,400   8,758,400   1,060,800   938,400    13,600,000
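The conditional probabilities read off the table of proportions in Example 14.14 all follow one pattern: divide a cell by its row total or its column total. The sketch below stores the proportions in a nested dictionary; the layout and function names are assumptions for illustration only.

```python
# Joint proportions from Example 14.14: table[sex][status].
table = {
    "Male":   {"Single": 0.116, "Married": 0.319, "Widowed": 0.012, "Divorced": 0.028},
    "Female": {"Single": 0.093, "Married": 0.325, "Widowed": 0.066, "Divorced": 0.041},
}

def p_sex(sex):
    """Row total, e.g. P(Male)."""
    return sum(table[sex].values())

def p_status(status):
    """Column total, e.g. P(Divorced)."""
    return sum(row[status] for row in table.values())

def p_status_given_sex(status, sex):
    """P(status | sex) = cell / row total."""
    return table[sex][status] / p_sex(sex)

def p_sex_given_status(sex, status):
    """P(sex | status) = cell / column total."""
    return table[sex][status] / p_status(status)

print(round(p_status_given_sex("Divorced", "Male"), 3))  # 0.059
print(round(p_sex_given_status("Male", "Divorced"), 3))  # 0.406
```

The same cell, 0.028, gives two quite different answers depending on which total it is divided by, which is why P(Divorced | Male) and P(Male | Divorced) must not be confused.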

Example 14.15

Two events A and B are independent with P(A) = 0.3 and P(B) = 0.1. (a) Are A and B mutually exclusive? Give a reason. (b) Find P(A|B) and P(B|A). (c) Find P(A ∪ B) and P(A^c ∩ B^c).

(a) P(A ∩ B) = P(A)P(B) given the independence of A and B, so P(A ∩ B) = 0.3 × 0.1 ≠ 0. Thus the event A ∩ B can occur, i.e. A and B are not mutually exclusive.

(b) Using independence, P(A|B) = P(A) and P(B|A) = P(B). So P(A|B) = 0.3 and P(B|A) = 0.1.

(c) We have

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.3 + 0.1 − (0.3 × 0.1) = 0.3 + 0.1 − 0.03 = 0.37.

Finally, look at the Venn diagram in Figure 14.2. If one circle corresponds to A and the other to B, the white area represents both A^c ∩ B^c and (A ∪ B)^c. These are the same set, i.e. A^c ∩ B^c = (A ∪ B)^c. Hence,

P(A^c ∩ B^c) = P((A ∪ B)^c) = 1 − P(A ∪ B) = 0.63.

14.10 Summary

This unit has introduced the fundamentals of probability theory and we have seen some important probability results, including conditional probability and independence. A good grounding in probability theory is necessary before moving on to probability distributions in the next unit.

14.11 Key terms and concepts

Bayes’ theorem              Complement
Conditional probability     Event
Experiment                  Independent
Intersection                Law of addition
Law of multiplication       Mutually exclusive
Outcomes                    Probability theory
Sample space                Sets
Union                       Venn diagrams

Learning outcomes

At the end of this unit, you should be able to:

apply the ideas and notation used for sets in simple examples

recall some common probability results

explain the ideas of conditional probability and independence

discuss the key terms and concepts introduced in this unit.

Exercises

Exercise 14.1

Let K be the event of drawing a ‘king’ from a well-shuffled pack of cards. Let D be the event of drawing a ‘diamond’ from the pack. Evaluate:

(a) P(K)

(b) P(D)

(c) P(K^c)

(d) P(K ∩ D)

(e) P(K ∪ D)

(f) P(K|D)

(g) P(D|K)

(h) P(D ∪ K^c)

(i) P(D^c ∩ K)

(j) P(D^c ∩ K | D ∪ K).

Are the events D and K independent, mutually exclusive, neither or both?

Exercise 14.2

If A and B are independent events such that P(A) = 0.2 and P(B) = 0.6, what is P(A^c ∩ B^c)?


Exercise 14.3

A and B are two mutually exclusive events. State what this means:

(a) in words

(b) using set notation.

Exercise 14.4

A student has an important job interview in the morning. To ensure he wakes up in time, he sets two alarm clocks which ring with probabilities 0.97 and 0.99 respectively. What is the probability that at least one of the alarm clocks will wake him up?

Exercise 14.5

20% of men show early signs of losing their hair. 2% of men carry a gene that is related to hair loss. 80% of men who carry the gene experience early hair loss.

(a) What is the probability that a man carries the gene and experiences early hair loss?

(b) What is the probability that a man carries the gene, given that he experiences early hair loss?

Exercise 14.6

Tower Construction Company is determining whether it should submit a bid for a new shopping centre. In the past, Tower’s main competitor, Skyrise Construction Company, has submitted bids 80% of the time. If Skyrise does not bid on a job, the probability that Tower will get the job is 0.6. If Skyrise does submit a bid, the probability that Tower gets the job is 0.35.

(a) What is the probability that Tower will get the job?

(b) If Tower gets the job, what is the probability that Skyrise made a bid?

(c) If Tower did not get the job, what is the probability that Skyrise did not make a bid?

Exercise 14.7

In a large lecture, 60% of the students are female and 40% are male. Records show that 15% of the female students and 20% of the male students are registered as part-time students.

(a) If a student is chosen at random from the lecture, what is the probability that the student studies part-time?

(b) If a randomly chosen student studies part-time, what is the probability the student is male?


Exercise 14.8

James is a salesman for a company and sells two products, X and Y. He visits three different customers each day. For each customer, the probability that James sells product X is 1/3 and the probability is 1/4 that he sells Y. The sale of product X is independent of the sale of product Y during any visit, and the results of the three visits are mutually independent.

Calculate the probability that James will:

(a) sell both products, X and Y, on the first visit

(b) sell only one product during the first visit

(c) make no sales of product X during the day

(d) make at least one sale of product Y during the day.

Exercise 14.9

Given two events, A and B, state why each of the following is not possible. Use formulae or equations to illustrate your answer.

(a) P(A) = −0.46

(b) P(A) = 0.26 and P(A^c) = 0.62

(c) P(A ∩ B) = 0.92 and P(A ∪ B) = 0.42

(d) P(A ∩ B) = P(A)P(B) and P(B) > P(B|A).

Exercise 14.10

At a local school, 90% of the students took test A, and 15% of the students took both test A and test B. Based on the information provided, which of the following calculations are not possible, and why? What can you say based on the data?

(a) P(B|A)

(b) P(A|B)

(c) P(A ∪ B).

If you knew that everyone who took test B also took test A, how would that change your answers?

Exercise 14.11

A company is concerned about interruptions to e-mail. It was noticed that problems occurred on 15% of workdays. To see how bad the situation is, calculate the probabilities of an interruption during a five-day working week:

(a) on Monday and again on Tuesday

(b) for the first time on Thursday


(c) every day

(d) at least once during the week.

Exercise 14.12

A restaurant manager classifies customers as well-dressed, casually dressed or poorly dressed and finds that 50%, 40% and 10% respectively fall into these categories. The manager found that wine was ordered by 70% of the well-dressed, by 50% of the casually dressed and by 30% of the poorly dressed.

(a) What is the probability that a randomly chosen customer orders wine?

(b) If wine is ordered, what is the probability that the person ordering is well-dressed?

(c) If wine is not ordered, what is the probability that the person ordering is poorly dressed?


15. Probability II — Probability distributions

Unit 15: Probability II
Probability distributions

Overview

The previous unit introduced probability as a means for modelling uncertainty. We now consider the probabilities attached to all possible outcomes of a chance experiment, that is, how probability is distributed across the sample space. Just as we used descriptive statistics to summarise important features of sample datasets, here we learn how to calculate equivalent features of population probability distributions.

Aims

This unit explores probability distributions and how to calculate the expected value and variance for discrete random variables. Particular aims are:

to introduce some common discrete probability distributions

to explore properties of such distributions such as the expected value and variance.

Background reading

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Probability’ Chapter 2.

15.1 Random variables

A random variable is a variable that contains the outcomes of a chance experiment. Alternatively, a random variable is a description of the possible outcomes of an experiment together with the probabilities of each occurring. These, and other possible definitions, are somewhat abstract, so we illustrate with some examples.

Example 15.1

Examine the outcomes when two dice are rolled; we consider the random variable X that is the sum of the shown scores. We can read off the various possibilities from the sample space:

(1, 1) (2, 1) (3, 1) (4, 1) (5, 1) (6, 1)
(1, 2) (2, 2) (3, 2) (4, 2) (5, 2) (6, 2)
(1, 3) (2, 3) (3, 3) (4, 3) (5, 3) (6, 3)
(1, 4) (2, 4) (3, 4) (4, 4) (5, 4) (6, 4)
(1, 5) (2, 5) (3, 5) (4, 5) (5, 5) (6, 5)
(1, 6) (2, 6) (3, 6) (4, 6) (5, 6) (6, 6)


We could write the possible outcomes by P(X = 2) = 1/36, P(X = 3) = 2/36, and so on, or, more succinctly, in a table depicting the probability distribution of the random variable X as follows.

X = x         2      3      4      5      6      7      8      9      10     11     12
Probability   1/36   2/36   3/36   4/36   5/36   6/36   5/36   4/36   3/36   2/36   1/36
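The probability distribution in Example 15.1 comes from counting how many of the 36 outcomes give each total. The sketch below reproduces that count, assuming two fair dice; the names counts and dist are illustrative, and Fraction prints probabilities in lowest terms (so 6/36 appears as 1/6).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution of X, the sum of the two scores.
dist = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x, p in dist.items():
    print(x, p)

# The probabilities of all possible values add up to 1.
print(sum(dist.values()))  # 1
```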

Example 15.2

Here, we describe the sample space when two fair coins are tossed, and the associated random variable, X, which counts the number of tails showing. The sample space is

{HH, HT, TH, TT},

so X takes the form:

Number of tails   0     1     2
Probability       1/4   1/2   1/4

So, X is a random variable taking values 0, 1 and 2 with probabilities 1/4, 1/2 and 1/4, respectively.

15.2 Discrete random variables

Since different types of random variables will be analysed differently, it is necessary to distinguish discrete and continuous ones. A random variable is discrete if its set of possible values consists of isolated points on the number line. (Often, but not necessarily, these will be non-negative integers.) The number of such points may be finite or infinite.

The two examples we have seen above (the dice and the coins) are both discrete (and finite). Notice that the sum of all the probabilities must be 1, since one of the possible outcomes must occur. True for the dice, true for the coins, true in general!

Other examples of discrete random variables include:

The number of defective items in a batch of 100 items.

The number of US voters from a sample of 1,000 who voted for Obama in 2012.

The number of people arriving at a store in a five-minute period.

The number of crimes recorded daily at a police station.

The first two of these are definitely finite. The others are finite in practice, but there is no theoretical upper limit, so it may be convenient to specify an infinite number of outcomes.

We can describe a discrete distribution (i.e. the random variable and the associated probabilities) with a histogram or bar chart (rarely), a table as for the dice and coins above (sometimes) or with a rule or formula (most common).


A useful discrete distribution that we shall discuss later is the Binomial distribution.

15.3 Continuous random variables

A random variable is continuous if it takes on values at every point over a (continuous) interval. Loosely, things are measured rather than counted. Examples include:

The volume of petrol in a storage tank.

The time between customer arrivals at a counter.

The heights of a group of individuals.

The noise in decibels at a night club.

Such random variables clearly have infinitely many possible values.

A continuous distribution would usually be described by a formula (statisticians use the term ‘probability density function’); we might sometimes be able to use a graph. A complete study of continuous distributions requires a knowledge of calculus. We will not consider continuous probability distributions in this course, except for the Normal distribution which will be covered in depth later on.

15.4 Mathematical expectation

We now concentrate on discrete random variables. Recall, from Example 15.2, the distribution of the number of tails, X, when two coins are tossed:

Number of tails   0     1     2
Probability       1/4   1/2   1/4

Suppose the experiment is repeated a large number of times, say n = 4,000,000. We would expect approximately 1 million 0’s, 2 million 1’s and 1 million 2’s. So the mean value of X would be

(Sum of measurements)/n
= [(0 × 1,000,000) + (1 × 2,000,000) + (2 × 1,000,000)] / 4,000,000
= (0 × 1,000,000)/4,000,000 + (1 × 2,000,000)/4,000,000 + (2 × 1,000,000)/4,000,000
= 0 × 1/4 + 1 × 1/2 + 2 × 1/4
= 1.


The first term is 0 × P(X = 0), the second term is 1 × P(X = 1) and the third term is 2 × P(X = 2). Therefore, the mean value of X is

∑ x · P(X = x).

Of course, this is no accident.

We define the mean, or expected value, or ‘expectation’ of a discrete random variable X to be

E(X) = ∑ x · P(X = x), (15.1)

where we sum over all the values x which are taken by the random variable X.

Expectation of a random variable

We often write

E(X) = µ, (15.2)

the same symbol that is used for the (population) arithmetic mean. We can think of E(X) as the long-run average when the experiment is carried out a large number of times.
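The long-run average interpretation can be illustrated by simulation. The following is a minimal sketch in Python, assuming two fair coins as in Example 15.2; the number of repetitions and the fixed seed are arbitrary choices made only for this illustration.

```python
import random

random.seed(0)  # fixed seed so that the run is repeatable

n = 100_000  # number of repetitions of the two-coin experiment
total_tails = 0
for _ in range(n):
    # Toss two fair coins; each comparison is True (counted as 1) for a tail.
    total_tails += (random.random() < 0.5) + (random.random() < 0.5)

# The sample mean of the number of tails should be close to E(X) = 1.
print(total_tails / n)
```

With more repetitions the printed average tends to settle ever closer to 1, which is the sense in which E(X) is a long-run average.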

Example 15.3

Suppose I buy a lottery ticket for £1. I can win £500 with probability 0.001 or £100 with probability 0.003. What is my expected profit?

We begin by defining the random variable X to be my profit. Its distribution is:

X = x       −£1     £499    £99
P(X = x)    0.996   0.001   0.003

Using the method to calculate expectations, we get

E(X) = (−1× 0.996) + (499× 0.001) + (99× 0.003) = −0.2.

So I expect to make a loss of £0.20 (which will go to funding the prize money or, possibly, charity).
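The calculation in Example 15.3 can be reproduced in a few lines of Python. This is only an illustrative sketch: the helper name `expectation` and the dictionary layout are our own choices, not part of the guide.

```python
def expectation(dist):
    """Return E(X) for a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

# Profit (in pounds) from the lottery ticket in Example 15.3.
profit = {-1: 0.996, 499: 0.001, 99: 0.003}

expected_profit = expectation(profit)
print(round(expected_profit, 2))
```

The printed value should agree with the expected loss of £0.20 found above, up to floating-point rounding.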

15.5 Functions of a random variable

We have seen that a random variable may be specified by the set of values it takes together with the associated probabilities of each.

For example, an unbiased die is rolled. X denotes the score obtained. We can represent the outcomes using the table

X = x         1     2     3     4     5     6
Probability   1/6   1/6   1/6   1/6   1/6   1/6


What can we say about random variables such as X₁ = 1/X or X₂ = X², or the random variable X₃ defined as follows?

\[
X_3 = \begin{cases}
0 & \text{if } x = 1, 2, 3 \\
1 & \text{if } x = 4, 5 \\
2 & \text{if } x = 6
\end{cases}
\]

These take the values derived from the function given; the associated probabilities are those of X. Therefore, from the distribution of X we can derive the distribution of X₁ = 1/X:

X₁            1     1/2   1/3   1/4   1/5   1/6
Probability   1/6   1/6   1/6   1/6   1/6   1/6

Similarly, for X₂ = X² we obtain:

X₂            1     4     9     16    25    36
Probability   1/6   1/6   1/6   1/6   1/6   1/6

And, finally, for X₃ (as previously defined):

X₃            0           1           2
Probability   3/6 = 1/2   2/6 = 1/3   1/6

And just as we defined

\[ E(X) = \sum x \cdot P(X = x), \]

we can define the expectation of a function of a random variable.

For a discrete random variable X,

\[ E(g(X)) = \sum g(x) \cdot P(X = x), \tag{15.3} \]

where g(X) is the function of X being considered, gives us the expectation of this function of X.

Expectation of a function of a random variable


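So,

\[
\begin{aligned}
E(X_1) = E\left(\frac{1}{X}\right)
&= 1 \times \frac{1}{6} + \frac{1}{2} \times \frac{1}{6} + \frac{1}{3} \times \frac{1}{6} + \frac{1}{4} \times \frac{1}{6} + \frac{1}{5} \times \frac{1}{6} + \frac{1}{6} \times \frac{1}{6} \\
&= \frac{49}{120}.
\end{aligned}
\]

\[
\begin{aligned}
E(X_2) = E(X^2)
&= 1 \times \frac{1}{6} + 4 \times \frac{1}{6} + 9 \times \frac{1}{6} + 16 \times \frac{1}{6} + 25 \times \frac{1}{6} + 36 \times \frac{1}{6} \\
&= \frac{91}{6}.
\end{aligned}
\]

\[
\begin{aligned}
E(X_3) &= 0 \times \frac{1}{2} + 1 \times \frac{1}{3} + 2 \times \frac{1}{6} \\
&= \frac{2}{3}.
\end{aligned}
\]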
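The three expectations for the die can be checked with exact fractions. The sketch below assumes Python's standard `fractions` module; the names `expectation_of` and `x3` are hypothetical helpers introduced only for this illustration of formula (15.3).

```python
from fractions import Fraction

def expectation_of(g, dist):
    """Return E(g(X)), the sum of g(x) * P(X = x) over the values of X."""
    return sum(g(x) * p for x, p in dist.items())

# Fair die: each score 1, 2, ..., 6 has probability 1/6.
die = {x: Fraction(1, 6) for x in range(1, 7)}

def x3(x):
    """The three-valued function X3 of the die score defined in the text."""
    if x <= 3:
        return 0
    if x <= 5:
        return 1
    return 2

e_x1 = expectation_of(lambda x: Fraction(1, x), die)  # E(1/X)
e_x2 = expectation_of(lambda x: x ** 2, die)          # E(X^2)
e_x3 = expectation_of(x3, die)                        # E(X3)
print(e_x1, e_x2, e_x3)
```

Working with `Fraction` rather than floating-point numbers keeps the results in the same form as the hand calculation, namely 49/120, 91/6 and 2/3.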

Activity 15.1 Determine the mean of the following discrete probability distribution:

X           1     2     3     4     5
P(X = x)    0.1   0.2   0.3   0.3   0.1

Find E(2X + 1), E(X³) and E(1/X) for this distribution. Is E(2X + 1) = 2E(X) + 1? Is E(X³) = (E(X))³? Is E(1/X) = 1/E(X)?

An immediate application is in the calculation of the variance of a random variable.

15.6 Variance

Just as we needed the idea of dispersion (or spread) to describe a set of data, so we need to define the variance of a random variable. The definition is similar. If X is a random variable, we define the variance by

\[ \mathrm{Var}(X) = \sum (x - \mu)^2 \cdot P(X = x). \tag{15.4} \]

Recall that E(X) = µ, and the summation is taken over all the values x which are taken by the random variable X. We often write

\[ \mathrm{Var}(X) = \sigma^2. \tag{15.5} \]

And just as we could find an ordinary variance in two ways, we can rewrite this as

\[ \mathrm{Var}(X) = \sum x^2 \cdot P(X = x) - \mu^2. \tag{15.6} \]


So there are two equivalent versions we can use. The latter is often easier in practice. The square root of the variance is the standard deviation. We could write (15.6) more succinctly as follows.

The variance of a random variable X is

\[ \mathrm{Var}(X) = E(X^2) - (E(X))^2. \tag{15.7} \]

Variance of a random variable

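For the two coins in Example 15.2, we saw that µ = 1. The variance is therefore

\[
\begin{aligned}
\sigma^2 &= (0 - 1)^2 \times \frac{1}{4} + (1 - 1)^2 \times \frac{1}{2} + (2 - 1)^2 \times \frac{1}{4} \\
&= \frac{1}{4} + 0 + \frac{1}{4} \\
&= \frac{1}{2} \quad \text{(first method)},
\end{aligned}
\]

or,

\[
\begin{aligned}
\sigma^2 &= \left(0^2 \times \frac{1}{4} + 1^2 \times \frac{1}{2} + 2^2 \times \frac{1}{4}\right) - 1^2 \\
&= \left(0 + \frac{1}{2} + 1\right) - 1 \\
&= \frac{1}{2} \quad \text{(second method)},
\end{aligned}
\]

giving a standard deviation of 1/√2.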
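The agreement between the two methods can also be confirmed numerically. The following sketch uses exact fractions for the two-coin distribution; the function names are our own and simply mirror (15.4) and (15.7).

```python
from fractions import Fraction

def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance_definition(dist):
    """Var(X) as the sum of (x - mu)^2 * P(X = x), as in (15.4)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def variance_shortcut(dist):
    """Var(X) as E(X^2) - (E(X))^2, as in (15.7)."""
    return sum(x ** 2 * p for x, p in dist.items()) - mean(dist) ** 2

# Number of tails when two fair coins are tossed.
tails = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(variance_definition(tails), variance_shortcut(tails))
```

Both functions return 1/2 here, matching the first and second methods above.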

Example 15.4

Suppose, in an attempt to schedule fire crews efficiently, the supervisor of a fire station has recorded the probability distribution for the number of emergency calls, Y, in a given day, based on historical data.

Y = y       0      1      2      3      4
P(Y = y)    0.25   0.30   0.25   0.15   0.05

(a) What are the expectation and variance of Y?

(b) What is the probability that, on any given day, the number of calls exceeds

i. µ + 2σ

ii. µ + 3σ?

The solutions are as follows.

(a)

E(Y ) = (0× 0.25) + (1× 0.30) + (2× 0.25) + (3× 0.15) + (4× 0.05) = 1.45.


\[
\begin{aligned}
\mathrm{Var}(Y) &= (0 - 1.45)^2 \times 0.25 + (1 - 1.45)^2 \times 0.30 + (2 - 1.45)^2 \times 0.25 \\
&\quad + (3 - 1.45)^2 \times 0.15 + (4 - 1.45)^2 \times 0.05 \\
&= 1.3475.
\end{aligned}
\]

(b) We have that σ = √Var(Y) = √1.3475 ≈ 1.16.

i. P(Y > µ + 2σ) is

P(Y > 1.45 + 2 × 1.16) = P(Y > 1.45 + 2.32) = P(Y > 3.77) = P(Y = 4) = 0.05.

ii. P(Y > µ + 3σ) is

P(Y > 1.45 + 3 × 1.16) = P(Y > 1.45 + 3.48) = P(Y > 4.93) = 0.
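The solution to Example 15.4 can be checked with a short Python sketch. The variable names and the helper `prob_exceeds` are assumptions made for this illustration only.

```python
from math import sqrt

# Distribution of the number of emergency calls per day (Example 15.4).
calls = {0: 0.25, 1: 0.30, 2: 0.25, 3: 0.15, 4: 0.05}

mu = sum(y * p for y, p in calls.items())
var = sum((y - mu) ** 2 * p for y, p in calls.items())
sigma = sqrt(var)

def prob_exceeds(threshold):
    """Return P(Y > threshold) by adding the probabilities of larger values."""
    return sum(p for y, p in calls.items() if y > threshold)

print(round(mu, 2), round(var, 4), round(sigma, 2))
print(prob_exceeds(mu + 2 * sigma), prob_exceeds(mu + 3 * sigma))
```

Because Y takes only the values 0 to 4, any threshold of 4 or more gives a probability of zero, which is why the second tail probability vanishes.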

It is important to distinguish between a frequency distribution and a probability distribution. The former uses data: it counts the number of observations satisfying some criterion, or taking some value, in the dataset. The latter is based on theory, or some assumed property: it gives the probability of an observation satisfying a criterion or taking some value. The two are related but are not the same.

Activity 15.2 Find the variance and standard deviation of the discrete probability distribution in Activity 15.1.

We now explore some specific discrete probability distributions.

15.7 Discrete uniform distribution

One of the simplest distributions is the discrete uniform distribution, where a discrete random variable X has this distribution if X takes the values 1, 2, 3, . . . , k, each with probability 1/k. For example, for k = 7, we can describe it by

X = x       1     2     3     4     5     6     7
P(X = x)    1/7   1/7   1/7   1/7   1/7   1/7   1/7

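What are the mean and variance in the general case? Well, the mean is

\[
\begin{aligned}
E(X) &= \sum x \cdot P(X = x) \\
&= \sum_{x=1}^{k} x \cdot \frac{1}{k} \\
&= 1 \times \frac{1}{k} + 2 \times \frac{1}{k} + \dots + k \times \frac{1}{k} \\
&= \frac{1}{k}(1 + 2 + \dots + k).
\end{aligned}
\]

A useful result from mathematics is that

\[ 1 + 2 + \dots + k = \frac{k(k + 1)}{2}. \tag{15.8} \]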


Refer to Activity 10.1 where this is derived. So,

\[ E(X) = \frac{1}{k} \times \frac{k(k + 1)}{2} = \frac{k + 1}{2}. \tag{15.9} \]

Therefore the expectation is the arithmetic mean of the minimum and maximum values. No surprise!

A similar, slightly more involved calculation shows that

\[ \mathrm{Var}(X) = \frac{k^2 - 1}{12}. \tag{15.10} \]
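Formulae (15.9) and (15.10) can be checked numerically for particular values of k, although this is no substitute for the algebraic derivation. The sketch below computes the mean and variance directly from the definition; the function name and the choice of k values are ours.

```python
from fractions import Fraction

def uniform_mean_and_variance(k):
    """Compute E(X) and Var(X) directly for X uniform on 1, 2, ..., k."""
    p = Fraction(1, k)
    mean = sum(x * p for x in range(1, k + 1))
    var = sum((x - mean) ** 2 * p for x in range(1, k + 1))
    return mean, var

for k in (2, 6, 7, 10):
    mean, var = uniform_mean_and_variance(k)
    # Compare with the closed forms (15.9) and (15.10).
    assert mean == Fraction(k + 1, 2)
    assert var == Fraction(k * k - 1, 12)
    print(k, mean, var)
```

For k = 7, for instance, both the mean and the variance come out as 4.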

Activity 15.3 (Hard)

Show that

\[ \mathrm{Var}(X) = \frac{k^2 - 1}{12}, \]

where X follows a discrete uniform distribution. You may use the fact that

\[ 1^2 + 2^2 + \dots + k^2 = \frac{k(k + 1)(2k + 1)}{6}. \]

This distribution is of limited applicability, although it could, for example, be used in a study of lottery outcomes because each set of numbers is equally likely to be drawn. That said, it illustrates the principles used when looking at distributions:

Define the distribution.

Find its mean and variance (and any other relevant properties).

Consider how it may be applied.

So although this distribution is not very common in applications, it serves as a useful reference point for more complex distributions.

15.8 Bernoulli distribution

A Bernoulli trial is an experiment with only two possible outcomes. We will number these outcomes 1 and 0, and refer to them as ‘success’ and ‘failure’, respectively.

There are very many such cases, for example:

Agree / Disagree

Male / Female

Employed / Not employed

Owns a car / Does not own a car

Business goes bankrupt / Continues trading

and so on...


The Bernoulli distribution is the distribution of the outcome of a single Bernoulli trial. This is the distribution of a random variable X with the probability function

\[
P(X = x) = \begin{cases}
\pi & \text{for } x = 1 \\
1 - \pi & \text{for } x = 0 \\
0 & \text{otherwise,}
\end{cases}
\tag{15.11}
\]

i.e. P(X = 1) = π and P(X = 0) = 1 − P(X = 1) = 1 − π, and no other values are possible.

Such a random variable X has a Bernoulli distribution with (probability) parameter π. This is often written as

X ∼ Bernoulli(π).

A parameter, or set of parameters, is a measure which completely describes a probability distribution.

We note the following results.

If X ∼ Bernoulli(π), then

E(X) = π (15.12)

Var(X) = π(1− π). (15.13)

Mean and variance of the Bernoulli distribution

Activity 15.4 Derive the mean and variance of the Bernoulli distribution.

15.9 Binomial distribution

Before giving a detailed description of this important distribution, we need a bit of mathematics. We define the number n! (called ‘n factorial’) to be

\[ n! = n \times (n - 1) \times (n - 2) \times \dots \times 3 \times 2 \times 1, \tag{15.14} \]

where n is a positive integer. For example, 5! = 5 × 4 × 3 × 2 × 1 = 120. Similarly, 4! = 4 × 3 × 2 × 1 = 24, 1! = 1 and we define 0! = 1. Next, we define \(\binom{n}{x}\) to be the number of ways of selecting x objects from a set of n objects when order is unimportant and no objects may be repeated. This is given by

\[ \binom{n}{x} = \frac{n!}{x!\,(n - x)!}. \tag{15.15} \]

By way of illustration, this says that there are

\[ \binom{5}{2} = \frac{5!}{2!\,3!} = \frac{120}{2 \times 6} = 10 \]

ways to select two objects from 5 without regard to order. We can easily check this. If we have a set {1, 2, 3, 4, 5} with 5 objects, the 10 ways to select 2 objects from these 5 are:

1,2   1,3   1,4   1,5   2,3
2,4   2,5   3,4   3,5   4,5.
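The count of 10 can be confirmed both from formula (15.15) and by listing the selections explicitly. The sketch below uses Python's standard library (`math.factorial`, `math.comb`, available from Python 3.8, and `itertools.combinations`); the variable names are ours.

```python
from itertools import combinations
from math import comb, factorial

n, x = 5, 2

# Formula (15.15): n! / (x! (n - x)!)
by_formula = factorial(n) // (factorial(x) * factorial(n - x))

# Explicit listing of every unordered pair from {1, 2, 3, 4, 5}.
pairs = list(combinations(range(1, n + 1), x))

print(by_formula, comb(n, x), len(pairs))
print(pairs)
```

The listing produced by `combinations` appears in the same order as the ten pairs written out above, from (1, 2) to (4, 5).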


Suppose we carry out n Bernoulli trials such that

1. In each trial, the probability of success is π.

2. Different trials are statistically independent events.

Let X denote the total number of successes in these n trials. Then X follows a binomial distribution with parameters n and π, where n ≥ 1 is a known integer, and 0 ≤ π ≤ 1. This is often written as

X ∼ Bin(n, π).

Examples of the binomial distribution include:

A coin (biased or unbiased) is tossed n times; we can find the probability of obtaining x heads.

A difficult operation with constant probability π of success is carried out n times. We can find the probability of x successful outcomes.

A certain type of car battery has a known market share. If we examine n cars, we can find the probability of finding x batteries of this type.

A known proportion of companies have an ethics code. If we contact n companies, we can find the probability that x have an ethics code.

Example 15.5

A multiple choice test has 4 questions, each with 4 possible answers. James is taking the test, but has no idea at all about the answers. So he guesses every answer and thus has a probability of 1/4 of getting any individual question correct.

Let X denote the number of correct answers in James’ test. This follows the binomial distribution with n = 4 and π = 0.25, i.e.

X ∼ Bin(4, 0.25).

What is the probability that James gets 3 of the 4 answers correct? Here it is assumed that the guesses are independent, and each has a probability π = 0.25 of being correct.

The probability of any particular sequence of 3 correct and 1 incorrect answers, for example 1110, is π³(1 − π)¹. However, we do not care about the order of the 0’s and 1’s, only about the number of 1’s. So 1101 and 1011, for example, also count as 3 correct answers. Each of these also has the probability π³(1 − π)¹.

The total number of sequences with three 1’s (and thus one 0) is the number of locations for the three 1’s that can be selected in the sequence of 4 answers. This is \(\binom{4}{3} = 4\). Thus the probability of obtaining three 1’s is

\[ \binom{4}{3} \pi^3 (1 - \pi)^1 = 4 \cdot 0.25^3 \cdot 0.75^1 \approx 0.047. \]


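In general, the probability function of X ∼ Bin(n, π) is

\[
P(X = x) = \begin{cases}
\binom{n}{x} \pi^x (1 - \pi)^{n - x} & \text{for } x = 0, 1, \dots, n \\
0 & \text{otherwise.}
\end{cases}
\tag{15.16}
\]

For instance, in the example above, where X ∼ Bin(4, 0.25), we have

\[
\begin{aligned}
P(X = 0) &= \binom{4}{0}\, 0.25^0\, 0.75^4 = 0.316, \\
P(X = 1) &= \binom{4}{1}\, 0.25^1\, 0.75^3 = 0.422, \\
P(X = 2) &= \binom{4}{2}\, 0.25^2\, 0.75^2 = 0.211, \\
P(X = 3) &= \binom{4}{3}\, 0.25^3\, 0.75^1 = 0.047, \\
\text{and } P(X = 4) &= \binom{4}{4}\, 0.25^4\, 0.75^0 = 0.004.
\end{aligned}
\]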
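The five probabilities for Bin(4, 0.25) can be reproduced with a direct implementation of (15.16). This is a minimal sketch; the function name `binomial_pmf` is our own, and x is assumed to be an integer.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for X ~ Bin(n, pi), following (15.16); x is an integer."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

probs = [binomial_pmf(x, 4, 0.25) for x in range(5)]
print([round(p, 3) for p in probs])
print(sum(probs))  # the probabilities of all possible values sum to 1
```

Rounded to three decimal places, the output should match the values listed above.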

If X ∼ Bin(n, π), then

E(X) = nπ (15.17)

Var(X) = nπ(1− π). (15.18)

Mean and variance of the binomial distribution

Suppose we now have a test with 20 questions where each question has 4 possible answers and consider again a student who guesses every one of the answers. Let X denote the number of correct answers by such a student, so that X ∼ Bin(20, 0.25). The expected number of correct answers is E(X) = 20 · 0.25 = 5.

The teacher wants to set the pass mark of the examination so that, for such a student, the probability of passing is less than 0.05. What should the pass mark be? In other words, what is the smallest x such that P(X ≥ x) < 0.05, i.e. such that P(X < x) ≥ 0.95?

Calculating the probabilities of x = 0, 1, . . . , 20 we get (rounded to 3 decimal places):

X = x       0      1      2      3      4      5      6      7      8      9      10
P(X = x)    0.003  0.021  0.067  0.134  0.190  0.202  0.169  0.112  0.061  0.027  0.010

X = x       11     12     13     14     15     16     17     18     19     20
P(X = x)    0.003  0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000

We find that P(X < 8) = 0.898 and P(X < 9) = 0.959. Therefore P(X ≥ 8) = 0.102 > 0.05 and P(X ≥ 9) = 0.041 < 0.05. So the pass mark should be set at 9.
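The search for the pass mark can be automated. The sketch below assumes the same Bin(20, 0.25) model; `smallest_pass_mark` is a hypothetical helper that tries x = 0, 1, 2, . . . in turn until the upper-tail probability drops below the chosen limit.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) for X ~ Bin(n, pi), for an integer x between 0 and n."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def smallest_pass_mark(n, pi, max_pass_prob):
    """Smallest x with P(X >= x) < max_pass_prob for X ~ Bin(n, pi)."""
    for x in range(n + 2):
        upper_tail = sum(binomial_pmf(j, n, pi) for j in range(x, n + 1))
        if upper_tail < max_pass_prob:
            return x
    return None  # only reached if max_pass_prob is not positive

print(smallest_pass_mark(20, 0.25, 0.05))
```

For the test in the text this returns 9, in agreement with the tabulated probabilities.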

More generally, consider a student who has the same probability π of getting the correct answer for each question, so that X ∼ Bin(20, π). Plots of the probabilities for π = 0.25, 0.5, 0.7 and 0.9 are provided in Figure 15.1. Notice how the shape of the distribution changes as the parameter π changes. In particular for the ‘extreme’ π, i.e. π = 0.9, the distribution is heavily skewed. When π = 0.5, i.e. success and failure are equally likely, the distribution of the number of successes is symmetric.

[Figure 15.1 has four panels, each plotting Probability against Correct answers (0 to 20): π = 0.25, E(X) = 5; π = 0.5, E(X) = 10; π = 0.7, E(X) = 14; π = 0.9, E(X) = 18.]

Figure 15.1: Binomial distribution probabilities where X ∼ Bin(20, π), for π = 0.25, 0.5, 0.7 and 0.9.

Figure 15.1 illustrates how different probability distributions may differ from each other in a broader or narrower sense. In the broader sense, we have different families of distributions which may have quite different characteristics, for example:

• symmetric vs. skewed

• continuous vs. discrete

• among discrete: finite vs. infinite number of possible values

• among continuous: different sets of possible values (for example all real numbers x, or x > 0).

The ‘distributions’ discussed in this unit are really families of distributions in this sense. In the narrower sense, individual distributions within a family differ in having different values of the parameters of the distribution. The parameters determine the mean and variance of the distribution, values of probabilities from it, etc. In statistical analysis of a random variable X we typically:


• select a family of distributions based on the basic characteristics of X

• use observed data to estimate values of the parameters of that distribution, and perform statistical inference.

Activity 15.5 Of a large number of mass-produced articles, 5% are defective. Find the probability that a random sample of 25 will contain:

(a) no defectives

(b) exactly one defective

(c) at least two defectives.

15.10 Poisson distribution

The possible values of the Poisson distribution are the non-negative integers 0, 1, 2, . . . . The probability function of the Poisson distribution is

\[
P(X = x) = \begin{cases}
\dfrac{e^{-\lambda} \lambda^x}{x!} & \text{for } x = 0, 1, 2, \dots \\
0 & \text{otherwise,}
\end{cases}
\tag{15.19}
\]

where λ > 0 is a parameter. If a random variable X has a Poisson distribution with parameter λ, this is often denoted by

X ∼ Poisson(λ).

If X ∼ Poisson(λ), then

E(X) = λ (15.20)

Var(X) = λ. (15.21)

Mean and variance of the Poisson distribution

Poisson distributions are used for counts of occurrences of various kinds. To give a formal motivation, suppose that we consider the number of occurrences of some phenomenon in time, and that the process that generates the occurrences satisfies the following conditions:

1. The numbers of occurrences in any two mutually exclusive intervals of time are independent of each other.

2. The probability of two or more occurrences at the same time is negligibly small.

3. The probability of one occurrence in any short time interval of length t is λt for some constant λ > 0.


In essence, these state that individual occurrences should be independent, sufficiently rare, and happen at a constant rate λ per unit of time. A process like this is a Poisson process.

If occurrences are generated by a Poisson process, then the number X of occurrences in a randomly selected time interval of length t = 1 follows a Poisson distribution with mean λ, i.e. X ∼ Poisson(λ).

The single parameter λ of the Poisson distribution is thus the rate of occurrences per unit of time. Examples of variables for which we might use a Poisson distribution:

Number of telephone calls received at a call centre in a minute.

Number of accidents on a stretch of motorway in a week.

Number of customers arriving at a checkout in a minute.

Number of misprints per page of newsprint.

Because λ is the rate per unit of time, its value also depends on the unit of time (length of interval) we consider. For example,

if X is the number of arrivals in an hour and X ∼ Poisson(1.5), then if Y is the number of arrivals in two hours, Y ∼ Poisson(2 × 1.5) = Poisson(3).

λ is also the mean, E(X), of the distribution as we saw in (15.20).

Both motivations suggest that distributions with higher values of λ have higher probabilities of large values of X.

For example, Figure 15.2 plots probabilities P(X = x) for x = 0, 1, 2, . . . , 10 for Poisson(2) and Poisson(4).

[Figure 15.2 plots p(x) against x (0 to 10) for λ = 2 and λ = 4.]

Figure 15.2: Poisson distribution probabilities where X ∼ Poisson(λ), for λ = 2 and 4.


Example 15.6

Customers arrive at a bank on weekday afternoons randomly at an average rate of 1.6 customers per minute. Let X denote the number of arrivals in a minute and Y the number of arrivals in five minutes.

We assume a Poisson distribution for both, such that

X ∼ Poisson(1.6)

Y ∼ Poisson(5 × 1.6) = Poisson(8).

(a) What is the probability that no customer arrives in a minute?

For X ∼ Poisson(1.6), the probability P(X = 0) is

\[ P(X = 0) = \frac{e^{-\lambda} \lambda^0}{0!} = \frac{e^{-1.6}\, 1.6^0}{0!} = e^{-1.6} = 0.20. \]

(b) What is the probability that more than two customers arrive in a minute?

\[
\begin{aligned}
P(X > 2) &= 1 - P(X \le 2) \\
&= 1 - [P(X = 0) + P(X = 1) + P(X = 2)] \\
&= 1 - P(X = 0) - P(X = 1) - P(X = 2) \\
&= 1 - \frac{e^{-1.6}\, 1.6^0}{0!} - \frac{e^{-1.6}\, 1.6^1}{1!} - \frac{e^{-1.6}\, 1.6^2}{2!} \\
&= 1 - 0.2019 - 0.3230 - 0.2584 \\
&= 0.2167.
\end{aligned}
\]

(c) What is the probability that no more than one customer arrives in five minutes?

For Y ∼ Poisson(8), the probability P(Y ≤ 1) is

\[
\begin{aligned}
P(Y = 0) + P(Y = 1) &= \frac{e^{-8}\, 8^0}{0!} + \frac{e^{-8}\, 8^1}{1!} \\
&= 0.000335 + 0.002684 \\
&= 0.003019.
\end{aligned}
\]
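The three answers in Example 15.6 can be checked with a direct implementation of (15.19). This is an illustrative sketch only; `poisson_pmf` and the variable names are our own.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam), following (15.19); x is an integer."""
    if x < 0:
        return 0.0
    return exp(-lam) * lam ** x / factorial(x)

rate_per_minute = 1.6

# (a) no arrivals in one minute
p_a = poisson_pmf(0, rate_per_minute)
# (b) more than two arrivals in one minute
p_b = 1 - sum(poisson_pmf(x, rate_per_minute) for x in range(3))
# (c) at most one arrival in five minutes: the rate scales to 5 * 1.6 = 8
p_c = sum(poisson_pmf(y, 5 * rate_per_minute) for y in range(2))

print(round(p_a, 4), round(p_b, 4), round(p_c, 6))
```

Computed without intermediate rounding, the answer to (b) is about 0.2166 rather than 0.2167; the small difference arises because the worked solution rounds each term to four decimal places before subtracting.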

Activity 15.6 Hits on a website arrive at the rate of 12 per hour. Briefly discuss whether or not you believe the assumptions underlying the Poisson distribution hold. Assuming that they do hold, calculate the probabilities that

(a) there are exactly three hits between 10:00 and 10:30

(b) there is exactly one hit between 14:00 and 14:05

(c) there are more than two hits between 16:40 and 16:45.


15.11 A word on calculators

In the examination you will be allowed a basic calculator only. To calculate binomial and Poisson probabilities directly requires access to a ‘factorial’ key (for the binomial) and ‘e’ key (for the Poisson), which will not appear on a basic calculator. Note that any probability calculations which are required in the examination will be possible on a basic calculator. For example, if a Poisson probability required the numerical value of e⁻³, then this would be provided in the examination question.

15.12 Summary

This unit has introduced the concept of a random variable and explained how there are two types of random variable, discrete and continuous. Then, focusing on discrete random variables, probability distributions were constructed to represent how likely the different possible outcomes of a chance experiment were to occur. Important theoretical properties of these probability distributions were also discussed, specifically the expected value and variance.

15.13 Key terms and concepts

Bernoulli distribution      Bernoulli trial
Binomial distribution       Continuous
Discrete                    Expected value
Normal distribution         Parameter
Poisson distribution        Poisson process
Probability distribution    Random variable
Sample space                Uniform distribution

Learning outcomes

At the end of this unit, you should be able to:

appreciate the concepts of a random variable and a probability distribution

calculate the expected value and variance for discrete random variables

recognise some common discrete probability distributions

discuss the key terms and concepts introduced in this unit.


Exercises

Exercise 15.1

The probability function P(X = x) = 0.02x is defined for x = 8, 9, 10, 11 and 12. What are the mean and variance of this probability distribution?

Exercise 15.2

Of all the candles produced by a company, 0.01% do not have wicks (the core piece of string). If a retailer buys 10,000 candles from the company, what is the probability that all the candles have wicks? What is the probability that at least one candle will not have a wick?

Exercise 15.3

If a large grass lawn contains an average of 1 weed per 600 cm², what will be the distribution of X, the number of weeds in an area of 400 cm²? Hence find P(X ≤ 1).

Exercise 15.4

A graduate applies for 10 jobs. She believes she has a constant and independent probability 0.1 of receiving an offer in each case.

(a) Write down the distribution of the total number of offers received. What are the mean and variance of the distribution?

(b) What is the probability of at least one offer?

(c) The graduate is considering using the Poisson distribution to simplify the calculation in (b). What advice would you give her?

(d) Discuss briefly whether you think the assumption of independence is realistic in this context.

Exercise 15.5

The random variable X has a binomial distribution such that X ∼ Bin(4, 0.3). It has the following probability distribution.

x           0        1        2        3    4
P(X = x)    0.2401   0.4116   0.2646   a    b

(a) Find a and b.

(b) Suppose Y = (X − 3)². Find E(Y).

Exercise 15.6

In a prize draw, the probabilities of winning various amounts of money are:

Prize (£)             500    100    50     1      0
Probability of win    0.01   0.03   0.11   0.50   0.35

What are the expected value and standard deviation of the prize?


Exercise 15.7

The formula

\[ P(X = 6) = \binom{14}{6}\, 0.3^6\, 0.7^8 \]

was used to compute a probability from a probability distribution. What is the standard deviation for this probability distribution?

Exercise 15.8

Explain briefly when it would be appropriate to use a:

(a) uniform distribution

(b) binomial distribution

(c) Poisson distribution.


16. Probability III — The Normal distribution and sampling distributions

Unit 16: Probability III
The Normal distribution and sampling distributions

Overview

The Normal distribution is introduced and probabilities calculated for this distribution (which requires a transformation to the standard Normal distribution). We then proceed to consider the estimation of a population mean through the use of sampling. This gives rise to a sampling distribution and its properties are discussed. We conclude the probability section of the course with the powerful result known as the ‘Central Limit Theorem’.

Aims

This unit explores the Normal distribution and how it relates to sampling distributions and the Central Limit Theorem. Particular aims are:

to work with the Normal distribution

to understand the concept of a sampling distribution

to appreciate the usefulness of the Central Limit Theorem.

Background reading

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Probability’ Chapter 3.

16.1 The Normal distribution

The Normal distribution is by far the most important probability distribution in statistics. This is for three broad reasons:

Many variables have distributions that are approximately normal, for example weights of humans, animals and various products.

The Normal distribution has extremely convenient mathematical properties, which make it a useful default choice of distribution in many contexts.

Even when a variable is not itself even approximately normally distributed, functions of several observations of the variable (‘sampling distributions’) are often approximately normal due to the Central Limit Theorem (CLT). Because of this, the Normal distribution has a crucial role in statistical inference. This will be discussed later.

The probability density function (formula for the distribution curve) of the Normal distribution (which you do not need to know!) is

\[ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{1}{2\sigma^2}(x - \mu)^2\right] \quad \text{for } -\infty < x < \infty, \tag{16.1} \]

where π is the mathematical constant (i.e. π = 3.14159 . . . ), and µ and σ² are parameters, with −∞ < µ < ∞ and σ² > 0.

A random variable X with this probability density function is said to have a Normal distribution with mean µ and variance σ², denoted X ∼ N(µ, σ²).

If X ∼ N(µ, σ2), then

E(X) = µ (16.2)

Var(X) = σ2 (16.3)

and the standard deviation is thus σ.

Mean and variance of the Normal distribution

The Normal distribution is the so-called ‘bell curve’. The two parameters affect it as follows:

The mean, µ, determines the location of the curve.

The variance, σ², determines the dispersion (spread) of the curve.

For example, in Figure 16.1,

N(0, 1) and N(5, 1) have the same dispersion but different location: the N(5, 1) curve is identical to the N(0, 1) curve, but shifted 5 units to the right.

N(0, 1) and N(0, 9) have the same location but different dispersion: the N(0, 9) curve is centred at the same value as the N(0, 1) curve, but spread out more widely.

The mean can also be inferred from the observation that the Normal distribution is symmetric about µ. This also implies that the median of the Normal distribution is also µ; and we also note that since the distribution reaches a maximum at µ, then the mean and median are also equal to the mode.

Probabilities are given by areas under the curve, which involves integrating equation (16.1). Unfortunately, such integrals cannot be evaluated in closed-form, so instead we make use of statistical tables.∗ Specifically, we note the special transformation

\[ Z = \frac{X - \mu}{\sigma}. \tag{16.4} \]

∗In practice, we could also use a computer, but not in the examination!


[Figure 16.1 plots the density curves of N(0, 1), N(5, 1) and N(0, 9) against x (−5 to 10).]

Figure 16.1: Three examples of Normal distributions.

The transformed variable Z is known as a standardised variable, or z-score. It can be shown (but is beyond the scope of this course), that the distribution of the z-score is N(0, 1), i.e. the Normal distribution with mean µ = 0 and variance σ² = 1 (and therefore a standard deviation of σ = 1).

If X ∼ N(µ, σ²), then

\[ Z = \frac{X - \mu}{\sigma} \sim N(0, 1). \tag{16.5} \]

Standard Normal distribution

The cumulative probability, P(Z ≤ z), is often denoted by Φ(z) and values for various ‘z’ are given in Appendix C.
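Outside the examination, Φ(z) can be computed rather than looked up. The sketch below relies on the standard identity Φ(z) = ½[1 + erf(z/√2)] and Python's `math.erf`; the function names and the N(5, 9) illustration are assumptions of ours, not taken from the guide.

```python
from math import erf, sqrt

def phi(z):
    """Standard Normal cumulative probability, Phi(z) = P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the z-score in (16.4)."""
    z = (x - mu) / sigma
    return phi(z)

# Hypothetical illustration: X ~ N(5, 9), so sigma = 3.
print(round(normal_cdf(5, 5, 3), 4))  # at the mean, z = 0
print(round(normal_cdf(8, 5, 3), 4))  # one standard deviation above, z = 1
```

Note that `normal_cdf` takes the standard deviation σ, not the variance σ², as its third argument.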

In the examination, you will have a copy of the table in Appendix C. The table shows values of Φ(z) = P(Z ≤ z) for z ≥ 0. This can be used to calculate probabilities of any intervals for any Normal distribution. But how? The table seems to be incomplete:

1. It is only for N(0, 1), not for N(µ, σ²) for any other µ and σ².

2. Even for N(0, 1), it only shows probabilities for z ≥ 0.

We now show how these are not really limitations, starting with ‘2.’, i.e. how to work out cumulative standard normal probabilities for negative z-values.

The key to using the table is that the standard Normal distribution is symmetric about 0. This means that for an interval in one tail, its ‘mirror image’ in the other tail has the same probability.

Suppose that z ≥ 0, so that −z ≤ 0. The table in Appendix C shows

P (Z ≤ z) = Φ(z). (16.6)

From it, we also get the following probabilities:

P (Z > z) = 1− Φ(z).


P (Z ≤ −z) = Φ(−z) = 1− Φ(z) = P (Z > z).

P (Z > −z) = 1− Φ(−z) = Φ(z) = P (Z < z).

In the continuous world, the probability of a single point value is zero. Therefore, since P(Z = z) = 0 for all z, we are indifferent between using ≤ and <; similarly, we are indifferent between using ≥ and >. So, P(Z ≤ z) = P(Z < z) and P(Z ≥ z) = P(Z > z). This is because

P (Z ≤ z) = P (Z < z) + P (Z = z) = P (Z < z) + 0 = P (Z < z).

P (Z ≥ z) = P (Z > z) + P (Z = z) = P (Z > z) + 0 = P (Z > z).

Figure 16.2 shows equal tail probabilities for the standard Normal distribution, i.e. it shows that P(Z ≤ −z) = P(Z ≥ z).

Figure 16.2: Equal tail probabilities for the standard Normal distribution, showing that P(Z ≤ −z) = P(Z ≥ z). The horizontal axis is marked at −z, 0 and +z.

If Z ∼ N(0, 1), for any two numbers z1 < z2,

P (z1 < Z ≤ z2) = Φ(z2)− Φ(z1), (16.7)

where Φ(z2) and Φ(z1) are obtained using the tabulated values in Appendix C.

Calculating probabilities for the standard Normal distribution

Reality check: Remember that the standard Normal distribution is symmetric about 0, hence

Φ(0) = P (Z ≤ 0) = 0.5. (16.8)

So if you ever end up with results like P(Z ≤ −1) = 0.7 or P(Z ≤ 1) = 0.2 or P(Z > 2) = 0.95, these must be wrong! Why? Well, P(Z ≤ −1) < P(Z ≤ 0) = 0.5, so P(Z ≤ −1) cannot be 0.7. Similarly, 0.5 = P(Z ≤ 0) < P(Z ≤ 1), so P(Z ≤ 1) cannot be 0.2. Finally, P(Z ≥ |z|) ≤ 0.5 for any z, so P(Z > 2) cannot be 0.95.


Example 16.1 If Z ∼ N(0, 1), what is P (Z > 1.20)?

It is useful to draw a quick sketch to visualise the area of probability. Such a sketch is shown in Figure 16.3. Turning to Appendix C, we look up the z-value of 1.20 by using the ‘1.2’ row and ‘0.00’ column which shows that

Φ(1.20) = P (Z ≤ 1.20) = 0.8849.

Therefore P (Z > 1.20) = 1− Φ(1.20) = 0.1151.

Figure 16.3: The standard Normal density function, f_Z(z) plotted against z, where P(Z > 1.20) is the area of the shaded region.

Example 16.2 Turn to Appendix C. Look up the probability in the ‘0.8’ row and ‘0.04’ column of the table, which shows that

Φ(0.84) = P (Z ≤ 0.84) = 0.7995.

We then also have

P (Z > 0.84) = 1− Φ(0.84) = 0.2005.

P(Z < −0.84) = 1 − P(Z ≤ 0.84) = 1 − Φ(0.84) = 0.2005. Alternatively, P(Z < −0.84) = P(Z > 0.84) by symmetry.

P (Z ≥ −0.84) = P (Z ≤ 0.84) = Φ(0.84) = 0.7995.

P (−0.84 ≤ Z ≤ 0.84) = P (Z ≤ 0.84)− P (Z < −0.84) = 0.5990.
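Outside the examination, the table look-ups in Examples 16.1 and 16.2 can be checked on a computer. The following is a minimal sketch, not part of the original guide: the helper name `phi` is our own, and it relies on the standard identity Φ(z) = ½[1 + erf(z/√2)]. Because the table is rounded to four decimal places, the computed values can differ from the tabulated ones in the last digit.

```python
import math

def phi(z):
    """Standard Normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Example 16.1: P(Z > 1.20) = 1 - Phi(1.20)
p_gt_120 = 1 - phi(1.20)

# Example 16.2: probabilities built from Phi(0.84)
p_le_084 = phi(0.84)                 # P(Z <= 0.84)
p_lt_neg084 = phi(-0.84)             # P(Z < -0.84), equal to 1 - Phi(0.84) by symmetry
p_between = phi(0.84) - phi(-0.84)   # P(-0.84 <= Z <= 0.84)

for label, value in [
    ("P(Z > 1.20)", p_gt_120),
    ("P(Z <= 0.84)", p_le_084),
    ("P(Z < -0.84)", p_lt_neg084),
    ("P(-0.84 <= Z <= 0.84)", p_between),
]:
    print(f"{label} = {value:.4f}")
```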


Example 16.3 If Z ∼ N(0, 1), what is P (−1.24 < Z < 1.86)?

We require the sum of the red and blue areas in Figure 16.4. The red area is given by:

P (0 ≤ Z ≤ 1.86) = P (Z ≤ 1.86)− P (Z ≤ 0)

= Φ(1.86)− Φ(0)

= 0.9686− 0.5

= 0.4686.

The blue area is given by:

P (−1.24 ≤ Z ≤ 0) = P (Z ≤ 0)− P (Z ≤ −1.24)

= Φ(0)− Φ(−1.24)

= Φ(0)− (1− Φ(1.24))

= 0.5− (1− 0.8925)

= 0.3925.

Hence P (−1.24 < Z < 1.86) = 0.4686 + 0.3925 = 0.8611.

Figure 16.4: The standard Normal density function, f_Z(z) plotted against z, where the red area is P(0 ≤ Z ≤ 1.86) and the blue area is P(−1.24 ≤ Z ≤ 0).
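As a cross-check of Example 16.3, the rule P(z1 < Z ≤ z2) = Φ(z2) − Φ(z1) in (16.7) can be sketched in a few lines. This assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the function name `interval_prob` is hypothetical and used only for illustration.

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # the standard Normal distribution N(0, 1)

def interval_prob(z1, z2):
    """P(z1 < Z <= z2) = Phi(z2) - Phi(z1), as in equation (16.7)."""
    return Z.cdf(z2) - Z.cdf(z1)

red = interval_prob(0, 1.86)        # P(0 <= Z <= 1.86)
blue = interval_prob(-1.24, 0)      # P(-1.24 <= Z <= 0)
total = interval_prob(-1.24, 1.86)  # P(-1.24 < Z < 1.86), computed in one step

print(f"red = {red:.4f}, blue = {blue:.4f}, total = {total:.4f}")
```

Splitting the interval at 0, as in the worked example, and computing it in one step give the same answer, since the Φ(0) terms cancel.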

Activity 16.1 If Z ∼ N(0, 1), calculate:

(a) P (0 < Z < 1.2)

(b) P (−0.68 < Z < 0)

(c) P (−0.46 < Z < 2.21)

(d) P (0.81 < Z < 1.94).

(e) Further, find a value for x such that P (−x < Z < x) = 0.8.


16.1.1 Probabilities for any Normal distribution

How about a Normal distribution X ∼ N(µ, σ²), for any other µ and σ²? What if we want to calculate, for any a < b, P(a ≤ X ≤ b)?

Remember that (X − µ)/σ = Z ∼ N(0, 1). If we apply this transformation to all parts of the inequalities, we get

\[
\begin{aligned}
P(a \le X \le b) &= P\left(\frac{a - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{b - \mu}{\sigma}\right) \\
&= P\left(\frac{a - \mu}{\sigma} \le Z \le \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right),
\end{aligned}
\tag{16.9}
\]

which can be calculated using the table.

Note that this also covers the cases of the one-sided inequalities P(X ≤ b), with a = −∞, and P(X ≥ a), with b = ∞. For a = −∞, P(X ≤ b) = Φ((b − µ)/σ) because Φ(−∞) = 0. For b = ∞, P(X ≥ a) = 1 − Φ((a − µ)/σ) because Φ(∞) = 1.

Example 16.4

Let X denote the diastolic blood pressure of a randomly selected person in England. This is approximately distributed as X ∼ N(74.2, 127.87). Note that diastolic blood pressure can only be approximately normal, rather than exactly normal, because normal random variables can take negative values and, clearly, diastolic blood pressure cannot be negative. However, for practical purposes, we can use the Normal distribution to model diastolic blood pressure.

Suppose we want to know the probabilities of the following intervals:

X > 90 (high blood pressure)

X < 60 (low blood pressure)

60 ≤ X ≤ 90 (normal (mid) blood pressure).

These are calculated using the previous results, with µ = 74.2 and σ² = 127.87, and thus σ = 11.31. So here

\[ \frac{X - 74.2}{11.31} = Z \sim N(0, 1), \]

and we can refer values of this standardised variable to the table. We have

\[ P(X > 90) = P\left(\frac{X - 74.2}{11.31} > \frac{90 - 74.2}{11.31}\right) = P(Z > 1.40) = 1 - \Phi(1.40). \]

This we have in the table in Appendix C, from which we obtain 1 − Φ(1.40) = 1 − 0.9192 = 0.0808.

\[ P(X < 60) = P\left(\frac{X - 74.2}{11.31} < \frac{60 - 74.2}{11.31}\right) = P(Z < -1.26) = P(Z > 1.26) = 1 - \Phi(1.26), \]

which is 1 − 0.8962 = 0.1038, according to the table. Finally,

\[
\begin{aligned}
P(60 \le X \le 90) &= P(X \le 90) - P(X < 60) \\
&= [1 - P(X > 90)] - P(X < 60) \\
&= [1 - 0.0808] - 0.1038 \\
&= 0.8154.
\end{aligned}
\]

These (rounded) probabilities are shown in Figure 16.5.

Figure 16.5: Probabilities for Example 16.4 regarding diastolic blood pressure (horizontal axis from 40 to 120). Low: 0.10, Mid: 0.82, High: 0.08.
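The standardisation in (16.9) can also be delegated to software. The sketch below is an illustration only: it assumes Python 3.8 or later (for `statistics.NormalDist`) and uses our own variable names. Because it does not round z to two decimal places before looking up Φ(z), its answers differ slightly, in the third decimal place, from the table-based values 0.0808, 0.1038 and 0.8154.

```python
import math
from statistics import NormalDist

mu = 74.2                  # mean diastolic blood pressure in Example 16.4
sigma = math.sqrt(127.87)  # standard deviation, approximately 11.31

X = NormalDist(mu, sigma)

p_high = 1 - X.cdf(90)         # P(X > 90)
p_low = X.cdf(60)              # P(X < 60)
p_mid = X.cdf(90) - X.cdf(60)  # P(60 <= X <= 90)

print(f"sigma = {sigma:.2f}")
print(f"High: {p_high:.4f}  Low: {p_low:.4f}  Mid: {p_mid:.4f}")
```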

Activity 16.2 The scores on a verbal reasoning test are modelled by a Normal distribution with mean µ = 100 and standard deviation σ = 10.

(a) What proportion of the scores will be greater than 95?

(b) What proportion of the scores will be less than 110?

(c) What is the probability of an individual selected at random having a score less than 70?

(d) What are the quartiles of the distribution?

(e) What is the range of scores such that 0.05 (5%) of the scores are below the range and 0.05 of the scores are above it?

16.1.2 Some probabilities around the mean

The following results hold for all Normal distributions:

P(µ − σ < X < µ + σ) = 0.683. In other words, 68.3% of the total probability is within 1 standard deviation of the mean.

P(µ − 1.96σ < X < µ + 1.96σ) = 0.950. In words, 95% of the total probability is within 1.96 standard deviations of the mean.

P(µ − 2σ < X < µ + 2σ) = 0.954. In words, 95.4% of the total probability is within 2 standard deviations of the mean.

P(µ − 2.58σ < X < µ + 2.58σ) = 0.99. In words, 99% of the total probability is within 2.58 standard deviations of the mean.

P(µ − 3σ < X < µ + 3σ) = 0.997. In words, 99.7% of the total probability is within 3 standard deviations of the mean.

The first two of these are illustrated graphically in Figure 16.6.

Figure 16.6: Some probabilities around the mean (horizontal axis marked at µ − 1.96σ, µ − σ, µ, µ + σ and µ + 1.96σ). The shaded area shows that 68.3% of the total probability is within 1 standard deviation of the mean. The shaded and hatched areas combined show that 95% of the total probability is within 1.96 standard deviations of the mean.
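The five results in Section 16.1.2 all follow from P(µ − kσ < X < µ + kσ) = P(−k < Z < k) = Φ(k) − Φ(−k), which does not depend on µ or σ. A short sketch that reproduces them, assuming Python 3.8 or later and a helper name `prob_within` of our own choosing:

```python
from statistics import NormalDist

Z = NormalDist()  # defaults to mean 0 and standard deviation 1

def prob_within(k):
    """Probability that a Normal variable lies within k standard deviations of its mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 1.96, 2, 2.58, 3):
    print(f"within {k} standard deviations: {prob_within(k):.3f}")
```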

16.2 Sampling distributions

A simple random sample is a sample selected by a process where every possible sample (of the same size, n) has the same probability of selection.† The selection process is left to chance, thus eliminating the effect of selection bias.‡ Due to the random selection mechanism, we do not know (in advance) which sample will occur. Every population element has a known, non-zero probability of selection in the sample but no element is certain to appear.

Consider a population of size N = 6 elements: A, B, C, D, E and F. We consider all possible samples of size n = 2 (without replacement, i.e. once an object has been chosen it cannot be selected again). There are 15 different, but equally likely, such samples:

AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF.

Since this is a simple random sample, each sample has the same probability of selection, i.e. 1/15.

†Sampling techniques are discussed in greater depth later in the course.
‡Forms of bias are also discussed later in the course.
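The list of 15 possible samples can be generated mechanically. The following sketch uses the standard-library function `itertools.combinations`, which enumerates unordered selections without replacement; the variable names are our own.

```python
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D", "E", "F"]

# All unordered samples of size n = 2, drawn without replacement.
samples = ["".join(pair) for pair in combinations(population, 2)]

# Under simple random sampling, every sample is equally likely.
prob_each = Fraction(1, len(samples))

print(", ".join(samples))
print(f"{len(samples)} samples, each with probability {prob_each}")
```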


16.2.1 Estimation

A population has particular characteristics of interest such as the mean (µ), variance (σ²) etc. Collectively we refer to these characteristics as parameters. If we do not have population data, the parameter values will be unknown.

‘Statistical inference’ is the process of estimating the (unknown) parameter values using the (known) sample data. We use a statistic (called an ‘estimator’) calculated from sample observations to provide a ‘point estimate’ for a parameter.

Returning to our example, recall there are 15 different samples of size 2 from a population of size 6. Suppose the variable of interest is income, such that

Individual       A   B   C   D   E   F
Income (£000s)   3   6   4   9   7   7

If we seek the population mean, µ, we will use the sample mean, X̄, as our estimator, where for a sample size n

\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i. \tag{16.10} \]

Estimator of the population mean

For example, if the observed sample was ‘AB’, the sample mean is (3,000 + 6,000)/2 = £4,500.

Clearly, different observed samples will lead to different sample means. Consider the values of X̄, i.e. x̄, for all possible samples (in £000s):

Sample   Values   x̄
AB       3  6     4.5
AC       3  4     3.5
AD       3  9     6
AE       3  7     5
AF       3  7     5
BC       6  4     5
BD       6  9     7.5
BE       6  7     6.5
BF       6  7     6.5
CD       4  9     6.5
CE       4  7     5.5
CF       4  7     5.5
DE       9  7     8
DF       9  7     8
EF       7  7     7


So, the values of X̄ vary from 3.5 to 8, depending on the sample values. Since we have the population data here, we can actually compute the population mean, µ, in £000s, which is

\[ \mu = \frac{1}{N} \sum_{i=1}^{N} X_i = \frac{3 + 6 + 4 + 9 + 7 + 7}{6} = 6. \]

So, even with simple random sampling, we obtain some x̄ values far from µ. Here, in fact, only one sample (AD) results in x̄ = µ.

Let’s now consider the maximum possible absolute deviations of the sample mean from the population mean, i.e. the distance |x̄ − µ|:

max |x̄ − µ|   Range of x̄        Number of samples   Probability
0             x̄ = 6             1                   0.067
0.5           5.5 ≤ x̄ ≤ 6.5     6                   0.400
1             5 ≤ x̄ ≤ 7         10                  0.667
1.5           4.5 ≤ x̄ ≤ 7.5     12                  0.800
2             4 ≤ x̄ ≤ 8         14                  0.933
2.5           3.5 ≤ x̄ ≤ 8.5     15                  1.000

So, for example, there is an 80% probability of being within 1.5 units of µ (in £000s).

We now represent this as a frequency distribution. That is, we record the frequency of each possible value of x̄.

x̄     Frequency   Relative frequency
3.5   1           1/15 = 0.067
4.5   1           1/15 = 0.067
5.0   3           3/15 = 0.200
5.5   2           2/15 = 0.133
6.0   1           1/15 = 0.067
6.5   3           3/15 = 0.200
7.0   1           1/15 = 0.067
7.5   1           1/15 = 0.067
8.0   2           2/15 = 0.133

This is known as the sampling distribution of X̄. The sampling distribution is a central and vital concept in statistics. It can be used to evaluate how ‘good’ an estimator is. Specifically, we care about how ‘close’ the estimator is to the population parameter of interest.

As we have seen, different samples yield different sample mean values, as a consequence of the random sampling procedure. Hence estimators (of which X̄ is an example) are random variables. So, X̄ is our estimator of µ, and the observed value of X̄, denoted x̄, is a point estimate.
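The frequency distribution of the sample mean for the income example can be rebuilt by brute force: list every sample of size 2, compute its mean, and count how often each value occurs. This is a sketch under our own naming; it uses exact fractions so that no rounding error enters the counts.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

income = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}  # in £000s

# Sample mean of every possible sample of size 2 (without replacement).
sample_means = [Fraction(income[a] + income[b], 2) for a, b in combinations(income, 2)]

# Sampling distribution of the sample mean: value -> number of samples.
frequency = Counter(sample_means)
for mean in sorted(frequency):
    count = frequency[mean]
    print(f"{float(mean):4.1f}  {count}  {count / len(sample_means):.3f}")

# Number of samples whose mean is within 1.5 (£000s) of the population mean.
mu = Fraction(sum(income.values()), len(income))
within = sum(1 for m in sample_means if abs(m - mu) <= Fraction(3, 2))
print(f"mu = {mu}, samples within 1.5 of mu: {within} of {len(sample_means)}")
```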

As with any distribution, we care about a sampling distribution’s mean and variance. Together, these let us assess how ‘good’ an estimator is. First, consider the mean: we seek an estimator which does not mislead us systematically. So the ‘average’ (mean) value of an estimator, over all possible samples, should be equal to the population parameter itself.


Returning to our example:

x̄       Frequency   Product
3.5     1           3.5
4.5     1           4.5
5.0     3           15.0
5.5     2           11.0
6.0     1           6.0
6.5     3           19.5
7.0     1           7.0
7.5     1           7.5
8.0     2           16.0
Total   15          90.0

Hence the mean of this sampling distribution is 90/15 = 6 = µ.

An important difference between a sampling distribution and other distributions is that the values in a sampling distribution are summary measures of whole samples (i.e. statistics/estimators) rather than individual observations. Formally, the mean of a sampling distribution is called the expected value of the estimator, denoted by E(·). Hence the expected value of the sample mean is E(X̄).

An unbiased estimator has its expected value equal to the parameter being estimated. For our example, E(X̄) = 6 = µ.

Fortunately the sample mean X̄ is always an unbiased estimator of µ in simple random sampling, regardless of the sample size, n, and the distribution of the (parent) population. This is a good illustration of a population parameter (µ) being estimated by its sample counterpart (X̄).

The unbiasedness of an estimator is clearly desirable; however, we also need to take into account the dispersion of the estimator’s sampling distribution. Ideally, the possible values of the estimator should not vary much around the true parameter value. So, we seek an estimator with a small variance. Recall the variance is defined to be the mean of the squared deviations about the mean of the distribution. In the case of sampling distributions, it is referred to as the sampling variance.

Returning to our example:

x̄       x̄ − µ   (x̄ − µ)²   Frequency   Product
3.5     −2.5    6.25       1           6.25
4.5     −1.5    2.25       1           2.25
5.0     −1.0    1.00       3           3.00
5.5     −0.5    0.25       2           0.50
6.0     0.0     0.00       1           0.00
6.5     0.5     0.25       3           0.75
7.0     1.0     1.00       1           1.00
7.5     1.5     2.25       1           2.25
8.0     2.0     4.00       2           8.00
Total                      15          24.00

Hence the sampling variance is 24/15 = 1.6.


The population itself has a variance, the population variance, σ².

x       x − µ   (x − µ)²   Frequency   Product
3       −3      9          1           9
6       0       0          1           0
4       −2      4          1           4
9       3       9          1           9
7       1       1          2           2
Total                      6           24

Hence the population variance is σ² = 24/6 = 4.

We now consider the relationship between σ² and the sampling variance. Intuitively, a larger σ² should lead to a larger sampling variance. For population size N and sample size n, we note the following result when sampling without replacement:

\[ \mathrm{Var}(\bar{X}) = \frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n}. \]

So for our example, we get

\[ \mathrm{Var}(\bar{X}) = \frac{6 - 2}{6 - 1} \cdot \frac{4}{2} = 1.6, \]

as we saw above.
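The agreement between the directly computed sampling variance and the formula can be confirmed numerically. The sketch below (variable names are ours) computes the expected value and variance of the sample mean over all 15 samples, then compares the variance with (N − n)/(N − 1) · σ²/n using exact fractions.

```python
from fractions import Fraction
from itertools import combinations

values = [3, 6, 4, 9, 7, 7]   # incomes (in £000s) of A, B, C, D, E, F
N, n = len(values), 2

# Population mean and population variance (divisor N).
mu = Fraction(sum(values), N)
sigma2 = sum((v - mu) ** 2 for v in values) / N

# Mean and variance of the sample mean over all equally likely samples.
means = [Fraction(sum(s), n) for s in combinations(values, n)]
expected = sum(means) / len(means)
sampling_var = sum((m - expected) ** 2 for m in means) / len(means)

# The without-replacement formula for the sampling variance.
formula_var = Fraction(N - n, N - 1) * sigma2 / n

print(f"mu = {mu}, sigma^2 = {sigma2}")
print(f"E(sample mean) = {expected}")
print(f"sampling variance = {float(sampling_var)}, formula = {float(formula_var)}")
```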

We use the term standard error to refer to the standard deviation of the sampling distribution, so

\[ \mathrm{S.E.}(\bar{X}) = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{\frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n}} = \sigma_{\bar{X}}. \]

Some implications:

As the sample size, n, increases, the sampling variance decreases, i.e. the precision increases.§

Provided the sampling fraction, n/N, is small, the term

\[ \frac{N - n}{N - 1} \approx 1 \]

so can be ignored: the precision depends effectively on n only.

Returning to our example, we can use the outcome of Activity 16.3 to see that the larger the sample, the less variability there will be between samples.

Activity 16.3 List all samples of size n = 4 from A, B, C, D, E and F, when sampling without replacement. Hence verify that the sampling distribution of X̄ is as indicated in the table below.

§Although greater precision is desirable, data collection costs will rise with n (remember why we sample in the first place!)


x̄      n = 2   n = 4
3.50   1       —
4.50   1       —
5.00   3       2
5.25   —       1
5.50   2       1
5.75   —       3
6.00   1       1
6.25   —       2
6.50   3       3
6.75   —       1
7.00   1       —
7.25   —       1
7.50   1       —
8.00   2       —

Here we see that there is a striking improvement in the precision of the estimator, because the variability has decreased considerably. The range of possible x̄ values narrows from 3.5 to 8.0 (for n = 2) to 5.0 to 7.25 (for n = 4). The sampling variance is reduced from 1.6 to 0.4.
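The n = 4 column of the table, and the reduction in sampling variance from 1.6 to 0.4, can be checked by enumeration. This is an illustrative sketch only; the function names `sampling_distribution` and `sampling_variance` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

values = [3, 6, 4, 9, 7, 7]   # incomes (in £000s) of A, B, C, D, E, F

def sampling_distribution(n):
    """Frequency of each possible sample mean for samples of size n, without replacement."""
    return Counter(Fraction(sum(s), n) for s in combinations(values, n))

def sampling_variance(n):
    """Variance of the sample mean over all equally likely samples of size n."""
    dist = sampling_distribution(n)
    total = sum(dist.values())
    expected = sum(m * c for m, c in dist.items()) / total
    return sum(c * (m - expected) ** 2 for m, c in dist.items()) / total

dist4 = sampling_distribution(4)
for mean in sorted(dist4):
    print(f"{float(mean):.2f}  {dist4[mean]}")

print("sampling variance, n = 2:", float(sampling_variance(2)))
print("sampling variance, n = 4:", float(sampling_variance(4)))
```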

The factor (N − n)/(N − 1) decreases steadily as n → N. When n = 1 the factor equals 1, and when n = N it equals 0. When sampling without replacement, increasing n must increase precision since less of the population is left out. In most practical sampling N is very large (e.g. several million), while n is comparatively small (e.g. at most 1,000, say). Therefore in such cases the factor (N − n)/(N − 1) is close to 1, hence

\[ \mathrm{Var}(\bar{X}) = \frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n} = \frac{\mathrm{Var}(X)}{n} \quad \text{for small } n/N. \tag{16.11} \]

When N is large, it is the sample size n which is important in determining precision, not the sampling fraction. Consider two populations: \(N_1\) = 3 million and \(N_2\) = 200 million, both with the same variance σ². We sample \(n_1 = n_2\) = 1,000 from each population, then

\[ \sigma^2_{\bar{X}_1} = \frac{N_1 - n_1}{N_1 - 1} \cdot \frac{\sigma^2}{n_1} = (0.999667) \cdot \frac{\sigma^2}{1{,}000} \]

\[ \sigma^2_{\bar{X}_2} = \frac{N_2 - n_2}{N_2 - 1} \cdot \frac{\sigma^2}{n_2} = (0.999995) \cdot \frac{\sigma^2}{1{,}000} \]

So \(\sigma^2_{\bar{X}_1} \approx \sigma^2_{\bar{X}_2}\), despite \(N_1\) being much less than \(N_2\).
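The two factors quoted above are easy to reproduce. In the sketch below, the function name `fpc` is our own shorthand for the factor (N − n)/(N − 1), which is often called the finite population correction.

```python
def fpc(N, n):
    """The factor (N - n) / (N - 1) applied to sigma^2 / n when sampling without replacement."""
    return (N - n) / (N - 1)

# The small income example: N = 6, n = 2.
print("N = 6, n = 2:", fpc(6, 2))

# The two large populations, each sampled with n = 1,000.
for N in (3_000_000, 200_000_000):
    print(f"N = {N:,}, n = 1,000: {fpc(N, 1000):.6f}")
```

For both large populations the factor is so close to 1 that the sampling variance is effectively σ²/n, which is why the sample size, rather than the sampling fraction, governs precision.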

16.2.2 Sampling from a Normal population

The mean and variance of X̄ are E(X) and Var(X)/n, respectively, for a random sample of size n from any population distribution of X. What about the form of the sampling distribution of X̄?

This depends on the distribution of X, and is not generally known. However, when the distribution of X is normal, the sampling distribution of X̄ is also normal.


Suppose that X1, . . . , Xn are a random sample from a Normal distribution with mean µ and variance σ². Then

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right). \tag{16.12} \]

Sampling distribution of X̄ with a Normal population

So we note E(X̄) = E(X) = µ.

In an individual sample, x̄ is not usually equal to µ, the expected value of the population.

However, over repeated samples the values of X̄ are centred at µ.

We also note Var(X̄) = Var(X)/n = σ²/n, and so the standard error is σ/√n.

Variation of values of X̄ in different samples (the sampling variance) is large when the population variance of X is large.

More interestingly, the sampling variance gets smaller when the sample size n increases.

In other words, when n is large the distribution of X̄ is more tightly concentrated around µ than when n is small.

Figure 16.7: Sampling distributions of X̄ from N(5, 1), for n = 5, n = 20 and n = 100 (horizontal axis: x̄, from 4.0 to 6.0).
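The behaviour pictured in Figure 16.7 can be imitated by simulation. The sketch below is not how the figure was produced; the seed and the number of replications are arbitrary assumptions of ours. It draws repeated samples from N(5, 1) and compares the variance of the simulated sample means with σ²/n.

```python
import random
from statistics import fmean, pvariance

random.seed(16)  # arbitrary seed, chosen only so that reruns give the same numbers

MU, SIGMA = 5.0, 1.0   # population N(5, 1), as in Figure 16.7
REPLICATIONS = 5_000   # simulated samples per sample size (an assumption)

def simulate_sample_means(n):
    """Return REPLICATIONS sample means, each from a fresh random sample of size n."""
    return [
        fmean(random.gauss(MU, SIGMA) for _ in range(n))
        for _ in range(REPLICATIONS)
    ]

results = {}
for n in (5, 20, 100):
    means = simulate_sample_means(n)
    results[n] = (fmean(means), pvariance(means))
    print(f"n = {n:3d}: mean of sample means = {results[n][0]:.3f}, "
          f"variance = {results[n][1]:.4f}, sigma^2/n = {SIGMA**2 / n:.4f}")
```

The simulated means should be centred close to µ = 5 for every n, while their variance should shrink roughly in line with σ²/n.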

16.3 Central Limit Theorem

We now have the very convenient result that if a random sample comes from a Normal population, the sampling distribution of X̄ is also normal. But what about sampling distributions of X̄ from other populations?

For this, we can use a remarkable mathematical result, the Central Limit Theorem (CLT). In essence, the CLT states that the normal sampling distribution of X̄ which holds exactly for random samples from a Normal distribution, also holds approximately for random samples from (nearly) any distribution.


Suppose that X1, X2, . . . , Xn are a random sample from a population distribution which has mean E(Xi) = µ and variance Var(Xi) = σ². Let X̄n denote the sample mean calculated from a sample of size n. Then

\[ \lim_{n \to \infty} P\left[\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le z\right] = \Phi(z) \tag{16.13} \]

for any z, where Φ(z) denotes P(Z ≤ z) where Z has the standard Normal distribution.

The ‘lim n→∞’ indicates that this is an asymptotic result, i.e. one which holds increasingly well as n increases, and exactly when the sample size is infinite.

In less formal language, the CLT says that for a random sample from (nearly) any distribution with mean µ and variance σ²,

\[ \bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right) \tag{16.14} \]

approximately, when n is sufficiently large. We can then say that X̄ is asymptotically normally distributed with mean µ and variance σ²/n.

Sampling distribution of X̄ with a non-Normal population

‘Nearly’ because the CLT requires that the variance of the population distribution is finite. If it is not, the CLT does not hold. But such distributions are not common.

It may appear that the CLT is still somewhat limited, in that it applies only to sample means calculated from simple random samples. However, this is not really true, for two main reasons:

There are more general versions of the CLT which do not require the observations Xi to be from such samples.

Even the basic version applies very widely, when we realise that the ‘X’ can also be a function of the original variables in the data. For example, if X and Y are variables in the sample, we can also apply the CLT to

\[ \frac{\sum_{i=1}^{n} \log(X_i)}{n} \quad \text{or} \quad \frac{\sum_{i=1}^{n} X_i Y_i}{n}. \]

The CLT can thus be used to derive sampling distributions for many statistics which do not initially look at all like X̄ for a single variable in a random sample.

How large is ‘large n’?

The larger the sample size n, the better the normal approximation provided by the CLT is.

In practice, we have various rules-of-thumb for what is ‘large enough’ for the approximation to be ‘accurate enough’.


This also depends on the population distribution of Xi. For example,

• for symmetric distributions, even small n is enough

• for very skewed distributions, larger n is required.

For many distributions, n > 30 is sufficient for the approximation to be reasonably accurate.

16.3.1 CLT examples

In the first example, random samples (not shown here) of sizes

n = 1, 5, 10, 30, 100 and 1,000

were simulated from an ‘Exponential’ distribution (for which µ = 4 and σ² = 16). This is a positively-skewed distribution, as shown by the histogram for n = 1 in Figure 16.8. Although we will not cover the exponential distribution formally in this course, it is an interesting distribution since it describes the waiting time between events in a Poisson process.

Ten thousand samples of each size were generated. Histograms of the values of X̄ in these samples are shown in Figure 16.8. Each plot also shows the approximating Normal distribution N(4, 16/n). The normal approximation is reasonably good already for n = 30, very good for n = 100 and practically perfect for n = 1,000.

Figure 16.8: Distributions of X̄ from an Exponential distribution for which µ = 4, for different n (panels: n = 1, 5, 10, 30, 100 and 1,000).
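A simulation in the spirit of this first example can be sketched as follows. It is an illustration under our own assumptions rather than the code behind Figure 16.8: the seed is arbitrary, n = 1,000 is omitted to keep the run short, and, instead of drawing histograms, it reports the skewness of the simulated sample means, which should fall towards 0 (the value for a Normal distribution) as n grows.

```python
import random
from statistics import fmean, pstdev

random.seed(4)  # arbitrary seed so that reruns give the same numbers

MU = 4.0               # mean of the Exponential population (so its variance is 16)
REPLICATIONS = 10_000  # samples generated for each sample size

def skewness(values):
    """Average cubed standardised deviation; 0 for a symmetric distribution."""
    m = fmean(values)
    s = pstdev(values, mu=m)
    return fmean(((v - m) / s) ** 3 for v in values)

summary = {}
for n in (1, 5, 10, 30, 100):
    # random.expovariate takes the rate, i.e. 1 / mean.
    means = [
        fmean(random.expovariate(1 / MU) for _ in range(n))
        for _ in range(REPLICATIONS)
    ]
    summary[n] = (fmean(means), pstdev(means) ** 2, skewness(means))
    print(f"n = {n:3d}: mean = {summary[n][0]:.3f}, variance = {summary[n][1]:.3f} "
          f"(16/n = {16 / n:.3f}), skewness = {summary[n][2]:.2f}")
```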

In the second example, 10,000 random samples (again, not shown here) of sizes

n = 1, 10, 30, 50, 100 and 1,000

were simulated from the Bernoulli(0.2) distribution (for which µ = 0.2 and σ² = 0.2 · (1 − 0.2) = 0.16).

Here the distribution of Xi itself is not even continuous, and has only two possible values, 0 and 1. Nevertheless, the sampling distribution of X̄ can be well-approximated by the Normal distribution, when n is large enough, as shown in Figure 16.9.


Note that since here Xi = 1 or Xi = 0 for all i, \(\bar{X} = \sum_{i=1}^{n} X_i / n = m/n\), where m is the number of observations for which Xi = 1. In other words, X̄ is the sample proportion of the value X = 1.

The normal approximation is clearly very bad for small n, but reasonably good already for n = 50.

Figure 16.9: Distributions of X̄ from Bernoulli(0.2), for different n (panels: n = 1, 10, 30, 50, 100 and 1,000).
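For the Bernoulli example no simulation is needed, because m = nX̄ has a Binomial(n, 0.2) distribution whose probabilities can be computed exactly. The sketch below, which is our own illustration and not taken from the guide, measures the largest gap between the exact cumulative probabilities P(X̄ ≤ k/n) and the approximating N(0.2, 0.16/n) values. No continuity correction is applied, so part of the gap simply reflects the discreteness of X̄.

```python
import math
from statistics import NormalDist

P = 0.2  # probability that X_i = 1

def binom_pmf(k, n):
    """P(m = k) for m ~ Binomial(n, P), computed on the log scale for stability."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(P) + (n - k) * math.log(1 - P)
    )
    return math.exp(log_pmf)

def max_cdf_error(n):
    """Largest |exact P(sample mean <= k/n) - Normal approximation| over k = 0, ..., n."""
    approx = NormalDist(P, math.sqrt(P * (1 - P) / n))
    cumulative, worst = 0.0, 0.0
    for k in range(n + 1):
        cumulative += binom_pmf(k, n)
        worst = max(worst, abs(cumulative - approx.cdf(k / n)))
    return worst

errors = {n: max_cdf_error(n) for n in (1, 10, 30, 50, 100, 1000)}
for n, err in errors.items():
    print(f"n = {n:4d}: largest error in cumulative probability = {err:.3f}")
```

The error should shrink steadily as n increases, in line with the histograms described above.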

Activity 16.4 Consider the population below with N = 4 elements.

A   B   C   D
3   6   9   12

(a) Calculate the population mean and variance.

(b) Write down the sampling distribution of the sample mean for samples of size n = 2.

(c) Using the result in (b), calculate the mean of the sampling distribution.

(d) Using the result in (b), calculate the variance of the sampling distribution.

(e) Use the formula for the variance of the sample mean to verify the relationship between the value from (d) and the population variance.

16.4 Summary

The final probability unit covered the key points relating to the Normal distribution. We saw how to calculate probabilities for this distribution by considering areas under its curve. We then proceeded to explain the concept of a sampling distribution and its importance when estimating an unknown parameter such as a population mean when sampling from a Normal population and, by way of the Central Limit Theorem, when sampling from non-Normal populations.


16.5 Key terms and concepts

Central Limit Theorem
Expected value
Frequency distribution
Normal distribution
Parameters
Point estimate
Precision
Sample proportion
Sampling distribution
Sampling fraction
Sampling variance
Selection bias
Simple random sample
Standard error
Standardised variable
Unbiased estimator

Learning outcomes

At the end of this unit, you should be able to:

compute areas under the curve for a Normal distribution

explain what a sampling distribution represents

state and apply the Central Limit Theorem

discuss the key terms and concepts introduced in this unit.

Exercises

Exercise 16.1

The random variable X has a Normal distribution with mean µ and variance σ², i.e. X ∼ N(µ, σ²). It is known that

P (X ≤ 66) = 0.0359 and P (X ≥ 81) = 0.1151.

(a) Give a clearly-labelled sketch to represent these probabilities on a normal curve.

(b) Show that the value of σ = 5.

(c) Find P (69 ≤ X ≤ 83).

Exercise 16.2

A random variable takes the values 1, 2 and 3, each with equal probability. List all possible samples of size two that may be chosen, without replacement, from this population and hence construct the sampling distribution of the sample mean, X̄.

Exercise 16.3

The weights of a large group of animals have mean 8.2 kg and standard deviation 2.2 kg. What is the probability that a random selection of 80 animals from the group will have mean weight between 8.3 kg and 8.4 kg? State any assumptions you make.


Exercise 16.4

A perfectly machined regular tetrahedral (pyramid-shaped) die has four faces labelled 1 to 4. It is tossed twice onto a level surface and after each toss the number on the face which is downwards is recorded. If the recorded values are x1 and x2 and the mean is x̄ = (x1 + x2)/2, describe the distribution of x̄ as a random quantity over repeated double tosses.

Exercise 16.5

A Normal distribution has a mean of 40. If 10% of the distribution falls between values of 50 and 60, what is the standard deviation of the distribution?

Exercise 16.6

Consider the following set of data. Does it appear to approximately follow a Normal distribution? Justify your answer.

45   31   37   55   54   56
48   54   52   55   52   51
49   46   62   38   45   48
47   46   40   61   50   58
46   35   36   59   50   48
39   48   51   52   43   45

Exercise 16.7

Discuss the differences or similarities between a sampling distribution of size 5 and a single (simple) random sample of size 5.

Exercise 16.8

The distribution of salaries of lecturers in a university is positively skewed, with most lecturers earning near the minimum of the pay scale. What would a sampling distribution of size 2 look like? How about size 5? How about size 50?

Exercise 16.9

In no more than 200 words, explain the term ‘Central Limit Theorem’.


Unit 17: Sampling and experimentation I
Sampling techniques and contact methods

Overview

Statistics concerns data analysis, but to do any analysis first we need data! This unit explores various methods which social scientists can use to gather data. Central to this is the concept of sampling: the (possibly random) selection of a sample of members from an underlying population. From our sample we can then make inferences about the population. We begin by describing a range of sampling techniques, outlining their relative advantages and disadvantages, and then consider the possible contact methods which might be used.

Aims

This unit presents random and non-random sampling techniques and survey contact methods. Particular aims are:

to provide an overview of sampling in the social sciences

to outline the advantages and disadvantages of various sampling techniques and survey contact methods.

Background reading

+ Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Statistics’ Chapter 8.

17.1 Sampling

Sampling is a key component of any research design. The key to the use of statistics in research is being able to take data from a sample and make inferences about a large population. This idea is depicted in Figure 17.1.

Sampling design involves several basic questions:

Should a sample be taken?

If so, what process should be followed?


Figure 17.1: A depiction of inferring population characteristics from a sample drawn from the population of interest.

What kind of sample should be taken?

How large should it be?

What can be done to control and adjust for non-response errors?

We now consider how to answer these questions.

Sample or census?

We introduce some important terminology:

Population: The aggregate of all the elements, sharing some common set of characteristics, that comprise the universe for the purpose of the social science problem.

Census: A complete enumeration of the elements of a population of study objects.

Sample: A subgroup of the elements of the population selected for participation in the study.

To determine whether a sample or a census should be conducted, various factors need to be considered. For example:

A census is very costly, so a large budget would be required, whereas a small budget favours a sample because fewer population elements are observed.

The length of time available for the study is important: a sample is far quicker to perform.

How big is the population? If it is ‘small’, then it is feasible to conduct a census (it would be neither too costly nor too time-consuming). However, it might not be practical to enumerate a ‘large’ population.


We will be interested in some particular characteristic, such as the heights of a group of adults. If there is a small variance in the characteristic of interest, then population elements are ‘similar’, so we only need to observe a few elements to have a clear idea about the characteristic. If the variance is large, then a sample may fail to capture the large dispersion in the population, hence a census would be more appropriate.

Sampling errors occur when the sample fails to adequately represent the population. If the consequences of making sampling errors are extreme (i.e. the ‘cost’ is high), then a census would appeal more since it eliminates sampling errors completely.

If non-sampling errors are costly (for example, an interviewer incorrectly questioning respondents) then a sample is better because fewer resources would have been spent on collecting the data.

Measuring sampled elements may result in the destruction of the object, such as testing the road-life of a tyre. Clearly, in such cases a census is not feasible as there would be no tyres left to sell!

Sometimes we may wish to perform an in-depth interview to study elements in great detail. If we want to focus on detail, then time and budget constraints would favour a sample.

The conditions which favour the use of a sample or census are summarised in Table 17.1. Of course, in practice, some of our factors may favour a sample while others favour a census, in which case a balanced judgement is required.

                                 Conditions favouring the use of:
Factors                          Sample         Census
Budget                           Small          Large
Time available                   Short          Long
Population size                  Large          Small
Variance in the characteristic   Small          Large
Cost of sampling errors          Low            High
Cost of non-sampling errors      High           Low
Nature of measurement            Destructive    Non-destructive
Attention to individual cases    Yes            No

Table 17.1: Sample versus census.

Activity 17.1 Under what conditions would a sample be preferable to a census and a census be preferable to a sample?

Classification of sampling techniques

We draw a sample from the target population, which is the collection of elements or objects that possess the information sought by the researcher and about which inferences are to be made. We now consider the different types of sampling techniques which can be used in practice, which can be decomposed into ‘non-probability sampling techniques’ and ‘probability sampling techniques’.

Non-probability sampling techniques are characterised by the fact that some units in the population do not have a chance of selection in the sample. Individual units in the population have an unknown probability of being selected. There is also an inability to measure sampling error. Examples of such techniques are:

• convenience sampling

• judgemental sampling

• quota sampling

• snowball sampling.

Probability sampling techniques mean every population element has a known, non-zero probability of being selected in the sample. Probability sampling makes it possible to estimate the margins of sampling error, and therefore all statistical techniques (such as confidence intervals and hypothesis tests, which are not considered in this course) can be applied. In order to perform probability sampling, we need a sampling frame which is a list of all population elements. However, we need to consider whether the sampling frame is (i) adequate (does it represent the target population?), (ii) complete (are there any missing units, or duplications?), (iii) accurate (are we researching dynamic populations?), and (iv) convenient (is the sampling frame readily accessible?). Examples of such techniques are:

• simple random sampling

• systematic sampling

• stratified sampling

• cluster sampling

• multistage sampling.

We now consider each of the listed techniques, explaining their strengths and weaknesses. To illustrate each, we will use the example of 25 students (labelled ‘1’ to ‘25’) who happen to be in a particular class (labelled ‘A’ to ‘E’) as follows:

A     B     C     D     E
1     6     11    16    21
2     7     12    17    22
3     8     13    18    23
4     9     14    19    24
5     10    15    20    25

17.1.1 Non-probability sampling techniques

Convenience sampling

Convenience sampling attempts to obtain a sample of convenient elements (hence the name!). Often, respondents are selected because they happen to be in the right place at the right time. Examples include using students and members of social organisations; ‘people-in-the-street’ interviews.

Suppose class D happens to assemble at a convenient time and place, so all elements (students) in this class are selected. The resulting sample consists of students 16, 17, 18, 19 and 20. Note that no students are selected from classes A, B, C and E.

A     B     C     D     E
1     6     11    [16]  21
2     7     12    [17]  22
3     8     13    [18]  23
4     9     14    [19]  24
5     10    15    [20]  25

Strengths of convenience sampling include being the cheapest, quickest and most convenient form of sampling. Weaknesses include selection bias (discussed later) and lack of a representative sample.

Judgemental sampling

Judgemental sampling is a form of convenience sampling in which the population elements are selected based on the judgement of the researcher. Examples include purchase engineers being selected in industrial market research; expert witnesses used in court.

Suppose a researcher believes classes B, C and E to be ‘typical’ and ‘convenient’. Within each of these classes one or two students are selected based on typicality and convenience. The resulting sample here consists of students 8, 10, 11, 13 and 24. Note that no students are selected from classes A and D.

A     B     C     D     E
1     6     [11]  16    21
2     7     12    17    22
3     [8]   [13]  18    23
4     9     14    19    [24]
5     [10]  15    20    25

Judgemental sampling is achieved at low cost, is convenient, is not particularly time-consuming and is good for ‘exploratory’ research designs. However, it does not allow generalisations and is subjective due to the judgement of the researcher.

Quota sampling

Quota sampling may be viewed as two-stage restricted judgemental sampling. The first stage consists of developing control categories, or quota controls, of population elements. In the second stage, sample elements are selected based on convenience or judgement. An example might be using gender as our control characteristic. If, say, 48% of the population is male (and hence 52% is female), then we would want the sample composition to reflect this. See Table 17.2, which assumes a required sample size of 1,000, meaning that 48% of the sample (480) should be male and 52% of the sample (520) should be female.

Control characteristic:   Population     Sample         Sample
Gender                    composition    composition    size
Male                      48%            48%            480
Female                    52%            52%            520
Total                     100%           100%           1,000

Table 17.2: Example of using gender as a quota control.
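The arithmetic behind Table 17.2 can be sketched in a few lines of Python. This is only an illustration: the function name `quota_sizes` is hypothetical, and simple rounding is assumed to be acceptable (with awkward percentages the rounded quotas might not add up exactly to the required sample size).

```python
# Quota sizes for a required sample of 1,000, using the shares in Table 17.2.
population_share = {"Male": 0.48, "Female": 0.52}
required_sample_size = 1000

def quota_sizes(shares, n):
    """Return how many sample members are needed in each control category."""
    return {category: round(share * n) for category, share in shares.items()}

quotas = quota_sizes(population_share, required_sample_size)
print(quotas)  # {'Male': 480, 'Female': 520}
```

Within each quota, the interviewer would still choose respondents by convenience or judgement, which is why quota sampling remains a non-probability technique.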

Suppose a quota of one student from each class is imposed. Within each class, one student is selected based on judgement or convenience. The resulting sample consists of students 3, 6, 13, 20 and 22.

A     B     C     D     E
1     [6]   11    16    21
2     7     12    17    [22]
[3]   8     [13]  18    23
4     9     14    19    24
5     10    15    [20]  25

Quota sampling is advantageous in that a sample can be controlled for certain characteristics. However, it suffers from selection bias and there is no guarantee of representativeness of the sample.

Snowball sampling

In snowball sampling, an initial group of respondents is selected, usually at random. After being interviewed, these respondents are asked to identify others who belong to the target population of interest. Subsequent respondents are selected based on these referrals.

Suppose students 2 and 9 are selected randomly from classes A and B. Student 2 refers students 12 and 13, while student 9 refers student 18. The resulting sample consists of students 2, 9, 12, 13 and 18. Note here, there are no students from class E included in the sample.

A     B     C     D     E
1     6     11    16    21
[2]   7     [12]  17    22
3     8     [13]  [18]  23
4     [9]   14    19    24
5     10    15    20    25

Snowball sampling has the major advantage of being able to increase the chance of locating the desired characteristic in the population and is also fairly cheap. However, it can be time-consuming.


Activity 17.2 What is the major difference between judgemental and convenience sampling? Give examples of where each of these techniques may be successfully applied.

17.1.2 Probability sampling techniques

Simple random sampling (SRS)

In a simple random sample, each element in the population has a known and equal probability of selection. Each possible sample of a given size, n, has a known and equal probability of being the sample which is actually selected. This implies that every element is selected independently of every other element.

Suppose we select five random numbers (using a ‘random number generator’) from 1 to 25. Suppose the random number generator returns 3, 7, 9, 16 and 24. The resulting sample therefore consists of students 3, 7, 9, 16 and 24. Note here, there is no student from class C.

A     B     C     D     E
1     6     11    [16]  21
2     [7]   12    17    22
[3]   8     13    18    23
4     [9]   14    19    [24]
5     10    15    20    25

SRS is simple to understand and results are readily projectable. However, there may be difficulty constructing the sampling frame, lower precision (relative to other probability sampling methods) and there is no guarantee of sample representativeness.
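As a minimal sketch of the ‘random number generator’ step, the following Python code draws a simple random sample of five students from the sampling frame 1 to 25. The function name `simple_random_sample` and the `seed` argument are illustrative assumptions; the seed is there only so that the sketch can be repeated, and the students it returns will generally differ from those in the example above.

```python
import random

# Sampling frame: the 25 students of the running example, labelled 1 to 25.
sampling_frame = list(range(1, 26))

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(frame, n))

srs = simple_random_sample(sampling_frame, 5, seed=1)
print(srs)
```

Because the draw is made without replacement, no student can appear twice, but nothing forces every class to be represented.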

Systematic sampling

In systematic sampling, the sample is chosen by selecting a random starting point and then picking every i-th element in succession from the sampling frame. The sampling interval, i, is determined by dividing the population size, N, by the sample size, n, and rounding to the nearest integer. When the ordering of the elements is related to the characteristic of interest, systematic sampling increases the representativeness of the sample. If the ordering of the elements produces a cyclical pattern, systematic sampling may decrease the representativeness of the sample.

For example, suppose there are 100,000 elements in the population and a sample of 1,000 is required. In this case the sampling interval is i = N/n = 100,000/1,000 = 100. A random number between 1 and 100 is selected. If, for example, this number is 23, the sample consists of elements 23, 123, 223, 323, 423, 523 and so on.

Suppose we select a random number between 1 and 5, say 2. The resulting sample of students consists of students 2, 2 + 5 = 7, 2 + 5 × 2 = 12, 2 + 5 × 3 = 17 and 2 + 5 × 4 = 22. Note here, all the students are selected from a single row.


A     B     C     D     E
1     6     11    16    21
[2]   [7]   [12]  [17]  [22]
3     8     13    18    23
4     9     14    19    24
5     10    15    20    25

Systematic sampling may or may not increase representativeness: it depends on whether there is any ‘ordering’ in the sampling frame. It is easier to implement than SRS.
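The two systematic sampling examples above can be reproduced with a short Python sketch. The function name `systematic_sample` and its `start` and `seed` arguments are illustrative assumptions. The sketch also assumes that N is a multiple of n; if it is not, the rounded interval could produce labels beyond the end of the sampling frame.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Pick a start in 1..i, then take every i-th element of the frame."""
    interval = round(population_size / sample_size)  # sampling interval i = N/n
    if start is None:
        # Random starting point between 1 and i (inclusive).
        start = random.Random(seed).randint(1, interval)
    return [start + interval * k for k in range(sample_size)]

students = systematic_sample(25, 5, start=2)
print(students)  # [2, 7, 12, 17, 22]

large_survey = systematic_sample(100_000, 1_000, start=23)
print(large_survey[:6])  # [23, 123, 223, 323, 423, 523]
```

Only the starting point is random, so there are just i possible samples; this is why any cyclical pattern in the frame with period i can distort the result.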

Stratified sampling

Stratified sampling is a two-step process in which the population is partitioned (divided up) into sub-populations known as strata∗. The strata should be mutually exclusive and collectively exhaustive in that every population element should be assigned to one and only one stratum and no population elements should be omitted. Next, elements are selected from each stratum by a random procedure, usually SRS. A major objective of stratified sampling is to increase the precision of statistical inference without increasing cost.

The elements within a stratum should be as homogeneous as possible (i.e. as similar as possible), but the elements between strata should be as heterogeneous as possible (i.e. as different as possible). The stratification factors should also be closely related to the characteristic of interest. Finally, the factors (variables) should decrease the cost of the stratification process by being easy to measure and apply.

In ‘proportionate stratified sampling’, the size of the sample drawn from each stratum is proportional to the relative size of that stratum in the total population. In ‘disproportionate (optimal) stratified sampling’, the size of the sample from each stratum is proportional to the relative size of that stratum and to the standard deviation of the distribution of the characteristic of interest among all the elements in that stratum.
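To make the two allocation rules concrete, the sketch below compares them for three strata. All the numbers (stratum sizes, standard deviations and the total sample size of 100) are hypothetical, as are the function names; the disproportionate rule is read here as making each stratum sample size proportional to the product of the stratum size and its standard deviation.

```python
# Hypothetical strata: sizes and standard deviations of the characteristic.
stratum_sizes = {"A": 500, "B": 300, "C": 200}
stratum_sds = {"A": 10, "B": 20, "C": 40}
total_sample_size = 100

def proportionate_allocation(sizes, n):
    """Sample size in each stratum proportional to the stratum size."""
    total = sum(sizes.values())
    return {h: round(n * size / total) for h, size in sizes.items()}

def disproportionate_allocation(sizes, sds, n):
    """Sample size proportional to stratum size times standard deviation."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    return {h: round(n * w / total) for h, w in weights.items()}

proportionate = proportionate_allocation(stratum_sizes, total_sample_size)
optimal = disproportionate_allocation(stratum_sizes, stratum_sds, total_sample_size)
print(proportionate)  # {'A': 50, 'B': 30, 'C': 20}
print(optimal)        # {'A': 26, 'B': 32, 'C': 42}
```

In this made-up case the smallest stratum, C, receives the largest share under the disproportionate rule because its elements vary the most.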

Suppose we randomly select a number from 1 to 5 for each class (stratum) A to E. This might result, say, in the stratified sample consisting of students 4, 7, 13, 19 and 21. Note here, one student is selected from each class.

A     B     C     D     E
1     6     11    16    [21]
2     [7]   12    17    22
3     8     [13]  18    23
[4]   9     14    [19]  24
5     10    15    20    25

Stratified sampling includes all important sub-populations and ensures a high level of precision. However, sometimes it might be difficult to select relevant stratification factors and the stratification process itself might not be feasible in practice if it was not known to which stratum each population element belonged.
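The class example, with one student drawn at random from each stratum, might be sketched in Python as follows. The names `classes` and `stratified_sample` are illustrative, and the seed is used only to make the sketch repeatable, so the selected students will generally differ from those in the example.

```python
import random

# Strata: the five classes of the running example.
classes = {
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, 10],
    "C": [11, 12, 13, 14, 15],
    "D": [16, 17, 18, 19, 20],
    "E": [21, 22, 23, 24, 25],
}

def stratified_sample(strata, per_stratum, seed=None):
    """Draw a simple random sample of fixed size within every stratum."""
    rng = random.Random(seed)
    return {name: sorted(rng.sample(members, per_stratum))
            for name, members in strata.items()}

stratified = stratified_sample(classes, per_stratum=1, seed=7)
print(stratified)
```

Unlike SRS over the whole frame, every class is guaranteed to appear in the sample.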

∗‘Strata’ is the plural of ‘stratum’.


Cluster sampling

In cluster sampling, the target population is first divided into mutually exclusive and collectively exhaustive sub-populations known as clusters. Then a random sample of clusters is selected, based on a probability sampling technique such as SRS. For each selected cluster, either all the elements are included in the sample (one-stage cluster sampling), or a sample of elements is drawn probabilistically (two-stage cluster sampling).

Elements within a cluster should be as heterogeneous as possible, but clusters themselves should be as homogeneous as possible. Ideally, each cluster should be a small-scale representation of the population. In ‘probability proportionate to size sampling’, the clusters are sampled with probability proportional to size. In the second stage, the probability of selecting a sampling unit in a selected cluster varies inversely with the size of the cluster.
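One consequence of combining these two stages is that every element can end up with the same overall chance of selection. The sketch below checks this with exact fractions. The cluster sizes, the number of clusters selected and the number of elements taken per cluster are all hypothetical, and the first-stage inclusion probability is assumed to be (number of clusters selected) × (cluster size) / (population size), which requires that no cluster is so large that this exceeds 1.

```python
from fractions import Fraction

# Hypothetical clusters and design settings (not taken from the text).
cluster_sizes = {"A": 100, "B": 200, "C": 300, "D": 400}
clusters_selected = 2       # clusters drawn at stage 1
elements_per_cluster = 10   # elements drawn within each selected cluster

population_size = sum(cluster_sizes.values())
overall_probability = {}
for name, size in cluster_sizes.items():
    # Stage 1: chance that the cluster is selected, proportional to its size.
    p_cluster = Fraction(clusters_selected * size, population_size)
    # Stage 2: chance of an element within the cluster, inverse to its size.
    p_within = Fraction(elements_per_cluster, size)
    overall_probability[name] = p_cluster * p_within

print(overall_probability)  # every value is Fraction(1, 50)
```

The cluster size cancels between the two stages, so elements in large and small clusters are treated alike.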

Suppose we randomly select three clusters: B, D and E. Within each cluster, randomly select one or two elements. The resulting sample here consists of students 7, 18, 20, 21 and 23. Note here, no students are selected from clusters A and C.

A     B     C     D     E
1     6     11    16    [21]
2     [7]   12    17    22
3     8     13    [18]  [23]
4     9     14    19    24
5     10    15    [20]  25

Cluster sampling is easy to implement and cost effective. However, the technique suffers from a lack of precision and it can be difficult to compute and interpret results.
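One-stage and two-stage cluster sampling can be contrasted with a small Python sketch based on the five classes. The function name `cluster_sample` and its arguments are illustrative assumptions; for simplicity the two-stage version takes the same number of students from every selected cluster, whereas the example above takes one or two.

```python
import random

# Clusters: the five classes of the running example.
clusters = {
    "A": [1, 2, 3, 4, 5],
    "B": [6, 7, 8, 9, 10],
    "C": [11, 12, 13, 14, 15],
    "D": [16, 17, 18, 19, 20],
    "E": [21, 22, 23, 24, 25],
}

def cluster_sample(groups, n_clusters, n_per_cluster=None, seed=None):
    """Stage 1: SRS of clusters. Stage 2 (optional): SRS within each one."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(sorted(groups), n_clusters))
    sample = {}
    for name in chosen:
        members = groups[name]
        if n_per_cluster is None:
            sample[name] = list(members)  # one-stage: take every element
        else:
            sample[name] = sorted(rng.sample(members, n_per_cluster))
    return sample

one_stage = cluster_sample(clusters, n_clusters=3, seed=5)
two_stage = cluster_sample(clusters, n_clusters=3, n_per_cluster=2, seed=5)
print(one_stage)
print(two_stage)
```

In both versions the classes that are not drawn at the first stage contribute no students at all.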

Multistage sampling

In multistage sampling, sample selection is performed at two or more successive stages. This technique is often adopted in large surveys. At the first stage, large ‘compound’ units are sampled (primary units), and several sampling stages of this type may be performed until we at last sample the basic units.

The technique is commonly used in cluster sampling so that we are at first sampling main clusters, and then clusters within clusters etc. We can also use multistage sampling with mixed techniques; i.e. cluster sampling at Stage 1 and stratified sampling at Stage 2 etc.

An example might be a national survey of salespeople in a company. Sales areas could be identified and a random selection is taken from these. Instead of interviewing every person in the chosen clusters (which would be a one-stage cluster sample), only randomly selected salespeople within the chosen clusters will be interviewed.

Activity 17.3 How do probability sampling techniques differ from non-probability sampling techniques? What factors should be considered in choosing between probability and non-probability sampling?


17.1.3 Method of contact

We conclude this section with a short discussion of contact methods. Once the sampling procedure has been chosen, researchers have a choice of contact method for the survey/interview. The most common methods of contact are face-to-face interview, telephone interview and online/postal/mail (so-called ‘self-completion’) interview. In most countries you can assume the following:

An interviewer-administered face-to-face questionnaire will be the most expensive to carry out.

Telephone surveys depend very much on whether your target population is on the telephone (and how good the telephone system is).

Self-completion questionnaires can have a low response rate.

We now explore some† of the advantages and disadvantages of various contact methods:

Face-to-face interview:

• Advantages: Good for personal questions; allows for probing issues in greater depth; permits difficult concepts to be explained; can show samples (such as new product designs).

• Disadvantages: (Very) expensive; not always easy to obtain detailed information on the spot.

Telephone interview:

• Advantages: Easy to achieve a large number of interviews; easy to check on quality of interviewers (through a central switchboard perhaps).

• Disadvantages: Not everyone has a telephone so the sample can be biased; cannot usually show samples; although telephone directories exist for landline numbers, what about mobile numbers? Also, young people are more likely to use mobiles rather than landlines, so are more likely to be excluded.

Self-completion interview:

• Advantages: Most people can be contacted this way (there will be little non-response due to not-at-home reasons); allows time for people to look up details such as income, tax returns etc.

• Disadvantages: High non-response rate, since it requires effort to complete the questionnaire; later questions may influence answers to earlier questions since the whole questionnaire is revealed to the respondent (this is important where the order of a questionnaire matters); you have no control over who answers the questionnaire.

Example 17.1

Examples of occasions when you might use a particular method are:

†This is not necessarily an exhaustive list. Can you add any more?


Face-to-face interviewer: a survey of shopping patterns

Here you need to be able to contact a sample of the whole population. You can assume that a large proportion would not bother to complete a postal questionnaire; after all, the subject matter is not very important and it takes time to fill in a form! Using a telephone would exclude those (for example, the poor and the elderly) who either do not have access to a telephone or are unwilling to talk to strangers by telephone.

Telephone interviewer: a survey of businessmen and their attitude to a new item of office equipment

All of them will have a telephone, and also the questions should be simple to ask. Here, booking time for a telephone interview at work (once it has been agreed with the administration) should be much more effective than waiting for a form to be filled in, or sending interviewers to disrupt office routine.

Postal/mail questionnaire: a survey of teachers about their pay and conditions

Here, on the spot interviews will not elicit the level of detail needed. Most people do not remember their exact pay and taxation, particularly if they are needed for earlier years. We would expect a high response rate and good-quality data: the recipients are motivated to reply, since they may be hoping for a pay rise! Also, the recipients come from a group of people who find it relatively easy to fill in forms without needing the help, or prompting, of an interviewer.

Remember that it is always possible to combine methods. The Family Expenditure Survey in the UK, for example, combines the approach of using an interviewer three times over a fortnight (to raise response and explain details) while the respondent household is required to fill in a self-completion diary (showing expenditure, which could not be obtained by interview alone).

Similarly, telephone interviews may be combined with a mail-shot sub-sampled individual survey, in the case of offices and businesses faxing additional information. In the case of the telephone survey of businessmen described above, a description of the new equipment could be faxed to the businessmen in advance, or as they are telephoned.

Remember also that email surveys are already widespread and becoming increasingly popular, although they are only appropriate when the population to be studied regularly uses email and is likely to reply to your questions, such as employees in your office. An obvious advantage is that this method is very cheap to administer.

17.2 Summary

This unit has described the different sampling techniques which exist when sampling from a population. It is important to know the merits and limitations of each so that a recommendation for the most suitable choice of method can be made dependent on the circumstances of the research problem. Attention has also been given to the choice of contact method, again with a focus on the strengths and weaknesses of each type.


17.3 Key terms and concepts

Census                   Cluster sampling
Contact methods          Convenience sampling
Judgemental sampling     Multistage sampling
Population               Quota controls
Quota sampling           Sample
Sampling design          Sampling frame
Sampling interval        Simple random sample
Snowball sampling        Stratified sampling
Systematic sampling      Target population

Learning outcomes

At the end of this unit, you should be able to:

design and conduct surveys in a social science context

discuss the relative merits and limitations of different sampling techniques

recommend an appropriate survey contact method

discuss the key terms and concepts introduced in this unit.

Exercises

Exercise 17.1

What is/are the main potential disadvantage(s) of quota sampling with respect to probability sampling?

Exercise 17.2

Why might disproportionate stratified sampling be preferable to proportionate stratified sampling?

Exercise 17.3

In no more than 200 words, discuss the relative advantages and disadvantages of telephone interviewing compared to face-to-face interviewing.

Exercise 17.4

The simplest probability-based sampling method is simple random sampling. Give two reasons why it may be desirable to use a sampling design which is more sophisticated than simple random sampling.

Exercise 17.5

What is the difference between one-stage and two-stage cluster sampling?


Exercise 17.6

A corporation wants to estimate the total number of worker-hours lost for a given month because of accidents among its employees. Each employee is classified into one of three categories: (a) labourer, (b) technician, and (c) administrator. Which sampling method do you think would be preferable here: simple random sampling, stratified sampling, or cluster sampling? Give arguments to explain your choice.

Exercise 17.7

What criteria would you use in deciding which form of contact to use in a survey of individuals?

Exercise 17.8

Discuss the feasibility of each of the types of survey contact methods (personal interview, postal survey, email, telephone survey) for a random sample of university students about their undergraduate experiences and attitudes at the end of the academic year.

Exercise 17.9

Retirement and Investment Services would like to conduct a survey on online users’ demands for additional internet retirement services. Outline your suggested sampling and contact method and explain how the results might be affected by your methodology.


18. Sampling and experimentation II — Bias and the design of experiments

Unit 18: Sampling and experimentation II
Bias and the design of experiments

Overview

This unit explores potential sources of bias which may occur as a result of sampling. Bias comes in various forms and potential remedies are presented. We conclude with a look at the design of experiments in the social sciences. Unlike observational studies, experiments are excellent for establishing causality through use of a control group.

Aims

This unit presents sources of bias and design of experiments. Particular aims are:

to be aware of different sources of error and bias

to provide an overview of experimentation in the social sciences

to introduce the notion of causality and how properly designed experiments can test for this.

Background reading

+ Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Statistics’ Chapter 8.

18.1 Introduction

We have previously seen that the term target population represents the collection of units (people, objects etc.) in which we are interested. In the absence of time and budgetary constraints we would conduct a census, that is, a total enumeration of the population. Its advantage is that there is no sampling error because all population units are observed and so there is no estimation of population parameters. Due to the large size, N, of most populations, an obvious disadvantage with a census is cost, so it is often not feasible in practice. Even with a census non-sampling error may occur, for example if we have to resort to using cheaper (hence less reliable) interviewers who may erroneously record data, misunderstand a respondent etc.

So we select a sample, that is, a certain number of population members are selected and studied. The selected members are known as elementary sampling units. Sample surveys (hereafter ‘surveys’) are how new data are collected on a population and tend to be based on samples rather than a census. Selected respondents may be contacted by a variety of methods such as face-to-face interviews, telephone, mail or email questionnaires.

Sampling error will occur (since not all population units are observed). However, non-sampling error should be less since resources can be used to ensure high quality interviewers or to check completed questionnaires.

18.2 Types of error

Several potential sources of error can affect a research design, and we do our utmost to control them. The ‘total error’ represents the variation between the true value of a parameter in the population of the variable of interest (such as a population mean) and the observed value obtained from the sample. Total error is composed of two distinct types of error in sampling design:

Sampling error: This occurs as a result of us selecting a sample, rather than performing a census (where a total enumeration of the population is undertaken).

• It is attributable to random variation due to the sampling scheme used.

• For probability sampling, we can estimate the statistical properties of the sampling error, i.e. we can compute (estimated) standard errors which facilitate the use of hypothesis testing and construction of confidence intervals.∗

Non-sampling error is a result of (inevitable) failures of the sampling scheme.

• In practice it is very difficult to quantify this sort of error; when it is attempted, it is typically done through separate investigation. We distinguish between two sorts of non-sampling error:

◦ Selection bias: this may be due to (i) the sampling frame not being equal to the target population, (ii) cases where the sampling scheme is not strictly adhered to, or (iii) non-response bias.

◦ Response bias: the actual measurements might be wrong, for example due to ambiguous question wording, misunderstanding of a word in a questionnaire by less-educated people, or sensitivity of information which is sought. Interviewer bias is another aspect of this, where the interaction between the interviewer and interviewee influences the response given in some way, either intentionally or unintentionally, such as through leading questions, the dislike of a particular social group by the interviewer, the interviewer’s manner or lack of training, or perhaps the loss of a batch of questionnaires from one local post office. These could all occur in an unplanned way and bias your survey badly.
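Sampling error, as opposed to non-sampling error, can be illustrated by simulation. The sketch below repeatedly draws simple random samples from a small population and measures how much the sample mean varies from one sample to the next. It is an illustration under stated assumptions: the student labels 1 to 25 from Unit 17 are used as stand-in values of the characteristic, and the function name, the number of repetitions and the seed are hypothetical choices.

```python
import random
import statistics

# Stand-in population: the values 1 to 25, with population mean 13.
population = list(range(1, 26))

def spread_of_sample_means(n, repetitions=2000, seed=0):
    """Standard deviation of the sample mean over repeated SRS of size n."""
    rng = random.Random(seed)
    means = [sum(rng.sample(population, n)) / n for _ in range(repetitions)]
    return statistics.pstdev(means)

for n in (2, 10, 25):
    print(n, round(spread_of_sample_means(n), 2))
```

The spread shrinks as the sample size grows and disappears when n = 25, since a ‘sample’ of the whole population is a census. Nothing in this simulation captures non-sampling error, which would remain even with a census.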

Both kinds of error can be controlled or allowed for more effectively by a pilot survey. A pilot survey is used:

∗Note hypothesis testing and confidence intervals are not explicitly covered in this course.


to find the standard error which can be attached to different kinds of questions and hence to underpin the sampling design chosen, and

to sort out non-sampling questions:

• Do people understand the questionnaires?

• Are our interviewers working well?

• Are there particular organisational problems associated with this enquiry?

Activity 18.1 Give the main problem with the wording of the survey question: ‘Do you want to be rich and famous?’.

18.3 Bias

Bias caused by non-response and response is worth a special mention. It can cause problems at every stage of a survey, both random and quota, and however administered.

The first problem can be in the sampling frame. Is an obvious group missing? For example:

if the list is of householders, those who have just moved in will be missing

if the list is of those aged 18 or over on the electoral register, and the under-20s are careless about registration, then younger people will be missing from the sample.

In the field, non-response (data not provided by a unit that we wish to sample) is one of the major problems of sample surveys as the non-respondents, in general, cannot be treated like the rest of the population. As such, it is most important to try to get a picture of any shared characteristics in those refusing to answer or people who are not available at the time of the interview. We can classify non-response as follows:

Item non-response occurs when a sampled member fails to respond to a question in the questionnaire.

Unit non-response occurs when no information is collected from a sample member.

Non-response may be due to any of the following factors:

Not-at-home due to work commitments, or on holiday.

Refusals due to subject matter, or sponsorship of the survey.

Incapacity to respond due to illness, or language difficulties.

Not found due to vacant houses, incorrect addresses, moved on.

Lost schedules due to information being lost or destroyed after it had been collected.

How should we deal with non-response? Well, note that increasing the sample size will not solve the problem — the only outcome would be that we have more data on the types of individuals who are willing to respond! Instead, we might look at improving our survey procedures such as data collection and interviewer training. Non-respondents could be followed up using call-backs, or an alternative contact method to the original survey in an attempt to sub-sample the non-respondents. A proxy interview (where a unit from your sample is substituted with an available unit) may be another possibility. (Note that non-response also occurs in quota sampling but is not generally recorded — see the earlier discussion.) However, an obvious remedy is to provide an incentive (for example cash or entry into a prize draw) to complete the survey — this exploits the notion that human behaviour can be influenced in response to the right incentives!

Response error is very problematic because it is not so easy to detect. A seemingly clear reply may be based on a misunderstanding of the question asked or a wish to deceive. A good example from the UK is the reply to the question about the consumption of alcohol in the Family Expenditure Survey. Over the years there is up to a 50% understatement of alcohol use compared with the overall known figures for sales from HM Revenue & Customs!

Sources of response error include:

Role of the interviewer due to the characteristics and/or opinions of the interviewer, asking leading questions and the incorrect recording of responses.

Role of the respondent who may lack knowledge, forget information or be reluctant to give the correct answer due to the sensitivity of the subject matter.

Control of response errors typically involves improving the recruitment, training and supervision of interviewers, re-interviewing, consistency checks and increasing the number of interviewers.

In relation to all these problems, pilot work is very important. It may also be possible to carry out a check on the interviewers and methods used after the survey (post-enumeration surveys).

18.4 Adjusting for non-response

Low response rates increase the probability that non-response bias will be problematic. Response rates should always be reported, and, whenever possible, the effects of non-response should be estimated. This is possible by linking the response rate to estimated differences between respondents and non-respondents. Information on differences between both groups may be obtained from the sample itself, for example differences identified through call-backs could be extrapolated, or perhaps a concentrated follow-up could be performed on a sub-sample of non-respondents.

However, it may be that it is not possible to estimate the effects of non-response. In such instances we can make adjustments during data analysis and interpretation. We now briefly consider some possible adjustments for non-response.

Sub-sampling of non-respondents — the researcher contacts a sub-sample of the non-respondents, usually by means of telephone or personal interviews.


Replacement — the non-respondents in the current survey are replaced with non-respondents from an earlier, similar survey. The researcher attempts to contact these non-respondents from the earlier survey and administer the current survey questionnaire to them, possibly by offering a suitable incentive.

Substitution — the researcher substitutes for non-respondents other elements from the sampling frame that are expected to respond. The sampling frame is divided into sub-groups that are internally homogeneous in terms of respondent characteristics but heterogeneous in terms of response rates. These subgroups are then used to identify substitutes who are similar to particular non-respondents but dissimilar to respondents already in the sample.

Subjective estimates — when it is no longer feasible to increase the response rate by sub-sampling, replacement, or substitution, it may be possible to arrive at subjective estimates of the nature and effect of non-response bias. This involves evaluating the likely effects of non-response based on experience and available information.

Trend analysis — this is an attempt to discern a trend between early and late respondents. This trend is projected to non-respondents to estimate where they stand on the characteristic of interest.

Weighting — attempts to account for non-response by assigning differential weights to the data depending on the response rates. For example, in a survey the response rates were 85%, 70% and 40%, respectively, for the high-, medium- and low-income groups. In analysing the data, these subgroups are assigned weights inversely proportional to their response rates. That is, the weights assigned would be 100/85, 100/70 and 100/40, respectively, for the high-, medium- and low-income groups.

Imputation — involves imputing, or assigning, the characteristic of interest to the non-respondents based on the similarity of the variables available for both non-respondents and respondents. For example, a respondent who does not report brand usage may be imputed the usage of a respondent with similar demographic characteristics.
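The weighting adjustment described above can be sketched in a few lines of Python. This is only an illustration: the response rates are those quoted in the text, but the respondent counts (which assume 100 people were sampled in each income group) and the group means are hypothetical values invented for the example.

```python
# Response rates from the weighting example in the text.
response_rates = {"high": 0.85, "medium": 0.70, "low": 0.40}

# Weights inversely proportional to the response rates (100/85, 100/70, 100/40).
weights = {group: 1 / rate for group, rate in response_rates.items()}

# Hypothetical results (NOT from the text): respondents per group, assuming
# 100 people were sampled in each group, and the mean answer in each group.
respondents = {"high": 85, "medium": 70, "low": 40}
group_means = {"high": 50.0, "medium": 30.0, "low": 10.0}

def weighted_mean(respondents, group_means, weights):
    """Mean in which every respondent counts with the weight of their group."""
    total = sum(respondents[g] * weights[g] * group_means[g] for g in respondents)
    weight_sum = sum(respondents[g] * weights[g] for g in respondents)
    return total / weight_sum

unweighted = (sum(respondents[g] * group_means[g] for g in respondents)
              / sum(respondents.values()))
adjusted = weighted_mean(respondents, group_means, weights)
print(round(unweighted, 2), round(adjusted, 2))
```

With these made-up figures the weights restore each group to an effective 100 units, so the adjusted mean is 30, whereas the unweighted mean (about 34.6) is pulled towards the high-income group that responded most readily.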

Activity 18.2 Non-response in surveys is considered to be problematic.

(a) Give at least two possible reasons why non-response may occur.

(b) Why is non-response problematic for the person or organisation conducting the research? Give two reasons.

(c) How can non-response be reduced in (i.) telephone surveys, and (ii.) mail surveys?


18.5 Experimental design in the social and medical sciences

We conclude the ‘Sampling and experimentation’ part of the course with a look at experimental design in the social and medical sciences. Research design is a set of advance decisions that make up the master plan specifying the methods and procedures for collecting and analysing the needed information. There is a huge array of alternative research designs that can satisfy research objectives. The key is to create a design that enhances the value of the information obtained, whilst reducing the cost of obtaining it.

A research design is a framework or blueprint for conducting the research project. It details the procedures necessary for obtaining the information needed to structure or solve research problems. There are basic research designs that can be successfully matched to given problems and research objectives, and they serve a researcher much like a blueprint serves a builder.

18.5.1 Experimental versus observational studies

In an experiment, an intervention or treatment is applied to some or all of the experimental units (often people). The experimenter decides (using randomisation and blocking) which person gets which treatment, or treatment combination. The outcomes are recorded after the intervention (and often various measurements are made before the intervention). The subsequent analysis involves comparing the outcomes for the different treatments. Typically the experimental units are not chosen to be particularly representative of a population of interest.

In an observational study, data are collected about the units (people) without any intervention. In fact the researcher always tries not to influence the observations. A social sample survey is an example of an observational study where the main data are responses to a questionnaire. The questionnaire and interviewing technique are designed to obtain, as far as possible, responses that are not biased by the manner of asking. Typically, the sample is designed to be representative of the population of interest using either quota sampling or probability sampling.

18.5.2 Randomised controlled clinical trials

Randomised controlled clinical trials (RCCTs) are routinely used in the development and testing of medical procedures. The methodology can also be appropriate for some research in the social sciences (for example, criminology, education etc.).

In the simplest completely randomised design, the participants are divided at random (i.e. using a randomisation device) into a treatment group and a control group. The treatment group receives the experimental treatment (e.g. a new drug) and the control group receives the control treatment (e.g. the drug currently used, or a placebo).
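As a rough sketch of what ‘dividing at random’ can look like in practice, the following Python snippet uses a seeded pseudo-random number generator as the randomisation device. The participant identifiers, the seed and the function name `randomise` are all hypothetical; a real trial would follow its own documented allocation procedure.

```python
import random

def randomise(participants, seed=None):
    """Split participants at random into equal-sized treatment and control groups."""
    rng = random.Random(seed)       # plays the role of the randomisation device
    shuffled = list(participants)   # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = [f"P{i:02d}" for i in range(1, 21)]   # 20 hypothetical participant IDs
treatment, control = randomise(participants, seed=2013)
print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Because the shuffle does not look at any characteristic of the participants, nothing about a person can systematically influence which group they end up in.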

The randomisation ensures that there will, on average, be no bias due to the allocation of the treatments. The use of a control group is essential to estimate the extent to which the outcomes in the treatment group are due to the treatment and are not ‘what would have happened anyway’.

Typically, participating in an experiment produces a change, even if no treatment is given. This is called the placebo effect in medical trials and the Hawthorne effect in social science experiments. Therefore it is usual to include a dummy treatment given to the control group.

In order to avoid bias in assessing the effects of the treatments, double or triple blinding is recommended. Double blinding means that both the subjects and administrators are unaware who has received the treatment. Triple blinding means that neither the person receiving the treatment, nor those involved in their care, nor those involved in measuring the outcomes know which treatment was given to which person.

The sample size in each treatment group must be large enough to ensure that medically (or socially) important differences can be detected. Sample size calculation to ensure adequate power is a routine part of experimental design.

18.5.3 Randomised blocks

To increase the accuracy of the comparisons, the units may be grouped into blocks (e.g. by age and gender, or by severity of disease etc.). Within each block one or more units receive each treatment. Treatments are allocated using randomisation within each block. Sometimes there are strata or subgroups of interest (e.g. we might want to know whether the drug is as effective for men as it is for women) in which case the blocks should be chosen to correspond to strata (or to subsets of strata).
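Randomisation within blocks might be sketched as follows. The eight participants, the use of gender as the blocking variable and the helper name `block_randomise` are assumptions made purely for illustration.

```python
import random
from collections import defaultdict

def block_randomise(units, block_of, treatments, seed=None):
    """Randomly allocate treatments separately within each block.

    units:      list of unit identifiers
    block_of:   function mapping a unit to its block label
    treatments: list of treatment names, dealt out in turn after shuffling
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)

    allocation = {}
    for members in blocks.values():
        rng.shuffle(members)                      # random order within the block
        for position, unit in enumerate(members):
            allocation[unit] = treatments[position % len(treatments)]
    return allocation

# Hypothetical participants, recorded as (identifier, gender).
people = [("A", "F"), ("B", "F"), ("C", "F"), ("D", "F"),
          ("E", "M"), ("F", "M"), ("G", "M"), ("H", "M")]
allocation = block_randomise(people, block_of=lambda p: p[1],
                             treatments=["drug", "placebo"], seed=1)
for person, treatment in sorted(allocation.items()):
    print(person, treatment)
```

Each block (here, each gender) ends up with the same number of units on each treatment, so a comparison of treatments is not distorted by differences between blocks.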

18.5.4 Multi-factorial experimental designs

In multi-factorial experimental designs, rather than just one factor or treatment (drug, say) being tested, several factors are tested simultaneously (such as drug, diet and exercise). For example, if each of these factors had two levels (experimental = 1, control = 0), then there would be 8 combinations. Ideally the units (people) would be assigned to blocks of size 8 (where people in the same block would have similar characteristics) and the 8 treatment combinations would be allocated at random (using a randomisation device) to the 8 people within each block.

Not only does this allow efficient use of resources (three different factors can be compared for only a little more than the price of one), but also interactions between the factors can be estimated (for example, the effect of aspirin might be different for those on a low-fat diet than for those on a normal diet).
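The count of 8 treatment combinations, and their random allocation within one block of 8 people, can be illustrated with a short Python sketch. The person labels, the seed and the function name `allocate_block` are hypothetical.

```python
import random
from itertools import product

factors = ["drug", "diet", "exercise"]

# Each factor has two levels: 1 = experimental, 0 = control.
combinations = list(product([0, 1], repeat=len(factors)))
print(len(combinations))   # number of treatment combinations, 2 ** 3

def allocate_block(block_members, combinations, rng):
    """Give every person in a block a different treatment combination."""
    if len(block_members) != len(combinations):
        raise ValueError("block size must equal the number of combinations")
    shuffled = list(combinations)
    rng.shuffle(shuffled)
    return {person: dict(zip(factors, combo))
            for person, combo in zip(block_members, shuffled)}

rng = random.Random(18)
block = [f"person_{i}" for i in range(1, 9)]   # hypothetical block of 8 similar people
plan = allocate_block(block, combinations, rng)
for person, levels in plan.items():
    print(person, levels)
```

Every combination of levels appears exactly once in the block, which is what allows both the separate effects of the factors and their interactions to be estimated.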

18.5.5 Quasi-experiments

Experiments in which there is no control group, or in which randomisation is not used, are sometimes called quasi-experiments. An example might be where a new teaching method is introduced for students taking a course in the current academic year. Students who took the course in the previous academic year can be used as the control group, but since no randomisation was used in the allocation of teaching method any differences in outcomes for the two years might arise from known or unknown differences between the years rather than from differences in teaching method.

18.5.6 Cluster randomised trials

Cluster randomised trials are used where it is not practical to apply a treatment or treatment combination to individuals using randomisation, but only to groups or clusters of individuals. In an educational experiment, schools might be clusters. Half the schools, chosen at random, might be given new technology and the other half not. Then results for the students could be aggregated to school level and the treatments could be compared. Note the experimental units are the clusters (schools) and the relevant sample size is the number of clusters (schools), not the number of subunits (students).
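A small sketch may help to show why the sample size is the number of clusters. The schools and student scores below are invented for illustration, and the seed is arbitrary.

```python
import random
from statistics import mean

# Hypothetical student scores by school (invented for illustration).
scores = {
    "School A": [61, 70, 58],
    "School B": [75, 80, 72],
    "School C": [66, 64, 69],
    "School D": [59, 62, 65],
}

# Randomise whole schools (clusters), not individual students.
rng = random.Random(7)
schools = sorted(scores)
rng.shuffle(schools)
half = len(schools) // 2
assignment = {school: "new technology" for school in schools[:half]}
assignment.update({school: "control" for school in schools[half:]})

# Aggregate to school level: one summary value per cluster.
school_means = {school: mean(values) for school, values in scores.items()}
n_clusters = len(school_means)                       # sample size for the comparison
n_students = sum(len(values) for values in scores.values())
print(n_clusters, n_students)
```

The treatment comparison is then made between the school-level means, so only 4 observations (not 12) are available here.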

18.5.7 Analysis and interpretation

Similar methods of statistical analysis may be used for experimental and for observational data, but the interpretation differs.

An observational study (such as a survey of schools) may show that schools with modern technology have better examination results. But this could be due to the fact that these schools are generally better equipped and/or have better students.

In an experiment where schools chose to participate it might be found that those provided with ‘modern technology’ did better than those given extra supplies of paper and pencils. This might be evidence that having modern equipment would help schools that would choose to participate. Experiments can provide evidence of causation. However, the results might not apply to all schools. Less adventurous or more hard-pressed schools might benefit more from additional paper and pencils.

Activity 18.3 Write notes on the following:

(a) Blind trials

(b) Control and treatment groups

(c) Measuring causation.

18.6 Summary

This unit has explored the different sources of error and bias which exist when drawing a sample from a population. Non-response bias is particularly problematic, and a variety of adjustments to account for non-response were suggested. Experimentation concluded this topic, and the importance of a control and treatment group was outlined in order to establish causality.


18.7 Key terms and concepts

Blinding
Blocking
Control group
Experiment
Incentive
Intervention
Interviewer bias
Item non-response
Non-response
Non-sampling error
Observational study
Pilot survey
Placebo
Randomisation
Research design
Response bias
Response error
Sampling error
Selection bias
Treatment
Unit non-response

Learning outcomes

At the end of this unit, you should be able to:

design and conduct experiments in a social science context

define different forms of bias, explain why they are problematic and offer potential remedies

explain how an experiment can be used to determine causality

discuss the key terms and concepts introduced in this unit.

Exercises

Exercise 18.1

The following question appeared in a survey for university students: ‘How much time do you spend studying per week?’. List two problems with the phrasing of this question that may adversely affect the reliability of the answers to it.

Exercise 18.2

Give an example of response bias. Is response bias a form of sampling error or a form of non-sampling error? Briefly explain why.

Exercise 18.3

In no more than 200 words, explain the difference between an experimental design and a survey design, and discuss their relative advantages.

Exercise 18.4

Briefly discuss the advantages and disadvantages of paying respondents for an interview.


Exercise 18.5

A research group has designed a survey and finds the costs are greater than the available budget. Two possible methods of saving money are a sample size reduction or spending less on interviewers, for example, by providing less interviewer training or taking on less-experienced interviewers. Discuss the advantages and disadvantages of these two methods.

Exercise 18.6

In no more than 200 words, discuss the role of the interviewer in a survey and the importance of training an interviewer.

Exercise 18.7

Readers of the magazine Popular Science were asked to phone in (on a premium rate number) their responses to the following question: ‘Should the United States build more fossil-fuel generating plants or the new so-called safe nuclear generators to meet the energy crisis?’. Of the total call-ins, 86% chose the nuclear option. Discuss the way the poll was conducted, the question wording, and whether or not you think the results are a good estimate of the prevailing mood in the country.

Exercise 18.8

What is randomisation in the context of experimental design?

Exercise 18.9

Explain what is meant by each of the following and why they are considered desirable in an experiment:

(a) placebo

(b) double blinding

(c) blocking

(d) multi-factorial design.


19. Fundamentals of regression I — Correlation and the simple linear regression model

Unit 19: Fundamentals of regression I
Correlation and the simple linear regression model

Overview

In Section 12.5, we saw that bivariate datasets could be visualised using scatter plots. We discussed, for example, the effect advertising appeared to have on sales, i.e. whether there is a positive or negative relationship between the variables. In this unit we go further by introducing correlation and then proceed to modelling a linear relationship between variables using a common procedure known as regression.

Aims

This unit explains the concepts of correlation and the fundamentals of regression. Particular aims are:

to highlight the importance of correlation

to provide an introduction to modelling a linear relationship between variables.

Background reading

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Statistics’ Chapter 3.

19.1 Introduction

We now investigate the relationship between variables. When we have data on two variables (X and Y), we have bivariate data. We will consider how to:

measure the strength of the relationship

model the relationship

predict the value of one variable on the basis of the other.

The first thing to do with data is to provide a graphical representation. For one variable this might be a histogram, pie chart etc. For two variables we produce a scatter plot (as previously discussed in Section 12.5).


Example 19.1

Assume that we have some data in paired form, say

$(x_i, y_i)$, $i = 1, 2, \ldots, n$.

An example might be unemployment and crime figures for 12 areas of a city.

Unemployment, x    Offences, y
2,614              6,200
1,160              4,610
1,055              5,336
1,199              5,411
2,157              5,808
2,305              6,004
1,687              5,420
1,287              5,588
1,869              5,719
2,283              6,336
1,162              5,103
1,201              5,268

We plot X on the horizontal axis, and Y on the vertical axis. By doing so, we can easily see whether there is any relationship between the variables. The scatter plot is shown in Figure 19.1.

[Figure: Scatter plot of Crime against Unemployment. Horizontal axis: Unemployment; vertical axis: Number of offences.]

Figure 19.1: Scatter plot of unemployment and reported crime data.

Looking at Figure 19.1 an approximate positive, linear relationship is apparent. X and Y increase together, roughly linearly. The implied linear relationship is not exact — the points do not lie exactly on a straight line. Such an ‘upward shape’ is termed positive correlation. (We will see later how to quantify correlation.)

Other examples of scatter plots are shown in Figures 19.2 and 19.3.


[Figure: Scatter plot. Horizontal axis: x; vertical axis: y.]

Figure 19.2: Scatter plot showing negative correlation (Y decreases as X increases).

[Figure: Scatter plot. Horizontal axis: x; vertical axis: y.]

Figure 19.3: Scatter plot showing uncorrelated data (no obvious (linear) relationship between X and Y).


19.2 Correlation

Correlation measures the strength of the linear relationship between two variables, each measured on an interval scale.

Positive correlation — the two variables tend to vary in the same direction.

Negative correlation — the two variables tend to vary in the opposite direction.

Perfect correlation — the two variables have points which all lie exactly on a straight line.

If there exists a perfect linear relationship between X and Y, we can represent it using an equation of the form

$$Y = b_0 + b_1 X, \tag{19.1}$$

where

$b_0$ represents the Y-intercept of the line

$b_1$ represents the slope or gradient of the line.

Examples of anticipated correlation include:

Variables                                    Correlation
Height & weight                              Positive
Rainfall & sunshine hours                    Negative
Ice cream sales & sun cream sales            Positive
Hours of study & exam mark                   Positive
Car’s petrol consumption & goals scored      Zero

Positive correlation is characterised by large X with large Y; small X with small Y. Negative correlation is characterised by large X with small Y; small X with large Y. However, since X and Y may have widely different numerical values we need to take this into account. We do this by considering how far away from their means the two variables are.

So, we are interested in the degree to which variations in variable values are related to each other. Our basis for the measurement of correlation is

$$\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}. \tag{19.2}$$
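For readers who want to see why the two sides of (19.2) agree, a short derivation follows. It uses only the facts that $\sum_{i=1}^{n} x_i = n\bar{x}$ and $\sum_{i=1}^{n} y_i = n\bar{y}$, which come directly from the definition of the sample mean.

$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
&= \sum_{i=1}^{n} x_i y_i - \bar{y}\sum_{i=1}^{n} x_i - \bar{x}\sum_{i=1}^{n} y_i + n\bar{x}\bar{y} \\
&= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} - n\bar{x}\bar{y} + n\bar{x}\bar{y} \\
&= \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}.
\end{aligned}
$$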

Unfortunately, this measure is extremely sensitive to the units in which the variables are measured. We would prefer a measure of correlation to remain the same regardless of the units of measurement (e.g. days, hours, minutes or seconds). For this reason we use the following.


Sample correlation coefficient

The sample correlation coefficient, r, measures the strength of the linear relationship between two variables. It is given by

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right), \tag{19.3}$$

where $\bar{x}$ and $\bar{y}$ are the sample means, and $s_x$ and $s_y$ are the sample standard deviations.

Note that r is just the sum of the products of the z-scores (see Section 16.1) of each point’s coordinates, divided by n − 1. This statistic is completely independent of the units used to measure the variables.

We can also find r using the formula

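$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}, \tag{19.4}$$

where

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \tag{19.5}$$

$$S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2 \tag{19.6}$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}, \tag{19.7}$$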

but we will not show the equivalence in this course.

Returning to the unemployment/crime dataset in Example 19.1, we have

$$\sum x_i = 19{,}979, \quad \sum x_i^2 = 36{,}695{,}129, \quad \sum y_i = 66{,}803, \quad \sum y_i^2 = 374{,}471{,}231, \quad \sum x_i y_i = 113{,}784{,}494.$$

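Since n = 12, we have $\bar{x} = 19{,}979/12 = 1{,}664.92$ and $\bar{y} = 66{,}803/12 = 5{,}566.92$, so the (sample) correlation coefficient, r, is

$$
\begin{aligned}
r &= \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right) \times \left(\sum y_i^2 - n\bar{y}^2\right)}} \\
&= \frac{113{,}784{,}494 - (12 \times 1{,}664.92 \times 5{,}566.92)}{\sqrt{\left(36{,}695{,}129 - 12(1{,}664.92^2)\right) \times \left(374{,}471{,}231 - 12(5{,}566.92^2)\right)}} \\
&= 0.861.
\end{aligned}
$$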
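The calculation can be checked with a few lines of Python. This is a sketch using only the data of Example 19.1; the function names `correlation` and `correlation_z` are our own. It computes r using both the sums-of-squares form (19.4) and the z-score form (19.3), and also rescales unemployment to thousands to illustrate that r does not depend on the units.

```python
import math

# Data from Example 19.1 (12 areas of a city).
unemployment = [2614, 1160, 1055, 1199, 2157, 2305,
                1687, 1287, 1869, 2283, 1162, 1201]
offences = [6200, 4610, 5336, 5411, 5808, 6004,
            5420, 5588, 5719, 6336, 5103, 5268]

def correlation(x, y):
    """Sample correlation via r = Sxy / sqrt(Sxx * Syy), as in (19.4)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar
    s_xx = sum(a * a for a in x) - n * x_bar ** 2
    s_yy = sum(b * b for b in y) - n * y_bar ** 2
    return s_xy / math.sqrt(s_xx * s_yy)

def correlation_z(x, y):
    """Sample correlation via products of z-scores, as in (19.3)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    products = (((a - x_bar) / s_x) * ((b - y_bar) / s_y) for a, b in zip(x, y))
    return sum(products) / (n - 1)

r = correlation(unemployment, offences)
r_z = correlation_z(unemployment, offences)
# Measuring unemployment in thousands should leave r unchanged.
r_rescaled = correlation([u / 1000 for u in unemployment], offences)
print(round(r, 3), round(r_z, 3), round(r_rescaled, 3))
```

All three values agree with the hand calculation to three decimal places.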

The (sample) correlation coefficient, r, takes values between −1 and 1, i.e.

$$-1 \le r \le 1.$$

r > 0 indicates positive correlation, with r = 1 indicating perfect positive correlation.

r < 0 indicates negative correlation, with r = −1 indicating perfect negative correlation.

The closer |r| is to 1, the stronger the linear relationship is.

r ≈ 0 suggests that the variables have no linear relationship.

Beware: r ≈ 0 does not necessarily imply no relationship (as there could be a non-linear relationship). For example, the scatter plot in Figure 19.4 arises from data where r = 0.148 but there is a clear quadratic relationship.

[Figure: Scatter plot. Horizontal axis: x; vertical axis: y.]

Figure 19.4: Scatter plot of data simulated from the (approximate) quadratic equation $y = 2(x - 15)(85 - x)$.
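The point that a strong non-linear relationship can coexist with a correlation near zero can be illustrated with a small sketch. The x values below are hypothetical, chosen to be symmetric about 50; they are not the simulated data behind Figure 19.4, and no random noise is added.

```python
import math

def correlation(x, y):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical x values, symmetric about 50 (not the data of Figure 19.4).
x = [20, 30, 40, 50, 60, 70, 80]
y = [2 * (v - 15) * (85 - v) for v in x]   # exact quadratic relationship

r = correlation(x, y)
print(r)
```

Here y is completely determined by x, yet the positive and negative contributions to $S_{xy}$ cancel, so r is zero: the coefficient only measures linear association.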

Activity 19.1 State whether the following statements are true or false and explain.

(a) ‘The correlation between X and Y is the same as the correlation between Y and X.’

(b) ‘If the slope is negative in a regression equation $y = b_0 + b_1 x$, then the correlation coefficient between X and Y would be negative too.’

(c) ‘If two variables have a correlation coefficient of minus 1 they are not related.’

(d) ‘A large correlation coefficient means the regression line will have a high slope $b_1$.’


19.3 Simple linear regression

Here we introduce the simple linear regression model. This is only part of a very large topic in statistical analysis. In the simple model, we have two variables Y and X:

Y is the dependent (or response) variable — the variable we are trying to explain.

X is the independent (or explanatory) variable — the factor we think influences Y.

Numerous reasons exist for establishing a mathematical relationship between Y and X:

To find and interpret unknown parameters in a known relationship.

To understand the reason for such a relationship — is it causal?

To predict or forecast Y for specific values of the explanatory variable.

Hence our objectives in regression analysis are to:

Estimate any unknown parameters.

Estimate the variation about the proposed model.

Estimate the precision of the estimates.

Test the adequacy of the proposed model, and the relevance of the explanatory variable.

Assume a true (population) linear relationship between a response variable y and an explanatory variable x of the approximate form:

$$y = b_0 + b_1 x,$$

where

$b_0$ and $b_1$ are fixed, but unknown, parameters

$b_0$ is the y-intercept

$b_1$ is the slope of the line.

We seek to estimate $b_0$ and $b_1$ using (paired) sample data $(x_i, y_i)$, $i = 1, \ldots, n$.

Particularly in the social sciences, we would not expect a perfect linear relationship between the two variables. Hence we modify this basic model to get

$$y = b_0 + b_1 x + \varepsilon, \tag{19.8}$$

where $\varepsilon$ is some random perturbation from the initial ‘approximate’ line. In other words, each y observation almost lies on the postulated line, but ‘jumps’ off the line according to the random variable $\varepsilon$. Often we refer to $\varepsilon$ as the error term of the model.


19.4 Parameter estimation

For given sample data we could first produce a scatter plot, from which any linear relationship would be visible. When the data indicate a linear relationship, it suggests we should perform a (simple) linear regression. So we need to estimate the population regression line using the sample data. This estimated line is often called the line of best fit.

How do we choose the line of best fit? Well, we require a formal criterion for determining the line of best fit. Without going into details (which are beyond the scope of this course), estimation of $b_0$ and $b_1$ will be by least squares estimation. Specifically, we seek to minimise the sum of the squared error terms.

So the error terms are used to estimate the parameters (slope and intercept) of the model. Note this is an optimisation problem. The intercept tells us the value of the response variable when the explanatory variable is zero. The slope tells us by how much the response variable changes when the explanatory variable increases by one unit.

The least squares estimator for b1 is

$$\hat{b}_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}. \tag{19.9}$$

The least squares estimator for b0 is

$$\hat{b}_0 = \bar{y} - \hat{b}_1\bar{x}. \tag{19.10}$$

Least squares estimators

Hence the line of best fit has equation

$$\hat{y} = \hat{b}_0 + \hat{b}_1 x, \tag{19.11}$$

where $\hat{y}$ is our estimate of y based on the line of best fit when x is the value of the explanatory variable.

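Returning to the unemployment/crime dataset from Example 19.1, we have

$$\sum x_i = 19{,}979, \quad \sum x_i^2 = 36{,}695{,}129, \quad \sum y_i = 66{,}803, \quad \sum y_i^2 = 374{,}471{,}231, \quad \sum x_i y_i = 113{,}784{,}494.$$

Since n = 12, we have $\bar{x} = 19{,}979/12 = 1{,}664.92$ and $\bar{y} = 66{,}803/12 = 5{,}566.92$, hence

$$\hat{b}_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{113{,}784{,}494 - (12 \times 1{,}664.92 \times 5{,}566.92)}{36{,}695{,}129 - (12 \times 1{,}664.92^2)} = 0.7468.$$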

We then estimate the intercept to be

$$\hat{b}_0 = \bar{y} - \hat{b}_1\bar{x} = 5{,}566.92 - 0.7468 \times 1{,}664.92 = 4{,}323.6.$$

Hence the least squares regression line is

$$\hat{y} = 4{,}323.6 + 0.7468x.$$
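The calculation above can be sketched in a few lines of Python. This is only an illustration of formulae (19.9) and (19.10) applied to the summary statistics quoted for the unemployment/crime data; the function name `least_squares_from_sums` and its argument names are our own choices, not part of the guide. Because the code keeps the sample means unrounded, its results can differ slightly in the final digit from the hand calculation, which uses means rounded to two decimal places.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (intercept, slope) estimates from the sums used in (19.9) and (19.10)."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    b1_hat = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Summary statistics quoted for the unemployment/crime data (n = 12).
b0_hat, b1_hat = least_squares_from_sums(
    n=12, sum_x=19_979, sum_y=66_803, sum_xy=113_784_494, sum_x2=36_695_129
)
print(f"slope     = {b1_hat:.4f}")
print(f"intercept = {b0_hat:.1f}")
```

Working from the five sums rather than the raw data mirrors how the estimates would be found by hand with a calculator.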

19.5 Prediction

One of the reasons for calculating the line of best fit is prediction. Specifically, for some value of x, we can provide a prediction for y. So, returning to the example, how many offences would you predict if there were 2,000 unemployed people in a city area?

Answer: just substitute the desired value of x into the least squares regression line:

$$\hat{y} = 4{,}323.6 + 0.7468 \times 2{,}000 = 5{,}817.$$

Provided we are predicting y for an x value that is within the available x data, then we can be fairly confident in our prediction. This is what we call interpolation. However, if we base our prediction on an x value outside the available x data, then we should view the prediction with caution. This would be an example of extrapolation, which is risky since the relationship between x and y may change for such values of x.
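As a rough sketch of the distinction between interpolation and extrapolation, the Python function below returns a prediction together with a flag saying whether x lies inside the range of the observed x data. The function name `predict` is our own, and the range limits `X_MIN` and `X_MAX` are hypothetical values chosen purely for illustration, since the observed data range is not restated here.

```python
def predict(x, b0_hat, b1_hat, x_min, x_max):
    """Return the fitted value and whether x lies inside the observed x range."""
    y_hat = b0_hat + b1_hat * x
    is_interpolation = x_min <= x <= x_max
    return y_hat, is_interpolation


# Fitted line from the worked example: y-hat = 4,323.6 + 0.7468x.
# HYPOTHETICAL range of observed x values, assumed only for this illustration.
X_MIN, X_MAX = 1_000, 2_500

y_hat, inside = predict(2_000, 4_323.6, 0.7468, X_MIN, X_MAX)
print(round(y_hat), "interpolation" if inside else "extrapolation")
```

With real data, the limits would simply be the smallest and largest x values in the sample.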

Activity 19.2 The following table shows the number of computers, in thousands, x, produced by a company each month and the corresponding monthly costs in £000s, y, for running its computer maintenance department.

Number of computers      Maintenance costs
(in thousands), x        (£000s), y
7.2                      100
8.1                      116
6.4                      98
7.7                      112
8.2                      115
6.8                      103
7.3                      106
7.8                      107
7.9                      112
8.1                      111

The following statistics can be calculated from the data.

$$\sum_{i=1}^{10} x_i = 75.5, \quad \sum_{i=1}^{10} y_i = 1{,}080, \quad \sum_{i=1}^{10} x_i y_i = 8{,}184.9, \quad \sum_{i=1}^{10} x_i^2 = 573.33, \quad \sum_{i=1}^{10} y_i^2 = 116{,}988.$$

(a) Draw the scatter diagram.

(b) Calculate the correlation coefficient for computers and maintenance costs.

(c) Find the best-fitting straight line relating y and x.

(d) Comment on your results. How would you check on the strength of the relationship you have found?

19.6 Summary

This unit has introduced the concept of correlation to measure the strength of a linear relationship between two continuous variables. Having seen that a linear relationship exists between two such variables, it is possible to model the relationship mathematically using the simple linear regression model. Estimation of the intercept and slope in the regression model was discussed, together with the subsequent use of the estimated model for prediction.

19.7 Key terms and concepts

Dependent variable
Error term
Extrapolation
Independent variable
Intercept
Interpolation
Least squares estimation
Line of best fit
Negative correlation
Perfect correlation
Positive correlation
Prediction
Sample correlation coefficient
Simple linear regression model
Slope (gradient)
Uncorrelated

Learning outcomes

At the end of this unit, you should be able to:

discuss the strength of correlation between two continuous variables

interpret correlation coefficients

explain the purpose of regression

discuss the key terms and concepts introduced in this unit.

Exercises

Exercise 19.1

(a) Explain the meaning of the term ‘least squares estimate’.

(b) Explain, briefly, the difference between regression and correlation.

Exercise 19.2

Define the term ‘sample correlation coefficient’, r, based on data (x1, y1), . . . , (xn, yn). Describe some properties of r in terms of how its value is different when the data have different patterns of scatter plot.

Exercise 19.3

An area manager in a department store wants to study the relationship between the number of workers on duty and the value of merchandise lost to shoplifters. To do so, she assigned a different number of clerks for each of 10 weeks. The results were as follows.

Week   Number of workers (xi)   Loss (yi)
1      9                        420
2      11                       350
3      12                       360
4      13                       300
5      15                       225
6      18                       200
7      16                       230
8      14                       280
9      12                       315
10     10                       410

Here are some useful summary statistics:

$$\sum_{i=1}^{10} x_i = 130, \quad \sum_{i=1}^{10} y_i = 3{,}090, \quad \sum_{i=1}^{10} x_i y_i = 38{,}305, \quad \sum_{i=1}^{10} x_i^2 = 1{,}760, \quad \sum_{i=1}^{10} y_i^2 = 1{,}007{,}750.$$

(a) Which variable should be the independent variable and which should be the dependent variable?

(b) Plot the data in a scatter diagram and comment on its shape.

(c) Find the least squares regression line.

(d) Interpret the regression coefficients of the fitted line.

(e) Predict the loss when the number of workers is 17.

(f) Compute the correlation coefficient between the number of workers and the loss.

Exercise 19.4

(a) Write down the simple linear regression model, explaining each term in the model.

(b) The following data were recorded during an investigation into the effect of fertiliser, x, on crop yield, y.

Crop yields (kg/ha)   160   168   176   179   183   186   189   186   184
Fertiliser (g/m²)     0     1     2     3     4     5     6     7     8

Here are some useful summary statistics:

$$\sum_{i=1}^{9} x_i = 36, \quad \sum_{i=1}^{9} y_i = 1{,}611, \quad \sum_{i=1}^{9} x_i y_i = 6{,}627, \quad \sum_{i=1}^{9} x_i^2 = 204, \quad \sum_{i=1}^{9} y_i^2 = 289{,}099.$$

i. Plot the data and comment on the appropriateness of using the simple linear regression model.

ii. Calculate a least squares regression line for the data.

iii. Predict the crop yield for 3.5 g/m² of fertiliser.

iv. Would you feel confident predicting a crop yield for 10 g/m² of fertiliser? Explain briefly why or why not.

Exercise 19.5

In a study of household expenditure a population was divided into five income groups with the mean income, x, and the mean expenditure, y, on essential items recorded (in Euros per month). The results are in the following table.

x       y
1,000   871
2,000   1,300
3,000   1,760
4,000   2,326
5,000   2,950

Here are some useful summary statistics:

$$\sum_{i=1}^{5} x_i = 15{,}000, \quad \sum_{i=1}^{5} y_i = 9{,}207, \quad \sum_{i=1}^{5} x_i y_i = 32{,}805{,}000, \quad \sum_{i=1}^{5} x_i^2 = 55{,}000{,}000, \quad \sum_{i=1}^{5} y_i^2 = 19{,}659{,}017.$$

(a) Fit a straight line to the data.

(b) How would you use the fit to predict the percentage of income that households spend on essential items? Comment on your answer.

20. Fundamentals of regression II — Interpretation of computer output and assessing model adequacy

Unit 20: Fundamentals of regression II
Interpretation of computer output and assessing model adequacy

Overview

In practice, regression calculations are performed by computer and so in this final unit we will consider how to interpret computer output to assess the adequacy of a given regression model. Central to this are whether the model provides a good ‘fit’ to the data in terms of explanatory power and the statistical significance of the explanatory variable.

Aims

This unit considers how to judge whether a regression model is ‘good’ and how to interpret typical computer output of a regression. Particular aims are:

to assess how good a regression model is in terms of explaining variation in the dependent variable

to familiarise you with interpreting standard computer output from regression modelling.

Background reading

Swift, L., and S. Piff Quantitative methods for business, management and finance. (Palgrave, 2010) third edition [ISBN 9780230218246] ‘Statistics’ Chapter 3.

20.1 Introduction

We conclude the course with a discussion of computer output for the simple linear regression model and how to assess the adequacy of a particular model. Remember our aims are prediction and decision making. In order to make the best predictions and decisions we need to use the best models.

20.2 Analysis of variance

Our overall objective is to explain the response variable Y, which is a random variable. We try to explain the variation in Y. Using simple linear regression, we attempt this using a single explanatory variable. The total variation in the response variable sample data is simply

$$\text{TSS} = S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2. \tag{20.1}$$

We call this the ‘total sum of squares’ (TSS). As can be seen from (20.1), TSS is simply the sum of the squared deviations of the response observations, the yi’s, about the mean, $\bar{y}$.∗ We can decompose TSS into two components:

the amount we are able to explain using the proposed model, called the ‘explained sum of squares’ (ESS)

and the remaining variation that we are unable to explain with the model, called the ‘residual sum of squares’ (RSS).

Hence,

TSS = ESS + RSS. (20.2)
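The decomposition in (20.2) can be checked numerically. The Python sketch below fits a least squares line to a small dataset and computes the three sums of squares directly from their definitions. The five (x, y) pairs are hypothetical values invented for illustration and do not come from this guide; the variable names are likewise our own.

```python
# HYPOTHETICAL illustrative data, not taken from the guide.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates (algebraically equivalent to the summary-sum formulae).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1_hat = s_xy / s_xx
b0_hat = y_bar - b1_hat * x_bar

fitted = [b0_hat + b1_hat * xi for xi in x]

tss = sum((yi - y_bar) ** 2 for yi in y)                 # total variation
ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained by the line
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # left unexplained

print(f"TSS = {tss:.4f}, ESS + RSS = {ess + rss:.4f}")
```

The two printed values should agree up to floating-point rounding, whatever data are supplied, provided the line is the least squares line.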

20.3 Coefficient of determination, $R^2$

We can assess the overall fit of a model using the coefficient of determination. This measures the proportion of the total variability in the response variable explained by the model.

The coefficient of determination is denoted $R^2$ and defined as

$$R^2 = \frac{\text{ESS}}{\text{TSS}}. \tag{20.3}$$

Coefficient of determination

Note that, as a proportion, $0 \le R^2 \le 1$.† The closer $R^2$ is to 1, the better the explanatory power of the model. Note also that $R^2 = r^2$ for a simple linear model (only), where r is the sample correlation coefficient.

Given our objective is to explain as much of the variation in the response variable as possible, we would like $R^2$ to be as close to 1 as possible. Ideally we would like $R^2$ to be equal to 1. However, in practice we would not expect a perfect linear relationship to exist between the variables; this is especially true when dealing with social science variables due to the complex interdependencies which exist between such variables. In fact, an $R^2$ of around 60% may be considered ‘good’.

When conducting simple linear regression, if we had a choice of candidates to use as the explanatory variable, then we would prefer the one which maximises $R^2$. Then we would have identified the simple regression model with the greatest explanatory power.
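The claim that the coefficient of determination equals the squared sample correlation coefficient in simple linear regression can be illustrated with the summary statistics quoted in Section 19.4 for the unemployment/crime data. The Python sketch below is a minimal check under one stated assumption: it uses the standard identity that, for a least squares line, ESS equals the slope estimate multiplied by $\sum x_i y_i - n\bar{x}\bar{y}$, which is not derived in this guide. Variable names are our own.

```python
import math

# Summary statistics quoted in Section 19.4 for the unemployment/crime data.
n = 12
sum_x, sum_y = 19_979, 66_803
sum_x2, sum_y2, sum_xy = 36_695_129, 374_471_231, 113_784_494

x_bar, y_bar = sum_x / n, sum_y / n
s_xx = sum_x2 - n * x_bar ** 2
s_yy = sum_y2 - n * y_bar ** 2      # this is TSS
s_xy = sum_xy - n * x_bar * y_bar

b1_hat = s_xy / s_xx
ess = b1_hat * s_xy                 # ASSUMED identity for a least squares line
r_squared = ess / s_yy              # coefficient of determination, ESS / TSS

r = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation coefficient

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```

The two printed values coincide, which is the algebraic reason the computer output later in this unit can report both ‘R-sq’ and ‘R’ from the same fit.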

∗For example, if all the response observations were the same, that is, y1 = y2 = . . . = yn, then they are all equal to $\bar{y}$ which means TSS = 0, and hence there is no variation in the response variable to explain! Therefore there would be no need for a regression model.

†Also note that since TSS, ESS and RSS are ‘sums of squares’ they are each non-negative. Using (20.2), it follows that ESS ≤ TSS, hence $0 \le R^2 \le 1$.

But how to choose the candidate explanatory variables in the first place? Well, it would be sensible to draw on our prior knowledge about the response variable and general common sense to come up with some obvious choices. For example, if our response variable is a macroeconomic variable (consumption, say) then we could use basic economic theory to come up with something suitable as an explanatory variable (income, say).

20.4 Computer output

Computer output usefully displays important regression results. We consider an example where we attempt to explain GRE scores by the number of mathematics courses taken by students. Typical output has the following form:

Regression equation: GRE = 812.988 + 50.479 Courses

Predictor   Coefficient   Std error   t-ratio   p
Constant    812.98763     70.73298    11.4938   0.000
Courses     50.478786     3.347518    15.0795   0.000

S = 120.37   R-sq = 92.3%   R = 0.9607

We now explore in detail the output components:

In the ‘Predictor’ column, the ‘Constant’ is the y-intercept and ‘Courses’ is the explanatory variable.

In the ‘Coefficient’ column, the estimates of the y-intercept and slope of the regression line are given, yielding the fitted regression line (using appropriate rounding) of

GRE = 812.988 + 50.479 Courses.

In the ‘Std error’ column, the ‘Courses’ value (3.347518) is the standard error of the slope, denoted $s_{b_1}$, which measures the precision of the slope estimate. (Similarly, the ‘Constant’ value (70.73298) is the standard error of the y-intercept, denoted $s_{b_0}$, which measures the precision of the y-intercept estimate, although we shall not consider this term any further in this course.)

In the ‘t-ratio’ column, the ‘Courses’ value (15.0795) is the t-statistic, that is, the slope estimate divided by its standard error,

$$t = \frac{\hat{b}_1}{s_{b_1}}, \tag{20.4}$$

which can be used to perform a statistical test to assess the significance of ‘Courses’ as an explanatory variable of ‘GRE’, that is, whether or not the true slope b1 = 0. However, we shall perform the test using the p-value, discussed next.

(Similarly, there is a t-statistic, $t = \hat{b}_0 / s_{b_0}$, for testing whether or not the true intercept b0 = 0, that is, whether or not the true line passes through the origin, but, again, we shall not consider this further in this course.)

In the ‘p’ column, the ‘Courses’ value is the p-value for a ‘two-sided’ hypothesis test of whether the true slope b1 = 0 or b1 ≠ 0. p-values less than 0.05 indicate that b1 ≠ 0 and hence that the explanatory variable is statistically significant.‡

In the bottom row, ‘S’ is the standard error of the regression, that is, the standard deviation of the observed y-values about the predicted y-values. It is an estimate of the standard deviation of the model error term, ε, and tells us by how much the observations typically deviate from the regression line.

In the bottom row, ‘R-sq’ is the value of the coefficient of determination $R^2$, that is ‘the proportion of the variation in y explained by x’.

In the bottom row, R is the sample correlation coefficient, r, as defined in the previous unit. Its magnitude is √R-sq and its sign matches the sign of the slope estimate.
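The relationships between the columns of the output can be verified directly. The short Python sketch below recomputes the t-ratios as coefficient divided by standard error, as in (20.4), and recovers R from ‘R-sq’, using only the numbers printed in the GRE output above. The dictionary layout and variable names are our own.

```python
import math

# Values copied from the GRE computer output above.
coefficients = {"Constant": 812.98763, "Courses": 50.478786}
std_errors = {"Constant": 70.73298, "Courses": 3.347518}
r_sq = 0.923  # R-sq = 92.3%

# t-ratio = coefficient / standard error, as in (20.4).
t_ratios = {name: coefficients[name] / std_errors[name] for name in coefficients}

# R has magnitude sqrt(R-sq) and takes the sign of the slope estimate.
r = math.copysign(math.sqrt(r_sq), coefficients["Courses"])

for name, t in t_ratios.items():
    print(f"{name}: t = {t:.4f}")
print(f"R = {r:.4f}")
```

The recomputed values should match the ‘t-ratio’ column and the reported R up to rounding.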

Testing the statistical significance of x as an explanatory variable warrants further discussion. Why are we interested in whether or not the true slope b1 = 0? Well, let us revisit the simple linear regression model:

y = b0 + b1x + ε. (20.5)

If b1 = 0, then changes in x have no effect on y, whereas if b1 ≠ 0 then changes in x have some effect on y. Clearly, if b1 > 0 then a one-unit increase in x leads to a b1-unit increase in y, while if b1 < 0 then a one-unit increase in x leads to a |b1|-unit decrease in y.

Recall, our objective is to explain as much of the total variation in y as possible with our regression model. Therefore concluding whether or not b1 = 0 is essential for determining whether x is a true explanatory variable for y. If we concluded that b1 = 0 when using a particular explanatory variable x in our model, then this indicates x has no effect on y and therefore the particular model is of no use in terms of explaining y.

Returning to our GRE example, our simple linear regression model can be written as

GRE = b0 + b1 Courses + ε. (20.6)

We observe that the p-value associated with ‘Courses’ is 0.000 which is clearly below§ our ‘benchmark’ value of 0.05 and hence we conclude that b1 in (20.6) is not equal to 0. Therefore the number of mathematics courses taken by students, ‘Courses’, does help to explain GRE scores. Indeed, we estimate that taking one additional mathematics course will increase a student’s GRE by 50.479 points, on average.

‡Although hypothesis testing is beyond the scope of this course, we will consider the test of whether b1 = 0 to determine the statistical significance of ‘x’ as an explanatory variable of y. For our purposes, the test simply involves comparing the p-value reported in the computer output with the ‘benchmark’ value of 0.05, with a p-value less than 0.05 indicating that the explanatory variable in the model is statistically significant.

§Note this p-value is not truly zero, but it is to three decimal places, i.e. the p-value is less than 0.0005.

Example 20.1 A swimming pool construction company wondered whether jobs can be completed in a shorter time if more workers are used. Data were collected on the number of workers and the time of completion (in hours) for a sample of 27 pool construction jobs. The results of the regression analysis were:

Regression equation: Time = 141.57 − 6.724 Workers

Predictor   Coefficient   Std error   t-ratio    p
Constant    141.56994     6.26140     22.6099    0.000
Workers     −6.724680     0.54940     −12.2401   0.000

S = 15.41   R-sq = 85.7%   R = −0.9257

By looking at the sample correlation coefficient, −0.9257, we see that there is a very strong negative linear relationship between the number of workers and the construction time. This means the more workers on a job, the shorter the completion time.

Looking at the p-value of the ‘Workers’ explanatory variable, we see it is 0.000 (to three decimal places) which is clearly below 0.05, indicating that Workers is a highly significant explanatory variable so is useful in explaining the response variable. The $R^2$ value tells us that this model is able to explain 85.7% of the variation in pool construction time using the number of workers as the explanatory variable.

The coefficient of Workers is −6.724 which means each additional worker on a pool construction job reduces the completion time by 6.724 hours, on average.

Example 20.2 The coach of a school basketball team wanted to investigate whether there was a relationship between the heights of players and the average number of points scored per game. Data were collected from the 12 members of the school team and a simple linear regression was performed with ‘Height’ as the explanatory variable. The results were:

Regression equation: Average points = −40.36 + 0.706 Height

Predictor   Coefficient   Std error   t-ratio   p
Constant    −40.3606      33.50440    −1.2046   0.256
Height      0.706061      0.493902    1.4296    0.183

S = 5.84   R-sq = 16.9%   R = 0.4119

The correlation coefficient is 0.4119, indicating positive correlation between height of players and the average number of points scored per game. However, the ‘small’ value suggests that the linear relationship between the variables is moderately weak.

Turning to the regression results, we note the p-value of ‘Height’ is 0.183, which exceeds 0.05, and so this means it is not a statistically significant explanatory variable. This is also reflected in the coefficient of determination which tells us that only 16.9% of the variation in the average number of points scored per game can be explained by height.

So, the given regression model is of no practical use. The coach might want to consider alternative explanatory variables and see if he can find one which is statistically significant, and results in a reasonable $R^2$. We might recommend hours of practice per week as a possible choice, or perhaps the number of years spent playing basketball.
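Although this course only compares reported p-values with 0.05, it may help to see where such a value could come from. The Python sketch below approximates the two-sided p-value for the ‘Height’ t-ratio by numerically integrating the density of Student's t distribution. It rests on assumptions not covered in this guide: that the test uses a t distribution with n − 2 degrees of freedom for a model with one explanatory variable, and that Simpson's rule gives an adequate approximation. The function names are our own.

```python
import math


def t_density(u, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + u * u / df) ** (-(df + 1) / 2)


def two_sided_p_value(t_ratio, df, steps=2000):
    """Approximate two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t_ratio)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * t_density(k * h, df)
    central_area = 2 * (h / 3) * total  # area between -|t| and |t|
    return 1 - central_area


# Example 20.2: n = 12 players, so df = n - 2 = 10 (ASSUMED convention).
p = two_sided_p_value(1.4296, df=12 - 2)
print(f"p = {p:.3f}", "significant" if p < 0.05 else "not significant")
```

In practice statistical software reports the p-value directly, so a calculation like this is only a check on the output.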

Example 20.3 An estate agent recorded sales of houses according to the sale price, in pounds, and the size of living space, in square metres. She was interested in investigating the relationship between these two variables and wondered whether the sales price could be predicted from the size of living space. A regression model was estimated with the following results:

Regression equation: Sale price = −1,263,752 + 11,647 Living space

Predictor      Coefficient   Std error   t-ratio   p
Constant       −1,263,752    332,772     −3.798    0.002
Living space   11,647        1,244       9.362     0.000

S = 609,400   R-sq = 86.2%   R = 0.9284

It is clear that there is a very strong positive correlation between the sale price and amount of living space, with a sample correlation coefficient of 0.9284. This is perhaps not too surprising since one would expect larger properties to be worth more. ‘Living space’ is statistically significant due to the small p-value and the coefficient of 11,647 can be interpreted by saying that for every extra square metre of living space, the sale price increases by £11,647.

The coefficient of determination tells us that 86.2% of the variation in house sale prices can be explained by living space alone. What might account for the other 13.8%? Perhaps the number of bedrooms, location, age of the property, allocated parking etc.

The intercept of the model is negative. Is this reasonable? Well, the intercept gives the predicted value of the response variable when the explanatory variable is zero. Clearly a sale price cannot be negative! However, we would expect a minimum amount of living space for any house (perhaps 50 square metres?), so the model is fine as we would never encounter properties with near-zero amounts of living space.

Finally, note the ‘large’ value for S, the standard error of the regression. This is purely a consequence of the large values of the response variable, since sale prices are in pounds, rather than hundreds of thousands of pounds. Always pay attention to the units of measurement!
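To see how far the negative intercept reaches, the fitted line can be evaluated at a few living-space values. The Python sketch below uses only the two coefficients reported in the output; the function name `predicted_price` is our own and the 200 square metre property is a hypothetical value chosen for illustration. It suggests the fitted line gives positive predictions only above roughly 108.5 square metres, so predictions are best restricted to the range of living space actually observed, which is not reported here.

```python
# Fitted line from Example 20.3: Sale price = -1,263,752 + 11,647 * Living space.
B0_HAT = -1_263_752
B1_HAT = 11_647


def predicted_price(living_space_m2):
    """Predicted sale price in pounds for a given living space in square metres."""
    return B0_HAT + B1_HAT * living_space_m2


# Living space at which the fitted line crosses zero.
break_even = -B0_HAT / B1_HAT

print(f"Predicted price is zero at about {break_even:.1f} square metres")
# 200 square metres is a HYPOTHETICAL value chosen for illustration.
print(f"Predicted price at 200 square metres: £{predicted_price(200):,}")
```

This is another instance of the caution about extrapolation from the previous unit: a line fitted to larger properties need not describe smaller ones.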

Activity 20.1 A retailer has asked you to develop a model which could be used to predict total sales for some proposed new retail locations. As an expanding retailer, it needs accurate predictions to determine whether it would be profitable to build new stores at various locations. The company has obtained data from a household survey on retail sales per household, y, and income per household, x. You run a simple linear regression model and obtain the results below. Comment on the adequacy of the model.

Regression equation: Sales = 559.5 + 0.3815 Income

Predictor   Coefficient   Std error   t-ratio   p
Constant    559.5         1,145.1     0.386     0.704
Income      0.3815        0.02529     15.084    0.000

S = 147.7   R-sq = 91.9%   R = 0.9586

Activity 20.2 A company sets different prices for its DVD system in different regions of its country of operation. Data on the number of units sold and the corresponding prices were collected and a simple linear regression analysis performed. The regression results are:

Regression equation: Sales = 457.16 − 0.3331 Price

Predictor   Coefficient   Std error   t-ratio   p
Constant    457.1648      33.2657     13.743    0.000
Price       −0.3331       0.2011      −1.656    0.149

S = 30.3   R-sq = 31.4%   R = −0.5604

What conclusions can you draw?

20.5 Several explanatory variables

Previously we saw simple linear regression which was characterised by one explanatory variable. Often one explanatory variable is not enough to adequately explain the total variation in the response variable. So we add more explanatory variables.

For example, absenteeism in the workforce could be due to:

hours worked

flexibility in work practice

salary paid etc.

while the salary for managers could be related to:

qualifications

experience

hours worked

performance etc.

Remember the aim of statistics is prediction and decision making. In order to make the best predictions and decisions we need to use the best models. This often means making the models more complex by adding more explanatory variables. But the models should not be too complex.¶

It is very straightforward to extend the simple linear regression model to incorporate several explanatory variables. Multiple linear regression is just a natural extension of this framework, but with more than one explanatory variable.

Example 20.4

Suppose XYZ Catering sells catering goods and senior management wants to know which factors affect sales. Your team has data on sales, clients, suppliers etc. How does the management question translate into a model? What is the dependent variable? What is (are) the independent variable(s)?

Clearly here the dependent variable is sales (the variable we are trying to explain). There could be several explanatory factors, such as

size of client company

type of client company

location

etc.

which we could take to be independent variables.

However, multiple linear regression is beyond the scope of this course and therefore we will not consider this further.

20.6 Summary

In practice, most datasets which are used for regression are large and the estimation of regression parameters can be computationally intensive. Therefore we tend to use computers to perform regression analysis. In this final unit of the course, we have looked at the interpretation of regression output. Specifically, we have seen how to obtain the equation of the best-fitting line, determine how much of the total variation in the response variable can be explained by the model ($R^2$) and determine whether the explanatory variable in our model was statistically significant. Finally, we briefly looked at introducing more than one explanatory variable into the regression model which has the advantage of being more realistic, but the disadvantage of leading to a more complicated model.

¶A principle known as Occam’s razor says that simplicity is preferred to complexity, other things being equal.

20.7 Key terms and concepts

Coefficient of determination
Computer output
Decision making
Explained sum of squares
Explanatory power
Multiple linear regression
p-value
Prediction
Residual sum of squares
Standard error
Total sum of squares
Total variation

Learning outcomes

At the end of this unit, you should be able to:

determine how good a regression model is at explaining the dependent variable

interpret the computer output of a regression model

assess the statistical significance of the explanatory variable

discuss the key terms and concepts introduced in this unit.

Exercises

Exercise 20.1

The head of a statistics department has taken data from his instructors to observe the correlation between the number of homework assignments the instructors give for a course and the average course grade for the students. The following is a plot of the residuals, $y_i - \hat{y}_i$, for this study.

[Figure: Plot of residuals. Horizontal axis: Number of Homework Assignments (0 to 15). Vertical axis: Grade Average (−0.2 to 0.2).]

(a) Based on the plot of residuals, describe the strength of a linear relationship between Number of Homework Assignments and Grade Average. What would be a likely value for the correlation coefficient, r? Explain your answer.

(b) Based on the plot of residuals, describe the effect that Number of Homework Assignments has on student grades.

(c) Based on the plot of residuals, about how many homework assignments should an instructor give to maximise student grade average? Explain your answer.

Exercise 20.2

A botanist is studying the relationship between the trunk circumferences of a species of tree and the number of leaves it has. The scatter plot and regression results are given here:

[Figure: Scatterplot with Least-Squares Regression Line. Horizontal axis: Trunk Circumference (inches), 100 to 450. Vertical axis: Number of Leaves, 0 to 3000. One point is labelled A.]

Predictor       Coefficient   Std error   t-ratio   p-value
Constant        934.43        298.850     3.2167    0.0074
Circumference   3.1815        1.06577     2.9852    0.0114

S = 363.09   R-sq = 42.6%   R = 0.6528

(a) What is the equation of the least squares regression line relating number of leaves to the trunk circumference in inches? Define any variables used.

(b) If the point A, as shown in the scatter plot, represents a tree with trunk circumference 350 inches, and 980 leaves, what is the residual, $y_i - \hat{y}_i$, for this data point?

(c) If the data point A is removed from the sample, what effect will this have on the correlation coefficient, r? Explain.

Part 3
Appendices

A. A sample examination paper

Appendix A

A sample examination paper

Important note: This Sample examination paper reflects the examination and assessment arrangements for this course in the academic year 2013–2014. The format and structure of the examination may have changed since the publication of this subject guide. You can find the most recent examination papers on the VLE where all changes to the format of the examination are posted.

Mathematics and Statistics

Time allowed: 2 hours.

Candidates should answer ALL questions. Section A (50 marks) covers the Mathematics part of the course, Section B (50 marks) covers the Statistics part of the course. Candidates are required to pass BOTH sections to pass the examination.

Candidates are strongly advised to divide their time accordingly.

A list of formulae and the table of cumulative Normal probabilities is provided at the end of this paper.∗

A calculator may be used when answering questions on this paper and it must comply in all respects with the specification given with your Admission Notice. The make and type of machine must be clearly stated on the front cover of the answer book.

∗The table is provided at the back of this subject guide in Appendix C.

Section A: Mathematics

Answer ALL questions (50 marks in total).

1. (a) The demand for a product, q, is related to its price, p, by the equation

$$q = 10 - p,$$

while suppliers respond to a price of p by supplying an amount, q, given by the equation

$$q = 4p - 30.$$

Find the equilibrium price and the corresponding level of production.

Write down the supply function. For which values of p and q is it economically meaningful? (5 marks)

(b) Solve the equation $\log_2(8) - \log_3(9) = \log_{10}(x)$. (5 marks)

(c) Find the derivatives of the following functions. (5 marks)

i. $\cos(x^2)$.

ii. $e^x \cos(x^2)$.

(d) Suppose that you buy a car for £10,000 and its value depreciates continuously at a rate of 25% per year. What is its value after three years? Explain why the car’s value is halved after $4\ln(2)$ years. (5 marks)

[You may use the fact that, to 5dp, $e^{0.25} = 1.28403$.]

2. Consider the function $f(x) = x^3 - 2x^2 - 15x$.

(a) Find and classify the stationary points of f(x). (5 marks)

(b) Sketch the curve y = f(x). (5 marks)

(c) Find the area of the region bounded by the curve y = f(x), the x-axis and the vertical lines x = −1 and x = 1. (5 marks)

3. Consider an annuity which pays £100 every year. The first payment is to be made now and further payments will be made at the end of each year for the next n years.

(a) Find the present value of this annuity, simplifying your answer as far as possible, given that an interest rate of 5% per annum compounded annually is available to you. (5 marks)

(b) If the annuity is to make eleven payments, what is the smallest lump sum payment that will be worth more to you than the annuity? (3 marks)

(c) How many payments are needed if the annuity is to be worth more than a lump sum of £2,000? (5 marks)

(d) If the annuity was a perpetuity, what would be its present value? (2 marks)

[You may use the facts that, to 5dp, $1.05^{11} = 1.71034$ and $\log_{1.05}(21) = 62.40033$.]

Section B: Statistics

Answer ALL questions (50 marks in total).

4. (a) Do you think the distribution of income (pounds per year) in the UK is most likely to be symmetrically distributed, skewed to the right, or skewed to the left? Briefly explain why. Which measure of central tendency would you use to describe income? Justify your choice. (5 marks)

(b) Given events A and B where P(A) = 0.5 and P(A ∪ B) = 0.7, find P(B) in the following three cases. (5 marks)

i. A and B are mutually exclusive.

ii. A and B are independent.

iii. P(A | B) = 0.5.

(c) In an examination, the scores of students who attend schools of type A are approximately normally distributed about a mean of 61 with a standard deviation of 5. The scores of students who attend type B schools are approximately normally distributed about a mean of 64 with a standard deviation of 4. Which type of school would have a higher proportion of students with marks above 70? (5 marks)

(d) You randomly select 1,000 names from the subscription list of a magazine designed for hunters. You mail a questionnaire about gun control to these readers and receive 700 responses. You randomly select 200 of the 700 responses for inclusion in your study. What forms of bias are evident in this design? (5 marks)

5. Three members of an exclusive country club, Mr Adams, Miss Brown and Dr Cooper, have been nominated for the office of president. The probabilities of Mr Adams and Miss Brown being elected are 0.3 and 0.5, respectively. If Mr Adams is elected, the probability of an increase in membership fees is 0.8. If Miss Brown or Dr Cooper is elected, the corresponding probabilities of an increase in membership fees are 0.1 and 0.4, respectively.

(a) What is the probability that Dr Cooper is elected president? (5 marks)

(b) What is the probability that there will be an increase in membership fees? (5 marks)

(c) Given that membership fees have been increased, what is the probability that Dr Cooper was elected president? (5 marks)

6. For a group of 15 students, the following table shows the average number of hours per week spent on study and their final results in the corresponding examination.

No. of hours studied, x   16     17.5   11.5   13.5   15     12.5   20.5   14.5
Examination mark, y       77     85     48     59     75     41     95     72

No. of hours studied, x   16.5   13.5   22     18.5   17     19.5   19.5
Examination mark, y       80     70     99     85     83     97     89

Summary statistics for these data are:

$$\sum x_i = 247.5, \quad \sum x_i^2 = 4218.75, \quad \sum y_i = 1155, \quad \sum y_i^2 = 92999, \quad \sum x_i y_i = 19750.5.$$

(a) Calculate the sample correlation coefficient for these data and comment. (5 marks)

(b) Calculate the least squares regression line of y on x. (5 marks)

(c) Use the calculated line to predict examination marks for students who studied for 16 hours. Would you consider a prediction based on 20 hours to be more accurate? Explain why/why not. (5 marks)

END OF PAPER

Formula sheet

Section A: Mathematics

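The chain rule: If f(x) = f(g) for some function g(x), then

$$\frac{df}{dx} = \frac{df}{dg}\,\frac{dg}{dx}.$$

The product rule:

$$\frac{d}{dx}\bigl(f(x)\,g(x)\bigr) = \frac{df}{dx}\,g(x) + f(x)\,\frac{dg}{dx}.$$

The quotient rule:

$$\frac{d}{dx}\left(\frac{f(x)}{g(x)}\right) = \frac{1}{[g(x)]^2}\left(\frac{df}{dx}\,g(x) - f(x)\,\frac{dg}{dx}\right).$$

The sum of a finite geometric series is given by

$$a + ar + ar^2 + \cdots + ar^{n-1} = a\,\frac{1 - r^n}{1 - r}.$$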

Section B: Statistics

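The variances for a population and a sample are

$$\sigma^2 = \frac{\sum_{i=1}^{N} x_i^2}{N} - \mu^2 \quad \text{and} \quad s^2 = \frac{\left(\sum_{i=1}^{n} x_i^2\right) - n\bar{x}^2}{n - 1}.$$

For events A and B,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B),$$

$$P(A \cap B) = P(A)\,P(B \mid A),$$

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}.$$

For a discrete random variable X,

$$\mu = E(X) = \sum x \cdot P(X = x),$$

$$E(g(X)) = \sum g(x) \cdot P(X = x),$$

$$\sigma^2 = \mathrm{Var}(X) = E(X^2) - (E(X))^2.$$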

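If X ∼ Bin(n, π), then

$$P(X = x) = \binom{n}{x}\pi^x(1 - \pi)^{n-x} \quad \text{where} \quad \binom{n}{x} = \frac{n!}{x!\,(n - x)!},$$

E(X) = nπ and Var(X) = nπ(1 − π).

If X ∼ Poisson(λ), then

$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad E(X) = \lambda \quad \text{and} \quad \mathrm{Var}(X) = \lambda.$$

The sample correlation coefficient is

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}},$$

where

$$S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2,$$

$$S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n} y_i^2 - n\bar{y}^2,$$

$$S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}.$$

The regression slope and intercept are given by

$$b_1 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} \quad \text{and} \quad b_0 = \bar{y} - b_1\bar{x}.$$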

Appendix B

Solutions to the sample examination paper

The solutions to the sample examination paper provided below are to give guidance about the level of detail required by the Examiners. In response to a ‘qualitative’ question, it is essential to directly address the areas of the syllabus which are being assessed. For a ‘quantitative’ question, it is essential that you show all your working as most of the credit will be for the method, rather than the final answer.

Section A: Mathematics

Question 1

(a) As in Section 2.3.3, given the demand equation q = 10 − p and the supply equation q = 4p − 30, the equilibrium price is given by

$$10 - p = 4p - 30 \implies 5p = 40 \implies p = \frac{40}{5} = 8.$$

The corresponding quantity is then given by, say, q = 10 − 8 = 2. (This is, of course, the equilibrium quantity.)

Then, as in Section 4.1.4, using the supply equation, we can see that the supply function is given by $q^S(p) = 4p - 30$. This is economically meaningful as long as q ≥ 0 and p ≥ 15/2 since other values of p or q will make at least one of these quantities negative.
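The algebra in part (a) can be checked with a short Python sketch. This is only an illustration, not part of the model answer; the names `demand`, `supply`, `equilibrium_price` and `min_meaningful_price` are our own and are not used elsewhere in the guide.

```python
from fractions import Fraction


def demand(p):
    """Demand equation from Question 1(a): q = 10 - p."""
    return 10 - p


def supply(p):
    """Supply equation from Question 1(a): q = 4p - 30."""
    return 4 * p - 30


# Setting 10 - p = 4p - 30 gives 5p = 40, so p = 40/5.
equilibrium_price = Fraction(40, 5)
equilibrium_quantity = demand(equilibrium_price)

# The supply function needs a non-negative quantity: 4p - 30 >= 0, so p >= 30/4.
min_meaningful_price = Fraction(30, 4)

print(equilibrium_price, equilibrium_quantity, min_meaningful_price)
```

Exact fractions are used so that the threshold p = 15/2 is not blurred by floating-point rounding.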

(b) As in Section 4.2.2, if we note that

$$\log_2(8) = \log_2(2^3) = 3 \quad \text{and} \quad \log_3(9) = \log_3(3^2) = 2,$$

the given equation is just

$$3 - 2 = \log_{10}(x) \implies \log_{10}(x) = 1,$$

and so, using the definition of the logarithm, we see that $x = 10^1 = 10$.

(c) For (i), the function $f(x) = \cos(x^2)$ is the composition given by $f(g) = \cos(g)$ with $g(x) = x^2$. Thus, using the chain rule from Section 6.1.3, we find that

$$\frac{df}{dx} = \frac{df}{dg} \cdot \frac{dg}{dx} = [-\sin(g)][2x] = -2x\sin(x^2).$$

For (ii), the function $h(x) = e^x\cos(x^2)$ is the product of the functions $e^x$ and $\cos(x^2)$. Thus, using the product rule from Section 6.1.1, we find that

$$\frac{dh}{dx} = [e^x][\cos(x^2)] + [e^x][-2x\sin(x^2)] = e^x[\cos(x^2) - 2x\sin(x^2)],$$

if we use our answer from (i).

(d) As the car is initially worth £10,000 and its value depreciates continuously at a rate of 25% per year, we can use what we saw in Section 9.3 to see that its value is given by

$$10{,}000\,e^{-(0.25)(3)}$$

after three years. We are told in the question that, to 5dp, $e^{0.25} = 1.28403$ and so this gives us

$$\frac{10{,}000}{(e^{0.25})^3} \simeq \frac{10{,}000}{(1.28403)^3} = 4{,}723.615,$$

i.e. the car’s value is £4,723.61 after three years.

The car’s value is halved from £10,000 to £5,000 after t years where

$$5{,}000 = 10{,}000\,e^{-0.25t} \implies e^{-0.25t} = \frac{1}{2}.$$

Using the definition of ‘ln’, as in Section 4.2.2, this then gives us

$$-0.25t = \ln\left(\frac{1}{2}\right) \implies t = -4\ln\left(\frac{1}{2}\right) = 4\ln(2),$$

if we use the laws of logarithms. Consequently, as required, the car’s value is halved after 4 ln(2) years.
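As a numerical check on part (d), the following Python sketch evaluates the depreciation formula directly. The function name `car_value` and the variable names are illustrative choices of ours. Note that the code uses the unrounded exponential, so its three-year value differs by a few pence from the figure obtained with the rounded value of $e^{0.25}$ supplied in the question.

```python
import math

initial_value = 10_000.0  # purchase price in pounds
rate = 0.25               # continuous depreciation rate per year


def car_value(t):
    """Value after t years under continuous depreciation: 10,000 e^(-0.25 t)."""
    return initial_value * math.exp(-rate * t)


# Value after three years using the unrounded exponential.
value_after_three_years = car_value(3)

# The same calculation using the rounded value of e^0.25 given in the question.
value_with_rounded_e = initial_value / 1.28403 ** 3

# Halving time derived in the solution: t = 4 ln(2).
halving_time = 4 * math.log(2)

print(round(value_after_three_years, 2), round(value_with_rounded_e, 2))
print(round(halving_time, 4), round(car_value(halving_time), 2))
```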

Question 2

We are given the function $f(x) = x^3 - 2x^2 - 15x$.

(a) To find the stationary points of f(x), as in Section 7.2.1, we find its derivative, i.e.

$$f'(x) = 3x^2 - 4x - 15,$$

and solve the equation f′(x) = 0. Using factorisation, we can see that

$$3x^2 - 4x - 15 = 0 \implies (3x + 5)(x - 3) = 0 \implies x = -\frac{5}{3} \text{ or } 3.$$

Thus, the stationary points occur when x = −5/3 and x = 3.

To classify these stationary points, we see that the second derivative of f(x) is given by

$$f''(x) = 6x - 4,$$

and so we note that:

• At x = −5/3, we have f′′(−5/3) = −14 < 0 and so this is a local maximum.

• At x = 3, we have f′′(3) = 14 > 0 and so this is a local minimum.

Consequently, we see that the stationary points at x = −5/3 and x = 3 are a local maximum and local minimum respectively.
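The second-derivative test used above can be sketched in Python as follows. This is a minimal illustration under our own naming (`f_prime`, `f_double_prime`, `stationary_points` and `classify` are hypothetical helper names); it uses the quadratic formula in place of the factorisation in the solution, which yields the same two roots.

```python
import math


def f(x):
    return x**3 - 2 * x**2 - 15 * x


def f_prime(x):
    return 3 * x**2 - 4 * x - 15


def f_double_prime(x):
    return 6 * x - 4


def stationary_points():
    """Solve 3x^2 - 4x - 15 = 0 with the quadratic formula."""
    a, b, c = 3, -4, -15
    root_disc = math.sqrt(b * b - 4 * a * c)  # sqrt(196) = 14
    return sorted([(-b - root_disc) / (2 * a), (-b + root_disc) / (2 * a)])


def classify(x):
    """Second-derivative test at a stationary point x."""
    curvature = f_double_prime(x)
    if curvature < 0:
        return "local maximum"
    if curvature > 0:
        return "local minimum"
    return "inconclusive"


points = stationary_points()
for x in points:
    print(round(x, 4), round(f(x), 4), classify(x))
```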

(b) To sketch the curve y = f(x), as in Section 7.2.2, we note that:

The y-intercept, which occurs when x = 0, is given by y = 0.

The x-intercepts, which occur when y = 0, are given by

$$x^3 - 2x^2 - 15x = 0 \implies x(x^2 - 2x - 15) = 0 \implies x(x - 5)(x + 3) = 0,$$

and so we get x = −3, x = 0 and x = 5.

The stationary points, as found in (a), are

• a local maximum when x = −5/3 and y = f(−5/3) = 400/27, and

• a local minimum when x = 3 and y = f(3) = −36.

Lastly, as the highest power of x in f(x) is $x^3$, we should find that f(x) → ∞ as x → ∞ and f(x) → −∞ as x → −∞.

So, using this information, we can set up the sketch and finish it off as illustrated in Figure B.1.

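[Figure B.1, two panels titled ‘The set up’ and ‘The sketch’: the curve y = f(x) plotted against x, marking the x-intercepts −3, 0 and 5, the local maximum at (−5/3, 400/27) and the local minimum at (3, −36).]

Figure B.1: Sketching the curve y = f(x) for Question 2(b).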

(c) As in Section 8.2, to find the area of the region bounded by the curve y = f(x), the x-axis and the vertical lines x = −1 and x = 1, we observe that f(x) is positive for −1 ≤ x ≤ 0 and negative for 0 ≤ x ≤ 1. This means that the area we need to find is given by

$$\int_{-1}^{0} f(x)\,dx + \left|\int_{0}^{1} f(x)\,dx\right|.$$

So, as the first of these integrals gives us

$$\int_{-1}^{0} (x^3 - 2x^2 - 15x)\,dx = \left[\frac{x^4}{4} - \frac{2}{3}x^3 - \frac{15}{2}x^2\right]_{-1}^{0} = 0 - \left[\frac{1}{4} + \frac{2}{3} - \frac{15}{2}\right] = \frac{79}{12},$$

and the second of these integrals gives us

$$\int_{0}^{1} (x^3 - 2x^2 - 15x)\,dx = \left[\frac{x^4}{4} - \frac{2}{3}x^3 - \frac{15}{2}x^2\right]_{0}^{1} = \left[\frac{1}{4} - \frac{2}{3} - \frac{15}{2}\right] - 0 = -\frac{95}{12},$$

we find that the required area is $\frac{79}{12} + \frac{95}{12} = \frac{29}{2}$.
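The two definite integrals in part (c) can be verified exactly with Python's `fractions` module. This is a sketch for checking the arithmetic only; `antiderivative`, `left_integral` and `right_integral` are names we have chosen for illustration.

```python
from fractions import Fraction


def antiderivative(x):
    """An antiderivative of f(x) = x^3 - 2x^2 - 15x, evaluated exactly."""
    x = Fraction(x)
    return x**4 / 4 - Fraction(2, 3) * x**3 - Fraction(15, 2) * x**2


# Signed integrals over the two sub-intervals, where f(x) has constant sign.
left_integral = antiderivative(0) - antiderivative(-1)   # f(x) >= 0 here
right_integral = antiderivative(1) - antiderivative(0)   # f(x) <= 0 here

# The area adds the magnitudes, so the negative part is not cancelled.
area = abs(left_integral) + abs(right_integral)

print(left_integral, right_integral, area)
```

Simply integrating from −1 to 1 in one step would give the signed total, 79/12 − 95/12 = −4/3, which is why the interval is split at x = 0.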

Question 3

This question uses the material from Unit 10.

(a) The annuity pays £100 every year. The first payment is made now and so its present value is £100 whereas given that an interest rate of 5% per annum compounded annually is available, the payments at the end of the first, second, . . . , nth years will have present values given by

$$\frac{100}{1.05}, \quad \frac{100}{1.05^2}, \quad \ldots, \quad \frac{100}{1.05^n}$$

respectively. As such, the present value of the annuity as a whole is the sum of the geometric series

$$100 + \frac{100}{1.05} + \frac{100}{1.05^2} + \cdots + \frac{100}{1.05^n},$$

which has a first term of 100, a common ratio of 1/1.05 and n + 1 terms. Consequently, using the formula for the sum of a finite geometric series, we see that

$$100\,\frac{1 - \dfrac{1}{1.05^{n+1}}}{1 - \dfrac{1}{1.05}} = 2{,}100\left(1 - \frac{1}{1.05^{n+1}}\right),$$

is the present value of this annuity.

(b) If the annuity is to make eleven payments, so that n + 1 = 11, we see that its present value is

$$2{,}100\left(1 - \frac{1}{1.05^{11}}\right) \simeq 2{,}100\left(1 - \frac{1}{1.71034}\right) = 872.17395,$$

to 5dp if we use the fact that $1.05^{11} = 1.71034$ to 5dp. Thus, the present value of the annuity is £872.17 and, from this, we see that £872.18 is the smallest lump sum payment that is worth more to you than the annuity.

(c) For the annuity to be worth more than a lump sum of £2,000, we need n + 1 payments where

$$2{,}100\left(1 - \frac{1}{1.05^{n+1}}\right) > 2{,}000 \implies 1 - \frac{1}{1.05^{n+1}} > \frac{20}{21} \implies \frac{1}{21} > \frac{1}{1.05^{n+1}},$$

and so we need to solve the inequality $1.05^{n+1} > 21$. Indeed, given the fact that $\log_{1.05}(21) = 62.40033$ (5dp), we can use logarithms to see that

$$n + 1 > \log_{1.05}(21) \simeq 62.40033,$$

and so we would need 63 payments if we want the annuity to be worth more than the given lump sum.

(d) If the annuity was a perpetuity, its present value would be given by the infinite geometric series

$$100 + \frac{100}{1.05} + \frac{100}{1.05^2} + \cdots + \frac{100}{1.05^n} + \cdots,$$

whose sum is

$$\frac{100}{1 - \dfrac{1}{1.05}} = 2{,}100,$$

if we use the formula for the sum of an infinite geometric series (or think about what we found in (a) as n → ∞). Thus, the present value of the corresponding perpetuity would be £2,100.
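The results of Question 3 can be reproduced with a short Python sketch built on the geometric series formula. The function names `annuity_present_value`, `perpetuity_present_value` and `payments_needed` are our own illustrative choices, and the search in `payments_needed` is a simple loop standing in for the logarithm argument used in part (c).

```python
def annuity_present_value(payment, rate, n_payments):
    """Present value of n_payments equal payments, the first made now."""
    ratio = 1 / (1 + rate)  # common ratio of the geometric series
    return payment * (1 - ratio**n_payments) / (1 - ratio)


def perpetuity_present_value(payment, rate):
    """Limit of the annuity value as the number of payments grows."""
    return payment / (1 - 1 / (1 + rate))


def payments_needed(payment, rate, target):
    """Smallest number of payments whose present value exceeds target."""
    if target >= perpetuity_present_value(payment, rate):
        raise ValueError("target is not below the perpetuity value")
    count = 1
    while annuity_present_value(payment, rate, count) <= target:
        count += 1
    return count


pv_eleven = annuity_present_value(100, 0.05, 11)
pv_perpetuity = perpetuity_present_value(100, 0.05)
n_for_2000 = payments_needed(100, 0.05, 2000)

print(round(pv_eleven, 2), round(pv_perpetuity, 2), n_for_2000)
```

The guard in `payments_needed` matters because no finite number of payments can be worth as much as the perpetuity value of £2,100, so the loop would otherwise never end for such a target.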

Section B: Statistics

Question 4

(a) The distribution of income would be skewed to the right. Most people earn, say, between £12,000 and £60,000, with few earning less. But a relatively small number earn a lot more, leading to a long ‘tail’ to the right. We would probably not use the mode, instead preferring the mean or median. The mean, though, is sensitive to outliers and so will be ‘pulled’ up due to the few high earners. As such, it could be argued that the median would be the best measure of central tendency for representing the ‘average’ income of a UK employee.

(b) For (i), if A and B are mutually exclusive, then P(A ∩ B) = 0 and so, using the fact that

P(A ∪ B) = P(A) + P(B) − P(A ∩ B),

we get

P(B) = P(A ∪ B) − P(A) = 0.7 − 0.5 = 0.2.

For (ii), if A and B are independent, then P(A ∩ B) = P(A)P(B) and so, using the fact that

P(A ∪ B) = P(A) + P(B) − P(A)P(B),

we see that

0.7 = 0.5 + P(B) − 0.5 × P(B).

Thus, 0.5 × P(B) = 0.2 and we find that P(B) = 0.4.

For (iii), we again have the facts that

P(A ∪ B) = P(A) + P(B) − P(A ∩ B) and P(A ∩ B) = P(B)P(A | B),

so, we have

0.7 = 0.5 + P(B) − 0.5 × P(B),

which, once again, gives us 0.5 × P(B) = 0.2 so that we get P(B) = 0.4.

Alternatively (and more elegantly), P(A) = 0.5 = P(A | B) implies A and B are independent and so we can use the result from (ii).
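The three cases in part (b) amount to rearranging the addition rule for P(B), which can be sketched in Python with exact fractions. The variable names below are illustrative only.

```python
from fractions import Fraction

p_a = Fraction(1, 2)        # P(A) = 0.5
p_union = Fraction(7, 10)   # P(A or B) = 0.7

# (i) Mutually exclusive: P(A and B) = 0, so P(B) = P(A or B) - P(A).
p_b_exclusive = p_union - p_a

# (ii) Independent: P(A or B) = P(A) + P(B) - P(A)P(B),
# so P(B) = (P(A or B) - P(A)) / (1 - P(A)).
p_b_independent = (p_union - p_a) / (1 - p_a)

# (iii) Given P(A | B): P(A or B) = P(A) + P(B) - P(B)P(A | B),
# so P(B) = (P(A or B) - P(A)) / (1 - P(A | B)).
p_a_given_b = Fraction(1, 2)
p_b_conditional = (p_union - p_a) / (1 - p_a_given_b)

print(p_b_exclusive, p_b_independent, p_b_conditional)
```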

(c) The z-score for type A is $z_A = \frac{70 - 61}{5} = 1.8$ and for type B it is $z_B = \frac{70 - 64}{4} = 1.5$. Since $z_A > z_B$, type B schools have a higher proportion of students with marks above 70.

Alternatively, the actual proportions could be calculated: P(Z > 1.8) = 0.0359 and P(Z > 1.5) = 0.0668, hence type B schools have the higher proportion.
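The tail probabilities quoted above come from the table in Appendix C, but they can also be computed. The sketch below assumes the standard identity Φ(z) = ½(1 + erf(z/√2)) and uses `math.erf` from Python's standard library; the function names `phi` and `proportion_above` are our own.

```python
import math


def phi(z):
    """Standard Normal cumulative probability P(Z <= z) via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_above(mark, mean, sd):
    """P(X > mark) for X ~ Normal(mean, sd^2), found by standardising."""
    z = (mark - mean) / sd
    return 1.0 - phi(z)


p_type_a = proportion_above(70, 61, 5)  # z = 1.8
p_type_b = proportion_above(70, 64, 4)  # z = 1.5

print(round(p_type_a, 4), round(p_type_b, 4))
```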

(d) Several forms of bias exist in this design, including undercoverage bias, response bias and non-response bias. Readers of a hunting magazine would probably share positive views about gun ownership. This group of readers is not a representative sample of the general public when it comes to gun control. Therefore, undercoverage bias is suggested. Since this group of readers probably has strong views about gun control, they might answer more often than the general public, therefore, exhibiting non-response bias. There is also an expectation among hunters that gun control is not a good idea. This expectation might lead to a response bias not found in the general population. In order to try to avoid this form of bias, a magazine covering a topic unrelated to guns might be more appropriate for a subscription list.

Question 5

Let A, B and C be the events Mr Adams is elected, Miss Brown is elected and Dr Cooper is elected, respectively.

(a) As the events A, B and C are mutually exclusive and collectively exhaustive, we have

P(C) = 1 − P(A) − P(B) = 1 − 0.3 − 0.5 = 0.2.

(b) Let X be the event membership fees increase. As the events A, B and C are mutually exclusive and collectively exhaustive, we also have

P(X) = P(A)P(X | A) + P(B)P(X | B) + P(C)P(X | C)

= (0.3 × 0.8) + (0.5 × 0.1) + (0.2 × 0.4)

= 0.37.

(c) Using Bayes’ theorem, we then have

$$P(C \mid X) = \frac{P(C)\,P(X \mid C)}{P(X)} = \frac{0.2 \times 0.4}{0.37} = \frac{8}{37} = 0.216.$$
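The total probability and Bayes' theorem steps in Question 5 can be sketched in Python as below. The dictionary names `prior` and `increase_given` and the candidate labels are illustrative choices of ours; exact fractions keep the answer 8/37 visible before rounding.

```python
from fractions import Fraction

# Probabilities of each candidate being elected, as given in the question.
prior = {"Adams": Fraction(3, 10), "Brown": Fraction(5, 10)}
# (a) The three events are exhaustive, so Cooper takes the remaining probability.
prior["Cooper"] = 1 - sum(prior.values())

# Probability of a fee increase given each candidate is elected.
increase_given = {
    "Adams": Fraction(8, 10),
    "Brown": Fraction(1, 10),
    "Cooper": Fraction(4, 10),
}

# (b) Total probability formula for a fee increase.
p_increase = sum(prior[name] * increase_given[name] for name in prior)

# (c) Bayes' theorem for P(Cooper elected | fees increased).
posterior_cooper = prior["Cooper"] * increase_given["Cooper"] / p_increase

print(prior["Cooper"], p_increase, posterior_cooper, round(float(posterior_cooper), 3))
```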

Question 6

(a) Using the formula, we find that the sample correlation coefficient is

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} = 0.9356.$$

This indicates (very) strong, positive correlation between examination mark and hours of study.

(b) To find the least squares regression line of the form $y = b_0 + b_1 x + \varepsilon$ we use the formulae

$$b_1 = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2} = 5.1333,$$

and

$$b_0 = \bar{y} - b_1\bar{x} = -7.7000,$$

to see that the estimated regression line is $\hat{y} = -7.7 + 5.1333x$.

(c) For x = 16, the expected examination mark is −7.7 + 5.1333(16) = 74.43, which we may round to 74. We expect the predicted value for x = 16 to be more accurate because the available x data cover a range of 11.5 to 22, hence 16 is near the middle of the sample x values whereas 20 is towards the upper limit. Interpolation is more accurate for values near the centre of the sample data.
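The calculations in Question 6 use only the summary statistics given in the question, so they can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic; the variable names and the `predict` helper are our own.

```python
import math

# Summary statistics as given in Question 6.
n = 15
sum_x, sum_x2 = 247.5, 4218.75
sum_y, sum_y2 = 1155.0, 92999.0
sum_xy = 19750.5

x_bar = sum_x / n
y_bar = sum_y / n

# Corrected sums of squares and cross-products from the formula sheet.
s_xx = sum_x2 - n * x_bar**2
s_yy = sum_y2 - n * y_bar**2
s_xy = sum_xy - n * x_bar * y_bar

r = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation coefficient
b1 = s_xy / s_xx                    # regression slope
b0 = y_bar - b1 * x_bar             # regression intercept


def predict(hours):
    """Predicted examination mark from the fitted line b0 + b1 * hours."""
    return b0 + b1 * hours


print(round(r, 4), round(b1, 4), round(b0, 4), round(predict(16), 2))
```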

Appendix C

Cumulative Normal probabilities

The entries in this table are cumulative probabilities for the standard Normal distribution and give Φ(z) = P(Z ≤ z) for z ≥ 0. For example, P(Z ≤ 1.96) = 0.9750.

For values of z < 0, use P(Z ≤ z) = 1 − P(Z ≤ |z|) = 1 − Φ(|z|). For example,

P(Z ≤ −1) = 1 − P(Z ≤ 1) = 1 − Φ(1) = 1 − 0.8413 = 0.1587.

z     0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0   0.5000  0.5040  0.5080  0.5120  0.5160  0.5199  0.5239  0.5279  0.5319  0.5359
0.1   0.5398  0.5438  0.5478  0.5517  0.5557  0.5596  0.5636  0.5675  0.5714  0.5753
0.2   0.5793  0.5832  0.5871  0.5910  0.5948  0.5987  0.6026  0.6064  0.6103  0.6141
0.3   0.6179  0.6217  0.6255  0.6293  0.6331  0.6368  0.6406  0.6443  0.6480  0.6517
0.4   0.6554  0.6591  0.6628  0.6664  0.6700  0.6736  0.6772  0.6808  0.6844  0.6879
0.5   0.6915  0.6950  0.6985  0.7019  0.7054  0.7088  0.7123  0.7157  0.7190  0.7224
0.6   0.7257  0.7291  0.7324  0.7357  0.7389  0.7422  0.7454  0.7486  0.7517  0.7549
0.7   0.7580  0.7611  0.7642  0.7673  0.7704  0.7734  0.7764  0.7794  0.7823  0.7852
0.8   0.7881  0.7910  0.7939  0.7967  0.7995  0.8023  0.8051  0.8078  0.8106  0.8133
0.9   0.8159  0.8186  0.8212  0.8238  0.8264  0.8289  0.8315  0.8340  0.8365  0.8389
1.0   0.8413  0.8438  0.8461  0.8485  0.8508  0.8531  0.8554  0.8577  0.8599  0.8621
1.1   0.8643  0.8665  0.8686  0.8708  0.8729  0.8749  0.8770  0.8790  0.8810  0.8830
1.2   0.8849  0.8869  0.8888  0.8907  0.8925  0.8944  0.8962  0.8980  0.8997  0.9015
1.3   0.9032  0.9049  0.9066  0.9082  0.9099  0.9115  0.9131  0.9147  0.9162  0.9177
1.4   0.9192  0.9207  0.9222  0.9236  0.9251  0.9265  0.9279  0.9292  0.9306  0.9319
1.5   0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441
1.6   0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545
1.7   0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633
1.8   0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9   0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0   0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1   0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2   0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3   0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4   0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5   0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6   0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7   0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8   0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9   0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0   0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990
