8678core material notes


Upload: james-nola

Post on 03-Apr-2018


Page 1: 8678Core Material Notes

7/29/2019 8678Core Material Notes

http://slidepdf.com/reader/full/8678core-material-notes 1/16

Further Mathematics – Core Material Notes

UNIVARIATE DATA

Categorical data

Type of data    Graph                  Qualifications on use
Categorical     Bar chart
                Segmented bar chart    No more than 4-5 categories.

Categorical data: data obtained when classifying or naming some quality or

attribute.

[Figures: bar chart; segmented bar chart]

Other variants include:

‘Percentage segmented bar chart’ 

» Analysing Categorical Data

Writing up a Report

Skills check:

- Write a brief report to describe the distribution of a numerical variable in terms of shape, centre, spread and outliers (if any).

- Write a brief report to describe the distribution of a categorical variable in terms of the dominant category (if any), the order of occurrence of each category and their relative importance.

Numerical data


Type of data    Graph        Qualifications on use
Numerical       Histogram    Medium to large data sets
                Stem plot    Best for small to medium sized data sets
                Dot plot     Suitable for only small data sets


Numerical data: data obtained by measuring or counting some quantity.

• Discrete: distinct values that can be counted (e.g. number of people, as you cannot have a fraction of a person).

• Continuous: data that can have any value, even with decimals (e.g. height,

temperature, anything that requires a measuring device).

[Figures: histogram (continuous); histogram (discrete); stem-and-leaf plot; dot plot; split stem plot; back-to-back stem plot]

» Analysing Numerical Data

Shape

Symmetric


Perfectly symmetrical data with a single peak has an equal mean, median and mode.

Bimodal data shows two equal

modes (usually indicates there are

two groups of data that need to be

separated such as height of boys and

girls).

Positively skewed (long tail towards the higher values)

Negatively skewed (long tail towards the lower values)

Note: when the graph is skewed, the median and IQR are used when

measuring centre and spread.

Measures of centre

Mean: the average value.

Median: the midpoint of a distribution (50th percentile).

Mode: the most commonly occurring value/s (only used when there is a high

number of scores)

Measures of spread

Range: the difference between the smallest value and the largest.

IQR (interquartile range): the range in which the middle 50% of the values lie.

IQR = Q3 − Q1, where Q3 is the 75th percentile and Q1 is the 25th percentile.


Outliers

Any data values that stand out from the main part of the data (values that are unusually high or low).

» Five-number summary and Box Plots

A listing of the median, the 1st and 3rd quartiles, and the smallest and largest data values of a distribution:

Minimum, Q1, M, Q3, Maximum

From this five-number summary, a box plot can be constructed:

General Box Plot 

Box Plot with outlier/s

Note: the lower fence and upper fence are not drawn in but it must be

understood that values that lie outside these fences are classified as ‘possible

outliers’ (possible, in that the distribution may just have a very long tail and

there is not enough data to pick up other values within the tail).
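The five-number summary and the fences can be sketched in Python as below. This is only an illustration under stated assumptions: the quartiles are taken as the medians of the lower and upper halves of the sorted data, the fences use the common 1.5 × IQR convention (the notes do not state a fence formula), and the `sample` values are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]             # values below the median position
    upper = data[half + (n % 2):]   # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def fences(q1, q3):
    """Lower and upper fences, assuming the 1.5 x IQR convention."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

sample = [2, 4, 5, 7, 8, 9, 11, 12, 30]   # hypothetical data set
minimum, q1, med, q3, maximum = five_number_summary(sample)
lower_fence, upper_fence = fences(q1, q3)

# Values outside the fences are flagged as possible outliers.
possible_outliers = [v for v in sample if v < lower_fence or v > upper_fence]
print(minimum, q1, med, q3, maximum)
print(lower_fence, upper_fence, possible_outliers)
```

A CAS calculator or statistics package may use a slightly different quartile method, so its Q1 and Q3 can differ a little from this sketch on small data sets.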

» Box Plots and Distribution Shape


[Figures: box plots for symmetric, positively skewed and negatively skewed distributions]

» Standard Deviation

Standard Deviation: used to measure the spread of a data distribution around

the mean.

The standard deviation can be estimated by assuming that around 95% of the data values lie within two standard deviations of the mean (a spread of four standard deviations in total):

standard deviation ≈ range ÷ 4

Note: now that all of the summary statistics have been explained, the following summary statistics are usually used together:

o Mean and standard deviation

o Median and IQR

o Mode and range

» 68-95-99.7% Rule

» Standard scores/z-scores

Standard scores/z-scores: transformed data values that show the number of 

standard deviations that the values lie from the mean of the distribution.

Example: The mean study score for Further Mathematics in 2011 was 30 with a

standard deviation of 7. Student A received a study score of 47, while student B

received a score of 23. Calculate the z-scores for both students and comment.

z = (score − mean) ÷ standard deviation

For student A: z = (47 − 30) ÷ 7 ≈ 2.43

For student B: z = (23 − 30) ÷ 7 = −1


Drawing this out graphically, student A lies at 

the BLUE point and student B lies at the GREEN

 point:

From this graph, it can be understood that

student A’s score lies within the top 2.5% of the

state, whilst student B lies in the bottom 16% of 

the state, scoring lower than 84% of the rest of 

the state.
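The z-score calculation in this example can be checked with a few lines of Python. This is a minimal sketch; the function name `z_score` is just an illustrative choice, and the figures are the ones given in the example.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

mean_score, sd_score = 30, 7             # figures from the example above

z_a = z_score(47, mean_score, sd_score)  # student A
z_b = z_score(23, mean_score, sd_score)  # student B

print(round(z_a, 2), z_b)
```

Student A is more than two standard deviations above the mean, while student B is exactly one standard deviation below it, which is where the 2.5% and 16% figures come from.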

BIVARIATE DATA

Bivariate data sets can be of three types:

o Categorical – Categorical

o Numerical – Categorical

o Numerical – Numerical

For any bivariate data set, one variable is dependent and the other independent:

The dependent variable responds to change in the independent variable.

The independent variable explains the change in the dependent variable.

Categorical – categorical

» Two-way frequency table

Note: Unless the two column sums are equal, it is incorrect to make a

 judgement regarding the relationship between these variables based on the first 

table. In order to accurately compare the two variables, it is important that the

table entries are converted into percentages (as shown below)
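Converting a two-way frequency table to column percentages might look like the sketch below. The table values and category names are hypothetical, chosen only so that the two column sums differ.

```python
# Hypothetical two-way frequency table: columns are the independent
# variable (sex), rows are the dependent variable (opinion).
table = {
    "Male":   {"For": 12, "Against": 18},
    "Female": {"For": 30, "Against": 20},
}

def column_percentages(freq_table):
    """Convert each column of counts into percentages of its column total."""
    result = {}
    for column, counts in freq_table.items():
        total = sum(counts.values())
        result[column] = {
            row: round(100 * count / total, 1) for row, count in counts.items()
        }
    return result

percentages = column_percentages(table)
for column, rows in percentages.items():
    print(column, rows)
```

In the raw counts more females than males are "Against" (20 versus 18), yet the percentages show the opposite pattern (40% versus 60%), which is why the comparison should be made on percentages.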


» Segmented Bar Chart

The relationship between two categorical variables can also be compared by using a percentage segmented bar chart:

Numerical – categorical

» Parallel box plots

Note: When analysing parallel box plots,

compare the following features:

- Medians

- IQRs and/or ranges

- Shapes (symmetric or skewed)

Numerical – numerical

» Scatterplots


» Interpreting Scatterplots 

 The object of bivariate analysis is to

determine whether a relationship exists

between two variables and if so, its:

Strength

- Correlation coefficient (r): strong, moderate, weak or no relationship

- Coefficient of determination (r²)

Form

- Linear or non-linear

Direction

- Positive, negative, or no correlation

» Pearson’s Correlation Coefficient

Correlation coefficient: a value between −1 and +1 that shows how strong the linear relationship between two variables is. The closer the value is to +1 or −1, the closer the points lie to a straight line (a perfect linear graph gives exactly +1 or −1), whilst a graph with no correlation has a value near 0, with points scattered everywhere and no clear pattern.

o Positive values always represent relationships with a

positive gradient.

o Negative values always represent relationships with a negative gradient.

Note:

1. The correlation coefficient is designed for numerical and linear data only.

2. It should be used with caution if outliers are present.
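The notes leave the calculation of r to the CAS calculator. For reference, a minimal Python sketch of the standard Pearson formula is given below; the function name and the sample data are illustrative assumptions rather than part of the original notes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data lying exactly on straight lines.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])     # perfect positive relationship
r_down = pearson_r(xs, [10, 8, 6, 4, 2])   # perfect negative relationship
print(r_up, r_down)
```

Because the formula only measures how closely the points follow a straight line, a clearly curved relationship can still give a value of r that looks weak, which is the reason for the first caution in the note.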

» Coefficient of Determination

Coefficient of determination: describes the proportion of the variation in the dependent variable that can be explained by variation in the independent variable (usually expressed as a percentage).

The standard analysis is:

The coefficient of determination, calculated to be [r²], shows that [100r²]% of the variation in [dependent variable] can be explained by variation in [independent variable]. The other [100 − 100r²]% of variation in [dependent variable] can be explained by other factors or influences.
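The standard analysis sentence can be filled in mechanically once r is known. The sketch below is illustrative only: the function name, the value r = 0.8 and the variable names "weight" and "height" are hypothetical.

```python
def describe_r_squared(r, dependent, independent):
    """Fill in the standard coefficient of determination sentence."""
    r_squared = r ** 2
    explained = round(100 * r_squared, 1)
    unexplained = round(100 - explained, 1)
    return (
        f"The coefficient of determination, calculated to be {r_squared:.3f}, "
        f"shows that {explained}% of the variation in {dependent} can be "
        f"explained by variation in {independent}. The other {unexplained}% "
        f"of variation in {dependent} can be explained by other factors or "
        f"influences."
    )

# Hypothetical example: r = 0.8 between height (IV) and weight (DV).
report = describe_r_squared(0.8, "weight", "height")
print(report)
```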


Note: A high coefficient of determination (or a strong correlation) does not by itself show that one variable causes the change in the other. It is therefore important to consider other factors.

For example, it was calculated that the relationship between height and

intellectual ability had a strong correlation. However, this may simply be

because taller people are generally older than those who are shorter and thus,

intellectual ability may be due to age rather than height.

Choosing a Suitable Graph

Independent variable                 Dependent variable    Graph
Categorical                          Categorical           Segmented bar chart
Categorical                          Numerical             Parallel box plots
Categorical (two categories only)    Numerical             Back-to-back stem plots, parallel box plots (preferred)
Numerical                            Numerical             Scatterplot

REGRESSION

Least Squares Regression

Linear regression: the process of fitting a straight line to bivariate data with the aim of modelling the relationship between two numerical variables.

This straight line can be found using two methods:

» Least Squares Method

Note: The least squares regression line is usually used for data without outliers.

This method assumes that the variables are linearly related.**

When using this method, the IV and DV must be correctly identified.

Residuals: the vertical distances between the actual y value and the predicted y value which lies on the least squares line (residual = actual y − predicted y).


Least squares line: minimises the sum of the squares of the residuals.

Note: For a line written as y = a + bx, the slope, b, predicts the change in y when x changes by one unit.

o +b means y increases as x increases

o  –b means y decreases as x increases

The y-intercept, a, predicts the value of y when x=0

**  To see whether two variables are linearly related, and thus whether or not the

least squares method should be applied, you can plot what is known as the

‘residual plot’. A residual plot shows important information about a relationship

and allows you to view the residual values for each point.

o A residual plot with a clear curved pattern (for example, one shaped like a positive or negative parabola) indicates that the data is non-linear and a transformation should be applied to the data.

o A residual plot with a random pattern indicates that the data is linear.
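The slope, intercept and residuals described above can be sketched in Python as follows. The notes rely on the CAS calculator for this step, so the formulas used here (b as the sum of products of deviations divided by the sum of squared x deviations, and a chosen so that the line passes through the point of means) are the standard least squares results rather than something quoted from the notes, and the data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept: line passes through the means
    return a, b

# Hypothetical data: x is the independent variable, y the dependent variable.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)

# Residual = actual y - predicted y; these are the values shown on a residual plot.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(a, 2), round(b, 2))
print([round(res, 2) for res in residuals])
```

The residuals of a least squares line always sum to zero (up to rounding), so it is their pattern against x, not their total, that indicates whether a linear model is appropriate.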

» The Three Median Line

Note: The Three Median regression line is usually used for data with

outliers.

This method assumes that the variables are linearly related.**

1. Plot the data on a scatterplot

2. Divide the points symmetrically into three groups (if you are unable to divide

the points equally, divide it in a way that the left and right sides both have

equal amounts).

3. Find the median point of each group by finding the median of the x and y

values.

4. Connect the two outside points and move

this line one third of the way towards

the middle point.
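The four steps above can be sketched in Python as below. This is one possible reading of the construction, under these assumptions: when the points do not divide evenly, a single spare point goes to the middle group and two spare points go one to each outer group, and "moving the line one third of the way" is done by shifting the intercept by one third of the vertical gap at the middle median point. The data are hypothetical.

```python
from statistics import median

def three_median_line(points):
    """Return (intercept, slope) of the three median line for (x, y) points."""
    pts = sorted(points)                 # order the points by x value
    n = len(pts)
    base, extra = divmod(n, 3)
    # Keep the outer groups the same size; one spare point goes to the middle.
    outer = base + (1 if extra == 2 else 0)
    groups = [pts[:outer], pts[outer:n - outer], pts[n - outer:]]
    # Median point of each group: (median of x values, median of y values).
    (xl, yl), (xm, ym), (xr, yr) = [
        (median(x for x, _ in g), median(y for _, y in g)) for g in groups
    ]
    slope = (yr - yl) / (xr - xl)        # line through the two outer points
    outer_intercept = yl - slope * xl
    # Vertical gap between the middle median point and the outer line.
    gap = ym - (outer_intercept + slope * xm)
    return outer_intercept + gap / 3, slope

# Hypothetical data set of nine points.
data = [(1, 2), (2, 3), (3, 5), (4, 7), (5, 9), (6, 8), (7, 7), (8, 9), (9, 12)]
intercept, slope = three_median_line(data)
print(round(intercept, 3), slope)
```

Because only medians are used, changing the largest y value in the data to something extreme leaves the line unchanged, which is why this method suits data with outliers.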

Extrapolation and Interpolation

Interpolation: predicting within the range of 

data.

Extrapolation: predicting outside the range

of data.

Note: extrapolation is a less reliable process

than interpolation as you are going beyond your original data.


Data Transformation

Note: When a transformation is needed, use the values of r and r² to help determine which is best.

TIME SERIES

Summarising Time Series Data

 Time series data is simply data with a timeframe as the independent variable.

 There are four ways in which it can be described:

Trend

Data displays a trend (or secular trend) when a consistent increase or decrease can be seen in the data over a significant period of time. A trend line can be fitted to such data.


Seasonality/Seasonal variation

Seasonal data shows repetitive fluctuating movements which occur within a time period of one year or less. For example, sales of warm drinks might rise every winter. This data can be deseasonalised.

Cycles

Cyclic data shows fluctuations that do not occur at consistent intervals, amplitudes or seasons, and that take place over time intervals of more than one year. This includes data such as stock prices.

Random (Variation)

Random data shows no pattern. All fluctuations occur by chance and cannot be

predicted.


Smoothing Time Series Data

Time series can be smoothed in two ways:

» Moving Means

3-moving mean smoothing

5-moving mean smoothing

4-moving mean smoothing

When it comes to even numbers, the centre of the set of points is not a point

belonging to the original series. This problem is solved by using a process called

centring. This is done by taking two smoothed values beside one another and

smoothing those two values. In other words, the moving mean smoothing

process is done twice.

Note: Moving means smoothing is not limited to 3, 5 and 4. These numbers

have been chosen specifically for the sake of explaining the process of 

smoothing time series data.

» Moving Medians

Median smoothing is very similar to moving means smoothing, however the

median of the points is taken instead of the average. This method is generally

preferred to moving average smoothing when outliers are present.
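Both smoothing methods, including centring for an even number of points, can be sketched with one small helper. The function names and the series values below are hypothetical and only meant to illustrate the process described above.

```python
from statistics import mean, median

def moving(values, k, func=mean):
    """Apply func (mean or median) to each run of k consecutive values."""
    return [func(values[i:i + k]) for i in range(len(values) - k + 1)]

def centred_moving_mean(values, k):
    """Moving mean for an even k, followed by centring (a 2-moving mean)."""
    return moving(moving(values, k), 2)

series = [10, 12, 14, 18, 16, 20]       # hypothetical time series

three_mean = moving(series, 3)           # 3-moving mean
three_median = moving(series, 3, median) # 3-moving median
four_centred = centred_moving_mean(series, 4)

print(three_mean)
print(three_median)
print(four_centred)
```

Each smoothed value for an odd k belongs to the middle time point of its window. For the centred 4-moving mean, the first smoothed value belongs to the third time point of the original series.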

Seasonal Indices

Note:

- Deseasonalised data helps show the trend in the series more clearly, as well as the individual months that are different from the usual seasonal pattern (i.e. it helps remove the seasonal component).

- Therefore, we deseasonalise when there is a seasonal component, that is, when there is a repeating pattern every season.

- The sum of the seasonal indices equals the number of seasons (e.g. if the seasons are months, the seasonal indices add to 12).

» Interpreting Seasonal Indices

To interpret a seasonal index, convert the seasonal index into a percentage (note: this is an optional step). Once this value is obtained, subtract 100% from it. Alternatively, if the percentage conversion is not carried out, subtracting 1 will give the same answer as a decimal.

Example: The seasonal index for unemployment for the month of February is 1.2.

1.2 × 100% = 120%, and 120% − 100% = 20%.

Remember, the average seasonal index is 100%. Therefore, this means that February unemployment figures tend to be 20% higher than the monthly average. If we obtained a negative answer, say for example −10%, this would mean that unemployment figures for that month tend to be 10% lower than the monthly average.

Seasonal indices can also be used to comment on the relationship between

seasons, for example:

o A seasonal index of 1.3 means that the season is 30% above the average

of the seasons

o A seasonal index of 1.0 means that the season is equal to the average of 

the seasons

o A seasonal index of 0.8 means that the season is 20% below the average

of the seasons

Example: Mikki runs a shop and she wishes to determine quarterly seasonal indices based on her last year's sales, which are shown in the table below.

Summer   Autumn   Winter   Spring
920      1085     1241     446

1. The seasonal index is defined by: seasonal index = value for the season ÷ seasonal average. The seasons are quarters. Write the formula in terms of quarters: seasonal index = quarterly sales ÷ quarterly average.


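2. Find the quarterly average for the year: (920 + 1085 + 1241 + 446) ÷ 4 = 3692 ÷ 4 = 923.

3. Work out the seasonal index (SI) for each time period by dividing each quarter's sales by the quarterly average, e.g. for Summer, 920 ÷ 923 ≈ 0.997.

4. Check that the seasonal indices sum to 4 (the number of seasons): 0.997 + 1.176 + 1.345 + 0.483 = 4.001. The slight difference here is due to rounding error.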

5. Write out your answers as a table of the seasonal indices.

Seasonal Indices

Summer   Autumn   Winter   Spring
0.997    1.176    1.345    0.483
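Mikki's seasonal indices can be reproduced with the short Python sketch below. The sales figures are taken from the example; the variable names are illustrative choices.

```python
# Quarterly sales from the example.
sales = {"Summer": 920, "Autumn": 1085, "Winter": 1241, "Spring": 446}

# Step 2: quarterly average for the year.
quarterly_average = sum(sales.values()) / len(sales)

# Step 3: seasonal index = quarterly sales / quarterly average.
seasonal_indices = {
    season: round(value / quarterly_average, 3) for season, value in sales.items()
}

# Step 4: the indices should sum to the number of seasons (4), up to rounding.
index_total = round(sum(seasonal_indices.values()), 3)

print(quarterly_average)
print(seasonal_indices)
print(index_total)
```

The total comes out at 4.001 rather than exactly 4 because each index was rounded to three decimal places before adding.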

» Steps in calculating seasonal indices for several years’ data

1. Calculate the seasonal indices for Years 1, 2, 3 etc. separately.

2. You should then have three different sets of seasonal indices (or three tables, one for each of the different years).

3. Average the three sets of seasonal indices at the end to obtain a single set of 

seasonal indices (e.g. find the average of quarter 1 for year 1, 2 and 3, find

the average for quarter 2 for year 1, 2, and 3, and so on. In the end you

should have only one table representing the seasonal indices).
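Averaging several years of seasonal indices might look like the following sketch. The three sets of quarterly indices are hypothetical values invented for illustration.

```python
# Hypothetical quarterly seasonal indices for three separate years
# (quarter 1 to quarter 4 in each list).
yearly_indices = [
    [1.2, 0.9, 0.8, 1.1],   # year 1
    [1.1, 1.0, 0.7, 1.2],   # year 2
    [1.3, 0.8, 0.9, 1.0],   # year 3
]

# zip(*...) groups the indices quarter by quarter across the years,
# so each quarter is averaged over years 1, 2 and 3.
average_indices = [
    round(sum(quarter) / len(quarter), 3) for quarter in zip(*yearly_indices)
]

print(average_indices)
```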

Fitting a Trend Line and Forecasting

» Fitting a trend line

» Forecasting

» Taking seasonality into account

» Making predictions with deseasonalised data


CHECK QUESTION 2A OF CHAPTER 7E

CAS CALCULATOR TUTORIAL FOR ‘CORE MATERIAL’

Plotting: Five-number summary and Box Plots (and other types of plots)

How to find the standard deviation, and mean (and other ones)

How to find the r value and r^2 value

Least squares regression line

Residual Plot

Applying transformations to a set of values of a function
