8678core material notes
7/29/2019 8678Core Material Notes
http://slidepdf.com/reader/full/8678core-material-notes 1/16
Further Mathematics – Core Material Notes
UNIVARIATE DATA
Categorical data
Type of data    Graph                  Qualifications on use
Categorical     Bar chart
                Segmented bar chart    No more than 4-5 categories.
Categorical data: data obtained when classifying or naming some quality or
attribute.
Bar chart Segmented bar chart
Other variants include:
‘Percentage segmented bar chart’
» Analysing Categorical Data
Writing up a Report
Skills check:
- Write a brief report to describe the distribution of a numerical variable in
terms of shape, centre, spread and outliers (if any).
- Write a brief report to describe the distribution of a categorical variable in
terms of the dominant category (if any), the order of occurrence of each
category and their relative importance.
Numerical data
Type of data    Graph        Qualifications on use
Numerical       Histogram    Medium to large data sets
                Stem plot    Best for small to medium sized data sets
                Dot plot     Suitable for only small data sets
Numerical data: data obtained by measuring or counting some quantity.
• Discrete: distinct values that can be counted (e.g. number of people, you
cannot have a fraction of a person).
• Continuous: data that can have any value, even with decimals (e.g. height,
temperature, anything that requires a measuring device).
Histogram (continuous)    Histogram (discrete)
Stem-and-leaf plot    Dot plot
Split stem plot    Back-to-back stem plot
» Analysing Numerical Data
Shape
Symmetric
Perfectly symmetrical data has an
equal mean, median and mode.
Bimodal data shows two equal
modes (usually indicates there are
two groups of data that need to be
separated such as height of boys and
girls).
Positively skewed Negatively skewed
Note: when the graph is skewed, the median and IQR are used when
measuring centre and spread.
Measures of centre
Mean: the average value.
Median: the midpoint of a distribution (50th percentile).
Mode: the most commonly occurring value/s (only used when there is a high
number of scores)
Measures of spread
Range: the difference between the smallest value and the largest.
IQR (interquartile range): the range in which the middle 50% of the values lie.
IQR = Q3 − Q1, where Q3 is the 75th percentile and Q1 is the 25th percentile.
Outliers
Any data value/s that stands out from the
main part of the data (values that are
unusually high or low).
» Five-number summary and Box Plots
A listing of the median, the 1st and 3rd quartiles, and the smallest and largest data
values of a distribution:
Minimum, Q1, M, Q3, Maximum
From this five-number summary, a box plot can be constructed:
General Box Plot
Box Plot with outlier/s
Note: the lower fence and upper fence are not drawn in but it must be
understood that values that lie outside these fences are classified as ‘possible
outliers’ (possible, in that the distribution may just have a very long tail and
there is not enough data to pick up other values within the tail).
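The fences in the note above are conventionally placed 1.5 × IQR below Q1 and 1.5 × IQR above Q3. The sketch below shows one way to find the five-number summary and flag possible outliers. It assumes the quartiles are the medians of the lower and upper halves of the data; the function names and the data set are hypothetical, invented for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the number of values is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

def possible_outliers(values):
    """Return the values lying outside the lower and upper fences."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data set, invented for illustration only.
scores = [2, 5, 6, 7, 7, 8, 9, 10, 11, 25]
print(five_number_summary(scores))  # (2, 6, 7.5, 10, 25)
print(possible_outliers(scores))    # [25]
```

Here the IQR is 10 − 6 = 4, so the fences sit at 0 and 16, and only the value 25 is flagged as a possible outlier.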
» Box Plots and Distribution Shape
Symmetric    Positively skewed    Negatively skewed
» Standard Deviation
Standard Deviation: used to measure the spread of a data distribution around
the mean.
The standard deviation can be estimated by assuming that around 95% of the
data values lie within two standard deviations of the mean (a spread of four
standard deviations in total), so that s ≈ range ÷ 4.
Note: now that all of the summary statistics have been explained, the following
summary statistics are usually used together:
o Mean and standard deviation
o Median and IQR
o Mode and range
» 68-95-99.7% Rule
» Standard scores/z-scores
Standard scores/z-scores: transformed data values that show the number of
standard deviations that the values lie from the mean of the distribution.
Example: The mean study score for Further Mathematics in 2011 was 30 with a
standard deviation of 7. Student A received a study score of 47, while student B
received a score of 23. Calculate the z-scores for both students and comment.
For student A: z = (47 − 30) ÷ 7 ≈ 2.43
For student B: z = (23 − 30) ÷ 7 = −1
Drawing this out graphically, student A lies at
the BLUE point and student B lies at the GREEN
point:
From this graph, it can be understood that
student A’s score lies within the top 2.5% of the
state, whilst student B lies in the bottom 16% of
the state, scoring lower than 84% of the rest of
the state.
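The z-score calculation for the two students can be checked with a short sketch. It uses the mean of 30 and standard deviation of 7 given in the example; the function name z_score is a hypothetical choice.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev

mean_score, sd = 30, 7  # figures from the study score example
z_a = z_score(47, mean_score, sd)
z_b = z_score(23, mean_score, sd)
print(round(z_a, 2), z_b)  # 2.43 -1.0
```

Student A is more than two standard deviations above the mean, which places the score in the top 2.5% under the 68-95-99.7% rule. Student B is exactly one standard deviation below the mean, which places the score in the bottom 16%.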
BIVARIATE DATA
Bivariate data sets can be of three types:
o Categorical – Categorical
o Numerical – Categorical
o Numerical – Numerical
For any bivariate data set, one variable is dependent and the other independent:
The dependent variable responds to change in the independent variable.
The independent variable explains the change in the dependent variable.
Categorical – categorical
» Two-way frequency table
Note: Unless the two column sums are equal, it is incorrect to make a
judgement regarding the relationship between these variables based on the first
table. In order to accurately compare the two variables, it is important that the
table entries are converted into percentages (as shown below)
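Converting a two-way frequency table to column percentages can be sketched as follows. The counts, the category labels and the function name are all hypothetical, invented only to show why unequal column sums make raw counts misleading.

```python
def column_percentages(table):
    """Convert each column of a two-way frequency table to percentages.

    table maps each column label (a category of the independent variable)
    to a dict of row label -> count.
    """
    result = {}
    for column, counts in table.items():
        total = sum(counts.values())
        result[column] = {
            row: round(100 * n / total, 1) for row, n in counts.items()
        }
    return result

# Hypothetical counts, invented for illustration only.
counts = {
    "Male": {"For": 30, "Against": 20},
    "Female": {"For": 45, "Against": 55},
}
percentages = column_percentages(counts)
print(percentages["Male"])    # {'For': 60.0, 'Against': 40.0}
print(percentages["Female"])  # {'For': 45.0, 'Against': 55.0}
```

In this made-up table the raw counts suggest more support among females (45 against 30), but the column sums are 100 and 50, and the percentages show the opposite (45% against 60%).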
» Segmented Bar Chart
The relationship between two categorical variables can also be compared by
using a percentage segmented bar chart:
Numerical – categorical
» Parallel box plots
Note: When analysing parallel box plots,
compare the following features:
- Medians
- IQRs and/or ranges
- Shapes (symmetric or skewed)
Numerical – numerical
» Scatterplots
» Interpreting Scatterplots
The object of bivariate analysis is to
determine whether a relationship exists
between two variables and if so, its:
Strength
- Correlation coefficient (r): strong, moderate, weak or no relationship
- Coefficient of determination (r²)
Form
- Linear or non-linear
Direction
- Positive, negative, or no correlation
» Pearson’s Correlation Coefficient
Correlation coefficient: a value between −1 and +1 that shows how strong the
linear relationship between two variables is. A value of +1 or −1 corresponds
to a perfectly linear graph, whilst a value near 0 indicates no correlation,
with points scattered everywhere and no clear pattern.
o Positive values always represent relationships with a
positive gradient.
o Negative values always represent relationships with a negative gradient.
Note: 1. The correlation coefficient is designed for numerical and linear
data only.
2. It should be used with caution if outliers are present.
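In practice r is found on a CAS calculator, but the calculation behind it can be sketched in a few lines. The function name and the small data set below are hypothetical, chosen only to illustrate the standard Pearson formula.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for paired numerical data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 4))       # 0.7746
print(round(r ** 2, 2))  # 0.6
```

The positive value matches the positive gradient of the data. Reversing the sign of every y value would give r ≈ −0.7746, a relationship of the same strength with a negative gradient.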
» Coefficient of Determination
Coefficient of determination: describes the proportion of the variation in the
dependent variable that is explained by variation in the independent variable
(usually expressed as a percentage).
The standard analysis is:
The coefficient of determination, calculated to be [r²], shows that [100r²]%
of the variation in [dependent variable] can be explained by variation in
[independent variable]. The other [100 − 100r²]% of variation in
[dependent variable] can be explained by other factors or influences.
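The standard analysis can be filled in mechanically once r is known. The sketch below is only an illustration: the function name, the value r = 0.8 and the variable names are hypothetical.

```python
def determination_report(r, dependent, independent):
    """Fill in the standard analysis sentence from a correlation coefficient r."""
    r_squared = r ** 2
    explained = round(100 * r_squared, 1)
    unexplained = round(100 - explained, 1)
    return (
        f"The coefficient of determination, calculated to be "
        f"{round(r_squared, 2)}, shows that {explained}% of the variation in "
        f"{dependent} can be explained by variation in {independent}. "
        f"The other {unexplained}% of variation in {dependent} can be "
        f"explained by other factors or influences."
    )

# Hypothetical values, invented for illustration only.
report = determination_report(0.8, "exam score", "hours studied")
print(report)
```

With r = 0.8 the report states that 64.0% of the variation is explained and 36.0% is left to other factors.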
Note: The coefficient of determination does not entirely determine whether
there is a relationship between two variables, therefore, it is important to
consider other factors.
For example, it was calculated that the relationship between height and
intellectual ability had a strong correlation. However, this may simply be
because taller people are generally older than those who are shorter and thus,
intellectual ability may be due to age rather than height.
Choosing a Suitable Graph
Type of data
Independent variable                 Dependent variable    Graph
Categorical                          Categorical           Segmented bar chart
Categorical                          Numerical             Parallel box plots
Categorical (two categories only)    Numerical             Back-to-back stem plots
                                                           Parallel box plots (preferred)
Numerical                            Numerical             Scatterplot
REGRESSION
Least Squares Regression
Linear regression: the process of fitting a straight line to bivariate data with the
aim of modelling the relationship between two numerical variables.
This straight line can be found using two methods:
» Least Squares Method
Note: Least Squares regression line is usually used for data without
outliers.
This method assumes that the variables are linearly related.**
When using this method, the IV and DV must be correctly identified.
Residuals: the vertical distances between the
actual y value and the predicted y value
which lies on the least squares line
(residual = actual y − predicted y).
Least squares line: minimises the sum of the squares of the residuals.
Note: The slope, b, predicts the change in y when x changes by one unit.
o +b means y increases as x increases
o –b means y decreases as x increases
The y-intercept, a, predicts the value of y when x=0
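The slope b and y-intercept a of the least squares line y = a + bx can be computed directly. The sketch below works from sums of squared deviations, which is an assumption about method; it gives the same line as the formula b = r × sy ÷ sx. The function name and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises the sum of
    the squares of the residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # y-intercept: the line passes through the means
    return a, b

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

For this data the line is y = 2.2 + 0.6x: y is predicted to increase by 0.6 for each one-unit increase in x, and the predicted value of y at x = 0 is 2.2.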
** To see whether two variables are linearly related, and thus whether or not the
least squares method should be applied, you can plot what is known as the
‘residual plot’. A residual plot shows important information about a relationship
and allows you to view the residual values for each point.
o A residual plot with a curved pattern (such as an upright or inverted parabola) indicates
that the data is non-linear and a transformation should be applied to the
data.
o A residual plot with a random pattern indicates that the data is linear.
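The values shown in a residual plot can be listed with a short sketch. The data below is hypothetical and deliberately non-linear (y = x²); the line y = −7 + 6x used here is the least squares line for this invented data.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y - predicted y, where predicted y = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical non-linear data (y = x squared), invented for illustration.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
res = residuals(x, y, a=-7, b=6)
print(res)  # [2, -1, -2, -1, 2]
```

The residuals start positive, dip below zero and rise again, which is the parabola-shaped pattern that signals non-linear data.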
» The Three Median Line
Note: The Three Median regression line is usually used for data with
outliers.
This method assumes that the variables are linearly related.**
1. Plot the data on a scatterplot
2. Divide the points symmetrically into three groups (if you are unable to divide
the points equally, divide them so that the left and right groups both have
the same number of points).
3. Find the median point of each group by finding the median of the x and y
values.
4. Connect the two outside points and move
this line one third of the way towards
the middle point.
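The four steps above can be sketched in code. The grouping rule for unequal splits, the function name and the data points are assumptions made for illustration. Moving the outer line one third of the way towards the middle median point is implemented by averaging the three intercepts, which is equivalent.

```python
from statistics import median

def three_median_line(points):
    """Return (a, b) for the three median line y = a + b*x.

    Assumption: when the points cannot be split equally, a remainder of one
    goes to the middle group and a remainder of two is shared by the outer
    groups, so the left and right groups stay the same size.
    """
    pts = sorted(points)
    n = len(pts)
    outer = n // 3 + (1 if n % 3 == 2 else 0)
    groups = [pts[:outer], pts[outer:n - outer], pts[n - outer:]]
    # Median point of a group = (median of its x values, median of its y values).
    (xl, yl), (xm, ym), (xr, yr) = [
        (median([x for x, _ in g]), median([y for _, y in g])) for g in groups
    ]
    b = (yr - yl) / (xr - xl)  # slope of the line through the outer median points
    # Averaging the three intercepts moves the outer line one third of the
    # way towards the middle median point.
    a = ((yl - b * xl) + (ym - b * xm) + (yr - b * xr)) / 3
    return a, b

# Hypothetical data points, invented for illustration only.
data = [(1, 2), (2, 3), (3, 3), (4, 6), (5, 7), (6, 8), (7, 8), (8, 9), (9, 12)]
a, b = three_median_line(data)
print(round(a, 3), b)  # 1.333 1.0
```

Here the median points are (2, 3), (5, 7) and (8, 9). The line through the outer two is y = 1 + x, and shifting it one third of the way towards (5, 7) gives y ≈ 1.333 + x.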
Extrapolation and Interpolation
Interpolation: predicting within the range of
data.
Extrapolation: predicting outside the range
of data.
Note: extrapolation is a less reliable process
than interpolation as you are going beyond your original data.
Data Transformation
Note: When a transformation is needed, use the values of r and r² to help
determine which is best.
TIME SERIES
Summarising Time Series Data
Time series data is simply data with a timeframe as the independent variable.
There are four ways in which it can be described:
Trend
Data displays a trend (or secular trend) when a consistent increase or decrease
can be seen in the data over a significant period of time. A trend line can be
fitted to such data.
Seasonality/Seasonal variation
Seasonal data shows repetitive fluctuating movements which occur within a time
period of one year or less. For example, sales of warm drinks might rise every
winter. This data can be deseasonalised.
Cycles
Cyclic data shows fluctuations, but not at consistent intervals, amplitudes or
seasons, and these occur over time intervals of more than one year. This includes data
such as stock prices.
Random (Variation)
Random data shows no pattern. All fluctuations occur by chance and cannot be
predicted.
Smoothing Time Series Data
Time series can be smoothed in two ways:
» Moving Means
3-moving mean smoothing
5-moving mean smoothing
4-moving mean smoothing
When the number of points being averaged is even, the centre of each set of points is not a point
belonging to the original series. This problem is solved by using a process called
centring. This is done by taking two smoothed values beside one another and
averaging those two values. In other words, the moving mean smoothing
process is done twice.
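Both the odd and the even case can be sketched with one helper. The function names and the series below are hypothetical, invented for illustration; centring is implemented as a second, 2-point moving mean over the first set of smoothed values.

```python
def moving_mean(values, k):
    """k-point moving mean: the mean of each run of k consecutive values."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]

def centred_moving_mean(values, k):
    """For even k: smooth with a k-point moving mean, then average each
    pair of neighbouring smoothed values (centring)."""
    return moving_mean(moving_mean(values, k), 2)

# Hypothetical time series, invented for illustration only.
series = [3, 6, 9, 12, 6, 9, 15]
print(moving_mean(series, 3))          # [6.0, 9.0, 9.0, 9.0, 10.0]
print(centred_moving_mean(series, 4))  # [7.875, 8.625, 9.75]
```

The 3-point smoothed values line up with the 2nd to 6th points of the series. The centred 4-point values line up with the 3rd to 5th points, so smoothing with centring loses two points at each end.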
Note: Moving means smoothing is not limited to 3, 5 and 4. These numbers
have been chosen specifically for the sake of explaining the process of
smoothing time series data.
» Moving Medians
Median smoothing is very similar to moving means smoothing, however the
median of the points is taken instead of the average. This method is generally
preferred to moving average smoothing when outliers are present.
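A moving median can be sketched in the same way as a moving mean. The function name and the series are hypothetical; the value 40 is included as a deliberate outlier.

```python
from statistics import median

def moving_median(values, k):
    """k-point moving median: the median of each run of k consecutive values."""
    return [median(values[i:i + k]) for i in range(len(values) - k + 1)]

# Hypothetical time series with one outlier (40), invented for illustration.
series = [5, 7, 40, 8, 9, 10]
print(moving_median(series, 3))  # [7, 8, 9, 9]
```

The outlier does not appear in the smoothed series at all, whereas a 3-point moving mean of the same data would be pulled upwards in every window that contains the 40.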
Seasonal Indices
Remember, the average seasonal index is 100%.
Therefore, this means that February unemployment
figures tend to be 20% higher than the monthly
average. If we obtained a negative answer, say for
example, −10%, this would mean that
unemployment figures for that month tend to be 10%
lower than the monthly average.
Note: - Deseasonalised data helps show the trend in the series more
clearly, as well as the individual months that differ from the usual seasonal
pattern (i.e. it removes the seasonal component).
- Therefore, we deseasonalise when there is a seasonal component, that is, when
there is a pattern that repeats every season.
- The sum of the seasonal indices equals the number of seasons (e.g. if
the seasons are months, the seasonal indices add to 12).
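Deseasonalising is conventionally done by dividing each actual figure by the seasonal index for its season. The sketch below assumes that convention; the function name, quarterly figures and indices are hypothetical, invented for illustration.

```python
def deseasonalise(actual, indices):
    """Divide each actual figure by the seasonal index for its season."""
    return [round(value / index, 1) for value, index in zip(actual, indices)]

# Hypothetical quarterly figures and seasonal indices (the indices sum to 4).
actual = [130, 90, 110, 70]
indices = [1.3, 0.9, 1.1, 0.7]
print(deseasonalise(actual, indices))  # [100.0, 100.0, 100.0, 100.0]
```

In this made-up case all of the variation is seasonal, so the deseasonalised series is flat. With real data, whatever remains after deseasonalising is the trend and the irregular variation.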
» Interpreting Seasonal Indices
To interpret a seasonal index, convert the seasonal index into a percentage
(note: this is an optional step). Once this value is obtained, subtract 100% from
it. Alternatively, if the percentage conversion is not carried out, subtracting 1 will
also give you the answer as a decimal.
Example: The seasonal index for unemployment for the month of February is 1.2.
Seasonal indices can also be used to comment on the relationship between
seasons, for example:
o A seasonal index of 1.3 means that the season is 30% above the average
of the seasons
o A seasonal index of 1.0 means that the season is equal to the average of
the seasons
o A seasonal index of 0.8 means that the season is 20% below the average
of the seasons
Example: Mikki runs a shop and she wishes to determine quarterly seasonal
indices based on her last year’s sales, which are shown in the table below.
Summer    Autumn    Winter    Spring
920       1085      1241      446
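1. The seasonal index is defined by: seasonal index = value for season ÷ seasonal average. The seasons
are quarters. Write the formula in terms of quarters:
seasonal index = value for quarter ÷ quarterly average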
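2. Find the quarterly average for the year:
quarterly average = (920 + 1085 + 1241 + 446) ÷ 4 = 923
3. Work out the seasonal index (SI) for each time period:
SI(Summer) = 920 ÷ 923 = 0.997
SI(Autumn) = 1085 ÷ 923 = 1.176
SI(Winter) = 1241 ÷ 923 = 1.345
SI(Spring) = 446 ÷ 923 = 0.483
4. Check that the seasonal indices sum to 4 (the number of seasons):
0.997 + 1.176 + 1.345 + 0.483 = 4.001
The slight difference here is due to rounding error.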
5. Write out your answers as a table of the seasonal indices.
Seasonal Indices
Summer    Autumn    Winter    Spring
0.997     1.176     1.345     0.483
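The calculation for Mikki's shop can be reproduced with a short sketch. The sales figures are those given in the example; the function name is a hypothetical choice.

```python
def seasonal_indices(values):
    """Seasonal index = value for the season / average over all seasons.

    values maps each season to its figure for one year.
    """
    average = sum(values.values()) / len(values)
    return {season: round(v / average, 3) for season, v in values.items()}

# Sales figures from the example.
sales = {"Summer": 920, "Autumn": 1085, "Winter": 1241, "Spring": 446}
indices = seasonal_indices(sales)
print(indices)
# {'Summer': 0.997, 'Autumn': 1.176, 'Winter': 1.345, 'Spring': 0.483}
print(round(sum(indices.values()), 3))  # 4.001
```

The rounded indices sum to 4.001 rather than exactly 4, which is the rounding error mentioned in step 4.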
» Steps in calculating seasonal indices for several years’ data
1. Calculate the seasonal indices for Years 1, 2, 3 etc. separately.
2. You should then have three different sets of seasonal indices (or three tables,
one representing each of the different years).
3. Average the three sets of seasonal indices at the end to obtain a single set of
seasonal indices (e.g. find the average of quarter 1 for year 1, 2 and 3, find
the average for quarter 2 for year 1, 2, and 3, and so on. In the end you
should have only one table representing the seasonal indices).
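The averaging in step 3 can be sketched as follows. The three sets of quarterly indices and the function name are hypothetical, invented for illustration.

```python
def average_indices(yearly_indices):
    """Average each season's index across several years' sets of indices."""
    seasons = yearly_indices[0].keys()
    n = len(yearly_indices)
    return {
        s: round(sum(year[s] for year in yearly_indices) / n, 3)
        for s in seasons
    }

# Hypothetical seasonal indices for three years, invented for illustration.
years = [
    {"Q1": 1.10, "Q2": 0.90, "Q3": 1.20, "Q4": 0.80},
    {"Q1": 1.00, "Q2": 1.00, "Q3": 1.10, "Q4": 0.90},
    {"Q1": 1.20, "Q2": 0.80, "Q3": 1.30, "Q4": 0.70},
]
combined = average_indices(years)
print(combined)  # {'Q1': 1.1, 'Q2': 0.9, 'Q3': 1.2, 'Q4': 0.8}
```

The single combined table still sums to 4, the number of seasons, because each year's set of indices sums to 4.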
Fitting a Trend Line and Forecasting
» Fitting a trend line
» Forecasting
» Taking seasonality into account
» Making predictions with deseasonalised data
CHECK QUESTION 2A OF CHAPTER 7E
CAS CALCULATOR TUTORIAL FOR ‘CORE MATERIAL’
Plotting: Five-number summary and Box Plots (and other types of plots)
How to find the standard deviation, and mean (and other ones)
How to find the r value and r^2 value
Least squares regression line
Residual Plot
Applying transformations to a set of values of a function