Trends in Data Chapter 1.3 – Visualizing Trends Mathematics of Data Management (Nelson) MDM 4U

Upload: blaze-mckenzie

Post on 05-Jan-2016

Trends in Data

Chapter 1.3 – Visualizing Trends

Mathematics of Data Management (Nelson)

MDM 4U

Variables

In computer science and mathematics, a variable is a symbol denoting a quantity or symbolic representation. In mathematics, a variable often represents an unknown quantity; in computer science, it represents a place where a quantity can be stored. Variables are often contrasted with constants, which are known and unchanging. (Wikipedia, 2004)

The Two Types of Variables

Independent Variable

a variable whose values are arbitrarily chosen

placed on the horizontal axis

time is always independent (why?)

Dependent Variable

a variable whose values depend on the independent variable

placed on the vertical axis

Scatter Plots

a graphical method of showing the joint distribution of two variables, where each point on the graph represents one pair of values (x, y)

may or may not show a trend

a trend indicates a correlation that may be strong or weak, positive or negative, linear or non-linear

What is a trend?

a pattern of average behavior that occurs over time

a general “direction” that something tends toward

for example, there has been a trend toward increasing costs in Canada

need two variables to exhibit a trend

An Example of a trend

U.S. population from 1780 to 1960

what is the trend?

is the trend linear?

[Scatter plot: U.S. population (millions) vs. year, 1780 to 1960; data: Pearl, Reed and Kish (1940), U.S. population from 1790 to 1940]

Line of Best Fit

the line of best fit is a line which best represents the trend in the data and is used for making predictions

these can be drawn by hand but there are also methods for mathematically calculating them (median-median and least squares methods are examples that we will study)

gives no indication of the strength of the trend (use the r or r² value for that)
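The least squares method mentioned above can be sketched in a few lines of Python. This is a minimal sketch, assuming the data come as two equal-length lists `xs` and `ys`; the function name `least_squares_line` and the sample data are hypothetical, not from the course materials. It uses the standard formulas m = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b = ȳ - m·x̄.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = m*x + b.

    Hypothetical helper name; not taken from the course materials.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations from the means (numerator of the slope)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared x-deviations (denominator of the slope)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x-values are equal; the slope is undefined")
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b


# Hypothetical example data lying close to y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
m, b = least_squares_line(xs, ys)
print(f"y = {m:.3f}x + {b:.3f}")
# Use the line to make a prediction at x = 6
print(f"predicted y at x = 6: {m * 6 + b:.2f}")
```

Here the fitted line is y = 1.97x + 1.09. The line alone says nothing about how tightly the points cluster around it, which is why r or r² is reported alongside it.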

An Example of the Line of Best Fit

this is temperature data from New York over time, with a median-median line added

what type of trend are we looking at?

see p. 35 for the method of creating a median-median line

[Scatter plot: State of New York historical temperature data, winter-season mean temperature vs. year, 1900 to 2000, with the median-median line mean temp = 0.0230(year) - 21.4]
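The median-median line on this graph is mean temp = 0.0230(year) - 21.4. A small Python sketch (hypothetical function name; temperature units are not stated on the slide) shows how the line is used for prediction and how its slope answers the trend question: a positive slope of 0.0230 per year is an increasing trend of about 2.3 units of mean temperature per century.

```python
def predicted_mean_temp(year):
    """Evaluate the median-median line read off the New York graph.

    Hypothetical helper name; coefficients are the rounded values
    displayed on the graph, so predictions are approximate.
    """
    slope = 0.0230      # change in mean temperature per year
    intercept = -21.4
    return slope * year + intercept


for year in (1900, 2000):
    print(year, round(predicted_mean_temp(year), 2))

# The slope is the average change per year; times 100 gives the change per century
print("change per century:", round(0.0230 * 100, 2))
```

Because the coefficients are rounded, the line is best read as a description of the general direction (a slow increase) rather than a precise forecast.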

Creating a Median-Median Line

Order the points by x-coordinate and divide them into 3 groups of equal size (as nearly as possible)

If there is 1 extra point, include it in the middle group. If there are 2 extra points, put one in each end group.

Calculate the median x- and y-coordinates for each group and plot the median point (x, y) of each group

If the three median points are on a straight line, connect them. Otherwise, draw the line through the two outer median points, then move it (keeping the same slope) 1/3 of the way toward the middle median point; this is the line of best fit
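The slide's procedure can be sketched in Python. This is a minimal sketch, assuming each data point is an (x, y) tuple; the function names and sample points are hypothetical. The "move 1/3 of the way" step is computed by averaging the three intercepts y - m·x of the median points, which is algebraically the same as sliding the line through the outer median points one third of the way toward the middle median point.

```python
from statistics import median


def split_into_three(points):
    """Sort points by x and split them into left, middle and right groups.

    One extra point goes to the middle group; two extra points go
    one to each outer group, following the slide's rule.
    """
    pts = sorted(points)  # sorts by x (ties broken by y)
    n = len(pts)
    if n < 3:
        raise ValueError("need at least 3 points")
    k, extra = divmod(n, 3)
    outer = k + 1 if extra == 2 else k
    return pts[:outer], pts[outer:n - outer], pts[n - outer:]


def median_point(group):
    """Median x and median y of a group (computed separately)."""
    return median(x for x, _ in group), median(y for _, y in group)


def median_median_line(points):
    """Return (slope, intercept) of the median-median line."""
    left, middle, right = split_into_three(points)
    x1, y1 = median_point(left)
    x2, y2 = median_point(middle)
    x3, y3 = median_point(right)
    if x3 == x1:
        raise ValueError("outer median points share an x-value")
    # Slope comes from the two outer median points
    m = (y3 - y1) / (x3 - x1)
    # Averaging the three intercepts = moving the outer line 1/3 of the way
    # toward the middle median point (no shift if all three are collinear)
    b = ((y1 - m * x1) + (y2 - m * x2) + (y3 - m * x3)) / 3
    return m, b


# Hypothetical 10-point data set: groups of 3, 4 and 3 points
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 5),
          (6, 7), (7, 8), (8, 9), (9, 11), (10, 10)]
m, b = median_median_line(points)
print(f"y = {m:.3f}x + {b:.3f}")
```

For these points the median points are (2, 3), (5.5, 6) and (9, 10), giving slope 1 and intercept 2.5/3. With 14 points the same rule gives groups of 5, 4 and 5.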

Median-Median Line (10 points)

Median-Median Line (14 points)

Exercises

try page 37 #2, 3, 6, 8

Trends in Data Using Technology

Chapter 1.4 – Trends in Technology

Mathematics of Data Management (Nelson)

MDM 4U

Categories of Correlation

correlation scatter plots can be positive or negative, strong or weak

try looking at the examples on this website to help you understand:

http://www.seeingstatistics.com/seeing1999/gallery/CorrelationPicture.html

[Scatter plot: temperature vs. x (1 to 9), an example of a strong positive correlation]

Regression

a process of fitting a line or curve to a set of data

if a line is used, it is linear regression; if a curve is used, it may be quadratic regression, cubic regression, etc.

why do we do this? what can we do with the resulting function?

http://www.seeingstatistics.com/seeing1999/gallery/CorrelationPicture.html
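One common way to fit a line or curve, sketched here as an illustration rather than as the textbook's method, is least squares polynomial regression: choose the coefficients that minimize the sum of squared vertical distances, which leads to a small linear system (the normal equations). Degree 1 gives linear regression, degree 2 quadratic, degree 3 cubic. The function names and data below are hypothetical; in class this is normally done with technology.

```python
def solve_linear_system(a, rhs):
    """Solve a small square system a * x = rhs by Gaussian elimination."""
    n = len(a)
    # Augmented matrix (copies rows so the inputs are not modified)
    aug = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0:
            raise ValueError("system is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def polynomial_regression(xs, ys, degree):
    """Least squares fit of y = c0 + c1*x + ... + cd*x^d; returns [c0, ..., cd]."""
    size = degree + 1
    # Normal equations: for each i, sum_j (sum of x^(i+j)) * c_j = sum of y * x^i
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    rhs = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve_linear_system(a, rhs)


xs = [0, 1, 2, 3, 4]
ys = [1, 2, 5, 10, 17]  # these points lie exactly on y = x^2 + 1
c0, c1, c2 = polynomial_regression(xs, ys, 2)
print(f"y = {c2:.3f}x^2 + {c1:.3f}x + {c0:.3f}")
# The resulting function can be used to predict y at a new x
print("prediction at x = 5:", c0 + c1 * 5 + c2 * 25)
```

This answers the slide's question in one concrete way: once the function is found, it can be evaluated at new x-values to make predictions.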

Correlation Coefficient

the correlation coefficient r is an indicator of the strength and direction of a linear relationship

r = 0: no linear relationship

r = 1: perfect positive correlation

r = -1: perfect negative correlation

r² is the coefficient of determination: if r² = 0.42, then 42% of the variation in y is explained by the linear relationship with x
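A minimal sketch of computing r (the Pearson correlation coefficient) and r² directly from paired data, assuming plain Python lists; the function name and sample data are hypothetical. The last example shows why r = 0 means no *linear* relationship: the points lie exactly on y = x², yet r is 0.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r for paired data (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
print(correlation_coefficient(xs, [2, 4, 6, 8, 10]))   # perfect positive
print(correlation_coefficient(xs, [10, 8, 6, 4, 2]))   # perfect negative

r = correlation_coefficient(xs, [3.1, 4.9, 7.2, 8.8, 11.0])
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")

# A perfect but non-linear relationship (y = x^2) can still give r = 0
print(correlation_coefficient([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))
```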

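Residuals

a residual is the vertical distance between a data point and the line of best fit: residual = observed y - predicted y, so it is positive for points above the line and negative for points below it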

if the model you are considering is a good fit, the residuals should be small and have no noticeable pattern

why?

[Scatter plot: y vs. x (1 to 9) with line of best fit y = 0.0804x + 3.5, r² = 0.021, and a plot of the residuals vs. x beneath it]

http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html
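To see why a pattern in the residuals matters, here is a minimal sketch (hypothetical function names and data, plain Python lists assumed) that fits a least squares line to points lying exactly on y = x² and computes the residuals. They come out positive at both ends and negative in the middle (about 3.33, -0.67, -2.67, -2.67, -0.67, 3.33), a U-shaped pattern that suggests a curve would model these data better than a line.

```python
def residuals(xs, ys, m, b):
    """Residual = observed y minus the y predicted by the line y = m*x + b."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]


def least_squares_line(xs, ys):
    """Least squares slope and intercept (same formulas as for the line of best fit)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    return m, y_bar - m * x_bar


# Data that curve upward (y = x^2), so a straight line is a poor model
xs = [1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]
m, b = least_squares_line(xs, ys)
res = residuals(xs, ys, m, b)
print([round(r, 2) for r in res])
# Least squares residuals always sum to (about) zero,
# so judge the fit by their size and pattern, not their total
print("sum of residuals:", round(sum(res), 10))
```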

Creating a Median-Median Line Using Technology

Copy the following file to your M:\ drive:

N:\LIEFF\MDM4U\1.3 Best Fit Lines\armspan vs height.stu.ftm

Right-click the file | Open With | Choose Program | Browse, then select

Program Files\Fathom\fathom.exe

Exercises

Page 51 #1-6, 7 b,c,d, 8

References

Wikipedia (2004). Online Encyclopedia. Retrieved September 1, 2004 from http://en.wikipedia.org/wiki/Main_Page

The Power of Data

Chapter 1.5 – The Media

Mathematics of Data Management (Nelson)

MDM 4U

There are three kinds of lies: lies, damned lies, and statistics.

“4 out of 5 dentists recommend Trident sugarless gum to their patients who chew gum.” In small groups, discuss how this statistical statement could be misleading.

Trident conclusions

How many dentists did they ask? Only 5?

4 out of 5 is convincing but still believable

5 out of 5 would seem preposterous

3 out of 5 is good but not great

Recommend Trident over what? Chewing sugared gum?

Is Trident the “best” sugarless gum? What variables were considered?

What did the 5th dentist recommend?

“More people stay with [Bell Mobility] than any other provider.” In small groups, discuss:

1) What variables would be recorded in this study?

2) How could the data be used to arrive at this conclusion falsely?

1) What variables would be recorded in this study?

Number of Bell Mobility subscribers

Number of renewed contracts

Contract renewed?

Time of renewal (during contract / upon completion of contract)

Contract length

Contract type (business or home)

2) How could the data be used to arrive at this conclusion falsely?

The claim does not specify how many more customers stay with Bell, e.g. percentage of customers renewing their plan: Bell: 30%, Rogers: 29%, Telus: 25%, Fido: 28%

Did they only count totals?

What does it mean to “stay with Bell”? Honour the entire contract? Renew the contract at the end of a term?

Are early terminations factored in? If so, does Bell have a higher cost for early terminations?

Competitors’ renewal rates may have decreased due to family plans

Does the data include private / corporate plans?

How does the media use (misuse) data?

To inform the public about world events in an objective manner

It sometimes gives misleading or false impressions to sway the public or to increase ratings

It is important to:

Study statistics to understand how information is represented or misrepresented

Correctly interpret tables/charts presented by the media

Exercises

p. 60 #1-6

Final Project – Manipulating Data