Bi-Variate Data (AS 3.9) Dru Rose (Westlake Girls High School) Workshop PD Aiming at Excellence

Bi-Variate Data (AS 3.9)

Dru Rose (Westlake Girls High School)

Workshop PD

Aiming at Excellence

The Aims of this workshop

• Discuss some concepts in this standard that students appear to find difficult and share some possible teaching approaches

• Share some borderline A/M and M/E student work with our grading decisions and reasoning

2 Big ideas

1. “The eyes have it”

2. “Context is everything”

• What is the nature of the relationship between emissions and energy consumption per person?

• Can I use energy consumption to reliably predict emissions?

• What are the limitations of my model?

How do we train students to use their eyes?

• Initially NO technology (until they can identify key features of data and describe them in context)

• Provide plenty of contrasting scatter plots with straightforward contexts

How do we train students to describe what they see in context?

1. Teacher modelling

Student voice: “It’s a decreasing trend”

Rephrase:

• As the age of a car increases, there is a tendency for the price to decrease.

• We say there is a negative association between the price of a car and its age.

How do we train students to describe what they see in context?

2. Writing templates with a prescribed structure

TASVU

Trend (Linear/Non-linear)

Association (Positive/Negative)

Strength (Strong/moderate/weak): points are generally close to the trend / fairly close / there is quite a lot of scatter

Variation (in y: does it change as x increases?)

Unusual (Outliers? Groups?)

Activity: Where should the trend line go?

What do students appear to find difficult?

1. Scatter

i.e. Variation in the vertical direction

(In a box plot, variation is about spread in the x-direction.)

What does non-constant scatter really mean?

As energy use per person increases, the variation in emissions seems to increase also.
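One way to back up this visual impression with numbers is to compare the typical size of the residuals at low and high x-values. The sketch below uses made-up data (not the workshop's emissions data) in which the scatter grows with x. The helper names `fit_line` and `mean_abs_residual` are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_residual(points, slope, intercept):
    """Average vertical distance of the points from the trend line."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)


# Made-up data: the vertical scatter around y = 2x + 1 grows as x grows.
xs = list(range(1, 11))
ys = [2 * x + 1 + (-1) ** x * 0.2 * x for x in xs]

slope, intercept = fit_line(xs, ys)
points = list(zip(xs, ys))
lower_half = points[:5]  # the five smallest x-values
upper_half = points[5:]  # the five largest x-values

scatter_low = mean_abs_residual(lower_half, slope, intercept)
scatter_high = mean_abs_residual(upper_half, slope, intercept)
print(f"Typical scatter at low x:  {scatter_low:.2f}")
print(f"Typical scatter at high x: {scatter_high:.2f}")
```

If the second number is clearly larger than the first, the scatter is not constant, which matches a plot that fans out as x increases.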

What do students appear to find difficult?

2. Which variable goes on which axis?

• The “outcome” (predicted or response) variable must go on the y-axis.

• Useful comparisons require comparing 2 predictor variables for the same outcome variable: which one gives the more precise prediction?

• We want to ask “as x increases, what happens to y?”, not the reverse.

• Start with data sets where it is obvious

Tar content, nicotine content and weight are measured before the cigarette is smoked.

The CO is emitted when the cigarette is smoked – hence must be the “outcome” variable.

Consider situations where it might not matter

• Systolic BP (when the heart muscle contracts) and diastolic BP (when the heart muscle relaxes between beats)

Move on to a rich multi-variate set

What might be a possible “outcome” variable?

What might be possible predictor variables for that outcome?

What are possible investigative questions we could pose?

Gapminder 2008

What do students appear to find difficult?

3. What the trend equation tells us

Linear trend: Weight = 1.0766 * Height - 101.98

For every 1 cm increase in height, the weight of an American adult increases by about 1.08 kg on average.

What does the -101.98 tell us? The weight of a baby of zero height. Is it a useful measure in this plot? NO
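The two readings of the trend equation can be checked with a few lines of code. This is a minimal sketch using the slope and intercept quoted on the slide. The heights 170 cm and 171 cm are arbitrary illustrative values, and the function name `predicted_weight` is hypothetical.

```python
# Trend line from the slide: Weight = 1.0766 * Height - 101.98
SLOPE = 1.0766       # kg per cm of height
INTERCEPT = -101.98  # kg


def predicted_weight(height_cm):
    """Predicted average weight (kg) for a given height (cm)."""
    return SLOPE * height_cm + INTERCEPT


# Slope: the change in predicted weight for a 1 cm increase in height.
change_per_cm = predicted_weight(171) - predicted_weight(170)

# Intercept: the prediction at height 0, far outside any adult height.
weight_at_zero = predicted_weight(0)

print(f"Change per 1 cm of height: {change_per_cm:.2f} kg")
print(f"Predicted weight at 0 cm:  {weight_at_zero:.2f} kg")
```

A negative predicted weight at 0 cm shows why the intercept has no useful meaning here: the line only describes the data within the range of heights that were observed.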

What do students appear to find difficult?

4. The correlation coefficient: inappropriate use of r

• When does it tell us something useful and when should it not be used?

Technology does what the user tells it to.

• Students need to continually ask questions:

• Is it sensible to put a line on a non-linear graph?

• Does the line I have added actually describe the trend in the majority of the data?

Anscombe’s Quartet: Same mean of x, mean of y, regression equation and “r” value

[Figure: four scatter plots, each with a fitted trend line, x-axis up to 20 and y-axis from 0 to 15. Fitted lines shown on the plots:]

• f(x) = 0.5001x + 3.0001, R² = 0.6665

• f(x) = 0.5x + 3.0009, R² = 0.6662

• f(x) = 0.4997x + 3.0025, R² = 0.6663

• f(x) = 0.4999x + 3.0017, R² = 0.6667

[Plot annotations: “X”, “caution”, “X”]
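The point of the quartet can be reproduced numerically. The sketch below assumes the slide used Anscombe's standard published data values (the transcript only preserves the fitted equations, which agree with them). The function name `summarise` is illustrative.

```python
from math import sqrt


def summarise(xs, ys):
    """Return (mean_x, mean_y, slope, intercept, r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return mean_x, mean_y, slope, intercept, r


# Anscombe's quartet: sets I to III share the same x-values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                  6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                   6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
            5.25, 12.50, 5.56, 7.91, 6.89]),
}

results = {name: summarise(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (mean_x, mean_y, slope, intercept, r) in results.items():
    print(f"Set {name}: mean x = {mean_x:.2f}, mean y = {mean_y:.2f}, "
          f"y = {slope:.3f}x + {intercept:.3f}, r = {r:.3f}")
```

The four summaries are almost identical, so the numbers alone cannot tell students which data sets a straight line actually suits. Only the plots can.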

What do students appear to find difficult?

5. Outliers and Groups

• They see outliers and groups where there are none

• They remove points or groups where they should not

Key questions we want students to ask:

• Will this point (group) affect the position of the trend line (curve)?

• Will this point (group) affect the strength of the relationship?

• Do I need to do further analysis or further research or both?
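The first two key questions can be tested directly by fitting the trend line with and without the point in question. This is a minimal sketch on made-up data (not the workshop's data set). The function name `fit_and_correlate` and all values are illustrative assumptions.

```python
from math import sqrt


def fit_and_correlate(points):
    """Return (slope, intercept, r) for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / sqrt(sxx * syy)


# Made-up data: eight points close to y = 2x + 1, plus one unusual point.
main_points = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8),
               (5, 11.1), (6, 12.9), (7, 15.2), (8, 16.8)]
unusual_point = (14, 6.0)

slope_with, _, r_with = fit_and_correlate(main_points + [unusual_point])
slope_without, _, r_without = fit_and_correlate(main_points)

print(f"With the point:    slope = {slope_with:.2f}, r = {r_with:.2f}")
print(f"Without the point: slope = {slope_without:.2f}, r = {r_without:.2f}")
```

A large change in slope or r suggests the point is influencing the trend line and deserves further analysis or research. A negligible change suggests there is no need to remove it.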

[Scatter plot with three labelled points: Iceland, Brunei, Luxembourg]

Why so different?

The outlier is clearly pulling the trend line towards it. Remove it to get an appropriate trend line for making a prediction.

Emissions = 2.2706 * Energy use + 0.6

Correlation = 0.87553

[Plot axes: emissions (tonnes/yr) against energy use in tonnes of oil equivalent per yr (toe)]

Not affecting the trend line. No need to remove.

Making a forecast: What do we expect at Merit and Excellence level?

For a country with energy use of 2 toe pp/yr, emissions = 5.14 tonnes pp on average, but are likely to be somewhere between 2 and 8 tonnes per person (around 60% relative error (5±3), so not that reliable).

When energy use increases to 5 toe pp/yr, emissions = 11.95 tonnes pp on average, but are likely to be somewhere between 6.5 and 16.5 tonnes per person. This is a wider interval, so still not reliable (but actually a slightly smaller 42% relative error (12±5)).

Limitation: Lack of data for energy consumption beyond 5 toe, so we should not extrapolate.
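The forecast arithmetic can be laid out as a short calculation. This sketch uses the trend equation from the earlier slide (Emissions = 2.2706 * Energy use + 0.6) and the "likely" intervals quoted above. Treating relative error as half the interval width divided by the point estimate is an assumption about how the slide's percentages were obtained, and the function names are hypothetical.

```python
SLOPE = 2.2706   # tonnes of emissions pp per toe of energy use pp
INTERCEPT = 0.6  # tonnes pp


def predicted_emissions(energy_use):
    """Point forecast of emissions (tonnes pp) from energy use (toe pp/yr)."""
    return SLOPE * energy_use + INTERCEPT


def relative_error(low, high, estimate):
    """Half the width of the likely interval as a fraction of the estimate."""
    return (high - low) / 2 / estimate


# (energy use, lower bound, upper bound) as read from the slide
forecasts = [(2, 2.0, 8.0), (5, 6.5, 16.5)]

summary = []
for energy_use, low, high in forecasts:
    estimate = predicted_emissions(energy_use)
    error = relative_error(low, high, estimate)
    summary.append((energy_use, estimate, error))
    print(f"Energy use {energy_use} toe: forecast {estimate:.2f} tonnes pp, "
          f"likely {low} to {high}, relative error about {error:.0%}")
```

With unrounded point estimates the first relative error comes out slightly under the slide's 60%, which was based on the rounded 5±3. The comparison between the two forecasts is unchanged: the wider interval at 5 toe is the smaller relative error.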

Research needed only

Chad

What do students appear to find difficult?

6. Confounding (lurking) variables

The association we are observing may be an indirect relationship, where both the predictor and outcome variables are correlated with another related variable (called a confounder).

With the big multi-variate data sets now available in iNZight, students can test out their thinking regarding potential confounding variables.
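A small simulation can show how a confounder produces an association between two variables that do not affect each other. Everything in this sketch is made up for illustration: the variable names, the sample size and the noise levels are assumptions, and subtracting the confounder works only because the simulated variables were built as confounder plus noise.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


random.seed(1)  # fixed seed so the run is repeatable
n = 2000

# The confounder drives both variables; they never influence each other.
confounder = [random.gauss(0, 1) for _ in range(n)]
predictor = [z + random.gauss(0, 0.5) for z in confounder]
outcome = [z + random.gauss(0, 0.5) for z in confounder]

r_observed = correlation(predictor, outcome)

# Take the confounder's contribution out of both variables and look again.
predictor_adj = [x - z for x, z in zip(predictor, confounder)]
outcome_adj = [y - z for y, z in zip(outcome, confounder)]
r_adjusted = correlation(predictor_adj, outcome_adj)

print(f"r between predictor and outcome:        {r_observed:.2f}")
print(f"r after accounting for the confounder:  {r_adjusted:.2f}")
```

The strong observed correlation largely disappears once the confounder is accounted for, which is the pattern students are looking for when they test a suspected lurking variable.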

What other variable might be connected with low life expectancy and a high number of children per woman? (or the reverse)

Using iNZight we can quickly identify Chad as having the highest fertility rate. Why?

Google question:

Why does Chad have the highest number of children per woman?

What do we expect for MERIT?

• Some research to help explain the reason for the posed question

• A demonstration of understanding of the context and any statistical jargon used

• Some discussion on the reliability and usefulness of the forecast, e.g. limited range of x values, wide prediction interval?

• An overall conclusion which does come to a final decision in answer to the question.

What extra do we expect for Excellence?

• Sound research, referenced and integrated into the report

• Deeper thinking, beyond a formulaic approach

• Ability to cope with the unexpected

• No inappropriate use of statistical techniques and/or serious misunderstandings

• Discussion of the limitations of the analysis