torturing numbers - descriptive statistics for growers (2013)

Torturing Numbers Dr. Jason S.T. Deveau Application Technology Specialist OMAFRA, Simcoe Station A Grower’s Guide to Descriptive Statistics

Uploaded by jasondeveau on 12-Apr-2017


Page 1: Torturing numbers - Descriptive Statistics for Growers (2013)

Torturing Numbers

Dr. Jason S.T. Deveau, Application Technology Specialist

OMAFRA, Simcoe Station

A Grower’s Guide to Descriptive Statistics

Page 2

"If you torture the data long enough, it will confess" – Ronald Harry Coase, Economist

Page 3

why do we need statistics?

• Descriptive statistics are math tools we use to:

- Describe data

- Find trends in data despite natural variation

- Determine whether a sample represents a population

- Draw conclusions about data

Page 4

describing data

• In 1950, 25 university graduates were asked what they earned in their first year of work

$45,000, $15,000, $10,000, $5,700, $5,000, $3,700, $3,000, $2,000, $2,000, $2,000, $10,000, $5,000, $2,000, $2,000, $5,000, $2,000, $3,700, $3,700, $3,700, $2,000, $2,000, $2,000, $2,000, $2,000, $2,000

• What do these data tell you?

Page 5

describing data

• Here are the same data, ordered from greatest to least and weighted to show how many times each value occurs in the data set

• Now what do the data tell you?

• What is the average income?

$45,000 (1 graduate)

$15,000 (1 graduate)

$10,000 (2 graduates)

$5,700 (1 graduate)

$5,000 (3 graduates)

$3,700 (4 graduates)

$3,000 (1 graduate)

$2,000 (12 graduates)

Page 6


describing data

• BEWARE! The reported ‘average’ might depend on what you are meant to see. Which would you use on your taxes?

MEAN (arithmetic average): $5,700

MEDIAN (middle value when the data are ordered): $3,000

MODE (most frequent value): $2,000

• So, to really understand a data set, you need more than just the ‘average’
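The three ‘averages’ for the 1950 income data are easy to check yourself. This is a minimal sketch using Python's standard-library `statistics` module; the list simply re-types the 25 values from the survey slide.

```python
from statistics import mean, median, mode

# First-year incomes (US$) of the 25 graduates in the 1950 example
incomes = [45_000, 15_000, 10_000, 5_700, 5_000, 3_700, 3_000, 2_000, 2_000, 2_000,
           10_000, 5_000, 2_000, 2_000, 5_000, 2_000, 3_700, 3_700, 3_700, 2_000,
           2_000, 2_000, 2_000, 2_000, 2_000]

print(f"mean:   ${mean(incomes):,.0f}")    # pulled up by a few high earners
print(f"median: ${median(incomes):,.0f}")  # the middle (13th) value once sorted
print(f"mode:   ${mode(incomes):,.0f}")    # the value that occurs most often
```

Same data, three defensible answers: $5,700, $3,000 and $2,000. Which one gets reported is a choice, so always ask which ‘average’ you are looking at.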

Page 7

spread and variability

• You need to know the spread of the data

• This histogram shows the ages of smart people who attend spray demos

• Is it typical for 90-year-olds to attend spray demos?

Page 8

spread and variability

• When data are symmetrical, the mean and median are the same. A common special case is the bell-shaped ‘normal’ curve

• On this symmetrical curve, the variability can be described using standard deviations (SD)

Page 9

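spread and variability

• SD measures how spread out the data are around the mean, so it tells you how far a data point is from the mean

• You can now say that 90-year-olds fall more than 2 SD above the mean, or that they make up less than 2.5% of the data set (on a normal curve, about 2.3% of the data lie beyond +2 SD)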
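To see where a single attendee sits, you can convert an age into a number of standard deviations from the mean (a z-score). The histogram's actual mean and SD aren't given in the slides, so this sketch assumes hypothetical values (mean 50, SD 15) and that ages roughly follow a normal curve. It uses `statistics.NormalDist`, available in Python 3.8 and later.

```python
from statistics import NormalDist

# HYPOTHETICAL values: the demo-attendance histogram is not reproduced here,
# so assume ages are roughly normal with mean 50 and SD 15.
MEAN_AGE = 50
SD_AGE = 15

def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

z = z_score(90, MEAN_AGE, SD_AGE)
print(f"A 90-year-old is {z:.2f} SD above the mean")

# Share of any normal curve lying more than 2 SD above the mean
upper_tail = 1 - NormalDist().cdf(2)
print(f"Beyond +2 SD: {upper_tail:.1%}")

# Share of attendees aged 90 or older, under the assumed mean and SD
share_90_plus = 1 - NormalDist(MEAN_AGE, SD_AGE).cdf(90)
print(f"Aged 90+: {share_90_plus:.2%}")
```

Under these made-up numbers a 90-year-old sits about 2.67 SD above the mean, well past the 2 SD mark, so seeing one at a demo would be unusual rather than typical.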

Page 10

spread and variability

• If we collapse the whole data set to one bar, we can show the mean with some measure of variability (standard deviation, standard error, etc.)

• Without some indication of variability, you cannot effectively compare two data sets
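Standard deviation and standard error are not interchangeable, which is why you should always check which one an error bar shows. SD describes how much individual plots vary; SE (SD divided by the square root of the number of samples) describes how precisely the mean is known. A minimal sketch with Python's `statistics` module, borrowing the Fertilizer 1 yields from the sample study later in this talk:

```python
from math import sqrt
from statistics import mean, stdev

# Fertilizer 1 wheat yields (kg/plot), five replicates from the sample-study table
yields = [64.8, 60.5, 63.4, 48.2, 55.5]

avg = mean(yields)
sd = stdev(yields)           # sample standard deviation (divides by n - 1)
se = sd / sqrt(len(yields))  # standard error of the mean

print(f"mean = {avg:.2f} kg/plot, SD = {sd:.2f}, SE = {se:.2f}")
```

The same bar drawn with SE error bars looks less than half as ‘noisy’ as with SD error bars here, so an unlabelled error bar can be quite misleading.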

Page 11

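spread and variability

• Often, data sets are skewed. Here is the effect of a new herbicide on quackgrass.

• Means and standard deviations don’t help here: a long tail drags the mean away from the typical value, just as the few high earners did in the income example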

Page 12

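spread and variability

• Perhaps the best way to describe a data set, especially a skewed one, is with five numbers: Minimum, Q1 (first quartile), Median, Q3 (third quartile) and Maximum. This helps when comparing data sets, and when there are oddities called outliers

[Figure: box plot marking Min, Q1, Median, Q3 and Max, with 25% of the data in each of the four sections and an outlier plotted separately]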
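Here is one way to compute the five-number summary in Python. Two assumptions are worth flagging: software packages use slightly different rules for quartiles (this sketch uses the default ‘exclusive’ method of `statistics.quantiles`), and the slides don't say how outliers are defined, so the sketch uses the common convention of flagging anything more than 1.5 × IQR (Q3 minus Q1) beyond the quartiles. The skewed income data from earlier make a good test case.

```python
from statistics import quantiles

# First-year incomes (US$) from the 1950 graduate example: a strongly skewed data set
incomes = [45_000, 15_000, 10_000, 5_700, 5_000, 3_700, 3_000, 2_000, 2_000, 2_000,
           10_000, 5_000, 2_000, 2_000, 5_000, 2_000, 3_700, 3_700, 3_700, 2_000,
           2_000, 2_000, 2_000, 2_000, 2_000]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles use quantiles()' default method."""
    q1, med, q3 = quantiles(data, n=4)
    return min(data), q1, med, q3, max(data)

def iqr_outliers(data, k=1.5):
    """Values more than k * IQR below Q1 or above Q3 (assumed 1.5 x IQR convention)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return sorted(x for x in data if x < low_fence or x > high_fence)

print("min, Q1, median, Q3, max:", five_number_summary(incomes))
print("outliers:", iqr_outliers(incomes))
```

For these data, half the graduates earned between $2,000 and $5,000, and the 1.5 × IQR rule flags the four highest incomes as outliers. None of that is visible from the mean alone.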

Page 13

Page 14

a sample study

• Researchers want to know which of three fertilizers produces the highest wheat yield (kg/plot)

Page 15

a sample study

• They design a study with three treatments and five replications for each treatment

[Figure: plot layout with 3 treatments (Fertilizers 1, 2 and 3) and 5 replicates]

Page 16

a sample study

• Could a nearby forest or river be a confounding variable?

• Variables like soil type and other local influences may have unexpected impacts…

Page 17

a sample study

• This is why a good study is randomized, to defeat potentially confounding variables
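Randomizing a layout takes only a few lines. This is a minimal sketch of a completely randomized design for the 3 fertilizers × 5 replicates study: every plot is equally likely to receive any treatment, so a forest edge or a wet corner can't consistently favour one fertilizer. The slides don't say which design the researchers used; field trials often randomize within each replicate block instead (a randomized complete block design). The seed value 2013 is arbitrary and only makes the field map reproducible.

```python
import random

TREATMENTS = ["Fertilizer 1", "Fertilizer 2", "Fertilizer 3"]
REPLICATES = 5

def randomized_layout(seed=None):
    """Completely randomized design: shuffle 5 copies of each treatment over 15 plots."""
    plots = [t for t in TREATMENTS for _ in range(REPLICATES)]  # 15 labels, 5 of each
    rng = random.Random(seed)  # a fixed seed gives the same field map every time
    rng.shuffle(plots)
    return plots

layout = randomized_layout(seed=2013)
for plot_number, treatment in enumerate(layout, start=1):
    print(f"Plot {plot_number:2d}: {treatment}")
```

Recording the seed alongside the field map lets anyone reproduce exactly how treatments were assigned.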

Page 18

• Does the sample plot in our study represent all the wheat in all the world?

[Figure: the SAMPLE shown as a small part of the whole POPULATION]

Page 19

uncertainty

• With all the unknown variables, there will always be some uncertainty about whether our sample represents the population

• That’s why the more samples (replicates) we have, the more confident we can be that our study represents the population
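More samples don't make individual plots less variable; they make the estimate of the mean more precise. The standard error of the mean is SD ÷ √n, so confidence grows with the square root of the sample size. A small sketch, assuming a hypothetical plot-to-plot SD of 6 kg/plot:

```python
from math import sqrt

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / sqrt(n)

SD = 6.0  # HYPOTHETICAL plot-to-plot standard deviation, kg/plot

for n in (5, 10, 20, 40):
    print(f"n = {n:2d}: SE = {standard_error(SD, n):.2f} kg/plot")
```

Note the diminishing returns: going from 5 to 20 replicates (four times the work) only halves the standard error.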

Page 20

confidence

• Any confidence level could be used, but 95% is often chosen

• A 95% confidence interval means that if the study were repeated many times, about 95% of the intervals calculated this way would contain the true population value

• BEWARE reports with no confidence interval
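As an illustration, here is a 95% confidence interval for the mean Fertilizer 3 yield from the sample study (five replicates). This sketch assumes yields are roughly normally distributed and uses Student's t, because with only five samples the normal-curve value of 1.96 would make the interval too narrow. Python's standard library has no t-distribution, so the critical value for 4 degrees of freedom (2.776) is typed in from a standard t-table.

```python
from math import sqrt
from statistics import mean, stdev

# Fertilizer 3 wheat yields (kg/plot), five replicates from the sample-study table
yields = [65.8, 73.2, 59.5, 66.3, 70.2]

# Two-sided 95% critical value of Student's t for n - 1 = 4 degrees of freedom,
# taken from a t-table; it is only correct for exactly five samples.
T_CRIT_95_DF4 = 2.776

n = len(yields)
avg = mean(yields)
half_width = T_CRIT_95_DF4 * stdev(yields) / sqrt(n)

print(f"mean = {avg:.1f} kg/plot, "
      f"95% CI = {avg - half_width:.1f} to {avg + half_width:.1f} kg/plot")
```

With only five plots, the interval spans roughly ±6.4 kg/plot around the mean. That width is exactly what a report without a confidence interval hides.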

Page 21

two ways to present data

Wheat yield (kg/plot), five replicates per fertilizer

Fertilizer 1   Fertilizer 2   Fertilizer 3
64.8           56.5           65.8
60.5           53.8           73.2
63.4           59.4           59.5
48.2           61.1           66.3
55.5           58.8           70.2

• Tables are the preferred way to show data, but graphs paint a quick, easy and seductive picture

Page 22

drawing conclusions

• A presenter may want you to see a relationship between two variables

• Fertilizer 3 appears to increase the average yield of wheat, but what kind of average is this? How big was the sample? Where is the indication of variability? Where is the confidence interval?
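Those four questions can be answered directly from the yield table. This is a minimal sketch that reports, for each fertilizer, the sample size, mean, median, standard deviation and a 95% confidence interval (assuming roughly normal yields and using the t-table value 2.776, which is only valid for five replicates). The function name `summarize` is just for illustration.

```python
from math import sqrt
from statistics import mean, median, stdev

# Wheat yields (kg/plot) from the fertilizer table
YIELDS = {
    "Fertilizer 1": [64.8, 60.5, 63.4, 48.2, 55.5],
    "Fertilizer 2": [56.5, 53.8, 59.4, 61.1, 58.8],
    "Fertilizer 3": [65.8, 73.2, 59.5, 66.3, 70.2],
}
T_CRIT_95_DF4 = 2.776  # two-sided 95% t-table value for 4 degrees of freedom

def summarize(values):
    """Which average, how many samples, how variable, and what confidence interval."""
    n = len(values)
    if n != 5:
        raise ValueError("T_CRIT_95_DF4 is only valid for exactly 5 samples")
    avg, sd = mean(values), stdev(values)
    half = T_CRIT_95_DF4 * sd / sqrt(n)
    return {"n": n, "mean": avg, "median": median(values), "sd": sd,
            "ci95": (avg - half, avg + half)}

for name, values in YIELDS.items():
    s = summarize(values)
    low, high = s["ci95"]
    print(f"{name}: n={s['n']}, mean={s['mean']:.1f}, median={s['median']:.1f}, "
          f"SD={s['sd']:.1f}, 95% CI {low:.1f} to {high:.1f}")
```

The sample is small (five plots each) and all three 95% intervals overlap. Overlap is only a rough warning sign, not a formal test, but it is a good reason to ask for a proper statistical comparison before trusting a bar chart that makes Fertilizer 3 look like the winner.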

Page 23


• Bad stats and bad experimental design may lead to bad conclusions

[Figure: the yield chart annotated with 2 SD]

Page 24

drawing conclusions

• Correlation does not imply causation

The more firemen fighting a fire, the bigger the fire is observed to be. Therefore, more firemen cause an increase in the size of a fire.

Page 25

it’s all in how you show it

• Often, a presenter wants to lead you to a conclusion. Newspapers, TV and online articles should be scrutinized!

• BEWARE:

- “This is not a scientific poll…”

- “These results may not be representative of the population”

- “…based on a list of those that responded”

- “Data showed a trend but was not statistically significant” (I’ve used this one!!!)

Page 26

it’s all in how you show it

• Pies are for eating, and possibly throwing…

• It’s very hard to see differences

• BEWARE CHARTJUNK!

Page 27

it’s all in how you show it

• Amusing graphics are nothing but distractions

• Again, it’s very hard to see differences

• BEWARE CHARTJUNK!

Page 28

it’s all in how you show it

• Here is the same population growth data shown on two scales. Which would you use to demonstrate rapid growth?

• BEWARE tricky scales!

Page 29

it’s all in how you show it

• BEWARE statements with no context. Here’s a made-up example, but it’s no worse than other ‘factoids’ I’ve encountered

Did you know that even speaking to someone who once sprayed pesticides DOUBLES your chance of getting cancer?!

• Your odds go from 0.000000001:1 to 0.000000002:1 (roughly one in a billion to two in a billion)
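The trick in that factoid is reporting a relative change (‘DOUBLES!’) instead of an absolute one. A minimal sketch: convert the odds to probabilities with the standard formula p = odds / (1 + odds), then compare the ratio with the difference.

```python
def odds_to_probability(odds: float) -> float:
    """Convert odds of x:1 into a probability, x / (1 + x)."""
    return odds / (1 + odds)

before = odds_to_probability(0.000000001)
after = odds_to_probability(0.000000002)

relative_change = after / before  # the "DOUBLES your chance" headline
absolute_change = after - before  # the extra risk you actually take on

print(f"relative change: {relative_change:.2f}x")
print(f"absolute change: {absolute_change:.1e}")
```

The relative change is a scary factor of two; the absolute change is about one extra chance in a billion. Ask for both before reacting to a headline.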

Page 30

conclusion

• We started by stating that descriptive statistics are tools

• Like any tool, stats can be misused (intentionally or unintentionally)

• Maintain a healthy scepticism and question charts, tables and conclusions where insufficient information is provided

Page 31

…one last joke

Three statisticians were hunting when they came across a big buck. The first statistician fired, but missed by a meter to the left. The second statistician fired, but missed by a meter to the right.

The third statistician threw down his rifle and cheered, “We got it!”

Page 32

references

- The Cartoon Guide to Statistics (1993), Larry Gonick and Woollcott Smith

- How to Lie with Statistics (1954), Darrell Huff

Page 33

Learn more about spraying

Tom Wolf @nozzle_guy

Jason Deveau @spray_guy

www.sprayers101.com