statistics through applications how do we get “good” data?


Upload: dennis-shelton

Post on 17-Dec-2015


Statistics Through Applications

How Do We Get “Good” Data?

Good Data isn't Based on an Anecdote

Using anecdotal evidence is relying on an isolated example or experience to make a decision

Good data should come from many varied examples and be impartial

Anecdotes usually appeal to our emotions and can fool us into believing them, while statistics are dry but much more reliable

Good Data is Compared Fairly

Often a rate expressed as a percent or fraction is a more valid measure than a simple count of occurrences

Two schools each had 1900 students pass TAKS. One school has 2000 students and the other has 2500. Did they perform equally well?
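The comparison can be checked with a quick calculation. This is a minimal sketch: the function name `pass_rate` and the labels `school_a` and `school_b` are made up for illustration, and only the counts come from the example above.

```python
def pass_rate(passed, enrolled):
    """Return the fraction of enrolled students who passed."""
    return passed / enrolled

# Same count of passing students, different enrollments.
school_a = pass_rate(1900, 2000)
school_b = pass_rate(1900, 2500)

print(f"School A: {school_a:.0%}")  # 95%
print(f"School B: {school_b:.0%}")  # 76%
```

The counts are identical, but the rates are not, so the rate is the fairer measure here.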

Good Data needs to be Communicated and Read Carefully

An advertisement for a home security system says, “When you go on vacation, burglars go to work. According to FBI statistics, over 26% of home burglaries take place between Memorial Day and Labor Day. Beware - summertime is burglary time!”
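One way to read the ad carefully is to ask what share of the year falls between Memorial Day and Labor Day. The sketch below assumes Memorial Day is the last Monday in May and Labor Day is the first Monday in September, and it counts both holidays. The function names are invented for this illustration.

```python
from datetime import date, timedelta

def memorial_day(year):
    """Last Monday in May: start at May 31 and step back to a Monday."""
    d = date(year, 5, 31)
    return d - timedelta(days=d.weekday())  # weekday() is 0 for Monday

def labor_day(year):
    """First Monday in September: start at Sept 1 and step forward."""
    d = date(year, 9, 1)
    return d + timedelta(days=(7 - d.weekday()) % 7)

def summer_share(year):
    """Fraction of the year from Memorial Day through Labor Day."""
    summer_days = (labor_day(year) - memorial_day(year)).days + 1
    year_days = (date(year + 1, 1, 1) - date(year, 1, 1)).days
    return summer_days / year_days

for year in (2000, 2005, 2010):
    print(year, f"{summer_share(year):.1%}")
```

Under these assumptions the stretch covers roughly 27% to 29% of the year, so "over 26%" of burglaries is about what we would expect if burglaries happened at the same rate all year long.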

“Only one in two cameras is actually in operation, but this could soon increase to as many as one in three.” Watford Observer, 2 August 2002

Continental Airlines once advertised that it had “decreased lost baggage by 100% in the past six months.”

Results from a Gallup poll, taken May 29-31, 2009, with a 3% margin of error.

Can we conclude that more Americans have a favorable impression of Dick Cheney than Nancy Pelosi?
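A rough way to use a margin of error is to check whether the two ranges of plausible values overlap. The sketch below is only an illustration: the percentages 40 and 36 are placeholder values, not the poll's actual results, and the function names are invented. Only the 3-point margin comes from the slide.

```python
MARGIN = 3  # percentage points, from the poll description

def interval(percent, margin=MARGIN):
    """Range of plausible values for a reported percentage."""
    return (percent - margin, percent + margin)

def intervals_overlap(a, b):
    """True if the two ranges share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]

# Placeholder favorable ratings (NOT the real poll numbers).
person_one = interval(40)
person_two = interval(36)

print(person_one, person_two, intervals_overlap(person_one, person_two))
```

When the ranges overlap, the gap between the two reported percentages could be due to sampling variability alone, so we should be cautious about declaring a leader.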

What can we conclude from these graphs?

Good Data is Valid, Unbiased & Reliable

Valid – relevant and appropriate

Unbiased – not consistently different from the truth in one direction

Reliable – as little variation as possible

Even Good Data Varies: How Long is a Minute?

How accurate are you and your classmates at knowing how long a minute is?

Get a partner and a stopwatch. You will take turns timing and guessing. Using the stopwatch, the timer tells the guesser when to start. When the guesser believes that a minute has passed, he says “Stop.” At that point, the timer stops the stopwatch and records the time that passed to the nearest tenth of a second. Do not tell your partner how much time actually passed!

Reset the stopwatch and switch roles. Continue timing and measuring until each person has been timed three times.

Analyzing “How Long is a Minute?”

Was your data valid? Was either partner’s data biased? Which partner was more reliable? How about the class as a whole?

Add your data (3 measurements from each of you) to the class list and graph.

Then figure the average of your 3 measurements and add it to the other graph.
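The questions about bias and reliability can be answered with two numbers per partner: how far the average guess is from 60 seconds (bias) and how much the guesses vary (spread). This is a minimal sketch. The times below are hypothetical, not real class data, and `summarize` is a name made up for illustration.

```python
from statistics import mean, stdev

TARGET = 60.0  # seconds in a minute

def summarize(times):
    """Average guess, its distance from 60 s, and the spread of the guesses."""
    avg = mean(times)
    return {"average": avg, "bias": avg - TARGET, "spread": stdev(times)}

# Hypothetical stopwatch readings in seconds (three per partner).
partner_a = [52.3, 55.1, 53.8]
partner_b = [58.0, 66.4, 61.2]

for name, times in (("A", partner_a), ("B", partner_b)):
    s = summarize(times)
    print(f"Partner {name}: average {s['average']:.1f} s, "
          f"bias {s['bias']:+.1f} s, spread {s['spread']:.1f} s")
```

In this made-up data, partner A is more reliable (the guesses are close together) but more biased (always short of a minute), while partner B is closer to 60 seconds on average but varies more.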

All Data Varies, but We Can Use Averages to Improve Reliability

No measuring process is perfectly reliable.

The average of several repeated measurements of the same individual is more reliable (and less variable) than a single measurement.

Goal: The least amount of variability and bias possible!

How do we achieve our goal?

To reduce bias, use random sampling. A random sample should represent the population, and therefore give unbiased results.

To reduce variability, use a larger sample. Averages from larger random samples vary less from sample to sample, so a large random sample will almost always give an estimate that is close to the truth.
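A small simulation can show why a larger sample reduces variability. This sketch makes several assumptions: the population is 10,000 made-up "minute guesses" centered near 60 seconds, each sample is drawn at random, and the helper name `spread_of_sample_means` is invented for illustration.

```python
import random
from statistics import mean, stdev

def spread_of_sample_means(population, sample_size, trials, rng):
    """Draw many random samples and report how much their averages vary."""
    means = [mean(rng.sample(population, sample_size)) for _ in range(trials)]
    return stdev(means)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population of guesses, in seconds.
population = [rng.gauss(60, 8) for _ in range(10_000)]

spreads = {n: spread_of_sample_means(population, n, 500, rng)
           for n in (1, 3, 30)}

for n, s in spreads.items():
    print(f"sample size {n:>2}: averages vary by about {s:.2f} s")
```

The averages from samples of 30 should cluster much more tightly than single measurements or averages of 3. Note that a larger sample does not fix bias: if the sampling method is biased, a bigger sample just gives a more consistent wrong answer.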

Good Data Shouldn't be Confounded

Just because two variables have a relationship, that doesn’t mean one causes the other. There could be a confounding variable at play.

A confounding variable is an additional variable that affects the response but whose effect isn't separated out from the effect of the variable being studied.

Confounding variables are most often found in observational studies that compare a characteristic of two groups, or in poorly designed experiments.

Sometimes the media ignores confounding variables and misinterprets results from observational studies, reporting “proven” links when the statistics have only shown evidence of a relationship.

Consider this data – is there a relationship?

Year    Number of Methodist Ministers    Cuban Rum Imported to Boston
        in New England                   (in barrels)
1860     63                               8,376
1865     48                               6,406
1870     53                               7,005
1875     64                               8,486
1880     72                               9,595
1885     80                              10,643
1890     85                              11,265
1895     76                              10,071
1900     80                              10,547
1905     83                              11,008
1910    105                              13,885
1915    140                              18,559
1920    175                              23,024
1925    183                              24,185
1930    192                              25,434
1935    221                              29,238
1940    262                              34,705
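To see how strong the relationship is, we can compute the correlation between the minister counts and the rum imports listed above. This is a minimal sketch: the helper `pearson_r` is written out by hand for illustration, and the numbers are copied from the data above.

```python
from math import sqrt

# Data copied from the ministers and rum figures above (1860 to 1940).
ministers = [63, 48, 53, 64, 72, 80, 85, 76, 80, 83,
             105, 140, 175, 183, 192, 221, 262]
rum_barrels = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547,
               11008, 13885, 18559, 23024, 24185, 25434, 29238, 34705]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(ministers, rum_barrels)
print(f"r = {r:.3f}")
```

The correlation is very close to 1, yet it would be a mistake to say that ministers cause rum imports or the reverse. A more plausible reading is that both grew along with something else over these years, such as the population of the region, which would be a confounding variable.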

Another Confounding Example

A study reports that a group of children who had certain vaccinations were more likely to develop autism than a group of children who did not receive those same vaccinations.

Does this mean that vaccinations cause autism?

No. Since the children were not randomly assigned to get vaccinations or not, there could be a confounding variable at play. Perhaps the parents who chose vaccinations also made some other choice that increased the risk of autism, or were genetically more likely to have children with autism.