MAT 1000 Mathematics in Today's World

Upload: alan-skinner

Post on 17-Dec-2015

Last Time

1. What is statistics? Numbers plus context (data).

2. The structure of data: individuals and variables

3. Two methods to collect data: observational studies and experiments

Individuals: the people or objects being studied

Variables: the individuals’ characteristics or attributes being studied. Variables can be numeric or non-numeric.

Observational study: the researchers performing the study merely observe the individuals.

Experiment: the researchers attempt to modify, influence, or affect the individuals they are studying.

Example

Every month the government calculates the unemployment rate.

Individuals: American adults.

Variable: current job status.

This must be done with an observational study. The researchers are not trying to change the job status of any of the individuals in the study.

Today

1. Two types of observational study: census and sample survey.

2. Three methods for choosing a sample: two bad methods and one good one.

Population

The data we collect are attributes of some type of individual (people or objects).

The collection of all of the individuals is called the population.

Example 1: if the individuals are Wayne State students, the population is the collection of all Wayne State students.

Example 2: if the individuals are American cities, then the population is the collection of all American cities.

Two types of observational study

Important design issue for observational studies:

Which individuals in the population to observe?

All, or only part?

The government could try to determine the unemployment rate by observing all of the individuals in the population, that is, by asking every working-age adult whether they are employed.

This would be incredibly expensive and time-consuming.

More than that, we can actually get a reasonably accurate answer by asking only a small fraction of all working-age adults.

The two types of observational study are

1. Census: the researchers try to observe all of the individuals in the population.

2. Sample survey: the researchers only observe certain individuals in the population.

In a sample survey, individuals selected for observation are called the sample.
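The contrast between a census and a sample survey can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the population size, the 5% unemployment rate, the sample size of 1,000, and the names `population` and `unemployment_rate` are all hypothetical, and the sample is drawn at random, a method covered later in this lecture.

```python
import random

# Hypothetical population: 100,000 working-age adults, exactly 5% unemployed.
# These numbers are invented for illustration only.
POPULATION_SIZE = 100_000
UNEMPLOYED = 5_000
population = [True] * UNEMPLOYED + [False] * (POPULATION_SIZE - UNEMPLOYED)


def unemployment_rate(people):
    """Fraction of the given individuals who are unemployed (True)."""
    return sum(people) / len(people)


# Census: observe every individual in the population.
census_rate = unemployment_rate(population)

# Sample survey: observe only 1,000 individuals, chosen at random.
rng = random.Random(2015)  # fixed seed so the run is repeatable
sample = rng.sample(population, 1_000)
sample_rate = unemployment_rate(sample)

print(f"census: {census_rate:.3f}   sample of 1,000: {sample_rate:.3f}")
```

The sample looks at 1% of the individuals, yet its rate typically lands close to the census rate.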

[Figure: diagrams contrasting a census with a sample survey. Labels: Individuals, Population, Sample, Census.]

Choosing a sample

When you choose a sample, the method you use is important.

We would like a sample that represents the whole population, but we can never be sure that it does.

However, some sampling methods tend to produce samples which are different from the whole population in some important way.

Three commonly used methods for choosing a sample are

1. Convenience samples

2. Voluntary response samples

3. Simple random samples

The first two methods are bad sampling methods.

Bad sampling methods

Suppose a teacher wants to know whether his students are understanding a lecture.

He could stop his lecture to ask the class questions, and wait for volunteers to answer.

What’s the problem?

For the most part, only the students who understand will volunteer to participate.

The sample of students the teacher is interacting with may not represent the class as a whole.

A voluntary response sample consists of those individuals who volunteer to be in the sample.

Voluntary response samples usually fail to represent the population as a whole.

Opinion polls often allow anyone to participate. But the people who participate are the ones who tend to feel strongly about an issue.

Example

An advice columnist (Ann Landers) wanted to know how many parents regretted having children. So she asked her readers.

She received over 10,000 responses, and 70% said they did regret having children.

Does this sound plausible?

No. This was a voluntary response sample survey. The sample was almost surely not representative of the population of all parents.

A convenience sample includes the individuals who are easiest to observe.

An employee of a grocery store inspects a large shipment of oranges. If there are too many damaged fruits, the grocery store will return the shipment.

The employee might look only at the top crates, and select only the oranges lying at the top of those crates.

This is a convenience sample. It will almost surely not represent the population (the whole shipment of oranges).

If there are any damaged or unacceptable oranges, they are probably going to be at the bottom of a crate.

Both voluntary response and convenience sampling have a similar flaw: they typically lead to unrepresentative samples.

A method of choosing a sample is biased if it systematically favors certain outcomes.

To understand the word “systematic” in this definition, we need a thought experiment.

You should imagine taking a sampling method and repeating it several times.

If the samples we collect usually fail to represent the population in the same way, we say the sampling method is “biased.”

Suppose the teacher leaves the room, and, one after another, several different teachers come in, and ask for volunteers to answer questions.

Each teacher may talk to a different sample of students, but the volunteers will usually be the students who best understand the material.

They are using a biased sampling method.

All of these teachers will end up overestimating the level of understanding of their students. That is, their samples misrepresent the population in the same way.

Back at the grocery store, ten employees take turns inspecting the same shipment of oranges.

Each one uses convenience sampling—they inspect the oranges that are easiest for them to find (from the top of the crates).

Maybe each person inspects a different sample of oranges.

But every one of these inspectors will probably overestimate the quality of the shipment.

The reason is that convenience sampling is biased.

Notice that “bias” is a property of a sampling method.

So Ann Landers’ opinion poll is a sample survey that uses a biased sampling method (voluntary response).

Random sampling

Is there a method of choosing a sample that will always pick a good representation of the population?

No!

We don’t know the population as a whole (that is why we are taking a sample). So we can never know for sure that a sample really represents the population as a whole.

Nevertheless it is possible to choose a sample and be fairly confident that it represents the population.

We rely on randomness.

If we choose individuals at random, our sample is more likely (though not guaranteed) to represent the population.

If instead of asking for volunteers, an instructor calls on students at random, the students called on will probably give a good representation of the class as a whole.

Of course, we might randomly choose only the students who understand the material.

In other words, it is possible to use voluntary response sampling or random sampling and end up picking the exact same people.
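The thought experiment of many teachers repeating the same sampling method can be simulated. The sketch below rests on assumptions that are not in the lecture: a hypothetical class of 40 students with evenly spread "understanding" scores, a student who volunteers with probability equal to their score, and teachers who call on 10 students when sampling at random.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical class: 40 students with understanding scores spread evenly
# from 0 (lost) to 1 (fully understands). The scores are invented.
scores = [i / 39 for i in range(40)]
true_mean = sum(scores) / len(scores)


def volunteer_estimate(scores, rng):
    """Voluntary response: assume a student volunteers with probability = score.

    The student with score 1.0 always volunteers, so the list is never empty.
    """
    volunteers = [s for s in scores if rng.random() < s]
    return sum(volunteers) / len(volunteers)


def random_estimate(scores, rng, size=10):
    """Random sampling: the teacher calls on `size` students chosen at random."""
    called_on = rng.sample(scores, size)
    return sum(called_on) / size


# Repeat each method for 200 teachers and average their estimates.
teachers = 200
volunteer_avg = sum(volunteer_estimate(scores, rng) for _ in range(teachers)) / teachers
random_avg = sum(random_estimate(scores, rng) for _ in range(teachers)) / teachers

print(f"true mean understanding:       {true_mean:.2f}")
print(f"average volunteer estimate:    {volunteer_avg:.2f}")
print(f"average random-call estimate:  {random_avg:.2f}")
```

Under these assumptions the volunteer-based estimates are too high on average, while the random-call estimates scatter around the true value. Any single random sample can still miss.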

So what’s the advantage of a random sample?

Unlike voluntary response sampling, random sampling gives us a chance of picking students who do not understand the material.

And we will see later in the course that the odds of picking an unrepresentative sample at random are quite low.

For now, we will just look at a practical method for generating a random sample.

Simple random sampling

The method we will discuss is called simple random sampling (SRS).

Suppose we want to choose a sample of size n (here n is just some natural number).

In an SRS, any group of n individuals in the population has an equal chance of being selected as the sample.
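To make "any group of n individuals has an equal chance" concrete, here is a small sketch with a hypothetical population of five individuals (A to E) and n = 2. It lists every possible group and then checks, by repeated drawing with Python's `random.sample`, that each group turns up about equally often. The population, the seed, and the number of draws are arbitrary choices for illustration.

```python
import itertools
import math
import random
from collections import Counter

population = ["A", "B", "C", "D", "E"]  # hypothetical individuals
n = 2

# Every possible group of n individuals: there are C(5, 2) = 10 of them.
all_groups = list(itertools.combinations(population, n))
print(len(all_groups), "possible groups; math.comb gives", math.comb(len(population), n))

# Draw many simple random samples and count how often each group appears.
rng = random.Random(1)  # fixed seed so the run is repeatable
draws = 20_000
counts = Counter(tuple(sorted(rng.sample(population, n))) for _ in range(draws))

for group in all_groups:
    print(group, f"{counts[group] / draws:.3f}")  # each should be near 1/10
```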

To understand what this means, think of the example of a grocer inspecting a shipment of oranges. Suppose he needs to pick 25 oranges from a large shipment.

If he uses convenience sampling, say by picking only the oranges that are at the top of the crates, he could never pick a group of 25 that includes some oranges from the bottom of the crates.

If he uses SRS, any of these groups has an equal chance of being picked. So he could randomly pick 25 that are all at the bottom of a crate, or 25 that are all at the top. But the most likely thing is that he will pick a mixture.
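The orange example can be sketched as a simulation. Everything about the shipment here is an assumption made for illustration: 20 crates of 5 layers with 20 oranges per layer, with damaged oranges placed only in the two bottom layers, and ten inspectors who each pick 25 oranges.

```python
import random

# Hypothetical shipment layout (not from the lecture).
CRATES, LAYERS, PER_LAYER = 20, 5, 20
DAMAGED_BY_LAYER = [0, 0, 0, 2, 6]  # layer 0 is the top; assumed damage counts

# Each orange is a tuple: (crate, layer, is_damaged).
shipment = [
    (crate, layer, position < DAMAGED_BY_LAYER[layer])
    for crate in range(CRATES)
    for layer in range(LAYERS)
    for position in range(PER_LAYER)
]


def damaged_fraction(oranges):
    """Fraction of the given oranges that are damaged."""
    return sum(damaged for _, _, damaged in oranges) / len(oranges)


true_rate = damaged_fraction(shipment)
rng = random.Random(25)  # fixed seed so the run is repeatable

# Convenience sampling: only the top layer of the first five crates is reachable.
easy_to_reach = [o for o in shipment if o[0] < 5 and o[1] == 0]
convenience_estimates = [damaged_fraction(rng.sample(easy_to_reach, 25)) for _ in range(10)]

# Simple random sampling: any 25 oranges in the shipment can be chosen.
srs_estimates = [damaged_fraction(rng.sample(shipment, 25)) for _ in range(10)]

print("true damaged rate:    ", true_rate)
print("convenience estimates:", convenience_estimates)
print("SRS estimates:        ", srs_estimates)
```

With this layout every convenience sample reports no damage at all, so all ten inspectors err in the same direction. The SRS estimates vary from inspector to inspector but are not pushed toward one side.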

How do we pick a simple random sample?

Example

John’s small accounting firm serves 30 business clients. John wants to interview a sample of 5 clients to find ways to improve client satisfaction. To avoid bias, he chooses an SRS of size 5.

A-1 Plumbing              JL Records
Accent publishing         Johnson Commodities
Action Sport Shop         Keiser Construction
Anderson Construction     Liu's Chinese Restaurant
Bailey Trucking           MagicTan
Balloons, Inc.            Peerless Machine
Bennett Hardware          Photo Arts
Best's Camera Shop        River City Books
Blue print specialties    Riverside Tavern
Central Tree Service      Rustic Boutique
Classic Flowers           Satellite Services
Computer Answers          Scotch Wash
Darlene's Dolls           Sewer's Center
Fleisch Realty            Tire Specialties
Hernandez Electronics     Von's Video Store

Step 1: Label. Give each client a numerical label, using as few digits as possible. There are 30 clients, so one-digit labels are not enough. Two-digit labels will work:

01, 02, 03, …, 28, 29, 30

01 A-1 Plumbing              16 JL Records
02 Accent publishing         17 Johnson Commodities
03 Action Sport Shop         18 Keiser Construction
04 Anderson Construction     19 Liu's Chinese Restaurant
05 Bailey Trucking           20 MagicTan
06 Balloons, Inc.            21 Peerless Machine
07 Bennett Hardware          22 Photo Arts
08 Best's Camera Shop        23 River City Books
09 Blue print specialties    24 Riverside Tavern
10 Central Tree Service      25 Rustic Boutique
11 Classic Flowers           26 Satellite Services
12 Computer Answers          27 Scotch Wash
13 Darlene's Dolls           28 Sewer's Center
14 Fleisch Realty            29 Tire Specialties
15 Hernandez Electronics     30 Von's Video Store
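The labeling step can be written as a short Python sketch. The helper name `make_labels` is hypothetical. It pads every label to the number of digits in the population size, which is one way to use "as few digits as possible" while keeping all labels the same length.

```python
clients = [
    "A-1 Plumbing", "Accent publishing", "Action Sport Shop",
    "Anderson Construction", "Bailey Trucking", "Balloons, Inc.",
    "Bennett Hardware", "Best's Camera Shop", "Blue print specialties",
    "Central Tree Service", "Classic Flowers", "Computer Answers",
    "Darlene's Dolls", "Fleisch Realty", "Hernandez Electronics",
    "JL Records", "Johnson Commodities", "Keiser Construction",
    "Liu's Chinese Restaurant", "MagicTan", "Peerless Machine",
    "Photo Arts", "River City Books", "Riverside Tavern",
    "Rustic Boutique", "Satellite Services", "Scotch Wash",
    "Sewer's Center", "Tire Specialties", "Von's Video Store",
]


def make_labels(names):
    """Map zero-padded labels ('01', '02', ...) to names, in list order."""
    width = len(str(len(names)))  # 30 clients -> labels need 2 digits
    return {f"{i:0{width}d}": name for i, name in enumerate(names, start=1)}


labels = make_labels(clients)
for label, name in labels.items():
    print(label, name)
```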

Step 2: Generate random numbers. In practice we use a computer to do this. For the classroom, we can use a “table of random digits.”
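As a rough idea of what "use a computer" might look like, the sketch below draws an SRS of 5 labels out of 30 with `random.sample` from Python's standard library, which selects without replacement. The lecture does not say which software is used, so this is only one possible choice. The generator is left unseeded, so each run gives a different sample.

```python
import random

# Two-digit labels 01 through 30, one per client.
labels = [f"{i:02d}" for i in range(1, 31)]

# Draw 5 distinct labels; every group of 5 is equally likely to be chosen.
rng = random.Random()  # unseeded: a new sample on each run
sample = rng.sample(labels, 5)

print(sorted(sample))
```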

We will pick our sample using the following random digits:

69051 64817 87174 09517 84534 06489 87201 97245

Our labels are two-digit numbers, so we read two digits at a time (ignore the gaps in the list of digits):

69 05 16 48 17 87 17 40 95 17

We ignore all two-digit groups greater than 30, since they are not labels (00 would be ignored for the same reason). This leaves

05 16 17 17 17

The clients labeled 05, 16, and 17 go into the sample (we only use 17 once). But we need two more! Here are the next few two-digit groups:

84 53 40 64 89 87 20 19 72 45

Disregarding the numbers greater than 30, we are left with

20 19

So clients 20 and 19 go into the sample as well. Hence, the sample consists of clients labeled 05, 16, 17, 19, and 20:

05 Bailey Trucking
16 JL Records
17 Johnson Commodities
19 Liu’s Chinese Restaurant
20 MagicTan
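The table-reading procedure above can be reproduced in Python. The function name `srs_from_digits` and its parameters are hypothetical. It follows the same rules as the worked example: read two digits at a time, skip groups that are not labels, and skip repeats.

```python
def srs_from_digits(digits, population_size, sample_size):
    """Pick labels by reading fixed-width groups from a string of random digits."""
    digits = "".join(ch for ch in digits if ch.isdigit())  # ignore the gaps
    width = len(str(population_size))                      # 30 -> 2 digits per label
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = digits[start:start + width]
        # Keep the group only if it is a valid label we have not used yet.
        if 1 <= int(label) <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen  # may be shorter than sample_size if the digits run out


table_row = "69051 64817 87174 09517 84534 06489 87201 97245"
sample = srs_from_digits(table_row, population_size=30, sample_size=5)
print(sample)  # ['05', '16', '17', '20', '19']
```

The labels come out in the order they were found in the table. Sorting them gives 05, 16, 17, 19, 20, the same five clients as above.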