Math 146, Fall 2019 Instructor – Linda C. Stephenson Chapter 2 Assignment

Page 1 of 16

Chapter 2 Assignment (due Thursday, October 3)

Introduction: The purpose of this assignment is to analyze data sets by creating histograms and scatterplots. You will use the STATDISK program for both.

Therefore, you should start by downloading the STATDISK Version 13 program. A link to the textbook website is given on the “Links and Information” page of our class webpage, plus I did a demo in class.

To start the program, just double-click on the “Statdisk.exe” file (or icon), and it should open looking approximately like this:

There are datasets preloaded into the program. To open them, just go to Data Sets/Elementary Statistics 13th Edition, and pick a file. For example, I will open up the ‘Bear Measurements’ dataset (Data Set no. 9) to use as an example throughout this assignment file. You might want to open this file and run through it with the examples as practice.

You can click and drag the corner of the data box to make the box bigger. Also, if needed, you can scroll over to the rest of the columns using the scroll bar at the bottom of the window (for example, there are 9 columns of data in this file).

There are two parts to the assignment:

Part A: Frequency Distributions and Histograms

Part B: Scatterplot

Part A Frequency Distributions and Histograms

The data that you will be analyzing for this assignment deals with the literacy rates (measurement unit: %) for 165 countries in the world. One set of data applies to the Male population of each country, and the second set of data applies to the Female population of each country. The rates generally apply to people ages 15 and over. The reference year varies from country to country, but most estimates are from 2015.

The source of data is the CIA World Factbook, at the following website:

https://www.cia.gov/library/publications/the-world-factbook/

From the website, here is their definition of the data:

The data for the assignment is given in the Excel file: literacy_by_country_CIA

I provided three different sheets of data in the Excel file, and the tabs for each sheet are located at the bottom of the spreadsheet. The data that you want to copy over to STATDISK is on the “STATDISK” sheet. The other two sheets are for your information only, and contain the respective data sorted in order from smallest to largest percentages of literacy, along with the corresponding country names. You will probably want to consult these sorted data lists when you are answering the questions.

You are going to create two different frequency distributions and histograms for this part of the assignment, using the data in the Excel file.

You need to copy the data from the Excel file into STATDISK. Start STATDISK fresh, so that all of the data columns are empty. In Excel, make sure that you are on the “STATDISK data” tab; the tabs are labeled at the bottom of the spreadsheet.

You can copy both columns of data over at the same time. All you want to copy over are the numbers in columns B and C; you do not want to copy over the country names or the column headers. Therefore, in the Excel file, click on cell B2, which is the first data value (52). Keeping your mouse button down, drag all the way down and over to cell C166, which is the last data value (84.6). When you release the mouse button, the columns of numbers should be highlighted. Then just hit Ctrl c (shortcut command for copying): hold the Control button down on your keyboard, and hit the letter c at the same time.

Now switch over to STATDISK, click in the very upper left cell (row 1 column 1), and hit Ctrl v (shortcut command for pasting). Both columns of numbers should now be pasted into STATDISK. Scroll down to confirm that you did get all 165 rows of numbers copied over in both columns. I would suggest that you go to Data Tools/Edit Column Titles, and add appropriate column headers for the data (male and female), just to keep them straight.
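If you are curious what the copy-and-paste step amounts to, here is a minimal Python sketch. It assumes that a block copied from a spreadsheet arrives as tab-separated text, one row per line; the function name `parse_two_columns` and the three sample rows are made up for illustration and are not the assignment data.

```python
def parse_two_columns(pasted_text):
    """Split tab-separated rows into two lists of numbers (male, female)."""
    males, females = [], []
    for line in pasted_text.strip().splitlines():
        male, female = line.split("\t")
        males.append(float(male))
        females.append(float(female))
    return males, females


# Hypothetical three-row sample; the real paste should give 165 rows.
sample = "61.0\t40.5\n83.4\t68.9\n99.8\t99.7"
males, females = parse_two_columns(sample)

# The same check you do by scrolling in STATDISK: both columns have every row.
print(len(males), len(females))
```

With the real data, both lengths should come out to 165.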

Making the Frequency Distributions and Histograms

In both cases, use the continuous data method to set up your data classes. Use the following starting lower class limits and class widths for the frequency distributions:

Dataset                             Starting Lower Class Limit    Class Width
Male Literacy Rates (Column 1)      20                            10
Female Literacy Rates (Column 2)    10                            10

Fill in all of the details of the frequency distributions on the Answer Sheet given at the end of the assignment. To set up the class limits, notice that both columns of data are given to the first decimal place, so write your class limits accordingly. Use the histogram (instructions below) to find the frequencies for each data class, just by checking the ‘Bar labels’ button at the bottom of the histogram.
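As a rough illustration of writing class limits to the first decimal place, the sketch below assumes that each upper class limit sits 0.1 below the next lower class limit, because the data is recorded in steps of 0.1. The function name `class_limits` and the choice to show only three classes are illustrative; check the result against the method we used in class.

```python
def class_limits(start, width, count, step=0.1):
    """Return (lower, upper) class limits for data recorded in steps of `step`."""
    limits = []
    for i in range(count):
        lower = float(start + i * width)
        upper = lower + width - step  # last recordable value before the next class
        limits.append((round(lower, 1), round(upper, 1)))
    return limits


# First three classes for a starting lower class limit of 20 and a class width of 10.
for lower, upper in class_limits(20, 10, 3):
    print(f"{lower:.1f} - {upper:.1f}")
```

You decide how many classes you need by continuing until the largest data value is covered.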

Disclaimer: I noticed that sometimes if you change something in the histogram and replot it, the bar labels (frequencies) disappear. Just uncheck and recheck the ‘Bar labels’ button and they should reappear.

If you want to, you can verify the frequencies by counting them up by hand (like we did in class), by looking at the sorted data. If you need to sort data (smallest to largest), go to Data Tools/Sort Data. Then select Sort/One column, pick the column number, leave the order as “A to Z”, and hit the ‘Sort’ button. You don’t need to do this in this case, since your histogram has counted them up for you (plus I already sorted them in Excel, given on separate sorted tabs of the spreadsheet), but it’s handy to know how to sort the data in STATDISK, and you may want to use this in the future for quizzes or assignments. Also, you will need to know how to do this process by hand on the test.
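The counting process can also be sketched in a few lines of Python, which may help you check a hand count. This is only an illustration: the function name `tally` and the six sample values are made up, and each class is assumed to run from its lower class limit up to (but not including) the next lower class limit.

```python
import math


def tally(values, start, width):
    """Count how many values fall in each class, keyed by lower class limit."""
    counts = {}
    for v in values:
        k = math.floor((v - start) / width)  # index of the class containing v
        lower = start + k * width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


sample = [23.5, 29.9, 30.0, 47.1, 48.8, 41.0]  # hypothetical values
print(tally(sample, start=20, width=10))  # {20: 2, 30: 1, 40: 3}
```

Note that 29.9 lands in the class starting at 20 while 30.0 starts the next class, which is exactly the boundary behavior you have to watch for when counting by hand.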

To create the histograms, go to Data/Histogram. Select the column of data that you want to plot, and then hit the ‘Plot’ button. STATDISK will auto-fit the plot options for you, and generate a histogram. For example, I am plotting the head length of the bears (column 4).

In most cases, you will want to define your own plot options, including the Class width and the Class start (starting lower class limit). To do this, click on the “User defined” button. Enter the correct class width and starting lower class limit (which I specified above for this assignment), and hit ‘Plot’. For example, for the bear’s head length histogram, I will change the starting lower class limit to 4.

Note: anytime you change the input options for the histogram (class width or class start), you need to hit the Plot button at the top again!

Finally, for all histograms, you need to add a title and an x-axis label, including any appropriate units of measurement. Use the ‘Title’ and ‘x Label’ boxes at the upper right.

Now you should have a nice histogram with the title and x-axis label displayed. For example (see above):

Print out each of the histograms. Do NOT use the ‘Print’ or ‘Copy’ button at the bottom of the STATDISK histogram box. The problem is that these commands only copy the plot itself (the part with the white background), and not the column on the right showing the input plot options. I want to see these!

Instead, you can use an alternative method, especially if you want to print more than one graph per page. This method is described separately in the file “Printing Out Applet Output”, which is on the Assignment webpage. Wherever it says “applet”, just think “STATDISK” instead; it is exactly the same. I will demonstrate this in class also, if I haven’t already done so.

Finish this portion of the assignment by answering the questions given on the Answer Sheet.

Part B Scatterplots

You are going to create one scatterplot.

The purpose of making scatterplots is to investigate whether or not there is a relationship between the two variables. Where applicable, you will confirm a linear relationship (or not) by running a linear correlation and regression analysis.

From the bear data, for example, here is a plot of a bear’s weight as a function of its chest measurement. From the graph, there appears to be a strong linear relationship between the two variables, since the data points approximately form a line.

We can confirm this by running a Correlation and Regression analysis in STATDISK. Go to Analysis/Correlation and Regression, select the correct columns for the x variable and the y variable, and then hit the ‘Evaluate’ button.

In this case, the calculated correlation coefficient is r = 0.963, which indicates a very strong positive linear correlation. It is much higher than the critical r = 0.268, which confirms the linear correlation.

On the other hand, this plot of the bear’s weight as a function of month of measurement does NOT seem to show a strong linear correlation between the two variables. The data is scattered about in no particular pattern, which makes sense, because it is the weight plotted as a function of the month in which the measurement was taken. There’s no reason for these two variables to be correlated!

The correlation and regression analysis confirms that there is no linear correlation, since the calculated correlation coefficient (r = 0.171) is less than (closer to 0 than) the critical r (critical r = 0.268).
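To make the decision rule in the two bear examples concrete, here is a small Python sketch. It computes the linear correlation coefficient r from paired data using the standard formula, and then compares |r| against a critical value that you supply. The function names and the five sample pairs are hypothetical; the critical r itself depends on the sample size and significance level, so it is taken from the STATDISK output rather than calculated here.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired (x, y) data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_linearly_correlated(r, critical_r):
    """Rule from the text: r must be farther from 0 than the critical r."""
    return abs(r) > critical_r


# The two r values quoted for the bear data, against critical r = 0.268.
print(is_linearly_correlated(0.963, 0.268))  # True
print(is_linearly_correlated(0.171, 0.268))  # False

# Hypothetical paired values, only to show the calculation of r.
x = [26, 45, 54, 49, 35]
y = [80, 344, 416, 348, 166]
print(round(pearson_r(x, y), 3))
```

Using the absolute value means that a strong negative correlation is also detected, which matches the “closer to 0 than” wording above.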

Also note: a scatterplot may reveal that there IS a relationship between the two variables, but that it is clearly NOT a linear relationship. In this case, it would NOT be appropriate to run a linear correlation and regression analysis, because a visual inspection of the data has already shown that the data is clearly not linear.

Scatterplot: Female Literacy Rates as a Function of Male Literacy Rates for World Countries

You are going to make a scatter plot of the paired data (male, female) or (x, y) for each country in the table. In other words, you will plot female literacy rates as a function of male literacy rates (both in %). The point is to investigate if there is a relationship between the two variables.

Use the same two columns of data that you already copied over from Excel.

To create the scatterplot, go to Data/Scatterplot. Select the two columns of data you want to plot, then hit the ‘Plot’ button. Note that you want to choose ‘Female’ (which is in Excel column C and STATDISK column 2) as the Y Value variable, and ‘Male’ (which is in Excel column B and STATDISK column 1) as the X Value variable (the independent variable). UNCHECK the “Visible” box shown at the bottom of the plot (see figure on previous page) to get rid of the green line. The program will automatically plot a green regression line whether or not there is a linear correlation, so it can be very deceiving. I noticed that I sometimes had to check and uncheck the box a couple of times to get rid of the line, so just play around with that.

Add a title and both an x- and y-axis label every time you make a scatterplot. I noticed that when I typed them in, I had to immediately hit the ‘Enter’ key on my keyboard, and then they would appear. Make sure that your title correctly reflects which variable is a function of the other variable. Also always make sure that you have included the correct units of measurement for both the x and y axes. In this case, the unit of measurement for both variables is percent (%).

Print out the scatterplot, using one of the methods described in “Printing Out Applet Output”.

Run a correlation and regression analysis on the two variables, as described above. Snip and save the output for printing.

CAUTION! NEVER SORT the data when you are making scatterplots! The scatterplots consist of paired (x, y) data, so if you sort the columns individually, then the pairs will no longer exist, and the data will be meaningless when you plot it.
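Here is a tiny Python sketch of why sorting the columns individually destroys the pairs. The three (male, female) pairs are made-up numbers, not values from the assignment data.

```python
# Hypothetical (male, female) literacy rates for three countries.
male = [96.0, 45.1, 72.3]
female = [91.0, 60.2, 48.5]

original_pairs = list(zip(male, female))

# Sorting each column on its own re-matches values from different countries.
broken_pairs = list(zip(sorted(male), sorted(female)))

# Sorting the pairs as units keeps each country's two values together.
intact_pairs = sorted(original_pairs)

print(original_pairs)
print(broken_pairs)
print(intact_pairs)
```

In `broken_pairs`, the male rate of 45.1 ends up next to a female rate of 48.5, which belongs to a different country, so any scatterplot made from those columns would be meaningless.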

WHAT YOU NEED TO TURN IN:

1. Printout of two histograms. Don’t forget to add a title and an x-axis label, including the measurement units where applicable. The units in this case are just percent (%).

2. Printout of the scatterplot. Don’t forget to include a title and x- and y-axis labels, including the measurement units.

3. Printout of one correlation and regression analysis.

4. Fully completed Answer Sheet (5 pages total).

Note: you do NOT need to turn in all the pages of instructions.

Math 146, Fall 2019 Name: Chapter 2 Assignment

Answer Sheet

Part A Frequency Distributions and Histograms

Frequency Distribution for: Male Literacy Rates

Note: Write the data classes as: lower class limit – upper class limit, to the first decimal place level of accuracy.

Given: Starting lower class limit = 20, Class width = 10

Fill in ALL of the boxes:

Data Classes (% literate)    Frequency (Count)    Relative Frequency* (%)

* Note: calculate the relative frequencies (in %) to TWO decimal places. Do NOT just use the values from the STATDISK histogram, because they are rounded off to the nearest integer.
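As a reminder of the arithmetic, a relative frequency is the class frequency divided by the total count, times 100. The Python sketch below shows the calculation to two decimal places; the function name `relative_frequencies` and the four counts are hypothetical (they were chosen only so that they add up to 165).

```python
def relative_frequencies(counts):
    """Convert class frequencies to relative frequencies in %, to two decimals."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


counts = [4, 11, 25, 125]  # hypothetical frequencies, total = 165
for count, rel in zip(counts, relative_frequencies(counts)):
    print(f"{count:>4}  {rel:.2f}%")
```

For example, a class with a frequency of 4 out of 165 has a relative frequency of 2.42%, where a histogram rounded to the nearest integer would show only 2%.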

Frequency Distribution for: Female Literacy Rates

Note: Write the data classes as: lower class limit – upper class limit, to the first decimal place level of accuracy.

Given: Starting lower class limit = 10, Class width = 10

Fill in ALL of the boxes:

Data Classes (% literate)    Frequency (Count)    Relative Frequency* (%)

* Note: calculate the relative frequencies to TWO decimal places. Do NOT just use the values from the STATDISK histogram, because they are rounded off to the nearest integer.

Part A Questions: Histograms

1. a. Describe the distribution (shape) of the data for the male literacy histogram.

   b. Describe the distribution (shape) of the data for the female literacy histogram.

2. Note that the histograms have a similar shape. WHY do you think the distributions are shaped like this? What could be some possible explanations? Consider the underlying causes, not the “mechanics” of the histograms. In particular, you might want to look at the sorted lists of data in Excel, and consider which countries are on which end of the graph.

3. There is one particular difference between the literacy rates for males and females in countries of the world. Look carefully at the histograms and notice that the x-axis scales are different for the two histograms, so account for that when you are comparing them. Who tends to have higher rates of literacy, males or females, and where (what parts of the world) are the differences the most pronounced?

4. Refer back to the difference that you identified in question 3 between the literacy rates of males and females. Do a little research to investigate specifically why this difference exists, and describe your results here. Also, list your source(s) of information below. Feel free to type your answer and attach if that is more convenient.

Source: (list website link):

Part B Scatterplot

5. Look at the scatter plot of Female literacy rates as a function of Male literacy rates.

From the graph, does there appear to be a linear relationship between the two variables? (yes or no)

If there does appear to be a linear relationship, how STRONG does the relationship appear to be?

Does the correlation and regression analysis indicate that there is a linear relationship between the two variables? (yes or no)

List: Calculated r =

Critical r =

Briefly explain why you think this is the case. (Not why you came to the conclusion based on the plot, but what the underlying cause may be for the relationship or lack thereof.) Why is there or is there not a relationship between the male and female literacy rates for each country in the world?