Math 1040 StatCrunch Exercise 1 Kashif Samuel

Part 1.

A. Computing the first Graph.

B. Changed the Bandwidth to 7.

C. Using the relative frequency

Part 2: Entering data, creating a stem-and-leaf plot

Variable: Ages

Decimal point is 1 digit(s) to the right of the colon.

1 : 89

2 : 1269

3 : 4

4 : 08

5 : 17
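The plot above can be rebuilt from the raw ages. This is a minimal sketch, assuming the ages are the values read back from the plot itself (stem = tens digit, leaf = units digit); the function name `stem_and_leaf` is hypothetical and is not part of StatCrunch.

```python
from collections import defaultdict

# Ages read back from the stem-and-leaf plot (assumption: leaves are units).
ages = [18, 19, 21, 22, 26, 29, 34, 40, 48, 51, 57]

def stem_and_leaf(values):
    """Group each value by its tens digit (stem) and collect units digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{stem} : {leaves}")
```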

Part 3: Creating a pie-chart, creating a Pareto chart from data

Changing color Scheme.

Creating Pareto Graph.

Part 4: Creating a pie-chart, creating a Pareto chart from a summary

Math 1040 StatCrunch Exercise 2

Part 1: calculating summary statistics.

a. MnTar default settings.

Summary statistics:

b. MnTar with selection.

Summary statistics:

Mode = 13

Part 2: finding outliers

Interquartile range (IQR) = Q3 – Q1 = 15 – 13 = 2.0

1.5 x IQR = 1.5 x 2 = 3.0

Lower fence = Q1 – 1.5(IQR) = 13 – 3 = 10

Upper fence = Q3 + 1.5(IQR) = 15 + 3 = 18

Outliers are any values less than 10 or greater than 18. So, the outliers for this problem are:

6, 8, 9, 19
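The fence arithmetic can be checked with a few lines of Python. This is a minimal sketch using the quartiles reported above (Q1 = 13, Q3 = 15); the helper name `tukey_fences` is hypothetical, and only the four values listed as outliers are tested because the full data set is not reproduced here.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Quartiles reported for the tar data.
lower, upper = tukey_fences(13, 15)

# Values listed as outliers in the exercise.
candidates = [6, 8, 9, 19]
flagged = [x for x in candidates if x < lower or x > upper]

print(f"lower fence = {lower}, upper fence = {upper}, outliers = {flagged}")
```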

Part 3: Creating a boxplot, creating a modified boxplot

Column  n   Mean   Variance  Std. Dev.  Std. Err.   Median  Range  Min  Max  Q1  Q3
MnTar   25  12.92  13.91     3.7296112  0.74592227  13      17     2    19   13  15

Column  n   Mean   Std. Dev.  Median  Range  Min  Max  Q1  Q3  IQR
MnTar   25  12.92  3.7296112  13      17     2    19   13  15  2

In the graph, the box runs from Q1 = 13 to Q3 = 15.

Graph 2 with fences to identify outliers.

These outliers can be misleading: the first outlier on the graph appears to be near 1, which is not in the dataset. The dataset starts at 6.

Part 4: Practicing your new skills

Summary statistics:

0 is the mode

Graph boxplot showing outliers.

IQR = Q3 – Q1 = 74 – 0 = 74

1.5 x IQR = 1.5 x 74 = 111

Upper fence = Q3 + 111 = 74 + 111 = 185

Lower fence = Q1 – 111 = 0 – 111 = -111. The minimum value is 0, so no data value can fall below the lower fence and there are no low outliers.

So, all data values greater than 185 are considered outliers.

205, 206, 223, 299, 548 are the outliers.

This graph shows the outliers. The box runs from Q1 = 0 to Q3 = 74.

Column             n   Std. Dev.  Range  Min  Max  Median  Mean   Q1  Q3
Tobacco Use (sec)  50  103.9967   548    0    548  5.5     57.44  0   74

INDIVIDUAL WORK

1. My own bag of M&M's

Blue Brown Green Orange Red Yellow

11 7 9 12 7 9

Mean = 9.17

SD = 2.04
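The mean and standard deviation above can be reproduced from the six color counts. This is a minimal sketch using Python's standard `statistics` module; it assumes the reported SD is the sample standard deviation (dividing by n - 1), which matches the value 2.04.

```python
import statistics

# Color counts from my bag of M&M's.
counts = {"Blue": 11, "Brown": 7, "Green": 9, "Orange": 12, "Red": 7, "Yellow": 9}

values = list(counts.values())
mean = statistics.mean(values)   # arithmetic mean of the six counts
sd = statistics.stdev(values)    # sample standard deviation (n - 1 in the denominator)

print(f"Mean = {mean:.2f}, SD = {sd:.2f}")
```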

2. Pie graph

Barplot

Brown Red Yellow Green Blue Orange

The mean for the Skittles is very close to the mean for the M&M's. Even though the M&M's had an extra color, the means were close: the Skittles mean was about 11 and the M&M's mean is about 9.

TERM PROJECT PART 2 - Organizing and Displaying Categorical Data: Colors

1. Population of each colored candy.

Red  Orange  Yellow  Green  Purple
242  234     252     240    228

2. Creating a Pie and Pareto Graph.

Pie Chart

Summary of the pie chart

Pareto Chart
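A Pareto chart orders the categories from most to least frequent, and a pie chart shows each category's share of the total. The sketch below is an illustration of how those numbers come out of the class totals for each color; it prints the table that the charts are drawn from rather than drawing the charts themselves.

```python
# Class totals for each Skittles color.
totals = {"Red": 242, "Orange": 234, "Yellow": 252, "Green": 240, "Purple": 228}

n = sum(totals.values())

# Pareto order: categories sorted by count, largest first.
pareto = sorted(totals.items(), key=lambda item: item[1], reverse=True)

rows = []
cumulative = 0.0
for color, count in pareto:
    rel = count / n          # relative frequency (the pie-chart share)
    cumulative += rel        # running total used for a cumulative Pareto line
    rows.append((color, count, rel, cumulative))
    print(f"{color:<7} {count:>4} {rel:6.1%} {cumulative:6.1%}")
```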

3. Finding the mean and the standard deviation.

red  orange  yellow  green  purple
7    7       6       7      0
8    7       9       9      7
9    8       9       10     8
10   8       9       10     8
10   8       9       10     9
11   9       10      10     10
11   10      11      10     10
11   10      12      11     11
12   10      12      11     11
12   11      13      12     11
12   12      13      12     12
13   12      13      12     12
13   12      13      12     13
13   13      13      12     13
13   13      15      13     14
13   14      16      14     14
14   16      16      15     15
16   17      17      16     16
16   18      18      16     16
18   19      18      18     18

Red: mean = 12.1, sd = 2.69
Orange: mean = 11.7, sd = 3.62
Yellow: mean = 12.6, sd = 3.35
Green: mean = 12, sd = 2.7
Purple: mean = 11.4, sd = 3.97
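The per-color means and standard deviations can be recomputed from the counts per bag. This is a minimal sketch, assuming the 20 values for each color are the sorted counts listed in the table for this question and that the reported sd is the sample standard deviation.

```python
import statistics

# Counts per bag for each color (20 bags), sorted as listed in the table.
counts = {
    "red":    [7, 8, 9, 10, 10, 11, 11, 11, 12, 12, 12, 13, 13, 13, 13, 13, 14, 16, 16, 18],
    "orange": [7, 7, 8, 8, 8, 9, 10, 10, 10, 11, 12, 12, 12, 13, 13, 14, 16, 17, 18, 19],
    "yellow": [6, 9, 9, 9, 9, 10, 11, 12, 12, 13, 13, 13, 13, 13, 15, 16, 16, 17, 18, 18],
    "green":  [7, 9, 10, 10, 10, 10, 10, 11, 11, 12, 12, 12, 12, 12, 13, 14, 15, 16, 16, 18],
    "purple": [0, 7, 8, 8, 9, 10, 10, 11, 11, 11, 12, 12, 13, 13, 14, 14, 15, 16, 16, 18],
}

summary = {}
for color, values in counts.items():
    summary[color] = (statistics.mean(values), statistics.stdev(values))
    print(f"{color:<7} total={sum(values):>3} mean={summary[color][0]:.1f} sd={summary[color][1]:.2f}")
```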

The sample does not seem random, because every Skittle was separated by color. The population is the total of 1196 Skittles across all 5 colors.

#   Response_id  red  orange  yellow  green  purple  Mean  Std dev  Min  Q1    Median  Q3  Max
1   338007       10   10      10      10     15      11    2.2      10   10    10      13  15
2   338155       13   8       16      9      13      11.8  3.2      8    8.5   13      15  16
3   338176       11   10      13      16     10      12    2.5      10   10    11      15  16
4   338205       10   8       13      10     18      11.8  3.9      8    9     10      16  18
5   338448       16   7       15      10     11      11.8  3.7      7    8.5   11      16  16
6   338772       18   9       9       13     16      13    4.1      9    9     13      17  18
7   338777       14   14      12      10     0       10    5.8      0    5     12      14  14
8   339050       11   12      12      16     8       11.8  2.9      8    9.5   12      14  16
9   339053       13   19      17      7      8       12.8  5.3      7    7.5   13      18  19
10  339063       12   12      9       12     16      12.2  2.9      9    10.5  12      14  16
11  339127       13   17      9       10     12      12.2  3.1      9    9.5   12      15  17
12  339134       9    11      13      18     9       12    3.7      9    9     11      16  18
13  339346       16   8       13      12     14      12.6  3        8    10    13      15  16
14  339505       13   13      9       11     11      11.4  1.7      9    10    11      13  13
15  339630       12   12      18      11     11      12.8  2.9      11   11    12      15  18
16  339651       11   10      13      15     12      12.2  1.9      10   10.5  12      14  15
17  340503       13   7       18      12     10      12    4.1      7    8.5   12      16  18
18  340563       8    16      6       14     14      11.6  4.3      6    7     14      15  16
19  340888       7    13      16      12     13      12.2  3.2      7    9.5   13      15  16
20  342943       12   18      11      12     7       12    3.9      7    9     12      15  18
Total            242  234     252     240    228

Summary statistics for all color counts combined (1196 candies in total):

mean = 11.96

sd = 3.27

min = 0, max = 19

Q1 = 10

median = 12

Q3 = 14

5:2 Term Project Part 3

GROUP 3 WORK:

Question 1

2. frequency histogram

3. Creating box plot

INDIVIDUAL WORK:

1. The shape of the distribution is approximately normal, like a bell shape. Yes, the graphs reflect what I expected to see. Yes, the class mean is very close to the mean of the bag I counted: the mean for my individual bag was 9.4, which was fairly close to the mean of about 11 from the class data.

2. Categorical data consists of labels or names that do not represent counts or measurements. Numbers that are used only as labels, like the numbers on players' jerseys, are also categorical. Quantitative data consists of numbers that represent counts or measurements. Pie charts and Pareto charts make sense for categorical data, and histograms and box plots make sense for quantitative data, because each graph gives a visual picture that makes the variable easier to understand. Calculations such as the mean or standard deviation do not make sense for categorical data, because labels have no numeric value. Numbers with values in a range make more sense to summarize than numbers or labels without any numeric meaning.

7:9 Term Project Part 4

#   Response_id  red  orange  yellow  green  purple  Var/bag  min
1   338007       10   10      10      10     15      5        1.33972
2   338155       13   8       16      9      13      10.7     1.959841
3   338176       11   10      13      16     10      6.5      1.527515
4   338205       10   8       13      10     18      15.2     2.335881
5   338448       16   7       15      10     11      13.7     2.217631
6   338772       18   9       9       13     16      16.6     2.441085
7   338777       14   14      12      10     0       34       3.493561
8   339050       11   12      12      16     8       8.2      1.715678
9   339053       13   19      17      7      8       28.2     3.181658
10  339063       12   12      9       12     16      22.7     2.854578
11  339127       13   17      9       10     12      9.7      1.866014
12  339134       9    11      13      18     9       14       2.24178
13  339346       16   8       13      12     14      8.8      1.777339
14  339505       13   13      9       11     11      2.8      1.002554
15  339630       12   12      18      11     11      8.7      1.767212
16  339651       11   10      13      15     12      3.7      1.15247
17  340503       13   7       18      12     10      16.5     2.433722
18  340563       8    16      6       14     14      18.8     2.597813
19  340888       7    13      16      12     13      10.7     1.959841
20  342943       12   18      11      12     7       15.5     2.35882
Total            242  234     252     240    228     (1196 candies)

1. Construct a 99% confidence interval estimate for the population proportion of yellow candies.

Yellow: variance s^2 = 11.2

(1 – 0.99)/2 = 0.005

Degrees of freedom: n – 1 = 20 – 1 = 19

Right chi-square value (area 0.005): 38.582

Left chi-square value (area 0.995): 6.844

sqrt(19 x 11.2 / 38.582) < σ < sqrt(19 x 11.2 / 6.844)

2.35 < σ (yellow candies per bag) < 5.58

We are 99% confident that the interval from 2.35 to 5.58 actually does contain the true value of the standard deviation of the number of yellow candies per bag.

2. Construct a 95% confidence interval estimate for the population mean number of candies
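The interval worked out for the yellow candies uses the chi-square critical values, so it estimates a standard deviation; the proportion asked for in question 1 would use the normal distribution instead. The sketch below is an illustration of both calculations. It assumes the yellow total of 252 out of 1196 candies from the class table in Term Project Part 2, takes the chi-square critical values from the worked solution, and uses hypothetical variable names.

```python
import math
from statistics import NormalDist

# --- 99% interval for the standard deviation of yellow candies per bag ---
n_bags = 20
var_yellow = 11.2       # sample variance from the worked solution
chi2_right = 38.582     # chi-square critical value, right tail (df = 19)
chi2_left = 6.844       # chi-square critical value, left tail (df = 19)
df = n_bags - 1

sigma_low = math.sqrt(df * var_yellow / chi2_right)
sigma_high = math.sqrt(df * var_yellow / chi2_left)

# --- 99% interval for the proportion of yellow candies (assumed counts) ---
yellow, total = 252, 1196
p_hat = yellow / total
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)   # about 2.576
margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / total)
p_low, p_high = p_hat - margin, p_hat + margin

print(f"{sigma_low:.2f} < sigma < {sigma_high:.2f}")
print(f"{p_low:.3f} < p < {p_high:.3f}")
```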

INDIVIDUAL WORK: Submit one pdf file to the assignment addressing each

number below.

1. In a paragraph, explain in general the purpose and meaning of a confidence

interval.

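A confidence interval is a range of values used to estimate the true value of a population parameter. The confidence level means that if we were to select many different samples of the same size and construct the corresponding confidence intervals, a certain percentage of them (the confidence level) would actually contain the value of the population proportion p. The interval gives a range of plausible values for the parameter, with the confidence stated as a percentage. Confidence intervals can also be used to compare different data sets.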

1. In a paragraph, explain in general the purpose and meaning of a hypothesis test.

A hypothesis test is a procedure for testing a claim about a property of a population. The procedure is to identify the null hypothesis and the alternative hypothesis from the given claim and express both in symbolic form, calculate the value of the test statistic from the claim and the sample data, choose the relevant sampling distribution, find the P-value or identify the critical value, and state the conclusion about the claim.

2. Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red,

using the entire class data set as your sample.

Original claim: p = 0.20 of all the Skittles candies are red.

Ho: p = 0.20 (null hypothesis), H1: p ≠ 0.20 (alternative hypothesis), α = 0.05

n = 1196

p hat = x/n = 239 / 1196 = 0.1998

z = (0.1998 – 0.20) / sqrt((0.20 x 0.80)/1196) = -0.0173

P-value = 0.986

P-value > α

Fail to reject Ho. There is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies are red.
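The test for the proportion of red candies can be reproduced with the standard library. This is a minimal sketch using x = 239 and n = 1196 from the solution; because it keeps the unrounded sample proportion instead of 0.1998, the test statistic and P-value differ slightly from the hand calculation, but the conclusion is the same.

```python
import math
from statistics import NormalDist

x, n = 239, 1196      # red candies and sample size used in the solution
p0 = 0.20             # claimed proportion under Ho
alpha = 0.05

p_hat = x / n
standard_error = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / standard_error

# Two-tailed P-value from the standard normal distribution.
p_value = 2 * NormalDist().cdf(-abs(z))
reject = p_value <= alpha

print(f"z = {z:.4f}, P-value = {p_value:.3f}, reject Ho: {reject}")
```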

3. Use a 0.01 significance level to test the claim that the mean number of candies in a bag

of Skittles is 55, using the entire class data set as your sample.

Ho: μ = 55 (null hypothesis), H1: μ ≠ 55 (alternative hypothesis), α = 0.01

n = 20 bags

x bar = 1196 / 20 = 59.8 candies per bag, s = 3.334

t = (59.8 – 55) / (3.334/sqrt(20)) = 6.44

P-value < 0.0001

P-value ≤ α

Reject Ho. There is sufficient evidence to warrant rejection of the claim that the mean number of candies in a bag of Skittles is 55.
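The claim is about the mean number of candies per bag, so one way to set up the test is to treat each of the 20 bags in the class table as one observation. The sketch below is an illustration under that assumption: it totals the five color counts for each bag and computes the one-sample t statistic against 55. The critical value 2.861 (two-tailed, α = 0.01, 19 degrees of freedom) is taken from a standard t table, and the variable names are hypothetical.

```python
import math
import statistics

# (red, orange, yellow, green, purple) counts for the 20 bags in the class table.
bags = [
    (10, 10, 10, 10, 15), (13, 8, 16, 9, 13), (11, 10, 13, 16, 10), (10, 8, 13, 10, 18),
    (16, 7, 15, 10, 11), (18, 9, 9, 13, 16), (14, 14, 12, 10, 0), (11, 12, 12, 16, 8),
    (13, 19, 17, 7, 8), (12, 12, 9, 12, 16), (13, 17, 9, 10, 12), (9, 11, 13, 18, 9),
    (16, 8, 13, 12, 14), (13, 13, 9, 11, 11), (12, 12, 18, 11, 11), (11, 10, 13, 15, 12),
    (13, 7, 18, 12, 10), (8, 16, 6, 14, 14), (7, 13, 16, 12, 13), (12, 18, 11, 12, 7),
]

bag_totals = [sum(bag) for bag in bags]   # candies per bag
n = len(bag_totals)
x_bar = statistics.mean(bag_totals)
s = statistics.stdev(bag_totals)          # sample standard deviation

mu0 = 55                                  # claimed mean under Ho
t_stat = (x_bar - mu0) / (s / math.sqrt(n))

T_CRIT = 2.861                            # t table value: df = 19, alpha = 0.01, two-tailed
reject = abs(t_stat) > T_CRIT

print(f"n = {n}, mean = {x_bar:.1f}, s = {s:.3f}, t = {t_stat:.2f}, reject Ho: {reject}")
```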

4. In detail, discuss how your samples meet (or fail to meet) the requirements for

performing these hypothesis tests.

All requirements were met for performing the hypothesis test in problem 2. The data were treated as a simple random sample. The conditions for a binomial distribution were satisfied, and np ≥ 5 and n(1 – p) ≥ 5 (np = 1196 x 0.20 = 239.2 and n(1 – p) = 1196 x 0.80 = 956.8), so the distribution of sample proportions can be approximated by a normal distribution. Problem 3 requires a simple random sample and either a normally distributed population or n > 30. The data were treated as a simple random sample, and the histogram of the class data is roughly bell-shaped.
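The normal-approximation condition for the proportion test in problem 2 can be checked directly. This is a minimal sketch using n = 1196 and the claimed proportion p = 0.20; the threshold of 5 is the one stated in the requirement.

```python
n = 1196      # sample size used in problem 2
p = 0.20      # claimed proportion of red candies

expected_successes = n * p          # np
expected_failures = n * (1 - p)     # n(1 - p)

# Both expected counts must be at least 5 for the normal approximation.
condition_met = expected_successes >= 5 and expected_failures >= 5

print(f"np = {expected_successes:.1f}, n(1 - p) = {expected_failures:.1f}, met: {condition_met}")
```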

5. Discuss and interpret the results of each of your two hypothesis tests.

In problem 2, the test did not reject the claim that 20% of all Skittles are red. The P-value of 0.986 is greater than the significance level α = 0.05, so we fail to reject Ho: there is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies are red. Failing to reject Ho does not prove the claim; it only means the sample is consistent with it.

In problem 3, the test rejected the claim that the mean number of candies in a bag of Skittles is 55. The P-value is less than the significance level α = 0.01, so we reject Ho: there is sufficient evidence to warrant rejection of the claim that the mean number of candies in a bag of Skittles is 55.

Math 1040 StatCrunch Exercise 3

Exercise 1.

Part 1: Creating a scatterplot

The scatterplot shows a negative correlation with outliers. Because results can be strongly affected by the presence of outliers, any outliers must be removed if they are known to be errors. A straight line could be drawn through the points.

Part 2: Finding the correlation coefficient and regression equation.

a. Reporting the linear correlation coefficient r, the linear regression equation, and the predicted gas mileage for a car weighing 2800 pounds.

Simple linear regression results:
Dependent Variable: HIGHWAY
Independent Variable: WEIGHT
HIGHWAY = 52.354965 - 0.0066952812 WEIGHT
Sample size: 21
R (correlation coefficient) = -0.7927
R-sq = 0.62832975
Estimate of error standard deviation: 2.7895854

Parameter estimates:

Parameter  Estimate       Std. Err.     Alternative  DF  T-Stat     P-Value
Intercept  52.354965      4.148952      ≠ 0          19  12.61884   <0.0001
Slope      -0.0066952812  0.0011813459  ≠ 0          19  -5.667503  <0.0001

Analysis of variance table for regression model:

Source  DF  SS         MS         F-stat     P-value
Model   1   249.95557  249.95557  32.120586  <0.0001
Error   19  147.85396  7.781787
Total   20  397.8095
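The predicted gas mileage for a 2800-pound car follows from substituting WEIGHT = 2800 into the fitted equation. This is a minimal sketch using the intercept and slope from the StatCrunch output; the function name `predict_highway` is hypothetical.

```python
# Coefficients from the StatCrunch regression of HIGHWAY on WEIGHT.
INTERCEPT = 52.354965
SLOPE = -0.0066952812

def predict_highway(weight_lb):
    """Predicted highway mileage for a car of the given weight in pounds."""
    return INTERCEPT + SLOPE * weight_lb

predicted = predict_highway(2800)
print(f"Predicted HIGHWAY at 2800 lb: {predicted:.6f}")
```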

Predicted values:

X value  Pred. Y    s.e.(Pred. y)  95% C.I. for mean      95% P.I. for new
2800     33.608177  1.0023116      (31.510315, 35.70604)  (27.404058, 39.812294)

Part 3: Practicing your new skills

1. The linear correlation coefficient and the linear regression equation.

Simple linear regression results:
Dependent Variable: Height
Independent Variable: Foot Length
Height = 64.12561 + 4.2912536 Foot Length
Sample size: 40
R (correlation coefficient) = 0.842
R-sq = 0.7090274

Parameter estimates:

Parameter  Estimate   Std. Err.   Alternative  DF  T-Stat    P-Value
Intercept  64.12561   11.485053   ≠ 0          38  5.583397  <0.0001
Slope      4.2912536  0.44595072  ≠ 0          38  9.622707  <0.0001

Analysis of variance table for regression model:

Source  DF  SS        MS         F-stat     P-value
Model   1   2806.866  2806.866   92.596504  <0.0001
Error   38  1151.889  30.312872
Total   39  3958.755

2. The critical value for your sample size from Table A-6 with α = 0.01. State whether or not a significant linear correlation exists.

Looking at the graph, there is a linear correlation. The P-value for the slope (<0.0001) is less than α = 0.01, so a significant linear correlation exists.

3. A copy of the Scatter Plot with fitted regression line. The graph will have appropriate labels,

including the correct units of measurement.

4. Would the linear regression equation give a good prediction of the height of a person with a foot length of 15.3 centimeters? Explain why or why not.

It does, but the points become sparse toward the end of the range, so the prediction may carry some error.

Now, determine if there is a linear correlation between age in years (X) and foot length in centimeters (Y). Include the following in the document file that you will submit:

5. The linear correlation coefficient and the linear regression equation.

Simple linear regression results:
Dependent Variable: Foot Length
Independent Variable: Age
Foot Length = 23.824196 + 0.051946912 Age
Sample size: 40
R (correlation coefficient) = 0.3591
R-sq = 0.12895392

Parameter estimates:

Parameter  Estimate     Std. Err.    Alternative  DF  T-Stat    P-Value
Intercept  23.824196    0.83638424   ≠ 0          38  28.48475  <0.0001
Slope      0.051946912  0.021901367  ≠ 0          38  2.371857  0.0229

Analysis of variance table for regression model:

Source  DF  SS         MS         F-stat     P-value
Model   1   19.655672  19.655672  5.6257057  0.0229
Error   38  132.76833  3.4939034
Total   39  152.424

6. The critical value for your sample size from Table A-6 with α = 0.01. State whether or not a significant linear correlation exists.

A significant linear correlation does not exist. The points in the scatterplot are widely scattered, r = 0.3591 is weak, and the P-value of 0.0229 is greater than α = 0.01.
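The significance of a correlation coefficient can also be checked by converting r to a t statistic with n - 2 degrees of freedom. The sketch below is an illustration applied to the two correlations reported in Part 3 (r = 0.842 and r = 0.3591, both with n = 40). The critical value 2.712 (two-tailed, α = 0.01, 38 degrees of freedom) is taken from a standard t table rather than from Table A-6, and the function name `t_from_r` is hypothetical.

```python
import math

def t_from_r(r, n):
    """Test statistic for Ho: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

T_CRIT = 2.712   # t table value: df = 38, alpha = 0.01, two-tailed

results = {}
for label, r in [("Height vs Foot Length", 0.842), ("Foot Length vs Age", 0.3591)]:
    t = t_from_r(r, 40)
    results[label] = (t, abs(t) > T_CRIT)
    print(f"{label}: t = {t:.2f}, significant at 0.01: {abs(t) > T_CRIT}")
```

The t values agree with the slope T-Stat entries in the StatCrunch parameter tables (9.62 and 2.37), which is expected because the two tests are equivalent in simple linear regression.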

7. A copy of the Scatter Plot with fitted regression line. The graph will have appropriate labels,

including the correct units of measurement.

8. Would the linear regression equation give a good prediction of the foot length of a person who is 42 years old? Explain why or why not.

No, because there is no significant linear correlation, so it is hard to predict where the next point is going to land.

Part 4: Wrapping up

Read back through the assignment and double check that you have completed each part. Make sure your document file is neat, orderly, and easy to follow. Make sure your name is on your assignment.

Checked: the document is neat, orderly, and easy to follow.

Print your document and turn it in by the posted due date.

Submitting online.

Math 1040 4-26-15

9:10 Term Project Part 6 reflection

In this statistics class, I learned a lot, such as how to perform a random sampling survey to find out how a population responds to certain interests. If I needed to find out how many people play video games, I could run a random sample survey and follow certain rules to avoid error. I can also find out how many people are needed to complete the video game survey to get reliable results. I can set up confidence intervals to find the range in which the population mean would fall. I can use a confidence level of 90%, 95%, or 99% to set up the margin of error. I learned that a narrower interval gives a more precise estimate of where the mean falls. Now when I watch the voting polls, I will know what the +/- margin of error means. Statistics will really help me understand, in my other subjects, what kind of measurements I need and how to understand story problems. I will be able to classify data as nominal, ordinal, interval, or ratio. I will be able to set up graphs such as pictographs, Pareto charts, pie charts, and bar graphs, according to the data. There are many other tools from this statistics class that I could use to study or understand statistical data. I used graphs in physics class to present data, and I also found the mean of the data in physics. I have also used these tools in many other classes to show data in a simpler way so that observers can understand it.

The projects really showed me how to use the mean and graphs from data in StatCrunch. They helped me find the mean and standard deviation and create the right graph for a particular data set. The projects also showed that large data sets can be computed in the StatCrunch application in seconds. They also showed me how repeating an experiment brings the sample mean as close as possible to the true value. I also learned how the bell shape of the normal distribution can help me get closest to the population mean. In the real world it is important to learn what users want. I will not only be using the skills from this class in other classes but will also use them in my career.
