Math 1040 StatCrunch Exercise 1 Kashif Samuel
Part 1.
A. Computing the first Graph.
B. Changed the Bandwidth to 7.
C. Using the relative frequency
Part 2: Entering data, creating a stem-and-leaf plot
Variable: Ages
Decimal point is 1 digit(s) to the right of the colon.
1 : 89
2 : 1269
3 : 4
4 : 08
5 : 17
Part 3: Creating a pie-chart, creating a Pareto chart from data
Changing color Scheme.
Creating Pareto Graph.
Part 4: Creating a pie-chart, creating a Pareto chart from a summary
Math 1040 StatCrunch Exercise 2
Part 1: calculating summary statistics.
a. Min Tar default settings.
Summary statistics:
b. MinTar with selection.
Summary statistics:
Mode = 13
Part 2: finding outliers
Interquartile range (IQR) = Q3 – Q1 = 15 – 13 = 2.0
1.5 x IQR = 1.5 x 2 = 3.0
Lower fence = Q1 – 1.5(IQR) = 13 – 3 = 10
Upper fence = Q3 + 1.5(IQR) = 15 + 3 = 18
Outliers are any numbers less than 10 or greater than 18. So, the outliers for this problem are:
6, 8, 9, 19
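The fence arithmetic above can be checked with a short script. This is a minimal sketch, not StatCrunch's own procedure: the function names `tukey_fences` and `outliers` and the multiplier parameter `k` are illustrative assumptions, Q1 and Q3 are taken from the summary statistics rather than recomputed, and the short list of values is only a sample of numbers mentioned in the text, not the full MnTar column.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, q1, q3, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = tukey_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Quartiles reported for MnTar: Q1 = 13, Q3 = 15
lower, upper = tukey_fences(13, 15)
print(lower, upper)  # 10.0 18.0

# Illustrative subset of values only (assumption: not the full data set)
print(outliers([6, 8, 9, 13, 15, 19], 13, 15))  # [6, 8, 9, 19]
```

The same function applied to Q1 = 0 and Q3 = 74 gives fences of -111 and 185, which matches the Part 4 calculation.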
Part 3: Creating a boxplot, creating a modified boxplot
Column n Mean Variance Std. Dev. Std. Err. Median Range Min Max Q1 Q3
MnTar 25 12.92 13.91 3.7296112 0.74592227 13 17 2 19 13 15
Column n Mean Std. Dev. Median Range Min Max Q1 Q3 IQR
MnTar 25 12.92 3.7296112 13 17 2 19 13 15 2
In the graph, the box starts at Q1 = 13 and stops at Q3 = 15.
Graph 2 with fences to identify outliers.
These outliers can be misleading: the first outlier on the graph looks like it is at about 1, which is not
in the dataset. The dataset starts at 6.
Part 4: Practicing your new skills
Summary statistics:
0 is the mode
Graph boxplot showing outliers.
IQR = Q3 – Q1 = 74 – 0 = 74
1.5 x IQR = 1.5 x 74 = 111
Upper fence = Q3 + 111 = 74 + 111 = 185
Lower fence = Q1 – 111 = 0 – 111 = -111. No data value can be negative (the minimum is 0), so there are no low outliers.
So, all data values greater than 185 are considered outliers.
205, 206, 223, 299, 548 are the outliers.
The boxplot shows these outliers. The box runs from Q1 = 0 to Q3 = 74.
Column n Std. Dev. Range Min Max Median Mean Q1 Q3
Tobacco Use (sec) 50 103.9967 548 0 548 5.5 57.44 0 74
INDIVIDUAL WORK
1. My own bag of M&M’s
Blue Brown Green Orange Red Yellow
11 7 9 12 7 9
Mean = 9.17
SD = 2.04
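The mean and standard deviation above can be reproduced from the six color counts. This is a small sketch using Python's standard `statistics` module; it assumes the reported SD is the sample standard deviation (dividing by n - 1), which is what matches the value 2.04.

```python
import statistics

# Color counts from the M&M's bag listed above
counts = {"Blue": 11, "Brown": 7, "Green": 9, "Orange": 12, "Red": 7, "Yellow": 9}
values = list(counts.values())

mean = statistics.mean(values)
sd = statistics.stdev(values)  # sample standard deviation (n - 1 in the denominator)

print(round(mean, 2), round(sd, 2))  # 9.17 2.04
```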
2. Pie graph
Barplot
Brown Red Yellow Green Blue Orange
The mean count per color for the Skittles is very close to that for the M&M’s, even though the M&M’s
had an extra color. The Skittles mean was about 11 and the M&M’s mean is about 9.
TERM PROJECT PART 2 - Organizing and Displaying Categorical Data: Colors
1. Population of each colored candy.
242 234 252 240 228
Red Orange Yellow Green Purple
2. Creating a Pie and Pareto Graph.
Pie Chart
Summary of the pie chart
Pareto Chart
3. Finding the mean and the standard deviation.
red orange yellow green purple
7 7 6 7 0
8 7 9 9 7
9 8 9 10 8
10 8 9 10 8
10 8 9 10 9
11 9 10 10 10
11 10 11 10 10
11 10 12 11 11
12 10 12 11 11
12 11 13 12 11
12 12 13 12 12
13 12 13 12 12
13 12 13 12 13
13 13 13 12 13
13 13 15 13 14
13 14 16 14 14
14 16 16 15 15
16 17 17 16 16
16 18 18 16 16
18 19 18 18 18
Red Orange Yellow Green Purple
mean = 12 mean = 11.7 mean = 12.6 mean = 12 mean = 11.4
sd = 2.69 sd = 3.62 sd = 3.35 sd = 2.7 sd = 3.97
The sample does not seem random, because it seems that every Skittle was separated by color. The
population is the total of 1196 Skittles for all 5 colors.
No. Response_id red orange yellow green purple Mean Std dev Min Q1 Median Q3 Max
1 338007 10 10 10 10 15 11 2.2 10 10 10 13 15
2 338155 13 8 16 9 13 11.8 3.2 8 8.5 13 15 16
3 338176 11 10 13 16 10 12 2.5 10 10 11 15 16
4 338205 10 8 13 10 18 11.8 3.9 8 9 10 16 18
5 338448 16 7 15 10 11 11.8 3.7 7 8.5 11 16 16
6 338772 18 9 9 13 16 13 4.1 9 9 13 17 18
7 338777 14 14 12 10 0 10 5.8 0 5 12 14 14
8 339050 11 12 12 16 8 11.8 2.9 8 9.5 12 14 16
9 339053 13 19 17 7 8 12.8 5.3 7 7.5 13 18 19
10 339063 12 12 9 12 16 12.2 2.9 9 10.5 12 14 16
11 339127 13 17 9 10 12 12.2 3.1 9 9.5 12 15 17
12 339134 9 11 13 18 9 12 3.7 9 9 11 16 18
13 339346 16 8 13 12 14 12.6 3 8 10 13 15 16
14 339505 13 13 9 11 11 11.4 1.7 9 10 11 13 13
15 339630 12 12 18 11 11 12.8 2.9 11 11 12 15 18
16 339651 11 10 13 15 12 12.2 1.9 10 10.5 12 14 15
17 340503 13 7 18 12 10 12 4.1 7 8.5 12 16 18
18 340563 8 16 6 14 14 11.6 4.3 6 7 14 15 16
19 340888 7 13 16 12 13 12.2 3.2 7 9.5 13 15 16
20 342943 12 18 11 12 7 12 3.9 7 9 12 15 18
Total 242 234 252 240 228 (all colors: 1196)
mean=11.96
sd= 3.27
min = 0 max = 19
q1 = 10
median = 12
q3 = 14
5:2 Term Project Part 3
GROUP 3 WORK:
Question 1
2. frequency histogram
3. Creating box plot
INDIVIDUAL WORK:
1. The shape of the distribution is normal, like a bell shape. Yes, the graphs reflect what I
expected to see. Yes, the mean is very close to that of the bag I counted. The mean for my
individual bag was 9.4, which was pretty close to the mean of 11 that we found in the experiment.
2. Categorical data consist of labels that do not represent counts or measurements, like the
numbers on players' jerseys, whereas quantitative data consist of numbers that represent counts
or measurements. Pareto charts and pie charts make sense for categorical data, and histograms
and box plots make sense for quantitative data. All of them can be read easily because the data
are presented visually, which helps in understanding the variables. Categorical data are harder to
understand because the labels are hard to graph. Numbers with values in a range make better
sense than numbers or labels without any numeric meaning.
7:9 Term Project Part 4
No. Response_id red orange yellow green purple Var/bag min
1 338007 10 10 10 10 15 5 1.33972
2 338155 13 8 16 9 13 10.7 1.959841
3 338176 11 10 13 16 10 6.5 1.527515
4 338205 10 8 13 10 18 15.2 2.335881
5 338448 16 7 15 10 11 13.7 2.217631
6 338772 18 9 9 13 16 16.6 2.441085
7 338777 14 14 12 10 0 34 3.493561
8 339050 11 12 12 16 8 8.2 1.715678
9 339053 13 19 17 7 8 28.2 3.181658
10 339063 12 12 9 12 16 22.7 2.854578
11 339127 13 17 9 10 12 9.7 1.866014
12 339134 9 11 13 18 9 14 2.24178
13 339346 16 8 13 12 14 8.8 1.777339
14 339505 13 13 9 11 11 2.8 1.002554
15 339630 12 12 18 11 11 8.7 1.767212
16 339651 11 10 13 15 12 3.7 1.15247
17 340503 13 7 18 12 10 16.5 2.433722
18 340563 8 16 6 14 14 18.8 2.597813
19 340888 7 13 16 12 13 10.7 1.959841
20 342943 12 18 11 12 7 15.5 2.35882
Total 242 234 252 240 228 (all colors: 1196)
1. Construct a 99% confidence interval estimate for the population proportion of yellow candies.
Yellow variance s² = 11.2
(1 – .99)/2 = 0.005
df = n – 1 = 20 – 1 = 19
Right chi-square critical value (area .005): 38.582
Left chi-square critical value (area .995): 6.844
2.35 < σ < 5.58 (standard deviation of yellow candies per bag)
We are 99% confident that the interval from 2.35 to 5.58 actually does contain the true value of the standard deviation of the number of yellow candies per bag.
2. Construct a 95% confidence interval estimate
for the population mean number of candies
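The interval from 2.35 to 5.58 in item 1 follows the chi-square formula for a standard deviation, sqrt((n - 1)s²/χ²). The sketch below reproduces it under that assumption; the function name `sigma_confidence_interval` is hypothetical, and the two critical values are the table values quoted in the work above, not computed here.

```python
import math


def sigma_confidence_interval(n, variance, chi2_right, chi2_left):
    """Confidence interval for a population standard deviation.

    chi2_right and chi2_left are the right- and left-tail chi-square
    critical values for n - 1 degrees of freedom (looked up in a table).
    """
    df = n - 1
    lower = math.sqrt(df * variance / chi2_right)
    upper = math.sqrt(df * variance / chi2_left)
    return lower, upper


# n = 20 bags, yellow variance 11.2, critical values for df = 19 at 99%
lower, upper = sigma_confidence_interval(20, 11.2, 38.582, 6.844)
print(round(lower, 2), round(upper, 2))  # 2.35 5.58
```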
INDIVIDUAL WORK: Submit one pdf file to the assignment addressing each
number below.
1. In a paragraph, explain in general the purpose and meaning of a confidence
interval.
A confidence interval means that if we were to select many different samples of size n and construct the
corresponding confidence intervals, a certain percentage of them (the confidence level) would actually
contain the value of the population proportion p. It gives a range of plausible values for the parameter,
with the confidence expressed as a percentage. Confidence intervals can be used to compare different data sets.
1. In a paragraph, explain in general the purpose and meaning of a hypothesis test.
A hypothesis test is a procedure for testing a claim about a property of a population. The
steps are to identify the null hypothesis and alternative hypothesis from a given claim
and express both in symbolic form, calculate the value of the test statistic from the claim
and the sample data, choose the relevant sampling distribution, find the P-value or identify
the critical value, and state the conclusion about the claim.
2. Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red,
using the entire class data set as your sample.
The original claim is that p = .20 of all the Skittles candies are red.
Ho: p = .20 (null hypothesis), H1: p ≠ .20 (alternative hypothesis), α = 0.05
n = 1196
p hat = x/n = 239 / 1196 = .1998
z = (.1998 - .20) / sqrt((.20*.80)/1196) = -.0173
P value = .986
P value > α
Fail to reject Ho: there is not sufficient evidence to warrant rejection of the claim that
20% of all Skittles candies are red.
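The test of the 20% claim can be sketched in a few lines. This is an illustration of a standard one-proportion z test using the normal approximation, not StatCrunch output; the function name `one_proportion_z_test` is hypothetical, and the count x = 239 is taken as given in the work above.

```python
import math


def one_proportion_z_test(x, n, p0):
    """Two-tailed z test for a claimed population proportion p0."""
    p_hat = x / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    # Two-tailed P-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_value


p_hat, z, p_value = one_proportion_z_test(239, 1196, 0.20)
print(round(p_hat, 4), round(z, 4), round(p_value, 3))
print("reject Ho" if p_value <= 0.05 else "fail to reject Ho")
```

Because the script does not round p hat before computing the statistic, z comes out slightly closer to zero than the hand calculation with .1998. The conclusion is the same.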
3. Use a 0.01 significance level to test the claim that the mean number of candies in a bag
of Skittles is 55, using the entire class data set as your sample.
Ho: μ = 55 (null hypothesis), H1: μ ≠ 55 (alternative hypothesis), α = 0.01
n = 20 bags
x bar = 1196 / 20 = 59.8 candies per bag
s = 3.334 (standard deviation of the 20 bag totals)
t = (59.8 - 55) / (3.334/sqrt(20)) = 6.44
P value < 0.0001
P value ≤ α
Reject Ho: there is sufficient evidence to warrant rejection of the claim that the mean
number of candies in a bag of Skittles is 55.
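If the claim is read as being about the total number of candies per bag, the 20 bag totals can be tested directly. The sketch below is one way to do that, under the assumption that each bag total is the sum of the five color counts in that row of the class table; the function name `one_sample_t` is hypothetical, and the critical value 2.861 is the standard two-tailed t table value for α = 0.01 with 19 degrees of freedom.

```python
import math
import statistics

# Bag totals: sum of the five color counts in each row of the class table
bag_totals = [55, 59, 60, 59, 59, 65, 50, 59, 64, 61,
              61, 60, 63, 57, 64, 61, 60, 58, 61, 60]


def one_sample_t(values, mu0):
    """Return n, sample mean, sample SD and the t statistic for Ho: mu = mu0."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    t = (mean - mu0) / (s / math.sqrt(n))
    return n, mean, s, t


n, mean, s, t = one_sample_t(bag_totals, 55)
T_CRITICAL = 2.861  # table value: two-tailed, alpha = 0.01, df = 19

print(n, mean, round(s, 3), round(t, 2))
print("reject Ho" if abs(t) > T_CRITICAL else "fail to reject Ho")
```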
4. In detail, discuss how your samples meet (or fail to meet) the requirements for
performing these hypothesis tests.
All requirements were met for performing the hypothesis test in problem 2. The
data were treated as a simple random sample. The conditions for a binomial distribution
were satisfied, with np ≥ 5 and n(1-p) ≥ 5, so the distribution of sample proportions can
be approximated by a normal distribution. Problem 3 also met the requirements to perform
the hypothesis test: the sample was treated as a simple random sample, and the population
is normally distributed or n > 30.
5. Discuss and interpret the results of each of your two hypothesis tests.
In problem 2, the claim that 20% of all Skittles are red could not be rejected. This is because
the P value of .986 is greater than the significance level α. We fail to reject Ho: there is not
sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies are red.
Failing to reject does not prove the claim. It only means the sample is consistent with it.
In problem 3, the claim that the mean number of candies in a bag of Skittles is 55 was rejected.
This is because the P value, approximately 0, is less than or equal to the significance level α.
We reject Ho: there is sufficient evidence to warrant rejection of the claim that the mean
number of candies in a bag of Skittles is 55.
Math 1040 StatCrunch Exercise 3
Exercise 1.
Part 1: Creating a scatterplot
This is a negative correlation with outliers. Because results can be strongly affected by the presence of
outliers, any outliers must be removed if they are known to be errors. A straight line could be drawn through the points.
Part 2: Finding the correlation coefficient and regression equation.
a. Reporting the linear correlation coefficient r, the linear regression equation, and the predicted
gas mileage for a car weighing 2800 pounds.
Simple linear regression results:
Dependent Variable: HIGHWAY
Independent Variable: WEIGHT
HIGHWAY = 52.354965 - 0.0066952812 WEIGHT
Sample size: 21
R (correlation coefficient) = -0.7927
R-sq = 0.62832975
Estimate of error standard deviation: 2.7895854
Parameter estimates:
Parameter Estimate Std. Err. Alternative DF T-Stat P-Value
Intercept 52.354965 4.148952 ≠ 0 19 12.61884 <0.0001
Slope -0.0066952812 0.0011813459 ≠ 0 19 -5.667503 <0.0001
Analysis of variance table for regression model:
Source DF SS MS F-stat P-value
Model 1 249.95557 249.95557 32.120586 <0.0001
Error 19 147.85396 7.781787
Total 20 397.8095
Predicted values:
X value Pred. Y s.e.(Pred. y) 95% C.I. for mean 95% P.I. for new
2800 33.608177 1.0023116 (31.510315, 35.70604) (27.404058, 39.812294)
Part 3: Practicing your new skills
1. The linear correlation coefficient and the linear regression equation.
Simple linear regression results:
Dependent Variable: Height
Independent Variable: Foot Length
Height = 64.12561 + 4.2912536 Foot Length
Sample size: 40
R (correlation coefficient) = 0.842
R-sq = 0.7090274
Estimate of error standard deviation: 5.5057125
Parameter estimates:
Parameter Estimate Std. Err. Alternative DF T-Stat P-Value
Intercept 64.12561 11.485053 ≠ 0 38 5.583397 <0.0001
Slope 4.2912536 0.44595072 ≠ 0 38 9.622707 <0.0001
Analysis of variance table for regression model:
Source DF SS MS F-stat P-value
Model 1 2806.866 2806.866 92.596504 <0.0001
Error 38 1151.889 30.312872
Total 39 3958.755
2. The critical value for your sample size from Table A-6 with α = 0.01. State whether or not a significant linear correlation exists.
Looking at the graph, there is linear correlation.
3. A copy of the Scatter Plot with fitted regression line. The graph will have appropriate labels,
including the correct units of measurement.
4. Would the linear regression equation give a good prediction of the height of a person with a
foot length of 15.3 centimeters? Explain why or why not.
It does, but the dots thin out toward the ends of the plot, so that may cause an error.
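The fitted equations reported in this exercise can be evaluated directly to get predictions. The sketch below plugs in the coefficients from the StatCrunch output; the helper name `predict` is an illustrative assumption. The text does not state whether a foot length of 15.3 lies inside the range of the sample, so the second prediction should be read with that caveat.

```python
def predict(intercept, slope, x):
    """Evaluate a fitted regression line y = intercept + slope * x."""
    return intercept + slope * x


# HIGHWAY = 52.354965 - 0.0066952812 WEIGHT, for a 2800-pound car
highway = predict(52.354965, -0.0066952812, 2800)

# Height = 64.12561 + 4.2912536 Foot Length, for a foot length of 15.3
height = predict(64.12561, 4.2912536, 15.3)

print(round(highway, 3))  # 33.608
print(round(height, 2))   # 129.78
```

The first value agrees with the predicted value of 33.608177 reported by StatCrunch for X = 2800.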
Now, determine if there is a linear correlation between age in years (X) and foot length in centimeters (Y). Include the following in the document file that you will submit:
5. The linear correlation coefficient and the linear regression equation.
Foot Length = 23.824196 + 0.051946912 Age
Simple linear regression results: Dependent Variable: Foot Length
Independent Variable: Age
Foot Length = 23.824196 + 0.051946912 Age
Sample size: 40
R (correlation coefficient) = 0.3591
R-sq = 0.12895392
Estimate of error standard deviation: 1.8691986
Parameter estimates:
Analysis of variance table for regression model:
Parameter Estimate Std. Err. Alternative DF T-Stat P-Value
Intercept 23.824196 0.83638424 ≠ 0 38 28.48475 <0.0001
Slope 0.051946912 0.021901367 ≠ 0 38 2.371857 0.0229
Source DF SS MS F-stat P-value
Model 1 19.655672 19.655672 5.6257057 0.0229
Error 38 132.76833 3.4939034
Total 39 152.424
6. The critical value for your sample size from Table A-6 with α = 0.01. State whether or not a significant linear correlation exists.
No significant linear correlation exists at α = 0.01: the P-value for the slope, 0.0229, is greater than 0.01. In the graph, the dots are widely scattered.
7. A copy of the Scatter Plot with fitted regression line. The graph will have appropriate labels,
including the correct units of measurement.
8. Would the linear regression equation give a good prediction of the foot length of a person
who is 42 years old? Explain why or why not.
No, because it is hard to predict where the next dot is going to land. There is no significant correlation.
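The significance of each correlation coefficient above can also be checked without the critical value table, using the standard statistic t = r * sqrt((n - 2) / (1 - r²)) with n - 2 degrees of freedom. This is a sketch of that alternative check, not the Table A-6 method the questions ask for; the function name `correlation_t` is hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for testing whether a correlation coefficient differs from 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


t_height = correlation_t(0.842, 40)   # Height vs. Foot Length
t_foot = correlation_t(0.3591, 40)    # Foot Length vs. Age

print(round(t_height, 2))  # 9.62
print(round(t_foot, 2))    # 2.37
```

Both values agree with the slope T-Stat entries in the StatCrunch tables (9.622707 and 2.371857). The corresponding P-values reported there are below 0.0001 for height against foot length and 0.0229 for foot length against age, so only the first is significant at α = 0.01.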
Part 4: Wrapping up
Read back through the assignment and double check that you have completed each part. Make sure your document file is neat, orderly, and easy to follow. Make sure your name is on your assignment.
Checked; the document is neat, orderly, and easy to follow.
Print your document and turn it in by the posted due date.
Submitting online.
Math 1040 4-26-15
9:10 Term Project Part 6 reflection
In this statistics class, I learned a lot, like how to perform a random sampling survey to find
out how a population reacts to certain interests. If I needed to find out how many people play video
games, I can do a random sample survey to find out the result, using certain rules to avoid
error. I can also find out how many people are needed to complete the video games survey to get
reliable results. I can set up confidence intervals to find the range in which the population mean
would fall. I can use a confidence level of 90%, 95% or 99% to set up the margin of error. I learned
that a shorter range gives a more precise estimate of where the mean would fall. Now when I watch the
voting polls, I will know what the +/- margin of error means. Statistics will really help me understand
in my other subjects what kind of measurements I need and how to understand story problems. I
will be able to classify data as nominal, ordinal, interval, or ratio. I will be able to set up graphs
like pictographs, Pareto charts, pie charts, and bar graphs, according to the data. There are many
other tools I could use from this statistics class to study or understand statistical data. I used
graphs in physics class to demonstrate data, and also found the mean for the data in physics. I have
also used the tools in many other classes to show data in a simpler way so that observers can
understand the data.
The projects really showed me how to use the mean and graphs from data in StatCrunch.
They helped me find the mean and standard deviation and create the graph suited to a
particular data set. The projects also helped with large data sets, which the StatCrunch
application can process in seconds. They also showed me how to get a mean as close as possible to
the accurate result by repeating experiments. I also learned how the bell shape of the normal
distribution can help me get closest to the population mean. In the real world it is important to
learn what users want. I will not only be using the skills from this class in other classes but will
use them in my career.