avggpa average institutional gpa dmc distance to major

Post on 05-Feb-2022

Introduction

The Pre-College Selection Commodity (P-CSC) has conducted a research study using multiple regression analysis to determine whether schools less than 50 miles (44.44 miles) away from major cities are more expensive. A list of potential independent variables for each public and private university in this study is given below.

Literature Review

We found no prior research that directly addresses our topic; however, we did find studies that use variables relevant to our research. One example is “A Study of The Impact of Institution Name, Tuition Price, Longevity, and Website form on online students enrollment” by Scott Francis Snair. This statistical research included two tables of independent variables. The independent variables in the first table include Age, EPI, Name Group, Graduation Tuition, and Undergraduate Tuition. The second table included similar independent variables, such as Name Group, Undergraduate Tuition, and Age. The study combines institution name, graduate tuition price, and distance learning program in a statistically significant multiple linear regression analysis. The strength of this model is its tuition information, which can guide how we analyze our own model, using multiple variables, to study whether universities less than 50 miles (44.444) away from major cities are more expensive. The major weakness of this model is the lack of information relating to the distance of these universities from major cities.

Variable        Description
AvgGPA          Average institutional GPA
DMC             Distance to Major Cities
In-Out State    In State vs Out of State Price
PrivPub         Private vs Public (1,0)
Pop             Student Body Population
Constr          Construction to Campus (1,0)
Sports          Division I sports teams
FinAid          Average Amount of Financial Aid offered
Accred          School Accreditation (1,0)

The next reference is from Private and Public Universities (http://www.brainchild.org/publicORprivateU.html). This website breaks down the benefits and drawbacks of attending a public or a private university. It notes that public universities are less expensive because they receive some funding from the state in which they are located, that the student body is much larger and more diverse than at the average private university, and that a student has a better chance of attending a university anywhere in the country because the chance of being accepted is greater. However, a few drawbacks are lower prestige and the fact that the larger the university, the more likely a student is to feel distant from the educational atmosphere and anonymous among the student population.

On the other hand, private universities generally have smaller class sizes and more prestige than public universities. Private universities also have a greater number of extracurricular activities and programs for students to participate in. However, a few drawbacks are the difficulty of being accepted for admission and prices that can be prohibitive for lower-income students, although financial aid packages are available for students who excel academically.

This reference relates to our project through its reasoning about tuition prices. It allows us to examine the price difference between public and private schools in major cities vs. away from major cities.

Data Collection

After assessing the accuracy of the available data, we narrowed our sources to three websites: collegestats.org, collegeboard.org and nber.org. These sources allowed us to generate the variables in this research. They also provided accurate distances for universities in Maryland and the surrounding neighbor states.

Descriptive Statistics

            Avg GPA   Distance to City   Total InState Cost   Student Population   Avg FinAid     Age
Mean        3.3784    43.098             33877.05             6400.33              18468.71061    153.3
Median      3.37      35.000             30776.00             3955.00              14427          147.00
Min         2.40      1.00               8400                 369                  3615           37
Max         4.21      200                56442                31593                44349          320
25%         3.18      6.5                20785.5              1546                 10668          122.5
75%         3.58      70                 49760                7609                 28646          178
St Dev      .34054    44.7919            15137.739            7066.819             10852.92805    62.081
Skewness    -.204     1.663              .201                 1.840                .655           .257
Outliers

The mean (average) of a data set is found by summing the values and dividing by the number of observations. The mean gives us a reliable central point to compare the rest of our data to.

The median is the middle value when the data are arranged from least to greatest. Comparing it with the mean helps us develop a sense of the skew of the data and see where each value lies in relation to the rest.

The minimum of a data set is the smallest value in the data. In contrast, the maximum is the largest value. Together they give us further insight into where any single data point lies within the full range of the set.

The first quartile is the value below which 25% of the data fall, whereas the third quartile is the value below which 75% fall, leaving the final 25% of the data above it. These also give us further insight into where any given point sits in the array.

The skewness is the extent to which the distribution of the data has a longer tail on one side of the numeric scale. A positive value indicates a longer tail toward high values and a negative value indicates a longer tail toward low values.

The standard deviation shows us what to expect in regard to variation in the data, that is, how far values typically lie from the mean. To find it, first find the variance: compute the mean of the data, subtract that mean from each individual value, square each result, and then average those squared differences. The standard deviation is the square root of the variance.

The outliers are the data points that lie far from the general data set and are likely to inappropriately pull the inferences one way or the other. By removing the outliers, we avoid misinterpreting the data because of unusual observations.
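The definitions above can be sketched in a few lines of Python. This is only an illustration: the five distances are hypothetical values, not the study's data, and the function name `describe` is our own. The variance follows the description in the text (dividing by the number of observations); many statistics packages instead report the sample version, which divides by n - 1, so their output can differ slightly.

```python
import math
import statistics


def describe(values):
    """Return the summary statistics discussed above for one variable."""
    data = sorted(values)
    n = len(data)
    mean = sum(data) / n
    # Variance as described in the text: the average squared deviation from the mean.
    variance = sum((x - mean) ** 2 for x in data) / n
    # Quartile cut points; the "inclusive" method treats the data as the full population.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": mean,
        "median": median,
        "min": data[0],
        "max": data[-1],
        "q1": q1,
        "q3": q3,
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


# Hypothetical distances in miles for five schools (illustration only).
sample_distances = [1, 5, 35, 70, 200]
summary = describe(sample_distances)
```

In this small example the mean is pulled well above the median by the single 200-mile school, which is the same pattern of positive skew the Distance to City column shows.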

The descriptive statistics are the key first step with any data set, giving us the information we need prior to any further investigation. We were intrigued to see such high numbers in some portions of the data. For example, the mean of Student Population is about 6,400 students and the standard deviation of Total In-State Cost is $15,137. Numbers like this worry us because, while we are dealing with large values, we would not normally expect the spread of the data to be that large.

The mean of our dependent variable, Total In-State Cost, is $33,877.05. Given the large difference in cost between private and public schools, this was to be expected and did not come as a surprise. That difference could also be the reason for discrepancies in the data. The median of the data set was $30,776.00, which is unsurprising given the mean, and we were pleased to see this consistency. The minimum and maximum gave us real insight into the range of the data and led us to expect some outliers. There is nearly a $50,000 difference between the two numbers, $56,442 being the highest and $8,400 being the lowest. The first and third quartiles, $20,785.50 and $49,760, helped us better understand how the data are spread within that range. The standard deviation was somewhat alarming at $15,137.74, meaning that a typical school's cost differs from the mean by roughly $15,137.74.

The first independent variable in our study is a numeric variable. We found the average GPA for every school in our compiled list and obtained a mean of 3.38, meaning the average GPA across the institutions is between an A and a B average. This is not surprising, as one reason to attend these institutions is to achieve a strong GPA and have better prospects upon graduating. The median Average Institutional GPA is 3.37, very close to the mean, which suggests the distribution is nearly symmetric. The lowest average GPA was 2.4 and the highest was 4.21, making the range 1.81. The first quartile of the data is 3.18 and the third quartile is 3.58, giving us a good idea of how close most schools are in their average GPA. The standard deviation in average GPA is .34054, further indicating how close most schools' GPAs are. A skewness of -.204 indicates a slightly longer tail toward lower GPAs; at this magnitude the distribution is close to symmetric.

The next independent numeric variable is Distance to Major Cities. The mean distance is 43 miles from the closest major city. This measurement led us to pick the cut-off of 50 miles in our hypothesis. The median distance is 35 miles, noticeably lower than the mean, which reflects a few schools located far from any major city. The minimum distance is 1 mile, as many colleges chose to locate in the major city itself, and the maximum is 200 miles from any major city. Such a large range was expected. The standard deviation of the distance is 44.7919 miles, meaning that distances typically vary by almost 45 miles around the mean. The positive skew of 1.663 shows that the bulk of colleges favor a minimal distance while a smaller number are spread over a long range of larger distances.

Our next numeric variable was the size of the student body population. We were surprised to find such a small mean for the institutions' sizes, 6,400.33 students. Also, the median was rather far from the mean at 3,955 students. This is a change of pace from the previous variables, but not necessarily a bad thing. The minimum number of students at a campus was incredibly small at 369 students, and the largest population was 31,593. Given such a large range and variance in the number of students on any given campus, we were not surprised to find a standard deviation of about 7,067 students.

Two Sample Test Results Summary

We believe that the distance to a major city directly affects the tuition of a college or university. In our opinion, the closer a college or university is to a major city, the higher its tuition should be, and the further it is from a major city, the lower its tuition should be, keeping all else constant. To test this theory, we used a one-tailed, two sample hypothesis test. Our null and alternative hypotheses are:

H0: M(greater than 50) ≥ M(less than 50)
H1: M(greater than 50) < M(less than 50)

For our results, we calculated the P-value to be .0395. Since the P-value is below .05, we reject the null hypothesis and therefore our alternative is supported.

Linear Regression Analysis Summary

From the data we put together, we concluded that the positive factors affecting tuition of a college or university are average institutional GPA, student body population, and average amount of financial aid given. The negative factors affecting tuition costs are whether the school is public or private and the distance of the college or university from a major city. Age is the only independent variable for which we were unsure how it would affect tuition.

After running a linear regression we chose to use model 3. In this model, student body population and age are removed because these variables do not have enough of an effect on tuition. Our new regression equation to calculate tuition for a college or university is:

Tuition = 14630.689 + 3860.942(Avg Institutional GPA) + .728(Avg Amount Financial Aid Offered) - 13422.335(Private vs Public) - 36.353(Distance to Major Cities)

Next we examined the decision rule to determine how it pertains to our results. The decision rule states the level at which each variable is significant. If the Sig is less than 0.01, then the variable is statistically significant at the 1% level of significance. If the Sig is between 0.01 and 0.05, then the variable is statistically significant at the 5% level of significance. If the Sig is between 0.05 and 0.10, then the variable is statistically significant at the 10% level of significance. Finally, if the Sig is greater than 0.10, then the variable is statistically insignificant and should not be interpreted, because we cannot conclude that it has an impact on Y.

Looking at the Sig for each variable, financial aid and whether the institution is private or public are statistically significant at the 1% level. This means that for every $1 increase in average amount of financial aid, total in-state tuition will increase by $.728, all other things constant. Also, if the school is Public (1), tuition will be $13,422.335 less than at a private school, all other things constant. We found that distance to major cities is significant at the 5% level of significance. This means that for every additional mile from a major city, tuition will decrease by $36.353. Average institutional GPA is statistically significant at the 10% level of significance. This means that for every point increase in institutional GPA, a school's tuition will increase by $3,860.942. In conducting our research, we found that adjusted R squared is .888.
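As a worked illustration of the model 3 equation and the decision rule above, the sketch below evaluates the reported coefficients for one school and classifies a Sig value. The function names and the example school are hypothetical; the coding of `priv_pub` follows the report's statement that Public is coded 1, which is taken as given here.

```python
def predict_tuition(avg_gpa, fin_aid, priv_pub, distance_miles):
    """Evaluate the model 3 regression equation with the reported coefficients."""
    return (14630.689
            + 3860.942 * avg_gpa
            + 0.728 * fin_aid
            - 13422.335 * priv_pub          # assumed coding: 1 = public, 0 = private
            - 36.353 * distance_miles)


def significance_level(sig):
    """Map an SPSS Sig value to the level named by the decision rule."""
    if sig < 0.01:
        return "1%"
    if sig < 0.05:
        return "5%"
    if sig < 0.10:
        return "10%"
    return "not significant"


# Hypothetical public school: GPA 3.4, $18,000 average aid, 40 miles from a major city.
example_cost = predict_tuition(3.4, 18000, 1, 40)

# Moving the same school one mile farther out changes the prediction by the distance coefficient.
one_mile_effect = predict_tuition(3.4, 18000, 1, 41) - example_cost
```

Holding the other inputs fixed and changing one argument at a time reproduces the "all other things constant" reading of each coefficient.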
This means that 88.8% of the variation in total in-state tuition can be explained by the variation in the set of independent variables. Overall, we decided this is a good model because it explains about 89% of the variation. We think we could improve upon this if we included the average dropout and transfer rate of students as well as the average time to complete a bachelor's degree. We found the standard error of the estimate to be 5063.305. This number tells us that, on average, our estimate will be $5,063.305 off from the actual cost of tuition for an institution. The F-Test shows a SigAnova of .000. This means our model is statistically significant at the 1% level of significance and is able to explain some of the behavior in in-state tuition.

Assumptions and Corrections

Assumption 1: The expected value of the residuals is zero

This assumption is met automatically when a linear regression with a constant term is run in SPSS. The regression output includes the following report, in which the mean of the residuals is zero, so this assumption is satisfied.

Residuals Statistics (a)

                        Minimum       Maximum      Mean        Std. Deviation   N
Predicted Value         8226.68       64589.17     33877.05    13752.611        61
Residual                -13421.689    16024.712    .000        6325.886         61
Std. Predicted Value    -1.865        2.233        .000        1.000            61
Std. Residual           -2.031        2.425        .000        .957             61

a. Dependent Variable: Total In-State Cost
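The zero mean in the Residual row is a general property of least squares with a constant term rather than a feature of this data set. A minimal sketch with one predictor and made-up numbers (the data and the helper `fit_line` are illustrative assumptions, not the study's model) shows the same result:

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor, with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical distances (miles) and costs (dollars), for illustration only.
distance = [1.0, 5.0, 35.0, 70.0, 200.0]
cost = [52000.0, 48000.0, 30000.0, 21000.0, 9000.0]

intercept, slope = fit_line(distance, cost)
residuals = [c - (intercept + slope * d) for d, c in zip(distance, cost)]

# Zero up to floating-point rounding, because the intercept absorbs the mean.
mean_residual = sum(residuals) / len(residuals)
```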

Assumption 2: The variance of the residuals is constant

We are making sure that the spread of the residuals is consistent. This is referred to as “Homoskedasticity”. To check this assumption, one must produce a scatter plot from SPSS with the standardized predicted values on the X-axis and the standardized residuals on the Y-axis. If the variance is constant, the points will look random. If it is not, and the assumption is not met, the points will form a pattern.

Because the points appear to be skewed into a mountain shape from bottom to top, showing a clear pattern, Assumption 2 is failed and the variance shows heteroskedasticity.

Assumption 3: The residuals are normally distributed

Using the Normal P-Plot in SPSS, we can check whether our residuals are normally distributed, that is, whether they fall consistently along the 45 degree trend line. If the residuals are not normally distributed, we will have to use another regression technique for our model.

We are happy with how our data follow the 45 degree line, so the third assumption is met.

Assumption 4: The residuals are independent

Independence means that one residual carries no information about another. This is normally a problem in time-series data, where a residual can linger through time into the next period. When residuals are related in this way, the pattern can be modeled in SPSS to make better predictions.

In testing Assumption 4, we use the Durbin-Watson test and check that the statistic is between 1.5 and 2.5, meaning the residuals are independent of each other. If this assumption is violated, we may need to look at our dependent variable differently to make better predictions. If we were to look at the change in our dependent variable, rather than the value itself, we may be able to fix the Durbin-Watson statistic.
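For reference, the Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below computes it and applies the 1.5 to 2.5 rule used in this report; the function names and the toy residual lists are illustrative assumptions.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def residuals_independent(dw, low=1.5, high=2.5):
    """Apply the report's rule of thumb: independent if low <= DW <= high."""
    return low <= dw <= high


# Toy residuals: alternating signs push the statistic up, repeated values push it toward 0.
alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])
persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])
```

Because the statistic compares each residual with the one before it, its value depends on the order in which the observations are listed.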

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .908 (a)   .825       .809                6607.173                     1.406

a. Predictors: (Constant), Age, Distance to Major Cities, Student Body Population, Average Institutional GPA, Average Amount Financial Aid Offered

b. Dependent Variable: Total In-State Cost

Our Durbin-Watson statistic is 1.406, which falls below 1.5 and so fails to meet Assumption 4. This means that information carries over from one residual to the next. To fix this, we should change the way we look at our dependent variable in our studies. Instead of looking at the specific number, or tuition itself, we may want to look at the change in tuition from school to school or as distance increases.

Assumption 5: The independent variables are not highly correlated to each other

If the VIFs for all of the independent variables are below 10, the variables are not highly correlated with each other and the assumption is passed. If a variable has a VIF above 10, it may be highly correlated with other independent variables. In that case we would have a problem with multicollinearity, and the effects of each independent variable on the dependent variable may be indistinguishable.

Coefficients (a)

                                        Unstandardized Coefficients   Standardized Coefficients                      Collinearity Statistics
Model                                   B            Std. Error       Beta                        t         Sig.     Tolerance   VIF
(Constant)                              11471.734    9592.627                                     1.196     .237
Average Institutional GPA               884.523      3152.132         .020                        .281      .780     .631        1.584
Distance to Major Cities                -32.886      19.889           -.097                       -1.653    .104     .917        1.091
Student Body Population                 -.265        .150             -.124                       -1.772    .082     .651        1.535
Average Amount Financial Aid Offered    1.169        .107             .838                        10.968    .000     .544        1.839
Age                                     .987         16.368           .004                        .060      .952     .705        1.419

a. Dependent Variable: Total In-State Cost

None of our independent variables have a VIF greater than 10, so we do not have any violations of the assumption.
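The VIF and Tolerance columns in the Coefficients table are two views of the same quantity: VIF is the reciprocal of Tolerance. A short sketch using the Tolerance values reported in the table recomputes each VIF and applies the cut-off of 10; the dictionary and function names are our own.

```python
# Tolerance values copied from the Collinearity Statistics columns of the table.
tolerance = {
    "Average Institutional GPA": 0.631,
    "Distance to Major Cities": 0.917,
    "Student Body Population": 0.651,
    "Average Amount Financial Aid Offered": 0.544,
    "Age": 0.705,
}


def vif_from_tolerance(tol):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tol


vifs = {name: vif_from_tolerance(tol) for name, tol in tolerance.items()}

# Variables that would violate Assumption 5 under the VIF > 10 rule.
flagged = [name for name, value in vifs.items() if value > 10]
```

The recomputed values agree with the VIF column to within rounding of the three-decimal tolerances.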

Corrections

Variable                     Type      Linear   Logarithmic   Inverse   Quadratic   Cubic
Average Institutional GPA    Squared   .068     .079          .089      .177        .123
Distance to Major Cities     Log       .70      .101          .045      .077        .70
Student Body Population                .077     .044          .011      .061        .061
Financial Aid Offered        Squared   .802     .799          .579      .831        .834
Age                          Inverse   .117     .115          .162      .149        .149

In order to make corrections after failing Assumption 2, we ran curve estimation on every numeric variable. In doing so, we tested the linear, logarithmic, inverse, quadratic and cubic equations for each variable. If linear had the highest Adjusted R Squared, then no new variable needed to be added. If logarithmic had the highest, then a new variable had to be made as the log of that variable. If inverse had the highest, then a variable containing the inverse of that variable had to be created. If quadratic had the highest Adjusted R Squared, a new variable had to be created as the variable squared. If cubic had the highest, 2 new variables had to be created as that variable squared and cubed.

Linear Regression Analysis (Corrections)

After creating our new variables, as shown above, we ran a backward regression with the new variables and chose the 2nd regression model, as it had the highest adjusted R squared. In this model, Student Body Population had been removed.

Total In-State Tuition = -98829.762 - 10150.844(Average GPA Squared) - 1.064E-5(Average Financial Aid Squared) - 70140.436(Distance to Major Cities) - 11811.984(Private vs Public) - 1.205(Average Amount Financial Aid Offered) - 15.273(Age)
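The curve estimation rule described in this section (pick the functional form with the highest Adjusted R Squared, then create the matching variable) can be sketched as follows. The function names are hypothetical, the natural logarithm is an assumption since the report does not state the base, and only the Average Institutional GPA and Age rows of the curve estimation table are used as examples.

```python
import math


def best_fit(adj_r2_by_type):
    """Return the functional form with the highest Adjusted R Squared."""
    return max(adj_r2_by_type, key=adj_r2_by_type.get)


def new_variables(fit_type, x):
    """Return the extra regressors the rule calls for, computed for one value x."""
    if fit_type == "linear":
        return {}                                   # no new variable needed
    if fit_type == "logarithmic":
        return {"log": math.log(x)}                 # natural log assumed
    if fit_type == "inverse":
        return {"inverse": 1.0 / x}
    if fit_type == "quadratic":
        return {"squared": x ** 2}
    if fit_type == "cubic":
        return {"squared": x ** 2, "cubed": x ** 3}
    raise ValueError(f"unknown fit type: {fit_type}")


# Adjusted R Squared values from the curve estimation table.
gpa_fits = {"linear": 0.068, "logarithmic": 0.079, "inverse": 0.089,
            "quadratic": 0.177, "cubic": 0.123}
age_fits = {"linear": 0.117, "logarithmic": 0.115, "inverse": 0.162,
            "quadratic": 0.149, "cubic": 0.149}

gpa_choice = best_fit(gpa_fits)
age_choice = best_fit(age_fits)
gpa_terms = new_variables(gpa_choice, 3.4)   # 3.4 is a hypothetical GPA
```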

Interpretation and Statistical Significance

Age is statistically insignificant, and thus we cannot interpret its meaning in the model, since we cannot reject the hypothesis that the coefficient is really zero. Average GPA Squared, Average Institutional GPA, Distance to Major Cities, Private vs Public and Amount of Financial Aid Offered are all statistically significant at the 1% level of significance. Average Financial Aid Squared is statistically significant at the 5% level of significance. For every $1 increase in average amount of financial aid, total in-state tuition will increase by $1.255. If the school is Public (1), tuition will be $11,450.89 less than at a private school, all other things constant. For every additional mile from a major city, tuition will decrease by $45.215. For every point increase in institutional GPA, the school's tuition will increase by $64,224.615.

Standard Error of the Estimate

The standard error of the estimate in this case is $4,628.28, which means that on average, our prediction of total in-state cost will be off by $4,628 from the actual cost. This is slightly better than our previous estimate of $5,063.31.

Adjusted R Squared

R Squared in our corrections means that 91.7% of the variation in the cost to attend the institution can be explained by the variation in the set of independent variables. Adjusted R Squared in our corrections means that 90.7% of the variation in the cost can be explained by the variation in the set of independent variables, adjusting for the number of independent variables in our model. Overall, we have improved our model, since our linear model gave us an adjusted R Squared of 89%. We have improved our accuracy by about 2 percentage points.

F-Test

In this case, since Fsig is 0.000, our model is statistically significant at the 1% level of significance. This shows that our model is able to explain some of the behavior in in-state tuition, just as it had before.
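The relationship between R Squared and Adjusted R Squared can be checked with the standard formula, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. In the sketch below, n = 61 comes from the Residuals Statistics tables, and k = 6 is an assumption based on the six terms in the corrected regression equation; the function name is our own.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R Squared for n observations and k independent variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Corrected model: R Squared .917, n = 61, assumed k = 6 predictors.
corrected = adjusted_r2(0.917, 61, 6)

# First linear model from the Model Summary: R Squared .825 with 5 predictors.
linear = adjusted_r2(0.825, 61, 5)
```

Both results land within rounding of the reported values (.907 and .809), which is what one would expect given that the inputs are themselves rounded to three decimals.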

Checking Assumptions for the New Model

Assumption 1: The expected value of the residuals is zero.

This assumption is met again.

Residuals Statistics (a)

                        Minimum       Maximum      Mean        Std. Deviation   N
Predicted Value         8979.37       56486.20     33877.05    14499.312        61
Residual                -10364.552    10334.628    .000        4349.839         61
Std. Predicted Value    -1.717        1.559        .000        1.000            61
Std. Residual           -2.218        2.212        .000        .931             61

a. Dependent Variable: Total In-State Cost

Assumption 2: The variance of the residuals is constant.

This scatter plot looks much better than it had previously: the points appear randomly distributed with no visible pattern. The assumption is met ☺

Assumption 3: The residuals are normally distributed.

This assumption is clearly met, as our data follows the 45 degree trend line nearly perfectly.

Assumption 4: The residuals are independent.

Model Summary (b)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate   Durbin-Watson
1       .958 (a)   .917       .905                4672.476                     1.443

a. Predictors: (Constant), Age, Distance to Major Cities, Student Body Population, Average Institutional GPA, Private vs Public, Average Financial Aid Squared, Average Amount Financial Aid Offered, Average GPA Squared

b. Dependent Variable: Total In-State Cost

Our Durbin-Watson statistic has improved from 1.406 to 1.443, but it remains below 1.5, so the non-linear regression still does not meet the 4th assumption.

Assumption 5: The independent variables are not highly correlated to each other.

The added independent variables built from squared terms will naturally be highly related to the variables they were created from, so it is expected that their VIFs will be above 10. However, the other variables hold true to the assumption.

Summary

The non-linear regression has improved on the linear regression by raising Adjusted R Squared to a higher, more acceptable number and correcting our heteroskedasticity.

Conclusion

Over the course of this project, our group uncovered some weaknesses that, if improved upon, could have made our research more effective. First, as with all research studies, having more data would have made the output more reliable. We would expect our adjusted R squared to increase from 88.8% if we were to do this. Also, we were the first group to conduct a study on this particular region of schools. We did not have another person's research to model ours on, and therefore we may have used some independent variables that were not helpful to our study, such as student body population and age, according to our linear regression model. In hindsight, we may have wanted to take into account more independent variables, such as athletic programs or school prestige. However, these would be extremely difficult to quantify. After analyzing all of our data, we have found that distance to a major city has a statistically significant effect on the tuition of an institution. Generally, the closer a college or university is to a city, the more expensive the tuition will be.