![Page 1: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/1.jpg)
NCAA Basketball Tournament: Predicting Performance
Doug Fenton, Ben Nastou,
Jon Potter
Mathematics 70; Spring 2001
![Page 2: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/2.jpg)
Project Goals
To examine some of the factors that indicate how well a team will do in the NCAA Men’s Basketball Tournament
To compare factors (such as seed, conference, and individual team) and their effect on a team’s result
To create an effective model for how well a team does based on certain factors
![Page 3: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/3.jpg)
Background
The NCAA Basketball Tournament is a 64-team, single-elimination tournament.
This has been the tournament’s format since 1985; we use data from 1985-2000.
There are four separate regions, each with sixteen teams seeded 1-16 (with 1 being the best and 16 being the worst).
The result (dependent) variable is based on how many games a team wins in the tournament.
![Page 4: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/4.jpg)
General Analysis
The first two (independent) variables looked at were the team’s seed and its winning percentage.
The regression was as follows (t-statistics in parentheses):
Result = .599 - 0.151*seed + 2.32*percent (R^2 = .387)
t-statistics: constant (1.87), seed (-18.43), percent (6.00)
As the regression shows, both seed and winning percentage have a statistically significant effect on a team’s result: result is positively related to winning percentage and negatively related to seed number.
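As a rough illustration of how the fitted equation could be used, the following Python sketch plugs a seed and a winning percentage into the reported coefficients. It assumes that `percent` is a fraction between 0 and 1, which the slides do not state; the function name and the example inputs are hypothetical.

```python
# Coefficients reported on the slide: Result = .599 - 0.151*seed + 2.32*percent
INTERCEPT = 0.599
SEED_COEF = -0.151
PERCENT_COEF = 2.32


def predicted_wins(seed: int, percent: float) -> float:
    """Predicted tournament wins; percent is ASSUMED to be a fraction in [0, 1]."""
    if not 1 <= seed <= 16:
        raise ValueError("seed must be between 1 and 16")
    return INTERCEPT + SEED_COEF * seed + PERCENT_COEF * percent


# Hypothetical inputs: a #1 seed and a #16 seed with the same 80% winning record.
top_seed = predicted_wins(1, 0.80)
bottom_seed = predicted_wins(16, 0.80)
print(f"#1 seed: {top_seed:.2f} wins, #16 seed: {bottom_seed:.2f} wins")
```

Holding winning percentage fixed, the gap between the two predictions is simply fifteen times the seed coefficient.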
![Page 5: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/5.jpg)
A Quick Look at the Seeds...

Seed | Mean | Std. Dev.
-----+-------+----------
1 | 3.359 | 1.577
2 | 2.422 | 1.489
3 | 1.656 | 1.461
4 | 1.609 | 1.317
5 | 1.141 | 1.006
6 | 1.391 | 1.280
7 | 0.797 | 0.820
8 | 0.719 | 1.147
9 | 0.594 | 0.610
10 | 0.672 | 0.960
11 | 0.438 | 0.833
12 | 0.438 | 0.753
13 | 0.234 | 0.527
14 | 0.234 | 0.496
15 | 0.047 | 0.213
16 | 0.000 | 0.000
![Page 6: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/6.jpg)
#1 Seed vs. #2 Seed

Seed | 1 | 2
----------+-------+------
Mean | 3.359 | 2.422
Std. Dev. | 1.577 | 1.489

[Figure: histograms of result (games won, 0-6) for #1 Seeds and #2 Seeds]

Assuming normality in the distribution of results for #1 Seeds and #2 Seeds,
$T_{n+m-2} = \dfrac{\bar{x} - \bar{y}}{S_P \sqrt{1/n + 1/m}} = 3.46$
$T_{n+m-2} = 3.46 > 1.64 = T_{0.05,\,126}$
We can therefore conclude, at the 5% significance level, that One Seeds outperform Two Seeds.
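The pooled two-sample t statistic can be recomputed from the summary statistics alone. The sketch below treats the tabulated spread values as sample standard deviations and assumes n = m = 64 (4 regions times 16 tournaments, which matches the 126 degrees of freedom); the function name is hypothetical.

```python
import math


def pooled_t(mean_x, sd_x, n, mean_y, sd_y, m):
    """Two-sample t statistic with a pooled variance estimate (n + m - 2 d.f.)."""
    pooled_var = ((n - 1) * sd_x**2 + (m - 1) * sd_y**2) / (n + m - 2)
    return (mean_x - mean_y) / math.sqrt(pooled_var * (1 / n + 1 / m))


# Summary statistics from the slide; the sample sizes of 64 are an assumption.
t_stat = pooled_t(3.359, 1.577, 64, 2.422, 1.489, 64)
print(round(t_stat, 2))
```

Under these assumptions the result agrees with the 3.46 reported on the slide.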
![Page 7: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/7.jpg)
Differences in Variance
Do #1 Seeds have a different variance from #2 Seeds?
$H_0: S_1^2 = S_2^2$ vs. $H_1: S_1^2 \neq S_2^2$
$F = S_1^2 / S_2^2 = 2.49 / 2.22 = 1.12$
Critical values: for F(63, 63) with 95% confidence, .600 < F < 1.67.
Therefore, the F-statistic falls within the interval, and #1 Seeds may have the same variance as #2 Seeds.
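The variance-ratio check is easy to reproduce. This minimal sketch squares the two spread values from the seed table (treated here as standard deviations) and compares the ratio with the interval quoted on the slide; the interval bounds are copied from the slide rather than computed, and the names are hypothetical.

```python
def variance_ratio(sd_1: float, sd_2: float) -> float:
    """F statistic for equality of two variances, given the standard deviations."""
    return sd_1**2 / sd_2**2


# Acceptance interval for F(63, 63) as quoted on the slide (not recomputed here).
F_LOWER, F_UPPER = 0.600, 1.67

f_stat = variance_ratio(1.577, 1.489)
within_interval = F_LOWER < f_stat < F_UPPER
print(round(f_stat, 2), within_interval)
```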
![Page 8: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/8.jpg)
Seed Analysis
With other seeds, the result data cannot be assumed to be normal.
Therefore, hypothesis tests comparing those seeds could not be used.
However, we were able to test whether the probability of a given seed winning the championship was different from 1/16 (the value it would take if all seeds were created equal).
![Page 9: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/9.jpg)
Does the Top Seed Win the Championship More Often?
$H_0: p_1 = 1/16$ vs. $H_1: p_1 > 1/16$
$t = \dfrac{9 - 16 \cdot (1/16)}{\sqrt{16 \cdot (1/16)(1 - 1/16)}} = 8.26 > t_{.05,\,15} = 1.74$
Therefore, we reject the null hypothesis, which means that the top seed wins the championship more often than it would if the tournament were randomly seeded.
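The test statistic above has the form (observed count minus expected count) divided by the binomial standard deviation. The sketch below recomputes it with the three numbers that appear in the slide's formula (9, 16 and 1/16); the function name is hypothetical.

```python
import math


def binomial_test_statistic(successes: int, trials: int, p_null: float) -> float:
    """(observed - expected) / sqrt(n * p * (1 - p)) under the null probability."""
    expected = trials * p_null
    std_dev = math.sqrt(trials * p_null * (1 - p_null))
    return (successes - expected) / std_dev


# Values taken directly from the formula on the slide.
stat = binomial_test_statistic(9, 16, 1 / 16)
print(round(stat, 2))
```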
![Page 10: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/10.jpg)
Do the High Seeds Typically Outperform the Lower Seeds?

| Hi Seeds (1-8) | Lo Seeds (9-16) | Total
----------------+----------------+-----------------+------
Lo Result (0-2) | 392 | 504 | 896
Hi Result (3-6) | 120 | 8 | 128
Total | 512 | 512 | 1024
% Hi Result | .234 | .0156 |

$H_0: p_H = p_L$ vs. $H_1: p_H > p_L$
$\hat{p} = (120 + 8)/(512 + 512) = .125$
$Z$ (see Thm. 9.4.1) $= 10.58 > 1.64 = Z_{.05}$
Hence, as expected, higher seeds outperform lower seeds.
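The Z value can be reproduced with the standard pooled two-proportion statistic, which is assumed here to be what Thm. 9.4.1 refers to. The counts come from the table; the function name is hypothetical.

```python
import math


def two_proportion_z(x_hi: int, n_hi: int, x_lo: int, n_lo: int) -> float:
    """Pooled two-proportion z statistic for H0: p_hi = p_lo."""
    p_hi, p_lo = x_hi / n_hi, x_lo / n_lo
    pooled = (x_hi + x_lo) / (n_hi + n_lo)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_hi + 1 / n_lo))
    return (p_hi - p_lo) / std_err


# 120 of 512 high seeds and 8 of 512 low seeds won three or more games.
z = two_proportion_z(120, 512, 8, 512)
print(round(z, 2))
```

With these counts the pooled proportion is .125, as on the slide.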
![Page 11: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/11.jpg)
The Conference Variables
The teams from our study came from 31 different conferences.
These conferences were divided into 4 tiers based on past tournament performance and the number of schools that get into the tournament each year (Tier 1 being the strongest conferences; Tier 4 being the weakest conferences).
We then tested how a team’s conference tier was correlated with their performance.
![Page 12: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/12.jpg)
Comparing Teams’ Conferences
We tested the correlation between a team’s conference tier and their performance in the tournament.
Likewise, we tested whether the effect of a team’s winning percentage depends on its conference tier.
To do so, we created a dummy variable for each tier and interaction terms between tier and winning percentage.
![Page 13: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/13.jpg)
Results (R^2 = .40)
result | Coef. Std. Err. t P>|t|
-------------+--------------------------------------------------------
win % | 1.219 .574 2.12 0.034
Tier 1 | -4.40 .543 -8.10 0.000
Tier 2 | -2.148 .824 -2.61 0.009
Tier 3 | -4.189 1.00 -4.18 0.000
%*T1 | 7.90 .756 10.44 0.000
%*T2 | 3.93 1.136 3.46 0.000
%*T3 | 6.20 1.34 4.61 0.000
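One way to read the interaction terms is as tier-specific slopes on winning percentage: the base `win %` coefficient plus the tier's interaction coefficient. The sketch below does that arithmetic, assuming Tier 4 is the omitted baseline (only Tiers 1-3 have dummies in the table); the variable names are hypothetical.

```python
# Coefficients from the results table.
WIN_PCT_COEF = 1.219
INTERACTION_COEFS = {
    "Tier 1": 7.90,
    "Tier 2": 3.93,
    "Tier 3": 6.20,
    "Tier 4": 0.0,  # ASSUMED omitted baseline: no interaction term in the table
}

# Effect of winning percentage on result within each tier.
slopes = {
    tier: round(WIN_PCT_COEF + extra, 3)
    for tier, extra in INTERACTION_COEFS.items()
}

for tier, slope in slopes.items():
    print(f"{tier}: {slope}")
```

On this reading, winning percentage matters most for Tier 1 teams and least for Tier 4 teams.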
![Page 14: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/14.jpg)
Is the Tournament Fairly Seeded Based on Conference Tier?
To test this, we looked at only the top 4 seeds because their results seemed the closest to normal.
For each of these seeds, we created four groups, one for each tier, to see whether performance was consistent across conference tiers given a team’s seed.
ANOVA was used to test $H_0: \mu_{T1} = \mu_{T2} = \mu_{T3} = \mu_{T4}$ (for each seed 1-4).
![Page 15: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/15.jpg)
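Results

Seed Group | F | F-critical
-----------+-------+-----------
Seed 1 | 1.102 | 3.148
Seed 2 | 0.365 | 2.758
Seed 3 | 0.934 | 3.148
Seed 4 | 0.039 | 2.758

In every case F is below the critical value, so we cannot reject the hypothesis that, for a given seed, mean performance is the same across conference tiers.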
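For readers who want to see how a one-way ANOVA F statistic of this kind is formed, here is a minimal sketch. The data are small made-up numbers chosen only to keep the arithmetic easy to follow; they are not the tournament data, and the function name is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(group) / len(group) for group in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2 for group, mean in zip(groups, means)
    )
    ss_within = sum(
        (value - mean) ** 2 for group, mean in zip(groups, means) for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# HYPOTHETICAL toy data: games won by teams in three made-up groups.
toy_groups = [[4, 3, 5], [2, 3, 1], [3, 3, 3]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

The resulting F would then be compared with the critical value for (df_between, df_within) degrees of freedom, as in the table.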
![Page 16: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/16.jpg)
Analyzing Certain Teams
Dummy variables were created for teams which had been in at least 12 (75%) of the tournaments.
There are not enough data points, and the histograms are too skewed, to assume normality for the team data.
![Page 17: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/17.jpg)
A Quick Look at the Teams’ Performances

Team | Obs | Mean | SD
-----------+-----+------+-----
Duke | 15 | 3.33 | 2.02
Kentucky | 13 | 3.08 | 1.75
UNC | 16 | 2.81 | 1.52
Kansas | 15 | 2.40 | 1.64
Michigan | 12 | 2.17 | 2.08
Arkansas | 13 | 2.08 | 1.89
Syracuse | 14 | 1.86 | 1.56
Georgetown | 12 | 1.83 | 1.40
Louisville | 12 | 1.75 | 1.66
UCLA | 13 | 1.62 | 1.71
Arizona | 16 | 1.56 | 1.86
Indiana | 15 | 1.40 | 1.80
Purdue | 14 | 1.36 | 1.01
Oklahoma | 13 | 1.31 | 1.49
Temple | 15 | 1.27 | 1.16
Illinois | 12 | 1.00 | 1.13
![Page 18: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/18.jpg)
Is Duke the Best? Duke vs. Kentucky

Team | Obs | Mean | SD
---------+-----+------+-----
Duke | 15 | 3.33 | 2.02
Kentucky | 13 | 3.08 | 1.75

[Figure: histograms of result (games won) for Duke and Kentucky]

Assuming normality in the distribution of results for both Kentucky and Duke (which may not be a valid assumption),
$T_{n+m-2} = \dfrac{\bar{x} - \bar{y}}{S_P \sqrt{1/n + 1/m}} = 0.356$
$T_{n+m-2} = 0.356 < 0.856 = T_{0.20,\,26}$
Therefore, we cannot reject the null hypothesis that Duke and Kentucky perform equally well, even at the 20% significance level.
![Page 19: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/19.jpg)
Time Trends?

Team | E(1985) | Time Trend | Std. Err. | P>|t|
-----------+---------+------------+-----------+------
Arizona | 0.82 | 0.099 | 0.101 | 0.970
Arkansas | 2.08 | 0.000 | 0.127 | 0.998
Duke | 3.83 | -0.068 | 0.113 | 0.559
Georgetown | 2.70 | -0.148 | 0.100 | 0.169
Illinois | 1.35 | -0.050 | 0.069 | 0.483
Indiana | 2.51 | -0.139 | 0.105 | 0.208
Kansas | 3.38 | -0.127 | 0.087 | 0.170
Kentucky | 2.03 | 0.130 | 0.096 | 0.201
Louisville | 3.65 | -0.230 | 0.094 | 0.035
Michigan | 2.33 | -0.027 | 0.156 | 0.869
UNC | 2.74 | 0.010 | 0.085 | 0.905
Oklahoma | 2.44 | -0.152 | 0.072 | 0.059
Purdue | 0.78 | 0.074 | 0.054 | 0.197
Syracuse | 1.94 | -0.012 | 0.091 | 0.900
Temple | 1.36 | -0.001 | 0.067 | 0.860
UCLA | 1.18 | 0.049 | 0.127 | 0.705
According to this time trend regression, Kentucky would have overtaken Duke in 1994.
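The 1994 figure follows from setting the two fitted lines equal. The sketch below assumes that E(1985) is the fitted value in 1985 and that the time trend is a linear change per year; the function and constant names are hypothetical.

```python
BASE_YEAR = 1985

# (expected wins in 1985, change in expected wins per year) from the table.
DUKE = (3.83, -0.068)
KENTUCKY = (2.03, 0.130)


def crossover_year(leader, challenger, base_year=BASE_YEAR):
    """Year in which the challenger's fitted line meets the leader's fitted line."""
    (lead_start, lead_trend), (chal_start, chal_trend) = leader, challenger
    if chal_trend == lead_trend:
        raise ValueError("parallel trends never cross")
    years_after_base = (lead_start - chal_start) / (chal_trend - lead_trend)
    return base_year + years_after_base


year = crossover_year(DUKE, KENTUCKY)
print(int(year))
```

Since neither team's trend coefficient is significant at conventional levels, the crossover date is only suggestive.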
![Page 20: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/20.jpg)
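Maybe So...
1995-2000

Team | Tournament Appearances | Average Tournament Wins | Standard Deviation | Min | Max
---------+------------------------+-------------------------+--------------------+-----+----
Kentucky | 6 | 4 | 2 | 1 | 6
Duke | 5 | 2.2 | 1.92 | 0 | 5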
![Page 21: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/21.jpg)
Are Certain Teams Mis-Seeded?
If a team’s dummy variable is significant when regressed alongside seed, it suggests that the team is often “mis-seeded” (i.e., it is consistently seeded higher or lower than it should be).
![Page 22: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/22.jpg)
Under-Rated?

Team | Dummy Coef. | Std. Err. | t | P>|t|
-----------+-------------+-----------+-------+------
Duke | 1.342 | 0.278 | 4.82 | 0.000
Kentucky | 1.200 | 0.299 | 4.02 | 0.000
UNC | 0.874 | 0.271 | 3.22 | 0.001
Arkansas | 0.596 | 0.298 | 2.00 | 0.046
Kansas | 0.478 | 0.281 | 1.70 | 0.089
Michigan | 0.430 | 0.312 | 1.38 | 0.168
Louisville | 0.257 | 0.310 | 0.83 | 0.408
Georgetown | 0.224 | 0.311 | 0.72 | 0.473
Syracuse | 0.110 | 0.289 | 0.38 | 0.704
Temple | 0.007 | 0.233 | 0.03 | 0.979
UCLA | -0.039 | 0.300 | -0.13 | 0.897
Oklahoma | -0.242 | 0.299 | -0.81 | 0.419
Indiana | -0.264 | 0.278 | -0.95 | 0.344
Arizona | -0.265 | 0.270 | -0.98 | 0.330
Purdue | -0.312 | 0.289 | -1.08 | 0.280
Illinois | -0.626 | 0.311 | -2.01 | 0.045

So, for example, Duke can be expected to win more than one game more than other teams of the same seed, and Illinois can be expected to win more than half a game fewer than other teams of the same seed.
If Duke and Illinois are seeded the same, Duke can be expected to win almost two full games more than Illinois (1.342 + 0.626 = 1.968).
![Page 23: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/23.jpg)
Analyzing Experience
An experience variable was created to reflect the total number of previous tournament games (won or lost) a team had played since 1985.
Result = .952 + .054*experience - .051*year (R^2 = .14)
t-statistics: constant (12.80), experience (12.87), year (-5.51)
Hence, there is a correlation between experience and result, suggesting that teams which have been in the tournament often typically win more games. Successful teams also typically stay successful.
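As a sketch of how an experience variable of this kind could be built, the code below accumulates the number of tournament games a team played in earlier years. It assumes a team plays one game per win plus the loss that eliminates it, except that a champion (6 wins in the 64-team format) never loses. The team history is made up, and the function names are hypothetical.

```python
def games_played(wins: int) -> int:
    """Games played in one tournament: each win plus the eliminating loss.

    A champion wins 6 games in the 64-team format and has no loss.
    """
    return wins if wins == 6 else wins + 1


def experience_by_year(wins_by_year: dict) -> dict:
    """Map each appearance year to the games played in all PREVIOUS appearances."""
    experience = {}
    total_games = 0
    for year in sorted(wins_by_year):
        experience[year] = total_games
        total_games += games_played(wins_by_year[year])
    return experience


# HYPOTHETICAL team history: year -> tournament wins that year.
hypothetical_team = {1985: 2, 1986: 0, 1988: 6, 1990: 1}
experience = experience_by_year(hypothetical_team)
print(experience)
```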
![Page 24: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/24.jpg)
Regression with Experience (R^2 = .39)
result | Coef. Std. Err. t P>|t|
-------------+--------------------------------------------------------
win % | 2.575 .391 6.59 0.000
seed | -.131 .009 -13.72 0.000
exper | .016 .0042 3.84 0.000
year | -0.016 .0081 -2.07 0.038
![Page 25: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/25.jpg)
Experience (cont…)
That experience is significant when regressed with seed and winning percentage indicates that it is not fully accounted for in the seeding of teams, and that it is another variable worth looking at when making tournament predictions.
The experience variable is significant in a variety of regressions, indicating its robustness as an explanatory variable.
![Page 26: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/26.jpg)
FINAL REGRESSION
      Source |       SS       df       MS              Number of obs =    1024
-------------+------------------------------ F( 7, 1016) = 103.97
Model | 767.840021 7 109.691432 Prob > F = 0.0000
Residual | 1071.90998 1016 1.05502951 R-squared = 0.4174
-------------+------------------------------ Adj R-squared = 0.4133
Total | 1839.75 1023 1.7983871 Root MSE = 1.0271
------------------------------------------------------------------------------
result | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
seed | -.109729 .0122139 -8.98 0.000 -.1336964 -.0857616
percent | 2.795619 .4106081 6.81 0.000 1.989882 3.601356
experience | .0071693 .004408 1.63 0.104 -.0014805 .0158192
onepercent | .3927863 .1308057 3.00 0.003 .1361061 .6494664
duke | 1.117162 .2837561 3.94 0.000 .5603469 1.673977
kentucky | 1.074347 .2931143 3.67 0.000 .4991684 1.649526
year | -.0079938 .0081567 -0.98 0.327 -.0239997 .0080121
_cons | -.2546688 .3922204 -0.65 0.516 -1.024323 .5149859
------------------------------------------------------------------------------
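The summary statistics in the header of this output can be recomputed from the sums of squares and degrees of freedom in the ANOVA block, which is a useful sanity check when transcribing regression output. A minimal sketch (variable names are hypothetical):

```python
import math

# Sums of squares and degrees of freedom from the output above.
SS_MODEL, DF_MODEL = 767.840021, 7
SS_RESIDUAL, DF_RESIDUAL = 1071.90998, 1016
SS_TOTAL, DF_TOTAL = 1839.75, 1023

ms_model = SS_MODEL / DF_MODEL
ms_residual = SS_RESIDUAL / DF_RESIDUAL

f_statistic = ms_model / ms_residual                      # F(7, 1016)
r_squared = SS_MODEL / SS_TOTAL                           # share of variance explained
adj_r_squared = 1 - ms_residual / (SS_TOTAL / DF_TOTAL)   # penalised for 7 regressors
root_mse = math.sqrt(ms_residual)                         # residual standard error

print(f"F = {f_statistic:.2f}, R^2 = {r_squared:.4f}, "
      f"adj R^2 = {adj_r_squared:.4f}, Root MSE = {root_mse:.4f}")
```

Each recomputed value agrees with the reported header to the precision shown.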
![Page 27: NCAA Basketball Tournament: Predicting Performance](https://reader036.vdocument.in/reader036/viewer/2022062321/56813518550346895d9c6dc6/html5/thumbnails/27.jpg)
Conclusions
Tournament predictions can be fairly accurate based solely on seed.
There are other predictors, such as winning percentage, conference, and experience, which can be used to refine predictions.
However, better teams don’t always win, so it is impossible to make predictions with certainty.