Regression to the Mean at The Masters Golf Tournament
A comparative analysis of regression to the mean on the PGA Tour and at The Masters Tournament
Kevin Masini
Pomona College
Economics 190
1. Introduction
Every sport involves elements of luck and skill. Even on the PGA Tour, which is considered the
highest level of golf, scores and winners are often determined by a fortuitous bounce onto the
green or an unlucky kick into a hazard. Because golf is such a game of inches, there is an
imperfect correlation between player performance and skill. This imperfect correlation can be
seen in all sports, and is especially evident in the game of golf. This is why we see so many
different winners on the PGA Tour and why it is so difficult for players to win multiple
tournaments in a given season and even throughout a player’s career. This imperfect correlation
leads to a phenomenon known as regression to the mean.
1.1 Regression to the Mean
Regression to the mean is the phenomenon where someone who performs toward an extreme one year
is likely to perform closer to the mean the following year. Regression to the mean can be seen
in many different aspects of life, but is especially noticeable in sports. It was first observed
in 1886 when Sir Francis Galton studied the relationship between the heights of parents and
their children (Galton, 1886). This inaugural work has led to further research on the
phenomenon. A well-known example of regression to the mean is the “sophomore slump”, in which a
player who has a particularly exceptional rookie season shows decline in their second season.
This is very much the definition of regression to the mean. A rookie who had an exceptional
season likely outperformed their true ability and will regress towards the mean the following
year, just as a player who underperforms in their first season will likely perform better in
their second season.
1.2 The Masters
Each season there are nearly 50 PGA Tour events. Of these tournaments there are four major
tournaments (majors). The four majors are viewed as the most important tournaments each year.
Of the four, The Masters Tournament is the only one played at the same course every year. The
Masters was first played in 1934 and typically has a field of eighty to one hundred of the best
golfers in the world. Each year The Masters is played at Augusta National, one of the most
famous golf courses in the world.
The Masters has been played at Augusta National 73 times; of those 73, 47 have been won by
multiple-time winners. That is, people who have won at least twice account for nearly two-thirds
of the victories at Augusta. That means there have been 26 one-time winners at The Masters.
Trevor Immelman won the tournament in 2008 as one of his only two wins on the PGA Tour.
Furthermore, he has only finished in the top 10 twice in his fifteen appearances at Augusta.
This is a rare occurrence at The Masters. Typically, fans see familiar names atop the
leaderboard each year. For example, Phil Mickelson has finished in the top 10 at The Masters in
fourteen of his twenty-four professional starts, winning three times. To put that into
perspective, Phil has finished in the top 10 in 58% of his Masters starts compared to 34% of his
PGA Tour starts. Similar to Mickelson, many players seem to ‘show up’ at The Masters every year.
Whether it be the course, the fact that many players tailor their schedule around the
tournament, or some other reason, it seems that certain players show less regression to the
mean from year to year at The Masters. It is because of this that I hypothesize that we will see
less regression to the mean at The Masters than is seen during the entire PGA Tour season. This
goes for both the year-to-year and the round-to-round comparisons.
2. Literature Review
Regression to the mean is studied in a number of different areas, with sports being one of the
main focuses. When it comes to sports, a player’s performance can be modeled by a combination of
luck and skill. Essentially, each athlete has a base skill level and then has different levels
of luck on a given day or during a given season. In golf, we see these fluctuations in luck more
often than in the typical sport. In Thinking, Fast and Slow (2011), Kahneman offers a simple
model of luck and skill, which is as follows:
success = talent + luck
great success = a little more talent + a lot of luck
This simple model offers insight into regression to the mean in golf and how to intuitively
understand the fluctuations in players’ scores. Think of the first two rounds of a golf
tournament. Say that the average score is par, or 72. One would expect that a player who shot a
65 has above-average skill, but also experienced above-average luck. This player is likely to be
successful on the second day, but probably less successful because they will not be as lucky as
they were on the first day (Kahneman, 2011). Kahneman does a good job of describing the theory
behind regression to the mean, and more specifically luck and skill in golf, but does not offer
any data on the subject.
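Kahneman’s model can be illustrated with a small simulation. The sketch below is only an illustration under assumed values: the normal distributions, the standard deviations for skill and luck, the field size, and the function names are all hypothetical choices rather than estimates from any data. Scores are in strokes, so lower is better.

```python
import random
import statistics


def simulate_two_rounds(n_players=2000, skill_sd=1.0, luck_sd=2.0, par=72.0, seed=0):
    """Simulate two rounds where score = par + skill + luck (lower is better).

    skill_sd and luck_sd are illustrative assumptions, not estimates from data.
    """
    rng = random.Random(seed)
    # Each player's skill is fixed across rounds; luck is redrawn every round.
    skills = [rng.gauss(0.0, skill_sd) for _ in range(n_players)]
    round1 = [par + s + rng.gauss(0.0, luck_sd) for s in skills]
    round2 = [par + s + rng.gauss(0.0, luck_sd) for s in skills]
    return round1, round2


def leaders_average(round1, round2, top_n=200):
    """Average round-1 and round-2 scores of the top_n round-1 leaders."""
    leaders = sorted(range(len(round1)), key=lambda i: round1[i])[:top_n]
    avg1 = statistics.mean(round1[i] for i in leaders)
    avg2 = statistics.mean(round2[i] for i in leaders)
    return avg1, avg2


round1, round2 = simulate_two_rounds()
avg1, avg2 = leaders_average(round1, round2)
print(f"leaders, round 1: {avg1:.2f}")
print(f"leaders, round 2: {avg2:.2f}")
```

Under these assumptions the first-round leaders should still beat par on average in the second round, but by a smaller margin, because their skill carries over while their luck does not.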
Connolly and Rendleman (2008, 2009) use this model of luck and skill, but offer more insight
into the direct effect that it has on golfers. They found that the winner of a normal PGA Tour
event experiences roughly 2.5 strokes per round of abnormally favorable random variation in
scoring. Broadie and Rendleman (2015) went deeper in their analysis of luck and skill at all
levels of golf by looking at how players’ performance changed from the first round to the second
round of tournaments. They split players into two groups based on their first-round performance,
with Group 1 being players in the top half and Group 2 being players in the bottom half. They
then looked at how players in each group performed in the second round. They found that Group 1
as a collective performed much worse on the second day, while Group 2 showed much improvement.
This test showed clear evidence of regression to the mean between the first two rounds of
professional golf tournaments. Their analysis also looked at how different skill levels are
affected by luck and skill. They found that as you decrease the skill level of golfers from
professionals to amateurs to your everyday country club golfer, the variation in scores is more
likely to be due to skill rather than luck when the players are less skilled. This is known as
the paradox between luck and skill.
Schall and Smith (2000) looked at regression to the mean in professional baseball players.
Their analysis did not focus on the model of luck and skill, but used a very similar model for
player performance. They did a season-by-season analysis of batting averages and earned run
averages, standardized each season to have a mean of zero and a standard deviation of 1. They
found that there was an imperfect correlation in performance from one year to the next. Because
performance is imperfectly measured, players’ batting averages and earned run averages regress
towards the mean.
3. Data
This paper utilizes data obtained from the PGA Tour’s ShotLink database. The database has data
on the overall results of tournaments as well as shot-by-shot data for every shot hit in
competition play. The PGA Tour has hundreds of volunteers at each tournament to help with the
collection of the shot-by-shot data. They use this shot-by-shot data to run analyses on players
and tournaments to offer insight into how players, individually and as a group, perform on a
number of different layers of skill sets.
For this analysis, the shot-by-shot data is not necessary. This paper utilizes player scores
during the first two rounds at The Masters Tournament as well as average first- and second-round
scores for players throughout the entire season. Scores from the third and fourth rounds are not
used, as they occur after a number of players are “cut” from the tournament. Data was pulled for
the ten-year stretch from 2008 to 2017.
4. Methodology
This analysis differs from previous analyses in that it is a comparative analysis between the
PGA Tour season and The Masters Tournament. I look to see if there is a significant difference
in how players regress to the mean at The Masters compared to throughout the season. Regression
to the mean is looked at from year to year as well as from round to round in a given year. A
typical professional golf tournament consists of four rounds of tournament play, with
poorer-performing players being cut following the second round. This paper focuses on the first
two rounds of the tournament in order to include every player in the field for a given
tournament. In order to see how players perform from one round to the next, this study uses a
test very similar to the one performed by Broadie and Rendleman (2015). The second part of the
analysis is to see how players perform across seasons. In order to run this analysis, this paper
will use a model similar to that used by Schall and Smith (2000).
4.1 Round-By-Round Analysis
The round-by-round analysis compares how players perform from one round to the next during the
PGA Tour season and at The Masters. In each setting, players are assigned to one of two groups
after the first round of play. The top half (the players who shot the lowest scores) are placed
in Group 1, and the bottom half is placed in Group 2. Then the average second-round score is
computed for the same groups.
There are several different factors that go into the grouping of players. Players in the first
group may simply be more skilled than those in the second group. Or, it could be that the first
group just experienced more favorable random variation, also known as “luck”. If it were only
the skill of the player that determined the groups, one would expect that the players from
Group 1 would have a second-round average score roughly the same number of strokes better than
Group 2 as they did in the first round. If luck were the only factor in the first round, then
one would expect that the two groups would have averages that are close to equal in the second
round. Finally, if a combination of luck and skill is what determines scores, then one would
expect that the difference between second-round scores would be smaller than the difference was
for first-round scores. The differences between groups are then compared between the PGA Tour
season and The Masters. This comparison can be quantified by looking at the correlation between
differences.
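The grouping test described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the paper: the function name, the handling of ties, and the eight-player field are hypothetical.

```python
from statistics import mean


def group_gaps(scores):
    """Return the Group 2 minus Group 1 average-score gap in each round.

    scores: list of (round1, round2) tuples, one per player; lower is better.
    Ties in round 1 are split arbitrarily by sort order (an assumption).
    """
    ranked = sorted(scores, key=lambda s: s[0])  # best first-round scores first
    half = len(ranked) // 2
    group1, group2 = ranked[:half], ranked[half:]
    gap_round1 = mean(s[0] for s in group2) - mean(s[0] for s in group1)
    gap_round2 = mean(s[1] for s in group2) - mean(s[1] for s in group1)
    return gap_round1, gap_round2


# Hypothetical field of eight players: (first-round score, second-round score).
field = [(68, 71), (69, 70), (70, 72), (71, 71),
         (73, 72), (74, 73), (75, 72), (76, 74)]
gap1, gap2 = group_gaps(field)
print(gap1, gap2)  # 5.0 1.75
```

A second-round gap equal to the first-round gap would point to skill alone, a gap near zero to luck alone, and a smaller but positive gap, as in this made-up field, to a mix of the two.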
4.2 Year-To-Year Analysis
In order to compare player scores from different years, performance can be standardized by
finding the difference between a player’s performance in a given year and the mean performance
for all players during that year. This number must then be divided by the standard deviation of
performance across all players for the season.
Following the work of Schall and Smith (2000), a player’s performance for a given year is
determined by an expected value (x), which can be thought of as the player’s skill level or true
ability. The player’s actual performance then differs from their true ability by a random term
(E) that has an expected value of zero and is independent of skill as well as of the random
term’s value in other seasons. This then gives us the following equation:
𝑌 = 𝑥 + 𝐸
Once players’ scores are standardized, players’ performance can be compared from year to year
and between the PGA Tour season and The Masters.
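The standardization step and the year-to-year comparison can be sketched as follows. This is a rough illustration under stated assumptions: the scoring averages are invented, the function names are hypothetical, and the population standard deviation is used where the sample standard deviation would be an equally plausible choice.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw values to z-scores (mean 0, standard deviation 1)."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; an assumption, see the lead-in
    return [(v - mu) / sigma for v in values]


def correlation(year1, year2):
    """Pearson correlation: the mean product of paired z-scores."""
    z1, z2 = standardize(year1), standardize(year2)
    return mean(a * b for a, b in zip(z1, z2))


# Hypothetical scoring averages for five players in two consecutive seasons.
season_a = [69.8, 70.4, 70.9, 71.3, 72.1]
season_b = [70.5, 70.2, 71.0, 70.8, 71.6]

r = correlation(season_a, season_b)
z_a = standardize(season_a)
# On the z-score scale, the least-squares prediction for next season is r * z.
predicted_z_b = [r * z for z in z_a]
print(round(r, 3))
```

Because the correlation is below 1 whenever the random term (E) has any variance, each predicted z-score is closer to zero than the z-score it is based on, which is regression to the mean stated on the standardized scale.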
5. Results
Analyses of the past 10 seasons show that regression to the mean at The Masters is not
significantly different than it is during the PGA Tour season. If anything, there is more
regression to the mean at The Masters than during the season. When looking at the difference
between player score and the average score, the R-squared value at The Masters for the 2015 and
2016 seasons is .105. This is compared with a value of .185 for the PGA Tour season. One can see
that while both values are low, the R-squared for The Masters is considerably lower than during
the PGA Tour season.
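To relate these R-squared values to the correlations discussed elsewhere in the paper, the short sketch below converts them. It assumes that each R-squared comes from a regression with a single predictor, where R-squared is the square of the correlation, and that the correlation is positive. Neither assumption is stated in the analysis itself, and the function name is hypothetical.

```python
import math


def implied_correlation(r_squared):
    """Magnitude of the correlation implied by R-squared with one predictor."""
    return math.sqrt(r_squared)


masters_r = implied_correlation(0.105)  # about 0.32
season_r = implied_correlation(0.185)   # about 0.43
print(f"Masters: {masters_r:.2f}, PGA Tour season: {season_r:.2f}")
```

Read this way, a player one standard deviation from the field average in 2015 would be predicted to sit roughly 0.32 standard deviations from it in 2016 at The Masters, against roughly 0.43 over the PGA Tour season.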
When looking from round to round in 2015, the PGA Tour season shows, as expected, regression to
the mean, with an R-squared value of .131. The Masters showed an even smaller value. The
R-squared for The Masters in 2015 is .00034, showing nearly no relationship between first- and
second-round scores of players. This seems to show the paradox of luck and skill, which has
been seen in previous works.
This lack of correlation between the scores of players between rounds is evident in the
round-by-round analysis using two groups. Table 1a below shows that the groups converge towards
the mean in the second round. This gives solid evidence confirming the work of Broadie and
Rendleman (2015), who argue that a combination of luck and skill is what leads to total
performance in professional golf. Furthermore, there was no significant difference between the
groups at The Masters and during the regular PGA Tour season. During the PGA Tour season,
players in the first group still have a lower score than those in the second group in the
second round. This is not true for The Masters. At The Masters we see that the first group has
a slightly negative correlation between the first and second rounds. Regression to the mean is
so severe that Group 1 scores worse than the second group during the second round at The
Masters. This seems to suggest that deviation in scores between groups at The Masters is caused
solely by luck.
When comparing the correlation of first- and second-round scores between the different groups,
one sees very little correlation for both groups. Perhaps the most interesting part is the
manner in which correlations fluctuate from year to year, as can be seen in Table 1b. For
example, in 2015 Group 1 saw a fairly significant positive correlation both during The Masters
(.24) and during the season (.44), while the group’s correlation was nearly zero for all other
seasons. Group 2, on the other hand, showed a positive correlation in 2016 during the season
(.28) and a similarly sized negative correlation at The Masters (-.22). The fact that the
correlations are typically close to zero, and that they fluctuate year by year and group by
group, goes to show just how random golf can be.
Looking at the correlation between rounds for the entire field at both The Masters and during
the PGA season over the past 10 years further reveals the randomness between rounds. The PGA
season is much more consistent than The Masters, with correlations fluctuating between .29 and
.51 over the past 10 years. On the other hand, The Masters fluctuates from .08 to .47 over the
same years. The PGA season has a higher correlation between rounds in 8 of the 10 seasons,
again suggesting less regression to the mean during the season than during The Masters
(Figure 1).
I then split players into two groups based on their average score on tour over the past four
years. Group 1 consists of the top half of players over the period and Group 2 consists of the
bottom half. The point of this was to split players into groups based on their true ability in
order to determine whether better players regress to the mean less than less-skilled players,
with Group 1 being the better players and Group 2 being the less-skilled players. I then looked
at how each group performed from the first to the second round at The Masters and during the
entire PGA Tour season. I found that the players in Group 1 played the first round of The
Masters nearly half a stroke better than the second round over the last three tournaments. This
compares with them shooting .15 strokes better in the first round during the entire season over
the past three years. On the other side, the second group shot nearly half a stroke better in
the second round of The Masters than in the first. This compares with scoring only slightly
better in the second round throughout the PGA Tour season. These larger differences between
rounds at The Masters provide further evidence of more regression to the mean at The Masters
than during the PGA Tour season.
Table 1a: round-by-round comparison
While this test did not show any difference in regression to the mean between different skill
groups, it did show that the groups performed much differently from round to round. The test
shows evidence that the more skilled players on tour play better in the first round than in the
second round, and vice versa for less-skilled players. This could be because the worse players
have to play better to make the cut, or it could be caused by some other reason.
Table 1b: round-by-round correlation
6. Conclusion
Analyses show that there is not a significant difference in regression to the mean between The
Masters Tournament and the PGA Tour season. This is apparent in both the round-to-round and the
year-to-year analyses. It is of note that the number of observations is low because the average
golf tournament has fewer than one hundred players.
One thing that is not controlled for in the round-by-round analysis is differing weather
conditions. Players typically have one round in the morning and one round in the afternoon
during the first two rounds of a tournament. On occasion there is an extreme difference in
playing conditions between the morning and afternoon. This change in weather could be a cause
of regression to the mean when looking at a single tournament. It is unlikely that this would
be a factor when looking at the entire season.
Figure 1: Round-to-round correlation during PGA season and at The Masters from 2008-2017
The fact that at The Masters players from differing groups score practically the same in the
second round suggests that scoring at The Masters is based more on luck than during the PGA
season. This could be due to the fact that it is much more difficult to qualify for The Masters
than it is for regular events, meaning that the players at The Masters are closer in true
ability than they are in a normal tournament.
If players at The Masters show more regression to the mean than during the season, then why is
it that players like Phil Mickelson seem to perform better at The Masters? One explanation
could be that Mickelson and other players simply match up well with Augusta. It is seen in
other tournaments that players play better at certain courses. It could be that Mickelson just
so happens to have a game that fits well with one of the most prestigious courses in the world.
7. References
(1) Broadie, Mark, and Richard Rendleman. “Are the Official World Golf Rankings Biased?”
http://www.columbia.edu/~mnb2/Broadie/Assets/owgr_20120507_broadie_rendleman.pdf, 7 May 2012.
(2) Connolly, Robert A. and Richard J. Rendleman, Jr., 2008, “Skill, Luck and Streaky Play on
the PGA Tour,” Journal of the American Statistical Association, 103 (March): 74-88.
(3) Connolly, Robert A. and Richard J. Rendleman, Jr., 2012, “What it Takes to Win on the PGA
Tour (If Your Name is “Tiger” or If It Isn’t),” Interfaces, November-December, 42(6): 554-576.
(4) Galton, F. (1886), “Regression Towards Mediocrity in Hereditary Stature,” Journal of the
Anthropological Institute, 15, 246-263.
(5) Kahneman, Daniel. Thinking, Fast and Slow. Farrar, Straus and Giroux, 2013.
(6) Past Winners, 2018. www.masters.com/en_US/discover/past_winners.html.
(7) PGA. “What Is ShotLink Intelligence.” PGA Tour, 2005,
www.pgatour.com/stats/shotlinkintelligence/overview.html.
(8) Teddy Schall & Gary Smith (2000) Do Baseball Players Regress toward the Mean?, The American
Statistician, 54:4, 231-235.
8. Graphs and Figures
Figure 2: PGA round 1 comparison 2016-2017
Figure 3: Masters round 1 comparison 2016-2017
Figure 4: PGA round-to-round 2017
Figure 5: Masters round-to-round 2017
Table 2: PGA Tour groups for round comparison
Table 3: Masters groups for round comparison