r is for racing · – licensing situation is expensive (largely b2b) and restrictive – no...
TRANSCRIPT
RisForRacing
ColinMageeJanuary2019
Overview
• Introduction,contextandhistory• Dataforprogrammatichorseracinganalysis• Exploringhorseracingdata• Buildingmodelswithhistorichorseracingdata• Usingdailydatatofindprofitablepredictions• UsingabettortoplacebetsintoBetfair• Whatnext?Avoidinggambler’sruin!
© 2019
About• Horseracingenthusiast,punter,sometimeowner
– CurrentlymemberofHorseracingBettors’Forumhttp://ukhbf.org/about/ supportedbytheBHA
• Careerwise,execinsoftwareanddatabusinesses– CurrentlyonboardatDoorda www.doorda.com
• Setupwww.betwise.co.uk in2007– InitiallytosupportbookAutomaticExchangeBetting– firstbookonautomatingbettingprocessviaBetfair API
– CreatedSmartform forhorseracing/dataenthusiasts– CurrentlyworkingonRisforRacingwithJayEmersonhttp://www.stat.yale.edu/~jay/
© 2019
Context• Horseracingisinmanywaysthearchetypalpredictiveanalyticsproblem– Constantseriesofexperiments(races)– Bookmakersandpuntershavealwaysusedinformationtoassignprobabilitiesofsuccesstoeachcontender
• Nosurpriseeitherthatthehistoryofbettingonhorseracingiscloselyintertwinedwiththerootsofcomputerscience,programmingandstatistics…
© 2019
Babbage– designedfirstcomputer?
© 2019
Lovelace– firstprogrammer?
© 2019
Butdidyouknow….?Fromthe1840s,Adaturnedherprodigioustalentstowardgamblingandprogrammingtheoutcomesofhorseraces.Amysterious“book”waspassedbetweenLovelaceandBabbageonceaweekthatprobablycontainedaprogramdesignedtopredicthorse-raceresults.
OnMay17,1850,Adawrotetohermother:“Iamafraidyouwilltakenointerestinwhatinterestsmemuchjustnow,—viz:thewinneroftheDerby.”Markus,Julia.LadyByronandHerDaughters
Apparently,Adalost£3,200*bettingontheDerby.
*c.£425,000in2019
© 2019
Ihavenoideawhattheywereupto
• Therewasverylittledataasweunderstandit–possiblytheyweretryingtocalculatetheeffectsofweightinslowingdownhorsesofdifferentagesoverdifferentdistances,butwhoknows?
• Whatistrueisthatinthiscenturyonemightassumeweareinamuchbetterpositiontomasterthisdomainbyapplyingaclassicdatascienceapproach– withmoderncomputingtools,domainknowledge,usefuldataandmodeling expertise
© 2019
…ModernOpportunitiesandChallenges
• Bettingexchanges– givenAPIaccess- indeedprovideopportunitytobequantitativeinbettinganalysisandexecution– WroteAutomaticExchangeBettingin2007aboutthis
• Morechallengingwasthe’fundamentaldata’situationtounderstandthedomain,createmodelsandunderstandwhattobeton– Licensingsituationisexpensive(largelyB2B)andrestrictive– Noprogrammabledatasource– lotsofwebsitesorGUIdrivendatabases
reliedonwebscraping ormanualprocessesfordailydataextraction
• So… forconveniencewecreatedSmartform,anenduserlicensed,programmabledatabaseforhorseracingforenthusiastsanddatascientistsalike
© 2019
Smartform
• https://www.betwise.co.uk/smartform– MySQLdatabase– 15yearsofhorseracinghistoryinUKandIreland,updateddailywithresultsandupcomingraces
– Easytoautomate,importandconnectfrommostanylanguage
• Risanaturalchoiceforanalysisenvironment– Minimizeinteration withMySQLandgetstraighttodataexploration,analysis,modelingetc
– AndpackagesavailableforBetfair APIetc.
© 2019
#Connecttodatabase,easywithRMySQLcon<- dbConnect(MySQL(),
host='127.0.0.1',user=’name',dbname='smartform')
#Databaseissimple,flatstructure– non-normalised– makingiteasytoslurpintoR>dbListTables(con)[1]"daily_races" "daily_runners"[3]"historic_races" "historic_runners”#.…Plussomeextratablesforbetfair historicpricesandeasymappingofentities(morelater)….
#Fordataanalysisandmodeling weonlyreallycareabouthistorictables– race_id iscommonkeyquery<- paste("select*fromhistoric_races;",sep="");#Allracesx<- as.data.frame(dbGetQuery(con,query),stringsAsFactors =FALSE)dim(x)191867 40query2<- paste("select*fromhistoric_runners;",sep="") #Allrunnersinthoseracesy<- as.data.frame(dbGetQuery(con,query2),stringsAsFactors =FALSE)dim(y)2140306 66
© 2019
Exploringhorseracingdata• So,wehaveabout2millionhorses(c.162kareunique)spreadamongst200,000races,withresultsforeachrace
• Despitelookofdatabeing“flat”,thetime/historyelementofracingandinterdependententities(eg.races,horses,sires,dams,trainers,jockeys,owners,coursesetc)enablesustocreatethousandsofderiveddimensionsfordeepermodelling
• Butdatacanstillbemessy:Splitbyjurisdiction,splitwithinjurisdiction,fromdifferentsources:Vigilanceisrequired!
• Let’slookatsomepreprocessing /explorationetc. (Demo)
© 2019
© 2019
Featureengineeringoverview
• Basiccleanuptasks– removingnon-runnersetc.• Wecandoalotmorewithderivingfieldsfromexistingfields(eg.regexonrace_name forraces)
• Second,usefulbinning(eg.distance,going,season)• Lastbutnotleast,anynumberofvariablesbasedonrepresentingentityhistory(upto- butnotincluding-eachrunningdate)inonerow• Horses,Sires,Trainers,Jockeys,Owners,StallNumber• Dimensionsondimensions– eg.Trainer/Jockeycombos
– Dataprepcanbeacomputationalnightmare
© 2019
Modelswithhorseracingdata
• Inallcases,predictingawinnerisnotactuallythepoint,orevendoingitaccurately
• Wearesimplytryingtobeatthemarket
© 2019
Acoupleofdemomodel approaches
• Pickoneofourfavouritefeaturesandlookforprofitableangles(oldfashionedpuntersapproach)
• Amorecomprehensivestatisticalmodeltrainedonpastdataandappliedtonewdata
© 2019
Model#1– findingaprofitableangle
• Sometrainershavegoodstrikerates(#winners/#losers),somenotsomuch
• Allhavedifferingspecialities,butthemarkettendstofocusonaveragetrainerstrike(ifusingitatall)
• WecanusedataanalysisinRtofindniches(usingthenewfeatureswecreatedforseasonalperformance)wheretrainersoutperformtheiraverageform
• Next,andmoreimportant,wecanassesswhetherthemarketgenerallyfactorsintheirchances– ornot
• Automatingthisprocessthrowsupmanycandidates(script)
© 2019
Bettingonthemodel
• Thisissimpleenoughtobegoodforshowingofftheliveselectionprocessandusingabettor• https://github.com/phillc73/abettor/blob/master/vignettes/abettor-placeBet.Rmd
• ImportallracesandrunnersfromdailyracesanddailyrunnersinSmartform
• Findallhorsesrunningtodaybymergingagainstourtrainerhotlist
• Usetheconveniencedailybetfair tablessuppliedinSmartform tolookupbetfair references
• Loopthrougheachselection,andinvokeabettortobetatcurrentmarketprices
© 2019
Model#2– arealmodel!
• Manyapproachestopredictingoutcome• Binomial(winnerorbust)• Multinomial(win,show,place,lose)• Continuousregression(transformfinishpositiontoapointalonganinterval)
• Modeltypes:Rhasafew- takeyourpick!• Withthismodel,Jayappliesclassicregression,basedoncontinuousresponsevariable
© 2019
Aboutthe(Linear)Model
• Response:atransformationofthefinishingposition(1,2,…,kforak-horserace)totheunitinterval[0,1].
• Predictors(examples):– Thehorse’spastwinpercentage– Thejockey’swinpercentageinthelastmonthandoverthelastyear(i.e.isthejockey“inform”)
– Thetrainer’swinpercentageinthelastmonth…– Thestallposition,or“effectofthedraw”– …
© 2019
Disclaimers(fromJay)!
• Weviolatetheindependenceassumption• Weviolatethe“normalerrors”assumption• … but…• Inapracticalsense,itdoesn’tmattermuch--Yes,alittlesecretofreal-worlddataanalysis!
• Wecouldspendalotoftimeworryingaboutthisandtryingtoimproveit… andthepayoffwouldbenegligible.
© 2019
NotesonPrediction(fromJay)
• Predictingwinprobabilitiesdoneviasimulation.Why?– Becausethemodelpredictedresponseislikeanunobserved“strengthofperformance”(latentvariable)… whilewe’reinterestedinrankings:Whowins?Whocomesinsecond,etc…
– … andofcoursethereisahugeamountofvariability(the“errors”)inreal-worldproblemslikethisone!
– Andit’sreallyeasytopulloffwithoutmakingseriousmistakes.
© 2019
Predictions!?January15,2019:UpcomingKemptonRace – 7:45pm
© 2019
NamePredictedFinish
PredictedProb(Win)
PredictedProb(2nd)
PredictedProb(3rd)
PredictedDec Odds
SixStrings 1 21.9 17.6 14.8 4.6
MrGent 2 17.4 15.7 14.4 5.7
InTheRed 3 16.7 15.3 14.3 6
StormMelody 4 11.6 12.3 12.7 8.6
MountWellington 5 10.7 11.9 12.4 9.4
PearlSpectre 6 9.5 11 12 10.5
Wilson 7 6.7 8.6 10.2 14.9
BlueCandy 8 5.4 7.6 9.2 18.4
Whatnext?Gambler’sRuin• Inbetting,mostwagershaveanoveralllosingexpectation–
evenifwearebeatingthemarketperse• Aplayerwithfiniteresourceswillinevitablygobrokeunless
thereiscareful:– Riskmanagement– Bankmanagement
• Theobjectiveistooptimise thesizeofthestakerelativetothewinningexpectationinthebetandthesizeofthebank.
• What’sneeded?Copiousbacktesting andastakingstrategy• PlentyofstrategiessuchasKellyCriterion,andpackages
fromourfriendsinRfinance thatcanhelp,eg– PerformanceAnalytics
© 2019
Thanks!
• Lookoutfor‘RisforRacing’– sometimein2019!
– Blog: https://blog.betwise.net/– Q&A: https://answers.betwise.net/– Contact: https://www.betwise.co.uk/contact
© 2019