TRANSCRIPT
QF624: Machine Learning for Financial Applications
Automated Pattern Recognition, HMM, NLP
Master of Science in Quantitative Finance, Lee Kong Chian School of Business
Saurabh Singal, July 2018
Pareidolia
Pareidolia – seeing faces in a cloud (the face on Mars)
Wikipedia defines Pareidolia as "a psychological phenomenon in which the mind responds to a stimulus, usually an image or a sound, by perceiving a familiar pattern where none exists."
Example 13: Automated Pattern Recognition
Osler, Carol L. & P. H. Kevin Chang, "Head and Shoulders: Not Just a Flaky Pattern," Staff Report No. 4, Federal Reserve Bank of New York.
Abstract: "This paper evaluates rigorously the predictive power of the head-and-shoulders pattern as applied to daily exchange rates. Though such visual, nonlinear chart patterns are applied frequently by technical analysts, our paper is one of the first to evaluate the predictive power of such patterns. We apply a trading rule based on the head-and-shoulders pattern to daily exchange rates of major currencies versus the dollar during the floating rate period (from March 1973 to June 1994)."
Head and Shoulders: Not Just a Flaky Pattern
• "We identify head-and-shoulders patterns using an objective, computer-implemented algorithm based on criteria in published technical analysis manuals. The resulting profits, replicable in real-time, are then compared with the distribution of profits for 10,000 simulated series generated with the bootstrap technique under the null hypothesis of a random walk."
• "Result: The trading rule has predictive power for 2 out of 6 FX crosses. If all 6 were to be traded, there would be economically and statistically significant profits. The results are robust to changes in the parameters of the identification algorithm as well as the sample period."
Andrew Lo: Technical Analysis
Lo, Andrew W., Harry Mamaysky and Jiang Wang, "Foundations of Technical Analysis: Computational Algorithms, Statistical Inference and Empirical Implementation," Journal of Finance 55 (2000), 1705–1765.
Abstract: Technical analysis, also known as "charting", has been a part of financial practice for many decades, but this discipline has not received the same level of academic scrutiny and acceptance as more traditional approaches such as fundamental analysis. One of the main obstacles is the highly subjective nature of technical analysis: the presence of geometric shapes in historical price charts is often in the eyes of the beholder.
Andrew Lo: Technical Analysis (2)
Abstract (contd.): In this paper, we propose a systematic and automatic approach to technical pattern recognition using non-parametric kernel regression, and apply this method to a large number of U.S. stocks from 1962 to 1996 to evaluate the effectiveness of technical analysis. By comparing the unconditional empirical distribution of daily stock returns to the conditional distribution, conditioned on specific technical indicators such as head-and-shoulders or double-bottoms, we find that over the 31-year sample period, several technical indicators do provide incremental information and may have some practical value.
The Origins of Markov Chain Theory
• Suppose you are given a body of text and asked to guess whether the letter at a randomly selected position is a vowel or a consonant. Since consonants occur more frequently than vowels, your best bet is to always guess consonant.
• Suppose we decide to be a little more helpful and tell you whether the letter preceding the one you chose is a vowel or a consonant. Is there now a better strategy you can follow?
• In 1913, A. A. Markov, trying to answer this question, analysed twenty thousand letters from Pushkin's poem Eugene Onegin. He found that 43% of the letters were vowels and 57% consonants. So in the first problem, one should always guess "consonant" and can hope to be correct 57% of the time.
Pushkin's Poetry and Markov Chains
• However, a vowel was followed by a consonant 87% of the time, and a consonant was followed by a vowel 66% of the time. Hence, guessing the opposite of the preceding letter would be a better strategy in the second case. Clearly, knowledge of the preceding letter is helpful.
• The real insight came when Markov took the analysis a step further. Markov investigated whether knowledge about the preceding two letters confers any additional advantage. He found that there was no significant advantage to knowing the additional preceding letter. This leads to the central idea of a Markov chain: while the successive outcomes are not independent, only the most recent outcome is of use in making a prediction about the next outcome.
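The gain from conditioning on the preceding letter can be checked with a short calculation. The following Python snippet is a minimal sketch using only the frequencies quoted above; it shows the expected accuracy rising from 57% to roughly 75%.

# Frequencies Markov reported for Eugene Onegin.
p_vowel = 0.43                 # unconditional probability of a vowel
p_cons_after_vowel = 0.87      # P(consonant | preceding letter is a vowel)
p_vowel_after_cons = 0.66      # P(vowel | preceding letter is a consonant)

# Strategy 1: always guess "consonant".
acc_unconditional = 1 - p_vowel                          # 0.57

# Strategy 2: guess the opposite of the preceding letter.
acc_conditional = (p_vowel * p_cons_after_vowel
                   + (1 - p_vowel) * p_vowel_after_cons) # about 0.75

print(acc_unconditional, acc_conditional)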
Hidden Markov Models and Regime-Specific Strategies
• We know that the stock market not only changes direction (bull versus bear markets) but also changes volatility.
• Calm, peaceful periods of low volatility are punctuated by turbulent periods; occasionally there are episodes of panic.
• A 4-state HMM can be fit (peaceful, stable, turbulent, panic-riven). Aggressive/conservative strategies and portfolio mixes can be appropriate in different states. See Appendix 4: Hidden Markov Models.
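A minimal sketch of fitting such a 4-state model, assuming the hmmlearn package and synthetic returns standing in for a real return series (all variable names are illustrative):

import numpy as np
from hmmlearn.hmm import GaussianHMM

# Synthetic daily returns standing in for real market data.
returns = np.random.default_rng(0).normal(0.0, 0.01, size=(1000, 1))

# Fit a 4-state Gaussian HMM (peaceful / stable / turbulent / panic).
model = GaussianHMM(n_components=4, covariance_type="full", n_iter=100)
model.fit(returns)

# Decode the most likely regime for each day and inspect per-state
# volatilities to attach labels to the states.
regimes = model.predict(returns)
print([float(np.sqrt(model.covars_[i][0, 0])) for i in range(4)])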
Andrew Viterbi
§ Andrew Viterbi is an American electrical engineer.
§ Invented the Viterbi algorithm, a dynamic-programming-based algorithm for finding the most likely sequence of hidden states that could have resulted in the sequence of observed events.
§ Helped develop the CDMA standard for cell phones.
§ Co-founded Qualcomm Inc.
§ The University of Southern California's Viterbi School of Engineering was named in his honor in 2004 in recognition of his $52 million gift.
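A compact sketch of the Viterbi algorithm itself, the dynamic program described above; the initial, transition and emission probabilities in the toy example are illustrative values, not from any real model.

import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path given an observation sequence.
    pi: initial state probabilities, A: state transition matrix,
    B: emission matrix (states x observation symbols)."""
    T, n = len(obs), len(pi)
    delta = np.zeros((T, n))           # best log-prob of a path ending in each state
    psi = np.zeros((T, n), dtype=int)  # back-pointers
    delta[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(A)  # scores[i, j]: state i -> state j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):      # follow the back-pointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Toy example: two hidden states, three observation symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2], pi, A, B))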
Natural Language Processing
• Linguistics is the scientific study of language, including its grammar, semantics, and phonetics. Classical linguistics involved devising and evaluating rules of language.
• Computational linguistics is the modern study of linguistics using the tools of computer science.
• The examples for sentiment analysis using bag-of-words / word embeddings are from the book Deep Learning for NLP by Jason Brownlee (and his blogs at machinelearningmastery.com).
See Appendix 5: Neural Networks; Appendix 6: Backpropagation explained in depth.
Stop Words
• A stop word is a commonly used word (such as the, a, is, are, just) that provides little context.
• Stop words are removed during a text clean-up or pre-processing stage…
• …because they provide more noise than signal for a computational linguistic analysis project.
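A minimal sketch of stop-word removal, assuming NLTK's English stop-word list (nltk.download('stopwords') is needed once):

from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
text = "this is just a simple example of stop word removal"
tokens = [w for w in text.split() if w not in stop_words]
print(tokens)   # ['simple', 'example', 'stop', 'word', 'removal']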
Why is a Vocabulary Defined?
• Defining a vocabulary of known or preferred words is important in any natural language processing task.
• The larger the vocabulary, the larger the dimension of the vector space, and the larger the representation of each document.
• It is more efficient to select only those words that are believed to have predictive power or informational content.
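One common way to implement this selection (a device Brownlee's movie-review examples use) is to count word occurrences over the whole corpus and keep only words above a minimum count. A minimal sketch with an illustrative toy corpus and threshold:

from collections import Counter

# Toy corpus of already-tokenized documents.
documents = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]

counts = Counter()
for doc in documents:
    counts.update(doc)

# Keep only words that occur at least twice across the corpus.
vocab = {w for w, c in counts.items() if c >= 2}
print(vocab)   # {'good', 'movie'}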
Bag-of-Words Model
• Bag-of-Words (BoW) is a representation of text data used for document classification and feature extraction.
• The task of text modelling is more complicated because machine learning cannot work on raw text directly; we need to represent text as numeric vectors.
• We make a vocabulary of known words and then compute the score for each word; that is, each word count is a feature.
Bag of Words - 2
• One simple intuition is that two documents are similar if they have similar words.
• We can learn something about the meaning of a document by examining these word scores.
• An n-gram is an n-token sequence of words: a 2-gram or bigram is a two-word sequence of words like "how are", and a 3-gram or trigram is a three-word sequence like "how are you".
• It is called a bag-of-words because any information about the order or structure of words in the document is discarded.
• One limitation of Bag of Words is that word order is not important; thus we cannot infer meaning from context. For example:
– "This is important" vs "Is this important"
– "Good" vs "not Good"
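The order-insensitivity is easy to see with scikit-learn's CountVectorizer (a minimal sketch): the two sentences above receive identical vectors.

from sklearn.feature_extraction.text import CountVectorizer

docs = ["this is important", "is this important"]
vec = CountVectorizer()
X = vec.fit_transform(docs)
print(vec.get_feature_names_out())   # ['important' 'is' 'this']
print(X.toarray())                   # both rows are identical: order is lost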
Word Scoring Schemes used for Text Encoding by Tokenizer
• Binary - where words are marked as present (1) or absent (0).
• Count - where the occurrence count for each word is marked as an integer.
• Frequency - where words are scored based on their frequency of occurrence within the document.
• TF-IDF - where each word is scored based on its frequency, and words that are common across all documents are penalized.
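These four schemes map directly onto the mode argument of the Keras Tokenizer's texts_to_matrix method (the tokenizer used in the course notebooks); a minimal sketch:

from tensorflow.keras.preprocessing.text import Tokenizer

docs = ["good good movie", "bad movie"]
tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)           # builds the vocabulary

for mode in ("binary", "count", "freq", "tfidf"):
    print(mode)
    print(tokenizer.texts_to_matrix(docs, mode=mode))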
Term Frequency-Inverse Document Frequency: TF-IDF
• A problem with scoring word frequency is that highly frequent words start to dominate in the document, but they may not contain as much informational content.
• One approach is to rescale the frequency of words by how often they appear in all documents, so that the scores for frequent words like "the" or "that", which are also frequent across all documents, are penalized. This approach to scoring is called Term Frequency-Inverse Document Frequency, or TF-IDF for short, where:
– Term Frequency: a scoring of the frequency of the word in the current document.
– Inverse Document Frequency: a scoring of how rare the word is across documents.
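A minimal sketch with scikit-learn's TfidfVectorizer (note that its IDF formula adds smoothing terms, a detail beyond the slide): the word "the", present in every document, receives the lowest IDF.

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat", "the dog sat", "the dog barked"]
vec = TfidfVectorizer()
vec.fit(docs)
# "the" appears in all three documents, so its IDF is the smallest.
print(dict(zip(vec.get_feature_names_out(), vec.idf_.round(2))))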
Introducing the Movie Review Data
• The Movie Review Data is a collection of movie reviews retrieved from the imdb.com website.
• The reviews were collected and made available by Bo Pang and Lillian Lee for research work on NLP.
• The dataset comprises 1,000 positive and 1,000 negative movie reviews and is called the polarity dataset.
• It has become a standard dataset for natural language processing and sentiment analysis research, similar to the Iris dataset for tutorials on clustering or the MNIST dataset for introductory computer vision.
Bag of Words & Movie Review Sentiment
• The essential steps are:
– Clean the data (remove punctuation and stop words, convert to lowercase, etc.).
– Create a vocabulary.
– Create tokens and encode strings to numeric output.
– Then score each document by the frequency of each word in the vocabulary. If there are N words in the vocabulary, each document is represented by a vector of length N, where the n-th entry is the count (or frequency) of the n-th word in the vocabulary.
– Fit a neural network to the training data (a toy end-to-end sketch follows this list).
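A toy end-to-end sketch of these steps in Keras; two made-up reviews stand in for the cleaned corpus, while the course notebook works with the full polarity dataset.

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Toy cleaned reviews and labels (1 = positive, 0 = negative).
train_docs = ["good fun film", "terrible boring film"]
train_labels = np.array([1, 0])

tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_docs)                       # vocabulary
X = tokenizer.texts_to_matrix(train_docs, mode="freq")   # document vectors

model = Sequential([
    Dense(16, activation="relu", input_shape=(X.shape[1],)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, train_labels, epochs=10, verbose=0)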
Word Embedding Models: Google's Word2Vec, Stanford's GloVe
• A word embedding model provides a dense vector representation of words that succeeds in capturing the semantic regularity in the language.
• The vector space representation of the words provides a projection where words with similar meanings are locally clustered within the space; that is, words with similar meanings are represented by vectors which are close to each other.
• This is more powerful than a Bag of Words approach, which is simple (a sparse vector representation of word counts or frequencies) and can describe documents but not the meaning of the words.
• Word2vec is a word embedding model that was developed by Tomas Mikolov at Google in 2013. Stanford University researchers (Pennington, Socher and Manning) came up with GloVe, Global Vectors for Word Representation, which we will illustrate.
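The semantic regularity can be probed with gensim's downloadable copy of the Stanford GloVe vectors (the model name below is as listed in gensim-data, and the words queried are assumed to be in its vocabulary); the classic king - man + woman analogy lands near "queen". A minimal sketch:

import gensim.downloader as api

# Downloads the 100-dimensional GloVe vectors on first use.
glove = api.load("glove-wiki-gigaword-100")
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
print(glove.similarity("stock", "share"))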
CNN for Sentence Classification
§ "Convolutional Neural Networks for Sentence Classification" by Yoon Kim, New York University, describes how to use CNNs in NLP tasks.
§ Yoon Kim used an architecture which is a slight variant of the one in "Natural Language Processing (Almost) from Scratch" by R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, P. Kuksa, Journal of Machine Learning Research 2011.
CNN for Sentence Classification - 2
§ All the sentences are mapped to embedding vectors and then used as inputs.
§ Convolution operations on the inputs using kernels of different sizes are used to extract feature maps.
§ The feature maps are used as inputs to a max-pooling layer.
§ Finally, a fully-connected layer with dropout & softmax output is used for making the final prediction.
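A minimal Keras sketch of this architecture with a single kernel size (Kim's model runs several kernel sizes in parallel; all dimensions below are illustrative, and a sigmoid output stands in for the softmax since the movie data has two classes):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Embedding, Conv1D, GlobalMaxPooling1D,
                                     Dropout, Dense)

vocab_size, embed_dim, max_len = 10000, 100, 200   # illustrative sizes

model = Sequential([
    Embedding(vocab_size, embed_dim, input_length=max_len),  # word embeddings
    Conv1D(filters=128, kernel_size=5, activation="relu"),   # feature maps
    GlobalMaxPooling1D(),                                    # max-pooling layer
    Dropout(0.5),
    Dense(1, activation="sigmoid"),                          # final prediction
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()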
Example 16: Sentiment Analysis using Word Embedding + CNN Model
§ We will use the Movie Review Polarity Dataset.
§ As before, we will clean the dataset by stripping it of punctuation and unnecessary whitespace, converting upper case to lower case, and discarding stop words and any words with non-alphabetic characters.
§ We will also define a vocabulary (of preferred words).
Example 16: Sentiment Analysis using Word Embedding + CNN Model - 2
We will use a three-component architecture:
§ Word Embedding: a vector space representation of each word; words which have similar meanings in language are also "close" in this representation.
§ Convolution Model: a CNN which will be used for feature extraction, effectively extracting meaningful sub-structures that are useful in the overall prediction task.
§ Fully Connected Model: this interprets the features extracted by the CNN and makes predictions (note that the CNN and the fully connected layers that interpret the features and make predictions can be inside one neural network).
Example 16: Sentiment Analysis using Word Embedding + CNN Model - 3
We can estimate a word embedding model and then use a CNN. Run the notebook smu_movie_senti_embeddings.ipynb.
Example 17: Sentiment Analysis using GloVe
We can use Stanford's GloVe, which is already estimated.
Run the notebook smu_movie_senti_glove.ipynb.

# load embedding from file
raw_embedding = load_embedding(os.path.join(project_folder, 'data/glove.6B/glove.6B.100d.txt'))
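Here load_embedding is a helper defined in the notebook; a minimal sketch of what such a loader typically does with the GloVe text format (one word followed by its vector components per line) might look like this:

import numpy as np

def load_embedding(filename):
    """Parse a GloVe .txt file into a {word: vector} dictionary."""
    embedding = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            embedding[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embedding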
Recurrent Neural Networks (RNN)
• Recurrent Neural Networks are powerful when it comes to modeling sequences. The RNN introduces recurrence by the use of loops, which allows us to model time dependence by passing information from one step to the next.
• The RNN is therefore a good framework for connecting past information to the present, because there is time dependence.
• A Recurrent Neural Network (RNN) is able to do whatever an HMM can do.
• To summarize, an RNN is powerful in two situations:
– whenever temporal dependence is important;
– wherever contextual information is important.
Vanishing and Exploding Gradients
• During the process of training a neural network, gradient descent is utilized to update the network weights in the right direction and by the right amount. In each iteration of training, network weights are updated in proportion to the partial derivative of the error function with respect to the current weight.
• There can be problems if the back-propagated error either
– blows up (exploding gradient) or
– decays exponentially (vanishing gradient).
• An exploding gradient results in an unstable model (e.g., model weights become very large) and the training loss changes very rapidly at each update. Clipping the gradients at a pre-defined threshold can control exploding gradients (a minimal sketch follows this slide).
• A vanishing gradient prevents the weights from changing their values between updates, effectively stopping the neural network from further training. Vanishing gradients are more difficult to detect and control (utilizing ReLU instead of sigmoid activations and using regularization will help).
• Error flow analysis in existing RNNs found that long time lags were not accessible to existing architectures, and Long Short-Term Memory (LSTM) is an RNN architecture specially designed to address the vanishing gradient problem.
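In Keras, the gradient clipping mentioned above is a one-line change on the optimizer (a minimal sketch):

from tensorflow.keras.optimizers import SGD

# Clip the gradient norm at 1.0; clipvalue=0.5 would instead clip each
# gradient element at +/-0.5.
opt = SGD(learning_rate=0.01, clipnorm=1.0)
# model.compile(optimizer=opt, loss="mse")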
LSTM
• A special type of RNN called LSTM (Long Short-Term Memory) is used to model temporal dependence (time series prediction) as well as handle contextual information (what is the current state of the market).
• This allows the network to learn when to forget previous hidden states and when to update hidden states given new information.
• LSTM can be used for sequence predictions.
• LSTM is powerful in speech recognition, machine translation and (when combined with CNNs) for image captioning.
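A minimal Keras LSTM sketch for sequence prediction; the random data and all shapes below are illustrative, not from the course material.

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Predict the next value from a window of 10 past observations.
X = np.random.rand(100, 10, 1)      # (samples, timesteps, features)
y = np.random.rand(100, 1)

model = Sequential([
    LSTM(32, input_shape=(10, 1)),  # gated memory cells cope with long lags
    Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)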
RNN: "The Unreasonable Effectiveness of Recurrent Neural Networks" by Andrej Karpathy
Let us read from this excellent article: What makes Recurrent Networks so special? A glaring limitation of Vanilla Neural Networks (and also Convolutional Networks) is that their API is too constrained:
o They accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes).
o These models perform this mapping using a fixed amount of computational steps (e.g. the number of layers in the model).
o The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: sequences in the input, the output, or in the most general case both.
Let's look at examples that may make this more concrete.
RNN: "The Unreasonable Effectiveness of Recurrent Neural Networks" by Andrej Karpathy (2)
Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state.
(1) Vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output (e.g. image classification).
(2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words).
(3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment).
(4) Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French).
(5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video).
Notice that in every case there are no pre-specified constraints on the lengths of the sequences, because the recurrent transformation (green) is fixed and can be applied as many times as we like.