
QF 624: Machine Learning for Financial Applications

Automated Pattern Recognition, HMM, NLP

Master of Science in Quantitative Finance, Lee Kong Chian School of Business

Saurabh Singal, July 2018

Pareidolia

Pareidolia - seeing faces in a cloud (the "face on Mars").

Wikipedia defines pareidolia as "a psychological phenomenon in which the mind responds to a stimulus, usually an image or a sound, by perceiving a familiar pattern where none exists."

Example 13: Automated Pattern Recognition

Osler, Carol L. & P. H. Kevin Chang, "Head and Shoulders: Not Just a Flaky Pattern", Staff Report No. 4, Federal Reserve Bank of New York.

Abstract: "This paper evaluates rigorously the predictive power of the head-and-shoulders pattern as applied to daily exchange rates. Though such visual, nonlinear chart patterns are applied frequently by technical analysts, our paper is one of the first to evaluate the predictive power of such patterns. We apply a trading rule based on the head-and-shoulders pattern to daily exchange rates of major currencies versus the dollar during the floating rate period (from March 1973 to June 1994)."

Head and Shoulders: Not Just a Flaky Pattern

• "We identify head-and-shoulders patterns using an objective, computer-implemented algorithm based on criteria in published technical analysis manuals. The resulting profits, replicable in real time, are then compared with the distribution of profits for 10,000 simulated series generated with the bootstrap technique under the null hypothesis of a random walk."

• "Result: The trading rule has predictive power for 2 out of 6 FX crosses. If all 6 were to be traded, there would be economically and statistically significant profits. The results are robust to changes in the parameters of the identification algorithm as well as the sample period."

The Head-and-Shoulders Pattern

Source: "Head and Shoulders: Not Just a Flaky Pattern", Carol Osler & Kevin Chang

Andrew Lo: Technical Analysis

Lo, Andrew W., Harry Mamaysky and Jiang Wang, "Foundations of Technical Analysis: Computational Algorithms, Statistical Inference and Empirical Implementation", Journal of Finance 55 (2000), 1705-1765.

Abstract: "Technical analysis, also known as 'charting', has been a part of financial practice for many decades, but this discipline has not received the same level of academic scrutiny and acceptance as more traditional approaches such as fundamental analysis. One of the main obstacles is the highly subjective nature of technical analysis: the presence of geometric shapes in historical price charts is often in the eyes of the beholder."

Andrew Lo: Technical Analysis (2)

Abstract (contd.): "In this paper, we propose a systematic and automatic approach to technical pattern recognition using non-parametric kernel regression, and apply this method to a large number of U.S. stocks from 1962 to 1996 to evaluate the effectiveness of technical analysis. By comparing the unconditional empirical distribution of daily stock returns to the conditional distribution, conditioned on specific technical indicators such as head-and-shoulders or double-bottoms, we find that over the 31-year sample period, several technical indicators do provide incremental information and may have some practical value."

Kernel Regression and Patterns

• The following patterns were analyzed on 3 different stock indices and 4 ETFs.

The Origins of Markov Chain Theory

• Suppose you are given a body of text and asked to guess whether the letter at a randomly selected position is a vowel or a consonant. Since consonants occur more frequently than vowels, your best bet is to always guess consonant.

• Suppose we decide to be a little more helpful and tell you whether the letter preceding the one you chose is a vowel or a consonant. Is there now a better strategy you can follow?

• In 1913, to answer this question, A. A. Markov analysed twenty thousand letters from Pushkin's poem Eugene Onegin. He found that 43% of the letters were vowels and 57% consonants. So in the first problem, one should always guess "consonant" and can hope to be correct 57% of the time.

Pushkin's Poetry and Markov Chains

• However, a vowel was followed by a consonant 87% of the time, and a consonant was followed by a vowel 66% of the time. Hence, guessing the opposite of the preceding letter would be a better strategy in the second case. Clearly, knowledge of the preceding letter is helpful.

• The real insight came when Markov took the analysis a step further: he investigated whether knowledge of the preceding two letters confers any additional advantage. He found that there was no significant advantage to knowing the additional preceding letter. This leads to the central idea of a Markov chain: while successive outcomes are not independent, only the most recent outcome is of use in making a prediction about the next outcome.
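A minimal sketch of Markov's counting exercise in Python (the text string is a short stand-in, not Pushkin's actual corpus):

from collections import Counter

text = "onegin was a young man of fashion"                  # stand-in for Pushkin's text
labels = ['V' if c in 'aeiou' else 'C' for c in text if c.isalpha()]

# Marginal frequencies of vowels vs. consonants
unigrams = Counter(labels)
total = sum(unigrams.values())
print({k: round(v / total, 2) for k, v in unigrams.items()})

# Conditional frequencies: class of the next letter given the current one
bigrams = Counter(zip(labels, labels[1:]))
for prev in 'VC':
    row_total = sum(v for (a, _), v in bigrams.items() if a == prev)
    print(prev, '->', {nxt: round(bigrams[(prev, nxt)] / row_total, 2) for nxt in 'VC'})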

Hidden Markov Models and Regime-Specific Strategies

• We know that the stock market not only changes direction (bull versus bear markets) but also changes volatility.

• Calm, peaceful periods of low volatility are punctuated by turbulent periods; occasionally there are episodes of panic.

• A 4-state HMM can be fit (peaceful, stable, turbulent, panic-riven). Aggressive or conservative strategies and portfolio mixes can be appropriate in different states. See Appendix 4: Hidden Markov Models.
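A minimal sketch (not the proprietary model referenced in Example 14) of fitting a 4-state Gaussian HMM to daily returns with the hmmlearn package; the returns series here is simulated placeholder data:

import numpy as np
from hmmlearn.hmm import GaussianHMM

returns = np.random.normal(0, 0.01, size=(1000, 1))    # placeholder for real daily returns

hmm = GaussianHMM(n_components=4, covariance_type="diag", n_iter=200, random_state=0)
hmm.fit(returns)

states = hmm.predict(returns)            # most likely regime for each day (Viterbi path)
print(hmm.means_.ravel())                # mean return in each regime
print(np.sqrt(hmm.covars_).ravel())      # volatility in each regime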

Example 14: HMM Improves Strategy Returns


Trading Signals enhanced by Proprietary HMM

Andrew Viterbi

Andrew Viterbi is an American electrical engineer.

§ Invented the Viterbi algorithm, a dynamic-programming-based algorithm for finding the most likely sequence of hidden states that could have resulted in the sequence of observed events.

§ Helped develop the CDMA standard for cell phones.

§ Co-founded Qualcomm Inc.

§ The University of Southern California's Viterbi School of Engineering was named in his honor in 2004 in recognition of his $52 million gift.

Natural Language Processing

• Linguistics is the scientific study of language, including its grammar, semantics, and phonetics. Classical linguistics involved devising and evaluating rules of language.

• Computational linguistics is the modern study of linguistics using the tools of computer science.

• The examples for sentiment analysis using bag-of-words / word embeddings are from the book Deep Learning for NLP by Jason Brownlee (and his blog at machinelearningmastery.com).

See Appendix 5: Neural Networks; Appendix 6: Backpropagation explained in depth.

Stop Words

• A stop word is a commonly used word (such as the, a, is, are, just) that provides little context.

• Stop words are removed during a text clean-up or pre-processing stage because they provide more noise than signal for a computational-linguistics analysis project.
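A minimal sketch of stop-word removal using NLTK's English stop-word list:

import nltk
nltk.download('stopwords', quiet=True)
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
tokens = "this is just a simple example of a movie review".split()
filtered = [w for w in tokens if w not in stop_words]
print(filtered)    # ['simple', 'example', 'movie', 'review']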

Why is a Vocabulary Defined?

• Defining a vocabulary of known or preferred words is important in any natural language processing task.

• The larger the vocabulary, the larger the dimension of the vector space, and the larger the representation of each document.

• It is more efficient to select only those words that are believed to have predictive power or informational content.

Bag-of-Words Model

• Bag-of-Words (BoW) is a representation of text data used for document classification and feature extraction.

• Text modelling is complicated by the fact that machine learning algorithms cannot work on raw text directly; we need to represent text as numeric vectors.

• We make a vocabulary of known words and then compute a score for each word; that is, each word count is a feature.

Bag of Words - 2

• One simple intuition is that two documents are similar if they have similar words.

• We can learn something about the meaning of a document by examining these word scores.

• An n-gram is an n-token sequence of words: a 2-gram or bigram is a two-word sequence like "how are", and a 3-gram or trigram is a three-word sequence like "how are you".

• It is called a bag-of-words because any information about the order or structure of words in the document is discarded.

• One limitation of Bag of Words is that word order is not important; thus we cannot infer meaning from context.

• For example: "This is important" vs "Is this important"; "Good" vs "not Good".
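A minimal sketch of a bag-of-words encoding with scikit-learn's CountVectorizer, including bigrams so that "not good" and "good" become distinct features (the toy documents are illustrative only):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["this is important", "is this important", "good movie", "not good movie"]

vectorizer = CountVectorizer(ngram_range=(1, 2))    # unigrams and bigrams
X = vectorizer.fit_transform(docs)                  # sparse document-term matrix

print(vectorizer.get_feature_names_out())           # the learned vocabulary
print(X.toarray())                                  # one count vector per document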

Word Scoring Schemes used for Text Encoding by the Tokenizer

• Binary - words are marked as present (1) or absent (0).

• Count - the occurrence count for each word is recorded as an integer.

• Frequency - words are scored based on their frequency of occurrence within the document.

• TF-IDF - each word is scored based on its frequency, and words that are common across all documents are penalized.
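These four schemes correspond to the texts_to_matrix scoring modes of the Keras Tokenizer; a minimal sketch on a toy corpus:

from tensorflow.keras.preprocessing.text import Tokenizer

docs = ["good good movie", "bad movie", "not good"]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(docs)                 # build the vocabulary

for mode in ['binary', 'count', 'freq', 'tfidf']:
    print(mode)
    print(tokenizer.texts_to_matrix(docs, mode=mode))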

Term Frequency - Inverse Document Frequency: TF-IDF

• A problem with scoring word frequency is that highly frequent words start to dominate in the document, but they may not contain as much informational content.

• One approach is to rescale the frequency of words by how often they appear in all documents, so that the scores for frequent words like "the" or "that", which are also frequent across all documents, are penalized. This approach to scoring is called Term Frequency - Inverse Document Frequency, or TF-IDF for short, where:

– Term Frequency: a scoring of the frequency of the word in the current document.

– Inverse Document Frequency: a scoring of how rare the word is across documents.
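A minimal sketch of TF-IDF scoring with scikit-learn (the toy documents are illustrative only); note how "movie", which appears in every document, receives a lower weight than rarer words:

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["great movie", "terrible movie", "boring movie plot"]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)

print(tfidf.get_feature_names_out())
print(X.toarray().round(2))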

Introducing the Movie Review Data

• The Movie Review Data is a collection of movie reviews retrieved from the imdb.com website.

• The reviews were collected and made available by Bo Pang and Lillian Lee for research work on NLP.

• The dataset comprises 1,000 positive and 1,000 negative movie reviews and is called the polarity dataset.

• It has become a standard dataset for natural language processing and sentiment analysis research, similar to the Iris dataset for tutorials on clustering or the MNIST dataset for introductory computer vision.

Bag of Words & Movie Review Sentiment

• The essential steps are:

– Clean the data (remove punctuation and stop words, convert to lowercase, etc.)

– Create a vocabulary

– Create tokens and encode strings to numeric output

– Then score each document by the frequency of each word in the vocabulary. If there are N words in the vocabulary, each document is represented by a vector of length N, and the n-th entry is the count (or frequency) of the n-th word in the vocabulary

– Fit a neural network to the training data (a minimal sketch follows below)
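A minimal sketch of this pipeline (not the course notebook) on a toy corpus: encode cleaned reviews as bag-of-words count vectors with the Keras Tokenizer and fit a small dense network.

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense

train_docs = ["great film loved it", "terrible boring film", "wonderful acting", "awful plot"]
train_labels = np.array([1, 0, 1, 0])                      # 1 = positive, 0 = negative

tokenizer = Tokenizer()
tokenizer.fit_on_texts(train_docs)                         # the vocabulary
X_train = tokenizer.texts_to_matrix(train_docs, mode='count')

model = Sequential([
    Input(shape=(X_train.shape[1],)),
    Dense(16, activation='relu'),
    Dense(1, activation='sigmoid'),                        # positive / negative probability
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X_train, train_labels, epochs=10, verbose=0)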

Example 15: Sentiment Analysis using Bag of Words

• Let us run some Python code: smu_moview_senti_BoW.ipynb

Word Embedding Models: Google's Word2Vec, Stanford's GloVe

• A word embedding model provides a dense vector representation of words that succeeds in capturing the semantic regularities in the language.

• The vector-space representation of the words provides a projection where words with similar meanings are locally clustered within the space: words with similar meaning are represented by vectors which are close to each other.

• This is more powerful than a Bag of Words approach, which is simpler (a sparse vector representation of word counts or frequencies) and can describe documents but not the meaning of the words.

• Word2vec is a word embedding model that was developed by Tomas Mikolov at Google in 2013. Stanford University researchers (Pennington, Socher and Manning) came up with GloVe, Global Vectors for Word Representation, which we will illustrate.

Linear Algebra on Words: queen = (king - man) + woman
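As a hedged illustration (not from the course notebooks), the analogy can be queried directly with gensim's KeyedVectors; the pre-trained vector file name below is an assumption, and any word2vec-format file would do:

from gensim.models import KeyedVectors

# assumed pre-trained vectors in word2vec format (file name is an example)
kv = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

# queen ≈ (king - man) + woman
print(kv.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))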

CNN for Sentence Classification

§ "Convolutional Neural Networks for Sentence Classification" by Yoon Kim, New York University, describes how to use CNNs in NLP tasks.

§ Yoon Kim used an architecture which is a slight variant of the one in "Natural Language Processing (Almost) from Scratch" by R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu and P. Kuksa, Journal of Machine Learning Research, 2011.

CNN for Sentence Classification - 2

§ All the sentences are mapped to embedding vectors and then used as inputs.

§ Convolution operations on the inputs, using kernels of different sizes, are used to extract feature maps.

§ The feature maps are used as inputs to a max-pooling layer.

§ Finally, a fully-connected layer with dropout & softmax output is used for making the final prediction.
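A minimal Keras sketch of this kind of architecture (the vocabulary size, sequence length, embedding dimension and filter counts are assumptions, not values from the paper):

from tensorflow.keras.layers import (Input, Embedding, Conv1D, GlobalMaxPooling1D,
                                     Concatenate, Dropout, Dense)
from tensorflow.keras.models import Model

vocab_size, seq_len, embed_dim = 10000, 100, 100      # assumed sizes

inputs = Input(shape=(seq_len,))
x = Embedding(vocab_size, embed_dim)(inputs)          # map token ids to embedding vectors

# one convolutional branch per kernel size, each followed by max-pooling over time
branches = [GlobalMaxPooling1D()(Conv1D(100, k, activation='relu')(x)) for k in (3, 4, 5)]

merged = Dropout(0.5)(Concatenate()(branches))
outputs = Dense(2, activation='softmax')(merged)      # final prediction with softmax output

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()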

Example 16: Sentiment Analysis using Word Embedding + CNN Model

§ We will use the Movie Review Polarity Dataset.

§ As before, we will clean the dataset by stripping out punctuation and unnecessary whitespace, converting upper case to lower case, and discarding stop words and any words with non-alphabetic characters.

§ We will also define a vocabulary (of preferred words).

Example 16: Sentiment Analysis using Word Embedding + CNN Model - 2

We will use a three-component architecture:

§ Word Embedding: a vector-space representation of each word; words which have similar meaning in the language are also "close" in this representation.

§ Convolution Model: a CNN which will be used for feature extraction, effectively extracting meaningful sub-structures that are useful in the overall prediction task.

§ Fully Connected Model: this interprets the features extracted by the CNN and makes predictions (note that the CNN and the fully connected layer that interprets its features can be inside one neural network).

Example 16: Sentiment Analysis using Word Embedding + CNN Model - 3

We can estimate a word embeddings model and then use a CNN. Run the notebook smu_movie_senti_embeddings.ipynb.

Example 17: Sentiment Analysis using GloVe

WecanuseStanford’sGloVewhichisalreadyestimated

Run the notebook smu_movie_senti_glove.ipynb.

# load embedding from file
raw_embedding = load_embedding(os.path.join(project_folder, 'data/glove.6B/glove.6B.100d.txt'))
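The load_embedding helper above comes from the notebook; a hedged sketch of what such a loader typically does is below: parse the GloVe text file into a word-to-vector dictionary, then build a weight matrix aligned with a fitted Keras Tokenizer's word_index (the helper names and the tokenizer are assumptions, not the notebook's exact code) for use as the initial weights of an Embedding layer.

import numpy as np

def load_glove(path):
    # parse each line of the GloVe file: the word, followed by its vector components
    embedding = {}
    with open(path, encoding='utf-8') as f:
        for line in f:
            parts = line.split()
            embedding[parts[0]] = np.asarray(parts[1:], dtype='float32')
    return embedding

def build_embedding_matrix(embedding, word_index, dim=100):
    # row i holds the GloVe vector of the word with tokenizer id i (row 0 stays zero)
    matrix = np.zeros((len(word_index) + 1, dim))
    for word, i in word_index.items():
        vector = embedding.get(word)
        if vector is not None:
            matrix[i] = vector
    return matrix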

Recurrent Neural Networks (RNN)

• Recurrent Neural Networks are powerful when it comes to modeling sequences. The RNN introduces recurrence by the use of loops, which allows us to model time dependence by passing information from one step to the next.

• The RNN is therefore a good framework for connecting past information to the present, because there is time dependence.

• A Recurrent Neural Network (RNN) is able to do whatever an HMM can do.

• To summarize, RNNs are powerful in two situations: whenever temporal dependence is important, and wherever contextual information is important.

Vanishing and Exploding Gradients

• During the training of a neural network, gradient descent is used to update the network weights in the right direction and by the right amount. In each iteration of training, each network weight is updated in proportion to the partial derivative of the error function with respect to that weight.

• There can be problems if the back-propagated error either

– blows up (exploding gradient), or

– decays exponentially (vanishing gradient).

• An exploding gradient results in an unstable model (e.g., model weights become very large) and the training loss changes very rapidly at each update. Clipping the gradients at a pre-defined threshold can control exploding gradients (a minimal sketch follows below).

• A vanishing gradient prevents the weights from changing value between updates, effectively stopping the neural network from training further. Vanishing gradients are more difficult to detect and control (using ReLU instead of sigmoid activations, and using regularization, will help).

• Error-flow analysis of existing RNNs found that they were not effective at learning long time lags, and Long Short-Term Memory (LSTM) is an RNN architecture specially designed to address the vanishing gradient problem.
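A minimal sketch of gradient clipping in Keras, one way to apply the pre-defined threshold mentioned above (the threshold value is an example):

from tensorflow.keras.optimizers import Adam

# clip the gradient norm at 1.0 before each weight update
optimizer = Adam(learning_rate=0.001, clipnorm=1.0)
# model.compile(optimizer=optimizer, loss='binary_crossentropy')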

LSTM

• A special type of RNN called LSTM (Long Short-Term Memory) is used to model temporal dependence (time-series prediction) as well as handle contextual information (what is the current state of the market?).

• This allows the network to learn when to forget previous hidden states and when to update hidden states given new information.

• LSTM can be used for sequence predictions.

• LSTM is powerful in speech recognition, machine translation and (when combined with CNNs) image captioning.
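A minimal sketch of an LSTM sequence classifier in Keras (e.g. for the movie-review sentiment task); the vocabulary size and sequence length are assumptions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

vocab_size, seq_len = 10000, 100          # assumed sizes

model = Sequential([
    Input(shape=(seq_len,)),
    Embedding(vocab_size, 100),           # one 100-d vector per token
    LSTM(64),                             # hidden state carries context across time steps
    Dense(1, activation='sigmoid'),       # positive / negative sentiment
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()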

RNN: "The Unreasonable Effectiveness of Recurrent Neural Networks" by Andrej Karpathy

Let us read from this excellent article: What makes Recurrent Networks so special? A glaring limitation of Vanilla Neural Networks (and also Convolutional Networks) is that their API is too constrained:

o They accept a fixed-sized vector as input (e.g. an image) and produce a fixed-sized vector as output (e.g. probabilities of different classes).

o These models perform this mapping using a fixed amount of computational steps (e.g. the number of layers in the model).

o The core reason that recurrent nets are more exciting is that they allow us to operate over sequences of vectors: sequences in the input, the output, or in the most general case both.

Let's look at examples that may make this more concrete.

RNN: "The Unreasonable Effectiveness of Recurrent Neural Networks" by Andrej Karpathy

Each rectangle is a vector and arrows represent functions (e.g. matrix multiply). Input vectors are in red, output vectors are in blue, and green vectors hold the RNN's state.

(1) Vanilla mode of processing without an RNN, from fixed-sized input to fixed-sized output (e.g. image classification).

(2) Sequence output (e.g. image captioning takes an image and outputs a sentence of words).

(3) Sequence input (e.g. sentiment analysis where a given sentence is classified as expressing positive or negative sentiment).

(4) Sequence input and sequence output (e.g. Machine Translation: an RNN reads a sentence in English and then outputs a sentence in French).

(5) Synced sequence input and output (e.g. video classification where we wish to label each frame of the video).

Notice that in every case there are no pre-specified constraints on the lengths of the sequences, because the recurrent transformation (green) is fixed and can be applied as many times as we like.

J. M. Keynes

• "When the facts change, I change my mind. What do you do, sir?" (attributed to Keynes)