Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-12-14
TRANSCRIPT
© 2015 ligaDATA, Inc. All Rights Reserved.
Using Deep Learning to do Real-Time Scoring in Practical Applications
Deep Learning Applications Meetup, Monday, 12/14/2015, Mountain View, CA http://www.meetup.com/Deep-Learning-Applications/events/227217853/ By Greg Makowski www.Linkedin.com/in/GregMakowski [email protected]
Community @ http://Kamanja.org
Try out
Deep Learning - Outline
• Big Picture of 2016 Technology
• Neural Net Basics
• Deep Network Configurations for Practical Applications
  – Auto-Encoder (i.e. data compression or Principal Components Analysis)
  – Convolutional (shift invariance in time or space, for voice, image or IoT)
  – Real-Time Scoring and Lambda Architecture
  – Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja)
  – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT)
  – Continuous Space Word Models (i.e. word2vec)
Advantages of a Net over Regression

[Scatter plot: target values for data points with source field values graphed by "field1" and "field2", showing ONE target field with values of $ or c. A Regression Solution: "Linear", fit one line.]
https://en.wikipedia.org/wiki/Regression_analysis
Advantages of a Net over Regression

[Same scatter plot: A Neural Net Solution, "Non-Linear". Several regions which are not adjacent; hidden nodes can be a line or a circle.]
https://en.wikipedia.org/wiki/Artificial_neural_network
A Comparison of a Neural Net and Regression

A logistic regression formula:
  Y = f(a0 + a1*X1 + a2*X2 + a3*X3)     (a* are coefficients)

Backpropagation, cast in a similar form:
  H1 = f(w0 + w1*I1 + w2*I2 + w3*I3)
  H2 = f(w4 + w5*I1 + w6*I2 + w7*I3)
   :
  Hn = f(w8 + w9*I1 + w10*I2 + w11*I3)
  O1 = f(w12 + w13*H1 + .... + w15*Hn)
  On = ....

w* are weights, AKA coefficients.
I1..In are input nodes or input variables.
H1..Hn are hidden nodes, which extract features of the data.
O1..On are the outputs, which group disjoint categories.
Look at the ratio of training records vs. free parameters (complexity, regularization).

[Diagram: regression drawn as a one-layer net with bias a0, coefficients a1 a2 a3, inputs X1 X2 X3 and output Y; a neural net with inputs I1 I2 I3, bias, hidden nodes H1, H2, ... and an output layer, connected by weights w1, w2, w3, ...]
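The forward pass written out above can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical weight values (the slide does not give concrete numbers), using the logistic function as f:

```python
import numpy as np

def f(z):
    """Logistic activation, the f() in the slide's formulas."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights for a 3-input, 2-hidden, 1-output net
W1 = np.array([[0.5, -0.2, 0.1],
               [0.3,  0.8, -0.5]])   # input -> hidden
b1 = np.array([0.1, -0.1])           # hidden biases (the w0, w4, ... terms)
W2 = np.array([[1.2, -0.7]])         # hidden -> output
b2 = np.array([0.05])

def forward(x):
    h = f(W1 @ x + b1)   # hidden nodes extract features of the data
    return f(W2 @ h + b2)

y = forward(np.array([1.0, 2.0, 3.0]))
```

With one hidden node and no hidden layer this collapses to exactly the logistic regression formula, which is the comparison the slide is making.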
Think of Separating Land vs. Water

• 1 line, Regression (more errors)
• 5 hidden nodes in a Neural Network
• Decision Tree, 12 splits (more elements, less computation)

Different algorithms use different basis functions:
• One line
• Many horizontal & vertical lines
• Many diagonal lines
• Circles

Q) What is too detailed? "Memorizing the high-tide boundary" and applying it at all times.
http://deeplearning.net/ http://www.kdnuggets.com/ http://www.analyticbridge.com/
Leading up to an Auto-Encoder

• Supervised Learning
  – Regression, Tree or Net: 50 inputs → 1 output
  – Possible nets:
    • 256 → 120 → 1
    • 256 → 120 → 5 (trees, regressions and most are limited to 1 output)
    • 256 → 120 → 60 → 1
    • 256 → 180 → 120 → 60 → 1 (start getting into training stability problems with old processes)
• Unsupervised Learning
  – Clustering (traditional unsupervised):
    • 60 inputs (no target); produce 1-2 new fields (cluster ID & distance)
Auto-Encoder (like data compression)
Relate input to output, through a compressed middle

• Supervised Learning
  – Regression, Tree or Net: 50 inputs → 1 output
  – Possible nets:
    • 256 → 120 → 1
    • 256 → 120 → 5 (trees, regressions, SVD and most are limited to 1 output)
    • 256 → 120 → 60 → 1
    • 256 → 180 → 120 → 60 → 1 (start getting long training times to stabilize, or may not finish; the BREAKTHROUGH provided by DEEP LEARNING)
• Unsupervised Learning
  – Clustering (traditional unsupervised):
    • 60 inputs (no target); produce 1-2 new fields (cluster ID & distance)
  – Unsupervised training of a net: assign (target record == input record), AUTO-ENCODING
  – Train the net in stages:
    • 256 → 180 → 256, then 180 → 120 → 180 (each stage trained to reconstruct its own input)
  • Add a supervised layer to forecast 10 target categories: ... → 120 → 10

Because of symmetry, only need to update mirrored weights once.
4 hidden layers w/ unsupervised training, 1 layer at the end w/ supervised training.
https://en.wikipedia.org/wiki/Deep_learning
Auto-Encoder: How it can be generally used to solve problems

• Add a supervised layer to forecast 10 target categories
  – 4 hidden layers trained with unsupervised training
  – 1 new layer, trained with supervised learning: ... → 10
• Outlier detection
  – The "activation" at each of the 120 output nodes indicates the "match" to that cluster or compressed feature
  – When scoring new records, can detect outliers with a process like:
    if (max_output_match < 0.333) then suspected outlier
• How is it like PCA?
  – Individual hidden nodes in the same layer are "different" or "orthogonal"
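The outlier rule above can be sketched directly. This is a hypothetical illustration, not the talk's actual scoring code: the encoder weights here are random stand-ins for a trained compression layer, and the 0.333 threshold is taken from the slide:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical trained encoder: 6 inputs compressed to 3 feature nodes.
# In the slide's example this layer would be 120 nodes wide.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(3, 6))
b = np.zeros(3)

def is_suspected_outlier(record, threshold=0.333):
    """Flag a record whose best feature 'match' is weak:
    if max_output_match < threshold, then suspected outlier."""
    activations = sigmoid(W @ record + b)
    return bool(activations.max() < threshold)
```

A record that activates none of the learned features strongly does not resemble anything seen in training, which is exactly what makes it a suspected outlier.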
How Transferable are Features in Deep Neural Networks?
http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf
Deep Learning Caused a 50% Reduction in Speech Recognition Error Rates in 4 Years

"The use of deep neural nets in production speech systems really started more like in 2011... I would estimate that from the time before deep neural nets were used until now, the error rate on production speech systems fell from about 20% down to below 10%, so more than a 50% reduction in error rate." - Jeff Dean, email to Greg, 12/13/2015

http://research.google.com/people/jeff/  Senior Fellow in the Knowledge Group, Google

[Chart: drop in speech recognition error rates; deep learning deployments started in 2011.]
The Internet of Things (IoT) is heavily signal data
http://www.datasciencecentral.com/profiles/blogs/the-internet-of-things-data-science-and-big-data
Convolutional Neural Net (CNN): Enables detecting shift-invariant patterns

In speech and image applications, patterns vary by size and can be shifted right or left. Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern.

Neural nets can be explicitly trained to provide an FFT (Fast Fourier Transform) to convert data from the time domain to the frequency domain, but typically an explicit FFT is used.
Internet of Things Signal Data

Convolutional Neural Net (CNN): Enables detecting shift-invariant patterns

In speech and image applications, patterns vary by size and can be shifted right or left. Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern. Solution: use a sliding convolution to detect the pattern.

CNNs can use very long observational windows, up to 400 ms: long context.
Convolution Neural Net: from LeNet-5

Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, Nov 1998. Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner

Director, Facebook AI Research
http://yann.lecun.com/
Convolution Neural Net (CNN)

• How is a CNN trained differently than a typical backpropagation (BP) network?
  – Parts of the training which are the same:
    • Present input record
    • Forward pass through the network
    • Backpropagate error (i.e. per epoch)
  – Different parts of training:
    • Some connections are CONSTRAINED to the same value
      – The connections for the same pattern, sliding over all input space
    • Error updates are averaged and applied equally to the one set of weight values
    • End up with the same pattern detector feeding many nodes at the next level

http://www.cs.toronto.edu/~rgrosse/icml09-cdbn.pdf  Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations, 2009
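The weight-sharing idea above can be sketched as a 1-D convolution: one small pattern detector (kernel) slid across the whole input, so every position shares a single set of weights. The kernel values and signal here are hypothetical:

```python
import numpy as np

# One shared pattern detector (kernel); hypothetical weight values.
kernel = np.array([0.2, 0.5, 0.2])

def conv1d(signal, kernel):
    """Slide the same kernel over all input positions (the CONSTRAINED,
    shared connections the slide describes)."""
    n, k = len(signal), len(kernel)
    return np.array([signal[i:i + k] @ kernel for i in range(n - k + 1)])

signal = np.array([0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0])
out = conv1d(signal, kernel)
# During training, the per-position error gradients would be averaged
# and applied once to this single shared kernel, so the detector stays
# identical everywhere in the input space.
```

The response peaks where the input best matches the kernel's pattern, regardless of where in the signal that pattern occurs: that is the shift invariance.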
Convolution Neural Net (CNN): Same Low-Level Features
http://stats.stackexchange.com/questions/146413/why-convolutional-neural-networks-belong-to-deep-learning
The Mammalian Visual Cortex is Hierarchical ("The Brain is a Deep Neural Net" - Yann LeCun)
http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf
Convolution Neural Net (CNN): Facebook example
https://gigaom.com/2014/03/18/facebook-shows-off-its-deep-learning-skills-with-deepface/
Convolution Neural Net (CNN): Yahoo + Stanford example, find a face in a picture, even upside down
http://www.dailymail.co.uk/sciencetech/article-2958597/Facial-recognition-breakthrough-Deep-Dense-software-spots-faces-images-partially-hidden-UPSIDE-DOWN.html
Convolutional Neural Nets (CNN): Robotic Grasp Detection (IoT)
http://pjreddie.com/media/files/papers/grasp_detection_1.pdf
Real-Time Scoring Optimizations

• Auto-encoding nets
  – Can grow to millions of connections, and start to get computationally expensive
  – Can reduce connections by 5% to 25+% with pruning & retraining:
    • Train with increased regularization settings
    • Drop connections with near-zero weights, then retrain
    • Drop nodes with fan-in connections which don't get used much later, such as in your predictive problem
    • Perform sensitivity analysis: delete possible input fields
• Convolutional Neural Nets
  – With large enough data, can even skip the FFT preprocessing step
  – Can use wider than 10 ms audio sampling rates for speedup
• Implement other preprocessing as lookup tables (i.e. Bayesian priors)
• Use cloud computing, do not limit to device computing
• Large models don't fit → use model or data parallelism to train
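The "drop connections with near-zero weights" step can be sketched as follows. This is a minimal illustration with a hypothetical weight matrix and threshold; in practice the net would be retrained after each pruning pass, as the slide says:

```python
import numpy as np

def prune_small_weights(W, threshold=0.05):
    """Zero out near-zero connections; the net is then retrained so the
    remaining weights compensate."""
    pruned = np.where(np.abs(W) < threshold, 0.0, W)
    dropped = int((pruned == 0).sum() - (W == 0).sum())
    return pruned, dropped

# Hypothetical 2x3 weight matrix with three weak connections
W = np.array([[0.80, -0.01,  0.30],
              [0.02,  0.60, -0.04]])
pruned, dropped = prune_small_weights(W)
# dropped == 3 (the -0.01, 0.02 and -0.04 connections)
```

Fewer connections means fewer multiply-adds per scored record, which is where the real-time speedup comes from.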
ligaDATA Real-Time Scoring: Lambda Architecture, for both Batch and Real Time

• First architecture to really define how batch and stream processing can work together
• Founded on the concepts of immutability and re-computation, with human fault tolerance
• Pre-computes the results of batch & real-time processes as a set of views, and a query layer merges the views

https://en.wikipedia.org/wiki/Lambda_architecture
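The "query layer merges the views" idea can be shown with a toy sketch. The view names and counts here are hypothetical, purely to illustrate the merge:

```python
# Lambda architecture in miniature: the batch layer precomputes views
# over the immutable master dataset, the speed layer keeps incremental
# real-time views, and the serving layer merges both at query time.

batch_view = {"user_42": 1200}    # counts from the last batch recompute
realtime_view = {"user_42": 7}    # events that arrived since that run

def query(key):
    """Serving-layer read: merge the precomputed batch view with the
    incremental real-time view."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)
```

Because the batch layer periodically recomputes everything from the immutable master dataset, a bug in the speed layer can never permanently corrupt results; that is the human fault tolerance the slide mentions.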
ligaDATA Real-Time Scoring: Lambda Architecture with Kamanja

[Diagram: real-time data and a queue feed Kamanja (decisions, transformations, enrichment, aggregations); a master dataset plus real-time views & indexing feed a serving layer, which answers queries.]

• Kamanja embraces and extends the Lambda architecture
• Transform and process messages in real time, combine messages with historical data, and compute real-time views to make real-time decisions based on the views
ligaDATA: Kamanja Technology Stack (Real-Time Computing)

• High-level languages / abstractions: Kamanja (PMML, Java or Scala consumer); PMML producers, MLlib
• Real-time streaming: Kafka, MQ, Spark*
• Compute fabric: Cloud, EC2, internal cloud
• Data store: HBase, Cassandra, InfluxDB, HDFS (create adaptors to integrate others)
• Resource management: Zookeeper, Yarn*, Mesos*
• Security: Kerberos
Deep Reinforcement Learning, Q-Learning

http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf  David Silver, Google DeepMind
https://en.wikipedia.org/wiki/Reinforcement_learning
https://en.wikipedia.org/wiki/Q-learning

Think in terms of IoT: a device agent measures and infers the user's action, maximizes future reward, and recommends to the user or system.

(Think about the IoT possibilities.)

[4 screen shots of Atari gameplay]

IoT challenge: how to replace the game score with an IoT score?

Actions: shift right fast, shift right, stay, shift left, shift left fast

Games w/ best Q-learning results: Video Pinball, Breakout, Star Gunner, Crazy Climber, Gopher
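The core of Q-learning is a single update rule, sketched below on a hypothetical 1-D "shift" task whose actions mirror the slide's paddle moves; the reward plays the role of the game score (or of an IoT score):

```python
# Minimal tabular Q-learning sketch. State and action names are
# hypothetical; a deep Q-network replaces this table with a neural net.

actions = ["shift_left_fast", "shift_left", "stay",
           "shift_right", "shift_right_fast"]
alpha, gamma = 0.1, 0.9   # learning rate, discount factor
Q = {}                    # Q[(state, action)] -> expected future reward

def update(state, action, reward, next_state):
    """Q-learning update: move Q(s,a) toward reward + gamma * max Q(s',a')."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

update(0, "shift_right", 1.0, 1)   # one observed transition
# Q[(0, "shift_right")] == 0.1 * (1.0 + 0.9*0.0 - 0.0) == 0.1
```

The agent needs nothing but (state, action, reward, next state) transitions, which is why swapping the game score for an IoT score is the only change needed to reuse the same machinery on devices.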
Continuous Space Word Models (word2vec)

• Before (a predictive "Bag of Words" model):
  – One row per document, paragraph or web page
  – Binary word space: 10k to 200k columns, one per word or phrase
    0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 ....  "This word space model is ...."
  – The "bag of words model" relates an input record to a target category
• New:
  – One row per word (word2vec), possibly per sentence (sent2vec)
  – Continuous word space: 100 to 300 columns, continuous values
    .01 .05 .02 .00 .00 .68 .01 .01 .35 .... .00  → "King"
    .00 .00 .05 .01 .49 .52 .00 .11 .84 .... .01  → "Queen"
  – The deep net training resulted in an Emergent Property:
    • Numeric geometry location relates to concept space
    • "King" - "man" + "woman" = "Queen" (math to change the gender relation)
    • "USA" - "Washington DC" + "England" = "London" (math for the capital relation)
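The vector arithmetic above can be demonstrated with toy embeddings. These 3-dimensional vectors are hypothetical stand-ins (real word2vec vectors have 100-300 dimensions and are learned, not hand-picked), constructed so the gender relation lives along the last axis:

```python
import numpy as np

# Hypothetical 3-d embeddings illustrating the analogy geometry
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "man":   np.array([0.5, 0.1, 0.1]),
    "woman": np.array([0.5, 0.1, 0.9]),
    "queen": np.array([0.9, 0.8, 0.9]),
}

def nearest(v):
    """Return the vocabulary word whose vector is closest by cosine similarity."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vec, key=lambda w: cos(vec[w], v))

result = nearest(vec["king"] - vec["man"] + vec["woman"])
# result == "queen": subtracting "man" and adding "woman" moves the
# point along the gender direction while keeping the royalty direction.
```

The same nearest-neighbor query, run against real pre-trained vectors, yields the "USA" - "Washington DC" + "England" = "London" capital relation as well.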
Continuous Space Word Models (word2vec): How to SCALE to larger vocabularies?
http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
Training Continuous Space Word Models

• How to train these models?
  – Raw data: "This example sentence shows the word2vec model training."
  – Training data (with target values underscored, and the other words as input):
    "This example sentence shows word2vec" (prune "the")
    "example sentence shows word2vec model"
    "sentence shows word2vec model training"
  – The context of the 2 to 5 prior and following words predicts the middle word
  – Deep net model architecture, data compression to 300 continuous nodes:
    • 50k binary word input vector → ... → 300 → ... → 50k word target vector
• Use pre-trained models: https://code.google.com/p/word2vec/
  – Trained on 100 billion words from Google News
  – 300-dim vectors for 3 million words and phrases
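Building the (context, target) training pairs described above can be sketched directly. This is a simplified CBOW-style illustration using the slide's example sentence and a fixed window of 2 words on each side (the slide allows 2 to 5):

```python
def training_pairs(words, window=2):
    """For each word, collect the surrounding context words as input
    and the middle word itself as the prediction target."""
    pairs = []
    for i, target in enumerate(words):
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

words = "example sentence shows word2vec model training".split()
pairs = training_pairs(words)
# e.g. pairs[2] == (["example", "sentence", "word2vec", "model"], "shows")
```

Each pair becomes one training record for the 50k → ... → 300 → ... → 50k net: context words activate the binary input vector, and the target word is the output the net must predict.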
Training Continuous Space Word Models
http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2
Applying Continuous Space Word Models

http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf  State of the art in machine translation: Sequence to Sequence Learning with Neural Networks, NIPS 2014

• Language translation
• Document summary
• Generate text captions for pictures

[Figure: example continuous word-vector values.]
"Greg's Guts" on Deep Learning

• Some claim the need for preprocessing and knowledge representation has ended
  – For most of the signal processing applications → yes, simplify
  – I am VERY READY TO COMPETE in other applications, continuing:
    • expressing explicit domain knowledge
    • optimizing business value calculations
• Deep Learning gets big advantages from big data
  – Why? Better populating high-dimensional space combination subsets
  – Unsupervised feature extraction reduces the need for large labeled data
• However, "regular sized data" gets a big boost as well
  – The ratio of free parameters (i.e. neurons) to training set records
  – For regressions or regular nets, want 5-10 times as many records as parameters
  – Regularization and weight dropout reduce this pressure
  – Especially when only training "the next auto-encoding layer"
Deep Learning Summary: IT'S EXCITING!

• Discussed Deep Learning architectures
  – Auto-encoder, convolutional, reinforcement learning, continuous word models
• Real-time speedup
  – Train model, reduce complexity, retrain
  – Simplify preprocessing with lookup tables
  – Use cloud computing, do not be limited to device computing
  – Lambda architecture like Kamanja, to combine real time and batch
• Applications
  – Signal data: IoT, speech, images
  – Control system models (like Atari game playing, IoT)
  – Language models
https://www.quora.com/Why-is-deep-learning-in-such-demand-now