using deep learning to do real-time scoring in practical applications - 2015-12-14


© 2015 ligaDATA, Inc. All Rights Reserved.

Using Deep Learning to do Real-Time Scoring in Practical Applications

Deep Learning Applications Meetup, Monday, 12/14/2015, Mountain View, CA http://www.meetup.com/Deep-Learning-Applications/events/227217853/ By Greg Makowski www.Linkedin.com/in/GregMakowski [email protected]

Community @ http://Kamanja.org

Try out

Deep Learning - Outline

• Big Picture of 2016 Technology
• Neural Net Basics
• Deep Network Configurations for Practical Applications
  – Auto-Encoder (i.e. data compression or Principal Components Analysis)
  – Convolutional (shift invariance in time or space for voice, image or IoT)
  – Real Time Scoring and Lambda Architecture
  – Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja)
  – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT)
  – Continuous Space Word Models (i.e. word2vec)

Gartner's Top 2016 Strategic Technology Trends

David Cearley

Gartner's Top 2016 Strategic Technology Trends

David Cearley

Advantages of a Net over Regression

[Scatter plot: one target field with values "$" or "c", graphed by source fields "field1" and "field2"]

A Regression Solution: "Linear" - fit one line.

Target values for a data point with source field values graphed by "field1" and "field2". Showing ONE target field, with values of $ or c. https://en.wikipedia.org/wiki/Regression_analysis

Advantages of a Net over Regression

[Scatter plot: the same "$" and "c" data, now separated into several regions]

A Neural Net Solution: "Non-Linear" - several regions which are not adjacent. Hidden nodes can be a line or circle. https://en.wikipedia.org/wiki/Artificial_neural_network

A Comparison of a Neural Net and Regression

A logistic regression formula:
  Y = f(a0 + a1*X1 + a2*X2 + a3*X3)
  a* are coefficients.

Backpropagation, cast in a similar form:
  H1 = f(w0 + w1*I1 + w2*I2 + w3*I3)
  H2 = f(w4 + w5*I1 + w6*I2 + w7*I3)
   :
  Hn = f(w8 + w9*I1 + w10*I2 + w11*I3)
  O1 = f(w12 + w13*H1 + ... + w15*Hn)
  On = ...
  w* are weights, AKA coefficients.

I1..In are input nodes or input variables. H1..Hn are hidden nodes, which extract features of the data. O1..On are the outputs, which group disjoint categories. Look at the ratio of training records vs. free parameters (complexity, regularization).

[Diagrams: the regression drawn as a single node Y fed by X1, X2, X3 with coefficients a0..a3; the net drawn with inputs I1..I3, a bias, hidden nodes H1, H2, weights w1..w3, and an output]
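The parallel between the two formulas can be shown in a few lines of code. This is a minimal sketch (the weight values are made up for illustration, not from the talk): each hidden node is itself a small logistic regression over the inputs, and the output node is a logistic regression over the hidden nodes.

```python
import math

def f(x):
    # Logistic activation, the f() in both the regression and net formulas
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, hidden_weights, output_weights):
    # Each hidden node: H = f(bias + sum(w_i * I_i)), a feature extractor
    hidden = [f(w[0] + sum(wi * x for wi, x in zip(w[1:], inputs)))
              for w in hidden_weights]
    # Output node: O = f(bias + sum(w_j * H_j)), combining hidden features
    return f(output_weights[0] +
             sum(wj * h for wj, h in zip(output_weights[1:], hidden)))

# Illustrative weights only; a trained net would learn these
score = forward([0.5, -1.0, 2.0],
                hidden_weights=[[0.1, 0.4, -0.3, 0.2],
                                [-0.2, 0.1, 0.5, -0.4]],
                output_weights=[0.3, 1.2, -0.7])
print(0.0 < score < 1.0)  # logistic output is always in (0, 1)
```

Setting the hidden layer to a single identity node recovers plain logistic regression, which is why the two formulas line up term by term.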

Think of Separating Land vs. Water

1 line, Regression (more errors)

5 Hidden Nodes in a Neural Network

Different algorithms use different Basis Functions:
• One line
• Many horizontal & vertical lines
• Many diagonal lines
• Circles

Decision Tree, 12 splits (more elements, less computation)

Q) What is too detailed? "Memorizing the high tide boundary" and applying it at all times.

Deep Learning - Outline

• Big Picture of 2016 Technology
• Neural Net Basics
• Deep Network Configurations for Practical Applications
  – Auto-Encoder (i.e. data compression or Principal Components Analysis)
  – Convolutional (shift invariance in time or space for voice, image or IoT)
  – Real Time Scoring and Lambda Architecture
  – Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja)
  – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT)
  – Continuous Space Word Models (i.e. word2vec)

http://deeplearning.net/ http://www.kdnuggets.com/ http://www.analyticbridge.com/

Leading up to an Auto-Encoder

• Supervised Learning
  – Regression, Tree or Net: 50 inputs → 1 output
  – Possible nets:
    • 256 → 120 → 1
    • 256 → 120 → 5 (trees, regressions and most are limited to 1 output)
    • 256 → 120 → 60 → 1
    • 256 → 180 → 120 → 60 → 1 (start getting into training stability problems, with old processes)
• Unsupervised Learning
  – Clustering (traditional unsupervised):
    • 60 inputs (no target); produce 1-2 new (cluster ID & distance)

Auto-Encoder (like data compression): relate input to output, through a compressed middle

• Supervised Learning
  – Regression, Tree or Net: 50 inputs → 1 output
  – Possible nets:
    • 256 → 120 → 1
    • 256 → 120 → 5 (trees, regressions, SVD and most are limited to 1 output)
    • 256 → 120 → 60 → 1
    • 256 → 180 → 120 → 60 → 1 (start getting long training times to stabilize, or may not finish; the BREAKTHROUGH provided by DEEP LEARNING)
• Unsupervised Learning
  – Clustering (traditional unsupervised):
    • 60 inputs (no target); produce 1-2 new (cluster ID & distance)
  – Unsupervised training of a net: assign (target record == input record), AUTO-ENCODING
  – Train the net in stages, one compressed layer at a time:
    • 256 → 180 → 256, then 180 → 120 → 180, ...
  – Add a supervised layer to forecast 10 target categories: ... → 120 → 10

Because of symmetry, only need to update mirrored weights once.

4 hidden layers w/ unsupervised training, 1 layer at the end w/ supervised training. https://en.wikipedia.org/wiki/Deep_learning
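The symmetry point can be made concrete with a toy tied-weight auto-encoder. This is an illustrative pure-Python sketch (the simplified update rule is my own derivation, not code from the talk): a 4 → 2 encoder matrix W is reused transposed as the 2 → 4 decoder, so each mirrored weight is updated once per record, and training drives the reconstruction error down with the record itself as the target.

```python
import random

def encode(W, x):
    # h = W x : the compressed middle of the auto-encoder
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def decode(W, h):
    # x_hat = W^T h : decoding reuses the SAME (mirrored) weights
    return [sum(W[k][j] * h[k] for k in range(len(W)))
            for j in range(len(W[0]))]

def train_step(W, x, lr=0.01):
    # Unsupervised made supervised: the target record == the input record
    h = encode(W, x)
    e = [xh - xi for xh, xi in zip(decode(W, h), x)]  # reconstruction error
    for k in range(len(W)):
        gk = sum(e[i] * W[k][i] for i in range(len(x)))
        for j in range(len(x)):
            # one update covers both the encode and decode use of W[k][j]
            W[k][j] -= lr * 2 * (e[j] * h[k] + gk * x[j])
    return sum(ei * ei for ei in e)

random.seed(0)
W = [[random.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(2)]  # 4 -> 2 -> 4
data = [[1, 0, 1, 0], [0, 1, 0, 1]]
errors = [sum(train_step(W, x) for x in data) for _ in range(200)]
print(errors[-1] < errors[0])  # reconstruction improves with training
```

A linear tied-weight auto-encoder like this converges toward the same subspace PCA finds, which previews the "how is it like PCA?" slide below.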

Auto-Encoder: how it can be generally used to solve problems

• Add a supervised layer to forecast 10 target categories
  – 4 hidden layers trained with unsupervised training
  – 1 new layer, trained with supervised learning: ... → 10
• Outlier detection
  – The "activation" at each of the 120 output nodes indicates the "match" to that cluster or compressed feature
  – When scoring new records, can detect outliers with a process like: if (max_output_match < 0.333) then suspected outlier
• How is it like PCA?
  – Individual hidden nodes in the same layer are "different" or "orthogonal"
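The outlier rule above is simple enough to state directly in code. The threshold and activation values below are illustrative only; in practice the threshold would be tuned on held-out data.

```python
def suspected_outlier(activations, threshold=1/3):
    # activations: the per-output-node "match" strengths for one scored record.
    # If no cluster / compressed feature matches well, flag the record.
    return max(activations) < threshold

print(suspected_outlier([0.05, 0.12, 0.91]))  # strong match -> False
print(suspected_outlier([0.05, 0.12, 0.21]))  # nothing matches well -> True
```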

How Transferable are Features in Deep Neural Networks?

http://papers.nips.cc/paper/5347-how-transferable-are-features-in-deep-neural-networks.pdf

Deep Learning - Outline

• Big Picture of 2016 Technology
• Neural Net Basics
• Deep Network Configurations for Practical Applications
  – Auto-Encoder (i.e. data compression or Principal Components Analysis)
  – Convolutional (shift invariance in time or space for voice, image or IoT)
  – Real Time Scoring and Lambda Architecture
  – Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja)
  – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT)
  – Continuous Space Word Models (i.e. word2vec)

Deep Learning Caused a 50% Reduction in Speech Recognition Error Rates in 4 Years

"The use of deep neural nets in production speech systems really started more like in 2011... I would estimate that from the time before deep neural nets were used until now, the error rate on production speech systems fell from about 20% down to below 10%, so more than a 50% reduction in error rate." - Jeff Dean, email to Greg, 12/13/2015

http://research.google.com/people/jeff/ Senior Fellow in the Knowledge Group, Google

[Chart: drop in speech recognition error rates; deep learning deployments started in 2011]

Internet of Things (IoT) is heavily signal data

http://www.datasciencecentral.com/profiles/blogs/the-internet-of-things-data-science-and-big-data

Convolutional Neural Net (CNN): enables detecting shift-invariant patterns

In speech and image applications, patterns vary by size and can be shifted right or left. Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern.

Neural nets can be explicitly trained to provide an FFT (Fast Fourier Transform) to convert data from the time domain to the frequency domain, but typically an explicit FFT is used.

Internet of Things Signal Data

Convolutional Neural Net (CNN): enables detecting shift-invariant patterns

In speech and image applications, patterns vary by size and can be shifted right or left. Challenge: finding a bounding box for a pattern is almost as hard as detecting the pattern. Solution: use a sliding convolution to detect the pattern.

CNNs can use very long observational windows, up to 400 ms, for long context.

Convolution

https://en.wikipedia.org/wiki/Convolution

Convolutional Neural Net: from LeNet-5

"Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, Nov 1998. Yann LeCun, Léon Bottou, Yoshua Bengio and Patrick Haffner.

Director, Facebook AI Research http://yann.lecun.com/


Convolutional Neural Net (CNN)

• How is a CNN trained differently than a typical backpropagation (BP) network?
  – Parts of the training which are the same:
    • Present an input record
    • Forward pass through the network
    • Backpropagate error (i.e. per epoch)
  – Different parts of training:
    • Some connections are CONSTRAINED to the same value
      – The connections for the same pattern, sliding over all input space
    • Error updates are averaged and applied equally to the one set of weight values
    • End up with the same pattern detector feeding many nodes at the next level

http://www.cs.toronto.edu/~rgrosse/icml09-cdbn.pdf "Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations", 2009
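The two "different" parts of CNN training can be sketched in pure Python for a 1-D signal. The function names and the simplified update rule are mine, not from the paper: one kernel is reused at every position (weight sharing), and the per-position error gradients are averaged into a single update of that one shared kernel.

```python
def conv1d(signal, kernel):
    # The SAME kernel slides over every input position (weight sharing),
    # so one pattern detector feeds many nodes at the next level
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def shared_weight_update(signal, kernel, grad_per_position, lr=0.1):
    # Error updates from all positions are averaged and applied equally
    # to the one shared set of weight values (simplified illustrative rule)
    k = len(kernel)
    n = len(signal) - k + 1
    return [kernel[j] - lr * sum(grad_per_position[i] * signal[i + j]
                                 for i in range(n)) / n
            for j in range(k)]

edges = conv1d([0, 1, 0, 1, 0], [1, -1])  # a tiny edge-detector kernel
print(edges)  # [-1, 1, -1, 1]
```

Because the kernel is shorter than the signal, the same detector fires wherever the pattern occurs, which is exactly the shift invariance the slide describes.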

Convolutional Neural Net (CNN): Same Low-Level Features

http://stats.stackexchange.com/questions/146413/why-convolutional-neural-networks-belong-to-deep-learning

The Mammalian Visual Cortex is Hierarchical ("The Brain is a Deep Neural Net" - Yann LeCun)

http://www.pamitc.org/cvpr15/files/lecun-20150610-cvpr-keynote.pdf

Convolutional Neural Net (CNN): Facebook example

https://gigaom.com/2014/03/18/facebook-shows-off-its-deep-learning-skills-with-deepface/

Convolutional Neural Net (CNN): Yahoo + Stanford example - find a face in a picture, even upside down

http://www.dailymail.co.uk/sciencetech/article-2958597/Facial-recognition-breakthrough-Deep-Dense-software-spots-faces-images-partially-hidden-UPSIDE-DOWN.html

Convolutional Neural Nets (CNN): Robotic Grasp Detection (IoT)

http://pjreddie.com/media/files/papers/grasp_detection_1.pdf

Deep Learning - Outline

• Big Picture of 2016 Technology
• Neural Net Basics
• Deep Network Configurations for Practical Applications
  – Auto-Encoder (i.e. data compression or Principal Components Analysis)
  – Convolutional (shift invariance in time or space for voice, image or IoT)
  – Real Time Scoring and Lambda Architecture
  – Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja)
  – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT)
  – Continuous Space Word Models (i.e. word2vec)

Real-Time Scoring Optimizations

• Auto-encoding nets
  – Can grow to millions of connections, and start to get computational
  – Can reduce connections by 5% to 25+% with pruning & retraining
    • Train with increased regularization settings
    • Drop connections with near-zero weights, then retrain
    • Drop nodes with fan-in connections which don't get used much later, such as in your predictive problem
    • Perform sensitivity analysis; delete possible input fields
• Convolutional Neural Nets
  – With large enough data, can even skip the FFT preprocessing step
  – Can use wider than 10 ms audio sampling rates for speedup
• Implement other preprocessing as lookup tables (i.e. Bayesian priors)
• Use cloud computing, do not limit to device computing
• Large models don't fit → use model or data parallelism to train
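The "drop connections with near-zero weights, then retrain" step reduces to a filter over the weight set. A minimal sketch (the weight names, values, and threshold are illustrative; real pruning would be followed by retraining the smaller net):

```python
def prune_small_weights(weights, threshold=0.05):
    # Keep only connections whose weight magnitude clears the threshold;
    # the slide suggests 5% to 25+% of connections can be shed this way
    kept = {name: w for name, w in weights.items() if abs(w) >= threshold}
    pruned_pct = 100 * (1 - len(kept) / len(weights))
    return kept, pruned_pct

weights = {"w1": 0.81, "w2": -0.002, "w3": 0.30, "w4": 0.01}
kept, pct = prune_small_weights(weights)
print(kept, pct)  # w2 and w4 dropped -> 50.0% pruned
```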


Real-Time Scoring: Lambda Architecture, for both Batch and Real Time

• First architecture to really define how batch and stream processing can work together
• Founded on the concepts of immutability and re-computation, with human fault tolerance
• Pre-computes the results of batch & real-time processes as a set of views, & a query layer merges the views

https://en.wikipedia.org/wiki/Lambda_architecture


Real-Time Scoring: Lambda Architecture with Kamanja

[Diagram: real-time data and a queue feed Kamanja (decisions, transformations, enrichment, aggregations), which feeds a master dataset and real-time views & indexing; a serving layer answers queries]

• Kamanja embraces and extends the Lambda architecture
• Transform and process messages in real time, combine messages with historical data and compute real-time views to make real-time decisions based on the views


Kamanja Technology Stack

• Real-Time Computing: Kamanja (PMML, Java or Scala consumer)
• High Level Languages / Abstractions: PMML producers, MLlib
• Real Time Streaming: Kafka, MQ, Spark*
• Data Store: HBase, Cassandra, InfluxDB, HDFS (create adaptors to integrate others)
• Resource Management: Zookeeper, Yarn*, Mesos*
• Compute Fabric: Cloud, EC2, internal cloud
• Security: Kerberos

Deep Net Tools

Deep Learning - Outline

• Big Picture of 2016 Technology
• Neural Net Basics
• Deep Network Configurations for Practical Applications
  – Auto-Encoder (i.e. data compression or Principal Components Analysis)
  – Convolutional (shift invariance in time or space for voice, image or IoT)
  – Real Time Scoring and Lambda Architecture
  – Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja)
  – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT)
  – Continuous Space Word Models (i.e. word2vec)

Deep Reinforcement Learning, Q-Learning

http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind https://en.wikipedia.org/wiki/Reinforcement_learning https://en.wikipedia.org/wiki/Q-learning

Think in terms of IoT: a device agent measures and infers the user's action, maximizes future reward, and recommends to the user or system.
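The core of Q-learning is one update rule: move the value of a (state, action) pair toward the observed reward plus the discounted best value reachable from the next state, so the agent learns to maximize future reward. A minimal tabular sketch (the IoT-flavored states and actions are illustrative, not from the talk):

```python
def q_update(Q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Standard tabular Q-learning rule:
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# Toy example: a device agent deciding whether to notify the user
Q = {"idle":   {"notify": 0.0, "wait": 0.0},
     "active": {"notify": 0.0, "wait": 0.0}}
q_update(Q, "idle", "notify", reward=1.0, next_state="active")
print(Q["idle"]["notify"])  # 0.5
```

Deep Q-learning replaces the table with a deep net that maps a state (e.g. Atari screen frames) to the Q-value of each action; the update rule stays the same.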

Deep Reinforcement Learning, Q-Learning (think about IoT possibilities)

http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind

[Screenshots: 4 game screen shots are used as the network input; outputs are actions such as shift left fast, shift left, stay, shift right, shift right fast]

IoT challenge: how to replace the game score with an IoT score?

Deep Reinforcement Learning, Q-Learning http://www.iclr.cc/lib/exe/fetch.php?media=iclr2015:silver-iclr2015.pdf David Silver, Google DeepMind

Games w/ best Q-learning results: Video Pinball, Breakout, Star Gunner, Crazy Climber, Gopher

Deep Learning - Outline

• Big Picture of 2016 Technology
• Neural Net Basics
• Deep Network Configurations for Practical Applications
  – Auto-Encoder (i.e. data compression or Principal Components Analysis)
  – Convolutional (shift invariance in time or space for voice, image or IoT)
  – Real Time Scoring
  – Deep Net libraries and tools (R, H2O, DL4J, TensorFlow, Gorila, Kamanja)
  – Reinforcement Learning, Q-Learning (i.e. beat people at Atari games, IoT)
  – Continuous Space Word Models (i.e. word2vec)


Continuous Space Word Models (word2vec)

• Before (a predictive "Bag of Words" model):
  – One row per document, paragraph or web page
  – Binary word space: 10k to 200k columns, one per word or phrase
    0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 ... → "This word space model is ..."
  – The "bag of words model" relates an input record to a target category
• New:
  – One row per word (word2vec), possibly per sentence (sent2vec)
  – Continuous word space: 100 to 300 columns, continuous values
    .01 .05 .02 .00 .00 .68 .01 .01 .35 ... .00 → "King"
    .00 .00 .05 .01 .49 .52 .00 .11 .84 ... .01 → "Queen"
  – The deep net training resulted in an Emergent Property:
    • Numeric geometry location relates to concept space
    • "King" – "man" + "woman" = "Queen" (math to change the gender relation)
    • "USA" – "Washington DC" + "England" = "London" (math for the capital relation)
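The analogy arithmetic is literally vector arithmetic plus a nearest-neighbor search by cosine similarity. A self-contained sketch with tiny hand-made 3-dim vectors (real word2vec vectors have 100-300 dims and are learned, not written by hand):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(a * a for a in w))
    return dot / (norm(u) * norm(v))

# Toy illustrative vectors: one "royalty-ish" axis and two "gender-ish" axes
vec = {"king":  [0.9, 0.8, 0.1],
       "queen": [0.9, 0.1, 0.8],
       "man":   [0.1, 0.9, 0.1],
       "woman": [0.1, 0.1, 0.9]}

# "king" - "man" + "woman", then find the nearest remaining word
target = [k - m + w for k, m, w in zip(vec["king"], vec["man"], vec["woman"])]
best = max((w for w in vec if w != "king"),
           key=lambda w: cosine(target, vec[w]))
print(best)  # queen
```

Libraries such as gensim expose the same operation on trained vectors (e.g. a `most_similar`-style query with positive and negative word lists).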

Continuous Space Word Models (word2vec): how to SCALE to larger vocabularies?

http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2


Training Continuous Space Word Models

• How to Train These Models?
  – Raw data: "This example sentence shows the word2vec model training."
  – Training data (with target values underscored, and other words as input):
    "This example sentence shows word2vec" (prune "the")
    "example sentence shows word2vec model"
    "sentence shows word2vec model training"
  – The context of the 2 to 5 prior and following words predicts the middle word
  – Deep net model architecture, data compression to 300 continuous nodes:
    • 50k binary word input vector → ... → 300 → ... → 50k word target vector
• Use Pre-Trained Models: https://code.google.com/p/word2vec/
  – Trained on 100 billion words from Google News
  – 300-dim vectors for 3 million words and phrases
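Turning raw text into (context, target) training pairs, as described above, is a short function. A minimal sketch (function name is mine; real pipelines also prune stop words like "the" and subsample frequent words):

```python
def training_pairs(words, window=2):
    # For each position, the surrounding words (up to `window` on each
    # side) are the input context and the middle word is the target
    pairs = []
    for i, target in enumerate(words):
        context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
        pairs.append((context, target))
    return pairs

sentence = "example sentence shows word2vec model training".split()
pairs = training_pairs(sentence)
for context, target in pairs[:2]:
    print(context, "->", target)
# ['sentence', 'shows'] -> example
# ['example', 'shows', 'word2vec'] -> sentence
```

Each pair becomes one training record for the 50k → ... → 300 → ... → 50k net: the context words are the binary input vector, the target word is the output vector.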

Training Continuous Space Word Models

http://www.slideshare.net/hustwj/cikm-keynotenov2014?qid=f92c9e86-feea-41ac-a099-d086efa6fac1&v=default&b=&from_search=2

Applying Continuous Space Word Models

http://static.googleusercontent.com/media/research.google.com/en//people/jeff/BayLearn2015.pdf State of the art in machine translation: "Sequence to Sequence Learning with Neural Networks", NIPS 2014

Language translation, document summary, generate text captions for pictures.

"Greg's Guts" on Deep Learning

• Some claim the need for preprocessing and knowledge representation has ended
  – For most of the signal processing applications → yes, simplify
  – I am VERY READY TO COMPETE in other applications, continuing:
    • expressing explicit domain knowledge
    • optimizing business value calculations
• Deep Learning gets big advantages from big data
  – Why? Better populating high-dimensional space combination subsets
  – Unsupervised feature extraction reduces the need for large labeled data
• However, "regular-sized data" gets a big boost as well
  – The "ratio of free parameters" (i.e. neurons) to training set records
  – For regressions or regular nets, want 5-10 times as many records
  – Regularization and weight dropout reduce this pressure
  – Especially when only training "the next auto-encoding layer"

Deep Learning Summary – IT'S EXCITING!

• Discussed Deep Learning architectures
  – Auto-encoder, convolutional, reinforcement learning, continuous word
• Real-time speedup
  – Train model, reduce complexity, retrain
  – Simplify preprocessing with lookup tables
  – Use cloud computing, do not be limited to device computing
  – Lambda architecture like Kamanja, to combine real time and batch
• Applications
  – Signal data: IoT, speech, images
  – Control system models (like Atari game playing, IoT)
  – Language models

https://www.quora.com/Why-is-deep-learning-in-such-demand-now
