TensorFlow Deep Learning Projects
10 real-world projects on computer vision, machine translation, chatbots, and reinforcement learning
Luca Massaron, Alberto Boschetti, Alexey Grigorev, Abhishek Thakur, Rajalingappaa Shanmugamani
BIRMINGHAM - MUMBAI
TensorFlow Deep Learning Projects
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Viraj Madhav
Content Development Editor: Snehal Kolte
Technical Editor: Dharmendra Yadav
Copy Editor: Safis Editing
Project Coordinator: Manthan Patel
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Tania Dutta
Production Coordinator: Shraddha Falebhai
First published: March 2018
Production reference: 1270318
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-78839-806-0
www.packtpub.com
mapt.io
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Why subscribe?
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
PacktPub.com
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at service@packtpub.com for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Contributors
About the authors
Luca Massaron is a data scientist and marketing research director specialized in multivariate statistical analysis, machine learning, and customer insight, with 10+ years of experience solving real-world problems and generating value for stakeholders using reasoning, statistics, data mining, and algorithms. Passionate about everything concerning data analysis and demonstrating the potential of data-driven knowledge discovery to both experts and non-experts, he believes that a lot can be achieved by understanding in simple terms and practicing the essentials of any discipline.
I would like to thank Yukiko and Amelia for their continued support, help, and loving patience.
Alberto Boschetti is a data scientist with strong expertise in signal processing and statistics. He holds a PhD in telecommunication engineering and lives and works in London. In his work, he faces daily challenges spanning natural language processing, machine learning, and distributed processing. He is very passionate about his job and always tries to stay up to date on the latest developments in data science technologies, attending meetups, conferences, and other events.
Alexey Grigorev is a skilled data scientist, machine learning engineer, and software developer with more than 8 years of professional experience. He started his career as a Java developer working at a number of large and small companies, but after a while he switched to data science. Right now, Alexey works as a data scientist at Simplaex, where, in his day-to-day job, he actively uses Java and Python for data cleaning, data analysis, and modeling. His areas of expertise are machine learning and text mining.
I would like to thank my wife, Larisa, and my son, Arkadij, for their patience and support while I was working on the book.
Abhishek Thakur is a data scientist. His focus is mainly on applied machine learning and deep learning, rather than theoretical aspects. He completed his master's in computer science at the University of Bonn in early 2014. Since then, he has worked in various industries, with a research focus on automatic machine learning. He likes taking part in machine learning competitions and has attained third place in the worldwide rankings on the popular website Kaggle.
Rajalingappaa Shanmugamani is currently a deep learning lead at SAP, Singapore. Previously, he worked and consulted at various startups, developing computer vision products. He has a master's from IIT Madras, his thesis having been based on the applications of computer vision in manufacturing. He has published articles in peer-reviewed journals, spoken at conferences, and applied for a few patents in machine learning. In his spare time, he coaches programming and machine learning to school students and engineers.
I thank my spouse Ezhil, family, and friends for their immense support. I thank all the teachers, colleagues, managers, and mentors from whom I have learned a lot.
About the reviewer
Marvin Bertin is an online course author and technical book editor focused on deep learning, computer vision, and NLP with TensorFlow. He holds a bachelor's in mechanical engineering and a master's in data science. He has worked as an ML engineer and data scientist in the Bay Area, focusing on recommender systems, NLP, and biotech applications. He currently works at a start-up that develops deep learning (AI) algorithms for early cancer detection.
Packt is searching for authors like you
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Table of Contents
Title Page
Copyright and Credits
TensorFlow Deep Learning Projects
Packt Upsell
Why subscribe?
PacktPub.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
1. Recognizing traffic signs using Convnets
The dataset
The CNN network
Image preprocessing
Train the model and make predictions
Follow-up questions
Summary
2. Annotating Images with Object Detection API
The Microsoft common objects in context
The TensorFlow object detection API
Grasping the basics of R-CNN, R-FCN, and SSD models
Presenting our project plan
Setting up an environment suitable for the project
Protobuf compilation
Windows installation
Unix installation
Provisioning of the project code
Some simple applications
Real-time webcam detection
Acknowledgements
Summary
3. Caption Generation for Images
What is caption generation?
Exploring image captioning datasets
Downloading the dataset
Converting words into embeddings
Image captioning approaches
Conditional random field
Recurrent neural network on convolution neural network
Caption ranking
Dense captioning
RNN captioning
Multimodal captioning
Attention-based captioning
Implementing a caption generation model
Summary
4. Building GANs for Conditional Image Creation
Introducing GANs
The key is in the adversarial approach
A cambrian explosion
DCGANs
Conditional GANs
The project
Dataset class
CGAN class
Putting CGAN to work on some examples
MNIST
Zalando MNIST
EMNIST
Reusing the trained CGANs
Resorting to Amazon Web Service
Acknowledgements
Summary
5. Stock Price Prediction with LSTM
Input datasets – cosine and stock price
Format the dataset
Using regression to predict the future prices of a stock
Long short-term memory – LSTM 101
Stock price prediction with LSTM
Possible follow-up questions
Summary
6. Create and Train Machine Translation Systems
A walkthrough of the architecture
Preprocessing of the corpora
Training the machine translator
Test and translate
Home assignments
Summary
7. Train and Set up a Chatbot, Able to Discuss Like a Human
Introduction to the project
The input corpus
Creating the training dataset
Training the chatbot
Chatbox API
Home assignments
Summary
8. Detecting Duplicate Quora Questions
Presenting the dataset
Starting with basic feature engineering
Creating fuzzy features
Resorting to TF-IDF and SVD features
Mapping with Word2vec embeddings
Testing machine learning models
Building a TensorFlow model
Processing before deep neural networks
Deep neural networks building blocks
Designing the learning architecture
Summary
9. Building a TensorFlow Recommender System
Recommender systems
Matrix factorization for recommender systems
Dataset preparation and baseline
Matrix factorization
Implicit feedback datasets
SGD-based matrix factorization
Bayesian personalized ranking
RNN for recommender systems
Data preparation and baseline
RNN recommender system in TensorFlow
Summary
10. Video Games by Reinforcement Learning
The game legacy
The OpenAI version
Installing OpenAI on Linux (Ubuntu 14.04 or 16.04)
Lunar Lander in OpenAI Gym
Exploring reinforcement learning through deep learning
Tricks and tips for deep Q-learning
Understanding the limitations of deep Q-learning
Starting the project
Defining the AI brain
Creating memory for experience replay
Creating the agent
Specifying the environment
Running the reinforcement learning process
Acknowledgements
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
Preface
TensorFlow is one of the most popular frameworks used for machine learning and, more recently, deep learning. It provides a fast and efficient framework for training different kinds of deep learning models with very high accuracy. This book is your guide to mastering deep learning with TensorFlow with the help of 10 real-world projects.
TensorFlow Deep Learning Projects starts with setting up the right TensorFlow environment for deep learning. You'll learn to train different types of deep learning models using TensorFlow, including CNNs, RNNs, LSTMs, and generative adversarial networks. While doing so, you will build end-to-end deep learning solutions to tackle different real-world problems in image processing, enterprise AI, and natural language processing, to name a few. You'll train high-performance models to generate captions for images automatically, predict the performance of stocks, and create intelligent chatbots. Some advanced aspects, such as recommender systems and reinforcement learning, are also covered in this book.
By the end of this book, you will have mastered all the concepts of deep learning and their implementation with TensorFlow, and will be able to build and train your own deep learning models with TensorFlow to tackle any kind of problem.
Who this book is for
This book is for data scientists, machine learning and deep learning practitioners, and AI enthusiasts who want a go-to guide to test their knowledge and expertise in building real-world intelligent systems. If you want to master the different deep learning concepts and algorithms associated with it by implementing practical projects in TensorFlow, this book is what you need!
What this book covers
Chapter 1, Recognizing traffic signs using Convnets, shows how to extract the proper features from images with all the necessary preprocessing steps, and how to train a convolutional neural network to recognize German traffic signs from the GTSRB dataset.
Chapter 2, Annotating Images with Object Detection API, details the building of a real-time object detection application that can annotate images, videos, and webcam captures using TensorFlow's new object detection API (with its selection of pretrained convolutional networks, the so-called TensorFlow detection model zoo) and OpenCV.
Chapter 3, Caption Generation for Images, enables readers to learn caption generation with or without pretrained models.
Chapter 4, Building GANs for Conditional Image Creation, guides you step by step through building a selective GAN to reproduce new images of the favored kind. The datasets that the GANs will reproduce are of handwritten characters (both numbers and letters in Chars74K).
Chapter 5, Stock Price Prediction with LSTM, explores how to predict the future of a mono-dimensional signal, a stock price. Given its past, we will learn how to forecast its future with an LSTM architecture, and how we can make our predictions more and more accurate.
Chapter 6, Create and Train Machine Translation Systems, shows how to create and train a bleeding-edge machine translation system with TensorFlow.
Chapter 7, Train and Set up a Chatbot, Able to Discuss Like a Human, tells you how to build an intelligent chatbot from scratch and how to converse with it.
Chapter 8, Detecting Duplicate Quora Questions, discusses methods that can be used to detect duplicate questions using the Quora dataset. Of course, these methods can be used for other similar datasets.
Chapter 9, Building a TensorFlow Recommender System, covers large-scale applications with practical examples. We'll learn how to implement cloud GPU computing capabilities on AWS, with very clear instructions. We'll also utilize H2O's wonderful API for deep networks on a large scale.
Chapter 10, Video Games by Reinforcement Learning, details a project in which you build an AI capable of playing Lunar Lander by itself. The project revolves around the existing OpenAI Gym project and integrates it using TensorFlow. OpenAI Gym is a project that provides different gaming environments to explore how to use AI agents that can be powered by, among other algorithms, TensorFlow neural models.
To get the most out of this book
The examples covered in this book can be run on Windows, Ubuntu, or Mac. All the installation instructions are covered. You will need basic knowledge of Python, machine learning, and deep learning, and familiarity with TensorFlow.
Download the example code files
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packtpub.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/TensorFlow-Deep-Learning-Projects. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Conventions used
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The class TqdmUpTo is just a tqdm wrapper that enables the use of the progress display also for downloads."
A block of code is set as follows:
import numpy as np
import urllib.request
import tarfile
import os
import zipfile
import gzip
from glob import glob
from tqdm import tqdm
Any command-line input or output is written as follows:
epoch 01: precision: 0.064
epoch 02: precision: 0.086
epoch 03: precision: 0.106
epoch 04: precision: 0.127
epoch 05: precision: 0.138
epoch 06: precision: 0.145
epoch 07: precision: 0.150
epoch 08: precision: 0.149
epoch 09: precision: 0.151
epoch 10: precision: 0.152
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."
Warnings or important notes appear like this.
Tips and tricks appear like this.
Get in touch
Feedback from our readers is always welcome.
General feedback: Email feedback@packtpub.com and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at copyright@packtpub.com with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Reviews
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Recognizing traffic signs using Convnets
As the first project of the book, we'll try to work on a simple model where deep learning performs very well: traffic sign recognition. Briefly, given a color image of a traffic sign, the model should recognize which signal it is. We will explore the following areas:
How the dataset is composed
Which deep network to use
How to pre-process the images in the dataset
How to train and make predictions with an eye on performance
The dataset
Since we'll try to predict some traffic signs using their images, we will use a dataset built for the same purpose. Fortunately, researchers at the Institut für Neuroinformatik, Germany, created a dataset containing almost 40,000 images, all different and related to 43 traffic signs. The dataset we will use is part of a competition named the German Traffic Sign Recognition Benchmark (GTSRB), which attempted to score the performance of multiple models for the same goal. The dataset is pretty old (2011!), but it looks like a nice and well-organized dataset to start our project from.
The dataset used in this project is freely available at http://benchmark.ini.rub.de/Dataset/GTSRB_Final_Training_Images.zip.
Before you start running the code, please download the file and unpack it in the same directory as the code. After decompressing the archive, you'll have a new folder, named GTSRB, containing the dataset.
The authors of the book would like to thank those who worked on the dataset and made it open source. Also, refer to http://cs231n.github.io/convolutional-networks/ to learn more about CNNs.
Let's now see some examples:
"Speed limit 20 km/h":
"go straight or turn right":
"roundabout":
As you can see, the signals don't have a uniform brightness (some are very dark and some others are very bright), they're different in size, the perspective is different, they have different backgrounds, and they may contain pieces of other traffic signs.
The dataset is organized in this way: all the images of the same label are inside the same folder. For example, inside the path GTSRB/Final_Training/Images/00040/, all the images have the same label, 40. For the images with another label, 5, open the folder GTSRB/Final_Training/Images/00005/. Note also that all the images are in PPM format, a lossless format for images with many open source decoders/encoders available.
The CNN network
For our project, we will use a pretty simple network with the following architecture:
In this architecture, we still have the choice of:
The number of filters and kernel size in the 2D convolution
The kernel size in the max pool
The number of units in the fully connected layer
The batch size, optimization algorithm, learning step (eventually, its decay rate), activation function of each layer, and number of epochs
Image preprocessing
The first operation of the model is reading the images and standardizing them. In fact, we cannot work with images of variable sizes; therefore, in this first step, we'll load the images and reshape them to a predefined size (32x32). Moreover, we will one-hot encode the labels in order to have a 43-dimensional array where only one element is enabled (it contains a 1), and we will convert the color space of the images from RGB to grayscale. By looking at the images, it seems obvious that the information we need is not contained in the color of the signal but in its shape and design.
Let's now open a Jupyter Notebook and place some code to do that. First of all, let's create some final variables containing the number of classes (43) and the size of the images after being resized:
N_CLASSES = 43
RESIZED_IMAGE = (32, 32)
Next, we will write a function that reads all the images given in a path, resizes them to a predefined shape, converts them to grayscale, and also one-hot encodes the labels. In order to do that, we'll use a namedtuple named Dataset:
import matplotlib.pyplot as plt
import glob
from skimage.color import rgb2lab
from skimage.transform import resize
from collections import namedtuple
import numpy as np
np.random.seed(101)
%matplotlib inline

Dataset = namedtuple('Dataset', ['X', 'y'])

def to_tf_format(imgs):
    return np.stack([img[:, :, np.newaxis] for img in imgs], axis=0).astype(np.float32)
def read_dataset_ppm(rootpath, n_labels, resize_to):
    images = []
    labels = []
    for c in range(n_labels):
        full_path = rootpath + '/' + format(c, '05d') + '/'
        for img_name in glob.glob(full_path + "*.ppm"):
            img = plt.imread(img_name).astype(np.float32)
            img = rgb2lab(img / 255.0)[:, :, 0]
            if resize_to:
                img = resize(img, resize_to, mode='reflect')
            label = np.zeros((n_labels,), dtype=np.float32)
            label[c] = 1.0
            images.append(img.astype(np.float32))
            labels.append(label)
    return Dataset(X=to_tf_format(images).astype(np.float32),
                   y=np.matrix(labels).astype(np.float32))

dataset = read_dataset_ppm('GTSRB/Final_Training/Images', N_CLASSES, RESIZED_IMAGE)
print(dataset.X.shape)
print(dataset.y.shape)
Thanks to the skimage module, the operations of reading, transforming, and resizing are pretty easy. In our implementation, we decided to convert the original color space (RGB) to lab, retaining only the luminance component. Note that another good conversion here is YUV, where only the "Y" component should be retained as a grayscale image.
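To make the YUV remark concrete: the "Y" channel is just a fixed weighted sum of the R, G, and B channels (ITU-R BT.601 weights). Here is a minimal NumPy sketch; the helper name is ours and is not part of the chapter's code, which uses rgb2lab instead:

```python
import numpy as np

def rgb_to_luminance(img):
    """Return the Y (luminance) channel of an RGB image with values in [0, 1].

    Uses the ITU-R BT.601 weights, the same convention behind the
    'Y' component of YUV mentioned in the text."""
    return 0.299 * img[:, :, 0] + 0.587 * img[:, :, 1] + 0.114 * img[:, :, 2]

# A 2x2 toy image: one white pixel, the rest black
img = np.zeros((2, 2, 3), dtype=np.float32)
img[0, 0, :] = 1.0  # white pixel
y = rgb_to_luminance(img)
print(y[0, 0], y[1, 1])  # white stays near 1.0, black stays 0.0
```

The weights sum to 1, so a neutral gray keeps its intensity after the conversion; that is why either lab's L or YUV's Y is a sensible grayscale choice here.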
Running the preceding cell gives this:
(39209, 32, 32, 1)
(39209, 43)
One note about the output format: the shape of the observation matrix X has four dimensions. The first indexes the observations (in this case, we have almost 40,000 of them); the other three dimensions contain the image (which is 32 pixels by 32 pixels, in grayscale, that is, with a single channel). This is the default shape when dealing with images in TensorFlow (see the code of the to_tf_format function).
As for the label matrix, the rows index the observations, while the columns are the one-hot encoding of the label.
In order to have a better understanding of the observation matrix, let's print the feature vector of the first sample, together with its label:
plt.imshow(dataset.X[0, :, :, :].reshape(RESIZED_IMAGE))  # sample
print(dataset.y[0, :])  # label
[[1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
You can see that the image, that is, the feature vector, is 32x32. The label contains only one 1, in the first position.
Let's now print the last sample:
plt.imshow(dataset.X[-1, :, :, :].reshape(RESIZED_IMAGE))  # sample
print(dataset.y[-1, :])  # label
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]]
The feature vector size is the same (32x32), and the label vector contains one 1, in the last position.
These are the two pieces of information we need to create the model. Please pay particular attention to the shapes, because they're crucial in deep learning while working with images; in contrast to classical machine learning observation matrices, here the X has four dimensions!
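The four-dimensional layout (batch, height, width, channels, often abbreviated NHWC) can be reproduced in isolation with toy numbers, using the same trick as the to_tf_format function above:

```python
import numpy as np

# Three toy 32x32 grayscale "images"
imgs = [np.zeros((32, 32), dtype=np.float32) for _ in range(3)]

# Add a trailing channel axis to each image, then stack along a new batch axis
batch = np.stack([img[:, :, np.newaxis] for img in imgs], axis=0)
print(batch.shape)  # (3, 32, 32, 1): batch, height, width, channels
```

With 39,209 real samples instead of 3, this is exactly the (39209, 32, 32, 1) shape printed earlier.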
The last step of our preprocessing is the train/test split. We want to train our model on a subset of the dataset, and then measure the performance on the leftover samples, that is, the test set. To do so, let's use the function provided by sklearn:
from sklearn.model_selection import train_test_split

idx_train, idx_test = train_test_split(range(dataset.X.shape[0]), test_size=0.25, random_state=101)
X_train = dataset.X[idx_train, :, :, :]
X_test = dataset.X[idx_test, :, :, :]
y_train = dataset.y[idx_train, :]
y_test = dataset.y[idx_test, :]
print(X_train.shape)
print(y_train.shape)
print(X_test.shape)
print(y_test.shape)
In this example, we'll use 75% of the samples in the dataset for training and the remaining 25% for testing. In fact, here's the output of the previous code:
(29406, 32, 32, 1)
(29406, 43)
(9803, 32, 32, 1)
(9803, 43)
Train the model and make predictions
The first thing to have is a function to create minibatches of training data. In fact, at each training iteration, we'd need to insert a minibatch of samples extracted from the training set. Here, we'll build a function that takes the observations, labels, and batch size as arguments, and returns a minibatch generator. Furthermore, to introduce some variability in the training data, let's add another argument to the function: the possibility to shuffle the data, to have different minibatches of data for each generator. Having different minibatches of data in each generator will force the model to learn the input-output connection and not memorize the sequence:
def minibatcher(X, y, batch_size, shuffle):
    assert X.shape[0] == y.shape[0]
    n_samples = X.shape[0]
    if shuffle:
        idx = np.random.permutation(n_samples)
    else:
        idx = list(range(n_samples))
    for k in range(int(np.ceil(n_samples / batch_size))):
        from_idx = k * batch_size
        to_idx = (k + 1) * batch_size
        yield X[idx[from_idx:to_idx], :, :, :], y[idx[from_idx:to_idx], :]
To test this function, let's print the shapes of the minibatches while imposing batch_size=10000:
for mb in minibatcher(X_train, y_train, 10000, True):
    print(mb[0].shape, mb[1].shape)
That prints the following:
(10000, 32, 32, 1) (10000, 43)
(10000, 32, 32, 1) (10000, 43)
(9406, 32, 32, 1) (9406, 43)
Unsurprisingly, the 29,406 samples in the training set are split into three minibatches: two of 10,000 elements each, and a final, partial one of 9,406 elements. Of course, there are the same numbers of elements in the label matrix too.
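The batch count follows directly from the ceiling in the generator's for loop; a quick arithmetic check with the numbers above:

```python
import numpy as np

n_samples, batch_size = 29406, 10000

# Number of minibatches the generator yields: ceil(29406 / 10000)
print(int(np.ceil(n_samples / batch_size)))  # 3

# Size of the last, partial minibatch
print(n_samples % batch_size)  # 9406
```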
It's now time to build the model, finally! Let's first build the blocks that will compose the network. We can start by creating a fully connected layer with a variable number of units (it's an argument), without activation. We've decided to use Xavier initialization for the coefficients (weights) and 0-initialization for the biases, to have the layer centered and scaled properly. The output is simply the multiplication of the input tensor by the weights, plus the bias. Please take a look at the dimensionality of the weights, which is defined dynamically, and therefore can be used anywhere in the network:
import tensorflow as tf

def fc_no_activation_layer(in_tensors, n_units):
    w = tf.get_variable('fc_W',
        [in_tensors.get_shape()[1], n_units],
        tf.float32,
        tf.contrib.layers.xavier_initializer())
    b = tf.get_variable('fc_B',
        [n_units, ],
        tf.float32,
        tf.constant_initializer(0.0))
    return tf.matmul(in_tensors, w) + b
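To see what Xavier (Glorot) initialization actually does, here is a NumPy sketch of its uniform variant, the default behavior of tf.contrib.layers.xavier_initializer; the function name is ours, for illustration only:

```python
import numpy as np

def xavier_uniform(fan_in, fan_out, seed=0):
    """Sample a weight matrix the way Glorot/Xavier uniform initialization
    does: uniform in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)),
    which gives Var(W) = 2 / (fan_in + fan_out) and keeps activations
    roughly on the same scale across layers."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    rng = np.random.default_rng(seed)
    return rng.uniform(-limit, limit, size=(fan_in, fan_out)).astype(np.float32)

# For example, a 1,024-unit layer feeding the 43 output classes
w = xavier_uniform(1024, 43)
print(w.shape)  # (1024, 43)
print(round(float(w.var()), 4))  # close to 2 / (1024 + 43), about 0.0019
```

This scaling, combined with zero biases as in fc_no_activation_layer above, is what keeps the layer "centered and scaled properly" at the start of training.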
Let's now create the fully connected layer with activation; specifically, here we will use the leaky ReLU. As you can see, we can build this function using the previous one:
def fc_layer(in_tensors, n_units):
    return tf.nn.leaky_relu(fc_no_activation_layer(in_tensors, n_units))
Finally, let's create a convolutional layer that takes as arguments the input data, kernel size, and number of filters (or units). We will use the same activation as in the fully connected layer: the output passes through a leaky ReLU activation:
def conv_layer(in_tensors, kernel_size, n_units):
    w = tf.get_variable('conv_W',
        [kernel_size, kernel_size, in_tensors.get_shape()[3], n_units],
        tf.float32,
        tf.contrib.layers.xavier_initializer())
    b = tf.get_variable('conv_B',
        [n_units, ],
        tf.float32,
        tf.constant_initializer(0.0))
    return tf.nn.leaky_relu(tf.nn.conv2d(in_tensors, w, [1, 1, 1, 1], 'SAME') + b)
Now, it's time to create a maxpool_layer. Here, the pooling window and the strides are both square:
def maxpool_layer(in_tensors, sampling):
    return tf.nn.max_pool(in_tensors, [1, sampling, sampling, 1], [1, sampling, sampling, 1], 'SAME')
The last thing to define is the dropout, used for regularizing the network. It is a pretty simple thing to create, but remember that dropout should only be used when training the network, and not when predicting the outputs; therefore, we need a conditional operator to define whether to apply dropout or not:
def dropout(in_tensors, keep_proba, is_training):
    return tf.cond(is_training, lambda: tf.nn.dropout(in_tensors, keep_proba), lambda: in_tensors)
Finally, it's time to put it all together and create the model as previously defined. We'll create a model composed of the following layers:
1. 2D convolution, 5x5, 32 filters
2. 2D convolution, 5x5, 64 filters
3. Flattenizer
4. Fully connected layer, 1,024 units
5. Dropout, 40%
6. Fully connected layer, no activation
7. Softmax output
Here's the code:
def model(in_tensors, is_training):
    # First layer: 5x5 2d-conv, 32 filters, 2x maxpool, 20% dropout
    with tf.variable_scope('l1'):
        l1 = maxpool_layer(conv_layer(in_tensors, 5, 32), 2)
        l1_out = dropout(l1, 0.8, is_training)
    # Second layer: 5x5 2d-conv, 64 filters, 2x maxpool, 20% dropout
    with tf.variable_scope('l2'):
        l2 = maxpool_layer(conv_layer(l1_out, 5, 64), 2)
        l2_out = dropout(l2, 0.8, is_training)
    with tf.variable_scope('flatten'):
        l2_out_flat = tf.layers.flatten(l2_out)
    # Fully connected layer, 1024 neurons, 40% dropout
    with tf.variable_scope('l3'):
        l3 = fc_layer(l2_out_flat, 1024)
        l3_out = dropout(l3, 0.6, is_training)
    # Output
    with tf.variable_scope('out'):
        out_tensors = fc_no_activation_layer(l3_out, N_CLASSES)
    return out_tensors
And now, let's write the function to train the model on the training set and test the performance on the test set. Please note that all of the following code belongs to the train_model function; it's broken down into pieces just for simplicity of explanation.
The function takes as arguments (other than the training and test sets and their labels) the learning rate, the number of epochs, and the batch size, that is, the number of images per training batch. First things first, some TensorFlow placeholders are defined: one for the minibatch of images, one for the minibatch of labels, and the last one to select whether to run for training or not (that's mainly used by the dropout layers):
from sklearn.metrics import classification_report, confusion_matrix

def train_model(X_train, y_train, X_test, y_test, learning_rate, max_epochs, batch_size):
    in_X_tensors_batch = tf.placeholder(tf.float32, shape=(None, RESIZED_IMAGE[0], RESIZED_IMAGE[1], 1))
    in_y_tensors_batch = tf.placeholder(tf.float32, shape=(None, N_CLASSES))
    is_training = tf.placeholder(tf.bool)
Now, let's define the output, metric score, and optimizer. Here, we decided to use the AdamOptimizer and the cross entropy with softmax (logits) as the loss:
    logits = model(in_X_tensors_batch, is_training)
    out_y_pred = tf.nn.softmax(logits)
    loss_score = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=in_y_tensors_batch)
    loss = tf.reduce_mean(loss_score)
    optimizer = tf.train.AdamOptimizer(learning_rate).minimize(loss)
And finally, here's the code for training the model with minibatches:
    with tf.Session() as session:
        session.run(tf.global_variables_initializer())
        for epoch in range(max_epochs):
            print("Epoch=", epoch)
            tf_score = []
            for mb in minibatcher(X_train, y_train, batch_size, shuffle=True):
                tf_output = session.run([optimizer, loss],
                                        feed_dict={in_X_tensors_batch: mb[0],
                                                   in_y_tensors_batch: mb[1],
                                                   is_training: True})
                tf_score.append(tf_output[1])
            print("train_loss_score=", np.mean(tf_score))
After the training, it's time to test the model on the test set. Here, instead of sending a minibatch, we will use the whole test set. Mind it! is_training should be set to False, since we don't want to use the dropouts:
        print("TEST SET PERFORMANCE")
        y_test_pred, test_loss = session.run([out_y_pred, loss],
                                             feed_dict={in_X_tensors_batch: X_test,
                                                        in_y_tensors_batch: y_test,
                                                        is_training: False})
And, as a final operation, let's print the classification report and plot the confusion matrix (and its log2 version) to see the misclassifications:
        print("test_loss_score=", test_loss)
        y_test_pred_classified = np.argmax(y_test_pred, axis=1).astype(np.int32)
        y_test_true_classified = np.argmax(y_test, axis=1).astype(np.int32)
        print(classification_report(y_test_true_classified, y_test_pred_classified))
        cm = confusion_matrix(y_test_true_classified, y_test_pred_classified)
        plt.imshow(cm, interpolation='nearest', cmap=plt.cm.Blues)
        plt.colorbar()
        plt.tight_layout()
        plt.show()
        # And the log2 version, to emphasize the misclassifications
        plt.imshow(np.log2(cm + 1), interpolation='nearest', cmap=plt.get_cmap("tab20"))
        plt.colorbar()
        plt.tight_layout()
        plt.show()
    tf.reset_default_graph()
Finally, let's run the function with some parameters. Here, we will run the model with a learning step of 0.001, 256 samples per minibatch, and 10 epochs:
train_model(X_train, y_train, X_test, y_test, 0.001, 10, 256)
Here's the output:
Epoch= 0
train_loss_score= 3.4909246
Epoch= 1
train_loss_score= 0.5096467
Epoch= 2
train_loss_score= 0.26641673
Epoch= 3
train_loss_score= 0.1706828
Epoch= 4
train_loss_score= 0.12737551
Epoch= 5
train_loss_score= 0.09745725
Epoch= 6
train_loss_score= 0.07730477
Epoch= 7
train_loss_score= 0.06734192
Epoch= 8
train_loss_score= 0.06815668
Epoch= 9
train_loss_score= 0.060291935
TEST SET PERFORMANCE
test_loss_score= 0.04581982
This is followed by the classification report per class:
             precision    recall  f1-score   support
          0       1.00      0.96      0.98        67
          1       0.99      0.99      0.99       539
          2       0.99      1.00      0.99       558
          3       0.99      0.98      0.98       364
          4       0.99      0.99      0.99       487
          5       0.98      0.98      0.98       479
          6       1.00      0.99      1.00       105
          7       1.00      0.98      0.99       364
          8       0.99      0.99      0.99       340
          9       0.99      0.99      0.99       384
         10       0.99      1.00      1.00       513
         11       0.99      0.98      0.99       334
         12       0.99      1.00      1.00       545
         13       1.00      1.00      1.00       537
         14       1.00      1.00      1.00       213
         15       0.98      0.99      0.98       164
         16       1.00      0.99      0.99        98
         17       0.99      0.99      0.99       281
         18       1.00      0.98      0.99       286
         19       1.00      1.00      1.00        56
         20       0.99      0.97      0.98        78
         21       0.97      1.00      0.98        95
         22       1.00      1.00      1.00        97
         23       1.00      0.97      0.98       123
         24       1.00      0.96      0.98        77
         25       0.99      1.00      0.99       401
         26       0.98      0.96      0.97       135
         27       0.94      0.98      0.96        60
         28       1.00      0.97      0.98       123
         29       1.00      0.97      0.99        69
         30       0.88      0.99      0.93       115
         31       1.00      1.00      1.00       178
         32       0.98      0.96      0.97        55
         33       0.99      1.00      1.00       177
         34       0.99      0.99      0.99       103
         35       1.00      1.00      1.00       277
         36       0.99      1.00      0.99        78
         37       0.98      1.00      0.99        63
         38       1.00      1.00      1.00       540
         39       1.00      1.00      1.00        60
         40       1.00      0.98      0.99        85
         41       1.00      1.00      1.00        47
         42       0.98      1.00      0.99        53
avg / total       0.99      0.99      0.99      9803
As you can see, we managed to reach a precision of 0.99 on the test set; recall and the f1-score also have the same value. The model looks stable, since the loss on the test set is similar to the one reported in the last training iteration; therefore, we're neither over-fitting nor under-fitting.
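Precision, recall, and f1 matching at two decimal places is no surprise: the f1-score is the harmonic mean of precision and recall, so when the two are close, f1 lands between them. A quick check against two rows of the report above:

```python
def f1(precision, recall):
    # f1 is the harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

# Class 0 in the report: precision 1.00, recall 0.96 -> f1 0.98
print(round(f1(1.00, 0.96), 2))
# Class 30, the weakest class: precision 0.88, recall 0.99 -> f1 0.93
print(round(f1(0.88, 0.99), 2))
```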
And the confusion matrices:
The following is the log2 version of the preceding matrix:
Follow-up questions
Try adding/removing some CNN layers and/or fully connected layers. How does the performance change?
This simple project is proof that dropout is necessary for regularization. Change the dropout percentages and check the overfitting/underfitting in the output.
Now, take a picture of multiple traffic signs in your city, and test the trained model in real life!
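For the last question, the key point is that a new photo must be brought into the exact same (1, 32, 32, 1) float32 format the network was trained on. Here is a minimal, NumPy-only sketch of such a preprocessing step; the function name is ours, and for a real run you should prefer the chapter's rgb2lab plus skimage.transform.resize pipeline, so that test-time preprocessing matches training exactly:

```python
import numpy as np

RESIZED_IMAGE = (32, 32)

def preprocess_for_model(img_rgb):
    """Turn a photo (H x W x 3, values in 0..255) into the (1, 32, 32, 1)
    float32 tensor the trained network expects.

    For brevity, this uses a luminance weighted sum and nearest-neighbor
    resizing; it is a rough stand-in for the chapter's rgb2lab + resize."""
    img = img_rgb.astype(np.float32) / 255.0
    gray = 0.299 * img[:, :, 0] + 0.587 * img[:, :, 1] + 0.114 * img[:, :, 2]
    h, w = gray.shape
    rows = np.arange(RESIZED_IMAGE[0]) * h // RESIZED_IMAGE[0]
    cols = np.arange(RESIZED_IMAGE[1]) * w // RESIZED_IMAGE[1]
    small = gray[rows][:, cols]  # nearest-neighbor downsampling
    return small[np.newaxis, :, :, np.newaxis].astype(np.float32)

# A random stand-in for a real photo
photo = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
x = preprocess_for_model(photo)
print(x.shape)  # (1, 32, 32, 1)
```

The resulting tensor can then be fed to the trained graph the same way the test set was, for example via session.run(out_y_pred, feed_dict={in_X_tensors_batch: x, is_training: False}), with np.argmax on the softmax output giving the predicted class.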
Summary
In this chapter, we saw how to recognize traffic signs using a convolutional neural network, or CNN. In the next chapter, we'll see something more complex that can be done with CNNs.
Annotating Images with Object Detection API
Computer vision has made great leaps forward in recent years because of deep learning, thus granting computers a higher grade of understanding of visual scenes. The potential of deep learning in vision tasks is great: allowing a computer to visually perceive and understand its surroundings is a capability that opens the door to new artificial intelligence applications in both mobility (for instance, self-driving cars can detect whether an appearing obstacle is a pedestrian, an animal, or another vehicle from the camera mounted on the car, and decide the correct course of action) and human-machine interaction in everyday-life contexts (for instance, allowing a robot to perceive surrounding objects and successfully interact with them).
After presenting ConvNets and how they operate in the first chapter, we now intend to create a quick, easy project that will help you to use a computer to understand images taken from cameras and mobile phones, using images collected from the Internet or directly from your computer's webcam. The goal of the project is to find the exact location and the type of the objects in an image.
In order to achieve such classification and localization, we will leverage the new TensorFlow object detection API, a Google project that is part of the larger TensorFlow models project, which makes a series of pre-trained neural networks available off-the-shelf for you to wrap up in your own custom applications.
In this chapter, we are going to illustrate the following:
The advantages of using the right data for your project
A brief presentation of the TensorFlow object detection API
How to annotate stored images for further use
How to visually annotate a video using moviepy
How to go real-time by annotating images from a webcam