In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
Mapillary Research
Paper: https://arxiv.org/abs/1712.02616
Code: https://github.com/mapillary/inplace_abn
CSC2548, 2018 Winter
Harris Chan
Jan 31, 2018
Overview
• Motivation for Efficient Memory Management
• Related Works
  • Reducing precision
  • Checkpointing
  • Reversible Networks [9] (Gomez et al., 2017)
• In-Place Activated Batch Normalization
  • Review: Batch Normalization
  • In-Place Activated Batch Normalization
• Experiments
• Future Directions
Why Reduce Memory Usage?
• Modern computer vision recognition models use deep neural networks to extract features
• Depth/width of networks ~ GPU memory requirements
• Semantic segmentation: training may fit only a single crop per GPU due to suboptimal memory management
• More efficient memory usage during training lets you:
  • Train larger models
  • Use bigger batch sizes / image resolutions
• This paper focuses on increasing the memory efficiency of the training process of deep network architectures at the expense of a small additional computation time
Approaches to Reducing Memory
Reduce memory by…
• Reducing precision (& accuracy)
• Increasing computation time
Related Works: Reducing Precision

| Work | Weight | Activation | Gradients |
| --- | --- | --- | --- |
| BinaryConnect (M. Courbariaux et al., 2015) | Binary | Full precision | Full precision |
| Binarized neural networks (I. Hubara et al., 2016) | Binary | Binary | Full precision |
| Quantized neural networks (I. Hubara et al.) | Quantized, 2, 4, 6 bits | Quantized, 2, 4, 6 bits | Full precision |
| Mixed precision training (P. Micikevicius et al., 2017) | Half precision (fwd/bw) & full precision (master weights) | Half precision | Half precision |
Related Works: Reducing Precision
• Idea: During training, lower the precision (up to binary) of the weights / activations / gradients

| Strength | Weakness |
| --- | --- |
| Reduces memory requirement and size of the model | Often decreases accuracy (newer work attempts to address this) |
| Less power: efficient forward pass | |
| Faster: 1-bit XNOR-count vs. 32-bit floating point multiply | |
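The XNOR-count speedup rests on a simple identity: for two vectors with entries in {-1, +1} packed as bits, the dot product equals twice the number of matching bit positions minus the vector length. The sketch below illustrates that identity in plain Python; the helper names `pack_signs` and `binary_dot` are made up for this example and are not taken from the cited papers' code.

```python
def pack_signs(values):
    """Pack a list of +1/-1 values into an int; bit i is set when values[i] == +1."""
    bits = 0
    for i, v in enumerate(values):
        if v == 1:
            bits |= 1 << i
    return bits


def binary_dot(a_bits, b_bits, n):
    """Dot product of two length-n {-1, +1} vectors given their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # (#matches) - (#mismatches)


a = [1, -1, 1, 1, -1, -1, 1, -1]
b = [1, 1, -1, 1, -1, 1, 1, -1]
reference = sum(x * y for x, y in zip(a, b))   # ordinary multiply-accumulate
result = binary_dot(pack_signs(a), pack_signs(b), len(a))
```

Here `result` and `reference` agree, so the multiply-accumulate loop can be replaced by one XOR, one NOT, and one population count per machine word.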
Related Works: Computation Time
• Checkpointing: trade off memory with computation time
• Idea: Store only a subset of activations ("checkpoints") during the forward pass and recompute the remaining activations as needed during backpropagation
• Depending on the architecture, we can use different strategies to figure out which subsets of activations to store
Related Works: Computation Time
• Let L be the number of identical feed-forward layers:

| Work | Spatial complexity | Computation complexity |
| --- | --- | --- |
| Naive | O(L) | O(L) |
| Checkpointing (Martens and Sutskever, 2012) | O(√L) | O(L) |
| Recursive Checkpointing (T. Chen et al., 2016) | O(log L) | O(L log L) |
| Reversible Networks (Gomez et al., 2017) | O(1) | O(L) |

Table adapted from Gomez et al., 2017. "The Reversible Residual Network: Backpropagation Without Storing Activations". https://arxiv.org/abs/1707.04585
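To make the memory/compute trade-off concrete, here is a rough counting sketch for segment checkpointing. It assumes every layer stores exactly one activation and costs one forward evaluation, which is a simplification; the function name `checkpointed_backward_cost` is hypothetical. With a segment length of about √L, the peak number of stored activations is about 2√L while the number of forward evaluations stays below 2L.

```python
import math


def checkpointed_backward_cost(num_layers, segment):
    """Return (peak stored activations, forward evaluations) for one training step.

    Layers are identical black boxes; only counts are tracked, no real tensors.
    """
    # Forward pass: keep the input of every `segment`-th layer as a checkpoint.
    checkpoints = list(range(0, num_layers, segment))
    forward_evals = num_layers
    peak_stored = len(checkpoints)
    # Backward pass: visit segments last to first, recomputing the activations
    # inside each segment from its checkpoint before backpropagating through it.
    for start in reversed(checkpoints):
        end = min(start + segment, num_layers)
        recomputed = end - start - 1
        forward_evals += recomputed
        peak_stored = max(peak_stored, len(checkpoints) + recomputed)
    return peak_stored, forward_evals


L = 100
naive = checkpointed_backward_cost(L, 1)                 # store everything
sqrt_ckpt = checkpointed_backward_cost(L, math.isqrt(L)) # segments of length 10
```

For L = 100 the naive strategy keeps 100 activations with 100 forward evaluations, while the √L strategy keeps at most 19 activations with 190 forward evaluations.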
Related Works: Computation Time
Reversible ResNet (Gomez et al., 2017)
[Figure panels: Basic Residual Function; Residual Block; RevNet (Forward); RevNet (Backward)]
Gomez et al., 2017. "The Reversible Residual Network: Backpropagation Without Storing Activations". https://arxiv.org/abs/1707.04585
Idea: A reversible residual module allows the current layer's activation to be reconstructed exactly from the next layer's. No need to store any activations for backpropagation!
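A minimal sketch of the reversible block from Gomez et al. (2017), where the input is split into two halves (x1, x2) and the forward rule is y1 = x1 + F(x2), y2 = x2 + G(y1). The residual functions `F` and `G` below are arbitrary stand-ins chosen for illustration (in the paper they are small convolutional sub-networks), and plain Python lists stand in for tensors.

```python
import math


def F(v):
    """Placeholder residual function (an assumption for this example)."""
    return [math.tanh(t) for t in v]


def G(v):
    """Second placeholder residual function (also an assumption)."""
    return [0.5 * t * t for t in v]


def rev_forward(x1, x2):
    y1 = [a + b for a, b in zip(x1, F(x2))]
    y2 = [a + b for a, b in zip(x2, G(y1))]
    return y1, y2


def rev_inverse(y1, y2):
    # Undo the two additions in reverse order; F and G need not be invertible.
    x2 = [a - b for a, b in zip(y2, G(y1))]
    x1 = [a - b for a, b in zip(y1, F(x2))]
    return x1, x2


x1, x2 = [0.3, -1.0, 2.5], [1.2, 0.0, -0.7]
y1, y2 = rev_forward(x1, x2)
r1, r2 = rev_inverse(y1, y2)
```

The reconstruction is exact in exact arithmetic; in floating point `r1` and `r2` match `x1` and `x2` up to rounding error.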
Related Works: Computation Time
Reversible ResNet (Gomez et al., 2017)
Advantage
• No noticeable loss in performance
• Gains in network depth: ~600 vs ~100
• 4x increase in batch size (128 vs 32)
Disadvantage
• Runtime cost: 1.5x of normal training (sometimes less in practice)
• Restrict reversible blocks to have a stride of 1 to not discard information (i.e. no bottleneck layer)
Gomez et al., 2017. "The Reversible Residual Network: Backpropagation Without Storing Activations". https://arxiv.org/abs/1707.04585
Review: Batch Normalization (BN)
• Apply BN on current features (x_i) across the mini-batch
• Helps reduce internal covariate shift & accelerate the training process
• Less sensitive to initialization
Credit: Ioffe & Szegedy, 2015. ArXiv link
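As a reminder of what the BN transform computes, here is a small sketch of the training-time forward pass for a single feature over a mini-batch, following the definition in Ioffe & Szegedy (2015): normalize by the batch mean and (biased) variance, then scale by γ and shift by β. The function name and the use of plain Python lists are choices made for this example only.

```python
import math


def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize one feature over a mini-batch (training mode)."""
    m = len(x)
    mu = sum(x) / m                              # mini-batch mean
    var = sum((v - mu) ** 2 for v in x) / m      # biased mini-batch variance
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mu) * inv_std for v in x]      # normalized activations
    y = [gamma * v + beta for v in x_hat]        # scale and shift
    return y, x_hat, mu, var


x = [1.0, 2.0, 3.0, 4.0]
y, x_hat, mu, var = batch_norm(x, gamma=2.0, beta=0.5)
```

After normalization `x_hat` has zero mean and close to unit variance, so the output `y` has mean β and a standard deviation of roughly |γ|.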
Memory Optimization Strategies
• Let's compare the various strategies for BN + Act:
  1. Standard
  2. Checkpointing (baseline)
  3. Checkpointing (proposed)
  4. In-Place Activated Batch Normalization I
  5. In-Place Activated Batch Normalization II
1: Standard BN Implementation
Gradients for Batch Normalization
Credit: Ioffe & Szegedy, 2015. "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift". ArXiv link
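For reference, the BN backward pass from Ioffe & Szegedy (2015) can be written as follows, with ℓ the loss, m the mini-batch size, x̂_i the normalized input and y_i = γ x̂_i + β the BN output. Note that several terms depend on the input x_i, which is why a standard implementation keeps x in memory.

```latex
\begin{aligned}
\frac{\partial \ell}{\partial \hat{x}_i} &= \frac{\partial \ell}{\partial y_i}\,\gamma \\
\frac{\partial \ell}{\partial \sigma_\mathcal{B}^2} &= \sum_{i=1}^{m} \frac{\partial \ell}{\partial \hat{x}_i}\,(x_i-\mu_\mathcal{B})\cdot\left(-\tfrac{1}{2}\right)\left(\sigma_\mathcal{B}^2+\epsilon\right)^{-3/2} \\
\frac{\partial \ell}{\partial \mu_\mathcal{B}} &= \left(\sum_{i=1}^{m} \frac{\partial \ell}{\partial \hat{x}_i}\cdot\frac{-1}{\sqrt{\sigma_\mathcal{B}^2+\epsilon}}\right) + \frac{\partial \ell}{\partial \sigma_\mathcal{B}^2}\cdot\frac{\sum_{i=1}^{m} -2\,(x_i-\mu_\mathcal{B})}{m} \\
\frac{\partial \ell}{\partial x_i} &= \frac{\partial \ell}{\partial \hat{x}_i}\cdot\frac{1}{\sqrt{\sigma_\mathcal{B}^2+\epsilon}} + \frac{\partial \ell}{\partial \sigma_\mathcal{B}^2}\cdot\frac{2\,(x_i-\mu_\mathcal{B})}{m} + \frac{\partial \ell}{\partial \mu_\mathcal{B}}\cdot\frac{1}{m} \\
\frac{\partial \ell}{\partial \gamma} &= \sum_{i=1}^{m} \frac{\partial \ell}{\partial y_i}\,\hat{x}_i \\
\frac{\partial \ell}{\partial \beta} &= \sum_{i=1}^{m} \frac{\partial \ell}{\partial y_i}
\end{aligned}
```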
2: Checkpointing (baseline)
3: Checkpointing (proposed)
In-Place ABN
• Fuse batch norm and activation layer to enable in-place computation, using only a single memory buffer to store results.
• Encapsulation makes it easy to implement and deploy
• Implemented INPLACE-ABN I layer in PyTorch as a new module
4: In-Place ABN I (Proposed)
• Invertible activation function
• γ ≠ 0
Leaky ReLU is Invertible
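A short sketch of why this holds: for any slope strictly greater than zero, Leaky ReLU is strictly increasing, so each output maps back to exactly one input. The slope value 0.01 below is only a common default used for illustration. With slope 0 (plain ReLU) all negative inputs collapse to 0 and the inverse does not exist.

```python
import math


def leaky_relu(x, slope=0.01):
    return x if x >= 0 else slope * x


def leaky_relu_inverse(y, slope=0.01):
    # Requires slope > 0; the sign of y equals the sign of the original input.
    return y if y >= 0 else y / slope


samples = [-3.0, -0.5, 0.0, 2.0]
roundtrip = [leaky_relu_inverse(leaky_relu(v)) for v in samples]
all_recovered = all(math.isclose(r, v, abs_tol=1e-12) for r, v in zip(roundtrip, samples))
```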
5: In-Place ABN II (Proposed)
Strategies Comparisons

| Strategy | Store | Computation overhead |
| --- | --- | --- |
| Standard | x, z, σ_B, μ_B | - |
| Checkpointing | x, σ_B, μ_B | BN_{γ,β}, φ |
| Checkpointing (proposed) | x, σ_B | π_{γ,β}, φ |
| In-Place ABN I (proposed) | z, σ_B | φ⁻¹, π_{γ,β}⁻¹ |
| In-Place ABN II (proposed) | z, σ_B | φ⁻¹ |
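The in-place rows can be illustrated with a small sketch in the spirit of variant I: the forward pass keeps only the output z and the batch standard deviation σ, and the backward pass recovers x̂ by inverting the activation and the affine map (which is where γ ≠ 0 is needed). This is an illustration for a single feature using plain Python lists, not the authors' PyTorch/CUDA implementation; all function names and the slope value are assumptions made for the example.

```python
import math

SLOPE = 0.01  # Leaky ReLU slope, assumed for this example


def phi(v):
    return v if v >= 0 else SLOPE * v


def phi_inv(v):
    return v if v >= 0 else v / SLOPE


def abn_forward(x, gamma, beta, eps=1e-5):
    """BN followed by Leaky ReLU; returns only what the backward pass keeps."""
    m = len(x)
    mu = sum(x) / m
    sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / m + eps)
    z = [phi(gamma * (v - mu) / sigma + beta) for v in x]
    return z, sigma  # x and x_hat are not stored


def abn_backward(z, sigma, gamma, beta, dz):
    """Gradients w.r.t. x, gamma, beta computed from z and sigma alone."""
    m = len(z)
    y = [phi_inv(v) for v in z]                    # invert the activation
    x_hat = [(v - beta) / gamma for v in y]        # invert the affine map
    dy = [g * (1.0 if v >= 0 else SLOPE) for g, v in zip(dz, z)]
    dgamma = sum(g * h for g, h in zip(dy, x_hat))
    dbeta = sum(dy)
    dx = [(gamma / sigma) * (g - dbeta / m - h * dgamma / m)
          for g, h in zip(dy, x_hat)]
    return dx, dgamma, dbeta


x = [0.5, -1.2, 2.0, 0.3]
gamma, beta = 1.5, 0.2
z, sigma = abn_forward(x, gamma, beta)
dz = [1.0, -2.0, 0.5, 3.0]  # upstream gradient, arbitrary values
dx, dgamma, dbeta = abn_backward(z, sigma, gamma, beta, dz)
```

The backward pass never touches `x`, so the buffer holding the input can be overwritten by `z` during the forward pass. Variant II, per the table, avoids the affine inversion by expressing the gradient in terms of y directly.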
In-Place ABN (Proposed)

| Strength | Weakness |
| --- | --- |
| Reduces memory requirement by half compared to standard; same savings as checkpointing | Requires invertible activation function |
| Empirically faster than naïve checkpointing | …but still slower than the standard (memory-hungry) implementation |
| Encapsulating BN & activation together makes it easy to implement and deploy (plug & play) | |
Experiments: Overview
• 3 major types:
  • Performance on: (1) Image Classification, (2) Semantic Segmentation
  • (3) Timing analysis compared to standard / checkpointing
• Experiment setup:
  • NVIDIA Titan Xp (12 GB RAM/GPU)
  • PyTorch
  • Leaky ReLU activation
Experiments: Image Classification

| | ResNeXt-101 / ResNeXt-152 | WideResNet-38 |
| --- | --- | --- |
| Dataset | ImageNet-1k | ImageNet-1k |
| Description | Bottleneck residual units are replaced with a multi-branch version = "cardinality" of 64 | More feature channels but shallower |
| Data augmentation | Scale smallest side = 256 pixels then randomly crop 224 × 224, per-channel mean and variance normalization | (Same as ResNeXt-101/152) |
| Optimizer | SGD with Nesterov updates; initial learning rate = 0.1; weight decay = 10^-4; momentum = 0.9; 90 epochs, reduce by factor of 10 per 30 epochs | (Same as ResNeXt); 90 epochs, linearly decreasing from 0.1 to 10^-6 |
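The two learning-rate schedules in the optimizer row can be sketched as below. The slides do not say whether the rate is updated per epoch or per iteration, so updating once per epoch (and reaching the final value at the last epoch) is an assumption of this sketch, as are the function names.

```python
import math


def step_lr(epoch, base_lr=0.1, factor=0.1, step=30):
    """Step schedule: divide the rate by 10 every 30 epochs."""
    return base_lr * factor ** (epoch // step)


def linear_lr(epoch, total_epochs=90, start=0.1, end=1e-6):
    """Linear schedule: interpolate from `start` at epoch 0 to `end` at the last epoch."""
    frac = epoch / (total_epochs - 1)
    return start + (end - start) * frac


step_schedule = [step_lr(e) for e in range(90)]
linear_schedule = [linear_lr(e) for e in range(90)]
step_ok = (math.isclose(step_schedule[0], 0.1)
           and math.isclose(step_schedule[30], 0.01)
           and math.isclose(step_schedule[89], 0.001))
```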
Experiments: Leaky ReLU impact
• Using Leaky ReLU performs slightly worse than with ReLU
  • Within ~1%, except for the 320² center crop; the authors argued it was due to non-deterministic training behaviour
• Weaknesses
  • Reporting an average + standard deviation would make the improvements more convincing.
Experiments: Exploiting Memory Saving
Settings compared: Baseline, 1) Larger Batch Size, 2) Deeper Network, 3) Larger Network, 4) Sync BN
• Performance increase for 1-3
• Similar performance with larger batch size vs deeper model (1 vs 2)
• Synchronized INPLACE-ABN did not increase the performance that much
• Notes on synchronized BN: http://hangzh.com/PyTorch-Encoding/notes/syncbn.html
Experiments: Semantic Segmentation
• Semantic Segmentation: Assign categorical labels to each pixel in an image
• Datasets
  • CityScapes
  • COCO-Stuff
  • Mapillary Vistas
Figure credit: https://www.cityscapes-dataset.com/examples/
Experiments: Semantic Segmentation
• Architecture contains 2 parts that are jointly fine-tuned on segmentation data:
  • Body: Classification models pre-trained on ImageNet
  • Head: Segmentation-specific architectures
• Authors used DeepLabV3* as the head
  • Cascaded atrous (dilated) convolutions for capturing contextual info
  • Crop-level features encoding global context
• Maximize GPU usage by:
  • (FIXED CROP) fixing the training crop size and therefore pushing the number of crops per minibatch to the limit
  • (FIXED BATCH) fixing the number of crops per minibatch and maximizing the training crop resolutions
*L. Chen, G. Papandreou, F. Schroff, and H. Adam. "Rethinking atrous convolution for semantic image segmentation." ArXiv Link
Experiments: Semantic Segmentation
• More training data (FIXED CROP) helps a little bit
• Higher input resolution (FIXED BATCH) helps even more than adding more crops
• No qualitative result: probably visually similar to DeepLabV3
Experiments: Semantic Segmentation
Fine-tuned on CityScapes and Mapillary Vistas
• Combination of INPLACE-ABN sync with larger crop sizes improves by ≈0.9% over the best performing setting in Table 3
• Class-uniform sampling: Class-uniformly sampled from eligible image candidates, making sure to take training crops from areas containing the class of interest.
Experiments: Semantic Segmentation
• Currently state of the art for CityScapes for IoU class and iIoU (instance) class
  • iIoU: Weighting the contribution of each pixel by the ratio of the class' average instance size to the size of the respective ground truth instance.
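As a reference for the two metrics, the definitions given on the Cityscapes benchmark page can be written as below. TP, FP and FN are pixel counts of true positives, false positives and false negatives for a class; iTP and iFN are the same counts with each pixel weighted by the ratio described above. FP is left unweighted because false-positive pixels have no associated ground-truth instance.

```latex
\mathrm{IoU} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP} + \mathrm{FN}},
\qquad
\mathrm{iIoU} = \frac{\mathrm{iTP}}{\mathrm{iTP} + \mathrm{FP} + \mathrm{iFN}}
```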
Experiments: Timing Analyses
• They isolated a single BN+ACT+CONV block & evaluated the computation times required for a forward and backward pass
• Result: Narrowed the gap between standard vs checkpointing by half
• Ensured fair comparison by re-implementing checkpointing in PyTorch
Future Directions:
• Apply INPLACE-ABN in other…
  • Architectures: DenseNet, Squeeze-Excitation Networks, Deformable Convolutional Networks
  • Problem domains: Object detection, instance-specific segmentation, 3D data learning
• Combine INPLACE-ABN with other memory reduction techniques, e.g. mixed precision training
• Apply the same in-place idea to 'newer' BatchNorm variants, e.g. Batch Renormalization*
*S. Ioffe. "Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models." ArXiv Link
Links and References
• INPLACE-ABN paper: https://arxiv.org/pdf/1712.02616.pdf
• Official GitHub code (PyTorch): https://github.com/mapillary/inplace_abn
• CityScapes dataset: https://www.cityscapes-dataset.com/benchmarks/#scene-labeling-task
• Reduced precision:
  • BinaryConnect: https://arxiv.org/abs/1511.00363
  • Binarized Networks: https://arxiv.org/abs/1602.02830
  • Mixed Precision Training: https://arxiv.org/abs/1710.03740
• Tradeoff with computation time:
  • Checkpointing: https://www.cs.utoronto.ca/~jmartens/docs/HF_book_chapter.pdf
  • Recursive Checkpointing: https://arxiv.org/abs/1604.06174
  • Reversible Networks: https://arxiv.org/abs/1707.04585