CUDA & CAFFE
Using CUDA and Caffe to build deep neural networks
Babii A.S.
Why we need to learn deep learning methods
Deep learning for image recognition tasks
Image classification
Object detection and localization
Object class segmentation
Problems related to dataset size
What if we have a large dataset?
What about types of parallel computing?
GPU-specific
CPU-specific
Optimization table for matrix multiplication [1]
[1] Saman Amarasinghe, Matrix Multiply, a case study, 2008.
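The table itself did not survive in this transcript, so the following is only a sketch of one optimization that matrix-multiply case studies commonly discuss: loop blocking (tiling). It assumes blocking is among the steps the slide refers to; the function names and the block size are illustrative. In pure Python the cache benefit is not measurable, so the code shows only the loop structure and checks that both versions agree.

```python
def matmul_naive(a, b):
    """Textbook triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same product, computed tile by tile so each tile can stay in cache."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, block):
        for k0 in range(0, m, block):
            for j0 in range(0, p, block):
                # Multiply one tile of A by one tile of B.
                for i in range(i0, min(i0 + block, n)):
                    for k in range(k0, min(k0 + block, m)):
                        aik = a[i][k]
                        for j in range(j0, min(j0 + block, p)):
                            c[i][j] += aik * b[k][j]
    return c


A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
B = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
C_naive = matmul_naive(A, B)
C_blocked = matmul_blocked(A, B)
print(C_naive == C_blocked)
```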
No parallelization, but we still want it to run faster
1. Use a profiler (gprof, valgrind, …)
2. Does the application use BLAS?
3. Represent the data as vectors or matrices and use BLAS
4. SIMD: if there is no other way, use it for maximum performance on one core
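Step 1 can be sketched as follows. The slide names gprof and valgrind, which target native code; this sketch swaps in Python's standard `cProfile` module purely for illustration. `slow_dot` and `workload` are hypothetical stand-ins for the application's hot loop, the kind of code that step 3 would hand over to a BLAS routine.

```python
import cProfile
import io
import pstats


def slow_dot(x, y):
    """Element-by-element dot product: a typical candidate for BLAS."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total


def workload():
    """Call the hot function many times so it dominates the profile."""
    x = [float(i) for i in range(1000)]
    total = 0.0
    for _ in range(200):
        total += slow_dot(x, x)
    return total


profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

The report lists `slow_dot` among the most expensive calls, which is the signal to replace it with a library routine before reaching for SIMD.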
How to make it parallel?
1. MKL, PBLAS, ATLAS
2. When is a multicore CPU more efficient than a GPU?
3. NVIDIA CUDA.
4. OpenCL
CUDA
Deep convolutional neural networks, CAFFE implementation
ConvNet configuration by Krizhevsky [2]
Deep convolutional network example
Convolutional neural network architecture model [3]
Feature maps
http://www.songho.ca/dsp/convolution/convolution.html
Convolution & pooling
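A minimal sketch of the two operations on a single-channel image, in pure Python. It assumes no padding, stride 1 for the convolution, and non-overlapping max pooling; the function names and the sample data are illustrative. As in most CNN code, the kernel is not flipped, so strictly speaking this is cross-correlation rather than the textbook convolution.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1, no flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            out[y][x] = sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
    return out


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(fmap[y + i][x + j] for i in range(size) for j in range(size))
            for x in range(0, len(fmap[0]) - size + 1, size)
        ]
        for y in range(0, len(fmap) - size + 1, size)
    ]


image = [
    [1, 2, 3, 0, 1],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 3],
    [2, 1, 0, 1, 2],
    [3, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]  # responds to values along the main diagonal

feature_map = conv2d_valid(image, kernel)  # 5x5 image -> 4x4 feature map
pooled = max_pool(feature_map)             # 4x4 feature map -> 2x2
print(feature_map)
print(pooled)
```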
A set of primitives for deep learning networks
1. Convolutional layer
2. Filtering layer
3. Pooling layer
Integration with Caffe
24-core Intel E5-2679v2 CPU @ 2.4 GHz vs. NVIDIA K40
Feature maps
Feature map [4]
We overlay them on top of one another, but with a "transparency coefficient"
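One way to read the overlay idea is as a weighted sum of equally sized feature maps, where each weight plays the role of the transparency coefficient. This is an interpretation of the slide, not something it states; the function name, maps, and weights below are illustrative.

```python
def overlay(feature_maps, weights):
    """Weighted sum of equally sized 2-D maps; each weight acts as an opacity."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    out = [[0.0] * width for _ in range(height)]
    for fmap, weight in zip(feature_maps, weights):
        for y in range(height):
            for x in range(width):
                out[y][x] += weight * fmap[y][x]
    return out


maps = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
blended = overlay(maps, [0.5, 0.25])
print(blended)
```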
Libraries for deep learning
- Caffe: deep convolutional neural network framework, http://caffe.berkeleyvision.org
- ConvNetJS: JS-based deep learning framework, http://cs.stanford.edu/people/karpathy/convnetjs/
- DL4J: Java-based deep learning framework, http://deeplearning4j.org/
- Theano: CPU/GPU symbolic expression compiler in Python, http://deeplearning.net/software/theano
- Cuda-Convnet: a fast C++/CUDA implementation of convolutional (or more generally, feed-forward) neural networks, http://code.google.com/p/cuda-convnet/
- Torch: a MATLAB-like environment for state-of-the-art machine learning algorithms, in Lua, http://www.torch.ch/
- Accord.NET: C# deep learning, http://accord-framework.net/, tutorial: http://whoopsidaisies.hatenablog.com/entry/2014/08/19/015420
http://deeplearning.net/software_links/
Working with Caffe
It is best to start with the command-line utilities:
build/tools
The most accessible example is based on MNIST: handwritten digit recognition
http://caffe.berkeleyvision.org/gathered/examples/mnist.html
cd $CAFFE_ROOT
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
cd $CAFFE_ROOT
./examples/mnist/train_lenet.sh
In what form are the input and output data provided?
Input data
- databases (LevelDB or LMDB): http://leveldb.org/, http://symas.com/mdb/
- directly from memory
- from files on disk in HDF5
- common image formats
Output data
- snapshot file with the model
- snapshot file with the solver state
Why the solver state? So that interrupted training can be resumed from a snapshot.
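Resuming from a snapshot can be sketched by assembling the command line for Caffe's `caffe train` tool, which accepts `--solver` and `--snapshot` flags. The helper name is hypothetical, and the file paths are assumptions modeled on the MNIST example; exact names depend on the Caffe version and the solver's snapshot settings. The sketch only builds and prints the command, it does not run Caffe.

```python
import shlex


def caffe_train_command(solver, snapshot=None, caffe_bin="./build/tools/caffe"):
    """Build the argument list for `caffe train`, optionally resuming training."""
    argv = [caffe_bin, "train", "--solver=" + solver]
    if snapshot is not None:
        # Points at the solver-state snapshot, not at the model weights file.
        argv.append("--snapshot=" + snapshot)
    return argv


fresh = caffe_train_command("examples/mnist/lenet_solver.prototxt")
resumed = caffe_train_command(
    "examples/mnist/lenet_solver.prototxt",
    snapshot="examples/mnist/lenet_iter_5000.solverstate",  # assumed file name
)
print(" ".join(shlex.quote(part) for part in resumed))
```

To actually launch training, the list could be passed to `subprocess.run(resumed, check=True)` from the `$CAFFE_ROOT` directory.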
Caffe layer types
Caffe stores and communicates data in 4-dimensional arrays called blobs.

    name: "LogReg"
    layers {
      name: "mnist"
      type: DATA
      top: "data"
      top: "label"
      data_param {
        source: "input_leveldb"
        batch_size: 64
      }
    }
    layers {
      name: "ip"
      type: INNER_PRODUCT
      bottom: "data"
      top: "ip"
      inner_product_param {
        num_output: 2
      }
    }
    layers {
      name: "loss"
      type: SOFTMAX_LOSS
      bottom: "ip"
      bottom: "label"
      top: "loss"
    }
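A sketch of how a 4-dimensional blob maps onto flat memory, assuming the usual Caffe layout of number x channels x height x width stored in row-major order. The batch size of 64 comes from the config above; the 1 x 28 x 28 image shape is an assumption matching MNIST, and the function name is illustrative.

```python
def blob_offset(shape, n, c, h, w):
    """Flat index of element (n, c, h, w) in a row-major N x C x H x W blob."""
    num, channels, height, width = shape
    if not (0 <= n < num and 0 <= c < channels
            and 0 <= h < height and 0 <= w < width):
        raise IndexError("index outside blob shape")
    return ((n * channels + c) * height + h) * width + w


shape = (64, 1, 28, 28)  # batch of 64 single-channel 28x28 images (assumed)
count = shape[0] * shape[1] * shape[2] * shape[3]

first_of_second_image = blob_offset(shape, 1, 0, 0, 0)
last_element = blob_offset(shape, 63, 0, 27, 27)
print(count, first_of_second_image, last_element)
```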
Layer types
Convolutional layer
Required fields:
- num_output (c_o): the number of filters
- kernel_size (or kernel_h and kernel_w): the height and width of each filter
Pooling layer
Required field:
- kernel_size (or kernel_h and kernel_w): the height and width of each pooling window
Loss Layers, Activation / Neuron Layers, Data Layers, Common Layers
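When configuring a network, `kernel_size` together with stride and padding determines the spatial size of each layer's output. The sketch below uses the standard output-size formulas, rounding down for convolution and up for pooling, which is how Caffe is understood to behave here; treat the rounding detail as an assumption to verify against your Caffe version. The layer sizes (5x5 convolutions, 2x2 pooling with stride 2, 28x28 input) are assumed from the LeNet MNIST example.

```python
import math


def conv_output_size(size, kernel, stride=1, pad=0):
    """Output height or width of a convolution (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_output_size(size, kernel, stride=1, pad=0):
    """Output height or width of a pooling layer (rounds up)."""
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


size = 28                                            # MNIST input (assumed)
after_conv1 = conv_output_size(size, kernel=5)
after_pool1 = pool_output_size(after_conv1, kernel=2, stride=2)
after_conv2 = conv_output_size(after_pool1, kernel=5)
after_pool2 = pool_output_size(after_conv2, kernel=2, stride=2)
print(after_conv1, after_pool1, after_conv2, after_pool2)
```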
How to configure?
Ready-to-use models are in the examples folder
Solving your own task
1. Take care of the correctness, size, and coverage of the datasets.
2. Compile Caffe with GPU support.
3. Configure the network, starting from the examples.
4. Train, then look at the result on the test set.
5. If the result is unsatisfactory, tune and retrain until it is good enough.
6. To use the trained network on single images, write a config and use C++, Python, or MATLAB.
References
1. L. Deng and D. Yu, "Deep Learning: Methods and Applications", http://research.microsoft.com/pubs/209355/DeepLearning-NowPublishing-Vol7-SIG-039.pdf
2. ConvNet configuration by Krizhevsky et al., http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf
3. Efficient mapping of the training of Convolutional Neural Networks to a CUDA-based cluster, http://parse.ele.tue.nl/education/cluster2
4. http://www.cs.toronto.edu/~ranzato/research/projects.html
5. http://www.amolgmahurkar.com/classifySTLusingCNN.html
Thank you for your attention!