
Page 1: Towards Chainer v1.5

Towards Chainer v1.5

10/14 Chainer meetup @ PFI/PFN

Seiya Tokui (Preferred Networks)

Page 2: Towards Chainer v1.5

Development history

• 6/12: v1.0

– Basics of Variable/Function, FunctionSet & Optimizer, CUDA support

• 7/7: v1.1

– Caffe reference model, type checking (forward/backward), Py3 support

• 8/19: v1.2

– Many functions added; collect_parameters deprecated; type checking on backward removed

• 9/2: v1.3

– CuPy; functions module reorganized

Page 3: Towards Chainer v1.5

CuPy

• CUDA array implementation with a NumPy-subset API

• Custom elementwise and reduction kernels are still supported (with broadcasting); see the sketch after this list

• No dependence on PyCUDA or scikit-cuda

– Cf.) the sudden renaming of scikits.cuda to scikit-cuda

• NumPy API coverage is still incomplete

• Most operations are not yet supported at the Function/Variable level
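
As a quick illustration of the custom-kernel support mentioned above, here is a minimal sketch of an elementwise kernel with broadcasting; the kernel body and names are illustrative examples, not taken from the slides:

    import cupy

    # A custom elementwise kernel: computes the squared difference of two
    # arrays. The type signature, body, and name are arbitrary examples.
    squared_diff = cupy.ElementwiseKernel(
        'float32 x, float32 y',    # input arguments
        'float32 z',               # output argument
        'z = (x - y) * (x - y)',   # CUDA C snippet run per element
        'squared_diff')

    x = cupy.arange(10, dtype=cupy.float32).reshape(2, 5)
    y = cupy.arange(5, dtype=cupy.float32)   # shape (5,), broadcast against x
    z = squared_diff(x, y)                   # result has shape (2, 5)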


Page 4: Towards Chainer v1.5

Development history

• 6/12: v1.0

– Basics of Variable/Function, FunctionSet & Optimizer, CUDA support

• 7/7: v1.1

– Caffe reference model, type checking (forward/backward), Py3 support

• 8/19: v1.2

– Many functions added; collect_parameters deprecated; type checking on backward removed

• 9/2: v1.3

– CuPy; functions module reorganized

• 10/28: v1.4 (planned, delayed)

– Some functions added?

Page 5: Towards Chainer v1.5

The cause of the delay

• New model structure (#363)

• I've been working on this since the release of v1.3

• The design has turned out to be unexpectedly difficult

– Still in the design phase

– I'm planning to release this feature in v1.5

Page 6: Towards Chainer v1.5

Objective

• Replacement of FunctionSet/Optimizer

• Goals:

– Provide a solid way of sharing and reusing (sub)network definitions

– Avoid the “to_cpu/to_gpu trap” between FunctionSet and Optimizer

– Portable save/load

– Make all functions pure for more flexibility and reusability

Page 7: Towards Chainer v1.5

Solution (current idea)

• Hierarchy of network definitions

• Example:

– An autoencoder uses an encoder network and a decoder network

– Each of these networks might be an MLP, a ConvNet, etc.

– An MLP consists of several fully-connected layers

– Each fully-connected layer defines a simple operation on the input variable

• Call each component a chain

• Modeling in Chainer will be linking several chains into one big chain

Page 8: Towards Chainer v1.5

Terminology

• Link

– A minimal component of a chain (e.g. Linear, Convolution2D, etc.)

– Called a “parameterized function” in previous versions

– It combines parameter variables with input variables to compute the output variables

• Chain, ChainList

– Composition of child chains (including links)

– Chain manages its child chains with a dictionary, while ChainList uses a list (see the sketch below)
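
A rough sketch of the distinction, assuming the API lands roughly as planned; the class names and layer sizes below are my own examples:

    import chainer
    import chainer.links as L

    # Chain: child links are registered under names and accessed as attributes.
    class TwoLayerNet(chainer.Chain):
        def __init__(self):
            super(TwoLayerNet, self).__init__(
                l1=L.Linear(784, 100),
                l2=L.Linear(100, 10),
            )

    # ChainList: child links are kept in order and accessed by index.
    class LinearStack(chainer.ChainList):
        def __init__(self, n_layers):
            layers = [L.Linear(100, 100) for _ in range(n_layers)]
            super(LinearStack, self).__init__(*layers)

    net = TwoLayerNet()
    stack = LinearStack(3)
    net.l1      # child link looked up by name
    stack[0]    # child link looked up by index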


Page 9: Towards Chainer v1.5

Schematic of Link/Chain

Example of a classifier with a multi-layer perceptron

[Diagram: a Classifier chain wraps an MLP chain as its predictor; the MLP is made of three Linear links (layer1, layer2, layer3); the inputs x and t pass through the predictor and a loss Function to produce loss.]
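
A sketch of how this diagram might translate into code under the planned design; the layer sizes and the choice of activation and loss functions are my own assumptions, and the real example code is in the gist linked on the following slides:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class MLP(chainer.Chain):
        """Predictor chain: three Linear links with nonlinearities in between."""
        def __init__(self):
            super(MLP, self).__init__(
                layer1=L.Linear(784, 100),
                layer2=L.Linear(100, 100),
                layer3=L.Linear(100, 10),
            )

        def __call__(self, x):
            h = F.relu(self.layer1(x))
            h = F.relu(self.layer2(h))
            return self.layer3(h)

    class Classifier(chainer.Chain):
        """Wraps a predictor chain and turns its output into a loss."""
        def __init__(self, predictor):
            super(Classifier, self).__init__(predictor=predictor)

        def __call__(self, x, t):
            y = self.predictor(x)
            return F.softmax_cross_entropy(y, t)

    model = Classifier(MLP())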

Page 10: Towards Chainer v1.5

Schematic of Link/Chain

Example of a Variational AutoEncoder

[Diagram: a VariationalAutoEncoder chain contains an encoder MLP and a decoder MLP(?), each built from Linear links; x is encoded to z and decoded back; the loss is the sum (+) of the KL divergence (kld) and the negative log-likelihood (nll).]
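
Likewise, a hedged sketch of what this diagram might look like in code; the layer sizes, chain structure, and the specific loss functions (F.gaussian_kl_divergence, F.bernoulli_nll, F.gaussian) are my choices for illustration, not taken from the slides:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class Encoder(chainer.Chain):
        """MLP mapping x to the mean and log-variance of q(z|x)."""
        def __init__(self, n_in=784, n_h=100, n_z=20):
            super(Encoder, self).__init__(
                l1=L.Linear(n_in, n_h),
                l_mu=L.Linear(n_h, n_z),
                l_ln_var=L.Linear(n_h, n_z),
            )

        def __call__(self, x):
            h = F.relu(self.l1(x))
            return self.l_mu(h), self.l_ln_var(h)

    class Decoder(chainer.Chain):
        """MLP mapping a latent z back to the input space."""
        def __init__(self, n_z=20, n_h=100, n_out=784):
            super(Decoder, self).__init__(
                l1=L.Linear(n_z, n_h),
                l2=L.Linear(n_h, n_out),
            )

        def __call__(self, z):
            return self.l2(F.relu(self.l1(z)))

    class VariationalAutoEncoder(chainer.Chain):
        """Combines the encoder and decoder chains; loss = kld + nll."""
        def __init__(self, encoder, decoder):
            super(VariationalAutoEncoder, self).__init__(
                encoder=encoder, decoder=decoder)

        def __call__(self, x):
            mu, ln_var = self.encoder(x)
            kld = F.gaussian_kl_divergence(mu, ln_var)  # KL term (kld)
            z = F.gaussian(mu, ln_var)                  # reparameterized sample
            nll = F.bernoulli_nll(x, self.decoder(z))   # reconstruction term (nll)
            return kld + nll

    model = VariationalAutoEncoder(Encoder(), Decoder())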

Page 11: Towards Chainer v1.5

Define by Run

• Note that these diagrams do not mean the computational graph must be fixed at the definition of the chains

– The graph is dynamically constructed during the forward computation (define-by-run)

• A chain might implement multiple methods that construct different graphs
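
For example, the Classifier sketched a few slides back could expose a second, purely hypothetical method for inference; each call builds its own graph on the fly:

    import chainer
    import chainer.functions as F

    class Classifier(chainer.Chain):
        def __init__(self, predictor):
            super(Classifier, self).__init__(predictor=predictor)

        def __call__(self, x, t):
            # Training path: the graph is built up to the scalar loss.
            return F.softmax_cross_entropy(self.predictor(x), t)

        def predict(self, x):
            # Inference path: a different, smaller graph with no loss node.
            return F.softmax(self.predictor(x))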


Page 12: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

Page 13: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

Page 14: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

The user can freely design the predictor chain.
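
For instance, the Classifier chain from the earlier sketch could wrap any predictor chain; this deeper variant is a made-up example reusing the names from that sketch:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    # Any chain that maps x to class scores can serve as the predictor.
    class DeepMLP(chainer.Chain):
        def __init__(self):
            super(DeepMLP, self).__init__(
                layer1=L.Linear(784, 200),
                layer2=L.Linear(200, 200),
                layer3=L.Linear(200, 200),
                layer4=L.Linear(200, 10),
            )

        def __call__(self, x):
            h = F.relu(self.layer1(x))
            h = F.relu(self.layer2(h))
            h = F.relu(self.layer3(h))
            return self.layer4(h)

    model = Classifier(DeepMLP())   # Classifier is from the earlier sketch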

Page 15: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

Page 16: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

The user can freely design the encoder/decoder chains.

Page 17: Towards Chainer v1.5

Planned features of Link/Chain/ChainList

• The hierarchy is directly mapped to the HDF5 format on serialization

– Only the parameters and auxiliary variables (computed during learning) are saved

• Helper methods to traverse the hierarchy (see the sketch below)

– Iterate over all subchains in the hierarchy

– Iterate over all parameter variables in the hierarchy
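
A hedged sketch of both features, assuming the serializer and traversal method names below (save_hdf5/load_hdf5, children, namedparams) survive into the release:

    import chainer
    from chainer import serializers

    model = Classifier(MLP())   # chains from the earlier sketch

    # Serialization: the chain hierarchy maps onto the group structure of an
    # HDF5 file; only parameters and persistent values updated during
    # learning are stored, not the graph itself.
    serializers.save_hdf5('model.h5', model)
    serializers.load_hdf5('model.h5', model)

    # Traversal helpers over the hierarchy.
    for child in model.children():            # iterate direct subchains/links
        print(child.name)
    for name, param in model.namedparams():   # iterate all parameter variables
        print(name, param.data.shape)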


Page 18: Towards Chainer v1.5

New Optimizer

• Optimizer is also updated

• Optimizer will be aware of its target chain

– It tracks the migration of the target chain between CPUs and GPUs

• Optimizer is also serializable (in HDF5 format)
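
A minimal sketch of the intended usage, with a dummy minibatch and the SGD optimizer chosen purely for illustration:

    import numpy as np
    import chainer
    from chainer import optimizers, serializers

    model = Classifier(MLP())            # chains from the earlier sketch
    optimizer = optimizers.SGD(lr=0.01)
    optimizer.setup(model)               # the optimizer now knows its target chain

    # Dummy minibatch, just to make the sketch self-contained.
    x = np.random.rand(32, 784).astype(np.float32)
    t = np.random.randint(0, 10, size=32).astype(np.int32)

    # One update step: the loss is computed, backprop runs, parameters move.
    optimizer.update(model, x, t)

    # Moving the chain should be enough: per this slide, the optimizer tracks
    # the migration of its target chain between CPUs and GPUs.
    model.to_gpu()

    # The optimizer (including any per-parameter state) is serializable too.
    serializers.save_hdf5('optimizer.h5', optimizer)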


Page 19: Towards Chainer v1.5

Parallel work: introduction of Cython

• CuPy drawback: the CPU-side manipulation is slow

• There is no single huge bottleneck: the causes of the slowdown are scattered

• The easiest point to fix: ctypes

– ctypes is very slow

– Even retrieving the current device consumes non-negligible running time

– @okuta-san is working on replacing it with Cython

• Major impact on the Chainer package

– The low-level interface will change

– setup.py is drastically updated (a Cython extension requires Cython to build, while the package must remain installable in environments where Cython is not yet installed); see the sketch below
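
The usual way to satisfy that constraint is to build from .pyx sources when Cython is available and otherwise fall back to pre-generated C files; a generic sketch (package and module names are placeholders, not Chainer's actual layout):

    # setup.py sketch: use Cython when it is installed, otherwise compile
    # the shipped, pre-generated .c sources. All names here are placeholders.
    from setuptools import setup, Extension

    try:
        from Cython.Build import cythonize
        extensions = cythonize(
            [Extension('mypkg.cuda_runtime', ['mypkg/cuda_runtime.pyx'])])
    except ImportError:
        extensions = [
            Extension('mypkg.cuda_runtime', ['mypkg/cuda_runtime.c'])]

    setup(name='mypkg', ext_modules=extensions)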


Page 20: Towards Chainer v1.5

Future work

• Lazy computation

– See the VAE example: it computes all intermediate variables in the __call__ operator, while a user might only want some of them

– Chainer currently computes eagerly, which causes unneeded computations

– Avoiding unneeded computations is one of the easiest graph optimizations

– More generally, I believe the future lies in a fusion of the symbolic and dynamic paradigms

• Symbolic optimization of computations on Variables (loop fusion, etc.)

• Variable tags (or annotations)

– Cf.) Blocks

• Learning process abstraction, data loading abstraction, etc.