
Page 1: Towards Chainer v1.5

Towards Chainer v1.5

10/14 Chainer meetup @ PFI/PFN

Seiya Tokui (Preferred Networks)

Page 2: Towards Chainer v1.5

Development history

• 6/12: v1.0

– Basics of Variable/Function, FunctionSet & Optimizer, CUDA support

• 7/7: v1.1

– Caffe reference model, type checking (forward/backward), Py3 support

• 8/19: v1.2

– Many functions added; collect_parameters deprecated; type checking on backward removed

• 9/2: v1.3

– CuPy; functions module reorganized

Page 3: Towards Chainer v1.5

CuPy

• CUDA array implementation with a NumPy-subset API

• Custom elementwise and reduction kernels are still supported (with broadcasting); see the sketch after this list

• No dependence on PyCUDA or scikit-cuda

– Cf.) the sudden renaming of scikits.cuda to scikit-cuda

• NumPy API coverage is still incomplete

• Most operations are not yet supported at the Function/Variable level
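
As a quick illustration of the custom-kernel support mentioned above, here is a minimal sketch of an elementwise kernel with broadcasting; the kernel body and names are illustrative examples, not taken from the slides:

    import cupy

    # A custom elementwise kernel: computes the squared difference of two
    # arrays. The type signature, body, and name are arbitrary examples.
    squared_diff = cupy.ElementwiseKernel(
        'float32 x, float32 y',    # input arguments
        'float32 z',               # output argument
        'z = (x - y) * (x - y)',   # CUDA C snippet run per element
        'squared_diff')

    x = cupy.arange(10, dtype=cupy.float32).reshape(2, 5)
    y = cupy.arange(5, dtype=cupy.float32)   # shape (5,), broadcast against x
    z = squared_diff(x, y)                   # result has shape (2, 5)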


Page 4: Towards Chainer v1.5

Development history

• 6/12: v1.0

– Basics of Variable/Function, FunctionSet & Optimizer, CUDA support

• 7/7: v1.1

– Caffe reference model, type checking (forward/backward), Py3 support

• 8/19: v1.2

– Many functions added; collect_parameters deprecated; type checking on backward removed

• 9/2: v1.3

– CuPy; functions module reorganized

• 10/28: v1.4 (planned, delayed)

– Some functions added?

Page 5: Towards Chainer v1.5

The cause of the delay

• New model structure (#363)

• I've been working on this since the release of v1.3

• The design has turned out to be unexpectedly difficult

– Still in the design phase

– I'm planning to release this feature in v1.5

Page 6: Towards Chainer v1.5

Objective

• Replacement of FunctionSet/Optimizer

• Goals:

– Provide a solid way of sharing and reusing (sub)network definitions

– Avoid the “to_cpu/to_gpu trap” between FunctionSet and Optimizer

– Portable save/load

– Make all functions pure for more flexibility and reusability

Page 7: Towards Chainer v1.5

Solution (current idea)

• Hierarchy of network definitions

• Example:

– An autoencoder uses an encoder network and a decoder network

– Each of these networks might be an MLP, a ConvNet, etc.

– An MLP consists of several fully-connected layers

– Each fully-connected layer defines a simple operation on the input variable

• Call each component a chain

• Modeling in Chainer will be linking several chains into one big chain

Page 8: Towards Chainer v1.5

Terminology

• Link

– A minimal component of a chain (e.g. Linear, Convolution2D, etc.)

– Called a “parameterized function” in previous versions

– It combines parameter variables with input variables to compute the output variables

• Chain, ChainList

– Composition of child chains (including links)

– Chain manages its child chains with a dictionary, while ChainList uses a list (see the sketch below)
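
A rough sketch of the distinction, assuming the API lands roughly as planned; the class names and layer sizes below are my own examples:

    import chainer
    import chainer.links as L

    # Chain: child links are registered under names and accessed as attributes.
    class TwoLayerNet(chainer.Chain):
        def __init__(self):
            super(TwoLayerNet, self).__init__(
                l1=L.Linear(784, 100),
                l2=L.Linear(100, 10),
            )

    # ChainList: child links are kept in order and accessed by index.
    class LinearStack(chainer.ChainList):
        def __init__(self, n_layers):
            layers = [L.Linear(100, 100) for _ in range(n_layers)]
            super(LinearStack, self).__init__(*layers)

    net = TwoLayerNet()
    stack = LinearStack(3)
    net.l1      # child link looked up by name
    stack[0]    # child link looked up by index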


Page 9: Towards Chainer v1.5

Schematic of Link/Chain

Example of a classifier with a multi-layer perceptron

[Diagram: a Classifier chain wraps an MLP chain as its predictor; the MLP is made of three Linear links (layer1, layer2, layer3); the inputs x and t pass through the predictor and a loss Function to produce loss.]
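
A sketch of how this diagram might translate into code under the planned design; the layer sizes and the choice of activation and loss functions are my own assumptions, and the real example code is in the gist linked on the following slides:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class MLP(chainer.Chain):
        """Predictor chain: three Linear links with nonlinearities in between."""
        def __init__(self):
            super(MLP, self).__init__(
                layer1=L.Linear(784, 100),
                layer2=L.Linear(100, 100),
                layer3=L.Linear(100, 10),
            )

        def __call__(self, x):
            h = F.relu(self.layer1(x))
            h = F.relu(self.layer2(h))
            return self.layer3(h)

    class Classifier(chainer.Chain):
        """Wraps a predictor chain and turns its output into a loss."""
        def __init__(self, predictor):
            super(Classifier, self).__init__(predictor=predictor)

        def __call__(self, x, t):
            y = self.predictor(x)
            return F.softmax_cross_entropy(y, t)

    model = Classifier(MLP())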

Page 10: Towards Chainer v1.5

Schematic of Link/Chain

Example of a Variational AutoEncoder

[Diagram: a VariationalAutoEncoder chain contains an encoder MLP and a decoder MLP(?), each built from Linear links; x is encoded to z and decoded back; the loss is the sum (+) of the KL divergence (kld) and the negative log-likelihood (nll).]
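
Likewise, a hedged sketch of what this diagram might look like in code; the layer sizes, chain structure, and the specific loss functions (F.gaussian_kl_divergence, F.bernoulli_nll, F.gaussian) are my choices for illustration, not taken from the slides:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    class Encoder(chainer.Chain):
        """MLP mapping x to the mean and log-variance of q(z|x)."""
        def __init__(self, n_in=784, n_h=100, n_z=20):
            super(Encoder, self).__init__(
                l1=L.Linear(n_in, n_h),
                l_mu=L.Linear(n_h, n_z),
                l_ln_var=L.Linear(n_h, n_z),
            )

        def __call__(self, x):
            h = F.relu(self.l1(x))
            return self.l_mu(h), self.l_ln_var(h)

    class Decoder(chainer.Chain):
        """MLP mapping a latent z back to the input space."""
        def __init__(self, n_z=20, n_h=100, n_out=784):
            super(Decoder, self).__init__(
                l1=L.Linear(n_z, n_h),
                l2=L.Linear(n_h, n_out),
            )

        def __call__(self, z):
            return self.l2(F.relu(self.l1(z)))

    class VariationalAutoEncoder(chainer.Chain):
        """Combines the encoder and decoder chains; loss = kld + nll."""
        def __init__(self, encoder, decoder):
            super(VariationalAutoEncoder, self).__init__(
                encoder=encoder, decoder=decoder)

        def __call__(self, x):
            mu, ln_var = self.encoder(x)
            kld = F.gaussian_kl_divergence(mu, ln_var)  # KL term (kld)
            z = F.gaussian(mu, ln_var)                  # reparameterized sample
            nll = F.bernoulli_nll(x, self.decoder(z))   # reconstruction term (nll)
            return kld + nll

    model = VariationalAutoEncoder(Encoder(), Decoder())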

Page 11: Towards Chainer v1.5

Define by Run

• Note that these diagrams do not mean the computational graph must be fixed at the definition of the chains

– The graph is dynamically constructed during the forward computation (define-by-run)

• A chain might implement multiple methods that construct different graphs
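
For example, the Classifier sketched a few slides back could expose a second, purely hypothetical method for inference; each call builds its own graph on the fly:

    import chainer
    import chainer.functions as F

    class Classifier(chainer.Chain):
        def __init__(self, predictor):
            super(Classifier, self).__init__(predictor=predictor)

        def __call__(self, x, t):
            # Training path: the graph is built up to the scalar loss.
            return F.softmax_cross_entropy(self.predictor(x), t)

        def predict(self, x):
            # Inference path: a different, smaller graph with no loss node.
            return F.softmax(self.predictor(x))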


Page 12: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

Page 13: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

Page 14: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

The user can freely design the predictor chain.
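
For instance, the Classifier chain from the earlier sketch could wrap any predictor chain; this deeper variant is a made-up example reusing the names from that sketch:

    import chainer
    import chainer.functions as F
    import chainer.links as L

    # Any chain that maps x to class scores can serve as the predictor.
    class DeepMLP(chainer.Chain):
        def __init__(self):
            super(DeepMLP, self).__init__(
                layer1=L.Linear(784, 200),
                layer2=L.Linear(200, 200),
                layer3=L.Linear(200, 200),
                layer4=L.Linear(200, 10),
            )

        def __call__(self, x):
            h = F.relu(self.layer1(x))
            h = F.relu(self.layer2(h))
            h = F.relu(self.layer3(h))
            return self.layer4(h)

    model = Classifier(DeepMLP())   # Classifier is from the earlier sketch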

Page 15: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

Page 16: Towards Chainer v1.5

Example (gist: https://goo.gl/JKQgSy)

The user can freely design the encoder/decoder chains.

Page 17: Towards Chainer v1.5

Planned features of Link/Chain/ChainList

• The hierarchy is directly mapped to the HDF5 format on serialization

– Only the parameters and auxiliary variables (computed during learning) are saved

• Helper methods to traverse the hierarchy (see the sketch below)

– Iterate over all subchains in the hierarchy

– Iterate over all parameter variables in the hierarchy
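
A hedged sketch of both features, assuming the serializer and traversal method names below (save_hdf5/load_hdf5, children, namedparams) survive into the release:

    import chainer
    from chainer import serializers

    model = Classifier(MLP())   # chains from the earlier sketch

    # Serialization: the chain hierarchy maps onto the group structure of an
    # HDF5 file; only parameters and persistent values updated during
    # learning are stored, not the graph itself.
    serializers.save_hdf5('model.h5', model)
    serializers.load_hdf5('model.h5', model)

    # Traversal helpers over the hierarchy.
    for child in model.children():            # iterate direct subchains/links
        print(child.name)
    for name, param in model.namedparams():   # iterate all parameter variables
        print(name, param.data.shape)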


Page 18: Towards Chainer v1.5

New Optimizer

• Optimizer is also updated

• Optimizer will be aware of its target chain

– It tracks the migration of the target chain between CPUs and GPUs

• Optimizer is also serializable (in HDF5 format)
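
A minimal sketch of the intended usage, with a dummy minibatch and the SGD optimizer chosen purely for illustration:

    import numpy as np
    import chainer
    from chainer import optimizers, serializers

    model = Classifier(MLP())            # chains from the earlier sketch
    optimizer = optimizers.SGD(lr=0.01)
    optimizer.setup(model)               # the optimizer now knows its target chain

    # Dummy minibatch, just to make the sketch self-contained.
    x = np.random.rand(32, 784).astype(np.float32)
    t = np.random.randint(0, 10, size=32).astype(np.int32)

    # One update step: the loss is computed, backprop runs, parameters move.
    optimizer.update(model, x, t)

    # Moving the chain should be enough: per this slide, the optimizer tracks
    # the migration of its target chain between CPUs and GPUs.
    model.to_gpu()

    # The optimizer (including any per-parameter state) is serializable too.
    serializers.save_hdf5('optimizer.h5', optimizer)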


Page 19: Towards Chainer v1.5

Parallel work: introduction of Cython

• CuPy drawback: the CPU-side manipulation is slow

• There is no single huge bottleneck: the causes of the slowdown are scattered

• The easiest point to fix: ctypes

– ctypes is very slow

– Even retrieving the current device consumes non-negligible running time

– @okuta-san is working on replacing it with Cython

• Major impact on the Chainer package

– The low-level interface will change

– setup.py is drastically updated (a Cython extension requires Cython to build, while the package must remain installable in environments where Cython is not yet installed); see the sketch below
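
The usual way to satisfy that constraint is to build from .pyx sources when Cython is available and otherwise fall back to pre-generated C files; a generic sketch (package and module names are placeholders, not Chainer's actual layout):

    # setup.py sketch: use Cython when it is installed, otherwise compile
    # the shipped, pre-generated .c sources. All names here are placeholders.
    from setuptools import setup, Extension

    try:
        from Cython.Build import cythonize
        extensions = cythonize(
            [Extension('mypkg.cuda_runtime', ['mypkg/cuda_runtime.pyx'])])
    except ImportError:
        extensions = [
            Extension('mypkg.cuda_runtime', ['mypkg/cuda_runtime.c'])]

    setup(name='mypkg', ext_modules=extensions)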


Page 20: Towards Chainer v1.5

Future work

• Lazy computation

– See the VAE example: it computes all intermediate variables in the __call__ operator, while a user might only want some of them

– Chainer currently computes eagerly, which causes unneeded computations

– Avoiding unneeded computations is one of the easiest graph optimizations

– More generally, I believe the future lies in a fusion of the symbolic and dynamic paradigms

• Symbolic optimization of computations on Variables (loop fusion, etc.)

• Variable tags (or annotations)

– Cf.) Blocks

• Learning process abstraction, data loading abstraction, etc.