
Typesafe Abstractions for Tensor Operations (Short Paper)

Tongfei Chen
Johns Hopkins University, USA

[email protected]

Abstract
We propose a typesafe abstraction for tensors (i.e. multidimensional arrays) exploiting the type-level programming capabilities of Scala through heterogeneous lists (HList), and showcase typesafe abstractions of common tensor operations and various neural layers such as convolutional or recurrent neural networks. This abstraction could lay the foundation of future typesafe deep learning frameworks that run on Scala/JVM.

CCS Concepts • Software and its engineering → Software libraries and repositories;

Keywords Scala, tensor, heterogeneous list, deep learning

ACM Reference Format:
Tongfei Chen. 2017. Typesafe Abstractions for Tensor Operations (Short Paper). In Proceedings of 8th ACM SIGPLAN International Scala Symposium (SCALA'17). ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3136000.3136001

1 Introduction
Recently the machine learning community has seen a surge of libraries that handle tensors. Examples include Python libraries such as NumPy [Walt et al. 2011], Theano [Theano Development Team 2016], TensorFlow [Abadi et al. 2015], PyTorch¹ and DyNet [Neubig et al. 2017], or Java libraries like Nd4j². These libraries provide abstractions for tensor operations on CPUs or GPUs, and some support automatic differentiation on computational graphs. For tensors in these libraries (which all belong to a single type, NdArray/Tensor), a specific meaning is implicitly assigned to each of the dimensions. Correctly manipulating the different dimensions of tensors can be difficult, rendering the whole program prone to runtime errors and hard to reason about or maintain: we could mistakenly add up a 1-D tensor and a 2-D tensor, or perform matrix multiplication between a 2-D tensor and a 3-D tensor. These errors are only discovered at runtime, or in some libraries even silently ignored because of implicit broadcasting. Programmers must keep track of the axes of the tensors themselves (usually as comments in code), rather than leveraging the type system to guide them.

¹ github.com/pytorch/pytorch.  ² github.com/deeplearning4j/nd4j.

SCALA'17, October 22–23, 2017, Vancouver, Canada
© 2017 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of 8th ACM SIGPLAN International Scala Symposium (SCALA'17), https://doi.org/10.1145/3136000.3136001.

We would like to have a typing mechanism for tensors that not only encodes the rank of the tensor, but also encodes what each axis means. Scala, being a statically-checked type-safe language that is highly capable of type-level programming and DSL construction, as demonstrated by the popular library Shapeless³, is an ideal language for implementing such typeful and typesafe tensor abstractions.

We describe the design and implementation of such a typesafe abstraction that addresses the typesafety problem of other tensor libraries, and release a prototype.⁴

³ github.com/milessabin/shapeless.  ⁴ github.com/ctongfei/nexus.

2 Typesafe Tensors
We propose a typesafe tensor abstraction, in which the axes are encoded by a heterogeneous list (HList) type parameter.

trait Tensor[D, A <: HList]

D is the type of the elements this tensor holds (e.g. Float, Double, etc.), whereas the types in the HList type parameter A are phantom types, i.e., they only serve as labels.

Basic constructs such as scalars, vectors and matrices can be represented as follows (types A, B, etc. are labels/names for axes).

type Scalar = Tensor[Float, HNil]
type Vector[A] = Tensor[Float, A :: HNil]
type Matrix[A, B] = Tensor[Float, A :: B :: HNil]

Looking at more concrete examples from deep learning applications: images in computer vision, or sentences in which each word is mapped to a word embedding (i.e. a vector representation of that word in R^d space) in natural language processing, can be encoded as follows (see Fig. 1).

type Image =
  Tensor[Float, Width :: Height :: Channel :: HNil]

type EmbeddedSentence =
  Tensor[Float, Word :: Embedding :: HNil]
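The axis labels themselves carry no data; a minimal sketch of how such labels could be declared (the prototype may well declare them differently, e.g. as traits or objects):

// Phantom axis labels: empty types used only at the type level.
class Width;  class Height;  class Channel
class Word;   class Embedding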

By encoding the meaning of each axis into the type, our system guarantees that all operations on tensors are allowed only if their operands' axes make sense mathematically.

Figure 1. Example of encoding an image and a sentence as typesafe tensors.

For different mathematical tensor operators, different type guarantees are made. For instance:

• Add (+) guarantees that only two tensors of the exact same axes can be added. Vector[A] and Vector[B] are not addable.

• MatMul (matrix multiplication) guarantees that only two matrices in the shape of Matrix[A, B] and Matrix[B, C] can be multiplied, i.e., the second axis of the first operand and the first axis of the second operand must match.

The general treatment of operator type safety is addressed in Section 5.

Note that the actual size of each axis is not encoded in the type. Shapeless's Church encoding of natural numbers (Nat), when the number is large, significantly reduces compilation speed. Thus, size mismatch errors (e.g. adding two tensors with the same axes but not the same sizes) will not be captured and will be thrown as runtime errors. We leave the type-level encoding of dimension sizes as future work (possibly by using dependent types).
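To make this limitation concrete, the following sketch would typecheck yet fail at runtime (Tensor.fromArray and add are hypothetical names used only for illustration, not part of the prototype):

// Hypothetical constructor and elementwise addition, for illustration only.
// Axis labels are tracked in the type, but sizes are not.
val u: Vector[A] = Tensor.fromArray[Float, A :: HNil](Array(1f, 2f, 3f))  // size 3
val v: Vector[A] = Tensor.fromArray[Float, A :: HNil](Array(1f, 2f))      // size 2
val w = add(u, v)  // typechecks (both are Vector[A]), but fails at runtime: 3 != 2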

2.1 Shape/axes Manipulation Functions
In common libraries for tensor operations there is usually a set of operations for axes manipulation, namely transpose, expand_dims, squeeze, tile, etc. To make these typesafe, we again turn to type-level HList operations.

Take expand_dims as an example. This operation inserts, at a specific position, a dimension of size 1 into a tensor; i.e., given a tensor t of shape (a, b, c), calling expand_dims(t, axis = 1) results in a tensor of shape (a, 1, b, c). To encode this type relation, we define the following type-level function in the style of Shapeless:

trait InsertAt[L <: HList, I <: Nat, X] extends DepFn2[L, X] { type Out <: HList }

object InsertAt {
  type Aux[L <: HList, I <: Nat, X, Out0 <: HList] =
    InsertAt[L, I, X] { type Out = Out0 }

  implicit def at0[T <: HList, H]: Aux[T, _0, H, H :: T] =
    // implementation

  implicit def atN[H, T <: HList, P <: Nat, X, R <: HList]
    (implicit ev: InsertAt.Aux[T, P, X, R]): Aux[H :: T, Succ[P], X, H :: R] =
    // implementation
}

This type-level function InsertAt[L, I, X] represents the result of the type-level computation that inserts type X at the I-th index of the type-level list L. Following the "Aux" pattern, an instance of InsertAt.Aux[L, I, X, R] witnesses that the resulting list is R when inserting X at the I-th index of L.
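For instance, using the axis labels from Section 2, the compiler can witness the following insertion (a sketch of exercising the type-level function; it presumes the at0/atN instances above are implemented):

import shapeless._
import shapeless.nat._

// Inserting Channel at index 1 of Width :: Height :: HNil
// yields Width :: Channel :: Height :: HNil.
implicitly[InsertAt.Aux[
  Width :: Height :: HNil, _1, Channel,
  Width :: Channel :: Height :: HNil]]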

Given this type encoding, we can define a typesafe method expandDims for the Tensor[D, A <: HList] trait that admits a new type (the label/name for the new axis) and a type-level natural number (Nat) that specifies the position at which the type is to be inserted. It has the following type declaration:

def expandDims[X, I <: Nat, B <: HList](axis: X, i: I)
  (implicit d: InsertAt.Aux[A, I, X, B], n: ToInt[I]): Tensor[D, B] =
  // implementation

This definition of expandDims is completely typesafe: the axes of the output tensor are completely known at compile time. Other tensor manipulation functions can follow this pattern.
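For example, inserting a singleton batch axis in front of an embedded sentence is fully checked (a sketch; the Batch label and the sentence value are assumptions made for illustration):

import shapeless.nat._

// Hypothetical axis label, for illustration only.
class Batch

val sent: EmbeddedSentence = ???  // Tensor[Float, Word :: Embedding :: HNil]

// Insert Batch at position 0; the resulting axes are known at compile time.
val batched: Tensor[Float, Batch :: Word :: Embedding :: HNil] =
  sent.expandDims(new Batch, _0)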

3 Computation Graph
For deep learning frameworks, the core algorithm is reverse automatic differentiation [Griewank and Walther 2008; Wengert 1964]. Given the symbolic expression of the loss function of the model, this technique can automatically differentiate through the computation graph and return the gradient for each parameter in the network.

Computation graphs, being abstract syntax trees (ASTs) of symbolic expressions, are naturally encoded as generalized algebraic data types (GADTs) using case classes [Kennedy and Russo 2005; Xi et al. 2003] in Scala. We propose the following GADT definition:

sealed trait Expr[X] { }
case class Input[X]() extends Expr[X]
case class Param[X](var value: X) extends Expr[X]
case class Const[X](val value: X) extends Expr[X]
case class Apply1[X, Y](f: Op1[X, Y], x: Expr[X]) extends Expr[Y]
case class Apply2[X1, X2, Y](f: Op2[X1, X2, Y], x1: Expr[X1], x2: Expr[X2]) extends Expr[Y]

// higher-arities follow

In the definition above, Expr[X] is the base trait for all abstract expressions that conceptually hold values of type X.

Input[X] is any input to the neural network that has type X. It is similar to TensorFlow's tf.placeholder, and does not hold any value.


Param[X] represents parameters of neural networks. The values they contain are subject to update during every training iteration of the neural network.

Const[X] represents constants in neural networks. During the backpropagation computation of gradients, gradients on Consts are not computed.

Apply1[X, Y](f, x) (or its higher-arity generalizations Apply2, etc.) nodes are the intermediate results of applying a differentiable operator f to a symbolic expression x. The Op1, Op2, etc. traits represent differentiable functions, and will be elaborated in the section below.
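As a small illustration, the leaves of a graph are built from these constructors, and operator applications stack Apply nodes on top of them (a sketch; the axis label In and the sigmoid operator instance are placeholders, not the prototype's API):

// Hypothetical axis label and operator instance, for illustration only.
class In
val sigmoid: Op1[Tensor[Float, In :: HNil], Tensor[Float, In :: HNil]] = ???

val x = Input[Tensor[Float, In :: HNil]]()                    // a value-less input node
val w = Param[Tensor[Float, In :: HNil]](???)                 // a trainable parameter node
val y: Expr[Tensor[Float, In :: HNil]] = Apply1(sigmoid, x)   // an operator-application node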

4 Differentiable Tensor Operators
Reverse automatic differentiation is essentially the application of the chain rule of calculus through the computation graph:

∇x = ∇y · ∂y/∂x.

A generic differentiable unary operator f: X → Y should be capable of performing both the forward computation y = f(x) and the reverse computation stated above. It can be encoded as follows (for binary or n-ary functions, the definitions are easily generalized).

trait Op1[X, Y] {
  // Performs the forward computation y = f(x)
  def forward(x: X): Y

  // Performs the backward computation ∇x = ∇y · (∂f/∂x)(x)
  def backward(dy: Y, y: Y, x: X): X
}

In the backward method, because in machine learning applications the loss function is always a scalar value, the type of ∇y is the same as that of y. The reason for including the parameter y in the backward method is that some functions' derivatives are more easily expressed in terms of the value y than in terms of x. For example, the common sigmoid activation function y = σ(x) = 1/(1 + e^(−x)) has the derivative ∂y/∂x = y(1 − y).
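A sigmoid operator fitting this trait might therefore look as follows (a sketch written over Double scalars for brevity; a real implementation would operate on tensors and is not the prototype's verbatim code):

// Minimal Op1 sketch for the sigmoid activation over Double scalars.
object SigmoidOp extends Op1[Double, Double] {
  def forward(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // dy is the upstream gradient ∇y; the local derivative y(1 − y) is
  // expressed with the cached forward value y rather than with x.
  def backward(dy: Double, y: Double, x: Double): Double = dy * y * (1.0 - y)
}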

5 Operator Type Polymorphism
However, an operator can apply to multiple different types. For example, Add (+) can be applied to any two tensors of the same type; however, if their axes do not match, addition is not possible: it makes no sense to add an image of type Tensor[Float, Width :: Height :: HNil] and another, transposed Tensor[Float, Height :: Width :: HNil]. Likewise, matrix multiplication can only be applied to two tensors in which the type of the second axis of the first matrix matches the type of the first axis of the second matrix. Namely, for any types A, B, C, we have matrix multiplication MatMul: (Matrix[A, B], Matrix[B, C]) → Matrix[A, C]. It is obvious that our previously defined traits OpN are not polymorphic enough to allow this.

To capture arbitrary type relations on the inputs/outputs of operators like these, we define the following polymorphic unary operator, which can be considered a differentiable version of Shapeless's polymorphic function Poly1, in which the actual implementation of the function is found through implicit resolution of instances of type F[X, Y] (akin to Shapeless's Case.Aux[X, Y]).

trait PolyOp1[F[X, Y] <: Op1[X, Y]] {
  // Applies this operator to a symbolic expression;
  // requires that the actual function F[X, Y] can be found implicitly.
  def apply[X, Y](x: Expr[X])(implicit f: F[X, Y]): Expr[Y] = Apply1(f, x)
}

This definition captures the following typesafety guarantee: an operator of type PolyOp1[F] can only be applied to an expression of type Expr[X] if an implicit instance of F[X, Y] is found. If found, the type of the resulting expression is Expr[Y]. We can define arbitrary desired type guarantees in F: for each different tensor operator, a different F is defined to express what kinds of operands it can be applied to.

This is easily generalized to higher-arity polymorphic operators. We use two operators to describe the type polymorphism defined above.
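For reference, the binary generalization used by MatMul and Contract below would look roughly like this (a sketch inferred from PolyOp1; the prototype's actual definition may differ):

// Binary analogue of PolyOp1: the implementation F[X1, X2, Y] is resolved
// implicitly from the operand types, and applying it builds an Apply2 node.
trait PolyOp2[F[X1, X2, Y] <: Op2[X1, X2, Y]] {
  def apply[X1, X2, Y](x1: Expr[X1], x2: Expr[X2])
    (implicit f: F[X1, X2, Y]): Expr[Y] = Apply2(f, x1, x2)
}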

5.1 Matrix Multiplication
We define a type-polymorphic matrix multiplication operator MatMul:

object MatMul extends PolyOp2[MatMulF]

trait MatMulF[X1, X2, Y] extends Op2[X1, X2, Y]

object MatMulF {
  implicit def impl[D, A, B, C]: MatMulF[
    Tensor[D, A :: B :: HNil],
    Tensor[D, B :: C :: HNil],
    Tensor[D, A :: C :: HNil]] = // implementation
}

This essentially expresses that MatMul can be applied to two expressions Expr[X1] and Expr[X2] only if an instance of MatMulF[X1, X2, Y] is found. Such an instance exists only if X1 and X2 take the forms Tensor[D, A :: B :: HNil] and Tensor[D, B :: C :: HNil].

For example, suppose we have three tensors with the following types.

val ab: Tensor[Float, A :: B :: HNil]
val ac: Tensor[Float, A :: C :: HNil]
val bc: Tensor[Float, B :: C :: HNil]

When compiling MatMul(ab, bc), according to the definition of PolyOp2, the compiler attempts to find an implicit parameter of type MatMulF[Tensor[D, A :: B :: HNil], Tensor[D, B :: C :: HNil], Y]. MatMulF.impl matches this type, and by type unification we get Y = Tensor[D, A :: C :: HNil]. Hence the result type of MatMul(ab, bc) is Tensor[D, A :: C :: HNil].


However, when compiling MatMul(ab, ac), the compiler attempts to resolve an implicit parameter of type MatMulF[Tensor[D, A :: B :: HNil], Tensor[D, A :: C :: HNil], Y]. Such an implicit cannot be resolved, resulting in a compilation failure, just as we desired. This example shows that, using the polymorphic function traits (PolyOpN) and implicits, we achieve axis typesafety for matrix multiplication (MatMul).

5.2 Tensor Contraction
Tensor contraction is also variously termed einsum (Einstein summation) or tensordot in various Python libraries. Apart from its common usage in deep learning, this operator is also widely found in the sum-product message passing procedure [Pearl 1982] of the belief propagation algorithm for inference in probabilistic graphical models [Koller and Friedman 2009]. Mathematically, given two tensors A, B, and a list of axis pairs l along which the tensors are contracted, the result C retains the axes from both A and B not specified in l, with all other axes marginalized (summed) out. This general operation subsumes many common linear algebra operations:

• Dot product: C = Σ_i A_i B_i;
• Matrix multiplication: C_ik = Σ_j A_ij B_jk;
• Tensor product: C_{i1···im, j1···jn} = A_{i1···im} B_{j1···jn}.

In popular libraries like NumPy or TensorFlow, we write tensordot(A, B, axes=[[1, 0]]) to specify the axes along which tensors are contracted. Manually writing the axes can be difficult; however, using the typeful encoding of tensors, these can be expressed succinctly.

Consider two tensors A and B with axes(A) = {a_1, …, a_m} and axes(B) = {b_1, …, b_n}⁵. We define the natural tensor contraction A ⋈ B as the contraction of all the axes that share the same name/label (similar to the natural join [Codd 1979] in relational databases, whereby columns with the same names are joined). For example, the natural tensor contraction of matrices A[i, j] and B[j, k] is the natural matrix multiplication AB, but the natural tensor contraction of matrices A[i, j] and B[k, j] is the transposed product AB^T, as the contraction aligns the axes with the same name j.

⁵ For a tensor A with type Tensor[D, Axes <: HList], we denote its axes by axes(A) = Axes, which is the HList.

What is axes(A ⋈ B)? Because the axes that occur in both A and B are contracted, we can see that axes(A ⋈ B) = axes(A) △ axes(B), where △ is the symmetric set difference. We can define a type-level function to capture symmetric difference:

trait SymDiff[A <: HList, B <: HList] extends DepFn2[A, B] { type Out <: HList }

Given this, we can encode natural tensor contraction typefully and typesafely, using SymDiff as implicit evidence:

object Contract extends PolyOp2[ContractF]

trait ContractF[X1, X2, Y] extends Op2[X1, X2, Y]

object ContractF {
  implicit def impl[D, A <: HList, B <: HList, C <: HList]
    (implicit C: SymDiff.Aux[A, B, C]):
    ContractF[Tensor[D, A], Tensor[D, B], Tensor[D, C]] =
    // implementation
}
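As an illustration, contracting an embedded sentence with an embedding-to-hidden matrix sums out the shared Embedding axis (a sketch; the Hidden label and the expression values are assumptions made for illustration):

// Hypothetical axis label, for illustration only.
class Hidden

val sentence = Input[Tensor[Float, Word :: Embedding :: HNil]]()
val weights  = Param[Tensor[Float, Embedding :: Hidden :: HNil]](???)

// The shared Embedding axis is contracted away; the result's axes are the
// symmetric difference Word :: Hidden :: HNil, inferred at compile time.
val hidden: Expr[Tensor[Float, Word :: Hidden :: HNil]] =
  Contract(sentence, weights)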

Natural tensor contraction also exhibits an elegant property for automatic differentiation (proof omitted):

∇A = ∇(A ⋈ B) ⋈ B;  ∇B = ∇(A ⋈ B) ⋈ A.

The equations above typecheck since, reading A and B as their axis sets,

A = (A △ B) △ B;  B = (A △ B) △ A.

6 Common Neural Layers
In deep learning applications, multiple neural layers (functions of symbolic expressions) are often stacked together (function composition). There is a collection of common layers that perform certain functions, whose typesafe encodings we describe below.

6.1 Fully-connected Layers
One of the most common neural network layers is the fully-connected layer. It is essentially an affine transformation y = Wx + b on the input vector. It can be encoded as

case class Affine[D, A, B](
  W: Param[Tensor[D, B :: A :: HNil]],
  b: Param[Tensor[D, B :: HNil]]
) extends (Expr[Tensor[D, A :: HNil]] => Expr[Tensor[D, B :: HNil]])

where the input vector has type Tensor[D, A :: HNil] and the output has type Tensor[D, B :: HNil].
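Being a function of expressions, such a layer implements its apply by composing differentiable operators; a sketch of the body (not the prototype's verbatim implementation; Add is assumed to be a PolyOp2-style elementwise addition operator):

// Contract W (axes B :: A) with the input x (axis A): the shared axis A is
// summed out, leaving axis B; then add the bias b elementwise.
def apply(x: Expr[Tensor[D, A :: HNil]]): Expr[Tensor[D, B :: HNil]] =
  Add(Contract(W, x), b)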

6.2 Convolutional Layers
A convolutional layer convolves a kernel with the layer input to produce a tensor of outputs. It is widely used in vision/speech/etc. for its shift-invariance properties. For the 2-dimensional case common in computer vision, it can be encoded as

case class Convolution2D[D, W, H, IC, OC](
  W: Param[Tensor[D, OC :: IC :: HNil]],
  b: Param[Tensor[D, OC :: HNil]]
) extends (Expr[Tensor[D, W :: H :: IC :: HNil]] =>
    Expr[Tensor[D, W :: H :: OC :: HNil]])

where the input (an image) has axes width, height and input channel (e.g. RGB), and the output has three axes: width, height and output channel.

6.3 Recursive Layers
Sequential recurrent neural networks can be considered as a semiautomaton (S, I, δ), where S = R^h is the set of hidden states, I = R^d is the set of input vectors, and δ: (S, I) → S is the transition function, i.e. the recurrent unit. We encode the recurrent unit as a type

type RecurrentUnit[D, S, I] =
  (Tensor[D, S :: HNil], Tensor[D, I :: HNil]) => Tensor[D, S :: HNil]

A recurrent unit such as the LSTM [Hochreiter and Schmidhuber 1997] would be encoded as a subtype of RecurrentUnit.

Given an input sequence (Seq[Expr[Tensor[D, I :: HNil]]]; an example would be a sentence in which each word is represented by a vector), we can naturally use Scala's default combinators foldLeft/foldRight on sequences with a recurrent unit to get the final hidden state (Expr[Tensor[D, H :: HNil]]), and scanLeft/scanRight to get all the hidden states (Seq[Expr[Tensor[D, H :: HNil]]]). This generalizes to trees (e.g. sentiment analysis on dependency parse trees of sentences [Tai et al. 2015]), where we can define a tree recursion unit and use it with a fold operation (a catamorphism [Sheard and Fegaras 1993]) on trees.
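A sketch of this folding pattern (the axis labels I and H, the expression-level recurrent step rnn, the initial state h0 and the input sequence words are all placeholders for illustration):

// Hypothetical axis labels and values, for illustration only.
class I; class H

val rnn: (Expr[Tensor[Float, H :: HNil]], Expr[Tensor[Float, I :: HNil]]) =>
         Expr[Tensor[Float, H :: HNil]] = ???        // recurrent step over expressions
val h0: Expr[Tensor[Float, H :: HNil]] = ???          // initial hidden state
val words: Seq[Expr[Tensor[Float, I :: HNil]]] = ???  // embedded input sequence

val last   = words.foldLeft(h0)(rnn)   // final hidden state
val states = words.scanLeft(h0)(rnn)   // all hidden states (including h0)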

7 Usability
Exploiting complex libraries such as Shapeless may impose some difficulties on programmers: slow compilation speed (due to recursive implicit resolution) and confusing compiler error messages.

Since tensors in machine learning usually have at most 5 dimensions (in video processing, a tensor could have 5 dimensions: batch, time, height, width, color channel), performing type-level HList operations at compile time or runtime incurs only a negligible overhead. Compiling a normal 3-layer neural network takes only an instant.

We have implemented a simple XOR network (2, 2, 2 neurons for each layer) and a simple image classification network for MNIST (784, 300, 100, 10 neurons for each layer). Future work includes deep convolutional networks for image classification; recurrent networks for text annotation; and sequence-to-sequence transduction tasks such as machine translation. Runtime benchmarks are left as future work, since we do not have a native CPU or GPU underlying implementation as of now.

Compiler errors can be customized by using the Scala @implicitNotFound annotation. Using the matrix multiplication (MatMul) example above, we could annotate as follows.

@implicitNotFound("Cannot apply MatMul to ${X1} and ${X2}.")
trait MatMulF[X1, X2, Y] extends Op2[X1, X2, Y]

When multiplying two tensors that should not be multiplied (e.g., calling MatMul on two tensors with types Tensor[Float, A::B::HNil] and Tensor[Float, C::B::HNil]), we get the compiler message "Cannot apply MatMul to Tensor[Float, ::[A,::[B,HNil]]] and Tensor[Float, ::[C,::[B,HNil]]]"⁶, located right at the application site of the operator MatMul. Additionally, IDEs (e.g. the Scala plugin in IntelliJ IDEA) will also detect this kind of type error while editing. These error reporting mechanisms greatly aid programmers in identifying potential typing errors.

8 Related Work
This work is related to the research area of typed linear algebra. Most recent research has focused on typing the sizes of multidimensional arrays by using type-level integers, or typing the unit of measurement of each dimension, instead of what this work presents: typing the meaning of each axis as a label.

[Eaton 2006] implemented a typed linear algebra system in Haskell where dimension sizes are encoded in the type system. [Griffioen 2015] proposed dimensioned matrices/tensors in which each axis is dimensioned with a unit of measurement, so that the units of measurement of each dimension of a tensor can be determined at compile time. [Muranushi and Eisenberg 2014] extended this idea for use in astrophysics research and typed the length of each dimension by using type-level integers.

There has been work that implements tensors/neural networks in typesafe languages such as Java (DeepLearning4j⁷) or Haskell (Grenade⁸). None of these achieve the HList-backed typefulness and typesafety described in this paper.

9 Conclusion and Future Work
A Scala-based typesafe abstraction around tensors in neural networks is presented in this paper. We have demonstrated the typesafety and expressiveness of our abstraction through various examples that are common in deep learning tasks.

Type-level encoding of axis sizes is not explored in this paper. Methods using literal types or dependent types in Scala are out of the scope of this paper and left as future work.

Future work also includes further implementation of this framework: adding optimized CPU/GPU operations, implementing automatic batching, various training methods, etc. We hope that this tool will eventually evolve into a fully-fledged library for typesafe deep learning in Scala.

Acknowledgments
The author thanks the three anonymous reviewers, the shepherd Oleg Kiselyov, and Matthew Francis-Landau, Yuhuan Jiang and Tim Vieira, whose suggestions and advice greatly helped improve and clarify this paper.

⁶ To render the HList infix type :: in a more human-readable way, one could use the infix functionality in the Splain Scala compiler plugin (https://github.com/tek/splain).
⁷ github.com/deeplearning4j/deeplearning4j
⁸ github.com/HuwCampbell/grenade


References
Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/ Software available from tensorflow.org.

Edgar F. Codd. 1979. Extending the database relational model to capture more meaning. ACM Transactions on Database Systems (TODS) 4, 4 (1979), 397–434.

Frederik Eaton. 2006. Statically typed linear algebra in Haskell. In Proceedings of the 2006 ACM SIGPLAN Workshop on Haskell. ACM, 120–121.

Andreas Griewank and Andrea Walther. 2008. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM.

P. R. Griffioen. 2015. Type Inference for Array Programming with Dimensioned Vector Spaces. In Proceedings of the 27th Symposium on the Implementation and Application of Functional Programming Languages (IFL '15). ACM, New York, NY, USA, Article 4, 12 pages. https://doi.org/10.1145/2897336.2897341

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735–1780.

Andrew Kennedy and Claudio V. Russo. 2005. Generalized algebraic data types and object-oriented programming. ACM SIGPLAN Notices 40, 10 (2005), 21–40.

Daphne Koller and Nir Friedman. 2009. Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Takayuki Muranushi and Richard A. Eisenberg. 2014. Experience report: Type-checking polymorphic units for astrophysics research in Haskell. In ACM SIGPLAN Notices, Vol. 49. ACM, 31–38.

Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, and Pengcheng Yin. 2017. DyNet: The Dynamic Neural Network Toolkit. arXiv preprint arXiv:1701.03980 (2017).

Judea Pearl. 1982. Reverend Bayes on Inference Engines: A Distributed Hierarchical Approach. Cognitive Systems Laboratory, School of Engineering and Applied Science, University of California, Los Angeles.

Tim Sheard and Leonidas Fegaras. 1993. A fold for all seasons. In Proceedings of the Conference on Functional Programming Languages and Computer Architecture. ACM, 233–242.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015).

Theano Development Team. 2016. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints abs/1605.02688 (May 2016). http://arxiv.org/abs/1605.02688

Stéfan van der Walt, S. Chris Colbert, and Gaël Varoquaux. 2011. The NumPy array: a structure for efficient numerical computation. Computing in Science & Engineering 13, 2 (2011), 22–30.

R. E. Wengert. 1964. A Simple Automatic Derivative Evaluation Program. Commun. ACM 7, 8 (Aug. 1964), 463–464. https://doi.org/10.1145/355586.364791

Hongwei Xi, Chiyan Chen, and Gang Chen. 2003. Guarded recursive datatype constructors. In ACM SIGPLAN Notices, Vol. 38. ACM, 224–235.