Fachbereich Mathematik und Informatik Programming Languages and Software Technology Group Prof. Ostermann Performance and Interfaces of Datatype-Generic Scala Programs Bachelor Thesis Julian Andres Klode September 29, 2014




Fachbereich Mathematik und Informatik
Programming Languages and Software Technology Group

Prof. Ostermann

Performance and Interfaces of Datatype-Generic Scala Programs

Bachelor Thesis

Julian Andres Klode
Altenritter Str. 4
34270 Schauenburg
Matriculation number: 2403668

Supervised by: Prof. Klaus Ostermann, Yufei Cai

September 29, 2014


Summary

Generic programming is an approach that makes it possible to define functions over the structure of a datatype rather than over a specific datatype. This allows a function to operate on arbitrary datatypes, even those that were not yet known to the programmer of the function.

In this bachelor thesis, I will present ports of three Haskell libraries (LIGD, EMGM, Uniplate) to Scala and compare their functionality and speed with each other and with Shapeless.

I will also show that LIGD can be extended in Scala so that it is almost as powerful as EMGM, by solving the problem of function extensibility, which severely hampered previous implementations.

Abstract

Generic programming is a technique that allows functions to be defined over the shape of a type, rather than a specific type, thus allowing one function to be used on objects of arbitrary types, even those that were not known to the programmer of the function.

In this thesis, I will present ports of three Haskell libraries (LIGD, EMGM, Uniplate) to Scala, and compare their features and performance against each other and against shapeless, a native Scala library.

I will show that LIGD in Scala can be extended to be almost as powerful as EMGM, solving the issue with extensibility that made it vastly inferior in previous implementations.


Contents

1 Introduction

2 Porting from Haskell to Scala
2.1 Type classes and instances
2.2 Making the most of type inference
2.3 Representing universal types

3 Libraries
3.1 LIGD
3.2 EMGM
3.3 Uniplate
3.4 Shapeless

4 Example Operations
4.1 Paradise benchmark
4.2 Locating the smallest integer in a datatype
4.3 First class generic functions

5 Evaluation of the approaches
5.1 Performance benchmark
5.2 Library overview

6 Conclusion


1 Introduction

Datatype-generic operations are operations that are defined over the shapes of objects rather than any specific types of objects. This is different from the concept of parametric polymorphism, also called generic programming in languages like Scala: where parametric polymorphism abstracts away the elements of a container, datatype-generic programming essentially abstracts away the container.

In the rest of this thesis, ‘generic programming’ means ‘datatype-generic programming’.

Example 1.0.0.1 (Parametric polymorphism). The type constructor List is an example of parametric polymorphism. For example, we can compute the length of a list of any element type:

def length[T](list: List[T]) = ...

Example 1.0.0.2 (Datatype-generic operation). A datatype-generic operation is an operation that abstracts over the outer type and works with specific inner types. For example, a sum that can work on containers of integers; that is, something like:

def sum[C[_]](list: C[Int]) = ...
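The contrast between the two examples can be made concrete with a minimal sketch. The type class IntContainer and its two instances are assumptions introduced purely for illustration; they are not part of any library discussed in this thesis, which describe containers through representation types instead.

```scala
object PolymorphismDemo {
  // Parametric polymorphism: the container (List) is fixed, the element type T is abstract.
  def length[T](list: List[T]): Int = list.foldLeft(0)((n, _) => n + 1)

  // Hypothetical type class (an assumption for this sketch): it describes how to
  // enumerate the integers stored in a container of shape C.
  trait IntContainer[C[_]] {
    def ints(c: C[Int]): List[Int]
  }

  implicit val listContainer: IntContainer[List] = new IntContainer[List] {
    def ints(c: List[Int]): List[Int] = c
  }

  implicit val optionContainer: IntContainer[Option] = new IntContainer[Option] {
    def ints(c: Option[Int]): List[Int] = c.toList
  }

  // Datatype-generic flavour: the element type (Int) is fixed, the container C is abstract.
  def sum[C[_]](c: C[Int])(implicit ev: IntContainer[C]): Int = ev.ints(c).sum

  def lengthOfStrings: Int = length(List("a", "b", "c"))
  def sumOfList: Int = sum(List(1, 2, 3))
  def sumOfOption: Int = sum(Option(4))
}
```

Here length never inspects the elements, while sum never commits to a particular container; supporting a new container only requires a new IntContainer instance.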

There are two basic ways in which datatype-generic programming libraries expose their interfaces:

1. By representing objects as sums and products

2. By providing combinators for performing operations

In the first style, a generic operation will traverse the data structure and perform its operation. In the second style, the generic library takes care of the traversal and offers some combinators to allow working with the elements. The first style is thus similar to traversing a list manually, whereas the second style resembles a fold.

What defines a good generic programming library?

In ‘Comparing Libraries for Generic Programming in Haskell’ [RJJ+08], three basic types of operations a library should support were identified. These are (where T is a fixed type):


1. Consumers take a generic object and return an object of a fixed type; they have a type of the form a ⇒ T.

Example: The sum function from above.

2. Transformers map from one generic type to another or the same; they have a type of the form a ⇒ a or a ⇒ b.

Example: A function that increments every integer in an object, returning the modified one.

3. Producers produce a generic object; they have a type of the form T ⇒ a.

Example: A function that takes an integer and the representation of a tree and returns a tree.

In addition to supporting the three types of operations above, a good generic programming library should also fulfil the following criteria:

Ad-hoc cases: Can a function deal with a specific type in an ad-hoc way?
For example, if the library exposes sums and products, can a function only work on sums and products, or can it match some types directly? That is, can we write a function that collects all objects of a specific type?

Extensibility: Can an existing function be extended with new ad-hoc cases?
For example, imagine a pre-existing function to calculate the depth of a data structure. If we write a new tree type that stores the depth in its nodes, can we extend the pre-existing function to simply return that value?

Multi-parametric functions: Can there be functions with two generic parameters?
For example, we might want to define a generic equality function, a generic comparison function, or a function that adds generic values. One other question to consider here is whether the parameters can be of different types.

First-class generics: Can a generic function be passed to another function?
For example, suppose a function gmapQ that takes another generic function f and an object and maps the function f over all members of the object, constructing a list of new objects.

Automatic representations: Generic libraries are usually implemented using some sort of representation objects that describe the shape of objects. Is it possible to create those automatically?


Separate compilation: Can we introduce a new type in a new module and use it with a generic function in an existing module? (All of the libraries in this thesis support this.)

Constructor names: If objects are represented as sums of products, can those products have names? For example, a pretty printer would want to print constructor names.

Current state

Generic programming is primarily used by the Haskell community. There are several libraries available, and the Glasgow Haskell Compiler already provides a library as part of its base package, the Data.Data module. It is based on another library called ‘Scrap your boilerplate’ [LJ03] and is combinator-based.

Scala has one generic programming library called ‘Shapeless’. Shapeless was initially derived from Scrap your boilerplate, but became more sophisticated over time, adding features like heterogeneous lists. This is covered in more detail in section 3.4.


2 Porting from Haskell to Scala

This chapter introduces some basic concepts for translating Haskell code to Scala. Most of these concepts were explained by Oliveira and Gibbons [dSOG08]; this chapter summarises them.

2.1 Type classes and instances

Haskell has a concept of type classes and instances, whereas Scala offers traits. Traits differ from type classes in that their methods always contain an implicit this object; that is, they combine data and functions, whereas type classes do not reference any data.

Question: Given a type class like

class MyEq a where
  myEquals :: a → a → Bool
  myEquals a b = False

how do we translate this to Scala?

Answer: There are two approaches, an object-oriented one and a more functional-style one.

2.1.1 Object-oriented translation

The object-oriented approach uses traits just like interfaces are used in Java. For example, the MyEq type class would be translated to the following trait:

trait MyEq[T] {
  def myEquals(b: T): Boolean = false
}

In order to implement this trait, a class must extend the trait and override its methods:

class MyInt(val i: Int) extends MyEq[MyInt] {
  override def myEquals(b: MyInt): Boolean = i == b.i
}

This already shows one deficiency of the traits approach: we cannot implement a trait for a type without modifying that type itself.

On the other hand, writing a function using this approach is straightforward:

def notEquals[T <: MyEq[T]](a: T, b: T) = !a.myEquals(b)
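Put together, the object-oriented translation can be tried out as a small self-contained sketch. The enclosing object OoTranslationDemo and the two sample calls are assumptions added for illustration.

```scala
object OoTranslationDemo {
  // Type class translated to a trait; the first argument becomes the implicit `this`.
  trait MyEq[T] {
    def myEquals(b: T): Boolean = false
  }

  // The "instance" has to be built into the type itself.
  class MyInt(val i: Int) extends MyEq[MyInt] {
    override def myEquals(b: MyInt): Boolean = i == b.i
  }

  // The class context becomes an upper bound on the type parameter.
  def notEquals[T <: MyEq[T]](a: T, b: T): Boolean = !a.myEquals(b)

  def differentValues: Boolean = notEquals(new MyInt(1), new MyInt(2))
  def sameValues: Boolean = notEquals(new MyInt(3), new MyInt(3))
}
```

A plain Int cannot be passed to notEquals in this encoding; the wrapper class MyInt is needed precisely because Int cannot be made to extend MyEq after the fact.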


This style makes it easy to specify contexts; for example, if we have a class MyOrd for ordering that also requires equality, we can write

trait MyOrd[T] extends MyEq[T] {
  def myLessOrEqual(b: T): Boolean
}

If we want to translate a type class containing a method that does not take a parameter of the type T, this style of writing becomes inconsistent.

Example 2.1.1.1. Consider the type class

class PerformerFactory a where
  factor :: a
  perform :: a → Bool

We can only directly translate the perform method to Scala. The factor method cannot be expressed in this scheme; it would need to be a global function for every type implementing PerformerFactory. The translation would thus be:

trait PerformerFactory {
  def perform: Boolean
}

The end result is inconsistent, because one method is now a method of instance objects whereas the other is translated to a global function. It also permits instances of the trait that only implement the perform method.

2.1.2 The functional approach

Recall the type class MyEq:

class MyEq a where
  myEquals :: a → a → Bool
  myEquals a b = False

We can ignore the implicit this parameter in traits and translate the type class as:

trait MyEq[A] {
  def myEquals(a: A, b: A) = false
}
def myEquals[T](a: T, b: T)(implicit eq: MyEq[T]) = eq.myEquals(a, b)

We also provided a global myEquals function to make our life easier.

Now in order to implement the trait for a type, we need to define an instance of the trait for this type. Instances of those traits correspond to dictionaries in Haskell, and can be implemented like this:

implicit def MyEqInt = new MyEq[Int] {
  override def myEquals(a: Int, b: Int) = a == b
}


The implicit keyword is syntactic sugar provided by Scala that makes our work easier. Without it, if we wanted to compare two objects, we would need to manually pass the instance of the trait to the function, like this:

> myEquals(1, 1)(MyEqInt)
true

By making MyEqInt and the eq parameter of myEquals implicit, the Scala compiler can automatically infer that MyEqInt is the correct argument for the eq parameter, and we can now run:

> myEquals(1, 1)
true

We can also implement instances that require some sort of context. For example, if we want to implement MyEq for tuples of objects that implement MyEq as well, we can write:

implicit def myEqTuple[A, B](implicit ea: MyEq[A], eb: MyEq[B]) =
  new MyEq[(A, B)] {
    override def myEquals(a: (A, B), b: (A, B)) = ea.myEquals(a._1, b._1)
      && eb.myEquals(a._2, b._2)
  }

Now we can compare tuples: for example, in the following call, Scala will automatically infer the correct instance of MyEq[(Int, Int)]:

> myEquals((1, 2), (1, 2))
true

We can still pass the dictionary manually as well:

> myEquals((1, 2), (1, 2))(myEqTuple(MyEqInt, MyEqInt))
true
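The pieces of the functional translation can be assembled into one runnable sketch. The enclosing object, the lower-case instance names and the sample calls are assumptions for illustration. The second half applies the same encoding to PerformerFactory from Example 2.1.1.1; that translation is not given in the text and is only one plausible reading of it: because the trait no longer has an implicit this of type A, a member such as factor that takes no argument of type A fits in without special treatment.

```scala
object FunctionalTranslationDemo {
  // Type class as a trait whose methods receive all values explicitly.
  trait MyEq[A] {
    def myEquals(a: A, b: A): Boolean = false
  }

  // Global helper that looks up the instance (dictionary) implicitly.
  def myEquals[T](a: T, b: T)(implicit eq: MyEq[T]): Boolean = eq.myEquals(a, b)

  implicit val myEqInt: MyEq[Int] = new MyEq[Int] {
    override def myEquals(a: Int, b: Int): Boolean = a == b
  }

  // Instance with a context: requires instances for both components.
  implicit def myEqTuple[A, B](implicit ea: MyEq[A], eb: MyEq[B]): MyEq[(A, B)] =
    new MyEq[(A, B)] {
      override def myEquals(a: (A, B), b: (A, B)): Boolean =
        ea.myEquals(a._1, b._1) && eb.myEquals(a._2, b._2)
    }

  // Assumed functional translation of PerformerFactory (not from the thesis).
  trait PerformerFactory[A] {
    def factor: A
    def perform(a: A): Boolean
  }

  // Hypothetical instance: the factory yields 0, and 0 is the only value that performs.
  implicit val intPerformer: PerformerFactory[Int] = new PerformerFactory[Int] {
    def factor: Int = 0
    def perform(a: Int): Boolean = a == 0
  }

  def factorThenPerform[A](implicit pf: PerformerFactory[A]): Boolean =
    pf.perform(pf.factor)

  def intsEqual: Boolean = myEquals(1, 1)
  def tuplesEqual: Boolean = myEquals((1, 2), (1, 2))
  def tuplesDiffer: Boolean = myEquals((1, 2), (1, 3))
  def explicitDictionary: Boolean =
    myEquals((1, 2), (1, 2))(myEqTuple(myEqInt, myEqInt))
  def intFactoryWorks: Boolean = factorThenPerform[Int]
}
```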

2.2 Making the most of type inference

Compared to Haskell, Scala’s type inference is weaker. For example, when defining a function, the types of its parameters need to be declared.

An exception to this are anonymous functions. If the Scala compiler knows which type the function must have, it can automatically infer the types of the parameters.

Thus, in order to avoid repetitious type information in the code, we can change our MyEq example to the one shown in listing 2.2.1.

Listing 2.2.1: Improving type inference using lambdas

trait MyEq[A] {
  def myEquals: (A ⇒ A ⇒ Boolean) = a ⇒ b ⇒ false
}
implicit def MyEqInt = new MyEq[Int] {
  override def myEquals = a ⇒ b ⇒ a == b
}


This way, type annotations are not required in method implementations anymore. This technique is shown in the port of the EMGM library.

Another thing that seems to matter a lot is the order of parameters. If the less complex types are to the left of the more complex types, the more complex types can be inferred more easily. For example, in Haskell, foldl has the signature:

foldl :: (a → b → a) → a → [b] → a

Scala does not seem to be able to handle this order: it fails to infer the type of the first argument. Moving the first argument to the end fixes the issue, as Scala can then simply infer the types from left to right. It basically looks like this in Scala (except that the first argument would actually be an object which foldLeft is a method of):

foldLeft[a, b] : [b] ⇒ a ⇒ (a ⇒ b ⇒ a) ⇒ a
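The effect of parameter order can be illustrated with two hypothetical wrappers around List.foldLeft; the names foldlFunFirst and foldlFunLast are assumptions made for this sketch. Whether an unannotated lambda is accepted in the function-first variant depends on the compiler version, so the sketch annotates it explicitly there and omits the annotations only in the function-last variant.

```scala
object ParameterOrderDemo {
  // Haskell-like order in a single parameter list: the function comes first.
  def foldlFunFirst[A, B](f: (A, B) => A, z: A, xs: List[B]): A = xs.foldLeft(z)(f)

  // Data first and function last, each in its own parameter list, so that
  // A and B are already known when the compiler reaches the lambda.
  def foldlFunLast[A, B](xs: List[B])(z: A)(f: (A, B) => A): A = xs.foldLeft(z)(f)

  // Function first: the parameter types of the lambda are written out.
  def annotated: Int = foldlFunFirst((acc: Int, x: Int) => acc + x, 0, List(1, 2, 3))

  // Function last: the parameter types of the lambda are inferred.
  def inferred: Int = foldlFunLast(List(1, 2, 3))(0)((acc, x) => acc + x)
}
```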

2.3 Representing universal types

Haskell libraries often use universal types like:

forall a . Rep a ⇒ a → String

Representing such types in Scala is a bit more complicated. We cannot use functions, not even generic ones, because the Scala compiler wants to bind the type a as soon as we pass the function to another function.

There are two alternative encodings, however. Both use the same encoding for the functions; the difference is the encoding of the parameter type of a higher-order function.

So, first of all, in order to encode a function of the type forall a. Rep[a] ⇒ a → b in Scala, we encode it as an object with an apply method containing the actual function.

Listing 2.3.2: Encoding a universal function

object myfunction {
  def apply[A: Rep](a: A): b = ...
}

There are two variants now:

2.3.1 Trait

We can provide a trait that specifies the type of function we want and make our function object extend it. For example:

trait Returning[T] {
  def apply[A: Rep](a: A): T
}


Now we can create a higher-order function accepting such a function (object) by accepting an object of Returning[T].
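A minimal sketch of the trait variant follows. The Rep type class used here is a deliberately tiny stand-in with a single name method, and describe and both are hypothetical names; none of them is taken from the libraries presented later. The point is only that the argument f stays polymorphic inside both, so it can be applied at two different types.

```scala
object UniversalFunctionDemo {
  // Hypothetical stand-in for a representation type class.
  trait Rep[A] {
    def name: String
  }

  implicit val repInt: Rep[Int] = new Rep[Int] { def name: String = "Int" }
  implicit val repString: Rep[String] = new Rep[String] { def name: String = "String" }

  // The type of universal functions "forall a. Rep a => a -> T".
  trait Returning[T] {
    def apply[A: Rep](a: A): T
  }

  // A universal function encoded as an object with a polymorphic apply method.
  object describe extends Returning[String] {
    def apply[A: Rep](a: A): String = implicitly[Rep[A]].name + ": " + a
  }

  // Higher-order function: uses f at Int and at String within one body.
  def both[T](f: Returning[T]): (T, T) = (f(1), f("one"))

  def result: (String, String) = both(describe)
}
```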

2.3.2 Structural type

We can also use structural types to avoid the need to extend the trait, and even the need to provide Returning-like types for every number of parameters we might need. We can still provide a Returning structural type if we like, though, so we do not have to write out the complete type every time.

Using structural types, the Returning type looks like this:

type Returning[T] = { def apply[A: Rep](a: A): T }

and every object providing such a method is implicitly an instance of that type.

Structural types have the minor disadvantage that scala.language.reflectiveCalls must be imported (or the equivalent command-line argument be set) when calling them.


3 Libraries

Haskell provides many libraries to assist with generic programming. Scala, on the other hand, only has the Shapeless library. In this chapter, we will translate some Haskell libraries to Scala and in some cases extend them with new functionality made possible by the Scala type system.

3.1 LIGD

In short: The straightforward solution to most problems

Listing 3.1.1: LIGD Basics

sealed abstract class Rep[T]

implicit case object RBoolean extends Rep[Boolean]
implicit case object RUnit extends Rep[Unit]
implicit case object RInt extends Rep[Int]
implicit case object RFloat extends Rep[Float]
implicit case object RChar extends Rep[Char]
implicit case object RString extends Rep[String]

/** Represent sums */
case class RSum[A, B](val a: Rep[A], val b: Rep[B]) extends Rep[Either[A, B]]
case class RProd[A, B](val a: Rep[A], val b: Rep[B]) extends Rep[(A, B)]

implicit def rSum[A: Rep, B: Rep]: Rep[Either[A, B]] = RSum(rep[A], rep[B])
implicit def rProd[A: Rep, B: Rep]: Rep[(A, B)] = RProd(rep[A], rep[B])

LIGD [CH02] is a simple library that provides a Rep class that describes the structure of a type using sums, products, and scalar values.¹ This version of LIGD is more powerful than the original one, though, as we will see later.

Listing 3.1.1 shows the basic Rep class with instances for scalar types, sums (represented using Either), and products (represented using Tuple2). As explained in the previous chapter, we use implicit here to allow the compiler to automatically infer the correct Rep instance for generic function calls. This utilises a simple helper function called rep, shown in listing 3.1.2.

¹ This version is equivalent to the first encoding in that paper but actually derived from the ‘Comparing libraries for generic programming in Haskell’ paper [RJJ+08].


Listing 3.1.2: LIGD ‘rep’ helper function

def rep[T: Rep] = implicitly[Rep[T]]
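To see the implicit derivation at work, the following sketch condenses listings 3.1.1 and 3.1.2 to two scalar types and asks the compiler for the representation of a nested sum and product. The enclosing object and the names derived and sameScalar are assumptions for illustration; the val modifiers on the case class parameters are dropped because they are redundant there.

```scala
object LigdBasicsDemo {
  sealed abstract class Rep[T]

  implicit case object RInt extends Rep[Int]
  implicit case object RString extends Rep[String]

  case class RSum[A, B](a: Rep[A], b: Rep[B]) extends Rep[Either[A, B]]
  case class RProd[A, B](a: Rep[A], b: Rep[B]) extends Rep[(A, B)]

  // Helper that summons the representation of T.
  def rep[T: Rep]: Rep[T] = implicitly[Rep[T]]

  implicit def rSum[A: Rep, B: Rep]: Rep[Either[A, B]] = RSum(rep[A], rep[B])
  implicit def rProd[A: Rep, B: Rep]: Rep[(A, B)] = RProd(rep[A], rep[B])

  // The compiler assembles this value from rSum, rProd, RInt and RString.
  def derived: Rep[Either[Int, (Int, String)]] = rep[Either[Int, (Int, String)]]

  // Scalar representations are singletons, so they compare equal to themselves.
  def sameScalar: Boolean = rep[Int] == RInt
}
```

Because RSum and RProd are case classes and the scalar representations are objects, the derived value can be compared structurally against a representation that was written out by hand.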

3.1.1 User-defined types

In order to work with user-defined (case) classes, those need to be translated to a representation using sums and products. We need an isomorphism between the user-defined class and a sum-of-products type. First of all, we can define an isomorphism as:

class EP[B, C](val from: B ⇒ C, val to: C ⇒ B)

Now we can define a Rep instance that represents values of a custom type B and converts them to a sum-of-products notation of type C, something like this:

case class RType[C, B](val c: Rep[C], val ep: EP[B, C]) extends Rep[B]

There is one problem, though: the representation of C may be recursive. For example, a list type is a sum of the empty case and a product of an element and the list type itself. Thus, the parameter c must be lazy. Scala does not allow lazy constructor parameters on case classes, though, so this needs to be worked around.

There are two approaches. One is encoding the thunk manually, like:

case class RType[C, B](val c: () ⇒ Rep[C], val ep: EP[B, C]) extends Rep[B]

This is very easy, but looks inconsistent in usage. Another approach is to not use a case class and provide a companion object with custom apply and unapply methods. Listing 3.1.3 shows how this is implemented.

Listing 3.1.3: Implementation of RType

class RType[C, B](c: ⇒ Rep[C], ep: EP[B, C]) extends Rep[B] {
  lazy val a = c
  lazy val b = ep
}

object RType {
  def apply[C, B](a: ⇒ Rep[C], b: EP[B, C]): RType[C, B] = new RType(a, b)
  def unapply[C, B](sum: RType[C, B]) = Some(sum.a, sum.b)
}

/** A small factory to make conversions easier */
def rType[C: Rep, B](from: B ⇒ C, to: C ⇒ B): RType[C, B] = RType(rep[C], EP(from, to))

/** Isomorphism for converting between types */
sealed case class EP[B, C](val from: B ⇒ C, val to: C ⇒ B)

Example 3.1.1.1 (Lists). In order to represent a list, we encode the list as a sum of products; to be precise, we want to encode a List[A] as Either[Unit, (A, List[A])]. In order to do this, we need functions that convert from List[A] to Either[Unit, (A, List[A])] and vice versa. An implementation of those functions is given in listing 3.1.4.


Listing 3.1.4: Implementation of rList

def fromList[A](list: List[A]): Either[Unit, (A, List[A])] = list match {
  case Nil ⇒ Left(())
  case (a :: as) ⇒ Right((a, as))
}

def toList[A](list: Either[Unit, (A, List[A])]): List[A] = list match {
  case Left(()) ⇒ List.empty
  case Right((a, as)) ⇒ a :: as
}

case class RList[A](ra: Rep[A]) extends RType(
  RSum(RUnit, RProd(ra, rList(ra))),
  EP(fromList[A], toList[A])
) {
  override def equals(other: Any): Boolean = other match {
    case RList(rb) ⇒ this.ra == rb
    case _ ⇒ false
  }
}

implicit def rList[A: Rep]: Rep[List[A]] = RList(rep[A])

The first parameter of RType is recursive, allowing us to describe lists of arbitrary length.

There is one issue with this representation, though: it leads to deep recursion, potentially causing a stack overflow in functions using it.
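The role of the by-name parameter can be checked in isolation with a reduced sketch that is fixed to lists of integers. It is a simplification made for illustration, not the RList class of listing 3.1.4: the type alias IntListShape, the value rIntList and the monomorphic conversion functions are assumptions, and only the scalar representations needed here are defined.

```scala
object RecursiveRepDemo {
  sealed abstract class Rep[T]
  case object RUnit extends Rep[Unit]
  case object RInt extends Rep[Int]
  case class RSum[A, B](a: Rep[A], b: Rep[B]) extends Rep[Either[A, B]]
  case class RProd[A, B](a: Rep[A], b: Rep[B]) extends Rep[(A, B)]

  // Isomorphism between a user type B and its structural encoding C.
  case class EP[B, C](from: B => C, to: C => B)

  // The by-name parameter c is only evaluated when `a` is first accessed.
  class RType[C, B](c: => Rep[C], ep: EP[B, C]) extends Rep[B] {
    lazy val a: Rep[C] = c
    lazy val b: EP[B, C] = ep
  }

  type IntListShape = Either[Unit, (Int, List[Int])]

  def fromList(list: List[Int]): IntListShape = list match {
    case Nil     => Left(())
    case x :: xs => Right((x, xs))
  }

  def toList(shape: IntListShape): List[Int] = shape match {
    case Left(_)        => Nil
    case Right((x, xs)) => x :: xs
  }

  // The representation refers to itself; this terminates because the first
  // constructor argument is passed by name and therefore not evaluated yet.
  lazy val rIntList: RType[IntListShape, List[Int]] =
    new RType[IntListShape, List[Int]](
      RSum(RUnit, RProd(RInt, rIntList)),
      EP[List[Int], IntListShape](fromList, toList)
    )

  // Forcing one level of the structure yields the expected sum of products.
  def unfoldsOnce: Boolean = rIntList.a == RSum(RUnit, RProd(RInt, rIntList))
  def roundTrip: List[Int] = rIntList.b.to(rIntList.b.from(List(1, 2, 3)))
  def emptyCase: IntListShape = rIntList.b.from(Nil)
}
```

With a strict constructor parameter, the definition of rIntList would have to evaluate itself before it is initialised; the by-name parameter combined with the lazy val defers that step until the structure is actually inspected.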

3.1.2 Writing generic functions

Writing generic functions using LIGD is easy. We just pass the object(s) we are interested in to the function, along with their type representations, and then pattern match on the type representations.

Example 3.1.2.1 (Generic equality). Listing 3.1.5 shows how to implement a simple function that checks for generic equality.

This function does not deal with cycles, nor does it support very long lists. In the first case, it would need to keep track of which objects it has visited already; in the second case, it would need to manage the stack itself or be tail-recursive.

Making the function tail-recursive seems straightforward. There is one issue, though: Scala will not consider the calls to geq eligible for tail call optimization, as the type parameter differs. The only ways to deal with deeply nested structures are thus manual stack management or trampolines². This does not matter much in this chapter.

² The scala.util.control.TailCalls object makes trampolines easy to write.
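As a minimal illustration of the trampoline technique with scala.util.control.TailCalls, the sketch below uses two mutually recursive functions, another kind of call that the compiler does not turn into a loop on its own. It is not a trampolined version of geq; the function names are assumptions chosen for brevity.

```scala
object TrampolineDemo {
  import scala.util.control.TailCalls.{TailRec, done, tailcall}

  // Each step returns a description of the next call instead of performing it,
  // so the call stack does not grow with n.
  def isEven(n: Int): TailRec[Boolean] =
    if (n == 0) done(true) else tailcall(isOdd(n - 1))

  def isOdd(n: Int): TailRec[Boolean] =
    if (n == 0) done(false) else tailcall(isEven(n - 1))

  // `result` runs the trampoline loop until a `done` value is reached.
  def millionIsEven: Boolean = isEven(1000000).result
}
```

The price is one small heap allocation per step, which is the usual trade-off of trampolining against manual stack management.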


Listing 3.1.5: Generic Equality in LIGD

def geq[A: Rep](a: A, b: A): Boolean = (rep[A], a, b) match {
  case (RUnit, (), ()) ⇒ true
  case (RBoolean, a, b) ⇒ a == b
  case (RInt, a, b) ⇒ a == b
  case (RFloat, a, b) ⇒ a == b
  case (RChar, a, b) ⇒ a == b
  case (RString, a, b) ⇒ a == b
  case (RSum(ra, rb), Left(a1), Left(a2)) ⇒ geq(a1, a2)(ra)
  case (RSum(ra, rb), Right(b1), Right(b2)) ⇒ geq(b1, b2)(rb)
  case (RSum(_, _), _, _) ⇒ false
  case (RProd(ra, rb), (a1, b1), (a2, b2)) ⇒
    geq(a1, a2)(ra) && geq(b1, b2)(rb)
  case (r: RType[_, A], t1, t2) ⇒ geq(r.b.from(t1), r.b.from(t2))(r.a)
  case _ ⇒ false
}

3.1.3 Differences and Extensions to the original Haskell implementation

One minor difference is that we implemented Rep like we would implement a type class in Haskell. The original paper manually passed around Rep instances for generic functions, whereas we make use of Scala’s implicit values.

Ad-hoc cases using subtyping

A key difference that makes it possible to extend the functionality of LIGD in Scala is subtyping. If we look at our definition of RType, we see that it is not sealed, allowing it to be extended by subclasses, while the set of direct subclasses of Rep is still fixed, thus not breaking any existing code.

We can use this to create custom subclasses for types we are interested in, and pattern match on them.

An example of this is the list type. Because we extended RType as RList, we can now write a generic function that finds all lists of a specific element type in a data structure.

Extensibility

Imagine we have an existing generic function that we do not control (for example, in a different module) and want to add a special case to that function. Because the existing generic function calls itself recursively, this is usually impossible.

It turns out that if we change the way we write the generic function slightly, we can actually extend it with new special cases. The changes are very simple:


1. Convert the function into a class (or a trait) with an apply method containing the existing function, and replace all recursive calls to the function with calls to apply.

2. Provide an object that extends that class; this can then be called like a function.

We can now easily extend that ‘function’ by extending the class and overriding the apply method, matching on the cases we are interested in and deferring the remaining cases to the implementation in the super class.

Example 3.1.3.1 (Extending generic equality with a special case for lists). For example, if we transform geq according to those rules, we can easily extend it with a special case for lists, as shown in listing 3.1.6.³

Listing 3.1.6: Extended Generic Equality in LIGD

class geqlist extends geq {
  override def apply[A: Rep](a: A, b: A): Boolean = (rep[A], a, b) match {
    case (RList(ra), xs, ys) ⇒ xs.length == ys.length && (
      xs.isEmpty ||
      xs.zip(ys).map({ case (x, y) ⇒ apply(x, y)(ra) }).min)
    case (r, a, b) ⇒ super.apply(a, b)
  }
}

object geqlist extends geqlist

Other code can now call geqlist instead of geq and benefit from a faster solution.
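The two-step recipe does not depend on Rep, so its mechanics can be shown on a deliberately simplified function that matches on runtime values of type Any instead of type representations. This is an assumption made to keep the sketch short, not the thesis’s geq; the names Size, size, SizeWithLists and sizeWithLists are hypothetical.

```scala
object OpenRecursionDemo {
  // Step 1: the "function" is a class; all recursive calls go through apply.
  class Size {
    def apply(x: Any): Int = x match {
      case (a, b)   => apply(a) + apply(b)
      case Left(a)  => apply(a)
      case Right(b) => apply(b)
      case _        => 1 // everything else counts as a single leaf
    }
  }

  // Step 2: an object extending the class can be called like a function.
  object size extends Size

  // Extension: a special case for lists; all other cases defer to the super class.
  class SizeWithLists extends Size {
    override def apply(x: Any): Int = x match {
      case xs: List[_] => xs.map(e => apply(e)).sum
      case other       => super.apply(other)
    }
  }

  object sizeWithLists extends SizeWithLists

  def plain: Int = size((1, List(2, 3, 4)))             // the list is one leaf
  def extended: Int = sizeWithLists((1, List(2, 3, 4))) // the list counts per element
}
```

The recursive calls inside Size.apply are dispatched dynamically, so they reach the overriding apply of SizeWithLists. This is why the list nested inside the tuple is handled by the new special case even though the tuple case itself was written in the super class.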

A similar approach is used in ‘Lightweight modular staging’ [RO10], although it is based on traits rather than classes and does not use an apply method, thus making it easily possible to combine functions using inheritance rather than composition of objects.

This approach using classes is not applicable to Haskell, but it is possible to do something similar by passing the function as an argument:

1. Convert
   geq :: Rep[a] → a → a → Bool
   to
   geq’ :: Rep[a] → a → a → (forall a. Rep[a] → a → a → Bool) → Bool
   and change self-recursive calls to call the new argument.

2. Add a definition
   geq rep a b = geq’ rep a b geq

Extending the function should now be straightforward. It is less useful to have this in Haskell, though, because LIGD in Haskell does not support ad-hoc cases, meaning we can only override the pre-defined type representations and not match on custom types. If support for named constructors is added, however, it could still become useful enough.⁴

³ The complete example can be found in the file src/ligd-extensible.scala [Klo14].
⁴ See http://code.haskell.org/generics/comparison/LIGD/LIGD.lhs for an LIGD with named constructors.


Listing 3.1.7: Pattern for representing parameterised typesclass RMyCustomType[T1, . . . ] ( r1 : Rep[T1] , . . . ) {

override def equals(b: Any) : Boolean = b match {case RMyCustomType(r1 , . . . ) ⇒ r1 == this . r1 &&

r2 == this . r2 && . . .case _ ⇒ false

}}

Equality comparisons of type representations

The Haskell version of LIGD does not support comparing Rep instances against each other. This is quite useful, though, as it allows us to implement simple variants of generic operations that look for objects of a specific type in another object, such as folds.

For this to work, it is important to always use values (and not functions) for Rep instances for non-parameterised types, so that they compare correctly without a custom equals() method.

When representing parameterised types such as Lists, we can create a subclass of RType with Rep parameters for the parameters of the type. We can then define a custom equals() method, as seen in listing 3.1.7.
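The difference between the three situations can be seen in a minimal sketch. It assumes a stripped-down Rep trait and hypothetical names (RInt, RString, RList, freshInt) rather than the Rep and RType classes of the LIGD implementation.

```scala
object RepEquality {
  // Simplified stand-in for the LIGD Rep type (assumption).
  trait Rep[A]

  // Values for non-parameterised types: every use refers to the same
  // object, so the default equals() already behaves correctly.
  case object RInt extends Rep[Int]
  case object RString extends Rep[String]

  // A representation created by a function is a fresh object on every
  // call and therefore never equal to another one.
  def freshInt: Rep[Int] = new Rep[Int] {}

  // Parameterised type: equality is delegated to the representations
  // of the type parameters, following the pattern of listing 3.1.7.
  final class RList[A](val elem: Rep[A]) extends Rep[List[A]] {
    override def equals(other: Any): Boolean = other match {
      case that: RList[_] => that.elem == elem
      case _              => false
    }
    override def hashCode: Int = 31 * elem.hashCode + 1
  }
}
```

Overriding hashCode together with equals keeps the representations usable as keys in hash-based collections.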

Example 3.1.3.2 (Use case: folds). By implementing equality on types, we can implement a fold-like function, as shown in listing 3.1.8.

Listing 3.1.8: LIGD folding

    def gfoldl[A, C: Rep, N: Rep](fun: (A, N) => A)(unit: A)(c: C): A = (rep[C], c) match {
      case (r, v) if r == rep[N] => fun(unit, v.asInstanceOf[N])
      case (RSum(ra, rb), Left(x)) => gfoldl(fun)(unit)(x)(ra, rep[N])
      case (RSum(ra, rb), Right(x)) => gfoldl(fun)(unit)(x)(rb, rep[N])
      case (RProd(ra, rb), (x, y)) => gfoldl(fun)(gfoldl(fun)(unit)(x)(ra, rep[N]))(y)(rb, rep[N])
      case (r: RType[_, C], t1) => gfoldl(fun)(unit)(r.b.from(t1))(r.a, rep[N])
      case _ => unit
    }

This is the most generic way to define the fold. A less generic way would be to restrict C to a type that can contain objects of type N. One way to deal with normal container types is to make C a type constructor, taking N as its parameter, as shown in listing 3.1.9.

Listing 3.1.9: Simple LIGD folding

    def foldl[A, C[_], N](c: C[N])(unit: A)(fun: (A, N) => A)(implicit rep: Rep[C[N]], rn: Rep[N]): A =
      gfoldl(fun)(unit)(c)

Furthermore, re-ordering the parameters as done in foldl allows the Scala compiler to infer the type of a lambda passed as the third argument, allowing us to omit type annotations for it.
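The effect of the parameter order on type inference can be reproduced without LIGD. The following sketch uses two hypothetical helper functions over plain lists; Scala infers type parameters one parameter list at a time, from left to right, so the lambda can only be left unannotated when the lists that fix A and N come first.

```scala
object ParamOrder {
  // Function first: A and N are unknown when the lambda is checked,
  // so its parameter types must be written out.
  def foldFunFirst[A, N](fun: (A, N) => A)(unit: A)(xs: List[N]): A =
    xs.foldLeft(unit)(fun)

  // Function last: N is fixed by xs and A by unit before the lambda
  // is checked, so no annotations are needed.
  def foldFunLast[A, N](xs: List[N])(unit: A)(fun: (A, N) => A): A =
    xs.foldLeft(unit)(fun)

  val total1: Int = foldFunFirst((acc: Int, n: Int) => acc + n)(0)(List(1, 2, 3))
  val total2: Int = foldFunLast(List(1, 2, 3))(0)(_ + _)
}
```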


Example 3.1.3.3 (Use case: transformation combinator). Another way we can use equality comparison is to implement a combinator that performs a transformation⁵.

Listing 3.1.10: LIGD ‘everywhere’

    def everywhere[C: Rep, N: Rep](fun: N => N)(c: C): C = (rep[C], c) match {
      case (r, v) if r == rep[N] => fun(v.asInstanceOf[N]).asInstanceOf[C]
      case (RSum(ra, rb), Left(x)) => Left(everywhere(fun)(x)(ra, rep[N]))
      case (RSum(ra, rb), Right(x)) => Right(everywhere(fun)(x)(rb, rep[N]))
      case (RProd(ra, rb), (x, y)) => (everywhere(fun)(x)(ra, rep[N]), everywhere(fun)(y)(rb, rep[N]))
      case (r: RType[_, C], t1) => r.b.to(everywhere(fun)(r.b.from(t1))(r.a, rep[N]))
      case (r, v) => v
    }

This implementation does not recurse into objects of the target type; depending on the task, recursing into them might be the better choice, for example when transforming syntax trees.

3.1.4 Variant: HLIGD – LIGD with heterogeneous lists

A new variant of LIGD included in the source code⁶ uses heterogeneous lists instead of tuples for representing products. This variant is more a proof of concept than a finished version, though; the current encoding might not be the best option.

Heterogeneous lists allow for a more natural representation of constructors compared to nested pairs. In addition, they make it possible to easily implement zippers, allowing us to convert any non-scalar object with a representation to a zipper.

Additionally, this variant of LIGD introduces a new basic type of representation for sequences, allowing lists, sets, and other types to be represented more efficiently than with a nested encoding, thus reducing the number of recursion steps and reducing the chance of an algorithm aborting due to a stack overflow.

3.2 EMGM

In short: More verbose (especially for ad-hoc cases) than LIGD

In Generics for the Masses [Hin06], Ralf Hinze introduces an alternative encoding of generic functions. While objects are still viewed as sums and products of scalar values, generic functions are now defined as instances of a Generic type class, with one method for each type of object supported. A slightly extended and translated version can be seen in Listing 3.2.11. The parameter G refers to a type constructor of a type that stores a function of a specific type.

⁵ Similar to everywhere of Shapeless and transformBi of Uniplate
⁶ See hlists.scala and hligd.scala in the src directory [Klo14]


Listing 3.2.11: The Generic trait in GM

    trait Generic[G[_]] {
      def unit: G[Unit]
      def plus[A, B]: G[A] => G[B] => G[Either[A, B]]
      def prod[A, B]: G[A] => G[B] => G[(A, B)]
      def constr[A]: Symbol => Int => G[A] => G[A] = (name => arity => arg => arg)
      def char: G[Char]
      def int: G[Int]
      def float: G[Float]
      def string: G[String]
      def view[A, B]: Iso[B, A] => (=> G[A]) => G[B]
    }

In addition to the cases known from LIGD, this also introduces a constr case for representing named constructors of a certain arity. The default implementation of constr will just return the underlying G[A] and ignore the name and arity, but functions are free to override it: for example, a generic show function might use this to pretty-print data types.

Example 3.2.0.1 (Generic equality). Imagine we want to check arbitrary objects of the same type for equality. Our type G[_] must have a function that accepts two values of the type given by the first parameter, and returns a boolean; see Listing 3.2.12.

Listing 3.2.12: A generic equality function in GM

    case class GEq[A](geq: A => A => Boolean)

    class MyGEq extends Generic[GEq] {
      override def unit = GEq(x => y => true)
      override def plus[A, B] = a => b => GEq(x => y => (x, y) match {
        case (Left(x), Left(y)) => a.geq(x)(y)
        case (Right(x), Right(y)) => b.geq(x)(y)
        case (_, _) => false
      })
      override def prod[A, B] = a => b => GEq(x => y =>
        a.geq(x._1)(y._1) && b.geq(x._2)(y._2))
      override def char = GEq(x => y => x == y)
      override def int = GEq(x => y => x == y)
      override def float = GEq(x => y => x == y)
      override def string = GEq(x => y => x == y)
      override def view[A, B] = iso => a => GEq(x => y => a.geq(iso.from(x))(iso.from(y)))
    }

    def geq[T](a: T, b: T)(implicit r: Rep[T]): Boolean = r.rep(new MyGEq).geq(a)(b)

We use closures here instead of normal methods in order to have the compiler infer the types automatically, as explained in Section 2.2. As with LIGD, deeply nested data structures and cycles will cause this function to fail.


Being able to represent generic functions is only half of the work, though: we still need a way to represent objects. Instead of converting objects to a representation of sums and products, we simply call the members of the Generic instance we need. Some of the instances of such a Rep trait are shown in listing 3.2.13.

Listing 3.2.13: The Rep trait in GM and some instances

    trait Rep[A] {
      def rep[G[_]](implicit g: Generic[G]): G[A]
    }

    implicit def RString = new Rep[String] {
      override def rep[G[_]](implicit g: Generic[G]): G[String] = g.string
    }
    implicit def RSum[A, B](implicit a: Rep[A], b: Rep[B]) = new Rep[Either[A, B]] {
      override def rep[G[_]](implicit g: Generic[G]): G[Either[A, B]] = g.plus(a.rep)(b.rep)
    }
    implicit def RProd[A, B](implicit a: Rep[A], b: Rep[B]) = new Rep[(A, B)] {
      override def rep[G[_]](implicit g: Generic[G]): G[(A, B)] = g.prod(a.rep)(b.rep)
    }

Example 3.2.0.2 (Representing lists). We can now, similar to LIGD, represent arbitrary types as sums of products. Recall the list example from LIGD. As we can see in Listing 3.2.14, the definition is similar to the one in LIGD: the isomorphism is basically the same, and the only difference is the slightly more verbose definition of RList, because we need to split the generic function ‘creation’ (the description of the type) out into rList.

We will see that this only gets worse once we introduce ad-hoc cases.

Listing 3.2.14: Representing lists in GM

    def isoList[A]: Iso[List[A], Either[Unit, (A, List[A])]] = Iso(fromList, toList)

    def fromList[A](list: List[A]): Either[Unit, (A, List[A])] = list match {
      case Nil => Left(())
      case (a :: as) => Right((a, as))
    }

    def toList[A](list: Either[Unit, (A, List[A])]): List[A] = list match {
      case Left(()) => List.empty
      case Right((a, as)) => a :: as
    }

    def rList[G[_], A](a: G[A])(implicit g: Generic[G]): G[List[A]] = {
      import g._
      view(isoList[A])(plus(unit)(prod(a)(rList(a))))
    }

    implicit def RList[A](implicit a: Rep[A]) = new Rep[List[A]] {
      override def rep[G[_]](implicit gen: Generic[G]) =
        rList(a.rep)(gen)
    }


3.2.1 Extensibility and Modularity

Now consider that we want to treat lists specially, that is, create an ad-hoc case of geq that handles lists directly, instead of as a sum-of-products representation. That is, we need a subclass of Generic that can work on lists, something like:

Listing 3.2.15: Generic with ad-hoc list case

    trait GenericList[G[_]] extends Generic[G] {
      def list[A]: G[A] => G[List[A]] = a => rList(a)(this)
    }

Note that the special case for lists defers to the sum-of-products variant if it is not handled by the generic function. We can thus use that extra method (override it) or ignore it (and treat lists using sums and products).

Example 3.2.1.1 (Generic equality extended for lists). An example for using it would be a generic equality function with an optimised version for lists, as listing 3.2.16 shows.

Listing 3.2.16: Generic equality specialised to lists

    implicit object GEqList extends MyGEq with GenericList[GEq] {
      // Special case of GEq for lists, semantically equivalent
      // to an object of type MyGEq. It is here to demonstrate
      // that it is possible to add new cases to a generic operation.
      //
      // 1. If two lists have unequal length, then they are not equal.
      // 2. Two empty lists are equal.
      // 3. Two nonempty lists of equal length are equal if all their
      //    elements are equal.
      override def list[A]: GEq[A] => GEq[List[A]] = eq => GEq { xs => ys =>
        xs.length == ys.length && (xs.isEmpty ||
          xs.zip(ys).map({ case (x, y) => eq.geq(x)(y) }).min)
      }
    }

If we did not override list, it would simply use the isomorphism.

What we are still missing is a sort of dispatcher, like Rep in ‘Generics for the Masses’, but this time, rep should not be parameterised over G; rather, the type itself should be. The resulting GRep trait defines a representation of a type for a specific generic function, rather than for all generic functions.

Listing 3.2.17: The GRep dispatcher

    trait GRep[G[_], A] {
      def grep: G[A]
    }


The instances of GRep are basically the same as those of Rep, except that they capture the Generic[G] in the constructor and not in the method.

Listing 3.2.18: A GRep instance for products

    implicit def GRProd[G[_], A, B](implicit g: Generic[G], a: GRep[G, A], b: GRep[G, B]) =
      new GRep[G, (A, B)] {
        override def grep: G[(A, B)] = g.prod(a.grep)(b.grep)
      }

We can now add a representation for our special case of lists:

Listing 3.2.19: A GRep instance for lists

    implicit def GRList[G[_], A](implicit g: GenericList[G], a: GRep[G, A]) =
      new GRep[G, List[A]] {
        def grep: G[List[A]] = g.list(a.grep)
      }

Unlike the other instances of GRep, this one does not take a Generic[G], but rather a GenericList[G]. This means that functions will only work on lists if they extend GenericList. This is easy to do, though: because GenericList provides a default implementation of list that defers to the sum-of-products view, an empty declaration is sufficient. For example, instead of the special case for lists in GEqList, we could just use:

    implicit object GEqList extends MyGEq with GenericList[GEq]

to extend the universe of geq to lists – there is no need to actually use any of the special cases.
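The interplay of the default list case and an overriding ad-hoc case can be tried in a reduced, self-contained sketch. It is not the thesis implementation: the methods are uncurried, only unit, int, plus, prod, and view are kept, and the names ListF, Sum, SumGeneric, SumViaView, and SumAdHoc are hypothetical. The generic function used here sums all integers in a value.

```scala
object AdHocListSketch {
  final case class Iso[A, B](from: A => B, to: B => A)
  type ListF[A] = Either[Unit, (A, List[A])]

  // Reduced, uncurried variant of the Generic trait (assumption).
  trait Generic[G[_]] {
    def unit: G[Unit]
    def int: G[Int]
    def plus[A, B](a: G[A], b: G[B]): G[Either[A, B]]
    def prod[A, B](a: G[A], b: G[B]): G[(A, B)]
    def view[A, B](iso: Iso[B, A], a: => G[A]): G[B]
  }

  def isoList[A]: Iso[List[A], ListF[A]] = Iso[List[A], ListF[A]](
    (xs: List[A]) => xs match {
      case Nil    => Left(())
      case h :: t => Right((h, t))
    },
    (e: ListF[A]) => e match {
      case Left(_)       => Nil
      case Right((h, t)) => h :: t
    }
  )

  // Sum-of-products view of lists; the by-name argument of view
  // keeps the recursive description from unfolding eagerly.
  def rList[G[_], A](a: G[A])(g: Generic[G]): G[List[A]] =
    g.view[ListF[A], List[A]](
      isoList[A],
      g.plus[Unit, (A, List[A])](g.unit, g.prod[A, List[A]](a, rList[G, A](a)(g))))

  // The list case defaults to the sum-of-products view.
  trait GenericList[G[_]] extends Generic[G] {
    def list[A](a: G[A]): G[List[A]] = rList[G, A](a)(this)
  }

  // A generic function: the sum of all integers in a value.
  final case class Sum[A](sum: A => Int)

  class SumGeneric extends Generic[Sum] {
    def unit: Sum[Unit] = Sum[Unit](_ => 0)
    def int: Sum[Int] = Sum[Int](n => n)
    def plus[A, B](a: Sum[A], b: Sum[B]): Sum[Either[A, B]] =
      Sum[Either[A, B]](e => e.fold(a.sum, b.sum))
    def prod[A, B](a: Sum[A], b: Sum[B]): Sum[(A, B)] =
      Sum[(A, B)](p => a.sum(p._1) + b.sum(p._2))
    def view[A, B](iso: Iso[B, A], a: => Sum[A]): Sum[B] =
      Sum[B](x => a.sum(iso.from(x)))
  }

  // Empty declaration: lists are handled through the default case.
  object SumViaView extends SumGeneric with GenericList[Sum]

  // Ad-hoc case: lists are handled directly.
  object SumAdHoc extends SumGeneric with GenericList[Sum] {
    override def list[A](a: Sum[A]): Sum[List[A]] =
      Sum[List[A]](xs => xs.map(a.sum).sum)
  }
}
```

Both objects compute the same result for a list of integers; they differ only in whether the list is traversed directly or through the isomorphism.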

3.3 Uniplate

In short: Not feasible without macros or code generators and fails generic equality test

Uniplate is a library that differentiates between two basic types of objects: uniplates and biplates. Uniplates are simple recursive types like terms or expressions, which form a graph where all inner nodes have the same common supertype. Biplates are types that contain other uniplates, for example, a List a has a Biplate List a.

3.3.1 Uniplates

The Uniplate type class, shown in listing 3.3.20, describes recursive types like terms and expressions. Its only primitive operation is the uniplate() method.

The uniplate() method takes an object of type T and returns a tuple of all direct children of the same type and a function that constructs a new T from a list of children of the same type.

There are also several more functions that can be defined on top of uniplate().


Listing 3.3.20: The Uniplate type class (and various additional functions)

    trait Uniplate[T] {
      def uniplate(self: T): (List[T], List[T] => T)

      /* Implementation of additional functions */
      private[UniPlate] final def children(self: T): List[T] = uniplate(self)._1
      private[UniPlate] final def universe(self: T): List[T] = self :: children(self).flatMap(universe)
      private[UniPlate] final def transform(f: T => T)(self: T): T = {
        val (children, context) = uniplate(self)
        f(context(children.map(transform(f))))
      }
      private[UniPlate] final def descend(f: T => T)(self: T): T = {
        val (children, context) = uniplate(self)
        context(children.map(f))
      }
      private[UniPlate] final def rewrite(f: T => Option[T])(self: T): T = {
        def g(self: T) = f(self).map(rewrite(f)) getOrElse self
        transform(g)(self)
      }
    }

The children() function returns a list of all direct children of the same type. For example, in a term, this will be the operands.

The universe() function returns a list containing the object itself and all its transitive children of the same type. For example, in a term, this will be the term, its operands, their children, and so on. In a tree, applied to the root, this will be all nodes in the tree.

The transform() function defines a bottom-up transformation. It applies the passed function recursively to all children, bottom-up.

Example 3.3.1.1 (Transformation). An example of a bottom-up transformation would be eliminating subtraction in expressions, that is, transforming a − b to a + (−b). Given an existing type Expr and Add, Sub, Neg case classes of it, such a transformation could be written as:

    def removeSub(e: Expr): Expr = e match {
      case Sub(a, b) => Add(a, Neg(b))
      case e => e
    }

    val expressionWithoutSub = transform(removeSub)(expression)

The function removeSub could also match other types and do other substitutions; it is not limited to a single sub-type.

Finally, the descend() function applies the passed function to the direct children of an object only, which makes it a building block for top-down transformations, and the rewrite() function applies a transformation until some sort of normal form has been reached. The function passed to rewrite() returns a Some if it applied a transformation to an object and a None if the input was in normal form already.

In the original Uniplate paper [MR07], Mitchell and Runciman introduce additional functions, but those are not implemented in the Scala version, because the focus was on biplates.
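The operations of this section can be tried on a small expression type. The sketch below is self-contained and therefore makes some assumptions: the Expr hierarchy with Num, Neg, Add, and Sub is hypothetical, and uniplate, children, universe, and transform are written as plain functions on Expr instead of members of the Uniplate type class.

```scala
object UniplateSketch {
  // Hypothetical expression type for illustration.
  sealed trait Expr
  final case class Num(n: Int) extends Expr
  final case class Neg(e: Expr) extends Expr
  final case class Add(a: Expr, b: Expr) extends Expr
  final case class Sub(a: Expr, b: Expr) extends Expr

  // Direct children of the same type, plus a function that rebuilds
  // the node from a list of replacement children.
  def uniplate(e: Expr): (List[Expr], List[Expr] => Expr) = e match {
    case Num(_)    => (Nil, (_: List[Expr]) => e)
    case Neg(a)    => (List(a), (cs: List[Expr]) => Neg(cs.head))
    case Add(a, b) => (List(a, b), (cs: List[Expr]) => Add(cs(0), cs(1)))
    case Sub(a, b) => (List(a, b), (cs: List[Expr]) => Sub(cs(0), cs(1)))
  }

  def children(e: Expr): List[Expr] = uniplate(e)._1

  // The node itself followed by all transitive children.
  def universe(e: Expr): List[Expr] =
    e :: children(e).flatMap(c => universe(c))

  // Bottom-up: transform the children first, then the rebuilt node.
  def transform(f: Expr => Expr)(e: Expr): Expr = {
    val (cs, rebuild) = uniplate(e)
    f(rebuild(cs.map(c => transform(f)(c))))
  }

  def removeSub(e: Expr): Expr = e match {
    case Sub(a, b) => Add(a, Neg(b))
    case other     => other
  }
}
```

Applying transform(removeSub) to Sub(Num(5), Sub(Num(3), Num(1))) first rewrites the inner subtraction and then the outer one, so no Sub node remains in the result.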

3.3.2 Biplates

Simply speaking, biplates are containers of uniplates. For example, a type Expr that represents arithmetic expressions over integers has a corresponding Biplate[Expr, Int].

Listing 3.3.21: The Biplate type class (and various additional functions)

    abstract class Biplate[B, A](implicit up: Uniplate[A]) {
      def biplate(self: B): (List[A], List[A] => B)
      private[UniPlate] final def universeBi(self: B) = universeOn(biplate)(self)
      private[UniPlate] final def transformBi(f: A => A)(self: B) = transformOn(biplate)(f)(self)
    }

The basic operation of a Biplate is the biplate method. This method is similar to the uniplate method of the Uniplate type class, but now works on children of a different type.

The universeBi and transformBi methods are like their non-bi counterparts in Uniplate. Outlook: transformBi corresponds to the everywhere combinator of ‘Scrap your Boilerplate’ [LJ03], with one caveat: the passed function is not polymorphic.

Example 3.3.2.1 (Minimum integer). A function that calculates the minimum integer in a biplate can be defined by folding scala.math.min over the universe of the biplate, as shown in listing 3.3.22.

Listing 3.3.22: Uniplate minInt

    def minInt[C](self: C)(implicit bp: Biplate[C, Int]) =
      universeBi(self).foldLeft(Int.MaxValue)(scala.math.min)

3.3.3 Limitations and Advantages compared to other solutions

Uniplate can theoretically be ported from Haskell to Scala, but it is not feasible to do so without a way to automatically generate instances of the Biplate type class, because defining instances manually is too laborious (the number of instances grows quadratically with the number of types).

Due to the way Uniplate works, it is not possible to implement a generic equality function for two biplate objects, because biplates work by looking only at a subset of the type’s elements (for example, a Biplate[Company, Salary] only cares about salaries in a company, which is not enough to compare two companies). Even for uniplates, defining a generic equality function won’t work because we only have access to the objects themselves and not to members of a different type.

On the other hand, if Uniplate is applicable, it will almost always lead to the shortest code compared to other frameworks. See chapter 4 for more examples.


3.4 Shapeless

Shapeless is an existing Scala library that started as a port of ‘Scrap your Boilerplate’ [LJ03] from Haskell. In the meantime, however, it has evolved further and now covers topics like heterogeneous lists and zippers, among other things.

Compared to the LIGD, EMGM, and Uniplate ports, Shapeless is far more advanced. It automatically works on custom Scala types without having to write instances of any type class, and it also uses several advanced programming techniques, like dependent types, for its implementation. This means that using Shapeless is easy, as long as there is no error. Debugging errors, on the other hand, can be quite complicated.

3.4.1 Polymorphic functions

Shapeless provides support for writing polymorphic functions in the shapeless.poly package.

Example 3.4.1.1 (Extracting integers). The function int1 returns its argument if it is an integer, and 1 (the identity of multiplication) for values of all other types.

Listing 3.4.23: int1 – A polymorphic function of arity 1

    object prod extends Poly2 {
      implicit val caseInt = at[Int, Int](_ * _)
      implicit val caseLong = at[Long, Long](_ * _)
      implicit val caseFloat = at[Float, Float](_ * _)
      implicit val caseDouble = at[Double, Double](_ * _)
    }

Example 3.4.1.2 (Lifting a monomorphic function to a polymorphic one). The class -> of the shapeless.poly package takes a monomorphic function as an argument and lifts it to a Poly1 instance. Using it is simple:

    object incSalary extends ->((i: Salary) => Salary(i.salary * 1.1F))

Example 3.4.1.3 (Calculating products of numbers). The function prod calculates the product of two numbers. Given two Int objects it produces an Int, given two Float objects it produces a Float, and so on.

Listing 3.4.24: prod – A polymorphic function of arity 2

    object prod extends Poly2 {
      implicit val caseInt = at[Int, Int](_ * _)
      implicit val caseLong = at[Long, Long](_ * _)
      implicit val caseFloat = at[Float, Float](_ * _)
      implicit val caseDouble = at[Double, Double](_ * _)
    }


3.4.2 The everything combinator

Shapeless provides a combinator called everything for constructing queries on objects. This takes two polymorphic functions: the first one extracts objects we are interested in, the second one combines those objects to a result. It basically is a fold.

Example 3.4.2.1 (Calculating the product of all integers in an object). If we combine int1 and prod from the previous subsection, we can easily calculate the product of all integers in a data structure (recursively).

Listing 3.4.25: product – Product of all integers in an object

    val product = everything(int1)(prod)
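To make the ‘extract, then combine’ structure of such a query concrete, here is a rough sketch that does not use Shapeless at all. It is an assumption-laden stand-in: the traversal relies on runtime checks via Iterable and Product (which case classes and tuples implement) instead of Shapeless’s compile-time machinery, and the extraction function is an ordinary function on Any rather than a polymorphic function value.

```scala
object EverythingSketch {
  // Applies extract to a value and to everything reachable from it,
  // and folds the results together with combine.
  def everything[R](extract: Any => R)(combine: (R, R) => R)(x: Any): R = {
    val here = extract(x)
    val parts: Iterator[Any] = x match {
      case it: Iterable[_] => it.iterator        // collections
      case p: Product      => p.productIterator  // case classes, tuples
      case _               => Iterator.empty     // scalars
    }
    parts.map(p => everything(extract)(combine)(p)).foldLeft(here)(combine)
  }

  // Counterpart of int1: the integer itself, or 1 for anything else.
  val int1: Any => Int = {
    case i: Int => i
    case _      => 1
  }

  // Counterpart of listing 3.4.25.
  def product(x: Any): Int = everything(int1)((a: Int, b: Int) => a * b)(x)
}
```

For the nested value (2, List(3, 4), "x"), every non-integer contributes the neutral element 1, so the result is 2 · 3 · 4.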

3.4.3 The everywhere combinator

The everywhere combinator implements transformations. It takes a polymorphic function and applies it to a data structure; types not handled by the function will be recursed into.

Example 3.4.3.1 (Increasing salaries). Applying everywhere to the function incSalary introduced earlier in this chapter yields a new function that can be applied to any object and will increment all salaries within that object, returning the changed object.

    scala> everywhere(incSalary)(1)
    res0: Int = 1
    scala> everywhere(incSalary)(List(Salary(10f)))
    res1: List[CompanyData.Salary] = List(Salary(11.0))

3.4.4 Heterogeneous lists (HList)

Heterogeneous lists are like a blend of tuples and lists. That is, they consist of heterogeneous elements and are of variable size. Their type encodes the length and the type of the elements, and operations like prepending calculate the result type through the use of dependent types. A heterogeneous list is created by prepending elements to another heterogeneous list using the :: operator. The empty list is the object HNil.

Example 3.4.4.1 (A heterogeneous list).

    scala> 1 :: "Hello, world" :: HNil
    res0: shapeless.::[Int, shapeless.::[String, shapeless.HNil]] = 1 :: Hello, world :: HNil
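How the element types end up in the type of the list can be seen in a minimal hand-written encoding. This is a sketch for illustration only and not the Shapeless definition; the class name HCons is hypothetical (Shapeless names its cons cell ::).

```scala
object MiniHList {
  sealed trait HList

  // A cons cell records the type of its head and of its tail.
  final case class HCons[H, T <: HList](head: H, tail: T) extends HList {
    def ::[X](x: X): HCons[X, HCons[H, T]] = HCons(x, this)
  }

  sealed trait HNil extends HList {
    def ::[X](x: X): HCons[X, HNil] = HCons(x, this)
  }
  case object HNil extends HNil

  // The static type lists every element type in order.
  val xs: HCons[Int, HCons[String, HNil]] = 1 :: "Hello, world" :: HNil

  // No casts are needed to get the elements back at their own types.
  val first: Int = xs.head
  val second: String = xs.tail.head
}
```

Since :: ends in a colon, it is right-associative in Scala, so the expression is built from HNil outwards and each prepend refines the result type.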


3.4.5 Zippers

Shapeless provides an implementation of generic zippers. A generic zipper is a data structure that represents a moment in the traversal of another data structure, such as a tree. It has operations like left, right, down, and up for navigating through the object, and get to get the object at the current position.

Example 3.4.5.1 (Object to zipper). Importing the syntax.zipper._ sub-package adds a toZipper method to objects.

    scala> ((1, 1), 2, 3).toZipper
    res0 = Zipper(HNil, (1, 1) :: 2 :: 3 :: HNil, None)
    scala> res0.down.get
    res1: Int = 1
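The idea of ‘a moment in a traversal’ is easiest to see on an ordinary homogeneous list. The following sketch is a simplified assumption, not the Shapeless zipper: it only supports left and right movement, and all names (Zipper, moveLeft, moveRight, set, fromList) are hypothetical.

```scala
object ListZipperSketch {
  // left holds the elements before the focus, nearest element first.
  final case class Zipper[A](left: List[A], focus: A, right: List[A]) {
    def moveRight: Option[Zipper[A]] = right match {
      case r :: rs => Some(Zipper(focus :: left, r, rs))
      case Nil     => None
    }
    def moveLeft: Option[Zipper[A]] = left match {
      case l :: ls => Some(Zipper(ls, l, focus :: right))
      case Nil     => None
    }
    def get: A = focus
    def set(a: A): Zipper[A] = copy(focus = a)
    // Rebuilds the list the zipper is currently traversing.
    def toList: List[A] = left.reverse ::: (focus :: right)
  }

  def fromList[A](xs: List[A]): Option[Zipper[A]] = xs match {
    case h :: t => Some(Zipper(Nil, h, t))
    case Nil    => None
  }
}
```

Moving the focus and replacing the focused element are constant-time operations, and the surrounding context is kept so the whole structure can be rebuilt afterwards.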

3.4.6 Further types and operations

Shapeless also provides several additional types and operations, such as natural numbers on the type level, heterogeneous maps, and more.


4 Example Operations

This chapter introduces some example operations that can be used to evaluate the interfaces of the programming libraries; for example, the lines of code needed to implement one of the examples.

Some of the examples, like increasing the salary and finding a minimum integer, will be revisited in the next chapter, when looking at the performance of the libraries.

4.1 Paradise benchmark

GPBench, developed by Rodriguez et al., is a suite of benchmarks for comparing generic programming libraries in Haskell [RJJ+08]. One benchmark included in it is the so-called paradise benchmark.

The paradise benchmark looks at data structures representing a company and transforms them by increasing every salary by 10%.

Listing 4.1.1: Paradise benchmark data structures

    case class Company(val depts: List[Dept])
    case class Dept(val name: Name, val manager: Manager, val units: List[DUnit])
    sealed trait DUnit
    case class PU(val person: Employee) extends DUnit
    case class DU(val dept: Dept) extends DUnit
    case class Employee(val person: Person, val salary: Salary)
    case class Person(val name: Name, val address: Address)
    case class Salary(val salary: Float)

    type Manager = Employee
    type Name = String
    type Address = String

The standard example data can be seen in Listing 4.1.2.

Apart from increasing the salary, we will also look at querying salaries.
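As a point of reference for the generic solutions that follow, both operations can also be written by hand. The sketch below repeats the data structures (without the type aliases) to stay self-contained; the function names are hypothetical. It shows the boilerplate the libraries are meant to remove: one traversal function per type and per operation.

```scala
object ParadiseByHand {
  final case class Salary(salary: Float)
  final case class Person(name: String, address: String)
  final case class Employee(person: Person, salary: Salary)
  sealed trait DUnit
  final case class PU(person: Employee) extends DUnit
  final case class DU(dept: Dept) extends DUnit
  final case class Dept(name: String, manager: Employee, units: List[DUnit])
  final case class Company(depts: List[Dept])

  // Transformation: one function per type that can contain a salary.
  def incSalary(s: Salary): Salary = Salary(s.salary * 1.1F)
  def incEmployee(e: Employee): Employee = e.copy(salary = incSalary(e.salary))
  def incUnit(u: DUnit): DUnit = u match {
    case PU(e) => PU(incEmployee(e))
    case DU(d) => DU(incDept(d))
  }
  def incDept(d: Dept): Dept =
    d.copy(manager = incEmployee(d.manager), units = d.units.map(incUnit))
  def incCompany(c: Company): Company = Company(c.depts.map(incDept))

  // Query: the same traversal again, this time collecting salaries.
  def unitSalaries(u: DUnit): List[Float] = u match {
    case PU(e) => List(e.salary.salary)
    case DU(d) => deptSalaries(d)
  }
  def deptSalaries(d: Dept): List[Float] =
    d.manager.salary.salary :: d.units.flatMap(unitSalaries)
  def sumSalary(c: Company): Float = c.depts.flatMap(deptSalaries).sum

  // The standard example company.
  val ralf = Employee(Person("Ralf", "Amsterdam"), Salary(8000))
  val joost = Employee(Person("Joost", "Amsterdam"), Salary(1000))
  val marlow = Employee(Person("Marlow", "Cambridge"), Salary(2000))
  val blair = Employee(Person("Blair", "London"), Salary(100000))
  val genCom: Company = Company(List(
    Dept("Research", ralf, List(PU(joost), PU(marlow))),
    Dept("Strategy", blair, Nil)))
}
```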


Listing 4.1.2: Paradise benchmark example data

    val ralf = Employee(Person("Ralf", "Amsterdam"), Salary(8000))
    val joost = Employee(Person("Joost", "Amsterdam"), Salary(1000))
    val marlow = Employee(Person("Marlow", "Cambridge"), Salary(2000))
    val blair = Employee(Person("Blair", "London"), Salary(100000))

    /** Start values */
    val genCom: Company =
      Company(List(
        Dept("Research", ralf, List(PU(joost), PU(marlow))),
        Dept("Strategy", blair, Nil)))

4.1.1 LIGD

Given the data structures above and our implementation of LIGD that supports subtyping RType, implementing both operations is fairly straightforward.

First of all, the type representations. For the data types Person, Employee, and Company we can use the simplest form of definition, by using the rType function we added to our LIGD implementation.

Listing 4.1.3: Person, Employee, and Company representations in LIGD

    implicit val rPerson = rType(Person.unapply(_: Person).get, (Person.apply _).tupled)
    implicit val rEmployee = rType(Employee.unapply(_: Employee).get, (Employee.apply _).tupled)
    implicit val rCompany = rType(Company.unapply(_: Company).get, Company.apply)

Because we want to query and transform salaries, we need to treat salaries specially, so we can pattern match on them (otherwise we would need to compare representations, but pattern matching is easier to read):

Listing 4.1.4: Salary representation in LIGD

    def fromFloat(f: Float) = Salary(f)
    def toFloat(s: Salary) = s.salary
    implicit case object RSalary extends RType[Float, Salary](RFloat, EP(toFloat, fromFloat))

Finally, the DUnit and Dept types are mutually recursive. Simply using rType here as we have done for the other types would not work for two reasons: first, the Scala compiler would not be able to infer the types; and second, because the implicit Rep of rType is not lazy, there would be either endless recursion (if both were defined using def) or rDept would be null in the definition of rDUnit because rDept is defined after rDUnit and thus not yet initialized.

Fixing this is simple: for the first issue, we explicitly specify the types of the representations. For the second issue, we can simply define rDUnit in terms of RType instead of rType. This makes the reference to rDept lazy and thus allows it to be used here.
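The initialization problem and its fix can be reproduced in isolation. The sketch below uses two hypothetical classes, EagerRep and LazyRep, in place of the representation classes; the assumption is only that one takes its inner representation by value and the other by name.

```scala
object InitOrderSketch {
  // Takes the inner representation by value: it is read immediately.
  final class EagerRep(val name: String, val inner: EagerRep)

  // Takes the inner representation by name: it is read on demand.
  final class LazyRep(val name: String, inner0: => LazyRep) {
    def inner: LazyRep = inner0
  }

  object Eager {
    // rDept has not been initialized when this line runs, so the
    // forward reference yields null.
    val rDUnit: EagerRep = new EagerRep("DUnit", rDept)
    val rDept: EagerRep = new EagerRep("Dept", rDUnit)
  }

  object Deferred {
    // The forward reference is only evaluated when inner is called,
    // by which time both values exist.
    val rDUnit: LazyRep = new LazyRep("DUnit", rDept)
    val rDept: LazyRep = new LazyRep("Dept", rDUnit)
  }
}
```

The compiler may warn about the forward reference in Eager, which is exactly the situation described above.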


Listing 4.1.5: Mutually recursive representations

    implicit val rDUnit: RType[Either[Dept, Employee], DUnit] = RType(RSum(rDept, rEmployee),
      EP(_ match {
        case PU(per) => Right(per)
        case DU(dept) => Left(dept)
      }, _.fold(DU, PU)))

    implicit val rDept: RType[(String, (Manager, List[DUnit])), Dept] = rType(
      d => (d.name, (d.manager, d.units)),
      e => Dept(e._1, e._2._1, e._2._2)
    )

Example 4.1.1.1 (Query: Summing all salaries). Querying salaries is relatively straightforward. Pattern match on all standard LIGD types, and recurse for the product, sum, and RType types; but before matching against RType, match against RSalary:

Listing 4.1.6: Summing salaries in LIGD (manually)

    def sumSalaryOld[A: Rep](a: A): Float = (rep[A], a) match {
      case (RSum(ra, rb), Left(a)) => sumSalaryOld(a)(ra)
      case (RSum(ra, rb), Right(b)) => sumSalaryOld(b)(rb)
      case (RProd(ra, rb), (a, b)) => sumSalaryOld(a)(ra) + sumSalaryOld(b)(rb)
      /* Scala does not recognize that salary is a Salary here */
      case (RSalary, salary: Salary) => RSalary.b.from(salary)
      case (r: RType[_, A], t1) => sumSalaryOld(r.b.from(t1))(r.a)
      /* Catch all other cases here */
      case _ => 0
    }

We can, however, also use the generic fold we implemented earlier; the definition then becomes a single line of code:

Listing 4.1.7: Summing salaries in LIGD (fold)

    def sumSalary[C: Rep](c: C): Float = gfoldl((a: Float, n: Salary) => (a + n.salary))(0)(c)

Note: If we swapped around the first two parameters of gfoldl we would not even need to specify the Float type, as long as we passed 0.0 instead of 0.

Example 4.1.1.2 (Transformation: Increase all salaries). Increasing the salaries by 10% is what is usually called the paradise benchmark. Implementing this is straightforward as well in LIGD:

Listing 4.1.8: Increasing the salaries in LIGD

    def incSalary[A: Rep](a: A, by: Float): A = (rep[A], a) match {
      case (RSum(ra, rb), Left(a)) => Left(incSalary(a, by)(ra))
      case (RSum(ra, rb), Right(b)) => Right(incSalary(b, by)(rb))
      case (RProd(ra, rb), (a, b)) => (incSalary(a, by)(ra), incSalary(b, by)(rb))
      case (RSalary, salary: Salary) => Salary(salary.salary * (1 + by / 100))
      case (r: RType[_, A], t1) => r.b.to(incSalary(r.b.from(t1), by)(r.a))
      case (rep, value) => value
    }


We can abstract this by building an everywhere function¹ that applies a transformation function, similar to EMGM and Uniplate’s transformBi, and then calling it with a salary-transforming function on our company, like this:

Listing 4.1.9: Increase the salaries using everywhere

    everywhere((i: Salary) => Salary(i.salary * 1.1F))(genCom)

In any case, both examples are easy to implement.

4.1.2 EMGM

Implementing the paradise benchmark in EMGM requires more boilerplate compared to LIGD.This version is slightly more powerful than the LIGD variant (it provides ad-hoc cases for all typesin a company, compared to only Salary), but only because this makes the code more readable.

We first start by extending Generic with new methods for all types of company objects. Thisallows us to write functions with ad-hoc cases for all of those types.

Listing 4.1.10: GenericCompany

    trait GenericCompany[G[_]] extends GenericList[G] {
      // One representation per company type, each a view onto sums and products
      def dept: G[Dept] = view(iso)(prod(string)(prod(employee)(list(dunit))))
      def person = view(Iso(Person.unapply(_: Person).get, (Person.apply _).tupled))(
        prod(string)(string))
      def employee = view(Iso(Employee.unapply(_: Employee).get, (Employee.apply _).tupled))(
        prod(person)(salary))
      def company = view(Iso(Company.unapply(_: Company).get, Company.apply))(list(dept))
      def dunit = view(iso1)(plus(dept)(employee))
      def salary: G[Salary] = view(Iso(
        (s: Salary) => s.salary,
        (s: Float) => Salary(s)
      ))(constr('Salary)(1)(float))
    }

    // DUnit is isomorphic to a sum of Dept and Employee
    def iso1: Iso[DUnit, Either[Dept, Employee]] = Iso((_: DUnit) match {
      case PU(per) => Right(per)
      case DU(dept) => Left(dept)
    }, (_: Either[Dept, Employee]).fold(DU, PU))

    // Dept is isomorphic to nested pairs of its fields
    def iso: Iso[Dept, (String, (Manager, List[DUnit]))] = Iso(
      (d: Dept) => (d.name, (d.manager, d.units)),
      (e: (String, (Manager, List[DUnit]))) => Dept(e._1, e._2._1, e._2._2)
    )
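The isomorphisms used by view can be illustrated in isolation. The following is a minimal sketch, assuming that Iso is simply a pair of conversion functions named from and to (as the calls iso.from and iso.to in the later listings suggest); the Person fields and the sample values are illustrative, and the conversions are written as explicit lambdas rather than with unapply.

```scala
object IsoSketch {
  // Assumed shape of Iso: a pair of conversion functions (not copied from the thesis sources).
  final case class Iso[A, B](from: A => B, to: B => A)

  // Illustrative record type with two string fields.
  final case class Person(name: String, address: String)

  // A Person is isomorphic to a pair of strings.
  val personIso: Iso[Person, (String, String)] =
    Iso((p: Person) => (p.name, p.address), (t: (String, String)) => Person(t._1, t._2))

  // Converting to the structural form and back should return the original value.
  def roundTrip(p: Person): Person = personIso.to(personIso.from(p))

  def main(args: Array[String]): Unit = {
    val p = Person("Alice", "Main Street 1")
    println(roundTrip(p) == p) // prints "true"
  }
}
```

A generic function only ever sees the structural side (here a pair of strings); the isomorphism carries values into that form and, for transformations, back again.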

Now we need some boilerplate to have implicit GRep instances, as shown in listing 4.1.11.

¹ A sample implementation of everywhere is included in the source code.


Listing 4.1.11: Implicit instances of GRep

    implicit def GRPerson[G[_]](implicit g: GenericCompany[G]) = new GRep[G, Person] {
      override def grep = g.person
    }
    implicit def GREmployee[G[_]](implicit g: GenericCompany[G]) = new GRep[G, Employee] {
      override def grep = g.employee
    }
    implicit def GRCompany[G[_]](implicit g: GenericCompany[G]) = new GRep[G, Company] {
      override def grep = g.company
    }
    implicit def GRDUnit[G[_]](implicit g: GenericCompany[G]) = new GRep[G, DUnit] {
      override def grep = g.dunit
    }
    implicit def GRDept[G[_]](implicit g: GenericCompany[G]) = new GRep[G, Dept] {
      override def grep = g.dept
    }

Example 4.1.2.1 (Query: Summing all salaries). Summing all salaries works as expected, using a very simple (although verbose) definition:

Listing 4.1.12: Summing salaries in EMGM

    // gsum takes a value and an accumulator and returns the updated accumulator
    case class SalarySum[N](gsum: N => Float => Float)

    implicit object MySalarySum extends GenericCompany[SalarySum] {
      override def unit = SalarySum(n => r => r)
      override def plus[A, B] = a => b => SalarySum(x => r => x match {
        case (Left(v)) => a.gsum(v)(r)
        case (Right(v)) => b.gsum(v)(r)
      })
      override def prod[A, B] = a => b => SalarySum(x => r => b.gsum(x._2)(a.gsum(x._1)(r)))
      override def char = SalarySum(x => r => r)
      override def int = SalarySum(x => r => r)
      override def float = SalarySum(x => r => r)
      override def string = SalarySum(x => r => r)
      override def view[A, B] = iso => a => SalarySum(x => r => a.gsum(iso.from(x))(r))
      // Ad-hoc case: only salaries contribute to the sum
      override def salary = SalarySum(s => r => s.salary + r)
    }

    def sumSalary[C](a: C)(implicit r: GRep[SalarySum, C]): Float = {
      r.grep.gsum(a)(0F)
    }


Example 4.1.2.2 (Transformation: Increase all salaries). It is straightforward to implement this in EMGM, although the definition is very verbose.

Listing 4.1.13: Using the transform pattern: Increase salary

    case class GSalary[A](transform: A => A)

    implicit object MyGSalary extends GenericCompany[GSalary] {
      // Ad-hoc case: raise the salary by 10 %
      override def salary = GSalary(x => Salary(x.salary * 110 / 100))
      override def unit = GSalary(x => x)
      override def plus[A, B] = a => b => GSalary(x => x match {
        case (Left(x)) => Left(a.transform(x))
        case (Right(x)) => Right(b.transform(x))
      })

      override def prod[A, B] = a => b => GSalary(x => (a.transform(x._1), b.transform(x._2)))

      override def char = GSalary(x => x)
      override def int = GSalary(x => x)
      override def float = GSalary(x => x)
      override def string = GSalary(x => x)
      override def view[A, B] = iso => a => GSalary(x => iso.to(a.transform(iso.from(x))))
    }

    def incSalary[T](a: T)(implicit r: GRep[GSalary, T]): T =
      r.grep.transform(a)

4.1.3 Uniplate

Uniplate is applicable here, and as promised in chapter 3, it has the shortest solutions, due to its unique approach.²

Example 4.1.3.1 (Query: Summing all salaries). This is trivial in Uniplate: get all salaries and fold over them:

    def sumSalary[C](self: C)(implicit bp: Biplate[C, Salary]) =
      universeBi(self).foldLeft(0.0F)((f, s) => f + s.salary)

Example 4.1.3.2 (Transformation: Increase all salaries). A transformation is done using the transformBi function, which takes a function of type A => A and applies it to a value of type B using a Biplate[B, A].

    transformBi((i: Salary) => Salary(i.salary * 1.1F))(genCom)

4.1.4 Shapeless

Using Shapeless is much easier than using our self-created libraries because there is no need to create type representations manually; Shapeless takes care of that for us.

² Not counting type representations.


Example 4.1.4.1 (Query: Summing all salaries). We can simply use the everything combinator presented in chapter 3.

Listing 4.1.14: Summing salaries in Shapeless

    object extractSalary extends Poly1 {
      implicit def caseSalary = at[Salary](s => s.salary)
      implicit def default[T] = at[T](_ => 0.0F)
    }

    object addFloat extends Poly2 {
      implicit val caseFloat = at[Float, Float](_ + _)
    }

    val sumSalary = everything(extractSalary)(addFloat)

To recall, the everything combinator consists of two parts: a traversal function, here extractSalary, that extracts the values we are interested in, and a function that combines the results of the traversal, here addFloat.
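The two-part structure of such a query can be imitated without Shapeless. The sketch below is only an analogy under stated assumptions: it traverses values at run time through productIterator instead of using Shapeless' compile-time machinery, and the names children, everything, and the sample data are hypothetical.

```scala
object EverythingSketch {
  final case class Salary(salary: Float)
  final case class Person(name: String, address: String)
  final case class Employee(person: Person, salary: Salary)

  // Direct children of a value: collection elements or case class fields.
  def children(x: Any): List[Any] = x match {
    case xs: Iterable[_] => xs.toList
    case p: Product      => p.productIterator.toList
    case _               => Nil
  }

  // Apply `query` to every node and merge all results with `combine`.
  def everything[R](query: Any => R)(combine: (R, R) => R)(x: Any): R =
    children(x).foldLeft(query(x)) { (acc, child) =>
      combine(acc, everything(query)(combine)(child))
    }

  // Query part: salaries yield their amount, everything else yields 0.
  val extractSalary: Any => Float = {
    case Salary(s) => s
    case _         => 0.0F
  }

  // Combination part: plain addition.
  def sumSalary(x: Any): Float = everything(extractSalary)(_ + _)(x)

  def main(args: Array[String]): Unit = {
    val staff = List(
      Employee(Person("A", "X"), Salary(1000F)),
      Employee(Person("B", "Y"), Salary(2000F)))
    println(sumSalary(staff)) // prints "3000.0"
  }
}
```

Unlike the Shapeless version, this sketch gives up static typing (every node is an Any), which is exactly what the Poly-based encoding avoids.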

Example 4.1.4.2 (Transformation: Increase all salaries). Transformations like this are easy to write in Shapeless. All we need to do is create a function that takes a Salary and produces a new one, and pass it to Shapeless' everywhere function.

Listing 4.1.15: Increasing salaries in Shapeless

    object incSalary extends ->((i: Salary) => Salary(i.salary * 1.1F))
    assert(everywhere(incSalary)(genCom) == expCom)

4.2 Locating the smallest integer in a datatype

As another example, let us look at what it takes to find the smallest integer in a data structure. In all cases, we return Int.MaxValue if we cannot find any (smaller) integers.

LIGD The definition in LIGD is straightforward:

Listing 4.2.16: Minimum in LIGD

    def min[C: Rep](c: C): Int = (rep[C], c) match {
      case (RSum(ra, rb), Left(x)) => min(x)(ra)
      case (RSum(ra, rb), Right(x)) => min(x)(rb)
      case (RProd(ra, rb), (x, y)) => scala.math.min(min(x)(ra), min(y)(rb))
      case (r: RType[_, C], t1) => min(r.b.from(t1))(r.a)
      case (RInt, i) => i
      case _ => Int.MaxValue
    }

Alternatively, we can use one of the fold variants we implemented in our LIGD port; see the source code file src/ligd.scala [Klo14] for more examples.


EMGM In EMGM, the definition is relatively straightforward and similar to LIGD, although with a bit more boilerplate:

Listing 4.2.17: Minimum in EMGM

    // gmin takes a value and the minimum found so far
    case class GMin[N](gmin: N => Int => Int)

    implicit object MyMin extends GenericList[GMin] {
      override def unit = GMin(n => r => r)
      override def plus[A, B] = a => b => GMin(x => r => x match {
        case (Left(v)) => a.gmin(v)(r)
        case (Right(v)) => b.gmin(v)(r)
      })

      override def prod[A, B] = a => b => GMin(x => r => b.gmin(x._2)(a.gmin(x._1)(r)))

      override def char = GMin(x => r => r)
      override def int = GMin(x => r => scala.math.min(x, r))
      override def float = GMin(x => r => r)
      override def string = GMin(x => r => r)
      override def view[A, B] = iso => a => GMin(x => r => a.gmin(iso.from(x))(r))
    }

    def min[C](a: C)(implicit r: GRep[GMin, C]): Int = {
      r.grep.gmin(a)(Int.MaxValue)
    }
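To see the accumulator-passing style of GMin run end to end, the following self-contained sketch rebuilds a much smaller universe. It is an assumption-laden simplification, not the thesis library: the Generic trait here is uncurried, has no GRep dispatcher, takes a plain function instead of an Iso, and uses a by-name parameter in view so that the recursive list representation can be constructed.

```scala
object EmgmMinSketch {
  // Simplified, uncurried stand-in for the thesis's Generic trait.
  trait Generic[G[_]] {
    def unit: G[Unit]
    def int: G[Int]
    def plus[A, B](a: G[A], b: G[B]): G[Either[A, B]]
    def prod[A, B](a: G[A], b: G[B]): G[(A, B)]
    // `a` is by-name so that recursive representations (lists) can be built.
    def view[A, B](from: B => A)(a: => G[A]): G[B]
  }

  // The generic function: a value and the minimum so far yield the new minimum.
  final case class GMin[A](gmin: A => Int => Int)

  object MinAlg extends Generic[GMin] {
    def unit: GMin[Unit] = GMin[Unit](_ => r => r)
    def int: GMin[Int] = GMin[Int](x => r => math.min(x, r))
    def plus[A, B](a: GMin[A], b: GMin[B]): GMin[Either[A, B]] =
      GMin[Either[A, B]](x => r => x match {
        case Left(v)  => a.gmin(v)(r)
        case Right(v) => b.gmin(v)(r)
      })
    def prod[A, B](a: GMin[A], b: GMin[B]): GMin[(A, B)] =
      GMin[(A, B)](x => r => b.gmin(x._2)(a.gmin(x._1)(r)))
    def view[A, B](from: B => A)(a: => GMin[A]): GMin[B] =
      GMin[B](x => r => a.gmin(from(x))(r))
  }

  // Lists viewed structurally as "Nil, or a pair of head and tail".
  def list[G[_], A](g: Generic[G])(elem: G[A]): G[List[A]] =
    g.view[Either[Unit, (A, List[A])], List[A]] {
      case Nil     => Left(())
      case x :: xs => Right((x, xs))
    }(g.plus(g.unit, g.prod(elem, list[G, A](g)(elem))))

  def minList(xs: List[Int]): Int =
    list[GMin, Int](MinAlg)(MinAlg.int).gmin(xs)(Int.MaxValue)

  def main(args: Array[String]): Unit = {
    println(minList(List(3, 1, 2))) // prints "1"
    println(minList(Nil) == Int.MaxValue) // prints "true"
  }
}
```

Because each list element adds a level of nesting, this style recurses once per element, so very long lists need a large stack.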

Uniplate Finding the minimum in Uniplate could look something like this:

Listing 4.2.18: Minimum in Uniplate

    def minInt[C](self: C)(implicit bp: Biplate[C, Int]) =
      universeBi(self).foldLeft(Int.MaxValue)(scala.math.min)

That is, find all integers in the data structure recursively, and then simply pick the minimum. As a special bonus, this only works on data structures that can contain integers, because the Biplate has Int as its second parameter.
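The role of the Biplate parameter can be shown with a stripped-down type class. This is a hedged sketch only: the trait below offers just universeBi, the instances are written by hand, and the Point and Shape types are hypothetical, so it should not be read as the interface of the Uniplate port.

```scala
object BiplateSketch {
  // Hypothetical, heavily simplified Biplate: all values of type A inside a C.
  trait Biplate[C, A] { def universeBi(c: C): List[A] }

  final case class Point(x: Int, y: Int)
  final case class Shape(name: String, corners: List[Point])

  implicit val pointInts: Biplate[Point, Int] = new Biplate[Point, Int] {
    def universeBi(p: Point): List[Int] = List(p.x, p.y)
  }
  implicit val shapeInts: Biplate[Shape, Int] = new Biplate[Shape, Int] {
    def universeBi(s: Shape): List[Int] = s.corners.flatMap(p => pointInts.universeBi(p))
  }

  def universeBi[C, A](c: C)(implicit bp: Biplate[C, A]): List[A] = bp.universeBi(c)

  // Same shape as the listing above: collect all integers, then fold.
  def minInt[C](self: C)(implicit bp: Biplate[C, Int]): Int =
    universeBi[C, Int](self).foldLeft(Int.MaxValue)((a, b) => math.min(a, b))

  def main(args: Array[String]): Unit = {
    println(minInt(Shape("tri", List(Point(3, 4), Point(-2, 7))))) // prints "-2"
    // minInt("text") would not compile here: there is no Biplate[String, Int].
  }
}
```

The compile-time rejection in the last comment is the "special bonus" mentioned above: a type without a matching instance cannot be queried at all.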

Shapeless Getting the minimum in Shapeless works by using everything. In the example below, intmax returns the maximum integer for non-integer values and the integer itself for an integer value. The minimum object then combines two values.

Listing 4.2.19: Minimum in Shapeless

    object intmax extends Poly1 {
      implicit def caseInt = at[Int](s => s)
      implicit def default[T] = at[T](_ => Int.MaxValue)
    }

    object minimum extends Poly2 {
      implicit val caseInt = at[Int, Int](scala.math.min)
      implicit val caseDouble = at[Double, Double](scala.math.min)
    }

    val min = everything(intmax)(minimum)


4.3 First class generic functions

A generic function is a first class function if it can be passed to another generic function. This requires support for universal and existential types in the programming language.

4.3.1 LIGD

Consider a generic function that can be applied to all representable values and returns a string, for example a function gshow that shows values. In Haskell, such a function would have the type:

    gshow :: forall a. Rep a -> a -> String

Using the encoding of universal types shown in section 2.3, we can encode our gshow function as an object with an apply method:

Listing 4.3.20: gshow in LIGD

    object gshow {
      def apply[U: Rep](u: U): String = (rep[U], u) match {
        case (RInt, i) => i.toString()
        case (RFloat, i) => i.toString()
        case (RBoolean, i) => i.toString()
        case (RString, i) => i.toString()
        case (RChar, i) => i.toString()
        case (RUnit, i) => "()"
        case (RSum(ra, rb), Left(x)) => "Left(" + gshow(x)(ra) + ")"
        case (RSum(ra, rb), Right(x)) => "Right(" + gshow(x)(rb) + ")"
        case (RProd(ra, rb), (a, b)) => "(" + gshow(a)(ra) + ", " + gshow(b)(rb) + ")"
        case (RList(ra), Nil) => "Nil"
        case (RList(ra), x :: xs) => gshow(x)(ra) + " :: " + gshow(xs)
        case (r: RType[_, U], t1) => gshow(r.b.from(t1))(r.a)
        case (r, v) => throw new Exception("Should not happen")
      }
    }

We can now define another function accepting generic functions, for example, gmapQ:

Listing 4.3.21: gmapQ-like function in LIGD

    def gmapQ[C: Rep, T](fun: Returning[T])(c: C): List[T] = (rep[C], c) match {
      case (RSum(ra, rb), Left(x)) => gmapQ(fun)(x)(ra)
      case (RSum(ra, rb), Right(x)) => gmapQ(fun)(x)(rb)
      case (RProd(ra, rb), (x, y)) => List(fun(x)(ra), fun(y)(rb))
      case (r: RType[_, C], t1) => gmapQ(fun)(r.b.from(t1))(r.a)
      case (r, v) => List(fun(v)(r))
    }

Note that if we encode the function as a class and an object extending that class as shown in section 3.1.3, we can even extend the function with new special cases. As such, this encoding offers much more flexibility than the function encoding, and there is not much reason to use functions.
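The object-with-apply encoding can be tried out without the LIGD machinery. In the sketch below, Rep is a hypothetical stand-in that only knows how to show a value, and Returning, gsize, and mapPairQ are illustrative names; the point is merely that one function value is applied at two different types inside a higher-order function.

```scala
object FirstClassSketch {
  // Stand-in for a type representation (hypothetical, far simpler than LIGD's Rep).
  trait Rep[A] { def show(a: A): String }
  implicit val intRep: Rep[Int] = new Rep[Int] { def show(a: Int): String = a.toString }
  implicit val stringRep: Rep[String] = new Rep[String] { def show(a: String): String = a }

  // A generic function returning T, encoded as an object with a polymorphic apply.
  trait Returning[T] { def apply[U](u: U)(implicit rep: Rep[U]): T }

  object gshow extends Returning[String] {
    def apply[U](u: U)(implicit rep: Rep[U]): String = rep.show(u)
  }
  object gsize extends Returning[Int] {
    def apply[U](u: U)(implicit rep: Rep[U]): Int = rep.show(u).length
  }

  // Higher-order generic function: applies `fun` to both components of a pair,
  // each at its own type.
  def mapPairQ[A: Rep, B: Rep, T](fun: Returning[T])(p: (A, B)): List[T] =
    List(fun(p._1), fun(p._2))

  def main(args: Array[String]): Unit = {
    println(mapPairQ(gshow)((42, "abc"))) // prints "List(42, abc)"
    println(mapPairQ(gsize)((42, "abc"))) // prints "List(2, 3)"
  }
}
```

An ordinary Scala function value could not be passed here, because it would have to be fixed to a single argument type before being handed to mapPairQ.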


4.3.2 EMGM

There are two possible ways to implement this in EMGM. One solution is to simply pass and use the GRep instances like functions; the other is to do it like LIGD. The first option looks cleaner, but has the disadvantage that some wrapper functions might pass additional fixed arguments to the method in the GRep instance; min, for example, passes Int.MaxValue as an argument.

The second way is less clean, because writing functions as objects does not look good, and because the higher-order functions can get quite long due to the use of higher-kinded types, which need to be written as explicit parameters, whereas the Rep instances in LIGD can be passed using a shorthand syntax.

Listing 4.3.22: Simple ‘apply’ function for EMGM

    type Returning[F[_], R] = {
      def apply[C](a: C)(implicit r: GRep[F, C]): R
    }

    def apply[F[_], C, R](fun: Returning[F, R])(c: C)(implicit r: GRep[F, C]): R = fun(c)

4.3.3 Uniplate

It is possible to encode all sorts of functions using an encoding similar to the one used in LIGD to make them first class objects. Then it is possible to write higher-order functions that can apply them. Because Biplate objects only offer a limited view of objects, though, this does not seem very worthwhile. For example, we cannot write a gmapQ like we did for LIGD, because Biplate objects only look at one type of ‘element’ within a containing type.

4.3.4 Shapeless

In Shapeless, all generic functions are Poly values. They are first class, but they require all sorts of tricky parametric polymorphism, auxiliary classes, and dependent types. For example, composition of two functions is implemented as:

Listing 4.3.23: Composition of two Poly values (copied from shapeless)

    class Compose[F, G](f: F, g: G) extends Poly

    object Compose {
      implicit def composeCase[C, F <: Poly, G <: Poly, T, U, V]
        (implicit unpack: Unpack2[C, Compose, F, G], cG: Case1.Aux[G, T, U], cF: Case1.Aux[F, U, V]) =
        new Case[C, T :: HNil] {
          type Result = V
          val value = (t: T :: HNil) => cF(cG.value(t))
        }
    }


5 Evaluation of the approaches

This chapter compares the feature sets and the performance of the libraries presented in chapter 2. Uniplate is not featured in the performance comparison due to a lack of usable instances.

5.1 Performance benchmark

The performance of the libraries was tested by running various tests on them.

5.1.1 Setup

The measurements were made using ScalaMeter 0.6 with 150 runs, the default warmer, and Measurer.IgnoringGC with Measurer.OutlierElimination as the measurer. The tests were:

company The ‘company’ test increases the salary in a company object. The company consists of a list of n departments; each department has n employees and 1 manager, making a total of n² + n salaries.

geq The ‘geq’ benchmark compares two lists for equality, where the lists are consecutive ranges [1, n²] and [1, n² + 1] of integers.

min The ‘min’ benchmark calculates the minimum of a list.

sum The ‘sum’ benchmark calculates the sum of a list.
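The input sizes above can be checked with a small sketch. The datatypes are reconstructed from the identifiers that appear in the listings (the manager is modelled as a plain Employee, and the field names and sample values are assumptions); the traversal is written by hand, in the spirit of the non-generic baseline.

```scala
object CompanyShapeSketch {
  // Datatypes reconstructed from the identifiers used in the listings; details are assumptions.
  final case class Salary(salary: Float)
  final case class Person(name: String, address: String)
  final case class Employee(person: Person, salary: Salary)
  sealed trait DUnit
  final case class PU(employee: Employee) extends DUnit
  final case class DU(dept: Dept) extends DUnit
  final case class Dept(name: String, manager: Employee, units: List[DUnit])
  final case class Company(depts: List[Dept])

  def employee(i: Int): Employee = Employee(Person("p" + i, "addr" + i), Salary(1000F))

  // n departments, each with one manager and n employees.
  def company(n: Int): Company =
    Company(List.tabulate(n) { d =>
      Dept("d" + d, employee(d), List.tabulate(n)(e => PU(employee(e))))
    })

  // Hand-written (non-generic) traversal that counts salaries.
  def countDept(d: Dept): Int = 1 + d.units.map {
    case PU(_)   => 1
    case DU(sub) => countDept(sub)
  }.sum
  def countSalaries(c: Company): Int = c.depts.map(countDept).sum

  def main(args: Array[String]): Unit = {
    val n = 4
    println(countSalaries(company(n)) == n * n + n) // prints "true"
    // The two 'geq' inputs differ only in the last element of the longer list.
    println((1 to n * n).toList == (1 to n * n + 1).toList) // prints "false"
  }
}
```

Since the two ‘geq’ lists share their first n² elements, an equality check has to walk the entire shorter list before it can report a difference.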

The tests were performed on a single ThinkPad X230 running the Hotspot Server VM on Debian GNU/Linux. More details can be found in the table below.

Scala      2.11.2
JRE        OpenJDK Runtime Environment (IcedTea 2.5.2) (7u65-2.5.2-4)
JVM        OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
Arguments  -Xss64m (increased stack size to 64 MB)
OS         Debian GNU/Linux ‘unstable’ (2014-09-21), kernel 3.16.2
CPU        Intel Core i5-3320M (2.6 GHz, 3.3 GHz turbo boost)
RAM        8 GB

Table 5.1: Benchmark environment


Test       company   geq        min        sum
Direct     1.5 ms    414.1 µs   409.8 µs   501.5 µs
Shapeless  28.0 ms   N/A        2.8 ms     2.9 ms
LIGD       13.8 ms   8.4 ms     7.3 ms     8.4 ms
HLIGD      16.1 ms   498.6 µs   477.0 µs   522.0 µs
EMGM       21.4 ms   8.7 ms     6.6 ms     7.3 ms

Table 5.2: Benchmark results

5.1.2 Results

The ‘min’ and ‘sum’ tests should show the same performance, as both are the same basic type of fold. They use the longer list of the ‘geq’ benchmark.

LIGD and EMGM perform roughly the same. Shapeless is slower in the company benchmark, but faster in the other ones that only use lists, which seems to indicate that Shapeless is more optimised for lists than LIGD and EMGM.

The HLIGD benchmark behaves like LIGD for the company benchmark, but is as fast as the direct version for the other cases, because HLIGD directly uses foldLeft on sequences. Any time difference compared to the direct code is noise.
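One way to read these numbers is as slowdown factors relative to the direct version. The sketch below only divides timings taken from Table 5.2 (converted to microseconds, ‘geq’ omitted because Shapeless has no value there); the object and function names are illustrative and no new measurements are involved.

```scala
object SlowdownSketch {
  // Timings in microseconds, copied from Table 5.2 (company, min, sum columns).
  val direct: Map[String, Double] =
    Map("company" -> 1500.0, "min" -> 409.8, "sum" -> 501.5)

  val results: Map[String, Map[String, Double]] = Map(
    "Shapeless" -> Map("company" -> 28000.0, "min" -> 2800.0, "sum" -> 2900.0),
    "LIGD"      -> Map("company" -> 13800.0, "min" -> 7300.0, "sum" -> 8400.0),
    "HLIGD"     -> Map("company" -> 16100.0, "min" -> 477.0,  "sum" -> 522.0),
    "EMGM"      -> Map("company" -> 21400.0, "min" -> 6600.0, "sum" -> 7300.0)
  )

  // How many times slower a library is than the hand-written version.
  def slowdown(lib: String, test: String): Double = results(lib)(test) / direct(test)

  def main(args: Array[String]): Unit =
    for (lib <- List("Shapeless", "LIGD", "HLIGD", "EMGM"); test <- List("company", "min", "sum"))
      println(lib + " " + test + " " + "%.1f".format(slowdown(lib, test)))
}
```

Read this way, the list-only tests separate HLIGD (close to a factor of 1) from the other libraries, while the company test shows a two-digit factor for every library.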

5.2 Library overview

The tested libraries support different sets of features. Table 5.3 gives an overview of the differences. There are some more features that were tested with the Haskell versions in GPBench [RJJ+08], but as they are supported by all of the libraries tested here, they are not listed.

                            LIGD   EMGM   Uniplate   Shapeless
Ad-hoc cases                ✓+     ✓      ✓          ✓
Multi-parameter functions   ✓      ✓      (×)        ✓
Extensibility               ◦+     ◦      ×          ◦
First-class generics        ✓      ◦      ◦          ◦
Automatic representations   ×      ×      (✓)        ✓
Ease of use                 ✓      ◦      ✓          ◦

Legend: ✓ Well supported; ◦ Supported, but needs effort; × Not supported; + Not in Haskell; () Only tested in Haskell

Table 5.3: Differences between the libraries


Ad-hoc cases All tested libraries support ad-hoc cases. In LIGD, only the Scala version supports ad-hoc cases, due to the availability of subtyping.

Multi-parameter functions Except for Uniplate, all libraries support functions with multiple generic arguments. Uniplate only has limited support for this, not sufficient to implement generic equality, because it only looks at parts of objects that have a specific type.

Extensibility EMGM supports extending existing functions directly, but is verbose. LIGD, using the class encoding, supports creating new functions that extend old functions and override behaviour for some types. Generic functions in Uniplate are not extensible, mostly because they are just short functions based on combinators. Shapeless mostly uses combinators as well, but the polymorphic functions can be extended, allowing new ad-hoc cases in subclasses.

First class generic functions LIGD does not directly support first class generic functions; they can, however, be encoded as objects with an apply method. This is a limitation of the Scala language, not a limitation of LIGD itself. See section 2.3 for a discussion of the issue. EMGM supports first class generics to some extent, but doing so is complicated. Uniplate does not support a really useful form of first class generics. Shapeless mostly uses combinators, but all generic functions are Poly instances that can be passed around, although using them requires some more implicit objects.

Automatic representations Only Shapeless supports automatic representations. It should be possible to automatically generate representations for Uniplate using Shapeless, just like they can be generated in Haskell from Data and Typeable instances, but this would require further investigation. Such generated instances are also the slowest form of instances Uniplate supports.

LIGD and EMGM have not been shown to support automatic generation of representations yet. It might be possible to implement it using Scala macros, but this requires further investigation as well.

Ease of use LIGD and Uniplate are very easy to learn and use: one is based on basic pattern matching over representations and the other uses simple combinators with monomorphic functions, neither of which requires knowledge of more advanced language features or library implementation details, even when debugging.

EMGM is slightly less easy to use than LIGD. First, it requires more code to achieve the same result; second, functions need to be extended to sub-universes; and third, writing the generic function in a visitor-like way is going to be more complicated for most people.

Shapeless is very easy to use once the basics are understood. But Shapeless also provides and uses more complex features. This often leads to error messages that are hard to understand, making debugging a painful experience compared to all of the other libraries.


6 Conclusion

Shapeless and LIGD are the best options for generic programming in Scala right now. Shapeless provides a much larger feature set, is more widely used, and should thus be preferred in production use. Where Uniplate is applicable (for example, for interpreters), it might make sense to use that instead, possibly in combination with Shapeless for deriving instances, as is done in Haskell using Data and Typeable.

EMGM does not seem to have any significant advantage over Shapeless that can outweigh its disadvantages in terms of verbosity.

The extended LIGD is now capable of supporting ad-hoc cases and extensibility, making it more powerful than the original implementation in Haskell. Its use of only basic language features and its easy-to-write representations and functions make it an optimal choice for introducing generic programming to students.

It would be interesting to look at automatic generation of LIGD representations using macros; likewise, an implementation that represents constructors using heterogeneous lists also seems a worthwhile idea, opening up new possibilities like generic zippers. While an initial prototype of such a library exists in the source code of this bachelor thesis¹, a more complete solution requires further work.

Another topic to look at with regard to LIGD is the handling of lists and other container types: representing them as deeply nested pairs can cause stack overflows when implementing algorithms in a naïve way. Adding direct support for sequences and/or implementing a set of combinators representing commonly needed operations using trampolines might be worthwhile.

¹ See src/hligd.scala and src/hlists.scala [Klo14].


Bibliography

[CH02] James Cheney and Ralf Hinze. A lightweight implementation of generics and dynamics. In Proceedings of the 2002 ACM SIGPLAN Workshop on Haskell, Haskell ’02, pages 90–104, New York, NY, USA, 2002. ACM. Retrieved from http://www.cs.ox.ac.uk/ralf.hinze/publications/HW02.pdf.

[dSOG08] Bruno C. d. S. Oliveira and Jeremy Gibbons. Scala for generic programmers. In Ralf Hinze and Don Syme, editors, ICFP-WGP, pages 25–36. ACM, 2008. Retrieved from http://www.cs.ox.ac.uk/jeremy.gibbons/publications/scalagp.pdf.

[Hin06] Ralf Hinze. Generics for the masses. J. Funct. Program., 16(4-5):451–483, July 2006. Retrieved from http://www.cs.ox.ac.uk/ralf.hinze/publications/Masses.pdf.

[Klo14] Julian Andres Klode. Source code of Performance and Interfaces of Datatype-Generic Scala Programs, 2014. Available online at https://github.com/julian-klode/bsc-thesis-code/.

[LJ03] Ralf Lämmel and Simon L. Peyton Jones. Scrap your boilerplate: a practical design pattern for generic programming. In Zhong Shao and Peter Lee, editors, TLDI, pages 26–37. ACM, 2003. Retrieved from http://research.microsoft.com/en-us/um/people/simonpj/Papers/hmap/hmap.ps.

[MR07] Neil Mitchell and Colin Runciman. Uniform boilerplate and list processing. In Gabriele Keller, editor, Proceedings of the ACM SIGPLAN Workshop on Haskell, Haskell 2007, Freiburg, Germany, September 30, 2007, pages 49–60. ACM, 2007. Retrieved from http://community.haskell.org/~ndm/downloads/paper-uniform_boilerplate_and_list_processing-30_sep_2007.pdf.

[RJJ+08] Alexey Rodriguez, Johan Jeuring, Patrik Jansson, Alex Gerdes, Oleg Kiselyov, and Bruno C. d. S. Oliveira. Comparing libraries for generic programming in Haskell. In Andy Gill, editor, Haskell, pages 111–122. ACM, 2008. Retrieved from http://www.cs.uu.nl/research/techreps/repo/CS-2008/2008-010.pdf.

[RO10] Tiark Rompf and Martin Odersky. Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs. In Eelco Visser and Jaakko Järvi, editors, Generative Programming And Component Engineering, Proceedings of the Ninth International Conference on Generative Programming and Component Engineering, GPCE 2010, Eindhoven, The Netherlands, October 10-13, 2010, pages 127–136. ACM, 2010. Retrieved from http://infoscience.epfl.ch/record/150347/files/gpce63-rompf.pdf.


Listings

2.2.1 Improving type inference using lambdas
2.3.2 Encoding a universal function

3.1.1 LIGD Basics
3.1.2 LIGD ‘rep’ helper function
3.1.3 Implementation of RType
3.1.4 Implementation of rList
3.1.5 Generic Equality in LIGD
3.1.6 Extended Generic Equality in LIGD
3.1.7 Pattern for representing parameterised types
3.1.8 LIGD folding
3.1.9 Simple LIGD folding
3.1.10 LIGD ‘everywhere’
3.2.11 The Generic trait in GM
3.2.12 A generic equality function in GM
3.2.13 The Rep trait in GM and some instances
3.2.14 Representing lists in GM
3.2.15 Generic with ad-hoc list case
3.2.16 Generic equality specialised to lists
3.2.17 The GRep dispatcher
3.2.18 A GRep instance for products
3.2.19 A GRep instance for lists
3.3.20 The Uniplate type class (and various additional functions)
3.3.21 The Biplate type class (and various additional functions)
3.3.22 Uniplate minInt
3.4.23 int1 – A polymorphic function of arity 1
scala/tests/CompanyTest.scala
3.4.24 prod – A polymorphic function of arity 2
3.4.25 product – Product of all integers in an object

4.1.1 Paradise benchmark data structures
4.1.2 Paradise benchmark example data
4.1.3 Person, Employee, and Company representations in LIGD
scala/src/ligd–company.scala
4.1.4 Salary representation in LIGD
4.1.5 Mutually recursive representations
4.1.6 Summing salaries in LIGD (manually)
4.1.7 Summing salaries in LIGD (fold)
4.1.8 Increasing the salaries in LIGD
4.1.9 Increase the salaries using everywhere
4.1.10 GenericCompany
4.1.11 Implicit instances of GRep
4.1.12 Summing salaries in EMGM
4.1.13 Using the transform pattern: Increase salary
4.1.14 Summing salaries in Shapeless
4.1.15 Increasing salaries in Shapeless
4.2.16 Minimum in LIGD
4.2.17 Minimum in EMGM
4.2.18 Minimum in Uniplate
4.2.19 Minimum in Shapeless
4.3.20 gshow in LIGD
4.3.21 gmapQ-like function in LIGD
4.3.22 Simple ‘apply’ function for EMGM
4.3.23 Composition of two Poly values (copied from shapeless)

The complete source code of the libraries and examples is publicly available at: https://github.com/julian-klode/bsc-thesis-code/


Declaration

I hereby declare in lieu of an oath that I have written this thesis independently, that it has not previously been submitted, in whole or in part, as an examination paper, and that I have used no aids other than those stated.

All passages of the thesis that are taken verbatim or in substance from other works have been marked as such by citing the sources. This also applies to drawings, sketches, pictorial representations and the like, as well as to sources from the Internet. In case of violation, the bachelor thesis is deemed failed.

I am aware that plagiarism is serious academic misconduct which, if repeated, can be sanctioned further.

Place, date Signature
