MBrace: Large-scale cloud computation with F# (CUFP 2014)
Presentation at CUFP 2014, Gothenburg.
Eirik Tsarpalis – Nessos
MBrace: Large-scale cloud computation with F#
About Nessos
ISV / Consultancy based in Athens, Greece.
.NET framework, specializing in F#.
Business applications
◦ Application framework development
◦ Technology migration
◦ Customized software systems
R&D Division
◦ Open Source development
◦ Distributed computation
◦ Optimization frameworks
What is MBrace?
A Programming Model.
◦ Large-scale distributed computation.
◦ Inspired by F# asynchronous workflows.
◦ Declarative, compositional, higher-order.
A Cluster Infrastructure.
◦ Based on the .NET framework.
◦ Elastic, fault tolerant, multitasking.
◦ Open Source – available on GitHub.
Hello World
The MBrace Programming Model
val hello : Cloud<int>

let hello = cloud {
    printfn "hello, world!"
    return 21
}

let result = MBrace.Run hello
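Since the cloud workflow is modeled on F# asynchronous workflows, the Hello World slide has a direct single-machine analogue. The sketch below is only that: `async` stands in for `cloud` and `Async.RunSynchronously` stands in for `MBrace.Run`; nothing is shipped to a cluster.

```fsharp
// Local analogue of the Hello World slide: async { } stands in for cloud { }.
let hello : Async<int> =
    async {
        printfn "hello, world!"
        return 21
    }

// Async.RunSynchronously plays the role of MBrace.Run here (single process only).
let result = Async.RunSynchronously hello
```

As with a cloud workflow, defining `hello` does nothing by itself; the side effect and the return value only happen when the computation is run.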
Sequential Composition
let first = cloud { return 15 }
let second = cloud { return 27 }

cloud {
    let! x = first
    let! y = second
    return x + y
}
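The same sequencing can be observed locally with `async`, under the assumption that `let!` in a cloud workflow behaves like `let!` in an async one: the second computation only starts once the first has produced its value. This is an illustrative analogue, not MBrace code.

```fsharp
// Two independent computations, composed one after the other.
let first = async { return 15 }
let second = async { return 27 }

let total =
    async {
        let! x = first   // runs to completion first
        let! y = second  // only then does this one start
        return x + y
    }

let result = Async.RunSynchronously total
```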
Sequential fold
val foldM : ('S -> 'T -> Cloud<'S>) -> 'S -> 'T list -> Cloud<'S>

let rec foldM f s ts = cloud {
    match ts with
    | [] -> return s
    | t :: ts' ->
        let! s' = f s t
        return! foldM f s' ts'
}
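A local version of `foldM` can be checked with `async` in place of `cloud`. The definition is the slide's, unchanged apart from the builder and explicit type annotations; `addAsync` is a hypothetical folder added only for the usage example.

```fsharp
// foldM over Async: same recursion as the slide, with async in place of cloud.
let rec foldM (f : 'S -> 'T -> Async<'S>) (s : 'S) (ts : 'T list) : Async<'S> =
    async {
        match ts with
        | [] -> return s
        | t :: ts' ->
            let! s' = f s t          // each step waits for the previous state
            return! foldM f s' ts'   // continue with the rest of the list
    }

// Hypothetical asynchronous folder used for the example.
let addAsync acc x = async { return acc + x }

let sum = foldM addAsync 0 [1; 2; 3; 4] |> Async.RunSynchronously
```

Because every step depends on the state produced by the previous one, this fold is inherently sequential; parallelism has to come from a different combinator.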
Parallel Composition
val (<||>) : Cloud<'T> -> Cloud<'S> -> Cloud<'T * 'S>

cloud {
    let first = cloud { return 15 }
    let second = cloud { return 27 }
    let! x, y = first <||> second
    return x + y
}
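In MBrace, `<||>` is a primitive that can place the two branches on different workers. To illustrate its semantics on one machine, here is a hypothetical Async-based definition (our own assumption, not MBrace's implementation) that starts both branches with `Async.StartChild` before awaiting either of them.

```fsharp
// Hypothetical Async-based parallel composition: start both branches, then await both.
let (<||>) (left : Async<'T>) (right : Async<'S>) : Async<'T * 'S> =
    async {
        let! leftTask = Async.StartChild left    // left begins running now
        let! rightTask = Async.StartChild right  // right begins running concurrently
        let! x = leftTask
        let! y = rightTask
        return x, y
    }

// The slide's example, with async in place of cloud.
let result =
    async {
        let first = async { return 15 }
        let second = async { return 27 }
        let! x, y = first <||> second
        return x + y
    }
    |> Async.RunSynchronously
```

Compared with the sequential version, the only change at the use site is the combinator; the decision to run in parallel stays explicit in the code.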
Parallel Composition (Variadic)
val Cloud.Parallel : Cloud<'T> [] -> Cloud<'T []>

cloud {
    let sqr x = cloud { return x * x }
    let jobs = Array.map sqr [|1 .. 100|]
    let! sqrs = Cloud.Parallel jobs
    return Array.sum sqrs
}
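For the variadic case, FSharp.Core's `Async.Parallel` has the same shape as `Cloud.Parallel` (an array of computations in, an array of results out), so the slide's example can be reproduced locally. This analogue runs on the .NET thread pool, not on cluster workers.

```fsharp
// One small job per input value.
let sqr x = async { return x * x }
let jobs = Array.map sqr [| 1 .. 100 |]

let sumOfSquares =
    async {
        let! sqrs = Async.Parallel jobs   // Async<int> [] -> Async<int []>
        return Array.sum sqrs
    }
    |> Async.RunSynchronously
```

Results come back in the same order as the input jobs, so `Array.sum` (or any order-sensitive reduction) sees them positionally.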
Exception handling
open System

let first = cloud { return 17 }
let second = cloud { return 25 / 0 }

cloud {
    try
        let! x, y = first <||> second
        return Some (x + y)
    with :? DivideByZeroException ->
        return None
}
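The point of the slide is that an exception raised in one parallel branch surfaces at the composition site, where an ordinary `try ... with` handles it. A local sketch of that behavior follows. It defines a hypothetical Async-based `<||>` (our assumption, not MBrace's implementation) so the block is self-contained, and `divide` is a hypothetical helper that fails at run time when the divisor is zero.

```fsharp
open System

// Hypothetical Async-based parallel composition.
let (<||>) (left : Async<'T>) (right : Async<'S>) : Async<'T * 'S> =
    async {
        let! leftTask = Async.StartChild left
        let! rightTask = Async.StartChild right
        let! x = leftTask
        let! y = rightTask   // re-raises the child's exception, if any
        return x, y
    }

// Hypothetical helper: raises DivideByZeroException when b = 0.
let divide (a : int) (b : int) = async { return a / b }

let first = async { return 17 }
let second = divide 25 0

let outcome =
    async {
        try
            let! x, y = first <||> second
            return Some (x + y)
        with :? DivideByZeroException ->
            return None
    }
    |> Async.RunSynchronously
```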
Demo
Parallel fold
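let parFold (folder : 'S -> 'T -> 'S) (combiner : 'S -> 'S -> 'S) (id : 'S) (inputs : 'T []) = cloud {
    // fold one chunk sequentially on a single worker
    let seqFold (inputs : 'T []) = cloud { return Array.fold folder id inputs }
    let! n = Cloud.GetWorkerCount ()
    // split the inputs into at most n chunks (Array.splitInto, FSharp.Core 4.0+)
    let chunks : 'T [] [] = Array.splitInto n inputs
    // fold all chunks in parallel, then combine the partial results
    let! results = chunks |> Array.map seqFold |> Cloud.Parallel
    return Array.reduce combiner results
}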
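The same parFold structure can be exercised on a single machine. In this sketch, which is an assumption rather than MBrace code, `Environment.ProcessorCount` stands in for `Cloud.GetWorkerCount` and `Async.Parallel` for `Cloud.Parallel`. As with any chunked parallel fold, the result only matches a sequential fold if `combiner` is associative and `zero` is an identity for it, because every chunk starts from `zero`; `Array.reduce` also fails on empty input.

```fsharp
open System

let parFold (folder : 'S -> 'T -> 'S) (combiner : 'S -> 'S -> 'S)
            (zero : 'S) (inputs : 'T []) : Async<'S> =
    async {
        // Fold a single chunk sequentially.
        let seqFold (chunk : 'T []) = async { return Array.fold folder zero chunk }
        // Stand-in for Cloud.GetWorkerCount: the number of local cores.
        let n = Environment.ProcessorCount
        // Split into at most n roughly equal chunks.
        let chunks = Array.splitInto n inputs
        let! results = chunks |> Array.map seqFold |> Async.Parallel
        // Combine the per-chunk results.
        return Array.reduce combiner results
    }

let total = parFold (+) (+) 0 [| 1 .. 1000 |] |> Async.RunSynchronously
```

Choosing one chunk per worker keeps scheduling overhead low; finer chunking would help when chunks take uneven time, which is the kind of granularity decision the model leaves to the programmer.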
MBrace Data Primitives
Storage entities represented by references.
Conceptually similar to ref cells.
Creation only possible inside the cloud monad.
Immutable*.
Support for SQL and Windows Azure storage.
Cloud Storage interface
CloudRef
module CloudRef = begin
    val New : 'T -> Cloud<CloudRef<'T>>
    val Read : CloudRef<'T> -> 'T
end
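To make the reference semantics concrete, here is a toy, in-process analogue of `CloudRef`. Everything in it is hypothetical: `LocalRef` and its `ConcurrentDictionary` store are illustrations only, whereas MBrace persists values to a storage backend such as SQL or Windows Azure. It mirrors two of the properties listed above: values are stored through `New`, which runs inside the async workflow, and the module offers no operation to overwrite a stored value.

```fsharp
open System
open System.Collections.Concurrent

// Hypothetical stand-in for CloudRef<'T>: a typed key into a store.
type LocalRef<'T> = { Key : Guid }

module LocalRef =
    // Hypothetical process-wide backing store (a real system would use durable storage).
    let private store = ConcurrentDictionary<Guid, obj>()

    // Store a value and hand back a reference to it; mirrors CloudRef.New.
    let New (value : 'T) : Async<LocalRef<'T>> =
        async {
            let key = Guid.NewGuid()
            store.[key] <- box value
            return { Key = key }
        }

    // Dereference; there is deliberately no write operation.
    let Read (r : LocalRef<'T>) : 'T = unbox<'T> store.[r.Key]

// Usage
let r = LocalRef.New [| 1; 2; 3 |] |> Async.RunSynchronously
let xs = LocalRef.Read r
```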
CloudFile
module CloudFile = begin
    val New : (Stream -> unit) -> Cloud<CloudFile>
    val Read : CloudFile -> (Stream -> 'T) -> Cloud<'T>
    val Enumerate : string -> Cloud<CloudFile []>
end
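A similar toy analogue shows how the stream-based `CloudFile` signatures are meant to be used: the caller never handles a storage location, only a `Stream` to write to or read from. This sketch is hypothetical (`LocalFileRef` and `LocalFile` are our names), backs each file with a local temporary file created by `Path.GetTempFileName`, and omits `Enumerate`.

```fsharp
open System.IO

// Hypothetical stand-in for CloudFile: a handle wrapping a local path.
type LocalFileRef = { FilePath : string }

module LocalFile =
    // Create a file by giving the caller a writable stream, as CloudFile.New does.
    let New (writer : Stream -> unit) : Async<LocalFileRef> =
        async {
            let path = Path.GetTempFileName()
            use stream = File.OpenWrite(path)
            writer stream
            return { FilePath = path }
        }

    // Read a file by giving the caller a readable stream, as CloudFile.Read does.
    let Read (file : LocalFileRef) (reader : Stream -> 'T) : Async<'T> =
        async {
            use stream = File.OpenRead(file.FilePath)
            return reader stream
        }

// Usage: write a string, then read it back.
let file =
    LocalFile.New (fun s ->
        use w = new StreamWriter(s)
        w.Write("hello, cloud file"))
    |> Async.RunSynchronously

let text =
    LocalFile.Read file (fun s ->
        use r = new StreamReader(s)
        r.ReadToEnd())
    |> Async.RunSynchronously
```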
Demo
Performance
We tested MBrace against Hadoop.
Tests were staged on Windows Azure.
Clusters of 4, 8, 16 and 32 Large Azure instances.
Two algorithms were tested: grep and k-means.
Source code available on GitHub.
Distributed grep
Find occurrences of a given pattern in text files.
Straightforward Map-Reduce algorithm.
Input data was 32, 64, 128 and 256 GB of text.
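The grep workload follows a plain map-reduce shape: count matches per file in parallel (map), then sum the counts (reduce). The sketch below illustrates that shape locally with `Async.Parallel`; it is not the benchmark code, and in-memory arrays of lines stand in for the input files.

```fsharp
open System.Text.RegularExpressions

// Map: count matching lines in one "file" (here an in-memory array of lines).
let countMatches (pattern : Regex) (lines : string []) : Async<int> =
    async { return lines |> Array.filter (fun l -> pattern.IsMatch(l)) |> Array.length }

// Reduce: run the per-file counts in parallel and sum them.
let grep (pattern : string) (files : string [] []) : Async<int> =
    async {
        let regex = Regex(pattern)
        let! counts = files |> Array.map (countMatches regex) |> Async.Parallel
        return Array.sum counts
    }

let files =
    [| [| "error: disk full"; "ok" |]
       [| "ok"; "error: timeout"; "error: retry" |] |]

let hits = grep "^error" files |> Async.RunSynchronously
```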
[Chart: Distributed grep, MBrace vs. Hadoop]
K-means
Computes cluster centroids from a set of vectors.
Iterative algorithm.
Not naturally describable in Map-Reduce workflows.
Hadoop implementation using Apache Mahout.
Input was 10^6 randomly generated 100-dimensional points.
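The dependency that makes k-means awkward for a single map-reduce pass is visible in the standard (Lloyd's) iteration: each step needs the centroids produced by the previous one. Below is a small sequential sketch of that loop, for illustration only. It is not the MBrace or Mahout benchmark code, and the fixed iteration count and the rule of keeping an empty cluster's centroid unchanged are our own choices.

```fsharp
// Squared Euclidean distance between two points of equal dimension.
let dist2 (a : float []) (b : float []) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0.0 a b

// Index of the centroid closest to point p.
let nearest (centroids : float [] []) (p : float []) =
    centroids |> Array.mapi (fun i c -> i, dist2 p c) |> Array.minBy snd |> fst

// Component-wise mean of a non-empty set of points.
let mean (points : float [] []) =
    let dim = points.[0].Length
    Array.init dim (fun d -> points |> Array.averageBy (fun p -> p.[d]))

// One k-means step: assign every point, then recompute each centroid.
let step (points : float [] []) (centroids : float [] []) =
    let assigned = points |> Array.groupBy (nearest centroids)
    centroids
    |> Array.mapi (fun i c ->
        match assigned |> Array.tryFind (fun (k, _) -> k = i) with
        | Some (_, members) -> mean members
        | None -> c)   // keep an empty cluster's centroid unchanged

// Iterate a fixed number of times; each step depends on the previous result.
let kmeans iterations points initial =
    Seq.fold (fun cs _ -> step points cs) initial (seq { 1 .. iterations })

// Usage: two obvious clusters on a line.
let points = [| [| 0.0 |]; [| 1.0 |]; [| 10.0 |]; [| 11.0 |] |]
let centroids = kmeans 5 points [| [| 0.0 |]; [| 1.0 |] |]
// centroids = [| [| 0.5 |]; [| 10.5 |] |]
```

In a distributed setting, the assignment phase inside `step` is the natural place to parallelize (for example, one chunk of points per worker), while the outer loop stays sequential.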
[Chart: K-means, MBrace vs. Hadoop]
Conclusions
Declarative, composable computation through the cloud monad.
Explicit, dynamic control over parallelism patterns and granularity.
Exception handling!
On-the-fly deployment through the F# REPL.
Open Source.
http://m-brace.net
Thank you!