
A Quick introduction to F#

…and my take on why functional programming matters

Juhana Helovuo, Atostek Oy

2014-11-10

Contents

• Atostek

• F#

• Why should I care, elaborated
– Easier to reason about
– Performance
– Safety

• Summary

This presentation uses materials from Visual Studio Help, Wikipedia, Github, and other parts of the Internet. May contain unnatural colours, artificial examples, and traces of nuts.

Atostek Oy

• Since 1999, office in Hermia

• AAA credit rating

• Owned by personnel

• Head count ~ 50
– Mostly M.Sc. or Dr.Tech. from TUT + students

[Figure: chart over the years 2000 to 2015; vertical axis in Mil. €, from 0 to 4.5]

Atostek: Expertising your project. Atostek Aatos: support for procurement. 7.11.2014

F# : General properties

• The functional language for Microsoft's .Net platform
– Strict evaluation
– Direct support for .Net OO features: "multi-paradigm"
  • Manipulating standard .Net objects causes side effects
  • F# standard library can be used for pure computation
– Standard component of Visual Studio since VS 2010 (a separate add-on for VS 2008)

• Developed mostly from ML (Milner et al., 1973)
– Also influence from OCaml, Python(!), Haskell, Scala, Erlang

• Native on .Net
– Runs on Common Language Runtime, the .Net virtual machine
– Compiled to Common Intermediate Language ( = CLR assembly)
– Basic data types implemented on Common Type System
– Many parts compatible with C#: strings, numbers, method calls, generics
  • But e.g. Lists and Maps and other default libraries are different

Basic F# expressions

> let x = 2 + 3 * 4 ;;
val x : int = 14

> type Person = { name : string ; age : int } ;;

> let aa = { name = "Aatos" ; age = 3 } ;;
val aa : Person = {name = "Aatos"; age = 3;}

> [1..3] ;;
val it : int list = [1; 2; 3]

> seq { for i in 0 .. x do
          if i % 2 = 0 then yield i+2 } ;;
val it : seq<int> = seq [2; 4; 6; 8; ...]

> if aa.age >= 18 then printfn "Yes" else printfn "No" ;;
No
val it : unit = ()

alternatively: printfn "%s" (if aa.age >= 18 then "Yes" else "No") ;;

> ['a' .. 'f'] |> List.map int ;;
val it : int list = [97; 98; 99; 100; 101; 102]

Common library types: List, Seq, Map, Option

• List<'a> is a Lisp-style linked list.

• Seq<'a>
– Lazy but non-memoizing sequence
– Implemented using generators
– Many data structures can convert themselves to Seq
– Is really IEnumerable<'a> from .Net in disguise

• Map<'Key,'Value> is an ordered indexable collection (balanced tree).

• type Option<'a> =
    | Some of 'a
    | None
– Can be used to add a null value to a type.
– Safe, since contents can be only accessed after pattern match
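A minimal sketch of how Option is typically consumed through pattern matching. The names `tryFindAge`, `describe` and `ages` are hypothetical and only serve the illustration; `Map.tryFind` is the standard library lookup that returns an Option.

```fsharp
// Hypothetical lookup helper (not from the slides): returns None for a missing key.
let tryFindAge (name: string) (ages: Map<string, int>) : int option =
    Map.tryFind name ages

// The value inside the Option is reached only through the pattern match.
let describe (age: int option) : string =
    match age with
    | Some a -> sprintf "age is %d" a
    | None -> "age unknown"

let ages = Map.ofList [ "Aatos", 3 ]
let known = describe (tryFindAge "Aatos" ages)      // "age is 3"
let unknown = describe (tryFindAge "Nobody" ages)   // "age unknown"
```

The compiler warns if a match over an Option forgets one of the two cases, which is where the safety comes from in practice.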

Operations on whole data structures

• F#: Seq.map, Seq.map2, Seq.fold, Seq.foldBack, Seq.filter, Seq.scan, ... ; List.map, List.filter, ... ; Map.map, Map.filter, ...

• Haskell: map (fmap), zipWith, foldr, foldl, filter, scan, concat, any, all, ...
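As a small illustration of these whole-structure operations, the sketch below applies a few of the listed F# functions to one list. The value names (`numbers`, `squares`, and so on) are made up for the example.

```fsharp
let numbers = [1; 2; 3; 4; 5]

// Transform every element.
let squares = numbers |> List.map (fun n -> n * n)

// Keep only the elements that satisfy a predicate.
let evens = numbers |> List.filter (fun n -> n % 2 = 0)

// Reduce the structure to a single value.
let total = numbers |> Seq.fold (+) 0

// Like fold, but keeps every intermediate accumulator value.
let runningTotals = numbers |> Seq.scan (+) 0 |> List.ofSeq

// Combine two sequences element by element (zipWith in Haskell).
let pairSums = Seq.map2 (+) numbers squares |> List.ofSeq
```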

Functions and pattern matching

type Expression =
    | Number of int
    | Add of Expression * Expression
    | Multiply of Expression * Expression
    | Variable of string

let rec Evaluate (env:Map<string,int>) exp =
    match exp with
    | Number n -> n
    | Add (x, y) -> Evaluate env x + Evaluate env y
    | Multiply (x, y) -> Evaluate env x * Evaluate env y
    | Variable id -> env.[id]

let environment = Map.ofList [ "a", 1 ; "b", 2 ; "c", 3 ]

// Create an expression tree that represents the expression: a + 2 * b.
let expressionTree1 = Add(Variable "a", Multiply(Number 2, Variable "b"))

// Evaluate the expression a + 2 * b, given the
// table of values for the variables.
let result = Evaluate environment expressionTree1

Computation Expressions

• Code blocks where computation semantics can be defined (within limits)

• Similar to do-notation in Haskell, but with more syntactic constructs

• Examples:

– State (monad)

– Option (Maybe, monad)

– Step-by-step computation

– Undo-computation

– Async

– Query expressions (LINQ)

– Seq expressions

– Software Transactional Memory (STM)
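To make the Option (Maybe) entry in the list above concrete, here is a minimal sketch of a builder that defines the semantics of `let!` as "stop at the first None". The names `MaybeBuilder`, `maybe`, `safeDivide` and `compute` are assumptions made for this example and are not part of the F# standard library.

```fsharp
// A hypothetical minimal builder: Bind short-circuits on None.
type MaybeBuilder() =
    member this.Bind(m: 'a option, f: 'a -> 'b option) : 'b option =
        match m with
        | Some x -> f x
        | None -> None
    member this.Return(x: 'a) : 'a option = Some x

let maybe = MaybeBuilder()

// Division that reports failure as None instead of throwing.
let safeDivide (a: int) (b: int) : int option =
    if b = 0 then None else Some (a / b)

let compute a b c =
    maybe {
        let! x = safeDivide a b
        let! y = safeDivide x c
        return x + y
    }

let ok = compute 100 5 2       // 100/5 = 20, 20/2 = 10, so Some 30
let failed = compute 100 0 2   // first division fails, so None
```

The block inside `maybe { ... }` reads like straight-line code, while the builder decides what sequencing means.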

Computation expression: state

type State<'a, 's> = State of ('s -> 'a * 's)

let runState (State s) a = s a
let getState = State (fun s -> (s,s))
let putState s = State (fun _ -> ((),s))

type StateBuilder() =
    member this.Return(a) = State (fun s -> (a,s))
    member this.Bind(m,k) =
        State (fun s ->
            let (a,s') = runState m s
            runState (k a) s')
    member this.ReturnFrom (m) = m

let state = new StateBuilder()

let counterWorkflow =
    let s = state {
        do! DoSomething
        let! a = Foobar
        do! WithA a
        return a+1
    }
    runState s 0

stateful

pure
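The workflow on the slide uses placeholder steps (DoSomething, Foobar, WithA). As a hedged, self-contained sketch, the version below repeats the builder and substitutes one hypothetical step, `tick`, which returns the current counter and increments it. Only `tick` and the concrete workflow are additions; they are assumptions for illustration.

```fsharp
type State<'a, 's> = State of ('s -> 'a * 's)

let runState (State f) s = f s
let getState = State (fun s -> (s, s))
let putState s = State (fun _ -> ((), s))

type StateBuilder() =
    member this.Return(a) = State (fun s -> (a, s))
    member this.Bind(m, k) =
        State (fun s ->
            let (a, s') = runState m s
            runState (k a) s')
    member this.ReturnFrom(m) = m

let state = StateBuilder()

// Hypothetical step: returns the current counter value and increments the state.
let tick =
    state {
        let! n = getState
        do! putState (n + 1)
        return n
    }

// Reads like stateful code, but is a pure function from state to (value, state).
let counterWorkflow =
    state {
        let! first = tick
        let! second = tick
        return first + second
    }

// Starting from 10: first = 10, second = 11, final state = 12.
let (result, finalState) = runState counterWorkflow 10
```

Nothing is mutated: running the same workflow again from the same initial state gives the same pair.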

Computation expression: state

static member GetYYYAndXXX : State<XXXState, XRetTypeX> =

state {

let s = System.XXX()

let ll = XXX

let! cC = XXXState.getXXX

let! m = XXXState.getMMM

[ . . . ]

// TODO: so far there are no XXX other XX than YYY

let e = mm.GetEEE

|> Seq.map (function Choice1Of2 both -> A both

| Choice2Of2 only -> B only)

|> Seq.groupBy (fun t -> t.XXXs)

|> Seq.map (fun (c,ts) ->

Set.map ll.GetXXX c

|> Seq.map (fun g -> g,List.ofSeq ts )

)

|> Seq.concat

|> Map.ofSeqWith (@)

[ . . . ]

return { c = cCc; m = allMMM }

}

Computation expression: async

open System.Net
open Microsoft.FSharp.Control.WebExtensions

// List<(string*string)>
let urlList = [ "Microsoft.com", "http://www.microsoft.com/"
                "MSDN", "http://msdn.microsoft.com/"
                "Bing", "http://www.bing.com" ]

let fetchAsync(name, url:string) =
    async {
        try
            let uri = new System.Uri(url)
            let webClient = new WebClient()
            let! html = webClient.AsyncDownloadString(uri)
            printfn "Read %d characters for %s" html.Length name
        with
            | ex -> printfn "%s" (ex.Message);
    }

let runAll() =
    urlList
    |> Seq.map fetchAsync
    |> Async.Parallel
    |> Async.RunSynchronously
    |> ignore

runAll()

Short F# summary

• ML-derived language on .Net
– "Can be used for everything that C# can be, except for null reference exceptions."

• Recursion is the goto:
– In practical programming, all iteration is done using library functions.
– Actual recursion is hardly ever used.

• Computation expressions allow defining new behavior
– But with a fixed syntax
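The "recursion is the goto" point can be sketched with one computation written both ways. The function names are hypothetical; `List.sumBy` is a standard library function.

```fsharp
// Explicit recursion: control flow is spelled out by hand.
let rec sumSquaresRec (xs: int list) : int =
    match xs with
    | [] -> 0
    | x :: rest -> x * x + sumSquaresRec rest

// The same computation with a library function: the iteration pattern is named.
let sumSquares (xs: int list) : int =
    xs |> List.sumBy (fun x -> x * x)
```

Both give 14 for the input [1; 2; 3]; the second version leaves the traversal to the library.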

Why should I care, longer explanation.

About functional programming in general, we are going outside F# here

Example: Insertion sort

• One of the simplest sorting algorithms

• Page 3 in ”Introduction to Algorithms”

[Figure: array A[1..n] split into a sorted part and an unsorted part; labels: ≤ key, key, 1 2 …, Sorted, Unsorted]

Shorter – or easier to reason about?
Example: insertion sort

-- Functional Language
-- Haskell, but F# translation would
-- only have slightly different syntax

insertionSort :: Ord a => [a] -> [a]
insertionSort [] = []
insertionSort (x:xs) =
    insert x (insertionSort xs)
  where
    insert :: Ord a => a -> [a] -> [a]
    insert x [] = [x]
    insert x (y:ys)
      | x < y = x : y : ys
      | otherwise = y : insert x ys

-- 8 lines of code
-- + 2 lines of (compiler-inferrable) types

-- Generic Imperative Language
-- Object-oriented or not
-- Input is array A[1..n].

j = 1
while j < n do
    i ← j
    j ← j + 1
    key ← A[j]
    while i > 0 and A[i] > key do
        A[i + 1] ← A[i]
        i ← i-1
    endwhile
    A[i + 1] ← key
endwhile

-- Output is array A[1..n], now sorted.
-- 11 lines code
-- Not much longer!
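The slide notes that an F# translation of the Haskell version would differ only slightly in syntax. A sketch of one such translation follows; it is one possible rendering, with `insert` written as a separate top-level function rather than a local definition.

```fsharp
// Insert x into an already sorted list, keeping it sorted.
let rec insert x sorted =
    match sorted with
    | [] -> [x]
    | y :: ys ->
        if x < y then x :: y :: ys
        else y :: insert x ys

// Sort the tail recursively, then insert the head into the result.
let rec insertionSort xs =
    match xs with
    | [] -> []
    | x :: rest -> insert x (insertionSort rest)
```

For example, `insertionSort [3; 1; 2]` evaluates to `[1; 2; 3]`. As in Haskell, the types are inferred, including the comparison constraint on the element type.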

Proving Imperative insertion sort

j = 1
while j < n do
    i ← j
    j ← j + 1
    key ← A[j]
    while i > 0 and A[i] > key do
        A[i + 1] ← A[i]
        i ← i-1
    endwhile
    A[i + 1] ← key
endwhile

• Input and output are the array A[1..n]

• Correctness:
1. The output is sorted: SortedA(1,n) ⇔ ∀ i ∈ [1,n-1] : A[i] ≤ A[i+1]
2. The output has the same elements as the input: ∃ p : Aout = Permutation(p,Ain)
3. The algorithm terminates.

• Requirements 1 and 2 are most interesting and difficult, so we'll try those.

• We'll handle 2 rather informally due to space, time, and boringness constraints.

• Requirement 3 is left for homework for those who are interested.

• Note that
– ∀ i,j,k: SortedA(i,j) ^ SortedA(j,k) ⇒ SortedA(i,k)
– ∀ i: SortedA(i,i)

Proving Imperative insertion sort

• We use ”P” to denote that A is still a permutation of the input.

• Outer loop invariant I: SortedA(1,j) ^ 1≤j≤n

• Inner loop invariant I2: SortedA(1,j-1) ^ (i=j-1 v A[j-1] ≤ A[j]) ^ A[i+1..j] ≥ key ^ i≥0

• Where "A[i+1..j] ≥ key" means ∀ k ∈ [i+1..j] : A[k] ≥ key

{ n ≥ 1 }
j = 1
{ j = 1 ^ I ^ P }
while j < n do
    { I ^ j < n ^ P }
    i ← j
    { I ^ j < n ^ i=j ^ P }
    j ← j + 1
    { I ⇒ SortedA(1,j-1) ^ i=j-1 ^ 1<j≤n ^ P }
    key ← A[j]
    { SortedA(1,j-1) ^ i=j-1 ^ 1<j≤n ^ key=A[j] ^ P }
    { I2 }
    while i > 0 and A[i] > key do
        { I2 ^ i>0 ^ A[i]>key }
        A[i + 1] ← A[i]
        { SortedA(1,j) ^ A[i+1..j]≥key ^ i>0 ^ A[i]>key ^ A[i+1] = A[i] }
        i ← i-1
        { SortedA(1,j) ^ A[i+2..j]≥key ^ i≥0 ^ A[i+1]>key ^ A[i+2] = A[i+1] }
        { I2 }
    endwhile
    { I2 ^ (i≤0 v A[i] ≤ key) but ¬P }
    { I }
    A[i + 1] ← key
    { I ^ P }
endwhile
{ I ^ j≥n ^ P } ⇒ { SortedA(1,n) ^ P }

Proving Imperative insertion sort

Case A[j-1] ≤ A[j]: SortedA(1,j-1) ⇒ SortedA(1,j)

Case i=j-1: A[i]=A[i+1] ⇒ A[j-1]≤A[j] ⇒ SortedA(1,j)

A[i] > key ^ A[i+1]=A[i] ⇒ A[i+1] ≥ key ⇒ A[i+1..j] ≥ key



Case A[j-1] ≤ A[j]: SortedA(1,j-1) ⇒ SortedA(1,j)

Case i=j-1 ^ i≤0: i≥0 ⇒ i=0 ⇒ j=1 ⇒ SortedA(1,j)

Case i=j-1 ^ A[i] ≤ key: A[i+1..i+1] ≥ key ⇒ A[i] ≤ key ≤ A[i+1] ⇒ A[j-1] ≤ key ≤ A[j] ⇒ SortedA(1,j)



Assignment A[i + 1] ← key maintains I, because
- SortedA(1,j)
- A[i] ≤ key
- A[i+1..j] ≥ key

P is restored, because… too long to show. Appeal to picture above.

Also would need to show that all indexing of A is within [1..n]. TL;DR.
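The invariants can also be written as run-time checks, which is not a proof but shows where each claim has to hold. The sketch below is an assumption-laden translation of the pseudocode to F# with 0-based arrays (so the sorted prefix is arr.[0..j]); the helper names `isSortedRange`, `allAtLeast` and `insertionSortChecked` are hypothetical, and only part of I2 is checked.

```fsharp
/// True when arr.[lo..hi] (inclusive bounds) is in ascending order.
let isSortedRange (arr: int[]) (lo: int) (hi: int) : bool =
    seq { lo .. hi - 1 } |> Seq.forall (fun k -> arr.[k] <= arr.[k + 1])

/// True when every element of arr.[lo..hi] is >= key.
let allAtLeast (arr: int[]) (lo: int) (hi: int) (key: int) : bool =
    seq { lo .. hi } |> Seq.forall (fun k -> arr.[k] >= key)

/// In-place insertion sort with invariants checked at run time.
let insertionSortChecked (arr: int[]) : unit =
    let n = arr.Length
    let mutable j = 0
    while j < n - 1 do
        // Outer invariant I: the prefix arr.[0..j] is sorted.
        if not (isSortedRange arr 0 j) then failwith "I violated"
        let mutable i = j
        j <- j + 1
        let key = arr.[j]
        while i >= 0 && arr.[i] > key do
            arr.[i + 1] <- arr.[i]
            i <- i - 1
            // Part of inner invariant I2: every element shifted so far is >= key.
            if not (allAtLeast arr (i + 2) j key) then failwith "I2 violated"
        arr.[i + 1] <- key
    // Postcondition: the whole array is sorted.
    if not (isSortedRange arr 0 (n - 1)) then failwith "postcondition violated"
```

Calling `insertionSortChecked` on `[|5; 2; 4; 6; 1; 3|]` leaves the array as `[|1; 2; 3; 4; 5; 6|]` without tripping any check. The permutation property P is not checked here.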

Proving functional insertion sort

Define Sorted(x): "list x is sorted in ascending order"

• Req1: ∀ a : Sorted(insertionSort a)

• Req2: ∃ p : insertionSort a = Perm(p,a)

• Req3: Termination.

Show Req1 using induction over length(a):

Base: length a = 0: Trivially true by line 1.

IHypo: length a = k ⇒ Sorted(insertionSort a)

IStep: If length a = k+1, then insertionSort a = insertionSort (x:xs) = insert x (insertionSort xs).

Now "insertionSort xs" is sorted by induction hypothesis.

We need to show: Sorted(L) ⇒ Sorted(insert x L)

And we are done.

1 insertionSort [] = []
2 insertionSort (x:xs) =
3     insert x (insertionSort xs)
4   where
5     insert x [] = [x]
6     insert x (y:ys)
7       | x < y = x : y : ys
8       | otherwise = y : insert x ys

…Proving functional insertion sort

Show "Sorted(L) ⇒ Sorted(insert x L)":

Assume Sorted(L) and use induction on length of L.

Base: length L = 0: Line 5 ⇒ trivially sorted.

IHypo: length L = k and Sorted(L) ⇒ Sorted(insert x L)

IStep: If length L = k+1, then insert x L = insert x (y:ys) and we have 2 cases:

case x<y: result is "x:y:ys". Sorted([x,y]) ^ Sorted(y:ys) ⇒ Sorted(x:y:ys).

case x≥y: result is "y : insert x ys". Sorted(insert x ys) by IHypo. y ≤ x ^ y ≤ (all of ys), because Sorted(L) ⇒ Sorted(y : insert x ys)

And we are done.

And we are done.






…Proving functional insertion sort

Req2: ∃ p : insertionSort a = Perm(p,a)

• The value of "insert" is a permutation of the input values (by case analysis of code).

• Same holds for insertionSort.

Done.

[Strictly speaking, the above is circular reasoning because of the recursion in the code, and therefore the logic is not valid. To get this formally right, again use induction on the length of the input to avoid the circular argument.]

Req3: Termination (Time complexity!)

• "insert x L" runs in O(length L), because each recursion step makes the input 1 element shorter and is O(1).

• "insertionSort L" similarly: each step takes O(n), therefore complexity is O((length L)²) and therefore finite.
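Req1 and Req2 can also be stated as executable predicates and used to test a sorting function on sample inputs. This is testing, not proof. The names `isSorted`, `isPermutationOf` and `satisfiesSpec` are hypothetical, and the sketch takes the sort under test as a parameter so that it stays self-contained.

```fsharp
/// Req1: the list is in ascending order.
let isSorted (xs: int list) : bool =
    xs |> List.pairwise |> List.forall (fun (a, b) -> a <= b)

/// Req2: ys has the same elements as xs, counting duplicates.
let isPermutationOf (xs: int list) (ys: int list) : bool =
    let counts (zs: int list) = zs |> List.countBy id |> List.sortBy fst
    counts xs = counts ys

/// Checks both requirements for one input of a given sort function.
let satisfiesSpec (sort: int list -> int list) (input: int list) : bool =
    let output = sort input
    isSorted output && isPermutationOf input output
```

For instance, `satisfiesSpec List.sort [3; 1; 2; 3]` is true, while the identity function fails the check on `[2; 1]` because its output is not sorted.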






Easier to reason about?

Functional

• We can apply deduction rules to expressions and use the substitution principle (referential transparency)
– The logical formulas are valid or not without reference to the program counter

• Recursive control ⇒ induction proof

Imperative

• We can (must) analyze program state between statements.
– Analysis must follow control flow and use deduction rules for each control structure
– Need to invent(!) invariants

• This technique is known as "Hoare Logic"
– Course MAT-71506 or e.g. Wikipedia.

Faster

Functional Parallel Programming in Corento

www.atostek.com

• A single-assignment data flow language for computation kernels
– Designed and implemented at Atostek with Nokia 2009-2011
– Corento routines are called from C (or equivalent)
– Not independent programs

• Matrix*Vector multiplication code shown here

Corento

inline

function vecMulScal

{value len:Integer}

(v1: [Float # len], x: Float)

: [Float # len] =

for a in v1 do

value all a*x

end

end

-- Golub & VanLoan:

-- Matrix Computations 3rd ed. pp. 6,

-- Algorithm 1.1.4 (Column Gaxpy)

inline

function matMulVec

{value rows:Integer, value cols:Integer}

(a:[[Float # rows] # cols], xs:[Float # cols])

: [Float # rows] =

let init_sum = vecZero{rows}();

in for ac in a, x in xs

initially cumsum = init_sum;

do yc = vecMulScal{rows}(ac,x);

next cumsum = vecAdd{rows}(cumsum,yc);

value last cumsum

end

end

end


void c_matMul_gaxpy(int rowsA, int colsA, int colsB, float* a, float* b, float* result)

{

int rowsB = colsA;

int rowA,colA,colB;

for (colB=0;colB<colsB;colB++)

{

colA=0;

for (rowA=0;rowA<rowsA;rowA++)

{

result[colB*rowsA+rowA] = a[colA*rowsA+rowA]*b[colB*rowsB+colA];

}

for (colA=1;colA<colsA;colA++)

{

for (rowA=0;rowA<rowsA;rowA++)

{

result[colB*rowsA+rowA] += a[colA*rowsA+rowA]*b[colB*rowsB+colA];

}

}

}

}

Matrix * Matrix in portable C

void

cv_matMul_gaxpy(int rowsA, int colsA, int colsB, float* a, float *b, float* result)

{

float32x4_t* va = (float32x4_t *) a;

float32x4_t* vb = (float32x4_t *) b;

float32x4_t* resultv = (float32x4_t *) result;

// clear output

float32x4_t* tmp = (float32x4_t *) result;

const float32x4_t zerov = {0.0,0.0,0.0,0.0};

int i;

for (i=0; i<colsA*colsB/4; ++i)

{

*tmp++ = zerov;

}

// to support vectorization, perform calculation in 4x4 blockwise

int rowblocks = rowsA/4;

int colblocks = colsB/4;

int colblocks2 = colblocks*colblocks;

int rowblock,colblock,ablock;

for (rowblock=0; rowblock<rowblocks; ++rowblock)

{

for (colblock=0; colblock<colblocks; ++colblock)

{

for (ablock=0; ablock<colblocks; ++ablock)

{

float32x4_t a0 = *(va+rowblock+0*colblocks+4*rowblocks*ablock);




float32x4_t b0 = *(vb+4*colblock*rowblocks+0*rowblocks+ablock);




float32x4_t* blockptr = resultv+4*colblock*rowblocks+rowblock;

//float32x4_t* blockptr = bptr;

// col 0

//float32x4_t oval0 = *(resultv+(0+(4*colblock))*rowblocks+rowblock);

float32x4_t oval0 = *blockptr;

float32x4_t b00 = vdupq_n_f32(vgetq_lane_f32(b0,0));

float32x4_t c0a = vmlaq_f32(oval0, a0, b00);


float32x4_t c0b = vmlaq_f32(c0a, a1, b10);


float32x4_t c0c = vmlaq_f32(c0b, a2, b20);


//*(resultv+(1+(4*colblock))*rowblocks+rowblock) = vmlaq_f32(c1c, a3, b31);

*blockptr = vmlaq_f32(c0c, a3, b30);

blockptr += rowblocks;

// col 1

// float32x4_t oval1 = *(resultv+(1+(4*colblock))*rowblocks+rowblock);












// col 2













// col 3











}

}

}

}

"SIMD optimized" Matrix multiplication C code for ARM

• Up to 7x faster than portable C code

• Cortex-A8, 32x32 matrix

• Works only on ARM with Neon SIMD unit

• Clearly more difficult to write, understand, and debug

• However, embedded DSP codes cannot afford 7x performance loss ⇒ must use SIMD when available

void

cv_matMul_gaxpy(int rowsA, int colsA, int colsB, float* a, float *b, float* result)

{

vector float* va = (vector float *) a;

vector float* vb = (vector float *) b;

vector float* resultv = (vector float *) result;

// clear output

vector float* tmp = (vector float *) result;

const vector float zerov = {0.0,0.0,0.0,0.0};

int i;

for (i=0; i<colsA*colsB/16; ++i)

{

*tmp++ = zerov;

*tmp++ = zerov;

*tmp++ = zerov;

*tmp++ = zerov;

}

// to support vectorization, perform calculation in 4x4 blockwise

int rowblocks = rowsA/4;

int colblocks = colsB/4;

int colblocks2 = colblocks*colblocks;

int rowblock,colblock,ablock;

for (rowblock=0; rowblock<rowblocks; ++rowblock)

{

for (colblock=0; colblock<colblocks; ++colblock)

{

//vector float* bptr = resultv+4*colblock*rowblocks+rowblock;

for (ablock=0; ablock<colblocks; ++ablock)

{

vector float a0 = *(va+rowblock+0*colblocks+4*rowblocks*ablock);




vector float b0 = *(vb+4*colblock*rowblocks+0*rowblocks+ablock);




vector float* blockptr = resultv+4*colblock*rowblocks+rowblock;

//vector float* blockptr = bptr;

// col 0

//vector float oval0 = *(resultv+(0+(4*colblock))*rowblocks+rowblock);

vector float oval0 = *blockptr;

vector float b00 = spu_splats(spu_extract(b0,0));

vector float c0a = spu_madd(a0, b00, oval0);


vector float c0b = spu_madd(a1, b10, c0a);


vector float c0c = spu_madd(a2, b20, c0b);


//*(resultv+(1+(4*colblock))*rowblocks+rowblock) = spu_madd(a3, b31, c1c);

*blockptr = spu_madd(a3, b30, c0c);


// col 1

// vector float oval1 = *(resultv+(1+(4*colblock))*rowblocks+rowblock);












// col 2













// col 3











// blockptr += rowblocks;


}

}

}

}

Same for Cell SPU

• ~12x faster than portable C

• Works only on Cell SPU

• We would like to avoid coding like this
– For each platform

Timing results on ARM Cortex-A8

• "Mul" tests the matrix multiplication code fragments shown on previous slides

• Corento beats even C+SIMD code, mostly because the LLVM instruction scheduler is better than GCC 4.2 and Cortex-A8 is sensitive to that.

Safer

FP vs. IEC 61508

More safety, fewer bugs

• IEC 61508 and ISO 26262 (and others) are standards for developing systems with Safety Functions

• Safety Function failure can cause loss of life and limb

• Standards give guidelines for software development for different Safety Integrity Levels

• Conforming to standard (esp. at higher SILs this is a lot of work)
– Requires a lot of documentation, verification, and testing
– Functional programming to the rescue

Examples of recommended/required measures from safety standards

Technique/measure: Functional programming, e.g. Haskell or F#

• Use of language subsets: Not many dangerous features (compared to C++/asm) and those are usually easy to spot.

• Enforcement of strong typing: Unavoidable

• Enforcement of low complexity (e.g. in a function): Easy

• Use of style guides: -

• Use of naming conventions: As in other languages, except when naming convention is substitute for typing ⇒ easier

• Restricted coupling between software components: Functional interfaces

• One entry and one exit point in subprograms (functions): There is no complicated interaction of control and data flow.

• No dynamic objects or online test during creation: Automatic memory management

• Initialization of variables: Unavoidable

• Avoid global variables (or justify usage): Very easy

• Limited use of pointers: Very easy

• No hidden data flow or control flow: Easy
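One place where typing can replace a naming convention in F# is units of measure. The sketch below is an illustration chosen for this summary, not something taken from the standards; the unit names `m` and `s` and the values are made up.

```fsharp
// Units are declared as types instead of being encoded in variable names.
[<Measure>] type m
[<Measure>] type s

let distance = 100.0<m>
let time = 20.0<s>

// The unit of the result is derived and checked by the compiler.
let speed : float<m/s> = distance / time

// The following line would be rejected at compile time (adding metres to seconds):
// let nonsense = distance + time
```

A name such as `distanceInMetres` only documents the intent; here the compiler enforces it.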

Summary

• F# is an ML-derived functional language on .Net
– Backed by Microsoft

• Functional programming advantages
– Understandable: easier to reason about
  • Code design
  • Bug hunting
  • Automated code transformations (compiler optimization)
  • (Verification)
– Faster: better optimizable, parallelizable
  • Optimization is mandatory to get reasonable performance
– Safer: simpler semantics, easier to analyze behavior
  • Practical safety: Fewer bugs and failures
  • Theoretical safety: Proving properties more feasible
  • Bureaucratic safety: Passes audits

Homework: Extend the diagram below to cover F#.
