A Quick introduction to F#

…and my take on why functional programming matters

Juhana Helovuo, Atostek Oy

2014-11-10


Contents

• Atostek

• F#

• Why should I care, elaborated

– Easier to reason about

– Performance

– Safety

• Summary

This presentation uses materials from Visual Studio Help, Wikipedia, Github, and other parts of the Internet. May contain unnatural colours, artificial examples, and traces of nuts.


Atostek Oy

• Since 1999, office in Hermia

• AAA credit rating

• Owned by personnel

• Head count ~ 50

– Mostly M.Sc. or Dr.Tech. from TUT + students

Atostek – Expertising your project, 7.11.2014

[Chart: yearly figures in Mil. € for 2000 to 2015; vertical axis from 0 to 4.5]

Atostek Aatos - Support for procurement


F#

F# : General properties

• The functional language for Microsoft's .Net platform
  – Strict evaluation
  – Direct support for .Net OO features ("multi-paradigm")
    • Manipulating standard .Net objects causes side effects
    • F# standard library can be used for pure computation
  – Standard component of Visual Studio since VS 2010

• Developed mostly from ML (Milner et al., 1973)
  – Also influence from OCaml, Python(!), Haskell, Scala, Erlang

• Native on .Net
  – Runs on Common Language Runtime, the .Net virtual machine
  – Compiled to Common Intermediate Language ( = CLR assembly)
  – Basic data types implemented on Common Type System
  – Many parts compatible with C#: strings, numbers, method calls, generics
    • But e.g. Lists and Maps and other default libraries are different

Basic F# expressions

> let x = 2 + 3 * 4 ;;
val x : int = 14

> type Person = { name : string ; age : int } ;;

> let aa = { name = "Aatos" ; age = 3 } ;;
val aa : Person = {name = "Aatos"; age = 3;}

> [1..3] ;;
val it : int list = [1; 2; 3]

> seq { for i in 0 .. x do
          if i % 2 = 0 then yield i+2 } ;;
val it : seq<int> = seq [2; 4; 6; 8; ...]

> if aa.age >= 18 then printfn "Yes" else printfn "No" ;;
No
val it : unit = ()

alternatively: printfn <| if aa.age >= 18 then "Yes" else "No" ;;

> ['a' .. 'f'] |> List.map int ;;
val it : int list = [97; 98; 99; 100; 101; 102]


Common library types: List, Seq, Map, Option

• List<'a> is a Lisp-style linked list.

• Seq<'a>
  – Lazy but non-memoizing sequence
  – Implemented using generators
  – Many data structures can convert themselves to Seq
  – Is really IEnumerable<'a> from .Net in disguise

• Map<'Key,'Value> is an ordered indexable collection (balanced tree).

• type Option<'a> =
    | Some of 'a
    | None
  – Can be used to add a null value to a type.
  – Safe, since the contents can only be accessed after a pattern match
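The points above about Option, Seq, and Map can be tried out with a small sketch. The names tryDivide, describe, squares, and ages are illustrative assumptions and do not come from the slides.

```fsharp
// Hypothetical helper: division that returns None instead of failing on zero.
let tryDivide (a: int) (b: int) : int option =
    if b = 0 then None else Some (a / b)

// The contents of an Option are reached through a pattern match,
// so the "no value" case cannot be forgotten silently.
let describe (r: int option) : string =
    match r with
    | Some v -> sprintf "result %d" v
    | None -> "no result"

// Seq is lazy: only the elements that are actually demanded get computed,
// so an infinite sequence is fine as long as we take a finite prefix.
let squares : seq<int> = Seq.initInfinite (fun i -> i * i)
let firstFive = squares |> Seq.take 5 |> List.ofSeq

// Map lookups can return an Option instead of throwing on a missing key.
let ages = Map.ofList [ "Aatos", 3 ]
let known = Map.tryFind "Aatos" ages
let unknown = Map.tryFind "Nobody" ages
```

Here describe (tryDivide 1 0) takes the None branch, and unknown is None rather than an exception.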


Operations on whole data structures

• F#
  Seq.map, Seq.map2, Seq.fold, Seq.foldBack, Seq.filter, Seq.scan, ...
  List.map, List.filter, ...
  Map.map, Map.filter, ...

• Haskell
  map (fmap), zipWith, foldr, foldl, filter, scan, concat, any, all, ...
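A short sketch of how a few of the F# functions listed above are combined in practice. The data and the names sumOfEvenSquares, runningTotals, and pairSums are made up for illustration.

```fsharp
let numbers = [1 .. 10]

// Keep the even numbers, square them, and sum the squares.
let sumOfEvenSquares =
    numbers
    |> Seq.filter (fun n -> n % 2 = 0)
    |> Seq.map (fun n -> n * n)
    |> Seq.fold (+) 0

// Seq.scan is like Seq.fold, but keeps every intermediate accumulator value.
let runningTotals = numbers |> Seq.scan (+) 0 |> List.ofSeq

// Seq.map2 combines two sequences element by element (Haskell: zipWith).
let pairSums = Seq.map2 (+) [1; 2; 3] [10; 20; 30] |> List.ofSeq
```

No explicit loop or index variable appears; each step operates on the whole structure.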


Functions and pattern matching

type Expression =
    | Number of int
    | Add of Expression * Expression
    | Multiply of Expression * Expression
    | Variable of string

let rec Evaluate (env:Map<string,int>) exp =
    match exp with
    | Number n -> n
    | Add (x, y) -> Evaluate env x + Evaluate env y
    | Multiply (x, y) -> Evaluate env x * Evaluate env y
    | Variable id -> env.[id]

let environment = Map.ofList [ "a", 1 ; "b", 2 ; "c", 3 ]

// Create an expression tree that represents the expression: a + 2 * b.
let expressionTree1 = Add(Variable "a", Multiply(Number 2, Variable "b"))

// Evaluate the expression a + 2 * b, given the
// table of values for the variables.
let result = Evaluate environment expressionTree1


Computation Expressions

• Code blocks where computation semantics can be defined (within limits)

• Similar to do-notation in Haskell, but with more syntactic constructs

• Examples:

– State (monad)

– Option (Maybe, monad)

– Step-by-step computation

– Undo-computation

– Async

– Query expressions (LINQ)

– Seq expressions

– Software Transactional Memory (STM)
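As a concrete instance of the Option (Maybe) item in the list above, here is a minimal sketch of a user-defined computation expression. MaybeBuilder, maybe, tryParseInt, and addStrings are hypothetical names written for this example; they are not part of the F# core library.

```fsharp
// Hypothetical builder: the computation stops at the first None.
type MaybeBuilder() =
    member this.Bind(m: 'a option, f: 'a -> 'b option) : 'b option = Option.bind f m
    member this.Return(x: 'a) : 'a option = Some x

let maybe = MaybeBuilder()

// Wrap the .Net TryParse pattern into an Option-returning function.
let tryParseInt (s: string) : int option =
    match System.Int32.TryParse(s) with
    | true, v -> Some v
    | _ -> None

// Each let! unwraps a Some value or aborts the whole block with None.
let addStrings (a: string) (b: string) : int option =
    maybe {
        let! x = tryParseInt a
        let! y = tryParseInt b
        return x + y
    }
```

The block reads like straight-line code, while the builder's Bind decides what "sequencing" means; this is the sense in which the semantics can be defined within limits.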


Computation expression: state

type State<'a, 's> = State of ('s -> 'a * 's)

let runState (State s) a = s a
let getState = State (fun s -> (s,s))
let putState s = State (fun _ -> ((),s))

type StateBuilder() =
    member this.Return(a) = State (fun s -> (a,s))
    member this.Bind(m,k) =
        State (fun s ->
            let (a,s') = runState m s
            runState (k a) s')
    member this.ReturnFrom (m) = m

let state = new StateBuilder()

let counterWorkflow =
    let s = state {
        do! DoSomething
        let! a = Foobar
        do! WithA a
        return a+1
    }
    runState s 0

stateful

pure
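The workflow on the slide uses placeholder steps (DoSomething, Foobar, WithA) that are not defined there. The following sketch repeats the State type and builder and swaps in a concrete, hypothetical step called tick, so that the whole thing can be run. The integer counter state is an assumption made for this example.

```fsharp
type State<'a, 's> = State of ('s -> 'a * 's)

let runState (State f) s = f s
let putState (s: 's) : State<unit, 's> = State (fun _ -> ((), s))

// Hypothetical step: return the current counter value and increment the state.
let tick : State<int, int> = State (fun n -> (n, n + 1))

type StateBuilder() =
    member this.Return(a) = State (fun s -> (a, s))
    member this.Bind(m, k) =
        State (fun s ->
            let (a, s') = runState m s
            runState (k a) s')

let state = StateBuilder()

// Reads like imperative code, but no variable is ever mutated:
// the state is threaded through Bind behind the scenes.
let counterWorkflow =
    state {
        let! a = tick        // a = 0, state becomes 1
        let! b = tick        // b = 1, state becomes 2
        do! putState 10      // state becomes 10
        let! c = tick        // c = 10, state becomes 11
        return a + b + c
    }

// Running the workflow is an ordinary pure function call.
let result, finalState = runState counterWorkflow 0
```

Calling runState with a different initial value gives a different result without any hidden state left over from the previous run.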


Computation expression: state

static member GetYYYAndXXX : State<XXXState, XRetTypeX> =

state {

let s = System.XXX()

let ll = XXX

let! cC = XXXState.getXXX

let! m = XXXState.getMMM

[ . . . ]

// TODO: so far there are no XXX other XX than YYY

let e = mm.GetEEE

|> Seq.map (function Choice1Of2 both -> A both

| Choice2Of2 only -> B only)

|> Seq.groupBy (fun t -> t.XXXs)

|> Seq.map (fun (c,ts) ->

Set.map ll.GetXXX c

|> Seq.map (fun g -> g,List.ofSeq ts )

)

|> Seq.concat

|> Map.ofSeqWith (@)

[ . . . ]

return { c = cCc; m = allMMM }

}

Computation expression: async

open System.Net
open Microsoft.FSharp.Control.WebExtensions

// List<(string*string)>
let urlList = [ "Microsoft.com", "http://www.microsoft.com/"
                "MSDN", "http://msdn.microsoft.com/"
                "Bing", "http://www.bing.com" ]

let fetchAsync(name, url:string) =
    async {
        try
            let uri = new System.Uri(url)
            let webClient = new WebClient()
            let! html = webClient.AsyncDownloadString(uri)
            printfn "Read %d characters for %s" html.Length name
        with
            | ex -> printfn "%s" (ex.Message);
    }

let runAll() =
    urlList
    |> Seq.map fetchAsync
    |> Async.Parallel
    |> Async.RunSynchronously
    |> ignore

runAll()

Short F# summary

• ML-derived language on .Net
  – "Can be used for everything that C# can be, except for null reference exceptions."

• Recursion is the goto:
  – In practical programming, all iteration is done using library functions.
  – Actual recursion is hardly ever used.

• Computation expressions allow defining new behavior
  – But with a fixed syntax
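The "recursion is the goto" point can be illustrated with a small comparison. The function names and the sample data shape are assumptions made for this sketch.

```fsharp
// Explicit recursion: correct, but the looping scheme is written by hand.
let rec sumRec (xs: int list) : int =
    match xs with
    | [] -> 0
    | x :: rest -> x + sumRec rest

// The same computation, with the iteration delegated to a library function.
let sumFold (xs: int list) : int = List.fold (+) 0 xs

// Most everyday loops are covered the same way by filter, map, and friends.
let adultNames (people: (string * int) list) : string list =
    people
    |> List.filter (fun (_, age) -> age >= 18)
    |> List.map fst
```

In the library-based versions the shape of the iteration is visible from the function name alone, much as a structured loop is easier to read than a goto.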


Why should I care, longer explanation.

This part is about functional programming in general; we are going outside F# here.


Example: Insertion sort

• One of the simplest sorting algorithms

• Page 3 in ”Introduction to Algorithms”

[Figure: array indexed 1, 2, … split into a Sorted part and an Unsorted part; labels "≤key" and "key"]

Shorter – or easier to reason about?

Example: insertion sort

-- Functional Language
-- Haskell, but F# translation would
-- only have slightly different syntax

insertionSort :: Ord a => [a] -> [a]
insertionSort [] = []
insertionSort (x:xs) =
    insert x (insertionSort xs)
  where
    insert :: Ord a => a -> [a] -> [a]
    insert x [] = [x]
    insert x (y:ys)
      | x < y = x : y : ys
      | otherwise = y : insert x ys

-- 8 lines of code
-- + 2 lines of (compiler-inferrable) types

-- Generic Imperative Language
-- Object-oriented or not
-- Input is array A[1..n].

j = 1
while j < n do
    i ← j
    j ← j + 1
    key ← A[j]
    while i > 0 and A[i] > key do
        A[i + 1] ← A[i]
        i ← i-1
    endwhile
    A[i + 1] ← key
endwhile

-- Output is array A[1..n], now sorted.
-- 11 lines of code
-- Not much longer!
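The slide notes that an F# translation would differ only slightly in syntax. One possible translation is sketched below; it is written for this transcript and is not taken from the original slides.

```fsharp
// Insert x into an already sorted list, keeping the list sorted.
let rec insert x xs =
    match xs with
    | [] -> [x]
    | y :: ys when x < y -> x :: y :: ys
    | y :: ys -> y :: insert x ys

// Sort the tail recursively, then insert the head into the sorted tail.
let rec insertionSort xs =
    match xs with
    | [] -> []
    | x :: rest -> insert x (insertionSort rest)

let sorted = insertionSort [3; 1; 4; 1; 5; 9; 2; 6]
```

As in the Haskell version, the types are inferred; F# gives insert the type 'a -> 'a list -> 'a list with a comparison constraint on 'a, which plays the role of Haskell's Ord a.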

Proving Imperative insertion sort

j = 1
while j < n do
    i ← j
    j ← j + 1
    key ← A[j]
    while i > 0 and A[i] > key do
        A[i + 1] ← A[i]
        i ← i-1
    endwhile
    A[i + 1] ← key
endwhile

• Input and output are the array A[1..n]
• Correctness:
  1. The output is sorted: SortedA(1,n) ⇔ ∀ i ∈ [1,n-1] : A[i] ≤ A[i+1]
  2. The output has the same elements as the input: ∃ p : Aout = Permutation(p,Ain)
  3. The algorithm terminates.
• Requirements 1 and 2 are most interesting and difficult, so we'll try those.
• We'll handle 2 rather informally due to space, time, and boringness constraints.
• Requirement 3 is left for homework for those who are interested.
• Note that
  – ∀ i,j,k: SortedA(i,j) ^ SortedA(j,k) ⇒ SortedA(i,k)
  – ∀ i: SortedA(i,i)

[Figure: array A indexed 1, 2, …; the Sorted part with the labels "≤key" and "key"]

Proving Imperative insertion sort

• We use "P" to denote that A is still a permutation of the input.
• Outer loop invariant I: SortedA(1,j) ^ 1≤j≤n
• Inner loop invariant I2: SortedA(1,j-1) ^ (i=j-1 v A[j-1] ≤ A[j]) ^ A[i+1..j] ≥ key ^ i≥0
• Where "A[i+1..j] ≥ key" means ∀ k ∈ [i+1..j] : A[k] ≥ key

[Figure: array A indexed 1, 2, …; the Sorted part with the labels "≤key" and "key"]

{ n ≥ 1 }
j = 1
{ j = 1 ^ I ^ P }
while j < n do
    { I ^ j < n ^ P }
    i ← j
    { I ^ j < n ^ i=j ^ P }
    j ← j + 1
    { I ⇒ Sorted(1,j-1) ^ i=j-1 ^ 1<j≤n ^ P }
    key ← A[j]
    { Sorted(1,j-1) ^ i=j-1 ^ 1<j≤n ^ key=A[j] ^ P }
    { I2 }
    while i > 0 and A[i] > key do
        { I2 ^ i>0 ^ A[i]>key }
        A[i + 1] ← A[i]
        { SortedA(1,j) ^ A[i+1..j]≥key ^ i>0 ^ A[i]>key ^ A[i+1] = A[i] }
        i ← i-1
        { SortedA(1,j) ^ A[i+2..j]≥key ^ i≥0 ^ A[i+1]>key ^ A[i+2] = A[i+1] }
        { I2 }
    endwhile
    { I2 ^ (i≤0 v A[i] ≤ key) but ¬P }
    { I }
    A[i + 1] ← key
    { I ^ P }
endwhile
{ I ^ j≥n ^ P }
{ SortedA(1,n) ^ P }

Proving Imperative insertion sort

(Annotated program as shown above.)

• I: SortedA(1,j) ^ 1≤j≤n
• I2: SortedA(1,j-1) ^ ( i=j-1 v A[j-1] ≤ A[j] ) ^ A[i+1..j] ≥ key ^ i≥0

Case A[j-1] ≤ A[j]: SortedA(1,j-1) ⇒ SortedA(1,j)

Case i=j-1: A[i]=A[i+1] ⇒ A[j-1]≤A[j] ⇒ SortedA(1,j)

A[i] > key ^ A[i+1]=A[i] ⇒ A[i+1] ≥ key ⇒ A[i+1..j] ≥ key

Proving Imperative insertion sort

(Annotated program as shown above.)

• I: SortedA(1,j) ^ 1≤j≤n
• I2: SortedA(1,j-1) ^ ( i=j-1 v A[j-1] ≤ A[j] ) ^ A[i+1..j] ≥ key ^ i≥0

Case A[j-1] ≤ A[j]: SortedA(1,j-1) ⇒ SortedA(1,j)

Case i=j-1 ^ i≤0: i≥0 ⇒ i=0 ⇒ j=1 ⇒ SortedA(1,j)

Case i=j-1 ^ A[i] ≤ key: A[i+1..i+1] ≥ key ⇒ A[i] ≤ key ≤ A[i+1] ⇒ A[j-1] ≤ key ≤ A[j] ⇒ SortedA(1,j)

Proving Imperative insertion sort

(Annotated program as shown above.)

• I: SortedA(1,j) ^ 1≤j≤n
• I2: SortedA(1,j-1) ^ ( i=j-1 v A[j-1] ≤ A[j] ) ^ A[i+1..j] ≥ key ^ i≥0

Assignment A[i + 1] ← key maintains I, because
- SortedA(1,j)
- A[i] ≤ key
- A[i+1..j] ≥ key

P is restored, because…too long to show. Appeal to the picture of the array.

Also would need to show that all indexing of A is within [1..n]. TL;DR.

Proving functional insertion sort

Define Sorted(x) ⇔ "list x is sorted in ascending order"

• Req1: ∀ a : Sorted(insertionSort a)
• Req2: ∃ p : insertionSort a = Perm(p,a)
• Req3: Termination.

Show Req1 using induction over length(a):

Base: length a = 0: Trivially true by line 1.
IHypo: length a = k ⇒ Sorted(insertionSort a)
IStep: If length a = k+1, then
    insertionSort a = insertionSort (x:xs) = insert x (insertionSort xs).
    Now "insertionSort xs" is sorted by induction hypothesis.
    We need to show: Sorted(L) ⇒ Sorted(insert x L)
And we are done.

1 insertionSort [] = []
2 insertionSort (x:xs) =
3     insert x (insertionSort xs)
4   where
5     insert x [] = [x]
6     insert x (y:ys)
7       | x < y = x : y : ys
8       | otherwise = y : insert x ys

…Proving functional insertion sort

Show "Sorted(L) ⇒ Sorted(insert x L)":

Assume Sorted(L) and use induction on length of L.

Base: length L = 0: Line 5 ⇒ trivially sorted.
IHypo: length L = k and Sorted(L) ⇒ Sorted(insert x L)
IStep: If length L = k+1, then insert x L = insert x (y:ys) and we have 2 cases:
    case x<y: result is "x:y:ys".
        Sorted([x,y]) ^ Sorted(y:ys) ⇒ Sorted(x:y:ys).
    case x≥y: result is "y : insert x ys".
        Sorted(insert x ys) by IHypo.
        y ≤ x ^ y ≤ (all of ys), because Sorted(L)
        ⇒ Sorted(y : insert x ys)
And we are done.

1 insertionSort [] = []
2 insertionSort (x:xs) =
3     insert x (insertionSort xs)
4   where
5     insert x [] = [x]
6     insert x (y:ys)
7       | x < y = x : y : ys
8       | otherwise = y : insert x ys

…Proving functional insertion sort

Req2: ∃ p : insertionSort a = Perm(p,a)

• The value of "insert" is a permutation of the input values (by case analysis of code).
• Same holds for insertionSort.

Done.

[Strictly speaking, the above is circular reasoning because of the recursion in the code, and therefore the logic is not valid. To get this formally right, use induction on the length of the input again to avoid the circular argument.]

Req3: Termination (Time complexity!)

• "insert x L" runs in O(length L), because each recursion step makes the input 1 element shorter and is O(1).
• "insertionSort L" similarly: each step takes O(length L), therefore the complexity is O((length L)^2) and therefore finite.

1 insertionSort [] = []
2 insertionSort (x:xs) =
3     insert x (insertionSort xs)
4   where
5     insert x [] = [x]
6     insert x (y:ys)
7       | x < y = x : y : ys
8       | otherwise = y : insert x ys

Easier to reason about?

Functional

• We can apply deduction rules to expressions and use the substitution principle (referential transparency)
  – The logical formulas are valid or not without reference to program counter
• Recursive control ⇒ induction proof

Imperative

• We can (must) analyze program state between statements.
  – Analysis must follow control flow and use deduction rules for each control structure
  – Need to invent(!) invariants
• This technique is known as "Hoare Logic"
  – Course MAT-71506 or e.g. Wikipedia.

Faster

Functional Parallel Programming in Corento


www.atostek.com

Corento

• A single-assignment data flow language for computation kernels
  – Designed and implemented at Atostek with Nokia 2009-2011
  – Corento routines are called from C (or equivalent)
  – Not independent programs

• Matrix*Vector multiplication code shown here

inline

function vecMulScal

{value len:Integer}

(v1: [Float # len], x: Float)

: [Float # len] =

for a in v1 do

value all a*x

end

end

-- Golub & VanLoan:

-- Matrix Computations 3rd ed. pp. 6,

-- Algorithm 1.1.4 (Column Gaxpy)

inline

function matMulVec

{value rows:Integer, value cols:Integer}

(a:[[Float # rows] # cols], xs:[Float # cols])

: [Float # rows] =

let init_sum = vecZero{rows}();

in for ac in a, x in xs

initially cumsum = init_sum;

do yc = vecMulScal{rows}(ac,x);

next cumsum = vecAdd{rows}(cumsum,yc);

value last cumsum

end

end

end
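For readers who do not know Corento, the same column-oriented (gaxpy) matrix*vector scheme can be sketched in F#. This is only an illustration of the data flow, not a statement about how Corento compiles or performs. The matrix is assumed to be stored as an array of columns, mirroring the [[Float # rows] # cols] type above, and the function names are reused from the Corento code purely for readability.

```fsharp
// Multiply every element of a vector by a scalar.
let vecMulScal (v: float[]) (x: float) : float[] =
    Array.map (fun a -> a * x) v

// Element-wise sum of two vectors of equal length.
let vecAdd (u: float[]) (v: float[]) : float[] =
    Array.map2 (+) u v

// Column gaxpy: accumulate x_j * column_j over all columns j.
// Each step produces a new accumulator value; nothing is overwritten.
let matMulVec (columns: float[][]) (xs: float[]) : float[] =
    let rows = if columns.Length = 0 then 0 else columns.[0].Length
    Array.fold2
        (fun cumsum col x -> vecAdd cumsum (vecMulScal col x))
        (Array.create rows 0.0)
        columns
        xs

// The matrix [[1 2]; [3 4]] stored column by column, times the vector (5, 6).
let product = matMulVec [| [| 1.0; 3.0 |]; [| 2.0; 4.0 |] |] [| 5.0; 6.0 |]
```

The fold plays the role of the Corento loop with "initially", "next", and "value last": the accumulator is a single-assignment value that is replaced on every iteration.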


void c_matMul_gaxpy(int rowsA, int colsA, int colsB, float* a, float* b, float* result)

{

int rowsB = colsA;

int rowA,colA,colB;

for (colB=0;colB<colsB;colB++)

{

colA=0;

for (rowA=0;rowA<rowsA;rowA++)

{

result[colB*rowsA+rowA] = a[colA*rowsA+rowA]*b[colB*rowsB+colA];

}

for (colA=1;colA<colsA;colA++)

{

for (rowA=0;rowA<rowsA;rowA++)

{

result[colB*rowsA+rowA] += a[colA*rowsA+rowA]*b[colB*rowsB+colA];

}

}

}

}

Matrix * Matrix in portable C


void

cv_matMul_gaxpy(int rowsA, int colsA, int colsB, float* a, float *b, float* result)

{

float32x4_t* va = (float32x4_t *) a;

float32x4_t* vb = (float32x4_t *) b;

float32x4_t* resultv = (float32x4_t *) result;

// clear output

float32x4_t* tmp = (float32x4_t *) result;

const float32x4_t zerov = {0.0,0.0,0.0,0.0};

int i;

for (i=0; i<colsA*colsB/4; ++i)

{

*tmp++ = zerov;

}

// to support vectorization, perform calculation in 4x4 blockwise

int rowblocks = rowsA/4;

int colblocks = colsB/4;

int colblocks2 = colblocks*colblocks;

int rowblock,colblock,ablock;

for (rowblock=0; rowblock<rowblocks; ++rowblock)

{

for (colblock=0; colblock<colblocks; ++colblock)

{

for (ablock=0; ablock<colblocks; ++ablock)

{

float32x4_t a0 = *(va+rowblock+0*colblocks+4*rowblocks*ablock);

float32x4_t a1 = *(va+rowblock+1*colblocks+4*rowblocks*ablock);

float32x4_t a2 = *(va+rowblock+2*colblocks+4*rowblocks*ablock);

float32x4_t a3 = *(va+rowblock+3*colblocks+4*rowblocks*ablock);

float32x4_t b0 = *(vb+4*colblock*rowblocks+0*rowblocks+ablock);

float32x4_t b1 = *(vb+4*colblock*rowblocks+1*rowblocks+ablock);

float32x4_t b2 = *(vb+4*colblock*rowblocks+2*rowblocks+ablock);

float32x4_t b3 = *(vb+4*colblock*rowblocks+3*rowblocks+ablock);

float32x4_t* blockptr = resultv+4*colblock*rowblocks+rowblock;

//float32x4_t* blockptr = bptr;

// col 0

//float32x4_t oval0 = *(resultv+(0+(4*colblock))*rowblocks+rowblock);

float32x4_t oval0 = *blockptr;

float32x4_t b00 = vdupq_n_f32(vgetq_lane_f32(b0,0));

float32x4_t c0a = vmlaq_f32(oval0, a0, b00);

float32x4_t b10 = vdupq_n_f32(vgetq_lane_f32(b0,1));

float32x4_t c0b = vmlaq_f32(c0a, a1, b10);

float32x4_t b20 = vdupq_n_f32(vgetq_lane_f32(b0,2));

float32x4_t c0c = vmlaq_f32(c0b, a2, b20);

float32x4_t b30 = vdupq_n_f32(vgetq_lane_f32(b0,3));

//*(resultv+(1+(4*colblock))*rowblocks+rowblock) = vmlaq_f32(c1c, a3, b31);

*blockptr = vmlaq_f32(c0c, a3, b30);

blockptr += rowblocks;

// col 1

// float32x4_t oval1 = *(resultv+(1+(4*colblock))*rowblocks+rowblock);

float32x4_t oval1 = *blockptr;

float32x4_t b01 = vdupq_n_f32(vgetq_lane_f32(b1,0));

float32x4_t c1a = vmlaq_f32(oval1, a0, b01);

float32x4_t b11 = vdupq_n_f32(vgetq_lane_f32(b1,1));

float32x4_t c1b = vmlaq_f32(c1a, a1, b11);

float32x4_t b21 = vdupq_n_f32(vgetq_lane_f32(b1,2));

float32x4_t c1c = vmlaq_f32(c1b, a2, b21);

float32x4_t b31 = vdupq_n_f32(vgetq_lane_f32(b1,3));

*blockptr = vmlaq_f32(c1c, a3, b31);

blockptr += rowblocks;

//*(resultv+(1+(4*colblock))*rowblocks+rowblock) = vmlaq_f32(c1c, a3, b31);

// col 2

//float32x4_t oval2 = *(resultv+(2+(4*colblock))*rowblocks+rowblock);

float32x4_t oval2 = *blockptr;

float32x4_t b02 = vdupq_n_f32(vgetq_lane_f32(b2,0));

float32x4_t c2a = vmlaq_f32(oval2, a0, b02);

float32x4_t b12 = vdupq_n_f32(vgetq_lane_f32(b2,1));

float32x4_t c2b = vmlaq_f32(c2a, a1, b12);

float32x4_t b22 = vdupq_n_f32(vgetq_lane_f32(b2,2));

float32x4_t c2c = vmlaq_f32(c2b, a2, b22);

float32x4_t b32 = vdupq_n_f32(vgetq_lane_f32(b2,3));

*blockptr = vmlaq_f32(c2c, a3, b32);

blockptr += rowblocks;

//*(resultv+(2+(4*colblock))*rowblocks+rowblock) = vmlaq_f32(c2c, a3, b32);

// col 3

//float32x4_t oval3 = *(resultv+(3+(4*colblock))*rowblocks+rowblock);

float32x4_t oval3 = *blockptr;

float32x4_t b03 = vdupq_n_f32(vgetq_lane_f32(b3,0));

float32x4_t c3a = vmlaq_f32(oval3, a0, b03);

float32x4_t b13 = vdupq_n_f32(vgetq_lane_f32(b3,1));

float32x4_t c3b = vmlaq_f32(c3a, a1, b13);

float32x4_t b23 = vdupq_n_f32(vgetq_lane_f32(b3,2));

float32x4_t c3c = vmlaq_f32(c3b, a2, b23);

float32x4_t b33 = vdupq_n_f32(vgetq_lane_f32(b3,3));

*blockptr = vmlaq_f32(c3c, a3, b33);

}

}

}

}

"SIMD optimized" matrix multiplication C code for ARM

• Up to 7x faster than portable C code
• Cortex-A8, 32x32 matrix
• Works only on ARM with Neon SIMD unit
• Clearly more difficult to write, understand, and debug
• However, embedded DSP codes cannot afford 7x performance loss ⇒ must use SIMD when available


void

cv_matMul_gaxpy(int rowsA, int colsA, int colsB, float* a, float *b, float* result)

{

vector float* va = (vector float *) a;

vector float* vb = (vector float *) b;

vector float* resultv = (vector float *) result;

// clear output

vector float* tmp = (vector float *) result;

const vector float zerov = {0.0,0.0,0.0,0.0};

int i;

for (i=0; i<colsA*colsB/16; ++i)

{

*tmp++ = zerov;

*tmp++ = zerov;

*tmp++ = zerov;

*tmp++ = zerov;

}

// to support vectorization, perform calculation in 4x4 blockwise

int rowblocks = rowsA/4;

int colblocks = colsB/4;

int colblocks2 = colblocks*colblocks;

int rowblock,colblock,ablock;

for (rowblock=0; rowblock<rowblocks; ++rowblock)

{

for (colblock=0; colblock<colblocks; ++colblock)

{

//vector float* bptr = resultv+4*colblock*rowblocks+rowblock;

for (ablock=0; ablock<colblocks; ++ablock)

{

vector float a0 = *(va+rowblock+0*colblocks+4*rowblocks*ablock);

vector float a1 = *(va+rowblock+1*colblocks+4*rowblocks*ablock);

vector float a2 = *(va+rowblock+2*colblocks+4*rowblocks*ablock);

vector float a3 = *(va+rowblock+3*colblocks+4*rowblocks*ablock);

vector float b0 = *(vb+4*colblock*rowblocks+0*rowblocks+ablock);

vector float b1 = *(vb+4*colblock*rowblocks+1*rowblocks+ablock);

vector float b2 = *(vb+4*colblock*rowblocks+2*rowblocks+ablock);

vector float b3 = *(vb+4*colblock*rowblocks+3*rowblocks+ablock);

vector float* blockptr = resultv+4*colblock*rowblocks+rowblock;

//vector float* blockptr = bptr;

// col 0

//vector float oval0 = *(resultv+(0+(4*colblock))*rowblocks+rowblock);

vector float oval0 = *blockptr;

vector float b00 = spu_splats(spu_extract(b0,0));

vector float c0a = spu_madd(a0, b00, oval0);

vector float b10 = spu_splats(spu_extract(b0,1));

vector float c0b = spu_madd(a1, b10, c0a);

vector float b20 = spu_splats(spu_extract(b0,2));

vector float c0c = spu_madd(a2, b20, c0b);

vector float b30 = spu_splats(spu_extract(b0,3));

//*(resultv+(1+(4*colblock))*rowblocks+rowblock) = spu_madd(a3, b31, c1c);

*blockptr = spu_madd(a3, b30, c0c);

blockptr += rowblocks;

// col 1

// vector float oval1 = *(resultv+(1+(4*colblock))*rowblocks+rowblock);

vector float oval1 = *blockptr;

vector float b01 = spu_splats(spu_extract(b1,0));

vector float c1a = spu_madd(a0, b01, oval1);

vector float b11 = spu_splats(spu_extract(b1,1));

vector float c1b = spu_madd(a1, b11, c1a);

vector float b21 = spu_splats(spu_extract(b1,2));

vector float c1c = spu_madd(a2, b21, c1b);

vector float b31 = spu_splats(spu_extract(b1,3));

*blockptr = spu_madd(a3, b31, c1c);

blockptr += rowblocks;

//*(resultv+(1+(4*colblock))*rowblocks+rowblock) = spu_madd(a3, b31, c1c);

// col 2

//vector float oval2 = *(resultv+(2+(4*colblock))*rowblocks+rowblock);

vector float oval2 = *blockptr;

vector float b02 = spu_splats(spu_extract(b2,0));

vector float c2a = spu_madd(a0, b02, oval2);

vector float b12 = spu_splats(spu_extract(b2,1));

vector float c2b = spu_madd(a1, b12, c2a);

vector float b22 = spu_splats(spu_extract(b2,2));

vector float c2c = spu_madd(a2, b22, c2b);

vector float b32 = spu_splats(spu_extract(b2,3));

*blockptr = spu_madd(a3, b32, c2c);

blockptr += rowblocks;

//*(resultv+(2+(4*colblock))*rowblocks+rowblock) = spu_madd(a3, b32, c2c);

// col 3

//vector float oval3 = *(resultv+(3+(4*colblock))*rowblocks+rowblock);

vector float oval3 = *blockptr;

vector float b03 = spu_splats(spu_extract(b3,0));

vector float c3a = spu_madd(a0, b03, oval3);

vector float b13 = spu_splats(spu_extract(b3,1));

vector float c3b = spu_madd(a1, b13, c3a);

vector float b23 = spu_splats(spu_extract(b3,2));

vector float c3c = spu_madd(a2, b23, c3b);

vector float b33 = spu_splats(spu_extract(b3,3));

*blockptr = spu_madd(a3, b33, c3c);

// blockptr += rowblocks;

//*(resultv+(3+(4*colblock))*rowblocks+rowblock) = spu_madd(a3, b33, c3c);

}

}

}

}

Same for Cell SPU

• ~12x faster than portable C
• Works only on Cell SPU
• We would like to avoid coding like this for each platform

Timing results on ARM Cortex-A8

• "Mul" tests the matrix multiplication code fragments shown on previous slides
• Corento beats even C+SIMD code, mostly because the LLVM instruction scheduler is better than that of GCC 4.2 and Cortex-A8 is sensitive to that.


Safer

FP vs. IEC 61508

More safety, fewer bugs

• IEC 61508 and ISO 26262 (and others) are standards for developing systems with Safety Functions
• Safety Function failure can cause loss of life and limb
• Standards give guidelines for software development for different Safety Integrity Levels (SILs)
• Conforming to a standard is a lot of work, especially at higher SILs
  – Requires a lot of documentation, verification, and testing
  – Functional programming to the rescue

Examples of recommended/required measures from safety standards

Technique/measure | Functional programming, e.g. Haskell or F#
Use of language subsets | Not many dangerous features (compared to C++/asm) and those are usually easy to spot.
Enforcement of strong typing | Unavoidable
Enforcement of low complexity (e.g. in a function) | Easy
Use of style guides | -
Use of naming conventions | As in other languages, except when naming convention is substitute for typing ⇒ easier
Restricted coupling between software components | Functional interfaces
One entry and one exit point in subprograms (functions) | There is no complicated interaction of control and data flow.
No dynamic objects or online test during creation | Automatic memory management
Initialization of variables | Unavoidable
Avoid global variables (or justify usage) | Very easy
Limited use of pointers | Very easy
No hidden data flow or control flow | Easy

Summary

• F# is an ML-derived functional language on .Net
  – Backed by Microsoft

• Functional programming advantages
  – Understandable: easier to reason about
    • Code design
    • Bug hunting
    • Automated code transformations (compiler optimization)
    • (Verification)
  – Faster: better optimizable, parallelizable
    • Optimization is mandatory to get reasonable performance
  – Safer: simpler semantics, easier to analyze behavior
    • Practical safety: Fewer bugs and failures
    • Theoretical safety: Proving properties more feasible
    • Bureaucratic safety: Passes audits


Homework: Extend the diagram below to cover F#.