Why Functional Programming Is Important in Big Data Era


Upload: handaru-sakti

Post on 06-May-2015


Category:

Data & Analytics



DESCRIPTION

The only thing that works for parallel programming is functional programming. --Carnegie Mellon Professor Bob Harper

TRANSCRIPT

Page 1: Why Functional Programming Is Important in Big Data Era

Why Functional Programming Is Important In Big Data Era?

[email protected]

Page 2: Why Functional Programming Is Important in Big Data Era

What Is Big Data?

Page 3: Why Functional Programming Is Important in Big Data Era

What Are The Steps?

Act On

Analyze

Collect

Page 4: Why Functional Programming Is Important in Big Data Era

What Do We Need?

Distributed Computing

Cluster

Process Data

Page 5: Why Functional Programming Is Important in Big Data Era

What Do We Need?

• Spark for data processing in the cluster; originally written in Scala, which allows concise function syntax and interactive use
• Mesos as cluster manager
• ZooKeeper as a highly reliable distributed coordinator
• HDFS as distributed storage

Page 6: Why Functional Programming Is Important in Big Data Era

What Do We Need?

• Pure functions
• Atomic operations
• Parallel patterns or skeletons
• Lightweight algorithms

The only thing that works for parallel programming is functional programming.

--Carnegie Mellon Professor Bob Harper
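To make the link between pure functions and parallel execution concrete, here is a minimal sketch in plain Scala (no cluster involved). The names `PureParallelSketch`, `square`, and the `chunkSize` parameter are illustrative assumptions, not part of the original slides. Because `square` has no side effects and addition is associative, the chunks can be processed in any order, on any thread, and the combined result is the same as the sequential one.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object PureParallelSketch {
  // Pure function: the result depends only on the argument, no shared state.
  def square(x: Int): Int = x * x

  // Sequential reference result.
  def sumOfSquares(xs: Seq[Int]): Int = xs.map(square).sum

  // Split the input into chunks, process each chunk in its own Future,
  // then combine the partial sums. chunkSize must be positive.
  def parallelSumOfSquares(xs: Seq[Int], chunkSize: Int): Int = {
    val partials: List[Future[Int]] =
      xs.grouped(chunkSize).toList.map(chunk => Future(chunk.map(square).sum))
    Await.result(Future.sequence(partials), 10.seconds).sum
  }
}
```

If `square` mutated a shared counter instead, the chunks would need locking or atomic updates to stay correct, which is roughly the situation the list above is trying to avoid.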

Page 7: Why Functional Programming Is Important in Big Data Era

What Is Functional Programming?

Page 8: Why Functional Programming Is Important in Big Data Era

FP Quick Tour In Scala

• Basic transformations:
var array = new Array[Int](10)
var list = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

• Indexing:
array(0) = 1
println(list(0))

• Anonymous functions:
val multiply = (x: Int, y: Int) => x * y

val procedure = (x: Int) => {
  println("Hello, " + x)
  println(x * 10)
}

Page 9: Why Functional Programming Is Important in Big Data Era

FP Quick Tour In Scala

• Scala closure syntax:
(x: Int) => x * 10 // full version
x => x * 10 // type inference
_ * 10 // underscore syntax
x => { // body is a block of code
  val y = 10
  x * y
}
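The shorter closure forms only compile where Scala already knows the expected function type. The sketch below, using the hypothetical object name `ClosureForms`, gives each form an explicit `Int => Int` type so that all four can be defined and compared side by side.

```scala
object ClosureForms {
  val full: Int => Int = (x: Int) => x * 10 // full version
  val inferred: Int => Int = x => x * 10    // parameter type inferred
  val underscore: Int => Int = _ * 10       // underscore placeholder
  val block: Int => Int = x => {            // body is a block of code
    val y = 10
    x * y
  }

  // All four values denote the same function.
  val all: List[Int => Int] = List(full, inferred, underscore, block)
}
```

Applying every element of `ClosureForms.all` to the same argument yields the same result, which is the point of the slide: the forms differ in syntax only.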

Page 10: Why Functional Programming Is Important in Big Data Era

FP Quick Tour In Scala

• Processing collections:
var list = List(1, 2, 3, 4, 5, 6, 7, 8, 9)

list.foreach(x => println(x))
list.map(_ * 10)
list.filter(x => x % 2 == 0)
list.reduce((x, y) => x + y)
list.reduce(_ + _)

def f(x: Int) = List(x-1, x, x+1)
list.map(x => f(x))
list.map(f(_))
list.flatMap(x => f(x))
list.map(x => f(x)).reduce(_ ++ _)
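The last four lines of the slide differ in a way that is easy to miss: `map` keeps one inner list per element, while `flatMap` concatenates them. A small sketch on a shorter list makes the difference visible; the object name `MapVsFlatMap` and the three-element input are assumptions chosen for brevity.

```scala
object MapVsFlatMap {
  val list = List(1, 2, 3)

  def f(x: Int): List[Int] = List(x - 1, x, x + 1)

  // map keeps the nesting: one inner list per input element.
  val nested: List[List[Int]] = list.map(f)

  // flatMap concatenates the inner lists into a single list.
  val flat: List[Int] = list.flatMap(f)

  // Mapping and then reducing with ++ gives the same flat list.
  val flattenedByReduce: List[Int] = list.map(f).reduce(_ ++ _)
}
```

Here `nested` is `List(List(0, 1, 2), List(1, 2, 3), List(2, 3, 4))`, while `flat` and `flattenedByReduce` are both `List(0, 1, 2, 1, 2, 3, 2, 3, 4)`.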

Page 11: Why Functional Programming Is Important in Big Data Era

Spark Quick Tour

• Spark context:
  • Entry point to Spark functionality
  • In spark-shell, created as sc
  • In a standalone Spark program, we must create it

• Resilient distributed datasets (RDDs):
  • A distributed memory abstraction
  • A logically centralized entity but physically partitioned across multiple machines inside a cluster based on some notion of key
  • Immutable
  • Automatically rebuilt on failure
  • Cache eviction based on the LRU (Least Recently Used) algorithm
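Immutability and automatic rebuilding are connected: if the source data never changes and the transformation is a pure function, a lost partition can simply be recomputed. The following is a toy model of that idea in plain Scala, not Spark's actual implementation; `ToyDataset`, `compute`, and `computeAll` are hypothetical names introduced for illustration.

```scala
object LineageSketch {
  // A toy stand-in for an RDD: immutable source partitions plus the pure
  // function that derives each output partition from its source partition.
  final case class ToyDataset[A, B](source: Vector[Vector[A]], transform: A => B) {
    // Compute a single partition from its source partition.
    def compute(i: Int): Vector[B] = source(i).map(transform)

    // Compute every partition.
    def computeAll: Vector[Vector[B]] = source.indices.toVector.map(compute)
  }

  val squares = ToyDataset(Vector(Vector(1, 2), Vector(3, 4)), (x: Int) => x * x)
  val computed: Vector[Vector[Int]] = squares.computeAll

  // Pretend partition 1 was lost: recompute only that partition from the
  // recorded source and transformation.
  val rebuilt: Vector[Vector[Int]] = computed.updated(1, squares.compute(1))
}
```

Because nothing is mutated, `rebuilt` is equal to `computed`, and only the lost partition had to be recomputed.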

Page 12: Why Functional Programming Is Important in Big Data Era

Spark Quick Tour

Working with RDDs

Page 13: Why Functional Programming Is Important in Big Data Era

Spark Quick Tour

Cached RDDs

Page 14: Why Functional Programming Is Important in Big Data Era

Spark Quick Tour

• Transformations:
  • Lazy operations to build RDDs from other RDDs
  • Narrow transformation (involves no data shuffling):
    • map
    • flatMap
    • filter
  • Wide transformation (involves data shuffling):
    • sortByKey
    • reduceByKey
    • groupByKey

• Actions:
  • Return a result or write it to storage
  • collect
  • count
  • take(n)
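The split between lazy transformations and actions can be imitated locally with a Scala collection view, which is used here as an analogy and is not how Spark is implemented. In this sketch the counter `calls` and the object name `LazyVsEager` are assumptions added for illustration: building the view records the mapping without running it, and forcing the view plays the role of an action.

```scala
object LazyVsEager {
  var calls = 0 // counts how often the mapping function actually runs

  // Building the view only records the mapping; nothing is computed yet.
  val pending = List(1, 2, 3).view.map { x => calls += 1; x * 2 }
  val callsBeforeForcing: Int = calls

  // Forcing the view triggers the computation, like an action on an RDD.
  val result: List[Int] = pending.toList
  val callsAfterForcing: Int = calls
}
```

The mapping function has not run at all when `callsBeforeForcing` is recorded, and has run once per element by the time `result` exists.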

Page 15: Why Functional Programming Is Important in Big Data Era

Spark Quick Tour

Transformations

Page 16: Why Functional Programming Is Important in Big Data Era

Spark Quick Tour

• Creating RDDs:
// Turn a collection into an RDD
val numbers = sc.parallelize(List(1, 2, 3, 4, 5))

// Base RDD: load a single file, or several files with a glob pattern
val textFile = sc.textFile("hdfs://localhost/test/tobe.txt")
val textFiles = sc.textFile("hdfs://localhost/test/*.txt")

• Basic transformations:
val squares = numbers.map(x => x * x)
val belowNine = squares.filter(_ < 9)
val mapto = numbers.flatMap(x => 1 to x)

// Transformed RDD
val words = textFile.flatMap(_.split(" ")).cache()

Page 17: Why Functional Programming Is Important in Big Data Era

Spark Quick Tour

• Basic actions:
words.collect()
words.take(5)
words.count()
words.reduce(_ + _)

// The influence of cache: words is cached, so these repeated counts reuse it instead of re-reading the file
words.filter(_ == "be").count()
words.filter(_ == "or").count()

words.saveAsTextFile("hdfs://localhost/test/result")

Page 18: Why Functional Programming Is Important in Big Data Era

Spark Quick Tour

• Pair syntax:
val pair = (a, b)

• Accessing pair elements:
pair._1
pair._2

• Key-value operations:
val pets = sc.parallelize(List(("cat", 1), ("dog", 2), ("cat", 3)))
pets.reduceByKey(_ + _)
pets.groupByKey()
pets.sortByKey()
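To see what the three key-value operations compute, the same `pets` data can be processed with ordinary Scala collections. This is a local approximation under the assumption that everything fits in one process; it mirrors the results of `reduceByKey`, `groupByKey`, and `sortByKey` but does no shuffling. The object name `PairOpsLocally` is hypothetical.

```scala
object PairOpsLocally {
  val pets = List(("cat", 1), ("dog", 2), ("cat", 3))

  // Like reduceByKey(_ + _): combine all values that share a key.
  val reduced: Map[String, Int] =
    pets.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2).sum }

  // Like groupByKey(): collect all values that share a key.
  val grouped: Map[String, List[Int]] =
    pets.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Like sortByKey(): order the pairs by key.
  val sorted: List[(String, Int)] = pets.sortBy(_._1)
}
```

On a cluster, all three operations need every pair with the same key to end up on the same machine, which is why they count as wide transformations.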

Page 19: Why Functional Programming Is Important in Big Data Era

Hello World

val logFile = "hdfs://localhost/test/tobe.txt"
val logData = sc.textFile(logFile).cache()
val wordCount = logData.flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)

wordCount.saveAsTextFile("hdfs://localhost/wordcount/result")
sc.stop()
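The snippet above relies on an existing `sc`, as in spark-shell. A standalone program has to create the context itself. The following is a minimal sketch under several assumptions: Spark is on the classpath, the job runs in local mode rather than on Mesos, and the input comes from an in-memory sequence instead of HDFS so it can be tried without a cluster. The object name, method name, and application name are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCountStandalone {
  def countWords(lines: Seq[String]): Map[String, Int] = {
    val conf = new SparkConf()
      .setAppName("word-count-sketch") // hypothetical application name
      .setMaster("local[*]")           // assumption: local mode, no Mesos cluster
    val sc = new SparkContext(conf)
    try {
      sc.parallelize(lines)            // assumption: in-memory input, not HDFS
        .flatMap(_.split(" "))
        .map((_, 1))
        .reduceByKey(_ + _)
        .collect()                     // action: bring the counts to the driver
        .toMap
    } finally sc.stop()                // always release the context
  }
}
```

To run against a real cluster, the master URL and the input source would change, while the transformation chain stays the same as in the slide.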

Page 20: Why Functional Programming Is Important in Big Data Era

Execution

Page 21: Why Functional Programming Is Important in Big Data Era

Software Components

Application

Spark Context

ZooKeeper

Mesos Master

Mesos Slave

Spark Executor

Mesos Slave

Spark Executor

HDFS/Other Storage