Scala - THE language for Big Data

Upload: tzach-zohar

Post on 15-Apr-2017

TRANSCRIPT

Page 1: Scala - THE language for Big Data
Page 10: Scala - THE language for Big Data

// Java
private static class Person {
    String firstName;
    String lastName;
}

private List<Person> firstNFamilies(int n, List<Person> persons) {
    // Last names of the families accepted so far (at most n)
    final List<String> familiesSoFar = new LinkedList<>();
    final List<Person> result = new LinkedList<>();
    for (Person p : persons) {
        if (familiesSoFar.contains(p.lastName)) {
            result.add(p);
        } else if (familiesSoFar.size() < n) {
            familiesSoFar.add(p.lastName);
            result.add(p);
        }
    }
    return result;
}

// Scala
case class Person(firstName: String, lastName: String)

def firstNFamilies(n: Int, persons: List[Person]): List[Person] = {
  // Last names of the first n distinct families, in order of appearance
  val firstFamilies = persons.map(p => p.lastName).distinct.take(n)
  persons.filter(p => firstFamilies.contains(p.lastName))
}
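To see what the Scala version returns, here is a small runnable sketch. The wrapper object `FirstNFamiliesDemo` and the sample names are made up for illustration; only `Person` and `firstNFamilies` come from the slide.

```scala
object FirstNFamiliesDemo {
  case class Person(firstName: String, lastName: String)

  // Same logic as on the slide: keep every person whose last name is
  // among the first n distinct last names in the input.
  def firstNFamilies(n: Int, persons: List[Person]): List[Person] = {
    val firstFamilies = persons.map(p => p.lastName).distinct.take(n)
    persons.filter(p => firstFamilies.contains(p.lastName))
  }

  // Hypothetical sample data, not from the talk
  val samplePersons: List[Person] = List(
    Person("Ann", "Smith"),
    Person("Bob", "Jones"),
    Person("Cat", "Brown"),
    Person("Dan", "Smith"),
    Person("Eve", "Jones")
  )

  def main(args: Array[String]): Unit = {
    // The first two families are Smith and Jones, so Cat Brown is dropped
    firstNFamilies(2, samplePersons).foreach(println)
  }
}
```

The input order is preserved, which matches the behavior of the Java loop.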

Page 15: Scala - THE language for Big Data

class DirectParquetOutputCommitter(outputPath: Path, context: TaskAttemptContext)
  extends ParquetOutputCommitter(outputPath, context) { … }

ParquetOutputCommitter: Java class from org.apache.parquet:parquet-hadoop

DirectParquetOutputCommitter: Scala class from org.apache.spark:spark-core_2.10

Page 20: Scala - THE language for Big Data

http://vmturbo.com/wp-content/uploads/2015/05/ScaleUpScaleOut_sm-min.jpg

Page 26: Scala - THE language for Big Data

val numbers = 1 to 100000
val result = numbers.map(slowF)

Page 27: Scala - THE language for Big Data

val numbers = 1 to 100000
val result = numbers.par.map(slowF)

Parallelizes the subsequent operations across the available CPU cores
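As a rough illustration of the idea behind spreading a map over CPU cores, here is a sketch that uses only the standard library's `Future` instead of `.par` (in Scala 2.13 and later, `.par` lives in the separate scala-parallel-collections module). The object name, `parallelMap`, the chunk size, and the body of `slowF` are assumptions made for the example, not the talk's code, and this is not how the parallel collections are implemented internally.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelMapSketch {
  // Hypothetical stand-in for the slide's slowF: sleeps briefly, then doubles
  def slowF(n: Int): Int = {
    Thread.sleep(1)
    n * 2
  }

  // Splits the input into chunks, maps each chunk on the global thread pool,
  // and concatenates the chunk results in their original order.
  def parallelMap(numbers: Seq[Int], chunkSize: Int): Seq[Int] = {
    val chunks = numbers.grouped(chunkSize).toList
    val futures = chunks.map(chunk => Future(chunk.map(slowF)))
    Await.result(Future.sequence(futures), 1.minute).flatten
  }

  def main(args: Array[String]): Unit = {
    val numbers = 1 to 200
    val sequential = numbers.map(slowF)
    val parallel = parallelMap(numbers, 20)
    // Both approaches produce the same elements in the same order
    println(sequential == parallel)
  }
}
```

The point of the slide still holds: with `.par` the caller writes the same `map(slowF)` and leaves the chunking and scheduling to the library.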

Page 28: Scala - THE language for Big Data

val numbers = 1 to 100000
val result = sparkContext.parallelize(numbers).map(slowF)

Parallelizes the subsequent operations across a scalable cluster by creating a Spark RDD (Resilient Distributed Dataset)

Page 30: Scala - THE language for Big Data

[Diagram: several Map tasks running in parallel, one of them labeled "Map (retry)"]
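The retried Map task can be illustrated with a toy, single-machine simulation. This is not Spark's API or its actual recovery mechanism; `RetrySketch`, `runWithRetry`, `maxAttempts`, and the simulated failure are all hypothetical names and behavior chosen for the example. It only shows the general idea that work split into partitions can be redone per partition when one attempt fails.

```scala
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Runs `task` on one partition, retrying until it succeeds or
  // maxAttempts attempts have been made; the last error is rethrown.
  def runWithRetry[A, B](partition: Seq[A], maxAttempts: Int)(task: Seq[A] => Seq[B]): Seq[B] = {
    def attempt(n: Int): Seq[B] = Try(task(partition)) match {
      case Success(result)               => result
      case Failure(_) if n < maxAttempts => attempt(n + 1)
      case Failure(error)                => throw error
    }
    attempt(1)
  }

  def main(args: Array[String]): Unit = {
    // Three partitions of four numbers each
    val partitions = (1 to 12).grouped(4).toList
    var failuresLeft = 1 // simulate a single failing attempt

    val result = partitions.flatMap { p =>
      runWithRetry(p, maxAttempts = 3) { part =>
        if (part.head == 5 && failuresLeft > 0) {
          failuresLeft -= 1
          throw new RuntimeException("simulated worker failure")
        }
        part.map(_ * 2)
      }
    }
    // Only the failed partition was recomputed; the others ran once
    println(result)
  }
}
```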
