Lightning Talks & Integrations Track - Running Apache Spark Libraries on Apache Apex @ ABDW17,...
TRANSCRIPT
2
• Motivation
• Apex Processing Model
• Spark Processing Model
• Translation from Spark to Apex
• Parallelism in Apex
• I/O Performance Enhancement
• Roadmap
7
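// Read the file, drop blank and comment lines, then parse each record.
val parsed = sc.textFile(path, minPartitions)
  .map(_.trim)
  .filter(line => !(line.isEmpty || line.startsWith("#")))
  .map(training_record)
// Largest parsed value plus one.
val d = parsed.reduce(math.max) + 1
parsed.map(_ + d).collect()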
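To make the three steps on this slide concrete without a cluster, here is a minimal sketch of the same pipeline on plain Scala collections. It assumes each record is a single integer per line. `trainingRecord` is a hypothetical stand-in for the slide's `training_record`, and the sample lines are made up for illustration.

```scala
object PipelineSketch {
  // Hypothetical parser standing in for the slide's `training_record`:
  // assumes every non-comment line holds exactly one integer.
  def trainingRecord(line: String): Int = line.toInt

  // Same shape as the slide's pipeline, on a local Seq instead of an RDD.
  // Precondition: at least one non-blank, non-comment line (reduce fails on empty input).
  def run(lines: Seq[String]): Seq[Int] = {
    val parsed = lines
      .map(_.trim)
      .filter(line => !(line.isEmpty || line.startsWith("#")))
      .map(trainingRecord)
    // Largest parsed value plus one.
    val d = parsed.reduce((a, b) => math.max(a, b)) + 1
    parsed.map(_ + d)
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("# header", " 3 ", "", "7", "1")
    println(run(sample)) // prints List(11, 15, 9)
  }
}
```

In this sketch `parsed` is used twice, once by the reduce and once by the final map, which matches how the slide's code reuses `parsed` for both `d` and the result.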
8
val parsed = sc.textFile(path, minPartitions)
  .map(_.trim)
  .filter(line => !(line.isEmpty || line.startsWith("#")))
  .map(training_record)
Apex RDD
parsed
9
val d = parsed.reduce(math.max) + 1
val d = nParsed
Apex RDD
10
parsed.map(_ + d).collect()
Parsed (ApexRDD)
11
[Diagram: four Map operators and two Reduce operators]
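The wiring between the Map and Reduce boxes cannot be recovered from the transcript, so the following is only a sketch of one common arrangement: the input is split into partitions, each partition is mapped and reduced to a partial result independently, and a final reduce merges the partials. It uses plain Scala collections rather than any Apex or Spark API, and the names `partition` and `maxPlusOne` are hypothetical.

```scala
object PartitionedMaxSketch {
  // Hypothetical helper: split the input into at most `n` contiguous partitions.
  def partition[A](data: Seq[A], n: Int): Seq[Seq[A]] = {
    val size = math.max(1, math.ceil(data.size.toDouble / n).toInt)
    data.grouped(size).toSeq
  }

  // Precondition: `lines` is non-empty and every line holds one integer.
  def maxPlusOne(lines: Seq[String], mapPartitions: Int): Int = {
    val partials = partition(lines, mapPartitions)
      .map(_.map(_.trim.toInt))                 // map stage, applied per partition
      .map(_.reduce((a, b) => math.max(a, b)))  // partial reduce per partition
    // Final reduce merges the per-partition maxima.
    partials.reduce((a, b) => math.max(a, b)) + 1
  }

  def main(args: Array[String]): Unit = {
    val lines = Seq("3", "7", "1", "5", "2", "6", "4", "0")
    println(maxPlusOne(lines, 4)) // prints 8
  }
}
```

Because `max` is associative and commutative, the result does not depend on how the input is partitioned, which is the property that lets the reduce be split across stages.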