Leveraging Parallel Data Processing Frameworks with Verified Lifting
Maaz Ahmad, Alvin Cheung
Motivation
Data → Data Collection Tool → Data Analytics Application (Sequential Java)
I need something faster.
Parallel Processing Frameworks
Which one is right for me? How do I program in this?
I will have to rewrite my application!
A rewrite might introduce bugs.
How can we make life easier?
Java To Hadoop Compiler
Java To Spark Compiler
Syntax-Directed Rules
• Hard to come up with rules
• Brittle to changes in code patterns
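Syntax-directed rules map the sequential loop directly to mapper/reducer code. Verified lifting instead first lifts the loop to a high-level summary and then translates that summary:

for (int i = 0; i < data.size(); i++) {
    …
}

fm(val) → …
fr(val1, val2) → …
output = reduce(map(data, fm), fr);

mapper(key, data) {
    …
}
reducer(key, values) {
    …
}

How do we do this?
- Program analysis
- Synthesis
- Theorem prover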
Introducing CASPER
• Retargets sequential Java code fragments to the Hadoop and Spark frameworks.
• Input: unannotated sequential Java application source code.
• Output: translated application source code that runs on top of Hadoop/Spark to leverage their parallel execution.
MapReduce Overview
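Input Data is partitioned into data splits, and each split is processed by a Mapper.
The Mappers generate key-value pairs, e.g. (Key1, value), (Key2, value), (Key3, value).
All pairs with the same key are sent to the same Reducer (Reducer key1, Reducer key2, Reducer key3), which combines their values.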
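The dataflow above can be simulated on a single machine. The Java sketch below is only an illustration under our own assumptions: the class `MapReduceSketch`, the record `KV`, and the methods `run` and `wordCountMapper` are hypothetical and are not Hadoop or Spark APIs. It maps every record of every split, groups the emitted values by key (the shuffle), and folds each group with a reducer, using word count (one of the benchmarks in the evaluation) as the job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class MapReduceSketch {
    // A key-value pair emitted by a mapper.
    record KV<K, V>(K key, V value) {}

    // Single-process simulation: map each item of each split, group (shuffle)
    // the emitted values by key, then fold each group with the reducer.
    static <I, K, V> Map<K, V> run(List<List<I>> splits,
                                   Function<I, List<KV<K, V>>> mapper,
                                   BinaryOperator<V> reducer) {
        Map<K, List<V>> groups = new HashMap<>();
        for (List<I> split : splits) {                  // on a cluster: one Mapper per split
            for (I item : split) {
                for (KV<K, V> kv : mapper.apply(item)) {
                    groups.computeIfAbsent(kv.key(), k -> new ArrayList<>()).add(kv.value());
                }
            }
        }
        Map<K, V> result = new HashMap<>();
        for (Map.Entry<K, List<V>> e : groups.entrySet()) {  // on a cluster: one Reducer per key
            List<V> values = e.getValue();
            V acc = values.get(0);
            for (int j = 1; j < values.size(); j++) {
                acc = reducer.apply(acc, values.get(j));
            }
            result.put(e.getKey(), acc);
        }
        return result;
    }

    // Word-count mapper: emit (word, 1) for every word in a line.
    static List<KV<String, Integer>> wordCountMapper(String line) {
        List<KV<String, Integer>> out = new ArrayList<>();
        for (String word : line.split(" ")) {
            out.add(new KV<>(word, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(
                List.of("a b", "b c"),
                List.of("c c"),
                List.of("a"));
        Map<String, Integer> counts = MapReduceSketch.<String, String, Integer>run(
                splits, MapReduceSketch::wordCountMapper, Integer::sum);
        System.out.println(new TreeMap<>(counts)); // {a=2, b=2, c=3}
    }
}
```

On a real cluster the map and reduce phases run as separate tasks on different nodes; the sketch keeps the same structure but executes it sequentially.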
Verified Lifting
• Infer the code's semantics (a summary) in a high-level specification language
• A summary describes the effect of the code on its output variables
Java Code Fragment:
data_sqr = 0;
for (int i = 0; i < data.size(); i++) {
    data_sqr += data[i] * data[i];
}

Summary (post-condition):
$\mathit{data\_sqr} \equiv \sum_{i=0}^{\mathit{data.size}()-1} \mathit{data}[i]^2$
• Specifications must be trivial to translate to the target framework.
• Specifications should exhibit good parallelism.
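Why does this summary exhibit good parallelism? It only combines per-element terms with +, which is associative and commutative, so the elements can be processed in any grouping and order. The sketch below assumes `data` is an `int[]` and uses `long` to avoid overflow (both are our assumptions, since the slide mixes `data.size()` with `data[i]`); the class and method names are hypothetical. It evaluates the summary with a parallel Java stream as a stand-in for a cluster framework and compares it with the original loop on random inputs. Random testing is only evidence, not a proof.

```java
import java.util.Arrays;
import java.util.Random;

public class SummaryParallelSketch {
    // The original sequential fragment.
    static long sequential(int[] data) {
        long dataSqr = 0;
        for (int i = 0; i < data.length; i++) {
            dataSqr += (long) data[i] * data[i];
        }
        return dataSqr;
    }

    // The summary data_sqr = sum of data[i]^2 as map-then-reduce; because + is
    // associative and commutative, the parallel stream may split the work across threads.
    static long summary(int[] data) {
        return Arrays.stream(data).parallel().mapToLong(x -> (long) x * x).sum();
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        for (int trial = 0; trial < 1000; trial++) {
            int[] data = rnd.ints(rnd.nextInt(50), -1000, 1000).toArray();
            if (sequential(data) != summary(data)) {
                throw new AssertionError("mismatch on " + Arrays.toString(data));
            }
        }
        System.out.println("loop and summary agree on 1000 random inputs");
    }
}
```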
Code Summaries in Casper
$\forall v \in \mathit{outputVariables}.\; v \equiv f_{reduce}\big(v_0,\ \mathit{reduce}(\mathit{map}(\mathit{data}, f_{map}), f_{reduce})\big)$
where $f_{map}$ and $f_{reduce}$ are synthesized for each code fragment.
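A minimal Java sketch of evaluating a summary of this form, assuming each emitted key names the output variable it contributes to. The record `Emit`, the method `evalSummary`, and the key string `"data_sqr"` are hypothetical. The `data_sqr` loop from the earlier slides is expressed with $f_{map}(x)$ emitting `(data_sqr, x * x)`, $f_{reduce}$ as addition, and $v_0 = 0$.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class CodeSummarySketch {
    // One emit(key, value) produced by fmap; the key names an output variable (assumption).
    record Emit(String key, long value) {}

    // Evaluates v = freduce(v0, reduce(map(data, fmap), freduce)) for every output variable v.
    static Map<String, Long> evalSummary(int[] data,
                                         Map<String, Long> initial,   // v0 per output variable
                                         Function<Integer, List<Emit>> fmap,
                                         BinaryOperator<Long> freduce) {
        // map(data, fmap) then reduce(..., freduce): fold all values that share a key.
        Map<String, Long> reduced = new HashMap<>();
        for (int x : data) {
            for (Emit e : fmap.apply(x)) {
                reduced.merge(e.key(), e.value(), freduce);
            }
        }
        // Combine each reduced value with the variable's initial value v0.
        Map<String, Long> result = new HashMap<>(initial);
        reduced.forEach((key, r) -> result.merge(key, r, freduce));
        return result;
    }

    public static void main(String[] args) {
        int[] data = {3, 1, 4, 1, 5};
        // Candidate summary for the data_sqr loop: fmap emits (data_sqr, x * x), freduce is +.
        Map<String, Long> out = evalSummary(
                data,
                Map.of("data_sqr", 0L),
                x -> List.of(new Emit("data_sqr", (long) x * x)),
                Long::sum);
        System.out.println(out); // {data_sqr=52}
    }
}
```

`Map.merge` folds values in encounter order, so the sketch matches a distributed execution only when $f_{reduce}$ is associative (and commutative, if the framework may reorder values).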
Restricting Search Space
• Use Syntax-Guided Synthesis (SyGuS) to generate $f_{map}$ and $f_{reduce}$.
• Use a grammar to specify a set of candidate summaries.
• Grammar is dynamically generated for each code fragment.
Grammar Generation: fmap
• The body of $f_{map}$ is just a sequence of emit statements.
• Begin with as many emits as there are output variables.
• Incrementally add emit statements up to a user-defined bound.
Map → Map Map | Emit
Emit → emit(Key, Value); | if (Condition) emit(Key, Value);
Key → IntExp | StringExp | BoolExp | …
Value → IntExp | StringExp | BoolExp | …
Grammar Generation: fmap
• The key and value for each emit are generated using expression grammars.
Java Code Fragment:
data_sqr = 0;
for (int i = 0; i < data.size(); i++) {
    data_sqr += data[i] * data[i];
}
Integer Expression Grammar:
IntExp → IntExp + IntExp | IntExp * IntExp | data[IntExp] | IntVal
IntVal → data_sqr | i | literal
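To give a sense of the search space, the sketch below enumerates terms of this integer expression grammar up to a depth bound. It is a hypothetical illustration, not CASPER's enumerator: terms are plain strings, `literal` is restricted to `0` and `1`, and `enumerate` is our own name. The number of distinct terms grows from 40 at depth 1 to several thousand at depth 2, which illustrates why the grammar is generated per code fragment and kept small.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class IntExpGrammarSketch {
    // Enumerates terms of the grammar up to the given depth:
    //   IntExp -> IntExp + IntExp | IntExp * IntExp | data[IntExp] | IntVal
    //   IntVal -> data_sqr | i | literal      ("literal" limited to 0 and 1 here)
    static List<String> enumerate(int depth) {
        Set<String> terms = new LinkedHashSet<>(List.of("data_sqr", "i", "0", "1"));
        for (int d = 1; d <= depth; d++) {
            List<String> smaller = new ArrayList<>(terms); // all terms of depth < d
            for (String a : smaller) {
                terms.add("data[" + a + "]");
                for (String b : smaller) {
                    terms.add("(" + a + " + " + b + ")");
                    terms.add("(" + a + " * " + b + ")");
                }
            }
        }
        return new ArrayList<>(terms);
    }

    public static void main(String[] args) {
        for (int depth = 0; depth <= 2; depth++) {
            System.out.println("depth " + depth + ": " + enumerate(depth).size() + " terms");
        }
        // The value fmap must emit for the data_sqr loop lies in the depth-2 space.
        System.out.println(enumerate(2).contains("(data[i] * data[i])")); // true
    }
}
```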
Grammar Generation: freduce
• The body of $f_{reduce}$ implements a fold operation.
Fold Expression Grammar:
Reduce → int res = literal; for (val : values) res = FoldExp; emit(key, res);
FoldExp → FoldExp + FoldExp | FoldExp * FoldExp | IntVal
IntVal → res | val | key | literal
Java Code Fragment:
data_sqr = 0;
for (int i = 0; i < data.size(); i++) {
    data_sqr += data[i] * data[i];
}
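The sketch below instantiates the reducer template with a handful of fold expressions and keeps those that reproduce the loop's result on example inputs, assuming $f_{map}$ already emits `data[i] * data[i]` for each element. The record `Fold`, the candidate list, and limiting `literal` to 0 and 1 are our assumptions; the `key` terminal is omitted for brevity. This is plain filtering by examples, not SyGuS itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongBinaryOperator;

public class FoldSearchSketch {
    // One instance of the template: int res = init; for (val : values) res = step(res, val);
    record Fold(String text, long init, LongBinaryOperator step) {}

    // A few fold candidates; "literal" is limited to 0 and 1 and "key" is omitted (assumptions).
    static List<Fold> candidates() {
        List<Fold> out = new ArrayList<>();
        for (long lit : new long[] {0, 1}) {
            out.add(new Fold("res = " + lit + "; res = res + val", lit, (res, val) -> res + val));
            out.add(new Fold("res = " + lit + "; res = res * val", lit, (res, val) -> res * val));
            out.add(new Fold("res = " + lit + "; res = val", lit, (res, val) -> val));
        }
        return out;
    }

    static long runFold(Fold f, long[] values) {
        long res = f.init();
        for (long val : values) {
            res = f.step().applyAsLong(res, val);
        }
        return res;
    }

    // Keeps folds that reproduce the loop's data_sqr on every example.
    static List<String> consistent(int[][] examples) {
        List<String> kept = new ArrayList<>();
        for (Fold f : candidates()) {
            boolean matchesAll = true;
            for (int[] data : examples) {
                long expected = 0;                       // result of the original loop
                long[] emitted = new long[data.length];  // values emitted by fmap
                for (int i = 0; i < data.length; i++) {
                    emitted[i] = (long) data[i] * data[i];
                    expected += emitted[i];
                }
                if (runFold(f, emitted) != expected) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                kept.add(f.text());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[][] examples = { {}, {2}, {1, 2, 3}, {-4, 5} };
        System.out.println(consistent(examples)); // [res = 0; res = res + val]
    }
}
```

Note that `res = 0; res = val` agrees with the loop on `{}` and `{2}` and is only rejected by `{1, 2, 3}`, which is why a single example is rarely enough to pin down a candidate.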
Verifying Equivalence
• CASPER uses Hoare-style verification conditions.
• Verification conditions are derived from the weakest precondition for the post-condition (the code summary) to hold.
• Proving post-conditions for code fragments that contain loops requires loop invariants.
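For the running `data_sqr` example, writing $n$ for `data.size()` and $s$ for `data_sqr`, one natural loop invariant and the three Hoare-style verification conditions it must discharge are as follows. This is our own illustration of standard Hoare-logic reasoning; CASPER's generated invariants may be phrased differently (for example, in terms of the summary applied to a prefix of `data`).

```latex
\[
\begin{aligned}
\mathit{Inv}(i, s) &\;\equiv\; 0 \le i \le n \;\wedge\; s = \sum_{j=0}^{i-1} \mathit{data}[j]^2 \\[4pt]
\text{(initiation)} &\quad \mathit{Inv}(0, 0) \\
\text{(preservation)} &\quad \mathit{Inv}(i, s) \wedge i < n \;\Rightarrow\; \mathit{Inv}\big(i+1,\; s + \mathit{data}[i]^2\big) \\
\text{(post-condition)} &\quad \mathit{Inv}(i, s) \wedge \neg(i < n) \;\Rightarrow\; s = \sum_{j=0}^{n-1} \mathit{data}[j]^2
\end{aligned}
\]
```

The preservation condition has the same shape as the `InductiveStep` lemma on the Lemma Example slide: it assumes the invariant and `i < |data|` and must establish the invariant for `i + 1` with `data_sqr + (data[i] * data[i])`.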
Verifying Equivalence Pt. 2
data_sqr = 0;
for (int i = 0; i < data.size(); i++) {
    data_sqr += data[i] * data[i];
}
Formal Verification
• We have modelled the MapReduce library in Dafny.
• The generated summary is compiled down to Dafny code.
• Code annotations are automatically generated. These include:
  • Verification conditions
  • Proof lemmas
Lemma Example
lemma InductiveStep(data: seq<int>, i: int, data_sqr: int)
  requires invariant(data, i, data_sqr) && i < |data|
  ensures invariant(data, i + 1, data_sqr + (data[i] * data[i]))
{
  assert map(data, i + 1) == fmap(data, i) + map(data, i);
  assert freduce(fmap(data, i), 0) == data[i] * data[i];
  …
}
CASPER Architecture Diagram
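• Program Analyzer (Polyglot): reads the original source code and generates a grammar of candidate summaries.
• Candidate Solution Generator (SKETCH): searches the grammar for a candidate summary, seeded with random input examples.
• Bounded Model Checker: checks the candidate summary; if the check fails, a counter-example is returned to the generator.
• Theorem Prover (Dafny): attempts to verify a candidate summary that passed the bounded check; if verification fails, the candidate is rejected and the search continues.
• If no candidate in the current grammar succeeds, the search fails back to the Program Analyzer, which generates a larger grammar.
• Code Generator (Polyglot): translates the verified summary into Hadoop / Spark code.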
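The generator and bounded checker in this diagram interact in a counterexample-guided loop. The Java sketch below mimics that loop under simplifying assumptions: the candidate generator is an explicit list rather than SKETCH searching a grammar, the bounded model checker is replaced by random testing on small arrays, and the theorem-prover step is omitted. `Candidate`, `boundedCheck`, and `synthesize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.function.ToLongFunction;

public class CegisSketch {
    // A candidate summary: printable text plus an executable meaning.
    record Candidate(String text, ToLongFunction<int[]> eval) {}

    // The original sequential fragment acts as the specification.
    static long original(int[] data) {
        long dataSqr = 0;
        for (int i = 0; i < data.length; i++) {
            dataSqr += (long) data[i] * data[i];
        }
        return dataSqr;
    }

    static boolean agrees(Candidate c, int[] data) {
        return c.eval().applyAsLong(data) == original(data);
    }

    // Stand-in for the bounded model checker: tests small random arrays
    // (length 0-4, values -3..3) and returns the first counterexample found.
    static Optional<int[]> boundedCheck(Candidate c, Random rnd) {
        for (int trial = 0; trial < 500; trial++) {
            int[] data = rnd.ints(rnd.nextInt(5), -3, 4).toArray();
            if (!agrees(c, data)) {
                return Optional.of(data);
            }
        }
        return Optional.empty();
    }

    static Optional<Candidate> synthesize(List<Candidate> space, Random rnd) {
        List<int[]> counterexamples = new ArrayList<>();
        for (Candidate c : space) {
            // Generator: skip candidates already refuted by a stored counterexample.
            if (counterexamples.stream().anyMatch(ex -> !agrees(c, ex))) {
                continue;
            }
            Optional<int[]> cex = boundedCheck(c, rnd);
            if (cex.isEmpty()) {
                return Optional.of(c);          // next step would be the theorem prover
            }
            counterexamples.add(cex.get());     // feed the counterexample back
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Candidate> space = List.of(
                new Candidate("reduce(map(data, x -> x), +)",
                        d -> Arrays.stream(d).asLongStream().sum()),
                new Candidate("reduce(map(data, x -> 2 * x), +)",
                        d -> Arrays.stream(d).mapToLong(x -> 2L * x).sum()),
                new Candidate("reduce(map(data, x -> x * x), +)",
                        d -> Arrays.stream(d).mapToLong(x -> (long) x * x).sum()));
        Optional<Candidate> found = synthesize(space, new Random(7));
        System.out.println(found.map(Candidate::text).orElse("no candidate left"));
    }
}
```

Bounded checking like this can only refute candidates; a candidate that survives is merely plausible, which is why the diagram routes it to a theorem prover before code generation.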
Evaluation
• Compilation performance
• Run-time performance
• Five benchmarks:
- Summation
- Word Count
- String Search (Grep)
- Linear Regression
- 3D Histogram
Compilation Performance
| Benchmark | Program Analysis | Synthesis and BMC | # of Grammar Iterations | Formal Verification |
|---|---|---|---|---|
| Summation | < 1s | 13s | 1 | 2.8s |
| Word Count | < 1s | 44s | 1 | 3.4s |
| String Match | < 1s | 1406s | 2 | 3.3s |
| 3D Histogram | < 1s | 2355s | 2 | 4.2s |
| Linear Regression | < 1s | 1801s | 2 | 4.8s |
Runtime Performance
Benchmark: String Matching (Grep)
• Configuration:
  - 10-node cluster
  - 8 vCPUs, 15 GB memory
  - HDFS for data storage
  - Hadoop 2.7.2 and Spark 1.6.1
• Average speedup:
  - 6.1x on Spark
  - 3.3x on Hadoop
Demo!
Summary
Data → Data Collection Tool → Data Analytics Application (Spark), generated with CASPER
Web page: http://tinyurl.com/casper-homepage
Mailing list: http://tinyurl.com/casper-subscribe