Massive Semantic Web Data Compression with MapReduce
Massive Semantic Web data compression with MapReduce
Jacopo Urbani, Jason Maassen, Henri Bal
Vrije Universiteit, Amsterdam
HPDC (High Performance Distributed Computing) 2010

20 June 2014, SNU IDB Lab.
Lee, Inhoe
Outline
Introduction
Conventional Approach
MapReduce Data Compression
MapReduce Data Decompression
Evaluation
Conclusions
Introduction
Semantic Web
– an extension of the current World Wide Web
Information is represented as a set of statements; each statement consists of three different terms
– subject, predicate, and object
– e.g., <http://www.vu.nl> <rdf:type> <dbpedia:University>
The terms consist of long strings, so most Semantic Web applications compress the statements
– to save space and increase performance
The standard technique used to compress the data is dictionary encoding
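As a toy illustration of dictionary encoding (the function and the incremental IDs below are hypothetical, not the paper's implementation), each distinct term is mapped to a small integer, so a statement of three long strings becomes three numbers:

```python
def build_dictionary(statements):
    """Assign an incremental numerical ID to each distinct term
    and encode every (subject, predicate, object) statement as ID triples."""
    term_to_id = {}
    encoded = []
    for s, p, o in statements:
        triple = []
        for term in (s, p, o):
            if term not in term_to_id:
                term_to_id[term] = len(term_to_id)  # next free ID
            triple.append(term_to_id[term])
        encoded.append(tuple(triple))
    return term_to_id, encoded

dictionary, compressed = build_dictionary([
    ("<http://www.vu.nl>", "<rdf:type>", "<dbpedia:University>"),
])
```

Each long URI is stored once in the dictionary; every further occurrence costs only an integer.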
Motivation
Currently, the amount of Semantic Web data is steadily growing
Compressing many billions of statements becomes more and more time-consuming
A fast and scalable compression technique is therefore crucial
This work: a technique to compress and decompress Semantic Web statements using the MapReduce programming model
– allowed us to reason directly on the compressed statements, with a consequent increase of performance [1, 2]
Conventional Approach
Dictionary encoding
– Compress data
– Decompress data
MapReduce Data Compression
job 1: identifies the popular terms and assigns them a numerical ID
job 2: deconstructs the statements, builds the dictionary table, and replaces all terms with the corresponding numerical IDs
job 3: reads the numerical terms and reconstructs the statements in their compressed form
Job 1: caching of popular terms
Identify the most popular terms and assign them a numerical ID
– count the occurrences of the terms
– select the subset of the most popular ones
– randomly sample the input
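The steps above can be sketched as a single-machine stand-in for job 1 (the function name, sampling rate, and cutoff are hypothetical, not the paper's values):

```python
import random
from collections import Counter

def popular_terms(statements, sample_rate=0.1, top_k=100, seed=0):
    """Sketch of job 1: randomly sample the input, count term
    occurrences in the sample, and keep the most frequent terms
    as the 'popular' set to be cached."""
    rng = random.Random(seed)
    counts = Counter()
    for triple in statements:
        if rng.random() < sample_rate:  # random sample of the input
            counts.update(triple)       # count occurrences of the terms
    return [term for term, _ in counts.most_common(top_k)]

# With full sampling, the predicate shared by both statements
# is identified as the most popular term
top = popular_terms(
    [("<s1>", "<rdf:type>", "<o1>"), ("<s2>", "<rdf:type>", "<o2>")],
    sample_rate=1.0, top_k=1)
```

Sampling keeps the counting cheap: only a fraction of the input is scanned, which is enough to find the heavy hitters.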
Job 2: deconstruct statements
Deconstruct the statements and compress the terms with a numerical ID
Before the map phase starts, the popular terms are loaded into main memory
The map function reads the statements and assigns each term a numerical ID
– Since the map tasks are executed in parallel, we partition the numerical range of the IDs so that each task is allowed to assign only a specific range of numbers
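The ID-range partitioning can be sketched as follows (the size of the ID space and the range reserved for the popular-terms cache are assumptions, not the paper's values):

```python
def id_range(task_index, num_tasks, id_space=2**32, reserved=1000):
    """Sketch: split the numerical ID space so that parallel map tasks
    never assign the same ID. IDs below `reserved` are kept for the
    popular-terms cache shared by all tasks."""
    span = (id_space - reserved) // num_tasks
    start = reserved + task_index * span
    return start, start + span  # this task assigns IDs in [start, start + span)
```

Because the ranges are disjoint, tasks can assign fresh IDs without any coordination during the map phase.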
Job 3: reconstruct statements
Reads the previous job's output and reconstructs the statements using the numerical IDs
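A single-machine sketch of this grouping step (the record layout `(statement_id, position, term_id)` is an assumption about job 2's output, not the paper's exact format):

```python
from collections import defaultdict

def reconstruct(deconstructed):
    """Sketch of job 3: group the (statement_id, position, term_id)
    records emitted by job 2 back into compressed triples."""
    groups = defaultdict(dict)
    for stmt_id, position, term_id in deconstructed:  # "map" output
        groups[stmt_id][position] = term_id
    # "reduce": emit each statement as (subject_id, predicate_id, object_id)
    return {sid: (parts[0], parts[1], parts[2]) for sid, parts in groups.items()}

# Records may arrive in any order; grouping by statement ID restores the triple
triples = reconstruct([(7, 0, 20), (7, 2, 114), (7, 1, 21)])
```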
MapReduce Data Decompression
Join between the compressed statements and the dictionary table
job 1: identifies the popular terms
job 2: performs the join between the popular resources and the dictionary table
job 3: deconstructs the statements and decompresses the terms, performing a join on the input
job 4: reconstructs the statements in the original format
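The dictionary join at the heart of decompression can be sketched as a reduce-side join simulated on one machine (all names are hypothetical; the real jobs run the map and reduce phases in parallel over Hadoop):

```python
from collections import defaultdict

def dictionary_join(dictionary, term_ids):
    """Sketch of a reduce-side join: dictionary entries and compressed
    term IDs are grouped under the same key (the ID), and each group is
    'reduced' by pairing the ID with its original text."""
    groups = defaultdict(lambda: {"text": None, "refs": 0})
    for term_id, text in dictionary:  # map over the dictionary table
        groups[term_id]["text"] = text
    for term_id in term_ids:          # map over the compressed input
        groups[term_id]["refs"] += 1
    # reduce: emit the decompressed term once per reference
    out = []
    for group in groups.values():
        out.extend([group["text"]] * group["refs"])
    return out

decoded = dictionary_join(
    [(20, "www.cyworld.com"), (21, "www.snu.ac.kr")],
    [21, 20, 21])
```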
Job 1: identify popular terms
Job 2: join with dictionary table
Job 3: join with compressed input
(20, www.cyworld.com)
(21, www.snu.ac.kr)
…
(113, www.hotmail.com)
(114, mail)
Job 4: reconstruct statements
Evaluation
Environments
– 32 nodes of the DAS3 cluster to set up our Hadoop framework
Each node:
– two dual-core 2.4 GHz AMD Opteron CPUs
– 4 GB main memory
– 250 GB storage
Results
The throughput of the compression algorithm is higher for larger datasets than for smaller ones
– our technique is more efficient on larger inputs, where the computation is not dominated by the platform overhead
Decompression is slower than compression
Results
The beneficial effects of the popular-terms cache
Results
Scalability
– different input sizes
– varying the number of nodes
Conclusions
Proposed a technique to compress Semantic Web statements using the MapReduce programming model
Evaluated the performance by measuring the runtime
– more efficient for larger inputs
Tested the scalability
– the compression algorithm scales more efficiently
A major contribution to solving this crucial problem in the Semantic Web
References
[1] J. Urbani, S. Kotoulas, J. Maassen, F. van Harmelen, and H. Bal. OWL reasoning with MapReduce: calculating the closure of 100 billion triples. Under submission, 2010.
[2] J. Urbani, S. Kotoulas, E. Oren, and F. van Harmelen. Scalable distributed reasoning using MapReduce. In Proceedings of ISWC '09, 2009.
Outline
Introduction
Conventional Approach
MapReduce Data Compression
– Job 1: caching of popular terms
– Job 2: deconstruct statements
– Job 3: reconstruct statements
MapReduce Data Decompression
– Job 2: join with dictionary table
– Job 3: join with compressed input
Evaluation
– Runtime
– Scalability
Conclusions
Conventional Approach
Dictionary encoding
Input: ABABBABCABABBA
Output: 1 2 4 5 2 3 4 6 1
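This input/output pair is consistent with LZW-style dictionary compression whose dictionary is seeded with A=1, B=2, C=3 (the seeding is an assumption inferred from the slide, not stated in it); a minimal sketch under that assumption:

```python
def lzw_compress(data, alphabet="ABC"):
    """LZW-style dictionary encoding seeded with single characters
    (A=1, B=2, C=3). Emits the code of the longest known prefix and
    adds each newly seen sequence to the dictionary."""
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
    codes, w = [], ""
    for ch in data:
        if w + ch in dictionary:
            w += ch                      # extend the current match
        else:
            codes.append(dictionary[w])  # emit code for the longest match
            dictionary[w + ch] = len(dictionary) + 1  # learn new sequence
            w = ch
    if w:
        codes.append(dictionary[w])
    return codes

codes = lzw_compress("ABABBABCABABBA")
# codes == [1, 2, 4, 5, 2, 3, 4, 6, 1], i.e. "124523461" as on the slide
```

The encoder learns longer and longer sequences (AB=4, BA=5, ABB=6, …), which is why repeated substrings collapse to single codes.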