streaming oodt - events.static.linuxfound.org · combining apache spark's power with apache...
TRANSCRIPT
![Page 1: Streaming OODT - events.static.linuxfound.org · Combining Apache Spark's Power with Apache OODT" Michael Starch – NASA Jet Propulsion Laboratory! Agenda" – Data and Processing!](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/1.jpg)
Streaming OODT:
Combining Apache Spark's Power with Apache OODT
Michael Starch, NASA Jet Propulsion Laboratory
![Page 2: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/2.jpg)
Agenda
- Data and Processing
- Data Systems
- Apache OODT
- Apache Spark
- Streaming OODT
- Examples
- Where can I get the code?
- Acknowledgements
- Questions
![Page 3: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/3.jpg)
Data and Processing
![Page 4: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/4.jpg)
Data and Processing
Figure 1: What is data processing?
[Diagram symbols: a∑x + x, dx/dt, ∫]
[Diagram symbols: a∑x + y, dx/dt, ∫]
Figure 2: More complex data processing
![Page 5: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/5.jpg)
Parallelization
Figure 3: Parallelizing data processing
![Page 6: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/6.jpg)
Big Data
Figure 4: Data is becoming very large
Figure 5: Parallelizable big data
![Page 7: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/7.jpg)
Data Systems
![Page 8: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/8.jpg)
Archival and Search
Figure 6: Archiving and searching in data sets
![Page 9: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/9.jpg)
Processing and Resource Management
Figure 7: Processing and resource management
![Page 10: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/10.jpg)
Data Ingest and Delivery
[Diagram symbols: a∑x + x, dx/dt, ∫]
Figure 8: Data ingestion and delivery
![Page 11: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/11.jpg)
Apache OODT
![Page 12: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/12.jpg)
Apache OODT
Figure 9: Base Object-Oriented Data Technology (OODT)
![Page 13: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/13.jpg)
Archival and Search
Figure 10: OODT metadata-based search
![Page 14: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/14.jpg)
Workflow Management
Figure 11: OODT workflow management
![Page 15: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/15.jpg)
Limitations
Figure 12: Simplified OODT architecture
![Page 16: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/16.jpg)
Apache Spark
![Page 17: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/17.jpg)
Map Reduce Processing
Figure 13: Map Reduce Processing
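The figure itself is not in this transcript, so as a rough illustration of the map-reduce pattern the slide names, here is a minimal sketch in plain Java. It swaps in the JDK's parallel streams for a cluster framework; the class name `MapReduceSketch`, the method `totalLength`, and the sample records are hypothetical and not taken from the talk.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of map-reduce using JDK parallel streams (not Spark, not OODT).
final class MapReduceSketch {

    // Map step: turn each record into a partial value (here, its length).
    // Reduce step: fold the partial values together with an associative operator.
    static int totalLength(List<String> records) {
        return records.parallelStream()
                .map(String::length)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("level", "spark", "oodt");
        System.out.println(totalLength(records));
    }
}
```

Because the reduce operator is associative and has an identity value, the partial results can be combined in any grouping, which is what lets the work be split across workers.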
![Page 18: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/18.jpg)
Berkeley Data Analytics Stack
Source: https://amplab.cs.berkeley.edu/software/
Figure 14: Berkeley Data Analytics Stack components
![Page 19: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/19.jpg)
Apache Spark
Figure 15: Resilient Distributed Datasets
Figure 16: Apache Spark libraries
Source: https://spark.apache.org/images/spark-stack.png
![Page 20: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/20.jpg)
Streaming OODT
![Page 21: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/21.jpg)
Streaming OODT Design
Figure 17: Design and implementation of Streaming OODT
![Page 22: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/22.jpg)
Modified Architecture
Figure 18: Improved OODT architecture for big-data processing
![Page 23: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/23.jpg)
Examples
![Page 24: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/24.jpg)
Example - Palindromes
Figure 19: Palindrome detection algorithm
![Page 25: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/25.jpg)
Example - Code

    // Example detection algorithm
    ...
    public static boolean isPalindrome(String line) {
        // Ignore whitespace and case before comparing with the reversed string
        line = line.replaceAll("\\s", "").toLowerCase();
        return line.equals(new StringBuilder(line).reverse().toString());
    }
    ...
    // Spark wrapper class for detection algorithm
    // (Function is org.apache.spark.api.java.function.Function)
    static class FilterPalindrome implements Function<String, Boolean> {
        public Boolean call(String s) {
            return isPalindrome(s);
        }
    }
    ...

Sample 1: Palindrome detection shared code
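To see how the detection routine in Sample 1 behaves on a few inputs, here is a small self-contained sketch. The method body follows Sample 1; the class name `PalindromeDemo` and the sample strings are hypothetical and chosen only for illustration.

```java
// Hypothetical stand-alone harness around the isPalindrome logic shown in Sample 1.
final class PalindromeDemo {

    // Same check as Sample 1: drop whitespace, lower-case, compare with the reverse.
    static boolean isPalindrome(String line) {
        String normalized = line.replaceAll("\\s", "").toLowerCase();
        return normalized.equals(new StringBuilder(normalized).reverse().toString());
    }

    public static void main(String[] args) {
        String[] samples = {"Never odd or even", "racecar", "streaming", ""};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isPalindrome(s));
        }
    }
}
```

Two properties of this check are worth knowing when reading the counts later in the talk: an empty or whitespace-only line is reported as a palindrome, and punctuation is not stripped, so a phrase containing commas or colons will usually not match.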
![Page 26: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/26.jpg)
Example - Data Set

    clowring infratrochanteric unlimitable overstaffing ...
    nonsubstantiality incongeniality ghbor gargil semiconventionality betokens clinodome ...
    pulviniform actualize cousins moocha Mosaism craals midstout desightment Boehmenism LP ravelins underskirt CSB cossas xen- nonlucidness unvagrantness togata noncaptiousness dromioid lambie undergarments salvages...
    LAP revealableness outsnore headstalls metallography outgazed unstintingly boongary provinces trans-Mongolian...
    ...

Sample 2: Palindrome file sample
10,805,887,353 Bytes (11 GB)
46284 palindromes
![Page 27: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/27.jpg)
Example - Shootout
Spark: 429.774s, 1 CPU
Spark: 16.72s, ~92 CPUs

    // Sample Java code
    ...
    String file = input.getValue("file");
    br = new BufferedReader(new FileReader(file));
    String line;
    while ((line = br.readLine()) != null) {
        if (PalindromeUtils.isPalindrome(line))
            count++;
    }
    ...

Sample 3: Naïve file processing code

    // Sample Java code
    ...
    JavaRDD<String> rdd = sc.textFile(input.getValue("file"));
    JavaRDD<String> filtered = rdd.filter(new PalindromeUtils.FilterPalindrome());
    long count = filtered.count();
    ...

Sample 4: Spark file processing code
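The two timings on the shootout slide can be turned into a speedup and a per-CPU efficiency with a little arithmetic. The sketch below assumes exactly 92 CPUs, although the slide only says "~92", so the efficiency figure is approximate; the class and method names are hypothetical.

```java
// Hypothetical helper for the shootout arithmetic; timings are the ones reported on the slide.
final class ShootoutSpeedup {

    // Speedup: how many times faster the parallel run finished than the single-CPU run.
    static double speedup(double slowSeconds, double fastSeconds) {
        return slowSeconds / fastSeconds;
    }

    // Parallel efficiency: speedup divided by the number of CPUs used.
    static double efficiency(double speedup, int cpus) {
        return speedup / cpus;
    }

    public static void main(String[] args) {
        double s = speedup(429.774, 16.72);
        // 92 is an assumption: the slide reports "~92 CPUs".
        double e = efficiency(s, 92);
        System.out.println("speedup: " + s);
        System.out.println("efficiency: " + e);
    }
}
```

With these numbers the speedup comes out at roughly 25.7x, or about 0.28 of ideal linear scaling on 92 CPUs.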
![Page 28: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/28.jpg)
Example - Streaming

    JavaReceiverInputDStream<String> stream = ssc.socketTextStream(
            input.getValue("host"), Integer.parseInt(input.getValue("port")));
    JavaDStream<String> filtered = stream.filter(new PalindromeUtils.FilterPalindrome());
    final JavaDStream<Long> count = filtered.count();
    /* Begin: output code */
    count.foreachRDD(new Function<JavaRDD<Long>, Void>() {
        public Void call(JavaRDD<Long> jrdd) throws Exception {
            synchronized (output) {
                // Bring this batch's count back to the driver (JavaRDD.collect() returns a List)
                List<Long> collected = jrdd.collect();
                for (Long item : collected)
                    output.println("Found " + item.longValue() + " palindromes.");
            }
            return null;
        }
    });
    /* End: output code */
    ssc.start();
    ssc.awaitTermination();

Sample 5: Streaming palindromes code
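The streaming sample filters and counts each batch of lines as it arrives. As a rough, Spark-free illustration of that per-batch behaviour, the sketch below applies the same filter-then-count step to a list of in-memory batches. The class `MicroBatchSketch`, its methods, and the sample batches are hypothetical; real Spark Streaming forms batches from the socket by time interval, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory stand-in for per-batch filter + count; not the Spark Streaming API.
final class MicroBatchSketch {

    // Same check as Sample 1.
    static boolean isPalindrome(String line) {
        String normalized = line.replaceAll("\\s", "").toLowerCase();
        return normalized.equals(new StringBuilder(normalized).reverse().toString());
    }

    // For each batch, keep only palindromes and count them, yielding one count per batch.
    static List<Long> countPerBatch(List<List<String>> batches) {
        List<Long> counts = new ArrayList<>();
        for (List<String> batch : batches) {
            counts.add(batch.stream().filter(MicroBatchSketch::isPalindrome).count());
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("level", "spark", "noon"),
                Arrays.asList("oodt"),
                Collections.<String>emptyList());
        for (Long c : countPerBatch(batches)) {
            System.out.println("Found " + c + " palindromes.");
        }
    }
}
```

The output format mirrors the "Found N palindromes." line written by Sample 5, one line per batch.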
![Page 29: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/29.jpg)
Example - Streaming Configuration

    ...
    <instanceClass name="org.apache.oodt.cas.resource.spark.examples.StreamingPalindromeExample" />
    <inputClass name="org.apache.oodt.cas.resource.structs.NameValueJobInput">
      <properties>
        <property name="host" value="host" />
        <property name="port" value="7007" />
        <property name="time" value="60000" />
        <property name="output" value="/home/user/files/output-streaming-palindrome.txt" />
      </properties>
    </inputClass>
    <queue>quick</queue>
    <load>1</load>
    ...

Sample 6: Streaming palindromes configuration
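The configuration in Sample 6 passes the job its inputs as name/value properties, which the streaming code reads back with calls such as `input.getValue("port")`. The sketch below shows one way such properties could be read into a map using only the JDK's DOM parser. It is not OODT's own loader: the class `JobPropertiesSketch` and the method `readProperties` are hypothetical, and the XML fragment is a trimmed copy of Sample 6.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical reader for <property name="..." value="..."/> elements; not part of OODT.
final class JobPropertiesSketch {

    // Parse the XML and collect every <property> element's name/value pair in document order.
    static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("property");
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element e = (Element) nodes.item(i);
            props.put(e.getAttribute("name"), e.getAttribute("value"));
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<inputClass name=\"org.apache.oodt.cas.resource.structs.NameValueJobInput\">"
                + "<properties>"
                + "<property name=\"host\" value=\"host\" />"
                + "<property name=\"port\" value=\"7007\" />"
                + "<property name=\"time\" value=\"60000\" />"
                + "</properties></inputClass>";
        Map<String, String> props = readProperties(xml);
        // Values arrive as strings, so numeric settings such as the port need parsing.
        int port = Integer.parseInt(props.get("port"));
        System.out.println(props.get("host") + ":" + port);
    }
}
```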
![Page 30: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/30.jpg)
Example - Streaming In Action
![Page 31: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/31.jpg)
Where can I get the code?
It's Open Source! Jump on in!
Apache OODT SVN: https://svn.apache.org/repos/asf/oodt/trunk/
Mailing List: [email protected]
![Page 32: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/32.jpg)
Acknowledgments
NASA Jet Propulsion Laboratory
Research & Technology Development
“Archiving, Processing and Dissemination for the Big Data Era”

Apache Software Foundation
Apache OODT Project
![Page 33: Streaming OODT](https://reader033.vdocument.in/reader033/viewer/2022042415/5f30b73d81479f05a816c05e/html5/thumbnails/33.jpg)
Questions?
你有沒有問題? (Chinese: Do you have any questions?)
Haben Sie Fragen? (German: Do you have questions?)
¿Tienen preguntas? (Spanish: Do you have questions?)
Avez-vous des questions? (French: Do you have questions?)