![Page 1: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/1.jpg)
Having Your Cake and Eating It Too
With Apache OODT and Apache Solr
Andrew F. Hart
Paul M. Ramirez
![Page 2: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/2.jpg)
About Myself…
• Software Engineer
  – NASA Jet Propulsion Laboratory
  – “Data Management”
• Committer:
  – OODT, SIS, Gora, Streams (Incubating)
• Mentor: Streams (Incubating)
![Page 3: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/3.jpg)
What We’ll Cover
• Overview of OODT & Solr Projects
• Strategies for Combining OODT and Solr
• Detailed Deployment/Config. Example
• Where to Learn More & Participate
![Page 4: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/4.jpg)
Apache OODT
• Object Oriented Data Technology
• Origin in NASA mission data systems
• Components for
  – Information integration
  – Data cataloging and archiving
  – Configurable workflow processing
![Page 5: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/5.jpg)
Apache OODT
• OODT @ Apache
  – Incubation: 2010, Graduation: 2011
  – 29 Committers
  – Latest Release: 0.5 (Dec. 26, 2012)
![Page 6: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/6.jpg)
Apache OODT
• Karoo Array Telescope (KAT-7)
![Page 7: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/7.jpg)
Apache OODT
• Virtual Pediatric Intensive Care Unit
![Page 8: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/8.jpg)
Apache OODT
• Regional Climate Model Evaluation System
![Page 9: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/9.jpg)
Apache OODT
• Commonalities between systems
  – Lots of data
  – Defined processing steps / algorithms
• Archives important (… search important)
![Page 10: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/10.jpg)
Apache OODT
• Strengths of OODT for the above use cases
  – Loosely coupled components
  – Standard protocols, well-defined interfaces
  – Highly configurable
  – Vetted, reliable code
![Page 11: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/11.jpg)
Apache Solr
• Search + Web Services
  – Powerful features
  – Flexible formats
  – Highly configurable
![Page 12: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/12.jpg)
Apache Solr
• The White House
![Page 13: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/13.jpg)
Apache Solr
• Netflix
![Page 14: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/14.jpg)
Apache Solr
• NASA Planetary Data System
![Page 15: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/15.jpg)
OODT & Solr
• Why use these projects together?
• Archives often need search capability
• Similarities / Compatibilities
  – XML-based configuration
  – Environment (Java, Tomcat)
![Page 16: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/16.jpg)
Example Integration
“Standard” Data Archive Pipeline
![Page 17: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/17.jpg)
Example Integration
“Standard” Data Archive Pipeline + Search
![Page 18: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/18.jpg)
OODT Products
• Typically 1-1 with Files
• Each uniquely identifiable (GUID)
• Support for higher-level “ProductType”
  – A way to define collections
![Page 19: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/19.jpg)
OODT Metadata
• Annotations for products
• Key:{Val|Multival}
• Common across all OODT components
• Two general classes:
  – System
  – User
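
The Key:{Val|Multival} shape can be pictured as a map from a key to one or more values. The snippet below is only an illustrative sketch of that shape in Python. It is not the OODT Java `Metadata` class, and the `Keyword` key and its values are made up for the example.

```python
from collections import defaultdict


class Metadata:
    """Toy key -> list-of-values container (illustration only, not the OODT API)."""

    def __init__(self):
        self._values = defaultdict(list)

    def add(self, key, value):
        # Adding to an existing key appends, which makes the key multi-valued.
        self._values[key].append(str(value))

    def get(self, key):
        # Return the first value, or None when the key is absent.
        values = self._values.get(key)
        return values[0] if values else None

    def get_all(self, key):
        # Return a copy of every value stored under the key.
        return list(self._values.get(key, []))


met = Metadata()
met.add("ProductType", "urn:some:ProductType")  # single-valued key
met.add("Keyword", "climate")                   # hypothetical multi-valued key
met.add("Keyword", "model")
```

Treating every key as a list keeps single-valued and multi-valued metadata on one code path, which simplifies any later mapping step.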
![Page 20: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/20.jpg)
OODT Metadata
• System Metadata
  – Added automatically by OODT Components
  – Used to track state
  – Used to encode relationships between data
![Page 21: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/21.jpg)
OODT Metadata
• User Metadata
  – Specified as “policy”
  – Can be product-level or productType-level
  – Used to extract & persist information from files as they are ingested (become products)
![Page 22: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/22.jpg)
OODT Metadata
• Metadata (Policy) Example (external)
![Page 23: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/23.jpg)
Solr Schema
• XML document
• Define what will be indexed (“Fields”)
• Provide high-level context hints
  – Data type, behavior, pre-processing
• Extremely flexible, extensible
![Page 24: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/24.jpg)
Solr Schema
• Solr Schema Example (external)
![Page 25: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/25.jpg)
Making the Connection
• SolrIndexer Tool
  – Part of the File Manager component tools
  – Map OODT Metadata to Solr Fields
  – Create Solr documents from OODT products
  – Note: only talking about metadata
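
The mapping step can be sketched as a small function that turns one product's metadata into a Solr-style document. This is a conceptual illustration in Python, not the SolrIndexer implementation. The field names in `FIELD_MAP`, the metadata keys, and the sample values are all assumptions made for the example.

```python
def to_solr_document(product_id, metadata, field_map):
    """Build a Solr-style document (a plain dict) from product metadata.

    metadata:  dict of metadata key -> list of values
    field_map: dict of metadata key -> Solr field name (hypothetical mapping)
    """
    doc = {"id": product_id}
    for met_key, solr_field in field_map.items():
        values = metadata.get(met_key, [])
        if not values:
            continue  # keys missing from this product are skipped
        # A single value stays a scalar; several values become a list.
        doc[solr_field] = values[0] if len(values) == 1 else list(values)
    return doc


# Hypothetical mapping and metadata, for illustration only.
FIELD_MAP = {"ProductName": "product_name", "Keyword": "keyword"}
metadata = {
    "ProductName": ["granule_001.nc"],
    "Keyword": ["climate", "model"],
    "FileLocation": ["/archive/data"],  # not in FIELD_MAP, so not indexed
}
doc = to_solr_document("19bcb4b8-7999-11e1-b581-8b771498975d", metadata, FIELD_MAP)
```

Only metadata listed in the mapping reaches the document, and the product file's contents are never read. This matches the note that the indexer deals with metadata only.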
![Page 26: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/26.jpg)
SolrIndexer Tool
• Package: org.apache.oodt.cas.filemgr.tools
• Available since the 0.4 release
• Release 0.5+ recommended, as it adds stability improvements
• Several modes of operation
![Page 27: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/27.jpg)
SolrIndexer Tool
![Page 28: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/28.jpg)
SolrIndexer Tool
• Invocation Examples: Ingest all products from the specified File Manager instance

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --all \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
![Page 29: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/29.jpg)
SolrIndexer Tool
• Invocation Examples: Ingest all products from the specified ProductType(s)

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --types urn:some:ProductType \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
![Page 30: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/30.jpg)
SolrIndexer Tool
• Invocation Examples: Ingest a single product by its unique product id

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --product 19bcb4b8-7999-11e1-b581-8b771498975d \
      [--delete] \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr
![Page 31: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/31.jpg)
SolrIndexer Tool
• Invocation Examples: Force optimization of the Solr index

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --optimize \
      --solrUrl http://localhost:8080/solr
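
For scripted use, the invocation examples above can be assembled by a small helper. The following is a minimal sketch in Python that only builds the argument list. The function name and its parameters are hypothetical, and it uses only the flags and placeholder paths shown on the slides. It accepts a single product type because the slides do not show how several types are passed.

```python
def solr_indexer_command(solr_url, fm_url=None, *, all_products=False,
                         product_type=None, product_id=None, delete=False,
                         optimize=False,
                         config="/path/to/indexer.properties",
                         lib_dir="/path/to/cas/filemgr/lib/"):
    """Return a SolrIndexer command line as an argument list (hypothetical helper)."""
    modes = [all_products, product_type is not None,
             product_id is not None, optimize]
    if sum(modes) != 1:
        raise ValueError("choose exactly one of: all_products, product_type, "
                         "product_id, optimize")
    if not optimize and fm_url is None:
        raise ValueError("fm_url is required unless optimizing")

    cmd = [
        "java",
        f"-DSOLR_INDEXER_CONFIG={config}",
        f"-Djava.ext.dirs={lib_dir}",
        "org.apache.oodt.cas.filemgr.tools.SolrIndexer",
    ]
    if all_products:
        cmd.append("--all")
    elif product_type is not None:
        cmd += ["--types", product_type]
    elif product_id is not None:
        cmd += ["--product", product_id]
        if delete:
            # The slides show --delete only alongside --product.
            cmd.append("--delete")
    else:
        cmd.append("--optimize")

    if not optimize:
        cmd += ["--fmUrl", fm_url]
    cmd += ["--solrUrl", solr_url]
    return cmd


cmd = solr_indexer_command("http://localhost:8080/solr",
                           "http://localhost:9000", all_products=True)
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`. Building a list instead of a shell string avoids quoting problems with paths and URLs.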
![Page 32: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/32.jpg)
indexer.properties
• Configuration file for the SolrIndexer
• Specify mapping between OODT product metadata and Solr fields
• Additional “pre-processing” features
![Page 33: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/33.jpg)
indexer.properties
• Example indexer.properties file (external)
![Page 34: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/34.jpg)
Use Case I
• Building a searchable data archive
• “Long-term” / “Lights-out” archive
• Products & metadata immutable
• Many NASA mission data systems use this model
• Want to make it easily searchable
![Page 35: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/35.jpg)
Use Case I
“Standard” Data Archive Pipeline + Search
![Page 36: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/36.jpg)
Use Case II
• Building an interactively editable, searchable data archive
• Data and metadata mutable
• Want to dynamically select product(s) to edit based on metadata
![Page 37: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/37.jpg)
Use Case II
Interactively Editable Data Archive Pipeline + Search
![Page 38: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/38.jpg)
Use Case II
Interactively Editable Data Archive Pipeline + Search
Solr catalog out of sync!
![Page 39: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/39.jpg)
Synchronization
• Two ways (at least) to solve this:
  A. Modify the OODT Curator Services
  B. Treat OODT Curator Services as a “black box” and write a “wrapper” service to invoke Curator Services AND update Solr (via a scripted call to SolrIndexer, for example)
![Page 40: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/40.jpg)
Modify Curator Services
• Services implemented in JAX-RS
• /curator/src/main/java/org/apache/oodt/cas/curation/service
• [curator_url]/services/metadata/update
• Options:
  – Utilize Solr Java API
  – Wrap call to OODT SolrIndexer tool
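
To make the "update Solr alongside the metadata change" idea concrete, the sketch below builds the HTTP request that would push a refreshed document to Solr. It swaps in Solr's JSON update handler over plain HTTP for the Solr Java API named on the slide, and it is written in Python for brevity. The `/update` path with `commit=true` is an assumption about the deployment (older Solr versions may expose the JSON handler at a different path), and the `event_tag` field is invented for the example. The request is built but not sent.

```python
import json
import urllib.request


def build_solr_update_request(solr_url, doc, commit=True):
    """Build (but do not send) a POST that re-indexes one full document in Solr."""
    url = solr_url.rstrip("/") + "/update"
    if commit:
        url += "?commit=true"  # make the change visible to searches right away
    body = json.dumps([doc]).encode("utf-8")  # the handler accepts a list of documents
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document: the product id plus an edited metadata field.
req = build_solr_update_request(
    "http://localhost:8080/solr",
    {"id": "19bcb4b8-7999-11e1-b581-8b771498975d", "event_tag": "flare"},
)
```

Sending it would be a matter of `urllib.request.urlopen(req)`. Posting a document with an existing `id` replaces the stored document, so the body should carry every field that needs to stay indexed.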
![Page 41: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/41.jpg)
Use Case II-A
Modified Curator Services to Simultaneously Update Solr
![Page 42: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/42.jpg)
Example
• Interactive event tagging
![Page 43: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/43.jpg)
Wrap Curator Services
• Curator Service/API is “black box”
• Develop custom service that:
  – Issues POST request to Curator service
  – Updates Solr index via, e.g.:
    • Utilize Solr Java API
    • Wrap call to OODT SolrIndexer tool
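
The wrapper's control flow can be sketched independently of how each call is made. In the Python sketch below, the two callables stand in for "POST to the Curator service" and "update Solr" (for instance a scripted SolrIndexer call for one product). Their names, signatures, and the sample metadata are hypothetical, since the slides do not specify the Curator request parameters.

```python
def update_and_reindex(product_id, metadata, post_to_curator, reindex_product):
    """Update metadata through the Curator service, then refresh the Solr index.

    post_to_curator(product_id, metadata) -> HTTP status code   (hypothetical)
    reindex_product(product_id)           -> None               (hypothetical)
    """
    status = post_to_curator(product_id, metadata)
    if not 200 <= status < 300:
        # The catalog was not updated, so leave Solr untouched.
        return False
    reindex_product(product_id)
    return True


# Stub callables that record the order of calls, for demonstration.
calls = []


def fake_curator(product_id, metadata):
    calls.append(("curator", product_id))
    return 200


def fake_reindex(product_id):
    calls.append(("solr", product_id))


ok = update_and_reindex("19bcb4b8-7999-11e1-b581-8b771498975d",
                        {"Keyword": ["flare"]}, fake_curator, fake_reindex)
```

Updating the catalog first and Solr second means a failed Curator call never leaves the index ahead of the catalog. A failure in the second step still leaves Solr stale until the product is re-indexed.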
![Page 44: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/44.jpg)
Use Case II-B
Wrapping OODT Curation Services with Custom UI & Services
![Page 45: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/45.jpg)
Example
![Page 46: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/46.jpg)
Lessons
• Solr complements OODT File Manager
• RESTful interfaces (Solr + OODT Curator) allow for great flexibility in designing services and UI
• “Best” approach depends on situation
![Page 47: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/47.jpg)
Next Steps
• Develop “SolrCatalog” for OODT File Manager?
  – Pros: Reduction in “moving parts”
  – Cons: Restrictive?
• Implement Use Case II-A as optional mode for Curator web service layer
![Page 48: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/48.jpg)
Learning More
• Solr
  – http://lucene.apache.org/solr
  – [email protected]
• OODT
  – http://oodt.apache.org
  – https://cwiki.apache.org/confluence/display/OODT/Home
  – [email protected]
![Page 49: Having Your Cake and Eating It Too](https://reader035.vdocument.in/reader035/viewer/2022062314/568147ac550346895db4e862/html5/thumbnails/49.jpg)
Thanks!
• Questions?