Having Your Cake and Eating It Too With Apache OODT and Apache Solr Andrew F. Hart Paul M. Ramirez


Post on 18-Dec-2015


Having Your Cake and Eating It Too

With Apache OODT and Apache Solr

Andrew F. Hart

Paul M. Ramirez

About Myself…
• Software Engineer
  – NASA Jet Propulsion Laboratory
  – “Data Management”
• Committer:
  – OODT, SIS, Gora, Streams (Incubating)
• Mentor: Streams (Incubating)

What We’ll Cover
• Overview of OODT & Solr Projects
• Strategies for Combining OODT and Solr
• Detailed Deployment/Config. Example
• Where to Learn More & Participate

Apache OODT
• Object Oriented Data Technology
• Origin in NASA mission data systems
• Components for
  – Information integration
  – Data cataloging and archiving
  – Configurable workflow processing

Apache OODT
• OODT @ Apache
  – Incubation: 2010, Graduation: 2011
  – 29 Committers
  – Latest Release: 0.5 (Dec. 26, 2012)

Apache OODT
• Karoo Array Telescope (KAT-7)

Apache OODT
• Virtual Pediatric Intensive Care Unit

Apache OODT
• Regional Climate Model Evaluation System

Apache OODT
• Commonalities between systems
  – Lots of data
  – Defined processing steps / algorithms
• Archives important (… search important)

Apache OODT
• Strengths of OODT for the above use cases
  – Loosely coupled components
  – Standard protocols, well-defined interfaces
  – Highly configurable
  – Vetted, reliable code

Apache Solr
• Search + Web Services
  – Powerful features
  – Flexible formats
  – Highly configurable

Apache Solr
• The White House

Apache Solr
• Netflix

Apache Solr
• NASA Planetary Data System

OODT & Solr
• Why use these projects together?
• Archives often need search capability
• Similarities / Compatibilities
  – XML-based configuration
  – Environment (Java, Tomcat)

Example Integration
“Standard” Data Archive Pipeline

Example Integration
“Standard” Data Archive Pipeline + Search

OODT Products
• Typically 1-1 with Files
• Each uniquely identifiable (GUID)
• Support for higher-level “ProductType”
  – A way to define collections

OODT Metadata
• Annotations for products
• Key:{Val|Multival}
• Common across all OODT components
• Two general classes:
  – System
  – User
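
The key-to-values shape on this slide can be pictured with a small sketch. It is written in Python for brevity and is not OODT's own metadata class; the key names, values, and the split into separate `system` and `user` dictionaries are illustrative assumptions.

```python
from typing import Dict, List

# Illustrative only: a metadata record modeled as key -> list of values,
# so single-valued and multi-valued keys share one shape.
Metadata = Dict[str, List[str]]


def merge_metadata(system: Metadata, user: Metadata) -> Metadata:
    """Combine system and user metadata into one record.

    Values for a key present in both are concatenated, system values first.
    """
    merged: Metadata = {key: list(values) for key, values in system.items()}
    for key, values in user.items():
        merged.setdefault(key, []).extend(values)
    return merged


# Hypothetical keys and values, chosen only for the example.
system_md = {"ProductId": ["19bcb4b8-7999-11e1-b581-8b771498975d"]}
user_md = {"Instrument": ["camera-a"], "Keyword": ["ice", "ocean"]}
record = merge_metadata(system_md, user_md)
```

Storing every value as a list keeps the "Val or Multival" distinction out of the calling code: a single-valued key is simply a list of length one.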

OODT Metadata
• System Metadata
  – Added automatically by OODT components
  – Used to track state
  – Used to encode relationships between data

OODT Metadata
• User Metadata
  – Specified as “policy”
  – Can be product-level or productType-level
  – Used to extract & persist information from files as they are ingested (become products)

OODT Metadata
• Metadata (Policy) Example (external)

Solr Schema
• XML document
• Define what will be indexed (“Fields”)
• Provide high-level context hints
  – Data type, behavior, pre-processing
• Extremely flexible, extensible

Solr Schema
• Solr Schema Example (external)

Making the Connection
• SolrIndexer Tool
  – Part of the File Manager component tools
  – Map OODT Metadata to Solr Fields
  – Create Solr documents from OODT products
  – Note: only talking about metadata
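
The mapping step described above can be sketched roughly as follows. This is not the SolrIndexer's actual logic or configuration format; the function name `to_solr_document`, the metadata keys, and the Solr field names are all hypothetical and only illustrate turning a product's metadata into a flat Solr document.

```python
from typing import Dict, List, Union

SolrValue = Union[str, List[str]]


def to_solr_document(
    metadata: Dict[str, List[str]],
    field_map: Dict[str, str],
) -> Dict[str, SolrValue]:
    """Build a Solr-style document from product metadata.

    Only keys listed in field_map are copied. A key with one value becomes
    a scalar field; a key with several values becomes a multi-valued field.
    Keys that are missing or empty are skipped.
    """
    doc: Dict[str, SolrValue] = {}
    for oodt_key, solr_field in field_map.items():
        values = metadata.get(oodt_key)
        if not values:
            continue
        doc[solr_field] = values[0] if len(values) == 1 else list(values)
    return doc


# Hypothetical metadata and mapping, for illustration only.
product_md = {
    "ProductId": ["19bcb4b8-7999-11e1-b581-8b771498975d"],
    "Keyword": ["ice", "ocean"],
    "Internal": ["not indexed"],
}
mapping = {"ProductId": "id", "Keyword": "keywords", "Missing": "unused"}
solr_doc = to_solr_document(product_md, mapping)
```

Note that only metadata is copied; the product files themselves are never read, which matches the slide's remark that the connection concerns metadata alone.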

SolrIndexer Tool
• org.apache.oodt.cas.filemgr.tools
• Available since 0.4 Release
• Recommend using 0.5+, as some stability improvements were added
• Several modes of operation

SolrIndexer Tool

SolrIndexer Tool
• Invocation Examples: Ingest all products from the specified File Manager instance

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --all \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr

SolrIndexer Tool
• Invocation Examples: Ingest all products from the specified ProductType(s)

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --types urn:some:ProductType \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr

SolrIndexer Tool
• Invocation Examples: Ingest a single product by its unique product id

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --product 19bcb4b8-7999-11e1-b581-8b771498975d \
      [--delete] \
      --fmUrl http://localhost:9000 \
      --solrUrl http://localhost:8080/solr

SolrIndexer Tool
• Invocation Examples: Force optimization of the Solr index

    java -DSOLR_INDEXER_CONFIG=/path/to/indexer.properties \
      -Djava.ext.dirs=/path/to/cas/filemgr/lib/ \
      org.apache.oodt.cas.filemgr.tools.SolrIndexer \
      --optimize \
      --solrUrl http://localhost:8080/solr
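
For scripted use, the invocations above could be assembled by a small helper such as the sketch below. The flags and class name are taken from the slides; the helper `build_indexer_command` and its parameter names are assumptions made for illustration, and the function only builds the argument list without running anything.

```python
from typing import List, Optional, Sequence

# Class name as shown in the invocation examples.
SOLR_INDEXER_CLASS = "org.apache.oodt.cas.filemgr.tools.SolrIndexer"


def build_indexer_command(
    config_path: str,
    lib_dir: str,
    solr_url: str,
    fm_url: Optional[str] = None,
    mode_args: Sequence[str] = ("--all",),
) -> List[str]:
    """Return the java command line for one SolrIndexer run.

    mode_args carries the mode-specific flags from the slides, e.g.
    ["--all"], ["--types", "urn:some:ProductType"],
    ["--product", "<id>"] or ["--optimize"].
    """
    cmd = [
        "java",
        f"-DSOLR_INDEXER_CONFIG={config_path}",
        f"-Djava.ext.dirs={lib_dir}",
        SOLR_INDEXER_CLASS,
        *mode_args,
    ]
    # The optimize example omits --fmUrl, so it is optional here.
    if fm_url is not None:
        cmd += ["--fmUrl", fm_url]
    cmd += ["--solrUrl", solr_url]
    return cmd


# Example: index a single product type.
types_cmd = build_indexer_command(
    "/path/to/indexer.properties",
    "/path/to/cas/filemgr/lib/",
    "http://localhost:8080/solr",
    fm_url="http://localhost:9000",
    mode_args=["--types", "urn:some:ProductType"],
)
```

The resulting list can be handed to `subprocess.run(types_cmd, check=True)`. One caveat worth checking in a given environment: Java 9 and later no longer support `-Djava.ext.dirs`, so on a newer JVM the library directory would likely need to be supplied through the classpath instead.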

indexer.properties
• Configuration file for the SolrIndexer
• Specify mapping between OODT product metadata and Solr fields
• Additional “pre-processing” features

indexer.properties
• Example indexer.properties file (external)

Use Case I
• Building a searchable data archive
• “Long-term” / “Lights-out” archive
• Products & metadata immutable
• Many NASA mission data systems use this model
• Want to make it easily searchable

Use Case I
“Standard” Data Archive Pipeline + Search

Use Case II
• Building an interactively editable, searchable data archive
• Data and metadata mutable
• Want to dynamically select product(s) to edit based on metadata

Use Case II
Interactively Editable Data Archive Pipeline + Search
Solr catalog out of sync!

Synchronization
• Two ways (at least) to solve this:
  A. Modify the OODT Curator Services
  B. Treat OODT Curator Services as a “black box” and write a “wrapper” service to invoke Curator Services AND update Solr (via scripted call to SolrIndexer, for example)

Modify Curator Services
• Services implemented in JAX-RS
• /curator/src/main/java/org/apache/oodt/cas/curation/service
• [curator_url]/services/metadata/update
• Options:
  – Utilize Solr Java API
  – Wrap call to OODT SolrIndexer tool

Use Case II-A
Modified Curator Services to Simultaneously Update Solr

Example
• Interactive event tagging

Wrap Curator Services
• Curator Service/API is “black box”
• Develop custom service that:
  – Issues POST request to Curator service
  – Updates Solr index via, e.g.:
    • Utilize Solr Java API
    • Wrap call to OODT SolrIndexer tool
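
The wrapper idea can be sketched as a function that performs both steps in order. Everything named here is hypothetical: `update_and_reindex` and the two callables are placeholders. In a real deployment the first callable would issue the POST to `[curator_url]/services/metadata/update` (its request parameters are not shown in the slides, so they are left to the caller), and the second would update Solr through its API or a SolrIndexer `--product` run.

```python
from typing import Callable, Dict, List

Metadata = Dict[str, List[str]]


def update_and_reindex(
    product_id: str,
    metadata: Metadata,
    update_curator: Callable[[str, Metadata], None],
    reindex_product: Callable[[str], None],
) -> None:
    """Apply a metadata edit, then refresh the search index.

    The Curator update runs first. If it raises, the exception propagates
    and the Solr step is skipped, so the index is never updated for an
    edit that was not stored.
    """
    update_curator(product_id, metadata)
    reindex_product(product_id)


# Stand-in callables that only record what was called, for demonstration.
calls: List[tuple] = []
update_and_reindex(
    "19bcb4b8-7999-11e1-b581-8b771498975d",
    {"Tag": ["event"]},
    lambda pid, md: calls.append(("curator", pid, md)),
    lambda pid: calls.append(("solr", pid)),
)
```

Passing the two steps in as callables keeps the Curator service a black box, as the slide suggests, and lets the Solr update strategy be swapped without touching the wrapper. If the Solr step itself fails after a successful Curator update, the two stores are out of sync again, so a production wrapper would likely need retry or reconciliation logic that this sketch omits.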

Use Case II-B
Wrapping OODT Curation Services with Custom UI & Services

Example

Lessons
• Solr complements OODT File Manager
• RESTful interfaces (Solr + OODT Curator) allow for great flexibility in designing services and UI
• “Best” approach depends on situation

Next Steps
• Develop “SolrCatalog” for OODT File Manager?
  – Pros: Reduction in “moving parts”
  – Cons: Restrictive?
• Implement Use Case II-A as optional mode for Curator web service layer

Learning More
• Solr
  – http://lucene.apache.org/solr
    • [email protected]
• OODT
  – http://oodt.apache.org
    • https://cwiki.apache.org/confluence/display/OODT/Home
    • [email protected]

Thanks!
• Questions?