Consuming External Content and Enriching Content with Apache Camel

Presented by: Gaston Gonzalez, headwire.com, Inc.
Advanced AEM Search: Consuming External Content and Enriching Content with Apache Camel

Upload: therealgaston

Post on 13-Jan-2017


TRANSCRIPT

Page 1: Consuming External Content and Enriching Content with Apache Camel

Presented by: Gaston Gonzalez, headwire.com, Inc.

Advanced AEM Search
Consuming External Content and Enriching Content with Apache Camel

Page 2: Consuming External Content and Enriching Content with Apache Camel

About Me
• Senior Technical Architect at headwire.com, Inc.
• Search Engineer / Developer
• AEM Architect / Developer
• Creator of AEM Solr Search
• Tech Blogger
• UNIX Systems Administrator

Page 3: Consuming External Content and Enriching Content with Apache Camel


Typical AEM + Search Integration

Page 4: Consuming External Content and Enriching Content with Apache Camel

Typical AEM + Search Architecture


Page 5: Consuming External Content and Enriching Content with Apache Camel

Typical AEM + Search Architecture

Pros
• Straightforward implementation
• Simple architecture (AEM + Search)

Cons
• Complete data model in AEM?
  • Not all data may be in AEM
• Processing overhead
  • Data cleansing, transformation and enrichment handled in AEM
• Fault tolerance
  • What if Solr is down?
• Tight coupling to search platform

Page 6: Consuming External Content and Enriching Content with Apache Camel

Is there another way?


Page 7: Consuming External Content and Enriching Content with Apache Camel

Goals for a Better Architecture
• Offload processing outside of AEM
• Improve fault tolerance
• Provide a flexible platform for data cleansing, transformation and aggregation
• Allow for changes to indexing logic without impacting AEM
• Search engine agnostic

Page 8: Consuming External Content and Enriching Content with Apache Camel

Introduce an ETL / Document Processor


Page 9: Consuming External Content and Enriching Content with Apache Camel


Document Processing

Page 10: Consuming External Content and Enriching Content with Apache Camel

Document Processing Platform
• Roles & Responsibilities
  • Enriches submitted documents prior to indexing.
  • Submits documents for indexing.
• Terms & Definitions
  • Enrichment: Data cleansing, filtering, transformation, aggregation, etc.
  • Processing Stage: Independent processing unit responsible for contributing to the enrichment process.
  • Pipeline: Consists of one or more processing stages or sub-pipelines.
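The terms above can be sketched in plain Java. This is only an illustration of the stage and pipeline idea, not Camel's API: the names Stage, Pipeline and PipelineDemo, and the "title" and "language" keys, are assumptions made up for this example. Treating a pipeline as a stage is what lets sub-pipelines nest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** A processing stage: takes a document and returns the enriched document. */
interface Stage {
    Map<String, Object> process(Map<String, Object> doc);
}

/** A pipeline is itself a stage, so it can be nested as a sub-pipeline. */
final class Pipeline implements Stage {
    private final List<Stage> stages;

    Pipeline(Stage... stages) {
        this.stages = new ArrayList<>(Arrays.asList(stages));
    }

    @Override
    public Map<String, Object> process(Map<String, Object> doc) {
        Map<String, Object> current = doc;
        // Each stage receives the output of the previous one.
        for (Stage stage : stages) {
            current = stage.process(current);
        }
        return current;
    }
}

final class PipelineDemo {
    /** Builds a hypothetical pipeline: a cleansing stage, then a nested sub-pipeline. */
    static Pipeline build() {
        Stage trimTitle = doc -> {
            Map<String, Object> out = new HashMap<>(doc); // copy, leave the input untouched
            out.put("title", String.valueOf(doc.get("title")).trim());
            return out;
        };
        Stage addLanguage = doc -> {
            Map<String, Object> out = new HashMap<>(doc);
            out.put("language", "en"); // hard-coded for illustration only
            return out;
        };
        return new Pipeline(trimTitle, new Pipeline(addLanguage));
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("title", "  Hello Camel  ");
        System.out.println(build().process(doc));
    }
}
```

Because every stage copies the document instead of mutating it, stages stay independent and can be reordered or reused in other pipelines.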

Page 11: Consuming External Content and Enriching Content with Apache Camel

Document Processing Platform


Page 12: Consuming External Content and Enriching Content with Apache Camel

Document processing is really an integration problem, right?

Options, ordered from low to high complexity:
• Integration Library: Apache Camel
• Integration Framework & Stream Processing: Spring Integration, Spring Cloud Data Flow & Cloud Stream
• Enterprise Service Bus: Mule ESB

Page 13: Consuming External Content and Enriching Content with Apache Camel


Apache Camel

Page 14: Consuming External Content and Enriching Content with Apache Camel

Apache Camel
• A lightweight, open source integration library.
• Mediation engine
• Implements well-known Enterprise Integration Patterns (EIPs)
  • Aggregator
  • Content Enricher
  • Content-based router
  • Message
  • Message Translator
  • Pipes and Filters
  • Splitter…

Page 15: Consuming External Content and Enriching Content with Apache Camel

Why Apache Camel?
• Lightweight: it’s a JAR
• Imposes no runtime constraints
• Routing engine
• Powerful, fluent Java DSL
• Mature open source project
• Extensive list of integration components
• Avoid writing boilerplate code by leveraging EIPs

Page 16: Consuming External Content and Enriching Content with Apache Camel

Apache Camel & EIP Concepts

Message
• Unit of information exchange between applications

Exchange
• Wraps inbound & outbound message + headers

Message Channel
• Allows applications to communicate using messaging

Pipes and Filters
• Perform loosely coupled processing on a message
• Routes and Processors in Camel

Page 17: Consuming External Content and Enriching Content with Apache Camel

Camel’s Data Model


Page 18: Consuming External Content and Enriching Content with Apache Camel

Camel’s Architecture


Page 19: Consuming External Content and Enriching Content with Apache Camel

Importing Product Content into Solr

Problem: “As an AEM developer, I need to import product content into Solr so that I can display products via search and on PDPs on my AEM-powered site.”

Let’s use Best Buy’s Product API as an example…
1. Fetch product data ZIP file via HTTP request.
2. Unzip product data.
3. Parse each JSON file to extract individual products.
4. Transform, enrich and cleanse each product as necessary.
5. Submit each product to Solr for indexing.
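Step 2 above, and the per-file split that feeds step 3, can be sketched with the JDK alone. This is a rough illustration under stated assumptions, not the demo's implementation: the class name ProductArchive, the entry names and the tiny JSON payloads are hypothetical, and an in-memory ZIP stands in for the archive fetched over HTTP. JSON parsing is left out because the JDK has no built-in JSON parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ProductArchive {
    /** Step 2: reads every file entry of a ZIP archive into a name -> text map. */
    static Map<String, String> unzip(byte[] zipped) {
        Map<String, String> files = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only files carry product data
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int read;
                // read() returns -1 at the end of the current entry
                while ((read = zip.read(chunk)) != -1) {
                    buffer.write(chunk, 0, read);
                }
                files.put(entry.getName(),
                        new String(buffer.toByteArray(), StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    /** Builds a small in-memory ZIP standing in for the downloaded archive. */
    static byte[] sampleZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("products_0001.json")); // hypothetical file name
            zip.write("[{\"sku\": 1}]".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("products_0002.json"));
            zip.write("[{\"sku\": 2}]".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Each map entry is one JSON file that step 3 would parse into products.
        unzip(sampleZip()).forEach((name, json) -> System.out.println(name + " -> " + json));
    }
}
```

Handing out one file at a time corresponds to the Splitter pattern: each JSON file, and later each product, becomes its own message for the downstream stages.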

Page 20: Consuming External Content and Enriching Content with Apache Camel

A solution using EIPs


Page 21: Consuming External Content and Enriching Content with Apache Camel

A solution using Camel


Page 22: Consuming External Content and Enriching Content with Apache Camel

A short list of Camel Components

AMQP            Git               RabbitMQ
ATOM            HTTP / HTTP4      Rest
AWS             JCR               RSS
Bean            JDBC              Solr
Box             JMS               Apache Spark
Cache           Jsch              SQL
CouchDB         Log               Timer
Elasticsearch   MongoDB           XSLT
File            Netty / Netty4    Quartz

http://camel.apache.org/components.html

Page 23: Consuming External Content and Enriching Content with Apache Camel

Back to AEM and indexing AEM content…

Page 24: Consuming External Content and Enriching Content with Apache Camel

A Better AEM + Search Architecture


Page 25: Consuming External Content and Enriching Content with Apache Camel

Enrichment Use Cases for AEM
• Search Relevancy
  • Merge ratings and review signals
  • Merge analytics signals (visits, page views…)
  • Merge social signals (likes, shares, …)
• Cleanse data for search
• Rich content processing (Tika)
• Natural Language Processing (OpenNLP)
• Filter / drop documents
• Classify content
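Two of the use cases above, merging an analytics signal and filtering out documents, might look roughly like this in plain Java. Everything here is an assumption for illustration: the class name SignalEnricher, the "id" and "pageViews" keys, and the idea that page views arrive as a simple path-to-count map exported from an analytics system.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class SignalEnricher {
    // Hypothetical analytics export: content path -> page view count.
    private final Map<String, Long> pageViewsByPath;

    SignalEnricher(Map<String, Long> pageViewsByPath) {
        this.pageViewsByPath = new HashMap<>(pageViewsByPath);
    }

    /**
     * Returns an enriched copy of the document, or an empty Optional
     * when the document should be dropped instead of indexed.
     */
    Optional<Map<String, Object>> enrich(Map<String, Object> doc) {
        Object id = doc.get("id");
        if (id == null) {
            return Optional.empty(); // filter: nothing to index without an id
        }
        Map<String, Object> out = new HashMap<>(doc);
        // Merge the signal; documents without recorded traffic get 0.
        out.put("pageViews", pageViewsByPath.getOrDefault(id.toString(), 0L));
        return Optional.of(out);
    }

    public static void main(String[] args) {
        Map<String, Long> views = new HashMap<>();
        views.put("/content/site/en", 42L);
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "/content/site/en");
        System.out.println(new SignalEnricher(views).enrich(doc));
    }
}
```

A merged numeric field like this could later be used as a relevancy boost at query time; how that boost is applied depends on the search platform.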

Page 26: Consuming External Content and Enriching Content with Apache Camel

AEM: Data Model (1/3)
• Use a serializable object to represent your document
• In fact, use a HashMap
  • No dependency object graph
  • Most search platforms already think of documents as a series of key/value pairs
• Use key name prefixes to model:
  • Index operation type (aem.op)
  • Document Fields (aem.field.<field>)
  • Metadata (aem.meta.<field>)

Page 27: Consuming External Content and Enriching Content with Apache Camel

AEM: Data Model (1/3)

HashMap<String, Object> jmsDoc = new HashMap<String, Object>();

// Operation Type
jmsDoc.put("aem.op.type", "ADD_DOC");

// Document fields
jmsDoc.put("aem.field.id", page.getPath());
jmsDoc.put("aem.field.crxPath", page.getPath());
jmsDoc.put("aem.field.url", page.getPath() + ".html");
jmsDoc.put("aem.field.title", page.getTitle());
jmsDoc.put("aem.field.description", page.getDescription());

// Metadata
jmsDoc.put("aem.meta.foo", "bar");

Page 28: Consuming External Content and Enriching Content with Apache Camel

AEM: Listener / JMS Producer (2/3)
• Create an AEM Listener
  • Implement EventHandler interface
  • Listen for the PageEvent topics
  • Convert the Page resource to our data model
    • Add operation type
    • Add document fields
    • Add metadata fields
  • Send the message to JMS index topic
• Example: JmsIndexListener.java

Page 29: Consuming External Content and Enriching Content with Apache Camel

AEM: JMS Camel Consumer (3/3)
• Define your Camel runtime (e.g., standalone, OSGi, etc.)
• Define your Camel routes
  • Consume JMS topic
  • Route operation type using content-based router
  • Enrich document as needed
  • Convert JMS document model to Solr model
  • Submit index request
• Example: AemToSolr.java
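The routing and conversion steps above can be sketched without Camel or Solr on the classpath. This is not the contents of AemToSolr.java; it only illustrates the two ideas in plain Java. The class name IndexRouter, the DELETE_DOC operation and the returned action strings are assumptions for the example. Only ADD_DOC and the aem.op.type / aem.field. key names come from the data model slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IndexRouter {
    static final String OP_KEY = "aem.op.type";
    static final String FIELD_PREFIX = "aem.field.";

    /** Translator: keeps only aem.field.* entries and strips the prefix. */
    static Map<String, Object> toSolrFields(Map<String, Object> jmsDoc) {
        Map<String, Object> fields = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : jmsDoc.entrySet()) {
            if (entry.getKey().startsWith(FIELD_PREFIX)) {
                fields.put(entry.getKey().substring(FIELD_PREFIX.length()), entry.getValue());
            }
        }
        return fields;
    }

    /** Content-based router: picks the action from the operation type. */
    static String route(Map<String, Object> jmsDoc) {
        Object op = jmsDoc.get(OP_KEY);
        if ("ADD_DOC".equals(op)) {
            return "add " + toSolrFields(jmsDoc);
        }
        if ("DELETE_DOC".equals(op)) { // hypothetical operation name
            return "delete " + jmsDoc.get(FIELD_PREFIX + "id");
        }
        return "dead-letter"; // unknown operations are set aside, not indexed
    }

    public static void main(String[] args) {
        Map<String, Object> jmsDoc = new LinkedHashMap<>();
        jmsDoc.put("aem.op.type", "ADD_DOC");
        jmsDoc.put("aem.field.id", "/content/a");
        jmsDoc.put("aem.meta.foo", "bar"); // metadata is not sent to the index
        System.out.println(route(jmsDoc));
    }
}
```

Keeping the prefix convention in one place means the search-specific model is only produced at the very end, which is what keeps the rest of the pipeline search engine agnostic.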

Page 30: Consuming External Content and Enriching Content with Apache Camel


Demo

Page 31: Consuming External Content and Enriching Content with Apache Camel

Demo Prerequisites
• Java 8 / Maven 3.2.x
• AEM 6.1
• http://www.aemsolrsearch.com
• https://github.com/GastonGonzalez/aem-solr-search-product-sample
• Best Buy API Key
• Vagrant and VirtualBox

Page 32: Consuming External Content and Enriching Content with Apache Camel


Camel Runtime Options

Page 33: Consuming External Content and Enriching Content with Apache Camel

Java main: CamelContext

Page 34: Consuming External Content and Enriching Content with Apache Camel

Java main: Wrapper

Page 35: Consuming External Content and Enriching Content with Apache Camel

OSGi Runtime

Page 37: Consuming External Content and Enriching Content with Apache Camel

In summary…
• If you do not need enrichment, keep it simple and use a direct indexing approach.
• If you need to enrich your AEM content, consider using Camel as your document processing platform.
• This architecture is NOT search-specific!
  • Syndicate AEM content to other systems
  • Workflow replacement

Page 38: Consuming External Content and Enriching Content with Apache Camel


THANK YOU.