New-Age Search through Apache Solr

View Apache Solr course details at www.edureka.co/apache-solr

For queries: post on Twitter @edurekaIN (#askEdureka) or on Facebook /edurekaIN

For more details, please contact us: US 1800 275 9730 (toll free), INDIA +91 88808 62004, email sales@edureka.co

How it Works

LIVE Online Class

Class Recording in LMS

24/7 Post Class Support

Module Wise Quiz

Project Work

Verifiable Certificate

Objectives

At the end of this module, you will be able to understand:

The need for a search engine in enterprise-grade applications

The objectives and challenges of a search engine

How indexing and searching are handled in Lucene

Solr and its architecture

Near real-time search with Solr

Leveraging Solr capabilities with Hadoop

Solr with YARN

Job opportunities for Solr developers


Why Do I Need Search Engines?

Search Engine - Why do I need them?

1. Text-based search

2. Filter

3. Documents


Search Engine - What it should be

If you need a storage engine to search records or documents using text-based keywords, it should support the following features:

1. Should be optimized for fast text searches

2. Should have a flexible schema

3. Should support sorting of documents

4. Web scale: should be optimized for reads

5. Should be document oriented


Cleartrip Spatial Search


What is Lucene?

Lucene is a powerful Java search library that lets you easily add search or Information Retrieval (IR) to applications

Used by LinkedIn, Twitter, … and many more (see http://wiki.apache.org/lucene-java/PoweredBy)

Scalable & high-performance indexing

Powerful, accurate, and efficient search algorithms

Cross-platform solution

» Open source & 100% pure Java

» Implementations in other programming languages available that are index-compatible

Creator: Doug Cutting


Indexing - How it works

Document 1 ("D1"): I like edureka courses

Document 2 ("D2"): Edureka teaches big data courses

Document 3 ("D3"): Edureka helps learn new technologies easily

Index entries (term = documents containing it):

"edureka" = D1, D2, D3

"courses" = D1, D2

"teaches" = D2

"big" = D2

"data" = D2

"helps" = D3

Search term: "edureka"
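The term-to-document mapping on this slide is an inverted index. As a minimal sketch in plain Python (not Lucene's API; the function names and the simple lowercase tokenizer are assumptions made for illustration), it can be built and queried like this:

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed tokenizer: lowercase, split on anything that is not a letter or digit.
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents containing the term."""
    return sorted(index.get(term.lower(), ()))


docs = {
    "D1": "I like edureka courses",
    "D2": "Edureka teaches big data courses",
    "D3": "Edureka helps learn new technologies easily",
}
index = build_inverted_index(docs)
print(search(index, "edureka"))
print(search(index, "courses"))
```

A lookup touches only the entry for the search term instead of scanning every document, which is why this structure suits text search.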


Lucene - Writing to Index

[Figure: Classes used when indexing documents with Lucene. A Document made up of Fields is processed by an Analyzer and written by the IndexWriter to a Directory.]


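Lucene - Searching in Index

[Figure: Classes used when searching a Lucene index. QueryParser, using an Analyzer that breaks the expression into text fragments, turns an expression into a Query object for the IndexSearcher.]

QueryParser translates a textual expression from the end user into an arbitrarily complex query for searching.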
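To make the idea of turning a textual expression into a query object concrete, here is a toy sketch in plain Python. It is not Lucene's QueryParser: the nested-tuple query representation, the left-to-right handling of AND and OR, and the sample index are all assumptions chosen for illustration.

```python
def parse_query(expression):
    """Parse 'a AND b OR c' left to right into nested tuples (a toy query object)."""
    tokens = expression.split()
    if not tokens:
        raise ValueError("empty query")
    query = ("TERM", tokens[0].lower())
    position = 1
    while position < len(tokens):
        operator = tokens[position].upper()
        if operator not in ("AND", "OR") or position + 1 >= len(tokens):
            raise ValueError(f"cannot parse near {tokens[position]!r}")
        right = ("TERM", tokens[position + 1].lower())
        query = (operator, query, right)
        position += 2
    return query


def evaluate(query, index):
    """Run the toy query object against a term -> set-of-ids index."""
    kind = query[0]
    if kind == "TERM":
        return set(index.get(query[1], set()))
    left = evaluate(query[1], index)
    right = evaluate(query[2], index)
    return left & right if kind == "AND" else left | right


# Hypothetical index, reusing the terms from the indexing slide.
index = {
    "edureka": {"D1", "D2", "D3"},
    "courses": {"D1", "D2"},
    "helps": {"D3"},
}
query = parse_query("edureka AND courses")
print(query)
print(sorted(evaluate(query, index)))
```

The point of the split is the same as on the slide: parsing produces a query object once, and the searcher only ever deals with that object, not with raw text.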


What is Solr?

Solr is an open source enterprise search server / web application

Solr uses the Lucene search library and extends it

Solr exposes Lucene Java APIs as RESTful services

You put documents in it (called indexing) via XML, JSON, CSV, or binary over HTTP

You query it via HTTP GET and receive XML, JSON, CSV, or binary results
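As a sketch of what putting documents into Solr over HTTP can look like, the snippet below prepares a JSON update request using only the Python standard library. The core name `mycore`, the `text` field, and the helper name are assumptions; the request is built but not sent, so no running Solr is needed to try it.

```python
import json
from urllib.request import Request


def build_index_request(solr_core_url, docs, commit=True):
    """Prepare (but do not send) an HTTP POST that indexes docs as JSON."""
    url = f"{solr_core_url}/update"
    if commit:
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical core URL and schema fields.
request = build_index_request(
    "http://localhost:8983/solr/mycore",
    [{"id": "D1", "text": "I like edureka courses"}],
)
print(request.get_method(), request.full_url)
# To actually index, pass the request to urllib.request.urlopen against a running Solr.
```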


Solr Key Features

Advanced full-text search capabilities

Optimized for high-volume web traffic

Standards-based open interfaces: XML, JSON, and HTTP

Comprehensive HTML administration interfaces

Server statistics exposed over JMX for monitoring

Near real-time indexing

Adaptable with XML configuration

Linearly scalable, auto index replication, auto failover

Extensible plugin architecture


Solr Architecture


Solr Search Process

[Figure: A query passes through the Request Handler, the Query Parser, the Index, and the Response Writer.]

qt: selects a RequestHandler for a query using /select (by default, the DisMaxRequestHandler is used)

defType: selects a query parser for the query (by default, uses whatever has been configured for the RequestHandler)

qf: selects which fields to query in the index (by default, all fields are required)

wt: selects a response writer for formatting the query response

fq: filters the query by applying an additional query to the initial query's results; caches the results

rows: specifies the number of rows to be displayed at one time

start: specifies an offset (by default 0) into the query results where the returned response should begin
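The parameters above are ordinary HTTP query-string parameters. The following sketch, using only the Python standard library, shows how a client might assemble them into a /select URL and turn a page number into a start offset. The core name `mycore`, the field names, and the helper names are assumptions for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def page_to_start(page, rows):
    """Convert a 1-based page number into the 0-based start offset."""
    return (page - 1) * rows


def build_select_url(core_url, q, *, def_type=None, qf=None, fq=(),
                     wt="json", start=0, rows=10):
    """Assemble a /select URL from the common search parameters."""
    params = [("q", q)]
    if def_type:
        params.append(("defType", def_type))
    if qf:
        # Several query fields are passed as one space-separated value.
        params.append(("qf", " ".join(qf)))
    # Each filter query is sent as its own fq parameter.
    params.extend(("fq", filter_query) for filter_query in fq)
    params += [("wt", wt), ("start", str(start)), ("rows", str(rows))]
    return f"{core_url}/select?{urlencode(params)}"


url = build_select_url(
    "http://localhost:8983/solr/mycore",  # hypothetical core
    "solr search",
    def_type="dismax",
    qf=["title", "body"],                 # hypothetical fields
    fq=["category:course"],
    start=page_to_start(3, 10),
    rows=10,
)
print(url)
```

Keeping start and rows separate from the query itself means the same q and fq values can be reused unchanged while a user pages through results.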


Near Real-Time Search

Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed: additions and updates to documents are seen in near real time.

http://localhost:8983/solr/update?stream.body=<add><doc><field name="id">testdoc</field></doc></add>&commit=true
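The update URL above carries an XML add command in the stream.body parameter, so in practice the XML has to be percent-encoded before it is placed in a URL. Here is a minimal sketch of building that URL with the Python standard library; the helper names are assumptions, and depending on the Solr version, stream.body may need to be enabled in the server configuration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode, urlsplit, parse_qs


def build_add_xml(fields):
    """Build an <add><doc>...</doc></add> command for a single document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


def build_update_url(update_url, fields, commit=True):
    """Return the update URL with the XML command safely percent-encoded."""
    params = {"stream.body": build_add_xml(fields)}
    if commit:
        params["commit"] = "true"
    return f"{update_url}?{urlencode(params)}"


url = build_update_url("http://localhost:8983/solr/update", {"id": "testdoc"})
print(url)
```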


Real-Time Get

The realtime get feature allows retrieval (by unique-key) of the latest version of any documents without the associated cost of reopening a searcher

This is primarily useful when using Solr as a NoSQL data store and not just a search index
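The difference between searching and a real-time get can be illustrated with a toy model in plain Python. This is a conceptual sketch only, not how Solr is implemented: the class, its two dictionaries, and the method names are assumptions that stand in for "updates accepted so far" and "what the current searcher can see".

```python
class ToyRealtimeIndex:
    """Toy model: search sees committed documents, get-by-id sees the latest."""

    def __init__(self):
        self._pending = {}     # updates accepted but not yet visible to search
        self._searchable = {}  # snapshot that the current searcher sees

    def add(self, doc):
        self._pending[doc["id"]] = doc

    def commit(self):
        # Stands in for reopening a searcher: pending updates become searchable.
        self._searchable = {**self._searchable, **self._pending}
        self._pending = {}

    def search_ids(self, field, value):
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if doc.get(field) == value
        )

    def realtime_get(self, doc_id):
        # Look at pending updates first, so the latest version wins.
        if doc_id in self._pending:
            return self._pending[doc_id]
        return self._searchable.get(doc_id)


toy = ToyRealtimeIndex()
toy.add({"id": "testdoc", "title": "v1"})
print(toy.search_ids("title", "v1"))   # not yet visible to search
print(toy.realtime_get("testdoc"))     # already retrievable by unique key
toy.commit()
print(toy.search_ids("title", "v1"))   # visible after the commit
```

In Solr itself, the real-time get handler is typically exposed at /get and takes the unique key in an id parameter.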



Solr provides fast, efficient, powerful full-text search and near real-time indexing, and SolrCloud adds flexible distributed search and indexing with features such as automatic failover.

Hence it is very suitable as a NoSQL replacement for traditional databases in many situations, especially when the size of the data exceeds what is reasonable with a typical RDBMS.

We can do scalable indexing using a Hadoop MapReduce or Pig job and then load the indexed data into Solr.

Solr integrates easily with all the major Hadoop distributions, such as Cloudera, Hortonworks, and MapR.
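The shape of a MapReduce indexing job can be sketched without Hadoop. The example below simulates the map, shuffle, and reduce steps in a single Python process. It uses no Hadoop or Solr API, and the function names and sample documents are assumptions for illustration; a real job would run the map and reduce steps in parallel across the cluster.

```python
import re
from itertools import groupby
from operator import itemgetter


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    for term in set(re.findall(r"[a-z0-9]+", text.lower())):
        yield term, doc_id


def shuffle(pairs):
    """Shuffle step: group the emitted pairs by term."""
    for term, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield term, [doc_id for _, doc_id in group]


def reduce_phase(term, doc_ids):
    """Reduce step: merge the ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def run_indexing_job(docs):
    pairs = [
        pair
        for doc_id, text in docs.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs))


raw_files = {
    "D1": "I like edureka courses",
    "D2": "Edureka teaches big data courses",
    "D3": "Edureka helps learn new technologies easily",
}
job_output = run_indexing_job(raw_files)
print(job_output["edureka"])
print(job_output["courses"])
```

Because each map call looks at one document and each reduce call at one term, both steps can be spread over many machines, which is what makes the indexing scalable.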


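Scalable Indexing

[Figure: Input data (raw files such as PDF, Word, and HTML) is stored in HDFS (Hadoop Distributed File System). A MapReduce indexing job uses Lucene to index the raw files, and the indexed files are served by multiple Solr instances. A search web app sends queries to Solr and receives responses.]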


Solr with YARN


Job trends for Apache Solr


Disclaimer

Criteria and guidelines mentioned in this presentation may change. Please visit our website for the latest and additional information on Apache Solr.


Course Topics

Module 1

» Introduction to Apache Lucene

Module 2

» Exploring Lucene

Module 3

» Introduction to Apache Solr

Module 4

» Solr Indexing

Module 5

» Solr Searching

Module 6

» Solr Extended Features

Module 7

» Solr Cloud & Administration

Module 8

» Final Project
