Apache Solr Introduction & Demo

Uploaded by: shalin-shekhar-mangar

Posted on 16-Apr-2017


TRANSCRIPT

Page 1: Intro to Apache Solr

Apache Solr Introduction & Demo

Page 2: Agenda

• What is Apache Solr?

• Start/stop Solr

• Indexing data to Solr

• Searching data

• Running a SolrCloud cluster

• Hacking Solr

Page 3: What is Apache Solr?

• A search server built on Apache Lucene, plus additional features

• Access to Lucene over HTTP from Java, Python, Ruby, .NET, PHP and more, using XML, JSON and other formats

• Faceting (guided navigation), suggestions, highlighting, etc.

• Replication and distributed search

• Built-in Lucene best practices
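Because Solr is accessed over plain HTTP, any language's standard library can issue a search. A minimal Python sketch, assuming a local Solr on the default port 8983 and a collection named `gettingstarted` (both are assumptions); it only builds the `/select` URL using the standard `q`, `rows`, `fl` and `wt` parameters and does not contact a server:

```python
from urllib.parse import urlencode, urlparse, parse_qs

def build_select_url(base_url, collection, query, rows=10, fields=None):
    """Build a Solr /select URL that asks for a JSON response.

    base_url and collection are assumptions for illustration,
    e.g. "http://localhost:8983" and "gettingstarted".
    """
    params = {"q": query, "rows": rows, "wt": "json"}
    if fields:
        # fl limits which stored fields are returned for each document
        params["fl"] = ",".join(fields)
    # urlencode percent-encodes characters such as ':' and ','
    return f"{base_url}/solr/{collection}/select?{urlencode(params)}"

url = build_select_url("http://localhost:8983", "gettingstarted",
                       "author:shalin", fields=["id", "title"])
print(url)
```

Against a running server, passing this URL to `urllib.request.urlopen` and reading the body with `json.load` would return the search results.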

Page 4: Running Solr

• Extract:

  • tar xvf solr-5.1.0.tgz (Linux/macOS)

  • unzip solr-5.1.0.zip, or right-click and extract (Windows)

• Run (on Windows, use bin\solr.cmd instead of ./bin/solr):

  • ./bin/solr start -e schemaless

  • ./bin/solr start -e schemaless -p 8983 (sets the port explicitly; 8983 is the default)

  • ./bin/solr -help

  • ./bin/solr start -help

• Stop:

  • ./bin/solr stop

Page 5: Indexing data

• The bin/post script

• Using curl directly

• Using the Admin UI

• SolrJ and other indexing clients
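Besides bin/post and curl, documents can be sent as a JSON array to a collection's `/update` handler from any HTTP client. The Python sketch below only builds the request with the standard library; the base URL and the `gettingstarted` collection name (the name the schemaless example is assumed to create) are assumptions, and the document fields are hypothetical:

```python
import json
from urllib.request import Request

def build_update_request(base_url, collection, docs, commit=True):
    """Build (but do not send) a POST of JSON documents to Solr's /update handler."""
    url = f"{base_url}/solr/{collection}/update"
    if commit:
        # commit=true asks Solr to commit so the new documents become searchable
        url += "?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return Request(url, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")

# Hypothetical documents; "id" is the usual unique key field
docs = [
    {"id": "1", "title": "Red shoes", "category": "shoes"},
    {"id": "2", "title": "Android phone", "category": "phone"},
]
req = build_update_request("http://localhost:8983", "gettingstarted", docs)

# To actually send it to a running Solr:
# from urllib.request import urlopen
# with urlopen(req) as resp:
#     print(resp.status)
```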

Page 6: Demo time

Page 7: Inverted index
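An inverted index maps each term to the list (postings) of documents that contain it, so looking up a term does not require scanning every document. A toy Python sketch; the regex tokenizer is a deliberately simplified stand-in for Lucene's analyzers, and the documents are made up:

```python
import re
from collections import defaultdict

def tokenize(text):
    # Simplified analysis: lowercase, then split on non-alphanumeric characters
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

docs = {
    1: "Red shoes for running",
    2: "Blue running shoes",
    3: "Red android phone",
}
index = build_inverted_index(docs)
print(index["shoes"])  # [1, 2]
print(index["red"])    # [1, 3]
```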

Page 8: Lucene/Solr query syntax

• +red +shoes = red AND shoes

• +shoes -red = shoes NOT red

• "android phone" (phrase query)

• "android phone" -samsung = "android phone" NOT samsung

• "android samsung"~4 (proximity: both terms within a slop of 4 positions)

• merced* (wildcard: terms starting with "merced")

• createDate:[201301 TO 201401] (inclusive range)

• author:shalin

• author:"shalin mangar"

• author:"shalin mangar" AND project:(lucene OR solr)

• title:samsung^5 category:phone (the title clause is boosted by 5)
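The `+` and `-` prefixes mark required (MUST) and prohibited (MUST_NOT) clauses, while unprefixed terms are optional (SHOULD). This Python sketch evaluates only single-term clauses as set operations on a toy inverted index; it ignores scoring, fields, phrases and purely negative queries, and the documents are made up:

```python
import re
from collections import defaultdict

def tokenize(text):
    # Simplified analysis: lowercase, then split on non-alphanumeric characters
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, all_ids, query):
    """Evaluate +term (MUST), -term (MUST_NOT) and bare term (SHOULD) clauses."""
    must, must_not, should = [], [], []
    for clause in query.split():
        if clause.startswith("+"):
            must.append(clause[1:].lower())
        elif clause.startswith("-"):
            must_not.append(clause[1:].lower())
        else:
            should.append(clause.lower())
    if must:
        # Every required term must match; optional terms would only affect scoring
        result = set(all_ids)
        for term in must:
            result &= index.get(term, set())
    else:
        # Without required terms, at least one optional term must match
        result = set()
        for term in should:
            result |= index.get(term, set())
    # Prohibited terms remove documents from the result
    for term in must_not:
        result -= index.get(term, set())
    return sorted(result)

docs = {
    1: "Red shoes for running",
    2: "Blue running shoes",
    3: "Red android phone",
    4: "Samsung android phone",
}
index = build_index(docs)
print(search(index, docs.keys(), "+red +shoes"))  # [1]
print(search(index, docs.keys(), "+shoes -red"))  # [2]
```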

Page 9: Other features of Solr

• DataImportHandler: index databases, email, RSS feeds, XML files, etc.

• Rich document support: PDF, MS Office, images, etc.

• Faceting, stats, analytics

• Replication for high query volume

• Proven in production systems with billions of documents

• Very extensible and customizable

• Embedded in commercial search products from Lucidworks, DataStax, Cloudera, Hortonworks, Pivotal, Amazon CloudSearch, Riak, etc.
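Field faceting counts, for each value of a field, how many of the matching documents carry that value; in Solr it is requested with parameters such as `facet=true&facet.field=category`, optionally with `facet.mincount`. A toy Python sketch of just the counting step, using hypothetical documents and field names:

```python
from collections import Counter

def facet_counts(docs, matching_ids, field, mincount=1):
    """Count how many matching documents carry each value of `field`."""
    counts = Counter()
    for doc_id in matching_ids:
        value = docs[doc_id].get(field)
        if value is not None:  # documents without the field are skipped
            counts[value] += 1
    # Highest count first (as with facet.sort=count); ties broken alphabetically here
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(value, n) for value, n in ordered if n >= mincount]

docs = {
    1: {"title": "Red shoes", "category": "shoes", "brand": "acme"},
    2: {"title": "Blue shoes", "category": "shoes", "brand": "zoom"},
    3: {"title": "Red phone", "category": "phone", "brand": "acme"},
    4: {"title": "Red scarf", "category": "apparel"},
}
matching = [1, 3, 4]  # e.g. the documents a query for "red" matched
print(facet_counts(docs, matching, "brand"))  # [('acme', 2)]
```

Counting only over the matching documents is what makes facets useful for guided navigation: the counts change as the user refines the query.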

Page 10: What is SolrCloud?

• A set of optional Solr features that enable and simplify horizontal scaling of a search index using sharding and replication

• Goals: scalability, performance, high availability, simplicity, and elasticity
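Sharding spreads documents across shards by hashing each document's ID into a hash range owned by exactly one shard. SolrCloud's default compositeId router uses MurmurHash3 over a signed 32-bit range; this Python sketch swaps in `zlib.crc32` and an unsigned range to stay within the standard library, so its shard assignments will not match a real cluster:

```python
import zlib

HASH_SPACE = 2**32  # unsigned 32-bit hash values (a simplification)

def shard_ranges(num_shards):
    """Split the hash space into num_shards contiguous, non-overlapping ranges."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        # The last shard also takes any remainder so every hash is covered
        end = HASH_SPACE - 1 if i == num_shards - 1 else start + size - 1
        ranges.append((start, end))
    return ranges

def route(doc_id, ranges):
    """Return the index of the shard whose range contains hash(doc_id)."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in for MurmurHash3
    for shard, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return shard
    raise ValueError("ranges do not cover the hash space")

ranges = shard_ranges(2)
for doc_id in ["doc-1", "doc-2", "doc-3"]:
    print(doc_id, "-> shard", route(doc_id, ranges))
```

In this sketch the shard depends only on the ID and the range table, so any node holding the table can route a document without a lookup service; replication would then copy each shard to additional nodes without changing the routing.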

Page 11: Running SolrCloud

• ./bin/solr -e cloud

• Yeah, it’s that simple!

Page 12: SolrCloud demo

Page 13: Hacking Solr

• http://wiki.apache.org/solr/HowToContribute

• Prerequisites:

  • git: git clone http://git-wip-us.apache.org/repos/asf/lucene-solr.git

  • GitHub: fork and clone apache/lucene-solr

  • Apache Ant 1.8.x or later

  • Eclipse or IntelliJ IDEA (I recommend IDEA)

  • Put svn/git and ant on your $PATH or %PATH%

Page 14: Hacking Solr

• ant ivy-bootstrap (required only once)

• ant idea or ant eclipse (generates a complete project that you can open in your favourite IDE)

• Find an existing Jira issue or open a new one at http://issues.apache.org/jira/browse/SOLR

• Make your changes and write tests. Once finished:

  • run ‘cd solr; ant server’ to build Solr, then start it with the bin/solr script

  • run ‘ant test’ (it can take a while) and ensure all tests pass

  • run ‘ant precommit’ from the checkout root and ensure it passes

• Generate a patch with ‘svn diff’ or ‘git diff’ and attach it to the Jira issue

Page 15: Resources

• http://lucene.apache.org/solr

• https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide

• https://issues.apache.org/jira/browse/SOLR

• Ask me: solr-help.slack.com

• Ask other users: [email protected]

• Ask developers: [email protected] (use sparingly)

Page 16: Thank you

Shalin Shekhar Mangar, [email protected]