Apache Solr! Enterprise Search Solutions at Your Fingertips!
DESCRIPTION
Get an overview of Apache Solr as an enterprise search server. Get to know the available alternatives and why Solr is cool! Get excited! Enterprise search solutions are ready to pick.
TRANSCRIPT
Apache-Solr! Enterprise Search Solutions at your Fingertips!
Murshed Ahmmad Khan (@usamurai)
Presented at phpXperts seminar 2011…
The criteria…
Enterprise Search Server
Fast
Flexible
Powerful
Scalable
Relevant Results
Production ready & Easy deployment
What name comes to mind…
??
Apache Solr!
It fits all the above-mentioned criteria…
Solr, What is it…?
- Open source Java application
- Runs as a standalone full-text search server within any servlet container
- Uses the Lucene Java search library as its core
SOLR WORK FLOW…
Solr History…
- Developed at CNET Networks by Yonik Seeley
- Donated to the ASF (Apache Software Foundation) in early 2006
Solr History…(2)
- Incubation period ended in January 2007 (v1.2 followed later in 2007)
- Solr is now maintained as a subproject of Lucene
Solr - Features…
Powerful Full-Text search…
Hit Highlighting…
Faceted Search…
Tag Clouds…
Geo-spatial search…
Solr - Features… (cont…)
- Database integration
- Rich document (Word, PDF) handling
- REST-like HTTP/XML and JSON APIs (so you can code in virtually any language)
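Because the API is plain HTTP, a search is just a GET request against the select handler. The following is a minimal Python sketch, not an official client: it assumes a Solr instance at http://localhost:8080/solr and uses the standard `q`, `rows` and `wt` request parameters. The helper name `build_select_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed base URL of a local Solr instance; adjust for your deployment.
SOLR_BASE = "http://localhost:8080/solr"


def build_select_url(query, rows=10, wt="json", base=SOLR_BASE):
    """Return the URL of a search request against Solr's /select handler."""
    params = urlencode({"q": query, "rows": rows, "wt": wt})
    return f"{base}/select?{params}"


if __name__ == "__main__":
    # Any HTTP client (or a browser) can fetch this URL.
    print(build_select_url("content:solr"))
```

The same URL works from PHP, Ruby, JavaScript or a browser, which is what makes the client list on the next slide possible.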
CLIENT API SUPPORT…
- Java (SolrJ)
- .NET (solrnet, SolrSharp)
- PHP (SolPHP)
- Python (SolPython)
- Ruby (on Rails) (rsolr, acts-as-solr, sunspot)
- C++
- XML/HTTP
- JSON/HTTP (AJAX Solr)
- Perl (SolPerl)
Solr - Features… (cont…)
- Flexible configuration
- Extensive plugin architecture for advanced customization
- Scalable distributed search, dynamic clustering, index replication
Alternatives to Solr
- Use Google (GSA, which has integration problems)
- FAST (stopped supporting Linux)
- Use Lucene (write code on top of it)
Alternatives to Solr…(2)
- Use your database (has performance issues)
- Sphinx (written in C++)
- Commercial libraries (e.g. lucidimagination.com)
- Write your own
Who Uses Solr/Lucene?
More names: http://wiki.apache.org/solr/PublicServers
OPERATING SYSTEM SUPPORT
All with a Java VM, including:
- Linux (all versions)
- Windows (all versions)
- MacOS (all versions)
- Unix variants
APP SERVER SUPPORT
- Apache Tomcat
- Jetty
- Resin
- WebLogic™
- WebSphere™
- GlassFish
- dmServer™
- JBoss™
- and many more
Requirement: Java JDK 1.5 or later
INSTALLATION
1. Download the latest versions of apache-solr and tomcat.
2. Extract both archives:
   $ tar -xzvf apache-solr-1.4.1.tgz
   $ tar -xzvf apache-tomcat-6.0.35.tar.gz
3. Copy the Solr WAR file into the Tomcat webapps folder as solr.war:
   $ cp apache-solr-1.4.1/dist/apache-solr-1.4.1.war apache-tomcat-6.0.35/webapps/solr.war
4. Copy the example/solr directory into the Tomcat home directory:
   $ cp -r apache-solr-1.4.1/example/solr apache-tomcat-6.0.35/
5. Start the Tomcat server from its home directory, so that Solr finds ./solr as its home:
   $ cd apache-tomcat-6.0.35
   $ ./bin/startup.sh
6. Visit http://localhost:8080/solr/admin/
YOU ARE DONE…
CREATE SCHEMA.XML
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="service" type="string" indexed="true" stored="true" required="true" />
<field name="contentType" type="string" indexed="true" stored="true" required="true" />
<field name="dbId" type="long" indexed="true" stored="true" />
<field name="content" type="text" indexed="true" stored="true" />
<!-- Assumed declaration: the copyField destination must itself be a declared field -->
<field name="all" type="text" indexed="true" stored="false" multiValued="true" />
<copyField source="*" dest="all" />
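Fields marked required="true" must be present in every document sent to the index. As a rough illustration, the Python sketch below reads the field declarations from this slide and reports which required fields a document is missing. The `<fields>` wrapper is the element schema.xml places around field declarations; the helper names `required_fields` and `missing_fields` are hypothetical, and Solr performs this check itself at index time.

```python
import xml.etree.ElementTree as ET

# Field declarations from the slide, wrapped in a <fields> element.
SCHEMA_FIELDS = """
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="service" type="string" indexed="true" stored="true" required="true" />
  <field name="contentType" type="string" indexed="true" stored="true" required="true" />
  <field name="dbId" type="long" indexed="true" stored="true" />
  <field name="content" type="text" indexed="true" stored="true" />
</fields>
"""


def required_fields(schema_xml):
    """Return the names of all fields declared with required="true"."""
    root = ET.fromstring(schema_xml.strip())
    return [f.get("name") for f in root.iter("field")
            if f.get("required") == "true"]


def missing_fields(doc, schema_xml=SCHEMA_FIELDS):
    """Return the required field names that are absent from `doc` (a dict)."""
    return [name for name in required_fields(schema_xml) if name not in doc]


if __name__ == "__main__":
    print(missing_fields({"id": "aawaj-profile-1", "service": "aawaj"}))
```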
INDEX DOCUMENTS (INDEXER)
The Common Loop
INDEX DOCUMENTS
1. <add>
Add single/multiple documents
$doc = new SolrSimpleDocument(array(
    new SolrSimpleField('id', 'aawaj-profile-' . $user->id),
    new SolrSimpleField('service', 'aawaj'),
    new SolrSimpleField('contentType', 'profile'),
    new SolrSimpleField('dbId', (string)$user->id)
));
$this->solr->add($doc);
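Under the hood, a client library turns such a document into an XML `<add>` message and POSTs it to Solr's update handler. The Python sketch below shows roughly what that message looks like for the fields used on this slide. It is an illustration only, not the PHP client's actual implementation; the helper name `build_add_message` and the sample user id 42 are made up.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize a list of {field: value} dicts as a Solr <add> message."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)  # ElementTree escapes <, > and & on output
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical profile document for user id 42.
    print(build_add_message([{
        "id": "aawaj-profile-42",
        "service": "aawaj",
        "contentType": "profile",
        "dbId": 42,
    }]))
```

Several documents can share one `<add>` message, which is usually cheaper than one request per document.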
2. <commit>
Commit multiple documents at once, rather than after every add.
$this->solr->commit();
3. <optimize>
Optimize the index to improve search performance.
$this->solr->optimize();
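The add, commit, optimize sequence can be sketched as one small loop: send every add message, then commit once, and optimize only occasionally because it is comparatively expensive. In the Python sketch below, `post` is a hypothetical callable that delivers one XML update message to Solr (for example an HTTP POST to the update handler); passing it in keeps the sketch runnable without a server. `<commit/>` and `<optimize/>` are the XML update messages Solr accepts for these operations.

```python
def run_update_cycle(add_messages, post, optimize=False):
    """Post each <add> message, then a single <commit/>.

    `post` is an assumed callable taking one XML message string.
    Set optimize=True to also send <optimize/> after the commit.
    """
    for message in add_messages:
        post(message)
    post("<commit/>")  # one commit makes all added documents searchable
    if optimize:
        post("<optimize/>")


if __name__ == "__main__":
    sent = []  # stand-in for the network: just record what would be sent
    run_update_cycle(["<add><doc/></add>"], sent.append, optimize=True)
    print(sent)
```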
SOLR QUERY SYNTAXES
QUERY SYNTAXES (RDBMS)
SELECT * FROM post WHERE (topic LIKE '%apache%' OR author LIKE '%kabir%')
OR (topic LIKE '%solr%' OR author LIKE '%frank%') ORDER BY id DESC
QUERY SYNTAXES (SOLR)
topic:"The Right Way" AND author:WrongGuy
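A fielded query like the one above is easy to assemble programmatically. The Python sketch below quotes a value as a phrase when it contains whitespace and joins clauses with AND. The helper names `field_clause` and `all_of` are hypothetical, and the sketch deliberately does not escape special characters inside single-term values.

```python
def field_clause(field, value):
    """Return field:value, quoting the value as a phrase if it has whitespace."""
    if any(ch.isspace() for ch in value):
        # Escape backslashes and double quotes inside the phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'{field}:"{escaped}"'
    return f"{field}:{value}"


def all_of(*clauses):
    """Combine clauses so that every one of them must match."""
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(all_of(field_clause("topic", "The Right Way"),
                 field_clause("author", "WrongGuy")))
```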
BOOSTING TERMS (SOLR)
topic:"jakarta apache"^4 "Apache Lucene"
FUZZY SEARCH (SOLR)
topic:roam~
Matches terms similar in spelling to "roam", such as "foam" and "roams", based on the Levenshtein distance (edit distance) algorithm.
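To make the edit-distance idea concrete, here is a small Python sketch of the classic dynamic-programming Levenshtein distance. It only illustrates the measure; it is not how Lucene implements fuzzy matching internally.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for word in ("foam", "roams"):
        print(word, levenshtein("roam", word))
```

Both "foam" and "roams" are one edit away from "roam", which is why a fuzzy search for roam~ can match them.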
PROXIMITY SEARCH (SOLR)
"jakarta apache"~10
Searches for "apache" and "jakarta" within 10 words of each other in a document.
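The idea behind a proximity search can be sketched in a few lines of Python: record the positions of each term and check whether any pair lies close enough together. This is a simplified model that assumes whitespace-tokenized text and measures distance as a plain difference in word positions; Lucene's actual slop calculation is more involved. The helper name `within_distance` is hypothetical.

```python
def within_distance(tokens, first, second, max_gap):
    """Return True if `first` and `second` occur at most `max_gap`
    word positions apart anywhere in `tokens`."""
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= max_gap
               for i in first_positions
               for j in second_positions)


if __name__ == "__main__":
    text = "the apache software foundation hosts the jakarta project"
    tokens = text.split()
    print(within_distance(tokens, "jakarta", "apache", 10))
    print(within_distance(tokens, "jakarta", "apache", 3))
```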
SO, NOW, CAN I MAKE A MINI GOOGLE?
YES, YOU CAN!
- Apache Nutch is already there
- Open source web-search software project
- Builds on Lucene and uses Solr for indexing and search
INTERESTED? READ MORE…
- http://lucene.apache.org/solr/
- http://wiki.apache.org/solr
- http://lucene.apache.org/java/docs/scoring.html
- http://lucene.apache.org/java/3_5_0/queryparsersyntax.html
- http://www.slideshare.net/erikhatcher/solr-search-at-the-speed-of-light
- http://www.slideshare.net/pittaya/using-apache-solr
WHO AM I…
Murshed Ahmmad Khan
Head of development, http://www.usamurai.com
@usamurai
THANKS…
Questions?