in search of... integrating site search

Post on 17-May-2015


DESCRIPTION

Presentation from PHP UK 2010. Despite being a key method of navigation on many sites, search functionality often gets the short end of the stick in development, either by handing the job over to Google or just enabling full text search on the appropriate column in the database. In this talk we will look at how full text search actually works, how to integrate local text search engines into your PHP application, and how it's possible to actually provide better and more relevant results than Google itself, at least for your own site.

TRANSCRIPT

Ian Barber
@ianbarber

http://phpir.com
ian@ibuildings.com

http://joind.in/talk/view/1462

In Search Of... integrating site search

what do you want?

How Search Works
Integrating Search
Improving Results
Using Search
Search Performance
Questions

[Diagram: documents pass through an analyser into the index; queries pass through a query parser, are matched against the index, and produce results]

Tokenisation

“With AT&T’s help, the F.B.I Miami-Dade office had recovered $1.1 million from O’Healy’s Ponzi scheme, 10-15% more than expected.”

PHP Tokenisation

function tokenise($string) {
    $string = strtolower($string);
    // PREG_OFFSET_CAPTURE makes each match a (token, byte offset) pair
    preg_match_all('/\w+/', $string, $matches, PREG_OFFSET_CAPTURE);
    return $matches[0];
}

Document Term Pairs

Document ID   Term
1             the
1             best
1             of
1             the
...           ...
204           and
204           what
204           would

Inverted Index

Term    Documents
best    1 (4, 16), 4 (422), 129 (344) ...
what    24 (50, 98), 75 (33, 208) ...
would   99 (32, 599), 201 (344) ...
...     ...
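As a rough illustration of how the term pairs turn into an inverted index, the sketch below feeds two made-up documents through the slide's tokenise() function. The buildIndex() helper and the sample texts are assumptions for illustration, not part of the talk.

```php
<?php
// Same tokeniser as on the slide: lower-case, split on word characters,
// and keep each token's byte offset.
function tokenise($string) {
    $string = strtolower($string);
    preg_match_all('/\w+/', $string, $matches, PREG_OFFSET_CAPTURE);
    return $matches[0];
}

// Hypothetical helper: turn docID => text into term => docID => offsets.
function buildIndex(array $documents) {
    $index = array();
    foreach ($documents as $docID => $text) {
        foreach (tokenise($text) as $token) {
            list($term, $offset) = $token;
            $index[$term][$docID][] = $offset;
        }
    }
    ksort($index); // keep the terms in sorted order
    return $index;
}

$index = buildIndex(array(
    1 => 'The best of the best',
    2 => 'What would the best do',
));

// "best" occurs in document 1 at offsets 4 and 16, and in document 2 at 15.
print_r($index['best']);
```

Storing offsets alongside document IDs is what makes phrase and proximity matching possible later; an index that only needs boolean matching could drop them.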

Boolean Query Merge

Query: Best Western Hotel

best      1   4     129   298   305   338
western   4   95    194   204   298   305
working   4   298   305
hotel     2   40    200   298   355   402

Result: Document 298
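The merge above can be sketched as a step-wise intersection of sorted postings lists. This is a minimal sketch using the document IDs from the slide; the function names are hypothetical and it assumes at least one postings list is passed in.

```php
<?php
// Intersect two sorted postings lists by walking both in step.
function intersect(array $a, array $b) {
    $result = array();
    $i = $j = 0;
    while ($i < count($a) && $j < count($b)) {
        if ($a[$i] == $b[$j]) {
            $result[] = $a[$i];
            $i++;
            $j++;
        } elseif ($a[$i] < $b[$j]) {
            $i++;
        } else {
            $j++;
        }
    }
    return $result;
}

// AND query: fold the intersection over every term's postings list.
function booleanAnd(array $postings) {
    $working = array_shift($postings);
    foreach ($postings as $list) {
        $working = intersect($working, $list);
    }
    return $working;
}

$best    = array(1, 4, 129, 298, 305, 338);
$western = array(4, 95, 194, 204, 298, 305);
$hotel   = array(2, 40, 200, 298, 355, 402);

print_r(intersect($best, $western));                 // the working set: 4, 298, 305
print_r(booleanAnd(array($best, $western, $hotel))); // 298
```

Because both lists are sorted, each intersection is a single pass, which is why postings are kept in document ID order.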


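TF-IDF

// $term is the postings for one term (docID => list of positions),
// $total is the number of documents in the collection
function getWeight($docID, $term, $total) {
    $tf = count($term[$docID]);              // occurrences in this document
    $idf = log($total / count($term), 2);    // rarer terms score higher
    return $tf * $idf;
}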
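A small worked example may help show how the weight behaves. The collection size and postings below are invented for illustration; only getWeight() comes from the slide.

```php
<?php
// As on the slide: $term is one term's postings (docID => positions),
// $total is the number of documents in the collection.
function getWeight($docID, $term, $total) {
    $tf  = count($term[$docID]);
    $idf = log($total / count($term), 2);
    return $tf * $idf;
}

$total = 8;                                       // hypothetical collection size
$bacon = array(1 => array(2, 9), 2 => array(2));  // appears in 2 of 8 documents
$the   = array_fill(1, 8, array(0));              // appears in all 8 documents

echo getWeight(1, $bacon, $total), "\n"; // tf 2 * log2(8/2), about 4
echo getWeight(2, $bacon, $total), "\n"; // tf 1 * log2(8/2), about 2
echo getWeight(1, $the, $total), "\n";   // tf 1 * log2(8/8) = 0
```

A term that occurs in every document ends up with a weight of zero, so very common words drop out of the ranking without needing an explicit stop list.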

Document Vector

        socket   what   heavy   steel   ...
Doc 1   0.02     0.3    0.001   0       ...
Doc 2   0        0      0       0       ...
Doc 3   0.001    0.2    0       0       ...
Doc 4   0        0      0.002   0.003   ...

Ranked Query Merge

best      23      42      179     246     333     703
weight    0.008   0.002   0.023   0.039   0.014   0.001

western   42      88      120     179     246     798
weight    0.003   0.004   0.023   0.001   0.034   0.004

1 - 246: 0.073
2 - 179: 0.024
3 - 120: 0.023

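PHP Similarity

function score($queryString, $index) {
    $query = tokenise($queryString);
    $matches = array();
    foreach ($query as $qterm) {
        // tokenise() returns (term, offset) pairs; look up the term itself
        $term = $qterm[0];
        if (!isset($index[$term])) {
            continue; // term is not in the index
        }
        foreach ($index[$term] as $id => $posting) {
            if (!isset($matches[$id])) {
                $matches[$id] = 0;
            }
            $matches[$id] += $posting['score'];
        }
    }
    arsort($matches); // sorts in place; the return value is only a success flag
    return $matches;
}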
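To check the ranked merge against the numbers on the earlier slide, here is a self-contained sketch. The array layout of the index and the postings() helper are assumptions, and the tokeniser is simplified to return plain terms without offsets.

```php
<?php
// Simplified tokeniser: plain lower-cased terms, no offsets.
function tokenise($string) {
    preg_match_all('/\w+/', strtolower($string), $matches);
    return $matches[0];
}

// Sum each document's per-term weights, then sort highest first.
function score($queryString, array $index) {
    $matches = array();
    foreach (tokenise($queryString) as $term) {
        if (!isset($index[$term])) {
            continue;
        }
        foreach ($index[$term] as $id => $posting) {
            $current = isset($matches[$id]) ? $matches[$id] : 0;
            $matches[$id] = $current + $posting['score'];
        }
    }
    arsort($matches);
    return $matches;
}

// Hypothetical helper: pair up document IDs with their weights.
function postings(array $ids, array $weights) {
    $out = array();
    foreach ($ids as $k => $id) {
        $out[$id] = array('score' => $weights[$k]);
    }
    return $out;
}

$index = array(
    'best'    => postings(array(23, 42, 179, 246, 333, 703),
                          array(0.008, 0.002, 0.023, 0.039, 0.014, 0.001)),
    'western' => postings(array(42, 88, 120, 179, 246, 798),
                          array(0.003, 0.004, 0.023, 0.001, 0.034, 0.004)),
);

$ranked = score('Best Western', $index);
foreach (array_slice($ranked, 0, 3, true) as $id => $weight) {
    printf("%d: %.3f\n", $id, $weight); // 246: 0.073, 179: 0.024, 120: 0.023
}
```

Unlike the boolean merge, a document only needs to match one query term to appear; matching more terms simply adds to its score.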

Integrating Search

MySQL Full Text Search

CREATE TABLE example (
    id INT(11) NOT NULL auto_increment,
    title VARCHAR(255),
    content TEXT,
    PRIMARY KEY(id),
    FULLTEXT(title,content)
) Engine=MyISAM;

INSERT INTO example (title, content) VALUES
    ('Mikko & Bacon','Mikko loves bacon'),
    ('Marcello & Bacon','Marcello hates bacon'),
    ('Jo & Sausages','Johanna loves sausages'),
    ('Hollywood & Garlic','Lorenzo hates garlic'),
    ('James & Cheddar','James is keen on cheeses');

MySQL FTI Query

SELECT * FROM example WHERE MATCH(title,content) AGAINST('loves bacon');

+----+------------------+------------------------+
| id | title            | content                |
+----+------------------+------------------------+
|  1 | Mikko & Bacon    | Mikko loves bacon      |
|  2 | Marcello & Bacon | Marcello hates bacon   |
|  3 | Jo & Sausages    | Johanna loves sausages |
+----+------------------+------------------------+
3 rows in set (0.00 sec)

Looking At The Index

/var/lib/mysql/fttest# myisam_ftdump example 1

Total rows: 5
Total words: 17
Unique words: 14
Longest word: 9 chars (hollywood)
Median length: 5
Average global weight: 1.176117
Most common word: 2 times, weight: 0.405465 (bacon)

Sphinx http://www.sphinxsearch.com

Sphinx Configuration

source posts
{
    type     = mysql
    sql_host = localhost
    sql_user = user
    sql_pass = password
    sql_db   = search

    sql_query = \
        SELECT id, title, content FROM example;
    sql_attr_multi = uint tag from query; \
        SELECT example_id, tag_id FROM tags;
}

index posts
{
    source     = posts
    path       = /var/data/sphinx/example
    morphology = stem_en

    min_word_len   = 3
    min_prefix_len = 3
    min_infix_len  = 0
    enable_star    = 1
}

Stemming

happening -> happen
happened -> happen
happens -> happen

http://tartarus.org/~martin/PorterStemmer
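To make the idea concrete, here is a deliberately naive suffix stripper. It is not the Porter algorithm linked above, only a toy with a made-up name that happens to handle the three example words; a real stemmer applies many more rules and conditions.

```php
<?php
// Toy suffix stripper: NOT the Porter algorithm, just enough to show the idea.
function naiveStem($word) {
    $word = strtolower($word);
    foreach (array('ing', 'ed', 's') as $suffix) {
        $stemLength = strlen($word) - strlen($suffix);
        // Only strip when a reasonably long stem would be left behind.
        if ($stemLength >= 3 && substr($word, -strlen($suffix)) === $suffix) {
            return substr($word, 0, $stemLength);
        }
    }
    return $word;
}

foreach (array('happening', 'happened', 'happens') as $word) {
    echo $word, ' -> ', naiveStem($word), "\n"; // each maps to "happen"
}
```

The same function has to be applied to both the indexed text and the query terms, otherwise a search for "happens" would never meet the stored stem.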

Command Line Searching

indexer --config /etc/sphinx.conf --all
search --config /etc/sphinx.conf love bacon

displaying matches:
1. document=1, weight=3, tag=(1,2)
    id=1
    title=Mikko & Bacon
    content=Mikko loves bacon
words:
1. 'love': 2 documents, 2 hits
2. 'bacon': 2 documents, 4 hits

searchd --config /etc/sphinx.conf

Sphinx From PHP

$cl = new SphinxClient();
$cl->SetServer('localhost', 3312);
$cl->SetMatchMode(SPH_MATCH_ANY);

$result = $cl->Query('bac*');
$docIDs = array_keys($result["matches"]);

$cl->SetFilter('tag', array(1));
$result = $cl->Query('bac*');
$docIDs = array_keys($result["matches"]);

Swish-E
http://swish-e.org

pecl install swish-beta

Filesystem Index With Swish-E

fs-swish-e.conf

IndexDir /var/data/documents
IndexFile fs-swish-e.index
IndexOnly .doc .docx .pdf
FuzzyIndexingMode Stemming_en1

FileFilter .pdf /usr/local/bin/swish_filter.pl
FileFilter .doc /usr/local/bin/swish_filter.pl

/usr/local/bin/swish-e -S fs -c fs-swish-e.conf

Crawling Content

www-swish-e.conf

IndexDir /usr/local/lib/swish-e/spider.pl
IndexFile www-swish-e.index
SwishProgParameters default http://phpir.com/

FuzzyIndexingMode Stemming_en1
DefaultContents HTML

/usr/local/bin/swish-e -S prog -c www-swish-e.conf

Swish-E With Multiple Indices

$swish = new Swish('www-swish-e.index fs-swish-e.index');
$search = $swish->prepare();

$queryStr = 'search string goes here';
$result = $search->execute($queryStr);
$total = $result->hits;

while ($r = $result->nextResult()) {
    echo $r->swishdocpath; // url
}

Lucene

Build Index

$index = Zend_Search_Lucene::create('idx');
foreach ($documents as $title => $content) {
    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(
        Zend_Search_Lucene_Field::Text('title', $title));
    $doc->addField(
        Zend_Search_Lucene_Field::UnStored('content', $content));
    $index->addDocument($doc);
}

Query Zend Search Lucene

$results = $index->find('loves bacon');
foreach ($results as $result) {
    echo $result->score, " ";
    echo $result->title, "\n";
}

Output:
0.81656279309067 Mikko and Bacon
0.24800278854758 Marcello & Bacon

Index HTML

$file = file_get_contents($url);

$doc = Zend_Search_Lucene_Document_Html::loadHTML($file);

$doc->addField(
    Zend_Search_Lucene_Field::Text('url', $url));
$index->addDocument($doc);

Solr http://lucene.apache.org/solr/

Solr Search Index

$options = array(
    'hostname' => 'localhost',
    'port' => 8983
);

$client = new SolrClient($options);
$doc = new SolrInputDocument();
$doc->addField('id', $id);
$doc->addField('cat', $category);
$doc->addField('title', $title);
$doc->addField('text', $text);
$response = $client->addDocument($doc);
$client->commit();

Solr Search Client

$client = new SolrClient($options);

$query = new SolrQuery('bacon');
$response = $client->query($query);
$r = $response->getResponse();

foreach ($r['response']['docs'] as $d) {
    echo $d->title[0] . "\n";
}

Xapian

http://xapian.org

Xapian In PHP

$db = new XapianWritableDatabase(
    'idx', Xapian::DB_CREATE_OR_OPEN);
$i = new XapianTermGenerator();
$i->set_stemmer(new XapianStem("english"));

$doc = new XapianDocument();
$doc->set_data($content);
$doc->add_value(1, $title);

$i->set_document($doc);
$i->index_text($content);
$db->add_document($doc);

Xapian Search In PHP

$database = new XapianDatabase('idx');
$enquire = new XapianEnquire($database);
$qp = new XapianQueryParser();
$qp->set_stemmer(new XapianStem("english"));
$qp->set_database($database);
$qp->set_stemming_strategy(
    XapianQueryParser::STEM_SOME);
$query = $qp->parse_query($queryString);

$enquire->set_query($query);

$matches = $enquire->get_mset(0, 10);

$i = $matches->begin();
while (!$i->equals($matches->end())) {
    $n = $i->get_rank() + 1;
    $data = $i->get_document()->get_data();
    $title = $i->get_document()->get_value(1);
    $score = $i->get_percent();
    $i->next();
}

Improving Results

Anchor Text

Parse Anchor Text

$p = file_get_contents('http://phpir.com');

libxml_use_internal_errors(true);
// loadHTML() is an instance method; calling it statically fails on current PHP
$dom = new DOMDocument();
$dom->loadHTML($p);
$links = $dom->getElementsByTagName('a');

foreach ($links as $link) {
    $href = $link->getAttribute('href');
    $text = $link->nodeValue;
}
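The loop above only reads each link. One possible next step, sketched here against an inline HTML string rather than a live page, is to group the anchor text by target URL so it can be indexed as extra text for the page it points at. The anchorText() helper and the sample markup are assumptions.

```php
<?php
// Collect anchor text per target URL from an HTML string.
function anchorText($html) {
    libxml_use_internal_errors(true); // tolerate real-world markup
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $anchors = array();
    foreach ($dom->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        $text = trim($link->nodeValue);
        if ($href !== '' && $text !== '') {
            $anchors[$href][] = $text;
        }
    }
    return $anchors;
}

$html = '<p><a href="/tfidf">TF-IDF weighting</a> and '
      . '<a href="/tfidf">term weights</a>, '
      . '<a href="/stem">stemming</a></p>';

$anchors = anchorText($html);
print_r($anchors); // "/tfidf" collects two descriptions, "/stem" one
```

Anchor text is useful because it records how other pages describe a document, which is often closer to what people type into a search box than the document's own wording.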

Zone Weighting

ZSL Zone Weighting

$doc = new Zend_Search_Lucene_Document();

$tfield = Zend_Search_Lucene_Field::Text('title', $title);
$tfield->boost = 1.3;
$doc->addField($tfield);

$doc->addField(
    Zend_Search_Lucene_Field::UnStored('content', $content));

$index->addDocument($doc);
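The effect of a zone boost can be sketched without any search library. The scorer below is a simplification with hypothetical names: it just multiplies each zone's raw term count by that zone's boost, which is not how Lucene computes its scores but shows why a title match can outweigh a body match.

```php
<?php
// Count occurrences of a term in a piece of text.
function termCount($term, $text) {
    preg_match_all('/\w+/', strtolower($text), $m);
    return count(array_keys($m[0], strtolower($term)));
}

// Hypothetical scorer: each zone's term count is scaled by that zone's boost.
function zoneScore($term, array $doc, array $boosts) {
    $score = 0.0;
    foreach ($boosts as $zone => $boost) {
        $score += $boost * termCount($term, $doc[$zone]);
    }
    return $score;
}

$boosts = array('title' => 1.3, 'content' => 1.0);
$a = array('title' => 'Mikko & Bacon', 'content' => 'Mikko loves bacon');
$b = array('title' => 'Breakfast',     'content' => 'Bacon, then more bacon');

echo zoneScore('bacon', $a, $boosts), "\n"; // one title hit + one body hit, about 2.3
echo zoneScore('bacon', $b, $boosts), "\n"; // two body hits, 2
```

Even though document b mentions the term as often as document a, the title hit lifts a above it.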

Document Authority

Document Weights in ZSL

$doc = new Zend_Search_Lucene_Document();
$doc->addField(
    Zend_Search_Lucene_Field::Text('title', $title));
$doc->addField(
    Zend_Search_Lucene_Field::UnStored('content', $content));

$doc->boost = 1 + ($numComments / 100);

$index->addDocument($doc);
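As a quick illustration of what that boost formula does to a ranking, the sketch below applies it to two invented documents. The relevance scores and comment counts are made up, and multiplying relevance by the boost is a simplification of what the library does internally.

```php
<?php
// Boost as on the slide: 100 comments doubles a document's weight.
function authorityBoost($numComments) {
    return 1 + ($numComments / 100);
}

// Hypothetical relevance scores and comment counts for two documents.
$docs = array(
    'quiet-post'   => array('relevance' => 0.50, 'comments' => 0),
    'popular-post' => array('relevance' => 0.40, 'comments' => 50),
);

$ranked = array();
foreach ($docs as $id => $doc) {
    $ranked[$id] = $doc['relevance'] * authorityBoost($doc['comments']);
}
arsort($ranked);

print_r($ranked); // popular-post (about 0.6) now outranks quiet-post (0.5)
```

The divisor controls how strongly authority can override text relevance, so it is worth tuning against real queries.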

Using Search

Summaries & Highlighting

Sphinx Extract & Highlight

$cl = new SphinxClient();
$cl->SetServer("localhost", 3312);
$q = 'bacon';
$r = $cl->Query($q);
$text = array();
foreach ($r["matches"] as $doc => $info) {
    $text[$doc] = getTextFromDB($doc);
}

$e = $cl->BuildExcerpts($text, 'posts', $q);
foreach ($e as $extract) {
    echo $extract;
}
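For a sense of what an excerpt builder does, here is a minimal pure-PHP sketch. It is not how Sphinx implements BuildExcerpts; the excerpt() helper is hypothetical, works on bytes rather than characters, and may cut words in half at the window edges.

```php
<?php
// Hypothetical helper: cut a window of text around the first match of $term
// and wrap every match inside that window in <b> tags.
function excerpt($text, $term, $radius = 20) {
    $pos = stripos($text, $term);
    if ($pos === false) {
        return substr($text, 0, $radius * 2); // no match: plain lead-in
    }
    $start  = max(0, $pos - $radius);
    $window = substr($text, $start, $pos - $start + strlen($term) + $radius);
    $end    = $start + strlen($window);

    $marked = preg_replace(
        '/' . preg_quote($term, '/') . '/i', '<b>$0</b>', $window);

    return ($start > 0 ? '... ' : '')
         . $marked
         . ($end < strlen($text) ? ' ...' : '');
}

echo excerpt('Mikko loves bacon', 'bacon'), "\n";
// Mikko loves <b>bacon</b>
echo excerpt('Marcello hates bacon but Mikko loves bacon very much', 'bacon', 10), "\n";
// ... llo hates <b>bacon</b> but Mikko ...
```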

Xapian Spelling Correction

Indexer

$indexer = new XapianTermGenerator();
$indexer->set_database($database);
$indexer->set_flags(
    XapianTermGenerator::FLAG_SPELLING);

Searcher

$queryString = "strreplace or str_cmp";
$q = new XapianQueryParser();
$q->set_database($database);
$query = $q->parse_query($queryString,
    XapianQueryParser::FLAG_SPELLING_CORRECTION);
echo "Did you mean: " . $q->get_corrected_query_string() . "\n";

Spelling Correction Output

php xapsearch.php

Did you mean: str_replace or strcmp

4644 results found for “strreplace or str_cmp”:
1: 2% docid=572 [phpdocs/html/cc.license.html]
2: 2% docid=7169 [phpdocs/html/imagick.constants.html]
3: 2% docid=10086 [phpdocs/html/sqlite3result.fetcharray.html]
4: 2% docid=6132 [phpdocs/html/function.swf-posround.html]
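The general idea behind a "did you mean" suggestion can be sketched with PHP's built-in levenshtein() and a small vocabulary. This is a stand-in technique for illustration, not a description of Xapian's internals; the vocabulary and both helper functions are assumptions.

```php
<?php
// Suggest the closest vocabulary term within $maxDistance edits,
// or return the word unchanged when nothing is close enough.
function suggest($word, array $vocabulary, $maxDistance = 2) {
    $best = $word;
    $bestDistance = $maxDistance + 1;
    foreach ($vocabulary as $candidate) {
        $distance = levenshtein($word, $candidate);
        if ($distance < $bestDistance) {
            $best = $candidate;
            $bestDistance = $distance;
        }
    }
    return $best;
}

// Correct each whitespace-separated word of the query independently.
function correctQuery($queryString, array $vocabulary) {
    $words = explode(' ', $queryString);
    foreach ($words as $i => $word) {
        $words[$i] = suggest($word, $vocabulary);
    }
    return implode(' ', $words);
}

// Hypothetical vocabulary; in practice this would come from the indexed terms.
$vocabulary = array('str_replace', 'strcmp', 'str_repeat', 'or');

echo 'Did you mean: ', correctQuery('strreplace or str_cmp', $vocabulary), "\n";
// Did you mean: str_replace or strcmp
```

Scanning the whole vocabulary per word is fine for a demo but too slow for a large index, which is one reason to let the search engine maintain its own spelling data.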

Results Sorting

Sorting in ZSL

$q = Zend_Search_Lucene_Search_QueryParser::parse('search string');

$results = $index->find($q, 'title');
foreach ($results as $result) {
    echo '<h3>', $result->title, "</h3>\n";
    $doc = getDocumentFromDB($result->did);
    echo $q->htmlFragmentHighlightMatches($doc);
}

Faceted Search

Faceted Search In Solr

$client = new SolrClient($options);
$query = new SolrQuery('bacon');
// Facet options must be set before the query is sent
$query->setFacet(true);
$query->addFacetField('cat');
$response = $client->query($query);
$r = $response->getResponse();
$f = $r['facet_counts']['facet_fields'];
foreach ($f['cat'] as $facet => $count) {
    echo $facet . " " . $count . "\n";
}
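What the facet counts represent can be shown with plain arrays. The sketch below counts category values over an invented result set; the documents and the facetCounts() helper are assumptions, and a real engine computes these counts inside the index rather than over fetched results.

```php
<?php
// Count how many matching documents fall into each value of a field.
function facetCounts(array $docs, $field) {
    $counts = array_count_values(array_column($docs, $field));
    arsort($counts); // most common facet first
    return $counts;
}

// Hypothetical result set; 'cat' mirrors the category field on the slide.
$docs = array(
    array('title' => 'Mikko & Bacon',    'cat' => 'food'),
    array('title' => 'Marcello & Bacon', 'cat' => 'food'),
    array('title' => 'Bacon numbers',    'cat' => 'maths'),
);

foreach (facetCounts($docs, 'cat') as $facet => $count) {
    echo $facet . " " . $count . "\n"; // food 2, then maths 1
}
```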

More Like This

$rset = new XapianRset();
$rset->add_document(5959); // str_replace
$e = $enquire->get_eset(40, $rset);

for ($t = $e->begin(); !$t->equals($e->end()); $t->next()) {
    $qs[] = new XapianQuery($t->get_term(),
        intval($t->get_weight()));
}

$query = new XapianQuery(
    XapianQuery::OP_OR, $qs);

More Like This Example

php xapsim.php

1656 results found:
1: 100% docid=5959 [phpdocs/html/function.str-replace.html]
2: 47% docid=5956 [phpdocs/html/function.str-ireplace.html]
3: 24% docid=5328 [phpdocs/html/function.preg-replace.html]
4: 18% docid=5958 [phpdocs/html/function.str-repeat.html]

Search Performance

Index Updates

[Diagram: new documents are added to a small delta index alongside the main index; queries search both delta and main; the delta is then merged into main]
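The main plus delta scheme can be sketched with two arrays standing in for the indexes. Everything here is a hypothetical in-memory model, assuming each index maps document IDs to a score for the current query; real engines merge on disk.

```php
<?php
// Search both indexes. Delta entries win, so a re-indexed document
// replaces its stale copy in main.
function searchBoth(array $main, array $delta) {
    $merged = $delta + $main; // array union keeps the left-hand value per key
    arsort($merged);
    return $merged;
}

// Fold the delta into main and empty it, as a periodic merge would.
function mergeDelta(array &$main, array &$delta) {
    $main  = $delta + $main;
    $delta = array();
}

$main  = array(1 => 0.2, 2 => 0.5);
$delta = array(2 => 0.1, 3 => 0.4); // doc 2 was re-indexed, doc 3 is new

$results = searchBoth($main, $delta);
print_r($results); // 3 => 0.4, 1 => 0.2, 2 => 0.1

mergeDelta($main, $delta);
```

Keeping the delta small means new content becomes searchable quickly, while the expensive rebuild of the main index can happen on a slower schedule.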

Search Speed

Zend Search Lucene

$index = Zend_Search_Lucene::open('index');
$index->optimize();

Sphinx

indexer --merge main delta --rotate

Solr

$client = new SolrClient($options);
$client->optimize();

Xapian

xapian-compact xapindex xapindex2

Distributing Search

[Diagram: the application queries several indexes, with the documents split between them]

Large Scale Search

http://www.nutch.org

http://hadoop.apache.org


Image Credits

Title: http://www.flickr.com/photos/generated/2084287794/
What Do You Want: http://www.flickr.com/photos/the_justified_sinner/2498066986/
You Are Here: http://www.flickr.com/photos/alecvuijlsteke/2692475420/
Integrating Search: http://www.flickr.com/photos/squeaks2569/3700355684/
Sphinx: http://www.flickr.com/photos/generated/2084287794/
Lucene: http://www.flickr.com/photos/mypanda/7731447/
Swish-e: http://www.flickr.com/photos/ryan_fung/2239687100/
Solr: http://www.flickr.com/photos/m-j-s/2724756177/
Xapian: http://www.flickr.com/photos/olibac/3522056495/
Using Search: http://www.flickr.com/photos/eneas/175027945/
Improving Search: http://www.flickr.com/photos/x-ray_delta_one/3928200642/
Search Performance: http://www.flickr.com/photos/maisonbisson/1634408/
Large Scale Search: http://www.flickr.com/photos/zedzap/3663508847/

Questions?


Thank you!
