justia and amazon cloudsearch

Nephelococcygia (noun) The act of searching for shapes in clouds Nick Moline Justia’s Cloud Farmer @NickMoline http://www.nick.pro/ Guillermo Balboa Software Developer

Upload: nick-moline

Post on 27-May-2015


TRANSCRIPT

Page 1: Justia and Amazon CloudSearch

Nephelococcygia (noun): The act of searching for shapes in clouds

Nick Moline
Justia’s Cloud Farmer
@NickMoline
http://www.nick.pro/

Guillermo Balboa
Software Developer

Page 2: Justia and Amazon CloudSearch
Page 3: Justia and Amazon CloudSearch

Search Engines we’ve tried…

• Google Mini / Search Appliance
• Google Custom Search Engine
• Sphinx
• Apache SOLR
• Amazon CloudSearch

Page 4: Justia and Amazon CloudSearch

Google Mini

Pros
• Really simple to set up (for web pages or documents)
• Can run inside your firewall
• Great highlighting and snippet generation
• Creates “Cached Version” even of PDFs
• Pugs love them

Cons
• No discrete field searching (other than things like title:)
• Physical branded box to install and support
• Very limited control over look / feel of results
• No Geospatial
• No JSON version
• Discontinued

Page 5: Justia and Amazon CloudSearch

Google Custom Search Engine

Pros
• Really simple to set up (for web pages or documents)
• If your site is indexed, no wait time to get started
• Great highlighting and snippet generation

Cons
• No discrete field searching (other than things like title:)
• Minimal control over new content getting indexed on your terms
• Very limited control over look / feel of results
• With JSON/XML version can only return 4 or 8 results at a time
• No Geospatial

Page 6: Justia and Amazon CloudSearch

Sphinx

Pros
• Very fast for searching
• Full control of when content is indexed
• Good Geospatial Search built in
• Newer versions can be connected to with MySQL libraries and queried like a DB
• Filters & Faceting
• Field Boosting!

Cons
• Very slow for indexing
• Requires reindexing ALL content, every time
• Doesn’t return any of the textual content, so ALWAYS requires a separate database query
• Filters & Faceting work only on numeric fields
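When a search engine returns only document IDs, every result page needs a second lookup against the content database, and the rows have to be put back into the engine's ranking order. The following is a minimal sketch of that second step, not the presenters' code: it uses Python's built-in `sqlite3` as a stand-in for the real database, and the `documents` table, its columns, and the function name `fetch_in_rank_order` are all hypothetical.

```python
import sqlite3
from typing import Dict, List, Sequence


def fetch_in_rank_order(conn: sqlite3.Connection, ids: Sequence[int]) -> List[Dict[str, object]]:
    """Load the rows for `ids` and return them in the same order as `ids`.

    `ids` is assumed to be the ranked ID list returned by the search engine.
    The `documents` table and its columns are hypothetical.
    """
    if not ids:
        return []
    # One placeholder per ID keeps the query parameterized.
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title, body FROM documents WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall()
    by_id = {row[0]: {"id": row[0], "title": row[1], "body": row[2]} for row in rows}
    # IN (...) does not preserve order, so re-sort by the engine's ranking;
    # IDs missing from the database (stale index entries) are skipped.
    return [by_id[i] for i in ids if i in by_id]
```

The extra round trip is the cost this slide lists as a con: the index stays small, but no result page can be rendered from the search response alone.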

Page 7: Justia and Amazon CloudSearch

Apache SOLR

Pros
• Very extendable and configurable
• Full control of when content is indexed
• Geospatial with “LocalSOLR” plugin
• Returns content
• Does highlighting
• Tons of Faceting options
• Sharding and Cores
• Field Boosting
• Document Boosting!

Cons
• Very difficult to optimize performance
• Adding content
• The more content you return, the slower it gets
• Highlighting performs poorly
• Sharding and Cores are, again, hard to optimize

Page 8: Justia and Amazon CloudSearch

Amazon CloudSearch

Pros
• Extremely fast
• Automatically Scales (no thinking)
• Automatically Shards when adding content
• Easy Re-indexing of content
• Returns content for creating snippets
• Easy JSON implementation

Cons
• No control of the scaling
• No geo (yet)
• No highlighting/snippet gen (yet)
• No field boosting (yet)
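Because the engine returns the stored content but does no highlighting, snippets have to be built on the client side. The sketch below shows one simple way to do that in Python; it is an illustration under assumptions, not the presenters' implementation. The function name `make_snippet`, the `width` parameter, and the choice of `<b>` tags are all hypothetical.

```python
import html
import re


def make_snippet(text: str, term: str, width: int = 80) -> str:
    """Return an HTML-escaped excerpt around the first match of `term`.

    Every case-insensitive occurrence of `term` inside the excerpt is
    wrapped in <b>...</b>. If there is no match, the start of the text
    is returned instead.
    """
    match = re.search(re.escape(term), text, re.IGNORECASE) if term else None
    if match is None:
        return html.escape(text[:width]) + ("..." if len(text) > width else "")

    # Take roughly width/2 characters of context on each side of the match.
    start = max(0, match.start() - width // 2)
    end = min(len(text), match.end() + width // 2)
    excerpt = text[start:end]

    # Escape the text between matches separately so the <b> tags survive.
    parts = []
    last = 0
    for m in re.finditer(re.escape(term), excerpt, re.IGNORECASE):
        parts.append(html.escape(excerpt[last:m.start()]))
        parts.append("<b>" + html.escape(m.group(0)) + "</b>")
        last = m.end()
    parts.append(html.escape(excerpt[last:]))

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(parts) + suffix
```

This matches literal substrings only; it knows nothing about stemming or the engine's tokenization, so a document can match the query without any term being highlighted.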

Page 9: Justia and Amazon CloudSearch
Page 10: Justia and Amazon CloudSearch

Getting Around Lack of Field Boosting

• Duplicate Word Mark field as both text and literal
• Do 4 Searches:
  • Exact Word Mark Match
    • bq=(and type:'trademark_case' literal_word_mark:'amazon')
  • Prefix Word Mark Match
    • bq=(and type:'trademark_case' (and (not literal_word_mark:'amazon') literal_word_mark:'amazon*'))
  • Anywhere Word Mark Match
    • bq=(and type:'trademark_case' (and (not literal_word_mark:'amazon*') word_mark:'amazon'))
  • Full Text Search
    • bq=(and type:'trademark_case' (and (not literal_word_mark:'amazon*') (not literal_word_mark:'amazon') (not word_mark:'amazon')))
• Pass counts with pagination links
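The four tiered queries above can be generated from a single search term, and the per-tier hit counts can then be used to decide which tier a given result page falls into. The Python sketch below illustrates both steps. It is not the presenters' code: the function names, the backslash escaping of quotes, and the idea that the tiers are simply concatenated in order are assumptions. The slide shows only the `bq` part of each request, so anything else sent with the full-text tier (such as a separate text query) is left out here.

```python
from typing import List, Tuple


def quote(value: str) -> str:
    """Single-quote a value, backslash-escaping quotes and backslashes (assumed escaping rule)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def tier_queries(term: str, doc_type: str = "trademark_case") -> List[Tuple[str, str]]:
    """Build the four mutually exclusive bq expressions, best tier first."""
    kind = f"type:{quote(doc_type)}"
    exact = f"literal_word_mark:{quote(term)}"
    prefix = f"literal_word_mark:{quote(term + '*')}"
    text = f"word_mark:{quote(term)}"
    return [
        ("exact", f"(and {kind} {exact})"),
        ("prefix", f"(and {kind} (and (not {exact}) {prefix}))"),
        ("anywhere", f"(and {kind} (and (not {prefix}) {text}))"),
        ("full_text", f"(and {kind} (and (not {prefix}) (not {exact}) (not {text})))"),
    ]


def plan_page(counts: List[int], page: int, page_size: int = 10) -> List[Tuple[int, int, int]]:
    """Map a 1-based page number to (tier_index, start, size) slices.

    `counts` holds the hit count of each tier, in tier order. The tiers
    are treated as one concatenated result list (an assumption).
    """
    offset = (page - 1) * page_size  # position in the concatenated list
    remaining = page_size
    slices = []
    for tier, count in enumerate(counts):
        if offset >= count:
            offset -= count  # this page starts after the whole tier
            continue
        take = min(count - offset, remaining)
        slices.append((tier, offset, take))
        remaining -= take
        offset = 0
        if remaining == 0:
            break
    return slices
```

With hypothetical counts of 3, 12, 40 and 500 hits, `plan_page([3, 12, 40, 500], 2)` returns `[(1, 7, 5), (2, 0, 5)]`: page 2 takes the last five prefix matches and the first five anywhere matches. Carrying the counts in the pagination links means later pages need only the queries for the tiers they actually display.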

Page 11: Justia and Amazon CloudSearch

We want more!

• Trademarks
  – Field Boosting would simplify greatly!
  – Snippet gen could make for nicer search snippets
• Law.justia.com
  – Using Google Custom Search now
  – Must have Snippet Generation
  – Must have Field Boosting
• Lawyers.justia.com
  – Using Sphinx right now
  – Must have Geospatial
  – Must have Field Boosting

Page 12: Justia and Amazon CloudSearch

Nick Moline
Justia’s Cloud Farmer
@NickMoline
http://www.nick.pro/

Guillermo BalboaSoftware Developer