lucandra

Lucandra

Lucene + Cassandra

http://github/tjake/Lucandra
http://twitter.com/tjake
Jake Luciani

What we'll cover today:

• Search use-cases
• Problems scaling and maintaining Lucene/Solr
• Cassandra
• Lucandra
• Lucandra in Action
• Q&A

Types of search apps:

Lucene/Solr Scaling Problems

• Writes are expensive on a live system
  o Merge, Reopen, Optimize, Sorting
• "Too many open files"
• Solr replication: too many moving parts
• Scaling writes requires client-side sharding
• Lots of grid management -> ZooKeeper?
• Backups? Monitoring? Failures? Ops Team? Oh my!
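To make the client-side sharding point concrete, here is a minimal sketch of the routing logic an application ends up owning. The shard names and the `shard_for` helper are hypothetical, and hashing the document id modulo the shard count is only one common scheme, not something the slides prescribe.

```python
import hashlib

# Hypothetical shard names; in practice these would be Solr/Lucene hosts.
SHARDS = ["solr-1", "solr-2", "solr-3"]


def shard_for(doc_id, shards):
    """Pick a shard by hashing the document id (assumed scheme)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def moved_after_resize(doc_ids, old_shards, new_shards):
    """Count documents whose shard changes when the shard list changes."""
    return sum(
        1 for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards)
    )


doc_ids = [f"doc-{i}" for i in range(1000)]
moved = moved_after_resize(doc_ids, SHARDS, SHARDS + ["solr-4"])
```

With modulo hashing, adding a shard reassigns a large share of the documents, which is part of why every client must agree on the scheme and why resizing is an operational chore.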

This sounds a lot like MySQL, doesn't it?

Cassandra - Love Child of BigTable and Dynamo

• Peer to peer (easy to add new nodes)
• CAP Configurable
• Multi-level TreeMap (sorta)
• Pluggable replication/sorting
• Writes are very fast!
• Low latency
• Integrates with Hadoop
• Major adoption and development
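"CAP Configurable" refers to choosing, per operation, how many replicas must acknowledge a read or a write. A minimal sketch of the usual arithmetic follows, assuming N replicas per key, R replicas consulted per read and W replicas acknowledging each write; the function names are hypothetical and this is not Cassandra client code.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas for one key."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n, r, w):
    """True when every read set overlaps every write set (R + W > N)."""
    return r + w > n


n = 3
q = quorum(n)
strong = reads_see_latest_write(n, q, q)   # quorum reads and quorum writes
fast = reads_see_latest_write(n, 1, 1)     # one replica each: lower latency
```

Quorum on both sides gives overlapping read and write sets, while reading and writing a single replica trades that guarantee for latency and availability.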

Cassandra's Data Model

{
  "bloghost.com" : //Keyspace
  {
    "Posts" : //ColumnFamily
    {
      "tjake.bloghost.com" : //Key
      { "20100426-Lucandra" : "lucandra talk today!" } //Columns
    }
  },
  {
    "Comments" : //SuperColumnFamily
    {
      "tjake.bloghost.com" : //Key
      {
        "20100426-Lucandra-1": //SuperColumn
        {"From" : "Otis","Comment": "Don't Suck!"}, //Columns
      },
      {
        "20100426-Lucandra-2": //SuperColumn
        {"From" : "Jake","Comment": "O.K."}, //Columns
      },
    }
  }
}
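The data model can be read as a map of maps. Here is a minimal sketch, assuming plain Python dicts stand in for the keyspace, column families and rows. The helpers `get_column` and `column_slice` are hypothetical names for illustration; a real cluster is reached through a client API and keeps columns sorted on disk.

```python
# Plain dicts standing in for Cassandra's nested maps (illustration only).
keyspace = {
    "Posts": {  # ColumnFamily: row key -> {column name -> value}
        "tjake.bloghost.com": {
            "20100426-Lucandra": "lucandra talk today!",
        },
    },
    "Comments": {  # SuperColumnFamily: row key -> {super column -> {column -> value}}
        "tjake.bloghost.com": {
            "20100426-Lucandra-1": {"From": "Otis", "Comment": "Don't Suck!"},
            "20100426-Lucandra-2": {"From": "Jake", "Comment": "O.K."},
        },
    },
}


def get_column(column_family, row_key, *path):
    """Walk row key -> (super column) -> column; hypothetical helper."""
    node = keyspace[column_family][row_key]
    for name in path:
        node = node[name]
    return node


def column_slice(column_family, row_key, start, finish):
    """Return the column names of one row, in sorted order, within [start, finish]."""
    row = keyspace[column_family][row_key]
    return [name for name in sorted(row) if start <= name <= finish]
```

Because column names sort, date-prefixed names such as "20100426-Lucandra-1" make range reads over a single row natural.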

Cassandra - Partitioning

Cassandra - Scale Up / Scale Down

Cassandra - Replication

Solr/Lucene Components

Lucandra Components

How is an index stored?

{
  "Lucandra" :
  {
    "Docs" :
    {
      "Index1/Doc1" : { "Field1" : "T1 T2 T1", ... },
      "Index1/Doc2" : { "Field1" : "T3 T1", ... }
    },
    "TermVectors" :
    {
      "Index1/Field1/T1" : { "Doc1": [0, 2], "Doc2": [1] },
      "Index1/Field1/T2" : { "Doc1": [1] },
      "Index1/Field1/T3" : { "Doc2": [0] }
    }
  }
}
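The layout above is an inverted index keyed by "index/field/term", with one column per document holding that term's positions. Here is a minimal sketch of building those rows, assuming a whitespace tokenizer and 0-based token positions. The function names are hypothetical and this is not Lucandra's actual code, which goes through Lucene analyzers and Cassandra's client interface.

```python
from collections import defaultdict


def index_documents(index_name, docs):
    """Build "Docs" and "TermVectors" rows from {doc_id: {field: text}}."""
    doc_rows = {}
    term_rows = defaultdict(dict)
    for doc_id, fields in docs.items():
        # One row per document, keyed "index/doc", holding the stored fields.
        doc_rows[f"{index_name}/{doc_id}"] = dict(fields)
        for field, text in fields.items():
            # Whitespace split is an assumption; real analyzers do more.
            for position, term in enumerate(text.split()):
                row_key = f"{index_name}/{field}/{term}"
                term_rows[row_key].setdefault(doc_id, []).append(position)
    return doc_rows, dict(term_rows)


def search_term(term_rows, index_name, field, term):
    """A single-term query is one row lookup: doc id -> positions."""
    return term_rows.get(f"{index_name}/{field}/{term}", {})


docs = {"Doc1": {"Field1": "T1 T2 T1"}, "Doc2": {"Field1": "T3 T1"}}
doc_rows, term_rows = index_documents("Index1", docs)
hits = search_term(term_rows, "Index1", "Field1", "T1")
```

Keying term rows by index and field means a term query touches exactly one row, and many small indexes can share the same column family.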

Lucandra Deployed

Lucandra In Action

Sparse.ly and Wikassandra

sparse.ly - Twitter search for friends only

• ~4k indexes on 2 boxes

Wikassandra - Search Wikipedia

• 4 node cluster
• 3k writes per sec (over Thrift from a single node)
• Solr interface
