lucandra

Lucandra: Lucene + Cassandra. Jake Luciani, http://github/tjake/Lucandra, http://twitter.com/tjake

Uploaded by: otisg

Posted on 15-Jan-2015


DESCRIPTION

Lucandra presentation by Jake Luciani

TRANSCRIPT

Page 1: Lucandra

Lucandra

Lucene + Cassandra

http://github/tjake/Lucandra
http://twitter.com/tjake
Jake Luciani

Page 2: Lucandra

What we'll cover today:

• Search use cases
• Problems scaling and maintaining Lucene/Solr
• Cassandra
• Lucandra
• Lucandra in Action
• Q&A

Page 3: Lucandra

Types of search apps: 

Page 4: Lucandra

Types of search apps: 

Page 5: Lucandra

Lucene/Solr Scaling Problems

• Writes are expensive on a live system
    o Merge, Reopen, Optimize, Sorting
• "Too many open files"
• Solr replication has too many moving parts
• Scaling writes requires client-side sharding
• Lots of grid management -> ZooKeeper?
• Backups? Monitoring? Failures? Ops team? Oh my!

This sounds a lot like MySQL, doesn't it?...
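To make the client-side sharding point concrete, here is a minimal Python sketch assuming a hypothetical client that routes each document with `hash(doc_id) % num_shards`. The names `shard_for` and `moved_fraction` are illustrative only and are not part of Solr. The sketch measures how many documents would have to be re-indexed on a different shard when one shard is added.

```python
import hashlib


def shard_for(doc_id: str, num_shards: int) -> int:
    """Client-side routing: a stable hash of the document id, modulo the shard count."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents that land on a different shard after resharding."""
    moved = sum(
        1 for d in doc_ids if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


docs = [f"doc{i}" for i in range(10_000)]
print(f"{moved_fraction(docs, 4, 5):.1%} of documents move going from 4 to 5 shards")
```

With modulo placement a document stays put only when `h % 4 == h % 5`, which holds for 4 of every 20 hash values, so roughly 80% of documents move. Growing a hand-sharded index is therefore close to a full re-index.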

Page 6: Lucandra

Cassandra - Love Child of BigTable and Dynamo

• Peer-to-peer (easy to add new nodes)
• Configurable CAP trade-offs
• Multi-level TreeMap (sorta)
• Pluggable replication strategies and column sorting
• Writes are very fast!
• Low latency
• Integrates with Hadoop
• Major adoption and development
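Part of why Cassandra writes are fast is its log-structured write path. A write is appended to a commit log and applied to an in-memory memtable. The memtable is later flushed as an immutable, sorted SSTable, so no read-before-write or in-place disk update is needed. The toy class below sketches that idea in Python. It is a deliberately simplified model with no compaction, timestamps, or tombstones, and `ToyColumnFamilyStore` is a hypothetical name, not Cassandra's code.

```python
import bisect


class ToyColumnFamilyStore:
    """Toy log-structured store: commit log + memtable + immutable sorted runs."""

    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []   # append-only record of every write (durability)
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # flushed, immutable, sorted (key, value) runs; newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write is a sequential append plus an in-memory update:
        # no read-before-write and no in-place update of on-disk data.
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Write the memtable out as a new sorted run and start an empty one.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        # Reads check the memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return run[i][1]
        return None


store = ToyColumnFamilyStore()
for i, key in enumerate(["a", "b", "c", "d"]):
    store.write(key, i)
store.write("a", 99)
print(store.read("a"), store.read("b"))  # 99 1
```

The trade-off is that reads may consult several runs. Real systems mitigate this with compaction and per-SSTable indexes and Bloom filters.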

Page 7: Lucandra

Cassandra's Data Model

{ "bloghost.com" :                                            // Keyspace
    { "Posts" :                                               // ColumnFamily
        { "tjake.bloghost.com" :                              // Key
            { "20100426-Lucandra" : "lucandra talk today!" }        // Columns
        },
      "Comments" :                                            // SuperColumnFamily
        { "tjake.bloghost.com" :                              // Key
            { "20100426-Lucandra-1" :                         // SuperColumn
                { "From" : "Otis", "Comment" : "Don't Suck!" },     // Columns
              "20100426-Lucandra-2" :                         // SuperColumn
                { "From" : "Jake", "Comment" : "O.K." }             // Columns
            }
        }
    }
}
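Within a row, Cassandra keeps columns sorted by the column family's comparator (the "Multi-level TreeMap (sorta)" point). This means a row can be read as a contiguous range of column names, and date-prefixed names like `20100426-Lucandra-1` exploit that ordering. Below is a minimal Python sketch using plain dicts for the slide's data. The `get_slice` helper is hypothetical and only loosely echoes the idea behind Cassandra's Thrift `get_slice` call with a `SliceRange`; it does not reproduce that API's signature.

```python
from bisect import bisect_left, bisect_right

# The slide's example as nested dicts (keyspace "bloghost.com"):
# column family -> row key -> column name -> value (or, for a super column, a dict of columns)
bloghost = {
    "Posts": {
        "tjake.bloghost.com": {"20100426-Lucandra": "lucandra talk today!"},
    },
    "Comments": {
        "tjake.bloghost.com": {
            "20100426-Lucandra-1": {"From": "Otis", "Comment": "Don't Suck!"},
            "20100426-Lucandra-2": {"From": "Jake", "Comment": "O.K."},
        },
    },
}


def get_slice(column_family: str, row_key: str, start: str, finish: str) -> list:
    """Columns whose names fall in [start, finish], in sorted name order."""
    row = bloghost[column_family][row_key]
    names = sorted(row)  # stands in for the column family's comparator order
    lo = bisect_left(names, start)
    hi = bisect_right(names, finish)
    return [(name, row[name]) for name in names[lo:hi]]


# All comments on the 2010-04-26 Lucandra post, in column-name order.
# "~" sorts after the digits, so it closes the range for this prefix.
for name, comment in get_slice("Comments", "tjake.bloghost.com",
                               "20100426-Lucandra-", "20100426-Lucandra-~"):
    print(name, comment["From"], comment["Comment"])
```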

Page 8: Lucandra

Cassandra - Partitioning
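The partitioning diagram did not survive extraction. The core idea is as follows: with Cassandra's RandomPartitioner, each row key is hashed with MD5 into a token, and each node owns the tokens between its predecessor's token (exclusive) and its own (inclusive). The Python sketch below assumes four hypothetical nodes with evenly spaced tokens on a simplified 2^128 ring. The real partitioner's token space differs slightly.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # simplified token space for this sketch


def token(key: str) -> int:
    """MD5-based token, in the spirit of RandomPartitioner."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, node_tokens: dict):
        # Sort nodes by token so lookups can binary-search the ring.
        self.entries = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def primary_for(self, key: str) -> str:
        """First node whose token is >= the key's token, wrapping past the end."""
        i = bisect.bisect_left(self.tokens, token(key))
        return self.entries[i % len(self.entries)][1]


nodes = {"A": 0, "B": RING_SIZE // 4, "C": RING_SIZE // 2, "D": 3 * RING_SIZE // 4}
ring = Ring(nodes)
print(ring.primary_for("tjake.bloghost.com"))
```

Because MD5 spreads keys uniformly, evenly spaced tokens give each node about a quarter of the rows. Every client can compute the owner locally, with no central directory.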

Page 9: Lucandra

Cassandra - Scale Up / Scale Down
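Ownership is defined by ranges between neighboring tokens. As a result, adding a node moves only the keys in the slice of the ring it takes over, all from a single neighbor, and removing a node hands its range to the next node clockwise. The following self-contained Python sketch illustrates both cases under the same simplified MD5 token ring. The node names and tokens are hypothetical.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # simplified token space


def token(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def owner(node_tokens: dict, key: str) -> str:
    """Node owning the range (previous token, node token] that contains the key's token."""
    ring = sorted((t, name) for name, t in node_tokens.items())
    i = bisect.bisect_left([t for t, _ in ring], token(key))
    return ring[i % len(ring)][1]


before = {"A": 0, "B": RING_SIZE // 4, "C": RING_SIZE // 2, "D": 3 * RING_SIZE // 4}
# Scale up: hypothetical node E takes a token halfway into C's range (RING/4, RING/2].
grown = dict(before, E=3 * RING_SIZE // 8)
# Scale down: node B leaves, so its range falls to the next node clockwise (C).
shrunk = {name: t for name, t in before.items() if name != "B"}

keys = [f"row{i}" for i in range(8000)]
moved_up = [k for k in keys if owner(before, k) != owner(grown, k)]
moved_down = [k for k in keys if owner(before, k) != owner(shrunk, k)]
print(f"add E: {len(moved_up) / len(keys):.1%} of rows move")
print(f"remove B: {len(moved_down) / len(keys):.1%} of rows move")
```

Adding E relocates about one eighth of the rows, all from C to E. Removing B relocates only B's quarter, all to C. Rehashing with `hash % N` would relocate most rows in either case.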

Page 10: Lucandra

Cassandra - Replication
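In the simplest placement strategy, SimpleStrategy (historically also called RackUnawareStrategy), a row's replicas are the node that owns its token plus the next RF - 1 nodes clockwise. The consistency level then chooses how many of those replicas must answer. QUORUM is floor(RF/2) + 1, so quorum reads and writes always overlap in at least one replica (R + W > RF). The Python below sketches both ideas on a hypothetical four-node ring and ignores racks, data centers, and hinted handoff.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 128  # simplified token space


def token(key: str) -> int:
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas(node_tokens: dict, key: str, replication_factor: int) -> list:
    """Owner of the key's token, then the next nodes clockwise, RF nodes in total."""
    ring = sorted((t, name) for name, t in node_tokens.items())
    start = bisect.bisect_left([t for t, _ in ring], token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: a strict majority of RF."""
    return replication_factor // 2 + 1


nodes = {"A": 0, "B": RING_SIZE // 4, "C": RING_SIZE // 2, "D": 3 * RING_SIZE // 4}
rf = 3
print(replicas(nodes, "tjake.bloghost.com", rf), "QUORUM =", quorum(rf))
# Reads and writes at QUORUM overlap in at least one replica: R + W > RF.
assert quorum(rf) + quorum(rf) > rf
```

Choosing ONE instead of QUORUM trades that overlap for lower latency and higher availability, which is the sense in which Cassandra's CAP behavior is configurable.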

Page 11: Lucandra

Solr/Lucene Components

Page 12: Lucandra

Lucandra Components

Page 13: Lucandra

How is an index stored?

{ "Lucandra" :
    { "Docs" :
        { "Index1/Doc1" : { "Field1" : "T1 T2 T1", ... },
          "Index1/Doc2" : { "Field1" : "T3 T1", ... }
        },
      "TermVectors" :
        { "Index1/Field1/T1" : { "Doc1" : [0, 2], "Doc2" : [1] },
          "Index1/Field1/T2" : { "Doc1" : [1] },
          "Index1/Field1/T3" : { "Doc2" : [0] }
        }
    }
}
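The following minimal Python sketch shows how the two column families above could be populated and queried. It assumes whitespace tokenization in place of a real Lucene Analyzer and 0-based token positions, as in the `T1` row (`Doc1: [0, 2]`). The helper names `index_documents` and `term_query` are hypothetical, and this is not Lucandra's actual code. It only illustrates that a single-term query becomes one row read keyed by `Index/Field/Term`.

```python
from collections import defaultdict


def index_documents(index: str, field: str, docs: dict):
    """Build both column families for one field.

    docs maps doc id -> field text. Returns (docs_cf, term_vectors_cf):
      docs_cf["Index/Doc"]                = {field: text}
      term_vectors_cf["Index/Field/Term"] = {doc id: [positions]}
    """
    docs_cf = {}
    term_vectors_cf = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        docs_cf[f"{index}/{doc_id}"] = {field: text}
        # Whitespace split stands in for a Lucene Analyzer; positions are 0-based.
        for position, term in enumerate(text.split()):
            term_vectors_cf[f"{index}/{field}/{term}"][doc_id].append(position)
    # Freeze the nested defaultdicts into plain dicts.
    return docs_cf, {row: dict(cols) for row, cols in term_vectors_cf.items()}


def term_query(term_vectors_cf: dict, index: str, field: str, term: str) -> list:
    """A single-term query is one row lookup; the column names are the matching doc ids."""
    return sorted(term_vectors_cf.get(f"{index}/{field}/{term}", {}))


docs_cf, term_vectors_cf = index_documents(
    "Index1", "Field1", {"Doc1": "T1 T2 T1", "Doc2": "T3 T1"}
)
print(term_query(term_vectors_cf, "Index1", "Field1", "T1"))  # ['Doc1', 'Doc2']
```

The stored positions are what phrase and proximity queries would need. Because the index name is part of every row key, many indexes can share the same column families.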

Page 14: Lucandra

Lucandra Deployed

Page 15: Lucandra

Lucandra In Action

Sparse.ly and Wikassandra

Page 16: Lucandra

sparse.ly - Twitter search for friends only

• ~4k indexes on 2 boxes

Page 17: Lucandra

Wikassandra - Search wikipedia

• 4-node cluster
• 3k writes per sec (over Thrift, from a single node)
• Solr interface