lucandra
Lucandra presentation by Jake Luciani
Lucandra
Lucene + Cassandra
http://github/tjake/Lucandra
http://twitter.com/tjake
Jake Luciani
What we'll cover today:
• Search use-cases
• Problems scaling and maintaining Lucene/Solr
• Cassandra
• Lucandra
• Lucandra in Action
• Q&A
Types of search apps:
Lucene/Solr Scaling Problems
• Writes are expensive on a live system
  o Merge, Reopen, Optimize, Sorting
• "Too many open files"
• Solr replication has too many moving parts
• Scaling writes requires client-side sharding
• Lots of grid management -> ZooKeeper?
• Backups? Monitoring? Failures? Ops Team? Oh my!
This sounds a lot like MySQL, doesn't it?
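To make the "client-side sharding" bullet concrete, here is a minimal sketch of what the client ends up doing when it has to route writes itself. The shard URLs and the `shard_for` helper are hypothetical, invented for illustration; this is not Solr's API.

```python
import hashlib
from typing import Sequence

# Hypothetical shard endpoints, for illustration only.
SHARDS = [
    "http://solr-0.example/solr",
    "http://solr-1.example/solr",
    "http://solr-2.example/solr",
]

def shard_for(doc_id: str, shards: Sequence[str]) -> str:
    """Pick the shard for a document by hashing its id.

    md5 is used only to get a hash that is stable across processes;
    it is not used for security here.
    """
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]

# The client must run this for every write, and every query must fan out
# to all shards and merge the results.
target = shard_for("tweet-12345", SHARDS)
```

With modulo hashing like this, changing the number of shards remaps most document ids, which is one reason the client-side approach becomes an operational burden as the cluster grows.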
Cassandra - Love Child of BigTable and Dynamo
• Peer to peer (easy to add new nodes)
• CAP Configurable
• Multi-level TreeMap (sorta)
• Pluggable replication/sorting
• Writes are very fast!
• Low latency
• Integrates with Hadoop
• Major adoption and development
Cassandra's Data Model
{
  "bloghost.com" : //Keyspace
  {
    "Posts" : //ColumnFamily
    {
      "tjake.bloghost.com" : //Key
      {
        "20100426-Lucandra" : "lucandra talk today!" //Columns
      }
    },
    "Comments" : //SuperColumnFamily
    {
      "tjake.bloghost.com" : //Key
      {
        "20100426-Lucandra-1" : //SuperColumn
        { "From" : "Otis", "Comment" : "Don't Suck!" }, //Columns
        "20100426-Lucandra-2" : //SuperColumn
        { "From" : "Jake", "Comment" : "O.K." } //Columns
      }
    }
  }
}
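The "multi-level TreeMap (sorta)" idea can be sketched with plain nested dictionaries. This only models the shape of the data from the slide; it is not a Cassandra client API, and the `column_slice` helper is a hypothetical name. Sorting by column name at read time is an assumption standing in for Cassandra keeping columns ordered by a comparator.

```python
# Keyspace "bloghost.com" modeled as nested dicts:
# ColumnFamily -> row key -> column name -> value (or SuperColumn -> columns).
keyspace = {
    "Posts": {  # ColumnFamily
        "tjake.bloghost.com": {  # Key
            "20100426-Lucandra": "lucandra talk today!",  # Column
        },
    },
    "Comments": {  # SuperColumnFamily
        "tjake.bloghost.com": {  # Key
            "20100426-Lucandra-1": {"From": "Otis", "Comment": "Don't Suck!"},
            "20100426-Lucandra-2": {"From": "Jake", "Comment": "O.K."},
        },
    },
}

def column_slice(row, start, end):
    """Return (name, value) pairs with start <= name <= end, ordered by name.

    A real store keeps columns sorted on disk; here we sort on read.
    """
    return [(name, row[name]) for name in sorted(row) if start <= name <= end]

# All comments on the 2010-04-26 post, in order.
comments = column_slice(
    keyspace["Comments"]["tjake.bloghost.com"],
    "20100426-Lucandra-1",
    "20100426-Lucandra-2",
)
```

Because column names sort lexically, prefixing them with a date makes "everything for this post" a contiguous range, which is the property the index layout later in the talk relies on.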
Cassandra - Partitioning
Cassandra - Scale Up / Scale Down
Cassandra - Replication
Solr/Lucene Components
Lucandra Components
How is an index stored?
{
  "Lucandra" : {
    "Docs" : {
      "Index1/Doc1" : { "Field1" : "T1 T2 T1", ... },
      "Index1/Doc2" : { "Field1" : "T3 T1", ... }
    },
    "TermVectors" : {
      "Index1/Field1/T1" : { "Doc1" : [0, 2], "Doc2" : [1] },
      "Index1/Field1/T2" : { "Doc1" : [1] },
      "Index1/Field1/T3" : { "Doc2" : [0] }
    }
  }
}
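The layout above can be reproduced with a short sketch that builds both column families from raw documents. The function names are hypothetical, whitespace tokenization is an assumption (a real Lucene analyzer does more), and positions are taken to be zero-based offsets of each term within its field.

```python
from collections import defaultdict

def index_documents(index_name, docs):
    """Build "Docs" and "TermVectors" structures as nested dicts.

    docs maps doc id -> {field name -> text}.
    Row keys follow the slide: "<index>/<doc>" and "<index>/<field>/<term>".
    """
    docs_cf = {}
    term_vectors_cf = defaultdict(dict)
    for doc_id, fields in docs.items():
        docs_cf[f"{index_name}/{doc_id}"] = dict(fields)
        for field, text in fields.items():
            for position, term in enumerate(text.split()):
                row_key = f"{index_name}/{field}/{term}"
                # One column per document; the value is the position list.
                term_vectors_cf[row_key].setdefault(doc_id, []).append(position)
    return docs_cf, dict(term_vectors_cf)

def docs_matching(term_vectors_cf, index_name, field, term):
    """A term query is a single row lookup: the columns are the matching docs."""
    return sorted(term_vectors_cf.get(f"{index_name}/{field}/{term}", {}))

docs_cf, term_vectors = index_documents(
    "Index1",
    {"Doc1": {"Field1": "T1 T2 T1"}, "Doc2": {"Field1": "T3 T1"}},
)
hits = docs_matching(term_vectors, "Index1", "Field1", "T1")
```

Here `term_vectors["Index1/Field1/T1"]` holds the position lists for both documents, and `hits` names the documents containing T1. Keying rows by index, field, and term means a term lookup touches one row regardless of how many indexes share the cluster.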
Lucandra Deployed
Lucandra In Action
Sparse.ly and Wikassandra
sparse.ly - Twitter search for friends only
• ~4k indexes on 2 boxes
Wikassandra - Search Wikipedia
• 4 node cluster
• 3k writes per sec (over Thrift from single node)
• Solr interface