![Page 1: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/1.jpg)
Lucandra
Lucene + Cassandra
http://github/tjake/Lucandra
http://twitter.com/tjake
Jake Luciani
![Page 2: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/2.jpg)
What we'll cover today:
• Search use-cases
• Problems scaling and maintaining Lucene/Solr
• Cassandra
• Lucandra
• Lucandra in Action
• Q&A
![Page 3: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/3.jpg)
Types of search apps:
![Page 4: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/4.jpg)
Types of search apps:
![Page 5: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/5.jpg)
Lucene/Solr Scaling Problems
• Writes are expensive on a live system
  o Merge, Reopen, Optimize, Sorting
• "Too many open files"
• Solr replication: too many moving parts
• Scaling writes requires client-side sharding
• Lots of grid management -> ZooKeeper?
• Backups? Monitoring? Failures? Ops Team? Oh my!
This sounds a lot like MySQL, doesn't it?
![Page 6: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/6.jpg)
Cassandra - Love Child of BigTable and Dynamo
• Peer to peer (easy to add new nodes)
• CAP configurable
• Multi-level TreeMap (sorta)
• Pluggable replication/sorting
• Writes are very fast!
• Low latency
• Integrates with Hadoop
• Major adoption and development
![Page 7: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/7.jpg)
Cassandra's Data Model

{
  "bloghost.com" : //Keyspace
  {
    "Posts" : //ColumnFamily
    {
      "tjake.bloghost.com" : //Key
      { "20100426-Lucandra" : "lucandra talk today!" } //Columns
    }
  },
  {
    "Comments" : //SuperColumnFamily
    {
      "tjake.bloghost.com" : //Key
      {
        "20100426-Lucandra-1" : //SuperColumn
        { "From" : "Otis", "Comment" : "Don't Suck!" }, //Columns
      },
      {
        "20100426-Lucandra-2" : //SuperColumn
        { "From" : "Jake", "Comment" : "O.K." }, //Columns
      },
    }
  }
}
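The "multi-level TreeMap (sorta)" idea can be sketched with plain nested dictionaries. This is only an in-memory stand-in built from the values on the slide, not a Cassandra client; the `column_slice` helper is a hypothetical name introduced for illustration, and it mimics reading a contiguous range of column names in sorted order.

```python
# Hypothetical in-memory stand-in for the "bloghost.com" keyspace on the slide.
# Nothing here talks to Cassandra; it only mirrors the nesting levels.
keyspace = {
    "Posts": {  # ColumnFamily: row key -> {column name: value}
        "tjake.bloghost.com": {
            "20100426-Lucandra": "lucandra talk today!",
        },
    },
    "Comments": {  # SuperColumnFamily: row key -> {super column: {column: value}}
        "tjake.bloghost.com": {
            "20100426-Lucandra-2": {"From": "Jake", "Comment": "O.K."},
            "20100426-Lucandra-1": {"From": "Otis", "Comment": "Don't Suck!"},
        },
    },
}


def column_slice(row, start, end):
    """Return (name, value) pairs with start <= name < end, sorted by name."""
    return [(name, row[name]) for name in sorted(row) if start <= name < end]


# All comments on one post: a range over names sharing the post's prefix.
# "~" sorts after the digits, so it serves as an upper bound for this prefix.
comments = keyspace["Comments"]["tjake.bloghost.com"]
post_comments = column_slice(comments, "20100426-Lucandra-", "20100426-Lucandra-~")

for name, columns in post_comments:
    print(name, columns["From"], columns["Comment"])
```

Prefixing column names with the date and post slug is what makes the range read useful here: related columns sort next to each other inside the row.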
![Page 8: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/8.jpg)
Cassandra - Partitioning
![Page 9: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/9.jpg)
Cassandra - Scale Up / Scale Down
![Page 10: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/10.jpg)
Cassandra - Replication
![Page 11: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/11.jpg)
Solr/Lucene Components
![Page 12: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/12.jpg)
Lucandra Components
![Page 13: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/13.jpg)
How is an index stored?
{
  "Lucandra" : {
    "Docs" : {
      "Index1/Doc1" : { "Field1" : "T1 T2 T1", ... },
      "Index1/Doc2" : { "Field1" : "T3 T1", ... }
    },
    "TermVectors" : {
      "Index1/Field1/T1" : { "Doc1" : [0, 2], "Doc2" : [1] },
      "Index1/Field1/T2" : { "Doc1" : [1] },
      "Index1/Field1/T3" : { "Doc2" : [0] }
    }
  }
}
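A minimal sketch of how this layout could be populated and queried, assuming whitespace tokenization and 0-based token positions. The function names `index_document` and `term_query` are hypothetical and are not Lucandra's API; the dictionaries stand in for the "Docs" and "TermVectors" column families.

```python
from collections import defaultdict


def index_document(store, index, doc_id, fields):
    """Write one document into the Docs and TermVectors maps."""
    # Docs row: key "<index>/<doc>", columns are the stored fields.
    store["Docs"][f"{index}/{doc_id}"] = dict(fields)
    for field, text in fields.items():
        # Whitespace splitting stands in for a real Lucene analyzer.
        for position, term in enumerate(text.split()):
            # TermVectors row: key "<index>/<field>/<term>",
            # one column per document holding the term's positions.
            row = store["TermVectors"][f"{index}/{field}/{term}"]
            row.setdefault(doc_id, []).append(position)


def term_query(store, index, field, term):
    """Return {doc_id: positions} for one term via a single row lookup."""
    return dict(store["TermVectors"].get(f"{index}/{field}/{term}", {}))


store = {"Docs": {}, "TermVectors": defaultdict(dict)}
index_document(store, "Index1", "Doc1", {"Field1": "T1 T2 T1"})
index_document(store, "Index1", "Doc2", {"Field1": "T3 T1"})

hits = term_query(store, "Index1", "Field1", "T1")
print(hits)
```

In this sketch every posting for a term lives under one row key, so a term query is one key lookup and adding a document only appends columns to existing rows.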
![Page 14: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/14.jpg)
Lucandra Deployed
![Page 15: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/15.jpg)
Lucandra In Action
Sparse.ly and Wikassandra
![Page 16: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/16.jpg)
sparse.ly - Twitter search for friends only
• ~4k indexes on 2 boxes
![Page 17: Lucandra](https://reader036.vdocument.in/reader036/viewer/2022081907/54b770d64a79592a448b46bc/html5/thumbnails/17.jpg)
Wikassandra - Search Wikipedia
• 4 node cluster
• 3k writes per sec (over Thrift from a single node)
• Solr interface