C* Summit EU 2013: Cassandra Internals
DESCRIPTION
Speaker: Aaron Morton, Apache Cassandra Committer & Co-Founder/Principal Consultant at The Last Pickle Inc.
Video: http://www.youtube.com/watch?v=efI5fL8eEfo&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=23
From the microsecond your request hits an Apache Cassandra node, there are many code paths, threads and machines involved in storing or fetching your data. This talk will step through the common operations and highlight the code responsible.
Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault-tolerant database. Cluster-wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the log-structured storage engine provides high-performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years.
This talk will step through read and write requests, automatic processes and manual maintenance tasks. I'll discuss the general approach to solving each problem and drill down to the code responsible for the implementation. Existing Cassandra users, those wanting to contribute to the project and people interested in Dynamo-based systems will all benefit from this tour of the code base.
TRANSCRIPT
CASSANDRA EU 2013
CASSANDRA INTERNALS
Aaron Morton @aaronmorton
Co-Founder & Principal Consultant www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License #CassandraEU
About The Last Pickle.
Work with clients to deliver and improve Apache Cassandra based solutions.
Apache Cassandra Committer, DataStax MVP, Hector Maintainer, Apache Usergrid Committer.
Based in New Zealand & Austin, TX.
#CassandraEU www.thelastpickle.com
Architecture Code
Cassandra Architecture.
[Diagram: Clients; one node with layers APIs, Cluster Aware, Cluster Unaware, Disk]
Cassandra Cluster Architecture.
[Diagram: Clients; Node 1 and Node 2, each with layers APIs, Cluster Aware, Cluster Unaware, Disk]
Dynamo Cluster Architecture.
[Diagram: Clients; Node 1 and Node 2, each with layers APIs, Dynamo, Database, Disk]
Architecture API
Dynamo Database
API Transports.
Thrift, Native Binary

Thrift Transport.
// Custom TServer implementations
o.a.c.thrift.CustomTThreadPoolServer
o.a.c.thrift.CustomTHsHaServer
Native Binary Transport.
Beta in Cassandra 1.2, now GA. Uses Netty. CQL 3 only.

o.a.c.transport.Server.run()
// Set up the Netty server
new ExecutionHandler()
new NioServerSocketChannelFactory()
ServerBootstrap.setPipelineFactory()
o.a.c.transport.Message.Dispatcher.messageReceived()
// Process message from client
ServerConnection.validateNewMessage()
Request.execute()
ServerConnection.applyStateTransition()
Channel.write()
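The four calls on the Message.Dispatcher slide (validate, execute, transition, write) can be pictured as a small per-connection state machine. The sketch below is illustrative only: the two states, the two message types and the class name ConnectionStateSketch are assumptions made for the example, and it leaves out authentication and everything Netty-specific.

```java
// Hypothetical sketch of a per-connection state check; not Cassandra's ServerConnection.
final class ConnectionStateSketch {
    enum State { UNINITIALIZED, READY }          // assumed, simplified states
    enum MessageType { STARTUP, QUERY }          // assumed, simplified message types

    private State state = State.UNINITIALIZED;

    // Reject messages that are not legal in the current connection state.
    void validateNewMessage(MessageType type) {
        if (state == State.UNINITIALIZED && type != MessageType.STARTUP) {
            throw new IllegalStateException("Expected STARTUP but got " + type);
        }
        if (state == State.READY && type == MessageType.STARTUP) {
            throw new IllegalStateException("Connection already started");
        }
    }

    // Advance the state once a request has executed successfully.
    void applyStateTransition(MessageType requestType) {
        if (requestType == MessageType.STARTUP) {
            state = State.READY;
        }
    }

    // Mirrors the order on the slide: validate, execute, transition, write.
    String messageReceived(MessageType type) {
        validateNewMessage(type);
        String response = "executed " + type;    // stand-in for Request.execute()
        applyStateTransition(type);
        return response;                         // stand-in for Channel.write()
    }

    public static void main(String[] args) {
        ConnectionStateSketch connection = new ConnectionStateSketch();
        System.out.println(connection.messageReceived(MessageType.STARTUP));
        System.out.println(connection.messageReceived(MessageType.QUERY));
    }
}
```

Validating before executing means a client that skips the handshake is rejected without touching the query path.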
Messages.
Defined in the Native Binary Protocol
$SRC/doc/native_protocol.spec

API Services.
JMX, Thrift, CQL 3
JMX Management Beans.
Spread around the code base.
Interfaces named *MBean.
Registered with names such as org.apache.cassandra.db:type=StorageProxy
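As a rough illustration of the *MBean naming convention and the domain:type=Name registration style, here is a minimal sketch using only the standard javax.management API. The DemoProxy bean, its ReadCount attribute and the com.example.sketch domain are hypothetical and are not taken from the Cassandra code base.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class JmxSketch {
    // Standard MBeans pair an implementation class X with a public interface XMBean.
    public interface DemoProxyMBean {
        long getReadCount();
    }

    // Hypothetical bean exposing a single read-only attribute, "ReadCount".
    public static final class DemoProxy implements DemoProxyMBean {
        private final AtomicLong readCount = new AtomicLong();

        void recordRead() {
            readCount.incrementAndGet();
        }

        @Override
        public long getReadCount() {
            return readCount.get();
        }
    }

    // Registers the bean, reads the attribute back through JMX, then unregisters.
    static long registerAndRead() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.example.sketch:type=DemoProxy");
            DemoProxy proxy = new DemoProxy();
            server.registerMBean(proxy, name);
            try {
                proxy.recordRead();
                return (Long) server.getAttribute(name, "ReadCount");
            } finally {
                server.unregisterMBean(name);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerAndRead());
    }
}
```

A JMX client such as jconsole would see the same attribute under the registered object name while the bean is registered.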
o.a.c.thrift.CassandraServer
// Implements Thrift Interface
// Access control
// Input validation
// Mapping to/from Thrift and internal types

Thrift Interface.
Thrift IDL $SRC/interface/cassandra.thrift
o.a.c.thrift.CassandraServer.get_slice()
// Get columns for one row
Tracing.begin()
ClientState cState = state()
cState.hasColumnFamilyAccess()
multigetSliceInternal()

CassandraServer.multigetSliceInternal()
// Get columns for many rows
ThriftValidation.validate*()
// Create ReadCommands
getSlice()

CassandraServer.getSlice()
// Process ReadCommands
// Return Thrift types
readColumnFamily()
thriftifyColumnFamily()

CassandraServer.readColumnFamily()
// Process ReadCommands
// Return ColumnFamilies
StorageProxy.read()
o.a.c.cql3.QueryProcessor
// Prepares and executes CQL3 statements
// Used by Thrift & Native transports
// Access control
// Input validation
// Returns transport.ResultMessage

CQL3 Grammar.
ANTLR Grammar $SRC/o.a.c.cql3/Cql.g
o.a.c.cql3.statements.ParsedStatement
// Subclasses generated by ANTLR
// Tracks bound term count
// Prepare CQLStatement
prepare()

o.a.c.cql3.statements.CQLStatement
checkAccess(ClientState state)
validate(ClientState state)
execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)

statements.SelectStatement.RawStatement
// Implements ParsedStatement
// Input validation
prepare()

statements.SelectStatement.execute()
// Create ReadCommands
StorageProxy.read()
Architecture API
Dynamo Database
Dynamo Layer.
o.a.c.service
o.a.c.net
o.a.c.dht
o.a.c.gms
o.a.c.locator
o.a.c.stream
o.a.c.service.StorageProxy
// Cluster wide storage operations
// Select endpoints & check CL available
// Send messages to Stages
// Wait for response
// Store Hints

o.a.c.service.StorageService
// Ring operations
// Track ring state
// Start & stop ring membership
// Node & token queries
o.a.c.service.IResponseResolver
preprocess(MessageIn<T> message)
resolve() throws DigestMismatchException
RowDigestResolver
RowDataResolver
RangeSliceResponseResolver
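The idea behind a digest resolver, where resolve() can fail with a digest mismatch, can be sketched roughly as follows. This is a simplified illustration and not the RowDigestResolver implementation: rows are plain strings, MD5 is an assumed hash choice, and DigestResolverSketch and DigestMismatch are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class DigestResolverSketch {
    // Hypothetical stand-in for a digest mismatch exception.
    static final class DigestMismatch extends RuntimeException {
        DigestMismatch(String message) {
            super(message);
        }
    }

    // Hash a serialized row; a replica can return this instead of the full data.
    static byte[] digest(String row) {
        try {
            return MessageDigest.getInstance("MD5").digest(row.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // One replica sends data, the others send digests; every digest must match the data.
    static String resolve(String dataResponse, List<byte[]> digestResponses) {
        byte[] expected = digest(dataResponse);
        for (byte[] received : digestResponses) {
            if (!Arrays.equals(expected, received)) {
                throw new DigestMismatch("replicas disagree; a full data read is needed");
            }
        }
        return dataResponse;
    }

    public static void main(String[] args) {
        String row = "key1:col1=value1";
        System.out.println(resolve(row, Collections.singletonList(digest(row))));
    }
}
```

Comparing digests keeps the common case cheap, since only one replica has to ship the full row when all replicas agree.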
Response Handlers / Callback.
implements IAsyncCallback<T>
response(MessageIn<T> msg)
o.a.c.service.ReadCallback.get()
// Wait for blockfor & data response
condition.await(timeout, TimeUnit.MILLISECONDS)
throw ReadTimeoutException()
resolver.resolve()
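The wait-then-resolve pattern on the ReadCallback.get() slide can be sketched with a CountDownLatch standing in for the condition object. This is a simplified sketch under stated assumptions: the latch, the String responses and the use of java.util.concurrent.TimeoutException are substitutions made for the example, and returning the first response is only a placeholder for resolver.resolve().

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class ReadCallbackSketch {
    private final int blockFor;
    private final CountDownLatch condition;
    private final List<String> responses = new CopyOnWriteArrayList<>();

    ReadCallbackSketch(int blockFor) {
        this.blockFor = blockFor;
        this.condition = new CountDownLatch(blockFor);
    }

    // Called on a messaging thread for each replica response.
    void response(String message) {
        responses.add(message);
        condition.countDown();
    }

    // Called on the request thread; blocks until blockFor responses arrive or the timeout passes.
    String get(long timeoutMillis) throws TimeoutException, InterruptedException {
        if (!condition.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("received " + responses.size() + " of " + blockFor + " responses");
        }
        return responses.get(0); // placeholder for resolver.resolve()
    }

    public static void main(String[] args) throws Exception {
        ReadCallbackSketch callback = new ReadCallbackSketch(2);
        new Thread(() -> {
            callback.response("replica-1");
            callback.response("replica-2");
        }).start();
        System.out.println(callback.get(1000));
    }
}
```

The request thread is the only one that blocks; responses are delivered from other threads through response().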
o.a.c.service.StorageProxy.fetchRows()
getLiveSortedEndpoints()
new RowDigestResolver()
new ReadCallback()
MessagingService.sendRR()
---------------------------------------
ReadCallback.get() // blocking
catch (DigestMismatchException ex)
catch (ReadTimeoutException ex)
o.a.c.net.MessagingService.Verb <<enum>>
MUTATION
READ
REQUEST_RESPONSE
TREE_REQUEST
TREE_RESPONSE
(And more...)

o.a.c.net.MessagingService.verbHandlers
new EnumMap<Verb, IVerbHandler>(Verb.class)

o.a.c.net.IVerbHandler<T>
doVerb(MessageIn<T> message, String id);

o.a.c.net.MessagingService.verbStages
new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)
o.a.c.net.MessagingService.receive()
runnable = new MessageDeliveryTask(message, id, timestamp);
StageManager.getStage(message.getMessageType());
stage.execute(runnable);

o.a.c.net.MessageDeliveryTask.run()
// If droppable and older than rpc_timeout
MessagingService.incrementDroppedMessages(verb);
return;
MessagingService.getVerbHandler(verb)
verbHandler.doVerb(message, id)
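The verb-to-handler lookup and the drop-if-stale check in MessageDeliveryTask.run() can be sketched as below. Treat it as an illustration under assumptions: the set of droppable verbs, the 10 second timeout, the String payloads and the injected clock are all hypothetical choices for the example.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

final class VerbDispatchSketch {
    enum Verb { MUTATION, READ, REQUEST_RESPONSE }

    interface VerbHandler {
        void doVerb(String message, String id);
    }

    // Assumptions for illustration: which verbs may be dropped, and the timeout value.
    static final Set<Verb> DROPPABLE = EnumSet.of(Verb.MUTATION, Verb.READ);
    static final long RPC_TIMEOUT_MILLIS = 10_000;

    final Map<Verb, VerbHandler> verbHandlers = new EnumMap<>(Verb.class);
    final LongSupplier clock;
    int droppedMessages = 0;

    VerbDispatchSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Builds the task a stage would execute for one incoming message.
    Runnable deliveryTask(Verb verb, String message, String id, long constructedAtMillis) {
        return () -> {
            boolean expired = clock.getAsLong() - constructedAtMillis > RPC_TIMEOUT_MILLIS;
            if (DROPPABLE.contains(verb) && expired) {
                droppedMessages++; // the sender has already given up, so skip the work
                return;
            }
            verbHandlers.get(verb).doVerb(message, id);
        };
    }

    public static void main(String[] args) {
        VerbDispatchSketch dispatcher = new VerbDispatchSketch(System::currentTimeMillis);
        dispatcher.verbHandlers.put(Verb.READ,
                (message, id) -> System.out.println("READ " + id + ": " + message));
        dispatcher.deliveryTask(Verb.READ, "row-key", "42", System.currentTimeMillis()).run();
    }
}
```

Checking the age when the task runs, not when it is queued, is what lets an overloaded node shed work that has sat in a stage queue too long.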
Architecture API Layer
Dynamo Layer Database Layer
Database Layer.
o.a.c.concurrent
o.a.c.db
o.a.c.cache
o.a.c.io
o.a.c.trace
o.a.c.concurrent.StageManager
stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);
getStage(Stage stage)

o.a.c.concurrent.Stage
READ
MUTATION
GOSSIP
REQUEST_RESPONSE
ANTI_ENTROPY
(And more...)
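A stage manager of this shape, one thread pool per enum constant looked up through an EnumMap, might look roughly like the following. The pool sizes, thread names and the runOn helper are assumptions for the example and do not reflect Cassandra's actual configuration.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class StageManagerSketch {
    enum Stage { READ, MUTATION }

    private static final Map<Stage, ThreadPoolExecutor> stages = new EnumMap<>(Stage.class);

    static {
        for (Stage stage : Stage.values()) {
            // Assumed sizing: two daemon threads per stage with an unbounded queue.
            stages.put(stage, new ThreadPoolExecutor(
                    2, 2, 60, TimeUnit.SECONDS, new LinkedBlockingQueue<>(),
                    runnable -> {
                        Thread thread = new Thread(runnable, stage + "-sketch");
                        thread.setDaemon(true);
                        return thread;
                    }));
        }
    }

    static ThreadPoolExecutor getStage(Stage stage) {
        return stages.get(stage);
    }

    // Hypothetical helper: runs a task on a stage and reports which thread executed it.
    static String runOn(Stage stage) {
        Callable<String> task = () -> Thread.currentThread().getName();
        try {
            return getStage(stage).submit(task).get(1, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runOn(Stage.READ));
        System.out.println(runOn(Stage.MUTATION));
    }
}
```

Separate pools per stage mean a backlog of one kind of work, such as mutations, does not starve threads serving another, such as reads.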
o.a.c.db.Table
// Keyspace
open(String table)
getColumnFamilyStore(String cfName)
getRow(QueryFilter filter)
apply(RowMutation mutation, boolean writeCommitLog)

o.a.c.db.ColumnFamilyStore
// Column Family
getColumnFamily(QueryFilter filter)
getTopLevelColumns(...)
apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)

o.a.c.db.IColumnContainer
addColumn(IColumn column)
remove(ByteBuffer columnName)
ColumnFamily
SuperColumn
(Removed in 2.0)

o.a.c.db.ISortedColumns
addColumn(IColumn column, Allocator allocator)
removeColumn(ByteBuffer name)
ArrayBackedSortedColumns
AtomicSortedColumns
TreeMapBackedSortedColumns
o.a.c.db.Memtable
put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)
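To give a feel for what a memtable put() does, here is a minimal in-memory sketch: rows are kept sorted by key, columns sorted by name, and a write replaces an existing column only if its timestamp is at least as new. String keys, the Column class and the tie-breaking rule are simplifying assumptions; secondary index updates and flushing are left out entirely.

```java
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class MemtableSketch {
    // Hypothetical column value with its write timestamp.
    static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Row key -> (column name -> column), both levels kept in sorted order.
    private final ConcurrentNavigableMap<String, ConcurrentNavigableMap<String, Column>> rows =
            new ConcurrentSkipListMap<>();

    // Merge one column into the row; the write with the newer timestamp wins.
    void put(String key, String columnName, String value, long timestamp) {
        rows.computeIfAbsent(key, k -> new ConcurrentSkipListMap<>())
            .merge(columnName, new Column(value, timestamp),
                   (existing, update) -> update.timestamp >= existing.timestamp ? update : existing);
    }

    String get(String key, String columnName) {
        ConcurrentNavigableMap<String, Column> row = rows.get(key);
        if (row == null) {
            return null;
        }
        Column column = row.get(columnName);
        return column == null ? null : column.value;
    }

    public static void main(String[] args) {
        MemtableSketch memtable = new MemtableSketch();
        memtable.put("user1", "email", "new@example.com", 2L);
        memtable.put("user1", "email", "old@example.com", 1L); // older write is ignored
        System.out.println(memtable.get("user1", "email"));
    }
}
```

Keeping both levels sorted is what makes a later flush to a sorted on-disk file a sequential walk over the map.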
o.a.c.db.ReadCommand
getRow(Table table)
SliceByNamesReadCommand
SliceFromReadCommand
RangeSliceCommand
(Additional classes for paging in 2.0)

o.a.c.db.IDiskAtomFilter
getMemtableColumnIterator(...)
getSSTableColumnIterator(...)
IdentityQueryFilter
NamesQueryFilter
SliceQueryFilter
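The difference between a names filter and a slice filter can be illustrated over a sorted map of columns. This is a simplified sketch, not the IDiskAtomFilter API: the method names, the inclusive bounds and the String column names are assumptions for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

final class ColumnFilterSketch {
    // Slice: up to `count` column names between start and finish (both inclusive here).
    static List<String> slice(NavigableMap<String, String> columns, String start, String finish, int count) {
        List<String> result = new ArrayList<>();
        for (String name : columns.subMap(start, true, finish, true).keySet()) {
            if (result.size() == count) {
                break;
            }
            result.add(name);
        }
        return result;
    }

    // Names: only the requested columns, returned in the row's sorted order.
    static List<String> names(NavigableMap<String, String> columns, Set<String> wanted) {
        List<String> result = new ArrayList<>();
        for (String name : columns.keySet()) {
            if (wanted.contains(name)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> row = new TreeMap<>();
        for (String name : new String[] {"a", "b", "c", "d", "e"}) {
            row.put(name, "value-" + name);
        }
        System.out.println(slice(row, "b", "d", 2));
    }
}
```

Both filters rely on the columns already being sorted, which is why the same filter can be applied to in-memory and on-disk column iterators.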
Summary.
CustomTThreadPoolServer, Message.Dispatcher
CassandraServer, QueryProcessor
ReadCommand
StorageProxy
IResponseResolver
IAsyncCallback
MessagingService
IVerbHandler
Table, ColumnFamilyStore, IDiskAtomFilter
Layers: API, Dynamo, Database
Thanks.
Aaron Morton @aaronmorton
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License