cassandra community webinar: apache cassandra internals
DESCRIPTION
Apache Cassandra solves many interesting problems to provide a scalable, distributed, fault tolerant database. Cluster wide operations track node membership, direct requests and implement consistency guarantees. At the node level, the Log Structured storage engine provides high performance reads and writes. All of this is implemented in a Java code base that has greatly matured over the past few years. In this webinar Aaron Morton will step through read and write requests, automatic processes and manual maintenance tasks. He will also discuss the general approach to solving the problem and drill down to the code responsible for implementation. Speaker: Aaron Morton, Apache Cassandra Committer Aaron Morton is a Freelance Developer based in New Zealand, and a Committer on the Apache Cassandra project. In 2010 he gave up the RDBMS world for the scale and reliability of Cassandra. He now spends his time advancing the Cassandra project and helping others get the best out of it.TRANSCRIPT
CASSANDRA COMMUNITY WEBINARS AUGUST 2013
CASSANDRA INTERNALS
Aaron Morton@aaronmorton
Co-Founder & Principal Consultantwww.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last PickleWork with clients to deliver and improve
Apache Cassandra based solutions. Apache Cassandra Committer, DataStax MVP,
Hector Maintainer, 6+ years combined Cassandra experience.
Based in New Zealand & Austin, TX.
Cassandra Architecture
API's
Cluster Aware
Cluster Unaware
Clients
Disk
www.thelastpickle.com
Cassandra Cluster Architecture
API's
Cluster Aware
Cluster Unaware
Clients
Disk
API's
Cluster Aware
Cluster Unaware
Disk
Node 1 Node 2
www.thelastpickle.com
Dynamo Cluster Architecture
API's
Dynamo
Database
Clients
Disk
API's
Dynamo
Database
Disk
Node 1 Node 2
www.thelastpickle.com
ArchitectureAPI
DynamoDatabase
www.thelastpickle.com
API Transports
ThriftNative Binary
www.thelastpickle.com
Thrift Transport
//Custom TServer implementations
o.a.c.thrift.CustomTThreadPoolServero.a.c.thrift.CustomTNonBlockingServero.a.c.thrift.CustomTHsHaServer
www.thelastpickle.com
API Transports
ThriftNative Binary
www.thelastpickle.com
Native Binary Transport
Beta in Cassandra 1.2Uses Netty
Enabled with start_native_transport
(Disabled by default)
www.thelastpickle.com
o.a.c.transport.Server.run()
//Setup the Netty servernew ExecutionHandler()new NioServerSocketChannelFactory()ServerBootstrap.setPipelineFactory()
www.thelastpickle.com
o.a.c.transport.Message.Dispatcher.messageReceived()
//Process message from clientServerConnection.validateNewMessage()Request.execute()ServerConnection.applyStateTransition()Channel.write()
www.thelastpickle.com
Messages
Defined in the Native Binary Protocol
$SRC/doc/native_protocol.spec
www.thelastpickle.com
API Services
JMXThrift
CQL 3
www.thelastpickle.com
JMX Management Beans
Spread around the code base.
Interfaces named *MBean
www.thelastpickle.com
JMX Management Beans
Registered with names such as org.apache.cassandra.db:
type=StorageProxy
www.thelastpickle.com
API Services
JMXThriftCQL 3
www.thelastpickle.com
o.a.c.thrift.CassandraServer
// Implements Thrift Interface// Access control// Input validation// Mapping to/from Thrift and internal types
www.thelastpickle.com
Thrift Interface
Thrift IDL$SRC/interface/cassandra.thrift
www.thelastpickle.com
o.a.c.thrift.CassandraServer.get_slice()
// get columns for one rowTracing.begin()ClientState cState = state()cState.hasColumnFamilyAccess()multigetSliceInternal()
www.thelastpickle.com
CassandraServer.multigetSliceInternal()
// get columns for may rowsThriftValidation.validate*()// Create ReadCommandsgetSlice()
www.thelastpickle.com
CassandraServer.getSlice()
// Process ReadCommands// return Thrift types
readColumnFamily()thriftifyColumnFamily()
www.thelastpickle.com
CassandraServer.readColumnFamily()
// Process ReadCommands// Return ColumnFamilies
StorageProxy.read()
www.thelastpickle.com
API Services
JMXThrift
CQL 3
www.thelastpickle.com
o.a.c.cql3.QueryProcessor
// Prepares and executes CQL3 statements// Used by Thrift & Native transports// Access control// Input validation// Returns transport.ResultMessage
www.thelastpickle.com
CQL3 Grammar
ANTLR Grammar$SRC/o.a.c.cql3/Cql.g
www.thelastpickle.com
o.a.c.cql3.statements.ParsedStatement
// Subclasses generated by ANTLR// Tracks bound term count// Prepare CQLStatementprepare()
www.thelastpickle.com
o.a.c.cql3.statements.CQLStatement
checkAccess(ClientState state)validate(ClientState state)execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)
www.thelastpickle.com
statements.SelectStatement.RawStatement
// Implements ParsedStatement// Input validationprepare()
www.thelastpickle.com
statements.SelectStatement.execute()
// Create ReadCommandsStorageProxy.read()
www.thelastpickle.com
ArchitectureAPI
DynamoDatabase
www.thelastpickle.com
Dynamo Layero.a.c.service
o.a.c.net
o.a.c.dhto.a.c.gms
o.a.c.locatoro.a.c.stream
www.thelastpickle.com
o.a.c.service.StorageProxy
// Cluster wide storage operations// Select endpoints & check CL available// Send messages to Stages// Wait for response// Store Hints
www.thelastpickle.com
o.a.c.service.StorageService
// Ring operations// Track ring state// Start & stop ring membership// Node & token queries
www.thelastpickle.com
o.a.c.service.IResponseResolver
preprocess(MessageIn<T> message)resolve() throws DigestMismatchException
RowDigestResolverRowDataResolverRangeSliceResponseResolver
www.thelastpickle.com
Response Handlers / Callback
implements IAsyncCallback<T>
response(MessageIn<T> msg)
www.thelastpickle.com
o.a.c.service.ReadCallback.get()
//Wait for blockfor & datacondition.await(timeout, TimeUnit.MILLISECONDS)
throw ReadTimeoutException()
resolver.resolve()
www.thelastpickle.com
o.a.c.service.StorageProxy.fetchRows()
getLiveSortedEndpoints()new RowDigestResolver()new ReadCallback()MessagingService.sendRR()---------------------------------------ReadCallback.get() # blockingcatch (DigestMismatchException ex)catch (ReadTimeoutException ex)
www.thelastpickle.com
Dynamo Layero.a.c.service
o.a.c.net
o.a.c.dhto.a.c.gms
o.a.c.locatoro.a.c.stream
www.thelastpickle.com
o.a.c.net.MessagingService.verb<<enum>>
MUTATIONREADREQUEST_RESPONSETREE_REQUESTTREE_RESPONSE
(And more...)
www.thelastpickle.com
o.a.c.net.MessagingService.verbHandlers
new EnumMap<Verb, IVerbHandler>(Verb.class)
www.thelastpickle.com
o.a.c.net.IVerbHandler<T>
doVerb(MessageIn<T> message, String id);
www.thelastpickle.com
o.a.c.net.MessagingService.verbStages
new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)
www.thelastpickle.com
o.a.c.net.MessagingService.receive()
runnable = new MessageDeliveryTask( message, id, timestamp);
StageManager.getStage( message.getMessageType());
stage.execute(runnable);
www.thelastpickle.com
o.a.c.net.MessageDeliveryTask.run()
// If dropable and rpc_timeoutMessagingService.incrementDroppedMessag
es(verb);
MessagingService.getVerbHandler(verb)verbHandler.doVerb(message, id)
www.thelastpickle.com
ArchitectureAPI Layer
Dynamo LayerDatabase Layer
www.thelastpickle.com
Database Layero.a.c.concurrent
o.a.c.db
o.a.c.cacheo.a.c.io
o.a.c.trace
www.thelastpickle.com
o.a.c.concurrent.StageManager
stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);
getStage(Stage stage)
www.thelastpickle.com
o.a.c.concurrent.Stage
READMUTATIONGOSSIPREQUEST_RESPONSEANTI_ENTROPY
(And more...)www.thelastpickle.com
Database Layero.a.c.concurrent
o.a.c.db
o.a.c.cacheo.a.c.io
o.a.c.trace
www.thelastpickle.com
o.a.c.db.Table
// Keyspaceopen(String table)getColumnFamilyStore(String cfName)
getRow(QueryFilter filter)apply(RowMutation mutation, boolean writeCommitLog)
www.thelastpickle.com
o.a.c.db.ColumnFamilyStore
// Column FamilygetColumnFamily(QueryFilter filter)getTopLevelColumns(...)
apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
www.thelastpickle.com
o.a.c.db.IColumnContainer
addColumn(IColumn column)remove(ByteBuffer columnName)
ColumnFamilySuperColumn
www.thelastpickle.com
o.a.c.db.ISortedColumns
addColumn(IColumn column, Allocator allocator)removeColumn(ByteBuffer name)
ArrayBackedSortedColumnsAtomicSortedColumnsTreeMapBackedSortedColumns
www.thelastpickle.com
o.a.c.db.Memtable
put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)
www.thelastpickle.com
o.a.c.db.ReadCommand
getRow(Table table)
SliceByNamesReadCommandSliceFromReadCommand
www.thelastpickle.com
o.a.c.db.IDiskAtomFilter
getMemtableColumnIterator(...)getSSTableColumnIterator(...)
IdentityQueryFilterNamesQueryFilterSliceQueryFilter
www.thelastpickle.com
SummaryCustomTThreadPoolServer Message.Dispatcher
CassandraServer QueryProcessor
ReadCommand
StorageProxy
IResponseResolver
IAsyncCallback
MessagingService
IVerbHandler
Table ColumnFamilyStore IDiskAtomFilter
API
Dynamo
Database
www.thelastpickle.com
Aaron Morton@aaronmorton
Co-Founder & Principal Consultantwww.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License