Cassandra Community Webinar: Apache Cassandra Internals
Post on 26-Jan-2015
CASSANDRA COMMUNITY WEBINARS AUGUST 2013
CASSANDRA INTERNALS
Aaron Morton
@aaronmorton
Co-Founder & Principal Consultant
www.thelastpickle.com
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
About The Last Pickle
Work with clients to deliver and improve Apache Cassandra based solutions.
Apache Cassandra Committer, DataStax MVP, Hector Maintainer, 6+ years combined Cassandra experience.
Based in New Zealand & Austin, TX.
Cassandra Architecture
[Diagram labels: Clients; node layers APIs, Cluster Aware, Cluster Unaware, Disk]
Cassandra Cluster Architecture
[Diagram labels: Clients; Node 1 and Node 2, each with layers APIs, Cluster Aware, Cluster Unaware, Disk]
Dynamo Cluster Architecture
[Diagram labels: Clients; Node 1 and Node 2, each with layers APIs, Dynamo, Database, Disk]
Architecture
API
Dynamo
Database
API Transports
Thrift
Native Binary
Thrift Transport
// Custom TServer implementations
o.a.c.thrift.CustomTThreadPoolServer
o.a.c.thrift.CustomTNonBlockingServer
o.a.c.thrift.CustomTHsHaServer
API Transports
Thrift
Native Binary
Native Binary Transport
Beta in Cassandra 1.2
Uses Netty
Enabled with start_native_transport
(Disabled by default)
o.a.c.transport.Server.run()
// Set up the Netty server
new ExecutionHandler()
new NioServerSocketChannelFactory()
ServerBootstrap.setPipelineFactory()
o.a.c.transport.Message.Dispatcher.messageReceived()
// Process message from client
ServerConnection.validateNewMessage()
Request.execute()
ServerConnection.applyStateTransition()
Channel.write()
Messages
Defined in the Native Binary Protocol
$SRC/doc/native_protocol.spec
API Services
JMX
Thrift
CQL 3
JMX Management Beans
Spread around the code base.
Interfaces named *MBean
JMX Management Beans
Registered with names such as org.apache.cassandra.db:type=StorageProxy
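The *MBean interface naming and the domain:type=Name registration follow the standard JMX pattern. Here is a minimal sketch of that pattern using only the JDK's javax.management API. DemoProxy, DemoProxyMBean and the org.example.db domain are made-up names for illustration and are not Cassandra classes.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class JmxRegistrationSketch {
    // Standard MBean convention: the management interface is the
    // implementation class name plus the suffix "MBean".
    public interface DemoProxyMBean {
        long getReadCount();
    }

    // Hypothetical bean; a real service would update the counter itself.
    public static class DemoProxy implements DemoProxyMBean {
        private final long reads = 42;
        @Override
        public long getReadCount() { return reads; }
    }

    // Registers the bean under a domain:type=Name object name, then reads
    // the "ReadCount" attribute back through the MBean server.
    static Object registerAndRead() {
        try {
            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("org.example.db:type=DemoProxy");
            if (!mbs.isRegistered(name)) {
                mbs.registerMBean(new DemoProxy(), name);
            }
            return mbs.getAttribute(name, "ReadCount");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A JMX client such as jconsole would show this bean under the org.example.db domain, in the same way the Cassandra beans appear under their own domains.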
API Services
JMX
Thrift
CQL 3
o.a.c.thrift.CassandraServer
// Implements Thrift Interface
// Access control
// Input validation
// Mapping to/from Thrift and internal types
Thrift Interface
Thrift IDL
$SRC/interface/cassandra.thrift
o.a.c.thrift.CassandraServer.get_slice()
// get columns for one row
Tracing.begin()
ClientState cState = state()
cState.hasColumnFamilyAccess()
multigetSliceInternal()
CassandraServer.multigetSliceInternal()
// get columns for many rows
ThriftValidation.validate*()
// Create ReadCommands
getSlice()
CassandraServer.getSlice()
// Process ReadCommands
// Return Thrift types
readColumnFamily()
thriftifyColumnFamily()
CassandraServer.readColumnFamily()
// Process ReadCommands
// Return ColumnFamilies
StorageProxy.read()
API Services
JMX
Thrift
CQL 3
o.a.c.cql3.QueryProcessor
// Prepares and executes CQL3 statements
// Used by Thrift & Native transports
// Access control
// Input validation
// Returns transport.ResultMessage
CQL3 Grammar
ANTLR Grammar
$SRC/o.a.c.cql3/Cql.g
o.a.c.cql3.statements.ParsedStatement
// Subclasses generated by ANTLR
// Tracks bound term count
// Prepare CQLStatement
prepare()
o.a.c.cql3.statements.CQLStatement
checkAccess(ClientState state)
validate(ClientState state)
execute(ConsistencyLevel cl, QueryState state, List<ByteBuffer> variables)
statements.SelectStatement.RawStatement
// Implements ParsedStatement
// Input validation
prepare()
statements.SelectStatement.execute()
// Create ReadCommands
StorageProxy.read()
Architecture
API
Dynamo
Database
Dynamo Layer
o.a.c.service
o.a.c.net
o.a.c.dht
o.a.c.gms
o.a.c.locator
o.a.c.stream
o.a.c.service.StorageProxy
// Cluster wide storage operations
// Select endpoints & check CL available
// Send messages to Stages
// Wait for response
// Store Hints
o.a.c.service.StorageService
// Ring operations
// Track ring state
// Start & stop ring membership
// Node & token queries
o.a.c.service.IResponseResolver
preprocess(MessageIn<T> message)
resolve() throws DigestMismatchException
RowDigestResolver
RowDataResolver
RangeSliceResponseResolver
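The resolve() signature suggests the resolver compares what the replicas returned and signals a mismatch by throwing. A minimal sketch of that idea, assuming one replica returns the full row and the others return only a hash of it. The String row type, the MD5 choice and all class names here are illustrative assumptions, not the Cassandra implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

class DigestResolverSketch {
    // Hypothetical stand-in for DigestMismatchException (unchecked for brevity).
    static class DigestMismatch extends RuntimeException {
        DigestMismatch(String message) { super(message); }
    }

    // Hash of a row as a replica would report it (MD5 is an assumption here).
    static byte[] digest(String row) {
        try {
            return MessageDigest.getInstance("MD5")
                                .digest(row.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the data response only if every digest response agrees with it.
    static String resolve(String dataResponse, List<byte[]> digestResponses) {
        byte[] expected = digest(dataResponse);
        for (byte[] reported : digestResponses) {
            if (!Arrays.equals(expected, reported)) {
                throw new DigestMismatch("replicas disagree; full data read needed");
            }
        }
        return dataResponse;
    }
}
```

The caller is expected to catch the mismatch and retry with full data from every replica, which is the role the slide list gives to a separate data resolver.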
Response Handlers / Callback
implements IAsyncCallback<T>
response(MessageIn<T> msg)
o.a.c.service.ReadCallback.get()
// Wait for blockfor & data
condition.await(timeout, TimeUnit.MILLISECONDS)
// If the wait times out
throw new ReadTimeoutException()
// Otherwise
resolver.resolve()
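The wait-then-resolve shape of get() can be sketched with a CountDownLatch: each response counts down, and the caller blocks until blockFor responses have arrived or the timeout passes. The class name, the String reply type and the unchecked ReadTimeout exception are simplifications for illustration. The real callback uses a condition and Cassandra's own message types.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class BlockingCallbackSketch {
    // Hypothetical stand-in for ReadTimeoutException.
    static class ReadTimeout extends RuntimeException {
        ReadTimeout(String message) { super(message); }
    }

    private final CountDownLatch outstanding;
    private final List<String> replies = new CopyOnWriteArrayList<>();

    // blockFor: number of replica responses the consistency level requires.
    BlockingCallbackSketch(int blockFor) {
        this.outstanding = new CountDownLatch(blockFor);
    }

    // Called once per replica response, typically from another thread.
    void response(String reply) {
        replies.add(reply);
        outstanding.countDown();
    }

    // Blocks until blockFor responses arrived, or throws on timeout.
    List<String> get(long timeoutMillis) {
        try {
            if (!outstanding.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new ReadTimeout("only " + replies.size() + " replies before timeout");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return replies;
    }
}
```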
o.a.c.service.StorageProxy.fetchRows()
getLiveSortedEndpoints()
new RowDigestResolver()
new ReadCallback()
MessagingService.sendRR()
---------------------------------------
ReadCallback.get() // blocking
catch (DigestMismatchException ex)
catch (ReadTimeoutException ex)
Dynamo Layer
o.a.c.service
o.a.c.net
o.a.c.dht
o.a.c.gms
o.a.c.locator
o.a.c.stream
o.a.c.net.MessagingService.Verb <<enum>>
MUTATION
READ
REQUEST_RESPONSE
TREE_REQUEST
TREE_RESPONSE
(And more...)
o.a.c.net.MessagingService.verbHandlers
new EnumMap<Verb, IVerbHandler>(Verb.class)
o.a.c.net.IVerbHandler<T>
doVerb(MessageIn<T> message, String id);
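The verb enum, the EnumMap of handlers and the doVerb() method together form a simple dispatch table. A minimal sketch of that table follows. The three-value Verb enum, the String payload and the String return value are assumptions made so the example is small and checkable. The interface on the slide takes a MessageIn<T> and the return type is not shown.

```java
import java.util.EnumMap;
import java.util.Map;

class VerbDispatchSketch {
    // Hypothetical subset of verbs.
    enum Verb { MUTATION, READ, REQUEST_RESPONSE }

    // Simplified handler; returns a String only to make the result observable.
    interface VerbHandler {
        String doVerb(String payload, String id);
    }

    // One handler per verb, stored in an EnumMap as on the slide.
    private final Map<Verb, VerbHandler> verbHandlers = new EnumMap<>(Verb.class);

    void registerVerbHandler(Verb verb, VerbHandler handler) {
        verbHandlers.put(verb, handler);
    }

    // Looks up the handler for the message's verb and invokes it.
    String dispatch(Verb verb, String payload, String id) {
        VerbHandler handler = verbHandlers.get(verb);
        if (handler == null) {
            throw new IllegalStateException("No handler registered for " + verb);
        }
        return handler.doVerb(payload, id);
    }
}
```

An EnumMap is backed by an array indexed by the enum ordinal, so the lookup stays cheap however many verbs exist.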
o.a.c.net.MessagingService.verbStages
new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)
o.a.c.net.MessagingService.receive()
runnable = new MessageDeliveryTask(message, id, timestamp);
stage = StageManager.getStage(message.getMessageType());
stage.execute(runnable);
o.a.c.net.MessageDeliveryTask.run()
// If droppable and older than rpc_timeout
MessagingService.incrementDroppedMessages(verb);
// Otherwise
MessagingService.getVerbHandler(verb)
verbHandler.doVerb(message, id)
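The drop-or-deliver decision in run() can be sketched as below. Which verbs count as droppable, the explicit clock arguments and every name in the class are assumptions chosen to make the behaviour easy to test. The slide does not list the droppable verbs.

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

class DeliveryTaskSketch {
    // Hypothetical verbs; the droppable subset below is an assumption.
    enum Verb { MUTATION, READ, GOSSIP }

    static final Set<Verb> DROPPABLE = EnumSet.of(Verb.MUTATION, Verb.READ);

    // Counts messages discarded because they waited longer than the timeout.
    final AtomicInteger droppedMessages = new AtomicInteger();

    // Runs the handler unless the message is droppable and already expired.
    // Returns true when the handler ran, false when the message was dropped.
    boolean deliver(Verb verb, long createdAtMillis, long nowMillis,
                    long timeoutMillis, Runnable handler) {
        boolean expired = nowMillis - createdAtMillis > timeoutMillis;
        if (expired && DROPPABLE.contains(verb)) {
            droppedMessages.incrementAndGet();
            return false;
        }
        handler.run();
        return true;
    }
}
```

Dropping an expired request saves work the coordinator has already given up on, since it stopped waiting once the timeout passed.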
Architecture
API Layer
Dynamo Layer
Database Layer
Database Layer
o.a.c.concurrent
o.a.c.db
o.a.c.cache
o.a.c.io
o.a.c.trace
o.a.c.concurrent.StageManager
stages = new EnumMap<Stage, ThreadPoolExecutor>(Stage.class);
getStage(Stage stage)
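A stage here is a named thread pool, and the StageManager is a lookup from stage to pool. A minimal sketch of that arrangement follows. The three-value Stage enum, the fixed pool size of two, the thread naming and the call() helper are all illustrative assumptions.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class StageManagerSketch {
    // Hypothetical subset of stages.
    enum Stage { READ, MUTATION, GOSSIP }

    private final Map<Stage, ExecutorService> stages = new EnumMap<>(Stage.class);

    StageManagerSketch() {
        for (Stage stage : Stage.values()) {
            // One small pool per stage; threads are named after the stage.
            stages.put(stage, Executors.newFixedThreadPool(2, runnable -> {
                Thread thread = new Thread(runnable, stage.name() + "-worker");
                thread.setDaemon(true);
                return thread;
            }));
        }
    }

    ExecutorService getStage(Stage stage) {
        return stages.get(stage);
    }

    // Convenience helper (not on the slide): run a task on a stage and wait.
    <T> T call(Stage stage, Callable<T> task) {
        try {
            return getStage(stage).submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void shutdown() {
        for (ExecutorService pool : stages.values()) {
            pool.shutdown();
        }
    }
}
```

Giving each kind of work its own pool means a backlog of one kind, for example reads, does not starve threads needed for another, such as gossip.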
o.a.c.concurrent.Stage
READ
MUTATION
GOSSIP
REQUEST_RESPONSE
ANTI_ENTROPY
(And more...)
Database Layer
o.a.c.concurrent
o.a.c.db
o.a.c.cache
o.a.c.io
o.a.c.trace
o.a.c.db.Table
// Keyspace
open(String table)
getColumnFamilyStore(String cfName)
getRow(QueryFilter filter)
apply(RowMutation mutation, boolean writeCommitLog)
o.a.c.db.ColumnFamilyStore
// Column Family
getColumnFamily(QueryFilter filter)
getTopLevelColumns(...)
apply(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
o.a.c.db.IColumnContainer
addColumn(IColumn column)
remove(ByteBuffer columnName)
ColumnFamily
SuperColumn
o.a.c.db.ISortedColumns
addColumn(IColumn column, Allocator allocator)
removeColumn(ByteBuffer name)
ArrayBackedSortedColumns
AtomicSortedColumns
TreeMapBackedSortedColumns
o.a.c.db.Memtable
put(DecoratedKey key, ColumnFamily columnFamily, SecondaryIndexManager.Updater indexer)
flushAndSignal(CountDownLatch latch, Future<ReplayPosition> context)
o.a.c.db.ReadCommand
getRow(Table table)
SliceByNamesReadCommand
SliceFromReadCommand
o.a.c.db.IDiskAtomFilter
getMemtableColumnIterator(...)
getSSTableColumnIterator(...)
IdentityQueryFilter
NamesQueryFilter
SliceQueryFilter
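The difference between a names filter and a slice filter is easiest to see on a sorted map of columns: one picks specific column names, the other walks a contiguous range in sort order. A rough sketch follows, using String names and values in a TreeMap. The class and its methods are hypothetical and ignore memtables, SSTables, timestamps and tombstones.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

class ColumnFilterSketch {
    // Columns of a single row, kept sorted by column name.
    private final NavigableMap<String, String> columns = new TreeMap<>();

    void addColumn(String name, String value) {
        columns.put(name, value);
    }

    // "Names" style query: return only the requested columns that exist.
    Map<String, String> byNames(Collection<String> names) {
        Map<String, String> result = new TreeMap<>();
        for (String name : names) {
            String value = columns.get(name);
            if (value != null) {
                result.put(name, value);
            }
        }
        return result;
    }

    // "Slice" style query: up to count columns from start to finish, inclusive.
    Map<String, String> slice(String start, String finish, int count) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> column
                : columns.subMap(start, true, finish, true).entrySet()) {
            if (result.size() >= count) {
                break;
            }
            result.put(column.getKey(), column.getValue());
        }
        return result;
    }
}
```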
Summary
CustomTThreadPoolServer, Message.Dispatcher
CassandraServer, QueryProcessor
ReadCommand
StorageProxy
IResponseResolver
IAsyncCallback
MessagingService
IVerbHandler
Table, ColumnFamilyStore, IDiskAtomFilter
[Diagram layer labels: API, Dynamo, Database]