Cassandra 2.1 boot camp, Overview

Upload: joshmckenzie

Posted on 02-Jul-2015

DESCRIPTION

Cassandra Summit 2014 Boot Camp introduction, presented by Aaron Morton.

TRANSCRIPT

Page 1: Cassandra 2.1 boot camp, Overview

Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License

CASSANDRA SUMMIT SF 2014

CONTRIBUTOR BOOT CAMP

Aaron Morton @aaronmorton

Co-Founder & Principal Consultant

Page 2: Cassandra 2.1 boot camp, Overview

Agenda: Architecture; Startup, Shutdown & Failure; StorageProxy; MessagingService; Gossip

Page 3: Cassandra 2.1 boot camp, Overview

Dynamo Cluster Architecture

Diagram: Clients, and two nodes (Node 1, Node 2), each a stack of APIs, Dynamo, Database and Disk layers.

Page 4: Cassandra 2.1 boot camp, Overview

API Layer

o.a.c.auth, o.a.c.cql3, o.a.c.metrics, o.a.c.thrift, o.a.c.transport

(o.a.c is short for org.apache.cassandra.)

Page 5: Cassandra 2.1 boot camp, Overview

API Layer

Talks to the Dynamo layer using Commands via the StorageProxy.

Page 6: Cassandra 2.1 boot camp, Overview

Dynamo Layer

o.a.c.dht, o.a.c.gms, o.a.c.locator, o.a.c.net, o.a.c.repair, o.a.c.service, o.a.c.streaming

Page 7: Cassandra 2.1 boot camp, Overview

Dynamo Layer

Talks to the Database layer by sending messages to IVerbHandlers via the MessagingService.

Page 8: Cassandra 2.1 boot camp, Overview

Database Layer

o.a.c.cache, o.a.c.concurrent, o.a.c.db, o.a.c.io, o.a.c.serializers

Page 9: Cassandra 2.1 boot camp, Overview

Global Services

o.a.c.config, o.a.c.tracing, o.a.c.utils

Page 10: Cassandra 2.1 boot camp, Overview

Agenda: Architecture; Startup, Shutdown & Failure; StorageProxy; MessagingService; Gossip

Page 11: Cassandra 2.1 boot camp, Overview

o.a.c.service.CassandraDaemon.main()

    // Singleton
    // Start MBean
    setup();  // here be magic

Page 12: Cassandra 2.1 boot camp, Overview

o.a.c.service.CassandraDaemon.setup()

    // JNA
    Thread.setDefaultUncaughtExceptionHandler(...);
    // Check directories exist
    SystemKeyspace.checkHealth();
    DatabaseDescriptor.loadSchemas();
    // for every ColumnFamilyStore (CFS)
    cfs.disableAutoCompaction();

Page 13: Cassandra 2.1 boot camp, Overview

o.a.c.service.CassandraDaemon.setup()

    CommitLog.recover();
    StorageService.registerDaemon();
    StorageService.initServer();

Page 14: Cassandra 2.1 boot camp, Overview

Exception Hook

    // Exception Metrics
    FileUtils.handleFSError()
    FileUtils.handleCorruptSSTable()
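A minimal Java sketch of such a hook, built on the standard `Thread.setDefaultUncaughtExceptionHandler`. It is not Cassandra's code: `FSError` and `CorruptSSTableException` below are local stand-ins for the real exception classes, the `ExceptionHook` class is hypothetical, and the two `FileUtils` calls are replaced by recording their names. Walking the cause chain reflects our reading of `CassandraDaemon.setup()` and should be checked against the source.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for Cassandra's FSError and CorruptSSTableException.
class FSError extends Error {
    FSError(String message) { super(message); }
}

class CorruptSSTableException extends RuntimeException {
    CorruptSSTableException(String message) { super(message); }
}

// Minimal exception hook: count every uncaught exception, then route disk errors by type.
final class ExceptionHook implements Thread.UncaughtExceptionHandler {
    // Stand-in for the exception metrics counter.
    final AtomicLong exceptions = new AtomicLong();
    // Names of the policies that ran; a real hook would call FileUtils.handleFSError() etc.
    final List<String> actions = new CopyOnWriteArrayList<>();

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        exceptions.incrementAndGet();
        // Walk the cause chain so a wrapped disk error still reaches its handler.
        for (Throwable cause = error; cause != null; cause = cause.getCause()) {
            if (cause instanceof FSError) {
                actions.add("handleFSError");
            } else if (cause instanceof CorruptSSTableException) {
                actions.add("handleCorruptSSTable");
            }
        }
    }

    // Installs this hook for every thread in the JVM.
    void install() {
        Thread.setDefaultUncaughtExceptionHandler(this);
    }
}
```

Calling `uncaughtException` directly, as a thread dying would, lets the routing be checked without killing a thread.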

Page 15: Cassandra 2.1 boot camp, Overview

Shutdown and Drain Hook

    // Shutdown client transports
    // Shutdown thread pools
    // Blocking flush to disk
    // Shutdown commit log
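The drain ordering can be sketched with a JVM shutdown hook (`Runtime.addShutdownHook`). The `DrainHook` class and its step strings are hypothetical: they only record the order from the slide. The run-once guard is an assumption added so that an explicit drain followed by JVM exit does not repeat the work.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical drain sequence in the order listed on the slide; each step is only recorded here.
final class DrainHook {
    final List<String> steps = new CopyOnWriteArrayList<>();
    private final AtomicBoolean drained = new AtomicBoolean(false);

    void drain() {
        // Run at most once, e.g. if an explicit drain is followed by JVM shutdown.
        if (!drained.compareAndSet(false, true)) {
            return;
        }
        steps.add("shutdown client transports"); // stop accepting new client requests
        steps.add("shutdown thread pools");      // stop the stages processing work
        steps.add("blocking flush to disk");     // write memtables out to SSTables
        steps.add("shutdown commit log");        // flushed data no longer needs replay
    }

    // Runs drain() when the JVM exits normally or receives SIGTERM.
    void install() {
        Runtime.getRuntime().addShutdownHook(new Thread(this::drain, "drain-hook"));
    }
}
```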

Page 16: Cassandra 2.1 boot camp, Overview

Agenda: Architecture; Startup, Shutdown & Failure; StorageProxy; MessagingService; Gossip

Page 17: Cassandra 2.1 boot camp, Overview

o.a.c.service.StorageProxy

    // Cluster-wide storage operations
    // Select endpoints & check the CL (consistency level) can be met
    // Send messages to Stages
    // Wait for response
    // Store Hints

Page 18: Cassandra 2.1 boot camp, Overview

o.a.c.service.IResponseResolver

    preprocess(MessageIn<T> message)
    resolve() throws DigestMismatchException

Implementations: RowDigestResolver, RowDataResolver, RangeSliceResponseResolver

Page 19: Cassandra 2.1 boot camp, Overview

Response Handlers / Callback

    implements IAsyncCallback<T>
    response(MessageIn<T> msg)

Page 20: Cassandra 2.1 boot camp, Overview

o.a.c.service.ReadCallback.get()

    // Wait for blockfor responses & data
    condition.await(timeout, TimeUnit.MILLISECONDS)
    // if the condition is not set, throw ReadTimeoutException
    resolver.resolve()
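The wait-then-resolve pattern in `ReadCallback.get()` can be sketched with a `CountDownLatch` standing in for `condition`. This is a simplified, hypothetical `SimpleReadCallback`: it only counts responses (it does not also wait for the data response), `ReadTimeout` stands in for `ReadTimeoutException`, and it returns the raw responses where the real method calls `resolver.resolve()`.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical unchecked stand-in for ReadTimeoutException.
class ReadTimeout extends RuntimeException {
    ReadTimeout(int received, int blockFor) {
        super("received " + received + " of " + blockFor + " responses");
    }
}

// Simplified read callback: collects replica responses and releases get() after blockFor of them.
final class SimpleReadCallback<T> {
    private final int blockFor;
    private final CountDownLatch condition;
    private final List<T> responses = new CopyOnWriteArrayList<>();

    SimpleReadCallback(int blockFor) {
        this.blockFor = blockFor;
        this.condition = new CountDownLatch(blockFor);
    }

    // Called once per replica response (the role of IAsyncCallback.response()).
    void response(T payload) {
        responses.add(payload);
        condition.countDown();
    }

    // Waits for blockFor responses; a real callback would then return resolver.resolve().
    List<T> get(long timeoutMillis) {
        boolean signalled;
        try {
            signalled = condition.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for replicas", e);
        }
        if (!signalled) {
            throw new ReadTimeout(responses.size(), blockFor);
        }
        return responses;
    }
}
```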

Page 21: Cassandra 2.1 boot camp, Overview

o.a.c.service.StorageProxy.fetchRows()

    AbstractReadExecutor.getReadExecutor()
    exec.executeAsync();
    exec.maybeTryAdditionalReplicas();
    ---------------------------------------
    AbstractReadExecutor.get()  // handler.get()
    catch (DigestMismatchException ex)
    catch (ReadTimeoutException ex)

Page 22: Cassandra 2.1 boot camp, Overview

AbstractReadExecutor.getReadExecutor()

    StorageProxy.getLiveSortedEndpoints()
    CFMetaData.newReadRepairDecision()
    ConsistencyLevel.filterForQuery()
    ConsistencyLevel.assureSufficientLiveNodes()
    …

Page 23: Cassandra 2.1 boot camp, Overview

AbstractReadExecutor.getReadExecutor()

    // no retry or blocking for all replicas
    return new NeverSpeculatingReadExecutor()

    // always retry or targeting all replicas
    return new AlwaysSpeculatingReadExecutor()

    // otherwise
    return new SpeculatingReadExecutor()
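The executor choice reduces to a small decision function. In this sketch `RetryPolicy`, `ExecutorKind` and `ReadExecutorChooser.choose()` are hypothetical names; `blockFor` is the number of replicas the consistency level waits for, `targets` the number of replicas contacted initially, and `liveReplicas` the number of live replicas for the key. It encodes only the three rules on the slide.

```java
// Hypothetical stand-in for the table's speculative retry setting.
enum RetryPolicy { NONE, ALWAYS, PERCENTILE }

enum ExecutorKind { NEVER_SPECULATING, ALWAYS_SPECULATING, SPECULATING }

final class ReadExecutorChooser {
    static ExecutorKind choose(RetryPolicy retry, int blockFor, int targets, int liveReplicas) {
        // No retry, or blocking for every replica: there is no spare replica to speculate on.
        if (retry == RetryPolicy.NONE || blockFor == liveReplicas) {
            return ExecutorKind.NEVER_SPECULATING;
        }
        // Always retry, or already targeting every replica: contact them all up front.
        if (retry == RetryPolicy.ALWAYS || targets == liveReplicas) {
            return ExecutorKind.ALWAYS_SPECULATING;
        }
        // Otherwise contact the targets and speculate on an extra replica if the read is slow.
        return ExecutorKind.SPECULATING;
    }
}
```

For example, a consistency level that blocks for 2 of 3 live replicas, with a percentile-based retry, gets a speculating executor.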

Page 24: Cassandra 2.1 boot camp, Overview

AbstractReadExecutor()

    resolver = new RowDigestResolver()
    handler = new ReadCallback<>()

Page 25: Cassandra 2.1 boot camp, Overview

AbstractReadExecutor.executeAsync()

    // makeDataRequests
    MessagingService.instance().sendRR(command.createMessage(), endpoint, handler);

    // makeDigestRequests
    ReadCommand digestCommand = command.copy();
    digestCommand.setDigestQuery(true);
    MessageOut<?> message = digestCommand.createMessage();
    MessagingService.instance().sendRR(message, endpoint, handler);
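The point of the data/digest split is that only one replica ships the full row while the others send a hash of theirs. A hypothetical resolver in the spirit of `RowDigestResolver` might compare them as below. MD5 is used as the hash (our understanding is that 2.1 digests with MD5, but treat that as an assumption), rows are plain strings, and `DigestMismatch` stands in for `DigestMismatchException`.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical unchecked stand-in for DigestMismatchException.
class DigestMismatch extends RuntimeException {
    DigestMismatch(String message) { super(message); }
}

final class DigestRead {
    // What a digest request returns instead of the row itself: a hash of the replica's answer.
    static byte[] digest(String rowData) {
        try {
            return MessageDigest.getInstance("MD5").digest(rowData.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Resolves one data response against the digest responses from the other replicas.
    static String resolve(String dataResponse, List<byte[]> digestResponses) {
        byte[] expected = digest(dataResponse);
        for (byte[] replicaDigest : digestResponses) {
            if (!Arrays.equals(expected, replicaDigest)) {
                // The caller (fetchRows() on the slides) catches this and falls back to full data reads.
                throw new DigestMismatch("a replica disagrees with the data response");
            }
        }
        return dataResponse;
    }
}
```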

Page 26: Cassandra 2.1 boot camp, Overview

StorageProxy.mutateAtomically()

    wrapResponseHandler()
    AbstractWriteResponseHandler.assureSufficientLiveNodes()
    -----------------------------------------------------
    getBatchlogEndpoints()
    syncWriteToBatchlog()        // all mutations
    syncWriteBatchedMutations()  // all wrappers
    asyncRemoveFromBatchlog()

    catch (UnavailableException e)
    catch (WriteTimeoutException e)

Page 27: Cassandra 2.1 boot camp, Overview

StorageProxy.wrapResponseHandler()

    StorageService.getNaturalEndpoints()
    TokenMetadata.pendingEndpointsFor()
    AbstractReplicationStrategy.getWriteResponseHandler()
    -----------------------------------------

    // AbstractWriteResponseHandler
    WriteResponseHandler
    DatacenterWriteResponseHandler
    DatacenterSyncWriteResponseHandler
    ReplayWriteResponseHandler
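As we understand it, `DatacenterWriteResponseHandler` is the handler picked for the datacenter-local consistency levels (`LOCAL_QUORUM`, `LOCAL_ONE`). A hypothetical, much-simplified `LocalDcWriteHandler` shows the idea: only acknowledgements from endpoints in the coordinator's datacenter count towards `blockFor`. The endpoint-to-DC map stands in for the snitch, and `localQuorum()` uses the usual `rf / 2 + 1` majority.

```java
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Simplified, hypothetical take on a datacenter-local write handler.
final class LocalDcWriteHandler {
    private final String localDc;
    private final Map<String, String> dcOfEndpoint; // endpoint -> DC; the snitch provides this in Cassandra
    private final CountDownLatch acks;

    LocalDcWriteHandler(String localDc, Map<String, String> dcOfEndpoint, int blockFor) {
        this.localDc = localDc;
        this.dcOfEndpoint = dcOfEndpoint;
        this.acks = new CountDownLatch(blockFor);
    }

    // LOCAL_QUORUM waits for a majority of the replicas in the local DC.
    static int localQuorum(int localReplicationFactor) {
        return localReplicationFactor / 2 + 1;
    }

    // Acks from other datacenters are ignored: they do not count towards a LOCAL_* level.
    void response(String fromEndpoint) {
        if (localDc.equals(dcOfEndpoint.get(fromEndpoint))) {
            acks.countDown();
        }
    }

    // True if blockFor local acks arrived within the timeout.
    boolean awaitAcks(long timeoutMillis) {
        try {
            return acks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```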

Page 28: Cassandra 2.1 boot camp, Overview

StorageProxy.syncWriteBatchedMutations()

    // write to natural and pending endpoints
    sendToHintedEndpoints()
    ---------------------------------------
    AbstractWriteResponseHandler.get()

Page 29: Cassandra 2.1 boot camp, Overview

StorageProxy.sendToHintedEndpoints()

    // loop over all targets
    MessagingService.sendRR()  // for the local DC

    // group messages for remote DCs
    dcGroups.get(dc).add(destination)

    // write hints for down nodes
    submitHint()
    ---------------------------------------
    sendMessagesToNonlocalDC()
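The partitioning done by `sendToHintedEndpoints()` can be sketched as a pure function over the target list. `WritePlan` and its fields are hypothetical; endpoints are strings, liveness is a plain set rather than the failure detector, and the coordinator's own local write is not special-cased. As we understand `sendMessagesToNonlocalDC()`, each remote-DC group becomes one message sent to a node in that DC, which forwards it to the rest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical result of partitioning the write targets for one mutation.
final class WritePlan {
    final List<String> localSends = new ArrayList<>();                     // sent directly (sendRR)
    final Map<String, List<String>> remoteDcGroups = new LinkedHashMap<>(); // one message per remote DC
    final List<String> hinted = new ArrayList<>();                         // down: store a hint instead

    static WritePlan plan(List<String> targets, Map<String, String> dcOf,
                          String localDc, Set<String> liveEndpoints) {
        WritePlan result = new WritePlan();
        for (String target : targets) {
            if (!liveEndpoints.contains(target)) {
                result.hinted.add(target); // submitHint() in the real code
                continue;
            }
            String dc = dcOf.get(target);
            if (localDc.equals(dc)) {
                result.localSends.add(target);
            } else {
                // Equivalent of dcGroups.get(dc).add(destination), creating the group on first use.
                result.remoteDcGroups.computeIfAbsent(dc, k -> new ArrayList<>()).add(target);
            }
        }
        return result;
    }
}
```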

Page 30: Cassandra 2.1 boot camp, Overview

Agenda: Architecture; Startup, Shutdown & Failure; StorageProxy; MessagingService; Gossip

Page 31: Cassandra 2.1 boot camp, Overview

MessagingService Transport Layer

Custom serialisation over TCP sockets. The serialisers are spread around the code base.
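The serialisers follow a versioned pattern: each message type knows how to write and read itself for a given messaging version, so nodes running different releases can still talk to each other. The sketch below mimics the shape of `IVersionedSerializer<T>` with a hypothetical `VersionedSerializer` interface and a hypothetical `Ping` payload whose `timestamp` field is only on the wire from a made-up version 2 onwards.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface with the same shape as IVersionedSerializer<T> (serializedSize omitted).
interface VersionedSerializer<T> {
    void serialize(T t, DataOutput out, int version) throws IOException;

    T deserialize(DataInput in, int version) throws IOException;
}

// Hypothetical payload: the timestamp is only on the wire from version 2 onwards.
final class Ping {
    static final int VERSION_WITH_TIMESTAMP = 2;

    final String key;
    final long timestamp;

    Ping(String key, long timestamp) {
        this.key = key;
        this.timestamp = timestamp;
    }

    static final VersionedSerializer<Ping> serializer = new VersionedSerializer<Ping>() {
        @Override
        public void serialize(Ping ping, DataOutput out, int version) throws IOException {
            out.writeUTF(ping.key);
            if (version >= VERSION_WITH_TIMESTAMP) {
                out.writeLong(ping.timestamp);
            }
        }

        @Override
        public Ping deserialize(DataInput in, int version) throws IOException {
            String key = in.readUTF();
            // Older peers never sent a timestamp, so fall back to 0.
            long timestamp = version >= VERSION_WITH_TIMESTAMP ? in.readLong() : 0L;
            return new Ping(key, timestamp);
        }
    };

    // Serialises and deserialises at the given version, as if sent to a peer speaking that version.
    static Ping roundTrip(Ping ping, int version) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            serializer.serialize(ping, new DataOutputStream(bytes), version);
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return serializer.deserialize(in, version);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```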

Page 32: Cassandra 2.1 boot camp, Overview

o.a.c.net.MessagingService.Verb <<enum>>

    MUTATION, READ, REQUEST_RESPONSE, TREE_REQUEST, TREE_RESPONSE

(And more...)

Page 33: Cassandra 2.1 boot camp, Overview

o.a.c.net.MessagingService.verbHandlers

    new EnumMap<Verb, IVerbHandler>(Verb.class)

Page 34: Cassandra 2.1 boot camp, Overview

o.a.c.net.IVerbHandler<T>

    doVerb(MessageIn<T> message, String id);

Page 35: Cassandra 2.1 boot camp, Overview

o.a.c.net.MessageIn<T>

    public class MessageIn<T>
    {
        public final InetAddress from;
        public final T payload;
        public final Map<String, byte[]> parameters;
        public final MessagingService.Verb verb;
        public final int version;
        …
    }

Page 36: Cassandra 2.1 boot camp, Overview

o.a.c.net.MessageOut<T>

    public class MessageOut<T>
    {
        public final InetAddress from;
        public final MessagingService.Verb verb;
        public final T payload;
        public final IVersionedSerializer<T> serializer;
        public final Map<String, byte[]> parameters;
        …
    }

Page 37: Cassandra 2.1 boot camp, Overview

o.a.c.net.MessagingService.verbStages

    new EnumMap<MessagingService.Verb, Stage>(MessagingService.Verb.class)

Page 38: Cassandra 2.1 boot camp, Overview

o.a.c.net.MessagingService.verbStages

    put(Verb.MUTATION, Stage.MUTATION);
    put(Verb.READ, Stage.READ);
    put(Verb.REQUEST_RESPONSE, Stage.REQUEST_RESPONSE);

Page 39: Cassandra 2.1 boot camp, Overview

o.a.c.net.MessagingService.receive()

    runnable = new MessageDeliveryTask(message, id, timestamp);
    stage = StageManager.getStage(message.getMessageType());
    stage.execute(runnable);

Page 40: Cassandra 2.1 boot camp, Overview

o.a.c.net.MessageDeliveryTask.run()

    // If droppable and older than rpc_timeout, count it as dropped and stop
    MessagingService.incrementDroppedMessages(verb);

    // otherwise hand it to the verb's handler
    verbHandler = MessagingService.getVerbHandler(verb)
    verbHandler.doVerb(message, id)
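The receive path (Verb to Stage to handler) can be sketched end to end with `EnumMap`s and one executor per stage. Everything here is a hypothetical, cut-down model: three verbs, single-threaded stages, `String` payloads and an assumed set of droppable verbs. It shows the shape of `receive()` and `MessageDeliveryTask.run()`, not their real signatures.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Cut-down, hypothetical versions of the Verb and Stage enums.
enum Verb { MUTATION, READ, REQUEST_RESPONSE }

enum Stage { MUTATION, READ, REQUEST_RESPONSE }

// Mirrors the shape of IVerbHandler<T>, with the payload passed directly.
interface VerbHandler<T> {
    void doVerb(T payload, int id);
}

final class MiniMessaging {
    final Map<Verb, VerbHandler<String>> verbHandlers = new EnumMap<>(Verb.class);
    private final Map<Verb, Stage> verbStages = new EnumMap<>(Verb.class);
    private final Map<Stage, ExecutorService> stages = new EnumMap<>(Stage.class);
    // Hypothetical choice of droppable verbs for this sketch.
    private final Set<Verb> droppable = EnumSet.of(Verb.MUTATION, Verb.READ);
    final AtomicInteger dropped = new AtomicInteger();
    private final long timeoutMillis;

    MiniMessaging(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
        verbStages.put(Verb.MUTATION, Stage.MUTATION);
        verbStages.put(Verb.READ, Stage.READ);
        verbStages.put(Verb.REQUEST_RESPONSE, Stage.REQUEST_RESPONSE);
        // One single-threaded executor per stage keeps the sketch deterministic.
        for (Stage stage : Stage.values()) {
            stages.put(stage, Executors.newSingleThreadExecutor());
        }
    }

    // receive(): wrap the message in a delivery task and run it on the verb's stage.
    void receive(Verb verb, String payload, int id, long createdAtMillis) {
        Stage stage = verbStages.get(verb);
        stages.get(stage).execute(() -> deliver(verb, payload, id, createdAtMillis));
    }

    // MessageDeliveryTask.run(): drop stale droppable messages, otherwise call the handler.
    private void deliver(Verb verb, String payload, int id, long createdAtMillis) {
        if (droppable.contains(verb) && System.currentTimeMillis() - createdAtMillis > timeoutMillis) {
            dropped.incrementAndGet();
            return;
        }
        verbHandlers.get(verb).doVerb(payload, id);
    }

    // Stops all stages after their queued tasks finish; returns false if that takes over 5 seconds.
    boolean shutdown() {
        for (ExecutorService executor : stages.values()) {
            executor.shutdown();
        }
        try {
            for (ExecutorService executor : stages.values()) {
                if (!executor.awaitTermination(5, TimeUnit.SECONDS)) {
                    return false;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }
}
```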

Page 41: Cassandra 2.1 boot camp, Overview

Agenda: Architecture; Startup, Shutdown & Failure; StorageProxy; MessagingService; Gossip

Page 42: Cassandra 2.1 boot camp, Overview

o.a.c.gms.ApplicationState

    STATUS, LOAD, SCHEMA, DC, RACK, RELEASE_VERSION, REMOVAL_COORDINATOR, INTERNAL_IP, RPC_ADDRESS, SEVERITY, NET_VERSION …

Page 43: Cassandra 2.1 boot camp, Overview

o.a.c.gms.VersionedValue

    public final int version;
    public final String value;

Page 44: Cassandra 2.1 boot camp, Overview

o.a.c.gms.VersionGenerator

    {
        private static final AtomicInteger version = new AtomicInteger(0);

        public static int getNextVersion()
        {
            return version.incrementAndGet();
        }
    }

Page 45: Cassandra 2.1 boot camp, Overview

o.a.c.gms.EndpointState

    {
        private volatile HeartBeatState hbState;
        final Map<ApplicationState, VersionedValue> applicationState =
            new NonBlockingHashMap<ApplicationState, VersionedValue>();
    }

Page 46: Cassandra 2.1 boot camp, Overview

o.a.c.gms.HeartBeatState

    {
        private int generation;
        private int version;
        …
    }

Page 47: Cassandra 2.1 boot camp, Overview

o.a.c.db.SystemKeyspace.incrementAndGetGeneration()

    SELECT gossip_generation FROM system.local WHERE key='local';

    // if none
    generation = (int) (System.currentTimeMillis() / 1000);

    // else
    generation = (int) (System.currentTimeMillis() / 1000);
    // and some other checks: if the stored generation + 1 is not older than now, use that instead
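The two heartbeat numbers give a total order between two views of the same node: generation first, then version. A hypothetical `Heartbeat` class captures that, together with the generation rule from `incrementAndGetGeneration()`. Reading "some other checks" as `max(stored + 1, now)` is our interpretation of the 2.1 source and should be treated as an assumption.

```java
// Hypothetical value type with HeartBeatState's two fields, plus the generation rule.
final class Heartbeat implements Comparable<Heartbeat> {
    final int generation; // seconds since the epoch, fixed when the node starts
    final int version;    // increases while the node is running

    Heartbeat(int generation, int version) {
        this.generation = generation;
        this.version = version;
    }

    // New generation for a starting node: now in seconds, but never below storedGeneration + 1,
    // so peers still treat the restarted node as newer even if the clock moved backwards.
    static int nextGeneration(Integer storedGeneration, long nowMillis) {
        int now = (int) (nowMillis / 1000);
        if (storedGeneration == null) {
            return now; // first start: nothing stored in system.local yet
        }
        return Math.max(storedGeneration + 1, now);
    }

    // Higher generation wins (the node restarted); within one generation the higher version wins.
    @Override
    public int compareTo(Heartbeat other) {
        if (generation != other.generation) {
            return Integer.compare(generation, other.generation);
        }
        return Integer.compare(version, other.version);
    }
}
```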

Page 48: Cassandra 2.1 boot camp, Overview

nodetool gossipinfo

    generation:1410220170
    heartbeat:37
    LOAD:1.57821104E8
    STATUS:NORMAL,-1007384361686170050
    RACK:rack1
    NET_VERSION:8
    SEVERITY:0.0
    RELEASE_VERSION:2.1.0-rc5
    SCHEMA:f3b70c8e-a904-3de9-ac5d-8ab30271441d
    HOST_ID:4aac20b5-3c68-4a26-a415-2e2f2ff0ed46
    RPC_ADDRESS:127.0.0.1

Page 49: Cassandra 2.1 boot camp, Overview

o.a.c.gms.Gossiper.GossipTask.run()

Gossips every second with 1 to 3 nodes: a random live node, possibly a random unreachable node, and possibly a seed.

Each exchange is a three-step process.

Page 50: Cassandra 2.1 boot camp, Overview

Processed by IVerbHandlers.

I send a SYN. The remote node replies with an ACK.

I send an ACK2.

Page 51: Cassandra 2.1 boot camp, Overview

o.a.c.gms.GossipDigestSyn

Exchanges a List<GossipDigest>.

    GossipDigest
    {
        final InetAddress endpoint;
        final int generation;
        final int maxVersion;
        …
    }

Page 52: Cassandra 2.1 boot camp, Overview

o.a.c.gms.Gossiper.examineGossiper()

    // If empty SYN, send all my info (shadow gossip)

    if (remoteGeneration == localGeneration && maxRemoteVersion == maxLocalVersion)
        // do nothing
    else if (remoteGeneration > localGeneration)
        // we request everything from the gossiper
    else if (remoteGeneration < localGeneration)
        // send all data with generation = localgeneration and version >

Page 53: Cassandra 2.1 boot camp, Overview

o.a.c.gms.Gossiper.examineGossiper()

    else if (remoteGeneration == localGeneration)
    /* If the max remote version is greater, then we request the remote endpoint send us all the
       data for this endpoint with version greater than the max version number we have locally
       for this endpoint.

       If the max remote version is lower, then we send all the data we have locally for this
       endpoint with version greater than the max remote version. */
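The branches of `examineGossiper()` for a single digest can be written as a small pure function. `GossipAction` and `DigestExamine.examine()` are hypothetical names; the function covers only the generation/version cases in the comments above, not the empty-SYN (shadow gossip) case or an endpoint we have no state for.

```java
// Hypothetical names for the outcomes described in the examineGossiper() comments.
enum GossipAction { NOTHING, REQUEST_ALL, SEND_ALL, REQUEST_NEWER, SEND_NEWER }

final class DigestExamine {
    // Compares one GossipDigest from a SYN (remote view) with our own state for that endpoint.
    static GossipAction examine(int remoteGeneration, int maxRemoteVersion,
                                int localGeneration, int maxLocalVersion) {
        if (remoteGeneration == localGeneration && maxRemoteVersion == maxLocalVersion) {
            return GossipAction.NOTHING;     // both sides already agree
        }
        if (remoteGeneration > localGeneration) {
            return GossipAction.REQUEST_ALL; // the sender knows a newer incarnation of the endpoint
        }
        if (remoteGeneration < localGeneration) {
            return GossipAction.SEND_ALL;    // the sender has the older generation
        }
        // Same generation: whoever has the higher max version holds the newer values.
        return maxRemoteVersion > maxLocalVersion ? GossipAction.REQUEST_NEWER : GossipAction.SEND_NEWER;
    }
}
```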

Page 54: Cassandra 2.1 boot camp, Overview

Thanks.

Page 55: Cassandra 2.1 boot camp, Overview

Aaron Morton @aaronmorton

Co-Founder & Principal Consultant www.thelastpickle.com
