2017 high performance database with scala, akka, spark
![Page 1: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/1.jpg)
Building a High-Performance Database with Scala, Akka, and Spark
Evan Chan
November 2017
![Page 2: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/2.jpg)
Who am I
• User and contributor to Spark since 0.9, Cassandra since 0.6
• Created Spark Job Server and FiloDB
• Talks at Spark Summit, Cassandra Summit, Strata, Scala Days, etc.
• http://velvia.github.io/
![Page 3: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/3.jpg)
Why Build a New Streaming Database?
![Page 4: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/4.jpg)
Needs
• Ingest HUGE streams of events (IoT etc.)
• Real-time, low latency, and somewhat flexible queries
• Dashboards, quick answers on new data
• Flexible schemas and query patterns
• Keep your streaming pipeline super simple
• Streaming = hardest to debug. Simplicity rules!
![Page 5: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/5.jpg)
[Diagram: Events → Message Queue → Stream Processing Layer → State / Database → Happy Users]
![Page 6: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/6.jpg)
Spark + HDFS Streaming
[Diagram: Kafka → Spark Streaming → many small files (microbatches) → dedup/consolidate job → larger efficient files]
• High latency
• Big impedance mismatch between streaming systems and a file system designed for big blobs of data
![Page 7: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/7.jpg)
Cassandra?
• Ingest HUGE streams of events (IoT etc.)
  • C* is not efficient for writing raw events
• Real-time, low latency, and somewhat flexible queries
  • C* is real-time, but only low latency for simple lookups. Add Spark => much higher latency
• Flexible schemas and query patterns
  • C* only handles simple lookups
![Page 8: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/8.jpg)
Introducing FiloDB
A distributed, columnar time-series/event database.
Built for streaming.
http://www.github.com/filodb/FiloDB
![Page 9: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/9.jpg)
Events
Message Queue
Spark Streaming
Short term storage, K-V
Adhoc, SQL, ML
Cassandra
FiloDB: Events, ad-hoc, batch
Spark
Dashboards, maps
![Page 10: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/10.jpg)
100% Reactive
• Scala
• Akka Cluster
• Spark
• Monix / Reactive Streams
• Typesafe Config for all configuration
• Scodec, Ficus, Enumeratum, Scalactic, etc.
• Even most of the performance critical parts are written in Scala :)
![Page 11: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/11.jpg)
Scala, Akka, and Spark for Database
![Page 12: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/12.jpg)
Why use Scala and Akka?
• Akka Cluster!
• Just the right abstractions - streams, futures, Akka, type safety….
• Failure handling and supervision are critical for databases
• All the pattern matching and immutable goodness :)
![Page 13: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/13.jpg)
Scala Big Data Projects
• Spark
• GeoMesa
• Khronus - Akka time-series DB
• Sirius - Akka distributed KV Store
• FiloDB!
![Page 14: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/14.jpg)
Actors vs Futures vs Observables
![Page 15: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/15.jpg)
One FiloDB Node
NodeCoordinatorActor (NCA)
DatasetCoordinatorActor (DsCA)
DatasetCoordinatorActor (DsCA)
Active MemTable
Flushing MemTable
Reprojector
ColumnStore
Data, commands
![Page 16: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/16.jpg)
Akka vs Futures
NodeCoordinatorActor (NCA)
DatasetCoordinatorActor (DsCA)
DatasetCoordinatorActor (DsCA)
Active MemTable
Flushing MemTable
Reprojector
ColumnStore
Data, commands
Akka - control flow
Core I/O - Futures/Observables
![Page 17: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/17.jpg)
Akka vs Futures
• Akka Actors:
  • External FiloDB node API (remote + cluster)
  • Async messaging with clients
  • Cluster/distributed state management
• Futures and Observables:
  • Core I/O
  • Columnar data processing / ingestion
  • Type-safe processing stages
![Page 18: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/18.jpg)
Futures for Single Actions

    /**
     * Clears all data from the column store for that given projection, for all versions.
     * More like a truncation, not a drop.
     * NOTE: please make sure there are no reprojections or writes going on before calling this
     */
    def clearProjectionData(projection: Projection): Future[Response]

    /**
     * Completely and permanently drops the dataset from the column store.
     * @param dataset the DatasetRef for the dataset to drop.
     */
    def dropDataset(dataset: DatasetRef): Future[Response]

    /**
     * Appends the ChunkSets and incremental indices in the segment to the column store.
     * @param segment the ChunkSetSegment to write / merge to the columnar store
     * @param version the version # to write the segment to
     * @return Success. Future.failure(exception) otherwise.
     */
    def appendSegment(projection: RichProjection,
                      segment: ChunkSetSegment,
                      version: Int): Future[Response]
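Because each of these operations returns a single `Future[Response]`, callers can sequence them with a for-comprehension. The following is a minimal sketch using only `scala.concurrent`; `ToyColumnStore`, its simplified method signatures, and the `Response` cases here are stand-ins invented for illustration, not FiloDB's actual types.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object FutureActionsSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Hypothetical response type, standing in for the Response in the signatures above.
  sealed trait Response
  case object Success extends Response
  case object NotApplied extends Response

  // Hypothetical in-memory store; each action is one asynchronous step.
  final class ToyColumnStore {
    private val segments = scala.collection.concurrent.TrieMap.empty[String, Vector[Int]]

    def appendSegment(dataset: String, chunk: Vector[Int]): Future[Response] = Future {
      if (chunk.isEmpty) NotApplied
      else {
        segments.put(dataset, segments.getOrElse(dataset, Vector.empty) ++ chunk)
        Success
      }
    }

    def dropDataset(dataset: String): Future[Response] = Future {
      if (segments.remove(dataset).isDefined) Success else NotApplied
    }
  }

  // Two single actions sequenced: the drop only starts after the append completes.
  def appendThenDrop(store: ToyColumnStore): Future[(Response, Response)] =
    for {
      appended <- store.appendSegment("events", Vector(1, 2, 3))
      dropped  <- store.dropDataset("events")
    } yield (appended, dropped)

  // Blocking here is only for demonstration; production code would keep composing Futures.
  def run(): (Response, Response) =
    Await.result(appendThenDrop(new ToyColumnStore), 5.seconds)
}
```

A failed step surfaces as a failed `Future`, so the caller handles errors once at the end of the chain rather than after every call.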
![Page 19: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/19.jpg)
Monix / Reactive Streams
• http://monix.io
• “observable sequences that are exposed as asynchronous streams, expanding on the observer pattern, strongly inspired by ReactiveX and by Scalaz, but designed from the ground up for back-pressure and made to cleanly interact with Scala’s standard library, compatible out-of-the-box with the Reactive Streams protocol”
• Much better than Future[Iterator[_]]
![Page 20: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/20.jpg)
Monix / Reactive Streams

    def readChunks(projection: RichProjection,
                   columns: Seq[Column],
                   version: Int,
                   partMethod: PartitionScanMethod,
                   chunkMethod: ChunkScanMethod = AllChunkScan): Observable[ChunkSetReader] = {
      scanPartitions(projection, version, partMethod)
        // Partitions to pipeline of single chunks
        .flatMap { partIndex =>
          stats.incrReadPartitions(1)
          readPartitionChunks(projection.datasetRef, version, columns, partIndex, chunkMethod)
        // Collate single chunks to ChunkSetReaders
        }.scan(new ChunkSetReaderAggregator(columns, stats)) { _ add _ }
        .collect { case agg: ChunkSetReaderAggregator if agg.canEmit => agg.emit() }
    }
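The `scan` / `collect` pair in `readChunks` follows a general pattern: fold incoming pieces into an accumulator, then emit only when the accumulator is complete. The sketch below shows the same shape with a plain `Iterator.scanLeft` instead of a Monix `Observable`, so it runs on the standard library alone. `Aggregator` and its fixed group size are hypothetical and much simpler than `ChunkSetReaderAggregator`.

```scala
object ScanCollectSketch {
  // Hypothetical aggregator: buffers items and is ready to emit once `size` items arrive.
  final case class Aggregator(size: Int, buffer: Vector[Int] = Vector.empty) {
    // If the previous state was already complete (and emitted), start a fresh buffer.
    def add(item: Int): Aggregator =
      if (canEmit) copy(buffer = Vector(item)) else copy(buffer = buffer :+ item)
    def canEmit: Boolean = buffer.size == size
    def emit: Vector[Int] = buffer
  }

  // Collate single items into fixed-size groups; an incomplete trailing group is dropped.
  def collate(items: Iterator[Int], size: Int): Iterator[Vector[Int]] =
    items
      .scanLeft(Aggregator(size)) { (agg, item) => agg.add(item) }
      .collect { case agg if agg.canEmit => agg.emit }
}
```

For example, `ScanCollectSketch.collate(Iterator(1, 2, 3, 4, 5), 2)` yields the groups `Vector(1, 2)` and `Vector(3, 4)`; the trailing `5` never completes a group.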
![Page 21: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/21.jpg)
Functional Reactive Stream Processing
• Ingest stream merged with flush commands
• Built in async/parallel tasks via mapAsync
• Notify on end of stream, errors

    val combinedStream = Observable.merge(stream.map(SomeData), flushStream)
    combinedStream.map {
      case SomeData(records) =>
        shard.ingest(records)
        None
      case FlushCommand(group) =>
        shard.switchGroupBuffers(group)
        Some(FlushGroup(shard.shardNum, group, shard.latestOffset))
    }.collect { case Some(flushGroup) => flushGroup }
     .mapAsync(numParallelFlushes)(shard.createFlushTask _)
     .foreach { x => }
     .recover { case ex: Exception => errHandler(ex) }
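The key idea above is that data and control commands share one ordered stream, so a flush always sees exactly the records ingested before it. A rough sketch of that idea, using a standard-library `Iterator` in place of a Monix `Observable` and leaving out the asynchronous flush step: `ToyShard`, `switchBuffers`, and this simplified `FlushGroup` are hypothetical.

```scala
object IngestFlushSketch {
  // Hypothetical event types mirroring SomeData / FlushCommand on the slide.
  sealed trait Event
  final case class SomeData(records: Seq[String]) extends Event
  final case class FlushCommand(group: Int) extends Event

  // Simplified flush unit: the group number plus the records handed off for flushing.
  final case class FlushGroup(group: Int, records: Vector[String])

  // Hypothetical shard: buffers records until a flush command hands them off.
  final class ToyShard {
    private var buffer = Vector.empty[String]
    def ingest(records: Seq[String]): Unit = buffer ++= records
    def switchBuffers(): Vector[String] = { val out = buffer; buffer = Vector.empty; out }
  }

  // Data events only mutate the shard (None); flush events produce work (Some).
  def flushes(events: Iterator[Event], shard: ToyShard): Iterator[FlushGroup] =
    events.map {
      case SomeData(records) =>
        shard.ingest(records)
        None
      case FlushCommand(group) =>
        Some(FlushGroup(group, shard.switchBuffers()))
    }.collect { case Some(flushGroup) => flushGroup }
}
```

Because the iterator processes events one at a time, the shard's mutable buffer needs no locking in this sketch.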
![Page 22: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/22.jpg)
Akka Cluster and Spark
![Page 23: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/23.jpg)
Spark/Akka Cluster Setup
Driver
NodeClusterActor
Client
Executor
NCA
DsCA1 DsCA2
Executor
NCA
DsCA1 DsCA2
![Page 24: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/24.jpg)
Adding one executor
Driver
NodeClusterActor
Client
executor1
NCA
DsCA1 DsCA2
State: Executors -> (executor1)
MemberUp
ActorSelection ActorRef
![Page 25: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/25.jpg)
Adding second executor
Driver
NodeClusterActor
Client
executor1
NCA
DsCA1 DsCA2
State: Executors -> (executor1, executor2)
MemberUp
ActorSelection ActorRef
executor2
NCA
DsCA1 DsCA2
![Page 26: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/26.jpg)
Sending a command
Driver
NodeClusterActor
Client
Executor
NCA
DsCA1 DsCA2
Executor
NCA
DsCA1 DsCA2
Flush()
![Page 27: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/27.jpg)
Yes, Akka in Spark
• Columnar ingestion is stateful - need stickiness of state. This is inherently difficult in Spark.
• Akka (cluster) gives us a separate, asynchronous control channel to talk to FiloDB ingestors
• Spark only gives data flow primitives, not async messaging
• We need to route incoming records to the correct ingestion node. Sorting data is inefficient and forces all nodes to wait for sorting to be done.
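Routing by partition key, rather than sorting, can be sketched as a lookup in a shard-to-node map followed by a local group-by. This is only an illustration of the idea: `PartitionMap`, the hash-modulo shard assignment, and the node names are assumptions, not FiloDB's actual partition map.

```scala
object PartitionRoutingSketch {
  // Hypothetical partition map: shard number -> node address, as a cluster actor might publish it.
  final case class PartitionMap(nodes: Vector[String]) {
    def numShards: Int = nodes.size
    // floorMod keeps the shard non-negative even when hashCode is negative.
    def shardFor(partitionKey: String): Int = Math.floorMod(partitionKey.hashCode, numShards)
    def nodeFor(partitionKey: String): String = nodes(shardFor(partitionKey))
  }

  // Group a batch by destination node so each node receives one message, with no global sort.
  def route(records: Seq[(String, Double)], map: PartitionMap): Map[String, Seq[(String, Double)]] =
    records.groupBy { case (key, _) => map.nodeFor(key) }
}
```

Each sender only needs the current map to decide where a record goes, so no node waits on a cluster-wide sort.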
![Page 28: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/28.jpg)
Data Ingestion Setup
Executor
NCA
DsCA1 DsCA2
task0 task1
Row Source Actor
Row Source Actor
Executor
NCA
DsCA1 DsCA2
task0 task1
Row Source Actor
Row Source Actor
Node Cluster Actor
Partition Map
![Page 29: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/29.jpg)
FiloDB separate nodes
FiloDB Node
FiloDB Node
Executor
NCA
DsCA1 DsCA2
task0 task1
Row Source Actor
Row Source Actor
Executor
NCA
DsCA1 DsCA2
task0 task1
Row Source Actor
Row Source Actor
Node Cluster Actor
Partition Map
![Page 30: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/30.jpg)
Testing Akka Cluster
• MultiNodeSpec / sbt-multi-jvm
• NodeClusterSpec
• Tests joining of different cluster nodes and partition map updates
• Is partition map updated properly if a cluster node goes down — inject network failures
• Lessons
![Page 31: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/31.jpg)
Kamon Tracing
• http://kamon.io
• One trace can encapsulate multiple Future steps all executing on different threads
• Tunable tracing levels
• Summary stats and histograms for segments
• Super useful for production debugging of reactive stack
![Page 32: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/32.jpg)
Kamon Tracing

    def appendSegment(projection: RichProjection,
                      segment: ChunkSetSegment,
                      version: Int): Future[Response] = Tracer.withNewContext("append-segment") {
      val ctx = Tracer.currentContext
      stats.segmentAppend()
      if (segment.chunkSets.isEmpty) {
        stats.segmentEmpty()
        return(Future.successful(NotApplied))
      }
      for { writeChunksResp <- writeChunks(projection.datasetRef, version, segment, ctx)
            writeIndexResp  <- writeIndices(projection, version, segment, ctx) if writeChunksResp == Success }
      yield {
        ctx.finish()
        writeIndexResp
      }
    }

    private def writeChunks(dataset: DatasetRef,
                            version: Int,
                            segment: ChunkSetSegment,
                            ctx: TraceContext): Future[Response] = {
      asyncSubtrace(ctx, "write-chunks", "ingestion") {
        val binPartition = segment.binaryPartition
        val segmentId = segment.segmentId
        val chunkTable = getOrCreateChunkTable(dataset)
        Future.traverse(segment.chunkSets) { chunkSet =>
          chunkTable.writeChunks(binPartition, version, segmentId, chunkSet.info.id, chunkSet.chunks, stats)
        }.map { responses => responses.head }
      }
    }
![Page 33: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/33.jpg)
Kamon Metrics
• Uses HDRHistogram for much finer and more accurate buckets
• Built-in metrics for Akka actors, Spray, Akka-Http, Play, etc. etc.
KAMON trace name=append-segment n=2863 min=765952 p50=2113536 p90=3211264 p95=3981312 p99=9895936 p999=16121856 max=19529728
KAMON trace-segment name=write-chunks n=2864 min=436224 p50=1597440 p90=2637824 p95=3424256 p99=9109504 p999=15335424 max=18874368
KAMON trace-segment name=write-index n=2863 min=278528 p50=432128 p90=544768 p95=598016 p99=888832 p999=2260992 max=8355840
![Page 34: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/34.jpg)
Validation: Scalactic

    private def getColumnsFromNames(allColumns: Seq[Column],
                                    columnNames: Seq[String]): Seq[Column] Or BadSchema = {
      if (columnNames.isEmpty) {
        Good(allColumns)
      } else {
        val columnMap = allColumns.map { c => c.name -> c }.toMap
        val missing = columnNames.toSet -- columnMap.keySet
        if (missing.nonEmpty) { Bad(MissingColumnNames(missing.toSeq, "projection")) }
        else                  { Good(columnNames.map(columnMap)) }
      }
    }

    for { computedColumns <- getComputedColumns(dataset.name, allColIds, columns)
          dataColumns <- getColumnsFromNames(columns, normProjection.columns)
          richColumns = dataColumns ++ computedColumns
          // scalac has problems dealing with (a, b, c) <- getColIndicesAndType... apparently
          segStuff <- getColIndicesAndType(richColumns, Seq(normProjection.segmentColId), "segment")
          keyStuff <- getColIndicesAndType(richColumns, normProjection.keyColIds, "row")
          partStuff <- getColIndicesAndType(richColumns, dataset.partitionColumns, "partition") }
    yield {
• Notice how multiple validations compose!
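The composition can be tried without any dependency by swapping Scalactic's `Or` for the standard library's `Either`, which is an assumption made here purely to keep the sketch self-contained. `Column`, `columnsFromNames`, and `resolve` are hypothetical simplifications of the code above.

```scala
object ValidationSketch {
  // Hypothetical error type, playing the role of BadSchema on the slide.
  sealed trait BadSchema
  final case class MissingColumnNames(missing: Seq[String], columnType: String) extends BadSchema

  final case class Column(name: String, colType: String)

  // Resolves names to columns, or reports which names were not found.
  def columnsFromNames(all: Seq[Column], names: Seq[String], kind: String): Either[BadSchema, Seq[Column]] = {
    val byName = all.map(c => c.name -> c).toMap
    val missing = names.filterNot(byName.contains)
    if (missing.nonEmpty) Left(MissingColumnNames(missing, kind)) else Right(names.map(byName))
  }

  // Each step runs only if the previous one succeeded; the first Left short-circuits the rest.
  def resolve(all: Seq[Column], keys: Seq[String], parts: Seq[String]): Either[BadSchema, (Seq[Column], Seq[Column])] =
    for {
      keyCols  <- columnsFromNames(all, keys, "row")
      partCols <- columnsFromNames(all, parts, "partition")
    } yield (keyCols, partCols)
}
```

The caller gets either the fully validated result or the first error as a value, with no exceptions and no nested `if` checks.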
![Page 35: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/35.jpg)
Machine-Speed Scala
![Page 36: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/36.jpg)
How do you go REALLY fast?
• Don’t serialize
• Don’t allocate
• Don’t copy
![Page 37: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/37.jpg)
Filo fast
• Filo binary vectors - 2 billion records/sec
• Spark InMemoryColumnStore - 125 million records/sec
• Spark CassandraColumnStore - 25 million records/sec
![Page 38: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/38.jpg)
Filo: High Performance Binary Vectors
• Designed for NoSQL, not a file format
• random or linear access
• on or off heap
• missing value support
• Scala only, but cross-platform support possible
http://github.com/velvia/filo is a binary data vector library designed for extreme read performance with minimal deserialization costs.
![Page 39: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/39.jpg)
Billions of Ops / Sec
• JMH benchmark: 0.5ns per FiloVector element access / add
• 2 Billion adds per second - single threaded
• Who said Scala cannot be fast?
• Spark API (row-based) limits performance significantly

    val randomInts = (0 until numValues).map(i => util.Random.nextInt)
    val randomIntsAray = randomInts.toArray
    val filoBuffer = VectorBuilder(randomInts).toFiloBuffer
    val sc = FiloVector[Int](filoBuffer)

    @Benchmark
    @BenchmarkMode(Array(Mode.AverageTime))
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    def sumAllIntsFiloApply(): Int = {
      var total = 0
      for { i <- 0 until numValues optimized } {
        total += sc(i)
      }
      total
    }
![Page 40: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/40.jpg)
JVM Inlining
• Very small methods can be inlined by the JVM
• final def avoids virtual method dispatch.
• Thus methods in traits, abstract classes not inlinable
0.5ns/read is achieved through a stack of very small methods:

    val base = baseReader.readInt(0)
    final def apply(i: Int): Int = base + dataReader.read(i)

    case (32, _) => new TypedBufferReader[Int] {
      final def read(i: Int): Int = reader.readInt(i)
    }

    final def readInt(i: Int): Int = unsafe.getInt(byteArray, (offset + i * 4).toLong)
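A runnable approximation of that layering, under stated assumptions: `java.nio.ByteBuffer` stands in for `Unsafe`, values are assumed to be stored as deltas from a base kept in slot 0, and `IntReader`, `DeltaIntVector`, and `build` are hypothetical names rather than Filo's classes. It illustrates the structure only and makes no claim about matching the 0.5ns figure.

```scala
import java.nio.ByteBuffer

object SmallMethodsSketch {
  // Lowest layer: raw 4-byte reads at a computed offset (ByteBuffer stands in for Unsafe here).
  final class IntReader(buf: ByteBuffer, offset: Int) {
    final def readInt(i: Int): Int = buf.getInt(offset + i * 4)
  }

  // Top layer: element i is the base (slot 0) plus the delta stored in slot i + 1.
  final class DeltaIntVector(reader: IntReader, val length: Int) {
    private val base = reader.readInt(0)
    final def apply(i: Int): Int = base + reader.readInt(i + 1)
  }

  // Hypothetical builder: writes the base followed by (value - base) for each element.
  def build(values: Array[Int]): DeltaIntVector = {
    val base = if (values.isEmpty) 0 else values.min
    val buf = ByteBuffer.allocate(4 * (values.length + 1))
    buf.putInt(0, base)
    var i = 0
    while (i < values.length) { buf.putInt(4 * (i + 1), values(i) - base); i += 1 }
    new DeltaIntVector(new IntReader(buf, 0), values.length)
  }

  // A while loop keeps the hot path free of Range objects and boxing.
  def sum(v: DeltaIntVector): Long = {
    var total = 0L
    var i = 0
    while (i < v.length) { total += v(i); i += 1 }
    total
  }
}
```

Every method on the read path is a one-liner on a final class, which is the shape the slide describes as friendly to JIT inlining.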
![Page 41: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/41.jpg)
BinaryRecord
• Tough problem: FiloDB must handle many different datasets, each with different schemas
• Cannot rely on static types and standard serialization mechanisms - case classes, Protobuf, etc.
• Serialization very costly, especially strings
• Solution: BinaryRecord
![Page 42: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/42.jpg)
BinaryRecord II
• BinaryRecord is a binary (i.e. transport-ready) record class that supports any schema or mix of column types
• Values can be extracted or written with no serialization cost
• UTF8-encoded string class
• String compare as fast as native Java strings
• Immutable API once built
![Page 43: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/43.jpg)
Use Case: Sorting
• Regular sorting: deserialize record, create sort key, compare sort key
• BinaryRecord sorting: binary compare fields directly — no deserialization, no object allocations
![Page 44: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/44.jpg)
Regular Sorting
[Diagram: each Protobuf/Avro etc. record → deserialized instance → sort key; the two sort keys are then compared (Cmp)]
![Page 45: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/45.jpg)
BinaryRecord Sorting
• BinaryRecord sorting: binary compare fields directly — no deserialization, no object allocations
name: Str age: Int lastTimestamp: Long group: Str
name: Str age: Int lastTimestamp: Long group: Str
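To make the contrast concrete, here is a small sketch of comparing records in place. It assumes a hypothetical fixed layout holding only the `age` and `lastTimestamp` fields from the diagram (the string fields are omitted), and it uses `java.nio.ByteBuffer` rather than FiloDB's `BinaryRecord` class.

```scala
import java.nio.ByteBuffer

object BinarySortSketch {
  // Hypothetical fixed layout: [age: Int @ 0][lastTimestamp: Long @ 4], 12 bytes per record.
  val AgeOffset = 0
  val TimestampOffset = 4
  val RecordSize = 12

  def encode(age: Int, lastTimestamp: Long): ByteBuffer = {
    val buf = ByteBuffer.allocate(RecordSize)
    buf.putInt(AgeOffset, age)
    buf.putLong(TimestampOffset, lastTimestamp)
    buf
  }

  // Reads primitives at known offsets; no record object or sort key is allocated per compare.
  val byAgeThenTimestamp: Ordering[ByteBuffer] = new Ordering[ByteBuffer] {
    def compare(a: ByteBuffer, b: ByteBuffer): Int = {
      val byAge = Integer.compare(a.getInt(AgeOffset), b.getInt(AgeOffset))
      if (byAge != 0) byAge
      else java.lang.Long.compare(a.getLong(TimestampOffset), b.getLong(TimestampOffset))
    }
  }

  def sortRecords(records: Seq[ByteBuffer]): Seq[ByteBuffer] =
    records.sorted(byAgeThenTimestamp)
}
```

The comparator touches only the bytes of the fields it sorts on, which is the property the slide attributes to BinaryRecord sorting.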
![Page 46: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/46.jpg)
SBT-JMH
• Super useful tool to leverage JMH, the best micro benchmarking harness
• JMH is written by the JDK folks
![Page 47: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/47.jpg)
In Summary
• Scala, Akka, reactive can give you both awesome abstractions AND performance
• Use Akka for distribution, state, protocols
• Use reactive/Monix for functional, concurrent stream processing
• Build (or use FiloDB’s) fast low-level abstractions with good APIs
![Page 48: 2017 High Performance Database with Scala, Akka, Spark](https://reader031.vdocument.in/reader031/viewer/2022020917/5a6487a87f8b9a2c568b5439/html5/thumbnails/48.jpg)
Thank you Scala OSS!