In-memory data and compute on top of Hadoop



DESCRIPTION

Speakers: Anthony Baker and Jags Ramnarayan. Hadoop gives us dramatic volume scalability at a low price, but core Hadoop is designed for sequential access – write once, read many times – making it impossible to use Hadoop from a real-time/online application. Add a distributed in-memory tier in front and you get the best of both worlds: very high speed, high concurrency, and the ability to scale to very large volumes. We present the seamless integration of in-memory data grids with Hadoop to achieve interesting new design patterns: ingesting raw or processed data into Hadoop; random reads and writes on operational data in memory, or on massive historical data in Hadoop with O(1) lookup times; zero-ETL MapReduce processing; deep-scale SQL processing on data in Hadoop; and the ability to easily push analytic models from Hadoop into memory. We introduce the ideas and code samples through Pivotal's in-memory real-time products and the Hadoop platform.

TRANSCRIPT

Page 1: In-memory data and compute on top of Hadoop

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.

In-memory data and compute on top of Hadoop

Jags Ramnarayan – Chief Architect, Fast Data, Pivotal Anthony Baker – Architect, Fast Data, Pivotal

Page 2: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 3: In-memory data and compute on top of Hadoop

“It is raining databases in the cloud” (The 451 Group)
•  The next-gen transactional DB is memory-based, distributed, elastic, HA, cloud-ready…
   –  In-memory data grids (IMDGs), NoSQL, caching
   –  Pivotal GemFire, Oracle Coherence, Redis, Cassandra, …
•  The next-gen OLAP DB is centered around Hadoop
   –  Driver: they say it is “volume, velocity, variety”
   –  Or is it just cost/TB?

Page 4: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 5: In-memory data and compute on top of Hadoop

IMDG basic concepts
•  Distributed memory-oriented store
   –  KV/objects or SQL
   –  Queryable, indexable, and transactional
•  Multiple storage models
   –  Replication and partitioning in memory
   –  With synchronous copies in the cluster
   –  Overflow to disk and/or an RDBMS
•  Parallelize Java app logic
•  Multiple failure-detection schemes
•  Dynamic membership (elastic)
•  Vendors differentiate on SQL support, WAN, events, etc.

[Diagram: a replicated region (synchronous replication for slow-changing data) and a partitioned region with a redundant copy (for large or highly transactional data) handle thousands of concurrent client connections at low latency.]

Page 6: In-memory data and compute on top of Hadoop


Key IMDG pattern – distributed caching
•  Designed to work with existing RDBMSs
   –  Read-through: fetch from the DB on a cache miss
   –  Write-through: reflect in the cache only if the DB write succeeds
   –  Write-behind: reliable, in-order queue with batched writes to the DB
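The read-through and write-behind patterns above can be sketched in plain Java. This is a minimal illustration, not a GemFire API: the class and method names are invented, and a HashMap stands in for the backing RDBMS.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch of the caching patterns: read-through on a miss,
// write-behind via an in-order queue flushed to the "DB" in batches.
public class CachePatterns {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> db;            // stands in for the RDBMS
    private final Queue<String[]> writeQueue = new ArrayDeque<>();
    private final int batchSize;

    public CachePatterns(Map<String, String> db, int batchSize) {
        this.db = db;
        this.batchSize = batchSize;
    }

    // Read-through: fetch from the DB on a cache miss and populate the cache.
    public String get(String key) {
        String v = cache.get(key);
        if (v == null) {
            v = db.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    // Write-behind: update the cache immediately, enqueue the DB write,
    // and flush to the DB in order once a batch has accumulated.
    public void put(String key, String value) {
        cache.put(key, value);
        writeQueue.add(new String[] { key, value });
        if (writeQueue.size() >= batchSize) flush();
    }

    public void flush() {
        String[] entry;
        while ((entry = writeQueue.poll()) != null) {
            db.put(entry[0], entry[1]);              // one batched DB write
        }
    }

    public int pendingWrites() { return writeQueue.size(); }
}
```

Note how write-behind decouples the client's write latency from the DB's: the put returns as soon as the cache and queue are updated, which is exactly why the queue may need to be persistent for reliability.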

Page 7: In-memory data and compute on top of Hadoop

Traditional RDB integration can be challenging

Synchronous “write-through” (memory tables → DB writer → RDB):
•  Single point of bottleneck and failure
•  Not an option for write-heavy workloads
•  Complex two-phase commit protocol; parallel recovery is difficult

Asynchronous “write-behind” (updates are queued, and a DB synchronizer writes them to the RDB in batches):
•  Cannot sustain high write rates
•  The queue may have to be persistent
•  Parallel recovery is difficult

Page 8: In-memory data and compute on top of Hadoop

Some IMDGs and NoSQL stores offer “shared-nothing persistence”
•  Append-only operation logs
•  Fully parallel; zero disk seeks
•  But cluster restart requires a log scan
•  Very large volumes pose challenges

[Diagram: memory tables stream records into append-only operation logs through OS buffers; a log compressor compacts the logs in the background.]
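The append-only log and its restart cost can be sketched as follows. This is illustrative Java under the assumption of an in-memory list standing in for the on-disk log; a real store writes to disk and compacts in the background.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of shared-nothing persistence via an append-only operation log.
public class OpLog {
    // Each record is appended, never rewritten in place: zero seeks on write.
    private final List<String[]> log = new ArrayList<>();

    public void append(String op, String key, String value) {
        log.add(new String[] { op, key, value });
    }

    // Cluster restart: rebuild the in-memory table by scanning the whole log.
    // This full scan is exactly the restart cost the slide calls out.
    public Map<String, String> replay() {
        Map<String, String> table = new HashMap<>();
        for (String[] rec : log) {
            if (rec[0].equals("PUT")) table.put(rec[1], rec[2]);
            else if (rec[0].equals("DELETE")) table.remove(rec[1]);
        }
        return table;
    }

    // Compaction (the "log compressor"): keep only the latest state,
    // shrinking the log that a restart must scan.
    public void compact() {
        Map<String, String> latest = replay();
        log.clear();
        for (Map.Entry<String, String> e : latest.entrySet())
            append("PUT", e.getKey(), e.getValue());
    }

    public int size() { return log.size(); }
}
```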

Page 9: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 10: In-memory data and compute on top of Hadoop

Hadoop core (HDFS) for scalable, parallel storage
•  Maturing, and will be ubiquitous
•  Handles very large data sets on commodity hardware
•  Handles failures well
•  Simple coherency model

Page 11: In-memory data and compute on top of Hadoop

Hadoop design center – batch and sequential
•  64 MB immutable blocks
•  For random reads, you have to sequentially walk through records each time
•  Write-once, read-many design
•  The NameNode can be a contention point
•  Slow failure detection

Page 12: In-memory data and compute on top of Hadoop

Hadoop strengths
•  Massive volumes (TB to PB)
•  HA, compression
•  An ever-growing and maturing ecosystem for parallel compute and analytics
•  Storage systems like Isilon now offer an HDFS interface
•  Optimized for virtual machines

Page 13: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 14: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  Data in many shapes – support multiple data models
•  A main-memory-based, distributed, low-latency data store for big data
•  Operational data is the focus; it is (mostly) in memory
•  All data and history in HDFS

Page 15: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  Replication or partitioning
•  Storage model: in-memory, in-memory with local disk, or in-memory with HDFS persistence

Page 16: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  SQL engine – designed for online/OLTP workloads and transactions
•  IMDG caching features – read-through, write-behind, etc.

Page 17: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  Tight HDFS integration – streaming and read/write cases
•  Analytics can run on HDFS without going through the in-memory tier – sequential scans or incremental processing
•  With parallel ingestion, you get near-real-time visibility of data for deep analytics

Page 18: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  The MapReduce ‘reduce’ phase can directly emit results to the in-memory tier
•  A closed loop between real time and analytics

Page 19: In-memory data and compute on top of Hadoop

GemFire XD – a Pivotal HD service

GemFire XD combines SQLFire and GemFire as a Pivotal HD service:
•  GemFire – clustering, in-memory storage, HA, replication, WAN, events, distributed queues…
•  SQLFire – SQL engine: cost-based optimizer, in-memory indexing, distributed transactions, RDB integration…
•  Pivotal HD – integrated install and config; command center for monitoring; optimizations to Hadoop

Working set in memory and geo-replicated; history and time series in HDFS. Data is accessible as SQL, objects, or JSON.

Page 20: In-memory data and compute on top of Hadoop

The real-time latency spectrum
•  Machine latency – milliseconds
•  Human interactions – seconds
•  Interactive reports – seconds to minutes
•  Batch processing – minutes to hours

GemFire XD and online/OLTP/operational DBs serve the low-latency end; analytics and data warehousing (Pivotal HD, HAWQ) serve the batch end.

Page 21: In-memory data and compute on top of Hadoop

Real time on top of Hadoop – who else?

Many more…. Most focused on interactive queries for analytics

Page 22: In-memory data and compute on top of Hadoop

Design patterns
•  Streaming ingest – consume unbounded event streams
   –  Write fast into memory; stream all writes to HDFS for batch analytics
      •  e.g., maintain the latest price for each security in memory; keep the time series in HDFS
      •  Continuously ingest click streams, audit trails, or interaction data
   –  Trap interactions or OLTP transactions, do in-line stream processing (actionable insights), and write results or raw state into HDFS

Page 23: In-memory data and compute on top of Hadoop

Design patterns
•  High-performance operational database
   –  Keep operational data in memory; history in HDFS is randomly accessible
      •  e.g., keep the last month of trades in memory, with all history accessible at some cost
   –  Take analytic output from Hadoop/SQL analytics and make it visible to online apps

Page 24: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 25: In-memory data and compute on top of Hadoop

Agenda •  How do you use this? SQL syntax and demo

Page 26: In-memory data and compute on top of Hadoop

In-Memory Partitioning & Replication

Page 27: In-memory data and compute on top of Hadoop

Explore features using a simple STAR schema

FLIGHTS
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEPART_TIME TIME, …
  PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER)

FLIGHTAVAILABILITY (1 – M with FLIGHTS)
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  FLIGHT_DATE DATE NOT NULL,
  ECONOMY_SEATS_TAKEN INTEGER, …
  PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE)
  FOREIGN KEY (FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS (FLIGHT_ID, SEGMENT_NUMBER)

FLIGHTHISTORY (1 – 1 with FLIGHTS)
  FLIGHT_ID CHAR(6), SEGMENT_NUMBER INTEGER, ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, DEST_AIRPORT CHAR(3), …

Several code/dimension tables:
  AIRLINES – airline information (very static)
  COUNTRIES – list of countries served by flights
  CITIES
  MAPS – photos of regions served

Assume thousands of FLIGHTS rows and millions of FLIGHTAVAILABILITY records.

Page 28: In-memory data and compute on top of Hadoop

Creating tables

CREATE TABLE AIRLINES (
  AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
  AIRLINE_FULL VARCHAR(24),
  BASIC_RATE DOUBLE PRECISION,
  DISTANCE_DISCOUNT DOUBLE PRECISION, …
);

[Diagram: the table is created across a cluster of GemFire XD (GF XD) members.]

Page 29: In-memory data and compute on top of Hadoop

Replicated tables

CREATE TABLE AIRLINES (
  AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
  AIRLINE_FULL VARCHAR(24),
  BASIC_RATE DOUBLE PRECISION,
  DISTANCE_DISCOUNT DOUBLE PRECISION, …
) REPLICATE;

[Diagram: each GF XD member holds a full copy of the replicated table.]

Design pattern: replicate reference tables in STAR schemas (they seldom change and are often referenced in queries).

Page 30: In-memory data and compute on top of Hadoop

Partitioned tables

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL)
PARTITION BY COLUMN (FLIGHT_ID);

[Diagram: each GF XD member holds one partition of the table, alongside the replicated tables.]

Design pattern: partition fact tables in STAR schemas for load balancing (they are large and write-heavy).
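How PARTITION BY COLUMN can route a row to a member may be sketched as a hash-bucket scheme. This is an assumption-laden illustration – the bucket count, member mapping, and class names are invented, not GemFire XD internals.

```java
// Illustrative sketch: hash the partitioning column into a fixed set of
// buckets, then spread buckets over members. (Not GemFire XD's actual code.)
public class PartitionRouter {
    private final int buckets;
    private final int members;

    public PartitionRouter(int buckets, int members) {
        this.buckets = buckets;
        this.members = members;
    }

    // The same FLIGHT_ID always hashes to the same bucket, so single-key
    // reads and writes touch exactly one partition.
    public int bucketFor(String flightId) {
        return Math.floorMod(flightId.hashCode(), buckets);
    }

    // Buckets map round-robin to members for load balancing; with
    // REDUNDANCY 1, a second member would also hold a backup of the bucket.
    public int memberFor(String flightId) {
        return bucketFor(flightId) % members;
    }
}
```

This deterministic routing is also what makes colocation work: two tables partitioned by the same column hash matching rows to the same bucket, and therefore the same member.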

Page 31: In-memory data and compute on top of Hadoop

Partitioned but highly available

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL)
PARTITION BY COLUMN (FLIGHT_ID)
REDUNDANCY 1;

[Diagram: each GF XD member now holds its primary partition plus a redundant copy of another member's partition.]

Design pattern: increase redundant copies for HA and to load-balance queries across replicas.

Page 32: In-memory data and compute on top of Hadoop

Colocation for related data

CREATE TABLE FLIGHTAVAILABILITY (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  …
PARTITION BY COLUMN (FLIGHT_ID)
COLOCATE WITH (FLIGHTS);

[Diagram: each FLIGHTAVAILABILITY partition is colocated on the member holding the matching FLIGHTS partition, along with its redundant copies.]

Design pattern: colocate related tables for maximum join performance.

Page 33: In-memory data and compute on top of Hadoop

Native disk-resident tables (operation logging)

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT;

The data dictionary is always persisted on each server. Online backup:

sqlf backup /export/fileServerDirectory/sqlfireBackupLocation

Page 34: In-memory data and compute on top of Hadoop

Demo environment
•  A SQL client connects via jdbc:sqlfire://localhost:1527
•  A virtual machine runs a GemFire XD locator and three GemFire XD servers
•  Pulse (monitoring)

Page 35: In-memory data and compute on top of Hadoop

Demo: replicated and partitioned tables

Page 36: In-memory data and compute on top of Hadoop

Agenda •  HDFS integration architecture and demo

Page 37: In-memory data and compute on top of Hadoop

Effortless HDFS integration
•  Options
   –  Fast streaming writes
   –  Random read/write
   –  With or without time series

Page 38: In-memory data and compute on top of Hadoop

Streaming all writes to HDFS

CREATE HDFSSTORE streamingstore
  NAMENODE hdfs://PHD1:8020
  DIR /stream-tables
  BATCHSIZE 10
  BATCHTIMEINTERVAL 2000
  QUEUEPERSISTENT true;

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT
HDFSSTORE streamingstore WRITEONLY;

Page 39: In-memory data and compute on top of Hadoop

Read and write to HDFS

CREATE HDFSSTORE RWStore
  NAMENODE hdfs://PHD1:8020
  DIR /indexed-tables
  BATCHSIZE 10
  BATCHTIMEINTERVAL 2000
  QUEUEPERSISTENT true;

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT
HDFSSTORE RWStore;

Page 40: In-memory data and compute on top of Hadoop

Write path – streaming to HDFS

[Diagram: a SQL client writes to the GemFire XD member hosting FLIGHTS bucket N; the write is replicated to the bucket's backup member and appended to each member's local append-only store; the embedded DFS client then flushes batches through the NameNode to the local DataNode under /GFXD/APP/FLIGHTS/BucketN.]

In-memory partitioned data is colocated with the Hadoop DataNode.

Page 41: In-memory data and compute on top of Hadoop

Directory structure in HDFS

Read/write table – each bucket directory holds files with bloom filter, index, and data sections:

/GFXD
  /APP.FLIGHTS
    /0
      0-1-XXX.hop (bloom, index, data)
      0-2-XXX.hop (bloom, index, data)
    /1
      1-1-XXX.hop (bloom, index, data)
      1-2-XXX.hop (bloom, index, data)

Write-only table – data-only files per bucket:

/GFXD
  /APP.FLIGHT_HISTORY
    /0
      0-1-XXX.shop
      0-2-XXX.shop
    /1
      1-1-XXX.shop
      1-2-XXX.shop

Time-stamped records allow incremental Map/Reduce jobs.

Page 42: In-memory data and compute on top of Hadoop

Read/write with compaction

[Diagram: the same write path as streaming to HDFS, but the flushed files are now sorted… and compacted.]

A log-structured merge tree (like HBase and Cassandra).

Page 43: In-memory data and compute on top of Hadoop

Read path for HDFS tables

[Diagram: a SQL client reads from the GemFire XD member hosting FLIGHTS bucket N; on a miss in the local store, the embedded DFS client checks the bloom filters and indexes of the HDFS files (held in a block cache) and fetches matching data blocks from the DataNode.]

Short-circuit read path for local blocks; the block cache avoids I/O for bloom and index lookups.
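Why a bloom filter helps the read path: a tiny in-memory bit set can answer "definitely not in this file" without touching disk. The sketch below is illustrative – it uses two ad-hoc hash functions, whereas real stores use tuned hash families and sized filters.

```java
import java.util.BitSet;

// Illustrative bloom filter: false positives possible, false negatives not.
public class BloomFilter {
    private final BitSet bits;
    private final int size;

    public BloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    private int h1(String key) { return Math.floorMod(key.hashCode(), size); }
    private int h2(String key) { return Math.floorMod(key.hashCode() * 31 + 17, size); }

    public void add(String key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // A "false" answer lets the read path skip the file entirely – no I/O.
    // A "true" answer means the file's index and data must still be checked,
    // because distinct keys can collide on the same bits.
    public boolean mightContain(String key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }
}
```

With one filter per HDFS file, a key lookup consults each file's filter first and only reads the few files that might contain the key – which is what keeps O(1)-style lookups feasible over many compacted files.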

Page 44: In-memory data and compute on top of Hadoop

Tiered compaction
•  Async writes allow lock-free sequential I/O… but more files mean slower reads
•  Compactions balance read and write throughput
   –  Minor compactions merge small files into bigger files
   –  Major compactions merge all files into one single file

[Diagram: level 0 holds many small bloom/index/data files in time order; levels 1 and 2 hold progressively fewer, larger files.]
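A minor compaction is essentially a k-way merge of sorted files in which the newest version of each key wins. Here is a self-contained sketch (sorted string lists stand in for the sorted files; not the product's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative minor compaction: merge sorted "key=value" files, newest
// file first, keeping only the latest version of each key (LSM-style).
public class Compactor {
    public static List<String> compact(List<List<String>> files) {
        // Heap entries are [key, value, fileIndex]; ordered by key, then by
        // file index so the newest file's entry surfaces first on ties.
        PriorityQueue<String[]> heap = new PriorityQueue<>((a, b) -> {
            int c = a[0].compareTo(b[0]);
            return c != 0 ? c : Integer.compare(Integer.parseInt(a[2]), Integer.parseInt(b[2]));
        });
        List<int[]> cursors = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) cursors.add(new int[] { 0 });
        for (int i = 0; i < files.size(); i++) push(heap, files, cursors, i);

        List<String> merged = new ArrayList<>();
        String lastKey = null;
        while (!heap.isEmpty()) {
            String[] top = heap.poll();
            if (!top[0].equals(lastKey)) {        // first occurrence = newest
                merged.add(top[0] + "=" + top[1]);
                lastKey = top[0];
            }
            push(heap, files, cursors, Integer.parseInt(top[2]));
        }
        return merged;
    }

    private static void push(PriorityQueue<String[]> heap, List<List<String>> files,
                             List<int[]> cursors, int file) {
        int pos = cursors.get(file)[0];
        if (pos < files.get(file).size()) {
            String[] kv = files.get(file).get(pos).split("=", 2);
            heap.add(new String[] { kv[0], kv[1], String.valueOf(file) });
            cursors.get(file)[0] = pos + 1;
        }
    }
}
```

Because every input file is already sorted, the merge is one sequential pass over each file – the same property that lets compaction proceed with lock-free sequential I/O.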

Page 45: In-memory data and compute on top of Hadoop

“Closed-loop” with analytics

[Diagram: Map/Reduce, Pivotal HAWQ, and Hive read the HDFS tier's files through an InputFormat, and write results back into GemFire XD through an OutputFormat – closing the loop between the in-memory tier and analytics.]

Page 46: In-memory data and compute on top of Hadoop

Demo environment with Pivotal HD
•  A SQL client connects via jdbc:sqlfire://localhost:1527
•  A virtual machine runs a GemFire XD locator, three GemFire XD servers, a Pivotal HD NameNode, and a Pivotal HD DataNode
•  Pulse (monitoring)

Page 47: In-memory data and compute on top of Hadoop

Demo: HDFS tables

Page 48: In-memory data and compute on top of Hadoop

Operational vs. historical data
•  Operational data is retained in memory for fast access
•  User-supplied criteria identify operational data
   –  Enforced on incoming updates or periodically
•  Query hints or connection properties control use of historical data

CREATE TABLE flights_history (…)
  PARTITION BY PRIMARY KEY
  EVICTION BY CRITERIA (LAST_MODIFIED_DURATION > 300000)
  EVICTION FREQUENCY 60 SECONDS
  HDFSSTORE (bar);

SELECT * FROM flights_history --PROPERTIES queryHDFS = true
WHERE orig_airport = 'PDX' AND miles > 1000
ORDER BY dest_airport
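The eviction-by-criteria semantics above can be modeled in a few lines of Java. This is a conceptual sketch, not product code: two HashMaps stand in for the memory and HDFS tiers, and the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch of EVICTION BY CRITERIA: rows whose last-modified age exceeds a
// threshold are dropped from memory but remain readable from the HDFS tier.
public class CriteriaEviction {
    static class Row {
        final String value; final long lastModified;
        Row(String value, long lastModified) { this.value = value; this.lastModified = lastModified; }
    }

    private final Map<String, Row> memory = new HashMap<>();
    private final Map<String, Row> hdfs = new HashMap<>();   // stands in for HDFSSTORE
    private final long maxAgeMillis;                         // LAST_MODIFIED_DURATION

    public CriteriaEviction(long maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; }

    public void put(String key, String value, long now) {
        Row r = new Row(value, now);
        memory.put(key, r);
        hdfs.put(key, r);            // all writes also stream to the HDFS tier
    }

    // Run periodically (EVICTION FREQUENCY): evict rows matching the criteria.
    public void evict(long now) {
        Iterator<Map.Entry<String, Row>> it = memory.entrySet().iterator();
        while (it.hasNext())
            if (now - it.next().getValue().lastModified > maxAgeMillis) it.remove();
    }

    // queryHDFS=false reads memory only; queryHDFS=true falls back to HDFS.
    public String get(String key, boolean queryHdfs) {
        Row r = memory.get(key);
        if (r == null && queryHdfs) r = hdfs.get(key);
        return r == null ? null : r.value;
    }
}
```

The key point the sketch makes: eviction changes where a row lives, not whether it exists – the queryHDFS hint decides whether a query is willing to pay the historical-tier cost.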

Page 49: In-memory data and compute on top of Hadoop

Agenda •  MapReduce integration and demo

Page 50: In-memory data and compute on top of Hadoop

Hadoop Map/Reduce
•  Map/Reduce is a framework for processing massive data sets in parallel
   –  The Mapper acts on local file splits to transform individual data elements
   –  The Reducer receives all values for a key and generates an aggregate result
   –  The Driver provides the job configuration
   –  InputFormat and OutputFormat define the data source and sink
•  Hadoop manages job execution

[Diagram: the InputFormat supplies local data to mappers on nodes 1–3; the mappers transform it; Hadoop sorts and shuffles the keys; reducers on nodes 1–3 generate aggregate results, which the OutputFormat writes.]

Page 51: In-memory data and compute on top of Hadoop

Map/Reduce with GemFire XD
•  Users can execute Hadoop Map/Reduce jobs against GemFire XD data using:
   –  EventInputFormat to read data from HDFS without impacting online availability or performance
   –  SqlfOutputFormat to write data into a SQL table for immediate use by online applications

[Diagram: a Hadoop mapper reads file splits via EventInputFormat (driven by the table DDL); the reducer writes through SqlfOutputFormat over jdbc:sqlfire://localhost:1527, issuing PUT INTO foo (…) VALUES (?, ?, …) against the GemFire XD table.]

Page 52: In-memory data and compute on top of Hadoop

Demo: Map/Reduce

Page 53: In-memory data and compute on top of Hadoop

Using the InputFormat – Mapper

// count each airport present in a FLIGHT_HISTORY row
public class SampleMapper extends MapReduceBase
    implements Mapper<Object, Row, Text, IntWritable> {

  public void map(Object key, Row row,
      OutputCollector<Text, IntWritable> output,
      Reporter reporter) throws IOException {
    try {
      IntWritable one = new IntWritable(1);
      ResultSet rs = row.getRowAsResultSet();
      String origAirport = rs.getString("ORIG_AIRPORT");
      String destAirport = rs.getString("DEST_AIRPORT");
      output.collect(new Text(origAirport), one);
      output.collect(new Text(destAirport), one);
    } catch (SQLException e) {
      …
    }
  }
}

Driver configuration:

JobConf conf = new JobConf(getConf());
conf.setJobName("Busy Airport Count");

conf.set(EventInputFormat.HOME_DIR, hdfsHomeDir);
conf.set(EventInputFormat.INPUT_TABLE, tableName);

conf.setInputFormat(EventInputFormat.class);
conf.setMapperClass(SampleMapper.class);
...

Page 54: In-memory data and compute on top of Hadoop

Use Spring Hadoop for job configuration

<beans:beans …>
  <job id="busyAirportsJob"
      libs="…"
      input-format="com.vmware.sqlfire.internal.engine.hadoop.mapreduce.EventInputFormat"
      output-path="${flights.intermediate.path}"
      mapper="demo.sqlf.mr2.BusyAirports.SampleMapper"
      combiner="org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer"
      reducer="org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer"/>

  <job id="topBusyAirportJob"
      libs="${LIB_DIR}/sqlfire-mapreduce-1.0-SNAPSHOT.jar"
      input-path="${flights.intermediate.path}"
      output-path="${flights.output.path}"
      mapper="demo.sqlf.mr2.TopBusyAirport.TopBusyAirportMapper"
      reducer="demo.sqlf.mr2.TopBusyAirport.TopBusyAirportReducer"
      number-reducers="1"/>
  …
</beans:beans>

Page 55: In-memory data and compute on top of Hadoop

Using the OutputFormat – Reducer

// find the max, aka the busiest airport
public class TopBusyAirportReducer extends MapReduceBase
    implements Reducer<Text, StringIntPair, Key, BusyAirportModel> {

  public void reduce(Text token, Iterator<StringIntPair> values,
      OutputCollector<Key, BusyAirportModel> output,
      Reporter reporter) throws IOException {

    String topAirport = null;
    int max = 0;

    while (values.hasNext()) {
      StringIntPair v = values.next();
      if (v.getSecond() > max) {
        max = v.getSecond();
        topAirport = v.getFirst();
      }
    }
    BusyAirportModel busy =
        new BusyAirportModel(topAirport, max);
    output.collect(null, busy);
  }
}

Driver configuration:

JobConf conf = new JobConf(getConf());
conf.setJobName("Top Busy Airport");

conf.set(SqlfOutputFormat.OUTPUT_URL,
    "jdbc:sqlfire://localhost:1527");
conf.set(SqlfOutputFormat.OUTPUT_SCHEMA, "APP");
conf.set(SqlfOutputFormat.OUTPUT_TABLE, "BUSY_AIRPORT");

conf.setReducerClass(TopBusyAirportReducer.class);
conf.setOutputKeyClass(Key.class);
conf.setOutputValueClass(BusyAirportModel.class);
conf.setOutputFormat(SqlfOutputFormat.class);
...

Page 56: In-memory data and compute on top of Hadoop

Where do the results go?
•  Reduced values are automatically inserted into the output table by matching column names

public class BusyAirportModel {
  private String airport;
  private int flights;

  public BusyAirportModel(String airport, int flights) {
    this.airport = airport;
    this.flights = flights;
  }

  public void setFlights(int idx, PreparedStatement ps)
      throws SQLException {
    ps.setInt(idx, flights);
  }

  public void setAirport(int idx, PreparedStatement ps)
      throws SQLException {
    ps.setString(idx, airport);
  }
}

Resulting statements:

PUT INTO BUSY_AIRPORT (flights, airport) VALUES (?, ?)
PUT INTO BUSY_AIRPORT (flights, airport) VALUES (?, ?)
…

Page 57: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 58: In-memory data and compute on top of Hadoop

Scaling application logic with parallel “data-aware procedures”

Page 59: In-memory data and compute on top of Hadoop

Why not Map/Reduce?

[Image: traditional Map/Reduce vs. parallel “data-aware” procedures. Source: UC Berkeley Spark project (just the image).]

Page 60: In-memory data and compute on top of Hadoop

Procedures – managed in Spring containers as beans
•  Java stored procedures may be created according to the SQL standard
•  SQLFire also supports the JDBC type Types.JAVA_OBJECT; a parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object

CREATE PROCEDURE getOverBookedFlights ()
LANGUAGE JAVA
PARAMETER STYLE JAVA
READS SQL DATA
DYNAMIC RESULT SETS 1
EXTERNAL NAME 'examples.OverBookedStatus.getOverBookedStatus';

Page 61: In-memory data and compute on top of Hadoop

Data-aware procedures
Parallelize the procedure and prune execution to the nodes holding the required data. Extend the procedure call with the following syntax:

CALL [PROCEDURE] procedure_name
  ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] } |
    { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

Hint the data the procedure depends on:

CALL getOverBookedFlights()
ON TABLE FLIGHTAVAILABILITY
WHERE FLIGHT_ID = 'AA1116';

If the table is partitioned by the columns in the WHERE clause, procedure execution is pruned to the nodes holding the data (the node with 'AA1116' in this case).

Page 62: In-memory data and compute on top of Hadoop

Parallelize the procedure, then aggregate (reduce)

Register a Java result processor (optional in some cases):

CALL [PROCEDURE] procedure_name
  ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] } |
    { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

[Diagram: the client's call fans out to fabric servers 1–3; partial results flow back through the result processor to the client.]

Page 63: In-memory data and compute on top of Hadoop

High-density storage in memory – off the Java heap

Page 64: In-memory data and compute on top of Hadoop

Off-heap to minimize JVM copying and GC (MemScale)
•  An off-heap memory manager for Java
   –  The JVM memory manager is not designed for this volume
   –  We believe TB-memory machines are now commodity class
•  Key principles
   –  Avoid defragmentation and compaction of data blocks through reusable buffer pools
   –  Avoid all the copying in Java heaps (young gen “from” space – “to” space – old gen – user-to-kernel copy – network copy), then repeated on the replica node's side
•  Hadoop exacerbates the copying problem
   –  Multiple JVMs are involved: TaskTracker (JVM) – DataNode (JVM) – file system/network
   –  Let alone all the copies and intermediate disk storage required in MR shuffling
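The core off-heap idea can be shown with a direct ByteBuffer: row bytes live in memory the garbage collector never scans, copies, or compacts. A minimal sketch under the assumption of length-prefixed records – a real memory manager adds reusable buffer pools and a free-list, which this omits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative off-heap row store: values are written as length-prefixed
// byte records into a direct (off-heap) buffer and addressed by offset.
public class OffHeapStore {
    private final ByteBuffer buffer;          // allocated outside the Java heap

    public OffHeapStore(int capacityBytes) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Returns the offset where the value was written (a "handle" to the row).
    // The GC never moves this data, so the handle stays valid.
    public int write(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = buffer.position();
        buffer.putInt(bytes.length);          // length prefix
        buffer.put(bytes);
        return offset;
    }

    public String read(int offset) {
        ByteBuffer view = buffer.duplicate(); // shared content, independent position
        view.position(offset);
        byte[] bytes = new byte[view.getInt()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the data sits outside the heap, it is invisible to young-gen copying and old-gen compaction – the copies the slide lists simply never happen for the stored bytes.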

Page 65: In-memory data and compute on top of Hadoop

Integration with Spring XD (future)
•  Spring XD is a distributed, extensible framework for ingestion, real-time analytics, and batch processing
•  GemFire XD as a source and sink
•  Spring XD's runtime (DIRT) is pluggable; GemFire XD could be an optional runtime

Page 66: In-memory data and compute on top of Hadoop

Comparison to HBase

Reminder for speaker – don't make this a product pitch :-)

Page 67: In-memory data and compute on top of Hadoop

Some HBase 0.9x challenges
•  HBase inherently is not HA; HDFS is
   –  Failed region servers can cause pauses
•  WAL writes have to go synchronously to HDFS (and its replicas)
   –  HDFS inherently detects failures slowly (it assumes overload)
•  Probability of hotspots
   –  Regions are sorted, not distributed by a random hash
•  WAN replication needs a lot of work
•  No backup and recovery

Page 68: In-memory data and compute on top of Hadoop

Some HBase 0.9x challenges
•  No real querying – just key-based range scans
   –  And LSM on disk is suboptimal to a B+tree for querying
•  You cannot execute transactions or integrate with RDBMSs
•  Some like the column-family data model; really?
   –  Pros: self-describing; a nested model is possible
   –  Cons: difficult; query-engine optimization is difficult; mapping is your problem; bloat

Page 69: In-memory data and compute on top of Hadoop

Learn More. Stay Connected.

Learn more:

Jags – jramnarayan at gopivotal.com Anthony – abaker at gopivotal.com

http://communities.vmware.com/community/vmtn/appplatform/vfabric_sqlfire Twitter: twitter.com/springsource YouTube: youtube.com/user/SpringSourceDev Google +: plus.google.com/+springframework

Page 70: In-memory data and compute on top of Hadoop

Extras

Page 71: In-memory data and compute on top of Hadoop

Consistency model

Page 72: In-memory data and compute on top of Hadoop

Consistency model without transactions
•  Replication within the cluster is always eager and synchronous
•  Row updates are always atomic; no need to use transactions
•  FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued

Page 73: In-memory data and compute on top of Hadoop

Consistency model without transactions
•  Consistency in partitioned tables
   –  A partitioned-table row is owned by one member at a point in time
   –  All updates are serialized to replicas through the owner
   –  "Total ordering" at the row level: atomic and isolated
•  Membership changes and consistency – that would need another hour :-)
•  Pessimistic concurrency support using 'SELECT FOR UPDATE'
•  Support for referential integrity

Page 74: In-memory data and compute on top of Hadoop

Distributed transactions
•  Full support for distributed transactions
•  Supports READ_COMMITTED and REPEATABLE_READ
•  Highly scalable without any centralized coordinator or lock manager
•  We make some important assumptions
   –  Most OLTP transactions are small in duration and size
   –  Write-write conflicts are very rare in practice

Page 75: In-memory data and compute on top of Hadoop

Distributed transactions
•  How does it work?
   –  Each data node has a sub-coordinator to track transaction state
   –  Eagerly acquire local "write" locks on each replica
      •  An object is owned by a single primary at a point in time
      •  Fail fast if the lock cannot be obtained
   –  Atomic, and works with the cluster's failure-detection system
   –  Isolated until commit for READ_COMMITTED
   –  Only local isolation is supported during commit

Page 76: In-memory data and compute on top of Hadoop

GFXD performance benchmark (in-memory)

Page 77: In-memory data and compute on top of Hadoop

How does it perform? Does it scale?
•  Scaled from 2 to 10 servers (one per host)
•  Scaled from 200 to 1,200 simulated clients (10 hosts)
•  Single partitioned table: int PK, 40 fields (20 ints, 20 strings)

Page 78: In-memory data and compute on top of Hadoop

How does it perform? Does it scale?
•  CPU% per server remained low, about 30%, indicating many more clients could be handled

Page 79: In-memory data and compute on top of Hadoop

Is latency low at scale?
•  Latency decreases with server capacity
•  50–70% of operations take < 1 millisecond
•  About 90% take less than 2 milliseconds