In-memory data and compute on top of Hadoop



DESCRIPTION

Speakers: Anthony Baker and Jags Ramnarayan. Hadoop gives us dramatic volume scalability at a low price, but core Hadoop is designed for sequential access – write once, read many times – making it impossible to use Hadoop from a real-time/online application. Add a distributed in-memory tier in front and you get the best of both worlds: very high speed, high concurrency, and the ability to scale to very large volumes. We present the seamless integration of in-memory data grids with Hadoop to achieve interesting new design patterns: ingesting raw or processed data into Hadoop; random reads and writes on operational data in memory, or on massive historical data in Hadoop with O(1) lookup times; zero-ETL MapReduce processing; deep-scale SQL processing on data in Hadoop; and the ability to easily push analytic models from Hadoop into memory. We introduce the ideas and code samples through Pivotal's in-memory real-time products and the Hadoop platform.

TRANSCRIPT

Page 1: In-memory data and compute on top of Hadoop

© 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission.

In-memory data and compute on top of Hadoop

Jags Ramnarayan – Chief Architect, Fast Data, Pivotal Anthony Baker – Architect, Fast Data, Pivotal

Page 2: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 3: In-memory data and compute on top of Hadoop

“It is raining databases in the cloud” (The 451 Group)
•  The next-gen transactional DB is memory-based, distributed, elastic, HA, cloud-ready…
   –  In-memory data grids (IMDGs), NoSQL, caching
   –  Pivotal GemFire, Oracle Coherence, Redis, Cassandra, …
•  The next-gen OLAP DB is centered around Hadoop
   –  Driver: they say it is “volume, velocity, variety”
   –  Or is it just cost/TB?

Page 4: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 5: In-memory data and compute on top of Hadoop

IMDG basic concepts
•  Distributed memory-oriented store
   –  KV/objects or SQL
   –  Queryable, indexable, and transactional
•  Multiple storage models
   –  Replication and partitioning in memory
   –  With synchronous copies in the cluster
   –  Overflow to disk and/or an RDBMS
•  Parallelize Java app logic
•  Multiple failure-detection schemes
•  Dynamic membership (elastic)
•  Vendors differentiate on SQL support, WAN, events, etc.

[Diagram: a replicated region (synchronous replication for slow-changing data) and a partitioned region with a redundant copy (for large or highly transactional data) handle thousands of concurrent client connections at low latency.]

Page 6: In-memory data and compute on top of Hadoop


Key IMDG pattern – distributed caching
•  Designed to work with existing RDBMSs
   –  Read-through: fetch from the DB on a cache miss
   –  Write-through: reflect in the cache only if the DB write succeeds
   –  Write-behind: reliable, in-order queue with batched writes to the DB
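The read-through and write-behind patterns above can be sketched in plain Java. This is a minimal illustration, not a GemFire API: the class and method names are invented, and a HashMap stands in for the backing RDBMS.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch of the caching patterns: read-through on a miss,
// write-behind via an in-order queue flushed to the "DB" in batches.
public class CachePatterns {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> db;            // stands in for the RDBMS
    private final Queue<String[]> writeQueue = new ArrayDeque<>();
    private final int batchSize;

    public CachePatterns(Map<String, String> db, int batchSize) {
        this.db = db;
        this.batchSize = batchSize;
    }

    // Read-through: fetch from the DB on a cache miss and populate the cache.
    public String get(String key) {
        String v = cache.get(key);
        if (v == null) {
            v = db.get(key);
            if (v != null) cache.put(key, v);
        }
        return v;
    }

    // Write-behind: update the cache immediately, enqueue the DB write,
    // and flush to the DB in order once a batch has accumulated.
    public void put(String key, String value) {
        cache.put(key, value);
        writeQueue.add(new String[] { key, value });
        if (writeQueue.size() >= batchSize) flush();
    }

    public void flush() {
        String[] entry;
        while ((entry = writeQueue.poll()) != null) {
            db.put(entry[0], entry[1]);              // one batched DB write
        }
    }

    public int pendingWrites() { return writeQueue.size(); }
}
```

Note how write-behind decouples the client's write latency from the DB's: the put returns as soon as the cache and queue are updated, which is exactly why the queue may need to be persistent for reliability.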

Page 7: In-memory data and compute on top of Hadoop

Traditional RDB integration can be challenging

Synchronous “write-through” (memory tables → DB writer → RDB):
•  Single point of bottleneck and failure
•  Not an option for write-heavy workloads
•  Complex two-phase commit protocol; parallel recovery is difficult

Asynchronous “write-behind” (updates are queued, and a DB synchronizer writes them to the RDB in batches):
•  Cannot sustain high write rates
•  The queue may have to be persistent
•  Parallel recovery is difficult

Page 8: In-memory data and compute on top of Hadoop

Some IMDGs and NoSQL stores offer “shared-nothing persistence”
•  Append-only operation logs
•  Fully parallel; zero disk seeks
•  But cluster restart requires a log scan
•  Very large volumes pose challenges

[Diagram: memory tables stream records into append-only operation logs through OS buffers; a log compressor compacts the logs in the background.]
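The append-only log and its restart cost can be sketched as follows. This is illustrative Java under the assumption of an in-memory list standing in for the on-disk log; a real store writes to disk and compacts in the background.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of shared-nothing persistence via an append-only operation log.
public class OpLog {
    // Each record is appended, never rewritten in place: zero seeks on write.
    private final List<String[]> log = new ArrayList<>();

    public void append(String op, String key, String value) {
        log.add(new String[] { op, key, value });
    }

    // Cluster restart: rebuild the in-memory table by scanning the whole log.
    // This full scan is exactly the restart cost the slide calls out.
    public Map<String, String> replay() {
        Map<String, String> table = new HashMap<>();
        for (String[] rec : log) {
            if (rec[0].equals("PUT")) table.put(rec[1], rec[2]);
            else if (rec[0].equals("DELETE")) table.remove(rec[1]);
        }
        return table;
    }

    // Compaction (the "log compressor"): keep only the latest state,
    // shrinking the log that a restart must scan.
    public void compact() {
        Map<String, String> latest = replay();
        log.clear();
        for (Map.Entry<String, String> e : latest.entrySet())
            append("PUT", e.getKey(), e.getValue());
    }

    public int size() { return log.size(); }
}
```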

Page 9: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 10: In-memory data and compute on top of Hadoop

Hadoop core (HDFS) for scalable, parallel storage
•  Maturing, and will be ubiquitous
•  Handles very large data sets on commodity hardware
•  Handles failures well
•  Simple coherency model

Page 11: In-memory data and compute on top of Hadoop

Hadoop design center – batch and sequential
•  64 MB immutable blocks
•  For random reads, you have to sequentially walk through records each time
•  Write-once, read-many design
•  The NameNode can be a contention point
•  Slow failure detection

Page 12: In-memory data and compute on top of Hadoop

Hadoop strengths
•  Massive volumes (TB to PB)
•  HA, compression
•  An ever-growing and maturing ecosystem for parallel compute and analytics
•  Storage systems like Isilon now offer an HDFS interface
•  Optimized for virtual machines

Page 13: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 14: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  Data in many shapes – support multiple data models
•  A main-memory-based, distributed, low-latency data store for big data
•  Operational data is the focus; it is (mostly) in memory
•  All data and history in HDFS

Page 15: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  Replication or partitioning
•  Storage model: in-memory, in-memory with local disk, or in-memory with HDFS persistence

Page 16: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  SQL engine – designed for online/OLTP workloads and transactions
•  IMDG caching features – read-through, write-behind, etc.

Page 17: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  Tight HDFS integration – streaming and read/write cases
•  Analytics can run on HDFS without going through the in-memory tier – sequential scans or incremental processing
•  With parallel ingestion, you get near-real-time visibility of data for deep analytics

Page 18: In-memory data and compute on top of Hadoop

SQL + IMDG (objects) + HDFS
•  The MapReduce ‘reduce’ phase can directly emit results to the in-memory tier
•  A closed loop between real time and analytics

Page 19: In-memory data and compute on top of Hadoop

GemFire XD – a Pivotal HD service

GemFire XD combines SQLFire and GemFire as a Pivotal HD service:
•  GemFire – clustering, in-memory storage, HA, replication, WAN, events, distributed queues…
•  SQLFire – SQL engine: cost-based optimizer, in-memory indexing, distributed transactions, RDB integration…
•  Pivotal HD – integrated install and config; command center for monitoring; optimizations to Hadoop

Working set in memory and geo-replicated; history and time series in HDFS. Data is accessible as SQL, objects, or JSON.

Page 20: In-memory data and compute on top of Hadoop

The real-time latency spectrum
•  Machine latency – milliseconds
•  Human interactions – seconds
•  Interactive reports – seconds to minutes
•  Batch processing – minutes to hours

GemFire XD and online/OLTP/operational DBs serve the low-latency end; analytics and data warehousing (Pivotal HD, HAWQ) serve the batch end.

Page 21: In-memory data and compute on top of Hadoop

Real time on top of Hadoop – who else?

Many more…. Most focused on interactive queries for analytics

Page 22: In-memory data and compute on top of Hadoop

Design patterns
•  Streaming ingest – consume unbounded event streams
   –  Write fast into memory; stream all writes to HDFS for batch analytics
      •  e.g., maintain the latest price for each security in memory; keep the time series in HDFS
      •  Continuously ingest click streams, audit trails, or interaction data
   –  Trap interactions or OLTP transactions, do in-line stream processing (actionable insights), and write results or raw state into HDFS

Page 23: In-memory data and compute on top of Hadoop

Design patterns
•  High-performance operational database
   –  Keep operational data in memory; history in HDFS is randomly accessible
      •  e.g., keep the last month of trades in memory, with all history accessible at some cost
   –  Take analytic output from Hadoop/SQL analytics and make it visible to online apps

Page 24: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 25: In-memory data and compute on top of Hadoop

Agenda •  How do you use this? SQL syntax and demo

Page 26: In-memory data and compute on top of Hadoop

In-Memory Partitioning & Replication

Page 27: In-memory data and compute on top of Hadoop

Explore features using a simple STAR schema

FLIGHTS
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEPART_TIME TIME, …
  PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER)

FLIGHTAVAILABILITY (1 – M with FLIGHTS)
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  FLIGHT_DATE DATE NOT NULL,
  ECONOMY_SEATS_TAKEN INTEGER, …
  PRIMARY KEY (FLIGHT_ID, SEGMENT_NUMBER, FLIGHT_DATE)
  FOREIGN KEY (FLIGHT_ID, SEGMENT_NUMBER) REFERENCES FLIGHTS (FLIGHT_ID, SEGMENT_NUMBER)

FLIGHTHISTORY (1 – 1 with FLIGHTS)
  FLIGHT_ID CHAR(6), SEGMENT_NUMBER INTEGER, ORIG_AIRPORT CHAR(3), DEPART_TIME TIME, DEST_AIRPORT CHAR(3), …

Several code/dimension tables:
  AIRLINES – airline information (very static)
  COUNTRIES – list of countries served by flights
  CITIES
  MAPS – photos of regions served

Assume thousands of FLIGHTS rows and millions of FLIGHTAVAILABILITY records.

Page 28: In-memory data and compute on top of Hadoop

Creating tables

CREATE TABLE AIRLINES (
  AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
  AIRLINE_FULL VARCHAR(24),
  BASIC_RATE DOUBLE PRECISION,
  DISTANCE_DISCOUNT DOUBLE PRECISION, …
);

[Diagram: the table is created across a cluster of GemFire XD (GF XD) members.]

Page 29: In-memory data and compute on top of Hadoop

Replicated tables

CREATE TABLE AIRLINES (
  AIRLINE CHAR(2) NOT NULL PRIMARY KEY,
  AIRLINE_FULL VARCHAR(24),
  BASIC_RATE DOUBLE PRECISION,
  DISTANCE_DISCOUNT DOUBLE PRECISION, …
) REPLICATE;

[Diagram: each GF XD member holds a full copy of the replicated table.]

Design pattern: replicate reference tables in STAR schemas (they seldom change and are often referenced in queries).

Page 30: In-memory data and compute on top of Hadoop

Partitioned tables

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL)
PARTITION BY COLUMN (FLIGHT_ID);

[Diagram: each GF XD member holds one partition of the table, alongside the replicated tables.]

Design pattern: partition fact tables in STAR schemas for load balancing (they are large and write-heavy).
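How PARTITION BY COLUMN can route a row to a member may be sketched as a hash-bucket scheme. This is an assumption-laden illustration – the bucket count, member mapping, and class names are invented, not GemFire XD internals.

```java
// Illustrative sketch: hash the partitioning column into a fixed set of
// buckets, then spread buckets over members. (Not GemFire XD's actual code.)
public class PartitionRouter {
    private final int buckets;
    private final int members;

    public PartitionRouter(int buckets, int members) {
        this.buckets = buckets;
        this.members = members;
    }

    // The same FLIGHT_ID always hashes to the same bucket, so single-key
    // reads and writes touch exactly one partition.
    public int bucketFor(String flightId) {
        return Math.floorMod(flightId.hashCode(), buckets);
    }

    // Buckets map round-robin to members for load balancing; with
    // REDUNDANCY 1, a second member would also hold a backup of the bucket.
    public int memberFor(String flightId) {
        return bucketFor(flightId) % members;
    }
}
```

This deterministic routing is also what makes colocation work: two tables partitioned by the same column hash matching rows to the same bucket, and therefore the same member.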

Page 31: In-memory data and compute on top of Hadoop

Partitioned but highly available

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  ORIG_AIRPORT CHAR(3),
  DEST_AIRPORT CHAR(3),
  DEPART_TIME TIME,
  FLIGHT_MILES INTEGER NOT NULL)
PARTITION BY COLUMN (FLIGHT_ID)
REDUNDANCY 1;

[Diagram: each GF XD member now holds its primary partition plus a redundant copy of another member's partition.]

Design pattern: increase redundant copies for HA and to load-balance queries across replicas.

Page 32: In-memory data and compute on top of Hadoop

Colocation for related data

CREATE TABLE FLIGHTAVAILABILITY (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  …
PARTITION BY COLUMN (FLIGHT_ID)
COLOCATE WITH (FLIGHTS);

[Diagram: each FLIGHTAVAILABILITY partition is colocated on the member holding the matching FLIGHTS partition, along with its redundant copies.]

Design pattern: colocate related tables for maximum join performance.

Page 33: In-memory data and compute on top of Hadoop

Native disk-resident tables (operation logging)

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT;

The data dictionary is always persisted on each server. Online backup:

sqlf backup /export/fileServerDirectory/sqlfireBackupLocation

Page 34: In-memory data and compute on top of Hadoop

Demo environment
•  A SQL client connects via jdbc:sqlfire://localhost:1527
•  A virtual machine runs a GemFire XD locator and three GemFire XD servers
•  Pulse (monitoring)

Page 35: In-memory data and compute on top of Hadoop

Demo: replicated and partitioned tables

Page 36: In-memory data and compute on top of Hadoop

Agenda •  HDFS integration architecture and demo

Page 37: In-memory data and compute on top of Hadoop

Effortless HDFS integration
•  Options
   –  Fast streaming writes
   –  Random read/write
   –  With or without time series

Page 38: In-memory data and compute on top of Hadoop

Streaming all writes to HDFS

CREATE HDFSSTORE streamingstore
  NAMENODE hdfs://PHD1:8020
  DIR /stream-tables
  BATCHSIZE 10
  BATCHTIMEINTERVAL 2000
  QUEUEPERSISTENT true;

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT
HDFSSTORE streamingstore WRITEONLY;

Page 39: In-memory data and compute on top of Hadoop

Read and write to HDFS

CREATE HDFSSTORE RWStore
  NAMENODE hdfs://PHD1:8020
  DIR /indexed-tables
  BATCHSIZE 10
  BATCHTIMEINTERVAL 2000
  QUEUEPERSISTENT true;

CREATE TABLE FLIGHTS (
  FLIGHT_ID CHAR(6) NOT NULL,
  SEGMENT_NUMBER INTEGER NOT NULL,
  …
PARTITION BY COLUMN (FLIGHT_ID)
PERSISTENT
HDFSSTORE RWStore;

Page 40: In-memory data and compute on top of Hadoop

Write path – streaming to HDFS

[Diagram: a SQL client writes to the GemFire XD member hosting FLIGHTS bucket N; the write is replicated to the bucket's backup member and appended to each member's local append-only store; the embedded DFS client then flushes batches through the NameNode to the local DataNode under /GFXD/APP/FLIGHTS/BucketN.]

In-memory partitioned data is colocated with the Hadoop DataNode.

Page 41: In-memory data and compute on top of Hadoop

Directory structure in HDFS

Read/write table – each bucket directory holds files with bloom filter, index, and data sections:

/GFXD
  /APP.FLIGHTS
    /0
      0-1-XXX.hop (bloom, index, data)
      0-2-XXX.hop (bloom, index, data)
    /1
      1-1-XXX.hop (bloom, index, data)
      1-2-XXX.hop (bloom, index, data)

Write-only table – data-only files per bucket:

/GFXD
  /APP.FLIGHT_HISTORY
    /0
      0-1-XXX.shop
      0-2-XXX.shop
    /1
      1-1-XXX.shop
      1-2-XXX.shop

Time-stamped records allow incremental Map/Reduce jobs.

Page 42: In-memory data and compute on top of Hadoop

Read/write with compaction

[Diagram: the same write path as streaming to HDFS, but the flushed files are now sorted… and compacted.]

A log-structured merge tree (like HBase and Cassandra).

Page 43: In-memory data and compute on top of Hadoop

Read path for HDFS tables

[Diagram: a SQL client reads from the GemFire XD member hosting FLIGHTS bucket N; on a miss in the local store, the embedded DFS client checks the bloom filters and indexes of the HDFS files (held in a block cache) and fetches matching data blocks from the DataNode.]

Short-circuit read path for local blocks; the block cache avoids I/O for bloom and index lookups.
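Why a bloom filter helps the read path: a tiny in-memory bit set can answer "definitely not in this file" without touching disk. The sketch below is illustrative – it uses two ad-hoc hash functions, whereas real stores use tuned hash families and sized filters.

```java
import java.util.BitSet;

// Illustrative bloom filter: false positives possible, false negatives not.
public class BloomFilter {
    private final BitSet bits;
    private final int size;

    public BloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    private int h1(String key) { return Math.floorMod(key.hashCode(), size); }
    private int h2(String key) { return Math.floorMod(key.hashCode() * 31 + 17, size); }

    public void add(String key) {
        bits.set(h1(key));
        bits.set(h2(key));
    }

    // A "false" answer lets the read path skip the file entirely – no I/O.
    // A "true" answer means the file's index and data must still be checked,
    // because distinct keys can collide on the same bits.
    public boolean mightContain(String key) {
        return bits.get(h1(key)) && bits.get(h2(key));
    }
}
```

With one filter per HDFS file, a key lookup consults each file's filter first and only reads the few files that might contain the key – which is what keeps O(1)-style lookups feasible over many compacted files.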

Page 44: In-memory data and compute on top of Hadoop

Tiered compaction
•  Async writes allow lock-free sequential I/O… but more files mean slower reads
•  Compactions balance read and write throughput
   –  Minor compactions merge small files into bigger files
   –  Major compactions merge all files into one single file

[Diagram: level 0 holds many small bloom/index/data files in time order; levels 1 and 2 hold progressively fewer, larger files.]
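A minor compaction is essentially a k-way merge of sorted files in which the newest version of each key wins. Here is a self-contained sketch (sorted string lists stand in for the sorted files; not the product's code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative minor compaction: merge sorted "key=value" files, newest
// file first, keeping only the latest version of each key (LSM-style).
public class Compactor {
    public static List<String> compact(List<List<String>> files) {
        // Heap entries are [key, value, fileIndex]; ordered by key, then by
        // file index so the newest file's entry surfaces first on ties.
        PriorityQueue<String[]> heap = new PriorityQueue<>((a, b) -> {
            int c = a[0].compareTo(b[0]);
            return c != 0 ? c : Integer.compare(Integer.parseInt(a[2]), Integer.parseInt(b[2]));
        });
        List<int[]> cursors = new ArrayList<>();
        for (int i = 0; i < files.size(); i++) cursors.add(new int[] { 0 });
        for (int i = 0; i < files.size(); i++) push(heap, files, cursors, i);

        List<String> merged = new ArrayList<>();
        String lastKey = null;
        while (!heap.isEmpty()) {
            String[] top = heap.poll();
            if (!top[0].equals(lastKey)) {        // first occurrence = newest
                merged.add(top[0] + "=" + top[1]);
                lastKey = top[0];
            }
            push(heap, files, cursors, Integer.parseInt(top[2]));
        }
        return merged;
    }

    private static void push(PriorityQueue<String[]> heap, List<List<String>> files,
                             List<int[]> cursors, int file) {
        int pos = cursors.get(file)[0];
        if (pos < files.get(file).size()) {
            String[] kv = files.get(file).get(pos).split("=", 2);
            heap.add(new String[] { kv[0], kv[1], String.valueOf(file) });
            cursors.get(file)[0] = pos + 1;
        }
    }
}
```

Because every input file is already sorted, the merge is one sequential pass over each file – the same property that lets compaction proceed with lock-free sequential I/O.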

Page 45: In-memory data and compute on top of Hadoop

“Closed-loop” with analytics

[Diagram: Map/Reduce, Pivotal HAWQ, and Hive read the HDFS tier's files through an InputFormat, and write results back into GemFire XD through an OutputFormat – closing the loop between the in-memory tier and analytics.]

Page 46: In-memory data and compute on top of Hadoop

Demo environment with Pivotal HD
•  A SQL client connects via jdbc:sqlfire://localhost:1527
•  A virtual machine runs a GemFire XD locator, three GemFire XD servers, a Pivotal HD NameNode, and a Pivotal HD DataNode
•  Pulse (monitoring)

Page 47: In-memory data and compute on top of Hadoop

Demo: HDFS tables

Page 48: In-memory data and compute on top of Hadoop

Operational vs. historical data
•  Operational data is retained in memory for fast access
•  User-supplied criteria identify operational data
   –  Enforced on incoming updates or periodically
•  Query hints or connection properties control use of historical data

CREATE TABLE flights_history (…)
  PARTITION BY PRIMARY KEY
  EVICTION BY CRITERIA (LAST_MODIFIED_DURATION > 300000)
  EVICTION FREQUENCY 60 SECONDS
  HDFSSTORE (bar);

SELECT * FROM flights_history --PROPERTIES queryHDFS = true
WHERE orig_airport = 'PDX' AND miles > 1000
ORDER BY dest_airport
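The eviction-by-criteria semantics above can be modeled in a few lines of Java. This is a conceptual sketch, not product code: two HashMaps stand in for the memory and HDFS tiers, and the class and method names are invented.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch of EVICTION BY CRITERIA: rows whose last-modified age exceeds a
// threshold are dropped from memory but remain readable from the HDFS tier.
public class CriteriaEviction {
    static class Row {
        final String value; final long lastModified;
        Row(String value, long lastModified) { this.value = value; this.lastModified = lastModified; }
    }

    private final Map<String, Row> memory = new HashMap<>();
    private final Map<String, Row> hdfs = new HashMap<>();   // stands in for HDFSSTORE
    private final long maxAgeMillis;                         // LAST_MODIFIED_DURATION

    public CriteriaEviction(long maxAgeMillis) { this.maxAgeMillis = maxAgeMillis; }

    public void put(String key, String value, long now) {
        Row r = new Row(value, now);
        memory.put(key, r);
        hdfs.put(key, r);            // all writes also stream to the HDFS tier
    }

    // Run periodically (EVICTION FREQUENCY): evict rows matching the criteria.
    public void evict(long now) {
        Iterator<Map.Entry<String, Row>> it = memory.entrySet().iterator();
        while (it.hasNext())
            if (now - it.next().getValue().lastModified > maxAgeMillis) it.remove();
    }

    // queryHDFS=false reads memory only; queryHDFS=true falls back to HDFS.
    public String get(String key, boolean queryHdfs) {
        Row r = memory.get(key);
        if (r == null && queryHdfs) r = hdfs.get(key);
        return r == null ? null : r.value;
    }
}
```

The key point the sketch makes: eviction changes where a row lives, not whether it exists – the queryHDFS hint decides whether a query is willing to pay the historical-tier cost.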

Page 49: In-memory data and compute on top of Hadoop

Agenda •  MapReduce integration and demo

Page 50: In-memory data and compute on top of Hadoop

Hadoop Map/Reduce
•  Map/Reduce is a framework for processing massive data sets in parallel
   –  The Mapper acts on local file splits to transform individual data elements
   –  The Reducer receives all values for a key and generates an aggregate result
   –  The Driver provides the job configuration
   –  InputFormat and OutputFormat define the data source and sink
•  Hadoop manages job execution

[Diagram: the InputFormat supplies local data to mappers on nodes 1–3; the mappers transform it; Hadoop sorts and shuffles the keys; reducers on nodes 1–3 generate aggregate results, which the OutputFormat writes.]

Page 51: In-memory data and compute on top of Hadoop

Map/Reduce with GemFire XD
•  Users can execute Hadoop Map/Reduce jobs against GemFire XD data using:
   –  EventInputFormat to read data from HDFS without impacting online availability or performance
   –  SqlfOutputFormat to write data into a SQL table for immediate use by online applications

[Diagram: a Hadoop mapper reads file splits via EventInputFormat (driven by the table DDL); the reducer writes through SqlfOutputFormat over jdbc:sqlfire://localhost:1527, issuing PUT INTO foo (…) VALUES (?, ?, …) against the GemFire XD table.]

Page 52: In-memory data and compute on top of Hadoop

Demo: Map/Reduce

Page 53: In-memory data and compute on top of Hadoop

Using the InputFormat – Mapper

// count each airport present in a FLIGHT_HISTORY row
public class SampleMapper extends MapReduceBase
    implements Mapper<Object, Row, Text, IntWritable> {

  public void map(Object key, Row row,
      OutputCollector<Text, IntWritable> output,
      Reporter reporter) throws IOException {
    try {
      IntWritable one = new IntWritable(1);
      ResultSet rs = row.getRowAsResultSet();
      String origAirport = rs.getString("ORIG_AIRPORT");
      String destAirport = rs.getString("DEST_AIRPORT");
      output.collect(new Text(origAirport), one);
      output.collect(new Text(destAirport), one);
    } catch (SQLException e) {
      …
    }
  }
}

Driver configuration:

JobConf conf = new JobConf(getConf());
conf.setJobName("Busy Airport Count");

conf.set(EventInputFormat.HOME_DIR, hdfsHomeDir);
conf.set(EventInputFormat.INPUT_TABLE, tableName);

conf.setInputFormat(EventInputFormat.class);
conf.setMapperClass(SampleMapper.class);
...

Page 54: In-memory data and compute on top of Hadoop

Use Spring Hadoop for job configuration

<beans:beans …>
  <job id="busyAirportsJob"
      libs="…"
      input-format="com.vmware.sqlfire.internal.engine.hadoop.mapreduce.EventInputFormat"
      output-path="${flights.intermediate.path}"
      mapper="demo.sqlf.mr2.BusyAirports.SampleMapper"
      combiner="org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer"
      reducer="org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer"/>

  <job id="topBusyAirportJob"
      libs="${LIB_DIR}/sqlfire-mapreduce-1.0-SNAPSHOT.jar"
      input-path="${flights.intermediate.path}"
      output-path="${flights.output.path}"
      mapper="demo.sqlf.mr2.TopBusyAirport.TopBusyAirportMapper"
      reducer="demo.sqlf.mr2.TopBusyAirport.TopBusyAirportReducer"
      number-reducers="1"/>
  …
</beans:beans>

Page 55: In-memory data and compute on top of Hadoop

Using the OutputFormat – Reducer

// find the max, aka the busiest airport
public class TopBusyAirportReducer extends MapReduceBase
    implements Reducer<Text, StringIntPair, Key, BusyAirportModel> {

  public void reduce(Text token, Iterator<StringIntPair> values,
      OutputCollector<Key, BusyAirportModel> output,
      Reporter reporter) throws IOException {

    String topAirport = null;
    int max = 0;

    while (values.hasNext()) {
      StringIntPair v = values.next();
      if (v.getSecond() > max) {
        max = v.getSecond();
        topAirport = v.getFirst();
      }
    }
    BusyAirportModel busy =
        new BusyAirportModel(topAirport, max);
    output.collect(null, busy);
  }
}

Driver configuration:

JobConf conf = new JobConf(getConf());
conf.setJobName("Top Busy Airport");

conf.set(SqlfOutputFormat.OUTPUT_URL,
    "jdbc:sqlfire://localhost:1527");
conf.set(SqlfOutputFormat.OUTPUT_SCHEMA, "APP");
conf.set(SqlfOutputFormat.OUTPUT_TABLE, "BUSY_AIRPORT");

conf.setReducerClass(TopBusyAirportReducer.class);
conf.setOutputKeyClass(Key.class);
conf.setOutputValueClass(BusyAirportModel.class);
conf.setOutputFormat(SqlfOutputFormat.class);
...

Page 56: In-memory data and compute on top of Hadoop

Where do the results go?
•  Reduced values are automatically inserted into the output table by matching column names

public class BusyAirportModel {
  private String airport;
  private int flights;

  public BusyAirportModel(String airport, int flights) {
    this.airport = airport;
    this.flights = flights;
  }

  public void setFlights(int idx, PreparedStatement ps)
      throws SQLException {
    ps.setInt(idx, flights);
  }

  public void setAirport(int idx, PreparedStatement ps)
      throws SQLException {
    ps.setString(idx, airport);
  }
}

Resulting statements:

PUT INTO BUSY_AIRPORT (flights, airport) VALUES (?, ?)
PUT INTO BUSY_AIRPORT (flights, airport) VALUES (?, ?)
…

Page 57: In-memory data and compute on top of Hadoop

Agenda
•  In-memory data grid – concepts, strengths, weaknesses
•  HDFS – strengths, weaknesses
•  What is our proposal?
•  How do you use this? SQL syntax and demo
•  HDFS integration architecture and demo
•  MapReduce integration and demo
   –  In-memory, parallel stored procedures
•  Comparison to HBase

Page 58: In-memory data and compute on top of Hadoop

Scaling application logic with parallel “data-aware procedures”

Page 59: In-memory data and compute on top of Hadoop

Why not Map/Reduce?

[Image: traditional Map/Reduce vs. parallel “data-aware” procedures. Source: UC Berkeley Spark project (just the image).]

Page 60: In-memory data and compute on top of Hadoop

Procedures – managed in Spring containers as beans
•  Java stored procedures may be created according to the SQL standard
•  SQLFire also supports the JDBC type Types.JAVA_OBJECT; a parameter of type JAVA_OBJECT supports an arbitrary Serializable Java object

CREATE PROCEDURE getOverBookedFlights ()
LANGUAGE JAVA
PARAMETER STYLE JAVA
READS SQL DATA
DYNAMIC RESULT SETS 1
EXTERNAL NAME 'examples.OverBookedStatus.getOverBookedStatus';

Page 61: In-memory data and compute on top of Hadoop

Data-aware procedures
Parallelize the procedure and prune execution to the nodes holding the required data. Extend the procedure call with the following syntax:

CALL [PROCEDURE] procedure_name
  ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] } |
    { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

Hint the data the procedure depends on:

CALL getOverBookedFlights()
ON TABLE FLIGHTAVAILABILITY
WHERE FLIGHT_ID = 'AA1116';

If the table is partitioned by the columns in the WHERE clause, procedure execution is pruned to the nodes holding the data (the node with 'AA1116' in this case).

Page 62: In-memory data and compute on top of Hadoop

Parallelize the procedure, then aggregate (reduce)

Register a Java result processor (optional in some cases):

CALL [PROCEDURE] procedure_name
  ( [ expression [, expression ]* ] )
  [ WITH RESULT PROCESSOR processor_name ]
  [ { ON TABLE table_name [ WHERE whereClause ] } |
    { ON { ALL | SERVER GROUPS (server_group_name [, server_group_name ]*) } } ]

[Diagram: the client's call fans out to fabric servers 1–3; partial results flow back through the result processor to the client.]

Page 63: In-memory data and compute on top of Hadoop

High-density storage in memory – off the Java heap

Page 64: In-memory data and compute on top of Hadoop

Off-heap to minimize JVM copying and GC (MemScale)
•  An off-heap memory manager for Java
   –  The JVM memory manager is not designed for this volume
   –  We believe TB-memory machines are now commodity class
•  Key principles
   –  Avoid defragmentation and compaction of data blocks through reusable buffer pools
   –  Avoid all the copying in Java heaps (young gen “from” space – “to” space – old gen – user-to-kernel copy – network copy), then repeated on the replica node's side
•  Hadoop exacerbates the copying problem
   –  Multiple JVMs are involved: TaskTracker (JVM) – DataNode (JVM) – file system/network
   –  Let alone all the copies and intermediate disk storage required in MR shuffling
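The core off-heap idea can be shown with a direct ByteBuffer: row bytes live in memory the garbage collector never scans, copies, or compacts. A minimal sketch under the assumption of length-prefixed records – a real memory manager adds reusable buffer pools and a free-list, which this omits.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative off-heap row store: values are written as length-prefixed
// byte records into a direct (off-heap) buffer and addressed by offset.
public class OffHeapStore {
    private final ByteBuffer buffer;          // allocated outside the Java heap

    public OffHeapStore(int capacityBytes) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Returns the offset where the value was written (a "handle" to the row).
    // The GC never moves this data, so the handle stays valid.
    public int write(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = buffer.position();
        buffer.putInt(bytes.length);          // length prefix
        buffer.put(bytes);
        return offset;
    }

    public String read(int offset) {
        ByteBuffer view = buffer.duplicate(); // shared content, independent position
        view.position(offset);
        byte[] bytes = new byte[view.getInt()];
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Because the data sits outside the heap, it is invisible to young-gen copying and old-gen compaction – the copies the slide lists simply never happen for the stored bytes.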

Page 65: In-memory data and compute on top of Hadoop

Integration with Spring XD (future)
•  Spring XD is a distributed, extensible framework for ingestion, real-time analytics, and batch processing
•  GemFire XD as a source and sink
•  Spring XD's runtime (DIRT) is pluggable; GemFire XD could be an optional runtime

Page 66: In-memory data and compute on top of Hadoop

Comparison to HBase

Reminder for speaker – don't make this a product pitch :-)

Page 67: In-memory data and compute on top of Hadoop

Some HBase 0.9x challenges
•  HBase inherently is not HA; HDFS is
   –  Failed region servers can cause pauses
•  WAL writes have to go synchronously to HDFS (and its replicas)
   –  HDFS inherently detects failures slowly (it assumes overload)
•  Probability of hotspots
   –  Regions are sorted, not distributed by a random hash
•  WAN replication needs a lot of work
•  No backup and recovery

Page 68: In-memory data and compute on top of Hadoop

Some HBase 0.9x challenges
•  No real querying – just key-based range scans
   –  And LSM on disk is suboptimal to a B+tree for querying
•  You cannot execute transactions or integrate with RDBMSs
•  Some like the column-family data model; really?
   –  Pros: self-describing; a nested model is possible
   –  Cons: difficult; query-engine optimization is difficult; mapping is your problem; bloat

Page 69: In-memory data and compute on top of Hadoop

Learn More. Stay Connected.

Learn more:

Jags – jramnarayan at gopivotal.com Anthony – abaker at gopivotal.com

http://communities.vmware.com/community/vmtn/appplatform/vfabric_sqlfire Twitter: twitter.com/springsource YouTube: youtube.com/user/SpringSourceDev Google +: plus.google.com/+springframework

Page 70: In-memory data and compute on top of Hadoop

Extras

Page 71: In-memory data and compute on top of Hadoop

Consistency model

Page 72: In-memory data and compute on top of Hadoop

Consistency model without transactions
•  Replication within the cluster is always eager and synchronous
•  Row updates are always atomic; no need to use transactions
•  FIFO consistency: writes performed by a single thread are seen by all other processes in the order in which they were issued

Page 73: In-memory data and compute on top of Hadoop

Consistency model without transactions
•  Consistency in partitioned tables
   –  A partitioned-table row is owned by one member at a point in time
   –  All updates are serialized to replicas through the owner
   –  "Total ordering" at the row level: atomic and isolated
•  Membership changes and consistency – that would need another hour :-)
•  Pessimistic concurrency support using 'SELECT FOR UPDATE'
•  Support for referential integrity

Page 74: In-memory data and compute on top of Hadoop

Distributed transactions
•  Full support for distributed transactions
•  Supports READ_COMMITTED and REPEATABLE_READ
•  Highly scalable without any centralized coordinator or lock manager
•  We make some important assumptions
   –  Most OLTP transactions are small in duration and size
   –  Write-write conflicts are very rare in practice

Page 75: In-memory data and compute on top of Hadoop

Distributed transactions
•  How does it work?
   –  Each data node has a sub-coordinator to track transaction state
   –  Eagerly acquire local "write" locks on each replica
      •  An object is owned by a single primary at a point in time
      •  Fail fast if the lock cannot be obtained
   –  Atomic, and works with the cluster's failure-detection system
   –  Isolated until commit for READ_COMMITTED
   –  Only local isolation is supported during commit

Page 76: In-memory data and compute on top of Hadoop

GFXD performance benchmark (in-memory)

Page 77: In-memory data and compute on top of Hadoop

How does it perform? Does it scale?
•  Scaled from 2 to 10 servers (one per host)
•  Scaled from 200 to 1,200 simulated clients (10 hosts)
•  Single partitioned table: int PK, 40 fields (20 ints, 20 strings)

Page 78: In-memory data and compute on top of Hadoop

How does it perform? Does it scale?
•  CPU% per server remained low, about 30%, indicating many more clients could be handled

Page 79: In-memory data and compute on top of Hadoop

Is latency low at scale?
•  Latency decreases with server capacity
•  50–70% of operations take < 1 millisecond
•  About 90% take less than 2 milliseconds