building apps with distributed in-memory computing using apache geode

45
Building Apps with Distributed In-Memory Computing using Apache Geode Nitin Lamba @nlamba9 (incubating) William Markito @william_markito

Upload: pivotalopensourcehub

Post on 14-Jan-2017

572 views

Category:

Technology


0 download

TRANSCRIPT

Page 1: Building Apps with Distributed In-Memory Computing Using Apache Geode

Building Apps with Distributed In-Memory Computing

using Apache Geode

Nitin Lamba@nlamba9

(incubating)

William Markito@william_markito

Page 2: Building Apps with Distributed In-Memory Computing Using Apache Geode

Introduction (Nitin) • WHAT? Overview & history • WHY? Relevance & Differentiators • HOW? Features & Basic Concepts • SEE! Quick start

Hands-on (William) • LEARN: Advanced Concepts - Persistence, f(x), PDX, … • SHOW: Demos (Docker, PDX)

Resources Q & A

Agenda

2

Page 3: Building Apps with Distributed In-Memory Computing Using Apache Geode

IntroductionNitin

3

Page 4: Building Apps with Distributed In-Memory Computing Using Apache Geode

From GEM to GEODE…

4

Page 5: Building Apps with Distributed In-Memory Computing Using Apache Geode

A distributed, memory-based data management platform for data oriented apps that need: • high performance, scalability,

resiliency and continuous availability

• fast access to critical data sets • location-aware distributed data

processing • event-driven data architecture

What is GEODE?

5

Page 6: Building Apps with Distributed In-Memory Computing Using Apache Geode

High-level Architecture

6

Powerful app development kit • APIs: Java & REST • Adapters: Redis, Lucene*, Spark*, …

Multiple persistence options • Filesystem, RDBMS or HDFS* • Sync: read-through, write-through • Async: write-behind

Durable <K,V> cache/ store • Data replicated or partitioned • Redundant storage in-memory/ disk • Flexible data retention policiesÎ

!

Loca

tor

Serv

er

Serv

er

Serv

er

Serv

er +""""

"

$

% % %

&& &% % % % % % % %

&&

A Peer-2-Peer Distributed System

REST

!

* Experimental and waiting community feedback

Page 7: Building Apps with Distributed In-Memory Computing Using Apache Geode

• 1000+ systems in production (real customers) • Cutting edge use cases

Incubating but ROCK solid…

7

<2000 2004 2008 2012 2016

Early drivers • Data Volumes • Margins/ transactions • IT maintenance costs • Elasticity needs

Real-time needs • Real-time response • Time to market needs • Flexible Data Models • Persistent+In-memory

Global Data • Visibility across DC • Fast Ingest • Device to enterprise • Uptime (always on)

Open Source! • Apache Incubation • Gemfire > Geode • M1 release • 1st Geode Summit

Financial Services

US DoDTrade Clearing

Travel Portal

Online Gambling

TelcosManufacturing

Auto InsurancePayroll processing

Rail systems

Page 8: Building Apps with Distributed In-Memory Computing Using Apache Geode

…with both SCALE and SPEED, …

8

40K

Transactionsper second

3TB Data

in-memory

17B Records

in-memory

120K

Concurrent users

Page 9: Building Apps with Distributed In-Memory Computing Using Apache Geode

… and impacting a LOT of people!

9

China RailwayCorporation

Indian Railways

19%

17%

36%

of the world population

Page 10: Building Apps with Distributed In-Memory Computing Using Apache Geode

Built for PERFORMANCE…

10

Ope

ratio

ns p

er s

econ

d

0

200,000

400,000

600,000

800,000

YCSB Workloads

A Re

ads

A U

pdat

es

B Re

ads

B U

pdat

es

C R

eads

D In

serts

D R

eads

F Re

ads

F U

pdat

es

CassandraGeode

Page 11: Building Apps with Distributed In-Memory Computing Using Apache Geode

…and horizontal, consistent SCALABILITY!

11

Horizontal scaling for reads, consistent latency and CPU

0

4.5

9

13.5

18

Speedu

p

0

1.25

2.5

3.75

5

ServerHosts

2 4 6 8 10

speedup latency(ms) CPU%

• Scaled from 256 clients and 2 servers to 1280 clients and 10 servers• Partitioned region with redundancy and 1K data size

Page 12: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Minimize copying

• Minimize contention points

• Run user code in-process

• Partitioning & parallelism

• Avoid disk seeks

• Automated benchmarks

What makes it go FAST?

12

Page 13: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Cache • Region • Member • Client Cache • Persistence • Functions • Events & Listeners • High Availability • Serialization

Let’s talk about a few (basic) CONCEPTS…

13

Page 14: Building Apps with Distributed In-Memory Computing Using Apache Geode

• In-memory storage and management for your data

• Configurable through XML, Java API or CLI

• Collection of Region

What is a CACHE?

14

Region

Region

Region

Cache

JVM

Page 15: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Distributed java.util.Map on steroids (Key/Value)

• Consistent API regardless of where or how data is stored

• Observable (reactive)

• Highly available, redundant on cache Member (s).

What is a REGION?

15

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Page 16: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Local, Replicated or Partitioned

• In-memory or persistent

• Redundant

• LRU

• Overflow

Region: Types & Options

16

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

LOCALLOCAL_HEAP_LRULOCAL_OVERFLOWLOCAL_PERSISTENTLOCAL_PERSISTENT_OVERFLOWPARTITIONPARTITION_HEAP_LRUPARTITION_OVERFLOWPARTITION_PERSISTENTPARTITION_PERSISTENT_OVERFLOWPARTITION_PROXYPARTITION_PROXY_REDUNDANTPARTITION_REDUNDANTPARTITION_REDUNDANT_HEAP_LRUPARTITION_REDUNDANT_OVERFLOWPARTITION_REDUNDANT_PERSISTENTPARTITION_REDUNDANT_PERSISTENT_OVERFLOWREPLICATEREPLICATE_HEAP_LRUREPLICATE_OVERFLOWREPLICATE_PERSISTENTREPLICATE_PERSISTENT_OVERFLOWREPLICATE_PROXY

Page 17: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Durability

• WAL for efficient writing

• Consistent recovery

• Compaction

Persistent Regions

17

Modify k1->v5

Create k6->v6

Create k2->v2

Create k4->v4 Oplog2.crf

Member 1

Modify k4->v7 Oplog3.crf

Put k4->v7

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Region

Cache

java.util.Map

JVM

Key Value

K01 May

K02 Tim

Server 1 Server N

Page 18: Building Apps with Distributed In-Memory Computing Using Apache Geode

• A process that has a connection to the system

• A process that has created a cache

• Embeddable within your application

What is a MEMBER?

18

Client

Locator

Server

Page 19: Building Apps with Distributed In-Memory Computing Using Apache Geode

• A process connected to the Geode server(s)

• Can have a local copy of the data

• Run OQL queries on local data

• Can be notified about events on the servers

What is a CLIENT CACHE?

19

Application

GemFire Server

Region

Region

Region Client Cache

Page 20: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Clone & Build

• Start Services

• Create & Monitor Region

How to START? Easy as !!

20

gitclonehttps://github.com/apache/incubator-geodecdincubator-geode./gradlewbuild-Dskip.tests=true

cdgemfire-assembly/build/install/apache-geode./bin/gfshgfsh>startlocator--name=locatorgfsh>startserver--name=server

gfsh>createregion--name=myRegion—type=REPLICATEgfsh>start[pulse|jconsole]

1

2

3

'

1 2 3

Page 21: Building Apps with Distributed In-Memory Computing Using Apache Geode

Hands OnWilliam

21

Page 22: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Cache • Region • Member • Client Cache • Persistence • Functions • Events & Listeners • High Availability • Serialization

More (advanced) CONCEPTS…

22

Page 23: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Shared Nothing

23

Server 3Server 2Server 1

Page 24: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Shared Nothing

24

Server 3Server 2Server 1

B1

B3

B2

B1

B3

B2

Primary

Secondary

Page 25: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Shared Nothing

25

Server 3Server 2Server 1

B1

B3

B2

B1

B3

B2

Primary

Secondary

Page 26: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Shared Nothing

26

Server 3Server 2Server 1

B1

B3

B2

B1

B3

B2

Primary

Secondary

Page 27: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Shared Nothing

27

Server 3Server 2Server 1

B1

B3

B2

B1

B3

B2

Primary

Secondary

Page 28: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Shared Nothing

28

Server 3Server 2Server 1

B1

B3

B2

B1

B3

B2

Primary

Secondary

B3

B2

Server 1 waits for others when it starts

Page 29: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Shared Nothing

29

Server 3Server 2Server 1

B1

B3

B2

B1

B3

B2

Primary

Secondary

Fetches missed operations on restart

Page 30: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Operational Logs

30

Create k1->v1

Create k2->v2

Modifyk1->v3

Create k4->v4

Modify k1->v5

Create k6->v6

Member 1Put k6->v6

Oplog2.crf

Oplog1.crf

Append to operation log

Page 31: Building Apps with Distributed In-Memory Computing Using Apache Geode

Persistence - Operational Logs: Compaction

31

Create k1->v1

Create k2->v2

Modifyk1->v3

Create k4->v4

Modify k1->v5

Create k6->v6

Member 1Put k6->v6

Oplog2.crf

Oplog1.crf

Append to operation log

Copy live data forward

Page 32: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Used for distributed concurrent processing (Map/Reduce, stored procedure)

• Highly available

• Data oriented

• Member oriented

Functions

32

Submit (f1)

f1 , f2 , … fn

Execute Functions

Page 33: Building Apps with Distributed In-Memory Computing Using Apache Geode

Functions

33

Server

Server

FunctionService.onRegion.withFilter.execute ResultCollector.getResult

Server Distributed System

execute

Server

Server

6

1

result

execute

execute

result result

2

5

3

4 3 4

Server

Partitioned Region Data Store - X

Partitioned Region Data Store - Y

Partitioned Region Data Store - Z

Partitioned Region Data Accessor

Partitioned Region Data Accessor

filter = Keys X, Y Client Region

Page 34: Building Apps with Distributed In-Memory Computing Using Apache Geode

• Register Interest • Individual Keys OR RegEx for Keys • Updates Local Copy

• Examples: • region.registerInterest(“key-1”); • region1.registerInterestRegex(“[a-z]+“);

• Continuous Query • Receive Notification when Query condition met on server • Example:

• SELECT * FROM /tradeOrder t WHERE t.price > 100.00

Can be DURABLE

Events & Notifications

34

Page 35: Building Apps with Distributed In-Memory Computing Using Apache Geode

• CacheWriter / CacheListener

• AsyncEventListener (queue / batch)

• Parallel or Serial

• Conflation

Listeners

35

Page 36: Building Apps with Distributed In-Memory Computing Using Apache Geode

High Availability

36

Page 37: Building Apps with Distributed In-Memory Computing Using Apache Geode

Fixed or Flexible schema?

37

id name age pet_id

or

{id:1,name:“Fred”,age:42,pet:{name:“Barney”,type:“dino”}}

Page 38: Building Apps with Distributed In-Memory Computing Using Apache Geode

Portable Data eXchange (PDX)

38

C#, C++, Java, JSON

No IDL, no schemas, no hand-coding Schema evolution (Forward and Backward Compatible)

* domain object classes not required

|header|data||pdx|length|dsid|typeid|fields|offsets|

Page 39: Building Apps with Distributed In-Memory Computing Using Apache Geode

Efficient for queries

39

{id:1,name:“Fred”,age:42,pet:{name:“Barney”,type:“dino”}}

SELECTp.nameFROM/PersonpWHEREp.pet.type=“dino”

single field deserialization

Page 40: Building Apps with Distributed In-Memory Computing Using Apache Geode

But HOW to serialize data?

40

Benchmark: https://github.com/eishay/jvm-serializers

Page 41: Building Apps with Distributed In-Memory Computing Using Apache Geode

Schema Evolution

41

Member A Member B

Distributed Type Definitions

v2v1

Application #1

Application #2

v2 objects preserve data from missing fields

v1 objects use default values to fill in new fields

PDX provides forwards and backwards compatibility, no code required

Page 42: Building Apps with Distributed In-Memory Computing Using Apache Geode

Demo(Docker, PDX, …)

42

Page 43: Building Apps with Distributed In-Memory Computing Using Apache Geode

Code • New features • Bug fixes • Writing tests

Documentation • Wiki • Web site • User guide

How to CONTRIBUTE?

43

Community • Join the mailing list

• Ask or answer • Join our HipChat • Become a speaker • Finding bugs • Testing an RC/Beta

Page 44: Building Apps with Distributed In-Memory Computing Using Apache Geode

Website http://geode.incubator.apache.org/ JIRA

https://issues.apache.org/jira/browse/GEODE Wiki

cwiki.apache.org/confluence/display/GEODE GitHub

https://github.com/apache/incubator-geode Mailing lists

mail-archives.apache.org/mod_mbox/incubator-geode-dev/

Where to BEGIN?

44

Page 45: Building Apps with Distributed In-Memory Computing Using Apache Geode

45

Thank you!http://geode.incubator.apache.org

https://github.com/Pivotal-Open-Source-Hub