Slides for the Apache Geode hands-on meetup and hackathon announcement
TRANSCRIPT
Hands-on Introduction & Hackathon Kickoff
Ashvin Agrawal (@aasoj), William Markito (@william_markito)
Powered by Pivotal Open Source Hub (POSH)
(incubating)
• Hackathon Details
• Apache Geode Introduction
  • History
  • Key features and components
  • Roadmap
• Hands-on lab
  • Build & run
  • Starting a cluster
  • Using Docker for clustering
  • Your first app
• Q&A
Agenda
A distributed, memory-based data management platform for data-oriented apps that need:
• high performance, scalability, resiliency and continuous availability
• fast access to critical data sets
• location-aware distributed data processing
• event-driven data architecture
Introduction
One size fits all?
Cost of sorting is n log(n)
• Data quality and quantity differences
• Eventual consistency
• Response time expectations
• Scalability challenges: disk, memory, network and external systems
• 1000+ systems in production (real customers)
• Cutting-edge use cases
Incubating… but rock solid
2004 2008 2014
• Massive increase in data volumes
• Falling margins per transaction
• Increasing cost of IT maintenance
• Need for elasticity in systems
• Financial Services Providers (every major Wall Street bank)
• Department of Defense
• Real-time response needs
• Time to market constraints
• Need for flexible data models across enterprise
• Distributed development
• Persistence + In-memory
• Global data visibility needs
• Fast ingest needs for data
• Need to allow devices to hook into enterprise data
• Always on
• Largest travel portal
• Airlines
• Trade clearing
• Online gambling
• Largest telcos
• Large manufacturers
• Largest payroll processor
• Auto insurance giants
• Largest rail systems on earth
• 17 billion records in memory
  • GE Power & Water's Remote Monitoring & Diagnostics Center
• 3 TB operational data in-memory, 400 TB archived
  • China Railways
• 4.6 million transactions a day / 40K transactions a second
  • China Railways
Incubating… but rock solid
• Performance optimized persistence
• Configurable consistency
• Elastic capacity
• Latency minimizing distribution
• Heterogeneous deployment
Designed for High Performance
Approximate access latency: L2 cache ~10 ns, memory ~100 ns, network <1 ms, disk ~10 ms
• Cache
• In-memory storage and management for your data
• Configurable through XML, Spring, Java API or CLI
• Collection of Regions
Concepts
[Diagram: a JVM hosting a Cache that contains several Regions]
• Region
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant across cache members
Concepts
[Diagram: a Region inside a Cache in a JVM, shown as a java.util.Map with entries K01 → May, K02 → Tim]
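The "observable Map" idea from the Region slide can be sketched with the JDK alone. This is a hypothetical stand-in, not the Geode API: the class name ObservableRegionSketch and its addListener method are made up for illustration. It only shows a key/value store that notifies listeners on every put.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a Region: a Map plus put-event listeners.
// This is NOT the Geode API; all names here are assumptions.
class ObservableRegionSketch<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final List<BiConsumer<K, V>> listeners = new CopyOnWriteArrayList<>();

    void addListener(BiConsumer<K, V> listener) {
        listeners.add(listener);
    }

    // Store the value, then notify every registered listener of the new entry.
    V put(K key, V value) {
        V previous = entries.put(key, value);
        for (BiConsumer<K, V> listener : listeners) {
            listener.accept(key, value);
        }
        return previous;
    }

    V get(K key) {
        return entries.get(key);
    }

    public static void main(String[] args) {
        ObservableRegionSketch<String, String> region = new ObservableRegionSketch<>();
        List<String> seen = new ArrayList<>();
        region.addListener((k, v) -> seen.add(k + "=" + v));
        region.put("K01", "May");
        region.put("K02", "Tim");
        System.out.println(region.get("K01") + " " + seen);
    }
}
```

In Geode itself the same role is played by listeners attached to a Region; the sketch only mirrors the shape of the idea.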
• Region
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU
• Overflow
Concepts
[Diagram: two JVMs, each with a Cache holding a Region (java.util.Map) with entries K01 → May, K02 → Tim]
LOCAL LOCAL_HEAP_LRU LOCAL_OVERFLOW LOCAL_PERSISTENT LOCAL_PERSISTENT_OVERFLOW
PARTITION PARTITION_HEAP_LRU PARTITION_OVERFLOW PARTITION_PERSISTENT PARTITION_PERSISTENT_OVERFLOW PARTITION_PROXY PARTITION_PROXY_REDUNDANT PARTITION_REDUNDANT PARTITION_REDUNDANT_HEAP_LRU PARTITION_REDUNDANT_OVERFLOW PARTITION_REDUNDANT_PERSISTENT PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE REPLICATE_HEAP_LRU REPLICATE_OVERFLOW REPLICATE_PERSISTENT REPLICATE_PERSISTENT_OVERFLOW REPLICATE_PROXY
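To make the LRU and overflow options concrete, here is a minimal JDK-only sketch. It is an assumption-laden illustration, not how Geode implements its HEAP_LRU or OVERFLOW region types: the class LruOverflowSketch is hypothetical, eviction is by entry count rather than heap usage, and a second in-memory map stands in for the disk store.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of an LRU region with overflow; not the Geode API.
class LruOverflowSketch {
    // Stands in for the disk store that receives evicted entries.
    private final Map<String, String> overflow = new HashMap<>();
    private final LinkedHashMap<String, String> memory;

    LruOverflowSketch(int maxInMemory) {
        // accessOrder=true keeps the least recently used entry first.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > maxInMemory) {
                    overflow.put(eldest.getKey(), eldest.getValue());
                    return true; // evict from memory; the value survives in overflow
                }
                return false;
            }
        };
    }

    void put(String key, String value) {
        overflow.remove(key); // the in-memory copy is now the current one
        memory.put(key, value);
    }

    String get(String key) {
        String value = memory.get(key);
        if (value == null && overflow.containsKey(key)) {
            value = overflow.remove(key);
            memory.put(key, value); // fault the entry back in; may evict another
        }
        return value;
    }

    boolean inMemory(String key) {
        return memory.containsKey(key);
    }

    public static void main(String[] args) {
        LruOverflowSketch region = new LruOverflowSketch(2);
        region.put("K01", "May");
        region.put("K02", "Tim");
        region.get("K01");        // K01 becomes the most recently used entry
        region.put("K03", "Ann"); // K02 is least recently used and overflows
        System.out.println(region.inMemory("K02") + " " + region.get("K02"));
    }
}
```

The point of the overflow variant is that eviction frees memory without losing data: a later get still finds the value, at the cost of a slower read.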
• Persistent Regions
• Durability
• Write-ahead log (WAL) for efficient writing
• Consistent recovery
• Compaction
Concepts
[Diagram: on Member 1, Put k4->v7 is appended to Oplog3.crf as Modify k4->v7; Oplog2.crf holds Modify k1->v5, Create k6->v6, Create k2->v2, Create k4->v4]
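The append-only log, recovery and compaction bullets can be sketched as follows. This is a simplified illustration under stated assumptions, not Geode's oplog format: OplogSketch is a hypothetical class, the log lives in memory instead of in .crf files, and the order of the sample operations is assumed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of an append-only operation log; not Geode's oplog format.
class OplogSketch {
    // Each entry records one create/modify of key -> value.
    private final List<Map.Entry<String, String>> log = new ArrayList<>();

    // Writes are appended sequentially, never updated in place.
    void append(String key, String value) {
        log.add(Map.entry(key, value));
    }

    // Recovery replays the log in order; the last write per key wins.
    Map<String, String> recover() {
        Map<String, String> region = new LinkedHashMap<>();
        for (Map.Entry<String, String> op : log) {
            region.put(op.getKey(), op.getValue());
        }
        return region;
    }

    // Compaction rewrites the log, keeping only the live value of each key.
    void compact() {
        Map<String, String> live = recover();
        log.clear();
        live.forEach(this::append);
    }

    int size() {
        return log.size();
    }

    public static void main(String[] args) {
        OplogSketch oplog = new OplogSketch();
        oplog.append("k4", "v4");
        oplog.append("k2", "v2");
        oplog.append("k6", "v6");
        oplog.append("k1", "v5");
        oplog.append("k4", "v7"); // supersedes the earlier k4 record
        oplog.compact();
        System.out.println(oplog.size() + " " + oplog.recover());
    }
}
```

Appending keeps writes sequential, which is why a log of this shape is cheap to write; compaction is the price paid later to reclaim the space held by superseded records.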
[Diagram: Server 1 … Server N, each a JVM with a Cache holding a Region (java.util.Map) with entries K01 → May, K02 → Tim]
• Member
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application
Concepts
[Diagram: Client, Locator, Server]
• Client cache
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Can be notified about events on the servers
Concepts
[Diagram: an Application with a Client Cache connected to a GemFire Server hosting several Regions]
• Functions
• Used for distributed concurrent processing (Map/Reduce, stored procedure)
• Highly available
• Data oriented
• Member oriented
Concepts
[Diagram: Submit (f1); Execute Functions f1, f2, … fn]
Concepts
[Diagram: function execution on a partitioned region, steps numbered 1 to 6. The client calls FunctionService.onRegion.withFilter.execute on a Client Region with filter = Keys X, Y. Inside the server distributed system, which shows two Partitioned Region Data Accessors and Partitioned Region Data Stores X, Y and Z, execute calls are forwarded to data stores and results are returned. The client reads them with ResultCollector.getResult.]
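The data-oriented execution shown in the diagram can be approximated with a JDK-only sketch. It does not use the Geode FunctionService: FilteredFunctionSketch is a hypothetical class, the hash-based routing rule is an assumption, and the returned list merely plays the role of the collected results. What it does show is a function running only on the stores that own the filter keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch of data-aware function routing; not the Geode FunctionService.
class FilteredFunctionSketch {
    // Each inner map stands in for one partitioned-region data store.
    private final List<Map<String, Integer>> stores = new ArrayList<>();

    FilteredFunctionSketch(int storeCount) {
        for (int i = 0; i < storeCount; i++) {
            stores.add(new HashMap<>());
        }
    }

    // Assumed routing rule: the key's hash picks the owning store.
    private int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), stores.size());
    }

    void put(String key, int value) {
        stores.get(ownerOf(key)).put(key, value);
    }

    // Run fn once per store that owns at least one filter key, passing it only
    // that store's filtered entries; returns one partial result per store.
    <R> List<R> execute(Set<String> filter, Function<Map<String, Integer>, R> fn) {
        Map<Integer, Map<String, Integer>> localViews = new HashMap<>();
        for (String key : filter) {
            int owner = ownerOf(key);
            Integer value = stores.get(owner).get(key);
            if (value != null) {
                localViews.computeIfAbsent(owner, o -> new HashMap<>()).put(key, value);
            }
        }
        List<R> results = new ArrayList<>();
        for (Map<String, Integer> view : localViews.values()) {
            results.add(fn.apply(view));
        }
        return results;
    }

    public static void main(String[] args) {
        FilteredFunctionSketch region = new FilteredFunctionSketch(3);
        region.put("X", 1);
        region.put("Y", 2);
        region.put("Z", 3);
        List<Integer> partialSums = region.execute(Set.of("X", "Y"),
                view -> view.values().stream().mapToInt(Integer::intValue).sum());
        System.out.println(partialSums.stream().mapToInt(Integer::intValue).sum());
    }
}
```

The caller combines the partial results itself, which is the Map/Reduce shape mentioned on the Functions slide: the work moves to the data, and only small results travel back.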
• Functions
• Listeners
• CacheWriter / CacheListener
• AsyncEventListener (queue / batch)
• Parallel or Serial
• Conflation
Concepts
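Conflation in a queue-and-batch listener can be illustrated with a small JDK-only sketch. It is not Geode's AsyncEventListener or its queue: ConflatingQueueSketch is a hypothetical class, and the policy of moving a conflated key to the tail of the queue is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a conflating event queue; not Geode's async event queue.
class ConflatingQueueSketch {
    // Insertion-ordered map: at most one pending event per key.
    private final LinkedHashMap<String, String> pending = new LinkedHashMap<>();

    // With conflation, a newer event for a key replaces the queued older one.
    void enqueue(String key, String value) {
        pending.remove(key); // drop the stale event so the new one queues last
        pending.put(key, value);
    }

    // Hand the listener up to batchSize events in queue order, removing them.
    List<Map.Entry<String, String>> nextBatch(int batchSize) {
        List<Map.Entry<String, String>> batch = new ArrayList<>();
        Iterator<Map.Entry<String, String>> it = pending.entrySet().iterator();
        while (it.hasNext() && batch.size() < batchSize) {
            Map.Entry<String, String> event = it.next();
            batch.add(Map.entry(event.getKey(), event.getValue()));
            it.remove();
        }
        return batch;
    }

    public static void main(String[] args) {
        ConflatingQueueSketch queue = new ConflatingQueueSketch();
        queue.enqueue("k1", "v1");
        queue.enqueue("k2", "v2");
        queue.enqueue("k1", "v3"); // conflates with the pending k1 event
        System.out.println(queue.nextBatch(10));
    }
}
```

Conflation trades event history for throughput: the listener sees only the latest value per key, which suits write-behind to an external store but not audit logging.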
• Clone & Build
Hands-on: Build & run
git clone https://github.com/apache/incubator-geode
cd incubator-geode
./gradlew build -Dskip.tests=true
• Start a server
cd gemfire-assembly/build/install/apache-geode
./bin/gfsh
gfsh> start locator --name=locator
gfsh> start server --name=server
gfsh> create region --name=myRegion --type=REPLICATE
• Containers
  • FreeBSD Jails (2000)
  • Solaris Zones (2004)
  • Docker (2013)
• Operating system level virtualization
• Isolated user space instances
* https://linuxcontainers.org/
Hands-on: Docker
Container vs VM
“...while the hypervisor abstracts the entire device, containers just abstract the operating system kernel”
Hands-on: Docker & Compose
• Single instance
docker run -it apachegeode/geode:nightly gfsh
• Cluster
docker-compose up
• Scale
docker-compose scale server=3
Hands-on: Application
• Teeny URL
• Fast response time
• Statistics
  • Hits
  • User agent?
  • IPs?
• URL will last for 5 minutes
• Distribute data & load
• Highly scalable
createURL
getURLstats
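One possible shape for the two lab operations, sketched with the JDK only so it runs without a cluster. Everything except the names createURL and getURLstats is an assumption: the class TeenyUrlSketch, the resolve method, the base-36 short codes and the explicit nowMillis parameter are hypothetical. In the actual lab the map would presumably be a Geode region with entry expiration, which this sketch does not attempt to reproduce.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the lab app; no Geode API is used here.
class TeenyUrlSketch {
    static final long TTL_MILLIS = 5 * 60 * 1000L; // "URL will last for 5 minutes"

    private static final class Link {
        final String target;
        final long expiresAt;
        final AtomicLong hits = new AtomicLong();

        Link(String target, long expiresAt) {
            this.target = target;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Link> links = new ConcurrentHashMap<>();
    private final AtomicLong nextId = new AtomicLong();

    // Returns a short code for the target; the clock is passed in for testability.
    String createURL(String target, long nowMillis) {
        String code = Long.toString(nextId.incrementAndGet(), 36);
        links.put(code, new Link(target, nowMillis + TTL_MILLIS));
        return code;
    }

    // Resolves a code and counts the hit; returns null once the entry has expired.
    String resolve(String code, long nowMillis) {
        Link link = links.get(code);
        if (link == null || nowMillis >= link.expiresAt) {
            links.remove(code);
            return null;
        }
        link.hits.incrementAndGet();
        return link.target;
    }

    // Returns the hit count, or -1 for unknown or expired codes.
    long getURLstats(String code, long nowMillis) {
        Link link = links.get(code);
        return (link == null || nowMillis >= link.expiresAt) ? -1 : link.hits.get();
    }

    public static void main(String[] args) {
        TeenyUrlSketch app = new TeenyUrlSketch();
        long now = System.currentTimeMillis();
        String code = app.createURL("http://geode.incubator.apache.org", now);
        app.resolve(code, now + 1000);
        System.out.println(code + " hits=" + app.getURLstats(code, now + 2000));
    }
}
```

Only the hit counter is covered; the user agent and IP statistics marked with question marks on the slide are left open here as well.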
• HDFS Persistence
• Off-heap memory storage
• Lucene Search
• Spark Integration
• Cloud Foundry service
Roadmap
• Code
  • New features
  • Bug fixes
  • Writing tests
• Documentation
  • Wiki
  • Web site
  • User guide
How to Contribute
• Community
  • Join the mailing list
    • Ask or answer
  • Join our HipChat
  • Become a speaker
  • Finding bugs
  • Testing an RC/Beta
• JIRA https://issues.apache.org/jira/browse/GEODE
• Wiki cwiki.apache.org/confluence/display/GEODE
• GitHub https://github.com/apache/incubator-geode
• Mailing lists mail-archives.apache.org/mod_mbox/incubator-geode-dev/
Links
Thank you
http://geode.incubator.apache.org
https://github.com/Pivotal-Open-Source-Hub