accumulo meetup 20130109

Page 1: Accumulo meetup 20130109

Securely explore your data

APACHE ACCUMULO

Adam Fuchs and John Vines

sqrrl data, inc.

January 9, 2013

Page 2: Accumulo meetup 20130109

APACHE ACCUMULO

Sorted, Distributed Key/Value Store

Based on Google’s Bigtable Design

Built on Top of Apache Hadoop and Apache ZooKeeper

Augments and Integrates With the Hadoop ecosystem

© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential 2

Page 3: Accumulo meetup 20130109

TODAY’S TALK

Overview of the Accumulo Project

Accumulo Design

Table Design Strategy

Live Demonstration

Page 4: Accumulo meetup 20130109

ACCUMULO TIMELINE

[Timeline figure; the axis runs from 2005 to 2013. Events shown:]

- Google publishes papers: GFS (2003), MapReduce (2004)
- Google publishes Bigtable paper
- NSA begins development of Accumulo
- NSA open sources Accumulo into incubation at Apache
- Accumulo becomes a top-level Apache project
- sqrrl is founded
- First sqrrl release planned

Page 5: Accumulo meetup 20130109

ACCUMULO’S STRENGTHS

Apache Accumulo excels at:

- Security
  - Cell-level security reduces the cost of application development in the presence of complex legal or policy restrictions on data use
  - Mandatory access control keeps your data safe

- Scalability
  - Proven reliability and performance at the multi-petabyte scale
  - High-performance parallel I/O library

- Adaptability
  - Flexible schema support to quickly ingest new data sources
  - Sorted key/value paradigm supports a multitude of search and analysis applications
  - Server-side programming framework: “iterator trees” support best-in-class aggregation, filtering, and complex query semantics

Page 6: Accumulo meetup 20130109

BASIC SCHEMA

An Accumulo key is a 5-tuple, consisting of:

- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning

Keys are sorted:

- Hierarchically: Row first, then column family, and so on.
- Lexicographically: Compare first byte, then second, and so on.

Accumulo stores sorted key/value pairs (entries).

Values are byte arrays.
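
The key ordering described on this slide can be illustrated with a minimal Python sketch. It is not Accumulo code: the `Key` tuple and `sort_key` helper are hypothetical names introduced here. One detail goes beyond the slide text: Accumulo orders timestamps newest first, which the sketch models by negating the timestamp.

```python
from typing import NamedTuple


class Key(NamedTuple):
    """Hypothetical stand-in for the 5-tuple key described on the slide."""
    row: bytes
    family: bytes
    qualifier: bytes
    visibility: bytes
    timestamp: int


def sort_key(k: Key):
    # Fields are compared hierarchically (row first, then family, ...).
    # Python compares bytes objects lexicographically, byte by byte.
    # Negating the timestamp makes newer versions sort first
    # (an Accumulo behaviour that is not spelled out on the slide).
    return (k.row, k.family, k.qualifier, k.visibility, -k.timestamp)


entries = {
    Key(b"John Doe", b"Friends", b"Jane Doe", b"JD", 20121201): b"",
    Key(b"Jane Doe", b"Phone Number", b"555-1212", b"", 20090115): b"",
    Key(b"Jane Doe", b"Friends", b"John Doe", b"JD", 20121130): b"",
}

sorted_keys = sorted(entries, key=sort_key)
for k in sorted_keys:
    print(k.row, k.family, k.qualifier)
```

Because all entries for one row sort next to each other, a scan over a single row reads one contiguous range.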

Page 7: Accumulo meetup 20130109

KEY/VALUE EXAMPLES

Row       Col. Fam.     Col. Qual.     Visibility    Timestamp  Value
Jane Doe  Friends       John Doe       JD            20121130
Jane Doe  Phone Number  555-1212                     20090115
John Doe  Friends       Jane Doe       JD            20121201
John Doe  Notes         PCP            PCP_JD        20120912   Patient suffers from an acute …
John Doe  Test Results  Cholesterol    JD|PCP_JD     20120912   183
John Doe  Test Results  Mental Health  JD|PSYCH_JD   20120801   Pass
John Doe  Test Results  Mental Health  PSYCH_JD      20120801   Crazy!
John Doe  Test Results  X-Ray          JD|PHYS_JD    20120513   1010110110100…
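
The visibility labels in these examples, such as JD|PCP_JD, decide which readers see an entry. The following Python sketch shows one way such a label could be evaluated against a reader's authorizations. It is a simplification, not Accumulo's evaluator: the function name `visible` is hypothetical, and the `&` operator and parentheses come from Accumulo's visibility syntax rather than from the examples above. The sketch also does no error checking on malformed labels.

```python
import re


def visible(expression: str, authorizations: set) -> bool:
    """Return True if a reader holding `authorizations` may see the entry."""
    if expression == "":
        return True  # an empty label is readable by everyone
    tokens = re.findall(r"[A-Za-z0-9_]+|[&|()]", expression)
    pos = 0

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()  # always parse, then combine
            result = result or rhs
        return result

    def parse_and() -> bool:
        nonlocal pos
        result = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            result = result and rhs
        return result

    def parse_term() -> bool:
        nonlocal pos
        token = tokens[pos]
        pos += 1
        if token == "(":
            result = parse_or()
            pos += 1  # skip the closing ")"
            return result
        return token in authorizations

    return parse_or()


# A reader holding only PCP_JD sees the cholesterol result but not the
# psychiatrist-only mental health entry.
print(visible("JD|PCP_JD", {"PCP_JD"}))
print(visible("PSYCH_JD", {"PCP_JD"}))
```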

Page 8: Accumulo meetup 20130109

VISIBILITY SYNTAX & SEMANTICS

Page 9: Accumulo meetup 20130109

TABLET ORGANIZATION

Collections of entries form tables

Tables are partitioned into Tablets

Metadata tablets hold info about other tablets, forming a 3-level hierarchy

A Tablet is a unit of work for a Tablet Server

[Diagram: tablet hierarchy, starting from the Well-Known Location (zookeeper)]

Root Tablet: -∞ to ∞

Metadata tablets:
- Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
- Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞

Data tablets:
- Table: Adam’s Table
  - Data Tablet: -∞ : thing
  - Data Tablet: thing : ∞
- Table: Encyclopedia
  - Data Tablet: -∞ : Ocelot
  - Data Tablet: Ocelot : Yak
  - Data Tablet: Yak : ∞
- Table: Foo
  - Data Tablet: -∞ to ∞
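
Since a table is partitioned into tablets by row range, finding the tablet for a row amounts to a binary search over the table's split points. The sketch below uses the Encyclopedia split points from the diagram. The function name `tablet_for_row` is hypothetical, and the code assumes that a tablet's end row is inclusive, which is how Accumulo defines tablet ranges but is not stated on the slide.

```python
from bisect import bisect_left


def tablet_for_row(split_points, row):
    """Return (start, end) of the tablet holding `row`.

    `split_points` must be sorted. None stands for -infinity or +infinity.
    The start is exclusive and the end is inclusive.
    """
    i = bisect_left(split_points, row)
    start = split_points[i - 1] if i > 0 else None
    end = split_points[i] if i < len(split_points) else None
    return start, end


encyclopedia_splits = ["Ocelot", "Yak"]

for row in ["Aardvark", "Ocelot", "Panda", "Zebra"]:
    print(row, tablet_for_row(encyclopedia_splits, row))
```

The same lookup applies one level up: a metadata tablet is found by searching the ranges recorded in the root tablet.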

Page 10: Accumulo meetup 20130109

ACCUMULO ARCHITECTURE

[Diagram: Accumulo architecture]

Components: Applications; Tablet Servers, each hosting Tablets; Master; Garbage Collector; Zookeeper (three nodes shown); HDFS

Arrow labels: Read/Write; Store/Replicate; Assign/Balance; Scan; Delete; Delegate Authority, Configs

Page 11: Accumulo meetup 20130109

TABLET DATA FLOW

[Diagram: data flow within a Tablet]

Components: In-Memory Map; Write Ahead Log (For Recovery); Sorted, Indexed Files (three shown)

Flow labels: Writes; Reads; Scan; Minor Compaction; Merging / Major Compaction; Iterator Tree (shown three times)
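
The components named on this slide follow the general Bigtable-style write path. Here is a toy Python sketch of that flow, under stated assumptions: the `Tablet` class and its method names are hypothetical, files are modelled as in-memory sorted lists, recovery from the write-ahead log is not shown, and iterator trees are reduced to a single merge step.

```python
import heapq


class Tablet:
    """Toy model: in-memory map, write-ahead log, sorted files."""

    def __init__(self):
        self.write_ahead_log = []  # would be replayed after a crash (not shown)
        self.memory = {}           # in-memory map
        self.files = []            # each file: sorted list of (key, value)

    def write(self, key, value):
        self.write_ahead_log.append((key, value))
        self.memory[key] = value

    def minor_compaction(self):
        # Flush the in-memory map to a new sorted file.
        if self.memory:
            self.files.append(sorted(self.memory.items()))
            self.memory = {}

    def major_compaction(self):
        # Merge all sorted files into one.
        self.files = [list(self._merge(self.files))]

    def scan(self):
        # Reads merge the files with the current in-memory map.
        return list(self._merge(self.files + [sorted(self.memory.items())]))

    @staticmethod
    def _merge(sources):
        # `sources` is ordered oldest to newest. Tagging each entry with
        # the negated source age makes the newest value for a key come first.
        tagged = [[(k, -age, v) for k, v in src]
                  for age, src in enumerate(sources)]
        last = object()
        for k, _, v in heapq.merge(*tagged):
            if k != last:
                yield k, v
                last = k


t = Tablet()
t.write("b", "1")
t.write("a", "1")
t.minor_compaction()
t.write("b", "2")
print(t.scan())
```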

Page 12: Accumulo meetup 20130109

ITERATOR FRAMEWORK

Iterator Operations:

- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
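
Several of these operations can be pictured as stages stacked on a sorted stream of entries, each stage consuming the output of the one below it. The Python sketch below stacks versioning, filtering, and aggregation using generators. It only mirrors the idea: Accumulo's own iterators are Java classes, and the function names and the simplified (row, column, timestamp, value) entry layout are assumptions made for illustration.

```python
from itertools import groupby


def versioning(entries, max_versions=1):
    """Keep only the newest `max_versions` entries per (row, column)."""
    for _, group in groupby(entries, key=lambda e: (e[0], e[1])):
        for i, entry in enumerate(group):
            if i < max_versions:
                yield entry


def filtering(entries, predicate):
    """Pass through only the entries that satisfy `predicate`."""
    return (e for e in entries if predicate(e))


def summing(entries):
    """Aggregate: sum the integer values of each row."""
    for row, group in groupby(entries, key=lambda e: e[0]):
        yield row, sum(e[3] for e in group)


# (row, column, timestamp, value), sorted by row, then column,
# then newest timestamp first.
source = [
    ("a", "clicks", 3, 5),
    ("a", "clicks", 2, 4),
    ("a", "views", 1, 10),
    ("b", "clicks", 1, 7),
]

stack = summing(
    filtering(versioning(iter(source)), lambda e: e[1] == "clicks")
)
result = list(stack)
print(result)
```

Each stage relies on its input being sorted, which is why the sorted key/value model makes this kind of server-side composition cheap.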

Page 13: Accumulo meetup 20130109

CLIENT API

[Diagram: client API object graph]

Instance: new ZooKeeperInstance(...) or new MockInstance()

Connector: obtained via getConnector(auth info...)

From the Connector:
- Scanner via createScanner(...)
- BatchScanner via createBatchScanner(...)
- BatchWriter via createBatchWriter(...)
- TableOperations
- InstanceOperations
- SecurityOperations

Scanner and BatchScanner: iterator() returns Map.Entry objects holding a Key and a Value

BatchWriter: addMutation(...) takes a Mutation

Other classes shown: Range, IteratorOption, Authorizations

Page 14: Accumulo meetup 20130109

TABLE DESIGN

No built-in secondary indices

Sort Order Index

Basic design pattern: forward and inverted index tables

Table:             Forward Index  Inverted Index
Row:               <UUID>         <Term>
Column Family:     <Type>         <Type> + <Field>
Column Qualifier:  <Field>        <UUID>
Value:             <Term>         <Digest of Event>

Additional table design patterns

- Graphs
- Document-distributed indexing
- Multi-dimensional index
- Custom index
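
The forward and inverted index layout above can be sketched in Python as follows. This is an illustration only: the function names, the sample record, and the use of a literal "+" to join type and field in the inverted index's column family are assumptions, not something the slide specifies.

```python
def index_entries(uuid, event_type, fields, digest):
    """Build (row, column family, column qualifier, value) tuples
    for the forward and inverted index tables."""
    forward, inverted = [], []
    for field, term in sorted(fields.items()):
        # Forward index: look up an event's fields by its UUID.
        forward.append((uuid, event_type, field, term))
        # Inverted index: look up UUIDs by term.
        inverted.append((term, f"{event_type}+{field}", uuid, digest))
    return forward, sorted(inverted)


def lookup(inverted, term):
    """Return the UUIDs of all events containing `term`."""
    return [qualifier for row, _, qualifier, _ in inverted if row == term]


# Hypothetical sample record.
forward, inverted = index_entries(
    "u1", "person", {"name": "jane", "city": "boston"}, "d1"
)
print(forward)
print(inverted)
print(lookup(inverted, "jane"))
```

A query first scans the inverted index by term to collect UUIDs, then reads the full records from the forward index by UUID.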

Page 15: Accumulo meetup 20130109

DEMO TIME!

Page 16: Accumulo meetup 20130109

OPEN SOURCE PROJECT

Apache Software Foundation project since October 2011

site: http://accumulo.apache.org

jira: https://issues.apache.org/jira/browse/ACCUMULO

lists: http://accumulo.apache.org/mailing_list.html

Page 17: Accumulo meetup 20130109

CURRENT CONTRIBUTORS

Page 18: Accumulo meetup 20130109

CONTACT

Adam Fuchs, CTO

[email protected]

John Vines, Director of Ecosystems

[email protected]

sqrrl data, Inc.

www.sqrrl.com

@[email protected]