Adam Fuchs' Accumulo Talk at NoSQL Now! 2013
DESCRIPTION
Adam Fuchs provides an overview of Accumulo and Sqrrl Enterprise at the 2013 NoSQL Now! conference.
TRANSCRIPT
Securely explore your data
SQRRL ENTERPRISE +
APACHE ACCUMULO:
A secure, scalable, real-time analysis framework
Adam Fuchs, CTO
Sqrrl Data, Inc.
August 21, 2013
© 2013 Sqrrl | All Rights Reserved | Proprietary and Confidential
OUTLINE
Two Halves of “Real-Time”
Accumulo and Sqrrl Technology
Data-Centric Security
Table Designs
Performance Benchmarks
TWO HALVES OF REAL-TIME
Data-Driven Real-Time: reduce event-to-reaction time
Query-Driven Real-Time: reduce ingest-to-query latency
Data-Driven + Query-Driven Real-Time Ecosystem
1. SPE (stream processing engine) queries NoSQL to enrich streaming data
2. SPE persists results in NoSQL for future queries
3. SPE takes action automatically
4. SPE issues data-driven alerts
5. Sqrrl provides context for dashboards
6. Analysis tools use Sqrrl to search and manipulate historical data
[Figure: ecosystem diagram linking Data, the SPE, NoSQL+, Actions, Dashboards, and Interactive Analysis Tools (Discovery + Forensics) through the numbered flows 1 to 6.]
This talk focuses on the database.
[Figure: the same ecosystem diagram and numbered flows 1 to 6 as on the previous slide.]
OUTLINE
Two Halves of “Real-Time”
Accumulo and Sqrrl Technology
Data-Centric Security
Table Designs
Performance Benchmarks
ACCUMULO DATA FORMAT
Accumulo Key/Value Example
An Accumulo key is a 5-tuple, consisting of:
- Row: Controls Atomicity
- Column Family: Controls Locality
- Column Qualifier: Controls Uniqueness
- Visibility Label: Controls Access
- Timestamp: Controls Versioning
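To make the 5-tuple concrete, here is a minimal Java sketch of how such keys sort. It is a hypothetical stand-in, not Accumulo's `Key` class: it compares fields as Java strings (Accumulo compares bytes) and sorts timestamps newest first, so the most recent version of a cell is encountered first. Class and field names are illustrative only.

```java
import java.util.Comparator;

/**
 * Hypothetical stand-in for an Accumulo key (not org.apache.accumulo.core.data.Key).
 * Fields compare in order; the timestamp sorts newest first.
 */
final class FiveTupleKey implements Comparable<FiveTupleKey> {
    final String row;         // atomicity: a mutation applies to exactly one row
    final String family;      // locality: families can be placed in locality groups
    final String qualifier;   // uniqueness within a row and family
    final String visibility;  // access: a boolean expression over authorizations
    final long timestamp;     // versioning

    FiveTupleKey(String row, String family, String qualifier, String visibility, long timestamp) {
        this.row = row;
        this.family = family;
        this.qualifier = qualifier;
        this.visibility = visibility;
        this.timestamp = timestamp;
    }

    private static final Comparator<FiveTupleKey> ORDER =
            Comparator.comparing((FiveTupleKey k) -> k.row)
                    .thenComparing(k -> k.family)
                    .thenComparing(k -> k.qualifier)
                    .thenComparing(k -> k.visibility)
                    // newest first: reverse the natural order of timestamps
                    .thenComparing(Comparator.comparingLong((FiveTupleKey k) -> k.timestamp).reversed());

    @Override
    public int compareTo(FiveTupleKey other) {
        return ORDER.compare(this, other);
    }

    @Override
    public String toString() {
        return row + " " + family + ":" + qualifier + " [" + visibility + "] " + timestamp;
    }
}
```

Sorting a list of these keys puts all versions of one cell next to each other with the latest first, which is what lets a versioning step keep just the first entry it sees.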
ACCUMULO TABLETS
Collections of KV pairs form Tables
Tables are partitioned into Tablets
Metadata tablets hold info about other tablets, forming a 3-level hierarchy
A Tablet is a unit of work for a Tablet Server
[Figure: three-level tablet hierarchy. The Root Tablet (-∞ to ∞), found at a well-known location in ZooKeeper, points to Metadata Tablet 1 (-∞ to “Encyclopedia:Ocelot”) and Metadata Tablet 2 (“Encyclopedia:Ocelot” to ∞). These point to the data tablets of three tables: Adam’s Table (-∞ : thing, thing : ∞), Encyclopedia (-∞ : Ocelot, Ocelot : Yak, Yak : ∞), and Foo (-∞ to ∞).]
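The lookup this hierarchy supports can be sketched as a sorted map from each tablet's end row to the tablet. This is a hypothetical model, not Accumulo's client code: it assumes a tablet holds the rows in (previous end row, end row] and that the last tablet runs to +∞. The same lookup applies at each level, first against the metadata tablets (keys like the slide's `Encyclopedia:Ocelot`) and then against the data tablets.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of one table's tablet boundaries (not Accumulo's metadata code).
 * Each tablet covers rows in (previous end row, end row]; the last tablet ends at +infinity.
 */
final class TabletLookupSketch {
    private final TreeMap<String, String> tabletsByEndRow = new TreeMap<>();
    private final String lastTablet; // the tablet whose end row is +infinity

    TabletLookupSketch(String lastTablet) {
        this.lastTablet = lastTablet;
    }

    /** Registers a tablet by its inclusive end row; returns this for chaining. */
    TabletLookupSketch addTablet(String endRow, String tabletName) {
        tabletsByEndRow.put(endRow, tabletName);
        return this;
    }

    /** The owning tablet is the one with the smallest end row >= row, else the last tablet. */
    String locate(String row) {
        Map.Entry<String, String> owner = tabletsByEndRow.ceilingEntry(row);
        return owner != null ? owner.getValue() : lastTablet;
    }
}
```

For the Encyclopedia table, `new TabletLookupSketch("(Yak, +inf)").addTablet("Ocelot", "(-inf, Ocelot]").addTablet("Yak", "(Ocelot, Yak]")` sends `Penguin` to `(Ocelot, Yak]` and `Zebra` to `(Yak, +inf)`.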
ACCUMULO PROCESSES
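[Figure: Applications read from and write to three Tablet Servers, each hosting Tablets (Read/Write); tablet data is stored and replicated in HDFS (Store/Replicate); the Master assigns and balances tablets across Tablet Servers (Assign/Balance); and a three-node ZooKeeper ensemble coordinates the processes (Delegate Authority).]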
TABLET DATA FLOW
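[Figure: tablet data flow. Writes go to the In-Memory Map and to the Write-Ahead Log (for recovery). A Minor Compaction flushes the In-Memory Map to a Sorted, Indexed File, and Merging / Major Compaction combines several files into one. Reads (Scan) merge the In-Memory Map with the files. Minor compactions, major compactions, and scans each pass data through an Iterator Tree.]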
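The flow in the diagram can be approximated in a few dozen lines of Java. This is a toy model under stated assumptions: `TreeMap`s stand in for both the in-memory map and the sorted, indexed files, the write-ahead log is omitted, and merging uses `putAll` (newest source wins) instead of the streaming merge through an iterator tree that a real tablet server performs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Toy model of a tablet's data flow (hypothetical; not Accumulo code).
 * Newer sources overwrite older ones, standing in for a versioning iterator.
 */
final class TabletDataFlowSketch {
    private TreeMap<String, String> inMemoryMap = new TreeMap<>();
    private final List<TreeMap<String, String>> files = new ArrayList<>(); // oldest first

    void write(String key, String value) {
        // A real tablet server also appends each write to the write-ahead log for recovery.
        inMemoryMap.put(key, value);
    }

    /** Minor compaction: the in-memory map becomes a new immutable sorted "file". */
    void minorCompaction() {
        if (inMemoryMap.isEmpty()) return;
        files.add(inMemoryMap);
        inMemoryMap = new TreeMap<>();
    }

    /** Major compaction: merge all files into one. */
    void majorCompaction() {
        if (files.size() <= 1) return;
        TreeMap<String, String> merged = mergeFiles();
        files.clear();
        files.add(merged);
    }

    /** Scan view: all files (oldest to newest), then the in-memory map, which is newest. */
    TreeMap<String, String> scan() {
        TreeMap<String, String> view = mergeFiles();
        view.putAll(inMemoryMap);
        return view;
    }

    int fileCount() {
        return files.size();
    }

    private TreeMap<String, String> mergeFiles() {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : files) {
            merged.putAll(file); // later (newer) files overwrite older values
        }
        return merged;
    }
}
```

The point of the design is that writes never touch files on disk: they only append to memory, and the cost of sorting and merging is deferred to compactions.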
WORD COUNT:
[Figure: words from an Input Corpus are counted by a Summing Aggregating Iterator.]
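Word count shows why an aggregating iterator helps: ingest writes a blind `(word, 1)` entry per token and never reads the current count, and the sum is produced when entries are combined. The Java below is a hypothetical sketch of that idea, not Accumulo's API (Accumulo ships a `SummingCombiner` for this purpose). Because summing is associative, partial sums computed during compactions can be summed again at scan time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical word-count sketch: blind writes plus a summing combine step. */
final class WordCountSketch {
    /** Ingest: emit a ("word", 1) entry per token without reading existing counts. */
    static List<Map.Entry<String, Long>> ingest(String corpus) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (String word : corpus.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                entries.add(Map.entry(word, 1L));
            }
        }
        return entries;
    }

    /** Summing "iterator": collapses all values sharing a key into their sum. */
    static TreeMap<String, Long> combine(List<Map.Entry<String, Long>> entries) {
        TreeMap<String, Long> sums = new TreeMap<>();
        for (Map.Entry<String, Long> e : entries) {
            sums.merge(e.getKey(), e.getValue(), Long::sum);
        }
        return sums;
    }
}
```

For example, `combine(ingest("The cat and the hat."))` gives `{and=1, cat=1, hat=1, the=2}`.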
ITERATOR FRAMEWORK
Iterator Operations:
- File Reads
- Block Caching
- Merging
- Deletion
- Isolation
- Locality Groups
- Range Selection
- Column Selection
- Cell-level Security
- Versioning
- Filtering
- Aggregation
- Partitioned Joins
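Many of these operations are layered as iterators, each wrapping the one below it. The Java sketch below is hypothetical (it uses `java.util.Iterator`, not Accumulo's `SortedKeyValueIterator` interface) and stacks two of the listed operations: versioning, which keeps the newest cell per row and column, and filtering, used here for column selection. It assumes the source is sorted by row and column with the newest timestamp first.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Hypothetical cell: combined row/column, timestamp, value (not Accumulo's Key/Value). */
final class SketchCell {
    final String rowCol;
    final long ts;
    final String value;

    SketchCell(String rowCol, long ts, String value) {
        this.rowCol = rowCol;
        this.ts = ts;
        this.value = value;
    }

    @Override
    public String toString() {
        return rowCol + "@" + ts + "=" + value;
    }
}

/** Base layer: wraps a source iterator and pre-fetches the next cell to emit. */
abstract class StackedIterator implements Iterator<SketchCell> {
    protected final Iterator<SketchCell> source;
    private SketchCell next;

    StackedIterator(Iterator<SketchCell> source) {
        this.source = source;
    }

    /** Returns the next cell this layer emits, or null when exhausted. */
    protected abstract SketchCell fetch();

    @Override
    public boolean hasNext() {
        if (next == null) next = fetch();
        return next != null;
    }

    @Override
    public SketchCell next() {
        if (!hasNext()) throw new NoSuchElementException();
        SketchCell c = next;
        next = null;
        return c;
    }
}

/** Filtering layer: drops cells that fail a predicate (e.g. column selection). */
final class FilterLayer extends StackedIterator {
    private final Predicate<SketchCell> keep;

    FilterLayer(Iterator<SketchCell> source, Predicate<SketchCell> keep) {
        super(source);
        this.keep = keep;
    }

    @Override
    protected SketchCell fetch() {
        while (source.hasNext()) {
            SketchCell c = source.next();
            if (keep.test(c)) return c;
        }
        return null;
    }
}

/** Versioning layer: keeps only the first (newest) cell for each row/column. */
final class VersioningLayer extends StackedIterator {
    private String lastRowCol;

    VersioningLayer(Iterator<SketchCell> source) {
        super(source);
    }

    @Override
    protected SketchCell fetch() {
        while (source.hasNext()) {
            SketchCell c = source.next();
            if (!c.rowCol.equals(lastRowCol)) {
                lastRowCol = c.rowCol;
                return c;
            }
        }
        return null;
    }
}
```

Stacking `new FilterLayer(new VersioningLayer(cells), c -> c.rowCol.endsWith(":a"))` returns only the newest version of each `:a` column; each layer sees a sorted stream and emits a sorted stream, which is what makes layers composable.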
ACCUMULO LATENCIES
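[Figure: the ingest and query path through the tablet servers. Ingesters send input through a BatchWriter into the In-Memory Map; Queriers read output through a Scanner or BatchScanner, with Scan Iterators applied over the In-Memory Map and RFiles. Compaction Iterators run as the In-Memory Map is flushed to RFiles and as RFiles are merged. Write, buffering, and scan steps take ~ms; compactions take ms to min.]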
ACCUMULO THROUGHPUT
[Figure: the same ingest and query pipeline as the previous slide.]
Read-modify-write (R-M-W) latency: ~ms
More than 1K entries/s is challenging with R-M-W
Ingest: up to 500K entries/s per node
Scan: up to 1M entries/s per node
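Where the ~1K entries/s ceiling comes from can be seen with one line of arithmetic. Assuming (for illustration) that read-modify-write updates to the same key are serialized and each one takes about 1 ms end to end, the per-key update rate is capped at the reciprocal of that latency:

```latex
r_{\max} \approx \frac{1}{t_{\mathrm{RMW}}} = \frac{1}{1\ \mathrm{ms}} = \frac{1}{10^{-3}\ \mathrm{s}} = 10^{3}\ \text{updates/s per key}
```

Blind writes that an iterator combines later do not wait on a prior read, so they are not bound by this per-key ceiling, which is consistent with the much higher ingest rates quoted on the slide.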
SQRRL ENTERPRISE
Built on Apache Accumulo
[Figure: the Sqrrl Enterprise stack. Exploratory / Operational Apps and Bulk Processing Integration sit at the top, using Graph + Document I/O. The Sqrrl Server exposes the Sqrrl API over Apache Thrift RPC (JSON, Graph, Aggregation, Search, etc.) and is Sqrrl proprietary: automated indexing, custom iterators, Lucene integration, and security extensions. Below it, Accumulo RPC carries sorted key/value I/O and Hadoop RPC carries file I/O. The diagram labels the lower layers “Open source (including Sqrrl contributions)” and “Open source or commercial distributions”.]
OUTLINE
Two Halves of “Real-Time”
Accumulo and Sqrrl Technology
Data-Centric Security
Table Designs
Performance Benchmarks
DATA-CENTRIC SECURITY
Definition: Data carries with it information that is required to make policy decisions on its releasability.
[Figure: User 1 and User 2 both querying Sqrrl/Accumulo.]
SECURITY
Example Accumulo Key/Value Pairs
As of this 2013 talk, Accumulo is the only NoSQL database with cell-level access controls
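Cell-level access control works by attaching a visibility expression to each key and checking it against the reader's authorizations. The Java below is a simplified, hypothetical evaluator for expressions such as `admin|(audit&pii)`; it is not Accumulo's `ColumnVisibility`/`VisibilityEvaluator` code. It supports terms, `&`, `|`, and parentheses, rejects mixing `&` and `|` without parentheses, and treats an empty label as visible to everyone; quoted terms are not handled.

```java
import java.util.Set;

/** Simplified, hypothetical evaluator for Accumulo-style visibility expressions. */
final class VisibilitySketch {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    private VisibilitySketch(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    /** True if a user holding auths may see a cell labeled expr. */
    static boolean canSee(String expr, Set<String> auths) {
        if (expr.isEmpty()) return true; // an empty label is visible to everyone
        VisibilitySketch p = new VisibilitySketch(expr, auths);
        boolean result = p.parseExpr();
        if (p.pos != expr.length()) throw new IllegalArgumentException("Trailing input at " + p.pos);
        return result;
    }

    // expr := operand ( ('&' operand)* | ('|' operand)* )
    private boolean parseExpr() {
        boolean value = parseOperand();
        char op = 0;
        while (pos < expr.length() && (expr.charAt(pos) == '&' || expr.charAt(pos) == '|')) {
            char c = expr.charAt(pos++);
            if (op != 0 && c != op) throw new IllegalArgumentException("Mixed & and | need parentheses");
            op = c;
            boolean rhs = parseOperand(); // always parse, so syntax errors are caught
            value = (op == '&') ? (value && rhs) : (value || rhs);
        }
        return value;
    }

    // operand := term | '(' expr ')'
    private boolean parseOperand() {
        if (pos < expr.length() && expr.charAt(pos) == '(') {
            pos++;
            boolean v = parseExpr();
            if (pos >= expr.length() || expr.charAt(pos) != ')') throw new IllegalArgumentException("Missing )");
            pos++;
            return v;
        }
        int start = pos;
        while (pos < expr.length() && isTermChar(expr.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected term at " + start);
        return auths.contains(expr.substring(start, pos));
    }

    private static boolean isTermChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_' || c == '-' || c == ':' || c == '.' || c == '/';
    }
}
```

A user with authorizations `{audit, pii}` can read a cell labeled `admin|(audit&pii)`, while a user holding only `audit` cannot; both users query the same table and simply see different cells.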
DATA-CENTRIC SECURITY ECOSYSTEM
[Figure: the data-centric security ecosystem around Sqrrl Enterprise, with components Data Labeler, Apps, End Users, Auth. Service, User Attributes, Policy Engine, Policies, Audits, and Key Mgmt.]
OUTLINE
Two Halves of “Real-Time”
Accumulo and Sqrrl Technology
Data-Centric Security
Table Designs
Performance Benchmarks
HIERARCHICAL DECOMPOSITION
[Figure: hierarchical decomposition of a <person> row through the Column Family, Column Qualifier, and Value levels, with nodes attribute, purchases, age, discount, sneakers, returns, and hat, and values <age>, <rate>, and <cost>.]
MATERIALIZED TABLE
Key/value pairs (Row, Column Family:Column Qualifier = Value):
- george, attribute:age = 27
- george, purchases:sneakers = $83
- bill, attribute:age = 49
- bill, attribute:discount = 40%
- bill, purchases:sneakers = $100
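The step from the hierarchical view to the materialized table is a flattening: every path from row to column family to column qualifier becomes one sorted key/value pair. A minimal Java sketch, with a hypothetical class name and a display-only key that joins the fields with a space and a colon (a real table keeps them as separate key fields):

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: flatten row -> family -> qualifier -> value into sorted entries. */
final class HierarchicalDecomposition {
    static TreeMap<String, String> flatten(Map<String, Map<String, Map<String, String>>> records) {
        TreeMap<String, String> table = new TreeMap<>(); // sorted, like a table's key order
        records.forEach((row, families) ->
                families.forEach((family, qualifiers) ->
                        qualifiers.forEach((qualifier, value) ->
                                table.put(row + " " + family + ":" + qualifier, value))));
        return table;
    }
}
```

Flattening the george and bill records yields five entries, with all of bill's entries sorted before george's, and each person's attributes and purchases stored contiguously within the row.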
FORWARD AND INVERTED INDEX
Forward Index table:
- Row <UUID>, Column Family <Type>, Column Qualifier <Field>, Value <Term>
Inverted Index table:
- Row <Term>, Column Family <UUID>, Column Qualifier <Type+Field>, Value <Digest of Event>
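The two layouts can be sketched with two sorted maps. This hypothetical Java follows the slide's layout (forward: row <UUID>, family <Type>, qualifier <Field>, value <Term>; inverted: row <Term>, family <UUID>, qualifier <Type+Field>, value <Digest of Event>). The NUL separator and the `+` used to join type and field are assumptions for the sketch. A term query becomes a scan of a single inverted-index row, from which the matching event UUIDs are read.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical forward/inverted index sketch over sorted maps. */
final class IndexSketch {
    static final char SEP = '\u0000'; // separator that sorts below every printable character

    final TreeMap<String, String> forward = new TreeMap<>();
    final TreeMap<String, String> inverted = new TreeMap<>();

    static String key(String row, String family, String qualifier) {
        return row + SEP + family + SEP + qualifier;
    }

    /** Writes one event: a forward entry and an inverted entry per field. */
    void index(String uuid, String type, Map<String, String> fields, String digest) {
        fields.forEach((field, term) -> {
            forward.put(key(uuid, type, field), term);
            inverted.put(key(term, uuid, type + "+" + field), digest); // "+" joiner is assumed
        });
    }

    /** A term query scans one inverted-index row; the UUID is read from the column family. */
    TreeSet<String> lookup(String term) {
        SortedMap<String, String> row = inverted.subMap(term + SEP, term + SEP + '\uffff');
        TreeSet<String> uuids = new TreeSet<>();
        for (String k : row.keySet()) {
            String rest = k.substring(term.length() + 1); // skip "term" and its separator
            uuids.add(rest.substring(0, rest.indexOf(SEP)));
        }
        return uuids;
    }
}
```

After indexing two events that both contain the word `accumulo`, `lookup("accumulo")` returns both UUIDs from one contiguous range, while the forward index still answers "what does event X contain" with a scan of row X.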
FORWARD AND INVERTED INDEX
CUSTOM INDEXING
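Geo Index table:
- Row <GeoHash>, Column Family <Event Type>, Column Qualifier <UUID>, Value <Digest of Event>
GeoHash example (bits interleaved one at a time from latitude, longitude, and depth):
Latitude: 10110101001
Longitude: 00111010010
Depth: 11010110110
GeoHash: 101001110111010101011100001011100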
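The 33-bit key on this slide is a three-way interleave: taking one bit from latitude, then longitude, then depth, position by position, reproduces it exactly. The Java sketch below checks that; the `quantize` helper, which maps a coordinate in a range to a fixed-width bit string, is a hypothetical addition. Note that the standard Geohash format starts with longitude and is written in base 32, whereas this sketch follows the slide's order and binary digits.

```java
/** Hypothetical Z-order ("geohash"-style) key built by interleaving coordinate bits. */
final class GeoHashSketch {
    /** Interleaves equal-length bit strings: bit 0 of each input, then bit 1 of each, and so on. */
    static String interleave(String... coords) {
        int bits = coords[0].length();
        for (String c : coords) {
            if (c.length() != bits) throw new IllegalArgumentException("All inputs need " + bits + " bits");
        }
        StringBuilder out = new StringBuilder(bits * coords.length);
        for (int i = 0; i < bits; i++) {
            for (String c : coords) {
                out.append(c.charAt(i));
            }
        }
        return out.toString();
    }

    /** Maps v in [min, max] to a fixed-width bit string, clamping at the edges. */
    static String quantize(double v, double min, double max, int bits) {
        long cell = (long) ((v - min) / (max - min) * (1L << bits));
        cell = Math.max(0, Math.min(cell, (1L << bits) - 1));
        StringBuilder sb = new StringBuilder(Long.toBinaryString(cell));
        while (sb.length() < bits) sb.insert(0, '0'); // left-pad to the fixed width
        return sb.toString();
    }
}
```

Because the high-order bits of every coordinate come first, points that are close in all three dimensions tend to share a long key prefix, so a spatial region can be covered by a limited set of row ranges in the Geo Index.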
D4M 2.0 SCHEMA FOR TWITTER DATA
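Tedge:
- Row <UUID>, Column Family “stat”, Column Qualifier <stat>, Value “1”
- Row <UUID>, Column Family “time”, Column Qualifier <time>, Value “1”
- Row <UUID>, Column Family “user”, Column Qualifier <user>, Value “1”
- Row <UUID>, Column Family “word”, Column Qualifier <word>, Value “1”
TedgeT (transpose of Tedge):
- Row <value>, Column Family “stat”, Column Qualifier <UUID>, Value “1”
- Row <value>, Column Family “time”, Column Qualifier <UUID>, Value “1”
- Row <value>, Column Family “user”, Column Qualifier <UUID>, Value “1”
- Row <value>, Column Family “word”, Column Qualifier <UUID>, Value “1”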
D4M 2.0 SCHEMA FOR TWITTER DATA
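TedgeDegT:
- Row <value>, Column Family “stat”, Column Qualifier “degree”, Value <count>
- Row <value>, Column Family “time”, Column Qualifier “degree”, Value <count>
- Row <value>, Column Family “user”, Column Qualifier “degree”, Value <count>
- Row <value>, Column Family “word”, Column Qualifier “degree”, Value <count>
Ttext:
- Row <UUID>, Column Family “text”, Column Qualifier -, Value <text>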
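A hypothetical Java sketch of how one tweet fans out into the four D4M 2.0 tables: a `"1"` entry in Tedge, its transpose in TedgeT, a degree increment in TedgeDegT, and the raw text in Ttext. Keys are joined with `|` for display only, and the sketch assumes each value occurs at most once per tweet; it uses neither the D4M library nor Accumulo's API.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the D4M 2.0 table layout for tweets. */
final class D4MSketch {
    final TreeMap<String, String> tedge = new TreeMap<>();
    final TreeMap<String, String> tedgeT = new TreeMap<>();
    final TreeMap<String, Long> tedgeDegT = new TreeMap<>();
    final TreeMap<String, String> ttext = new TreeMap<>();

    /** Display-only key; a real table keeps row, family and qualifier as separate fields. */
    static String key(String row, String family, String qualifier) {
        return row + "|" + family + "|" + qualifier;
    }

    /** attrs maps a family ("stat", "time", "user", "word") to that tweet's values. */
    void ingest(String uuid, Map<String, List<String>> attrs, String text) {
        attrs.forEach((family, values) -> {
            for (String value : values) {
                tedge.put(key(uuid, family, value), "1");   // Tedge: row <UUID>
                tedgeT.put(key(value, family, uuid), "1");  // TedgeT: row <value>
                // TedgeDegT: running count; in Accumulo a summing combiner can do this without reads.
                tedgeDegT.merge(key(value, family, "degree"), 1L, Long::sum);
            }
        });
        ttext.put(key(uuid, "text", "-"), text);            // Ttext: raw text by <UUID>
    }
}
```

The transpose table turns "which tweets contain this word" into a single-row scan, and the degree table lets a query planner check how common a value is before scanning for it.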
D4M 2.0 SCHEMA FOR TWITTER DATA
Source: “D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database,” Kepner et al., HPEC 2013
OUTLINE
Two Halves of “Real-Time”
Accumulo and Sqrrl Technology
Data-Centric Security
Table Designs
Performance Benchmarks
ACCUMULO WITH D4M 2.0 SCHEMA PERFORMANCE
Maximizing throughput on an 8-node, 192-core cluster.
Source: “D4M 2.0 Schema: A General Purpose High Performance Schema for the Accumulo Database,” Kepner et al., HPEC 2013
ACCUMULO SCALABILITY: GRAPH500 BENCHMARK
Source: http://www.pdl.cmu.edu/SDI/2013/slides/big_graph_nsa_rd_2013_56002v1.pdf
ATOMIC INCREMENT PERFORMANCE COMPARISON
Read/Modify/Write (HBase) vs. Iterators/Combiners (Accumulo)
QUESTIONS?
Adam Fuchs, CTO
Sqrrl Data, Inc.