Effective Testing of Apache Accumulo Iterators
Josh Elser
Accumulo Summit 2016
2016/10/11
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Engineer at Hortonworks, Member of the Apache Software Foundation
Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™
ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™
These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.
A Novel Feature of Apache Accumulo
SortedKeyValueIterator (SKVI or “Iterators”)
Computation offload
Reduced I/O
Rumored to be called “cool” by Jeff Dean
Server-Side: Transformations, Predicate-Pushdown, Filters, Aggregations, Combiners, Versioning, Security
Apache Accumulo Iterators
Column Slices (CfCqSliceFilter)
Basic Statistics (StatsCombiner)
Value/Array Concatenation (Summing[Array]Combiner)
Aggregations (WholeRowIterator, WholeColumnFamilyIterator)
In-Row operations (AndIterator, OrIterator)
Filters (RegExFilter, GrepIterator, FirstEntryInRowIterator)
Reads
Clients request a Range of data
Key to Row to Tablet to TabletServer
Sorted, merged-read of memory and files
Computation offload and RPC boost
[Diagram: a Tablet holding Memory and several RFiles, with Iterators between the Tablet and the Client]
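The “sorted, merged-read of memory and files” can be pictured as a k-way merge over sources that are each already sorted. The sketch below is plain Java and does not use any Accumulo class; the class name MergedRead and the use of bare String keys are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class MergedRead {
    // One sorted source (in-memory data or a file) plus its current head key.
    private static final class Source {
        final Iterator<String> it;
        String head;

        Source(Iterator<String> it) {
            this.it = it;
            this.head = it.next();
        }
    }

    /** Merges already-sorted key lists into a single sorted list. */
    public static List<String> merge(List<List<String>> sortedSources) {
        PriorityQueue<Source> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> source : sortedSources) {
            if (!source.isEmpty()) {
                heap.add(new Source(source.iterator()));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source smallest = heap.poll();   // source with the smallest head key
            out.add(smallest.head);
            if (smallest.it.hasNext()) {     // advance it and put it back
                smallest.head = smallest.it.next();
                heap.add(smallest);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> memory = Arrays.asList("row2", "row5");
        List<String> rfileA = Arrays.asList("row1", "row4");
        List<String> rfileB = Arrays.asList("row3");
        // prints [row1, row2, row3, row4, row5]
        System.out.println(merge(Arrays.asList(memory, rfileA, rfileB)));
    }
}
```

In this picture, Iterators sit on top of the merged stream, which is why they see entries in sorted order.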
Reads with Iterators
A poor-man’s “VIEW”
Server-side transformation at query-time

Raw Key                      Value               Transformed Key           Value
3141592 siblings:brothers    Bobby,Steven        3141592 siblings:count    4
3141592 siblings:sisters     Sally,Francine
3141593 siblings:brothers    Frank               3141593 siblings:count    3
3141593 siblings:sisters     Amy,Loretta
3141594 siblings:brothers                        3141594 siblings:count    2
3141594 siblings:sisters     Rebecca,Savannah
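The transformation in the table can be sketched in plain Java. This is not an Accumulo Iterator: the class SiblingCountView, the "row column" String keys, and the map types are assumptions chosen only to show the row-level logic an Iterator would apply at query time.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class SiblingCountView {
    /** Counts comma-separated names; an empty value holds zero names. */
    static int countNames(String value) {
        return value.isEmpty() ? 0 : value.split(",").length;
    }

    /**
     * Collapses sorted "row column" entries into one
     * "row siblings:count" entry per row.
     */
    public static Map<String, Integer> transform(TreeMap<String, String> raw) {
        Map<String, Integer> view = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            String key = entry.getKey();
            String row = key.substring(0, key.indexOf(' '));
            // Add this entry's name count to the running total for the row.
            view.merge(row + " siblings:count",
                       countNames(entry.getValue()), Integer::sum);
        }
        return view;
    }

    public static void main(String[] args) {
        TreeMap<String, String> raw = new TreeMap<>();
        raw.put("3141592 siblings:brothers", "Bobby,Steven");
        raw.put("3141592 siblings:sisters", "Sally,Francine");
        raw.put("3141593 siblings:brothers", "Frank");
        raw.put("3141593 siblings:sisters", "Amy,Loretta");
        raw.put("3141594 siblings:brothers", "");
        raw.put("3141594 siblings:sisters", "Rebecca,Savannah");
        // prints one "<row> siblings:count <n>" line per row: 4, 3, 2
        transform(raw).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

The raw entries stay untouched; only the view returned to the client changes.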
Compactions
Bounds the number of files and performance
Iterators provide a data optimization mechanism
[Diagram: a Tablet's RFiles before and after compaction, with Iterators applied in between]
Compactions with Iterators
Deferred aggregation
Rewrite application data in optimal form

Raw Key                      Value               Transformed Key              Value
3141592 siblings:brothers    Bobby,Steven        3141592 siblings:brothers    …
                                                 3141592 siblings:count       4
3141592 siblings:sisters     Sally,Francine      3141592 siblings:sisters     …
3141593 siblings:brothers    Frank               3141593 siblings:brothers    …
                                                 3141593 siblings:count       3
3141593 siblings:sisters     Amy,Loretta         3141593 siblings:sisters     …
3141594 siblings:brothers                        3141594 siblings:brothers    …
                                                 3141594 siblings:count       2
3141594 siblings:sisters     Rebecca,Savannah    3141594 siblings:sisters     …
Better for Everyone
Iterators are great
– Abstraction for system-level filters and optimizations
– Better performance for power-users
Lots of things Iterators are not
– Triggers
– Hooks
– Coprocessors
– “Hammers”
Iterators do not generally replace
– Flink, Hive, Mesos, Presto, Storm, Spark, YARN, etc.
– Can in some cases
On Building an Iterator
The API is not particularly intuitive
Hard to create/support SKVIv2
Edge-cases in production are hard to understand
Lots of things not to do in an Iterator
– Trial and error
Insight into production systems is difficult
Existing Testing Tools
Unit Testing
Good
– Fast
– Concise/Simple
– Given input, verify output
Bad
– Not end-to-end
– Not representative invocation
MiniAccumuloCluster (MAC) Testing
Good
– Same server execution as production
– Same client interaction as production
Bad
– Slow/Memory intensive
– Pedantic to write tests
– Might not catch production edge-cases
– Impacted by environment
What’s the happy medium?
Iterator Testing Harness
Testing harness designed to capture common pitfalls
– ACCUMULO-626 in >=1.8.0
Complementary to unit tests and MAC testing
The good parts
– Fast
– Generalized/Reusable tests
– Extensible
The bad parts
– Not directly using TabletServer for invocation
– Subtle failures
Iterator Testing Harness
Testing an Iterator requires three things
– Input data
– Expected output
– Collection of test cases to run
Test cases found via reflection
– Common edge cases provided
– Easy to develop and run new test cases
JUnit4 integration
    @Parameters
    public static Object[][] data() {
      IteratorTestInput input = createIteratorInput();
      IteratorTestOutput expectedOutput = createIteratorOutput();
      List<IteratorTestCase> testCases = createTestCases();
      return BaseJUnit4IteratorTest.createParameters(input, expectedOutput, testCases);
    }
Example Test Cases
Iterator Instantiation
– Does the Iterator have a visible no-args constructor?
“DeepCopy” safety
– Can a “deepCopy()” of an Iterator be used like the original?
Stateless “hasTop()”
– Do multiple invocations of “hasTop()” cause incorrect results/errors?
Re-seek()’ing
– Accumulo will re-instantiate scan sessions and use new Ranges
– Does the Iterator still return correct results in this case?
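The four checks above can be sketched against a toy iterator. This is not the Accumulo harness and not the SortedKeyValueIterator API: MiniIterator, FixedKeysIterator, and the check method names are hypothetical stand-ins (keys only, no Ranges or Values), meant only to show what each kind of test case exercises.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IteratorChecks {
    /** Hypothetical, simplified stand-in for SortedKeyValueIterator (keys only). */
    public interface MiniIterator {
        void seek(String startInclusive);
        boolean hasTop();
        String top();
        void next();
        MiniIterator deepCopy();
    }

    /** Example iterator under test: walks a fixed, sorted list of keys. */
    public static class FixedKeysIterator implements MiniIterator {
        private static final List<String> DATA = Arrays.asList("a", "b", "c", "d");
        private int pos = DATA.size(); // nothing to return until seek() is called

        public FixedKeysIterator() {} // visible no-args constructor

        @Override public void seek(String startInclusive) {
            pos = 0;
            while (pos < DATA.size() && DATA.get(pos).compareTo(startInclusive) < 0) {
                pos++;
            }
        }
        @Override public boolean hasTop() { return pos < DATA.size(); } // no side effects
        @Override public String top() { return DATA.get(pos); }
        @Override public void next() { pos++; }
        @Override public MiniIterator deepCopy() { return new FixedKeysIterator(); }
    }

    // Instantiation check: fails if there is no visible no-args constructor.
    static MiniIterator instantiate(Class<? extends MiniIterator> type) {
        try {
            return type.getConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no usable no-args constructor", e);
        }
    }

    static List<String> drain(MiniIterator it) {
        List<String> out = new ArrayList<>();
        while (it.hasTop()) {
            out.add(it.top());
            it.next();
        }
        return out;
    }

    // hasTop() is called several times per entry; results must not change.
    public static boolean hasTopIsStateless(Class<? extends MiniIterator> type,
                                            List<String> expected) {
        MiniIterator it = instantiate(type);
        it.seek("");
        List<String> out = new ArrayList<>();
        while (it.hasTop() && it.hasTop() && it.hasTop()) {
            out.add(it.top());
            it.next();
        }
        return out.equals(expected);
    }

    // A deepCopy() must be usable exactly like the original.
    public static boolean deepCopyIsSafe(Class<? extends MiniIterator> type,
                                         List<String> expected) {
        MiniIterator copy = instantiate(type).deepCopy();
        copy.seek("");
        return drain(copy).equals(expected);
    }

    // Read one entry, then continue on a new instance seeked to the last key returned.
    public static boolean reseekIsSafe(Class<? extends MiniIterator> type,
                                       List<String> expected) {
        MiniIterator first = instantiate(type);
        first.seek("");
        if (!first.hasTop()) {
            return expected.isEmpty();
        }
        String last = first.top();
        List<String> out = new ArrayList<>();
        out.add(last);
        MiniIterator second = instantiate(type);
        second.seek(last);
        if (second.hasTop() && second.top().equals(last)) {
            second.next(); // skip the entry the client already received
        }
        out.addAll(drain(second));
        return out.equals(expected);
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList("a", "b", "c", "d");
        System.out.println(hasTopIsStateless(FixedKeysIterator.class, expected));
        System.out.println(deepCopyIsSafe(FixedKeysIterator.class, expected));
        System.out.println(reseekIsSafe(FixedKeysIterator.class, expected));
    }
}
```

Each check takes only the iterator class and the expected output, so the same checks can be reused for any iterator written against the same interface.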
In an Ideal World
Good testing means faster deployments
Faster deployment means more value for customers
Automated tests combat technical debt in code growth
More automation reduces developer stress
Unit Tests + MiniAccumuloCluster + Iterator Testing Harness =
In an Ideal World
A Trio of Testing Approaches
Unit Tests (test lifecycle phase)
– Fast verification given input/output
– Validate impact of state
Iterator Testing Harness (test lifecycle phase)
– Catch common mistakes
– Basic lifetime/API validation
– Encourage best practices
MiniAccumuloCluster (integration-test lifecycle phase)
– Functional/Acceptance tests
– Does the ingest/query system function?
– Real execution of Iterator by TabletServer
In an Ideal World
Standalone environment
– The “laptop test”
– Sanity check
Staging environments
– Small cluster with a subset of data
– Correctness and performance
[Diagram: pipeline with stages labeled Code; Unit Tests, Iterator Test Harness, MAC; Binary Artifacts; Standalone; Staging; Production Deploy]
In an Ideal World
No more “voodoo” and “black magic”
Find common errors fast
Catch bad Iterator design early
Standardized testing methodology
Community contributes new tests
Increase in quality, reusability, and confidence
Thank You
Twitter: @josh_elser
Email: [email protected] / [email protected]