Secondary Indexing in Phoenix
SF HBase User Group – September 26, 2013
James Taylor, Phoenix Lead, Software Engineer
Jesse Yates, HBase Committer, Software Engineer
Agenda
• About
• Indexes In Phoenix
• Immutable Indexes
• Mutable Indexes
• Demo!
• Roadmap
SF HUG – Sept 2013
Phoenix
• Open Source
  – https://github.com/forcedotcom/phoenix
• “SQL-skin” on HBase
  – Everyone knows SQL!
• JDBC Driver
  – Plug-and-play
• Faster than HBase
  – in some cases
Secondary Indexes
• Sort on an ‘orthogonal’ axis
• Avoid a full-table scan
• Expected database feature
• Hard in HBase because of ACID considerations
Indexes In Phoenix
• Creating an index
  – DDL statement
  – Creates another HBase table behind the scenes
• Deciding when an index is used
  – Transparent to the user
  – (but user can override through hint)
  – No stats yet
• Knowing which table was used
  – EXPLAIN <query>
Creating Indexes In Phoenix
• CREATE INDEX <index_name>
    ON <table_name> (<columns_to_index>…)
    INCLUDE (<columns_to_cover>…);
• Optionally add IMMUTABLE_ROWS=true property to CREATE TABLE statement
Creating Indexes In Phoenix
CREATE TABLE baby_names (
    name VARCHAR PRIMARY KEY,
    occurrences BIGINT);
CREATE INDEX baby_names_idx
    ON baby_names (occurrences DESC, name);
Deciding When To Use
• Transparent to the user
• Query optimizer does the following:
  – Compiles query against data and index tables
  – Chooses “best” one (not yet stats driven)
• Can index even be used?
  – Active, using columns contained in index (no join back to data table)
• Can ORDER BY be removed?
• Which plan forms the longest start/stop scan key?
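The selection rules above can be sketched in a few lines of Java. This is a minimal illustration, not Phoenix's optimizer: the Plan class, its fields and the tie-breaking order (usable first, then ORDER BY removal, then scan key length) are assumptions made for the example, following the order of the bullets.

```java
import java.util.Arrays;
import java.util.List;

final class PlanChoiceSketch {
    /** Hypothetical summary of one compiled plan; not a Phoenix class. */
    static final class Plan {
        final String table;
        final boolean usable;         // index is active and contains every referenced column
        final boolean orderByRemoved; // rows already come back in the requested order
        final int scanKeyLength;      // number of leading key columns bound by the filter

        Plan(String table, boolean usable, boolean orderByRemoved, int scanKeyLength) {
            this.table = table;
            this.usable = usable;
            this.orderByRemoved = orderByRemoved;
            this.scanKeyLength = scanKeyLength;
        }
    }

    /** Picks a plan using the assumed priority; on a tie the earlier plan in the list wins. */
    static Plan choose(List<Plan> plans) {
        Plan best = null;
        for (Plan p : plans) {
            if (!p.usable) {
                continue; // would need a join back to the data table, or the index is not active
            }
            if (best == null || better(p, best)) {
                best = p;
            }
        }
        return best;
    }

    private static boolean better(Plan a, Plan b) {
        if (a.orderByRemoved != b.orderByRemoved) {
            return a.orderByRemoved;
        }
        return a.scanKeyLength > b.scanKeyLength;
    }

    public static void main(String[] args) {
        Plan data = new Plan("BABY_NAMES", true, false, 0);
        Plan index = new Plan("BABY_NAMES_IDX", true, true, 0);
        System.out.println(choose(Arrays.asList(data, index)).table);
    }
}
```

Listing the data table first means it is kept whenever no index plan is strictly better.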
Deciding When To Use
SELECT name, occurrences FROM baby_names ORDER BY occurrences DESC LIMIT 10;
is served from the index as:
SELECT name, occurrences FROM baby_names_idx LIMIT 10;
ORDER BY not necessary since rows in index table are already ordered this way
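Why the ORDER BY can be dropped is easy to see with a sorted set standing in for the index table. This is a sketch under assumptions: IndexKey, newIndex and firstNames are hypothetical names for this example, the counts are made up, and the comparator only imitates the (occurrences DESC, name) key order of baby_names_idx.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

final class DescIndexOrderSketch {
    /** Hypothetical index row key: (occurrences DESC, name). */
    static final class IndexKey {
        final long occurrences;
        final String name;

        IndexKey(long occurrences, String name) {
            this.occurrences = occurrences;
            this.name = name;
        }
    }

    /** A sorted set stands in for the index table, which is kept ordered by row key. */
    static NavigableSet<IndexKey> newIndex() {
        Comparator<IndexKey> order = (a, b) -> {
            int byCount = Long.compare(b.occurrences, a.occurrences); // descending
            return byCount != 0 ? byCount : a.name.compareTo(b.name);
        };
        return new TreeSet<IndexKey>(order);
    }

    /** LIMIT n over the index: read the first n keys, with no sort step at query time. */
    static List<String> firstNames(NavigableSet<IndexKey> index, int limit) {
        List<String> names = new ArrayList<>();
        for (IndexKey key : index) {
            if (names.size() == limit) {
                break;
            }
            names.add(key.name);
        }
        return names;
    }

    public static void main(String[] args) {
        NavigableSet<IndexKey> index = newIndex();
        index.add(new IndexKey(100, "Noah")); // made-up counts
        index.add(new IndexKey(300, "Emma"));
        index.add(new IndexKey(250, "Liam"));
        System.out.println(firstNames(index, 2));
    }
}
```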
Deciding When To Use
SELECT name, occurrences FROM baby_names WHERE occurrences > 100;
is served from the index as:
SELECT name, occurrences FROM baby_names_idx WHERE occurrences > 100;
Uses index, since we can form start row for scan based on filter of occurrences
Deciding When To Use
SELECT /*+ NO_INDEX */ name FROM baby_names WHERE occurrences > 100;
Override optimizer by telling it not to use any indexes
SELECT /*+ INDEX(baby_names baby_names_idx other_baby_names_idx) */ name, occurrences FROM baby_names WHERE occurrences > 100;
Tell optimizer priority in which it should consider using indexes
Knowing which table was used
EXPLAIN SELECT name, occurrences FROM baby_names ORDER BY occurrences DESC LIMIT 10;
CLIENT PARALLEL 1-WAY FULL SCAN OVER BABY_NAMES_IDX
    SERVER FILTER BY PageFilter 10
CLIENT 10 ROW LIMIT
Immutable Indexes
• Immutable Rows
• Much easier to implement
• Client-managed
• Bulk-loadable
Mutable Indexes
• Global Index
• Change row state
  – Common use-case
  – “expected” implementation
• Covered Columns/Join Index
1.5 years*
Internals
• Index Management
  – Build index updates
  – Ensures index is ‘cleaned up’
• Recovery Mechanism
  – Ensures index updates are “ACID”
“There is no magic”
- Every programming hipster (chipster)
Mutable Indexing: Standard Write Path
[Diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore]
Mutable Indexing
[Diagram components: RegionCoprocessorHost (before and after the WAL), WAL, Indexer, Builder, Codec, WAL Updater (“Durable!”), Index Tables]
Index Management
• Lives within a RegionCoprocessorObserver
• Access to the local HRegion
• Specifies the mutations to apply to the index tables
public interface IndexBuilder {
  public void setup(RegionCoprocessorEnvironment env);
  public Map<Mutation, String> getIndexUpdate(Put put);
  public Map<Mutation, String> getIndexUpdate(Delete delete);
}
Why not write my own?
• Managing Cleanup
  – Efficient point-in-time correctness
  – Performance tricks
• Abstract access to HRegion
  – Minimal network hops
• Sorting correctness
  – Phoenix typing ensures correct index sorting
Example: Managing Cleanup
• Updates can arrive out of order
  – Client-managed timestamps
ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Example: Managing Cleanup
Index Table
ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
Example: Managing Cleanup
ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Row1  Fam     Qual       11  val4
Example: Managing Cleanup
ROW   FAMILY  QUALIFIER  TS  VALUE
Row1  Fam     Qual       10  val1
Row1  Fam     Qual       11  val4
Row1  Fam2    Qual2      12  val2
Row1  Fam     Qual       13  val3
Example: Managing Cleanup
ROW             FAMILY  QUALIFIER             TS
Val1|Row1       Index   Fam:Qual              10
Val4|Row1       Index   Fam:Qual              11
Val4|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val1|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  12
Val3|Val2|Row1  Index   Fam:Qual, Fam2:Qual2  13
Managing Cleanup
• History “roll up”
• Out-of-order Updates
• Point-in-time correctness
• Multiple Timestamps per Mutation
• Delete vs. DeleteColumn vs. DeleteFamily
Surprisingly hard!
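The out-of-order example from the preceding tables can be reproduced with a small model. This is a sketch under assumptions, not the Phoenix implementation: Cell, indexEntries and staleEntries are hypothetical names, a single data row is modelled as a list of versioned cells, and deletes are ignored. For each timestamp it rebuilds the index row from the newest visible value of every indexed column, then reports which earlier index entries are no longer valid after a late update.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

final class IndexCleanupSketch {
    /** Hypothetical simplification of one versioned cell of a data row. */
    static final class Cell {
        final String column;
        final long ts;
        final String value;

        Cell(String column, long ts, String value) {
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    static final List<String> INDEXED = Arrays.asList("Fam:Qual", "Fam2:Qual2");

    /** Cells of Row1 before the late update arrives. */
    static List<Cell> slideCells() {
        return new ArrayList<>(Arrays.asList(
            new Cell("Fam:Qual", 10, "val1"),
            new Cell("Fam2:Qual2", 12, "val2"),
            new Cell("Fam:Qual", 13, "val3")));
    }

    /** Index row key at every timestamp, built from the newest value visible at that time. */
    static SortedMap<Long, String> indexEntries(String row, List<Cell> cells) {
        SortedSet<Long> stamps = new TreeSet<>();
        for (Cell c : cells) {
            stamps.add(c.ts);
        }
        SortedMap<Long, String> entries = new TreeMap<>();
        for (long ts : stamps) {
            StringBuilder key = new StringBuilder();
            for (String column : INDEXED) {
                Cell newest = null;
                for (Cell c : cells) {
                    if (c.column.equals(column) && c.ts <= ts
                            && (newest == null || c.ts > newest.ts)) {
                        newest = c;
                    }
                }
                if (newest != null) {
                    key.append(newest.value).append('|');
                }
            }
            entries.put(ts, key.append(row).toString());
        }
        return entries;
    }

    /** Entries written before the late update that must now be cleaned up. */
    static Set<String> staleEntries(SortedMap<Long, String> before, SortedMap<Long, String> after) {
        Set<String> stale = new TreeSet<>();
        for (Map.Entry<Long, String> e : before.entrySet()) {
            if (!e.getValue().equals(after.get(e.getKey()))) {
                stale.add(e.getValue() + "@" + e.getKey());
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<Cell> cells = slideCells();
        SortedMap<Long, String> before = indexEntries("Row1", cells);
        cells.add(new Cell("Fam:Qual", 11, "val4")); // arrives late, with an older timestamp
        SortedMap<Long, String> after = indexEntries("Row1", cells);
        System.out.println(after);
        System.out.println(staleEntries(before, after));
    }
}
```

In this model the late val4 write adds an entry at timestamp 11 and invalidates the val1|val2|Row1 entry at timestamp 12.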
Phoenix Index Builder
• Much simpler than full index management
• Hides cleanup considerations
• Abstracted access to local state
public interface IndexCodec {
  public void initialize(RegionCoprocessorEnvironment env);
  public Iterable<IndexUpdate> getIndexDeletes(TableState state);
  public Iterable<IndexUpdate> getIndexUpserts(TableState state);
}
Phoenix Index Codec
Dude, where’s my data?
Ensuring Correctness
HBase ACID
• Does NOT give you:
  – Cross-row consistency
  – Cross-table consistency
• Does give you:
  – Durable data on success
  – Visibility on success without partial rows
Key Observation
“Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”
- Lars Hofhansl
Idempotent Index Updates
• Doesn’t need full transactions
• Replay as many times as needed
• Can tolerate a little lag
  – As long as we get the order right
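A minimal sketch of why replay is safe, assuming each index update is a put that carries an explicit timestamp: applying the same update again overwrites the same cell with the same value. IndexPut, apply and replay are hypothetical names, and a nested map stands in for the index table.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class IdempotentReplaySketch {
    /** Hypothetical index update: write dataRow under indexRow at a fixed timestamp. */
    static final class IndexPut {
        final String indexRow;
        final long ts;
        final String dataRow;

        IndexPut(String indexRow, long ts, String dataRow) {
            this.indexRow = indexRow;
            this.ts = ts;
            this.dataRow = dataRow;
        }
    }

    /** Index table model: index row -> (timestamp -> data row key). */
    static void apply(Map<String, SortedMap<Long, String>> table, IndexPut put) {
        table.computeIfAbsent(put.indexRow, k -> new TreeMap<>()).put(put.ts, put.dataRow);
    }

    /** Applies every logged update `times` times, as a recovery replay might. */
    static Map<String, SortedMap<Long, String>> replay(List<IndexPut> log, int times) {
        Map<String, SortedMap<Long, String>> table = new TreeMap<>();
        for (int i = 0; i < times; i++) {
            for (IndexPut put : log) {
                apply(table, put);
            }
        }
        return table;
    }

    static List<IndexPut> sampleLog() {
        return Arrays.asList(
            new IndexPut("Val1|Row1", 10, "Row1"),
            new IndexPut("Val1|Val2|Row1", 12, "Row1"),
            new IndexPut("Val3|Val2|Row1", 13, "Row1"));
    }

    public static void main(String[] args) {
        // Replaying the log once or three times leaves the index in the same state.
        System.out.println(replay(sampleLog(), 1).equals(replay(sampleLog(), 3)));
    }
}
```

An update whose result depends on the current state, such as a counter increment, would not have this property, which is why the explicit timestamps matter.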
Failure Recovery
• Custom WALEditCodec
  – Encodes index updates
  – Supports compressed WAL
• Custom WAL Reader
  – Replay index updates from WAL
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedHLogReader</value>
</property>
Failure Situations
• Any time before WAL, client replay
• Any time after WAL, HBase replay
• All-or-nothing
Failure #1: Before WAL
[Diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore, with the failure occurring before the WAL]
No problem! No data is stored in the WAL, client just retries entire update.
Failure #2: After WAL
[Diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore, with the failure occurring after the WAL]
WAL replayed via usual replay mechanisms
“Magic”
• Server-short circuit
• Lazy load columns
• Skip-scan for cache
• Parallel Writing
• Custom MemStore in Indexer
• Caching HTables
• Pluggable Index Writing/Failure Policy
• Minimize byte[] copy (ImmutableBytesPtr)
Demo
Roadmap
• Next release of Phoenix
• Performance improvements
• Functional Indexes
• Other indexing approaches (Huawei, SEP)
Open Source!
• Main: https://github.com/forcedotcom/phoenix
• Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si
(obligatory hiring slide)
We’re Hiring!
Appendix
• AsyncHBaseWriter
  – github.com/jyates/phoenix/tree/async-hbase
  – 2x+ slower*
* Written in 2hrs, not 100% correct either