realtime search at twitter - michael busch
DESCRIPTION
See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011 At Twitter we serve more than 1.5 billion queries per day from Lucene indexes, while appending more than 200 million tweets per day in realtime. Additionally we recently launched image, video and relevance search on the same engine. This talk will explain the changes we made to Lucene to support this high load and the changes and improvements we made in the last year.TRANSCRIPT
![Page 1: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/1.jpg)
![Page 2: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/2.jpg)
230 MillionTweets per day
![Page 3: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/3.jpg)
2 BillionQueries per day
![Page 4: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/4.jpg)
< 10 sIndexing latency
![Page 5: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/5.jpg)
50 msAvg. query response time
![Page 7: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/7.jpg)
Agenda
‣ Introduction
- Search Architecture
- Inverted Index 101
- Memory Model & Concurrency
- Top Tweets
Earlybird - Realtime Search @twitter
![Page 8: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/8.jpg)
Introduction
![Page 9: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/9.jpg)
Introduction
• Twitter acquired Summize in 2008
• 1st gen search engine based on MySQL
![Page 10: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/10.jpg)
Introduction
• Next gen search engine based on Lucene
• Improves scalability and performance by orders or magnitude
• Open Source
![Page 11: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/11.jpg)
Realtime Search @twitter
Agenda
- Introduction
‣ Search Architecture
- Inverted Index 101
- Memory Model & Concurrency
- Top Tweets
![Page 12: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/12.jpg)
Search Architecture
![Page 13: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/13.jpg)
Search Architecture
• Ingester pre-processes Tweets for search
• Geo-coding, URL expansion, tokenization, etc.
IngesterTweets
![Page 14: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/14.jpg)
Search Architecture
• Tweets are serialized to MySQL in Thrift format
ThriftMySQLMaster MySQL
Slaves
IngesterTweets
![Page 15: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/15.jpg)
Earlybird
• Earlybird reads from MySQL slaves
• Builds an in-memory inverted index in real time
ThriftMySQLMaster MySQL
Slaves
IngesterTweets
EarlybirdIndex
![Page 16: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/16.jpg)
Blender
EarlybirdIndex
BlenderThrift
Thrift
• Blender is our Thrift service aggregator
• Queries multiple Earlybirds, merges results
![Page 17: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/17.jpg)
Realtime Search @twitter
Agenda
- Introduction
- Search Architecture
‣ Inverted Index 101
- Memory Model & Concurrency
- Top Tweets
![Page 18: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/18.jpg)
Inverted Index 101
![Page 19: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/19.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
Table with 6 documents
Example from:Justin Zobel , Alistair Moffat, Inverted files for text search engines, ACM Computing Surveys (CSUR)v.38 n.2, p.6-es, 2006
![Page 20: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/20.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freqand 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
![Page 21: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/21.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freqand 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: keeper
![Page 22: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/22.jpg)
Inverted Index 101
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freqand 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Query: keeper
![Page 23: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/23.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
![Page 24: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/24.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
![Page 25: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/25.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
00000101VInt compression:
Values 0 <= delta <= 127 need one byte
![Page 26: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/26.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
11000110VInt compression:
Values 128 <= delta <= 16384 need two bytes
00011001
![Page 27: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/27.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
11000110VInt compression:
First bit indicates whether next byte belongs to the same value
00011001
![Page 28: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/28.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
11000110VInt compression: 00011001
• Variable number of bytes - a VInt-encoded posting can not be written as a primitive Java type; therefore it can not be written atomically
![Page 29: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/29.jpg)
Posting list encoding
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
5 10 8985 2 90998 90Delta encoding:
Read direction
• Each posting depends on previous one; decoding only possible in old-to-new direction
• With recency ranking (new-to-old) no early termination is possible
![Page 30: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/30.jpg)
Posting list encoding
• By default Lucene uses a combination of delta encoding and VInt compression
• VInts are expensive to decode
• Problem 1: How to traverse posting lists backwards?
• Problem 2: How to write a posting atomically?
![Page 31: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/31.jpg)
Posting list encoding in Earlybird
int (32 bits)
docID24 bits
max. 16.7M
textPosition8 bits
max. 255
• Tweet text can only have 140 chars
• Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest 5x improvement compared to vanilla Lucene with FSDirectory)
![Page 32: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/32.jpg)
Posting list encoding in Earlybird
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Earlybird encoding:
Read direction
5 15 9000 9002 100000 100090
![Page 33: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/33.jpg)
Early query termination
Doc IDs to encode: 5, 15, 9000, 9002, 100000, 100090
Earlybird encoding:
Read direction
5 15 9000 9002 100000 100090
E.g. 3 result are requested: Here we can terminate after reading 3
postings
![Page 34: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/34.jpg)
Posting list encoding - Summary
• ints can be written atomically in Java
• Backwards traversal easy on absolute docIDs (not deltas)
• Every posting is a possible entry point for a searcher
• Skipping can be done without additional data structures as binary search, even though there are better approaches which should be explored
• On tweet indexes we need about 30% more storage for docIDs compared to delta+Vints; compensated by compression of complete segments
• Max. segment size: 2^24 = 16.7M tweets
![Page 35: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/35.jpg)
Realtime Search @twitter
Agenda
- Introduction
- Search Architecture
- Inverted Index 101
‣ Memory Model & Concurrency
- Top Tweets
![Page 36: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/36.jpg)
Memory Model & Concurrency
![Page 37: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/37.jpg)
Inverted index components
Dictionary
Posting list storage
?
![Page 38: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/38.jpg)
Inverted index components
Dictionary
Posting list storage
?
![Page 39: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/39.jpg)
Inverted Index
1 The old night keeper keeps the keep in the town
2 In the big old house in the big old gown.
3 The house in the town had the big old keep
4 Where the old night keeper never did sleep.
5 The night keeper keeps the keep in the night
6 And keeps in the dark and sleeps in the light.
term freqand 1 <6>big 2 <2> <3>
dark 1 <6>did 1 <4>
gown 1 <2>had 1 <3>
house 2 <2> <3>in 5 <1> <2> <3> <5> <6>
keep 3 <1> <3> <5>keeper 3 <1> <4> <5>keeps 3 <1> <5> <6>light 1 <6>
never 1 <4>night 3 <1> <4> <5>old 4 <1> <2> <3> <4>
sleep 1 <4>sleeps 1 <6>
the 6 <1> <2> <3> <4> <5> <6>town 2 <1> <3>where 1 <4>
Table with 6 documents
Dictionary and posting lists
Per term we store different kinds of metadata: text pointer, frequency, postings pointer, etc.
![Page 40: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/40.jpg)
Term dictionary
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
0123456
term text pool
![Page 41: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/41.jpg)
0
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0 p0 f00123456
c a tt0
Term dictionary
![Page 42: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/42.jpg)
1
0
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0
t1
p0
p1
f0
f1
0123456
c a t f o ot0 t1
Term dictionary
![Page 43: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/43.jpg)
1
0
2
termIDint[] textPointer; frequency;postingsPointer;
int[] int[] int[]
t0
t1
t2
p0
p1
p2
f0
f1
f2
0123456
c a t f o o b a rt0 t1 t2
Term dictionary
![Page 44: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/44.jpg)
Term dictionary
• Number of objects << number of terms
• O(1) lookups
• Easy to store more term metadata by adding additional parallel arrays
![Page 45: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/45.jpg)
Inverted index components
Parallel arraysDictionary
pointer to the most recently indexed posting for a term
Posting list storage
?
![Page 46: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/46.jpg)
Inverted index components
Parallel arraysDictionary
pointer to the most recently indexed posting for a term
Posting list storage
?
![Page 47: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/47.jpg)
• Store many single-linked lists of different lengths space-efficiently
• The number of java objects should be independent of the number of lists or number of items in the lists
• Every item should be a possible entry point into the lists for iterators, i.e. items should not be dependent on other items (e.g. no delta encoding)
• Append and read possible by multiple threads in a lock-free fashion (single append thread, multiple reader threads)
• Traversal in backwards order
Posting lists storage - Objectives
![Page 48: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/48.jpg)
Memory management
= 32K int[]
4 int[]pools
![Page 49: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/49.jpg)
Memory management
= 32K int[]
4 int[]pools
Each pool can be grown
individually by adding 32K
blocks
![Page 50: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/50.jpg)
Memory management
• For simplicity we can forget about the blocks for now and think of the pools as continuous, unbounded int[] arrays
• Small total number of Java objects (each 32K block is one object)
4 int[]pools
![Page 51: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/51.jpg)
Memory management
• Slices can be allocated in each pool
• Each pool has a different, but fixed slice size
21
24
27
211
slice size
![Page 52: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/52.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
![Page 53: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/53.jpg)
Adding and appending to a list
21
24
27
211
slice size
Store first twopostings in this slice
available
allocated
current list
![Page 54: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/54.jpg)
Adding and appending to a list
21
24
27
211
slice size
When first slice is full, allocate another one in second pool
available
allocated
current list
![Page 55: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/55.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
Allocate a slice on each level as list grows
![Page 56: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/56.jpg)
Adding and appending to a list
21
24
27
211
slice size
available
allocated
current list
On upper most level one list can own multiple slices
![Page 57: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/57.jpg)
Posting list format
int (32 bits)
docID24 bits
max. 16.7M
textPosition8 bits
max. 255
• Tweet text can only have 140 chars
• Decoding speed significantly improved compared to delta and VInt decoding (early experiments suggest 5x improvement compared to vanilla Lucene with FSDirectory)
![Page 58: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/58.jpg)
Addressing items
• Use 32 bit (int) pointers to address any item in any list unambiguously:
int (32 bits)
poolIndex2 bits0-3
offset in slice1-11 bits
depends on pool
sliceIndex19-29 bits
depends on pool
• Nice symmetry: Postings and address pointers both fit into a 32 bit int
![Page 59: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/59.jpg)
Linking the slices
21
24
27
211
slice size
available
allocated
current list
![Page 60: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/60.jpg)
Linking the slices
21
24
27
211
slice size
available
allocated
current list
Parallel arraysDictionary
pointer to the last posting indexed for a term
![Page 61: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/61.jpg)
Concurrency - Definitions
• Pessimistic locking
• A thread holds an exclusive lock on a resource, while an action is performed [mutual exclusion]
• Usually used when conflicts are expected to be likely
• Optimistic locking
• Operations are tried to be performed atomically without holding a lock; conflicts can be detected; retry logic is often used in case of conflicts
• Usually used when conflicts are expected to be the exception
![Page 62: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/62.jpg)
Concurrency - Definitions
• Non-blocking algorithm
Ensures, that threads competing for shared resources do not have their execution indefinitely postponed by mutual exclusion.
• Lock-free algorithm
A non-blocking algorithm is lock-free if there is guaranteed system-wide progress.
• Wait-free algorithm
A non-blocking algorithm is wait-free, if there is guaranteed per-thread progress.
* Source: Wikipedia
![Page 63: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/63.jpg)
Concurrency
• Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data)
• But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds!
• In Java, it is not guaranteed that one thread will see changes that another thread makes in program execution order, unless the same memory barrier is crossed by both threads -> safe publication
• Safe publication can be achieved in different, subtle ways. Read the great book “Java concurrency in practice” by Brian Goetz for more information!
![Page 64: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/64.jpg)
Java Memory Model
• Program order rule
Each action in a thread happens-before every action in that thread that comes later in the program order.
• Volatile variable rule
A write to a volatile field happens-before every subsequent read of that same field.
• Transitivity
If A happens-before B, and B happens-before C, then A happens-before C.
* Source: Brian Goetz: Java Concurrency in Practice
![Page 65: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/65.jpg)
Concurrency
0RAM
int x;
Cache
Thread 1 Thread 2
time
![Page 66: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/66.jpg)
Concurrency
0RAM
int x;
Cache 5
Thread 1 Thread 2
x = 5;
Thread A writes x=5 to cache
time
![Page 67: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/67.jpg)
Concurrency
0RAM
int x;
Cache 5
Thread 1 Thread 2
x = 5;
while(x != 5);time
This condition will likely never become false!
![Page 68: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/68.jpg)
Concurrency
0RAM
int x;
Cache
Thread 1 Thread 2
time
![Page 69: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/69.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
Thread A writes b=1 to RAM, because b is volatile
b = 1;
![Page 70: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/70.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
Read volatile b
b = 1;
int dummy = b;
while(x != 5);
![Page 71: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/71.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
• Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
happens-before
![Page 72: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/72.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
happens-before
• Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.
![Page 73: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/73.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
happens-before
• Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
![Page 74: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/74.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
This condition will be false, i.e. x==5
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
![Page 75: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/75.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
Memory barrier
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
![Page 76: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/76.jpg)
Demo
![Page 77: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/77.jpg)
Concurrency
0RAM
int x;
1
Cache
Thread 1 Thread 2
time
volatile int b;
x = 5;5
b = 1;
int dummy = b;
while(x != 5);
Memory barrier
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
![Page 78: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/78.jpg)
Concurrency
IndexWriter IndexReader
time
write 100 docs
maxDoc = 100
in IR.open(): read maxDoc
search upto maxDoc
maxDoc is volatile
write more docs
![Page 79: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/79.jpg)
Concurrency
IndexWriter IndexReader
time
write 100 docs
maxDoc = 100
in IR.open(): read maxDoc
search upto maxDoc
maxDoc is volatile
write more docs
happens-before
• Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!
![Page 80: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/80.jpg)
• Not a single exclusive lock
• Writer thread can always make progress
• Optimistic locking (retry-logic) in a few places for searcher thread
• Retry logic very simple and guaranteed to always make progress
Wait-free
![Page 81: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/81.jpg)
Realtime Search @twitter
Agenda
- Introduction
- Search Architecture
- Inverted Index 101
- Memory Model & Concurrency
‣ Top Tweets
![Page 82: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/82.jpg)
Top Tweets
![Page 83: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/83.jpg)
Signals
• Query-dependent
• E.g. Lucene text score, language
• Query-independent
• Static signals (e.g. text quality)
• Dynamic signals (e.g. retweets)
• Timeliness
![Page 84: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/84.jpg)
Signals
Many Earlybird segments (8M documents each)
Task: Find best tweet in billions of tweets efficiently
![Page 85: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/85.jpg)
Signals
Many Earlybird segments (8M documents each)
• For top tweets we can’t early terminate as efficiently
• Scoring and ranking billions of tweets is impractical
![Page 86: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/86.jpg)
Query cache
Many Earlybird segments (8M documents each)
Idea: Mark all tweets with high query-independent scores and only visit those for top tweets queries
Query results cache
High static score doc
![Page 87: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/87.jpg)
Query cache
Many Earlybird segments (8M documents each)
• A background thread periodically wakes up, executes queries, and stores the results in the per-segment cache
• Rewriting queries
• User query: q = ‘lucene’
• Rewritten: q’ = ‘lucene AND cached_filter:toptweets’
• Efficient skipping over tweets with low query-independent scores
This clause will be executed as a Lucene ConstantScoreQuery that wraps a BitSet or SortedVIntList
![Page 88: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/88.jpg)
Query cache
Many Earlybird segments (8M documents each)
• A background thread periodically wakes up, executes queries, and stores the results in the per-segment cache
• Configurable per cached query:
• Result set type: BitSet, SortedVIntList
• Execution schedule
• Filter mode: cache-only or hybrid
![Page 89: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/89.jpg)
Query cache
Many Earlybird segments (8M documents each)
• Result set type: BitSet, SortedVIntList
• BitSet for results with many hits
• SortedVIntList for very sparse results
![Page 90: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/90.jpg)
Query cache
Many Earlybird segments (8M documents each)
• Execution schedule
• Per segment: Sleep time between refreshing the cached results
![Page 91: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/91.jpg)
Query cache
Many Earlybird segments (8M documents each)
• Filter mode: cache-only or hybrid
![Page 92: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/92.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
![Page 93: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/93.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Partially filled segment (IndexWriter is currently
appending)
![Page 94: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/94.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Query cache can only be computed until current
maxDocID
![Page 95: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/95.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Some time later: More docs in segment, but query cache was
not updated yet
![Page 96: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/96.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Cache-only mode: Ignore documents added to segment
since cache was updatedSearch range
![Page 97: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/97.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Hybrid mode: Fall back to original query and execute it on
latest documentsSearch range
![Page 98: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/98.jpg)
Query cache
• Filter mode: cache-only or hybrid
Read direction
Use query cache up to the cache’s maxDocID
Search range
![Page 99: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/99.jpg)
Query result cache
Many Earlybird segments (8M documents each)
query cache config yml file
![Page 100: Realtime Search at Twitter - Michael Busch](https://reader034.vdocument.in/reader034/viewer/2022050919/5484d399b47959d30c8b4cee/html5/thumbnails/100.jpg)
Questions?