![Page 2: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/2.jpg)
Agenda
‣ Introduction
- Search Architecture
- Lucene Extensions
- Outlook
Search @twitter
![Page 3: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/3.jpg)
![Page 4: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/4.jpg)
Introduction
![Page 5: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/5.jpg)
Introduction
Twitter has more than 284 million monthly active users.
![Page 6: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/6.jpg)
Introduction
500 million tweets are sent per day.
![Page 7: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/7.jpg)
Introduction
More than 300 billion tweets have been sent since the company was founded in 2006.
![Page 8: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/8.jpg)
Introduction
Tweets-per-second record: one-second peak of 143,199 TPS.
![Page 9: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/9.jpg)
Introduction
More than 2 billion search queries per day.
![Page 10: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/10.jpg)
Search @twitter
Agenda
- Introduction
‣ Search Architecture
- Lucene Extensions
- Outlook
![Page 11: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/11.jpg)
![Page 12: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/12.jpg)
Search Architecture
![Page 13: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/13.jpg)
Search Architecture
[Diagram. Components: RT stream, Analyzer/Partitioner, RT index (Earlybird), Tweet archive (HDFS), Mapreduce Analyzer, Archive index, Blender. Edge labels: raw tweets, analyzed tweets, writes, searches, search requests.]
![Page 14: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/14.jpg)
Search Architecture
[Diagram. Components: Tweets, Analyzer/Partitioner, queue, RT index (Earlybird), HDFS, Mapreduce Analyzer, Archive index, Blender. Edge labels: Updates Deletes/Engagement (e.g. retweets/favs), writes, searches, search requests.]
![Page 15: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/15.jpg)
Search Architecture
[Diagram. Components: RT index (Earlybird), Archive index, Blender. Edge labels: writes, searches, search requests.]
• Blender is our Thrift service aggregator
• Queries multiple Earlybirds, merges results
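The merge step can be illustrated with a small sketch. It assumes each partition returns its hits already sorted by descending score and combines them with a k-way merge; the class `ResultMergeSketch`, the `Hit` shape and the method `mergeTopK` are hypothetical names for illustration, not Blender's actual code or API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of merging partition results; not Twitter's Blender code. */
final class ResultMergeSketch {

    /** A scored hit as a partition might return it (illustrative shape). */
    static final class Hit {
        final long tweetId;
        final double score;

        Hit(long tweetId, double score) {
            this.tweetId = tweetId;
            this.score = score;
        }
    }

    /** Merges per-partition lists, each sorted by descending score, into the global top k. */
    static List<Hit> mergeTopK(List<List<Hit>> partitions, int k) {
        // Heap entries are {partition index, position}; the best current head comes first.
        PriorityQueue<int[]> heads = new PriorityQueue<>(
                (a, b) -> Double.compare(
                        partitions.get(b[0]).get(b[1]).score,
                        partitions.get(a[0]).get(a[1]).score));
        for (int p = 0; p < partitions.size(); p++) {
            if (!partitions.get(p).isEmpty()) {
                heads.add(new int[] {p, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !heads.isEmpty()) {
            int[] head = heads.poll();
            List<Hit> source = partitions.get(head[0]);
            merged.add(source.get(head[1]));
            // Advance within the partition that just contributed a hit.
            if (head[1] + 1 < source.size()) {
                heads.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }
}
```

Only the current head of each partition sits in the heap, so the cost per returned hit grows with the logarithm of the number of partitions rather than with the total number of hits.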
[Diagram, continued. Additional components: Social graph, User search.]
![Page 16: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/16.jpg)
Search Architecture
[Diagram: RT index (Earlybird), Archive index, User search]
![Page 17: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/17.jpg)
Search Architecture
[Diagram: RT index (Earlybird), Archive index, User search, Lucene]
• For historical reasons, these used to be entirely different codebases, but had similar features/technologies
• Over time, cross-dependencies were introduced to share code
![Page 18: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/18.jpg)
Search Architecture
[Diagram: RT index (Earlybird), Archive index, User search, Lucene Extensions, Lucene]
• New Lucene extension package
• This package is truly generic and has no dependency on an actual product/index
• It contains Twitter’s extensions for real-time search, a thin segment management layer and other features
![Page 19: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/19.jpg)
Search @twitter
Agenda
- Introduction
- Search Architecture
‣ Lucene Extensions
- Outlook
![Page 20: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/20.jpg)
![Page 21: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/21.jpg)
Lucene Extensions
![Page 22: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/22.jpg)
Lucene Extension Library
• Abstraction layer for Lucene index segments
• Real-time writer for in-memory index segments
• Schema-based Lucene document factory
• Real-time faceting
![Page 23: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/23.jpg)
• API layer for Lucene segments
• *IndexSegmentWriter
• *IndexSegmentAtomicReader
• Two implementations
• In-memory: RealtimeIndexSegmentWriter (and reader)
• On-disk: LuceneIndexSegmentWriter (and reader)
Lucene Extension Library
![Page 24: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/24.jpg)
• IndexSegments can be built ...
• in realtime
• on Mesos or Hadoop (Mapreduce)
• locally on serving machines
• Cluster-management code that deals with IndexSegments
• Share segments across serving machines using HDFS
• Can rebuild segments (e.g. to upgrade Lucene version, change data schema, etc.)
Lucene Extension Library
![Page 25: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/25.jpg)
[Diagram: HDFS, three Earlybird machines, Mesos, Hadoop (MR), RT pipeline]
Lucene Extension Library
![Page 26: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/26.jpg)
RealtimeIndexSegmentWriter
• Modified Lucene index implementation optimized for realtime search
• IndexWriter buffer is searchable (no need to flush to allow searching)
• In-memory
• Lock-free concurrency model for best performance
![Page 27: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/27.jpg)
Concurrency - Definitions
• Pessimistic locking
• A thread holds an exclusive lock on a resource while an action is performed [mutual exclusion]
• Usually used when conflicts are expected to be likely
• Optimistic locking
• Operations are attempted atomically without holding a lock; conflicts can be detected, and retry logic is often used when a conflict occurs
• Usually used when conflicts are expected to be the exception
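The two definitions can be contrasted in a few lines of Java. This is a minimal sketch with hypothetical names (`LockingStylesSketch` and its methods): the pessimistic variant uses `synchronized`, and the optimistic variant uses a compare-and-set retry loop on an `AtomicInteger`.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: the same counter update written in both locking styles. */
final class LockingStylesSketch {
    private int lockedValue;
    private final AtomicInteger casValue = new AtomicInteger();

    /** Pessimistic: hold the exclusive monitor lock for the whole update. */
    synchronized int incrementPessimistic() {
        lockedValue = lockedValue + 1;
        return lockedValue;
    }

    /** Optimistic: compute without a lock, publish with CAS, retry on conflict. */
    int incrementOptimistic() {
        while (true) {
            int seen = casValue.get();
            int next = seen + 1;
            if (casValue.compareAndSet(seen, next)) {
                return next;
            }
            // Another thread changed the value in between: conflict detected, try again.
        }
    }
}
```

When conflicts are rare, the retry loop almost always succeeds on the first attempt, which is why the optimistic form suits the "conflicts are the exception" case.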
![Page 28: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/28.jpg)
Concurrency - Definitions
• Non-blocking algorithm
Ensures that threads competing for shared resources do not have their execution indefinitely postponed by mutual exclusion.
• Lock-free algorithm
A non-blocking algorithm is lock-free if there is guaranteed system-wide progress.
• Wait-free algorithm
A non-blocking algorithm is wait-free if there is guaranteed per-thread progress.
* Source: Wikipedia
![Page 29: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/29.jpg)
Concurrency
• Having a single writer thread simplifies our problem: no locks have to be used to protect data structures from corruption (only one thread modifies data)
• But: we have to make sure that all readers always see a consistent state of all data structures -> this is much harder than it sounds!
• In Java, one thread is not guaranteed to see the changes another thread makes in program execution order, unless both threads cross the same memory barrier -> safe publication
• Safe publication can be achieved in different, subtle ways. For more information, read the great book “Java Concurrency in Practice” by Brian Goetz!
![Page 30: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/30.jpg)
Java Memory Model
• Program order rule
Each action in a thread happens-before every action in that thread that comes later in the program order.
• Volatile variable rule
A write to a volatile field happens-before every subsequent read of that same field.
• Transitivity
If A happens-before B, and B happens-before C, then A happens-before C.
* Source: Brian Goetz: Java Concurrency in Practice
![Page 31: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/31.jpg)
Concurrency
int x;
[Diagram: timeline for Thread 1 and Thread 2; x is 0 in RAM and the cache is empty.]
![Page 32: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/32.jpg)
Concurrency
int x;
Thread 1: x = 5;
[Diagram: Thread 1 writes x = 5 to the cache; x is still 0 in RAM.]
![Page 33: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/33.jpg)
Concurrency
int x;
Thread 1: x = 5;
Thread 2: while(x != 5);
[Diagram: x is 5 in the cache and still 0 in RAM.]
This condition will likely never become false!
![Page 34: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/34.jpg)
Concurrency
int x;
[Diagram: timeline for Thread 1 and Thread 2; x is 0 in RAM and the cache is empty.]
![Page 35: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/35.jpg)
Concurrency
int x;
volatile int b;
Thread 1: x = 5; b = 1;
[Diagram: x is 5 in the cache and 0 in RAM; b is 1 in RAM.]
Thread 1 writes b = 1 to RAM, because b is volatile
![Page 36: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/36.jpg)
Concurrency
int x;
volatile int b;
Thread 1: x = 5; b = 1;
Thread 2: int dummy = b; while(x != 5);
[Diagram: x is 5 in the cache and 0 in RAM; b is 1 in RAM.]
Thread 2 reads the volatile b
![Page 37: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/37.jpg)
Concurrency
int x;
volatile int b;
Thread 1: x = 5; b = 1;
Thread 2: int dummy = b; while(x != 5);
[Diagram: x is 5 in the cache and 0 in RAM; b is 1 in RAM; a happens-before arrow is shown.]
• Program order rule: Each action in a thread happens-before every action in that thread that comes later in the program order.
![Page 38: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/38.jpg)
Concurrency
int x;
volatile int b;
Thread 1: x = 5; b = 1;
Thread 2: int dummy = b; while(x != 5);
[Diagram: x is 5 in the cache and 0 in RAM; b is 1 in RAM; a happens-before arrow is shown.]
• Volatile variable rule: A write to a volatile field happens-before every subsequent read of that same field.
![Page 39: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/39.jpg)
Concurrency
int x;
volatile int b;
Thread 1: x = 5; b = 1;
Thread 2: int dummy = b; while(x != 5);
[Diagram: x is 5 in the cache and 0 in RAM; b is 1 in RAM; a happens-before arrow is shown.]
• Transitivity: If A happens-before B, and B happens-before C, then A happens-before C.
![Page 40: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/40.jpg)
Concurrency
int x;
volatile int b;
Thread 1: x = 5; b = 1;
Thread 2: int dummy = b; while(x != 5);
[Diagram: x is 5 in the cache and 0 in RAM; b is 1 in RAM.]
This condition will be false, i.e. x==5
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
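The walkthrough above can be turned into a small runnable sketch. The class and method names (`VolatilePiggybackSketch`, `writeThenRead`) are made up for illustration. One deliberate difference from the slide: the reader spins on the volatile `b` until it sees 1 instead of reading it once, so the volatile variable rule is certain to apply before `x` is read.

```java
/** Illustrative sketch: a plain field published through a single volatile field. */
final class VolatilePiggybackSketch {
    static int x;           // plain field, deliberately not volatile
    static volatile int b;  // the single volatile field

    /** Runs one writer/reader round and returns the value of x the reader saw. */
    static int writeThenRead() {
        x = 0;
        b = 0;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (b != 1) {
                // Spin on the volatile read until the writer's volatile write is visible.
            }
            // Program order + volatile rule + transitivity: x = 5 happens-before this read.
            seen[0] = x;
        });
        reader.start();

        x = 5;  // plain write
        b = 1;  // volatile write, publishes the earlier plain write as well

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

Under the Java Memory Model rules quoted earlier, `writeThenRead()` returns 5 on every run, even though `x` is never declared volatile.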
![Page 41: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/41.jpg)
Concurrency
int x;
volatile int b;
Thread 1: x = 5; b = 1;
Thread 2: int dummy = b; while(x != 5);
[Diagram: x is 5 in the cache and 0 in RAM; b is 1 in RAM; a memory barrier is marked.]
• Note: x itself doesn’t have to be volatile. There can be many variables like x, but we need only a single volatile field.
![Page 42: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/42.jpg)
![Page 43: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/43.jpg)
Demo
![Page 44: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/44.jpg)
![Page 45: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/45.jpg)
Concurrency
[Timeline diagram]
IndexWriter: write 100 docs; maxDoc = 100; write more docs
IndexReader: in IR.open(): read maxDoc; search up to maxDoc
maxDoc is volatile
![Page 46: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/46.jpg)
Concurrency
[Timeline diagram with a happens-before arrow]
IndexWriter: write 100 docs; maxDoc = 100; write more docs
IndexReader: in IR.open(): read maxDoc; search up to maxDoc
maxDoc is volatile
• Only maxDoc is volatile. All other fields that IW writes to and IR reads from don’t need to be!
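A minimal sketch of this pattern, assuming a single writer thread and a fixed-capacity array. The class `SingleWriterSegmentSketch` and its methods are hypothetical and far simpler than a real index segment; they only show how one volatile `maxDoc` can publish plain writes.

```java
/** Illustrative single-writer segment: only maxDoc is volatile. */
final class SingleWriterSegmentSketch {
    private final int[] termOfDoc;  // plain array, modified by the single writer only
    private volatile int maxDoc;    // the only volatile field

    SingleWriterSegmentSketch(int capacity) {
        termOfDoc = new int[capacity];
    }

    /** Writer thread only: fill the slot first, then publish it through maxDoc. */
    void addDocument(int termId) {
        int docId = maxDoc;
        termOfDoc[docId] = termId;  // plain write
        maxDoc = docId + 1;         // volatile write makes the plain write visible to readers
    }

    /** Reader: a single volatile read fixes the visible prefix of the segment. */
    int openReader() {
        return maxDoc;
    }

    /** Counts matching docs below the snapshot; documents added later are ignored. */
    int countMatches(int snapshotMaxDoc, int termId) {
        int matches = 0;
        for (int docId = 0; docId < snapshotMaxDoc; docId++) {
            if (termOfDoc[docId] == termId) {
                matches++;
            }
        }
        return matches;
    }
}
```

A reader that took its snapshot at `maxDoc = 100` keeps seeing exactly 100 documents while the writer continues to append, which mirrors the timeline on the slide.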
![Page 47: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/47.jpg)
• Not a single exclusive lock
• Writer thread can always make progress
• Optimistic locking (retry-logic) in a few places for searcher thread
• Retry logic very simple and guaranteed to always make progress
Wait-free
![Page 48: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/48.jpg)
In-memory Real-time Index
• Highly optimized for GC - all data is stored in blocked native arrays
• v1: Optimized for tweets with a term position limit of 255
• v2: Support for 32 bit positions without performance degradation
• v2: Basic support for out-of-order posting list inserts
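One way to read "blocked native arrays" is a pool of fixed-size primitive `int[]` blocks addressed by a single integer, so the heap holds a handful of large arrays instead of many small objects. The sketch below follows that assumption; `BlockedIntArraySketch` and the block size are illustrative choices, and the class is single-threaded as written.

```java
import java.util.Arrays;

/** Illustrative append-only int storage built from fixed-size primitive blocks. */
final class BlockedIntArraySketch {
    private static final int BLOCK_SHIFT = 15;
    private static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;  // 32768 ints per block
    private static final int BLOCK_MASK = BLOCK_SIZE - 1;

    private int[][] blocks = new int[4][];
    private int size;

    /** Appends a value and returns its address (a plain int, not an object reference). */
    int append(int value) {
        int block = size >>> BLOCK_SHIFT;
        if (block == blocks.length) {
            // Only the small table of block references grows; existing blocks are never copied.
            blocks = Arrays.copyOf(blocks, blocks.length * 2);
        }
        if (blocks[block] == null) {
            blocks[block] = new int[BLOCK_SIZE];
        }
        blocks[block][size & BLOCK_MASK] = value;
        return size++;
    }

    /** Reads the value stored at an address returned by append. */
    int get(int address) {
        return blocks[address >>> BLOCK_SHIFT][address & BLOCK_MASK];
    }
}
```

Because values live in a few large primitive arrays, the garbage collector has very few objects to trace, and growing the structure never moves data that is already written.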
![Page 49: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/49.jpg)
![Page 50: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/50.jpg)
In-memory Real-time Index
• RT term dictionary
• Term lookups using a lock-free hashtable in O(1)
• v2: Additional probabilistic, lock-free skip list maintains ordering on terms
• Perfect skip list not an option: out-of-order inserts would require rebalancing, which is impractical with our lock-free index
• In a probabilistic skip list, the tower height of a new (out-of-order) item can be determined without knowing its insert position, simply by rolling a die
![Page 51: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/51.jpg)
In-memory Real-time Index
• Perfect skip list
![Page 52: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/52.jpg)
In-memory Real-time Index
• Perfect skip list
Inserting a new element in the middle of this skip list requires re-balancing the towers.
![Page 53: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/53.jpg)
In-memory Real-time Index
• Probabilistic skip list
![Page 54: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/54.jpg)
In-memory Real-time Index
• Probabilistic skip list
Tower height is determined by rolling a die BEFORE knowing the insert location; the tower height never has to change for an element, simplifying memory allocation and concurrency.
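The "rolling a die" step is commonly implemented as repeated coin flips. The sketch below assumes a promotion probability of 1/2 and a caller-supplied height cap; both are illustrative choices, as is the name `TowerHeightSketch`.

```java
import java.util.Random;

/** Illustrative tower-height draw for a probabilistic skip list. */
final class TowerHeightSketch {

    /**
     * Returns a height in [1, maxHeight]. Each extra level is added with
     * probability 1/2, independent of where the element will be inserted.
     */
    static int randomHeight(Random random, int maxHeight) {
        int height = 1;
        while (height < maxHeight && random.nextBoolean()) {
            height++;
        }
        return height;
    }
}
```

Since the height depends only on the random draws, it can be fixed and the tower allocated before the insert position is known, and it never needs to be revised afterwards.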
![Page 55: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/55.jpg)
• Apps provide one ThriftSchema per index and create a ThriftDocument for each document
• SchemaDocumentFactory translates ThriftDocument -> Lucene Document using the Schema
• Default field values
• Extended field settings
• Type-system on top of DocValues
• Validation
Schema-based Document factory
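The translation step can be sketched without Thrift or Lucene on the classpath. In this simplified stand-in, plain maps play the role of the ThriftDocument and the Lucene Document, and `SchemaFactorySketch`, `FieldSpec` and `toIndexDocument` are hypothetical names; it shows only validation and default values, not Lucene field settings or DocValues typing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative schema-driven translation using plain maps as stand-in documents. */
final class SchemaFactorySketch {

    /** One schema entry: expected type plus an optional default (null means required). */
    static final class FieldSpec {
        final Class<?> type;
        final Object defaultValue;

        FieldSpec(Class<?> type, Object defaultValue) {
            this.type = type;
            this.defaultValue = defaultValue;
        }
    }

    /** Validates the input against the schema and fills in default values. */
    static Map<String, Object> toIndexDocument(Map<String, FieldSpec> schema,
                                               Map<String, Object> input) {
        for (String name : input.keySet()) {
            if (!schema.containsKey(name)) {
                throw new IllegalArgumentException("unknown field: " + name);
            }
        }
        Map<String, Object> document = new LinkedHashMap<>();
        for (Map.Entry<String, FieldSpec> entry : schema.entrySet()) {
            String name = entry.getKey();
            FieldSpec spec = entry.getValue();
            Object value = input.containsKey(name) ? input.get(name) : spec.defaultValue;
            if (value == null) {
                throw new IllegalArgumentException("missing required field: " + name);
            }
            if (!spec.type.isInstance(value)) {
                throw new IllegalArgumentException("wrong type for field: " + name);
            }
            document.put(name, value);
        }
        return document;
    }
}
```

Keeping all field knowledge in the schema means the factory itself contains nothing specific to one product or index.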
![Page 56: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/56.jpg)
Schema-based Document factory
Schema
Lucene Document
SchemaDocumentFactory
Thrift Document
• Validation
• Fill in default values
• Apply correct Lucene field settings
![Page 57: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/57.jpg)
Schema-based Document factory
Schema
Lucene Document
SchemaDocumentFactory
Thrift Document
• Validation
• Fill in default values
• Apply correct Lucene field settings
Decouples core package from specific product/index. Similar to Solr/ElasticSearch.
![Page 58: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/58.jpg)
Search @twitter
Agenda
- Introduction
- Search Architecture
- Lucene Extensions
‣ Outlook
![Page 59: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/59.jpg)
![Page 60: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/60.jpg)
Outlook
![Page 61: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/61.jpg)
Outlook
• Support for parallel (sliced) segments to support partial segment rebuilds and other cool posting list update patterns
• Add remaining missing Lucene features to RT index
• Index term statistics for ranking
• Term vectors
• Stored fields
![Page 62: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/62.jpg)
Questions?
Michael Busch
![Page 63: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/63.jpg)
![Page 64: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/64.jpg)
Backup Slides
![Page 65: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/65.jpg)
Searching for top entities within Tweets
• Task: Find the best photos in a subset of tweets
• We could use a Lucene index, where each photo is a document
• Problem: How to update existing documents when the same photos are tweeted again?
• In-place posting list updates are hard
• Lucene’s updateDocument() is a delete/add operation - expensive and not order-preserving
![Page 66: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/66.jpg)
Searching for top entities within Tweets
• Task: Find the best photos in a subset of tweets
• Could we use our existing time-ordered tweet index?
• Facets!
![Page 67: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/67.jpg)
[Diagram: index structures and their mappings]
Inverted index: Query -> Doc ids
Forward index: Doc id -> Document metadata
Facet index: Doc id -> Term ids
Inverted index: Term id -> Term label
Searching for top entities within Tweets
![Page 68: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/68.jpg)
Storing tweet metadata
Facet index: Doc id -> Term ids
![Page 69: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/69.jpg)
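[Diagram: the query yields matching doc ids (5, 15, 9000, 9002, 100000, 100090); the facet index maps each matching doc id to term ids, which are counted in a top-k heap.]

Top-k heap:

| Id | Count |
|---|---|
| 48239 | 8 |
| 31241 | 2 |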
Searching for top entities within Tweets
![Page 70: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/70.jpg)
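[Diagram: the query yields matching doc ids (5, 15, 9000, 9002, 100000, 100090); the facet index maps each matching doc id to term ids, which are counted in a top-k heap.]

Top-k heap:

| Id | Count |
|---|---|
| 48239 | 15 |
| 31241 | 12 |
| 85932 | 8 |
| 6748 | 3 |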
Searching for top entities within Tweets
![Page 71: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/71.jpg)
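[Diagram: the query yields matching doc ids (5, 15, 9000, 9002, 100000, 100090); the facet index maps each matching doc id to term ids, which are counted in a top-k heap.]

Top-k heap:

| Id | Count |
|---|---|
| 48239 | 15 |
| 31241 | 12 |
| 85932 | 8 |
| 6748 | 3 |

Weighted counts (from engagement features) used for relevance scoring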
Searching for top entities within Tweets
![Page 72: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/72.jpg)
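[Diagram: the query yields matching doc ids (5, 15, 9000, 9002, 100000, 100090); the facet index maps each matching doc id to term ids, which are counted in a top-k heap.]

Top-k heap:

| Id | Count |
|---|---|
| 48239 | 15 |
| 31241 | 12 |
| 85932 | 8 |
| 6748 | 3 |

All query operators can be used. E.g. find best photos in San Francisco tweeted by people I follow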
Searching for top entities within Tweets
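The counting step shown on these slides can be sketched as follows. The sketch assumes the facet index is a simple map from doc id to term ids and counts each occurrence with weight 1, where the slides mention weighted counts from engagement features; `FacetTopKSketch` and `topK` are illustrative names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Illustrative facet counting over matching doc ids with a bounded top-k heap. */
final class FacetTopKSketch {

    /** Returns up to k {termId, count} pairs, highest count first. */
    static List<int[]> topK(int[] matchingDocIds, Map<Integer, int[]> facetIndex, int k) {
        // Count how often each facet term id occurs among the matching documents.
        Map<Integer, Integer> counts = new HashMap<>();
        for (int docId : matchingDocIds) {
            int[] termIds = facetIndex.get(docId);
            if (termIds == null) {
                continue;  // this document carries no facets
            }
            for (int termId : termIds) {
                counts.merge(termId, 1, Integer::sum);
            }
        }
        // Min-heap on count: the weakest of the current top k is evicted first.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            heap.add(new int[] {entry.getKey(), entry.getValue()});
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<int[]> top = new ArrayList<>(heap);
        top.sort((a, b) -> Integer.compare(b[1], a[1]));
        return top;
    }
}
```

The matching doc ids can come from any query, which is why every query operator carries over to entity search without reindexing documents.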
![Page 73: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/73.jpg)
Inverted index: Term id -> Term label
Searching for top entities within Tweets
![Page 74: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/74.jpg)
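| Id | Count |
|---|---|
| 48239 | 45 |
| 31241 | 23 |
| 85932 | 15 |
| 6748 | 11 |
| 74294 | 8 |
| 3728 | 5 |

Inverted index (Term id -> Term label)

| Label | Count |
|---|---|
| pic.twitter.com/jknui4w | 45 |
| pic.twitter.com/dslkfj83 | 23 |
| pic.twitter.com/acm3ps | 15 |
| pic.twitter.com/948jdsd | 11 |
| pic.twitter.com/dsjkf15h | 8 |
| pic.twitter.com/irnsoa32 | 5 |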
Searching for top entities within Tweets
![Page 75: Search at Twitter: Presented by Michael Busch, Twitter](https://reader033.vdocument.in/reader033/viewer/2022052700/55a20a9f1a28aba5368b46da/html5/thumbnails/75.jpg)
Summary
• Indexing tweet entities (e.g. photos) as facets makes it possible to search and rank top entities using a tweet index
• All query operators supported
• Documents don’t need to be reindexed
• Approach reusable for different use cases, e.g.: best vines, hashtags, @mentions, etc.