SFBay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business
Post on 11-May-2015
1
June 2014
Box + Solr = Content Search for Business
2
Wei Zhao
Box backend engineer
wzhao@box.com
3
Box mission
To make organizations more productive, competitive and collaborative by connecting people and their most important information.
4
25MM+ Users
225K+ Businesses
99% Fortune 500
5
Box search mission is to make user content easy to discover.
6
Box uses Solr for search
10 Billion+ Documents
10 TB+ Index size
100M+ Daily requests
7
Quick Search
8
Quick Search
9
Full Search
10
Agenda
1 Sharding – splitting the index
2 Highly available search
3 A few more things
4 Currently working on
5 Q&A
11
We shard things
12
Shard ID = File ID % Total Shards
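The modulo rule on this slide can be sketched in a few lines of Python. The function name `shard_id` and the shard count of 8 are illustrative assumptions; the slides do not state how many shards Box runs.

```python
def shard_id(file_id: int, total_shards: int) -> int:
    """Pick a shard for a file using the slide's rule: File ID % Total Shards."""
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    return file_id % total_shards


# Example with the File ID used later in the deck and an assumed 8 shards.
print(shard_id(12345, 8))
```

One consequence of this scheme is that changing the total number of shards changes the target shard for most existing file IDs, so a resize generally means moving or reindexing documents.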
13
Multi-tenant – One big logical index for all users
Solr index
Shard1 Shard2 Shard3 ShardN
14
Search scope
15
A typical Solr Document
File ID: 12345
Owner ID: user1
Parent Folder IDs: folder1, folder2
File Name: Solr.ppt
File Content: blah......
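As a rough illustration, the document on this slide could be represented as a Python dictionary and serialized to JSON for indexing. The field names (`file_id`, `owner_id`, `parent_folder_ids`, `file_name`, `file_content`) are hypothetical; the slides show only the human-readable labels, not the actual Box schema.

```python
import json


def to_solr_doc(file_id, owner_id, parent_folder_ids, file_name, file_content):
    """Build a dict shaped like the slide's example (field names are assumed)."""
    return {
        "file_id": file_id,
        "owner_id": owner_id,
        # Multi-valued field: one entry per parent folder.
        "parent_folder_ids": list(parent_folder_ids),
        "file_name": file_name,
        "file_content": file_content,
    }


doc = to_solr_doc(12345, "user1", ["folder1", "folder2"], "Solr.ppt", "blah......")
# A JSON array of documents is one of the body shapes Solr's update handler accepts.
payload = json.dumps([doc])
print(payload)
```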
16
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
17
User1 with no share folder
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
Filter: User1
File 1 File 2
File 3 File 4
18
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
19
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder2
Owner: User1, Parent: Folder1, Folder4
Filter: User1 + Folder2
File 1 File 2
File 3 File 4
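The "Filter: User1 + Folder2" idea can be sketched as a function that builds a Solr filter query string: match files the user owns, or files inside any folder shared with the user. The field names `owner_id` and `parent_folder_ids` are assumptions, and the sketch assumes IDs are plain alphanumeric tokens that need no query escaping.

```python
from typing import Sequence


def build_scope_filter(user_id: str, shared_folder_ids: Sequence[str]) -> str:
    """Return a filter string limiting results to what the user may see."""
    clauses = [f"owner_id:{user_id}"]
    if shared_folder_ids:
        # Any one of the shared folders is enough to make a file visible.
        folders = " OR ".join(shared_folder_ids)
        clauses.append(f"parent_folder_ids:({folders})")
    return " OR ".join(clauses)


print(build_scope_filter("User1", []))           # user with no shared folder
print(build_scope_filter("User1", ["Folder2"]))  # after Folder2 is shared
```

A string like this would typically be passed as the `fq` parameter of a Solr request, separate from the user's search terms in `q`.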
20
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder5
Owner: User1, Parent: Folder1, Folder4
File 1 File 2
File 3 File 4
Removed out of Folder2
21
User2 shares Folder2 with User1
Owner: User1, Parent: Folder1
Owner: User2, Parent: Folder3
Owner: User2, Parent: Folder5
Owner: User1, Parent: Folder1, Folder4
Filter: User1 + Folder2
File 1 File 2
File 3 File 4
Removed out of Folder2
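The effect of moving a file out of a shared folder can be simulated in memory. This is only a sketch: the pairing of each Owner/Parent label with a file number follows the order the labels appear in the transcript, which is an assumption, and the keys `owner` and `parents` are hypothetical names.

```python
def visible(doc, user_id, shared_folder_ids):
    """A file is in scope if the user owns it or it sits in a shared folder."""
    in_shared_folder = bool(set(doc["parents"]) & set(shared_folder_ids))
    return doc["owner"] == user_id or in_shared_folder


def search_scope(docs, user_id, shared_folder_ids):
    """Return the IDs of all files that pass the scope filter."""
    return [d["id"] for d in docs if visible(d, user_id, shared_folder_ids)]


docs = [
    {"id": 1, "owner": "User1", "parents": ["Folder1"]},
    {"id": 2, "owner": "User2", "parents": ["Folder3"]},
    {"id": 3, "owner": "User2", "parents": ["Folder2"]},
    {"id": 4, "owner": "User1", "parents": ["Folder1", "Folder4"]},
]

# Filter: User1 + Folder2, while the third file is still in Folder2.
before = search_scope(docs, "User1", ["Folder2"])

# The third file is moved from Folder2 to Folder5; the filter is unchanged.
docs[2]["parents"] = ["Folder5"]
after = search_scope(docs, "User1", ["Folder2"])

print(before, after)
```

With the same filter, the moved file drops out of User1's scope only once its indexed parent folder list reflects the move.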
22
Highly Available Search
23
• Index is highly available
• Search functionality is highly available
24
Index workflow
25
Box Front End
Upload
Index Queue
Queue 1
Queue 2
Queue 3
Indexer 1
Indexer 3
Indexer 2
MySQL
Index1
Index2
Index2
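A toy version of this workflow is sketched below: upload events land in a queue, are fanned out to per-indexer queues, and each indexer writes to its own index. The slide does not say how events are routed to queues; routing by `file_id % num_queues` is an assumption made here for illustration, as are all names in the code. Plain in-memory structures stand in for the real queueing system and Solr.

```python
from collections import deque


def route(events, num_queues=3):
    """Fan upload events out to per-indexer queues (routing rule is assumed)."""
    queues = [deque() for _ in range(num_queues)]
    for event in events:
        queues[event["file_id"] % num_queues].append(event)
    return queues


def run_indexer(queue, index):
    """Drain one queue and write each event into that indexer's index."""
    while queue:
        event = queue.popleft()
        index[event["file_id"]] = event["name"]


events = [{"file_id": i, "name": f"file{i}.txt"} for i in range(1, 7)]
queues = route(events)
indexes = [{} for _ in queues]
for queue, index in zip(queues, indexes):
    run_indexer(queue, index)

print(indexes)
```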
26
Search workflow
27
Box Front End
query
HA Proxy
Head node
HA Proxy
1 2 3 N
Box Front End
query
HA Proxy
Head node
HA Proxy
1 2 3 N
Data center boundary
28
A few more things
29
File Content Search
30
Box Front End
Upload
MySQL
Box File Storage
Indexer
Solr Index
Text Extraction
Extracted Text
31
Multi-language support
32
Raw file content
Language detector
English tokenizer
Spanish tokenizer
Japanese tokenizer
German tokenizer
file_content_en
file_content_es {hola}
file_content_ja ....
file_content_de
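The routing on this slide (detect the language, then store the text in a per-language field) can be sketched as follows. The detector here is a toy based on a few stopwords and a Unicode range check, standing in for whatever language detector Box actually uses; the stopword lists and function names are assumptions. Only the `file_content_<lang>` field naming comes from the slide.

```python
import re

# Tiny illustrative stopword lists, not a real language model.
STOPWORDS = {
    "en": {"the", "and", "is", "of"},
    "es": {"el", "la", "hola", "y", "es"},
    "de": {"der", "die", "das", "und", "ist"},
}


def detect_language(text: str) -> str:
    """Toy detector: kana means Japanese, otherwise the best stopword overlap."""
    if any("\u3040" <= ch <= "\u30ff" for ch in text):
        return "ja"
    words = set(re.findall(r"\w+", text.lower()))
    scores = {lang: len(words & stop) for lang, stop in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "en"  # default to English


def route_content(text: str) -> dict:
    """Place raw content in the field whose tokenizer matches its language."""
    return {f"file_content_{detect_language(text)}": text}


print(route_content("hola, el documento es nuevo"))
```

Each `file_content_<lang>` field would then be configured in the Solr schema with the tokenizer for that language.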
33
To Dos
• Scale language support
• Support documents with mixed languages
34
Search Warm-up
35
• Front end informs backend to warm up on keyboard focus
• Backend prepares the search filter and caches it in a search session
• Backend sends a warm-up query to Solr
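The three steps above can be sketched as a small session cache. Everything here is hypothetical: the class name, the injected `build_filter` and `send_query` callables, and the choice of a match-all query with zero rows as the warm-up query are assumptions, since the slides do not show the actual implementation.

```python
class SearchSessionCache:
    """Caches a user's search filter per session and warms up the search backend."""

    def __init__(self, build_filter, send_query):
        self._build_filter = build_filter  # callable: user_id -> filter string
        self._send_query = send_query      # callable taking q, fq, rows keywords
        self._filters = {}

    def warm_up(self, session_id, user_id):
        """Called when the front end reports keyboard focus on the search box."""
        fq = self._build_filter(user_id)
        self._filters[session_id] = fq
        # Assumed warm-up query: match everything in scope, fetch no rows.
        self._send_query(q="*:*", fq=fq, rows=0)

    def search(self, session_id, user_id, text):
        """Run the real query, reusing the cached filter when one exists."""
        fq = self._filters.get(session_id)
        if fq is None:
            fq = self._build_filter(user_id)
            self._filters[session_id] = fq
        return self._send_query(q=text, fq=fq, rows=10)
```

The point of the design is that the possibly expensive filter construction happens while the user is still typing, so the first real query does not pay for it.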
36
What we are working on
37
Things we are working on
• Search suggestions
• Search operators
• Use machine learning to influence ranking
• Logical sharding
38
Questions?
39
Contact: wzhao@box.com
We are hiring!