SF Bay Area Solr Meetup - June 18th: Box + Solr = Content Search for Business

Post on 11-May-2015

DESCRIPTION

"Box + Solr = Content Search for Business" - Wei Zhao, Box

TRANSCRIPT

1

June 2014

Box + Solr = Content Search for Business

2

Wei Zhao

Box backend engineer

wzhao@box.com

3

to make organizations more productive, competitive and collaborative by connecting people and their most important information

Box mission

4

25MM+ Users

225K+ Businesses

99% Fortune 500

5

Box search mission is to make user content easy to discover.

6

10 Billion+ Documents

10TB+ Index size

100M+ Daily requests

Box uses Solr for search

7

Quick Search

8

Quick Search

9

Full Search

10

Agenda

1 Sharding – splitting the index

2 Highly available search

3 A few more things

4 Currently working on

5 Q&A

11

We shard things

12

Shard ID = File ID % Total Shards
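The routing rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming integer file IDs; the function name and the shard count of 8 are hypothetical and not taken from the talk.

```python
def shard_id(file_id: int, total_shards: int) -> int:
    """Return the shard a file belongs to, using modulo routing."""
    if total_shards <= 0:
        raise ValueError("total_shards must be positive")
    return file_id % total_shards


# File 12345 in a hypothetical 8-shard index.
print(shard_id(12345, 8))
```

One consequence of this scheme is that changing the total number of shards changes the shard of most files, so a resize implies moving or re-indexing documents.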

13

Multi-tenant – One big logical index for all users

Solr index

Shard1 Shard2 Shard3 ShardN

14

Search scope

15

File ID: 12345

OwnerID: user1

Parent Folders IDs: folder1, folder2

File Name: Solr.ppt

File Content: blah......

A typical Solr Document
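As a rough sketch, the document on this slide could be serialized as JSON for indexing. The field names below (`id`, `owner_id`, `parent_folder_ids`, `file_name`, `file_content`) are assumptions made for illustration; the talk does not show the actual schema.

```python
import json

# Hypothetical field names; the values are the ones shown on the slide.
doc = {
    "id": "12345",
    "owner_id": "user1",
    "parent_folder_ids": ["folder1", "folder2"],  # multi-valued field
    "file_name": "Solr.ppt",
    "file_content": "blah......",
}

# Solr's JSON update handler accepts a list of documents in this shape.
payload = json.dumps([doc])
print(payload)
```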

16

Owner: User1, Parent: Folder1

Owner: User2, Parent: Folder3

Owner: User2, Parent: Folder2

Owner: User1, Parent: Folder1, Folder4

File 1 File 2

File 3 File 4

17

User1 with no shared folders

Owner: User1, Parent: Folder1

Owner: User2, Parent: Folder3

Owner: User2, Parent: Folder2

Owner: User1, Parent: Folder1, Folder4

Filter: User1

File 1 File 2

File 3 File 4

18

User2 shares Folder2 with User1

Owner: User1, Parent: Folder1

Owner: User2, Parent: Folder3

Owner: User2, Parent: Folder2

Owner: User1, Parent: Folder1, Folder4

File 1 File 2

File 3 File 4

19

User2 shares Folder2 with User1

Owner: User1, Parent: Folder1

Owner: User2, Parent: Folder3

Owner: User2, Parent: Folder2

Owner: User1, Parent: Folder1, Folder4

Filter: User1 + Folder2

File 1 File 2

File 3 File 4
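The filter "User1 + Folder2" can be read as a Solr filter query that matches documents the user owns or that sit under a folder shared with the user. The sketch below builds such a string; the field names `owner_id` and `parent_folder_ids` and the helper name are hypothetical, and the real filter used at Box may look different.

```python
def build_scope_filter(user_id: str, shared_folder_ids: list[str]) -> str:
    """Build a filter-query string limiting results to a user's search scope."""
    # Documents owned by the user are always in scope.
    clauses = [f"owner_id:{user_id}"]
    if shared_folder_ids:
        # Documents under any folder shared with the user are also in scope.
        folders = " OR ".join(shared_folder_ids)
        clauses.append(f"parent_folder_ids:({folders})")
    return " OR ".join(clauses)


print(build_scope_filter("user1", []))
print(build_scope_filter("user1", ["folder2"]))
```

Filtering on owner and folder IDs keeps the query small even when a shared folder holds many files, since the filter lists folders rather than individual documents.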

20

User2 shares Folder2 with User1

Owner: User1, Parent: Folder1

Owner: User2, Parent: Folder3

Owner: User2, Parent: Folder5

Owner: User1, Parent: Folder1, Folder4

File 1 File 2

File 3 File 4

Moved out of Folder2

21

User2 shares Folder2 with User1

Owner: User1, Parent: Folder1

Owner: User2, Parent: Folder3

Owner: User2, Parent: Folder5

Owner: User1, Parent: Folder1, Folder4

Filter: User1 + Folder2

File 1 File 2

File 3 File 4

Moved out of Folder2
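The effect shown on the last few slides can be simulated in memory: the filter stays "User1 + Folder2", and a file leaves User1's scope as soon as its parent folder changes. The dictionaries and the `visible` helper below are illustrative assumptions, not Box's implementation, and the list order simply follows the order of the labels on the slide.

```python
def visible(doc: dict, user_id: str, shared_folders: list[str]) -> bool:
    """True if the user owns the file or it sits under a folder shared with them."""
    return doc["owner"] == user_id or bool(set(doc["parents"]) & set(shared_folders))


docs = [
    {"owner": "User1", "parents": ["Folder1"]},
    {"owner": "User2", "parents": ["Folder3"]},
    {"owner": "User2", "parents": ["Folder2"]},
    {"owner": "User1", "parents": ["Folder1", "Folder4"]},
]

# User2 has shared Folder2 with User1.
before = [visible(d, "User1", ["Folder2"]) for d in docs]

# The third file is moved from Folder2 to Folder5; the filter itself is unchanged.
docs[2]["parents"] = ["Folder5"]
after = [visible(d, "User1", ["Folder2"]) for d in docs]

print(before)
print(after)
```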

22

Highly Available Search

23

• Index is highly available

• Search functionality is highly available

24

Index workflow

25

Box Front End

Upload

Index Queue

Queue 1

Queue 2

Queue 3

Indexer 1

Indexer 3

Indexer 2

MySQL

Index1

Index2

Index2
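One reading of this diagram is that every index copy has its own queue and its own indexer, so a slow or failed indexer only delays its own copy. The sketch below models that reading with in-memory queues; the class and function names are hypothetical, and the role MySQL plays in the real pipeline is not modeled.

```python
from collections import deque


class IndexFanout:
    """Copies each index event onto one queue per index copy (assumed design)."""

    def __init__(self, copies: int):
        self.queues = [deque() for _ in range(copies)]

    def publish(self, event: dict) -> None:
        # Every index copy receives its own copy of the event.
        for q in self.queues:
            q.append(dict(event))


def run_indexer(queue: deque, index: dict) -> int:
    """Drain one queue into one index copy; return how many events were applied."""
    applied = 0
    while queue:
        event = queue.popleft()
        index[event["file_id"]] = event
        applied += 1
    return applied


fanout = IndexFanout(copies=3)
indexes = [{}, {}, {}]
fanout.publish({"file_id": 12345, "file_name": "Solr.ppt"})

# Pretend the third indexer is down: only the first two copies catch up.
for q, idx in zip(fanout.queues[:2], indexes[:2]):
    run_indexer(q, idx)
```

In this model the event for the third copy stays in its queue until that indexer runs again, at which point the copy converges with the others.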

26

Search workflow

27

Box Front End

query

HA Proxy

Head node

HA Proxy

1 2 3 N

Box Front End

query

HA Proxy

Head node

HA Proxy

1 2 3 N

Data center boundary

28

A few more things

29

File Content Search

30

Box Front End

Upload

MySQL

Box File Storage

Indexer

Solr Index

Text Extraction

Extracted Text

31

Multi-language support

32

Raw file content

Language detector

English tokenizer

Spanish tokenizer

Japanese tokenizer

German tokenizer

file_content_en

file_content_es {hola}

file_content_ja....

file_content_de
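The routing on this slide amounts to picking a per-language field from the detector's output. Below is a minimal sketch, assuming the detector returns a two-letter language code; the `detect` callable is a stand-in for whatever detector is used, and falling back to English for unsupported languages is an assumption, not something stated in the talk.

```python
SUPPORTED_LANGUAGES = {"en", "es", "ja", "de"}


def content_field(lang: str, default: str = "en") -> str:
    """Map a detected language code to the per-language content field."""
    code = lang if lang in SUPPORTED_LANGUAGES else default
    return f"file_content_{code}"


def route_content(text: str, detect) -> dict:
    """Return the field/value pair to index for this text.

    `detect` is a hypothetical callable returning a language code.
    """
    return {content_field(detect(text)): text}


# A stub detector that always answers "es", for illustration only.
print(route_content("hola", lambda text: "es"))
```

Each field can then be configured with the tokenizer for its language, which is what lets the same raw content pipeline serve several languages.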

33

To Dos

• Scale language support

• Support documents with mixed languages

34

Search Warm-up

35

• Front end informs backend to warm up on keyboard focus

• Backend prepares the search filter and caches it in a search session

• Backend sends a warm-up query to Solr
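The three steps above can be sketched as a small session object. Everything here is illustrative: `SearchSession`, `build_filter` and `send_query` are hypothetical names, and using the match-all query `*:*` as the warm-up query is an assumption, since the talk does not say what the warm-up query looks like.

```python
class SearchSession:
    """Caches the user's search filter so the first real query can reuse it."""

    def __init__(self, build_filter, send_query):
        self._build_filter = build_filter  # potentially expensive to compute
        self._send_query = send_query      # sends (query, filter) to Solr
        self._filter = None

    def warm_up(self, user_id: str) -> None:
        # Called when the search box receives keyboard focus.
        self._filter = self._build_filter(user_id)
        # Run a cheap query with the filter so Solr can cache the filter result.
        self._send_query("*:*", self._filter)

    def search(self, user_id: str, text: str):
        # Fall back to building the filter if no warm-up happened.
        if self._filter is None:
            self._filter = self._build_filter(user_id)
        return self._send_query(text, self._filter)
```

With this shape the filter is built once per session, and the user's first keystroke-driven query only pays for the text match.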

36

What we are working on

37

• Search suggestions

• Search operators

• Use machine learning to influence ranking

• Logical sharding

Things we are working on

38

Questions?

39

Contact: wzhao@box.com

We are hiring!
