full-text search: how it works and what it can do – couchbase connect 2016

Post on 15-Feb-2017

©2016 Couchbase Inc.

Couchbase Full Text Search (FTS)

about your speakers

Marty Schoch, Steve Yen

agenda

• why?
• what is it?
• how does it work?
• how does it scale?
• demo
• best practices
• status / roadmap / what’s next

why?

couchbase users need to search their documents

dedicated search solutions

✗ Provision
✗ Install
✗ Integrate
✗ Transfer data
✗ Learn
✗ Manage
✗ Troubleshoot

why Full Text Search?

simple

integrated

80/20 of features

what is it?

what’s Full Text Search?

search results

Result Text Snippets

Highlighted Search Terms

how does it work?

• Inverted indexes
• Language awareness
• Scoring

inverted index

Terms → Where found

my: Doc 1, Doc 2, Doc 3
dog: Doc 1, Doc 2, Doc 81
has: Doc 1, Doc 2, Doc 3
fleas: Doc 1, Doc 81
…
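As a rough illustration of the terms-to-documents table above, here is a minimal Python sketch of an inverted index. The document texts, the `build_inverted_index` name, and the whitespace tokenizer are assumptions made for this example; it is not how FTS itself is implemented.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical documents, chosen so the result matches the table on the slide.
docs = {
    1: "my dog has fleas",
    2: "my dog has toys",
    3: "my cat has claws",
    81: "dog fleas",
}

index = build_inverted_index(docs)
print(index["dog"])    # [1, 2, 81]
print(index["fleas"])  # [1, 81]
```

A lookup is then a dictionary access per term, which is why searching does not need to scan every document.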

language aware

Document contains… Beauty → indexed as… beauti (stemming)

User searches… Beautiful → searched as… beauti (stemming)

Text analysis reduces both to the same term: ✔ Match!
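The match above depends on applying the same analysis at index time and at query time. The sketch below uses a deliberately tiny, made-up suffix stripper (`toy_stem`) that only handles this one example; it is an assumption for illustration and not the stemmer FTS uses.

```python
def toy_stem(word):
    """Toy suffix stripper: NOT a real stemming algorithm, just enough for the example."""
    w = word.lower()
    if w.endswith("ful"):
        w = w[:-3]           # "beautiful" -> "beauti"
    if w.endswith("y"):
        w = w[:-1] + "i"     # "beauty" -> "beauti"
    return w


indexed_as = toy_stem("Beauty")       # applied when the document is indexed
searched_as = toy_stem("Beautiful")   # applied when the user searches
print(indexed_as, searched_as, indexed_as == searched_as)  # beauti beauti True
```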

scoring

TF/IDF scoring

• TF = Term Frequency
  • How often does a term occur in a document?
  • More often yields a higher score
• IDF = Inverse Document Frequency
  • How many documents have this term?
  • More documents yields a lower score (because it means the term is more common)
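The two ideas above can be combined in a few lines. This sketch uses one common textbook variant (raw term count times `log(N / df) + 1`); the slides do not give a formula, so the exact weighting here is an assumption and the scoring FTS applies may differ.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Score one term for one document; corpus is a list of token lists."""
    tf = doc_tokens.count(term)                          # term frequency
    df = sum(1 for tokens in corpus if term in tokens)   # document frequency
    if tf == 0 or df == 0:
        return 0.0
    idf = math.log(len(corpus) / df) + 1.0               # rarer term -> larger idf
    return tf * idf


d1 = "my dog has fleas".split()
d2 = "my dog has a dog bed".split()
d3 = "my cat has claws".split()
corpus = [d1, d2, d3]

# "dog" occurs twice in d2, so d2 outscores d1 for that term.
print(tf_idf("dog", d2, corpus) > tf_idf("dog", d1, corpus))    # True
# "fleas" is rare and "my" is in every document, so "fleas" weighs more.
print(tf_idf("fleas", d1, corpus) > tf_idf("my", d1, corpus))   # True
```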

index mapping

• Exclude fields/sub-sections
• Configure indexing behavior by type of document (beer vs brewery)
• Configure indexing behavior per-field
• Index fields
• Nested structures
• Arrays
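To make the mapping ideas above concrete, here is a small sketch that selects fields by document type, follows nested structures with dotted paths, and walks arrays. The `MAPPING` layout and the function names are hypothetical and chosen for illustration; this is not the FTS mapping schema.

```python
# Hypothetical mapping: document type -> set of dotted field paths to index.
MAPPING = {
    "beer": {"name", "description", "tags"},
    "brewery": {"name", "address.city"},
}


def flatten(value, prefix=""):
    """Yield (dotted_path, leaf_value) pairs, descending into dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    elif isinstance(value, list):
        for item in value:               # every array element shares the field path
            yield from flatten(item, prefix)
    else:
        yield prefix, value


def fields_to_index(doc, mapping=MAPPING, type_field="type"):
    """Keep only the fields the mapping lists for this document's type."""
    allowed = mapping.get(doc.get(type_field), set())
    return [(path, val) for path, val in flatten(doc) if path in allowed]


brewery = {
    "type": "brewery",
    "name": "Acme",
    "address": {"city": "Portland", "zip": "97201"},
    "random_number": 4,
}
beer = {"type": "beer", "name": "IPA", "tags": ["hoppy", "bitter"]}

print(fields_to_index(brewery))  # [('name', 'Acme'), ('address.city', 'Portland')]
print(fields_to_index(beer))     # [('name', 'IPA'), ('tags', 'hoppy'), ('tags', 'bitter')]
```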

precision vs. recall

• Precision – ratio of document matches that are actually relevant
• Recall – ratio of relevant documents that are actually matched
• High quality results depend on performing the right analysis for your text
• Beware: increasing precision may reduce recall (and vice versa)
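The two ratios above are easy to compute once you know which documents matched and which were relevant. The document ids below are invented for the example.

```python
def precision_recall(matched, relevant):
    """Return (precision, recall) for a set of matched ids vs. the truly relevant ids."""
    matched, relevant = set(matched), set(relevant)
    hits = len(matched & relevant)
    precision = hits / len(matched) if matched else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}

# A strict analysis matches few documents: all are relevant, but half are missed.
print(precision_recall({1, 2}, relevant))                    # (1.0, 0.5)
# A loose analysis matches everything relevant, plus as much noise again.
print(precision_recall({1, 2, 3, 4, 5, 6, 7, 8}, relevant))  # (0.5, 1.0)
```

The two calls show the trade-off from the last bullet: loosening the analysis raised recall from 0.5 to 1.0 while precision fell from 1.0 to 0.5.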

how does it scale?

✔auto index partitioning (hash partitioning)

✔to multiple FTS nodes (auto-placement)

✔rebalance (add/swap/remove)

✔scatter/gather queries (partial results ok)

✔replicas (only primaries queried)

✔failover (replicas promoted)
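Two items from the checklist above, hash partitioning and failover with replica promotion, can be sketched as follows. The CRC32-modulo hash, the layout dictionary, and the function names are assumptions for illustration; they are not Couchbase's actual key-to-vbucket function or FTS's internal data structures.

```python
import zlib

NUM_VBUCKETS = 1024


def vbucket_for(key):
    """Illustrative hash partitioning only; not Couchbase's exact function."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def fail_over(layout, dead_node):
    """Return a new layout in which replicas replace primaries lost with dead_node."""
    new_layout = {}
    for partition, placement in layout.items():
        primary = placement["primary"]
        replicas = [n for n in placement["replicas"] if n != dead_node]
        if primary == dead_node:
            if not replicas:
                raise RuntimeError(f"partition {partition} has no surviving copy")
            primary, replicas = replicas[0], replicas[1:]   # promote first replica
        new_layout[partition] = {"primary": primary, "replicas": replicas}
    return new_layout


# Hypothetical placement of three index partitions on FTS nodes X, Y, Z.
layout = {
    "A": {"primary": "X", "replicas": ["Y"]},
    "B": {"primary": "Y", "replicas": ["Z"]},
    "C": {"primary": "Z", "replicas": ["X"]},
}

after = fail_over(layout, "X")
print(after["A"])  # {'primary': 'Y', 'replicas': []}
```

Queries only ever touch the `primary` entries, which matches the "only primaries queried" note above.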

demo

best practices

only use explicit field mappings in production

{ "type": "brewery", "random_number": 4, "edible": false }

Dynamic mappings are great, until…

{ "type": "brewery", "comments": "…4k of text…", "random_number": 4, "edible": false }

Developer adds one small field, and a dynamic mapping indexes it automatically
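The effect of that one new field can be estimated with a rough sketch. `indexed_bytes` is a made-up helper that just sums the length of the values handed to the indexer; real index size depends on analysis and storage, so treat the numbers as illustrative only.

```python
def indexed_bytes(doc, explicit_fields=None):
    """Rough size of the text handed to the indexer.

    explicit_fields=None models a dynamic mapping (every field is indexed);
    otherwise only the listed fields are indexed.
    """
    if explicit_fields is None:
        selected = doc
    else:
        selected = {k: v for k, v in doc.items() if k in explicit_fields}
    return sum(len(str(v)) for v in selected.values())


before = {"type": "brewery", "random_number": 4, "edible": False}
after = dict(before, comments="x" * 4096)   # the "one small field"

# Dynamic mapping: the new field is picked up without anyone asking for it.
print(indexed_bytes(after) - indexed_bytes(before))   # 4096
# Explicit mapping: the indexed size does not change.
print(indexed_bytes(after, {"type"}))                 # 7
```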

always use Index Aliases

Index Rebuilding

• /users (the alias) points to /usersV1
• /usersV2 is built alongside /usersV1 (Indexing 55%)
• Atomic Switch to /usersV2: /users now points to /usersV2
• /usersV1 is no longer needed
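The alias pattern above can be sketched with a tiny in-memory registry. `AliasTable` and its methods are hypothetical names for this example, not the FTS alias API; the point is only that the application keeps querying one stable name while the target behind it is swapped in a single step.

```python
import threading


class AliasTable:
    """Hypothetical in-memory alias registry; not the FTS alias API."""

    def __init__(self):
        self._targets = {}
        self._lock = threading.Lock()

    def point(self, alias, index_name):
        """Create the alias or retarget it in one locked step."""
        with self._lock:
            self._targets[alias] = index_name

    def resolve(self, alias):
        """Return the index currently behind the alias."""
        with self._lock:
            return self._targets[alias]


aliases = AliasTable()
aliases.point("users", "usersV1")   # the application always queries "users"
first = aliases.resolve("users")
# ... usersV2 is built in the background while usersV1 keeps serving ...
aliases.point("users", "usersV2")   # single switch once usersV2 is ready
second = aliases.resolve("users")
print(first, second)                # usersV1 usersV2
```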

go watch!

Dave Starling

seenit

status / roadmap / what’s next

project status

FTS is in developer preview in 4.5 and 4.6

planned GA in Spock

please help kick the tires

http://www.couchbase.com/download

thanks!

links & Q+A

http://NICE-URL-TODO-HERE

downloads, getting started, tech docs

and, where you can ask questions

and share your feedback!

EXTRA SLIDES

FTS design

[diagram: three couchbase nodes, three FTS nodes, and a cfg bucket]

DCP streams for incremental index updates

a cfg bucket holds metadata about the indexes

design

FTS design / index partitioning

bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets)

index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023

assign to FTS nodes (X, Y, Z); replicas, too
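The grouping and assignment described above can be sketched as follows. The group size of 400 reproduces the A/B/C ranges from the slide; the round-robin placement and the function names are assumptions, since the slides do not say how FTS chooses nodes.

```python
def group_vbuckets(num_vbuckets=1024, per_partition=400):
    """Split vbucket ids into contiguous groups, one group per index partition."""
    return [
        range(start, min(start + per_partition, num_vbuckets))
        for start in range(0, num_vbuckets, per_partition)
    ]


def place(partition_names, nodes, num_replicas=1):
    """Assumed round-robin placement; needs num_replicas < len(nodes)."""
    layout = {}
    for i, name in enumerate(partition_names):
        primary = nodes[i % len(nodes)]
        # Replicas go on the following nodes so they never share the primary's node.
        replicas = [nodes[(i + r) % len(nodes)] for r in range(1, num_replicas + 1)]
        layout[name] = {"primary": primary, "replicas": replicas}
    return layout


groups = group_vbuckets()
print([(g[0], g[-1]) for g in groups])   # [(0, 399), (400, 799), (800, 1023)]

placement = place(["A", "B", "C"], ["X", "Y", "Z"])
print(placement["A"])                    # {'primary': 'X', 'replicas': ['Y']}
```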

FTS design / indexing

[diagram: three couchbase nodes feeding three FTS nodes]

DCP streams for incremental index updates

FTS design / queries

a query sent to any FTS node (from your application, over REST)…

…is scatter / gathered to the other FTS nodes
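A minimal sketch of the scatter/gather step: the receiving node asks every node for its hits, merges them by score, and tolerates nodes that fail (partial results ok). The per-node query functions, the `(score, doc_id)` hit shape, and the sequential loop are assumptions for the example; a real coordinator would fan out concurrently.

```python
import heapq


def scatter_gather(query_fn_by_node, query, limit=10):
    """Query every node, merge hits by score, and report nodes that failed."""
    hits, failed = [], []
    for node, query_fn in query_fn_by_node.items():
        try:
            hits.extend(query_fn(query))   # each hit is (score, doc_id)
        except Exception:
            failed.append(node)            # partial results are acceptable
    top = heapq.nlargest(limit, hits, key=lambda hit: hit[0])
    return top, failed


def node_x(query):
    return [(0.9, "doc1"), (0.2, "doc7")]


def node_y(query):
    return [(0.5, "doc3")]


def node_z(query):
    raise ConnectionError("node Z unreachable")


top, failed = scatter_gather({"X": node_x, "Y": node_y, "Z": node_z}, "fleas", limit=2)
print(top)     # [(0.9, 'doc1'), (0.5, 'doc3')]
print(failed)  # ['Z']
```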
