full-text search: how it works and what it can do – couchbase connect 2016
TRANSCRIPT
©2016 Couchbase Inc.
Couchbase Full Text Search (FTS)
about your speakers
Marty Schoch, Steve Yen
agenda
• why?
• what is it?
• how does it work?
• how does it scale?
• demo
• best practices
• status / roadmap / what’s next
why?
couchbase users need to search their documents
dedicated search solutions
✗ Provision
✗ Install
✗ Integrate
✗ Transfer data
✗ Learn
✗ Manage
✗ Troubleshoot
≠
why Full Text Search?
simple
integrated
80/20 of features
what is it?
what’s Full Text Search?
search results
Result Text Snippets
Highlighted Search Terms
how does it work?
• Inverted indexes
• Language awareness
• Scoring
inverted index
Terms: Where found
my: Doc 1, Doc 2, Doc 3
dog: Doc 1, Doc 2, Doc 81
has: Doc 1, Doc 2, Doc 3
fleas: Doc 1, Doc 81
…
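A minimal sketch of the term-to-documents table above, assuming a plain lowercase whitespace tokenizer. The sample document texts are hypothetical, chosen only so that the postings match the slide; this is an illustration, not how FTS stores its index.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents that contain every term of the query."""
    postings = [set(index.get(term, ())) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical documents (assumed for illustration only).
docs = {
    1: "my dog has fleas",
    2: "my dog has toys",
    3: "my cat has toys",
    81: "dog fleas itch",
}
index = build_inverted_index(docs)
```

A lookup is then a dictionary access per term, for example `search(index, "dog fleas")`, rather than a scan over every document.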
language aware
stemming (text analysis)
Document contains… Beauty → Indexed as… beauti
User searches… Beautiful → Searched as… beauti
✔ Match!
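A toy sketch of why the match above succeeds: the same analysis runs at index time and at query time, so both words reduce to one term. The two suffix rules below are an assumption made for this example; they are not a real stemming algorithm and not the analyzer FTS ships.

```python
def toy_stem(word):
    """Reduce a word with two illustrative suffix rules (not a real stemmer)."""
    word = word.lower()
    if word.endswith("ful") and len(word) > 5:
        word = word[:-3]          # beautiful -> beauti
    if word.endswith("y") and len(word) > 2:
        word = word[:-1] + "i"    # beauty -> beauti
    return word


def analyze(text):
    """Apply the same analysis to document text and to query text."""
    return [toy_stem(token) for token in text.split()]


indexed_terms = analyze("Beauty")
query_terms = analyze("Beautiful")
```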
scoring
TF/IDF scoring
• TF = Term Frequency
  • How often does a term occur in a document?
  • More often yields a higher score
• IDF = Inverse Document Frequency
  • How many documents have this term?
  • More documents yields a lower score (because it means the term is more common)
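A minimal sketch of the two factors above, using the textbook form tf × log(N / df). This is one common variant chosen for illustration; the exact formula and normalization used by FTS are not stated in these slides, and the tiny corpus is hypothetical.

```python
import math


def tf(term, doc_terms):
    """Term frequency: how often the term occurs in one document."""
    return doc_terms.count(term)


def idf(term, corpus):
    """Inverse document frequency: rarer terms get a larger weight."""
    df = sum(1 for doc_terms in corpus if term in doc_terms)
    return math.log(len(corpus) / df) if df else 0.0


def score(term, doc_terms, corpus):
    """Textbook TF-IDF score of one term for one document."""
    return tf(term, doc_terms) * idf(term, corpus)


# Hypothetical, already-tokenized corpus.
corpus = [
    ["my", "dog", "has", "fleas"],
    ["my", "dog", "dog", "barks"],
    ["my", "cat", "sleeps"],
]
```

In this corpus "my" occurs in every document, so its IDF is log(1) = 0 and it contributes nothing, while "fleas" occurs in only one document and scores highest.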
index mapping
• Exclude fields/sub-sections
• Configure indexing behavior by type of document (beer vs brewery)
• Configure indexing behavior per-field
• Index Fields
• Nested structures
• Arrays
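One way to picture how nested structures and arrays can be indexed is to flatten a document into (field path, value) pairs before analysis. The sketch below assumes a dotted-path naming convention and a made-up brewery document; it is an illustration, not the FTS mapping format.

```python
def flatten(value, prefix=""):
    """Yield (field_path, scalar) pairs for nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, path)
    elif isinstance(value, list):
        # Every array element is emitted under the same field path.
        for child in value:
            yield from flatten(child, prefix)
    else:
        yield prefix, value


# Hypothetical document, assumed for illustration only.
doc = {
    "type": "brewery",
    "name": "Example Brewing",
    "geo": {"lat": 1.0, "lon": 2.0},
    "tags": ["ale", "stout"],
}
fields = list(flatten(doc))
```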
precision vs. recall
• Precision – ratio of document matches that are actually relevant
• Recall – ratio of relevant documents that are actually matched
• High quality results depend on performing the right analysis for your text
• Beware: increasing precision may reduce recall (and vice versa)
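The two ratios above can be computed directly from sets of document ids. The ids below are hypothetical: a strict query that matches little and a loose query that matches a lot.

```python
def precision_recall(matched, relevant):
    """Return (precision, recall) for a set of matches against the relevant set."""
    matched, relevant = set(matched), set(relevant)
    hits = matched & relevant
    precision = len(hits) / len(matched) if matched else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {1, 2, 3, 4}                      # documents the user actually wants
strict = precision_recall({1, 2}, relevant)  # few matches, all relevant
loose = precision_recall({1, 2, 3, 4, 5, 6, 7, 8}, relevant)  # everything found, plus noise
```

The strict query has perfect precision but misses half of the relevant documents; the loose query finds them all but half of its matches are noise.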
how does it scale?
✔ auto index partitioning (hash partitioning)
✔ to multiple FTS nodes (auto-placement)
✔ rebalance (add/swap/remove)
✔ scatter/gather queries (partial results ok)
✔ replicas (only primaries queried)
✔ failover (replicas promoted)
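A minimal sketch of hash partitioning: a document key is hashed and the hash picks one of a fixed number of partitions, so placement needs no lookup table. CRC32 modulo the partition count is a stand-in chosen for this example, not necessarily the exact function Couchbase uses; the count of 1024 comes from the vbucket figure in the extra slides.

```python
import zlib

NUM_PARTITIONS = 1024  # matches the 1024 vbuckets mentioned in the extra slides


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a document key to a partition id (illustrative hash, see lead-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Hypothetical document keys.
placements = {key: partition_for(key) for key in ("beer-1", "beer-2", "brewery-9")}
```

The same key always lands in the same partition, which is what lets any node work out where a document's index entries live.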
demo
best practices
only use explicit field mappings in production
{ "type": "brewery", "random_number": 4, "edible": false }
Dynamic mappings are great, until…
{ "type": "brewery", "comments": <4k of text>, "random_number": 4, "edible": false }
Developer adds one small field
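A rough sketch of the situation above, assuming that a dynamic mapping indexes every field it finds while an explicit mapping indexes only the fields listed for a document type. The mapping shape, helper names, and the byte count as a stand-in for index size are all hypothetical.

```python
def dynamic_fields(doc):
    """Dynamic mapping: every field present in the document gets indexed."""
    return dict(doc)


def explicit_fields(doc, mapping):
    """Explicit mapping: only fields listed for the document's type get indexed."""
    allowed = mapping.get(doc.get("type"), ())
    return {name: value for name, value in doc.items() if name in allowed}


def indexed_bytes(fields):
    """Crude stand-in for index size: total length of the indexed values."""
    return sum(len(str(value)) for value in fields.values())


EXPLICIT_MAPPING = {"brewery": ("type",)}  # hypothetical mapping
before = {"type": "brewery", "random_number": 4, "edible": False}
after = dict(before, comments="x" * 4096)  # the "one small field"
```

With the dynamic mapping the new `comments` field is picked up automatically and the indexed volume jumps by about 4 KB per document; with the explicit mapping nothing changes until someone decides to index that field.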
always use Index Aliases
Index Rebuilding
• /users is an alias for the index /usersV1
• /usersV2 is built next to /usersV1 (Indexing 55%)
• Atomic Switch to /usersV2
• /users is now an alias for /usersV2; /usersV1 is gone
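A small in-memory sketch of the alias idea: the application only ever asks for `users`, and one guarded assignment repoints the alias once the rebuilt index is ready. The `AliasTable` class is hypothetical and only models the behavior; it is not the FTS alias API.

```python
import threading


class AliasTable:
    """Toy alias registry: alias name -> name of the index that serves it."""

    def __init__(self):
        self._targets = {}
        self._lock = threading.Lock()

    def point(self, alias, index_name):
        """Repoint the alias in one step; readers see the old or the new target."""
        with self._lock:
            self._targets[alias] = index_name

    def resolve(self, alias):
        """Return the index currently behind the alias."""
        with self._lock:
            return self._targets[alias]


aliases = AliasTable()
aliases.point("users", "usersV1")
served_during_rebuild = aliases.resolve("users")  # usersV2 is still indexing
aliases.point("users", "usersV2")                 # the atomic switch
served_after_switch = aliases.resolve("users")
```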
go watch!
Dave Starling
seenit
status / roadmap / what’s next
project status
FTS is developer preview in 4.5, 4.6
planned GA in Spock
please help kick the tires
http://www.couchbase.com/download
Couchbase Full Text Search (FTS)
thanks!
links & Q+A
http://NICE-URL-TODO-HERE
downloads, getting started, tech docs
and, where you can ask questions
and share your feedback!
EXTRA SLIDES
FTS design
couchbase couchbase couchbase
FTS FTS FTS
cfg
DCP streams for incremental index updates
a cfg bucket holds metadata about the indexes
design
FTS design / index partitioning
bucket partitions: 0, 1, 2, 3, 4, …, 1021, 1022, 1023 (1024 vbuckets)
index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023
assign to FTS nodes; replicas, too
FTS nodes: X, Y, Z
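A sketch of the layout above: 1024 vbuckets are grouped into contiguous index partitions, each partition gets a primary and a replica on different nodes, and a failed node's primaries are replaced by their replicas. The group size of 400 reproduces the slide's ranges; the round-robin placement and the helper names are assumptions, since the slides do not say which node receives which partition.

```python
def group_vbuckets(num_vbuckets=1024, size=400):
    """Split vbucket ids into contiguous ranges of at most `size` ids."""
    return [range(start, min(start + size, num_vbuckets))
            for start in range(0, num_vbuckets, size)]


def place(partition_names, nodes):
    """Assumed policy: round-robin primaries, replica on the next node over."""
    layout = {}
    for i, name in enumerate(partition_names):
        layout[name] = {
            "primary": nodes[i % len(nodes)],
            "replica": nodes[(i + 1) % len(nodes)],
        }
    return layout


def fail_over(layout, dead_node):
    """Promote the replica wherever the primary lived on the dead node."""
    for slot in layout.values():
        if slot["primary"] == dead_node:
            slot["primary"], slot["replica"] = slot["replica"], None
        elif slot["replica"] == dead_node:
            slot["replica"] = None
    return layout


groups = group_vbuckets()
layout = place(["A", "B", "C"], ["X", "Y", "Z"])
```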
FTS design / indexing
couchbase couchbase couchbase
FTS FTS FTS
DCP streams for incremental index updates
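A sketch of what "incremental" means here: each change event touches only the postings of the document it concerns, so the index is never rebuilt from scratch. The `(op, doc_id, text)` event shape is invented for this example and is not the DCP wire protocol.

```python
def apply_event(index, doc_terms, event):
    """Apply one change event to an inverted index, in place.

    index:     term -> set of doc ids
    doc_terms: doc id -> set of terms currently indexed for that document
    event:     (op, doc_id, text), where op is "mutation" or "deletion"
    """
    op, doc_id, text = event
    # Drop the document's old postings first, so an update replaces stale terms.
    for term in doc_terms.pop(doc_id, set()):
        index[term].discard(doc_id)
        if not index[term]:
            del index[term]
    if op == "mutation":
        terms = set(text.lower().split())
        doc_terms[doc_id] = terms
        for term in terms:
            index.setdefault(term, set()).add(doc_id)


index, doc_terms = {}, {}
apply_event(index, doc_terms, ("mutation", 1, "my dog has fleas"))
```

A later `("mutation", 1, "my dog is clean")` removes `fleas` from the index and adds `clean`; a `("deletion", 1, None)` removes document 1 entirely.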
FTS design / queries
a query sent to any FTS node…
…is scatter / gathered to the other FTS nodes
your application
REST
FTS FTS FTS
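A simplified sketch of scatter/gather with partial results: the coordinating node asks every node for its hits, merges them by score, and reports which nodes failed instead of failing the whole query. The node functions and the `(score, doc_id)` hit shape are hypothetical, and the loop is sequential for brevity where a real coordinator would presumably fan out concurrently.

```python
import heapq


def scatter_gather(query, nodes, limit=10):
    """Query every node, merge hits by score, tolerate failed nodes."""
    hits, failed = [], []
    for name, search_fn in nodes.items():
        try:
            hits.extend(search_fn(query))   # scatter
        except Exception:
            failed.append(name)             # partial results are ok
    top = heapq.nlargest(limit, hits, key=lambda hit: hit[0])  # gather
    return {"hits": top, "failed_nodes": failed, "partial": bool(failed)}


# Hypothetical nodes returning (score, doc_id) pairs.
def node_x(query):
    return [(0.9, "doc1"), (0.2, "doc7")]


def node_y(query):
    return [(0.5, "doc3")]


def node_z(query):
    raise TimeoutError("node Z unavailable")


result = scatter_gather("dog", {"X": node_x, "Y": node_y, "Z": node_z}, limit=2)
```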