©2015 Couchbase Inc. 3
about the speakers
Marty Schoch
lead contributor to bleve
the most popular, open-source
full-text indexing engine
for golang
agenda
why cbft?
what’s full-text search and how’s it work?
design
demo
status / roadmap / what’s next
why cbft?
couchbase connectors… yet another tier & cluster to manage
[comparison table: three connector rows, each marked “yes yes”; the only surviving row label is Lucidworks]
search results
• Spelling Suggestions
• Result Text Snippets
• Highlighted Search Terms
JSON document in Couchbase
Key: akay1980
Document: {
  "name": "Alan Kay",
  "description": "... the wisest engineer ..."
}
Text Analysis : tokenizer + token filters
A pipeline of transformations
One Tokenizer
Zero or more Token Filters
“… the wisest engineer …”
Tokenizer: the | wisest | engineer
• Seems like simple whitespace… but this doesn’t work for all languages
• Unicode standard rules help (see Unicode Standard Annex #29)
• Still need to account for exceptions
  • E-mail addresses and URLs don’t follow normal rules
Token Filters, applied to: the | wisest | engineer
Stop Word Removal: wisest | engineer (“the” is dropped)
Stemming: wise | engineer
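The pipeline above (one tokenizer, then zero or more token filters) can be sketched in a few lines of Go. This is a minimal illustration, not bleve's API: the type names, the letter-based tokenizer, and the lookup-table "stemmer" are all assumptions made for the example, and a real analyzer would follow Unicode Standard Annex #29 and use a proper stemming algorithm.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Tokenizer splits raw text into tokens.
type Tokenizer func(text string) []string

// TokenFilter turns one token stream into another.
type TokenFilter func(tokens []string) []string

// Analyzer is one tokenizer followed by zero or more token filters.
type Analyzer struct {
	Tokenizer Tokenizer
	Filters   []TokenFilter
}

// Analyze runs the tokenizer, then each filter in order.
func (a Analyzer) Analyze(text string) []string {
	tokens := a.Tokenizer(text)
	for _, filter := range a.Filters {
		tokens = filter(tokens)
	}
	return tokens
}

// letterTokenizer splits on any rune that is not a letter or digit.
// Simplification: it ignores the UAX #29 rules and the e-mail/URL exceptions.
func letterTokenizer(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// removeStopWords drops every token found in the stop set.
func removeStopWords(stop map[string]bool) TokenFilter {
	return func(tokens []string) []string {
		kept := make([]string, 0, len(tokens))
		for _, t := range tokens {
			if !stop[t] {
				kept = append(kept, t)
			}
		}
		return kept
	}
}

// lookupStemmer replaces tokens using a fixed table. It is a hypothetical
// stand-in for a real stemming algorithm.
func lookupStemmer(stems map[string]string) TokenFilter {
	return func(tokens []string) []string {
		out := make([]string, 0, len(tokens))
		for _, t := range tokens {
			if stem, ok := stems[t]; ok {
				t = stem
			}
			out = append(out, t)
		}
		return out
	}
}

// newAnalyzer wires up the example pipeline: tokenize, drop "the", stem.
func newAnalyzer() Analyzer {
	return Analyzer{
		Tokenizer: letterTokenizer,
		Filters: []TokenFilter{
			removeStopWords(map[string]bool{"the": true}),
			lookupStemmer(map[string]string{"wisest": "wise", "engineers": "engineer"}),
		},
	}
}

func main() {
	fmt.Println(newAnalyzer().Analyze("... the wisest engineer ..."))
	// prints: [wise engineer]
}
```

Because filters are plain functions applied in order, adding or removing one changes the analysis without touching the tokenizer.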
Inverted Index
…
wise → …, akay1980, …
…
engineer → …, akay1980, …
…
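As a rough sketch of the structure above, an inverted index can be modeled as a map from each analyzed term to the set of document keys that contain it. The `InvertedIndex` type and the second document key `doc2` are hypothetical, chosen only for illustration; real engines store much more per term (positions, frequencies) and persist the data.

```go
package main

import (
	"fmt"
	"sort"
)

// InvertedIndex maps an analyzed term to the set of document keys containing it.
type InvertedIndex map[string]map[string]bool

// Add records that the document docID contains each of the given terms.
func (idx InvertedIndex) Add(docID string, terms []string) {
	for _, term := range terms {
		if idx[term] == nil {
			idx[term] = map[string]bool{}
		}
		idx[term][docID] = true
	}
}

// Lookup returns the sorted document keys for an exact term.
func (idx InvertedIndex) Lookup(term string) []string {
	ids := make([]string, 0, len(idx[term]))
	for id := range idx[term] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	idx := InvertedIndex{}
	// Terms are assumed to be already analyzed (stop words removed, stemmed).
	idx.Add("akay1980", []string{"wise", "engineer"})
	idx.Add("doc2", []string{"software", "engineer"}) // hypothetical second document

	fmt.Println(idx.Lookup("wise"))     // prints: [akay1980]
	fmt.Println(idx.Lookup("engineer")) // prints: [akay1980 doc2]
}
```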
Search
engineers → (analysis) → engineer → Exact Match in the Inverted Index
Apply the same analysis at search time that we used at index time.
Inverted Index
…
wise → …, akay1980, …
…
engineer → …, akay1980, …
…
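The point of applying the same analysis at search time can be shown with a small sketch. The `analyze` function here is a toy stand-in (lowercase, then strip one trailing "s") and is an assumption for illustration only; what matters is that the raw query term "engineers" finds nothing, while the analyzed term "engineer" is an exact match.

```go
package main

import (
	"fmt"
	"strings"
)

// analyze is a toy stand-in for the index-time analyzer:
// lowercase the term and strip a single trailing "s".
func analyze(term string) string {
	return strings.TrimSuffix(strings.ToLower(term), "s")
}

// search looks up a term in the index, optionally analyzing it first.
func search(index map[string][]string, term string, useAnalysis bool) []string {
	if useAnalysis {
		term = analyze(term)
	}
	return index[term] // exact match on the (possibly analyzed) term
}

func main() {
	// Index contents as they would look after index-time analysis.
	index := map[string][]string{
		"wise":     {"akay1980"},
		"engineer": {"akay1980"},
	}

	fmt.Println(search(index, "engineers", false)) // prints: []
	fmt.Println(search(index, "engineers", true))  // prints: [akay1980]
}
```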
Document Scoring
• tf/idf scoring
  • Term Frequency
    • How often does a term occur in a doc?
    • More often yields a higher score
  • Inverse Document Frequency
    • How many docs have this term?
    • More docs yield lower score (because the term is more common)
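One common textbook form of this score multiplies the raw term count by the logarithm of (total docs / docs containing the term). The sketch below uses that form as an assumption; it is not necessarily the exact formula bleve applies, and the function name and sample numbers are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// tfidf scores one term in one document.
//   termCount:    how often the term occurs in the document (term frequency)
//   docsWithTerm: how many documents contain the term (document frequency)
//   totalDocs:    number of documents in the index
func tfidf(termCount, docsWithTerm, totalDocs int) float64 {
	if termCount == 0 || docsWithTerm == 0 || totalDocs == 0 {
		return 0
	}
	tf := float64(termCount)
	idf := math.Log(float64(totalDocs) / float64(docsWithTerm))
	return tf * idf
}

func main() {
	// Same term rarity, more occurrences in the doc: higher score.
	fmt.Println(tfidf(1, 10, 1000) < tfidf(3, 10, 1000)) // prints: true
	// Same occurrences, term found in more docs: lower score.
	fmt.Println(tfidf(1, 500, 1000) < tfidf(1, 10, 1000)) // prints: true
}
```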
Quality Results
• Getting high quality results depends on the right text analysis
• Beware: adjustments that increase precision may reduce recall (and the other way around)
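For reference, precision is the fraction of returned results that are relevant, and recall is the fraction of relevant documents that were returned. A small sketch with made-up counts (the numbers are hypothetical) shows how the two are computed:

```go
package main

import "fmt"

// precisionRecall computes both measures from three counts:
//   relevantRetrieved: returned results that are actually relevant
//   retrieved:         all returned results
//   relevant:          all relevant documents in the collection
func precisionRecall(relevantRetrieved, retrieved, relevant int) (precision, recall float64) {
	if retrieved > 0 {
		precision = float64(relevantRetrieved) / float64(retrieved)
	}
	if relevant > 0 {
		recall = float64(relevantRetrieved) / float64(relevant)
	}
	return precision, recall
}

func main() {
	// Hypothetical query: 4 results returned, 3 of them relevant,
	// while the collection holds 6 relevant documents in total.
	p, r := precisionRecall(3, 4, 6)
	fmt.Println(p, r) // prints: 0.75 0.5
}
```

Aggressive stemming, for example, tends to return more documents (recall goes up) but also more loosely related ones (precision goes down).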
cbft design / index partitioning
bucket partitions: 0, 1, 2, 3, 4, … … ,1021, 1022, 1023 (1024 vbuckets)
index partitions:       A       B         C
(groups of vbuckets)    0-399   400-799   800-1023
cbft nodes:             X       Y         Z
assign to cbft nodes (replicas, too)
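The partitioning scheme above can be sketched as two small steps: map a vbucket number to its index partition, then place each partition (and its replicas) on cbft nodes. The vbucket ranges come from the slide; the round-robin placement policy and all type and function names are assumptions for illustration and do not describe cbft's actual planner.

```go
package main

import "fmt"

// Partition is an index partition covering a contiguous range of vbuckets.
type Partition struct {
	Name        string
	First, Last int // inclusive vbucket range
}

// partitions mirrors the grouping shown on the slide.
var partitions = []Partition{
	{"A", 0, 399},
	{"B", 400, 799},
	{"C", 800, 1023},
}

// partitionFor returns the name of the index partition that owns a vbucket.
func partitionFor(vbucket int) (string, bool) {
	for _, p := range partitions {
		if vbucket >= p.First && vbucket <= p.Last {
			return p.Name, true
		}
	}
	return "", false
}

// assign places partition i's primary on node i (wrapping around) and each
// replica on the following nodes. Round-robin is an assumed policy.
func assign(parts []Partition, nodes []string, replicas int) map[string][]string {
	placement := map[string][]string{}
	for i, p := range parts {
		for r := 0; r <= replicas && r < len(nodes); r++ {
			placement[p.Name] = append(placement[p.Name], nodes[(i+r)%len(nodes)])
		}
	}
	return placement
}

func main() {
	name, _ := partitionFor(512)
	fmt.Println(name) // prints: B

	placement := assign(partitions, []string{"X", "Y", "Z"}, 1)
	fmt.Println(placement["A"], placement["B"], placement["C"])
	// prints: [X Y] [Y Z] [Z X]
}
```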
cbft design / indexing
couchbase nodes → DCP streams → cbft nodes
cbft design / queries
your application → REST → cbft
a query sent to any cbft node…
…is scatter / gathered to the other cbft nodes
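Scatter / gather can be sketched with goroutines: the receiving node fans the query out to every node, collects the partial results, and merges them by score. Everything here is a simplified assumption: nodes are simulated as in-process functions rather than REST calls, and the `Hit`, `Node`, and `staticNode` names and the sample scores are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Hit is one search result from one index partition.
type Hit struct {
	DocID string
	Score float64
}

// Node answers a query from its local index partitions.
// In a real deployment this would be a network call to another cbft node.
type Node func(query string) []Hit

// staticNode simulates a node that always returns the same hits.
func staticNode(hits ...Hit) Node {
	return func(query string) []Hit { return hits }
}

// scatterGather sends the query to all nodes concurrently (scatter),
// then merges the partial results into one ranked list (gather).
func scatterGather(nodes []Node, query string, limit int) []Hit {
	results := make(chan []Hit, len(nodes))
	for _, n := range nodes {
		go func(n Node) { results <- n(query) }(n)
	}

	var merged []Hit
	for range nodes {
		merged = append(merged, <-results...)
	}

	// Highest score first; ties broken by document key for a stable order.
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Score != merged[j].Score {
			return merged[i].Score > merged[j].Score
		}
		return merged[i].DocID < merged[j].DocID
	})
	if len(merged) > limit {
		merged = merged[:limit]
	}
	return merged
}

func main() {
	nodes := []Node{
		staticNode(Hit{"akay1980", 0.9}), // node X
		staticNode(Hit{"doc2", 0.4}),     // node Y
		staticNode(Hit{"doc3", 0.7}),     // node Z
	}
	for _, h := range scatterGather(nodes, "engineers", 2) {
		fmt.Println(h.DocID, h.Score)
	}
	// prints:
	// akay1980 0.9
	// doc3 0.7
}
```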
project status
cbft is developer preview!
please help kick the tires
http://labs.couchbase.com/cbft
project status / roadmap / what’s next
                                         today   future
bleve full-text engine                     y       y
advanced mappings                          y       y
faceted search                             y       y
incremental indexing                       y       y
index partitioning and replication         y       y
index aliases                              y       y
integrated into Couchbase Server & N1QL            y
API stability                                      y
production quality                                 y
performance optimization / tuning                  y
forestdb storage & partial rollbacks               y
security, SSL                                      y
more docs, examples, SDK support                   y
links & Q+A
http://labs.couchbase.com/cbft
downloads, getting started, tech docs
and, share your feedback!
THANKS! (and please do the survey!)