Slide 1

Slide 1 text

Couchbase Server Full Text Preview and Demo

Slide 2

Slide 2 text

©2015 Couchbase Inc.
Marty Schoch
• Engineer with Couchbase for 4 years: Mobile, Elasticsearch adapter, N1QL, etc.
• Lead contributor to Bleve, a full-text search library built in Go
• Based in Vienna, VA
• marty@couchbase.com, @mschoch

Slide 3

Slide 3 text

agenda
• why cbft?
• what’s full-text search and how’s it work?
• design
• demo
• status / roadmap / what’s next

Slide 4

Slide 4 text

agenda
• why cbft?
• what’s full-text search and how’s it work?
• design
• demo
• status / roadmap / what’s next

Slide 5

Slide 5 text

why cbft?
couchbase connectors…
[figure: existing Couchbase connectors with checkmarks; only the Lucidworks label survives]

Slide 6

Slide 6 text

why cbft?
couchbase connectors…
• yet another tier & cluster to manage
[figure: existing Couchbase connectors with checkmarks; only the Lucidworks label survives]

Slide 7

Slide 7 text

why cbft?
• simple
• integrated
• 80/20 of features

Slide 8

Slide 8 text

agenda
• why cbft?
• what’s full-text search and how’s it work?
• design
• demo
• status / roadmap / what’s next

Slide 9

Slide 9 text

what’s full-text search?

Slide 10

Slide 10 text

advanced search

Slide 11

Slide 11 text

search results
• Spelling Suggestion
• Result Text Snippets
• Highlighted Search Terms

Slide 12

Slide 12 text

faceted search

Slide 13

Slide 13 text

JSON document in Couchbase
Key: akay1980
Document:
{
  "name": "Alan Kay",
  "description": "... the wisest engineer ..."
}

Slide 14

Slide 14 text

Text Analysis: tokenizer + token filters
A pipeline of transformations:
• One Tokenizer
• Zero or more Token Filters

Slide 15

Slide 15 text

Text Analysis: tokenizer + token filters
Tokenizing “… the wisest engineer …” yields the tokens: the, wisest, engineer
• Seems like simple whitespace splitting, but this doesn’t work for all languages
• Unicode standard rules help (see Unicode Standard Annex #29)
• Still need to account for exceptions
• E-mail addresses and URLs don’t follow normal rules
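The tokenizing step above can be sketched in Python. This is a minimal illustration, not bleve's tokenizer: the `tokenize` function and its patterns are assumptions, and the regular expression only approximates word boundaries rather than implementing Unicode Standard Annex #29. It shows one way to treat e-mail addresses and URLs as exceptions by matching them before ordinary words.

```python
import re

# Illustrative patterns only (assumptions, not bleve's rules). E-mail
# addresses and URLs are tried first so they survive as single tokens;
# everything else falls back to runs of word characters.
TOKEN_RE = re.compile(
    r"https?://\S+"                    # URLs
    r"|[\w.+-]+@[\w-]+(?:\.[\w-]+)+"   # e-mail addresses
    r"|\w+(?:'\w+)?"                   # ordinary words, allowing one apostrophe
)

def tokenize(text):
    """Split text into tokens, keeping e-mail addresses and URLs whole."""
    return TOKEN_RE.findall(text)

if __name__ == "__main__":
    print(tokenize("... the wisest engineer ..."))
    print(tokenize("mail marty@couchbase.com now"))
```

A production tokenizer would follow the UAX #29 segmentation rules and handle languages that do not separate words with whitespace, which this sketch does not attempt.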

Slide 16

Slide 16 text

Text Analysis: tokenizer + token filters
Tokens: the, wisest, engineer
• Stop Word Removal: drops “the”, leaving wisest, engineer
• Stemming: reduces “wisest” to “wise”, leaving wise, engineer
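As a rough sketch of the pipeline idea (one tokenizer followed by zero or more token filters), the following Python applies stop word removal and then stemming. The stop list and the suffix-stripping "stemmer" are toy assumptions chosen only to reproduce the slide's example; real analyzers use proper stemming algorithms and language-specific stop lists.

```python
import re

STOP_WORDS = {"a", "an", "and", "of", "the"}  # tiny illustrative stop list

def tokenizer(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())

def stop_filter(tokens):
    """Token filter: drop stop words."""
    return [t for t in tokens if t not in STOP_WORDS]

def toy_stem_filter(tokens):
    """Token filter: strip a couple of suffixes (toy stand-in for a stemmer)."""
    stemmed = []
    for t in tokens:
        if t.endswith("st") and len(t) > 5:    # wisest -> wise
            t = t[:-2]
        elif t.endswith("s") and len(t) > 3:   # engineers -> engineer
            t = t[:-1]
        stemmed.append(t)
    return stemmed

def analyze(text, filters=(stop_filter, toy_stem_filter)):
    """Run one tokenizer followed by zero or more token filters, in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens

if __name__ == "__main__":
    print(analyze("... the wisest engineer ..."))  # ['wise', 'engineer']
```

Passing `filters=()` runs the tokenizer alone, which is the "zero token filters" case.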

Slide 17

Slide 17 text

Search
• The query term “engineers” is analyzed to “engineer”
• Exact Match lookup in the Inverted Index:
    wise     : …, akay1980, …
    engineer : …, akay1980, …
• Apply the same text analysis at search time that we used at index time.
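A small Python sketch of this lookup, under simplifying assumptions: the `analyze` function here is a toy (lowercasing, stop word removal, stripping a trailing "s") and the `InvertedIndex` class is hypothetical, not cbft's or bleve's data structure. The point it illustrates is that the same analysis runs at index time and at search time, so the query "engineers" finds a document that contains "engineer" by exact match on the analyzed term.

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "an"}

def analyze(text):
    """Toy analysis: lowercase, split on whitespace, strip punctuation,
    drop stop words, strip a trailing 's'."""
    terms = []
    for raw in text.lower().split():
        term = raw.strip(".,!?\"'")
        if not term or term in STOP_WORDS:
            continue
        if term.endswith("s") and len(term) > 3:
            term = term[:-1]
        terms.append(term)
    return terms

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document keys

    def index(self, key, text):
        """Analyze the text and record the key under each resulting term."""
        for term in analyze(text):
            self.postings[term].add(key)

    def search(self, query):
        """Return keys of documents containing every analyzed query term."""
        terms = analyze(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

if __name__ == "__main__":
    idx = InvertedIndex()
    idx.index("akay1980", "... the wisest engineer ...")
    print(idx.search("engineers"))  # {'akay1980'}
```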

Slide 18

Slide 18 text

Document Scoring
▪ tf/idf scoring
▪ Term Frequency
  ▪ How often does a term occur in a document?
  ▪ More often yields a higher score
▪ Inverse Document Frequency
  ▪ How many documents have this term?
  ▪ More documents yields a lower score (because it means the term is more common)
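To make the two factors concrete, here is a minimal Python sketch using one common textbook variant of tf/idf: raw term count multiplied by log(N / df). This formula is an assumption for illustration; search engines typically use refined versions with normalization, and the slides do not state which variant is used.

```python
import math

def tf_idf(term, doc_terms, all_docs):
    """Score one term for one document with a basic tf-idf formula.

    tf  = number of occurrences of the term in the document
    idf = log(N / df), where N is the number of documents and df is the
          number of documents that contain the term
    """
    tf = doc_terms.count(term)
    df = sum(1 for d in all_docs if term in d)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(all_docs) / df)

if __name__ == "__main__":
    docs = [
        ["wise", "engineer"],
        ["engineer", "engineer", "manager"],
        ["wise", "owl"],
        ["cat"],
    ]
    # The term occurs more often in docs[1], so docs[1] scores higher.
    print(tf_idf("engineer", docs[0], docs), tf_idf("engineer", docs[1], docs))
    # "cat" is rarer across documents than "wise", so it scores higher.
    print(tf_idf("cat", docs[3], docs), tf_idf("wise", docs[0], docs))
```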

Slide 19

Slide 19 text

Quality Results
• Getting high quality results depends on performing the right analysis for your text
• Beware: adjustments to the mapping that increase precision may reduce recall (and the other way around)
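The precision/recall trade-off can be shown with a small worked example in Python. The document keys and the two result sets below are made up for illustration: precision is the fraction of returned documents that are relevant, and recall is the fraction of relevant documents that were returned.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for a set of returned document keys."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

if __name__ == "__main__":
    relevant = {"d1", "d2", "d3", "d4"}
    # A strict analysis returns few documents, all of them relevant.
    strict = {"d1", "d2"}
    # A loose analysis returns every relevant document plus some noise.
    loose = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
    print(precision_recall(strict, relevant))  # (1.0, 0.5)
    print(precision_recall(loose, relevant))   # (0.5, 1.0)
```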

Slide 20

Slide 20 text

agenda
• why cbft?
• what’s full-text search and how’s it work?
• design
• demo
• status / roadmap / what’s next

Slide 21

Slide 21 text

cbft design / index partitioning
• bucket partitions (1024 vbuckets): 0, 1, 2, 3, 4, …, 1021, 1022, 1023
• index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023
• assign to cbft nodes: X
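The grouping above can be sketched in a few lines of Python. Both functions are hypothetical illustrations, not cbft's planner: `partition_vbuckets` cuts the vbucket id space into contiguous ranges of at most 400 (the size implied by the slide), and `assign_to_nodes` uses a simple round-robin policy, which is an assumption.

```python
def partition_vbuckets(num_vbuckets, max_per_partition):
    """Group vbucket ids 0..num_vbuckets-1 into contiguous ranges."""
    return [
        range(start, min(start + max_per_partition, num_vbuckets))
        for start in range(0, num_vbuckets, max_per_partition)
    ]

def assign_to_nodes(partitions, nodes):
    """Round-robin index partitions across cbft nodes (assumed policy)."""
    assignment = {node: [] for node in nodes}
    for i, part in enumerate(partitions):
        assignment[nodes[i % len(nodes)]].append(part)
    return assignment

if __name__ == "__main__":
    parts = partition_vbuckets(1024, 400)
    for name, part in zip("ABC", parts):
        print(name, f"{part[0]}-{part[-1]}")  # A 0-399, B 400-799, C 800-1023
    print(assign_to_nodes(parts, ["X"]))
```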

Slide 22

Slide 22 text

cbft design / index partitioning
• bucket partitions (1024 vbuckets): 0, 1, 2, 3, 4, …, 1021, 1022, 1023
• index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023
• assign to cbft nodes: X
• replicas, too: Y, Z

Slide 23

Slide 23 text

cbft design / indexing
[diagram: three couchbase nodes and three cbft nodes]
DCP streams for incremental index updates
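To illustrate what "incremental index updates" means, the Python sketch below applies a stream of change events to a small inverted index one event at a time, instead of rebuilding the index. The event shape (`("mutation", key, text)` and `("deletion", key)`) and the `apply_event` function are hypothetical; they are not the DCP wire format or cbft's code.

```python
def apply_event(index, docs, event):
    """Apply one change event to a term -> set-of-keys inverted index.

    index: dict mapping term -> set of document keys
    docs:  dict mapping key -> set of terms currently indexed for that key
    event: ("mutation", key, text) or ("deletion", key)  (hypothetical shape)
    """
    kind, key = event[0], event[1]
    # Remove postings left over from any previous version of the document.
    for term in docs.pop(key, ()):
        index[term].discard(key)
        if not index[term]:
            del index[term]
    if kind == "mutation":
        terms = set(event[2].lower().split())
        docs[key] = terms
        for term in terms:
            index.setdefault(term, set()).add(key)

if __name__ == "__main__":
    index, docs = {}, {}
    stream = [
        ("mutation", "akay1980", "wise engineer"),
        ("mutation", "akay1980", "wise teacher"),  # updated document
        ("deletion", "akay1980"),
    ]
    for event in stream:
        apply_event(index, docs, event)
        print(index)
```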

Slide 24

Slide 24 text

cbft design / queries
• a query sent to any cbft node (from your application, via REST)…
• …is scatter / gathered to the other cbft nodes
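Scatter / gather can be sketched as follows in Python. This is an illustration under stated assumptions, not cbft's implementation: each "searcher" is a plain function standing in for a REST call to another node, hits are `(score, doc_key)` tuples, and the receiving node merges partial results by keeping the highest-scoring hits.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def scatter_gather(query, partition_searchers, limit=10):
    """Send the query to every partition in parallel and merge the top hits.

    partition_searchers: callables that take a query and return a list of
    (score, doc_key) tuples. Plain functions are used here in place of
    remote calls (an assumption for the sake of a runnable example).
    """
    if not partition_searchers:
        return []
    with ThreadPoolExecutor(max_workers=len(partition_searchers)) as pool:
        # Scatter: run every partition search concurrently.
        partial_results = list(pool.map(lambda s: s(query), partition_searchers))
    # Gather: flatten the partial results and keep the best `limit` hits.
    all_hits = [hit for hits in partial_results for hit in hits]
    return heapq.nlargest(limit, all_hits)

if __name__ == "__main__":
    node_a = lambda q: [(0.9, "akay1980"), (0.2, "doc7")]
    node_b = lambda q: [(0.5, "doc3")]
    node_c = lambda q: []
    print(scatter_gather("engineer", [node_a, node_b, node_c], limit=2))
```

Merging by score across partitions assumes the scores are comparable between partitions, which a real system has to ensure.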

Slide 25

Slide 25 text

agenda
• why cbft?
• what’s full-text search and how’s it work?
• design
• demo
• status / roadmap / what’s next

Slide 26

Slide 26 text

agenda
• why cbft?
• what’s full-text search and how’s it work?
• design
• demo
• status / roadmap / what’s next

Slide 27

Slide 27 text

project status
• cbft is developer preview!
• please help kick the tires
• http://labs.couchbase.com/cbft

Slide 28

Slide 28 text

project status / roadmap / what’s next

feature                                   today
bleve full-text engine                    yes
advanced mappings                         yes
faceted search                            yes
incremental indexing                      yes
index partitioning and replication        yes
index aliases                             yes

Slide 29

Slide 29 text

project status / roadmap / what’s next

feature                                   today   future
bleve full-text engine                    yes     yes
advanced mappings                         yes     yes
faceted search                            yes     yes
incremental indexing                      yes     yes
index partitioning and replication        yes     yes
index aliases                             yes     yes
integrated into Couchbase Server & N1QL   -       yes
API stability                             -       yes
production quality                        -       yes
performance optimization / tuning         -       yes
forestdb storage & partial rollbacks      -       yes
security, SSL                             -       yes
more docs, examples, SDK support          -       yes

Slide 30

Slide 30 text

links & Q+A
http://labs.couchbase.com/cbft
• downloads, getting started, tech docs
• and where you can ask questions and share your feedback!
THANKS!