Couchbase Server Full Text - Paris Nov 2015

Marty Schoch
November 10, 2015

Support for building full-text indexes is something we've been working towards at Couchbase for some time now. Since the initial developer preview release in June, we've continued to improve the feature and integrate it with Couchbase Server. In this talk I'll give an overview of the full-text features we're adding to Couchbase, then conclude with a live demo of the current state of the Couchbase Server integration.

Transcript

  1. ©2015 Couchbase Inc. Marty Schoch
    Engineer with Couchbase for 4 years. Mobile, Elasticsearch adapter, N1QL, etc. Lead contributor to Bleve, a full-text search library built in Go. [email protected] @mschoch
  2. agenda
    why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  3. agenda
    why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  4. why cbft?
    couchbase connectors… yet another tier & cluster to manage
    (comparison table of existing connectors not recoverable; one surviving entry is Lucidworks)
  5. agenda
    why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  6. JSON document in Couchbase
    Key: akay1980
    Document: { "name": "Alan Kay", "description": "... the wisest engineer ..." }
  7. Text Analysis: tokenizer + token filters
    A pipeline of transformations: one tokenizer, then zero or more token filters
  8. Text Analysis: tokenizer + token filters
    “… the wisest engineer …” becomes the tokens: the, wisest, engineer
    • Seems like simple whitespace… but this doesn’t work for all languages
    • Unicode standard rules help (see Unicode Standard Annex #29)
    • Still need to account for exceptions
    • E-mail addresses and URLs don’t follow normal rules
  9. Text Analysis: tokenizer + token filters
    Tokens: the, wisest, engineer
    After Stop Word Removal (“the” is dropped): wisest, engineer
    After Stemming: wise, engineer
  10. Search
    Inverted Index: wise → …, akay1980, …; engineer → …, akay1980, …
    The query term “engineers” is analyzed to “engineer”, which is an Exact Match for a term in the index.
    Apply the same text analysis at search time that we used at index time.
  11. Document Scoring
    ▪ tf/idf scoring
    ▪ Term Frequency: How often does a term occur in a document? More often yields a higher score.
    ▪ Inverse Document Frequency: How many documents have this term? More documents yields a lower score (because it means the term is more common).
  12. Quality Results
    • Getting high quality results depends on performing the right analysis for your text
    • Beware: adjustments to the mapping that increase precision may reduce recall (and the other way around)
  13. agenda
    why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  14. cbft design / index partitioning
    bucket partitions (1024 vbuckets): 0, 1, 2, 3, 4, …, 1021, 1022, 1023
    index partitions (groups of vbuckets): A (0-399), B (400-799), C (800-1023)
    assign to cbft nodes: X
  15. cbft design / index partitioning
    bucket partitions (1024 vbuckets): 0, 1, 2, 3, 4, …, 1021, 1022, 1023
    index partitions (groups of vbuckets): A (0-399), B (400-799), C (800-1023)
    assign to cbft nodes: X, Y, Z
    replicas, too
  16. cbft design / indexing
    couchbase nodes → cbft nodes: DCP streams for incremental index updates
  17. cbft design / queries
    your application → REST → cbft
    a query sent to any cbft node… is scatter / gathered to the other cbft nodes
  18. agenda
    why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  19. agenda
    why cbft?; what’s full-text search and how’s it work?; design; demo; status / roadmap / what’s next
  20. project status
    cbft is developer preview! please help kick the tires: http://labs.couchbase.com/cbft
  21. project status / roadmap / what’s next
    feature                                   today
    bleve full-text engine                    y
    advanced mappings                         y
    faceted search                            y
    incremental indexing                      y
    index partitioning and replication        y
    index aliases                             y
  22. project status / roadmap / what’s next
    feature                                   today  future
    bleve full-text engine                    y      y
    advanced mappings                         y      y
    faceted search                            y      y
    incremental indexing                      y      y
    index partitioning and replication        y      y
    index aliases                             y      y
    integrated into Couchbase Server & N1QL          y
    API stability                                    y
    production quality                               y
    performance optimization / tuning                y
    forestdb storage & partial rollbacks             y
    security, SSL                                    y
    more docs, examples, SDK support                 y
  23. links & Q+A
    http://labs.couchbase.com/cbft: downloads, getting started, tech docs, and where you can ask questions and share your feedback!
    THANKS!
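
The text-analysis, search, and scoring slides (7 to 11) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Bleve or cbft implement it. The stop-word list, the two-rule toy stemmer, the second document `doc2`, and the exact tf * idf formula are all invented for the example. The tokenizer is a plain regular expression, not the Unicode segmentation rules mentioned on slide 8.

```python
import math
import re
from collections import defaultdict

# Assumption: a tiny stop-word list chosen for this example only.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def tokenize(text):
    """Tokenizer: lowercase, then keep runs of ASCII letters and digits.

    Far simpler than the Unicode (UAX #29) segmentation the slides mention.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Token filter: drop very common words."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(tokens):
    """Token filter: toy stemmer with two hand-picked suffix rules."""
    stemmed = []
    for t in tokens:
        if len(t) > 4 and t.endswith("est"):
            t = t[:-2]  # wisest -> wise
        elif len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
            t = t[:-1]  # engineers -> engineer
        stemmed.append(t)
    return stemmed


def analyze(text):
    """One tokenizer followed by zero or more token filters."""
    tokens = tokenize(text)
    for token_filter in (remove_stop_words, stem):
        tokens = token_filter(tokens)
    return tokens


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {document key: term frequency}
        self.doc_count = 0

    def add(self, key, text):
        """Index one document (assumes each key is added only once)."""
        self.doc_count += 1
        for term in analyze(text):
            self.postings[term][key] = self.postings[term].get(key, 0) + 1

    def search(self, query):
        """Analyze the query like the documents, then rank matches by tf * idf."""
        scores = defaultdict(float)
        for term in analyze(query):
            docs = self.postings.get(term)
            if not docs:
                continue
            # Rarer terms (fewer matching documents) get a higher weight.
            idf = math.log(1 + self.doc_count / len(docs))
            for key, tf in docs.items():
                scores[key] += tf * idf
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = InvertedIndex()
index.add("akay1980", "... the wisest engineer ...")
index.add("doc2", "the engineer fixed the engine")  # hypothetical second document

print(analyze("... the wisest engineer ..."))  # ['wise', 'engineer']
print(index.search("wise engineers"))          # akay1980 ranks above doc2
```

Because `search` runs the query through the same `analyze` function used by `add`, the query term "engineers" matches documents that only contain "engineer", which is the point made on slide 10.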