
Couchbase Server Full Text - Paris Nov 2015

Marty Schoch
November 10, 2015

Support for building full-text indexes is something we've been working towards at Couchbase for some time now. It was initially released as a developer preview in June, and we've continued to improve the feature and integrate it with Couchbase Server. In this talk I'll give an overview of the full-text features we're adding to Couchbase, then we'll conclude with a live demo of the current state of the Couchbase Server integration.

Transcript

  1. ©2015 Couchbase Inc. Marty Schoch

    Engineer with Couchbase for 4 years. Mobile, Elasticsearch adapter, N1QL, etc. Lead contributor to Bleve, a full-text search library built in Go. [email protected] @mschoch
  2. agenda

    • why cbft? • what’s full-text search and how’s it work? • design • demo • status / roadmap / what’s next
  3. agenda

    • why cbft? • what’s full-text search and how’s it work? • design • demo • status / roadmap / what’s next
  4. why cbft?

    couchbase connectors… yet another tier & cluster to manage yes yes yes yes Lucidworks yes yes
  5. agenda

    • why cbft? • what’s full-text search and how’s it work? • design • demo • status / roadmap / what’s next
  6. JSON document in Couchbase

    Key: akay1980
    Document: { "name": "Alan Kay", "description": "... the wisest engineer ..." }
  7. Text Analysis : tokenizer + token filters

    A pipeline of transformations
    • One Tokenizer
    • Zero or more Token Filters
  8. Text Analysis : tokenizer + token filters

    “… the wisest engineer …” → the | wisest | engineer
    • Seems like simple whitespace… but this doesn’t work for all languages
    • Unicode standard rules help (see Unicode Standard Annex #29)
    • Still need to account for exceptions
    • E-mail addresses and URLs don’t follow normal rules
  9. Text Analysis : tokenizer + token filters

    the | wisest | engineer
    Stop Word Removal: wisest | engineer ("the" is dropped)
    Stemming: wise | engineer
  10. Search

    Inverted Index:
      …
      wise → …, akay1980, …
      …
      engineer → …, akay1980, …
      …
    Query: engineers → engineer (Exact Match)
    Apply the same text analysis at search time that we used at index time.
  11. Document Scoring

    ▪ tf/idf scoring
    ▪ Term Frequency
      ▪ How often does a term occur in a document?
      ▪ More often yields a higher score
    ▪ Inverse Document Frequency
      ▪ How many documents have this term?
      ▪ More documents yields lower score (because it means the term is more common)
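The two factors on slide 11 can be sketched as follows. The slides do not give a formula, so this uses one common textbook variant, tf × ln((N+1)/(df+1)), as an assumption; the exact scoring Bleve applies may differ. The function names and the sample documents are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// termFrequency counts how often term occurs in one document's analyzed terms.
func termFrequency(term string, doc []string) float64 {
	n := 0
	for _, t := range doc {
		if t == term {
			n++
		}
	}
	return float64(n)
}

// inverseDocumentFrequency shrinks as more documents contain the term.
// Assumption: smoothed textbook form ln((N+1)/(df+1)), not taken from the slides.
func inverseDocumentFrequency(term string, docs [][]string) float64 {
	df := 0
	for _, d := range docs {
		if termFrequency(term, d) > 0 {
			df++
		}
	}
	return math.Log(float64(len(docs)+1) / float64(df+1))
}

// score combines both factors for one term and one document.
func score(term string, doc []string, docs [][]string) float64 {
	return termFrequency(term, doc) * inverseDocumentFrequency(term, docs)
}

func main() {
	docs := [][]string{
		{"wise", "engineer"},
		{"engineer", "engineer", "couchbase"},
		{"search", "engineer"},
		{"search", "index"},
	}
	// "wise" is rare (1 of 4 docs), "engineer" is common (3 of 4 docs).
	fmt.Printf("wise in doc 0:     %.3f\n", score("wise", docs[0], docs))
	fmt.Printf("engineer in doc 0: %.3f\n", score("engineer", docs[0], docs))
	fmt.Printf("engineer in doc 1: %.3f\n", score("engineer", docs[1], docs))
}
```

In this sample, the rare term "wise" outscores the common term "engineer" within the same document, and "engineer" scores higher in the document where it occurs twice.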
  12. Quality Results

    • Getting high quality results depends on performing the right analysis for your text
    • Beware: adjustments to the mapping that increase precision may reduce recall (and the other way around)
  13. agenda

    • why cbft? • what’s full-text search and how’s it work? • design • demo • status / roadmap / what’s next
  14. cbft design / index partitioning

    bucket partitions (1024 vbuckets): 0, 1, 2, 3, 4, …, 1021, 1022, 1023
    index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023
    assign to cbft nodes: X
  15. cbft design / index partitioning

    bucket partitions (1024 vbuckets): 0, 1, 2, 3, 4, …, 1021, 1022, 1023
    index partitions (groups of vbuckets): A = 0-399, B = 400-799, C = 800-1023
    assign to cbft nodes: X, Y, Z
    replicas, too
  16. cbft design / indexing

    couchbase, couchbase, couchbase → cbft, cbft, cbft
    DCP streams for incremental index updates
  17. cbft design / queries

    your application → REST → cbft
    a query sent to any cbft node… …is scatter / gathered to the other cbft nodes
  18. agenda

    • why cbft? • what’s full-text search and how’s it work? • design • demo • status / roadmap / what’s next
  19. agenda

    • why cbft? • what’s full-text search and how’s it work? • design • demo • status / roadmap / what’s next
  20. project status

    cbft is developer preview! please help kick the tires
    http://labs.couchbase.com/cbft
  21. project status / roadmap / what’s next

    feature                                   today
    bleve full-text engine                    y
    advanced mappings                         y
    faceted search                            y
    incremental indexing                      y
    index partitioning and replication        y
    index aliases                             y
  22. project status / roadmap / what’s next

    feature                                   today   future
    bleve full-text engine                    y       y
    advanced mappings                         y       y
    faceted search                            y       y
    incremental indexing                      y       y
    index partitioning and replication        y       y
    index aliases                             y       y
    integrated into Couchbase Server & N1QL           y
    API stability                                     y
    production quality                                y
    performance optimization / tuning                 y
    forestdb storage & partial rollbacks              y
    security, SSL                                     y
    more docs, examples, SDK support                  y
  23. links & Q+A

    http://labs.couchbase.com/cbft
    downloads, getting started, tech docs, and where you can ask questions and share your feedback! THANKS!