Bleve — Go Israel

In this talk we'll start with an overview of the functionality provided by Bleve. Next we'll look at some examples of how you can integrate Bleve with your Go applications. Finally, we'll talk about Scorch, the latest index scheme used by Bleve, and how it fits into the future of the project.

Marty Schoch

October 29, 2018

Transcript

  1. Bleve Go Israel October 2018 Marty Schoch

  2. About Marty 7 years with Couchbase • Databases • Indexing

    • Search • Go • Distributed Systems
  3. Made Possible By...

  4. None
  5. None
  6. None
  7. Text Analysis Pipeline • One Tokenizer • Zero or more Token Filters

  8. Tokenization: “the wisest engineer” → the | wisest | engineer

  9. Token Filters: the | wisest | engineer → (Stop Word Removal) → wisest | engineer → (Stemming) → wise | engineer

  10. Searching the Index: apply the same text analysis at search time that we used at index time. The query term “engineers” becomes “engineer”, an exact match for the term “engineer” in the Inverted Index (which also holds “wise”).
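The pipeline on slides 7 to 10 can be sketched in plain Go without Bleve. This is a toy under stated assumptions, not Bleve's analyzer: the stop word list is made up, and the `stems` lookup table merely stands in for a real stemming algorithm. The names `analyze` and `invertedIndex` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a tiny illustrative stop word list (an assumption).
var stopWords = map[string]bool{"the": true, "a": true, "an": true}

// stems is a lookup table standing in for a real stemming algorithm.
var stems = map[string]string{"wisest": "wise", "engineers": "engineer"}

// analyze runs one tokenizer followed by two token filters.
func analyze(text string) []string {
	var terms []string
	// Tokenizer: lowercase the text and split it on whitespace.
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		// Token filter 1: stop word removal.
		if stopWords[tok] {
			continue
		}
		// Token filter 2: stemming (toy lookup).
		if s, ok := stems[tok]; ok {
			tok = s
		}
		terms = append(terms, tok)
	}
	return terms
}

// invertedIndex maps each term to the IDs of documents containing it.
type invertedIndex map[string][]string

// add analyzes text and records docID under every resulting term.
func (ix invertedIndex) add(docID, text string) {
	for _, term := range analyze(text) {
		ix[term] = append(ix[term], docID)
	}
}

// search analyzes the query with the same pipeline, then looks up
// each resulting term by exact match.
func (ix invertedIndex) search(query string) []string {
	var hits []string
	for _, term := range analyze(query) {
		hits = append(hits, ix[term]...)
	}
	return hits
}

func main() {
	ix := invertedIndex{}
	ix.add("p1", "the wisest engineer")
	fmt.Println(analyze("the wisest engineer")) // [wise engineer]
	fmt.Println(ix.search("engineers"))         // [p1]
}
```

Because both sides pass through `analyze`, the query "engineers" and the indexed word "engineer" meet at the same term, so the index lookup itself can stay an exact match.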
  11. Code

  12. Indexing a document:

    package main

    import (
    	"fmt"
    	"log"

    	"github.com/blevesearch/bleve"
    )

    // WebPage is the document type we want to index.
    type WebPage struct {
    	Content string
    }

    func main() {
    	// Create a new index on disk using the default mapping.
    	mapping := bleve.NewIndexMapping()
    	index, err := bleve.New("website.bleve", mapping)
    	if err != nil {
    		log.Fatal(err)
    	}

    	// Index the page under the document ID "p1".
    	page := WebPage{"..."}
    	err = index.Index("p1", page)
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println("Indexed Document")
    }
  15. Index Mapping Go Struct • Language • Ignore fields

  19. website.bleve

  20. Searching the index:

    package main

    import (
    	"fmt"
    	"log"

    	"github.com/blevesearch/bleve"
    )

    func main() {
    	// Open the index created earlier.
    	index, err := bleve.Open("website.bleve")
    	if err != nil {
    		log.Fatal(err)
    	}

    	// Build a match query for "bleve" and run it against the index.
    	query := bleve.NewMatchQuery("bleve")
    	request := bleve.NewSearchRequest(query)
    	result, err := index.Search(request)
    	if err != nil {
    		log.Fatal(err)
    	}
    	fmt.Println(result)
    }
  24. Other Types of Queries • Phrase Queries • Multi-Phrase Queries

    • Fuzzy Queries • Regular Expression • Wildcard
  25. Bleve Beyond Text • Exact String Comparison • Numeric Range

    Queries • Date Range Queries • Geo Point Distance Queries • Combine them with AND/OR • Bleve’s Inverted Index is very well suited for this
  26. None
  27. No. Bleve is a library. But… It makes building a

    distributed search engine possible.
  28. Couchbase Full Text Search

  29. None
  30. None
  31. Using IndexAlias to Search Multiple Nodes/Indexes

    [Diagram: bleve clients send query requests to index aliases; each index alias points at one or more bleve indexes or at other index aliases.]
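The fan-out idea behind the diagram can be sketched with the standard library alone. This does not use Bleve's own `IndexAlias` type. The `searcher` interface, the `hit` struct, `memIndex`, and `alias` are all hypothetical names invented for illustration, and a real deployment would put a network call behind `Search`.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// hit is a hypothetical search result: a document ID and its score.
type hit struct {
	ID    string
	Score float64
}

// searcher is a hypothetical interface for anything that can answer
// a query: a local index, a remote node, or another alias.
type searcher interface {
	Search(query string) []hit
}

// memIndex is a stand-in index that returns canned hits per query.
type memIndex map[string][]hit

func (m memIndex) Search(query string) []hit { return m[query] }

// alias fans a query out to several searchers and merges the results.
type alias []searcher

func (a alias) Search(query string) []hit {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		merged []hit
	)
	for _, s := range a {
		wg.Add(1)
		go func(s searcher) {
			defer wg.Done()
			hits := s.Search(query)
			mu.Lock()
			merged = append(merged, hits...)
			mu.Unlock()
		}(s)
	}
	wg.Wait()
	// Highest score first; ties broken by ID for a deterministic order.
	sort.Slice(merged, func(i, j int) bool {
		if merged[i].Score != merged[j].Score {
			return merged[i].Score > merged[j].Score
		}
		return merged[i].ID < merged[j].ID
	})
	return merged
}

func main() {
	node1 := memIndex{"bleve": {{ID: "p1", Score: 0.4}}}
	node2 := memIndex{"bleve": {{ID: "p7", Score: 0.9}}}
	// An alias is itself a searcher, so aliases can nest.
	top := alias{alias{node1}, node2}
	fmt.Println(top.Search("bleve"))
}
```

Because `alias` satisfies the same interface as the indexes it wraps, a client does not need to know whether it is talking to one index or to many.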
  32. https://github.com/mosuka/blast

  33. Scorch

  34. Too Big and Too Slow Bleve Index

  35. Segmented Index

  36. Monolithic Key/Value Index Back Index New Values Current Values Update

  37. Segmented Index Update
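The contrast between slides 36 and 37 can be illustrated with a generic segmented store. This is a sketch of the general technique, not Scorch's implementation. The `segment` and `index` types are hypothetical, documents are plain strings, and a map of obsolete IDs stands in for whatever deletion tracking a real index uses.

```go
package main

import "fmt"

// segment holds a batch of documents that is never modified after
// creation, plus the set of IDs that newer segments have replaced.
type segment struct {
	docs     map[string]string
	obsolete map[string]bool
}

// index is an ordered list of segments, oldest first.
type index struct{ segments []*segment }

// update never rewrites stored documents: it marks older copies
// obsolete and appends the batch as a new segment.
func (ix *index) update(batch map[string]string) {
	for _, s := range ix.segments {
		for id := range batch {
			if _, ok := s.docs[id]; ok {
				s.obsolete[id] = true
			}
		}
	}
	ix.segments = append(ix.segments, &segment{docs: batch, obsolete: map[string]bool{}})
}

// get returns the live copy of a document, if any segment has one.
func (ix *index) get(id string) (string, bool) {
	for _, s := range ix.segments {
		if v, ok := s.docs[id]; ok && !s.obsolete[id] {
			return v, true
		}
	}
	return "", false
}

// merge replaces all segments with one that holds only live documents.
func (ix *index) merge() {
	live := map[string]string{}
	for _, s := range ix.segments {
		for id, v := range s.docs {
			if !s.obsolete[id] {
				live[id] = v
			}
		}
	}
	ix.segments = []*segment{{docs: live, obsolete: map[string]bool{}}}
}

func main() {
	ix := &index{}
	ix.update(map[string]string{"p1": "v1", "p2": "v1"})
	ix.update(map[string]string{"p1": "v2"})
	v, _ := ix.get("p1")
	fmt.Println(len(ix.segments), v) // 2 v2

	ix.merge()
	v, _ = ix.get("p1")
	fmt.Println(len(ix.segments), v) // 1 v2
}
```

Updates only ever append, so the cost of a write does not grow with the size of the existing index; the price is that reads consult several segments until a merge collapses them.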

  38. New Storage Abstractions

  39. Term Dictionary

  40. randomised

  41. randomised randomized ~1 Levenshtein Edit Distance

  42. Finite State Transducer Term Dictionary mon, tues, thurs Levenshtein Automata

    tues~2
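To make the `~1` and `~2` notation concrete, here is a brute-force sketch that computes Levenshtein edit distance and scans a small term dictionary. It deliberately swaps in a linear scan for the Finite State Transducer and Levenshtein automaton named on the slide, which exist to avoid comparing the query against every term. `editDistance` and `fuzzyMatch` are hypothetical helper names.

```go
package main

import "fmt"

// editDistance returns the Levenshtein distance between a and b:
// the minimum number of single-character insertions, deletions,
// and substitutions needed to turn one into the other.
func editDistance(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		cur := make([]int, len(rb)+1)
		cur[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			cur[j] = min3(prev[j]+1, cur[j-1]+1, prev[j-1]+cost)
		}
		prev = cur
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// fuzzyMatch returns every dictionary term within maxDist edits of
// term. It checks each term in turn (brute force).
func fuzzyMatch(dict []string, term string, maxDist int) []string {
	var out []string
	for _, t := range dict {
		if editDistance(t, term) <= maxDist {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	fmt.Println(editDistance("randomised", "randomized"))                // 1
	fmt.Println(fuzzyMatch([]string{"mon", "tues", "thurs"}, "tues", 2)) // [tues thurs]
}
```

The query `tues~2` matches "thurs" because two edits suffice (delete "h", substitute "r" with "e"), while "mon" shares no characters with "tues" and is four edits away.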
  43. Vellum backed Term Dictionary

  44. Postings List

  45. Bitmap backed Postings List

  46. Roaring Bitmaps • Uncompressed Bitsets • Slice of Integers • Run-length encoding • Chunked

    Encoding, combining each
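As a rough illustration of why bitmaps suit postings lists, the sketch below implements only the simplest of the encodings listed above, an uncompressed bitset, and shows AND/OR over two postings lists. It is not the Roaring format, which chooses an encoding per chunk. The `postings` type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// postings is an uncompressed bitset: bit n is set when document
// number n contains the term.
type postings []uint64

// newPostings builds a bitset with the given document numbers set.
func newPostings(docNums ...uint32) postings {
	var p postings
	for _, n := range docNums {
		word := int(n / 64)
		for len(p) <= word {
			p = append(p, 0)
		}
		p[word] |= 1 << (n % 64)
	}
	return p
}

// and returns the documents present in both postings lists.
func (p postings) and(q postings) postings {
	n := len(p)
	if len(q) < n {
		n = len(q)
	}
	out := make(postings, n)
	for i := range out {
		out[i] = p[i] & q[i]
	}
	return out
}

// or returns the documents present in either postings list.
func (p postings) or(q postings) postings {
	long, short := p, q
	if len(q) > len(p) {
		long, short = q, p
	}
	out := make(postings, len(long))
	copy(out, long)
	for i, w := range short {
		out[i] |= w
	}
	return out
}

// docNums lists the set document numbers in ascending order.
func (p postings) docNums() []uint32 {
	var out []uint32
	for i, w := range p {
		for w != 0 {
			out = append(out, uint32(i*64+bits.TrailingZeros64(w)))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return out
}

func main() {
	wise := newPostings(1, 5, 70)
	engineer := newPostings(5, 9, 70)
	fmt.Println(wise.and(engineer).docNums()) // [5 70]
	fmt.Println(wise.or(engineer).docNums())  // [1 5 9 70]
}
```

Each AND or OR handles 64 documents per machine word, which is what makes conjunctions and disjunctions over postings lists cheap; the other encodings exist because a plain bitset wastes space when only a few documents contain a term.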
  47. New Storage Abstractions + Segments How Does It Fit Together?

  48. Introducing New Segments

  49. None
  50. None
  51. None
  52. None
  53. None
  54. None
  55. None
  56. None
  57. Persisting Segments

  58. None
  59. None
  60. None
  61. f.zap

  62. f.zap

  63. f.zap

  64. f.zap

  65. f.zap

  66. f.zap

  67. Merging Segments

  68. None
  69. None
  70. None
  71. None
  72. None
  73. None
  74. Scorch Indexing Performance

    Metric                                           5.5.0-2780   5.5.0-2780 Scorch   Improved (> 1 is better)
    Index size (MB), 1 node, 1M docs                      5,063                 973                       5.20
    Index build time (sec), 1 node, 1M docs                  36                  28                       1.29
    Index size (MB), 3 nodes, 1M docs                     5,427               1,003                       5.41
    Index build time (sec), 3 nodes, 1M docs                 35                  23                       1.52
    Index size (MB), 2 nodes, 10M docs, DGM              53,334              14,034                       3.80
    Index build time (sec), 2 nodes, 10M docs, DGM          687                 274                       2.51
  75. Scorch Query Latency

    80th percentile query latency (ms), no kv-load, wiki 1M x 1KB, 1 node, FTS. Performance compared to upside_down/moss:
    • Fuzzy-1 Searches ~92%
    • Term date facet Searches ~77%
    • Phrase Searches ~50%
    • High Frequency Conjunction Searches ~33%
    • Fuzzy-2 Searches ~24%
    • Prefix/Wildcard Searches ~25%
  76. Scorch Query Throughput

    Average Throughput (q/sec), no kv-load, wiki 1M x 1KB, 1 node, FTS. Performance compared to upside_down/moss:
    • Fuzzy-1 Searches ~4X
    • Term date facet Searches ~3X
    • Fuzzy-2 Searches ~2X
    • Phrase Searches ~2X
    • Medium/Low Frequency Term Searches 25/40%
    • Prefix/Wildcard Searches 25%
  77. Future • Overhaul Release Management

    • 1.0 - Formalize official non-scorch release
    • 1.1 / 2.0 - First officially supported (and default to) Scorch
  78. Thanks Marty Schoch marty.schoch@gmail.com @mschoch