Slide 1

Slide 1 text

Bleve Go Israel October 2018 Marty Schoch

Slide 2

Slide 2 text

About Marty 7 years with Couchbase ● Databases ● Indexing ● Search ● Go ● Distributed Systems

Slide 3

Slide 3 text

Made Possible By...

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

No content

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Text Analysis Pipeline ● One Tokenizer ● Zero or more Token Filters
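Here is a minimal sketch of this pipeline shape in Go. It assumes a simplified, string-based model. `Tokenizer`, `TokenFilter`, `Analyzer`, `whitespaceTokenizer` and `lowercaseFilter` are hypothetical names for illustration and are not bleve's own analysis API. The analyzer runs exactly one tokenizer and then applies each filter in order. An empty filter list is allowed.

```go
package main

import (
	"fmt"
	"strings"
)

// Tokenizer splits raw text into tokens (hypothetical type for this sketch).
type Tokenizer func(text string) []string

// TokenFilter transforms a token stream, e.g. lowercasing or removing stop words.
type TokenFilter func(tokens []string) []string

// Analyzer is exactly one Tokenizer followed by zero or more TokenFilters.
type Analyzer struct {
	Tokenizer Tokenizer
	Filters   []TokenFilter
}

// Analyze runs the tokenizer once, then applies each filter in order.
func (a Analyzer) Analyze(text string) []string {
	tokens := a.Tokenizer(text)
	for _, f := range a.Filters {
		tokens = f(tokens)
	}
	return tokens
}

// whitespaceTokenizer splits on runs of whitespace.
func whitespaceTokenizer(text string) []string {
	return strings.Fields(text)
}

// lowercaseFilter lowercases every token.
func lowercaseFilter(tokens []string) []string {
	out := make([]string, len(tokens))
	for i, t := range tokens {
		out[i] = strings.ToLower(t)
	}
	return out
}

func main() {
	a := Analyzer{
		Tokenizer: whitespaceTokenizer,
		Filters:   []TokenFilter{lowercaseFilter},
	}
	fmt.Println(a.Analyze("The Wisest Engineer")) // [the wisest engineer]
}
```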

Slide 8

Slide 8 text

Tokenization: “the wisest engineer” becomes the tokens the, wisest, engineer
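The tokenization step can be sketched with only the standard library. This sketch assumes a deliberately simple rule: split on any rune that is not a Unicode letter or digit. Under that rule the curly quotes are discarded and do not end up inside a token. It illustrates the idea only and is not bleve's tokenizer. The `tokenize` name is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize splits text on every rune that is not a letter or digit, so
// punctuation such as curly quotes separates tokens and is then dropped.
func tokenize(text string) []string {
	return strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

func main() {
	fmt.Printf("%q\n", tokenize("“the wisest engineer”")) // ["the" "wisest" "engineer"]
}
```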

Slide 9

Slide 9 text

Token Filters ● Stop Word Removal: the, wisest, engineer becomes wisest, engineer ● Stemming: wisest, engineer becomes wise, engineer
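This sketch implements the two filters. It assumes a tiny hypothetical stop list and two toy suffix rules: "est" becomes "e", and a trailing "s" is dropped. The rules were picked only to reproduce the slide's example. Real stemmers, such as Porter's or bleve's English analysis, use far more careful rules and may not reduce "wisest" the same way. `removeStopWords`, `stem` and `suffixRule` are illustrative names.

```go
package main

import (
	"fmt"
	"strings"
)

// stopWords is a tiny illustrative list; real stop lists are much longer.
var stopWords = map[string]bool{"the": true, "a": true, "an": true, "of": true}

// removeStopWords drops every token found in stopWords.
func removeStopWords(tokens []string) []string {
	var out []string
	for _, t := range tokens {
		if !stopWords[t] {
			out = append(out, t)
		}
	}
	return out
}

// suffixRule rewrites a trailing suffix. These toy rules only reproduce the
// slide's example; they are not a real stemming algorithm.
type suffixRule struct{ suffix, replacement string }

var rules = []suffixRule{{"est", "e"}, {"s", ""}}

// stem applies, per token, the first rule that matches and still leaves a
// stem of at least 3 characters; other tokens pass through unchanged.
func stem(tokens []string) []string {
	out := make([]string, len(tokens))
	for i, t := range tokens {
		out[i] = t
		for _, r := range rules {
			base := strings.TrimSuffix(t, r.suffix)
			if base != t && len(base) >= 3 {
				out[i] = base + r.replacement
				break
			}
		}
	}
	return out
}

func main() {
	tokens := []string{"the", "wisest", "engineer"}
	fmt.Println(stem(removeStopWords(tokens))) // [wise engineer]
}
```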

Slide 10

Slide 10 text

Searching the Index: Apply the same text analysis at search time that we used at index time. The query “engineers” is analyzed to “engineer”, which is then an exact match for the term “engineer” in the Inverted Index (which also holds terms such as “wise”).
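A small in-memory inverted index makes the symmetry between index time and search time concrete. `analyze` is a hypothetical simplified analyzer: it lowercases, drops "the" and strips a trailing "s". `InvertedIndex` is an illustrative map-based structure and does not reflect how bleve stores its index. The query goes through the same analysis as the documents, so "Engineers" becomes the exact-match term "engineer". Looking up the raw query string finds nothing.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// analyze is a hypothetical, simplified analyzer: lowercase, split on
// whitespace, drop "the", and strip a trailing "s" (toy stemming).
func analyze(text string) []string {
	var out []string
	for _, t := range strings.Fields(strings.ToLower(text)) {
		if t == "the" {
			continue
		}
		if len(t) > 3 && strings.HasSuffix(t, "s") {
			t = strings.TrimSuffix(t, "s")
		}
		out = append(out, t)
	}
	return out
}

// InvertedIndex maps each analyzed term to the set of document IDs containing it.
type InvertedIndex map[string]map[string]bool

// Add analyzes the text and records the document under each resulting term.
func (idx InvertedIndex) Add(id, text string) {
	for _, term := range analyze(text) {
		if idx[term] == nil {
			idx[term] = map[string]bool{}
		}
		idx[term][id] = true
	}
}

// Search analyzes the query with the same analyzer used at index time, then
// looks up each resulting term by exact match and returns sorted doc IDs.
func (idx InvertedIndex) Search(query string) []string {
	seen := map[string]bool{}
	var ids []string
	for _, term := range analyze(query) {
		for id := range idx[term] {
			if !seen[id] {
				seen[id] = true
				ids = append(ids, id)
			}
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	idx := InvertedIndex{}
	idx.Add("p1", "the wisest engineer")
	idx.Add("p2", "wise words")
	fmt.Println(idx.Search("Engineers")) // [p1]
	fmt.Println(len(idx["Engineers"]))   // 0: the unanalyzed term is not in the index
}
```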

Slide 11

Slide 11 text

Code

Slide 12

Slide 12 text

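```go
package main

import (
	"fmt"
	"log"

	"github.com/blevesearch/bleve"
)

// WebPage is the document type being indexed.
type WebPage struct {
	Content string
}

func main() {
	// Start from the default index mapping.
	mapping := bleve.NewIndexMapping()

	// Create a new index on disk in the "website.bleve" directory.
	index, err := bleve.New("website.bleve", mapping)
	if err != nil {
		log.Fatal(err)
	}
	defer index.Close()

	// Index one document under the ID "p1".
	page := WebPage{"..."}
	err = index.Index("p1", page)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Indexed Document")
}
```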

Slide 15

Slide 15 text

Index Mapping ● Go Struct ● Language ● Ignore fields
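bleve configures indexing through its mapping API, which starts from `bleve.NewIndexMapping()` as in the indexing code on slide 12. The sketch below shows the three ideas on this slide without depending on that API. It uses hypothetical struct tags: `lang` selects an analyzer language per field, and `index:"-"` leaves a field out of the index. Both tag names, the `FieldRule` type and `buildMapping` are assumptions made for illustration only. Untagged string fields fall back to a default language of "en".

```go
package main

import (
	"fmt"
	"reflect"
)

// FieldRule says how one struct field would be indexed (hypothetical type).
type FieldRule struct {
	Field    string
	Language string
}

// WebPage uses hypothetical struct tags: `lang` picks an analyzer language
// for a field, and `index:"-"` leaves the field out of the index entirely.
type WebPage struct {
	Content  string `lang:"en"`
	Title    string `lang:"fr"`
	Author   string
	EditedBy string `index:"-"`
}

// buildMapping inspects the exported string fields of a struct value and
// returns one rule per indexed field, defaulting the language to "en".
func buildMapping(doc interface{}) []FieldRule {
	t := reflect.TypeOf(doc)
	var rules []FieldRule
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" || f.Type.Kind() != reflect.String {
			continue // skip unexported and non-text fields
		}
		if f.Tag.Get("index") == "-" {
			continue // explicitly ignored field
		}
		lang := f.Tag.Get("lang")
		if lang == "" {
			lang = "en"
		}
		rules = append(rules, FieldRule{Field: f.Name, Language: lang})
	}
	return rules
}

func main() {
	for _, r := range buildMapping(WebPage{}) {
		fmt.Printf("%s -> %s analyzer\n", r.Field, r.Language)
	}
}
```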

Slide 19

Slide 19 text

website.bleve

Slide 20

Slide 20 text

```go
package main

import (
	"fmt"
	"log"

	"github.com/blevesearch/bleve"
)

func main() {
	// Open the existing index created by the indexing example.
	index, err := bleve.Open("website.bleve")
	if err != nil {
		log.Fatal(err)
	}
	defer index.Close()

	// Match query: the input is analyzed like the indexed text.
	query := bleve.NewMatchQuery("bleve")
	request := bleve.NewSearchRequest(query)
	result, err := index.Search(request)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}
```

Slide 24

Slide 24 text

Other Types of Queries ● Phrase Queries ● Multi-Phrase Queries ● Fuzzy Queries ● Regular Expression ● Wildcard
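Several of these query types first find every dictionary term that matches and then combine the postings of those terms. The sketch below covers the wildcard case, assuming `*` matches any run of characters and `?` matches exactly one character. It translates the pattern into an anchored Go regular expression and scans a small, hypothetical term list. `wildcardToRegexp` and `matchTerms` are illustrative names. Scanning every term is the naive approach; an automaton-based term dictionary can avoid visiting every term.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardToRegexp converts a wildcard pattern ("*" = any run of characters,
// "?" = exactly one character) into an anchored regular expression, quoting
// every other character literally.
func wildcardToRegexp(pattern string) (*regexp.Regexp, error) {
	var b strings.Builder
	b.WriteString("^")
	for _, r := range pattern {
		switch r {
		case '*':
			b.WriteString(".*")
		case '?':
			b.WriteString(".")
		default:
			b.WriteString(regexp.QuoteMeta(string(r)))
		}
	}
	b.WriteString("$")
	return regexp.Compile(b.String())
}

// matchTerms returns the dictionary terms that match the wildcard pattern.
func matchTerms(dict []string, pattern string) ([]string, error) {
	re, err := wildcardToRegexp(pattern)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, term := range dict {
		if re.MatchString(term) {
			out = append(out, term)
		}
	}
	return out, nil
}

func main() {
	dict := []string{"engine", "engineer", "engineering", "wise"}
	terms, err := matchTerms(dict, "engine*")
	if err != nil {
		panic(err)
	}
	fmt.Println(terms) // [engine engineer engineering]
}
```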

Slide 25

Slide 25 text

Bleve Beyond Text ● Exact String Comparison ● Numeric Range Queries ● Date Range Queries ● Geo Point Distance Queries ● Combine them with AND/OR ● Bleve’s Inverted Index is very well suited for this
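Each clause type (text, numeric range, date range, geo distance) can be thought of as producing a list of matching document numbers. AND and OR then become set operations on those lists. The sketch below assumes sorted `uint32` doc numbers and uses hypothetical `intersect` and `union` helpers that merge two lists in a single pass.

```go
package main

import "fmt"

// intersect returns doc numbers present in both sorted lists (AND).
func intersect(a, b []uint32) []uint32 {
	var out []uint32
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// union returns doc numbers present in either sorted list (OR), without duplicates.
func union(a, b []uint32) []uint32 {
	out := make([]uint32, 0, len(a)+len(b))
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			out = append(out, a[i])
			i++
		case a[i] > b[j]:
			out = append(out, b[j])
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	out = append(out, a[i:]...)
	return append(out, b[j:]...)
}

func main() {
	textMatch := []uint32{1, 3, 5, 7} // docs matching a text query
	inDateRange := []uint32{3, 4, 5}  // docs matching a date range query
	fmt.Println(intersect(textMatch, inDateRange)) // [3 5]
	fmt.Println(union(textMatch, inDateRange))     // [1 3 4 5 7]
}
```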

Slide 26

Slide 26 text

No content

Slide 27

Slide 27 text

No. Bleve is a library. But it makes building a distributed search engine possible.

Slide 28

Slide 28 text

Couchbase Full Text Search

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

Using IndexAlias to Search Multiple Nodes/Indexes (diagram: each bleve client sends query requests to an index alias; an index alias fans out to several bleve indexes and can also point at other index aliases)
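In bleve, this is `IndexAlias`, created with `bleve.NewIndexAlias`. The alias itself satisfies bleve's `Index` interface. The fan-out-and-merge idea can be sketched with the standard library alone. `Searcher`, `Hit`, `Alias`, `topN` and `staticShard` are hypothetical stand-ins. The merge here simply sorts by score and ignores concerns a real implementation must handle, such as making scores comparable across indexes, custom sort orders and facets. Because `Alias` itself satisfies `Searcher`, aliases can nest, as in the diagram.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Hit is one matching document and its relevance score.
type Hit struct {
	ID    string
	Score float64
}

// Searcher stands in for anything that can answer a query: a local index,
// a remote node, or another alias (hypothetical interface).
type Searcher interface {
	Search(query string) []Hit
}

// Alias fans a query out to all members concurrently and merges the hits.
type Alias struct {
	Members []Searcher
}

// Search implements Searcher, so aliases can contain other aliases.
func (a Alias) Search(query string) []Hit {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		all []Hit
	)
	for _, m := range a.Members {
		wg.Add(1)
		go func(s Searcher) {
			defer wg.Done()
			hits := s.Search(query)
			mu.Lock()
			all = append(all, hits...)
			mu.Unlock()
		}(m)
	}
	wg.Wait()
	// Highest score first; a real merge must also make scores comparable.
	sort.Slice(all, func(i, j int) bool { return all[i].Score > all[j].Score })
	return all
}

// topN keeps only the n best hits.
func topN(hits []Hit, n int) []Hit {
	if len(hits) > n {
		return hits[:n]
	}
	return hits
}

// staticShard is a fake member that always returns the same hits.
type staticShard []Hit

func (s staticShard) Search(string) []Hit { return s }

func main() {
	inner := Alias{Members: []Searcher{
		staticShard{{"b1", 0.7}},
		staticShard{{"c1", 0.5}},
	}}
	outer := Alias{Members: []Searcher{
		staticShard{{"a1", 0.9}, {"a2", 0.2}},
		inner,
	}}
	fmt.Println(topN(outer.Search("bleve"), 2)) // [{a1 0.9} {b1 0.7}]
}
```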

Slide 32

Slide 32 text

https://github.com/mosuka/blast

Slide 33

Slide 33 text

Scorch

Slide 34

Slide 34 text

Bleve Index: Too Big and Too Slow

Slide 35

Slide 35 text

Segmented Index

Slide 36

Slide 36 text

Updating a Monolithic Key/Value Index (diagram labels: Back Index, Current Values, New Values, Update)
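bleve's original upside_down index stores everything as rows in a single key/value store. It also keeps a back index per document, so that an update can find the rows it wrote last time. The sketch below shows that diff step and assumes each row can be reduced to a term string. `BackIndex`, `diffUpdate` and `Update` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// BackIndex maps a document ID to the terms it was last indexed with
// (hypothetical and simplified: real rows carry more than a term string).
type BackIndex map[string][]string

// diffUpdate compares the current terms with the new ones and returns the
// rows to delete and the rows to add, each sorted and free of duplicates.
func diffUpdate(current, next []string) (del, add []string) {
	inCurrent := map[string]bool{}
	for _, t := range current {
		inCurrent[t] = true
	}
	inNext := map[string]bool{}
	for _, t := range next {
		if !inNext[t] && !inCurrent[t] {
			add = append(add, t)
		}
		inNext[t] = true
	}
	for t := range inCurrent {
		if !inNext[t] {
			del = append(del, t)
		}
	}
	sort.Strings(del)
	sort.Strings(add)
	return del, add
}

// Update consults the back index for the document's old terms, records the
// new ones, and reports which rows the key/value store must delete and insert.
func (b BackIndex) Update(doc string, terms []string) (del, add []string) {
	del, add = diffUpdate(b[doc], terms)
	b[doc] = terms
	return del, add
}

func main() {
	back := BackIndex{"p1": {"wise", "engineer"}}
	del, add := back.Update("p1", []string{"wise", "developer"})
	fmt.Println(del, add) // [engineer] [developer]
}
```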

Slide 37

Slide 37 text

Segmented Index Update
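In a segmented index, an update never rewrites existing data. The new version goes into a new segment, and older copies are only marked obsolete. The sketch below makes simplifying assumptions: documents are strings keyed by ID, and each segment carries its own deleted set. The `Segment` and `Index` types are hypothetical and are not Scorch's actual types.

```go
package main

import "fmt"

// Segment holds a batch of documents that is never modified after creation,
// plus a set of IDs that newer segments have made obsolete.
type Segment struct {
	docs    map[string]string // doc ID -> content
	deleted map[string]bool
}

// Index is an ordered list of segments, oldest first.
type Index struct {
	segments []*Segment
}

// Update writes the documents as a brand-new segment and marks any older
// copies of the same IDs as deleted, without rewriting the old documents.
func (ix *Index) Update(docs map[string]string) {
	fresh := make(map[string]string, len(docs))
	for id, content := range docs {
		fresh[id] = content
	}
	for _, s := range ix.segments {
		for id := range fresh {
			if _, ok := s.docs[id]; ok {
				s.deleted[id] = true
			}
		}
	}
	ix.segments = append(ix.segments, &Segment{docs: fresh, deleted: map[string]bool{}})
}

// Get returns the live version of a document, skipping deleted copies.
func (ix *Index) Get(id string) (string, bool) {
	for _, s := range ix.segments {
		if c, ok := s.docs[id]; ok && !s.deleted[id] {
			return c, true
		}
	}
	return "", false
}

func main() {
	ix := &Index{}
	ix.Update(map[string]string{"p1": "wise engineer", "p2": "bleve"})
	ix.Update(map[string]string{"p1": "wiser engineer"})
	c, _ := ix.Get("p1")
	fmt.Println(c, len(ix.segments)) // wiser engineer 2
}
```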

Slide 38

Slide 38 text

New Storage Abstractions

Slide 39

Slide 39 text

Term Dictionary

Slide 40

Slide 40 text

randomised

Slide 41

Slide 41 text

randomised vs. randomized: Levenshtein Edit Distance of 1 (so the fuzzy query randomised~1 matches randomized)
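Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one word into another. Replacing "s" with "z" is one substitution, so the distance between randomised and randomized is 1. Below is the classic dynamic-programming computation, keeping only two rows of the table. The function names are illustrative.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-character insertions,
// deletions, or substitutions that turn a into b, using a two-row DP table.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from "" to the first j runes of b
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, substitution (or match)
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	fmt.Println(levenshtein("randomised", "randomized")) // 1
}
```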

Slide 42

Slide 42 text

Finite State Transducer Term Dictionary (terms: mon, tues, thurs) ● Levenshtein Automata (query: tues~2)
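Checking every term in a large dictionary against the query would be slow. The sketch below shows the underlying pruning idea with a simpler structure. A trie stands in for the FST, and one row of the Levenshtein table is computed per trie edge. A branch is abandoned as soon as every entry in its row exceeds the edit budget. This is a swapped-in technique for illustration; it is not an FST or a compiled Levenshtein automaton as used by Vellum. `node`, `insert` and `fuzzy` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// node is a trie node; the trie stands in for an FST term dictionary here.
type node struct {
	children map[rune]*node
	terminal bool // true if a term ends at this node
}

func newNode() *node { return &node{children: map[rune]*node{}} }

// insert adds a term to the trie.
func (n *node) insert(term string) {
	cur := n
	for _, r := range term {
		next, ok := cur.children[r]
		if !ok {
			next = newNode()
			cur.children[r] = next
		}
		cur = next
	}
	cur.terminal = true
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// fuzzy returns the terms within maxEdits of query, in sorted order. Each
// trie edge extends one row of the Levenshtein table; a branch is abandoned
// once every entry in its row exceeds maxEdits, since no extension of that
// prefix can come back within the budget.
func fuzzy(root *node, query string, maxEdits int) []string {
	q := []rune(query)
	first := make([]int, len(q)+1)
	for j := range first {
		first[j] = j
	}
	var out []string
	var walk func(n *node, prefix string, prev []int)
	walk = func(n *node, prefix string, prev []int) {
		if n.terminal && prev[len(q)] <= maxEdits {
			out = append(out, prefix)
		}
		for r, child := range n.children {
			row := make([]int, len(q)+1)
			row[0] = prev[0] + 1
			best := row[0]
			for j := 1; j <= len(q); j++ {
				cost := 1
				if q[j-1] == r {
					cost = 0
				}
				row[j] = minInt(minInt(prev[j]+1, row[j-1]+1), prev[j-1]+cost)
				best = minInt(best, row[j])
			}
			if best <= maxEdits {
				walk(child, prefix+string(r), row)
			}
		}
	}
	walk(root, "", first)
	sort.Strings(out)
	return out
}

func main() {
	root := newNode()
	for _, t := range []string{"mon", "tues", "thurs"} {
		root.insert(t)
	}
	fmt.Println(fuzzy(root, "tues", 2)) // [thurs tues]
}
```

With the slide's dictionary, tues~2 matches tues (distance 0) and thurs (distance 2). mon (distance 4) is rejected.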

Slide 43

Slide 43 text

Vellum-backed Term Dictionary

Slide 44

Slide 44 text

Postings List

Slide 45

Slide 45 text

Bitmap-backed Postings List

Slide 46

Slide 46 text

Roaring Bitmaps ● Uncompressed Bitsets ● Slice of Integers ● Run-length Encoding ● Chunked Encoding, combining all three (each chunk uses whichever encoding suits it)
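The sketch below shows the chunking idea, assuming 32-bit document numbers. The high 16 bits pick a chunk. Each chunk stores its low 16 bits as a sorted slice until it holds more than 4096 values, and then switches to a 65536-bit bitset. Run-length containers are left out for brevity. All types and names here are hypothetical. The Go library github.com/RoaringBitmap/roaring is a complete implementation.

```go
package main

import (
	"fmt"
	"math/bits"
	"sort"
)

// arrayMax is the chunk size above which a sorted array becomes a bitset.
const arrayMax = 4096

// container stores the low 16 bits of values that share their high 16 bits,
// either as a sorted slice or as a 65536-bit bitset (1024 uint64 words).
type container struct {
	array  []uint16
	bitset []uint64 // nil while the container is in array form
}

func (c *container) add(lo uint16) {
	if c.bitset != nil {
		c.bitset[lo/64] |= 1 << (lo % 64)
		return
	}
	i := sort.Search(len(c.array), func(i int) bool { return c.array[i] >= lo })
	if i < len(c.array) && c.array[i] == lo {
		return // already present
	}
	c.array = append(c.array, 0)
	copy(c.array[i+1:], c.array[i:])
	c.array[i] = lo
	if len(c.array) > arrayMax { // too dense: convert to a bitset
		c.bitset = make([]uint64, 1024)
		for _, v := range c.array {
			c.bitset[v/64] |= 1 << (v % 64)
		}
		c.array = nil
	}
}

func (c *container) contains(lo uint16) bool {
	if c.bitset != nil {
		return c.bitset[lo/64]&(1<<(lo%64)) != 0
	}
	i := sort.Search(len(c.array), func(i int) bool { return c.array[i] >= lo })
	return i < len(c.array) && c.array[i] == lo
}

func (c *container) cardinality() int {
	if c.bitset == nil {
		return len(c.array)
	}
	n := 0
	for _, w := range c.bitset {
		n += bits.OnesCount64(w)
	}
	return n
}

// Bitmap is a set of 32-bit doc numbers, chunked by their high 16 bits.
type Bitmap struct {
	chunks map[uint16]*container
}

func NewBitmap() *Bitmap { return &Bitmap{chunks: map[uint16]*container{}} }

func (b *Bitmap) Add(x uint32) {
	hi := uint16(x >> 16)
	c, ok := b.chunks[hi]
	if !ok {
		c = &container{}
		b.chunks[hi] = c
	}
	c.add(uint16(x))
}

func (b *Bitmap) Contains(x uint32) bool {
	c, ok := b.chunks[uint16(x>>16)]
	return ok && c.contains(uint16(x))
}

func (b *Bitmap) Cardinality() int {
	n := 0
	for _, c := range b.chunks {
		n += c.cardinality()
	}
	return n
}

func main() {
	b := NewBitmap()
	for i := uint32(0); i < 10000; i++ {
		b.Add(i) // dense chunk 0 ends up as a bitset
	}
	b.Add(1<<16 + 3) // sparse chunk 1 stays a one-element array
	fmt.Println(b.Cardinality(), b.Contains(5000), b.Contains(1<<16+4)) // 10001 true false
}
```

The threshold of 4096 is the break-even point. 4096 two-byte values take 8 KiB, the same as a 65536-bit bitset.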

Slide 47

Slide 47 text

New Storage Abstractions + Segments: How Does It Fit Together?

Slide 48

Slide 48 text

Introducing New Segments

Slide 49

Slide 49 text

No content

Slide 50

Slide 50 text

No content

Slide 51

Slide 51 text

No content

Slide 52

Slide 52 text

No content

Slide 53

Slide 53 text

No content

Slide 54

Slide 54 text

No content

Slide 55

Slide 55 text

No content

Slide 56

Slide 56 text

No content

Slide 57

Slide 57 text

Persisting Segments

Slide 58

Slide 58 text

No content

Slide 59

Slide 59 text

No content

Slide 60

Slide 60 text

No content

Slide 61

Slide 61 text

f.zap

Slide 62

Slide 62 text

f.zap

Slide 63

Slide 63 text

f.zap

Slide 64

Slide 64 text

f.zap

Slide 65

Slide 65 text

f.zap

Slide 66

Slide 66 text

f.zap

Slide 67

Slide 67 text

Merging Segments
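Merging combines several segments into one. It drops documents marked as deleted and renumbers the survivors so doc numbers stay dense. The sketch below assumes a simplified in-memory `Segment` made of local doc numbers, a term-to-postings map and a deleted set. These types and the `merge` function are hypothetical and are not Scorch's implementation.

```go
package main

import "fmt"

// Segment is a simplified segment: docIDs[n] is the external ID of local doc
// number n, postings maps term -> sorted local doc numbers, and deleted marks
// local doc numbers made obsolete by newer segments.
type Segment struct {
	docIDs   []string
	postings map[string][]int
	deleted  map[int]bool
}

// merge builds one new segment from several, dropping deleted documents and
// renumbering survivors. Segments are visited in order and each one's
// survivors get a contiguous, increasing range of new numbers, so every
// merged postings list comes out sorted.
func merge(segs []*Segment) *Segment {
	out := &Segment{postings: map[string][]int{}, deleted: map[int]bool{}}
	remaps := make([]map[int]int, len(segs)) // per segment: old number -> new number
	for i, s := range segs {
		remaps[i] = map[int]int{}
		for old, id := range s.docIDs {
			if s.deleted[old] {
				continue
			}
			remaps[i][old] = len(out.docIDs)
			out.docIDs = append(out.docIDs, id)
		}
	}
	for i, s := range segs {
		for term, docs := range s.postings {
			for _, old := range docs {
				if nu, live := remaps[i][old]; live {
					out.postings[term] = append(out.postings[term], nu)
				}
			}
		}
	}
	return out
}

func main() {
	s1 := &Segment{
		docIDs:   []string{"p1", "p2"},
		postings: map[string][]int{"engineer": {0, 1}, "wise": {0}},
		deleted:  map[int]bool{0: true}, // p1 was updated in a newer segment
	}
	s2 := &Segment{
		docIDs:   []string{"p1"},
		postings: map[string][]int{"engineer": {0}},
		deleted:  map[int]bool{},
	}
	m := merge([]*Segment{s1, s2})
	fmt.Println(m.docIDs, m.postings["engineer"], len(m.postings["wise"])) // [p2 p1] [0 1] 0
}
```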

Slide 68

Slide 68 text

No content

Slide 69

Slide 69 text

No content

Slide 70

Slide 70 text

No content

Slide 71

Slide 71 text

No content

Slide 72

Slide 72 text

No content

Slide 73

Slide 73 text

No content

Slide 74

Slide 74 text

Scorch Indexing Performance

| Metric | 5.5.0-2780 | 5.5.0-2780 Scorch | Improved (> 1 is better) |
|---|---|---|---|
| Index size (MB), 1 node, 1M docs | 5,063 | 973 | 5.20 |
| Index build time (sec), 1 node, 1M docs | 36 | 28 | 1.29 |
| Index size (MB), 3 nodes, 1M docs | 5,427 | 1,003 | 5.41 |
| Index build time (sec), 3 nodes, 1M docs | 35 | 23 | 1.52 |
| Index size (MB), 2 nodes, 10M docs, DGM | 53,334 | 14,034 | 3.80 |
| Index build time (sec), 2 nodes, 10M docs, DGM | 687 | 274 | 2.51 |

Slide 75

Slide 75 text

Scorch Query Latency

80th percentile query latency (ms), no kv-load, wiki 1M x 1KB, 1 node, FTS

| Query type | Performance compared to upside_down/moss |
|---|---|
| Fuzzy-1 Searches | ~92% |
| Term date facet Searches | ~77% |
| Phrase Searches | ~50% |
| High Frequency Conjunction Searches | ~33% |
| Fuzzy-2 Searches | ~24% |
| Prefix/Wildcard Searches | ~25% |

Slide 76

Slide 76 text

Scorch Query Throughput

Average Throughput (q/sec), no kv-load, wiki 1M x 1KB, 1 node, FTS

| Query type | Performance compared to upside_down/moss |
|---|---|
| Fuzzy-1 Searches | ~4X |
| Term date facet Searches | ~3X |
| Fuzzy-2 Searches | ~2X |
| Phrase Searches | ~2X |
| Medium/Low Frequency Term Searches | 25/40% |
| Prefix/Wildcard Searches | 25% |

Slide 77

Slide 77 text

Future ● Overhaul Release Management ● 1.0: Formalize an official non-Scorch release ● 1.1 / 2.0: First release to officially support (and default to) Scorch

Slide 78

Slide 78 text

Thanks Marty Schoch marty.schoch@gmail.com @mschoch