Slide 1

Slide 1 text

ELASTICSEARCH text to full-text within 1 hour

Slide 2

Slide 2 text

Mark van Straten @markvanstraten Q42 https://q42.com

Slide 3

Slide 3 text

What is full text searching?

Slide 4

Slide 4 text

What is filtering?

Slide 5

Slide 5 text

What is suggesting?

Slide 6

Slide 6 text

Small history lesson
Lucene
Solr
Elasticsearch

Slide 7

Slide 7 text

How does full text searching work?
The document’s text is put in an inverted index.

doc1 = “no limit, no boundaries”
doc2 = “no limit made music”

term        documents
no          1, 1, 2
limit       1, 2
boundaries  1
made        2
music       2

$q=boundaries => doc1
$q=limit => doc1, doc2
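To make the inverted index concrete, here is a minimal sketch in plain C# that builds one for the two example documents and answers the two example queries. It is an illustration only: `InvertedIndexSketch`, `Build` and `Search` are hypothetical names, the tokenization is deliberately naive, and it stores each document id once per term. It is not how Lucene or Elasticsearch implement their index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of Elasticsearch or Lucene.
public static class InvertedIndexSketch
{
    // Maps each term to the sorted set of document ids that contain it.
    public static Dictionary<string, SortedSet<int>> Build(IDictionary<int, string> docs)
    {
        var index = new Dictionary<string, SortedSet<int>>();
        foreach (var doc in docs)
        {
            // Naive tokenization: split on spaces and commas only.
            var terms = doc.Value.Split(new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
            foreach (var term in terms)
            {
                SortedSet<int> postings;
                if (!index.TryGetValue(term, out postings))
                {
                    postings = new SortedSet<int>();
                    index[term] = postings;
                }
                postings.Add(doc.Key);
            }
        }
        return index;
    }

    // Looks up a single term; an unknown term matches no documents.
    public static IEnumerable<int> Search(Dictionary<string, SortedSet<int>> index, string term)
    {
        SortedSet<int> postings;
        return index.TryGetValue(term, out postings) ? postings : Enumerable.Empty<int>();
    }

    public static void Main()
    {
        var docs = new Dictionary<int, string>
        {
            { 1, "no limit, no boundaries" },
            { 2, "no limit made music" }
        };
        var index = Build(docs);
        Console.WriteLine(string.Join(", ", Search(index, "boundaries"))); // prints: 1
        Console.WriteLine(string.Join(", ", Search(index, "limit")));      // prints: 1, 2
    }
}
```

A lookup is a single dictionary access, which is why searching by term stays fast no matter how long the documents are.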

Slide 8

Slide 8 text

Tokenizers
Tokenizers are used to break a string down into a stream of terms or tokens. A simple tokenizer might split the string up into terms wherever it encounters whitespace or punctuation.
“The|quick|brown|fox…”
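The simple tokenizer described above can be sketched in a few lines of plain C#. `SimpleTokenizer` is a hypothetical name, and treating every non-letter, non-digit character as a separator is an assumption made for this example. The tokenizers that ship with Elasticsearch are more sophisticated.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical tokenizer for illustration; not an Elasticsearch class.
public static class SimpleTokenizer
{
    // Splits the text wherever a character is not a letter or a digit.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        var current = new StringBuilder();
        foreach (var c in text)
        {
            if (char.IsLetterOrDigit(c))
            {
                current.Append(c);
            }
            else if (current.Length > 0)
            {
                // A separator ends the token collected so far.
                tokens.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0)
        {
            tokens.Add(current.ToString());
        }
        return tokens;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", Tokenize("The quick, brown fox.")));
        // prints: The|quick|brown|fox
    }
}
```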

Slide 9

Slide 9 text

TokenFilters
Removing words based on given criteria.
Commonly used: StopTokenFilter
“We are changing the world with the full text searching capabilities of elasticsearch.”
(culture aware, configurable)
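As a rough sketch of what a stop filter does to the example sentence, the following plain C# drops a handful of common words from a token stream. `StopFilterSketch` and its five-word stopword list are assumptions chosen for this example. A real stop filter uses a configurable, language-specific list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stop filter for illustration; not an Elasticsearch class.
public static class StopFilterSketch
{
    // A tiny, case-insensitive stopword list assumed for this example.
    private static readonly HashSet<string> StopWords =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "we", "are", "the", "with", "of" };

    // Keeps only the tokens that are not in the stopword list.
    public static IEnumerable<string> RemoveStopWords(IEnumerable<string> tokens)
    {
        return tokens.Where(t => !StopWords.Contains(t));
    }

    public static void Main()
    {
        var sentence = "We are changing the world with the full text searching capabilities of elasticsearch.";
        var tokens = sentence.TrimEnd('.').Split(' ');
        Console.WriteLine(string.Join(" ", RemoveStopWords(tokens)));
        // prints: changing world full text searching capabilities elasticsearch
    }
}
```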

Slide 10

Slide 10 text

Analyzers
An analyzer combines a tokenizer with token filters that modify the tokens (for example by lowercasing them) to make the text easier to search.
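The sketch below chains the pieces into one analysis step and shows why the same analyzer is applied to both the indexed text and the query: after lowercasing, "MAN" and "Man" become the same term. `AnalyzerSketch`, the regular-expression tokenizer and the three-word stopword list are assumptions for illustration, not the behaviour of any built-in Elasticsearch analyzer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical analyzer for illustration; not an Elasticsearch class.
public static class AnalyzerSketch
{
    // Stopword list assumed for this example.
    private static readonly HashSet<string> StopWords = new HashSet<string> { "the", "a", "of" };

    public static List<string> Analyze(string text)
    {
        return Regex.Split(text, @"\W+")          // tokenizer: split on non-word characters
            .Where(t => t.Length > 0)             // drop empty pieces left by the split
            .Select(t => t.ToLowerInvariant())    // token filter: lowercase
            .Where(t => !StopWords.Contains(t))   // token filter: remove stopwords
            .ToList();
    }

    public static void Main()
    {
        var indexed = Analyze("The Fastest Man Alive");
        var query = Analyze("fastest MAN");
        Console.WriteLine(string.Join("|", indexed));   // prints: fastest|man|alive
        Console.WriteLine(query.All(indexed.Contains)); // prints: True
    }
}
```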

Slide 11

Slide 11 text

Download Elasticsearch + Unzip somewhere

Slide 12

Slide 12 text

Download HEAD
$elasticsearch-1.3.2\bin\plugin --install mobz/elasticsearch-head

Slide 13

Slide 13 text

Create a new (C#) project
File -> New -> Console application
Install the C# client library NEST from NuGet
$Install-package -Id Nest

Slide 14

Slide 14 text

Connecting to Elasticsearch

    // MyObject is an assumed name for the indexed document class.
    var client = new ElasticClient(
        new ConnectionSettings(new Uri("http://localhost:9200"))
            .SetDefaultIndex("my-index"));

    client.CreateIndex("my-index", s => s
        .AddMapping<MyObject>(mapMyObjectForIndex)
    );

Slide 15

Slide 15 text

Index mappings
By default every field is analyzed. Some fields (for example the url) do not need that.

    // MyObject is an assumed name for the indexed document class.
    private static PutMappingDescriptor<MyObject> mapMyObjectForIndex(PutMappingDescriptor<MyObject> obj)
    {
        return obj
            .Properties(p => p
                .String(n => n.Name(nn => nn.Name).Index(FieldIndexOption.NotAnalyzed))
                .String(n => n.Name(nn => nn.Url).Index(FieldIndexOption.NotAnalyzed))
            );
    }

Slide 16

Slide 16 text

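Indexing

    // Load the input data into memory.
    // Assumption: the file holds a list of MyObject (the document class name is not given).
    var data = JsonConvert.DeserializeObject<List<MyObject>>(
        File.ReadAllText(@"C:\development\my-object-input-data.json"));

    // Index it.
    client.IndexMany(data, "my-index");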

Slide 17

Slide 17 text

Searching
Searching returns results with a relevance score; a lower score means a less relevant result.

    // MyObject is an assumed name for the indexed document class.
    var queryStringQuery = client.Search<MyObject>(s => s.QueryString("Lightning fast"));
    Console.WriteLine("Found " + queryStringQuery.Total + " results, first 10:");
    foreach (var doc in queryStringQuery.Documents)
    {
        Console.WriteLine(" - Found: " + doc.Url);
    }

Slide 18

Slide 18 text

Filtering
Filtering is binary (a document matches or it does not) and produces no score. That makes it fast: only the inverted index is consulted.

    // MyObject is an assumed name for the indexed document class.
    var filteredItems = client.Search<MyObject>(s => s
        .Filter(f => f.Term(t => t.Name, "IronMan")));
    foreach (var doc in filteredItems.Documents)
    {
        Console.WriteLine(" - Found: " + doc.Url);
    }

Slide 19

Slide 19 text

Aggregations (previously: Facets)

    // MyObject is an assumed name for the indexed document class.
    var facetsSearch = client.Search<MyObject>(s => s
        .FacetTerm(t => t.OnField(tf => tf.Category))
    );
    foreach (var facet in facetsSearch.Facets)
    {
        Console.WriteLine("Facet: " + facet.Key);
        foreach (var facetValue in (facet.Value as TermFacet).Items)
            Console.WriteLine(" - Option: " + facetValue.Term);
    }

Slide 20

Slide 20 text

Suggest

    // MyObject is an assumed name for the indexed document class.
    var suggestQuery = client.Search<MyObject>(s => s
        .SuggestTerm("suggestterm", sg => sg
            .OnField(sgf => sgf.TextBody)
            .Text("fastest man alive"))
    );
    foreach (var suggest in suggestQuery.Suggest["suggestterm"])
    {
        Console.WriteLine(" - Maybe you mean: " + suggest.Text);
    }

Slide 21

Slide 21 text

Questions?