Elasticsearch 5.x You Know, for Search

Shai Gabai, https://speakerdeck.com/shagaba/elasticsearch-5-dot-x-you-know-for-search

What is Elasticsearch?

What is Elasticsearch?
• Full-text search & analytics engine
• Open source
• NoSQL database
• “Schemaless”
• Inverted indices (Lucene based)

What is Elasticsearch?
• Distributed
• Easy to scale
• Elastic
• High availability

What is Elasticsearch?
• RESTful interface (HTTP/JSON)
• Written in Java
• ELK / Elastic stack

ELK / Elastic stack

Elastic stack Versions
Release date    16-Aug  16-Mar  16-Feb  15-Nov  15-Oct
Elasticsearch   2.4     2.3     2.2     2.1     2.0
Logstash        2.4     2.3     2.2     2.1     2.0
Kibana          4.6     4.5     4.4     4.3     4.2
Beats           1.3     1.2     1.1     1.0     -

Elastic stack Versions

Use Cases
• Full text search
• Logging & analysis
• Event data
• Analytics & aggregations
• Data visualization
• Alerting & classification
• Suggestions & autocomplete
• Performance monitoring

Agenda
• Basics
• Text analysis & Lucene
• Analyzers
• Mapping
• Query DSL
• Q&A

Basics
• Concepts
• Getting started
• Field datatypes
• Document
• Metadata
• CRUD

Concepts
SQL RDBMS    Elasticsearch
Database     Index
Table        Type
Row          Document
Column       Field
Schema       Mapping
Index        Everything is Indexed
SQL          Query DSL

Concepts
• Shard
• Node
(Diagram: Index-1 is split into three primary shards, Idx-1-sh1, Idx-1-sh2 and Idx-1-sh3, all on Node-1.)

Concepts
• Replica shards
• Cluster
(Diagram: primary shards Idx-1-sh1 to Idx-1-sh3 on Node-1; replica shards Idx-1-R1 to Idx-1-R3 on Node-2.)

Add Failover
• All primary and replica shards are allocated
(Diagram: Node-1 holds the primaries Idx-1-sh1 to Idx-1-sh3; Node-2 holds the replicas Idx-1-R1 to Idx-1-R3.)

Scale Horizontally
• Shards have been reallocated to spread the load
(Diagram: a third node, Node-3, joins; the three primaries and three replicas are spread across Node-1, Node-2 and Node-3.)

Scale Some More
• Increasing the number of replicas to 2
(Diagram: Node-1, Node-2 and Node-3 now hold the three primaries plus two replicas of each shard.)

Cluster after killing one node
• A cluster must have a master node in order to function correctly
(Diagram: Node-1 has been killed; Node-2 and Node-3 hold the remaining copies of all three shards.)

Download Elasticsearch
• Download: https://www.elastic.co/downloads/elasticsearch
• Elasticsearch configuration (elasticsearch.yml):
  cluster.name: shagaba
  node.name: node-01

Getting Started
• Run bin/elasticsearch (or bin\elasticsearch.bat on Windows)
• Run curl http://localhost:9200/

Getting Started
http://localhost:9200/
{
  "name" : "node-01",
  "cluster_name" : "shagaba",
  "cluster_uuid" : "3p3dLj8bQYqOZGLBX3GAHg",
  "version" : {
    "number" : "5.1.1",
    "build_hash" : "5395e21",
    "build_date" : "2016-12-06T12:36:15.409Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

Getting Started

Field datatypes
• Core datatypes
  • String: text, keyword
  • Numeric: long, integer, short, byte, double, float
  • Date: date
  • Boolean: boolean
• Complex datatypes
  • Array
  • Object

Field datatypes
• Geo datatypes
  • geo_point: for lat/lon points
  • geo_shape: for complex shapes like polygons
• Specialised datatypes
  • ip: for IPv4 and IPv6 addresses
  • completion: to provide auto-complete suggestions

Document
{
  "name": "John Smith",
  "age": 42,
  "confirmed": true,
  "join_date": "2017-01-01",
  "home": { "lat": 51.5, "lon": 0.1 },
  "accounts": [ { "type": "facebook", "id": "johnsmith" } ]
}

Document Metadata
• _index: where the document lives
• _type: the class of object that the document represents
• _id: the unique identifier for the document
• _version: enables optimistic concurrency control at the level of a single document
• _source: the original document that was indexed

Indexing a Document
• Using our own ID
PUT /{index}/{type}/{id}
{ "field": "value", ... }

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text": "Just trying this out...",
  "date": "2017/01/01"
}

Indexing a Document
• Elasticsearch responds:
{ "_index": "website", "_type": "blog", "_id": "123", "_version": 1, "created": true }
• 201 (CREATED) if it's a newly created document
• 200 (OK) if the document was updated (replaced/reindexed)

Indexing a Document
• Autogenerating IDs
POST /website/blog/
{ "title": "My second blog entry", "text": "Still trying this out...", "date": "2017/01/01" }

{ "_index": "website", "_type": "blog", "_id": "AVFgSgVHUP18jI2wRx0w", "_version": 1, "created": true }

Retrieving a Document
• Retrieving the whole document
GET /website/blog/123?pretty
{
  "_index" : "website",
  "_type" : "blog",
  "_id" : "123",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "title": "My first blog entry",
    "text": "Just trying this out...",
    "date": "2017/01/01"
  }
}
• 200 (OK) if it exists
• 404 (NOT FOUND) if it doesn’t exist

Retrieving a Document
• Not found
GET /website/blog/456?pretty

HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 83

{ "_index" : "website", "_type" : "blog", "_id" : "456", "found" : false }

Retrieving a Document
• Retrieving part of a document
GET /website/blog/123?_source=title,text
{
  "_index" : "website", "_type" : "blog", "_id" : "123", "_version" : 1, "found" : true,
  "_source" : { "title": "My first blog entry", "text": "Just trying this out..." }
}

Retrieving a Document
• Retrieving only the _source, without the metadata
GET /website/blog/123/_source
{ "title": "My first blog entry", "text": "Just trying this out...", "date": "2017/01/01" }

Checking whether a Document Exists
• Check whether a document is in the index without the overhead of loading it
HEAD /website/blog/7890

HTTP/1.1 200 OK
Content-Type: application/json; charset=UTF-8
Content-Length: 0

HTTP/1.1 404 Not Found
Content-Type: application/json; charset=UTF-8
Content-Length: 0

• 200 (OK) if the _id exists
• 404 (NOT FOUND) if the _id doesn’t exist

Updating a Whole Document
• Documents are immutable: an update replaces (reindexes) the whole document
PUT /website/blog/123
{ "title": "My first blog entry", "text": "I am starting to get the hang of this...", "date": "2017/01/01" }

{ "_index" : "website", "_type" : "blog", "_id" : "123", "_version" : 2, "created": false }

Partial Update
• A partial document, wrapped in "doc", is merged with the existing document
POST /website/blog/123/_update
{ "doc": { "title": "Partial Update to Document" } }

{ "_index" : "website", "_type" : "blog", "_id" : "123", "_version" : 2, "created": false }

Deleting a Document
• When the document is found
DELETE /website/blog/123
{ "found" : true, "_index" : "website", "_type" : "blog", "_id" : "123", "_version" : 3 }
• 200 (OK) if it exists
• 404 (NOT FOUND) if it doesn’t exist

Deleting a Document
• When the document isn’t found (404 NOT FOUND)
DELETE /website/blog/123
{ "found" : false, "_index" : "website", "_type" : "blog", "_id" : "123", "_version" : 4 }
• Elasticsearch keeps a record of deletes (which is why _version is still incremented), but forgets about them after 60 seconds.
• This is called deletes garbage collection, and the interval is controlled by the index.gc_deletes setting.

Text Analysis & Lucene
• The need for text analysis
• Inverted index
• Analysis

The need for Text Analysis
Exact values
• 156
• 1.9
• 2017-01-01
• true / false
• “Hello World”
Full text
• “The quick brown fox jumped over the lazy dog”

The need for Text Analysis
• Stopwords: "a", "and", "but", "how", "or", "what", "else", "etc", "the"…
• Case sensitivity: "Hello World", "hello world", "HELLO WORLD"...
• Grammar: "jumps", "jumping", "jumped", "jump"
• Synonyms: "walk", "hike", "tour", "parade", "march"
• Relevance scoring

Inverted Index
• Elasticsearch uses a structure called an inverted index, which is designed to allow very fast full-text searches.
• An inverted index consists of a list of all the unique words that appear in any document, and for each word, a list of the documents in which it appears.

Inverted Index
• Doc_1: “The quick brown fox jumped over the lazy dog”
• Doc_2: “Quick brown foxes leap over lazy dogs in summer”

Inverted Index
• Separate words / terms
  Doc_1: The, quick, brown, fox, jumped, over, the, lazy, dog
  Doc_2: Quick, brown, foxes, leap, over, lazy, dogs, in, summer

Inverted Index
• Sort unique terms
  Doc_1: The, brown, dog, fox, jumped, lazy, over, quick, the
  Doc_2: Quick, brown, dogs, foxes, in, lazy, leap, over, summer

Inverted Index
• List the docs containing each term
Term    | Doc_1 | Doc_2
------------------------
Quick   |       |   X
The     |   X   |
brown   |   X   |   X
dog     |   X   |
dogs    |       |   X
fox     |   X   |
foxes   |       |   X
in      |       |   X
jumped  |   X   |
lazy    |   X   |   X
leap    |       |   X
over    |   X   |   X
quick   |   X   |
summer  |       |   X
the     |   X   |
------------------------
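Building the table above takes only a dictionary from term to document ids. This is a minimal sketch assuming the naive whitespace splitting used in the slides (no lowercasing or other normalization); the function name build_inverted_index is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every unique term to the set of document ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # naive whitespace split; case is preserved
            index[term].add(doc_id)
    return dict(index)


docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}
index = build_inverted_index(docs)

# sorted() puts uppercase before lowercase, matching the term order in the table
for term in sorted(index):
    marks = ["X" if doc_id in index[term] else " " for doc_id in sorted(docs)]
    print(f"{term:<8}|   {marks[0]}   |   {marks[1]}")
```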

Inverted Index
• Search for “quick brown”
Term    | Doc_1 | Doc_2
------------------------
brown   |   X   |   X
quick   |   X   |
------------------------
Total   |   2   |   1
Both documents match, but the first document has more matches than the second.
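Counting matched terms per document is the simplest possible ranking, and it reproduces the Total row above. This sketch deliberately ignores everything real relevance scoring considers (Elasticsearch 5.x uses BM25 by default, which also weighs term frequency, term rarity and field length); count_matches is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Set

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}

# Same case-sensitive index as in the table: term -> ids of documents containing it
index: Dict[str, Set[str]] = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)


def count_matches(index: Dict[str, Set[str]], query: str) -> Counter:
    """For each document, count how many of the query's terms it contains."""
    hits: Counter = Counter()
    for term in query.split():
        for doc_id in index.get(term, set()):
            hits[doc_id] += 1
    return hits


print(count_matches(index, "quick brown").most_common())  # [('Doc_1', 2), ('Doc_2', 1)]
```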

Inverted Index
• A few problems: each of these pairs is stored as two unrelated terms, so a search for one form misses the other
  • “Quick” and “quick”
  • “The” and “the”
  • “fox” and “foxes”
  • “dog” and “dogs”
  • “jumped” and “leap”
• “the” doesn’t bring much value

Inverted Index
• Normalize terms into a standard format:
  • “Quick” can be lowercased to become “quick”.
  • “The” can be lowercased to become “the”, which, as a stopword, can then be removed.
  • “foxes” can be stemmed to its root form: “fox”.
  • “dogs” can be stemmed to “dog”.
  • “jumped” and “leap” are synonyms and can be indexed as just the single term “jump”.
Term    | Doc_1 | Doc_2
------------------------
brown   |   X   |   X
dog     |   X   |   X
fox     |   X   |   X
in      |       |   X
jump    |   X   |   X
lazy    |   X   |   X
over    |   X   |   X
quick   |   X   |   X
summer  |       |   X
------------------------

Inverted Index
• A search requiring both “Quick” and “fox” (+Quick +fox) would fail: we no longer have the exact term “Quick” in our index.

Inverted Index
• Solution: apply the same normalization rules to the query string that were applied to the content field. “Quick fox” then becomes a query for “quick fox”, which matches both documents.
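The key point is that one normalize function runs at both index time and search time. In this sketch the stopword, stem and synonym tables are hypothetical and hand-written for the two example documents; a real analyzer would use stemmer and synonym token filters instead. The search requires every query term to be present.

```python
import re
from typing import Dict, List, Set

# Hypothetical tables covering only this example.
STOPWORDS = {"the"}
STEMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump"}
SYNONYMS = {"leap": "jump"}


def normalize(text: str) -> List[str]:
    """Lowercase, drop stopwords, then map stems and synonyms to one root term."""
    terms = []
    for word in re.findall(r"\w+", text.lower()):
        if word in STOPWORDS:
            continue
        word = STEMS.get(word, word)
        terms.append(SYNONYMS.get(word, word))
    return terms


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in normalize(text):  # index-time normalization
            index.setdefault(term, set()).add(doc_id)
    return index


def search_all_terms(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Documents containing every query term, after the same normalization."""
    terms = normalize(query)  # search-time normalization: identical rules
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer",
}
index = build_index(docs)
print(sorted(search_all_terms(index, "Quick fox")))  # ['Doc_1', 'Doc_2']
```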

Analysis
• The process of converting text into tokens or terms, which are added to the inverted index for searching.
• Tokenization: splitting a block of text into individual terms suitable for use in an inverted index.
• Normalization: normalizing these terms into a standard form to improve their “searchability”.

Analyzers
• Introduction
• Built-in analyzers
• Testing analyzers
• Custom analyzers

Introduction
• Special algorithms that determine how a string field in a document is transformed into terms in an inverted index.
• An analyzer is built from three kinds of components:
  • Character filters: tidy up the text before tokenizing (add, remove or replace characters)
  • Tokenizers: break text down into terms
  • Token filters: add, change or remove terms
• Built-in analyzers
• Custom analyzers
• When are analyzers used?
  • Index time
  • Search time
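The three component types run in a fixed order: character filters, then exactly one tokenizer, then token filters. The sketch below wires up that order with plain Python functions. strip_html and word_tokenizer are simplified, hypothetical stand-ins for the html_strip character filter and the standard tokenizer, not reimplementations of them (for example, \w+ splits "dog's" into two tokens, which the standard tokenizer does not).

```python
import html
import re
from typing import Callable, List, Sequence

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def strip_html(text: str) -> str:
    """Crude stand-in for html_strip: drop tags, then decode entities like &amp;."""
    return html.unescape(re.sub(r"<[^>]+>", "", text))


def word_tokenizer(text: str) -> List[str]:
    """Rough stand-in for the standard tokenizer: runs of word characters."""
    return re.findall(r"\w+", text)


def lowercase(tokens: List[str]) -> List[str]:
    return [token.lower() for token in tokens]


def analyze(text: str, char_filters: Sequence[CharFilter], tokenizer: Tokenizer,
            token_filters: Sequence[TokenFilter]) -> List[str]:
    for char_filter in char_filters:    # 1. character filters rewrite the raw text
        text = char_filter(text)
    tokens = tokenizer(text)            # 2. one tokenizer splits it into terms
    for token_filter in token_filters:  # 3. token filters add, change or remove terms
        tokens = token_filter(tokens)
    return tokens


print(analyze("The 2 <b>QUICK Foxes</b>", [strip_html], word_tokenizer, [lowercase]))
# ['the', '2', 'quick', 'foxes']
```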

Built-in Analyzers
• Standard analyzer
• Simple analyzer
• Whitespace analyzer
• Stop analyzer
• Keyword analyzer
• Pattern analyzer
• Language analyzer
• Fingerprint analyzer
• Custom analyzer

Standard Analyzer
"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."

Standard Analyzer
• Standard Tokenizer
[ The, 2, QUICK, Brown, Foxes, jumped, over, the, lazy, dog's, bone ]

Standard Analyzer
• Lowercase Filter
[ the, 2, quick, brown, foxes, jumped, over, the, lazy, dog's, bone ]

Standard Analyzer
• Stopwords Filter (disabled by default)
[ 2, quick, brown, foxes, jumped, over, lazy, dog's, bone ]

Simple Analyzer
"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
1. Lowercase Tokenizer
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
The lowercase tokenizer is equivalent to the letter tokenizer combined with the lowercase token filter, but is more efficient as it performs both steps in a single pass.

Stop Analyzer
"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
1. Lowercase Tokenizer
[ the, quick, brown, foxes, jumped, over, the, lazy, dog, s, bone ]
2. Stop Filter
[ quick, brown, foxes, jumped, over, lazy, dog, s, bone ]

Whitespace Analyzer
"The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
1. Whitespace Tokenizer
[ The, 2, QUICK, Brown-Foxes, jumped, over, the, lazy, dog's, bone. ]
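The simple, stop and whitespace analyzers are simple enough to approximate in a few lines, which makes their differences easy to compare on the slide's sentence. This is a sketch, not Lucene's code: the regex [^\W\d_]+ approximates "runs of letters", and the stop set is Lucene's default English stop word list, which the stop analyzer uses unless configured otherwise.

```python
import re
from typing import List

# Lucene's default English stop word set.
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
    "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
    "their", "then", "there", "these", "they", "this", "to", "was", "will", "with",
}


def simple_analyzer(text: str) -> List[str]:
    # Lowercase tokenizer: split on anything that is not a letter, then lowercase.
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def stop_analyzer(text: str) -> List[str]:
    # Lowercase tokenizer followed by a stop filter.
    return [token for token in simple_analyzer(text) if token not in ENGLISH_STOPWORDS]


def whitespace_analyzer(text: str) -> List[str]:
    # Split on whitespace only; case and punctuation are kept.
    return text.split()


sentence = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone."
print(simple_analyzer(sentence))
print(stop_analyzer(sentence))
print(whitespace_analyzer(sentence))
```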

English Analyzer
"The Quick Brown Fox jumped over the Lazy Dog!"
1. Standard Tokenizer
[ The, Quick, Brown, Fox, jumped, over, the, Lazy, Dog ]
2. Lowercase Filter
[ the, quick, brown, fox, jumped, over, the, lazy, dog ]
3. English Stopwords
[ quick, brown, fox, jumped, over, lazy, dog ]
4. English Stemmer
[ quick, brown, fox, jump, over, lazi, dog ]
The English stemmer is a Porter stemmer, so “lazy” becomes “lazi”: stems only need to be consistent between index and search time, not real words.

HebMorph
• Open source (AGPL3)
• Commercial option available
• Itamar Syn-Hershko
• https://github.com/synhershko/HebMorph
• http://code972.com/hebmorph

Testing Analyzer
• Performs the analysis process on a text and returns the token breakdown of the text.
GET /_analyze
{ "analyzer": "english", "text": "The 2 QUICK Foxes jumped." }

Testing Analyzer
{
  "tokens": [
    { "token": "2", "start_offset": 4, "end_offset": 5, "type": "<NUM>", "position": 1 },
    { "token": "quick", "start_offset": 6, "end_offset": 11, "type": "<ALPHANUM>", "position": 2 },
    { "token": "fox", "start_offset": 12, "end_offset": 17, "type": "<ALPHANUM>", "position": 3 },
    { "token": "jump", "start_offset": 18, "end_offset": 24, "type": "<ALPHANUM>", "position": 4 }
  ]
}
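The same _analyze call can be made from a script. This sketch uses only Python's standard library and assumes a node listening on http://localhost:9200 (the default in the Getting Started slides); analyze_request and analyze are hypothetical helper names. Building the request is kept separate from sending it so the request can be inspected without a running cluster.

```python
import json
import urllib.request
from typing import List


def analyze_request(text: str, analyzer: str = "english",
                    host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build (but do not send) a POST /_analyze request with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text: str, analyzer: str = "english",
            host: str = "http://localhost:9200") -> List[str]:
    """Send the request to a running node and return only the token strings."""
    with urllib.request.urlopen(analyze_request(text, analyzer, host)) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return [token["token"] for token in payload["tokens"]]
```

Against a running 5.x node, analyze("The 2 QUICK Foxes jumped.") should return the token strings from the response above.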

Custom Analyzer
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": [ "html_strip" ],
          "tokenizer": "standard",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}

Custom Analyzer
• Specifying an index: performs the analysis process on a text with an analyzer defined in that index (GET and POST both work)
GET my_index/_analyze
{ "analyzer": "my_custom_analyzer", "text": "The 2 <b>QUICK Foxes</b>" }
POST my_index/_analyze
{ "analyzer": "my_custom_analyzer", "text": "The 2 <b>QUICK Foxes</b>" }

Custom Analyzer
{
  "tokens": [
    { "token": "the", "start_offset": 0, "end_offset": 3, "type": "<ALPHANUM>", "position": 0 },
    { "token": "2", "start_offset": 4, "end_offset": 5, "type": "<NUM>", "position": 1 },
    { "token": "quick", "start_offset": 9, "end_offset": 14, "type": "<ALPHANUM>", "position": 2 },
    { "token": "foxes", "start_offset": 15, "end_offset": 24, "type": "<ALPHANUM>", "position": 3 }
  ]
}

Mapping
• Dynamic field mapping
• Put mappings
• Get mappings
• Mapping analyzer
• Multilingual documents

Dynamic Field Mapping
JSON datatype            Elasticsearch datatype
null                     No field is added
true/false               boolean field
floating point number    float field
integer                  long field
object                   object field
array                    Depends on the first non-null value
string                   Either a date field (date detection), a double or long field (numeric detection), or a text field with a keyword sub-field
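The table above can be read as a small decision procedure. This is a sketch of those rules in Python, not Elasticsearch's code: dynamic_type is a hypothetical name, and DATE_LAYOUTS is a simplified assumption (Elasticsearch's real date detection uses its dynamic_date_formats setting). Numeric detection is off by default, so it is an opt-in flag here too.

```python
from datetime import datetime
from typing import Any, Optional

# Assumption: two simplified layouts stand in for the real dynamic_date_formats.
DATE_LAYOUTS = ("%Y-%m-%d", "%Y/%m/%d")


def looks_like_date(value: str) -> bool:
    for layout in DATE_LAYOUTS:
        try:
            datetime.strptime(value, layout)
            return True
        except ValueError:
            pass
    return False


def dynamic_type(value: Any, numeric_detection: bool = False) -> Optional[str]:
    """Field type that dynamic mapping would pick for a JSON value (None: no field)."""
    if value is None:
        return None
    if isinstance(value, bool):  # check before int: bool is a subclass of int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):  # decided by the first non-null element
        first = next((item for item in value if item is not None), None)
        return dynamic_type(first, numeric_detection)
    if isinstance(value, str):
        if looks_like_date(value):
            return "date"
        if numeric_detection:
            for cast, name in ((int, "long"), (float, "double")):
                try:
                    cast(value)
                    return name
                except ValueError:
                    pass
        return "text (+ keyword sub-field)"
    raise TypeError(f"not a JSON value: {value!r}")
```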

Put Mapping
• Creates an index called twitter with the message field in the tweet mapping type
PUT twitter
{ "mappings": { "tweet": { "properties": { "message": { "type": "text" } } } } }

Put Mapping
• Uses the PUT mapping API to add a new field called user_name to the tweet mapping type
PUT twitter/_mapping/tweet
{ "properties": { "user_name": { "type": "text" } } }

Get Mapping
GET /twitter/_mapping/tweet
{
  "twitter": {
    "mappings": {
      "tweet": {
        "properties": {
          "message": { "type": "text" },
          "user_name": { "type": "text" }
        }
      }
    }
  }
}

Get Mapping
• Get mappings for the tweet and user types
  • GET /_mapping/tweet,user
  • GET /_all/_mapping/tweet,user
• Get mappings of all indices and types
  • GET /_all/_mapping
  • GET /_mapping

Mapping Analyzer
PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "text4u": { "type": "text" },
        "english4u": { "type": "text", "analyzer": "english" }
      }
    }
  }
}

Mapping Analyzer
GET /my_index/_analyze
{ "field": "text4u", "text": "The quick Brown Foxes." }
[ the, quick, brown, foxes ]

Mapping Analyzer
GET /my_index/_analyze
{ "field": "english4u", "text": "The quick Brown Foxes." }
[ quick, brown, fox ]

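Mapping Analyzer
• To store a field as a single, unanalyzed term in 5.x, map it as keyword ("index": "not_analyzed" belonged to the old string type)
PUT website/_mapping/blog
{ "properties": { "tag": { "type": "keyword" } } }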

Multilingual documents
• Finding the right strategy for handling documents written in several languages can be challenging.
• Mixing languages in the same inverted index can be problematic.
• We must take both stages into consideration:
  • Index time
  • Search time

Multilingual documents
At Index Time
• Multilingual documents come in three main varieties:
  • One predominant language per document, which may contain snippets from other languages (One Language per Document)
  • One predominant language per field, which may contain snippets from other languages (One Language per Field)
  • A mixture of languages per field (Mixed-Language Fields)

Multilingual documents
At Query Time
• Identify the user's main language from:
  • the language the user chose in the UI
  • the Accept-Language HTTP header sent by the user’s browser
• User searches also come in three main varieties:
  • Users search for words in their main language.
  • Users search for words in a different language, but expect results in their main language.
  • Users search for words in a different language, and expect results in that language.

One Language per Document
PUT /blogs-en
{
  "mappings": {
    "post": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "stemmed": { "type": "text", "analyzer": "english" }
          }
        }
      }
    }
  }
}

One Language per Document
PUT /blogs-fr
{
  "mappings": {
    "post": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "stemmed": { "type": "text", "analyzer": "french" }
          }
        }
      }
    }
  }
}

One Language per Document
GET /blogs-*/post/_search
{
  "query": {
    "multi_match": {
      "query": "deja vu",
      "fields": [ "title", "title.stemmed" ],
      "type": "most_fields"
    }
  }
}

One Language per Field
PUT /movies
{
  "mappings": {
    "movie": {
      "properties": {
        "title":    { "type": "text" },
        "title_br": { "type": "text", "analyzer": "brazilian" },
        "title_cz": { "type": "text", "analyzer": "czech" },
        "title_en": { "type": "text", "analyzer": "english" },
        "title_es": { "type": "text", "analyzer": "spanish" }
      }
    }
  }
}

One Language per Field
GET /movies/movie/_search
{
  "query": {
    "multi_match": {
      "query": "club de la lucha",
      "fields": [ "title*" ],
      "type": "most_fields"
    }
  }
}

Mixed-Language Fields
PUT /movies
{
  "mappings": {
    "movie": {
      "properties": {
        "title": {
          "type": "text",
          "fields": {
            "de": { "type": "text", "analyzer": "german" },
            "en": { "type": "text", "analyzer": "english" },
            "fr": { "type": "text", "analyzer": "french" },
            "es": { "type": "text", "analyzer": "spanish" }
          }
        }
      }
    }
  }
}

Mixed-Language Fields
GET /movies/movie/_search
{
  "query": {
    "multi_match": {
      "query": "club de la lucha",
      "fields": [ "title*" ],
      "type": "most_fields",
      "minimum_should_match": "75%"
    }
  }
}

Identifying Language
• Compact Language Detector (CLD) from Google
• Open source (Apache License 2.0)
• It is small, fast, and accurate, and can detect 160+ languages from as little as two sentences.
• It can even detect multiple languages within a single block of text.
• https://github.com/CLD2Owners/cld2

Query DSL
• Structure
• Introducing the query
• Executing searches and filters
• Match all queries
• Full text queries
• Term level queries
• Compound queries
• Joining queries
• Geo queries

Structure
• Based on JSON
• Flexible
• Powerful
• Leaf and compound query clauses
• Query and filter context

Structure
Query Context: "How well does this document match this query clause?"
• Relevance (score)
• Full text
• Not cached
• Slower

Structure
Filter Context: "Does this document match this query clause?"
• Boolean true/false
• Exact values
• Cached
• Faster

Structure
Filter first, then query the remaining docs
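The two contexts can be pictured as a two-stage pipeline: a yes/no filter that shrinks the candidate set, then scoring of the survivors only. This is a simplified sketch with hypothetical names and sample data; Lucene decides how clauses are actually executed, Elasticsearch caches filter results internally, and real scores come from BM25 rather than the matched-term count used here.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, str]


def filter_then_query(docs: List[Doc], keep: Callable[[Doc], bool],
                      query: str, field: str) -> List[Tuple[int, Doc]]:
    # 1. Filter context: a yes/no test per document, no score is computed.
    candidates = [doc for doc in docs if keep(doc)]
    # 2. Query context: only the surviving documents are scored.
    query_terms = query.lower().split()
    scored = []
    for doc in candidates:
        words = set(doc[field].lower().split())
        score = sum(term in words for term in query_terms)  # toy score: matched terms
        if score > 0:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


posts = [  # hypothetical sample documents
    {"title": "Search with Elasticsearch", "status": "published"},
    {"title": "Elasticsearch search tips", "status": "draft"},
    {"title": "Cooking with Lucene", "status": "published"},
]
results = filter_then_query(posts, lambda d: d["status"] == "published",
                            "elasticsearch search", "title")
for score, doc in results:
    print(score, doc["title"])
```

The draft post never reaches the scoring step even though it matches both query terms.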

Introducing the Query
GET /website/blog/_search
{ "query": { ... } }

GET /website/blog/_search
{ "query" : { "match_all": {} } }

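Introducing the Query
• from, size and sort are top-level parameters, siblings of "query" (not inside it)
• In 5.x, text fields cannot be sorted on by default, so the keyword sub-field created by dynamic mapping is used
GET /website/blog/_search
{
  "query" : { "match_all": {} },
  "from" : 10,
  "size" : 10,
  "sort": { "title.keyword" : { "order" : "desc" } }
}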

Executing Searches
• Returns all documents in the account type within the bank index
GET /bank/account/_search
{ "query": { "match_all": {} } }
• Returns all documents in the bank index
GET /bank/_search
{ "query": { "match_all": {} } }

Executing Searches
• Returns the account numbered 20
GET /bank/account/_search
{ "query": { "match": { "account_number": 20 } } }
• Returns all accounts containing the term "mill" in the address
GET /bank/account/_search
{ "query": { "match": { "address": "mill" } } }

Executing Searches
• Returns all accounts containing the term "mill" or "lane" in the address
GET /bank/account/_search
{ "query": { "match": { "address": "mill lane" } } }
• Returns all accounts containing the phrase "mill lane" in the address
GET /bank/account/_search
{ "query": { "match_phrase": { "address": "mill lane" } } }

Executing Searches
• Returns all accounts containing both "mill" and "lane" in the address
GET /bank/account/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Executing Searches
• Returns all accounts containing "mill" or "lane" in the address
GET /bank/account/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Executing Searches
• Returns all accounts that contain neither "mill" nor "lane" in the address
GET /bank/account/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}

Executing Searches
• Returns all accounts of anybody who is 40 years old but doesn’t live in Idaho (ID)
GET /bank/account/_search
{
  "query": {
    "bool": {
      "must": [ { "match": { "age": "40" } } ],
      "must_not": [ { "match": { "state": "ID" } } ]
    }
  }
}
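The four bool examples above differ only in which clause list holds the match clauses. This sketch models bool matching in query context as plain predicates: must and filter clauses must all match, no must_not clause may match, and should clauses are required (at least one) only when there are no must or filter clauses. The match helper is a toy whole-word test, and the accounts are hypothetical records shaped loosely like the bank documents.

```python
from typing import Any, Callable, Dict, List, Sequence

Doc = Dict[str, Any]
Clause = Callable[[Doc], bool]


def match(field: str, word: str) -> Clause:
    """Toy stand-in for a single-word match clause: case-insensitive whole-word test."""
    return lambda doc: word.lower() in str(doc.get(field, "")).lower().split()


def bool_matches(doc: Doc, must: Sequence[Clause] = (), should: Sequence[Clause] = (),
                 must_not: Sequence[Clause] = (), filter_: Sequence[Clause] = ()) -> bool:
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    # With no must/filter clauses, at least one should clause has to match;
    # otherwise should clauses are optional and would only raise the score.
    if should and not must and not filter_:
        return any(clause(doc) for clause in should)
    return True


accounts = [  # hypothetical sample records
    {"address": "198 Mill Lane", "age": 40, "state": "ID"},
    {"address": "244 Columbus Place", "age": 40, "state": "TX"},
    {"address": "702 Quentin Street", "age": 31, "state": "MA"},
    {"address": "17 Mill Avenue", "age": 25, "state": "ID"},
]


def select(**clauses: Sequence[Clause]) -> List[str]:
    return [a["address"] for a in accounts if bool_matches(a, **clauses)]


mill, lane = match("address", "mill"), match("address", "lane")
print(select(must=[mill, lane]))      # ['198 Mill Lane']
print(select(should=[mill, lane]))    # ['198 Mill Lane', '17 Mill Avenue']
print(select(must_not=[mill, lane]))  # ['244 Columbus Place', '702 Quentin Street']
```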

Executing Filters
• Returns all accounts with balances between 20000 and 30000, inclusive
GET /bank/account/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } }
    }
  }
}

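Query and Filter Context
GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Search" } },
        { "match": { "content": "Elasticsearch" } }
      ],
      "filter": [
        { "term": { "status": "published" } },
        { "range": { "publish_date": { "gte": "2015-01-01" } } }
      ]
    }
  }
}
• The match clauses run in query context and contribute to the score; the term and range clauses run in filter context and only include or exclude documents.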

Match All Queries
The simplest queries:
• match_all
• match_none

Match All Queries
• match_all: matches all documents, giving them all a _score of 1.0
GET /bank/account/_search
{ "query": { "match_all": {} } }
• match_none: the inverse of the match_all query, which matches no documents
GET /bank/account/_search
{ "query": { "match_none": {} } }

Full text queries
The high-level full text queries understand how the field being queried is analyzed and will apply each field's analyzer (or search_analyzer) to the query string before executing.
• match
• match_phrase
• match_phrase_prefix
• multi_match
• common_terms
• query_string
• simple_query_string

Match Query
GET /_search
{ "query": { "match" : { "message" : "QUICK BROWN FOX" } } }
• The query string is analyzed, and by default a document matches if it contains any of the resulting terms (operator "or")

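Match Query
• To require all terms, pass an object with query and operator for the field
GET /_search
{
  "query": {
    "match" : {
      "message" : { "query" : "QUICK BROWN FOX", "operator" : "and" }
    }
  }
}
• minimum_should_match: how many of the analyzed terms must match
• fuzziness: Levenshtein edit distance, e.g. kiuck > qiuck > quick
• zero_terms_query: what to return when the analyzer removes every term (e.g. a query made only of stopwords)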
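Fuzziness is measured in edits. The function below computes the optimal string alignment distance (Levenshtein plus adjacent transpositions), one common way to count them; treat it as an illustration of the idea, not Lucene's automaton-based implementation. When a transposition counts as one edit, as Lucene's fuzzy matching can do, each arrow in kiuck > qiuck > quick is a single edit, so kiuck is two edits away from quick. With plain Levenshtein, the qiuck > quick step alone costs two substitutions.

```python
def osa_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance: insertions, deletions, substitutions (+ adjacent swaps if enabled)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap of adjacent chars
    return d[-1][-1]


print(osa_distance("kiuck", "qiuck"))  # 1 (substitution)
print(osa_distance("qiuck", "quick"))  # 1 (transposition)
print(osa_distance("kiuck", "quick"))  # 2
```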

Match Phrase Query
• Analyzes the text and creates a phrase query out of the analyzed text.
• A phrase query matches terms up to a configurable slop (which defaults to 0) in any order.
GET /_search
{ "query": { "match_phrase" : { "message" : "QUICK BROWN FOX" } } }

Match Phrase Query
• Setting slop requires the object form, with the text under query
GET /_search
{
  "query": {
    "match_phrase" : {
      "message" : { "query" : "BROWN QUICK FOX", "slop" : 10 }
    }
  }
}

Match Phrase Prefix Query
• Like match_phrase, but the last term is treated as a prefix
GET /_search
{ "query": { "match_phrase_prefix" : { "message" : "quick brown f" } } }
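The prefix ("f" above) is expanded into terms from the index that start with it, capped by the max_expansions parameter, which defaults to 50. Because the terms of an inverted index are kept sorted, the candidates form one contiguous run that can be found with a binary search. This sketch uses a hypothetical Python list of terms; Lucene walks its own term dictionary instead.

```python
import bisect
from typing import List


def expand_prefix(sorted_terms: List[str], prefix: str, max_expansions: int = 50) -> List[str]:
    """Return up to max_expansions terms from a sorted term list that start with prefix."""
    expansions: List[str] = []
    i = bisect.bisect_left(sorted_terms, prefix)  # first term >= prefix
    while (i < len(sorted_terms) and sorted_terms[i].startswith(prefix)
           and len(expansions) < max_expansions):
        expansions.append(sorted_terms[i])
        i += 1
    return expansions


terms = sorted(["brown", "fast", "fox", "foxes", "frog", "quick"])
print(expand_prefix(terms, "f"))                    # ['fast', 'fox', 'foxes', 'frog']
print(expand_prefix(terms, "f", max_expansions=2))  # ['fast', 'fox']
```

The cap matters because a very short prefix could otherwise expand to a huge number of terms.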

Multi Match Query
• A match query on multiple fields
GET /_search
{ "query" : { "multi_match" : { "query" : "this is a test", "fields" : [ "subject", "message" ] } } }

Multi Match Query
GET /_search
{
  "query": {
    "multi_match" : {
      "query": "brown fox",
      "type": "best_fields",
      "fields": [ "subject", "message" ]
    }
  }
}
• Types: best_fields (default), most_fields, cross_fields, phrase, phrase_prefix

Term level queries
The term-level queries operate on the exact terms that are stored in the inverted index. They are typically used for structured data like numbers, dates, and enums.
• wildcard
• regexp
• fuzzy
• type
• ids
• term
• terms
• range
• exists
• prefix

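Term Query
Let's create a document for this example, with full_text mapped as text and exact_value as keyword:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_text": { "type": "text" },
        "exact_value": { "type": "keyword" }
      }
    }
  }
}
PUT my_index/my_type/1
{ "full_text": "Quick Foxes!", "exact_value": "Quick Foxes!" }
• full_text: the inverted index will contain the terms [quick, foxes]
• exact_value: the inverted index will contain the exact term [Quick Foxes!]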

Term Query
This query matches because the exact_value field contains the exact term Quick Foxes!
GET my_index/my_type/_search
{ "query": { "term": { "exact_value": "Quick Foxes!" } } }
• exact_value: the inverted index contains the exact term [Quick Foxes!]

Term Query
This query does not match, because the full_text field only contains the terms quick and foxes. It does not contain the exact term Quick Foxes!
GET my_index/my_type/_search
{ "query": { "term": { "full_text": "Quick Foxes!" } } }
• full_text: the inverted index contains the terms [quick, foxes]

Term Query
A term query for the term foxes matches the full_text field.
GET my_index/my_type/_search
{ "query": { "term": { "full_text": "foxes" } } }
• full_text: the inverted index contains the terms [quick, foxes]

Term Query
This match query on the full_text field first analyzes the query string, then looks for documents containing quick or foxes or both.
GET my_index/my_type/_search
{ "query": { "match": { "full_text": "Quick Foxes!" } } }
• full_text: the inverted index contains the terms [quick, foxes]
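The four examples can be reproduced with a toy model in which a text field stores analyzed terms and a keyword field stores the value untouched. This sketch assumes full_text is mapped as text and exact_value as keyword; the regex analyzer is a rough stand-in for the standard analyzer, and all function names are hypothetical.

```python
import re
from typing import Dict, List, Set

FIELD_TYPES = {"full_text": "text", "exact_value": "keyword"}  # assumed mapping


def analyze(value: str) -> List[str]:
    return re.findall(r"\w+", value.lower())  # rough stand-in for the standard analyzer


def index_document(doc: Dict[str, str]) -> Dict[str, Set[str]]:
    """Terms stored in the inverted index for each field of one document."""
    return {
        field: set(analyze(value)) if FIELD_TYPES[field] == "text" else {value}
        for field, value in doc.items()
    }


def term_query(stored: Dict[str, Set[str]], field: str, value: str) -> bool:
    return value in stored[field]  # the value is looked up as-is, never analyzed


def match_query(stored: Dict[str, Set[str]], field: str, text: str) -> bool:
    # Full text query: analyze the input like the field, then match any term (OR).
    wanted = analyze(text) if FIELD_TYPES[field] == "text" else [text]
    return any(term in stored[field] for term in wanted)


stored = index_document({"full_text": "Quick Foxes!", "exact_value": "Quick Foxes!"})
print(term_query(stored, "exact_value", "Quick Foxes!"))  # True
print(term_query(stored, "full_text", "Quick Foxes!"))    # False
print(term_query(stored, "full_text", "foxes"))           # True
print(match_query(stored, "full_text", "Quick Foxes!"))   # True
```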

Range Query
GET my_index/my_type/_search
{ "query": { "range" : { "age" : { "gte" : 10, "lte" : 20 } } } }

Ranges on date fields
GET _search
{ "query": { "range" : { "date" : { "gte" : "now-1d/d", "lt" : "now/d" } } } }
• Date math (e.g. now-1d/d: now minus one day, rounded to the day)
• Date format (format)
• Time zone (time_zone)
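With gte and lt, both "now-1d/d" and "now/d" round down to midnight, so the range above covers exactly the previous calendar day (in UTC unless time_zone is set). The sketch evaluates only a tiny, hypothetical subset of date math (now, an optional +/-N days or hours, optional /d or /h rounding) and always rounds down; Elasticsearch rounds up instead for gt and lte, which is not modelled here.

```python
import re
from datetime import datetime, timedelta

UNITS = {"d": timedelta(days=1), "h": timedelta(hours=1)}


def eval_date_math(expr: str, now: datetime) -> datetime:
    """Evaluate now[+-N(d|h)][/d|/h], rounding down when a /unit is given."""
    m = re.fullmatch(r"now(?:([+-])(\d+)([dh]))?(?:/([dh]))?", expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr}")
    sign, amount, unit, rounding = m.groups()
    result = now
    if sign:
        delta = int(amount) * UNITS[unit]
        result = result + delta if sign == "+" else result - delta
    if rounding == "d":  # round down to the start of the day
        result = result.replace(hour=0, minute=0, second=0, microsecond=0)
    elif rounding == "h":  # round down to the start of the hour
        result = result.replace(minute=0, second=0, microsecond=0)
    return result


now = datetime(2017, 1, 15, 13, 45)  # fixed "now" so the example is repeatable
gte = eval_date_math("now-1d/d", now)
lt = eval_date_math("now/d", now)
print(gte, lt)  # 2017-01-14 00:00:00 2017-01-15 00:00:00
```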

Exists Query
Returns documents that have at least one non-null value in the original field.
GET /_search
{ "query": { "exists" : { "field" : "user" } } }
These documents would all match the query:
{ "user": "jane" }
{ "user": "" }
{ "user": "-" }
{ "user": ["jane"] }
{ "user": ["jane", null ] }
These documents would not match the query:
{ "user": null }
{ "user": [] }
{ "user": [null] }
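The rule is about null, not emptiness: an empty string or "-" still counts as a value, while null, an empty array, or an array containing only nulls does not. A toy predicate that reproduces both lists, assuming it is applied to the JSON source rather than the index and ignoring any null_value mapping:

```python
from typing import Any, Dict


def exists(doc: Dict[str, Any], field: str) -> bool:
    """True if the field holds at least one non-null value (toy model of the exists query)."""
    value = doc.get(field)  # a missing field behaves like null
    if isinstance(value, list):
        return any(item is not None for item in value)
    return value is not None


matching = [{"user": "jane"}, {"user": ""}, {"user": "-"},
            {"user": ["jane"]}, {"user": ["jane", None]}]
not_matching = [{"user": None}, {"user": []}, {"user": [None]}]
print([exists(d, "user") for d in matching])      # [True, True, True, True, True]
print([exists(d, "user") for d in not_matching])  # [False, False, False]
```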

Compound queries
Compound queries wrap other compound or leaf queries, either to combine their results and scores, to change their behaviour, or to switch from query to filter context.
• constant_score
• bool
• dis_max
• function_score
• boosting
• indices

Bool Query
A query that matches documents matching boolean combinations of other queries.
• must
• filter
• should
• must_not

Bool Query
POST _search
{
  "query": {
    "bool" : {
      "must" : { "term" : { "user" : "SHAGABA" } },
      "filter": { "term" : { "tag" : "tikal" } },
      "must_not" : { "range" : { "age" : { "gte" : 1, "lte" : 21 } } },
      "should" : [
        { "term" : { "tag" : "spark" } },
        { "term" : { "tag" : "elasticsearch" } }
      ]
    }
  }
}

Summary
• Elasticsearch concepts
• Getting started
• CRUD
• Inverted index
• Text analysis
• Analyzers
• Mapping
• Query DSL

Links
• Elastic: https://www.elastic.co/
• Elasticsearch: The Definitive Guide: https://www.elastic.co/guide/en/elasticsearch/guide/current/index.html
• Elasticsearch Reference 5.1: https://www.elastic.co/guide/en/elasticsearch/reference/5.1/index.html
• Elastic videos & webinars: https://www.elastic.co/videos

Thanks! Hope you’re inspired.
Shai Gabai
