ElasticSearch: The Missing Intro

ElasticSearch tutorial for OSCON 2014.

Laura Thomson

July 20, 2014

Transcript

  1. Elasticsearch: the missing tutorial. Erik Rose & Laura Thomson,

     Mozilla
  2. Elasticsearch: the missing tutorial. Erik Rose & Laura Thomson,

     Mozilla
  3. housekeeping • Make sure ES is installed. If you haven’t

    installed it yet and you’re on a Mac, just install 1.1.x. • Exercise code: clone the git repo at (or just visit)
 https://github.com/erikrose/oscon-elasticsearch/ • Make faces.
  4. • Full-text search • Big data • Faceting • Geographical

    queries what it’s good for
  5. Shay Banon, Heavy Lifter

  6. the rest of us ?

  7. characteristics

  8. • Elasticsearch wraps Lucene. • Read/write/admin via REST over

     HTTP on port 9200 • Native format is JSON (vs. XML).
  9. • CAP: consistency, availability, partition tolerance • “pick any two”

    ! • “When it comes to CAP, in a very high level, elasticsearch gives up on partition tolerance” (2010) CAP
  10. • …it’s not that simple ! • Consistency is mostly

    eventual. • Availability is variable. • Partition tolerant it’s not. ! • Read http://aphyr.com/posts/317-call-me-maybe-elasticsearch (and despair). CAP
  11. • Generally not suitable as a primary data store. •

    It’s a distributed search engine ! • Easy to get started • Easy to integrate with your existing web app • Easy to configure it not-too-terribly • Enables fast search with cool features what it’s good for, redux
  12. definitions

  13. • node — a machine in your cluster • cluster

    — the set of nodes running ES • master node — Elected by the cluster. If the master fails, another node will take over. nodes and clusters
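     A quick way to see the nodes in your cluster and which one is currently the elected master is the cat API that ships with ES 1.x (a sketch; assumes a local node on the default port):
       curl -s 'http://localhost:9200/_cat/nodes?v'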
  14. • shard — A Lucene index. Each piece of data

    you store is written to a primary shard. Primary shards are distributed over the cluster. ! • replica — Each shard has a set of distributed replicas (copies). Data written to a primary shard is copied to replicas on different nodes. shards and replicas
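     To see how primaries and replicas are laid out, the shards cat endpoint lists every shard, whether it is a primary (p) or a replica (r), its state, and the node it lives on (a sketch against the same local node):
       curl -s 'http://localhost:9200/_cat/shards?v'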
  15. self-defense

  16. # Unicast discovery allows to explicitly control which nodes will

    be used # to discover the cluster. It can be used when multicast is not present, # or to restrict the cluster communication-wise. # # 1. Disable multicast discovery (enabled by default): # discovery.zen.ping.multicast.enabled: false exercise: fix clustering and listening # Elasticsearch, by default, binds itself to the 0.0.0.0 address, and listens # on port [9200-9300] for HTTP traffic and on port [9300-9400] for node-to-node # communication. (the range means that if the port is busy, it will automatically # try the next port). ! # Set the bind address specifically (IPv4 or IPv6): # network.bind_host: 127.0.0.1
  17. % cd elasticsearch-1.2.2 ! % bin/elasticsearch ! # On the

    Mac: % JAVA_HOME=$(/usr/libexec/java_home -v 1.7) bin/elasticsearch exercise: start up and check % curl -s -XGET 'http://127.0.0.1:9200/_cluster/health?pretty' { "cluster_name" : "grinchertoo", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 1, "number_of_data_nodes" : 1, "active_primary_shards" : 19, "active_shards" : 19, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 13 }
  18. exercise: tool up curl

  19. exercise: tool up BBEdit’s shell worksheets:
 http://pine.barebones.com/files/BBEdit_10.5.11.dmg

  20. exercise: tool up Marvel/Sense: http://www.elasticsearch.org/overview/marvel/download/

  21. data structure basics

  22. An index contains doctypes, and each doctype holds JSON documents ({… })

  23. curl -s -XPUT 'http://localhost:9200/test/' exercise: make an index

  24. IDs 6a8ca01c-7896-48e9-81cc-9f70661fcb32

  25. # Make a doc:
curl -s -XPUT 'http://127.0.0.1:9200/test/book/1' -d '{


    "title": "All About Fish",
 "author": "Fishy McFishstein",
 "pages": 3015
 }' ! # Make sure it's there:
 curl -s -XGET 'http://127.0.0.1:9200/test/book/1?pretty' {
 "_index" : "test",
 "_type" : "book",
 "_id" : "1",
 "_version" : 2,
 "found" : true,
 "_source" : {
 "title": "All About Fish",
 "author": "Fishy McFishstein",
 "pages": 3015
 }
 } exercise: make a doc
  26. # Delete the doc: curl -s -XDELETE 'http://localhost:9200/test/book/1' exercise: make

    a doc
  27. A book-style index: diplodocus …… 333, duodenum …… 201, dwaal …… 500, 119

  28. An inverted index. Documents: 0 "row row row your boat",

     1 "row the row boat", 2 "chicken chicken chicken", 3 "the front row". Postings: row → 0, 1, 3; boat → 0, 1; chicken → 2
  33. The same inverted index with term positions per document:

     row → 0 [0,1,2], 1 [0,2], 3 [2]; boat → 0 [4], 1 [3]; chicken → 2 [0,1,2]
  37. The postings again, with a count alongside each term: row 232 → 0 [0,1,2],

     1 [0,2], 3 [2]; boat 78 → 0 [4], 1 [3]; chicken 91 → 2 [0,1,2] (positions)
  38. indices on properties "title": "All About Fish", "author": "Fishy McFishstein",

    "pages": 3015 "title": "Nothing About Pigs", "author": "Nopiggy Nopigman", "pages": 0 "title": "All About Everything", "author": "Everybody", "pages": 4294967295
  39. inner objects curl -s -XPUT 'http://localhost:9200/test/book/1' -d '{ "title": "All

    About Fish", "author": { "name": "Fisher McFishstein", "birthday": "1980-02-22", "favorite_color": "green" } }' title: All About Fish author.name: Fisher McFishstein author.birthday: 1980-02-22 author.favorite_color: green curl -s -XGET 'http://127.0.0.1:9200/test/book/1?pretty' { "_index" : "test", "_type" : "book", "_id" : "1", "_version" : 1, "found" : true, "_source" : { "title": "All About Fish", "author": { "name": "Fisher McFishstein", "birthday": "1980-02-22", "favorite_color": "green" } }
  40. arrays # Insert a doc containing an array: curl -s

     -XPUT 'http://127.0.0.1:9200/test/book/1' -d '{ "title": "All About Fish", "tag": ["one", "two", "red", "blue"] }' Each array element is indexed as its own term pointing at doc 1: one → 1, two → 1, red → 1, blue → 1
  41. # Insert a bunch of different docs by changing the

     title and tag values: % curl -s -XPUT 'http://127.0.0.1:9200/test/book/1' -d '{ "title": "All About Fish", "tag": ["one", "two", "red", "blue"] }' exercise: array play # A sample query; try swapping in tag values like "red", ["blue"], ["one", "red"], or "two": % curl -s -XGET 'http://127.0.0.1:9200/test/book/_search?pretty' -d '{ "query": { "match_all": {} }, "filter": { "term": {"tag": ["two", "three"]} } }'
  42. mappings

  43. # Make a new album doc:
curl -s -XPUT 'http://127.0.0.1:9200/test/

    album/1' -d '{ "title": "Fish Sounds", "gapless_playback": true, "length_seconds": 210000, "weight": 1.22, "released": "2013-01-23"
 }' ! # See what kind of mapping ES guessed:
curl -s -XGET 'http://127.0.0.1:9200/test/album/_mapping?pretty'
 implicit mappings { "test" : { "mappings" : { "album" : { "properties" : { "title" : { "type" : "string" }, "gapless_playback" : { "type" : "boolean" }, "length_seconds" : { "type" : "long" }, "weight" : { "type" : "double" }, "released" : { "type" : "date", "format" : "dateOptionalTime" } } } } }
  44. explicit mappings { "test" : { "mappings" : { "album"

    : { "properties" : { "title" : { "type" : "string" }, "gapless_playback" : { "type" : "boolean" }, "length_seconds" : { "type" : "long" }, "weight" : { "type" : "double" }, "released" : { "type" : "date", "format" : "dateOptionalTime" } } } } } curl -s XPUT 'http://127.0.0.1:9200/test/ _mapping/album' -d '{ "properties" : { "title" : { "type" : "string" }, "gapless_playback" : { "type" : "boolean" }, "length_seconds" : { "type" : "long" }, "weight" : { "type" : "double" }, "released" : { "type" : "date", "format" : "dateOptionalTime" } } }' { curl -s -XDELETE 'http://127.0.0.1:9200/ test/album'
  45. 1. Delete the “album” doctype, if you’ve made one by

    following along. 2. Think of an album which would prompt ES to guess a wrong type. 3. Insert it, and GET the _mapping to show the wrong guess. 4. Delete all “album” docs again so you can change the mapping. 5. Set a mapping explicitly so you can’t fool ES anymore. exercise: use explicit mappings
  46. Lurking Horrors

  47. queries

  48. • Query ES via HTTP/REST • Possible to do with

    query string • DSL is better ! • Let’s write some queries. • But first, let’s get some data in our cluster to query. queries
  49. exercise 1 • Bulk load a small test data set

    to use for querying. • This is exercise_1 in the queries/ directory of the git repo, so you can cut and paste, or execute it directly. ! % curl -XPOST localhost:9200/_bulk --data-binary @data.bulk
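     The bulk body is newline-delimited JSON: an action line followed by a source line for each document, and the body must end with a newline. The real data.bulk lives in the repo; an illustrative pair of lines (reusing the earlier book doc) would look like:
       {"index": {"_index": "test", "_type": "book", "_id": "1"}}
       {"title": "All About Fish", "author": "Fishy McFishstein", "pages": 3015}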
  50. ! ! % curl -s -XGET 'http://127.0.0.1:9200/test/book/1?pretty' exercise 2 •

    Let’s check we can pull that data, by grabbing a single document. ! • This is exercise_2 in the queries/ directory of the repo, so you can cut and paste.
  51. exercise 3 • We’ll begin by using a URI search

    (sometimes called, a little fuzzily, a query string query). ! • (This is exercise_3) ! % curl -s -XGET 'http://127.0.0.1:9200/test/book/_search?q=title:Python'
  52. • Passes searches via GET in the query string •

    This is fine for running simple queries, basic “is it working” type tests and so on. • Once you have any level of complexity in your query, you’re going to need the query DSL. ! limited appeal
  53. • DSL == Domain Specific Language • DSL is an

    AST (abstract syntax tree) of queries. ! • What does that actually mean? • Write your queries in JSON, which can be arbitrarily complex. query DSL
  54. { "query" : { "match" : { "title" : "Python"

    } } } simple DSL term query
  55. • Run this query (exercise 4). ! % curl -XGET

    'http://localhost:9200/test/book/_search' -d '{ "query" : { "match" : { "title" : "Python" } } }' ! (What do you notice about the results?) exercise 4
  56. • Filters: • Boolean: document matches or it does not

    • Order of magnitude faster than queries • Use for exact values • Cacheable queries vs. filters
  57. • Queries: • Use for full text searches • Relevance

    scored ! ! ! Filter when you can; query when you must. queries vs. filters
  58. curl -XGET -s 'http://localhost:9200/test/book/_search?pretty=true' -d '{

       "query": {
         "filtered": {
           "filter": {
             "term": { "category": "Web Development" }
           },
           "query": {
             "bool": {
               "should": [
                 { "match": { "title": "Python" } },
                 { "match": { "summary": "Python" } }
               ]
             }
           }
         }
       }
     }' use them together!
  59. exercise 5 • Let’s run that query. ! • (This

    is exercise_5)
  60. exercise 5 results • Where are my results???

  61. exercise 6 • Like many relational databases, Elasticsearch can

     explain why a document did or didn't match. Let's run the explain API on this query. • (This is exercise_6)
     curl -XGET -s 'http://localhost:9200/test/book/4/_explain?pretty=true' -d '{
       "query": {
         "filtered": {
           "filter": {
             "term": { "category": "Web Development" }
           },
           "query": { …
  62. exercise 6 results {

       "_index" : "test",
       "_type" : "book",
       "_id" : "4",
       "matched" : false,
       "explanation" : {
         "value" : 0.0,
         "description" : "failure to match filter: cache(category:Web Development)",
         "details" : [ {
       …
  63. • This is a classic beginner gotcha. • The

     standard analyzer, applied to all string fields by default, breaks “Web Development” into the terms “web” and “development”, and those are what get indexed. • The single term “Web Development” is not indexed anywhere. analyze that!
  64. • term queries or filters look for an exact match,

     so they find nothing. • But {“match” : “Web Development”} does work. Why? • match queries or filters use analysis: they break this down into searches for “web” or “development”. but match works!
  65. exercise 7 • Let’s make it work. ! • One

    solution is in exercise_7. • Take a couple minutes before peeking. • TMTOWTDI ! !
  66. • Term queries look for the whole term and are

    not analyzed. • Match queries are analyzed, and look for matches to the analyzed parts of the query. summary: term vs. match
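     So one way to make the earlier query find documents again is to swap the term filter for a match query on the analyzed field. This is a sketch in the spirit of exercise_7; the repo's actual solution may differ:
       curl -XGET -s 'http://localhost:9200/test/book/_search?pretty=true' -d '{
         "query": { "match": { "category": "Web Development" } }
       }'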
  67. curl -XGET -s 'http://localhost:9200/test/book/_search?pretty=true' -d \ '{ "query": { "match_phrase":

    { "summary": { "query": "old versions of browsers", "slop": 2 } } } }' match_phrase
  68. • Where are my favorites, AND, OR, and NOT? •

     Tortured syntax of the bool query: • must: everything in the must clause is ANDed together • should: everything in the should clause is ORed • must_not: you guessed it. • Nest them as much as you like. boolean queries
  69. • minimum_should_match is the number of should clauses that have

    to match. boolean bonuses
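     For example, this sketch (using the category and title fields from the earlier exercises) returns only documents that satisfy at least two of the three should clauses:
       curl -XGET -s 'http://localhost:9200/test/book/_search?pretty=true' -d '{
         "query": {
           "bool": {
             "should": [
               { "match": { "category": "development" } },
               { "match": { "category": "programming" } },
               { "match": { "title": "Python" } }
             ],
             "minimum_should_match": 2
           }
         }
       }'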
  70. "query": {! "bool": {! "must": { ! "bool": {! "should":

    [ ! {! "match": {! "category": "development"! }! },! {! "match": { ! "category": "programming" ! }! }! ]! }! },! "should": [! {! "match": {…!
  71. exercise 8 • Run this query - it’s in exercise_8

    ! • Can you modify it to find books for intermediate or above level programmers? ! !
  72. • We’re actually not going to cover faceting - deprecated

    in favor of aggregations. faceting
  73. • Aggregations let you put returned documents into buckets and

    run metrics over those buckets. • Useful for drill down navigation of data. aggregations
  74. exercise 9 curl -XGET -s 'http://localhost:9200/test/book/_search?pretty=true' -d '{

       "size" : 0,
       "aggs" : {
         "category" : {
           "terms" : { "field" : "category" }
         }
       }
     }' • Run a sample aggregation - exercise_9
  75. • You can affect the way ES calculates relevance scores

    for results. For example: • Boost: weigh one part of a query more heavily than others • Custom function-scoring queries: e.g. weighting more complete user profiles • Constant score queries: pre-set a score for part of a query (useful for filters!) scoring
  76. boosting "query": { "bool": { "should": [ { "term": {

    "title": { "value": "python", "boost": 2.0 } } }, { "term": { "summary": "python" } } ] } }
  77. function scoring curl -XGET -s 'http://localhost:9200/test/book/_search?pretty=true' -d \ '{ "query":

    { "function_score": { "query": { "match": { "title": "Python" } }, "script_score": { "script": "_score * doc[\"rating\"].value" } } } }'
  78. • You have various options for writing your functions: •

    Default has been mvel but is now Groovy • Plugins for: • JS • Python • Clojure • mvel scripting languages
  79. analysis

  80. stock analyzers original: Red-orange gerbils live at #43A Franklin St.

    ! whitespace: Red-orange gerbils live at #43A Franklin St. standard: red orange gerbils live 43a franklin st simple: red orange gerbils live at a franklin st stop: red orange gerbils live franklin st snowball: red orang gerbil live 43a franklin st • stopwords • stemming • punctuation • case-folding
  81. curl -XGET -s 'http://localhost:9200/_analyze?analyzer=whitespace&pretty=true' -d 'Red-orange gerbils live at

    #43A Franklin St.' { "tokens" : [ { "token" : "Red-orange", "start_offset" : 0, "end_offset" : 10, "type" : "word", "position" : 1 }, { "token" : "gerbils", "start_offset" : 11, "end_offset" : 18, "type" : "word", "position" : 2 }, ...
  82. exercise: find 10 stopwords curl -XGET -s 'http://localhost:9200/_analyze?analyzer=stop&pretty=true' -d

    'The word "an" is a stopword.' Hint: Run the above and see what happens.
  83. solution: find 10 stopwords curl -XGET -s 'http://localhost:9200/_analyze?analyzer=stop&pretty=true' -d

     'The an is a with that be for to and snookums' { "tokens" : [ { "token" : "snookums", "start_offset" : 36, "end_offset" : 44, "type" : "word", "position" : 11 } ] }
  84. applying mappings to properties curl -s -XPUT 'http://127.0.0.1:9200/test/_mapping/album' -d '{

    "properties": { "title": { "type": "string" }, "description": { "type": "string", "analyzer": "snowball" }, ... } }'
  85. analyzer internals: name_analyzer = CharFilter → Tokenizer → Token Filter → terms (diagram example: O Brien)

  86. "analysis": { "analyzer": { "name_analyzer": { "type": "custom", "tokenizer": "name_tokenizer",

    "filter": ["lowercase"] } }, "tokenizer": { "name_tokenizer": { "type": "pattern", "pattern": "[^a-zA-Z']+" } } } name_analyzer CharFilter Tokenizer Token Filter terms x O’Brien
  87. exercise: write a custom analyzer tags: "red, two-headed, striped, really

    dangerous" ! curl -XGET -s 'http://localhost:9200/_analyze?analyzer=whitespace&pretty=true' -d 'red, two-headed, striped, really dangerous' red two-headed striped really dangerous curl -s -XGET 'http://127.0.0.1:9200/test/ monster/_search?pretty' -d '{ "query": { "match_all": {} }, "filter": { "term": {"tags": "dangerous"} } } { "took" : 3, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 1.0, "hits" : [ { "_index" : "test", "_type" : "monster", "_id" : "1", "_score" : 1.0, "_source" : { "title": "Scarlet Klackinblax", "tags": "red, two-headed, striped, really dangerous" } } ] } }
  88. exercise: write a custom analyzer # How to update the

    "test" index's analyzers: curl -s -XPUT 'http://localhost:9200/test/_settings?pretty' -d '{ "analysis": { "analyzer": { "whitespace_analyzer": { "filter": ["lowercase"], "tokenizer": "whitespace_tokenizer" } }, "tokenizer": { "whitespace_tokenizer": { "type": "pattern", "pattern": " +" } } } }' curl -XGET -s 'http://localhost:9200/test/_analyze? analyzer=whitespace_analyzer&pretty=true' -d 'all your base are belong to us, dude' { "error" : "ElasticsearchIllegalArgumentException[Can't update non dynamic settings[[index.analysis.analyzer.comma_delim.filter.0, index.analysis.tokenizer.comma_delim_tokenizer.type, index.analysis.tokenizer.comma_delim_tokenizer.pattern, index.analysis.analyzer.comma_delim.tokenizer]] for open indices[[test]]]", "status" : 400 } curl -s -XPOST 'http://localhost:9200/test/_close' curl -s -XPOST 'http://localhost:9200/test/_open'
  89. solution: write a custom analyzer curl -s -XPUT 'http://localhost:9200/test/_settings?pretty' -d

    '{ "analysis": { "analyzer": { "comma_delim": { "filter": ["lowercase"], "tokenizer": "comma_delim_tokenizer" } }, "tokenizer": { "comma_delim_tokenizer": { "type": "pattern", "pattern": ", +" } } } }' curl -XGET -s 'http://localhost:9200/test/_analyze?analyzer=comma_delim&pretty=true' -d 'red, two- headed, striped, really dangerous' "token": "red" ... "token": "two-headed" ... "token": "striped" ... "token": "really dangerous"
  90. ngrams 'analyzer': { # A lowercase trigram analyzer 'trigramalyzer': {

    'filter': ['lowercase'], 'tokenizer': 'trigram_tokenizer' } }, 'tokenizer': { 'trigram_tokenizer': { 'type': 'nGram', 'min_gram': 3, 'max_gram': 3 # Keeps all kinds of chars by default. } “Chemieingenieurwesen ” …ing nge gen eni nie ieu eur…
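     To see the trigram analyzer in action, register it on the test index the same way as the custom analyzer above (PUT _settings with the index closed, then reopen). A sketch, assuming it was registered as trigramalyzer:
       curl -XGET -s 'http://localhost:9200/test/_analyze?analyzer=trigramalyzer&pretty=true' -d 'Chemieingenieurwesen'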
  91. clustering

  92. shards curl -XPUT 'http://localhost:9200/twitter/' -d '
     index:
       number_of_shards: 3
     '

  93. replicas curl -XPUT 'http://localhost:9200/twitter/' -d '

     index:
       number_of_shards: 3
       number_of_replicas: 2
     '
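     Note that number_of_shards is fixed when the index is created, but number_of_replicas can be changed on a live index; for example:
       curl -XPUT 'http://localhost:9200/twitter/_settings' -d '{
         "index": { "number_of_replicas": 1 }
       }'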
  94. exercise: provisioning How would you provision a cluster if we

     were doing lots of CPU-expensive queries on a large corpus, but only a small subset of the corpus was “hot”?
  95. extremer extremes

  96. • At least 1 replica • Plenty of shards—but not

    a million • At least 3 nodes. recommendations Avoid split-brain: discovery.zen.minimum_master_nodes: 2 • Get unlucky?
 Set fire to the data center and walk away. Or continually repopulate.
  97. real-life examples

  98. • Protect with a firewall, or try elasticsearch-jetty. • discovery.zen.ping.multicast.enabled:

    false • discovery.zen.ping.unicast.hosts:
     ["master1", "master2"] • cluster.name: something_weird too friendly
  99. adding nodes without downtime • Puppet out new config file:


    discovery.zen.ping.unicast.hosts:
 ["old.example.com", ..., "new.example.com"] • Bring up the new node.
  100. beware inconsistent config

  101. be wary of upgrades

  102. monitoring curl -XGET -s 'http://localhost:9200/_cluster/health?pretty' { "cluster_name" : "grinchertoo", "status"

    : "yellow", "timed_out" : false, "number_of_nodes" : 1, "number_of_data_nodes" : 1, "active_primary_shards" : 29, "active_shards" : 29, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 26 } curl -XGET -s 'http://localhost:9200/_cluster/state?pretty' { "cluster_name" : "elasticsearch", "version" : 3, "master_node" : "ACuIytIIQ7G7b_Rg_G7wnA",
  103. exercise: monitoring Why is just checking for cluster color insufficient?

    ! What could we check in addition? "cluster_name" : "grinchertoo", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 1, "number_of_data_nodes" : 1, "active_primary_shards" : 29, "active_shards" : 29, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 26
  104. monitoring: elasticsearch-paramedic http://karmi.github.com/elasticsearch-paramedic/

  105. monitoring: marvel http://www.elasticsearch.org/overview/marvel/

  106. optimization

  107. bootstrap.mlockall: true

  108. ES_HEAP_SIZE: half of RAM

  109. open files /etc/security/limits.conf:

     es_user soft nofile 65535
     es_user hard nofile 65535
     plus /etc/init.d/elasticsearch:
     ulimit -n 65535
     ulimit -l unlimited
  110. Use default stores.

  111. RAM & JVM tuning

  112. MySQL

  113. shrinking indices • Watch memory usage (% vmstat -S m -a 2)

     while you trim what gets stored and indexed:
     "some_doctype" : { "_source" : {"enabled" : false} }
     "some_doctype" : { "_all" : {"enabled" : false} }
     "some_doctype" : { "some_field" : {"include_in_all" : false} }
  114. filter caching "filter": { "terms": { "tags": ["red", "green"], "execution":

    "plain" } } "filter": { "terms": { "tags": ["red", "green"], "execution": "bool" } }
  115. dealing with the future

  116. mappings

  117. expensive updates

  118. how to reindex • Use the Bulk API. • Turn off

     auto-refresh: curl -XPUT localhost:9200/test/_settings -d '{ "index" : { "refresh_interval" : "-1" } }' • index.merge.policy.merge_factor: 1000 • Remove replicas if you can. • Use multiple feeder processes. • Put everything back when you're done (see the sketch below).
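     "Put everything back" means restoring whatever you changed once the load finishes, for example re-enabling refresh and replicas. A sketch using the default values; adjust to your own settings:
       curl -XPUT 'http://localhost:9200/test/_settings' -d '{
         "index" : { "refresh_interval" : "1s", "number_of_replicas" : 1 }
       }'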
  119. • Backups used to be fairly cumbersome, but now there's

     an API for that! • Set it up: curl -XPUT 'http://localhost:9200/_snapshot/backups' -d '{ "type": "fs", "settings": { "location": "/somewhere/backups", "compress": true } }' • Run a backup: curl -XPUT "localhost:9200/_snapshot/backups/july20" backups
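     Restoring goes through the same snapshot API; the target indices must be closed or deleted first. A sketch using the repository and snapshot names from above:
       curl -XPOST 'http://localhost:9200/_snapshot/backups/july20/_restore'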
  120. fancy & advanced features

  121. synonyms "filter": { "synonym": { "type": "synonym", "synonyms": [ "albert

    => albert, al", "allan => allan, al" ] } } original query: Allan Smith after synonyms: [allan, al] smith original query: Albert Smith after synonyms: [albert, al] smith
  122. • You can apply synonyms at indexing time or at

     query time. • For all that's beautiful in this world, do it at query time. • Doing it at indexing time explodes your data size. • You can store synonyms in a file and reference that file in your mapping. • Many gotchas: undocumented limits on the file, and it needs to be uploaded to the config dir on each node. synonym gotchas
  123. • Use to suggest possible search terms, or complete queries

    • Types: • Term and Phrase - will do spelling corrections • Completion - for autocomplete • Context - limit suggestions to a subset suggesters
  124. • Why? Hook your query up to JS and query-as-they-type

    ! • Completion suggester (faster, newer, slightly cumbersome) • Prefix queries (slower, older, more reliable) ! • Both require mapping changes to work autocompletion
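     For the completion suggester, the mapping change is a field of type completion. A sketch for the suggest field queried on the next slide, assuming the books doctype used there; the book title is illustrative:
       curl -XPUT 'http://localhost:9200/test/_mapping/books' -d '{
         "properties": {
           "title":   { "type": "string" },
           "suggest": { "type": "completion" }
         }
       }'
       # Index a doc with suggestion inputs:
       curl -XPUT 'http://localhost:9200/test/books/1' -d '{
         "title": "Python Cookbook",
         "suggest": { "input": ["Python Cookbook", "Cookbook"] }
       }'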
  125. curl -X POST 'localhost:9200/test/books/_suggest?pretty' -d '{

       "title-suggest" : {
         "text" : "p",
         "completion" : { "field" : "suggest" }
       }
     }' suggester autocompletion
  126. curl -XGET -s 'http://localhost:9200/test/book/_search?pretty=true' -d \ '{ "query": { "prefix":

    { "title": "P" } } }' prefix autocompletion
  127. thank you @ErikRose erik@mozilla.com @lxt laura@mozilla.com