ElasticSearch: The Missing Intro (Indexing and Querying) by Erik Rose

Elasticsearch provides an easy path to clusterable full-text search, with synonyms, faceting, and geographic math, but there's a paucity of written wisdom beyond its API docs. This talk, part 1 of a 2-part series, surveys its capabilities and shows how its internal data structures and algorithms work. With the groundwork laid, we explore how to choose efficient indexing and the right queries to make your apps go fast.

PyCon 2013

March 15, 2013

Transcript

  1. 2.
  2. 11.

    Inverted index (term → IDs of the documents containing it):

        row → 0,1,3
        boat → 0,1
        chicken → 2

    Documents:

        0: row row row your boat
        1: row the row boat
        2: chicken chicken chicken
        3: the front row
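
The structure on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene or Elasticsearch actually store postings; the names `DOCS`, `build_inverted_index`, and `search_all` are made up for the example, and tokenizing is reduced to splitting on whitespace.

```python
from collections import defaultdict

# The four toy documents from the slide, keyed by document ID.
DOCS = {
    0: "row row row your boat",
    1: "row the row boat",
    2: "chicken chicken chicken",
    3: "the front row",
}


def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.split():  # assumption: whitespace tokenizing only
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search_all(index, *terms):
    """Return the IDs of documents that contain every given term (AND)."""
    postings = [set(index.get(term, ())) for term in terms]
    return sorted(set.intersection(*postings)) if postings else []


INDEX = build_inverted_index(DOCS)
print(INDEX["row"])                      # [0, 1, 3]
print(search_all(INDEX, "row", "boat"))  # [0, 1]
```

Looking up a term is a single dictionary access, and a multi-term search is an intersection of the posting lists, which is why the lookup cost does not depend on scanning the documents themselves.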
  7. 16.

    Inverted index with positions (term → doc [positions]):

        row → 0 [0,1,2], 1 [0,2], 3 [2]
        boat → 0 [4], 1 [3]
        chicken → 2 [0,1,2]

    Documents:

        0: row row row your boat
        1: row the row boat
        2: chicken chicken chicken
        3: the front row
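
Storing positions is what makes phrase matching possible. The sketch below rebuilds the positional index from the slide and uses it for a naive phrase search; `build_positional_index` and `phrase_search` are hypothetical names for this example, and the matching loop is a simple illustration rather than the algorithm Lucene uses.

```python
from collections import defaultdict

# The four toy documents from the slide, keyed by document ID.
DOCS = {
    0: "row row row your boat",
    1: "row the row boat",
    2: "chicken chicken chicken",
    3: "the front row",
}


def build_positional_index(docs):
    """Map each term to {doc_id: [positions of the term in that doc]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(text.split()):
            index[term].setdefault(doc_id, []).append(position)
    return dict(index)


def phrase_search(index, phrase):
    """Return IDs of documents where the terms appear consecutively."""
    terms = phrase.split()
    if not terms or any(term not in index for term in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # The i-th term of the phrase must sit at position start + i.
            if all(start + i in index[term].get(doc_id, ())
                   for i, term in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)


POS_INDEX = build_positional_index(DOCS)
print(POS_INDEX["row"])                      # {0: [0, 1, 2], 1: [0, 2], 3: [2]}
print(phrase_search(POS_INDEX, "row boat"))  # [1]
```

Documents 0 and 1 both contain "row" and "boat", but only in document 1 are they adjacent, which the plain term → documents index cannot tell apart.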
  11. 20.
  12. 21.
  13. 22.

    stock analyzers

    original: Red-orange gerbils live at #43A Franklin St.

    whitespace: Red-orange, gerbils, live, at, #43A, Franklin, St.
    standard: red, orange, gerbils, live, 43a, franklin, st
    simple: red, orange, gerbils, live, at, a, franklin, st
    stop: red, orange, gerbils, live, franklin, st
    snowball: red, orang, gerbil, live, 43a, franklin, st

    - stopwords
    - stemming
    - punctuation
    - case-folding
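
To make the differences concrete, here is a rough Python imitation of three of the analyzers above. It reproduces the slide's output for this one sentence only; the regular expression handles ASCII letters, and `STOPWORDS` is a tiny illustrative subset, not the real stopword list. The function names are invented for the example.

```python
import re

TEXT = "Red-orange gerbils live at #43A Franklin St."

# Assumption: a small illustrative subset, not the analyzer's real list.
STOPWORDS = {"a", "an", "and", "at", "the"}


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Lowercase, then keep runs of letters (digits and punctuation split)."""
    return re.findall(r"[a-z]+", text.lower())


def stop_analyzer(text):
    """The simple analyzer followed by stopword removal."""
    return [term for term in simple_analyzer(text) if term not in STOPWORDS]


print(whitespace_analyzer(TEXT))
# ['Red-orange', 'gerbils', 'live', 'at', '#43A', 'Franklin', 'St.']
print(simple_analyzer(TEXT))
# ['red', 'orange', 'gerbils', 'live', 'at', 'a', 'franklin', 'st']
print(stop_analyzer(TEXT))
# ['red', 'orange', 'gerbils', 'live', 'franklin', 'st']
```

Note how "#43A" becomes the lone term "a" under the simple analyzer, and then disappears entirely once stopwords are dropped.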
  14. 23.

    curl -XGET -s 'http://localhost:9200/_analyze?analyzer=whitespace&pretty=true' -d 'Red-orange gerbils live at #43A Franklin St.'

    {
      "tokens" : [ {
        "token" : "Red-orange",
        "start_offset" : 0,
        "end_offset" : 10,
        "type" : "word",
        "position" : 1
      }, {
        "token" : "gerbils",
        "start_offset" : 11,
        "end_offset" : 18,
        "type" : "word",
        "position" : 2
      }, ...
  15. 25.

    'analysis': {
        'analyzer': {
            'name_analyzer': {
                'type': 'custom',
                'tokenizer': 'name_tokenizer',
                'filter': ['lowercase']
            }
        },
        'tokenizer': {
            'name_tokenizer': {
                'type': 'pattern',
                'pattern': "[^a-zA-Z']+"
            }
        }
    }

    name_analyzer: CharFilter → Tokenizer → Token Filter → terms
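
What this custom analyzer does to a name can be approximated in plain Python: the pattern describes the separators between tokens, and the lowercase filter runs on whatever is left. The function below is a stand-in written for this example (it is not an Elasticsearch API), and the sample name is invented.

```python
import re

# Same pattern as name_tokenizer: anything that is not a letter or an
# apostrophe separates tokens.
NAME_SEPARATORS = re.compile(r"[^a-zA-Z']+")


def name_analyzer(text):
    """Tokenizer step (split on separators), then the lowercase filter."""
    tokens = [token for token in NAME_SEPARATORS.split(text) if token]
    return [token.lower() for token in tokens]


print(name_analyzer("Mary-Jane O'Brien"))  # ['mary', 'jane', "o'brien"]
```

Keeping the apostrophe out of the separator set is what lets a name like O'Brien survive as a single term.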
  16. 26.

    synonyms

    "filter": {
        "synonym": {
            "type": "synonym",
            "synonyms": [
                "albert => albert, al",
                "allan => allan, al"
            ]
        }
    }

    original query: Allan Smith
    after synonyms: [allan, al] smith

    original query: Albert Smith
    after synonyms: [albert, al] smith
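
The effect of the two synonym rules can be mimicked with a small lookup table. This is a sketch of the expansion step only, assuming the tokens have already been lowercased; `SYNONYMS` and `expand_synonyms` are names invented for the example.

```python
# The two rules from the slide, written as "term => replacements".
SYNONYMS = {
    "albert": ["albert", "al"],
    "allan": ["allan", "al"],
}


def expand_synonyms(tokens):
    """Replace each token with the list of terms occupying its position."""
    return [SYNONYMS.get(token, [token]) for token in tokens]


print(expand_synonyms(["allan", "smith"]))   # [['allan', 'al'], ['smith']]
print(expand_synonyms(["albert", "smith"]))  # [['albert', 'al'], ['smith']]
```

Because both names expand to include "al", a search for either one also matches documents that only say "Al Smith", without making "allan" and "albert" match each other.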
  17. 27.
  18. 28.
  19. 29.

    {
      "bool" : {
        "must" : {
          "term" : { "user" : "fred" }
        },
        "must_not" : {
          "range" : {
            "age" : { "from" : 12, "to" : 21 }
          }
        },
        "should" : [
          { "term" : { "tag" : "crunchy" } },
          { "term" : { "tag" : "elasticsearch" } }
        ],
        "minimum_number_should_match" : 1,
        "boost" : 1.0
      }
    }
  20. 30.
  21. 31.

    filters
    - Boolean
    - Fast
    - Cacheable

    queries
    - Fuzzy, scoring
    - Slower
    - Not cacheable
  22. 32.

    curl -XGET -s 'http://localhost:9200/blog/_search?pretty=true' -d \
    '{
      "query": {
        "filtered": {
          "filter": {
            "term": { "category": "rants" }
          },
          "query": {
            "bool": {
              "should": [
                { "match_phrase": { "body": "fix your little red wagon" } },
                { "match": { "body": "fix your little red wagon" } }
              ]
            }
          }
        }
      }
    }'
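
The same request body can be assembled as a Python dictionary, which is handy when the category or search text comes from user input. This is a minimal sketch that only builds and prints the JSON; it sends nothing to a server, and `filtered_search_body` is a hypothetical helper, not part of any client library.

```python
import json


def filtered_search_body(category, text):
    """Build the slide's request: a cacheable term filter narrows the
    candidates, then a scored bool query ranks what is left."""
    return {
        "query": {
            "filtered": {
                "filter": {"term": {"category": category}},
                "query": {
                    "bool": {
                        "should": [
                            # Exact phrase matches score on top of ...
                            {"match_phrase": {"body": text}},
                            # ... documents matching the individual words.
                            {"match": {"body": text}},
                        ]
                    }
                },
            }
        }
    }


BODY = filtered_search_body("rants", "fix your little red wagon")
print(json.dumps(BODY, indent=2))
```

Putting the yes/no condition in the filter and the fuzzy, scored part in the query follows the split described on the previous slide.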
  23. 33.

    libraries

    - pyes
    - pyelasticsearch

        query = {'query': {
            'filtered': {
                'query': {
                    'match': {'name': 'test tester'}
                },
                'filter': {
                    'range': {
                        'age': {'from': 27, 'to': 37}
                    }}}}}
        es.search(query, index='people')

    - elasticutils

        print [item['title'] for item in
               searcher.query(title__text='cookie')
                       .filter(topics='websites')]
  24. 34.

    thank you

    twitter: ErikRose
    erik@mozilla.com

    Background image by Tim and Julie Wilson: https://secure.flickr.com/photos/secondtree/. This presentation is noncommercial sharealike in accordance with that image's license.

    Part 2: Sunday at 1:10pm