ElasticSearch: The Missing Intro (Indexing and Querying) by Erik Rose

Elasticsearch provides an easy path to clusterable full-text search, with synonyms, faceting, and geographic math, but there's a paucity of written wisdom beyond its API docs. This talk, part 1 of a 2-part series, surveys its capabilities and shows how its internal data structures and algorithms work. With the groundwork laid, we explore how to choose efficient indexing and the right queries to make your apps go fast.

PyCon 2013

March 15, 2013

Transcript

  1. Index:

    row → 0, 1, 3
    boat → 0, 1
    chicken → 2
    Documents: 0 "row row your boat", 1 "row the row boat", 2 "chicken chicken chicken", 3 "the front row"
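The slide's index maps each term to the IDs of the documents containing it. A toy sketch in Python, assuming lowercase whitespace tokenization and keeping every term (the slide shows only three); Lucene's real postings are compressed on-disk structures, not dicts. An AND query is answered by intersecting posting lists. `build_inverted_index` and `search_all` are hypothetical names.

```python
from collections import defaultdict

DOCS = ["row row your boat", "row the row boat", "chicken chicken chicken", "the front row"]

def build_inverted_index(docs):
    """Map each term to the sorted list of IDs of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

def search_all(index, query):
    """Return IDs of documents containing every query term (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)

index = build_inverted_index(DOCS)
print(index["row"], index["boat"], index["chicken"])  # [0, 1, 3] [0, 1] [2]
print(search_all(index, "row boat"))                   # [0, 1]
```

Answering the query touches only the posting lists for "row" and "boat", never the documents themselves.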
  6. Index entries as doc [positions]:

    row → 0 [0, 1, 2], 1 [0, 2], 3 [2]
    boat → 0 [4], 1 [3]
    chicken → 2 [0, 1, 2]
    Documents: 0 "row row row your boat", 1 "row the row boat", 2 "chicken chicken chicken", 3 "the front row"
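Storing positions lets the engine distinguish the phrase "row boat" from a document that merely contains both words, which is what a `match_phrase` query (slide 16) depends on. A toy sketch assuming lowercase whitespace tokens and Python dicts in place of Lucene's real postings format; `build_positional_index` and `phrase_search` are hypothetical names.

```python
from collections import defaultdict

DOCS = ["row row row your boat", "row the row boat", "chicken chicken chicken", "the front row"]

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]}, positions counted from 0 within each doc."""
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)

def phrase_search(index, phrase):
    """Return IDs of documents where the phrase's terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Only documents containing every term are candidates.
    candidates = set(index[terms[0]])
    for t in terms[1:]:
        candidates &= set(index[t])
    hits = []
    for doc_id in sorted(candidates):
        # Match if some start p has terms[i] at position p + i for every i.
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in index[terms[0]][doc_id]):
            hits.append(doc_id)
    return hits

index = build_positional_index(DOCS)
print(index["row"])                      # {0: [0, 1, 2], 1: [0, 2], 3: [2]}
print(phrase_search(index, "row boat"))  # [1]
```

Document 0 contains both words, but its last "row" is at position 2 and "boat" at 4, so only document 1 (row at 2, boat at 3) matches the phrase.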
  8. The same index and documents as slide 6, followed by a question mark: ?
  10. stock analyzers

    original: Red-orange gerbils live at #43A Franklin St.
    whitespace: Red-orange, gerbils, live, at, #43A, Franklin, St.
    standard: red, orange, gerbils, live, 43a, franklin, st
    simple: red, orange, gerbils, live, at, a, franklin, st
    stop: red, orange, gerbils, live, franklin, st
    snowball: red, orang, gerbil, live, 43a, franklin, st
    Features compared: stopwords, stemming, punctuation, case-folding
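Three of the stock analyzers can be approximated in a few lines, which makes their differences on the slide's sentence easy to see. This is an ASCII-only sketch, not Lucene's implementation: the stopword list is assumed to be roughly Lucene's default English set, and the standard and snowball analyzers are left out because they need Unicode word-boundary rules and a stemmer.

```python
import re

# Assumption: approximately Lucene's default English stopword list.
ENGLISH_STOPWORDS = frozenset(
    "a an and are as at be but by for if in into is it no not of on or "
    "such that the their then there these they this to was will with".split()
)

def whitespace_analyzer(text):
    """Split on whitespace only: case and punctuation are kept."""
    return text.split()

def simple_analyzer(text):
    """Lowercase, then keep runs of letters; digits and punctuation act as separators."""
    return re.findall(r"[a-z]+", text.lower())

def stop_analyzer(text):
    """The simple analyzer followed by stopword removal."""
    return [t for t in simple_analyzer(text) if t not in ENGLISH_STOPWORDS]

original = "Red-orange gerbils live at #43A Franklin St."
for analyzer in (whitespace_analyzer, simple_analyzer, stop_analyzer):
    print(analyzer.__name__, analyzer(original))
```

Note how the simple analyzer turns "#43A" into "a", since digits count as separators, and how the stop analyzer then drops both "at" and "a".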
  11. curl -XGET -s 'http://localhost:9200/_analyze?analyzer=whitespace&pretty=true' -d 'Red-orange gerbils live at #43A Franklin St.'

      {
        "tokens" : [ {
          "token" : "Red-orange",
          "start_offset" : 0,
          "end_offset" : 10,
          "type" : "word",
          "position" : 1
        }, {
          "token" : "gerbils",
          "start_offset" : 11,
          "end_offset" : 18,
          "type" : "word",
          "position" : 2
        }, ...
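The offsets in the response are character positions in the original text (end exclusive), which is what lets a highlighter map a term back to the exact span it came from. A sketch that reproduces the response's shape for whitespace tokenization, computed locally rather than by calling Elasticsearch; `whitespace_tokens` is a hypothetical helper, and positions count from 1 to match the slide's output.

```python
import re

def whitespace_tokens(text):
    """Build an _analyze-style token list for whitespace tokenization."""
    return {
        "tokens": [
            {
                "token": m.group(),
                "start_offset": m.start(),  # index of the token's first character
                "end_offset": m.end(),      # index just past its last character
                "type": "word",
                "position": position,
            }
            for position, m in enumerate(re.finditer(r"\S+", text), start=1)
        ]
    }

result = whitespace_tokens("Red-orange gerbils live at #43A Franklin St.")
print(result["tokens"][1])
# {'token': 'gerbils', 'start_offset': 11, 'end_offset': 18, 'type': 'word', 'position': 2}
```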
  12. 'analysis': {
          'analyzer': {
              'name_analyzer': {
                  'type': 'custom',
                  'tokenizer': 'name_tokenizer',
                  'filter': ['lowercase']
              }
          },
          'tokenizer': {
              'name_tokenizer': {
                  'type': 'pattern',
                  'pattern': "[^a-zA-Z']+"
              }
          }
      }

      name_analyzer: CharFilter → Tokenizer → Token Filter → terms
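An analyzer is a pipeline: char filters rewrite the raw text, the tokenizer splits it, and token filters transform each token into the terms that get indexed. A Python sketch of the slide's `name_analyzer` under that reading; the pattern tokenizer splits wherever its pattern matches, so apostrophes survive inside names. The sample input is hypothetical, and the pattern is assumed to behave the same in Python's `re` as in Java's regex engine (true for this simple character class).

```python
import re

# Same pattern as name_tokenizer: anything but ASCII letters and apostrophes separates tokens.
NAME_SEPARATOR = re.compile(r"[^a-zA-Z']+")

def name_analyzer(text):
    # Char filters: none are configured, so the text passes through unchanged.
    # Tokenizer: split on the pattern, dropping empty strings left at the edges.
    tokens = [t for t in NAME_SEPARATOR.split(text) if t]
    # Token filter: lowercase each token.
    return [t.lower() for t in tokens]

print(name_analyzer("O'Brien-Smith, Jr."))  # ["o'brien", 'smith', 'jr']
```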
  13. synonyms

      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms": [
            "albert => albert, al",
            "allan => allan, al"
          ]
        }
      }

      original query: Allan Smith → after synonyms: [allan, al] smith
      original query: Albert Smith → after synonyms: [albert, al] smith
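The slide's rules are one-way mappings: the left-hand name is replaced by every name on the right. A minimal sketch of that expansion, assuming lowercase letter-run tokenization ahead of the filter; `expand_synonyms` is a hypothetical helper. Each inner list stands for one token position, since a synonym filter emits the alternatives stacked at the same position.

```python
import re

# The slide's explicit mappings: each name expands to itself plus "al".
SYNONYMS = {"albert": ["albert", "al"], "allan": ["allan", "al"]}

def expand_synonyms(text):
    """Tokenize, then replace each token with its list of alternatives."""
    return [SYNONYMS.get(token, [token]) for token in re.findall(r"[a-z]+", text.lower())]

print(expand_synonyms("Allan Smith"))   # [['allan', 'al'], ['smith']]
print(expand_synonyms("Albert Smith"))  # [['albert', 'al'], ['smith']]
```

Because both names expand to "al", a query for either one can match a document that spells the name "Al".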
  14. {
        "bool" : {
          "must" : {
            "term" : { "user" : "fred" }
          },
          "must_not" : {
            "range" : {
              "age" : { "from" : 12, "to" : 21 }
            }
          },
          "should" : [
            { "term" : { "tag" : "crunchy" } },
            { "term" : { "tag" : "elasticsearch" } }
          ],
          "minimum_number_should_match" : 1,
          "boost" : 1.0
        }
      }
  15. filters              queries
      Boolean              Fuzzy, scoring
      Fast                 Slower
      Cacheable            Not cacheable
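Filters cache well because their answer for each document is a plain yes or no that does not depend on the query being scored, so the set of matching documents can be stored and reused (Elasticsearch keeps such results roughly as per-document bitsets). A stand-in sketch using `functools.lru_cache` over a frozenset of document IDs; the documents, `term_filter`, `score`, and `search` are hypothetical, and the word-count score is a toy, not Elasticsearch's relevance formula.

```python
from functools import lru_cache

# Hypothetical sample documents.
DOCS = (
    {"category": "rants", "body": "fix your little red wagon"},
    {"category": "recipes", "body": "red wagon wheel cookies"},
    {"category": "rants", "body": "little red corvette"},
)

@lru_cache(maxsize=None)
def term_filter(field, value):
    """Frozenset of IDs of docs whose field equals value.

    The answer does not depend on the query being scored, so it is computed
    once and then served from the cache.
    """
    return frozenset(i for i, doc in enumerate(DOCS) if doc.get(field) == value)

def score(doc, words):
    """Toy relevance: how often the query words occur in the body.

    This depends on the query text, so there is no stable result to cache.
    """
    body = doc["body"].split()
    return sum(body.count(w) for w in words)

def search(category, text):
    words = text.lower().split()
    scored = [(score(DOCS[i], words), i) for i in term_filter("category", category)]
    return [i for s, i in sorted(scored, key=lambda p: (-p[0], p[1])) if s > 0]

print(search("rants", "red wagon"))   # [0, 2]
print(search("rants", "wagon"))       # [0]
print(term_filter.cache_info().hits)  # 1
```

The second search scores from scratch but reuses the cached filter result, which is the "cacheable" row of the slide in miniature.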
  16. curl -XGET -s 'http://localhost:9200/blog/_search?pretty=true' -d '
      {
        "query": {
          "filtered": {
            "filter": { "term": { "category": "rants" } },
            "query": {
              "bool": {
                "should": [
                  { "match_phrase": { "body": "fix your little red wagon" } },
                  { "match": { "body": "fix your little red wagon" } }
                ]
              }
            }
          }
        }
      }'
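With both `match_phrase` and `match` under `should`, a body containing the exact phrase satisfies both clauses and tends to outrank one containing only some of the words, while the term filter narrows candidates to rants without affecting scores. A toy that models each satisfied `should` clause as one point; real Elasticsearch scoring uses term statistics, and the documents and function names are hypothetical.

```python
import re

DOCS = [
    {"category": "rants", "body": "Before you complain, fix your little red wagon first."},
    {"category": "rants", "body": "My little wagon is red and I will not fix it."},
    {"category": "recipes", "body": "Fix your little red wagon with this soup."},
    {"category": "rants", "body": "Nothing relevant here."},
]

def tokens(text):
    """Lowercase letter runs, a rough stand-in for the body field's analyzer."""
    return re.findall(r"[a-z]+", text.lower())

def has_phrase(body, phrase):
    """True if the phrase's tokens occur consecutively in body."""
    n = len(phrase)
    return any(body[i:i + n] == phrase for i in range(len(body) - n + 1))

def filtered_search(category, text):
    """Filter on category, then rank by how many should clauses match."""
    query = tokens(text)
    ranked = []
    for doc_id, doc in enumerate(DOCS):
        if doc["category"] != category:  # the filter: yes/no, adds nothing to the score
            continue
        body = tokens(doc["body"])
        # One point per satisfied should clause: match_phrase, then match (any query term).
        score = int(has_phrase(body, query)) + int(any(t in body for t in query))
        if score:
            ranked.append((score, doc_id))
    return [doc_id for score, doc_id in sorted(ranked, key=lambda r: (-r[0], r[1]))]

print(filtered_search("rants", "fix your little red wagon"))  # [0, 1]
```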
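  17. libraries ✚

      - pyes
      - pyelasticsearch

          from pyelasticsearch import ElasticSearch

          es = ElasticSearch('http://localhost:9200/')  # client for a local node
          query = {'query': {
              'filtered': {
                  'query': {'match': {'name': 'test tester'}},
                  'filter': {'range': {'age': {'from': 27, 'to': 37}}}}}}
          es.search(query, index='people')

      - elasticutils

          # searcher: an elasticutils search object, constructed elsewhere
          print([item['title'] for item in searcher.query(title__text='cookie')
                                                   .filter(topics='websites')])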
  18. Thank you

      Twitter: ErikRose
      [email protected]
      Background image by Tim and Julie Wilson: https://secure.flickr.com/photos/secondtree/. This presentation is noncommercial sharealike in accordance with that image's license.
      Part 2: Sunday at 1:10pm