ElasticSearch {r}Evolution. Welcome. [DPC12]

ElasticSearch is quickly becoming one of the primary contenders in the search space: it is distributed, highly available, fast, RESTful, and ready to be plugged into Web applications. Its developers have been busy in the last year; this talk will do a quick introduction to ElasticSearch and cover some of the most interesting and exciting new features. We might even take down a live server or two to illustrate a point.

Andrei Zmievski

June 08, 2012

Transcript

  1. elasticsearch {r}evolution
    welcome.
    Andrei Zmievski • DPC • June 8, 2012

  2. TRUTH

  3. who am i?
    curl http://localhost:9200/speaker/info/andrei
    {"name": "Andrei Zmievski",
    "works": "AppDynamics",
    "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
    "likes": ["coding", "beer", "brewing", "photography"],
    "twitter": "@a",
    "email": "[email protected]"}

  4. what is elasticsearch?
    a search engine for the NoSQL generation
    domain-driven
    document-oriented
    distributed
    RESTful
    Lucene-based engine

  5. what has happened?
    A year ago it was at 0.15.0
    Just released 0.19.4
    Continuous progress, lots of new features,
    improved stability, and more
    Increasing adoption, small and big companies
    No cloud hosting option yet, but maybe soon

  6. API conventions
    append ?pretty=true to get readable JSON
    boolean values: false/0/off = false, rest is true
    JSONP support via callback parameter
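The boolean convention above can be sketched as a small helper. This only illustrates the rule as the slide states it; `parse_bool_param` is a hypothetical name, not part of ElasticSearch or any client library.

```python
def parse_bool_param(value: str) -> bool:
    """Interpret a query-string flag the way the slide describes:
    "false", "0", and "off" mean False; anything else means True."""
    return value not in ("false", "0", "off")


for raw in ("false", "0", "off", "true", "1", "yes"):
    print(raw, "->", parse_bool_param(raw))
```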

  7. API structure
    http://host:port/[index]/[type]/[_action/id]
    GET http://es:9200/twitter/_status
    GET http://es:9200/twitter/tweet/1
    GET http://es:9200/twitter/tweet/_search
    GET http://es:9200/twitter/tweet,user/_search
    GET http://es:9200/twitter,facebook/_search
    GET http://es:9200/_search
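The URL pattern above can be sketched as a helper that joins the optional parts. `es_url` is a hypothetical function written for this illustration; it only builds strings and sends nothing.

```python
def es_url(host, index=None, doc_type=None, action_or_id=None, port=9200):
    """Build http://host:port/[index]/[type]/[_action/id].
    Lists of indices or types are joined with commas."""
    def join(part):
        return ",".join(part) if isinstance(part, (list, tuple)) else part

    segments = [join(p) for p in (index, doc_type, action_or_id) if p is not None]
    return "http://%s:%d/%s" % (host, port, "/".join(segments))


print(es_url("es", "twitter", "tweet", "1"))
print(es_url("es", "twitter", ["tweet", "user"], "_search"))
print(es_url("es", ["twitter", "facebook"], action_or_id="_search"))
print(es_url("es", action_or_id="_search"))
```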

  8. API query example
    {
      "query": {
        "filtered": {
          "query": {
            "query_string": {
              "query": "foo bar",
              "default_operator": "AND",
              "fields": ["title", "description"],
              "boost": 2.0
            }
          },
          "filter": {
            "range": {"date": {"gt": "2012-02-09"}}
          }
        }
      },
      "from": 10,
      "size": 10
    }
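A request body like the one above can also be assembled in code. The sketch below mirrors the slide's 0.19-era query DSL (later releases changed it); `filtered_query` and its `page` parameter are assumptions made for the example, and nothing is sent to a server.

```python
import json


def filtered_query(text, fields, after_date, page=1, size=10):
    """Build the slide's request body: a query_string query wrapped in a
    filtered query with a date range filter, plus pagination."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "query_string": {
                        "query": text,
                        "default_operator": "AND",
                        "fields": fields,
                        "boost": 2.0,
                    }
                },
                "filter": {"range": {"date": {"gt": after_date}}},
            }
        },
        "from": (page - 1) * size,  # zero-based offset of the first hit
        "size": size,
    }


body = filtered_query("foo bar", ["title", "description"], "2012-02-09", page=2)
print(json.dumps(body, indent=2))
```

With `page=2` and `size=10` the offset comes out as 10, matching the `from` value on the slide.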

  9. 3 easy steps

  10. 1. index
    curl -XPOST http://localhost:9200/conf/speaker/1 -d'
    {
      "name": "Andrei Zmievski",
      "talk": "ElasticSearch Revolution. Welcome.",
      "likes": ["coding", "beer", "photography"],
      "twitter": "a",
      "height": 187
    }'
    request
    {
      "ok": true,
      "_index": "conf",
      "_type": "speaker",
      "_id": "1"
    }
    response

  11. 2. search
    curl http://localhost:9200/conf/speaker/_search?q=beer
    request
    { "took": 3,
      "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.5908709,
        "hits": [ {
          "_index": "conf",
          "_type": "speaker",
          "_id": "1",
          "_score": 0.5908709,
          "_source": {
            "name": "Andrei Zmievski",
            "lives": "San Francisco",
            "likes": ["coding", "beer", "photography"],
            "twitter": "a",
            "height": 187
          } } ] } }
    response

  12-18. 2. search
    the same request and response as slide 11, annotated one field per slide:
    hits.total: total number of hits
    _index: the index of the doc
    _type: the type of the doc
    _id: the id of the doc
    _score: the hit score
    _source: the original doc contents
    took: the execution time
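The fields annotated on the search slides can be pulled out of a response with a few lines of code. This is a minimal sketch: the response is a trimmed copy of the one shown in the talk, and `summarize` is a hypothetical helper, not a client-library call.

```python
import json

# Response shaped like the one on the slides (values copied from the talk).
RESPONSE = json.loads("""
{"took": 3,
 "_shards": {"total": 1, "successful": 1, "failed": 0},
 "hits": {"total": 1, "max_score": 0.5908709,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "1",
                    "_score": 0.5908709,
                    "_source": {"name": "Andrei Zmievski",
                                "likes": ["coding", "beer", "photography"]}}]}}
""")


def summarize(response):
    """Return (total, took_ms, [(index, type, id, score, source), ...])."""
    hits = response["hits"]
    rows = [(h["_index"], h["_type"], h["_id"], h["_score"], h["_source"])
            for h in hits["hits"]]
    return hits["total"], response["took"], rows


total, took, rows = summarize(RESPONSE)
print("%d hit(s) in %d ms" % (total, took))
for index, doc_type, doc_id, score, source in rows:
    print("%s/%s/%s score=%.4f name=%s" % (index, doc_type, doc_id, score, source["name"]))
```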

  19. 3. profit
    that’s up to you

  20. demo

  21. distributed model
    built for performance and resiliency
    zero-conf discovery
    sharding/replication
    auto-routing

  22. replicas
    each shard can have 1 or more replicas
    # of replicas can be updated dynamically after
    index creation
    replicas can be used for querying in parallel

  23. shard allocation
    node 1
    start with a single node

  24. shard allocation
    PUT /person {
    "index": {
    "number_of_shards": 2,
    "number_of_replicas": 1
    }}
    node 1
    person1
    person2

  25. shard allocation
    node 1
    person1
    person2
    node 2
    person1
    person2
    start the second node

  26. shard allocation
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    start 2 more nodes

  28. document sharding
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    PUT /person/info/1
    { … }

  29. document sharding
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    hashed to shard 1
    PUT /person/info/1
    { … }

  30. document sharding
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    replicated
    PUT /person/info/1
    { … }

  31. document sharding
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    PUT /person/info/2
    { … }

  32. document sharding
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    hashed to shard 2
    PUT /person/info/2
    { … }

  33. document sharding
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    replicated
    PUT /person/info/2
    { … }
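The "hashed to shard N" step in the document sharding slides can be modeled as a hash of the document id taken modulo the number of shards. The sketch below is a toy: `pick_shard` is a hypothetical name, and `zlib.crc32` is an assumption standing in for whatever hash function the engine really uses, so the shard numbers it prints need not match a real cluster.

```python
import zlib


def pick_shard(doc_id: str, number_of_shards: int) -> int:
    """Map a document id to a shard number in [0, number_of_shards).
    crc32 is a stand-in hash chosen for this illustration."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


NUMBER_OF_SHARDS = 2  # matches the PUT /person settings shown earlier

for doc_id in ("1", "2", "3", "4"):
    shard = pick_shard(doc_id, NUMBER_OF_SHARDS)
    print("PUT /person/info/%s -> shard %d, then copied to its replica" % (doc_id, shard))
```

The point of the design is that the same id always lands on the same shard, so any node can route a request without a lookup table.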

  34. scatter-gather
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    GET /person/_search?q=name:thomas

  35. shard allocation
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    GET /person/_search?q=name:thomas

  36. shard allocation
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    GET /person/_search?q=name:thomas

  37. shard allocation
    node 1 node 2 node 3 node 4
    person1
    person2
    person1
    person2
    GET /person/_search?q=name:thomas
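The scatter-gather search pictured in these slides can be sketched in two steps: every shard returns its own best hits, and the coordinating node merges them. This is a toy model under stated assumptions: the in-memory "shards", the term-count score, and the function names are all invented for the illustration.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Scatter step: each shard scores its own docs and returns its top hits.
    The score here is a toy: how often the term appears in the name."""
    scored = [(doc["name"].lower().split().count(term), doc["id"])
              for doc in shard_docs]
    return heapq.nlargest(size, [s for s in scored if s[0] > 0])


def scatter_gather(shards, term, size=10):
    """Gather step: merge the per-shard top hits into one global top list."""
    partial = [hit for shard in shards for hit in search_shard(shard, term, size)]
    return heapq.nlargest(size, partial)


shards = [
    [{"id": "1", "name": "Thomas Mann"}, {"id": "3", "name": "Ada Lovelace"}],
    [{"id": "2", "name": "Thomas Thomas"}, {"id": "4", "name": "Alan Turing"}],
]
print(scatter_gather(shards, "thomas", size=2))  # [(2, '2'), (1, '1')]
```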

  38. transactional model
    write consistency is per-document
    uses write-ahead transaction log
    1 second index refresh rate by default

  39. storage
    node data considered transient
    can be stored in local file system, JVM heap,
    native OS memory, or FS & memory combination
    gateway is a persistent storage mechanism
    local, shared FS, HDFS, S3

  40. mapping
    describes document structure
    automatically created with sensible defaults, but
    can be overridden per field
    many field types: string, integer/long, float/double,
    boolean, date, geo, array, object, and more

  41. sample mapping
    {"user":     "derick",
     "title":    "Don’t Panic",
     "tags":     ["profiling", "debugging", "php"],
     "postDate": "2010-12-22T17:14:12",
     "priority": 2}
    document
    {"post": {
      "properties": {
        "user":     {"type": "string", "index": "not_analyzed"},
        "message":  {"type": "string", "boost": 1.5},
        "tags":     {"type": "string", "include_in_all": "no"},
        "postDate": {"type": "date", "store": "no"},
        "priority": {"type": "integer"}
    }}}
    mapping

  42. sample mapping
    same document and mapping as slide 41, with one annotation:
    "priority": {"type": "integer"}  not really needed
    mapping
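To see why an explicit entry can be unnecessary, here is a rough sketch of how a default mapping might be guessed from a document. It is a deliberate simplification and an assumption, not ElasticSearch's actual logic: `guess_field_type` is a hypothetical helper, and it skips date detection, so `postDate` comes out as a plain string here.

```python
def guess_field_type(value):
    """Very rough stand-in for dynamic mapping: pick a field type from a JSON value."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, list):   # arrays take the type of their elements
        return guess_field_type(value[0]) if value else None
    if isinstance(value, dict):
        return "object"
    return "string"


doc = {
    "user": "derick",
    "title": "Don't Panic",
    "tags": ["profiling", "debugging", "php"],
    "postDate": "2010-12-22T17:14:12",
    "priority": 2,
}
mapping = {"post": {"properties": {name: {"type": guess_field_type(v)}
                                   for name, v in doc.items()}}}
print(mapping)
```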

  43. analyzers
    break down (tokenize) and normalize fields during
    indexing and query strings at search time
    index:
      analysis:
        analyzer:
          eulang:
            type: custom
            tokenizer: standard
            filter: [standard, lowercase, stop,
                     asciifolding, porterStem]
    elasticsearch.yml

    "title": {"type": "string", "analyzer": "eulang"},

    mapping
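What an analyzer chain like `eulang` does to text can be approximated in a few lines. This is a toy sketch, not Lucene's implementation: the regex tokenizer and the tiny stop-word list are assumptions, and the stemming step is left out.

```python
import re
import unicodedata

STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}  # tiny sample list


def ascii_fold(token):
    """Strip accents: 'café' -> 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Tokenize, lowercase, drop stop words, fold accents (no stemming here)."""
    tokens = re.findall(r"\w+", text)
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [ascii_fold(t) for t in tokens]


print(analyze("The Caf\u00e9 of Z\u00fcrich is OPEN"))  # ['cafe', 'zurich', 'open']
```

Running the same function over both the indexed field and the query string is what lets a search for "cafe" match a document containing "Café".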

  44. filters
    share some similar features with queries
    apply to the result of the query
    why use a filter?

  45. filters
    faster than queries
    cached (depends on the filter)
    the cache is used for different queries against
    the same filter
    no scoring
    more useful ones: term, terms, range, prefix, and,
    or, not, exists, missing, query
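The caching point can be illustrated with a toy model in which a filter yields a set of matching document ids that is computed once and reused by later queries. The `FilterCache` class and its data are hypothetical and say nothing about how the engine stores its caches.

```python
class FilterCache:
    """Toy model: a filter yields a set of matching doc ids, computed once."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.computed = 0  # how many times a filter was actually evaluated

    def term_filter(self, field, value):
        key = (field, value)
        if key not in self.cache:
            self.computed += 1
            self.cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items() if doc.get(field) == value
            )
        return self.cache[key]

    def search(self, word, field, value):
        """Run a 'query' (word in title) restricted by a cached term filter."""
        allowed = self.term_filter(field, value)
        return sorted(d for d in allowed if word in self.docs[d]["title"].lower())


docs = {
    1: {"user": "andrei", "title": "Beer brewing"},
    2: {"user": "andrei", "title": "Coding PHP"},
    3: {"user": "derick", "title": "Beer and debugging"},
}
cache = FilterCache(docs)
print(cache.search("beer", "user", "andrei"))     # [1]
print(cache.search("php", "user", "andrei"))      # [2]
print("filter evaluations:", cache.computed)      # 1
```

Two different queries ran against the same filter, yet the filter was evaluated only once; no score is involved, only membership.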

  46. facets
    provide aggregated data based on the search
    request
    usual purpose is to offer a faceted navigation, or
    faceted search (eBay and more)
    facet types: terms, histogram, date histogram,
    range, statistical, and more
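Conceptually, a terms facet is a count of field values across the hits of a search. The sketch below shows that idea on in-memory data; `terms_facet` is a hypothetical helper and not the engine's API.

```python
from collections import Counter


def terms_facet(hits, field, size=10):
    """Count how often each value of `field` occurs among the hits."""
    counts = Counter()
    for doc in hits:
        values = doc.get(field, [])
        counts.update(values if isinstance(values, list) else [values])
    return counts.most_common(size)


hits = [
    {"tags": ["php", "debugging"]},
    {"tags": ["php", "profiling"]},
    {"tags": ["search"]},
]
print(terms_facet(hits, "tags"))
```

A faceted navigation sidebar would render each returned pair as a label and a count.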

  47. rivers
    pluggable service running within the cluster
    pulls in data from external sources and indexes it
    automatic failover
    current: Twitter, MongoDB, CouchDB, RabbitMQ,
    RSS, Wikipedia

  48. percolator
    turns searching on its head
    search: index docs and run queries for matches
    percolator: index queries and run docs for
    matches
    great feature for notification/triggers
    implementation
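The inversion described above can be sketched as a store of named queries that each incoming document is checked against. `ToyPercolator` is a hypothetical in-memory model; a "query" here is just a set of terms that must all appear, which is far simpler than the real query DSL.

```python
class ToyPercolator:
    """Store named queries; report which ones a new document matches."""

    def __init__(self):
        self.queries = {}

    def register(self, name, required_terms):
        """'Index' a query: here simply a set of terms that must all appear."""
        self.queries[name] = {t.lower() for t in required_terms}

    def percolate(self, text):
        """Run a document against every stored query."""
        words = set(text.lower().split())
        return sorted(name for name, terms in self.queries.items() if terms <= words)


p = ToyPercolator()
p.register("beer-alert", ["beer"])
p.register("php-search", ["php", "search"])
print(p.percolate("Free beer at the PHP conference"))  # ['beer-alert']
```

Each returned name could then trigger a notification for whoever registered that query.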

  49. index aliases
    each index name can have one or more aliases
    atomic renames allow on-the-fly index switching
    actual index: tweets{date}
    alias: tweets
    on update, create new index and switch alias
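Switching an alias from an old index to a new one can be expressed as a single `_aliases` request with a `remove` and an `add` action. The sketch only builds that request body; the dated index names are made up for the example.

```python
import json


def switch_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical index names following the tweets{date} pattern.
body = switch_alias_actions("tweets", "tweets20120607", "tweets20120608")
print(json.dumps(body, indent=2))
```

Because both actions travel in one request, clients that query `tweets` never see a moment where the alias points at nothing.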

  50. filtered index aliases
    allows creation of “views” into an index
    associates a filter with an alias
    curl -XPOST http://localhost:9200/_aliases -d'
    {
      "actions": [
        {
          "add": {
            "index": "posts",
            "alias": "posts_by_andrei",
            "filter": { "term": { "user": "andrei" } }
          }
        }
      ]
    }'
    filtered alias

  51. parent/child docs
    _parent field in mapping
    establishes relationship between doc types, e.g.
    comment and post
    used with has_child and top_children queries
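As a sketch of how the pieces fit together, the helpers below build a child-type mapping with a `_parent` field and a `has_child` query body for the comment/post example. The helper names and the `author` field are assumptions made for the illustration; nothing is sent to a server.

```python
import json


def parent_mapping(child_type, parent_type):
    """Mapping fragment that declares `parent_type` as the parent of `child_type`."""
    return {child_type: {"_parent": {"type": parent_type}}}


def has_child_query(child_type, field, value):
    """Find parent docs that have at least one child matching a term query."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"term": {field: value}},
            }
        }
    }


mapping = parent_mapping("comment", "post")
query = has_child_query("comment", "author", "andrei")  # "author" is a made-up field
print(json.dumps(mapping))
print(json.dumps(query))
```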

  52. geo search
    implemented as filters (and a facet)
    geo_distance
    geo_bounding_box
    geo_polygon
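What a distance-based geo filter does can be shown with a great-circle calculation over in-memory documents. This is a conceptual sketch only: the haversine formula, the `location` field layout, and the sample cities are assumptions, not a description of the engine's internals.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def geo_distance_filter(docs, center, max_km):
    """Keep the docs whose 'location' lies within max_km of center."""
    lat, lon = center
    return [d["name"] for d in docs
            if haversine_km(lat, lon, d["location"]["lat"], d["location"]["lon"]) <= max_km]


docs = [
    {"name": "Utrecht", "location": {"lat": 52.0907, "lon": 5.1214}},
    {"name": "San Francisco", "location": {"lat": 37.7749, "lon": -122.4194}},
]
amsterdam = (52.3702, 4.8952)
print(geo_distance_filter(docs, amsterdam, max_km=50))  # ['Utrecht']
```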

  53. plugins
    add custom functionality to ES
    written in Java
    installable from GitHub
    custom mapping types, scripting language
    support, custom discovery, admin tools, and more

  54. interfaces
    REST
    Java / Groovy
    clients/integration:
    Python, PHP, Ruby, Perl, Erlang, Django, Drupal,
    Symfony2, CouchDB, Flume,
    Flume sink implementation

  55. References
    http://github.com/elasticsearch/elasticsearch
    https://groups.google.com/group/elasticsearch
    IRC: #elasticsearch on irc.freenode.net
    twitter: @elasticsearch
    Useful tutorials:
    Query DSL Explained
    ElasticSearch on EC2

  56. Dank u wel!
    http://joind.in/6236
