ElasticSearch {r}Evolution. Welcome.

ElasticSearch is quickly becoming one of the primary contenders in the search space: it is distributed, highly available, fast, RESTful, and ready to be plugged into Web applications. Its developers have been busy in the last year; this talk will do a quick introduction to ElasticSearch and cover some of the most interesting and exciting new features.

Andrei Zmievski

February 29, 2012

Transcript

  1. elasticsearch {r}evolution
    welcome.
    Andrei Zmievski • ConFoo • Feb 29, 2012
    Wednesday, February 29, 12

  2. who am i?
    curl http://localhost:9200/speaker/info/andrei
    {"name": "Andrei Zmievski",
    "works": "AppDynamics",
    "projects": ["PHP", "PHP-GTK", "Smarty", "Unicode/i18n"],
    "likes": ["coding", "beer", "brewing", "photography"],
    "twitter": "@a",
    "email": "[email protected]"}

  3. what is elasticsearch?
    a search engine for the NoSQL generation
    domain-driven
    document-oriented
    distributed
    RESTful
    Lucene-based engine

  4. what has happened?
    a year ago: version 0.15.0
    just released: 0.19.0RC3
    Continuous progress, lots of new features,
    improved stability, and more
    Increasing adoption, small and big companies
    No cloud hosting option yet, but maybe soon

  5. API conventions
    append ?pretty=true to get readable JSON
    boolean values: false/0/off = false, rest is true
    JSONP support via callback parameter
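A minimal Python sketch of these conventions. The helper names `parse_bool` and `with_params` are hypothetical, and treating the false values as case-sensitive exact matches is an assumption; the slide only lists `false`, `0`, and `off`.

```python
from urllib.parse import urlencode

# Values the slide lists as false; everything else counts as true.
FALSE_VALUES = {"false", "0", "off"}


def parse_bool(value: str) -> bool:
    """Interpret a request parameter the way the slide describes."""
    return value not in FALSE_VALUES


def with_params(url: str, pretty: bool = False, callback: str = None) -> str:
    """Append the pretty-print and JSONP callback parameters to a URL."""
    params = {}
    if pretty:
        params["pretty"] = "true"
    if callback:
        params["callback"] = callback
    return url + ("?" + urlencode(params) if params else "")


print(parse_bool("off"), parse_bool("yes"))
print(with_params("http://localhost:9200/_search", pretty=True, callback="cb"))
```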

  6. API structure
    http://host:port/[index]/[type]/[_action/id]
    GET http://es:9200/twitter/_status
    GET http://es:9200/twitter/tweet/1
    GET http://es:9200/twitter/tweet/_search
    GET http://es:9200/twitter/tweet,user/_search
    GET http://es:9200/twitter,facebook/_search
    GET http://es:9200/_search
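The URL pattern above can be sketched as a small builder. `es_url` is a hypothetical helper, not part of any client library; it only joins the optional index, type, and action/id segments, using commas for multiple indices or types as in the examples.

```python
def es_url(base, indices=(), types=(), action_or_id=None):
    """Build http://host:port/[index]/[type]/[_action/id] from optional parts."""
    parts = [base.rstrip("/")]
    if indices:
        parts.append(",".join(indices))   # several indices: twitter,facebook
    if types:
        parts.append(",".join(types))     # several types: tweet,user
    if action_or_id is not None:
        parts.append(str(action_or_id))   # e.g. _search, _status, or a doc id
    return "/".join(parts)


print(es_url("http://es:9200", ["twitter"], ["tweet", "user"], "_search"))
print(es_url("http://es:9200", ["twitter", "facebook"], action_or_id="_search"))
print(es_url("http://es:9200", action_or_id="_search"))
```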

  7. API query example
    {
      "query": {
        "filtered": {
          "query": {
            "query_string": {
              "query": "foo bar",
              "default_operator": "AND",
              "fields": ["title", "description"],
              "boost": 2.0
            }
          },
          "filter": {
            "range": {"date": {"gt": "2012-02-09"}}
          }
        }
      },
      "from": 10,
      "size": 10
    }
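The same request body can be assembled programmatically. This is a sketch only: `filtered_query` and its `page`/`page_size` parameters are hypothetical names, and the structure simply mirrors the JSON on the slide.

```python
import json


def filtered_query(text, fields, after_date, page=0, page_size=10):
    """Build the slide's filtered query: a query_string query plus a range filter."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "query_string": {
                        "query": text,
                        "default_operator": "AND",
                        "fields": fields,
                        "boost": 2.0,
                    }
                },
                # The filter narrows the result set without affecting scoring.
                "filter": {"range": {"date": {"gt": after_date}}},
            }
        },
        "from": page * page_size,  # offset of the first hit to return
        "size": page_size,         # number of hits per page
    }


body = filtered_query("foo bar", ["title", "description"], "2012-02-09", page=1)
print(json.dumps(body, indent=2))
```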

  8. 3 easy steps

  9. 1. index
    curl -XPOST http://localhost:9200/conf/speaker/1 -d '
    {
      "name": "Andrei Zmievski",
      "talk": "ElasticSearch Revolution. Welcome.",
      "likes": ["coding", "beer", "photography"],
      "twitter": "a",
      "height": 187
    }'
    request
    {
      "ok": true,
      "_index": "conf",
      "_type": "speaker",
      "_id": "1"
    }
    response
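For readers scripting this step, a rough Python equivalent of the curl call using only the standard library. `index_request` is a hypothetical helper; the request is built but not sent, and the `Content-Type` header is an assumption that the slide's curl command does not set.

```python
import json
from urllib.request import Request


def index_request(base, index, doc_type, doc_id, doc):
    """Prepare (but do not send) a POST that indexes one document."""
    url = f"{base}/{index}/{doc_type}/{doc_id}"
    data = json.dumps(doc).encode("utf-8")
    # Assumption: declaring the body as JSON; not shown on the slide.
    headers = {"Content-Type": "application/json"}
    return Request(url, data=data, method="POST", headers=headers)


speaker = {
    "name": "Andrei Zmievski",
    "talk": "ElasticSearch Revolution. Welcome.",
    "likes": ["coding", "beer", "photography"],
    "twitter": "a",
    "height": 187,
}
req = index_request("http://localhost:9200", "conf", "speaker", 1, speaker)
print(req.get_method(), req.full_url)
# With a node running locally, urllib.request.urlopen(req) would send it.
```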

  10. 2. search
    curl 'http://localhost:9200/conf/speaker/_search?q=beer'
    request
    {
      "took": 3,
      "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
      },
      "hits": {
        "total": 1,
        "max_score": 0.5908709,
        "hits": [
          {
            "_index": "conf",
            "_type": "speaker",
            "_id": "1",
            "_score": 0.5908709,
            "_source": {
              "name": "Andrei Zmievski",
              "lives": "San Francisco",
              "likes": ["coding", "beer", "photography"],
              "twitter": "a",
              "height": 187
            }
          }
        ]
      }
    }
    response
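A short sketch of reading such a response in Python. The `RESPONSE` string is a condensed copy of the slide's example, and `summarize` is a hypothetical helper that pulls out the execution time, the hit count, and each hit's index, type, id, score, and source document.

```python
import json

# Condensed copy of the example response from the slide.
RESPONSE = """
{"took": 3,
 "_shards": {"total": 1, "successful": 1, "failed": 0},
 "hits": {"total": 1, "max_score": 0.5908709,
          "hits": [{"_index": "conf", "_type": "speaker", "_id": "1",
                    "_score": 0.5908709,
                    "_source": {"name": "Andrei Zmievski",
                                "lives": "San Francisco",
                                "likes": ["coding", "beer", "photography"],
                                "twitter": "a", "height": 187}}]}}
"""


def summarize(response_text):
    """Extract the fields a caller usually needs from a search response."""
    resp = json.loads(response_text)
    hits = resp["hits"]
    return {
        "took_ms": resp["took"],   # execution time
        "total": hits["total"],    # total number of hits
        "docs": [
            (h["_index"], h["_type"], h["_id"], h["_score"], h["_source"])
            for h in hits["hits"]
        ],
    }


summary = summarize(RESPONSE)
print(summary["total"], summary["docs"][0][:4])
```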

  11. 2. search
    same request and response as slide 10
    highlighted: "total" inside "hits", the total number of hits

  12. 2. search
    same request and response as slide 10
    highlighted: "_index", the index of the doc

  13. 2. search
    same request and response as slide 10
    highlighted: "_type", the type of the doc

  14. 2. search
    same request and response as slide 10
    highlighted: "_id", the id of the doc

  15. 2. search
    same request and response as slide 10
    highlighted: "_score", the hit score

  16. 2. search
    same request and response as slide 10
    highlighted: "_source", the original doc contents

  17. 2. search
    same request and response as slide 10
    highlighted: "took", the execution time

  18. 3. profit
    that’s up to you

  19. demo

  20. distributed model
    built for performance and resiliency
    zero-conf discovery
    sharding/replication
    auto-routing
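Auto-routing can be pictured as hashing a routing value (the document id by default) and taking it modulo the number of primary shards. The sketch below illustrates that idea only: `shard_for` is a hypothetical name, and `zlib.crc32` is a stand-in hash, not the function Elasticsearch uses internally.

```python
import zlib


def shard_for(routing_key: str, num_primary_shards: int) -> int:
    """Pick a shard deterministically from the routing key.

    crc32 is a stand-in hash for illustration only.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# The same id always lands on the same shard, so a later GET by id
# can be sent straight to the shard that holds the document.
for doc_id in ["1", "2", "andrei"]:
    print(doc_id, "->", shard_for(doc_id, 5))
```

This also shows why the primary shard count is hard to change after indexing: a different modulus would send existing ids to different shards.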

  21. transactional model
    write consistency is per-document
    uses write-ahead transaction log
    1 second index refresh rate by default

  22. storage
    node data considered transient
    can be stored in local file system, JVM heap,
    native OS memory, or FS & memory combination
    gateway is a persistent storage mechanism
    local, shared FS, HDFS, S3

  23. mapping
    describes document structure
    automatically created with sensible defaults, but
    can be overridden per field
    many field types: string, integer/long, float/double,
    boolean, date, geo, array, object, and more

  24. sample mapping
    {"user":     "derick",
     "title":    "Don’t Panic",
     "tags":     ["profiling", "debugging", "php"],
     "postDate": "2010-12-22T17:14:12",
     "priority": 2}
    document
    {"post": {
      "properties": {
        "user":     {"type": "string", "index": "not_analyzed"},
        "message":  {"type": "string", "boost": 1.5},
        "tags":     {"type": "string", "include_in_all": "no"},
        "postDate": {"type": "date", "store": "no"},
        "priority": {"type": "integer"}
    }}}
    mapping

  25. sample mapping
    same document and mapping as slide 24
    the "priority" entry ({"type": "integer"}) is marked: not really needed

  26. analyzers
    break down (tokenize) and normalize fields during
    indexing and query strings at search time
    index:
       analysis:
           analyzer:
               eulang:
                   type:  custom
                   tokenizer:  standard
                   filter:  [standard,  lowercase,  stop,
                                     asciifolding,  porterStem]
    elasticsearch.yml

    "title":  {"type":  "string",  "analyzer":  "eulang"},

    mapping
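To make the tokenize-then-normalize idea concrete, here is a toy analyzer chain in Python. It is not the Lucene implementation: the stop-word list is a small assumed sample, ASCII folding is approximated with Unicode decomposition, and stemming is left out because the standard library has no Porter stemmer.

```python
import re
import unicodedata

# Assumed sample list; the real stop filter ships its own defaults.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is"}


def ascii_fold(token: str) -> str:
    """Approximate ASCII folding by stripping combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text: str) -> list:
    """Tokenize, lowercase, drop stop words, then fold accents."""
    tokens = re.findall(r"\w+", text)                      # tokenizer
    tokens = [t.lower() for t in tokens]                   # lowercase filter
    tokens = [t for t in tokens if t not in STOP_WORDS]    # stop filter
    return [ascii_fold(t) for t in tokens]                 # folding filter


print(analyze("The Café of Zürich"))
```

Running the same chain over both the indexed field and the query string is what lets a search for "cafe" match a document containing "Café".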

  27. filters
    share some similar features with queries
    apply to the result of the query
    why use a filter?

  28. filters
    faster than queries
    cached (depends on the filter)
    the cache is used for different queries against
    the same filter
    no scoring
    more useful ones: term, terms, range, prefix, and,
    or, not, exists, missing, query
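A toy model of why a cached filter pays off across different queries. Everything here is illustrative: `FilterCache`, `search`, and the sample `DOCS` are hypothetical, and a real engine caches per-segment bitsets rather than Python sets.

```python
# Hypothetical sample documents keyed by id.
DOCS = {
    1: {"user": "andrei", "tags": ["php", "search"]},
    2: {"user": "derick", "tags": ["php"]},
    3: {"user": "andrei", "tags": ["beer"]},
}


class FilterCache:
    """Caches the set of doc ids matching each term filter."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}
        self.misses = 0  # how often a filter had to be evaluated

    def term(self, field, value):
        key = ("term", field, value)
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = frozenset(
                doc_id for doc_id, doc in self.docs.items()
                if doc.get(field) == value
            )
        return self.cache[key]


def search(cache, query_ids, field, value):
    """Intersect a query's matches with a cached yes/no filter (no scoring)."""
    return sorted(set(query_ids) & cache.term(field, value))


cache = FilterCache(DOCS)
first = search(cache, [1, 2, 3], "user", "andrei")   # evaluates the filter
second = search(cache, [1, 2], "user", "andrei")     # reuses the cached set
print(first, second, cache.misses)
```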

  29. facets
    provide aggregated data based on the search
    request
    usual purpose is to offer faceted navigation or
    faceted search (eBay and more)
    facet types: terms, histogram, date histogram,
    range, statistical, and more
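What a terms facet computes can be sketched with a counter over the matching documents. This is an in-memory illustration, not the Elasticsearch API: `terms_facet` and the sample `hits` are hypothetical.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count how often each value of `field` occurs across the hits."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        # A field may hold one value or a list of values.
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return counts.most_common(size)


# Hypothetical documents returned by some search request.
hits = [
    {"tags": ["php", "search"]},
    {"tags": ["php"]},
    {"tags": ["beer", "php"]},
]
print(terms_facet(hits, "tags"))
```

A faceted navigation sidebar would render these value/count pairs as clickable refinements.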

  30. rivers
    pluggable service running within the cluster
    pulls in data from external sources and indexes it
    automatic failover
    current: Twitter, MongoDB, CouchDB, RabbitMQ,
    RSS, Wikipedia

  31. percolator
    turns searching on its head
    search: index docs and run queries for matches
    percolator: index queries and run docs for
    matches
    great feature for notification/triggers
    implementation
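The inversion can be shown with a toy percolator: queries are stored first, and each incoming document is tested against them. `ToyPercolator` is a hypothetical class, and a "query" here is just a set of required terms, far simpler than the real query DSL.

```python
class ToyPercolator:
    """Stores named queries and reports which ones a document matches."""

    def __init__(self):
        self.queries = {}

    def register(self, name, required_terms):
        """Index a query: a document matches if it contains every term."""
        self.queries[name] = {t.lower() for t in required_terms}

    def percolate(self, text):
        """Run one document against all stored queries."""
        tokens = set(text.lower().split())
        return sorted(
            name for name, terms in self.queries.items() if terms <= tokens
        )


p = ToyPercolator()
p.register("beer-alert", ["beer"])
p.register("php-search", ["php", "search"])
print(p.percolate("New beer and PHP talk"))
print(p.percolate("php search with beer"))
```

For notifications, each stored query would belong to a subscriber, and every matching name triggers an alert when a new document arrives.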

  32. index aliases
    each index name can have one or more aliases
    atomic renames allow on-the-fly index switching
    actual index: tweets{date}
    alias: tweets
    on update, create new index and switch alias
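The switch can be expressed as one `_aliases` request that removes the alias from the old index and adds it to the new one. The sketch below only builds the request body; `switch_alias_actions` and the dated index names are hypothetical.

```python
import json


def switch_alias_actions(alias, old_index, new_index):
    """Build an _aliases body that moves `alias` from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical index names following the tweets{date} pattern.
switch_body = switch_alias_actions("tweets", "tweets_2012_02_28", "tweets_2012_02_29")
print(json.dumps(switch_body, indent=2))
# The body would be POSTed to http://localhost:9200/_aliases
```

Because both actions travel in one request, clients searching `tweets` never see a moment when the alias points at nothing.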

  33. filtered index aliases
    allows creation of “views” into an index
    associates a filter with an alias
    curl -XPOST http://localhost:9200/_aliases -d '
    {
           "actions"  :  [
                   {
                           "add"  :  {
                                     "index"  :  "posts",
                                     "alias"  :  "posts_by_andrei",
                                     "filter"  :  {  "term"  :  {  "user"  :  "andrei"  }  }
                           }
                   }
           ]
    }'
    filtered alias

  34. parent/child docs
    _parent field in mapping
    establishes relationship between doc types, e.g.
    comment and post
    used with has_child and top_children queries
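A sketch of the two request bodies involved, using the comment/post example from the slide. The helper name `posts_with_matching_comments` and the `author` field are hypothetical; the bodies follow the `_parent` mapping and `has_child` query named above.

```python
import json

# Mapping for the child type: each comment points at a parent post.
comment_mapping = {"comment": {"_parent": {"type": "post"}}}


def posts_with_matching_comments(field, value):
    """Find parent posts that have at least one comment matching a term."""
    return {
        "query": {
            "has_child": {
                "type": "comment",
                "query": {"term": {field: value}},
            }
        }
    }


child_query = posts_with_matching_comments("author", "andrei")
print(json.dumps(comment_mapping))
print(json.dumps(child_query, indent=2))
```

The query runs against the parent type and returns posts, not comments.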

  35. geo search
    implemented as filters (and a facet)
    geo_distance
    geo_bounding_box
    geo_polygon
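To show what a distance filter decides, here is an in-memory version using the haversine formula. It is a stand-in for illustration: the function names, the `location` field layout, and the sample places are assumptions, and Elasticsearch may compute distances differently.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geo_distance_filter(docs, lat, lon, max_km):
    """Keep only the docs whose location lies within max_km of (lat, lon)."""
    return [
        d for d in docs
        if haversine_km(lat, lon, d["location"]["lat"], d["location"]["lon"]) <= max_km
    ]


# Hypothetical documents with a geo point each.
places = [
    {"name": "San Francisco", "location": {"lat": 37.7749, "lon": -122.4194}},
    {"name": "Montreal", "location": {"lat": 45.5017, "lon": -73.5673}},
]
nearby = geo_distance_filter(places, 37.8, -122.4, 50)
print([p["name"] for p in nearby])
```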

  36. plugins
    add custom functionality to ES
    written in Java
    installable from GitHub
    custom mapping types, scripting language
    support, custom discovery, admin tools, and more

  37. interfaces
    REST
    Java / Groovy
    clients/integration:
    Python, PHP, Ruby, Perl, Erlang, Django, Drupal,
    Symfony2, CouchDB, Flume,
    Flume sink implementation

  38. data import
    ES is not the primary data store (usually)
    to import/synchronize data:
    write an agent (Gearman, message queues, etc)
    use a river plugin (CouchDB, RabbitMQ, Twitter)

  39. References
    http://github.com/elasticsearch/elasticsearch
    https://groups.google.com/group/elasticsearch
    IRC: #elasticsearch on irc.freenode.net
    twitter: @elasticsearch
    Useful tutorials:
    Query DSL Explained
    ElasticSearch on EC2

  40. Merci!
    http://joind.in/5998