From A to JSON - an overview of Elasticsearch

Boaz Leskes
January 09, 2014

A talk I gave at the 010Dev meetup on January 9th, 2014.

It gives an overview of Elasticsearch, with extra attention to the new features in the 1.0 release.

Transcript

  1. From A to JSON: an overview of Elasticsearch

    Boaz Leskes, @bleskes
  2. Plug & Play

  3. Installation

    $ wget https://download.elasticsearch.org/...
    $ tar -xf elasticsearch-0.90.9.tar.gz
    $ ./elasticsearch-0.90.9/bin/elasticsearch -f
    ... [INFO ][node][Ghost Maker] {0.90.2}[5645]: initializing ...
  4. Index a document...

    $ curl -X PUT localhost:9200/products/product/1 -d '{
      "title" : "Welcome!"
    }'
  5. Update a document...

    $ curl -X PUT localhost:9200/products/product/1 -d '{
      "title" : "Welcome to the breakfast. Bon appétit!"
    }'
  6. Search for documents...

    $ curl -X GET localhost:9200/products/_search?q=welcome

  7. Add a node...

    $ ./elasticsearch-0.90.9/bin/elasticsearch -f -D es.node.name=Node2
    ...[cluster.service] [Node2] detected_master [Node1] ...
  8. Add another node...

    $ ./elasticsearch-0.90.9/bin/elasticsearch -f -D es.node.name=Node3
    ...[cluster.service] [Node3] detected_master [Node1] ...
  9. Documents as JSON

    Data structure with basic types, arrays and deep hierarchies

    {
      "id" : "abc123",
      "title" : "A JSON Document",
      "body" : "A JSON document is a ...",
      "published_on" : "2013/06/27 10:00:00",
      "featured" : true,
      "tags" : ["search", "json"],
      "author" : {
        "first_name" : "Clara",
        "last_name" : "Rice",
        "email" : "[email protected]"
      }
    }
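
To make the "deep hierarchies" point concrete, here is a minimal Python sketch that serializes the document above and lists its leaf fields by dotted path (the same `author.first_name` notation the query slides use). The `flatten` helper is a hypothetical illustration, not part of any Elasticsearch client, and the email value is copied as it appears in the transcript.

```python
import json

# The slide's document as a plain Python dict.
document = {
    "id": "abc123",
    "title": "A JSON Document",
    "body": "A JSON document is a ...",
    "published_on": "2013/06/27 10:00:00",
    "featured": True,
    "tags": ["search", "json"],
    "author": {
        "first_name": "Clara",
        "last_name": "Rice",
        "email": "[email protected]",  # kept exactly as shown in the transcript
    },
}


def flatten(value, prefix=""):
    """Yield (dotted_path, leaf_value) pairs; lists are treated as leaves."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}{key}.")
    else:
        yield prefix.rstrip("."), value


payload = json.dumps(document)               # what would travel as the request body
fields = dict(flatten(json.loads(payload)))  # round-trip, then flatten
print(sorted(fields))
```

Nested objects become paths such as `author.first_name`, while arrays such as `tags` stay multi-valued fields.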
  10. http:// Lingua Franca of APIs

    Also supported: Native Java protocol, Thrift, Memcached
  12. Until you know what to tweak...

  13. Search & Find

  14. Query string syntax

    Terms      apple
               apple iphone
    Phrases    "apple iphone"
    Proximity  "apple safari"~5
    Fuzzy      apple~0.8
    Wildcards  app*
               *pp*
    Boosting   apple^10 safari
    Range      [2011/05/01 TO 2011/05/31]
               [java TO json]
    Boolean    apple AND NOT iphone
               +apple -iphone
               (apple OR iphone) AND NOT review
    Fields     title:iphone^15 OR body:iphone
               published_on:[2011/05/01 TO "2011/05/27 10:00:00"]

    http://lucene.apache.org/core/4_5_0/queryparser...

    $ curl -X GET "http://localhost:9200/_search?q=<YOUR QUERY>"
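
Several of these operators (`+`, `:`, `^`, spaces, quotes) have a special meaning in URLs, so the query has to be percent-encoded before it goes into `?q=`. A small sketch using only the Python standard library; the `search_url` helper and its defaults are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(query, base="http://localhost:9200", index=None):
    """Build a `_search?q=...` URL, percent-encoding the query string."""
    path = f"/{index}/_search" if index else "/_search"
    return f"{base}{path}?{urlencode({'q': query})}"


url = search_url("+apple -iphone")
print(url)

# Decoding the URL gives back the original query, including the leading "+".
decoded = parse_qs(urlsplit(url).query)["q"][0]
print(decoded)
```

Without encoding, a literal `+` in a URL is read as a space, which would silently turn `+apple` (required term) into a plain `apple`.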
  15. JSON-based Query DSL

    curl -X GET localhost:9200/articles/_search -d '{
      "query" : {
        "filtered" : {
          "query" : {
            "bool" : {
              "must" : [
                {
                  "match" : {
                    "author.first_name" : {
                      "query" : "claire",
                      "fuzziness" : 0.1
                    }
                  }
                },
                {
                  "multi_match" : {
                    "query" : "elasticsearch",
                    "fields" : ["title^10", "body"]
                  }
                }
              ]
            }
          },
          "filter": {
            "and" : [
              { "terms" : { "tags" : ["search"] } },
              { "range" : { "published_on": {"from": "2013"} } },
              { "term" : { "featured" : true } }
            ]
          }
        }
      }
    }'
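
Because the Query DSL is plain JSON, request bodies can be assembled as ordinary data structures. The sketch below builds a body of this shape in Python; the function name and parameters are hypothetical, and the two `must` clauses are placed in a list because a Python dict (like a strict JSON object) cannot hold the same key twice.

```python
import json


def article_query(author, text, tag, since):
    """Assemble a `filtered` query: scored full-text part plus unscored filters."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"author.first_name": {"query": author, "fuzziness": 0.1}}},
                            {"multi_match": {"query": text, "fields": ["title^10", "body"]}},
                        ]
                    }
                },
                "filter": {
                    "and": [
                        {"terms": {"tags": [tag]}},
                        {"range": {"published_on": {"from": since}}},
                        {"term": {"featured": True}},
                    ]
                },
            }
        }
    }


body = json.dumps(article_query("claire", "elasticsearch", "search", "2013"), indent=2)
print(body)
```

The printed string is what would be passed to `curl -d`.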
  20. Full-text Search

    “Find all articles with ‘search’ in their title or body, give matches in titles higher score”

    Structured Search

    “Find all articles from year 2013 tagged ‘search’”

    Custom Scoring

    See custom_score and custom_filters_score queries
  21. How do users see search?

    Query, Results, Result

    How does a search engine work?

    Fetch document field ➝ Pick configured analyzer ➝ Parse text into tokens ➝ Apply token filters ➝ Store into index
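
The indexing pipeline above can be sketched in a few lines of Python. This is a toy model under loud assumptions: the regex tokenizer, the tiny stopword list, and the naive suffix-stripping `stem` function are stand-ins chosen for illustration and are not the analyzers Lucene or Elasticsearch actually ship.

```python
import re
from collections import defaultdict

STOPWORDS = {"a", "an", "the", "is", "to"}  # tiny illustrative list


def stem(token):
    """Very naive suffix stripping; real analyzers use proper stemmers."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Parse text into tokens, then apply token filters (lowercase, stop, stem)."""
    tokens = re.findall(r"\w+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def build_index(docs):
    """Store into index: map each term to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every analyzed query term."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "Jumping Jack Flash.", 2: "Jack jumped over the candlestick"}
index = build_index(docs)
print(analyze("Jumping Jack Flash."))
print(sorted(search(index, "jumps")))
```

The query goes through the same analysis as the documents, which is why a search for "jumps" can find both "Jumping" and "jumped".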
  22. Mapping

    curl -X PUT localhost:9200/articles/_mapping -d '{
      "article" : {
        "properties" : {
          "title" : {
            "type" : "string",
            "analyzer" : "english"
          }
        }
      }
    }'

    Configuring document properties for the search engine
  23. The _analyze API

    _analyze?pretty&format=text&text=jumping+jack+flash.
    [jumping:0->7:<ALPHANUM>] [jack:8->12:<ALPHANUM>] [flash:13->18:<ALPHANUM>]

    _analyze?pretty&format=text&text=jumping+jack+flash.&analyzer=english
    [jump:0->7:<ALPHANUM>] [jack:8->12:<ALPHANUM>] [flash:13->18:<ALPHANUM>]

    _analyze?text=...&tokenizer=X&filters=A,B,C
  24. Slice & Dice

  25. Query Facets

  26. “Tag Cloud” With the terms Facet

    curl -X POST 'localhost:9200/articles/_search?search_type=count&pretty' -d '{
      "facets": {
        "tag-cloud": {
          "terms" : {
            "field" : "tags"
          }
        }
      }
    }'

    "facets" : {
      "tag-cloud" : {
        "terms" : [
          { "term" : "ruby", "count" : 3 },
          { "term" : "java", "count" : 2 },
          ...
        ]
      }
    }

    Simplest “map/reduce” aggregation: document count per tag
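
Conceptually, a terms facet is a count of documents per distinct field value. A rough pure-Python equivalent is sketched below; the `terms_facet` function and the sample articles are invented for illustration and say nothing about how Elasticsearch computes the counts across shards.

```python
from collections import Counter


def terms_facet(docs, field, size=10):
    """Count documents per distinct value of `field`, most frequent first."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if not isinstance(values, list):
            values = [values]
        counts.update(set(values))  # a document counts once per term
    return [{"term": term, "count": count} for term, count in counts.most_common(size)]


# Hypothetical sample data.
articles = [
    {"tags": ["ruby", "search"]},
    {"tags": ["ruby", "java"]},
    {"tags": ["ruby"]},
    {"tags": ["java"]},
]
print(terms_facet(articles, "tags", size=2))
```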
  27. Statistics on Student Scores With the terms_stats Facet

    curl -X GET 'localhost:9200/scores/_search/?search_type=count&pretty' -d '{
      "facets": {
        "scores-per-subject" : {
          "terms_stats" : {
            "key_field" : "subject",
            "value_field" : "score"
          }
        }
      }
    }'

    "facets" : {
      "scores-per-subject" : {
        "_type" : "terms_stats",
        "missing" : 0,
        "terms" : [
          { "term" : "math", "count" : 4, "total_count" : 4, "min" : 25.0, "max" : 92.0, "total" : 267.0, "mean" : 66.75 },
          ...
        ]
      }
    }

    Aggregating statistics per subject
  28. Statistics on Student Scores With the terms_stats Facet

    curl -X GET 'localhost:9200/scores/_search/?search_type=count&pretty' -d '{
      "query" : {
        "match" : {
          "student" : "john"
        }
      },
      "facets": {
        "scores-per-subject" : {
          "terms_stats" : {
            "key_field" : "subject",
            "value_field" : "score"
          }
        }
      }
    }'

    "facets" : {
      "scores-per-subject" : {
        "_type" : "terms_stats",
        "missing" : 0,
        "terms" : [
          { "term" : "math", "count" : 1, "total_count" : 1, "min" : 85.0, "max" : 85.0, "total" : 85.0, "mean" : 85.0 },
          ...
        ]
      }
    }

    Realtime filtering with queries and filters
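
What the terms_stats facet computes, and how a query narrows it, can be emulated with a small group-by. In this sketch the `terms_stats` function, the `query` predicate, and the sample scores are all assumptions made up for illustration; the numbers were chosen only so that the math bucket resembles the slide.

```python
from collections import defaultdict


def terms_stats(docs, key_field, value_field, query=None):
    """Group docs by key_field and compute simple statistics on value_field.

    `query` is an optional predicate standing in for a search query.
    """
    groups = defaultdict(list)
    for doc in docs:
        if query is None or query(doc):
            groups[doc[key_field]].append(doc[value_field])
    return {
        term: {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "total": sum(values),
            "mean": sum(values) / len(values),
        }
        for term, values in groups.items()
    }


# Hypothetical sample data.
scores = [
    {"student": "john", "subject": "math", "score": 85.0},
    {"student": "mary", "subject": "math", "score": 92.0},
    {"student": "paul", "subject": "math", "score": 25.0},
    {"student": "anna", "subject": "math", "score": 65.0},
    {"student": "john", "subject": "physics", "score": 70.0},
]

everyone = terms_stats(scores, "subject", "score")
only_john = terms_stats(scores, "subject", "score", lambda d: d["student"] == "john")
print(everyone["math"])
print(only_john["math"])
```

The statistics are always computed over the documents that match the query, so changing the query changes the facet.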
  29. Facets

    Terms, Terms Stats, Statistical, Range, Histogram, Date Histogram, Filter, Query, Geo Distance
  30. Above & Beyond

  31. Above & Beyond

    Bulk operations (For indexing and search operations)
    Percolator (“reversed search”: alerts, classification, …)
    Suggesters (“Did you mean …?”)
    Index aliases (Grouping, filtering or “renaming” of indices)
    Index templates (Automatic index configuration)
    Monitoring API (Amount of memory used, number of operations, …)
    …
  32. Aggregations

  33. What’s wrong with facets?

    Nothing, it’s just that we want more…
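
One thing the "more" covers is nesting: bucketing documents by one field and then computing a further breakdown inside each bucket, which flat facets cannot express. A rough pure-Python emulation of such a two-level aggregation follows; the `avg_by` function and the sample data are hypothetical.

```python
from collections import defaultdict


def avg_by(docs, outer, inner, value):
    """Bucket by `outer`, then by `inner`, and average `value` in each leaf bucket."""
    buckets = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        buckets[doc[outer]][doc[inner]].append(doc[value])
    return {
        outer_key: {inner_key: sum(vals) / len(vals) for inner_key, vals in sub.items()}
        for outer_key, sub in buckets.items()
    }


# Hypothetical sample data.
scores = [
    {"subject": "math", "year": "2012", "score": 70.0},
    {"subject": "math", "year": "2013", "score": 80.0},
    {"subject": "math", "year": "2013", "score": 90.0},
    {"subject": "art", "year": "2013", "score": 60.0},
]
nested = avg_by(scores, "subject", "year", "score")
print(nested)
```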
  34. curl -X GET 'localhost:9200/scores/_search/?search_type=count&pretty' -d '{
      "query" : {
        "match" : {
          "student" : "john"
        }
      },
      "aggs": {
        "scores-per-subject" : {
          "terms" : { "field" : "subject" },
          "aggs" : {
            "avg_score" : {
              "avg" : { "field" : "score" }
            }
          }
        }
      }
    }'

    "aggregations" : {
      "scores-per-subject" : {
        "terms" : [
          { "term" : "math", "doc_count" : 1, "avg_score" : { "value": 85.0 } },
          ...
        ]
      }
    }
  35. curl -X GET 'localhost:9200/scores/_search/?search_type=count&pretty' -d '{
      "query" : {
        "match" : {
          "student" : "john"
        }
      },
      "aggs": {
        "scores-per-subject" : {
          "terms" : { "field" : "subject" },
          "aggs" : {
            "avg_score_by_year": {
              "date_histogram": { "field" : "date", "interval" : "year", "format" : "yyyy" },
              "aggs": {
                "avg_score" : {
                  "avg": { "field" : "score" }
                }
              }
            }
          }
        }
      }
    }'

    "aggregations" : {
      "scores-per-subject" : {
        "terms" : [
          {
            "term" : "math",
            "doc_count" : 1,
            "avg_score_by_year" : [
              { "key_as_string": "2013", "avg_score": { "value": 85.0 } }
              …
            ]
          },
          ...
        ]
      }
    }
  36. Distributed Percolation

  37. curl -XPUT "localhost:9200/twitter/.percolator/es-tweets" -d '{
      "query": {
        "match": { "body": "elasticsearch" }
      }
    }'

    curl -XGET "localhost:9200/twitter/_percolate" -d '{
      "doc": {
        "body": "#elasticsearch is awesome",
        "nick": "@imotov",
        "name": "Igor Motov",
        "date": "2013-11-03"
      }
    }'

    {
      …
      "matches": [
        { "_index": "twitter", "_id": "es-tweets" }
      ]
    }
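
Percolation inverts the usual flow: queries are stored up front, and each incoming document is checked against them. The toy sketch below illustrates only that idea; the `Percolator` class, the word-matching `match_query` helper, and the `solr-tweets` query are hypothetical and unrelated to how Elasticsearch implements or distributes percolation.

```python
def match_query(field, text):
    """Return a predicate: does `field` contain the word `text` (case-insensitive)?"""
    def predicate(doc):
        words = doc.get(field, "").lower().replace("#", " ").split()
        return text.lower() in words
    return predicate


class Percolator:
    """Stores named queries and reports which ones match a given document."""

    def __init__(self):
        self.queries = {}

    def register(self, query_id, predicate):
        self.queries[query_id] = predicate

    def percolate(self, doc):
        return [query_id for query_id, pred in self.queries.items() if pred(doc)]


percolator = Percolator()
percolator.register("es-tweets", match_query("body", "elasticsearch"))
percolator.register("solr-tweets", match_query("body", "solr"))  # hypothetical second query

matches = percolator.percolate({"body": "#elasticsearch is awesome", "nick": "@imotov"})
print(matches)
```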
  38. So what’s in distribution?

    • Highlighting
    • Sorting
    • Multi-Index support
    • Aggregations
    • Multi-Percolate

  39. Snapshot & Restore

  40. Backup, 0.90 style

    1. disable flush
    2. find all primary shard locations (optional)
    3. copy files from primary shards (rsync)
    4. enable flush
  41. Backup, 1.0 style

    curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_20140101"

  42. Register a repository

    curl -XPUT "localhost:9200/_snapshot/my_backup" -d '{
      "type": "fs",
      "settings": {
        "location":"/mnt/es-test-repo"
      }
    }'
  43. Creating a Snapshot

    curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_20140101" -d '{
      "indices":"+test_*,-test_4"
    }'
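
The `indices` expression combines wildcard includes (`+`) and excludes (`-`). The sketch below shows one plausible reading of such an expression, applied left to right with Python's `fnmatch`; `resolve_indices` and the index names are assumptions for illustration, not Elasticsearch's actual resolver.

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, existing):
    """Apply comma-separated +include / -exclude wildcard patterns in order."""
    selected = []
    for part in expression.split(","):
        part = part.strip()
        exclude = part.startswith("-")
        pattern = part.lstrip("+-")
        matches = [name for name in existing if fnmatchcase(name, pattern)]
        if exclude:
            selected = [name for name in selected if name not in matches]
        else:
            selected += [name for name in matches if name not in selected]
    return selected


# Hypothetical index names.
indices = ["test_1", "test_2", "test_3", "test_4", "prod_1"]
resolved = resolve_indices("+test_*,-test_4", indices)
print(resolved)
```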

  44. Restore, 0.90 style

    1. close the index (shutdown the cluster)
    2. find all existing index shards
    3. replace all index shards with data from backup
    4. open the index (start the cluster)
  45. Restore, 1.0 style

    curl -XPOST "localhost:9200/test_*/_close"
    curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_20140101/_restore" -d '{
      "indices":"test_*"
    }'
  46. Shard & Cluster

  47. Index A

    curl -XPUT 'http://localhost:9200/a/' -d '{
      "settings" : {
        "index" : {
          "number_of_shards" : 3,
          "number_of_replicas" : 1
        }
      }
    }'

    Index is partitioned into 3 primary shards, each is duplicated in 1 replica shard

    Primaries: A1 A2 A3
    Replicas:  A1' A2' A3'
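
The shard layout and the routing of a document to a primary can be sketched as follows. A document lands on the shard given by a hash of its routing value (by default its id) modulo the number of primaries; here `zlib.crc32` is a stand-in chosen for determinism and is explicitly not the hash function Elasticsearch uses. Both function names are hypothetical.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Pick a primary shard (0-based) by hashing the document id.

    zlib.crc32 is an illustrative stand-in, not Elasticsearch's hash.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


def shard_copies(index, number_of_shards, number_of_replicas):
    """List every shard copy: primaries first, then each replica set."""
    primaries = [f"{index}{n}" for n in range(1, number_of_shards + 1)]
    replicas = [p + "'" * r for r in range(1, number_of_replicas + 1) for p in primaries]
    return primaries + replicas


copies = shard_copies("A", 3, 1)
print(copies)
print(shard_for("abc123", 3))
```

Because the shard is derived from the id and the number of primaries, the same id always maps to the same shard as long as the number of primary shards stays fixed.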
  48. Demo

    1 node, 2 nodes, 3 nodes

    "index.routing.allocation.exclude.name" : "Node1"
    "cluster.routing.allocation.exclude.name" : "Node3"
    ...
  49. thanks!