From A to JSON - an overview of Elasticsearch

Boaz Leskes
January 09, 2014

A talk I gave at the 010Dev meetup on January 9, 2014.

It gives an overview of Elasticsearch, with extra attention to the new features in the 1.0 release.

Transcript

  1. Boaz Leskes
    @bleskes
    From A to JSON
    an overview of Elasticsearch

  2. Plug & Play

  3. Installation
    $ wget https://download.elasticsearch.org/...
    $ tar -xf elasticsearch-0.90.9.tar.gz
    $ ./elasticsearch-0.90.9/bin/elasticsearch -f
    ... [INFO ][node][Ghost Maker] {0.90.2}[5645]: initializing ...

  4. Index a document...
    $ curl -X PUT localhost:9200/products/product/1 -d '{
    "title" : "Welcome!"
    }'

  5. Update a document...
    $ curl -X PUT localhost:9200/products/product/1 -d '{
    "title" : "Welcome to the breakfast. Bon appétit!"
    }'

  6. Search for documents....
    $ curl -X GET 'localhost:9200/products/_search?q=welcome'

  7. Add a node...
    $ ./elasticsearch-0.90.9/bin/elasticsearch -f -Des.node.name=Node2
    ...[cluster.service] [Node2] detected_master [Node1] ...

  8. Add another node...
    $ ./elasticsearch-0.90.9/bin/elasticsearch -f -Des.node.name=Node3
    ...[cluster.service] [Node3] detected_master [Node1] ...

  9. {
      "id" : "abc123",
      "title" : "A JSON Document",
      "body" : "A JSON document is a ...",
      "published_on" : "2013/06/27 10:00:00",
      "featured" : true,
      "tags" : ["search", "json"],
      "author" : {
        "first_name" : "Clara",
        "last_name" : "Rice",
        "email" : "[email protected]"
      }
    }
    Documents as JSON
    Data structure with basic types, arrays and deep hierarchies

  10. HTTP: the Lingua Franca of APIs
    Also supported: native Java protocol, Thrift, Memcached

  11. (image-only slide)

  12. Until you know what to tweak...

  13. Search & Find

  14. Terms       apple
                  apple iphone
      Phrases     "apple iphone"
      Proximity   "apple safari"~5
      Fuzzy       apple~0.8
      Wildcards   app*
                  *pp*
      Boosting    apple^10 safari
      Range       [2011/05/01 TO 2011/05/31]
                  [java TO json]
      Boolean     apple AND NOT iphone
                  +apple -iphone
                  (apple OR iphone) AND NOT review
      Fields      title:iphone^15 OR body:iphone
                  published_on:[2011/05/01 TO "2011/05/27 10:00:00"]
    http://lucene.apache.org/core/4_5_0/queryparser...
    $ curl -X GET "http://localhost:9200/_search?q="
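
Query strings like the ones above have to be URL-encoded before they go into the `q=` parameter. In a URL query string a bare `+` is conventionally decoded as a space, so `+apple -iphone` pasted in as-is would not reach the query parser intact. A minimal Python sketch using only the standard library; `search_url` is a hypothetical helper, not part of any Elasticsearch client:

```python
from urllib.parse import urlencode


def search_url(query, host="localhost:9200", index=None):
    """Build a URI-search URL with the query percent-encoded (hypothetical helper)."""
    path = f"/{index}/_search" if index else "/_search"
    # urlencode() escapes '+' as %2B (a bare '+' would be decoded as a space),
    # escapes ':', '^', '"' and brackets, and turns spaces into '+'.
    return f"http://{host}{path}?{urlencode({'q': query})}"


print(search_url("+apple -iphone"))
# http://localhost:9200/_search?q=%2Bapple+-iphone
print(search_url("title:iphone^15 OR body:iphone", index="articles"))
# http://localhost:9200/articles/_search?q=title%3Aiphone%5E15+OR+body%3Aiphone
```

From the shell, `curl -G --data-urlencode 'q=+apple -iphone' localhost:9200/_search` should achieve the same encoding.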

  15. curl -X GET localhost:9200/articles/_search -d '{
      "query" : {
        "filtered" : {
          "query" : {
            "bool" : {
              "must" : [
                { "match" : {
                    "author.first_name" : { "query" : "claire", "fuzziness" : 0.1 }
                } },
                { "multi_match" : {
                    "query" : "elasticsearch",
                    "fields" : ["title^10", "body"]
                } }
              ]
            }
          },
          "filter" : {
            "and" : [
              { "terms" : { "tags" : ["search"] } },
              { "range" : { "published_on" : { "from" : "2013" } } },
              { "term" : { "featured" : true } }
            ]
          }
        }
      }
    }'
    JSON-based Query DSL
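
When a query body is generated by a program rather than typed by hand, it is easier to keep it valid JSON. A JSON object should not repeat a key, so the two required clauses of the `bool` query belong in one `must` array. A Python sketch that builds the slide's query as plain dicts and serializes it with the standard `json` module; `article_query` and its parameters are hypothetical, and the `filtered` query and `and` filter are the 0.90/1.0-era syntax used on the slide:

```python
import json


def article_query(first_name, text, tag, year):
    """Build the filtered/bool query from the slide as a dict (hypothetical helper)."""
    return {
        "query": {
            "filtered": {
                "query": {
                    "bool": {
                        # A dict cannot hold two "must" keys, so both clauses share one list.
                        "must": [
                            {"match": {"author.first_name": {"query": first_name, "fuzziness": 0.1}}},
                            {"multi_match": {"query": text, "fields": ["title^10", "body"]}},
                        ]
                    }
                },
                # Structured part: exact conditions that do not affect scoring.
                "filter": {
                    "and": [
                        {"terms": {"tags": [tag]}},
                        {"range": {"published_on": {"from": year}}},
                        {"term": {"featured": True}},
                    ]
                },
            }
        }
    }


body = json.dumps(article_query("claire", "elasticsearch", "search", "2013"), indent=2)
print(body)  # paste into: curl -X GET localhost:9200/articles/_search -d '<body>'
```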

  16-19. (Same text as slide 15.)

  20. Full-text Search
      “Find all articles with ‘search’ in their title or body, giving
      matches in titles a higher score”
    Structured Search
      “Find all articles from 2013 tagged ‘search’”
    Custom Scoring
      See the custom_score and custom_filters_score queries

  21. How Users See Search
      Query ➝ Results
    How a Search Engine Works
      Fetch document field ➝ Pick configured analyzer ➝ Parse text
      into tokens ➝ Apply token filters ➝ Store into index
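
The indexing steps above (fetch a field, parse it into tokens, filter the tokens, store them) can be illustrated with a toy inverted index. This is an assumption-laden sketch, not how Lucene stores data: the regex tokenizer, the lowercase and stopword filters, and the tiny `STOPWORDS` list are hypothetical stand-ins for a configured analyzer. The point it shows is that query text goes through the same analysis as the documents.

```python
import re
from collections import defaultdict

# Hypothetical toy analyzer: a regex tokenizer followed by two token filters.
STOPWORDS = {"a", "an", "the", "to", "is"}


def tokenize(text):
    return re.findall(r"\w+", text)


def lowercase(tokens):
    return [t.lower() for t in tokens]


def drop_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def analyze(text):
    return drop_stopwords(lowercase(tokenize(text)))


def build_index(docs):
    """Map each term to the sorted list of ids of documents whose title contains it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in analyze(doc["title"]):  # fetch field -> analyze -> store
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Analyze the query the same way, then intersect the posting lists."""
    postings = [set(index.get(term, [])) for term in analyze(query)]
    return sorted(set.intersection(*postings)) if postings else []


docs = {1: {"title": "Welcome!"}, 2: {"title": "Welcome to the breakfast"}}
idx = build_index(docs)
print(search(idx, "WELCOME"))        # [1, 2]
print(search(idx, "the breakfast"))  # [2]
```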

  22. Mapping
    curl -X PUT localhost:9200/articles/article/_mapping -d '{
      "article" : {
        "properties" : {
          "title" : {
            "type" : "string",
            "analyzer" : "english"
          }
        }
      }
    }'
    Configuring document properties for the search engine

  23. The _analyze API
    _analyze?pretty&format=text&text=jumping+jack+flash.
    [jumping:0->7:]
    [jack:8->12:]
    [flash:13->18:]

    _analyze?pretty&format=text&text=jumping+jack+flash.&analyzer=english
    [jump:0->7:]
    [jack:8->12:]
    [flash:13->18:]

    _analyze?text=...&tokenizer=X&filters=A,B,C
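
The `_analyze` output pairs each term with the character offsets of the original text it came from. A small Python sketch reproduces that output format for the example above; `toy_stem` is a hypothetical one-rule stemmer standing in for the stemming done by the `english` analyzer, which is far more thorough. Stemming rewrites the term (`jumping` to `jump`) while the offsets still point at the original characters.

```python
import re


def tokens_with_offsets(text):
    """Split into word tokens, keeping (term, start, end) character offsets."""
    return [(m.group().lower(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def toy_stem(term):
    # Hypothetical single rule: strip a trailing "ing" from longer words.
    return term[:-3] if term.endswith("ing") and len(term) > 5 else term


def render(tokens):
    """Format tokens like the slide's format=text output."""
    return "".join(f"[{t}:{s}->{e}:]" for t, s, e in tokens)


raw = tokens_with_offsets("jumping jack flash.")
stemmed = [(toy_stem(t), s, e) for t, s, e in raw]
print(render(raw))      # [jumping:0->7:][jack:8->12:][flash:13->18:]
print(render(stemmed))  # [jump:0->7:][jack:8->12:][flash:13->18:]
```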

  24. Slice & Dice

  25. Query
    Facets

  26. “Tag Cloud” With the terms Facet
    curl -X POST 'localhost:9200/articles/_search?search_type=count&pretty' -d '{
      "facets" : {
        "tag-cloud" : {
          "terms" : {
            "field" : "tags"
          }
        }
      }
    }'

    "facets" : {
      "tag-cloud" : {
        "terms" : [ {
          "term" : "ruby",
          "count" : 3
        }, {
          "term" : "java",
          "count" : 2
        },
        ... ]
      }
    }
    Simplest “map/reduce” aggregation: document count per tag
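
The terms facet is essentially a group-by-and-count over one field. A Python sketch of the same idea over a handful of hypothetical articles (made up so that the top two counts match the slide, which does not show its data); `terms_facet` is a hypothetical helper:

```python
from collections import Counter

# Hypothetical articles; only the "tags" field matters here.
articles = [
    {"title": "Rails tips", "tags": ["ruby", "web"]},
    {"title": "JVM tuning", "tags": ["java"]},
    {"title": "Ruby and Java interop", "tags": ["ruby", "java"]},
    {"title": "Gem packaging", "tags": ["ruby"]},
]


def terms_facet(docs, field, size=10):
    """Count documents per distinct value of `field` (map: emit values, reduce: sum)."""
    counts = Counter()
    for doc in docs:
        counts.update(set(doc.get(field, [])))  # each document counts once per term
    return [{"term": t, "count": c} for t, c in counts.most_common(size)]


print(terms_facet(articles, "tags"))
# [{'term': 'ruby', 'count': 3}, {'term': 'java', 'count': 2}, {'term': 'web', 'count': 1}]
```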

  27. Statistics on Student Scores With the terms_stats Facet
    curl -X GET 'localhost:9200/scores/_search?search_type=count&pretty' -d '{
      "facets" : {
        "scores-per-subject" : {
          "terms_stats" : {
            "key_field" : "subject",
            "value_field" : "score"
          }
        }
      }
    }'

    "facets" : {
      "scores-per-subject" : {
        "_type" : "terms_stats",
        "missing" : 0,
        "terms" : [ {
          "term" : "math",
          "count" : 4,
          "total_count" : 4,
          "min" : 25.0,
          "max" : 92.0,
          "total" : 267.0,
          "mean" : 66.75
        }, ... ]
      }
    }
    Aggregating statistics per subject
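
terms_stats groups documents by one field and computes statistics over another. A Python sketch of that computation; the score documents are hypothetical, with the four math scores chosen so the math bucket reproduces the numbers on the slide, and `terms_stats` here is a hypothetical helper named after the facet:

```python
from collections import defaultdict

# Hypothetical score documents (the slide does not show its data).
scores = [
    {"student": "john", "subject": "math", "score": 85},
    {"student": "mary", "subject": "math", "score": 92},
    {"student": "peter", "subject": "math", "score": 25},
    {"student": "anna", "subject": "math", "score": 65},
    {"student": "john", "subject": "history", "score": 70},
]


def terms_stats(docs, key_field, value_field):
    """Group values by key and compute count/min/max/total/mean per group."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(float(doc[value_field]))
    return {
        key: {
            "count": len(vals),
            "min": min(vals),
            "max": max(vals),
            "total": sum(vals),
            "mean": sum(vals) / len(vals),
        }
        for key, vals in groups.items()
    }


print(terms_stats(scores, "subject", "score")["math"])
# {'count': 4, 'min': 25.0, 'max': 92.0, 'total': 267.0, 'mean': 66.75}
```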

  28. Statistics on Student Scores With the terms_stats Facet
    curl -X GET 'localhost:9200/scores/_search?search_type=count&pretty' -d '{
      "query" : {
        "match" : {
          "student" : "john"
        }
      },
      "facets" : {
        "scores-per-subject" : {
          "terms_stats" : {
            "key_field" : "subject",
            "value_field" : "score"
          }
        }
      }
    }'

    "facets" : {
      "scores-per-subject" : {
        "_type" : "terms_stats",
        "missing" : 0,
        "terms" : [ {
          "term" : "math",
          "count" : 1,
          "total_count" : 1,
          "min" : 85.0,
          "max" : 85.0,
          "total" : 85.0,
          "mean" : 85.0
        }, ... ]
      }
    }
    Realtime filtering with queries and filters

  29. Facets
    Terms
    Terms Stats
    Statistical
    Range
    Histogram
    Date Histogram
    Filter
    Query
    Geo Distance

  30. Above & Beyond

  31. Above & Beyond
    Bulk operations (For indexing and search operations)
    Percolator (“reversed search” — alerts, classification, …)
    Suggesters (“Did you mean …?”)
    Index aliases (Grouping, filtering or “renaming” of indices)
    Index templates (Automatic index configuration)
    Monitoring API (Amount of memory used, number of operations, …)

  32. Aggregations

  33. What’s wrong with facets?
    nothing
    it’s just that we want more…

  34. curl -X GET 'localhost:9200/scores/_search?search_type=count&pretty' -d '{
      "query" : {
        "match" : {
          "student" : "john"
        }
      },
      "aggs" : {
        "scores-per-subject" : {
          "terms" : {
            "field" : "subject"
          },
          "aggs" : {
            "avg_score" : {
              "avg" : {
                "field" : "score"
              }
            }
          }
        }
      }
    }'

    "aggregations" : {
      "scores-per-subject" : {
        "terms" : [ {
          "term" : "math",
          "doc_count" : 1,
          "avg_score" : {
            "value" : 85.0
          }
        }, ... ]
      }
    }
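
Aggregations nest: each bucket produced by `terms` runs its own `avg` sub-aggregation over just the documents in that bucket, and the whole tree runs only over documents matching the query. A Python sketch of that shape; the data and the `terms_with_avg` helper are hypothetical, and the exact-equality filter is a simplification of the full-text `match` query. The sketch orders buckets by document count, highest first, which I believe is also the terms aggregation's default.

```python
from collections import defaultdict

# Hypothetical score documents (not the slide's data).
scores = [
    {"student": "john", "subject": "math", "score": 85},
    {"student": "john", "subject": "history", "score": 70},
    {"student": "john", "subject": "history", "score": 90},
    {"student": "mary", "subject": "math", "score": 92},
]


def terms_with_avg(docs, query, key_field, value_field):
    """Filter docs, bucket them by key_field, then average value_field per bucket."""
    # Simplified query: every field in `query` must equal the given value.
    matching = [d for d in docs if all(d.get(f) == v for f, v in query.items())]
    buckets = defaultdict(list)
    for doc in matching:
        buckets[doc[key_field]].append(doc[value_field])
    return [
        {"term": key, "doc_count": len(vals), "avg_score": {"value": sum(vals) / len(vals)}}
        for key, vals in sorted(buckets.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    ]


print(terms_with_avg(scores, {"student": "john"}, "subject", "score"))
# [{'term': 'history', 'doc_count': 2, 'avg_score': {'value': 80.0}},
#  {'term': 'math', 'doc_count': 1, 'avg_score': {'value': 85.0}}]
```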

  35. curl -X GET 'localhost:9200/scores/_search?search_type=count&pretty' -d '{
      "query" : {
        "match" : {
          "student" : "john"
        }
      },
      "aggs" : {
        "scores-per-subject" : {
          "terms" : {
            "field" : "subject"
          },
          "aggs" : {
            "avg_score_by_year" : {
              "date_histogram" : {
                "field" : "date",
                "interval" : "year",
                "format" : "yyyy"
              },
              "aggs" : {
                "avg_score" : {
                  "avg" : {
                    "field" : "score"
                  }
                }
              }
            }
          }
        }
      }
    }'

    "aggregations" : {
      "scores-per-subject" : {
        "terms" : [ {
          "term" : "math",
          "doc_count" : 1,
          "avg_score_by_year" : [ {
            "key_as_string" : "2013",
            "avg_score" : {
              "value" : 85.0
            }
          }, ... ]
        }, ... ]
      }
    }
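
A `date_histogram` with `"interval" : "year"` buckets documents by calendar year, and `"format" : "yyyy"` controls how each bucket key is rendered in `key_as_string`. A Python sketch of the per-year average for one subject, with hypothetical data; `avg_by_year` is a hypothetical helper and ignores time zones:

```python
from collections import defaultdict
from datetime import date

# Hypothetical math scores for one student; "date" is when each score was recorded.
scores = [
    {"subject": "math", "date": date(2012, 6, 1), "score": 70},
    {"subject": "math", "date": date(2013, 1, 15), "score": 80},
    {"subject": "math", "date": date(2013, 6, 20), "score": 90},
]


def avg_by_year(docs, date_field, value_field):
    """Bucket docs by calendar year and average value_field inside each bucket."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc[date_field].year].append(doc[value_field])
    return [
        {"key_as_string": f"{year:04d}", "avg_score": {"value": sum(v) / len(v)}}
        for year, v in sorted(buckets.items())  # buckets in ascending key order
    ]


print(avg_by_year(scores, "date", "score"))
# [{'key_as_string': '2012', 'avg_score': {'value': 70.0}},
#  {'key_as_string': '2013', 'avg_score': {'value': 85.0}}]
```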

  36. Distributed Percolation

  37. curl -XPUT "localhost:9200/twitter/.percolator/es-tweets" -d '{
      "query" : {
        "match" : { "body" : "elasticsearch" }
      }
    }'
    $ curl -XGET "localhost:9200/twitter/_percolate" -d '{
      "doc" : {
        "body" : "#elasticsearch is awesome",
        "nick" : "@imotov",
        "name" : "Igor Motov",
        "date" : "2013-11-03"
      }
    }'
    {
      "matches" : [
        {
          "_index" : "twitter",
          "_id" : "es-tweets"
        }
      ]
    }
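
Percolation reverses the usual roles: queries are stored, and each incoming document is checked against them. A toy in-memory Python sketch; `register` and `percolate` are hypothetical names, matching is a crude lowercase word comparison, and a stored query matches when any of its terms appears in the field (mirroring the `match` query's default OR behavior):

```python
import re

# Hypothetical in-memory store of registered queries: id -> (field, query terms).
registered = {}


def words(text):
    return set(re.findall(r"\w+", str(text).lower()))


def register(query_id, field, text):
    """Store a match-style query against one field."""
    registered[query_id] = (field, words(text))


def percolate(doc):
    """Return ids of stored queries that match this document."""
    return [
        query_id
        for query_id, (field, terms) in registered.items()
        if terms & words(doc.get(field, ""))  # any shared term is a match
    ]


register("es-tweets", "body", "elasticsearch")
register("java-tweets", "body", "java")
print(percolate({"body": "#elasticsearch is awesome", "nick": "@imotov"}))  # ['es-tweets']
```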

  38. So what’s in distribution?
    •Highlighting
    •Sorting
    •Multi-Index support
    •Aggregations
    •Multi-Percolate

  39. Snapshot & Restore

  40. Backup, 0.90 style
    1. disable flush
    2. find all primary shard locations (optional)
    3. copy files from primary shards (rsync)
    4. enable flush

  41. Backup, 1.0 style
    curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_20140101"

  42. Register a repository
    curl -XPUT "localhost:9200/_snapshot/my_backup" -d '{
    "type": "fs",
    "settings": {
    "location":"/mnt/es-test-repo"
    }
    }'

  43. Creating a Snapshot
    curl -XPUT "localhost:9200/_snapshot/my_backup/snapshot_20140101" -d '{
      "indices" : "+test_*,-test_4"
    }'
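
The `indices` value `+test_*,-test_4` selects every index matching `test_*` and then drops `test_4`. A Python sketch of that resolution using `fnmatch`; `resolve_indices` is hypothetical, and applying the terms left to right with a bare term treated as an include is an assumption about the semantics, not taken from the slide:

```python
from fnmatch import fnmatchcase


def resolve_indices(expression, existing):
    """Resolve comma-separated +include/-exclude wildcard terms against index names."""
    selected = []
    for term in expression.split(","):
        term = term.strip()
        exclude = term.startswith("-")
        pattern = term.lstrip("+-")
        matched = [name for name in existing if fnmatchcase(name, pattern)]
        if exclude:
            selected = [name for name in selected if name not in matched]
        else:
            selected += [name for name in matched if name not in selected]
    return selected


indices = ["test_1", "test_2", "test_3", "test_4", "logs_2014"]
print(resolve_indices("+test_*,-test_4", indices))  # ['test_1', 'test_2', 'test_3']
```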

  44. Restore, 0.90 style
    1. close the index (or shut down the cluster)
    2. find all existing index shards
    3. replace all index shards with data from backup
    4. open the index (or start the cluster)

  45. Restore, 1.0 style
    curl -XPOST "localhost:9200/test_*/_close"
    curl -XPOST "localhost:9200/_snapshot/my_backup/snapshot_20140101/_restore" -d '{
      "indices" : "test_*"
    }'

  46. Shard & Cluster

  47. Index A
    curl -XPUT 'http://localhost:9200/a/' -d '{
      "settings" : {
        "index" : {
          "number_of_shards" : 3,
          "number_of_replicas" : 1
        }
      }
    }'

    The index is partitioned into 3 primary shards,
    each duplicated in 1 replica shard:
    Primaries:  A1   A2   A3
    Replicas:   A1'  A2'  A3'
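
With 3 primaries and 1 replica the index holds 6 shards in total. Each document is routed to a primary by hashing its routing value (the document id by default) modulo the number of primary shards, which is why `number_of_shards` is fixed once the index exists. A Python sketch; `zlib.crc32` is an assumed stand-in for Elasticsearch's own hash function, so the concrete shard numbers will not match a real cluster:

```python
import zlib

NUMBER_OF_SHARDS = 3    # primaries, as in the settings above
NUMBER_OF_REPLICAS = 1


def total_shards(primaries, replicas):
    """Each primary has `replicas` copies: primaries * (1 + replicas) shards in all."""
    return primaries * (1 + replicas)


def shard_for(doc_id, primaries=NUMBER_OF_SHARDS):
    """Pick a primary: hash(routing value) modulo the number of primaries (crc32 as stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % primaries


print(total_shards(NUMBER_OF_SHARDS, NUMBER_OF_REPLICAS))  # 6
for doc_id in ["1", "2", "abc123"]:
    print(doc_id, "-> A%d" % (shard_for(doc_id) + 1))
```

Changing `primaries` changes the modulus, so most existing documents would map to a different shard; that is the reasoning behind fixing the primary count at index creation.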

  48. Demo: 1 node, 2 nodes, 3 nodes
    "index.routing.allocation.exclude._name" : "Node1"
    "cluster.routing.allocation.exclude._name" : "Node3"
    ...
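
The demo shows shards spreading out as nodes join. One rule behind it: two copies of the same shard are never placed on the same node, so with a single node the replicas stay unassigned. A toy round-robin allocator in Python illustrates that rule and the effect of excluding a node; `allocate` is a hypothetical sketch, not Elasticsearch's allocation algorithm:

```python
def allocate(primaries, replicas, nodes, exclude=()):
    """Toy allocator: spread shard copies round-robin over eligible nodes,
    never putting two copies of one shard on the same node."""
    eligible = [n for n in nodes if n not in exclude]
    assignment = {node: [] for node in nodes}
    unassigned = []
    for shard in range(1, primaries + 1):
        copies = [f"A{shard}"] + [f"A{shard}'"] * replicas
        for copy_index, copy in enumerate(copies):
            if copy_index >= len(eligible):
                unassigned.append(copy)  # no distinct node left for this copy
                continue
            node = eligible[(shard - 1 + copy_index) % len(eligible)]
            assignment[node].append(copy)
    return assignment, unassigned


for n in (1, 2, 3):
    nodes = [f"Node{i}" for i in range(1, n + 1)]
    print(n, allocate(3, 1, nodes))
print(allocate(3, 1, ["Node1", "Node2", "Node3"], exclude=("Node1",)))
```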

  49. thanks!
