Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch

A talk I gave at Berlin Buzzwords 2014.

Boaz Leskes

May 26, 2014

Transcript

  1. Staying Ahead of Users and Time
    two use cases of scaling data with Elasticsearch
    Boaz Leskes
    @bleskes

  2. indices, types and other animals
    Basics

  3. A document
    {
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": {
        "name": "Boaz Leskes",
        "screen_name": "bleskes"
      }
    }

  4. A type
    = docs with similar data/structure
    {
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": {
        "name": "Boaz Leskes",
        "screen_name": "bleskes"
      }
    }
    {
      "created_at": "Thu Jan 23 18:27:23 +0000 2014",
      "id": 426420915698544640,
      "text": "Elasticsearch es una maravilla !!!!",
      "user": {
        "name": "Abel Coronado",
        "screen_name": "abxda"
      }
    }

  5. An index
    = a collection of types
    {

    "created_at": "Thu Jan 23
    "id": 426420915698544640,
    "text": "Elasticsearch Esc
    "user": {
    "name": "Abel Coronado
    "screen_name": "abxda"
    }
    }
    {

    "id": 19726002,
    "name": "Abel Coronado
    "screen_name": "abxda"
    "location":"Aguascalientes"
    "followers_count":871
    "friends_count":1794
    "listed_count":38
    }

  6. Sharding
    index

  7. Sharding
    index
    shard 3 shard 4
    shard 1 shard 2

  8. Sharding
    index
    shard 3 shard 4
    shard 1 shard 2
    node node

  9. Sharding
    index
    node node
    shard 3
    shard 1
    shard 4
    shard 2

  10. Sharding
    index
    node node
    shard 3
    shard 1
    shard 4
    shard 2
    node node
    copy 1
    copy 4
    copy 3
    copy 2

  11. Sharding
    node node
    copy 1 copy 3
    node node
    shard 1
    shard 3 copy 4
    copy 2 shard 4
    shard 2

  12. Sharding
    node node
    copy 1 copy 3
    node node
    node node node node
    shard 1
    shard 3 copy 4
    copy 2 shard 4
    shard 2

  13. Sharding
    node node
    copy 1 copy 3
    node node
    node node node node
    shard 1 shard 2
    shard 3 copy 4
    copy 2 shard 4

  14. Sharding - multiple indices
    node node
    shard 1 copy 2
    node node
    node node node node
    shard 1 shard 2
    shard 2 copy 1
    copy 2 copy 1

  15. Search
    node node
    shard 1 copy 2
    node node
    shard 1 shard 2
    shard 2 copy 1
    copy 2 copy 1
    # curl localhost:9200/index1,index2/_search?q=something
    any node

  20. Important fact for later
    indexing & searching are done on
    shards, not indices
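A minimal sketch of what "done on shards" means when indexing: each document is mapped to one primary shard by hashing a routing key (by default the document id) modulo the number of primary shards. The function name `pick_shard` and the use of `zlib.crc32` are assumptions made for illustration; Elasticsearch uses its own internal hash function.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key (by default the document id) to a primary shard number."""
    # zlib.crc32 is a stand-in hash chosen for this sketch only;
    # it is NOT the hash function Elasticsearch uses internally.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


# The two tweet ids from the earlier slides, routed across 4 primary shards.
shards = [pick_shard(doc_id, 4) for doc_id in ("426674590560305150", "426420915698544640")]
print(shards)
```

Because the shard count is the modulus, changing it would send existing ids to different shards, which is why the number of primary shards of an index is fixed at creation time.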

  21. time-based data
    To the subject at hand

  22. View Slide

  23. easy to get, easy to index
    # curl -XPUT localhost:9200/tweets/tweet/426674590560305150 -d '{
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk",
      "user": {
        "name": "Boaz Leskes",
        "screen_name": "bleskes"
      }
    }'

  24. View Slide

  25. Shard 1 Shard 2 Shard 3

  30. no problem, just use more shards
    shard 1
    shard 7
    shard 2
    shard 8
    shard 3
    shard 9
    shard 4
    shard 10
    shard 5 shard 6
    shard 11 shard 10

  31. no problem, just use more shards
    shard 1
    shard 7
    shard 2
    shard 8
    shard 3
    shard 9
    shard 4
    shard 10
    shard 5 shard 6
    shard 11 shard 10
    # curl localhost:9200/index/_search?q=something

  32. Reminds me of a tile at my aunt’s house
    Today is the tomorrow
    we were all afraid of
    yesterday…

  33. View Slide

  34. March
    1 2 3

  36. March
    1 2 3
    April
    1 2 3

  37. March
    1 2 3
    April
    1 2 3
    May
    1 2 3

  38. Cluster scales with time
    mar. 1
    april 1
    april 2
    mar. 1 april 1
    april 2
    may 1
    may 2
    may 2
    may 2 june 2

  39. Scopes searches
    mar. 1
    april 1
    april 2
    # curl localhost:9200/may/_search?q=something
    mar. 1 april 1
    april 2
    may 1
    may 2
    may 2
    may 2
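A rough sketch of how a client could scope a search to a time window when there is one index per month. The naming scheme `<prefix>YYYYMM` and the helper names `monthly_indices` and `search_path` are assumptions for illustration; the comma-separated index list in the URL is the same form used in the earlier `index1,index2/_search` example.

```python
from datetime import date


def monthly_indices(prefix, start, end):
    """List the monthly index names covering start..end, inclusive."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def search_path(prefix, start, end):
    """Build a search path that touches only the indices in the window."""
    return "/" + ",".join(monthly_indices(prefix, start, end)) + "/_search"


print(search_path("tweets_", date(2013, 11, 15), date(2014, 1, 24)))
```

Only the shards of the listed indices take part in the search, so the cost of a query follows the size of the window rather than the size of the whole cluster.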

  40. one little tweak…
    # curl -XPUT localhost:9200/tweets_201401/tweet/426674590560305150 -d '{
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": {
        "name": "Boaz Leskes",
        "screen_name": "bleskes"
      }
    }'
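The tweak can be sketched on the client side as: derive the target index from the tweet's own `created_at` field. The helper names `index_for_tweet` and `document_path` are hypothetical, and the sketch assumes the default C/English locale for parsing day and month names.

```python
from datetime import datetime

# Format of Twitter's created_at field, e.g. "Fri Jan 24 11:15:24 +0000 2014".
# %a and %b are locale dependent; this assumes the default C/English locale.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def index_for_tweet(tweet, prefix="tweets_"):
    """Return the monthly index name a tweet belongs to, e.g. tweets_201401."""
    created = datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT)
    return f"{prefix}{created:%Y%m}"


def document_path(tweet, doc_type="tweet"):
    """Return the path to PUT the tweet to: /<index>/<type>/<id>."""
    return f"/{index_for_tweet(tweet)}/{doc_type}/{tweet['id']}"


tweet = {"created_at": "Fri Jan 24 11:15:24 +0000 2014", "id": 426674590560305150}
print(document_path(tweet))
```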

  41. View Slide

  42. Another fact
    index is the basic unit of configuration

  43. index templates
    curl -XPUT localhost:9200/_template/twitter -d '{
      "template" : "twitter_*",
      "settings" : {
        "number_of_shards" : 4,
        "number_of_replicas" : 1
      }
    }'
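A small sketch of how a template is picked up: its settings apply to any newly created index whose name matches the `template` pattern. Here `fnmatchcase` only approximates the wildcard matching, template ordering is ignored, and the names `TEMPLATES` and `settings_for_new_index` are made up for this illustration.

```python
from fnmatch import fnmatchcase

# The template from the slide, keyed by template name.
TEMPLATES = {
    "twitter": {
        "template": "twitter_*",
        "settings": {"number_of_shards": 4, "number_of_replicas": 1},
    }
}


def settings_for_new_index(index_name, templates=TEMPLATES):
    """Collect the settings of every template whose pattern matches the name."""
    settings = {}
    for body in templates.values():
        # fnmatchcase is an approximation of the server-side wildcard match.
        if fnmatchcase(index_name, body["template"]):
            settings.update(body["settings"])
    return settings


print(settings_for_new_index("twitter_201406"))
print(settings_for_new_index("tweets_201401"))
```

Note that the pattern has to agree with the index naming scheme in use: a name starting with `tweets_` would not match `twitter_*` and would be created with default settings.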

  44. older data
    # elasticsearch.yml
    node.disk: spinning_disks
    curl -XPUT localhost:9200/twitter_2012*/_settings -d '{
      "index.routing.allocation.include.disk" : "spinning_disks",
      "index.routing.allocation.exclude.disk" : "ssd"
    }'

  45. older data
    curl -XPOST localhost:9200/twitter_201404/_optimize
    curl -XPOST localhost:9200/twitter_201304/_close
    curl -XDELETE localhost:9200/twitter_201204/
    pro tip: https://github.com/elasticsearch/curator
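The three commands above can be read as an age-based policy: optimize last month's index, close indices older than a year, delete those older than two. The sketch below only computes which request to send; the thresholds `close_after` and `delete_after` and the function names are assumptions for illustration, not settings from the talk or from curator.

```python
from datetime import date


def months_old(index_name, today):
    """Age in whole months of an index named <prefix>_YYYYMM."""
    stamp = index_name.rsplit("_", 1)[1]  # e.g. "201404"
    year, month = int(stamp[:4]), int(stamp[4:6])
    return (today.year - year) * 12 + (today.month - month)


def lifecycle_action(index_name, today, close_after=12, delete_after=24):
    """Return (HTTP method, path) for the maintenance step due, or None."""
    age = months_old(index_name, today)
    if age >= delete_after:
        return ("DELETE", f"/{index_name}/")
    if age >= close_after:
        return ("POST", f"/{index_name}/_close")
    if age >= 1:
        # The index no longer receives writes, so it can be merged down.
        return ("POST", f"/{index_name}/_optimize")
    return None


today = date(2014, 5, 26)
for name in ("twitter_201405", "twitter_201404", "twitter_201304", "twitter_201204"):
    print(name, lifecycle_action(name, today))
```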

  46. aliases
    curl -XPOST localhost:9200/_aliases -d '{
      "actions": [
        { "add":    { "index": "twitter_201311", "alias": "last_2_months" } },
        { "remove": { "index": "twitter_201309", "alias": "last_2_months" } }
      ]
    }'
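A minimal sketch of building that request body in code when the window rolls forward by one month. The `_aliases` endpoint takes a list of actions that are applied together; the helper name `roll_alias` is hypothetical.

```python
import json


def roll_alias(alias, add_index, remove_index):
    """Build an _aliases body that moves the alias window forward by one index."""
    return {
        "actions": [
            {"add": {"index": add_index, "alias": alias}},
            {"remove": {"index": remove_index, "alias": alias}},
        ]
    }


body = json.dumps(roll_alias("last_2_months", "twitter_201311", "twitter_201309"))
print(body)
```

Searches keep using the name `last_2_months` and never need to know which monthly indices are currently behind it.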

  47. Implications
    • Use indices to manage data as it scales over time

    • Use aliases to efficiently point your searches at
    the relevant shards

  48. One More Thing…
    Time is just a (strictly) monotonic function

    Primary keys are just as good

  49. scalable multi-tenancy
    Users' data

  50. Data

  51. Data
    Data
    Data
    Data
    Data
    Data
    Data
    Data
    Data
    Data
    Data
    Data

  52. Solution 1 - index per user
    users 1 to 12: one index each, with shard 1 and shard 2

  53. Solution 1 - index per user
    users 1 to 12: one index each, with shard 1 and shard 2
    node

  54. Solution 1 - index per user
    users 1 to 12: one index each, with shard 1 and shard 2
    node
    Overloaded

  55. Solution 2 - all users in one index
    shard 1 shard 1
    shard 2 shard 3
    user 1, user 2, user 3
    user 4, user 5, user 6
    user 7, user 9, user 9
    user 10, user 11, user 12
    user 1, user 2, user 3
    user 4, user 5, user 6
    user 7, user 9, user 9
    user 10, user 11, user 12
    user 1, user 2, user 3
    user 4, user 5, user 6
    user 7, user 9, user 9
    user 10, user 11, user 12
    user 1, user 2, user 3
    user 4, user 5, user 6
    user 7, user 9, user 9
    user 10, user 11, user 12

  56. Solution 2 - all users in one index
    shard 1 shard 1
    shard 2 shard 3
    user 1, user 2, user 3
    user 4, user 5, user 6
    user 7, user 9, user 9
    user 10, user 11, user 12
    user 1, user 2, user 3
    user 4, user 5, user 6
    user 7, user 9, user 9
    user 10, user 11, user 12
    user 1, user 2, user 3
    user 4, user 5, user 6
    user 7, user 9, user 9
    user 10, user 11, user 12
    user 1, user 2, user 3
    user 4, user 5, user 6
    user 7, user 9, user 9
    user 10, user 11, user 12
    limited horizon

  57. Solution 12 - both
    shard 1 shard 2 shard 1 shard 1 shard 2
    shard 3 shard 4
    user 1, user 2
    user 4, user 5
    user 1, user 2
    user 4, user 5
    shard 3 shard 4
    user 1, user 2
    user 4, user 5
    user 1, user 2
    user 4, user 5
    user 7, user 8
    user 9, user 0
    user 7, user 8
    user 9, user 0
    user 7, user 8
    user 9, user 0
    user 7, user 8
    user 9, user 0
    user 6
    shard 2
    user 6
    shard 3
    user 6
    shard 4
    user 6
    index 1 index 2 index 3

  58. Why is one index per user convenient?
    # curl -XGET localhost:9200/user_1/_search -d '{

    "query": {
    "match": {
    "body": "all the things"
    }
    }
    }'

  59. What do we want?
    • Have the simplicity of one index per user

    • Have the scalability of solution 12

  60. Aliases to the rescue
    curl -XPOST localhost:9200/_aliases -d '{
      "actions": [
        {
          "add": {
            "index": "users_group_1", "alias": "user_1",
            "filter": {
              "term": {
                "user": "user_1"
              }
            }
          }
        }
      ]
    }'
    # curl -XGET localhost:9200/user_1/_search -d '{


    }'
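A sketch of generating one filtered alias per user, so that each user still appears to have a private index while several users share a physical one. The function names and the `assignments` mapping (user name to the index that holds that user's data) are hypothetical; the field name `user` follows the term filter on the slide.

```python
import json


def user_alias_action(user, group_index, user_field="user"):
    """Alias named after the user, filtered to that user's documents only."""
    return {
        "add": {
            "index": group_index,
            "alias": user,
            "filter": {"term": {user_field: user}},
        }
    }


def aliases_body(assignments):
    """Build one _aliases request covering every user in the mapping."""
    return {
        "actions": [
            user_alias_action(user, index) for user, index in sorted(assignments.items())
        ]
    }


# Hypothetical placement: two users sharing one group index.
assignments = {"user_1": "users_group_1", "user_2": "users_group_1"}
print(json.dumps(aliases_body(assignments), indent=2))
```

Moving a large user to a dedicated index later then only requires updating that user's alias; the search request against `/user_1/_search` stays the same.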

  61. thank you!
    http://elasticsearch.com/support
    @elasticsearch , @bleskes
    http://elasticsearch.org/resources
