Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch

A talk I gave at Berlin Buzzwords 2014

Boaz Leskes

May 26, 2014

Transcript

  1. Staying Ahead of Users and Time: two use cases of scaling data with Elasticsearch. Boaz Leskes @bleskes
  2. Basics: indices, types and other animals

  3. A document
    {
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": {
        "name": "Boaz Leskes",
        "screen_name": "bleskes"
      }
    }
  4. A type = docs with similar data/structure
    {
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": {
        "name": "Boaz Leskes",
        "screen_name": "bleskes"
      }
    }
    {
      "created_at": "Thu Jan 23 18:27:23 +0000 2014",
      "id": 426420915698544640,
      "text": "Elasticsearch es una maravilla !!!!",
      "user": {
        "name": "Abel Coronado",
        "screen_name": "abxda"
      }
    }
  5. An index = a collection of types (the first document is cut off on the slide)
    {
      "created_at": "Thu Jan 23 …",
      "id": 426420915698544640,
      "text": "Elasticsearch …",
      "user": {
        "name": "Abel Coronado",
        "screen_name": "abxda"
      }
    }
    {
      "id": 19726002,
      "name": "Abel Coronado",
      "screen_name": "abxda",
      "location": "Aguascalientes",
      "followers_count": 871,
      "friends_count": 1794,
      "listed_count": 38
    }
  6. Sharding [diagram: an index]
  7. Sharding [diagram: the index split into shards 1 to 4]
  8. Sharding [diagram: shards 1 to 4 and two nodes]
  9. Sharding [diagram: shards 1 to 4 placed on the two nodes]
  10. Sharding [diagram: shards 1 to 4 on two nodes, copies 1 to 4 on two more nodes]
  11. Sharding [diagram: four nodes, each holding a mix of shards and copies]
  12. Sharding [diagram: the same shards and copies, eight nodes]
  13. Sharding [diagram: eight nodes, shards 1 to 4 and copies 1 to 4 redistributed]
  14. Sharding - multiple indices [diagram: eight nodes; shard 1, shard 2, copy 1 and copy 2 each appear twice]
  15. Search [diagram: four nodes; shard 1, shard 2, copy 1 and copy 2 each appear twice; the request goes to "any node"]
    # curl localhost:9200/index1,index2/_search?q=something
  20. Important fact for later: indexing & searching is done on shards, not indices
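Because indexing and searching happen per shard, the cost of a request depends on how many shards it touches rather than on how many indices it names. The sketch below is a simplified model of that fan-out, not Elasticsearch code: `shards_touched` and the example layout are hypothetical names introduced for illustration, and replicas are ignored.

```python
from typing import Dict, List, Tuple


def shards_touched(layout: Dict[str, int], targets: List[str]) -> List[Tuple[str, int]]:
    """Return every (index, shard number) pair a search over `targets` fans out to.

    `layout` maps an index name to its number of primary shards.
    """
    return [
        (index, shard)
        for index in targets
        for shard in range(1, layout[index] + 1)
    ]


# Hypothetical cluster: two indices with two primary shards each.
layout = {"index1": 2, "index2": 2}

print(shards_touched(layout, ["index1", "index2"]))
print(len(shards_touched(layout, ["index1"])))
```

Searching both indices touches four shards, while searching one of them touches two; the later slides build on exactly this difference.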
  21. To the subject at hand: time based data

  23. easy to get, easy to index
    # curl -XPUT localhost:9200/tweets/tweet/426674590560305150 -d '{
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk",
      "user": {
        "name": "Boaz Leskes",
        "screen_name": "bleskes"
      }
    }'
  25. [diagram: Shard 1, Shard 2, Shard 3]


  30. no problem, just use more shards [diagram: twelve shards]
  31. no problem, just use more shards [diagram: twelve shards]
    # curl localhost:9200/index/_search?q=something
  32. Reminds me of a tile at my aunt's house: "Today is the tomorrow we were all afraid of yesterday…"
  34. [diagram: March (1, 2, 3)]
  35. [diagram: March (1, 2, 3)]
  36. [diagram: March (1, 2, 3), April (1, 2, 3)]
  37. [diagram: March (1, 2, 3), April (1, 2, 3), May (1, 2, 3)]
  38. Cluster scales with time [diagram: shards labelled mar. 1, april 1, april 2, may 1, may 2 and june 2]
  39. Scopes searches [diagram: shards labelled mar. 1, april 1, april 2, may 1 and may 2]
    # curl localhost:9200/may/_search?q=something
  40. one little tweak…
    # curl -XPUT localhost:9200/tweets_201401/tweet/426674590560305150 -d '{
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": {
        "name": "Boaz Leskes",
        "screen_name": "bleskes"
      }
    }'
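The tweak in slide 40 is that the index name carries the month of the document. A minimal sketch of how a writer might derive that name, assuming Twitter-style `created_at` strings and the `tweets_YYYYMM` pattern shown on the slide; `monthly_index` is a hypothetical helper, not part of any Elasticsearch client.

```python
from datetime import datetime


def monthly_index(created_at: str, prefix: str = "tweets") -> str:
    """Map a tweet's created_at string to a per-month index name."""
    # Twitter-style timestamp, e.g. "Fri Jan 24 11:15:24 +0000 2014".
    # %a and %b expect English day and month names (the default C locale).
    ts = datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    return f"{prefix}_{ts:%Y%m}"


print(monthly_index("Fri Jan 24 11:15:24 +0000 2014"))  # tweets_201401
```

Deriving the name from the document itself, rather than from the wall clock at indexing time, keeps late-arriving documents in the month they belong to.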
  42. Another fact: the index is the basic unit of configuration

  43. index templates
    curl -XPUT localhost:9200/_template/twitter -d '{
      "template": "twitter_*",
      "settings": {
        "number_of_shards": 4,
        "number_of_replicas": 1
      }
    }'
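A template is applied to a newly created index when the index name matches the template's pattern. The sketch below approximates that matching with Python's `fnmatchcase`. This is an assumption for illustration only (Elasticsearch does its own wildcard matching), and `matching_templates` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def matching_templates(index_name: str, templates: Dict[str, str]) -> List[str]:
    """Return the names of templates whose pattern matches `index_name`."""
    return [name for name, pattern in templates.items() if fnmatchcase(index_name, pattern)]


# Template name -> pattern, taken from the slide.
templates = {"twitter": "twitter_*"}

print(matching_templates("twitter_201401", templates))  # ['twitter']
print(matching_templates("tweets_201401", templates))   # []
```

The second call shows that an index named `tweets_201401` would not pick up a `twitter_*` template, so the index prefix and the template pattern have to agree.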
  44. older data
    # elasticsearch.yml
    node.disk: spinning_disks
    curl -XPUT 'localhost:9200/twitter_2012*/_settings' -d '{
      "index.routing.allocation.include.disk": "spinning_disks",
      "index.routing.allocation.exclude.disk": "ssd"
    }'
  45. older data
    curl -XPOST localhost:9200/twitter_201404/_optimize
    curl -XPOST localhost:9200/twitter_201304/_close
    curl -XDELETE localhost:9200/twitter_201204/
    pro tip: https://github.com/elasticsearch/curator
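The three commands treat an index differently depending on its age: last month's index is optimized, a year-old one is closed, a two-year-old one is deleted. A minimal sketch of that decision, assuming monthly `twitter_YYYYMM` names; the age thresholds (1, 12 and 24 months) are assumptions picked to reproduce the slide's examples as seen from May 2014, and `lifecycle_action` is a hypothetical helper, not the curator tool.

```python
from typing import Optional


def months_between(index_month: str, now_month: str) -> int:
    """Age in whole months between two YYYYMM strings."""
    iy, im = int(index_month[:4]), int(index_month[4:])
    ny, nm = int(now_month[:4]), int(now_month[4:])
    return (ny - iy) * 12 + (nm - im)


def lifecycle_action(index_name: str, now_month: str) -> Optional[str]:
    """Pick a maintenance action for a monthly index such as twitter_201404."""
    age = months_between(index_name.rsplit("_", 1)[1], now_month)
    # Thresholds are assumptions chosen to match the slide's three examples.
    if age >= 24:
        return "delete"
    if age >= 12:
        return "close"
    if age >= 1:
        return "optimize"
    return None  # current month: still being written to


for name in ("twitter_201405", "twitter_201404", "twitter_201304", "twitter_201204"):
    print(name, lifecycle_action(name, "201405"))
```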
  46. aliases
    curl -XPOST localhost:9200/_aliases -d '{
      "actions": [
        { "add": { "index": "twitter_201311", "alias": "last_2_months" } },
        { "remove": { "index": "twitter_201309", "alias": "last_2_months" } }
      ]
    }'
  47. Implications
    • Use indices to manage data as it scales over time
    • Use aliases to efficiently point your searches at the relevant shards
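Keeping an alias such as `last_2_months` pointed at the right indices means adding the newest monthly index and removing the one that just fell out of the window. A minimal sketch that builds such a request body, assuming `twitter_YYYYMM` index names; `roll_alias` and `month_offset` are hypothetical helpers, and the body is only printed, not sent. The actions are emitted as a list, which is the shape the aliases API expects.

```python
import json
from typing import Dict, List


def month_offset(year: int, month: int, back: int) -> str:
    """Return the YYYYMM string `back` months before (year, month)."""
    total = year * 12 + (month - 1) - back
    return f"{total // 12:04d}{total % 12 + 1:02d}"


def roll_alias(year: int, month: int, alias: str = "last_2_months",
               prefix: str = "twitter", window: int = 2) -> Dict[str, List[dict]]:
    """Build an _aliases request body that slides `alias` forward to (year, month)."""
    newest = f"{prefix}_{month_offset(year, month, 0)}"
    expired = f"{prefix}_{month_offset(year, month, window)}"
    return {
        "actions": [
            {"add": {"index": newest, "alias": alias}},
            {"remove": {"index": expired, "alias": alias}},
        ]
    }


print(json.dumps(roll_alias(2013, 11), indent=2))
```

For November 2013 this reproduces the slide's example: `twitter_201311` is added and `twitter_201309` is removed. Searches against `last_2_months` then only reach the shards of the two most recent indices.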
  48. One More Thing... Time is just a (strictly) monotonic function. Primary keys are just as good.
  49. Users data: scalable multi-tenancy

  50. [diagram: Data]
  51. [diagram: Data, repeated twelve times]
  52. Solution 1 - index per user [diagram: users 1 to 12, each with its own shard 1 and shard 2]
  53. Solution 1 - index per user [diagram: the same 24 shards on a node]
  54. Solution 1 - index per user [diagram: the same 24 shards on a node, labelled "Overloaded"]
  55. Solution 2 - all users in one index [diagram: four shards, each holding data from users 1 to 12]
  56. Solution 2 - all users in one index [diagram: four shards, each holding data from users 1 to 12, labelled "limited horizon"]
  57. Solution 12 - both [diagram: index 1, index 2 and index 3 with shards 1 to 4; users 1, 2, 4 and 5 share one set of shards, users 7, 8, 9 and 0 share another, and user 6 has shards of its own]
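One reading of the combined layout is that small users share a group index while a large user gets an index of its own. The sketch below illustrates that reading only; the size threshold, the group size, the `users_group_N` and `<user>_own` index names and the `assign_indices` helper are all assumptions made for the example.

```python
from typing import Dict


def assign_indices(doc_counts: Dict[str, int], big_user_threshold: int,
                   users_per_group: int) -> Dict[str, str]:
    """Map each user to an index: large users get their own, small users share."""
    placement: Dict[str, str] = {}
    small_seen = 0
    for user in sorted(doc_counts):
        if doc_counts[user] >= big_user_threshold:
            # Hypothetical naming for a dedicated index.
            placement[user] = f"{user}_own"
        else:
            # Fill shared group indices users_per_group users at a time.
            group = small_seen // users_per_group + 1
            placement[user] = f"users_group_{group}"
            small_seen += 1
    return placement


# Made-up document counts: user_6 is far larger than the rest.
counts = {"user_1": 10, "user_2": 20, "user_4": 5, "user_5": 7, "user_6": 5000}
print(assign_indices(counts, big_user_threshold=1000, users_per_group=4))
```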
  58. Why is one index per user convenient?
    # curl -XGET localhost:9200/user_1/_search -d '{
      "query": {
        "match": { "body": "all the things" }
      }
    }'
  59. What do we want?
    • Have the simplicity of one user per index
    • Have the scalability of solution 12
  60. Aliases to the rescue
    curl -XPOST localhost:9200/_aliases -d '{
      "actions": [
        {
          "add": {
            "index": "users_group_1",
            "alias": "user_1",
            "filter": { "term": { "user": "user_1" } }
          }
        }
      ]
    }'
    # curl -XGET localhost:9200/user_1/_search -d '{ … }'
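With many users sharing one group index, each user needs an alias of this shape. A minimal sketch that generates the actions, assuming every document carries a `user` field as in the slide; `user_alias_action` is a hypothetical helper and the body is only printed, not sent to a cluster.

```python
import json


def user_alias_action(user: str, group_index: str) -> dict:
    """Build an "add" action for a filtered alias named after the user."""
    return {
        "add": {
            "index": group_index,
            "alias": user,
            # Only documents whose "user" field equals this user are visible via the alias.
            "filter": {"term": {"user": user}},
        }
    }


body = {"actions": [user_alias_action(u, "users_group_1") for u in ("user_1", "user_2")]}
print(json.dumps(body, indent=2))
```

The application keeps querying `user_1/_search` as if each user had an index of their own, while the data lives in the shared `users_group_1` index.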
  61. thank you! http://elasticsearch.com/support @elasticsearch, @bleskes http://elasticsearch.org/resources