Staying Ahead of Users And Time - two use cases of scaling data with Elasticsearch

A talk I gave at Berlin Buzzwords 2014.

Boaz Leskes

May 26, 2014

Transcript

  1. Staying Ahead of Users and Time: two use cases of scaling data with Elasticsearch
    Boaz Leskes @bleskes
  2. Basics: indices, types and other animals

  3. A document
    {
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
    }
  4. A type = docs with similar data/structure
    {
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
    }
    {
      "created_at": "Thu Jan 23 18:27:23 +0000 2014",
      "id": 426420915698544640,
      "text": "Elasticsearch es una maravilla !!!!",
      "user": { "name": "Abel Coronado", "screen_name": "abxda" }
    }
  5. An index = a collection of types
    {
      "created_at": "Thu Jan 23 18:27:23 +0000 2014",
      "id": 426420915698544640,
      "text": "Elasticsearch es una maravilla !!!!",
      "user": { "name": "Abel Coronado", "screen_name": "abxda" }
    }
    {
      "id": 19726002,
      "name": "Abel Coronado",
      "screen_name": "abxda",
      "location": "Aguascalientes",
      "followers_count": 871,
      "friends_count": 1794,
      "listed_count": 38
    }
  6. Sharding [diagram labels: index]
  7. Sharding [diagram labels: index, shard 3, shard 4, shard 1, shard 2]
  8. Sharding [diagram labels: index, shard 3, shard 4, shard 1, shard 2, node, node]
  9. Sharding [diagram labels: index, node, node, shard 3, shard 1, shard 4, shard 2]
  10. Sharding [diagram labels: index, node, node, shard 3, shard 1, shard 4, shard 2, node, node, copy 1, copy 4, copy 3, copy 2]
  11. Sharding [diagram labels: node, node, copy 1, copy 3, node, node, shard 1, shard 3, copy 4, copy 2, shard 4, shard 2]
  12. Sharding [diagram labels: node, node, copy 1, copy 3, node, node, node, node, node, node, shard 1, shard 3, copy 4, copy 2, shard 4, shard 2]
  13. Sharding [diagram labels: node, node, copy 1, copy 3, node, node, node, node, node, node, shard 1, shard 2, shard 3, copy 4, copy 2, shard 4]
  14. Sharding - multiple indices [diagram labels: node, node, shard 1, copy 2, node, node, node, node, node, node, shard 1, shard 2, shard 2, copy 1, copy 2, copy 1]
  15. Search [diagram labels: node, node, shard 1, copy 2, node, node, shard 1, shard 2, shard 2, copy 1, copy 2, copy 1, any node]
    # curl 'localhost:9200/index1,index2/_search?q=something'
  16. Search [diagram labels: node, node, shard 1, copy 2, node, node, shard 1, shard 2, shard 2, copy 1, copy 2, copy 1, any node]
    # curl 'localhost:9200/index1,index2/_search?q=something'
  17. Search [diagram labels: node, node, shard 1, copy 2, node, node, shard 1, shard 2, shard 2, copy 1, copy 2, copy 1, any node]
    # curl 'localhost:9200/index1,index2/_search?q=something'
  18. Search [diagram labels: node, node, shard 1, copy 2, node, node, shard 1, shard 2, shard 2, copy 1, copy 2, copy 1, any node]
    # curl 'localhost:9200/index1,index2/_search?q=something'
  19. Search [diagram labels: node, node, shard 1, copy 2, node, node, shard 1, shard 2, shard 2, copy 1, copy 2, copy 1, any node]
    # curl 'localhost:9200/index1,index2/_search?q=something'
  20. Important fact for later: indexing & searching is done on shards, not indices
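
A minimal sketch of what "done on shards, not indices" means in practice, assuming a document is assigned to a shard by hashing its routing key (by default its id) modulo the shard count. The function names are hypothetical, and `zlib.crc32` is only a stand-in for Elasticsearch's internal hash function.

```python
import zlib


def shard_for(routing_key: str, number_of_shards: int) -> int:
    """Pick the shard that would hold a document.

    zlib.crc32 is a stand-in chosen for this sketch; Elasticsearch uses
    its own internal hash, so real shard numbers will differ.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


def shards_to_search(indices: dict) -> list:
    """Expand {index name: shard count} into the (index, shard) pairs
    that a search across those indices has to visit."""
    return [
        (name, shard)
        for name, count in indices.items()
        for shard in range(count)
    ]


if __name__ == "__main__":
    # The same id always lands on the same shard of a 4-shard index.
    print(shard_for("426674590560305150", 4))
    # Searching index1,index2 fans out to every shard of both indices.
    print(shards_to_search({"index1": 2, "index2": 2}))
```

Under this model, a search over two indices with two shards each costs about the same as a search over one index with four shards, which is the property the rest of the talk builds on.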
  21. To the subject at hand: time based data

  22. None
  23. easy to get, easy to index
    # curl -XPUT localhost:9200/tweets/tweet/426674590560305150 -d '{
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk",
      "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
    }'
  24. None
  25. Shard 1 Shard 2 Shard 3

  26. Shard 1 Shard 2 Shard 3

  27. Shard 1 Shard 2 Shard 3

  28. Shard 1 Shard 2 Shard 3

  29. Shard 1 Shard 2 Shard 3

  30. no problem, just use more shards [diagram labels: shard 1, shard 7, shard 2, shard 8, shard 3, shard 9, shard 4, shard 10, shard 5, shard 6, shard 11, shard 10]
  31. no problem, just use more shards [diagram labels: shard 1, shard 7, shard 2, shard 8, shard 3, shard 9, shard 4, shard 10, shard 5, shard 6, shard 11, shard 10]
    # curl 'localhost:9200/index/_search?q=something'
  32. Reminds me of a tile at my aunt’s house: "Today is the tomorrow we were all afraid of yesterday…."
  33. None
  34. March 1 2 3

  35. March 1 2 3

  36. March 1 2 3 April 1 2 3

  37. March 1 2 3 April 1 2 3 May 1 2 3
  38. Cluster scales with time [diagram labels: mar. 1, april 1, april 2, mar. 1, april 1, april 2, may 1, may 2, may 2, may 2, june 2]
  39. Scopes searches [diagram labels: mar. 1, april 1, april 2, mar. 1, april 1, april 2, may 1, may 2, may 2, may 2]
    # curl 'localhost:9200/may/_search?q=something'
  40. one little tweak…
    # curl -XPUT localhost:9200/tweets_201401/tweet/426674590560305150 -d '{
      "created_at": "Fri Jan 24 11:15:24 +0000 2014",
      "id": 426674590560305150,
      "text": "Prepping up for my #elasticsearch talk this afternoon at the UvA : http://t.co/rqhBI5zys0",
      "user": { "name": "Boaz Leskes", "screen_name": "bleskes" }
    }'
  41. None
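
The monthly index name on the "one little tweak" slide (`tweets_201401`) can be derived from the tweet's own `created_at` field. The following is a sketch under stated assumptions: the helper name `monthly_index` and the default `tweets` prefix are illustrative, and the date format string is inferred from the sample tweet.

```python
from datetime import datetime

# Format inferred from the sample value "Fri Jan 24 11:15:24 +0000 2014".
# %a and %b are locale dependent; this assumes English day/month names.
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def monthly_index(created_at: str, prefix: str = "tweets") -> str:
    """Return the name of the monthly index a tweet should be written to."""
    timestamp = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    return f"{prefix}_{timestamp:%Y%m}"


if __name__ == "__main__":
    print(monthly_index("Fri Jan 24 11:15:24 +0000 2014"))
    print(monthly_index("Thu Jan 23 18:27:23 +0000 2014", prefix="twitter"))
```

The indexing client would then send the document to `localhost:9200/<index name>/tweet/<id>` instead of a single fixed index.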
  42. Another fact: an index is the basic unit of configuration

  43. index templates
    curl -XPUT localhost:9200/_template/twitter -d '{
      "template": "twitter_*",
      "settings": { "number_of_shards": 4, "number_of_replicas": 1 }
    }'
  44. older data
    # elasticsearch.yml
    node.disk: spinning_disks

    curl -XPUT 'localhost:9200/twitter_2012*/_settings' -d '{
      "index.routing.allocation.include.disk": "spinning_disks",
      "index.routing.allocation.exclude.disk": "ssd"
    }'
  45. older data
    curl -XPOST localhost:9200/twitter_201404/_optimize
    curl -XPOST localhost:9200/twitter_201304/_close
    curl -XDELETE localhost:9200/twitter_201204/
    pro tip: https://github.com/elasticsearch/curator
  46. aliases
    curl -XPOST localhost:9200/_aliases -d '{
      "actions": [
        { "add": { "index": "twitter_201311", "alias": "last_2_months" } },
        { "remove": { "index": "twitter_201309", "alias": "last_2_months" } }
      ]
    }'
  47. Implications
    • Use indices to manage data as it scales over time
    • Use aliases to efficiently point your searches at the relevant shards
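
Rolling the `last_2_months` alias forward each month amounts to one add and one remove action sent to `_aliases`. The sketch below only builds that request body; the function names, the `twitter` prefix default, and the two-month window are assumptions taken from the aliases slide, not a prescribed implementation.

```python
import json


def month_offset(year: int, month: int, delta: int) -> tuple:
    """Shift a (year, month) pair by delta months, handling year rollover."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def roll_alias_actions(year: int, month: int, alias: str = "last_2_months",
                       prefix: str = "twitter", window: int = 2) -> dict:
    """Build an _aliases body that adds the given month's index to the
    alias and removes the index that just fell out of the window."""
    drop_year, drop_month = month_offset(year, month, -window)
    return {
        "actions": [
            {"add": {"index": f"{prefix}_{year}{month:02d}", "alias": alias}},
            {"remove": {"index": f"{prefix}_{drop_year}{drop_month:02d}",
                        "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Reproduces the index names used on the aliases slide.
    print(json.dumps(roll_alias_actions(2013, 11), indent=2))
```

Because both actions travel in one request, searches against the alias never see a state where the window is empty or three months wide.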
  48. One More Thing.. Time is just a (strictly) monotonic function.
    Primary keys are just as good
  49. Users data: scalable multi-tenancy

  50. Data

  51. Data Data Data Data Data Data Data Data Data Data Data Data
  52. Solution 1 - index per user [diagram: user 1 to user 12, each with shard 1 and shard 2]
  53. Solution 1 - index per user [diagram: user 1 to user 12, each with shard 1 and shard 2; node]
  54. Solution 1 - index per user [diagram: user 1 to user 12, each with shard 1 and shard 2; node; Overloaded]
  55. Solution 2 - all users in one index [diagram labels: shard 1, shard 1, shard 2, shard 3; four boxes, each listing user 1, user 2, user 3, user 4, user 5, user 6, user 7, user 9, user 9, user 10, user 11, user 12]
  56. Solution 2 - all users in one index [diagram labels: shard 1, shard 1, shard 2, shard 3; four boxes, each listing user 1, user 2, user 3, user 4, user 5, user 6, user 7, user 9, user 9, user 10, user 11, user 12; limited horizon]
  57. Solution 12 - both [diagram labels: shard 1, shard 2, shard 1, shard 1, shard 2, shard 3, shard 4; user 1, user 2, user 4, user 5 (four boxes); shard 3, shard 4; user 7, user 8, user 9, user 0 (four boxes); user 6 (shard 2, shard 3, shard 4); index 1, index 2, index 3]
  58. Why is one index per user convenient?
    # curl -XGET localhost:9200/user_1/_search -d '{
      "query": { "match": { "body": "all the things" } }
    }'
  59. What do we want?
    • Have the simplicity of one user per index
    • Have the scalability of solution 12
  60. Aliases to the rescue
    curl -XPOST localhost:9200/_aliases -d '{
      "actions": [
        { "add": {
            "index": "users_group_1",
            "alias": "user_1",
            "filter": { "term": { "user": "user_1" } }
        } }
      ]
    }'
    # curl -XGET localhost:9200/user_1/_search -d '{ … }'
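
With many users, the filtered aliases are easier to generate than to write by hand. This sketch builds the `_aliases` request body from a user-to-group-index mapping; the function names and the mapping are hypothetical, while the `user` field and the alias-per-user naming follow the slide.

```python
import json


def user_alias_action(user: str, group_index: str) -> dict:
    """One 'add' action: an alias named after the user, pointing at the
    shared group index and filtered down to that user's documents."""
    return {
        "add": {
            "index": group_index,
            "alias": user,
            "filter": {"term": {"user": user}},
        }
    }


def aliases_body(assignment: dict) -> dict:
    """Build the full _aliases body from {user: group index}."""
    return {
        "actions": [
            user_alias_action(user, group_index)
            for user, group_index in assignment.items()
        ]
    }


if __name__ == "__main__":
    # Hypothetical assignment of users to shared indices.
    assignment = {"user_1": "users_group_1", "user_2": "users_group_1"}
    print(json.dumps(aliases_body(assignment), indent=2))
```

Clients keep searching `localhost:9200/user_1/_search` as if each user had a dedicated index, and moving a user to another group index only requires swapping the alias.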
  61. thank you! http://elasticsearch.com/support @elasticsearch, @bleskes http://elasticsearch.org/resources