Throw Some Keys on It: Data Modeling for Key-Value Data Stores by Example

Hector Castro
February 21, 2014

Relational databases are a great tool, but they are just one tool among many. If you have more data than fits on a single server and need a database that is highly available, consider a key/value data store. You may discover that it is a better fit for your needs.

Many key/value databases are designed to withstand multiple server and network failures, and to scale to dozens or hundreds of servers. In order to take advantage of that power you need to think differently about data modeling.

You will be amazed at how much can be achieved with a key/value data store.

In this session, you will learn how to model your data to meet latency, scale, and high-availability requirements, using the high-volume use case of a mobile app that connects passengers with drivers of vehicles for hire and coordinates ridesharing services.


Transcript

  1. Throw Some Keys on It: Data Modeling for Key/Value Data Stores by Example
  2. Hector Castro @hectcastro

  4. Relational databases are not all bad. They actually give us a lot.
  5. Relationships, Transactions, Schemas, Ability to extend preexisting structure easily, Ad-hoc queries (with SQL)
  11. What if your application doesn’t need most of that?

  12. What if the latency, scale, or high-availability requirements you’re facing make most of those benefits disappear?
  13. Enter key/value data stores.

  14. Schema-less, Single-access reads, Great at write-heavy workloads, Easier to scale, Familiar interface
  19. mapping = { }
      mapping["p_cool_databas"] = "riak"
      p mapping["p_cool_databas"]
      # => "riak"
  20. mapping["p_cool_databas"] = ["riak", "postgres"].to_json
      p mapping["p_cool_databas"]
      # => "[\"riak\",\"postgres\"]"
  21. mapping["p_cool_databas_logo"] = File.read("riak_logo.png")
      p mapping["p_cool_databas_logo"]
      # => "\x89PNG\r\n\u001A\n\u0000\u0000\u0000\rIHDR\u0000\...
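
The three slides above treat the store as a plain hash whose values are opaque to the store itself. A minimal sketch of the round trip, assuming an in-memory Hash as a stand-in for the data store and a hypothetical key name (`favorite_databases` is not from the talk):

```ruby
require "json"

# In-memory stand-in for a key/value store (assumption: a plain Hash).
store = {}

# The application serializes the value; the store only sees a string.
store["favorite_databases"] = ["riak", "postgres"].to_json

# Reading is a single lookup by key, followed by deserialization in the app.
raw = store["favorite_databases"]
databases = JSON.parse(raw)

puts raw.class
puts databases.inspect
```

The point of the sketch is that structure lives in the application: the store never inspects the value, so any serialization format or binary blob works the same way.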

  23. Cool story, Hector, but my real-world application is too complex for a key/value data store.
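  33. buckets/20140221133500/keys/1_-2
  34. buckets/20140221133500/keys/1_-2, highlighting 20140221 (the date part of the bucket name)
  35. buckets/20140221133500/keys/1_-2, highlighting 133500 (the time part of the bucket name)
  36. buckets/20140221133500/keys/1_-2, highlighting 1_-2 (the key)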
  37. [ "car1", "car22", "car7" ]
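
The naming scheme on the preceding slides (a bucket per timestamp, a key per grid location) can be sketched as below. The slides never show the value of `$timestamp_format`, so the format string here is an assumption chosen to reproduce the bucket name `20140221133500`; the helper names are hypothetical as well.

```ruby
# Assumption: the talk's $timestamp_format is not shown; this format
# reproduces the bucket name seen on the slides.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"

# Bucket name: the UTC timestamp, one bucket per second.
def bucket_name(time)
  time.getutc.strftime(TIMESTAMP_FORMAT)
end

# Key: latitude and longitude of the grid block joined with an underscore,
# matching the "%s_%s" pattern used in the talk's code.
def location_key(lat, lon)
  "%s_%s" % [lat, lon]
end

# Hypothetical helper that assembles the path shown on the slides.
def object_path(time, lat, lon)
  "buckets/%s/keys/%s" % [bucket_name(time), location_key(lat, lon)]
end

puts object_path(Time.utc(2014, 2, 21, 13, 35, 0), 1, -2)
# => buckets/20140221133500/keys/1_-2
```

Because both the bucket and the key can be computed from the current time and a location, a reader can fetch the cars in a block with a single lookup and no query.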
  38. def emit_car_location(car_id, color, lat, lon)
        current_timestamp = Time.now.utc
        bucket = $client.bucket(current_timestamp.strftime($timestamp_format))

        puts "%s[%s]: %s is at [ %s, %s ]" % [
          color,
          current_timestamp.strftime($timestamp_format),
          car_id,
          lat,
          lon
        ]

        cars = GSet.new
        cars.add(car_id)

        object = bucket.new("%s_%s" % [ lat, lon ])
        object.content_type = "application/json"
        object.data = cars.to_json

        object.store(returnbody: false)
      end
  39. emit_car_location(car_id, color, lat, lon)
  40. bucket = $client.bucket(current_timestamp.strftime($timestamp_format))
  41. buckets/20140221133500/keys/, highlighting the bucket name 20140221133500
  42. puts "%s[%s]: %s is at [ %s, %s ]" % [
        color,
        current_timestamp.strftime($timestamp_format),
        car_id,
        lat,
        lon
      ]
  43. cars = GSet.new
      cars.add(car_id)
  44. G[row-only]Set

  45. Set union is commutative and convergent; therefore it is always safe to have simultaneous writes to a set which only allows addition.
  46. Conflict-free Replicated Data Type

  47. Provides:
      1. An alternative to locking
      2. A useful abstraction (Set, Counter, Map)
      3. A way to resolve automatically toward a single value
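
The slides call `GSet.new`, `add`, `merge_json`, `members`, and `to_json` but never show the class itself. What follows is a hypothetical reconstruction of such a grow-only set, assuming `merge_json` receives a JSON string in the `{"type": "GSet", "a": [...]}` shape shown in the deck; it is a sketch of the idea, not the speaker's implementation.

```ruby
require "json"
require "set"

# Hypothetical grow-only set: elements can be added, never removed.
class GSet
  attr_reader :members

  def initialize
    @members = Set.new
  end

  def add(item)
    @members.add(item)
    self
  end

  # Merging is plain set union, so the order of merges does not matter.
  def merge(other)
    @members.merge(other.members)
    self
  end

  # Assumption: the argument is a JSON string with the members under "a".
  def merge_json(json)
    JSON.parse(json).fetch("a").each { |item| add(item) }
    self
  end

  def to_json(*args)
    { "type" => "GSet", "a" => @members.to_a }.to_json(*args)
  end
end

left  = GSet.new.add("car1").add("car22")
right = GSet.new.add("car7")

# Union in either order converges to the same members.
a = GSet.new.merge(left).merge(right)
b = GSet.new.merge(right).merge(left)
puts a.members == b.members
puts a.to_json
```

Since the only operation is addition, two replicas that accepted different writes can always be reconciled by union, which is why no locking is needed.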
  48. object = bucket.new("%s_%s" % [ lat, lon ])
      object.content_type = "application/json"
      object.data = cars.to_json
  49. { "type": "GSet", "a": ["car1"] }
  54. def request_car(request_lat, request_lon)
        local_grid = closest_blocks(request_lat, request_lon)

        puts "%s\n%% Car requested in [ %s, %s ]!\n\n" % [
          ANSI.yellow,
          request_lat,
          request_lon
        ]

        local_grid.keys.sort.each do |distance|
          lat, lon = local_grid[distance].split("_")
          closest_car = get_cars_at_location(lat, lon)

          if closest_car.members.length > 0
            puts "\n%% Cars closest to you:\n\n"
            puts JSON.pretty_generate(closest_car.members.to_a)
            puts

            break
          end
        end
      end
  55. request_car(request_lat, request_lon)
  56. local_grid = closest_blocks(request_lat, request_lon)
  57. puts "%s\n%% Car requested in [ %s, %s ]!\n\n" % [
        ANSI.yellow,
        request_lat,
        request_lon
      ]
  58. local_grid.keys.sort.each do |distance|
  59. { 1.2424 => "2_1", 2.7335 => "2_2", 5.8733 => "5_-5" }
  60. local_grid.keys.sort.each do |distance|
        lat, lon = local_grid[distance].split("_")
        closest_car = get_cars_at_location(lat, lon)
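
The loop above visits grid blocks from nearest to farthest. A small sketch using the example hash from the deck (distance mapped to a `lat_lon` key), inserted out of order here to show that the sort is what establishes the visiting order; the `ordered` variable is introduced only for illustration:

```ruby
# Example grid from the slides: distance => "lat_lon" block key.
# Deliberately inserted out of order.
local_grid = {
  5.8733 => "5_-5",
  1.2424 => "2_1",
  2.7335 => "2_2"
}

ordered = []

# Sorting the distances yields the blocks nearest-first.
local_grid.keys.sort.each do |distance|
  # split returns strings, so lat and lon are "2" and "1", not integers.
  lat, lon = local_grid[distance].split("_")
  ordered << [lat, lon]
  puts "%.4f away: block lat=%s lon=%s" % [distance, lat, lon]
end
```

One consequence of keying the hash by distance is that two blocks at exactly the same distance would collide; the deck's example values are all distinct, so the sketch does not address that case.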
  61. closest_car = get_cars_at_location(lat, lon)
  62. Still with me?

  63. def get_cars_at_location(lat, lon)
        current_timestamp = Time.now.utc - 1
        bucket = $client.bucket(current_timestamp.strftime($timestamp_format))
        object = bucket.get_or_new("%s_%s" % [ lat, lon ])

        cars = GSet.new

        if object.siblings.length > 1
          object.siblings.each do |sibling|
            unless sibling.data.nil?
              cars.merge_json(sibling.data)
            end
          end

          resolved_object = bucket.new("%s_%s" % [ lat, lon ])
          resolved_object.vclock = object.vclock
          resolved_object.content_type = "application/json"
          resolved_object.data = cars.to_json

          resolved_object.store(returnbody: false)
        elsif !object.data.nil?
          cars.merge_json(object.data)
        end

        cars
      end
  64. get_cars_at_location(lat, lon)
  65. current_timestamp = Time.now.utc - 1
      bucket = $client.bucket(current_timestamp.strftime($timestamp_format))
      object = bucket.get_or_new("%s_%s" % [ lat, lon ])
  66. cars = GSet.new
  67. if object.siblings.length > 1
  68. Siblings are a mechanism to prevent data loss during concurrent writes or network partitions.
  69. A A A

  70. A A A B C

  71. A A A B C

  72. A A A B C {B,C} {B,C} {B,C}

  73. { "type": "GSet", "a": ["car1", "car22"] }
      { "type": "GSet", "a": ["car7"] }
  74. object.siblings.each do |sibling|
        unless sibling.data.nil?
          cars.merge_json(sibling.data)
        end
      end
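
The sibling loop above can be tried without a Riak cluster. This sketch assumes the two sibling values from the deck arrive as raw JSON strings and folds them into one set with plain union; the variable names are illustrative and not part of the talk's code.

```ruby
require "json"
require "set"

# Sibling values as shown in the deck: two replicas accepted different writes.
siblings = [
  '{"type": "GSet", "a": ["car1", "car22"]}',
  '{"type": "GSet", "a": ["car7"]}'
]

# Union every sibling's members, skipping empty siblings as the talk's loop does.
merged = siblings.each_with_object(Set.new) do |raw, cars|
  next if raw.nil?
  cars.merge(JSON.parse(raw).fetch("a"))
end

# The resolved value that would be written back in place of the siblings.
resolved = { "type" => "GSet", "a" => merged.to_a }
puts JSON.generate(resolved)
# => {"type":"GSet","a":["car1","car22","car7"]}
```

Because union is commutative, reading the siblings in any order gives the same members, so every client that resolves the conflict converges on the same value.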
  75. resolved_object = bucket.new("%s_%s" % [ lat, lon ])
      resolved_object.vclock = object.vclock
      resolved_object.content_type = "application/json"
      resolved_object.data = cars.to_json

      resolved_object.store(returnbody: false)
  76. B C {B,C} {B,C} {B,C}

  77. D D D B C {B,C} {B,C} {B,C}

  78. { "type": "GSet", "a": ["car1", "car22", "car7"] }
  79. closest_car = get_cars_at_location(lat, lon)
  80. if closest_car.members.length > 0
        puts "\n%% Cars closest to you:\n\n"
        puts JSON.pretty_generate(closest_car.members.to_a)
        puts

        break
      end
  82. Data types: available in Riak 2.0.

  83. bucket = $client.bucket(current_timestamp.strftime($timestamp_format))
      cars = Riak::Crdt::Set.new(bucket, "%s_%s" % [ lat, lon ])
      cars.add(car_id)

      # --

      cars.members

      #<Set: {"car1", "car22", "car7"}>
  85. When would doing all of this work be attractive?

  86. What if the latency, scale, or high-availability requirements you’re facing make most of those benefits disappear?
  87. Thanks for listening. E-mail: hector@basho.com; Twitter: @hectcastro; Web: http://docs.basho.com