Throw Some Keys on It: Data Modeling for Key-Value Data Stores by Example

Hector Castro
February 21, 2014

Relational databases are a great tool, but they are just one tool among many. If you have more data than fits on a single server and need a database that is highly available, consider a key/value data store. You may discover that it is a better fit for your needs.

Many key/value databases are designed to withstand multiple server and network failures, and to scale to dozens or hundreds of servers. In order to take advantage of that power you need to think differently about data modeling.

You will be amazed at how much can be achieved with a key/value data store.

In this session, you will learn how to model your data to meet latency, scale, and high-availability requirements, using the high-volume use case of a mobile app that connects passengers with drivers of vehicles for hire and coordinates ridesharing services.

Transcript

  1. Throw Some Keys on It
    Data Modeling for Key/Value Data Stores by Example

  2. Hector Castro
    @hectcastro

  4. Relational databases are
    not all bad. They actually
    give us a lot.

  5. Relationships
    Transactions
    Schemas
    Ability to extend preexisting
    structure easily
    Ad-hoc queries (with SQL)

  11. What if your application
    doesn’t need most of that?

  12. What if the latency, scale,
    or high-availability
    requirements you’re facing
    make most of those
    benefits disappear?

  13. Enter key/value data stores.

  14. Schema-less
    Single-access reads
    Great at write-heavy
    workloads
    Easier to scale
    Familiar interface

  19. mapping = { }
    mapping["p_cool_databas"] = "riak"
    p mapping["p_cool_databas"]
    # => "riak"

  20. mapping["p_cool_databas"] = ["riak", "postgres"].to_json
    p mapping["p_cool_databas"]
    # => "[\"riak\",\"postgres\"]"
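
Because the store treats each value as an opaque string, reading the list back means parsing it again. A minimal sketch, using a plain Ruby Hash as a stand-in for the data store (an assumption for illustration; no real client is involved):

```ruby
require "json"

# A plain Hash stands in for the key/value store (illustrative assumption).
mapping = {}

# Serialize the structured value before storing it under the key.
mapping["p_cool_databas"] = ["riak", "postgres"].to_json

# The store hands back the raw string; the application decodes it.
databases = JSON.parse(mapping["p_cool_databas"])

p databases
```

The store never inspects the value, so the choice of JSON here is the application's, not the database's.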

  21. mapping["p_cool_databas_logo"] = File.read("riak_logo.png")
    p mapping["p_cool_databas_logo"]
    # => "\x89PNG\r\n\u001A\n\u0000\u0000\u0000\rIHDR\u0000\...

  23. Cool story, Hector, but my
    real-world application is
    too complex for a key/value
    data store.

  33. buckets/20140221133500/keys/1_-2

  34. buckets/20140221133500/keys
    20140221

  35. buckets/20140221133500/keys
    133500

  36. buckets/20140221133500/keys
    1_-2
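
The address above can be assembled from a timestamp and a grid position. A minimal sketch, assuming a second-resolution format string of "%Y%m%d%H%M%S" (the deck never shows the value of $timestamp_format) and the hypothetical helper names bucket_name and grid_key:

```ruby
# Assumed format; the deck does not show the real $timestamp_format value.
TIMESTAMP_FORMAT = "%Y%m%d%H%M%S"

# Hypothetical helper: one bucket per second, named after the UTC timestamp.
def bucket_name(time)
  time.getutc.strftime(TIMESTAMP_FORMAT)
end

# Hypothetical helper: the key is the grid cell, written as "lat_lon".
def grid_key(lat, lon)
  "%s_%s" % [lat, lon]
end

time = Time.utc(2014, 2, 21, 13, 35, 0)
path = "buckets/%s/keys/%s" % [bucket_name(time), grid_key(1, -2)]
puts path
```

Putting the time in the bucket name and the position in the key means a reader can compute the exact address of the data it wants without running a query.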

  37. [
      "car1",
      "car22",
      "car7"
    ]

  38. def emit_car_location(car_id, color, lat, lon)
      # The bucket name is the current UTC timestamp
      current_timestamp = Time.now.utc
      bucket = $client.bucket(current_timestamp.strftime($timestamp_format))

      puts "%s[%s]: %s is at [ %s, %s ]" % [
        color,
        current_timestamp.strftime($timestamp_format),
        car_id,
        lat,
        lon
      ]

      # Record this car in a grow-only set
      cars = GSet.new
      cars.add(car_id)

      # The key is the grid cell, "lat_lon"; the value is the set as JSON
      object = bucket.new("%s_%s" % [ lat, lon ])
      object.content_type = "application/json"
      object.data = cars.to_json

      object.store(returnbody: false)
    end

  39. emit_car_location(car_id, color, lat, lon)

  40. bucket = $client.bucket(current_timestamp.strftime($timestamp_format))

  41. buckets/20140221133500/keys/
    20140221133500

  42. puts "%s[%s]: %s is at [ %s, %s ]" % [
      color,
      current_timestamp.strftime($timestamp_format),
      car_id,
      lat,
      lon
    ]

  43. cars = GSet.new
    cars.add(car_id)

  44. G[row-only]Set

  45. Set union is commutative
    and convergent; therefore it
    is always safe to have
    simultaneous writes to a set
    which only allows addition.
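
The property is easy to check with Ruby's standard Set class. A small sketch, not the GSet class used in the talk: two replicas receive additions in different orders, and merging them gives the same result whichever side goes first.

```ruby
require "set"

# Two replicas start empty and see the same adds in a different order.
replica_a = Set.new
replica_b = Set.new

replica_a << "car1" << "car22"
replica_b << "car22" << "car1"

# A later write reaches only replica_b.
replica_b << "car7"

# Merging is set union; the order of the operands does not matter.
merged_ab = replica_a | replica_b
merged_ba = replica_b | replica_a

p merged_ab == merged_ba
p merged_ab.to_a.sort
```

Because nothing is ever removed, no merge can lose an element, which is why concurrent writers need no lock.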

  46. Conflict-free Replicated
    Data Type

  47. Provides:
    1. An alternative to locking
    2. A useful abstraction (Set, Counter, Map)
    3. A way to resolve automatically toward a single value

  48. object = bucket.new("%s_%s" % [ lat, lon ])
    object.content_type = "application/json"
    object.data = cars.to_json

  49. {
      "type": "GSet",
      "a": ["car1"]
    }
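
The deck does not show the GSet class itself. A minimal sketch of what such a class could look like, assuming only the methods the slides call (add, members, to_json, merge_json) and the JSON shape shown above:

```ruby
require "json"
require "set"

# Hypothetical grow-only set; not the implementation used in the talk.
class GSet
  attr_reader :members

  def initialize
    @members = Set.new
  end

  # Elements can only be added, never removed.
  def add(element)
    @members.add(element)
    self
  end

  # Serialize to the shape on the slide: {"type": "GSet", "a": [...]}.
  def to_json(*args)
    { "type" => "GSet", "a" => @members.to_a }.to_json(*args)
  end

  # Merge a serialized GSet into this one by set union.
  def merge_json(json)
    @members.merge(JSON.parse(json).fetch("a"))
    self
  end
end

cars = GSet.new
cars.add("car1")
puts cars.to_json
```

Keeping the merge inside the class means callers never overwrite one replica's state with another's; they can only combine them.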

  54. def request_car(request_lat, request_lon)
      # Map of distance => "lat_lon" key for the nearby grid blocks
      local_grid = closest_blocks(request_lat, request_lon)

      puts "%s\n%% Car requested in [ %s, %s ]!\n\n" % [
        ANSI.yellow,
        request_lat,
        request_lon
      ]

      # Check blocks from nearest to farthest; stop at the first one with cars
      local_grid.keys.sort.each do |distance|
        lat, lon = local_grid[distance].split("_")
        closest_car = get_cars_at_location(lat, lon)

        if closest_car.members.length > 0
          puts "\n%% Cars closest to you:\n\n"
          puts JSON.pretty_generate(closest_car.members.to_a)
          puts

          break
        end
      end
    end

  55. request_car(request_lat, request_lon)

  56. local_grid = closest_blocks(request_lat, request_lon)

  57. puts "%s\n%% Car requested in [ %s, %s ]!\n\n" % [
      ANSI.yellow,
      request_lat,
      request_lon
    ]

  58. local_grid.keys.sort.each do |distance|

  59. {
      1.2424 => "2_1",
      2.7335 => "2_2",
      5.8733 => "5_-5"
    }
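
closest_blocks is not shown in the deck. One way to produce a hash like this, assuming integer grid coordinates and straight-line distance to each neighbouring block (both are assumptions, as is the radius parameter):

```ruby
# Hypothetical stand-in for closest_blocks; the deck does not show the original.
# Returns a hash of distance => "lat_lon" for the blocks around the request.
def closest_blocks(request_lat, request_lon, radius = 1)
  center_lat = request_lat.round
  center_lon = request_lon.round
  grid = {}

  (center_lat - radius..center_lat + radius).each do |lat|
    (center_lon - radius..center_lon + radius).each do |lon|
      distance = Math.hypot(lat - request_lat, lon - request_lon)
      # Caveat: blocks at exactly the same distance overwrite each other.
      grid[distance] = "%s_%s" % [lat, lon]
    end
  end

  grid
end

local_grid = closest_blocks(2.2, 1.4)
nearest = local_grid.keys.min
puts local_grid[nearest]
```

Every value in the hash is already a valid key for the location bucket, so the caller can fetch each block directly.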

  60. local_grid.keys.sort.each do |distance|
      lat, lon = local_grid[distance].split("_")
      closest_car = get_cars_at_location(lat, lon)

  61. closest_car = get_cars_at_location(lat, lon)

  62. Still with me?

  63. def get_cars_at_location(lat, lon)
      # Read from the bucket for the previous second
      current_timestamp = Time.now.utc - 1
      bucket = $client.bucket(current_timestamp.strftime($timestamp_format))
      object = bucket.get_or_new("%s_%s" % [ lat, lon ])

      cars = GSet.new

      if object.siblings.length > 1
        # Concurrent writes left siblings: merge them all into one set
        object.siblings.each do |sibling|
          unless sibling.data.nil?
            cars.merge_json(sibling.data)
          end
        end

        # Write the merged set back with the fetched vector clock
        resolved_object = bucket.new("%s_%s" % [ lat, lon ])
        resolved_object.vclock = object.vclock
        resolved_object.content_type = "application/json"
        resolved_object.data = cars.to_json

        resolved_object.store(returnbody: false)
      elsif !object.data.nil?
        cars.merge_json(object.data)
      end

      cars
    end

  64. get_cars_at_location(lat, lon)

  65. current_timestamp = Time.now.utc - 1
    bucket = $client.bucket(current_timestamp.strftime($timestamp_format))
    object = bucket.get_or_new("%s_%s" % [ lat, lon ])

  66. cars = GSet.new

  67. if object.siblings.length > 1

  68. Siblings are a mechanism
    to prevent data loss
    during concurrent writes
    or network partitions.

  69. A A A

  70. A A A
    B C

  72. A A A
    B C
    {B,C} {B,C} {B,C}

  73. {
      "type": "GSet",
      "a": ["car1", "car22"]
    }
    {
      "type": "GSet",
      "a": ["car7"]
    }

  74. object.siblings.each do |sibling|
      unless sibling.data.nil?
        cars.merge_json(sibling.data)
      end
    end

  75. resolved_object = bucket.new("%s_%s" % [ lat, lon ])
    resolved_object.vclock = object.vclock
    resolved_object.content_type = "application/json"
    resolved_object.data = cars.to_json

    resolved_object.store(returnbody: false)

  76. B C
    {B,C} {B,C} {B,C}

  77. D D D
    B C
    {B,C} {B,C} {B,C}

  78. {
      "type": "GSet",
      "a": ["car1", "car22", "car7"]
    }
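
Resolving the two siblings shown earlier is a set union of their "a" arrays. A minimal sketch using only the standard library, with the sibling values hard-coded instead of fetched from Riak:

```ruby
require "json"
require "set"

# The two sibling values, hard-coded for illustration.
siblings = [
  '{"type": "GSet", "a": ["car1", "car22"]}',
  '{"type": "GSet", "a": ["car7"]}'
]

# Union the members of every sibling into one set.
members = siblings.reduce(Set.new) do |merged, json|
  merged | JSON.parse(json).fetch("a")
end

resolved = { "type" => "GSet", "a" => members.to_a }
puts JSON.generate(resolved)
```

The merge gives the same members whichever sibling is read first, so any client that resolves the conflict writes back the same set.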

  79. closest_car = get_cars_at_location(lat, lon)

  80. if closest_car.members.length > 0
      puts "\n%% Cars closest to you:\n\n"
      puts JSON.pretty_generate(closest_car.members.to_a)
      puts

      break
    end
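
The loop stops at the first block, nearest first, that has any cars. A sketch of that search with in-memory data, where cars_by_block is a hypothetical stand-in for get_cars_at_location and the distances are the ones from the earlier slide:

```ruby
# Distances to nearby blocks, as on the earlier slide.
local_grid = {
  1.2424 => "2_1",
  2.7335 => "2_2",
  5.8733 => "5_-5"
}

# Hypothetical stand-in for get_cars_at_location: block key => car ids.
cars_by_block = {
  "2_1" => [],
  "2_2" => ["car1", "car22"],
  "5_-5" => ["car7"]
}

found = nil
local_grid.keys.sort.each do |distance|
  cars = cars_by_block.fetch(local_grid[distance], [])

  # Skip empty blocks; stop at the nearest one that has cars.
  unless cars.empty?
    found = cars
    break
  end
end

p found
```

Each iteration is a single key lookup, so the cost of a request is bounded by the number of blocks in the local grid.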

  82. Data types
    Available in Riak 2.0.

  83. bucket = $client.bucket(current_timestamp.strftime($timestamp_format))
    cars = Riak::Crdt::Set.new(bucket, "%s_%s" % [ lat, lon ])
    cars.add(car_id)

    # --

    cars.members

    #

  85. When would doing all of
    this work be attractive?

  86. What if the latency, scale,
    or high-availability
    requirements you’re facing
    make most of those
    benefits disappear?

  87. Thanks for listening.
    E-mail: [email protected]
    Twitter: @hectcastro
    Web: http://docs.basho.com
