Introduction to Amazon DynamoDB

Presentation given at the Percona 2013 conference in Santa Clara, California, by Simone Brunozzi.
You can follow him on Twitter: @simon

Simone Brunozzi

April 23, 2013
Transcript

  1. Amazon DynamoDB

    Simone Brunozzi (@simon), Senior Technology Evangelist, Amazon Web Services
    v 4.0 - Apr 20th, 2013
    http://bit.ly/dynamodb2013
  2. NoSQL

    Diagram axes: Scaling, Structured Storage
    Scale first
    • Facebook, Gmail, Amazon.com, Twitter
    • RDBMS + Key-Value
    Feature first
    • Financial, CRM, Human resources
    • Dominated by RDBMS
  3. NoSQL

    Diagram axes: Scaling, Structured Storage
    Simple structured
    • BLOB-store not enough
    • Need query/index
    • BerkeleyDB/SimpleDB
    Scale first
    • Facebook, Gmail, Amazon.com, Twitter
    • RDBMS + Key-Value
    Feature first
    • Financial, CRM, Human resources
    • Dominated by RDBMS
  4. NoSQL

    Diagram axes: Scaling, Structured Storage
    Purpose-optimized
    • StreamBase, Vertica, VoltDB, Aster Data, Netezza, Greenplum
    Simple structured
    • BLOB-store not enough
    • Need query/index
    • BerkeleyDB/SimpleDB
    Scale first
    • Facebook, Gmail, Amazon.com, Twitter
    • RDBMS + Key-Value
    Feature first
    • Financial, CRM, Human resources
    • Dominated by RDBMS
  5. NoSQL

    Diagram axes: Scaling, Structured Storage
    Purpose-optimized
    • StreamBase, Vertica, VoltDB, Aster Data, Netezza, Greenplum
    Simple structured
    • BLOB-store not enough
    • Need query/index
    • BerkeleyDB/SimpleDB
    Scale first
    • Facebook, Gmail, Amazon.com, Twitter
    • RDBMS + Key-Value
    Feature first
    • Financial, CRM, Human resources
    • Dominated by RDBMS
    Good for NoSQL
  6. NoSQL

    Diagram axes: Structured Storage, Scaling
    Document store
    • Popular: MongoDB, CouchDB
    • Semi-structured data (XML, JSON, etc.)
    Key-Value store
    • Popular: Cassandra, Redis
    • Schema-less
  7. NoSQL

    Diagram axes: Structured Storage, Scaling
    Document store
    • Popular: MongoDB, CouchDB
    • Semi-structured data (XML, JSON, etc.)
    Key-Value store
    • Popular: Cassandra, Redis
    • Schema-less
    Graph Database
    • Popular: Neo4J, FlockDB
    • Stores the relationship of data as a graph
  8. NoSQL

    Diagram axes: Structured Storage, Scaling
    Document store
    • Popular: MongoDB, CouchDB
    • Semi-structured data (XML, JSON, etc.)
    Key-Value store
    • Popular: Cassandra, Redis
    • Schema-less
    Graph Database
    • Popular: Neo4J, FlockDB
    • Stores the relationship of data as a graph
    Etc.
    • Many others
  9. NoSQL

    Diagram axes: Structured Storage, Scaling
    Document store
    • Popular: MongoDB, CouchDB
    • Semi-structured data (XML, JSON, etc.)
    Key-Value store
    • Popular: Cassandra, Redis
    • Schema-less
    Graph Database
    • Popular: Neo4J, FlockDB
    • Stores the relationship of data as a graph
    Etc.
    • Many others
    DynamoDB
  10. NoSQL

    Diagram axes: Structured Storage, Scaling
    Easier than “YesSQL”?
    • NoSQL is better for simple queries, Primary Key lookups
    • No maintenance windows
  11. NoSQL

    Diagram axes: Structured Storage, Scaling
    Durability
    • Synchronous replication
    • Built-in durability
    Easier than “YesSQL”?
    • NoSQL is better for simple queries, Primary Key lookups
    • No maintenance windows
  12. NoSQL

    Diagram axes: Structured Storage, Scaling
    Evolution
    • MySQL: HandlerSocket
    • PostgreSQL 9.2: index only scan
    • SE PostgreSQL
    Durability
    • Synchronous replication
    • Built-in durability
    Easier than “YesSQL”?
    • NoSQL is better for simple queries, Primary Key lookups
    • No maintenance windows
    “view leakage”, etc.
  13. Usage patterns: traditional IT

    Variable peaks, Fast Growth, Predictable peaks, On and Off
  14. Usage patterns: traditional IT

    Variable peaks, Fast Growth, Predictable peaks, On and Off
    Chart labels: Poor Service, WASTE
  15. Usage patterns: Cloud Computing

    Chart axes: Time, Capacity
    Chart labels: Elastic CLOUD capacity, traditional IT capacity, Your IT needs
  16. DynamoDB

    NoSQL
    • No schema (only Key)
    • Hash / Hash + Range
    • Local Secondary Index
  17. DynamoDB

    NoSQL
    • No schema (only Key)
    • Hash / Hash + Range
    • Local Secondary Index
    Speeeed
    • Provisioned throughput
    • Auto storage scaling
    • “Shared nothing”
    • Low latency (<10ms Wr)
    • Solid State Drives (SSD)
    • IOPS per Table
  18. DynamoDB

    NoSQL
    • No schema (only Key)
    • Hash / Hash + Range
    • Local Secondary Index
    Speeeed
    • Provisioned throughput
    • Auto storage scaling
    • “Shared nothing”
    • Low latency (<10ms Wr)
    • Solid State Drives (SSD)
    • IOPS per Table
    Robust
    • Built-in fault tolerance
    • Strong consistency
    • Atomic counters
    • Disk-only writes
  19. Creating a table with the Java low-level API

    client = new AmazonDynamoDBClient(credentials);
    String tableName = "ProductCatalog";

    KeySchemaElement hashKey = new KeySchemaElement()
        .withAttributeName("Id")
        .withAttributeType("N");
    KeySchema ks = new KeySchema().withHashKeyElement(hashKey);

    ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput()
        .withReadCapacityUnits(10L)
        .withWriteCapacityUnits(10L);

    CreateTableRequest request = new CreateTableRequest()
        .withTableName(tableName)
        .withKeySchema(ks)
        .withProvisionedThroughput(provisionedThroughput);

    CreateTableResult result = client.createTable(request);
  20. Creating a table with the Java low-level API

    client = new AmazonDynamoDBClient(credentials);
    String tableName = "ProductCatalog";

    // Schema
    KeySchemaElement hashKey = new KeySchemaElement()
        .withAttributeName("Id")
        .withAttributeType("N");
    KeySchema ks = new KeySchema().withHashKeyElement(hashKey);

    // Throughput
    ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput()
        .withReadCapacityUnits(10L)
        .withWriteCapacityUnits(10L);

    // Table
    CreateTableRequest request = new CreateTableRequest()
        .withTableName(tableName)
        .withKeySchema(ks)
        .withProvisionedThroughput(provisionedThroughput);

    CreateTableResult result = client.createTable(request);
  21. Creating a table with BOTO library (Python)

    >>> message_table_schema = conn.create_schema(
    ...     hash_key_name='forum',
    ...     hash_key_proto_value='S',
    ...     range_key_name='subject',
    ...     range_key_proto_value='S'
    ... )
    >>> table = conn.create_table(
    ...     name='messages',
    ...     schema=message_table_schema,
    ...     read_units=5,
    ...     write_units=5
    ... )
  22. Creating a table with BOTO library (Python)

    >>> # Schema
    >>> message_table_schema = conn.create_schema(
    ...     hash_key_name='forum',
    ...     hash_key_proto_value='S',
    ...     range_key_name='subject',
    ...     range_key_proto_value='S'
    ... )
    >>> # Table and throughput
    >>> table = conn.create_table(
    ...     name='messages',
    ...     schema=message_table_schema,
    ...     read_units=5,
    ...     write_units=5
    ... )
  23. Deleting a table? Careful...

    >>> conn.delete_table(table)

    Permanently deletes the table!

    I suggest using three different users:
    1. Dev/Test
    2. Production
    3. Read-only
    You can manage permissions with IAM.
  24. Managing DynamoDB permissions with IAM

    {
      "Statement": [
        {
          "Action": ["dynamodb:GetItem", "dynamodb:BatchGetItem", "dynamodb:Query",
                     "dynamodb:Scan", "dynamodb:DescribeTable", "dynamodb:ListTables"],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    }
  25. Managing DynamoDB permissions with IAM

    Read-only user:
    {
      "Statement": [
        {
          "Action": ["dynamodb:GetItem", "dynamodb:BatchGetItem", "dynamodb:Query",
                     "dynamodb:Scan", "dynamodb:DescribeTable", "dynamodb:ListTables"],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    }
  26. Managing DynamoDB permissions with IAM

    {
      "Statement": [
        {
          "Action": ["dynamodb:*"],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    }
  27. Managing DynamoDB permissions with IAM

    Full-access user:
    {
      "Statement": [
        {
          "Action": ["dynamodb:*"],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    }
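
The two policies above differ only in their Action list. As a rough illustration of what that difference means, here is a minimal Python sketch. The helper `is_allowed` is hypothetical and deliberately simplified: it only matches action names (with `*` wildcards) against Allow statements, whereas real IAM evaluation also considers Deny statements, resources and conditions.

```python
from fnmatch import fnmatchcase

# Policy documents copied from the slides above.
READ_ONLY_POLICY = {
    "Statement": [{
        "Action": [
            "dynamodb:GetItem", "dynamodb:BatchGetItem", "dynamodb:Query",
            "dynamodb:Scan", "dynamodb:DescribeTable", "dynamodb:ListTables",
        ],
        "Effect": "Allow",
        "Resource": "*",
    }]
}

FULL_ACCESS_POLICY = {
    "Statement": [{"Action": ["dynamodb:*"], "Effect": "Allow", "Resource": "*"}]
}


def is_allowed(policy, action):
    """Hypothetical, simplified check: True if any Allow statement matches.

    Real IAM evaluation is richer (Deny, resources, conditions); this
    sketch only compares the action name against each Action pattern.
    """
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        if any(fnmatchcase(action, pattern) for pattern in statement["Action"]):
            return True
    return False


if __name__ == "__main__":
    # The read-only user cannot delete a table; the full-access user can.
    print(is_allowed(READ_ONLY_POLICY, "dynamodb:DeleteTable"))
    print(is_allowed(FULL_ACCESS_POLICY, "dynamodb:DeleteTable"))
```

This is why the read-only user is a safe default for reporting tools: `DeleteTable` simply is not in its Action list.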
  28. Products Example: Table, Items, Attributes

    Table: you must specify which type of Primary Key to use: “Hash” or “Hash + Range”.
  29. Products Example: Table, Items, Attributes

    id="301" id="201" Author="Simone", "John", "Erin" Title="PHP basics" Title="Learn C++" ISBN="122938" id="101" Price="15.50" Cat="Bicycle"
  30. Products Example: Table, Items, Attributes

    id="301" id="201" Author="Simone", "John", "Erin" Title="PHP basics" Title="Learn C++" ISBN="122938" id="101" Price="15.50" Cat="Bicycle"
  31. Products Example: Table, Items, Attributes

    id="301" id="201" Author="Simone", "John", "Erin" Title="PHP basics" Title="Learn C++" ISBN="122938" id="101" Price="15.50" Cat="Bicycle"
    Labels: Multi-valued data type, Scalar data type
  32. DynamoDB Data Types

    Scalar: Number (+/-, 38 digits), String (UTF-8), Binary
    Multi-valued: Number Set, String Set, Binary Set
    • Values in a set must be unique.
    • Values are not ordered.
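
To make the type list concrete, the sketch below wraps Python values in the type tags DynamoDB uses on the wire (`S`, `N`, `B` for scalars and `SS`, `NS`, `BS` for sets), the same tagged form that appears in the HTTP example later in this deck. The helper name `to_attribute_value` is hypothetical, and the sorting of set members is only there to make the output deterministic, since set values are unordered.

```python
import base64
from decimal import Decimal


def to_attribute_value(value):
    """Hypothetical helper: wrap a Python value in a DynamoDB type tag."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}          # numbers travel as strings
    if isinstance(value, bytes):
        return {"B": base64.b64encode(value).decode("ascii")}
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("sets must not be empty")
        encoded = [to_attribute_value(member) for member in value]
        tags = {next(iter(entry)) for entry in encoded}
        if len(tags) != 1 or not tags <= {"S", "N", "B"}:
            raise TypeError("a set must hold scalars of a single type")
        tag = tags.pop()
        # A Python set already guarantees uniqueness; sort only for stable output.
        return {tag + "S": sorted(entry[tag] for entry in encoded)}
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    print(to_attribute_value("PHP basics"))
    print(to_attribute_value(Decimal("15.50")))
    print(to_attribute_value({"Simone", "John", "Erin"}))
```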
  33. Primary Key: Hash / Hash + Range

    Hash: the key is hashed over the different partitions to optimize workload distribution.
  34. Primary Key: Hash / Hash + Range

    Hash: the key is hashed over the different partitions to optimize workload distribution.
    Hash + Range: when querying, the hash attribute must be matched exactly, but a range operation can be specified for the range attribute (e.g. all orders in the last 60 minutes).
  35. Primary Key: Hash / Hash + Range

    Hash
    id=100 paid=100.3
    id=103 paid=87.0
    id=201 paid=33.5
  36. Primary Key: Hash / Hash + Range

    Hash
    id=100 paid=100.3
    id=103 paid=87.0
    id=201 paid=33.5
    Hash + Range
    id=100 date=2012-09-18 paid=10.66
    id=100 date=2012-09-16 paid=71.0
    id=103 date=2012-09-10 paid=23.6
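
The idea behind the two key types can be sketched with an in-memory model. This is not how DynamoDB is implemented internally: the partition count, the use of MD5 and the class `HashRangeTable` are all assumptions made for illustration. The hash key picks a partition, and items sharing a hash key are kept sorted by range key so that a range condition becomes a cheap slice.

```python
import bisect
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption for illustration; DynamoDB manages partitions itself


def partition_for(hash_key):
    """Map a hash key to a partition with a stable hash (MD5 is an arbitrary choice)."""
    digest = hashlib.md5(str(hash_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class HashRangeTable:
    """Toy table with a Hash + Range primary key."""

    def __init__(self):
        # partition -> hash key -> list of (range_key, item), sorted by range_key
        self._partitions = defaultdict(lambda: defaultdict(list))

    def put_item(self, hash_key, range_key, item):
        rows = self._partitions[partition_for(hash_key)][hash_key]
        keys = [rk for rk, _ in rows]
        pos = bisect.bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)      # same primary key: overwrite
        else:
            rows.insert(pos, (range_key, item))

    def query(self, hash_key, range_greater_than=None):
        """Exact match on the hash key, optional '>' condition on the range key."""
        rows = self._partitions[partition_for(hash_key)].get(hash_key, [])
        if range_greater_than is None:
            return [item for _, item in rows]
        keys = [rk for rk, _ in rows]
        start = bisect.bisect_right(keys, range_greater_than)
        return [item for _, item in rows[start:]]


if __name__ == "__main__":
    orders = HashRangeTable()
    orders.put_item(100, "2012-09-18", {"paid": 10.66})
    orders.put_item(100, "2012-09-16", {"paid": 71.0})
    orders.put_item(103, "2012-09-10", {"paid": 23.6})
    print(orders.query(100, range_greater_than="2012-09-16"))
```

The sample rows are the ones from the slide; ISO-formatted dates are used as range keys because they sort correctly as plain strings.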
  37. Table with “Hash + Range” Primary Key (Ruby)

    dynamo.create_table("Activity", {
      HashKeyElement: {AttributeName: "user", AttributeType: "S"},
      RangeKeyElement: {AttributeName: "created", AttributeType: "N"}},
      {ReadCapacityUnits: 5, WriteCapacityUnits: 5})

    dynamo.put_item("Activity", {
      user: {S: "roidrage"},
      created: {N: Time.now.tv_sec.to_s},
      activity: {S: "Checked in"}})

    items = activities.items.query(
      hash_key: "roidrage",
      range_greater_than: (Time.now - 85600).tv_sec)
  38. Table with “Hash + Range” Primary Key (Ruby)

    # Create table
    dynamo.create_table("Activity", {
      HashKeyElement: {AttributeName: "user", AttributeType: "S"},
      RangeKeyElement: {AttributeName: "created", AttributeType: "N"}},
      {ReadCapacityUnits: 5, WriteCapacityUnits: 5})

    # Put item
    dynamo.put_item("Activity", {
      user: {S: "roidrage"},
      created: {N: Time.now.tv_sec.to_s},
      activity: {S: "Checked in"}})

    # Query
    items = activities.items.query(
      hash_key: "roidrage",
      range_greater_than: (Time.now - 85600).tv_sec)
  39. Query / Scan

    Query
    • Searches only on the primary key (hash or composite).
    • Supports a subset of comparison operators on key attribute values.
    • Returns up to 1 MB of data per Query operation.
    • More efficient than Scan.
    http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
  40. Query / Scan

    Scan
    • Scans the entire table.
    • Supports a specific set of comparison operators (e.g. <=, >, ==).
    • Returns up to 1 MB of data per Scan operation.
    • Slower for bigger tables.
    Query
    • Searches only on the primary key (hash or composite).
    • Supports a subset of comparison operators on key attribute values.
    • Returns up to 1 MB of data per Query operation.
    • More efficient than Scan.
    http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
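
The efficiency difference can be illustrated with a small in-memory stand-in that counts how many items each operation has to read. `TinyTable` and its `items_read` counter are hypothetical names for this sketch, and the 1 MB page limit is not modeled.

```python
from collections import defaultdict


class TinyTable:
    """In-memory stand-in for a table with a hash key, counting items read."""

    def __init__(self, hash_key_name):
        self.hash_key_name = hash_key_name
        self._by_hash = defaultdict(list)
        self.items_read = 0

    def put_item(self, item):
        self._by_hash[item[self.hash_key_name]].append(item)

    def query(self, hash_value):
        """Touch only the items stored under the given hash key."""
        matches = self._by_hash.get(hash_value, [])
        self.items_read += len(matches)
        return list(matches)

    def scan(self, predicate):
        """Read every item in the table and filter afterwards."""
        results = []
        for items in self._by_hash.values():
            for item in items:
                self.items_read += 1
                if predicate(item):
                    results.append(item)
        return results


if __name__ == "__main__":
    messages = TinyTable("forum")
    messages.put_item({"forum": "AWS forum", "subject": "Hello!"})
    messages.put_item({"forum": "AWS forum", "subject": "Goodbye!"})
    messages.put_item({"forum": "Other forum", "subject": "Hi"})

    messages.query("AWS forum")
    print("items read by query:", messages.items_read)

    messages.items_read = 0
    messages.scan(lambda item: item["forum"] == "AWS forum")
    print("items read by scan:", messages.items_read)
```

Both calls return the same two messages, but the scan reads the whole table to find them, and that gap widens as the table grows.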
  41. Query vs. Scan?

    “Limiting the capabilities of Query's comparison operators was a deliberate decision, to ensure that this API's performance will always remain predictable, no matter the scale of the table (size or throughput). ...”
    Stefano @ AWS (on discussion forums)
  42. Query vs. Scan?

    “Limiting the capabilities of Query's comparison operators was a deliberate decision, to ensure that this API's performance will always remain predictable, no matter the scale of the table (size or throughput). This limitation forces the developer to perform more work upfront, but it will yield a scalable workload no matter how much it grows. ...”
    Stefano @ AWS (on discussion forums)
  43. Query vs. Scan?

    “Limiting the capabilities of Query's comparison operators was a deliberate decision, to ensure that this API's performance will always remain predictable, no matter the scale of the table (size or throughput). This limitation forces the developer to perform more work upfront, but it will yield a scalable workload no matter how much it grows. An operation like CONTAINS could seem appealing on paper, but its performance would start slowing progressively as the dataset size grows, eventually requiring a painful rearchitecture down the road.”
    Stefano @ AWS (on discussion forums)
  44. Writing to DynamoDB

    How to solve the “Lost update”: Optimistic Concurrency Control (a.k.a. Conditional Writes)
    Put/Update/Delete are always ACID; “Isolation” only at Item level
    Atomicity, Consistency, Isolation, Durability (only at Item)
    http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
  45. The “Lost update” concurrency issue

    Diagram labels: DynamoDB, Client 1, Client 2, Time
    Operations: GetItem, GetItem
    Values shown: Id=1 Price=10, Id=1 Price=10, Id=1 Price=10
  46. The “Lost update” concurrency issue

    Diagram labels: DynamoDB, Client 1, Client 2, Time
    Operations: GetItem, GetItem, PutItem, PutItem
    Values shown: Id=1 Price=10, Id=1 Price=10, Id=1 Price=10, Id=1 Price=10, Id=1 Price=12, Id=1 Price=10, Id=1 Price=12, Id=1 Price=12, Id=1 Price=8, Id=1 Price=10, Id=1 Price=8
  47. How to fix it with Conditional Writes

    Diagram labels: DynamoDB, Client 1, Client 2, Time
    Operations: GetItem, GetItem
    Values shown: Id=1 Price=10, Id=1 Price=10, Id=1 Price=10
  48. How to fix it with Conditional Writes

    Diagram labels: DynamoDB, Client 1, Client 2, Time
    Operations: GetItem, GetItem
    Values shown: Id=1 Price=10, Id=1 Price=10, Id=1 Price=10, Id=1 Price=10, Id=1 Price=12, Id=1 Price=10, Id=1 Price=8
  49. How to fix it with Conditional Writes

    Diagram labels: DynamoDB, Client 1, Client 2, Time
    Operations: GetItem, GetItem, if (Price=10) PutItem
    Values shown: Id=1 Price=10, Id=1 Price=10, Id=1 Price=10, Id=1 Price=10, Id=1 Price=12, Id=1 Price=10, Id=1 Price=12, Id=1 Price=10, Id=1 Price=8
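
The scenario in these diagrams can be replayed with a small in-memory simulation. `TinyItemStore`, its `expected` parameter and the `ConditionalCheckFailed` exception are hypothetical names chosen for this sketch, not a real client library; they only mimic the "write if the stored value is still what I read" behaviour of a conditional write.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored item no longer matches the expected values."""


class TinyItemStore:
    def __init__(self):
        self._items = {}

    def get_item(self, key):
        return dict(self._items[key])

    def put_item(self, key, item, expected=None):
        """Write the item; if `expected` is given, write only when it matches."""
        if expected is not None:
            current = self._items.get(key, {})
            for attr, value in expected.items():
                if current.get(attr) != value:
                    raise ConditionalCheckFailed(f"{attr} is no longer {value!r}")
        self._items[key] = dict(item)


def two_clients(conditional):
    """Replay the slide: both clients read Price=10, then write 12 and 8."""
    store = TinyItemStore()
    store.put_item(1, {"Id": 1, "Price": 10})
    seen_by_1 = store.get_item(1)
    seen_by_2 = store.get_item(1)
    guard_1 = {"Price": seen_by_1["Price"]} if conditional else None
    guard_2 = {"Price": seen_by_2["Price"]} if conditional else None

    store.put_item(1, {"Id": 1, "Price": 12}, expected=guard_1)
    try:
        store.put_item(1, {"Id": 1, "Price": 8}, expected=guard_2)
        rejected = False
    except ConditionalCheckFailed:
        rejected = True
    return store.get_item(1)["Price"], rejected


if __name__ == "__main__":
    print(two_clients(conditional=False))  # client 1's update is silently lost
    print(two_clients(conditional=True))   # client 2 is rejected and can re-read
```

With the condition in place the second writer learns that its view is stale, and can read the item again and retry instead of overwriting client 1's price.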
  50. Conditional Writes or Atomic Counters?

    Conditional Writes
    • Idempotent operation
    • Small overhead
    Atomic Counters
    • Increment/Decrement
    • Allow simultaneous write requests
    • NOT Idempotent
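
The idempotency difference matters when a request is retried, for example after a timeout where the client cannot tell whether the first attempt succeeded. The sketch below uses a hypothetical in-memory `Counter` class to contrast the two styles.

```python
class Counter:
    """Toy attribute supporting both update styles from the slide."""

    def __init__(self, value=0):
        self.value = value

    def add(self, delta):
        """Atomic-counter style: always applies, so a retry applies again."""
        self.value += delta
        return self.value

    def set_if(self, expected, new_value):
        """Conditional-write style: applies only if the value is still `expected`."""
        if self.value != expected:
            return False
        self.value = new_value
        return True


if __name__ == "__main__":
    views = Counter(10)
    views.add(1)
    views.add(1)            # retry of the same request: counted twice
    print(views.value)

    price = Counter(10)
    price.set_if(10, 11)
    price.set_if(10, 11)    # retry of the same request: no further effect
    print(price.value)
```

Atomic counters therefore suit values where an occasional over- or under-count is tolerable, such as page views, while conditional writes suit values that must be exact.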
  51. (Eventually) Consistent Reads

    Consistent Read
    • Consumes more “read capacity” units (2x)
    • Consistency reached within 1,000 ms after last write
  52. (Eventually) Consistent Reads

    Consistent Read
    • Consumes more “read capacity” units (2x)
    • Consistency reached within 1,000 ms after last write
    Eventually Consistent Read
    • Can read immediately after a write (2 copies)
    • Read old or new value
  53. (Eventually) Consistent Reads

    Consistent Read
    • Consumes more “read capacity” units (2x)
    • Consistency reached within 1,000 ms after last write
    Eventually Consistent Read
    • Can read immediately after a write (2 copies)
    • Read old or new value (2 copies)
  54. (Eventually) Consistent Reads

    Consistent Read
    • Consumes more “read capacity” units (2x)
    • Consistency reached within 1,000 ms after last write
    Eventually Consistent Read
    • Can read immediately after a write (2 copies)
    • Read old or new value (2 copies)
    Let me explain...
  55. Example: Consistent Read (HTTP request)

    // This header is abbreviated.
    POST / HTTP/1.1
    x-amz-target: DynamoDB_20111205.GetItem
    content-type: application/x-amz-json-1.0

    {"TableName":"comptable",
     "Key": {
       "HashKeyElement":{"S":"Julie"},
       "RangeKeyElement":{"N":"1307654345"}},
     "AttributesToGet":["status","friends"],
     "ConsistentRead":true
    }
  56. Example: Consistent Read (HTTP request)

    // This header is abbreviated.
    POST / HTTP/1.1
    x-amz-target: DynamoDB_20111205.GetItem
    content-type: application/x-amz-json-1.0

    {"TableName":"comptable",
     "Key": {
       "HashKeyElement":{"S":"Julie"},
       "RangeKeyElement":{"N":"1307654345"}},
     "AttributesToGet":["status","friends"],
     "ConsistentRead":true
    }
    Label on "ConsistentRead":true: Consistent Read
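
For readers who want to see how such a request body is put together, here is a minimal Python sketch that builds the same headers and JSON payload as the slide. The function name `build_get_item_request` is hypothetical, and the sketch stops short of signing and sending the request, just as the slide abbreviates the headers.

```python
import json


def build_get_item_request(table, hash_value, range_value, attributes, consistent):
    """Return (headers, body) for a GetItem call in the wire format shown above."""
    headers = {
        "x-amz-target": "DynamoDB_20111205.GetItem",
        "content-type": "application/x-amz-json-1.0",
    }
    body = {
        "TableName": table,
        "Key": {
            "HashKeyElement": {"S": hash_value},
            "RangeKeyElement": {"N": str(range_value)},  # numbers travel as strings
        },
        "AttributesToGet": list(attributes),
        # Set to False (or omit) for an eventually consistent read.
        "ConsistentRead": bool(consistent),
    }
    return headers, json.dumps(body)


if __name__ == "__main__":
    headers, body = build_get_item_request(
        "comptable", "Julie", 1307654345, ["status", "friends"], consistent=True
    )
    print(headers)
    print(body)
```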
  57. DynamoDB APIs

    Table
    • CreateTable
    • UpdateTable
    • DeleteTable
    • DescribeTable
    • ListTables
    Item
    • PutItem
    • GetItem
    • UpdateItem
    • DeleteItem
    • BatchGetItem
    • BatchWriteItem
    Query/Scan
    • Query
    • Scan
  58. Monitoring DynamoDB with CloudWatch

    (image) Metrics shown:
    • Successful Request Latency
    • Consumed Read Capacity Units
    • Throttled Requests
    • User Errors
    • Returned Item Count
    • System Errors
    • Consumed Write Capacity Units
  59. A simple example

    Let’s take a look. How to do things with Python and the BOTO library?
  60. We are going to use this schema...

    Table: read_units=5, write_units=5
    forum= (hash key)
    subject= (range key)
  61. ... to create a table, and add items.

    Messages
    forum="AWS forum" subject="Hello!" Body="http://127.0.0.1/hello.gif" SentBy="Simone"
    forum="AWS forum" subject="Goodbye!" Body="Nice meeting with you!" SentBy="Simone"
  62. Amazon DynamoDB libraries, mappers, etc.

    (image) Perl, Javascript, Erlang, Node.js, Java, Django, PHP, Ruby, Python, .NET, Groovy / Grails, Cold Fusion
  63. DynamoDB costs

    • 0.0065 $/h per 10 writes/second
    • 0.0065 $/h per 50 “strong” reads/second
    • 1.00 $/month per GB
  64. DynamoDB costs

    • 0.0065 $/h per 10 writes/second
    • 0.0065 $/h per 50 “strong” reads/second
    • 1.00 $/month per GB
    Unlike Scan, Query only operates on matching records, not all records. You only pay for the throughput of the items that match, not for everything scanned.
  65. DynamoDB costs

    • 0.0065 $/h per 10 writes/second
    • 0.0065 $/h per 50 “strong” reads/second
    • 1.00 $/month per GB
    For large BLOBs or infrequently accessed data, use Amazon S3 (DynamoDB item limit: 64 KB). You can store smaller data elements or file pointers in DynamoDB.
  66. DynamoDB costs

    • 0.0065 $/h per 10 writes/second
    • 0.0065 $/h per 50 “strong” reads/second
    • 1.00 $/month per GB
    DynamoDB Free tier:
    • 5 writes/second
    • 10 consistent reads/second
    • 100 MB storage
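
As a worked example of the price list, the sketch below estimates a monthly bill from the figures on the slide. Several things here are assumptions rather than facts from the talk: a 730-hour month, rounding partial throughput blocks up to whole blocks, and ignoring the free tier. The prices are those quoted in April 2013 and may not reflect current pricing.

```python
import math

# Prices as quoted on the slides (April 2013).
WRITE_BLOCK_PRICE_PER_HOUR = 0.0065  # per 10 writes/second
READ_BLOCK_PRICE_PER_HOUR = 0.0065   # per 50 strongly consistent reads/second
STORAGE_PRICE_PER_GB_MONTH = 1.00
HOURS_PER_MONTH = 730                # assumption: average month length


def monthly_cost(writes_per_sec, strong_reads_per_sec, storage_gb):
    """Rough monthly estimate in dollars; ignores the free tier.

    Assumption: throughput is billed in whole blocks of 10 writes/second
    and 50 strong reads/second, so partial blocks are rounded up.
    """
    write_blocks = math.ceil(writes_per_sec / 10)
    read_blocks = math.ceil(strong_reads_per_sec / 50)
    hourly = (write_blocks * WRITE_BLOCK_PRICE_PER_HOUR
              + read_blocks * READ_BLOCK_PRICE_PER_HOUR)
    total = hourly * HOURS_PER_MONTH + storage_gb * STORAGE_PRICE_PER_GB_MONTH
    return round(total, 2)


if __name__ == "__main__":
    # 100 writes/s, 500 strong reads/s, 20 GB stored.
    print(monthly_cost(100, 500, 20))
```

For that workload the arithmetic is 10 write blocks plus 10 read blocks at 0.0065 $/h each, giving 0.13 $/h, or about 94.90 $ over 730 hours, plus 20 $ of storage.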
  67. James Hamilton, VP and Distinguished Engineer, Amazon Web Services:

    “But what if the server / storage / datacenter fails? On most NoSQL systems you would lose your most recent changes, or the data might be saved but could be offline and unavailable. ...”
  68. James Hamilton, VP and Distinguished Engineer, Amazon Web Services:

    “But what if the server / storage / datacenter fails? On most NoSQL systems you would lose your most recent changes, or the data might be saved but could be offline and unavailable. With DynamoDB, if data is committed just as one entire datacenter burns to the ground, the data is safe, and the application can continue to run without negative impact at exactly the same provisioned throughput rate. The loss of an entire datacenter isn’t even inconvenient, and has no impact on your running application performance. ...”
  69. James Hamilton, VP and Distinguished Engineer, Amazon Web Services:

    “But what if the server / storage / datacenter fails? On most NoSQL systems you would lose your most recent changes, or the data might be saved but could be offline and unavailable. With DynamoDB, if data is committed just as one entire datacenter burns to the ground, the data is safe, and the application can continue to run without negative impact at exactly the same provisioned throughput rate. The loss of an entire datacenter isn’t even inconvenient, and has no impact on your running application performance. Combining rock solid synchronous, multi-datacenter redundancy with average latency in the single digits, and throughput scaling to the millions of requests per second is both an excellent engineering challenge and one often not achieved.”
  70. Why DynamoDB

    • Sorted range keys
    • Conditional updates
    • Atomic counters
    • Structured data and multi-valued data types
    • Fetching and updating single attributes
    • Strong consistency
    • No table size limits
    • Live repartitioning
    • Disk-only writes
    • IOPS per table
    No explicit way to handle conflicts other than conditions
  71. Amazon DynamoDB

    Simone Brunozzi (@simon), Senior Technology Evangelist, Amazon Web Services
    v 4.0 - Apr 20th, 2013
    http://bit.ly/dynamodb2013