Introduction to Amazon DynamoDB

Slide 1

Slide 1 text

Amazon DynamoDB
Simone Brunozzi (@simon), Senior Technology Evangelist, Amazon Web Services
v 4.0 - Apr 20th, 2013
http://bit.ly/dynamodb2013

Slide 2

Slide 2 text

No-S-Q-What?

Slide 3

Slide 3 text

Who invented “NoSQL”?

Slide 4

Slide 4 text

Who invented “NoSQL”?
“NoSQL” was conceived in 1998 by Carlo Strozzi (Italy).

Slide 5

Slide 5 text

NoSQL 4 Scaling Structured Storage

Slide 6

Slide 6 text

NoSQL 4 Scaling Feature first •Financial, CRM, Human resources •Dominated by RDBMS Structured Storage

Slide 7

Slide 7 text

NoSQL 4 Scaling Scale first •Facebook, Gmail, Amazon.com, Twitter •RDBMS + Key-Value Feature first •Financial, CRM, Human resources •Dominated by RDBMS Structured Storage

Slide 8

Slide 8 text

NoSQL 4 Scaling Simple structured •BLOB-store not enough •Need query/index •BerkeleyDB/SimpleDB Scale first •Facebook, Gmail, Amazon.com, Twitter •RDBMS + Key-Value Feature first •Financial, CRM, Human resources •Dominated by RDBMS Structured Storage

Slide 9

Slide 9 text

NoSQL, Scaling, Structured Storage
Feature first
• Financial, CRM, Human resources
• Dominated by RDBMS
Scale first
• Facebook, Gmail, Amazon.com, Twitter
• RDBMS + Key-Value
Simple structured
• BLOB-store not enough
• Need query/index
• BerkeleyDB/SimpleDB
Purpose-optimized
• StreamBase, Vertica, VoltDB, Aster Data, Netezza, Greenplum

Slide 10

Slide 10 text

Slide 11

Slide 11 text

Structured Storage 5 Scaling NoSQL

Slide 12

Slide 12 text

Structured Storage 5 Scaling Document store •Popular: MongoDB, CouchDB •Semi-structured data (XML, JSON, etc) NoSQL

Slide 13

Slide 13 text

Structured Storage 5 Scaling Document store •Popular: MongoDB, CouchDB •Semi-structured data (XML, JSON, etc) Key-Value store •Popular: Cassandra, Redis •Schema-less NoSQL

Slide 14

Slide 14 text

Structured Storage, Scaling, NoSQL
Document store
• Popular: MongoDB, CouchDB
• Semi-structured data (XML, JSON, etc.)
Key-Value store
• Popular: Cassandra, Redis
• Schema-less
Graph Database
• Popular: Neo4J, FlockDB
• Stores the relationship of data as a graph

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Slide 17

Slide 17 text

NoSQL 6 Structured Storage Scaling

Slide 18

Slide 18 text

NoSQL 6 Structured Storage Easier than “YesSQL” ? •NoSQL is better for simple queries, Primary Key lookups •No maintenance windows Scaling

Slide 19

Slide 19 text

NoSQL 6 Structured Storage Durability •Synchronous replication •Built-in durability Easier than “YesSQL” ? •NoSQL is better for simple queries, Primary Key lookups •No maintenance windows Scaling

Slide 20

Slide 20 text

NoSQL, Structured Storage, Scaling
Easier than “YesSQL”?
• NoSQL is better for simple queries, Primary Key lookups
• No maintenance windows
Durability
• Synchronous replication
• Built-in durability
Evolution
• MySQL: HandlerSocket
• PostgreSQL 9.2: index-only scan
• SE PostgreSQL
“view leakage”, etc.

Slide 21

Slide 21 text

Why scalability is important ] [ 7 traditional IT capacity Your IT needs Time Capacity

Slide 22

Slide 22 text

8 Usage patterns: traditional IT ] [

Slide 23

Slide 23 text

8 On and Off Fast Growth Variable peaks Predictable peaks Usage patterns: traditional IT ] [

Slide 24

Slide 24 text

9 Usage patterns: traditional IT ] [ Variable peaks Fast Growth Predictable peaks On and Off

Slide 25

Slide 25 text

9 Poor Service WASTE Usage patterns: traditional IT ] [ Variable peaks Fast Growth Predictable peaks On and Off

Slide 26

Slide 26 text

10 Elastic CLOUD capacity traditional IT capacity Your IT needs Usage patterns: Cloud Computing ] [ Time Capacity

Slide 27

Slide 27 text

11 Usage patterns: Cloud Computing ] [ Variable peaks Fast Growth Predictable peaks On and Off

Slide 28

Slide 28 text


Slide 29

Slide 29 text

A closer look at DynamoDB

Slide 30

Slide 30 text

13 DynamoDB: Speeeed ] [ (image)

Slide 31

Slide 31 text

13 DynamoDB: Speeeed ] [ Scale to 100,000+ Writes/second (image)

Slide 32

Slide 32 text

Eventually consistent Key-value store Unstructured NoSQL Horizontally scalable Non-Relational Schema-free Distributed DynamoDB keywords

Slide 33

Slide 33 text

DynamoDB ] [ 15 DynamoDB

Slide 34

Slide 34 text

DynamoDB ] [ 15 NoSQL • No schema (only Key) • Hash / Hash + Range • Local Secondary Index DynamoDB

Slide 35

Slide 35 text

DynamoDB
NoSQL
• No schema (only Key)
• Hash / Hash + Range
• Local Secondary Index
Speeeed
• Provisioned throughput
• Auto storage scaling
• “Shared nothing”
• Low latency (<10ms Wr)
• Solid State Drives (SSD)
• IOPS per Table

Slide 36

Slide 36 text

Slide 37

Slide 37 text

How do I... Create a table?

Slide 38

Slide 38 text

Creating a table with the Java low-level API
    client = new AmazonDynamoDBClient(credentials);
    String tableName = "ProductCatalog";
    KeySchemaElement hashKey = new KeySchemaElement()
        .withAttributeName("Id")
        .withAttributeType("N");
    KeySchema ks = new KeySchema().withHashKeyElement(hashKey);
    ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput()
        .withReadCapacityUnits(10L)
        .withWriteCapacityUnits(10L);
    CreateTableRequest request = new CreateTableRequest()
        .withTableName(tableName)
        .withKeySchema(ks)
        .withProvisionedThroughput(provisionedThroughput);
    CreateTableResult result = client.createTable(request);

Slide 39

Slide 39 text

Slide 40

Slide 40 text

Creating a table with BOTO library (Python)
    >>> message_table_schema = conn.create_schema(
    ...     hash_key_name='forum',
    ...     hash_key_proto_value='S',
    ...     range_key_name='subject',
    ...     range_key_proto_value='S'
    ... )
    >>> table = conn.create_table(
    ...     name='messages',
    ...     schema=message_table_schema,
    ...     read_units=5,
    ...     write_units=5
    ... )

Slide 41

Slide 41 text

Slide 42

Slide 42 text

Deleting a table? Careful... ] [ 19 >>> conn.delete_table(table)

Slide 43

Slide 43 text

Deleting a table? Careful...
    >>> conn.delete_table(table)
Permanently deletes the table!
I suggest using THREE different users:
1. Dev/Test
2. Production
3. Read-only
You can manage permissions with IAM.

Slide 44

Slide 44 text

Managing DynamoDB permissions with IAM
    {
      "Statement": [
        {
          "Action": [
            "dynamodb:GetItem",
            "dynamodb:BatchGetItem",
            "dynamodb:Query",
            "dynamodb:Scan",
            "dynamodb:DescribeTable",
            "dynamodb:ListTables"
          ],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    }
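A minimal Python sketch of generating the read-only policy above as a document. The helper name `read_only_policy` and its `resource` parameter are illustrative assumptions, not part of any AWS library; only the action names and the policy shape come from the slide.

```python
import json

# Read-only DynamoDB actions, copied from the policy on this slide.
READ_ONLY_ACTIONS = [
    "dynamodb:GetItem",
    "dynamodb:BatchGetItem",
    "dynamodb:Query",
    "dynamodb:Scan",
    "dynamodb:DescribeTable",
    "dynamodb:ListTables",
]


def read_only_policy(resource="*"):
    """Build the policy as a plain dict (hypothetical helper)."""
    return {
        "Statement": [
            {
                "Action": list(READ_ONLY_ACTIONS),
                "Effect": "Allow",
                "Resource": resource,
            }
        ]
    }


def policy_json(resource="*"):
    """Serialize the policy so it can be pasted into the IAM console."""
    return json.dumps(read_only_policy(resource), indent=2)
```

Keeping the action list in one place makes it easy to check that no write action (for example `dynamodb:DeleteTable`) slips into the read-only user.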

Slide 45

Slide 45 text

Slide 46

Slide 46 text

Managing DynamoDB permissions with IAM ] [ 21 { "Statement": [ { "Action": [ "dynamodb:*" ], "Effect": "Allow", "Resource": "*" } ] }

Slide 47

Slide 47 text

Managing DynamoDB permissions with IAM
Full-access User
    {
      "Statement": [
        {
          "Action": [
            "dynamodb:*"
          ],
          "Effect": "Allow",
          "Resource": "*"
        }
      ]
    }

Slide 48

Slide 48 text

22 AWS Management Console: IAM (Identity and Access Mgmt) (image)

Slide 49

Slide 49 text

Data Model

Slide 50

Slide 50 text

DynamoDB Data Model ] [

Slide 51

Slide 51 text

DynamoDB Data Model ] [ Table(s) Item(s) Attribute(s)

Slide 52

Slide 52 text

Example: Table, Items, Attributes ] [

Slide 53

Slide 53 text

Products Example: Table, Items, Attributes ] [ Table You must specify what type of Primary Key to use: “Hash”, or “Hash + Range”

Slide 54

Slide 54 text

Products Example: Table, Items, Attributes ] [

Slide 55

Slide 55 text

Products Example: Table, Items, Attributes ] [ Item Item Item

Slide 56

Slide 56 text

Products Example: Table, Items, Attributes ] [

Slide 57

Slide 57 text

Products
Example: Table, Items, Attributes
id="301" id="201" Author="Simone", "John", "Erin" Title="PHP basics" Title="Learn C++" ISBN="122938" id="101" Price="15.50" Cat="Bicycle"

Slide 58

Slide 58 text

Slide 59

Slide 59 text

Slide 60

Slide 60 text

DynamoDB Data Types ] [

Slide 61

Slide 61 text

DynamoDB Data Types
Scalar
• Number (+/-, 38 digits)
• String (UTF-8)
• Binary
Multi-valued
• Number Set
• String Set
• Binary Set
• Values in a set must be unique.
• Values not ordered.
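The six types above can be sketched as a small Python encoder that tags each value with a type code, in the same style as the `{"S": ...}` and `{"N": ...}` values used in the HTTP request later in this deck. The function name `to_attribute_value` is a hypothetical helper, and sorting set members is only there to make the output stable, since set values are not ordered.

```python
import base64


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _b64(value):
    return base64.b64encode(value).decode("ascii")


def to_attribute_value(value):
    """Tag a Python value with its type code (hypothetical helper)."""
    if _is_number(value):
        return {"N": str(value)}          # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": _b64(value)}         # binary is base64-encoded
    if isinstance(value, (set, frozenset)) and value:
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(_is_number(v) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": [_b64(v) for v in sorted(value)]}
    raise TypeError("unsupported value: %r" % (value,))
```

A Python `set` already guarantees that members are unique, which matches the first rule for multi-valued attributes.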

Slide 62

Slide 62 text

Primary Key: Hash / Hash + Range ] [ 30

Slide 63

Slide 63 text

Primary Key: Hash / Hash + Range ] [ 30 Hash The key is hashed over the different partitions to optimize workload distribution

Slide 64

Slide 64 text

Primary Key: Hash / Hash + Range
Hash: The key is hashed over the different partitions to optimize workload distribution.
Hash + Range: When querying, the hash attribute must be matched exactly, but a range operation can be specified for the range attribute (e.g. all orders in the last 60 minutes).
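The two key types can be illustrated with a toy in-memory table in Python. This is a sketch under stated assumptions, not how DynamoDB is implemented: the class `HashRangeTable`, the SHA-256 partition choice and the partition count are all made up for illustration. It only shows the idea that the hash key selects a partition and the range key is kept sorted so a range condition can be answered without touching other keys.

```python
import bisect
import hashlib
from collections import defaultdict


class HashRangeTable:
    """Toy table keyed by (hash key, range key)."""

    def __init__(self, partitions=4):
        # Each partition maps hash key -> list of (range key, item), sorted.
        self.partitions = [defaultdict(list) for _ in range(partitions)]

    def partition_index(self, hash_key):
        digest = hashlib.sha256(str(hash_key).encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.partitions)

    def put(self, hash_key, range_key, item):
        rows = self.partitions[self.partition_index(hash_key)][hash_key]
        keys = [row[0] for row in rows]
        pos = bisect.bisect_left(keys, range_key)
        if pos < len(rows) and rows[pos][0] == range_key:
            rows[pos] = (range_key, item)      # same primary key: overwrite
        else:
            rows.insert(pos, (range_key, item))

    def query(self, hash_key, range_greater_than):
        """Exact match on the hash key, range condition on the range key."""
        rows = self.partitions[self.partition_index(hash_key)].get(hash_key, [])
        keys = [row[0] for row in rows]
        start = bisect.bisect_right(keys, range_greater_than)
        return [item for _, item in rows[start:]]
```

For example, after storing payments under `id=100` with dates `"2012-09-16"` and `"2012-09-18"`, `query(100, "2012-09-17")` returns only the later payment.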

Slide 65

Slide 65 text

Primary Key: Hash / Hash + Range ] [ 31

Slide 66

Slide 66 text

Primary Key: Hash / Hash + Range ] [ 31 id=100 paid=100.3 id=103 paid=87.0 id=201 paid=33.5 Hash

Slide 67

Slide 67 text

Primary Key: Hash / Hash + Range
Hash
• id=100 paid=100.3
• id=103 paid=87.0
• id=201 paid=33.5
Hash + Range
• id=100 date=2012-09-18 paid=10.66
• id=100 date=2012-09-16 paid=71.0
• id=103 date=2012-09-10 paid=23.6

Slide 68

Slide 68 text

Table with “Hash + Range” Primary Key (Ruby)
    dynamo.create_table("Activity", {
      HashKeyElement: {AttributeName: "user", AttributeType: "S"},
      RangeKeyElement: {AttributeName: "created", AttributeType: "N"}},
      {ReadCapacityUnits: 5, WriteCapacityUnits: 5})
    dynamo.put_item("Activity", {
      user: {S: "roidrage"},
      created: {N: Time.now.tv_sec.to_s},
      activity: {S: "Checked in"}})
    items = activities.items.query(
      hash_key: "roidrage",
      range_greater_than: (Time.now - 85600).tv_sec)

Slide 69

Slide 69 text

Table with “Hash + Range” Primary Key (Ruby)
    # Create table
    dynamo.create_table("Activity", {
      HashKeyElement: {AttributeName: "user", AttributeType: "S"},
      RangeKeyElement: {AttributeName: "created", AttributeType: "N"}},
      {ReadCapacityUnits: 5, WriteCapacityUnits: 5})
    # Put item
    dynamo.put_item("Activity", {
      user: {S: "roidrage"},
      created: {N: Time.now.tv_sec.to_s},
      activity: {S: "Checked in"}})
    # Query
    items = activities.items.query(
      hash_key: "roidrage",
      range_greater_than: (Time.now - 85600).tv_sec)

Slide 70

Slide 70 text

Throughput

Slide 71

Slide 71 text

34 Scaling DynamoDB throughput (video)

Slide 72

Slide 72 text

Query / Scan

Slide 73

Slide 73 text

Query / Scan ] [ 36 http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html

Slide 74

Slide 74 text

Query / Scan ] [ 36 Query • Search only on primary key (hash or composite) • Supports a subset of comparison operators on key attribute values. • Returns 1 MB per Query operation. • More efficient than Scan. http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html

Slide 75

Slide 75 text

Query / Scan
Query
• Search only on primary key (hash or composite)
• Supports a subset of comparison operators on key attribute values.
• Returns 1 MB per Query operation.
• More efficient than Scan.
Scan
• Scans the entire table
• Supports a specific set of comparison operators (e.g. <=, >, ==).
• Returns 1 MB / Scan.
• Slower for bigger tables.
http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
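The efficiency difference can be sketched in Python with plain lists and dicts. This is an in-memory illustration only: `scan`, `query` and `build_index` are hypothetical helpers, not DynamoDB calls, and the count of items examined stands in for the work each operation has to do.

```python
def build_index(items):
    """Group items by hash key, as a stand-in for the primary key index."""
    index = {}
    for item in items:
        index.setdefault(item["id"], []).append(item)
    return index


def scan(items, predicate):
    """Examine every item; return (matches, number of items examined)."""
    matches = [item for item in items if predicate(item)]
    return matches, len(items)


def query(index, hash_key, range_predicate=lambda value: True):
    """Examine only the items stored under one hash key."""
    candidates = index.get(hash_key, [])
    matches = [item for item in candidates if range_predicate(item["date"])]
    return matches, len(candidates)
```

With 1,000 items spread evenly over 100 hash keys, a scan examines all 1,000 items, while a query for one hash key examines only the 10 items stored under it; both return the same matches.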

Slide 76

Slide 76 text

37 Limiting the capabilities of Query's comparison operators was a deliberate decision, to ensure that this API's performance will always remain predictable, no matter the scale of the table (size or throughput). ... Stefano @ AWS (on discussion forums) Query vs. Scan? ] [

Slide 77

Slide 77 text

38 Limiting the capabilities of Query's comparison operators was a deliberate decision, to ensure that this API's performance will always remain predictable, no matter the scale of the table (size or throughput). This limitation forces the developer to perform more work upfront, but it will yield a scalable workload no matter how much it grows. ... Stefano @ AWS (on discussion forums) Query vs. Scan? ] [

Slide 78

Slide 78 text

Query vs. Scan?
Limiting the capabilities of Query's comparison operators was a deliberate decision, to ensure that this API's performance will always remain predictable, no matter the scale of the table (size or throughput).
This limitation forces the developer to perform more work upfront, but it will yield a scalable workload no matter how much it grows.
An operation like CONTAINS could seem appealing on paper, but its performance would start slowing progressively as the dataset size grows, eventually requiring a painful rearchitecture down the road.
Stefano @ AWS (on discussion forums)

Slide 79

Slide 79 text

Lost Update

Slide 80

Slide 80 text

Writing to DynamoDB ] [ 41 http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html

Slide 81

Slide 81 text

Slide 82

Slide 82 text

Writing to DynamoDB
Put/Update/Delete are always ACID; “Isolation” only at Item level:
• Atomicity
• Consistency
• Isolation (only at Item)
• Durability
How to solve the “Lost update”: Optimistic Concurrency Control (a.k.a. Conditional Writes)
http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html

Slide 83

Slide 83 text

DynamoDB The “Lost update” concurrency issue ] [ Client 1 Client 2 Id=1 Price=10 Time

Slide 84

Slide 84 text

DynamoDB The “Lost update” concurrency issue ] [ Client 1 Client 2 Id=1 Price=10 Id=1 Price=10 Id=1 Price=10 GetItem GetItem Time

Slide 85

Slide 85 text

DynamoDB The “Lost update” concurrency issue ] [ Client 1 Client 2 Id=1 Price=10 Id=1 Price=10 Id=1 Price=10 GetItem GetItem PutItem PutItem Time Id=1 Price=10 Id=1 Price=12 Id=1 Price=10 Id=1 Price=12 Id=1 Price=12 Id=1 Price=8 Id=1 Price=10 Id=1 Price=8

Slide 86

Slide 86 text

DynamoDB How to fix it with Conditional Writes ] [ Client 1 Client 2 Id=1 Price=10 Id=1 Price=10 Id=1 Price=10 GetItem GetItem Time

Slide 87

Slide 87 text

DynamoDB How to fix it with Conditional Writes ] [ Client 1 Client 2 Id=1 Price=10 Id=1 Price=10 Id=1 Price=10 GetItem GetItem Time Id=1 Price=10 Id=1 Price=12 Id=1 Price=10 Id=1 Price=8

Slide 88

Slide 88 text

DynamoDB How to fix it with Conditional Writes ] [ Client 1 Client 2 Id=1 Price=10 Id=1 Price=10 Id=1 Price=10 GetItem GetItem if (Price=10) PutItem Time Id=1 Price=10 Id=1 Price=12 Id=1 Price=10 Id=1 Price=12 Id=1 Price=10 Id=1 Price=8
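The lost-update scenario and the conditional-write fix can be replayed with a toy in-memory store in Python. `ToyStore`, `ConditionFailed` and `run` are hypothetical names used only for this sketch; the `expected` argument mimics the "if (Price=10)" condition from the slide and is not a real client signature.

```python
class ConditionFailed(Exception):
    """Raised when an expected attribute value no longer matches."""


class ToyStore:
    def __init__(self):
        self._items = {}

    def get_item(self, key):
        return dict(self._items[key])      # each client gets its own copy

    def put_item(self, key, item, expected=None):
        current = self._items.get(key, {})
        for name, value in (expected or {}).items():
            if current.get(name) != value:
                raise ConditionFailed("%s is no longer %r" % (name, value))
        self._items[key] = dict(item)


def run(conditional):
    """Replay the slide: two clients read Price=10, then both write."""
    store = ToyStore()
    store.put_item(1, {"Id": 1, "Price": 10})
    first = store.get_item(1)
    second = store.get_item(1)
    first["Price"] = 12
    second["Price"] = 8
    expected = {"Price": 10} if conditional else None
    store.put_item(1, first, expected)
    try:
        store.put_item(1, second, expected)
        rejected = False
    except ConditionFailed:
        rejected = True
    return store.get_item(1)["Price"], rejected
```

Without the condition, the second write silently replaces the first (final price 8). With it, the second write is rejected because the price is no longer 10, and that client has to re-read the item before trying again.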

Slide 89

Slide 89 text

Conditional Writes or Atomic Counters? ] [

Slide 90

Slide 90 text

Conditional Writes or Atomic Counters? ] [ Conditional Writes •Idempotent operation •Small overhead

Slide 91

Slide 91 text

Conditional Writes or Atomic Counters?
Conditional Writes
• Idempotent operation
• Small overhead
Atomic Counters
• Increment/Decrement
• Allow simultaneous write requests
• NOT idempotent
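The idempotency difference shows up when a request is retried, for example after a timeout where the client cannot tell whether the first attempt was applied. A minimal Python sketch on a plain dict, with hypothetical helper names `atomic_add` and `conditional_set`:

```python
def atomic_add(item, attribute, delta):
    """Atomic-counter style update: always applies the increment."""
    item[attribute] = item.get(attribute, 0) + delta


def conditional_set(item, attribute, expected, new_value):
    """Conditional-write style update: applies only if the value is as expected."""
    if item.get(attribute) != expected:
        return False
    item[attribute] = new_value
    return True


def retry_twice(operation):
    """Simulate a client that sends the same request twice."""
    return [operation(), operation()]
```

Retrying `atomic_add(item, "Views", 1)` on an item with `Views=10` leaves `Views=12`, so the retry is counted twice. Retrying `conditional_set(item, "Views", 10, 11)` leaves `Views=11`, because the second attempt fails its condition.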

Slide 92

Slide 92 text

(eventually) consistent read

Slide 93

Slide 93 text

(Eventually) Consistent Reads ] [

Slide 94

Slide 94 text

(Eventually) Consistent Reads ] [ Consistent Read •Consumes more “read capacity” units (2x) •Consistency reached within 1,000 ms after last write

Slide 95

Slide 95 text

(Eventually) Consistent Reads
Consistent Read
• Consumes more “read capacity” units (2x)
• Consistency reached within 1,000 ms after last write
Eventually Consistent Read
• Can read immediately after a write (2 copies)
• Read old or new value
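The 2x figure translates into simple arithmetic when sizing a table. The Python sketch below assumes that one read capacity unit covers one consistent read per second and that item size can be ignored; both are simplifications for illustration, and `read_units_needed` is a hypothetical helper.

```python
import math


def read_units_needed(reads_per_second, consistent):
    """Read capacity units for a given read rate.

    Assumes a consistent read costs twice as much as an eventually
    consistent one, as stated on the slide; item size is ignored.
    """
    units = reads_per_second if consistent else reads_per_second / 2
    return math.ceil(units)
```

Under these assumptions, 100 reads per second need 100 units when consistent and 50 units when eventually consistent.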

Slide 96

Slide 96 text

Slide 97

Slide 97 text

Slide 98

Slide 98 text

DynamoDB Durability in DynamoDB ] [ Client Time

Slide 99

Slide 99 text

DynamoDB Durability in DynamoDB ] [ Client Id=1 Price=10 Id=1 Price=10 PutItem Time Confirmation

Slide 100

Slide 100 text

Example: Consistent Read (HTTP request)
    // This header is abbreviated.
    POST / HTTP/1.1
    x-amz-target: DynamoDB_20111205.GetItem
    content-type: application/x-amz-json-1.0

    {"TableName":"comptable",
     "Key":
       {"HashKeyElement":{"S":"Julie"},
        "RangeKeyElement":{"N":"1307654345"}},
     "AttributesToGet":["status","friends"],
     "ConsistentRead":true
    }
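A short Python sketch that assembles the same request body and the two headers shown above. It only builds the data and does not send anything; request signing and the remaining headers are omitted, just as the slide abbreviates them. The function name `build_get_item_request` is a hypothetical helper.

```python
import json


def build_get_item_request(table, hash_value, range_value, attributes,
                           consistent_read=True):
    """Return (headers, body) for a GetItem call in the format on the slide."""
    headers = {
        "x-amz-target": "DynamoDB_20111205.GetItem",
        "content-type": "application/x-amz-json-1.0",
    }
    body = {
        "TableName": table,
        "Key": {
            "HashKeyElement": {"S": hash_value},
            "RangeKeyElement": {"N": str(range_value)},  # numbers sent as strings
        },
        "AttributesToGet": list(attributes),
        "ConsistentRead": consistent_read,
    }
    return headers, json.dumps(body)
```

Passing `consistent_read=False` produces the eventually consistent variant of the same request.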

Slide 101

Slide 101 text

Slide 102

Slide 102 text

Consistency Availability Partition Tolerance CAP Theorem ] [

Slide 103

Slide 103 text

APIs

Slide 104

Slide 104 text

DynamoDB APIs ] [

Slide 105

Slide 105 text

DynamoDB APIs
Table
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables
Item
• PutItem
• GetItem
• UpdateItem
• DeleteItem
• BatchGetItem
• BatchWriteItem
Query/Scan
• Query
• Scan

Slide 106

Slide 106 text

53 (image) Monitoring DynamoDB with CloudWatch ] [

Slide 107

Slide 107 text

Monitoring DynamoDB with CloudWatch (image)
• Successful Request Latency
• Consumed Read Capacity Units
• Consumed Write Capacity Units
• Throttled Requests
• User Errors
• System Errors
• Returned Item Count

Slide 108

Slide 108 text

A simple example ] [

Slide 109

Slide 109 text

A simple example ] [ Let’s take a look. How to do things with Python and the BOTO library?

Slide 110

Slide 110 text

55 Download and install the BOTO python library (Mac OS) (video)

Slide 111

Slide 111 text

56 Enter AWS credentials and connect to DynamoDB (video)

Slide 112

Slide 112 text

Table We are going to use this schema... ] [

Slide 113

Slide 113 text

Table We are going to use this schema... ] [ read_units=5 write_units=5 forum= subject= hash key range key

Slide 114

Slide 114 text

Messages ... to create a table, and add items. ] [

Slide 115

Slide 115 text

Messages
... to create a table, and add items.
Item 1: forum="AWS forum", subject="Hello!", Body="http://127.0.0.1/hello.gif", SentBy="Simone"
Item 2: forum="AWS forum", subject="Goodbye!", Body="Nice meeting with you!", SentBy="Simone"

Slide 116

Slide 116 text

59 Define Schema and Table, then create the Table (video)

Slide 117

Slide 117 text

60 Put an Item, retrieve the Item from the Table (video)

Slide 118

Slide 118 text

61 Adding another Item with the AWS Management Console (video)

Slide 119

Slide 119 text

Ok, I get BOTO. ] [

Slide 120

Slide 120 text

Ok, I get BOTO. ] [ Do you want more choice?

Slide 121

Slide 121 text

63 (image) Amazon DynamoDB libraries, mappers, etc. ] [

Slide 122

Slide 122 text

Amazon DynamoDB libraries, mappers, etc. (image)
• Perl
• Javascript
• Erlang
• Node.js
• Java
• Django
• PHP
• Ruby
• Python
• .NET
• Groovy / Grails
• Cold Fusion

Slide 123

Slide 123 text

64 Hive: Importing/Exporting/Querying Data in DynamoDB

Slide 124

Slide 124 text

65 DynamoDB costs: 0.0065 $/h per 10 writes/second 0.0065 $/h per 50 “strong” reads/second 1.00 $/month per GB

Slide 125

Slide 125 text

65 DynamoDB costs: 0.0065 $/h per 10 writes/second 0.0065 $/h per 50 “strong” reads/second 1.00 $/month per GB Unlike Scan, Query only operates on matching records, not all records. You only pay for the throughput of the items that match, not for everything scanned.

Slide 126

Slide 126 text

65 DynamoDB costs: 0.0065 $/h per 10 writes/second 0.0065 $/h per 50 “strong” reads/second 1.00 $/month per GB For large BLOBs or infrequently accessed data, use Amazon S3 (DynamoDB item limit: 64 KB) You can store smaller data elements or file pointers in DynamoDB
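The "pointer" pattern can be sketched in Python as a size check before writing. This is an illustration under assumptions: the helper `plan_item`, the attribute names `Body` and `BodyPointer`, and the bucket name are all made up, and the check compares only the payload size against the 64 KB figure, while a real item's size also includes its other attributes.

```python
ITEM_LIMIT_BYTES = 64 * 1024  # item limit quoted on the slide


def plan_item(item_id, payload, bucket="example-bucket", limit=ITEM_LIMIT_BYTES):
    """Decide what to store in the table: the payload or a pointer to it.

    Uploading the payload to S3 is out of scope for this sketch; only the
    shape of the resulting table item is shown.
    """
    if len(payload) < limit:
        return {"Id": item_id, "Body": payload}
    return {"Id": item_id, "BodyPointer": "s3://%s/%s" % (bucket, item_id)}
```

Small payloads stay inline; anything at or above the limit is replaced by a pointer to the object's location.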

Slide 127

Slide 127 text

DynamoDB costs:
• 0.0065 $/h per 10 writes/second
• 0.0065 $/h per 50 “strong” reads/second
• 1.00 $/month per GB
DynamoDB Free tier:
• 5 writes/second
• 10 consistent reads/second
• 100 MB storage
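The prices quoted in this deck can be turned into a rough monthly estimate. The Python sketch below assumes linear pricing (no rounding up to whole blocks of 10 writes or 50 reads), a 730-hour month, and no free-tier deduction; these are simplifying assumptions, and `monthly_cost` is a hypothetical helper using only the 2013 figures from the slide.

```python
HOURS_PER_MONTH = 730              # assumption: average month length in hours

WRITE_PRICE_PER_HOUR = 0.0065      # per 10 writes/second (from the slide)
READ_PRICE_PER_HOUR = 0.0065       # per 50 "strong" reads/second (from the slide)
STORAGE_PRICE_PER_GB_MONTH = 1.00  # per GB per month (from the slide)


def monthly_cost(writes_per_second, strong_reads_per_second, storage_gb,
                 hours=HOURS_PER_MONTH):
    """Estimate the monthly bill in dollars under the assumptions above."""
    write_cost = writes_per_second / 10 * WRITE_PRICE_PER_HOUR * hours
    read_cost = strong_reads_per_second / 50 * READ_PRICE_PER_HOUR * hours
    storage_cost = storage_gb * STORAGE_PRICE_PER_GB_MONTH
    return write_cost + read_cost + storage_cost
```

For instance, `monthly_cost(100, 500, 20)` combines ten write blocks, ten read blocks and 20 GB of storage into a single figure.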

Slide 128

Slide 128 text

66 http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html

Slide 129

Slide 129 text

Local Secondary Index (LSI)

Slide 130

Slide 130 text

(video) Create a table with a Local Secondary Index (LSI), query it

Slide 131

Slide 131 text

Thoughts...

Slide 132

Slide 132 text

70 But what if the server / storage / datacenter fails? On most NoSQL systems you would lose your most recent changes, or the data might be saved but could be offline and unavailable. ... James Hamilton, VP and Distinguished Engineer, Amazon Web Services

Slide 133

Slide 133 text

71 But what if the server / storage / datacenter fails? On most NoSQL systems you would lose your most recent changes, or the data might be saved but could be offline and unavailable. With dynamoDB, if data is committed just as one entire datacenter burns to the ground, the data is safe, and the application can continue to run without negative impact at exactly the same provisioned throughput rate. The loss of an entire datacenter isn’t even inconvenient, and has no impact on your running application performance. ... James Hamilton, VP and Distinguished Engineer, Amazon Web Services

Slide 134

Slide 134 text

But what if the server / storage / datacenter fails? On most NoSQL systems you would lose your most recent changes, or the data might be saved but could be offline and unavailable.
With DynamoDB, if data is committed just as one entire datacenter burns to the ground, the data is safe, and the application can continue to run without negative impact at exactly the same provisioned throughput rate. The loss of an entire datacenter isn’t even inconvenient, and has no impact on your running application performance.
Combining rock solid synchronous, multi-datacenter redundancy with average latency in the single digits, and throughput scaling to the millions of requests per second is both an excellent engineering challenge and one often not achieved.
James Hamilton, VP and Distinguished Engineer, Amazon Web Services

Slide 135

Slide 135 text

73 (image) Live repartitioning, no downtime ] [

Slide 136

Slide 136 text

Why DynamoDB
• Sorted range keys
• Conditional updates
• Atomic counters
• Structured data and multi-valued data types
• Fetching and updating single attributes
• Strong consistency
• No table size limits
• Live repartitioning
• Disk-only writes
• IOPS per table
No explicit way to handle conflicts other than conditions

Slide 137

Slide 137 text

No content

Slide 138

Slide 138 text

Amazon DynamoDB
Simone Brunozzi (@simon), Senior Technology Evangelist, Amazon Web Services
v 4.0 - Apr 20th, 2013
http://bit.ly/dynamodb2013