Introduction to Amazon DynamoDB

Amazon DynamoDB Simone Brunozzi ( @simon) Senior Technology Evangelist Amazon
Web Services v 4.0 - Apr 20th, 2013 http://bit.ly/dynamodb2013

No-S-Q-What?

Who invented “NoSQL”?
“NoSQL” conceived in 1998 by Carlo Strozzi (Italy)

NoSQL, Scaling, Structured Storage

Feature first
• Financial, CRM, Human resources
• Dominated by RDBMS

Scale first
• Facebook, Gmail, Amazon.com, Twitter
• RDBMS + Key-Value

Simple structured
• BLOB-store not enough
• Need query/index
• BerkeleyDB/SimpleDB

Purpose-optimized
• StreamBase, Vertica, VoltDB, Aster Data, Netezza, Greenplum

Good for NoSQL

Structured Storage, Scaling, NoSQL

Document store
• Popular: MongoDB, CouchDB
• Semi-structured data (XML, JSON, etc.)

Key-Value store
• Popular: Cassandra, Redis
• Schema-less

Graph Database
• Popular: Neo4J, FlockDB
• Stores the relationship of data as a graph

Etc.
• Many others

DynamoDB

NoSQL, Structured Storage, Scaling

Easier than “YesSQL”?
• NoSQL is better for simple queries, Primary Key lookups
• No maintenance windows

Durability
• Synchronous replication
• Built-in durability

Evolution
• MySQL: HandlerSocket
• PostgreSQL 9.2: index only scan
• SE PostgreSQL “view leakage”, etc.

Why scalability is important
(chart: traditional IT capacity vs. your IT needs; axes: Time, Capacity)

Usage patterns: traditional IT
• On and Off
• Fast Growth
• Variable peaks
• Predictable peaks
(chart labels: Poor Service, WASTE)

Usage patterns: Cloud Computing
(chart: elastic cloud capacity vs. traditional IT capacity vs. your IT needs; axes: Time, Capacity)
• Variable peaks
• Fast Growth
• Predictable peaks
• On and Off

A closer look at DynamoDB

DynamoDB: Speeeed
Scale to 100,000+ Writes/second (image)

DynamoDB keywords
• Eventually consistent
• Key-value store
• Unstructured
• NoSQL
• Horizontally scalable
• Non-Relational
• Schema-free
• Distributed

DynamoDB

NoSQL
• No schema (only Key)
• Hash / Hash + Range
• Local Secondary Index

Speeeed
• Provisioned throughput
• Auto storage scaling
• “Shared nothing”
• Low latency (under 10 ms for writes)
• Solid State Drives (SSD)
• IOPS per Table

Robust
• Built-in fault tolerance
• Strong consistency
• Atomic counters
• Disk-only writes

How do I... Create a table?

Creating a table with the Java low-level API

import com.amazonaws.services.dynamodb.AmazonDynamoDBClient;
import com.amazonaws.services.dynamodb.model.*;

// `credentials` is assumed to be an AWSCredentials instance created elsewhere.
AmazonDynamoDBClient client = new AmazonDynamoDBClient(credentials);
String tableName = "ProductCatalog";

// Schema: a Hash primary key on the numeric attribute "Id"
KeySchemaElement hashKey = new KeySchemaElement()
    .withAttributeName("Id")
    .withAttributeType("N");
KeySchema ks = new KeySchema().withHashKeyElement(hashKey);

// Throughput
ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput()
    .withReadCapacityUnits(10L)
    .withWriteCapacityUnits(10L);

// Table
CreateTableRequest request = new CreateTableRequest()
    .withTableName(tableName)
    .withKeySchema(ks)
    .withProvisionedThroughput(provisionedThroughput);
CreateTableResult result = client.createTable(request);

Creating a table with BOTO library (Python)

>>> import boto
>>> conn = boto.connect_dynamodb()  # credentials are read from the boto configuration
>>> # Schema
>>> message_table_schema = conn.create_schema(
...     hash_key_name='forum',
...     hash_key_proto_value='S',
...     range_key_name='subject',
...     range_key_proto_value='S'
... )
>>> # Table and throughput
>>> table = conn.create_table(
...     name='messages',
...     schema=message_table_schema,
...     read_units=5,
...     write_units=5
... )

Deleting a table? Careful...

>>> conn.delete_table(table)

This permanently deletes the table! I suggest using three different users:
1. Dev/Test
2. Production
3. Read-only
You can manage permissions with IAM.

Managing DynamoDB permissions with IAM

Read-only user

{
  "Statement": [
    {
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:BatchGetItem",
        "dynamodb:Query",
        "dynamodb:Scan",
        "dynamodb:DescribeTable",
        "dynamodb:ListTables"
      ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

Full-access user

{
  "Statement": [
    {
      "Action": [ "dynamodb:*" ],
      "Effect": "Allow",
      "Resource": "*"
    }
  ]
}

AWS Management Console: IAM (Identity and Access Management) (image)

Data Model

DynamoDB Data Model
• Table(s)
• Item(s)
• Attribute(s)

Example: Table, Items, Attributes

Table: Products
You must specify what type of Primary Key to use: “Hash”, or “Hash + Range”.

The table holds three Items. Attributes shown on the slide:
• id="301"
• id="201"
• Author="Simone", "John", "Erin"
• Title="PHP basics"
• Title="Learn C++"
• ISBN="122938"
• id="101"
• Price="15.50"
• Cat="Bicycle"

Callouts: Multi-valued data type (an attribute holding several values, such as Author), Scalar data type (an attribute holding a single value).

DynamoDB Data Types

Scalar
• Number (+/-, 38 digits)
• String (UTF-8)
• Binary

Multi-valued
• Number Set
• String Set
• Binary Set

Values in a set must be unique. Values are not ordered.
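The scalar and set types above can be illustrated with a small plain-Python sketch. The helper name `dynamodb_type` and the mapping from Python types are assumptions made for illustration; the codes `S` and `N` appear in the slides' own examples, while `B`, `SS`, `NS` and `BS` are used here as the corresponding codes for Binary and the three set types.

```python
from decimal import Decimal


def dynamodb_type(value):
    """Return a type code for a Python value (illustrative mapping only)."""
    if isinstance(value, bool):
        raise TypeError("no boolean type among the types listed above")
    if isinstance(value, (int, float, Decimal)):
        return "N"
    if isinstance(value, str):
        return "S"
    if isinstance(value, bytes):
        return "B"
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("a set needs at least one value to have a type")
        kinds = {dynamodb_type(member) for member in value}
        if len(kinds) != 1:
            raise TypeError("all values in a set must share one scalar type")
        return kinds.pop() + "S"  # "S" -> "SS", "N" -> "NS", "B" -> "BS"
    raise TypeError("unsupported type: %r" % type(value))


item = {
    "id": 201,
    "Title": "Learn C++",
    # The duplicate "John" collapses, because set values are unique.
    "Author": {"Simone", "John", "Erin", "John"},
}

types = {name: dynamodb_type(value) for name, value in item.items()}
print(types, len(item["Author"]))
```

A Python `set` happens to share both properties listed on the slide: its values are unique and carry no order.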

Primary Key: Hash / Hash + Range

Hash
The key is hashed over the different partitions to optimize workload distribution.
Example items:
• id=100 paid=100.3
• id=103 paid=87.0
• id=201 paid=33.5

Hash + Range
When querying, the hash attribute needs to be uniquely matched, but a range operation can be specified for the range attribute (e.g. all orders in the last 60 minutes).
Example items:
• id=100 date=2012-09-18 paid=10.66
• id=100 date=2012-09-16 paid=71.0
• id=103 date=2012-09-10 paid=23.6
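A minimal sketch of the two key types, written in plain Python rather than against DynamoDB itself. The partition count, the use of MD5, and the helper names (`partition_for`, `put_item`, `query`) are assumptions for illustration only; the service's actual hash function and partition layout are internal.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; the real partition count is managed by the service


def partition_for(hash_key):
    """Map a hash key to a partition by hashing it (illustrative hash function)."""
    digest = hashlib.md5(str(hash_key).encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


# partition number -> {(hash key, range key): item}
table = defaultdict(dict)


def put_item(hash_key, range_key, item):
    table[partition_for(hash_key)][(hash_key, range_key)] = item


def query(hash_key, range_from, range_to):
    """Exact match on the hash key, range condition on the range key."""
    partition = table[partition_for(hash_key)]
    matches = [
        (rk, item)
        for (hk, rk), item in partition.items()
        if hk == hash_key and range_from <= rk <= range_to
    ]
    # Results come back ordered by range key.
    return [item for rk, item in sorted(matches, key=lambda m: m[0])]


put_item(100, "2012-09-18", {"paid": 10.66})
put_item(100, "2012-09-16", {"paid": 71.0})
put_item(103, "2012-09-10", {"paid": 23.6})

result = query(100, "2012-09-16", "2012-09-17")
print(result)
```

Only the partition that owns hash key 100 is consulted, which is why the hash attribute has to be matched exactly while the range attribute can take a range condition.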

Table with “Hash + Range” Primary Key (Ruby)

# Create table
dynamo.create_table("Activity", {
  HashKeyElement: {AttributeName: "user", AttributeType: "S"},
  RangeKeyElement: {AttributeName: "created", AttributeType: "N"}},
  {ReadCapacityUnits: 5, WriteCapacityUnits: 5})

# Put item
dynamo.put_item("Activity", {
  user: {S: "roidrage"},
  created: {N: Time.now.tv_sec.to_s},
  activity: {S: "Checked in"}})

# Query (`activities` is not defined on the slide; presumably a handle to the Activity table)
items = activities.items.query(
  hash_key: "roidrage",
  range_greater_than: (Time.now - 85600).tv_sec)

Throughput

Scaling DynamoDB throughput (video)

Query / Scan

Query / Scan

Query
• Search only on primary key (hash or composite)
• Supports a subset of comparison operators on key attribute values.
• Returns 1 MB per Query operation.
• More efficient than Scan.

Scan
• Scans the entire table
• Supports a specific set of comparison operators (e.g. <=, >, ==).
• Returns 1 MB per Scan operation.
• Slower for bigger tables.

http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
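The difference between the two operations can be sketched with an in-memory stand-in. This is not the DynamoDB API: the `query` and `scan` helpers, the sample items, and the "items examined" counter are hypothetical and only illustrate why a key-based lookup does less work than a full pass.

```python
# In-memory stand-in for a table with a hash + range primary key.
# (hash key, range key) -> item attributes
table = {
    ("roidrage", 1): {"activity": "Checked in"},
    ("roidrage", 2): {"activity": "Commented"},
    ("simone", 1): {"activity": "Checked in"},
    ("simone", 2): {"activity": "Logged out"},
    ("erin", 1): {"activity": "Checked in"},
}

# Group items by hash key so that a query can jump straight to its candidates.
by_hash_key = {}
for (hash_key, range_key), item in table.items():
    by_hash_key.setdefault(hash_key, []).append((range_key, item))


def query(hash_key, min_range):
    """Key-based lookup: only items under one hash key are examined."""
    candidates = by_hash_key.get(hash_key, [])
    matches = [item for rk, item in candidates if rk >= min_range]
    return matches, len(candidates)


def scan(attribute, value):
    """Full pass: every item in the table is examined, then filtered."""
    matches = [item for item in table.values() if item.get(attribute) == value]
    return matches, len(table)


query_matches, query_examined = query("roidrage", 2)
scan_matches, scan_examined = scan("activity", "Checked in")
print(query_examined, scan_examined)
```

The query examines the two items stored under one hash key, while the scan examines all five items however few of them match.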

Query vs. Scan?

Limiting the capabilities of Query's comparison operators was a deliberate decision, to ensure that this API's performance will always remain predictable, no matter the scale of the table (size or throughput). This limitation forces the developer to perform more work upfront, but it will yield a scalable workload no matter how much it grows. An operation like CONTAINS could seem appealing on paper, but its performance would start slowing progressively as the dataset size grows, eventually requiring a painful rearchitecture down the road.
Stefano @ AWS (on discussion forums)

Lost Update

Writing to DynamoDB

How to solve the “Lost update”: Optimistic Concurrency Control (A.K.A. Conditional Writes)

Put/Update/Delete are always ACID; “Isolation” only at Item level
• Atomicity
• Consistency
• Isolation (only at Item)
• Durability

http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html

The “Lost update” concurrency issue
(timeline diagram: Client 1, DynamoDB, Client 2)
1. DynamoDB holds Id=1, Price=10.
2. Client 1 and Client 2 both call GetItem and read Id=1, Price=10.
3. Client 1 calls PutItem with Price=12; DynamoDB now holds Id=1, Price=12.
4. Client 2 calls PutItem with Price=8; DynamoDB now holds Id=1, Price=8, and Client 1's update is lost.

How to fix it with Conditional Writes
(timeline diagram: Client 1, DynamoDB, Client 2)
1. DynamoDB holds Id=1, Price=10.
2. Client 1 and Client 2 both call GetItem and read Id=1, Price=10.
3. Client 1 changes the price to 12 and Client 2 changes it to 8; each sends its PutItem with the condition "if (Price=10)".
4. Client 1's write succeeds and DynamoDB holds Id=1, Price=12. Client 2's condition no longer holds, so its write is rejected and the stored item stays at Price=12.
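The conditional-write pattern can be sketched with an in-memory table. The `Table` class, its `expected` parameter, and the `ConditionFailed` exception are hypothetical stand-ins written for this illustration, not the DynamoDB or BOTO API.

```python
class ConditionFailed(Exception):
    """Raised when the expected value no longer matches the stored one."""


class Table:
    """In-memory stand-in for a table holding items keyed by Id."""

    def __init__(self):
        self.items = {}

    def get_item(self, item_id):
        return dict(self.items[item_id])  # return a copy, like a value read over the wire

    def put_item(self, item, expected=None):
        """Unconditional write, or a conditional write when `expected` is given."""
        current = self.items.get(item["Id"], {})
        if expected is not None:
            for name, value in expected.items():
                if current.get(name) != value:
                    raise ConditionFailed(name)
        self.items[item["Id"]] = dict(item)


table = Table()
table.put_item({"Id": 1, "Price": 10})

# Both clients read the same starting value.
seen_by_client_1 = table.get_item(1)
seen_by_client_2 = table.get_item(1)

# Client 1 writes first; the stored price still matches what it read.
table.put_item({"Id": 1, "Price": 12}, expected={"Price": seen_by_client_1["Price"]})

# Client 2 writes second; the stored price is now 12, so its condition fails.
try:
    table.put_item({"Id": 1, "Price": 8}, expected={"Price": seen_by_client_2["Price"]})
    client_2_succeeded = True
except ConditionFailed:
    client_2_succeeded = False

final_price = table.get_item(1)["Price"]
print(final_price, client_2_succeeded)
```

After a rejected write, a client would typically read the item again and decide whether its change still applies, instead of silently overwriting the newer value.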

Conditional Writes or Atomic Counters?

Conditional Writes
• Idempotent operation
• Small overhead

Atomic Counters
• Increment/Decrement
• Allow simultaneous write requests
• NOT Idempotent
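What "idempotent" means in practice can be shown with a retry. In this plain-Python sketch the `store` dictionary and both helper functions are hypothetical; it assumes a client that is unsure whether its request arrived and therefore sends it twice.

```python
store = {"views": 0, "price": 10}


def atomic_add(name, delta):
    """Atomic counter style update: add to whatever value is stored."""
    store[name] += delta


def conditional_set(name, expected, new_value):
    """Conditional write style update: apply only if the stored value matches."""
    if store[name] != expected:
        return False
    store[name] = new_value
    return True


# The counter update is sent, then retried: the retry is applied a second time.
atomic_add("views", 1)
atomic_add("views", 1)

# The conditional write is sent, then retried: the retry is rejected,
# because the price is already 12 and no longer matches the expected 10.
first = conditional_set("price", 10, 12)
retry = conditional_set("price", 10, 12)

print(store, first, retry)
```

The counter ends up at 2 after one intended increment, while the retried conditional write leaves the item exactly as the first attempt left it.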

(eventually) consistent read

(Eventually) Consistent Reads

Consistent Read
• Consumes more “read capacity” units (2x)
• Consistency reached within 1,000 ms after last write

Eventually Consistent Read
• Can read immediately after a write (2 copies)
• Read old or new value (2 copies)

Let me explain...

Durability in DynamoDB
(timeline diagram: Client, DynamoDB)
1. The Client sends PutItem with Id=1, Price=10.
2. DynamoDB stores Id=1, Price=10 and returns a Confirmation.

Example: Consistent Read (HTTP request)

// This header is abbreviated.
POST / HTTP/1.1
x-amz-target: DynamoDB_20111205.GetItem
content-type: application/x-amz-json-1.0

{"TableName":"comptable",
 "Key":
   {"HashKeyElement":{"S":"Julie"},
    "RangeKeyElement":{"N":"1307654345"}},
 "AttributesToGet":["status","friends"],
 "ConsistentRead":true
}

Setting "ConsistentRead":true requests a Consistent Read.
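A request body of this shape can be assembled with Python's standard `json` module. The function name `build_get_item_body` is hypothetical, and the sketch only builds the JSON text; signing the request and sending it over HTTP are left out.

```python
import json


def build_get_item_body(table_name, hash_value, range_value, attributes, consistent_read):
    """Serialize a GetItem request body in the shape shown above."""
    body = {
        "TableName": table_name,
        "Key": {
            "HashKeyElement": {"S": hash_value},
            # Numbers travel as strings inside the "N" wrapper.
            "RangeKeyElement": {"N": str(range_value)},
        },
        "AttributesToGet": list(attributes),
        "ConsistentRead": consistent_read,
    }
    return json.dumps(body)


payload = build_get_item_body("comptable", "Julie", 1307654345, ["status", "friends"], True)
print(payload)
```

Passing `False` as the last argument would produce the same request with `"ConsistentRead": false`, that is, an eventually consistent read.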

CAP Theorem
• Consistency
• Availability
• Partition Tolerance

DynamoDB APIs

Table
• CreateTable
• UpdateTable
• DeleteTable
• DescribeTable
• ListTables

Item
• PutItem
• GetItem
• UpdateItem
• DeleteItem
• BatchGetItem
• BatchWriteItem

Query/Scan
• Query
• Scan

Monitoring DynamoDB with CloudWatch (image)
• Successful Request Latency
• Consumed Read Capacity Units
• Throttled Requests
• User Errors
• Returned Item Count
• System Errors
• Consumed Write Capacity Units

A simple example
Let’s take a look. How to do things with Python and the BOTO library?

Download and install the BOTO Python library (Mac OS) (video)

Enter AWS credentials and connect to DynamoDB (video)

We are going to use this schema...

Table
• forum= (hash key)
• subject= (range key)
• read_units=5
• write_units=5

... to create a table, and add items.

Table: Messages

Item 1
• forum="AWS forum"
• subject="Hello!"
• Body="http://127.0.0.1/hello.gif"
• SentBy="Simone"

Item 2
• forum="AWS forum"
• subject="Goodbye!"
• Body="Nice meeting with you!"
• SentBy="Simone"

Define Schema and Table, then create the Table (video)

Put an Item, retrieve the Item from the Table (video)

Adding another Item with the AWS Management Console (video)

Ok, I get BOTO.
Do you want more choice?

Amazon DynamoDB libraries, mappers, etc. (image)
• Perl
• Javascript
• Erlang
• Node.js
• Java
• Django
• PHP
• Ruby
• Python
• .NET
• Groovy / Grails
• Cold Fusion

Hive: Importing/Exporting/Querying Data in DynamoDB

DynamoDB costs
• 0.0065 $/h per 10 writes/second
• 0.0065 $/h per 50 “strong” reads/second
• 1.00 $/month per GB

Unlike Scan, Query only operates on matching records, not all records. You only pay for the throughput of the items that match, not for everything scanned.

For large BLOBs or infrequently accessed data, use Amazon S3 (DynamoDB item limit: 64 KB). You can store smaller data elements or file pointers in DynamoDB.

DynamoDB Free tier:
• 5 writes/second
• 10 consistent reads/second
• 100 MB storage
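As a rough worked example, the prices above can be turned into a monthly estimate. The figure of 730 hours per month, the linear proration of throughput, and the function name `monthly_cost` are assumptions made for this sketch; actual billing rules (rounding, free tier, regional prices) are not modelled.

```python
HOURS_PER_MONTH = 730  # assumption: average month length in hours

PRICE_PER_HOUR = 0.0065      # $/h, as quoted on the slide
WRITES_PER_PRICE_UNIT = 10   # writes/second covered by one price unit
READS_PER_PRICE_UNIT = 50    # "strong" reads/second covered by one price unit
STORAGE_PRICE_PER_GB = 1.00  # $/month per GB


def monthly_cost(writes_per_sec, consistent_reads_per_sec, storage_gb):
    """Estimate a monthly bill, prorating throughput linearly (a simplification)."""
    write_cost = writes_per_sec / WRITES_PER_PRICE_UNIT * PRICE_PER_HOUR * HOURS_PER_MONTH
    read_cost = consistent_reads_per_sec / READS_PER_PRICE_UNIT * PRICE_PER_HOUR * HOURS_PER_MONTH
    storage_cost = storage_gb * STORAGE_PRICE_PER_GB
    return round(write_cost + read_cost + storage_cost, 2)


estimate = monthly_cost(writes_per_sec=100, consistent_reads_per_sec=500, storage_gb=20)
print(estimate)
```

Under these assumptions, 100 writes/second and 500 consistent reads/second each come to about 47.45 dollars per month, and 20 GB of storage adds 20 dollars, for a total of about 114.90 dollars.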

http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html

Local Secondary Index (LSI)

Create a table with a Local Secondary Index (LSI), query it (video)

Thoughts...

But what if the server / storage / datacenter fails? On most NoSQL systems you would lose your most recent changes, or the data might be saved but could be offline and unavailable. With dynamoDB, if data is committed just as one entire datacenter burns to the ground, the data is safe, and the application can continue to run without negative impact at exactly the same provisioned throughput rate. The loss of an entire datacenter isn’t even inconvenient, and has no impact on your running application performance. Combining rock solid synchronous, multi-datacenter redundancy with average latency in the single digits, and throughput scaling to the millions of requests per second is both an excellent engineering challenge and one often not achieved.
James Hamilton, VP and Distinguished Engineer, Amazon Web Services

Live repartitioning, no downtime (image)

Why DynamoDB
• Sorted range keys
• Conditional updates
• Atomic counters
• Structured data and multi-valued data types
• Fetching and updating single attributes
• Strong consistency
• No table size limits
• Live repartitioning
• Disk-only writes
• IOPS per table

No explicit way to handle conflicts other than conditions
