Slide 1

Slide 1 text

Building Applications with DynamoDB. An Online Seminar, 16th May 2012. Dr Matt Wood, Amazon Web Services

Slide 2

Slide 2 text

Thank you!

Slide 3

Slide 3 text

Building Applications with DynamoDB

Slide 4

Slide 4 text

Building Applications with DynamoDB: Getting started

Slide 5

Slide 5 text

Building Applications with DynamoDB: Getting started, Data modeling

Slide 6

Slide 6 text

Building Applications with DynamoDB: Getting started, Data modeling, Partitioning

Slide 7

Slide 7 text

Building Applications with DynamoDB: Getting started, Data modeling, Partitioning, Analytics

Slide 8

Slide 8 text

Getting started with DynamoDB: a quick review

Slide 9

Slide 9 text

DynamoDB is a managed NoSQL database service. Store and retrieve any amount of data. Serve any level of request traffic.

Slide 10

Slide 10 text

Without the operational burden.

Slide 11

Slide 11 text

Consistent, predictable performance. Single-digit millisecond latencies. Backed by solid-state drives.

Slide 12

Slide 12 text

Flexible data model. Key/attribute pairs. No schema required. Easy to create. Easy to adjust.

Slide 13

Slide 13 text

Seamless scalability. No table size limits. Unlimited storage. No downtime.

Slide 14

Slide 14 text

Durable. Consistent, disk-only writes. Replication across data centres and availability zones.

Slide 15

Slide 15 text

Without the operational burden.

Slide 16

Slide 16 text

Without the operational burden. FOCUS ON YOUR APP

Slide 17

Slide 17 text

Two decisions + three clicks = ready for use

Slide 18

Slide 18 text

Two decisions + three clicks = ready for use. The two decisions: primary keys + level of throughput.

Slide 19

Slide 19 text

Provisioned throughput. Reserve IOPS for reads and writes. Scale up (or down) at any time.

Slide 20

Slide 20 text

Pay per capacity unit. Priced per hour of provisioned throughput.

Slide 21

Slide 21 text

Write throughput. $0.01 per hour for 10 write units. Units = size of item (in KB, rounded up) x writes/second.
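The write-unit arithmetic on this slide can be sketched as a pair of helper functions. This is a minimal sketch, assuming the 2012 rule that one write unit covers one write per second of an item up to 1 KB, and the slide's price of $0.01 per hour per 10 units; the function names are hypothetical.

```python
import math

# Assumptions (slide pricing, 2012): one write unit covers one write per second
# of an item up to 1 KB, billed at $0.01 per hour per 10 units.
WRITE_UNIT_KB = 1
DOLLARS_PER_HOUR_PER_10_WRITE_UNITS = 0.01


def write_units(item_size_kb: float, writes_per_second: int) -> int:
    """Units = size of item (rounded up to whole KB) x writes/second."""
    return math.ceil(item_size_kb / WRITE_UNIT_KB) * writes_per_second


def write_cost_per_hour(units: int) -> float:
    """Hourly cost of the provisioned write units, in dollars."""
    return units / 10 * DOLLARS_PER_HOUR_PER_10_WRITE_UNITS


# 1.5 KB items rounded up to 2 KB, written 20 times a second: 2 x 20 = 40 units.
units = write_units(1.5, 20)
print(units, round(write_cost_per_hour(units), 4))
```

Rounding the item size up before multiplying is the important step: a 1.1 KB item costs as much to write as a 2 KB one.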

Slide 22

Slide 22 text

Consistent writes. Atomic increment/decrement. Optimistic concurrency control, aka “conditional writes”.

Slide 23

Slide 23 text

Transactions. Item level transactions only. Puts, updates and deletes are ACID.

Slide 24

Slide 24 text

Read throughput: strongly consistent or eventually consistent.

Slide 25

Slide 25 text

Read throughput, strongly consistent: $0.01 per hour for 50 read units. Provisioned units = size of item (in KB, rounded up) x reads/second.

Slide 26

Slide 26 text

Read throughput, eventually consistent: $0.01 per hour for 100 read units. Provisioned units = (size of item x reads/second) / 2.
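The two read formulas on these slides differ only by the factor of two for eventually consistent reads. A minimal sketch, assuming the 2012 definition of a read unit (one strongly consistent read per second of an item up to 1 KB) and the slide's $0.01 per hour per 50 units; the function names are hypothetical.

```python
import math

# Assumptions (slide pricing, 2012): one read unit covers one strongly consistent
# read per second of an item up to 1 KB; eventually consistent reads need half
# as many units. Read capacity is billed at $0.01 per hour per 50 units.
DOLLARS_PER_HOUR_PER_50_READ_UNITS = 0.01


def read_units(item_size_kb: float, reads_per_second: int,
               strongly_consistent: bool = True) -> int:
    units = math.ceil(item_size_kb) * reads_per_second
    if not strongly_consistent:
        # (size of item x reads/second) / 2, rounded up to a whole unit.
        units = math.ceil(units / 2)
    return units


def read_cost_per_hour(units: int) -> float:
    """Hourly cost of the provisioned read units, in dollars."""
    return units / 50 * DOLLARS_PER_HOUR_PER_50_READ_UNITS


# The same 100 reads/second of a 1 KB item, provisioned both ways.
strong = read_units(1, 100)
eventual = read_units(1, 100, strongly_consistent=False)
print(strong, eventual, read_cost_per_hour(strong), read_cost_per_hour(eventual))
```

This is why the eventually consistent slide quotes 100 read units for the same $0.01: the same provisioned capacity serves twice as many reads.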

Slide 27

Slide 27 text

Read throughput. Mix and match strongly consistent and eventually consistent reads at “read time”. Same latency expectations.

Slide 28

Slide 28 text

Two decisions + three clicks = ready for use

Slide 29

Slide 29 text

No content

Slide 30

Slide 30 text

No content

Slide 31

Slide 31 text

No content

Slide 32

Slide 32 text

Two decisions + three clicks = ready for use

Slide 33

Slide 33 text

Two decisions + one API call = ready for use

Slide 34

Slide 34 text

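// AWS SDK for PHP 1.x; assumes the SDK is installed and credentials are configured.
require_once 'sdk.class.php';
$dynamodb = new AmazonDynamoDB();

// Decision 1: the primary key (hash key "Id", a number).
// Decision 2: the level of provisioned throughput.
$create_response = $dynamodb->create_table(array(
    'TableName' => 'ProductCatalog',
    'KeySchema' => array(
        'HashKeyElement' => array(
            'AttributeName' => 'Id',
            'AttributeType' => AmazonDynamoDB::TYPE_NUMBER
        )
    ),
    'ProvisionedThroughput' => array(
        'ReadCapacityUnits'  => 10,
        'WriteCapacityUnits' => 5
    )
));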

Slide 35

Slide 35 text

Two decisions + one API call = ready for use

Slide 36

Slide 36 text

Two decisions + one API call = ready for development

Slide 37

Slide 37 text

Two decisions + one API call = ready for production

Slide 38

Slide 38 text

Two decisions + one API call = ready for scale

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Authentication. Session-based to minimize latency. Uses AWS Security Token Service. Handled by the AWS SDKs. Integrates with IAM.

Slide 41

Slide 41 text

Monitoring. CloudWatch metrics: latency, consumed read and write throughput, errors and throttling.

Slide 42

Slide 42 text

Libraries, mappers & mocks. http://j.mp/dynamodb-libs ColdFusion, Django, Erlang, Java, .Net, Node.js, Perl, PHP, Python, Ruby

Slide 43

Slide 43 text

DynamoDB data models

Slide 44

Slide 44 text

DynamoDB semantics. Tables, items and attributes.

Slide 45

Slide 45 text

Tables contain items. Unlimited items per table.

Slide 46

Slide 46 text

Items are a collection of attributes. Each attribute has a name and a value. An item can have any number of attributes, up to 64 KB in total.

Slide 47

Slide 47 text

Two scalar data types. String: Unicode, UTF-8 binary encoding. Number: 38-digit precision. Multi-valued (set) types for strings and numbers.

Slide 48

Slide 48 text

id = 100, date = 2012-05-16-09-00-10, total = 25.00
id = 101, date = 2012-05-15-15-00-11, total = 35.00
id = 101, date = 2012-05-16-12-00-10, total = 100.00
id = 102, date = 2012-03-20-18-23-10, total = 20.00
id = 102, date = 2012-03-20-18-23-10, total = 120.00

Slide 49

Slide 49 text

id = 100, date = 2012-05-16-09-00-10, total = 25.00
id = 101, date = 2012-05-15-15-00-11, total = 35.00
id = 101, date = 2012-05-16-12-00-10, total = 100.00
id = 102, date = 2012-03-20-18-23-10, total = 20.00
id = 102, date = 2012-03-20-18-23-10, total = 120.00
(label: Table)

Slide 50

Slide 50 text

id = 100, date = 2012-05-16-09-00-10, total = 25.00
id = 101, date = 2012-05-15-15-00-11, total = 35.00
id = 101, date = 2012-05-16-12-00-10, total = 100.00
id = 102, date = 2012-03-20-18-23-10, total = 20.00
id = 102, date = 2012-03-20-18-23-10, total = 120.00
(label: Item)

Slide 51

Slide 51 text

id = 100, date = 2012-05-16-09-00-10, total = 25.00
id = 101, date = 2012-05-15-15-00-11, total = 35.00
id = 101, date = 2012-05-16-12-00-10, total = 100.00
id = 102, date = 2012-03-20-18-23-10, total = 20.00
id = 102, date = 2012-03-20-18-23-10, total = 120.00
(label: Attribute)

Slide 52

Slide 52 text

Where is the schema? Tables do not require a formal schema. Each item is an arbitrarily sized hash (up to 64 KB). You just need to specify the primary key.

Slide 53

Slide 53 text

Items are indexed by primary key: either a single hash key or a composite (hash + range) key.

Slide 54

Slide 54 text

id = 100, date = 2012-05-16-09-00-10, total = 25.00
id = 101, date = 2012-05-15-15-00-11, total = 35.00
id = 101, date = 2012-05-16-12-00-10, total = 100.00
id = 102, date = 2012-03-20-18-23-10, total = 20.00
id = 102, date = 2012-03-20-18-23-10, total = 120.00
(label: Hash Key, on id)

Slide 55

Slide 55 text

Range key for queries. Querying items by composite key.

Slide 56

Slide 56 text

id = 100, date = 2012-05-16-09-00-10, total = 25.00
id = 101, date = 2012-05-15-15-00-11, total = 35.00
id = 101, date = 2012-05-16-12-00-10, total = 100.00
id = 102, date = 2012-03-20-18-23-10, total = 20.00
id = 102, date = 2012-03-20-18-23-10, total = 120.00
(label: Hash Key + Range Key, on id and date)

Slide 57

Slide 57 text

Programming DynamoDB. Small but perfectly formed. Whole programming interface fits on one slide.

Slide 58

Slide 58 text

CreateTable UpdateTable DeleteTable DescribeTable ListTables PutItem GetItem UpdateItem DeleteItem BatchGetItem BatchWriteItem Query Scan

Slide 59

Slide 59 text

CreateTable UpdateTable DeleteTable DescribeTable ListTables PutItem GetItem UpdateItem DeleteItem BatchGetItem BatchWriteItem Query Scan

Slide 60

Slide 60 text

CreateTable UpdateTable DeleteTable DescribeTable ListTables PutItem GetItem UpdateItem DeleteItem BatchGetItem BatchWriteItem Query Scan

Slide 61

Slide 61 text

Conditional updates. PutItem, UpdateItem and DeleteItem can take optional conditions that must hold for the operation to succeed. UpdateItem performs atomic increments.
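To make the semantics of conditional writes and atomic counters concrete, here is a small in-memory model. It is a sketch, not the DynamoDB API: ToyTable, ConditionalCheckFailed and the `expected` argument are hypothetical names, and the version attribute is one common way (assumed here) to use conditions for optimistic concurrency control.

```python
from typing import Any, Dict, Optional


class ConditionalCheckFailed(Exception):
    """Raised when an expected attribute value does not match (hypothetical name)."""


class ToyTable:
    """Single-process, in-memory stand-in for a table keyed by a hash key.

    It only illustrates the semantics on the slide; it is not the DynamoDB API.
    """

    def __init__(self) -> None:
        self._items: Dict[Any, Dict[str, Any]] = {}

    def get_item(self, key: Any) -> Optional[Dict[str, Any]]:
        item = self._items.get(key)
        return dict(item) if item is not None else None

    def put_item(self, key: Any, item: Dict[str, Any],
                 expected: Optional[Dict[str, Any]] = None) -> None:
        """Write only if every expected attribute currently has the given value
        (an expected value of None means the attribute must be absent)."""
        current = self._items.get(key, {})
        for name, value in (expected or {}).items():
            if current.get(name) != value:
                raise ConditionalCheckFailed(name)
        self._items[key] = dict(item)

    def increment(self, key: Any, attribute: str, delta: int) -> int:
        """Add delta (negative to decrement) to a numeric attribute and return it.
        The read-modify-write happens inside one call, the way UpdateItem applies
        an increment on the server instead of the client reading, then writing."""
        item = self._items.setdefault(key, {})
        item[attribute] = item.get(attribute, 0) + delta
        return item[attribute]


# Optimistic concurrency control with a version attribute.
table = ToyTable()
table.put_item("mza", {"score": 10, "version": 1}, expected={"version": None})
seen = table.get_item("mza")
table.put_item("mza", {"score": 20, "version": seen["version"] + 1},
               expected={"version": seen["version"]})
try:
    # A second writer still holding version 1 is rejected instead of overwriting.
    table.put_item("mza", {"score": 15, "version": 2}, expected={"version": 1})
except ConditionalCheckFailed:
    print("stale write rejected")
```

The condition turns a lost update into an explicit failure the client can handle, typically by re-reading the item and retrying.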

Slide 62

Slide 62 text

CreateTable UpdateTable DeleteTable DescribeTable ListTables PutItem GetItem UpdateItem DeleteItem BatchGetItem BatchWriteItem Query Scan

Slide 63

Slide 63 text

One API call, multiple items. BatchGetItem returns multiple items by primary key. BatchWriteItem performs up to 25 put or delete operations. Throughput is measured by I/O, not by API calls.
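Because BatchWriteItem accepts at most 25 operations, a client with a longer list of writes has to split it into batches. A minimal sketch of that grouping step; `batches` is a hypothetical helper, and the request dictionaries are placeholders rather than the real request format.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

MAX_BATCH_WRITE = 25  # put/delete operations per BatchWriteItem call (from the slide)


def batches(requests: Iterable[T], size: int = MAX_BATCH_WRITE) -> Iterator[List[T]]:
    """Group write requests into lists of at most `size` items, preserving order."""
    batch: List[T] = []
    for request in requests:
        batch.append(request)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# 60 placeholder put requests become three calls: 25 + 25 + 10.
puts = [{"id": n} for n in range(60)]
print([len(b) for b in batches(puts)])
```

Each batch still consumes write throughput per item, since throughput is measured by I/O rather than API calls. A real client should also resubmit any UnprocessedItems the service returns.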

Slide 64

Slide 64 text

CreateTable UpdateTable DeleteTable DescribeTable ListTables PutItem GetItem UpdateItem DeleteItem BatchGetItem BatchWriteItem Query Scan

Slide 65

Slide 65 text

Query vs Scan. Query for composite key queries. Scan for full table scans and exports. Both support paging and limits. Maximum response size is 1 MB.

Slide 66

Slide 66 text

Query patterns. Retrieve all items by hash key. Range key conditions: ==, <, >, >=, <=, begins with, between. Counts. Top and bottom n values. Paged responses.
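These query patterns all follow from one storage idea: within a hash key, items are kept sorted by range key. The in-memory model below illustrates range conditions, top-n via descending order plus a limit, and paging with a resume key. It is a sketch under that assumption; ToyCompositeTable, `between` and `begins_with` are hypothetical names, not the DynamoDB client API.

```python
import bisect
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Tuple[Any, Dict[str, Any]]


class ToyCompositeTable:
    """In-memory model of a hash + range key table: items are grouped by hash
    key and kept sorted by range key. Illustrative only, not the DynamoDB API."""

    def __init__(self) -> None:
        self._by_hash: Dict[Any, List[Row]] = {}

    def put(self, hash_key: Any, range_key: Any, attrs: Dict[str, Any]) -> None:
        rows = self._by_hash.setdefault(hash_key, [])
        i = bisect.bisect_left([r for r, _ in rows], range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, attrs)  # same primary key: the item is replaced
        else:
            rows.insert(i, (range_key, attrs))

    def query(self, hash_key: Any,
              condition: Optional[Callable[[Any], bool]] = None,
              limit: Optional[int] = None,
              forward: bool = True,
              start_after: Any = None) -> Tuple[List[Row], Any]:
        """Return matching rows in range-key order, plus a key to resume from
        (None unless the page stopped at the limit)."""
        rows = self._by_hash.get(hash_key, [])
        ordered = rows if forward else rows[::-1]
        page: List[Row] = []
        for range_key, attrs in ordered:
            if start_after is not None and (
                    range_key <= start_after if forward else range_key >= start_after):
                continue  # already returned on an earlier page
            if condition is None or condition(range_key):
                page.append((range_key, attrs))
                if limit is not None and len(page) == limit:
                    return page, range_key
        return page, None


def between(low: Any, high: Any) -> Callable[[Any], bool]:
    return lambda key: low <= key <= high


def begins_with(prefix: str) -> Callable[[Any], bool]:
    return lambda key: str(key).startswith(prefix)


# Orders from the earlier slides: hash key id, range key date.
orders = ToyCompositeTable()
orders.put(101, "2012-05-15-15-00-11", {"total": 35.00})
orders.put(101, "2012-05-16-12-00-10", {"total": 100.00})
latest, resume = orders.query(101, forward=False, limit=1)       # top 1 by date
older, _ = orders.query(101, forward=False, limit=1, start_after=resume)  # next page
print(latest, older)
```

Counts are simply the length of the matching rows, and "bottom n" is the same query with `forward=True`.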

Slide 67

Slide 67 text

Modeling patterns

Slide 68

Slide 68 text

1. Mapping relationships with range keys. No cross-table joins in DynamoDB. Use composite keys to model relationships. Patterns

Slide 69

Slide 69 text

Data model example: online gaming. Storing scores and leader boards. Players with high scores. Leader board for each game.

Slide 70

Slide 70 text

Data model example: online gaming. Storing scores and leader boards. Players with high scores. Leader board for each game.
Players: hash key
user_id = mza, location = Cambridge, joined = 2011-07-04
user_id = jeffbarr, location = Seattle, joined = 2012-01-20
user_id = werner, location = Worldwide, joined = 2011-05-15

Slide 71

Slide 71 text

Data model example: online gaming. Storing scores and leader boards. Players with high scores. Leader board for each game.
Players: hash key
user_id = mza, location = Cambridge, joined = 2011-07-04
user_id = jeffbarr, location = Seattle, joined = 2012-01-20
user_id = werner, location = Worldwide, joined = 2011-05-15
Scores: composite key
user_id = mza, game = angry-birds, score = 11,000
user_id = mza, game = tetris, score = 1,223,000
user_id = werner, game = bejewelled, score = 55,000

Slide 72

Slide 72 text

Data model example: online gaming. Storing scores and leader boards. Players with high scores. Leader board for each game.
Players: hash key
user_id = mza, location = Cambridge, joined = 2011-07-04
user_id = jeffbarr, location = Seattle, joined = 2012-01-20
user_id = werner, location = Worldwide, joined = 2011-05-15
Scores: composite key
user_id = mza, game = angry-birds, score = 11,000
user_id = mza, game = tetris, score = 1,223,000
user_id = werner, game = bejewelled, score = 55,000
Leader boards: composite key
game = angry-birds, score = 11,000, user_id = mza
game = tetris, score = 1,223,000, user_id = mza
game = tetris, score = 9,000,000, user_id = jeffbarr

Slide 73

Slide 73 text

Data model example: online gaming. Storing scores and leader boards.
Players: hash key
user_id = mza, location = Cambridge, joined = 2011-07-04
user_id = jeffbarr, location = Seattle, joined = 2012-01-20
user_id = werner, location = Worldwide, joined = 2011-05-15
Scores: composite key
user_id = mza, game = angry-birds, score = 11,000
user_id = mza, game = tetris, score = 1,223,000
user_id = werner, game = bejewelled, score = 55,000
Leader boards: composite key
game = angry-birds, score = 11,000, user_id = mza
game = tetris, score = 1,223,000, user_id = mza
game = tetris, score = 9,000,000, user_id = jeffbarr
Scores by user (and by game)

Slide 74

Slide 74 text

Data model example: online gaming. Storing scores and leader boards.
Players: hash key
user_id = mza, location = Cambridge, joined = 2011-07-04
user_id = jeffbarr, location = Seattle, joined = 2012-01-20
user_id = werner, location = Worldwide, joined = 2011-05-15
Scores: composite key
user_id = mza, game = angry-birds, score = 11,000
user_id = mza, game = tetris, score = 1,223,000
user_id = werner, game = bejewelled, score = 55,000
Leader boards: composite key
game = angry-birds, score = 11,000, user_id = mza
game = tetris, score = 1,223,000, user_id = mza
game = tetris, score = 9,000,000, user_id = jeffbarr
High scores by game
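The gaming model keeps the same fact in two tables so that each access pattern is a single-key query: Scores answers "scores by user", Leader boards answers "high scores by game". The in-memory sketch below shows the write path that keeps them in step. ToyGameStore and its methods are hypothetical; folding user_id into the leader board range key is an assumption added here so that two players with the same score do not overwrite each other.

```python
from typing import Dict, List, Tuple


class ToyGameStore:
    """In-memory model of the slide's Scores and Leader boards tables.

    Scores:        hash key user_id, range key game -> high score
    Leader boards: hash key game, range key (score, user_id)
    Assumption: user_id is part of the leader board range key so that equal
    scores in the same game do not collide.
    """

    def __init__(self) -> None:
        self.scores: Dict[str, Dict[str, int]] = {}
        self.leader_boards: Dict[str, Dict[Tuple[int, str], Dict[str, str]]] = {}

    def record_score(self, user_id: str, game: str, score: int) -> None:
        previous = self.scores.get(user_id, {}).get(game)
        if previous is not None and previous >= score:
            return  # only a player's high score is kept
        self.scores.setdefault(user_id, {})[game] = score
        board = self.leader_boards.setdefault(game, {})
        if previous is not None:
            # No cross-table transactions: the application updates both tables itself.
            del board[(previous, user_id)]
        board[(score, user_id)] = {"user_id": user_id}

    def scores_by_user(self, user_id: str) -> Dict[str, int]:
        """Query Scores by hash key: every game this user has a score for."""
        return dict(self.scores.get(user_id, {}))

    def top_scores(self, game: str, n: int) -> List[Tuple[int, str]]:
        """Query Leader boards by hash key, descending by range key, limit n."""
        return sorted(self.leader_boards.get(game, {}), reverse=True)[:n]


store = ToyGameStore()
store.record_score("mza", "angry-birds", 11000)
store.record_score("mza", "tetris", 1223000)
store.record_score("werner", "bejewelled", 55000)
store.record_score("jeffbarr", "tetris", 9000000)
print(store.top_scores("tetris", 2), store.scores_by_user("mza"))
```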

Slide 75

Slide 75 text

2. Handling large items. Unlimited attributes per item. Unlimited items per table. Max 64 KB per item. Patterns

Slide 76

Slide 76 text

Data model example: large items. Storing more than 64 KB across items.
Large messages: composite keys
message_id = 1, part = 1, message =
message_id = 1, part = 2, message =
message_id = 1, part = 3, message =
Split attributes across items. Query by message_id and part to retrieve.
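Splitting a large value across items, and stitching it back together after a Query on message_id, can be sketched as follows. This is a minimal sketch with hypothetical helper names. The 1 KB of headroom is an assumption, since attribute names and key values also count toward the 64 KB limit. It treats the body as raw bytes; with only String and Number types available at the time of these slides, binary data would first need a text encoding such as base64.

```python
from typing import Any, Dict, List

MAX_ITEM_BYTES = 64 * 1024  # 64 KB per item, as quoted on the slides
# Assumption: keep ~1 KB of headroom, since attribute names and the key
# attributes also count toward the item size.
PART_BYTES = MAX_ITEM_BYTES - 1024


def split_message(message_id: int, body: bytes,
                  part_bytes: int = PART_BYTES) -> List[Dict[str, Any]]:
    """One item per chunk: message_id is the hash key, part is the range key."""
    return [
        {"message_id": message_id, "part": n, "message": body[start:start + part_bytes]}
        for n, start in enumerate(range(0, len(body), part_bytes), start=1)
    ]


def join_message(items: List[Dict[str, Any]]) -> bytes:
    """Reassemble the parts returned by a Query on message_id."""
    # Query already returns parts in range-key order; sorting keeps this safe
    # for items gathered some other way.
    return b"".join(item["message"] for item in sorted(items, key=lambda i: i["part"]))


parts = split_message(1, b"x" * 150_000)
print([(p["part"], len(p["message"])) for p in parts])
```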

Slide 77

Slide 77 text

Store a pointer to objects in Amazon S3. Large data stored in S3. Location stored in DynamoDB. 99.999999999% data durability in S3. Patterns

Slide 78

Slide 78 text

3. Managing secondary indices. Not supported by DynamoDB. Create your own. Patterns

Slide 79

Slide 79 text

Data model example: secondary indices.
Users: hash key
user_id = mza, first_name = Matt, last_name = Wood
user_id = mattfox, first_name = Matt, last_name = Fox
user_id = werner, first_name = Werner, last_name = Vogels

Slide 80

Slide 80 text

Data model example: secondary indices.
Users: hash key
user_id = mza, first_name = Matt, last_name = Wood
user_id = mattfox, first_name = Matt, last_name = Fox
user_id = werner, first_name = Werner, last_name = Vogels
First name index: composite keys
first_name = Matt, user_id = mza
first_name = Matt, user_id = mattfox
first_name = Werner, user_id = werner

Slide 81

Slide 81 text

Data model example: secondary indices.
Users: hash key
user_id = mza, first_name = Matt, last_name = Wood
user_id = mattfox, first_name = Matt, last_name = Fox
user_id = werner, first_name = Werner, last_name = Vogels
First name index: composite keys
first_name = Matt, user_id = mza
first_name = Matt, user_id = mattfox
first_name = Werner, user_id = werner
Second name index: composite keys
last_name = Wood, user_id = mza
last_name = Fox, user_id = mattfox
last_name = Vogels, user_id = werner

Slide 82

Slide 82 text

Data model example: secondary indices.
Users: hash key
user_id = mza, first_name = Matt, last_name = Wood
user_id = mattfox, first_name = Matt, last_name = Fox
user_id = werner, first_name = Werner, last_name = Vogels
First name index: composite keys
first_name = Matt, user_id = mza
first_name = Matt, user_id = mattfox
first_name = Werner, user_id = werner
Second name index: composite keys
last_name = Wood, user_id = mza
last_name = Fox, user_id = mattfox
last_name = Vogels, user_id = werner

Slide 83

Slide 83 text

Data model example: secondary indices.
Users: hash key
user_id = mza, first_name = Matt, last_name = Wood
user_id = mattfox, first_name = Matt, last_name = Fox
user_id = werner, first_name = Werner, last_name = Vogels
First name index: composite keys
first_name = Matt, user_id = mza
first_name = Matt, user_id = mattfox
first_name = Werner, user_id = werner
Second name index: composite keys
last_name = Wood, user_id = mza
last_name = Fox, user_id = mattfox
last_name = Vogels, user_id = werner
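Since DynamoDB does not maintain these index tables itself, every write to Users has to be mirrored into them by the application, including removing stale entries when a name changes. The in-memory sketch below shows that write path. ToyUserStore and its methods are hypothetical. The three writes are separate operations; with item-level transactions only, a failure part way through could leave an index entry stale, so real code needs to tolerate or repair that.

```python
from typing import Dict, List, Set


class ToyUserStore:
    """In-memory Users table plus two index tables maintained by the application.

    Users:             hash key user_id
    First name index:  hash key first_name, range key user_id
    Second name index: hash key last_name, range key user_id
    """

    def __init__(self) -> None:
        self.users: Dict[str, Dict[str, str]] = {}
        self.first_name_index: Dict[str, Set[str]] = {}
        self.last_name_index: Dict[str, Set[str]] = {}

    def put_user(self, user_id: str, first_name: str, last_name: str) -> None:
        old = self.users.get(user_id)
        if old is not None:
            # Remove stale index entries before writing the new ones.
            self.first_name_index[old["first_name"]].discard(user_id)
            self.last_name_index[old["last_name"]].discard(user_id)
        self.users[user_id] = {"first_name": first_name, "last_name": last_name}
        self.first_name_index.setdefault(first_name, set()).add(user_id)
        self.last_name_index.setdefault(last_name, set()).add(user_id)

    def find_by_first_name(self, first_name: str) -> List[str]:
        """Query the first name index by hash key; returns matching user_ids."""
        return sorted(self.first_name_index.get(first_name, set()))

    def find_by_last_name(self, last_name: str) -> List[str]:
        """Query the second name index by hash key; returns matching user_ids."""
        return sorted(self.last_name_index.get(last_name, set()))


users = ToyUserStore()
users.put_user("mza", "Matt", "Wood")
users.put_user("mattfox", "Matt", "Fox")
users.put_user("werner", "Werner", "Vogels")
print(users.find_by_first_name("Matt"))
```

A lookup by name is then two steps: query the index table for user_ids, then fetch the full items from Users (for example with BatchGetItem).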

Slide 84

Slide 84 text

4. Time series data. Logging, click through, ad views, game play data, application usage. Non-uniform access patterns. Newer data is ‘live’. Older data is read only. Patterns

Slide 85

Slide 85 text

Data model example: time series data. Rolling tables for hot and cold data.
Events table: composite keys
event_id = 1000, timestamp = 2012-05-16-09-59-01, key = value
event_id = 1001, timestamp = 2012-05-16-09-59-02, key = value
event_id = 1002, timestamp = 2012-05-16-09-59-02, key = value

Slide 86

Slide 86 text

Data model example: time series data. Rolling tables for hot and cold data.
Events table: composite keys
event_id = 1000, timestamp = 2012-05-16-09-59-01, key = value
event_id = 1001, timestamp = 2012-05-16-09-59-02, key = value
event_id = 1002, timestamp = 2012-05-16-09-59-02, key = value
Events table for April: composite keys
event_id = 400, timestamp = 2012-04-01-00-00-01
event_id = 401, timestamp = 2012-04-01-00-00-02
event_id = 402, timestamp = 2012-04-01-00-00-03
Events table for January: composite keys
event_id = 100, timestamp = 2012-01-01-00-00-01
event_id = 101, timestamp = 2012-01-01-00-00-02
event_id = 102, timestamp = 2012-01-01-00-00-03

Slide 87

Slide 87 text

Hot and cold tables. Monthly tables: Dec, Jan, Feb, Mar, April, May. Patterns

Slide 88

Slide 88 text

Hot and cold tables. Monthly tables: Dec, Jan, Feb, Mar, April, May. Annotation: higher throughput. Patterns

Slide 89

Slide 89 text

Hot and cold tables. Monthly tables: Dec, Jan, Feb, Mar, April, May. Annotations: higher throughput, lower throughput. Patterns

Slide 90

Slide 90 text

Hot and cold tables. Monthly tables: Dec, Jan, Feb, Mar, April, May. Annotation: data to S3, delete cold tables. Patterns

Slide 91

Slide 91 text

Hot and cold tables. Monthly tables: Jan, Feb, Mar, Apr, May, June. Patterns

Slide 92

Slide 92 text

Hot and cold tables. Monthly tables: Feb, Mar, Apr, May, June, July. Patterns

Slide 93

Slide 93 text

Hot and cold tables. Monthly tables: Mar, Apr, May, June, July, Aug. Patterns

Slide 94

Slide 94 text

Hot and cold tables. Monthly tables: Apr, May, June, July, Aug, Sept. Patterns

Slide 95

Slide 95 text

Hot and cold tables. Monthly tables: May, June, July, Aug, Sept, Oct. Patterns
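The rolling window in these slides moves forward one month at a time: a new table becomes hot, the previous months turn cold, and the oldest table drops out, with its data going to S3 before the table is deleted. A small planning helper can sketch the bookkeeping. It assumes the six-table window shown in the diagrams; the events_YYYY_MM naming and all function names are hypothetical.

```python
from datetime import date
from typing import Dict, List, Tuple, Union


def shift_month(year: int, month: int, delta: int) -> Tuple[int, int]:
    """Move (year, month) by delta months, crossing year boundaries."""
    index = year * 12 + (month - 1) + delta
    return index // 12, index % 12 + 1


def table_name(prefix: str, year: int, month: int) -> str:
    # Hypothetical naming scheme, e.g. events_2012_05.
    return f"{prefix}_{year:04d}_{month:02d}"


def rolling_tables(prefix: str, today: date, window: int = 6) -> List[str]:
    """The `window` monthly tables ending with the current month, oldest first."""
    return [table_name(prefix, *shift_month(today.year, today.month, -offset))
            for offset in range(window - 1, -1, -1)]


def monthly_plan(prefix: str, today: date,
                 window: int = 6) -> Dict[str, Union[str, List[str]]]:
    tables = rolling_tables(prefix, today, window)
    return {
        "hot": tables[-1],    # current month: provision higher throughput
        "cold": tables[:-1],  # older months: mostly read, lower throughput
        # Just left the window: export to S3, then delete the table.
        "retire": table_name(prefix, *shift_month(today.year, today.month, -window)),
    }


print(monthly_plan("events", date(2012, 6, 1)))
```

Run at the start of each month, the plan mirrors the diagrams: in June the window is Jan to June and the December table is the one to export and delete.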

Slide 96

Slide 96 text

Not out of mind. DynamoDB and S3 data can be integrated for analytics. Run queries across hot and cold data with Elastic MapReduce. Patterns

Slide 97

Slide 97 text

Partitioning best practices

Slide 98

Slide 98 text

Uniform workloads. DynamoDB divides table data into multiple partitions. Data is distributed primarily by hash key. Provisioned throughput is divided evenly across the partitions.

Slide 99

Slide 99 text

Uniform workloads. To achieve and maintain full provisioned throughput for a table, spread your workload evenly across the hash keys.

Slide 100

Slide 100 text

Non-uniform workloads. Some requests might be throttled, even at high levels of provisioned throughput. Some best practices...
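Why can a well-provisioned table still throttle? If provisioned throughput is split evenly across partitions, a single hot hash key can only use its own partition's share. The toy model below makes that arithmetic concrete. It is illustrative only: the MD5-based placement, the partition count and the function names are assumptions, since DynamoDB's real hashing and partitioning are internal.

```python
import hashlib
from collections import Counter
from typing import Iterable


def partition_for(hash_key: str, partitions: int) -> int:
    """Illustrative only: DynamoDB's real hash function and partition count are internal."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def throttled_requests(keys: Iterable[str], provisioned: int, partitions: int) -> int:
    """Requests in one second that exceed their partition's even share of throughput."""
    share = provisioned // partitions  # provisioned throughput split evenly
    load = Counter(partition_for(key, partitions) for key in keys)
    return sum(max(0, n - share) for n in load.values())


# 1,000 requests/second against 2,000 provisioned units over 4 partitions.
uniform = [f"user-{i}" for i in range(1000)]  # many distinct hash keys
skewed = ["mobile-100"] * 1000                # one very popular hash key
print(throttled_requests(uniform, 2000, 4), throttled_requests(skewed, 2000, 4))
```

Even with twice as much capacity provisioned as requested, the skewed workload loses half its requests, because all of them land on one partition with a quarter of the throughput.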

Slide 101

Slide 101 text

1. Distinct values for hash keys. Hash key elements should have a high number of distinct values. Patterns

Slide 102

Slide 102 text

Data model example: hash key selection. Well-distributed workloads.
Users
user_id = mza, first_name = Matt, last_name = Wood
user_id = jeffbarr, first_name = Jeff, last_name = Barr
user_id = werner, first_name = Werner, last_name = Vogels
user_id = mattfox, first_name = Matt, last_name = Fox
...

Slide 103

Slide 103 text

Data model example: hash key selection. Well-distributed workloads.
Users
user_id = mza, first_name = Matt, last_name = Wood
user_id = jeffbarr, first_name = Jeff, last_name = Barr
user_id = werner, first_name = Werner, last_name = Vogels
user_id = mattfox, first_name = Matt, last_name = Fox
...
Lots of users with unique user_id. Workload well distributed across user partitions.

Slide 104

Slide 104 text

2. Avoid limited hash key values. Hash key elements should have a high number of distinct values. Patterns

Slide 105

Slide 105 text

Data model example: small hash value range. Non-uniform workload.
Status responses
status = 200, date = 2012-04-01-00-00-01
status = 404, date = 2012-04-01-00-00-01
status = 404, date = 2012-04-01-00-00-01
status = 404, date = 2012-04-01-00-00-01

Slide 106

Slide 106 text

Data model example: small hash value range. Non-uniform workload.
Status responses
status = 200, date = 2012-04-01-00-00-01
status = 404, date = 2012-04-01-00-00-01
status = 404, date = 2012-04-01-00-00-01
status = 404, date = 2012-04-01-00-00-01
Small number of status codes. Uneven, non-uniform workload.

Slide 107

Slide 107 text

3. Model for even distribution of access. Access by hash key value should be evenly distributed across the dataset. Patterns

Slide 108

Slide 108 text

Data model example: uneven access pattern by key. Non-uniform access workload.
Devices
mobile_id = 100, access_date = 2012-04-01-00-00-01
mobile_id = 100, access_date = 2012-04-01-00-00-02
mobile_id = 100, access_date = 2012-04-01-00-00-03
mobile_id = 100, access_date = 2012-04-01-00-00-04
...

Slide 109

Slide 109 text

Data model example: uneven access pattern by key. Non-uniform access workload.
Devices
mobile_id = 100, access_date = 2012-04-01-00-00-01
mobile_id = 100, access_date = 2012-04-01-00-00-02
mobile_id = 100, access_date = 2012-04-01-00-00-03
mobile_id = 100, access_date = 2012-04-01-00-00-04
...
Large number of devices. A small number are much more popular than others. Workload unevenly distributed.

Slide 110

Slide 110 text

Data model example: randomize access pattern by key. Towards a uniform workload.
Devices
mobile_id = 100.1, access_date = 2012-04-01-00-00-01
mobile_id = 100.2, access_date = 2012-04-01-00-00-02
mobile_id = 100.3, access_date = 2012-04-01-00-00-03
mobile_id = 100.4, access_date = 2012-04-01-00-00-04
...
Randomize access pattern. Workload randomised by hash key.
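The suffixed keys on this slide (100.1, 100.2, ...) spread one popular device across several hash key values. Writes pick a suffix at random; reads must query every suffix and merge the results. A minimal sketch; the shard count of 4 and the function names are assumptions, and keys are built as strings here for simplicity.

```python
import random
from typing import List

SHARDS_PER_KEY = 4  # assumption: tune to how hot the key is


def write_key(mobile_id: int, rng: random.Random) -> str:
    """Hash key for a new item, e.g. '100.3': the device id plus a random suffix."""
    return f"{mobile_id}.{rng.randint(1, SHARDS_PER_KEY)}"


def read_keys(mobile_id: int) -> List[str]:
    """Every suffixed key a reader must query (and merge) to see all of a device's items."""
    return [f"{mobile_id}.{n}" for n in range(1, SHARDS_PER_KEY + 1)]


rng = random.Random(2012)
print(sorted({write_key(100, rng) for _ in range(20)}))
print(read_keys(100))
```

The trade-off is read fan-out: spreading writes over N keys means N queries to read one device's history, so N should be only as large as the hottest key requires.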

Slide 111

Slide 111 text

Design for a uniform workload.

Slide 112

Slide 112 text

Analytics with DynamoDB

Slide 113

Slide 113 text

Seamless scale. Scalable methods for data processing. Scalable methods for backup/restore.

Slide 114

Slide 114 text

Amazon Elastic MapReduce. http://aws.amazon.com/emr Managed Hadoop service for data-intensive workflows.

Slide 115

Slide 115 text

Hadoop under the hood. Take advantage of the Hadoop ecosystem: streaming interfaces, Hive, Pig, Mahout.

Slide 116

Slide 116 text

Distributed data processing. API driven. Analytics at any scale.

Slide 117

Slide 117 text

Query flexibility with Hive.
create external table items_db (id string, votes bigint, views bigint)
stored by 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
tblproperties ("dynamodb.table.name" = "items",
               "dynamodb.column.mapping" = "id:id,votes:votes,views:views");

Slide 118

Slide 118 text

Query flexibility with Hive.
select id, votes, views from items_db order by views desc;

Slide 119

Slide 119 text

Data export/import. Use EMR for backup and restore to Amazon S3.

Slide 120

Slide 120 text

Data export/import.
CREATE EXTERNAL TABLE orders_s3_new_export (
  order_id string,
  customer_id string,
  order_date int,
  total double
)
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://export_bucket';

INSERT OVERWRITE TABLE orders_s3_new_export
PARTITION (year='2012', month='01')
SELECT * FROM orders_ddb_2012_01;

Slide 121

Slide 121 text

Integrate live and archive data. Run queries across external Hive tables on S3 and DynamoDB. Live & archive. Metadata & big objects.

Slide 122

Slide 122 text

In summary...
DynamoDB: Predictable performance, Provisioned throughput, Libraries & mappers.

Slide 123

Slide 123 text

In summary...
DynamoDB: Predictable performance, Provisioned throughput, Libraries & mappers.
Data modeling: Tables & items, Read & write patterns, Time series data.

Slide 124

Slide 124 text

In summary...
DynamoDB: Predictable performance, Provisioned throughput, Libraries & mappers.
Data modeling: Tables & items, Read & write patterns, Time series data.
Partitioning: Automatic partitioning, Hot and cold data, Size/throughput ratio.

Slide 125

Slide 125 text

In summary...
DynamoDB: Predictable performance, Provisioned throughput, Libraries & mappers.
Data modeling: Tables & items, Read & write patterns, Time series data.
Partitioning: Automatic partitioning, Hot and cold data, Size/throughput ratio.
Analytics: Elastic MapReduce, Hive queries, Backup & restore.

Slide 126

Slide 126 text

DynamoDB free tier: 5 writes and 10 strongly consistent reads per second, plus 100 MB of storage.

Slide 127

Slide 127 text

aws.amazon.com/dynamodb aws.amazon.com/documentation/dynamodb best practice + sample code

Slide 128

Slide 128 text

Thank you!

Slide 129

Slide 129 text

Q & A. @mza