Structured Storage: Scaling

NoSQL

Document store
• Popular: MongoDB, CouchDB
• Semi-structured data (XML, JSON, etc.)

Key-value store
• Popular: Cassandra, Redis
• Schema-less

Graph database
• Popular: Neo4j, FlockDB
• Stores the relationships between data as a graph

Etc.
• Many others

DynamoDB
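Creating a table with the Java low-level API

    // Packages of the original (2011-12-05 API version) AWS SDK for Java.
    import com.amazonaws.services.dynamodb.AmazonDynamoDBClient;
    import com.amazonaws.services.dynamodb.model.*;

    // credentials: an AWSCredentials instance created elsewhere.
    AmazonDynamoDBClient client = new AmazonDynamoDBClient(credentials);
    String tableName = "ProductCatalog";

    // Hash key: a numeric attribute called "Id".
    KeySchemaElement hashKey = new KeySchemaElement()
        .withAttributeName("Id")
        .withAttributeType("N");
    KeySchema ks = new KeySchema().withHashKeyElement(hashKey);

    // Provision 10 read and 10 write capacity units.
    ProvisionedThroughput provisionedThroughput = new ProvisionedThroughput()
        .withReadCapacityUnits(10L)
        .withWriteCapacityUnits(10L);

    CreateTableRequest request = new CreateTableRequest()
        .withTableName(tableName)
        .withKeySchema(ks)
        .withProvisionedThroughput(provisionedThroughput);

    CreateTableResult result = client.createTable(request);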
Deleting a table? Careful...

    >>> conn.delete_table(table)

This permanently deletes the table!

I suggest using three different users:
1. Dev/Test
2. Production
3. Read-only

You can manage their permissions with IAM.
DynamoDB Data Types

Scalar
• Number (positive or negative, up to 38 digits)
• String (UTF-8)
• Binary

Multi-valued
• Number Set
• String Set
• Binary Set

Values in a set must be unique. Values are not ordered.
Primary Key: Hash / Hash + Range

Hash
• The key is hashed across the partitions to distribute the workload evenly.

Hash + Range
• When querying, the hash attribute must be matched exactly, but a range condition can be specified for the range attribute (e.g. all orders in the last 60 minutes).
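As a rough illustration of the two key types, here is a minimal in-memory sketch. It is a toy model, not DynamoDB's actual partitioning scheme: the `ToyTable` class, the MD5-based `partition_for` function, and the partition count are all assumptions made up for this example.

```python
import bisect
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; the real service manages partitions itself


def partition_for(hash_key):
    """Map a hash key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


class ToyTable:
    """Hash + range table: items under one hash key are kept sorted by range key."""

    def __init__(self):
        # partition number -> hash key -> sorted list of (range_key, item)
        self.partitions = defaultdict(lambda: defaultdict(list))

    def put(self, hash_key, range_key, item):
        rows = self.partitions[partition_for(hash_key)][hash_key]
        keys = [rk for rk, _ in rows]
        i = bisect.bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)        # same primary key: overwrite
        else:
            rows.insert(i, (range_key, item))  # keep the list sorted

    def query(self, hash_key, range_from, range_to):
        """Exact match on the hash key, inclusive range on the range key."""
        rows = self.partitions[partition_for(hash_key)].get(hash_key, [])
        keys = [rk for rk, _ in rows]
        lo = bisect.bisect_left(keys, range_from)
        hi = bisect.bisect_right(keys, range_to)
        return [item for _, item in rows[lo:hi]]


# Orders keyed by customer (hash) and timestamp in seconds (range).
table = ToyTable()
now = 10_000
table.put("customer-1", now - 7200, "old order")
table.put("customer-1", now - 1800, "recent order A")
table.put("customer-1", now - 60, "recent order B")
table.put("customer-2", now - 30, "someone else's order")

# "All orders in the last 60 minutes" for one customer.
last_hour = table.query("customer-1", now - 3600, now)
```

Because the hash key selects one sorted list, the range condition becomes a binary search rather than a walk over the whole table.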
Query / Scan

Query
• Searches only on the primary key (hash or composite).
• Supports a subset of comparison operators on key attribute values.
• Returns at most 1 MB per Query operation.
• More efficient than Scan.

Scan
• Scans the entire table.
• Supports a specific set of comparison operators (e.g. <=, >, ==).
• Returns at most 1 MB per Scan operation.
• Gets slower as the table grows.

http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
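The difference in work done can be sketched with a toy in-memory table that counts how many items each operation examines. This is only an illustration: `OrdersTable`, its `examined` counter, and the attribute names are hypothetical and are not part of any DynamoDB API.

```python
from collections import defaultdict


class OrdersTable:
    """Toy table with 'customer' as hash key and 'order_id' as range key."""

    def __init__(self):
        self.by_hash = defaultdict(list)  # hash key -> list of items
        self.examined = 0                 # items touched by the last operation

    def put(self, item):
        self.by_hash[item["customer"]].append(item)

    def query(self, customer, min_order_id=0):
        """Key-based lookup: only items under one hash key are examined."""
        candidates = self.by_hash.get(customer, [])
        self.examined = len(candidates)
        return [i for i in candidates if i["order_id"] >= min_order_id]

    def scan(self, predicate):
        """Full scan: every item in the table is examined, then filtered."""
        all_items = [i for items in self.by_hash.values() for i in items]
        self.examined = len(all_items)
        return [i for i in all_items if predicate(i)]


orders = OrdersTable()
for customer in ("alice", "bob", "carol"):
    for order_id in range(100):
        orders.put({"customer": customer, "order_id": order_id})

via_query = orders.query("alice", min_order_id=90)
query_examined = orders.examined

via_scan = orders.scan(lambda i: i["customer"] == "alice" and i["order_id"] >= 90)
scan_examined = orders.examined
```

Both calls return the same ten items, but the scan touches every item in the table while the query touches only one customer's items, and that gap widens as the table grows.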
Query vs. Scan?

"Limiting the capabilities of Query's comparison operators was a deliberate decision, to ensure that this API's performance will always remain predictable, no matter the scale of the table (size or throughput). This limitation forces the developer to perform more work upfront, but it will yield a scalable workload no matter how much it grows. An operation like CONTAINS could seem appealing on paper, but its performance would start slowing progressively as the dataset size grows, eventually requiring a painful rearchitecture down the road."

Stefano @ AWS (on discussion forums)
Writing to DynamoDB

How to solve the "lost update" problem: Optimistic Concurrency Control (a.k.a. conditional writes)

Put/Update/Delete are always ACID, with isolation only at the item level:
• Atomicity
• Consistency
• Isolation (only at item level)
• Durability

http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html
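The idea behind conditional writes can be sketched without any AWS dependency. The following is a minimal in-memory model, not the DynamoDB API: `ToyStore`, `ConditionalCheckFailed`, `add_stock`, and the `version` attribute are all names invented for this example.

```python
import threading


class ConditionalCheckFailed(Exception):
    """Raised when the stored item no longer matches the expected values."""


class ToyStore:
    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # makes each single-item write atomic

    def get(self, key):
        with self._lock:
            return dict(self._items[key])

    def put(self, key, item, expected=None):
        """Write item only if every attribute in `expected` is unchanged."""
        with self._lock:
            current = self._items.get(key, {})
            if expected is not None:
                for attr, value in expected.items():
                    if current.get(attr) != value:
                        raise ConditionalCheckFailed(attr)
            self._items[key] = dict(item)


def add_stock(store, key, delta, max_retries=5):
    """Read, modify, write back conditionally; retry if someone else won."""
    for _ in range(max_retries):
        item = store.get(key)
        updated = dict(item, stock=item["stock"] + delta,
                       version=item["version"] + 1)
        try:
            store.put(key, updated, expected={"version": item["version"]})
            return updated
        except ConditionalCheckFailed:
            continue  # our read was stale: re-read and try again
    raise RuntimeError("too much contention")


store = ToyStore()
store.put("sku-1", {"stock": 10, "version": 1})

stale = store.get("sku-1")     # client A reads version 1
add_stock(store, "sku-1", 5)   # client B updates first (now version 2)

try:
    # Client A writes from its stale read; the condition rejects it.
    store.put("sku-1", dict(stale, stock=stale["stock"] - 3, version=2),
              expected={"version": stale["version"]})
    lost_update = True
except ConditionalCheckFailed:
    lost_update = False

final = add_stock(store, "sku-1", -3)  # retrying from a fresh read succeeds
```

Without the `expected` check, client A's write would silently overwrite client B's update. With it, the stale write fails and the retry applies both changes.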
(Eventually) Consistent Reads

Consistent Read
• Consumes more "read capacity" units (2x)
• Can read immediately after a write (2 copies)

Eventually Consistent Read
• Consistency is usually reached within 1,000 ms of the last write
• May read the old or the new value (2 copies)

Let me explain...
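The trade-off can be pictured with a deliberately simplified model in which a write is acknowledged by one copy and reaches a second copy a little later. This is an assumption made for illustration and does not describe DynamoDB's real replication. `ToyReplicatedItem` is a hypothetical name, and the costs 1.0 and 0.5 only encode the 2x ratio from the slide.

```python
class ToyReplicatedItem:
    """One item with an up-to-date copy and a copy that lags behind."""

    def __init__(self, value):
        self.leader = value
        self.follower = value

    def write(self, value):
        # In this toy model the write is acknowledged once the leader has it.
        self.leader = value

    def replicate(self):
        # Background propagation; assumed to happen shortly after the write.
        self.follower = self.leader

    def read(self, consistent=False):
        """Return (value, relative read cost)."""
        if consistent:
            return self.leader, 1.0   # always the latest acknowledged write
        return self.follower, 0.5     # cheaper, but may be stale


item = ToyReplicatedItem("v1")
item.write("v2")

stale = item.read()                 # eventually consistent, before replication
fresh = item.read(consistent=True)  # consistent read sees the new value

item.replicate()
later = item.read()                 # eventually consistent, after replication
```

Right after the write, the cheaper read can still return the old value. Once propagation has happened, both kinds of read agree.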
DynamoDB costs

• $0.0065 per hour for every 10 writes/second
• $0.0065 per hour for every 50 "strong" reads/second
• $1.00 per month per GB

Unlike Scan, Query only operates on matching records, not all records. You only pay for the throughput of the items that match, not for everything scanned.

For large BLOBs or infrequently accessed data, use Amazon S3 (DynamoDB item limit: 64 KB). You can store smaller data elements or file pointers in DynamoDB.
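As a worked example of the price list above, the sketch below estimates a monthly bill. The prices come from the slide. The 720-hour month and the rounding up to whole blocks of 10 writes or 50 reads are assumptions made for this example, and `monthly_cost` is a hypothetical helper.

```python
import math

WRITE_BLOCK_PRICE = 0.0065  # $/hour per 10 writes/second (from the slide)
READ_BLOCK_PRICE = 0.0065   # $/hour per 50 strongly consistent reads/second
STORAGE_PRICE = 1.00        # $/month per GB
HOURS_PER_MONTH = 720       # assumption: a 30-day month


def monthly_cost(writes_per_sec, strong_reads_per_sec, storage_gb):
    """Estimate the monthly cost in dollars under the assumptions above."""
    # Assumption: throughput is paid for in whole blocks, rounded up.
    write_blocks = math.ceil(writes_per_sec / 10)
    read_blocks = math.ceil(strong_reads_per_sec / 50)
    throughput = (write_blocks * WRITE_BLOCK_PRICE
                  + read_blocks * READ_BLOCK_PRICE) * HOURS_PER_MONTH
    storage = storage_gb * STORAGE_PRICE
    return throughput + storage


# 100 writes/s, 500 strong reads/s, 20 GB stored.
estimate = monthly_cost(100, 500, 20)
```

With these inputs the throughput part is 20 blocks at $0.0065 per hour over 720 hours, and storage adds $20 on top.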
But what if the server / storage / datacenter fails?

"On most NoSQL systems you would lose your most recent changes, or the data might be saved but could be offline and unavailable. With DynamoDB, if data is committed just as one entire datacenter burns to the ground, the data is safe, and the application can continue to run without negative impact at exactly the same provisioned throughput rate. The loss of an entire datacenter isn't even inconvenient, and has no impact on your running application performance. Combining rock solid synchronous, multi-datacenter redundancy with average latency in the single digits, and throughput scaling to the millions of requests per second is both an excellent engineering challenge and one often not achieved."

James Hamilton, VP and Distinguished Engineer, Amazon Web Services
Why DynamoDB

• Sorted range keys
• Conditional updates
• Atomic counters
• Structured data and multi-valued data types
• Fetching and updating single attributes
• Strong consistency
• No table size limits
• Live repartitioning
• Disk-only writes
• IOPS per table

Note: there is no explicit way to handle conflicts other than conditions.