Slide 1

Slide 1 text

Modeling Relational Data in DynamoDB Manoj Fernando Tech Lead @ 99X Technology @mjmrz Manoj Fernando Contact Info

Slide 2

Slide 2 text

Reference http://bit.ly/2O95lEs

Slide 3

Slide 3 text

Agenda ● What is Data Modeling ● NoSQL vs Relational Databases ● DynamoDB - Core Concepts ● DynamoDB - Relational Data Modeling ● Five Step Process for Data Modeling ● A Use Case ● Further Reading References

Slide 4

Slide 4 text

What is Data Modeling? Data modeling is how an application stores data in a given database related to the real-world entities.

Slide 5

Slide 5 text

NoSQL vs Relational Databases Relational Databases ● Optimized for storage with data normalization ● Each table has a strict schema ● Each entity has its own table ● Tables are connected with relationships ● Need more compute power to retrieve data from multiple tables ● Performance may degrade as the database scales

Slide 6

Slide 6 text

NoSQL Databases ● Optimized for compute rather than storage ● Promote denormalization for less processing of data ● Flexible schema ● Designed for highly scalable applications NoSQL vs Relational Databases

Slide 7

Slide 7 text

DynamoDB ● NoSQL offering from AWS ● Cloud Native! (Unlike MongoDB) ● Provides consistent performance at any scale

Slide 8

Slide 8 text

DynamoDB - Consistency Model Millisecond Latency ~10ms Automatically store data in Three Availability Zones Three options for Reads ● Eventual Consistency ● Strong Consistency ● Transactional Two options for Writes ● Standard ● Transactional

Slide 9

Slide 9 text

DynamoDB - Capacity Units We provision/demand capacity units for a table Specify throughput in terms of capacity units 1 capacity unit can handle 1 write or 1 read DynamoDB pricing is based on capacity units and storage Two type of capacity units ● Read Capacity Unit (RCU) ● Write Capacity Unit (WCU)

Slide 10

Slide 10 text

RCU and WCU 1 RCU 1 WCU 4KB data block 1KB data block 1 strongly consistent read per second 1 standard write per second 2 eventually consistent read per second 0.5 transactional write per second 0.5 transactional read per second E.g. If the read item size is 8KB - 2 RCU for strong consistent read - 1 RCUs for eventual consistent read - 4 RCUs for transactional read E.g. If the write item size is 2KB - 2 WCUs for standard write - 4 WCUs for transactional write

Slide 11

Slide 11 text

DynamoDB Partitions DynamoDB stores data in partitions (SSD volumes) A table can have one or more partitions Number of partitions for a table depends on two things ● Provisioned Throughput ○ 1 Partition => 1000 WCUs or 3000 RCUs ● Table Size ○ 1 Partition => 10 GB of data If either 10GB or capacity unit limit exceed, DynamoDB adds a NEW partition

Slide 12

Slide 12 text

Primary Key

Slide 13

Slide 13 text

Primary Key vs Partitions DynamoDB primary key is used to identify an item uniquely Primary key can have two forms ● Only Partition key ● Partition key + Sort key (Composite key) Partition Key(PK) is also known as Hash key Sort Key(SK) is also known as Range key Partition key is used to determine the partition that data is stored

Slide 14

Slide 14 text

Primary Key vs Partitions Partition Key (PK) Hashing Algorithm F(PK, PartionCount) P1 P2 P3 P4 Eg: userId

Slide 15

Slide 15 text

Hot Partition Problem Partition Key (PK) Hashing Algorithm F(PK, PartionCount) P1 P2 P3 P4 It’s important to choose a partition key with high cardinality

Slide 16

Slide 16 text

DynamoDB - Relational Data Modeling SELECT * FROM Orders INNER JOIN Order_Items ON Orders.Order_ID = Order_Items.Order_ID INNER JOIN Products ON Products.Product_ID = Order_Items.Product_ID INNER JOIN Inventories ON Products.Product_ID = Inventories.Product_ID ORDER BY Quantity_on_Hand DESC

Slide 17

Slide 17 text

Relational Data Modeling - Relational DBs Provides a flexible API for queries (As seen on previous slide) Limitations ● Table joins require significant amount of processing ● ACID transaction framework adds significant overhead to write process (It has to wait until all reads are completed in all the tables)

Slide 18

Slide 18 text

Relational Data Modeling - DynamoDB Achieve scalability by using, ● Schema Flexibility ○ Store complex hierarchical data within a single item ● Composite key design ○ Store related items close together on the same table - (Using Sort key)

Slide 19

Slide 19 text

Mindset Change ● Requires a shift in thinking from relational data modeling ● Don’t fake a relational model ● Must identify the Access Patterns before table design ● Most applications need only one table ● Identify primary keys and indexes to minimize the number of requests to DynamoDB to satisfy each access pattern

Slide 20

Slide 20 text

Single Table for the Application ● Make use of the flexible schema to store different types of records in the same table ● Identifying the most suitable Partition key and Sort key for each type of record to satisfy access patterns ● Identifying secondary indexes for additional access patterns that cannot only be satisfied by the primary key (Partition and Sort keys) SELECT * FROM Table_X WHERE Attribute_Y = "somevalue" As a result - Simplified Queries

Slide 21

Slide 21 text

Five Steps Process 1. Draw an Entity Relationship Diagram (ERD) 2. Identify the relationships between entities (1:1, 1:N, N:M) 3. List down all the access patterns for each entity 4. Identify the primary key (Partition Key + Sort Key) for each entity 5. Identify the secondary indexes for additional access patterns if required Consider DynamoDB Design Patterns at Each Step

Slide 22

Slide 22 text

DynamoDB Design Patterns ● Composite Key ● Adjacency list pattern ● Write Sharding ● GSI Write Sharding ● GSI Overloading

Slide 23

Slide 23 text

Use Case - Project Management Tool SAAS Tool Multi-tenant Support Manage Projects and Employees

Slide 24

Slide 24 text

Step 01 - Draw an ERD

Slide 25

Slide 25 text

Step 02 - Identify Relationships

Slide 26

Slide 26 text

Step 03 - Identify Access Patterns Organization ● Organization CRUD operations ● Find all the projects of an organization ● Find all the employees of an organization ● Find all projects and employees of an organization ● Find an organization by name

Slide 27

Slide 27 text

Step 03 - Identify Access Patterns Project ● Project CRUD operations ● Find a project by name ● Find the employees assigned to a project ● Find agile projects ● Find fixed-bid projects ● Find on-hold projects

Slide 28

Slide 28 text

Step 03 - Identify Access Patterns Employee ● Employee CRUD operations ● Find all the projects an employee is part of ● Find an employee by name

Slide 29

Slide 29 text

Step 03 - Identify Access Patterns Project-Employee ● Assign an employee to a project

Slide 30

Slide 30 text

Step 04 - Identify Primary Key Entity Partition Key (PK) Sort Key (SK) Organization ORG# #METADATA# Project ORG# PRO## Employee ORG# EMP# Project-Employee ORG##PRO# ORG##EMP#

Slide 31

Slide 31 text

Step 04 - Identify Primary Key Entity Partition Key (PK) Sort Key (SK) Organization ORG# #METADATA# Project ORG# PRO## Employee ORG# EMP# Project-Employee ORG##PRO# ORG##EMP# Stored in the same partition

Slide 32

Slide 32 text

Example Queries Organization (org-id=1234) Organization CRUD operations - PK=ORG#1234 , SK=#METADATA#1234 Find all the projects of an organization - PK=ORG#1234 , SK begins_with(PRO#) Find all the employees of an organization - PK=ORG#1234 , SK begins_with(EMP#) Find both employees and projects - PK=ORG#1234 Find organization by name - Not satisfied yet.. Entity Partition Key (PK) Sort Key (SK) Organization ORG# #METADATA# Project ORG# PRO## Employee ORG# EMP#

Slide 33

Slide 33 text

Example Queries Project (org-id=1234, project-id=100) Project CRUD operations - PK=ORG#1234, SK=PRO#agile#100 Find agile projects - PK= ORG#1234, SK begins_with(PRO#agile) Find fixed-bid projects - PK=ORG#1234, SK begins_with(PRO#fixed-bid) Find the employees assigned to a project - PK=ORG#1234#PRO#100 Find a project by name - Not yet satisfied Find on-hold projects - Not yet satisfied Entity Partition Key (PK) Sort Key (SK) Organization ORG# #METADATA# Project ORG# PRO## Project-Employee ORG##PRO# ORG##EMP#

Slide 34

Slide 34 text

Example Queries Employee (org-id=1234, emp-id=300) Employee CRUD operations - PK=ORG#1234, SK=EMP#300 Find all projects an employee is part of - Not yet satisfied Find all employees by name - Not yet satisfied Entity Partition Key (PK) Sort Key (SK) Employee ORG# EMP#

Slide 35

Slide 35 text

Step 05 - Identify Secondary Indexes ● Find all projects an employee is part of - Use an inverted index Table Entity Partition Key (PK) Sort Key (SK) Project-Employee ORG##PRO# ORG##EMP# GSI Name GSI Partition Key (PK) GSI Sort Key (SK) Project-Employee-Index ORG##EMP# ORG##PRO# Query on GSI - PK=ORG#1234#EMP#300

Slide 36

Slide 36 text

Step 05 - Identify Secondary Indexes ● Find all organizations, projects, employees by name - GSI Overloading GSI/LSI Name GSI/LSI Partition Key (PK) GSI/LSI Sort Key (filterName) Filter-by-name-index ORG# ORG# Or EMP# Or PRO# Find by org name - PK=ORG#1234 , filterName=ORG#HappyInc Find by emp name - PK=ORG#1234, filterName=EMP#Manoj

Slide 37

Slide 37 text

Advance Text Search Use an Elasticsearch attached to the DynamoDB Enable DynamoDB streams to asynchronously index table data in ElasticSearch

Slide 38

Slide 38 text

Step 05 - Identify Secondary Indexes ● Find on-hold projects - Use a Sparse Index You can Query or Scan the GSI to find all the on-hold projects GSI Name GSI Partition Key (is_on_hold) On-Hold-Project-Index eg: true

Slide 39

Slide 39 text

Using Filter Conditions ● If you need to further filter the results by non-key attributes ● A filter expression is applied after a Query finishes, but before the results are returned. Therefore, a Query consumes the same amount of read capacity, regardless of whether a filter expression is present. ● Use filter conditions if the secondary indexes cost more than the filter conditions due to low query velocity or frequency

Slide 40

Slide 40 text

References First Steps for Modeling Relational Data in DynamoDB https://docs.aws.amazon.com/amazonDynamoDB/latest/developerguide/bp-modeling-nosql.html Advance Design Patterns for DynamoDB - re:Invent 2018 video https://www.youtube.com/watch?v=HaEPXoXVf2k Database Design for a Mobile App with DynamoDB - (Workshop) https://aws.amazon.com/getting-started/projects/design-a-database-for-a-mobile-app-with-DynamoDB/ AWS DynamoDB Documentation https://docs.aws.amazon.com/amazonDynamoDB/latest/developerguide/Introduction.html

Slide 41

Slide 41 text

Thank You! Contact Information @mjmrz Manoj Fernando https://www.youtube.com/c/cloudtutorials