Modeling_Relational_Data_in_DynamoDB.pdf

Modeling Relational Data in DynamoDB Manoj Fernando Tech Lead @
99X Technology @mjmrz Manoj Fernando Contact Info

Reference http://bit.ly/2O95lEs

Agenda • What is Data Modeling • NoSQL vs Relational
Databases • DynamoDB - Core Concepts • DynamoDB - Relational Data Modeling • Five Step Process for Data Modeling • A Use Case • Further Reading References

What is Data Modeling? Data modeling is how an application
stores data in a given database related to the real-world entities.

NoSQL vs Relational Databases Relational Databases • Optimized for storage
with data normalization • Each table has a strict schema • Each entity has its own table • Tables are connected with relationships • Need more compute power to retrieve data from multiple tables • Performance may degrade as the database scales

NoSQL Databases • Optimized for compute rather than storage •
Promote denormalization for less processing of data • Flexible schema • Designed for highly scalable applications NoSQL vs Relational Databases

DynamoDB • NoSQL offering from AWS • Cloud Native! (Unlike
MongoDB) • Provides consistent performance at any scale

DynamoDB - Consistency Model Millisecond Latency ~10ms Automatically store data
in Three Availability Zones Three options for Reads • Eventual Consistency • Strong Consistency • Transactional Two options for Writes • Standard • Transactional

DynamoDB - Capacity Units We provision/demand capacity units for a
table Specify throughput in terms of capacity units 1 capacity unit can handle 1 write or 1 read DynamoDB pricing is based on capacity units and storage Two type of capacity units • Read Capacity Unit (RCU) • Write Capacity Unit (WCU)

RCU and WCU 1 RCU 1 WCU 4KB data block
1KB data block 1 strongly consistent read per second 1 standard write per second 2 eventually consistent read per second 0.5 transactional write per second 0.5 transactional read per second E.g. If the read item size is 8KB - 2 RCU for strong consistent read - 1 RCUs for eventual consistent read - 4 RCUs for transactional read E.g. If the write item size is 2KB - 2 WCUs for standard write - 4 WCUs for transactional write

DynamoDB Partitions DynamoDB stores data in partitions (SSD volumes) A
table can have one or more partitions Number of partitions for a table depends on two things • Provisioned Throughput ◦ 1 Partition => 1000 WCUs or 3000 RCUs • Table Size ◦ 1 Partition => 10 GB of data If either 10GB or capacity unit limit exceed, DynamoDB adds a NEW partition

Primary Key

Primary Key vs Partitions DynamoDB primary key is used to
identify an item uniquely Primary key can have two forms • Only Partition key • Partition key + Sort key (Composite key) Partition Key(PK) is also known as Hash key Sort Key(SK) is also known as Range key Partition key is used to determine the partition that data is stored

Primary Key vs Partitions Partition Key (PK) Hashing Algorithm F(PK,
PartionCount) P1 P2 P3 P4 Eg: userId

Hot Partition Problem Partition Key (PK) Hashing Algorithm F(PK, PartionCount)
P1 P2 P3 P4 It’s important to choose a partition key with high cardinality

DynamoDB - Relational Data Modeling SELECT * FROM Orders INNER
JOIN Order_Items ON Orders.Order_ID = Order_Items.Order_ID INNER JOIN Products ON Products.Product_ID = Order_Items.Product_ID INNER JOIN Inventories ON Products.Product_ID = Inventories.Product_ID ORDER BY Quantity_on_Hand DESC

Relational Data Modeling - Relational DBs Provides a flexible API
for queries (As seen on previous slide) Limitations • Table joins require significant amount of processing • ACID transaction framework adds significant overhead to write process (It has to wait until all reads are completed in all the tables)

Relational Data Modeling - DynamoDB Achieve scalability by using, •
Schema Flexibility ◦ Store complex hierarchical data within a single item • Composite key design ◦ Store related items close together on the same table - (Using Sort key)

Mindset Change • Requires a shift in thinking from relational
data modeling • Don’t fake a relational model • Must identify the Access Patterns before table design • Most applications need only one table • Identify primary keys and indexes to minimize the number of requests to DynamoDB to satisfy each access pattern

Single Table for the Application • Make use of the
flexible schema to store different types of records in the same table • Identifying the most suitable Partition key and Sort key for each type of record to satisfy access patterns • Identifying secondary indexes for additional access patterns that cannot only be satisfied by the primary key (Partition and Sort keys) SELECT * FROM Table_X WHERE Attribute_Y = "somevalue" As a result - Simplified Queries

Five Steps Process 1. Draw an Entity Relationship Diagram (ERD)
2. Identify the relationships between entities (1:1, 1:N, N:M) 3. List down all the access patterns for each entity 4. Identify the primary key (Partition Key + Sort Key) for each entity 5. Identify the secondary indexes for additional access patterns if required Consider DynamoDB Design Patterns at Each Step

DynamoDB Design Patterns • Composite Key • Adjacency list pattern
• Write Sharding • GSI Write Sharding • GSI Overloading

Use Case - Project Management Tool SAAS Tool Multi-tenant Support
Manage Projects and Employees

Step 01 - Draw an ERD

Step 02 - Identify Relationships

Step 03 - Identify Access Patterns Organization • Organization CRUD
operations • Find all the projects of an organization • Find all the employees of an organization • Find all projects and employees of an organization • Find an organization by name

Step 03 - Identify Access Patterns Project • Project CRUD
operations • Find a project by name • Find the employees assigned to a project • Find agile projects • Find ﬁxed-bid projects • Find on-hold projects

Step 03 - Identify Access Patterns Employee • Employee CRUD
operations • Find all the projects an employee is part of • Find an employee by name

Step 03 - Identify Access Patterns Project-Employee • Assign an
employee to a project

Step 04 - Identify Primary Key Entity Partition Key (PK)
Sort Key (SK) Organization ORG#<org-id> #METADATA#<org-id> Project ORG#<org-id> PRO#<type>#<project-id> Employee ORG#<org-id> EMP#<emp-id> Project-Employee ORG#<org-id>#PRO#<project-id> ORG#<org-id>#EMP#<emp-id>

Step 04 - Identify Primary Key Entity Partition Key (PK)
Sort Key (SK) Organization ORG#<org-id> #METADATA#<org-id> Project ORG#<org-id> PRO#<type>#<project-id> Employee ORG#<org-id> EMP#<emp-id> Project-Employee ORG#<org-id>#PRO#<project-id> ORG#<org-id>#EMP#<emp-id> Stored in the same partition

Example Queries Organization (org-id=1234) Organization CRUD operations - PK=ORG#1234 ,
SK=#METADATA#1234 Find all the projects of an organization - PK=ORG#1234 , SK begins_with(PRO#) Find all the employees of an organization - PK=ORG#1234 , SK begins_with(EMP#) Find both employees and projects - PK=ORG#1234 Find organization by name - Not satisﬁed yet.. Entity Partition Key (PK) Sort Key (SK) Organization ORG#<org-id> #METADATA#<org-id> Project ORG#<org-id> PRO#<type>#<project-id> Employee ORG#<org-id> EMP#<emp-id>

Example Queries Project (org-id=1234, project-id=100) Project CRUD operations - PK=ORG#1234,
SK=PRO#agile#100 Find agile projects - PK= ORG#1234, SK begins_with(PRO#agile) Find fixed-bid projects - PK=ORG#1234, SK begins_with(PRO#fixed-bid) Find the employees assigned to a project - PK=ORG#1234#PRO#100 Find a project by name - Not yet satisfied Find on-hold projects - Not yet satisfied Entity Partition Key (PK) Sort Key (SK) Organization ORG#<org-id> #METADATA#<org-id> Project ORG#<org-id> PRO#<type>#<project-id> Project-Employee ORG#<org-id>#PRO#<project-id> ORG#<org-id>#EMP#<emp-id>

Example Queries Employee (org-id=1234, emp-id=300) Employee CRUD operations - PK=ORG#1234,
SK=EMP#300 Find all projects an employee is part of - Not yet satisﬁed Find all employees by name - Not yet satisﬁed Entity Partition Key (PK) Sort Key (SK) Employee ORG#<org-id> EMP#<emp-id>

Step 05 - Identify Secondary Indexes • Find all projects
an employee is part of - Use an inverted index Table Entity Partition Key (PK) Sort Key (SK) Project-Employee ORG#<org-id>#PRO#<project-id> ORG#<org-id>#EMP#<emp-id> GSI Name GSI Partition Key (PK) GSI Sort Key (SK) Project-Employee-Index ORG#<org-id>#EMP#<emp-id> ORG#<org-id>#PRO#<project-id> Query on GSI - PK=ORG#1234#EMP#300

Step 05 - Identify Secondary Indexes • Find all organizations,
projects, employees by name - GSI Overloading GSI/LSI Name GSI/LSI Partition Key (PK) GSI/LSI Sort Key (filterName) Filter-by-name-index ORG#<org-id> ORG#<org-name> Or EMP#<emp-name> Or PRO#<project-name> Find by org name - PK=ORG#1234 , ﬁlterName=ORG#HappyInc Find by emp name - PK=ORG#1234, ﬁlterName=EMP#Manoj

Advance Text Search Use an Elasticsearch attached to the DynamoDB
Enable DynamoDB streams to asynchronously index table data in ElasticSearch

Step 05 - Identify Secondary Indexes • Find on-hold projects
- Use a Sparse Index You can Query or Scan the GSI to ﬁnd all the on-hold projects GSI Name GSI Partition Key (is_on_hold) On-Hold-Project-Index <any_value> eg: true

Using Filter Conditions • If you need to further filter
the results by non-key attributes • A filter expression is applied after a Query finishes, but before the results are returned. Therefore, a Query consumes the same amount of read capacity, regardless of whether a filter expression is present. • Use filter conditions if the secondary indexes cost more than the filter conditions due to low query velocity or frequency

References First Steps for Modeling Relational Data in DynamoDB https://docs.aws.amazon.com/amazonDynamoDB/latest/developerguide/bp-modeling-nosql.html
Advance Design Patterns for DynamoDB - re:Invent 2018 video https://www.youtube.com/watch?v=HaEPXoXVf2k Database Design for a Mobile App with DynamoDB - (Workshop) https://aws.amazon.com/getting-started/projects/design-a-database-for-a-mobile-app-with-DynamoDB/ AWS DynamoDB Documentation https://docs.aws.amazon.com/amazonDynamoDB/latest/developerguide/Introduction.html

Thank You! Contact Information @mjmrz Manoj Fernando https://www.youtube.com/c/cloudtutorials

Modeling_Relational_Data_in_DynamoDB.pdf

Modeling_Relational_Data_in_DynamoDB.pdf

More Decks by Manoj Fernando

Featured

Transcript