Data Modeling in RethinkDB

Slide 1

Slide 1 text

RethinkDB Meetup San Francisco, California June 29, 2015 Data Modeling in RethinkDB

Slide 2

Slide 2 text

Jorge Silva @thejsj Developer Evangelist @ RethinkDB

Slide 3

Slide 3 text

Introduction Data modeling in other databases

Slide 4

Slide 4 text

SQL • Very basic data types: strings, numbers, and date/time • All data is saved in rows under a predefined column (schema) • Complex data is modeled through relations (foreign keys)

Slide 5

Slide 5 text

SQL • Relations favor data spread across multiple tables • Limited data types favor very normalized data

Slide 6

Slide 6 text

SQL Family Tree People id name 1 Darth 2 Luke 3 Leia ancestor descendant 1 1 1 2 1 3 TreePaths

Slide 7

Slide 7 text

MongoDB • JSON support reduces the need for spreading data across tables • Joins happen at the application layer • Lack of relations and joins favors storing more data on a single document

Slide 8

Slide 8 text

MongoDB Family Tree { "id": 1, "name": "Darth", "children": [ { "name": "Luke" }, { "name": "Leia" } ] }, { "id": 2, "name": "Luke" }, { "id": 3, "name": "Leia", }

Slide 9

Slide 9 text

MongoDB Family Tree { "id": 1, "name": "Darth", "children": [ 2, 3 ] }, { "id": 2, "name": "Luke" }, { "id": 3, "name": "Leia", } • Array of ids, joined at application level

Slide 10

Slide 10 text

RethinkDB • RethinkDB allows for both: SQL style data modeling through joins and MongoDB style modeling through subdocuments

Slide 11

Slide 11 text

Data Types • The introduction of arrays and objects completely changes how you structure your data • Arrays reduce the need for intersection tables • Objects allow for nested subdocuments and easy lookup

Slide 12

Slide 12 text

RethinkDB vs SQL • Even though RethinkDB has joins, it has no relations. • There is no concept of a foreign key or cascading queries at a database-level • Schema enforcement has to be done manually

Slide 13

Slide 13 text

Data Modeling in RethinkDB • Data modeling in RethinkDB is incredibly flexible. This flexibility may seem daunting at first • What you already know is applicable

Slide 14

Slide 14 text

Data Modeling in RethinkDB You still need to think about your data

Slide 15

Slide 15 text

Thinking about your database Tradeoﬀs

Slide 16

Slide 16 text

Main Tradeoff • Do I store this related piece of data as a subdocument or do I save it in a diﬀerent table? • What you do depends on your use case

Slide 17

Slide 17 text

Structure and Performance Considerations

Slide 18

Slide 18 text

How to decide what to do? • The format and structure of your data is not the only consideration • RethinkDB's flexibility allows you to optimize for ease-of-use and performance

Slide 19

Slide 19 text

Considerations • Structure of your data • Joins in query • Atomic updates

Slide 20

Slide 20 text

Structure of your data • Models should still be simple, faithful and avoid redundancy • Data should still be normalized and follow the three normal forms • Only break these rules for denormalization, but don't optimize prematurely

Slide 21

Slide 21 text

Joins in query • How you read your data is important • If you always read your data in conjunction, you should store data in the same document • If data is read separately, it should be store in diﬀerent tables

Slide 22

Slide 22 text

Joins in query: Example • Example: You have 'cities' and 'states' and you need to store the relationships between them

Slide 23

Slide 23 text

Joins in query: Example • Options: • Have an intersection table • Store the whole city as a subdocument • Store the city IDs in the document

Slide 24

Slide 24 text

Joins in query: Example // States { "id": 1, "name": "California" }, { "id": 2, "name": "Oregon" } // Cities { "id": 1, "name": "Portland" }, { "id": 2, "name": "Salem" }, { "id": 3, "name": "San Francisco" }, { "id": 4, "name": "Mountain View" } // State-to-city { "state": 1, "city": 3 }, { "state": 1, "city": 4 }, { "state": 2, "city": 1 }, { "state": 2, "city": 2 }

Slide 25

Slide 25 text

Joins in query: Example # Reading all cities in state r.table("states") .get(stateId).merge({ "cities": r.table("cities") .get_all(r.args(r.table("state-to-city") .get_all(stateId, index="state")["city"] )) }) # Reading one city r.table("cities") .get_all("Mountain View", index="name")

Slide 26

Slide 26 text

Joins in query: Example # Adding a state r.table("states") .insert({ "name": "Washington" }) # Adding a city r.table("cities") .insert({ "name": "San Jose" }) r.table("state-to-city") .insert({ "state": stateId, "city": cityId })

Slide 27

Slide 27 text

Joins in query: Example // States { "id": 1, "name": "California", "cities": [ { "name": "San Francisco" }, { "name": "Mountain View" } ] }, { "id": 2, "name": "Oregon", "cities": [ { "name": "Portland" }, { "name": "Salem" } ] }

Slide 28

Slide 28 text

Joins in query: Example # Getting a state with cities r.table("states") .get(stateId) # Getting a city r.table("states") .filter(lambda row: row["cities"]["name"] == "San Jose" )[0] .map(lambda row: row.filter(lambda city: city["name"] == "San Jose") )

Slide 29

Slide 29 text

Joins in query: Example # Adding a state r.table("states") .insert({ "name": "Washington", "cities": [] }) # Adding a city r.table("states") .get(stateId) .update({ "cities": r.row["cities"].append({ "name": "San Jose" }) })

Slide 30

Slide 30 text

Joins in query: Example // States { "id": 1, "name": "California", "cities": [ 3, 4 ] }, { "id": 2, "name": "Oregon", "cities": [ 1, 2 ] } // Cities { "id": 1, "name": "Portland" }, { "id": 2, "name": "Salem" }, { "id": 3, "name": "San Francisco" }, { "id": 4, "name": "Mountain View" }

Slide 31

Slide 31 text

Joins in query: Example # Getting a state with cities r.table("states") .get(stateId).merge({ "cities": r.table("cities") .get_all(r.row["cities"]) }) # Getting a city r.table("cities") .get_all("Mountain View", index="name")

Slide 32

Slide 32 text

Joins in query: Example # Adding a state r.table("states") .insert({ "name": "Washington", "cities": [] }) # Adding a city r.table("cities") .insert({ "name": "San Jose" }) r.table("states") .get(stateId).update({ "cities": r.row["cities"].append(cityId) })

Slide 33

Slide 33 text

Joins in query: Example # Creating an index for cities r.table("states") .index_create("city", lambda row: row["cities"]["name"], multi=True) # Query city using index r.table("states") .get_all("San Francisco", index="city") Using an index for subdocument

Slide 34

Slide 34 text

Atomic Updates • Because RethinkDB is distributed, atomic updates can only be guaranteed on a per-document basis • If you need atomic updates, you must keep everything in one document

Slide 35

Slide 35 text

• Example: You have an 'orders' table with an order id and an array of products Atomic Updates: Example

Slide 36

Slide 36 text

• Options: • Store everything in one document to make the update atomic • Store it in separate documents and give up atomic guarantees Atomic Updates: Example

Slide 37

Slide 37 text

Atomic Updates: Example # Update total and products in order r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail

Slide 38

Slide 38 text

Slide 39

Slide 39 text

Slide 40

Slide 40 text

Slide 41

Slide 41 text

Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount": r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail

Slide 42

Slide 42 text

Slide 43

Slide 43 text

Slide 44

Slide 44 text

Slide 45

Slide 45 text

Summary • Data modeling in RethinkDB is very flexible • The main tradeoﬀ is subdocuments vs joins • How you model your data will depend heavily on how you interact with the database