Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Data Modeling in RethinkDB

Data Modeling in RethinkDB

Jorge Silva

June 30, 2015
Tweet

More Decks by Jorge Silva

Other Decks in Programming

Transcript

  1. SQL • Very basic data types: strings, numbers, and date/time

    • All data is saved in rows under a predefined column (schema) • Complex data is modeled through relations (foreign keys)
  2. SQL • Relations favor data spread across multiple tables •

    Limited data types favor very normalized data
  3. SQL Family Tree People id name 1 Darth 2 Luke

    3 Leia ancestor descendant 1 1 1 2 1 3 TreePaths
  4. MongoDB • JSON support reduces the need for spreading data

    across tables • Joins happen at the application layer • Lack of relations and joins favors storing more data on a single document
  5. MongoDB Family Tree { "id": 1, "name": "Darth", "children": [

    { "name": "Luke" }, { "name": "Leia" } ] }, { "id": 2, "name": "Luke" }, { "id": 3, "name": "Leia", }
  6. MongoDB Family Tree { "id": 1, "name": "Darth", "children": [

    2, 3 ] }, { "id": 2, "name": "Luke" }, { "id": 3, "name": "Leia", } • Array of ids, joined at application level
  7. RethinkDB • RethinkDB allows for both: SQL style data modeling

    through joins and MongoDB style modeling through subdocuments
  8. Data Types • The introduction of arrays and objects completely

    changes how you structure your data • Arrays reduce the need for intersection tables • Objects allow for nested subdocuments and easy lookup
  9. RethinkDB vs SQL • Even though RethinkDB has joins, it

    has no relations. • There is no concept of a foreign key or cascading queries at a database-level • Schema enforcement has to be done manually
  10. Data Modeling in RethinkDB • Data modeling in RethinkDB is

    incredibly flexible. This flexibility may seem daunting at first • What you already know is applicable
  11. Main Tradeoff • Do I store this related piece of

    data as a subdocument or do I save it in a different table? • What you do depends on your use case
  12. How to decide what to do? • The format and

    structure of your data is not the only consideration • RethinkDB's flexibility allows you to optimize for ease-of-use and performance
  13. Structure of your data • Models should still be simple,

    faithful and avoid redundancy • Data should still be normalized and follow the three normal forms • Only break these rules for denormalization, but don't optimize prematurely
  14. Joins in query • How you read your data is

    important • If you always read your data in conjunction, you should store data in the same document • If data is read separately, it should be store in different tables
  15. Joins in query: Example • Example: You have 'cities' and

    'states' and you need to store the relationships between them
  16. Joins in query: Example • Options: • Have an intersection

    table • Store the whole city as a subdocument • Store the city IDs in the document
  17. Joins in query: Example // States { "id": 1, "name":

    "California" }, { "id": 2, "name": "Oregon" } // Cities { "id": 1, "name": "Portland" }, { "id": 2, "name": "Salem" }, { "id": 3, "name": "San Francisco" }, { "id": 4, "name": "Mountain View" } // State-to-city { "state": 1, "city": 3 }, { "state": 1, "city": 4 }, { "state": 2, "city": 1 }, { "state": 2, "city": 2 }
  18. Joins in query: Example # Reading all cities in state

    r.table("states") .get(stateId).merge({ "cities": r.table("cities") .get_all(r.args(r.table("state-to-city") .get_all(stateId, index="state")["city"] )) }) # Reading one city r.table("cities") .get_all("Mountain View", index="name")
  19. Joins in query: Example # Adding a state r.table("states") .insert({

    "name": "Washington" }) # Adding a city r.table("cities") .insert({ "name": "San Jose" }) r.table("state-to-city") .insert({ "state": stateId, "city": cityId })
  20. Joins in query: Example // States { "id": 1, "name":

    "California", "cities": [ { "name": "San Francisco" }, { "name": "Mountain View" } ] }, { "id": 2, "name": "Oregon", "cities": [ { "name": "Portland" }, { "name": "Salem" } ] }
  21. Joins in query: Example # Getting a state with cities

    r.table("states") .get(stateId) # Getting a city r.table("states") .filter(lambda row: row["cities"]["name"] == "San Jose" )[0] .map(lambda row: row.filter(lambda city: city["name"] == "San Jose") )
  22. Joins in query: Example # Adding a state r.table("states") .insert({

    "name": "Washington", "cities": [] }) # Adding a city r.table("states") .get(stateId) .update({ "cities": r.row["cities"].append({ "name": "San Jose" }) })
  23. Joins in query: Example // States { "id": 1, "name":

    "California", "cities": [ 3, 4 ] }, { "id": 2, "name": "Oregon", "cities": [ 1, 2 ] } // Cities { "id": 1, "name": "Portland" }, { "id": 2, "name": "Salem" }, { "id": 3, "name": "San Francisco" }, { "id": 4, "name": "Mountain View" }
  24. Joins in query: Example # Getting a state with cities

    r.table("states") .get(stateId).merge({ "cities": r.table("cities") .get_all(r.row["cities"]) }) # Getting a city r.table("cities") .get_all("Mountain View", index="name")
  25. Joins in query: Example # Adding a state r.table("states") .insert({

    "name": "Washington", "cities": [] }) # Adding a city r.table("cities") .insert({ "name": "San Jose" }) r.table("states") .get(stateId).update({ "cities": r.row["cities"].append(cityId) })
  26. Joins in query: Example # Creating an index for cities

    r.table("states") .index_create("city", lambda row: row["cities"]["name"], multi=True) # Query city using index r.table("states") .get_all("San Francisco", index="city") Using an index for subdocument
  27. Atomic Updates • Because RethinkDB is distributed, atomic updates can

    only be guaranteed on a per-document basis • If you need atomic updates, you must keep everything in one document
  28. • Example: You have an 'orders' table with an order

    id and an array of products Atomic Updates: Example
  29. • Options: • Store everything in one document to make

    the update atomic • Store it in separate documents and give up atomic guarantees Atomic Updates: Example
  30. Atomic Updates: Example # Update total and products in order

    r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail
  31. Atomic Updates: Example # Update total and products in order

    r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail
  32. Atomic Updates: Example # Update total and products in order

    r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail
  33. Atomic Updates: Example # Update total and products in order

    r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail
  34. Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount":

    r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail
  35. Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount":

    r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail
  36. Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount":

    r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail
  37. Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount":

    r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail
  38. Summary • Data modeling in RethinkDB is very flexible •

    The main tradeoff is subdocuments vs joins • How you model your data will depend heavily on how you interact with the database