Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Data Modeling in RethinkDB

Sponsored · Your Podcast. Everywhere. Effortlessly. Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.

Data Modeling in RethinkDB

Avatar for Jorge Silva

Jorge Silva

June 30, 2015
Tweet

More Decks by Jorge Silva

Other Decks in Programming

Transcript

  1. SQL • Very basic data types: strings, numbers, and date/time

    • All data is saved in rows under a predefined column (schema) • Complex data is modeled through relations (foreign keys)
  2. SQL • Relations favor data spread across multiple tables •

    Limited data types favor very normalized data
  3. SQL Family Tree People id name 1 Darth 2 Luke

    3 Leia ancestor descendant 1 1 1 2 1 3 TreePaths
  4. MongoDB • JSON support reduces the need for spreading data

    across tables • Joins happen at the application layer • Lack of relations and joins favors storing more data on a single document
  5. MongoDB Family Tree { "id": 1, "name": "Darth", "children": [

    { "name": "Luke" }, { "name": "Leia" } ] }, { "id": 2, "name": "Luke" }, { "id": 3, "name": "Leia", }
  6. MongoDB Family Tree { "id": 1, "name": "Darth", "children": [

    2, 3 ] }, { "id": 2, "name": "Luke" }, { "id": 3, "name": "Leia", } • Array of ids, joined at application level
  7. RethinkDB • RethinkDB allows for both: SQL style data modeling

    through joins and MongoDB style modeling through subdocuments
  8. Data Types • The introduction of arrays and objects completely

    changes how you structure your data • Arrays reduce the need for intersection tables • Objects allow for nested subdocuments and easy lookup
  9. RethinkDB vs SQL • Even though RethinkDB has joins, it

    has no relations. • There is no concept of a foreign key or cascading queries at a database-level • Schema enforcement has to be done manually
  10. Data Modeling in RethinkDB • Data modeling in RethinkDB is

    incredibly flexible. This flexibility may seem daunting at first • What you already know is applicable
  11. Main Tradeoff • Do I store this related piece of

    data as a subdocument or do I save it in a different table? • What you do depends on your use case
  12. How to decide what to do? • The format and

    structure of your data is not the only consideration • RethinkDB's flexibility allows you to optimize for ease-of-use and performance
  13. Structure of your data • Models should still be simple,

    faithful and avoid redundancy • Data should still be normalized and follow the three normal forms • Only break these rules for denormalization, but don't optimize prematurely
  14. Joins in query • How you read your data is

    important • If you always read your data in conjunction, you should store data in the same document • If data is read separately, it should be store in different tables
  15. Joins in query: Example • Example: You have 'cities' and

    'states' and you need to store the relationships between them
  16. Joins in query: Example • Options: • Have an intersection

    table • Store the whole city as a subdocument • Store the city IDs in the document
  17. Joins in query: Example // States { "id": 1, "name":

    "California" }, { "id": 2, "name": "Oregon" } // Cities { "id": 1, "name": "Portland" }, { "id": 2, "name": "Salem" }, { "id": 3, "name": "San Francisco" }, { "id": 4, "name": "Mountain View" } // State-to-city { "state": 1, "city": 3 }, { "state": 1, "city": 4 }, { "state": 2, "city": 1 }, { "state": 2, "city": 2 }
  18. Joins in query: Example # Reading all cities in state

    r.table("states") .get(stateId).merge({ "cities": r.table("cities") .get_all(r.args(r.table("state-to-city") .get_all(stateId, index="state")["city"] )) }) # Reading one city r.table("cities") .get_all("Mountain View", index="name")
  19. Joins in query: Example # Adding a state r.table("states") .insert({

    "name": "Washington" }) # Adding a city r.table("cities") .insert({ "name": "San Jose" }) r.table("state-to-city") .insert({ "state": stateId, "city": cityId })
  20. Joins in query: Example // States { "id": 1, "name":

    "California", "cities": [ { "name": "San Francisco" }, { "name": "Mountain View" } ] }, { "id": 2, "name": "Oregon", "cities": [ { "name": "Portland" }, { "name": "Salem" } ] }
  21. Joins in query: Example # Getting a state with cities

    r.table("states") .get(stateId) # Getting a city r.table("states") .filter(lambda row: row["cities"]["name"] == "San Jose" )[0] .map(lambda row: row.filter(lambda city: city["name"] == "San Jose") )
  22. Joins in query: Example # Adding a state r.table("states") .insert({

    "name": "Washington", "cities": [] }) # Adding a city r.table("states") .get(stateId) .update({ "cities": r.row["cities"].append({ "name": "San Jose" }) })
  23. Joins in query: Example // States { "id": 1, "name":

    "California", "cities": [ 3, 4 ] }, { "id": 2, "name": "Oregon", "cities": [ 1, 2 ] } // Cities { "id": 1, "name": "Portland" }, { "id": 2, "name": "Salem" }, { "id": 3, "name": "San Francisco" }, { "id": 4, "name": "Mountain View" }
  24. Joins in query: Example # Getting a state with cities

    r.table("states") .get(stateId).merge({ "cities": r.table("cities") .get_all(r.row["cities"]) }) # Getting a city r.table("cities") .get_all("Mountain View", index="name")
  25. Joins in query: Example # Adding a state r.table("states") .insert({

    "name": "Washington", "cities": [] }) # Adding a city r.table("cities") .insert({ "name": "San Jose" }) r.table("states") .get(stateId).update({ "cities": r.row["cities"].append(cityId) })
  26. Joins in query: Example # Creating an index for cities

    r.table("states") .index_create("city", lambda row: row["cities"]["name"], multi=True) # Query city using index r.table("states") .get_all("San Francisco", index="city") Using an index for subdocument
  27. Atomic Updates • Because RethinkDB is distributed, atomic updates can

    only be guaranteed on a per-document basis • If you need atomic updates, you must keep everything in one document
  28. • Example: You have an 'orders' table with an order

    id and an array of products Atomic Updates: Example
  29. • Options: • Store everything in one document to make

    the update atomic • Store it in separate documents and give up atomic guarantees Atomic Updates: Example
  30. Atomic Updates: Example # Update total and products in order

    r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail
  31. Atomic Updates: Example # Update total and products in order

    r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail
  32. Atomic Updates: Example # Update total and products in order

    r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail
  33. Atomic Updates: Example # Update total and products in order

    r.table("orders").get(order_id).update({ "products": r.row["products"].append(product['id']), "total_price": r.row["total_price"] + product["price"]) }) • The complete query is guaranteed to succeed or fail
  34. Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount":

    r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail
  35. Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount":

    r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail
  36. Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount":

    r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail
  37. Atomic Updates: Example # Update total in order r.table('orders').get(order_id).update({ "total_amount":

    r.row["total_amount"] + product["price"] }) .do(lambda result: # Update products in intersection table # If this fails, the `update` won't be unapplied r.table("orders_products").insert({ "order": order_id, "product": product["id"] }) ) • The complete query is not guaranteed to succeed or fail
  38. Summary • Data modeling in RethinkDB is very flexible •

    The main tradeoff is subdocuments vs joins • How you model your data will depend heavily on how you interact with the database