Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Data Modeling in RethinkDB

Data Modeling in RethinkDB

Jorge Silva

June 30, 2015
Tweet

More Decks by Jorge Silva

Other Decks in Programming

Transcript

  1. RethinkDB Meetup
    San Francisco, California
    June 29, 2015
    Data Modeling in
    RethinkDB

    View Slide

  2. Jorge Silva
    @thejsj
    Developer Evangelist @ RethinkDB

    View Slide

  3. Introduction
    Data modeling in other
    databases

    View Slide

  4. SQL
    • Very basic data types: strings,
    numbers, and date/time
    • All data is saved in rows under a
    predefined column (schema)
    • Complex data is modeled through
    relations (foreign keys)

    View Slide

  5. SQL
    • Relations favor data spread across
    multiple tables
    • Limited data types favor very
    normalized data

    View Slide

  6. SQL Family Tree
    People
    id name
    1 Darth
    2 Luke
    3 Leia
    ancestor descendant
    1 1
    1 2
    1 3
    TreePaths

    View Slide

  7. MongoDB
    • JSON support reduces the need for
    spreading data across tables
    • Joins happen at the application layer
    • Lack of relations and joins favors
    storing more data on a single
    document

    View Slide

  8. MongoDB Family Tree
    {
    "id": 1,
    "name": "Darth",
    "children": [
    { "name": "Luke" },
    { "name": "Leia" }
    ]
    }, {
    "id": 2,
    "name": "Luke"
    }, {
    "id": 3,
    "name": "Leia",
    }

    View Slide

  9. MongoDB Family Tree
    {
    "id": 1,
    "name": "Darth",
    "children": [ 2, 3 ]
    }, {
    "id": 2,
    "name": "Luke"
    }, {
    "id": 3,
    "name": "Leia",
    }
    • Array of ids, joined at application level

    View Slide

  10. RethinkDB
    • RethinkDB allows for both: SQL
    style data modeling through joins
    and MongoDB style modeling
    through subdocuments

    View Slide

  11. Data Types
    • The introduction of arrays and
    objects completely changes how you
    structure your data
    • Arrays reduce the need for
    intersection tables
    • Objects allow for nested
    subdocuments and easy lookup

    View Slide

  12. RethinkDB vs SQL
    • Even though RethinkDB has joins, it
    has no relations.
    • There is no concept of a foreign key or
    cascading queries at a database-level
    • Schema enforcement has to be done
    manually

    View Slide

  13. Data Modeling in RethinkDB
    • Data modeling in RethinkDB is
    incredibly flexible. This flexibility
    may seem daunting at first
    • What you already know is
    applicable

    View Slide

  14. Data Modeling in RethinkDB
    You still need to think about
    your data

    View Slide

  15. Thinking about your database
    Tradeoffs

    View Slide

  16. Main Tradeoff
    • Do I store this related piece of
    data as a subdocument or do I
    save it in a different table?
    • What you do depends on your use
    case

    View Slide

  17. Structure and Performance
    Considerations

    View Slide

  18. How to decide what to do?
    • The format and structure of your
    data is not the only consideration
    • RethinkDB's flexibility allows you
    to optimize for ease-of-use and
    performance

    View Slide

  19. Considerations
    • Structure of your data
    • Joins in query
    • Atomic updates

    View Slide

  20. Structure of your data
    • Models should still be simple, faithful
    and avoid redundancy
    • Data should still be normalized and
    follow the three normal forms
    • Only break these rules for
    denormalization, but don't optimize
    prematurely

    View Slide

  21. Joins in query
    • How you read your data is important
    • If you always read your data in
    conjunction, you should store data
    in the same document
    • If data is read separately, it should
    be store in different tables

    View Slide

  22. Joins in query: Example
    • Example: You have 'cities' and
    'states' and you need to store the
    relationships between them

    View Slide

  23. Joins in query: Example
    • Options:
    • Have an intersection table
    • Store the whole city as a
    subdocument
    • Store the city IDs in the document

    View Slide

  24. Joins in query: Example
    // States
    { "id": 1, "name": "California" },
    { "id": 2, "name": "Oregon" }
    // Cities
    { "id": 1, "name": "Portland" },
    { "id": 2, "name": "Salem" },
    { "id": 3, "name": "San Francisco" },
    { "id": 4, "name": "Mountain View" }
    // State-to-city
    { "state": 1, "city": 3 },
    { "state": 1, "city": 4 },
    { "state": 2, "city": 1 },
    { "state": 2, "city": 2 }

    View Slide

  25. Joins in query: Example
    # Reading all cities in state
    r.table("states")
    .get(stateId).merge({
    "cities": r.table("cities")
    .get_all(r.args(r.table("state-to-city")
    .get_all(stateId, index="state")["city"]
    ))
    })
    # Reading one city
    r.table("cities")
    .get_all("Mountain View", index="name")

    View Slide

  26. Joins in query: Example
    # Adding a state
    r.table("states")
    .insert({ "name": "Washington" })
    # Adding a city
    r.table("cities")
    .insert({ "name": "San Jose" })
    r.table("state-to-city")
    .insert({ "state": stateId, "city": cityId })

    View Slide

  27. Joins in query: Example
    // States
    {
    "id": 1,
    "name": "California",
    "cities": [
    { "name": "San Francisco" },
    { "name": "Mountain View" }
    ]
    }, {
    "id": 2,
    "name": "Oregon",
    "cities": [
    { "name": "Portland" },
    { "name": "Salem" }
    ]
    }

    View Slide

  28. Joins in query: Example
    # Getting a state with cities
    r.table("states")
    .get(stateId)
    # Getting a city
    r.table("states")
    .filter(lambda row:
    row["cities"]["name"] == "San Jose"
    )[0]
    .map(lambda row:
    row.filter(lambda city:
    city["name"] == "San Jose")
    )

    View Slide

  29. Joins in query: Example
    # Adding a state
    r.table("states")
    .insert({ "name": "Washington", "cities": [] })
    # Adding a city
    r.table("states")
    .get(stateId)
    .update({ "cities":
    r.row["cities"].append({ "name": "San Jose" })
    })

    View Slide

  30. Joins in query: Example
    // States
    {
    "id": 1,
    "name": "California",
    "cities": [ 3, 4 ]
    }, {
    "id": 2,
    "name": "Oregon",
    "cities": [ 1, 2 ]
    }
    // Cities
    { "id": 1, "name": "Portland" },
    { "id": 2, "name": "Salem" },
    { "id": 3, "name": "San Francisco" },
    { "id": 4, "name": "Mountain View" }

    View Slide

  31. Joins in query: Example
    # Getting a state with cities
    r.table("states")
    .get(stateId).merge({
    "cities": r.table("cities")
    .get_all(r.row["cities"])
    })
    # Getting a city
    r.table("cities")
    .get_all("Mountain View", index="name")

    View Slide

  32. Joins in query: Example
    # Adding a state
    r.table("states")
    .insert({ "name": "Washington", "cities": [] })
    # Adding a city
    r.table("cities")
    .insert({ "name": "San Jose" })
    r.table("states")
    .get(stateId).update({
    "cities": r.row["cities"].append(cityId)
    })

    View Slide

  33. Joins in query: Example
    # Creating an index for cities
    r.table("states")
    .index_create("city", lambda row:
    row["cities"]["name"], multi=True)
    # Query city using index
    r.table("states")
    .get_all("San Francisco", index="city")
    Using an index for subdocument

    View Slide

  34. Atomic Updates
    • Because RethinkDB is distributed,
    atomic updates can only be
    guaranteed on a per-document
    basis
    • If you need atomic updates, you
    must keep everything in one
    document

    View Slide

  35. • Example: You have an 'orders'
    table with an order id and an
    array of products
    Atomic Updates: Example

    View Slide

  36. • Options:
    • Store everything in one
    document to make the update
    atomic
    • Store it in separate documents
    and give up atomic guarantees
    Atomic Updates: Example

    View Slide

  37. Atomic Updates: Example
    # Update total and products in order
    r.table("orders").get(order_id).update({
    "products": r.row["products"].append(product['id']),
    "total_price": r.row["total_price"] + product["price"])
    })
    • The complete query is
    guaranteed to succeed or fail

    View Slide

  38. Atomic Updates: Example
    # Update total and products in order
    r.table("orders").get(order_id).update({
    "products": r.row["products"].append(product['id']),
    "total_price": r.row["total_price"] + product["price"])
    })
    • The complete query is
    guaranteed to succeed or fail

    View Slide

  39. Atomic Updates: Example
    # Update total and products in order
    r.table("orders").get(order_id).update({
    "products": r.row["products"].append(product['id']),
    "total_price": r.row["total_price"] + product["price"])
    })
    • The complete query is
    guaranteed to succeed or fail

    View Slide

  40. Atomic Updates: Example
    # Update total and products in order
    r.table("orders").get(order_id).update({
    "products": r.row["products"].append(product['id']),
    "total_price": r.row["total_price"] + product["price"])
    })
    • The complete query is
    guaranteed to succeed or fail

    View Slide

  41. Atomic Updates: Example
    # Update total in order
    r.table('orders').get(order_id).update({
    "total_amount": r.row["total_amount"] + product["price"]
    })
    .do(lambda result:
    # Update products in intersection table
    # If this fails, the `update` won't be unapplied
    r.table("orders_products").insert({
    "order": order_id, "product": product["id"]
    })
    )
    • The complete query is not
    guaranteed to succeed or fail

    View Slide

  42. Atomic Updates: Example
    # Update total in order
    r.table('orders').get(order_id).update({
    "total_amount": r.row["total_amount"] + product["price"]
    })
    .do(lambda result:
    # Update products in intersection table
    # If this fails, the `update` won't be unapplied
    r.table("orders_products").insert({
    "order": order_id, "product": product["id"]
    })
    )
    • The complete query is not
    guaranteed to succeed or fail

    View Slide

  43. Atomic Updates: Example
    # Update total in order
    r.table('orders').get(order_id).update({
    "total_amount": r.row["total_amount"] + product["price"]
    })
    .do(lambda result:
    # Update products in intersection table
    # If this fails, the `update` won't be unapplied
    r.table("orders_products").insert({
    "order": order_id, "product": product["id"]
    })
    )
    • The complete query is not
    guaranteed to succeed or fail

    View Slide

  44. Atomic Updates: Example
    # Update total in order
    r.table('orders').get(order_id).update({
    "total_amount": r.row["total_amount"] + product["price"]
    })
    .do(lambda result:
    # Update products in intersection table
    # If this fails, the `update` won't be unapplied
    r.table("orders_products").insert({
    "order": order_id, "product": product["id"]
    })
    )
    • The complete query is not
    guaranteed to succeed or fail

    View Slide

  45. Summary
    • Data modeling in RethinkDB is very
    flexible
    • The main tradeoff is subdocuments
    vs joins
    • How you model your data will
    depend heavily on how you interact
    with the database

    View Slide

  46. Summary
    Think about your data!

    View Slide

  47. Questions?

    View Slide