Databases Part 2: Choosing the Right Datastore

In light of the CAP theorem, this talk will cover how to analyze your application’s data storage needs and choose a datastore that suits the needs of the application and handles the load on the system.

Myles Megyesi

January 11, 2013

Transcript

  1. CAP Theorem
    It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
    • Consistency
    • Availability
    • Partition tolerance
  2. Consistency
    • All nodes see the same data at the same time
    • All data is always the same
      ◦ No write conflicts
      ◦ No faulty reads
    Solutions:
    • DB Transactions
    • Atomic writes
  3. Availability
    Simply put: the server responds
    • Lots of requests
    • Lots of data
    Solutions:
    • Vertical scaling
      ◦ Mecha server (68 GB cache)
    • Horizontal scaling (increase number of servers)
      ◦ read slaves
      ◦ shards
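
To make the "shards" bullet concrete, here is a minimal sketch of hash-based shard routing. The shard names and the `shard_for` helper are hypothetical, made up for illustration; the deck does not prescribe a sharding scheme.

```ruby
require "zlib"

# Hypothetical shard names; a real deployment would hold connection details here.
SHARDS = %w[shard_0 shard_1 shard_2].freeze

# Pick a shard for a record key. CRC32 is stable across processes,
# unlike String#hash, which Ruby seeds randomly on each run.
def shard_for(key, shards = SHARDS)
  shards[Zlib.crc32(key.to_s) % shards.size]
end
```

The same key always routes to the same shard, so reads find what writes stored. One known trade-off of plain modulo routing is that changing the number of shards remaps most keys.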
  4. Partition tolerance
    The system continues to operate despite arbitrary message loss or failure of part of the system
    Solutions:
    • Databases without a master node
  5. CAP Theorem
    In failure scenarios, what is most important to you?
    1. The data stays consistent (C Camp)
    2. The database stays running (A Camp)
  6. When do we pick?
    • Oftentimes, this is done before the project starts
      ◦ Assume we are using MySQL!
    • Assumptions create coupled code
  7. Assumptions example 2:
    book = Book.create!(title: "Sherlock")
    Book.find(book.id).title.should == "Sherlock"
    # assume the book gets created immediately
    # assume you can read your writes
  8. Assumptions example 3:
    # assume delete happens immediately
    book = Book.create!(title: "Sherlock")
    book.destroy
    Book.count.should == 0
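
One way a test can avoid assuming that writes and deletes are visible immediately is to poll until the expected state appears. The `eventually` helper below is a hypothetical sketch, not part of RSpec or of the deck, and its default timeout and interval are arbitrary.

```ruby
# Retry the block until it returns a truthy value or the timeout expires.
# `timeout` and `interval` are in seconds; both defaults are arbitrary.
def eventually(timeout: 2.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout}s"
    end
    sleep interval
  end
end
```

With such a helper, the check from example 3 could be written as `eventually { Book.count == 0 }`, which passes against a datastore that applies the delete immediately and also against one that applies it after a short delay.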
  9. When do we pick?
    • Once an assumption is made, it is hard to decouple from it
    • The assumptions become littered throughout the system without you knowing
    • Don't make the assumption
      ◦ Choose later
      ◦ Isolate all code that talks to a datastore
      ◦ Expose hidden dependencies
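
A minimal sketch of "isolate all code that talks to a datastore", assuming a repository-style boundary. The class names `MemoryBookRepository` and `Library` are hypothetical and chosen for illustration; the deck does not name a specific pattern.

```ruby
# In-memory implementation, enough for tests and demos.
class MemoryBookRepository
  def initialize
    @records = {}
    @next_id = 1
  end

  # Stores a copy of attrs with an :id added and returns that stored hash.
  def create(attrs)
    id = @next_id
    @next_id += 1
    @records[id] = attrs.merge(id: id)
  end

  def find(id)
    @records[id]
  end

  def delete(id)
    @records.delete(id)
  end

  def count
    @records.size
  end
end

# Application code receives its repository as an explicit dependency,
# so it never names a specific database.
class Library
  def initialize(books:)
    @books = books
  end

  def add_book(title)
    @books.create(title: title)
  end

  def remove_book(id)
    @books.delete(id)
  end

  def book_count
    @books.count
  end
end
```

Because `Library` only calls `create`, `delete`, and `count` on the object it is given, a repository backed by MySQL or any other datastore could be passed in later, as long as it responds to the same methods.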
  10. When do we pick?
    • Defer the decision until you absolutely must
      a. Demos
        ▪ Who cares? Put the data anywhere
      b. Production deploy: need to store the data somewhere
      c. A use case that begs for a feature of one particular database
  11. Some questions to consider
    • Decide on a use-case-by-use-case basis
    • Do you need a specific feature?
      ◦ Transactions?
    • Read operations
      ◦ Do you need to read your writes?
      ◦ Do you need to perform some crazy INNER JOIN?
    • Write operations
      ◦ Do you need ACID?
      ◦ Does it need to happen right now?
  12. Group analysis of use cases
    Split into groups! For each of the following use cases, choose which type of database infrastructure would suit it best. Remember to take into account scenarios like:
    1. Consistency
    2. Speed
    3. Availability
    4. Failure
  13. Use case 1
    We are building an application for NAPA Auto Parts. In this system there are Cars and Parts. A car has many parts, and a part can belong to many cars. I would like a web page that shows me a table of the different parts in a given car.
  14. Use case 2
    You are hired by Chase Manhattan Bank to write a payment processing API. At first, this API only needs to support transactions between two Chase customers, but someday they would like to support transactions between users of other banks as well.
  15. Use case 3
    The Chicago Tribune hires you to build a new web application. The main feature of the site is the user's dashboard, where users go to see news tailored to their location and age demographic.
  16. Use case 4
    The New York Stock Exchange needs you to build an API that reports stock prices. Given a ticker symbol, you simply return the current price of the stock.
  17. Conclusion
    • Separate yourself from the datastore so that the dependencies/assumptions are clearer
    • Choose based on absolute necessity
      ◦ Databases are not JavaScript frameworks: don't pick the newest/hottest one