Databases Part 2: Choosing the Right Datastore

CAP Theorem
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
• Consistency
• Availability
• Partition tolerance

Consistency
• All nodes see the same data at the same time
• All data is always the same
  ◦ No write conflicts
  ◦ No faulty reads
Solutions:
• DB Transactions
• Atomic writes

Availability
Simply put: the server responds
• Lots of requests
• Lots of data
Solutions:
• Vertical scaling
  ◦ Mecha server (68 GB cache)
• Horizontal scaling (increase number of servers)
  ◦ read slaves
  ◦ shards
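
Horizontal scaling by sharding can be sketched roughly as follows. This is a minimal illustration, not a production design: `ShardRouter` and the shard names are hypothetical, the shard list is assumed to be fixed, and resharding is ignored entirely.

```ruby
require "zlib"

# Hypothetical router (not from the deck): maps a record key to one of a
# fixed set of shards.
class ShardRouter
  def initialize(shard_names)
    raise ArgumentError, "need at least one shard" if shard_names.empty?
    @shard_names = shard_names
  end

  # CRC32 is stable across processes (unlike Object#hash), so the same key
  # always routes to the same shard.
  def shard_for(key)
    @shard_names[Zlib.crc32(key.to_s) % @shard_names.size]
  end
end

# Assumed shard names, for illustration only.
router = ShardRouter.new(%w[users_0 users_1 users_2])
puts router.shard_for("user:42")
```

Every request for a given key lands on the same server, so each server only has to hold and serve a fraction of the data.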

Partition tolerance
The system continues to operate despite arbitrary message loss or failure of part of the system
Solutions:
• Databases without a master node

CAP Theorem
In failure scenarios, what is most important to you?
1. The data stays consistent (C Camp)
2. The database stays running (A Camp)

Picking the right datastore
• Assumption: You actually get to pick the datastore!

When do we pick?
• Often, this is done before the project starts
  ◦ Assume we are using MySQL!
• Assumptions create coupled code

Assumptions example 1:
    class User < ActiveRecord::Base
      validates :email, :uniqueness => true
    end
    User.create!(email: "[email protected]")
    # assume that this is atomic
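
Why the atomicity assumption matters can be shown without a database. The sketch below is plain Ruby with hypothetical in-memory stores (`NaiveUserStore`, `ConstrainedUserStore`, and the example address are all made up for illustration): a check-then-insert sequence lets two interleaved requests both pass the check, while a store that checks and writes in one guarded step, analogous to a unique index, rejects the second write.

```ruby
require "set"

# Hypothetical store with no uniqueness guarantee of its own; callers must
# check before inserting.
class NaiveUserStore
  attr_reader :emails

  def initialize
    @emails = []
  end

  def email_taken?(email)
    @emails.include?(email)
  end

  def insert(email)
    @emails << email
  end
end

# Hypothetical store where check and write are one guarded step.
class ConstrainedUserStore
  class DuplicateEmail < StandardError; end

  def initialize
    @emails = Set.new
    @lock = Mutex.new
  end

  def insert(email)
    @lock.synchronize do
      # Set#add? returns nil when the element is already present.
      raise DuplicateEmail, email unless @emails.add?(email)
    end
  end

  def count
    @emails.size
  end
end

# Two requests interleave: both run the check before either one writes.
naive = NaiveUserStore.new
request_a_ok = !naive.email_taken?("a@example.com")
request_b_ok = !naive.email_taken?("a@example.com")
naive.insert("a@example.com") if request_a_ok
naive.insert("a@example.com") if request_b_ok
puts "naive store rows: #{naive.emails.size}"

constrained = ConstrainedUserStore.new
constrained.insert("a@example.com")
second_rejected = false
begin
  constrained.insert("a@example.com")
rescue ConstrainedUserStore::DuplicateEmail
  second_rejected = true
end
puts "second insert rejected: #{second_rejected}"
```

Whether a given datastore offers the second behavior is exactly the kind of hidden dependency the slide is warning about.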

Assumptions example 2:
    book = Book.create!(title: "Sherlock")
    Book.find(book.id).title.should == "Sherlock"
    # assume the book gets created immediately
    # assume you can read your writes
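
The read-your-writes assumption can fail when reads are served by a replica that lags behind the primary. A minimal sketch of that situation, assuming a hypothetical `LaggingStore` where replication only happens when `replicate!` is called (a stand-in for asynchronous replication, not any real database's API):

```ruby
# Hypothetical primary/replica pair. Writes go to the primary; reads go to
# the replica unless the caller asks for the primary explicitly.
class LaggingStore
  def initialize
    @primary = {}
    @replica = {}
    @next_id = 0
  end

  def create(attrs)
    @next_id += 1
    @primary[@next_id] = attrs
    @next_id
  end

  def find(id, from: :replica)
    source = from == :primary ? @primary : @replica
    source[id]
  end

  # Stand-in for asynchronous replication catching up.
  def replicate!
    @replica = @primary.dup
  end
end

store = LaggingStore.new
id = store.create(title: "Sherlock")

before_replication = store.find(id)                 # replica has not caught up
from_primary       = store.find(id, from: :primary) # the write is visible here
store.replicate!
after_replication  = store.find(id)                 # replica now has the record

puts before_replication.inspect
puts from_primary.inspect
puts after_replication.inspect
```

Code written like example 2 would pass against the primary and fail intermittently against the replica.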

Assumptions example 3:
    # assume delete happens immediately
    book = Book.create!(title: "Sherlock")
    book.destroy
    Book.count.should == 0

When do we pick?
• Once an assumption is made, it is hard to decouple from it
• The assumptions become littered throughout the system without you knowing
• Don't make the assumption
  ◦ Choose later
  ◦ Isolate all code that talks to a datastore
  ◦ Expose hidden dependencies
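
One way to isolate the code that talks to a datastore is to hide it behind a small repository object and pass that object into the application code. The sketch below is an assumption about how that could look, not the deck's own implementation; `InMemoryBookRepository` and `Library` are hypothetical names.

```ruby
# Hypothetical repository: the only object that knows how books are stored.
# An ActiveRecord-backed or key-value-backed class with the same four
# methods could replace it later.
class InMemoryBookRepository
  def initialize
    @books = {}
    @next_id = 0
  end

  def create(title:)
    @next_id += 1
    @books[@next_id] = { id: @next_id, title: title }
  end

  def find(id)
    @books.fetch(id) { raise KeyError, "no book with id #{id}" }
  end

  def delete(id)
    @books.delete(id)
  end

  def count
    @books.size
  end
end

# Application code receives the repository instead of naming a datastore,
# which makes the dependency explicit.
class Library
  def initialize(books:)
    @books = books
  end

  def add(title)
    @books.create(title: title)[:id]
  end

  def title_of(id)
    @books.find(id)[:title]
  end

  def remove(id)
    @books.delete(id)
  end

  def size
    @books.count
  end
end

library = Library.new(books: InMemoryBookRepository.new)
id = library.add("Sherlock")
puts library.title_of(id)
library.remove(id)
puts library.size
```

With this shape, a demo can run on the in-memory version, and the choice of a real datastore only touches the repository class.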

When do we pick?
• Defer the decision until you absolutely must
  a. Demos
    ▪ Who cares? - Put the data anywhere
  b. Production deploy - you need to store the data somewhere
  c. A use case that begs for a feature of one particular database

Some Questions to consider
• Use case by use case basis
• Do you need a specific feature?
  ◦ Transactions?
• Read operations
  ◦ Do you need to read your writes?
  ◦ Do you need to perform some crazy INNER JOIN?
• Write operations
  ◦ Do you need ACID?
  ◦ Does it need to happen right now?

Group analysis of use cases
Split into groups! For each of the following use cases, choose which type of database(s) or infrastructure would suit it best. Remember to take into account lots of scenarios like:
1. Consistency
2. Speed
3. Availability
4. Failure

Use case 1
We are building an application for NAPA Auto Parts. In this system there are Cars and Parts. A car has many parts and a part can belong to many cars. I would like a web page that shows me a table of the different parts in a given car.

Use case 2
You are hired by Chase Manhattan Bank to write a payment processing API. At first, this API only needs to support transactions between two Chase customers, but someday they would like to support transactions between users of other banks as well.

Use case 3
The Chicago Tribune hires you to build a new web application. The main feature of the site is the user's dashboard, where they go to see news tailored to their location and age demographic.

Use case 4
The New York Stock Exchange needs you to build an API that reports stock prices. Given a ticker symbol, you simply return the current price of the stock.

Conclusion
• Separate yourself from the datastore so that the dependencies/assumptions are clearer
• Choose based on absolute necessity
  ◦ Databases are not JavaScript frameworks - don't pick the newest/hottest one

Myles Megyesi
