Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Databases: Beyond the SQL/NoSQL Debate

Geoffrey Litt
September 28, 2013

Databases: Beyond the SQL/NoSQL Debate

Seth Thompson and I gave this talk at the first HackNY Masters mini-conference, held at the HackNY Fall 2013 hackathon.

Geoffrey Litt

September 28, 2013
Tweet

More Decks by Geoffrey Litt

Other Decks in Programming

Transcript

  1. Beyond the
    SQL / NoSQL Debate
    When File.open(“data.txt”) isn’t enough...
    Geoffrey Litt and Seth Thompson
    Yale University

    View Slide

  2. How do you choose a database?

    View Slide

  3. Is there a
    trendy new
    database that
    everyone’s talking
    about on Hacker
    News?
    Use the trendy new
    database!
    Yes
    Use whatever I used
    last time, I guess.
    No

    View Slide

  4. There is no silver bullet
    database!

    View Slide

  5. What are different ways to organize
    data?
    SQL
    Relational Database
    Management System
    (RDBMS)
    PostgreSQL
    MySQL
    Oracle
    SQL Server
    DB2
    NoSQL
    Document Store
    MongoDB
    CouchDB
    Key-Value Store
    Redis
    Cassandra
    Graph Database
    Distributed Analytics
    Etc.

    View Slide

  6. Fundamental Concepts of Databases
    Transaction
    a unit of work performed within a database management system
    Table / Collection
    a grouping of records
    Tuple / Row / Document
    a single record of related data
    Schema
    a definition of attributes and relations, imposed on records
    ACID
    a set of properties that guarantee that transactions are processed reliably

    View Slide

  7. ACID

    View Slide

  8. Atomicity (ACID)
    Every transaction is all or nothing
    If one part of the transaction fails, the entire
    transaction fails
    account_1 $10
    account_2 $0
    Transaction to transfer $10 between two
    accounts

    View Slide

  9. Atomicity (ACID)
    Every transaction is all or nothing
    If one part of the transaction fails, the entire
    transaction fails
    account_1 $0
    account_2 $0
    Transaction to transfer $10 between two
    accounts

    View Slide

  10. Consistency (ACID)
    Every transaction brings the database from one
    valid state to another
    phone_numbers
    0123456789
    9876543210
    01234567890
    Inconsistent
    insertion in
    column with
    custom data
    type
    checking

    View Slide

  11. Isolation (ACID)
    No transaction can interfere with another
    In other words, the effects of concurrent
    transactions should be the same as if the
    transactions were run serially in arbitrary order
    account_1 $10
    Process 1
    val = account_1
    val += $10
    account_1 = val
    Process 2
    val = account_1
    val += $10
    account_1 = val

    View Slide

  12. Durability (ACID)
    Once a transaction has been committed it will
    remain so, even in the event of power loss or
    system failure

    View Slide

  13. How do databases
    implement ACID
    guarantees?

    View Slide

  14. Spotlight: PostgreSQL

    View Slide

  15. PostgreSQL: Atomicity & Durability
    first_name last_name phone_number xnum
    Seth Thompson 1111111111 2384
    Geoffrey Litt 2222222222 3859
    contacts table
    transactions table
    xnum status

    View Slide

  16. PostgreSQL: Atomicity & Durability
    contacts table
    transactions table
    xnum status
    9001 in_progress
    first_name last_name phone_number xnum
    Seth Thompson 1111111111 2384
    Geoffrey Litt 2222222222 3859

    View Slide

  17. PostgreSQL: Atomicity & Durability
    contacts table
    transactions table
    xnum status
    9001 in_progress
    first_name last_name phone_number xnum
    Seth Thompson 1111111111 2384
    Geoffrey Litt 2222222222 3859
    Ramsey Nasser 3333333333 9001

    View Slide

  18. PostgreSQL: Atomicity & Durability
    contacts table
    transactions table
    xnum status
    9001 in_progress
    first_name last_name phone_number xnum
    Seth Thompson 1111111111 2384
    Geoffrey Litt 2222222222 3859
    Ramsey Nasser 3333333333 9001
    Alex Qin 4444444444 9001

    View Slide

  19. PostgreSQL: Atomicity & Durability
    contacts table
    transactions table
    xnum status
    9001 in_progress
    first_name last_name phone_number xnum
    Seth Thompson 1111111111 2384
    Geoffrey Litt 2222222222 3859
    Ramsey Nasser 3333333333 9001
    Alex Qin 4444444444 9001
    Omar Omran 5555555555 9001

    View Slide

  20. PostgreSQL: Atomicity & Durability
    contacts table
    transactions table
    xnum status
    9001 in_progress
    first_name last_name phone_number xnum
    Seth Thompson 1111111111 2384
    Geoffrey Litt 2222222222 3859
    Ramsey Nasser 3333333333 9001
    Alex Qin 4444444444 9001
    Omar Omran 5555555555 9001
    All writes persisted to disk!

    View Slide

  21. PostgreSQL: Atomicity & Durability
    contacts table
    transactions table
    xnum status
    9001 committed
    first_name last_name phone_number xnum
    Seth Thompson 1111111111 2384
    Geoffrey Litt 2222222222 3859
    Ramsey Nasser 3333333333 9001
    Alex Qin 4444444444 9001
    Omar Omran 5555555555 9001

    View Slide

  22. NoSQL

    View Slide

  23. NoSQL: Schema Flexibility
    Pros
    Plan for uncertain
    requirements and change
    schemas on the fly
    Store data records with
    different attributes
    Avoid slow JOINs
    Cons
    Schema enforces
    data validity, type
    checking
    Many complex data
    structures can be
    nicely modeled
    using a schema

    View Slide

  24. Sometimes it’s worth
    giving up ACID
    wat

    View Slide

  25. Spotlight: MongoDB

    View Slide

  26. Relaxed durability allows much faster write
    performance.
    Use case: high-volume logging
    MongoDB: Durability

    View Slide

  27. MongoDB: Durability
    MongoDB write benchmark results (version 2.4.1)
    Source: https://blog.serverdensity.com/mongodb-benchmarks/
    “Write concern” level Average write time
    Unacknowledged 0.07ms
    Acknowledged 0.13ms
    Journaled to HDD
    100x slowdown when you enforce durability!
    34.19ms

    View Slide

  28. Spotlight: Amazon DynamoDB

    View Slide

  29. DynamoDB: Consistency

    View Slide

  30. DynamoDB: Consistency

    View Slide

  31. DynamoDB: Consistency
    A user’s shopping cart is
    replicated across multiple
    data centers.

    View Slide

  32. DynamoDB: Consistency
    Network partition!

    View Slide

  33. DynamoDB: Consistency
    User tries to delete the item
    from the cart…
    What happens?

    View Slide

  34. Customers should be able to view and add items to their
    shopping cart even if… data centers are being destroyed by
    tornadoes.
    Source: “Dynamo: Amazon’s Highly Available Key-value

    View Slide

  35. DynamoDB: Consistency
    Dynamo lets the user delete
    the item.

    View Slide

  36. DynamoDB: Consistency
    Network is restored!

    View Slide

  37. DynamoDB: Consistency
    Dynamo tries to resolve the
    conflict. The item gets added
    back into the cart!

    View Slide

  38. DynamoDB: Consistency
    Amazon avoids downtime at ALL costs.
    Consistency problems are rare and can be dealt
    with by customer service.
    Plus, good to keep items in shopping carts :)

    View Slide

  39. So how do you choose a database?

    View Slide

  40. Things to think about
    What components of ACID do you need?
    Does schema flexibility make sense for your
    application?
    Do you need massive scale? (Probably not)
    Codebase maturity: old isn’t bad!

    View Slide

  41. The future
    Grey lines between types of databases
    PostgreSQL has a JSON column type
    id
    (INT)
    first_name
    (VARCHAR)
    last_name
    (VARCHAR)
    settings
    (JSON)
    1 Seth Thompson {“setting1”: true, “setting2”: [“a”, “b”...
    2 Geoffrey Litt {“setting1”: false, “setting2”: [“d”,
    “c”...
    3 Ramsey Nasser {“setting1”: false, “setting2”: [“e”,
    “f”...
    4 Alex Qin {“setting1”: true, “setting2”: [“g”, “h”...
    5 Omar Omran {“setting1”: true, “setting2”: [“q”, “s”...

    View Slide

  42. The future
    90% of data that exists in the world has been
    generated in the past two years
    More database choices; choice is good!

    View Slide

  43. Thank you
    @s3ththompson
    @geoffreylitt

    View Slide