Upgrade to Pro — share decks privately, control downloads, hide ads and more …

Databases: Beyond the SQL/NoSQL Debate

7a6310a990212e2b392d95a67855afc2?s=47 Geoffrey Litt
September 28, 2013

Databases: Beyond the SQL/NoSQL Debate

Seth Thompson and I gave this talk at the first HackNY Masters mini-conference, held at the HackNY Fall 2013 hackathon.

7a6310a990212e2b392d95a67855afc2?s=128

Geoffrey Litt

September 28, 2013
Tweet

Transcript

  1. Beyond the SQL / NoSQL Debate When File.open(“data.txt”) isn’t enough...

    Geoffrey Litt and Seth Thompson Yale University
  2. How do you choose a database?

  3. Is there a trendy new database that everyone’s talking about

    on Hacker News? Use the trendy new database! Yes Use whatever I used last time, I guess. No
  4. There is no silver bullet database!

  5. What are different ways to organize data? SQL Relational Database

    Management System (RDBMS) PostgreSQL MySQL Oracle SQL Server DB2 NoSQL Document Store MongoDB CouchDB Key-Value Store Redis Cassandra Graph Database Distributed Analytics Etc.
  6. Fundamental Concepts of Databases Transaction a unit of work performed

    within a database management system Table / Collection a grouping of records Tuple / Row / Document a single record of related data Schema a definition of attributes and relations, imposed on records ACID a set of properties that guarantee that transactions are processed reliably
  7. ACID

  8. Atomicity (ACID) Every transaction is all or nothing If one

    part of the transaction fails, the entire transaction fails account_1 $10 account_2 $0 Transaction to transfer $10 between two accounts
  9. Atomicity (ACID) Every transaction is all or nothing If one

    part of the transaction fails, the entire transaction fails account_1 $0 account_2 $0 Transaction to transfer $10 between two accounts
  10. Consistency (ACID) Every transaction brings the database from one valid

    state to another phone_numbers 0123456789 9876543210 01234567890 Inconsistent insertion in column with custom data type checking
  11. Isolation (ACID) No transaction can interfere with another In other

    words, the effects of concurrent transactions should be the same as if the transactions were run serially in arbitrary order account_1 $10 Process 1 val = account_1 val += $10 account_1 = val Process 2 val = account_1 val += $10 account_1 = val
  12. Durability (ACID) Once a transaction has been committed it will

    remain so, even in the event of power loss or system failure
  13. How do databases implement ACID guarantees?

  14. Spotlight: PostgreSQL

  15. PostgreSQL: Atomicity & Durability first_name last_name phone_number xnum Seth Thompson

    1111111111 2384 Geoffrey Litt 2222222222 3859 contacts table transactions table xnum status
  16. PostgreSQL: Atomicity & Durability contacts table transactions table xnum status

    9001 in_progress first_name last_name phone_number xnum Seth Thompson 1111111111 2384 Geoffrey Litt 2222222222 3859
  17. PostgreSQL: Atomicity & Durability contacts table transactions table xnum status

    9001 in_progress first_name last_name phone_number xnum Seth Thompson 1111111111 2384 Geoffrey Litt 2222222222 3859 Ramsey Nasser 3333333333 9001
  18. PostgreSQL: Atomicity & Durability contacts table transactions table xnum status

    9001 in_progress first_name last_name phone_number xnum Seth Thompson 1111111111 2384 Geoffrey Litt 2222222222 3859 Ramsey Nasser 3333333333 9001 Alex Qin 4444444444 9001
  19. PostgreSQL: Atomicity & Durability contacts table transactions table xnum status

    9001 in_progress first_name last_name phone_number xnum Seth Thompson 1111111111 2384 Geoffrey Litt 2222222222 3859 Ramsey Nasser 3333333333 9001 Alex Qin 4444444444 9001 Omar Omran 5555555555 9001
  20. PostgreSQL: Atomicity & Durability contacts table transactions table xnum status

    9001 in_progress first_name last_name phone_number xnum Seth Thompson 1111111111 2384 Geoffrey Litt 2222222222 3859 Ramsey Nasser 3333333333 9001 Alex Qin 4444444444 9001 Omar Omran 5555555555 9001 All writes persisted to disk!
  21. PostgreSQL: Atomicity & Durability contacts table transactions table xnum status

    9001 committed first_name last_name phone_number xnum Seth Thompson 1111111111 2384 Geoffrey Litt 2222222222 3859 Ramsey Nasser 3333333333 9001 Alex Qin 4444444444 9001 Omar Omran 5555555555 9001
  22. NoSQL

  23. NoSQL: Schema Flexibility Pros Plan for uncertain requirements and change

    schemas on the fly Store data records with different attributes Avoid slow JOINs Cons Schema enforces data validity, type checking Many complex data structures can be nicely modeled using a schema
  24. Sometimes it’s worth giving up ACID wat

  25. Spotlight: MongoDB

  26. Relaxed durability allows much faster write performance. Use case: high-volume

    logging MongoDB: Durability
  27. MongoDB: Durability MongoDB write benchmark results (version 2.4.1) Source: https://blog.serverdensity.com/mongodb-benchmarks/

    “Write concern” level Average write time Unacknowledged 0.07ms Acknowledged 0.13ms Journaled to HDD 100x slowdown when you enforce durability! 34.19ms
  28. Spotlight: Amazon DynamoDB

  29. DynamoDB: Consistency

  30. DynamoDB: Consistency

  31. DynamoDB: Consistency A user’s shopping cart is replicated across multiple

    data centers.
  32. DynamoDB: Consistency Network partition!

  33. DynamoDB: Consistency User tries to delete the item from the

    cart… What happens?
  34. Customers should be able to view and add items to

    their shopping cart even if… data centers are being destroyed by tornadoes. Source: “Dynamo: Amazon’s Highly Available Key-value “
  35. DynamoDB: Consistency Dynamo lets the user delete the item.

  36. DynamoDB: Consistency Network is restored!

  37. DynamoDB: Consistency Dynamo tries to resolve the conflict. The item

    gets added back into the cart!
  38. DynamoDB: Consistency Amazon avoids downtime at ALL costs. Consistency problems

    are rare and can be dealt with by customer service. Plus, good to keep items in shopping carts :)
  39. So how do you choose a database?

  40. Things to think about What components of ACID do you

    need? Does schema flexibility make sense for your application? Do you need massive scale? (Probably not) Codebase maturity: old isn’t bad!
  41. The future Grey lines between types of databases PostgreSQL has

    a JSON column type id (INT) first_name (VARCHAR) last_name (VARCHAR) settings (JSON) 1 Seth Thompson {“setting1”: true, “setting2”: [“a”, “b”... 2 Geoffrey Litt {“setting1”: false, “setting2”: [“d”, “c”... 3 Ramsey Nasser {“setting1”: false, “setting2”: [“e”, “f”... 4 Alex Qin {“setting1”: true, “setting2”: [“g”, “h”... 5 Omar Omran {“setting1”: true, “setting2”: [“q”, “s”...
  42. The future 90% of data that exists in the world

    has been generated in the past two years More database choices; choice is good!
  43. Thank you @s3ththompson @geoffreylitt