MongoDB — confessions of a PostgreSQL lover

So, I switched to MongoDB. After using PostgreSQL for 7 years, and enjoying every minute, it wasn't a switch I was expecting to make. But I'm loving it!

This talk will explore some of the differences between building apps with relational databases and document databases. We'll cover a few of the expected differences, and also some of the more subtle ones. We'll also discuss how some of the perceived limitations of a document database are actually strengths in disguise.

Conrad Irwin

October 17, 2013

Transcript

  1. Documents

        {
          "_id": "5233bb7731ef20521d000002",
          "name": "Conrad Irwin",
          "facebook": { "access_key": "$uper$ecret" },
          "linkedin": {
            "access_key": "$omewhat$ecret",
            "access_secret": "$tupendou$ly$ecret"
          }
        }
        {
          "_id": "5233bb7731ef20542e000009",
          "name": "James Smith",
          "facebook": { "access_key": "$omethingel$e" },
          "twitter": {
            "access_key": "$eem$legit",
            "access_secret": "$eriou$ly"
          }
        }
  2. Records

    Users

        id  name
        1   Conrad Irwin
        2   James Smith

    Accounts

        id  user  site      access_key      access_secret
        1   1     Facebook  $uper$ecret     -
        2   1     LinkedIn  $omewhat$ecret  $tupendou$ly$ecret
        3   2     Facebook  $omethingel$e   -
        4   2     Twitter   $eem$legit      $eriou$ly
  3. Records vs. Documents

        Records                  Documents
        • Lots of small things   • One big thing
        • All the same           • Can differ
        • Split into chunks      • Related data together
        • Atomic access to many  • Atomic within document
        • "Space efficient"      • "Locality efficient"
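To make the contrast concrete, here is a minimal sketch that folds the rows from the "Records" slide into the nested shape from the "Documents" slide. The in-memory arrays and the `toDocuments` helper are illustrative assumptions, not code from the talk, and the integer `_id` values stand in for the ObjectIds shown on the slide.

```javascript
// Hypothetical in-memory copies of the two tables from the "Records" slide.
const users = [
  { id: 1, name: "Conrad Irwin" },
  { id: 2, name: "James Smith" },
];
const accounts = [
  { id: 1, user: 1, site: "Facebook", access_key: "$uper$ecret", access_secret: null },
  { id: 2, user: 1, site: "LinkedIn", access_key: "$omewhat$ecret", access_secret: "$tupendou$ly$ecret" },
  { id: 3, user: 2, site: "Facebook", access_key: "$omethingel$e", access_secret: null },
  { id: 4, user: 2, site: "Twitter", access_key: "$eem$legit", access_secret: "$eriou$ly" },
];

// Fold each user's account rows into one nested document.
// toDocuments is an illustrative name, not a MongoDB or PostgreSQL API.
function toDocuments(userRows, accountRows) {
  return userRows.map((user) => {
    const doc = { _id: user.id, name: user.name };
    for (const row of accountRows.filter((a) => a.user === user.id)) {
      const entry = { access_key: row.access_key };
      // Documents "can differ": omit the field instead of storing a NULL.
      if (row.access_secret !== null) entry.access_secret = row.access_secret;
      doc[row.site.toLowerCase()] = entry;
    }
    return doc;
  });
}

const documents = toDocuments(users, accounts);
console.log(JSON.stringify(documents, null, 2));
```

Each user ends up as one self-contained object, which is the "related data together" trade-off: one read fetches everything, at the cost of repeating key names in every document.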
  4. Aside: Space vs. Locality

    Disk latency (EBS): ~2ms
    Disk throughput (EBS): ~60MB/s
    Read 4kb from disk: 2ms + 0.07ms = ~2ms
    Read 8kb from disk: 2ms + 0.14ms = ~2.1ms
    Read 2 * 4kb from disk: 4ms + 0.14ms = ~4.1ms
    (Network round-trip (EC2): 2-3ms)
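The arithmetic on this slide can be reproduced with a small cost model: each separate read pays the latency once, and the bytes transferred pay the throughput cost. This is only a sketch using the slide's rough EBS figures; the `readTimeMs` helper and the decimal unit conversion (60MB/s treated as 60kB per millisecond) are assumptions for illustration.

```javascript
const LATENCY_MS = 2;            // per-read latency quoted on the slide
const THROUGHPUT_KB_PER_MS = 60; // 60MB/s is about 60kB per millisecond

// Estimated time for `reads` separate reads of `kbPerRead` kilobytes each.
function readTimeMs(reads, kbPerRead) {
  const seekCost = reads * LATENCY_MS;
  const transferCost = (reads * kbPerRead) / THROUGHPUT_KB_PER_MS;
  return seekCost + transferCost;
}

console.log("one 4kb read: ", readTimeMs(1, 4).toFixed(2), "ms");
console.log("one 8kb read: ", readTimeMs(1, 8).toFixed(2), "ms");
console.log("two 4kb reads:", readTimeMs(2, 4).toFixed(2), "ms");
```

Doubling the size of a single read barely changes the total, while splitting the same data across two reads roughly doubles it, which is the case for locality over space efficiency.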
  5. Aside: Space vs. Locality

    PostgreSQL: Overhead per row: ~24 bytes
    MongoDB: Overhead per document: size of JSON keys + padding
    Both: Per-table/per-collection overhead
  6. Flexibility vs. Integrity

    Flexibility optimizes for change.
    Integrity optimizes for validity.
    You can't set a schema in MongoDB.
    PostgreSQL requires an explicit schema.
  7. e.g. Social network connections

    Each social network has some things in common.
    But they're all different: OAuth, OAuth2, etc.
    Don't want a table for each...
  8. In PostgreSQL:

        CREATE TABLE accounts (
          id INTEGER PRIMARY KEY,
          user_id INTEGER NOT NULL,
          social_network TEXT NOT NULL,
          properties JSON NOT NULL DEFAULT '{}',
          created_at TIMESTAMP,
          updated_at TIMESTAMP
        )
  9. In MongoDB:

        {
          "_id": "5233bb7731ef20521d000002",
          "name": "Conrad Irwin",
          "facebook": { "access_key": "$uper$ecret" },
          "linkedin": {
            "access_key": "$omewhat$ecret",
            "access_secret": "$tupendou$ly$ecret"
          }
        }
  10. e.g. Start using first names

    Have a table with full names.
    Start collecting first name & last name instead.
    Ensure all code works for all users...
  11. MongoDB:

        user.first_name || user.name.split(" ")[0];
  12. MongoDB data-layer

        // e.g. using mongoskin for node
        db.bind('users', {
          fetch: function (email, callback) {
            this.findOne({email: email}, function (err, user) {
              if (user && user.name && !user.first_name) {
                user.first_name = user.name.split(" ")[0];
              }
              callback(err, user)
            })
          }
        });
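The read-time fallback in the data layer can also be tried without a database. The sketch below pulls the same check into a plain function; `withFirstName` and the sample users are hypothetical names for illustration and are not part of mongoskin.

```javascript
// Fill in first_name from the legacy full name when it is missing.
// withFirstName is a hypothetical helper, not a mongoskin API.
function withFirstName(user) {
  if (user && user.name && !user.first_name) {
    // Copy instead of mutating, so the stored document is left untouched.
    return Object.assign({}, user, { first_name: user.name.split(" ")[0] });
  }
  return user;
}

const legacyUser = { name: "Conrad Irwin" };                    // written before the change
const migratedUser = { name: "James Smith", first_name: "Jim" }; // written after the change

console.log(withFirstName(legacyUser).first_name);   // Conrad
console.log(withFirstName(migratedUser).first_name); // Jim
```

Old and new documents can sit side by side in the collection, and callers always see a `first_name` without a bulk migration having run.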
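  13. PostgreSQL:

        -- A DEFAULT expression cannot reference another column,
        -- so add the column, backfill it, then add the constraint.
        ALTER TABLE users ADD COLUMN first_name TEXT;
        UPDATE users SET first_name = split_part(name, ' ', 1);
        ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;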
  14. Consistency

    PostgreSQL: Written means written, no exceptions. (except disk failure, but use RAID)
    MongoDB: Written means written, unless something goes wrong. (e.g. server crash, network partition, disk failure)
  15. Availability

    PostgreSQL: If the master dies, stop to avoid corruption.
    MongoDB: If the master dies, rebalance to avoid downtime.
    'You cannot have consistency, availability and partition tolerance'. (CAP theorem)
  16. Which is better?

    PostgreSQL: Easier to understand.
    MongoDB: Pretty much "just works".
    Which do you prefer: A broken app, or data loss?
  17. Scaling

    RAM is fast, Disk is slow.
    Ideal: fit all data in RAM.
    Good: fit working set in RAM.
    Bearable: fit working set indexes in RAM.
  18. Making things faster

    Use a bigger database server.
    Replicate all your data over multiple servers.
    Shard portions of your data across multiple servers.
  19. Use a bigger server.

    Good for PostgreSQL, up to 64 cores, 1TB RAM.
    Bad for MongoDB, per-database write locks.
    Expensive? Can't use cloud?
  20. Sharding

    Good for MongoDB, built in support via mongos.
    Bad for PostgreSQL. Hard to choose shards to maintain integrity.
    Cheaper? Works in the cloud.
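As a rough illustration of what "sharding" means here, the sketch below routes each key to one of several servers by hashing it. This is a toy model and not how mongos is implemented; the shard names, the `fnv1a` hash choice and the `shardFor` helper are all assumptions made for the example.

```javascript
// 32-bit FNV-1a string hash, used only to spread keys evenly.
function fnv1a(str) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return hash;
}

// Pick the shard responsible for a given shard key (hypothetical helper).
function shardFor(key, shards) {
  return shards[fnv1a(String(key)) % shards.length];
}

const shards = ["shard-a", "shard-b", "shard-c"]; // hypothetical server names

// Keying by user keeps everything about one user on one server.
const userKey = "5233bb7731ef20521d000002";
console.log(userKey, "->", shardFor(userKey, shards));
```

The same key always lands on the same shard, so a document that holds all of a user's data can be read and written on one server. Rows that reference each other across tables have no such guarantee unless they share a shard key, which is one reason choosing shards is harder for a relational schema.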
  21. Replication

    Doesn't help write-throughput, always hits master.
    Doesn't give you more working-set RAM.
    Gives you more disk heads.
    Gives you faster failover.