Slide 1

MongoDB
Confessions of a PostgreSQL lover
@ConradIrwin

Slide 2

I ♥

Slide 3

I ♥

Slide 4

Fight!

Slide 5

• Documents       • Records
• Flexibility     • Integrity
• Availability    • Consistency

Slide 6

Documents

    {
      "_id": "5233bb7731ef20521d000002",
      "name": "Conrad Irwin",
      "facebook": {"access_key": "$uper$ecret"},
      "linkedin": {"access_key": "$omewhat$ecret", "access_secret": "$tupendou$ly$ecret"}
    }
    {
      "_id": "5233bb7731ef20542e000009",
      "name": "James Smith",
      "facebook": {"access_key": "$omethingel$e"},
      "twitter": {"access_key": "$eem$legit", "access_secret": "$eriou$ly"}
    }

Slide 7

Records

Users
id  name
1   Conrad Irwin
2   James Smith

Accounts
id  user  site      access_key      access_secret
1   1     Facebook  $uper$ecret     -
2   1     LinkedIn  $omewhat$ecret  $tupendou$ly$ecret
3   2     Facebook  $omethingel$e   -
4   2     Twitter   $eem$legit      $eriou$ly

Slide 8

Records vs. Documents

Records                    Documents
• Lots of small things     • One big thing
• All the same             • Can differ
• Split into chunks        • Related data together
• Atomic access to many    • Atomic within document
• "Space efficient"        • "Locality efficient"
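
The two layouts hold the same data, which a short sketch can make concrete. The helper below is hypothetical (it is not from the talk): it splits user documents like the ones on the Documents slide into users and accounts rows like the ones on the Records slide. It assumes every nested object in a document is a social network connection and uses sequential integers as row ids.

```javascript
// Hypothetical helper: split user documents ("one big thing") into
// users and accounts rows ("lots of small things").
function documentsToRecords(docs) {
  const users = [];
  const accounts = [];
  docs.forEach((doc, index) => {
    const userId = index + 1; // sequential ids stand in for a database sequence
    users.push({ id: userId, name: doc.name });
    for (const [key, value] of Object.entries(doc)) {
      // Assumption: every nested object is a social network connection.
      if (value !== null && typeof value === "object") {
        accounts.push({
          id: accounts.length + 1,
          user: userId,
          site: key,
          access_key: value.access_key,
          access_secret: value.access_secret || null, // missing secrets become NULL
        });
      }
    }
  });
  return { users, accounts };
}

const { users, accounts } = documentsToRecords([
  {
    _id: "5233bb7731ef20521d000002",
    name: "Conrad Irwin",
    facebook: { access_key: "$uper$ecret" },
    linkedin: { access_key: "$omewhat$ecret", access_secret: "$tupendou$ly$ecret" },
  },
  {
    _id: "5233bb7731ef20542e000009",
    name: "James Smith",
    facebook: { access_key: "$omethingel$e" },
    twitter: { access_key: "$eem$legit", access_secret: "$eriou$ly" },
  },
]);
console.log(users, accounts);
```

Going the other way needs a join across both tables, which is the "split into chunks" versus "related data together" trade-off.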

Slide 9

Aside: Space vs. Locality
Disk latency (EBS): ~2 ms
Disk throughput (EBS): ~60 MB/s
Read 4 KB from disk: 2 ms + 0.07 ms = ~2 ms
Read 8 KB from disk: 2 ms + 0.14 ms = ~2.1 ms
Read 2 * 4 KB from disk: 4 ms + 0.14 ms = ~4.1 ms
(Network round-trip (EC2): 2-3 ms)
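
The arithmetic on this slide can be reproduced with a small sketch. The function name and the choice of 1 KB = 1000 bytes are assumptions made for illustration; the latency and throughput figures are the slide's.

```javascript
// Figures from the slide (EBS): ~2 ms latency per read, ~60 MB/s throughput.
const LATENCY_MS = 2;
const THROUGHPUT_BYTES_PER_MS = 60e6 / 1000;

// Hypothetical helper: time to issue `reads` separate reads of `bytesPerRead` each.
function readTimeMs(reads, bytesPerRead) {
  const latencyMs = reads * LATENCY_MS; // each read pays the latency once
  const transferMs = (reads * bytesPerRead) / THROUGHPUT_BYTES_PER_MS;
  return latencyMs + transferMs;
}

const oneBigRead = readTimeMs(1, 8000);    // one 8 KB document
const twoSmallReads = readTimeMs(2, 4000); // two 4 KB rows stored apart
console.log(oneBigRead.toFixed(2), twoSmallReads.toFixed(2));
```

The transfer time is the same in both cases; the extra 2 ms comes entirely from the second read's latency, which is why locality matters more than a few extra bytes.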

Slide 10

Aside: Space vs. Locality
PostgreSQL: Overhead per row: ~24 bytes
MongoDB: Overhead per document: size of JSON keys + padding
Both: Per-table/per-collection overhead
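
To get a feel for the per-document key overhead, the sketch below adds up the bytes spent on key names in one of the example documents. The helper name is hypothetical, and the result is only a rough lower bound: it ignores padding and the extra bytes the storage format adds per field.

```javascript
// Hypothetical helper: bytes spent on key names in one document,
// including the keys of nested objects.
function keyBytes(doc) {
  let total = 0;
  for (const [key, value] of Object.entries(doc)) {
    total += Buffer.byteLength(key, "utf8");
    if (value !== null && typeof value === "object") {
      total += keyBytes(value);
    }
  }
  return total;
}

const exampleDoc = {
  _id: "5233bb7731ef20521d000002",
  name: "Conrad Irwin",
  facebook: { access_key: "$uper$ecret" },
  linkedin: { access_key: "$omewhat$ecret", access_secret: "$tupendou$ly$ecret" },
};
const overhead = keyBytes(exampleDoc);
console.log(overhead + " bytes of key names");
```

Every document repeats its own key names, so this cost is paid once per document rather than once per table.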

Slide 11

Flexibility
MongoDB: You can put anything in any document.
PostgreSQL: You can only store data that matches the schema.

Slide 12

Integrity
MongoDB: You could read anything out.
PostgreSQL: You will only read valid data.

Slide 13

Flexibility vs. Integrity
Flexibility optimizes for change.
Integrity optimizes for validity.
You can't set a schema in MongoDB.
PostgreSQL requires an explicit schema.
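
Without a schema, the checks a database would otherwise make fall to the application. The sketch below shows one way that might look. The function name and the specific rules are assumptions for illustration, with field names borrowed from the example documents earlier in the deck.

```javascript
// Hypothetical rules: roughly what a schema would guarantee for a user.
function userProblems(doc) {
  const problems = [];
  if (typeof doc.name !== "string" || doc.name.length === 0) {
    problems.push("name must be a non-empty string");
  }
  for (const site of ["facebook", "linkedin", "twitter"]) {
    const account = doc[site];
    // A connection is optional, but if present it needs a string access_key.
    if (account !== undefined && typeof (account || {}).access_key !== "string") {
      problems.push(site + ".access_key must be a string");
    }
  }
  return problems;
}

console.log(userProblems({ name: "Conrad Irwin", facebook: { access_key: "$uper$ecret" } }));
console.log(userProblems({ name: 42, twitter: {} }));
```

With an explicit schema these checks run once, at write time, inside the database; here they have to run wherever documents are read.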

Slide 14

e.g. Social network connections
Each social network has some things in common.
But they're all different: OAuth, OAuth2, etc.
Don't want a table for each...

Slide 15

In PostgreSQL:

    CREATE TABLE accounts (
      id INTEGER PRIMARY KEY,
      user_id INTEGER NOT NULL,
      social_network TEXT NOT NULL,
      properties JSON NOT NULL DEFAULT '{}',
      created_at TIMESTAMP,
      updated_at TIMESTAMP
    );
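
As a rough illustration of how differently shaped credentials fit one table, the sketch below builds row values for the accounts table. The helper is hypothetical; only the column names come from the slide, and the network-specific fields are serialized into the properties column.

```javascript
// Hypothetical helper: shape one connection as a row for the accounts table.
// Whatever is specific to a network goes into the JSON `properties` column.
function accountRow(userId, socialNetwork, properties) {
  return {
    user_id: userId,
    social_network: socialNetwork,
    properties: JSON.stringify(properties || {}), // matches DEFAULT '{}'
  };
}

const facebookRow = accountRow(1, "facebook", { access_key: "$uper$ecret" });
const linkedinRow = accountRow(1, "linkedin", {
  access_key: "$omewhat$ecret",
  access_secret: "$tupendou$ly$ecret",
});
console.log(facebookRow, linkedinRow);
```

The columns every network shares stay typed and constrained, while the parts that differ stay flexible.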

Slide 16

In MongoDB:

    {
      "_id": "5233bb7731ef20521d000002",
      "name": "Conrad Irwin",
      "facebook": {"access_key": "$uper$ecret"},
      "linkedin": {"access_key": "$omewhat$ecret", "access_secret": "$tupendou$ly$ecret"}
    }

Slide 17

e.g. Start using first names
Have a table with full names.
Start collecting first name & last name instead.
Ensure all code works for all users...

Slide 18

MongoDB:

    user.first_name || user.name.split(" ")[0];

Slide 19

MongoDB data-layer

    // e.g. using mongoskin for node
    db.bind('users', {
      fetch: function (email, callback) {
        this.findOne({email: email}, function (err, user) {
          if (user && user.name && !user.first_name) {
            user.first_name = user.name.split(" ")[0];
          }
          callback(err, user);
        });
      }
    });
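
The fallback inside that data layer can be tried without a database. This is a minimal sketch, not the talk's code: the function name is hypothetical, and unlike the slide it returns a copy instead of modifying the user in place.

```javascript
// Hypothetical standalone version of the data-layer fallback:
// old documents only have `name`, new ones also have `first_name`.
function withFirstName(user) {
  if (user && user.name && !user.first_name) {
    // Derive the first name from the full name, leaving the input untouched.
    return Object.assign({}, user, { first_name: user.name.split(" ")[0] });
  }
  return user;
}

const oldStyle = withFirstName({ name: "Conrad Irwin" });
const newStyle = withFirstName({ name: "James Smith", first_name: "Jim" });
console.log(oldStyle.first_name, newStyle.first_name);
```

Old and new documents can then live side by side in the collection, as long as every read goes through this one place.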

Slide 20

PostgreSQL:

    ALTER TABLE users ADD COLUMN first_name TEXT;
    -- Backfill from the full name; a DEFAULT cannot reference another column.
    UPDATE users SET first_name = split_part(name, ' ', 1);
    ALTER TABLE users ALTER COLUMN first_name SET NOT NULL;

Slide 21

Consistency
PostgreSQL: Written means written, no exceptions.
(except disk failure, but use RAID)
MongoDB: Written means written, unless something goes wrong.
(e.g. server crash, network partition, disk failure)

Slide 22

Availability
PostgreSQL: If the master dies, stop to avoid corruption.
MongoDB: If the master dies, rebalance to avoid downtime.
'You cannot have consistency, availability and partition tolerance'. - CAP theorem

Slide 23

Which is better?
PostgreSQL: Easier to understand.
MongoDB: Pretty much "just works".
Which do you prefer: a broken app, or data loss?

Slide 24

Scaling
RAM is fast, disk is slow.
Ideal: fit all data in RAM.
Good: fit working set in RAM.
Bearable: fit working set indexes in RAM.
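
The three tiers on this slide amount to a simple decision rule, sketched below. The function, its parameter names and the "disk-bound" label for the remaining case are assumptions added for illustration; the sizes would have to come from your own measurements.

```javascript
// Hypothetical helper: which tier from the slide a server falls into.
// All sizes are in bytes.
function memoryTier(ramBytes, sizes) {
  if (sizes.allData <= ramBytes) return "ideal";
  if (sizes.workingSet <= ramBytes) return "good";
  if (sizes.workingSetIndexes <= ramBytes) return "bearable";
  return "disk-bound"; // label not from the slide: even hot indexes spill to disk
}

const GB = 1e9;
const tier = memoryTier(16 * GB, {
  allData: 200 * GB,
  workingSet: 40 * GB,
  workingSetIndexes: 6 * GB,
});
console.log(tier);
```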

Slide 25

Making things faster
Use a bigger database server.
Replicate all your data over multiple servers.
Shard portions of your data across multiple servers.

Slide 26

Use a bigger server.
Good for PostgreSQL, up to 64 cores, 1 TB RAM.
Bad for MongoDB, per-database write locks.
Expensive? Can't use cloud?

Slide 27

Sharding
Good for MongoDB, built-in support via mongos.
Bad for PostgreSQL. Hard to choose shards to maintain integrity.
Cheaper? Works in the cloud.
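
For readers new to sharding, the sketch below shows the basic idea of routing a record to a server by hashing a key. It is a generic illustration under stated assumptions, not how mongos itself routes queries; the function names and the FNV-1a style hash are choices made for this example.

```javascript
// FNV-1a style 32-bit hash over the string's UTF-16 code units.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep it an unsigned 32-bit value
  }
  return hash;
}

// Hypothetical router: the same key always lands on the same shard.
function shardFor(shardKey, shardCount) {
  return fnv1a(String(shardKey)) % shardCount;
}

const shardOfConrad = shardFor("5233bb7731ef20521d000002", 4);
const shardOfJames = shardFor("5233bb7731ef20542e000009", 4);
console.log(shardOfConrad, shardOfJames);
```

A document that embeds all of a user's connections moves as a unit under this scheme. Rows that reference each other across tables may land on different shards unless they share a shard key, which is one reason choosing shards while keeping integrity is hard.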

Slide 28

Replication
Doesn't help write throughput: writes always hit the master.
Doesn't give you more working-set RAM.
Gives you more disk heads.
Gives you faster failover.

Slide 29

I ♥ PostgreSQL
I ♥ MongoDB

Slide 30

MongoDB
Confessions of a PostgreSQL lover
@ConradIrwin