Pro Yearly is on sale from $80 to $50! »

I Hate Your Database

I Hate Your Database

A talk I gave at Djangocon Europe 2012

077e9a0cb34fa3eba2699240c9509717?s=128

Andrew Godwin

June 05, 2012
Tweet

Transcript

  1. I Hate Your Database I Hate Your Database Andrew Godwin

    Andrew Godwin @andrewgodwin @andrewgodwin flickr.com/96dpi flickr.com/96dpi
  2. Hate? Databases? Hate? Databases? flickr.com/parkerblohm flickr.com/parkerblohm

  3. · Misuse · Misuse · Ignorance · Ignorance · Lies

    · Lies
  4. Different databases, Different databases, different occasions different occasions flickr.com/maistora flickr.com/maistora

  5. Relational Relational Document Document Key-Value Key-Value Graph Graph Object /

    Hierarchial Object / Hierarchial Spatial Spatial Time-series / RRD Time-series / RRD Search Search
  6. Relational Relational PostgreSQL PostgreSQL MySQL MySQL SQLite SQLite PostgreSQL PostgreSQL

    MySQL MySQL SQLite SQLite MongoDB MongoDB CouchDB CouchDB Document Document Redis Redis Cassandra Cassandra Riak Riak Key-value Key-value
  7. Some Quick Theory Some Quick Theory flickr.com/iamdabe flickr.com/iamdabe

  8. A Atomicity tomicity C Consistency onsistency I Isolation solation D

    Durability urability
  9. C Consistency onsistency A Availability vailability P Partition Tolerance artition

    Tolerance
  10. Relational Databases Relational Databases flickr.com/maistora flickr.com/maistora

  11. Common Pitfalls Common Pitfalls SELECT item1, item2, item3 FROM basket;

    SELECT item1, item2, item3 FROM basket; INSERT INTO groups (name, people) INSERT INTO groups (name, people) VALUES ('friends', 'aaron,betty,chris,damien') VALUES ('friends', 'aaron,betty,chris,damien') SELECT file_bytes FROM pictures WHERE filename = SELECT file_bytes FROM pictures WHERE filename = 'foo.jpg' 'foo.jpg'
  12. names = set() names = set() for book in Book.objects.filter(year=2012):

    for book in Book.objects.filter(year=2012): names.add(book.author.name) names.add(book.author.name)
  13. names = Author.objects.filter( names = Author.objects.filter( books__year=2012 books__year=2012 ).values_list('name', flat=True).distinct()

    ).values_list('name', flat=True).distinct()
  14. No transactional DDL No transactional DDL Poor query optimiser Poor

    query optimiser MyISAM: Full-table locking, no transactions MyISAM: Full-table locking, no transactions Oracle Oracle Very fast for some operations Very fast for some operations MySQL MySQL
  15. Little integrity checking (slowly being fixed) Little integrity checking (slowly

    being fixed) Impossible to do some table alterations Impossible to do some table alterations No concurrent access No concurrent access Very low overhead Very low overhead Very portable Very portable SQLite SQLite
  16. Slow default configuration Slow default configuration Can be a little

    harder to learn / less familiar Can be a little harder to learn / less familiar Almost too many features Almost too many features Incredibly reliable Incredibly reliable PostgreSQL PostgreSQL
  17. Document Databases Document Databases flickr.com/babyowls flickr.com/babyowls

  18. db.insert({ db.insert({ 'name': 'Sally', 'name': 'Sally', 'tags': ['django', 'python', 'search'],

    'tags': ['django', 'python', 'search'], 'addresses': [ 'addresses': [ {'type': 'jabber', 'jid': 'sally@eg.com'}, {'type': 'jabber', 'jid': 'sally@eg.com'}, {'type': 'phone', 'number': '011899981199'}, {'type': 'phone', 'number': '011899981199'}, ], ], }) }) db.find({ db.find({ 'tags': 'python', 'tags': 'python', 'addresses.type': 'jabber', 'addresses.type': 'jabber', }) })
  19. No fixed schema No fixed schema Low barrier to entry

    Low barrier to entry Closer to Python datatypes Closer to Python datatypes Advantages Advantages
  20. Immature (but improving) Immature (but improving) No transactions No transactions

    No integrity checking No integrity checking Problems Problems
  21. Key-value stores Key-value stores flickr.com/zebble flickr.com/zebble

  22. db.set('foo', 'bar) db.set('foo', 'bar) x = db.get('foo') x = db.get('foo')

    db.sadd('names', 'andrew') db.sadd('names', 'andrew') db.sadd('names', 'brian') db.sadd('names', 'brian') y = db.scard('names') y = db.scard('names')
  23. Horizontal scaling (but with drawbacks) Horizontal scaling (but with drawbacks)

    Extremely fast Extremely fast Can only fetch objects by key Can only fetch objects by key Batch/map-reduce queries Batch/map-reduce queries Transactions not possible Transactions not possible Traits Traits
  24. Other database types Other database types flickr.com/tusnelda flickr.com/tusnelda

  25. Knowledge of projections useful Knowledge of projections useful Spatial indexes

    really speed up some problems Spatial indexes really speed up some problems Generally add-on to existing DB Generally add-on to existing DB Spatial Spatial
  26. Hierarchial key-value store Hierarchial key-value store Allows multiple writers for

    appends Allows multiple writers for appends Supports very large files Supports very large files Filesystems Filesystems
  27. Allow efficient neighbour queries Allow efficient neighbour queries Generally not

    much use for anything else Generally not much use for anything else Graph Databases Graph Databases
  28. Deliberately loses old data Deliberately loses old data Useful for

    logging or statistics Useful for logging or statistics Round-Robin Database Round-Robin Database
  29. Final Thoughts Final Thoughts flickr.com/pagedooley flickr.com/pagedooley

  30. It's unlikely your data all It's unlikely your data all

    fits in one paradigm. fits in one paradigm.
  31. Just buying bigger servers Just buying bigger servers goes a

    long way goes a long way
  32. If it sounds too good to be true, If it

    sounds too good to be true, it probably is. it probably is.
  33. Fin. Fin. Andrew Godwin / @andrewgodwin Andrew Godwin / @andrewgodwin

    flickr.com/oimax flickr.com/oimax