Upgrade to Pro — share decks privately, control downloads, hide ads and more …

I Hate Your Database

I Hate Your Database

A talk I gave at Djangocon Europe 2012

Andrew Godwin

June 05, 2012
Tweet

More Decks by Andrew Godwin

Other Decks in Technology

Transcript

  1. I Hate Your Database I Hate Your Database Andrew Godwin

    Andrew Godwin @andrewgodwin @andrewgodwin flickr.com/96dpi flickr.com/96dpi
  2. Relational Relational Document Document Key-Value Key-Value Graph Graph Object /

    Hierarchial Object / Hierarchial Spatial Spatial Time-series / RRD Time-series / RRD Search Search
  3. Relational Relational PostgreSQL PostgreSQL MySQL MySQL SQLite SQLite PostgreSQL PostgreSQL

    MySQL MySQL SQLite SQLite MongoDB MongoDB CouchDB CouchDB Document Document Redis Redis Cassandra Cassandra Riak Riak Key-value Key-value
  4. Common Pitfalls Common Pitfalls SELECT item1, item2, item3 FROM basket;

    SELECT item1, item2, item3 FROM basket; INSERT INTO groups (name, people) INSERT INTO groups (name, people) VALUES ('friends', 'aaron,betty,chris,damien') VALUES ('friends', 'aaron,betty,chris,damien') SELECT file_bytes FROM pictures WHERE filename = SELECT file_bytes FROM pictures WHERE filename = 'foo.jpg' 'foo.jpg'
  5. names = set() names = set() for book in Book.objects.filter(year=2012):

    for book in Book.objects.filter(year=2012): names.add(book.author.name) names.add(book.author.name)
  6. No transactional DDL No transactional DDL Poor query optimiser Poor

    query optimiser MyISAM: Full-table locking, no transactions MyISAM: Full-table locking, no transactions Oracle Oracle Very fast for some operations Very fast for some operations MySQL MySQL
  7. Little integrity checking (slowly being fixed) Little integrity checking (slowly

    being fixed) Impossible to do some table alterations Impossible to do some table alterations No concurrent access No concurrent access Very low overhead Very low overhead Very portable Very portable SQLite SQLite
  8. Slow default configuration Slow default configuration Can be a little

    harder to learn / less familiar Can be a little harder to learn / less familiar Almost too many features Almost too many features Incredibly reliable Incredibly reliable PostgreSQL PostgreSQL
  9. db.insert({ db.insert({ 'name': 'Sally', 'name': 'Sally', 'tags': ['django', 'python', 'search'],

    'tags': ['django', 'python', 'search'], 'addresses': [ 'addresses': [ {'type': 'jabber', 'jid': '[email protected]'}, {'type': 'jabber', 'jid': '[email protected]'}, {'type': 'phone', 'number': '011899981199'}, {'type': 'phone', 'number': '011899981199'}, ], ], }) }) db.find({ db.find({ 'tags': 'python', 'tags': 'python', 'addresses.type': 'jabber', 'addresses.type': 'jabber', }) })
  10. No fixed schema No fixed schema Low barrier to entry

    Low barrier to entry Closer to Python datatypes Closer to Python datatypes Advantages Advantages
  11. Immature (but improving) Immature (but improving) No transactions No transactions

    No integrity checking No integrity checking Problems Problems
  12. db.set('foo', 'bar) db.set('foo', 'bar) x = db.get('foo') x = db.get('foo')

    db.sadd('names', 'andrew') db.sadd('names', 'andrew') db.sadd('names', 'brian') db.sadd('names', 'brian') y = db.scard('names') y = db.scard('names')
  13. Horizontal scaling (but with drawbacks) Horizontal scaling (but with drawbacks)

    Extremely fast Extremely fast Can only fetch objects by key Can only fetch objects by key Batch/map-reduce queries Batch/map-reduce queries Transactions not possible Transactions not possible Traits Traits
  14. Knowledge of projections useful Knowledge of projections useful Spatial indexes

    really speed up some problems Spatial indexes really speed up some problems Generally add-on to existing DB Generally add-on to existing DB Spatial Spatial
  15. Hierarchial key-value store Hierarchial key-value store Allows multiple writers for

    appends Allows multiple writers for appends Supports very large files Supports very large files Filesystems Filesystems
  16. Allow efficient neighbour queries Allow efficient neighbour queries Generally not

    much use for anything else Generally not much use for anything else Graph Databases Graph Databases
  17. Deliberately loses old data Deliberately loses old data Useful for

    logging or statistics Useful for logging or statistics Round-Robin Database Round-Robin Database
  18. It's unlikely your data all It's unlikely your data all

    fits in one paradigm. fits in one paradigm.
  19. If it sounds too good to be true, If it

    sounds too good to be true, it probably is. it probably is.