Small Data: Databases in the Real World

A talk I gave at PyCon AU 2014.

Andrew Godwin

August 04, 2014

Transcript

  1. SMALL DATA: DATABASES IN THE REAL WORLD. Andrew Godwin, @andrewgodwin

  2. Andrew Godwin: Core Developer, Senior Engineer

  3. BIG DATA: What does it mean? What is 'big'?

  4. 1,000 rows? 1,000,000 rows? 1,000,000,000 rows? 1,000,000,000,000 rows?

  5. Scalable designs are a tradeoff: NOW vs LATER

  6. Small company? Agency? Focus on ease of change, not scalability

  7. You don't need to scale from day one. But always leave yourself scaling points.

  8. Rapid development, Continuous deployment, Hardware choice, Scaling 'breakpoints'

  9. Rapid development: It's all about schema change overhead

  10. Explicit Schema:

    ID (int) | Name (text) | Weight (uint)
    1 | Alice | 76
    2 | Bob | 84
    3 | Charles | 65

    Implicit Schema: { "id": 342, "name": "David", "weight": 44, }
  11. Silent Failure:

    { "id": 342, "name": "David", "weight": 74, }
    { "id": 342, "name": "Ellie", "weight": "85kg", }
    { "id": 342, "nom": "Frankie", "weight": 77, }
    { "id": 342, "name": "Frankie", "weight": -67, }

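The silent failures on this slide (a string where a number belongs, a misspelt key, a negative weight) can be caught in application code when the datastore does not enforce a schema. The following is a minimal sketch, not something from the talk: `EXPECTED_FIELDS` and `validate_person` are hypothetical names, and the rule that weight must not be negative is an assumption read off the `uint` column on the previous slide.

```python
# Hypothetical validator for the "person" documents shown on the slides.
# The field names and types mirror the explicit schema: id, name, weight.
EXPECTED_FIELDS = {"id": int, "name": str, "weight": int}


def validate_person(doc):
    """Return a list of problems with doc; an empty list means it is valid."""
    problems = []
    # Catch misspelt or unknown keys such as "nom".
    for key in doc:
        if key not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {key}")
    # Catch missing keys and wrong types such as "85kg" for weight.
    for key, expected_type in EXPECTED_FIELDS.items():
        if key not in doc:
            problems.append(f"missing field: {key}")
        elif isinstance(doc[key], bool) or not isinstance(doc[key], expected_type):
            problems.append(f"{key} should be {expected_type.__name__}")
    # Assumption: weight is unsigned, as in the explicit schema.
    weight = doc.get("weight")
    if isinstance(weight, int) and not isinstance(weight, bool) and weight < 0:
        problems.append("weight must not be negative")
    return problems
```

Running a check like this on every write is roughly what an explicit schema gives you for free; the cost is that the rules now live in application code and have to be kept in step with the data by hand.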
  12. Continuous deployment: It's 11pm. Do you know where your locks are?

  13. Add NULL and backfill. 1-to-1 relation and backfill. DBMS-supported type changes.

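The first option, adding a nullable column and backfilling it, might look like the sketch below. It uses SQLite through Python's standard `sqlite3` module purely as a stand-in; locking behaviour differs between database systems, so treat this as an illustration of the pattern rather than a recipe. The `people` table, the `weight_grams` column, and both function names are hypothetical.

```python
import sqlite3


def add_nullable_column(conn):
    # Nullable and without a default: existing rows simply read as NULL.
    conn.execute("ALTER TABLE people ADD COLUMN weight_grams INTEGER")


def backfill_weight_grams(conn, batch_size=1000):
    """Fill weight_grams in small batches; returns the number of rows updated."""
    updated = 0
    last_id = 0
    while True:
        # Walk the table in primary-key order, one batch of ids at a time.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM people WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            return updated
        with conn:  # one short transaction per batch, committed on exit
            cur = conn.execute(
                "UPDATE people SET weight_grams = weight * 1000 "
                "WHERE id BETWEEN ? AND ? AND weight_grams IS NULL",
                (ids[0], ids[-1]),
            )
        updated += cur.rowcount
        last_id = ids[-1]
```

Keeping each batch in its own short transaction is the point of the exercise: no single statement holds locks over the whole table, so the backfill can run while the site stays up. The `weight_grams IS NULL` guard also makes the backfill safe to re-run.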
  14. Hardware choice: ZOMG RUN IT ON THE CLOUD

  15. VMs are TERRIBLE at IO: Up to 10x slowdown, even with VT-d.

  16. Memory is king: Your database loves it. Don't let other apps steal it.

  17. Adding more power goes far: Especially with PostgreSQL or read-only replicas

  18. (no text on this slide)
  19. Sharding point, Vertical split, Consistency leeway

  20. Sharding point: Datasets partitioned by primary key

  21. Migration plan: Implement consistent hashing on primary key. Make large number of logical shards (2048?). Map logical shards to single physical shard. Migrate shards using replication.

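One way to read this plan is as a stable hash of the primary key into a fixed pool of logical shards, plus a lookup table from logical shard to physical server. The sketch below takes that reading; it is a simplification and not a hash ring, and every name in it (`logical_shard`, `initial_shard_map`, `move_logical_shards`, the server labels) is hypothetical. Only the figure 2048 comes from the slide.

```python
import hashlib

LOGICAL_SHARDS = 2048  # the figure floated on the slide


def logical_shard(primary_key):
    """Map a primary key to a logical shard number, stable across processes."""
    digest = hashlib.sha1(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % LOGICAL_SHARDS


def initial_shard_map(server):
    """Start with every logical shard living on one physical server."""
    return {n: server for n in range(LOGICAL_SHARDS)}


def physical_shard(primary_key, shard_map):
    """Look up which physical server currently holds this key."""
    return shard_map[logical_shard(primary_key)]


def move_logical_shards(shard_map, shards, new_server):
    """Return a new map with the given logical shards pointed at new_server."""
    updated = dict(shard_map)
    for n in shards:
        updated[n] = new_server
    return updated
```

Because keys never change logical shard, growing the cluster only means copying some logical shards to a new server (for example via replication, as the slide suggests) and then updating the map; no key is rehashed.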
  22. Vertical split: Entirely unrelated tables

  23. Migration plan: Replicate database to new server. Route split tables there, disable replication. - or - Slowly backfill new datastore with fallback lookup.

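The second option, a slow backfill with fallback lookup, can be sketched with two dictionaries standing in for the old and new datastores. This is an illustration under assumptions: the function names are hypothetical, and sending all writes to the new store only is one possible policy, not something the slide specifies.

```python
def read(key, new_store, old_store):
    """Read from the new datastore, falling back to the old one on a miss."""
    if key in new_store:
        return new_store[key]
    return old_store[key]


def write(key, value, new_store):
    """Assumed policy: all writes go to the new datastore only."""
    new_store[key] = value


def backfill_batch(new_store, old_store, batch_size=100):
    """Copy up to batch_size missing keys across; returns how many were copied."""
    copied = 0
    for key, value in old_store.items():
        if copied >= batch_size:
            break
        if key not in new_store:  # never overwrite newer data
            new_store[key] = value
            copied += 1
    return copied
```

Calling `backfill_batch` repeatedly until it returns 0 finishes the migration; until then, reads keep working because of the fallback, and values written during the migration are never clobbered by older copies.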
  24. Denormalisation: It's not free!

  25. Migration plan: Add NULL fields to dependent tables. App code to fetch and fill if not present. Possibly prefill on save of new items.

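The fetch-and-fill step could look something like the following, again using SQLite via Python's `sqlite3` module as a stand-in. The schema is hypothetical: a `posts` table carrying a nullable `author_name` column that is a denormalised copy of `users.name`.

```python
import sqlite3


def get_author_name(conn, post_id):
    """Return the denormalised author name, filling it in if it is still NULL."""
    row = conn.execute(
        "SELECT author_id, author_name FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    if row is None:
        raise KeyError(post_id)
    author_id, author_name = row
    if author_name is None:
        # Not filled yet: fetch from the source table and store the copy.
        author_name = conn.execute(
            "SELECT name FROM users WHERE id = ?", (author_id,)
        ).fetchone()[0]
        with conn:
            conn.execute(
                "UPDATE posts SET author_name = ? WHERE id = ?",
                (author_name, post_id),
            )
    return author_name


def create_post(conn, author_id, title):
    """Prefill the denormalised field when a new post is saved."""
    name = conn.execute(
        "SELECT name FROM users WHERE id = ?", (author_id,)
    ).fetchone()[0]
    with conn:
        cur = conn.execute(
            "INSERT INTO posts (author_id, title, author_name) VALUES (?, ?, ?)",
            (author_id, title, name),
        )
    return cur.lastrowid
```

The cost the previous slide warns about shows up as soon as a user is renamed: every copied `author_name` is then stale, and something has to find and refresh those rows.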
  26. Consistency leeway: Can you take inconsistent views?

  27. Migration plan: Change your site! Talk to your designers! Deliberately introduce inconsistency!

  28. Big Data isn't one thing: It depends on type, size, complexity, throughput, latency...

  29. Focus on the current problems: Future problems don't matter if you never get there

  30. Efficiency and iterating fast matters: The smaller you are, the more time is worth

  31. Good architecture affects product: You're not writing a system in a vacuum

  32. Thanks! Andrew Godwin @andrewgodwin andrewgodwin@eventbrite.com are hiring!