Small Data: Databases in the Real World

A talk I gave at PyCon AU 2014.

Andrew Godwin

August 04, 2014

Transcript

  1. Andrew Godwin
    @andrewgodwin
    SMALL DATA
    DATABASES IN THE
    REAL WORLD

  2. Andrew Godwin
    Core Developer
    Senior Engineer

  3. BIG DATA
    What does it mean?
    What is 'big'?

  4. 1,000 rows?
    1,000,000 rows?
    1,000,000,000 rows?
    1,000,000,000,000 rows?

  5. Scalable designs are a tradeoff:
    NOW vs LATER

  6. Small company? Agency?
    Focus on ease of change, not scalability

  7. You don't need to scale
    from day one
    But always leave yourself scaling points

  8. Rapid development
    Continuous deployment
    Hardware choice
    Scaling 'breakpoints'

  9. Rapid development
    It's all about schema change overhead

  10. Explicit Schema
    ID (int)  Name (text)  Weight (uint)
    1         Alice        76
    2         Bob          84
    3         Charles      65
    Implicit Schema
    {
    "id": 342,
    "name": "David",
    "weight": 44,
    }

  11. Silent Failure
    {
    "id": 342,
    "name": "David",
    "weight": 74,
    }
    {
    "id": 342,
    "name": "Ellie",
    "weight": "85kg",
    }
    {
    "id": 342,
    "nom": "Frankie",
    "weight": 77,
    }
    {
    "id": 342,
    "name": "Frankie",
    "weight": -67,
    }
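
With an implicit schema, nothing stops records like the four above from being stored, so any checking has to live in application code. The following is a minimal sketch of such a check, assuming the three fields shown on the slide; the function name `validate_person` and the rule that weight must not be negative are illustrative assumptions, not something from the talk.

```python
# Hypothetical application-level check for the records shown on the slide.
EXPECTED_FIELDS = {"id": int, "name": str, "weight": int}


def validate_person(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, kind in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif type(record[field]) is not kind:
            problems.append(f"{field} should be {kind.__name__}")
    for field in record:
        if field not in EXPECTED_FIELDS:
            problems.append(f"unexpected field: {field}")
    # Assumed rule: a weight is never negative (the "uint" of the explicit schema).
    if type(record.get("weight")) is int and record["weight"] < 0:
        problems.append("weight must not be negative")
    return problems


records = [
    {"id": 342, "name": "David", "weight": 74},
    {"id": 342, "name": "Ellie", "weight": "85kg"},
    {"id": 342, "nom": "Frankie", "weight": 77},
    {"id": 342, "name": "Frankie", "weight": -67},
]
for record in records:
    print(validate_person(record))
```

Only the first record comes back with an empty problem list; the other three each fail in a way a document store would accept without complaint.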

  12. Continuous deployment
    It's 11pm. Do you know where your locks are?

  13. Add NULL and backfill
    1-to-1 relation and backfill
    DBMS-supported type changes
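
The first option above, adding a nullable column and backfilling it, could look roughly like this. It is a sketch using Python's built-in `sqlite3` module so that it runs anywhere; the `people` table, the `weight_grams` column, and the batch size are assumptions for illustration, and the locking behaviour of a production database such as PostgreSQL or MySQL will differ from SQLite's.

```python
import sqlite3


def add_column_and_backfill(conn, batch_size=1000):
    # Step 1: add the column as nullable with no default, which is
    # typically a cheap schema change.
    conn.execute("ALTER TABLE people ADD COLUMN weight_grams INTEGER")
    # Step 2: backfill in small batches, committing after each one so no
    # single transaction holds its locks for long.
    while True:
        cur = conn.execute(
            "UPDATE people SET weight_grams = weight * 1000 "
            "WHERE id IN (SELECT id FROM people "
            "WHERE weight_grams IS NULL AND weight IS NOT NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break


# Hypothetical demo data, reusing the names from the explicit-schema slide.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, weight INTEGER)"
)
conn.executemany(
    "INSERT INTO people (id, name, weight) VALUES (?, ?, ?)",
    [(1, "Alice", 76), (2, "Bob", 84), (3, "Charles", 65)],
)
conn.commit()
add_column_and_backfill(conn, batch_size=2)
print(conn.execute("SELECT name, weight_grams FROM people ORDER BY id").fetchall())
```

Keeping each batch small is the design choice that matters here: the application can keep reading and writing between batches instead of waiting on one long-running update.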

  14. Hardware choice
    ZOMG RUN IT ON THE CLOUD

  15. VMs are TERRIBLE at IO
    Up to 10x slowdown, even with VT-d.

  16. Memory is king
    Your database loves it. Don't let other apps steal it.

  17. Adding more power goes far
    Especially with PostgreSQL or read-only replicas

  19. Sharding point
    Vertical split
    Consistency leeway

  20. Sharding point
    Datasets partitioned by primary key

  21. Migration plan
    Implement consistent hashing on primary key
    Make large number of logical shards (2048?)
    Map logical shards to single physical shard
    Migrate shards using replication
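
One way to picture the first three steps of this plan is a stable hash of the primary key into a fixed set of logical shards, plus a lookup table from logical shard to physical server. The sketch below is a simplification under stated assumptions: it uses a fixed-bucket hash (SHA-256 modulo 2048) rather than a full consistent-hashing ring, and the server names `db1` and `db2` and all function names are hypothetical.

```python
import hashlib

NUM_LOGICAL_SHARDS = 2048  # the figure suggested on the slide


def logical_shard(primary_key):
    """Map a primary key to a logical shard number, stably across processes."""
    # Python's built-in hash() is randomised per process for strings,
    # so a cryptographic digest is used to get a stable value instead.
    digest = hashlib.sha256(str(primary_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS


# To begin with, every logical shard lives on the same physical server.
shard_map = {n: "db1" for n in range(NUM_LOGICAL_SHARDS)}


def physical_shard(primary_key):
    """Return the (hypothetical) server name that holds this key."""
    return shard_map[logical_shard(primary_key)]


def move_logical_shard(shard_number, new_server):
    """Repoint one logical shard once its data has been replicated across."""
    shard_map[shard_number] = new_server


print(physical_shard(42))
move_logical_shard(logical_shard(42), "db2")
print(physical_shard(42))
```

Because keys never change logical shard, scaling out later only means copying whole logical shards to a new server and updating the map; no key needs to be rehashed.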

  22. Vertical split
    Entirely unrelated tables

  23. Migration plan
    Replicate database to new server
    Route split tables there, disable replication
    - or -
    Slowly backfill new datastore with fallback lookup
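
The second option, a slow backfill with fallback lookup, can be sketched as a read path that tries the new datastore first and copies the value across on a miss. Plain dictionaries stand in for the two datastores here, and the function name `get_with_fallback` is an assumption for illustration.

```python
def get_with_fallback(key, new_store, old_store):
    """Read from the new store, falling back to (and copying from) the old one."""
    try:
        return new_store[key]
    except KeyError:
        # Not migrated yet: read from the old store. This raises KeyError
        # if the key exists in neither store.
        value = old_store[key]
        # Backfill so the next read is served by the new store.
        new_store[key] = value
        return value


# Dictionaries stand in for the old and new datastores.
old_store = {"user:1": "Alice", "user:2": "Bob"}
new_store = {}

print(get_with_fallback("user:1", new_store, old_store))
print(new_store)
```

Rows that are read often migrate themselves this way; a background job can then copy whatever is left before the old store is switched off.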

  24. Denormalisation
    It's not free!

  25. Migration plan
    Add NULL fields to dependent tables
    App code to fetch and fill if not present
    Possibly prefill on save of new items
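
As a rough illustration of the first two steps, the sketch below adds a nullable `author_name` column to a `posts` table and fills it lazily from `authors` the first time it is needed. The tables, columns, and function name are hypothetical, and `sqlite3` is used only to keep the example runnable.

```python
import sqlite3


def author_name_for_post(conn, post_id):
    """Return the denormalised author name, fetching and filling it if absent."""
    row = conn.execute(
        "SELECT author_id, author_name FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    if row is None:
        raise KeyError(post_id)
    author_id, cached_name = row
    if cached_name is not None:
        return cached_name
    # Field not filled yet: read the source of truth, then store the copy.
    name = conn.execute(
        "SELECT name FROM authors WHERE id = ?", (author_id,)
    ).fetchone()[0]
    conn.execute("UPDATE posts SET author_name = ? WHERE id = ?", (name, post_id))
    conn.commit()
    return name


# Hypothetical demo schema: posts carries a nullable copy of authors.name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, author_name TEXT)"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO posts (id, author_id) VALUES (10, 1)")
conn.commit()

print(author_name_for_post(conn, 10))
```

This is where the "not free" part shows up: once the copy exists, every change to an author's name also has to update or clear the copies in `posts`.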

  26. Consistency leeway
    Can you take inconsistent views?

  27. Migration plan
    Change your site!
    Talk to your designers!
    Deliberately introduce inconsistency!

  28. Big Data isn't one thing
    It depends on type, size, complexity,
    throughput, latency...

  29. Focus on the current problems
    Future problems don't matter if you never get there

  30. Efficiency and iterating fast matter
    The smaller you are, the more time is worth

  31. Good architecture affects product
    You're not writing a system in a vacuum

  32. Thanks!
    Andrew Godwin
    @andrewgodwin
    [email protected]
    are hiring!
