Small Data: Storage For The Rest Of Us

A talk I gave at PyWaw Summit 2015.

Andrew Godwin

May 26, 2015

Transcript

  1. Andrew Godwin
    @andrewgodwin
    SMALL DATA
    STORAGE FOR THE REST OF US

  2. Hi, I'm Andrew Godwin
    Django Core Developer
    Senior Engineer at
    Far too many hobbies

  3. BIG DATA
    What does it mean?

  4. BIG DATA
    What does it mean?
    What is 'big'?

  5. 1,000 rows?
    1,000,000 rows?
    1,000,000,000 rows?
    1,000,000,000,000 rows?

  6. Scalable designs are a tradeoff:
    NOW vs LATER

  7. Small company? Agency?
    Focus on ease of change, not scalability

  8. You don't need to scale from day one
    But always leave yourself scaling points

  9. Rapid development
    Continuous deployment
    Hardware choice
    Scaling 'breakpoints'

  10. Rapid development
    It's all about schema change overhead

  11. Explicit Schema
    ID (int) | Name (text) | Weight (uint)
    1        | Alice       | 76
    2        | Bob         | 84
    3        | Charles     | 65
    Implicit Schema
    {
    "id": 342,
    "name": "David",
    "weight": 44,
    }

  12. Silent Failure
    {
    "id": 342,
    "name": "David",
    "weight": 74,
    }
    {
    "id": 342,
    "name": "Ellie",
    "weight": "85kg",
    }
    {
    "id": 342,
    "nom": "Frankie",
    "weight": 77,
    }
    {
    "id": 342,
    "name": "Frankie",
    "weight": -67,
    }
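The four records above can be checked in application code when the store itself enforces nothing. What follows is a minimal sketch, not something shown in the talk: the `SCHEMA` mapping and the `validate` helper are hypothetical names, and the non-negative weight rule is an assumption that mirrors the unsigned `Weight uint` column of the explicit schema.

```python
# Hypothetical schema: field name -> required Python type (not from the talk).
SCHEMA = {"id": int, "name": str, "weight": int}


def validate(record):
    """Return a list of problems found in one schemaless record."""
    problems = []
    for field, expected in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    for field in record:
        if field not in SCHEMA:
            problems.append(f"unexpected field: {field}")
    # Assumed rule, mirroring the unsigned "Weight uint" column.
    weight = record.get("weight")
    if isinstance(weight, int) and weight < 0:
        problems.append("weight must not be negative")
    return problems


# The four documents from the slide, in order.
records = [
    {"id": 342, "name": "David", "weight": 74},
    {"id": 342, "name": "Ellie", "weight": "85kg"},
    {"id": 342, "nom": "Frankie", "weight": 77},
    {"id": 342, "name": "Frankie", "weight": -67},
]
for record in records:
    print(validate(record))
```

Only the first record passes. The other three are exactly the kinds of write a schemaless store would accept without complaint.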

  13. Continuous deployment
    It's 11pm. Do you know where your locks are?

  14. Add NULL and backfill
    1-to-1 relation and backfill
    DBMS-supported type changes
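The first pattern, adding a nullable column and then backfilling it, might look roughly like the sketch below. It uses Python's built-in `sqlite3` module so it runs anywhere; the `person` table, the `weight` column, the batch size and the placeholder value are all assumptions made for illustration, and locking behaviour differs between database engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [("Alice",), ("Bob",), ("Charles",)],
)
conn.commit()

# Step 1: add the column as nullable, so existing rows stay valid as they are.
conn.execute("ALTER TABLE person ADD COLUMN weight INTEGER")

# Step 2: backfill in small batches, committing between them so that
# each transaction stays short.
BATCH_SIZE = 2       # deliberately tiny for the demo
DEFAULT_WEIGHT = 0   # placeholder value, an assumption for this sketch
while True:
    cursor = conn.execute(
        "UPDATE person SET weight = ? WHERE id IN ("
        "SELECT id FROM person WHERE weight IS NULL LIMIT ?)",
        (DEFAULT_WEIGHT, BATCH_SIZE),
    )
    conn.commit()
    if cursor.rowcount == 0:
        break

remaining = conn.execute(
    "SELECT COUNT(*) FROM person WHERE weight IS NULL"
).fetchone()[0]
print(remaining)
```

Once nothing is left to backfill, a NOT NULL constraint can be added as a separate step where the database supports it.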

  15. Hardware choice
    ZOMG RUN IT ON THE CLOUD

  16. VMs are TERRIBLE at IO
    Up to 10x slowdown, even with VT-d.

  17. Memory is king
    Your database loves it. Don't let other apps steal it.

  18. Adding more power goes far
    Especially with PostgreSQL or read-only replicas
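In a Django project, read-only replicas are usually wired in through a database router. The class below is a sketch under assumptions rather than anything from the talk: the aliases `default`, `replica1` and `replica2` are hypothetical and would have to match the keys of the project's `DATABASES` setting. The four method names are the ones Django's router interface looks for.

```python
import random


class ReplicaRouter:
    """Send reads to a replica and writes to the primary.

    The aliases are hypothetical; they must match keys in DATABASES.
    """

    primary = "default"
    replicas = ["replica1", "replica2"]

    def db_for_read(self, model, **hints):
        # Spread reads across replicas at random.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias holds the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Schema changes run against the primary only.
        return db == self.primary
```

A router like this is activated by listing its dotted path in the `DATABASE_ROUTERS` setting. Reads served this way can lag behind the primary, so it only suits queries that tolerate slightly stale data.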

  19. Scaling Breakpoints

  20. Sharding point
    Datasets partitioned by primary key
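Partitioning by primary key comes down to a function that maps a key to a shard. Here is a minimal sketch using modulo routing, which is only one of several possible schemes; the shard names and the `shard_for` helper are hypothetical.

```python
# Hypothetical shard names; in practice these would be connection aliases.
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(primary_key):
    """Pick the shard that owns a row, based only on its integer primary key."""
    return SHARDS[primary_key % len(SHARDS)]


for pk in (1, 2, 342):
    print(pk, shard_for(pk))
```

Plain modulo routing is easy to reason about, but changing the number of shards moves most keys to a different shard.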

  21. Vertical split
    Entirely unrelated tables

  22. Denormalisation
    It's not free!
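One way to see the cost is a cached counter: reads get cheaper, but every write now has to touch two tables in the same transaction. This sketch uses the built-in `sqlite3` module; the `topic` and `reply` tables and the `add_reply` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topic (
        id INTEGER PRIMARY KEY,
        reply_count INTEGER NOT NULL DEFAULT 0  -- denormalised copy
    );
    CREATE TABLE reply (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL);
    INSERT INTO topic (id) VALUES (1);
""")


def add_reply(topic_id):
    # The extra UPDATE is the price of the cached count: both statements
    # share one transaction so the copy cannot drift from the real data.
    with conn:
        conn.execute("INSERT INTO reply (topic_id) VALUES (?)", (topic_id,))
        conn.execute(
            "UPDATE topic SET reply_count = reply_count + 1 WHERE id = ?",
            (topic_id,),
        )


add_reply(1)
add_reply(1)
cached = conn.execute("SELECT reply_count FROM topic WHERE id = 1").fetchone()[0]
actual = conn.execute("SELECT COUNT(*) FROM reply WHERE topic_id = 1").fetchone()[0]
print(cached, actual)
```

Any code path that writes to `reply` without going through `add_reply` would leave the cached count wrong, which is the maintenance burden the slide warns about.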

  23. Consistency leeway
    Can you take inconsistent views?

  24. Load Shapes

  25. Read-heavy
    Write-heavy
    Large size

  26. Read-heavy
    Write-heavy
    Large size
    Wikipedia
    TV show website
    Minecraft
    Forums
    Amazon Glacier
    Eventbrite
    Logging

  27. Read-heavy
    Write-heavy
    Large size
    Offline storage
    Append formats
    In-memory cache / flat files
    Many indexes
    Fewer indexes

  28. Extremes

  29. Extreme Reads
    Heavy Replication
    Extreme Writes
    Sacrifice ordering or consistency
    Extreme Size
    Sacrifice query time

  30. Extreme Longevity
    Flash in cold storage
    Extreme Survivability
    Rad-hardened Flash
    Extreme Auditability
    True append only storage
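Software alone cannot provide true append-only storage, but a hash chain at least makes later edits detectable. The sketch below is that swapped-in technique, not the storage the slide refers to; the `append` and `verify` helpers and the event payloads are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def append(log, payload):
    """Append an entry whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()
    log.append({"prev": prev_hash, "body": body, "hash": digest})


def verify(log):
    """Return True if the chain is intact from the first entry onward."""
    prev_hash = GENESIS
    for entry in log:
        expected = hashlib.sha256(
            (prev_hash + entry["body"]).encode("utf-8")
        ).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"event": "created", "id": 342})
append(log, {"event": "renamed", "id": 342})
print(verify(log))

# Rewriting history breaks the chain.
log[0]["body"] = log[0]["body"].replace("created", "deleted")
print(verify(log))
```

Dropping entries from the end of the log goes unnoticed unless the latest hash is also recorded somewhere the writer cannot change.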

  31. Approximate time to bit flip, unpowered at room temperature
    Media: SSDs, Magnetic Tape, Hard Drives, Consumer Flash, CDs/DVDs, Long-life Flash, Metal-Carbon DVDs
    Time ranges: 3-6 months, 5-10 years, 3-5 years, 100+ years

  32. Big Data isn't one thing
    It depends on type, size, complexity, throughput, latency...

  33. Focus on the current problems
    Future problems don't matter if you never get there

  34. Efficiency and iterating fast matter
    The smaller you are, the more your time is worth

  35. Good architecture affects product
    You're not writing a system in a vacuum

  36. Thanks.
    Andrew Godwin
    @andrewgodwin
