Small Data: Storage For The Rest Of Us

A talk I gave at PyWaw Summit 2015.

Andrew Godwin

May 26, 2015

Transcript

  1. Andrew Godwin, @andrewgodwin
    SMALL DATA: STORAGE FOR THE REST OF US

  2. Hi, I'm Andrew Godwin
    Django Core Developer
    Senior Engineer at
    Far too many hobbies

  3. BIG DATA: What does it mean?

  4. BIG DATA: What does it mean? What is 'big'?

  5. 1,000 rows? 1,000,000 rows? 1,000,000,000 rows? 1,000,000,000,000 rows?

  6. Scalable designs are a tradeoff: NOW vs LATER

  7. Small company? Agency? Focus on ease of change, not scalability

  8. You don't need to scale from day one
    But always leave yourself scaling points

  9. Rapid development
    Continuous deployment
    Hardware choice
    Scaling 'breakpoints'

  10. Rapid development: it's all about schema change overhead

  11. Explicit Schema
    ID (int) | Name (text) | Weight (uint)
    1        | Alice       | 76
    2        | Bob         | 84
    3        | Charles     | 65
    Implicit Schema
    { "id": 342, "name": "David", "weight": 44 }

  12. Silent Failure
    { "id": 342, "name": "David", "weight": 74 }
    { "id": 342, "name": "Ellie", "weight": "85kg" }
    { "id": 342, "nom": "Frankie", "weight": 77 }
    { "id": 342, "name": "Frankie", "weight": -67 }

  13. Continuous deployment: It's 11pm. Do you know where your locks are?

  14. Add NULL and backfill
    1-to-1 relation and backfill
    DBMS-supported type changes

  15. Hardware choice: ZOMG RUN IT ON THE CLOUD

  16. VMs are TERRIBLE at I/O: up to 10x slowdown, even with VT-d.

  17. Memory is king. Your database loves it. Don't let other apps steal it.

  18. Adding more power goes far, especially with PostgreSQL or read-only replicas

  19. Scaling Breakpoints

  20. Sharding point: datasets partitioned by primary key

  21. Vertical split: entirely unrelated tables

  22. Denormalisation: it's not free!

  23. Consistency leeway: can you accept inconsistent views?

  24. Load Shapes

  25. Read-heavy / Write-heavy / Large size

  26. Read-heavy / Write-heavy / Large size
    Examples: Wikipedia, TV show website, Minecraft Forums, Amazon Glacier, Eventbrite, Logging

  27. Read-heavy: in-memory cache / flat files, many indexes
    Write-heavy: append formats, fewer indexes
    Large size: offline storage

  28. Extremes

  29. Extreme reads: heavy replication
    Extreme writes: sacrifice ordering or consistency
    Extreme size: sacrifice query time

  30. Extreme longevity: flash in cold storage
    Extreme survivability: rad-hardened flash
    Extreme auditability: true append-only storage

  31. Approximate time to bit flip, unpowered at room temperature
    Media: SSDs, magnetic tape, hard drives, consumer flash, CDs/DVDs, long-life flash, metal-carbon DVDs
    Time labels: 3-6 months, 5-10 years, 3-5 years, 100+ years

  32. Big Data isn't one thing. It depends on type, size, complexity, throughput, latency...

  33. Focus on the current problems. Future problems don't matter if you never get there.

  34. Efficiency and iterating fast matter. The smaller you are, the more your time is worth.

  35. Good architecture affects product. You're not writing a system in a vacuum.

  36. Thanks. Andrew Godwin @andrewgodwin
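Slide 5 leaves 'big' deliberately open. As a rough illustration only, assume a hypothetical average of 100 bytes per row. Real row sizes and index overhead vary a lot. Under that assumption, the row counts on that slide work out to these raw sizes:

```python
# Back-of-the-envelope sizing for the row counts on slide 5.
# ROW_BYTES is a hypothetical average; real rows, indexes and overhead vary.
ROW_BYTES = 100


def human(n_bytes: float) -> str:
    """Format a byte count using decimal (SI) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1000:
            return f"{n_bytes:g} {unit}"
        n_bytes /= 1000
    return f"{n_bytes:g} PB"


for rows in (10**3, 10**6, 10**9, 10**12):
    print(f"{rows:>17,} rows -> about {human(rows * ROW_BYTES)}")
```

Under that assumption even a billion rows is around 100 GB, which a single well-equipped server can hold. Change ROW_BYTES to match your own tables.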
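Slide 11 contrasts an explicit schema (a typed table) with an implicit one (a free-form document). Below is a minimal sketch of the explicit side using Python's built-in sqlite3 module. The table name person is hypothetical. SQLite column types are only affinities, so CHECK constraints stand in here for the strict int/uint types shown on the slide:

```python
import sqlite3

# Explicit schema mirroring slide 11 (hypothetical table name "person").
# CHECK constraints make SQLite reject values of the wrong type or range.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        weight INTEGER NOT NULL
               CHECK (typeof(weight) = 'integer' AND weight >= 0)
    )
    """
)
conn.executemany(
    "INSERT INTO person (id, name, weight) VALUES (?, ?, ?)",
    [(1, "Alice", 76), (2, "Bob", 84), (3, "Charles", 65)],
)
conn.commit()

# Bad data is rejected at write time instead of being stored silently.
for bad in [(4, "Ellie", "85kg"), (5, "Frankie", -67)]:
    try:
        conn.execute("INSERT INTO person (id, name, weight) VALUES (?, ?, ?)", bad)
    except sqlite3.IntegrityError as exc:
        print(f"rejected {bad}: {exc}")
```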
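With an implicit schema, every failure on slide 12 is stored without complaint unless application code checks for it. Those failures are a string weight, a misspelled key and a negative weight. The hand-rolled validator below is one way to catch them. Its rules (required keys, an integer weight, no negatives) are assumptions read off the slide, and validate_person is a hypothetical name:

```python
from typing import Any

# Expected shape of a document. Field names come from slide 12; the rules
# (required keys, integer weight, non-negative weight) are assumptions.
EXPECTED_FIELDS = {"id": int, "name": str, "weight": int}


def validate_person(doc: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the document looks valid."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field {field!r}")
        # bool is a subclass of int in Python, so exclude it explicitly.
        elif not isinstance(doc[field], expected_type) or isinstance(doc[field], bool):
            problems.append(f"{field!r} should be {expected_type.__name__}")
    for field in sorted(doc.keys() - EXPECTED_FIELDS.keys()):
        problems.append(f"unexpected field {field!r}")
    weight = doc.get("weight")
    if isinstance(weight, int) and weight < 0:
        problems.append("'weight' must not be negative")
    return problems


docs = [
    {"id": 342, "name": "David", "weight": 74},
    {"id": 342, "name": "Ellie", "weight": "85kg"},
    {"id": 342, "nom": "Frankie", "weight": 77},
    {"id": 342, "name": "Frankie", "weight": -67},
]
for doc in docs:
    print(doc, validate_person(doc))
```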
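Slide 14 lists ways to change a schema under continuous deployment without holding long locks. The first, 'add NULL and backfill', might look like this in SQLite. The table, the name_lower column and the batch size are hypothetical. On a production database the batches would be larger and spread out over time:

```python
import sqlite3

# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [("Alice",), ("Bob",), ("Charles",), ("David",), ("Ellie",)],
)
conn.commit()


def add_column_and_backfill(conn: sqlite3.Connection, batch_size: int = 2) -> None:
    # Step 1: add the column as NULLable with no default. Neither SQLite nor
    # PostgreSQL has to rewrite existing rows for this, so it is quick.
    conn.execute("ALTER TABLE person ADD COLUMN name_lower TEXT")
    conn.commit()
    # Step 2: backfill in small batches, committing after each one so that
    # no single transaction holds locks on the whole table for long.
    while True:
        cur = conn.execute(
            """
            UPDATE person SET name_lower = lower(name)
            WHERE id IN (
                SELECT id FROM person WHERE name_lower IS NULL LIMIT ?
            )
            """,
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
    # Step 3 (not shown): once every row is filled and new writes set the
    # column, a database such as PostgreSQL can add NOT NULL in a later deploy.


add_column_and_backfill(conn)
```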
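Slide 20's sharding point is data that can be split by primary key, so that each query touches only one partition. The minimal routing sketch below assumes integer keys, a hypothetical fixed count of four shards, and plain modulo routing:

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count


def shard_for(primary_key: int, num_shards: int = NUM_SHARDS) -> int:
    """Route a row to a shard using its integer primary key."""
    return primary_key % num_shards


def partition(rows: list[dict], num_shards: int = NUM_SHARDS) -> dict[int, list[dict]]:
    """Group rows by the shard their primary key maps to."""
    shards: dict[int, list[dict]] = defaultdict(list)
    for row in rows:
        shards[shard_for(row["id"], num_shards)].append(row)
    return dict(shards)


rows = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 342, "name": "David"}]
print(partition(rows))
```

Modulo is the simplest scheme, but changing the shard count remaps most keys. Consistent hashing or an explicit table of key ranges are common alternatives when shards will need to be added later.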
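Slide 22 warns that denormalisation is not free. In this sketch, which uses hypothetical author and post tables, a cached post_count makes one read cheap. The cost is that every write now has to update two tables inside one transaction:

```python
import sqlite3

# Hypothetical schema: author.post_count duplicates what COUNT(*) on post gives.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE author (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        post_count INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title TEXT NOT NULL
    );
    INSERT INTO author (id, name) VALUES (1, 'Alice');
    """
)


def add_post(conn: sqlite3.Connection, author_id: int, title: str) -> None:
    # The cost of denormalising: each write touches two tables, and both
    # changes must share a transaction or the counter drifts out of sync.
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO post (author_id, title) VALUES (?, ?)", (author_id, title))
        conn.execute("UPDATE author SET post_count = post_count + 1 WHERE id = ?", (author_id,))


add_post(conn, 1, "Small data")
add_post(conn, 1, "Schema changes")

# The cheap read the denormalisation buys, next to the normalised COUNT(*).
cached = conn.execute("SELECT post_count FROM author WHERE id = 1").fetchone()[0]
counted = conn.execute("SELECT COUNT(*) FROM post WHERE author_id = 1").fetchone()[0]
print(cached, counted)
```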
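Slide 23 asks whether you can accept inconsistent views. If you can, a time-limited cache is one way to spend that leeway: reads may be up to ttl seconds stale, and the database sees far fewer queries. The sketch below uses a hypothetical TTLCache class. Its clock is injectable so that the behaviour can be shown without waiting:

```python
import time
from typing import Any, Callable


class TTLCache:
    """Serve possibly stale values for up to `ttl` seconds before reloading."""

    def __init__(self, loader: Callable[[str], Any], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader  # fetches the authoritative value, e.g. from the DB
        self._ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, Any]] = {}  # key -> (loaded_at, value)

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # fresh enough: skip the database
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value


database = {"ticket_count": 100}  # stand-in for a real data source
fake_now = [0.0]
cache = TTLCache(lambda key: database[key], ttl=30, clock=lambda: fake_now[0])

print(cache.get("ticket_count"))  # 100, loaded from the "database"
database["ticket_count"] = 99
print(cache.get("ticket_count"))  # still 100: stale, but within the leeway
fake_now[0] = 31.0
print(cache.get("ticket_count"))  # 99: the entry expired and was reloaded
```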
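For the write-heavy column of slide 27, an append format means that writes only ever go to the end of a file and nothing is rewritten in place. JSON Lines (one JSON document per line) is one simple such format and is used here as an assumption. The file name and event fields are hypothetical. Reads have to scan the whole file, which is the trade made against an indexed table:

```python
import json
import os
import tempfile
from typing import Any, Iterator


def append_event(path: str, event: dict[str, Any]) -> None:
    """Append one JSON document per line; existing data is never rewritten."""
    # Mode "a" opens the file for appending, so every write goes to the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")


def read_events(path: str) -> Iterator[dict[str, Any]]:
    """Read events back in the order they were written (a full scan)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


log_path = os.path.join(tempfile.mkdtemp(), "events.jsonl")
append_event(log_path, {"type": "login", "user": 1})
append_event(log_path, {"type": "logout", "user": 1})
print(list(read_events(log_path)))
```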
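Slide 29 answers extreme read load with heavy replication: writes go to one primary and reads are spread across read-only replicas. Django's database routers (db_for_read and db_for_write) are one real place where this logic can live. The class below is a framework-free sketch with hypothetical database names:

```python
import itertools
from typing import Sequence


class ReplicaRouter:
    """Send writes to the primary and spread reads across read-only replicas."""

    def __init__(self, primary: str, replicas: Sequence[str]) -> None:
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replicas or [primary])

    def for_write(self) -> str:
        return self.primary

    def for_read(self) -> str:
        # Round-robin. Replicas can lag the primary, so these reads may be stale.
        return next(self._replicas)


router = ReplicaRouter("primary", ["replica-1", "replica-2"])
print(router.for_write())
print([router.for_read() for _ in range(3)])
```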
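Slide 30 calls for true append-only storage when auditability is extreme, which usually means storage that physically cannot be rewritten. A hash chain is a different, software-level technique that approximates part of that guarantee. Each entry's hash covers the previous entry's hash, so editing or removing history becomes detectable, though not impossible. The sketch below uses hashlib from the standard library, and the function names are hypothetical:

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # stand-in "previous hash" for the first entry


def _digest(prev_hash: str, record: dict[str, Any]) -> str:
    # sort_keys gives a stable serialisation, so equal records hash equally.
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(chain: list[dict[str, Any]], record: dict[str, Any]) -> None:
    """Append a record whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "hash": _digest(prev_hash, record)})


def verify(chain: list[dict[str, Any]]) -> bool:
    """Recompute every hash; an edited or removed earlier entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["hash"] != _digest(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True


log: list[dict[str, Any]] = []
append_entry(log, {"event": "refund", "amount": 20})
append_entry(log, {"event": "refund", "amount": 35})
print(verify(log))
log[0]["record"]["amount"] = 2  # tamper with history
print(verify(log))
```

verify alone does not catch the newest entries being truncated. Keeping a copy of the latest hash somewhere separate closes that gap.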