How shit works: Storage

A talk given at GeeCON 2016 in Kraków, Poland.

The beautiful thing about software engineering is that it gives you the warm and fuzzy illusion of total understanding: I control this machine because I know how it operates. This is the result of layers upon layers of successful abstractions, which hide immense sophistication and complexity. As with any abstraction, though, these sometimes leak, and that's when a good grounding in what's under the hood pays off.

This talk, the first in what will hopefully be a series, covers the fundamentals of storage, providing an overview of the three storage tiers commonly found on modern platforms (hard drives, RAM and CPU cache). You'll come away knowing a little bit about a lot of different moving parts under the hood; after all, isn't understanding how the machine operates what this is all about?

Tomer Gabel

May 11, 2016

Transcript

  1. How Shit Works: Storage
      Tomer Gabel, Wix @ GeeCON Kraków 2016
  2–3. Like all good stories…
      • We’ll start with a question.
      • “What’s wrong with this picture?”
  4. MY, OH, MY. WHAT COULD IT BE?

  5. Axioms
      • Not a trick question
        – Servers are properly configured
        – System architecture makes sense
        – No obvious bugs
        – No scheduled jobs
      • So what else goes bump in the night?
  6. PROLOGUE “A LAUGHABLE CLAIM”

  7. I/O is simple
      • Just open a file, write, flush, close
      • Nothing to it, right?
      (diagram: Application → File → HDD)
  8. I/O is simple
      • A little closer…
      (diagram: Application → File → Kernel [Virtual File System → file system (ext4) → Logical Volume Manager → I/O scheduler → SCSI driver stack] → HDD)
  9. I/O is simple
      • But really…
      (diagram layers: Application, File, Kernel, Drivers, System Bus, PCI Express Bus, SATA Controller, Hardware Storage Subsystem, HDD)
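To make the onion a little more concrete, here is a minimal Python sketch of "write, flush, close" with one extra step. Assuming a POSIX-like OS, `flush()` only moves bytes from Python's user-space buffer into the kernel's page cache; `os.fsync()` is what asks the kernel to push them down through the file system and block layers to the device. The helper name `durable_write` is hypothetical, and even after `fsync` the drive's own write cache may still be holding the data.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data and ask the OS to persist it (hypothetical helper)."""
    with open(path, "wb") as f:
        f.write(data)         # lands in Python's user-space buffer
        f.flush()             # hands the bytes to the kernel (page cache)
        os.fsync(f.fileno())  # asks the kernel to push them down to the device
    # leaving the "with" block closes the file descriptor


# Usage: write into a throwaway directory, then read the data back.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "example.bin")
    durable_write(target, b"hello, storage")
    with open(target, "rb") as f:
        print(f.read())
```

For a newly created file, the Linux fsync(2) man page notes that the directory entry is not guaranteed to be on disk until the containing directory is fsynced as well: one more layer of the onion showing through.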
  10. THE ONION OF ABSTRACTION

  11. ACT I: THESE BOOTS ARE MADE FOR WALKIN’

  12. Everybody knows…
      • Sequential access is fast
      • Random access is slow
      • … so what?
  13–14. Everybody knows…
      “Disk seeks are a huge performance bottleneck… When the amount of data starts to grow so large that effective caching becomes impossible… you need at least one disk seek to read and a couple of disk seeks to write things.” -- MySQL Reference Manual (8.12.3)
  15. But why?

  16–19. Rotational Latency

  20. Throughput
      • So you understand latency…
      • What about throughput?
      • Depends on two factors:
        – Areal density
        – Newtonian physics
  21. Areal Density

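  22. Interlude: Math
      • Rotational speed is fixed
        – Constant angular velocity (CAV)
      • Newton tells us that $v = \omega \cdot r$
      • Throughput increases with radius!
        – Outer tracks move past the head faster, so at similar areal density they deliver more bits per second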
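The $v = \omega \cdot r$ argument can be checked numerically. A minimal sketch, assuming a 7200 RPM drive and hypothetical inner and outer track radii of 20 mm and 46 mm (illustrative numbers, not from the talk): under constant angular velocity the outer track moves past the head about 2.3 times faster, so at the same linear bit density it delivers about 2.3 times the bits per second.

```python
import math


def linear_velocity_m_s(rpm: float, radius_m: float) -> float:
    """Linear velocity of a point on a platter at constant angular velocity: v = ω · r."""
    omega = rpm / 60 * 2 * math.pi  # revolutions per second -> radians per second
    return omega * radius_m


# Hypothetical track radii for a 3.5" platter; illustrative only.
inner = linear_velocity_m_s(7200, 0.020)
outer = linear_velocity_m_s(7200, 0.046)
print(f"inner {inner:.1f} m/s, outer {outer:.1f} m/s, outer/inner = {outer / inner:.2f}")
```

The ratio depends only on the radii, not on the spindle speed, which is why the effect shows up on every CAV drive.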
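  23. Interlude: Math
      • Commodity drives are available at:
        – 5400-15000 RPM
        – Usually 7200 RPM
      • What does it mean for latency?
        $\frac{7200}{60} = 120$ revolutions/second
        $\frac{1}{120}\,\mathrm{s} \approx 0.00833\,\mathrm{s} \approx 8.33\,\mathrm{ms}$ per revolution!
      • That is the worst case; on average the head waits half a revolution, ≈ 4.17 ms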
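The slide's arithmetic generalizes to any spindle speed. One full revolution takes $60000 / \mathrm{RPM}$ milliseconds, which is the worst-case rotational latency; on average the target sector is half a revolution away. A minimal Python sketch (the function names are illustrative):

```python
def full_revolution_ms(rpm: float) -> float:
    """Time for one platter revolution in ms: the worst-case rotational latency."""
    return 60_000 / rpm  # 60,000 ms per minute / revolutions per minute


def average_rotational_latency_ms(rpm: float) -> float:
    """On average the target sector is half a revolution away from the head."""
    return full_revolution_ms(rpm) / 2


for rpm in (5400, 7200, 10_000, 15_000):
    print(f"{rpm:>6} RPM: worst {full_revolution_ms(rpm):5.2f} ms, "
          f"average {average_rotational_latency_ms(rpm):5.2f} ms")
```

Note that this is rotational latency only; a random access also pays the seek time needed to move the head to the right track.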
  24. In practice?
      • Modern drives give you:
        – 200+ MB/s
        – 300 IOPS
      • Pure random access nets only 1.2 MB/s! (300 IOPS × 4 KB, assuming 4 KB requests)
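Where does 1.2MB/s come from? If every request lands somewhere random, the drive can only complete as many requests per second as its IOPS figure allows, so throughput is IOPS times request size. A minimal sketch, assuming 4 KB requests (the slide doesn't state the size) and decimal units; the function names are hypothetical:

```python
def random_throughput_mb_s(iops: float, request_kb: float) -> float:
    """Throughput (MB/s) when every request is random, so IOPS is the bottleneck."""
    return iops * request_kb / 1000  # KB/s -> MB/s, decimal units


def request_size_to_match_kb(iops: float, target_mb_s: float) -> float:
    """Request size (KB) at which `iops` random requests per second reach `target_mb_s`."""
    return target_mb_s * 1000 / iops


print(random_throughput_mb_s(300, 4))      # 300 IOPS x 4 KB per request
print(request_size_to_match_kb(300, 200))  # KB per request needed to hit 200 MB/s
```

Flipped around, the same model says each random request would have to be roughly 670 KB for 300 IOPS to match 200 MB/s, which is one way to see why sequential access wins so decisively.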
  25. RIGHT. WHAT CAN WE DO ABOUT IT?

  26. Fine-tuning
      • Provision more RAM
      • Careful index structure
        – Represent IPs as UNSIGNED INT for 75% reduction
        – Implement better UUIDs¹ for 30% reduction
      ¹ Store UUID in an optimized way, Percona blog
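The "IPs as UNSIGNED INT" trick works because an IPv4 address is just a 32-bit number written in dotted-quad notation. A minimal Python sketch using the standard `ipaddress` module (the helper names are hypothetical); in MySQL itself, `INET_ATON()` and `INET_NTOA()` perform the same conversion:

```python
import ipaddress


def ip_to_uint(ip: str) -> int:
    """Pack a dotted-quad IPv4 address into a 32-bit unsigned integer."""
    return int(ipaddress.IPv4Address(ip))


def uint_to_ip(value: int) -> str:
    """Unpack a 32-bit unsigned integer back into dotted-quad notation."""
    return str(ipaddress.IPv4Address(value))


n = ip_to_uint("192.168.0.1")
print(n, uint_to_ip(n))
# As text an address takes up to len("255.255.255.255") == 15 characters;
# as an UNSIGNED INT it always takes 4 bytes.
```

Smaller keys mean more index entries per page, so more of the index fits in RAM and fewer lookups have to go to disk.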
  27. … or use a sledgehammer!
      • RAID 0 (and variants) employ striping
      • Data is distributed to multiple spindles
      • If it sounds familiar…
        – It is!
        – We call it “sharding”
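Striping is essentially a deterministic placement function, much like a sharding key. A minimal sketch of RAID 0 address mapping, assuming round-robin placement of fixed-size stripe units; `raid0_locate` and its parameters are hypothetical names, and real controllers and software RAID may lay data out differently:

```python
from typing import Tuple


def raid0_locate(lba: int, disks: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a logical block address to (disk index, block offset on that disk).

    Data is cut into stripe units of `stripe_blocks` blocks, dealt round-robin across disks.
    """
    stripe_unit = lba // stripe_blocks  # which stripe unit the block falls in
    disk = stripe_unit % disks          # round-robin across spindles
    row = stripe_unit // disks          # full rows of stripe units before this one
    offset = row * stripe_blocks + lba % stripe_blocks
    return disk, offset


for lba in (0, 1, 2, 3, 8):
    print(lba, raid0_locate(lba, disks=4, stripe_blocks=2))
```

With 4 disks and 2-block stripe units, blocks 0-1 land on disk 0, blocks 2-3 on disk 1, and block 8 wraps back to disk 0 at offset 2, so a large sequential read keeps every spindle busy at once.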
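  28. It’s turtles all the way down
      • Don’t jump to conclusions!
        – RAID 0 is impractical (no redundancy: losing any one disk loses the whole array)
        – RAID 5 may be slow (parity must be updated on every write)
        – RAID 10 is expensive (half the raw capacity goes to mirroring)
        – etc.
      • Do your homework
      • Benchmark!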
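One common reason RAID 5 "may be slow" is parity. Parity is the XOR of the data blocks in a stripe, which lets the array rebuild any single lost block, but a small write must read the old data and old parity and then write back both new values: four I/Os for one logical write. A minimal sketch in Python; for simplicity it keeps parity in one fixed place (strictly speaking that layout is RAID 4, while RAID 5 rotates parity across disks):

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three data blocks in one stripe, plus their parity block.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# The disk holding data[1] dies: XOR of the survivors and the parity rebuilds it.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Small write to data[1]: read old data and old parity (2 reads), then write
# new data and new parity (2 writes). new_parity = old_parity ^ old_data ^ new_data.
new_block = b"\xaa\xbb"
new_parity = xor_blocks([parity, data[1], new_block])
data[1] = new_block
assert new_parity == xor_blocks(data)
print("rebuild and read-modify-write parity both check out")
```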
  29. ACT II: I’LL USE MY CREDIT CARD

  30. Let’s talk SSDs
      • Non-volatile memory (flash)
      • Lots of IOPS
      • Expensive :-)
      • Same caveats apply…
  31. Let’s talk SSDs
      • Value starts at “1”
      • Programming: electrons accrue in the floating gate
        – Value becomes “0”
      • Erasing: electrons are drained to reset the value to “1”
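The floating-gate behaviour can be captured in a toy model: erased cells read "1", programming can only turn 1s into 0s, and only an erase brings them back. A minimal sketch, assuming a simplified cell group that tolerates repeated programming (real NAND restricts reprogramming and erases whole blocks, not individual cells); `FlashCells` is a hypothetical name:

```python
class FlashCells:
    """Toy model of a group of NAND flash cells (one bit per cell)."""

    def __init__(self, bits: int = 8) -> None:
        self.mask = (1 << bits) - 1
        self.value = self.mask  # freshly erased: every cell reads "1"

    def program(self, data: int) -> None:
        # Injecting charge can only turn a 1 into a 0, never a 0 into a 1,
        # so the stored value becomes the bitwise AND of the old value and the new data.
        self.value &= data & self.mask

    def erase(self) -> None:
        # Draining the floating gates resets every cell to "1" (and wears the cells a little).
        self.value = self.mask


cells = FlashCells()
cells.program(0b1010_1010)
print(f"{cells.value:08b}")  # 10101010
cells.program(0b1111_0000)   # trying to set bits back to 1 does not work...
print(f"{cells.value:08b}")  # 10100000
cells.erase()                # ...only an erase does
print(f"{cells.value:08b}")  # 11111111
```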
  32. Surprise and Terror
      • “Draining” is destructive!
      • Limited erases
      • Limited lifespan!
  33. Wear Leveling
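Wear leveling spreads those limited erases across the whole device instead of burning out whichever block happens to be rewritten most. A minimal sketch of dynamic wear leveling, assuming the controller simply hands out the free block with the lowest erase count; `WearLeveler` is a hypothetical class, and real SSD firmware also does static wear leveling (moving cold data), which this toy ignores:

```python
import heapq
from typing import List, Tuple


class WearLeveler:
    """Toy dynamic wear leveling: always hand out the free block with the fewest erases."""

    def __init__(self, num_blocks: int) -> None:
        # Min-heap of (erase_count, block_id) for blocks that are free to be written.
        self.free: List[Tuple[int, int]] = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self.free)
        self.erase_counts = [0] * num_blocks

    def allocate(self) -> int:
        """Pick the least-worn free block."""
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block: int) -> None:
        """Erase a block whose data went stale and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


wl = WearLeveler(4)
for _ in range(100):
    block = wl.allocate()  # write new data into the least-worn free block
    wl.release(block)      # later its data goes stale and the block is erased
print(wl.erase_counts)     # [25, 25, 25, 25]
```

An allocator that always reused block 0 would instead pile all 100 erases onto that one block, wearing it out long before the rest.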

  34–37. Caveats, remember?
      • Addressing
        – Cells (1 bit): not addressable
        – Pages (0.5-8KB)
        – Blocks (32-64 pages)
      • Why do you care?
        – Reads/writes on a page
        – But erasure on a block
  38. Write Amplification
      (diagram: a block of cells reading “1”; changing a single bit, Δ = 1 bit, costs a whole-block erase and rewrite, Δ = 1 block!)
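The "Δ = 1 bit becomes Δ = 1 block" effect can be quantified as write amplification: bytes physically written divided by bytes the host asked to write. A minimal sketch of the worst case, assuming a naive drive that updates data in place (read the block, erase it, re-program every page) and a geometry of 64 pages of 4 KB chosen from the slides' ranges; real controllers remap writes to fresh pages and defer erasure to garbage collection precisely to avoid this:

```python
def in_place_write_amplification(pages_per_block: int, page_size: int, host_bytes: int) -> float:
    """Write amplification if flash were updated in place, with no remapping.

    Changing even a single byte within a block means reading the whole erase block,
    erasing it, and re-programming every page, so the whole block is physically written.
    """
    if host_bytes <= 0:
        raise ValueError("host_bytes must be positive")
    physical_bytes = pages_per_block * page_size
    return physical_bytes / host_bytes


print(in_place_write_amplification(64, 4096, 4096))  # rewrite one page: 64.0
print(in_place_write_amplification(64, 4096, 1))     # change one byte: 262144.0
```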
  39. Surprising Results
      • Defragmentation
        – Relocates blocks
        – Contiguous files
        – Lower LBAs
        – Background job
      • Bad, bad, bad!
        – No benefit with SSDs (there are no seeks to save)
        – Major write load!
  40. Background GC
      (diagram: pages 7, 5, 6, 1, 2 scattered across Blocks A–D are consolidated as 1, 2, 5, 6, 7)
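Background garbage collection is what makes remapping work: stale pages pile up, and the controller must relocate the still-valid pages out of a block before it can erase it. A minimal sketch of a greedy policy (reclaim the block with the fewest valid pages), assuming a tiny hypothetical geometry of 4 pages per block and one erased block kept in reserve; the page numbers loosely echo the diagram, and the classes and names are illustrative:

```python
from typing import List, Optional

PAGES_PER_BLOCK = 4  # tiny, purely illustrative geometry


class Block:
    """An erase block: logical page numbers, with None marking a stale page."""

    def __init__(self, pages: Optional[List[Optional[int]]] = None) -> None:
        self.pages: List[Optional[int]] = list(pages or [])
        self.erases = 0

    def valid(self) -> List[int]:
        return [p for p in self.pages if p is not None]


def greedy_gc(blocks: List[Block]) -> int:
    """Run one round of greedy GC and return how many pages had to be copied."""
    full = [blk for blk in blocks if len(blk.pages) == PAGES_PER_BLOCK]
    spare = next(blk for blk in blocks if not blk.pages)    # assumes an erased reserve block exists
    victim = min(full, key=lambda blk: len(blk.valid()))   # fewest valid pages = cheapest to reclaim
    survivors = victim.valid()
    spare.pages.extend(survivors)  # relocating valid pages costs extra physical writes
    victim.pages = []              # erase resets the whole block at once
    victim.erases += 1
    return len(survivors)


block_a = Block([7, None, None, 5])
block_b = Block([None, 6, 1, 2])
block_c = Block([3, 4, 8, 9])
block_d = Block()  # erased reserve block
copied = greedy_gc([block_a, block_b, block_c, block_d])
print(copied, block_d.pages, block_a.pages)  # 2 [7, 5] []
```

The pages GC copies are extra writes the host never asked for, which is where most real-world write amplification comes from.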
  41. Surprising Results
      • What happens when you delete a file?
        – Not much
        – Bit flip on file table
        – Space is not reclaimed: the drive still treats those pages as live
      • The fix?
        – SATA TRIM command
      (diagram: Blocks A–D holding pages 7, 5, 6, 1, 2)
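Why TRIM matters: deleting a file only updates the file system's own table, so the drive keeps treating the freed pages as live and garbage collection keeps copying them. A minimal sketch with hypothetical `ToySSD` and `ToyFileSystem` classes, assuming the drive tracks liveness per logical page; on Linux, TRIM is typically issued through the `discard` mount option or the `fstrim` utility:

```python
from typing import Dict, Set


class ToySSD:
    """Toy drive: knows which logical pages hold data, but nothing about files."""

    def __init__(self) -> None:
        self.live_pages: Set[int] = set()

    def write(self, page: int) -> None:
        self.live_pages.add(page)

    def trim(self, page: int) -> None:
        # TRIM: the OS tells the drive this page no longer holds useful data.
        self.live_pages.discard(page)

    def gc_copy_cost(self, block_pages: Set[int]) -> int:
        """Pages GC would have to relocate before erasing a block holding `block_pages`."""
        return len(block_pages & self.live_pages)


class ToyFileSystem:
    def __init__(self, ssd: ToySSD, use_trim: bool) -> None:
        self.ssd = ssd
        self.use_trim = use_trim
        self.files: Dict[str, Set[int]] = {}

    def create(self, name: str, pages: Set[int]) -> None:
        self.files[name] = set(pages)
        for page in pages:
            self.ssd.write(page)

    def delete(self, name: str) -> None:
        pages = self.files.pop(name)  # deleting is just a file-table update...
        if self.use_trim:
            for page in pages:        # ...unless the file system also sends TRIM
                self.ssd.trim(page)


def copy_cost_after_delete(use_trim: bool) -> int:
    ssd = ToySSD()
    fs = ToyFileSystem(ssd, use_trim)
    fs.create("old.log", {0, 1, 2, 3})
    fs.delete("old.log")
    return ssd.gc_copy_cost({0, 1, 2, 3})


print("without TRIM:", copy_cost_after_delete(False))  # 4
print("with TRIM:   ", copy_cost_after_delete(True))   # 0
```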
  42. SSD Takeaways
      • A moving target
        – File systems
        – Data structures
        – Longevity
      • As usual:
        – Benchmark
        – Monitor
  43. EPILOGUE
      “LET ME EMBRACE THEE, SOUR ADVERSITY, FOR WISE MEN SAY IT IS THE WISEST COURSE.”
  44. WE’RE DONE HERE! … AND YES, WE’RE HIRING :-)
      Thank you for listening
      tomer@tomergabel.com
      @tomerg
      http://il.linkedin.com/in/tomergabel
      Wix Engineering blog: http://engineering.wix.com