
How shit works: Storage

A talk given at GeeCON 2016 in Kraków, Poland.

The beautiful thing about software engineering is that it gives you the warm and fuzzy illusion of total understanding: I control this machine because I know how it operates. This is the result of layers upon layers of successful abstractions, which hide immense sophistication and complexity. As with any abstraction, though, these sometimes leak, and that's when a good grounding in what's under the hood pays off.

This first in what will hopefully be a series of talks covers the fundamentals of storage, providing an overview of the three storage tiers commonly found on modern platforms (hard drives, RAM and CPU cache). You'll come away knowing a little bit about a lot of different moving parts under the hood; after all, isn't understanding how the machine operates what this is all about?

Tomer Gabel

May 11, 2016

Transcript

  1. Like all good stories… • We’ll start with a question. • “What’s wrong with this picture?”
  3. Axioms • Not a trick question – Servers are properly configured – System architecture makes sense – No obvious bugs – No scheduled jobs • So what else goes bump in the night?
  4. I/O is simple • Just open a file, write, flush, close • Nothing to it, right? [Diagram labels: HDD, Application, File]
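The "open, write, flush, close" sequence from slide 4 can be sketched in Python as below. This is a minimal illustration, not code from the talk: the function names and the temporary file are hypothetical, and the `os.fsync` call is an addition that the slide does not mention. It is included because `flush` only hands data to the kernel, which hints at the layers the following slides uncover.

```python
import os
import tempfile


def write_durably(path: str, payload: bytes) -> int:
    """Open, write, flush, sync and close a file; return the bytes written."""
    with open(path, "wb") as f:
        written = f.write(payload)  # data lands in Python's userspace buffer
        f.flush()                   # buffer is handed to the kernel (page cache)
        os.fsync(f.fileno())        # ask the kernel to push it to the device
    return written                  # leaving the "with" block closes the file


def round_trip_demo():
    """Write a small payload to a temporary file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.bin")  # hypothetical file name
        written = write_durably(path, b"hello, storage")
        with open(path, "rb") as f:
            data = f.read()
    return written, data
```

Even after `os.fsync` returns, whether the bytes have reached the physical medium still depends on the drive and its own cache, which is outside what this sketch can show.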
  5. I/O is simple • A little closer… [Diagram labels: HDD, Application, File, Kernel, File system (ext4), Virtual File System, Logical Volume Manager, I/O scheduler, SCSI driver stack]
  6. I/O is simple • But really… [Diagram labels: HDD, Application, File, Kernel, Hardware, Storage Subsystem, System Bus Drivers, PCI Express Bus, SATA Controller]
  7. Everybody knows… “Disk seeks are a huge performance bottleneck… When the amount of data starts to grow so large that effective caching becomes impossible… you need at least one disk seek to read and a couple of disk seeks to write things.” -- MySQL Reference Manual (8.12.3)
  9. Throughput • So you understand latency… • What about throughput? • Depends on two factors: – Areal density – Newtonian physics
  10. Interlude: Math • Rotation is fixed – Constant angular velocity (CAV) • Newton tells us that… v = ω ∙ r • Throughput increases with radius!
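The relation v = ω ∙ r from slide 10 can be worked through numerically. The sketch below is an illustration under stated assumptions: the inner and outer track radii are made-up round numbers, not measurements of any particular drive, and the link to throughput assumes that bits are packed at roughly the same linear density on every track.

```python
import math

# Hypothetical track radii for illustration only (metres).
INNER_RADIUS_M = 0.020
OUTER_RADIUS_M = 0.045


def linear_velocity_m_s(rpm: float, radius_m: float) -> float:
    """Linear speed of the platter surface under the head: v = omega * r."""
    omega = rpm / 60.0 * 2.0 * math.pi  # angular velocity in radians/second
    return omega * radius_m


def outer_to_inner_ratio(rpm: float) -> float:
    """How much faster the surface moves at the outer track than the inner."""
    return (linear_velocity_m_s(rpm, OUTER_RADIUS_M)
            / linear_velocity_m_s(rpm, INNER_RADIUS_M))
```

Because ω is the same everywhere on the platter, the ratio depends only on the two radii. With the assumed radii the outer track passes under the head 2.25 times faster than the inner one, regardless of RPM.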
  11. Interlude: Math • Commodity drives are available at: – 5400-15000 RPM – Usually 7200 RPM • What does it mean for latency? 7200 / 60 = 120 revolutions/second, so one revolution takes 1 / 120 s ≈ 0.00833 s ≈ 8.33 ms!
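The arithmetic on slide 11 generalizes to any spindle speed. The helper names below are hypothetical. The average figure is an addition to the slide: it assumes the target sector is equally likely to be anywhere on the track, so the expected wait is half a revolution.

```python
def rotation_time_ms(rpm: float) -> float:
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm: float) -> float:
    """Expected wait for a sector at a uniformly random angle: half a turn."""
    return rotation_time_ms(rpm) / 2.0


def latency_table(speeds=(5400, 7200, 15000)):
    """Map each spindle speed to (full rotation, average latency) in ms."""
    return {rpm: (rotation_time_ms(rpm), average_rotational_latency_ms(rpm))
            for rpm in speeds}
```

For a 7200 RPM drive this reproduces the 8.33 ms per revolution quoted on the slide. Seek time, which the sketch ignores, comes on top of that.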
  12. In practice? • Modern drives give you: 200+ MB/s, 300 IOPS • Pure random access nets only 1.2MB/s!
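The gap between the sequential and random figures on slide 12 follows from multiplying IOPS by the size of each request. The slide does not state a request size. The 4 KiB used below is an assumption, chosen because it lands close to the quoted 1.2MB/s.

```python
ASSUMED_REQUEST_BYTES = 4096  # assumption: one 4 KiB request per operation


def random_throughput_mb_s(iops: float,
                           request_bytes: int = ASSUMED_REQUEST_BYTES) -> float:
    """Throughput when every I/O operation transfers one small request."""
    return iops * request_bytes / 1_000_000


def sequential_advantage(sequential_mb_s: float, iops: float) -> float:
    """How many times faster sequential access is than pure random access."""
    return sequential_mb_s / random_throughput_mb_s(iops)
```

With the slide's numbers, `random_throughput_mb_s(300)` comes out just above 1.2 MB/s, so the same drive is more than a hundred times slower when every request requires repositioning the head.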
  13. Fine-tuning • Provision more RAM • Careful index structure – Represent IPs as UNSIGNED INT for 75% reduction – Implement better UUIDs¹ for 30% reduction ¹ Store UUID in an optimized way, Percona blog
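The IP address suggestion on slide 13 amounts to storing the 32-bit value of an IPv4 address instead of its dotted text form. The sketch below uses Python's standard `ipaddress` module to show the conversion an application might perform before writing to such a column. The function names are hypothetical, and the size comparison assumes a worst-case 15-character string plus a one-byte length prefix, which is one plausible reading of the slide's 75% figure.

```python
import ipaddress


def ip_to_uint32(dotted: str) -> int:
    """Convert a dotted-quad IPv4 string to its 32-bit unsigned value."""
    return int(ipaddress.IPv4Address(dotted))


def uint32_to_ip(value: int) -> str:
    """Convert a 32-bit unsigned value back to dotted-quad form."""
    return str(ipaddress.IPv4Address(value))


def size_reduction(text_bytes: int = 16, int_bytes: int = 4) -> float:
    """Fraction of space saved; text_bytes is an assumed worst-case size."""
    return 1.0 - int_bytes / text_bytes
```

A smaller key means more index entries fit in each page and in RAM, which is the point of the slide's "careful index structure" bullet.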
  14. … or use a sledgehammer! • RAID 0 (and variants) employ striping • Data is distributed to multiple spindles • If it sounds familiar… – It is! – We call it “sharding”
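Striping, as described on slide 14, can be sketched as a mapping from a logical block number to a disk and an offset on that disk. This is a simplified round-robin model under assumed parameters (`num_disks`, `stripe_blocks`), not the layout of any specific RAID implementation.

```python
from typing import Tuple


def locate_block(logical_block: int, num_disks: int,
                 stripe_blocks: int = 1) -> Tuple[int, int]:
    """Return (disk index, block offset on that disk) for a logical block.

    Consecutive stripes of `stripe_blocks` blocks are dealt out to the
    disks in round-robin order, as in a simple RAID 0 layout.
    """
    stripe_index, offset_in_stripe = divmod(logical_block, stripe_blocks)
    disk = stripe_index % num_disks   # which spindle holds this stripe
    row = stripe_index // num_disks   # how many stripes precede it on that disk
    return disk, row * stripe_blocks + offset_in_stripe
```

The modulo step is the same idea as picking a shard by key: neighbouring blocks land on different spindles, so a large sequential read can keep all of them busy at once.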
  15. It’s turtles all the way down • Don’t jump to conclusions! – RAID 0 is impractical – RAID 5 may be slow – RAID 10 is expensive – etc. • Do your homework • Benchmark!
  16. Let’s talk SSDs • Non-volatile RAM • Lots of IOPS • Expensive :-) • Same caveats apply…
  17. Let’s talk SSDs • Value starts at “1” • Electrons accrue in the floating gate • After programming, value becomes “0” • Electrons are drained to reset value to “1”
  19. Caveats, remember? • Addressing – Cells (1 bit) – not addressable – Pages (0.5-8KB) – Blocks (32-64 pages) • Why do you care? – Reads/writes on a page – But erasure on a block
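The page/block asymmetry on slide 19 can be made concrete with a toy model. The class below is purely illustrative: its names are hypothetical, and `overwrite_in_place` shows the naive worst case. Real controllers typically write the new data to a fresh page elsewhere and remap it, leaving the cleanup to garbage collection.

```python
PAGES_PER_BLOCK = 32  # low end of the 32-64 page range quoted on the slide


class Block:
    """Toy flash block: pages are programmed one by one, erased all at once."""

    def __init__(self, pages_per_block: int = PAGES_PER_BLOCK):
        self.pages = [None] * pages_per_block  # None marks an erased page
        self.programs = 0                      # page-program operations so far
        self.erases = 0                        # block-erase operations so far

    def program(self, index: int, data: bytes) -> None:
        """Write one page; only legal if that page is currently erased."""
        if self.pages[index] is not None:
            raise ValueError("page must be erased before it is programmed again")
        self.pages[index] = data
        self.programs += 1

    def erase(self) -> None:
        """Erase every page in the block; there is no per-page erase."""
        self.pages = [None] * len(self.pages)
        self.erases += 1

    def overwrite_in_place(self, index: int, data: bytes) -> None:
        """Naive update: copy out, erase the whole block, program it all back."""
        snapshot = list(self.pages)
        snapshot[index] = data
        self.erase()
        for i, page in enumerate(snapshot):
            if page is not None:
                self.program(i, page)
```

Filling the block costs 32 page programs. Changing a single page this way costs one erase and another 32 programs, which is the write amplification the slide warns about.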
  20. Surprising Results • Defragmentation – Relocates blocks – Contiguous files – Lower LBAs – Background job • Bad, bad, bad! – No benefit with SSDs – Major write load!
  21. Background GC [Diagram, two states: pages 7, 5, 6, 1, 2 in Blocks A-D; then pages 1, 2, 5, 6, 7 in Blocks A-D]
  22. Surprising Results • What happens when you delete a file? – Not much – Bit flip on file table – Space is not reclaimed • Result? – SATA TRIM command [Diagram: pages 7, 5, 6, 1, 2 in Blocks A-D]
  23. SSD Takeaways • A moving target – File systems – Data structures – Longevity • As usual: – Benchmark – Monitor
  24. WE’RE DONE HERE! … AND YES, WE’RE HIRING :-) Thank you for listening [email protected] @tomerg http://il.linkedin.com/in/tomergabel Wix Engineering blog: http://engineering.wix.com