Slide 1

Slide 1 text

Optimizing leveldb for Performance and Scale

Slide 2

Slide 2 text

leveldb throughput
[Chart: one panel, with axis ticks 0 to 20000 and 0 to 100000]

Slide 3

Slide 3 text

leveldb throughput
[Charts: two panels, with axis ticks 0 to 20000 and 0 to 100000 on the first, and 0 to 20000 and 0 to 50000 on the second]
tuned as a server
github.com/basho/leveldb

Slide 4

Slide 4 text

key/value lifecycle
Write()
Skip list
Recovery log
Immutable memory
Level-0 .sst (overlapping)
Level-1 .sst (sorted/overlapping)
Level-2 .sst (sorted/overlapping)
Level-3 .sst (sorted)
Level-4 .sst (sorted)
Level-5 .sst (sorted)
Level-6 .sst (sorted)
MANIFEST
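The write path on this slide can be illustrated with a toy model. This is a minimal sketch, not basho/leveldb code: the class `ToyStore`, its `memtable_limit` parameter, and the use of a dict in place of the skip list are all assumptions made for illustration, and the MANIFEST and levels above 0 are left out.

```python
class ToyStore:
    """Toy write path: recovery log, write buffer, immutable memory, level 0."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical limit, counted in entries
        self.log = []        # stands in for the recovery log
        self.memtable = {}   # stands in for the skip list
        self.imm = None      # immutable memory waiting to be flushed
        self.level0 = []     # sorted runs whose key ranges may overlap

    def write(self, key, value):
        self.log.append((key, value))   # 1. record the write in the recovery log
        self.memtable[key] = value      # 2. insert it into the write buffer
        if len(self.memtable) >= self.memtable_limit:
            self._rotate()

    def _rotate(self):
        # A real store would stall here if a previous imm were still being flushed.
        self.imm = self.memtable
        self.memtable = {}
        self.log = []                   # a fresh log accompanies the fresh buffer
        self._flush_imm()

    def _flush_imm(self):
        # 3. write the frozen buffer out as one sorted level-0 run
        self.level0.append(sorted(self.imm.items()))
        self.imm = None


store = ToyStore(memtable_limit=3)
for k in ["d", "a", "c", "b", "a"]:
    store.write(k, k.upper())
```

After the five writes, the first three keys sit in one sorted level-0 run and the last two are still in the write buffer, covered by the current log.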

Slide 5

Slide 5 text

.sst file anatomy
trailer
block index
metadata index
filter table (bloom)
data block
data block
data block
File position 0

Slide 6

Slide 6 text

stalls
imm (immutable memory)
level 0 full

Slide 7

Slide 7 text

compaction
[Diagram: Before Compaction, the key runs C2 F M / B C1 E / A C3 H1 / G H0 L / C0 D J K N are spread across the Overlap Level, the Sorted Level, and Sorted Level+1. After Compaction, the runs are A B C3 E F G H1 L M and C0 D J K N.]
• Write Amplification: the silent performance killer
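The before/after key runs in the diagram are consistent with a merge that keeps only the newest version of each key. The sketch below assumes that the digit after a letter is a version number (higher is newer) and that a bare letter is version 0; the function names `parse` and `compact` are hypothetical, and this is not the actual leveldb merge iterator.

```python
def parse(entry):
    """Split 'C2' into ('C', 2); a bare key such as 'F' is treated as version 0."""
    key = entry.rstrip("0123456789")
    version = int(entry[len(key):] or 0)
    return key, version


def compact(runs):
    """Merge several sorted runs into one, keeping the newest version per key."""
    newest = {}
    for run in runs:
        for entry in run:
            key, version = parse(entry)
            if key not in newest or version > newest[key][0]:
                newest[key] = (version, entry)
    return [newest[key][1] for key in sorted(newest)]


# The four runs that are merged in the slide's "Before Compaction" picture.
runs = [["C2", "F", "M"], ["B", "C1", "E"], ["A", "C3", "H1"], ["G", "H0", "L"]]
merged = compact(runs)
print(merged)  # ['A', 'B', 'C3', 'E', 'F', 'G', 'H1', 'L', 'M']
```

Every surviving entry is written again, including those that were already sorted, which is where write amplification comes from.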

Slide 8

Slide 8 text

stall sources
• Single Database
    • Level 0 full and IMM compactions occur too often
    • Level 0 full and blocked by any higher level compaction
• Multiple Databases
    • IMM / Level 0 full and blocked by any other active compaction
    • IMM / Level 0 full and waiting on queue

Slide 9

Slide 9 text

compaction management
Global
Thread block 1 (of 5)
Tiered Lock 0
Tiered Lock 1
IMM to Level 0 compaction thread
Level 0 to Level 1 compaction thread
Levels 1+ compaction thread
Backpressure: Write Throttle
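One way to read the three dedicated threads is that each kind of compaction gets its own queue, so an IMM flush never waits behind a long higher-level compaction. The sketch below illustrates only that idea. `CompactionPool` and the kind names are hypothetical, the thread blocks, tiered locks, and write throttle are not modeled, and none of this is taken from the basho/leveldb source.

```python
import queue
import threading

COMPACTION_KINDS = ("imm_to_level0", "level0_to_level1", "levels_1_plus")


class CompactionPool:
    """One worker thread and one queue per compaction kind."""

    def __init__(self):
        self.queues = {kind: queue.Queue() for kind in COMPACTION_KINDS}
        self.done = []                      # kinds, in completion order
        self._lock = threading.Lock()
        for kind in COMPACTION_KINDS:
            threading.Thread(target=self._worker, args=(kind,), daemon=True).start()

    def _worker(self, kind):
        q = self.queues[kind]
        while True:
            job = q.get()
            try:
                job()
                with self._lock:
                    self.done.append(kind)
            finally:
                q.task_done()

    def submit(self, kind, job):
        self.queues[kind].put(job)

    def wait(self):
        for q in self.queues.values():
            q.join()


pool = CompactionPool()
release = threading.Event()
flushed = threading.Event()
pool.submit("levels_1_plus", release.wait)  # long-running higher-level compaction
pool.submit("imm_to_level0", flushed.set)   # flush submitted afterwards
flush_finished_first = flushed.wait(timeout=5)  # flush did not wait for the other job
release.set()
pool.wait()
```

With a single shared queue, the flush would have waited until the higher-level job was released; with separate queues it completes on its own.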

Slide 10

Slide 10 text

key/value retrieval
Get()
Skip list
Immutable memory
Use manifest to find files covering key range by level
File in file cache (no: the open file song)
Bloom filter suggests exists
Use index to identify block with key range
Block in read cache (no: see open file song, verse 4)
Sequentially walk block to find key
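The lookup order above can be sketched as a toy read path. `ToyTable` and `get` are hypothetical names: a Python set stands in for the bloom filter (so it has no false positives), dicts stand in for the skip list and immutable memory, and the file cache and block cache are omitted.

```python
import bisect


class ToyTable:
    """Stand-in for one .sst file: sorted pairs split into small blocks."""

    def __init__(self, items, block_size=2):
        items = sorted(items)
        self.smallest, self.largest = items[0][0], items[-1][0]
        self.blocks = [items[i:i + block_size] for i in range(0, len(items), block_size)]
        self.index = [block[-1][0] for block in self.blocks]  # last key of each block
        self.filter = {key for key, _ in items}               # bloom filter stand-in

    def get(self, key):
        if key not in self.filter:                # filter says the key is absent
            return None
        i = bisect.bisect_left(self.index, key)   # index picks the candidate block
        if i == len(self.blocks):
            return None
        for k, v in self.blocks[i]:               # sequentially walk the block
            if k == key:
                return v
        return None


def get(key, memtable, imm, levels):
    for table in (memtable, imm):                 # newest data first
        if table is not None and key in table:
            return table[key]
    for files in levels:                          # then level by level
        for f in files:
            if f.smallest <= key <= f.largest:    # manifest-style key range check
                value = f.get(key)
                if value is not None:
                    return value
    return None


memtable = {"k": "new"}
levels = [
    [ToyTable([("a", 1), ("k", "old"), ("z", 3)])],
    [ToyTable([("b", 2), ("c", 4), ("m", 5)])],
]
print(get("k", memtable, None, levels))  # new
print(get("m", memtable, None, levels))  # 5
print(get("q", memtable, None, levels))  # None
```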

Slide 11

Slide 11 text

the open file song
Open .sst file
Read and validate trailer
Request block index
Chorus:
    Read block to user space
    CRC scan block
    Compression’s checksum block scan
    Decompress block into malloc memory
Request metadata index
Chorus:
Request bloom filter
Chorus:
Request data block
Chorus:
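The chorus (read the block, checksum it, decompress it) can be sketched with stand-in algorithms. This example assumes a block layout of payload, one type byte, and a 4-byte checksum, and it uses `zlib.crc32` and `zlib.compress` from the Python standard library purely as substitutes. The type tags and the function names `write_block` and `read_block` are hypothetical, and the real on-disk format and algorithms may differ.

```python
import struct
import zlib

NO_COMPRESSION, ZLIB_COMPRESSION = 0, 1  # hypothetical type tags for this sketch


def write_block(payload, compress=True):
    """Build payload | type byte | CRC, with the CRC covering payload and type."""
    body = zlib.compress(payload) if compress else payload
    kind = ZLIB_COMPRESSION if compress else NO_COMPRESSION
    tagged = body + bytes([kind])
    return tagged + struct.pack("<I", zlib.crc32(tagged))


def read_block(raw):
    """The chorus: CRC scan, then decompress into a new buffer."""
    tagged = raw[:-4]
    (stored_crc,) = struct.unpack("<I", raw[-4:])
    if zlib.crc32(tagged) != stored_crc:      # first full pass over the block
        raise ValueError("block checksum mismatch")
    body, kind = tagged[:-1], tagged[-1]
    if kind == ZLIB_COMPRESSION:
        return zlib.decompress(body)          # zlib also verifies its own checksum
    if kind == NO_COMPRESSION:
        return body
    raise ValueError("unknown compression type")


raw = write_block(b"key1=value1;key2=value2;" * 4)
block = read_block(raw)

corrupted = bytes([raw[0] ^ 0xFF]) + raw[1:]
try:
    read_block(corrupted)
    detected = False
except ValueError:
    detected = True
```

Each verse of the song pays for these passes over the block again, which is why a miss in the file cache is expensive.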

Slide 12

Slide 12 text

time fillers
• Q&A
• Repair
• Level directories for tiered storage
• Linux and grace of posix_fadvise
• Performance counters
• Independent cache types
• FusionIO / SSD / SATA / AWS