Introduction of System Software for Persistent Memory (Reading Circle 2014/12/18)

An introduction to the PMFS paper published at EuroSys '14.

This presentation was given at our lab's reading circle.

Makoto Shimazu

December 18, 2014

Transcript

  1. Introduction of System Software for Persistent Memory
     Makoto Shimazu @ Reading Circle, 2014/12/18
     Paper authors: S. R. Dulloor¹,³, S. Kumar¹, A. Keshavamurthy², P. Lantz¹, D. Reddy¹, R. Sankaran¹, J. Jackson¹
     ¹Intel Labs, ²Intel Corp, ³Georgia Institute of Technology
     EuroSys 2014
  2. Contributions
     Introduction of pm_wbarrier
     File system architecture optimized for PM
       ▪ lightweight and consistent POSIX file system
       ▪ memory-mapped I/O
       ▪ protection from stray writes
     Performance evaluation with a PM emulator
  3. Outline
     Volatile cache problem
     Architecture
       ▪ Consistency
       ▪ Write protection from stray writes
     Implementation
     Evaluation
     Related Work
     Conclusion
  5. Caching problem in PM
     Explicitly flushing the cache (clflush) works well.
     [Figure labels: load/store to DRAM; read/write to SSD/HDD; load/store to PM; Volatile Area; Cache; Non-volatile Area. HDD/SSD figure from http://storage-system.fujitsu.com/jp/lib-f/tech/beginner/ssd/]
  6. Caching problem in PM
     Explicitly flushing the cache (clflush) works well, but clflush cannot flush data out of the memory controller (MC).
     [Figure labels: load/store to DRAM; read/write to SSD/HDD; load/store to PM; Volatile Area; Cache; MC; Non-volatile Area. HDD/SSD figure from http://storage-system.fujitsu.com/jp/lib-f/tech/beginner/ssd/]
  7. pm_wbarrier
     Feature: enforces the durability of a cacheline
     Steps of usage:
       1. clflush A ▪ flushes the cacheline that contains A
       2. sfence ▪ ensures the completion of the store
       3. pm_wbarrier ▪ ensures the durability of every store to PM
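
The three steps can be pictured with a toy simulation. This is only a sketch under stated assumptions, not real hardware code: `ToyMachine`, `persist`, and the way the memory-controller buffer is modeled are hypothetical, and `pm_wbarrier` (the primitive proposed in the paper) is modeled simply as draining that buffer into PM.

```python
class ToyMachine:
    """Toy model: CPU cache -> memory controller (MC) buffer -> persistent memory."""

    def __init__(self):
        self.cache = {}      # volatile: dirty cachelines
        self.mc_buffer = {}  # volatile: writes accepted by the MC, not yet durable
        self.pm = {}         # durable: survives a crash

    def store(self, addr, value):
        self.cache[addr] = value

    def clflush(self, addr):
        # Push the cacheline out of the cache, toward the memory controller.
        if addr in self.cache:
            self.mc_buffer[addr] = self.cache.pop(addr)

    def sfence(self):
        # In this model the flush above is already complete, so this is a no-op.
        pass

    def pm_wbarrier(self):
        # Drain everything the MC has accepted into persistent memory.
        self.pm.update(self.mc_buffer)
        self.mc_buffer.clear()

    def crash(self):
        # Power loss: all volatile state disappears.
        self.cache.clear()
        self.mc_buffer.clear()


def persist(machine, addr, value, use_barrier=True):
    machine.store(addr, value)
    machine.clflush(addr)      # step 1
    machine.sfence()           # step 2
    if use_barrier:
        machine.pm_wbarrier()  # step 3


# Without pm_wbarrier the data can be stuck in the MC when the crash hits.
m = ToyMachine()
persist(m, "A", 1, use_barrier=False)
m.crash()
lost = "A" not in m.pm

# With all three steps the store is durable before the crash.
m = ToyMachine()
persist(m, "A", 1)
m.crash()
durable = m.pm.get("A") == 1
```

The model deliberately ignores reordering and timing; it only shows why flushing the cache alone is not enough when the memory controller can still hold the write.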
  10. Consistency
      Three existing techniques:
        ▪ Copy on Write (CoW): used for updates to the data area
        ▪ Journaling: used for updates to metadata (inode)
        ▪ Log-structured updates
      One more PM-specific technique:
        ▪ Atomic in-place writes: used for updates to a small portion of data
  11. Copy on Write (Shadow Paging)
      A safe and consistent method to modify data
      Three steps: 1: Copy, 2: Modify, 3: Refer
      Recursive copy!
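
The "recursive copy" remark can be made concrete with a small tree. The sketch below is illustrative only; `Node` and `cow_update` are hypothetical names, and a real file system would copy blocks rather than Python objects.

```python
class Node:
    def __init__(self, data=None, children=None):
        self.data = data
        self.children = dict(children or {})  # shallow copy of the child table


def cow_update(node, path, new_data):
    """Return a new root; nodes along `path` are copied, all others are shared."""
    copy = Node(node.data, node.children)  # 1: Copy
    if not path:
        copy.data = new_data               # 2: Modify (only the copy is touched)
    else:
        head, rest = path[0], path[1:]
        # Recursive copy: the parent must be copied too, so that it can
        # point at the new child without changing the old tree.
        copy.children[head] = cow_update(node.children[head], rest, new_data)
    return copy


old_root = Node("root", {
    "a": Node("dir", {"x": Node("Hello")}),
    "b": Node("other"),
})

# 3: Refer - switching one root reference from old_root to new_root
# publishes the whole update at once.
new_root = cow_update(old_root, ["a", "x"], "World")
```

Changing one leaf copies every node on the path up to the root, which is the write amplification that makes CoW costly for small, scattered updates.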
  12. Journaling
      [Figure: hello.txt holds “Hello World!”; 1: WRITE “RINKO”, 2: WRITE “NOW!!!”; a CRASH! leaves “RINKO NXXXX” instead of “RINKO NOW!!!”; labels: Log, Snapshot]
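
The hello.txt scenario can be replayed in a few lines. This is a sketch that assumes an undo-style log (the old bytes are snapshotted before each in-place write); the slide does not specify the log format, and `JournaledFile` and its `crash_after` parameter are hypothetical.

```python
class JournaledFile:
    def __init__(self, content):
        self.data = bytearray(content)  # file contents, updated in place
        self.log = []                   # undo log: (offset, old bytes)

    def write(self, offset, new_bytes, crash_after=None):
        # Snapshot the old bytes into the log before touching the data.
        old = bytes(self.data[offset:offset + len(new_bytes)])
        self.log.append((offset, old))
        # Optionally simulate a crash after only `crash_after` bytes are written.
        n = len(new_bytes) if crash_after is None else crash_after
        self.data[offset:offset + n] = new_bytes[:n]
        if crash_after is not None:
            raise RuntimeError("CRASH!")

    def commit(self):
        # The transaction is complete, so the snapshots are no longer needed.
        self.log.clear()

    def recover(self):
        # Roll back uncommitted writes, newest first.
        for offset, old in reversed(self.log):
            self.data[offset:offset + len(old)] = old
        self.log.clear()


f = JournaledFile(b"Hello World!")
f.write(0, b"RINKO")
try:
    f.write(6, b"NOW!!!", crash_after=1)  # torn write
except RuntimeError:
    pass
torn = bytes(f.data)      # b"RINKO Norld!"
f.recover()
restored = bytes(f.data)  # b"Hello World!"
```

The cost is visible in `write`: every modified byte is written twice, once to the log and once in place.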
  13. Hybrid method
      Metadata ▪ updated by fine-grained logging
      Data ▪ uses the Copy on Write method

                       Distributed small modification   Centralized large modification
      Copy on Write    ☓ (Write amplification)          ◯ (Freely after copy)
      Journaling       ◯ (Just append logs)             ☓ (Double writes)
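
The trade-off in the table can be checked with rough arithmetic. The sketch below is a simplified cost model, not a measurement: the 4096-byte block size, the two workloads, and the function names are assumptions, and it ignores log headers and the copies of parent blocks that CoW also needs.

```python
BLOCK = 4096  # assumed block size in bytes


def cow_bytes(modified_bytes_per_block):
    # Copy on Write rewrites every touched block in full.
    return BLOCK * len(modified_bytes_per_block)


def journal_bytes(modified_bytes_per_block):
    # Journaling writes each modified byte twice: to the log and in place.
    return 2 * sum(modified_bytes_per_block)


small_scattered = [8] * 100       # 8 bytes changed in each of 100 blocks
large_contiguous = [BLOCK] * 100  # 100 blocks fully rewritten

small = (cow_bytes(small_scattered), journal_bytes(small_scattered))
large = (cow_bytes(large_contiguous), journal_bytes(large_contiguous))
# small == (409600, 1600):   journaling wins (CoW suffers write amplification)
# large == (409600, 819200): CoW wins (journaling pays for double writes)
```

Under this model the hybrid choice follows directly: log the small metadata updates and use CoW for bulk data.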
  14. Extended atomic in-place writes
      8 bytes (the same as BPFS)
        ▪ Update an inode's access time
      16 bytes, using the cmpxchg16b instruction
        ▪ Update an inode's size and modification time
      64 bytes, using RTM (introduced in Haswell, and subject to an erratum)
        ▪ Update a number of inode fields, e.g. on delete
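
One way to read this slide is as a lookup from update size to mechanism. The helper below is a hypothetical illustration: the function name is invented, alignment requirements are ignored, and the fallback to fine-grained logging for larger updates is an assumption based on the hybrid method described earlier.

```python
def pick_atomic_mechanism(size_bytes):
    """Map an update size to the mechanism listed on the slide (illustrative)."""
    if size_bytes <= 0:
        raise ValueError("size must be positive")
    if size_bytes <= 8:
        return "8-byte atomic write"
    if size_bytes <= 16:
        return "cmpxchg16b"
    if size_bytes <= 64:
        return "RTM"
    # Assumption: anything larger falls back to the metadata journal.
    return "fine-grained logging"


choices = {n: pick_atomic_mechanism(n) for n in (8, 16, 64, 128)}
```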
  16. Write Protection
      Supervisor Mode Access Prevention (SMAP)
        ▪ Prohibits supervisor-mode (kernel) writes into the user area
      Write windows (introduced in this paper)
        ▪ PM is mapped as read-only
        ▪ When writing, CR0.WP is set to zero
      [Figure (right): protection rings, from http://en.wikipedia.org/wiki/Protection_ring]
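
The write-window idea can be mimicked in user space. The sketch below is a simulation, not kernel code: `ToyKernel` and its methods are hypothetical, and a boolean stands in for the CR0.WP bit that the real implementation toggles.

```python
from contextlib import contextmanager


class ToyKernel:
    def __init__(self, size):
        self.pm = bytearray(size)  # PM region, mapped read-only
        self.cr0_wp = True         # WP = 1: read-only mappings also bind the kernel

    def kernel_write(self, offset, data):
        if self.cr0_wp:
            raise PermissionError("write to read-only PM mapping")
        self.pm[offset:offset + len(data)] = data

    @contextmanager
    def write_window(self):
        self.cr0_wp = False  # open the window: CR0.WP set to zero
        try:
            yield
        finally:
            self.cr0_wp = True  # always close the window again


k = ToyKernel(16)

# A stray write outside any window is rejected.
try:
    k.kernel_write(0, b"stray")
    blocked = False
except PermissionError:
    blocked = True

# An intentional write opens a short window around the update.
with k.write_window():
    k.kernel_write(0, b"valid")
```

Keeping the window as short as possible limits the time during which a buggy kernel component could corrupt PM.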
  18. Implementation on Linux
      Execute In Place (XIP)
        ▪ An interface for loading data directly from flash in environments with limited RAM
        ▪ Used here to avoid the block device and page cache layers
  19. Testing and Validation
      Yat: a hypervisor-based validation framework
        ▪ Ensures that cache flushing and pm_wbarrier are executed in the correct order
        ▪ The paper was published at USENIX ATC '14
  21. Evaluation
      Environment
        ▪ PM Emulation Platform (PMEP)
        ▪ PM Block Driver (PMBD)
      Results
        ▪ File-based Access
        ▪ Memory-Mapped I/O
        ▪ Write Protection
  22. Evaluation Settings
      PM Emulation Platform (PMEP)
        ▪ Configurable latencies and bandwidth for PM
        ▪ Configurable pm_wbarrier latency
      Environment
        Partitioned memory channels ▪ using custom BIOS?
        Latency emulation ▪ debug hook and HW counter counting LLC stall cycles
        Bandwidth emulation ▪ memory controller

      Element   Value
      CPU       Xeon (2.6 GHz), 8 cores x 2 sockets
      DRAM      16 GB
      PM        256 GB (NUMA disabled?)
  23. PMBD
      Persistent Memory Block Driver (PMBD), presented at MSST '14
        ▪ Introduced for a fair comparison
        ▪ Open-source implementation: https://github.com/linux-pmbd/pmbd
        ▪ Partitions memory between DRAM and PM
        ▪ Uses non-temporal stores
  24. File-based Access
      File I/O (right, 4 graphs)
        ▪ Single thread
        ▪ Single 64GB file
      File utilities (bottom)
        ▪ Run on the Linux kernel tarball
  25. In-place updates/Logging
      Effect of in-place updates, compared with fine-grained logging:
        ▪ Using 16-byte atomic writes: 1.8X faster
        ▪ Using 64-byte atomic writes: 18% faster
      Logging overhead
  26. Mmap
      Random read/write in a single 64GB file
        ▪ The file is large enough not to fit in the page cache
      PMFS-D: default 4 kB pages
      PMFS-L: 1 GB pages
      Gains are thanks to omitting the page cache
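
For readers less familiar with memory-mapped I/O, the sketch below shows the access pattern being measured, at a much smaller scale. It uses Python's standard `mmap` module on an ordinary temporary file (the file name and sizes are arbitrary), so here the mapping still goes through the page cache; the point of PMFS is that the same loads and stores would reach PM directly.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "data.bin")

    # Create a small file of four pages filled with zeros.
    with open(path, "wb") as f:
        f.truncate(4 * PAGE)

    # Map the whole file and access it like memory.
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as m:
        m[PAGE:PAGE + 5] = b"hello"         # write at an arbitrary offset
        in_memory = bytes(m[PAGE:PAGE + 5])  # read it back through the mapping
        m.flush()                            # ask the OS to write dirty pages back

    # The update is visible through the regular file interface as well.
    with open(path, "rb") as f:
        f.seek(PAGE)
        on_disk = f.read(5)
```

No read or write system call is needed per access once the mapping exists, which is why removing the extra copy into the page cache matters for this workload.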
  27. Neo4j (a user application of mmap)
      Dataset
        ▪ 10M nodes/100M edges from the Wikipedia dataset
      Workload
        ▪ Delete: deleting 2000 nodes and their associated edges
        ▪ Insert: adding back the 2000 nodes and the edges
        ▪ Query: selecting two nodes and calculating the shortest path
      Improvements due to no copy overhead
      Improvements due to synchronous write latency
  29. Related Work
      Enhancements for new storage
        ▪ DFS[30], Log-structured File System[37], Conquest FS[41]
      Hybrid of NVM and disk or flash
        ▪ Rio File Cache[24], Conquest FS[41]
      PM-only storage
        ▪ BPFS[27], SCMFS[43]
      High-level APIs on PM
        ▪ Failure-atomic msync[33], NV-Heaps[26], Mnemosyne[40], Library solutions[39]