Joint R&D on Building a MySQL Storage Engine for the Latest Hardware

Shohei Matsuura (Yahoo! JAPAN / Service Platform, System Management Group, Technology Group / Software Engineer)

https://tech-verse.me/ja/sessions/178
https://tech-verse.me/en/sessions/178
https://tech-verse.me/ko/sessions/178

Tech-Verse2022

November 17, 2022

Transcript

  1. Agenda

    - Unleash the Power of Data
    - New Landscape
    - Game has Just Begun!
    - Just go, go! Game Continues!
  2. What constitutes a good data platform?

    • Performance: Process Data Faster
    • Reliable: Never Stops
    • Scalable: Keep up with Growth
  3. Call for action: we need to evolve...

    • More Performance
    • More Reliable
    • More Scalable

    [Figure labels: Collaboration, Workloads, Data Platform. Photo: Aflo]
  4. Emerging Hardware Technology

    Memory
    • Non-volatile Memory
    • CXL Memory Pool

    Network
    • RDMA
    • RoCE

    Storage
    • New Storage Solutions for New Storage Devices
  5. Non-volatile Memory a.k.a. PMEM (Persistent Memory)

    • Byte-addressable memory with data persistence
    • Faster than SSDs, but slower than DRAM
    • Less capacity than SSDs, but more than DRAM

    [Figure: Characteristics of Non-volatile Memory. Axes: Performance (Higher), Capacity (Larger). Items: DRAM, Non-volatile Memory, SSDs]
  6. CXL Memory Pool

    • Compute Express Link
    • Open standards for interconnect between CPU and other devices (Accelerators, NIC, Memory, etc.)
    • Memory pooling from CXL 2.0
    • CXL-enabled CPU & memory devices soon to come

    [Figure: CXL Memory Pool Overview. Host #1, Host #2, ..., Host #n connect through a CXL 2.0 Switch to a CXL Memory Pool holding Mem for Host #1, Mem for Host #2, ..., Mem for Host #n]
  7. RDMA/RoCE

    • Remote Direct Memory Access
    • Access remote memory (target) from a local machine (initiator) with an extremely low latency
    • Widely used in HPC, but also used in DB and NVM replication

    [Figure: RDMA/RoCE Overview. Initiator and Target each have a Program, Mem, and an RDMA-capable NIC; Data moves over the RDMA N/W by Zero-copy Transmission]
  8. New Storage Solutions for New Storage Devices

    • Storage solutions optimized for NVM, NVMe SSDs
    • Software-defined or Hardware
    • e.g.) DAOS (Distributed Asynchronous Object Storage), primarily used for HPC/AI workloads

    [Figure: DAOS Overview. Compute Nodes (Posix I/O, MPI-I/O, HDF5, etc. on the DAOS Library) send I/Os to DAOS Storage Nodes (Storage Engine on Non-volatile Memory/NVMe SSDs)]
  9. Non-volatile-Memory Optimized

    • Leverage the very low latency of the device for faster data processing with MySQL
    • Place heap files & log files on NVM and access them with byte granularity
    • Data is synchronized among nodes with RDMA/RoCE

    [Figure: Architecture Overview. A Load-balancer sends Read + Write requests to the Source and Read-only requests to the Replica. Each node runs MySQL with the Storage Engine doing Data Processing on NVM (Heap, Log). Source to Replica: RDMA-write]
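A rough illustration of what "byte granularity" access to a log file can look like, as opposed to block-sized `read`/`write` calls. This is only a sketch under stated assumptions, not the engine described in the talk: an ordinary temporary file stands in for a file on NVM, `mmap.flush()` stands in for the cache-line flush an NVM-aware engine would issue, and the record layout, `LOG_SIZE`, `open_log`, `append_record`, and `read_record` are hypothetical names chosen for the example.

```python
import mmap
import os
import struct
import tempfile

LOG_SIZE = 4096                 # hypothetical fixed log size
HEADER = struct.Struct("<Q")    # 8-byte tail offset kept at the start of the log


def open_log(path):
    """Create (or reopen) a fixed-size log file and map it into memory."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < LOG_SIZE:
            os.ftruncate(fd, LOG_SIZE)   # new bytes read back as zero
        return mmap.mmap(fd, LOG_SIZE)
    finally:
        os.close(fd)                     # the mapping outlives the descriptor


def append_record(log, payload):
    """Append one length-prefixed record using plain memory stores."""
    (tail,) = HEADER.unpack_from(log, 0)
    tail = tail or HEADER.size           # empty log: first record follows the header
    end = tail + 4 + len(payload)
    if end > LOG_SIZE:
        raise ValueError("log full")
    struct.pack_into("<I", log, tail, len(payload))
    log[tail + 4:end] = payload
    log.flush()                          # persist the record before publishing it
    HEADER.pack_into(log, 0, end)        # publish: advance the tail
    log.flush()
    return tail


def read_record(log, offset):
    """Read back the record that starts at the given byte offset."""
    (length,) = struct.unpack_from("<I", log, offset)
    return bytes(log[offset + 4:offset + 4 + length])


# Usage: two small records are written and read back at byte offsets.
log_path = os.path.join(tempfile.mkdtemp(), "redo.log")
log = open_log(log_path)
first = append_record(log, b"row-1")
second = append_record(log, b"row-22")
records = [read_record(log, first), read_record(log, second)]
log.close()
```

The record is flushed before the tail offset is advanced, so a reader that trusts the tail never sees a half-written record. On real persistent memory the same ordering would typically be enforced with cache-line flushes and fences rather than a whole-mapping flush.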
  10. Why do we need to implement our own storage engine for NVM?

    • To fully benefit from the low latency of NVM for data processing

    Concurrent write operations to NVM @ 10 threads
    • Theoretical Maximum Bandwidth: 1.85GB/sec x 12 = 22.2GB/sec
    • Conventional File Access, Average Bandwidth: 5GB/sec
    • Optimized Memory Access, Average Bandwidth: 15GB/sec
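The figures on this slide can be put side by side with a few lines of arithmetic. The sketch below only restates the slide's numbers; reading the multiplier 12 as a device count is an assumption, since the slide does not say what it counts, and the variable and function names are made up for the example.

```python
PER_UNIT_GBPS = 1.85   # from the slide
UNITS = 12             # the slide's multiplier; what it counts is an assumption

theoretical_gbps = PER_UNIT_GBPS * UNITS   # 22.2 GB/sec, as on the slide
conventional_gbps = 5.0                    # conventional file access (slide)
optimized_gbps = 15.0                      # optimized memory access (slide)


def utilization(measured_gbps, peak_gbps):
    """Fraction of the theoretical peak that a measured bandwidth reaches."""
    return measured_gbps / peak_gbps


conventional_util = utilization(conventional_gbps, theoretical_gbps)  # about 23%
optimized_util = utilization(optimized_gbps, theoretical_gbps)        # about 68%
speedup = optimized_gbps / conventional_gbps                          # 3x

print(f"peak {theoretical_gbps:.1f} GB/sec, "
      f"file access {conventional_util:.1%}, "
      f"memory access {optimized_util:.1%}, "
      f"speedup {speedup:.1f}x")
```

In other words, the optimized memory access path reaches roughly two thirds of the theoretical peak, while conventional file access stays under a quarter of it.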
  11. Collaboration: Backup & Restore Feature

    • Implementing full, incremental, differential online backup & restore feature

    [Figure: Source, Replica #1, ..., Replica #n, each on NVM; full, incremental, and differential backup to and restore from Backup Storage (NAS, Object Storage, etc.)]
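The slide does not say how the three backup types are implemented. As a reminder of how they usually differ, here is a minimal sketch using the common definitions: a full backup copies everything, a differential backup copies what changed since the last full backup, and an incremental backup copies what changed since the last backup of any kind. The page/LSN bookkeeping and the names `Backup` and `pages_to_copy` are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    kind: str   # "full", "incremental" or "differential"
    lsn: int    # log sequence number at which the backup was taken


def pages_to_copy(page_lsns, history, kind):
    """Return the ids of the pages a new backup of the given kind must copy.

    page_lsns maps a page id to the LSN of its last modification;
    history lists the backups taken so far.
    """
    if kind == "full":
        return sorted(page_lsns)
    if not any(b.kind == "full" for b in history):
        raise ValueError("a full backup is required first")
    if kind == "differential":
        # Everything changed since the most recent full backup.
        base = max(b.lsn for b in history if b.kind == "full")
    elif kind == "incremental":
        # Everything changed since the most recent backup of any kind.
        base = max(b.lsn for b in history)
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return sorted(p for p, lsn in page_lsns.items() if lsn > base)


# Usage: three pages, a full backup at LSN 20, an incremental one at LSN 50.
page_lsns = {1: 10, 2: 35, 3: 60}
history = [Backup("full", 20), Backup("incremental", 50)]
differential_pages = pages_to_copy(page_lsns, history, "differential")  # [2, 3]
incremental_pages = pages_to_copy(page_lsns, history, "incremental")    # [3]
```

Restoring from a differential backup needs only the last full backup plus that differential, while restoring from incrementals needs the full backup and every incremental taken after it, which is the usual trade-off between backup size and restore effort.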
  12. Making it Scalable: Integration With a Scalable Storage

    • To scale with the growth of data size, integrate the storage engine with a scalable storage, DAOS
    • Shared-disk model with a disaggregated storage

    [Figure: Integration Overview. Source (R/W) and Replica #1, ..., Replica #m (R) share a DAOS Pool in a DAOS Cluster of Node #1, Node #2, ..., Node #n; operations shown: create, expand pool size]
  13. A Little Bit About DAOS

    • Distributed Asynchronous Object Storage
    • OSS with contributions from major H/W vendors
    • Optimized for NVM & NVMe SSDs
    • Mainly used in HPC/AI workloads

    [Figure: DAOS Architecture Overview. Each DAOS Node (#1, ..., #n) has two CPU Sockets, a DAOS Engine per socket, and a DAOS Server Daemon; each Target pairs NVM with an NVMe SSD. Pools span targets: DAOS Pool (Target #1 + Target #2), DAOS Pool (Target #3 + Target #4 + ... + Target #n)]
  14. Storage Engine & DAOS Integration

                     FUSE         FUSE + R/W Interception                   Native API
    Outline          POSIX I/O    POSIX I/O, bypass kernel in read/write    DAOS API
    Performance      x            ✔                                         ✔✔
    Portability      ✔            ✔                                         x

    [Figure: in each option, the Source and Replica reach a DAOS Pool in the DAOS Cluster; the diagrams label the access path "fuse"]
  16. Future Direction of Joint R&D

    • Further Integration with scalable storage solutions
      #1: Multi-tenancy with Performance Isolation
      #2: Observability & Mgmt. of the Whole System
      #3: Exploration for Other Scalable Storage Options as Well

    [Figure: Cluster #1, Cluster #2, ..., Cluster #n, each with Source, Replica #1, Replica #2, send I/Os to DAOS Pool #1, DAOS Pool #2, ..., DAOS Pool #n in a DAOS Cluster]
  17. Future Direction of Joint R&D

    • CXL-memory Enablement & Memory Pooling

    [Figure: Cluster #1, Cluster #2, ..., Cluster #n, each with Source, Replica #1, Replica #2, attach through a CXL 2.0 Switch to a CXL Memory Pool (Memory for Cluster #1, Memory for Cluster #2, ..., Memory for Cluster #n) with Online Memory Addition; each cluster sends I/Os to DAOS Pool #1, DAOS Pool #2, ..., DAOS Pool #n]
  18. Trademarks

    • MySQL is a registered trademark of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
    • LINE is a trademark or a registered trademark of LINE Corporation.
    • NVMe is a trademark of the NVM Express Organization.