LINBIT - IT Press Tour #47 Dec. 2022

The IT Press Tour

December 07, 2022

Transcript

  1. Leading Open Source OS-based SDS

    COMPANY OVERVIEW • Developer of DRBD and LINSTOR • 100% founder owned • Offices in Europe and US • Team of highly experienced Linux experts • Exclusivity in Japan: SIOS REFERENCES • Leading Open Source block storage (included in the Linux kernel since v2.6.33) • Open Source DRBD supported by proprietary LINBIT products / services • OpenStack with DRBD Cinder driver • Kubernetes driver • Install base of >2 million PRODUCT OVERVIEW / SOLUTIONS • LINBIT SDS: since 2016, perfectly suited for SSD/NVMe high-performance storage • LINBIT HA, LINBIT DR: market-leading solutions since 2001, over 600 customers; ideally suited to power HA and DR in OEM appliances (Cisco, IBM, Oracle)
  2. 4 When is LINBIT SDS a fit? Transaction Processing •

    Oracle DB • PostgreSQL • MariaDB • Message queuing systems Analytic Processing • DB2 Warehouse • and similar read-intensive workloads PersistentVolumes ... for Containers • Kubernetes • Nomad • Docker Virtualization • OpenStack • CloudStack • OpenNebula • XCP-ng • Proxmox
  3. 5 Why is LINBIT SDS so fast? Layout at volume

    allocation • All participating machines hold full replicas; which machines participate is determined when the volume is created • Faster at IO submission time • Saves CPU/memory In-kernel data path • Reduces the number of context switches • Saves CPU/memory resources • Minimal latency for block-IO operations • Optional load balancing for READs Hyper-converged • Very well suited for hyper-converged deployments • Reduced network load for reads • Reduced latency • LINBIT SDS' low resource consumption leaves most of the CPU and memory for the workload; about 0.5% of a single core is consumed by DRBD under heavier IO load (measured with an analytics DB) Built on existing components • DRBD, LVM, ZFS, LUKS, VDO, ... • Helps day-2 operations by leveraging the operations team's prior knowledge • Built on the shoulders of giants
  4. 6 What is LINBIT SDS doing? Storage Allocation • 3

    to 1000s of nodes • Multiple tiers • Multi-tenancy • Complex policies (chassis - rack - room) Network • Multiple NICs per server • Multiple networks • RDMA • TCP Business continuity • Continuous data protection • Multiple sites • Backups (SSD - disk - cloud) Data Replication • Persistence & availability • Sync / async • 2, 3 or more replicas • Consistency groups • Quorum
  5. 8 Linux's LVM

    (diagram: a Volume Group built from physical volumes, containing logical volumes and a snapshot)
  6. 9 Linux's LVM • based on device mapper • original

    objects • PVs, VGs, LVs, snapshots • LVs can be scattered over PVs in multiple segments • thin LVs • thin pools are themselves LVs • thin LVs live in thin pools • multiple snapshots became efficient! (see the shell sketch below)
  7. 10 Linux's LVM

    (diagram: a Volume Group of physical volumes containing a logical volume with a snapshot and a thin pool; two thin LVs and a thin snapshot LV live in the thin pool)
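    A minimal shell sketch of the LVM objects above; the device, VG and LV names are hypothetical and the sizes are arbitrary:

      pvcreate /dev/sdb /dev/sdc                      # physical volumes
      vgcreate vg0 /dev/sdb /dev/sdc                  # volume group
      lvcreate -L 100G -n lv_data vg0                 # classic (thick) logical volume
      lvcreate --type thin-pool -L 200G -n tpool vg0  # thin pool (itself an LV)
      lvcreate -V 500G -T vg0/tpool -n thinvol        # thin LV, may overcommit the pool
      lvcreate -s -n thinvol_snap vg0/thinvol         # efficient thin snapshot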
  8. 11 Linux's RAID • original MD code • mdadm command

    • RAID levels: 0, 1, 4, 5, 6, 10 • now available in LVM as well • device mapper interface for the MD code • do not call it 'dmraid'; that is software for hardware fake-RAID • lvcreate --type raid6 --size 100G VG_name (diagram: RAID1, blocks A1-A4 mirrored on both legs; see the mdadm sketch below)
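    As a minimal sketch of the classic MD path alongside the LVM command shown above; device names are hypothetical:

      mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc   # RAID1 mirror
      cat /proc/mdstat                                                       # check array state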
  9. 12 Linux’s DeDupe • Virtual Data Optimizer (VDO) since RHEL

    7.5 • Red Hat acquired Permabit and is GPLing VDO • Linux upstreaming is in preparation • in-line data deduplication • the kernel part is a device mapper module • the indexing service runs in user space • async or synchronous writeback • recommended to be used below LVM (see the sketch below)
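    A minimal sketch of setting up VDO with the RHEL 7.x vdo management tool and putting LVM on top, as recommended above; device and volume names are hypothetical:

      vdo create --name=vdo0 --device=/dev/sdb --vdoLogicalSize=10T   # deduplicated block device
      pvcreate /dev/mapper/vdo0                                       # LVM on top of VDO
      vgcreate vg_dedup /dev/mapper/vdo0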
  10. 13 SSD cache for HDD • dm-cache • device mapper

    module • accessible via LVM tools (see the sketch below) • bcache • generic Linux block device • slightly ahead in the performance game • dm-writecache • for combining PMEM & NVMe drives
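    A minimal sketch of the dm-cache path via the LVM tools mentioned above; the VG, LV and NVMe device names are hypothetical:

      lvcreate --type cache-pool -L 50G -n cpool vg0 /dev/nvme0n1   # cache pool on the fast device
      lvconvert --type cache --cachepool vg0/cpool vg0/lv_data      # attach it to the slow HDD-backed LV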
  11. 14 Linux's targets & initiators • Open-iSCSI initiator • IETD,

    STGT, SCST • mostly historical • LIO • iSCSI, iSER, SRP, FC, FCoE • SCSI pass-through, block IO, file IO, user-specific IO • NVMe-oF & NVMe/TCP • target & initiator (diagram: the initiator sends IO requests to the target; the target returns data/completions; see the sketch below)
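    A minimal initiator-side sketch for the transports above, using open-iscsi and nvme-cli; the portal addresses and the NQN are hypothetical:

      iscsiadm -m discovery -t sendtargets -p 192.0.2.10                            # discover iSCSI targets
      iscsiadm -m node -p 192.0.2.10 --login                                        # log in to the target
      nvme connect -t tcp -a 192.0.2.10 -s 4420 -n nqn.2022-12.io.example:subsys1   # NVMe/TCP initiator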
  12. 15 ZFS on Linux • Ubuntu ecosystem only • has

    its own • logical volume manager (zvols) • thin provisioning • RAID (RAID-Z) • caching for SSDs (ZIL, SLOG) • and a file system! (see the sketch below)
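    A minimal sketch of the ZFS pieces above: a mirrored pool, a zvol as a block device, and a snapshot; pool, device and volume names are hypothetical:

      zpool create tank mirror /dev/sdb /dev/sdc   # pool with mirroring
      zfs create -V 100G tank/vol1                 # a zvol (block device)
      zfs snapshot tank/vol1@snap1                 # snapshot of the zvol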
  13. 17 DRBD – think of it as ...

    (diagram: DRBD combines the target/initiator picture, with IO requests and data/completions crossing the network, with RAID1 mirroring of blocks A1-A4 on both nodes)
  14. 21 DRBD – up to 32 replicas • each may

    be synchronous or asynchronous (diagram: one Primary and two Secondaries)
  15. 22 DRBD – Diskless nodes • intentional diskless (no change

    tracking bitmap) • disks can fail (diagram: one Primary and two Secondaries)
  16. 23 DRBD - more about • a node knows the

    version of the data it exposes • automatic partial resync after a connection outage • checksum-based verify & resync • split-brain detection & resolution policies • fencing • quorum • multiple resources per node possible (1000s) • dual Primary for live migration of VMs only! (see the drbdadm sketch below)
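    A minimal drbdadm sketch for bringing up a resource, assuming a resource named r0 has already been configured on all nodes (the resource name is hypothetical):

      drbdadm create-md r0         # initialize DRBD metadata (run on every node)
      drbdadm up r0                # attach the disk and connect to the peers
      drbdadm primary --force r0   # initial promotion, on exactly one node
      drbdadm status r0            # show replication and connection state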
  17. 24 DRBD Recent Features & ROADMAP • Recent • meta-data

    on PMEM/NVDIMMs • improved, fine-grained locking for parallel workloads • Eurostars grant: DRBD4Cloud • started DRBD 9.1 • ROADMAP • performance optimizations • replace "stacking" • production release of WinDRBD
  18. 26 LINSTOR - goals • storage built from generic Linux

    nodes • for SDS consumers (K8s, OpenStack, OpenNebula) • building on existing Linux storage components • multiple tenants possible • deployment architectures • distinct storage nodes • hyperconverged with hypervisors / container hosts • LVM, thin LVM or ZFS for volume management (stratis later) • Open Source, GPL
  19. 28 LINSTOR - Hyperconverged

    (diagram: six hyperconverged nodes, each a hypervisor & storage host with local storage devices; one VM running on one of the nodes)
  20. 29 LINSTOR - VM migrated

    (diagram: the same six hyperconverged nodes; the VM has been migrated to another node)
  21. 30 LINSTOR - add local replica

    (diagram: the same six hyperconverged nodes; a local replica is added on the node now running the VM)
  22. 31 LINSTOR - remove 3rd copy

    (diagram: the same six hyperconverged nodes; the now-superfluous third copy is removed)
  23. 33 LINSTOR Controller & Satellites

    (diagram: a LINSTOR Controller manages multiple LINSTOR Satellites; they communicate via the LINSTOR protocol over TCP/IP or SSL/TLS; clients connect through the API library or the CLI)
  24. 34 LINSTOR Objects • Nodes • Resources → Volumes •

    Snapshots • Storage Pools → shared • Resource Groups • Properties → aux properties (diagram: node A with an LVM VG, node B with a ZFS pool, node C with a shared LVM VG; a DRBD resource with 2 replicas, an encrypted volume and a plain volume, organized by resource groups; see the CLI sketch below)
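    A minimal LINSTOR CLI sketch tying the objects above together; node names, addresses, VG names and the resource-group name are hypothetical:

      linstor node create node-a 192.0.2.11                                   # register a satellite node
      linstor storage-pool create lvm node-a pool_ssd vg_ssd                  # storage pool backed by an LVM VG
      linstor resource-group create rg_2way --storage-pool pool_ssd --place-count 2
      linstor volume-group create rg_2way
      linstor resource-group spawn-resources rg_2way res1 20G                 # 2-replica DRBD resource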
  25. 35 LINSTOR Storage Layers

    Top (optional): DRBD • Mid (optional & multiple): LUKS (encryption), caches, NVMe target & initiator • Below (optional): VDO (deduplication), software RAID • Bottom (required): LVM VG, ZFS zPool, Exos, OpenFlex, SPDK, shared LVM VG
  26. 36 LINSTOR data placement • Example policy: 3-way redundant, where

    two copies are in the same rack but in different fire compartments (synchronous) and a 3rd replica is in a different site (asynchronous) • Example tags: rack = number, room = number, site = city • arbitrary tags on nodes • require placement on equal/different/named tag values • prohibit placements with named existing volumes • different failure domains for related volumes (see the sketch below)
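    A minimal sketch of tag-based placement, assuming auxiliary node properties named rack and site and the LINSTOR client's auto-place options; the names are hypothetical and exact option spelling may vary by LINSTOR version:

      linstor node set-property node-a Aux/rack rack1
      linstor node set-property node-a Aux/site vienna
      linstor resource-group create rg_ha --place-count 3 --replicas-on-different Aux/rack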
  27. 37 LINSTOR network path selection • a storage pool may

    prefer a NIC (see the sketch below) • express the NUMA relation of NVMe devices and NICs • DRBD's multi-pathing is supported • load balancing with the RDMA transport • fail-over only with the TCP transport
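    A minimal sketch of preferring a NIC for a storage pool, assuming a dedicated storage interface; the interface name, address, node and pool names are hypothetical, and the PrefNic property name is taken from the LINSTOR user guide (spelling may vary by version):

      linstor node interface create node-a data_nic 192.0.2.21
      linstor storage-pool set-property node-a pool_ssd PrefNic data_nic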
  28. 38 LINSTOR Recent & ROADMAP • Recent • Volume, snapshot

    and snapshot-delta shipment • between LINSTOR clusters (disaster recovery) • to S3 buckets (backup) • K8s CRDs as LINSTOR’s database • Eliminates need for dedicated etcd • Roadmap • Public cloud, storage drivers: EBS on AWS, Azure disks, Google’s Persistent Disk
  29. 40 LINSTOR connectors Kubernetes: CSI-driver, Operator, Stork, HA, YAMLs, kubectl

    plugin • Nomad: CSI driver • OpenStack: Cinder driver since "Stein" (April 2019) • OpenNebula: storage driver • Proxmox VE: storage plugin • XCP-ng (in final beta) • Apache CloudStack
  30. 41 Piraeus Datastore • CSI-driver, Operator, Stork, HA, helm-chart, kubectl

    • Publicly available containers of all components • Joint effort of LINBIT & DaoCloud • In CNCF Sandbox https://github.com/piraeusdatastore https://piraeus.io
  31. 42 LINBIT SDS & Piraeus Datastore

    Container base image: Red Hat UBI (LINBIT SDS) vs. Debian (Piraeus) • Availability: drbd.io, LINBIT customers only vs. dockerhub/quay.io, publicly • Support: Enterprise, incl. 24/7 vs. community only • OpenShift/RHCOS: supported vs. n.a. • DRBD driver: pre-compiled for RHEL/SLES kernels vs. compiled from source • Both contain: LINSTOR, DRBD, operator, CSI driver, Stork, HA, helm chart, kubectl
  32. 43 “Naming is hard” – Phil Karlton

    Translation Matrix (Resource Group / Resource+Volume equivalents) • LINSTOR: Resource Group / Resource+Volume • Kubernetes: storageClass + file system / persistentVolume • Nomad: storageClass + file system / persistentVolume • OpenNebula: Datastore / Image • OpenStack: Volume Type / Volume • Proxmox: Storage Pool / Volume • XCP-ng: Storage Repository (SR) / Virtual Disk Image (VDI) • CloudStack: Primary storage / Volume
  33. 44 Summary

    (diagram: Orchestrators on top • block transport systems: DRBD diskless, NVMe-oF, iSCSI • block storage features: DRBD, LUKS, dm-cache, VDO • node-level volume management: LVM, ZFS • hardware: HDD, SSD, NVMe, PMEM)
  34. 47 LINSTOR – disaggregated stack

    (diagram: dedicated storage nodes and separate hypervisor nodes; VMs run on the hypervisors while their storage is served from the storage nodes)
  35. 48 LINSTOR / failed Hypervisor

    (diagram: the cluster after a hypervisor failure; the VMs run on the remaining hypervisors and the storage nodes are unaffected)
  36. 49 LINSTOR / failed storage node

    (diagram: the cluster after a storage node failure; the VMs keep running, served from the remaining storage nodes)
  37. 51 LINSTOR Storage Stacks • Disaggregated Storage • Classic enterprise

    workloads • Databases • Message queues • Typical orchestrators • OpenStack, OpenNebula • Kubernetes • Flexible redundancy (1-n) • HDDs, SSDs, NVMe SSDs (diagram: the app accesses a DRBD device that replicates to DRBD replicas on the storage nodes)
  38. 52 LINSTOR Storage Stacks • Hyperconverged • Classic enterprise workloads

    • Databases • Message queues • Typical orchestrators • OpenStack, OpenNebula • Kubernetes • Flexible redundancy (1-n) • HDDs, SSDs, NVMe SSDs (diagram: the app runs on a node with a local DRBD replica; DRBD replicates to the other nodes)
  39. 53 LINSTOR Storage Stacks • Disaggregated • Classic enterprise workloads

    • Databases • Message queues • Typical orchestrators • OpenStack, OpenNebula • Kubernetes • NVMe SSDs, SSDs (diagram: the app sits on software RAID1 over two NVMe-oF initiators, each connected to an NVMe-oF target on a storage node)
  40. 54 LINSTOR Storage Stacks • Disaggregated • Cloud native workload

    • Ephemeral storage • Typical orchestrator • Kubernetes • Application handles redundancy • Best suited for NVMe SSDs (diagram: the app uses an NVMe-oF initiator connected to a single NVMe-oF target on a storage node)
  41. 55 LINSTOR Storage Stacks • Hyperconverged • Cloud native workload

    • Ephemeral storage • PMEM-optimized database • Typical orchestrator • Kubernetes • Application handles redundancy • PMEM, NVDIMMs (diagram: the app runs directly on local PMEM)
  42. 56 LINSTOR Slicing Storage • LVM or ZFS • Thick

    – pre-allocated • Best performance • Fewer features • Thin – allocated on demand • Overprovisioning possible • Many snapshots possible • Optional • Encryption on top • Deduplication below (see the sketch below)
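    A minimal sketch of the thick vs. thin choice at the LINSTOR storage-pool level; node, VG and thin-pool names are hypothetical:

      linstor storage-pool create lvm node-a pool_thick vg_data               # thick: pre-allocated LVs
      linstor storage-pool create lvmthin node-a pool_thin vg_data/thinpool   # thin: allocated on demand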
  43. 58 WinDRBD • Version 1.0 was released in Q1 2022

    • https://www.linbit.com/en/drbd-community/drbd-download/ • Windows 7 SP1, Windows 10, Windows Server 2008, Windows Server 2016 and 2019 • wire-protocol compatible with the Linux version • the driver tracks the Linux version with a one-day release offset • the WinDRBD user-level tools are merged upstream