AuriStor - IT Press Tour #40 Sep./Oct. 2021

The IT Press Tour

September 28, 2021

Transcript

  1. Outline
     • Introduction to AuriStorFS
     • AuriStorFS Supported Platforms
       • Linux Distributions
       • macOS (Apple Silicon and Intel)
       • Microsoft Windows 10 & 11
     • AuriStorFS Futures
  2. AuriStor’s Vision for /afs
     • Application transparency
     • Be a first-class file system on all major OSes
       • Client support built into the Linux kernel
     • Embrace multi-producer, multi-consumer workflows
     • Be performance competitive with Lustre, GPFS, Panasas, NFSv4, …
     • Best-of-breed data security
       • Wire privacy and integrity protection policy guarantees
       • Combined identity authentication
       • Multi-factor authorization
       • Geographic isolation
     • Improved ease of use
     • No flag days
  3. Key Features
     ▪ Security
       ▪ Combined identity authn
       ▪ Multi-factor authz
       ▪ Volume and File Server policies
       ▪ Data privacy for all services
       ▪ Need-to-know access policies (GDPR)
       ▪ AES256 wire privacy
       ▪ Protection against cache poisoning attacks
     ▪ Networking
       ▪ IPv6
       ▪ Can saturate multiple bonded 10Gbit NICs
     ▪ File and DB Server Performance and Scale
       ▪ Dynamic per-service worker pools
       ▪ Reduced resource contention
       ▪ Supports multi-producer, multi-consumer workflows
     ▪ UBIK DB quorum
       ▪ Accelerated establishment
       ▪ Membership up to 80 servers per cell
     ▪ 2038 compatible
     ▪ Simplified configuration management
     ▪ Robust continuous integration test suite
       ▪ USENIX LISA21 “Hands off testing for Network Filesystems”
       ▪ Coverity analysis
     ▪ Docker container deployments
     ▪ Servers run as an unprivileged user, not “root”
     ▪ Arbitrary file server vice partition backing store
       ▪ Any POSIX file system – local or remote
       ▪ Object stores
     ▪ Enumeration “each” commands for servers, volumes, and protection entities
  4. The Global /afs file namespace
     ▪ The global /afs file namespace has a long history of use
       ▪ Federated authentication
       ▪ Home and shared project directories
       ▪ Cross-platform distributed access to files and directory trees over the WAN
       ▪ Anything that benefits from atomic publication and/or read-only replication
         ▪ Software distribution
         ▪ Static web content distribution
       ▪ Global data replication and live data migration
     ▪ New use cases include
       ▪ Persistent storage for containerized processes
       ▪ Distribution of container images
  5. AFS without Fear! No use case avoidance
     ▪ With AuriStorFS, /afs now also handles workflows that require
       ▪ Thousands of nodes modifying a common set of objects (multiple writer, multiple reader)
       ▪ Hundreds of processes on a single client system with many processor threads
       ▪ Robust multi-factor authorization and security (auth/integ/priv) requirements
     ▪ End users expect their data to be available out of the box with no third-party software
       ▪ Linux native AFS (kafs) and AF_RXRPC now provide an out-of-the-box AuriStorFS client on Fedora, Debian, and Ubuntu
     ▪ Our partners push the limits of /afs without fear! No more walking on eggshells.
  6. What if something doesn’t work?
     ▪ No software is bug free
     ▪ If you don’t sleep, we don’t sleep
     ▪ The AuriStor team will perform a thorough root cause analysis and fix the problem, no matter how big or small; no matter how long it takes
  7. Functional improvements due to end user usage
     ▪ Unlimited data transfer RX calls
       ▪ 19 March 2018 – world’s first RX call to send more than 5.63TB
     ▪ World’s largest AFS volume – 500TB
       ▪ Fully functional – migration, replication, backup, restore
     ▪ Volume quotas up to 16 zettabytes
     ▪ 500,000 sustained RX connections per fileserver
     ▪ 40,000 Ubik queries/second and 25 writes/second sustained (see the “Ubik Services at Scale” talk on Wednesday)
     ▪ Linux cache manager scaling beyond 64 simultaneous processor threads with minimal resource contention (see the “Juggling Bottlenecks” talk)
     ▪ 5TB and larger files
  8. AFS Operational Goals – Review
     • Location Independence
     • Authentication, Integrity Protection, & Privacy
     • Geographic Replication of Critical Data
     • Atomic Publishing Model
     • One File System for all
     • Fine Grained Access Control
     • Federated Authentication
     • Platform-specific path resolution (@sys)
     • Platform Architecture Independence
     • Distributed Administration
  9. /afs deployments have long lifetimes
     • For the workflows that AFS excels at, there are no clear alternatives
     • Transition costs are huge
     • Legacy deployments are not enough
  10. AuriStorFS: The Next Generation AFS
     ▪ Zero configuration clients
     ▪ Improved Security Model
       ▪ Client cache poisoning attack prevention
     ▪ Performance
     ▪ Scale
     ▪ Enhanced File System Functionality
     ▪ Out-of-the-box /afs access
  11. Cell Discovery for /afs
     ▪ In modern clients, the contents of /afs can be populated on-demand via DNS SRV queries
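     As an illustration (hypothetical cell name; the _afs3-vlserver._udp label is the SRV convention registered for AFS in RFC 5864, and 7003 is the standard VL service port), a client can locate a cell’s volume location servers with an ordinary DNS query:

         # Ask DNS for the volume location servers of the cell "example.com"
         $ dig +short SRV _afs3-vlserver._udp.example.com
         0 0 7003 vl1.example.com.
         0 0 7003 vl2.example.com.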
  12. Federated Authentication
     ▪ Likewise, cell mapping to authentication domains is also performed via DNS
     ▪ GSS-Kerberos v5 resolves the realm from the service ticket host component
       ▪ yfs-rxgk/_afs.your-file-system.com
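     A minimal sketch of one such DNS mapping, assuming the site publishes the _kerberos TXT convention that MIT and Heimdal consult when dns_lookup_realm is enabled (the zone data shown is hypothetical):

         # Which Kerberos realm serves this cell?
         $ dig +short TXT _kerberos.your-file-system.com
         "YOUR-FILE-SYSTEM.COM"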
  13. Multi-layered Security
     ✓ Per-server Security Policies
     ✓ Per-volume Security Policies
     ✓ Per-volume Maximum Access Control
     ✓ Per-object Access Control
     ✓ Combined Identity Authentication
  14. GSS-API RX Security Class
     • Multiple authentication services: Kerberos v5 (Active Directory), GSS Secure Anonymous
     • Modern crypto (e.g. AES, Camellia)
     • Larger key sizes
     • Reduced key exposure
     • Rekeying
     • Token combining
  15. Combined Identity Authentication
     • Authenticated User on an Authenticated Machine
     • Anonymous User on an Authenticated Machine
     • Key combination protects against cache poisoning attacks
     • https://www.auristor.com/documentation/man/linux/7/auristorfs_acls.html
  16. User-Centric, Constrained Elevation Access Control
     ▪ Combined Identity Authentication and Multi-factor Access Control Lists (see the sketch below)
       • grant more rights, not fewer, when used to evaluate an ACL's Normal entries
       • revoke more rights, not fewer, when used to evaluate an ACL's Negative entries
     ▪ https://www.auristor.com/documentation/man/linux/7/auristorfs_acls.html
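     A hedged sketch using the long-standing fs commands (directory and user names are hypothetical; the exact combined-identity entry syntax is defined in the auristorfs_acls man page linked above):

         # Grant read/lookup via a Normal ACL entry
         $ fs setacl -dir /afs/example.com/project -acl alice rl
         # Revoke write via a Negative ACL entry
         $ fs setacl -dir /afs/example.com/project -acl bob w -negative
         # Inspect the resulting Normal and Negative rights
         $ fs listacl /afs/example.com/project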
  17. RX Security Class Capabilities

                               yfs-rxgk                       rxkad-k5     rxnull
     Authentication            GSS-API                        Kerberos v5  None
     Integrity Protection      Yes                            Yes          No
     Privacy                   AES256-CTS-HMAC-SHA1-96,       fcrypt       No
                               AES256-CTS-HMAC-SHA512-384
     Rekeying                  Yes                            No           No
     Max Pkts / Call           2^64                           2^30         unlimited
     # identities              one or more (ordered)          one          none
     Combined Key              Yes                            No           No
  18. Volume Security Policies and Maximum ACLs
     • AFS volumes are relocatable, replicable object stores
     • Security Policies mandate the use of RX security classes and data protection levels (integrity protection / wire privacy)
       • They also determine which servers are permitted to store which objects
     • Maximum ACLs restrict the access rights that can be granted via per-object ACLs
  19. 32-bits just isn’t enough
     ▪ Maximum ~2GiB or ~4GiB file size
     ▪ 2038-problem, 2106-problem
     ▪ 100ns granularity to match NTFS
     ▪ UNIX Epoch
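     For a concrete feel for the 2038 problem: the largest timestamp a signed 32-bit count of seconds since the UNIX Epoch can hold is 2^31 - 1:

         # GNU date: render the last representable signed 32-bit UNIX timestamp
         $ date -u -d @2147483647
         Tue Jan 19 03:14:07 UTC 2038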
  20. Scaling to Meet Demand

                                          AuriStorFS                       AFS
     Year 2038 Ready                      Yes                              No
     Timestamp Granularity                100ns (UNIX Epoch)               1s (UNIX Epoch)
     Rx Listener Thread Throughput        <= 8.2 gbit/second               <= 2.4 gbit/second
     Rx Listener Threads per Service      up to 16                         1
     Rx Window Size (default)             128 packets / 180.5KB            32 packets / 44KB
     Rx Window Size (maximum)             65535 packets / 90.2MB           32 packets / 44KB
     Rx Congestion Window Validation      Yes                              No
     Volume IDs per Cell                  2^64                             2^31
     Object IDs per Volume                2^95 directories and 2^95 files  2^30 directories and 2^30 files
     Objects per Volume                   2^90 directories or files        2^26 directories or files
     Objects per Directory                up to 2,029,072                  up to 64,447
     Maximum Distributed DB Size          16 exabytes (2^64 bytes)         2 gigabytes (2^31 bytes)
     Maximum Assignable Quota             16 zettabytes (2^74 bytes)       2 terabytes (2^41 bytes)
     Maximum Reported Volume Size         16 zettabytes (2^74 bytes)       2 terabytes (2^41 bytes)
     Maximum Volume Size                  16 zettabytes (2^74 bytes)       16 zettabytes (2^74 bytes)
     Maximum Transferable Volume Size     16 zettabytes (2^74 bytes)       5.639 terabytes
     Maximum Partition Size               16 zettabytes (2^74 bytes)       16 zettabytes (2^74 bytes)
  21. Networking Improvements
     • IPv6
     • Pipeline data engine
     • SACK-based loss recovery as documented in RFC 6675, with elements of New Reno from RFC 6582, on top of TCP-style congestion control elements as documented in RFC 5681
     • Maximum window size increased to 65535 packets from 32
  22. Filesystem Operation Enhancements
     • Locking
       • Advisory Locks
       • Mandatory Locks
       • Atomic Create and Lock
     • Rename Variants
       • Replace existing target
       • NoReplace – fail if existing target
       • Exchange
     • Append Data
       • Return offset at completion
     • Store Data Variants
       • Reserve
       • Reserve and Truncate
       • Reserve and ZeroFill
     • Server Side Operations
       • Silly Rename
       • Copy Object
       • Copy Range
       • Move Object
  23. Client Caching and Callback
     • Data Version is incremented for each change to an object's data
     • Secure callback notifications
       • Issued to all subscribed clients for data and metadata changes
       • Delivered before completion of the RPC to ensure serialization of object updates and subsequent communications via other channels
     • Cached data is not considered valid unless an outstanding callback promise exists
     • Callback promises expire after a period determined by the object server
  24. KAFS – Linux Native AFS Client
     • Part of the Linux kernel
     • Compiled as a module by Fedora, Debian
       • Module signed by the build process
     • kafs-client package (rpm, deb)
       • Sets home cell
       • Handles DNS cell lookups
     • /afs started by systemd during boot
     • Basic Kerberos authentication
  25. Fedora 34 Accessing /afs
     • dnf install krb5-workstation kafs-client
     • Create /etc/kafs/client.d/defaults.conf:
         [defaults]
         thiscell = <cellname>
         sysname = <sysname>   [optional]
     • systemctl start afs.mount
     • systemctl enable afs.mount
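     A quick verification sketch (hypothetical principal and cell name; assumes the aklog-kafs token helper shipped with kafs-client):

         $ kinit alice@EXAMPLE.COM     # obtain Kerberos credentials
         $ aklog-kafs example.com      # convert them into an AFS token
         $ ls /afs/example.com         # browse the cell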
  26. AuriStorFS Linux Client Design
     ▪ Cross Platform
       ▪ Common code is shared across Linux, Solaris, AIX, macOS, and various BSD flavors, plus a userland library
     ▪ Monolithic kernel module
       ▪ Combines the filesystem, RX network stack, and data/metadata cache management
       ▪ The internal locking model for inodes and dentries is an imperfect match for all VFS implementations
     ▪ One Mounted Device For All Volumes
       ▪ The entire file namespace is exposed to the VFS as a single device, traditionally mounted on /afs
  27. KAFS Linux Client Design
     ▪ Single platform
       ▪ Only exists on Linux, but will work with any arch
     ▪ Modular design
       ▪ Userspace-accessible AF_RXRPC network protocol driver
       ▪ FS-Cache caching, shared with NFS, CIFS, …
     ▪ Uses the native Linux VFS locking model
     ▪ Each volume has its own superblock
       ▪ Vnode IDs map to inode numbers
       ▪ df works correctly
     ▪ The Linux VFS automounter handles AFS mountpoints
       ▪ Individual volumes can be mounted anywhere, as sketched below
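     For example, a single volume can be mounted directly (hypothetical cell and mount point; the "#cell:volume." source syntax follows the kernel's afs filesystem documentation, where '#' prefers a read-only replica and '%' forces read/write):

         # Mount one volume from the cell at an arbitrary location
         $ sudo mount -t afs "#example.com:root.cell." /mnt/example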
  28. Advantages of an External Project
     ▪ Kernel-independent
     ▪ Already ported to many systems
     ▪ Features developed there
     ▪ Faster rollout
     ▪ Point of focus for developers interested in AFS
  29. Disadvantages of Being Out-of-tree
     ▪ Kernel interfaces change often
     ▪ Parts of the kernel API are not accessible to non-GPL code
     ▪ Massive continuous building effort to support many kernels across distributions
     ▪ Users must find and install the packages
  30. Advantages of Being In-Kernel (1)
     ▪ Kernel-reserved interfaces
       ▪ RCU, container features
     ▪ Shared components
       ▪ FS-Cache, keyrings
     ▪ Crypto
     ▪ Tracepoints
     ▪ Debugging facilities
  31. Advantages of Being In-Kernel (2)
     ▪ Developers making big changes in-kernel must make the changes throughout
     ▪ Automated testing
     ▪ Distributions build the filesystem modules with distribution kernels
       ▪ Much more widely available
     ❑ Note: Not in any enterprise distros yet
  32. Large File Write Performance – No Soft Lockups on Linux
     ▪ A 1TB file copy produced 288 soft lockup BUGs
     ▪ Soft lockups are logged when a processor’s non-maskable interrupt watchdog thread is unable to execute for watchdog_thresh seconds
     ▪ In this case, the soft lockups were the result of the computational complexity of the Store Segments algorithm, which was dependent upon the size of the file being saved
       ▪ [157841.618236] watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [dd:959736]
       ▪ [157841.618291] RIP: 0010:afs_StoreAllSegments+0x407/0xd20 [yfs]
     (Slide shows a before/after comparison)
  33. RX Security - Crypto engine
     ▪ AuriStorFS leverages the operating system vendor’s crypto engine
       ▪ FIPS certificate, speed, access to hardware accelerators
       ▪ Integrated support into Heimdal’s hcrypto framework
       ▪ Microsoft Crypto Next Gen (WinCNG), Apple Common Crypto, Solaris PKCS#11, Linux OpenSSL
     ▪ Simon Wilkinson’s new crypto library leverages the latest x86/x64 instructions
       ▪ Intel Advanced Encryption Standard New Instructions (AES-NI)
       ▪ Streaming Single Instruction Multiple Data Extensions (SSE, SSE2, SSE3)
       ▪ Advanced Vector Instructions (AVX, AVX2)
       ▪ to encrypt, decrypt, sign and verify RX packets at a high level of efficiency
     ▪ Compared to the RFC 3961 kernel implementation contributed by AuriStor to OpenAFS in 2010, and userland macOS Common Crypto (built from OpenSSL)
     ▪ Faster crypto – reduced latency
     ▪ Available for AMD64 and ARMv8 platforms
  34. RX Security New Crypto Engine – Intel Results
     ▪ Measurements were performed on a MacBook Air
       ▪ macOS Sierra
       ▪ Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz (Haswell, 22nm)
       ▪ Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2
       ▪ Intel® AES New Instructions
     ▪ Time to compute 5,000,000 rxgk packets in a single thread
       ▪ Worst case for encryption is MIT with built-in crypto at 203 seconds (24,630 packets/sec)
       ▪ Best case is 23 seconds (217,391 packets/sec)

     Faster than ->     RFC3961 (kernel)  CommonCrypto  MIT Krb5 (built-in)  MIT Krb5 (OpenSSL)
     Encryption         6x                3x            9x                   3.5x
     Decryption         5x                5x            14x                  6x
     Sign and Verify    3x                2x            9x                   2x
  35. Volume Transaction Lifecycle Improvements
     ▪ New or updated commands: “vos status”, “vos movesite”, “vos copysite”, “vos eachfs”, “vos eachvl”, “vos eachvol”
     ▪ Improved behavior for system administrators
       ▪ The YFSVOL RPC suite removes race conditions associated with volume transactions
       ▪ volservers automatically garbage-collect temporary or partial volumes created by interrupted or failed “vos” commands
       ▪ Read-only volumes can be moved between servers or between partitions on the same server
       ▪ Read-write volume moves recreate the backup volume on the destination
         ▪ Avoids loss of the nightly backup if a move occurs between snapshot and backup
     ▪ Improved status reporting (see the example below)
       ▪ Bytes sent and received for each call bound to a transaction
       ▪ Search transactions by volume group
     ▪ Volserver validates volume transaction metadata against the location database for consistency checks
     ▪ Coordinated shutdown of volume transactions during volserver shutdown
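     For instance, an administrator can inspect in-flight transactions on a fileserver (hypothetical server name; vos status predates AuriStorFS, while the per-call byte counts are the addition described above):

         # List active volume transactions on one server
         $ vos status -server fs1.example.com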
  36. The “wait 15 minutes between volume releases” rule
     ▪ The atomic publishing workflow involves writing related changes to a RW volume and publishing the change set to clients via “vos release”
     ▪ AFS lore holds that any volume must not be "released" more frequently than once every ten to fifteen minutes
       ▪ Releasing volumes more often can result in transient data access failures
       ▪ This wisdom was based on empirical evidence, with no explanation of why it was true
     ▪ An AuriStorFS end user wished to release changes every two minutes
       ▪ The failures were traced to the UNIX cache manager's handling of VOFFLINE errors, VLSF_DONTUSE flags, and a periodic ten-minute daemon check
     ▪ Starting with the March 2021 client release, there is no longer a limit on the frequency of volume releases (see the sketch below)
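     With the fix in place, a publishing loop like this sketch (hypothetical volume name) is safe; each "vos release" atomically publishes the read-write volume's current change set to its read-only sites:

         # Release every two minutes -- no longer subject to the 15-minute rule
         $ while true; do vos release proj.web; sleep 120; done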
  37. Development by the numbers since 1 Jan 2019
     ▪ 8 contributing developers
     ▪ 21,245 code review submissions
     ▪ 129 errors detected by Coverity
     ▪ 3545 commits
     ▪ 2268 files changed, 172506 insertions(+), 105934 deletions(-)
     ▪ ~2000 new tests (~8400 total)
     ▪ 163,044 continuous integration builds spanning 36 platforms
     ▪ 11 releases
     (Chart: Source Commits By Month, AuriStorFS vs. OpenAFS, Mar 2012 – Mar 2021)
  38. Linux Platform Support
     ▪ Distributions:
       ▪ Red Hat Enterprise Linux 8.4, 7.9, 6.10 and Extended Support
       ▪ Red Hat Fedora 32, 33, 34
       ▪ Debian Bullseye, Buster, Stretch
       ▪ Ubuntu 20.04, 18.04, 16.04, 14.04
       ▪ CentOS 6, 7, 8
       ▪ Amazon Linux 2
       ▪ Oracle Linux
     ▪ Architectures:
       ▪ x86_64
       ▪ aarch64
       ▪ PPC-LE (no hardware-accelerated crypto)
  39. macOS Platform Support
     • 11 – Big Sur (Apple Silicon and Intel)
     • 10.15 – Catalina
     • 10.14 – Mojave
     • 10.13 – High Sierra
     • 10.12 – Sierra
     Each platform supported within one day of Apple's release, including support for Apple Silicon with hardware-accelerated cryptography on 12 November 2020.
  40. Windows Support
     • All Intel Windows 10 versions
     • Back-level support to Windows 7 if SHA256 signature support is installed
     • Even Windows 11
  41. Reinvestment in the Community
     ▪ AuriStor recognizes that its continued success requires a healthy ecosystem around us
     ▪ AuriStor is proud to sponsor the work of these critical organizations:
       ▪ USENIX Benefactor
       ▪ LISA Gold Sponsor
       ▪ VAULT Sponsor
       ▪ The Linux Foundation Silver Member
       ▪ Open Source Security Foundation Board Member
       ▪ SPEC Open Systems Group Member
  42. On-going Investment in LINUX
     ▪ AuriStor will continue to support and develop its out-of-tree client
       ▪ Faster to fix bugs and develop new features and functionality
     ▪ But we believe the long-term future depends upon the success of the in-tree afs filesystem and AF_RXRPC
     ▪ AuriStor will continue to partner with David Howells on the development and QA testing of kafs and AF_RXRPC, and will continue to invest in its adoption by downstream Linux distributions
  43. Fine-grained volume properties
     ▪ At present, the granularity for enabling OpenAFS-incompatible volume capabilities is per fileserver:
       ▪ Per-file ACLs
       ▪ Hard Links
       ▪ Large Directories
     ▪ This choice is problematic because the owner of a volume might want to ensure OpenAFS compatibility
     ▪ This summer AuriStor will introduce per-volume feature management
  44. Fileserver RPC Refresh – 2021 Edition
     ▪ New FetchStatus fields
       ▪ Server Creation Time
     ▪ New RPCs
       ▪ Fetch Status for User List
       ▪ Server-side silly rename
       ▪ New Rename variants (Replace, Replace Silly, No Replace, Exchange)
         ▪ All return status information on renamed vnodes
       ▪ Callbacks for Hard and Symbolic Links
       ▪ New Store Data modes to handle short writes (Zero Fill, Short OK)
       ▪ Append Data
       ▪ Copy File Range for server-side copies, including between volumes
       ▪ New Fetch Data returns metadata before data in case of interruption
       ▪ New WhoAmI returns the fileserver’s cell name
     ▪ Enabled by a Cache Manager Capability bit. Once enabled, all old RPCs are rejected by the fileserver.
  45. 2016 ACM Software System Award
     • AuriStor stands on the shoulders of those that came before us:
       • Mahadev Satyanarayanan
       • Michael L. Kazar
       • Robert N. Sidebotham
       • David A. Nichols
       • Michael J. West
       • John H. Howard
       • Alfred Z. Spector
       • Sherri M. Nichols
  46. Pricing
     ▪ Perpetual license starts at US$21,000
       ▪ 4 DB Service Instances
       ▪ 4 File Service Instances
       ▪ Unlimited client devices
       ▪ 1000 Protection Records for User and Managed Device Entities
     ▪ One year of web support and feature updates included
     ▪ Annual licensing for continued support and feature updates
  47. Source code licensing
     ▪ AuriStorFS source code is available (under contract) to licensees
     ▪ AuriStor believes in the benefits of community development
     ▪ Multi-platform distributed infrastructures such as AuriStorFS require sustained investment
     ▪ Global public goods are taken for granted
  48. Want to Learn More?
     ▪ History of AFS Tutorial: CERN Education series
       ▪ https://indico.cern.ch/event/126258/
     ▪ Future of AFS: CERN Education series
       ▪ https://indico.cern.ch/event/126259/
     ▪ Leveraging AFS Storage Systems to Ease Global Software Deployment
       ▪ https://www.usenix.org/conference/lisa21/presentation/dimarco