Slide 1

IT Press Tour 40th Edition 28 September 2021

Slide 2

No content

Slide 3

Outline
• Introduction to AuriStorFS
• AuriStorFS Supported Platforms
  • Linux Distributions
  • macOS (Apple Silicon and Intel)
  • Microsoft Windows 10 & 11
• AuriStorFS Futures

Slide 4

AuriStorFS – An AFS Family file system

Slide 5

AuriStor’s Vision for /afs
• Application transparency
• Be a first-class file system on all major OSes
  • Client support built into the Linux kernel
• Embrace multi-producer, multi-consumer workflows
• Be performance competitive with Lustre, GPFS, Panasas, NFSv4, …
• Best-of-breed data security
  • Wire privacy and integrity protection policy guarantees
  • Combined identity authentication
  • Multi-factor authorization
  • Geographic isolation
• Improved ease of use
• No flag days

Slide 6

Key Features
▪ Security
  ▪ Combined identity authn
  ▪ Multi-factor authz
  ▪ Volume and File Server policies
  ▪ Data privacy for all services
  ▪ Need-to-know access policies (GDPR)
  ▪ AES256 wire privacy
  ▪ Protection against cache poisoning attacks
▪ Networking
  ▪ IPv6
  ▪ Can saturate multiple bonded 10Gbit NICs
▪ File and DB Server Performance and Scale
  ▪ Dynamic per-service worker pools
  ▪ Reduced resource contention
  ▪ Supports multi-producer, multi-consumer workflows
▪ UBIK DB quorum
  ▪ Accelerated establishment
  ▪ Membership up to 80 servers per cell
▪ 2038 compatible
▪ Simplified configuration management
▪ Robust continuous integration test suite
  ▪ USENIX LISA21 “Hands off testing for Network Filesystems”
  ▪ Coverity analysis
▪ Docker container deployments
▪ Servers run as an unprivileged user, not “root”
▪ Arbitrary file server vice partition backing store
  ▪ Any POSIX file system – local or remote
  ▪ Object stores
▪ Enumeration “each” commands for servers, volumes, and protection entities

Slide 7

The Global /afs file namespace
▪ The global /afs file namespace has a long history of use
  ▪ Federated authentication
  ▪ Home and shared project directories
  ▪ Cross-platform distributed access to files and directory trees over the WAN
  ▪ Anything that benefits from atomic publication and/or read-only replication
    ▪ Software distribution
    ▪ Static web content distribution
  ▪ Global data replication and live data migration
▪ New use cases include
  ▪ Persistent storage for containerized processes
  ▪ Distribution of container images

Slide 8

AFS without Fear! No use case avoidance
▪ With AuriStorFS, /afs now also handles workflows that require:
  ▪ Thousands of nodes modifying a common set of objects (multiple writer, multiple reader)
  ▪ Hundreds of processes on a single multi-processor-thread client system
  ▪ Robust multi-factor authorization and security (auth/integ/priv) requirements
▪ End users expect their data to be available out of the box with no third-party software
  ▪ Linux native AFS (kafs) and AF_RXRPC now provide an out-of-the-box AuriStorFS client on Fedora, Debian, and Ubuntu

Our partners push the limits of /afs without fear! No more walking on eggshells.

Slide 9

What if something doesn’t work?
▪ No software is bug free
▪ If you don’t sleep, we don’t sleep
▪ The AuriStor team will perform a thorough root cause analysis and fix the problem, no matter how big or small, no matter how long it takes

Slide 10

Functional improvements driven by end-user usage
▪ Unlimited data transfer RX calls
  ▪ 19 March 2018 – world’s first RX call to send more than 5.63TB
▪ World’s largest AFS volume – 500TB
  ▪ Fully functional – migration, replication, backup, restore
▪ Volume quotas up to 16 zettabytes
▪ 500,000 sustained RX connections per fileserver
▪ 40,000 Ubik queries/second and 25 writes/second sustained (see “Ubik Services at Scale” talk on Wednesday)
▪ Linux cache manager scaling beyond 64 simultaneous processor threads with minimal resource contention (see “Juggling Bottlenecks” talk)
▪ 5TB and larger files

Slide 11

AFS Operational Goals – Review
• Location independence
• Authentication, integrity protection, & privacy
• Geographic replication of critical data
• Atomic publishing model
• One file system for all
• Fine-grained access control
• Federated authentication
• Platform-specific path resolution (@sys; see the example below)
• Platform architecture independence
• Distributed administration
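
To illustrate @sys path resolution, a minimal sketch with an illustrative cell and paths; the sysname value shown is an example and varies by platform and release:

    # One symlink embedding @sys resolves differently on each client platform
    ln -s /afs/example.com/software/@sys/bin /usr/local/afs-bin
    # e.g. an x86_64 Linux client might expand @sys to amd64_linux26,
    # while other platforms expand it to their own sysname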

Slide 12

/afs deployments have long lifetimes
• For the workflows at which AFS excels, there are no clear alternatives
• Transition costs are huge
• Legacy deployments are not enough

Slide 13

1990s era deployments of /afs

Slide 14

AuriStorFS: The Next Generation AFS
▪ Zero-configuration clients
▪ Improved security model
  ▪ Client cache poisoning attack prevention
▪ Performance
▪ Scale
▪ Enhanced file system functionality
▪ Out-of-the-box /afs access

Slide 15

Zero configuration

Slide 16

Cell Discovery for /afs
▪ In modern clients, the contents of /afs can be populated on demand via DNS SRV queries
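
For example, a cell's Volume Location servers are advertised with the _afs3-vlserver SRV record defined by RFC 5864. A minimal sketch using dig, with the cell name taken from the deck and the server name illustrative:

    # Query the VL-server SRV record for a cell (RFC 5864 naming)
    dig +short SRV _afs3-vlserver._udp.your-file-system.com
    # output shape: <priority> <weight> <port> <target>, e.g.
    # 10 0 7003 vl1.your-file-system.com.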

Slide 17

Federated Authentication
▪ Likewise, cell mapping to authentication domains is also performed via DNS
▪ GSS-Kerberos v5 resolves the realm from the service ticket host component
  ▪ yfs-rxgk/_afs.your-file-system.com
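
One common convention for the DNS side of realm mapping is a _kerberos TXT record, which Kerberos clients can consult when dns_lookup_realm is enabled; a hedged illustration, with the record contents assumed:

    # Resolve the Kerberos realm for a cell via DNS (convention; availability varies)
    dig +short TXT _kerberos.your-file-system.com
    # "YOUR-FILE-SYSTEM.COM"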

Slide 18

Security Model Improvements

Slide 19

Multi-layered Security
✓ Per-server Security Policies
✓ Per-volume Security Policies
✓ Per-volume Maximum Access Control
✓ Per-object Access Control
✓ Combined Identity Authentication

Slide 20

GSS-API RX Security Class
• Multiple authentication services
  • Kerberos v5 (Active Directory)
  • GSS Secure Anonymous
• Modern crypto (e.g. AES, Camellia)
• Larger key sizes
• Reduced key exposure
• Rekeying
• Token combining

Slide 21

Combined Identity Authentication
https://www.auristor.com/documentation/man/linux/7/auristorfs_acls.html
• Authenticated user on an authenticated machine
• Anonymous user on an authenticated machine
• Key combination protects against cache poisoning attacks

Slide 22

User-Centric, Constrained-Elevation Access Control
▪ Combined identity authentication and multi-factor access control lists:
  • grant more rights, not fewer, when used to evaluate an ACL's Normal entries
  • revoke more rights, not fewer, when used to evaluate an ACL's Negative entries
https://www.auristor.com/documentation/man/linux/7/auristorfs_acls.html
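
As a baseline, per-object ACLs are managed with the classic fs setacl command; a minimal sketch with illustrative paths and names (the AuriStorFS multi-factor entry syntax is documented at the URL above):

    # Grant user alice read and lookup rights on a directory
    fs setacl -dir /afs/example.com/project -acl alice rl
    # Add a Negative entry denying write to a group
    fs setacl -dir /afs/example.com/project -acl example:contractors w -negative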

Slide 23

RX Security Class Capabilities

| Capability           | yfs-rxgk                                            | rxkad-k5    | rxnull    |
|----------------------|-----------------------------------------------------|-------------|-----------|
| Authentication       | GSS-API                                             | Kerberos v5 | None      |
| Integrity Protection | Yes                                                 | Yes         | No        |
| Privacy              | AES256-CTS-HMAC-SHA1-96, AES256-CTS-HMAC-SHA512-384 | fcrypt      | No        |
| Rekeying             | Yes                                                 | No          | No        |
| Max Pkts / Call      | 2^64                                                | 2^30        | unlimited |
| # Identities         | one or more (ordered)                               | one         | none      |
| Combined Key         | Yes                                                 | No          | No        |

Slide 24

Volume Security Policies and Maximum ACLs
• AFS volumes are relocatable, replicable object stores
• Security policies mandate the use of RX security classes and data protection levels (integrity protection / wire privacy)
  • They also determine which servers are permitted to store which objects
• Maximum ACLs restrict the access rights that can be granted via per-object ACLs

Slide 25

Scale

Slide 26

32 bits just isn’t enough
▪ Maximum ~2GiB or ~4GiB file size
▪ 2038 problem, 2106 problem
▪ 100ns granularity to match NTFS
▪ UNIX Epoch
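
A worked reminder of why 2038 matters: a signed 32-bit time_t overflows 2^31 - 1 seconds after the UNIX epoch (GNU date shown):

    # The last second representable in a signed 32-bit time_t
    date -u -d @2147483647
    # Tue Jan 19 03:14:07 UTC 2038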

Slide 27

Scaling to Meet Demand

| Metric                           | AuriStorFS                      | AFS                             |
|----------------------------------|---------------------------------|---------------------------------|
| Year 2038 Ready                  | Yes                             | No                              |
| Timestamp Granularity            | 100ns (UNIX Epoch)              | 1s (UNIX Epoch)                 |
| Rx Listener Thread Throughput    | <= 8.2 gbit/second              | <= 2.4 gbit/second              |
| Rx Listener Threads per Service  | up to 16                        | 1                               |
| Rx Window Size (default)         | 128 packets / 180.5KB           | 32 packets / 44KB               |
| Rx Window Size (maximum)         | 65535 packets / 90.2MB          | 32 packets / 44KB               |
| Rx Congestion Window Validation  | Yes                             | No                              |
| Volume IDs per Cell              | 2^64                            | 2^31                            |
| Object IDs per Volume            | 2^95 directories and 2^95 files | 2^30 directories and 2^30 files |
| Objects per Volume               | 2^90 directories or files       | 2^26 directories or files       |
| Objects per Directory            | up to 2,029,072                 | up to 64,447                    |
| Maximum Distributed DB Size      | 16 exabytes (2^64 bytes)        | 2 gigabytes (2^31 bytes)        |
| Maximum Assignable Quota         | 16 zettabytes (2^74 bytes)      | 2 terabytes (2^41 bytes)        |
| Maximum Reported Volume Size     | 16 zettabytes (2^74 bytes)      | 2 terabytes (2^41 bytes)        |
| Maximum Volume Size              | 16 zettabytes (2^74 bytes)      | 16 zettabytes (2^74 bytes)      |
| Maximum Transferable Volume Size | 16 zettabytes (2^74 bytes)      | 5.639 terabytes                 |
| Maximum Partition Size           | 16 zettabytes (2^74 bytes)      | 16 zettabytes (2^74 bytes)      |

Slide 28

Enhanced File System and Network Functionality

Slide 29

Networking Improvements
• IPv6
• Pipelined data engine
• SACK-based loss recovery as documented in RFC 6675, with elements of New Reno from RFC 6582, on top of TCP-style congestion control elements as documented in RFC 5681
• Maximum window size increased to 65535 packets, from 32

Slide 30

Filesystem Operation Enhancements
▪ Locking (see the sketch below)
  • Advisory locks
  • Mandatory locks
  • Atomic create-and-lock
▪ Rename variants
  • Replace existing target
  • NoReplace – fail if the target exists
  • Exchange
▪ Append Data
  • Returns offset at completion
▪ Store Data variants
  • Reserve
  • Reserve and Truncate
  • Reserve and ZeroFill
▪ Server-side operations
  • Silly rename
  • Copy object
  • Copy range
  • Move object
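
Advisory locks over /afs behave like POSIX advisory locks, so standard tools apply; a minimal sketch using flock(1) with illustrative paths (the mandatory-lock and atomic create-and-lock variants are AuriStorFS RPCs and are not shown):

    # Serialize a publish step against other cooperating processes
    flock /afs/example.com/project/.publish.lock -c './publish-step.sh'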

Slide 31

Client Caching and Callbacks
• The data version is incremented for each change to an object’s data
• Secure callback notifications
  • Issued to all subscribed clients for data and metadata changes
  • Delivered before completion of the RPC to ensure serialization of object updates and of subsequent communications via other channels
• Cached data is not considered valid unless an outstanding callback promise exists
• Callback promises expire after a period determined by the object server

Slide 32

Linux Kernel AFS: Out of the Box AFS

Slide 33

KAFS – Linux Native AFS Client
• Part of the Linux kernel
• Compiled as a module by Fedora and Debian
  • Module signed by the build process
• kafs-client package (rpm, deb)
  • Sets the home cell
  • Handles DNS cell lookups
• /afs started by systemd during boot
• Basic Kerberos authentication

Slide 34

Fedora 34: Accessing /afs
• dnf install krb5-workstation kafs-client
• Create /etc/kafs/client.d/defaults.conf:
    [defaults]
    thiscell =
    sysname = [optional]
• systemctl start afs.mount
• systemctl enable afs.mount
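
A hedged example session once the client is mounted (the principal, realm, and cell below are illustrative; aklog-kafs ships with the kafs-client package):

    # Obtain Kerberos credentials, then an AFS token for the cell
    kinit alice@EXAMPLE.COM
    aklog-kafs example.com
    ls /afs/example.com/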

Slide 35

Contrasting AuriStorFS and KAFS

Slide 36

AuriStorFS Linux Client Design
▪ Cross-platform
  ▪ Common code is shared across Linux, Solaris, AIX, macOS, and various BSD flavors, plus a userland library
▪ Monolithic kernel module
  ▪ Combines the filesystem, RX network stack, and data/metadata cache management
  ▪ The internal locking model for inodes and dentries is an imperfect match for the various VFS implementations
▪ One mounted device for all volumes
  ▪ The entire file namespace is exposed to the VFS as a single device, traditionally mounted on /afs

Slide 37

KAFS Linux Client Design
▪ Single platform
  ▪ Only exists on Linux, but works on any architecture
▪ Modular design
  ▪ Userspace-accessible AF_RXRPC network protocol driver
  ▪ FS-Cache caching, shared with NFS, CIFS, …
  ▪ Uses the native Linux VFS locking model
▪ Each volume has its own superblock
  ▪ Vnode IDs map to inode numbers
  ▪ df works correctly
▪ The Linux VFS automounter handles AFS mount points
  ▪ Individual volumes can be mounted anywhere (see the sketch below)
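
A hedged sketch of mounting one volume directly with kafs, following the source-string syntax in the kernel's afs documentation (cell and volume illustrative; the "#" prefix prefers read-only replicas):

    mkdir -p /mnt/rootcell
    mount -t afs "#example.com:root.cell." /mnt/rootcell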

Slide 38

Contrasting Development Processes

Slide 39

Advantages of an External Project
▪ Kernel-independent
▪ Already ported to many systems
▪ Features developed there
▪ Faster rollout
▪ A point of focus for developers interested in AFS

Slide 40

Disadvantages of Being Out-of-tree
▪ Kernel interfaces change often
▪ Parts of the kernel API are not accessible to non-GPL modules
▪ Massive continuous building effort to support many kernels across distributions
▪ Users must find and install the packages

Slide 41

Advantages of Being In-Kernel (1)
▪ Kernel-reserved interfaces
  ▪ RCU, container features
▪ Shared components
  ▪ FS-Cache, keyrings
  ▪ Crypto
▪ Tracepoints
▪ Debugging facilities

Slide 42

Advantages of Being In-Kernel (2)
▪ Developers making big changes in-kernel must make the changes throughout
▪ Automated testing
▪ Distributions build the filesystem modules with distribution kernels
▪ Much more widely available
  ❑ Note: not in any enterprise distros yet

Slide 43

Recent AuriStorFS Enhancements

Slide 44

Large File Write Performance
▪ A 1TB file copy takes 3.7 hours, compared to 11 hours with v0.188

Slide 45

Large File Write Performance – No Soft Lockups on Linux
▪ Previously, a 1TB file copy produced 288 soft lockup BUGs
  ▪ Soft lockups are logged when a processor’s non-maskable interrupt watchdog thread is unable to execute for watchdog_thresh seconds
  ▪ In this case, the soft lockups were the result of the computational complexity of the Store Segments algorithm, which was dependent upon the size of the file being saved
  ▪ [157841.618236] watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [dd:959736]
  ▪ [157841.618291] RIP: 0010:afs_StoreAllSegments+0x407/0xd20 [yfs]
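
For reference, the soft-lockup window is the standard kernel watchdog_thresh tunable; a quick check (the value shown is the common default, not specific to these tests):

    # Soft-lockup detection window, in seconds
    cat /proc/sys/kernel/watchdog_thresh
    # 10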

Slide 46

RX Security – Crypto Engine
▪ AuriStorFS leverages the operating system vendor’s crypto engine
  ▪ FIPS certification, speed, access to hardware accelerators
  ▪ Integrated support into Heimdal’s hcrypto framework
  ▪ Microsoft Crypto Next Gen (WinCNG), Apple Common Crypto, Solaris PKCS#11, Linux OpenSSL
▪ Simon Wilkinson’s new crypto library leverages the latest x86/x64 instructions
  ▪ Intel Advanced Encryption Standard New Instructions (AES-NI)
  ▪ Streaming Single Instruction Multiple Data Extensions (SSE, SSE2, SSE3)
  ▪ Advanced Vector Extensions (AVX, AVX2)
  ▪ to encrypt, decrypt, sign, and verify RX packets at a high level of efficiency
▪ Compared against the RFC 3961 kernel implementation contributed by AuriStor to OpenAFS in 2010, and against userland macOS Common Crypto (built from OpenSSL)
▪ Faster crypto – reduced latency
▪ Available for AMD64 and ARMv8 platforms

Slide 47

RX Security New Crypto Engine – Intel Results
▪ Measurements were performed on a MacBook Air
  ▪ macOS Sierra
  ▪ Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz (Haswell, 22nm)
  ▪ Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2
  ▪ Intel® AES New Instructions
▪ Time to compute 5,000,000 rxgk packets in a single thread
  ▪ Worst case for encryption is MIT with built-in crypto at 203 seconds (24,630 packets/sec)
  ▪ Best case is 23 seconds (217,391 packets/sec)

Speedup of the new library relative to:

| Operation       | RFC 3961 (kernel) | CommonCrypto | MIT Krb5 (built-in) | MIT Krb5 (OpenSSL) |
|-----------------|-------------------|--------------|---------------------|--------------------|
| Encryption      | 6x                | 3x           | 9x                  | 3.5x               |
| Decryption      | 5x                | 5x           | 14x                 | 6x                 |
| Sign and Verify | 3x                | 2x           | 9x                  | 2x                 |

Slide 48

Volume Transaction Lifecycle Improvements
▪ New or updated commands:
  ▪ “vos status”, “vos movesite”, “vos copysite”, “vos eachfs”, “vos eachvl”, “vos eachvol” (see the example below)
▪ Improved behavior for system administrators
  ▪ The YFSVOL RPC suite removes race conditions associated with volume transactions
  ▪ Volservers automatically garbage collect temporary or partial volumes created by interrupted or failed “vos” commands
  ▪ Read-only volumes can be moved between servers, or between partitions on the same server
  ▪ Read-write volume moves recreate the backup volume on the destination
    ▪ Avoids loss of the nightly backup if a move occurs between snapshot and backup
▪ Improved status reporting
  ▪ Bytes sent and received for each call bound to a transaction
  ▪ Search transactions by volume group
▪ The volserver validates volume transaction metadata against the location database for consistency checks
▪ Coordinated shutdown of volume transactions during volserver shutdown
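
For instance, active transactions can be inspected with vos status, using standard vos syntax (server and cell names illustrative):

    # List active volume transactions on a fileserver
    vos status -server fs1.example.com -cell example.com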

Slide 49

The “wait 15 minutes between volume releases” rule
▪ The atomic publishing workflow involves writing related changes to a RW volume and publishing the change set to clients via “vos release” (see the sketch below)
▪ AFS lore holds that any volume must not be “released” more frequently than once every ten to fifteen minutes
  ▪ Releasing volumes more often can result in transient data access failures
  ▪ This wisdom has been based upon empirical evidence, with no explanation of why it is true
▪ An AuriStorFS end user wished to release changes every two minutes
  ▪ Failures were traced to the UNIX cache manager’s handling of VOFFLINE errors, VLSF_DONTUSE flags, and a periodic ten-minute daemon check
▪ Starting with the March 2021 client release, there is no longer a limit on the frequency of volume releases
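
A minimal sketch of the atomic publish cycle (volume name and paths illustrative; by convention the read-write instance is reached via the cell's dotted path):

    # 1. Write the change set into the read-write volume
    cp new-config /afs/.example.com/project/app/config
    # 2. Publish atomically to all read-only replicas
    vos release project.app -verbose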

Slide 50

Development by the numbers since 1 Jan 2019
▪ 8 contributing developers
▪ 21,245 code review submissions
▪ 129 errors detected by Coverity
▪ 3545 commits
▪ 2268 files changed: 172,506 insertions(+), 105,934 deletions(-)
▪ ~2000 new tests (~8400 total)
▪ 163,044 continuous integration builds spanning 36 platforms
▪ 11 releases

[Chart: “Source Commits By Month”, AuriStorFS vs. OpenAFS, March 2012 – March 2021]

Slide 51

AuriStorFS Supported Platforms

Slide 52

Linux Platform Support
▪ Distributions:
  ▪ Red Hat Enterprise Linux 8.4, 7.9, 6.10 and Extended Support
  ▪ Red Hat Fedora 32, 33, 34
  ▪ Debian Bullseye, Buster, Stretch, Trust
  ▪ Ubuntu 20.04, 18.04, 16.04, 14.04
  ▪ CentOS 6, 7, 8
  ▪ Amazon Linux 2
  ▪ Oracle Linux
▪ Architectures:
  ▪ x86_64
  ▪ aarch64
  ▪ PPC-LE (no hardware-accelerated crypto)

Slide 53

macOS Platform Support
• 11 – Big Sur (Apple Silicon and Intel)
• 10.15 – Catalina
• 10.14 – Mojave
• 10.13 – High Sierra
• 10.12 – Sierra
Each platform has been supported within one day of Apple’s release, including support for Apple Silicon with hardware-accelerated cryptography on 12 November 2020.

Slide 54

Windows Support
• All Intel Windows 10 versions
• Back-level support to Windows 7 if SHA256 signature support is installed
• Even Windows 11

Slide 55

AuriStorFS Futures

Slide 56

Reinvestment in the Community
▪ AuriStor recognizes that its continued success requires a healthy ecosystem around us
▪ AuriStor is proud to sponsor the work of these critical organizations:
  ▪ USENIX Benefactor
  ▪ LISA Gold Sponsor
  ▪ VAULT Sponsor
  ▪ The Linux Foundation Silver Member
  ▪ Open Source Security Foundation Board Member
  ▪ SPEC Open Systems Group Member

Slide 57

On-going Investment in Linux
▪ AuriStor will continue to support and develop its out-of-tree client
  ▪ It is faster to fix bugs and develop new features and functionality there
▪ But we believe the long-term future depends upon the success of the in-tree afs filesystem and AF_RXRPC
▪ AuriStor will continue to partner with David Howells on the development and QA testing of kafs and AF_RXRPC, and will continue to invest in their adoption by downstream Linux distributions

Slide 58

Fine-grained volume properties
▪ At present, the granularity for enabling OpenAFS-incompatible volume capabilities is per fileserver:
  ▪ Per-file ACLs
  ▪ Hard links
  ▪ Large directories
▪ This is problematic because the owner of a volume might want to ensure OpenAFS compatibility
▪ This summer AuriStor will introduce per-volume feature management

Slide 59

Fileserver RPC Refresh – 2021 Edition
▪ New FetchStatus fields
  ▪ Server creation time
▪ New RPCs
  ▪ Fetch Status for User List
  ▪ Server-side silly rename
  ▪ New Rename variants (Replace, Replace Silly, No Replace, Exchange)
    ▪ All return status information on renamed vnodes
  ▪ Callbacks for hard and symbolic links
  ▪ New Store Data modes to handle short writes (Zero Fill, Short OK)
  ▪ Append Data
  ▪ Copy File Range for server-side copies, including between volumes
  ▪ New Fetch Data returns metadata before data in case of interruption
  ▪ New WhoAmI returns the fileserver’s cell name
▪ Enabled by a Cache Manager capability bit; once enabled, all old RPCs are rejected by the fileserver

Slide 60

Credits

Slide 61

2016 ACM Software System Award
• AuriStor stands on the shoulders of those who came before us:
  • Mahadev Satyanarayanan
  • Michael L. Kazar
  • Robert N. Sidebotham
  • David A. Nichols
  • Michael J. West
  • John H. Howard
  • Alfred Z. Spector
  • Sherri M. Nichols

Slide 62

U.S. Department of Energy SBIR Commercialization Grant

Slide 63

Licensing

Slide 64

Pricing
▪ Perpetual license starts at US$21,000
  ▪ 4 DB Service Instances
  ▪ 4 File Service Instances
  ▪ Unlimited client devices
  ▪ 1000 Protection Records for User and Managed Device Entities
  ▪ One year of web support and feature updates included
▪ Annual licensing for continued support and feature updates

Slide 65

Source code licensing
▪ AuriStorFS source code is available (under contract) to licensees
▪ AuriStor believes in the benefits of community development
▪ Multi-platform distributed infrastructures such as AuriStorFS require sustained investment
▪ Global public goods are taken for granted

Slide 66

More Information

Slide 67

Want to Learn More?
▪ History of AFS Tutorial: CERN Education series
  ▪ https://indico.cern.ch/event/126258/
▪ Future of AFS: CERN Education series
  ▪ https://indico.cern.ch/event/126259/
▪ Leveraging AFS Storage Systems to Ease Global Software Deployment
  ▪ https://www.usenix.org/conference/lisa21/presentation/dimarco

Slide 68

No content