file system on all major OSes
Client support built into Linux kernel
Embrace multi-producer, multi-consumer work-flows
Be performance competitive
Lustre, GPFS, Panasas, NFSv4, …
Best of breed data security
Wire privacy and integrity protection policy guarantees
Combined identity authentication
Multi-factor authorization
Geographic isolation
Improved Ease of Use
No Flag Days
authz
▪ Volume and File Server policies
▪ Data privacy for all services
▪ Need to know access policies (GDPR)
▪ AES256 wire privacy
▪ Protection against cache poisoning attacks
▪ Networking
▪ IPv6
▪ Can saturate multiple bonded 10Gbit NICs
▪ File and DB Server Performance and Scale
▪ Dynamic per-service worker pools
▪ Reduced resource contention
▪ Supports multi-producer, multi-consumer workflows
▪ UBIK DB quorum
▪ Accelerated establishment
▪ Membership up to 80 servers per cell
▪ 2038 compatible
▪ Simplified configuration management
▪ Robust continuous integration test suite
▪ USENIX LISA21 “Hands off testing for Network Filesystems”
▪ Coverity analysis
▪ Docker container deployments
▪ Servers run as unprivileged user; not “root”
▪ Arbitrary file server vice partition backing store
▪ Any POSIX file system – local or remote
▪ Object stores
▪ Enumeration “each” commands for servers, volumes, and protection entities
namespace has a long history of use
▪ Federated authentication
▪ Home and shared Project directories
▪ Cross platform distributed access to files and directory trees over the WAN
▪ Anything that benefits from atomic publication and/or read-only replication
▪ Software distribution
▪ Static web content distribution
▪ Global data replication and live data migration
▪ New use cases include
▪ Persistent storage for containerized processes
▪ Distribution of container images
/afs now also handles work-flows that require
▪ Thousands of nodes modifying a common set of objects (multiple writer, multiple reader)
▪ Hundreds of processes on a single multi-processor thread client system
▪ Robust multi-factor authorization and security (auth/integ/priv) requirements
▪ End users expect their data to be available out of the box with no third-party software
▪ Linux native AFS (kafs) and AF_RXRPC now provide an out-of-the-box AuriStorFS client on Fedora, Debian, and Ubuntu
Our partners push the limits of /afs without fear! No more walking on eggshells.
free
▪ If you don’t sleep, we don’t sleep
▪ The AuriStor team will perform a thorough root cause analysis and fix the problem no matter how big or small; no matter how long it takes
transfer RX calls
▪ 19 March 2018 – World’s first RX call to send more than 5.63TB
▪ World’s largest AFS volume – 500TB
▪ Fully functional – migration, replication, backup, restore
▪ Volume quotas up to 16 Zettabytes
▪ 500,000 sustained RX connections per fileserver
▪ 40,000 Ubik queries/second and 25 writes/second sustained (See “Ubik Services at Scale” talk on Wednesday)
▪ Linux cache manager scaling beyond 64 simultaneous processor threads with minimal resource contention (See “Juggling Bottlenecks” talk)
▪ 5TB and larger files
& Privacy
Geographic Replication of Critical Data
Atomic Publishing Model
One File System for all
Fine Grained Access Control
Federated Authentication
Platform specific path resolution (@sys)
Platform Architecture Independence
Distributed Administration
Multi-factor Access Control Lists
• grant more rights, not fewer, when used to evaluate an ACL's Normal entries
• revoke more rights, not fewer, when used to evaluate an ACL's Negative entries
https://www.auristor.com/documentation/man/linux/7/auristorfs_acls.html
v5 None
Integrity Protection | Yes | Yes | No
Privacy | AES256-CTS-HMAC-SHA1-96, AES256-CTS-HMAC-SHA512-384 | fcrypt | No
Rekeying | Yes | No | No
Max Pkts / Call | 2^64 | 2^30 | unlimited
# identities | one or more (ordered) | one | none
Combined Key | Yes | No | No
replicable object stores
Security Policies mandate the use of RX Security Classes and Data Protection levels (integrity protection / wire privacy). They also determine which servers are permitted to store which objects.
Maximum ACLs restrict the access rights that can be granted via per-object ACLs.
as documented in RFC6675 with elements of New Reno from RFC6582 on top of TCP-style congestion control elements as documented in RFC5681
Maximum window size increased to 65535 packets from 32
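The practical effect of the larger window can be estimated with bandwidth-delay arithmetic: a sender can keep at most one window of unacknowledged packets in flight per round trip. The sketch below is only an upper-bound estimate; the 1400-byte payload per packet and the round-trip times are illustrative assumptions, not figures from the slides.

```python
# Upper bound on per-call throughput for a windowed protocol:
#   throughput <= window_packets * payload_bytes / rtt_seconds
# The payload size and round-trip times are illustrative assumptions.

PAYLOAD_BYTES = 1400  # assumed payload per packet, not from the source


def max_throughput_bytes_per_sec(window_packets: int, rtt_seconds: float,
                                 payload_bytes: int = PAYLOAD_BYTES) -> float:
    """Return the window-limited throughput ceiling in bytes per second."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_packets * payload_bytes / rtt_seconds


if __name__ == "__main__":
    for rtt_ms in (1, 10, 100):
        rtt = rtt_ms / 1000
        old = max_throughput_bytes_per_sec(32, rtt)
        new = max_throughput_bytes_per_sec(65535, rtt)
        print(f"RTT {rtt_ms:>3} ms: window 32 -> {old / 1e6:.2f} MB/s, "
              f"window 65535 -> {new / 1e6:.2f} MB/s")
```

Under these assumptions a 32-packet window caps a call well below link speed on any long-haul path, which is why the window limit matters most over the WAN.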
• Atomic Create and Lock
Rename Variants
• Replace existing target
• NoReplace – fail if existing target
• Exchange
Append Data
• Return offset at completion
Store Data Variants
• Reserve
• Reserve and Truncate
• Reserve and ZeroFill
Server Side Operations
• Silly Rename
• Copy Object
• Copy Range
• Move Object
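The three rename variants differ only in how they treat an existing target. A minimal sketch of those semantics, modelled on an in-memory directory (a dict of name to object id), might look like the following. The `RenameMode` enum and `rename` function are hypothetical names used for illustration; this is not the RPC interface.

```python
from enum import Enum


class RenameMode(Enum):
    REPLACE = "replace"      # overwrite an existing target
    NOREPLACE = "noreplace"  # fail if the target exists
    EXCHANGE = "exchange"    # swap source and target


def rename(directory: dict, src: str, dst: str, mode: RenameMode) -> None:
    """Apply one rename variant to a toy directory (name -> object id)."""
    if src not in directory:
        raise FileNotFoundError(src)
    if mode is RenameMode.EXCHANGE:
        # Exchange requires both names to exist; neither entry is lost.
        if dst not in directory:
            raise FileNotFoundError(dst)
        directory[src], directory[dst] = directory[dst], directory[src]
        return
    if mode is RenameMode.NOREPLACE and dst in directory:
        raise FileExistsError(dst)
    # REPLACE, or NOREPLACE with a free target name.
    directory[dst] = directory.pop(src)


if __name__ == "__main__":
    d = {"a": 1, "b": 2}
    rename(d, "a", "b", RenameMode.EXCHANGE)
    print(d)
```

Linux exposes comparable semantics to applications through the `RENAME_NOREPLACE` and `RENAME_EXCHANGE` flags of `renameat2()`.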
data change
Secure Callback notifications
Issued to all subscribed clients for data and metadata change.
Delivered before completion of the RPC to ensure serialization of object updates and subsequent communications via other channels.
Cached data is not considered valid unless an outstanding callback promise exists.
Callback promises expire after a period determined by the object server.
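The validity rule for cached data can be sketched as a small client-side check: an entry is usable only while an unexpired callback promise is outstanding, and a callback notification from the server cancels the promise. The `CacheEntry` class and its method names are hypothetical and greatly simplified; they illustrate the rule and are not the cache manager's actual data structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CacheEntry:
    data: bytes
    # Absolute expiry time of the callback promise; None means no promise.
    callback_expiry: Optional[float] = None

    def grant_callback(self, now: float, lifetime: float) -> None:
        """Record a promise whose lifetime was chosen by the server."""
        self.callback_expiry = now + lifetime

    def break_callback(self) -> None:
        """Server reported a data or metadata change: drop the promise."""
        self.callback_expiry = None

    def is_valid(self, now: float) -> bool:
        """Cached data is valid only under an outstanding, unexpired promise."""
        return self.callback_expiry is not None and now < self.callback_expiry


if __name__ == "__main__":
    entry = CacheEntry(b"contents")
    entry.grant_callback(now=0.0, lifetime=60.0)
    print(entry.is_valid(now=30.0), entry.is_valid(now=61.0))
```

Passing `now` explicitly keeps the sketch deterministic; a real client would use a monotonic clock and refetch status whenever `is_valid` returns false.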
kernel
Compiled as a module by Fedora, Debian
Module signed by build process
kafs-client package (rpm, deb)
Sets home cell
Handles DNS cell lookups
/afs started by systemd during boot
Basic Kerberos authentication
is shared across Linux, Solaris, AIX, macOS, and various BSD flavors; and a userland library
▪ Monolithic kernel module
▪ Combines the filesystem, rx network stack, and data/metadata cache management.
▪ The internal locking model for inodes and dentries is an imperfect match for all vfs implementations.
▪ One Mounted Device For All Volumes
▪ The entire file namespace is exposed to the vfs as a single device traditionally mounted on /afs
on Linux, but will work with any arch
▪ Modular design
▪ Userspace-accessible AF_RXRPC network protocol driver
▪ FS-Cache caching, shared with NFS, CIFS, …
▪ Uses native Linux VFS locking model
▪ Each volume has its own superblock
▪ Vnode IDs map to inode numbers
▪ df works correctly
▪ Linux VFS automounter handles AFS mountpoints
▪ Individual volumes can be mounted anywhere
Parts of kernel API not accessible to non-GPL
▪ Massive continuous building effort to support many kernels across distributions
▪ User must find and install the packages
in-kernel must make the changes throughout
▪ Automated testing
▪ Distributions build the filesystem modules with distribution kernels
▪ Much more widely available
❑ Note: Not in any enterprise distros yet
Linux
▪ 1TB file copy produced 288 soft lockup BUGs
▪ Soft lockups are logged when a processor’s non-maskable interrupt watchdog thread is unable to execute for more than 2 × watchdog_thresh seconds (20 seconds with the default watchdog_thresh of 10)
▪ In this case, the soft lockups were the result of the computational complexity of the Store Segments algorithm, which was dependent upon the size of the file being saved
▪ [157841.618236] watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [dd:959736]
▪ [157841.618291] RIP: 0010:afs_StoreAllSegments+0x407/0xd20 [yfs]
[Figure: before and after]
vendor’s crypto engine
▪ FIPS certificate, speed, access to hardware accelerators
▪ Integrated support into Heimdal’s hcrypto framework
▪ Microsoft Crypto Next Gen (WinCNG), Apple Common Crypto, Solaris PKCS#11, Linux OpenSSL
▪ Simon Wilkinson’s new crypto library leverages the latest x86/x64 instructions
▪ Intel Advanced Encryption Standard New Instructions (AES-NI)
▪ Streaming Single Instruction Multiple Data Extensions (SSE, SSE2, SSE3)
▪ Advanced Vector Instructions (AVX, AVX2)
▪ to encrypt, decrypt, sign and verify RX packets at a high level of efficiency
▪ Compared to rfc3961 kernel implementation contributed by AuriStor to OpenAFS in 2010; and userland OSX Common Crypto (built from OpenSSL)
▪ Faster crypto -- Reduced latency
▪ Available for AMD64 and ARM8 platforms
were performed on a MacBook Air
▪ macOS Sierra
▪ Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz (Haswell 22nm)
▪ Intel® SSE4.1, Intel® SSE4.2, Intel® AVX2
▪ Intel® AES New Instructions
▪ Time to compute 5,000,000 rxgk packets in a single thread
▪ Worst case for encryption is MIT with built-in crypto at 203 seconds (or 24,630 packets/sec)
▪ Best case is 23 seconds (or 217,391 packets/sec)

Faster than -> | RFC3961 (kernel) | CommonCrypto | MIT Krb5 (built-in) | MIT Krb5 (OpenSSL)
Encryption | 6x | 3x | 9x | 3.5x
Decryption | 5x | 5x | 14x | 6x
Sign and Verify | 3x | 2x | 9x | 2x
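The packets-per-second figures follow directly from the packet count and the elapsed times quoted above. The short check below reproduces them; the function name is chosen here for illustration.

```python
PACKETS = 5_000_000  # rxgk packets processed in each benchmark run


def packets_per_second(elapsed_seconds: float, packets: int = PACKETS) -> int:
    """Whole packets processed per second, truncated as in the quoted figures."""
    return int(packets / elapsed_seconds)


if __name__ == "__main__":
    worst = packets_per_second(203)  # MIT Krb5 with built-in crypto
    best = packets_per_second(23)    # best case quoted above
    print(worst, best, round(203 / 23, 1))
```

The ratio of the two elapsed times is about 8.8, which is in line with the roughly 9x encryption figure listed against MIT Krb5 (built-in), assuming the 23-second best case refers to the new library.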
“vos status”, “vos movesite”, “vos copysite”, “vos eachfs”, “vos eachvl”, “vos eachvol”
▪ Improved behavior for system administrators
▪ YFSVOL RPC suite removes race conditions associated with volume transactions
▪ volservers automatically garbage collect temporary or partial volumes created by interrupted or failed “vos” commands
▪ Read-only volumes can be moved between servers or partitions on the same server
▪ Read-write volume moves recreate the backup volume on the destination
▪ Avoids loss of nightly backup if move occurs between snapshot and backup
▪ Improved status reporting
▪ Bytes sent and received for each call bound to a transaction
▪ Search transactions by volume group
▪ Volserver validates volume transaction metadata against location database for consistency checks
▪ Coordinated shutdown of volume transactions during volserver shutdown
atomic publishing workflow involves writing related changes to a RW volume and publishing the change set to clients via “vos release”
▪ AFS-lore indicates that any volume must not be "released" more frequently than once every ten to fifteen minutes.
▪ Releasing volumes more often can result in transient data access failures. This wisdom has been based upon empirical evidence with no explanation of why this is true.
▪ An AuriStorFS end-user wished to release changes every two minutes.
▪ Failures traced to UNIX cache manager handling of VOFFLINE errors, VLSF_DONTUSE flags, and a periodic ten-minute daemon check.
▪ Starting with the March 2021 client release, there is no longer a limit to the frequency of volume releases
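A publishing job that releases a volume on a fixed interval, such as the two-minute cadence mentioned above, could be sketched as follows. This is an illustration only: the volume name `proj.web` is hypothetical, the helper names are invented here, and the sketch assumes `vos release -id <volume>` is run by a user already authorized to release the volume. The command runner and sleep function are injectable so the loop can be exercised without a cell.

```python
import subprocess
import time
from typing import Callable, List


def release_command(volume: str) -> List[str]:
    """Build the argv for releasing a volume's read-only replicas."""
    return ["vos", "release", "-id", volume]


def publish_periodically(volume: str, interval_seconds: float, iterations: int,
                         run: Callable = subprocess.run,
                         sleep: Callable = time.sleep) -> None:
    """Release `volume` `iterations` times, pausing between releases."""
    for i in range(iterations):
        # check=True raises CalledProcessError if the release fails.
        run(release_command(volume), check=True)
        if i < iterations - 1:
            sleep(interval_seconds)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    publish_periodically("proj.web", 120, 3,
                         run=lambda cmd, check: print(" ".join(cmd)),
                         sleep=lambda seconds: None)
```

Writers update the RW volume between iterations, and each release publishes the accumulated change set to clients in one step.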
Intel)
10.15 – Catalina
10.14 – Mojave
10.13 – High Sierra
10.12 – Sierra
Each platform supported within one day of Apple release; including support for Apple Silicon with hardware accelerated cryptography on 12 November 2020.
success requires a healthy ecosystem around us
▪ AuriStor is proud to sponsor the work of these critical organizations
▪ USENIX Benefactor
▪ LISA Gold Sponsor
▪ VAULT Sponsor
▪ The Linux Foundation Silver Member
▪ Open Source Security Foundation Board Member
▪ SPEC Open Systems Group Member
and develop its out-of-tree client
▪ Faster to fix bugs and develop new features and functionality
▪ But we believe the long-term future depends upon the success of the in-tree afs filesystem and AF_RXRPC
▪ AuriStor will continue to partner with David Howells on the development and QA testing of kafs and AF_RXRPC and will continue to invest in its adoption by downstream Linux distributions
enablement of OpenAFS incompatible volume capabilities is per fileserver
▪ Per-file ACLs
▪ Hard Links
▪ Large Directories
▪ This choice is problematic because the owner of a volume might want to ensure OpenAFS compatibility
▪ This summer AuriStor will introduce per-volume feature management
▪ Server Creation Time
▪ New RPCs
▪ Fetch Status for User List
▪ Server-side silly rename
▪ New Rename variants (Replace, Replace Silly, No Replace, Exchange)
▪ All return status information on renamed vnodes
▪ Callbacks for Hard and Symbolic Links
▪ New Store Data modes to handle short writes (Zero Fill, Short OK)
▪ Append Data
▪ Copy File Range for server-side copies including between volumes
▪ New Fetch Data returns metadata before data in case of interruption
▪ New WhoAmI returns the fileserver’s cell name
▪ Enabled by Cache Manager Capability bit. Once enabled, all old RPCs are rejected by the fileserver.
shoulders of those that came before us
• Mahadev Satyanarayanan
• Michael L. Kazar
• Robert N. Sidebotham
• David A. Nichols
• Michael J. West
• John H. Howard
• Alfred Z. Spector
• Sherri M. Nichols
Service Instances
▪ 4 File Service Instances
▪ Unlimited client devices
▪ 1000 Protection Records for User and Managed Devices Entities
▪ One year of web support and feature updates included
▪ Annual licensing for continued support and feature updates
contract) to licensees
▪ AuriStor believes in the benefits of community development
▪ Multi-platform distributed infrastructures such as AuriStorFS require sustained investment
▪ Global Public Goods are taken for granted
Education series
▪ https://indico.cern.ch/event/126258/
▪ Future of AFS: CERN Education series
▪ https://indico.cern.ch/event/126259/
▪ Leveraging AFS Storage Systems to Ease Global Software Deployment
▪ https://www.usenix.org/conference/lisa21/presentation/dimarco