quFiles: The Right File at The Right Time

Emaad Manzoor
November 05, 2012

Paper authored by Kaushik Veeraraghavan, Jason Flinn, Edmund B. Nightingale and Brian Noble, presented by Rachee Singh and myself at our Data Storage Technologies & Networks Reading Group.

Transcript

  1. The Problem: Example Scenarios.

    • Transcode data for small-screen devices.
    • Provide low-quality streaming for low-bandwidth networks.
    • Redact data files at insecure locations.
  2. The Problem: Reinventing The Wheel.

    Each of the scenarios above was built as a separate system, over years of effort.
  3. The Problem: Reinventing The Wheel.

    Most require application-level or operating-system-level changes.
  4. The Problem: Reinventing The Wheel.

    Each is a separate implementation of a more fundamental abstraction.
  5. The Solution: Description.

    • A quFile encapsulates different physical representations of the same logical data.
    • The particular representation is not determined until it is needed, much like a qubit.
    • quFiles provide this abstraction as a first-class file system entity.
  6. Outline.

    1. Design Goals
       • Transparent to the quFile-unaware
       • Powerful to the quFile-aware
       • Capable of static and dynamic operation
       • Flexible for policy writers
    2. Implementation
    3. Case Studies
    4. Results
  7. Design Goals: I. Transparent to the quFile-unaware.

    • A quFile-unaware application sees only one view of the file: the default view.
    • The application has no idea that other views of the file are available.
    • The data returned may change with context, but the mechanism for retrieving it stays the same.
  8. Design Goals: II. Powerful to the quFile-aware.

    • quFile representations are called views.
    • quFile-aware applications can access the raw view of the quFile.
    • quFile-aware users can access a specific view by using a filename suffix.
  9. Design Goals: III. Capable of static and dynamic operation.

    • "Static" and "dynamic" refer to how context-specific data is resolved and delivered; the resolved representation need not exist until it is requested.
    • Dynamic resolution: when a video is requested in a low-bandwidth context, transcode it on the fly and then deliver it.
    • Static resolution: store pre-transcoded videos and serve the appropriate one depending on the context.
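A minimal sketch of the two modes, assuming a hypothetical `transcode` function (it only tags the bytes) and a dict standing in for the quFile's stored representations. `VideoQuFile`, `store_static` and `resolve` are illustrative names, not part of quFiles: static resolution serves a representation prepared ahead of time, dynamic resolution creates it on the first request.

```python
from typing import Callable, Dict


# Hypothetical stand-in for a real transcoder: it only tags the data
# with the target quality so the example stays self-contained.
def transcode(data: bytes, quality: str) -> bytes:
    return f"[{quality}]".encode() + data


class VideoQuFile:
    """Hypothetical model of one logical video with several representations."""

    def __init__(self, original: bytes,
                 transcoder: Callable[[bytes, str], bytes] = transcode) -> None:
        self.original = original
        self.transcoder = transcoder
        # Representations that already exist, keyed by quality.
        self.views: Dict[str, bytes] = {"high": original}
        self.transcode_calls = 0  # counts on-the-fly (dynamic) transcodes only

    def store_static(self, quality: str) -> None:
        """Static resolution: pre-transcode ahead of time and store the result."""
        self.views[quality] = self.transcoder(self.original, quality)

    def resolve(self, quality: str) -> bytes:
        """Serve a stored view if present; otherwise transcode on demand."""
        if quality not in self.views:
            self.transcode_calls += 1
            self.views[quality] = self.transcoder(self.original, quality)
        return self.views[quality]
```

This sketch keeps the on-the-fly result, so a second request for the same quality is served as if it had been stored statically; whether to keep it is a separate design choice.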
  10. Design Goals: IV. Flexible for policy writers.

    • Developers shouldn't need to write much extra code, so policies must be short code modules.
    • Badly written policies shouldn't compromise the integrity of the rest of the system in any way.
    • The mechanism for detecting the context, and for loading and applying the policy, must be transparent to the developer.
  11. Implementation: I. Scenario Description.

    Goal: return videos formatted for the device requesting them.
    [Diagram: file system, transcoder, file system change notification; quFile with name, content and cache policies.]
    Formats and sizes for different devices:
    • DVR: *.TiVo, high quality
    • Laptop: *.mp4, medium quality
    • Mobile: *.mp4, low quality
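The device table above can be read as a lookup that the name and content policies consult. This sketch assumes the requesting device type is already known to the policy (context detection is not shown) and uses a hypothetical naming scheme such as `foo.low.mp4` for the representations stored inside the quFile.

```python
from typing import Dict, Tuple

# Device -> (extension shown to the application, quality of the data),
# taken from the slide's table.
FORMATS: Dict[str, Tuple[str, str]] = {
    "dvr": ("TiVo", "high"),
    "laptop": ("mp4", "medium"),
    "mobile": ("mp4", "low"),
}


def name_policy(base: str, device: str) -> str:
    """Logical name the application sees, e.g. 'foo.TiVo' on the DVR."""
    ext, _ = FORMATS[device]
    return f"{base}.{ext}"


def content_policy(base: str, device: str) -> str:
    """Hypothetical file name of the stored representation inside the quFile."""
    ext, quality = FORMATS[device]
    return f"{base}.{quality}.{ext}"
```

Here the laptop and the phone both see `foo.mp4`, but each is served different data.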
  12. Implementation: II. Background: BlueFS.

    • Open-source file system, developed at the University of Michigan, Ann Arbor.
    • Server-based distributed file system.
    • Supports mobile devices as well as traditional computers.
    • Components: a kernel module, which manages data in kernel caches, and a user-level daemon, which handles all VFS operations.
  13. Implementation: III. Views.

    • Default view: usually the most constrained view.
    • Custom view: some physical representation of the same logical data.
    • Raw view: lists all views.
    Example, a versioned file system:
    • Default view: the current version.
    • Raw view: all versions.
    • Custom view: yesterday's version.
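A sketch of the versioned-file-system example, assuming versions are kept in memory as (date, contents) pairs. `VersionedQuFile` and its method names are hypothetical; they only illustrate how the three kinds of view differ.

```python
from datetime import date
from typing import List, Optional, Tuple


class VersionedQuFile:
    """Hypothetical in-memory model; not the BlueFS implementation."""

    def __init__(self) -> None:
        self.versions: List[Tuple[date, str]] = []  # kept in date order

    def write(self, day: date, contents: str) -> None:
        self.versions.append((day, contents))
        self.versions.sort(key=lambda v: v[0])

    def default_view(self) -> str:
        # Default view: the current (latest) version.
        return self.versions[-1][1]

    def raw_view(self) -> List[Tuple[date, str]]:
        # Raw view: every version.
        return list(self.versions)

    def custom_view(self, as_of: date) -> Optional[str]:
        # Custom view: the newest version written on or before `as_of`,
        # e.g. yesterday's version. None if nothing existed yet.
        match = None
        for day, contents in self.versions:
            if day <= as_of:
                match = contents
        return match
```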
  14. Implementation: IV. Physical Representation.

    • A quFile is a new type of file system object.
    • Physically (on disk), it is identical to a directory.
      ◦ But in a directory, file resolution is static.
    • quFiles need to hide from their parent directories: their entries in the parent are replaced by the resolved files.
      ◦ This requires modifying VFS operations, but only for the *.qufile namespace.
    Example:
    ~$ mkdir foo.quFile
    ~$ mv /tmp/foo.mp4 foo.quFile
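The two shell commands above can be reproduced in Python to show that, on an ordinary file system, a quFile is just a directory holding its representations. This sketch uses a temporary directory in place of `~` and `/tmp`; `make_qufile_demo` is a hypothetical helper. Without the quFile-aware VFS changes, the parent directory still lists `foo.quFile` itself rather than the resolved name.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Tuple


def make_qufile_demo() -> Tuple[List[str], List[str]]:
    """Return (parent listing, quFile listing) after mkdir + mv."""
    # Temporary directory stands in for ~ and /tmp.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        video = root / "foo.mp4"
        video.write_bytes(b"video data")        # stands in for /tmp/foo.mp4
        home = root / "home"
        home.mkdir()
        qufile = home / "foo.quFile"
        qufile.mkdir()                          # ~$ mkdir foo.quFile
        shutil.move(str(video), str(qufile))    # ~$ mv /tmp/foo.mp4 foo.quFile
        parent = sorted(p.name for p in home.iterdir())
        inside = sorted(p.name for p in qufile.iterdir())
        return parent, inside
```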
  15. Implementation: V. Policies.

    • Policies govern the behaviour and resolution of quFiles.
    • That is, given a quFile name, the policies decide the actual file name and the inode, and therefore what data is read from disk.
    • Policies are stored as shared libraries in the file system.
    • quFiles contain links to these shared libraries.
  16. Implementation: V. Policies: Policy Types.

    1. Name
    2. Content
    3. Edit
    4. Cache
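One way to picture the four policy types is as a single interface with one hook per type. This is a hypothetical Python sketch: the real policies are shared libraries linked from the quFile, and their actual signatures are not given in these slides. `QuFilePolicy` and `PassThroughPolicy` are illustrative names.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Context = Dict[str, str]  # e.g. {"device": "mobile", "location": "office"}


class QuFilePolicy(ABC):
    """Hypothetical interface: one method per policy type."""

    @abstractmethod
    def names(self, views: List[str], ctx: Context) -> List[str]:
        """Name policy: logical names to show in the parent directory."""

    @abstractmethod
    def content(self, name: str, views: List[str], ctx: Context) -> str:
        """Content policy: which stored view backs a logical name."""

    @abstractmethod
    def allow_edit(self, name: str, ctx: Context) -> bool:
        """Edit policy: whether a modification is permitted."""

    @abstractmethod
    def cache(self, name: str, ctx: Context) -> List[str]:
        """Cache policy: which views to keep on the local disk."""


class PassThroughPolicy(QuFilePolicy):
    """Trivial policy: the quFile behaves like one ordinary file."""

    def names(self, views, ctx):
        return views[:1]

    def content(self, name, views, ctx):
        return views[0]

    def allow_edit(self, name, ctx):
        return True

    def cache(self, name, ctx):
        return [name]
```

Keeping each hook small matches the design goal that policies should be short code modules.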
  17. Implementation: V. Policies: 1. Name Policy.

    This policy allows quFiles to have different logical names in different contexts.
    • VFS readdir on the parent directory containing a quFile doesn't return the name of the quFile.
      ◦ Instead, it is modified to return the names of all logical representations of the data encapsulated within the quFile.
    • quFiles occupy the *.qufile namespace.
      ◦ On creation or renaming of foo.quFile, if there is any other file named foo.*, the operation is disallowed.
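The namespace rule can be expressed as a check run before creating or renaming a quFile. This sketch assumes the directory's existing entry names are available as a list and treats `.quFile` as the suffix (the slides write it both as `.quFile` and `.qufile`); `may_create_qufile` is a hypothetical name. The rationale: the name policy may surface `foo.mp4` or `foo.TiVo` for `foo.quFile`, so a real `foo.mp4` beside it would be ambiguous.

```python
from typing import Iterable

# Assumed suffix; the slides use both ".quFile" and ".qufile".
QUFILE_SUFFIX = ".quFile"


def may_create_qufile(new_name: str, existing: Iterable[str]) -> bool:
    """Disallow foo.quFile if any other entry in the directory matches foo.*"""
    if not new_name.endswith(QUFILE_SUFFIX):
        raise ValueError(f"not a quFile name: {new_name}")
    stem = new_name[: -len(QUFILE_SUFFIX)]
    prefix = stem + "."
    return not any(
        name != new_name and name.startswith(prefix) for name in existing
    )
```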
  18. Implementation: V. Policies: 1. Name Policy (readdir flow).

    1. The application calls readdir on a directory containing a quFile.
    2. The kernel module calls the BlueFS daemon.
    3. The daemon runs the name policy for the quFile.
    4. The name policy returns 0 to n names.
    5. The BlueFS kernel module calls filldir on each name.
    6. The application gets the entries returned via filldir.
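The readdir steps can be mimicked in user space with a callback playing the role of the kernel's `filldir`. Everything here is a hypothetical model: `daemon_run_name_policy` stands in for the call to the BlueFS daemon, and any entry ending in `.quFile` is treated as a quFile.

```python
from typing import Callable, Dict, List

NamePolicy = Callable[[List[str]], List[str]]


def daemon_run_name_policy(views: List[str], policy: NamePolicy) -> List[str]:
    """Hypothetical daemon side: run the quFile's name policy (0..n names)."""
    return policy(views)


def readdir(entries: Dict[str, List[str]],
            policies: Dict[str, NamePolicy],
            filldir: Callable[[str], None]) -> None:
    """Hypothetical kernel-module side: call filldir once per visible name.

    `entries` maps each directory entry to its contents; for a quFile the
    contents are its stored views. quFile names themselves are never emitted.
    """
    for name, views in entries.items():
        if name.endswith(".quFile"):
            for logical in daemon_run_name_policy(views, policies[name]):
                filldir(logical)
        else:
            filldir(name)
```

A name policy returning an empty list hides the quFile entirely, which is the "0" end of "0 to n names".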
  19. Implementation: V. Policies: 2. Content Policy.

    Lets quFiles have different contents in different contexts.
    1. Names are retrieved from the name policy.
    2. VFS lookup is called for each name.
    3. The modified lookup function returns the correct inode.
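A sketch of the content side, assuming the quFile's views are ordinary files in a directory and that a hypothetical content policy (`by_device`) maps the logical name plus context to one view; `os.stat(...).st_ino` then gives the inode number that a modified lookup would return. `lookup` here is an illustrative user-space function, not the VFS operation.

```python
import os
from typing import Callable, Dict

ContentPolicy = Callable[[str, Dict[str, str]], str]


def lookup(qufile_dir: str, logical_name: str,
           policy: ContentPolicy, ctx: Dict[str, str]) -> int:
    """Resolve a logical name to the inode number of the chosen view."""
    view = policy(logical_name, ctx)  # which physical file backs this name
    return os.stat(os.path.join(qufile_dir, view)).st_ino


def by_device(name: str, ctx: Dict[str, str]) -> str:
    # Hypothetical naming scheme: foo.mp4 -> foo.low.mp4 on mobile devices,
    # foo.medium.mp4 everywhere else.
    stem, ext = name.rsplit(".", 1)
    quality = "low" if ctx.get("device") == "mobile" else "medium"
    return f"{stem}.{quality}.{ext}"
```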
  20. Implementation: V. Policies: 3. Edit Policy.

    Specifies what modifications to the quFile are allowed.
    1. The quFile is modified.
    2. The BlueFS daemon is called, which runs the edit policy.
    3. If the modification is allowed, the modified contents are saved.
    4. If it is disallowed, the kernel returns a read-only error.
    5. If the modification creates a new version, it is saved and the changes are logged.
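The edit path can be sketched as a function that consults a policy and then saves, versions, or fails with a read-only error. `errno.EROFS` is the standard "read-only file system" error code; the three-way `Decision`, the in-memory `store` and the log format are assumptions made for illustration.

```python
import errno
import os
from enum import Enum
from typing import Callable, Dict, List


class Decision(Enum):  # hypothetical edit-policy answers
    ALLOW = "allow"
    DENY = "deny"
    NEW_VERSION = "new_version"


EditPolicy = Callable[[str], Decision]


def write_qufile(store: Dict[str, List[bytes]], name: str, data: bytes,
                 policy: EditPolicy, log: List[str]) -> None:
    """Apply a modification to `name` according to the edit policy."""
    decision = policy(name)
    if decision is Decision.DENY:
        # Report the refusal as a read-only error, as the kernel would.
        raise OSError(errno.EROFS, os.strerror(errno.EROFS), name)
    history = store.setdefault(name, [])
    if decision is Decision.NEW_VERSION:
        history.append(data)  # keep older versions and log the change
        log.append(f"{name}: version {len(history)}")
    else:
        history[:] = [data]   # plain overwrite
```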
  21. Implementation: V. Policies: 4. Cache Policy.

    • Cache insert: called when a file is read; specifies which content to cache on the device's local disk.
    • Cache eviction: decides whether or not cached contents should be evicted.
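A toy cache policy with both hooks, assuming a fixed local-disk budget in bytes and a per-file size limit. The slide only says the policy decides insertion and eviction; least-recently-read eviction is a stand-in choice here, not necessarily what quFiles policies do. `CachePolicy`, `on_read` and `evict` are hypothetical names.

```python
from collections import OrderedDict


class CachePolicy:
    """Hypothetical cache policy: insert on read, evict LRU when over budget."""

    def __init__(self, budget: int, max_item: int) -> None:
        self.budget = budget      # total bytes allowed on the local disk
        self.max_item = max_item  # never cache a file larger than this
        self.cached: "OrderedDict[str, int]" = OrderedDict()  # name -> size

    def on_read(self, name: str, size: int) -> bool:
        """Cache-insert hook: decide whether to keep this file locally."""
        if name in self.cached:
            self.cached.move_to_end(name)  # refresh recency
            return True
        if size > self.max_item:
            return False
        self.cached[name] = size
        self.evict()
        return name in self.cached

    def evict(self) -> None:
        """Cache-eviction hook: drop least-recently-read files until within budget."""
        while sum(self.cached.values()) > self.budget:
            self.cached.popitem(last=False)
```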
  22. Implementation: VI. Optimizations.

    • Kernel caching: name and content resolution results are cached by the kernel module.
    • Compound RPCs: a new feature, similar to NFSv4 compound operations, that allows many RPCs to be sent in a single round trip.
      ◦ All quFiles in a directory are read during a readdir, so batching the RPC calls for each quFile reduces overhead.
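A back-of-the-envelope model of why batching helps: if each quFile in a directory needs one name-policy RPC, a readdir over n quFiles costs n round trips without batching and one compound round trip with it. `FakeServer` and `readdir_names` are hypothetical stand-ins that only count round trips; they say nothing about the real RPC format.

```python
from typing import Callable, List


class FakeServer:
    """Hypothetical server that counts network round trips."""

    def __init__(self, resolve: Callable[[str], List[str]]) -> None:
        self.resolve = resolve
        self.round_trips = 0

    def call(self, qufile: str) -> List[str]:
        # One request, one reply, one round trip.
        self.round_trips += 1
        return self.resolve(qufile)

    def compound(self, qufiles: List[str]) -> List[List[str]]:
        # One round trip carries every request and every reply.
        self.round_trips += 1
        return [self.resolve(q) for q in qufiles]


def readdir_names(server: FakeServer, qufiles: List[str], batch: bool) -> List[str]:
    replies = server.compound(qufiles) if batch else [server.call(q) for q in qufiles]
    return [name for reply in replies for name in reply]
```

Both modes return the same names; only the number of round trips differs.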
  23. Case Studies: I. Context-Aware Data Redaction.

    To protect sensitive information on mobile devices, create a quFile-aware utility that redacts XML files containing sensitive data.
    • The utility parses files and obfuscates sensitive data.
    • The redacted and original files are moved into a quFile.
    • The context-aware policy determines whether the current location is secure.
    • The name policy returns the same name as the original file.
    • The content policy returns different data, depending on the location.
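A sketch of the redaction utility and its location-dependent policies, using the standard library's `xml.etree.ElementTree`. Which XML tags count as sensitive, the set of secure locations, and the function names are all hypothetical; a real policy would obtain the location from the device's context.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

SENSITIVE_TAGS: Set[str] = {"ssn", "salary"}     # hypothetical
SECURE_LOCATIONS: Set[str] = {"office", "home"}  # hypothetical


def redact(xml_text: str) -> str:
    """Parse the XML and obfuscate the text of sensitive elements."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.tag in SENSITIVE_TAGS and elem.text:
            elem.text = "X" * len(elem.text)
    return ET.tostring(root, encoding="unicode")


def make_qufile(original: str) -> Dict[str, str]:
    """The original and the redacted copy both live inside one quFile."""
    return {"original": original, "redacted": redact(original)}


def name_policy(original_name: str) -> str:
    return original_name  # same name in every context


def content_policy(qufile: Dict[str, str], location: str) -> str:
    # Serve the original only where the location is considered secure.
    view = "original" if location in SECURE_LOCATIONS else "redacted"
    return qufile[view]
```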
  24. Case Studies: II. Availability: Resource-Aware Directories.

    A resource-aware directory listing policy tailors the contents of a directory to the resources available to the computer.
    • Distributed file systems make no distinction between data on the local machine and data on remote servers.
    • The user does not find out that the network bandwidth is insufficient to transfer the data until (s)he tries to access it.
    Policy flow:
    1. Attempt to access a media file.
    2. Check whether the file is available locally.
    3. If not, check whether the bandwidth is sufficient.
    4. List only the files that can be played.
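The listing decision can be sketched as a filter over media files: keep a file if it is already stored locally, or if the current bandwidth covers its playback bitrate. The `MediaFile` fields, the `playable` function and the bitrate comparison are assumptions about how "bandwidth is sufficient" might be judged.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaFile:  # hypothetical description of one media file
    name: str
    bitrate_kbps: int  # playback rate needed to stream it
    local: bool        # already on this machine's disk?


def playable(files: List[MediaFile], bandwidth_kbps: int) -> List[str]:
    """Resource-aware listing: return only the files that can be played now."""
    visible = []
    for f in files:
        if f.local or f.bitrate_kbps <= bandwidth_kbps:
            visible.append(f.name)
    return visible
```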
  25. Results: Evaluation Environment.

    System:
    • Ubuntu 8.04 desktop.
    • Linux 2.6.24 kernel.
    • BlueFS server and client on the same desktop.
      ◦ No local disk cache for the client.
    Scenarios:
    • Warm client: the client kernel page cache contains all the data to be read; no RPC or disk access is required.
    • Cold client: empty client kernel cache, but the server kernel cache is hot; RPC is required but no disk access.
    • Cold server: empty server and client kernel caches; both RPC and disk access are required.