
Development of an associative file system

Lars Hupel
October 25, 2011

Organizing multimedia data, e.g. pictures, music or videos, is a rather common use case for modern file systems. Quite a number of applications try to provide a user-friendly interface for tagging, sorting and editing these files. Such applications are needed because collections of these files have no intrinsic hierarchical structure. For example, pictures taken with a digital camera carry EXIF metadata that can be used to retrieve a picture by date, time or location instead of through an artificial folder structure.

However, the major problem shared by all of these multimedia applications is that the files are still stored in folders on a traditional file system. As a result, any operation a user performs outside of the application leads to inconsistencies within it. Moreover, metadata produced by one application cannot be consumed by another because of proprietary formats.

In this thesis, a file system is developed that uses the established RDF standard for storing metadata and imposes few structural requirements on the data. The system provides both an API, which enables high-level operations on file contents and metadata, and a CLI, which borrows ideas from version control systems like Git.
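Storing metadata as RDF-style triples turns lookups into pattern matches instead of path walks. The C++ sketch below is a minimal, hypothetical illustration of that idea: `Triple`, `Pattern` and `query` are names invented here, a linear scan stands in for real indexes, and it is not how the thesis implements queries (its metadata lives in a relational database).

```cpp
#include <optional>
#include <string>
#include <vector>

// One RDF-style metadata statement: (subject, predicate, object).
struct Triple {
    std::string subject, predicate, object;
};

// A triple pattern: each position is either bound to a value or a wildcard (std::nullopt).
struct Pattern {
    std::optional<std::string> subject, predicate, object;
};

// A bound position must equal the value; a wildcard matches anything.
bool matches(const std::optional<std::string>& bound, const std::string& value) {
    return !bound || *bound == value;
}

// Returns every triple in `store` that matches `pat` (linear scan, no indexes).
std::vector<Triple> query(const std::vector<Triple>& store, const Pattern& pat) {
    std::vector<Triple> result;
    for (const auto& t : store) {
        if (matches(pat.subject, t.subject) && matches(pat.predicate, t.predicate) &&
            matches(pat.object, t.object)) {
            result.push_back(t);
        }
    }
    return result;
}
```

For example, the pattern `(?, exif:location, Munich)`, written as `Pattern{std::nullopt, "exif:location", "Munich"}`, selects the metadata of every file taken in Munich, with no folder involved; the predicate names are made up for illustration.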

Furthermore, a formalization of the most important operations is given, including a concept of transactions that has been adapted from relational database systems to fit the setting of a file system.
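The transaction concept itself is formalized in the thesis and is not reproduced here. As a rough, assumption-based sketch of the general idea borrowed from databases, the C++ below buffers metadata changes inside a transaction and applies them to a shared store only on `commit()`, so another open transaction does not see uncommitted changes (roughly read-committed visibility). `Store`, `Transaction` and all member names are hypothetical; the sketch is single-threaded and performs no conflict detection.

```cpp
#include <set>
#include <string>
#include <tuple>

// A metadata triple, ordered so it can live in a std::set.
struct Triple {
    std::string subject, predicate, object;
    bool operator<(const Triple& o) const {
        return std::tie(subject, predicate, object) < std::tie(o.subject, o.predicate, o.object);
    }
};

// The committed state shared by all transactions.
struct Store {
    std::set<Triple> committed;
};

// Buffers changes privately until commit(); other transactions reading the
// store see none of them before then.
class Transaction {
public:
    explicit Transaction(Store& store) : store_(store) {}

    void insert(const Triple& t) { removed_.erase(t); added_.insert(t); }
    void remove(const Triple& t) { added_.erase(t); removed_.insert(t); }

    // This transaction's view: committed state overlaid with its own pending changes.
    bool contains(const Triple& t) const {
        if (added_.count(t)) return true;
        if (removed_.count(t)) return false;
        return store_.committed.count(t) > 0;
    }

    // Applies all buffered changes in one step, then clears the buffers.
    void commit() {
        for (const auto& t : removed_) store_.committed.erase(t);
        for (const auto& t : added_) store_.committed.insert(t);
        added_.clear();
        removed_.clear();
    }

    // Discards buffered changes; the shared store is left untouched.
    void rollback() { added_.clear(); removed_.clear(); }

private:
    Store& store_;
    std::set<Triple> added_, removed_;
};
```

A real implementation would additionally have to make `commit()` atomic with respect to concurrent sessions and to crashes, which is exactly where the underlying RDBMS helps.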

Transcript

  1. Motivation Ideas Architecture Performance
    Development of an associative file system
    Lars Hupel
    TU München
    Fakultät für Informatik
    Lehrstuhl für Datenbanksysteme
    2011-10-25
    Lars Hupel Development of an associative file system 2011-10-25 1 / 17

  2. File systems
    State of the art
    Major operating systems share the same concept of files:
    hierarchically organized in folders
    chunk of data (bit string)
    identified by recursive name
    some metadata (timestamps, permissions, ...)

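For contrast with the associative model, this is what "identified by recursive name" amounts to in a conventional hierarchy: a name is resolved by descending through folders one component at a time. The C++ sketch is illustrative only (`Node` and `resolve` are invented names); real file systems perform this lookup over on-disk directory structures rather than an in-memory map.

```cpp
#include <map>
#include <memory>
#include <sstream>
#include <string>

// A node in a traditional hierarchy: a folder has named children, a file has data.
struct Node {
    std::map<std::string, std::unique_ptr<Node>> children;  // empty for plain files
    std::string data;                                       // the file's bit string
};

// Resolves a '/'-separated name one component at a time, starting at `root`.
// Returns nullptr as soon as a component is missing.
const Node* resolve(const Node& root, const std::string& path) {
    const Node* current = &root;
    std::istringstream parts(path);
    std::string component;
    while (std::getline(parts, component, '/')) {
        if (component.empty()) continue;  // tolerate leading or doubled slashes
        auto it = current->children.find(component);
        if (it == current->children.end()) return nullptr;
        current = it->second.get();
    }
    return current;
}
```

With a tree containing `photos/img.jpg`, `resolve(root, "/photos/img.jpg")` returns the file node, while any missing component yields `nullptr`: a file is reachable only through its single position in the tree.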
  3. File systems
    Works nicely for many kinds of data, but...
    What about multimedia?

  6. File systems and multimedia
    Fact: Some kinds of data are not suited for hierarchical storage.
    Examples: photos, music, videos, ...
    specialized library applications (Picasa, Digikam, ...)

  7. Multimedia libraries
    ... are a solution, but not the best one.
    Problems
    only manage files of certain types
    no common standard (hence no import/export)
    usually no API
    cannot detect changes made outside the application

  12. Basic model
    blob: a chunk of data
    file: set of (name, blob) pairs
    metadata: triples of subject, predicate, object
    no file names
    no folders

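The basic model can be written down as plain types. The C++ sketch below is one possible reading of the slide, not code from the thesis: a file is an identifier plus named blobs, and everything else, including anything resembling a file name, is expressed as metadata triples. Blob-name uniqueness, the `id` field and the `"title"` predicate are assumptions made here.

```cpp
#include <map>
#include <string>
#include <vector>

// blob: an uninterpreted chunk of data.
using Blob = std::vector<unsigned char>;

// file: a set of (name, blob) pairs, identified only by an opaque id.
// There is no file name and no parent folder anywhere in this type.
// Assumption for this sketch: blob names are unique within one file.
struct File {
    std::string id;                     // hypothetical identifier, e.g. a UUID string
    std::map<std::string, Blob> blobs;  // e.g. "default" -> original, "thumbnail" -> preview
};

// metadata: (subject, predicate, object) triples; subjects refer to file ids.
struct Triple {
    std::string subject, predicate, object;
};

// A human-readable title is just another triple, so one file may carry
// several titles and several files may share one without any conflict.
std::vector<std::string> titlesOf(const std::vector<Triple>& metadata, const std::string& fileId) {
    std::vector<std::string> titles;
    for (const auto& t : metadata) {
        if (t.subject == fileId && t.predicate == "title") titles.push_back(t.object);
    }
    return titles;
}
```

Because a title is only metadata, `titlesOf` may return several titles for one file, and two files may share a title, which a single hierarchical name cannot express.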
  14. Operating system integration
    Basically impossible.

  15. Operating system integration
    FUSE (Filesystem in Userspace)
    flexible infrastructure for customized file systems
    not applicable because it uses a hierarchy
    still interesting for a “virtual hierarchy”
    but: outside the scope of this thesis

  16. Assumptions
    Traditional file systems: high performance, stable and tested
    store blobs in an existing file system
    RDBMS: high performance, stable and tested
    store metadata in a database

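The assumption of storing blobs in an existing file system can be sketched as a mapping from blob identifiers to ordinary files. The C++17 code below uses `std::filesystem`; the fan-out layout (first two characters of the id as a subdirectory, as Git does for loose objects) and the functions `blobPath`, `writeBlob` and `readBlob` are assumptions for illustration, not the thesis's on-disk format.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Maps a blob id (assumed: a hex string of at least 3 characters) to a path in an
// ordinary directory tree, fanning out on the first two characters to keep
// individual directories small.
fs::path blobPath(const fs::path& root, const std::string& blobId) {
    assert(blobId.size() > 2);
    return root / blobId.substr(0, 2) / blobId.substr(2);
}

// Stores the blob's bytes as a regular file on the underlying file system.
void writeBlob(const fs::path& root, const std::string& blobId, const std::string& bytes) {
    fs::path p = blobPath(root, blobId);
    fs::create_directories(p.parent_path());
    std::ofstream out(p, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Reads the blob back. Which file owns the blob, and under which name,
// would be recorded in the database, not here.
std::string readBlob(const fs::path& root, const std::string& blobId) {
    std::ifstream in(blobPath(root, blobId), std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

In this split, the file system only ever sees opaque byte streams under generated paths, while all structure lives in the metadata.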
  18. Platform
    RDBMS: HyPer, MySQL, SQLite
    Target OS: ext4, ...
    Core: Connection, VFS, Process
    Environment
    API/CLI
    Session, Isolation, Actions, Objects

  19. Interaction with the file system
    CLI
    Git-style interaction
    commands: create, cat, rm, ls, ...
    built-in shell
    $ fs create
    $ fs store --uuid=... --blob-name=...
    API
    full session control
    multiple instances per process
    error handling
    auto file = env.createFile();
    file->addBlob("default", ...);
    storeFile(stream, ...);

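The CLI commands on this slide use a Git-style subcommand followed by `--key=value` options. The C++ sketch below shows one way such a command line could be parsed; `Command` and `parseCommand` are invented for illustration, and only the flag names `--uuid` and `--blob-name` come from the slide. How the real `fs` tool parses its arguments is not shown in the slides.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Result of parsing a Git-style command line such as
//   fs store --uuid=<id> --blob-name=<name>
struct Command {
    std::string name;                            // subcommand: "create", "store", "cat", ...
    std::map<std::string, std::string> options;  // "--key=value" pairs, keyed without "--"
    std::vector<std::string> positional;         // all other arguments, in order
};

// `args` excludes the program name ("fs"); throws if no subcommand is given.
Command parseCommand(const std::vector<std::string>& args) {
    if (args.empty()) throw std::invalid_argument("missing subcommand");
    Command cmd;
    cmd.name = args[0];
    for (std::size_t i = 1; i < args.size(); ++i) {
        const std::string& a = args[i];
        if (a.rfind("--", 0) == 0) {  // argument starts with "--"
            auto eq = a.find('=');
            std::string key = a.substr(2, eq == std::string::npos ? std::string::npos : eq - 2);
            std::string value = eq == std::string::npos ? "" : a.substr(eq + 1);
            cmd.options[key] = value;  // a bare "--flag" maps to an empty value
        } else {
            cmd.positional.push_back(a);
        }
    }
    return cmd;
}
```

Parsing `store --uuid=... --blob-name=...` (program name removed) yields the subcommand `store` with the options `uuid` and `blob-name`, each mapped to the text after `=`; the subcommand name would then select the action to run, much like `git` dispatches on its first argument.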
  20. Performance
    It depends!
    target file system: depends on disk speed, optimizations, mount flags, ...
    database: depends on storage, system, ...

  23. Performance
    Unit tests
    small test suite
    single- and multi-session
    produces ≈ 100 DB rows and ≈ 50 KiB of data
    ext3+SQLite: 57 s
    ext3+MySQL: 1.5 s

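Put side by side, the two unit-test timings show how strongly the choice of database dominated this small test suite:

```latex
\frac{t_{\text{ext3+SQLite}}}{t_{\text{ext3+MySQL}}} = \frac{57\,\text{s}}{1.5\,\text{s}} = 38
```

This is plain arithmetic on the reported numbers; the slides do not break down where the time is spent in either configuration.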
  24. Performance
    Create files
    Create n files in m sessions.
    ext3+MySQL, n = 1000
    m = n (one session per file): 37 s
    m = 1 (a single session for all files): 8.6 s
    Comparison (ext3 only): 2.9 s

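Dividing the reported totals by n = 1000 gives per-file costs. Under the simplifying assumption (not stated in the slides) that each session adds a roughly constant overhead independent of how many files it creates, the difference between the two MySQL runs estimates that overhead:

```latex
\begin{aligned}
\frac{37\,\text{s}}{1000} &= 37\,\text{ms per file} && (m = n)\\
\frac{8.6\,\text{s}}{1000} &= 8.6\,\text{ms per file} && (m = 1)\\
\frac{2.9\,\text{s}}{1000} &= 2.9\,\text{ms per file} && (\text{ext3 only})\\
\frac{37\,\text{s} - 8.6\,\text{s}}{1000 - 1} &\approx 28.4\,\text{ms per additional session}
\end{aligned}
```

With a single session, creating a file through the associative layer is thus about 8.6 / 2.9 ≈ 3 times as expensive as writing it to ext3 alone.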
  25. Q & A
