Development of an associative file system

Lars Hupel
October 25, 2011

Organizing multimedia data, e.g. pictures, music or videos, is a rather common use case for modern file systems. Quite a number of applications try to expose a user-friendly interface for tagging, sorting and editing these files. This becomes necessary because sets of such files do not have an intrinsic hierarchical structure. For example, pictures taken with a digital camera carry EXIF metadata, which can be used to retrieve a picture by date, time or location instead of through an artificial folder structure.

However, the major problem shared by all of those multimedia applications is that the files are actually stored in folders on a traditional file system. As such, any operation a user performs outside of the application leads to inconsistencies inside the application. Also, metadata produced by one application cannot be consumed by another one because of proprietary formats.

This thesis develops a file system that stores metadata using the established RDF standard and imposes only few structural requirements on the data. The system features both an API, which enables high-level operations on file contents and metadata, and a CLI, which borrows ideas from version control systems like Git.

The thesis also gives a formalization of the most important operations, including a concept of transactions that has been adapted from relational database systems to the environment of a file system.

Transcript

  1. Development of an associative file system

    Lars Hupel, TU München, Fakultät für Informatik, Lehrstuhl für Datenbanksysteme, 2011-10-25
    Outline: Motivation, Ideas, Architecture, Performance
  2. File systems: State of the art

    Major operating systems share the same concept of files:
    - hierarchically organized in folders
    - chunk of data (bit string)
    - identified by recursive name
    - some metadata (timestamps, permissions, ...)
  3. File systems

    Works nicely for many kinds of data, but... What about multimedia?
  6. File systems and multimedia

    Fact: Some kinds of data are not suited for hierarchical storage.
    Examples: photos, music, videos, ...
    Specialized library applications (Picasa, Digikam, ...)
  7. Multimedia libraries

    ... are a solution, but not the best one.
    Problems:
    - only manage files of some types
    - no common standard (no import/export)
    - usually no API
    - cannot detect changes made outside
  12. Basic model

    - blob: a chunk of data
    - file: set of (name, blob) pairs
    - metadata: triples of subject, predicate, object
    - no file names
    - no folders
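The basic model on the slide above can be sketched in a few lines of C++. This is a minimal illustration, not the thesis implementation: the names `Blob`, `File`, `Triple` and `TripleStore`, the use of a string UUID to identify a file, and the `exif:DateTime` predicate in the usage note are all assumptions made here.

```cpp
#include <map>
#include <string>
#include <vector>

// A blob is an opaque chunk of data.
using Blob = std::string;

// A file is a set of (name, blob) pairs. It has no file name and no folder.
// Identifying a file by a UUID string is an assumption of this sketch.
struct File {
    std::string uuid;
    std::map<std::string, Blob> blobs;
};

// One metadata statement: (subject, predicate, object).
struct Triple {
    std::string subject;
    std::string predicate;
    std::string object;
};

// Hypothetical in-memory metadata store, for illustration only.
class TripleStore {
public:
    void add(const std::string& s, const std::string& p, const std::string& o) {
        triples_.push_back(Triple{s, p, o});
    }

    // Return the subjects of all triples that match predicate and object,
    // in insertion order. This is the "associative" lookup: files are found
    // by what is known about them, not by where they are stored.
    std::vector<std::string> subjectsWith(const std::string& p,
                                          const std::string& o) const {
        std::vector<std::string> result;
        for (const Triple& t : triples_) {
            if (t.predicate == p && t.object == o) {
                result.push_back(t.subject);
            }
        }
        return result;
    }

private:
    std::vector<Triple> triples_;
};
```

With such a store, a call like `store.subjectsWith("exif:DateTime", "2011-10-25")` would return the identifiers of all files tagged with that date, without consulting any folder structure.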
  14. Operating system integration

    Basically impossible.
  15. Operating system integration + FUSE

    - flexible infrastructure for customized file systems
    - not applicable because it uses a hierarchy
    - still interesting for a “virtual hierarchy”
    - but: outside of scope
  16. Assumptions

    - Traditional file systems: high performance, stable and tested. Store blobs in an existing file system.
    - RDBMS: high performance, stable and tested. Store metadata in a database.
  18. Architecture

    [Architecture diagram. Labels: Platform (RDBMS: HyPer, MySQL, SQLite; Target OS: ext4, ...), Core, Connection, VFS, Process, Environment, API/CLI, Session, Isolation, Actions, Objects]
  19. Interaction with the file system

    CLI
    - Git-style interaction
    - commands: create, cat, rm, ls, ...
    - built-in shell

        $ fs create
        $ fs store --uuid=... --blob-name=...

    API
    - full session control
    - multiple instances per process
    - error handling

        auto file = env.createFile();
        file->addBlob("default", ...);
        storeFile(stream, ...);
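To make the API fragment on the slide above easier to read, here is a hedged sketch of classes that would let it compile. Only the calls `env.createFile()` and `file->addBlob("default", ...)` come from the slide. The class names `Environment` and `File`, the second parameter of `addBlob` (an input stream), the counter-based identifiers and the accessors are assumptions of this sketch, and the real API may look quite different. `storeFile` is left out because its arguments are elided on the slide.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a file object: a set of named blobs.
class File {
public:
    explicit File(std::string id) : id_(std::move(id)) {}

    // Read the stream to its end and keep the contents under the blob name.
    void addBlob(const std::string& name, std::istream& in) {
        blobs_[name] = std::string(std::istreambuf_iterator<char>(in),
                                   std::istreambuf_iterator<char>());
    }

    const std::string& id() const { return id_; }

    // Throws std::out_of_range if no blob with that name exists.
    const std::string& blob(const std::string& name) const {
        return blobs_.at(name);
    }

private:
    std::string id_;
    std::map<std::string, std::string> blobs_;
};

// Hypothetical stand-in for the environment that creates and tracks files.
class Environment {
public:
    std::shared_ptr<File> createFile() {
        // Placeholder identifier; the CLI on the slide refers to UUIDs.
        auto file = std::make_shared<File>("file-" + std::to_string(next_++));
        files_.push_back(file);
        return file;
    }

    std::size_t fileCount() const { return files_.size(); }

private:
    std::vector<std::shared_ptr<File>> files_;
    unsigned next_ = 1;
};

// Usage mirroring the slide: create a file, then attach a blob named "default".
std::string exampleUsage() {
    Environment env;
    auto file = env.createFile();
    std::istringstream stream("hello");
    file->addBlob("default", stream);
    return file->blob("default");
}
```

Returning a shared pointer from `createFile()` matches the `file->addBlob(...)` syntax on the slide and lets the environment keep track of every file it has handed out.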
  20. Performance

    It depends!
    - target file system: depends on disk speed, optimizations, mount flags, ...
    - database: depends on storage, system, ...
  23. Performance: Unit tests

    - small test suite
    - single- and multi-session
    - produces ≈ 100 DB rows and ≈ 50 KiB data
    - ext3+SQLite: 57 s
    - ext3+MySQL: 1.5 s
  24. Performance: Create files

    Create n files in m sessions. ext3+MySQL, n = 1000:
    - m = n: 37 s
    - m = 1: 8.6 s
    - Comparison (ext3 only): 2.9 s
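The timings on the slide above are easier to compare per file. The following is plain arithmetic on the reported numbers with n = 1000. The symbols are labels introduced here, not notation from the talk.

```latex
\begin{align*}
t_{m=n} &= \frac{37\,\mathrm{s}}{1000} = 37\,\mathrm{ms\ per\ file} \\
t_{m=1} &= \frac{8.6\,\mathrm{s}}{1000} = 8.6\,\mathrm{ms\ per\ file} \\
t_{\mathrm{ext3}} &= \frac{2.9\,\mathrm{s}}{1000} = 2.9\,\mathrm{ms\ per\ file} \\[4pt]
\frac{t_{m=1}}{t_{\mathrm{ext3}}} &= \frac{8.6}{2.9} \approx 3.0, \qquad
\frac{t_{m=n}}{t_{\mathrm{ext3}}} = \frac{37}{2.9} \approx 12.8, \qquad
\frac{t_{m=n}}{t_{m=1}} = \frac{37}{8.6} \approx 4.3
\end{align*}
```

Read this way, creating all files in a single session takes roughly three times as long as the plain ext3 baseline, and opening one session per file is about 4.3 times slower than the single-session run. These are single reported measurements, so the ratios are only indicative.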
  25. Q & A