The Oxford Common File Layout

David Wilcox
September 26, 2018

The Oxford Common File Layout (OCFL) initiative is an effort to define a shared approach to file hierarchy for long-term preservation. What began as a discussion at the Fedora and Samvera Camp held in Oxford, UK, in September 2017 has grown into a focused community effort. Many repository systems store large amounts of data that are difficult to migrate, and such migrations risk data loss. Digital objects within a storage system should be designed on the assumption that a variety of applications will access and manage them. This effort attempts to decouple the structure of the persisted files from the software that might manage them by defining an expected file hierarchy to which software applications must conform, whether it is implemented in a conventional filesystem with files and directories or in an object store. The file hierarchy thus functions as a storage Application Programming Interface (API). The effort addresses three primary requirements: 1) completeness, so that a repository can be rebuilt from the files alone; 2) parsability, by both humans and machines, in the absence of the original software; and 3) robustness against errors, corruption, and migration between storage technologies. This paper describes the motivations, principles, and anticipated features of the proposed file layout. The OCFL initiative aims to produce draft specifications for trial use by Fall 2018.
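To make the idea of a file hierarchy acting as a storage API more concrete, the following is a minimal sketch of the completeness requirement: an index of objects and their versions is rebuilt from nothing but the files on disk. The layout used here is purely illustrative and is not taken from the OCFL specification. The hashed directory scheme, the `identifier.txt` file, the `v1`, `v2` version directory names, and the function names `object_path`, `add_version`, and `rebuild_index` are all assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def object_path(root: Path, identifier: str) -> Path:
    """Map an identifier to a directory (hypothetical scheme: hashed prefix dirs)."""
    digest = hashlib.sha256(identifier.encode("utf-8")).hexdigest()
    return root / digest[0:2] / digest[2:4] / digest


def _version_names(obj_dir: Path) -> List[str]:
    """Return version directory names (v1, v2, ...) in numeric order."""
    names = [
        p.name
        for p in obj_dir.iterdir()
        if p.is_dir() and p.name.startswith("v") and p.name[1:].isdigit()
    ]
    return sorted(names, key=lambda name: int(name[1:]))


def add_version(root: Path, identifier: str, files: Dict[str, bytes]) -> Path:
    """Write the given files into the next version directory of an object."""
    obj_dir = object_path(root, identifier)
    obj_dir.mkdir(parents=True, exist_ok=True)
    # Keep the identifier in a plain-text file so that a human or a script
    # can recover it without the software that wrote it (an assumption of
    # this sketch, not a feature described in the source).
    (obj_dir / "identifier.txt").write_text(identifier, encoding="utf-8")
    version_dir = obj_dir / f"v{len(_version_names(obj_dir)) + 1}"
    version_dir.mkdir()
    for name, data in files.items():
        (version_dir / name).write_bytes(data)
    return version_dir


def rebuild_index(root: Path) -> Dict[str, List[str]]:
    """Rebuild an identifier -> versions index using only the filesystem."""
    index: Dict[str, List[str]] = {}
    for id_file in root.rglob("identifier.txt"):
        # Only accept identifier files at object level (aa/bb/<digest>/),
        # so content files that happen to share the name are ignored.
        if len(id_file.relative_to(root).parts) != 4:
            continue
        index[id_file.read_text(encoding="utf-8")] = _version_names(id_file.parent)
    return index


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        add_version(root, "obj-1", {"a.txt": b"first draft"})
        add_version(root, "obj-1", {"a.txt": b"second draft"})
        add_version(root, "obj-2", {"b.txt": b"other object"})
        print(rebuild_index(root))
```

Because the index is derived entirely from directory names and a plain-text file, any application that understands the same conventions could manage the same storage root, which is the sense in which the hierarchy behaves like an API.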

Transcript

  1. “The Oxford Common File Layout (OCFL) initiative is an effort to define a shared approach to file hierarchy for long-term preservation.”
  2. A little history
     ○ Started as a discussion at Fedora and Samvera Camp in September 2017
     ○ Initial call attracted 49 participants
     The OCFL has broad appeal across the digital preservation landscape.
  3. 1. Scale: Many repository systems store large amounts of data (hundreds of terabytes or petabytes) that are time-consuming and/or expensive to migrate or reorganize.
  4. 3. Systems: Digital objects within a storage system should be designed with the assumption that they will be managed by many different applications.
  5. 4. Commonalities: File and directory hierarchies are pervasive organizational metaphors across most computing systems, and as such can persist across CPU architectures, disk formats, and operating systems.
  6. Goals
     1. Application-agnostic file layout spec
     2. Enable object and version preservation
     3. Record of actions
     4. Object validation against specification
     5. No special software for basic functions
     6. Compatible with wide variety of storage systems
     7. Full data restoration using only filesystem
  7. Foundational Principles 1
     ○ An OCFL digital object is a collection of files and metadata.
     ○ OCFL digital objects should be organized based on their identifiers.
     ○ Content versioning should be implemented within OCFL digital objects.
  8. Foundational Principles 2
     ○ OCFL digital objects should support the storage of records (logs) of actions taken on an object.
     ○ OCFL digital objects should be amenable to validation.
  9. Implementations
     ○ Reusable libraries will be implemented
     ○ Existing systems will need to rewrite persistence layers
     Preservation-focused layout helps mitigate the need for future data migrations.
  10. Next Steps
     ○ 2018: Draft spec release
     ○ 2019: Formal spec release
     ○ Experimental validation tool
     ○ Test suite and fixtures
     ○ 2 institutions backing the spec
  11. Thanks! Any questions? You can find me at [email protected] and @d_wilcox. Learn more: https://ocfl.io. Presentation template by SlidesCarnival.
  12. Image Credits
     1. view from carfax tower oxford 2012 by chensiyuan is licensed under CC BY-SA 4.0
     2. Bodleian Library by Roman Kirillov is licensed under CC BY-SA 3.0
     3. Chinese Water Dragon Head is licensed under CC0 1.0
     4. Air Clouds Migration Group Of Birds Fly Bird is licensed under CC0 1.0
     5. Honey Comb Structure by Gavin Mackintosh is licensed under CC BY 2.0
     6. Cat Football by Alexas_Fotos is licensed under CC0 1.0
     7. Alfond Barn Raising by Sterling College is licensed under CC BY-SA 2.0
     8. Brick laying by Scott Lewis is licensed under CC BY 2.0