  A Distributed Archival Network for Process-Oriented Autonomic Long-Term Digital Preservation

Presentation slides for #JCDL13

Ivan Subotic

July 23, 2013

Transcript

  1. A Distributed Archival Network for Process-Oriented Autonomic Long-Term Digital Preservation

    Ivan Subotic (1), Lukas Rosenthaler (1), Heiko Schuldt (2)
    (1) Digital Humanities Lab, (2) Databases and Information Systems Group
    University of Basel, Switzerland
    {firstname.lastname}@unibas.ch
    JCDL 2013, July 23, 2013
  2. Introduction: Scenario (National Museum)

    Content
    • Images, Digitized Manuscripts
    • Annotations, Links, Collections
    Automatic data handling
    • Preservation Policy
    • Maintenance
    • Failure recovery
    • Data-format migration
    Robustness and failure tolerance
    • Redundant off-site replicas
    Collaboration
    • Collaborative network of remote sites
    • Automatic distribution
    [Figure: Jim (User) and the System (automation) working with Metadata and Digital Data (JPEG, TIFF, ...): complex objects, annotations, links, collections]
    Ivan Subotic (University of Basel) DISTARNET JCDL 2013, July 23 2 / 20
  3. Introduction: Digital Preservation Requirements

    • Replication and Distribution
    • Fault Tolerance and Failure Management
    • Management of Complex Information Objects
    • Scalability
    • Openness and Extensibility
    • Resource Discovery and Load Balancing
    • Authentication, Authorization, and Auditing
  4. Introduction: Outline

    1 Introduction
    2 A Distributed Archival System
    3 The DISTARNET System Prototype
    4 Evaluation
    5 Conclusion and Future Work
  5. A Distributed Archival System: Outline

    1 Introduction
    2 A Distributed Archival System
    3 The DISTARNET System Prototype
    4 Evaluation
    5 Conclusion and Future Work
  6. A Distributed Archival System

    Distributed Archival Network
    • Virtual Organization (VO) based
    • Peer-to-Peer (P2P)
    ⇒ DISTARNET Network / Sub-Network (DSN1, DSN2, DSN3)
    [Figure: DISTARNET NETWORK containing the sub-networks DSN1, DSN2, and DSN3; National Museum]
    Data Model
    • DISTARNET Archival Object (DAO) == Information Object
      + Archived digital object
      + Metadata (content, technical, preservation)
    ⇒ Implemented: Resource Description Framework (RDF)
  7. A Distributed Archival System: DISTARNET Processes

    • Self-Configuration
      + Node Joining Process
      + Periodic Neighbor-Node Checking Process
      + Automated Dynamic Replication Process
    • Self-Healing
      + Periodic Integrity Checking Process
      + DAO Repairing Process
      + Node Lost Process
      + Reliable Copying Process
      + Data Format Migration Process
    • Self-Optimization
      + State Dissemination Process
      + Resource Discovery
      + Parameter Optimization
  8. A Distributed Archival System: Failure Classification and Fault-Tolerance

    • Distributed Infrastructure Faults
      + Node Loss: Periodic Neighbor-Node Checking Process, Node Lost Process, Automated Dynamic Replication Process
      + Node Dependability: Periodic Neighbor-Node Checking Process
    • Content Faults
      + DAO Corruption / Destruction: Periodic Integrity Checking Process, DAO Repairing Process, Reliable Copying Process
      + DAO Representation Unreadable: Data Format Migration Process
    • Node Engine Faults
      + Process Execution / Implementation dependent (more in next section)
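
The fault classification above pairs each fault with the processes that handle it. A minimal sketch of that pairing as a lookup table follows. The fault and process names come from the slide; the dictionary layout and the `processes_for` helper are assumptions made for illustration and are not taken from the DISTARNET prototype.

```python
# Hypothetical lookup table. Names are from the slide; the structure is an assumption.
FAULT_HANDLERS = {
    "Distributed Infrastructure Faults": {
        "Node Loss": [
            "Periodic Neighbor-Node Checking Process",
            "Node Lost Process",
            "Automated Dynamic Replication Process",
        ],
        "Node Dependability": [
            "Periodic Neighbor-Node Checking Process",
        ],
    },
    "Content Faults": {
        "DAO Corruption / Destruction": [
            "Periodic Integrity Checking Process",
            "DAO Repairing Process",
            "Reliable Copying Process",
        ],
        "DAO Representation Unreadable": [
            "Data Format Migration Process",
        ],
    },
}


def processes_for(fault):
    """Return the processes listed on the slide for a given fault name."""
    for faults in FAULT_HANDLERS.values():
        if fault in faults:
            return faults[fault]
    raise KeyError(fault)


if __name__ == "__main__":
    for name in ("Node Loss", "DAO Representation Unreadable"):
        print(name, "->", ", ".join(processes_for(name)))
```

Node Engine Faults are left out of the table because the slide defers their handling to the prototype section.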
  9. The DISTARNET System Prototype: Outline

    1 Introduction
    2 A Distributed Archival System
    3 The DISTARNET System Prototype
    4 Evaluation
    5 Conclusion and Future Work
  10. The DISTARNET System Prototype: DISTARNET System Overview

    Acronym
    • DISTARNET: DISTributed ARchival NETwork
    Architecture
    • Modularized architecture
    • Actor-based modules (Akka framework)
    • Message-based asynchronous communication (Protocol Buffers)
    Long-Term Use and Fault-Tolerance
    • Flexibility and independence
    • Full decoupling of the modules
    • Supervisor Strategy
    [Figure: DISTARNET NODE architecture within the DISTARNET NETWORK (DSN1, DSN2, DSN3, DSN4). Layers: User Interaction Layer, Content and Network Management Layer, Data Layer, connected by Akka Local and Remote Messaging. User Interaction Module (RESTful API) with UI Manager (HTTP Server; Netty + Unfiltered + Akka) covering Preservation Planning, Ingest, Access, and System Management. Further modules: DP Logic Module (PEL Manager), Network Module (Network Manager), Services Module (Services Manager), Repositories Module (Repositories Manager), DAO Storage Module (Storage Manager, DAO Filestore, DAO Triple Store). Technology labels: FSM, Akka, Mongo DB, Jena, Jena TDB, Filesystem. Other labels: Rslv., Data Object Catalog, NIR, RLR, CJR, MJR, SIR, PICP, ADRP, DRP, PNCP, NLP, NJP, SDP, RCP, DFMP, An., CS, Dist., Pub.]
  11. Evaluation: Outline

    1 Introduction
    2 A Distributed Archival System
    3 The DISTARNET System Prototype
    4 Evaluation
    5 Conclusion and Future Work
  12. Evaluation: Cooperating Image Archives Scenario

    DSN
    • Network of four Image Archives
    [Figure: Image Archives DISTARNET Network with node01, node02, node03, node04]
    Data
    • 1 Collection DAO containing 100 Image and 100 Annotation DAOs
    [Figure: data graph. Image DAO i1 hasRep i1/dc, i1/high, and i1/thumb; i1/high data-uri i1/high/data (link to bitstream); i1/thumb data-uri i1/thumb/data (link to bitstream). Annotation DAO a1 hasRep a1/annotation. Collection c1 is connected to i1 and a1 by hasMemberInCollection. The pattern continues through i100 and a100.]
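
The test collection above can be written down as subject-predicate-object triples, in the spirit of the RDF data model the slides mention. The sketch below uses plain Python tuples instead of an RDF library. The identifiers and predicate names are taken from the slide, while the function names and the direction of `hasMemberInCollection` (collection to member) are assumptions.

```python
def image_triples(n):
    """Triples for image DAO i<n>: three representations, two with bitstream links."""
    i = f"i{n}"
    return [
        (i, "hasRep", f"{i}/dc"),
        (i, "hasRep", f"{i}/high"),
        (i, "hasRep", f"{i}/thumb"),
        (f"{i}/high", "data-uri", f"{i}/high/data"),    # link to bitstream
        (f"{i}/thumb", "data-uri", f"{i}/thumb/data"),  # link to bitstream
    ]


def annotation_triples(n):
    """Triples for annotation DAO a<n>: a single annotation representation."""
    a = f"a{n}"
    return [(a, "hasRep", f"{a}/annotation")]


def collection_triples(count=100, collection="c1"):
    """All triples for one collection holding `count` image and annotation DAOs."""
    triples = []
    for n in range(1, count + 1):
        triples.extend(image_triples(n))
        triples.extend(annotation_triples(n))
        # Assumed direction: the collection points at its members.
        triples.append((collection, "hasMemberInCollection", f"i{n}"))
        triples.append((collection, "hasMemberInCollection", f"a{n}"))
    return triples


if __name__ == "__main__":
    graph = collection_triples()
    members = [t for t in graph if t[1] == "hasMemberInCollection"]
    print(len(graph), "triples,", len(members), "collection members")
```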
  13. Evaluation: Qualitative System Evaluation

    Evaluation Goals
    • Evaluate behavior
    Test Setup
    • 4 nodes running on one machine
    • 1 JVM per node
    Initial Network State
    • Each node owns 1 collection (100 Image and 100 Annotation DAOs)
    • 3 replicas of each DAO in network
  14. Evaluation: Qualitative System Evaluation

    Test Scenarios
    1. Node Destruction (Distributed Infrastructure Fault Class)
    2. Content Corruption (Content Fault Class)
    3. Data Format Obsolescence (Content Fault Class): add new representation
    4. Multi-Failure (Mixed)
  15. Evaluation: Quantitative System Evaluation

    Evaluation Goals
    • Evaluate system performance
    Data (F = 1)
    • 1 Collection containing 100 Image and 100 Annotation DAOs
    Test Data Size
    • Simplified calculation
    • 1 Image DAO = 100 MB, 1 Annotation DAO = 0 Bytes

    Scaling Factor (F)     1      10      100     ...  1,000,000
    # of Collections       1      10      100     ...  10^6
    # of Image DAOs        100    1’000   10’000  ...  10^8
    # of Annotation DAOs   100    1’000   10’000  ...  10^8
    Overall Archive Size   10 GB  100 GB  1 TB    ...  10 PB
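
The archive sizes in the scaling table follow from the simplified calculation on the slide. A short sketch of that arithmetic is given below. The constant and function names are illustrative, and decimal units (1 GB = 1000 MB) are assumed because they reproduce the slide's figures.

```python
IMAGE_DAOS_PER_COLLECTION = 100  # from the slide
IMAGE_DAO_MB = 100               # from the slide: 1 Image DAO = 100 MB
# Annotation DAOs count as 0 bytes on the slide, so they add nothing to the size.


def archive_size_mb(scaling_factor):
    """Archive size in MB for F collections of 100 image DAOs each."""
    return scaling_factor * IMAGE_DAOS_PER_COLLECTION * IMAGE_DAO_MB


def human_readable(size_mb):
    """Format a size in MB using decimal units (assumption: 1 GB = 1000 MB)."""
    value = float(size_mb)
    for unit in ("MB", "GB", "TB", "PB"):
        if value < 1000 or unit == "PB":
            return f"{value:g} {unit}"
        value /= 1000


if __name__ == "__main__":
    for f in (1, 10, 100, 1_000_000):
        print(f, human_readable(archive_size_mb(f)))
```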
  16. Evaluation: Quantitative System Evaluation

    • Process mix
      + PICP over Archive with 10 % data loss, DRP, RCP
      + DFMP running for 1 DAO
    • Inter-Node Transfer Times
      + 100 Mb/s ≈ 12.5 MB/s ⇒ 8 seconds for 100 MB
      + 1 Gb/s ≈ 125 MB/s ⇒ 0.8 seconds for 100 MB
    • Procedure
      + One node with data for F = 1
      + Measure execution time of process mix
      + Add calculated inter-node transfer times
      + Extrapolate under linear assumption
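
The inter-node transfer times above are plain bandwidth arithmetic. A small sketch follows, assuming 8 bits per byte, decimal units, and no protocol overhead; the function name is illustrative.

```python
def transfer_seconds(size_mb, link_mbit_per_s):
    """Ideal transfer time: convert the link speed from Mb/s to MB/s, then divide."""
    mb_per_s = link_mbit_per_s / 8  # 8 bits per byte
    return size_mb / mb_per_s


if __name__ == "__main__":
    print(transfer_seconds(100, 100))    # 100 MB over 100 Mb/s
    print(transfer_seconds(100, 1000))   # 100 MB over 1 Gb/s
```

Real links would add latency and protocol overhead, which the slide's calculation also leaves out.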
  17. Evaluation: Quantitative System Evaluation, Results

    Legend: total duration under 24 h / total duration over 24 h

    Scaling Factor         1          10      100     1’000    10’000  100’000  1’000’000
    Image DAOs             100        1’000   10’000  10^5     10^6    10^7     10^8
    Archive Size           10 GB      100 GB  1 TB    10 TB    100 TB  1 PB     10 PB
    Duration w. 100 Mb/s   129.415 s  0.36 h  3.59 h  35.95 h  15 d    150 d    1’495 d
    Duration w. 1 Gb/s     50.894 s   0.14 h  1.41 h  14.14 h  6 d     59 d     589 d

    Assumptions / Results
    • Conservative: execution under 24 h ⇒ ≈ 10 TB
    • Relaxed: execution under 60 days ⇒ ≈ 1 PB
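
The durations for larger scaling factors come from extrapolating the F = 1 measurement under the linear assumption. The sketch below repeats that step with the two measured values from the slide. The function names and the list of factors are illustrative, not part of the original evaluation code.

```python
# Durations measured for F = 1, as reported on the results slide.
MEASURED_SECONDS_F1 = {"100 Mb/s": 129.415, "1 Gb/s": 50.894}

# Scaling factors from the slide: powers of ten from 1 to 1,000,000.
FACTORS = [10 ** k for k in range(7)]


def extrapolate_seconds(seconds_f1, factor):
    """Linear assumption: duration grows in proportion to the scaling factor."""
    return seconds_f1 * factor


def largest_factor_within(seconds_f1, budget_seconds):
    """Largest listed factor whose extrapolated duration stays under the budget."""
    fitting = [f for f in FACTORS
               if extrapolate_seconds(seconds_f1, f) < budget_seconds]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    day = 24 * 3600
    for link, seconds in MEASURED_SECONDS_F1.items():
        print(link,
              "under 24 h: F =", largest_factor_within(seconds, day),
              "| under 60 d: F =", largest_factor_within(seconds, 60 * day))
```

With the 1 Gb/s measurement, a 24-hour budget gives F = 1’000 (10 TB) and a 60-day budget gives F = 100’000 (1 PB), which matches the two thresholds on the slide.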
  18. Evaluation: Quantitative System Evaluation, Results: Breakdown for F=1

    [Figure: bar chart of process-mix duration for F = 1 on a time axis from 0 s to 150 s, one bar for 100 Mb/s and one for 1 Gb/s. Group (1): PICP + DRP (Periodic Integrity Checking Process + DAO Repairing Process) and RCP (Reliable Copying Process), Σ 95.46 % at 100 Mb/s and Σ 95.57 % at 1 Gb/s. Group (2): DFMP (Data Format Migration Process) and RCP (Reliable Copying Process), Σ 4.54 % at 100 Mb/s and Σ 4.43 % at 1 Gb/s. Values printed for the DFMP and RCP segments: 1.881, 2.256, 4, 0.400.]
  19. Conclusion and Future Work: Outline

    1 Introduction
    2 A Distributed Archival System
    3 The DISTARNET System Prototype
    4 Evaluation
    5 Conclusion and Future Work
  20. Conclusion and Future Work

    Conclusion
    • DISTARNET Prototype Implementation
      + Fully distributed, Internet-based, fault-tolerant, long-term preservation system
      + Process-based autonomic behavior
        - Dynamic replication / automated consistency checking / recovery
      + Flexible data model
        - Complex information objects / annotations / links / collections
    • Evaluation / effective and efficient
    Future Work
    • Semantic Digital Archives (SDA)
    • Preservation of process execution chain