Slide 5
Dataset Version Control
Version control is a system that records changes to a file or set
of files over time so that you can recall specific versions later.
https://git-scm.com/
A Dataset Version Control System (DVCS) is designed to store and version
large files, which traditional VCSs (e.g., Git, SVN) handle poorly.
Various solutions exist for different types of data and environments:
▷ Git Large File Storage (Git LFS) [https://git-lfs.github.com/]
▷ Unstructured files and use of delta encoding [VLDB’15]
▷ Relational data [Decibel, VLDB’16; OrpheusDB, VLDB’17]
▷ Tensors (e.g., in Deep Learning) [ModelHub, ICDE’17]
▷ …
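
To make the delta-encoding idea concrete, here is a minimal sketch of a version store that keeps the first version in full and every later version as a delta against its parent. It is an illustration only, not the method of any system cited above: `DeltaStore`, `make_delta`, and `apply_delta` are hypothetical names, a dataset is assumed to be a list of hashable rows, and the diff comes from Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as copy/add operations relative to `old`."""
    ops = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged rows are referenced by position, not stored again.
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # New or changed rows are stored literally.
            ops.append(("add", new[j1:j2]))
        # "delete" needs no operation: the rows are simply not copied.
    return ops


def apply_delta(old, delta):
    """Rebuild a version from its parent `old` and a delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class DeltaStore:
    """Hypothetical store: full root versions, deltas for children."""

    def __init__(self):
        self._versions = []  # list of (parent_id or None, payload)

    def commit(self, rows, parent=None):
        rows = list(rows)
        if parent is None:
            self._versions.append((None, rows))
        else:
            base = self.checkout(parent)
            self._versions.append((parent, make_delta(base, rows)))
        return len(self._versions) - 1

    def checkout(self, version_id):
        parent, payload = self._versions[version_id]
        if parent is None:
            return list(payload)
        # Recreate the parent first, then apply this version's delta.
        return apply_delta(self.checkout(parent), payload)


store = DeltaStore()
v0 = store.commit(["a", "b", "c"])
v1 = store.commit(["a", "x", "c", "d"], parent=v0)
restored = store.checkout(v1)
```

In this sketch, checking out a version walks the whole chain of parents, so a long chain saves storage but makes recreation slower.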