Slide 1

Slide 1 text

Haskell & Zip files What I learned while improving zip-archive Tommaso Piazza @tmpz 1 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 2

Slide 2 text

The library The zip-archive ! library provides functions for creating, modifying, and extracting files from zip archives. • Created by John MacFarlane ! author of pandoc ! • Very straight forward, ~1000 LOC 2 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 3

Slide 3 text

The Problem • zip and unzip support archiving and extracting symlinks • zip-archive did not Python zipfile ! does not support symlinks. 3 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 4

Slide 4 text

What is a symlink anyways? • Magic? No, just a file (with some special attributes) • Contains relative or absolute path to target as text $ ls total 32 drwxr-xr-x 3 blender staff 102B Jan 23 18:48 dir1 -rw-r--r-- 1 blender staff 15B Jan 23 18:46 file1 lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_dir1 -> dir1/ lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_file1 -> file1 lrwxr-xr-x 1 blender staff 16B Jan 23 18:49 link_to_file_in_dir -> dir1/file_in_dir $ readlink link_to_file1 file1 4 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 5

Slide 5 text

The plan Steps suggested by Matthias at the last meetup: 1. Find out how a zip archive is represented in zip-archive 2. zip -r -q --symlinks a directory with symlinks 3. zip -r -q the same directory 4. Compare archives 5. Profit ⭐⭐⭐⭐⭐ Solid plan. Would follow again. -- A satisfied developer 5 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 6

Slide 6 text

Codec.Archive.Zip.Archive • ! Zip Archive binary format document • ! Helpful reference 1 • ! Helpful reference 2 data Archive = Archive { zEntries :: [Entry] -- ^ Files in zip archive , zSignature :: Maybe B.ByteString -- ^ Digital signature , zComment :: B.ByteString -- ^ Comment for whole zip archive } deriving (Read, Show) 6 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 7

Slide 7 text

Codec.Archive.Zip.Entry data Entry = Entry { eRelativePath :: FilePath -- ^ Relative path, using '/' as separator , eCompressionMethod :: CompressionMethod -- ^ Compression method , eLastModified :: Integer -- ^ Modification time (seconds since unix epoch) , eCRC32 :: Word32 -- ^ CRC32 checksum , eCompressedSize :: Word32 -- ^ Compressed size in bytes , eUncompressedSize :: Word32 -- ^ Uncompressed size in bytes , eExtraField :: B.ByteString -- ^ Extra field - unused by this library , eFileComment :: B.ByteString -- ^ File comment - unused by this library , eVersionMadeBy :: Word16 -- ^ Version made by field , eInternalFileAttributes :: Word16 -- ^ Internal file attributes - unused by this library , eExternalFileAttributes :: Word32 -- ^ External file attributes (system-dependent) , eCompressedData :: B.ByteString -- ^ Compressed contents of file } deriving (Read, Show, Eq) 7 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 8

Slide 8 text

How to read a zip • toArchive :: ByteString -> Archive Data.ByteString.Lazy.readFile "archive.zip" >>= return . toArchive 8 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 9

Slide 9 text

Diff http:/ /www.mergely.com/MF7ILoW3/ • Left: Without symlinks • Right: With symlinks 9 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 10

Slide 10 text

Non Symlink Entry With some omissions Entry { , eRelativePath = "hello/link_to_dir1/" , eCompressionMethod = NoCompression , eCRC32 = 0 , eCompressedSize = 0 -- ^ Size of the file, dirs have 0 size , eUncompressedSize = 0 , eExternalFileAttributes = 1106051088 -- ^ Permissions and other flags , eCompressedData = "" -- ^ Dirs have no compressed data } 10 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 11

Slide 11 text

Symlink Entry With some omissions Entry { , eRelativePath = "hello/link_to_dir1" , eCompressionMethod = NoCompression , eCRC32 = 3594983628 , eCompressedSize = 5 -- ^ The length in bytes of string representing the target , eUncompressedSize = 5 , eExternalFileAttributes = 2716663808 -- ^ Permissions and other flags like symlink flag! , eCompressedData = "dir1/" -- ^ The path to the target } 11 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 12

Slide 12 text

Differences (1) In zip without symlink preservation • One entry for the target • One entry with duplicated eCompressedData same as in target • For all intents and purposes this is another file • If symlink is to a dir, recurse and duplicate everything 12 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 13

Slide 13 text

Differences (2) In zip with symlink preservation • One entry for the target • One entry for the link • eCompressedData is the string representing the target • If symlink is to a dir, do not recurse 13 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 14

Slide 14 text

Behavior of zip-archive Both when archiving and unarchiving • No respect for symlinks • Always duplicates eCompressedData • thus encoding a new file • If symlink is to a dir, recurses into the dir 14 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 15

Slide 15 text

How Archiving works addFilesToArchive [OptRecursive] emptyArchive ["hello"] :: IO Archive This will internally call for each file readEntry :: [ZipOption] -> FilePath -> IO Entry Then finally serialized as binary 15 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 16

Slide 16 text

Archiving Changes -- | Options for 'addFilesToArchive' and 'extractFilesFromArchive'. data ZipOption = OptRecursive -- ^ Recurse into directories when adding files | OptVerbose -- ^ Print information to stderr | OptDestination FilePath -- ^ Directory in which to extract | OptLocation FilePath Bool -- ^ Where to place file when adding files and whether to append current path | OptPreserveSymbolicLinks -- ^ Preserve symbolic links as such. This option is ignored on Windows. deriving (Read, Show, Eq) addFilesToArchive :: [ZipOption] -> Archive -> [FilePath] -> IO Archive addFilesToArchive opts archive files = -- ^ don't recurse in dirs if OptPreserveSymbolicLinks `elem` opts -- | Generates a 'Entry' from a file or directory. readEntry :: [ZipOption] -> FilePath -> IO Entry readEntry opts file = -- ^ if OptPreserveSymbolicLinks `elem` opts -- strip "/" if FilePath is dir -- populate eCompressedData with targets of symlinks as string -- add symbolic link file mode. Extremetly important detail. 16 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 17

Slide 17 text

eExternalFileAttributes eExternalFileAttributes for Posix systems contains the file modes and permissions. For symlinks, add the symlink file mode from System.Posix.Files From readEntry: fs <- getSymbolicLinkStatus path let isSymLink = isSymbolicLink fs let fm = if isSymLink then unionFileModes symbolicLinkMode (fileMode fs) else fileMode f 17 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 18

Slide 18 text

All Done right? Not quire, changes are needed in extraction too 18 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 19

Slide 19 text

Unarchiving Changes extractFilesFromArchive :: [ZipOption] -> Archive -> IO () extractFilesFromArchive opts archive = -- ^ Separate symbolic entries from non symbolic entries. -- Call writeSymbolicLinkEntry on each Entry writeSymbolicLinkEntry :: [ZipOption] -> Entry -> IO () if (isEntrySymbolicLink entry) then do let targetPath = fromJust . symbolicLinkEntryTarget $ entry let symlinkPath = prefixPath eRelativePath entry createSymbolicLink targetPath symlinkPath else writeEntry opts entry symbolicLinkEntryTarget :: Entry -> Maybe FilePath symbolicLinkEntryTarget entry | isEntrySymbolicLink entry = Just . C.unpack $ fromEntry entry | otherwise = Nothing 19 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 20

Slide 20 text

Other challenges (1) • 2 Pull requests • Extraction was not implemented in phase one. No one noticed !. • ifdef _WINDOWS, ifndef _WINDOWS everywhere • Not supported on Windows " • directory == 1.2.1.0 for compatibility reasons 20 © Tommaso Piazza, 2018 - https:/ /github.com/blender

Slide 21

Slide 21 text

Other challenges (2) • Broke Hackage and Stackage • ! jgm/zip-archive/issues/38 • Added dir with symlinks for testing. Broke Hackage. • directory == 1.3.1 newer than LTS version. Broke Stackage. 21 © Tommaso Piazza, 2018 - https:/ /github.com/blender