Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Haskelll & Zip Archives - What I learned improving zip-archive
Search
Tommaso Piazza
January 24, 2018
Programming
0
170
Haskelll & Zip Archives - What I learned improving zip-archive
What I learned about zip archives, the zip-archive library and contributing to open source Haskell
Tommaso Piazza
January 24, 2018
Tweet
Share
More Decks by Tommaso Piazza
See All by Tommaso Piazza
From Code to Binary
tmspzz
0
22
Mach-O Mach-O
tmspzz
2
120
Haskell's Kind System - A Primer
tmspzz
0
170
Carthage Tips & Tricks
tmspzz
1
74
PSA: Carthage 0.29.0
tmspzz
0
220
Caching, a simple solution to speeding up build times
tmspzz
0
510
Rome - An S3 Cache for Carthage
tmspzz
0
200
Other Decks in Programming
See All in Programming
DMMプラットフォームがTiDB Cloudを採用した背景
pospome
7
2.7k
CQRS/ES avec Symfony, c’est (trop) bien !
jeremyfreeagent
1
630
Open Source Swift Workshop - Foundation and first party libraries
ikesyo
0
1.1k
Creating Retro-Style Photos Using Swift
ski
1
910
Prepare for Jakarta EE 11 - Performance and Developer Productivity
ivargrimstad
0
370
スクラムガイドのスプリントレトロスペクティブを改めて読みかえしてみた / Re-reading the Sprint Retrospective Section in the Scrum Guide
mackey0225
3
320
Code Reviews
bkuhlmann
4
860
脱・初心者!脱・マネコン!AWS CDKを使ってみませんか!?
har1101
0
300
[技育CAMPアカデミア]アイディアを形に!【超入門】スマホアプリ開発〜リリースまでの流れをご紹介
teamlab
PRO
0
330
Micro Frontends for Java Microservices - Devnexus 2024
mraible
PRO
0
410
From Spring Boot 2 to Spring Boot 3 with Java 22 and Jakarta EE
ivargrimstad
0
830
From Spring Boot 2 to Spring Boot 3 with Java 21 and Jakarta EE
ivargrimstad
0
1.1k
Featured
See All Featured
Intergalactic Javascript Robots from Outer Space
tanoku
266
26k
Writing Fast Ruby
sferik
619
60k
How to train your dragon (web standard)
notwaldorf
71
5.1k
What's in a price? How to price your products and services
michaelherold
237
11k
Building a Modern Day E-commerce SEO Strategy
aleyda
16
6.3k
It's Worth the Effort
3n
180
27k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
320
20k
Automating Front-end Workflow
addyosmani
1354
200k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
352
28k
What the flash - Photography Introduction
edds
64
11k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
355
22k
How GitHub Uses GitHub to Build GitHub
holman
468
290k
Transcript
Haskell & Zip files What I learned while improving zip-archive
Tommaso Piazza @tmpz 1 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The library The zip-archive ! library provides functions for creating,
modifying, and extracting files from zip archives. • Created by John MacFarlane ! author of pandoc ! • Very straight forward, ~1000 LOC 2 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The Problem • zip and unzip support archiving and extracting
symlinks • zip-archive did not Python zipfile ! does not support symlinks. 3 © Tommaso Piazza, 2018 - https:/ /github.com/blender
What is a symlink anyways? • Magic? No, just a
file (with some special attributes) • Contains relative or absolute path to target as text $ ls total 32 drwxr-xr-x 3 blender staff 102B Jan 23 18:48 dir1 -rw-r--r-- 1 blender staff 15B Jan 23 18:46 file1 lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_dir1 -> dir1/ lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_file1 -> file1 lrwxr-xr-x 1 blender staff 16B Jan 23 18:49 link_to_file_in_dir -> dir1/file_in_dir $ readlink link_to_file1 file1 4 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The plan Steps suggested by Matthias at the last meetup:
1. Find out how a zip archive is represented in zip-archive 2. zip -r -q --symlinks a directory with symlinks 3. zip -r -q the same directory 4. Compare archives 5. Profit ⭐⭐⭐⭐⭐ Solid plan. Would follow again. -- A satisfied developer 5 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Codec.Archive.Zip.Archive • ! Zip Archive binary format document • !
Helpful reference 1 • ! Helpful reference 2 data Archive = Archive { zEntries :: [Entry] -- ^ Files in zip archive , zSignature :: Maybe B.ByteString -- ^ Digital signature , zComment :: B.ByteString -- ^ Comment for whole zip archive } deriving (Read, Show) 6 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Codec.Archive.Zip.Entry data Entry = Entry { eRelativePath :: FilePath --
^ Relative path, using '/' as separator , eCompressionMethod :: CompressionMethod -- ^ Compression method , eLastModified :: Integer -- ^ Modification time (seconds since unix epoch) , eCRC32 :: Word32 -- ^ CRC32 checksum , eCompressedSize :: Word32 -- ^ Compressed size in bytes , eUncompressedSize :: Word32 -- ^ Uncompressed size in bytes , eExtraField :: B.ByteString -- ^ Extra field - unused by this library , eFileComment :: B.ByteString -- ^ File comment - unused by this library , eVersionMadeBy :: Word16 -- ^ Version made by field , eInternalFileAttributes :: Word16 -- ^ Internal file attributes - unused by this library , eExternalFileAttributes :: Word32 -- ^ External file attributes (system-dependent) , eCompressedData :: B.ByteString -- ^ Compressed contents of file } deriving (Read, Show, Eq) 7 © Tommaso Piazza, 2018 - https:/ /github.com/blender
How to read a zip • toArchive :: ByteString ->
Archive Data.ByteString.Lazy.readFile "archive.zip" >>= return . toArchive 8 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Diff http:/ /www.mergely.com/MF7ILoW3/ • Left: Without symlinks • Right: With
symlinks 9 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Non Symlink Entry With some omissions Entry { , eRelativePath
= "hello/link_to_dir1/" , eCompressionMethod = NoCompression , eCRC32 = 0 , eCompressedSize = 0 -- ^ Size of the file, dirs have 0 size , eUncompressedSize = 0 , eExternalFileAttributes = 1106051088 -- ^ Permissions and other flags , eCompressedData = "" -- ^ Dirs have no compressed data } 10 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Symlink Entry With some omissions Entry { , eRelativePath =
"hello/link_to_dir1" , eCompressionMethod = NoCompression , eCRC32 = 3594983628 , eCompressedSize = 5 -- ^ The length in bytes of string representing the target , eUncompressedSize = 5 , eExternalFileAttributes = 2716663808 -- ^ Permissions and other flags like symlink flag! , eCompressedData = "dir1/" -- ^ The path to the target } 11 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Differences (1) In zip without symlink preservation • One entry
for the target • One entry with duplicated eCompressedData same as in target • For all intents and purposes this is another file • If symlink is to a dir, recurse and duplicate everything 12 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Differences (2) In zip with symlink preservation • One entry
for the target • One entry for the link • eCompressedData is the string representing the target • If symlink is to a dir, do not recurse 13 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Behavior of zip-archive Both when archiving and unarchiving • No
respect for symlinks • Always duplicates eCompressedData • thus encoding a new file • If symlink is to a dir, recurses into the dir 14 © Tommaso Piazza, 2018 - https:/ /github.com/blender
How Archiving works addFilesToArchive [OptRecursive] emptyArchive ["hello"] :: IO Archive
This will internally call for each file readEntry :: [ZipOption] -> FilePath -> IO Entry Then finally serialized as binary 15 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Archiving Changes -- | Options for 'addFilesToArchive' and 'extractFilesFromArchive'. data
ZipOption = OptRecursive -- ^ Recurse into directories when adding files | OptVerbose -- ^ Print information to stderr | OptDestination FilePath -- ^ Directory in which to extract | OptLocation FilePath Bool -- ^ Where to place file when adding files and whether to append current path | OptPreserveSymbolicLinks -- ^ Preserve symbolic links as such. This option is ignored on Windows. deriving (Read, Show, Eq) addFilesToArchive :: [ZipOption] -> Archive -> [FilePath] -> IO Archive addFilesToArchive opts archive files = -- ^ don't recurse in dirs if OptPreserveSymbolicLinks `elem` opts -- | Generates a 'Entry' from a file or directory. readEntry :: [ZipOption] -> FilePath -> IO Entry readEntry opts file = -- ^ if OptPreserveSymbolicLinks `elem` opts -- strip "/" if FilePath is dir -- populate eCompressedData with targets of symlinks as string -- add symbolic link file mode. Extremetly important detail. 16 © Tommaso Piazza, 2018 - https:/ /github.com/blender
eExternalFileAttributes eExternalFileAttributes for Posix systems contains the file modes and
permissions. For symlinks, add the symlink file mode from System.Posix.Files From readEntry: fs <- getSymbolicLinkStatus path let isSymLink = isSymbolicLink fs let fm = if isSymLink then unionFileModes symbolicLinkMode (fileMode fs) else fileMode f 17 © Tommaso Piazza, 2018 - https:/ /github.com/blender
All Done right? Not quire, changes are needed in extraction
too 18 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Unarchiving Changes extractFilesFromArchive :: [ZipOption] -> Archive -> IO ()
extractFilesFromArchive opts archive = -- ^ Separate symbolic entries from non symbolic entries. -- Call writeSymbolicLinkEntry on each Entry writeSymbolicLinkEntry :: [ZipOption] -> Entry -> IO () if (isEntrySymbolicLink entry) then do let targetPath = fromJust . symbolicLinkEntryTarget $ entry let symlinkPath = prefixPath </> eRelativePath entry createSymbolicLink targetPath symlinkPath else writeEntry opts entry symbolicLinkEntryTarget :: Entry -> Maybe FilePath symbolicLinkEntryTarget entry | isEntrySymbolicLink entry = Just . C.unpack $ fromEntry entry | otherwise = Nothing 19 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Other challenges (1) • 2 Pull requests • Extraction was
not implemented in phase one. No one noticed !. • ifdef _WINDOWS, ifndef _WINDOWS everywhere • Not supported on Windows " • directory == 1.2.1.0 for compatibility reasons 20 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Other challenges (2) • Broke Hackage and Stackage • !
jgm/zip-archive/issues/38 • Added dir with symlinks for testing. Broke Hackage. • directory == 1.3.1 newer than LTS version. Broke Stackage. 21 © Tommaso Piazza, 2018 - https:/ /github.com/blender