Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Haskelll & Zip Archives - What I learned improv...
Search
Tommaso Piazza
January 24, 2018
Programming
0
190
Haskelll & Zip Archives - What I learned improving zip-archive
What I learned about zip archives, the zip-archive library and contributing to open source Haskell
Tommaso Piazza
January 24, 2018
Tweet
Share
More Decks by Tommaso Piazza
See All by Tommaso Piazza
From Code to Binary
tmspzz
0
25
Mach-O Mach-O
tmspzz
2
120
Haskell's Kind System - A Primer
tmspzz
0
180
Carthage Tips & Tricks
tmspzz
1
82
PSA: Carthage 0.29.0
tmspzz
0
220
Caching, a simple solution to speeding up build times
tmspzz
0
590
Rome - An S3 Cache for Carthage
tmspzz
0
220
Other Decks in Programming
See All in Programming
Duke on CRaC with Jakarta EE
ivargrimstad
0
410
Djangoにおける複数ユーザー種別認証の設計アプローチ@DjangoCongress JP 2025
delhi09
PRO
4
520
TCAを用いたAmebaのリアーキテクチャ
dazy
0
250
CDK開発におけるコーディング規約の運用
yamanashi_ren01
2
270
オレを救った Cline を紹介する
codehex
16
15k
Jasprが凄い話
hyshu
0
200
フロントエンドオブザーバビリティ on Google Cloud
yunosukey
0
110
15分で学ぶDuckDBの可愛い使い方 DuckDBの最近の更新
notrogue
3
870
React 19アップデートのために必要なこと
uhyo
8
1.6k
Django NinjaによるAPI開発の効率化とリプレースの実践
kashewnuts
1
310
PEPCは何を変えようとしていたのか
ken7253
3
320
良いコードレビューとは
danimal141
10
9.5k
Featured
See All Featured
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
7
660
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
21
2.5k
Adopting Sorbet at Scale
ufuk
75
9.2k
Building a Scalable Design System with Sketch
lauravandoore
462
33k
KATA
mclloyd
29
14k
Making Projects Easy
brettharned
116
6.1k
A better future with KSS
kneath
238
17k
Done Done
chrislema
182
16k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
227
22k
Rebuilding a faster, lazier Slack
samanthasiow
80
8.9k
Being A Developer After 40
akosma
89
590k
Designing Experiences People Love
moore
140
23k
Transcript
Haskell & Zip files What I learned while improving zip-archive
Tommaso Piazza @tmpz 1 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The library The zip-archive ! library provides functions for creating,
modifying, and extracting files from zip archives. • Created by John MacFarlane ! author of pandoc ! • Very straight forward, ~1000 LOC 2 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The Problem • zip and unzip support archiving and extracting
symlinks • zip-archive did not Python zipfile ! does not support symlinks. 3 © Tommaso Piazza, 2018 - https:/ /github.com/blender
What is a symlink anyways? • Magic? No, just a
file (with some special attributes) • Contains relative or absolute path to target as text $ ls total 32 drwxr-xr-x 3 blender staff 102B Jan 23 18:48 dir1 -rw-r--r-- 1 blender staff 15B Jan 23 18:46 file1 lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_dir1 -> dir1/ lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_file1 -> file1 lrwxr-xr-x 1 blender staff 16B Jan 23 18:49 link_to_file_in_dir -> dir1/file_in_dir $ readlink link_to_file1 file1 4 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The plan Steps suggested by Matthias at the last meetup:
1. Find out how a zip archive is represented in zip-archive 2. zip -r -q --symlinks a directory with symlinks 3. zip -r -q the same directory 4. Compare archives 5. Profit ⭐⭐⭐⭐⭐ Solid plan. Would follow again. -- A satisfied developer 5 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Codec.Archive.Zip.Archive • ! Zip Archive binary format document • !
Helpful reference 1 • ! Helpful reference 2 data Archive = Archive { zEntries :: [Entry] -- ^ Files in zip archive , zSignature :: Maybe B.ByteString -- ^ Digital signature , zComment :: B.ByteString -- ^ Comment for whole zip archive } deriving (Read, Show) 6 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Codec.Archive.Zip.Entry data Entry = Entry { eRelativePath :: FilePath --
^ Relative path, using '/' as separator , eCompressionMethod :: CompressionMethod -- ^ Compression method , eLastModified :: Integer -- ^ Modification time (seconds since unix epoch) , eCRC32 :: Word32 -- ^ CRC32 checksum , eCompressedSize :: Word32 -- ^ Compressed size in bytes , eUncompressedSize :: Word32 -- ^ Uncompressed size in bytes , eExtraField :: B.ByteString -- ^ Extra field - unused by this library , eFileComment :: B.ByteString -- ^ File comment - unused by this library , eVersionMadeBy :: Word16 -- ^ Version made by field , eInternalFileAttributes :: Word16 -- ^ Internal file attributes - unused by this library , eExternalFileAttributes :: Word32 -- ^ External file attributes (system-dependent) , eCompressedData :: B.ByteString -- ^ Compressed contents of file } deriving (Read, Show, Eq) 7 © Tommaso Piazza, 2018 - https:/ /github.com/blender
How to read a zip • toArchive :: ByteString ->
Archive Data.ByteString.Lazy.readFile "archive.zip" >>= return . toArchive 8 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Diff http:/ /www.mergely.com/MF7ILoW3/ • Left: Without symlinks • Right: With
symlinks 9 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Non Symlink Entry With some omissions Entry { , eRelativePath
= "hello/link_to_dir1/" , eCompressionMethod = NoCompression , eCRC32 = 0 , eCompressedSize = 0 -- ^ Size of the file, dirs have 0 size , eUncompressedSize = 0 , eExternalFileAttributes = 1106051088 -- ^ Permissions and other flags , eCompressedData = "" -- ^ Dirs have no compressed data } 10 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Symlink Entry With some omissions Entry { , eRelativePath =
"hello/link_to_dir1" , eCompressionMethod = NoCompression , eCRC32 = 3594983628 , eCompressedSize = 5 -- ^ The length in bytes of string representing the target , eUncompressedSize = 5 , eExternalFileAttributes = 2716663808 -- ^ Permissions and other flags like symlink flag! , eCompressedData = "dir1/" -- ^ The path to the target } 11 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Differences (1) In zip without symlink preservation • One entry
for the target • One entry with duplicated eCompressedData same as in target • For all intents and purposes this is another file • If symlink is to a dir, recurse and duplicate everything 12 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Differences (2) In zip with symlink preservation • One entry
for the target • One entry for the link • eCompressedData is the string representing the target • If symlink is to a dir, do not recurse 13 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Behavior of zip-archive Both when archiving and unarchiving • No
respect for symlinks • Always duplicates eCompressedData • thus encoding a new file • If symlink is to a dir, recurses into the dir 14 © Tommaso Piazza, 2018 - https:/ /github.com/blender
How Archiving works addFilesToArchive [OptRecursive] emptyArchive ["hello"] :: IO Archive
This will internally call for each file readEntry :: [ZipOption] -> FilePath -> IO Entry Then finally serialized as binary 15 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Archiving Changes -- | Options for 'addFilesToArchive' and 'extractFilesFromArchive'. data
ZipOption = OptRecursive -- ^ Recurse into directories when adding files | OptVerbose -- ^ Print information to stderr | OptDestination FilePath -- ^ Directory in which to extract | OptLocation FilePath Bool -- ^ Where to place file when adding files and whether to append current path | OptPreserveSymbolicLinks -- ^ Preserve symbolic links as such. This option is ignored on Windows. deriving (Read, Show, Eq) addFilesToArchive :: [ZipOption] -> Archive -> [FilePath] -> IO Archive addFilesToArchive opts archive files = -- ^ don't recurse in dirs if OptPreserveSymbolicLinks `elem` opts -- | Generates a 'Entry' from a file or directory. readEntry :: [ZipOption] -> FilePath -> IO Entry readEntry opts file = -- ^ if OptPreserveSymbolicLinks `elem` opts -- strip "/" if FilePath is dir -- populate eCompressedData with targets of symlinks as string -- add symbolic link file mode. Extremetly important detail. 16 © Tommaso Piazza, 2018 - https:/ /github.com/blender
eExternalFileAttributes eExternalFileAttributes for Posix systems contains the file modes and
permissions. For symlinks, add the symlink file mode from System.Posix.Files From readEntry: fs <- getSymbolicLinkStatus path let isSymLink = isSymbolicLink fs let fm = if isSymLink then unionFileModes symbolicLinkMode (fileMode fs) else fileMode f 17 © Tommaso Piazza, 2018 - https:/ /github.com/blender
All Done right? Not quire, changes are needed in extraction
too 18 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Unarchiving Changes extractFilesFromArchive :: [ZipOption] -> Archive -> IO ()
extractFilesFromArchive opts archive = -- ^ Separate symbolic entries from non symbolic entries. -- Call writeSymbolicLinkEntry on each Entry writeSymbolicLinkEntry :: [ZipOption] -> Entry -> IO () if (isEntrySymbolicLink entry) then do let targetPath = fromJust . symbolicLinkEntryTarget $ entry let symlinkPath = prefixPath </> eRelativePath entry createSymbolicLink targetPath symlinkPath else writeEntry opts entry symbolicLinkEntryTarget :: Entry -> Maybe FilePath symbolicLinkEntryTarget entry | isEntrySymbolicLink entry = Just . C.unpack $ fromEntry entry | otherwise = Nothing 19 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Other challenges (1) • 2 Pull requests • Extraction was
not implemented in phase one. No one noticed !. • ifdef _WINDOWS, ifndef _WINDOWS everywhere • Not supported on Windows " • directory == 1.2.1.0 for compatibility reasons 20 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Other challenges (2) • Broke Hackage and Stackage • !
jgm/zip-archive/issues/38 • Added dir with symlinks for testing. Broke Hackage. • directory == 1.3.1 newer than LTS version. Broke Stackage. 21 © Tommaso Piazza, 2018 - https:/ /github.com/blender