Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Haskelll & Zip Archives - What I learned improv...
Search
Tommaso Piazza
January 24, 2018
Programming
0
190
Haskelll & Zip Archives - What I learned improving zip-archive
What I learned about zip archives, the zip-archive library and contributing to open source Haskell
Tommaso Piazza
January 24, 2018
Tweet
Share
More Decks by Tommaso Piazza
See All by Tommaso Piazza
From Code to Binary
tmspzz
0
26
Mach-O Mach-O
tmspzz
2
120
Haskell's Kind System - A Primer
tmspzz
0
180
Carthage Tips & Tricks
tmspzz
1
84
PSA: Carthage 0.29.0
tmspzz
0
220
Caching, a simple solution to speeding up build times
tmspzz
0
600
Rome - An S3 Cache for Carthage
tmspzz
0
220
Other Decks in Programming
See All in Programming
セキュリティマネジャー廃止とクラウドネイティブ型サンドボックス活用
kazumura
1
170
生成AIコーディングとの向き合い方、AIと共創するという考え方 / How to deal with generative AI coding and the concept of co-creating with AI
seike460
PRO
1
190
Benchmark
sysong
0
140
Javaのルールをねじ曲げろ!禁断の操作とその代償から学ぶメタプログラミング入門 / A Guide to Metaprogramming: Lessons from Forbidden Techniques and Their Price
nrslib
3
1.9k
RubyKaigiで得られる10の価値 〜Ruby話を聞くことだけが RubyKaigiじゃない〜
tomohiko9090
0
140
Haskell でアルゴリズムを抽象化する / 関数型言語で競技プログラミング
naoya
17
4.1k
Prism.parseで 300本以上あるエンドポイントに 接続できる権限の一覧表を作ってみた
hatsu38
1
110
Datadog RUM 本番導入までの道
shinter61
1
260
Perlで痩せる
yuukis
1
680
「ElixirでIoT!!」のこれまでとこれから
takasehideki
0
350
Gleamという選択肢
comamoca
6
700
UPDATEがシステムを複雑にする? イミュータブルデータモデルのすすめ
shimomura
1
530
Featured
See All Featured
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Designing for Performance
lara
609
69k
Facilitating Awesome Meetings
lara
54
6.4k
How to train your dragon (web standard)
notwaldorf
92
6.1k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
8
650
The Art of Delivering Value - GDevCon NA Keynote
reverentgeek
14
1.5k
It's Worth the Effort
3n
184
28k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
5.8k
How STYLIGHT went responsive
nonsquared
100
5.6k
The World Runs on Bad Software
bkeepers
PRO
68
11k
JavaScript: Past, Present, and Future - NDC Porto 2020
reverentgeek
48
5.4k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
20
1.3k
Transcript
Haskell & Zip files What I learned while improving zip-archive
Tommaso Piazza @tmpz 1 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The library The zip-archive ! library provides functions for creating,
modifying, and extracting files from zip archives. • Created by John MacFarlane ! author of pandoc ! • Very straight forward, ~1000 LOC 2 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The Problem • zip and unzip support archiving and extracting
symlinks • zip-archive did not Python zipfile ! does not support symlinks. 3 © Tommaso Piazza, 2018 - https:/ /github.com/blender
What is a symlink anyways? • Magic? No, just a
file (with some special attributes) • Contains relative or absolute path to target as text $ ls total 32 drwxr-xr-x 3 blender staff 102B Jan 23 18:48 dir1 -rw-r--r-- 1 blender staff 15B Jan 23 18:46 file1 lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_dir1 -> dir1/ lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_file1 -> file1 lrwxr-xr-x 1 blender staff 16B Jan 23 18:49 link_to_file_in_dir -> dir1/file_in_dir $ readlink link_to_file1 file1 4 © Tommaso Piazza, 2018 - https:/ /github.com/blender
The plan Steps suggested by Matthias at the last meetup:
1. Find out how a zip archive is represented in zip-archive 2. zip -r -q --symlinks a directory with symlinks 3. zip -r -q the same directory 4. Compare archives 5. Profit ⭐⭐⭐⭐⭐ Solid plan. Would follow again. -- A satisfied developer 5 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Codec.Archive.Zip.Archive • ! Zip Archive binary format document • !
Helpful reference 1 • ! Helpful reference 2 data Archive = Archive { zEntries :: [Entry] -- ^ Files in zip archive , zSignature :: Maybe B.ByteString -- ^ Digital signature , zComment :: B.ByteString -- ^ Comment for whole zip archive } deriving (Read, Show) 6 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Codec.Archive.Zip.Entry data Entry = Entry { eRelativePath :: FilePath --
^ Relative path, using '/' as separator , eCompressionMethod :: CompressionMethod -- ^ Compression method , eLastModified :: Integer -- ^ Modification time (seconds since unix epoch) , eCRC32 :: Word32 -- ^ CRC32 checksum , eCompressedSize :: Word32 -- ^ Compressed size in bytes , eUncompressedSize :: Word32 -- ^ Uncompressed size in bytes , eExtraField :: B.ByteString -- ^ Extra field - unused by this library , eFileComment :: B.ByteString -- ^ File comment - unused by this library , eVersionMadeBy :: Word16 -- ^ Version made by field , eInternalFileAttributes :: Word16 -- ^ Internal file attributes - unused by this library , eExternalFileAttributes :: Word32 -- ^ External file attributes (system-dependent) , eCompressedData :: B.ByteString -- ^ Compressed contents of file } deriving (Read, Show, Eq) 7 © Tommaso Piazza, 2018 - https:/ /github.com/blender
How to read a zip • toArchive :: ByteString ->
Archive Data.ByteString.Lazy.readFile "archive.zip" >>= return . toArchive 8 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Diff http:/ /www.mergely.com/MF7ILoW3/ • Left: Without symlinks • Right: With
symlinks 9 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Non Symlink Entry With some omissions Entry { , eRelativePath
= "hello/link_to_dir1/" , eCompressionMethod = NoCompression , eCRC32 = 0 , eCompressedSize = 0 -- ^ Size of the file, dirs have 0 size , eUncompressedSize = 0 , eExternalFileAttributes = 1106051088 -- ^ Permissions and other flags , eCompressedData = "" -- ^ Dirs have no compressed data } 10 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Symlink Entry With some omissions Entry { , eRelativePath =
"hello/link_to_dir1" , eCompressionMethod = NoCompression , eCRC32 = 3594983628 , eCompressedSize = 5 -- ^ The length in bytes of string representing the target , eUncompressedSize = 5 , eExternalFileAttributes = 2716663808 -- ^ Permissions and other flags like symlink flag! , eCompressedData = "dir1/" -- ^ The path to the target } 11 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Differences (1) In zip without symlink preservation • One entry
for the target • One entry with duplicated eCompressedData same as in target • For all intents and purposes this is another file • If symlink is to a dir, recurse and duplicate everything 12 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Differences (2) In zip with symlink preservation • One entry
for the target • One entry for the link • eCompressedData is the string representing the target • If symlink is to a dir, do not recurse 13 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Behavior of zip-archive Both when archiving and unarchiving • No
respect for symlinks • Always duplicates eCompressedData • thus encoding a new file • If symlink is to a dir, recurses into the dir 14 © Tommaso Piazza, 2018 - https:/ /github.com/blender
How Archiving works addFilesToArchive [OptRecursive] emptyArchive ["hello"] :: IO Archive
This will internally call for each file readEntry :: [ZipOption] -> FilePath -> IO Entry Then finally serialized as binary 15 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Archiving Changes -- | Options for 'addFilesToArchive' and 'extractFilesFromArchive'. data
ZipOption = OptRecursive -- ^ Recurse into directories when adding files | OptVerbose -- ^ Print information to stderr | OptDestination FilePath -- ^ Directory in which to extract | OptLocation FilePath Bool -- ^ Where to place file when adding files and whether to append current path | OptPreserveSymbolicLinks -- ^ Preserve symbolic links as such. This option is ignored on Windows. deriving (Read, Show, Eq) addFilesToArchive :: [ZipOption] -> Archive -> [FilePath] -> IO Archive addFilesToArchive opts archive files = -- ^ don't recurse in dirs if OptPreserveSymbolicLinks `elem` opts -- | Generates a 'Entry' from a file or directory. readEntry :: [ZipOption] -> FilePath -> IO Entry readEntry opts file = -- ^ if OptPreserveSymbolicLinks `elem` opts -- strip "/" if FilePath is dir -- populate eCompressedData with targets of symlinks as string -- add symbolic link file mode. Extremetly important detail. 16 © Tommaso Piazza, 2018 - https:/ /github.com/blender
eExternalFileAttributes eExternalFileAttributes for Posix systems contains the file modes and
permissions. For symlinks, add the symlink file mode from System.Posix.Files From readEntry: fs <- getSymbolicLinkStatus path let isSymLink = isSymbolicLink fs let fm = if isSymLink then unionFileModes symbolicLinkMode (fileMode fs) else fileMode f 17 © Tommaso Piazza, 2018 - https:/ /github.com/blender
All Done right? Not quire, changes are needed in extraction
too 18 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Unarchiving Changes extractFilesFromArchive :: [ZipOption] -> Archive -> IO ()
extractFilesFromArchive opts archive = -- ^ Separate symbolic entries from non symbolic entries. -- Call writeSymbolicLinkEntry on each Entry writeSymbolicLinkEntry :: [ZipOption] -> Entry -> IO () if (isEntrySymbolicLink entry) then do let targetPath = fromJust . symbolicLinkEntryTarget $ entry let symlinkPath = prefixPath </> eRelativePath entry createSymbolicLink targetPath symlinkPath else writeEntry opts entry symbolicLinkEntryTarget :: Entry -> Maybe FilePath symbolicLinkEntryTarget entry | isEntrySymbolicLink entry = Just . C.unpack $ fromEntry entry | otherwise = Nothing 19 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Other challenges (1) • 2 Pull requests • Extraction was
not implemented in phase one. No one noticed !. • ifdef _WINDOWS, ifndef _WINDOWS everywhere • Not supported on Windows " • directory == 1.2.1.0 for compatibility reasons 20 © Tommaso Piazza, 2018 - https:/ /github.com/blender
Other challenges (2) • Broke Hackage and Stackage • !
jgm/zip-archive/issues/38 • Added dir with symlinks for testing. Broke Hackage. • directory == 1.3.1 newer than LTS version. Broke Stackage. 21 © Tommaso Piazza, 2018 - https:/ /github.com/blender