Tommaso Piazza
January 24, 2018
Haskell & Zip Archives - What I learned improving zip-archive
What I learned about zip archives, the zip-archive library and contributing to open source Haskell
Transcript
Haskell & Zip files: What I learned while improving zip-archive
Tommaso Piazza @tmpz
© Tommaso Piazza, 2018 - https://github.com/blender

The library
The zip-archive library provides functions for creating, modifying, and extracting files from zip archives.
• Created by John MacFarlane, author of pandoc
• Very straightforward, ~1000 LOC

The Problem
• zip and unzip support archiving and extracting symlinks
• zip-archive did not
Python's zipfile does not support symlinks either.

What is a symlink anyways?
• Magic? No, just a file (with some special attributes)
• Contains the relative or absolute path to the target as text
    $ ls
    total 32
    drwxr-xr-x 3 blender staff 102B Jan 23 18:48 dir1
    -rw-r--r-- 1 blender staff 15B Jan 23 18:46 file1
    lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_dir1 -> dir1/
    lrwxr-xr-x 1 blender staff 5B Jan 23 18:49 link_to_file1 -> file1
    lrwxr-xr-x 1 blender staff 16B Jan 23 18:49 link_to_file_in_dir -> dir1/file_in_dir
    $ readlink link_to_file1
    file1
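
The same observation can be reproduced from Haskell. This is a minimal sketch using System.Posix.Files from the unix package (POSIX systems only); the function name linkRoundTrip is made up for illustration. It creates a link, checks it with an lstat-style call, reads back the stored text, and removes the link again. The target does not need to exist, which is one way to see that the link only stores a path as text.

```haskell
import System.Posix.Files
  ( createSymbolicLink
  , getSymbolicLinkStatus
  , isSymbolicLink
  , readSymbolicLink
  , removeLink
  )

-- | Hypothetical helper: create a symlink at 'linkPath' whose content is
-- 'target', inspect it, then remove it again.
-- Returns (is it a symlink?, the path text stored in the link).
linkRoundTrip :: FilePath -> FilePath -> IO (Bool, FilePath)
linkRoundTrip target linkPath = do
  createSymbolicLink target linkPath
  status <- getSymbolicLinkStatus linkPath  -- like lstat: does not follow the link
  stored <- readSymbolicLink linkPath       -- like the `readlink` command
  removeLink linkPath                       -- removes the link itself, not the target
  pure (isSymbolicLink status, stored)
```

Calling linkRoundTrip "file1" "link_to_file1" in an empty directory mirrors the readlink call on the slide.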

The plan
Steps suggested by Matthias at the last meetup:
1. Find out how a zip archive is represented in zip-archive
2. zip -r -q --symlinks a directory with symlinks
3. zip -r -q the same directory
4. Compare archives
5. Profit
⭐⭐⭐⭐⭐ Solid plan. Would follow again. -- A satisfied developer

Codec.Archive.Zip.Archive
• Zip Archive binary format document
• Helpful reference 1
• Helpful reference 2
    data Archive = Archive
      { zEntries   :: [Entry]            -- ^ Files in zip archive
      , zSignature :: Maybe B.ByteString -- ^ Digital signature
      , zComment   :: B.ByteString       -- ^ Comment for whole zip archive
      } deriving (Read, Show)

Codec.Archive.Zip.Entry
    data Entry = Entry
      { eRelativePath           :: FilePath          -- ^ Relative path, using '/' as separator
      , eCompressionMethod      :: CompressionMethod -- ^ Compression method
      , eLastModified           :: Integer           -- ^ Modification time (seconds since unix epoch)
      , eCRC32                  :: Word32            -- ^ CRC32 checksum
      , eCompressedSize         :: Word32            -- ^ Compressed size in bytes
      , eUncompressedSize       :: Word32            -- ^ Uncompressed size in bytes
      , eExtraField             :: B.ByteString      -- ^ Extra field - unused by this library
      , eFileComment            :: B.ByteString      -- ^ File comment - unused by this library
      , eVersionMadeBy          :: Word16            -- ^ Version made by field
      , eInternalFileAttributes :: Word16            -- ^ Internal file attributes - unused by this library
      , eExternalFileAttributes :: Word32            -- ^ External file attributes (system-dependent)
      , eCompressedData         :: B.ByteString      -- ^ Compressed contents of file
      } deriving (Read, Show, Eq)

How to read a zip
• toArchive :: ByteString -> Archive
    -- Read the file as a lazy ByteString and parse it into an Archive
    toArchive <$> Data.ByteString.Lazy.readFile "archive.zip"

Diff
http://www.mergely.com/MF7ILoW3/
• Left: Without symlinks
• Right: With symlinks

Non Symlink Entry
With some omissions
    Entry
      { eRelativePath = "hello/link_to_dir1/"
      , eCompressionMethod = NoCompression
      , eCRC32 = 0
      , eCompressedSize = 0 -- ^ Size of the file, dirs have 0 size
      , eUncompressedSize = 0
      , eExternalFileAttributes = 1106051088 -- ^ Permissions and other flags
      , eCompressedData = "" -- ^ Dirs have no compressed data
      }

Symlink Entry
With some omissions
    Entry
      { eRelativePath = "hello/link_to_dir1"
      , eCompressionMethod = NoCompression
      , eCRC32 = 3594983628
      , eCompressedSize = 5 -- ^ The length in bytes of the string representing the target
      , eUncompressedSize = 5
      , eExternalFileAttributes = 2716663808 -- ^ Permissions and other flags, like the symlink flag!
      , eCompressedData = "dir1/" -- ^ The path to the target
      }

Differences (1)
In a zip without symlink preservation
• One entry for the target
• One entry with duplicated eCompressedData, same as in the target
• For all intents and purposes this is another file
• If the symlink is to a dir, recurse and duplicate everything

Differences (2)
In a zip with symlink preservation
• One entry for the target
• One entry for the link
• eCompressedData is the string representing the target
• If the symlink is to a dir, do not recurse
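
One way to tell the link entry apart from an ordinary one is to decode eExternalFileAttributes. The sketch below assumes the common Info-ZIP convention that, for archives made on Unix, the upper 16 bits hold the Unix file mode; the slides do not spell out this layout, and all function names are made up for illustration. It uses only base and is applied to the two attribute values from the entries shown earlier.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32)

-- | Unix mode, assumed to live in the upper 16 bits (Info-ZIP convention).
unixMode :: Word32 -> Word32
unixMode attrs = attrs `shiftR` 16

-- | File-type bits of a Unix mode (the S_IFMT mask, 0o170000).
fileType :: Word32 -> Word32
fileType mode = mode .&. 0o170000

-- | True when the mode's file type is S_IFLNK (0o120000).
isSymlinkAttrs :: Word32 -> Bool
isSymlinkAttrs attrs = fileType (unixMode attrs) == 0o120000

-- | True when the mode's file type is S_IFDIR (0o040000).
isDirAttrs :: Word32 -> Bool
isDirAttrs attrs = fileType (unixMode attrs) == 0o040000

-- | Permission bits (rwx for user/group/other plus setuid/setgid/sticky).
permissions :: Word32 -> Word32
permissions attrs = unixMode attrs .&. 0o7777
```

Under that assumption, 2716663808 decodes to a symlink with permissions 0o755 (matching lrwxr-xr-x in the listing), and 1106051088 decodes to a directory with permissions 0o755.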

Behavior of zip-archive
Both when archiving and unarchiving
• No respect for symlinks
• Always duplicates eCompressedData, thus encoding a new file
• If the symlink is to a dir, recurses into the dir

How Archiving works
    addFilesToArchive [OptRecursive] emptyArchive ["hello"] :: IO Archive
This will internally call, for each file,
    readEntry :: [ZipOption] -> FilePath -> IO Entry
The result is then finally serialized as binary.

Archiving Changes
    -- | Options for 'addFilesToArchive' and 'extractFilesFromArchive'.
    data ZipOption
      = OptRecursive               -- ^ Recurse into directories when adding files
      | OptVerbose                 -- ^ Print information to stderr
      | OptDestination FilePath    -- ^ Directory in which to extract
      | OptLocation FilePath Bool  -- ^ Where to place file when adding files and whether to append current path
      | OptPreserveSymbolicLinks   -- ^ Preserve symbolic links as such. This option is ignored on Windows.
      deriving (Read, Show, Eq)

    addFilesToArchive :: [ZipOption] -> Archive -> [FilePath] -> IO Archive
    addFilesToArchive opts archive files =
      -- don't recurse into dirs if OptPreserveSymbolicLinks `elem` opts

    -- | Generates an 'Entry' from a file or directory.
    readEntry :: [ZipOption] -> FilePath -> IO Entry
    readEntry opts file =
      -- if OptPreserveSymbolicLinks `elem` opts:
      --   strip the trailing "/" if FilePath is a dir
      --   populate eCompressedData with the symlink target as a string
      --   add the symbolic link file mode (extremely important detail)

eExternalFileAttributes
eExternalFileAttributes for Posix systems contains the file modes and permissions.
For symlinks, add the symlink file mode from System.Posix.Files.
From readEntry:
    -- lstat-style call: describes the link itself, not its target
    fs <- getSymbolicLinkStatus path
    let isSymLink = isSymbolicLink fs
    let fm = if isSymLink
               then unionFileModes symbolicLinkMode (fileMode fs)
               else fileMode fs
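
To make the role of the symlink file mode concrete, here is a small sketch of how such a mode could end up in eExternalFileAttributes. It assumes the mode is stored in the upper 16 bits of the field (the Info-ZIP convention for Unix); the slides do not show this step, and externalAttrs is a hypothetical name, not a zip-archive function. It needs the unix package for symbolicLinkMode and unionFileModes.

```haskell
import Data.Bits (shiftL)
import Data.Word (Word32)
import System.Posix.Files (symbolicLinkMode, unionFileModes)
import System.Posix.Types (FileMode)

-- | Hypothetical helper: build an external-attributes value from
-- permission bits, adding the symlink file type when requested.
-- Assumption: the Unix mode occupies the upper 16 bits of the field.
externalAttrs :: Bool -> FileMode -> Word32
externalAttrs isSymLink perms = fromIntegral mode `shiftL` 16
  where
    -- Same choice as in the readEntry excerpt: OR in the symlink type bit
    mode
      | isSymLink = unionFileModes symbolicLinkMode perms
      | otherwise = perms
```

With permissions 0o755 and the symlink flag set, this yields 2716663808, the value seen in the symlink entry. Without the symlink mode the entry would look like a regular file on extraction, which is why that detail matters.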

All done, right?
Not quite, changes are needed in extraction too.

Unarchiving Changes
    extractFilesFromArchive :: [ZipOption] -> Archive -> IO ()
    extractFilesFromArchive opts archive =
      -- Separate symbolic entries from non-symbolic entries.
      -- Call writeSymbolicLinkEntry on each Entry

    writeSymbolicLinkEntry :: [ZipOption] -> Entry -> IO ()
    writeSymbolicLinkEntry opts entry =
      -- prefixPath is the extraction directory; its definition is not shown on the slide
      if isEntrySymbolicLink entry
        then do
          let targetPath = fromJust . symbolicLinkEntryTarget $ entry
          let symlinkPath = prefixPath </> eRelativePath entry
          createSymbolicLink targetPath symlinkPath
        else writeEntry opts entry

    -- | The link target stored in the entry's data, if the entry is a symlink
    symbolicLinkEntryTarget :: Entry -> Maybe FilePath
    symbolicLinkEntryTarget entry
      | isEntrySymbolicLink entry = Just . C.unpack $ fromEntry entry
      | otherwise = Nothing
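
The dispatch above can also be written without fromJust by pattern matching on the Maybe result. The following is only a sketch on a stand-in model: MiniEntry, Action, linkTarget and planEntry are hypothetical names, not part of zip-archive, and the symlink test assumes the Unix mode sits in the upper 16 bits of the attributes. It decides what extraction would do for an entry without touching the file system.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32)
import System.FilePath ((</>))

-- | Hypothetical stand-in for the library's Entry, reduced to three fields.
data MiniEntry = MiniEntry
  { mePath    :: FilePath  -- ^ plays the role of eRelativePath
  , meAttrs   :: Word32    -- ^ plays the role of eExternalFileAttributes
  , mePayload :: String    -- ^ plays the role of the uncompressed entry data
  }

-- | What extraction would do for one entry.
data Action
  = MakeLink FilePath FilePath  -- ^ link target, path of the link to create
  | WriteRegular FilePath       -- ^ path of a regular file or directory to write
  deriving (Eq, Show)

-- | The stored target if the entry's mode has the S_IFLNK file type.
linkTarget :: MiniEntry -> Maybe FilePath
linkTarget e
  | (meAttrs e `shiftR` 16) .&. 0o170000 == 0o120000 = Just (mePayload e)
  | otherwise = Nothing

-- | Total replacement for the if/fromJust combination: one case on the Maybe.
planEntry :: FilePath -> MiniEntry -> Action
planEntry prefix e = case linkTarget e of
  Just target -> MakeLink target (prefix </> mePath e)
  Nothing     -> WriteRegular (prefix </> mePath e)
```

The design choice here is that the symlink check happens exactly once, so the "is a symlink" and "has a target" conditions cannot drift apart.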

Other challenges (1)
• 2 pull requests
• Extraction was not implemented in phase one. No one noticed.
• ifdef _WINDOWS, ifndef _WINDOWS everywhere
• Not supported on Windows
• directory == 1.2.1.0 for compatibility reasons

Other challenges (2)
• Broke Hackage and Stackage
• jgm/zip-archive/issues/38
• Added a dir with symlinks for testing. Broke Hackage.
• directory == 1.3.1 is newer than the LTS version. Broke Stackage.