Tommaso Piazza
January 24, 2018
Haskell & Zip Archives - What I learned improving zip-archive
What I learned about zip archives, the zip-archive library and contributing to open source Haskell
Transcript
Haskell & Zip files
What I learned while improving zip-archive
Tommaso Piazza @tmpz
© Tommaso Piazza, 2018 - https://github.com/blender

The library

The zip-archive library provides functions for creating, modifying, and extracting files from zip archives.
• Created by John MacFarlane, author of pandoc
• Very straightforward, ~1000 LOC

The Problem

• zip and unzip support archiving and extracting symlinks
• zip-archive did not

Python's zipfile does not support symlinks.

What is a symlink anyways?

• Magic? No, just a file (with some special attributes)
• Contains the relative or absolute path to the target as text

    $ ls
    total 32
    drwxr-xr-x 3 blender staff 102B Jan 23 18:48 dir1
    -rw-r--r-- 1 blender staff  15B Jan 23 18:46 file1
    lrwxr-xr-x 1 blender staff   5B Jan 23 18:49 link_to_dir1 -> dir1/
    lrwxr-xr-x 1 blender staff   5B Jan 23 18:49 link_to_file1 -> file1
    lrwxr-xr-x 1 blender staff  16B Jan 23 18:49 link_to_file_in_dir -> dir1/file_in_dir
    $ readlink link_to_file1
    file1
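
The point that a symlink is just a small file holding a path can be checked from Haskell. This is a minimal sketch, assuming a POSIX system with the unix, directory and filepath packages available; `inspectLink` and the scratch directory name are hypothetical, chosen only for illustration.

```haskell
import System.Directory
  ( createDirectoryIfMissing, getTemporaryDirectory, removePathForcibly )
import System.FilePath ((</>))
import System.Posix.Files
  ( createSymbolicLink, fileSize, getSymbolicLinkStatus
  , isSymbolicLink, readSymbolicLink )

-- | Create a symlink in a scratch directory and report
-- (is it a symlink?, the stored target text, the size lstat reports).
inspectLink :: FilePath -> IO (Bool, FilePath, Integer)
inspectLink target = do
  tmp <- getTemporaryDirectory
  let dir  = tmp </> "symlink-demo"        -- hypothetical scratch location
      link = dir </> "link_to_dir1"
  removePathForcibly dir                   -- start from a clean slate
  createDirectoryIfMissing True dir
  createSymbolicLink target link           -- the target does not need to exist
  st   <- getSymbolicLinkStatus link       -- lstat: describes the link itself
  dest <- readSymbolicLink link            -- same information as `readlink`
  pure (isSymbolicLink st, dest, fromIntegral (fileSize st))

main :: IO ()
main = inspectLink "dir1/" >>= print
```

On most POSIX filesystems the reported size is the length of the target text, which is why `ls` shows 5B for a link to `dir1/`.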

The plan

Steps suggested by Matthias at the last meetup:
1. Find out how a zip archive is represented in zip-archive
2. zip -r -q --symlinks a directory with symlinks
3. zip -r -q the same directory
4. Compare archives
5. Profit

⭐⭐⭐⭐⭐ Solid plan. Would follow again. -- A satisfied developer

Codec.Archive.Zip.Archive

• Zip Archive binary format document
• Helpful reference 1
• Helpful reference 2

    data Archive = Archive
      { zEntries   :: [Entry]            -- ^ Files in zip archive
      , zSignature :: Maybe B.ByteString -- ^ Digital signature
      , zComment   :: B.ByteString       -- ^ Comment for whole zip archive
      } deriving (Read, Show)

Codec.Archive.Zip.Entry

    data Entry = Entry
      { eRelativePath           :: FilePath          -- ^ Relative path, using '/' as separator
      , eCompressionMethod      :: CompressionMethod -- ^ Compression method
      , eLastModified           :: Integer           -- ^ Modification time (seconds since unix epoch)
      , eCRC32                  :: Word32            -- ^ CRC32 checksum
      , eCompressedSize         :: Word32            -- ^ Compressed size in bytes
      , eUncompressedSize       :: Word32            -- ^ Uncompressed size in bytes
      , eExtraField             :: B.ByteString      -- ^ Extra field - unused by this library
      , eFileComment            :: B.ByteString      -- ^ File comment - unused by this library
      , eVersionMadeBy          :: Word16            -- ^ Version made by field
      , eInternalFileAttributes :: Word16            -- ^ Internal file attributes - unused by this library
      , eExternalFileAttributes :: Word32            -- ^ External file attributes (system-dependent)
      , eCompressedData         :: B.ByteString      -- ^ Compressed contents of file
      } deriving (Read, Show, Eq)

How to read a zip

• toArchive :: ByteString -> Archive

    Data.ByteString.Lazy.readFile "archive.zip" >>= return . toArchive
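
To try `toArchive` without a zip file on disk, the bytes can be produced in memory first. This is a sketch, assuming the zip-archive and bytestring packages are installed; `sampleZipBytes` and `listContents` are hypothetical helper names, while `toEntry`, `addEntryToArchive`, `emptyArchive`, `fromArchive`, `toArchive` and `fromEntry` come from Codec.Archive.Zip.

```haskell
import Codec.Archive.Zip
import qualified Data.ByteString.Lazy.Char8 as BL

-- | Build a one-entry archive in memory and serialise it to zip bytes.
-- The second argument of 'toEntry' is the modification time (epoch seconds).
sampleZipBytes :: BL.ByteString
sampleZipBytes =
  fromArchive
    (addEntryToArchive
      (toEntry "hello/file1" 0 (BL.pack "hello, zip\n"))
      emptyArchive)

-- | Parse zip bytes and pair each entry's path with its uncompressed contents.
listContents :: BL.ByteString -> [(FilePath, BL.ByteString)]
listContents bytes =
  [ (eRelativePath e, fromEntry e) | e <- zEntries (toArchive bytes) ]

main :: IO ()
main = mapM_ print (listContents sampleZipBytes)
```

For a real file, the same `listContents` can be applied to the result of `BL.readFile "archive.zip"`.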

Diff

http://www.mergely.com/MF7ILoW3/
• Left: Without symlinks
• Right: With symlinks

Non-Symlink Entry

With some omissions

    Entry
      { eRelativePath           = "hello/link_to_dir1/"
      , eCompressionMethod      = NoCompression
      , eCRC32                  = 0
      , eCompressedSize         = 0          -- ^ Size of the file, dirs have 0 size
      , eUncompressedSize       = 0
      , eExternalFileAttributes = 1106051088 -- ^ Permissions and other flags
      , eCompressedData         = ""         -- ^ Dirs have no compressed data
      }

Symlink Entry

With some omissions

    Entry
      { eRelativePath           = "hello/link_to_dir1"
      , eCompressionMethod      = NoCompression
      , eCRC32                  = 3594983628
      , eCompressedSize         = 5          -- ^ The length in bytes of the string representing the target
      , eUncompressedSize       = 5
      , eExternalFileAttributes = 2716663808 -- ^ Permissions and other flags, like the symlink flag!
      , eCompressedData         = "dir1/"    -- ^ The path to the target
      }

Differences (1)

In a zip without symlink preservation
• One entry for the target
• One entry for the link, with eCompressedData duplicated from the target
• For all intents and purposes this is another file
• If the symlink points to a dir, recurse and duplicate everything

Differences (2)

In a zip with symlink preservation
• One entry for the target
• One entry for the link
• eCompressedData is the string representing the target
• If the symlink points to a dir, do not recurse
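
What tells a link entry apart from a regular one is the file type stored in eExternalFileAttributes. The sketch below decodes the two values 1106051088 and 2716663808 that appear in the talk's sample entries, assuming the Unix convention that the file mode sits in the upper 16 bits of the attribute. It only needs base; `unixMode`, `isSymlinkAttrs` and `isDirectoryAttrs` are hypothetical names, not part of zip-archive.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word32)
import Numeric (showOct)

-- | The Unix file mode, assumed to live in the upper 16 bits.
unixMode :: Word32 -> Word32
unixMode attrs = attrs `shiftR` 16

fileTypeMask, symlinkType, directoryType :: Word32
fileTypeMask  = 0o170000  -- S_IFMT
symlinkType   = 0o120000  -- S_IFLNK
directoryType = 0o040000  -- S_IFDIR

isSymlinkAttrs :: Word32 -> Bool
isSymlinkAttrs attrs = unixMode attrs .&. fileTypeMask == symlinkType

isDirectoryAttrs :: Word32 -> Bool
isDirectoryAttrs attrs = unixMode attrs .&. fileTypeMask == directoryType

main :: IO ()
main = do
  putStrLn (showOct (unixMode 2716663808) "")  -- 120755: symlink, rwxr-xr-x
  putStrLn (showOct (unixMode 1106051088) "")  -- 40755: directory, rwxr-xr-x
```

The octal modes line up with the `lrwxr-xr-x` and `drwxr-xr-x` columns in the `ls` listing.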

Behavior of zip-archive

Both when archiving and unarchiving
• No respect for symlinks
• Always duplicates eCompressedData, thus encoding a new file
• If the symlink points to a dir, recurses into the dir

How Archiving works

    addFilesToArchive [OptRecursive] emptyArchive ["hello"] :: IO Archive

For each file, this internally calls

    readEntry :: [ZipOption] -> FilePath -> IO Entry

The resulting archive is then serialized as binary.

Archiving Changes

    -- | Options for 'addFilesToArchive' and 'extractFilesFromArchive'.
    data ZipOption
      = OptRecursive               -- ^ Recurse into directories when adding files
      | OptVerbose                 -- ^ Print information to stderr
      | OptDestination FilePath    -- ^ Directory in which to extract
      | OptLocation FilePath Bool  -- ^ Where to place file when adding files and whether to append current path
      | OptPreserveSymbolicLinks   -- ^ Preserve symbolic links as such. This option is ignored on Windows.
      deriving (Read, Show, Eq)

    addFilesToArchive :: [ZipOption] -> Archive -> [FilePath] -> IO Archive
    addFilesToArchive opts archive files =
      -- don't recurse into dirs if OptPreserveSymbolicLinks `elem` opts

    -- | Generates an 'Entry' from a file or directory.
    readEntry :: [ZipOption] -> FilePath -> IO Entry
    readEntry opts file =
      -- if OptPreserveSymbolicLinks `elem` opts:
      --   strip "/" if FilePath is a dir
      --   populate eCompressedData with the targets of symlinks as strings
      --   add the symbolic link file mode. Extremely important detail.

eExternalFileAttributes

On Posix systems, eExternalFileAttributes contains the file modes and permissions. For symlinks, add the symlink file mode from System.Posix.Files.

From readEntry:

    fs <- getSymbolicLinkStatus path
    let isSymLink = isSymbolicLink fs
    let fm = if isSymLink
               then unionFileModes symbolicLinkMode (fileMode fs)
               else fileMode fs
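
How such a file mode might end up in the 32-bit attribute can be sketched as follows. This assumes the mode is shifted into the upper 16 bits, which is consistent with the sample value 2716663808 in the symlink entry; the slides do not show that step, so treat it as an assumption. `externalAttrs` is a hypothetical helper; `symbolicLinkMode` and `unionFileModes` are from System.Posix.Files in the unix package.

```haskell
import Data.Bits (shiftL)
import Data.Word (Word32)
import System.Posix.Files (symbolicLinkMode, unionFileModes)
import System.Posix.Types (FileMode)

-- | Combine the permission bits with the symlink file type when needed,
-- then place the result in the upper 16 bits (assumed layout).
externalAttrs :: Bool -> FileMode -> Word32
externalAttrs isLink mode = fromIntegral fm `shiftL` 16
  where
    fm | isLink    = symbolicLinkMode `unionFileModes` mode
       | otherwise = mode

main :: IO ()
main = print (externalAttrs True 0o755)  -- 2716663808
```

Without the `symbolicLinkMode` bits the entry would be indistinguishable from a small regular file whose contents happen to be a path.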

All done, right?

Not quite: changes are needed in extraction too.

Unarchiving Changes

    extractFilesFromArchive :: [ZipOption] -> Archive -> IO ()
    extractFilesFromArchive opts archive =
      -- Separate symbolic link entries from non-symbolic-link entries.
      -- Call writeSymbolicLinkEntry on each symbolic link Entry.

    writeSymbolicLinkEntry :: [ZipOption] -> Entry -> IO ()
    writeSymbolicLinkEntry opts entry =
      -- prefixPath is the extraction directory (its definition is omitted on the slide)
      if (isEntrySymbolicLink entry)
        then do
          let targetPath  = fromJust . symbolicLinkEntryTarget $ entry
          let symlinkPath = prefixPath </> eRelativePath entry
          createSymbolicLink targetPath symlinkPath
        else writeEntry opts entry

    symbolicLinkEntryTarget :: Entry -> Maybe FilePath
    symbolicLinkEntryTarget entry
      | isEntrySymbolicLink entry = Just . C.unpack $ fromEntry entry
      | otherwise                 = Nothing
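
Putting both halves together, a round trip through the library could look like the sketch below. It assumes a POSIX system and a zip-archive version that includes `OptPreserveSymbolicLinks`; `roundTripSymlink` and the scratch directory name are hypothetical.

```haskell
import Codec.Archive.Zip
import System.Directory
  ( createDirectoryIfMissing, getTemporaryDirectory
  , removePathForcibly, withCurrentDirectory )
import System.FilePath ((</>))
import System.Posix.Files (createSymbolicLink, readSymbolicLink)

-- | Archive a directory that contains a symlink, extract it elsewhere,
-- and return the target stored in the extracted link.
roundTripSymlink :: IO FilePath
roundTripSymlink = do
  tmp <- getTemporaryDirectory
  let root = tmp </> "zip-symlink-demo"    -- hypothetical scratch directory
      out  = root </> "out"
  removePathForcibly root
  createDirectoryIfMissing True (root </> "hello")
  writeFile (root </> "hello" </> "file1") "contents\n"
  createSymbolicLink "file1" (root </> "hello" </> "link_to_file1")
  -- Archive with relative paths, preserving links instead of copying targets.
  archive <- withCurrentDirectory root $
    addFilesToArchive [OptRecursive, OptPreserveSymbolicLinks] emptyArchive ["hello"]
  createDirectoryIfMissing True out
  -- Extraction needs the option too, otherwise the link becomes a regular file.
  extractFilesFromArchive [OptPreserveSymbolicLinks, OptDestination out] archive
  readSymbolicLink (out </> "hello" </> "link_to_file1")

main :: IO ()
main = roundTripSymlink >>= putStrLn
```

If the extraction option is dropped, `readSymbolicLink` should fail, because the extracted path is then an ordinary file containing the text `file1`.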

Other challenges (1)

• 2 pull requests
• Extraction was not implemented in phase one. No one noticed.
• ifdef _WINDOWS, ifndef _WINDOWS everywhere
• Not supported on Windows
• directory == 1.2.1.0 for compatibility reasons

Other challenges (2)

• Broke Hackage and Stackage
• jgm/zip-archive/issues/38
• Added dir with symlinks for testing. Broke Hackage.
• directory == 1.3.1 newer than LTS version. Broke Stackage.