Efficient Container Image Updating in Low-bandwidth Networks with Delta Encoding

Naoki MATSUMOTO
September 26, 2023

Transcript

  1. Efficient Container Image Updating in Low-bandwidth Networks with Delta Encoding

    IC2E 2023, Session 1: Containers and Micro-services, September 26, 2023
    Naoki Matsumoto, Daisuke Kotani, Yasuo Okabe (Kyoto University, Japan)
  2. Background

    Container: lightweight isolation technology.
    • A container's processes, rootfs, and network namespaces are isolated from the host.
    • Users can bundle and distribute environments as container images, which makes it easy to provision or update environments.
    • Containers are often used in cloud and edge computing environments.
    Container image: a bundle of the container's rootfs.
    • Each container uses it in read-only mode.
    Example container image:
    /
    ├ etc
    ├ usr
    │ └ local
    ├ home
    │ ├ ubuntu
    │ └ public
    └ opt
  3. Background

    Container use is increasing in environments with restricted network resources.
    • Bandwidth is low (e.g., cellular: 50~300 Mbps [1]).
    To start or update containers, users download and expand container images (pull).
    [Figure: containers connected over a cellular network and an ISP pull container images from the cloud, then run with those images.]
    [1] Mobile access bandwidth in practice: measurement, analysis, and implications (Xinlei Yang et al., 2022)
  4. Problems in Container Image Updating

    Large update data cause problems on the low-bandwidth network between the cloud and IoT devices:
    • Costs increase.
    • Congestion occurs.
    • Deployment takes too much time.
    → Lightweight and fast updating is needed!
  5. Current Container Image Updating

    Current container runtimes (e.g., containerd) provide layer-based images.
    Layer-based images cannot provide efficient updates: rebuilding with an updated Dockerfile or updated source code produces updated layers (e.g., among Layer-0, Layer-1, Layer-2), and each updated layer must be downloaded in full.
    [Figure: time to update from postgres:13.1 to postgres:13.2, split into download and expand time (sec), at network bandwidths from 10 Mbps to 10 Gbps. Annotations: "Download time is dominant"; "We assume these network environments".]
  6. Related Works: Lazy-pulling [2][5]

    • Preferentially downloads the files required to start a container.
    [Figure: 1. a request to start a container reaches the lazy-pulling plugin on the client; 2. the plugin requests the files needed to start the container from the container registry; 3. the registry returns the files or chunks; 4. the container starts; 5. a file is requested when it is read; 6. the registry returns the files or chunks.]
    [2] Slacker: Fast Distribution with Lazy Docker Containers (Tyler Harter et al., 2016)
    [5] stargz-snapshotter (https://github.com/containerd/stargz-snapshotter)
  7. Related Works: File-by-file delta method [3]

    • Transfers file-by-file deltas between local images and required images.
    • Compares files and transfers complete updated files.
    [Figure: a Starlight client that already has some images requests a new image; the Starlight server compares files between the images and sends only new or updated files.]
    [3] Starlight: Fast Container Provisioning on the Edge and over the WAN (Jun Lin Chen et al., 2022)
  8. Problems in Related Works

    These works leave room to reduce the update data size.
    Lazy-pulling
    • The data size is not reduced, and a stable, low-latency network is required.
    File-by-file delta method
    • Cannot handle partial modifications to files efficiently: even if only part of a file is updated, the complete new file must be transferred.
    • Most of the content of some executables and shared libraries is not updated.
  9. Proposed Method

    Reduce the data needed to update images by using delta encoding.
    • Transfer only the partial data required for the update.
    [Figure: the server generates deltas between the old and new images and packs them into an update bundle; the bundle is distributed to containers over a low-bandwidth network; the client applies the deltas to its old image to obtain the new image (non-layered updating).]
    → The update data size is reduced!
  10. Results

    The update data size is reduced to 5~40% of that of existing methods.
    • The time to update is also reduced.
    • Performance degradation is small, except in some cases.
    Time to update image (sec):
      Update                  File-by-file delta   Proposed method
      postgres 13.1 → 13.2    5.17                 1.69
      postgres 13.2 → 13.3    4.95                 0.97
      mysql 8.0.29 → 8.0.30   12.56                3.23
      mysql 8.0.30 → 8.0.31   20.64                8.48
    Delta bundle size (MB):
      Update                  File-by-file delta   Proposed method
      postgres 13.1 → 13.2    28.27                4.46
      postgres 13.2 → 13.3    26.57                3.79
      mysql 8.0.29 → 8.0.30   69.84                16.51
      mysql 8.0.30 → 8.0.31   118.56               47.26
  11. Challenges in Container Image Updating

    Delta encoding: generating deltas between files.
    → Updating container images poses challenges for delta encoding.
    • Generating deltas on the server: the targets are many versioned files.
      • Deltas are generated and applied to hundreds or thousands of files at a time.
      • The number of version combinations for deltas is O(n²) with n versions.
      → We need to consider the time and load required to generate them.
    • Applying deltas on the client: each delta must be applied.
      • Applying deltas takes time and consumes disk I/O and CPU resources.
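To make the O(n²) count concrete, assume deltas are only needed from an older version to a newer one. Pre-generating a delta for every such pair grows quadratically, while pre-generating only adjacent-version deltas grows linearly (for n = 10 versions: 45 versus 9).

```latex
\[
  \bigl|\{(i, j) : 1 \le i < j \le n\}\bigr| = \binom{n}{2} = \frac{n(n-1)}{2} = O(n^2),
  \qquad
  \bigl|\{(i, i+1) : 1 \le i < n\}\bigr| = n - 1 .
\]
```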
  12. Overview of Proposed Method

    Our approach uses delta encoding for container image updating.
    • We use bsdiff as the delta encoding algorithm.
    • A server generates and merges deltas, and a client applies the deltas to old images.
    [Figure: server side: (0) download images from the registry; (1) generate delta bundles into the delta bundle store; (2) generate an update bundle with DeltaMerging, served by the update bundle server. Client side: (3) the Di3FS snapshotter plugin works with the container runtime; (4) it mounts the delta bundle with Di3FS; (5) it provides container images to containers.]
  13. Overview of Proposed Method

    This presentation explains the core parts of our method:
    • Delta bundle format and merge-based delta generation strategy
    • Server: DeltaMerging enables the merge-based strategy (generating deltas quickly).
    • Client: Di3FS applies deltas lazily.
    [Figure: the architecture from slide 12, highlighting "Generating deltas quickly" on the server and "Applying deltas lazily" on the client.]
  14. Delta Generation

    Generate a delta for each file and pack the deltas into a delta bundle.
    • Delta encoding generates delta files for updated files.
    • New files are compressed.
    • The container's manifest and config are packed together with the delta bundle into an update bundle.
    [Figure: old image with /usr and /home/ubuntu/{fileA, fileB}; new image with /usr and /home/ubuntu/{fileA (updated), fileB, fileC (new)}. The update bundle contains the manifest, the config, and the delta bundle. The delta bundle holds metadata (directory structure, file attributes), fileA.diff (delta file), and fileC (new file, compressed).]
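The per-file decision on this slide can be sketched as follows. This is a minimal illustration, assuming images are in-memory dicts from path to file contents; `plan_update`, the `FILE_SAME` label, and the pluggable `make_delta` callback are hypothetical names, and `zlib` stands in for whatever compressor the real bundle uses. A real implementation would call bsdiff inside `make_delta`.

```python
import zlib
from typing import Callable, Dict, List, Tuple

Tree = Dict[str, bytes]  # path -> file contents (hypothetical in-memory image)


def plan_update(
    old: Tree, new: Tree, make_delta: Callable[[bytes, bytes], bytes]
) -> Tuple[Dict[str, dict], List[Tuple[str, bytes]]]:
    """Classify every file of the new image and collect delta-bundle payloads."""
    metadata: Dict[str, dict] = {}
    payloads: List[Tuple[str, bytes]] = []
    for path, data in sorted(new.items()):
        if path not in old:
            # New file: ship it compressed.
            kind = "FILE_NEW"
            payloads.append((path, zlib.compress(data)))
        elif old[path] != data:
            # Updated file: ship only a delta against the old version.
            kind = "FILE_DIFF"
            payloads.append((path + ".diff", make_delta(old[path], data)))
        else:
            # Unchanged file: nothing to ship; the client reuses its old copy.
            kind = "FILE_SAME"  # hypothetical label, not shown on the slide
        # Metadata keeps structure and attributes for every file, shipped or not.
        metadata[path] = {"type": kind, "size": len(data)}
    return metadata, payloads
```

With the slide's example (fileA updated, fileB unchanged, fileC new), only `fileA.diff` and the compressed `fileC` become payloads, while all three files appear in the metadata.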
  15. Delta Bundle

    Decompressing image layers (tar.gz) takes a long time, especially on IoT devices.
    → It increases pulling time and consumes CPU resources and disk I/O.
    The delta bundle does not require full decompression and expansion.
    • Directory structures and file attributes are retained as metadata.
    • Di3FS provides the updated image using the metadata, without applying all deltas.
    [Figure: a delta bundle containing FileA, B, FileD.diff, C, and E, with metadata entries that locate items in the bundle:]
      { "name": "FileA", "type": FILE_NEW, "compressedSize": 40, "offset": 0 }
      { "name": "FileD", "type": FILE_DIFF, "compressedSize": 74, "offset": 78 }
    Full metadata entry for FileD:
      { "name": "FileD", "size": 180, "mode": 420, "uid": 1000, "gid": 1000, "type": FILE_DIFF, "childs": [], "compressedSize": 74, "offset": 78 }
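The `offset` and `compressedSize` fields are what allow a single entry to be read without decompressing the whole bundle: if each entry is compressed on its own and stored back to back in one blob, any entry can be sliced out and decompressed independently. The sketch below assumes per-entry zlib compression; `pack_bundle` and `read_entry` are hypothetical helpers, not the d4c bundle format or API.

```python
import zlib
from typing import Dict, Tuple


def pack_bundle(files: Dict[str, bytes]) -> Tuple[bytes, Dict[str, dict]]:
    """Compress each entry separately and record where it lands in the blob."""
    blob = bytearray()
    metadata: Dict[str, dict] = {}
    for name, data in files.items():
        compressed = zlib.compress(data)
        metadata[name] = {
            "offset": len(blob),                # start of this entry in the blob
            "compressedSize": len(compressed),  # number of bytes to slice out
            "size": len(data),                  # uncompressed size
        }
        blob += compressed
    return bytes(blob), metadata


def read_entry(blob: bytes, metadata: Dict[str, dict], name: str) -> bytes:
    """Decompress one entry without touching any other entry."""
    entry = metadata[name]
    start = entry["offset"]
    return zlib.decompress(blob[start:start + entry["compressedSize"]])
```

Per-entry compression usually costs some ratio compared with compressing one large archive, which is the price paid for random access.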
  16. Strategy for Delta Generation

    Each client can have a different old image.
    Three approaches are considered as a strategy:
    1. Generating deltas for each client on request.
    2. Generating deltas for all combinations in advance.
    3. Cherry-picking the best points from 1 and 2.
    [Figure: client A has v2 and requires the delta v2 → v3; client B has v1 and requires the delta v1 → v3.]
  17. Strategy for Delta Generation

    1. Generating deltas for each client on request.
       → Generating deltas takes a long time, so the update time increases.
    2. Generating deltas for all patterns in advance.
       → The number of deltas grows as O(n²), which becomes impractical as the number of versions increases.
    [Figure: approach 1: the client sends a request, the server generates deltas, then responds. Approach 2: the server generates deltas in advance (step 0), then responds to requests directly.]
  18. Strategy for Delta Generation

    3. Generating deltas by cherry-picking the best points from 1 and 2
    • We employ an approach that uses pre-generated deltas and merging.
    • Deltas for (V_i, V_{i+1}) are generated in advance and merged on request.
    [Figure: the server stores Δ(V0, V1), Δ(V1, V2), and Δ(V2, V3). Client A requests Δ(V0, V1), and the server sends the pre-generated delta. Client B requests Δ(V1, V3), and the server merges pre-generated deltas into Δ(V1, V3).]
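Strategy 3 can be sketched as a lookup: store only the adjacent deltas Δ(V_i, V_{i+1}) and, for a request Δ(V_i, V_j), either return a stored delta or fold the chain of adjacent deltas with a merge function. This sketch assumes versions are integer indices and takes `merge` as a parameter because DeltaMerging itself operates on bsdiff blocks; `chain` and `serve_delta` are hypothetical names.

```python
from functools import reduce
from typing import Callable, Dict, List, Tuple, TypeVar

D = TypeVar("D")  # whatever a delta is (e.g., a bsdiff patch)


def chain(i: int, j: int) -> List[Tuple[int, int]]:
    """Adjacent pre-generated deltas needed to go from version i to version j."""
    if not 0 <= i < j:
        raise ValueError("only upgrades from an older to a newer version are supported")
    return [(k, k + 1) for k in range(i, j)]


def serve_delta(
    i: int,
    j: int,
    store: Dict[Tuple[int, int], D],
    merge: Callable[[D, D], D],
) -> D:
    """Return a stored delta if one exists, otherwise merge the adjacent chain."""
    if (i, j) in store:
        return store[(i, j)]  # e.g., client A: Δ(V0, V1) is sent as-is
    deltas = [store[pair] for pair in chain(i, j)]
    return reduce(merge, deltas)  # e.g., client B: merge Δ(V1, V2) and Δ(V2, V3)
```

Storing n - 1 deltas keeps server-side storage linear in the number of versions; the cost moves to merge time on request, which slide 26 reports as still somewhat slow.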
  19. Faster Delta Generation

    We use bsdiff to generate and apply deltas.
    • It is known as a highly efficient delta generation method.
    • It uses a suffix array to find the longest matching substrings between the old and new files.
    [Figure: a delta file is a sequence of operation blocks. Example block: control values 0x05, 0x02, 0x03 mean ADD 5 bytes, INSERT 2 bytes, move +3 bytes; the subsequence to ADD is 0x00, 0x00, 0x02, 0x02, 0x02; the subsequence to INSERT is 0xAB, 0xBC.]
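The ADD / INSERT / move block can be made concrete with a small patch applier. This is a simplified sketch of bsdiff's patch semantics: each control triple (add_len, insert_len, seek) adds add_len diff bytes to the old file byte by byte (mod 256), copies insert_len literal bytes, then moves the read position in the old file by seek (which may be negative). It ignores the real bsdiff header, integer encoding, and bzip2 compression; `apply_patch` is a hypothetical name.

```python
from typing import List, Tuple

Control = Tuple[int, int, int]  # (add_len, insert_len, seek)


def apply_patch(old: bytes, controls: List[Control], diff: bytes, extra: bytes) -> bytes:
    """Rebuild the new file from the old file and a bsdiff-style patch."""
    new = bytearray()
    old_pos = diff_pos = extra_pos = 0
    for add_len, insert_len, seek in controls:
        # ADD: new byte = old byte + diff byte, wrapping at 256.
        for k in range(add_len):
            new.append((old[old_pos + k] + diff[diff_pos + k]) & 0xFF)
        old_pos += add_len
        diff_pos += add_len
        # INSERT: literal bytes with no counterpart in the old file.
        new += extra[extra_pos:extra_pos + insert_len]
        extra_pos += insert_len
        # Move: reposition the read cursor in the old file.
        old_pos += seek
    return bytes(new)
```

Applying the slide's block (5, 2, 3) with ADD bytes 0x00, 0x00, 0x02, 0x02, 0x02 and INSERT bytes 0xAB, 0xBC to `bytes([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])`, followed by a second block (2, 0, 0) with zero diff bytes, yields `bytes([1, 2, 5, 6, 7, 0xAB, 0xBC, 9, 10])`. Because unchanged or shifted code produces long runs of zero diff bytes, the diff stream compresses well, which is one reason bsdiff suits executables.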
  20. Faster Delta Generation: DeltaMerging

    DeltaMerging merges delta files generated by bsdiff.
    • ADD merged with ADD becomes ADD; all other combinations become INSERT.
    • It only seeks and merges delta blocks → faster than generating deltas.
    [Figure: the v1→v2 and v2→v3 delta files, each a sequence of ADD and INSERT blocks, are merged into a single v1→v3 delta file.]
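The merge rule can be illustrated at byte granularity. In this sketch a delta is a list of ("ADD", old_offset, diff_bytes) and ("INSERT", literal_bytes) operations. This is an assumed, simplified representation: ADD carries an absolute old offset instead of bsdiff's relative move, and merging works byte by byte rather than on whole blocks, so it demonstrates the "ADD + ADD → ADD, otherwise INSERT" rule rather than the actual DeltaMerging implementation. Note that merging needs only the two deltas, never the v2 file itself.

```python
from typing import List, Tuple, Union

Op = Union[Tuple[str, int, bytes], Tuple[str, bytes]]  # ("ADD", off, diff) | ("INSERT", data)


def apply_delta(old: bytes, delta: List[Op]) -> bytes:
    out = bytearray()
    for op in delta:
        if op[0] == "ADD":
            _, off, diff = op
            out += bytes((old[off + k] + d) & 0xFF for k, d in enumerate(diff))
        else:
            out += op[1]
    return bytes(out)


def merge(d12: List[Op], d23: List[Op]) -> List[Op]:
    """Compose v1->v2 and v2->v3 into v1->v3 without reading any file."""
    # For every byte of v2: ("ADD", v1_pos, diff) or ("INSERT", value), per d12.
    origin = []
    for op in d12:
        if op[0] == "ADD":
            _, off, diff = op
            origin += [("ADD", off + k, d) for k, d in enumerate(diff)]
        else:
            origin += [("INSERT", b) for b in op[1]]
    out: List[Op] = []
    for op in d23:
        if op[0] == "ADD":
            _, off, diff = op
            for k, d2 in enumerate(diff):
                src = origin[off + k]
                if src[0] == "ADD":  # ADD + ADD -> ADD: the diffs add up
                    _push_add(out, src[1], (src[2] + d2) & 0xFF)
                else:                # INSERT + ADD -> INSERT with the adjusted byte
                    _push_insert(out, (src[1] + d2) & 0xFF)
        else:                        # INSERT in d23 stays INSERT
            for b in op[1]:
                _push_insert(out, b)
    return out


def _push_add(out: List[Op], old_pos: int, d: int) -> None:
    # Extend the previous ADD when it ends exactly at old_pos.
    if out and out[-1][0] == "ADD" and out[-1][1] + len(out[-1][2]) == old_pos:
        out[-1] = ("ADD", out[-1][1], out[-1][2] + bytes([d]))
    else:
        out.append(("ADD", old_pos, bytes([d])))


def _push_insert(out: List[Op], b: int) -> None:
    # Extend the previous INSERT, or start a new one.
    if out and out[-1][0] == "INSERT":
        out[-1] = ("INSERT", out[-1][1] + bytes([b]))
    else:
        out.append(("INSERT", bytes([b])))
```

For example, with v1 = `b"hello world"`, a v1→v2 delta producing `b"hello, world!"` and a v2→v3 delta producing `b"Hello, World!!"`, applying `merge(d12, d23)` directly to v1 yields `b"Hello, World!!"`.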
  21. Lazy Delta Applying: Di3FS

    Deltas are applied on demand when a file is opened.
    → There is no need to apply all deltas (the same approach as lazy-pulling).
    [Figure: 1. Di3FS shows new files using the metadata in the delta bundle: ReadDir and GetAttr return the new file attributes (e.g., for ls -l). 2. When a file is opened (e.g., cat new.txt), Di3FS applies the delta file to the old file to produce the new file. 3. Data is then read from the new file, e.g., Read(offset=0, len=4096) returns OK(Data=0xab, 0xbc, …).]
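The lazy behaviour can be sketched without FUSE: directory listings and attributes come straight from the metadata, and a file's delta is applied only on its first open or read, with the result kept in memory (as slide 27 notes). `LazyImage`, its method names, and the injected `apply_delta` callback are hypothetical; a real Di3FS implements these as filesystem operations on top of the delta bundle.

```python
from typing import Callable, Dict, List


class LazyImage:
    """Serve an updated image, applying each file's delta only when it is needed."""

    def __init__(self, metadata: Dict[str, dict], old_files: Dict[str, bytes],
                 bundle: Dict[str, bytes], apply_delta: Callable[[bytes, bytes], bytes]):
        self.metadata = metadata      # path -> {"type": ..., "size": ...}
        self.old_files = old_files    # contents of the old image
        self.bundle = bundle          # path -> delta or (already decompressed) new file
        self.apply_delta = apply_delta
        self._cache: Dict[str, bytes] = {}  # delta-applied results kept in memory
        self.deltas_applied = 0

    def readdir(self) -> List[str]:
        return sorted(self.metadata)  # answered from metadata; no delta applied

    def getattr(self, path: str) -> dict:
        return self.metadata[path]    # attributes come from metadata only

    def open(self, path: str) -> None:
        # Apply the delta on first open; later opens reuse the in-memory result.
        if self.metadata[path]["type"] == "FILE_DIFF" and path not in self._cache:
            self._cache[path] = self.apply_delta(self.old_files[path], self.bundle[path])
            self.deltas_applied += 1

    def read(self, path: str, offset: int, length: int) -> bytes:
        kind = self.metadata[path]["type"]
        if kind == "FILE_DIFF":
            self.open(path)
            data = self._cache[path]
        elif kind == "FILE_NEW":
            data = self.bundle[path]
        else:  # unchanged file: read through to the old image
            data = self.old_files[path]
        return data[offset:offset + length]
```

Listing the image (`readdir`, `getattr`) leaves `deltas_applied` at 0; only reading a `FILE_DIFF` entry triggers one delta application, and repeated reads reuse the cached result.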
  22. Implementation and Evaluation

    • Implemented for containerd 1.6.2.
    • Environment: an IoT device as the client, connected over a slow cellular network.
      • Parameters: throughput 50 Mbps, latency (RTT) 40 ms [1][4], emulated with tc.
    • Results are shown for postgres (13.1, 13.2, 13.3) and mysql (8.0.29, 8.0.30, 8.0.31).
    Client: Raspberry Pi 4B (4 CPU cores, 8 GB memory)
    Server: virtual machine (8 CPU cores, 32 GB memory)
    [4] Revisiting the Arguments for Edge Computing Research (Blesson Varghese et al., 2021)
  23. Delta Size Reduction

    We compared delta sizes with Starlight's [3] approach (file-by-file delta).
    → The proposed method reduces the delta size to 5~40% of that of the file-by-file delta.
    → The size increase caused by DeltaMerging is small.
    With pre-generated deltas, delta bundle size (MB):
      Update                  File-by-file delta   Proposed method
      postgres 13.1 → 13.2    28.27                4.46
      postgres 13.2 → 13.3    26.57                3.79
      mysql 8.0.29 → 8.0.30   69.84                16.51
      mysql 8.0.30 → 8.0.31   118.56               47.26
    With deltas merged on request, delta bundle size (MB):
      Update                  File-by-file delta   Binary delta encoding   DeltaMerging
      postgres 13.1 → 13.3    31.02                5.29                    5.36
      mysql 8.0.29 → 8.0.31   6.71                 1.16                    1.15
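A back-of-the-envelope check links these sizes to the 50 Mbps link from slide 22: the ideal transfer time is size × 8 / bandwidth. The sketch assumes 1 MB = 10^6 bytes and ignores latency, TCP ramp-up, and protocol overhead, so it gives a lower bound rather than the measured update times; `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_mb: float, bandwidth_mbps: float) -> float:
    """Ideal time to move size_mb megabytes over a bandwidth_mbps link."""
    return size_mb * 8 / bandwidth_mbps


# Delta bundle sizes (MB) from the table above: file-by-file vs. proposed.
for label, file_by_file, proposed in [
    ("postgres 13.1 -> 13.2", 28.27, 4.46),
    ("mysql 8.0.30 -> 8.0.31", 118.56, 47.26),
]:
    print(f"{label}: {transfer_seconds(file_by_file, 50):.2f} s "
          f"vs {transfer_seconds(proposed, 50):.2f} s")
```

For postgres 13.1 → 13.2 this gives about 4.52 s versus 0.71 s, below the measured update times of 5.17 s and 1.69 s on slide 10, which also include expanding and mounting.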
  24. Breakdown of Delta Size Reduction

    Huge size reductions were seen in executables and shared libraries.
    • bsdiff is designed for executable files.
    [Figure: deltas between postgres:13.1 and postgres:13.2. Compressed size ratio (lower is better) versus compressed new file size (bytes), with regions where the proposed method or the file-by-file delta is better. Labeled files: /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1, /usr/lib/postgresql/13/lib/bitcode/postgres.index.bc, /usr/lib/x86_64-linux-gnu/libapt-pkg.so.5.0.2, /usr/lib/postgresql/13/bin/postgres.]
  25. Time to Generate Deltas

    • The time to generate deltas increased compared with the file-by-file delta.
    • With DeltaMerging, generating deltas is much faster than generating them without merging.
    → Pre-generating and merging deltas reduces the time to generate deltas.
    Generating deltas, time to generate deltas (sec):
      Update                  File-by-file delta   Delta encoding
      postgres 13.1 → 13.2    3.37                 19.99
      postgres 13.2 → 13.3    3.38                 18.03
      mysql 8.0.29 → 8.0.30   4.80                 113.89
      mysql 8.0.30 → 8.0.31   5.35                 156.94
    Comparison between generating and merging deltas, time to generate deltas (sec):
      Update                  File-by-file delta   Delta encoding   DeltaMerging
      postgres 13.1 → 13.3    3.38                 22.33            5.87
      mysql 8.0.29 → 8.0.31   5.32                 167.08           23.96
  26. Time to Update Container Image

    The time to update covers everything from downloading to mounting images.
    → When pre-generated deltas exist, the time is reduced.
    → Merging deltas on request is somewhat slow, and further improvements are required.
    [Figure: time to update images (sec), file-by-file delta versus proposed method. With pre-generated deltas, broken down into download and mount time, for postgres 13.1 → 13.2, postgres 13.2 → 13.3, mysql 8.0.29 → 8.0.30, and mysql 8.0.30 → 8.0.31. With deltas merged on request, broken down into merge, download, and mount time, for postgres 13.1 → 13.3 and mysql 8.0.29 → 8.0.31.]
  27. Performance Degradation on Applications

    Evaluated when updating from postgres:13.1 to postgres:13.2.
    • The time to compare files in the new image with diff(1) increased greatly.
      • This is due to the delta-applying overhead in Di3FS.
    • No performance degradation was seen in the pgbench benchmark.
      • Di3FS only handles files in images, and the delta-applied result is retained in memory.
      • New data and modifications are handled by the native FS.
    → Once the container has started, severe performance issues will not occur.
    Elapsed time for diff(1):
      Method      Time (sec)
      Di3FS       6.019
      Native FS   0.234
    pgbench result:
      Method      Time per transaction (ms)   Transactions per second
      Di3FS       15.748                      634.997
      Native FS   15.747                      635.037
  28. Summary

    Objective: reduce the data size and time needed to update container images.
    Proposal: an updating method using delta encoding.
    Evaluation: our method reduces the size to 5~40% of that of a file-by-file delta.
    • Huge reductions in executable binaries and shared libraries.
    • Performance degradation is small, except in some cases.
    Conclusion: delta encoding is also effective for container image updating.
    • File-specific delta encoding methods will reduce the data size further.
    A prototype implementation is available at https://github.com/naoki9911/d4c