
How to get Slim with SOCI and Star(g)s: Minifying Containers with Lazy Image Loading in ContainerD

This is a story about exploring the new lazy image loading capabilities in ContainerD and how this lazy loading behavior can be leveraged to minify the container images in DockerSlim/MinToolkit.

First, we'll review the SOCI and Stargz snapshotters that provide the lazy loading capabilities, how they work and their differences.

Then, we'll take a look at what happens in the lazy image loading snapshotters to learn how to discover the files loaded by the containers. We'll explore how this file loading telemetry data can be exposed externally, so it can be consumed by MinToolkit.

Finally, you'll discover if it's actually possible to create fully functional minified container images using the SOCI and Stargz snapshotter data.

Kyle Quest

April 19, 2024


Transcript

  1. How to get Slim with SOCI and Star(g)s: Minifying Containers with Lazy Image Loading in ContainerD
  2. • Minimal container images are a step towards meeting container best practices
     • Different ways to create them
       ◦ DIY
       ◦ …
       ◦ DockerSlim/MinToolkit
         ▪ Performs static and dynamic image analysis
         ▪ Collects runtime container telemetry
  3. • Original way DockerSlim collects runtime telemetry:
       ◦ A container-level sensor is added to a temporary container it creates
     • New way with a system-level sensor (WIP):
       ◦ Uses eBPF
     • Both telemetry modes have their own use cases and trade-offs
     • Both modes try to figure out what you need to have in the container
     • Is it possible to get the same data without eBPF or instrumenting the containers?
  4. • Is it possible to use lazy image loading to know what you need in containers?
     • What is lazy loading?
       ◦ A way to start containers before their entire image is fetched
       ◦ Loading the image files on demand when you need them instead of loading all of them ahead of time
     • If you can observe the lazy loading process, you will know what you need in the container
  5. • Lazy loading is not available by default
     • Takes advantage of the extensible ContainerD design
     • Possible because ContainerD has pluggable container filesystem services called Snapshotters
  6. • Two ContainerD snapshotters (Stargz and SOCI)
       ◦ Enable lazy loading
       ◦ Designed as remote (file system) snapshotters
       ◦ Configured as proxy plugins in ContainerD
  7. • Why don’t we have lazy loading by default?
       ◦ Because the standard container image format is not designed to support it
     • The container image format is designed around the container image build flow and NOT the container execution flow
  8. • Container images are like trees
       ◦ They grow over time by adding rings/layers
     • The image composition is based on how you build the image and not how you run it
     • What makes it worse:
       ◦ Each layer is designed to be an opaque blob… optimized for storage
       ◦ Each layer is a TAR archive (can be GZIP’ed)
       ◦ GZIP and TAR do not allow random I/O
  9. • Alternative designs to load partial layer data
     • Both require image format enhancements
     • Stargz:
       ◦ Repackages/re-encodes layer data
       ◦ Adds extra layer metadata
     • SOCI:
       ◦ No changes to target image
       ◦ Extra metadata stored as separate artifacts in container registry
  10. • Both create an INDEX for each layer
        ◦ Offsets for file data
        ◦ Metadata for files/dirs/symlinks
        ◦ Snapshotter-specific INDEX data
          ▪ SOCI has extra data due to its design
      • The INDEX location depends on the snapshotter
        ◦ Appended to layer (Stargz) or
        ◦ Stored separately in the registry (SOCI)
      • Partial layer blob data is requested with the HTTP Range request header
      • The INDEX is loaded before the container starts
      • FUSE FS callbacks initiate lazy loading
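A partial layer fetch of the kind described on slide 10 can be sketched as follows. This is a minimal illustration rather than the snapshotters' own code: the registry host, repository name and all-zero digest are placeholders, the blob path follows the usual `/v2/<name>/blobs/<digest>` registry layout, and authentication is left out. The request is only built, not sent.

```python
import urllib.request


def blob_range_request(registry, repo, digest, start, end):
    """Build (but do not send) a GET for bytes start..end (inclusive) of a layer blob."""
    if start < 0 or end < start:
        raise ValueError("invalid byte range")
    url = f"{registry}/v2/{repo}/blobs/{digest}"
    req = urllib.request.Request(url, method="GET")
    # The Range header asks the registry for a slice of the blob instead of the whole layer.
    req.add_header("Range", f"bytes={start}-{end}")
    return req


# Placeholder values, for illustration only.
req = blob_range_request(
    "https://registry.example.com", "demo/app", "sha256:" + "0" * 64, 0, 1023
)
print(req.full_url)
print(req.get_header("Range"))
```

A registry that honors the header answers with `206 Partial Content` and only the requested bytes.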
  11. • The original/target image is not changed
      • Leaving the container image unchanged means additional index metadata is required to get random I/O
        ◦ because the layers could be GZ compressed
      • Each requestable chunk (aka span) in the layer metadata needs its GZ decompression state
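Why each span needs saved decompression state can be shown with a small analogy. The sketch below is not the SOCI zTOC format; it uses Python's `zlib` decompressor objects and their `copy()` method as a stand-in for a checkpoint, and the span size and test payload are arbitrary choices made for the demo.

```python
import gzip
import zlib


def build_checkpoints(blob, span_size):
    """Decompress once, saving a copy of the decompressor state at every span boundary."""
    d = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip container
    checkpoints = {}
    produced = 0  # uncompressed bytes emitted so far
    for start in range(0, len(blob), span_size):
        checkpoints[start] = (d.copy(), produced)
        produced += len(d.decompress(blob[start:start + span_size]))
    return checkpoints


def read_from_span(blob, checkpoints, span_start):
    """Resume decompression at a span boundary without touching earlier compressed bytes."""
    state, uncompressed_offset = checkpoints[span_start]
    d = state.copy()  # keep the stored checkpoint reusable
    return uncompressed_offset, d.decompress(blob[span_start:])


payload = b"".join(str(i).encode() for i in range(50000))
blob = gzip.compress(payload)
checkpoints = build_checkpoints(blob, span_size=4096)

span_start = sorted(checkpoints)[len(checkpoints) // 2]
offset, tail = read_from_span(blob, checkpoints, span_start)
print(tail == payload[offset:])
```

Without the saved state, the bytes at `span_start` cannot be decoded on their own, because they depend on everything that came before them in the stream. Smaller spans mean more checkpoints, which is the index growth mentioned later in the deck.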
  12. • SOCI Index Manifest
        ◦ Like a regular image manifest + “subject”
        ◦ Empty config
        ◦ Layer records point to layer index metadata (zTOC)
        ◦ zTOC index is the same as the target layer index
        ◦ “com.amazon.soci.image-layer-digest” - layer digest from the original/target image
      • Generate the SOCI index with “soci create” (use smaller minimum layer size and span size parameters for better minification results)
      • Layer index data (zTOC) is binary (“soci ztoc”)
  13. • No need to track decompression state
        ◦ BUT requires image/layer repackaging
      • Each layer file is TAR’ed and GZip’ed
        ◦ TAR’ed/GZip’ed files are combined into one GZIP stream
      • Layer index/TOC is saved as a file (“stargz.index.json”) at the end of the layer
      • eStargz layer ends with a Footer (has TOC offset)
      • Special Landmark files indicate what to pre-fetch
      • Only “file” data is lazy loaded
        ◦ not directories or symlinks
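The repackaging idea on slide 13 rests on a property of GZIP: independently compressed members can be concatenated into one valid stream, and each member can still be decoded on its own from its starting offset. The sketch below shows only that property. It leaves out the TAR headers, chunking, TOC file and footer of a real eStargz layer, and the file names and contents are made up.

```python
import gzip
import zlib


def pack(files):
    """Compress each file as its own gzip member; return the blob and a name -> offset index."""
    blob = bytearray()
    index = {}
    for name, data in files.items():
        index[name] = len(blob)  # where this file's gzip member starts
        blob += gzip.compress(data)
    return bytes(blob), index


def read_one(blob, offset):
    """Decode a single member starting at offset, ignoring everything before it."""
    d = zlib.decompressobj(wbits=31)  # stops at the end of the first gzip member
    return d.decompress(blob[offset:])


files = {"bin/uname": b"uname-bytes" * 100, "etc/os-release": b"NAME=demo\n"}
blob, index = pack(files)

print(read_one(blob, index["etc/os-release"]))
print(gzip.decompress(blob) == b"".join(files.values()))
```

Because every member starts with a fresh compressor state, an offset is all the index has to record, which is why no decompression state has to be tracked.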
  14. • Optional pre-fetched files (loaded/fetched when the container starts) are moved to the beginning of each eStargz layer
        ◦ “.prefetch.landmark” file added to show where they end
      • Layer index data (TOC) is text/json (“stargz.index.json”) unlike zTOC with SOCI
      • “Mount” flow (separate layer blob HTTP requests)
        ◦ Read layer Footer
        ◦ Read layer TOC/stargz.index.json
        ◦ Read all prefetch files
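The three reads in the “Mount” flow can be expressed as byte ranges over the layer blob. This is a rough sketch under stated assumptions: the blob size, footer size, TOC offset and prefetch end used below are illustrative numbers, not values taken from a real layer, and in practice the TOC offset only becomes known after the footer has been read.

```python
def mount_ranges(blob_size, footer_size, toc_offset, prefetch_end):
    """Return (label, first_byte, last_byte) ranges in the order the mount flow reads them."""
    footer_start = blob_size - footer_size
    if not (0 <= prefetch_end <= toc_offset < footer_start):
        raise ValueError("inconsistent layer layout")
    ranges = [
        ("footer", footer_start, blob_size - 1),   # the footer holds the TOC offset
        ("toc", toc_offset, footer_start - 1),     # the TOC sits between file data and footer
    ]
    if prefetch_end > 0:
        # Prefetch files sit at the start of the layer, up to the landmark.
        ranges.append(("prefetch", 0, prefetch_end - 1))
    return ranges


# Illustrative numbers only; footer_size is a parameter, not a claim about the format.
for label, first, last in mount_ranges(10000, 51, 9000, 2000):
    print(f"{label}: bytes={first}-{last}")
```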
  15. • “uname -a” - runs in the target Stargz container
      • “bytes=5276000-5291999” range for layer “sha256:80505726a0e2e80ef6336f4976e6a30ed3de572934224603cdc66801bbf195b9” - requested from the registry (lower right side log window)
      • “bin/uname” record offset (“5276730”) from “stargz.index.json” is matched with the requested range (lower left side window)
      • Stargz Snapshotter config (no background fetch and a small data request chunk size, 2k)
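The matching step on slide 15 can be sketched as a lookup of TOC offsets against a requested range. The `bin/uname` offset and the range are the ones quoted on the slide; the other TOC records and the flat `name`/`type`/`offset` shape are simplified assumptions made for illustration, not the full “stargz.index.json” schema.

```python
import re


def parse_range(header):
    """Parse 'bytes=START-END' into an inclusive (start, end) pair."""
    m = re.fullmatch(r"bytes=(\d+)-(\d+)", header.strip())
    if not m:
        raise ValueError(f"unsupported Range value: {header!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return start, end


def files_in_range(entries, start, end):
    """Names of regular files whose recorded offset falls inside start..end (inclusive)."""
    return [
        e["name"]
        for e in entries
        if e.get("type") == "reg" and start <= e["offset"] <= end
    ]


toc = [
    {"name": "bin/", "type": "dir"},                        # hypothetical record
    {"name": "bin/ls", "type": "reg", "offset": 1000000},   # hypothetical record
    {"name": "bin/uname", "type": "reg", "offset": 5276730},
]

start, end = parse_range("bytes=5276000-5291999")
print(files_in_range(toc, start, end))
```

This checks only where a file's data begins. A file that starts before the requested range and continues into it would need an end offset (for example, the next record's offset) to be detected.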
  16. • Yes, it’s possible to use Stargz/SOCI snapshotters to minify images
      • BUT it requires custom configs (must disable background fetching)
      • AND you need to use/config small chunks/spans (which may result in large SOCI index data because it keeps track of the GZ decompression state)
      • Need to host the fat images in your own registry that logs HTTP requests (“range” headers and the layer blob path)
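Collecting the logged requests per layer might look like the sketch below. The log line format is entirely hypothetical, since registries log differently; the only things assumed are that each line contains the blob path with a `sha256` digest and the `bytes=START-END` value of the “range” header.

```python
import re
from collections import defaultdict

# Hypothetical log format: any line that contains a blob path and a bytes=START-END value.
BLOB_RE = re.compile(r"/v2/\S+?/blobs/(sha256:[0-9a-f]{64})")
RANGE_RE = re.compile(r"bytes=(\d+)-(\d+)")


def merge(ranges):
    """Merge overlapping or adjacent inclusive (start, end) ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def collect_ranges(log_lines):
    """Map each layer digest to the merged byte ranges that were requested from it."""
    per_layer = defaultdict(list)
    for line in log_lines:
        blob, rng = BLOB_RE.search(line), RANGE_RE.search(line)
        if blob and rng:
            per_layer[blob.group(1)].append((int(rng.group(1)), int(rng.group(2))))
    return {digest: merge(ranges) for digest, ranges in per_layer.items()}


digest = "sha256:" + "a" * 64  # placeholder digest
log = [
    f'GET /v2/demo/app/blobs/{digest} range="bytes=0-2047"',
    f'GET /v2/demo/app/blobs/{digest} range="bytes=2048-4095"',
    f'GET /v2/demo/app/blobs/{digest} range="bytes=10000-12047"',
    "GET /v2/demo/app/manifests/latest",
]
print(collect_ranges(log))
```

The merged ranges per digest are then what gets compared against the layer index to list the files the container actually loaded.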
  17. • Ability to convert regular images to Stargz images
      • Ability to generate and publish SOCI Index data
      • Stargz images as “slim”/“build” command output
      • Stargz-based minification mode (Linux only)
      https://github.com/mintoolkit/mint
  18. Kyle Quest
      • Creator, DockerSlim (aka SlimToolkit/minToolkit)
      • Founder, AutonomousPlane
      • Founder/CTO, Slim.AI
      • https://twitter.com/kcqon
      • https://github.com/kcq