Slide 1

Slide 1 text

From Docker’Fail to Dockerfile
Djalal Elbaz (@enlamp), DevOps Consultant
Mohammed Aboullaite (@laytoun), Backend engineer, Spotify

Slide 2

Slide 2 text

Agenda
● coding like on my laptop
● building anything and testing by hand
● ignore cache and download the internet
● the production-like misunderstanding
● blindly include secrets and vulnerabilities
● tags are mutable
● health checks can kill your production…
● multiple processes in same container
● security as a second thought

Slide 3

Slide 3 text

Demos!
● Mostly using Docker
● Container orchestrator agnostic
● CI/CD tool agnostic
● Works everywhere!

Slide 7

Slide 7 text

Dockerfail #1 Coding like my laptop

Slide 8

Slide 8 text

Dockerfail #1: coding like my laptop
➔ Signs of Dockerfail
  ◆ One COPY to rule them all ( COPY . . )
  ◆ Duplicate “COPY” lines
  ◆ Consecutive “RUN” lines
➔ Better Dockerfile
  ◆ Use a linter that checks syntax: hadolint
  ◆ Hadolint packs dozens of syntax rules
  ◆ Bonus: enable a pre-commit hook in git to protect coworkers on the same branch
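
As an illustration of the signs listed above, here is a minimal sketch of a Dockerfile that avoids the catch-all COPY and the consecutive RUN lines. The Node.js base image, file names, and packages are assumptions chosen for the example, not taken from the talk's demo code, and hadolint may still report other findings (for instance unpinned package versions).

```dockerfile
# Hypothetical Node.js service; image tag, paths and packages are illustrative.
FROM node:20-slim
WORKDIR /app

# One RUN instead of several consecutive ones: a single layer, and the apt
# package lists are removed in the same layer that created them.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Copy only what each step needs instead of "COPY . ."
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY src ./src

CMD ["node", "src/index.js"]
```

Running `hadolint Dockerfile` locally or from a pre-commit hook would then report any remaining rule violations.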

Slide 9

Slide 9 text

Dockerfail #2 building anything and testing by hand

Slide 10

Slide 10 text

Dockerfail #2: building anything and testing by hand
➔ Signs of Dockerfail
  ◆ Unwanted packages get installed in the final image
  ◆ Nobody notices when critical versions change
  ◆ Oftentimes, the build succeeds with incorrect content
➔ Better Dockerfile
  ◆ Write a test suite, using container-structure-test by Google
    ● Assert expected content
    ● But also assert the absence of unwanted content
  ◆ Bonus: run it as a pre-commit git hook

Slide 11

Slide 11 text

Dockerfail #3 ignore cache and download the internet

Slide 12

Slide 12 text

Dockerfail #3: ignore cache and download the internet [at your own risk]
➔ Signs of Dockerfail
  ◆ Each build downloads hundreds of packages
  ◆ Dependencies are never cached
  ◆ It takes minutes to rebuild between code patches
➔ Better Dockerfile
  ◆ Add content to the Docker image from most static to most dynamic
  ◆ Split COPY and RUN instructions by how far away the source is:
    ● from the internet
    ● from internal servers
    ● from code in the internal repo
Example: OS -> system packages -> maven dependencies -> internal tooling -> code
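
The ordering in the example above could be sketched as follows for a Maven project. The base image tag, the `tools/` directory, and the installed package are assumptions made for illustration only.

```dockerfile
# Base image tag, paths and package names are illustrative assumptions.
# 1. OS: changes rarely.
FROM maven:3.9-eclipse-temurin-17
WORKDIR /build

# 2. System packages (from the internet): change occasionally.
RUN apt-get update \
 && apt-get install -y --no-install-recommends git \
 && rm -rf /var/lib/apt/lists/*

# 3. Maven dependencies: this layer stays cached until pom.xml changes.
COPY pom.xml .
RUN mvn -B dependency:go-offline

# 4. Internal tooling (hypothetical directory in the repo).
COPY tools/ ./tools/

# 5. Application code: changes on every commit, so it comes last.
COPY src ./src
RUN mvn -B package
```

With this layout, a code-only change reuses the cached layers of steps 1 to 4 and only reruns the final COPY and package step.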

Slide 13

Slide 13 text

Success… Well, Almost!

Slide 14

Slide 14 text

Dockerfail #4: the production-like misunderstanding

Slide 15

Slide 15 text

Dockerfail #4: the production-like misunderstanding
➔ Signs of Dockerfail
  ◆ Dockerfile.dev vs Dockerfile.prod in the same repo
  ◆ Dirty hacks in a Dockerfile with shell conditions like
    ● RUN if [ -n "$IS_DEV" ]; then apt-get install -yq xdebug; fi
➔ Better Dockerfile
  ◆ Multi-stage builds
  ◆ Using per-environment profiles
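
A minimal sketch of how a multi-stage build can replace the shell condition: one Dockerfile, with a shared base stage and one stage per environment. The PHP image tag, the stage names, and the copy path are assumptions for the example.

```dockerfile
# Shared base: everything both environments need.
FROM php:8.2-apache AS base
COPY . /var/www/html/

# Development image: base plus debugging tools.
FROM base AS dev
RUN pecl install xdebug && docker-php-ext-enable xdebug

# Production image: base only, no debugger installed.
FROM base AS prod
```

A stage is selected at build time, for example `docker build --target dev -t myapp:dev .`; without `--target`, the last stage (here `prod`) is built.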

Slide 16

Slide 16 text

Dockerfail #5 blindly include secrets and vulnerabilities

Slide 17

Slide 17 text

Dockerfail #5: blindly include secrets and vulnerabilities
➔ Signs of Dockerfail
  ◆ “Oops moments” when you inadvertently find “sensitive content” in Docker images
  ◆ Like all silent bugs, it is hard to detect in peer reviews and does not hurt until it’s too late
  ◆ Deleting a file in a later layer doesn’t actually remove it from the Docker image: it is still in the earlier layer!
➔ Better Dockerfile
  ◆ Use a multi-stage build to scan for secrets and fail builds as soon as possible
  ◆ Use built-in secret management functionality
  ◆ Stop the CI pipeline from pushing such images to the Docker registry
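
One reading of "built-in secret management functionality" is BuildKit's secret mounts; that interpretation is an assumption, as are the Node.js image and the `npmrc` secret id used in this sketch. The secret is available to a single RUN step and is not written to any image layer.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-slim
WORKDIR /app
COPY package.json package-lock.json ./

# The secret file is mounted only for this RUN step and is not stored
# in the resulting layer. "npmrc" is a hypothetical secret id.
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci

COPY src ./src
CMD ["node", "src/index.js"]
```

The secret would be supplied at build time, for example `docker build --secret id=npmrc,src=$HOME/.npmrc .`.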

Slide 18

Slide 18 text

Dockerfail #6 tags are mutable, changes occur behind the scenes

Slide 19

Slide 19 text

Dockerfail #6: tags are mutable, changes get lost
➔ Signs of Dockerfail
  ◆ Using the latest, prod or “stable” tag to deploy in production
  ◆ Not having a 1-to-1 relation from code to Docker images
    ● (“what git SHA1 is running in production?”)
➔ Better Dockerfile
  ◆ Collect Docker image content via the new docker sbom CLI command (experimental)
  ◆ Use a git repository as an auditing space for Software Bills of Materials
  ◆ Multi-tag and label each Docker image (git SHA1, build number, timestamp, etc.)
    ● Bonus: some Docker registries can block tag reuse, enable it if you can!
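
Labelling an image with its origin could be sketched like this. The build argument names, the jar path, and the `com.example.build-number` label key are hypothetical; the `org.opencontainers.image.*` keys come from the OCI image annotation conventions.

```dockerfile
FROM eclipse-temurin:17-jre
COPY target/app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]

# Values are passed in by the CI pipeline; the ARG names are hypothetical.
# Kept at the end so that changing values do not invalidate earlier layers.
ARG GIT_SHA=unknown
ARG BUILD_NUMBER=unknown
ARG BUILD_DATE=unknown
LABEL org.opencontainers.image.revision="${GIT_SHA}" \
      org.opencontainers.image.created="${BUILD_DATE}" \
      com.example.build-number="${BUILD_NUMBER}"
```

A build could then pass the values and tag with the same SHA, for example `docker build --build-arg GIT_SHA="$(git rev-parse HEAD)" -t myapp:"$(git rev-parse --short HEAD)" .`, and `docker inspect` on the running image answers the "what git SHA1 is running?" question.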

Slide 20

Slide 20 text

But wait… What about production?

Slide 21

Slide 21 text

Dockerfail #7 health checks can kill your production

Slide 22

Slide 22 text

Dockerfail #7: health checks can kill your production
➔ Signs of Dockerfail
  ◆ Uncalled-for restarts
  ◆ Slow rollout of changes
  ◆ Orchestrator confusion, moving pods around for no reason
➔ Better Dockerfile
  ◆ Take time to design a real-world healthcheck
    ● Start with a readiness probe (HEALTHCHECK instruction in Dockerfile)
    ● Add a livenessProbe
    ● Slow start? Add a startupProbe, with the new K8s v1.19
    ● Use observability data to adjust the many per-probe settings
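
A minimal sketch of the HEALTHCHECK instruction with its main settings spelled out. The `/health` endpoint, the port, the timing values, and the presence of curl in the image are all assumptions to adapt from real observability data.

```dockerfile
FROM eclipse-temurin:17-jre
COPY target/app.jar /app/app.jar

# Assumes the app serves a hypothetical /health endpoint on port 8080 and
# that curl is available in the image (install it or use another client if not).
# start-period gives a slow-starting app time before failures are counted.
HEALTHCHECK --interval=30s --timeout=3s --start-period=20s --retries=3 \
  CMD curl -fsS http://localhost:8080/health || exit 1

CMD ["java", "-jar", "/app/app.jar"]
```

Kubernetes probes are declared in the pod spec rather than in the Dockerfile, so equivalent settings would need to be mirrored there.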

Slide 23

Slide 23 text

Dockerfail #8 multiple processes in same container/pod

Slide 24

Slide 24 text

Dockerfail #8: multiple processes in same container/pod
➔ Signs of Dockerfail
  ◆ Installing supervisord / systemd process managers
  ◆ Installing an SSH server in a Dockerfile
  ◆ Scaling the world! (tightly coupled components)
➔ Better Dockerfile
  ◆ Split the app into scoped services: frontend, api, backend, cache, storage, etc.
  ◆ Use one Dockerfile per service
  ◆ Manage them all with the proper abstraction: docker-compose.yml, kustomize specs, Helm chart, etc.

Slide 25

Slide 25 text

Dockerfail #9 security as a second thought

Slide 26

Slide 26 text

Dockerfail #9: security as a second thought
➔ Signs of Dockerfail
  ◆ Vulnerabilities (obviously)
  ◆ DAST and SAST tools on fire!
➔ Better Dockerfile
  ◆ Scan production images in a container
  ◆ Drop all kernel capabilities, and enable them one by one as needed
  ◆ Run containers as non-root users, and in read-only mode, writing only in explicitly declared volumes
  ◆ Run the container runtime (e.g. containerd) in a user namespace, not as a root daemon
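
The non-root and read-only advice above could look like this in a Dockerfile. The user name, the IDs, the `/data` path, and the `server` binary are illustrative assumptions.

```dockerfile
FROM debian:bookworm-slim

# Create an unprivileged user and the single writable directory it owns.
# Name, IDs and paths are illustrative.
RUN groupadd --gid 10001 app \
 && useradd --uid 10001 --gid app --no-create-home --shell /usr/sbin/nologin app \
 && mkdir /data \
 && chown app:app /data

COPY --chown=app:app ./bin/server /usr/local/bin/server

# Declare the only location the container is expected to write to.
VOLUME ["/data"]

# Everything from here on, including the running container, uses the non-root user.
USER app
ENTRYPOINT ["/usr/local/bin/server"]
```

At run time the remaining restrictions are applied with flags such as `docker run --read-only --cap-drop=ALL -v appdata:/data myapp`, adding back individual capabilities with `--cap-add` only when the app needs them.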

Slide 28

Slide 28 text

Dockerfail #10: You probably don’t need a Dockerfile!
➔ How to build an OCI compliant image
  ◆ Docker build
  ◆ BuildKit (Docker)
  ◆ Buildpacks
  ◆ Jib (Google)
  ◆ kaniko (Google)
  ◆ orca-build (Aleksa Sarai)
  ◆ img (Jessie Frazelle)
  ◆ buildah (Red Hat)
  ◆ umoci (SUSE)
  ◆ Bazel (Google)
  ◆ S2I (Red Hat)
  ◆ Package (metaparticle)
  ◆ systemd-nspawn
  ◆ LXC

Slide 29

Slide 29 text

@enlamp @laytoun
Thanks - Kiitos
● Questions?
Resources
➔ Source code
  ◆ https://github.com/djalal/dockerfail
➔ Links
  ◆ Awesome-docker
  ◆ Play-with-docker
TL;DR – 10 “Dockerfine”
1. Linting
2. Testing
3. Caching
4. Multi Staging
5. Secrets
6. Deploying
7. Monitoring
8. Scaling
9. Scanning/Signing
10. Adapting