Slide 1

Slide 1 text

Test your Docker images with Python. Jamie Hewland, PyConZA 2018

Slide 2

Slide 2 text

Welcome 00 Thanks for coming to a talk about testing

Slide 3

Slide 3 text

Introduction • Site Reliability Engineer (SRE) and Tech Ambassador at Praekelt.org • Developed continuous integration (CI) workflows for Docker images • 3rd PyConZA, 2nd talk on Docker

Slide 4

Slide 4 text

What’s a (Linux) container? • Isolation of a process’ view of its operating environment (via namespaces): process trees, user IDs, networking, mounted filesystems… • Limitation/prioritization of resources (via cgroups): CPU, memory, block I/O, networking…

Slide 5

Slide 5 text

Docker containers Docker is the most popular container technology. • Batteries included • Easy-to-use, lots of sensible defaults • Layered filesystem: containers can start up very quickly and share a lot of filesystem data • Images available for all the software you know & love

Slide 6

Slide 6 text

> $ docker run (images: ubuntu, debian, redis, rabbitmq, python, postgres, sentry, nginx, nodejs)

Slide 7

Slide 7 text

Why containers? Consistent portability • A clean way to package software • With (almost) everything it needs to run • With a single, (hopefully) simple entry-point • Limit access to resources
Eliminates “but it works on my machine”

Slide 8

Slide 8 text

Docker terminology Terms “image” and “container” often conflated

Slide 9

Slide 9 text

Docker daemon • All interactions with Docker happen via the daemon • Docker has an HTTP API: Docker Engine API
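
As a rough illustration of what "via the daemon" means: on a typical Linux install the daemon listens on a Unix socket (assumed here to be /var/run/docker.sock), and the Engine API can be reached over it with plain HTTP. The sketch below uses only the standard library. UnixSocketHTTPConnection and engine_get are hypothetical names made up for this example; the Docker SDK for Python wraps the same API far more conveniently.

```python
import http.client
import json
import socket


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks over a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # "localhost" only fills in the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def engine_get(path, socket_path="/var/run/docker.sock"):
    """Send a GET request to the Engine API and decode the JSON response body."""
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, json.loads(response.read())
    finally:
        conn.close()


# Example call (needs a running daemon and permission to open the socket):
#     status, info = engine_get("/version")
```

Everything the docker CLI or a testing library does ends up as requests of this kind, which is why a Python library can drive Docker without shelling out.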

Slide 10

Slide 10 text

The problem 01 Dockerfiles are code, and code should be tested

Slide 11

Slide 11 text

Dockerfiles

Slide 12

Slide 12 text

Entry point scripts…

Slide 13

Slide 13 text

Config… 12-factor app environment variables
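
A small sketch of the 12-factor style of configuration that images tend to rely on, and which therefore needs testing. The variable names (DATABASE_URL, DEBUG, WORKERS) and the AppConfig/load_config names are assumptions made up for illustration, not taken from the talk.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AppConfig:
    database_url: str
    debug: bool
    workers: int


def load_config(environ=os.environ):
    """Build the config from environment variables, failing fast if one is missing."""
    try:
        database_url = environ["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return AppConfig(
        database_url=database_url,
        # Treat a few common spellings as "on"; anything else is "off".
        debug=environ.get("DEBUG", "false").lower() in ("1", "true", "yes"),
        workers=int(environ.get("WORKERS", "2")),
    )
```

Passing the environment in as a mapping keeps the parsing logic testable without starting a container; the container-level test then only has to check that the image wires the variables through.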

Slide 14

Slide 14 text

How is this usually solved? Bash scripts

Slide 15

Slide 15 text

How is this usually solved? Bash Automated Testing System (Bats)

Slide 16

Slide 16 text

How is this usually solved? docker-compose (still need something to assert expectations, probably Bash)

Slide 17

Slide 17 text

Bash scripts Google’s Shell Style Guide google.github.io/styleguide/shell.xml

Slide 18

Slide 18 text

A solution 02 Seaworthy

Slide 19

Slide 19 text

Seaworthy • A testing library we wrote in Python • Integrates with Docker using the Docker SDK for Python • Interact with Docker daemon programmatically • Define various Docker resources as test fixtures • Containers, networks, and volumes • Handles creation/teardown of resources • Prebuilt fixtures for common containers: PostgreSQL, RabbitMQ, Redis • Tools for ensuring containers have fully started • Leverages existing Python testing libraries • unittest, pytest, testtools

Slide 20

Slide 20 text

Resource Definitions • Define a resource before it is created • So that we know how to create/tear it down • e.g. ContainerDefinition, VolumeDefinition • Docker SDK/daemon API complicated • Attempt to make Seaworthy closer to docker CLI or docker-compose. • Wrappers around Docker SDK types.
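
The idea of describing a resource before it exists can be sketched without Docker at all. The classes below are hypothetical and are not Seaworthy's API: FakeBackend stands in for the Docker daemon, and ContainerSketch only illustrates why knowing the definition up front makes setup and teardown symmetric.

```python
class FakeBackend:
    """Stand-in for the Docker daemon so the sketch runs without Docker."""

    def __init__(self):
        self.running = {}

    def create(self, name, image):
        self.running[name] = image

    def remove(self, name):
        del self.running[name]


class ContainerSketch:
    """Describes a container up front; nothing is created until setup()."""

    def __init__(self, name, image, backend):
        self.name = name
        self.image = image
        self.backend = backend
        self.created = False

    def setup(self):
        self.backend.create(self.name, self.image)
        self.created = True

    def teardown(self):
        # Safe to call even if setup() never ran or already got undone.
        if self.created:
            self.backend.remove(self.name)
            self.created = False

    def __enter__(self):
        self.setup()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.teardown()
        return False  # never swallow exceptions from the test body
```

Used as `with ContainerSketch("db", "postgres", backend): ...`, the resource exists only inside the block, which is the same shape a test fixture needs.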

Slide 21

Slide 21 text

Resource Definitions

Slide 22

Slide 22 text

Helpers • Docker SDK structured as models and collections of models • e.g. Container (model), ContainerCollection (collection) • Definitions wrap models, “helpers” wrap collections • Helpers (try to) track all resources created • So there aren’t any containers/networks/volumes lying around after tests • If you use the built-in pytest fixtures you don’t have to worry about helpers much
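
One way to picture the "track everything that was created" job of a helper is with contextlib.ExitStack from the standard library. TrackingHelper is a hypothetical name and the resources are just log entries here; this is a sketch of the bookkeeping, not of Seaworthy's helper classes.

```python
from contextlib import ExitStack


class TrackingHelper:
    """Records every resource it creates so teardown can remove them all."""

    def __init__(self):
        self._stack = ExitStack()
        self.log = []

    def create(self, kind, name):
        self.log.append(("create", kind, name))
        # Register the matching removal; ExitStack runs callbacks last-in, first-out.
        self._stack.callback(self.log.append, ("remove", kind, name))
        return name

    def teardown(self):
        self._stack.close()
```

Because callbacks unwind in reverse order, a container created after its network is removed before it, so nothing is left lying around after the tests.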

Slide 23

Slide 23 text

Sensible defaults • Seaworthy does some things by default to make life easier • Inspired by docker-compose and docker CLI • Pull missing images by default • Create dedicated bridge network for containers • Make containers available by name • Shorter forms of image names, volume mounts

Slide 24

Slide 24 text

Waiting for containers to start • Wait for certain log line(s) to appear • Stream logs line-by-line, match against pattern(s) with a timeout • Simplest and most effective • Wait for HTTP response • Wait for container to be “running” for some time
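
The log-matching approach can be sketched as a plain function over any iterable of log lines. wait_for_log_lines and its parameters are assumptions for illustration, not Seaworthy's implementation. Note one simplification: the deadline is only checked after each line arrives, so a stream that blocks forever without producing output would need a read timeout on the stream itself.

```python
import re
import time


def wait_for_log_lines(lines, patterns, timeout=10.0, clock=time.monotonic):
    """Consume `lines` until every regex in `patterns` has matched, in order.

    Returns the lines read so far. Raises TimeoutError once the deadline has
    passed, or RuntimeError if the log stream ends first.
    """
    remaining = [re.compile(p) for p in patterns]
    seen = []
    if not remaining:
        return seen
    deadline = clock() + timeout
    for line in lines:
        seen.append(line)
        if remaining[0].search(line):
            remaining.pop(0)
            if not remaining:
                return seen
        if clock() > deadline:
            raise TimeoutError(f"timed out waiting for {remaining[0].pattern!r}")
    raise RuntimeError(f"logs ended before {remaining[0].pattern!r} appeared")
```

Returning the lines seen so far is a deliberate choice: when a wait fails in CI, the captured output is usually the first thing needed to diagnose why the container never became ready.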

Slide 25

Slide 25 text

HTTP client • Often useful to be able to make requests against a container • Seaworthy can create a Requests-based client to make requests against forwarded ports

Slide 26

Slide 26 text

Writing tests Assert on: • Command return codes, output • stdout/stderr • HTTP responses • Process trees • Info from Docker about the container • More…

Slide 27

Slide 27 text

Competitors • Testcontainers • Java/JUnit • Selenium WebDriver containers • https://www.testcontainers.org • Google’s Container Structure Tests • Declarative tests: YAML • Commands + file existence, contents, metadata • https://github.com/GoogleContainerTools/container-structure-test

Slide 28

Slide 28 text

Python testing libraries 03 pytest and testtools

Slide 29

Slide 29 text

pytest • pytest can do many things • Mostly interested in its fixture functionality • Easy setup/teardown • Easily adjustable scope • Can inspect which test requested the fixture • Other tools for annotating tests

Slide 30

Slide 30 text

pytest • Built-in fixture “factories” • Call pytest_fixture() on any definition instance • Default helpers used automatically • Annotate tests to only run on systems with Docker

Slide 31

Slide 31 text

testtools • Various extensions to Python test framework • Mostly interested in its matchers • Make it possible/easier to assert complex things

Slide 32

Slide 32 text

testtools • Process tree assertions
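
To make "process tree assertions" concrete, here is a minimal sketch that turns ps-style output into a nested structure a test can compare against. The sample output, the column layout (PID, PPID, COMMAND) and the function names are assumptions for illustration; the sketch does not use testtools matchers or Seaworthy's own helpers.

```python
from collections import defaultdict


def parse_ps(output):
    """Parse output with PID, PPID and COMMAND columns into (pid, ppid, command) rows."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        pid, ppid, command = line.split(None, 2)
        rows.append((int(pid), int(ppid), command))
    return rows


def build_tree(rows):
    """Return a list of (command, children) tuples, one per root process."""
    pids = {pid for pid, _, _ in rows}
    by_parent = defaultdict(list)
    for pid, ppid, command in rows:
        # A process whose parent is outside the listing (e.g. PID 1) is a root.
        by_parent[ppid if ppid in pids else None].append((pid, command))

    def subtree(pid, command):
        return (command, [subtree(p, c) for p, c in by_parent.get(pid, [])])

    return [subtree(pid, command) for pid, command in by_parent.get(None, [])]


SAMPLE_PS = """\
  PID  PPID COMMAND
    1     0 tini -- gunicorn app:app
    7     1 gunicorn: master [app:app]
    9     7 gunicorn: worker [app:app]
   10     7 gunicorn: worker [app:app]
"""
```

A test can then assert that the init process is PID 1 and that the expected number of workers hang off the master, which catches entry point scripts that forget to `exec`.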

Slide 33

Slide 33 text

Challenges 04 Managing state, container boundaries

Slide 34

Slide 34 text

Performance • Container things are often slow • Slow tests don’t get run • Resetting state is a big task • e.g. recreating a PostgreSQL database • Python not the bottleneck • Docker daemon not good at parallelism

Slide 35

Slide 35 text

Performance workarounds • Have to carefully choose container fixture scope • Longer scope (e.g. module), faster tests • Shorter scope (e.g. function), less state between tests • Concept of “cleanable” state • e.g. drop and recreate table instead of restarting container • Call clean() on container with long scope when necessary
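
The trade-off between scope and state can be sketched with the standard library alone. Here unittest's class-level setup stands in for a module-scoped pytest fixture, and FakeDatabase is a hypothetical stand-in for a slow-to-start database container with a cheap clean() operation.

```python
import unittest


class FakeDatabase:
    """Stand-in for a database container that is expensive to start."""

    starts = 0

    def __init__(self):
        FakeDatabase.starts += 1  # pretend this is the slow part
        self.rows = []

    def clean(self):
        """Reset state cheaply instead of restarting the container."""
        self.rows.clear()


class SharedDatabaseTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.db = FakeDatabase()  # long scope: started once for all tests

    def setUp(self):
        self.db.clean()  # wiped before every test so no state leaks

    def test_insert_one(self):
        self.db.rows.append("a")
        self.assertEqual(self.db.rows, ["a"])

    def test_insert_other(self):
        self.db.rows.append("b")
        self.assertEqual(self.db.rows, ["b"])


def run_suite():
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SharedDatabaseTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests pass regardless of execution order even though they share one database instance, and the expensive start happens only once per run of the suite.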

Slide 36

Slide 36 text

Performance workarounds

Slide 37

Slide 37 text

What’s in a container? • Seaworthy relies on docker exec calls to do many things • What if executable is not in the container? • Seaworthy uses ps to list the running processes • Debian Jessie image had procps pre-installed, Stretch doesn’t • Could read /proc in the container • What if there weren’t even basic Unix tools? • How to check the presence/contents/metadata of a file? • Could stream tarball from daemon, check files
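
The tarball idea from the last bullet can be sketched with the standard tarfile module. In the Docker SDK for Python, Container.get_archive(path) returns a stream of raw tar bytes plus a stat dictionary; the sketch below assumes such a stream of byte chunks as input and builds one in memory so it runs without Docker. read_member and make_tar are hypothetical helper names.

```python
import io
import tarfile


def read_member(chunks, name):
    """Return (mode, contents) for file `name` in a tar archive given as byte chunks.

    Raises KeyError if the archive has no member with that name.
    """
    data = io.BytesIO(b"".join(chunks))
    with tarfile.open(fileobj=data, mode="r") as archive:
        member = archive.getmember(name)
        extracted = archive.extractfile(member)  # None for non-regular files
        contents = extracted.read() if extracted is not None else None
        return member.mode, contents


def make_tar(name, contents, mode=0o644):
    """Build a one-file tar archive in memory (stands in for the daemon's stream)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        info = tarfile.TarInfo(name)
        info.size = len(contents)
        info.mode = mode
        archive.addfile(info, io.BytesIO(contents))
    return buffer.getvalue()
```

This checks presence, permissions and contents without running anything inside the container, so it works even for images that ship no shell or Unix tools.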

Slide 38

Slide 38 text

Fixture ordering • Resources often have strict ordering requirements • e.g. create networks, volumes before containers that use them • Can make fixture setup even slower • pytest lets fixtures depend on other fixtures • May be other libraries that can do fixture ordering better • docker-compose handles this nicely because it is declarative
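
One candidate for "other libraries that can do fixture ordering" is the standard library's graphlib (Python 3.9+), which performs a topological sort. The resource names and the creation_order function are made up for this sketch; it only shows how a declarative dependency map yields a valid creation order.

```python
from graphlib import TopologicalSorter


def creation_order(dependencies):
    """Return resources ordered so each one comes after everything it depends on.

    `dependencies` maps a resource name to the set of names it needs first.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical stack: a web container that needs a network and a database,
# and a database container that needs the network and a data volume.
STACK = {
    "web": {"app-net", "db"},
    "db": {"app-net", "db-data"},
    "app-net": set(),
    "db-data": set(),
}
```

Teardown is simply the reverse of the returned list, so containers are removed before the networks and volumes they use.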

Slide 39

Slide 39 text

Further work 05 Optimisations and pod support

Slide 40

Slide 40 text

Container state management • Most database images will start up with empty state, then create default user/role/tables etc. • Potential to improve performance by doing less at startup • zanox/mysql allows applying SQL code while building image • docker commit: Save container filesystem state • docker checkpoint: Save container execution state (experimental)

Slide 41

Slide 41 text

Multi-container patterns • Application pod • Kubernetes concept • Co-located containers • Integration between 2 containers • In many cases this should be transparent: e.g. “service mesh” • Some cases need tighter integration, e.g. • Webserver serving files at certain paths • Vault agent token handshake • Patterns: ambassador, sidecar, adapter (from “Design Patterns for Container-based Distributed Systems” by B. Burns)

Slide 42

Slide 42 text

Kubernetes • Could try running test containers on an external cluster • Kubernetes platform now de-facto standard • But is complex, with many variables • Greater potential for parallelising tests • Just run lots of containers operating independently • Would be sensitive to scheduling latency • Is there something like this already?

Slide 43

Slide 43 text

Conclusions 06 …or should you write a testing library?

Slide 44

Slide 44 text

Is this recreating production? • Recreating production with Docker containers on a dev machine is a nice idea • In practice many limitations • The only real production environment is production itself • Seaworthy aimed at testing Dockerfiles, entry point scripts, basic config • But you could do a full integration test suite if you wanted

Slide 45

Slide 45 text

This is a difficult problem • Seaworthy works well for the limited cases we’ve used it for • Needs more users and contributors • Docker/container space moves quickly • Driven by large companies with more resources • It’s been a learning experience • Technical challenges, finding quirks in popular software • Lots of ideas that may never be built • Need a community to build sizeable open source projects

Slide 46

Slide 46 text

Thank you • Seaworthy: praekeltfoundation/seaworthy, seaworthy.readthedocs.io, @praekeltorg, medium.com/mobileforgood • Me: @jayhewland, youtu.be/T2hooQzvurQ • Questions?