Speeding Up Automated Tests with Machine Learning and Distributed Execution

Slide 1

Slide 1 text

Speeding Up Automated Tests with Machine Learning and Distributed Execution
Entwicklertag Karlsruhe 2023

Slide 2

Slide 2 text

About me
Marc Philipp
⬢ Sr. Principal Software Engineer at Gradle Inc. leading the Testing Team
⬢ JUnit team lead
Email: [email protected]
Mastodon: @[email protected]

Slide 3

Slide 3 text

Gradle Inc.
⬢ Open Source Build Automation
⬢ Commercial Offering
⬢ Developer Productivity Engineering
⬢ Supports Gradle, Maven, and Bazel

Slide 4

Slide 4 text

Gradle Enterprise API Gradle Enterprise Technical Overview

Slide 5

Slide 5 text

Table of Contents
1. Motivation
2. Test Distribution
3. Predictive Test Selection

Slide 6

Slide 6 text

1. Motivation

Slide 7

Slide 7 text

Testing usually dominates build times

Slide 8

Slide 8 text

Why is testing so slow?
⬢ Databases
⬢ Web servers
⬢ Directories
⬢ Virtual machines
⬢ Network latency

Slide 9

Slide 9 text

Reduced test time yields increase in productivity
⬢ More build and test executions
⬢ Shift-left testing from CI to local builds
⬢ Less context-switching

Slide 10

Slide 10 text

Local parallelization lacks historical information

Slide 11

Slide 11 text

Single-machine parallelism does not scale

Slide 12

Slide 12 text

CI Fanout
Idea: run a subset of tests on each CI agent in parallel
⬢ Grouping of tests often manual
⬢ Large overhead because each CI agent has to run the build up to the test task
⬢ Test results are scattered over multiple CI jobs
⬢ Does not support local builds

Slide 13

Slide 13 text

2. Test Distribution

Slide 14

Slide 14 text

Test Distribution Overview
⬢ Broker component included in Gradle Enterprise server
⬢ Agents connect to the broker
⬢ Builds connect to the broker and request agents
⬢ Works for local and CI builds
On-premises inside your network

Slide 15

Slide 15 text

Running Test Distribution agents
⬢ The agent comes in two flavors: Jar and Docker image
⬢ Runs on Java 17+, requires about 128 MB of memory
⬢ Runs on Windows, macOS, and Linux
⬢ Detects its environment (JDKs, OS) during startup
⬢ Administrators can pass additional capabilities as command line parameters

    java -jar gradle-enterprise-test-distribution-agent.jar \
      --server https://ge.example.com \
      --api-key «api-key» \
      --capabilities docker,postgres=14

    docker run \
      --env TEST_DISTRIBUTION_AGENT_SERVER=https://ge.example.com \
      --env TEST_DISTRIBUTION_AGENT_API_KEY=«api-key» \
      --env TEST_DISTRIBUTION_AGENT_CAPABILITIES=postgres=14 \
      gradle/gradle-enterprise-test-distribution-agent

Runs on-premises inside your own network infrastructure

Slide 16

Slide 16 text

Gradle: Integrates with default Test task

    plugins {
        id("com.gradle.enterprise") version "3.13.3"
    }

    tasks.test {
        useJUnitPlatform()
        distribution {
            enabled.set(true)
        }
    }

⬢ Input files (e.g. classpath) are automatically transferred to remote agents
⬢ Code coverage and other output files are transferred back and merged automatically

Slide 17

Slide 17 text

Maven: integrates with Surefire/Failsafe plugins
maven-surefire-plugin 2.22.2, enabled: true
⬢ Requires Gradle Enterprise Maven extension
⬢ Integrates with Surefire and Failsafe plugins

Slide 18

Slide 18 text

Test Distribution requires JUnit Platform
Most test frameworks with JUnit Platform test engines are supported:
⬢ JUnit 5 (Jupiter) ✔
⬢ JUnit 3/4 and Spock 1.x (via junit-vintage-engine included in JUnit 5) ✔
⬢ Spock 2.x ✔
⬢ TestNG (via testng-engine) ✔
⬢ ScalaTest (via scalatest-junit-runner) ✔
⬢ ArchUnit ✔
⬢ jqwik ✔
⬢ Kotest ✔

Slide 19

Slide 19 text

Checking compatibility
⬢ Compatibility can be checked before adopting Test Distribution by applying a custom build script that adds custom values to the Build Scan
https://github.com/gradle/gradle-enterprise-build-config-samples/pull/469

Slide 20

Slide 20 text

Run build only once, distribute test execution

Slide 21

Slide 21 text

Automatic distribution based on previous execution times
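To make the idea of history-based distribution concrete, here is a minimal sketch that balances test classes across agents using their previous execution times. It uses the classic greedy "longest processing time first" heuristic, which is an assumption chosen for illustration and not necessarily what Gradle Enterprise does internally; the class name HistoryBasedPartitioner and the test names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class HistoryBasedPartitioner {

    /** One agent's share of the work: assigned test classes and their summed expected duration. */
    static final class Bucket {
        final List<String> tests = new ArrayList<>();
        long totalMillis = 0;
    }

    /**
     * Greedy partitioning: sort tests by historical duration (longest first)
     * and always hand the next test to the currently least-loaded agent.
     * Assumes agents >= 1.
     */
    static List<Bucket> partition(Map<String, Long> historicalMillis, int agents) {
        List<Bucket> buckets = new ArrayList<>();
        PriorityQueue<Bucket> leastLoaded =
                new PriorityQueue<>(Comparator.comparingLong((Bucket b) -> b.totalMillis));
        for (int i = 0; i < agents; i++) {
            Bucket b = new Bucket();
            buckets.add(b);
            leastLoaded.add(b);
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(historicalMillis.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        for (Map.Entry<String, Long> entry : sorted) {
            Bucket b = leastLoaded.poll();   // remove, update, re-insert to keep the heap ordered
            b.tests.add(entry.getKey());
            b.totalMillis += entry.getValue();
            leastLoaded.add(b);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Hypothetical durations from earlier builds, in milliseconds.
        Map<String, Long> history = Map.of(
                "SlowIT", 9000L, "MediumIT", 5000L, "OtherTest", 4000L, "FastTest", 1000L);
        for (Bucket b : partition(history, 2)) {
            System.out.println(b.tests + " -> " + b.totalMillis + " ms");
        }
    }
}
```

Without historical durations, a scheduler can only split tests by count, which is why one slow integration test can leave the other agents idle.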

Slide 22

Slide 22 text

Resilience against temporary network failures
⬢ Actively manage connections using WebSocket pings
⬢ Reconnect if connection is lost or unresponsive
⬢ Reschedule work on other agents if agent disappears
⬢ Retry file uploads on non-client errors
⬢ Avoid builds from breaking and causing disruption
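The "retry on non-client errors" rule can be sketched as follows. This is only an illustration of the policy under the assumption that uploads report HTTP-style status codes; UploadRetry and uploadWithRetry are hypothetical names, not part of any Gradle Enterprise API.

```java
import java.util.function.IntSupplier;

public class UploadRetry {

    /** A 4xx status means the request itself is wrong, so repeating it cannot help. */
    static boolean isClientError(int status) {
        return status >= 400 && status < 500;
    }

    /**
     * Invokes the upload until it succeeds (2xx), fails with a client error,
     * or the attempts run out. Returns the last status seen.
     */
    static int uploadWithRetry(IntSupplier upload, int maxAttempts) {
        int status = -1;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            status = upload.getAsInt();
            boolean success = status >= 200 && status < 300;
            if (success || isClientError(status)) {
                return status;
            }
            // Server-side or transient failure: try again.
            // A real client would wait with a backoff before the next attempt.
        }
        return status;
    }

    public static void main(String[] args) {
        int[] responses = {503, 503, 200};   // two transient failures, then success
        int[] calls = {0};
        int status = uploadWithRetry(() -> responses[calls[0]++], 5);
        System.out.println("status=" + status + " after " + calls[0] + " attempts");
        // prints "status=200 after 3 attempts"
    }
}
```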

Slide 23

Slide 23 text

Adaptive scheduling
⬢ Be able to react to additional agents becoming available during test execution
⬢ Increases agent utilization
⬢ Reduces testing time

Slide 24

Slide 24 text

Auto-scaling Test Distribution agents
⬢ Agent pools with min/max size and capabilities for horizontal scaling
⬢ HTTP endpoint provides metrics indicating the target number of agents for each pool, based on demand
⬢ Step-by-step instructions for Kubernetes in docs
⬢ Real-time and historical usage can be visualized by Gradle Enterprise administrators

    {
      "id": "sosmbpbr",
      "name": "Linux",
      "capabilities": [ "jdk=8", "os=linux" ],
      "minimumAgents": 1,
      "maximumAgents": 90,
      "connectedAgents": 2,
      "idleAgents": 0,
      "desiredAgents": 8
    }
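An autoscaler consuming this pool status might derive its scaling decision roughly as in the sketch below. The field names come from the JSON on this slide; clamping the demand to the pool limits is an assumption (the endpoint may already do that), and AgentPoolScaler is a hypothetical name.

```java
public class AgentPoolScaler {

    /** Clamps the reported demand to the pool's configured size limits. */
    static int targetAgents(int minimumAgents, int maximumAgents, int desiredAgents) {
        return Math.max(minimumAgents, Math.min(maximumAgents, desiredAgents));
    }

    /** Positive: start that many agents. Negative: that many could be stopped. */
    static int agentsToAdd(int connectedAgents, int target) {
        return target - connectedAgents;
    }

    public static void main(String[] args) {
        // minimumAgents=1, maximumAgents=90, desiredAgents=8, connectedAgents=2
        int target = targetAgents(1, 90, 8);
        System.out.println("target=" + target + ", add=" + agentsToAdd(2, target));
        // prints "target=8, add=6"
    }
}
```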

Slide 25

Slide 25 text

Demo

Slide 26

Slide 26 text

PR Build w/ Integration tests 30 min → 3 min

Slide 27

Slide 27 text

PR Build Times with Integration Tests went from 60 minutes to 5
"For one project we reduced the build time from 62 minutes to under 5 minutes just using Test Distribution across multiple machines. So we think that Test Distribution will really move the needle and improve the test experience for everyone."
Danny Thomas, Developer Productivity Team

Slide 28

Slide 28 text

Local Build Times: From 54 min to 5. CI PR Builds: Way More Reliable
"My test suites finished in 5 minutes instead of 54. The 10X developer is finally here!"
Cédric Champeau, Technical Staff, Oracle (former Gradle Build Tool engineer)

Slide 29

Slide 29 text

Gradle Enterprise Acceleration Features

Slide 30

Slide 30 text

Gradle Enterprise Acceleration Features Gradle Enterprise 2022.2

Slide 31

Slide 31 text

3. Predictive Test Selection (PTS)

Slide 32

Slide 32 text

Avoid wasting time and resources
⬢ Typically less than 1% of codebase affected by any given change
⬢ Typically fewer than 1% of tests affected by any given change
⬢ Yet, most reported test failures are not regressions
💡 Skip irrelevant tests for a given change set using machine learning

Slide 33

Slide 33 text

Not a new concept
⬢ Predictive Test Selection (Meta, 2019)
⬢ Improving Test Effectiveness Using Test Executions History: An Industrial Experience Report (Ericsson, 2019)
⬢ Taming Google-Scale Continuous Testing (Google, 2017)
⬢ Test Re-prioritization in Continuous Testing Environments (Concordia Univ., 2016)
⬢ The Art of Testing Less without Sacrificing Quality (Microsoft, 2015)
⬢ Improving the effectiveness of test suite through mining historical data (ACM, 2014)
⬢ … and more

Slide 34

Slide 34 text

How it works…
1. When a test run starts, Gradle Enterprise submits a test input snapshot and test set to a machine learning model.
2. Gradle Enterprise automatically develops a test selection strategy by learning from historical code changes and test outcomes from your Build Scan™ data to predict a subset of relevant tests, which are then executed by your build.
3. Code change and test result data are processed immediately after a Build Scan is uploaded to Gradle Enterprise, and the test selection strategy is updated based on the new results.
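The general shape of "learn from historical changes and outcomes, then predict relevant tests" can be illustrated with a deliberately naive sketch. It replaces the machine learning model with a simple frequency heuristic (how often a test failed when a given file changed), which is an assumption for illustration only and not how Gradle Enterprise's model works; all names here are hypothetical.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ToyTestSelector {

    /** One historical build: which files changed and which tests failed. */
    static final class Build {
        final Set<String> changedFiles;
        final Set<String> failedTests;

        Build(Set<String> changedFiles, Set<String> failedTests) {
            this.changedFiles = changedFiles;
            this.failedTests = failedTests;
        }
    }

    /** Fraction of past builds touching the file in which the test failed. */
    static double failureRate(List<Build> history, String file, String test) {
        int touched = 0;
        int failed = 0;
        for (Build b : history) {
            if (b.changedFiles.contains(file)) {
                touched++;
                if (b.failedTests.contains(test)) {
                    failed++;
                }
            }
        }
        // No history for this file: stay conservative and treat the test as relevant.
        return touched == 0 ? 1.0 : (double) failed / touched;
    }

    /** Selects every test whose failure rate for any changed file reaches the threshold. */
    static Set<String> select(List<Build> history, Set<String> changedFiles,
                              Set<String> allTests, double threshold) {
        Set<String> selected = new TreeSet<>();
        for (String test : allTests) {
            for (String file : changedFiles) {
                if (failureRate(history, file, test) >= threshold) {
                    selected.add(test);
                    break;
                }
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<Build> history = List.of(
                new Build(Set.of("A.java"), Set.of("ATest")),
                new Build(Set.of("A.java"), Set.of()),
                new Build(Set.of("B.java"), Set.of("BTest")));
        System.out.println(select(history, Set.of("A.java"), Set.of("ATest", "BTest"), 0.25));
        // prints "[ATest]"
    }
}
```

Each newly uploaded build would simply be appended to the history, which mirrors step 3: the strategy keeps adapting as new results arrive.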

Slide 35

Slide 35 text

Predictive Test Selection Simulator
⬢ Simulates which tests would have been selected had PTS been enabled
⬢ Compares full test results to selected test results
⬢ Lets you assess the risk and savings of PTS per task/goal before adopting it
⬢ Requires at least 50 executions over a 14-day period before starting to make predictions
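The comparison the simulator performs can be sketched as follows: given the results of a full test run and the set of tests that would have been selected, compute the time saved and the failures that would have been missed. This is a minimal illustration under assumed data shapes; SelectionSimulation and its fields are hypothetical and do not reflect the simulator's actual implementation.

```java
import java.util.List;
import java.util.Set;

public class SelectionSimulation {

    /** Outcome of one test class in the full (unselected) run. */
    static final class TestResult {
        final String name;
        final long millis;
        final boolean failed;

        TestResult(String name, long millis, boolean failed) {
            this.name = name;
            this.millis = millis;
            this.failed = failed;
        }
    }

    /** Aggregated comparison of the full run against the simulated selection. */
    static final class Summary {
        long fullMillis;
        long selectedMillis;
        int failures;
        int missedFailures;   // failures in tests that would have been skipped (the risk)

        /** Share of test time that selection would have saved, between 0.0 and 1.0. */
        double savedFraction() {
            return fullMillis == 0 ? 0.0 : 1.0 - (double) selectedMillis / fullMillis;
        }
    }

    static Summary simulate(List<TestResult> fullRun, Set<String> selected) {
        Summary s = new Summary();
        for (TestResult r : fullRun) {
            s.fullMillis += r.millis;
            boolean wouldRun = selected.contains(r.name);
            if (wouldRun) {
                s.selectedMillis += r.millis;
            }
            if (r.failed) {
                s.failures++;
                if (!wouldRun) {
                    s.missedFailures++;
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<TestResult> fullRun = List.of(
                new TestResult("ATest", 6000, false),
                new TestResult("BTest", 3000, true),
                new TestResult("CTest", 1000, false));
        Summary s = simulate(fullRun, Set.of("BTest", "CTest"));
        System.out.println("missed failures: " + s.missedFailures
                + ", selected ms: " + s.selectedMillis + " of " + s.fullMillis);
        // prints "missed failures: 0, selected ms: 4000 of 10000"
    }
}
```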

Slide 36

Slide 36 text

No content

Slide 37

Slide 37 text

No content

Slide 38

Slide 38 text

No content

Slide 39

Slide 39 text

No content

Slide 40

Slide 40 text

Enabling Predictive Test Selection

Maven: maven-surefire-plugin 2.22.2, enabled: true

Gradle:

    plugins {
        id("com.gradle.enterprise") version "3.13.3"
    }

    tasks.test {
        predictiveSelection {
            enabled.set(true)
        }
    }

Slide 41

Slide 41 text

Must-run tests

    import com.gradle.enterprise.testing.annotations.MustRun;

    @MustRun
    public class ImportantTests {
        // ...
    }

Using gradle-enterprise-testing-annotations from Maven Central.

    tasks.test {
        predictiveSelection {
            mustRun {
                includeClasses.add("example.ImportantTests")
            }
        }
    }

example.ImportantTests

Slide 42

Slide 42 text

No content

Slide 43

Slide 43 text

No content

Slide 44

Slide 44 text

Usage patterns
Our recommendation is to run all tests post-merge or at least periodically.
1. Apply to existing pre-merge verification (run all tests post-merge)
2. Apply to existing pre- and post-merge verification (move “all tests” run to nightly build)
3. Run some high value tests pre-merge (change CI jobs to run costly tests sooner)
4. Two-pass pre-merge (split pre-merge CI jobs into two steps to get faster feedback)

Slide 45

Slide 45 text

Predictive Test Selection in Action
⬢ https://ge.spring.io
⬢ https://ge.junit.org
⬢ https://ge.micronaut.io
⬢ https://ge.gradle.org