Speeding Up Automated Tests with Machine Learning and Distributed Execution

Test execution often dominates the duration of software builds. Contributing factors include the growing number of integration and functional tests, sequential test execution, and dependencies on external services. As a result, developers often run tests only on the CI server, which considerably lengthens the feedback cycle between a code change and its test result. Even then, running all tests for every change is often a challenge in terms of cost and build duration. Gradle Enterprise offers two technologies that make it possible to run tests earlier and more often: Predictive Test Selection and Test Distribution.

Predictive Test Selection saves testing time by identifying, prioritizing, and running the tests that are most likely to provide useful feedback. It does this by applying a machine learning model that draws on fine-grained code snapshots, comprehensive test analytics, and test flakiness data.

Test Distribution extends parallel test execution by additionally using and orchestrating remote agents. This works for local builds as well as on the CI server, so existing test suites can be distributed and run faster.

Individually or in combination, these two technologies make it possible to reduce test times dramatically and to run tests earlier in the development cycle. The resulting shorter feedback cycle in turn leads to higher developer productivity and satisfaction. In this talk we will see both features in action and, using the publicly available Gradle Enterprise instances of well-known open source projects (Spring, JUnit, Micronaut, …), discuss how they work in detail.

Marc Philipp

June 14, 2023

Transcript

  1. Speeding up automated tests
    with machine learning and
    distributed execution
    Entwicklertag Karlsruhe 2023

  2. About me
    Marc Philipp
    ⬢ Sr. Principal Software Engineer at
    Gradle Inc. leading the Testing Team
    ⬢ JUnit team lead
    Email:
    [email protected]
    Mastodon:
    @[email protected]

  3. Gradle Inc.
    Open Source
    Build Automation
    Commercial Offering
    Developer Productivity Engineering
    Supports Gradle, Maven, and Bazel

  4. Gradle
    Enterprise
    API
    Gradle Enterprise Technical Overview

  5. Table of Contents
    Motivation
    Test Distribution
    Predictive Test Selection

  6. Motivation
    1

  7. Testing usually dominates build times

  8. Why is testing so slow?
    Databases
    Web servers
    Directories
    Virtual machines
    Network latency

  9. Reduced test time yields increase in productivity
    More build and test executions
    Shift-left testing from CI to local builds
    Less context-switching

  10. Local parallelization lacks historical information

  11. Single-machine parallelism does not scale

  12. CI Fanout
    Idea: run subset of tests on CI agents
    in parallel
    ⬢ Grouping of tests often manual
    ⬢ Large overhead because each
    CI agent has to run build up to
    test task
    ⬢ Test results are scattered over
    multiple CI jobs
    ⬢ Does not support local builds
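
The fan-out idea above can be illustrated with a minimal sketch. It assumes a hypothetical helper, `CiFanout.shard`, that assigns test classes to CI agents round-robin; neither the class nor the strategy comes from the talk. Because round-robin ignores how long each test takes, shards can end up unbalanced, which is one reason grouping often ends up being tuned by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any Gradle or CI product):
// splits test class names into one shard per CI agent.
class CiFanout {

    // Round-robin assignment: test i goes to shard i % agents.
    static List<List<String>> shard(List<String> testClasses, int agents) {
        if (agents < 1) {
            throw new IllegalArgumentException("agents must be >= 1");
        }
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < agents; i++) {
            shards.add(new ArrayList<>());
        }
        for (int i = 0; i < testClasses.size(); i++) {
            shards.get(i % agents).add(testClasses.get(i));
        }
        return shards;
    }
}
```

Each CI job would then run only its own shard, which is where the overhead listed above comes from: every job still has to run the build up to the test task.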

  13. Test Distribution
    2

  14. Test Distribution Overview
    ⬢ Broker component included in Gradle Enterprise server
    ⬢ Agents connect to the broker
    ⬢ Builds connect to the broker and request agents
    ⬢ Works for local and CI builds
    On-premises inside your network

  15. Running Test Distribution agents
    ⬢ The agent comes in two flavors: Jar and Docker image
    ⬢ Runs on Java 17+, requires about 128 MB of memory
    ⬢ Runs on Windows, macOS, and Linux
    ⬢ Detects its environment (JDKs, OS) during startup
    ⬢ Administrators can pass additional capabilities as command line parameters
    java -jar gradle-enterprise-test-distribution-agent.jar \
        --server https://ge.example.com \
        --api-key «api-key» \
        --capabilities docker,postgres=14
    docker run \
        --env TEST_DISTRIBUTION_AGENT_SERVER=https://ge.example.com \
        --env TEST_DISTRIBUTION_AGENT_API_KEY=«api-key» \
        --env TEST_DISTRIBUTION_AGENT_CAPABILITIES=postgres=14 \
        gradle/gradle-enterprise-test-distribution-agent
    Runs on-premises inside your
    own network infrastructure
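
To make the role of capabilities concrete, here is a minimal sketch of how a request could be matched against an agent. It assumes that capabilities are plain strings such as `docker` or `postgres=14` and that an agent qualifies when it offers every required string exactly. The slides do not describe the real matching rules, and `CapabilityMatcher` is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: exact string matching of capabilities.
// The product's actual matching semantics are not described in the slides.
class CapabilityMatcher {

    // Parses a comma-separated list such as "docker,postgres=14".
    static Set<String> parse(String commaSeparated) {
        Set<String> result = new LinkedHashSet<>();
        for (String part : commaSeparated.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    // An agent qualifies if it offers every required capability.
    static boolean satisfies(Set<String> agentCapabilities, Set<String> requirements) {
        return agentCapabilities.containsAll(requirements);
    }
}
```

Under this assumption, an agent started with `--capabilities docker,postgres=14` would serve a build that requires `postgres=14` but not one that requires `postgres=15`.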

  16. Gradle: Integrates with default Test task
    plugins {
        id("com.gradle.enterprise") version "3.13.3"
    }
    tasks.test {
        useJUnitPlatform()
        distribution {
            enabled.set(true)
        }
    }
    Input files (e.g. classpath) are automatically transferred to remote agents
    Code coverage and other output files are transferred back and merged automatically

  17. Maven: integrates with Surefire/Failsafe plugins
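    <plugin>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.22.2</version>
      <configuration>
        <properties>
          <distribution>
            <enabled>true</enabled>
          </distribution>
        </properties>
      </configuration>
    </plugin>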
    ⬢ Requires Gradle Enterprise
    Maven extension
    ⬢ Integrates with Surefire and
    Failsafe plugins

  18. Test Distribution requires JUnit Platform
    Most test frameworks with JUnit Platform test engines are supported:
    ⬢ JUnit 5 (Jupiter) ✔
    ⬢ JUnit 3/4 and Spock 1.x (via junit-vintage-engine included in JUnit 5) ✔
    ⬢ Spock 2.x ✔
    ⬢ TestNG (via testng-engine) ✔
    ⬢ ScalaTest (via scalatest-junit-runner) ✔
    ⬢ ArchUnit ✔
    ⬢ jqwik ✔
    ⬢ Kotest ✔

  19. Checking compatibility
    ⬢ Compatibility can be checked before adopting Test Distribution by applying a custom
    build script that adds custom values to the Build Scan
    https://github.com/gradle/gradle-enterprise-build-config-samples/pull/469

  20. Run build only once, distribute test execution

  21. Automatic distribution based on previous execution
    times
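
One common way to use previous execution times is greedy "longest first" partitioning: sort test classes by their last known duration and always hand the next one to the least-loaded executor. The sketch below illustrates that general technique only. The slides do not say which algorithm Test Distribution uses, and `DurationBasedPartitioner` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of longest-processing-time-first partitioning.
// Not the product's actual scheduling algorithm.
class DurationBasedPartitioner {

    // Returns one list of test class names per executor.
    static List<List<String>> partition(Map<String, Long> previousMillis, int executors) {
        if (executors < 1) {
            throw new IllegalArgumentException("executors must be >= 1");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < executors; i++) {
            buckets.add(new ArrayList<>());
        }
        long[] load = new long[executors];

        // Longest tests first; ties broken by name to keep the result deterministic.
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(previousMillis.entrySet());
        sorted.sort((a, b) -> {
            int byDuration = Long.compare(b.getValue(), a.getValue());
            return byDuration != 0 ? byDuration : a.getKey().compareTo(b.getKey());
        });

        for (Map.Entry<String, Long> test : sorted) {
            // Pick the executor with the smallest accumulated load so far.
            int target = 0;
            for (int i = 1; i < executors; i++) {
                if (load[i] < load[target]) {
                    target = i;
                }
            }
            buckets.get(target).add(test.getKey());
            load[target] += test.getValue();
        }
        return buckets;
    }
}
```

Compared with assigning tests in arbitrary order, this keeps a single long-running class from being scheduled last and extending the total run time.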

  22. Resilience against temporary network failures
    ⬢ Actively manage connections using
    WebSocket pings
    ⬢ Reconnect if connection is lost or
    unresponsive
    ⬢ Reschedule work on other agents if agent
    disappears
    ⬢ Retry file uploads on non-client errors
    ⬢ Prevent builds from breaking and causing
    disruption
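
The "retry on non-client errors" rule can be sketched as follows. The sketch assumes that an upload attempt reports an HTTP status code, or -1 for a network failure, and that only 4xx responses are treated as final failures. `UploadRetry` is a hypothetical name, and backoff between attempts is left out for brevity.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of a retry policy; not the product's implementation.
class UploadRetry {

    // 2xx is success and 4xx is the client's fault; neither is retried.
    // Everything else (5xx, or -1 for a network failure) is retried.
    static boolean isRetryable(int status) {
        boolean success = status >= 200 && status < 300;
        boolean clientError = status >= 400 && status < 500;
        return !success && !clientError;
    }

    // Calls 'attempt' until it returns a non-retryable status
    // or maxAttempts is reached; returns the last status seen.
    static int uploadWithRetry(IntSupplier attempt, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        int status = -1;
        for (int i = 0; i < maxAttempts; i++) {
            status = attempt.getAsInt();
            if (!isRetryable(status)) {
                return status;
            }
        }
        return status;
    }
}
```

A real client would normally wait between attempts, for example with exponential backoff, so that a struggling server is not flooded with retries.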

  23. Adaptive scheduling
    ⬢ Reacts to additional agents becoming available during test execution
    ⬢ Increases agent utilization
    ⬢ Reduces testing time
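
One way to see why late-joining agents help is a small simulation of a shared work queue from which every free agent pulls the next test. This is only a sketch under that assumption. The slides do not describe the scheduler's internals, and `AdaptiveQueueSimulation` is a hypothetical name.

```java
// Hypothetical simulation of pull-based scheduling; not the product's scheduler.
class AdaptiveQueueSimulation {

    // testDurations: queue of tests in the order they are handed out.
    // agentJoinTimes: the time at which each agent becomes available.
    // Returns the time at which the last test finishes.
    static long makespan(long[] testDurations, long[] agentJoinTimes) {
        if (agentJoinTimes.length == 0) {
            throw new IllegalArgumentException("at least one agent is required");
        }
        long[] freeAt = agentJoinTimes.clone();
        long finished = 0;
        for (long duration : testDurations) {
            // The agent that becomes free first pulls the next test.
            int next = 0;
            for (int i = 1; i < freeAt.length; i++) {
                if (freeAt[i] < freeAt[next]) {
                    next = i;
                }
            }
            freeAt[next] += duration;
            finished = Math.max(finished, freeAt[next]);
        }
        return finished;
    }
}
```

With six tests of 10 time units each, a single agent finishes at 60. If a second agent joins at time 20, the same queue drains by time 40, because the newcomer immediately starts pulling work.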

  24. Auto-scaling Test Distribution agents
    ⬢ Agent pools with min/max size and
    capabilities for horizontal scaling
    ⬢ HTTP endpoint provides metrics
    indicating the target number of agents
    for each pool, based on demand.
    ⬢ Step-by-step instructions for
    Kubernetes in docs
    ⬢ Real-time and historical usage can be
    visualized by Gradle Enterprise
    administrators
    {
      "id": "sosmbpbr",
      "name": "Linux",
      "capabilities": [
        "jdk=8",
        "os=linux"
      ],
      "minimumAgents": 1,
      "maximumAgents": 90,
      "connectedAgents": 2,
      "idleAgents": 0,
      "desiredAgents": 8
    }
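
An autoscaler that consumes this status could derive its scaling step from the `minimumAgents`, `maximumAgents`, `connectedAgents`, and `desiredAgents` fields. The sketch below assumes the values have already been parsed from the JSON. It clamps the desired count to the pool limits as a defensive measure, since the slides do not say whether the endpoint already does so. `PoolScaling` is a hypothetical name.

```java
// Hypothetical sketch of an autoscaler decision; field names follow the
// JSON on the slide, but the clamping logic is an assumption.
class PoolScaling {

    // Keeps the desired agent count within the pool's configured limits.
    static int targetAgents(int minimumAgents, int maximumAgents, int desiredAgents) {
        return Math.max(minimumAgents, Math.min(maximumAgents, desiredAgents));
    }

    // Positive result: start that many agents. Negative: stop that many.
    static int agentsToAdd(int minimumAgents, int maximumAgents,
                           int connectedAgents, int desiredAgents) {
        return targetAgents(minimumAgents, maximumAgents, desiredAgents) - connectedAgents;
    }
}
```

For the pool shown on the slide (minimum 1, maximum 90, connected 2, desired 8), this yields six additional agents.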

  25. Demo

  26. PR Build w/ Integration tests
    30 min → 3 min

  27. PR Build Times with Integration
    Tests went from 60 minutes to 5
    Danny Thomas
    Developer Productivity Team
    For one project we reduced the build
    time from 62 minutes to under 5
    minutes just using Test Distribution
    across multiple machines. So we think
    that Test Distribution will really move
    the needle and improve the test
    experience for everyone.

  28. Local Build Times: From 54 min
    to 5. CI PR Builds: Way More
    Reliable
    Cédric Champeau
    Technical Staff, Oracle
    (former Gradle Build Tool engineer)
    My test suites finished in 5
    minutes instead of 54. The 10X
    developer is finally here!

  29. Gradle Enterprise Acceleration Features

  30. Gradle Enterprise Acceleration Features
    Gradle
    Enterprise
    2022.2

  31. Predictive Test Selection
    (PTS)
    3

  32. Avoid wasting time and resources
    ⬢ Typically less than 1% of codebase affected
    by any given change
    ⬢ Typically fewer than 1% of tests affected
    by any given change
    ⬢ Yet, most reported test failures are not
    regressions
    💡 Skip irrelevant tests for a given change set using machine learning

  33. Not a new concept
    ⬢ Predictive Test Selection — Meta 2019
    ⬢ Improving Test Effectiveness Using Test Executions History: An
    Industrial Experience Report — Ericsson 2019
    ⬢ Taming Google-Scale Continuous Testing — Google 2017
    ⬢ Test Re-prioritization in Continuous Testing Environments —
    Concordia Univ. 2016
    ⬢ The Art of Testing Less without Sacrificing Quality — Microsoft
    2015
    ⬢ Improving the effectiveness of test suite through mining historical
    data — ACM 2014
    ⬢ … and more

  34. How it works…
    1. When a test run starts, Gradle Enterprise
    submits a test input snapshot and test set to a
    machine learning model.
    2. Gradle Enterprise automatically develops a
    test selection strategy by learning from
    historical code changes and test outcomes
    from your Build Scan™ data to predict a
    subset of relevant tests, which are then
    executed by your build.
    3. Code change and test result data are
    processed immediately after a Build Scan is
    uploaded to Gradle Enterprise and are used to
    update the test selection strategy based on
    the new results.
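
The selection step can be pictured with a deliberately simplified sketch. It assumes that a model has already produced a relevance score between 0 and 1 for each test class and that a fixed threshold decides what runs; tests marked as must-run are always included. The scores, the threshold, and the name `TestSelectionSketch` are illustrative assumptions. The slides do not describe the model's outputs or its decision rule.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: threshold-based selection over model scores.
// The real model and decision rule are not described in the slides.
class TestSelectionSketch {

    // Selects every test whose predicted relevance reaches the threshold,
    // plus all tests that must always run.
    static Set<String> select(Map<String, Double> predictedRelevance,
                              double threshold,
                              Set<String> mustRun) {
        Set<String> selected = new TreeSet<>(mustRun);
        for (Map.Entry<String, Double> entry : predictedRelevance.entrySet()) {
            if (entry.getValue() >= threshold) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

Lowering the threshold runs more tests and reduces the risk of missing a failure; raising it saves more time.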

  35. Predictive Test Selection Simulator
    ⬢ Simulates what tests would have been
    selected had PTS been enabled
    ⬢ Compares full test results to selected test
    results
    ⬢ Lets you assess the risk and savings of PTS per
    task/goal before adopting it
    ⬢ Requires at least 50 executions over a
    14-day period before starting to make
    predictions
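
The comparison a simulator makes can be reduced to two numbers: how much test time the selection would have saved, and how many failing tests it would have skipped. The sketch below computes both from a full test run and a hypothetical selection. The class and method names are illustrative, and the metrics the real simulator reports may differ.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of simulator-style metrics; not the product's implementation.
class SelectionSimulation {

    // Share of total test time (0.0 to 1.0) spent on tests that were not selected.
    static double savedTimeShare(Map<String, Long> durationsMillis, Set<String> selected) {
        long total = 0;
        long skipped = 0;
        for (Map.Entry<String, Long> entry : durationsMillis.entrySet()) {
            total += entry.getValue();
            if (!selected.contains(entry.getKey())) {
                skipped += entry.getValue();
            }
        }
        return total == 0 ? 0.0 : (double) skipped / total;
    }

    // Number of tests that failed in the full run but would not have been selected.
    static long missedFailures(Set<String> failedTests, Set<String> selected) {
        return failedTests.stream().filter(test -> !selected.contains(test)).count();
    }
}
```

A high saved-time share combined with few or no missed failures suggests that a task or goal is a good candidate for enabling PTS.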

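  40. Enabling Predictive Test Selection
    <plugin>
      <artifactId>maven-surefire-plugin</artifactId>
      <version>2.22.2</version>
      <configuration>
        <properties>
          <predictiveSelection>
            <enabled>true</enabled>
          </predictiveSelection>
        </properties>
      </configuration>
    </plugin>
    tasks.test {
        predictiveSelection {
            enabled.set(true)
        }
    }
    plugins {
        id("com.gradle.enterprise") version "3.13.3"
    }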

  41. Must-run tests
    import com.gradle.enterprise.testing.annotations.MustRun;

    @MustRun
    public class ImportantTests {
        // ...
    }
    Using gradle-enterprise-testing-annotations
    from Maven Central.
    tasks.test {
        predictiveSelection {
            mustRun {
                includeClasses.add("example.ImportantTests")
            }
        }
    }



    example.ImportantTests



  44. Usage patterns
    Our recommendation is to run all tests post-merge or at least periodically.
    1. Apply to existing pre-merge verification (run all tests post-merge)
    2. Apply to existing pre- and post-merge verification (move “all tests” run to nightly build)
    3. Run some high value tests pre-merge (change CI jobs to run costly tests sooner)
    4. Two-pass pre-merge (split pre-merge CI jobs into two steps to get faster feedback)

  45. Predictive Test Selection in Action
    https://ge.spring.io https://ge.junit.org
    https://ge.micronaut.io https://ge.gradle.org

  46. Questions?
    More info on gradle.com

  47. Thank you!
