Building a screenshot testing pipeline that scales

Slide 1

Slide 1 text

1 Building a screenshot testing pipeline that scales Lukas Appelhans

Slide 2

Slide 2 text

2 Building a screenshot testing pipeline Motivation

Slide 3

Slide 3 text

3 Motivation: Feature development

Slide 4

Slide 4 text

4 Motivation: Feature development

Slide 5

Slide 5 text

5 Motivation: Design system

Slide 6

Slide 6 text

6 Motivation: Manual review

Slide 7

Slide 7 text

7 Motivation: Manual review

Slide 8

Slide 8 text

8 Motivation: Ideal state (iOS)

Slide 9

Slide 9 text

9 Structure of this talk
01 Choosing which frameworks to use.
02 Building a screenshot testing pipeline.
03 Using the screenshot testing pipeline.
04 Adoption at Mercari and what we’re missing.

Slide 10

Slide 10 text

10 What is the purpose of screenshot testing? Verifying that UI code is rendering correctly. (As opposed to: Verifying that our code’s logic is working correctly.)

Slide 11

Slide 11 text

11 How many tests do we expect to have? A lot of them.
E2E tests
Integration tests
Unit tests
Screenshot tests

Slide 12

Slide 12 text

12 How does screenshot testing differ from other tests?
Unit/Integration/E2E tests
● Given, when, then is clearly specified.
● Post-condition can be verified automatically.

Slide 13

Slide 13 text

13 How does screenshot testing differ from other tests?
Unit/Integration/E2E tests
● Given, when, then is clearly specified.
● Post-condition can be verified automatically.
Screenshot tests
● Given (UI code with data), then.
● Post-condition cannot be verified automatically. (Whether UI code renders “correctly” is up to the viewer.)
● Use diffing to find which parts of the UI changed to make manual review easier.
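To make the diffing idea concrete, here is a minimal sketch of finding which part of a screenshot changed. It is not how Paparazzi or reg-suit implement their comparison; the function name `changedRegion` is made up for illustration, and it assumes both images have the same size.

```kotlin
import java.awt.Rectangle
import java.awt.image.BufferedImage

/**
 * Hypothetical helper: returns the smallest rectangle that contains every
 * pixel differing between the two images, or null if they are identical.
 * Assumes both images have the same dimensions.
 */
fun changedRegion(expected: BufferedImage, actual: BufferedImage): Rectangle? {
    require(expected.width == actual.width && expected.height == actual.height) {
        "Images must have the same size"
    }
    var minX = Int.MAX_VALUE
    var minY = Int.MAX_VALUE
    var maxX = -1
    var maxY = -1
    for (y in 0 until expected.height) {
        for (x in 0 until expected.width) {
            // Compare the packed ARGB value of each pixel.
            if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                minX = minOf(minX, x)
                minY = minOf(minY, y)
                maxX = maxOf(maxX, x)
                maxY = maxOf(maxY, y)
            }
        }
    }
    // maxX stays at -1 when no pixel differed.
    return if (maxX < 0) null else Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1)
}
```

A reviewer then only needs to look at the returned region. Real diffing tools usually add a tolerance threshold and highlight the changed pixels instead of reporting a single rectangle.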

Slide 14

Slide 14 text

14 Screenshot testing
01 Test cases that generate screenshots.
02 A report of visual differences.

Slide 15

Slide 15 text

15 Test cases that generate screenshots
01 JVM-based (Paparazzi)
02 Instrumented (on device/emulator) (Shot)

Slide 16

Slide 16 text

16 Test cases: Framework choices
Paparazzi
● Fast and cheap.
● Easy to set up.
● There are some differences between UI rendered using Paparazzi and on-device.
    @Test
    fun launchComposable() {
        paparazzi.snapshot {
            MyComposable()
        }
    }

Slide 17

Slide 17 text

17 Test cases: Framework choices
Shot
● As close to the “real world” as possible.
● More difficult setup, especially on CI. (Instrumented tests.)
● Takes significantly longer to run.
● More costly. (You’ll probably run this on Firebase Test Lab.)
    @Test
    fun rendersMyComposable() {
        composeRule.setContent {
            MyComposable()
        }
        compareScreenshot(composeRule)
    }

Slide 18

Slide 18 text

18 Reporting visual differences: Diffing tools
Compare two sets of screenshots, make visual differences easy to review.
01 Built-in reporting (e.g. Paparazzi)
02 Reg-suit

Slide 19

Slide 19 text

19 Reporting visual differences: Where to store screenshot files?
01 Store directly in git (use git-lfs)
02 Store externally (e.g. GCS bucket)

Slide 20

Slide 20 text

20 Reporting visual differences: What to compare to?
In a pull request, we want a report of the visual differences between the current state of the branch and the commit the branch was started from.
(Diagram: commits 1 2 3 4, labeled "Base commit", "master HEAD", and "branch HEAD".)
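The notion of a base commit can be illustrated with a toy model. The sketch below is not git's algorithm; it assumes a simple history given as a map from commit ID to parent IDs, and the names `ancestors` and `baseCommit` are invented for this example.

```kotlin
/** Collects the commit itself plus everything reachable through parent links, nearest first. */
fun ancestors(history: Map<String, List<String>>, head: String): Set<String> {
    val seen = LinkedHashSet<String>() // keeps breadth-first visiting order
    val queue = ArrayDeque(listOf(head))
    while (queue.isNotEmpty()) {
        val commit = queue.removeFirst()
        if (seen.add(commit)) queue.addAll(history[commit].orEmpty())
    }
    return seen
}

/**
 * Toy approximation of a base commit: walking back from the branch HEAD,
 * returns the first commit that is also reachable from the master HEAD,
 * or null if the two share no history.
 */
fun baseCommit(
    history: Map<String, List<String>>,
    masterHead: String,
    branchHead: String
): String? {
    val onMaster = ancestors(history, masterHead)
    return ancestors(history, branchHead).firstOrNull { it in onMaster }
}
```

With a history where commits 3 (master HEAD) and 4 (branch HEAD) both have commit 2 as their parent, `baseCommit` returns commit 2, so the report only contains changes made on the branch and not changes that landed on master in the meantime.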

Slide 21

Slide 21 text

21 Summary of choices
01 Paparazzi to generate screenshots.
02 Reg-suit to diff them.

Slide 22

Slide 22 text

22 Generate screenshots
1. Generate screenshots in all modules.
    ./gradlew recordPaparazzi[Debug/Dev/Release]
2. Copy screenshots of each module into a separate directory.
    mkdir -p ./screenshots
    cp ./*/src/test/snapshots/images/*.png ./screenshots
   Or create further subdirectories per module, package, etc.
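The per-module variant of the copy step could look roughly like the sketch below. It assumes modules are direct children of the project directory and use the `src/test/snapshots/images` layout from the command above; the function name `collectScreenshots` is hypothetical.

```kotlin
import java.io.File

/**
 * Hypothetical helper: copies <module>/src/test/snapshots/images/*.png from
 * every module directly under [projectDir] into [outputDir]/<module>/ and
 * returns the number of files copied.
 */
fun collectScreenshots(projectDir: File, outputDir: File): Int {
    var copied = 0
    val modules = projectDir.listFiles { file -> file.isDirectory }.orEmpty()
    for (module in modules) {
        val images = File(module, "src/test/snapshots/images")
        // listFiles returns null for directories that do not exist, hence orEmpty().
        val pngs = images.listFiles { file -> file.isFile && file.extension == "png" }.orEmpty()
        for (png in pngs) {
            // copyTo creates missing parent directories of the target.
            png.copyTo(File(outputDir, "${module.name}/${png.name}"), overwrite = true)
            copied++
        }
    }
    return copied
}
```

Keeping one subdirectory per module avoids file name clashes when two modules contain screenshots with the same name.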

Slide 23

Slide 23 text

23 Generate diff report
3. Use reg-suit to generate the report and store screenshots in the cloud.
    export EXPECTED_KEY=$(git merge-base HEAD origin/master)
    export ACTUAL_KEY=$GITHUB_SHA
    npx reg-suit run

Slide 25

Slide 25 text

25 Configuring reg-suit
./regconfig.json
    {
      "core": {
        "workingDir": ".reg",
        "actualDir": "screenshots"
      },
      "plugins": { … }
    }

Slide 26

Slide 26 text

26 Configuring reg-suit
./regconfig.json
    {
      …,
      "plugins": {
        "reg-simple-keygen-plugin": {
          "expectedKey": "${EXPECTED_KEY}",
          "actualKey": "${ACTUAL_KEY}"
        },
        …
      }
    }

Slide 27

Slide 27 text

27 Configuring reg-suit
./regconfig.json
    {
      …,
      "plugins": {
        …,
        "reg-publish-gcs-plugin": {
          "bucketName": "gcs-bucket-name"
        },
        …
      }
    }

Slide 28

Slide 28 text

28 Configuring reg-suit
./regconfig.json
    {
      …,
      "plugins": {
        …,
        "reg-notify-github-plugin": {
          "clientId": "your-client-id",
          "setCommitStatus": false,
          "shortDescription": true
        }
      }
    }

Slide 31

Slide 31 text

31 Using screenshot tests We know how to build the infrastructure, but how do we use it?

Slide 32

Slide 32 text

32 How to write tests
    class ChipScreenshotTest {
        @get:Rule
        val paparazzi = MercariPaparazzi()
        // Test cases
    }

Slide 33

Slide 33 text

33 How to write test cases
    class ChipScreenshotTest {
        …
        @Test
        fun shortLabel() = paparazzi.snapshot {
            Chip(
                label = "Foo",
                selected = false,
                onSelectionChanged = {}
            )
        }
    }

Slide 34

Slide 34 text

34 Pre-merge screenshot test report

Slide 35

Slide 35 text

35 Pre-merge screenshot test report

Slide 36

Slide 36 text

36 Pre-merge screenshot test report

Slide 37

Slide 37 text

37 Pre-merge screenshot test report (2)

Slide 38

Slide 38 text

39 Current scale
~700: Number of screenshots
~30 secs: Time it takes to generate the screenshots

Slide 39

Slide 39 text

40 Caveats
1. Code that requires multiple compositions will not render correctly.
    val styles = listOf(Large, Medium, Small)
    var index by remember { mutableStateOf(0) }
    Text(
        maxLines = 1,
        style = styles[index],
        onTextLayout = { textLayoutResult ->
            if (textLayoutResult.hasVisualOverflow) {
                index = index.plus(1).coerceAtMost(styles.size - 1)
            }
        }
    )
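Why this pattern depends on several passes can be shown with a plain Kotlin simulation. This is not Compose or Paparazzi code: the `Style` widths, the overflow rule, and the function `styleAfterPasses` are all made up to model the shrinking logic above.

```kotlin
/** Hypothetical text styles, largest first, with a made-up width per character. */
enum class Style(val widthPerChar: Int) { Large(12), Medium(9), Small(6) }

/**
 * Simulates the shrinking logic: each pass lays the text out with the current
 * style and, on overflow, selects the next smaller style for the following pass.
 * Returns the style used by the last pass, which is what a snapshot taken at
 * that point would show.
 */
fun styleAfterPasses(text: String, availableWidth: Int, passes: Int): Style {
    val styles = Style.values()
    var index = 0
    var rendered = styles[index]
    repeat(passes) {
        rendered = styles[index]
        val overflows = text.length * rendered.widthPerChar > availableWidth
        // Mirrors the onTextLayout callback: the new index only takes effect on the next pass.
        if (overflows) index = (index + 1).coerceAtMost(styles.size - 1)
    }
    return rendered
}
```

In this model, a text that only fits in the `Small` style needs three passes to settle. A snapshot taken after the first pass still shows the overflowing `Large` style, which is the kind of mismatch the caveat describes.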

Slide 40

Slide 40 text

41 Caveats
1. Code that requires multiple compositions will not render correctly.
2. Supporting multiple densities for each test case.
In addition, there was a bug in Paparazzi that prevented it from working properly with our build cache. Fixed in version 1.3.

Slide 41

Slide 41 text

42 What are we missing?
Full screens: Testing full screens
Adoption: Adoption outside of Design System Components

Slide 42

Slide 42 text

43 Thank you!