True reasons behind software testing

Us (the startup I work for) vs Them (everyone else 😒)

6 years later, mobile engineering at Azimo
• Release once per week
• Crash-free 99.99%
• Decentralised mobile engineering
• QA culture
• Small team

“We don’t follow the hype” - one of Azimo’s company values
• No device racks or shelves
• No 24/7 monkey runners
• Unit test coverage - probably ~50%
• Keep the number of UI and functional tests as low as possible
Azimo’s values: For hard working people. Every moment counts. None of us is as smart as all of us. We don’t follow the hype.

The beginning
• Zero tests
• Crash-free 90-95%
• Monolithic code (classes of ~3k lines of code) 👉 Preview of 1/20 of the file; 34 A4 pages when printed
• Release every 1-3 months

This 💩 code is what got the company traction 💰.

How does a company earn more money? Deliver faster, iterate more often.

The problem: release every 1-3 months
The causes I pointed out:
• Zero tests
• Crash-free 90-95%
• Monolithic code
My remedy: “Freeze product development for 3-6 months and let me build this 👉”

Next product change in half a year!!! 😱 Can your company afford that?
(* In the end we got 1 full month to improve our codebase.)

Why wasn’t our release process stable?
• Centralised QA team (usually available to us on Thursdays)
• Manual testing from the ground up
• A lot of back-and-forth between QA and devs

Monday: Start coding a feature
Tuesday: Coding finished
Wednesday: 1 day left for QA, let’s add one more feature…
Thursday: ...
Friday: “It took more than predicted…” Let’s wait another 6 days.

Next Thursday: QA: “The app is crashing.” Me: “OK, 3 lines of code.” QA: “Cool, commit it. We will check it next Thursday.”
Next Friday to Wednesday: Let’s add a few more changes...
Next Next Thursday: QA: “The app is crashing.” Me: “🤬...”

Goal #1: stable release cycle, once per month
How: unit tests to reduce the back-and-forth between QA and devs.
Supporting metric: unit test coverage

Our rules for unit testing
• A bug, once found, will never be repeated
• Test logic that is hard to reproduce
• Test tedious things which need to be tested
• Improve code architecture (“if it’s hard to test, it’s a bug”)
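
A minimal sketch of the first rule ("a bug, once found, will never be repeated"): once a bug is fixed, a unit test pins the fix so the same crash cannot ship again. The `parse_amount` helper and the comma-separator bug are hypothetical illustrations, not code from the Azimo app.

```python
import unittest
from decimal import Decimal, InvalidOperation


def parse_amount(text: str) -> Decimal:
    """Parse a user-entered amount (hypothetical helper).

    Accepts ',' or '.' as the decimal separator and rejects anything
    that is not a positive, finite number.
    """
    normalized = text.strip().replace(",", ".")
    try:
        amount = Decimal(normalized)
    except InvalidOperation:
        raise ValueError(f"not a valid amount: {text!r}") from None
    if not amount.is_finite() or amount <= 0:
        raise ValueError(f"amount must be a positive number: {text!r}")
    return amount


class ParseAmountRegressionTest(unittest.TestCase):
    """Each test pins one (imagined) bug report so it cannot come back."""

    def test_comma_decimal_separator_is_accepted(self):
        # Imagined bug report: entering "12,50" crashed the app.
        self.assertEqual(parse_amount("12,50"), Decimal("12.50"))

    def test_garbage_input_is_rejected_cleanly(self):
        # Invalid input must raise a ValueError the UI can handle.
        with self.assertRaises(ValueError):
            parse_amount("abc")
```

Saved as a module, the tests can be run with `python -m unittest`. The same test also serves the second rule: a locale-specific input is awkward to reproduce by hand on a device, but trivial to reproduce in a unit test.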

“It’s easy to fool code coverage metrics.” Yes, unless you have a good reason to use them.
Our goal tracker in 2015/16

Purpose of measuring test coverage and improving it
• Good practice
• Others do this
• Faster product delivery
• Identify what’s not tested
Martin Fowler on test coverage metrics: https://martinfowler.com/bliki/TestCoverage.html

Milestone #1: At least 1 release per month
What else:
• Crash-free 95% -> 99.0%
• Better code architecture (MVP, DI; testing is easier)
• Unit test coverage 50-60%

Goal #2: Release cycle 1 month -> 2 weeks + crash-free >= 99.0%
How:
• QA testers -> QA engineers
• Reduce manual testing as much as possible

QA engineers in the team, why now?
• Not possible before code cleanup
• Without unit tests we would automate the wrong things - see the testing pyramid 👉
• Internal career progression (QA Tester => QA Engineer)
Martin Fowler on the testing pyramid: https://martinfowler.com/bliki/TestPyramid.html

QA engineers’ priorities
1. Test new releases (we cannot be slower than 1/mo)
2. 🤖 Automate as much as possible (we have to be faster than 1/mo)

Why functional and UI tests?

Mobile fragmentation
OpenSignal report on Android fragmentation in 2015 (link)

Unit tests aren’t enough (esp. after 50-60% test coverage)

Test things in the easiest place to test them
Backend (monolithic system)

UI & functional test coverage - not % but product features
1. Login, registration
2. Price, transaction, payment
3. Everything else (with a focus on the things that take up most of our manual testing time)
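
One way to track coverage by feature rather than by percentage is a small map from product features to the UI/functional tests that exercise them. The sketch below is an assumption about how such a tracker could look; the feature keys follow the priorities above, while the test names and the `uncovered_features` helper are hypothetical.

```python
from typing import Dict, List, Tuple

# (feature, priority) pairs following the priority groups listed above;
# priority 1 is the most important group.
PRIORITISED_FEATURES: List[Tuple[str, int]] = [
    ("login", 1),
    ("registration", 1),
    ("price", 2),
    ("transaction", 2),
    ("payment", 2),
]


def uncovered_features(tests_by_feature: Dict[str, List[str]]) -> List[str]:
    """Return features without any UI/functional test, most important first."""
    ordered = sorted(PRIORITISED_FEATURES, key=lambda item: item[1])
    return [name for name, _priority in ordered if not tests_by_feature.get(name)]


# Hypothetical state of the suite: only login has a test so far.
current_suite = {"login": ["test_login_with_valid_credentials"], "payment": []}
missing = uncovered_features(current_suite)
```

With this input, `missing` lists `registration` first, then `price`, `transaction` and `payment`, which is the order in which new tests would be written.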

Milestone #2: Release train - 2 weeks
Stable crash-free - 99.0%
What else:
• QA engineers in the team
• Hundreds of functional and UI tests
• Unit test coverage 60-70%

Goal #3: Release cycle 2 weeks -> 1 week + crash-free >= 99.5%
How:
• Breaking changes in the testing stack

Testing stack was pushed to its limits 🥵
• 5 hours for the full test suite (probably no single successful run)
• Non-measurable flakiness
• Hard to debug (esp. AVDs, ADB)
• No internal competencies to improve test run management (Fastlane/Ruby)

AutomationTestSupervisor
• Configurable test sharding
• Re-running failing tests
• Multi-level logging: AVD, ADB, app logs
• Testing stack as code (AVD management, test package split)
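
The first two ideas, sharding and re-running failures, can be sketched in a few lines. This is an illustration of the technique only, not the actual AutomationTestSupervisor code: `shard_tests`, `run_with_reruns` and the fake runner are hypothetical names, and a real runner would launch instrumentation tests on an emulator instead of returning a canned result.

```python
from typing import Callable, Dict, List


def shard_tests(tests: List[str], shard_count: int) -> List[List[str]]:
    """Split tests round-robin into `shard_count` shards (e.g. one per emulator)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards: List[List[str]] = [[] for _ in range(shard_count)]
    for index, name in enumerate(tests):
        shards[index % shard_count].append(name)
    return shards


def run_with_reruns(
    tests: List[str],
    run_test: Callable[[str], bool],
    max_attempts: int = 2,
) -> Dict[str, str]:
    """Run each test, re-running failures up to `max_attempts` times.

    A test that passes only on a re-run is reported as "flaky", which is
    what makes flakiness measurable instead of anecdotal.
    """
    results: Dict[str, str] = {}
    for name in tests:
        outcome = "failed"
        for attempt in range(1, max_attempts + 1):
            if run_test(name):
                outcome = "passed" if attempt == 1 else "flaky"
                break
        results[name] = outcome
    return results


# Fake runner for illustration: "test_b" fails once and then passes,
# "test_c" always fails, everything else passes.
attempts: Dict[str, int] = {}


def fake_run(name: str) -> bool:
    attempts[name] = attempts.get(name, 0) + 1
    if name == "test_b":
        return attempts[name] > 1
    return name != "test_c"


shards = shard_tests(["test_a", "test_b", "test_c"], shard_count=2)
results: Dict[str, str] = {}
for shard in shards:  # a real setup would run the shards in parallel
    results.update(run_with_reruns(shard, fake_run))
```

Here `test_a` ends up as passed, `test_b` as flaky and `test_c` as failed. Round-robin is the simplest split; balancing shards by historical test duration would be a natural next step.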

AutomationTestSupervisor
1. A few months of development
2. Ruby migrated to Python (our competency)
3. Logs which work for us
4. 5-6 parallel emulators on a maxed-out MacBook Pro
5. Testing time reduced by 50% (2-3hrs now)
Full blog post about ATS (link)
AutomationTestSupervisor on GitHub (link)

How about iOS?
1. 2017 (Xcode 9) - test sharding via command line
2. 2018 (Xcode 10) - test sharding integrated in the Xcode UI
3. 4hrs reduced to 1hr due to parallelisation
Blog post with full coverage of iOS parallel testing (link)

Milestone #3: Release train - 1 week
Crash-free - 99.5%
What else:
• 2-3hrs for QA
• Full control over the testing stack
• Test parallelisation
• Emerging picture of flakiness

Goal #4: Decentralise the mobile engineering team
Keep 1-week release and crash-free 99.5%
How:
• Process speed-up through simplifications

Challenges
• 20% flakiness
• Overgrown test suite - too many tests, too long to run
• Custom test stack

Test things in the right (not just the easiest) place to test them
Backend (microservices system)

Speed up test configuration phase
Before, steps in the test: Launch app, Create recipient, Make payment, Main screen, Check price, Assertions on transfer status
After: QA utils preconfiguration (microservice) sets up the transfer status screen configuration; the test opens the transfer status screen and does what it needs to do (assertions)
+1s added, 30s saved

Remove unnecessary tests

"Adding is favoured over subtracting in problem solving"
“(...) subtractive solutions are also less likely to be appreciated. People might expect to receive less credit for subtractive solutions than for additive ones. A proposal to get rid of something might feel less creative than would coming up with something new to add (...)”
Nature | Vol 592 | April 2021 https://www.nature.com/articles/d41586-021-00592-0

Results after test count reduction and simplification
• 20% flakiness -> 10%
• Test execution < 2hrs
• Removed ~100 UI/functional tests (~20%) while the product was growing
Our thoughts on flakiness (link)

Milestone #4: Decentralised team (2 teams work in parallel)
Release train - 1 week
Stable crash-free - 99.5%
What else:
• QA test run < 2hrs
• Flakiness < 10%
• Tests in the right place

Goal #5: Try the team’s scalability (2 -> 3 teams working in parallel) + keep 1-week release train + improve crash-free to > 99.9%

AutomationTestSupervisor - when the unlocker becomes an obstacle
• Keeping it up to date (SDKs, dev tools, emulators)
• Limitations of local machines
• No support from the outside world (we’re still a team of 5)

Flakiness 10%
300 tests = 30 flaky tests
👇
QA engineers’ insights and manual re-tests double or triple the testing time
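
The arithmetic behind this slide, as a small sketch. `expected_flaky_tests` is a hypothetical helper; the 300 tests and 10% come from the slide, while applying the 2% flakiness reported later in the deck to the same 300 tests is an assumption made only for comparison.

```python
def expected_flaky_tests(total_tests: int, flakiness_percent: float) -> int:
    """Number of tests expected to fail spuriously in one full run."""
    return round(total_tests * flakiness_percent / 100)


# Figures from the slide: 300 tests at 10% flakiness.
at_ten_percent = expected_flaky_tests(300, 10)

# Assumption: the suite still has 300 tests once flakiness drops to 2%.
at_two_percent = expected_flaky_tests(300, 2)
```

At 10% that is 30 tests per run for a QA engineer to inspect and re-test by hand; at 2% it would be 6.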

Firebase Test Lab

Firebase Test Lab: one step backward
• No control over ADB (e.g. resetting the app or device state)
• Test debugging 😱 😱 😱 (scrolling through videos, kilometers of logs)
• Harder to get data for our reports (no webhooks, just scraping data from console output)

Firebase Test Lab: one step forward
• Support from the community https://firebase.community/
• QA engineer’s machine isn't blocked
• Sharing test results with software engineers by copy/pasting URLs (remote work)
• Unlimited scaling. We effortlessly increased from 5 emulators on the local machine to 20 on Firebase Test Lab

Results?
• Testing time 2h -> 25min
• Flakiness 10% -> 2% (thanks to faster iterations)

💰 $400/mo - is it a lot?
• QA machine is not blocked anymore
• Software engineer gets feedback 8x faster
• No maintenance freezes = a few hundred hours saved per year

Firebase Test Lab: implementation details
• Test sharding with Flank https://flank.github.io/flank/
• Do we really need video? Better logs instead 👉

Milestone #5: Decentralised team
Release train - 1 week
Stable crash-free > 99.99% 💪
What else:
• QA test run < 30min
• Flakiness 1-2%

Goal #6 (the current one): Daily release (if we want to) + keep crash-free > 99.99%

How to release a change within 24hrs?
• Low flakiness and <30min test results mean:
  ◦ QA self-service for software engineers
  ◦ QA engineers responsible for the test stack, not just testing
• Better code review process 👉
• And others: multi-module project, CI/CD in the cloud, Kotlin
How we improved code review process (link)

6 years later, mobile engineering at Azimo
• Release once per week (soon: on demand)
• Crash-free 99.99%
• Decentralised mobile engineering
• QA culture
• Small team

The journey
• Release 1/month, crash-free > 95%
• Release 1/2-weeks, crash-free 99.0%
• Release 1/week, crash-free 99.5%
• Release 1/week, crash-free 99.5%, 2 parallel teams
• Release 1/week, crash-free 99.99%, 2-3 parallel teams
• (ongoing) Release on demand & crash-free 99.99%

Don’t follow the hype. Be 1% better each day.

References
AzimoLabs blog
- Series about testing history (5 articles) https://medium.com/azimolabs/the-evolution-of-apps-quality-assurance-at-azimo-b2fa31d5cc5e
- Code review process improvements https://medium.com/azimolabs/how-we-improved-code-review-process-in-android-engineering-team-a637dd68cfaa
- Parallel testing of iOS app https://medium.com/azimolabs/parallel-testing-get-feedback-earlier-release-faster-b66d4dd08930
- Story behind AutomationTestSupervisor https://medium.com/azimolabs/story-behind-automationtestsupervisor-our-custom-made-tool-for-android-automation-tests-180c74a5cbfb
- What is flakiness https://medium.com/azimolabs/what-is-flakiness-and-how-we-deal-with-it-39b270ed5445
Martin Fowler blog
- Test coverage https://martinfowler.com/bliki/TestCoverage.html
- Tests pyramid https://martinfowler.com/bliki/TestPyramid.html
Nature Magazine
- Adding is favoured over subtracting in problem solving https://www.nature.com/articles/d41586-021-00592-0

Thank you! twitter.com/froger_mcs

More Decks by Mirosław Stanek
