True reasons behind software testing

Mobiconf 2021 (07.10)

Have you ever wondered how a startup's proof of concept with zero tests evolves into an environment that runs automated quality checks 24/7? No company invests countless engineering hours and an enormous amount of money just to maintain good engineering practices. Test automation is more than that. It makes your engineering team agile, enables the company to scale up, and makes your product successful. And this is not only due to crash-free metrics and product quality.

In this talk, you will see the evolution of mobile QA at Azimo. I will walk you through our journey from a project with no unit tests to where we are now. We will cover:

code coverage metrics (and when to stop using them),
QA culture in the entire engineering team,
building and maintaining a custom testing stack,
parallel testing in the cloud,
our evolution to thousands of tests just to come back to a fraction of them,
and more.
You will learn not only what we built but also the business reasoning behind our decisions. It’s not a technical deep-dive presentation. It’s about how to make your company successful.

Implementation details are on you.


Mirosław Stanek

October 07, 2021


  1. True reasons behind software testing

  2. Us (startup I work for) vs Them (everyone else 😒)

  3. 6 years later, mobile engineering at Azimo • Release once per week • Crash free 99.99% • Decentralised mobile engineering • QA culture • Small team
  4. “We don’t follow the hype” - one of Azimo’s company values • No device racks or shelves • No 24/7 monkey runners • Unit tests coverage - probably ~50% • Keep number of UI and functional tests as low as possible • Company values: For hard working people; Every moment counts; None of us is as smart as all of us; We don’t follow the hype
  5. The beginning • Zero tests • Crash free 90-95% • Monolithic code (classes with ~3k lines of code) 👉 • Release every 1-3 months • Preview of 1/20 of the file (34 A4 pages when printed)
  6. This 💩 code helped the company gain traction 💰.

  7. How to make the company earn more money? Deliver faster, iterate more often.
  8. The problem: release every 1-3 months • The causes I pointed out: zero tests; crash free 90-95%; monolithic code • My remedy: “Freeze product development for 3-6 months and let me build this 👉”
  9. Next product change in half a year!!! 😱 Can your company afford that? (*in the end we got 1 full month to improve our codebase)
  10. Why wasn’t our release process stable? • Centralised QA team (usually available for us on Thursdays) • Manual testing from the ground up • A lot of back and forth between QA and devs
  11. Monday: Start coding a feature • Tuesday: Coding finished • Wednesday: 1 day left for QA, let’s add one more feature… • Thursday: ... • Friday: “It took more than predicted…” Let’s wait another 6 days.
  12. Next Thursday: QA: “app is crashing”. Me: “Ok, 3 lines of code” QA: “Cool, commit it. We will check it next Thursday” • Next Friday to Wednesday: Let’s add a few more changes... • Next Next Thursday: QA: “app is crashing”. Me: “🤬...”
  13. Goal #1: stable release cycle, once per month

  14. Goal #1: stable release cycle, once per month • How: Unit tests to decrease back and forth between QA and devs • Supporting metrics: Unit tests coverage
  15. Our rules for unit testing • Bug, once found, will never be repeated • Test logic which is hard to reproduce • Test tedious things which need to be tested • Improve code architecture (“if it’s hard to test, it’s a bug”)
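
The first rule above (a bug, once found, is never repeated) is usually enforced with a regression test. The sketch below is a minimal illustration in Python, not code from the Azimo apps: `parse_amount` and the comma-separator bug are hypothetical, invented only for this example.

```python
import unittest
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Hypothetical helper: parse a user-entered money amount."""
    # Invented bug for illustration: "10,50" used to fail because only "."
    # was accepted as the decimal separator. The fix normalises the input.
    normalized = text.strip().replace(",", ".")
    return Decimal(normalized)


class ParseAmountRegressionTest(unittest.TestCase):
    """Each test pins down a bug that was found once, so it cannot return."""

    def test_comma_decimal_separator(self):
        self.assertEqual(parse_amount("10,50"), Decimal("10.50"))

    def test_surrounding_whitespace(self):
        self.assertEqual(parse_amount(" 7.25 "), Decimal("7.25"))
```

Running the suite with `python -m unittest` on every change means the same bug is caught by a developer in seconds instead of by QA a week later.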
  16. “It’s easy to fool code coverage metrics.” Yes, unless you have a good purpose to use them. • Our goal tracker in 2015/16
  17. Purpose of measuring test coverage and improving it • Good practice • Others do this • Faster product delivery • Identify what’s not tested • Martin Fowler about test coverage metrics: https://martinfowler.com/bliki/TestCoverage.html
  18. Milestone #1: At least 1 release per month • What else: Crash free 95% -> 99.0%; Better code architecture (MVP, DI, testing is easier); Unit tests coverage 50-60%
  19. Goal #2: Release cycle 1 month -> 2 weeks, Crash Free >= 99.0%
  20. Goal #2: Release cycle 1 month -> 2 weeks + Crash Free >= 99.0% • How: QA testers -> QA engineers; Reduce manual testing as much as possible
  21. QA engineers in the team, why now? • Not possible before code cleanup • Without unit tests we would automate the wrong things - see testing pyramid 👉 • Internal career progression (QA Tester => QA Engineer) • Martin Fowler about testing pyramid: https://martinfowler.com/bliki/TestPyramid.html
  22. QA engineers’ priorities: 1. Test new releases (we cannot be slower than 1/mo) 2. 🤖 Automate as much as possible (we have to be faster than 1/mo)
  23. Why functional and UI tests?

  24. Mobile fragmentation OpenSignal report on Android fragmentation in 2015 (link)

  25. Unit tests aren’t enough (esp. after 50-60% test coverage)

  26. Test things in the easiest place to test them • Backend (monolithic system)
  27. UI & functional tests coverage - not % but product features: 1. Login, registration 2. Price, transaction, payment 3. Everything else (with the focus on things which take most of our manual testing time)
  28. Milestone #2: Release train - 2 weeks • Stable crash-free - 99.0% • What else: QA engineers in the team; Hundreds of functional and UI tests; Unit tests coverage 60-70%
  29. Goal #3: Release cycle, 2 weeks to 1 week, Crash Free >= 99.5%
  30. Goal #3: Release cycle, 2 weeks -> 1 week + Crash-free >= 99.5% • How: Breaking changes in testing stack
  31. Testing stack was pushed to its limits 🥵 • 5 hours for full test suite (probably not a single successful run) • Non-measurable flakiness • Hard to debug (esp. AVDs, ADB) • No internal competencies to improve test runs management (Fastlane/Ruby)
  32. AutomationTestSupervisor • Configurable test sharding • Re-run failing tests • Multi-level logging (AVD, ADB, app logs) • Testing stack as code (AVD management, test packages split)
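
Two of the ideas listed on this slide, test sharding and re-running failing tests, can be sketched in a few lines of Python. This is an illustrative sketch only and not the actual AutomationTestSupervisor code: the function names, the round-robin split, and the `run_test` callback are assumptions made for the example.

```python
from typing import Callable, Dict, List


def shard_tests(tests: List[str], shard_count: int) -> List[List[str]]:
    """Split tests round-robin so each emulator gets a similar share."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    shards: List[List[str]] = [[] for _ in range(shard_count)]
    for index, name in enumerate(tests):
        shards[index % shard_count].append(name)
    return shards


def run_with_reruns(
    tests: List[str],
    run_test: Callable[[str], bool],  # hypothetical: returns True on pass
    max_attempts: int = 2,
) -> Dict[str, List[str]]:
    """Run each test, re-running failures up to max_attempts times."""
    passed: List[str] = []
    flaky: List[str] = []   # failed at least once, then passed
    failed: List[str] = []  # failed on every attempt
    for name in tests:
        for attempt in range(1, max_attempts + 1):
            if run_test(name):
                (passed if attempt == 1 else flaky).append(name)
                break
        else:
            failed.append(name)
    return {"passed": passed, "flaky": flaky, "failed": failed}
```

Keeping tests that pass only on a re-run in a separate bucket is one way to start measuring flakiness instead of hiding it behind retries.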
  33. AutomationTestSupervisor: 1. A few months of development 2. Ruby migrated to Python (our competency) 3. Logs which work for us 4. 5-6 parallel simulators on maxed out MacBook Pro 5. Testing time reduced by 50% (2-3hrs now) • Full blog post about ATS (link) • AutomationTestSupervisor on GitHub (link)
  34. How about iOS? 1. 2017 (Xcode 9) - test sharding via command line 2. 2018 (Xcode 10) - test sharding integrated in Xcode UI 3. 4hrs reduced to 1hr due to parallelisation • Blog post with full coverage of iOS parallel testing (link)
  35. Milestone #3: Release train - 1 week • Crash-free - 99.5% • What else: 2-3hrs for QA; Full control over testing stack; Tests parallelisation; Emerging picture of flakiness
  36. Goal #4: Decentralize mobile engineering team • Keep 1 week release and crash free 99.5% • How: Process speed-up through simplifications
  37. Challenges • 20% flakiness • Overgrown test suite - too many tests, too long to run • Custom test stack
  38. Test things in the right (not just the easiest) place to test them • Backend (microservices system)
  39. Speed up test configuration phase • Before: Launch app -> Create recipient -> Make payment -> Main screen -> Check price -> Assertions on transfer status • After: QA utils preconfiguration (microservice) -> Transfer status screen configuration -> Transfer status screen -> Do what you need to do assertions • +1s added, 30s saved
  40. Remove unnecessary tests

  41. "Adding is favoured over subtracting in problem solving" • “(...) subtractive solutions are also less likely to be appreciated. People might expect to receive less credit for subtractive solutions than for additive ones. A proposal to get rid of something might feel less creative than would coming up with something new to add(...)” • Nature | Vol 592 | April 2021 https://www.nature.com/articles/d41586-021-00592-0
  42. Results after test count reduction and simplification • 20% flakiness -> 10% • Test execution < 2hrs • Removed ~100 UI/functional tests (~20%) while product was growing • Our thoughts on flakiness (link)
  43. Milestone #4: Decentralised team (2 teams work in parallel) • Release train - 1 week • Stable crash-free - 99.5% • What else: QA test run < 2hrs; Flakiness <10%; Tests in the right place
  44. Goal #5: Try team’s scalability (2 -> 3 teams working in parallel) + Keep 1 week release train + Improve crash free >99.9%
  45. AutomationTestSupervisor - when unlocker becomes an obstacle • Keeping it up to date (SDKs, dev tools, emulators) • Limitations of local machines • No support from the outside world (we’re still a team of 5)
  46. Flakiness 10% • 300 tests = 30 flaky tests 👇 • QA engineers’ insights and manual re-tests double or triple the testing time
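
The arithmetic on this slide, written out as a small Python helper. The function name is hypothetical, and the sketch assumes flakiness is expressed as a percentage of the whole suite.

```python
def flaky_test_count(total_tests: int, flakiness_percent: float) -> int:
    """Number of tests expected to be flaky at a given flakiness rate."""
    if total_tests < 0 or not 0 <= flakiness_percent <= 100:
        raise ValueError("total_tests must be >= 0 and percent within 0..100")
    return round(total_tests * flakiness_percent / 100)


print(flaky_test_count(300, 10))  # prints 30
```

Each of those flaky tests needs a human to look at it and decide whether the failure is real, which is where the extra testing time comes from.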
  47. Firebase Test Lab

  48. Firebase Test Lab: one step backward • No control over ADB (e.g. resetting the app or device state) • Tests debugging 😱 😱 😱 (scrolling through videos, kilometers of logs) • Harder to get data for our reports (no webhooks, just scraping data from console output)
  49. Firebase Test Lab: one step forward • Support from the community https://firebase.community/ • QA engineer’s machine isn't blocked • Sharing test results with software engineers by copy/pasting URL addresses (remote work) • Unlimited scaling. We effortlessly increased from 5 emulators on the local machine to 20 on Firebase Test Lab
  50. Results? • Testing time 2h -> 25min • Flakiness 10% -> 2% (thanks to faster iterations)
  51. 💰 400$/mo - is it a lot? • QA machine is not blocked anymore • Software engineer gets feedback 8x faster • No maintenance freezes = a few hundred hours saved per year
  52. Firebase Test Lab implementation details • Tests sharding with Flank https://flank.github.io/flank/ • Do we really need video? Better logs instead 👉
  53. Milestone #5: Decentralised team • Release train - 1 week • Stable crash-free > 99.99% 💪 • What else: QA test run < 30min; Flakiness 1-2%
  54. Goal #6 (the current one): Everyday release (if we want to) + Keep crash free > 99.99%
  55. How to release a change within 24hrs? • Low flakiness and <30min test results mean: ◦ QA self-service for software engineers ◦ QA engineers responsible for test stack, not just testing • Better code review process 👉 • and others: multimodule project, CI/CD in the cloud, Kotlin • How we improved code review process (link)
  56. 6 years later, mobile engineering at Azimo • Release once per week (soon: on demand) • Crash free 99.99% • Decentralised mobile engineering • QA culture • Small team
  57. The journey • Release 1/month, crash-free > 95% • Release 1/2-weeks, crash-free 99.0% • Release 1/week, crash-free 99.5% • Release 1/week, crash-free 99.5%, 2 parallel teams • Release 1/week, crash-free 99.99%, 2-3 parallel teams • (ongoing) Release on demand & crash-free 99.99%
  58. Don’t follow the hype. Be 1% better each day.

  59. References
    AzimoLabs blog
    - Series about testing history (5 articles): https://medium.com/azimolabs/the-evolution-of-apps-quality-assurance-at-azimo-b2fa31d5cc5e
    - Code review process improvements: https://medium.com/azimolabs/how-we-improved-code-review-process-in-android-engineering-team-a637dd68cfaa
    - Parallel testing of iOS app: https://medium.com/azimolabs/parallel-testing-get-feedback-earlier-release-faster-b66d4dd08930
    - Story behind AutomationTestSupervisor: https://medium.com/azimolabs/story-behind-automationtestsupervisor-our-custom-made-tool-for-android-automation-tests-180c74a5cbfb
    - What is flakiness: https://medium.com/azimolabs/what-is-flakiness-and-how-we-deal-with-it-39b270ed5445
    Martin Fowler blog
    - Test coverage: https://martinfowler.com/bliki/TestCoverage.html
    - Tests pyramid: https://martinfowler.com/bliki/TestPyramid.html
    Nature Magazine
    - Adding is favoured over subtracting in problem solving: https://www.nature.com/articles/d41586-021-00592-0
  60. Thank you! mirek@azimo.com twitter.com/froger_mcs