Scaling_Mobile_Test_Automation_with_Appium_and_AI

Scaling Mobile Test Automation with Appium and AI
Pann Nu Wai
QA Engineer, KINTO Technologies
SeleniumConf Valencia 2026

AGENDA
➤ The Breaking Point
➤ Framework Evolution
➤ AI Integration
➤ From Tools to Culture
➤ Visual Regression Testing
➤ Real Impact & Takeaways

01 The Breaking Point

The Scale of the Problem
128 scenarios, 2 projects
12h total execution per full run
6h execution time per project
2 QA engineers managing all maintenance
➤ Flaky tests blocked releases
➤ 6-8h/week on test maintenance
➤ No time for new tests
➤ Key-person dependency risk

Release Blockers
Flaky Tests
➤ Unstable and unreliable test results
➤ Environment dependency, timing issues
Slow Feedback
➤ Slow feedback from CI/CD pipeline
➤ Developers skip waiting for results
Cost of Delay
➤ Multiple costs from release delays
➤ Slower delivery, declining morale

02 Framework Evolution: Building for Scale

Modular Page Object Model
WHAT WE DID
➤ Modular Appium framework refactoring
➤ Page Object + component design
➤ Shared iOS/Android abstraction layer
➤ Reusable component library
RESULT
40%+ test duplication reduced
Cross-platform code sharing reduced maintenance significantly
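A minimal sketch of what a page object with a shared iOS/Android abstraction could look like. It is not the framework from the talk: `UiDriver`, `CrossPlatformLocator`, `FormComponent`, `LoginPage`, `RecordingDriver`, and every locator string are hypothetical names chosen for illustration. The real Appium driver is hidden behind a small interface so the example runs without a device.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical platform switch; a real suite would derive this from driver capabilities.
enum Platform { ANDROID, IOS }

// Hypothetical thin wrapper around the real Appium driver, so pages never import it directly.
interface UiDriver {
    Platform platform();
    void type(String locator, String text);
    void tap(String locator);
}

// One logical element, one locator per platform.
final class CrossPlatformLocator {
    private final Map<Platform, String> byPlatform = new EnumMap<>(Platform.class);

    CrossPlatformLocator(String android, String ios) {
        byPlatform.put(Platform.ANDROID, android);
        byPlatform.put(Platform.IOS, ios);
    }

    String resolve(Platform platform) {
        return byPlatform.get(platform);
    }
}

// Reusable component: any screen with fields and a submit button can share it.
final class FormComponent {
    private final UiDriver driver;

    FormComponent(UiDriver driver) { this.driver = driver; }

    void fill(CrossPlatformLocator field, String value) {
        driver.type(field.resolve(driver.platform()), value);
    }

    void submit(CrossPlatformLocator button) {
        driver.tap(button.resolve(driver.platform()));
    }
}

// Page object: tests call login() and never see a selector. Locator values are made up.
final class LoginPage {
    private static final CrossPlatformLocator EMAIL =
            new CrossPlatformLocator("id=email_input", "accessibility id=emailField");
    private static final CrossPlatformLocator PASSWORD =
            new CrossPlatformLocator("id=password_input", "accessibility id=passwordField");
    private static final CrossPlatformLocator SUBMIT =
            new CrossPlatformLocator("id=login_button", "accessibility id=loginButton");

    private final FormComponent form;

    LoginPage(UiDriver driver) { this.form = new FormComponent(driver); }

    void login(String email, String password) {
        form.fill(EMAIL, email);
        form.fill(PASSWORD, password);
        form.submit(SUBMIT);
    }
}

// Fake driver that records actions, standing in for a device in this sketch.
final class RecordingDriver implements UiDriver {
    private final Platform platform;
    final List<String> actions = new ArrayList<>();

    RecordingDriver(Platform platform) { this.platform = platform; }

    public Platform platform() { return platform; }
    public void type(String locator, String text) { actions.add("type " + locator); }
    public void tap(String locator) { actions.add("tap " + locator); }
}
```

The same `LoginPage` then serves both platforms: `new LoginPage(new RecordingDriver(Platform.IOS)).login(...)` resolves only the iOS locators, which is the kind of sharing that removes duplicated per-platform test code.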

CI/CD Integration with GitHub Actions
PR Trigger: auto-trigger tests on PR events
Smart Grouping: tests selected by change impact
Parallel Execution: parallel test execution for faster runs
Auto Report: auto-report results with Slack notification

GitHub Actions Workflow
Build: Maven validate, compile; dependency cache
Mobile Test: APK / App download; Emulator / Simulator + Appium
Code Quality: import check; dependency analysis
Build Summary: status report; Slack notification
➤ Triggers: push to develop/main, PR, manual dispatch
➤ AI Labeler: auto-tag PRs with AI tool usage

Smart Grouping Strategy
PR TRIGGER (Fast)
➤ Core smoke test suite only
➤ Core flow smoke test
➤ Fast feedback per PR
➤ Every PR auto-triggered
FULL REGRESSION (Deep)
➤ Grouped test suites by feature
➤ All scenarios across projects
➤ Feature-based test categories
➤ Manual dispatch or merge
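The fast/deep split can be sketched as a small lookup from CI trigger to test groups. This is an illustration only: the `Trigger` enum, `SuiteSelector`, and the feature group names (`login`, `booking`, `payment`) are assumptions, not names from the talk.

```java
import java.util.List;

// Hypothetical CI events that can start a test run.
enum Trigger { PULL_REQUEST, MERGE, MANUAL_DISPATCH }

final class SuiteSelector {
    // Group names are placeholders; a real suite would use its own feature categories.
    private static final List<String> SMOKE = List.of("smoke");
    private static final List<String> FULL_REGRESSION =
            List.of("smoke", "login", "booking", "payment");

    // PRs get the fast smoke suite; merges and manual dispatches get every feature group.
    static List<String> groupsFor(Trigger trigger) {
        switch (trigger) {
            case PULL_REQUEST:
                return SMOKE;
            case MERGE:
            case MANUAL_DISPATCH:
                return FULL_REGRESSION;
            default:
                throw new IllegalArgumentException("Unknown trigger: " + trigger);
        }
    }

    // Comma-separated form, convenient to pass to a test runner's group filter.
    static String groupsProperty(Trigger trigger) {
        return String.join(",", groupsFor(trigger));
    }
}
```

Keeping the mapping in one place means the pipeline definition only has to pass the event type, and changing what "fast" means does not require touching individual tests.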

Framework Evolution: Impact
BEFORE
➤ Excessive time spent on maintenance & CI/CD
➤ Heavy test duplication
➤ Flaky tests blocked releases
➤ Key-person dependency (2 QA)
AFTER
➤ Maintenance & CI/CD time reduced by over 70%
➤ Test duplication eliminated
➤ Stable tests, reliable releases
➤ Shared framework & documentation

03 AI Integration: What Worked, What Didn’t

What Worked
Claude: Log Analysis & Failure Summaries
➤ Log analysis, cause & fix tips
GitHub Copilot: Test Code Refactoring
➤ Refactoring suggestions, auto-generated template code
Devin: Automated Documentation
➤ Auto-generated docs from Appium code

What Did Not Work
✗ AI-Generated Selectors: pass locally, silently fail in CI
✗ Over-Automated Tests: mass auto-generation of shallow, ineffective tests
✗ Lack of Transparency: unclear AI recommendations causing trust issues

AI in QA: Successes vs Failures
SUCCESS
✓ Automated log analysis (Claude)
✓ Code refactoring support (Copilot)
✓ Documentation generation (DevinAI)
✓ Improved engineer productivity
FAILURE
✗ AI-generated selectors failing in CI
✗ Mass generation of shallow automated tests
✗ AI becoming a black box
✗ Over-reliance on AI tools
✗ Trust issues within the team

Recovery Strategies
Validation Checklists: validate AI code before production use
Engineer Training: prompt design and AI review workshops
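One way a validation checklist could be partly automated is a small review step that flags selector patterns known to be brittle before AI-generated code is merged. The sketch below is an assumption about what such a check might contain, not the checklist from the talk; `SelectorGuard` and its three rules are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SelectorGuard {
    // Matches positional XPath predicates such as [2], which break when the layout shifts.
    private static final Pattern POSITIONAL_INDEX = Pattern.compile("\\[\\d+\\]");

    // Returns human-readable findings; an empty list means no rule fired.
    static List<String> review(String selector) {
        List<String> findings = new ArrayList<>();
        if (selector == null || selector.trim().isEmpty()) {
            findings.add("selector is empty");
            return findings;
        }
        // Absolute paths depend on the full view hierarchy, which often differs between local and CI devices.
        if (selector.startsWith("/") && !selector.startsWith("//")) {
            findings.add("absolute XPath from the hierarchy root");
        }
        if (POSITIONAL_INDEX.matcher(selector).find()) {
            findings.add("positional index in XPath");
        }
        // Visible text changes with locale and copy edits.
        if (selector.contains("@text=") || selector.contains("@label=")) {
            findings.add("matches on visible text");
        }
        return findings;
    }
}
```

For example, `SelectorGuard.review("/hierarchy/android.widget.FrameLayout/android.widget.Button[2]")` reports two findings, while a selector keyed on a resource id reports none. A check like this does not replace human review; it only keeps the most common failure patterns from reaching CI unnoticed.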

04 From Tools to Culture

Overcoming Resistance to AI Adoption
Skepticism: "AI is useless" (team reaction)
Experimentation: building up small wins
Adoption: whole team leveraging AI tools
Culture: working with AI becomes the norm

Standards & Community
INTERNAL STANDARDS
➤ AI code review criteria
➤ Test quality checklists
➤ AI tool usage guidelines
COMMUNITY
➤ Appium Meetup Tokyo
➤ Internal blog knowledge sharing
➤ Conference presentations

05 Visual Regression Testing with AI

Full Page Visual Comparison
Capture: scroll & screenshot each viewport position
Trim: trim fixed header & footer per image
Stitch: overlap-aware merge into single full-page
Compare: pixel-by-pixel diff against stored baseline
Output: save highlighted difference image
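The trim, stitch, and compare steps can be sketched with the JDK's `BufferedImage` alone. This is a simplified illustration under stated assumptions, not the talk's implementation: `FullPageDiff` is a hypothetical name, overlap is detected by exact row equality (real screenshots may need a tolerance, and large uniform regions can produce false overlaps), and all images are assumed to share one width.

```java
import java.awt.image.BufferedImage;
import java.util.List;

final class FullPageDiff {

    static final class Result {
        final int diffPixels;
        final BufferedImage highlighted;
        Result(int diffPixels, BufferedImage highlighted) {
            this.diffPixels = diffPixels;
            this.highlighted = highlighted;
        }
    }

    // Trim: copy the screenshot without its fixed header and footer rows.
    static BufferedImage trim(BufferedImage shot, int headerPx, int footerPx) {
        return copyRows(shot, headerPx, shot.getHeight() - footerPx);
    }

    // Stitch: append each part, skipping rows that repeat the bottom of the page so far.
    static BufferedImage stitch(List<BufferedImage> parts) {
        BufferedImage page = parts.get(0);
        for (BufferedImage next : parts.subList(1, parts.size())) {
            int overlap = overlapRows(page, next);
            BufferedImage merged = new BufferedImage(page.getWidth(),
                    page.getHeight() + next.getHeight() - overlap, BufferedImage.TYPE_INT_RGB);
            blit(page, 0, page.getHeight(), merged, 0);
            blit(next, overlap, next.getHeight(), merged, page.getHeight());
            page = merged;
        }
        return page;
    }

    // Largest k such that the bottom k rows of top equal the top k rows of bottom.
    static int overlapRows(BufferedImage top, BufferedImage bottom) {
        for (int k = Math.min(top.getHeight(), bottom.getHeight()); k > 0; k--) {
            if (rowsEqual(top, top.getHeight() - k, bottom, k)) return k;
        }
        return 0;
    }

    // Compare: count differing pixels and paint them red on a copy of the actual image.
    static Result compare(BufferedImage baseline, BufferedImage actual) {
        if (baseline.getWidth() != actual.getWidth() || baseline.getHeight() != actual.getHeight()) {
            throw new IllegalArgumentException("baseline and actual differ in size");
        }
        BufferedImage highlighted = copyRows(actual, 0, actual.getHeight());
        int diffPixels = 0;
        for (int y = 0; y < actual.getHeight(); y++) {
            for (int x = 0; x < actual.getWidth(); x++) {
                if (baseline.getRGB(x, y) != actual.getRGB(x, y)) {
                    highlighted.setRGB(x, y, 0xFF0000);
                    diffPixels++;
                }
            }
        }
        return new Result(diffPixels, highlighted);
    }

    // Copies rows [fromY, toY) of src into a new image.
    static BufferedImage copyRows(BufferedImage src, int fromY, int toY) {
        BufferedImage out = new BufferedImage(src.getWidth(), toY - fromY, BufferedImage.TYPE_INT_RGB);
        blit(src, fromY, toY, out, 0);
        return out;
    }

    private static void blit(BufferedImage src, int fromY, int toY, BufferedImage dst, int dstY) {
        for (int y = fromY; y < toY; y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                dst.setRGB(x, dstY + (y - fromY), src.getRGB(x, y));
            }
        }
    }

    private static boolean rowsEqual(BufferedImage a, int aY, BufferedImage b, int rows) {
        for (int r = 0; r < rows; r++) {
            for (int x = 0; x < a.getWidth(); x++) {
                if (a.getRGB(x, aY + r) != b.getRGB(x, r)) return false;
            }
        }
        return true;
    }
}
```

A typical call order would be `trim` on each captured viewport, `stitch` on the trimmed list, then `compare` against the stored baseline; the `highlighted` image in the result is what would be saved as the output artifact.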

AI in Visual Testing
GitHub Copilot: Code Generation & Refactoring
WHAT COPILOT DOES
➤ Figma API integration template code
➤ Export scripts with rate limiting
➤ Image processing utilities
➤ Test config auto-completion
➤ Comparison logic refactoring
CONCRETE EXAMPLES
➤ Retry & rate limit scripts
➤ Screen mapping config generation
➤ Image scaling helper methods
➤ Stitching algorithm template code
➤ Test suite XML generation

AI in Visual Testing
Claude: Architecture & Intelligent Analysis
WHAT CLAUDE DOES
➤ Visual regression architecture design
➤ Figma screen-to-test mapping
➤ Screen-type-aware diff thresholds
➤ Diff image analysis & explanation
CONCRETE EXAMPLES
➤ SmartVisualRegressionHelper design
➤ ScreenMappingConfig auto-generation
➤ Excluded-area definitions
➤ Threshold tuning per screen type
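A screen-type-aware threshold can be illustrated as a per-type limit on the share of differing pixels, computed after excluded areas are subtracted. The sketch below is not the `SmartVisualRegressionHelper` named on the slide, whose design is not shown; `ScreenType`, `ThresholdPolicy`, and the numeric limits are all hypothetical values picked for the example.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical screen categories; a real suite would define its own.
enum ScreenType { STATIC, FORM, DYNAMIC_FEED }

final class ThresholdPolicy {
    // Illustrative limits only: maximum allowed share of differing pixels per screen type.
    private static final Map<ScreenType, Double> MAX_DIFF_RATIO = new EnumMap<>(ScreenType.class);
    static {
        MAX_DIFF_RATIO.put(ScreenType.STATIC, 0.001);
        MAX_DIFF_RATIO.put(ScreenType.FORM, 0.01);
        MAX_DIFF_RATIO.put(ScreenType.DYNAMIC_FEED, 0.05);
    }

    // excludedPixels is the total area of masked regions (for example clocks or banners),
    // which must not count toward the ratio.
    static boolean passes(ScreenType type, int diffPixels, int totalPixels, int excludedPixels) {
        int compared = totalPixels - excludedPixels;
        if (compared <= 0) {
            throw new IllegalArgumentException("nothing left to compare after exclusions");
        }
        double ratio = (double) diffPixels / compared;
        return ratio <= MAX_DIFF_RATIO.get(type);
    }
}
```

With these example limits, 50 differing pixels out of 10,000 fail a `STATIC` screen but pass a `DYNAMIC_FEED` screen, which is the point of tuning per type: strict where the design is fixed, lenient where content legitimately varies.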

AI in Visual Testing
Figma API: Design as Source of Truth
WHAT FIGMA API PROVIDES
➤ Export designs as PNG baselines
➤ Screen IDs in frame names
➤ Page-based filtered exports
➤ Smart retry & rate limiting
➤ Design changes auto-detected
INTEGRATION BENEFITS
➤ No manual baseline maintenance
➤ Figma as the source of truth for expected UI
➤ Auto-detect design changes
➤ Filtered export by page/frame
➤ Consistent cross-platform baselines
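Retry with rate limiting for baseline exports can be sketched as a generic exponential backoff wrapper. The example makes no real network call and is not the export script from the talk: `RateLimitedRetry`, `RateLimitedException`, and `Sleeper` are hypothetical, and the assumption is that the export client turns a rate-limit response (for example HTTP 429) into that exception.

```java
import java.util.concurrent.Callable;

final class RateLimitedRetry {

    // Hypothetical signal for "slow down"; a real client would throw it on a rate-limit response.
    static final class RateLimitedException extends Exception {
        RateLimitedException() { super("rate limited"); }
    }

    // Injected so callers (and tests) control how waiting happens.
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    // Calls request up to maxAttempts times, doubling the wait after each rate-limit signal.
    // Any other exception is not retried and propagates immediately.
    static <T> T call(Callable<T> request, int maxAttempts, long baseDelayMs, Sleeper sleeper)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = baseDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return request.call();
            } catch (RateLimitedException e) {
                if (attempt == maxAttempts) {
                    throw e; // out of attempts: surface the failure to the caller
                }
                sleeper.sleep(delay);
                delay *= 2;
            }
        }
    }
}
```

In production the sleeper would be `Thread::sleep`; passing it in keeps the backoff schedule easy to verify without waiting in real time.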

06 Real Impact & Takeaways

Key Results
70% maintenance effort reduced
30% CI/CD feedback speedup
Improved release reliability & developer trust
PROBLEMS RESOLVED
✓ Flaky tests → Stabilized with smart retry & environment isolation
✓ Release blockers → Eliminated through reliable CI/CD pipeline
✓ No capacity for new tests → 6h/week freed by maintenance reduction
✓ Key-person dependency → Resolved with shared framework & documentation

Getting Started: Action Plan
Test Suite Audit Template: evaluate test quality, coverage, and costs
AI Guardrail Checklist: ensure AI-generated code quality, validation processes
Refactoring Starter Kit: step-by-step guide for framework improvement

Key Takeaways
➤ Appium framework refactoring for scalability
➤ Integrating Claude, Copilot, DevinAI into QA
➤ AI failure modes & avoidance strategies
➤ Checklists for reliability and reduced maintenance
➤ Scalable culture balancing automation with judgment

Automation is not about replacing human judgment. It's about giving humans more time to exercise it.

Thank You
Questions & Discussion
Pann Nu Wai
QA Engineer, KINTO Technologies
SeleniumConf Valencia 2026
