No More Writing Test Code: JetBrains AI Assistant Automates Design and Generation of Asynchronous Processing Tests

makun
September 08, 2025

Transcript

  1. DroidKaigi 2025 No More Writing Test Code: JetBrains AI Assistant

    Automates Design and Generation of Asynchronous Processing Tests Masahiro Saito
  2. Masahiro Saito (齊藤 正宏) • pixiv Inc. • 8th-year smartphone app engineer • Engineering recruitment lead • github.com/m4kvn
  3. Agenda
    1. The Pain of Android Testing
    2. Introducing JetBrains AI Assistant
    3. Let's hand off Room(DB) testing entirely to AI
    4. Taking on the challenge of testing WorkManager
    5. The Limits of AI Test Design and Generation and How to Work with Them Effectively
    6. Potential for AI Test Design and Generation
  4. The Ideal Android Test
    "Quality is guaranteed through testing!" "We're releasing with zero bugs!"
    • High test coverage
    • Robust tests resilient to specification changes
    • Test code anyone can write
  5. The Reality of Android Testing
    "Ugh, I don't have time... The test can wait..."
    • When asynchronous processing (coroutines, flows) comes into play, things get complicated all at once.
    • DI, DB, API communication... preparing mocks is just such a hassle.
    • Test code becomes dependent on specific individuals and goes unmaintained.
    Conclusion: Writing tests is time-consuming, labor-intensive work.
  6. // Example: Part of a ViewModel
    fun fetchData() {
        viewModelScope.launch {
            _uiState.value = Loading
            try {
                val data = repository.fetchData() // suspend fun
                val processedData = process(data)
                _uiState.value = Success(processedData)
            } catch (e: Exception) {
                _uiState.value = Error(e)
            }
        }
    }
    Example of a ViewModel function with typical data-fetching logic
  7. Testing asynchronous processing is full of boilerplate
    • Thread replacement
    • Observing asynchronous values
    • Dependency mocking
    Writing this boilerplate by hand every time is a real hassle (a minimal sketch of it follows below). Shouldn't we leave such tedious routine work to AI?
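For reference, a minimal sketch of that boilerplate (not code from the talk), assuming kotlinx-coroutines-test and MockK on the test classpath; MyViewModel, Repository, and the Success state are placeholder names modeled on the ViewModel example above:

    import io.mockk.coEvery
    import io.mockk.mockk
    import kotlinx.coroutines.Dispatchers
    import kotlinx.coroutines.ExperimentalCoroutinesApi
    import kotlinx.coroutines.test.TestDispatcher
    import kotlinx.coroutines.test.UnconfinedTestDispatcher
    import kotlinx.coroutines.test.resetMain
    import kotlinx.coroutines.test.runTest
    import kotlinx.coroutines.test.setMain
    import org.junit.Assert.assertTrue
    import org.junit.Rule
    import org.junit.Test
    import org.junit.rules.TestWatcher
    import org.junit.runner.Description

    // Thread replacement: swap Dispatchers.Main for a test dispatcher around every test.
    @OptIn(ExperimentalCoroutinesApi::class)
    class MainDispatcherRule(
        private val dispatcher: TestDispatcher = UnconfinedTestDispatcher()
    ) : TestWatcher() {
        override fun starting(description: Description) = Dispatchers.setMain(dispatcher)
        override fun finished(description: Description) = Dispatchers.resetMain()
    }

    class MyViewModelTest {
        @get:Rule
        val mainDispatcherRule = MainDispatcherRule()

        // Dependency mocking: stub the repository with MockK instead of wiring real DI/DB/API.
        private val repository: Repository = mockk()

        @Test
        fun fetchData_movesStateToSuccess() = runTest {
            coEvery { repository.fetchData() } returns listOf("item")
            val viewModel = MyViewModel(repository)

            // The coroutine launched in viewModelScope runs eagerly on the test dispatcher.
            viewModel.fetchData()

            // Observing asynchronous values: assert on the state produced by the coroutine.
            assertTrue(viewModel.uiState.value is Success)
        }
    }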
  8. JetBrains AI Assistant
    • An AI assistant that integrates into JetBrains IDEs such as IntelliJ IDEA and Android Studio.
    • Give it natural-language instructions in a chat format and it understands the code context, suggesting test code and refactorings.
    As a developer's "pair programmer," it provides powerful support for daily routine work, research, and design.
  9. Gemini • Gemini is available in Android Studio. • Gemini

    and JetBrains AI Assistant are different. • We will not discuss the differences in this session.
  10. Key evaluation criteria
    Accuracy: Is the implementation correct against the specification, with few false positives/false negatives?
    Coverage: Statement coverage and branch coverage; are the critical paths exercised?
    Reproducibility: Is the test stable outside the local environment, e.g. on CI?
    Maintainability: Is the intent clear, and is the cost of adapting to specification changes low?
    Speed/Cost: Are CI execution time, review time, and similar costs kept low?
    Based on these evaluation criteria, decide what to entrust to AI.
  11. @Dao
    interface TaskDao {
        @Query("SELECT * FROM tasks")
        fun getTasks(): Flow<List<Task>>

        @Insert(onConflict = OnConflictStrategy.REPLACE)
        suspend fun insertTask(task: Task)

        @Update
        suspend fun updateTask(task: Task)

        @Query("DELETE FROM tasks WHERE id = :taskId")
        suspend fun deleteTaskById(taskId: String)
    }
    TaskDao.kt
  12. Please create unit tests for TaskDao. The tests you create must meet the following requirements.
    ・Use an in-memory database
    ・Initialize the database in @Before and close it in @After
    ・Tests use `runTest` from kotlinx-coroutines-test
    Test cases must cover the following.
    ・Tasks inserted using insertTask can be correctly retrieved using getTasks
    ・The content updated via updateTask is correctly reflected
    ・Tasks deleted using deleteTaskById disappear from the list
    Prompt for generating TaskDao tests
  13. @RunWith(AndroidJUnit4::class)
    class TaskDaoTest {
        private lateinit var database: TaskDatabase
        private lateinit var taskDao: TaskDao

        @Before
        fun setup() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }

        @After
        fun teardown() {
            database.close()
        }
    TaskDaoTest.kt
  14. @Test
    fun insertTask_getsTasks_returnsInsertedTask() = runTest {
        // Given
        val task = Task(id = "task1", title = "Test Task")

        // When
        taskDao.insertTask(task)
        val tasks = taskDao.getTasks().first()

        // Then
        assertEquals(1, tasks.size)
        assertEquals(task, tasks[0])
    }
    TaskDaoTest.kt
  15. @Test fun insertTask_getsTasks_returnsInsertedTask()
    @Test fun updateTask_getsTasks_returnsUpdatedTask()
    @Test fun deleteTaskById_getsTasks_returnsEmptyList()
    @Test fun multipleInsertTasks_getsTasks_returnsAllTasks()
    @Test fun insertDuplicateTaskId_getsTasks_replacesOldTask()
    TaskDaoTest.kt
  16. @Test
    fun multipleInsertTasks_getsTasks_returnsAllTasks() = runTest {
        ...
        // Then
        assertEquals(3, tasks.size)
        assertTrue(tasks.contains(task1))
        assertTrue(tasks.contains(task2))
        assertTrue(tasks.contains(task3))
    }
    TaskDaoTest.kt
  17. 【Practice 1】Key Points of the AI-Generated Code
    • The "Setup → Operation → Verification" flow is clearly laid out.
    • Covers the happy paths of basic CRUD operations.
  18. // Given
    val task1 = Task(id = "task1", title = "Task 1")
    val task2 = Task(id = "task2", title = "Task 2")
    val task3 = Task(id = "task3", title = "Task 3")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2)
    taskDao.insertTask(task3)
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(3, tasks.size)
    assertTrue(tasks.contains(task1))
    assertTrue(tasks.contains(task2))
    assertTrue(tasks.contains(task3))
    TaskDaoTest.kt
  19. // Given
    val task1 = Task(id = "same_id", title = "Old Task")
    val task2 = Task(id = "same_id", title = "New Task")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2) // Overwrite with the same ID
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(1, tasks.size)
    assertEquals(task2, tasks[0])
    assertEquals("New Task", tasks[0].title)
    TaskDaoTest.kt
  20. Accuracy Evaluation (Good) • We have clearly verified the normal

    flow of basic CRUD operations. • @Before/@After setup and cleanup eliminate side effects between tests. • Using runTest, we can accurately verify asynchronous processing as well.
  21. // Given
    val task1 = Task(id = "task1", title = "Task 1")
    val task2 = Task(id = "task2", title = "Task 2")
    val task3 = Task(id = "task3", title = "Task 3")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2)
    taskDao.insertTask(task3)
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(3, tasks.size)
    assertTrue(tasks.contains(task1))
    assertTrue(tasks.contains(task2))
    assertTrue(tasks.contains(task3))
    TaskDaoTest.kt — depends on the implementation of equals/hashCode
  22. Accuracy Evaluation (Bad)
    • In the delete and update tests, the lack of field-level assertions and of explicit checks on record counts and side effects leaves room for false positives (a tightened example follows below).
    • Behavior on exceptions or constraint violations (e.g., null in required fields) is not covered.
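As one illustration (not from the deck), a deletion test tightened with an explicit remaining count and field-level assertions could be added to TaskDaoTest like this:

    @Test
    fun deleteTaskById_removesOnlyThatTask() = runTest {
        // Given: two rows, so the deletion can be checked against an explicit remaining count
        val keep = Task(id = "keep", title = "Keep me")
        val remove = Task(id = "remove", title = "Remove me")
        taskDao.insertTask(keep)
        taskDao.insertTask(remove)

        // When
        taskDao.deleteTaskById("remove")
        val tasks = taskDao.getTasks().first()

        // Then: explicit record count plus field-level assertions on the surviving row
        assertEquals(1, tasks.size)
        assertEquals("keep", tasks[0].id)
        assertEquals("Keep me", tasks[0].title)
    }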
  23. Summary of Accuracy Evaluation
    • The basic CRUD happy path is handled with sufficient accuracy.
    • More rigorous verification and coverage of abnormal cases are still required.
      ◦ Field-level assertions
      ◦ Explicit verification of record counts and side effects
  24. Coverage Evaluation (Good)
    • Basic CRUD happy-path cases (full statement coverage) are sufficiently covered.
      ◦ Inserting new tasks and retrieving multiple tasks
      ◦ REPLACE (overwrite) behavior for the same ID
    • A set of test cases exists to verify that common everyday operations work as expected.
  25. // Given
    val task1 = Task(id = "task1", title = "Task 1")
    val task2 = Task(id = "task2", title = "Task 2")
    val task3 = Task(id = "task3", title = "Task 3")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2)
    taskDao.insertTask(task3)
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(3, tasks.size)
    assertTrue(tasks.contains(task1))
    assertTrue(tasks.contains(task2))
    assertTrue(tasks.contains(task3))
    TaskDaoTest.kt
  26. Unverified branches and edge cases (the first two are sketched below)
    • Retrieval from an empty database
    • Behavior of update or delete operations on non-existent IDs
    • Guarantees on retrieval order when inserting multiple rows
    • The sequence and content of emissions during consecutive operations such as insert → update → delete
    • Exception handling for abnormal cases, such as null required fields or constraint violations
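For example, the first two of these gaps could be closed with tests along these lines (illustrative additions to TaskDaoTest, not from the deck):

    @Test
    fun getTasks_onEmptyDatabase_returnsEmptyList() = runTest {
        // No inserts: the initial query result should be empty
        assertTrue(taskDao.getTasks().first().isEmpty())
    }

    @Test
    fun deleteTaskById_withUnknownId_leavesExistingRowsUntouched() = runTest {
        // Given
        val task = Task(id = "task1", title = "Task 1")
        taskDao.insertTask(task)

        // When: deleting an ID that does not exist
        taskDao.deleteTaskById("does-not-exist")

        // Then: the existing row is unaffected
        assertEquals(listOf(task), taskDao.getTasks().first())
    }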
  27. Summary of Coverage Evaluation
    • Basic CRUD operations (full statement coverage) are sufficiently covered, which is perfectly adequate for everyday use.
    • Improving robustness requires additional verification of branches and abnormal cases.
  28. Reproducibility Evaluation (Good) • By utilizing an in-memory database and

    runTest, external factors are minimized, ensuring basic reproducibility. • The same input and state will yield the same result.
  29. @RunWith(AndroidJUnit4::class)
    class TaskDaoTest {
        private lateinit var database: TaskDatabase
        private lateinit var taskDao: TaskDao

        @Before
        fun setup() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }
    TaskDaoTest.kt — dependent on emulators or physical devices
  30. // Given
    val task1 = Task(id = "task1", title = "Task 1")
    val task2 = Task(id = "task2", title = "Task 2")
    val task3 = Task(id = "task3", title = "Task 3")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2)
    taskDao.insertTask(task3)
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(3, tasks.size)
    assertTrue(tasks.contains(task1))
    assertTrue(tasks.contains(task2))
    assertTrue(tasks.contains(task3))
    TaskDaoTest.kt — strict verification of the sequence and timing of consecutive operations makes reproducibility fluctuate
  31. Summary of Reproducibility Evaluation
    • Verification of individual basic CRUD operations has sufficient reproducibility.
    • Where complex branching, sequences of events, or the elimination of environmental dependencies are required, additional measures such as revisiting the test runner and verification approach are needed.
  32. Maintainability Evaluation (Good)
    • The test code is simple and easy to read, with clear separation of responsibilities.
    • Setup and cleanup via @Before/@After are appropriate, with minimal side effects.
    • Test case naming in Given-When-Then (GWT) style makes the intent clear.
  33. @RunWith(AndroidJUnit4::class)
    class TaskDaoTest {
        private lateinit var database: TaskDatabase
        private lateinit var taskDao: TaskDao

        @Before
        fun setup() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }
    TaskDaoTest.kt — the DB/DAO creation code is prone to duplication (one way to share it follows below)
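One possible way to share that setup (an illustration, not from the deck) is a reusable JUnit rule; the class name is hypothetical:

    import androidx.room.Room
    import androidx.test.core.app.ApplicationProvider
    import org.junit.rules.ExternalResource

    // Creates the in-memory database before each test and closes it afterwards,
    // so individual test classes no longer repeat the @Before/@After code.
    class InMemoryTaskDatabaseRule : ExternalResource() {
        lateinit var database: TaskDatabase
            private set
        lateinit var taskDao: TaskDao
            private set

        override fun before() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }

        override fun after() {
            database.close()
        }
    }

    // Usage in a test class:
    // @get:Rule val db = InMemoryTaskDatabaseRule()
    // ... db.taskDao.insertTask(task)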
  34. Maintainability Evaluation (Bad)
    • The lack of a shared convention for retrieval-order assumptions and for how Flows are verified may lead to inconsistent testing approaches over time.
    • We want to standardize testing styles within the project, for example by specifying ORDER BY explicitly in DAO queries to make ordering assumptions clear, and by unifying Flow verification with Turbine (sketched below).
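A rough sketch of those two conventions (fragments of TaskDao/TaskDaoTest, not code from the talk), assuming the Turbine library (app.cash.turbine.test):

    // DAO side: make the ordering assumption explicit in the query.
    @Query("SELECT * FROM tasks ORDER BY id ASC")
    fun getTasks(): Flow<List<Task>>

    // Test side: verify Flow emissions uniformly with Turbine.
    @Test
    fun getTasks_emitsInsertedTask() = runTest {
        taskDao.getTasks().test {
            assertTrue(awaitItem().isEmpty())               // initial emission: empty table
            taskDao.insertTask(Task(id = "task1", title = "Task 1"))
            assertEquals("task1", awaitItem().single().id)  // re-emission after the insert
            cancelAndIgnoreRemainingEvents()
        }
    }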
  35. Summary of Maintainability Evaluation
    • Meets a reasonable standard of maintainability.
    • Promoting shared helpers and standardization, in preparation for more test cases and specification changes, would raise maintainability further.
  36. Speed/Cost Evaluation (Good) • Test execution is accelerated by utilizing

    an in-memory database • Low description and execution cost per case • AI generation keeps initial creation costs low
  37. Speed/Cost Evaluation (Bad)
    • As test cases grow, code duplication can become a cost driver.
    • Device-dependent test runners can increase execution time and CI costs.
    • Robolectric, parameterized tests, and shared helpers are needed to keep speed and cost in balance (see the sketch below).
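For instance (illustrative only), the DAO tests could run as plain JVM tests by switching to the Robolectric runner, removing the emulator/device dependency noted earlier; imports are the same as TaskDaoTest plus org.robolectric.RobolectricTestRunner:

    // Same in-memory setup as TaskDaoTest, executed locally via Robolectric.
    @RunWith(RobolectricTestRunner::class)
    class TaskDaoLocalTest {
        private lateinit var database: TaskDatabase
        private lateinit var taskDao: TaskDao

        @Before
        fun setup() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }

        @After
        fun teardown() = database.close()

        @Test
        fun insertTask_getsTasks_returnsInsertedTask() = runTest {
            val task = Task(id = "task1", title = "Test Task")
            taskDao.insertTask(task)
            assertEquals(listOf(task), taskDao.getTasks().first())
        }
    }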
  38. Summary of Speed/Cost Evaluation • It is highly efficient, and

    given the current scale and content, the development and CI burden is minimal. • To prepare for future expansion and increased complexity, it is necessary to incorporate measures for standardization and efficiency.
  39. Is it acceptable to leave “all” test generation for CRUD

    operations to AI? For basic normal operations and standard patterns, it's perfectly fine to leave it to AI. However, at present, our supplementation and review are still necessary.
  40. class SyncWorker(
        appContext: Context,
        params: WorkerParameters,
        private val api: TaskApi, // Retrofit API
        private val dao: TaskDao  // Room DAO
    ) : CoroutineWorker(appContext, params) {
        override suspend fun doWork(): Result {
            return try {
                val tasks = api.fetchTasks()          // Fetch from the network
                tasks.forEach { dao.insertTask(it) }  // Insert into the DB
                Result.success()
            } catch (e: Exception) {
                Result.failure()
            }
        }
    }
    SyncWorker.kt
  41. Please create unit tests for SyncWorker. The tests you create must meet the following requirements.
    ・Use TestListenableWorkerBuilder
    ・TaskApi and TaskDao, which are dependencies, are mocked using MockK.
    Test cases should cover the following.
    Success case testing
    ・If api.fetchTasks() returns a task list, return Result.success
    ・dao.insertTask() is called
    Failure case testing
    ・If api.fetchTasks() throws an exception, return Result.failure()
    Prompt for generating SyncWorker tests
  42. @RunWith(RobolectricTestRunner::class)
    class SyncWorkerTest {
        private lateinit var context: Context
        private lateinit var taskApi: TaskApi
        private lateinit var taskDao: TaskDao

        @Before
        fun setUp() {
            context = ApplicationProvider.getApplicationContext()
            taskApi = mockk()
            taskDao = mockk(relaxUnitFun = true)
        }
    SyncWorkerTest.kt
  43. @Test
    fun givenApiReturnsTasks...() = runTest {
        // Given
        val tasks = listOf(...)
        coEvery { taskApi.fetchTasks() } returns tasks
        val worker = ...

        // When
        val result = worker.doWork()

        // Then
        assertEquals(ListenableWorker.Result.success(), result)
        tasks.forEach { task ->
            coVerify { taskDao.insertTask(task) }
        }
    }
    SyncWorkerTest.kt
  44. 【Practice 2】Key Points of the AI-Generated Code
    • The "Setup → Operation → Verification" flow is clearly laid out.
    • Covers the major branches (success/failure) thoroughly.
    • Makes proper use of testing support tools such as MockK, Robolectric, and TestListenableWorkerBuilder.
  45. @RunWith(RobolectricTestRunner::class)
    class SyncWorkerTest {
        private lateinit var context: Context
        private lateinit var taskApi: TaskApi
        private lateinit var taskDao: TaskDao

        @Before
        fun setUp() {
            context = ApplicationProvider.getApplicationContext()
            taskApi = mockk()
            taskDao = mockk(relaxUnitFun = true)
        }
    SyncWorkerTest.kt
  46. @Test
    fun givenApiReturnsTasks_... = runTest {
        // Given
        ...
        val worker = TestListenableWorkerBuilder<SyncWorker>(context)
            .setWorkerFactory { appContext, params ->
                SyncWorker(
                    appContext = appContext,
                    params = params,
                    api = taskApi,
                    dao = taskDao
                )
            }
            .build()
        ...
    }
    SyncWorkerTest.kt
  47. @Test
    fun givenApiThrowsException_...() = runTest {
        // Given
        coEvery { taskApi.fetchTasks() } throws Exception("...")

        // When
        val worker = ...
        val result = worker.doWork()

        // Then
        assertEquals(ListenableWorker.Result.failure(), result)
        coVerify(exactly = 0) { taskDao.insertTask(any()) }
    }
    SyncWorkerTest.kt
  48. @Test
    fun givenApiReturnsTasks_... = runTest {
        // Given
        ...
        val worker = TestListenableWorkerBuilder<SyncWorker>(context)
            .setWorkerFactory { appContext, params ->
                SyncWorker(
                    appContext = appContext,
                    params = params,
                    api = taskApi,
                    dao = taskDao
                )
            }
            .build()
        ...
    }
    SyncWorkerTest.kt — setWorkerFactory should be passed a subclass of WorkerFactory (a corrected sketch follows below)
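A corrected construction along the lines of that note might look like this (a sketch of a SyncWorkerTest fragment): the setWorkerFactory method of androidx.work.testing's TestListenableWorkerBuilder takes a WorkerFactory instance, so the mocked dependencies are injected through a WorkerFactory subclass rather than a lambda:

    // Build the worker through a WorkerFactory subclass that injects the mocks.
    val workerFactory = object : WorkerFactory() {
        override fun createWorker(
            appContext: Context,
            workerClassName: String,
            workerParameters: WorkerParameters
        ): ListenableWorker = SyncWorker(
            appContext = appContext,
            params = workerParameters,
            api = taskApi,
            dao = taskDao
        )
    }

    val worker = TestListenableWorkerBuilder<SyncWorker>(context)
        .setWorkerFactory(workerFactory)
        .build()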
  49. Accuracy Evaluation
    • The main branches (success/failure) are verified accurately.
    • The asynchronous and mocking tools (MockK, runTest) are used correctly.
    • There is an error in the usage of setWorkerFactory that needs to be corrected.
    Overall it is sufficiently accurate, but attention to fine-grained API usage is still required.
  50. Coverage Evaluation
    • Covers the major branches of success and failure, giving sufficient basic coverage.
    • Boundary values, error cases, and repeated executions remain unverified.
    • The exact number of calls and per-item arguments are not verified.
    To withstand future specification changes and improve bug detection, additional cases are required.
  51. Reproducibility Evaluation
    • External dependencies are eliminated with Robolectric and MockK, enabling stable test execution.
    • Implementation errors or incorrect API usage carry the risk of unintended behavior.
    Basic reproducibility is ensured, but caution is required in how the test is implemented.
  52. Maintainability Evaluation
    • Given-When-Then (GWT) naming and structure make the intent easy to convey.
    • Shared setup reduces duplication.
    • Maintainability could be improved further by standardizing the WorkerFactory and introducing a test data builder (sketched below).
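As an illustration (not from the deck), a minimal test data builder for Task could be a factory function with defaults, so tests spell out only the fields they care about:

    import java.util.UUID

    // Hypothetical helper for TaskDao/SyncWorker tests.
    fun task(
        id: String = UUID.randomUUID().toString(),
        title: String = "Task"
    ): Task = Task(id = id, title = title)

    // Usage:
    // val overdue = task(title = "Overdue task")
    // val fixedId = task(id = "task1")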
  53. Speed/Cost Evaluation • Local tests and mocks enable extremely fast

    execution speeds. • Low code volume and execution cost, with minimal CI load • When expanding in the future, efforts to standardize and streamline will be effective.
  54. Is it acceptable to leave “all” test generation for WorkManager to AI?
    AI can handle generating templates that cover the basic happy paths, the major branches, and best practices well enough. However, because of subtle implementation differences and considerations of scalability and maintainability, our supplementation and review remain indispensable for now.
  55. Limitations of AI-Based Test Design • Simple cases like Room's

    CRUD tests can achieve sufficient quality through AI generation. • AI-generated code may still contain errors or maintainability issues in complex tests involving dependency injection and asynchronous processing, such as those using WorkManager. • AI excels at generating templates within specified parameters, but struggles with covering boundary values and abnormal cases, as well as clarifying design intent.
  56. Limitations of AI in Test Design and Generation While delegating

    "routine tasks and template generation" to AI enables significant efficiency gains, essential aspects like "quality assurance, scalability, and clarifying design intent" still require our design, review, and supplementation.
  57. Current Impressions of AI Test Design and Generation • AI

    excels at generating templates and automating routine CRUD tests. • Complex dependencies, asynchronous processing, and comprehensive coverage of edge cases and abnormal conditions are difficult to achieve with AI alone.
  58. Expectations for AI Test Design and Generation • The key

    to future evolution lies in enabling AI to understand the ambiguity in test design intent and specifications. • A future is anticipated where systems can learn from past bug histories and failure patterns to propose optimal test designs. • Automatic correction of test code and automatic adaptation to specification changes are also possibilities for the future. • AI's ability to explain the rationale and coverage of tests enhances their reliability.
  59. The Potential for Evolution in AI Test Design and Generation

    There is potential for evolution from "mere template generation" to "autonomous test design that delves into design intent and quality assurance." To achieve this, a "collaborative" approach—where AI and humans engage in dialogue, leveraging each other's strengths to advance test design—will become increasingly crucial.
  60. Finally
    • For now, a division of labor between AI and us remains the most realistic approach, but a future where AI autonomously leads test design while we handle reviews and final decisions is entirely plausible.
    • We aim to leverage the evolution of AI to achieve higher-quality, more efficient test design.