No More Writing Test Code: JetBrains AI Assistant Automates Design and Generation of Asynchronous Processing Tests

makun
September 08, 2025

Transcript

  1. DroidKaigi 2025 No More Writing Test Code: JetBrains AI Assistant

    Automates Design and Generation of Asynchronous Processing Tests Masahiro Saito
  2. Masahiro Saito (齊藤 正宏) • pixiv Inc. • 8th-year smartphone app engineer • Engineering recruitment lead • github.com/m4kvn
  3. Agenda
    1. The Pain of Android Testing
    2. Introducing JetBrains AI Assistant
    3. Let's hand off Room(DB) testing entirely to AI
    4. Taking on the challenge of testing WorkManager
    5. The Limits of AI Test Design and Generation and How to Work with Them Effectively
    6. Potential for AI Test Design and Generation
  4. The Ideal Android Test
    "Quality is guaranteed through testing!" "We're releasing with zero bugs!"
    • High test coverage
    • Robust tests resilient to specification changes
    • Test code anyone can write
  5. The Reality of Android Testing
    "Ugh, I don't have time... The test can wait..."
    • When asynchronous processing (coroutines, flows) comes into play, things get complicated all at once.
    • DI, DB, API communication... preparing mocks is just such a hassle.
    • Test code becomes dependent on specific individuals and goes unmaintained.
    Conclusion: Writing tests is time-consuming, labor-intensive work.
  6. // Example: Part of a ViewModel
    fun fetchData() {
        viewModelScope.launch {
            _uiState.value = Loading
            try {
                val data = repository.fetchData() // suspend fun
                val processedData = process(data)
                _uiState.value = Success(processedData)
            } catch (e: Exception) {
                _uiState.value = Error(e)
            }
        }
    }
    Example of a ViewModel function with typical data-fetching logic
  7. Testing asynchronous processing is full of boilerplate
    • Thread replacement
    • Observing asynchronous values
    • Dependency mocking
    Writing this boilerplate by hand every time is a real hassle (a minimal sketch of it follows below). Shouldn't we leave such tedious routine work to AI?
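For reference, a minimal sketch of that boilerplate (not code from the talk), assuming kotlinx-coroutines-test and MockK on the test classpath; MyViewModel, Repository, and the Success state are placeholder names modeled on the ViewModel example above:

    import io.mockk.coEvery
    import io.mockk.mockk
    import kotlinx.coroutines.Dispatchers
    import kotlinx.coroutines.ExperimentalCoroutinesApi
    import kotlinx.coroutines.test.TestDispatcher
    import kotlinx.coroutines.test.UnconfinedTestDispatcher
    import kotlinx.coroutines.test.resetMain
    import kotlinx.coroutines.test.runTest
    import kotlinx.coroutines.test.setMain
    import org.junit.Assert.assertTrue
    import org.junit.Rule
    import org.junit.Test
    import org.junit.rules.TestWatcher
    import org.junit.runner.Description

    // Thread replacement: swap Dispatchers.Main for a test dispatcher around every test.
    @OptIn(ExperimentalCoroutinesApi::class)
    class MainDispatcherRule(
        private val dispatcher: TestDispatcher = UnconfinedTestDispatcher()
    ) : TestWatcher() {
        override fun starting(description: Description) = Dispatchers.setMain(dispatcher)
        override fun finished(description: Description) = Dispatchers.resetMain()
    }

    class MyViewModelTest {
        @get:Rule
        val mainDispatcherRule = MainDispatcherRule()

        // Dependency mocking: stub the repository with MockK instead of wiring real DI/DB/API.
        private val repository: Repository = mockk()

        @Test
        fun fetchData_movesStateToSuccess() = runTest {
            coEvery { repository.fetchData() } returns listOf("item")
            val viewModel = MyViewModel(repository)

            // The coroutine launched in viewModelScope runs eagerly on the test dispatcher.
            viewModel.fetchData()

            // Observing asynchronous values: assert on the state produced by the coroutine.
            assertTrue(viewModel.uiState.value is Success)
        }
    }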
  8. JetBrains AI Assistant
    • An AI assistant that integrates into JetBrains IDEs such as IntelliJ IDEA and Android Studio.
    • Give it natural-language instructions in a chat format and it understands the code context, suggesting test code and refactorings.
    As a developer's "pair programmer," it provides powerful support for daily routine work, research, and design.
  9. Gemini • Gemini is available in Android Studio. • Gemini

    and JetBrains AI Assistant are different. • We will not discuss the differences in this session.
  10. Key evaluation criteria
    Accuracy: Is the implementation correct against the specification, with few false positives/false negatives?
    Coverage: Statement coverage and branch coverage; are the critical paths exercised?
    Reproducibility: Is the test stable outside the local environment, e.g. on CI?
    Maintainability: Is the intent clear, and is the cost of adapting to specification changes low?
    Speed/Cost: Are CI execution time, review time, and similar costs kept low?
    Based on these evaluation criteria, decide what to entrust to AI.
  11. @Dao
    interface TaskDao {
        @Query("SELECT * FROM tasks")
        fun getTasks(): Flow<List<Task>>

        @Insert(onConflict = OnConflictStrategy.REPLACE)
        suspend fun insertTask(task: Task)

        @Update
        suspend fun updateTask(task: Task)

        @Query("DELETE FROM tasks WHERE id = :taskId")
        suspend fun deleteTaskById(taskId: String)
    }
    TaskDao.kt
  12. Please create unit tests for TaskDao. The tests you create must meet the following requirements.
    ・Use an in-memory database
    ・Initialize the database in @Before and close it in @After
    ・Tests use `runTest` from kotlinx-coroutines-test
    Test cases must cover the following.
    ・Tasks inserted using insertTask can be correctly retrieved using getTasks
    ・The content updated via updateTask is correctly reflected
    ・Tasks deleted using deleteTaskById disappear from the list
    Prompt for generating TaskDao tests
  13. @RunWith(AndroidJUnit4::class)
    class TaskDaoTest {
        private lateinit var database: TaskDatabase
        private lateinit var taskDao: TaskDao

        @Before
        fun setup() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }

        @After
        fun teardown() {
            database.close()
        }
    TaskDaoTest.kt
  14. @Test
    fun insertTask_getsTasks_returnsInsertedTask() = runTest {
        // Given
        val task = Task(id = "task1", title = "Test Task")

        // When
        taskDao.insertTask(task)
        val tasks = taskDao.getTasks().first()

        // Then
        assertEquals(1, tasks.size)
        assertEquals(task, tasks[0])
    }
    TaskDaoTest.kt
  15. @Test fun insertTask_getsTasks_returnsInsertedTask()
    @Test fun updateTask_getsTasks_returnsUpdatedTask()
    @Test fun deleteTaskById_getsTasks_returnsEmptyList()
    @Test fun multipleInsertTasks_getsTasks_returnsAllTasks()
    @Test fun insertDuplicateTaskId_getsTasks_replacesOldTask()
    TaskDaoTest.kt
  16. @Test
    fun multipleInsertTasks_getsTasks_returnsAllTasks() = runTest {
        ...
        // Then
        assertEquals(3, tasks.size)
        assertTrue(tasks.contains(task1))
        assertTrue(tasks.contains(task2))
        assertTrue(tasks.contains(task3))
    }
    TaskDaoTest.kt
  17. 【Practice 1】Key Points of the AI-Generated Code
    • The "Setup → Operation → Verification" flow is clearly laid out.
    • Covers the happy paths of basic CRUD operations.
  18. // Given
    val task1 = Task(id = "task1", title = "Task 1")
    val task2 = Task(id = "task2", title = "Task 2")
    val task3 = Task(id = "task3", title = "Task 3")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2)
    taskDao.insertTask(task3)
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(3, tasks.size)
    assertTrue(tasks.contains(task1))
    assertTrue(tasks.contains(task2))
    assertTrue(tasks.contains(task3))
    TaskDaoTest.kt
  19. // Given
    val task1 = Task(id = "same_id", title = "Old Task")
    val task2 = Task(id = "same_id", title = "New Task")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2) // Overwrite with the same ID
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(1, tasks.size)
    assertEquals(task2, tasks[0])
    assertEquals("New Task", tasks[0].title)
    TaskDaoTest.kt
  20. Accuracy Evaluation (Good) • We have clearly verified the normal

    flow of basic CRUD operations. • @Before/@After setup and cleanup eliminate side effects between tests. • Using runTest, we can accurately verify asynchronous processing as well.
  21. // Given
    val task1 = Task(id = "task1", title = "Task 1")
    val task2 = Task(id = "task2", title = "Task 2")
    val task3 = Task(id = "task3", title = "Task 3")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2)
    taskDao.insertTask(task3)
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(3, tasks.size)
    assertTrue(tasks.contains(task1))
    assertTrue(tasks.contains(task2))
    assertTrue(tasks.contains(task3))
    TaskDaoTest.kt — depends on the implementation of equals/hashCode
  22. Accuracy Evaluation (Bad)
    • In the delete and update tests, the lack of field-level assertions and of explicit checks on record counts and side effects leaves room for false positives (a tightened example follows below).
    • Behavior on exceptions or constraint violations (e.g., null in required fields) is not covered.
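As one illustration (not from the deck), a deletion test tightened with an explicit remaining count and field-level assertions could be added to TaskDaoTest like this:

    @Test
    fun deleteTaskById_removesOnlyThatTask() = runTest {
        // Given: two rows, so the deletion can be checked against an explicit remaining count
        val keep = Task(id = "keep", title = "Keep me")
        val remove = Task(id = "remove", title = "Remove me")
        taskDao.insertTask(keep)
        taskDao.insertTask(remove)

        // When
        taskDao.deleteTaskById("remove")
        val tasks = taskDao.getTasks().first()

        // Then: explicit record count plus field-level assertions on the surviving row
        assertEquals(1, tasks.size)
        assertEquals("keep", tasks[0].id)
        assertEquals("Keep me", tasks[0].title)
    }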
  23. Summary of Accuracy Evaluation
    • The basic CRUD happy path is handled with sufficient accuracy.
    • More rigorous verification and coverage of abnormal cases are still required.
      ◦ Field-level assertions
      ◦ Explicit verification of record counts and side effects
  24. Coverage Evaluation (Good)
    • Basic CRUD happy-path cases (full statement coverage) are sufficiently covered.
      ◦ Inserting new tasks and retrieving multiple tasks
      ◦ REPLACE (overwrite) behavior for the same ID
    • A set of test cases exists to verify that common everyday operations work as expected.
  25. // Given
    val task1 = Task(id = "task1", title = "Task 1")
    val task2 = Task(id = "task2", title = "Task 2")
    val task3 = Task(id = "task3", title = "Task 3")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2)
    taskDao.insertTask(task3)
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(3, tasks.size)
    assertTrue(tasks.contains(task1))
    assertTrue(tasks.contains(task2))
    assertTrue(tasks.contains(task3))
    TaskDaoTest.kt
  26. Unverified branches and edge cases (the first two are sketched below)
    • Retrieval from an empty database
    • Behavior of update or delete operations on non-existent IDs
    • Guarantees on retrieval order when inserting multiple rows
    • The sequence and content of emissions during consecutive operations such as insert → update → delete
    • Exception handling for abnormal cases, such as null required fields or constraint violations
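For example, the first two of these gaps could be closed with tests along these lines (illustrative additions to TaskDaoTest, not from the deck):

    @Test
    fun getTasks_onEmptyDatabase_returnsEmptyList() = runTest {
        // No inserts: the initial query result should be empty
        assertTrue(taskDao.getTasks().first().isEmpty())
    }

    @Test
    fun deleteTaskById_withUnknownId_leavesExistingRowsUntouched() = runTest {
        // Given
        val task = Task(id = "task1", title = "Task 1")
        taskDao.insertTask(task)

        // When: deleting an ID that does not exist
        taskDao.deleteTaskById("does-not-exist")

        // Then: the existing row is unaffected
        assertEquals(listOf(task), taskDao.getTasks().first())
    }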
  27. Summary of Coverage Evaluation
    • Basic CRUD operations (full statement coverage) are sufficiently covered, which is perfectly adequate for everyday use.
    • Improving robustness requires additional verification of branches and abnormal cases.
  28. Reproducibility Evaluation (Good) • By utilizing an in-memory database and

    runTest, external factors are minimized, ensuring basic reproducibility. • The same input and state will yield the same result.
  29. @RunWith(AndroidJUnit4::class)
    class TaskDaoTest {
        private lateinit var database: TaskDatabase
        private lateinit var taskDao: TaskDao

        @Before
        fun setup() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }
    TaskDaoTest.kt — dependent on emulators or physical devices
  30. // Given
    val task1 = Task(id = "task1", title = "Task 1")
    val task2 = Task(id = "task2", title = "Task 2")
    val task3 = Task(id = "task3", title = "Task 3")

    // When
    taskDao.insertTask(task1)
    taskDao.insertTask(task2)
    taskDao.insertTask(task3)
    val tasks = taskDao.getTasks().first()

    // Then
    assertEquals(3, tasks.size)
    assertTrue(tasks.contains(task1))
    assertTrue(tasks.contains(task2))
    assertTrue(tasks.contains(task3))
    TaskDaoTest.kt — strict verification of the sequence and timing of consecutive operations makes reproducibility fluctuate
  31. Summary of Reproducibility Evaluation
    • Verification of individual basic CRUD operations has sufficient reproducibility.
    • Where complex branching, sequences of events, or the elimination of environmental dependencies are required, additional measures such as revisiting the test runner and verification approach are needed.
  32. Maintainability Evaluation (Good)
    • The test code is simple and easy to read, with clear separation of responsibilities.
    • Setup and cleanup via @Before/@After are appropriate, with minimal side effects.
    • Test case naming in Given-When-Then (GWT) style makes the intent clear.
  33. @RunWith(AndroidJUnit4::class)
    class TaskDaoTest {
        private lateinit var database: TaskDatabase
        private lateinit var taskDao: TaskDao

        @Before
        fun setup() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }
    TaskDaoTest.kt — the DB/DAO creation code is prone to duplication (one way to share it follows below)
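One possible way to share that setup (an illustration, not from the deck) is a reusable JUnit rule; the class name is hypothetical:

    import androidx.room.Room
    import androidx.test.core.app.ApplicationProvider
    import org.junit.rules.ExternalResource

    // Creates the in-memory database before each test and closes it afterwards,
    // so individual test classes no longer repeat the @Before/@After code.
    class InMemoryTaskDatabaseRule : ExternalResource() {
        lateinit var database: TaskDatabase
            private set
        lateinit var taskDao: TaskDao
            private set

        override fun before() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }

        override fun after() {
            database.close()
        }
    }

    // Usage in a test class:
    // @get:Rule val db = InMemoryTaskDatabaseRule()
    // ... db.taskDao.insertTask(task)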
  34. Maintainability Evaluation (Bad)
    • The lack of a shared convention for retrieval-order assumptions and for how Flows are verified may lead to inconsistent testing approaches over time.
    • We want to standardize testing styles within the project, for example by specifying ORDER BY explicitly in DAO queries to make ordering assumptions clear, and by unifying Flow verification with Turbine (sketched below).
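A rough sketch of those two conventions (fragments of TaskDao/TaskDaoTest, not code from the talk), assuming the Turbine library (app.cash.turbine.test):

    // DAO side: make the ordering assumption explicit in the query.
    @Query("SELECT * FROM tasks ORDER BY id ASC")
    fun getTasks(): Flow<List<Task>>

    // Test side: verify Flow emissions uniformly with Turbine.
    @Test
    fun getTasks_emitsInsertedTask() = runTest {
        taskDao.getTasks().test {
            assertTrue(awaitItem().isEmpty())               // initial emission: empty table
            taskDao.insertTask(Task(id = "task1", title = "Task 1"))
            assertEquals("task1", awaitItem().single().id)  // re-emission after the insert
            cancelAndIgnoreRemainingEvents()
        }
    }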
  35. Summary of Maintainability Evaluation
    • Meets a reasonable standard of maintainability.
    • Promoting shared helpers and standardization, in preparation for more test cases and specification changes, would raise maintainability further.
  36. Speed/Cost Evaluation (Good) • Test execution is accelerated by utilizing

    an in-memory database • Low description and execution cost per case • AI generation keeps initial creation costs low
  37. Speed/Cost Evaluation (Bad)
    • As test cases grow, code duplication can become a cost driver.
    • Device-dependent test runners can increase execution time and CI costs.
    • Robolectric, parameterized tests, and shared helpers are needed to keep speed and cost in balance (see the sketch below).
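For instance (illustrative only), the DAO tests could run as plain JVM tests by switching to the Robolectric runner, removing the emulator/device dependency noted earlier; imports are the same as TaskDaoTest plus org.robolectric.RobolectricTestRunner:

    // Same in-memory setup as TaskDaoTest, executed locally via Robolectric.
    @RunWith(RobolectricTestRunner::class)
    class TaskDaoLocalTest {
        private lateinit var database: TaskDatabase
        private lateinit var taskDao: TaskDao

        @Before
        fun setup() {
            database = Room.inMemoryDatabaseBuilder(
                ApplicationProvider.getApplicationContext(),
                TaskDatabase::class.java
            ).allowMainThreadQueries().build()
            taskDao = database.taskDao()
        }

        @After
        fun teardown() = database.close()

        @Test
        fun insertTask_getsTasks_returnsInsertedTask() = runTest {
            val task = Task(id = "task1", title = "Test Task")
            taskDao.insertTask(task)
            assertEquals(listOf(task), taskDao.getTasks().first())
        }
    }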
  38. Summary of Speed/Cost Evaluation • It is highly efficient, and

    given the current scale and content, the development and CI burden is minimal. • To prepare for future expansion and increased complexity, it is necessary to incorporate measures for standardization and efficiency.
  39. Is it acceptable to leave “all” test generation for CRUD

    operations to AI? For basic normal operations and standard patterns, it's perfectly fine to leave it to AI. However, at present, our supplementation and review are still necessary.
  40. class SyncWorker(
        appContext: Context,
        params: WorkerParameters,
        private val api: TaskApi, // Retrofit API
        private val dao: TaskDao  // Room DAO
    ) : CoroutineWorker(appContext, params) {
        override suspend fun doWork(): Result {
            return try {
                val tasks = api.fetchTasks()          // Fetch from the network
                tasks.forEach { dao.insertTask(it) }  // Insert into the DB
                Result.success()
            } catch (e: Exception) {
                Result.failure()
            }
        }
    }
    SyncWorker.kt
  41. Please create unit tests for SyncWorker. The tests you create must meet the following requirements.
    ・Use TestListenableWorkerBuilder
    ・TaskApi and TaskDao, which are dependencies, are mocked using MockK.
    Test cases should cover the following.
    Success case testing
    ・If api.fetchTasks() returns a task list, return Result.success
    ・dao.insertTask() is called
    Failure case testing
    ・If api.fetchTasks() throws an exception, return Result.failure()
    Prompt for generating SyncWorker tests
  42. @RunWith(RobolectricTestRunner::class)
    class SyncWorkerTest {
        private lateinit var context: Context
        private lateinit var taskApi: TaskApi
        private lateinit var taskDao: TaskDao

        @Before
        fun setUp() {
            context = ApplicationProvider.getApplicationContext()
            taskApi = mockk()
            taskDao = mockk(relaxUnitFun = true)
        }
    SyncWorkerTest.kt
  43. @Test
    fun givenApiReturnsTasks...() = runTest {
        // Given
        val tasks = listOf(...)
        coEvery { taskApi.fetchTasks() } returns tasks
        val worker = ...

        // When
        val result = worker.doWork()

        // Then
        assertEquals(ListenableWorker.Result.success(), result)
        tasks.forEach { task ->
            coVerify { taskDao.insertTask(task) }
        }
    }
    SyncWorkerTest.kt
  44. 【Practice 2】Key Points of the AI-Generated Code
    • The "Setup → Operation → Verification" flow is clearly laid out.
    • Covers the major branches (success/failure) thoroughly.
    • Makes proper use of testing support tools such as MockK, Robolectric, and TestListenableWorkerBuilder.
  45. @RunWith(RobolectricTestRunner::class)
    class SyncWorkerTest {
        private lateinit var context: Context
        private lateinit var taskApi: TaskApi
        private lateinit var taskDao: TaskDao

        @Before
        fun setUp() {
            context = ApplicationProvider.getApplicationContext()
            taskApi = mockk()
            taskDao = mockk(relaxUnitFun = true)
        }
    SyncWorkerTest.kt
  46. @Test
    fun givenApiReturnsTasks_... = runTest {
        // Given
        ...
        val worker = TestListenableWorkerBuilder<SyncWorker>(context)
            .setWorkerFactory { appContext, params ->
                SyncWorker(
                    appContext = appContext,
                    params = params,
                    api = taskApi,
                    dao = taskDao
                )
            }
            .build()
        ...
    }
    SyncWorkerTest.kt
  47. @Test
    fun givenApiThrowsException_...() = runTest {
        // Given
        coEvery { taskApi.fetchTasks() } throws Exception("...")

        // When
        val worker = ...
        val result = worker.doWork()

        // Then
        assertEquals(ListenableWorker.Result.failure(), result)
        coVerify(exactly = 0) { taskDao.insertTask(any()) }
    }
    SyncWorkerTest.kt
  48. @Test
    fun givenApiReturnsTasks_... = runTest {
        // Given
        ...
        val worker = TestListenableWorkerBuilder<SyncWorker>(context)
            .setWorkerFactory { appContext, params ->
                SyncWorker(
                    appContext = appContext,
                    params = params,
                    api = taskApi,
                    dao = taskDao
                )
            }
            .build()
        ...
    }
    SyncWorkerTest.kt — setWorkerFactory should be passed a subclass of WorkerFactory (a corrected sketch follows below)
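A corrected construction along the lines of that note might look like this (a sketch of a SyncWorkerTest fragment): the setWorkerFactory method of androidx.work.testing's TestListenableWorkerBuilder takes a WorkerFactory instance, so the mocked dependencies are injected through a WorkerFactory subclass rather than a lambda:

    // Build the worker through a WorkerFactory subclass that injects the mocks.
    val workerFactory = object : WorkerFactory() {
        override fun createWorker(
            appContext: Context,
            workerClassName: String,
            workerParameters: WorkerParameters
        ): ListenableWorker = SyncWorker(
            appContext = appContext,
            params = workerParameters,
            api = taskApi,
            dao = taskDao
        )
    }

    val worker = TestListenableWorkerBuilder<SyncWorker>(context)
        .setWorkerFactory(workerFactory)
        .build()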
  49. Accuracy Evaluation
    • The main branches (success/failure) are verified accurately.
    • The asynchronous and mocking tools (MockK, runTest) are used correctly.
    • There is an error in the usage of setWorkerFactory that needs to be corrected.
    Overall it is sufficiently accurate, but attention to fine-grained API usage is still required.
  50. Coverage Evaluation
    • Covers the major branches of success and failure, giving sufficient basic coverage.
    • Boundary values, error cases, and repeated executions remain unverified.
    • The exact number of calls and per-item arguments are not verified.
    To withstand future specification changes and improve bug detection, additional cases are required.
  51. Reproducibility Evaluation
    • External dependencies are eliminated with Robolectric and MockK, enabling stable test execution.
    • Implementation errors or incorrect API usage carry the risk of unintended behavior.
    Basic reproducibility is ensured, but caution is required in how the test is implemented.
  52. Maintainability Evaluation
    • Given-When-Then (GWT) naming and structure make the intent easy to convey.
    • Shared setup reduces duplication.
    • Maintainability could be improved further by standardizing the WorkerFactory and introducing a test data builder (sketched below).
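As an illustration (not from the deck), a minimal test data builder for Task could be a factory function with defaults, so tests spell out only the fields they care about:

    import java.util.UUID

    // Hypothetical helper for TaskDao/SyncWorker tests.
    fun task(
        id: String = UUID.randomUUID().toString(),
        title: String = "Task"
    ): Task = Task(id = id, title = title)

    // Usage:
    // val overdue = task(title = "Overdue task")
    // val fixedId = task(id = "task1")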
  53. Speed/Cost Evaluation • Local tests and mocks enable extremely fast

    execution speeds. • Low code volume and execution cost, with minimal CI load • When expanding in the future, efforts to standardize and streamline will be effective.
  54. Is it acceptable to leave “all” test generation for WorkManager to AI?
    AI can handle generating templates that cover the basic happy paths, the major branches, and best practices well enough. However, because of subtle implementation differences and considerations of scalability and maintainability, our supplementation and review remain indispensable for now.
  55. Limitations of AI-Based Test Design • Simple cases like Room's

    CRUD tests can achieve sufficient quality through AI generation. • AI-generated code may still contain errors or maintainability issues in complex tests involving dependency injection and asynchronous processing, such as those using WorkManager. • AI excels at generating templates within specified parameters, but struggles with covering boundary values and abnormal cases, as well as clarifying design intent.
  56. Limitations of AI in Test Design and Generation While delegating

    "routine tasks and template generation" to AI enables significant efficiency gains, essential aspects like "quality assurance, scalability, and clarifying design intent" still require our design, review, and supplementation.
  57. Current Impressions of AI Test Design and Generation • AI

    excels at generating templates and automating routine CRUD tests. • Complex dependencies, asynchronous processing, and comprehensive coverage of edge cases and abnormal conditions are difficult to achieve with AI alone.
  58. Expectations for AI Test Design and Generation • The key

    to future evolution lies in enabling AI to understand the ambiguity in test design intent and specifications. • A future is anticipated where systems can learn from past bug histories and failure patterns to propose optimal test designs. • Automatic correction of test code and automatic adaptation to specification changes are also possibilities for the future. • AI's ability to explain the rationale and coverage of tests enhances their reliability.
  59. The Potential for Evolution in AI Test Design and Generation

    There is potential for evolution from "mere template generation" to "autonomous test design that delves into design intent and quality assurance." To achieve this, a "collaborative" approach—where AI and humans engage in dialogue, leveraging each other's strengths to advance test design—will become increasingly crucial.
  60. Finally
    • For now, a division of labor between AI and us remains the most realistic approach, but a future where AI autonomously leads test design while we handle reviews and final decisions is entirely plausible.
    • We aim to leverage the evolution of AI to achieve higher-quality, more efficient test design.