AI Assistant
3. Let's hand off Room (DB) testing entirely to AI
4. Taking on the challenge of testing WorkManager
5. The limits of AI test design and generation, and how to work with them effectively
6. The potential of AI test design and generation
The test can wait..."
• Once asynchronous processing (coroutines, Flow) comes into play, things get complicated all at once.
• DI, DB, API communication... preparing mocks is a hassle.
• Test code ends up depending on whoever wrote it and is not maintained.
Conclusion: Writing tests is a time-consuming and labor-intensive task.
{
        _uiState.value = Loading
        try {
            val data = repository.fetchData() // suspend fun
            val processedData = process(data)
            _uiState.value = Success(processedData)
        } catch (e: Exception) {
            _uiState.value = Error(e)
        }
    }
}
Example of a ViewModel function with typical data-retrieval logic
• Observing asynchronous values
• Mocking dependencies
Writing this boilerplate by hand every time is a real hassle. Shouldn't we leave this tedious routine work to AI?
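The boilerplate in question can be sketched roughly as follows. This is a minimal illustration, not the code from the talk: `UiState`, `Repository`, and `Loader` are hypothetical stand-ins (the slides show only a fragment of the ViewModel), and the class takes a `CoroutineScope` instead of extending `ViewModel` so the sketch needs only kotlinx-coroutines-test and JUnit 4.

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.ExperimentalCoroutinesApi
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.launch
import kotlinx.coroutines.test.advanceUntilIdle
import kotlinx.coroutines.test.runTest
import org.junit.Assert.assertEquals
import org.junit.Test

// Hypothetical types; the slides do not show these declarations.
sealed interface UiState {
    object Loading : UiState
    data class Success(val data: String) : UiState
    data class Error(val cause: Throwable) : UiState
}

interface Repository {
    suspend fun fetchData(): String
}

// Stand-in for the ViewModel: same Loading -> Success/Error flow as the slide.
class Loader(private val repository: Repository, private val scope: CoroutineScope) {
    private val _uiState = MutableStateFlow<UiState>(UiState.Loading)
    val uiState: StateFlow<UiState> = _uiState

    fun load() {
        scope.launch {
            _uiState.value = UiState.Loading
            try {
                _uiState.value = UiState.Success(repository.fetchData().uppercase())
            } catch (e: Exception) {
                _uiState.value = UiState.Error(e)
            }
        }
    }
}

@OptIn(ExperimentalCoroutinesApi::class)
class LoaderTest {
    @Test
    fun load_setsSuccess_whenRepositoryReturnsData() = runTest {
        // Hand-written fake instead of a mocking library.
        val repository = object : Repository {
            override suspend fun fetchData(): String = "data"
        }
        val loader = Loader(repository, scope = this)

        loader.load()
        advanceUntilIdle() // run the launched coroutine to completion

        assertEquals(UiState.Success("DATA"), loader.uiState.value)
    }

    @Test
    fun load_setsError_whenRepositoryThrows() = runTest {
        val failure = IllegalStateException("boom")
        val repository = object : Repository {
            override suspend fun fetchData(): String = throw failure
        }
        val loader = Loader(repository, scope = this)

        loader.load()
        advanceUntilIdle()

        assertEquals(UiState.Error(failure), loader.uiState.value)
    }
}
```

With a real `ViewModel` that uses `viewModelScope`, the usual extra step is to swap the main dispatcher with `Dispatchers.setMain(...)` in `@Before` and call `Dispatchers.resetMain()` in `@After`.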
integrated into JetBrains IDEs such as IntelliJ IDEA and Android Studio.
• When you give it natural-language instructions in a chat, the AI understands the code context and suggests test code or refactorings.
It acts as a developer's "pair programmer," providing strong support for daily routine tasks, research, and design.
specifications, and whether false positives/false negatives are few
Coverage: Statement coverage and branch coverage; are critical paths covered?
Reproducibility: Are the tests stable outside local environments, such as in CI?
Maintainability: Is the intent clear, and is the cost of adapting to specification changes low?
Speed/Cost: Are costs such as CI execution time and review time low?
Based on these evaluation criteria, we determine whether the work can be entrusted to AI.
Flow<List<Task>>

    @Insert(onConflict = OnConflictStrategy.REPLACE)
    suspend fun insertTask(task: Task)

    @Update
    suspend fun updateTask(task: Task)

    @Query("DELETE FROM tasks WHERE id = :taskId")
    suspend fun deleteTaskById(taskId: String)
}
TaskDao.kt
must meet the following requirements.
・Use an in-memory database
・Initialize the database in @Before and close it in @After
・Tests use `runTest` from kotlinx-coroutines-test
Test cases must cover the following.
・Tasks inserted using insertTask can be correctly retrieved using getTasks
・The content updated via updateTask is correctly reflected
・Tasks deleted using deleteTaskById disappear from the list
Prompt for generating TaskDao tests
"Old Task")
val task2 = Task(id = "same_id", title = "New Task")

// When
taskDao.insertTask(task1)
taskDao.insertTask(task2) // Overwrite with same ID
val tasks = taskDao.getTasks().first()

// Then
assertEquals(1, tasks.size)
assertEquals(task2, tasks[0])
assertEquals("New Task", tasks[0].title)
TaskDaoTest.kt
flow of basic CRUD operations.
• @Before/@After setup and cleanup eliminate side effects between tests.
• Using runTest, asynchronous processing can also be verified accurately.
"Task 1")
val task2 = Task(id = "task2", title = "Task 2")
val task3 = Task(id = "task3", title = "Task 3")

// When
taskDao.insertTask(task1)
taskDao.insertTask(task2)
taskDao.insertTask(task3)
val tasks = taskDao.getTasks().first()

// Then
assertEquals(3, tasks.size)
assertTrue(tasks.contains(task1))
assertTrue(tasks.contains(task2))
assertTrue(tasks.contains(task3))
TaskDaoTest.kt
The contains() checks depend on the implementation of equals/hashCode.
positives remain because there are no field-level assertions and no explicit verification of record counts and side effects.
• Behavior during exceptions or constraint violations (e.g., null in required fields) is not covered.
is sufficiently accurate.
• More rigorous verification and coverage of error cases are required.
  ◦ Field-level assertions
  ◦ Explicit verification of record counts and side effects
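Why field-level assertions matter can be shown with plain Kotlin. The sketch below uses two hypothetical stand-ins for `Task` (the slides do not show the entity's declaration): a check such as `tasks.contains(task1)` only passes for a freshly loaded instance when the class defines structural `equals`/`hashCode`, whereas a field-by-field comparison does not depend on that.

```kotlin
// Hypothetical stand-ins for the Task entity; the slides do not show its declaration.
class PlainTask(val id: String, val title: String)     // inherits identity-based equals()
data class DataTask(val id: String, val title: String) // generated structural equals()/hashCode()

// Mirrors assertTrue(tasks.contains(task1)) when the DAO returns a fresh instance.
fun plainListContainsCopy(): Boolean =
    listOf(PlainTask("task1", "Task 1")).contains(PlainTask("task1", "Task 1"))

fun dataListContainsCopy(): Boolean =
    listOf(DataTask("task1", "Task 1")).contains(DataTask("task1", "Task 1"))

// Field-level check: independent of how equals() is implemented.
fun fieldsMatch(actual: PlainTask, id: String, title: String): Boolean =
    actual.id == id && actual.title == title

fun main() {
    println(plainListContainsCopy())                                      // false
    println(dataListContainsCopy())                                       // true
    println(fieldsMatch(PlainTask("task1", "Task 1"), "task1", "Task 1")) // true
}
```

If `Task` is a `data class`, the generated tests are sound as written; asserting on `id` and `title` directly simply removes that hidden assumption.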
statements) are sufficiently covered.
  ◦ Inserting new tasks and retrieving multiple tasks
  ◦ REPLACE (overwrite) operation with the same ID
• A set of test cases is available to verify that everyday operations function as expected.
database
• Behavior of update or delete operations on non-existent IDs
• Guarantee of retrieval order after multiple inserts
• Verification of event sequence and content during consecutive operations such as insert→update→delete
• Exception handling for error cases, such as when required fields are null or when constraints are violated
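Two of the gaps listed above (operations on non-existent IDs and an insert→update→delete sequence) could be covered along these lines. This is a sketch under assumptions: the `Task` entity, the `getTasks()` query, and `TaskDatabase` are not shown in the slides, so their shapes here are guesses, and the class name `TaskDaoEdgeCaseTest` is hypothetical.

```kotlin
import android.content.Context
import androidx.room.Dao
import androidx.room.Database
import androidx.room.Entity
import androidx.room.Insert
import androidx.room.OnConflictStrategy
import androidx.room.PrimaryKey
import androidx.room.Query
import androidx.room.Room
import androidx.room.RoomDatabase
import androidx.room.Update
import androidx.test.core.app.ApplicationProvider
import androidx.test.ext.junit.runners.AndroidJUnit4
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.first
import kotlinx.coroutines.test.runTest
import org.junit.After
import org.junit.Assert.assertEquals
import org.junit.Assert.assertTrue
import org.junit.Before
import org.junit.Test
import org.junit.runner.RunWith

// Assumed shapes: the slides do not show Task, TaskDatabase, or the getTasks() query.
@Entity(tableName = "tasks")
data class Task(@PrimaryKey val id: String, val title: String)

@Dao
interface TaskDao {
    @Query("SELECT * FROM tasks")
    fun getTasks(): Flow<List<Task>>

    @Insert(onConflict = OnConflictStrategy.REPLACE)
    suspend fun insertTask(task: Task)

    @Update
    suspend fun updateTask(task: Task)

    @Query("DELETE FROM tasks WHERE id = :taskId")
    suspend fun deleteTaskById(taskId: String)
}

@Database(entities = [Task::class], version = 1, exportSchema = false)
abstract class TaskDatabase : RoomDatabase() {
    abstract fun taskDao(): TaskDao
}

@RunWith(AndroidJUnit4::class)
class TaskDaoEdgeCaseTest {
    private lateinit var database: TaskDatabase
    private lateinit var taskDao: TaskDao

    @Before
    fun setup() {
        database = Room.inMemoryDatabaseBuilder(
            ApplicationProvider.getApplicationContext<Context>(),
            TaskDatabase::class.java
        ).allowMainThreadQueries().build()
        taskDao = database.taskDao()
    }

    @After
    fun tearDown() {
        database.close()
    }

    @Test
    fun deleteTaskById_withUnknownId_leavesExistingRowsUntouched() = runTest {
        val existing = Task(id = "task1", title = "Task 1")
        taskDao.insertTask(existing)

        taskDao.deleteTaskById("missing_id")

        assertEquals(listOf(existing), taskDao.getTasks().first())
    }

    @Test
    fun updateTask_withUnknownId_doesNotInsertARow() = runTest {
        // @Update matches on the primary key; no row matches, so nothing changes.
        taskDao.updateTask(Task(id = "missing_id", title = "Ghost"))

        assertTrue(taskDao.getTasks().first().isEmpty())
    }

    @Test
    fun insertUpdateDelete_inSequence_reflectsEachStep() = runTest {
        taskDao.insertTask(Task(id = "task1", title = "Old"))
        taskDao.updateTask(Task(id = "task1", title = "New"))
        assertEquals("New", taskDao.getTasks().first().single().title)

        taskDao.deleteTaskById("task1")
        assertTrue(taskDao.getTasks().first().isEmpty())
    }
}
```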
all statements) are sufficiently covered, which is perfectly adequate for everyday use.
• Improving robustness requires additional verification of branches and error cases.
"Task 1")
val task2 = Task(id = "task2", title = "Task 2")
val task3 = Task(id = "task3", title = "Task 3")

// When
taskDao.insertTask(task1)
taskDao.insertTask(task2)
taskDao.insertTask(task3)
val tasks = taskDao.getTasks().first()

// Then
assertEquals(3, tasks.size)
assertTrue(tasks.contains(task1))
assertTrue(tasks.contains(task2))
assertTrue(tasks.contains(task3))
TaskDaoTest.kt
Strictly verifying the order and timing of sequential operations can make reproducibility fluctuate.
sufficient reproducibility.
• When complex branching, sequential events, or the elimination of environment dependencies are required, additional measures such as revisiting the test runner and the verification methods are necessary.
easy to read, with clear separation of responsibilities.
• Setup and cleanup using @Before/@After are appropriate and have minimal side effects.
• Test case naming in Given-When-Then (GWT) format makes the intent clear.
lateinit var taskDao: TaskDao

@Before
fun setup() {
    database = Room.inMemoryDatabaseBuilder(
        ApplicationProvider.getApplicationContext(),
        TaskDatabase::class.java
    ).allowMainThreadQueries().build()
    taskDao = database.taskDao()
}
TaskDaoTest.kt
DB/DAO creation code like this is prone to duplication.
about retrieval order and methods for verifying Flow may lead to inconsistent testing approaches in the future.
• We want to standardize testing styles within the project, for example by explicitly specifying ORDER BY in DAOs to make ordering assumptions clear and by unifying Flow verification on Turbine.
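The two standardization ideas above might look like this in practice. The sketch repeats assumed declarations so that it stands alone: `Task`, `OrderedTaskDao`, `OrderedTaskDatabase`, and the `ORDER BY id ASC` clause are illustrative choices, not the project's actual code. Turbine's `test`, `awaitItem`, and `cancelAndIgnoreRemainingEvents` are used to verify each emission of the Flow explicitly.

```kotlin
import android.content.Context
import androidx.room.Dao
import androidx.room.Database
import androidx.room.Entity
import androidx.room.Insert
import androidx.room.OnConflictStrategy
import androidx.room.PrimaryKey
import androidx.room.Query
import androidx.room.Room
import androidx.room.RoomDatabase
import androidx.test.core.app.ApplicationProvider
import androidx.test.ext.junit.runners.AndroidJUnit4
import app.cash.turbine.test
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.test.runTest
import org.junit.After
import org.junit.Assert.assertEquals
import org.junit.Before
import org.junit.Test
import org.junit.runner.RunWith

// Assumed entity shape; not shown in the slides.
@Entity(tableName = "tasks")
data class Task(@PrimaryKey val id: String, val title: String)

@Dao
interface OrderedTaskDao {
    // ORDER BY makes the ordering assumption explicit instead of relying on insertion order.
    @Query("SELECT * FROM tasks ORDER BY id ASC")
    fun getTasks(): Flow<List<Task>>

    @Insert(onConflict = OnConflictStrategy.REPLACE)
    suspend fun insertTask(task: Task)
}

@Database(entities = [Task::class], version = 1, exportSchema = false)
abstract class OrderedTaskDatabase : RoomDatabase() {
    abstract fun taskDao(): OrderedTaskDao
}

@RunWith(AndroidJUnit4::class)
class OrderedTaskDaoTest {
    private lateinit var database: OrderedTaskDatabase
    private lateinit var taskDao: OrderedTaskDao

    @Before
    fun setup() {
        database = Room.inMemoryDatabaseBuilder(
            ApplicationProvider.getApplicationContext<Context>(),
            OrderedTaskDatabase::class.java
        ).allowMainThreadQueries().build()
        taskDao = database.taskDao()
    }

    @After
    fun tearDown() {
        database.close()
    }

    @Test
    fun getTasks_emitsRowsOrderedById_afterEachInsert() = runTest {
        val first = Task(id = "a", title = "First")
        val second = Task(id = "b", title = "Second")

        taskDao.getTasks().test {
            assertEquals(emptyList<Task>(), awaitItem()) // initial emission: empty table

            taskDao.insertTask(second)
            assertEquals(listOf(second), awaitItem())

            // Inserted later, but sorts before "b" because of ORDER BY id ASC.
            taskDao.insertTask(first)
            assertEquals(listOf(first, second), awaitItem())

            cancelAndIgnoreRemainingEvents()
        }
    }
}
```

Compared with `getTasks().first()`, this style checks the sequence of emissions rather than a single snapshot, which is the part that tends to drift between test authors when no convention exists.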
the perspective of maintainability
• By sharing common code and standardizing in preparation for more test cases and specification changes, we can achieve higher maintainability.
code duplication becoming a factor in rising costs.
• Using device-dependent test runners may increase execution time and CI costs.
• Robolectric, parameterized tests, and shared test code are needed to keep speed and cost in balance.
given the current scale and content, the development and CI burden is minimal.
• To prepare for future expansion and increased complexity, measures for standardization and efficiency need to be incorporated.
operations to AI?
For basic happy-path operations and standard patterns, it is perfectly fine to leave the work to AI. At present, however, we still need to supplement and review the output ourselves.
// Retrofit API
    private val dao: TaskDao // Room DAO
) : CoroutineWorker(appContext, params) {
    override suspend fun doWork(): Result {
        return try {
            val tasks = api.fetchTasks() // Fetch from the network
            tasks.forEach { dao.insertTask(it) } // Insert into the db
            Result.success()
        } catch (e: Exception) {
            Result.failure()
        }
    }
}
SyncWorker.kt
must meet the following requirements.
・Use TestListenableWorkerBuilder
・TaskApi and TaskDao, which are dependencies, are mocked using MockK.
Test cases should cover the following.
Success case
・If api.fetchTasks() returns a task list, return Result.success()
・dao.insertTask() is called
Failure case
・If api.fetchTasks() throws an exception, return Result.failure()
Prompt for generating SyncWorker tests
"Setup → Operation → Verification" is clearly defined.
• The major branches (success/failure) are covered thoroughly.
• Testing support tools such as MockK, Robolectric, and TestListenableWorkerBuilder are used appropriately.
The usage of asynchronous and mocking tools such as runTest and MockK is also correct.
• The usage of setWorkerFactory contains an error and needs to be corrected.
The result is sufficiently accurate, but detailed API usage requires careful checking.
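The slides do not show what the setWorkerFactory error was, so the following is only a sketch of one usage that matches the WorkManager testing API as far as I know it: a custom `WorkerFactory` supplies the mocked dependencies, and `TestListenableWorkerBuilder` builds the worker through it. `Task`, `TaskApi`, `TaskDao`, the `SyncWorker` constructor, and `SyncWorkerFactory` are assumed or hypothetical shapes reconstructed from the fragment of SyncWorker.kt.

```kotlin
import android.content.Context
import androidx.test.core.app.ApplicationProvider
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.work.CoroutineWorker
import androidx.work.ListenableWorker
import androidx.work.WorkerFactory
import androidx.work.WorkerParameters
import androidx.work.testing.TestListenableWorkerBuilder
import io.mockk.coEvery
import io.mockk.coVerify
import io.mockk.mockk
import java.io.IOException
import kotlinx.coroutines.test.runTest
import org.junit.Assert.assertEquals
import org.junit.Test
import org.junit.runner.RunWith

// Assumed shapes; the slides show only part of SyncWorker.kt.
data class Task(val id: String, val title: String)
interface TaskApi { suspend fun fetchTasks(): List<Task> }
interface TaskDao { suspend fun insertTask(task: Task) }

class SyncWorker(
    appContext: Context,
    params: WorkerParameters,
    private val api: TaskApi,
    private val dao: TaskDao
) : CoroutineWorker(appContext, params) {
    override suspend fun doWork(): Result = try {
        api.fetchTasks().forEach { dao.insertTask(it) }
        Result.success()
    } catch (e: Exception) {
        Result.failure()
    }
}

// Hypothetical factory that injects the test doubles into SyncWorker.
class SyncWorkerFactory(
    private val api: TaskApi,
    private val dao: TaskDao
) : WorkerFactory() {
    override fun createWorker(
        appContext: Context,
        workerClassName: String,
        workerParameters: WorkerParameters
    ): ListenableWorker? =
        if (workerClassName == SyncWorker::class.java.name) {
            SyncWorker(appContext, workerParameters, api, dao)
        } else {
            null // let WorkManager fall back to its default factory
        }
}

@RunWith(AndroidJUnit4::class)
class SyncWorkerTest {
    private val api = mockk<TaskApi>()
    private val dao = mockk<TaskDao>(relaxUnitFun = true)

    private fun buildWorker(): SyncWorker {
        val context = ApplicationProvider.getApplicationContext<Context>()
        return TestListenableWorkerBuilder<SyncWorker>(context)
            .setWorkerFactory(SyncWorkerFactory(api, dao))
            .build()
    }

    @Test
    fun doWork_returnsSuccess_andInsertsEachTaskOnce() = runTest {
        val tasks = listOf(Task("task1", "Task 1"), Task("task2", "Task 2"))
        coEvery { api.fetchTasks() } returns tasks

        val result = buildWorker().doWork()

        assertEquals(ListenableWorker.Result.success(), result)
        coVerify(exactly = 1) { dao.insertTask(tasks[0]) }
        coVerify(exactly = 1) { dao.insertTask(tasks[1]) }
    }

    @Test
    fun doWork_returnsFailure_andInsertsNothing_whenApiThrows() = runTest {
        coEvery { api.fetchTasks() } throws IOException("offline")

        val result = buildWorker().doWork()

        assertEquals(ListenableWorker.Result.failure(), result)
        coVerify(exactly = 0) { dao.insertTask(any()) }
    }
}
```

The `coVerify(exactly = ...)` calls also pin down how many times the DAO is called, which a plain "was it called" verification leaves open.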
and failure, ensuring sufficient basic coverage.
• Boundary values, error cases, and repeated executions remain unverified.
• Call counts are not checked, and individual items are not verified one by one.
Additional cases are required to cope with future specification changes and to improve bug detection.
to enable stable test execution.
• Implementation errors or incorrect usage carry the risk of causing unintended behavior.
Basic reproducibility is ensured, but caution is required regarding the implementation method.
to convey
• Standardized setup reduces duplication.
• Maintainability can be improved further by standardizing the WorkerFactory and introducing a test data builder.
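A test data builder can be as small as a function with default arguments. The sketch below is illustrative only: `Task` is an assumed shape, and `aTask` and `tasks` are hypothetical helper names.

```kotlin
// Assumed shape of the entity; the slides do not show its declaration.
data class Task(val id: String, val title: String)

// Builder with defaults: each test overrides only the fields it cares about.
fun aTask(id: String = "task1", title: String = "Title of $id"): Task =
    Task(id = id, title = title)

// Convenience for tests that need several distinct rows.
fun tasks(count: Int): List<Task> = (1..count).map { aTask(id = "task$it") }

fun main() {
    println(aTask())                       // Task(id=task1, title=Title of task1)
    println(aTask(title = "Buy milk"))     // Task(id=task1, title=Buy milk)
    println(tasks(3).map { it.id })        // [task1, task2, task3]
}
```

When a field is later added to `Task`, only the builder needs a new default, not every test that constructs a `Task`.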
execution speeds.
• Code volume and execution cost are low, with minimal CI load.
• When the test suite expands in the future, efforts to standardize and streamline will be effective.
to AI?
AI can adequately generate templates that follow basic happy-path cases, major branches, and best practices. For now, however, our supplementation and review remain indispensable because of subtle differences in API usage and considerations of scalability and maintainability.
CRUD tests can achieve sufficient quality through AI generation.
• AI-generated code may still contain errors or maintainability issues in complex tests involving dependency injection and asynchronous processing, such as those using WorkManager.
• AI excels at generating templates within the given instructions, but struggles with covering boundary values and error cases, as well as clarifying design intent.
"routine tasks and template generation" to AI enables significant efficiency gains, essential aspects like "quality assurance, scalability, and clarifying design intent" still require our design, review, and supplementation.
excels at generating templates and automating routine CRUD tests.
• Complex dependencies, asynchronous processing, and comprehensive coverage of edge cases and error cases are difficult to achieve with AI alone.
to future evolution lies in enabling AI to understand the ambiguity in test design intent and specifications.
• We can anticipate a future in which systems learn from past bug histories and failure patterns to propose optimal test designs.
• Automatic correction of test code and automatic adaptation to specification changes are also possibilities.
• If AI can explain the rationale and coverage of its tests, their reliability will increase.
There is potential for evolution from "mere template generation" to "autonomous test design that delves into design intent and quality assurance." To achieve this, a "collaborative" approach, in which AI and humans engage in dialogue and leverage each other's strengths to advance test design, will become increasingly important.
remains the most realistic approach, but a future where "AI autonomously leads test design while we handle reviews and final decisions" is entirely plausible.
• We aim to leverage the evolution of AI to achieve higher-quality and more efficient test design.