Towards Reliable Agile Iterative Planning via Predicting Documentation Changes of Work Items

In agile iterative development, an agile team needs to analyze documented information for effort estimation and sprint planning. Although documentation can be changed, documentation changes made after sprint planning may invalidate the estimated effort and the sprint plan. Hence, to help the team be aware of potential documentation changes, we developed DocWarn to estimate the probability that a work item will have documentation changes. We developed three variations of DocWarn, based on the characteristics extracted from the work items (DocWarn-C), the natural language text (DocWarn-T), and both inputs (DocWarn-H). Based on nine open-source projects that work in sprints and actively maintain documentation, DocWarn can predict the documentation changes with an average AUC of 0.75 and an average F1-Score of 0.36, which are significantly higher than the baselines. We also found that the most influential characteristics of a work item for determining future documentation changes are the past tendency of the developers and the length of the description text. Based on a qualitative assessment, we found that 40%-68% of the correctly predicted documentation changes were related to scope modification. With DocWarn's predictions, the team will be better aware of potential documentation changes during sprint planning, allowing it to manage the uncertainty and reduce the risk of unreliable effort estimation and sprint planning.

Jirat Pasuksmit

May 18, 2022

Transcript

  1. Towards Reliable Agile Iterative Planning via Predicting Documentation Changes of Work Items

     Jirat Pasuksmit (presenter), Patanamon (Pick) Thongtanunam, Shanika Karunasekera
     MSR 2022: Technical track
     @jiratpa, @pamon
  2. Agile teams rely on documentation in planning the time-boxed iterations (sprints).

     [Figure: a sprint (2-4 weeks) with planning, implementing, and releasing phases that deliver completed features; during planning, work items go through detail analysis and effort estimation and are selected from the Product Backlog into the Sprint Backlog.]
     The teams analyze the documented information (e.g., user story, acceptance criteria) to estimate effort and select the work items to work on in the sprint (Pasuksmit et al., ICSME 2021; Kasauli et al., RE 2017; Andriyani et al., KSEM 2017; Gralha et al., ICSE 2018).
  3. Even though documentation can be updated in Agile, the updates might negatively impact the sprint plan.

     During sprint planning, a work item in the Sprint Backlog reads: "Implement a script to do […] that can be used as a server-side extension to […], on the machines used in […]. * Install necessary version of […]." It is estimated at 7 days, and the Sprint Backlog fits the sprint capacity.
     After sprint planning, the documentation is extended with: "* Implement […] procedure. * Interface with […] via JSON inputs and outputs. The inputs include […]." These documentation changes could lead to a re-estimate of 14 days, which exceeds the sprint capacity.
     Documented information is important for effort estimation, and changes to it can negatively impact the estimation accuracy (and the sprint plan) (Pasuksmit et al., ICSME 2021).
  4. DocWarn: an approach that enables the team to foresee uncertainty in documentation

     DocWarn-C (characteristics)
       Technique: Random Forest classifier
       Data: 41 metrics grouped into 6 dimensions: Past tendency, Pre-Sprint Changes, Collaboration, Readability, Primitive Attributes, Completeness
     DocWarn-T (text)
       Technique: Neural network classifier with DistilRoBERTa text embedding (fine-tuned using 119,254 work items)
       Data: summary and description of work items
     DocWarn-H (hybrid)
       Technique: Neural network classifier
       Data: both metrics and text

     [Figure: during sprint planning, DocWarn annotates each work item in the Sprint Backlog with its probability of documentation change, e.g., 19%, 15%, 23%, and 78%.]
     DocWarn allows the team to foresee the probability of documentation changes during sprint planning.
     Objective: help the team gain confidence in effort estimation and sprint planning.
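The slides name a Past tendency dimension and, on slide 7, a "stable rate of the past work items" of the reporter and of the assignee, but they give no formula. A minimal sketch, assuming the stable rate is the fraction of a developer's work items closed before a given cut-off whose documentation did not change, might look like this. The `WorkItem` fields, the `as_of` cut-off, the 0.5 fallback for developers without history, and counting text length in words are all hypothetical choices, not DocWarn's published definitions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass
class WorkItem:
    # Hypothetical record layout; the field names are illustrative only.
    key: str
    reporter: str
    assignee: Optional[str]
    closed: datetime
    summary: str
    description: str
    doc_changed: bool  # label: documentation changed after sprint assignment


def stable_rate(history: Sequence[WorkItem], person: str, role: str,
                as_of: datetime, default: float = 0.5) -> float:
    """Fraction of `person`'s past work items (in `role`) without documentation changes.

    Only items closed before `as_of` are counted, so the metric never uses
    labels that would be unknown at sprint planning time. `default` is an
    assumed fallback for developers with no history.
    """
    past = [w for w in history
            if getattr(w, role) == person and w.closed < as_of]
    if not past:
        return default
    return sum(not w.doc_changed for w in past) / len(past)


def text_length(item: WorkItem) -> int:
    # Length of summary + description, counted in words (an assumption).
    return len(f"{item.summary} {item.description}".split())
```

For instance, a reporter with one stable and one changed work item closed before planning gets a stable rate of 0.5 under this sketch.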
  5. Case Study Design

     Studied projects: open-source projects that (1) actively work in sprints and (2) actively maintain documentation. From five large open-source organizations, nine projects passed the two project selection criteria.
     Data cleansing: a studied work item must
       (1) be assigned to a sprint,
       (2) have a status of done, closed, or resolved, and
       (3) have a resolution of done, complete, or fixed.
     Total: 17,731 work items.
     Identifying documentation changes: a work item has a documentation change if its summary and description were semantically changed by at least 10%, i.e., the cosine similarity between the text at sprint assignment (+1hr) and the text at closing is below 0.9.
     [Figure: Scenario #1, a work item whose documentation is unchanged after sprint assignment (cosine similarity >= 0.9), vs. Scenario #2, a work item with changed documentation (cosine similarity < 0.9); the labeled work items form the training data.]
     Research questions:
       RQ1) How well can we predict documentation changes?
       RQ2) What are the most influential characteristics of a work item for determining the documentation changes?
       RQ3) What are the reasons for the documentation changes in the work items that we can correctly predict?
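The cleansing criteria and the labeling rule can be sketched in a few lines. This sketch assumes work items are plain dicts with hypothetical `sprint`, `status`, and `resolution` keys, and it swaps in a bag-of-words vector for the text representation: the slides do not state which embedding measured the semantic difference, and word counts capture only lexical, not semantic, similarity. Only the 0.9 cosine-similarity threshold and the three cleansing criteria come from the slide.

```python
import math
import re
from collections import Counter

DONE_STATUSES = {"done", "closed", "resolved"}
DONE_RESOLUTIONS = {"done", "complete", "fixed"}
SIMILARITY_THRESHOLD = 0.9  # below 0.9 = semantically changed by at least 10%


def passes_cleansing(item: dict) -> bool:
    """Keep a work item only if it meets all three data-cleansing criteria."""
    return (item.get("sprint") is not None
            and str(item.get("status", "")).lower() in DONE_STATUSES
            and str(item.get("resolution", "")).lower() in DONE_RESOLUTIONS)


def bag_of_words(text: str) -> Counter:
    # Stand-in text representation: lower-cased word counts (lexical only).
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 and norm_b == 0:
        return 1.0  # two empty texts are treated as identical
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def has_documentation_change(text_at_assignment: str, text_at_close: str) -> bool:
    """Label a work item as changed when the two texts fall below the threshold."""
    sim = cosine_similarity(bag_of_words(text_at_assignment),
                            bag_of_words(text_at_close))
    return sim < SIMILARITY_THRESHOLD
```

How the study combined summary and description before comparing them is not stated on the slide; concatenating the two strings is one simple option.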
  6. RQ1) How well can we predict documentation changes?

     Methods: (1) measure performance based on 10 × 5-fold cross-validation; (2) compare the performance with the Random and OneR baselines (using the Wilcoxon signed-rank test).
     Results: the DocWarn variations perform better than the baselines, and DocWarn-C performs better than the other two variations.
     [Figure: AUC and F1-Score (axis 0 to 0.8) of DocWarn-C, DocWarn-T, DocWarn-H, OneR, and Random.]
     Implication: our approach based on the characteristics of work items can predict the documentation changes that occur after sprint assignment time.
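AUC and F1-Score under 10 × 5-fold cross-validation can be computed with the standard library alone. The sketch below is an illustration, not the study's evaluation code: AUC is computed as the fraction of (positive, negative) pairs ranked correctly, F1 assumes a 0.5 probability threshold, and the folds are plain random (not stratified) splits. The Wilcoxon signed-rank comparison against the baselines is omitted; applying `scipy.stats.wilcoxon` to paired per-fold scores is one way to run it.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: share of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win. O(P * N), which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """F1 of the positive class (documentation change) at an assumed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def repeated_kfold(n: int, k: int = 5, repeats: int = 10,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for `repeats` x `k`-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition for each repetition
        for fold in range(k):
            test = order[fold::k]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test
```

Each of the 50 train/test splits yields one AUC and one F1 value per model; averaging them gives summary figures of the kind reported in the abstract.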
  7. RQ2) What are the most influential characteristics of a work item for determining the documentation changes?

     Methods: (1) measure the Mean Decrease Accuracy of each metric in the model; (2) find statistically distinct ranks using the Scott-Knott test.
     Results: the past tendency of developers and the length of the description text were the most influential metrics in DocWarn-C:
       (1) stable rate of the past work items of the reporter,
       (2) stable rate of the past work items of the assignee, and
       (3) length of the description text (summary + description) of the work item.
     Implication: work items with these characteristics may need attention during sprint planning, as their documentation may change in the future.
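Mean Decrease Accuracy measures how much a model's accuracy drops when one metric's values are shuffled. The model-agnostic sketch below permutes each feature column of an evaluation set and averages the drop over several shuffles. It is a simplified stand-in, since Random Forest implementations usually compute this measure on out-of-bag samples per tree, and `model` is any hypothetical callable returning a 0/1 prediction. The Scott-Knott ranking step is not shown.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Row], int]


def accuracy(model: Model, rows: Sequence[Row], labels: Sequence[int]) -> float:
    # Fraction of rows whose prediction matches the label.
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def mean_decrease_accuracy(model: Model, rows: Sequence[Row],
                           labels: Sequence[int], n_repeats: int = 10,
                           seed: int = 0) -> List[float]:
    """Average accuracy drop after shuffling each feature column (permutation importance)."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between metric j and the label
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total_drop += baseline - accuracy(model, permuted, labels)
        importances.append(total_drop / n_repeats)
    return importances
```

A metric the model ignores gets an importance of exactly 0, while a metric the model depends on shows a positive drop; ranking these drops is the input to a grouping step such as Scott-Knott.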
  8. RQ3) What are the reasons for the documentation changes in the work items that we can correctly predict?

     Methods: (1) perform open coding to find the reasons for documentation changes; (2) categorize the correctly predicted documentation changes.
     Results: DocWarn-C can identify documentation changes related to …
     [Figure: share of correctly predicted documentation changes per reason category: 0%-30%, 27%-47%, 6%-27%, 4%-20%.]
     Implication: DocWarn can identify documentation changes that are related to scope modification, i.e., changing scope and defining scope.
  9. The change of documented information may negatively impact the estimated effort and sprint plan.

     [Figure: the work item from slide 3 before and after its documentation changes, and a Sprint Backlog annotated with DocWarn probabilities of documentation change of 19%, 15%, 23%, and 78%.]
     DocWarn allows the team to foresee the probability of documentation changes.
     DocWarn-C can predict the documentation changes:
       it achieved better performance than the other variations and the baselines (RQ1);
       the past tendency and text length are the most influential metrics (RQ2);
       it can predict the documentation changes related to scope modification (RQ3).
     Agile teams can apply our approach to reduce the risk of unreliable effort estimation and sprint planning.

     Jirat Pasuksmit
     The University of Melbourne