
FrUITeR: A Framework for Evaluating UI Test Reuse

Yixue Zhao
November 02, 2020


Presentation slides of the paper "FrUITeR: A Framework for Evaluating UI Test Reuse" at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2020.
Presentation link: https://youtu.be/zVWpT5aLyQo


Transcript

  1. FrUITeR: A Framework for Evaluating UI Test Reuse
     Yixue Zhao¹, Justin Chen², Adriana Sejfia¹, Marcelo Schmitt Laser¹, Jie Zhang³, Federica Sarro³, Mark Harman³, Nenad Medvidović¹
     ESEC/FSE 2020, Virtual Event
  2. What’s UI Test Reuse?
     ▪ New and exciting! :) ▪ A test-generation technique ▪ “Usage-based” UI tests ▪ Reuse existing tests ▪ Exploit UI similarities
     [Diagram: UI transition graphs of a source app and a target app]
  3. Why an evaluation framework?
     Ideal: happy Yixue in her proposal (“UI Test Reuse!”)
  4. Why an evaluation framework?
     Ideal: happy Yixue in her proposal (“UI Test Reuse!”)
     Reality: happier people submitting papers (“I’m done hehe!”)
  5. FrUITeR was born: A Framework for Evaluating UI Test Reuse
     ▪ Limitations of existing work ▪ Which techniques are better? ▪ Compare my new technique
  6. So hard! ▪ Identify 5 key challenges ▪ Establish 5 requirements ▪ Design the framework ▪ Build a benchmark ▪ Migrate existing work
  7. Challenges ▪ Evaluation metrics are different and limited ▪ Significant manual effort ▪ No guidelines for manual inspection ▪ One-off solutions evaluated as a whole ▪ Different benchmarks
  8. Metrics 1. Evaluation metrics are different and limited: 50% != 50%, and Accuracy != Good
  9. [Diagram: UI transition graphs of Wish and Etsy]
  10. Manual 2. Significant manual effort

  11. [Diagram: UI transition graphs of Wish and Etsy]
     Wish → Etsy: 1 test w/ 3 events → inspect 3 times
  12. Wish → Etsy: 1 test w/ 3 events → inspect 3 times
     10 apps (100 pairs), 10 tests w/ 10 events each → inspect 100 × 100 times!
  13. Guideline 3. No guidelines for manual inspection: biased & unreproducible

  14. [Diagram: UI transition graphs of Wish and Etsy] a1-1 → b3-1???
  15. Address Challenge #1 (Metrics): 1. Evaluation metrics are different and limited
  16. Address Challenge #1 (Metrics): 1. Evaluation metrics are different and limited → 7 Fidelity Metrics
  17. Address Challenge #1 (Metrics): 2 Utility Metrics, e.g., Effort = LevenshteinDistance(transEvents, gtEvents)
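The Effort metric on this slide can be sketched in a few lines. Representing `trans_events` and `gt_events` as plain lists of event strings is an illustrative assumption, not FrUITeR's exact internal representation:

```python
def edit_distance(trans_events, gt_events):
    """Levenshtein distance between two event sequences: the number of
    event insertions, deletions, and substitutions needed to turn the
    transferred test into the ground-truth test."""
    m, n = len(trans_events), len(gt_events)
    # dp[i][j] = distance between the first i transferred events
    # and the first j ground-truth events
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if trans_events[i - 1] == gt_events[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete an event
                           dp[i][j - 1] + 1,         # insert an event
                           dp[i - 1][j - 1] + cost)  # substitute an event
    return dp[m][n]
```

A transferred test with one spurious event against a 2-event ground truth would score 1, i.e., one manual fix to repair it.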
  18. Address Challenges #2 & #3: 2. Significant manual effort; 3. No guidelines for manual inspection
  19. [figure-only slide]

  20. Uniform Representation

  21. Uniform Representation
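As a sketch of what a uniform representation buys: each technique's recorded actions can be normalized into one shared event type, so the same metrics apply to any tool's output. The field names below are illustrative assumptions, not FrUITeR's exact schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One UI event in a uniform, tool-agnostic form."""
    widget_id: str   # widget acted on, e.g. "a1-1" from the slides
    action: str      # e.g. "click", "type"
    value: str = ""  # input text for "type" events, empty otherwise

# The 3-event Wish sign-in test from the slides, expressed uniformly
wish_test = [
    Event("a1-1", "type", "user@example.com"),
    Event("a1-2", "type", "secret"),
    Event("a1-3", "click"),
]
```

Because every technique's output lands in this one shape, metrics like the Levenshtein-based Effort can compare tests without per-tool adapters.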

  22. [figure-only slide]

  23. [Diagram: UI transition graphs of Wish and Etsy linked by Canonical Maps]
     Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
     Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in
  24. Address Challenge #2: the Canonical Maps are the ONLY manual effort
     Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
     Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in
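The canonical-map idea can be sketched as follows, using the sign-in example from the slides. Each per-app map is written by hand once; after that, judging any transferred event is automatic and deterministic. The dict encoding and function name are illustrative assumptions:

```python
# Per-app canonical maps, built manually once per app (the ONLY
# manual effort); map contents mirror the slides.
wish_map = {"a1-1": "email", "a1-2": "password", "a1-3": "sign in"}
etsy_map = {"b3-1": "email", "b3-2": "password", "b3-3": "sign in"}

def is_correct_transfer(src_widget, tgt_widget):
    """A transferred event is correct when the source and target widgets
    share the same canonical label: an unambiguous, reproducible
    criterion instead of ad hoc manual judgment."""
    src = wish_map.get(src_widget)
    tgt = etsy_map.get(tgt_widget)
    return src is not None and src == tgt
```

Under these maps, slide 14's question "a1-1 → b3-1???" has a fixed answer: both are labeled email, so the transfer is correct, and no two inspectors can disagree.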
  25. Wish → Etsy: 1 test w/ 3 events → inspect 3 times
     10 apps (100 pairs), 10 tests w/ 10 events each → inspect 100 × 100 times!
  26. Address Challenge #3
     [Diagram: UI transition graphs of Wish and Etsy]
     Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
     Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in
  27. Address Challenge #3
     [Diagram: UI transition graphs of Wish and Etsy]
     Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
     Etsy Canonical Map: b3-1 → username (revised from “email”), b3-2 → password, b3-3 → sign in
  28. More challenges in reality… ▪ Contact authors ▪ Study implementation ▪ Modify/verify implementation ▪ Establish 239 benchmark tests ▪ Construct 20 ground-truth Canonical Maps
  29. FrUITeR’s Empirical Results ▪ 1,000 source-target app pairs (2 app categories × 100 app pairs per category × 5 techniques) ▪ 11,917 result entries ▪ 7 fidelity metrics ▪ 2 utility metrics
  30. FrUITeR’s Empirical Results [figure-only slide]

  31. FrUITeR’s Empirical Results: https://felicitia.github.io/FrUITeR/

  32. FrUITeR’s Selected Implications ▪ Perfect isn’t perfect (e.g., fidelity vs. utility) ▪ Source app selection (e.g., app company, code clones) ▪ Testing technique selection (e.g., trade-offs, manual vs. auto)
  33. Thanks! Any questions? yixue.zhao@usc.edu @yixue_zhao https://softarch.usc.edu/~yixue/