
FrUITeR: A Framework for Evaluating UI Test Reuse

Yixue Zhao
November 02, 2020


Presentation slides of the paper "FrUITeR: A Framework for Evaluating UI Test Reuse" at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2020.
Presentation link: https://youtu.be/zVWpT5aLyQo


Transcript

  1. FrUITeR: A Framework for Evaluating UI Test Reuse
     Yixue Zhao¹, Justin Chen², Adriana Sejfia¹, Marcelo Schmitt Laser¹, Jie Zhang³, Federica Sarro³, Mark Harman³, Nenad Medvidović¹
     ESEC/FSE 2020, Virtual Event
  2. What’s UI Test Reuse?
     ▪ New and exciting! :) ▪ A test-generation technique ▪ “Usage-based” UI tests ▪ Reuse existing tests ▪ Exploit UI similarities
     [Diagram: UI transition graphs of a source app and a target app]
  3. Why an evaluation framework?
     Ideal: happy Yixue in her proposal (“UI Test Reuse!”)
  4. Why an evaluation framework?
     Ideal: happy Yixue in her proposal (“UI Test Reuse!”)
     Reality: happier people submitting papers (“I’m done hehe!”)
  5. FrUITeR was born: A Framework for Evaluating UI Test Reuse
     ▪ Limitations of existing work ▪ Which techniques are better? ▪ Compare my new technique
  6. So hard! ▪ Identify 5 key challenges ▪ Establish 5 requirements ▪ Design the framework ▪ Build a benchmark ▪ Migrate existing work
  7. Challenges ▪ Evaluation metrics are different and limited ▪ Significant manual effort ▪ No guidelines for manual inspection ▪ One-off solutions evaluated as a whole ▪ Different benchmarks
  8. Metrics 1. Evaluation metrics are different and limited: 50% != 50%, and Accuracy != Good
  9. [Diagram: UI transition graphs of Wish and Etsy]
  10. Manual 2. Significant manual effort

  11. [Diagram: UI transition graphs of Wish and Etsy]
     Wish → Etsy: 1 test w/ 3 events → inspect 3 times
  12. Wish → Etsy: 1 test w/ 3 events → inspect 3 times
     10 apps (100 pairs), 10 tests w/ 10 events each → inspect 100 × 100 times!
  13. Guideline 3. No guidelines for manual inspection: biased & unreproducible

  14. [Diagram: UI transition graphs of Wish and Etsy] a1-1 → b3-1???
  15. Address Challenge #1 (Metrics): 1. Evaluation metrics are different and limited
  16. Address Challenge #1 (Metrics): 1. Evaluation metrics are different and limited → 7 Fidelity Metrics
  17. Address Challenge #1 (Metrics): 2 Utility Metrics, e.g., Effort = LevenshteinDistance(transEvents, gtEvents)
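The Effort metric on this slide can be sketched in a few lines. Representing `trans_events` and `gt_events` as plain lists of event strings is an illustrative assumption, not FrUITeR's exact internal representation:

```python
def edit_distance(trans_events, gt_events):
    """Levenshtein distance between two event sequences: the number of
    event insertions, deletions, and substitutions needed to turn the
    transferred test into the ground-truth test."""
    m, n = len(trans_events), len(gt_events)
    # dp[i][j] = distance between the first i transferred events
    # and the first j ground-truth events
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if trans_events[i - 1] == gt_events[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete an event
                           dp[i][j - 1] + 1,         # insert an event
                           dp[i - 1][j - 1] + cost)  # substitute an event
    return dp[m][n]
```

A transferred test with one spurious event against a 2-event ground truth would score 1, i.e., one manual fix to repair it.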
  18. Address Challenges #2 & #3: 2. Significant manual effort; 3. No guidelines for manual inspection
  19. [figure-only slide]

  20. Uniform Representation

  21. Uniform Representation
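As a sketch of what a uniform representation buys: each technique's recorded actions can be normalized into one shared event type, so the same metrics apply to any tool's output. The field names below are illustrative assumptions, not FrUITeR's exact schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    """One UI event in a uniform, tool-agnostic form."""
    widget_id: str   # widget acted on, e.g. "a1-1" from the slides
    action: str      # e.g. "click", "type"
    value: str = ""  # input text for "type" events, empty otherwise

# The 3-event Wish sign-in test from the slides, expressed uniformly
wish_test = [
    Event("a1-1", "type", "user@example.com"),
    Event("a1-2", "type", "secret"),
    Event("a1-3", "click"),
]
```

Because every technique's output lands in this one shape, metrics like the Levenshtein-based Effort can compare tests without per-tool adapters.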

  22. [figure-only slide]

  23. [Diagram: UI transition graphs of Wish and Etsy linked by Canonical Maps]
     Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
     Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in
  24. Address Challenge #2: the Canonical Maps are the ONLY manual effort
     Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
     Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in
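The canonical-map idea can be sketched as follows, using the sign-in example from the slides. Each per-app map is written by hand once; after that, judging any transferred event is automatic and deterministic. The dict encoding and function name are illustrative assumptions:

```python
# Per-app canonical maps, built manually once per app (the ONLY
# manual effort); map contents mirror the slides.
wish_map = {"a1-1": "email", "a1-2": "password", "a1-3": "sign in"}
etsy_map = {"b3-1": "email", "b3-2": "password", "b3-3": "sign in"}

def is_correct_transfer(src_widget, tgt_widget):
    """A transferred event is correct when the source and target widgets
    share the same canonical label: an unambiguous, reproducible
    criterion instead of ad hoc manual judgment."""
    src = wish_map.get(src_widget)
    tgt = etsy_map.get(tgt_widget)
    return src is not None and src == tgt
```

Under these maps, slide 14's question "a1-1 → b3-1???" has a fixed answer: both are labeled email, so the transfer is correct, and no two inspectors can disagree.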
  25. Wish → Etsy: 1 test w/ 3 events → inspect 3 times
     10 apps (100 pairs), 10 tests w/ 10 events each → inspect 100 × 100 times!
  26. Address Challenge #3
     [Diagram: UI transition graphs of Wish and Etsy]
     Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
     Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in
  27. Address Challenge #3
     [Diagram: UI transition graphs of Wish and Etsy]
     Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
     Etsy Canonical Map: b3-1 → username (revised from “email”), b3-2 → password, b3-3 → sign in
  28. More challenges in reality… ▪ Contact authors ▪ Study implementation ▪ Modify/verify implementation ▪ Establish 239 benchmark tests ▪ Construct 20 ground-truth Canonical Maps
  29. FrUITeR’s Empirical Results ▪ 1,000 source-target app pairs (2 app categories × 100 app pairs per category × 5 techniques) ▪ 11,917 result entries ▪ 7 fidelity metrics ▪ 2 utility metrics
  30. FrUITeR’s Empirical Results [figure-only slide]

  31. FrUITeR’s Empirical Results: https://felicitia.github.io/FrUITeR/

  32. FrUITeR’s Selected Implications ▪ Perfect isn’t perfect (e.g., fidelity vs. utility) ▪ Source app selection (e.g., app company, code clones) ▪ Testing technique selection (e.g., trade-offs, manual vs. auto)
  33. Thanks! Any questions? yixue.zhao@usc.edu @yixue_zhao https://softarch.usc.edu/~yixue/