Slide 1

FrUITeR: A Framework for Evaluating UI Test Reuse
Yixue Zhao¹, Justin Chen², Adriana Sejfia¹, Marcelo Schmitt Laser¹, Jie Zhang³, Federica Sarro³, Mark Harman³, Nenad Medvidović¹
ESEC/FSE 2020, Virtual Event

Slide 2

What’s UI Test Reuse?
▪ New and exciting!! 🙂
▪ Test generation technique
▪ “Usage-based” UI tests
▪ Reuse existing tests
▪ UI similarities
[Diagram: UI transition graphs of a source app and a target app, with similar screens (a1, b1, b2, b3) and elements (a1-1…a1-3, b1-1, b2-1, b3-1…b3-3)]

Slide 3

Why an evaluation framework?
[Cartoon “Ideal”: happy Yixue in her proposal, saying “UI Test Reuse!”]

Slide 4

Why an evaluation framework?
[Cartoon “Ideal”: happy Yixue in her proposal, saying “UI Test Reuse!”]
[Cartoon “Reality”: happier people submitting papers, saying “I’m done hehe!”]

Slide 5

FrUITeR was born: A Framework for Evaluating UI Test Reuse
▪ Limitations of existing work
▪ Which techniques are better?
▪ Compare my new technique

Slide 6

So hard!
▪ Identify 5 key challenges
▪ Establish 5 requirements
▪ Design framework
▪ Build benchmark
▪ Migrate existing work

Slide 7

Challenges
▪ Evaluation metrics are different and limited
▪ Significant manual effort
▪ No guidelines for manual inspection
▪ One-off solutions evaluated as a whole
▪ Different benchmarks

Slide 8

1. Metrics: Evaluation metrics are different and limited
50% != 50%
Accuracy != Good
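To illustrate why equal accuracy numbers need not mean equal quality, here is a hypothetical sketch (the event names and mappings are made up, not from the FrUITeR benchmark): two techniques both score 50%, yet succeed on entirely different events.

```python
# Ground-truth mapping of source events to target events (hypothetical)
gt = {"a1-1": "b3-1", "a1-2": "b3-2", "a1-3": "b3-3", "a1-4": "b3-4"}

# Two hypothetical techniques, each mapping exactly 2 of the 4 events correctly
tech_a = {"a1-1": "b3-1", "a1-2": "b3-2", "a1-3": "b2-1", "a1-4": "b1-1"}
tech_b = {"a1-3": "b3-3", "a1-4": "b3-4", "a1-1": "b2-1", "a1-2": "b1-1"}

def accuracy(result):
    """Fraction of ground-truth events mapped correctly."""
    return sum(gt[src] == tgt for src, tgt in result.items() if src in gt) / len(gt)

# Both report 50% accuracy, but they succeed on disjoint sets of events,
# so the transferred tests behave very differently in practice.
print(accuracy(tech_a), accuracy(tech_b))  # 0.5 0.5
```

A single aggregate accuracy number hides which events were transferred, which is exactly why one metric is not enough.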

Slide 9

[Diagram: UI transition graphs of Wish and Etsy, with screens a1/b1–b3 and elements a1-1…a1-3, b1-1, b2-1, b3-1…b3-3]

Slide 10

2. Manual: Significant manual effort

Slide 11

[Diagram: Wish and Etsy UI graphs]
Wish → Etsy: 1 test w/ 3 events = 3 times

Slide 12

Wish → Etsy: 1 test w/ 3 events = 3 times
10 apps (100 pairs), 10 tests w/ 10 events = 100 × 100 times!
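The scaling arithmetic on this slide, spelled out (all numbers come from the slide itself):

```python
# One migration: 1 test with 3 events, each event inspected by hand
checks_one_pair = 1 * 3  # 3 manual checks

# 10 apps yield 10 x 10 = 100 ordered source-target pairs;
# each pair migrates 10 tests of 10 events each
pairs = 10 * 10
checks_all = pairs * 10 * 10  # 10,000 checks, i.e. "100 x 100 times"

print(checks_one_pair, checks_all)  # 3 10000
```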

Slide 13

3. Guideline: No guidelines for manual inspection
Biased & unreproducible

Slide 14

[Diagram: Wish and Etsy UI graphs]
a1-1 → b3-1???

Slide 15

Address Challenge #1: Metrics
1. Evaluation metrics are different and limited

Slide 16

Address Challenge #1: Metrics
1. Evaluation metrics are different and limited
7 Fidelity Metrics

Slide 17

Address Challenge #1: Metrics
1. Evaluation metrics are different and limited
2 Utility Metrics
Effort = Levenshtein Distance(transEvents, gtEvents)
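The Effort metric above is the edit distance between the transferred event sequence and the ground-truth sequence. A minimal sketch of Levenshtein distance over event lists (the event names below are hypothetical):

```python
def levenshtein(a, b):
    """Edit distance between two event sequences (insert/delete/substitute, cost 1 each)."""
    m, n = len(a), len(b)
    # dp[i][j] = distance between a[:i] and b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete a[i-1]
                           dp[i][j - 1] + 1,        # insert b[j-1]
                           dp[i - 1][j - 1] + cost)  # substitute (or match)
    return dp[m][n]

# Hypothetical sequences: transferred test vs. ground-truth test
trans_events = ["enter_email", "enter_password", "click_login"]
gt_events = ["enter_username", "enter_password", "click_login"]
effort = levenshtein(trans_events, gt_events)  # 1 (one substitution)
```

The intuition: Effort approximates how many events a developer must fix by hand to turn the transferred test into the ground-truth test.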

Slide 18

Address Challenges #2 & #3
2. Significant manual effort
3. No guidelines for manual inspection

Slide 19


Slide 20

Uniform Representation

Slide 21

Uniform Representation

Slide 22


Slide 23

[Diagram: Wish and Etsy UI graphs linked through a Canonical Map]
Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in

Slide 24

Address Challenge #2: the Canonical Maps are the ONLY manual effort
Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in
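The Canonical Maps on this slide can be composed to derive the ground-truth transfer automatically: elements with the same canonical label correspond to each other. A dict-based sketch (the representation is my assumption, not FrUITeR's actual implementation):

```python
# Canonical maps from the slide: app-specific element id -> canonical label
wish_map = {"a1-1": "email", "a1-2": "password", "a1-3": "sign in"}
etsy_map = {"b3-1": "email", "b3-2": "password", "b3-3": "sign in"}

def ground_truth_transfer(src_map, tgt_map):
    """Derive the source->target element mapping by matching canonical labels."""
    label_to_tgt = {label: elem for elem, label in tgt_map.items()}
    return {src: label_to_tgt[label]
            for src, label in src_map.items() if label in label_to_tgt}

gt = ground_truth_transfer(wish_map, etsy_map)
print(gt)  # {'a1-1': 'b3-1', 'a1-2': 'b3-2', 'a1-3': 'b3-3'}
```

Once each app has a map, any source-target pair can be evaluated without further manual inspection, which is why the maps are the only manual effort.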

Slide 25

Wish → Etsy: 1 test w/ 3 events = 3 times
10 apps (100 pairs), 10 tests w/ 10 events = 100 × 100 times!

Slide 26

Address Challenge #3
[Diagram: Wish and Etsy UI graphs]
Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
Etsy Canonical Map: b3-1 → email, b3-2 → password, b3-3 → sign in

Slide 27

Address Challenge #3
[Diagram: Wish and Etsy UI graphs]
Wish Canonical Map: a1-1 → email, a1-2 → password, a1-3 → sign in
Etsy Canonical Map: b3-1 → username (revised from “email”), b3-2 → password, b3-3 → sign in

Slide 28

More challenges in reality…
▪ Contact authors
▪ Study implementation
▪ Modify/verify implementation
▪ Establish 239 benchmark tests
▪ Construct 20 ground-truth Canonical Maps

Slide 29

FrUITeR’s Empirical Results
▪ 1,000 source-target app pairs (2 app categories × 100 app pairs per category × 5 techniques)
▪ 11,917 result entries
▪ 7 fidelity metrics
▪ 2 utility metrics

Slide 30

FrUITeR’s Empirical Results

Slide 31

FrUITeR’s Empirical Results
https://felicitia.github.io/FrUITeR/

Slide 32

FrUITeR’s Selected Implications
▪ Perfect isn’t perfect (e.g., fidelity vs. utility)
▪ Source app selection (e.g., app company, code clone)
▪ Testing technique selection (e.g., trade-offs, manual vs. auto)

Slide 33

Thanks! Any questions?
yixue.zhao@usc.edu
@yixue_zhao
https://softarch.usc.edu/~yixue/