MULTISTAGING TO UNDERSTAND: DISTILLING THE ESSENCE OF CODE EXAMPLES

Huascar Sanchez (SRI International), Jim Whitehead (UC Santa Cruz), and Martin Schäf (SRI International)
ICPC’16, May 16-17, 2016

MTU - ICPC’16 - 05 16, 2016

Distilling the Essence of Code Examples

Problem: Understanding unfamiliar code during code foraging is laborious and challenging.
• Much of the information in a code example is either peripheral or obscured by other elements.
• Tools offer little support for locating the essential sections of a code example and then helping programmers understand them.

Solution: Deliver a method (and its tool) that discovers these essential sections and reveals only their relevant details.

Multistage Representation of Code

Code example:

    …
    public final class RandGaussianDistrib {
      private static final Random R…
      static double uniform() { return R.nextDouble(); } // uniform value in [0, 1)
      static double uniform(double a, double b) { return a + uniform() * (b - a); }
      static double gaussian() {
        double r, x, y;
        // polar method: draw (x, y) uniformly until it falls inside the unit circle
        do {
          x = uniform(-1.0, 1.0);
          y = uniform(-1.0, 1.0);
          r = (x * x) + (y * y);
        } while (r >= 1 || r == 0);
        return x * Math.sqrt(-2 * Math.log(r) / r);
      }
    }

1: Code stage

    …
    public final class RandGaussianDistrib {
      private static final Random R…
      static double uniform() { return R.nextDouble(); }
    }

2: Code stage

    …
    public final class RandGaussianDistrib {
      private static final Random R…
      static double uniform() { return R.nextDouble(); }
      static double uniform(double a, double b) { return a + uniform() * (b - a); }
    }

3: Code stage

    …
    public final class RandGaussianDistrib {
      private static final Random R…
      static double uniform() {…}
      static double uniform(double a, double b) {…}
      static double gaussian() {
        double r, x, y;
        do {…} while (r >= 1 || r == 0);
        return x * Math.sqrt(-2 * Math.log(r) / r);
      }
    }

Multistage Representation of Code

Read in order, the three code stages of RandGaussianDistrib (1: uniform(); 2: uniform() plus uniform(double a, double b); 3: gaussian(), with the bodies of both uniform methods and of the do loop folded) form a roadmap for steering understanding.

Implementation

Multistaging Implementation

The browser plug-in's Stage button sends a multistaging request (the source code and a capacity) to a RESTful service running MethodStaging w/Reduction. The service responds with a tuple (S, H*): the code stages S (e.g., 1: Get Pivot Index, 2: Select, 3: Main) and the code to hide, H* (e.g., hi ∈ H*).

Background excerpt from the paper shown on the slide:

Fig. 6. One application of MethodStaging against the SmallestNum code example: (a) code stage 1, (b) code stage 2, (c) code stage 3.

Fig. 7. GetBindingsIn subroutine.

    function GETBINDINGSIN(m)
      V, S, R ← {}
      W ← {target node types}
      S ← S ∪ {m}
      while S is not empty do
        u ← pop S
        if u ∉ V then
          V ← V ∪ {u}
          for each child node w in u do
            if w ∈ W then
              R ← R ∪ {binding of w}
            end if
            S ← S ∪ {w}
          end for
        end if
      end while
      return R  // Set of bindings in m
    end function

Fig. 8. ReconstructSourceCode subroutine.

    function RECONSTRUCTSOURCECODE(p, d)
      // deletes declaration nodes n ∈ {p \ d} from AST p
      p′ ← {n | n ∈ p and n ∉ {p \ d}}
      return source code for p′
    end function

B. MethodStaging with Reduction

Programmers dealing with large code stages are often confronted with the consequent information overload problem. We can mitigate this problem by automatically reducing large code stages. The rationale is that reduced code stages are easier to digest for programmers who want a quick overview of their operation. We make reduction decisions in MethodStaging based on the examples' source code structure. Our approach is consistent with how human abstractors approach inspecting unfamiliar …

Equation 1. The usage score of a code block represents the demand for its elements throughout the code example. The usage frequency of each element in a code block is the number of times this element appears in a code stage. As a result, we use code blocks' usage scores to show the blocks with a higher demand and hide those with a lesser demand.

$$\mathrm{UsageScore}(b) = \frac{\sum_{elem \in b} \mathrm{UsageFreq}(elem)}{\mathrm{TotalChildren}(b)} \qquad (1)$$

For example, given a nested code block at line 11 in Figure 4c, we first collect its children: temp, list, left, and right. Second, we compute each child's usage frequency: 2, 7, 10, and 9. Lastly, we put it all together and calculate the nested code block's usage score: (2 + 7 + 10 + 9)/4 = 7.

We cast the problem of reducing large code stages as an instance of the Precedence Constrained Knapsack Problem, or PCKP [17], specified as follows.

Problem 3.2: Code Stage Reduction. Given a set of code blocks B (with weight w_b and profit p_b per block b ∈ B), a knapsack capacity W, a precedence order O ⊆ B × B, and a set of constraints C, find H* such that H* = B \ X*, where w_b = number of lines of code in b, p_b = UsageScore(b), and

$$X^* = \arg\max_{X \subseteq B} \sum_{b \in X} p_b$$

subject to X* satisfying the constraints in C. The constraints in C include $\sum_{b \in X^*} w_b \le W$ and, for every $b_i \to b_j \in O$ ($b_i$ precedes $b_j$), $b_j \in X^*$ only if $b_i \in X^*$, where $i, j = 1, \dots, |B|$.

Similar to Samphaiboon et al. [17], we solve this problem using dynamic programming. Our solution generalizes the code stage reduction problem by also taking into account a precedence relation between code blocks in a code stage. We build a Directed Acyclic Graph (DAG) to represent this relation, where nodes correspond to code blocks in a one-to-one fashion. The relation is expressed as a composition relation between code blocks. For instance, a code block …
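GetBindingsIn is an iterative depth-first traversal of a method's AST that collects the bindings of nodes whose type is one of the target node types W. The Java sketch below mirrors that loop under stated assumptions: `AstNode` is a hypothetical stand-in for a real AST node type (such as the Eclipse JDT nodes the tool relies on), node types and bindings are plain strings, and nodes without a binding are skipped.

```java
import java.util.*;

// Hypothetical minimal AST node: a node type, an optional binding, and child nodes.
class AstNode {
    final String type;
    final String binding; // null when the node resolves to no declaration
    final List<AstNode> children;

    AstNode(String type, String binding, AstNode... children) {
        this.type = type;
        this.binding = binding;
        this.children = Arrays.asList(children);
    }
}

class GetBindingsInSketch {
    // Collects the bindings of every descendant of m whose type is in targetTypes (W).
    static Set<String> getBindingsIn(AstNode m, Set<String> targetTypes) {
        Set<AstNode> visited = new HashSet<>();      // V (identity-based: AstNode keeps Object.equals)
        Deque<AstNode> stack = new ArrayDeque<>();   // S
        Set<String> bindings = new LinkedHashSet<>(); // R
        stack.push(m);
        while (!stack.isEmpty()) {
            AstNode u = stack.pop();
            if (visited.add(u)) {                    // u not yet in V
                for (AstNode w : u.children) {
                    if (targetTypes.contains(w.type) && w.binding != null) {
                        bindings.add(w.binding);
                    }
                    stack.push(w);
                }
            }
        }
        return bindings;
    }
}
```

Because R is a set, a method that calls uniform(double, double) twice still contributes a single binding.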

Multistaging in Action

Multistaging to Understand

Exploring the code stages suggests a form of code inspection called Multistaging to Understand (MTU). By adopting MTU, programmers can:
• inspect a few generated code stages,
• mentally abstract their functionality, and then
• combine the gained knowledge to understand the main functionality.

MTU shares similarities with code reading by stepwise abstraction (Linger et al., 1979).

MTU Evaluation

MTU in the Lab

We consider the following question: Does MTU make understanding unfamiliar code examples easier during code foraging? Here, easier means:
• high comprehension accuracy, and
• short reviewing time.

Experimental Setup (Babbie, 2015)

• 12 participants, 2 groups, 3 tasks, 120 minutes
• Crossed factorial design with 2 factors:
  • Between-subjects: comprehension strategy
  • Within-subjects: size of code examples
• Variables: response accuracy and reviewing time

Response Accuracy

Open-ended questions addressing five comprehension abstractions (Pennington, 1987):
• Function: Describe the overall functionality.
• Control flow: Describe the execution sequence using pseudocode.
• Data flow: Describe when a data object gets updated.
• Operations: Describe a data object's need in an execution sequence.
• State: Describe a data object's composition at a point of execution.

Answers are scored with a rating scheme (Du Bois, 2005): Correct (10 pts), Almost Correct (8 pts), Right Idea (5 pts), and Wrong (0 pts).
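To make the scoring concrete, here is a small Java sketch that maps each rating to its points and averages one group's answers to one question. The enum, the helper name, and the sample ratings are hypothetical illustrations, not study data.

```java
import java.util.*;

class RatingSketch {
    // Rating scheme from the slide (Du Bois, 2005).
    enum Rating {
        CORRECT(10), ALMOST_CORRECT(8), RIGHT_IDEA(5), WRONG(0);

        final int points;

        Rating(int points) {
            this.points = points;
        }
    }

    // Average points over one group's answers to one question (0.0 if there are no answers).
    static double averageAccuracy(List<Rating> answers) {
        return answers.stream().mapToInt(r -> r.points).average().orElse(0.0);
    }
}
```

For the hypothetical answers CORRECT, ALMOST_CORRECT, RIGHT_IDEA, and WRONG, the average is (10 + 8 + 5 + 0)/4 = 5.75.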

Reviewing Time

Reviewing times were collected from two sources:
• the browser plug-in's time tracker, and
• Upwork's time tracker.

Results

Average Response Accuracy

Differences in average response accuracy favor the treatment group (MTU) over the control group (RTU) for every abstraction and code example size; the differences are significant at p < 0.05 for Function (short), Data flow (short), and Control flow (long).

Size of code example (LOC):
               | Short [35, 70)     | Medium [70, 140)   | Long [140, 200]
               | MTU   RTU  p-value | MTU   RTU  p-value | MTU   RTU  p-value
Function       | 6.83  3.33  0.0037 | 7.17  3.83  0.0509 | 7.67  5.00  0.0534
Control flow   | 8.50  6.83  0.0525 | 7.17  4.33  0.1984 | 8.17  4.33  0.0204
Data flow      | 8.67  6.17  0.0462 | 5.33  3.00  0.2308 | 8.50  6.00  0.1199
State          | 8.67  7.00  0.0873 | 7.67  5.67  0.1594 | 9.00  6.50  0.0971
Operations     | 7.33  3.33  0.0595 | 7.83  4.83  0.0609 | 6.50  3.00  0.0549

Unaccounted factor: delocalization (Letovsky et al., 1986). Delocalization led to many wrong answers, which caused high p-values.

Note: Rating scheme for scoring the accuracy of answers: Correct (10 points), Almost Correct (8 points), Right Idea (5 points), and Wrong (0 points).

Average Reviewing Time

The treatment group (MTU) reviewed code faster than the control group (RTU) for all three sizes; the speed improvement is significant at p < 0.05 for medium and long code examples.

                      | Short [35, 70)     | Medium [70, 140)   | Long [140, 200]
                      | MTU  RTU   p-value | MTU  RTU   p-value | MTU  RTU   p-value
Reviewing time (secs) | 475  745   0.0995  | 655  1022  0.0446  | 465  912   0.0284

Note: Reviewing times were obtained from two sources: Upwork's time tracker and Violette's time tracker.
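The size of the speed improvement is easier to compare as a relative reduction, (RTU − MTU) / RTU. The Java sketch below computes it from the table's averages; the class and method names are illustrative only.

```java
class ReviewingTimeSketch {
    // Fraction of the control group's (RTU) average reviewing time saved by the treatment group (MTU).
    static double relativeReduction(double rtuSecs, double mtuSecs) {
        return (rtuSecs - mtuSecs) / rtuSecs;
    }

    public static void main(String[] args) {
        String[] sizes = {"Short", "Medium", "Long"};
        double[] mtu = {475, 655, 465};  // average reviewing time (secs), MTU
        double[] rtu = {745, 1022, 912}; // average reviewing time (secs), RTU
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("%s: %.1f%% less time%n", sizes[i], 100 * relativeReduction(rtu[i], mtu[i]));
        }
    }
}
```

Running `main` prints a reduction of 36.2% for short, 35.9% for medium, and 49.0% for long code examples.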

Summarizing

• MTU helps facilitate quick and accurate understanding when most of the code is localized.
• MTU provides minor benefits when code is partially or fully delocalized.
• MTU provides consistent speed improvements regardless of delocalization.

Questions?
https://github.com/vesperin


Multistager Architecture

The Multistaging Problem

Given the AST of a code example with a set of n method declarations D = D1 ∪ D2 ∪ … ∪ Dn, compute a set of interconnected code stages {S | S ⊆ D × D}, sorted in ascending order by LOC, such that each code stage s ∈ S ∪ {sØ} builds upon, and in relation to, preceding code stages, where:
• sØ is the null code stage (sØ's preceding code stage is sØ itself), and
• si < sj means si precedes sj, for i, j = 1, …, |S|.

Solution: MethodStaging Algorithm

    Algorithm: MethodStaging(p /* AST */, sØ)
      Stages = {sØ}
      for each method m in p do
        d = {}  // declarations set
        for each binding b in GetBindingsIn(m) do  // e.g., b = (select, method)
          d = d ∪ {getDeclarationNode(b)}
        end for
        s = source code for {n | n ∈ p ∧ n ∉ {p \ d}}
        Stages = Stages ∪ {s}
      end for
      return sortAscending(Stages)
    end Algorithm
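The MethodStaging loop can be sketched in Java on a simplified model. This is not the authors' implementation: declarations are plain names, a hypothetical `refs` map stands in for GetBindingsIn plus getDeclarationNode, and a hypothetical `loc` map gives each declaration's LOC. The sketch also follows references transitively, an assumption suggested by code stage 3 of RandGaussianDistrib, which contains uniform() although gaussian() only calls uniform(double a, double b).

```java
import java.util.*;

class MethodStagingSketch {
    /**
     * loc:  hypothetical LOC per declaration.
     * refs: hypothetical map from each method to the declarations its body references.
     * Returns the code stages (sets of declaration names), including the null stage,
     * sorted in ascending order by total LOC.
     */
    static List<Set<String>> methodStaging(Map<String, Integer> loc,
                                           Map<String, Set<String>> refs) {
        List<Set<String>> stages = new ArrayList<>();
        stages.add(new LinkedHashSet<>()); // the null code stage sØ
        for (String m : refs.keySet()) {
            // d: m plus every declaration reachable from it through references
            Set<String> d = new LinkedHashSet<>();
            Deque<String> work = new ArrayDeque<>();
            work.push(m);
            while (!work.isEmpty()) {
                String u = work.pop();
                if (d.add(u)) {
                    for (String w : refs.getOrDefault(u, Collections.emptySet())) {
                        work.push(w);
                    }
                }
            }
            if (!stages.contains(d)) { // Stages is a set in the algorithm
                stages.add(d);
            }
        }
        // sortAscending: order stages by their total LOC (List.sort is stable)
        stages.sort(Comparator.comparingInt(
                s -> s.stream().mapToInt(name -> loc.getOrDefault(name, 0)).sum()));
        return stages;
    }
}
```

With gaussian() referencing uniform(double, double), which references uniform(), the sketch returns the null stage followed by the uniform(), uniform(double, double), and gaussian() stages, matching the order of the RandGaussianDistrib code stages.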

Reflections on MethodStaging

MethodStaging provides an effective divide-and-conquer approach to code understanding. One caveat: it can produce large code stages.
• Large code stages (code stages with long methods) can hinder MethodStaging's effectiveness.
• Long methods tend to increase programmers' cognitive overhead more than short methods (Mantyla et al., 2003).

Solution: MethodStaging w/Reduction (via code folding).

MethodStaging w/Reduction Basics

Reduction in MethodStaging shows the code blocks with a high usage score in each code stage s (X*) and hides, i.e., folds, the ones with a low usage score (H*), where X* ∪ H* ⊆ s.
• The usage frequency of an element in a code block b ⊆ s is the number of times it appears in s.
• The usage score of b is the sum of its children's usage frequencies divided by its number of children:

$$\mathrm{UsageScore}(b) = \frac{\sum_{elem \in b} \mathrm{UsageFreq}(elem)}{\mathrm{TotalChildren}(b)}$$
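A minimal Java sketch of UsageFreq and UsageScore follows. It assumes a code stage has already been reduced to a list of its element occurrences (e.g., identifiers) and that a block's children are given by name; returning 0 for a block without children is an assumption, since the formula is undefined there. The `main` method reproduces the paper's worked example: children temp, list, left, and right with usage frequencies 2, 7, 10, and 9.

```java
import java.util.*;

class UsageScoreSketch {
    // UsageFreq: how many times each element appears in a code stage.
    static Map<String, Integer> usageFreq(List<String> stageElements) {
        Map<String, Integer> freq = new HashMap<>();
        for (String e : stageElements) {
            freq.merge(e, 1, Integer::sum);
        }
        return freq;
    }

    // UsageScore(b): sum of the children's usage frequencies divided by TotalChildren(b).
    static double usageScore(List<String> children, Map<String, Integer> freq) {
        if (children.isEmpty()) {
            return 0.0; // assumption: a block without children scores 0
        }
        int sum = 0;
        for (String child : children) {
            sum += freq.getOrDefault(child, 0);
        }
        return (double) sum / children.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new HashMap<>();
        freq.put("temp", 2);
        freq.put("list", 7);
        freq.put("left", 10);
        freq.put("right", 9);
        // (2 + 7 + 10 + 9) / 4 = 7
        System.out.println(usageScore(Arrays.asList("temp", "list", "left", "right"), freq));
    }
}
```

Running `main` prints 7.0, matching the paper's result for that nested block.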

MethodStaging w/Reduction (Formulated as a Precedence-Constrained Knapsack Problem)

Given a set of code blocks B (with a weight w_b and profit p_b per b ∈ B), a knapsack capacity W, a precedence order O ⊆ B × B (modeled as a DAG), and a set of constraints C, find the set H* such that H* = B \ X*, where w_b = LOC(b), p_b = UsageScore(b), and

$$X^* = \arg\max_{X \subseteq B} \sum_{b \in X} p_b$$

subject to X* satisfying the constraints in C, where C includes:
• $\sum_{b \in X^*} w_b \le W$
• for every $b_i \to b_j \in O$ ($b_i$ precedes $b_j$), $b_j \in X^*$ only if $b_i \in X^*$, where $i, j = 1, \dots, |B|$

Multistaging Problem

MethodSlicing w/Reduction

    Input: AST node p, and declarations d ∈ p
    Output: A tuple consisting of the reconstructed source code and H*
    Function ReconstructSourceCode(p, d)
      // delete nodes {p \ d} from AST
      let p′ ← JDT.deleteAstNodes(p, {p \ d})
      let DAG_p′ ← traverse p′ and then get built DAG
      let H* ← compute B_p′ \ X*_p′ using DAG_p′ and a capacity of 15 LOC
      return (JDT.getSourceCode(p′), H*)
    end

Figure 11: Pseudocode for updated ReconstructSourceCode. This subroutine returns a tuple comprising the reconstructed source code and the code elements to hide.
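The effect of ReconstructSourceCode on a code stage (keep only the declarations in d, then fold the hidden code) can be sketched without an AST library. The flat `Decl` model, the `reconstruct` helper, and folding whole method bodies instead of individual code blocks are simplifying assumptions; Figure 11 itself deletes AST nodes and regenerates source through JDT.

```java
import java.util.*;

class ReconstructSketch {
    // Hypothetical flat model of a member declaration: a name, its header, and its body text.
    static final class Decl {
        final String name;
        final String header;
        final String body;

        Decl(String name, String header, String body) {
            this.name = name;
            this.header = header;
            this.body = body;
        }
    }

    // Keeps the declarations in d (dropping p \ d) and folds the body of each declaration in hidden.
    static String reconstruct(List<Decl> p, Set<String> d, Set<String> hidden) {
        StringBuilder out = new StringBuilder();
        for (Decl decl : p) {
            if (!d.contains(decl.name)) {
                continue; // node in p \ d: deleted from this stage
            }
            String body = hidden.contains(decl.name) ? "{...}" : decl.body;
            out.append(decl.header).append(' ').append(body).append('\n');
        }
        return out.toString();
    }
}
```

Applied to the three RandGaussianDistrib methods with both uniform methods hidden, this yields a stage shaped like code stage 3: folded uniform bodies followed by the full gaussian() body.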

Generating the set H*

[Figure: a DAG of code blocks B1 to B12, each annotated with its weight and profit (wb/pb), shown once with the selected blocks X* and once with the hidden blocks H*.]

1. Build the DAG from the example's AST:
  • bi → bj means bi precedes bj.
  • wb and pb are calculated for each block, with wb = wb-original − (wc + wd).
2. Solve X* using dynamic programming:

$$X^*[k, w] = \begin{cases} X^*[k-1, w] & \text{if } w_k > w \\ \max\left(X^*[k-1, w],\ X^*[k-1, w - w_k] + p_k\right) & \text{if } w_k \le w \wedge k-1 \to k \end{cases}$$

3. Generate H*: H* = B \ X*.
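The three steps can be sketched in Java for the common case where the precedence DAG is a tree of nested code blocks. This swaps in a standard preorder ("DFS-order") tree-knapsack dynamic program rather than the slide's exact recurrence, and it rests on stated assumptions: blocks are listed in preorder, a nested block may be kept only if its enclosing block is kept, each weight excludes the lines of nested blocks (as wb = wb-original − (wc + wd) suggests), and profits are integers, like the wb/pb pairs annotated in the figure.

```java
import java.util.*;

class ReductionSketch {
    /**
     * Blocks are in preorder, so each block precedes the blocks nested in it.
     *   w[i]   : weight (LOC) of block i, excluding its nested blocks
     *   p[i]   : integer profit (usage score) of block i
     *   end[i] : index just past the last block nested in block i
     * Returns H*, the indices of the blocks to hide (fold).
     */
    static Set<Integer> hiddenBlocks(int[] w, int[] p, int[] end, int capacity) {
        int n = w.length;
        // best[i][c]: max profit from blocks i..n-1 with c lines left,
        // given that every block enclosing block i is kept. best[n][*] = 0.
        int[][] best = new int[n + 1][capacity + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int c = 0; c <= capacity; c++) {
                int hide = best[end[i]][c]; // hiding i also hides everything nested in it
                int keep = (w[i] <= c) ? best[i + 1][c - w[i]] + p[i] : Integer.MIN_VALUE;
                best[i][c] = Math.max(hide, keep);
            }
        }
        // Walk the table forward to recover X*; every block not kept goes into H*.
        Set<Integer> hidden = new TreeSet<>();
        int i = 0;
        int c = capacity;
        while (i < n) {
            if (w[i] <= c && best[i][c] == best[i + 1][c - w[i]] + p[i]) {
                c -= w[i]; // keep block i, then consider its nested blocks
                i++;
            } else {
                for (int j = i; j < end[i]; j++) {
                    hidden.add(j); // hide block i and its nested blocks
                }
                i = end[i];
            }
        }
        return hidden;
    }
}
```

For example, with hypothetical blocks w = {3, 2, 4, 2}, p = {5, 4, 1, 3}, end = {3, 2, 3, 4} (blocks 1 and 2 nested in block 0) and a capacity of 7, the sketch keeps blocks 0, 1, and 3 (7 lines, profit 12) and returns H* = {2}. The table has (n + 1) × (W + 1) entries, each filled in constant time.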
