Slide 1

Slide 1 text

Source code curation: tooling for the Code Forager
huascar sanchez [email protected]
defense @ UCSC, October 29, 2015

Slide 2

Slide 2 text

This talk in one slide
Code foraging is a form of reuse practiced by many programmers.
Despite advances in search technology, code foraging is still restricted by the questionable quality of online source code.
We can build tools that help programmers address this questionable quality.

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

SCC - SRI - 09 18, 2015
Just like junkyard scavenging…
This process is laborious and challenging:
• Involves multiple rounds of specific steps (Marchionini, 2006):
  • browsing, screening, filtering, and retooling
• Deals with source code of inherently questionable quality (Gysin and Kuhn, 2010):
  • not guaranteed to work, to be good, or to be trustworthy

Slide 5

Slide 5 text

Quality
Quality means fitness for use, and is relative to a specific programming task (Gryna and Juran, 2001).
Quality dimensions (Dandashi, 2002):
1. Accuracy
2. Adaptability
3. Completeness
4. Understandability
Source code with questionable quality: source code that lacks some of these characteristics.

Slide 6

Slide 6 text

Quality matters
Uncertainty over the quality of source code can have negative effects on task effectiveness (Mackay, 1991).
Effective foraging for online source code requires addressing the questionable quality of code upfront.

Slide 7

Slide 7 text

Curation equals Quality minus Junk
(In general) Curation is used to determine what’s useful, what’s junk, and what’s not.
Curation blends filtering, refinement, and validation activities (Krysa, 2006; Stonebraker et al., 2013).
Curation can help address code foraging’s challenging nature.
Curation applied to source code equals Source Code Curation.

Slide 8

Slide 8 text

Source Code Curation (Sanchez et al., 2015)
Source Code Curation covers the act of
• discovering a code snippet of interest,
• cleaning and transforming (refining) it,
• presenting it in a meaningful & organized way.
Its goal is to improve the quality of online source code, all before consumption.

Slide 9

Slide 9 text

Thesis
Source Code Curation can (1) help programmers deal with the inherently questionable quality of online source code upfront, and (2) facilitate code understanding.
Key ideas:
1. Source code quality greatly impacts code foraging.
2. Quality is described by a series of quality dimensions.
3. Source code curation can improve these dimensions.

Slide 10

Slide 10 text

How can we implement this notion?
Programmers are curious (Brandt et al. 2009).
Solutions to code foraging’s challenging nature must not impede such natural impulses.
These impulses are an integral part of their learning experience (Kuhn and DeLine, 2012).
Build intuitive tools to support the curation of Java code examples on StackOverflow.

Slide 11

Slide 11 text

The Vesperin System
System for curating Java code examples on StackOverflow.
[Architecture diagram; labels: Multistage (JSON), Source (JSON), capacity (Text), Pack, Kiwi, Violette, AST, Multi-stage, ok path, error path (0:warning, ...., n:warning).]
Code Examples Multistager: Multistaging to Understand: Distilling code examples essence (Sanchez et al., 2015), paper under review.
Code stages: 1 : swap, 2 : partition, 3 : randomizedPartition, 4 : quicksort
Code example shown on the slide (excerpt, cut off by the slide):
  import java.util.Random;
  public class Quicksort {
    private static Random rand = new Random();
    public static void quicksort(int[] arr, int left, int right) {
      if (left < right) {
        int pivot = randomizedPartition(arr, left, right);
        quicksort(arr, left, pivot);
        quicksort(arr, pivot + 1, right);
      }
    }
    private static int randomizedPartition(int[] arr, int left, int right) {
      int swapIndex = left + rand.nextInt(right - left) + 1;
      swap(arr, left, swapIndex);
      return partition(arr, left, right);
    }
    private static int partition(int[] arr, int left, int right) {
      int pivot = arr[left];
      int i = left - 1;
      int j = right + 1;
      while (true) {

Slide 12

Slide 12 text

Vesperin
Research idea: Allow programmers to experiment with code modification ideas in the Web page of the Q&A system (in-place).
Hypothesis: Intuitively experimenting with code modification ideas hands-on can (1) help programmers deal with code of questionable quality upfront and (2) facilitate code understanding.

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

No content

Slide 15

Slide 15 text

No content

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

[Architecture diagram: (1) the browser plug-in, which provides the scratch space on the Vesperin page, and (2) the RESTful service; a curation request goes out and updated code comes back.]
Source object fields: ID (PK), Description (Text), Content (Text), Notes (Array)

Slide 18

Slide 18 text

(1) Q&A page
Scratch space
• The space where all in-place code modifications are made, via direct editing or via semi-automated code transformations.
Vesperin page
• reDOMed Q&A page
Drafts management
• Drafts are snapshots of changed code for future recoveries.
• Add error tolerance into the curation process (Olsen, 2009)
Vesperin actions
• Make curation requests
• Add notes in context
• Check code syntax
• Mark drafts
Notes (in context)

Slide 19

Slide 19 text

(2) RESTful service
[Architecture diagram; labels: RESTful API, JSON requests, JSON replies, Incremental Java parser, Java code transformer, Publisher & Renderer, mongo db, twitter, HTML page.]
Curation request (as shown, cut off by the slide): {“rename”: { “what”: “method”, “where”: [1, 6], “source”: { “name”: “..”, “content”:
Reply (as shown, cut off by the slide): {“draft”: { “before”: {}, “after”: { “name”: “..”, “content”:
Java code transformer:
• Codepacking
• Delete code member
• Rename code member
• Code cleanup
• Clip fragment
• Create new method

Slide 20

Slide 20 text

Vesperin in the lab

Slide 21

Slide 21 text

User Study
Considered the following research questions:
• How will programmers use Vesperin?
• Were the provided facilities sufficient?
• Will programmers be able to better understand unfamiliar code examples via curation? Will Vesperin add value?

Slide 22

Slide 22 text

Study setup (Babbie, 2015)
• 15 participants, 3 tasks, 60 minutes
• One-group pretest-posttest design
  • Participants are studied before and after the experimental manipulation (Babbie, 2015)
• Variables
  • Independent variable: Vesperin system
  • Dependent variables: perception and experience

Slide 23

Slide 23 text

Results: Participants’ background
40% of the participants visit StackOverflow multiple times a day. Moreover, nearly 70% of the participants were extremely familiar with Java and Refactoring.
Figure 4.4: Summary of participants’ background information. (a) Programming Experience. (b) StackOverflow Visit Frequency. (c) Level of Java Familiarity. (d) Level of Refactoring Familiarity.

Slide 24

Slide 24 text

Task 1
Using client/server certificates for two way authentication SSL socket on Android

Slide 25

Slide 25 text

Task 2
How to add a push notification in my own android app

Slide 26

Slide 26 text

Task 3
Stop the Twitter stream and return List of status with twitter4j

Slide 27

Slide 27 text

Procedure and manipulation (Babbie, 2015)
1. Give Vesperin demo.
2. Pretest: measurement of observation, e.g., Could such a system allow you to better understand code examples?
3. Intervention: application of the independent variable (use Vesperin).
4. Posttest: measurement of observation, e.g., Did Vesperin allow you to better understand code examples?
5. Final interview.

Slide 28

Slide 28 text

User experiences
Obtained via 4 sources: observations, automated user interaction logging, pretest & posttest, and final interview.

Slide 29

Slide 29 text

How was Vesperin used?
• Used in a hybrid comprehension strategy
  • Mixed bottom-up and top-down strategies
• Used to explore control flow relationships
  • Searched and replaced, followed by annotation and cleanup
• Syntax checking often influenced curation.

Slide 30

Slide 30 text

How was Vesperin used?
Editing activity surged early on, then subsided over time. This can be explained by looking at the assumptions of dual-process theories (Chaiken and Eagly, 1989).
[Plot: edits (0 to 50) against minutes (2 to 20).]

Slide 31

Slide 31 text

Was the set of facilities sufficient?
Its facilities were necessary, but not sufficient.
Unable to handle specific code examples:
• Multiple orthogonal classes on a single scratch space
• Multiple scratch spaces needed to work in concert
Workarounds were used to address limitations:
• Used static nested classes
• Combined content of all scratch spaces into a single scratch space
Limited aid for identifying code examples’ core parts.

Slide 32

Slide 32 text

How useful is Vesperin?
Came in with high expectations, and left satisfied (added value and better understanding, as predicted).
[Charts: (a) Better Understanding and (b) Added Value, each with a PRETEST and a POSTTEST panel. Horizontal axes: 5-point Likert scale, ranging from strongly disagree (“- -”) to strongly agree (“++”). Vertical axes: number of participants.]

Slide 33

Slide 33 text

Overview of Results
• Used in a hybrid comprehension strategy
• Helped better understand source code: “It’s much easier to understand the code after its curation.”
• Its facilities were necessary, but not sufficient
• Limited aid for identifying prime sets of behavior

Slide 34

Slide 34 text

The Vesperin System
System for curating Java code examples on StackOverflow.
[Architecture diagram; labels: Multistage (JSON), Source (JSON), capacity (Text), Pack, Kiwi, Violette, AST, Multi-stage, ok path, error path (0:warning, ...., n:warning).]
Code Examples Multistager: Multistaging to Understand: Distilling Code Examples Essence (Sanchez et al., 2015) (paper under review).
Code stages: 1 : swap, 2 : partition, 3 : randomizedPartition, 4 : quicksort
Code example shown on the slide (excerpt, cut off by the slide):
  import java.util.Random;
  public class Quicksort {
    private static Random rand = new Random();
    public static void quicksort(int[] arr, int left, int right) {
      if (left < right) {
        int pivot = randomizedPartition(arr, left, right);
        quicksort(arr, left, pivot);
        quicksort(arr, pivot + 1, right);
      }
    }
    private static int randomizedPartition(int[] arr, int left, int right) {
      int swapIndex = left + rand.nextInt(right - left) + 1;
      swap(arr, left, swapIndex);
      return partition(arr, left, right);
    }
    private static int partition(int[] arr, int left, int right) {
      int pivot = arr[left];
      int i = left - 1;
      int j = right + 1;
      while (true) {

Slide 35

Slide 35 text

MTU - ICPC’16 - 05 16, 2016
Distilling the Essence of Code Examples
Problem: Understanding unfamiliar code during code foraging is laborious and challenging.
• Much of the information contained within code is either peripheral or obscured by other elements.
• There is a lack of tool support for locating the essential sections within a code example and then aiding their understanding.
Solution: Deliver a method (and its tool) for discovering these essential sections and revealing only their relevant details.

Slide 36

Slide 36 text

Multistage Representation of Code
Code example:
  … public final class RandGaussianDistrib {
    private static final Random R…
    static double uniform(){ return R.nextInt(n); }
    static double uniform(double a, double b) { return a + uniform() * (b - a); }
    static double gaussian(){
      double r, x, y;
      do {
        x = uniform(-1.0, 1.0);
        y = uniform(-1.0, 1.0);
        r = (x*x) + (y*y);
      } while(r >= 1 || r == 0);
      return x * Math.sqrt(-2 * Math.log(r) / r);
    }
  }
1: Code stage
  … public final class RandGaussianDistrib {
    private static final Random R…
    static double uniform(){ return R.nextInt(n); }
  }
2: Code stage
  … public final class RandGaussianDistrib {
    private static final Random R…
    static double uniform(){ return R.nextInt(n); }
    static double uniform(double a, double b){ return a + uniform() * (b - a); }
  }
3: Code stage
  … public final class RandGaussianDistrib {
    private static final Random R…
    static double uniform(){…}
    static double uniform(double a, double b) {…}
    static double gaussian(){
      double r, x, y;
      do {…} while (r >= 1 || r == 0);
      return x * Math.sqrt(-2 * Math.log(r) / r);
    }
  }

Slide 37

Slide 37 text

Multistage representation of code
Roadmap for steering understanding
1: Code stage
  … public final class RandGaussianDistrib {
    private static final Random R…
    static double uniform(){ return R.nextInt(n); }
  }
2: Code stage
  … public final class RandGaussianDistrib {
    private static final Random R…
    static double uniform(){ return R.nextInt(n); }
    static double uniform(double a, double b){ return a + uniform() * (b - a); }
  }
3: Code stage
  … public final class RandGaussianDistrib {
    private static final Random R…
    static double uniform(){…}
    static double uniform(double a, double b) {…}
    static double gaussian(){
      double r, x, y;
      do {…} while (r >= 1 || r == 0);
      return x * Math.sqrt(-2 * Math.log(r) / r);
    }
  }

Slide 38

Slide 38 text

Implementation

Slide 39

Slide 39 text

Multistaging Implementation
[Architecture diagram: the browser plug-in (Stage action) sends a multistaging request with the source code and a capacity to the RESTful service (numbered steps 1 to 3); the service runs MethodStaging w/Reduction and returns (S, H*), e.g., hi ∈ hidden code H*. Code stages shown: 1: Get Pivot Index, 2: Select, 3: Main.]
[Slide background: excerpts from the paper, repeated several times on the slide; one copy is kept below.]
Fig. 6. One application of MethodStaging against the SmallestNum code example. (a) Code stage 1. (b) Code stage 2. (c) Code stage 3.
Fig. 7. GetBindingsIn subroutine.
  function GETBINDINGSIN(m)
    V, S, R ← {}
    W ← {target node types}
    S ← S ∪ m
    while S is not empty do
      u ← pop S
      if u ∉ V then
        V ← V ∪ {u}
        for each child node w in u do
          if w ∈ W then
            R ← R ∪ {binding of w}
          end if
          S ← S ∪ {w}
        end for
      end if
    end while
    return R // Set of bindings in m
  end function
Fig. 8. ReconstructSourceCode subroutine.
  function RECONSTRUCTSOURCECODE(p, d)
    // deletes declaration nodes n ∈ {p \ d} from AST p
    p′ ← {n | n ∈ p and n ∉ {p \ d}}
    return source code for p′
  end function
B. MethodStaging with Reduction
Programmers dealing with large code stages are often confronted with the consequent information overload problem. We can reduce this problem by automatically reducing them. The rationale is that reduced code stages can be easily digested by programmers wishing a quick overview of their operation. We make reduction decisions in MethodStaging based on examples’ source code structure. Our approach is consistent with how human abstractors approach inspecting unfamiliar [excerpt cut off on the slide]
Equation 1. The usage score of a code block is representative of the demand of its elements throughout the code example. The usage frequency of each element in a code block is the number of times this element appears in a code stage. As a result, we use code blocks’ usage score to show the blocks with a higher demand and hide those with a lesser demand.
$$UsageScore(b) = \frac{\sum_{elem \in b} UsageFreq(elem)}{TotalChildren(b)} \quad (1)$$
For example, given a nested code block at line 11 in Figure 4c, we first collect its children: temp, list, left, and right. Second, we compute each child’s usage frequency: 2, 7, 10, and 9. Lastly, we put it all together and calculate the nested code block’s usage score: (2 + 7 + 10 + 9)/4 = 7.
We cast the problem of reducing large code stages as an instance of the Precedence Constrained Knapsack Problem or PCKP [17]. This problem is specified herein.
Problem 3.2: Code Stage Reduction. Given a set of code blocks B (with weight wb and profit pb per block b ∈ B), a Knapsack capacity W, a precedence order O ⊆ B × B, and a set of constraints C, find H* such that H* = B \ X*, where wb = number of lines of code in b, pb = UsageScore(b), X* = arg max {∑b ∈ B pb}, and X* satisfies the constraints in C. The constraints in C include: ∑bj ∈ B wbj ≤ W, where bi → bj (bi precedes bj) ∈ O, and i, j = 1, . . . , |B|.
Similar to Samphaiboon et al. [17], we solve this problem by using dynamic programming. Our solution generalizes the code stage reduction problem, also taking into account a precedence relation between code blocks in a code stage. We build a Directed Acyclic Graph (DAG) to represent such a relation, where nodes correspond to code blocks in a one-to-one fashion. This relation is expressed as a composition relation between code blocks. For instance, a code block k [excerpt cut off on the slide]

Slide 40

Slide 40 text

Multistager Architecture

Slide 41

Slide 41 text

Multistaging in Action

Slide 42

Slide 42 text

Multistaging to Understand
Exploring the code stages suggests a form of code inspection called Multistaging to Understand (MTU).
By adopting MTU
• programmers can inspect a few generated code stages,
• mentally abstract their functionality, and then
• combine the gained knowledge to understand the main functionality.
MTU shares similarities with code reading by stepwise abstraction (Linger et al., 1979).

Slide 43

Slide 43 text

The Multistaging Problem
Given the AST of a code example, with a set of n method declarations D = D1 ∪ D2 … ∪ Dn, compute a set of interconnected code stages {S | S ⊆ D × D}, sorted in ascending order by LOC, s.t. each code stage s ∈ S ∪ {sØ} builds upon, and in relation to, preceding code stages.
Where:
• sØ is the null code stage (sØ’s preceding code stage is sØ)
• si < sj means si precedes sj, with i, j = 1 … |S|

Slide 44

Slide 44 text

Solution: MethodStaging Algorithm
  Algorithm: MethodStaging(p /*AST*/, sØ)
    Stages = {sØ}
    for each method m in p do
      d = {} // declarations set
      for each binding b in GetBindingsIn(m) do // e.g., b = (select, method)
        d = d ∪ {getDeclarationNode(b)}
      end for
      s = source code for {n | n ∈ p ∧ n ∉ {p \ d}}
      Stages = Stages ∪ {s}
    end for
    return sortAscending(Stages)
  end Algorithm
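The MethodStaging loop can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the Vesperin implementation: the AST is replaced by a hypothetical map from each method name to the names it references, a stage is modeled as the set of declarations reachable from a method (following references transitively is an assumption), stage size stands in for LOC, and the null stage sØ is left out. The call graph for the Quicksort example is also assumed, since the slide truncates that code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MethodStagingSketch {

    /** Collects m plus every declaration reachable from m, using a worklist as in GetBindingsIn. */
    static Set<String> declarationsFor(String m, Map<String, List<String>> bindings) {
        Set<String> visited = new TreeSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(m);
        while (!stack.isEmpty()) {
            String u = stack.pop();
            if (visited.add(u)) { // true only the first time u is seen
                for (String w : bindings.getOrDefault(u, List.of())) {
                    stack.push(w);
                }
            }
        }
        return visited;
    }

    /** Builds one stage per method, drops duplicate stages, and sorts ascending by size. */
    static List<Set<String>> stage(Map<String, List<String>> bindings) {
        Set<Set<String>> stages = new LinkedHashSet<>();
        for (String m : bindings.keySet()) {
            stages.add(declarationsFor(m, bindings));
        }
        List<Set<String>> sorted = new ArrayList<>(stages);
        sorted.sort(Comparator.comparingInt(Set::size));
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical call graph for the Quicksort example.
        Map<String, List<String>> quicksort = Map.of(
            "swap", List.of(),
            "partition", List.of("swap"),
            "randomizedPartition", List.of("swap", "partition"),
            "quicksort", List.of("randomizedPartition", "quicksort"));
        stage(quicksort).forEach(System.out::println);
    }
}
```

Under the assumed call graph, the stages come out in the order swap, partition, randomizedPartition, quicksort, each one containing the declarations of the stages before it.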

Slide 45

Slide 45 text

Reflections on MethodStaging
MethodStaging provides an effective divide and conquer approach for code understanding.
One caveat: It can produce large code stages.
• Large code stages (code stages with long methods) can hinder MethodStaging’s effectiveness.
• Long methods tend to increase programmers’ cognitive overhead more than small methods (Mantyla et al. 2003)
Solution: MethodStaging w/Reduction (via code folding)

Slide 46

Slide 46 text

MethodStaging w/Reduction Basics
Reduction in MethodStaging shows the code blocks (X*) with a high usage score in each code stage s and hides (i.e., folds) the ones with a low usage score (H*), where X* ∪ H* ∈ s.
Usage frequency of an element in a code block b ⊆ s is the number of times it appears in s.
$$UsageScore(b) = \frac{\sum_{elem \in b} UsageFreq(elem)}{TotalChildren(b)}$$
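As a small worked illustration of the usage score, the sketch below averages the usage frequencies of a block's child elements. The element names and frequencies (temp, list, left, right with 2, 7, 10, and 9) are the ones quoted in the paper excerpt on the Multistaging Implementation slide; the class name, the method signature, and the choice to score an empty block as 0 are assumptions made for this example.

```java
import java.util.List;
import java.util.Map;

final class UsageScoreSketch {

    /** Mean usage frequency of a block's child elements; an empty block scores 0 (assumption). */
    static double usageScore(List<String> children, Map<String, Integer> usageFreq) {
        if (children.isEmpty()) {
            return 0.0;
        }
        int total = 0;
        for (String elem : children) {
            // Elements that never appear in the stage count as frequency 0.
            total += usageFreq.getOrDefault(elem, 0);
        }
        return (double) total / children.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = Map.of("temp", 2, "list", 7, "left", 10, "right", 9);
        // (2 + 7 + 10 + 9) / 4
        System.out.println(usageScore(List.of("temp", "list", "left", "right"), freq)); // prints 7.0
    }
}
```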

Slide 47

Slide 47 text

MethodStaging w/Reduction (Formulated as a Precedence-Constrained Knapsack Problem)
Given a set of code blocks B (with a weight wb and profit pb per b ∈ B), a Knapsack capacity W, a precedence order O ⊆ B × B (modeled as a DAG), and a set of constraints C, let’s find the set H*, such that H* = B \ X*, wb = LOC(b), pb = UsageScore(b), X* = arg max {∑b ∈ B pb}, and X* satisfies the constraints in C.
Where C includes:
• ∑ b’ ∈ B wb’ ≤ W
• ∃ bi → bj (bi precedes bj) ∈ O, i, j = 1 … |B|

Slide 48

Slide 48 text

MethodStaging w/Reduction
  Input: AST node p, and declarations d ∈ p
  Output: A tuple consisting of the reconstructed source code and H*
  Function ReconstructSourceCode(p, d)
    // delete nodes {p \ d} from AST
    let p′ ← JDT.deleteAstNodes(p, {p \ d})
    let DAGp′ ← traverse p′ and then get built DAG
    let H* ← computes Bp′ \ X*p′ using DAGp′ and a capacity of 15 LOC
    return (JDT.getSourceCode(p′), H*)
  end
Figure 11: Pseudocode for updated ReconstructSourceCode. This subroutine returns a tuple comprising the reconstructed source code and the code elements to hide.

Slide 49

Slide 49 text

Generating the set H*
1. Build DAG from the example’s AST
  • bi → bj, bi precedes bj
  • wb and pb are calculated
  • wb = wb-original − (wc + wd)
2. Solve X* using dynamic programming
$$X^*[k, w] = \begin{cases} X^*[k-1, w] & \text{if } w_k > w \\ \max\left(X^*[k-1, w],\; X^*[k-1, w - w_k] + p_k\right) & \text{if } w_k \le w \wedge (k-1 \to k) \end{cases}$$
3. Generate H*:
  • H* = B \ X*
[Figure: the DAG over code blocks B1 to B12 drawn three times, labeled B, X*, and H*. Node labels (wb/pb): 63/6, 7/3, 19/1, 5/3, 5/3, 3/0, 5/0, 3/0, 13/1, 3/1, 2/0, 17/1.]
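The dynamic-programming step can be sketched in Java as below. This is a simplified illustration, not the solver used by the Multistager: it implements only the plain 0/1 knapsack recurrence and leaves out the precedence condition from the DAG, and it uses integer profits. The block weights, profits, and the class and method names are hypothetical; the capacity of 15 LOC is the one mentioned in the ReconstructSourceCode pseudocode.

```java
import java.util.ArrayList;
import java.util.List;

final class StageReductionSketch {

    /** Returns the indices of the blocks to hide (H* = B \ X*) for a given LOC capacity. */
    static List<Integer> hiddenBlocks(int[] weight, int[] profit, int capacity) {
        int n = weight.length;
        // best[k][w] = highest total profit using the first k blocks within w lines.
        int[][] best = new int[n + 1][capacity + 1];
        for (int k = 1; k <= n; k++) {
            int wk = weight[k - 1];
            int pk = profit[k - 1];
            for (int w = 0; w <= capacity; w++) {
                best[k][w] = best[k - 1][w]; // block k is hidden
                if (wk <= w) {
                    best[k][w] = Math.max(best[k][w], best[k - 1][w - wk] + pk); // block k is shown
                }
            }
        }
        // Walk the table backwards to recover the shown set X*.
        boolean[] shown = new boolean[n];
        int w = capacity;
        for (int k = n; k >= 1; k--) {
            if (best[k][w] != best[k - 1][w]) {
                shown[k - 1] = true;
                w -= weight[k - 1];
            }
        }
        List<Integer> hidden = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!shown[i]) {
                hidden.add(i);
            }
        }
        return hidden;
    }

    public static void main(String[] args) {
        int[] loc = {10, 6, 5, 4};   // hypothetical block sizes in lines of code
        int[] score = {7, 5, 4, 1};  // hypothetical (rounded) usage scores
        System.out.println(hiddenBlocks(loc, score, 15)); // prints [1, 3]
    }
}
```

With these hypothetical values, blocks 0 and 2 fill the 15-line capacity with the highest total score (11), so blocks 1 and 3 are folded. Adding the precedence check would restrict which blocks may be shown without their predecessors.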

Slide 50

Slide 50 text

MTU Evaluation

Slide 51

Slide 51 text

MTU in the Lab
We consider the following question: Does MTU make the understanding of unfamiliar code examples easier during code foraging?
Where easier means:
• High comprehension accuracy
• Short reviewing time

Slide 52

Slide 52 text

Experimental Setup (Babbie, 2015)
• 12 participants, 2 groups, 3 tasks, 120 minutes
• Crossed factorial design with 2 factors
  • Between-subjects: Comprehension strategy
  • Within-subjects: Size of code examples
• Variables
  • Response accuracy & reviewing time

Slide 53

Slide 53 text

Response Accuracy
Open-ended questions addressing five comprehension abstractions (Pennington, 1987):
• Function: Describe the overall functionality.
• Control flow: Describe execution sequence using pseudo code.
• Data flow: Describe when a data object gets updated.
• Operations: Describe data object’s need in an execution sequence.
• State: Describe data object’s composition at point of execution.
Rating scheme (Du Bois, 2005) to score answers: Correct (10 pts), Almost Correct (8 pts), Right Idea (5 pts), and Wrong (0 pts)
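To make the scoring concrete, the sketch below maps the four ratings to their point values and averages them for one question. The point values come from the rating scheme above; the enum, the method name, and the sample ratings are hypothetical and are not taken from the study data.

```java
import java.util.List;

final class AccuracyScoreSketch {

    /** The four ratings of the scheme and the points each one is worth. */
    enum Rating {
        CORRECT(10), ALMOST_CORRECT(8), RIGHT_IDEA(5), WRONG(0);

        final int points;

        Rating(int points) {
            this.points = points;
        }
    }

    /** Average points over a list of rated answers; 0 when there are no answers. */
    static double averageAccuracy(List<Rating> ratings) {
        return ratings.stream().mapToInt(r -> r.points).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Hypothetical ratings given to one group's answers to a single question.
        List<Rating> ratings = List.of(
            Rating.CORRECT, Rating.ALMOST_CORRECT, Rating.RIGHT_IDEA,
            Rating.RIGHT_IDEA, Rating.CORRECT, Rating.WRONG);
        System.out.println(averageAccuracy(ratings));
    }
}
```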

Slide 54

Slide 54 text

Reviewing time
Collected reviewing times from two sources:
• the browser plug-in’s time tracker
• Upwork’s time tracker

Slide 55

Slide 55 text

Results

Slide 56

Slide 56 text

Average Response Accuracy
Significant differences in average response accuracy, favoring the treatment group (MTU) over the control group (RTU).
Code example sizes: Short [35, 70), Medium [70, 140), Long [140, 200]. Each cell lists MTU / RTU / p-value.
Abstraction  | Short                | Medium               | Long
Function     | 6.83 / 3.33 / 0.0037 | 7.17 / 3.83 / 0.0509 | 7.67 / 5.00 / 0.0534
Control flow | 8.50 / 6.83 / 0.0525 | 7.17 / 4.33 / 0.1984 | 8.17 / 4.33 / 0.0204
Data flow    | 8.67 / 6.17 / 0.0462 | 5.33 / 3.00 / 0.2308 | 8.50 / 6.00 / 0.1199
State        | 8.67 / 7.00 / 0.0873 | 7.67 / 5.67 / 0.1594 | 9.00 / 6.50 / 0.0971
Operations   | 7.33 / 3.33 / 0.0595 | 7.83 / 4.83 / 0.0609 | 6.50 / 3.00 / 0.0549
Unaccounted factor: Delocalization (Letovsky et al., 1986). Delocalization led to many wrong answers, which caused high p-values.
Note: Rating scheme for scoring accuracy of answers: Correct (10 points), Almost Correct (8 points), Right Idea (5 points), and Wrong (0 points).

Slide 57

Slide 57 text

Average Reviewing Time
Significant speed improvements of the treatment group (MTU) over the control group (RTU).
Reviewing time (secs):
Size             | MTU | RTU  | p-value
Short [35, 70)   | 475 | 745  | 0.0995
Medium [70, 140) | 655 | 1022 | 0.0446
Long [140, 200]  | 465 | 912  | 0.0284
Note: Reviewing times obtained from two sources: Upwork’s time tracker and Violette’s time tracker.

Slide 58

Slide 58 text

Summarizing
• MTU helps facilitate quick and accurate understanding when most of the code is localized.
• MTU provides minor benefits when code is partially or fully delocalized.
• MTU provides consistent speed improvements regardless of delocalization.

Slide 59

Slide 59 text

Epilogue

Slide 60

Slide 60 text

Source code curation & tools
Introduced a new paradigm (and its tools) for addressing code foraging’s challenges: Vesperin & Code Examples Multistager.
Our results confirmed the thesis statement:
• Addressed the questionable quality of online code.
• Facilitated quick and accurate understanding of online code.
We only scratched the surface …

Slide 61

Slide 61 text

Looking ahead
Craft solutions by remixing curated code:
• Semi-automated resolution of delocalized code
• Identification of the best chain of code stages to reuse
• Crowdsourcing program synthesis

Slide 62

Slide 62 text

Thank you
huascar sanchez [email protected]