Source code Curation Tooling for the Code Forager

Huascar Sanchez [email protected]
Source code curation tooling for the Code Forager
Defense @ UCSC, October 29, 2015

This talk in one slide
Code foraging is a form of reuse practiced by many programmers.
Despite advances in search technology, code foraging is still restricted by the questionable quality of online source code.
We can build tools that help programmers address this questionable quality.

SCC - SRI - 09 18, 2015
Just like junkyard scavenging…
This process is laborious and challenging
• Involves multiple rounds of specific steps (Marchionini, 2006): browsing, screening, filtering, and retooling
• Deals with source code of inherently questionable quality (Gysin and Kuhn, 2010): not guaranteed to work, to be good, or to be trustworthy

Quality
Quality means fitness for use, and is relative to a specific programming task (Gryna and Juran, 2001).
Quality dimensions (Dandashi, 2002):
1. Accuracy
2. Adaptability
3. Completeness
4. Understandability
Source code with questionable quality: source code that lacks some of these characteristics.

Quality matters
Uncertainty over the quality of source code can have negative effects on task effectiveness (Mackay, 1991).
Effective foraging for online source code requires addressing the questionable quality of code upfront.

Curation equals Quality minus Junk
(In general) Curation is used to determine what’s useful, what’s junk, and what’s not.
Curation blends filtering, refinement, and validation activities (Krysa, 2006; Stonebraker et al., 2013).
Curation can help address code foraging’s challenging nature.
Curation applied to source code equals Source Code Curation.

Source Code Curation (Sanchez et al., 2015)
Source Code Curation covers the act of
• discovering a code snippet of interest,
• cleaning and transforming (refining) it,
• presenting it in a meaningful & organized way.
Its goal is to improve online source code’s quality, all before consumption.

Thesis
Source Code Curation can (1) help programmers deal with the inherently questionable quality of online source code upfront, and (2) facilitate code understanding.
Key ideas:
1. Source code quality greatly impacts code foraging.
2. Quality is described by a series of quality dimensions.
3. Source code curation can improve these dimensions.

How can we implement this notion?
Programmers are curious (Brandt et al. 2009).
Solutions to code foraging’s challenging nature must not impede such natural impulses.
These impulses are an integral part of their learning experience (Kuhn and DeLine, 2012).
Build intuitive tools to support the curation of Java code examples on StackOverflow.

The Vesperin System
System for curating Java code examples on StackOverflow.
[Figure: system diagram. Labels: Kiwi, Violette, Source (JSON), Multistage (JSON), Pack, Text, capacity, AST, Multi-stage, ok path, error path (0:warning, …, n:warning). Sample: a Quicksort class with code stages 1: swap, 2: partition, 3: randomizedPartition, 4: quicksort.]
Code Examples Multistager. Multistaging to Understand: Distilling Code Examples Essence (Sanchez et al., 2015), paper under review.

Vesperin
Research idea: Allow programmers to experiment with code modification ideas in the Web page of the Q&A system (in-place).
Hypothesis: Intuitively experimenting with code modification ideas hands-on can (1) help programmers deal with code of questionable quality upfront and (2) facilitate code understanding.

[Figure: Vesperin architecture. Labels: (1) browser plug-in, (2) RESTful service, curation request, updated code, scratch space, Vesperin page.]
Source object:
• ID (PK)
• Description (Text)
• Content (Text)
• Notes (Array)

(1) Q&A page
Scratch space
• The space where all in-place code modifications are made, via direct editing or via semi-automated code transformations.
Vesperin page
• reDOMed Q&A page
Drafts management
• Drafts are snapshots of changed code for future recoveries.
• Add error tolerance into the curation process (Olsen, 2009)
Vesperin actions
• Make curation requests
• Add notes in context
• Check code syntax
• Mark drafts
[Figure label: notes (in context)]
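The drafts idea above (snapshots of changed code kept for later recovery) can be sketched in a few lines of Java. This is only an illustration: the class name `DraftHistory` and its methods are hypothetical and are not taken from Vesperin's code base.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical draft store: each draft is a snapshot of the scratch space's code.
final class DraftHistory {
    private final List<String> drafts = new ArrayList<>();

    // Saves a snapshot and returns its index for later recovery.
    int save(String code) {
        drafts.add(code);
        return drafts.size() - 1;
    }

    // Recovers the snapshot saved at the given index.
    String recover(int index) {
        return drafts.get(index);
    }

    // Number of snapshots saved so far.
    int size() {
        return drafts.size();
    }

    public static void main(String[] args) {
        DraftHistory history = new DraftHistory();
        int first = history.save("class A { void f() {} }");
        history.save("class A { void g() {} }"); // a later edit, e.g., after a rename
        // The earlier snapshot is still recoverable after the later edit.
        System.out.println(history.recover(first));
    }
}
```

Keeping every snapshot, rather than only the latest one, is what lets a programmer back out of an edit that turned out to be a mistake.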

(2) Curation request
[Figure: service diagram. Labels: RESTful API, JSON requests, JSON replies, incremental Java parser, Java code transformer, Publisher & Renderer, mongo db, twitter, HTML page.]
JSON request (truncated on the slide): {“rename”: { “what”: “method”, “where”: [1, 6], “source”: { “name”: “..”, “content”:
JSON reply (truncated on the slide): {“draft”: { “before”: {}, “after”: { “name”: “..”, “content”:
Java code transformer:
• Codepacking
• Delete code member
• Rename code member
• Code cleanup
• Clip fragment
• Create new method

Vesperin in the lab

User Study
Considered the following research questions:
• How will programmers use Vesperin?
• Were the provided facilities sufficient?
• Will programmers be able to better understand unfamiliar code examples via curation? Will Vesperin add value?

Study setup (Babbie, 2015)
• 15 participants, 3 tasks, 60 minutes
• One-group pretest-posttest design
  • Participants are studied before and after the experimental manipulation (Babbie, 2015)
• Variables
  • Independent variable: Vesperin system
  • Dependent variables: perception and experience

Results: Participants
Background experience. 40% of them visit StackOverflow multiple times a day. Moreover, nearly 70% of the participants were extremely familiar with Java and Refactoring.
[Figure 4.4: Summary of participants’ background information. Panels: (a) Programming Experience, (b) StackOverflow Visit Frequency, (c) Level of Java Familiarity, (d) Level of Refactoring Familiarity.]

Task 1
Using client/server certificates for two way authentication SSL socket on Android

Task 2 How to add a push notiﬁcation in my own android app

Task 3 Stop the Twitter stream and return List of status with twitter4j

Procedure and manipulation (Babbie, 2015)
• Give Vesperin demo
• Pretest: measurement of observation, e.g., Could such a system allow you to better understand code examples?
• Intervention: application of the independent variable (use Vesperin)
• Posttest: measurement of observation, e.g., Did Vesperin allow you to better understand code examples?
• Final interview

User experiences
Obtained via 4 sources: observations, automated user interaction logging, pretest & posttest, and final interview.

How was Vesperin used?
• Used in a hybrid comprehension strategy: mixed bottom-up and top-down strategies
• Used to explore control flow relationships
• Search and replace, followed by annotation and cleanup
• Syntax checking often influenced curation

How was Vesperin used?
Editing activity surged early on, then subsided over time. This can be explained by looking at the assumptions of dual-process theories (Chaiken and Eagly, 1989).
[Chart: number of edits (0 to 50) against elapsed minutes (2 to 20).]

Was the set of facilities sufficient?
Its facilities were necessary, but not sufficient.
Unable to handle specific code examples:
• Multiple orthogonal classes on a single scratch space
• Multiple scratch spaces needed to work in concert
Workarounds were used to address limitations:
• Used static nested classes
• Combined content of all scratch spaces into a single scratch space
Limited aid for identifying code examples’ core parts.

How useful is Vesperin?
Came in with high expectations, and left satisfied (added value and better understanding, as predicted).
[Charts: (a) Better Understanding and (b) Added Value, each with PRETEST and POSTTEST panels. Horizontal axes: 5-point Likert scale, ranging from strongly disagree (“- -”) to strongly agree (“++”). Vertical axes: number of participants.]

Overview of Results
• Used in a hybrid comprehension strategy
• Helped better understand source code: “It’s much easier to understand the code after its curation.”
• Its facilities were necessary, but not sufficient
• Limited aid for identifying prime sets of behavior

The Vesperin System
System for curating Java code examples on StackOverflow.
[Figure: the same system diagram as on the earlier Vesperin System slide, with its Quicksort sample and code stages 1: swap, 2: partition, 3: randomizedPartition, 4: quicksort.]
Code Examples Multistager. Multistaging to Understand: Distilling Code Examples Essence (Sanchez et al., 2015) (paper under review)

MTU - ICPC’16 - 05 16, 2016
Distilling the Essence of Code Examples
Problem: Understanding unfamiliar code during code foraging is laborious and challenging.
• Much of the information contained within code is either peripheral or obscured by other elements.
• Tool support is lacking for locating the essential sections within code and then aiding their understanding.
Solution: Deliver a method (and its tool) for discovering these essential sections and revealing only their relevant details.

Multistage Representation of Code
Code example:
    …
    public final class RandGaussianDistrib {
      private static final Random R…
      static double uniform(){ return R.nextInt(n); }
      static double uniform(double a, double b) { return a + uniform() * (b - a); }
      static double gaussian(){
        double r, x, y;
        do {
          x = uniform(-1.0, 1.0);
          y = uniform(-1.0, 1.0);
          r = (x*x) + (y*y);
        } while(r >= 1 || r == 0);
        return x * Math.sqrt(-2 * Math.log(r) / r);
      }
    }
1: Code stage
    …
    public final class RandGaussianDistrib {
      private static final Random R…
      static double uniform(){ return R.nextInt(n); }
    }
2: Code stage
    …
    public final class RandGaussianDistrib {
      private static final Random R…
      static double uniform(){ return R.nextInt(n); }
      static double uniform(double a, double b){ return a + uniform() * (b - a); }
    }
3: Code stage
    …
    public final class RandGaussianDistrib {
      private static final Random R…
      static double uniform(){…}
      static double uniform(double a, double b) {…}
      static double gaussian(){
        double r, x, y;
        do {…} while (r >= 1 || r == 0);
        return x * Math.sqrt(-2 * Math.log(r) / r);
      }
    }

Multistage representation of code
The three code stages of RandGaussianDistrib from the previous slide (1: uniform(), 2: uniform(double a, double b), 3: gaussian()) are shown side by side.
Roadmap for steering understanding

Implementation

Multistaging Implementation
[Figure: request flow. Labels: browser plug-in, Stage, code example, multistaging request (source code, capacity), RESTful service, MethodStaging w/Reduction, steps 1 2 3, RESTful service processing …, reply (S, H*), e.g., hi ∈ hidden code H*. Code stages S: 1: Get Pivot Index, 2: Select, 3: Main.]

Paper excerpt shown behind the figure:

Fig. 6. One application of MethodStaging against the SmallestNum code example. (a) Code stage 1. (b) Code stage 2. (c) Code stage 3.

function GetBindingsIn(m)
  V, S, R ← {}
  W ← {target node types}
  S ← S ∪ {m}
  while S is not empty do
    u ← pop S
    if u ∉ V then
      V ← V ∪ {u}
      for each child node w in u do
        if w ∈ W then
          R ← R ∪ {binding of w}
        end if
        S ← S ∪ {w}
      end for
    end if
  end while
  return R // Set of bindings in m
end function
Fig. 7. GetBindingsIn subroutine.

function ReconstructSourceCode(p, d)
  // deletes declaration nodes n ∈ {p \ d} from AST p
  p′ ← {n | n ∈ p and n ∉ {p \ d}}
  return source code for p′
end function
Fig. 8. ReconstructSourceCode subroutine.

B. MethodStaging with Reduction
Programmers dealing with large code stages are often confronted with the consequent information overload problem. We can reduce this problem by automatically reducing them. The rationale is that reduced code stages can be easily digested by programmers wishing a quick overview of their operation. We make reduction decisions in MethodStaging based on examples’ source code structure. Our approach is consistent with how human abstractors approach inspecting unfamiliar …

… Equation 1. The usage score of a code block is representative of the demand of its elements throughout the code example. The usage frequency of each element in a code block is the number of times this element appears in a code stage. As a result, we use code blocks’ usage score to show the blocks with a higher demand and hide those with a lesser demand.

$$UsageScore(b) = \frac{\sum_{elem \in b} UsageFreq(elem)}{TotalChildren(b)} \quad (1)$$

For example, given a nested code block at line 11 in Figure 4c, we first collect its children: temp, list, left, and right. Second, we compute each child’s usage frequency: 2, 7, 10, and 9. Lastly, we put it all together and calculate the nested code block’s usage score: (2 + 7 + 10 + 9)/4 = 7.

We cast the problem of reducing large code stages as an instance of the Precedence Constrained Knapsack Problem or PCKP [17]. This problem is specified herein.

Problem 3.2: Code Stage Reduction. Given a set of code blocks $B$ (with weight $w_b$ and profit $p_b$ per block $b \in B$), a Knapsack capacity $W$, a precedence order $O \subseteq B \times B$, and a set of constraints $C$, find $H^*$ such that $H^* = B \setminus X^*$, where $w_b$ = number of lines of code in $b$, $p_b = UsageScore(b)$, $X^* = \arg\max \{\sum_{b \in B} p_b\}$, and $X^*$ satisfies the constraints in $C$. The constraints in $C$ include: $\sum_{b_j \in B} w_{b_j} \le W$, where $b_i \to b_j$ ($b_i$ precedes $b_j$) $\in O$, and $i, j = 1, \ldots, |B|$.

Similar to Samphaiboon et al. [17], we solve this problem by using dynamic programming. Our solution generalizes the code stage reduction problem, also taking into account a precedence relation between code blocks in a code stage. We build a Directed Acyclic Graph (DAG) to represent such a relation, where nodes correspond to code blocks in a one-to-one fashion. This relation is expressed as a composition relation between code blocks. For instance, a code block k …

Multistager Architecture

Multistaging in Action

Multistaging to Understand
Exploring the code stages suggests a form of code inspection called Multistaging to Understand (MTU).
By adopting MTU
• Programmers can inspect a few generated code stages,
• mentally abstract their functionality, and then
• combine gained knowledge to understand the main functionality.
MTU shares similarities with code reading by stepwise abstraction (Linger et al., 1979).

The Multistaging Problem
Given the AST of a code example, with a set of n method declarations D = D1 ∪ D2 … ∪ Dn, compute a set of interconnected code stages {S | S ⊆ D × D}, sorted in ascending order by LOC, s.t. each code stage s ∈ S ∪ {sØ} builds upon, and in relation to, preceding code stages.
Where:
• sØ is the null code stage (sØ’s preceding code stage is sØ)
• si < sj, si precedes sj and i, j = 1 … |S|

Solution: MethodStaging Algorithm
Algorithm: MethodStaging(p /*AST*/, sØ)
  Stages = {sØ}
  for each method m in p do
    d = {} // declarations set
    for each binding b in GetBindingsIn(m) do
      // e.g., b = (select, method)
      d = d ∪ {getDeclarationNode(b)}
    end for
    s = source code for {n | n ∈ p ∧ n ∉ {p \ d}}
    Stages = Stages ∪ {s}
  end for
  return sortAscending(Stages)
end Algorithm
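The MethodStaging loop can be sketched in Java without an AST library. The sketch below makes several assumptions that are not in the talk: a method is reduced to a name, a LOC count, and the names it references; references are followed transitively; the null stage sØ is omitted; and the LOC figures in `main` are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class MethodStagingSketch {
    // Toy stand-in for a method declaration: its size and the methods it references.
    static final class Method {
        final String name;
        final int loc;
        final List<String> refs;

        Method(String name, int loc, String... refs) {
            this.name = name;
            this.loc = loc;
            this.refs = Arrays.asList(refs);
        }
    }

    // One code stage: the declarations kept, plus their total LOC.
    static final class Stage {
        final Set<String> declarations;
        final int loc;

        Stage(Set<String> declarations, int loc) {
            this.declarations = declarations;
            this.loc = loc;
        }
    }

    // Builds one stage per method, then sorts the stages in ascending order by LOC.
    static List<Stage> stage(Map<String, Method> example) {
        List<Stage> stages = new ArrayList<>();
        for (Method m : example.values()) {
            Set<String> kept = new TreeSet<>();        // m plus every declaration it needs
            Deque<String> work = new ArrayDeque<>();
            work.push(m.name);
            while (!work.isEmpty()) {
                String u = work.pop();
                if (kept.add(u)) {                     // visit each declaration once
                    for (String ref : example.get(u).refs) {
                        work.push(ref);
                    }
                }
            }
            int loc = 0;
            for (String n : kept) {
                loc += example.get(n).loc;
            }
            stages.add(new Stage(kept, loc));
        }
        stages.sort(Comparator.comparingInt(s -> s.loc));
        return stages;
    }

    public static void main(String[] args) {
        Map<String, Method> quicksort = new LinkedHashMap<>();
        quicksort.put("swap", new Method("swap", 5));
        quicksort.put("partition", new Method("partition", 12, "swap"));
        quicksort.put("randomizedPartition", new Method("randomizedPartition", 5, "swap", "partition"));
        quicksort.put("quicksort", new Method("quicksort", 7, "randomizedPartition", "quicksort"));
        for (Stage s : stage(quicksort)) {
            System.out.println(s.loc + " LOC: " + s.declarations);
        }
    }
}
```

With these made-up sizes the stages come out in the order swap, partition, randomizedPartition, quicksort, which is the order the Quicksort sample lists its code stages.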

Reflections on MethodStaging
MethodStaging provides an effective divide and conquer approach for code understanding.
One caveat: It can produce large code stages.
• Large code stages (code stages with long methods) can hinder MethodStaging’s effectiveness.
• Long methods tend to increase programmers’ cognitive overhead more than small methods (Mantyla et al. 2003)
Solution: MethodStaging w/Reduction (via code folding)

MethodStaging w/Reduction Basics
Reduction in MethodStaging shows the code blocks (X*) with a high usage score in each code stage s and hides (i.e., folds) the ones with a low usage score (H*), where X* ∪ H* ∈ s.
Usage frequency of an element in a code block b ⊆ s is the number of times it appears in s.

$$UsageScore(b) = \frac{\sum_{elem \in b} UsageFreq(elem)}{TotalChildren(b)}$$
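The usage score is an average, so it is straightforward to compute. The following is a minimal sketch: the class and method names are hypothetical, a block is represented only by the usage frequencies of its children, and scoring an empty block as 0 is an assumption. The sample frequencies are the ones given for temp, list, left, and right in the paper excerpt on the implementation slide.

```java
final class UsageScoreSketch {
    // UsageScore(b): sum of the usage frequencies of b's elements divided by b's number of children.
    static double usageScore(int[] usageFreqs) {
        if (usageFreqs.length == 0) {
            return 0.0; // assumption: an empty block scores 0
        }
        int sum = 0;
        for (int f : usageFreqs) {
            sum += f;
        }
        return (double) sum / usageFreqs.length;
    }

    public static void main(String[] args) {
        // Usage frequencies of temp, list, left, and right.
        System.out.println(usageScore(new int[] {2, 7, 10, 9})); // prints 7.0
    }
}
```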

MethodStaging w/Reduction (Formulated as a Precedence-Constrained Knapsack Problem)
Given a set of code blocks $B$ (with a weight $w_b$ and profit $p_b$ per $b \in B$), a Knapsack capacity $W$, a precedence order $O \subseteq B \times B$ (modeled as a DAG), and a set of constraints $C$, let’s find the set $H^*$, such that $H^* = B \setminus X^*$, $w_b = LOC(b)$, $p_b = UsageScore(b)$, $X^* = \arg\max \{\sum_{b \in B} p_b\}$, and $X^*$ satisfies the constraints in $C$.
Where $C$ includes:
• $\sum_{b' \in B} w_{b'} \le W$
• $\exists\, b_i \to b_j$ ($b_i$ precedes $b_j$) $\in O$, $i, j = 1 \ldots |B|$

MethodStaging w/Reduction
Input: AST Node p, and declarations d ∈ p
Output: A tuple consisting of the reconstructed source code and H*
Function ReconstructSourceCode(p, d)
  // delete nodes {p \ d} from AST
  let p′ ← JDT.deleteAstNodes(p, {p \ d})
  let DAG_p′ ← traverse p′ and then get built DAG
  let H* ← computes B_p′ \ X*_p′ using DAG_p′ and a capacity of 15 LOC
  return (JDT.getSourceCode(p′), H*)
end
Figure 11: Pseudocode for updated ReconstructSourceCode. This subroutine returns a tuple comprising the reconstructed source code and the code elements to hide.

Generating the set H*
1. Build DAG from the example’s AST
  • bi → bj, bi precedes bj
  • wb and pb are calculated
  • wb = wb-original − (wc + wd)
2. Solve X* using Dynamic Programming

$$X^*[k, w] = \begin{cases} X^*[k-1, w] & \text{if } w_k > w \\ \max\left(X^*[k-1, w],\; X^*[k-1, w - w_k] + p_k\right) & \text{if } w_k \le w \wedge (k-1 \to k) \end{cases}$$

3. Generate H*:
  • H* = B \ X*
[Figure: three copies of a DAG over code blocks B1 to B12, labeled B, X*, and H*. Node labels (wb/pb): 63/6, 7/3, 19/1, 5/3, 5/3, 3/0, 5/0, 3/0, 13/1, 3/1, 2/0, 17/1.]
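Steps 2 and 3 can be illustrated with a dynamic-programming table. The sketch below is deliberately simplified: it solves the plain 0/1 knapsack and does not model the precedence order O, so it is not the precedence-constrained variant used in the talk. The class and method names are hypothetical, and profits are taken as integers.

```java
import java.util.Set;
import java.util.TreeSet;

final class StageReductionSketch {
    // Returns the indices of the blocks to hide, H* = B \ X*, where X* is a
    // maximum-profit subset of blocks whose total weight fits the capacity.
    // NOTE: plain 0/1 knapsack; precedence between blocks is NOT enforced here.
    static Set<Integer> blocksToHide(int[] weights, int[] profits, int capacity) {
        int n = weights.length;
        int[][] best = new int[n + 1][capacity + 1];
        for (int k = 1; k <= n; k++) {
            int wk = weights[k - 1];
            int pk = profits[k - 1];
            for (int w = 0; w <= capacity; w++) {
                best[k][w] = best[k - 1][w];                       // skip block k
                if (wk <= w) {                                     // or take it, if it fits
                    best[k][w] = Math.max(best[k][w], best[k - 1][w - wk] + pk);
                }
            }
        }
        // Walk the table backwards: blocks that changed the optimum are shown, the rest hidden.
        Set<Integer> hidden = new TreeSet<>();
        int w = capacity;
        for (int k = n; k >= 1; k--) {
            if (best[k][w] != best[k - 1][w]) {
                w -= weights[k - 1];                               // block k-1 belongs to X*
            } else {
                hidden.add(k - 1);                                 // block k-1 belongs to H*
            }
        }
        return hidden;
    }

    public static void main(String[] args) {
        int[] loc = {5, 3, 4};          // w_b: lines of code per block (made-up values)
        int[] usage = {3, 1, 3};        // p_b: usage scores per block (made-up values)
        System.out.println(blocksToHide(loc, usage, 9)); // prints [1]
    }
}
```

Supporting the precedence order would mean taking a block only when the blocks that precede it in the DAG are also taken, as the second case of the recurrence requires.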

MTU Evaluation

MTU in the Lab
We consider the following question: Does MTU make the understanding of unfamiliar code examples easier during code foraging?
Where easier means:
• High comprehension accuracy
• Short reviewing time

Experimental Setup (Babbie, 2015)
• 12 participants, 2 groups, 3 tasks, 120 minutes
• Crossed factorial design with 2 factors
  • Between-subjects: Comprehension strategy
  • Within-subjects: Size of code examples
• Variables
  • Response accuracy & reviewing time

Response Accuracy
Open-ended questions addressing five comprehension abstractions (Pennington, 1987):
• Function: Describe the overall functionality.
• Control flow: Describe execution sequence using pseudo code.
• Data flow: Describe when a data object gets updated.
• Operations: Describe data object’s need in an execution sequence.
• State: Describe data object’s composition at point of execution.
Rating scheme (Du Bois, 2005) to score answers: Correct (10 pts), Almost Correct (8 pts), Right Idea (5 pts), and Wrong (0 pts)
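The rating scheme maps each answer to a point value, and the averages reported in the results are means over such values. A minimal sketch of that arithmetic follows; the type and method names are hypothetical, and returning 0 for an empty set of ratings is an assumption.

```java
final class AccuracyRating {
    // Point values from the rating scheme on this slide.
    enum Rating {
        CORRECT(10), ALMOST_CORRECT(8), RIGHT_IDEA(5), WRONG(0);

        final int points;

        Rating(int points) {
            this.points = points;
        }
    }

    // Average response accuracy over a set of rated answers.
    static double average(Rating... ratings) {
        if (ratings.length == 0) {
            return 0.0; // assumption: no answers means no score
        }
        int total = 0;
        for (Rating r : ratings) {
            total += r.points;
        }
        return (double) total / ratings.length;
    }

    public static void main(String[] args) {
        // Three hypothetical answers: 10 + 5 + 0 points.
        System.out.println(average(Rating.CORRECT, Rating.RIGHT_IDEA, Rating.WRONG)); // prints 5.0
    }
}
```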

Reviewing time
Collected reviewing times from two sources:
• Browser plug-in’s time tracker
• Upwork’s time tracker
[Screenshots: the plug-in’s Stage control with a timer reading 00h 00m, and Upwork’s tracker showing the assigned task.]

Results

Average Response Accuracy
Significant differences in average response accuracy, favoring the treatment group (MTU) over the control group (RTU).

Abstraction | Short [35,70): MTU | RTU | p-value | Medium [70,140): MTU | RTU | p-value | Long [140,200]: MTU | RTU | p-value
Function | 6.83 | 3.33 | 0.0037 | 7.17 | 3.83 | 0.0509 | 7.67 | 5.00 | 0.0534
Control flow | 8.50 | 6.83 | 0.0525 | 7.17 | 4.33 | 0.1984 | 8.17 | 4.33 | 0.0204
Data flow | 8.67 | 6.17 | 0.0462 | 5.33 | 3.00 | 0.2308 | 8.50 | 6.00 | 0.1199
State | 8.67 | 7.00 | 0.0873 | 7.67 | 5.67 | 0.1594 | 9.00 | 6.50 | 0.0971
Operations | 7.33 | 3.33 | 0.0595 | 7.83 | 4.83 | 0.0609 | 6.50 | 3.00 | 0.0549

Unaccounted factor: Delocalization (Letovsky et al., 1986). Delocalization led to many wrong answers, which caused high p-values.
Note: Rating scheme for scoring accuracy of answers: Correct (10 points), Almost Correct (8 points), Right Idea (5 points), and Wrong (0 points).

Average Reviewing Time
Significant speed improvements of the treatment group (MTU) over the control group (RTU).

Size | MTU (secs) | RTU (secs) | p-value
Short [35,70) | 475 | 745 | 0.0995
Medium [70,140) | 655 | 1022 | 0.0446
Long [140,200] | 465 | 912 | 0.0284

Note: Reviewing times obtained from two sources: Upwork’s time tracker and Violette’s time tracker.

Summarizing
• MTU helps facilitate quick and accurate understanding when most of the code is localized.
• MTU provides minor benefits when code is partially or fully delocalized.
• MTU provides consistent speed improvements regardless of delocalization.

Epilogue

Source code curation & tools
Introduced a new paradigm (and its tools) for addressing code foraging’s challenges: Vesperin & Code Examples Multistager.
Our results confirmed the thesis statement:
• Addressed questionable quality of online code.
• Facilitated quick and accurate understanding of online code.
We only scratched the surface …

Looking ahead
Craft solutions by remixing curated code:
• Semi-automated resolution of delocalized code
• Identification of the best chain of code stages to reuse
• Crowdsourcing program synthesis

Thank you
huascar sanchez [email protected]
