Slide 1

Slide 1 text

How the R Community Creates and Curates Knowledge
Alexey Zagalsky, Carlos Gómez Teshima, Daniel M. German, Margaret-Anne Storey, Germán Poo-Caamaño
speakerdeck.com/alexeyza

Slide 2

Slide 2 text

The R community
R is an increasingly popular open source programming language
The R community plays an important role in knowledge creation and diffusion
Two particular communication channels for Q&A are Stack Overflow and the R-help mailing list

Slide 3

Slide 3 text

Stack Overflow vs. Mailing Lists
Since 2010, there has been a decrease in the number of messages on R-help and an increase on Stack Overflow [Vasilescu 2014]
Projects that migrated from mailing lists to Stack Overflow showed improvements [Squire 2015]
One would expect traffic on the R-help mailing list to fizzle as Stack Overflow's popularity increased
Interestingly, we found that the R community uses both channels and that both support Q&A knowledge. However, there are important differences between the two channels

Slide 4

Slide 4 text

RQ1: What types of knowledge artifacts are shared on Stack Overflow and the R-help mailing list within the R community?
RQ2: How is the knowledge constructed on Stack Overflow and the R-help mailing list?
RQ3: Why do certain users post to both Stack Overflow and the R-help mailing list?

Slide 5

Slide 5 text

Methodology
Phase I: Mining Archival Data
Stack Overflow data dump files + R-help MBOX files
September 2008 - December 2013
400 random threads from each channel (questions, answers, comments, ...)
2 coders
Phase II: Qualitative Survey
27 valid responses
Promoted to the R community via Twitter, Reddit, R-help mailing list, and Meta Stack Exchange
Case study
Dataset available online: https://github.com/thechiselgroup/R-ML-and-StackOverflow
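The sampling step in Phase I (drawing 400 random threads from the R-help MBOX archives) could look roughly like the sketch below. It is an illustration only, not the authors' actual pipeline: the function names, the file name in the usage comment, and the choice to reconstruct threads from the `In-Reply-To` header are all assumptions. Real archives often need extra handling, for example the `References` header or subject matching.

```python
import random
from collections import defaultdict


def thread_root(msg_id, parent_of):
    """Follow In-Reply-To links upward until a message with no known parent."""
    seen = set()
    while parent_of.get(msg_id) is not None and msg_id not in seen:
        seen.add(msg_id)          # guard against malformed reply cycles
        msg_id = parent_of[msg_id]
    return msg_id


def group_threads(messages):
    """Group messages into threads keyed by the root Message-ID.

    `messages` is any iterable of objects with a dict-like .get(header),
    such as the messages yielded by mailbox.mbox (assumed usage).
    """
    parent_of = {}
    for msg in messages:
        mid = msg.get("Message-ID")
        if mid:
            parent = (msg.get("In-Reply-To") or "").strip() or None
            parent_of[mid.strip()] = parent
    threads = defaultdict(list)
    for mid in parent_of:
        threads[thread_root(mid, parent_of)].append(mid)
    return dict(threads)


def sample_threads(threads, k=400, seed=0):
    """Draw up to k thread roots uniformly at random, reproducibly."""
    rng = random.Random(seed)
    roots = sorted(threads)       # stable order so the seed is meaningful
    return rng.sample(roots, min(k, len(roots)))


# Hypothetical usage with the standard-library mailbox module:
#   import mailbox
#   threads = group_threads(mailbox.mbox("r-help.mbox", create=False))
#   sampled = sample_threads(threads, k=400)
```

Fixing the seed keeps the sample reproducible, which matters when two coders need to label the same set of threads.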

Slide 6

Slide 6 text

RQ1: Typology of knowledge artifacts found on both Stack Overflow and the R-help mailing list
Questions: How-to, Set up, Bug / Error / Exception, Discrepancy, Decision help, Conceptual / Guidance, Code reviewing, Other, Non-functional, Future reference
Answers: Redirecting, Clue / Suggestion / Hint, Tutorial, Source code, Alternative, Explanation, Announcement, Benchmark, Opinion
Updates: Announcement, Expansion, Background, Correction, Explanation, Solution
Flags: Off topic / Opinion, Too localized, Not an answer, Repeated question, Unclear
Comments: Clarification, Complement / Criticism, Expansion, Correction / Alternative, External reference

Slide 7

Slide 7 text

RQ1: Typology of knowledge artifacts
(Typology diagram from Slide 6, annotated with SO % and RH % values)
SO %     RH %
20.20%   15.03%
13.01%   2.59%
24.54%   17.62%
5.33%    18.13%
4.09%    16.93%
25.15%   17.44%
0.99%    5.70%
0.62%    0.52%
6.07%    6.04%

Slide 8

Slide 8 text

RQ1: Typology of knowledge artifacts
(Typology diagram from Slide 6, annotated with SO % and RH % values)
SO %     RH %
4.40%    1.12%
12.07%   23.08%
49.10%   0.81%
18.92%   33.60%
13.54%   38.46%
1.96%    2.83%

Slide 9

Slide 9 text

RQ2: How knowledge is constructed on SO and RH
Participatory Knowledge Construction
Crowd Knowledge Construction

Slide 10

Slide 10 text

RQ2: How knowledge is constructed on SO and RH
Participatory Knowledge Construction
Crowd Knowledge Construction
There is a wide spectrum between these two types
Both channels provide support for Q&A knowledge:
➢ Crowd-based is more prevalent on Stack Overflow
➢ Participatory is more prevalent on the R-help mailing list

Slide 11

Slide 11 text

Example: Participatory knowledge construction on the R-help mailing list
Participatory: (1) previous answers are included in the current answer with clear links between them; or (2) a reply contains a direct reference to other answers or authors

Slide 12

Slide 12 text

Example: Participatory knowledge construction on Stack Overflow
Participatory: (1) one can infer a link between answers, through either a direct or indirect reference; or (2) comments complement the answer or directly cite another author

Slide 13

Slide 13 text

Example: Crowd knowledge construction on SO
Crowd-based: (1) there is no obvious collaboration; or (2) an answer is a variation of one of the other answers in the thread

Slide 14

Slide 14 text

Example: Crowd knowledge construction on R-help
Crowd-based: crowd knowledge construction occurred when different messages responded directly to the original question, rather than to another response
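The R-help criterion above can be approximated from reply headers alone. The sketch below is a rough heuristic under stated assumptions, not the study's method (the slides describe manual coding by two coders). It assumes each reply is a hypothetical `(message_id, in_reply_to)` pair, and the labels and function names are illustrative.

```python
def reply_structure(root_id, replies):
    """Count replies aimed at the original question vs. at another response.

    `replies` is a list of (message_id, in_reply_to) pairs (assumed format).
    Returns (direct, nested).
    """
    direct = sum(1 for _, parent in replies if parent == root_id)
    nested = len(replies) - direct
    return direct, nested


def label_thread(root_id, replies):
    """Heuristic label: every reply targets the question -> crowd-based."""
    if not replies:
        return "unanswered"
    _, nested = reply_structure(root_id, replies)
    # Any reply to another response hints at participatory construction.
    return "crowd-based" if nested == 0 else "participatory"


# Two independent answers to the question, no interaction between them.
print(label_thread("<q>", [("<a1>", "<q>"), ("<a2>", "<q>")]))
# The second message builds on the first answer rather than the question.
print(label_thread("<q>", [("<a1>", "<q>"), ("<a2>", "<a1>")]))
```

Because the slides describe a spectrum between the two types, the `(direct, nested)` counts may be more informative than the binary label.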

Slide 15

Slide 15 text

RQ3: Why users post to a particular channel
Why participants post on Stack Overflow:
➢ Ability to gain peer recognition
➢ It has a rich and user-friendly interface
➢ Answers are straight to the point
➢ Questions are answered faster
Why participants post on the R-help mailing list:
➢ Email format is convenient
➢ Following the mailing list provides awareness and increases learning
➢ More flexibility regarding the topics
➢ Participation from many highly experienced users
Why participants post to both channels:
➢ Find a better answer
➢ Support follow-up questions
➢ Speed up answers

Slide 16

Slide 16 text

Discussion: The impact of gamification on collaborative knowledge construction
Tausczik et al. found that collaboration on Math Overflow was diverse and fell on a spectrum between independent (crowd-based) and interdependent (participatory); the most common collaborative act was independent in nature (i.e., providing information)
It seems that SO gamification features, while effective, have the side effect of reducing collaborative knowledge creation between users

Slide 17

Slide 17 text

Discussion: Curating knowledge vs. developing knowledge
Stack Overflow excels in Q&A knowledge creation and curation when questions have to be kept for posterity. However, it also restricts discussions that may lead to better answers
In contrast, R-help allows users to develop knowledge through participation, but knowledge is not curated

Slide 18

Slide 18 text

Future work
Can Stack Overflow’s model be improved to provide better participatory knowledge construction support?
We have an upcoming paper on how participatory knowledge construction is supported by other channels, and what challenges developers face in the process
@alexeyzagalsky
Slides can be found at: speakerdeck.com/alexeyza