
MSR16: How the R Community Creates and Curates Knowledge

A talk presenting our paper at the 13th International Conference on Mining Software Repositories (MSR 2016), Austin, TX, US, May 15th 2016.

Authors: Alexey Zagalsky, Carlos Gómez Teshima, Daniel M. German, Margaret-Anne Storey, Germán Poo-Caamaño

http://2016.msrconf.org/

Link to paper: http://alexeyza.com/pdf/msr2016.pdf

Alexey Zagalsky

May 15, 2016

Transcript

  1. How the R Community Creates and Curates Knowledge

    Alexey Zagalsky, Carlos Gómez Teshima, Daniel M. German, Margaret-Anne Storey, Germán Poo-Caamaño
    speakerdeck.com/alexeyza
  2. The R community

    R is an increasingly popular open source programming language.
    The R community plays an important role in knowledge creation and diffusion.
    Two particular communication channels for Q&A are Stack Overflow and the R-help mailing list.
  3. Stack Overflow vs. Mailing Lists

    Since 2010, there has been a decrease in the number of messages on R-help and an increase on Stack Overflow [Vasilescu 2014].
    Projects that migrated from mailing lists to Stack Overflow showed improvements [Squire 2015].
    One would expect traffic on the R-help mailing list to begin to fizzle as Stack Overflow's popularity increased.
    Interestingly, we found that both channels are used by the R community and both support Q&A knowledge; however, there are important differences between the two channels.
  4. RQ1: What types of knowledge artifacts are shared on Stack Overflow and the R-help mailing list within the R community?

    RQ2: How is the knowledge constructed on Stack Overflow and the R-help mailing list?
    RQ3: Why do certain users post to both Stack Overflow and the R-help mailing list?
  5. Methodology

    Phase I: Mining Archival Data
    Stack Overflow data dump files + R-help MBOX files
    September 2008 - December 2013
    400 random threads from each channel (questions, answers, comments, ...)
    2 coders
    Phase II: Qualitative Survey
    27 valid responses
    Promoted to the R community via Twitter, Reddit, R-help mailing list, and Meta Stack Exchange
    Case study
    Dataset available online: https://github.com/thechiselgroup/R-ML-and-StackOverflow
  6. RQ1: Typology of knowledge artifacts found on both Stack Overflow and the R-help mailing list

    [Typology diagram; labels in extraction order:] How-to Set up Bug / Error / Exception Discrepancy Questions Decision help Conceptual / Guidance Code reviewing Other Non-functional Future reference Redirecting Clue / Suggestion / Hint Tutorial Source code Answers Alternative Explanation Announcement Benchmark Opinion Announcement Expansion Background Correction Updates Explanation Solution Off topic / Opinion Too localized Not an answer Repeated question Flags Unclear Clarification Complement / Criticism Expansion Correction / Alternative Comments External reference
  7. RQ1: Typology of knowledge artifacts

    [Typology diagram repeated from slide 6, with added SO % and RH % columns. Values in extraction order, as SO % / RH % pairs:]
    20.20% / 15.03%, 13.01% / 2.59%, 24.54% / 17.62%, 5.33% / 18.13%, 4.09% / 16.93%, 25.15% / 17.44%, 0.99% / 5.70%, 0.62% / 0.52%, 6.07% / 6.04%
  8. RQ1: Typology of knowledge artifacts

    [Typology diagram repeated from slide 6, with added SO % and RH % columns. Values in extraction order, as SO % / RH % pairs:]
    4.40% / 1.12%, 12.07% / 23.08%, 49.10% / 0.81%, 18.92% / 33.60%, 13.54% / 38.46%, 1.96% / 2.83%
  9. RQ2: How knowledge is constructed on SO and RH

    Participatory Knowledge Construction
    Crowd Knowledge Construction
  10. RQ2: How knowledge is constructed on SO and RH

    Participatory Knowledge Construction
    Crowd Knowledge Construction
    There is a wide spectrum between these two types.
    Both channels provide support for Q&A knowledge:
    ➢ Crowd-based is more prevalent on Stack Overflow
    ➢ Participatory is more prevalent on the R-help mailing list
  11. Example: Participatory knowledge construction on the R-help mailing list

    Participatory: (1) previous answers are included in the current answer with clear links between them; or (2) a reply contains a direct reference to other answers or authors
  12. Example: Participatory knowledge construction on Stack Overflow

    Participatory: (1) one can infer a link between answers, through either a direct or indirect reference; or (2) comments complement the answer or directly cite another author
  13. Example: Crowd knowledge construction on SO

    Crowd-based: (1) there is no obvious collaboration; or (2) an answer is a variation of one of the other answers in the thread
  14. Example: Crowd knowledge construction on R-help

    Crowd-based: crowd knowledge construction occurred when different messages responded directly to the original question, rather than to another response
  15. RQ3: Why users post to a particular channel

    Why participants post on Stack Overflow
    ➢ Ability to gain peer recognition
    ➢ It has a rich and user-friendly interface
    ➢ Answers are straight to the point
    ➢ Questions are answered faster
    Why participants post on the R-help mailing list
    ➢ Email format is convenient
    ➢ Following the mailing list provides awareness and increases learning
    ➢ More flexibility regarding the topics
    ➢ Participation from many highly experienced users
    Why participants post to both channels
    ➢ Find a better answer
    ➢ Support follow-up questions
    ➢ Speed up answers
  16. Discussion: the impact of gamification on collaborative knowledge construction

    Tausczik et al. found that collaboration on Math Overflow was diverse and fell on a spectrum between independent (crowd-based) and interdependent (participatory), and that the most common collaborative act was of an independent nature (i.e., providing information).
    It seems that SO gamification features, while effective, have the side effect of reducing collaborative knowledge creation between users.
  17. Discussion: Curating knowledge vs. developing knowledge

    Stack Overflow excels in Q&A knowledge creation and curation when questions have to be kept for posterity; however, it also restricts discussions that may lead to better answers.
    In contrast, R-help allows users to develop knowledge through participation, but knowledge is not curated.
  18. Future work

    Can Stack Overflow’s model be improved to provide better participatory knowledge construction support?
    We have an upcoming paper on how participatory knowledge construction is supported by other channels, and what challenges developers face in the process.
    @alexeyzagalsky
    Slides can be found at: speakerdeck.com/alexeyza
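
Slide 14 describes crowd knowledge construction on R-help as messages that respond directly to the original question rather than to another response. A minimal Python sketch of that single signal, assuming each message carries identifiers in the style of the Message-ID and In-Reply-To email headers, might look like the following. The `Message` class, the `classify_thread` function, and the returned labels are hypothetical names introduced for illustration. The threads in the study were coded manually by two coders using broader criteria (slides 11 to 13), so this is not the authors' method, only a rough structural heuristic.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Message:
    """One email in a thread (hypothetical model of Message-ID / In-Reply-To)."""
    message_id: str
    in_reply_to: Optional[str] = None  # None marks the original question


def classify_thread(messages: Sequence[Message]) -> str:
    """Label a thread by its reply structure alone (an assumption, not the paper's coding)."""
    roots = [m for m in messages if m.in_reply_to is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one original question per thread")
    root_id = roots[0].message_id

    replies = [m for m in messages if m.in_reply_to is not None]
    if not replies:
        return "unanswered"
    # Every reply targets the question itself: no reply builds on another reply.
    if all(m.in_reply_to == root_id for m in replies):
        return "crowd-based"
    # At least one reply responds to another response.
    return "participatory"


if __name__ == "__main__":
    flat = [Message("q"), Message("a1", "q"), Message("a2", "q")]
    chained = [Message("q"), Message("a1", "q"), Message("a2", "a1")]
    print(classify_thread(flat))     # crowd-based
    print(classify_thread(chained))  # participatory
```

Reply structure is only a proxy: a message that answers the question directly can still quote or cite an earlier answer, which slide 11 treats as participatory, so a heuristic like this would at best pre-sort threads for manual coding.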