MSR16: How the R Community Creates and Curates Knowledge

A talk presenting our paper at the 13th International Conference on Mining Software Repositories (MSR 2016), Austin, TX, US, May 15th 2016.

Authors: Alexey Zagalsky, Carlos Gómez Teshima, Daniel M. German, Margaret-Anne Storey, Germán Poo-Caamaño

http://2016.msrconf.org/

Link to paper: http://alexeyza.com/pdf/msr2016.pdf

Alexey Zagalsky

May 15, 2016

Transcript

  1. How the R Community
    Creates and Curates Knowledge
    Alexey Zagalsky, Carlos Gómez Teshima, Daniel M. German,
    Margaret-Anne Storey, Germán Poo-Caamaño
    speakerdeck.com/alexeyza

  2. The R community
    R is an increasingly popular open source programming
    language
    The R community plays an important role in knowledge
    creation and diffusion
    Two particular communication channels for Q&A are Stack
    Overflow and the R-help mailing list

  3. Stack Overflow vs. Mailing Lists
    Since 2010, there has been a decrease in the number of
    messages on R-help and an increase on Stack Overflow
    [Vasilescu 2014]
    Projects that migrated from mailing lists to Stack Overflow
    showed improvements [Squire 2015]
    One would expect that traffic on the R-help mailing list
    would begin to fizzle as Stack Overflow popularity
    increased
    Interestingly, we found that both channels are used by the
    R community and both support Q&A knowledge; however,
    there are important differences between the two channels

  4. RQ1: What types of knowledge artifacts are
    shared on Stack Overflow and the R-help
    mailing list within the R community?
    RQ2: How is the knowledge constructed on
    Stack Overflow and the R-help mailing list?
    RQ3: Why do certain users post to both Stack
    Overflow and the R-help mailing list?

  5. Methodology
    Phase I: Mining Archival Data
    Stack Overflow data dump files + R-help MBOX files
    September 2008 - December 2013
    400 random threads from each channel (questions, answers, comments, ...)
    2 coders
    Phase II: Qualitative Survey
    27 valid responses
    Promoted to the R community via Twitter, Reddit, the R-help mailing
    list, and Meta Stack Exchange
    Case study
    Dataset available online: https://github.com/thechiselgroup/R-ML-and-StackOverflow
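
The Phase I sampling step (grouping R-help MBOX messages into threads, then drawing 400 threads at random) could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, threads are reconstructed from the standard `References` / `In-Reply-To` headers, and the fixed seed is only there to make the draw repeatable.

```python
import mailbox
import random
from collections import defaultdict


def thread_root(msg):
    """Return the Message-ID treated as the thread root for one message.

    Assumption: the first entry of References is the thread root. If
    References is missing we fall back to In-Reply-To (which may only be
    the direct parent), and finally to the message's own ID.
    """
    refs = (msg.get("References") or "").split()
    if refs:
        return refs[0]
    parent = (msg.get("In-Reply-To") or "").strip()
    return parent or (msg.get("Message-ID") or "").strip()


def group_threads(messages):
    """Group messages (anything with a dict-like .get) by thread root."""
    threads = defaultdict(list)
    for msg in messages:
        threads[thread_root(msg)].append(msg)
    return dict(threads)


def sample_threads(threads, k=400, seed=0):
    """Draw up to k threads uniformly at random, without replacement."""
    rng = random.Random(seed)
    roots = sorted(threads)  # stable order so the seed is meaningful
    chosen = rng.sample(roots, min(k, len(roots)))
    return {root: threads[root] for root in chosen}


def load_mbox(path):
    """Read every message from an MBOX file using the standard library."""
    return list(mailbox.mbox(path))
```

With a local archive file, `sample_threads(group_threads(load_mbox(path)))` would yield the sampled threads; the path and any date filtering to the September 2008 - December 2013 window are left to the reader.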

  6. RQ1: Typology of knowledge artifacts found on both Stack Overflow and
    the R-help mailing list
    [Figure: typology diagram with five artifact groups: Questions, Answers,
    Updates, Flags, Comments. Category labels:]
    How-to; Set up; Bug / Error / Exception; Discrepancy; Decision help;
    Conceptual / Guidance; Code reviewing; Other; Non-functional;
    Future reference; Redirecting; Clue / Suggestion / Hint; Tutorial;
    Source code; Alternative; Explanation; Announcement; Benchmark; Opinion;
    Announcement; Expansion; Background; Correction; Explanation; Solution;
    Off topic / Opinion; Too localized; Not an answer; Repeated question;
    Unclear; Clarification; Complement / Criticism; Expansion;
    Correction / Alternative; External reference

  7. RQ1: Typology of knowledge artifacts
    [Figure: the typology diagram from slide 6, annotated with per-category
    percentages for Stack Overflow (SO) and the R-help mailing list (RH)]
    SO %     RH %
    20.20%   15.03%
    13.01%   2.59%
    24.54%   17.62%
    5.33%    18.13%
    4.09%    16.93%
    25.15%   17.44%
    0.99%    5.70%
    0.62%    0.52%
    6.07%    6.04%
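
Each SO % and RH % column sums to 100%, so the figures appear to be per-channel shares of coded artifacts by category. A minimal sketch of that arithmetic, assuming the coded labels are available as a plain list; the function name and the toy labels in the usage line are hypothetical and are not the study's data:

```python
from collections import Counter


def category_shares(labels):
    """Return {category: percentage of all labels}, rounded to two decimals."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cat: round(100.0 * n / total, 2) for cat, n in counts.items()}
```

For example, `category_shares(["How-to", "How-to", "Other", "Discrepancy"])` gives 50.0 for "How-to" and 25.0 for each of the other two labels. Running it once per channel would produce the two columns.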

  8. RQ1: Typology of knowledge artifacts
    [Figure: the typology diagram from slide 6, annotated with per-category
    percentages for Stack Overflow (SO) and the R-help mailing list (RH)]
    SO %     RH %
    4.40%    1.12%
    12.07%   23.08%
    49.10%   0.81%
    18.92%   33.60%
    13.54%   38.46%
    1.96%    2.83%

  9. RQ2: How knowledge is constructed on SO and RH
    Participatory
    Knowledge Construction
    Crowd
    Knowledge Construction

  10. RQ2: How knowledge is constructed on SO and RH
    Participatory
    Knowledge Construction
    Crowd
    Knowledge Construction
    There is a wide spectrum between these two types
    Both channels provide support for Q&A knowledge:
    ➢ Crowd-based is more prevalent on Stack Overflow
    ➢ Participatory is more prevalent on the R-help mailing
    list

  11. Example: Participatory knowledge construction
    on the R-help mailing list
    (1) previous answers are included in the current answer with clear
    links between them; or (2) a reply contains a direct reference to other
    answers or authors
    Participatory

  12. Example: Participatory knowledge construction
    on Stack Overflow
    (1) one can infer a link between answers, through either a direct or
    indirect reference; or (2) comments complement the answer or
    directly cite another author
    Participatory

  13. Example: Crowd knowledge construction on SO
    (1) there is no obvious collaboration; or (2) an answer is a variation of
    one of the other answers in the thread
    Crowd-based

  14. Example: Crowd knowledge construction on R-help
    Crowd knowledge construction
    occurred when different
    messages responded directly
    to the original question, rather
    than to another response
    Crowd-based
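
The reply-structure criterion described on this slide can be approximated mechanically. The sketch below is a heuristic under stated assumptions, not the study's procedure (the sampled threads were coded by hand): it labels a thread crowd-based when every reply points at the original question, and participatory when at least one reply responds to another response. The function name and the `(message_id, in_reply_to)` pair layout are hypothetical.

```python
def classify_thread(root_id, replies):
    """Label a thread by its reply structure.

    root_id: Message-ID of the original question.
    replies: iterable of (message_id, in_reply_to) pairs, one per reply.
    """
    replies = list(replies)
    if not replies:
        return "no-replies"
    # Crowd-based: all replies answer the original question directly.
    if all(parent == root_id for _, parent in replies):
        return "crowd-based"
    # Otherwise at least one reply builds on another response.
    return "participatory"
```

A thread where `<b>` and `<c>` both reply to `<a>` would come out as crowd-based, while one where `<c>` replies to `<b>` would come out as participatory. Quoted text and in-body references to other authors, which the participatory definition also relies on, are not captured by headers alone.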

  15. RQ3: Why users post to a particular channel
    Why participants post on Stack Overflow
    Ability to gain peer recognition
    It has a rich and user-friendly interface
    Answers are straight to the point
    Questions are answered faster
    Why participants post on the R-help mailing list
    Email format is convenient
    Following the mailing list provides awareness and increases learning
    More flexibility regarding the topics
    Participation from many highly experienced users
    Why participants post to both channels
    Find a better answer
    Support follow-up questions
    Speed up answers

  16. Discussion: the impact of gamification on collaborative
    knowledge construction
    Tausczik et al. found that collaboration on Math Overflow
    was diverse and fell on a spectrum between independent
    (crowd-based) and interdependent (participatory), and that the
    most common collaborative act was of an independent
    nature (i.e., providing information)
    SO's gamification features, while effective, seem to
    have the side effect of reducing collaborative knowledge
    creation between users

  17. Discussion: Curating knowledge vs. developing
    knowledge
    Stack Overflow excels at Q&A knowledge creation and
    curation when questions have to be kept for posterity;
    however, it also restricts discussions that may lead to
    better answers
    In contrast, R-help allows users to develop knowledge
    through participation, but knowledge is not curated

  18. Future work
    Can Stack Overflow’s model be improved to provide better
    participatory knowledge construction support?
    We have an upcoming paper on how participatory
    knowledge construction is supported by other channels,
    and what challenges developers face in the process
    @alexeyzagalsky
    Slides can be found at:
    speakerdeck.com/alexeyza
