Leaving Behind the Software History When Transitioning to Open-Source: Reasons and Implications

Gustavo Pinto

June 11, 2018

Transcript

  1. Leaving Behind the Software History When
    Transitioning to Open-Source: Reasons and
    Implications
    @gustavopinto @igorsteinmacher @gerosa_marco

  9. The project deleted the software history and
    imported the source code all at once!

  10. Mihai Codoban, Sruti Srinivasa Ragavan, Danny Dig, and Brian Bailey.
    Software history under the lens: A study on why and how developers
    examine it. In ICSME 2015, pages 1–10, 2015.
    “Software history is indispensable for developers.
    Of the 217 developers surveyed in this work, 85% find software history
    important to their development activities and 61% need to refer to
    history at least several times a day.”
    More benefits:
    Knowledge acquisition (Pham et al., 2013)
    End-users take advantage of the software history (Kuttal et al., 2014)
    Research (co-changes, defect prediction, mining, etc.)

  11. Why would one remove the whole history?
    What are the reasons and the implications?

  12. We found 50 proprietary projects that made the shift to open source
    and deleted the history
    We could find only 8 projects that kept the history

  15. We found 50 proprietary projects that made the shift to open source
    1. Why did you decide not to keep the software history?

  16. We found 50 proprietary projects that made the shift to open source
    2. Do the core developers face any kind of problems with the lack of
    software history?

  17. We found 50 proprietary projects that made the shift to open source
    3. Do the newcomers face any kind of problems with the lack of
    software history?

  18. We found 50 proprietary projects that made the shift to open source
    4. How has the lack of software history impacted software evolution?

  19. We found 50 proprietary projects that made the shift to open source
    15 did not answer our inquiries
    41 answers in total

  25. RQ1. Why do some projects not open the software history?
    “Extracting just the subfolder would have been difficult, and older
    versions would not have built”
    “First get something working, and then disentangle it from your own
    proprietary code, configuration, etc.”
    Entangled with proprietary code
    Contains sensitive information
    Housekeeping needed
    License and legal reasons

  26. RQ1. Why do some projects not open the software history?
    “The earliest commits may contain information we cannot share, so
    upon releasing we squashed the history”
    “Going through thousands of commits means no one will take on the
    heroic task of even open-sourcing the product”
    Entangled with proprietary code
    Contains sensitive information
    Housekeeping needed
    License and legal reasons

  27. RQ1. Why do some projects not open the software history?
    “We cleaned embarrassing or inappropriate comments, brought the code
    up to OSS standards …”
    Entangled with proprietary code
    Contains sensitive information
    Housekeeping needed
    License and legal reasons

  28. RQ1. Why do some projects not open the software history?
    “Made it much easier to get the lawyers at our parent company to
    agree to open source it”
    “Instead of reviewing the entire history, they could review just the
    current state”
    Entangled with proprietary code
    Contains sensitive information
    Housekeeping needed
    License and legal reasons

  29. RQ2. What are the challenges of dealing with a history-free project?
    “None of the core developers has wanted or needed to go look back
    through the history”
    “Communication, documentation, and idiomatic expressions of the
    Python code are sufficient to maintain project coherency”

  30. RQ2. What are the challenges of dealing with a history-free project?
    “We still use the non-git system internally and can refer to history
    if we need to”
    “I’m probably the person most likely to access it, and I’d estimate
    that I use it only a few times per year”

  31. RQ2. What are the challenges of dealing with a history-free project?
    “For a fast-moving project, history from more than half a year ago is
    not particularly valuable for development”
    “I am not aware of any problems for newcomers”
    “The lack of software history does not greatly impact software
    evolution and understanding”

  32. Open challenges
    How to design tools to leverage and visualize the software history?
    How to improve tools to migrate code between repositories and to
    disentangle source code?
    How to find sensitive information in the software history?
    How to estimate the cost of releasing the history?
    When do developers need to understand the software history?
