
Open Source Software Resilience and Epistemic Analysis


Apostolos Kritikos

October 06, 2023


Transcript

  1. Open Source Software Resilience
    and Epistemic Analysis
    Apostolos Kritikos
    PhD Candidate
    School of Informatics
    Aristotle University of Thessaloniki
    Supervisor: Prof. Ioannis Stamelos
    A “glimpse to the future” from the 1820s

  2. AGENDA
    ● Introduction
    ● State of the Art
    ● Research Plan
    ● Open Source Software Resilience Framework (OSSRF)
    ● Metrics Aggregation
    ● Epistemic Network Analysis (ENA) in Open Source Software
    ● Applications of Open Source Software Resilience
    ● Threats to Validity
    ● Conclusions & Future Work
    ● Acknowledgements & Funding received
    Reference numbers [x] in this presentation follow the
    numbering of the thesis manuscript.


  3. INTRODUCTION
    ● Open Source Software (OSS) evolution
    ● Two decades of growth since the Free/Libre Open Source Software movement.
    ● Attracted global developers and major industry players.
    ● Example: IBM's acquisition of RedHat.
    ● Numerous software evaluation models exist.
    ● Some of them specifically designed for Open Source Software.
    ● Yet no universal model exists to date.
    ● Can Urban Resilience be mapped to Software Resilience?
    ● City Resilience Framework (CRF), introduced by the Arup Institute and the Rockefeller Foundation
    while Thessaloniki was participating in the 100 Resilient Cities program.
    ● Cities and Software share conceptual similarities.
    ● Epistemic Network Analysis can unlock structured knowledge from an Open Source Software
    project
    ● Practical Applications of Software Resilience in Open Source Software


  4. STATE OF THE ART
    The City Resilience Framework (CRF)
    (source: resilientcitiesnetwork.org)
    City Resilience Framework (CRF) Overview
    ● Developed by Arup Institute and Rockefeller Foundation.
    ● Objective: Define Urban Resilience in a dynamic, evolving
    world.
    ● Set of indicators for cities to measure resilience
    performance.
    ● Not a ranking or comparison tool but a self-improvement
    framework.
    ● Assists cities in leveraging resources for resilience.
    CRF Dimensions & Goals for City Resilience
    ● Health & Well-being: People in the city.
    ● Economy & Society: City's socio-economic organization.
    ● Infrastructure & Environment: City's infrastructure and
    ecosystems.
    ● Leadership & Strategy: Knowledge and future adaptability.


  5. STATE OF THE ART
    Literature related to OSS success factors and OSS assessment
    and evaluation between 1977 and 2023
    Factors for OSS Success (literature)
    ● Quality models comparison between 1977-2013 by
    Miguel et al. [75]:
    ● Basic holistic models vs. tailor-made
    models for component evaluation.
    ● Inclusion of OSS-specific models.
    ● Comprehensive review of OSS project selection
    models since 2019 [61]:
    ● Importance of both numerical and
    qualitative criteria in OSS evaluation [98].
    ● IT managers prioritize long-term
    maintenance (at least 10 years) for OSS
    adoption.
    ● Evaluation should be consistent, especially
    after major releases, to track resilience
    evolution.
    ● License type and permissiveness [74].
    ● Community aspects: active developers, end users,
    and project localization [74].
    ● Governance of the project [76].
    ● Structural quality: a dominant factor studied
    extensively.
    ● Generic models: ISO25010 [43].
    ● OSS-specific models: OpenBRR [97].
    ● Social network analysis [92] and community
    maturity [30].


  6. STATE OF THE ART
    ● Evaluation in OSS
    ● Primary Focus: Software quality.
    ● Limited exploration: Software health & trust.
    ● Software Health
    ● Connected to the longevity of OSS [30].
    ● Typically viewed within project scope [30].
    ● OSS health linked to social aspects in CHAOSS
    metrics [3].
    ● Resilience Definitions
    ● City Resilience: Ability of a system to cope with
    change [102].
    ● CRF: Capacity of cities to function through
    stresses or shocks, emphasizing vulnerable
    populations [41].
    ● Software: Can recover from hits to critical
    components [31].
    Open Source Software: The Concepts of Quality, Health, and Resilience
    ● OSS Dynamics and Resilience
    ● OSS faces changes in
    technology, governance, and
    community dynamics.
    ● CRF's resilience definition
    resonates with the OSS
    domain's need to endure
    stresses and shocks.


  7. STATE OF THE ART
    Stressors and crises in Open Source Software
    ● Potential Crises in OSS Life Cycle
    ● Developer or user base loss to competition.
    ● Unsuccessful major releases.
    ● Project forks or migrations.
    ● New competitive software emergence.
    ● Hostile actions by commercial rivals.
    ● Technology evolution mismatches.
    ● Project sustainability concerns.
    ● Examples of OSS Facing Stressors
    ● (2010) OpenOffice to LibreOffice Transition: Oracle's
    acquisition of Sun triggers OpenOffice developer
    community concerns.
    ● (2023) Core-js Sustainability Crisis: Sustainability
    concerns due to project depending heavily (if not solely)
    on its founder [6, 7].


  8. STATE OF THE ART
    Open Source Software and deep learning via Epistemic Analysis
    ● Deep learning involves understanding and adopting ways of thinking to explore and solve
    problems.
    ● Data science tools can analyze large volumes of data, but understanding deep learning requires
    more than just data analysis.
    ● Need to understand learning within the systems context.
    ● Qualitative Research:
    ○ Focuses on "Thick Description": understanding events in specific contexts.
    ○ Uses rich data on individual participants to find consistent interpretations.
    ○ Requires understanding of culture to interpret data meaningfully.
    ● Quantitative Research:
    ○ Relies on sampling from a larger population.
    ○ Concerned with systematic data collection and unbiased sample selection.
    ○ Seeks to generalize findings from a sample to a larger population.
    ● Epistemic Network Analysis (ENA): examines how codes are interrelated in discourse.


  9. RESEARCH PLAN & LIST OF PUBLICATIONS
    ● Literature review
    ● Adaptation of the City Resilience
    Framework (CRF) to Open Source
    Software (OSS)
    ● Creation of Tool to Aggregate Metrics
    ● Application of the Resilience
    Framework to OSS Projects
    ● Other Applications:
    ○ Blockchain Rewards for OSS
    Contributors
    ● Epistemic Analysis in Software
    Engineering
    1. Apostolos Kritikos, Konstantina Papadopoulou, Ioannis Stamelos
    “Applying Epistemic Network Analysis to the discussions between
    Software Engineers in Open Source Software”, Journal of Software
    Engineering and Knowledge Engineering. IF: 1.007. (under review)
    2. Apostolos Kritikos, Ioannis Stamelos, “A resilience-based framework for
    assessing the evolution of open source software projects”, Journal of
    Software: Evolution & Processes, DOI: 10.1002/smr.2597. IF: 1.864
    3. Apostolos Kritikos, Prodromos Polychroniadis, Ioannis Stamelos,
    “Source-o-grapher: A tool towards the investigation of software resilience
    in Open Source Software projects”, SoftwareX, DOI: j.softx.2023.101337.
    IF: 2.868
    4. Apostolos Kritikos, Theodoros Venetis, Ioannis Stamelos “An Empirical
    Investigation of Sentiment Analysis of the Bug Tracking Process in Libre
    Office Open Source Software”. In IFIP International Conference on Open
    Source Systems 2020, 36-46.
    5. Apostolos Kritikos, Ioannis Stamelos “Open Source Software Resilience
    Framework”. In 14th International Conference on Open Source Systems
    (OSS2018), 8 – 10 June 2018, Athens, Greece.


  10. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK (OSSRF)
    ● Adapted from City Resilience
    Framework respecting the original
    structure.
    ● 4 dimensions → 12 goals → 48
    indicators.
    ● Not meant to be used as:
    ○ A ranking tool between OSS
    projects.
    ○ A tool to provide a resilience
    score for a project.
    ● Our intention using OSSRF is:
    ○ To approach resilience from
    different aspects (dimensions),
    as the project evolves from a
    major release to the next.
    ○ To compare and contrast
    resilience trends between two or
    more OSS projects.


  11. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK

  12. Indicators
    OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK
    ● Quantitative: These metrics are usually
    calculated with the help of tools.
    ○ Numerical
    ○ Boolean
    ○ Percentages
    ● Qualitative: These metrics are designed to
    be provided by experts with knowledge of
    the software domain.
    ○ Likert Scale
    As Wasserman states in [98], it is important
    for OSS evaluation models to include, apart
    from numerical scores and metrics, qualitative
    criteria as well.


  13. Resilience determination mechanism
    OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK
    [Diagram: Indicators #1 … #48 → Goals #1 … #12 → Dimensions #1 … #4 → Final Resilience Assessment]
    ● Goal score: average resilience indicators' score.
    ● Dimension score: average resilience goals' score (one per dimension).
    ● Final Resilience Assessment: Resilient if >= 50%, Non-Resilient if < 50%.
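
The averaging chain on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool. The indicator and goal identifiers, the tiny structure, and the scores are all hypothetical. Averaging the dimension scores into one final figure is also an assumption, since the slide only states that goals and dimensions are averages and that 50% is the threshold.

```python
from statistics import mean

# Hypothetical indicator scores, already normalized to the range 0..1.
indicator_scores = {
    "I01": 0.75, "I02": 0.5,   # feed goal G01
    "I03": 1.0,  "I04": 0.0,   # feed goal G02
    "I05": 0.25, "I06": 0.25,  # feed goal G03
}
# Hypothetical structure (the real model has 48 indicators, 12 goals, 4 dimensions).
goals = {"G01": ["I01", "I02"], "G02": ["I03", "I04"], "G03": ["I05", "I06"]}
dimensions = {"D01": ["G01", "G02"], "D02": ["G03"]}


def assess(indicator_scores, goals, dimensions, threshold=0.5):
    """Average indicators into goals and goals into dimensions, then apply the threshold."""
    goal_scores = {g: mean([indicator_scores[i] for i in ids]) for g, ids in goals.items()}
    dim_scores = {d: mean([goal_scores[g] for g in gs]) for d, gs in dimensions.items()}
    # Assumption: the final figure is the unweighted mean of the dimension scores.
    overall = mean(list(dim_scores.values()))
    verdict = "Resilient" if overall >= threshold else "Non-Resilient"
    return goal_scores, dim_scores, overall, verdict


goal_scores, dim_scores, overall, verdict = assess(indicator_scores, goals, dimensions)
```

Keeping the goal and dimension scores in the return value, rather than only the verdict, matches the stated intent of looking at resilience per dimension instead of producing a single ranking score.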


  14. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK
    Sensitivity & Veto Principles investigation
    ● Our assessment model in its current version
    is unweighted, which means that all
    indicators contribute equally to the
    decision on whether an assessment
    concludes with a resilient or non-resilient
    result.
    ● The fact that 14 of our model’s indicators are
    boolean led to a concern that a single
    value of a single indicator (in the absence
    of weights) might be able to independently
    determine the decision of our assessment
    model.
    ● To address that concern we conducted a
    one-factor-at-a-time sensitivity analysis.
    ● Factors with high sensitivity: the only factor
    that shows high sensitivity is Testing Process
    (I08). More specifically, if this boolean factor takes
    the value 1 (true), it significantly increases the
    resilience score for the Source Code Dimension
    (D01). We have added this finding to our
    limitations and threats to validity section.
    ● No indicators in our model
    function as veto principles: apart from the
    one-factor-at-a-time sensitivity analysis with
    baseline values, we repeated the analysis on
    sets of indicator values that lead to a resilient
    and a non-resilient project respectively. This way
    we checked that a single indicator
    cannot independently alter the result of our
    model, assessing a non-resilient project as
    resilient or vice versa.
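
A one-factor-at-a-time analysis and the veto check described on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' procedure: `score` is a stand-in that takes the unweighted mean of all indicators rather than the full goal/dimension aggregation, the identifiers other than I08 are hypothetical, and every value is invented for the example.

```python
from statistics import mean


def score(indicators):
    """Stand-in for the full model: unweighted mean of normalized indicator scores."""
    return mean(list(indicators.values()))


def oat_sensitivity(baseline, candidates):
    """Vary one indicator at a time over its candidate values and record the
    largest absolute change in the score relative to the baseline."""
    base = score(baseline)
    effects = {}
    for name, values in candidates.items():
        effects[name] = max(abs(score({**baseline, name: v}) - base) for v in values)
    return base, effects


def has_veto(baseline, candidates, threshold=0.5):
    """True if changing a single indicator flips the resilient / non-resilient verdict."""
    verdict = score(baseline) >= threshold
    for name, values in candidates.items():
        for v in values:
            if (score({**baseline, name: v}) >= threshold) != verdict:
                return True
    return False


# Hypothetical baseline: I08 is boolean (0 or 1); IA, IB, IC are placeholder indicators.
baseline = {"I08": 1.0, "IA": 0.75, "IB": 0.75, "IC": 0.75}
candidates = {"I08": [0.0, 1.0], "IA": [0.25, 0.5, 0.75]}

base, effects = oat_sensitivity(baseline, candidates)
veto_found = has_veto(baseline, candidates)
```

In this toy setup the boolean indicator moves the score more than the graded one, yet no single change crosses the 50% threshold, which is the distinction the slide draws between a highly sensitive indicator and a veto.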


  15. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK
    Applying Resilience Framework on Open Source Software: Resilient and Non Resilient Projects
    ● We applied the OSSRF assessment model to 5 consecutive versions of 3 intuitively resilient and 3
    intuitively non-resilient projects.
    ● We selected these projects to show that our model can successfully distinguish between
    resilient and non-resilient projects as they evolve over time (hence the 5 consecutive versions).
    ● For the following assessments, in the absence of experts, we set the qualitative indicators
    to the value 3 for the intuitively resilient OSS projects; for the non-resilient
    projects we applied a small penalty, resulting in a value of 2.


  16. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK
    Applying Resilience Framework on Open Source Software: Resilient and Non Resilient Projects
    A note on the selection of Qualitative Indicators
    ● In the absence of an expert, and in order to
    keep the experiment as unbiased as possible,
    we use the average value (3) for the
    aforementioned indicators for the resilient
    group of projects, expecting that the
    non-average values will highlight the resilience of
    the project.
    ● For the non-resilient projects, we adopt the
    value (2) for the qualitative indicators. The
    reason is that, percentage-wise, a value of 3
    gives each qualitative indicator a
    60% score, boosting the
    average above 50%.
    ● Since most of the non-resilient projects have a
    lifespan of 2 years, little activity, and a small
    contributor community, we believe that, without
    loss of generality, we can inject a small penalty
    to qualitative indicators such as robustness,
    scalability, usability and so forth.
    ● To verify our decision we conducted interviews with 5
    experts.
    ● We presented the 6 aforementioned projects as seen in
    Section 7 to the experts (identifying them as resilient
    and non resilient which is exactly the way we ran our
    tests for this scientific work) and we presented them
    with the definition of resilience as adopted from the
    CRF for the purposes of this manuscript.
    ● We also presented to them the qualitative indicators
    (and their definitions) as defined in this work. Then we
    asked them to independently provide, in their expert
    opinion, the appropriate values for the qualitative
    indicators (scoring them from 1 to 5, following the Likert
    scale).
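
The 60% figure mentioned on this slide is consistent with expressing a Likert value as a fraction of the scale maximum. The helper below is a small sketch of that arithmetic. The function name and the division by the scale maximum are assumptions made for illustration; the slides do not spell out the normalization.

```python
def likert_to_percent(value, scale_max=5):
    """Express a Likert rating (1..scale_max) as a percentage of the maximum.

    Assumption: the percentage is value / scale_max; the source does not state the formula.
    """
    if not 1 <= value <= scale_max:
        raise ValueError(f"Likert value must be between 1 and {scale_max}")
    return 100 * value / scale_max


neutral = likert_to_percent(3)    # value used for the intuitively resilient projects
penalized = likert_to_percent(2)  # value used for the non-resilient projects
```

Under this reading, the neutral value sits above the 50% threshold and the penalized value sits below it, which is why the choice of default matters for projects close to the boundary.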


  17. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK
    Applying Resilience Framework on Open Source Software: Resilient and Non Resilient Projects
    A note on the selection of Qualitative Indicators
    For the resilient projects, our experts gave an
    average score of 4 for all indicators, with the exception of
    Security (I07), which got an average of 5. This
    confirms that using the median value (3) in
    our tests was more conservative than an
    expert's rating would probably be.
    For the non-resilient projects, our experts gave an
    average score of 2 for all indicators, with the exception of the
    Scalability Indicator (I02), which scored an
    average of 1. This confirms that, for the
    qualitative indicators, it was reasonable to
    inject the penalty we chose.


  18. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK
    Applying Resilience Framework on Open Source Software: Resilient and Non Resilient Projects


  19. INDICATORS AGGREGATION
    ● Metrics aggregation is a crucial part of our
    Open Source Software Resilience Model.
    ● As already mentioned, the model
    uses 48 indicators, and calculating
    some of them requires the user to
    extract data from different tools and
    applications (a code repository, the
    website of the project, its issue tracker and
    so forth).
    ● Another interesting aspect is the challenge
    of applying the Open Source Software
    Resilience model to projects written in different
    programming languages and / or hosted on
    different code repositories.


  20. INDICATORS AGGREGATION
    Input:
    ● CSV based (for manual analysis)
    ● GUI based (for semi-automated analysis)
    Output:
    ● Resilience analysis (text)
    ● Goals scores represented as spider chart
    ● Dimensions scores represented as bar chart
    Integrations:
    ● The tool is currently integrated with GitHub
    ● The tool currently supports analysis for PHP
    projects utilizing the PHPMetrics library
    The tool is available as open source software under
    the MIT license.


  21. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE
    ● A scientific framework consists of five
    fields: Epistemology, Identity, Knowledge,
    Skills, and Values.
    ● For each field of the scientific framework,
    certain codes are defined. These codes
    were determined using the Software
    Engineering Body of Knowledge
    (SWEBOK) [35], which describes
    generally accepted knowledge about
    software engineering and is freely
    available online.


  22. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE
    ● The open source projects selected for the
    analysis are OpenOffice and LibreOffice.
    ● The dialogue lines were first recorded in
    a .csv file, categorized by project and
    dialogue/bug number. Subsequently, all
    the codes of the scientific framework
    were added as columns. For each
    dialogue line, we set the value one (1) in
    the cells of the codes for which a
    conceptual correlation was identified,
    and zero (0) in those for which it was
    not, as shown in the table on the
    right of the slide.
    ● The networks were visualized
    using an online tool called
    ENA.
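
The table layout described on this slide, one row per dialogue line with a 0/1 column per code, can be sketched with Python's standard `csv` module. The column names Project, Bug, and Username come from the experiment slides. The `Line` column, the code names, and the sample rows are hypothetical placeholders, not data from the study.

```python
import csv
import io

META = ["Project", "Bug", "Username", "Line"]
# Hypothetical code columns; the real codes were derived from SWEBOK.
CODES = ["Knowledge.Testing", "Skills.Debugging", "Values.Quality"]


def encode_row(project, bug, username, line, present_codes):
    """Build one table row: metadata plus 1 for each code judged present, else 0."""
    row = {"Project": project, "Bug": bug, "Username": username, "Line": line}
    for code in CODES:
        row[code] = 1 if code in present_codes else 0
    return row


def to_csv(rows):
    """Serialize the coded rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=META + CODES)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Placeholder rows (invented for illustration only).
rows = [
    encode_row("LibreOffice", "1_1", "user_a", "placeholder text 1", {"Knowledge.Testing"}),
    encode_row("LibreOffice", "1_1", "user_b", "placeholder text 2",
               {"Skills.Debugging", "Values.Quality"}),
]
csv_text = to_csv(rows)
```

A flat binary table like this is easy to fill in by hand and to check afterwards, which suits a manual coding pass over bug discussions.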


  23. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE
    The ENA WebKit performs two main
    functions:
    1. It processes encoded data:
    a. Takes the data table
    b. Divides the lines into stanzas
    c. Accumulates codes per stanza
    d. Generates a set of adjacency
    matrices
    e. Creates an aggregated adjacency
    matrix representing the
    connections between encoded
    objects for each unit of analysis
    f. Produces dimensionality reduction
    for data representation
    2. It uses the results of this analysis to
    generate visualizations that facilitate
    data exploration and interpretation.
    Initially, the selection was made for:
    Units, Conversation, Stanza Window,
    Codes, Optional Comparison
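
Steps a to e above can be approximated with a short accumulation routine. This is a simplified reading of moving-window accumulation, not the ENA WebKit's implementation. For each line it connects the codes on that line to the codes found anywhere in the window, which is that line plus the preceding lines of the same conversation, and it sums those binary connections per unit. The dimensionality-reduction step is omitted. The function name, the single-letter code names, and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import groupby


def accumulate(rows, codes, unit_key, conversation_key, window=4):
    """Count code co-occurrences per unit using a moving stanza window.

    rows: dicts in dialogue order, already sorted so that each conversation is contiguous.
    Returns {unit: Counter({(code_a, code_b): count})} with code_a < code_b.
    """
    def conversation_of(row):
        return tuple(row[k] for k in conversation_key)

    networks = defaultdict(Counter)
    for _, group in groupby(rows, key=conversation_of):
        lines = list(group)
        for i, row in enumerate(lines):
            current = {c for c in codes if row[c]}
            stanza = lines[max(0, i - window + 1): i + 1]
            in_window = {c for c in codes for r in stanza if r[c]}
            # Binary adjacency for this line: each connected pair counts once.
            pairs = {tuple(sorted((a, b))) for a in current for b in in_window if a != b}
            unit = tuple(row[k] for k in unit_key)
            networks[unit].update(pairs)
    return networks


# Placeholder data: K, S, V stand for three hypothetical codes.
rows = [
    {"Project": "LibreOffice", "Bug": "1_1", "Username": "u1", "K": 1, "S": 0, "V": 0},
    {"Project": "LibreOffice", "Bug": "1_1", "Username": "u2", "K": 0, "S": 1, "V": 0},
    {"Project": "LibreOffice", "Bug": "1_1", "Username": "u1", "K": 0, "S": 0, "V": 1},
    {"Project": "OpenOffice", "Bug": "2_1", "Username": "u3", "K": 1, "S": 1, "V": 0},
]
networks = accumulate(rows, ["K", "S", "V"],
                      unit_key=("Project", "Username"),
                      conversation_key=("Project", "Bug"),
                      window=2)
```

The unit and conversation columns are passed as parameters because the experiments on the following slides vary exactly those choices. The values used here are only for the toy example.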


  24. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE
    Experiment 1: Comparison of the two
    projects
    The parameter choices for this
    experiment are the following:
    ● As units, the columns "Project" and
    "Username".
    ● As conversation, the columns
    "Username", "Project", and "Bug".
    ● As stanza window, a moving window of
    four lines.
    ● All codes from the .csv file were
    selected as codes.
    ● As comparison, the "Project" column.
    It is evident that LibreOffice
    has stronger connections
    than OpenOffice, and its
    connections extend across
    more domains of the
    epistemic frame.
    Consequently, we could say
    that LibreOffice exhibited
    more scientific dialogues.


  25. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE
    Experiment 2: Comparison of bugs for the
    two projects
    The parameter choices for this
    experiment are the following:
    ● As units, the columns "Bug" and
    "Username".
    ● As conversation, the columns
    "Username", "Project", and "Bug".
    ● As stanza window, a moving window of
    four lines.
    ● All codes from the .csv file were
    selected as codes.
    ● As comparison, the "Bug" column.
    We understand that the
    two networks extend into
    the same domains of the
    epistemic frame with
    minimal differences and
    similar strength in their
    connections. Therefore,
    we could say that the
    two bugs appear to have
    a similar scientific level.
    We understand that the two
    networks extend to
    different sides of the
    epistemic frame. We could
    say that bug 1_1 presents
    more scientific dialogues,
    and it would be beneficial
    for participants in bug 1_2
    to use more epistemology
    in their discourse.


  26. APPLICATIONS OF OPEN SOURCE SOFTWARE RESILIENCE & EPISTEMIC ANALYSIS
    Experiment 3: Comparisons among the
    conversationalists
    The parameter settings for this specific
    experiment are the following:
    ● For units, the Username column.
    ● For conversation, the Username,
    Project, and Bug columns
    ● For stanza window, a moving
    window of four lines
    ● For codes, all codes from the .csv
    file
    ● For comparison, no column.
    We understand that the network of
    Participant_A extends across the
    entire epistemic frame with stronger
    connections in the fields of
    knowledge and certain areas of
    skills and epistemology. In contrast,
    the network of Participant_B mainly
    extends to one side of the
    epistemic frame, making fewer
    connections to fields from different
    domains of the epistemic frame.
    We could thus say that
    Participant_A seems to exhibit a
    more scientific discourse.


  27. THREATS TO VALIDITY
    ● OSSRF Limitations:
    ● Should be applied to OSS projects active for at least one year with ≥10 contributors.
    ● Transition from CRF (City Resilience Framework) to OSS is influenced by authors' subjective
    interpretations.
    ● Indicators from Software Quality could introduce validity concerns.
    ● All indicators in OSSRF are treated equally, without weighting.
    ● Sensitivity analysis shows some indicators (like Testing Process) are highly sensitive.
    ● Assessment Tools:
    ● Used specific commercial and open-source tools for evaluation.
    ● Engaged with limited industry experts for validation, which could introduce biases.
    ● Tools primarily developed in PHP.


  28. THREATS TO VALIDITY
    ● Metrics Aggregation
    ○ Semi-automated tool integrated with GitHub and PHPMetrics.
    ○ Manual input feature as a backup.
    ○ Best used for mature OSS projects.
    ○ Tailored from City Resilience framework; influenced by authors' interpretations.
    ○ Optimized for GitHub (PHP projects) and Ubuntu Linux OS.
    ● Epistemic Network Analysis (ENA)
    ○ Explored ENA's potential through three experiments on dialogues.
    ○ Applied to OpenOffice and LibreOffice; LibreOffice forked from OpenOffice.
    ○ Limited selection of bugs and non-random participant selection.
    ○ Analysis done using ENA WebKit; codes used influenced by authors' views.


  29. CONCLUSIONS & FUTURE WORK
    ● Framework Application:
    ● Applied to six open source projects; 3 (intuitively) resilient and 3 (intuitively) non-resilient.
    ● Resilient projects score higher in Business, Legal, and Community aspects.
    ● Non-resilient projects often lack vision for sustainability or community engagement.
    ● OSSRF closely monitors project releases, identifying resilience downturns.
    ● Future plans: evaluate qualitative indicators, consider varying factors like repositories and
    languages, and refine sensitivity analysis.


  30. CONCLUSIONS & FUTURE WORK
    ● Metrics Aggregation
    ● Tool designed for assessing OSS projects' resilience.
    ● Seamless integration with GitHub and potential expansion to GitLab.
    ● Plans to evaluate broader language spectrum, integrate with Grimoire Lab tool, and introduce
    innovative visualizations.
    ● Epistemic Network Analysis:
    ● Focus on log files from educational endeavors.
    ● Applied to LibreOffice and OpenOffice; LibreOffice had denser scientific dialogues.
    ● Future scope: explore diverse communities, analyze dialogues from programmers vs.
    non-programmers, and use Large Language Models for automation.


  31. CONCLUSIONS & FUTURE WORK
    ● Future applications and follow up research
    ● Research:
    i. From Software Resilience to Software Antifragility (Introduced by Nassim Nicholas
    Taleb)
    ● Applications:
    i. Software selection for large companies or organizations (e.g. the European
    Commission’s Open Source Software Strategy 2020-2023 specifically identified
    the need for a way to compare and contrast OSS projects for their sustainability and
    longevity in order to be adopted at an EU level).
    ii. Talent Acquisition / Recruiting: Software resilience as a way of promoting engineers.


  32. ACKNOWLEDGEMENTS & FUNDING RECEIVED
    This research is co-financed by Greece and the European Union (European Social Fund- ESF) through
    the Operational Programme «Human Resources Development, Education and Lifelong Learning» in the
    context of the project «Strengthening Human Resources Research Potential via Doctorate Research»
    (MIS-5000432), implemented by the State Scholarships Foundation (IKY).


  33. A “glimpse to the future” from the 1820s
    Thank you!
