Open Source Software Resilience and Epistemic Analysis

Apostolos Kritikos

October 06, 2023

Transcript

  1. Open Source Software Resilience and Epistemic Analysis

    Apostolos Kritikos, PhD Candidate
    School of Informatics, Aristotle University of Thessaloniki
    Supervisor: Prof. Ioannis Stamelos
    A “glimpse to the future” from the 1820s
  2. AGENDA

    • Introduction
    • State of the Art
    • Research Plan
    • Open Source Software Resilience Framework (OSSRF)
    • Metrics Aggregation
    • Epistemic Network Analysis (ENA) in Open Source Software
    • Applications of Open Source Software Resilience
    • Threats to Validity
    • Conclusions & Future Work
    • Acknowledgements & Funding received

    Reference numbers in this presentation [x] follow the numbering of the thesis manuscript.
  3. INTRODUCTION

    • Open Source Software (OSS) evolution
      ◦ Two decades of growth since the Free/Libre Open Source Software movement.
      ◦ Attracted global developers and major industry players.
      ◦ Example: IBM's acquisition of Red Hat.
    • Numerous software evaluation models exist.
      ◦ Some of them are specifically designed for Open Source Software.
      ◦ Yet, no universal model exists to date.
    • Can Urban Resilience be mapped to Software Resilience?
      ◦ City Resilience Framework (CRF), as introduced by Arup Institute / Rockefeller Foundation while Thessaloniki was participating in the 100 Resilient Cities program.
      ◦ Cities and software share conceptual similarities.
    • Epistemic Network Analysis can unlock structured knowledge from Open Source Software.
    • Practical applications of Software Resilience in Open Source Software.
  4. STATE OF THE ART

    Figure: The City Resilience Framework (CRF) (source: resilientcitiesnetwork.org)

    City Resilience Framework (CRF) Overview
    • Developed by Arup Institute and Rockefeller Foundation.
    • Objective: Define Urban Resilience in a dynamic, evolving world.
    • Set of indicators for cities to measure resilience performance.
    • Not a ranking or comparison tool but a self-improvement framework.
    • Assists cities in leveraging resources for resilience.

    CRF Dimensions & Goals for City Resilience
    • Health & Well-being: People in the city.
    • Economy & Society: City's socio-economic organization.
    • Infrastructure & Environment: City's infrastructure and ecosystems.
    • Leadership & Strategy: Knowledge and future adaptability.
  5. STATE OF THE ART

    Literature related to OSS success factors and OSS assessment and evaluation between 1977 and 2023

    Factors for OSS Success (literature)
    • Quality models comparison between 1977-2013 by Miguel et al. [75]:
      ◦ Basic holistic models vs. tailor-made models for component evaluation.
      ◦ Inclusion of OSS-specific models.
    • Comprehensive review of OSS project selection models since 2019 [61].
    • Importance of both numerical and qualitative criteria in OSS evaluation [98].
    • IT managers prioritize long-term maintenance (at least 10 years) for OSS adoption.
    • Evaluation should be consistent, especially after major releases, to track resilience evolution.
    • License type and permissiveness [74].
    • Community aspects: active developers, end users, and project localization [74].
    • Governance of the project [76].
    • Structural quality: a dominant factor studied extensively.
      ◦ Generic models: ISO/IEC 25010 [43].
      ◦ OSS-specific models: OpenBRR [97].
    • Social network analysis [92] and community maturity [30].
  6. STATE OF THE ART

    Open Source Software: The Concepts of Quality, Health, and Resilience
    • Evaluation in OSS
      ◦ Primary Focus: Software quality.
      ◦ Limited exploration: Software health & trust.
    • Software Health
      ◦ Connected to the longevity of OSS [30].
      ◦ Typically viewed within project scope [30].
      ◦ OSS health linked to social aspects in CHAOSS metrics [3].
    • Resilience Definitions
      ◦ City Resilience: Ability of a system to cope with change [102].
      ◦ CRF: Capacity of cities to function through stresses or shocks, emphasizing vulnerable populations [41].
      ◦ Software: Can recover from hits to critical components [31].
    • OSS Dynamics and Resilience
      ◦ OSS faces changes in technology, governance, and community dynamics.
      ◦ CRF's resilience definition resonates with the OSS domain's need to endure stresses and shocks.
  7. STATE OF THE ART

    Stressors and crises in Open Source Software
    • Potential Crises in OSS Life Cycle
      ◦ Developer or user base loss to competition.
      ◦ Unsuccessful major releases.
      ◦ Project forks or migrations.
      ◦ New competitive software emergence.
      ◦ Hostile actions by commercial rivals.
      ◦ Technology evolution mismatches.
      ◦ Project sustainability concerns.
    • Examples of OSS Facing Stressors
      ◦ (2010) OpenOffice to LibreOffice Transition: Oracle's acquisition of Sun triggers OpenOffice developer community concerns.
      ◦ (2023) Core-js Sustainability Crisis: Sustainability concerns due to the project depending heavily (if not solely) on its founder [6, 7].
  8. STATE OF THE ART

    Open Source Software and deep learning via Epistemic Analysis
    • Deep learning involves understanding and adopting ways of thinking to explore and solve problems.
    • Data science tools can analyze large volumes of data, but understanding deep learning requires more than just data analysis.
    • Need to understand learning within the system's context.
    • Qualitative Research:
      ◦ Focuses on "Thick Description": understanding events in specific contexts.
      ◦ Uses rich data on individual participants to find consistent interpretations.
      ◦ Requires understanding of culture to interpret data meaningfully.
    • Quantitative Research:
      ◦ Relies on sampling from a larger population.
      ◦ Concerned with systematic data collection and unbiased sample selection.
      ◦ Seeks to generalize findings from a sample to a larger population.
    • Epistemic Network Analysis (ENA): examines how codes are interrelated in discourse.
  9. RESEARCH PLAN & LIST OF PUBLICATIONS

    • Literature review
    • Adaptation of the City Resilience Framework (CRF) to Open Source Software (OSS)
    • Creation of Tool to Aggregate Metrics
    • Application of the Resilience Framework to OSS Projects
    • Other Applications:
      ◦ Blockchain Rewards for OSS Contributors
    • Epistemic Analysis in Software Engineering

    1. Apostolos Kritikos, Konstantina Papadopoulou, Ioannis Stamelos, “Applying Epistemic Network Analysis to the discussions between Software Engineers in Open Source Software”, Journal of Software Engineering and Knowledge Engineering. IF: 1.007. (under review)
    2. Apostolos Kritikos, Ioannis Stamelos, “A resilience-based framework for assessing the evolution of open source software projects”, Journal of Software: Evolution & Processes, DOI: 10.1002/smr.2597. IF: 1.864
    3. Apostolos Kritikos, Prodromos Polychroniadis, Ioannis Stamelos, “Source-o-grapher: A tool towards the investigation of software resilience in Open Source Software projects”, SoftwareX, DOI: j.softx.2023.101337. IF: 2.868
    4. Apostolos Kritikos, Theodoros Venetis, Ioannis Stamelos, “An Empirical Investigation of Sentiment Analysis of the Bug Tracking Process in Libre Office Open Source Software”. In IFIP International Conference on Open Source Systems 2020, 36-46.
    5. Apostolos Kritikos, Ioannis Stamelos, “Open Source Software Resilience Framework”. In 14th International Conference on Open Source Systems (OSS2018), 8 – 10 June 2018, Athens, Greece.
  10. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK (OSSRF)

    • Adapted from the City Resilience Framework, respecting the original structure.
    • 4 dimensions → 12 goals → 48 indicators.
    • Not meant to be used as:
      ◦ A ranking tool between OSS projects.
      ◦ A tool to provide a resilience score for a project.
    • Our intention using OSSRF is:
      ◦ To approach resilience from different aspects (dimensions), as the project evolves from one major release to the next.
      ◦ To compare and contrast resilience trends between two or more OSS projects.
  11. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK

    Indicators
    • Quantitative: These metrics are usually calculated with the help of tools.
      ◦ Numerical
      ◦ Boolean
      ◦ Percentages
    • Qualitative: These metrics are designed to be provided by experts with knowledge of the software's domain.
      ◦ Likert Scale

    As Wasserman states in [98], it is important for OSS evaluation models to include qualitative criteria as well, apart from numerical scores and metrics.
  12. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK

    Resilience determination mechanism (diagram)
    • Indicators #1 to #48 feed Goals #1 to #12, which feed Dimensions #1 to #4.
    • Goal level: average of the resilience indicators' scores.
    • Dimension level: average of the resilience goals' scores.
    • Final Resilience Assessment: Resilient >= 50%, Non-Resilient < 50%.
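
The averaging chain in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tool: the dimension and goal IDs, the indicator values and the function names are all invented, and averaging the dimension scores into one overall figure is an assumption, since the slide only states the 50% threshold.

```python
from statistics import mean

# Hypothetical toy hierarchy: dimension -> goal -> indicator scores in percent.
# OSSRF itself has 4 dimensions, 12 goals and 48 indicators; these IDs and
# values are invented for illustration only.
FRAMEWORK = {
    "D1": {"G1": [60.0, 100.0, 50.0], "G2": [80.0, 20.0]},
    "D2": {"G3": [0.0, 100.0], "G4": [60.0, 60.0, 60.0]},
}


def dimension_scores(framework):
    """Average indicators into goals, then goals into dimensions."""
    return {
        dim: mean(mean(indicators) for indicators in goals.values())
        for dim, goals in framework.items()
    }


def assess(framework, threshold=50.0):
    """Return per-dimension scores, the overall score and the verdict."""
    dims = dimension_scores(framework)
    overall = mean(dims.values())  # assumption: dimensions are averaged too
    verdict = "Resilient" if overall >= threshold else "Non-Resilient"
    return dims, overall, verdict


dims, overall, verdict = assess(FRAMEWORK)
print(dims, overall, verdict)
```

Because every level is a plain average, an indicator that sits in a goal with few siblings carries more weight than one in a crowded goal, even though the model is nominally unweighted.
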
  13. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK

    Sensitivity & Veto Principles investigation
    • Our assessment model, in its current version, is unweighted: all indicators contribute equally to the decision on whether an assessment concludes with a resilient or non-resilient result.
    • The fact that 14 of our model's indicators are boolean raised a concern that a specific value of a single indicator (in the absence of weights) might independently determine the decision of our assessment model.
    • To address that concern we conducted a one-factor-at-a-time sensitivity analysis.
    • Factors with high sensitivity: the only factor that presents high sensitivity is Testing Process (I08). More specifically, if this boolean factor gets the value 1 (true), it significantly increases the resilience score for the Source Code Dimension (D01). We have added this finding to our limitations and threats to validity section.
    • No indicator in our model functions as a veto principle: apart from the one-factor-at-a-time sensitivity analysis with baseline values, we repeated the analysis on sets of indicator values that lead to a resilient and a non-resilient project respectively. This way we ensured that a single indicator cannot independently alter the result of our model, assessing a non-resilient project as resilient or vice versa.
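
A one-factor-at-a-time analysis and a veto check of the kind described above could look like the following sketch. It is not the authors' procedure: the hierarchy, the indicator names and the value ranges are hypothetical, and the score is assumed to be an average of indicators per goal, goals per dimension, and dimensions overall.

```python
from statistics import mean

# Hypothetical hierarchy: dimension -> goal -> indicator names.
STRUCTURE = {
    "D1": {"G1": ["a", "b", "c"], "G2": ["d"]},
    "D2": {"G3": ["e", "f"], "G4": ["g", "h"]},
}


def overall_score(values):
    """Average indicators per goal, goals per dimension, then dimensions."""
    dims = []
    for goals in STRUCTURE.values():
        goal_scores = [mean(values[i] for i in names) for names in goals.values()]
        dims.append(mean(goal_scores))
    return mean(dims)


def one_factor_at_a_time(baseline, low=0.0, high=100.0):
    """Swing each indicator from low to high while the rest stay at baseline."""
    return {
        name: overall_score({**baseline, name: high})
        - overall_score({**baseline, name: low})
        for name in baseline
    }


def flips_verdict(baseline, name, threshold=50.0, low=0.0, high=100.0):
    """True if this single indicator can change the resilient/non-resilient verdict."""
    base = overall_score(baseline) >= threshold
    return any(
        (overall_score({**baseline, name: v}) >= threshold) != base
        for v in (low, high)
    )


names = [n for goals in STRUCTURE.values() for g in goals.values() for n in g]
swings = one_factor_at_a_time({n: 50.0 for n in names})
print(max(swings, key=swings.get), swings)
print(flips_verdict({n: 80.0 for n in names}, "d"))
```

In this toy hierarchy the indicator `d` is alone in its goal, so it produces the largest swing, yet it still cannot flip the verdict of a clearly resilient value set on its own.
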
  14. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK

    Applying Resilience Framework on Open Source Software: Resilient and Non Resilient Projects
    • We applied the OSSRF assessment model to 5 consecutive versions of 3 intuitively resilient and 3 intuitively non-resilient projects.
    • We selected these projects to show that our model can successfully distinguish between resilient and non-resilient projects as they evolve in time (hence the 5 consecutive versions).
    • For the following assessments, in the absence of experts, we applied the value of 3 to the qualitative indicators of the intuitively resilient OSS projects. For the non-resilient projects we applied a small penalty, resulting in a value of 2.
  15. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK

    Applying Resilience Framework on Open Source Software: Resilient and Non Resilient Projects
    A note on the selection of Qualitative Indicators
    • In the absence of an expert, and in order to keep the experiment as unbiased as possible, we use the average value (3) for the aforementioned indicators for the resilient group of projects, expecting that the non-average values will highlight the resilience of the project.
    • For the non-resilient projects, we adopt the value of 2 for the qualitative indicators. The reason is that, percentage-wise, a value of 3 gives each qualitative indicator a 60% score, boosting the average above 50%.
    • Since most of the non-resilient projects have a lifespan of 2 years, little activity and a small contributor community, we believe that, without loss of generality, we can inject a small penalty to qualitative indicators such as robustness, scalability, usability and so forth.
    • To verify our decision we conducted interviews with 5 experts.
    • We presented the 6 aforementioned projects, as seen in Section 7, to the experts (identifying them as resilient and non-resilient, which is exactly the way we ran our tests for this scientific work) and we presented them with the definition of resilience as adopted from the CRF for the purposes of this manuscript.
    • We also presented to them the qualitative indicators (and their definitions) as defined in this work. Then we asked them to independently provide, in their expert opinion, the appropriate values for the qualitative indicators (scoring them from 1 to 5, following the Likert scale).
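
The 60% figure quoted above follows if a Likert value is converted to a percentage by dividing it by the scale maximum. That conversion is an assumption (the slide does not spell it out), and the function name is hypothetical:

```python
def likert_to_percent(value, scale_max=5):
    """Map a Likert value (1..scale_max) to a percentage of the scale maximum."""
    if not 1 <= value <= scale_max:
        raise ValueError(f"Likert value must be between 1 and {scale_max}")
    return 100 * value / scale_max


print(likert_to_percent(3))  # value used for the intuitively resilient projects
print(likert_to_percent(2))  # penalized value used for the non-resilient projects
```

Under this mapping a value of 3 lands above the 50% threshold and a value of 2 lands below it, which is why the choice of default matters for the final verdict.
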
  16. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK

    Applying Resilience Framework on Open Source Software: Resilient and Non Resilient Projects
    A note on the selection of Qualitative Indicators
    • For the resilient projects, all the indicators received an average score of 4 from our experts, with the exception of Security (I07), which got an average of 5. This validates that using the median value (3) in our tests was more conservative than an expert's assessment would probably be.
    • For the non-resilient projects, all the indicators received an average score of 2 from our experts, with the exception of the Scalability Indicator (I02), which scored an average of 1. This validates that, for the qualitative indicators, it was reasonable to inject the penalty we chose.
  17. OPEN SOURCE SOFTWARE RESILIENCE FRAMEWORK

    Applying Resilience Framework on Open Source Software: Resilient and Non Resilient Projects
  18. INDICATORS AGGREGATION

    • Metrics aggregation is a crucial part of our Open Source Software Resilience Model.
    • As already mentioned, the model uses 48 indicators, and calculating some of them requires the user to extract data from different tools and applications (a code repository, the website of the project, its issue tracker and so forth).
    • Another interesting aspect is the challenge of applying the Open Source Software Resilience model to projects written in different programming languages and/or hosted on different code repositories.
  19. INDICATORS AGGREGATION

    Input:
    • CSV based (for manual analysis)
    • GUI based (for semi-automated analysis)

    Output:
    • Resilience analysis (text)
    • Goals scores represented as spider chart
    • Dimensions scores represented as bar chart

    Integrations:
    • The tool is currently integrated with GitHub
    • The tool currently supports analysis for PHP projects utilizing the PHPMetrics library

    The tool is available as open source software under the MIT license.
  20. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE

    • A scientific framework consists of five fields: Epistemology, Identity, Knowledge, Skills, and Values.
    • For each field of the scientific framework, certain codes are defined. These codes were determined using the Software Engineering Body of Knowledge (SWEBOK) [35], which describes generally accepted knowledge about software engineering and is freely available online.
  21. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE

    • The open source projects selected for the analysis are OpenOffice and LibreOffice.
    • The dialogue lines were first recorded in a .csv file, categorized by project and dialogue bug/number. Subsequently, all the codes of the scientific framework were added as columns. For each dialogue line, we set the value one (1) in the cells of the codes where a conceptual correlation was considered to exist, and zero (0) in those where it was not, as shown in the table on the right of the slide.
    • The visualization of the networks was performed using an online tool called ENA.
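
The binary coding table described above might be laid out as in the sketch below. The column names `Project`, `Bug` and `Username` appear in the deck; the `Line` column, the code names, the usernames and the dialogue lines are invented placeholders, since the actual codes come from SWEBOK and are not listed here.

```python
import csv
import io

# Hypothetical code columns; the deck's real codes are derived from SWEBOK.
CODES = ["Knowledge.Testing", "Skills.Debugging", "Values.Quality"]


def coded_row(project, bug, username, line, present):
    """Build one table row: metadata plus a 0/1 cell for every code."""
    row = {"Project": project, "Bug": bug, "Username": username, "Line": line}
    row.update({code: int(code in present) for code in CODES})
    return row


# Invented example dialogue lines, for illustration only.
rows = [
    coded_row("LibreOffice", "1_1", "Participant_A",
              "Crash reproduced with the attached test file.",
              {"Knowledge.Testing", "Skills.Debugging"}),
    coded_row("OpenOffice", "2_1", "Participant_B",
              "Thanks for the report.", set()),
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```
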
  22. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE

    The ENA WebKit performs two main functions:
    1. It processes encoded data:
       a. Takes the data table
       b. Divides the lines into stanzas
       c. Accumulates codes per stanza
       d. Generates a set of adjacency matrices
       e. Creates an aggregated adjacency matrix representing the connections between encoded objects for each unit of analysis
       f. Produces dimensionality reduction for data representation
    2. It uses the results of this analysis to generate visualizations that facilitate data exploration and interpretation.

    Initially, the selection was made for: Units, Conversation, Stanza Window, Codes, Optional Comparison
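
Steps (b) to (e) above can be approximated with a moving stanza window, as in the following simplified sketch. It is not the ENA WebKit's implementation: the function names and the toy data are hypothetical, connections are counted as binary co-occurrences between the current line and its window, and the dimensionality reduction of step (f) is left out.

```python
from itertools import combinations
from math import sqrt


def accumulate(lines, codes, unit_key, conversation_key, window=4):
    """Sum code co-occurrences per unit over a moving stanza window.

    lines: chronologically ordered dicts with metadata and 0/1 code columns.
    """
    pairs = list(combinations(codes, 2))
    history = {}  # conversation -> lines seen so far
    totals = {}   # unit -> {code pair: connection count}
    for line in lines:
        conversation = tuple(line[k] for k in conversation_key)
        seen = history.setdefault(conversation, [])
        seen.append(line)
        stanza = seen[-window:]  # current line plus up to window-1 earlier ones
        in_stanza = {c for c in codes if any(l[c] for l in stanza)}
        in_line = {c for c in codes if line[c]}
        unit = tuple(line[k] for k in unit_key)
        vector = totals.setdefault(unit, dict.fromkeys(pairs, 0))
        for a, b in pairs:
            # Count a connection when both codes occur in the stanza and
            # at least one of them occurs in the current line.
            if a in in_stanza and b in in_stanza and (a in in_line or b in in_line):
                vector[(a, b)] += 1
    return totals


def normalize(vector):
    """Scale a unit's connection vector to unit length (zero vectors stay zero)."""
    norm = sqrt(sum(v * v for v in vector.values()))
    return {pair: (v / norm if norm else 0.0) for pair, v in vector.items()}


# Invented toy conversation with three hypothetical codes K, S and V.
toy = [
    {"Project": "P", "Bug": "1", "Username": "A", "K": 1, "S": 0, "V": 0},
    {"Project": "P", "Bug": "1", "Username": "B", "K": 0, "S": 1, "V": 0},
    {"Project": "P", "Bug": "1", "Username": "A", "K": 0, "S": 0, "V": 1},
]
totals = accumulate(toy, ["K", "S", "V"], ("Username",), ("Project", "Bug"))
print({unit: normalize(vec) for unit, vec in totals.items()})
```

The `window` argument plays the role of the stanza window: a smaller window links each line only to its most recent predecessors, so fewer connections are accumulated.
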
  23. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE

    Experiment 1: Comparison of the two projects
    The parameters selected for this experiment are:
    • As units, the columns “Project” and “Username”.
    • As conversation, the columns “Username”, “Project”, and “Bug”.
    • As stanza window, a moving window of four lines.
    • As codes, all codes from the .csv file.
    • As comparison, the “Project” column.

    It is evident that LibreOffice has stronger connections than OpenOffice, and its connections extend across more domains of the epistemic frame. Consequently, we could say that LibreOffice exhibited more scientific dialogues.
  24. EPISTEMIC NETWORK ANALYSIS IN OPEN SOURCE SOFTWARE

    Experiment 2: Comparison of bugs for the two projects
    The parameters selected for this experiment are:
    • As units, the columns “Bug” and “Username”.
    • As conversation, the columns “Username”, “Project”, and “Bug”.
    • As stanza window, a moving window of four lines.
    • As codes, all codes from the .csv file.
    • As comparison, the “Bug” column.

    We understand that the two networks extend into the same domains of the epistemic frame with minimal differences and similar strength in their connections. Therefore, we could say that the two bugs appear to have a similar scientific level.

    We understand that the two networks extend to different sides of the epistemic frame. We could say that bug 1_1 presents more scientific dialogues, and it would be beneficial for participants in bug 1_2 to use more epistemology in their discourse.
  25. APPLICATIONS OF OPEN SOURCE SOFTWARE RESILIENCE & EPISTEMIC ANALYSIS

    Experiment 3: Comparisons among the conversationalists
    The parameter settings for this experiment are the following:
    • For units, the Username column.
    • For conversation, the Username, Project, and Bug columns.
    • For stanza window, a moving window of four lines.
    • For codes, all codes from the .csv file.
    • For comparison, no column.

    We understand that the network of Participant_A extends across the entire epistemic frame, with stronger connections in the fields of knowledge and certain areas of skills and epistemology. In contrast, the network of Participant_B mainly extends to one side of the epistemic frame, making fewer connections to fields from different domains of the epistemic frame. We could thus say that Participant_A seems to exhibit a more scientific discourse.
  26. THREATS TO VALIDITY

    • OSSRF Limitations:
      ◦ Should be applied to OSS projects active for at least one year with ≥10 contributors.
      ◦ Transition from CRF (City Resilience Framework) to OSS is influenced by the authors' subjective interpretations.
      ◦ Indicators from Software Quality could introduce validity concerns.
      ◦ All indicators in OSSRF are treated equally, without weighting.
      ◦ Sensitivity analysis shows some indicators (like Testing Process) are highly sensitive.
    • Assessment Tools:
      ◦ Used specific commercial and open-source tools for evaluation.
      ◦ Engaged with a limited number of industry experts for validation, which could introduce biases.
      ◦ Tools primarily developed in PHP.
  27. THREATS TO VALIDITY

    • Metrics Aggregation
      ◦ Semi-automated tool integrated with GitHub and PHPMetrics.
      ◦ Manual input feature as a backup.
      ◦ Best used for mature OSS projects.
      ◦ Tailored from City Resilience framework; influenced by authors' interpretations.
      ◦ Optimized for GitHub (PHP projects) and Ubuntu Linux OS.
    • Epistemic Network Analysis (ENA)
      ◦ Explored ENA's potential through three experiments on dialogues.
      ◦ Applied to OpenOffice and LibreOffice; LibreOffice forked from OpenOffice.
      ◦ Limited selection of bugs and non-random participant selection.
      ◦ Analysis done using ENA WebKit; codes used influenced by authors' views.
  28. CONCLUSIONS & FUTURE WORK

    • Framework Application:
      ◦ Applied to six open source projects: 3 (intuitively) resilient and 3 (intuitively) non-resilient.
      ◦ Resilient projects score higher in Business, Legal, and Community aspects.
      ◦ Non-resilient projects often lack vision for sustainability or community engagement.
      ◦ OSSRF closely monitors project releases, identifying resilience downturns.
      ◦ Future plans: evaluate qualitative indicators, consider varying factors like repositories and languages, and refine sensitivity analysis.
  29. CONCLUSIONS & FUTURE WORK

    • Metrics Aggregation
      ◦ Tool designed for assessing OSS projects' resilience.
      ◦ Seamless integration with GitHub and potential expansion to GitLab.
      ◦ Plans to evaluate a broader language spectrum, integrate with the GrimoireLab tool, and introduce innovative visualizations.
    • Epistemic Network Analysis:
      ◦ Focus on log files from educational endeavors.
      ◦ Applied to LibreOffice and OpenOffice; LibreOffice had denser scientific dialogues.
      ◦ Future scope: explore diverse communities, analyze dialogues from programmers vs. non-programmers, and use Large Language Models for automation.
  30. CONCLUSIONS & FUTURE WORK

    • Future applications and follow-up research
      ◦ Research:
        i. From Software Resilience to Software Antifragility (a concept introduced by Nassim Nicholas Taleb)
      ◦ Applications:
        i. Software selection for large companies or organizations (e.g. the European Commission's Open Source Software Strategy 2020-2023 specifically identified the need for a way to compare and contrast OSS projects for their sustainability and longevity before they are adopted at EU level).
        ii. Talent Acquisition / Recruiting: Software resilience as a way of promoting engineers.
  31. ACKNOWLEDGEMENTS & FUNDING RECEIVED

    This research is co-financed by Greece and the European Union (European Social Fund - ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning» in the context of the project «Strengthening Human Resources Research Potential via Doctorate Research» (MIS-5000432), implemented by the State Scholarships Foundation (IKY).