Slide 1

Slide 1 text

Software Heritage: key infrastructure for Open Science and Software Science. Jaime Arias, Research Engineer, CNRS, LIPN, Université Sorbonne Paris Nord. November 27, 2024. THE GREAT LIBRARY OF SOURCE CODE. Jaime Arias [email protected] (CC-BY 4.0) Software Heritage for Open Science and Research November 27, 2024 1 / 28

Slide 2

Slide 2 text

Who am I? Hello! I am Jaime Arias: CNRS Research Engineer @ LIPN; Member @ Collège Codes Sources et Logiciels; Ambassador @ Software Heritage. You can find me at: [email protected] https://www.jaime-arias.fr

Slide 3

Slide 3 text

TL;DR

Slide 4

Slide 4 text

TL;DR

Slide 5

Slide 5 text

Outline: 1. Open Science & Software; 2. Software Heritage for Open Science and Reproducibility; 3. Quick Demo!; 4. Call to action; 5. Conclusion

Slide 6

Slide 6 text

Software source code is precious knowledge. Apollo 11 source code (excerpt).

Slide 7

Slide 7 text

Software source code is precious knowledge. Apollo 11 source code (excerpt). Quake III source code (excerpt).

Slide 8

Slide 8 text

Software source code is precious knowledge. Harold Abelson, Structure and Interpretation of Computer Programs (1st ed.), 1985: “Programs must be written for people to read, and only incidentally for machines to execute.”

Slide 9

Slide 9 text

Software source code is precious knowledge. Harold Abelson, Structure and Interpretation of Computer Programs (1st ed.), 1985: “Programs must be written for people to read, and only incidentally for machines to execute.” Len Shustek, Computer History Museum, 2006: “Source code provides a view into the mind of the designer.”

Slide 10

Slide 10 text

Software source code is precious knowledge. Harold Abelson, Structure and Interpretation of Computer Programs (1st ed.), 1985: “Programs must be written for people to read, and only incidentally for machines to execute.” Len Shustek, Computer History Museum, 2006: “Source code provides a view into the mind of the designer.” Sonatype Survey, 2017: 80% to 90% of a new application is ... just reuse!

Slide 11

Slide 11 text

Software source code is precious knowledge. Harold Abelson, Structure and Interpretation of Computer Programs (1st ed.), 1985: “Programs must be written for people to read, and only incidentally for machines to execute.” Len Shustek, Computer History Museum, 2006: “Source code provides a view into the mind of the designer.” Sonatype Survey, 2017: 80% to 90% of a new application is ... just reuse! Art. L. 112-2 of the French Intellectual Property Code (Code de la Propriété Intellectuelle), 1994: “The following in particular are considered works of the mind within the meaning of this code: ... 13° "Software, including preparatory design material"; ...”

Slide 12

Slide 12 text

Software is a pillar of Open Science. Software powers modern research. Over 20% of articles using software across all disciplines share it (2024 French Open Science Monitor).

Slide 13

Slide 13 text

Source code is special (software is not data). Software evolves over time: projects may last decades; the development history is key to its understanding.

Slide 14

Slide 14 text

Source code is special (software is not data). Software evolves over time: projects may last decades; the development history is key to its understanding. Complexity: millions of lines of code; large web of dependencies; easy to break, difficult to maintain; research software: a thin top layer; sophisticated developer communities. [Figure: dependency graph of the python3-matplotlib package. Legend: Matplotlib library; Python dependencies; real dependencies; fake OS dependencies induced by package granularity.]

Slide 15

Slide 15 text

Source code is special (software is not data). Software evolves over time: projects may last decades; the development history is key to its understanding. Complexity: millions of lines of code; large web of dependencies; easy to break, difficult to maintain; research software: a thin top layer; sophisticated developer communities. [Figure: dependency graph of the python3-matplotlib package. Legend: Matplotlib library; Python dependencies; real dependencies; fake OS dependencies induced by package granularity.] The human side: design, algorithm, code, test, documentation, community, funding and so many more facets ...

Slide 16

Slide 16 text

Software is a pillar of Open Science. Key pillar: software. Links are important. Nota bene: software may be a tool, a research outcome and a research object.

Slide 17

Slide 17 text

Software is a pillar of Open Science. Key pillar: software. Links are important. Nota bene: software may be a tool, a research outcome and a research object; access to the source code is essential! Preserving (the history of) source code is necessary for reproducibility.

Slide 18

Slide 18 text

Fundamental needs for software in Open Science (selection). Archive: research software artifacts must be properly archived; make sure we can retrieve them (reproducibility).

Slide 19

Slide 19 text

Fundamental needs for software in Open Science (selection). Archive: research software artifacts must be properly archived; make sure we can retrieve them (reproducibility). Reference: research software artifacts must be properly referenced; make sure we can identify them (reproducibility).

Slide 20

Slide 20 text

Fundamental needs for software in Open Science (selection). Archive: research software artifacts must be properly archived; make sure we can retrieve them (reproducibility). Reference: research software artifacts must be properly referenced; make sure we can identify them (reproducibility). Describe: research software artifacts must be properly described; make it easy to discover and reuse them (visibility).

Slide 21

Slide 21 text

Fundamental needs for software in Open Science (selection). Archive: research software artifacts must be properly archived; make sure we can retrieve them (reproducibility). Reference: research software artifacts must be properly referenced; make sure we can identify them (reproducibility). Describe: research software artifacts must be properly described; make it easy to discover and reuse them (visibility). Cite/Credit: research software artifacts must be properly cited (not the same as referenced!) to give credit to authors (evaluation!).

Slide 22

Slide 22 text

Where is the source code? Collaborative development platforms (aka "forges"): BitBucket, GitLab(.com), GitHub, etc.; support for version control, issues, etc.; examples: https://depot.lipn.univ-paris13.fr/cosyverif/cosydraw https://gitlab.inria.fr/gt-sw-citation/bibtex-sw-entry/

Slide 23

Slide 23 text

Where is the source code? Collaborative development platforms (aka "forges"): BitBucket, GitLab(.com), GitHub, etc.; support for version control, issues, etc.; examples: https://depot.lipn.univ-paris13.fr/cosyverif/cosydraw https://gitlab.inria.fr/gt-sw-citation/bibtex-sw-entry/ Distribution platforms: CTAN, CRAN, PyPI, Debian, etc.; example: https://ctan.org/pkg/biblatex-software

Slide 24

Slide 24 text

Where is the source code? Collaborative development platforms (aka "forges"): BitBucket, GitLab(.com), GitHub, etc.; support for version control, issues, etc.; examples: https://depot.lipn.univ-paris13.fr/cosyverif/cosydraw https://gitlab.inria.fr/gt-sw-citation/bibtex-sw-entry/ Distribution platforms: CTAN, CRAN, PyPI, Debian, etc.; example: https://ctan.org/pkg/biblatex-software Archives: Software Heritage; example: archived version of biblatex-software.

Slide 25

Slide 25 text

Archive and reference: some popular approaches that do not fit the bill. A - Since the 1970’s 1990’s: .zip or .tar file on: ftp server (e.g. gnu); web page (example); document archive (+ DOI sample).

Slide 26

Slide 26 text

Archive and reference: some popular approaches that do not fit the bill. A - Since the 1970’s 1990’s: .zip or .tar file on: ftp server (e.g. gnu); web page (example); document archive (+ DOI sample). B - Since the 2000’s: rely on software forges: institutional/project (e.g. example); free commercial ones: BitBucket, GitHub, GitLab, ... (e.g. imitator).

Slide 27

Slide 27 text

Archive and reference: some popular approaches that do not fit the bill. A - Since the 1970’s 1990’s: .zip or .tar file on: ftp server (e.g. gnu); web page (example); document archive (+ DOI sample). B - Since the 2000’s: rely on software forges: institutional/project (e.g. example); free commercial ones: BitBucket, GitHub, GitLab, ... (e.g. imitator). C: a mix of the two.

Slide 28

Slide 28 text

Archive and reference: some popular approaches that do not fit the bill. A - Since the 1970’s 1990’s: .zip or .tar file on: ftp server (e.g. gnu); web page (example); document archive (+ DOI sample). B - Since the 2000’s: rely on software forges: institutional/project (e.g. example); free commercial ones: BitBucket, GitHub, GitLab, ... (e.g. imitator). C: a mix of the two. Can get no satisfaction... A: poor user experience. B: no preservation guarantee. C: can do so much better.

Slide 29

Slide 29 text

Forges are not archives! 2015: the first big bad news. Google Code and Gitorious.org shutdown: ~1M endangered repositories; broken links in the web of knowledge.

Slide 30

Slide 30 text

Forges are not archives! 2015: the first big bad news. Google Code and Gitorious.org shutdown: ~1M endangered repositories; broken links in the web of knowledge. Big bad news keeps coming in. Summer 2019: BitBucket announces Mercurial VCS sunset. July 2020: BitBucket erases 250,000+ repositories (including research software). Summer 2022: GitLab.com considers erasing all projects that are inactive for a year.

Slide 31

Slide 31 text

Forges are not archives! 2015: the first big bad news. Google Code and Gitorious.org shutdown: ~1M endangered repositories; broken links in the web of knowledge. Big bad news keeps coming in. Summer 2019: BitBucket announces Mercurial VCS sunset. July 2020: BitBucket erases 250,000+ repositories (including research software). Summer 2022: GitLab.com considers erasing all projects that are inactive for a year. In academia too! 2021: Inria’s old gforge is unplugged... breaks the Opam build chain for OCaml.

Slide 32

Slide 32 text

Forges are not archives! 2015: the first big bad news. Google Code and Gitorious.org shutdown: ~1M endangered repositories; broken links in the web of knowledge. Big bad news keeps coming in. Summer 2019: BitBucket announces Mercurial VCS sunset. July 2020: BitBucket erases 250,000+ repositories (including research software). Summer 2022: GitLab.com considers erasing all projects that are inactive for a year. In academia too! 2021: Inria’s old gforge is unplugged... breaks the Opam build chain for OCaml. We need a universal archive of software source code:

Slide 33

Slide 33 text

Forges are not archives! 2015: the first big bad news. Google Code and Gitorious.org shutdown: ~1M endangered repositories; broken links in the web of knowledge. Big bad news keeps coming in. Summer 2019: BitBucket announces Mercurial VCS sunset. July 2020: BitBucket erases 250,000+ repositories (including research software). Summer 2022: GitLab.com considers erasing all projects that are inactive for a year. In academia too! 2021: Inria’s old gforge is unplugged... breaks the Opam build chain for OCaml. We need a universal archive of software source code: now we have one!

Slide 34

Slide 34 text

Outline: 1. Open Science & Software; 2. Software Heritage for Open Science and Reproducibility; 3. Quick Demo!; 4. Call to action; 5. Conclusion

Slide 35

Slide 35 text

Software Heritage in a nutshell. www.softwareheritage.org. THE GREAT LIBRARY OF SOURCE CODE. Collect, preserve and share all software source code. Preserving our heritage, enabling better software and better science for all.

Slide 36

Slide 36 text

Software Heritage in a nutshell. www.softwareheritage.org. THE GREAT LIBRARY OF SOURCE CODE. Collect, preserve and share all software source code. Preserving our heritage, enabling better software and better science for all. Reference catalog: find and reference all software source code.

Slide 37

Slide 37 text

Software Heritage in a nutshell. www.softwareheritage.org. THE GREAT LIBRARY OF SOURCE CODE. Collect, preserve and share all software source code. Preserving our heritage, enabling better software and better science for all. Reference catalog: find and reference all software source code. Universal archive: preserve and share all software source code.

Slide 38

Slide 38 text

Software Heritage in a nutshell. www.softwareheritage.org. THE GREAT LIBRARY OF SOURCE CODE. Collect, preserve and share all software source code. Preserving our heritage, enabling better software and better science for all. Reference catalog: find and reference all software source code. Universal archive: preserve and share all software source code. Research infrastructure: enable analysis of all software source code.

Slide 39

Slide 39 text

An international, non-profit initiative built for the long term. Sharing the vision. And many more ... www.softwareheritage.org/support/testimonials

Slide 40

Slide 40 text

An international, non-profit initiative built for the long term. Sharing the vision. And many more ... www.softwareheritage.org/support/testimonials Donors, members, sponsors: Diamond sponsor, Bronze sponsors, Gold sponsors, Silver sponsors, Platinum sponsors.

Slide 41

Slide 41 text

The largest software archive, a shared infrastructure. One infrastructure, open and shared.

Slide 42

Slide 42 text

The largest software archive, a shared infrastructure. One infrastructure, open and shared. The largest archive ever built.

Slide 43

Slide 43 text

The largest software archive, a shared infrastructure. One infrastructure, open and shared. The largest archive ever built.

Slide 44

Slide 44 text

Software Heritage: a radically different approach to archiving. [Diagram: software origins (forges, package repos, distros; formats git, hg, svn, dsc, tar, zip) are enumerated by listers (GitHub lister, GitLab lister, Debian lister, PyPI lister, ...). Pipeline stage shown: listing (full/incremental).]

Slide 45

Slide 45 text

Software Heritage: a radically different approach to archiving. [Diagram: software origins (forges, package repos, distros; formats git, hg, svn, dsc, tar, zip) are enumerated by listers (GitHub lister, GitLab lister, Debian lister, PyPI lister, ...). Pipeline stages: listing (full/incremental), scheduling, loading & deduplication. Loaders (Git loader, Mercurial loader, Debian source package loader, PyPI source package loader, ...) feed the Software Heritage Archive (Merkle DAG + blob storage), whose node types are origins, snapshots, releases, revisions, directories, contents.]

Slide 46

Slide 46 text

Software Heritage: a radically different approach to archiving. [Diagram: software origins (forges, package repos, distros; formats git, hg, svn, dsc, tar, zip) are enumerated by listers (GitHub lister, GitLab lister, Debian lister, PyPI lister, ...). Pipeline stages: listing (full/incremental), scheduling, loading & deduplication. Loaders (Git loader, Mercurial loader, Debian source package loader, PyPI source package loader, ...) feed the Software Heritage Archive (Merkle DAG + blob storage), whose node types are origins, snapshots, releases, revisions, directories, contents.] Global development history permanently archived in a uniform data model: over 20 billion unique source files from over 300 million software projects; ~2 PB (compressed) blobs, ~50 B nodes, ~700 B edges.
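
To illustrate why a Merkle DAG gives deduplication for free, here is a minimal toy sketch. It is not the Software Heritage data model: the class `ToyArchive`, its method names and the use of SHA-256 are assumptions made only for this example. Every node is stored under the hash of its own bytes, and a directory node is hashed from the names and keys of its children, so two projects with identical files end up sharing the same nodes.

```python
import hashlib


class ToyArchive:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self.objects = {}  # hex digest -> raw bytes of the node

    def _put(self, payload: bytes) -> str:
        key = hashlib.sha256(payload).hexdigest()
        # Storing an already-known payload changes nothing: deduplication.
        self.objects.setdefault(key, payload)
        return key

    def add_content(self, data: bytes) -> str:
        """Store a file content (leaf node) and return its key."""
        return self._put(b"content\0" + data)

    def add_directory(self, entries: dict) -> str:
        """Store a directory node; entries maps a name to a child key."""
        lines = [f"{name}\0{key}\n" for name, key in sorted(entries.items())]
        return self._put(b"directory\0" + "".join(lines).encode())


archive = ToyArchive()
readme = archive.add_content(b"hello\n")
project_a = archive.add_directory({"README": readme})
# A fork containing the same file produces the same directory key.
fork_b = archive.add_directory({"README": archive.add_content(b"hello\n")})
```

After these calls `project_a` and `fork_b` are the same key and the store holds only two objects (one content, one directory), which is the effect the "loading & deduplication" stage relies on at archive scale.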

Slide 47

Slide 47 text

Software Heritage is radically different, cont’d. Software Hash Identifiers (SWHID), see swhid.org: 50+B intrinsic, decentralised, cryptographically strong identifiers, SWHIDs.

Slide 48

Slide 48 text

Software Heritage is radically different, cont’d. Software Hash Identifiers (SWHID), see swhid.org: 50+B intrinsic, decentralised, cryptographically strong identifiers, SWHIDs.

Slide 49

Slide 49 text

Software Heritage is radically different, cont’d. Software Hash Identifiers (SWHID), see swhid.org: 50+B intrinsic, decentralised, cryptographically strong identifiers, SWHIDs.

Slide 50

Slide 50 text

Software Heritage is radically different, cont’d. Software Hash Identifiers (SWHID), see swhid.org: 50+B intrinsic, decentralised, cryptographically strong identifiers, SWHIDs. In SPDX 2.2; IANA registered "swh:"; WikiData P6138; ISO standard.

Slide 51

Slide 51 text

Software Heritage is radically different, cont’d. Software Hash Identifiers (SWHID), see swhid.org: 50+B intrinsic, decentralised, cryptographically strong identifiers, SWHIDs. In SPDX 2.2; IANA registered "swh:"; WikiData P6138; ISO standard. Full-fledged source code references for traceability, integrity and reproducibility. Examples: Apollo 11 AGC, Quake III rsqrt. Guidelines available: HOWTO and ICMS 2020.
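
"Intrinsic" means the identifier is computed from the object itself, with no registry involved. As a minimal sketch, and only for the simplest object type: content identifiers (`swh:1:cnt:`) use the same hash Git computes for a blob, so they can be reproduced with the standard library. The function name `content_swhid` is hypothetical; directories, revisions, releases and snapshots need the more involved serialisations described at swhid.org and are not covered here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the SWHID of a file content, computed locally.

    The digest is the Git blob hash: SHA-1 over the header
    b"blob <length>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


empty_id = content_swhid(b"")
hello_id = content_swhid(b"hello world\n")
```

Anyone holding the same bytes obtains the same identifier, which is what makes the scheme decentralised: `empty_id` is the identifier of every empty file, wherever it is stored.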

Slide 52

Slide 52 text

Software Heritage is radically different, cont’d. A quick tour as a user, designed for source code. Browse (e.g. Apollo 11 excerpt): like on a developer platform, not a document archive!

Slide 53

Slide 53 text

Software Heritage is radically different, cont’d. A quick tour as a user: reference source code at all granularities, using SWHIDs (full specification available online). SWHIDs guarantee integrity like in blockchains. Figure: compare Fig. 1 and conclusions in the 2012 version and the updated version.
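
The integrity guarantee can be checked by a reader without trusting the archive. The sketch below, under stated assumptions, parses a SWHID (core part plus optional `;key=value` qualifiers such as `lines=`) and verifies that a local file content matches a `cnt` identifier. The helper names `parse_swhid` and `matches_content` are hypothetical, the regular expression is a simplification of the full grammar, and the example identifier with its `lines=1-3` qualifier is made up for illustration.

```python
import hashlib
import re

# Simplified pattern: scheme, version 1, object type, 40 hex digits, qualifiers.
SWHID_RE = re.compile(
    r"^swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})((?:;[a-z]+=[^;]+)*)$"
)


def parse_swhid(swhid: str) -> dict:
    """Split a SWHID into object type, hash and qualifiers."""
    match = SWHID_RE.match(swhid)
    if match is None:
        raise ValueError(f"not a valid SWHID: {swhid!r}")
    object_type, digest, qualifier_part = match.groups()
    qualifiers = dict(q.split("=", 1) for q in qualifier_part.split(";") if q)
    return {"type": object_type, "hash": digest, "qualifiers": qualifiers}


def matches_content(swhid: str, data: bytes) -> bool:
    """Check that data is exactly the content a cnt SWHID refers to."""
    parsed = parse_swhid(swhid)
    if parsed["type"] != "cnt":
        raise ValueError("only content (cnt) identifiers are checked here")
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest() == parsed["hash"]


example = "swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391;lines=1-3"
parsed = parse_swhid(example)
```

Changing a single byte of the file makes `matches_content` return `False`, so a SWHID quoted in an article pins the exact code that was used.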

Slide 54

Slide 54 text

Software Heritage is radically different, cont’d. Getting software archived. Automated harvesting: over 290 million software origins; your researchers’ work may already be there (actually, here)!

Slide 55

Slide 55 text

Software Heritage is radically different, cont’d. Getting software archived. Automated harvesting: over 290 million software origins; your researchers’ work may already be there (actually, here)! Universal archive: all source code from all platforms (BitBucket, GitHub, GitLab, your own forge, etc.). Trigger archival of any code in one click with the updateswh browser extension. Use webhooks to automatically archive your code (a GitHub action is available too). Journals, libraries, open access portals may deposit source code and metadata: example article from IPOL; example article from eLife.
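
Archival can also be triggered from a script. The sketch below only builds the HTTP request and does not send it. It assumes the public "save code now" endpoint has the shape `/api/1/origin/save/<visit_type>/url/<origin_url>/` on archive.softwareheritage.org; check that against the current Web API documentation before relying on it. The function name `build_save_request` is hypothetical, and the origin is one of the repositories mentioned earlier in the talk.

```python
import urllib.request

# Assumed base URL of the Software Heritage Web API.
API_ROOT = "https://archive.softwareheritage.org/api/1"


def build_save_request(origin_url: str, visit_type: str = "git") -> urllib.request.Request:
    """Build (but do not send) a POST request asking for an origin to be archived."""
    endpoint = f"{API_ROOT}/origin/save/{visit_type}/url/{origin_url}/"
    return urllib.request.Request(endpoint, method="POST")


request = build_save_request("https://gitlab.inria.fr/gt-sw-citation/bibtex-sw-entry")
# To actually submit the request, uncomment the next line:
# urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes the sketch easy to wire into a webhook handler or a CI job.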

Slide 56

Slide 56 text

Outline: 1. Open Science & Software; 2. Software Heritage for Open Science and Reproducibility; 3. Quick Demo!; 4. Call to action; 5. Conclusion

Slide 57

Slide 57 text

A walkthrough. Browse (e.g. Imitator [excerpt]; your work may already be there!). Trigger archival: use the updateswh browser extension, configure the webhooks. Get and use SWHIDs (full specification available online). Cite software with the biblatex-software package from CTAN; Overleaf ACMART template available. Example in journals: article from IPOL. Example with adt2amas: source code, archive in SWH, curated deposit in HAL. Extracting all the software products for Inria, for CNRS, for CNES, for LIRMM or for Rémi Gribonval using HalTools. Curated deposit in SWH via HAL, see for example: LinBox, SLALOM, Givaro, NS2DDV, SumGra, Coq proof, ...

Slide 58

Slide 58 text

An example of long-term reproducibility for HPC. (Re)create fully reproducible binaries from source... https://guix.gnu.org/: functional package manager; bit-by-bit reproducibility from the source code.

Slide 59

Slide 59 text

An example of long-term reproducibility for HPC. (Re)create fully reproducible binaries from source... https://guix.gnu.org/: functional package manager; bit-by-bit reproducibility from the source code. ... with a focus on HPC, https://hpc.guix.info/: environment control; support cluster deployment from the source code.

Slide 60

Slide 60 text

An example of long-term reproducibility for HPC. (Re)create fully reproducible binaries from source... https://guix.gnu.org/: functional package manager; bit-by-bit reproducibility from the source code. ... with a focus on HPC, https://hpc.guix.info/: environment control; support cluster deployment from the source code. Connection with Software Heritage: source code archival and identification for Guix and Nix; automatic fallback for missing sources (see experience report).

Slide 61

Slide 61 text

HAL and Software Heritage: building a curated software catalog with minimal user overhead!

Slide 62

Slide 62 text

Outline: 1. Open Science & Software; 2. Software Heritage for Open Science and Reproducibility; 3. Quick Demo!; 4. Call to action; 5. Conclusion

Slide 63

Slide 63 text

Call to action: best practices for ARDC (Archive, Reference, Describe, Cite/Credit) are available... today! Archiving and referencing. For all source code used in research (yes, even small scripts!): ensure it is archived in Software Heritage (see save code now); get the proper SWHID for your software (see detailed HOWTO); add it to research articles for reproducibility (see detailed HOWTO).

Slide 64

Slide 64 text

Call to action: best practices for ARDC are available... today! Archiving and referencing: for all source code used in research (yes, even small scripts!), ensure it is archived in Software Heritage (see save code now); get the proper SWHID for your software (see detailed HOWTO); add it to research articles for reproducibility (see detailed HOWTO). Describing and Citing/Crediting: for software you want to put forward (mention in your CV, reports, etc., get citations and credit for it), do the following extra steps: add codemeta.json with description (see the codemeta generator); reference in the HAL portal (French partners, see online HAL documentation); cite software using the biblatex-software package (in CTAN and TeXLive).
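As an illustration of the "add codemeta.json" step, the sketch below builds a minimal metadata record and serialises it as JSON. Everything in it is an assumption for illustration: the helper name `build_codemeta`, the project name, the author, and the repository URL are all hypothetical, and only a handful of CodeMeta 2.0 properties are shown. The codemeta generator mentioned on the slide remains the more complete way to produce this file.

```python
import json


def build_codemeta(name, description, given_name, family_name,
                   repository, license_url):
    """Return a minimal CodeMeta 2.0 record as a Python dict."""
    return {
        "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
        "@type": "SoftwareSourceCode",
        "name": name,
        "description": description,
        "codeRepository": repository,
        "license": license_url,
        "author": [
            {
                "@type": "Person",
                "givenName": given_name,
                "familyName": family_name,
            }
        ],
    }


# Hypothetical project details, used only to show the shape of the file.
metadata = build_codemeta(
    name="my-research-tool",
    description="Scripts used to produce the figures of an article.",
    given_name="Ada",
    family_name="Lovelace",
    repository="https://example.org/ada/my-research-tool",
    license_url="https://spdx.org/licenses/MIT",
)

# The resulting text would be saved as codemeta.json at the repository root.
codemeta_json = json.dumps(metadata, indent=2)
print(codemeta_json)
```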

Slide 65

Slide 65 text

Call to action: best practices for ARDC are available... today! Archiving and referencing: for all source code used in research (yes, even small scripts!), ensure it is archived in Software Heritage (see save code now); get the proper SWHID for your software (see detailed HOWTO); add it to research articles for reproducibility (see detailed HOWTO). Describing and Citing/Crediting: for software you want to put forward (mention in your CV, reports, etc., get citations and credit for it), do the following extra steps: add codemeta.json with description (see the codemeta generator); reference in the HAL portal (French partners, see online HAL documentation); cite software using the biblatex-software package (in CTAN and TeXLive). Train students, colleagues; engage journals, conferences, learned societies.

Slide 66

Slide 66 text

Outline 1 Open Science & Software 2 Software Heritage for Open Science and Reproducibility 3 Quick Demo! 4 Call to action 5 Conclusion

Slide 67

Slide 67 text

A rally flag for a grand vision: bring together academia, industry, governments, communities "to build a reference, global infrastructure for open and better software"

Slide 68

Slide 68 text

A rally flag for a grand vision: bring together academia, industry, governments, communities "to build a reference, global infrastructure for open and better software". Software Heritage is the first brick ... vendor neutral, open source, a worldwide initiative, a long-term initiative

Slide 69

Slide 69 text

A rally flag for a grand vision: bring together academia, industry, governments, communities "to build a reference, global infrastructure for open and better software". Software Heritage is the first brick ... vendor neutral, open source, a worldwide initiative, a long-term initiative ... that will enable archival, reference, integrity qualification, sharing and reuse; a global software knowledge base; test and deploy world-class tooling

Slide 70

Slide 70 text

A rally flag for a grand vision: bring together academia, industry, governments, communities "to build a reference, global infrastructure for open and better software". Software Heritage is the first brick ... vendor neutral, open source, a worldwide initiative, a long-term initiative ... that will enable archival, reference, integrity qualification, sharing and reuse; a global software knowledge base; test and deploy world-class tooling. A lot more is needed: Software Heritage can be the catalyser of a way bigger undertaking

Slide 71

Slide 71 text

A rally flag for a grand vision: bring together academia, industry, governments, communities "to build a reference, global infrastructure for open and better software". Software Heritage is the first brick ... vendor neutral, open source, a worldwide initiative, a long-term initiative ... that will enable archival, reference, integrity qualification, sharing and reuse; a global software knowledge base; test and deploy world-class tooling. A lot more is needed: Software Heritage can be the catalyser of a way bigger undertaking. You can help! Use, disseminate, contribute, build & adapt research tools, ...

Slide 72

Slide 72 text

Join a growing and active community: Team

Slide 73

Slide 73 text

Join a growing and active community: Team, Ambassadors

Slide 74

Slide 74 text

Join a growing and active community: Team, Ambassadors, Contributors to the platform

Slide 75

Slide 75 text

Join a growing and active community: Team, Ambassadors, Contributors to the platform. Work with us! https://softwareheritage.org/jobs/

Slide 76

Slide 76 text

Report and videos: Annual report

Slide 77

Slide 77 text

Report and videos: Annual report; 5 years in 5 minutes (link)

Slide 78

Slide 78 text

Report and videos: Annual report; 5 years in 5 minutes (link); Evolution of our codebase (link)

Slide 79

Slide 79 text

It’s a long road, but together we can make it. Thank you

Slide 80

Slide 80 text

Credits: This presentation reuses material from Roberto Di Cosmo’s presentations.