Slide 1

Slide 1 text

Software Analytics with Data Science on Software Data
BOBKONF 2024, 15.03.2024
Markus Harrer, Software Evolutionist @ INNOQ
Social: markusharrer.de

Slide 2

Slide 2 text

The original horror show
LEGACY SYSTEMS
Image: sethJreid/Pixabay

Slide 3

Slide 3 text

According to https://pkruchten.files.wordpress.com/2013/12/kruchten-colours-yow-sydney.pdf, Icons from https://game-icons.net/, CC BY 3.0 license

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

THE ULTIMATE QUALITY DASHBOARD
SHIPPING APP
DEVELOPER'S SANITY
PROJECT'S BUDGET
Adapted from Ray Koopa, https://commons.wikimedia.org/wiki/File:Lcars_wallpaper.svg (CC BY-SA 4.0)

Slide 6

Slide 6 text

THE

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

A NEW HOPE

Slide 11

Slide 11 text

Answering YOUR SPECIFIC questions
Chart axes: Frequency, Importance (of questions)
- Use standard tools for general questions
- Option 1: Just ignore the other questions
- Option 2: Use Software Analytics to answer your very specific questions!

Slide 12

Slide 12 text

JUDGMENT DAY
Typical ISSUES TO TERMINATE
- Spotting parts in source code no one knows of anymore
- Finding root causes of performance bottlenecks
- Identifying alternative modularization options
- Showing the progress of long-living restructurings
- Measuring the community activity around open source software
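The first issue in the list, spotting code that no one knows anymore, can be approached with version control history. The following is a minimal sketch, not the analysis from the talk: the commit records, the set of active developers, and the function name `find_orphaned_files` are all hypothetical, and only the Python standard library is used so that the example stays self-contained.

```python
from collections import defaultdict
from datetime import date

# Hypothetical commit records (file, author, commit date). In a real analysis
# these would come from version control history, e.g. parsed `git log` output.
COMMITS = [
    ("billing/Invoice.java", "alice", date(2016, 3, 1)),
    ("billing/Invoice.java", "bob", date(2017, 6, 12)),
    ("shop/Cart.java", "carol", date(2023, 11, 2)),
    ("shop/Cart.java", "alice", date(2019, 1, 20)),
    ("legacy/TaxRules.java", "bob", date(2015, 9, 30)),
]

# Assumption: the developers who are still on the team.
ACTIVE_DEVELOPERS = {"carol", "dave"}


def find_orphaned_files(commits, active_developers):
    """Return (file, last change date) for files no active developer ever changed."""
    authors = defaultdict(set)
    last_change = {}
    for path, author, day in commits:
        authors[path].add(author)
        if path not in last_change or day > last_change[path]:
            last_change[path] = day
    orphaned = {
        path: last_change[path]
        for path, names in authors.items()
        if not names & active_developers
    }
    # Oldest last change first: these files are the most likely to be forgotten.
    return sorted(orphaned.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for path, day in find_orphaned_files(COMMITS, ACTIVE_DEVELOPERS):
        print(f"{path}: last changed {day.isoformat()}")
```

Whether "no active author" is the right definition of lost knowledge depends on the team; weighting by number of changed lines or by recency would be equally plausible variations.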

Slide 13

Slide 13 text

THE SOFTWARE DATA OF DEVELOPERS

Slide 14

Slide 14 text

Data Science
Substantial expertise
Data Science Venn diagram by Drew Conway (simplified)

Slide 15

Slide 15 text

REPRODUCIBLE DATA SCIENCE = A WAY TO IMPLEMENT SOLID SOFTWARE ANALYTICS
open, comprehensible, systematic, automated

Slide 16

Slide 16 text

No content

Slide 17

Slide 17 text

3 Python 1 ... and matplotlib, numpy, scikit-learn, NLTK, Pygments, py2neo, requests, BeautifulSoup, Pygal ...

Slide 18

Slide 18 text

Computational Notebook
code and data in love

Slide 19

Slide 19 text

Data-driven Software Analysis
Computational Notebook: Jupyter Notebook
Context, Idea, Analysis, Conclusion

Slide 20

Slide 20 text

Literate Programming with Jupyter Notebook

Slide 21

Slide 21 text

Demo I
Miami Cops, Police Academy
Attribution: Tobias ToMar Maier, https://commons.wikimedia.org/wiki/File:VHS_tape_with_time_scale.jpg

Slide 22

Slide 22 text

jQAssistant Neo4j Graph Data Science

Slide 23

Slide 23 text

Graph labels: :Class, :Method, :Field
    public class Pet {
        private LocalDate birthDate;

        public LocalDate getBirthDate() {
            return this.birthDate;
        }

        public void setBirthDate(LocalDate birthDate) {
            this.birthDate = birthDate;
        }
    }
Source: https://github.com/JavaOnAutobahn/spring-petclinic
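To illustrate how a class such as `Pet` maps onto labelled graph nodes, here is a small sketch in plain Python. It only mimics the idea of a property graph and does not use jQAssistant or Neo4j; the node ids, the relationship names `DECLARES`, `READS` and `WRITES`, and both helper functions are assumptions made for this example.

```python
# Nodes: id -> (labels, properties). The labels mirror those on the slide.
NODES = {
    1: ({"Class"}, {"name": "Pet"}),
    2: ({"Field"}, {"name": "birthDate"}),
    3: ({"Method"}, {"name": "getBirthDate"}),
    4: ({"Method"}, {"name": "setBirthDate"}),
}

# Edges: (source id, relationship type, target id).
# The relationship names are illustrative assumptions for this sketch.
EDGES = [
    (1, "DECLARES", 2),
    (1, "DECLARES", 3),
    (1, "DECLARES", 4),
    (3, "READS", 2),
    (4, "WRITES", 2),
]


def declared_members(class_name, label):
    """Sorted names of all nodes with `label` that the given class declares."""
    class_ids = {
        node_id
        for node_id, (labels, props) in NODES.items()
        if "Class" in labels and props["name"] == class_name
    }
    return sorted(
        NODES[target][1]["name"]
        for source, rel, target in EDGES
        if source in class_ids and rel == "DECLARES" and label in NODES[target][0]
    )


def accessors_of(field_name):
    """Sorted (method name, access kind) pairs for methods touching the field."""
    field_ids = {
        node_id
        for node_id, (labels, props) in NODES.items()
        if "Field" in labels and props["name"] == field_name
    }
    return sorted(
        (NODES[source][1]["name"], rel)
        for source, rel, target in EDGES
        if target in field_ids and rel in {"READS", "WRITES"}
    )


if __name__ == "__main__":
    print(declared_members("Pet", "Method"))
    print(accessors_of("birthDate"))
```

In a graph database the same questions would be expressed as pattern-matching queries over nodes and relationships; the dictionaries above merely make the underlying data model visible.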

Slide 24

Slide 24 text

Graph labels: :Class, :Method, :Field, :Entity
    @Entity
    @Table(name = "pets")
    public class Pet {
Source: https://github.com/JavaOnAutobahn/spring-petclinic

Slide 25

Slide 25 text

Diagram: :Class, :Entity, :Method and :Field nodes (fields birthDate, name) grouped into a Business Subdomain, annotated with 2 findings, 5 changes, 100% usage

Slide 26

Slide 26 text

16 types, 17 findings, 15 changes, 70% usage
5 types, 39 findings, 51 changes, 80% usage
A perspective that managers can reason about as well!
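Figures like these can be derived by rolling up class-level measurements to the subdomain level. The sketch below shows one possible way to do that; the sample rows, the subdomain names, the boolean `used` flag (standing in for whatever usage measurement is available), and the function `summarize_by_subdomain` are hypothetical and are not the data behind the slide.

```python
from collections import defaultdict

# Hypothetical per-class measurements; the values are made up for illustration.
CLASS_METRICS = [
    {"type": "Pet", "subdomain": "pet", "findings": 2, "changes": 5, "used": True},
    {"type": "PetType", "subdomain": "pet", "findings": 1, "changes": 2, "used": True},
    {"type": "Visit", "subdomain": "visit", "findings": 4, "changes": 9, "used": True},
    {"type": "VisitReport", "subdomain": "visit", "findings": 3, "changes": 1, "used": False},
]


def summarize_by_subdomain(rows):
    """Aggregate type count, findings, changes and usage share per subdomain."""
    totals = defaultdict(lambda: {"types": 0, "findings": 0, "changes": 0, "used": 0})
    for row in rows:
        entry = totals[row["subdomain"]]
        entry["types"] += 1
        entry["findings"] += row["findings"]
        entry["changes"] += row["changes"]
        entry["used"] += 1 if row["used"] else 0
    return {
        name: {
            "types": entry["types"],
            "findings": entry["findings"],
            "changes": entry["changes"],
            # Share of types in the subdomain that are flagged as used.
            "usage_percent": round(100 * entry["used"] / entry["types"]),
        }
        for name, entry in totals.items()
    }


if __name__ == "__main__":
    for name, summary in summarize_by_subdomain(CLASS_METRICS).items():
        print(name, summary)
```

Grouping by business subdomain rather than by package or layer is the design choice that makes the numbers discussable with non-developers.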

Slide 27

Slide 27 text

Demo II
Terminator, Flash Gordon
Attribution: Tobias ToMar Maier, https://commons.wikimedia.org/wiki/File:VHS_tape_with_time_scale.jpg

Slide 28

Slide 28 text

The fellowship of the bling: NO Tool To Rule Them All
The Two Tips: Automation, Openness
The return of reason: Meta Metric, Number of solved problems
become the of the T by analyzing software in a data-driven way

Slide 29

Slide 29 text

ASK 'EM ALL
@feststelltaste

Slide 30

Slide 30 text

Appendix

Slide 31

Slide 31 text

Demos
Jupyter notebook, python, pandas, matplotlib
Repo: https://github.com/feststelltaste/software-analytics/tree/master/demos/20240315_BOBKonf_2024
jQAssistant & Neo4j
Repo Spring PetClinic: https://github.com/JavaOnAutobahn/spring-petclinic

Slide 32

Slide 32 text

More on Software Analytics
softwareanalytics.de

Slide 33

Slide 33 text

More on Software Analytics
- Adam Tornhill: Software Design X-Rays
- Tim Menzies, Laurie Williams, Thomas Zimmermann: Perspectives on Data Science for Software Engineering
- Christian Bird, Tim Menzies, Thomas Zimmermann: The Art and Science of Analyzing Software Data

Slide 34

Slide 34 text

More on Data Science
- Jeff Leek: The Elements of Data Analytic Style
- Roger D. Peng: Report Writing for Data Science in R
- Wes McKinney: Python for Data Analysis

Slide 35

Slide 35 text

More on Graph Analytics
- Mark Needham & Amy Hodler: Graph Algorithms
- https://neo4j.com/product/graph-data-science/

Slide 36

Slide 36 text

Paper about jQAssistant/Neo4j
https://easychair.org/publications/preprint/893N

Slide 37

Slide 37 text

Thank you very much!
Markus Harrer
markus.harrer@innoq.com
@feststelltaste
innoQ Deutschland GmbH
Krischerstr. 100, 40789 Monheim am Rhein, Germany, +49 2173 3366-0
Ohlauer Str. 43, 10999 Berlin, Germany
Ludwigstr. 180E, 63067 Offenbach, Germany
Kreuzstr. 16, 80331 Munich, Germany
innoQ Schweiz GmbH
Gewerbestr. 11, CH-6330 Cham, Switzerland, +41 41 743 01 11
Albulastr. 55, 8048 Zurich, Switzerland