Slide 1

Slide 1 text

Software Analytics with Data Science on Software Data
CRAFT CONFERENCE 10, MAY 31, 2024
Markus Harrer, Software Evolutionist @ INNOQ
Social: markusharrer.de

Slide 2

Slide 2 text

The original horror show IMAGE: sethJreid/Pixabay

Slide 3

Slide 3 text

According to https://pkruchten.files.wordpress.com/2013/12/kruchten-colours-yow-sydney.pdf, Icons from https://game-icons.net/, CC BY 3.0 license

Slide 4

Slide 4 text

Adapted from Ray Koopa https://commons.wikimedia.org/wiki/File:Lcars_wallpaper.svg (CC BY-SA 4.0) THE ULTIMATE SOFTWARE QUALITY DASHBOARD DEVELOPERs SANITY PROJECT BUDGET

Slide 5

Slide 5 text

THE

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

ATTACK OF THE DATA

Slide 10

Slide 10 text

Frequency Problems Importance for us Problem-oriented Data ANALYSIS Standard tools for generic problems Software Analytics for very specific problems!

Slide 11

Slide 11 text

THE SOFTWARE DATA OF DEVELOPERS

Slide 12

Slide 12 text

Domain Expertise Data Science Venn diagram by Drew Conway (simplified and slightly changed)

Slide 13

Slide 13 text

No content

Slide 14

Slide 14 text

3 Python 1

Slide 15

Slide 15 text

Context Idea Analysis Conclusion Literate Statistical Programming code and data in love
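The four steps named on this slide (Context, Idea, Analysis, Conclusion) can be sketched as a small notebook-style script. This is only an illustration and not the demo from the talk: the question, the sample log, and the helper `change_frequency` are all made up here, and the log format (one "commit file" pair per line) is an assumption.

```python
from collections import Counter

# Context: we want to know which files change most often.
# Idea: count, per file, how many commits touched it.
# The log below is hypothetical sample data: one "<commit> <path>" pair per line.
SAMPLE_LOG = """\
a1b2c3 src/Pet.java
a1b2c3 src/Owner.java
d4e5f6 src/Pet.java
0718ab src/Pet.java
0718ab src/Visit.java
"""


def change_frequency(log_text):
    """Return a Counter mapping each file path to the number of log lines naming it."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _commit, path = line.split(maxsplit=1)
        counts[path] += 1
    return counts


# Analysis: run the count on the sample data.
frequency = change_frequency(SAMPLE_LOG)

# Conclusion: name the most frequently changed file.
hotspot, hotspot_changes = frequency.most_common(1)[0]
print(f"Most frequently changed file: {hotspot} ({hotspot_changes} commits)")
```

Keeping the question, the code, and the conclusion in one place is what makes the result reviewable by others; in a Jupyter Notebook the comments above would typically become Markdown cells.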

Slide 16

Slide 16 text

Attribution: Tobias ToMar Maier, https://commons.wikimedia.org/wiki/File:VHS_tape_with_time_scale.jpg Demo I Miami Cops Police Academy

Slide 17

Slide 17 text

jQAssistant Neo4j Graph Data Science

Slide 18

Slide 18 text

Graph-based Analytics of Source Code
:Class :Method :Field
https://github.com/JavaOnAutobahn/spring-petclinic
public class Pet {
    private LocalDate birthDate;
    public LocalDate getBirthDate() {
        return this.birthDate;
    }
    public void setBirthDate(LocalDate birthDate) {
        this.birthDate = birthDate;
    }
}

Slide 19

Slide 19 text

Graph-based Analytics of Source Code
:Class :Method :Field :Entity
https://github.com/JavaOnAutobahn/spring-petclinic
@Entity
@Table(name = "pets")
public class Pet { ...
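To make the idea of source code as a labeled graph more concrete, here is a minimal sketch in plain Python. It is not how jQAssistant or Neo4j store the data; the `Node` class, the relationship names (`DECLARES`, `READS`, `WRITES`), and the helpers `label_entities` and `declared_by` are assumptions chosen for illustration. It models the `Pet` class as nodes with labels and then adds the `Entity` label to every class that carries the `@Entity` annotation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node with a name, a set of labels, and the annotations found in the source."""
    name: str
    labels: set = field(default_factory=set)
    annotations: set = field(default_factory=set)


# Nodes for the Pet class from the slide.
pet = Node("Pet", {"Class"}, {"Entity", "Table"})
birth_date = Node("birthDate", {"Field"})
getter = Node("getBirthDate", {"Method"})
setter = Node("setBirthDate", {"Method"})
nodes = [pet, birth_date, getter, setter]

# Relationships as (source, type, target) triples; the type names are assumed.
relationships = [
    (pet, "DECLARES", birth_date),
    (pet, "DECLARES", getter),
    (pet, "DECLARES", setter),
    (getter, "READS", birth_date),
    (setter, "WRITES", birth_date),
]


def label_entities(all_nodes):
    """Add the 'Entity' label to every class node annotated with @Entity."""
    for node in all_nodes:
        if "Class" in node.labels and "Entity" in node.annotations:
            node.labels.add("Entity")


def declared_by(owner, label):
    """Return the names of nodes with the given label that the owner declares."""
    return [
        target.name
        for source, rel_type, target in relationships
        if source is owner and rel_type == "DECLARES" and label in target.labels
    ]


label_entities(nodes)
print(sorted(pet.labels))          # labels of the Pet node after enrichment
print(declared_by(pet, "Method"))  # methods declared by Pet
```

Once such concept labels exist, later questions can be asked in terms of "entities" rather than in terms of individual annotations.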

Slide 20

Slide 20 text

:Class :Method :Field 2 findings 5 changes :Entity 100% usage gain and share Insight from Data Business Subdomain

Slide 21

Slide 21 text

16 types, 17 findings, 15 changes, 70% usage
5 types, 39 findings, 51 changes, 80% usage
A perspective that managers can reason about, too! You can query this! make better Decisions

Slide 22

Slide 22 text

asking the right Questions
What are the top 5 subdomains with the most types and their changes within a subdomain?
MATCH (t:Type)-[:BELONGS_TO]->(s:Business:Subdomain),
      (t)-[:HAS_CHANGE]->(ch:Change)
RETURN s.name as ASubdomain,
       COUNT(DISTINCT t) as Types,
       COUNT(DISTINCT ch) as Changes
ORDER BY Types DESC
LIMIT 5
Changes  Types  ASubdomain
209      15     "Pet"
119      9      "Visit"
117      8      "Vet"
130      7      "Owner"
102      6      "Clinic"
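The aggregation in the query above can be sketched in plain Python to show what the two `COUNT(DISTINCT ...)` expressions compute. This is an illustration only: the rows are hypothetical stand-ins for the matched (type, subdomain, change) combinations, not data from the spring-petclinic graph, and `summarize` is a made-up helper name.

```python
from collections import defaultdict

# Hypothetical matched rows: (type name, subdomain name, change id).
ROWS = [
    ("Pet", "Pet", "c1"),
    ("Pet", "Pet", "c2"),
    ("PetType", "Pet", "c2"),
    ("Visit", "Visit", "c3"),
    ("Owner", "Owner", "c1"),
    ("Owner", "Owner", "c4"),
]


def summarize(rows, limit=5):
    """Count distinct types and distinct changes per subdomain, most types first."""
    types = defaultdict(set)
    changes = defaultdict(set)
    for type_name, subdomain, change_id in rows:
        types[subdomain].add(type_name)      # like COUNT(DISTINCT t)
        changes[subdomain].add(change_id)    # like COUNT(DISTINCT ch)
    summary = [
        {"ASubdomain": s, "Types": len(types[s]), "Changes": len(changes[s])}
        for s in types
    ]
    # Like ORDER BY Types DESC; ties keep their first-seen order.
    summary.sort(key=lambda row: row["Types"], reverse=True)
    return summary[:limit]  # like LIMIT 5


result = summarize(ROWS)
for row in result:
    print(row)
```

Using sets per subdomain mirrors the `DISTINCT` in the query: a change that touches two types of the same subdomain is counted once.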

Slide 23

Slide 23 text

Attribution: Tobias ToMar Maier, https://commons.wikimedia.org/wiki/File:VHS_tape_with_time_scale.jpg Demo II Terminator Flash Gordon

Slide 24

Slide 24 text

The return of the reason The Two Powers The fellowship of the bling Data oriented Meta Metric Number of solved problems NO Tool To Rule Them All Problem driven become the of the T by analyzing software in a data-driven way

Slide 25

Slide 25 text

slido.com #CraftConf2024 Pink Stage ASK 'EM ALL Social contact links: markusharrer.de Thank you!

Slide 26

Slide 26 text

One more thing…

Slide 27

Slide 27 text

AI = JUDGMENT DAY? THREAT ASSESSMENT ********************** - MAYBE NOT - Absolutely Not! - MAYBE Background image by Freepik

Slide 28

Slide 28 text

Appendix

Slide 29

Slide 29 text

More on Software Analytics softwareanalytics.de

Slide 30

Slide 30 text

More on Software Analytics Adam Tornhill: Your Code as a Crime Scene & Software Design X-Rays Tim Menzies, Laurie Williams, Thomas Zimmermann: Perspectives on Data Science for Software Engineering Christian Bird, Tim Menzies, Thomas Zimmermann: The Art and Science of Analyzing Software Data

Slide 31

Slide 31 text

More on Data Science Jeff Leek: The Elements of Data Analytic Style Roger D. Peng: Report Writing for Data Science in R Wes McKinney: Python for Data Analysis

Slide 32

Slide 32 text

Demos
Jupyter Notebook, Python, pandas, matplotlib: https://github.com/feststelltaste/software-analytics/tree/master/demos/20240331_CRAFTCONF_2024
jQAssistant & Neo4j: https://github.com/JavaOnAutobahn/spring-petclinic

Slide 33

Slide 33 text

More on Graph Data Science Mark Needham & Amy Hodler: Graph Algorithms https://neo4j.com/product/graph-data-science/

Slide 34

Slide 34 text

Paper about jQAssistant/Neo4j https://easychair.org/publications/preprint/893N

Slide 35

Slide 35 text

Thank you very much!
innoQ Deutschland GmbH
Krischerstr. 100, 40789 Monheim am Rhein, Germany, +49 2173 3366-0
Ohlauer Str. 43, 10999 Berlin, Germany
Ludwigstr. 180E, 63067 Offenbach, Germany
Kreuzstr. 16, 80331 Munich, Germany
Gewerbestr. 11, CH-6330 Cham, Switzerland, +41 41 743 01 11
Albulastr. 55, 8048 Zurich, Switzerland
innoQ Schweiz GmbH
Markus Harrer, markus.harrer@innoq.com
Slides