Data Science on
Software Data
WEAREDEVELOPERS WORLD CONFERENCE 2021
Markus Harrer
Software Development Analyst @ INNOQ
Twitter: @feststelltaste
Website: softwareanalytics.de
Slides: speakerdeck.com/feststelltaste/
Slide 2
Slide 2 text
The original horror show
IMAGE: sethJreid/Pixabay
LEGACY SYSTEMS
Slide 3
Slide 3 text
According to https://pkruchten.files.wordpress.com/2013/12/kruchten-colours-yow-sydney.pdf
Slide 5
Slide 5 text
Adapted from Ray Koopa https://commons.wikimedia.org/wiki/File:Lcars_wallpaper.svg (CC BY-SA 4.0)
THE ULTIMATE QUALITY DASHBOARD
SHIPPING APP
DEVELOPER'S SANITY
PROJECT'S BUDGET
Slide 6
Slide 6 text
THE
STRIKES BACK
Slide 10
Slide 10 text
IV == VII
V == VIII
VI == IX
Slide 11
Slide 11 text
A NEW HOPE
Slide 12
Slide 12 text
Answering important SPECIFIC questions
Chart labels: Frequency, Questions, Importance for us
Use standard tools for
general questions
Option 1: Just ignore
the other questions
Option 2: Use Software
Analytics to answer your
very specific questions!
Slide 13
Slide 13 text
Substantive expertise
Data Science
Venn diagram by Drew Conway (simplified)
Slide 14
Slide 14 text
REPRODUCIBLE DATA SCIENCE
= A WAY TO IMPLEMENT SOLID
SOFTWARE ANALYTICS
open
comprehensible
systematic
automated
Slide 15
Slide 15 text
THE SOFTWARE DATA OF
OF DEVELOPERS
Slide 16
Slide 16 text
JUDGMENT DAY
Typical ISSUES TO TERMINATE
Spotting parts of the source code that no one knows anymore
Finding root causes of our performance bottlenecks
Identifying alternative modularizations of software systems
Showing the progress of long-running refactorings
Measuring the community activity around open source software
Computational Notebook
COMPLETELY AUTOMATED
• Context documented
• Ideas, assumptions and
simplifications explicit
• Calculations presented in
an understandable way
• Summaries / What’s next?
Jupyter Notebook Context
Idea
Analysis
Conclusion
Data-driven Software Analysis
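The notebook structure above (context, idea, analysis, conclusion) can be sketched as a plain Python script. This is a minimal illustration, not the code from the talk's demo: the commit records, the file names, and the `stale_files` helper are hypothetical, and treating "no change since a cutoff date" as a sign that no one knows the code anymore is an assumed simplification.

```python
from datetime import date

# Context: which parts of the code base has nobody touched for a long time?
# Hypothetical commit records (file path, author, commit date); in a real
# analysis these would be read from the version control history.
COMMITS = [
    ("src/billing/Invoice.java", "alice", date(2015, 3, 1)),
    ("src/billing/Invoice.java", "bob", date(2016, 7, 12)),
    ("src/shipping/Route.java", "carol", date(2021, 5, 20)),
    ("src/shipping/Route.java", "alice", date(2019, 1, 8)),
    ("src/legacy/Export.java", "dave", date(2012, 11, 30)),
]


# Idea (assumption): a file whose latest change is older than a cutoff date
# is a candidate for code that no one knows anymore.
def stale_files(commits, cutoff):
    """Return (path, last change) pairs older than `cutoff`, oldest first."""
    last_change = {}
    for path, _author, day in commits:
        if path not in last_change or day > last_change[path]:
            last_change[path] = day
    stale = [(path, day) for path, day in last_change.items() if day < cutoff]
    return sorted(stale, key=lambda item: item[1])


# Analysis: apply the heuristic to the data.
candidates = stale_files(COMMITS, cutoff=date(2018, 1, 1))

# Conclusion: list the candidates as a starting point for discussion.
for path, day in candidates:
    print(f"{path}: last changed {day.isoformat()}")
```

Keeping the four steps visible as comments mirrors the idea that assumptions and simplifications should be explicit, so a reader can challenge the cutoff or the heuristic itself.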
Slide 21
Slide 21 text
Literate Programming
with Jupyter Notebook
Slide 22
Slide 22 text
Attribution: Tobias ToMar Maier, https://commons.wikimedia.org/wiki/File:VHS_tape_with_time_scale.jpg
Demo I
Miami Cops Police Academy
Slide 23
Slide 23 text
jQAssistant Neo4j
Graph Analytics
Slide 24
Slide 24 text
:Class
:Method
:Field
https://github.com/buschmais/spring-petclinic
import java.time.LocalDate;

public class Pet {
    private LocalDate birthDate;
    public LocalDate getBirthDate() {
        return this.birthDate;
    }
    public void setBirthDate(LocalDate birthDate) {
        this.birthDate = birthDate;
    }
}
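How a class like `Pet` might end up as labeled nodes in a graph can be sketched in a few lines of Python. This is only an in-memory illustration, not how jQAssistant or Neo4j store or query data: the `Node` type, the `DECLARES` relationship name, and the `declared_by` helper are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # e.g. "Class", "Method", "Field"
    name: str


# Nodes for the Pet class from the slide.
pet = Node("Class", "Pet")
birth_date = Node("Field", "birthDate")
getter = Node("Method", "getBirthDate")
setter = Node("Method", "setBirthDate")

# Relationships as (source, type, target) triples.
# "DECLARES" is an assumed relationship name for this sketch.
EDGES = [
    (pet, "DECLARES", birth_date),
    (pet, "DECLARES", getter),
    (pet, "DECLARES", setter),
]


def declared_by(owner, label, edges):
    """Return the names of nodes with `label` that `owner` declares."""
    return [
        target.name
        for source, rel, target in edges
        if source == owner and rel == "DECLARES" and target.label == label
    ]


print(declared_by(pet, "Method", EDGES))
print(declared_by(pet, "Field", EDGES))
```

In a graph database the same question would be a pattern-matching query over nodes and relationships; the list comprehension here only imitates that lookup.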
Slide 25
Slide 25 text
:Class
:Method
:Field
:Entity
https://github.com/buschmais/spring-petclinic
@Entity
@Table(name = "pets")
public class Pet {
Slide 26
Slide 26 text
:Class
Business Subdomain
:Method
:Field
findings 2
changes 5
:Entity
usage 100%
name birthDate
Slide 27
Slide 27 text
types 16
findings 17
changes 15
usage 70%
types 5
findings 39
changes 51
usage 80%
A perspective that managers
can reason about, too!
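The roll-up from individual classes to business subdomains can be sketched as a simple aggregation. The class names, subdomain assignments, and numbers below are made up for illustration (they are not the figures from the slides), and averaging the usage percentage per subdomain is an assumed simplification.

```python
from collections import defaultdict

# Hypothetical per-class measurements:
# (class name, business subdomain, findings, changes, usage in percent)
CLASS_METRICS = [
    ("Pet", "Pet", 2, 5, 100),
    ("PetType", "Pet", 1, 3, 80),
    ("Owner", "Owner", 4, 7, 60),
    ("Visit", "Visit", 0, 2, 90),
]


def summarize_by_subdomain(rows):
    """Aggregate class-level metrics to one summary per subdomain."""
    grouped = defaultdict(list)
    for _cls, subdomain, findings, changes, usage in rows:
        grouped[subdomain].append((findings, changes, usage))

    summary = {}
    for subdomain, values in grouped.items():
        summary[subdomain] = {
            "types": len(values),                      # number of classes
            "findings": sum(v[0] for v in values),     # total findings
            "changes": sum(v[1] for v in values),      # total changes
            "usage": sum(v[2] for v in values) / len(values),  # mean usage
        }
    return summary


summary = summarize_by_subdomain(CLASS_METRICS)
for subdomain, metrics in sorted(summary.items()):
    print(subdomain, metrics)
```

Summaries at this level trade detail for a view that can be discussed in terms of the business rather than individual classes.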
Slide 28
Slide 28 text
Attribution: Tobias ToMar Maier, https://commons.wikimedia.org/wiki/File:VHS_tape_with_time_scale.jpg
Demo II
Terminator Flash Gordon
Slide 29
Slide 29 text
The return
of reason
The Two Tips
The fellow
feeling
Automation
Meta Metric
Number of
solved problems
NO Tool To
Rule Them All
Openness
become the Lord of the Things
by analyzing software in a data-driven way
Slide 30
Slide 30 text
ASK 'EM ALL
@feststelltaste
Slide 31
Slide 31 text
Appendix
Slide 32
Slide 32 text
More on Software Analytics
softwareanalytics.de
Slide 33
Slide 33 text
Demos
Jupyter notebook, Python, pandas, matplotlib
Repo
https://github.com/feststelltaste/software-analytics/tree/master/demos/20210630_WeAreDevelopersWorldCongress
Interactive online version (run the notebook with this)
https://mybinder.org/v2/gh/feststelltaste/software-analytics/HEAD?filepath=demos%2F20210630_WeAreDevelopersWorldCongress
jQAssistant & Neo4j
Repo Spring PetClinic
https://github.com/javaonautobahn/spring-petclinic
Repo DesignSmells
https://github.com/feststelltaste/designsmells
Slide 34
Slide 34 text
More on Software Analytics
Adam Tornhill:
Software Design X-Rays
Tim Menzies, Laurie Williams,
Thomas Zimmermann:
Perspectives on Data Science for
Software Engineering
Christian Bird, Tim Menzies,
Thomas Zimmermann:
The Art and Science of Analyzing
Software Data
Slide 35
Slide 35 text
More on Data Science
Jeff Leek:
The Elements of Data
Analytic Style
Roger D. Peng:
Report Writing for Data
Science in R
Wes McKinney:
Python for Data Analysis
Slide 36
Slide 36 text
More on Graph
Analytics
Mark Needham & Amy Hodler:
Graph Algorithms
https://neo4j.com/product/graph-data-science-library/
Slide 37
Slide 37 text
Paper about jQAssistant/Neo4j
https://easychair.org/publications/preprint/893N
Slide 38
Slide 38 text
Thank you very much!
innoQ Germany GmbH
Krischerstr. 100
40789 Monheim on the
Rhine
Germany
+49 2173 3366-0
Ohlauer Str. 43
10999 Berlin
Germany
Ludwigstr. 180E
63067 Offenbach
Germany
Kreuzstr. 16
80331 Munich
Germany
Gewerbestr. 11
CH-6330 Cham
Switzerland
+41 41 743 01 11
Albulastr. 55
8048 Zurich
Switzerland
innoQ Switzerland GmbH
Markus Harrer
@feststelltaste