Slide 1

Slide 1 text

Data Science on Software Data
WEAREDEVELOPERS WORLD CONFERENCE 2021
Markus Harrer, Software Development Analyst @ INNOQ
Twitter: @feststelltaste
Website: softwareanalytics.de
Slides: speakerdeck.com/feststelltaste/

Slide 2

Slide 2 text

The original horror show: LEGACY SYSTEMS
Image: sethJreid/Pixabay

Slide 3

Slide 3 text

According to https://pkruchten.files.wordpress.com/2013/12/kruchten-colours-yow-sydney.pdf

Slide 4

Slide 4 text

No content

Slide 5

Slide 5 text

THE ULTIMATE QUALITY DASHBOARD
SHIPPING APP, DEVELOPER'S SANITY, PROJECT'S BUDGET
Adapted from Ray Koopa, https://commons.wikimedia.org/wiki/File:Lcars_wallpaper.svg (CC BY-SA 4.0)

Slide 6

Slide 6 text

THE STRIKES BACK

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

IV == VII V == VIII VI == IX

Slide 11

Slide 11 text

A NEW HOPE

Slide 12

Slide 12 text

Answering important SPECIFIC questions
(Chart labels: Frequency, Questions, Importance for us)
Use standard tools for general questions
Option 1: Just ignore the other questions
Option 2: Use Software Analytics to answer your very specific questions!

Slide 13

Slide 13 text

Data Science
Substantial expertise
Data Science Venn diagram by Drew Conway (simplified)

Slide 14

Slide 14 text

REPRODUCIBLE DATA SCIENCE = A WAY TO IMPLEMENT SOLID SOFTWARE ANALYTICS
open, comprehensible, systematic, automated

Slide 15

Slide 15 text

THE SOFTWARE DATA OF OF DEVELOPERS

Slide 16

Slide 16 text

JUDGMENT DAY
Typical ISSUES TO TERMINATE
• Spotting parts of the source code that no one knows anymore
• Finding root causes of our performance bottlenecks
• Identifying alternative modularizations of software systems
• Showing the progress of long-running refactorings
• Measuring the community activity around open source software
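To make the first issue concrete, here is a minimal sketch of how "parts of the source code no one knows anymore" could be spotted from version-control history. The commit rows, file paths, author names and the three-year threshold are all hypothetical, and plain Python stands in for the pandas-based analysis one would more likely write in a notebook.

```python
from datetime import date

# Hypothetical sample of version-control history: (file, author, commit date).
# In practice such rows could be derived from `git log` output.
COMMITS = [
    ("src/billing/Invoice.java", "alice", date(2015, 3, 1)),
    ("src/billing/Invoice.java", "alice", date(2016, 7, 9)),
    ("src/shop/Cart.java", "bob", date(2020, 11, 2)),
    ("src/shop/Cart.java", "carol", date(2021, 5, 20)),
    ("src/legacy/Export.java", "dave", date(2014, 1, 15)),
]


def last_change_per_file(commits):
    """Return a dict mapping each file to the date of its most recent commit."""
    latest = {}
    for path, _author, day in commits:
        if path not in latest or day > latest[path]:
            latest[path] = day
    return latest


def forgotten_files(commits, today, max_age_days):
    """Return the sorted files whose last change is older than max_age_days."""
    latest = last_change_per_file(commits)
    return sorted(
        path for path, day in latest.items()
        if (today - day).days > max_age_days
    )


# Assumed threshold: untouched for more than three years counts as "forgotten".
stale = forgotten_files(COMMITS, today=date(2021, 6, 30), max_age_days=3 * 365)
print(stale)
```

Age of the last change is only one possible proxy for lost knowledge; whether the last authors are still on the team would be another.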

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

Python 3
... and matplotlib, numpy, scikit-learn, NLTK, Pygments, py2neo, requests, BeautifulSoup, Pygal ...

Slide 19

Slide 19 text

code and data in love Computational Notebook

Slide 20

Slide 20 text

Computational Notebook
Data-driven Software Analysis: COMPLETELY AUTOMATED
Jupyter Notebook: Context, Idea, Analysis, Conclusion
• Context documented
• Ideas, assumptions and simplifications explicit
• Calculations presented in an understandable way
• Summaries / What’s next?

Slide 21

Slide 21 text

Literate Programming with Jupyter Notebook

Slide 22

Slide 22 text

Demo I
Miami Cops
Police Academy
Attribution: Tobias ToMar Maier, https://commons.wikimedia.org/wiki/File:VHS_tape_with_time_scale.jpg

Slide 23

Slide 23 text

jQAssistant Neo4j Graph Analytics

Slide 24

Slide 24 text

:Class :Method :Field
https://github.com/buschmais/spring-petclinic

    public class Pet {

        private LocalDate birthDate;

        public LocalDate getBirthDate() {
            return this.birthDate;
        }

        public void setBirthDate(LocalDate birthDate) {
            this.birthDate = birthDate;
        }
    }

Slide 25

Slide 25 text

:Class :Method :Field :Entity
https://github.com/buschmais/spring-petclinic

    @Entity
    @Table(name = "pets")
    public class Pet {
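The labeling idea on these two slides can be sketched as a tiny labeled graph. This is a simplified in-memory stand-in in plain Python, not jQAssistant or Neo4j itself: the node ids, the relationship name `DECLARES` and the helper `declared_members` are assumptions made for illustration.

```python
# Simplified in-memory stand-in for a labeled property graph.
# Each node carries a set of labels, mirroring :Class, :Method, :Field, :Entity.
NODES = {
    1: {"labels": {"Class", "Entity"}, "name": "Pet"},
    2: {"labels": {"Field"}, "name": "birthDate"},
    3: {"labels": {"Method"}, "name": "getBirthDate"},
    4: {"labels": {"Method"}, "name": "setBirthDate"},
}

# Edges as (source id, relationship type, target id).
# "DECLARES" is an assumed relationship name for this sketch.
EDGES = [
    (1, "DECLARES", 2),
    (1, "DECLARES", 3),
    (1, "DECLARES", 4),
]


def declared_members(class_name, label):
    """Return sorted names of members with `label` declared by the given class."""
    class_ids = {
        node_id for node_id, node in NODES.items()
        if "Class" in node["labels"] and node["name"] == class_name
    }
    return sorted(
        NODES[target]["name"]
        for source, rel, target in EDGES
        if source in class_ids and rel == "DECLARES"
        and label in NODES[target]["labels"]
    )


# Classes that additionally carry the Entity label (as added by @Entity above).
entity_names = sorted(
    node["name"] for node in NODES.values()
    if {"Class", "Entity"} <= node["labels"]
)

print(declared_members("Pet", "Method"))
print(entity_names)
```

Adding a label such as `Entity` to an existing node is what lets later queries work with higher-level concepts instead of raw code structures.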

Slide 26

Slide 26 text

Business Subdomain
:Class :Entity :Method :Field
name: birthDate
findings: 2, changes: 5, usage: 100%

Slide 27

Slide 27 text

types: 16, findings: 17, changes: 15, usage: 70%
types: 5, findings: 39, changes: 51, usage: 80%
A perspective that managers can reason about, too!
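The roll-up from individual classes to business subdomains could look roughly like the sketch below. The per-class rows and subdomain names are hypothetical sample data (only the Pet row reuses the "findings 2, changes 5" shown earlier), and the usage percentage is left out because the slides do not say how it is computed.

```python
from collections import defaultdict

# Hypothetical per-class facts: (subdomain, class name, findings, changes).
CLASSES = [
    ("pet", "Pet", 2, 5),
    ("pet", "PetType", 0, 1),
    ("visit", "Visit", 3, 4),
]


def summarize_by_subdomain(classes):
    """Aggregate type count, findings and changes per business subdomain."""
    summary = defaultdict(lambda: {"types": 0, "findings": 0, "changes": 0})
    for subdomain, _name, findings, changes in classes:
        row = summary[subdomain]
        row["types"] += 1
        row["findings"] += findings
        row["changes"] += changes
    return dict(summary)


summary = summarize_by_subdomain(CLASSES)
for subdomain, row in sorted(summary.items()):
    print(subdomain, row)
```

Reporting at subdomain level keeps the numbers tied to concepts that non-developers already use.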

Slide 28

Slide 28 text

Demo II
Terminator
Flash Gordon
Attribution: Tobias ToMar Maier, https://commons.wikimedia.org/wiki/File:VHS_tape_with_time_scale.jpg

Slide 29

Slide 29 text

The return of reason / The Two Tips / The fellow feeling
• Automation
• Openness
• Meta Metric: Number of solved problems
• NO Tool To Rule Them All
Become the Lord of the Things by analyzing software in a data-driven way

Slide 30

Slide 30 text

ASK 'EM ALL
@feststelltaste

Slide 31

Slide 31 text

Appendix

Slide 32

Slide 32 text

More on Software Analytics
softwareanalytics.de

Slide 33

Slide 33 text

Demos

Jupyter notebook, python, pandas, matplotlib
Repo: https://github.com/feststelltaste/software-analytics/tree/master/demos/20210630_WeAreDevelopersWorldCongress
Interactive online version (run notebook with this): https://mybinder.org/v2/gh/feststelltaste/software-analytics/HEAD?filepath=demos%2F20210630_WeAreDevelopersWorldCongress

jQAssistant & Neo4j
Repo Spring PetClinic: https://github.com/javaonautobahn/spring-petclinic
Repo DesignSmells: https://github.com/feststelltaste/designsmells

Slide 34

Slide 34 text

More on Software Analytics
• Adam Tornhill: Software Design X-Rays
• Tim Menzies, Laurie Williams, Thomas Zimmermann: Perspectives on Data Science for Software Engineering
• Christian Bird, Tim Menzies, Thomas Zimmermann: The Art and Science of Analyzing Software Data

Slide 35

Slide 35 text

More on Data Science
• Jeff Leek: The Elements of Data Analytic Style
• Roger D. Peng: Report Writing for Data Science in R
• Wes McKinney: Python for Data Analysis

Slide 36

Slide 36 text

More on Graph Analytics
• Mark Needham & Amy Hodler: Graph Algorithms
• https://neo4j.com/product/graph-data-science-library/

Slide 37

Slide 37 text

Paper about jQAssistant/Neo4j
https://easychair.org/publications/preprint/893N

Slide 38

Slide 38 text

Thank you very much!

Markus Harrer
@feststelltaste

innoQ Germany GmbH
Krischerstr. 100, 40789 Monheim on the Rhine, Germany, +49 2173 3366-0
Ohlauer Str. 43, 10999 Berlin, Germany
Ludwigstr. 180E, 63067 Offenbach, Germany
Kreuzstr. 16, 80331 Munich, Germany

innoQ Switzerland GmbH
Gewerbestr. 11, CH-6330 Cham, Switzerland, +41 41 743 01 11
Albulastr. 55, 8048 Zurich, Switzerland