Software Analytics with Jupyter, Pandas, jQAssistant, and Neo4j (Neo4j Online Meetup)

Software Analytics with Jupyter, Pandas, jQAssistant and Neo4j
Identifying Problems in Software Development with Data Analysis
Markus Harrer, @feststelltaste
Neo4j Online Meetup, 23rd November 2017

About me
Markus Harrer, Software Development Analyst
Key Activities: Java Development, Data Analysis in Software Development
Areas of Interest: Clean Code, Agile, Software Archeology, Software Revival, Epistemology, Cognitive Psychology
@feststelltaste, feststelltaste.de, [email protected]

Agenda
1. Motivation
2. Software Analytics
3. My Implementation of Software Analytics
4. Examples & Demos
5. Summary
6. Q&A

Motivation Everything wrong with Software Development

Meanwhile in the pub…

Symptom Fixing

Lack of Communication

Politics

Why is software development still so crazy?

WALL OF IGNORANCE
Janelle Klein: IDEAFLOW - How to Measure the PAIN in Software Development. Leanpub

WALL OF IGNORANCE, RISK, VISIBILITY
Janelle Klein: IDEAFLOW - How to Measure the PAIN in Software Development. Leanpub

RISK DATA ANALYSIS VISIBILITY My wife

RISK DATA ANALYSIS VISIBILITY Me

Software Analytics
Sober Problem Solving with Data Analysis based on Software Data

Software Analytics is...
“... analytics on software data for managers and software engineers with the aim of empowering software development individuals and teams to gain and share insight from their data to make better decisions.”
Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine

Right Insights for better Decisions
• Use standard tools for everyday questions
• Use Software Analytics to tackle high-risk problems
(Chart labels: Frequency, Questions, Risk / Value)
Adapted from Tim Menzies, Thomas Zimmermann: Software Analytics - So What?. IEEE Software Magazine

Types of Software Data
• Community
• chronological
• Runtime
• static
=> Problems are interconnected, so the data sources should be, too!

My Guideline
Tackling problems – automated, data-driven and reproducible.
Software Analytics = Data Science on Software Data

Why does it work now?
• Domain-Driven Design brings business language into code
• Data Science enables problem analysis for developers
• New Tools can create high-level concepts
(Diagram labels: Code, Problems, Business Language; axis from abstract to detailed)
Problems can be connected to concepts in business terms!

My Implementation of Software Analytics
How can Developers use the Power of Data Analysis in their Daily Work?

What can you do today?
• Visualize developer contributions over time
• Identify unused, error-prone or abandoned code
• Create a code and problem inventory for legacy systems
• Find performance bottlenecks by analyzing call trees
• Visualize unwanted dependencies between modules
Make specific problems in your software system visible!
e.g. Race Conditions, Architecture Smells, Build Breakers, Programming Errors

Choose known tools or tools for plan B* on a suitable platform.
• Tools: Python, Neo4j, Pandas, Spark
• Platforms: Jupyter, Zeppelin
* tools you want to learn / profit from in the near future
=> Tools shouldn't stand in the way!

Notebook: an open dialog with data
(Diagram labels: Context, Idea, Analysis, Conclusion, Problem)
• Context documented
• Ideas, assumptions and heuristics communicated
• Preprocessing justified
• Calculations understandable
• Summaries conclusive
• Everything automated

Notebook-Driven Data Analysis

Basic Tooling
• Python, Data Scientist's Best Friend: Easy, effective, fast programming language
• Pandas, Pragmatic Data Analysis Framework: Great data structures & integrations with machine learning libraries
• D3, Visualization Library for Data-Driven Documents: Just beautiful, interactive graphics!
• Jupyter, Interactive Notebook: Central hub for data analysis and documentation

Advanced Tooling: jQAssistant & Neo4j
• scan
• document
• validate
https://jqassistant.org/

Advanced Tooling: jQAssistant & Neo4j
Main Ideas
• Scan software structures
• Store data in Neo4j database
• Execute queries
• Examine relationships
• Add high-level concepts
• Validate rules via constraints
• Generate reports

jQAssistant – Use Cases Living, self-validating architecture documentation

jQAssistant – Use Cases
Living, self-validating architecture documentation
+ Find design & code smells
+ Add business perspectives
(Diagram labels: Java Class, Business Subdomain)

Neo4j Schema for Software Data
Node Labels: File, Class, Method, Commit
Relationship Types: CONTAINS, DEPENDS_ON, INVOKES, CONTAINS_CHANGE
Properties: name, fqn, signature, message
Example node (labels File, Java, Type):
    key       value
    name      “Pet”
    fileName  “Pet.java”
    fqn       “foo.bar.Pet”

Cypher Query Example: Spring PetClinic
“Give me all database objects”
    MATCH (t:Type)-[:ANNOTATED_BY]->()-[:OF_TYPE]->(a:Type)
    WHERE a.fqn="javax.persistence.Entity"
    RETURN t AS JpaEntity

Toolchain (Python, Jupyter)
• Data: XML/Graph, Tables, Text
• Input: Pandas, jQAssistant
• Analysis: Pandas, Neo4j
• Output: matplotlib, D3, xlsx, pptx

Examples The complete Toolchain in Action

Example: JaCoCo → Pandas → D3
Production Coverage
1. Measure code coverage in production
2. Calculate ratio of covered lines to all lines
3. Visualize “usage hotspots” with hierarchical bubble chart
https://www.feststelltaste.de/visualizing-production-coverage-with-jacoco-pandas-and-d3/
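Step 2 above can be sketched with Pandas roughly as follows. This is a minimal illustration, not the code from the linked blog post: the inline CSV is made-up sample data, and it assumes a JaCoCo CSV report with `LINE_MISSED` and `LINE_COVERED` columns. The function name `coverage_ratio` is hypothetical.

```python
import io

import pandas as pd

# Made-up excerpt in the shape of a JaCoCo CSV report (assumption:
# the report provides LINE_MISSED and LINE_COVERED per class).
JACOCO_CSV = """PACKAGE,CLASS,LINE_MISSED,LINE_COVERED
org.example.owner,Owner,10,30
org.example.owner,OwnerController,40,0
org.example.vet,Vet,5,15
"""


def coverage_ratio(csv_text: str) -> pd.DataFrame:
    """Return the ratio of covered lines to all lines for each class."""
    df = pd.read_csv(io.StringIO(csv_text))
    df["lines"] = df["LINE_MISSED"] + df["LINE_COVERED"]
    df["ratio"] = df["LINE_COVERED"] / df["lines"]
    return df[["PACKAGE", "CLASS", "lines", "ratio"]]


coverage = coverage_ratio(JACOCO_CSV)
print(coverage)
```

In a bubble chart, `lines` could drive the bubble size and `ratio` the color, so that large but never executed classes stand out.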

Example: Git → Pandas → D3
Knowledge Island*
1. Take Git log with numstats
2. Calculate proportional contributions for each source code file per author
3. Visualize “ownership” with hierarchical bubble chart
* heavily inspired by Adam Tornhill
https://www.feststelltaste.de/knowledge-islands/
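Steps 1 and 2 above might look like this in Pandas. It is a sketch under assumptions: the log text is made-up sample data in the shape that `git log --numstat --pretty=format:"--%an"` would produce, added lines are used as the measure of contribution, and the helper names `parse_numstat` and `ownership` are hypothetical.

```python
import pandas as pd

# Made-up sample: one "--<author>" line per commit, followed by
# tab-separated "added<TAB>deleted<TAB>path" numstat lines.
GIT_LOG = """\
--Alice
10\t2\tsrc/Owner.java
5\t0\tsrc/Vet.java
--Bob
30\t1\tsrc/Owner.java
"""


def parse_numstat(log_text: str) -> pd.DataFrame:
    """Turn the log text into one row per (author, file) change."""
    rows = []
    author = None
    for line in log_text.splitlines():
        if line.startswith("--"):
            author = line[2:]
        elif line.strip():
            added, _deleted, path = line.split("\t", 2)
            if added == "-":  # numstat reports "-" for binary files
                continue
            rows.append({"author": author, "file": path, "added": int(added)})
    return pd.DataFrame(rows)


def ownership(changes: pd.DataFrame) -> pd.DataFrame:
    """Share of added lines per author for each file."""
    per_author = changes.groupby(["file", "author"])["added"].sum()
    total_per_file = per_author.groupby(level="file").transform("sum")
    return (per_author / total_per_file).rename("share").reset_index()


shares = ownership(parse_numstat(GIT_LOG))
print(shares)
```

A file where a single author holds a share close to 1.0 is a candidate "knowledge island".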

Example: jQAssistant → Neo4j → Pandas → D3
Dependency Analysis between Bounded Contexts
https://www.feststelltaste.de/a-graphical-approach-towards-bounded-contexts/

Example: jQAssistant → Neo4j → Pandas → D3
Dependency Analysis between Bounded Contexts
    MATCH (s1:Subdomain)<-[:BELONGS_TO]-
          (type:Type)-[r:DEPENDS_ON*0..1]->
          (dependency:Type)-[:BELONGS_TO]->(s2:Subdomain)
    RETURN s1.name as type, s2.name as dep, COUNT(r) as number
Subdomains => Bounded Contexts that have meaning to business!
https://www.feststelltaste.de/a-graphical-approach-towards-bounded-contexts/
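Once a result with the columns `type`, `dep` and `number` is available, Pandas can reshape it for visualization. The sketch below uses made-up rows instead of a live Neo4j connection. The subdomain names and the helpers `dependency_matrix` and `cross_context` are assumptions for illustration.

```python
import pandas as pd

# Made-up rows in the shape returned by the query (type, dep, number).
ROWS = [
    {"type": "owner", "dep": "owner", "number": 12},
    {"type": "owner", "dep": "visit", "number": 3},
    {"type": "visit", "dep": "visit", "number": 7},
    {"type": "vet", "dep": "owner", "number": 1},
]


def dependency_matrix(records) -> pd.DataFrame:
    """Pivot the rows into a subdomain-by-subdomain matrix."""
    df = pd.DataFrame(records)
    return df.pivot_table(
        index="type", columns="dep", values="number",
        aggfunc="sum", fill_value=0,
    )


def cross_context(records) -> pd.DataFrame:
    """Keep only dependencies that cross a subdomain boundary."""
    df = pd.DataFrame(records)
    return df[df["type"] != df["dep"]].reset_index(drop=True)


matrix = dependency_matrix(ROWS)
crossing = cross_context(ROWS)
print(matrix)
print(crossing)
```

The matrix form suits a chord diagram or heat map. The filtered rows are the ones worth discussing as possibly unwanted dependencies.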

Example: JProfiler → jQAssistant → Neo4j → Pandas
Mining performance hotspots
1. Record Call Trees
2. Identify which parts of the application code are responsible for most of the DB operations
3. Trace problems back to the root causes
(Diagram labels: Incoming Requests, Outgoing SQL Calls)
https://www.feststelltaste.de/mining-performance-hotspots-with-jprofiler-jqassistant-neo4j-and-pandas-part-1-the-call-graph/

Example: jQAssistant → Neo4j → Pandas
Recursive Method Calls
    MATCH (m:Method)-[:INVOKES*]->(m)
    RETURN m
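To make clear what the pattern `(m)-[:INVOKES*]->(m)` matches, here is a plain-Python sketch of the same idea: a method is reported if it can reach itself over one or more invocation edges. The call graph and the function name `recursive_methods` are made up for illustration. In the actual toolchain Neo4j does this traversal.

```python
def recursive_methods(invokes: dict) -> list:
    """Return all methods that can reach themselves via one or more calls.

    `invokes` maps a method name to the set of methods it calls directly.
    """
    def reaches_itself(start: str) -> bool:
        stack = list(invokes.get(start, ()))
        seen = set()
        while stack:
            current = stack.pop()
            if current == start:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(invokes.get(current, ()))
        return False

    return sorted(m for m in invokes if reaches_itself(m))


# Hypothetical call graph: a -> b -> c -> a is an indirect recursion,
# d calls itself directly, e only calls into the cycle.
CALLS = {
    "a": {"b"},
    "b": {"c"},
    "c": {"a"},
    "d": {"d"},
    "e": {"a"},
}

print(recursive_methods(CALLS))
```

Note that `e` is not reported: it calls into a cycle but is not part of one, which mirrors the Cypher pattern requiring the path to end at the same node `m`.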

Example: jQAssistant → Neo4j → Pandas
Recursive Method Calls to Database
    MATCH (m:Method)-[:INVOKES*]->(m)
          -[:INVOKES]->(dbMethod:Method)
          <-[:DECLARES]-(dbClass:Class)
    WHERE dbClass.name = "Database"
    RETURN m, dbMethod, dbClass

Example: jQAssistant → Neo4j → Pandas
Identify possible Race Conditions
    public class OwnerController {
        ...
        private static int ownersIndexes;
static = same field for all instances of that class
    MATCH (c:Class)-[:DECLARES]->(f:Field)<-[w:WRITES]-(m:Method)
    WHERE EXISTS(f.static) AND NOT EXISTS(f.final)
    RETURN c.name, f.name, w.lineNumber, m.name
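The same filter can be illustrated on a table of field writes in Pandas, for example to count how many methods write to each suspicious field. This is a sketch under assumptions: the rows, the method names and line numbers, and the boolean `static` / `final` columns are made up for illustration. The Cypher query above tests for the existence of these properties instead.

```python
import pandas as pd

# Made-up field-write records; method names and line numbers are hypothetical.
WRITES = pd.DataFrame([
    {"class": "OwnerController", "field": "ownersIndexes",
     "static": True, "final": False, "method": "processFindForm", "line": 42},
    {"class": "OwnerController", "field": "ownersIndexes",
     "static": True, "final": False, "method": "initCreationForm", "line": 58},
    {"class": "OwnerController", "field": "owners",
     "static": False, "final": True, "method": "<init>", "line": 30},
    {"class": "Vet", "field": "COUNTER",
     "static": True, "final": True, "method": "<clinit>", "line": 12},
])


def race_candidates(writes: pd.DataFrame) -> pd.DataFrame:
    """Static, non-final fields and the number of methods writing to them."""
    mask = writes["static"] & ~writes["final"]
    return (
        writes[mask]
        .groupby(["class", "field"])["method"]
        .nunique()
        .rename("writing_methods")
        .reset_index()
    )


candidates = race_candidates(WRITES)
print(candidates)
```

Every row in the result is only a candidate: whether it is a real race condition depends on whether the writing methods can run concurrently.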

Summary

Summary
• Tooling for data analysis in software development is here!
• First analyses are easy to do using tools you already know
• Specific in-depth analyses are powerful and worthwhile
• Connection between business and developers is possible!
• Problems can be attached to code that is business-related
• Making the impact of risk-taking visible is a must-have to improve!
• Jupyter/Pandas & jQAssistant/Neo4j are my favorites
• Provide many ways for identifying problems
• Help to figure out solutions as well!

Links
Markus Harrer
• Blog: https://feststelltaste.de
• Twitter: https://twitter.com/feststelltaste
• SlideShare: https://www.slideshare.net/feststelltaste
• Consulting: http://markusharrer.de
jQAssistant/Neo4j
• Demos: https://jqassistant.org/get-started/
• Guide: http://buschmais.github.io/jqassistant/doc/1.3.0/
• Talk by Dirk Mahler: https://vimeo.com/170797227

Q&A Questions and Answers
