Slide 1

Slide 1 text

Increasing Software Quality using the Provenance of Software Development Processes
Andreas Schreiber
German Aerospace Center (DLR), Berlin / Braunschweig / Cologne
> ESA Software Product Assurance Workshop > A. Schreiber • Provenance > 13.06.2013 www.DLR.de • Chart 1

Slide 2

Slide 2 text

Outline
• Introduction
• Provenance
• Software Development Processes
• Queries

Slide 3

Slide 3 text

Introduction
Problem
• Today’s software development processes are complex
• Extensive interaction between developers and tools, and among the tools themselves (manual or automated)
• Tracing and understanding the process is hard
• Software isn’t reused because of a lack of trust and quality
Solution
• Record process information at runtime
• Analyze the recorded information to gain insight and confidence
Standardized (W3C) solution: Provenance

Slide 4

Slide 4 text

Provenance
Definition
Provenance is defined as a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing. (W3C Provenance Working Group, http://www.w3.org/2011/prov)

Slide 5

Slide 5 text

Provenance
Research area since 2002
• Luc Moreau: The foundations for provenance on the web. Foundations and Trends in Web Science, November 2009.
• Yogesh L. Simmhan, Beth Plale, and Dennis Gannon: A survey of data provenance in e-science.

Slide 6

Slide 6 text

Provenance Application Areas
General areas
• Information systems: origin of data, who was responsible for its creation
• Science applications: how the results were obtained
• Publications: origins and references of published results
Applications involve
• Engineering
• Climatology & earth sciences
• Finance
• Medicine, pharmacy & biomedicine
• Security
• Software development
Source: http://www.w3.org/2011/prov/wiki/ISWCProvTutorial

Slide 7

Slide 7 text

Provenance Goal
Express special “meta” information about the data:
• Who played what role in creating the data
• The full revision chain of the data
• For integrated data: which part comes from which original data, and through which process
Source: http://www.w3.org/2011/prov/wiki/ISWCProvTutorial

Slide 8

Slide 8 text

Realizing Provenance
Provenance requires a complete model
• Describing the various constituents (actors, revisions, etc.)
• Balancing between
  • simple (“scruffy”) provenance: easily usable and editable
  • complex (“complete”) provenance: allows detailed reporting of origins, versions, etc.
Source: http://www.w3.org/2011/prov/wiki/ISWCProvTutorial

Slide 9

Slide 9 text

W3C Provenance Data Model (PROV-DM)
Concepts
Nodes
• Entity
• Activity
• Agent
Edges
• association
• responsibility
Figure: relations between Agent, Entity, and Activity: used, wasGeneratedBy, wasDerivedFrom, wasStartedBy, wasEndedBy, wasAssociatedWith, actedOnBehalfOf.
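To make the node and edge types concrete, here is a minimal Python sketch that encodes the relations listed on this slide together with the node types PROV-DM expects at each end (for example, used links an Activity to the Entity it consumed). The `ProvGraph` class and its method names are illustrative only and are not part of any PROV library.

```python
from dataclasses import dataclass, field

# (source node type, target node type) for each PROV-DM relation on the slide.
RELATIONS = {
    "used": ("Activity", "Entity"),
    "wasGeneratedBy": ("Entity", "Activity"),
    "wasDerivedFrom": ("Entity", "Entity"),
    "wasStartedBy": ("Activity", "Entity"),   # entity acting as the trigger
    "wasEndedBy": ("Activity", "Entity"),     # entity acting as the trigger
    "wasAssociatedWith": ("Activity", "Agent"),
    "actedOnBehalfOf": ("Agent", "Agent"),
}


@dataclass
class ProvGraph:
    """Hypothetical in-memory provenance graph: typed nodes plus labelled edges."""
    nodes: dict = field(default_factory=dict)  # node id -> node type
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id: str, node_type: str) -> None:
        if node_type not in ("Entity", "Activity", "Agent"):
            raise ValueError(f"unknown PROV node type: {node_type}")
        self.nodes[node_id] = node_type

    def relate(self, source: str, relation: str, target: str) -> None:
        # Reject edges whose endpoint types do not match the relation.
        expected = RELATIONS[relation]
        actual = (self.nodes[source], self.nodes[target])
        if actual != expected:
            raise ValueError(f"{relation} expects {expected}, got {actual}")
        self.edges.append((source, relation, target))


g = ProvGraph()
g.add_node("report", "Entity")
g.add_node("writing", "Activity")
g.add_node("alice", "Agent")
g.relate("report", "wasGeneratedBy", "writing")
g.relate("writing", "wasAssociatedWith", "alice")
```

Checking endpoint types on insertion keeps "scruffy" input from silently producing edges that PROV-DM does not allow, such as an entity that "used" something.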

Slide 10

Slide 10 text

Baking a Cake
Figure: the activity “baking” takes 100 g butter, 2 eggs, 100 g sugar, and 100 g flour as inputs; the entity “Cake” wasGeneratedBy “baking”.
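The cake example can be written down as plain triples and queried backwards: start from the cake, find the activity that generated it, then collect everything that activity used. This is a hypothetical sketch; the `used` edges for the ingredients are a reading of the figure, and `inputs_of` is an illustrative helper.

```python
# Baking example as (relation, subject, object) triples.
# Assumption: each ingredient is linked to "baking" by a "used" edge.
INGREDIENTS = ["100 g butter", "2 eggs", "100 g sugar", "100 g flour"]

triples = [("used", "baking", item) for item in INGREDIENTS]
triples.append(("wasGeneratedBy", "cake", "baking"))


def inputs_of(entity: str, triples: list) -> list:
    """Return the entities used by whichever activity generated `entity`."""
    activities = {o for r, s, o in triples if r == "wasGeneratedBy" and s == entity}
    return [o for r, s, o in triples if r == "used" and s in activities]


print(inputs_of("cake", triples))
# ['100 g butter', '2 eggs', '100 g sugar', '100 g flour']
```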

Slide 11

Slide 11 text

Provenance Life Cycle
Figure: the provenance life cycle, with the elements Application, Data (Result), Recording of process information, Provenance database, Administration of provenance database, and Query for provenance of data.
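The three roles around the provenance database (recording, querying, administration) can be sketched as one small class. `ProvenanceStore`, its methods, and the example item names are hypothetical; a real system would persist the records, for instance in a graph database.

```python
from collections import defaultdict


class ProvenanceStore:
    """Hypothetical provenance database illustrating the life-cycle roles."""

    def __init__(self):
        # derived item -> set of items it was derived from
        self._sources = defaultdict(set)

    # Recording: the application reports each processing step as it runs.
    def record(self, output: str, inputs: list) -> None:
        self._sources[output].update(inputs)

    # Query: transitive provenance (lineage) of a result.
    def lineage(self, item: str) -> set:
        seen, stack = set(), [item]
        while stack:
            for src in self._sources.get(stack.pop(), ()):
                if src not in seen:  # guards against cycles and repeats
                    seen.add(src)
                    stack.append(src)
        return seen

    # Administration: drop the records of an item that is no longer needed.
    def forget(self, item: str) -> None:
        self._sources.pop(item, None)


store = ProvenanceStore()
store.record("mesh", ["cad_model"])
store.record("result", ["mesh", "solver_config"])
print(sorted(store.lineage("result")))  # ['cad_model', 'mesh', 'solver_config']
```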

Slide 12

Slide 12 text

Software Development Processes

Slide 13

Slide 13 text

Typical DLR Software Development Process
Figure source: DLR Software Projekt- und Entwicklerhandbuch (DLR software project and developer handbook), M. Bock, A. Hermann, T. Schlauch, 22.10.2009

Slide 14

Slide 14 text

Process Steps
• Issue Tracking (requirements, bugs)
• Development (planning, design, coding, testing)
• Continuous Integration
• Documentation (developer, user)
• Release

Slide 15

Slide 15 text

Provenance Model
Activities
• Issue Tracking
• Development
• Continuous Integration
• Documentation
• Release
Entities and Agents
• User
• Issue
• Revision
• Release
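One way to map this model onto PROV-DM is to treat each process step as an Activity, users as Agents, and issues, revisions, and releases as Entities. Under that assumption, the sketch below emits the statements for a single commit and a single release; the function names and the identifier scheme (`development:<rev>`, `release:<name>`) are made up for illustration. Release appears twice in the model: as the activity that produces a release and as the resulting entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    relation: str  # PROV-DM relation name
    subject: str
    obj: str


def commit_statements(user: str, issue: str, old_rev: str, new_rev: str) -> list:
    """PROV statements for one hypothetical development step (a commit)."""
    activity = f"development:{new_rev}"
    return [
        Statement("used", activity, old_rev),          # work starts from the old revision
        Statement("used", activity, issue),            # ...and addresses an issue
        Statement("wasGeneratedBy", new_rev, activity),
        Statement("wasDerivedFrom", new_rev, old_rev),
        Statement("wasAssociatedWith", activity, user),
    ]


def release_statements(release: str, revision: str, manager: str) -> list:
    """PROV statements for a hypothetical release activity and its release entity."""
    activity = f"release:{release}"
    return [
        Statement("used", activity, revision),
        Statement("wasGeneratedBy", release, activity),
        Statement("wasAssociatedWith", activity, manager),
    ]
```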

Slide 16

Slide 16 text

Questions and Problems
• Error detection: Which change set resulted in more failing unit tests?
• Quality assurance: How many releases have been produced this year?
• Process validation: From which revision was release X built?
• Monitoring: How much time has been spent implementing issue X?
• Statistical analysis: How many developers contributed to issue X?
• Developer rating: Which developer is most active in contributing documentation?
• Information: Which features are part of release X?
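As an illustration of the statistical-analysis question, the sketch below counts the developers who contributed to an issue, assuming commits are recorded as activities that `used` the issue and were `wasAssociatedWith` their author. The statements and the `contributors` helper are hypothetical examples, not recorded data.

```python
# Hypothetical PROV statements as (relation, subject, object) triples.
statements = [
    ("used", "commit-1", "ISSUE-7"),
    ("wasAssociatedWith", "commit-1", "alice"),
    ("used", "commit-2", "ISSUE-7"),
    ("wasAssociatedWith", "commit-2", "bob"),
    ("used", "commit-3", "ISSUE-7"),
    ("wasAssociatedWith", "commit-3", "alice"),
    ("used", "commit-4", "ISSUE-9"),
    ("wasAssociatedWith", "commit-4", "carol"),
]


def contributors(issue: str, statements: list) -> set:
    """Agents associated with any activity that used the given issue."""
    activities = {s for r, s, o in statements if r == "used" and o == issue}
    return {o for r, s, o in statements
            if r == "wasAssociatedWith" and s in activities}


print(len(contributors("ISSUE-7", statements)))  # 2
```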

Slide 17

Slide 17 text

Questions and Problems: Categorization
Single tool
• Simple: What is the current overall code coverage?
• Aggregated: How did the number of unit tests change in the last month?
Multi tool
• Developer: How many issues were implemented by developer X for release Y?
• Requirements: How much time has been spent implementing issue X?
• Errors: Which requirement causes the most build failures?

Slide 18

Slide 18 text

Implementation: Collecting Data
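Provenance data can be collected from tools the process already uses. As one hedged example, the version control history can be exported with `git log --pretty=format:"%h|%an|%at"` (abbreviated hash, author name, author date as Unix timestamp) and turned into PROV-style statements. The sample log and the statement layout are assumptions; `endedAtTime` is borrowed from PROV-O and here approximates the commit time by the author date. The parser assumes author names contain no `|`.

```python
from datetime import datetime, timezone

# Made-up sample output of: git log --pretty=format:"%h|%an|%at"
SAMPLE_LOG = """3f2a9c1|alice|1370000000
9b7d44e|bob|1370086400"""


def log_to_statements(log_text: str) -> list:
    """Turn one 'hash|author|timestamp' line per commit into PROV-style triples."""
    statements = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        commit, author, timestamp = line.split("|")
        activity = f"commit:{commit}"
        ended = datetime.fromtimestamp(int(timestamp), tz=timezone.utc).isoformat()
        statements.append(("wasAssociatedWith", activity, f"user:{author}"))
        statements.append(("wasGeneratedBy", f"revision:{commit}", activity))
        statements.append(("endedAtTime", activity, ended))
    return statements


for statement in log_to_statements(SAMPLE_LOG):
    print(statement)
```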

Slide 19

Slide 19 text

Implementation: Graph Database and Query Language
Graph database: Neo4j
• High-performance NoSQL graph database
Query language: Gremlin
• Graph-based programming language for property graphs

Slide 20

Slide 20 text

Queries

Slide 21

Slide 21 text

How many commits did developer X contribute to release Y?

Slide 22

Slide 22 text

How many commits did developer X contribute to release Y?
    $release := g:key($_g, 'string', string($release))
    $commits := $release/outE/inV/inE/outV[@type='commit']
    $relevant := $commits[outE/inV[@type='user' and @name=string($developer)]]
    $count := count($relevant)
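For readers without a Gremlin installation, the same traversal can be mirrored in plain Python. The sketch assumes the edge layout the query implies (release → revision, commit → revision, commit → user) and looks the release up by a `name` property; the graph contents and helper names are hypothetical. Like the Gremlin path, it does not deduplicate commits reached over several paths.

```python
# Hypothetical property graph: vertex id -> properties.
vertices = {
    "rel-1": {"type": "release", "name": "1.0"},
    "rev-9": {"type": "revision"},
    "c1": {"type": "commit"},
    "c2": {"type": "commit"},
    "c3": {"type": "commit"},
    "alice": {"type": "user", "name": "alice"},
    "bob": {"type": "user", "name": "bob"},
}
# Directed edges as (out vertex, in vertex).
edges = [
    ("rel-1", "rev-9"),                                  # release -> revision
    ("c1", "rev-9"), ("c2", "rev-9"), ("c3", "rev-9"),   # commit -> revision
    ("c1", "alice"), ("c2", "bob"), ("c3", "alice"),     # commit -> author
]


def out_vertices(v):
    """Gremlin outE/inV: vertices reached over outgoing edges."""
    return [dst for src, dst in edges if src == v]


def in_vertices(v):
    """Gremlin inE/outV: vertices with an edge pointing at v."""
    return [src for src, dst in edges if dst == v]


def count_commits(developer, release_name):
    # Release lookup (the slide uses a key index for this step).
    releases = [v for v, p in vertices.items()
                if p["type"] == "release" and p.get("name") == release_name]
    commits = [c for r in releases for mid in out_vertices(r)
               for c in in_vertices(mid) if vertices[c]["type"] == "commit"]
    relevant = [c for c in commits
                if any(vertices[u]["type"] == "user" and vertices[u].get("name") == developer
                       for u in out_vertices(c))]
    return len(relevant)


print(count_commits("alice", "1.0"))  # 2
```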

Slide 23

Slide 23 text

Which requirement causes the most build failures?
    $ids := g:dedup(g:key($g, 'type', 'issue')/@identifier)
    $results := g:map()
    foreach $id in $ids
        $issues := g:key($g, 'identifier', string($id))
        $revision := $issues/inE/outV[@type='commit']/inE/outV[@type='revision']
        $build := $revision/inE/outV[@type='build']/inE/outV[@exit_code>0]
        g:assign($results, $id, count($build))
    end
    $most := g:keys(g:sort($results, 'value', true()))[1]
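The same query can be written as a Python sketch over a hypothetical in-memory graph. It assumes the edge directions implied by the Gremlin path (commit → issue, revision → commit, build → revision, and result vertices carrying `exit_code` → build); vertex names and data are invented, and `collections.Counter.most_common` plays the role of the descending sort followed by picking the first key.

```python
from collections import Counter

# Hypothetical provenance graph: vertex id -> properties.
vertices = {
    "i1": {"type": "issue", "identifier": "REQ-1"},
    "i2": {"type": "issue", "identifier": "REQ-2"},
    "c1": {"type": "commit"}, "c2": {"type": "commit"},
    "r1": {"type": "revision"}, "r2": {"type": "revision"},
    "b1": {"type": "build"}, "b2": {"type": "build"}, "b3": {"type": "build"},
    "s1": {"type": "result", "exit_code": 1},
    "s2": {"type": "result", "exit_code": 0},
    "s3": {"type": "result", "exit_code": 2},
    "s4": {"type": "result", "exit_code": 1},
}
# Directed edges (out vertex, in vertex), following the Gremlin path.
edges = [
    ("c1", "i1"), ("c2", "i2"),                              # commit -> issue
    ("r1", "c1"), ("r2", "c2"),                              # revision -> commit
    ("b1", "r1"), ("b2", "r1"), ("b3", "r2"),                # build -> revision
    ("s1", "b1"), ("s2", "b2"), ("s3", "b2"), ("s4", "b3"),  # result -> build
]


def sources(targets, vertex_type=None):
    """Gremlin inE/outV: vertices with an edge into any target, optionally type-filtered."""
    return [src for t in targets for src, dst in edges
            if dst == t and (vertex_type is None or vertices[src].get("type") == vertex_type)]


def most_failing_requirement():
    failures = Counter()
    identifiers = {p["identifier"] for p in vertices.values() if p.get("type") == "issue"}
    for ident in identifiers:  # the set plays the role of g:dedup
        issues = [v for v, p in vertices.items() if p.get("identifier") == ident]
        revisions = sources(sources(issues, "commit"), "revision")
        builds = sources(revisions, "build")
        failed = [v for v in sources(builds) if vertices[v].get("exit_code", 0) > 0]
        failures[ident] = len(failed)
    return failures.most_common(1)[0][0] if failures else None


print(most_failing_requirement())  # REQ-1
```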

Slide 24

Slide 24 text

Open Research Topics
• Hiding the complexity of queries
• Visualization of query results
• Standardized semantics/ontology for software development processes

Slide 25

Slide 25 text

Summary
• Recording provenance at runtime
• Deep insight into software development processes
• Higher trust in software quality
• Allows reuse with more confidence
• Current research field!
Questions?
Andreas Schreiber
Twitter: @onyame
http://www.dlr.de/sc