Softwarearchäologie_mit_KI_-_Vom_Scherbenhaufen_zum_klaren_Gesamtbild__OOP_2026_.pdf

Software Archaeology with AI: From a Pile of Shards to a Clear Overall Picture
OOP 2026, 13.02.2026, Munich. Markus Harrer, AI-assisted Software Evolutionist

The fast-moving AI world: Devstral, VibeCoding, ChatGPT Deep Research, GPT o3 / o4-mini, Veo 3, Claude 4, Gemini CLI, Context Engineering, Agents, GLM-4.5, KimiK2, Qwen3-235B-A22B, Skills, Kilo Code, GPT-5, Qwen3-Next, gpt-oss, DeepSeek-OCR, MiniMax M2, gpt-oss-safeguard, Gemini 3, GPT-5.3

Only a few things remain stable

A wealth of AI tools (figure labels: generic, agentic, assistive, specialized; Codex CLI)
Landscape of AI coding assistants, corrected and extended by Markus Harrer, based on the ideas of Bilgin Ibryam (https://generativeprogrammer.com/p/ai-coding-assistants-landscape/); gave up updating it in July 2025 …

My view on the AI topic: away from caressing wood / code, toward working wood / code
Formerly, today, tomorrow? (probably not*)
Left image: © Deutsche Fotothek, CC BY-SA 3.0. Middle image: © Firma Altendorf, CC BY-SA 3.0 Unported. Right image: © Firma Anthon GmbH, fair use.
*because assembly-line work != custom development

But …

What about developers and their legacy systems?

AI as a savior for legacy systems? (figure labels: Legacy System, Wisdom Blower, LogOmagic, SellChef, Die Zentrale, The Sourcerer)

Just let the AI do its thing? Claude Code: "Refactor everything"
Similar version: https://www.youtube.com/watch?v=MAnQ5u6JqdI (this was meant as a joke). This note is here because, after this demo, some people believed it really was that easy. Nope!

The result + AI API budget for the month exceeded

Original: https://x.com/josh_bickett/status/1725556267014595032 (texts slightly adapted)
My AI tools: Claude Code, Python, pandas, OpenRouter

How archaeology can help us use AI in legacy environments.
ArchAIologie

Archaeology? https://en.wikipedia.org/wiki/File:Archaeology.rome.arp.jpg https://en.wikipedia.org/wiki/File:Dolina-Pano-3.jpg

Archaeology: it is about putting ourselves into the (thought) worlds of our predecessors.

"Archaeologists look for clues that people before us left behind and try to understand what they mean. Detectives of the past."
Archaeologist in "Archaeology at work", English Heritage Education Service (https://www.youtube.com/watch?v=TFejIkYDH9Q)

Legacy system: archaeologist and developer ask the same questions. What is there? What is it? How was it used?

Modern archaeological techniques, ordered by extent of AI use (low to high):
Excavation: What is there?
Typology: What is it?
Chaîne Opératoire [1]: How was it used?
[1] pronounced roughly „schenn opörätuar“ (German phonetic spelling); roughly "operational chain" (German: Ablaufkette)

Excavation: What is there? (Extent of AI use: low to high)

Excavation: Wheeler-Kenyon method. Left image: https://en.wikipedia.org/wiki/Wheeler%E2%80%93Kenyon_method#/media/File:Moza-449.jpg; right image: public domain

Wheeler-Kenyon method for code: measure files, excavate history, build hierarchies, draw the map. Images via NotebookLM

Excavation: Wheeler-Kenyon method for code analysis via Git log and tree maps
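The "measure files, excavate history, build hierarchies" steps can be sketched in a few lines of Python. This is only a minimal illustration, not the analysis script used in the talk: it assumes the history was exported with `git log --numstat --format=`, and the sample log, the file paths and the function names `parse_numstat` and `roll_up` are made up for the example.

```python
from collections import Counter
from pathlib import PurePosixPath


def parse_numstat(log_text):
    """Sum added + deleted lines per file from numstat rows (added<TAB>deleted<TAB>path)."""
    churn = Counter()
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank lines and anything that is not a numstat row
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # git reports binary files as "-"
        churn[path] += int(added) + int(deleted)
    return churn


def roll_up(churn, depth=1):
    """Aggregate file-level churn to the first `depth` directory levels."""
    totals = Counter()
    for path, lines in churn.items():
        parents = PurePosixPath(path).parts[:-1][:depth]
        totals["/".join(parents) or "."] += lines
    return totals


# Hypothetical sample data standing in for real `git log` output.
SAMPLE_LOG = "\n".join([
    "10\t2\tsrc/model/Owner.java",
    "3\t1\tsrc/web/OwnerController.java",
    "",
    "5\t5\tsrc/model/Owner.java",
    "-\t-\tdocs/logo.png",
])

churn = parse_numstat(SAMPLE_LOG)
by_dir = roll_up(churn, depth=2)
```

The per-directory totals are the kind of hierarchical data a tree map needs; drawing the map itself (for example with Plotly, which the talk lists among its tools) is left out here. Renamed files show up in numstat output with a special `old => new` notation that this sketch does not resolve.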

Excavation

The elephant in the room: where is the AI?

The AI in the room helped me write the analysis script

I already had my fun in the past (and LLMs know that!)
My software analytics repositories on GitHub: pandas, matplotlib, Plotly, Jupyter Notebook

Ephemeral Jupyter Notebooks = comprehensible throwaway analyses as excavation tools
A tool for creating a computational narrative: a document that combines computer code, its output (text, plots, etc.) and human-readable text to explain the logic. Fernando Pérez & Brian Granger: "Project Jupyter: Computational Narratives as the Engine of Collaborative Data Science" (2015).

Less black-box magic AI, more "Guided AI"
✗ Legacy code, agentic AI, instructions for action
✓ Legacy code, guided AI, ideas for action

Excavation: What is there? (Extent of AI use: low to high)

Typology: What is it? (Extent of AI use: low to high)

Typology: classification of artifacts by sorting them according to shapes and kinds

Typology: classification of artifacts by sorting them according to names and kinds

Typology: sorting can become very tedious and repetitive
Image: https://pixabay.com/de/photos/%C3%A4gypten-tempel-hieroglyphen-pharao-1197835/ (NadineDoerle)

Creating a typology with LLMs: using existing naming schemes
Repository pattern: repository/*Repository.java
Abstracts data access by encapsulating the logic required to retrieve, store and query data.

.
├── model
│   ├── Owner.java
│   ├── Person.java
│   ├── Pet.java
│   ├── Specialty.java
│   ├── Vet.java
│   └── Visit.java
├── repository
│   ├── OwnerRepository.java
│   ├── PetRepository.java
│   ├── VetRepository.java
│   └── VisitRepository.java
└── web
    ├── OwnerController.java
    ├── PetController.java
    ├── PetValidator.java
    ├── VetController.java
    └── VisitController.java
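How a naming scheme such as repository/*Repository.java picks out the files of one concept can be tried out directly. The following is a small sketch using Python's standard `pathlib`; the file list is a subset of the tree on the slide, and the variable names are only illustrative.

```python
from pathlib import PurePosixPath

# A few paths taken from the directory tree shown on the slide.
FILES = [
    "model/Owner.java",
    "repository/OwnerRepository.java",
    "repository/PetRepository.java",
    "web/OwnerController.java",
]

# The naming scheme of the Repository pattern from the slide.
PATTERN = "repository/*Repository.java"

# PurePosixPath.match() compares a relative pattern against the end of the path.
repositories = [f for f in FILES if PurePosixPath(f).match(PATTERN)]
```

Because the pattern is matched from the right, the same scheme also finds repositories that sit deeper in a source tree, for example below a `src/main/java/...` prefix.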

Typology prompt for Claude Code

Analyze the production Java code in this codebase and extract distinct concepts. Categorize them into two groups:
- technical_concepts: architectural patterns, design tactics, or technical structures
- business_concepts: domain-relevant ideas, rules, or terms that represent key business logic
For each concept, provide:
- name: a short, descriptive name
- explanation: a concise description of what the concept is
- rationale: why this concept likely exists in the codebase (technical or domain motivation)
- file_globs: glob-style patterns of the used naming conventions to identify where this concept appears in the codebase (e.g., **/**Service.java, **/invoicing/**.java)
Output the result as a well-structured YAML file with two top-level sections: technical_concepts and business_concepts.
Focus only on Java production code (exclude test files, scripts, and configuration files).

The expected result from the LLM

technical_concepts:
  - name: "Boundary"
    description: "Defines the interfaces for communication between the core business logic (Interactors) and the outer layers (e.g., UI, web services). It includes Request and Response Models."
    rationale: "Separates the core application from the delivery mechanism, allowing the presentation layer to change independently of the business rules."
    file_globs:
      - "**/boundary/*.java"
  ...
business_concepts:
  - name: "Site"
    description: "A 'Site' represents a distinct container or context for content like comments, files, and schedules. Most other business concepts are scoped within a specific site."
    rationale: "The concept of a 'Site' allows for multi-tenancy or partitioning of data, where different users or groups can have their own isolated space within the application."
    file_globs:
      - "**/site/**/*.java"

Glob patterns + some manual editing…
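Once the LLM has produced such glob patterns, assigning files to concepts is a deterministic step. The sketch below shows one possible way to do it; it is an assumption, not the workbench from the talk. The concepts are written as a plain Python dict mirroring the YAML structure (to avoid a YAML dependency), the file paths are hypothetical, and `glob_to_regex` is a hand-written helper that treats `**` as "any depth" and `*` as "within one path segment".

```python
import re


def glob_to_regex(glob):
    """Translate a glob with `**` (any depth) and `*` (one path segment) into a regex."""
    out = []
    i = 0
    while i < len(glob):
        if glob.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more directories
            i += 3
        elif glob.startswith("**", i):
            out.append(".*")  # anything, including slashes
            i += 2
        elif glob[i] == "*":
            out.append("[^/]*")  # anything except a slash
            i += 1
        else:
            out.append(re.escape(glob[i]))
            i += 1
    return re.compile("".join(out))


def assign(files, concepts):
    """Map every file to the list of (section, concept name) pairs whose globs match."""
    result = {path: [] for path in files}
    for section, entries in concepts.items():
        for entry in entries:
            patterns = [glob_to_regex(g) for g in entry["file_globs"]]
            for path in files:
                if any(p.fullmatch(path) for p in patterns):
                    result[path].append((section, entry["name"]))
    return result


# Mirrors the YAML from the slide, reduced to the fields needed here.
CONCEPTS = {
    "technical_concepts": [{"name": "Boundary", "file_globs": ["**/boundary/*.java"]}],
    "business_concepts": [{"name": "Site", "file_globs": ["**/site/**/*.java"]}],
}

# Hypothetical file inventory.
FILES = [
    "src/boundary/CommentRequest.java",
    "src/site/comment/Comment.java",
    "src/site/boundary/SiteResponse.java",
    "src/util/Strings.java",
]

assignment = assign(FILES, CONCEPTS)
```

Files that end up with an empty list are the ones no concept explains yet, which is where the manual editing mentioned on the slide would start.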

Post-processing workbench

Concepts

File inventory

Git History Source Code

Typology evaluation: what's the benefit? How many files can be assigned to concepts?
Distribution for a small software system (~300 source code files) (figure label: Legacy Code)
If you know one file that implements a concept, you know all other files that implement the same concept!

Fog of War

Typology: also, see where uncertainties potentially remain

Typology: coverage legend
▪ Both concept types
▪ Technical concept
▪ Business concept
▪ No concept
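The four legend categories follow directly from two yes/no questions per file: is it covered by a technical concept, and is it covered by a business concept? A minimal sketch, with hypothetical file names and sets that stand in for the real inventory:

```python
from collections import Counter

# The four categories of the coverage legend, keyed by (technical?, business?).
LEGEND = {
    (True, True): "both concept types",
    (True, False): "technical concept",
    (False, True): "business concept",
    (False, False): "no concept",
}


def coverage(files, technical, business):
    """Count files per legend category; `technical` and `business` are sets of paths."""
    return Counter(LEGEND[(f in technical, f in business)] for f in files)


# Hypothetical inventory and concept assignments.
FILES = [
    "src/boundary/CommentRequest.java",
    "src/site/comment/Comment.java",
    "src/site/boundary/SiteResponse.java",
    "src/util/Strings.java",
]
TECHNICAL = {"src/boundary/CommentRequest.java", "src/site/boundary/SiteResponse.java"}
BUSINESS = {"src/site/comment/Comment.java", "src/site/boundary/SiteResponse.java"}

counts = coverage(FILES, TECHNICAL, BUSINESS)
unassigned_share = counts["no concept"] / len(FILES)
```

The share of files in the "no concept" category is a rough indicator of how much of the system is still under the fog of war.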

Code inventory: not just "what is there", but "how much of what is there?"

Technical concepts
Boundary: 66 file(s)
Interactor: 45 file(s)
Entity: 62 file(s)
Gateway: 17 file(s)
Delivery: 19 file(s)
RESTful API: 15 file(s)
Dependency Injection: 1 file(s)
Request Model: 30 file(s)
Response Model: 22 file(s)
POJO Entities: 10 file(s)
Validation: 8 file(s)

Business concepts
Site: 23 file(s)
Comment: 28 file(s)
Creator: 9 file(s)
File: 22 file(s)
Scheduling: 49 file(s)
Todo List: 43 file(s)
Mail Notification: 8 file(s)

+ website / forms + tables + integrations … + relationships

Extended typology: assessing conceptual integrity
Source: L. Adams Gilmour, Early Medieval Pottery from Flaxengate, Lincoln
Image: https://pixabay.com/de/photos/arch%C3%A4ologie-arch%C3%A4ologische-ausgrabung-59150/

Extended typology: detect files within concepts that do not do what all the others do (figure labels: DService, KService, BService, CService)

Extended typology: assessing conceptual integrity with LLMs
Prompt excerpt: [...] Please analyze the following source code and assess how well it implements the concept given here. [...]
https://github.com/feststelltaste/software-analytics/tree/master/demos/20260213_OOP_2026

Extended typology: assessing conceptual integrity with LLMs
Response excerpt: The code [...] matches the concept [...] perfectly. Confidence: 1.0
https://github.com/feststelltaste/software-analytics/tree/master/demos/20260213_OOP_2026

Extended typology: detect files within concepts that do not do what all the others do
Example: the concept of a service for serving beer (figure labels: BService, CService, DService, KService)
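If every file of a concept has been rated by an LLM with a confidence value like the "Konfidenz: 1,0" on the slide above, the deviating files can be picked out mechanically. The sketch below is an assumption about how that could look, not the notebook from the linked repository: the answers, the file names and the threshold of 0.5 are invented for illustration, and the regular expression accepts both the German and the English label as well as a decimal comma or point.

```python
import re

# Accepts "Konfidenz: 1,0" as well as "Confidence: 1.0".
CONFIDENCE_RE = re.compile(r"(?:Konfidenz|Confidence):\s*([01](?:[.,]\d+)?)")


def parse_confidence(answer):
    """Extract the confidence value from an LLM answer, or None if there is none."""
    match = CONFIDENCE_RE.search(answer)
    if match is None:
        return None
    return float(match.group(1).replace(",", "."))


def flag_deviating(answers, threshold=0.5):
    """Return the files whose confidence is missing or below the threshold."""
    flagged = []
    for file_name, answer in answers.items():
        confidence = parse_confidence(answer)
        if confidence is None or confidence < threshold:
            flagged.append(file_name)
    return flagged


# Hypothetical LLM answers for the files of one concept.
ANSWERS = {
    "FooService.java": "Der Code [...] stimmt perfekt mit dem Konzept [...] überein. Konfidenz: 1,0",
    "BarService.java": "The code mostly matches the concept. Confidence: 0.9",
    "BazService.java": "The code does something else entirely. Confidence: 0.2",
    "QuxService.java": "No assessment possible.",
}

suspects = flag_deviating(ANSWERS)
```

Treating a missing score as suspicious keeps unparseable answers visible instead of silently counting them as conforming.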

Conceptual integrity analysis (technical)

Conceptual integrity analysis (business)

Conceptual integrity analysis

"I don't have any structure at all"
Outlook: neurosymbolic pattern mining? AST/CST (Abstract/Concrete Syntax Tree) + LLMs

Target picture: extract semantic clusters and assign them to concepts (only a schematic illustration; clusters shown: Repositories, RESTResources, State Machine)

Typology: What is it? (Extent of AI use: low to high)

Chaîne Opératoire: How was it used? (Extent of AI use: low to high)

The operational chain: the chaining of all steps in the life cycle of an artifact, such as
1. Creation
2. Use
3. Maintenance
4. Repair
5. Disposal
...
Image: https://fr.wikipedia.org/wiki/Cha%C3%AEne_op%C3%A9ratoire#/media/Fichier:Cha%C3%AEne_op%C3%A9ratoire.png

Chaîne Opératoire for code? Why?
"It gives us a certain impression of the complexity of these societies. [...] takes us into the mindset of these societies."
Source: https://www.youtube.com/watch?v=MNp5q3pqkmQ, Jason Cohen, https://www.intothedustarchaeology.com/
Not an archaeologist, but plays one on TV

Chaîne Opératoire: the operational chain of all steps in the life cycle of an artifact
Timeline (t): create public class Customer, add testNameCheck(), change to BusinessPartner, fix tech debt, add calculateBonus(), refactor testNameCheck(), delete BusinessPartner

public class BusinessPartner {
    private String name;
    private double bonus;

    public BusinessPartner(String name) {
        this.name = name;
    }

    public double calculateBonus() {
        ...
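Such a chain can be reconstructed from version history by following one artifact through creation, changes, renames and deletion. The sketch below assumes the history was exported with `git log --name-status -M --date=short --format="@%ad %s"`; the leading `@` marker, the dates, the paths and the function names are choices made for this example, and the commit subjects are borrowed from the timeline above.

```python
def parse_log(log_text):
    """Split the exported log into commits with their name-status rows."""
    commits = []
    for line in log_text.splitlines():
        if line.startswith("@"):
            date, _, subject = line[1:].partition(" ")
            commits.append({"date": date, "subject": subject, "changes": []})
        elif "\t" in line and commits:
            commits[-1]["changes"].append(line.split("\t"))
    return commits


def chain_for(commits, start_path):
    """Walk oldest-first and follow renames of the artifact first seen as start_path."""
    current = start_path
    chain = []
    for commit in reversed(commits):  # git log prints the newest commit first
        for status, *paths in commit["changes"]:
            if paths[0] != current:
                continue
            if status.startswith("R"):
                current = paths[1]  # rename rows are: R<score>, old path, new path
            chain.append((commit["date"], status[0], commit["subject"]))
    return chain


# Hypothetical log excerpt (newest first, as git prints it).
SAMPLE_LOG = "\n".join([
    "@2024-05-01 delete BusinessPartner",
    "",
    "D\tsrc/BusinessPartner.java",
    "",
    "@2023-02-10 add calculateBonus()",
    "",
    "M\tsrc/BusinessPartner.java",
    "",
    "@2022-07-04 change to BusinessPartner",
    "",
    "R090\tsrc/Customer.java\tsrc/BusinessPartner.java",
    "",
    "@2021-01-15 create public class Customer",
    "",
    "A\tsrc/Customer.java",
])

chain = chain_for(parse_log(SAMPLE_LOG), "src/Customer.java")
```

The resulting list of (date, status, subject) tuples is the raw material for a life-cycle timeline of one class, including the point where `Customer` turned into `BusinessPartner`.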

It's 2025+, we have AI agents!

Breaking the Magician's Code: Magic's Biggest Secrets Finally Revealed
"... using a combination of glob and grep." "Claude Code is making use of agentic search"
Anthropic: Transform Legacy Systems into Strategic Assets - Code Modernization with AI https://www.youtube.com/watch?v=8qtSeQuNv0o
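What "a combination of glob and grep" means can be shown without any agent at all: first narrow down the files by name pattern, then search their content line by line. The following sketch only illustrates that two-step idea in plain Python; it is not how Claude Code is implemented, and the function name, the demo files and the search expression are made up.

```python
import re
import tempfile
from pathlib import Path


def glob_then_grep(root, file_glob, regex):
    """Yield (relative path, line number, line) for every matching line."""
    pattern = re.compile(regex)
    for path in sorted(Path(root).rglob(file_glob)):  # step 1: glob by file name
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):  # step 2: grep the content
                yield path.relative_to(root).as_posix(), number, line.strip()


# Small self-contained demo on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    repo_dir = Path(tmp) / "repository"
    repo_dir.mkdir()
    (repo_dir / "OwnerRepository.java").write_text(
        "interface OwnerRepository {\n  Owner findById(int id);\n}\n", encoding="utf-8"
    )
    (Path(tmp) / "Owner.java").write_text("class Owner {}\n", encoding="utf-8")
    hits = list(glob_then_grep(tmp, "*Repository.java", r"\bfindBy\w+"))
```

With concepts and their glob patterns already known from the typology, the first step comes almost for free, which is one reason a clear scope makes such searches more targeted.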

If scope and task are clear, why not?

Chaîne Opératoire: How was it used? (Extent of AI use: low to high)

So what? ArchAIologie

What do I get out of this? Less fear
"Oh crap, so much code!!!!" → "OK, I could know this much" → "These are all just repositories"

What do I get out of this? New options
Code → Concept → Idea: Rethink? Standardize? Improve?
See also: https://www.innoq.com/de/blog/2025/10/modern-legacy-dank-ki/

sub encrypt{my($p)=@_;my$a="s";my$b="x";my$c=reverse($p.$a.$b.$p);my$d=0;for(split//,$c.$p.$a.uc($b).reverse($p)){$d+=ord($_)*3+length($c)%7}return "MEGA".$d."END"}print encrypt($ARGV[0]);

Oh, they actually just wanted to hash passwords
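Once the concept behind the obscure routine is recognized as password hashing, "standardize" becomes a concrete option. As a hedged illustration of what that could mean, here is a sketch based on PBKDF2 from Python's standard library; it is not a proposal from the talk, and the iteration count and function names are assumptions that would have to follow your own security policy.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumption: choose according to your own hardware and policy


def hash_password(password, salt=None):
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, digest = hash_password("correct horse")
```

Unlike the homegrown `encrypt` routine, a standard key derivation function uses a random salt and a tunable work factor, so the same password does not always produce the same stored value.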

What do I get out of this? Working on legacy code becomes more attractive

Questions, discussions, suggestions, networking. #archaiologie. Thank you!

More on the topic I: https://github.com/feststelltaste/awesome-agentic-software-modernization

More on the topic II: my collection on "Software Analytics": https://github.com/feststelltaste/awesome-software-analytics

Bonus material: you scrolled this far, you deserve a gift

AI for Legacy Modernization: When and How to Use (or not). Light Version 1.2, Markus Harrer

General Idea
- Manual Work: Developers manually analyze, reason about, and fix issues (based on deep domain and system knowledge)
- Transformation Tools: Human-based creation of formal rules and recipes to perform consistent, automated code transformations
- Guided AI: Human-led detection of issues or anti-patterns, followed by localized AI-generated fixes within defined areas
- AI assistants: Human-guided AI-based task execution for fixing code in smaller areas / clearly scoped contexts
- AI agents: Autonomous systems orchestrate analysis, transformation and validation of modernization workflows

Possible Use Cases
- Manual Work: Special issues like redesign of critical parts of business logic or performance optimization
- Transformation Tools: Framework migrations, API upgrades, bulk renames, restructurings
- Guided AI: Identifying systemic issues and using AI to propose or apply localized solutions
- AI assistants: Summarizing code, generating tests & comments, renaming identifiers, writing code snippets
- AI agents: Cleanup ideation, multistep refactoring planning, smaller bug fixing across code bases

Criterion | Manual Work | Transformation Tools | Guided AI | AI assistants | AI agents
Control (How much humans can be in the loop) | ++ | + | o | - | --
Risk (How likely changes go wrong) | -- | - | - | + | ++
Breadth (How wide the method can operate) | -- | - | - | o | ++
Accuracy (How well problematic spots are addressed) | ++ | ++ | + | o | o
Traceability (How well actions can be tracked) | o | ++ | ++ | o | -
Efforts (How much work setup and use need) | ~ | - | o | o | o
Volume (How much can be processed) | -- | ++ | o | - | +

AI for Legacy Modernization: When and How to Use (or not). Full Version 1.2, Markus Harrer

Approaches: Manual Work, Transformation Tools [1], Guided AI, AI assistants, AI agents

General Idea
- Manual Work: Developers manually analyze, reason about, and fix issues (based on deep domain and system knowledge)
- Transformation Tools: Human-based creation of formal rules and recipes to perform consistent, automated code transformations
- Guided AI: Human-led detection of issues or anti-patterns, followed by localized AI-generated fixes within defined areas
- AI assistants: Human-guided AI-based task execution for fixing code in smaller areas / clearly scoped contexts
- AI agents: Autonomous systems orchestrate analysis, transformation and validation of modernization workflows

Possible Use Cases
- Manual Work: Special issues like redesign of critical parts of business logic or performance optimization
- Transformation Tools: Framework migrations, API upgrades, bulk renames, restructurings
- Guided AI: Identifying systemic issues and using AI to propose or apply localized solutions
- AI assistants: Summarizing code, generating tests & comments, renaming identifiers, writing code snippets
- AI agents: Cleanup ideation, automated, multistep refactoring, bug fixing across multiple code bases

Control (How much humans can be in the loop)
- Manual Work: Very High (humans drive everything)
- Transformation Tools: High (humans define transformation logic, execution is automatic)
- Guided AI: Medium (humans guide focus, agents generate and apply solutions)
- AI assistants: Low (humans initiate, roughly guide and review AI’s results)
- AI agents: Very Low (agents make decisions and act with minimal intervention)

Risk (How likely changes go wrong)
- Manual Work: Low to Medium (may suffer from outdated assumptions, overconfidence or unclear goals)
- Transformation Tools: Medium (when creating recipes) to none (during execution, but also depends on recipe quality)
- Guided AI: Low (with good problem localization that allows suggestions in limited contexts)
- AI assistants: High to medium (depends on scope and tasks)
- AI agents: Very high to high (esp. with broad tasks and high autonomy + wrong tool use)

Breadth (How wide the method can operate)
- Manual Work: Very narrow (limited by developers’ cognitive capacities)
- Transformation Tools: Narrow (limits defined by AST, LST or recipe capabilities)
- Guided AI: Narrow (scoped to recognizable patterns or metrics)
- AI assistants: Limited (current file, code block or interaction context)
- AI agents: Very broad (across files, services and task types)

Accuracy (How well problematic spots are addressed)
- Manual Work: Human-level quality (varies by experience)
- Transformation Tools: High (precise and deterministic)
- Guided AI: High (during analysis), medium (during fixing)
- AI assistants: Medium (but error-prone outside narrow, familiar contexts / training)
- AI agents: Medium (depends on prompt quality, feedback loops, available tools)

Traceability (How well actions can be tracked)
- Manual Work: High (with peer review and diffs)
- Transformation Tools: Very high (rules, recipes, diffs)
- Guided AI: High (analysis steps, reports [3], diffs)
- AI assistants: Medium (prompt history, diffs)
- AI agents: Medium (prompts, execution paths, diffs)

Efforts (How much work setup and use require)
- Manual Work: Variable (depends on task difficulty)
- Transformation Tools: Low to medium [4] (depends on reusing existing recipes or creating new ones)
- Guided AI: Medium (because data analysis needed)
- AI assistants: Medium (prompt writing, instruction definition, model tuning)
- AI agents: First low (“it’s just prompts”), later high (MCPs, skills, orchestration, validation, security, …)

Volume (How much can be processed)
- Manual Work: Limited due to the need for deep contextual understanding
- Transformation Tools: High-volume, homogeneous code bases
- Guided AI: Mid-sized codebases (with structural issues)
- AI assistants: Localized impact (limited by context window)
- AI agents: Large, heterogeneous systems (with recurring issues)

[1] e.g. Codemods, OpenRewrite, Rector
[2] e.g. using jQAssistant, Semgrep, CodeScene
[3] e.g. using Jupyter Notebooks
[4] for new recipes, AI might be used

MCP: Model Context Protocol. AST: Abstract Syntax Tree. LST: Lossless Semantic Tree
