Renovate or Rebuild? Architectural Techniques for the Million Euro Tradeoff

At some point most of us will face the critical decision of whether to modernize a legacy system or rebuild it from scratch. While these decisions can make or break budgets and careers, I have observed that they are often driven by emotion, outdated assumptions and optimism, rather than facts, data and clear rationale.

This talk explains how an architecture-led approach can help us to address the renovate-versus-rebuild dilemma, by moving beyond assumptions to use a decision framework based on proven architectural knowledge.

We'll explore how to analyse the context, from technical debt and operational quality attributes to opportunity costs and risks, and gather the data we need to understand the tradeoffs that we face. The talk will then demonstrate how an architecture-driven approach can take these inputs and create a rational, data-driven process that results in logical, defensible options with clear tradeoffs. Finally, we will discuss how to use our stakeholder communication skills to structure and present these tradeoffs to key decision makers in terms they understand and trust.

Whether you're facing a creaking monolith today or are drowning in distributed technical debt, this session will give you a starting point and an approach to make and defend your own legacy system decisions with confidence.

Eoin Woods

October 05, 2025

Transcript

  1. Your system is a little like a “Heath Robinson” machine

    It is working pretty well. You can change it … cautiously … if you understand all of the moving parts. But perhaps it’s time to think about its future … One morning a senior manager says, “Let’s rewrite the whole thing!”
  2. DANGER! DANGER!

    A “clean sheet” rewrite can come with many risks …
    • Stalled rebuilds
    • Never-ending migrations
    • Spiralling costs
    But continuing to patch up a complex, aging platform is risky too. So what should we do?
  3. Eoin Woods

    • Independent consultant
    • Ex-Chief Engineer at Endava, based in London (2015-2025)
    • 10 years in capital markets: UBS and BGI
    • 10+ years in products: Bull, Sybase, InterTrust
    • PhD thesis, University of East London (December 2018): “Addressing Energy Efficiency in System Design: A Journey from Architecture to Operation”
  4. RENOVATE VS REBUILD

    • This decision is often made based on emotions and assumptions
    • It is often made quickly, without many facts being available
    • It frequently ends with a decision people regret
    Can we use architectural thinking to improve how we make this decision?
  5. Of course, the real decision isn’t “Renovate or Rebuild?”

    It is choosing from a spectrum of options …
  6. THE REAL DECISION: A CONTINUUM OF OPTIONS

    • Tactical Improvement: localised improvements when necessary, no overall plan
    • Strategic Improvement: localised improvements made regularly, with clear priorities and a backlog of work
    • Incremental Rebuild: intentional, planned progress towards a future state, built in place, item by item
    • Parallel Rebuild: a green-field replacement built in parallel with ongoing work on the existing system
  7. THE REAL DECISION: STRENGTHS AND WEAKNESSES

    • Tactical Improvement
      Strengths: low commitment; low initial risk; flexible, adaptable and responsive
      Weaknesses: no clear focus for improvement; may not deliver significant benefit; likelihood of wasted effort
    • Strategic Improvement
      Strengths: manageable risk; clear and manageable changes; incremental benefit delivery
      Weaknesses: often stalled by changing priorities; mainly deals with localised problems; root causes often not fixed (=> low RoI)
    • Incremental Rebuild
      Strengths: clearly bounded technical changes; clear long-term direction; phased benefit delivery
      Weaknesses: relatively high risk; complex to accommodate old and new; complex to migrate to new modules
    • Parallel Rebuild
      Strengths: hopefully the best long-term outcome; avoids coexistence compromises; regular new software delivery
      Weaknesses: often the highest overall risk; migration usually extremely complex; parallel work streams cause problems
  8. The question is “What does success look like?”

    That is, what needs to be different to where we are today? This, of course, is stakeholder needs analysis ... and as architects we are good at that!
  9. WHY AN ARCHITECTURE-DRIVEN DECISION?

    • Stakeholder knowledge and experience
    • Tradeoff identification and analysis
    • Broad technical knowledge & judgement
    • Structured approach
    • Good communication
  10. AN ARCHITECTURE-DRIVEN APPROACH

    • Step 1: Where Are You?
    • Step 2: Potential Benefits
    • Step 3: Remediation Options
    • Step 4: Risks, Constraints and Tradeoffs
    • Step 5: Stakeholder-Led Decision
  11. STEP 1: WHERE ARE YOU?

    Goal: understand the context of the decision, covering:
    • Context
    • Critical Concerns
    • Crucial Quality Attributes
    • Code
  12. STEP 1: WHERE ARE YOU?

    • Context: system context, organisational context, team context
    • Critical Concerns: perceptions, complaints, incidents
    • Crucial Quality Attributes: priorities, artefact measurements, operational measurements
    • Characteristics of the Code: structure, complexity, evolution
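
As one illustration of measuring the “evolution” characteristic, change frequency per file can be pulled from version control. The sketch below is a minimal Python example, assuming the system lives in a Git repository: it parses the text printed by `git log --pretty=format: --name-only` and counts how many commits touched each file. The sample paths are hypothetical. Files that change often are worth cross-checking against the complexity measurements, since frequently changed, complex code is a natural place to look for hotspots.

```python
from collections import Counter


def change_frequency(git_log_output: str) -> Counter:
    """Count how many commits touched each file.

    Expects the text produced by `git log --pretty=format: --name-only`,
    i.e. one file path per line, with blank lines between commits.
    """
    counts = Counter()
    for line in git_log_output.splitlines():
        path = line.strip()
        if path:  # skip the blank separator lines between commits
            counts[path] += 1
    return counts


# Hypothetical sample output (three commits) for illustration only.
sample = """
src/calc/pricing.cpp
src/db/schema.sql

src/calc/pricing.cpp
src/ui/workbench.ts

src/calc/pricing.cpp
"""

for path, n in change_frequency(sample).most_common():
    print(f"{n:3d}  {path}")
```

In practice the input would come from running the `git log` command in the repository; the function itself only assumes that one-path-per-line format.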
  13. STEP 2: FIND THE POTENTIAL BENEFITS

    Goal: explore how you can justify large-scale investment and change.
    Keep it simple!
    • What will you be able to do tomorrow that you can’t do today?
    • Why does that matter?
    • Who cares about this?
    • How much do they care? Can they quantify the benefit to them?
    This isn’t a company strategy exercise (perhaps you need one of those too?)
  14. STEP 2: FIND THE POTENTIAL BENEFITS

    Impact Mapping can help make the benefits of technical change clear (https://www.impactmapping.org):
    • Why? Improve reputation with clients
    • Who? Dev Team → How? Less than one “Late Reports” incident / month → What? “Resilient Delivery” feature & add report data cache
    • Who? Sales → How? Take better sales ideas to clients → What? Analytics for risk profiles & interest areas
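
An impact map is a simple tree: a goal (Why?), the actors who can influence it (Who?), the impacts we want on their behaviour (How?) and the deliverables that support those impacts (What?). The minimal Python sketch below encodes the reporting example. The class names are hypothetical, and the pairing of each impact with its deliverable is read from the slide layout rather than stated explicitly in the transcript.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Deliverable:  # "What?"
    name: str


@dataclass
class Impact:  # "How?"
    description: str
    deliverables: list[Deliverable] = field(default_factory=list)


@dataclass
class Actor:  # "Who?"
    name: str
    impacts: list[Impact] = field(default_factory=list)


@dataclass
class ImpactMap:  # "Why?"
    goal: str
    actors: list[Actor] = field(default_factory=list)

    def paths(self):
        """Yield every goal -> actor -> impact -> deliverable chain."""
        for actor in self.actors:
            for impact in actor.impacts:
                for d in impact.deliverables:
                    yield (self.goal, actor.name, impact.description, d.name)


reporting = ImpactMap(
    goal="Improve reputation with clients",
    actors=[
        Actor("Dev Team", [Impact(
            "Less than one 'Late Reports' incident / month",
            [Deliverable("'Resilient Delivery' feature & add report data cache")])]),
        Actor("Sales", [Impact(
            "Take better sales ideas to clients",
            [Deliverable("Analytics for risk profiles & interest areas")])]),
    ],
)

for chain in reporting.paths():
    print(" -> ".join(chain))
```

Each printed chain is one line of justification, from a proposed piece of technical work back to the business goal it serves.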
  15. STEP 3: FIND THE REMEDIATION OPTIONS

    Goal: identify realistic options for realising those benefits.
    • Tactical Improvement: a backlog of independent incremental changes
      - localised refactoring of modules and mechanisms
      - incremental test, CI and UX improvement
    • Strategic Improvement: a backlog of changes delivering incremental benefit
      - system-wide refactoring
      - technology changes
    • Incremental Rebuild: a future state and an incremental roadmap
      - module-by-module rebuild plan
      - migration strategy for each step
    • Parallel Rebuild: an incrementally delivered new implementation
      - green-field architecture choices
      - roadmap for incremental build and migration
  16. STEP 3: FIND THE REMEDIATION OPTIONS

    Characterise each option clearly (benefits, size, difficulty, risk). Scale runs from S to XL; difficulty and risk are scored from 1 to 10.
    • Refactor request handling into a new library
      Benefits: new APIs 25% quicker; better testing & reliability (30% fewer defects in prod); better monitoring; enables message-based interfaces
      Scale: L, Difficulty: 4, Risk: 3
    • New dev, test, UAT pipelines
      Benefits: reduced release effort (~40%); increased release reliability (20% fewer defects in prod); earlier defect identification (70/20/10%); consistency
      Scale: M, Difficulty: 6, Risk: 1
    • Replace Angular with React
      Benefits: faster change in the UI; easier access to skills; compatibility with corporate tooling
      Scale: XL, Difficulty: 7, Risk: 7
    • Replace synchronous calls to CRM with message-based calls
      Benefits: resilience when the CRM is slow or unavailable; easier adaptation (new CRM in Q4); monitoring
      Scale: L, Difficulty: 5, Risk: 4
    • …
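
Once the option list grows, capturing it as data makes it easy to sort and filter. The Python sketch below is hypothetical: the numeric mapping for the S-XL sizes and the “effort” indicator (size multiplied by difficulty) are assumptions for illustration, not part of the talk. The benefits column is deliberately left out because it is qualitative and needs human judgement.

```python
from dataclasses import dataclass

# Hypothetical numeric mapping for the T-shirt sizes (an assumption).
SCALE_POINTS = {"S": 1, "M": 2, "L": 3, "XL": 4}


@dataclass(frozen=True)
class Option:
    name: str
    scale: str       # S-XL
    difficulty: int  # 1-10
    risk: int        # 1-10

    @property
    def effort(self) -> int:
        # Crude, hypothetical effort indicator: size multiplied by difficulty.
        return SCALE_POINTS[self.scale] * self.difficulty


OPTIONS = [
    Option("Refactor request handling into a new library", "L", 4, 3),
    Option("New dev, test, UAT pipelines", "M", 6, 1),
    Option("Replace Angular with React", "XL", 7, 7),
    Option("Replace synchronous calls to CRM with message-based", "L", 5, 4),
]


def shortlist(options, max_risk):
    """Options at or below a risk threshold, lowest effort first."""
    return sorted((o for o in options if o.risk <= max_risk),
                  key=lambda o: o.effort)


for o in shortlist(OPTIONS, max_risk=5):
    print(f"effort={o.effort:2d} risk={o.risk}  {o.name}")
```

With a risk threshold of 5 the React migration drops out, and the remaining three options are listed in order of the crude effort indicator (ties keep their original order because Python's sort is stable).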
  17. STEP 4: RISKS, CONSTRAINTS AND TRADEOFFS

    Goal: identify what could go wrong and understand what is and isn’t possible.
    Risks
    • Where are the known unknowns?
    • Are there likely areas of unknown unknowns?
    • Are the integration points understood?
    • Are the functions of each part of the system really understood?
    • Can you change all of the system reliably? If not, can you fix this?
    • …
    Constraints
    • How much money is available?
    • How much risk & change can the organisation bear?
    • What replacement technologies can the organisation deal with?
    • Is the team capable of the kind of change envisioned for the system? (Skills, attitude)
    • …
  18. STEP 4: RISKS, CONSTRAINTS AND TRADEOFFS

    Many techniques exist for analysing tradeoffs, spanning quantitative, qualitative and cognitive approaches:
    • Multi-Criteria Decision Analysis
    • Decision Trees
    • Weighted Pro & Con Lists
    • Trade-Off Matrices
    • Scenario Analysis
    • 10-10-10 Rule
    • Opportunity Cost Analysis
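
Of the techniques listed, a decision tree is among the most quantitative. At its simplest, each option branches into possible outcomes, and the branch is valued by its probability-weighted (expected) cost. The minimal Python sketch below illustrates that calculation; every probability and cost in it is invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Outcome:
    probability: float
    cost: float  # total cost if this outcome occurs (hypothetical units)


def expected_cost(outcomes: list[Outcome]) -> float:
    """Probability-weighted cost of one decision branch."""
    total_p = sum(o.probability for o in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities must sum to 1, got {total_p}")
    return sum(o.probability * o.cost for o in outcomes)


# All figures are invented for illustration (e.g. millions of euros):
# each option either goes to plan or overruns.
branches = {
    "Strategic improvement": [Outcome(0.7, 1.0), Outcome(0.3, 1.8)],
    "Parallel rebuild":      [Outcome(0.5, 2.0), Outcome(0.5, 4.0)],
}

for name, outcomes in branches.items():
    print(f"{name}: expected cost {expected_cost(outcomes):.2f}")
```

Here strategic improvement has an expected cost of 1.24 and the parallel rebuild 3.00. Expected value says nothing about how much risk the organisation can bear, so it complements rather than replaces the constraints questions in Step 4.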
  19. STEP 4: TRADE-OFF MATRICES

    Goal: Address Client Reporting Problems

    Step 1: Define decision factors
    • Migration Risk: what is the risk of migrating to each option? Measured as the likelihood of an outage or a needed rollback (H, M, L)
    • Bus. Change Risk: what level of risk is involved in the business process change needed for an option? Measured as the likelihood of client-visible business interruption (H, M, L)
    • UX: what level of UX improvement or reduction is likely for an option? Measured as improvement or reduction in UX (from -10 to +10)
    • Ease of Feature Change: how much faster or slower is feature delivery likely to be for each option? Measured as faster or slower in percentage terms

    Step 2: Capture the factors for each option in a trade-off matrix

    Option                                        Migration Risk  Bus. Change Risk  UX   Feature Change
    Add Reporting DB                              L               L                 +4   -10%
    Upgrade DB and Schema Extension               M               L                 +2   0%
    Build New Reporting Engine on existing DB     H               M                 +8   +50%
    Integrate with Corporate Reporting Services   H               H                 -4   -75%
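
The trade-off matrix is a side-by-side comparison. If stakeholders also want a single ranking, a multi-criteria weighted sum is one way to produce it. The Python sketch below assumes a hypothetical conversion of each factor to a 0-1 score (L=1, M=0.5, H=0; UX mapped linearly from -10..+10, and feature change from an assumed -100%..+100% range) and hypothetical weights; neither the conversion nor the weights come from the talk.

```python
# Hypothetical conversion of H/M/L risk ratings to a 0..1 score (1 = best).
RISK_SCORE = {"L": 1.0, "M": 0.5, "H": 0.0}


def normalise(value: float, lo: float, hi: float) -> float:
    """Linearly map value from [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


# Rows from the trade-off matrix:
# (migration risk, business change risk, UX, feature change %).
MATRIX = {
    "Add Reporting DB": ("L", "L", +4, -10),
    "Upgrade DB and Schema Extension": ("M", "L", +2, 0),
    "Build New Reporting Engine on existing DB": ("H", "M", +8, +50),
    "Integrate with Corporate Reporting Services": ("H", "H", -4, -75),
}

# Hypothetical weights, to be agreed with stakeholders; they sum to 1.
WEIGHTS = {"migration": 0.3, "business": 0.3, "ux": 0.2, "feature": 0.2}


def score(migration, business, ux, feature_pct):
    factors = {
        "migration": RISK_SCORE[migration],
        "business": RISK_SCORE[business],
        "ux": normalise(ux, -10, 10),
        # Assumed range of -100%..+100% for feature-change speed.
        "feature": normalise(feature_pct, -100, 100),
    }
    return sum(WEIGHTS[k] * v for k, v in factors.items())


ranked = sorted(MATRIX.items(), key=lambda kv: score(*kv[1]), reverse=True)
for name, row in ranked:
    print(f"{score(*row):.3f}  {name}")
```

With these weights the ranking is Add Reporting DB, Upgrade DB and Schema Extension, Build New Reporting Engine, then Integrate with Corporate Reporting Services. Shifting weight towards UX and feature change can reorder the options, which is a good reason to show stakeholders the matrix itself alongside any single score.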
  20. STEP 5: STAKEHOLDER-LED DECISION

    Goal: use our data and insights to help stakeholders make a good decision.
    Inputs: context; concerns; options & tradeoffs; risks and constraints → ?
  21. STEP 5: STAKEHOLDER-LED DECISION

    • Have a clear strategy: democratic? consensus? consultative? The choice is usually driven by culture & people.
    • Stay focused on the facts: context, concerns, options, risks and constraints.
    • Explain tradeoffs: use architectural thinking and stakeholder language, and capture them in a standard form for clarity.
    • Avoid distortions: preconceptions, assumptions, bias, loudest voices, …
  22. AN ARCHITECTURE-DRIVEN APPROACH

    • Step 1: Where Are You?
    • Step 2: Potential Benefits
    • Step 3: Remediation Options
    • Step 4: Risks & Constraints
    • Step 5: Stakeholder-Led Decision
  23. A SYSTEM WITH SOME PROBLEMS

    (Diagram: System DB <<oracle_db>>, Calculation Service <<cpp_svc>>, Trading Support Service <<java_svc>>, Trading Support Workbench <<angular_spa>>, Message Bus <<pub/sub>>, External Systems)
    • Capital Markets trading support platform
    • Highly differentiating (largely unique)
    • Delivery velocity has slowed dramatically
    • Formerly reliable, now failing regularly
    • Complex: 600k LOC, multi-technology, database-centric
    • Highly connected to other parts of the bank
    The desire is to rewrite … but should they?
  24. STEP 1: WHERE ARE WE? … CONTEXT

    (Diagram: System DB, Calculation Service, Trading Support Service, Trading Support Workbench, Message Bus)
    • A highly interconnected part of the organisation’s technology
    • Used by a specialised and influential user community
  25. STEP 1: WHERE ARE WE? … CONCERNS

    (Diagram: System DB, Calculation Service, Trading Support Service, Trading Support Workbench, Message Bus)
    • Users love the specialised UI … but want reliability and new features
    • Many database-related incidents
    • Defect and reliability incidents
    • Surveys indicate the main concerns are delivery speed, performance and reliability
  26. STEP 1: WHERE ARE WE? … QUALITY ATTRIBUTES & CODE

    • Surveys highlight performance, resilience and reliability
    • Operational analysis indicates regular failures due to an inability to cope with peak demand, but no other major quality attribute problems
    • Code analysis indicates complexity, good modularity but large modules, and some unexpected dependencies
    • Code analysis indicates security vulnerabilities
    • Complex database structure & queries
    • Analysis and surveys indicate limited test automation, fragile tests, limited deployment automation and a relatively poor developer experience
  27. STEP 2: THE BENEFITS

    • “The Business” are clear: every outage costs them money and every new (financial) product or feature makes them money
    • TechOps teams say better reliability, fewer outages and easier management will reduce cost
    • The development team says reliable testing, easier code changes and easier deployment will allow faster feature development (increasing efficiency)
    It appears that there are clear financial benefits … what are our options?
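
Stakeholder statements like these can be turned into a rough annual figure: avoided outages multiplied by the cost of an outage, plus the value of the extra features delivered, plus operational savings. The minimal Python sketch below shows that arithmetic; the function name is hypothetical, and every input value is a placeholder that the business, TechOps and development teams would need to supply.

```python
def annual_benefit(outages_avoided_per_year: float,
                   cost_per_outage: float,
                   extra_features_per_year: float,
                   value_per_feature: float,
                   ops_saving_per_year: float) -> float:
    """Rough yearly benefit: avoided outage losses + extra feature value + ops savings."""
    return (outages_avoided_per_year * cost_per_outage
            + extra_features_per_year * value_per_feature
            + ops_saving_per_year)


# Every input below is a hypothetical placeholder, in euros.
benefit = annual_benefit(
    outages_avoided_per_year=6,
    cost_per_outage=50_000,
    extra_features_per_year=4,
    value_per_feature=100_000,
    ops_saving_per_year=80_000,
)
print(f"Estimated annual benefit: EUR {benefit:,.0f}")  # EUR 780,000
```

Even a crude figure like this gives decision makers something concrete to weigh against the cost and risk of each remediation option.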
  28. STEP 3: REMEDIATION OPTIONS

    • Strategic Improvements: partitioning and splitting databases, partitioning the Java monolith into three, introducing a mobile app for experimental new features, automation everywhere.
      (Diagram: Trading DB and Risk DB <<oracle_db>>; Calculation Service <<cpp_svc>>; Trading Support Service, UI Services and Risk Service <<java_svc>>; Trading Support Workbench <<angular_spa>>; Trading Support Assistant <<mobile_app>>; Message Bus)
    • Parallel Rebuild: a green-field, service-based rewrite using more modern technology and decomposition into services, with a new UI and mobile app.
      (Diagram: API Gateway; Trading DB, Risk DB, Structure DB and Account DB <<oracle_db>>; CalcNG <<cpp_svc>>; Trading Service, Risk Service, Structuring Service, Account Service, UI Services and Mobile Services <<java_svc>>; Trading Support Workbench <<react_app>>; Trading Support Assistant <<mobile_app>>)
  29. STEP 4: RISKS AND CONSTRAINTS

    Risks
    • Complexity: it is difficult to rewrite 600k LOC without logic errors
    • Complexity: making major changes in the existing code is error-prone
    • Migration: migrating a complex existing business to a new system is risky
    • Team: existing people have narrow skills, and fixing this is a long-term effort
    • Operation: major change to the existing system (e.g. the DB) could cause more outages
    • Operation: a parallel run will need comprehensive and complex reconciliation
    • Regulatory: the system is known to the regulators, so a replacement will need approval
    • …
  30. STEP 4: RISKS AND CONSTRAINTS

    Constraints
    • Team: the team has little automation experience, and skills are split across Java / JS / C++
    • Team: there is deep domain knowledge in the team (inc. contractors), so change would need to be gradual
    • Integration: bank clients and other systems do not want integration changes
    • User Experience: existing users know and like the user interface
    • Technology: the current technology is widely understood and supported in the organisation
    • …
  31. STEP 5: STAKEHOLDER-LED DECISION

    Inputs: current state; estimated benefits; options & tradeoffs; risks and constraints.
    Stakeholders: Business, Ops, Development.
    Context-specific tradeoffs:
    • Refactor risk vs migration risk of a new build
    • Refactor risk vs defects in new logic
    • Disruption vs benefits delivered
    • Business change risk vs opportunities
    • Team risks with technology change
    • …
  32. STEP 5: STAKEHOLDER-LED DECISION

    The decision was to run a strategic improvement programme. Over 12-18 months the system was systematically improved, resolving many of the perceived problems and salvaging the reputation of both the system and the team. (The risks of migration, external integration changes, and rewriting a lot of complex logic reliably in a reasonable time were the main factors in the decision.)
  33. CONCLUSIONS: KEY PRINCIPLES

    • Aim for an architecture-driven but stakeholder-led decision
    • Data collection and analysis are key investments for good decisions
    • Stakeholder communication is key
    • Consider all the options and understand their tradeoffs
    • Find the benefits early
  34. AN ARCHITECTURE-DRIVEN DECISION

    • Step 1: Where Are You?
    • Step 2: Potential Benefits
    • Step 3: Remediation Options
    • Step 4: Risks, Constraints and Tradeoffs
    • Step 5: Stakeholder-Led Decision
  35. CONCLUSIONS: ARCHITECTURE SKILLS ARE CRUCIAL

    • Data & Analysis
    • Understanding Tradeoffs
    • Stakeholder Communication
    • Quality Attributes
    • Identifying Options
    Together these support an evidence-based & stakeholder-led decision.