Slide 1

exactpro.com 1 BUILD SOFTWARE TO TEST SOFTWARE

Good AI Testing Strategy / Bad AI Testing Strategy: The Difference and Why it Matters
Iosif Itkin, co-CEO and co-founder, Exactpro

Slide 2

Meet the Speaker

Iosif is co-founder and co-CEO of Exactpro – an independent provider of AI-enabled software testing, development and consultancy services for financial organisations. Founded in 2009, Exactpro has a client base of major exchanges, post-trade platform operators, banks and technology vendors across 25 countries.

Iosif’s expertise at the intersection of high-availability systems and capital markets has enabled him to successfully implement testing strategies and facilitate technology transformations within exchanges, investment banks and clearing and settlement organisations in London, New York, Milan, Singapore, Sydney and other major financial centres.

Iosif is a co-author of ‘Introduction to AI Testing: Guide to ISTQB® CT-AI Certification’, released in September 2025 by BCS Publishing. He regularly uses coding assistants in his day-to-day work for a variety of testing and development purposes.

Slide 3

About Exactpro

Exactpro is an independent provider of AI-enabled software testing services for financial organisations. Our clients are exchanges, post-trade platform operators and banks across 25 countries; our client network spans more than half of the top 20 global exchange groups. We help our clients decrease time to market, maintain regulatory compliance, and improve scalability, latency and operational resiliency.

On top of AI-driven testing and automation capabilities, we offer industry-proven expertise in test strategy development, software development and prototyping, facilitating regulatory compliance and reporting, AI Literacy building and support of enterprise-level AI integration.

Headquartered in the UK, Exactpro operates delivery centres in Georgia, Sri Lanka, Armenia and the UK, representative offices in the US, Canada and Italy, and a global network of consultants.

Slide 4


Slide 5

Agenda

● Good Strategy/Bad Strategy: The Difference and Why it Matters
● How the last 15 years of AI development affected this timeless classic
● What a Strategy is and what it is not
● Hallmarks of a Bad Strategy
● Why so much Bad Strategy?
● What is the kernel of a Good Strategy?
● Why is Good Strategy so rare?

Slide 6

Good Strategy/Bad Strategy: The Difference and Why it Matters, by Richard Rumelt
First released: July 2011

Richard Rumelt is an American professor emeritus at the University of California, Los Angeles (UCLA) Anderson School of Management. He is best known for his influential work on what distinguishes effective strategy from ineffective ambition. He is the author of ‘Good Strategy/Bad Strategy’ and ‘The Crux’, where he argues that true strategy is rooted in clear diagnosis, coherent guiding policy and focused action rather than vision statements or lofty goals. Rumelt’s thinking has shaped modern strategic practice by exposing the dangers of slogan-driven leadership and emphasising disciplined problem-solving in complex, uncertain environments.

A major part of this talk is inspired by my attempts to apply and explore his ideas in my software testing practice.

Slide 7

AI Advancements in the Last 15 Years

● 2011: Apple releases Siri, pioneering virtual assistants. IBM Watson wins Jeopardy!
● 2014: Ian Goodfellow introduces Generative Adversarial Networks (GANs), foundational for synthetic media generation.
● 2015: Over 3,000 AI experts sign an open letter calling for a ban on autonomous weapons.
● 2016: DeepMind’s AlphaGo defeats Go champion Lee Sedol, marking a breakthrough in reinforcement learning. Hanson Robotics unveils Sophia, a humanoid robot.
● 2017: Facebook (Meta) researchers observe AI agents developing their own internal language.
● 2019: AlphaStar reaches Grandmaster level in StarCraft II.
● 2020: OpenAI demonstrates large-scale, general-purpose language models.
● 2021: OpenAI releases DALL·E. Anthropic is founded, with a focus on AI safety, alignment, and constitutional approaches to training models.
● 2022: The generative AI surge accelerates with Midjourney and Stable Diffusion, followed by the November launch of ChatGPT, which reaches 100 million users in two months.
● 2023: Anthropic releases Claude, a conversational AI model designed around Constitutional AI principles.
● 2024: Regulatory attention intensifies globally.
● 2025–2026: 4% of GitHub public commits are being authored by Claude Code right now, according to SemiAnalysis data. ‘At the current trajectory,’ the company believes that ‘Claude Code will be 20%+ of all daily commits by the end of 2026.’

Slide 8

Setting the Context

Too often, organisations mistake ambition for direction. It has always happened with traditional systems, and it keeps happening now. Chasing “AI transformation” without understanding the systems, data or real problems results in testing that confirms vibes instead of discovering truth. We will contrast bad strategies that hide behind slogans with good AI testing strategies that diagnose real challenges, align with enterprise goals and balance speed with evidence-based verification in probabilistic systems. Using practical examples, we will face the final challenge: is your AI testing strategy built to impress – or to endure?

Slide 9

Strategy Definition

“A strategy is a mixture of policy and action designed to surmount a high-stakes challenge.”
– Rumelt, R.

Slide 10

What strategy is NOT

● Strategy is not a vision statement. ‘Become an AI-driven organisation’ or ‘Lead in responsible AI’ may inspire, but they do not define how real problems will be solved.
● Strategy is not a list of initiatives. Launching AI pilots, building models or deploying tools does not constitute a strategy if the underlying risks and challenges are not clearly diagnosed.
● Strategy is not a collection of best practices. Adopting fairness metrics, governance frameworks or model monitoring tools without understanding the specific problem they address often results in fragmented controls.
● Strategy is not optimism disguised as planning. Assuming that more data, better models or more compute will automatically solve reliability and safety issues ignores the operational realities of complex systems.

Slide 11

A Plan and a Strategy

A PLAN is about INPUTS. A STRATEGY is about OUTPUTS.

Slide 12

What are Software Testing Outputs?

Software Testing is exploring software with the intent of finding bugs. The purpose of our work is to provide information – detect and describe defects in the software developed for/by our clients and operated by them. The value of this work can be measured across three main dimensions:

● Information Quality – testing better (better at detecting and describing defects)
● Speed – testing faster
● Cost – testing cheaper

Improvements across any of these dimensions bring value to the testing stakeholders. Good strategy makes it easy. Bad strategy makes it hard.

Slide 13

Bad Strategy

● Fluff – using words of no consequence
● Failure to face the challenge
● Mistaking goals for strategy
● Bad strategic objectives

Slide 14

Fluff – Using Words of No Consequence

● A hallmark of mediocrity and bad strategy is unnecessary complexity – a flurry of fluff masking an absence of substance. A hallmark of true insight is making a complex subject understandable.
● ‘The beginning of wisdom is the definition of terms’ (attributed to Socrates). Without definition there is no clarity.
● A strategy that defines the general direction but doesn’t drill down to the definitions of the notions used in it is a bad strategy.
● The volumes of fluff have increased in the AI era. The term ‘AI’ itself is a fluff word. So, what is the ‘AI’ in the context of your test strategy? How do you define ‘responsibility’? And what is your understanding of ‘responsible AI’ if such a term is featured in your test strategy?

(Word cloud: Artificial Intelligence, Test Automation, Quality Assurance, Hybrid Agile, Large Language Models, Grok, ChatGPT, Gemini, Machine Learning, Full Coverage)

Slide 15

Failure to Face the Challenge

Does the strategy address the relevant challenges? There are many causes and manifestations of failing to address a central challenge. One example is the Law of Triviality (also known as the Bike-Shed Effect). Coined by British naval historian and author C. Northcote Parkinson in 1957, the law describes the tendency of organisations to give disproportionate weight to trivial, easily understood issues while neglecting, or quickly passing over, complex and important ones.

The reason: a nuclear reactor is too complex for everyone to understand, causing participants to defer to experts, whereas everyone can understand and have an opinion on a bike shed.

The rule: Parkinson stated that ‘the time spent on any item of the agenda will be in inverse proportion to the sum [of money] involved’.

Slide 16

Failure to Face the Challenge

Slide 17

Goals and Objectives

Goals are broad, high-level targets that reflect the overarching vision or desired outcome. They tend to be more general and sometimes abstract, serving as a guiding purpose. Goals describe what you want to accomplish (the big picture).

Objectives, on the other hand, are the concrete, measurable steps you take to achieve those larger goals. They are typically more specific, include clear benchmarks or criteria for success, and often have set time frames. Objectives outline how you will accomplish it (the measurable steps).

(Diagram: delivery speed – deployment frequency, lead time for change; delivery quality – mean time to recovery, change failure rate)
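The delivery metrics named on this slide can be turned into measurable objectives. Below is a minimal sketch of computing two of them; the record layout, field names and data are illustrative assumptions, not something from the deck.

```python
from datetime import datetime, timedelta

# Hypothetical deployment records: (finished_at, caused_failure, lead_time)
deployments = [
    (datetime(2025, 1, 6),  False, timedelta(hours=20)),
    (datetime(2025, 1, 8),  True,  timedelta(hours=30)),
    (datetime(2025, 1, 13), False, timedelta(hours=16)),
    (datetime(2025, 1, 20), False, timedelta(hours=12)),
]

def change_failure_rate(deps):
    """Share of deployments that caused a failure in production."""
    return sum(1 for _, failed, _ in deps if failed) / len(deps)

def mean_lead_time(deps):
    """Average time from change committed to change running in production."""
    return sum((lt for _, _, lt in deps), timedelta()) / len(deps)

print(change_failure_rate(deployments))  # 0.25
print(mean_lead_time(deployments))       # 19:30:00
```

An objective phrased this way (‘change failure rate below 10% by Q3’) is checkable, unlike a goal such as ‘high delivery quality’.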

Slide 18

Goals and Objectives

● 100% Automated
● Easy to Maintain
● Very Fast
● Consistent and Repeatable
● Applicable to All Systems
● Vendor Independent
● No Logging of Invalid Bugs
● No Flaky Results
● Full Transparency
● Low Cost

All of these can be achieved without doing any software testing.

Slide 19

Why so much Bad Strategy?

● The Unwillingness or Inability to Choose
● Template-Style Strategy
● New Thought

Slide 20

The Unwillingness or Inability to Choose

“The essence of strategy is choosing what not to do.”
– Michael E. Porter, American economist and founder of strategic management

Slide 21

Do we still need to choose in the age of AI?

AI-coding assistants enable software testers to convert any of their consistency, regression or exploratory testing ideas into automatic checks within minutes. Why bother making a choice?

(Diagram: pre-AI vs AI-assisted test ideas implementation)
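To make the ‘automatic checks within minutes’ point concrete, here is the kind of baseline regression check a tester might ask a coding assistant to generate. Everything here – the file name, the fields, the data – is hypothetical.

```python
import json
from pathlib import Path

def regression_check(current: dict, baseline_path: str) -> list[str]:
    """Compare the current output of the system under test against a
    stored baseline and report every field that changed."""
    baseline = json.loads(Path(baseline_path).read_text())
    diffs = []
    for key in sorted(set(baseline) | set(current)):
        if baseline.get(key) != current.get(key):
            diffs.append(f"{key}: {baseline.get(key)!r} -> {current.get(key)!r}")
    return diffs
```

A check like this is cheap to produce; the strategic question the slide raises is which of the infinitely many possible checks are worth producing and keeping.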

Slide 22

Template-Style Strategy

● A template-style strategy features terms like mission, vision, values and goals, which do not reflect the strategy.
● Bad strategy giveaways:
○ The strategy claims to prioritise generic goals like ‘economic prosperity’ or ‘the use of AI in an efficient and safe way’.
○ The strategy looks applicable to any business/system type/use case and reads like a generic template.

Slide 23

New Thought – ‘Positive thinking will get us there’

Strategies that are based on positive thinking are bad strategies… but they could still count on people’s earnest desire to make it work.

(Word cloud: ‘AI can solve all our problems...’ – friend, trusted partner, teammate, solution)

Slide 24

The opposite of positive thinking is the scout mindset: “the motivation to see things as they are, not as you wish they were.”
– Julia Galef, American writer, speaker and co-founder of the Center for Applied Rationality

Slide 25

Good Strategy has a Kernel

Slide 26

Good Strategy

Good strategy is coherent action backed up by an argument, an effective mixture of thought and action with a basic underlying structure – the kernel:

● A diagnosis that defines or explains the nature of the challenge. A good diagnosis simplifies the often overwhelming complexity of reality by identifying certain aspects of the situation as critical.
● A guiding policy for dealing with the challenge. This is an overall approach chosen to cope with or overcome the obstacles identified in the diagnosis.
● A set of coherent actions that are designed to carry out the guiding policy. These are steps that are coordinated with one another to work together in accomplishing the guiding policy.

Slide 27

Diagnosis that explains the nature of the challenge

There is no template, no ‘best practice’, nothing that you can easily borrow – it is about you.

● It can be related to platforms, processes, people and information security
● In many large organisations the challenge is, more often, internal
● Given the rapid shifts in the AI technology landscape, for most businesses, the challenge is often dealing with change and competition
● It is frequently rooted in the ability to source, store and process data

Slide 28

Diagnosis that explains the nature of the challenge

… but is there any precise practical guidance that is relevant for everyone?

Slide 29

Diagnosis that explains the nature of the challenge

… but is there any precise practical guidance that is relevant for everyone?

Guiding Policy

● Testers: get out of the Quality Assurance business
● Software testing is just about finding and describing bugs
● Focus on obstacles that make finding bugs difficult, slow or expensive

Slide 30

Guiding policy – you are not QA

● Treat others the way you want to be treated
● Have agency and personal responsibility
● Hope for serenity, courage and wisdom
● There are no solutions. There are only trade-offs

Slide 31

Software testing is just about finding and describing bugs

● Testing shows the presence, not the absence of defects
● Exhaustive testing is impossible

The Crux of the strategy is something that is both important and solvable. Software testing is an Information Service: the only type of information software testing can provide is information about the presence of defects, their description and the description of the process for their detection.

Slide 32

Software testing is just about finding and describing bugs

“Since quality is value to some person who matters, a bug is anything about the product that threatens its value (in the mind of someone whose opinion matters); or, you might say, it's anything about the product that makes it less valuable than it could reasonably be.” – RST definition

A Conceptual Framework of Data Quality: https://courses.washington.edu/geog482/resource/14_Beyond_Accuracy.pdf

In LinkedIn posts you will frequently see something like: ‘AI can generate thousands of tests in seconds, but…’

Slide 33

Nothing someone says before the word ‘but’ really counts

AI can generate thousands of tests in seconds, but…

● Test cases are not testing
● The crux of software testing is finding bugs
● If you test an AI-based/AI-enabled system, do concentrate on how to find bugs in it
● If you use AI to support your testing, think about how to use it to find important bugs faster and at a lower cost. Nothing else matters.

Slide 34

Guiding policy

Guiding policy is an overall approach chosen to cope with or overcome the obstacles identified in the diagnosis. Identify obstacles that:

● Block or obstruct detecting and describing defects in the system under test
● Make software testing slower
● Increase the cost and effort of software testing

Slide 35

AI-assisted teams’ value contribution tiers

The developers’ responsibility and contribution value have experienced a shift with the appearance of AI-assisted coding. The ability of individuals and development teams to set the right guardrails and generate coherent products is the new differentiating factor in enterprise-grade software delivery. The five tiers of contribution:

● Aligning concepts and language across features, APIs, data models, documentation and UI; identifying repeated solutions and creating shared patterns; reducing accidental complexity; removing unnecessary variation, duplication and special cases that increase the cognitive load without adding real value; structuring collaboration, aligning contributors and coordinating work across teams.
● Designing systematic controls that prevent AI (and humans) from scaling mistakes. This guardrail capability is positioned as a core competitive advantage for Exactpro.
● Analysing an existing system, identifying missing functionality and defining requirements at a business analysis or product level.
● Designing meaningful checks, exploring edge cases and validating system behaviour in depth.
● Executing predefined plans or task lists using AI tools. A junior person can iterate by addressing errors step by step with AI support until a functionality works.

Slide 36

Coherent actions

‘Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away.’ – Antoine de Saint-Exupéry

Coherence is the ability to remove what should never have been added.

● Focus on system-level clarity and alignment rather than individual features or controls.
● Design structures and behaviours that make sense for the system as a whole.
● Make the system easier to understand, reason about, change and operate – for both humans and machines.
● Reduce accidental complexity, remove unnecessary variation and duplication.

Question: ‘Does my work make this system easier to understand – or harder?’

Slide 37

Retrieval-augmented generation-based (RAG) system

● Multiple Sources
● Nondeterministic
● Adaptable
● Autonomous
● Learning
● Evolving
● Both ML and AI

Slide 38

Guiding policy – have we seen anything like that before?

(Diagram: Smart Order Router architecture – gateways; complex events processor, SOR algorithms and control mechanisms (safeguards); trading and admin front ends; operational data; analytics; metadata and historical data; integrations; clients, exchanges & ATSs, market participants; ML and AI components)

Slide 39

Exactpro’s SOR Testing approach applied to RAG systems

Slide 40

Exactpro’s test approach for RAG systems testing

Test basis:
✔ Requirements and specifications
✔ Various input action checklists
✔ Protocol specifications, machine-readable dictionaries
✔ All relevant reference data actions
✔ Information obtained from testing and calculated weights

Workflow: from the test basis and a weights dataset, scripts iterate through available actions and organise them in test cases; input combinations are generated (random, cartesian, all pairs) with pre-filtering; inputs and expectations are fed to the system under test; traffic and log files are captured (th2-shark, Cradle); the output dataset and interpretation dataset are then analysed, supported by modelling and simulation (Digital Twins).

Checks:
✔ Property checks
✔ Rules-based checks
✔ Baseline/benchmark checks
✔ Reconciliation of: two different output streams; input and output streams; specific properties across the dataset; received data against expected outcomes
✔ Aggregating the data in the dataset, exploring the anomalies

(Diagram of the system under test: gateways; processes that gather, understand and recommend; RAG/GenAI platform and machine learning services; user and admin front ends; integrations; analytics; back office; operational data. Data sources: documents and PDFs, spreadsheets, websites/UIs, website log files, log files, transactional data, FX rates, integration APIs, unstructured data, emails, static and dynamic reference data)
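Two elements of this approach – combinatorial input generation with pre-filtering, and reconciliation of two output streams – can be sketched in a few lines. The dimension names, values and record fields below are illustrative assumptions, not details of Exactpro’s tooling.

```python
from itertools import product

# Illustrative input dimensions for a RAG system under test (assumed values).
dims = {
    "source":   ["pdf", "spreadsheet", "email"],
    "language": ["en", "de"],
    "query":    ["lookup", "summarise"],
}

def cartesian_inputs(dims, pre_filter=lambda case: True):
    """Iterate through the available action combinations, keep only those
    that pass the pre-filter, and organise them as test inputs."""
    for combo in product(*dims.values()):
        case = dict(zip(dims.keys(), combo))
        if pre_filter(case):
            yield case

def reconcile(stream_a, stream_b, key):
    """Reconcile two output streams record-by-record on a shared key and
    return the keys where the records disagree."""
    a = {r[key]: r for r in stream_a}
    b = {r[key]: r for r in stream_b}
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

# Pre-filtering: suppose German-language email sources are out of scope.
inputs = list(cartesian_inputs(
    dims, pre_filter=lambda c: not (c["source"] == "email" and c["language"] == "de")))
print(len(inputs))  # 10 of the 12 raw combinations survive pre-filtering
```

Reconciliation-style checks like this are attractive for nondeterministic systems because they need no oracle for the ‘right’ answer, only an expectation of internal consistency between streams.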

Slide 41

Why do most AI projects fail?

MIT NANDA’s GenAI Divide: State of AI in Business 2025 study states: ‘Despite $30–40 billion in enterprise investment into GenAI, this report uncovers a surprising result in that 95% of organisations are getting zero return… Just 5% of integrated AI pilots are extracting millions in value, while the vast majority remain stuck with no measurable P&L impact.’

‘Four patterns emerged that define the GenAI Divide:
● Limited disruption: Only 2 of 8 major sectors show meaningful structural change
● Enterprise paradox: Big firms lead in pilot volume but lag in scale-up
● Investment bias: Budgets favor visible, top-line functions over high-ROI back office
● Implementation advantage: External partnerships see twice the success rate of internal builds’

‘A small group of vendors and buyers are achieving faster progress by addressing these limitations directly. Buyers who succeed demand process-specific customization and evaluate tools based on business outcomes rather than software benchmarks.’

Slide 42

Why is Good Strategy so rare?

● Bad Strategy is Easy
● Good Strategy is Hard

Thank You