
Chapter 1 – Introduction to Generative AI for Software Testing (ISTQBⓇ CT-GenAI v1.1). Slides

Format: Reading Materials (self-study or guided reading)
Estimated Duration: 100 minutes

Target Audience: Software Testers, Test Automation Engineers, Test Analysts, Test Managers, Software Developers and professionals who need a solid understanding of Generative AI (GenAI) in testing – project managers, quality managers, software development managers, business analysts, IT directors and consultants, professionals preparing for ISTQBⓇ CT-GenAI certification

During this chapter, you will:
• Understand what GenAI and Large Language Models (LLMs) are, how they work and when to use them
• See how LLMs support software testing tasks such as requirements analysis, test case creation, and defect detection
• Learn how multimodal LLMs enhance testing through image and text understanding
• Explore how LLMs assist in test data generation, automation, and result analysis

Join Software Testing Hub via LinkedIn: https://www.linkedin.com/groups/16889021/
Join Software Testing Hub via Facebook: https://www.facebook.com/groups/746590458484807


Exactpro

May 27, 2026


Transcript

  1. BUILD SOFTWARE TO TEST SOFTWARE (exactpro.com)

    TRAINING COURSE ISTQBⓇ CT-GenAI, COURSE V1.1. Chapter 1. Introduction to Generative AI for Software Testing. Iuliia Emelianova, Dmitrii Degtiarenko
  2. CONTENTS

    Learning Activity Overview
    Learning Objectives
    1.1 Generative AI Foundations and Key Concepts
        1.1.1 AI Spectrum: Symbolic AI, Classical Machine Learning, Deep Learning, and Generative AI
        1.1.2 Basics of Generative AI and LLMs
        1.1.3 Foundation, Instruction-Tuned, and Reasoning LLMs
        1.1.4 Multimodal LLMs and Vision-Language Models
        Key Takeaways – 1.1
        Reflection – 1.1
    1.2 Leveraging Generative AI in Software Testing: Core Principles
        1.2.1 Key LLM Capabilities for Test Tasks
        1.2.2 AI Chatbots and LLM-Powered Testing Applications
        Key Takeaways – 1.2
        Reflection – 1.2
  3. CONTENTS (continued)

    Key Takeaways and Summary
    Reflection and Knowledge Check
    References
    Feedback and Evaluation
  4. LEARNING ACTIVITY OVERVIEW

    Programme Context: This learning activity forms part of the ISTQBⓇ CT-GenAI training programme and aligns with syllabus version 1.1.
  5. LEARNING OBJECTIVES

    By the end of this learning activity, participants will be able to:
    • Recall different types of AI: symbolic AI, classical machine learning, deep learning, and generative AI
    • Explain the basics of generative AI and large language models
    • Distinguish between foundation, instruction-tuned and reasoning LLMs
    • Write and execute a given prompt addressing a test task using a multimodal LLM
    • Give examples of key LLM capabilities for test tasks
    • Compare interaction models when using GenAI for software testing
  6. GENERATIVE AI (Sec. 1.1)

    Generative AI (GenAI): A type of artificial intelligence system that uses machine learning models to generate (new) intellectual content that resembles human-created content.

    Generative Artificial Intelligence (GenAI) is a branch of AI that uses large, pre-trained models to create new content such as text, images and code.
  7. LARGE LANGUAGE MODELS (Sec. 1.1)

    Large Language Model (LLM): A computer program that uses very large collections of language data in order to understand and produce text in a way that is similar to the way humans do.
    Prompt: A natural language input provided to elicit a specific response in GenAI and LLMs.

    In other words, LLMs are GenAI systems pre-trained on huge text collections so they can understand context and respond to natural language inputs, or prompts.
  8. LARGE LANGUAGE MODELS: KEY TERMS (Sec. 1.1)

    Tokenization: The process of breaking down text into smaller units (tokens) for processing by language models.
    Context window: The span of text, measured in tokens, that a language model considers when generating responses, influencing the relevance and coherence of its outputs.
    Multimodal model: GenAI model that is capable of processing and generating content across multiple data types, such as text, images, and audio.

    Key terms include:
    • Tokenization. Splitting text into pieces (or tokens) so the model can process it.
    • Context window. How much text, in tokens, the model can “see” at once.
    • Multimodal models. Models that handle more than one kind of data (e.g., text, pictures and audio).
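Both terms are easy to see in code. Here is a minimal Python sketch with two hypothetical helpers. `toy_tokenize` just splits text into words and punctuation, whereas real LLM tokenizers use learned subword vocabularies. `fit_context` keeps only the most recent tokens, which is one common way applications cope with a full context window:

```python
import re


def toy_tokenize(text: str) -> list[str]:
    # Split into words and individual punctuation marks. This is a toy scheme,
    # not the subword tokenizer a real LLM uses.
    return re.findall(r"\w+|[^\w\s]", text)


def fit_context(tokens: list[str], window: int) -> list[str]:
    # Keep only the last `window` tokens; anything earlier falls outside
    # the context window and is not "seen" by the model.
    return tokens[-window:] if window > 0 else []


requirement = "If a user enters an invalid password three times, lock the account."
tokens = toy_tokenize(requirement)
print(len(tokens), tokens)
print(fit_context(tokens, 5))
```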
  9. GENERATIVE AI (Sec. 1.1)

    Test case: A set of preconditions, inputs, actions (where applicable), expected results and postconditions, developed based on test conditions.

    [Figure: text from the web split into tokens that fill a context window]

    Think of GenAI as an extremely knowledgeable assistant that has read a vast amount of text from the internet. It doesn’t just repeat information; it can generate something new, like a recipe, a poem, or a test case as defined above. To understand your request, it chops your text into Lego-like bricks called tokens. The context window is how many Lego bricks it can keep on the table while building an answer.
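The test case definition maps naturally onto a structured record. This is useful when you ask an LLM to return test cases in a form that can be checked automatically. The following is a minimal Python sketch. The `StructuredTestCase` class and its `is_complete` check are hypothetical illustrations and do not come from ISTQB material or any library:

```python
from dataclasses import dataclass, field


@dataclass
class StructuredTestCase:
    # Fields mirror the definition: preconditions, inputs, actions
    # (where applicable), expected results and postconditions.
    identifier: str
    preconditions: list[str]
    inputs: dict[str, str]
    actions: list[str] = field(default_factory=list)
    expected_results: list[str] = field(default_factory=list)
    postconditions: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A simple review check: without expected results a test case
        # cannot distinguish a pass from a fail.
        return bool(self.preconditions and self.inputs and self.expected_results)


tc = StructuredTestCase(
    identifier="TC-LOGIN-01",
    preconditions=["User account 'alice' exists and is not locked"],
    inputs={"username": "alice", "password": "wrong-password"},
    actions=["Submit the login form"],
    expected_results=["An error message is displayed", "Access is not granted"],
    postconditions=["Failed-attempt counter for 'alice' is 1"],
)
print(tc.is_complete())
```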
  10. GENERATIVE AI (Sec. 1.1)

    [Figure: video, image, sound and text tokens feeding a single model]

    A multimodal model is like a person who can read, look at photos, and listen to music all at once, then explain how they fit together.
  11. WHAT TASKS CAN LLMs SUPPORT IN SOFTWARE TESTING (Sec. 1.1)

    Acceptance criteria: The criteria that a work product must satisfy to be accepted by the stakeholders.
    Test case: A set of preconditions, inputs, actions (where applicable), expected results and postconditions, developed based on test conditions.
    Test script: A sequence of instructions for the execution of a test.
    Test data: Data needed for test execution.

    • Reviewing and improving acceptance criteria
    • Generating test cases
    • Generating test scripts
    • Identifying potential defects
    • Analysing defect patterns
    • Generating synthetic test data
    • Supporting documentation generation

    In software testing, LLMs can support tasks across the entire test process. These include reviewing and improving acceptance criteria, generating test cases and test scripts, identifying potential defects, analysing defect patterns, generating synthetic test data, and supporting documentation generation.
  12. HOW MIGHT GENAI HELP A TESTER? (Sec. 1.1)

    Example. Imagine you paste this requirement into an AI tool: “If a user enters an invalid password three times, the system must lock the account for 15 minutes.” Here’s how GenAI might help a tester:
    1. It splits the sentence into tokens so it can “read” the rule.
    2. It interprets the meaning: three wrong attempts, lock the account, duration is 15 minutes.
  13. HOW MIGHT GENAI HELP A TESTER? (Sec. 1.1)

    3. It proposes draft test ideas:
    • Enter a wrong password once – expect no lock.
    • Enter it twice – still no lock.
    • Enter it three times – expect a lock message and the lock timer to start.
    • After 15 minutes, try again with the correct password – login should work.
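The draft ideas translate directly into executable checks. Here is a minimal sketch that assumes a hypothetical `LoginService` class implementing the rule. The class takes the current time as a parameter, so the 15-minute lock can be tested without waiting:

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_FAILED_ATTEMPTS = 3
LOCK_DURATION = timedelta(minutes=15)


class LoginService:
    """Hypothetical system under test implementing the lock-out rule."""

    def __init__(self, password: str):
        self._password = password
        self._failures = 0
        self._locked_until: Optional[datetime] = None

    def login(self, password: str, now: datetime) -> str:
        # While the lock is active, every attempt is rejected.
        if self._locked_until is not None and now < self._locked_until:
            return "locked"
        if self._locked_until is not None:
            # The lock has expired: start again with a clean counter.
            self._locked_until = None
            self._failures = 0
        if password == self._password:
            self._failures = 0
            return "ok"
        self._failures += 1
        if self._failures >= MAX_FAILED_ATTEMPTS:
            self._locked_until = now + LOCK_DURATION
            return "locked"
        return "error"


def wrong_attempts(service: LoginService, times: int, start: datetime) -> list:
    # Submit `times` wrong passwords, one second apart.
    return [service.login("wrong", start + timedelta(seconds=i)) for i in range(times)]


T0 = datetime(2025, 1, 1, 9, 0)

# Idea 1: one wrong password -> error, no lock.
assert wrong_attempts(LoginService("s3cret"), 1, T0) == ["error"]
# Idea 2: two wrong passwords -> still no lock.
assert wrong_attempts(LoginService("s3cret"), 2, T0) == ["error", "error"]
# Idea 3: the third wrong password locks the account.
service = LoginService("s3cret")
assert wrong_attempts(service, 3, T0) == ["error", "error", "locked"]
locked_at = T0 + timedelta(seconds=2)
# While locked, even the correct password is refused...
assert service.login("s3cret", locked_at + LOCK_DURATION - timedelta(seconds=1)) == "locked"
# Idea 4: ...and after 15 minutes the correct password works again.
assert service.login("s3cret", locked_at + LOCK_DURATION) == "ok"
```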
  14. HOW MIGHT GENAI HELP A TESTER? (Sec. 1.1)

    4. It can even suggest edge cases, like “What happens if you try a fourth time after two minutes?”

    This shows that GenAI doesn’t just copy text: it also reasons about the rule and turns it into concrete checks a tester could run.
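The edge case matters because the requirement does not say whether an attempt made during the lock restarts the 15-minute timer. The minimal sketch below compares two hypothetical policies. The `restart_on_attempt` flag is an assumption and is not part of the requirement. The two policies produce different unlock times, which is exactly the kind of question to take back to the stakeholders:

```python
from datetime import datetime, timedelta

LOCK_DURATION = timedelta(minutes=15)


def unlock_time(third_failure: datetime, attempts_while_locked: list,
                restart_on_attempt: bool) -> datetime:
    # Compute when the account unlocks under one of two candidate policies:
    #   restart_on_attempt=False: attempts during the lock are simply rejected.
    #   restart_on_attempt=True:  each attempt during the lock restarts the 15 minutes.
    until = third_failure + LOCK_DURATION
    for t in sorted(attempts_while_locked):
        if restart_on_attempt and t < until:
            until = t + LOCK_DURATION
    return until


third = datetime(2025, 1, 1, 9, 0)
fourth = third + timedelta(minutes=2)  # "a fourth time after two minutes"
print(unlock_time(third, [fourth], restart_on_attempt=False))  # 2025-01-01 09:15:00
print(unlock_time(third, [fourth], restart_on_attempt=True))   # 2025-01-01 09:17:00
```

The requirement as written implies neither policy. The purpose of the edge case is to get that decision made and documented before the tests are finalised.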
  15. 1.1.1 AI SPECTRUM: SYMBOLIC AI, CLASSICAL MACHINE LEARNING, DEEP LEARNING, AND GENERATIVE AI (K1)

    Artificial Intelligence (AI) isn’t a single technology; it’s a spectrum of approaches, each with its own way of solving problems. Understanding them helps you see where Generative AI fits and why it is particularly useful for software testing. Below are the main categories: what they mean, how to picture them, and how they show up in testing.
  16. AI SPECTRUM: 1. SYMBOLIC AI (Sec. 1.1.1)

    Symbolic AI: An AI approach that uses symbols, rules, and structured knowledge to model reasoning.

    [Figure: decision flow “Is password wrong?” leading to “Display error” or “Grant access”]

    Symbolic AI is early, rule-based AI that represents knowledge as symbols and logic rules. It’s like a chef with a strict recipe book: if you see X, always do Y. No improvising.
    Testing example: Writing an if/else tree to check login rules (if the password is wrong, then display an error, else grant access).
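The if/else tree from the testing example can also be written as an explicit rule table. This minimal Python sketch uses a hypothetical rule format for illustration. The system never learns anything; it only does exactly what the rules say:

```python
# Symbolic AI in miniature: knowledge is written down as explicit rules.
# Each rule is a (condition, action) pair, checked in order; the first match wins.
RULES = [
    (lambda ctx: ctx["password"] != ctx["stored_password"], "Display error"),
    (lambda ctx: True, "Grant access"),  # the "else" branch
]


def decide(ctx: dict) -> str:
    for condition, action in RULES:
        if condition(ctx):
            return action
    raise ValueError("no rule matched")


print(decide({"password": "guess", "stored_password": "s3cret"}))
print(decide({"password": "s3cret", "stored_password": "s3cret"}))
```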
  17. AI SPECTRUM: 2. CLASSICAL MACHINE LEARNING (Sec. 1.1.1)

    Machine Learning (ML): The process using computational techniques to enable systems to learn from data or experience.

    Classical machine learning uses data-driven algorithms that learn from data or experience. It still relies on someone to define useful “features”: individual measurable attributes of the input data, used for training by an ML algorithm and for prediction by an ML model. This is like a trainee chef who studies many dishes but still needs a teacher to point out the important ingredients.
    Testing example: Training a model to predict which code modules are most likely to contain defects, based on past bug statistics.
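The defect-prediction example can be sketched with logistic regression written from scratch, so no ML library is assumed. The module data below is invented for illustration. A person decided which two features matter, and the algorithm only learns how to weight them:

```python
import math

# Hypothetical, made-up history: one row per code module. The features were
# chosen by a person (the "feature engineering" step):
#   [lines changed in last release (hundreds), defects found last release]
X = [[0.1, 0], [0.2, 1], [0.3, 0], [2.0, 4], [1.5, 3], [2.5, 5]]
y = [0, 0, 0, 1, 1, 1]  # 1 = module later turned out to be defect-prone


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(X, y, lr=0.1, epochs=20000):
    # Plain batch gradient descent on the logistic (cross-entropy) loss.
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def defect_risk(module, w, b) -> float:
    # Predicted probability that the module is defect-prone.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, module)) + b)


w, b = train(X, y)
print(round(defect_risk([2.25, 4.5], w, b), 2))  # heavily changed module
print(round(defect_risk([0.2, 0], w, b), 2))     # barely touched module
```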
  18. AI SPECTRUM: 3. DEEP LEARNING (Sec. 1.1.1)

    Deep learning: ML using neural networks with multiple layers.

    [Figure: a network inspecting layout, size, position, colour and font]

    Deep learning uses large neural networks to automatically discover patterns in huge datasets (text, images, video, sound). It’s like a creative cook who samples millions of meals and figures out flavour rules on their own.
    Testing example: Scanning thousands of screenshots to detect layout problems, or classifying log messages into “error,” “warning,” and “info.”
  19. AI SPECTRUM: 4. GENERATIVE AI (Sec. 1.1.1)

    Generative AI (GenAI): A type of AI system that uses ML models to generate (new) intellectual content that resembles human-created content.

    [Figure: GenAI producing test cases, boundary values and a test script]

    Generative AI is a branch of deep learning that doesn’t just recognise patterns; it creates new content (text, images, video, audio, code) based on what it has learned. LLMs are its most prominent type. Here we have an inventive chef who takes everything they’ve learned and writes brand-new recipes.
    Testing example: From a user story, GenAI can draft test cases, propose boundary values, or even write an automated test script.

    AI has grown from strict rule engines, to models that learn, to systems that create. Generative AI’s strength is that it builds on vast pre-training, so you can apply it to testing tasks right away without building a model from scratch. That power also means you must understand its limits and risks, which we’ll explore in later chapters.
  20. AI SPECTRUM (Sec. 1.1.1)

    [Figure: summary of the spectrum, from rule-based symbolic AI to generative AI]
  21. GENAI AND LLMs (Sec. 1.1.2)

    Large Language Model (LLM): A computer program that uses very large collections of language data in order to understand and produce text in a way that is similar to the way humans do.
    Transformer: A deep learning model architecture that utilises self-attention mechanisms to capture long-range dependencies in input sequences.

    Text-based generative AI is powered by a family of models called Large Language Models (LLMs). These models are built on a special type of deep learning architecture called the transformer. They’re trained on enormous collections of text (books, articles, code, websites) so they can learn the structure and meaning of language.
  22. GENAI: SLM VS LLM (Sec. 1.1.2)

    Small Language Model (SLM): Language model that is intentionally designed and trained to be small, offering a balance between efficiency and task-specific language understanding.

    Some lighter versions, called Small Language Models (SLMs), use the same principles but have fewer parameters. They’re faster and cheaper to run, but usually less capable on broad or complex tasks.
  23. Sec. 1.1.2

    [Figure: a matrix of numbers, one vector per token, e.g. [0.021, -0.443, 0.287, 0.004, ..., -0.118]]

    But before an LLM can “understand” text, it must translate words into numbers it can work with.
  24. TOKENIZATION (Sec. 1.1.2)

    Tokenization: The process of breaking down text into smaller units (tokens) for processing by language models.

    Example:
    TEXT: LLM must translate words into numbers it can work with.
    TOKENIZED TEXT: the same sentence, split into 13 tokens
    TOKEN IDs: 7454, 44, 2804, 24888, 6391, 1511, 8663, 480, 665, 1101, 483, 13, 220

    It does this in two key steps:
    1. Tokenization. The model breaks a sentence into small pieces called tokens. A token might be a whole word (“tokenization”), a part of a word (“token” and “ization”), or even punctuation. As mentioned before, think of tokens as Lego bricks: the model doesn’t see a sentence as a smooth wall of text; it sees a row of bricks it can rearrange and analyse.
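Here is a toy version of subword tokenization. It assumes a tiny hypothetical vocabulary and a greedy longest-match rule. Real LLM tokenizers typically use learned schemes such as byte-pair encoding, and their token IDs differ from the ones below:

```python
# Hypothetical toy vocabulary: subword piece -> token ID.
VOCAB = {
    "token": 101, "ization": 102, "test": 103, "ing": 104,
    "s": 105, " ": 106, ".": 107,
}


def tokenize(text: str, vocab: dict[str, int]) -> list[str]:
    # Greedy longest-match: at each position take the longest vocabulary
    # entry that matches; an unknown character becomes a one-character token.
    tokens = []
    i = 0
    longest = max(len(piece) for piece in vocab)
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


def to_ids(tokens: list[str], vocab: dict[str, int], unk_id: int = 0) -> list[int]:
    # Map each token to its ID; unknown tokens get `unk_id`.
    return [vocab.get(t, unk_id) for t in tokens]


pieces = tokenize("tokenization testing.", VOCAB)
print(pieces)                 # ['token', 'ization', ' ', 'test', 'ing', '.']
print(to_ids(pieces, VOCAB))  # [101, 102, 106, 103, 104, 107]
```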
  25. EMBEDDINGS (Sec. 1.1.2)

    Embedding: A technique used to represent tokens as dense vectors in a continuous space, learned during training to capture semantic, syntactic, and contextual relationships.

    Example:
    TOKEN IDs: 7454, 44, 2804, 24888, 6391, 1511, 8663, 480, 665, 1101, 483, 13, 220
    EMBEDDINGS: [[0.021, -0.443, 0.287, 0.004, ..., -0.118], [-0.023, -0.111, 0.043, 0.142, ..., 0.237], ..., [0.521, 0.049, -0.278, -0.309, ..., -0.538]]

    2. Embeddings. Once text is tokenized, every token is turned into a long vector of numbers that captures its semantic, syntactic, and contextual relationships with other tokens.
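The step from token IDs to vectors is, at its core, a table lookup. This minimal sketch uses toy sizes and random numbers in place of learned values. Note that in this lookup step the same ID always gets the same vector. In a transformer, information about context is mixed in by the layers that follow:

```python
import random

VOCAB_SIZE = 8  # toy sizes: real models have vocabularies of tens of thousands
EMBED_DIM = 4   # of tokens and vectors with hundreds or thousands of dimensions

# One row (vector) per token ID. In a trained model these values are learned;
# here they are random numbers from a seeded generator.
rng = random.Random(42)
EMBEDDING_TABLE = [
    [round(rng.uniform(-1.0, 1.0), 3) for _ in range(EMBED_DIM)]
    for _ in range(VOCAB_SIZE)
]


def embed(token_ids):
    # The embedding step is a table lookup: token ID i -> row i.
    return [EMBEDDING_TABLE[i] for i in token_ids]


vectors = embed([3, 1, 3])
print(len(vectors), "vectors of", len(vectors[0]), "numbers each")
```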
  26. EMBEDDING SPACE (Sec. 1.1.2)

    Similar words (for example, “bug” and “defect”) end up close together in this high-dimensional “semantic map of meaning.” This is part of how the model keeps track of context and nuance.
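“Close together” is usually measured with cosine similarity, which compares the direction of two vectors. Here is a minimal sketch with hand-made three-dimensional vectors. The values are hypothetical and chosen for the illustration; real embeddings are learned and much longer:

```python
import math

# Hand-made toy "embeddings" (invented values).
toy = {
    "bug":    [0.90, 0.80, 0.10],
    "defect": [0.85, 0.75, 0.20],
    "banana": [0.10, 0.20, 0.95],
}


def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, 0.0 means unrelated (orthogonal).
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


print(round(cosine(toy["bug"], toy["defect"]), 3))
print(round(cosine(toy["bug"], toy["banana"]), 3))
```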
  27. Sec. 1.1.2

    [Slide prompt: “According to the ISTQB syllabus, the word ‘bug’ is close in meaning to …”]

    The transformer architecture lets the model look at all tokens in a sentence at once, work out which ones depend on each other, and estimate how likely each possible next token is. That’s how an LLM generates text that feels fluent and logical.
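The mechanism that lets each token “look at” all the others is self-attention. Here is a minimal sketch of scaled dot-product attention in pure Python, under a simplifying assumption: the token vectors themselves serve as queries, keys and values. Real transformers first apply learned projections, use several attention heads, and in decoder LLMs mask out future tokens. All of that is omitted here:

```python
import math


def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    # Scaled dot-product attention with Q = K = V = X (a simplification).
    d = len(X[0])
    weights = []
    outputs = []
    for q in X:
        # How well this token's query matches every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted mix of all token vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, X)) for c in range(d)])
    return outputs, weights


# Toy 2-dimensional vectors for three tokens: "50", "items", "error".
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.2]]
outputs, weights = self_attention(X)
for row in weights:
    print([round(w, 2) for w in row])
```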
  28. Sec. 1.1.2

    [Slide: the question “What word is close in meaning to the word ‘bug’?”, answered once with “It is ‘fault’” and once with “It is ‘defect’”]

    However, the model doesn’t store facts the way a database does. Instead, it works with probabilities. Given the same input twice, it might not give the identical answer. This non-deterministic behaviour comes from sampling among different but plausible next tokens.
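Sampling is what makes the same question produce “fault” one time and “defect” the next. This minimal sketch uses hypothetical scores and a temperature-scaled softmax. Temperature is a parameter that many LLM APIs expose; a value of 0 is treated here as always picking the top token:

```python
import math
import random


def sample_next(logits: dict, temperature: float, rng: random.Random) -> str:
    # Turn scores into probabilities with a temperature-scaled softmax,
    # then draw one token at random according to those probabilities.
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        return max(logits, key=logits.get)  # greedy: always the top token
    scaled = {t: s / temperature for t, s in logits.items()}
    m = max(scaled.values())
    weights = {t: math.exp(s - m) for t, s in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]


# Hypothetical scores for the word after "... close in meaning to 'bug'? It is"
logits = {"defect": 2.0, "fault": 1.6, "error": 1.2}

rng = random.Random()
print([sample_next(logits, 1.0, rng) for _ in range(5)])  # varies between runs
print([sample_next(logits, 0, rng) for _ in range(5)])    # always 'defect'
```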
  29. Sec. 1.1.2

    Larger context window: ✓ handles longer documents; ✖ increased memory usage; ✖ increased processing cost.
    Smaller context window: ✖ handles only shorter documents; ✓ lower memory usage; ✓ lower processing cost.

    Another important concept is the context window: the maximum number of tokens the model can consider at one time. A bigger window allows it to handle longer documents, such as large test logs, but it also increases memory usage and processing cost.
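When a test log is longer than the window, a common workaround is to split it into overlapping chunks and process them one at a time. This is a minimal sketch with a hypothetical `chunk_tokens` helper. It assumes whitespace-separated words as a stand-in for real tokens:

```python
def chunk_tokens(tokens: list, window: int, overlap: int) -> list:
    # Split a token sequence into consecutive chunks of at most `window` tokens.
    # Neighbouring chunks share `overlap` tokens, so any run of at most
    # overlap + 1 tokens (e.g. a short log message) fits whole in some chunk.
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    step = window - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break
    return chunks


log = " ".join(f"line{i}" for i in range(10)).split()  # 10 pseudo-tokens
for chunk in chunk_tokens(log, window=4, overlap=1):
    print(chunk)
```

A larger window means fewer chunks and less stitching of partial answers, at the memory and cost price described above.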
  30. EXAMPLE (Sec. 1.1.2)

    Suppose you’re testing a shopping-cart application and want ideas for boundary tests. You paste this requirement into an LLM: “A customer can add up to 50 items to the cart. If they try to add more, the system shows an error.”

    TOKEN IDs: [32, 6130, 649, 923, 709, 311, 220, 1135, 3673, 311, 279, 7558, 13, 1442, 814, 1456, 311, 923, 810, 11, 279, 1887, 5039, 459, 1493, 13]

    Here’s what happens behind the scenes:
    1. The model splits the sentence into tokens: [“A”, “customer”, “can”, “add”, …].
  31. EXAMPLE (Sec. 1.1.2)

    2. Each token is converted into an embedding, a point in its “semantic map.”
  32. EXAMPLE (Sec. 1.1.2)

    3. The transformer looks at relationships: ‘50 items’ relates to ‘add’ and ‘error’.
  33. EXAMPLE (Sec. 1.1.2)

    4. Based on those links, it predicts and writes useful suggestions:
    ✓ Add 49 items – expect success.
    ✓ Add 50 items – expect success.
    ✖ Add 51 items – expect error.
    ✓ After the error, remove a number of items and verify you can add them back as long as the total is ≤ 50 – expect success.
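The suggestions are classic three-value boundary value analysis around the upper boundary (49, 50, 51), plus a recovery check. Here is a minimal sketch that runs them against a hypothetical `Cart` class standing in for the real application:

```python
class Cart:
    """Hypothetical system under test: at most MAX_ITEMS items per cart."""
    MAX_ITEMS = 50

    def __init__(self):
        self.count = 0

    def add(self, n: int = 1) -> None:
        if self.count + n > self.MAX_ITEMS:
            raise ValueError("cart limit exceeded")  # "the system shows an error"
        self.count += n

    def remove(self, n: int = 1) -> None:
        self.count = max(0, self.count - n)


def boundary_cases(limit: int) -> list:
    # Three-value boundary analysis around the upper boundary:
    # (quantity, should_succeed)
    return [(limit - 1, True), (limit, True), (limit + 1, False)]


def try_add(quantity: int) -> bool:
    # Add `quantity` items to a fresh cart; True if no error was raised.
    cart = Cart()
    try:
        cart.add(quantity)
        return True
    except ValueError:
        return False


for qty, expected in boundary_cases(Cart.MAX_ITEMS):
    assert try_add(qty) == expected, f"{qty} items"

# Recovery check: after the error, removing items makes room again.
cart = Cart()
cart.add(50)
try:
    cart.add(1)
except ValueError:
    pass
cart.remove(5)
cart.add(5)
assert cart.count == 50
```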
  34. EXAMPLE (Sec. 1.1.2)

    With a well-crafted prompt, you can even ask the model to produce these as formal test cases or Gherkin scenarios.
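For reference, this is one shape the Gherkin form of those boundary tests could take. A small hypothetical helper renders it so the expected structure can be compared with what a model returns. A model’s actual wording will vary; the feature and step texts below are assumptions:

```python
def to_gherkin(limit: int, cases: list) -> str:
    # Render (quantity, should_succeed) pairs as a Gherkin Scenario Outline
    # with an Examples table.
    lines = [
        f"Feature: Shopping cart limit of {limit} items",
        "",
        "  Scenario Outline: Adding items around the cart limit",
        "    Given an empty shopping cart",
        "    When the customer adds <quantity> items",
        "    Then the result is <outcome>",
        "",
        "    Examples:",
        "      | quantity | outcome |",
    ]
    for qty, ok in cases:
        lines.append(f"      | {qty} | {'success' if ok else 'error'} |")
    return "\n".join(lines)


print(to_gherkin(50, [(49, True), (50, True), (51, False)]))
```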
  35. 1.1.3 FOUNDATION, INSTRUCTION-TUNED, AND REASONING LLMs (K2)

    LLMs are usually developed through several stages of training, each adding new capabilities. Knowing these stages helps testers decide which kind of model to use for a given testing task.
  36. FOUNDATION LLMs (Sec. 1.1.3)

    Foundation LLM (Base LLM): General-purpose models pre-trained on a wide range of text data, capable of predicting the next word based on learned linguistic patterns.
    ✓ Can handle a wide variety of tasks
    ✖ Are not specialised for any one domain or style of answer
    ✖ Often require extra alignment or fine-tuning

    Foundation LLMs (or base LLMs) are general-purpose large models trained on extremely broad and diverse datasets: billions of words of text, code samples, and sometimes images or audio. Their goal is to learn the structure of language and many topics at once. They can handle a wide variety of tasks (writing, summarising, classifying), but they are not specialised for any one domain or style of answer. To meet a specific need, they often require extra alignment or fine-tuning. In simpler terms, they are like a student who has read an entire library but hasn’t yet learned how to sit an exam. They know a lot, but their answers may be long, unfocused, or off-target.
  37. FOUNDATION LLMs (Sec. 1.1.3)

    [Illustration: asked “What does this mean?”, the model replies “Oh, I have an hour-long lecture on this topic”]

    Testing example: Ask a foundation LLM, “What does this login requirement mean for testing?” and it might give a broad discussion about security, usability, or password policies, but not a clean checklist of tests.
  38. INSTRUCTION-TUNED LLMs (Sec. 1.1.3)

    Instruction-tuned LLM: A foundation LLM trained to follow instructions, often reinforced by feedback to encourage correct answers.

    Training focus: how to follow instructions. Improved ability to:
    ✓ understand user intent
    ✓ follow directions
    ✓ provide answers in the requested format

    Instruction-tuned models are foundation LLMs given further training to follow instructions, often reinforced by feedback to encourage correct answers. This step improves their ability to understand user intent, follow directions, and provide answers in the requested format. The training focuses on clarity, task adherence, and producing concise, relevant information.
  39. INSTRUCTION-TUNED LLMs (Sec. 1.1.3)

    [Illustration: the student takes a course, “How to answer questions clearly”; asked to “Write three test scenarios”, the model replies: 1. Valid login 2. Invalid password 3. Account lock after three tries]

    Imagine our library-loving student taking a course called “How to answer questions clearly.” Now they know to stay on topic, write in bullets if asked, and stop when the job is done.
    Testing example: Give an instruction-tuned model a user story and say, “Write three test scenarios.” It will likely return short, well-structured items, e.g., valid login, invalid password, account lock after three tries.
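When you rely on an instruction-tuned model to return a particular format, it helps to state the format explicitly and then check the reply mechanically. Here is a minimal sketch. The prompt wording and the `build_prompt`/`follows_format` helpers are hypothetical, and no particular LLM API is assumed:

```python
import re


def build_prompt(user_story: str, n: int) -> str:
    # An explicit instruction that states the task, the count and the output format.
    return (
        f"Write {n} test scenarios for the user story below.\n"
        f"Return exactly {n} lines, numbered 1. to {n}., one short title per line, "
        "with no other text.\n\n"
        f"User story: {user_story}"
    )


def follows_format(response: str, n: int) -> bool:
    # Check the reply has exactly n non-empty lines numbered 1..n in order.
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if len(lines) != n:
        return False
    return all(re.match(rf"{i}\.\s+\S", line) for i, line in enumerate(lines, start=1))


print(build_prompt("As a registered user, I want to log in so that I can see my account.", 3))

# A reply in the style the slide shows (hand-written here, not real model output).
reply = "1. Valid login\n2. Invalid password\n3. Account lock after three tries"
print(follows_format(reply, 3))
```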
  40. REASONING LLMs (Sec. 1.1.3)

    Reasoning LLM: An LLM building upon instruction-tuned models by refining their ability to emulate human-like reasoning processes.
    Focus on: ✓ logical inference ✓ multi-step problem solving ✓ chain-of-thought reasoning
    Better at tasks that require: • intermediate steps • careful synthesis

    Reasoning models are instruction-tuned LLMs that undergo additional training or optimisation to improve structured thinking. They focus on skills such as logical inference, multi-step problem solving, and chain-of-thought reasoning. This makes them better at tasks where intermediate steps and careful synthesis are required, including technical or analytical problems. Let’s continue with our analogy: the student now joins a logic and problem-solving workshop. They learn to break puzzles into smaller parts, show their working, and check each step before deciding.
  41. REASONING LLMs (Sec. 1.1.3)

    Testing example: Present a reasoning LLM with: “A shopper gets a 10% discount if they buy more than 3 items and total cost exceeds $100; shipping is free if total > $150. Suggest boundary test cases.” It can (1) separate the conditions, (2) reason about thresholds ($100 vs $150),
  42. EXAMPLE: TEST MATRIX (Sec. 1.1.3)

    and deliver an organised matrix of tests, rather than one jumbled paragraph:

    Test ID | Items | Total ($) | Discount expected | Free shipping expected | Notes
    TC-01   | 3     | 99.99     | No                | No                     | Below all thresholds
    TC-02   | 3     | 100.00    | No                | No                     | Cost boundary (100)
    TC-03   | 3     | 100.01    | No                | No                     | Items boundary not met
    TC-04   | 4     | 99.99     | No                | No                     | Cost just below discount
    TC-05   | 4     | 100.00    | No                | No                     | Cost boundary (100)
    TC-06   | 4     | 100.01    | Yes               | No                     | Discount boundary crossed
    TC-07   | 4     | 149.99    | Yes               | No                     | Below free shipping
    TC-08   | 4     | 150.00    | Yes               | No                     | Shipping boundary
    TC-09   | 4     | 150.01    | Yes               | Yes                    | Free shipping boundary crossed
    TC-10   | 3     | 150.01    | No                | Yes                    | Free shipping only
    TC-11   | 5     | 150.00    | Yes               | No                     | More items, shipping boundary
    TC-12   | 5     | 200.00    | Yes               | Yes                    | All conditions satisfied
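Before trusting a generated matrix like this one, a tester can check it against an executable version of the rule. The minimal sketch below re-derives the expected columns for all twelve rows. The `expected_outcome` function is a hypothetical oracle written for this example. `Decimal` avoids floating-point surprises with money:

```python
from decimal import Decimal


def expected_outcome(items: int, total: Decimal) -> tuple:
    # Rule from the example. Assumption: both thresholds are checked against
    # the pre-discount total, because the requirement does not say which
    # total the free-shipping rule uses.
    discount = items > 3 and total > Decimal("100")
    free_shipping = total > Decimal("150")
    return discount, free_shipping


# (test id, items, total, discount expected, free shipping expected)
MATRIX = [
    ("TC-01", 3, "99.99", False, False),
    ("TC-02", 3, "100.00", False, False),
    ("TC-03", 3, "100.01", False, False),
    ("TC-04", 4, "99.99", False, False),
    ("TC-05", 4, "100.00", False, False),
    ("TC-06", 4, "100.01", True, False),
    ("TC-07", 4, "149.99", True, False),
    ("TC-08", 4, "150.00", True, False),
    ("TC-09", 4, "150.01", True, True),
    ("TC-10", 3, "150.01", False, True),
    ("TC-11", 5, "150.00", True, False),
    ("TC-12", 5, "200.00", True, True),
]

mismatches = [
    tc for tc, items, total, disc, ship in MATRIX
    if expected_outcome(items, Decimal(total)) != (disc, ship)
]
print(mismatches)  # [] -> every row agrees with the rule
```

Writing the oracle also surfaces an ambiguity worth raising. If free shipping were judged on the discounted total, TC-09 (4 items, $150.01, about $135.01 after 10% off) would no longer qualify.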
  43. WHY THIS MATTERS FOR TESTERS (Sec. 1.1.3)

    LLM type          | Core strength                               | When to use
    Foundation        | Broad knowledge, flexible but unspecialised | Early exploration, summarising requirements
    Instruction-tuned | Follows directions, produces tidy artefacts | Generating test cases, checklists, defect summaries
    Reasoning         | Handles complex logic and planning          | Analysing business rules, prioritising cases, explaining failures

    Choosing the right type and knowing its limits helps you get outputs that fit your testing goals while avoiding over-reliance on a model that isn’t built for deep reasoning.
  44. MULTIMODAL LLMs (Sec. 1.1.4)

    [Figure: the first LLMs]

    Large Language Models originally worked only with text. But many real-world tasks need more than words: pictures, diagrams, audio, even video. That’s where multimodal LLMs come in: they extend the same transformer technology to handle several types of input and combine them in a single reasoning process.
  45. MULTIMODAL LLM (Sec. 1.1.4)

    [Figure: each input modality converted into its own matrix of embedding vectors]

    A multimodal LLM is a model trained to process and relate information from different data modalities (for example, text, images, sound, or video). Each type of input is first converted into a numerical representation (or embedding) appropriate for that data. The model then aligns these embeddings in a shared space, letting it understand how an image caption relates to the picture, or how a spoken command links to on-screen text.
  46. MULTIMODAL LLM (Sec. 1.1.4)

    An analogy here would be an investigator who doesn’t just read witness statements but also studies photos, listens to recordings, and watches CCTV footage, and then pieces everything together to solve a case.
  47. VISION-LANGUAGE MODEL (Sec. 1.1.4)

    Vision-language model: A GenAI system that jointly processes visual and textual data to perform tasks by linking and generating content across both modalities.

    [Figure: images matched to the labels CAT, BIRD, ELEPHANT and TIGER]

    A vision-language model is a specialised subset of multimodal LLMs trained mainly on text–image pairs. It learns how visual elements connect to written descriptions and can answer questions or generate text about an image. It’s like a bilingual person fluent in both “picture language” and “written language”, able to translate between them or discuss both at once.
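A shared embedding space is what lets such a model match a picture to a description. The image and each candidate caption are turned into vectors, and the closest caption wins. This minimal sketch shows that matching step in the style of contrastive image–text models such as CLIP, named here only as a well-known example. The vectors are invented and stand in for real encoder outputs:

```python
import math


def normalise(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical vectors, as if a text encoder had already mapped each caption
# into the shared space (values invented for the illustration).
CAPTIONS = {
    "a photo of a cat":       [0.9, 0.1, 0.0, 0.1],
    "a photo of a bird":      [0.1, 0.9, 0.1, 0.0],
    "a photo of an elephant": [0.0, 0.1, 0.9, 0.2],
    "a photo of a tiger":     [0.6, 0.0, 0.2, 0.8],
}
image_embedding = [0.8, 0.2, 0.1, 0.3]  # pretend an image encoder produced this


def best_caption(image_vec, captions):
    # Score every caption by cosine similarity with the image; highest wins.
    img = normalise(image_vec)
    scores = {text: dot(img, normalise(vec)) for text, vec in captions.items()}
    return max(scores, key=scores.get), scores


label, scores = best_caption(image_embedding, CAPTIONS)
print(label)
```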
  48. MULTIMODAL LLMs (Sec. 1.1.4)

    Software testers often work with visual artefacts, such as screenshots, mock-ups, wireframes and charts, as well as with textual specifications.
  49. MULTIMODAL LLMs CAN HELP WITH (Sec. 1.1.4)

    [Illustration: “I see an accessibility issue”]

    Multimodal models can bridge the gap:
    • GUI analysis: Supply a screenshot of an app and ask, “List any accessibility issues you notice.”
  50. MULTIMODAL LLMs CAN HELP WITH (Sec. 1.1.4)

    [Illustration: “I can generate clear and testable acceptance criteria”]

    • Wireframe (a simple visual outline of a screen) and acceptance criteria: Give the model a page mock-up plus a short user story and ask it to propose acceptance criteria that match what’s on the screen. For example: which input fields exist, what happens when you click each button, and whether there are navigation flows that need to be tested.
  51. MULTIMODAL LLMs CAN HELP WITH (Sec. 1.1.4)

    • Image-based defect detection: Compare an expected screen image with the screen from an actual run; the model can highlight missing buttons or colour mismatches.
  52. MULTIMODAL LLMs CAN HELP WITH (Sec. 1.1.4)

    • Hybrid reasoning: Mix logs, a screen capture, and a requirement paragraph to explain why a test failed.
  53. MULTIMODAL LLMs: EXAMPLE (Sec. 1.1.4)

    Example: A tester uploads a screenshot of a login page (with “Username,” “Password,” and a misaligned “Login” button) and a text snippet: “The login button should be horizontally centred.” A multimodal LLM can:
    • Recognise the button in the image.
    • Check its position against the requirement.
    • Reply: “The button is shifted 20px to the right; it should be centred.”
    This blend of visual understanding and textual reasoning lets testers detect issues that would be hard for a text-only model to find.
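A claim such as “shifted 20px to the right” is cheap to double-check when the element’s bounding box is available, for example from the DOM or a layout dump. How you obtain the box is outside this sketch. The arithmetic is just the difference between the two centres. The measurements below are hypothetical:

```python
def horizontal_offset(container_width: float, box_left: float, box_width: float) -> float:
    # Distance between the element's centre and the container's centre.
    # Positive: element is to the right; negative: to the left; 0: centred.
    return (box_left + box_width / 2) - container_width / 2


# Hypothetical measurements: a 400px-wide form and a 100px-wide button at x=170.
offset = horizontal_offset(400, 170, 100)
print(offset)  # 20.0
```

In a real check you would compare the offset against a small tolerance rather than exactly 0, since rendering can introduce subpixel differences.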
  54. 58 KEY TAKEAWAYS – 1.1

    • GenAI uses large pre-trained models to create new content such as text, code, and images
    • LLMs are GenAI systems trained on vast amounts of text data to understand and generate human-like language
    • Tokenization and embeddings convert text into numerical representations that models can process
    • LLMs are probabilistic and non-deterministic, meaning the same input can produce different outputs
    • The context window limits how much information the model can process at once, which affects performance on long inputs
    • Different LLM types serve different purposes:
      ◦ Foundation models provide broad knowledge but may lack focus
      ◦ Instruction-tuned models follow directions and produce structured outputs
      ◦ Reasoning models handle complex logic and multi-step problem solving
    • Multimodal LLMs extend capabilities beyond text to include images, audio, and other data types
    • LLMs can support software testing by generating test ideas, analysing requirements, and assisting with complex scenarios, but they still require human oversight
  55. 59 REFLECTION – 1.1

    1. How would you decide which type of LLM (foundation, instruction-tuned, or reasoning) is most appropriate for a specific testing task in your project?
    2. In what situations could a multimodal LLM provide clear advantages over a text-only model in your testing workflow?
    3. What risks might arise from relying on probabilistic LLM outputs, and how would you mitigate them as a tester?
  56. 60 1.2 LEVERAGING GENERATIVE AI IN SOFTWARE TESTING: CORE PRINCIPLES

    Generative AI isn’t just an academic curiosity; it can become a powerful tool for testers.
  57. 61 Sec. 1.2 GENERATIVE AI AND LLMs

    (Slide diagram: GenAI and LLMs alongside the test activities: test planning, test monitoring and control, test analysis, test design, test implementation, test execution, and test completion.)
    Because LLMs understand language, code, and even images, they can support almost every phase of the testing process. They don’t replace human judgment, but they can take over repetitive chores, spark new ideas, and speed up analysis.
  58. 62 Sec. 1.2 LEVERAGING GENAI

    Broadly, testers interact with GenAI in two ways:
    • Conversational tools (AI chatbots): a friendly interface where you type or speak a question and get an instant answer.
    • LLM-powered testing applications: testing tools that integrate an LLM behind the scenes, giving you features such as automated test case generation or defect clustering.
    Understanding what LLMs are actually good at helps you decide when (and when not) to call on them.
  59. 63 1.2.1 KEY LLM CAPABILITIES FOR TEST TASKS (K2)

    Below are some of the most useful capabilities LLMs can bring to software testing.
  60. 64 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING

    1. Requirements analysis and improvement. LLMs can read requirements, user stories, or other “test basis” documents, looking for ambiguities, contradictions, or missing information. They can generate meaningful questions that help stakeholders clarify the requirements.
    Example: when presented with a paragraph about password rules, the AI, like a skilled proofreader, spots unclear wording (“Lock the account after several failed attempts”: how many exactly?) and suggests clarifying it.
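An LLM recognises vague wording such as “several” from context. A far cruder, deterministic relative of that idea is a keyword scan for vague quantifiers, which can pre-screen requirements before they go to a model or act as a cross-check afterwards. The sketch below is that keyword heuristic, not an LLM, and the word list is an illustrative assumption that would need tuning for real requirements.

```python
import re

# Illustrative list of words that often signal a vague, hard-to-test requirement.
VAGUE_TERMS = ["several", "some", "many", "few", "fast", "quickly",
               "user-friendly", "appropriate", "as soon as possible", "etc"]


def find_vague_terms(requirement: str) -> list[str]:
    """Return the vague terms found in a requirement, in order of appearance."""
    hits = []
    for term in VAGUE_TERMS:
        # \b word boundaries keep "some" from matching inside "awesome".
        for match in re.finditer(rf"\b{re.escape(term)}\b", requirement, re.IGNORECASE):
            hits.append((match.start(), term))
    return [term for _, term in sorted(hits)]


def clarifying_questions(requirement: str) -> list[str]:
    """Turn each vague term into a question for stakeholders."""
    return [f'"{term}": what exact value or criterion is meant?'
            for term in find_vague_terms(requirement)]


req = "Lock the account after several failed attempts."
print(clarifying_questions(req))
# ['"several": what exact value or criterion is meant?']
```

Unlike an LLM, this scan cannot notice contradictions or missing information; it only catches the words it was told about.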
  61. 65 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING

    2. Test case creation support. LLMs can draft test cases from requirements or user stories, suggesting preconditions, steps, and expected results.
  62. 66 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 2. TEST CASE CREATION SUPPORT

    Example: given the user story “As a user, I can reset my password via email,” the AI proposes test cases such as:
    • TC1 – Request reset
    • TC2 – Follow link
    • TC3 – Use expired link
    • TC4 – Try with wrong email
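Drafted test cases are easier to review, and to import into a test management tool, when they follow a fixed structure with the fields named earlier: preconditions, steps, and expected results. The sketch below records the four proposed cases for the password-reset story that way. The `DraftTestCase` class is hypothetical, and the concrete preconditions, steps, and expected results are illustrative assumptions, which is exactly the part a tester must review.

```python
from dataclasses import dataclass, field


@dataclass
class DraftTestCase:
    case_id: str
    title: str
    preconditions: list[str] = field(default_factory=list)
    steps: list[str] = field(default_factory=list)
    expected_result: str = ""

    def is_complete(self) -> bool:
        """A draft is ready for review only if preconditions, steps and result are filled in."""
        return bool(self.preconditions and self.steps and self.expected_result)


# Hypothetical details for the four cases proposed for the password-reset story.
drafts = [
    DraftTestCase("TC1", "Request reset", ["A registered user exists"],
                  ["Open 'Forgot password'", "Enter the registered email", "Submit"],
                  "A reset email is sent to the registered address"),
    DraftTestCase("TC2", "Follow link", ["A reset email has been received"],
                  ["Open the reset link", "Enter and confirm a new password"],
                  "The password is changed and the user can log in with it"),
    DraftTestCase("TC3", "Use expired link", ["The reset link has expired"],
                  ["Open the expired link"],
                  "An error is shown and the password stays unchanged"),
    # TC4 is left unfinished on purpose to show the completeness check.
    DraftTestCase("TC4", "Try with wrong email", [],
                  ["Open 'Forgot password'", "Enter an unregistered email", "Submit"]),
]

print([d.case_id for d in drafts if not d.is_complete()])  # ['TC4']
```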
  63. 67 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING

    3. Test oracle generation. A test oracle is the source of truth that tells you whether a test passed or failed; in other words, it supplies the expected result.
    Test oracle (oracle): a source to determine an expected result to compare with the actual result of the system under test.
  64. 68 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    Without a reliable oracle, even the best test execution leaves you unsure: “Is this behaviour correct, or is it a bug?” The oracle problem is a long-standing challenge in software testing: requirements are sometimes incomplete or ambiguous, or they miss edge cases. This uncertainty exists in conventional testing and remains a challenge even when AI is involved, because models themselves don’t magically know the “true” answer.
  65. 69 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    Test oracle problem (oracle problem): the challenge of determining whether a test has passed or failed for a given set of test inputs and state.
    Imagine you’re grading essays without an answer key. You can guess which ones are good, but you don’t always know with certainty. That’s the everyday life of a tester facing the oracle problem.
  66. 70 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    Test oracles require interpretation and should be sensitive enough to flag genuinely unusual behaviour without overwhelming testers with minor issues. In this respect, they function similarly to fraud-detection systems, IT monitoring platforms, or market surveillance tools. For complex or probabilistic systems, establishing a test oracle may be difficult without access to the “ground truth”: the actual real-world result that the system aims to predict.
    Ground truth: the information provided by direct observation and measurement that is known to be real or true.
  67. 71 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    In some cases, expected results can be defined within limits through expert consultation, though experts may disagree or be unwilling to have their judgment automated.
  68. 72 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    When experts act as the oracle, issues such as varying competence, differing interpretations, and human uncertainty must be considered.
  69. 73 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    Several testing techniques can help mitigate the oracle problem, including A/B testing, back-to-back testing, and metamorphic testing.
    • Back-to-back (or differential) testing: run the same input on two different implementations (e.g., legacy system vs new system, or two models) and compare outputs. AI can help automate the comparison or highlight suspicious differences.
    Back-to-back (differential) testing: testing to compare two or more variants of a test item or a simulation model of the same test item by executing the same test cases on all variants and comparing the results.
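The mechanical part of back-to-back testing is easy to automate: feed identical inputs to both variants and report every disagreement. In the sketch below, `legacy_round` and `new_round` are hypothetical stand-ins for an old and a new implementation that are supposed to behave identically; in practice they would be calls into the two systems.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Any, Callable, Iterable


def legacy_round(amount: float) -> int:
    """Hypothetical legacy variant: rounds halves away from zero."""
    return int(Decimal(str(amount)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def new_round(amount: float) -> int:
    """Hypothetical new variant: Python's round() rounds halves to the nearest even number."""
    return round(amount)


def back_to_back(old: Callable[[Any], Any], new: Callable[[Any], Any],
                 inputs: Iterable[Any]) -> list[tuple[Any, Any, Any]]:
    """Run the same inputs through both variants; return (input, old, new) for each mismatch."""
    mismatches = []
    for value in inputs:
        old_result, new_result = old(value), new(value)
        if old_result != new_result:
            mismatches.append((value, old_result, new_result))
    return mismatches


print(back_to_back(legacy_round, new_round, [1.2, 1.5, 2.5, 3.7, -2.5]))
# [(2.5, 3, 2), (-2.5, -3, -2)]
```

Here the two variants disagree on exact halves because they use different rounding rules. The comparison only flags the difference; deciding which behaviour is correct still needs an oracle, such as a requirement or a stakeholder decision.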
  70. 74 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    • A/B testing: present different user groups with different versions (A vs B), then analyse outcomes. An LLM can assist in designing the test plan, collecting feedback, or spotting anomalies in results.
    A/B testing: a statistical testing approach to determine which of two components or systems performs better (typically requires several test runs).
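Deciding which variant “performs better” is ultimately a statistical question. One common way to answer it, swapped in here as an illustration rather than taken from the syllabus, is a two-proportion z-test on a success metric such as the share of users who complete a task. The counts in the usage lines are hypothetical.

```python
import math


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B have the same success rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # common success rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # same as 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical outcome: A succeeds for 100 of 1,000 users, B for 150 of 1,000.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

A small p-value (conventionally below 0.05) suggests the difference between A and B is unlikely to be due to chance alone. The test assumes independent users and reasonably large samples, which is one reason A/B testing typically needs many observations.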
  71. 75 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    • Metamorphic testing: define relationships between inputs and outputs that should always hold true. For example, if the area to be cleaned doubles ($S_2 = 2S_1$), the cleaning time should double too ($T_2 = 2T_1$). LLMs can help generate these relationships or check them against results.
    Metamorphic testing: a test technique in which the inputs and expected results are extrapolated from a passing test case using a metamorphic relation.
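The cleaning-time relation translates directly into an executable check: no exact expected time is needed, only the relation between a source run and a follow-up run. `estimate_cleaning_time` below is a hypothetical stand-in for the system under test (it assumes a constant cleaning rate), and `math.isclose` absorbs floating-point rounding.

```python
import math
import random
from typing import Callable


def estimate_cleaning_time(area_m2: float, rate_m2_per_min: float = 2.5) -> float:
    """Hypothetical system under test: minutes needed to clean an area at a constant rate."""
    return area_m2 / rate_m2_per_min


def check_doubling_relation(system: Callable[[float], float], area: float) -> bool:
    """Metamorphic relation: if S2 = 2 * S1, the output should satisfy T2 = 2 * T1."""
    t1 = system(area)        # source test case
    t2 = system(2 * area)    # follow-up test case derived from it
    return math.isclose(t2, 2 * t1, rel_tol=1e-9)


# Check the relation on many random areas; no "correct" cleaning time is needed.
rng = random.Random(42)
areas = [rng.uniform(1, 500) for _ in range(100)]
violations = [a for a in areas if not check_doubling_relation(estimate_cleaning_time, a)]
print(f"{len(violations)} violations out of {len(areas)} checks")
```

A system that legitimately adds a fixed set-up time to every job would violate this relation, so a proposed metamorphic relation must itself be validated against the requirements before it is used as an oracle.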
  72. 76 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    The oracle problem is one of the hardest in testing because no automation, AI-based or not, can conjure up a “ground truth” that hasn’t been specified or agreed on by stakeholders. What GenAI adds is help in mitigating the problem: proposing expectations, highlighting ambiguities, and supporting structured approaches such as back-to-back, A/B, or metamorphic testing. Ultimate responsibility for judging correctness, however, still lies with the tester.
  73. 77 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    Example: suppose a tax calculator should apply different rates depending on income brackets. The requirements document only says “progressive tax applies” and doesn’t list exact thresholds. However, a legacy system that performs the calculation is available.
  74. 78 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 3. TEST ORACLE GENERATION

    Different approaches could be applied here:
    • A foundation model could propose typical brackets (but you must validate them), for example:

        Annual income       Tax rate
        Up to 25,000        10%
        25,001 – 75,000     20%
        Above 75,000        30%

    • A reasoning model might help set up metamorphic checks: “If tax on $20,000 is X, then tax on $40,000 should be greater than X.”
    • Back-to-back testing could compare results between the old system and the new one.
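The tax example can be sketched end to end. The brackets are the foundation model's proposal, so they are an assumption to validate rather than a requirement; the sketch also assumes the rates are marginal (each rate applies only to the income inside its bracket), which the requirement does not state. `legacy_tax` is a hypothetical stand-in for the old system, written independently with closed-form formulas. The code runs the metamorphic check from the example and a back-to-back comparison.

```python
# Proposed brackets: (upper limit of bracket, marginal rate).
# Assumption: rates are marginal, and these thresholds still need validation.
BRACKETS = [(25_000, 0.10), (75_000, 0.20), (float("inf"), 0.30)]


def new_tax(income: float) -> float:
    """New implementation under test: progressive tax with marginal rates."""
    tax, lower = 0.0, 0.0
    for upper, rate in BRACKETS:
        if income > lower:
            tax += (min(income, upper) - lower) * rate
        lower = upper
    return round(tax, 2)


def legacy_tax(income: float) -> float:
    """Hypothetical legacy system, written independently with closed-form formulas."""
    if income <= 25_000:
        return round(income * 0.10, 2)
    if income <= 75_000:
        return round(2_500 + (income - 25_000) * 0.20, 2)
    return round(12_500 + (income - 75_000) * 0.30, 2)


# Metamorphic check: tax on $40,000 should be greater than tax on $20,000.
assert new_tax(40_000) > new_tax(20_000)

# Back-to-back check: both systems should agree on the same inputs,
# including values on and just above the bracket boundaries.
incomes = [0, 20_000, 25_000, 25_001, 40_000, 75_000, 75_001, 120_000]
diffs = [(i, legacy_tax(i), new_tax(i)) for i in incomes if legacy_tax(i) != new_tax(i)]
print(new_tax(20_000), new_tax(40_000), diffs)  # 2000.0 5500.0 []
```

An empty `diffs` list only means the two implementations agree; if both encode the same wrong brackets, the back-to-back check cannot notice, which is why the proposed thresholds still need confirmation from stakeholders.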
  75. 79 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING

    4. Test data generation. LLMs can create datasets, propose boundary values, and produce different combinations of test data, including synthetic data that mimics production data without exposing sensitive information.
  76. 80 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 4. TEST DATA GENERATION

    Example prompt: “Generate 20 fake customer profiles with valid but diverse phone numbers and emails.” Excerpt of the generated names and phone numbers:

        #    Name             Phone number
        1    Alex Morgan      +1 415-555-0134
        2    Priya Shah       +91 98765 43210
        …
        19   Pedro Alvarez    +52 55 5555 4321
        20   Jonas Nilsson    +46 70 555 9921
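Whether test data comes from an LLM or from a script, it should be checked before use; a scripted generator has the added advantage of being reproducible. The sketch below produces fake customer profiles from values that cannot belong to real people: email addresses under `example.com`, which RFC 2606 reserves for documentation, and North American numbers in the 555-0100 to 555-0199 range, which is reserved for fictional use. The name and area-code pools and the field layout are illustrative assumptions.

```python
import random
from dataclasses import dataclass

# Illustrative pools; extend them for more diversity.
FIRST_NAMES = ["Alex", "Priya", "Pedro", "Jonas", "Amara", "Mei", "Omar", "Sofia"]
LAST_NAMES = ["Morgan", "Shah", "Alvarez", "Nilsson", "Okafor", "Chen", "Haddad", "Rossi"]
AREA_CODES = ["206", "212", "312", "415", "617"]


@dataclass
class CustomerProfile:
    customer_id: int
    name: str
    phone: str
    email: str


def generate_profiles(count: int, seed: int = 0) -> list[CustomerProfile]:
    """Generate fake, well-formed profiles; the same seed always yields the same data."""
    rng = random.Random(seed)
    profiles = []
    for i in range(1, count + 1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        # 555-0100 to 555-0199 is reserved for fictional use in North America.
        phone = f"+1 {rng.choice(AREA_CODES)}-555-01{rng.randint(0, 99):02d}"
        # example.com is reserved for documentation (RFC 2606); the index keeps emails unique.
        email = f"{first}.{last}.{i}@example.com".lower()
        profiles.append(CustomerProfile(i, f"{first} {last}", phone, email))
    return profiles


for profile in generate_profiles(3):
    print(profile)
```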
  77. 81 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING

    5. Test automation support. LLMs can help generate test scripts from a description.
  78. 82 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 5. TEST AUTOMATION SUPPORT

    LLMs can also improve existing test scripts by suggesting changes and identifying appropriate test design techniques.
  79. 83 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 5. TEST AUTOMATION SUPPORT

    Example prompt: “Generate a test script in Python based on the following description: Log in as admin then open report page then export CSV.”
    The AI replies: “Here’s a simple, realistic Python test script that follows your flow: Log in as admin → open report page → export CSV. I’ll assume a web UI test using Selenium + pytest, which is the most common setup. Everything is clearly marked so you can adapt URLs, selectors, or credentials.” It then produces a Selenium + pytest script; the slide shows only its opening lines:

        import os
        import time
        import pytest
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.chrome.options import Options
        # …

    The result is a runnable script once you have adapted its URLs, selectors, and credentials to your application.
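However the script is generated, the flow itself should be reviewed step by step. One way to keep generated UI code reviewable is to write the flow against a thin browser interface: a real run would implement it with Selenium calls such as `driver.get(url)` and `driver.find_element(By.ID, element_id).send_keys(text)`, while a quick check can use a fake that only records actions. Everything below is hypothetical: the `Browser` protocol, the URL paths, the element IDs, and the credentials are placeholders to adapt, just as the generated script itself advises.

```python
from typing import Protocol


class Browser(Protocol):
    """Minimal browser interface; a real adapter would wrap a Selenium WebDriver."""

    def open(self, url: str) -> None: ...
    def fill(self, element_id: str, text: str) -> None: ...
    def click(self, element_id: str) -> None: ...


def export_report_as_admin(browser: Browser, base_url: str,
                           username: str, password: str) -> None:
    """Log in as admin -> open report page -> export CSV (paths and IDs are hypothetical)."""
    browser.open(f"{base_url}/login")
    browser.fill("username", username)
    browser.fill("password", password)
    browser.click("login-button")
    browser.open(f"{base_url}/reports")
    browser.click("export-csv")


class RecordingBrowser:
    """Fake browser that records each action, so the flow can be reviewed without a UI."""

    def __init__(self) -> None:
        self.actions: list[tuple[str, ...]] = []

    def open(self, url: str) -> None:
        self.actions.append(("open", url))

    def fill(self, element_id: str, text: str) -> None:
        self.actions.append(("fill", element_id, text))

    def click(self, element_id: str) -> None:
        self.actions.append(("click", element_id))


fake = RecordingBrowser()
export_report_as_admin(fake, "https://test.example.com", "admin", "placeholder-password")
for action in fake.actions:
    print(action)
```

The recorded actions read like the original description, which makes gaps easy to spot: for instance, this sketch never verifies that the CSV file was actually downloaded, a check the real test would need.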
  80. 84 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING

    6. Test result analysis. LLMs can help analyse test results by reviewing logs, outputs, or dashboards, creating summaries, and classifying anomalies by severity and priority.
    Example: feed in a test-run log; the model clusters similar errors and marks the one that most likely causes the majority of failures.
  81. 85 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 6. TEST RESULT ANALYSIS

    Example model output for a test-run log:
    Error clusters
    • Cluster 1: Report page load timeout
    • Cluster 2: CSV export failure (disk full)
    • Cluster 3: Cascading assertion failures
    Most likely root cause (main failure driver): disk full on the test environment
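A rough, non-LLM baseline for the clustering step is to reduce each error message to a signature by masking variable parts (numbers, file names, paths) and then group identical signatures. The log lines and masking rules below are hypothetical. Grouping is the easy part; recognising that differently worded errors share a cause, or that a disk-full error explains later assertion failures, is where the model's and the tester's reasoning comes in.

```python
import re

# Hypothetical test-run log.
LOG = """\
12:00:01 ERROR CSV export failed: No space left on device (/tmp/report_17.csv)
12:00:02 ERROR AssertionError: expected file report_17.csv to exist
12:00:05 ERROR Timeout after 30s waiting for element #report-table
12:00:09 ERROR CSV export failed: No space left on device (/tmp/report_18.csv)
12:00:10 ERROR AssertionError: expected file report_18.csv to exist
12:00:14 ERROR Timeout after 30s waiting for element #report-table
"""


def signature(message: str) -> str:
    """Mask variable parts so that similar errors share one signature."""
    message = re.sub(r"\(.*?\)", "(<path>)", message)  # parenthesised paths
    return re.sub(r"\S*\d\S*", "<n>", message)          # any token containing a digit


def cluster_errors(log: str) -> dict[str, list[int]]:
    """Map each error signature to the log lines where it occurs (first-seen order)."""
    clusters: dict[str, list[int]] = {}
    for line_no, line in enumerate(log.splitlines(), start=1):
        match = re.search(r"\bERROR (.*)", line)
        if match:
            clusters.setdefault(signature(match.group(1)), []).append(line_no)
    return clusters


for sig, lines in cluster_errors(LOG).items():
    print(f"{len(lines)}x {sig} (lines {lines})")
```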
  82. 86 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING

    7. Testware creation and maintenance. Beyond test cases, LLMs can help draft or update test plans, test strategies, defect reports, or release notes.
  83. 87 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING: 7. TESTWARE CREATION AND MAINTENANCE

    Example: “Write a concise defect description from this log and screenshot.” The AI produces a clear ticket that, once reviewed, is ready for your tracking tool.
  84. 88 Sec. 1.2.1 KEY LLM CAPABILITIES RELEVANT TO SOFTWARE TESTING

    LLMs are versatile companions for testers: they read, reason, write, and translate between natural language, code, and even images. But remember: the quality of their outputs depends on how you ask (prompt engineering will be covered in Chapter 2), and the outputs still need careful human review. Used wisely, LLMs free testers from routine work and open space for deeper thinking about quality and risk.
  85. 90 Sec. 1.2.2 MAIN FORMS OF GENAI TESTING TOOLS

    Generative AI can appear in testing through two main forms of tools:
    1. AI chatbots or conversational assistants that you interact with directly, such as ChatGPT or Gemini.
    2. LLM-powered testing applications or software testing tools that quietly integrate an LLM behind the scenes to enhance traditional testing workflows.
    AI chatbot: a conversational agent that uses LLMs to process queries and generate human-like text responses, enabling interactive communication with users.
  86. 91 Sec. 1.2.2 MAIN FORMS OF GENAI TESTING TOOLS

    Both rely on the same underlying technology (language models), but they serve different purposes and offer different levels of automation. Understanding their roles helps testers know when to rely on a chatbot’s flexibility and when to benefit from a specialised testing platform.
  87. 92 Sec. 1.2.2 AI CHATBOTS

    An AI chatbot is a general-purpose conversational interface built around an LLM. It accepts natural-language prompts and produces natural-language responses, often in real time. Some chatbots can also interpret code, documents, or images, depending on their capabilities. Think of it as a knowledgeable colleague you can talk to at any moment, one who has read mountains of documentation and never tires of brainstorming.
  88. 93 Sec. 1.2.2 HOW TESTERS USE AI CHATBOTS

    How testers use them:
    • Quickly explore requirements or clarify ambiguities by pasting text and asking questions.
    • Brainstorm test ideas, e.g., “List edge cases for password validation.”
    • Translate or rephrase requirements into Gherkin syntax or structured test steps.
    • Summarise logs, reports, or defect descriptions.
    • Experiment interactively, refining prompts until the response fits the context.
  89. 94 Sec. 1.2.2 AI CHATBOTS: PROS AND CONS

    Chatbots provide flexibility but offer no built-in control over traceability, history, or integration with test management tools. They’re great for discovery but require human verification and adaptation before their outputs are used in real projects.
  90. 95 Sec. 1.2.2 LLM-POWERED TESTING APPLICATIONS

    These are specialised testing tools that integrate LLMs into their workflows. Unlike general chatbots, they use predefined prompts, structured templates, or pipelines to deliver more repeatable, auditable results.
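What distinguishes such tools from a chatbot is the fixed prompt and the checked output. The sketch below shows that pattern with a hypothetical prompt template and output format. `call_llm` is a placeholder for whichever model API a tool actually uses; here it returns a canned response so the example runs offline, and one of its test cases deliberately lacks an expected result.

```python
import json
from string import Template

# Predefined prompt: every run uses the same instructions; only the story varies.
PROMPT = Template(
    "You are a test designer. For the user story below, return JSON of the form "
    '{"test_cases": [{"title": str, "steps": [str], "expected": str}]}.\n'
    "User story: $story"
)

REQUIRED_KEYS = {"title", "steps", "expected"}


def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned response for this sketch."""
    return json.dumps({"test_cases": [
        {"title": "Request reset", "steps": ["Submit registered email"],
         "expected": "Reset email sent"},
        {"title": "Use expired link", "steps": ["Open expired link"]},  # no 'expected'
    ]})


def generate_test_cases(story: str) -> tuple[list[dict], list[str]]:
    """Fill the template, call the model, and split the output into valid cases and errors."""
    raw = call_llm(PROMPT.substitute(story=story))
    valid, errors = [], []
    for i, case in enumerate(json.loads(raw).get("test_cases", [])):
        missing = REQUIRED_KEYS - set(case)
        if missing:
            errors.append(f"case {i}: missing {sorted(missing)}")
        else:
            valid.append(case)
    return valid, errors


cases, problems = generate_test_cases("As a user, I can reset my password via email.")
print(len(cases), problems)  # 1 ["case 1: missing ['expected']"]
```

Flagging malformed output instead of passing it on silently is part of what makes such results auditable.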
  91. 96 Sec. 1.2.2 LLM-POWERED TESTING APPLICATIONS

    Examples include AI-assisted test case generators, automatic defect classifiers, log analysers, and tools that create synthetic test data.
  92. 97 Sec. 1.2.2 AI CHATBOT VS LLM-POWERED TESTING APPLICATION

    If a chatbot is like a friendly consultant, an LLM-powered app is like a workshop machine with safety guards and settings pre-configured for testing. It focuses the AI’s power on a specific purpose while maintaining consistency and traceability.
  93. 98 Sec. 1.2.2 HOW TESTERS USE LLM-POWERED TESTING APPLICATIONS

    How testers use them:
    • Test case generation: convert user stories or requirements into structured test artefacts.
    • Defect analysis: cluster similar bugs by text similarity or predict severity from descriptions.
    • Test data synthesis: produce realistic but anonymised data.
    • Test maintenance support: detect redundant or overlapping tests.
    • Root-cause hints: analyse failed runs and highlight likely causes.
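Detecting redundant tests and clustering similar bug reports both come down to measuring text similarity. Real tools may use LLM embeddings for this; the sketch below swaps in a much simpler measure, the Jaccard similarity of two word sets (shared words divided by all distinct words), to show the idea. The test names, step texts, and the 0.6 threshold are hypothetical.

```python
import re
from itertools import combinations


def words(text: str) -> set[str]:
    """Lower-cased alphanumeric words of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words two texts have in common (1.0 = same word set)."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0


def overlapping_pairs(tests: dict[str, str], threshold: float = 0.6):
    """Yield (name_a, name_b, score) for test pairs whose steps look redundant."""
    for (name_a, steps_a), (name_b, steps_b) in combinations(tests.items(), 2):
        score = jaccard(steps_a, steps_b)
        if score >= threshold:
            yield name_a, name_b, round(score, 2)


tests = {
    "test_login_valid": "open login page, enter valid user and password, click login",
    "test_login_ok": "open login page, enter valid user and password, press login",
    "test_export_csv": "open report page, click export, check csv file",
}
print(list(overlapping_pairs(tests)))  # [('test_login_valid', 'test_login_ok', 0.8)]
```

Word overlap misses paraphrases that share no words, which is where embedding-based or LLM-based similarity earns its place; a flagged pair is still only a candidate for a tester to merge or keep.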
  94. 99 Sec. 1.2.2 LLM-POWERED TESTING APPLICATIONS: PROS AND CONS

    The benefits of this approach include consistent outputs aligned with project templates, built-in history and traceability, and easier validation within established workflows. Its limitations are that the scope is narrower than a chatbot’s, you can’t ask open-ended questions, and the AI is confined to what the tool designers anticipated.
  95. 100 Sec. 1.2.2 MAIN FORMS OF GENAI TESTING TOOLS

    To sum up, AI chatbots and LLM-powered testing tools represent two complementary sides of GenAI in testing: one emphasises conversation and creativity, the other consistency and control. Used together, with human testers supervising both, they can accelerate understanding, reduce manual overhead, and uncover insights that might otherwise remain hidden in documents or logs.
  96. 101 Sec. 1.2.2 EXAMPLE

    (Workflow diagram. Participants: tester, AI chatbot, test case generator (LLM), test management system (Jira), CI/CD, AI-based defect classifier, tester/QA lead. Steps: identify the fingerprint login regression scope; brainstorm edge cases; generate structured test cases (format + validation); link test cases to Jira tickets; execute regression tests; cluster crashes and failures; identify root causes; decide whether each defect is valid, then either confirm it and create a Jira bug or discard it as a false positive.)
    For example, a tester preparing regression tests for a mobile banking app could use a chatbot to brainstorm: “What edge cases should I test for fingerprint login?” The tester could then switch to an LLM-powered test case generator to automatically produce cases in the project’s required format and link them to Jira tickets. Later, an AI-based defect classifier groups crash reports from different devices and highlights the common root cause. Together, these tools create a balanced workflow: human direction, conversational creativity, and structured automation.
  97. 102 KEY TAKEAWAYS – 1.2

    • Generative AI supports many testing activities but complements rather than replaces human judgment
    • LLMs assist in key testing tasks such as requirements analysis, test case creation, test data generation, and result analysis
    • The test oracle problem remains a major challenge; LLMs can help mitigate it but cannot define absolute correctness without clear requirements
    • LLMs enhance testing efficiency by automating repetitive tasks and generating insights from logs, data, and documentation
    • Testers interact with GenAI mainly in two ways: AI chatbots (conversational tools) and LLM-powered testing applications (integrated tools)
    • Combining chatbots with LLM-powered testing tools, under human supervision, provides a balanced and effective testing workflow
  98. 103 REFLECTION – 1.2

    1. In which testing activities would you trust LLM-generated outputs the most, and where would you require strict human monitoring?
    2. How can LLM-generated test data be reviewed to ensure it is realistic, diverse, and safe to use?
  99. 104 KEY TAKEAWAYS AND SUMMARY

    • AI technologies range from symbolic AI to machine learning, deep learning, and generative AI, each offering different capabilities for testing
    • LLMs work by converting text into tokens and embeddings, using a transformer architecture to understand context and generate probabilistic outputs
    • Multimodal and vision-language models extend testing capabilities by analysing visual and textual inputs together (e.g., GUI testing and defect detection)
    • LLMs can support key testing tasks, including requirements analysis, test case creation, test oracle support, test data generation, automation, and result analysis
    • Testers interact with GenAI through AI chatbots (flexible and exploratory) and LLM-powered testing tools (structured and repeatable), each serving different purposes
    • Effective use of GenAI requires careful prompting, validation of outputs, and a balanced approach combining human expertise with AI capabilities
  100. 105 REFLECTION AND KNOWLEDGE CHECK

    Answer these questions after completing the reading:
    1. How do tokenization and embeddings enable LLMs to understand and generate text?
    2. What are the key differences between symbolic AI, machine learning, deep learning, and generative AI in the context of testing?
    3. When would you choose a foundation, instruction-tuned, or reasoning LLM for a specific testing task?
    4. How can multimodal LLMs improve testing compared to text-only models?
    5. What is the test oracle problem, and how can techniques like A/B testing, back-to-back testing, and metamorphic testing help address it?
    6. How can LLMs support different stages of the testing process, from requirements analysis to test result evaluation?
    7. What are the main differences between AI chatbots and LLM-powered testing applications, and when would you use each?
    8. What risks are associated with using LLMs in testing, and how would you mitigate them?
    (You should answer using examples from your own projects where possible.)
  101. 106 REFERENCES

    • ISTQB® Certified Tester Specialist Level Testing with Generative AI (CT-GenAI) Syllabus, Version 1.1, 2026
  102. 107 FEEDBACK AND EVALUATION

    Learner feedback is collected to support continuous improvement of delivery and materials. Understanding is evaluated through:
    • A chapter quiz covering key concepts from this chapter
    • A Q&A session to clarify questions arising from the activities and quiz