


Chapter 1 – Introduction to Generative AI for Software Testing (ISTQBⓇ CT-GenAI v1.1). Reading Materials

Format: Reading Materials (self-study or guided reading)
Estimated Duration: 100 minutes

Target Audience: Software Testers, Test Automation Engineers, Test Analysts, Test Managers, Software Developers and professionals who need a solid understanding of Generative AI (GenAI) in testing – project managers, quality managers, software development managers, business analysts, IT directors and consultants, professionals preparing for ISTQBⓇ CT-GenAI certification

During this chapter, you will:
• Understand what GenAI and Large Language Models (LLMs) are, how they work and when to use them
• See how LLMs support software testing tasks such as requirements analysis, test case creation, and defect detection
• Learn how multimodal LLMs enhance testing through image and text understanding
• Explore how LLMs assist in test data generation, automation, and result analysis

Join Software Testing Hub via LinkedIn: https://www.linkedin.com/groups/16889021/
Join Software Testing Hub via Facebook: https://www.facebook.com/groups/746590458484807


Exactpro PRO

May 27, 2026


Transcript

  1. ISTQB® CT-GenAI TRAINING COURSE

    Chapter 1. Introduction to Generative AI for Software Testing
    Iuliia Emelianova, Dmitrii Degtiarenko
    BUILD SOFTWARE TO TEST SOFTWARE
    ISTQB® CT-GenAI COURSE 2026, V1.1
    exactpro.com
  2. Learning Activity Overview

    Title: Chapter 1 – Introduction to Generative AI for Software Testing (ISTQBⓇ CT-GenAI v1.1)
    Format: Reading Materials (self-study or guided reading)
    Estimated Duration: 100 minutes
    Target Audience: Software Testers, Test Automation Engineers, Test Analysts, Test Managers, Software Developers and professionals who need a solid understanding of Generative AI (GenAI) in testing – project managers, quality managers, software development managers, business analysts, IT directors and consultants, professionals preparing for ISTQBⓇ CT-GenAI certification
    Programme Context: This learning activity forms a part of the ISTQBⓇ CT-GenAI training programme and aligns with the syllabus version 1.1
    Engagement: During this chapter, you will:
    • Understand what GenAI and Large Language Models (LLMs) are, how they work and when to use them
    • See how LLMs support software testing tasks such as requirements analysis, test case creation, and defect detection
    • Learn how multimodal LLMs enhance testing through image and text understanding
    • Explore how LLMs assist in test data generation, automation, and result analysis

    ISTQB® CT-GenAI Training Course | Chapter 1. Introduction to Generative AI for Software Testing Page 2 of 34
  3. Cognitive Levels of Knowledge

    Each section corresponds to a certain cognitive level of knowledge, classified as follows:
    • K1: Remember
    • K2: Understand
    • K3: Apply

    Learning Objectives

    By the end of this learning activity, participants will be able to:
    • Recall different types of AI: symbolic AI, classical machine learning, deep learning, and generative AI
    • Explain the basics of generative AI and large language models
    • Distinguish between foundation, instruction-tuned and reasoning LLMs
    • Write and execute a given prompt addressing a test task using a multimodal LLM
    • Give examples of key LLM capabilities for test tasks
    • Compare interaction models when using GenAI for software testing
  4. Learning Structure

    This reading activity follows a structured learning flow:
    1. Introduction to Generative AI and key concepts (including tokenization, context window, and multimodal models) (Section 1.1)
    2. Overview of the AI spectrum: symbolic AI, classical machine learning, deep learning, and generative AI (Section 1.1.1)
    3. Fundamentals of LLMs: transformers, tokenization, embeddings, and probabilistic behaviour (Section 1.1.2)
    4. Types of LLMs: foundation, instruction-tuned, and reasoning models, and their use in testing (Section 1.1.3)
    5. Multimodal LLMs and vision-language models in software testing (Section 1.1.4)
    6. Practical applications of LLMs in testing (requirements analysis, test case creation, test oracles, and test data generation) and advanced testing support (automation, result analysis, and testware creation) (Section 1.2.1)
    7. Tools and usage: AI chatbots vs LLM-powered testing applications (Section 1.2.2)

    1.1 Generative AI Foundations and Key Concepts

    Generative Artificial Intelligence (GenAI) is a branch of AI that uses large, pre-trained models to create new content such as text, images, and code.

    Large Language Models (LLMs) are computer programs that use very large collections of language data to understand and produce text in a way that is similar to the way humans do. In other words, they are GenAI systems pre-trained on huge text collections so they can understand context and respond to natural language inputs or prompts.

    Key terms include:
    • Tokenization. Splitting text into pieces (or tokens) so the model can process it.
    • Context window. How much text, in tokens, the model can “see” at once.
  5. • Multimodal models. Models that handle more than one kind of data (e.g., text, pictures and audio).

    Think of GenAI as an extremely knowledgeable assistant that has read almost everything on the internet. It doesn’t just repeat information; it can generate something new, like a recipe, a poem, or a test case, that is, a set of preconditions, inputs, actions (where applicable), expected results and postconditions, developed based on test conditions.

    To understand your request, it chops your text into Lego-like bricks called tokens. The context window is how many Lego bricks it can keep on the table while building an answer. A multimodal model is like a person who can read, look at photos, and listen to music all at once, then explain how they fit together.

    In software testing, LLMs can support tasks across the entire test process, such as:
    • reviewing and improving acceptance criteria (the criteria that a work product must satisfy to be accepted by the stakeholders);
    • generating test cases or test scripts (sequences of instructions for the execution of tests);
    • identifying potential defects and analysing defect patterns;
    • generating synthetic test data (data needed for test execution);
    • supporting documentation generation.

    Example 1.1. Using Generative AI to Derive Test Cases from Requirements

    Imagine pasting this requirement into an AI tool: “If a user enters an invalid password three times, the system must lock the account for 15 minutes.”

    Here’s how GenAI might help a tester:
    1. It splits the sentence into tokens so it can “read” the rule accurately.
    2. It interprets the meaning: three wrong attempts, lock the account, duration is 15 minutes.
    3. It proposes draft test ideas: Enter a wrong password once – expect no lock. Enter it twice – still no lock. Enter it three times – expect a lock message and the timer starts. After 15 minutes, try again – login should work.
  6. 4. It can even suggest edge cases, like “What happens if you try a fourth time after two minutes?”

    This shows that GenAI doesn’t just copy text; it also reasons about the rule and turns it into concrete checks a tester could run.

    1.1.1 AI Spectrum: Symbolic AI, Classical Machine Learning, Deep Learning, and Generative AI (K1)

    Artificial Intelligence (AI) isn’t a single technology; it’s a spectrum of approaches, each with its own way of solving problems. Understanding them helps you see where Generative AI fits and why it’s special for software testing. Below are the main categories, with what they mean, how to picture them, and how they show up in testing:

    • Symbolic AI is early, rule-based AI that represents knowledge as symbols and logic rules. It’s like a chef with a strict recipe book: if you see X, always do Y. No improvising.

    Example 1.2. Symbolic AI in Software Testing

    Writing an if/else tree to check login rules (if password is wrong, then display error).
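    The if/else idea in Example 1.2 can be sketched in a few lines. This is a minimal illustration, not a real authentication routine: the function name `check_login`, its parameters, and the returned messages are hypothetical, and the lock rule is borrowed from the requirement in Example 1.1.

```python
def check_login(password_ok: bool, failed_attempts: int) -> str:
    """Rule-based (symbolic) decision: every outcome follows from an explicit rule."""
    # Rule borrowed from Example 1.1: three failed attempts lock the account.
    if failed_attempts >= 3:
        return "account locked"
    # Rule from Example 1.2: a wrong password always produces an error.
    if not password_ok:
        return "display error"
    return "login successful"


print(check_login(password_ok=False, failed_attempts=0))
print(check_login(password_ok=True, failed_attempts=0))
print(check_login(password_ok=True, failed_attempts=3))
```

    Nothing here is learned from data: the behaviour changes only when someone edits the rules, which is what distinguishes this approach from the categories that follow.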
  7. • Classical Machine Learning is a data-driven approach in which algorithms learn from data or experience but still rely on someone to define useful features, that is, individual measurable attributes of the input data used for training by an ML algorithm and for prediction by an ML model. This one is like a trainee chef who studies many dishes but still needs a teacher to point out the important ingredients.

    Example 1.3. Classical Machine Learning in Software Testing

    Training a model to predict which code modules are most likely to contain defects, based on past bug statistics.

    • Deep Learning uses large neural networks to automatically discover patterns in huge datasets (text, images, video, sound). It’s like a creative cook who samples millions of meals and figures out flavour rules on their own.

    Example 1.4. Deep Learning in Software Testing

    Scanning thousands of screenshots to detect layout problems, or classifying log messages into “error,” “warning,” and “info.”
  8. • Generative AI is a branch of deep learning that doesn’t just recognise patterns; it creates new content (text, images, video, audio, code) based on what it has learned. LLMs are its most prominent type. Here we have an inventive chef who takes everything they’ve learned and writes brand-new recipes.

    Example 1.5. GenAI in Software Testing

    From a user story, GenAI can draft test cases, propose boundary values, or even write an automated test script.

    AI has grown from strict rule engines to models that learn, to systems that create. Generative AI’s strength is its vast pre-training: you can apply it to testing tasks right away without building a model from scratch. That power also means you must understand its limits and risks, which we’ll explore in later chapters.

    1.1.2 Basics of Generative AI and LLMs (K2)

    Generative AI is powered by a family of models called Large Language Models (LLMs). These models are built on a special type of deep-learning architecture called the transformer. They’re trained on enormous collections of text (books, articles, code, websites) so they can learn the structure and meaning of language.

    Some lighter versions, called Small Language Models (SLMs), use the same principles but have fewer parameters. They’re faster and easier to run, but usually less capable.

    Before an LLM can “understand” text, it must translate words into numbers it can work with. It does this in two key steps:

    1. Tokenization. The model breaks a sentence into small pieces called tokens. A token might be a whole word (“tokenization”), a part of a word (“token” and “ization”), or even punctuation. As mentioned before, think of tokens as Lego bricks: the model doesn’t see a sentence as a smooth wall of text; it sees a row of bricks it can rearrange and analyse.
  9. 2. Embeddings. Once text is tokenized, every token is turned into a long vector of numbers that captures its semantic, syntactic, and contextual relationships with other tokens. Similar words (for example, bug and defect) end up close together in this high-dimensional “semantic map of meaning.” This is how the model keeps track of context and nuance.
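    The idea that similar words “end up close together” can be made concrete with cosine similarity, a common way to compare embedding vectors. The sketch below is illustrative only: the three-dimensional vectors are invented for this example (real embeddings have hundreds or thousands of dimensions and come from a trained model), and the word “banana” is added purely as an unrelated contrast.

```python
import math

# Toy embeddings; the numbers are invented for illustration only.
EMBEDDINGS = {
    "bug": [0.90, 0.10, 0.05],
    "defect": [0.85, 0.15, 0.10],
    "banana": [0.05, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


close = cosine_similarity(EMBEDDINGS["bug"], EMBEDDINGS["defect"])
far = cosine_similarity(EMBEDDINGS["bug"], EMBEDDINGS["banana"])
print(f"bug vs defect: {close:.3f}")
print(f"bug vs banana: {far:.3f}")
```

    With these toy vectors, “bug” and “defect” score much higher than “bug” and “banana”, which is the sense in which related terms sit near each other on the semantic map.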
  10. Example 1.6. Tokenization and Embeddings

    Text: LLM must translate words into numbers it can work with.
    Tokenized text: LLM must translate words into numbers it can work with.
    Token IDs: 7454, 44, 2804, 24888, 6391, 1511, 8663, 480, 665, 1101, 483, 13, 220
    Embeddings:
    [[0.021, -0.443, 0.287, 0.004, ..., -0.118],
     [-0.023, -0.111, 0.043, 0.142, ..., 0.237],
     ...
     [0.521, 0.049, -0.278, -0.309, ..., -0.538]]

    The transformer architecture lets the model look at all tokens in a sentence at once, figure out which ones depend on each other, and predict the most likely “next token.” That’s how an LLM generates text that feels fluent and logical.

    However, the model doesn’t store facts the way a database does. Instead, it works with probabilities. Given the same input twice, it might not give an identical answer. This non-deterministic behaviour comes from sampling different but plausible next tokens.

    Another important concept is the context window: the maximum number of tokens the model can consider at one time. A bigger window allows it to handle longer documents, such as large test logs, but it also increases memory and processing cost.
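    The non-deterministic behaviour described above can be imitated with a few lines of sampling code. This is a simplified sketch, not how any particular LLM is implemented: the candidate tokens and their scores are invented, and the `temperature` parameter (a setting many LLM interfaces expose, though not discussed in this chapter) is included as an assumption to show how sampling can be made more or less varied.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(candidates, scores, temperature=1.0, rng=random):
    """Pick one candidate at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


candidates = ["error", "warning", "message"]  # hypothetical next tokens
scores = [2.0, 1.5, 0.5]                      # invented model scores

rng = random.Random(7)  # seeded only to make this demo repeatable
samples = [sample_next_token(candidates, scores, rng=rng) for _ in range(10)]
print(samples)
```

    Because each token is drawn from a probability distribution rather than looked up, two runs with the same prompt can diverge after the first differing token. For testers, this is why a single LLM response should not be treated as a reproducible result.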
  11. Example 1.7. How LLMs Work: Tokenization, Embeddings, and Test Case Generation

    Suppose you’re testing a shopping-cart application and want ideas for boundary tests. You paste this requirement into an LLM: “A customer can add up to 50 items to the cart. If they try to add more, the system shows an error.”

    Here’s what happens behind the scenes:
    1. The model splits the sentence into tokens:
       A customer can add up to 50 items to the cart. If they try to add more, the system shows an error.
       [32, 6130, 649, 923, 709, 311, 220, 1135, 3673, 311, 279, 7558, 13, 1442, 814, 1456, 311, 923, 810, 11, 279, 1887, 5039, 459, 1493, 13]
    2. Each token is converted into an embedding, a point in its “semantic map”.
    3. The transformer looks at relationships: “50 items” relates to “add” and “error”.
    4. Based on those links, it predicts and writes useful suggestions:
       • Add 49 items - expect success.
       • Add 50 items - expect success.
       • Add 51 items - expect error.
       • After the error, remove a number of items and then verify you can still add them back as long as total ≤ 50.

    With a well-crafted prompt, you can even ask the model to produce these as formal test cases or Gherkin scenarios.
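    The boundary suggestions from Example 1.7 translate directly into executable checks. The sketch below assumes a hypothetical `Cart` class with a 50-item limit, written here only so the checks have something to run against; in a real project the tests would target the actual application.

```python
class CartFullError(Exception):
    """Raised when an item is added to a cart that is already at its limit."""


class Cart:
    """Hypothetical stand-in for the system under test."""
    MAX_ITEMS = 50

    def __init__(self):
        self.items = []

    def add(self, item):
        if len(self.items) >= self.MAX_ITEMS:
            raise CartFullError("cart limit reached")
        self.items.append(item)

    def remove_last(self):
        self.items.pop()


def cart_with(count):
    """Build a cart that already holds `count` items."""
    cart = Cart()
    for i in range(count):
        cart.add(f"item-{i}")
    return cart


def test_49_and_50_items_succeed():
    assert len(cart_with(49).items) == 49
    assert len(cart_with(50).items) == 50


def test_51st_item_is_rejected():
    cart = cart_with(50)
    try:
        cart.add("one-too-many")
    except CartFullError:
        pass
    else:
        raise AssertionError("expected the 51st item to be rejected")
    assert len(cart.items) == 50


def test_can_add_again_after_removal():
    cart = cart_with(50)
    cart.remove_last()
    cart.add("replacement")  # total is back to 50, so this must succeed
    assert len(cart.items) == 50
```

    The functions follow the `test_` naming convention, so a runner such as pytest would collect them automatically; they can also be called directly.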
  12. 1.1.3 Foundation, Instruction-Tuned, and Reasoning LLMs (K2)

    LLMs are usually developed through several stages of training, each adding new capabilities. Knowing these stages helps testers decide which kind of model to use for a given testing task.

    • Foundation LLMs (or Base LLMs). Foundation models are general-purpose large models trained on extremely broad and diverse data sets: billions of words of text, code samples, and sometimes images or audio. Their goal is to learn the structure of language and many topics at once. They can handle a wide variety of tasks (writing, summarising, classifying) but they are not specialised for any one domain or style of answer. To meet a specific need, they often require extra alignment or fine-tuning. In simpler terms, they are like a student who has read an entire library but hasn’t yet learned how to sit an exam. They know a lot, but their answers may be long, unfocused, or off-target.

    Example 1.8. Foundation LLM in Testing

    Ask a foundation LLM, “What does this login requirement mean for testing?” and it might give a broad discussion about security, usability, or password policies, but not a clean checklist of tests.

    • Instruction-tuned LLMs. Instruction-tuned models are foundation LLMs trained to follow instructions, often reinforced by feedback to encourage correct answers. This step improves their ability to understand user intent, follow directions, and provide answers in the requested format. The training focuses on clarity, task adherence, and producing concise, relevant information. Imagine our library-loving student taking a course called “How to answer questions clearly.” Now they know to stay on topic, write in bullets if asked, and stop when the job is done.
  13. Example 1.9. Instruction-tuned LLM in Testing

    Give an instruction-tuned model a user story and say, “Write three test scenarios.” It will likely return short, well-structured items, e.g., valid login, invalid password, account lock after three tries.

    • Reasoning LLMs. Reasoning models are instruction-tuned LLMs that undergo additional training or optimisation to improve structured thinking. They focus on skills such as logical inference, multi-step problem solving, and chain-of-thought reasoning. This makes them better at tasks where intermediate steps and careful synthesis are required, including technical or analytical problems. Let’s continue with our analogy. The student now joins a logic and problem-solving workshop. They learn to break puzzles into smaller parts, show their working, and check each step before deciding.

    Example 1.10. Applying Reasoning LLM to Boundary Testing

    Present a reasoning LLM with: “A shopper gets a 10% discount if they buy more than 3 items and total cost exceeds $100; shipping is free if total > $150. Suggest boundary test cases.”

    It can separate the conditions, reason about thresholds ($100 vs $150), and deliver an organised matrix of tests, rather than one jumbled paragraph.
  14. Boundary test matrix for Example 1.10:

    TEST ID  ITEMS  TOTAL ($)  DISCOUNT EXPECTED  FREE SHIPPING EXPECTED  NOTES
    TC-01    3      99.99      ❌                 ❌                      Below all thresholds
    TC-02    3      100.00     ❌                 ❌                      Cost boundary (100)
    TC-03    3      100.01     ❌                 ❌                      Items boundary not met
    TC-04    4      99.99      ❌                 ❌                      Cost just below discount
    TC-05    4      100.00     ❌                 ❌                      Cost boundary (100)
    TC-06    4      100.01     ✅                 ❌                      Discount boundary crossed
    TC-07    4      149.99     ✅                 ❌                      Below free shipping
    TC-08    4      150.00     ✅                 ❌                      Shipping boundary
    TC-09    4      150.01     ✅                 ✅                      Free shipping boundary crossed
    TC-10    3      150.01     ❌                 ✅                      Free shipping only
    TC-11    5      150.00     ✅                 ❌                      More items, shipping boundary
    TC-12    5      200.00     ✅                 ✅                      All conditions satisfied
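    A matrix like this should itself be reviewed before it is trusted. One way to do that is to encode the two business rules from Example 1.10 and check every row against them. The sketch below is a minimal illustration; the function name `expected_outcome` and the tuple layout are choices made for this example.

```python
def expected_outcome(items, total):
    """Apply the rules from Example 1.10 and return (discount, free_shipping)."""
    discount = items > 3 and total > 100   # more than 3 items AND total exceeds $100
    free_shipping = total > 150            # total exceeds $150
    return discount, free_shipping


# Rows copied from the matrix: (test id, items, total, discount, free shipping)
MATRIX = [
    ("TC-01", 3, 99.99, False, False),
    ("TC-02", 3, 100.00, False, False),
    ("TC-03", 3, 100.01, False, False),
    ("TC-04", 4, 99.99, False, False),
    ("TC-05", 4, 100.00, False, False),
    ("TC-06", 4, 100.01, True, False),
    ("TC-07", 4, 149.99, True, False),
    ("TC-08", 4, 150.00, True, False),
    ("TC-09", 4, 150.01, True, True),
    ("TC-10", 3, 150.01, False, True),
    ("TC-11", 5, 150.00, True, False),
    ("TC-12", 5, 200.00, True, True),
]

# Collect any rows whose expected values disagree with the rules.
mismatches = [
    test_id
    for test_id, items, total, discount, shipping in MATRIX
    if expected_outcome(items, total) != (discount, shipping)
]
print("rows that disagree with the rules:", mismatches)
```

    An empty list means the generated matrix is consistent with the stated rules. It does not prove the rules themselves are what the stakeholders intended, which remains a human judgment.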
  15. Why this matters for testers (see Table below).

    LLM TYPE           CORE STRENGTH                                WHEN TO USE
    Foundation         Broad knowledge, flexible but unspecialised  Early exploration, summarising requirements
    Instruction-Tuned  Follows directions, produces tidy artefacts  Generating test cases, checklists, defect summaries
    Reasoning          Handles complex logic and planning           Analysing business rules, prioritising cases, explaining failures

    Choosing the right type and knowing its limits helps you get outputs that fit your testing goals while avoiding over-reliance on a model that isn’t built for deep reasoning.

    1.1.4 Multimodal LLMs and Vision-Language Models (K2)

    Large Language Models originally worked only with text. But many real-world tasks need more than words: they need pictures, diagrams, audio, even video. That’s where multimodal LLMs come in, because they extend the same transformer technology to handle several types of input and combine them in a single reasoning process.

    A multimodal LLM is a model trained to process and relate information from different data modalities (for example, text, images, sound, or video). Each type of input is first converted into a numerical representation (or embedding) appropriate for that data. The model then aligns these embeddings in a shared space, letting it understand how an image caption relates to the picture, or how a spoken command links to on-screen text.
  16. An analogy here would be an investigator who doesn’t just read witness statements but also studies photos, listens to recordings, and watches CCTV footage, and then pieces everything together to solve a case.

    Vision-language models are a specialised subset of multimodal LLMs trained mainly on text–image pairs. They learn how visual elements connect to written descriptions and can answer questions or generate text about an image. Such a model is like a bilingual person fluent in both “picture language” and “written language”, able to translate between them or discuss both at once.

    Software testers often work with visual artefacts (screenshots, mock-ups, wireframes, charts) as well as textual specs. Multimodal models can bridge the gap:
    • GUI analysis: Supply a screenshot of an app and ask, “List any accessibility issues you notice.”
    • Wireframe (a simple visual outline of a screen) & acceptance criteria: Give a page mock-up plus a short story; ask the model to propose acceptance criteria that match what’s on the screen (for example, which input fields exist, what happens when you click each button, and which navigation flows need to be tested).
    • Image-based defect detection: Compare an expected screen image with an actual run; the model can highlight missing buttons or colour mismatches.
    • Hybrid reasoning: Mix logs, a screen capture, and a requirement paragraph to explain why a test failed.

    Example 1.11. Using Multimodal LLM for GUI Testing

    A tester uploads a screenshot of a login page (with “Username,” “Password,” and a misaligned “Login” button) and a text snippet: “The login button should be horizontally centred.”

    A multimodal LLM can:
    • recognise the button in the image;
    • check its position against the requirement;
  17. • reply: “The button is shifted 20px to the right; it should be centred.”

    This blend of visual understanding and textual reasoning lets testers detect issues that would be hard for a text-only model.

    Key Takeaways – 1.1
    • GenAI uses large pre-trained models to create new content such as text, code, and images
    • LLMs are GenAI systems trained on vast text data to understand and generate human-like language
    • Tokenization and embeddings convert text into numerical representations that models can process
    • LLMs are probabilistic and non-deterministic, meaning the same input can produce different outputs
    • The context window limits how much information the model can process at once, impacting performance on long inputs
    • Different LLM types serve different purposes:
      ◦ Foundation models provide broad knowledge but may lack focus
      ◦ Instruction-tuned models follow directions and produce structured outputs
      ◦ Reasoning models handle complex logic and multi-step problem solving
    • Multimodal LLMs extend capabilities beyond text to include images, audio, and other data types
    • LLMs can support software testing by generating test ideas, analysing requirements, and assisting with complex scenarios, but still require human oversight
  18. Reflection – 1.1

    1. How would you decide which type of LLM (foundation, instruction-tuned, or reasoning) is most appropriate for a specific testing task in your project?
    2. In what situations could a multimodal LLM provide clear advantages over a text-only model in your testing workflow?
    3. What risks might arise from relying on probabilistic LLM outputs, and how would you mitigate them as a tester?

    1.2 Leveraging Generative AI in Software Testing: Core Principles

    Generative AI isn’t just an academic curiosity; it can become a powerful tool for testers. Because LLMs understand language, code, and even images, they can support almost every phase of the testing process. They don’t replace human judgment, but they can take over repetitive chores, spark new ideas, and speed up analysis.

    Broadly, testers interact with GenAI in two ways:
    • Conversational tools (AI chatbots): a friendly interface where you type or speak a question and get an instant answer.
    • LLM-powered testing applications: testing tools that integrate an LLM behind the scenes, giving you features such as automated test case generation or defect clustering.

    Understanding what LLMs are actually good at helps you decide when (and when not) to call on them.
  19. 1.2.1 Key LLM Capabilities for Test Tasks (K2)

    Below are some of the most useful capabilities LLMs can bring to software testing.

    1. Requirements analysis and improvement. LLMs can read requirements, user stories, or other “test basis” documents, looking for ambiguities, contradictions, or missing information. They can generate meaningful questions to help clarify requirements with stakeholders.

    Example 1.12. Using LLM for Requirements Analysis

    When presented with a paragraph about password rules, the AI, like a skilled proof-reader, spots unclear wording (“lock the account after several failed attempts”: how many exactly?) and suggests clarifying it.

    2. Test case creation support. LLMs can draft test cases from requirements or user stories, suggesting preconditions, steps, and expected results.

    Example 1.13. Using LLM for Test Case Creation

    You give it a story like “As a user, I can reset my password via email.” The AI proposes test cases:
    • request reset,
    • follow link,
    • use expired link,
    • try with the wrong email.
  20. 3. Test oracle generation. A test oracle is the source of truth that tells you whether a test passed or failed; in other words, it provides the expected result. Without a reliable oracle, even the best test execution leaves you unsure: “Is this behaviour correct, or is it a bug?”

    The oracle problem is a long-standing challenge in software testing. Sometimes requirements are incomplete, ambiguous, or missing edge cases. This uncertainty exists in conventional testing and remains a challenge even when AI is involved, because models themselves don’t magically know the “true” answer.

    Imagine you’re grading essays without an answer key. You can guess which ones are good, but you don’t always know with certainty. That’s the everyday life of a tester facing the oracle problem.

    Test oracles require interpretation and should be sensitive enough to flag genuinely unusual behaviour without overwhelming testers with minor issues. They function similarly to fraud-detection systems, IT monitoring platforms, or market surveillance tools.

    For complex or probabilistic systems, establishing a test oracle may be difficult without access to the “ground truth”, the actual real-world result that the system aims to predict. In some cases, expected results can be defined within limits through expert consultation, though experts may disagree or be unwilling to have their judgment automated. Issues such as varying competence, differing interpretations, and human uncertainty must be considered.

    Several testing techniques can help mitigate the oracle problem, including A/B testing, back-to-back testing, and metamorphic testing.
    • Back-to-back (or differential) testing: Run the same input on two different implementations (e.g., legacy system vs new system, or two models) and compare outputs. AI can help automate the comparison or highlight suspicious differences.
    • A/B testing: Present different user groups with different versions (A vs B), then analyse outcomes. An LLM can assist in designing the test plan, collecting feedback, or spotting anomalies in results.
    • Metamorphic testing: Define relationships between inputs and outputs that should always hold true. For example, doubling an input should double an output. LLMs can help generate these relationships or check them against results.
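    Two of the techniques above are easy to sketch in code because neither needs a known “correct” answer. The example below is illustrative only: `order_total` and `legacy_order_total` are hypothetical implementations invented for this sketch, standing in for a new system and the legacy system it replaces.

```python
def order_total(quantity, unit_price_cents):
    """Hypothetical new implementation under test."""
    return quantity * unit_price_cents


def legacy_order_total(quantity, unit_price_cents):
    """Hypothetical legacy implementation used as a reference."""
    return sum(unit_price_cents for _ in range(quantity))


def doubling_relation_holds(func, quantity, unit_price_cents):
    """Metamorphic relation: doubling the quantity must double the total."""
    base = func(quantity, unit_price_cents)
    doubled = func(2 * quantity, unit_price_cents)
    return doubled == 2 * base


def back_to_back_mismatches(new_func, old_func, inputs):
    """Back-to-back testing: list the inputs on which the two versions disagree."""
    return [args for args in inputs if new_func(*args) != old_func(*args)]


inputs = [(0, 499), (1, 499), (3, 1250), (50, 99)]

relation_ok = all(doubling_relation_holds(order_total, q, p) for q, p in inputs)
mismatches = back_to_back_mismatches(order_total, legacy_order_total, inputs)

print("metamorphic relation holds:", relation_ok)
print("back-to-back mismatches:", mismatches)
```

    Neither check states what the total for a given order should be. The metamorphic check only compares the system with itself, and the back-to-back check only compares two versions, so a defect shared by both implementations would go unnoticed.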
  21. The oracle problem is one of the hardest in testing because no automation, AI or not, can conjure up a “ground truth” that isn’t specified or agreed on by stakeholders. What GenAI adds is the ability to assist in mitigating the problem: by proposing expectations, highlighting ambiguities, and supporting structured approaches like back-to-back, A/B, or metamorphic testing. But ultimate responsibility still lies with the tester to judge correctness.

    Example 1.14. Using LLM for Test Oracle Generation

    Suppose a tax calculator should apply different rates depending on income brackets. The requirement document only says “progressive tax applies” but doesn’t list exact thresholds. Different approaches could be applied here, e.g.:

    • A foundation model could propose typical brackets (but you must validate them):

      ANNUAL INCOME    TAX RATE
      Up to 25,000     10%
      25,001 – 75,000  20%
      Above 75,000     30%

    • A reasoning model might help set up metamorphic checks: “If tax on $20,000 is X, then tax on $40,000 should be greater than X.”
    • Back-to-back testing could compare results between the old system and the new one.

    4. Test data generation. LLMs can create datasets, set boundary values, and create different combinations of test data, including synthetic data that mimics production without exposing sensitive information.
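    Once thresholds are agreed, boundary values can be derived mechanically. The sketch below assumes the illustrative brackets proposed in Example 1.14, which are unvalidated placeholders rather than real tax rules, and it only looks up the rate for an income. How the tax amount is calculated from that rate is not specified in the example, so it is deliberately left out.

```python
# Illustrative brackets from Example 1.14: (upper bound, rate).
# These are unvalidated placeholders; None means "no upper bound".
BRACKETS = [(25_000, 0.10), (75_000, 0.20), (None, 0.30)]


def rate_for(income):
    """Return the rate of the bracket that the income falls into."""
    for upper, rate in BRACKETS:
        if upper is None or income <= upper:
            return rate
    raise ValueError("brackets must end with an open-ended bracket")


def boundary_incomes():
    """Generate values just below, on, and just above each threshold."""
    values = set()
    for upper, _ in BRACKETS:
        if upper is not None:
            values.update({upper - 1, upper, upper + 1})
    return sorted(values)


test_data = [(income, rate_for(income)) for income in boundary_incomes()]
for income, rate in test_data:
    print(f"income {income:>6} -> expected rate {rate:.0%}")

# Metamorphic-style sanity check: the rate never decreases as income grows.
rates = [rate for _, rate in test_data]
rates_never_decrease = all(a <= b for a, b in zip(rates, rates[1:]))
```

    The generated pairs are only as trustworthy as the bracket table they come from, so the table still needs confirmation from stakeholders before the data is used as an oracle.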
  22. Example 1.15. Using LLM for Test Data Generation

    A tester writes the following prompt: “Generate 20 fake customer profiles with valid but diverse phone numbers and emails.” The LLM creates a dataset of 20 customer profiles as shown below.

    #   NAME           PHONE NUMBER      EMAIL
    1   Alex Morgan    +1 415-555-0134   [email protected]
    2   Priya Shah     +91 98765 43210   [email protected]
    ...
    19  Pedro Alvarez  +52 55 5555 4321  [email protected]
    20  Jonas Nilsson  +46 70 555 9921   [email protected]

    5. Test automation support. LLMs can help generate test scripts from a description and improve existing test scripts by suggesting changes and identifying appropriate test design techniques.

    Example 1.16. Using LLM for Test Automation

    Ask an LLM, “Generate a test script in Python based on the following description: Log in as admin then open report page then export CSV.” The AI produces a runnable script:
  23. Here’s a simple, realistic Python test script that follows your flow:

    Log in as admin → open report page → export CSV

    I’ll assume a web UI test using Selenium + pytest, which is the most common setup. Everything is clearly marked so you can adapt URLs, selectors, or credentials.

    Example: Selenium + pytest test script

        import os
        import time
        import pytest
        from selenium import webdriver
        from selenium.webdriver.common.by import By
        from selenium.webdriver.chrome.options import Options
        …………………

    6. Test result analysis. LLMs can help analyse test results by reviewing logs, outputs, or dashboards, creating summaries, and classifying anomalies by severity and priority.

    Example 1.17. Using LLM for Test Result Analysis

    Feed in a test-run log:
  24. The model clusters similar errors and marks the one that likely causes most failures:

    🔹 Error Clusters
    Cluster 1: Report Page Load Timeout
    Cluster 2: CSV Export Failure (Disk Full)
    Cluster 3: Cascading Assertion Failures

    🚨 Most Likely Root Cause (Main Failure Driver)
    🟥 Disk Full on Test Environment

    7. Testware creation and maintenance. Beyond test cases, LLMs can help draft or update plans, strategies, defect reports, or release notes.

    Example 1.18. Using LLM for Testware Creation

    A tester uploads a screenshot of a webpage with an error, a log file and a text snippet: “Write a concise defect description from this log and screenshot.”
  25. AI produces a clear ticket ready for the tracking tool:

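Returning to the error clustering in Example 1.17: a first pass over a test-run log can be approximated without a model. The following is a minimal sketch, assuming plain-text log lines that carry an `ERROR` level marker. The sample log, the `signature` helper and the normalisation rules are all hypothetical and invented for illustration. The sketch swaps in simple regex normalisation for the LLM, and it only groups and counts errors; deciding which cluster is the root cause is left to the model or the tester.

```python
import re
from collections import Counter

# Hypothetical log lines, invented for illustration only.
SAMPLE_LOG = """\
2026-01-10 10:00:01 ERROR TimeoutError: report page did not load within 30s
2026-01-10 10:00:07 ERROR OSError: [Errno 28] No space left on device: '/tmp/export_101.csv'
2026-01-10 10:00:08 ERROR AssertionError: expected file export_101.csv to exist
2026-01-10 10:01:12 ERROR OSError: [Errno 28] No space left on device: '/tmp/export_102.csv'
2026-01-10 10:01:13 ERROR AssertionError: expected file export_102.csv to exist
2026-01-10 10:02:40 INFO  test session finished
"""


def signature(line):
    """Reduce an ERROR line to a signature; return None for other lines."""
    match = re.search(r"ERROR\s+(.*)", line)
    if match is None:
        return None
    message = match.group(1)
    message = re.sub(r"'[^']*'", "'<str>'", message)  # quoted paths and values
    message = re.sub(r"\d+", "<n>", message)          # ids, durations, codes
    return message


def cluster_errors(log_text):
    """Count how many ERROR lines share each normalised signature."""
    signatures = (signature(line) for line in log_text.splitlines())
    return Counter(s for s in signatures if s is not None)


clusters = cluster_errors(SAMPLE_LOG)
for sig, count in clusters.most_common():
    print(count, sig)
```

A summary like this can also be pasted into a prompt in place of the raw log, which keeps the input short and makes the model's clusters easier to check against the counts.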
  26. LLMs are versatile companions for testers: they read, reason, write,

    and translate between natural language, code, and even images. But remember: the quality of their outputs depends on how you ask (prompt engineering will be covered in Chapter 2), and those outputs still need careful human review. Used wisely, they free testers from routine work and open space for deeper thinking about quality and risk.

    1.2.2 AI Chatbots and LLM-Powered Testing Applications (K2)

    Generative AI appears in testing through two main kinds of tools:
    1. AI chatbots or conversational assistants you interact with directly, like ChatGPT or Gemini.
    2. LLM-powered testing applications or software testing tools that quietly integrate an LLM behind the scenes to enhance traditional testing workflows.

    Both rely on the same underlying technology (language models), but they serve different purposes and levels of automation. Understanding their roles helps testers know when to rely on a chatbot’s flexibility and when to benefit from a specialised testing platform.

    AI Chatbots

    An AI chatbot is a general-purpose conversational interface built around an LLM. It accepts natural-language prompts and produces natural-language responses, often in real time. Some can also interpret code, documents, or images, depending on their capabilities. Think of it as a knowledgeable colleague you can talk to at any moment, one who has read mountains of documentation and never tires of brainstorming.

    How testers use it:
  27. •​ Quickly explore requirements or clarify ambiguities by pasting text

    and asking questions.
    • Brainstorm test ideas: “List edge cases for password validation.”
    • Translate or rephrase requirements into Gherkin syntax or structured test steps.
    • Summarise logs, reports, or defect descriptions.
    • Experiment interactively (refining prompts until the response fits the context).

    Chatbots provide flexibility but no built-in control over traceability, history, or integration with test management tools. They’re great for discovery but require human verification and adaptation before outputs are used in real projects.

    LLM-Powered Testing Applications

    These are specialised testing tools that integrate LLMs into their workflows. Unlike general chatbots, they use predefined prompts, structured templates, or pipelines to deliver repeatable, auditable results. Examples include AI-assisted test case generators, automatic defect classifiers, log analysers, or tools that create synthetic test data. If a chatbot is like a friendly consultant, an LLM-powered app is like a workshop machine with safety guards and settings pre-configured for testing. It focuses the AI’s power toward a specific purpose while maintaining consistency and traceability.

    How testers use it:
    • Test case generation: Convert user stories or requirements into structured test artefacts.
    • Defect analysis: Cluster similar bugs by text similarity or predict severity from descriptions.
    • Test data synthesis: Produce realistic but anonymised data.
    • Test maintenance support: Detect redundant or overlapping tests.
    • Root-cause hints: Analyse failed runs and highlight likely causes.

    The benefits of this technology include consistent outputs aligned with project templates, built-in history and traceability, and easier validation within established workflows. At the same time, the limitations of this approach are as
  28. follows: the scope is narrower than a chatbot’s, you can’t

    ask open-ended questions, and the AI is confined to what the tool designers anticipated.

    To sum up, AI chatbots and LLM-powered testing tools represent two complementary sides of GenAI in testing. One emphasises conversation and creativity, the other consistency and control. Used together, with human testers supervising both, they can accelerate understanding, reduce manual overhead, and uncover insights that might otherwise remain hidden in documents or logs.

    Example 1.19. Using AI Chatbots and LLM-Powered Testing Applications in Testing
    A tester preparing regression tests for a mobile banking app could use a chatbot to brainstorm: “What edge cases should I test for fingerprint login?” The tester could then switch to an LLM-powered test case generator to automatically produce cases in the project’s required format and link them to Jira tickets. Later, an AI-based defect classifier groups crash reports from different devices and highlights the common root cause. Together, these tools create a balanced workflow: human direction, conversational creativity, and structured automation.
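The contrast between a free-form chat and a tool built on predefined prompts and structured templates can be made concrete with a small sketch. It assumes a test case generator that fills a fixed prompt template, expects a JSON reply, and rejects replies that do not match the project template. Every name here (`PROMPT_TEMPLATE`, `REQUIRED_KEYS`, `fake_llm`, `generate_test_cases`) is hypothetical, and `fake_llm` is a stand-in that returns a canned reply instead of calling a real model. This is an illustration of the idea, not how any particular product works.

```python
import json

# Hypothetical fixed prompt: the tool, not the tester, decides the wording.
PROMPT_TEMPLATE = (
    "You are a test designer. Write test cases for the user story below.\n"
    "Return only a JSON list; each item must have the keys "
    "'id', 'title', 'steps' and 'expected'.\n\n"
    "User story:\n{story}\n"
)

# Hypothetical project template for a test case.
REQUIRED_KEYS = {"id", "title", "steps", "expected"}


def build_prompt(story):
    """Fill the fixed template so every run uses the same instructions."""
    return PROMPT_TEMPLATE.format(story=story.strip())


def parse_test_cases(raw_response):
    """Validate the model's reply against the project template."""
    cases = json.loads(raw_response)
    if not isinstance(cases, list):
        raise ValueError("expected a JSON list of test cases")
    for case in cases:
        if not isinstance(case, dict):
            raise ValueError("each test case must be a JSON object")
        missing = REQUIRED_KEYS - set(case)
        if missing:
            raise ValueError(f"test case is missing keys: {sorted(missing)}")
    return cases


def fake_llm(prompt):
    """Stand-in for a real model call; returns a canned reply."""
    return json.dumps([{
        "id": "TC-1",
        "title": "Fingerprint login succeeds for an enrolled finger",
        "steps": ["Open the app", "Touch the sensor with an enrolled finger"],
        "expected": "The account overview is shown",
    }])


def generate_test_cases(story, llm=fake_llm):
    """Run the fixed pipeline: build prompt, call model, validate reply."""
    prompt = build_prompt(story)
    cases = parse_test_cases(llm(prompt))
    # Keep the prompt with the result so the run can be audited later.
    return {"prompt": prompt, "cases": cases}


result = generate_test_cases("As a customer, I want to log in with my fingerprint.")
print(result["cases"][0]["title"])
```

The validation step is what makes the output repeatable in form: a malformed reply fails loudly instead of reaching the test management tool. It does not check that the test cases are correct, which remains a review task for the tester.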
  29. Key Takeaways – 1.2 •​ Generative AI supports many testing

    activities but complements rather than replaces human judgment
    • LLMs assist in key testing tasks such as requirements analysis, test case creation, test data generation, and result analysis
    • The test oracle problem remains a major challenge; LLMs can help mitigate it but cannot define absolute correctness without clear requirements
    • LLMs enhance testing efficiency by automating repetitive tasks and generating insights from logs, data, and documentation
    • Testers interact with GenAI mainly in two ways: AI chatbots (conversational tools) and LLM-powered testing applications (integrated tools)
    • Combining chatbots with LLM-powered testing tools provides the most effective testing workflow

    Reflection – 1.2
    1. In which testing activities would you trust LLM-generated outputs the most, and where would you require strict human monitoring?
    2. How can LLM-generated test data be reviewed to ensure it is realistic, diverse, and safe to use?
  30. Key Takeaways and Summary •​ AI technologies range from symbolic

    AI to machine learning, deep learning, and generative AI, each offering different capabilities for testing
    • LLMs work by converting text into tokens and embeddings, using transformer architecture to understand context and generate probabilistic outputs
    • Multimodal and vision-language models extend testing capabilities by analysing visual and textual inputs together (e.g., GUI testing and defect detection)
    • LLMs can support key testing tasks, including requirements analysis, test case creation, test oracle support, test data generation, automation, and result analysis
    • Testers interact with GenAI through AI chatbots (flexible and exploratory) and LLM-powered testing tools (structured and repeatable), each serving different purposes
    • Effective use of GenAI requires careful prompting, validation of outputs, and a balanced approach combining human expertise with AI capabilities

    Reflection and Knowledge Check
    Answer these questions after completing the reading:
    1. How do tokenization and embeddings enable LLMs to understand and generate text?
    2. What are the key differences between symbolic AI, machine learning, deep learning, and generative AI in the context of testing?
    3. When would you choose a foundation, instruction-tuned, or reasoning LLM for a specific testing task?
    4. How can multimodal LLMs improve testing compared to text-only models?
  31. 5.​ What is the test oracle problem, and how can

    techniques like A/B testing, back-to-back testing, and metamorphic testing help address it?
    6. How can LLMs support different stages of the testing process, from requirements analysis to test result evaluation?
    7. What are the main differences between AI chatbots and LLM-powered testing applications, and when would you use each?
    8. What risks are associated with using LLMs in testing, and how would you mitigate them?

    References
    • ISTQB® Certified Tester Specialist Level Testing with Generative AI (CT-GenAI) Syllabus Version 1.1, 2026, https://istqb.org/?sdm_process_download=1&download_id=6295 (accessed May 2026)

    Feedback and Evaluation
    Learner feedback is collected to support continuous improvement of delivery and materials. Understanding is evaluated through:
    • Chapter quiz covering key concepts from this chapter
    • Q&A session to clarify questions arising from the activities and quiz