Towards Test-Driven Synthesis of Behavioral Programs

Abstract. Scenario-Based Programming (SBP) simplifies reactive system development through independent behavioral threads (b-threads). However, the intricate synchronization logic makes manual implementation error-prone.
We propose an approach, inspired by Test-Driven Development (TDD), that leverages Large Language Models (LLMs) to synthesize b-threads from natural language. Existential scenarios serve as test oracles, closing a feedback loop in which the generated code is refined iteratively.
We outline a roadmap for implementing this workflow, integrating it with industrial SBP practices, and empirically evaluating its effectiveness in real-world scenarios.
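
To make the idea of a scenario acting as a test oracle concrete, here is a minimal sketch in Python. It is not the authors' implementation: the tiny engine, the helper names (`sync`, `run`, `satisfies`), and the hot/cold water b-threads are all illustrative assumptions. The sketch also assumes that an existential scenario can be read as an event sequence that must appear, in order, in at least one run, and it checks only the single deterministic run this engine produces.

```python
from typing import Dict, Generator, Iterable, List, Sequence, Set

# One synchronization statement: the events a b-thread requests, waits for, and blocks.
Sync = Dict[str, Set[str]]
BThread = Generator[Sync, str, None]


def sync(request: Iterable[str] = (), wait_for: Iterable[str] = (),
         block: Iterable[str] = ()) -> Sync:
    """Hypothetical helper that builds one synchronization statement."""
    return {"request": set(request), "wait_for": set(wait_for), "block": set(block)}


def run(bthreads: Iterable[BThread], max_steps: int = 100) -> List[str]:
    """Run the b-threads and return the trace of selected events."""
    active: Dict[BThread, Sync] = {}
    for bt in bthreads:
        try:
            active[bt] = next(bt)  # advance to the first synchronization point
        except StopIteration:
            pass
    trace: List[str] = []
    for _ in range(max_steps):
        blocked: Set[str] = set().union(*(s["block"] for s in active.values()))
        # Select the first requested, non-blocked event, in b-thread order.
        selected = None
        for s in active.values():
            enabled = sorted(s["request"] - blocked)
            if enabled:
                selected = enabled[0]
                break
        if selected is None:
            break  # nothing is both requested and unblocked
        trace.append(selected)
        # Resume every b-thread that requested or waited for the selected event.
        for bt, s in list(active.items()):
            if selected in s["request"] or selected in s["wait_for"]:
                try:
                    active[bt] = bt.send(selected)
                except StopIteration:
                    del active[bt]
    return trace


def satisfies(trace: Sequence[str], scenario: Sequence[str]) -> bool:
    """Oracle (assumed reading): the scenario's events occur in order in the trace."""
    remaining = iter(trace)
    return all(event in remaining for event in scenario)


# Illustrative b-threads: two independent requesters and one interleaving constraint.
def add_hot() -> BThread:
    for _ in range(3):
        yield sync(request=["HOT"])


def add_cold() -> BThread:
    for _ in range(3):
        yield sync(request=["COLD"])


def interleave() -> BThread:
    while True:
        yield sync(wait_for=["HOT"], block=["COLD"])
        yield sync(wait_for=["COLD"], block=["HOT"])


trace = run([add_hot(), add_cold(), interleave()])
print(trace, satisfies(trace, ["HOT", "COLD", "HOT"]))
```

In a test-driven loop of the kind the abstract describes, a failing `satisfies` check (for example, when `interleave` is missing and all `HOT` events precede all `COLD` events) would be the signal fed back to the generator for another refinement round.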

Giovanni Rosa

June 16, 2026

Transcript

  1. Towards Test-Driven Synthesis of Behavioral Programs
    AI4SE - Inteligencia Artificial para Ingeniería del Software (Artificial Intelligence for Software Engineering)
    XXX Jornadas de Ingeniería del Software y Bases de Datos (JISBD), Alicante, June 16 to 18, 2026
    Giovanni Rosa, David Moreno-Lumbreras, Gregorio Robles, and Jesús M. González Barahona
    SoftDev
    Speaker: Giovanni Rosa, Postdoctoral Researcher, Universidad Rey Juan Carlos
  2. LLM-driven Code Synthesis for SBP
    TDD improves code generation and adherence to requirements
    Promising results using LLMs to support SBP
  3. Preliminary Research Questions
    Impact of different programming languages and libraries: influence on the success rate of less-known languages and libraries
    Effectiveness of the automated self-correction loop: human intervention is reduced and the code generation process is improved
    B-thread implementation and adherence to the RWB event sequence: percentage of correct b-thread implementations and RWB event flows
  4. Challenge #1: State Dependency Problem
    Standard functions (linear and stateless): assume stateless input and output sequences
    B-threads (reactive and stateful): defined sequence of events, influencing the program state
  5. Challenge #2: Specialized BP benchmarks
    Standard coding benchmarks lack the event-driven scenarios required to test behavioral programs
    Specialized SBP dataset with game-based challenges and protocol implementations
  6. Acknowledgments
    More at https://advise.codeberg.page/
    This work is part of the ADVISE project (ADvanced Vision on Intelligent Software Engineering). ADVISE aims to integrate AI agents into software development to ensure code meets requirements and aligns with developer intent.
    Funded by the Spanish AEI, reference 2024/00416/002