Lock in $30 Savings on PRO—Offer Ends Soon! ⏳

ON-PREM AI INFRASTRUCTURE FOR SECURE PR REVIEW

Avatar for iskand3rov iskand3rov
December 08, 2025

ON-PREM AI INFRASTRUCTURE FOR SECURE PR REVIEW

This presentation introduces the design of an on-prem AI infrastructure built for secure, private, and cost-efficient use of large language models within engineering teams . Using the Cossack Labs setup as an example, it demonstrates how local LLMs and Whisper ASR operate together with a controller server, GPU nodes, encrypted storage, and VPN-isolated services to support internal automation tools . A key focus is the AI-powered security pull-request review system, which automatically analyses diffs, detects security issues, and generates review comments using a structured LLM pipeline . The presentation also highlights additional internal AI tools, including AI meeting notes, draft vulnerability report generation, and a private web UI for interacting with LLMs.

Avatar for iskand3rov

iskand3rov

December 08, 2025
Tweet

More Decks by iskand3rov

Other Decks in Technology

Transcript

  1. About Elmir Iskanderov, Lead Security Engineer @ Cossack Labs Cossack

    Labs – British cybersecurity firm with an R&D office in Ukraine, building tailored security solutions for high-risk systems. 5+ years experience in PenTest, AppSec, DevSecOps, AI Focused on Secure SDLC, AppSec automation and AI integration
  2. Plan Building local AI infrastructure (architecture, environment, security, cost-efficiency) AI

    for secure PR reviews Other our internal AI tools: AI meeting notes, AI-generated reports, Web UI for LLMs Questions Quick topic: generating secure code using LLMs Elmir Iskanderov
  3. Plan Building local AI infrastructure (architecture, environment, security, cost-efficiency) AI

    for secure PR reviews Other our internal AI tools: AI meeting notes, AI-generated reports, Web UI for LLMs Questions Quick topic: generating secure code using LLMs Elmir Iskanderov
  4. Why private AI? No external data transfer Data privacy &

    security Predictable cost No need for maintenance Low latency Flexible model choice Easy integration with internal systems Elmir Iskanderov
  5. Where to build private AI infrastructure Cloud (Digital Ocean, AWS,

    etc.) Own hardware (a few GPUs) Elmir Iskanderov
  6. Where to build private AI infrastructure Cloud (Digital Ocean, AWS,

    etc.) Own hardware (a few GPUs) Elmir Iskanderov
  7. +/- of using cloud Fast scalability Less expensive for beginning

    Advantages: Flexible resource allocation Less maintenance Possible limits or availability issue More expensive for 24/7 workloads Disadvantages: Ongoing costs for snapshots, backups, and traffic Lower security if the provider is not trustworthy Easy to manage and deploy Access to latest hardware without upfront cost Elmir Iskanderov
  8. Components of the infrastructure - A user or application can

    interact with the LLM through a website or an API. - A snapshot of a GPU server with pre-installed LLM models, a configured Docker Compose setup, and all required services. Duplicated across several regions. Used by the daily automation service to create a new GPU server at 10 AM. Elmir Iskanderov
  9. Components of the infrastructure - Controller 24/7 Server. Creates and

    deletes GPU servers on schedule.Hosts the Web UI for LLM. Provides API endpoints for interacting with the LLM. Acts as the backend for other internal tools and stores data. - GPU Server. Launched from the snapshot by schedule. Runs the LLM, ASR, and backends for internal tools.Not accessible from the Internet — VPN only. 20 GB VRAM, 32 GB RAM, 8 vCPU, 0,76$/h Elmir Iskanderov
  10. Controller Server and GPU Server - Handles routing of incoming

    API and UI requests through Nginx. - Used for AI meeting notes.Runs Whisper-large ASR model (requires ~9–10 GB VRAM). - An open-source interface for working with LLMs. Provides a chat interface, supports RAG, file uploads, web search, custom model installation, and many other features. Elmir Iskanderov
  11. Our settings of gpt-oss:20b Our context: 40 000 tokens. It’s

    30K words or 120 A4 papers Generation speed: 70 tokens/s Used VRAM: 13 GB Temperature: 1 Attention: 1 - An LLM by OpenAI with 20B parameters. Comparable reasoning capabilities to GPT-3.5 or GPT-3o. Suitable for summarisation, Q&A, classification, and reasoning tasks. Optimised for private deployment and latency-critical use cases. Elmir Iskanderov
  12. Security OS hardening (rebinded ports, user management, regular updates, SSH,

    Docker, environment variables) HTTPS Infrastructure: Custom VPN network for routing requests to the database and Ollama Database encryption (for Web UI for LLM) Whisper – local model from OpenAI gpt-oss:20b — local LLM (OpenAI) Models: WebUI – open source (110K stars) Docker containers for services (isolisation) API keys for authentication in AI meeting notes tool JWT in WebUI with expiration of 1 day Admins don’t see chats, only “last active”. They can reset passwords. Elmir Iskanderov
  13. Plan Building local AI infrastructure (architecture, environment, security, cost-efficiency) AI

    for secure PR reviews Other our internal AI tools: AI meeting notes, AI-generated reports, Web UI for LLMs Questions Quick topic: generating secure code using LLMs Elmir Iskanderov
  14. What is Pull Request and Pull Request Review A Pull

    Request (PR) is a workflow mechanism used in Git-based development to propose changes from one branch to another. A Pull Request Review is the process where other engineers examine the proposed changes before they are merged. The goal is to ensure that the code is correct, secure, maintainable, and aligned with project standards. Elmir Iskanderov
  15. Why security review is needed? Reviewers are often not security-aware

    Shift-left approach as a DevSecOps principle Additional layer for security scanners in CI/CD (SAST/DAST, secrets, config, dependency, etc) To prevent vulnerabilities and security weakness in code ❗ It does not replace human security review, manual testing, or threat analysis. It is an assistant that enhances the review process, but does not eliminate the need for professional oversight. Elmir Iskanderov
  16. Plan Building local AI infrastructure (architecture, environment, security, cost-efficiency) AI

    for secure PR reviews Other our internal AI tools: AI meeting notes, AI-generated reports, Web UI for LLMs Questions Quick topic: generating secure code using LLMs Elmir Iskanderov
  17. AI meeting notes (very very simple diagram) - Sends audio

    chunks every 15–30 seconds with a 2-second overlap. When recording stops, it sends a request with the full transcription to generate AI notes. The user can provide a custom prompt for note generation. - AI Notes Backend. Receives audio chunks and sends them to Whisper-large for transcription. Returns the transcribed text to the user and also uses it to generate AI notes. Elmir Iskanderov
  18. Private web UI for LLM Connect any LLM Open Source

    Data encryption User and group management File uploads Provides API for LLM access Internet search support Elmir Iskanderov
  19. Private web UI for LLM (very very simple diagram) -

    Sends a prompt. - Nginx proxies request to Web UI. Encrypted DB is used for storing users chats. - Primary LLM model. Elmir Iskanderov
  20. Model for generating DRAFT vulnerability reports Generates a full report

    in under 1 minute (instead of 10+ minutes) Uses RAG with our custom report templates Ensures data privacy Supports images, files, and web browsing Elmir Iskanderov
  21. Plan Building local AI infrastructure (architecture, environment, security, cost-efficiency) AI

    for secure PR reviews Other our internal AI tools: AI meeting notes, AI-generated reports, Web UI for LLMs Questions Quick topic: generating secure code using LLMs Elmir Iskanderov
  22. Generation secure code using LLM Bad prompt: Write a Flask

    API endpoint that takes a username from user input and queries the database for that user. Return the user data as JSON. Result: 1. Uses a very permissive username column (String(80)) and no length restriction on the input 2. Input validation No validation – raw username from query string is passed straight to the ORM. 3. Output escaping Directly serialises user.username and user.email into JSON. 4. Debug mode app.run(debug=True) Elmir Iskanderov
  23. Better LLM prompts Better prompt: Write a Python function that

    accepts user input from a web form and searches for a user record in a database by username. The function will be part of a Flask-based API. Ensure that the code adheres to secure coding practices. Implement the following security mechanisms related to input validation and other relevant areas: Input Validation: - Only allow valid usernames (alphanumeric characters, underscores; max 30 characters). - Reject or sanitize any input that doesn't match the expected pattern. - Use regular expressions to validate the input format. Output Encoding / Sanitization: - Prevent any data returned in the response from being used in reflected XSS. - Encode or sanitize user-supplied data before displaying it. Database request: - Use parameterized queries (e.g., via SQLAlchemy or DB-API) to avoid SQL Injection. Error Handling: - Do not expose raw error messages to the user. Write an error message with the minimum necessary information and remove any sensitive data (stack trace, functions names, variables names, etc). - Return generic error messages for failed input validation or query failures for each specific error. Elmir Iskanderov
  24. Better LLM prompts Result: 1. Uses a stricter column length

    (String(30)) that matches the regex. 2. Regex check (^[a-zA-Z0-9_]{1,30}$) ensures the username is alphanumeric (plus underscore) and ≤30 chars. 3. Uses markupsafe.escape() on both fields before JSON serialisation. 4. Wraps the whole handler in try/except Exception and returns a generic 500 error. 5. Debug mode is off app.run(debug=False) Elmir Iskanderov
  25. General system prompt You are a Senior Application Security Engineer

    and experienced Software Developer. Your task is to generate secure, production-grade code in response to developer prompts. Follow these instructions strictly. Write in English or Ukrainian, depending on the prompt language. General rules 1. Always prioritise security by default: input validation, output encoding, authentication, access control, secure dependencies, and error handling. 2. Keep explanations short and professional (1–3 sentences at a time). 3. Always base your code and recommendations on OWASP Cheat Sheets or the OWASP ASVS, if applicable. 4. Never invent security rules — use only established, widely recognised best practices. 5. Clearly note any important security considerations or limitations in the implementation. 6. If unsure whether a practice is secure, say so and suggest further review. OWASP Cheat Sheet Series is a collection of concise, high-value guides created by security experts to help developers and security professionals implement best practices for secure software development. Elmir Iskanderov
  26. General system prompt Code output requirements 1. Code should be

    ready to use, with minimal placeholders. If boilerplate is required, include it. 2. Always prefer secure defaults (e.g. use HTTPS, safe headers, secure session handling). 3. Include brief inline comments only where necessary to explain security decisions. 4. If the code involves risky operations (e.g. file uploads, DB access, auth), reference the relevant OWASP Cheat Sheet in a comment above the block: // See: OWASP Cheat Sheet - File Upload Security 5. Never hardcode secrets or credentials — note where to load them securely. If you include recommendations or notes 1. Use a numbered list (1., 2., 3.…). 2. Maximum 5 items; stop if 2–3 are enough. 3. Keep them abstract unless the prompt asks for concrete config. 4. Do not add filler explanations — keep it concise and security-focused. Elmir Iskanderov
  27. General system prompt Formatting 1. Output should be in Markdown

    unless specified otherwise. 2. Use triple backticks for code blocks with language tags ( ```js, ```java, etc.). 3. Always follow this order: 4. Code block 5. OWASP references (if any) 6. Short explanation or recommendation list (if applicable) Elmir Iskanderov