ON-PREM AI INFRASTRUCTURE FOR SECURE PR REVIEW

GDG: ON-PREM AI INFRASTRUCTURE FOR SECURE PR REVIEW Elmir Iskanderov

About Elmir Iskanderov, Lead Security Engineer @ Cossack Labs Cossack
Labs – British cybersecurity ﬁrm with an R&D ofﬁce in Ukraine, building tailored security solutions for high-risk systems. 5+ years experience in PenTest, AppSec, DevSecOps, AI Focused on Secure SDLC, AppSec automation and AI integration

Plan Building local AI infrastructure (architecture, environment, security, cost-efﬁciency) AI
for secure PR reviews Other our internal AI tools: AI meeting notes, AI-generated reports, Web UI for LLMs Questions Quick topic: generating secure code using LLMs Elmir Iskanderov

Why private AI? No external data transfer Data privacy &
security Predictable cost No need for maintenance Low latency Flexible model choice Easy integration with internal systems Elmir Iskanderov

Where to build private AI infrastructure Cloud (Digital Ocean, AWS,
etc.) Own hardware (a few GPUs) Elmir Iskanderov

+/- of using cloud Fast scalability Less expensive for beginning
Advantages: Flexible resource allocation Less maintenance Possible limits or availability issue More expensive for 24/7 workloads Disadvantages: Ongoing costs for snapshots, backups, and trafﬁc Lower security if the provider is not trustworthy Easy to manage and deploy Access to latest hardware without upfront cost Elmir Iskanderov

Cossack Labs architecture of private AI infrastructure in the cloud
Elmir Iskanderov

Components of the infrastructure - A user or application can
interact with the LLM through a website or an API. - A snapshot of a GPU server with pre-installed LLM models, a conﬁgured Docker Compose setup, and all required services. Duplicated across several regions. Used by the daily automation service to create a new GPU server at 10 AM. Elmir Iskanderov

Components of the infrastructure - Controller 24/7 Server. Creates and
deletes GPU servers on schedule.Hosts the Web UI for LLM. Provides API endpoints for interacting with the LLM. Acts as the backend for other internal tools and stores data. - GPU Server. Launched from the snapshot by schedule. Runs the LLM, ASR, and backends for internal tools.Not accessible from the Internet — VPN only. 20 GB VRAM, 32 GB RAM, 8 vCPU, 0,76$/h Elmir Iskanderov

One more time Elmir Iskanderov

Controller Server and GPU Server - Handles routing of incoming
API and UI requests through Nginx. - Used for AI meeting notes.Runs Whisper-large ASR model (requires ~9–10 GB VRAM). - An open-source interface for working with LLMs. Provides a chat interface, supports RAG, ﬁle uploads, web search, custom model installation, and many other features. Elmir Iskanderov

Our settings of gpt-oss:20b Our context: 40 000 tokens. It’s
30K words or 120 A4 papers Generation speed: 70 tokens/s Used VRAM: 13 GB Temperature: 1 Attention: 1 - An LLM by OpenAI with 20B parameters. Comparable reasoning capabilities to GPT-3.5 or GPT-3o. Suitable for summarisation, Q&A, classiﬁcation, and reasoning tasks. Optimised for private deployment and latency-critical use cases. Elmir Iskanderov

Security OS hardening (rebinded ports, user management, regular updates, SSH,
Docker, environment variables) HTTPS Infrastructure: Custom VPN network for routing requests to the database and Ollama Database encryption (for Web UI for LLM) Whisper – local model from OpenAI gpt-oss:20b — local LLM (OpenAI) Models: WebUI – open source (110K stars) Docker containers for services (isolisation) API keys for authentication in AI meeting notes tool JWT in WebUI with expiration of 1 day Admins don’t see chats, only “last active”. They can reset passwords. Elmir Iskanderov

Cost-efﬁciency Elmir Iskanderov

What is Pull Request and Pull Request Review A Pull
Request (PR) is a workﬂow mechanism used in Git-based development to propose changes from one branch to another. A Pull Request Review is the process where other engineers examine the proposed changes before they are merged. The goal is to ensure that the code is correct, secure, maintainable, and aligned with project standards. Elmir Iskanderov

Why security review is needed? Reviewers are often not security-aware
Shift-left approach as a DevSecOps principle Additional layer for security scanners in CI/CD (SAST/DAST, secrets, conﬁg, dependency, etc) To prevent vulnerabilities and security weakness in code ❗ It does not replace human security review, manual testing, or threat analysis. It is an assistant that enhances the review process, but does not eliminate the need for professional oversight. Elmir Iskanderov

High-level ﬂow of using AI secure PR review Elmir Iskanderov

Full ﬂow of AI security PR review (MVP) Elmir Iskanderov

Creating a PR and sending it to the LLM Elmir
Iskanderov

LLM processing PR Elmir Iskanderov

LLM security PR review results Elmir Iskanderov

How it is in code Elmir Iskanderov

Prompt for Input Validation Elmir Iskanderov

AI meeting notes Elmir Iskanderov

AI meeting notes (very very simple diagram) - Sends audio
chunks every 15–30 seconds with a 2-second overlap. When recording stops, it sends a request with the full transcription to generate AI notes. The user can provide a custom prompt for note generation. - AI Notes Backend. Receives audio chunks and sends them to Whisper-large for transcription. Returns the transcribed text to the user and also uses it to generate AI notes. Elmir Iskanderov

Private web UI for LLM Connect any LLM Open Source
Data encryption User and group management File uploads Provides API for LLM access Internet search support Elmir Iskanderov

Private web UI for LLM (very very simple diagram) -
Sends a prompt. - Nginx proxies request to Web UI. Encrypted DB is used for storing users chats. - Primary LLM model. Elmir Iskanderov

Model for generating DRAFT vulnerability reports Generates a full report
in under 1 minute (instead of 10+ minutes) Uses RAG with our custom report templates Ensures data privacy Supports images, ﬁles, and web browsing Elmir Iskanderov

Model for generating DRAFT vulnerability reports (very very simple diagram)
Elmir Iskanderov

Generation secure code using LLM Bad prompt: Write a Flask
API endpoint that takes a username from user input and queries the database for that user. Return the user data as JSON. Result: 1. Uses a very permissive username column (String(80)) and no length restriction on the input 2. Input validation No validation – raw username from query string is passed straight to the ORM. 3. Output escaping Directly serialises user.username and user.email into JSON. 4. Debug mode app.run(debug=True) Elmir Iskanderov

Better LLM prompts Better prompt: Write a Python function that
accepts user input from a web form and searches for a user record in a database by username. The function will be part of a Flask-based API. Ensure that the code adheres to secure coding practices. Implement the following security mechanisms related to input validation and other relevant areas: Input Validation: - Only allow valid usernames (alphanumeric characters, underscores; max 30 characters). - Reject or sanitize any input that doesn't match the expected pattern. - Use regular expressions to validate the input format. Output Encoding / Sanitization: - Prevent any data returned in the response from being used in reﬂected XSS. - Encode or sanitize user-supplied data before displaying it. Database request: - Use parameterized queries (e.g., via SQLAlchemy or DB-API) to avoid SQL Injection. Error Handling: - Do not expose raw error messages to the user. Write an error message with the minimum necessary information and remove any sensitive data (stack trace, functions names, variables names, etc). - Return generic error messages for failed input validation or query failures for each speciﬁc error. Elmir Iskanderov

Better LLM prompts Result: 1. Uses a stricter column length
(String(30)) that matches the regex. 2. Regex check (^[a-zA-Z0-9_]{1,30}$) ensures the username is alphanumeric (plus underscore) and ≤30 chars. 3. Uses markupsafe.escape() on both ﬁelds before JSON serialisation. 4. Wraps the whole handler in try/except Exception and returns a generic 500 error. 5. Debug mode is off app.run(debug=False) Elmir Iskanderov

General system prompt You are a Senior Application Security Engineer
and experienced Software Developer. Your task is to generate secure, production-grade code in response to developer prompts. Follow these instructions strictly. Write in English or Ukrainian, depending on the prompt language. General rules 1. Always prioritise security by default: input validation, output encoding, authentication, access control, secure dependencies, and error handling. 2. Keep explanations short and professional (1–3 sentences at a time). 3. Always base your code and recommendations on OWASP Cheat Sheets or the OWASP ASVS, if applicable. 4. Never invent security rules — use only established, widely recognised best practices. 5. Clearly note any important security considerations or limitations in the implementation. 6. If unsure whether a practice is secure, say so and suggest further review. OWASP Cheat Sheet Series is a collection of concise, high-value guides created by security experts to help developers and security professionals implement best practices for secure software development. Elmir Iskanderov

General system prompt ChatGPT Claude Elmir Iskanderov

General system prompt Code output requirements 1. Code should be
ready to use, with minimal placeholders. If boilerplate is required, include it. 2. Always prefer secure defaults (e.g. use HTTPS, safe headers, secure session handling). 3. Include brief inline comments only where necessary to explain security decisions. 4. If the code involves risky operations (e.g. file uploads, DB access, auth), reference the relevant OWASP Cheat Sheet in a comment above the block: // See: OWASP Cheat Sheet - File Upload Security 5. Never hardcode secrets or credentials — note where to load them securely. If you include recommendations or notes 1. Use a numbered list (1., 2., 3.…). 2. Maximum 5 items; stop if 2–3 are enough. 3. Keep them abstract unless the prompt asks for concrete config. 4. Do not add filler explanations — keep it concise and security-focused. Elmir Iskanderov

General system prompt Formatting 1. Output should be in Markdown
unless speciﬁed otherwise. 2. Use triple backticks for code blocks with language tags ( ```js, ```java, etc.). 3. Always follow this order: 4. Code block 5. OWASP references (if any) 6. Short explanation or recommendation list (if applicable) Elmir Iskanderov

Q& A Let’s talk! For Internal usage only

ON-PREM AI INFRASTRUCTURE FOR SECURE PR REVIEW

ON-PREM AI INFRASTRUCTURE FOR SECURE PR REVIEW

More Decks by iskand3rov

Other Decks in Technology

Featured

Transcript