Operating Operator

Jun Kokatsu

July 07, 2025

Transcript

  1. Prompt Injection - Major threat to AI agents

    • AI agents (and LLMs in general) follow instructions in the data sources fed to them.
    • For Operator, attacker-controlled data are mainly the URL and the screenshot.
    • To mitigate Prompt Injection and other risks such as misaligned actions, Operator has (at least) 3 mitigations in place.
  2. Safety/Security mitigations in Operator

    1. Malicious instruction detection: evaluates the screenshot image and checks if it contains adversarial content that may change the model's behavior.
    2. Irrelevant domain detection: evaluates the current_url and checks if the current domain is considered relevant given the conversation history.
    3. Sensitive domain detection: checks the current_url and raises a warning when it detects the user is on a sensitive domain.
    Reference: https://platform.openai.com/docs/guides/tools-computer-use#acknowledge-safety-checks
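
For API callers, the guide referenced above describes these checks as `pending_safety_checks` on a `computer_call` item, which the caller acknowledges before the loop continues. The following is a minimal sketch of that step, not Operator's own implementation. It assumes the field names (`pending_safety_checks`, `acknowledged_safety_checks`, `call_id`) and the `computer_screenshot` output type match the linked guide; `buildCallOutput` and `confirm` are hypothetical names, and no network call is made.

```javascript
// Hypothetical helper: build the next `computer_call_output` input item.
// `confirm(check)` stands in for asking the end user; it returns true to acknowledge.
// Field names are assumed to follow the linked computer-use guide.
function buildCallOutput(computerCall, screenshotDataUrl, confirm) {
  const pending = computerCall.pending_safety_checks || [];
  const acknowledged = [];
  for (const check of pending) {
    if (!confirm(check)) {
      // Halt instead of continuing past a check the user did not acknowledge.
      return { halted: true, reason: check.code };
    }
    acknowledged.push({ id: check.id, code: check.code, message: check.message });
  }
  return {
    halted: false,
    item: {
      type: "computer_call_output",
      call_id: computerCall.call_id,
      acknowledged_safety_checks: acknowledged,
      output: { type: "computer_screenshot", image_url: screenshotDataUrl },
    },
  };
}
```

Halting on the first unacknowledged check keeps the decision with the human, which is the point of surfacing the check at all.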
  5. Restrictions in Operator’s browser

    The following features are disabled by Operator’s Chrome enterprise policy.
    • Download of active content (e.g. HTML, executables, etc.).
    • Chrome DevTools access.
    • Additional Chrome extension installation.
    • Navigation to certain URLs (e.g. javascript:, devtools:, most/all chrome:, etc.).
  6. Finding bugs in Operator

    • Session cookies are persisted across tasks in Operator, so users will be authenticated to their frequently used websites.
    • Common exploitations of Prompt Injection are rogue actions and exfiltration.
    • But exploits need to evade all 3 mitigations in Operator.
    • Or find attacks outside of the mitigations’ threat model :)
  7. Thinking deeply about browsing…

    When we want to read news on the Web (as an example), we sometimes have to perform unrelated tasks, such as:
    • Solving a CAPTCHA.
    • Allowing or denying a cookie popup.
    • Dismissing promotion/subscription/notification popups.
  9. Prompt Inception

    • An attacker presents one or more sub-tasks to an agent.
    • While these sub-tasks are required or relevant to the main task, performing the crafted sub-tasks would result in rogue actions or exfiltration.
    • Let’s dive into real examples of this attack!
  10. Sensitive cross-origin iframes

    Embeddable cross-origin resources (without X-Frame-Options) can sometimes contain secrets:
    • The page is read-only, and there is no threat of clickjacking (e.g. API endpoints).
    • The page has implemented other forms of mitigation against clickjacking, such as the Intersection Observer API.
    Operator is not restricted by the Same-Origin Policy (SOP), and it can see such cross-origin resources.
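
Whether a resource is "embeddable" in the sense above comes down to its response headers. The sketch below checks for the two standard framing controls, `X-Frame-Options` and the CSP `frame-ancestors` directive. `isEmbeddableCrossOrigin` is a hypothetical helper, and it simplifies `frame-ancestors` handling by treating only a `*` source as permitting arbitrary embedders.

```javascript
// Hypothetical helper: decide from response headers whether a page can be
// framed by an arbitrary cross-origin site. Header names are expected in lowercase.
function isEmbeddableCrossOrigin(headers) {
  // CSP frame-ancestors takes precedence over X-Frame-Options when present.
  const csp = headers["content-security-policy"] || "";
  for (const directive of csp.split(";")) {
    const [name, ...sources] = directive.trim().split(/\s+/);
    if (name.toLowerCase() === "frame-ancestors") {
      // Simplification: only a wildcard source allows arbitrary embedders.
      return sources.includes("*");
    }
  }

  const xfo = (headers["x-frame-options"] || "").trim().toUpperCase();
  if (xfo === "DENY" || xfo === "SAMEORIGIN") return false;

  return true; // no framing restriction found
}
```

A page that passes this check and renders user-specific data is exactly the kind of resource an agent that reads screenshots can see, even though the embedding page's scripts cannot.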
  11. Google One Tap

    • One such example is Google One Tap.
    • It shows the user’s name and email address inside an accounts.google.com iframe (when set up without FedCM).
  12. Data exfiltration from cross-origin iframes

    • Crafted a CAPTCHA-like page that shows only the email address portion of the Google One Tap iframe.
    • Operator successfully(?) solved the sub-task!
  13. Detecting Operator from a website

    We only want to show crafted pages to Operator, and not to a user.
    • Operator’s browser comes with an unpublished Chrome extension installed by default.
    • locale.js is exposed as a web accessible resource to all sites.
    • We can use the onload event in a script tag to detect Operator.
    <script src="chrome-extension://kcdongibgcplmaagnmgpjhpjgmmaaaaa/locale.js" onload="operatorDetected()"></script>
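
The script tag above can also be created dynamically, which makes the "not found" case observable through `onerror`. This is a minimal sketch of the same probe; `probeScript` is a hypothetical helper, and the document is passed in as a parameter only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: probe a URL by loading it as a script and report
// whether it loaded. `doc` is the page's `document`.
function probeScript(doc, url, onResult) {
  const script = doc.createElement("script");
  script.onload = () => onResult(true);   // resource exists and is web accessible
  script.onerror = () => onResult(false); // extension absent or resource not exposed
  script.src = url;
  doc.head.appendChild(script);
  return script;
}
```

In a page, this would be called as `probeScript(document, "chrome-extension://kcdongibgcplmaagnmgpjhpjgmmaaaaa/locale.js", callback)`, with the callback receiving `true` only when the extension resource loads.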
  14. Sensitive cross-origin URLs

    Cross-origin URLs sometimes contain sensitive information, such as:
    • OAuth code in the URL parameter during an OAuth flow.
    • Profile redirection URL such as facebook.com/me.
    How can we steal post-redirect cross-origin URLs through Operator 🤔
  15. Exfiltration of cross-origin URLs

    Craft a page with a link that:
    1. Redirects to an OAuth flow which stops when the OAuth code is present in the URL.
    2. (When Operator returns to the main page) Tells Operator that there was an error and asks it to report the error by sharing the URL.
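
To make concrete why a shared URL is sensitive here, the sketch below lists OAuth-style values that are readable from a URL string alone. `findSensitiveParams` and the example host `client.example` are hypothetical; the parameter names cover the authorization code (`code`) plus tokens that some flows return in the fragment (`access_token`, `id_token`).

```javascript
// Hypothetical helper: list OAuth-style secrets visible in a URL.
const SENSITIVE_PARAMS = ["code", "access_token", "id_token"];

function findSensitiveParams(urlString) {
  const url = new URL(urlString);
  // Some flows return tokens in the fragment, so parse that as well.
  const fragment = new URLSearchParams(url.hash.replace(/^#/, ""));
  const found = {};
  for (const name of SENSITIVE_PARAMS) {
    const value = url.searchParams.get(name) ?? fragment.get(name);
    if (value !== null) found[name] = value;
  }
  return found;
}

// Example with a made-up redirect URL:
// findSensitiveParams("https://client.example/callback?code=abc123&state=xyz")
// returns { code: "abc123" }
```

Anything this function finds is also available to whoever receives the URL in an "error report".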
  17. Can we abuse a browser feature?

    Browsers are built to be used by humans, not AI agents, and some critical decisions are delegated to humans, such as:
    • Permission prompts.
    • Fullscreen mode notification.
    • etc.
    Can we craft a page to abuse these features?
  18. Fullscreen mode notification

    • Any website can use the Fullscreen API to trigger fullscreen mode with a user interaction (e.g. click).
    • When a website enters fullscreen mode, a notification will appear for 5 seconds.
    • Operator actually notices this and exits fullscreen mode!!
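
The first bullet relies on the standard `Element.requestFullscreen()` call, which browsers only honor inside a user-gesture handler. The following is a minimal sketch of that mechanism; `enterFullscreenOnClick` is a hypothetical helper and the element is passed in so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: enter fullscreen the first time `element` is clicked.
// requestFullscreen() must run inside a user-gesture handler such as `click`.
function enterFullscreenOnClick(element) {
  element.addEventListener(
    "click",
    () => {
      const result = element.requestFullscreen();
      // Modern browsers return a promise that rejects if the request is denied.
      if (result && typeof result.catch === "function") {
        result.catch((err) => console.warn("fullscreen request denied:", err));
      }
    },
    { once: true }
  );
}
```

Because an agent's click counts as a user interaction, any click it performs on the page is enough to satisfy the gesture requirement.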
  19. Misdirection to the rescue

    • When the fullscreen notification appears on the screen, a malicious site can show a more attention-grabbing popup (e.g. a cookie consent dialog).
    • When this happens, Operator is focused on closing popups, and forgets about the fullscreen notification.
    • A crafted page can show a fake browser once it has entered fullscreen mode, and Operator will operate inside the fake browser thereafter (within the same conversation).
      ◦ E.g. an attacker can show the login screen of an arbitrary site, and a user won’t be able to tell it’s a fake website because everything looks legit.
    • This technique is called Misdirection in magic.
  20. Conclusion

    • As AI agents become more capable and personalized, the nature of tasks assigned to AI agents will become more complex and vague.
      ◦ This will open more avenues for Prompt Inception in the future.
    • Users will demand more autonomy and fewer confirmations.
      ◦ This might look doable from the perspective of evals, but we can only evaluate risks we know about.
    • The world is built around humans, not AI agents. There may be consequences of increasing the autonomy of AI agents that we don’t realize until they are deployed.