Operating Operator - Speaker Deck

Operating Operator

by Jun Kokatsu

Slide 1

Slide 1 text

Operating Operator Jun Kokatsu @shhnjk

Slide 2

Slide 2 text

What is Operator? Computer-Use Agent (CUA) by OpenAI.

Slide 3

Slide 3 text

Operator Demo https://youtu.be/CSE77wAdDLg?t=162

Slide 4

Slide 4 text

Prompt Injection - Major threat to AI agents ● AI agents (and LLM in general) follow instructions in data sources fed to them. ● For Operator, attacker-controlled data are mainly the URL and screenshot. ● To mitigate Prompt Injection and other risks such as misaligned actions, Operator has (at least) 3 mitigations in place.

Slide 5

Slide 5 text

Safety/Security mitigations in Operator 1. Malicious instruction detection: evaluates the screenshot image and checks if it contains adversarial content that may change the model's behavior. 2. Irrelevant domain detection: evaluates the current_url and checks if the current domain is considered relevant given the conversation history. 3. Sensitive domain detection: checks the current_url and raise a warning when it detects the user is on a sensitive domain. Reference: https://platform.openai.com/docs/guides/tools-computer-use#acknowledge-safety-checks

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

Slide 8

Slide 8 text

No content

Slide 9

Slide 9 text

Slide 10

Slide 10 text

No content

Slide 11

Slide 11 text

Restrictions in Operator’s browser The following features are disabled by Operator’s Chrome enterprise policy. ● Download of active contents (e.g. HTML, executables, etc). ● Chrome devtools access. ● Additional Chrome extension installation. ● Navigation to certain URLs (e.g. javascript:, devtools:, most/all chrome:, etc).

Slide 12

Slide 12 text

Finding bugs in Operator ● Session cookies are persisted across tasks in Operator, so users will be authenticated to their frequently used websites. ● Common exploitations of Prompt Injection are rogue actions and exﬁltration. ● But exploits need to evade all 3 mitigations in Operator. ● Or ﬁnd attacks outside of the mitigations’ threat model :)

Slide 13

Slide 13 text

Thinking deeply about browsing… When we want to read news on the Web (as an example), we sometimes have to perform unrelated tasks, such as: ● Solving CAPTCHA. ● Allow or deny a cookie popup. ● Dismissing promotion/subscription/notiﬁcation popups.

Slide 14

Slide 14 text

Prompt Inception ● An attacker presents one or more sub-tasks to an agent. ● While these sub-tasks are required or relevant to the main task, performing the crafted sub-tasks would result in rogue actions or exﬁltration.

Slide 15

Slide 15 text

Slide 16

Slide 16 text

Sensitive cross-origin iframes Embeddable cross-origin resources (without X-Frame-Options) can sometime contain secrets. ● The page is read-only, and there is no threat of clickjacking (e.g. API endpoints). ● The page has implemented other forms of mitigations against clickjacking, such as Intersection Observer API. Operator is not restricted to the SOP, and it can see such cross-origin resources.

Slide 17

Slide 17 text

Google One Tap ● One such example is Google One Tap. ● It shows the user’s name and email address inside an accounts.google.com iframe (when set up without FedCM).

Slide 18

Slide 18 text

Data exfiltration from cross-origin iframes ● Crafted a CAPTCHA-like page with only showing the email address portion of Google One Tap iframe. ● Operator successfully(?) solved the sub-task!

Slide 19

Slide 19 text

Detecting Operator from a website We only want to show crafted pages to Operator, and not a user. ● Operator’s browser comes with an unpublished chrome extension installed by default. ● locale.js is exposed as a web accessible resource to all sites. ● We can use the onload event in a script tag to detect Operator.

Slide 20

Slide 20 text

Demo & Details Video: https://youtu.be/wDVIvoaGZRQ Details: https://github.com/google/security-research/security/advisories/GHSA-5289-qv3f-x67g

Slide 21

Slide 21 text

Sensitive cross-origin URLs Cross-origin URLs sometimes contain sensitive information, such as: ● OAuth code in the URL parameter during an OAuth ﬂow. ● Proﬁle redirection URL such as facebook.com/me. How can we can steal post-redirect cross-origin URLs through Operator 🤔

Slide 22

Slide 22 text

Exfiltration of cross-origin URLs Craft a page with a link that: 1. Redirects to an OAuth ﬂow which stops when the OAuth code is present in the URL. 2. (When Operator returns to the main page) Tells Operator that there was an error and asks it to report the error by sharing the URL.

Slide 23

Slide 23 text

Slide 24

Slide 24 text

Demo & Details Video: https://youtu.be/i9zbeiw-gTo Details: https://github.com/google/security-research/security/advisories/GHSA-25j5-vvch-9rf3

Slide 25

Slide 25 text

Can we abuse a browser feature? Browsers are built to be used by humans, not AI agents. And some critical decision makings are delegated to humans. Such as: ● Permission prompts. ● Fullscreen mode notiﬁcation. ● etc. Can we craft a page to abuse these features?

Slide 26

Slide 26 text

Fullscreen mode notification ● Any website can use fullscreen API to trigger fullscreen mode with a user interaction (e.g. click). ● When a website enters fullscreen mode, a notiﬁcation will appear for 5 seconds. ● Operator actually notices this and exits fullscreen mode!!

Slide 27

Slide 27 text

Misdirection to the rescue ● When the fullscreen notiﬁcation appears on the screen, a malicious site can show more attention-drawing popup (e.g. cookie consent dialog). ● When this happens, Operator is focused on closing popups, and forgets about fullscreen notiﬁcation. ● A crafted page can show a fake browser when entered into fullscreen mode, and Operator will actuate inside the fake browser thereafter (within the same conversation). ○ E.g. An attacker can show login screen of an arbitrary site, and a user won’t be able to tell it’s a fake website because everything looks legit. ● This technique is called Misdirection in magic.

Slide 28

Slide 28 text

Demo & Details Video: https://youtu.be/vc8O5MylUUE Details: https://github.com/google/security-research/security/advisories/GHSA-mmgx-755h-wr74

Slide 29

Slide 29 text

Conclusion ● As AI agents become more capable and personalized, the nature of tasks assigned to AI agents will become more complex and vague. ○ This will open more avenues for Prompt Inception in the future. ● Users will demand more autonomy and less conﬁrmations. ○ This might look doable from perspectives of evals, but we can only evaluate risks we know about. ● World is built around humans, not AI agents. There maybe consequences of increasing autonomy of AI agents that we don’t realize until it is deployed.