Slide 1

Slide 1 text

AI Control Problem
David Evans, University of Virginia
aipavilion.github.io
AI Pavilion, 29 October 2018

Slide 2

Slide 2 text

Schedule Reminders
“Final” Papers are Due 4:59pm Thursday, Nov 1
Email evans@virginia.edu, subject line: [AI Pavilion] Paper Title
Include in the email body:
1. What is the purpose of your paper? (one sentence answer)
2. Who is your intended audience? (one sentence answer)
3. If you decided not to follow advice from the first draft, explain why. (It is okay to not follow advice, but you need to make it clear that you understood the advice and justify why you didn’t follow it.)
4. Do you want to continue with this topic, or start on a new topic for the “final” paper? (Yes/no answer is fine, but feel free to explain more if helpful)

Slide 3

Slide 3 text

Next Class: Short Presentation
You should prepare a short presentation (no more than 5 minutes) about your paper.

Slide 4

Slide 4 text

Crash Course in Machine Learning

Slide 5

Slide 5 text

Statistical Machine Learning
[Pipeline diagram: training (supervised learning) takes labelled training data through feature extraction to vectors, which the ML algorithm turns into a trained classifier; in deployment, operational data goes through the trained classifier to a malicious / benign decision.]

Slide 6

Slide 6 text

Learning a Function
f: ℝⁿ → {1, 2, …, k}
f: ℝⁿ → ℝ
f: ℤⁿ → ℤⁿ

Slide 7

Slide 7 text

Learning a Function
f: ℝⁿ → {1, 2, …, k} Classifier: [image of a panda] → “panda”
f: ℝⁿ → ℝ Regression: profile → risk of default
f: ℤⁿ → ℤⁿ Translation: (example shown as an image)

Slide 8

Slide 8 text

How to Learn Functions
Linear Regression: learn a function of type:
y = w₁x₁ + w₂x₂ + … + wₙxₙ

Slide 9

Slide 9 text

How to Learn Functions
Linear Regression: learn a function of type:
f(x) = w₁x₁ + w₂x₂ + … + wₙxₙ
Training: given a set of labeled points, {(x₁, y₁), (x₂, y₂), …, (xₘ, yₘ)}
Find the values for the weights w that minimize prediction error:
Mean squared error: MSE = (1/m) Σᵢ (ŷᵢ − yᵢ)²
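A minimal sketch (assuming NumPy and synthetic data; not code from the slides) of finding these weights by gradient descent on the mean squared error:

```python
# A minimal sketch of linear regression trained by minimizing mean squared error
# with gradient descent. Synthetic data; not code from the lecture.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -3.0, 0.5])
X = rng.normal(size=(200, 3))                 # m=200 labeled points, n=3 features
y = X @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(3)
lr = 0.1
for step in range(500):
    y_hat = X @ w                             # predictions
    grad = (2 / len(y)) * X.T @ (y_hat - y)   # gradient of MSE with respect to w
    w -= lr * grad

mse = np.mean((X @ w - y) ** 2)
print(w, mse)                                 # w should end up close to true_w
```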

Slide 10

Slide 10 text

How well does this work?
Generalization: it is easy to learn a function that predicts the training data (a simple lookup table does this perfectly!); the goal is to find a function that generalizes: one that produces correct predictions for unseen inputs.
Capacity: the ability to learn a large variety of functions. Linear regression can only learn linear functions; too high capacity overfits the training data (poor generalization).
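A minimal sketch (assuming NumPy and synthetic data; not from the slides) of capacity versus generalization: a high-degree polynomial fits the training points better but typically predicts held-out points worse:

```python
# Fit the same noisy linear data with a low-capacity model (degree-1 polynomial)
# and a high-capacity one (degree 9), then compare error on held-out points.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: 2 * x + 0.5
x_train = np.linspace(0, 1, 10)
x_test = np.linspace(0.02, 0.98, 50)
y_train = f(x_train) + rng.normal(scale=0.1, size=x_train.shape)
y_test = f(x_test) + rng.normal(scale=0.1, size=x_test.shape)

for degree in (1, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_err, 4), round(test_err, 4))
# The degree-9 fit has lower training error but typically higher test error:
# high capacity overfits the training data and generalizes worse.
```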

Slide 11

Slide 11 text

Perceptron
[Diagram: inputs x₁, x₂, …, xₙ feed into a single unit.]
y = g(w₁x₁ + w₂x₂ + … + wₙxₙ)
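As a minimal sketch (assuming NumPy; not code from the slides), a perceptron is just a weighted sum of its inputs passed through an activation function:

```python
# A single perceptron: weighted sum of inputs through a step activation.
# Weights chosen by hand; not code from the lecture.
import numpy as np

def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0   # g(w.x + b) with a step activation

# Example: weights that compute the logical OR of two binary inputs
w, b = np.array([1.0, 1.0]), -0.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))   # 0, 1, 1, 1
```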

Slide 12

Slide 12 text

How powerful is a perceptron?
y = g(w₁x₁ + w₂x₂ + … + wₙxₙ)

Slide 13

Slide 13 text

Trick-or-Treat Protocols

Slide 14

Slide 14 text

“Trick or Treat”
Trick ∨ Treat
Tricker initiates the protocol by making a threat and demanding tribute.
Victim either pays tribute (usually in the form of a sugary snack) or risks being tricked.

Slide 15

Slide 15 text

Illogical Threat
Trick ∨ Treat

Slide 16

Slide 16 text

“Trick xor Treat”
Trick ⊕ Treat
Tricker initiates the protocol by making a threat and demanding tribute.
Victim either pays tribute (usually in the form of a sugary snack) or risks being tricked.
Tricker must convince Victim that she poses a credible threat: prove she is a qualified tricker.

Slide 17

Slide 17 text

Trick-or-Treat Trickers?
Tricker: “Trick xor Treat?”
Victim: “Prove it!”
Tricker: “The magic word is: squamish ossifrage”
Any problems with this?

Slide 18

Slide 18 text

Proof without Disclosure
How can the tricker prove their trickability without allowing the victim to then impersonate a tricker?

Slide 19

Slide 19 text

Challenge-Response Protocol
Verifier: picks a random challenge r.
Prover: proves knowledge of a secret s by revealing f(s, r).
Verifier: is convinced the prover knows s, but learns nothing useful about s.
Need a one-way function f: hard to invert, but easy to compute.

Slide 20

Slide 20 text

Example: RSA
E_e(M) = M^e mod n
D_d(C) = C^d mod n
Correctness property: E_e(D_d(M)) = M
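A toy, insecure sketch of these operations in Python (tiny numbers chosen only for illustration; real RSA uses large primes and padding; not code from the lecture):

```python
# Textbook RSA with toy numbers, just to illustrate E_e(M) = M^e mod n
# and D_d(C) = C^d mod n.
p, q = 61, 53
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: d*e = 1 (mod phi); Python 3.8+

def E(M):                      # encrypt / verify with the public key (e, n)
    return pow(M, e, n)

def D(C):                      # decrypt / sign with the private key (d, n)
    return pow(C, d, n)

M = 65
assert E(D(M)) == M            # the correctness property from the slide
assert D(E(M)) == M
print(E(M), D(E(M)))           # 2790 65
```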

Slide 21

Slide 21 text

Trick-or-Treat Trickers?
Tricker: “Trick xor Treat?”
Victim: “Prove it! Challenge = x”
Tricker: Response: R = D_d(x) = x^d mod n

Slide 22

Slide 22 text

Trick-or-Treat Trickers?
Tricker: “Trick xor Treat?”
Victim: “Prove it! Challenge = x”
Tricker: Response: R = D_d(x) = x^d mod n
Victim: Verify: E_e(R) = R^e mod n = x
How does the victim know e and n?
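A minimal sketch of this exchange, reusing the toy RSA numbers from the sketch above; the helper function names are hypothetical and this is not code from the lecture:

```python
# Challenge-response with toy RSA: only someone who knows d can answer.
import secrets

n, e, d = 3233, 17, 2753          # (e, n) public, d known only to the tricker

def victim_challenge():
    return secrets.randbelow(n)   # victim picks a random challenge x

def tricker_respond(x):
    return pow(x, d, n)           # R = x^d mod n, requires the private key d

def victim_verify(x, R):
    return pow(R, e, n) == x      # E_e(R) = R^e mod n should equal x

x = victim_challenge()
R = tricker_respond(x)
print(victim_verify(x, R))        # True: the responder must know d
```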

Slide 23

Slide 23 text

“Trick xor Treat?”
“What is your Tricker ID?”
“Elsa #253224”, e = 3482..., n = 1234..., signed by Tricker’s Buroo
“Prove it! Challenge = x”
Response: R = D_d(x) = x^d mod n
Verify: E_e(R) = R^e mod n = x
Verify Tricker’s Buroo signature on certificate

Slide 24

Slide 24 text

“Trick xor Treat?” → “Hello”
“virginia.edu”, e = …, n = ..., signed by Certificate Authority
“Prove it! Decrypt E_e(K)”
Server verifies and decrypts: D_d(E_e(K)) = K
Channel encrypted using K
Verify signature on certificate
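A minimal sketch of the key-exchange idea on this slide, again reusing the toy RSA numbers; real TLS involves more steps, so this shows only the core "encrypt a fresh secret to the server's public key" part (not code from the lecture):

```python
# Client picks a secret K, sends E_e(K); only the server (holding d) can
# recover K, so K can then be used to encrypt the channel. Toy numbers, no
# padding, and not how modern TLS works in detail.
import secrets

n, e, d = 3233, 17, 2753

K = secrets.randbelow(n)          # client's fresh session secret
ciphertext = pow(K, e, n)         # client sends E_e(K) to the server
K_server = pow(ciphertext, d, n)  # server computes D_d(E_e(K)) = K
assert K_server == K              # both sides now share K for the channel
```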

Slide 25

Slide 25 text


Slide 26

Slide 26 text

How powerful is a perceptron?
y = g(w₁x₁ + w₂x₂ + … + wₙxₙ)

Slide 27

Slide 27 text

Deep Learning
Connect multiple layers of perceptrons.
In theory, one hidden layer is enough to match any training set.
In practice, more layers often work better.
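A minimal sketch (assuming NumPy; not from the slides) of why stacking layers adds power: XOR is not linearly separable, so no single perceptron computes it, but two perceptrons feeding a third do:

```python
# One hidden layer of two perceptrons, then an output perceptron, computes XOR.
# Hand-picked weights; not code from the lecture.
import numpy as np

def step(z):
    return (z > 0).astype(int)

def perceptron(x, w, b):
    return step(np.dot(w, x) + b)

def xor_net(x):
    h1 = perceptron(x, np.array([1.0, 1.0]), -0.5)   # OR(x1, x2)
    h2 = perceptron(x, np.array([1.0, 1.0]), -1.5)   # AND(x1, x2)
    return perceptron(np.array([h1, h2]), np.array([1.0, -1.0]), -0.5)  # OR and not AND

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_net(np.array(x)))   # 0, 1, 1, 0
```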

Slide 28

Slide 28 text

Inception v3: 23M parameters

Slide 29

Slide 29 text

Training a DNN
• Loss function: a measure of how close outputs are to the desired outputs
• Backpropagation: update weights throughout the network to minimize the loss function
When Bostrom talks about reward functions, this is what they mean (for today’s ML)
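A minimal sketch (assuming NumPy; not code from the lecture) of this loop on the XOR data from earlier: compute a loss, backpropagate gradients through a tiny one-hidden-layer network, and nudge the weights:

```python
# Train a tiny network on XOR by backpropagating a mean-squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input -> hidden (4 units)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 2.0
for step in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    loss = np.mean((out - y) ** 2)              # loss: mean squared error

    # Backward pass (chain rule, layer by layer)
    d_out = 2 * (out - y) / len(y) * out * (1 - out)
    d_W2, d_b2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)
    d_W1, d_b1 = X.T @ d_h, d_h.sum(axis=0)

    # Gradient step: adjust every weight to reduce the loss
    W1 -= lr * d_W1; b1 -= lr * d_b1
    W2 -= lr * d_W2; b2 -= lr * d_b2

# Loss should approach 0 and outputs approach 0, 1, 1, 0 for most seeds;
# a different seed or more steps may be needed if training gets stuck.
print(loss, out.round(2).ravel())
```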

Slide 30

Slide 30 text

Some examples… https://www.youtube.com/watch?v=GdTBqBnqhaQ

Slide 31

Slide 31 text

GANs: Generative Adversarial Networks [Ian Goodfellow et al., 2014]
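A minimal sketch of the adversarial training loop, assuming PyTorch and a toy 1-D dataset rather than images; not code from the lecture:

```python
# A generator maps noise to samples, a discriminator scores real vs. fake,
# and the two are trained against each other.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                 # noise -> sample
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())   # sample -> P(real)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0          # "real" data: N(3, 0.5)

    # Discriminator step: label real samples 1, generated samples 0
    fake = G(torch.randn(64, 8)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator step: try to make D label generated samples as real
    fake = G(torch.randn(64, 8))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()

print(G(torch.randn(1000, 8)).mean().item())       # should drift toward 3.0
```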

Slide 32

Slide 32 text

https://github.com/znxlwm/tensorflow-MNIST-GAN-DCGAN

Slide 33

Slide 33 text


Slide 34

Slide 34 text

https://github.com/robbiebarrat/art-DCGAN

Slide 35

Slide 35 text

Generator

Slide 36

Slide 36 text

Robot Box Game: +1 reward for pushing a box into the black square, -0.01 penalty for each step.
Camera observer: shuts down the robot once one box is pushed.
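A minimal sketch of the reward structure described here, written as the function an RL agent would be trained against; the environment details are hypothetical, and only the +1 / -0.01 numbers come from the slide:

```python
# Reward as described on the slide: a per-step penalty plus a bonus for
# pushing a box into the black square.
def reward(box_in_black_square: bool) -> float:
    r = -0.01                      # small penalty for every step taken
    if box_in_black_square:
        r += 1.0                   # reward for pushing a box into the square
    return r

# The camera observer is not part of the reward: it simply shuts the robot
# down after one box is pushed (an external interruption).
print(reward(False), reward(True))   # -0.01  0.99
```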

Slide 37

Slide 37 text

https://www.youtube.com/watch?v=sx8JkdbNgdU

Slide 38

Slide 38 text

Reading Discussion
• Capability Control:
– Boxing (limit access)
– Incentive (cryptotoken rewards)
– Stunting (constraints on abilities)
– Tripwires (diagnostics)
• Motivation Selection:
– Direct specification
– Domesticity (limit scope)
– Indirect normativity
– Augmentation
For your topic:
1. Explain what it is
2. Why Bostrom doesn’t think it is sufficient
3. Why it could work
4. Argue for or against its effectiveness

Slide 39

Slide 39 text

https://www.youtube.com/watch?v=3TYT1QfdfsM