Slide 1

Slide 1 text

Papers We Love, June 26
Contextual Decision w/ Low Technical Debt
https://arxiv.org/abs/1606.03966

Slide 2

Slide 2 text

Ex: Which news?
Repeatedly:
1. Observe features of user+articles
2. Choose a news article
3. Observe click-or-not
Goal: Maximize fraction of clicks

Slide 3

Slide 3 text

No content

Slide 4

Slide 4 text

… > 25% increase in clicks (without much tuning)

Slide 5

Slide 5 text

[Formulas garbled in extraction: data tuples and an arg max over a conditional term; the symbols are not recoverable] ... test fails :(

Slide 6

Slide 6 text

No content

Slide 7

Slide 7 text

No content

Slide 8

Slide 8 text

Contextual Bandits, not Supervised!
Repeatedly:
1. Observe features x
2. Choose action a ∈ A
3. Observe reward r for the chosen action
Goal: Maximize expected reward
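
The loop above can be sketched in a few lines. This is a minimal illustration, not the system described in the talk: it assumes epsilon-greedy exploration, a toy two-context environment (`CLICK_RATE` is made up), and a running-mean reward estimate per (context, action) pair. The function names `choose_action` and `run_loop` are hypothetical. The key point it shows is that only the chosen action's reward is observed, and that the probability of the chosen action is recorded alongside it.

```python
import random

# Hypothetical environment: click probability for each (context, action).
CLICK_RATE = {0: [0.1, 0.6, 0.2], 1: [0.7, 0.1, 0.2]}

def choose_action(scores, epsilon, rng):
    """Epsilon-greedy choice; returns (action, probability of that action)."""
    n = len(scores)
    greedy = max(range(n), key=lambda a: scores[a])
    action = rng.randrange(n) if rng.random() < epsilon else greedy
    # The greedy action can be picked by either branch; others only by exploring.
    prob = epsilon / n + ((1.0 - epsilon) if action == greedy else 0.0)
    return action, prob

def run_loop(rounds=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    sums = {x: [0.0] * 3 for x in CLICK_RATE}
    counts = {x: [0] * 3 for x in CLICK_RATE}
    log = []
    for _ in range(rounds):
        x = rng.choice([0, 1])                                # 1. observe features
        scores = [s / c if c else 0.0 for s, c in zip(sums[x], counts[x])]
        a, p = choose_action(scores, epsilon, rng)            # 2. choose action
        r = 1.0 if rng.random() < CLICK_RATE[x][a] else 0.0   # 3. observe reward
        sums[x][a] += r                                       # update chosen action only
        counts[x][a] += 1
        log.append((x, a, r, p))
    return log
```

Unlike supervised learning, the rewards of the actions that were not chosen never appear in `log`, which is why the recorded probability matters for later learning.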

Slide 9

Slide 9 text

No content

Slide 10

Slide 10 text

Explore → Log → Learn → Deploy
What could possibly go wrong?

Slide 11

Slide 11 text

Architecture diagram.
Components: App, Client Library or Web API, Join Server, Online Learning, Offline Learning, Policy.
Flows: context, decision, reward.
Logged tuples: symbols lost in extraction.
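
The role of the Join Server in the diagram can be illustrated with a small sketch. It assumes, as a simplification, that each decision is logged with a unique event key and that the reward arrives later under the same key; the function name `join_events`, the dictionary layout, and the default reward of 0.0 for events with no reported reward are all hypothetical choices for illustration, not the actual service's interface.

```python
def join_events(decisions, rewards, default_reward=0.0):
    """Join decision records with later reward reports by event key.

    decisions: dict mapping key -> (context, action, probability)
    rewards:   dict mapping key -> reward
    Returns a list of (context, action, reward, probability) tuples.
    """
    joined = []
    for key, (x, a, p) in decisions.items():
        # Assumption: a missing reward report is treated as the default reward.
        r = rewards.get(key, default_reward)
        joined.append((x, a, r, p))
    return joined

# Usage: two decisions, only the first received a click.
decisions = {"e1": ({"user": "u1"}, 2, 0.5), "e2": ({"user": "u2"}, 0, 0.9)}
rewards = {"e1": 1.0}
joined = join_events(decisions, rewards)
```

A real join would also need a policy for how long to wait for a reward before emitting the record; that is left out here.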

Slide 12

Slide 12 text

Same architecture diagram as Slide 11, with the added label: Explore

Slide 13

Slide 13 text

Same architecture diagram as Slide 11, with the added label: Log

Slide 14

Slide 14 text

Same architecture diagram as Slide 11, with the added label: Learn

Slide 15

Slide 15 text

Same architecture diagram as Slide 11, with the added label: Deploy

Slide 16

Slide 16 text

Same architecture diagram as Slide 11, with the added labels: Offline Learn, Data
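
Offline learning from logged exploration data depends on being able to score a candidate policy without deploying it. A minimal sketch, assuming the log holds tuples of the form (context, action, reward, probability): inverse propensity scoring is a standard estimator for this setting, shown here as an illustration rather than as the exact method behind the slide. The function name `ips_estimate` and the toy log are hypothetical.

```python
def ips_estimate(log, policy):
    """Inverse propensity scoring estimate of a policy's average reward.

    log:    list of (context, action, reward, probability) tuples, where
            probability is the chance the logging policy chose that action.
    policy: function mapping a context to an action.
    """
    if not log:
        raise ValueError("log must not be empty")
    total = 0.0
    for x, a, r, p in log:
        # Only records where the candidate policy agrees with the logged
        # action contribute; dividing by p corrects for how often the
        # logging policy chose that action.
        if policy(x) == a:
            total += r / p
    return total / len(log)

# Usage on a toy log with integer contexts.
toy_log = [(0, 1, 1.0, 0.5), (0, 0, 0.0, 0.5), (1, 0, 1.0, 0.25), (1, 1, 0.0, 0.75)]
value = ips_estimate(toy_log, lambda x: 1 if x == 0 else 0)
```

The estimate is only meaningful when every logged probability is greater than zero, which is one reason the exploration step has to record it.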

Slide 17

Slide 17 text

No content

Slide 18

Slide 18 text

http://ds.microsoft.com
http://aka.ms/mwt
http://arxiv.org/abs/1606.03966
http://hunch.net/~vw