
John Langford on Making Contextual Decisions with Low Technical Debt

Applications and systems are constantly faced with decisions that require picking from a set of actions based on contextual information. Reinforcement-based learning algorithms such as contextual bandits can be very effective in these settings, but applying them in practice is fraught with technical debt, and no general system exists that supports them completely. We address this and create the first general system for contextual learning, called the Decision Service. Existing systems often suffer from technical debt that arises from issues like incorrect data collection and weak debuggability, issues we systematically address through our ML methodology and system abstractions. The Decision Service enables all aspects of contextual bandit learning using four system abstractions which connect together in a loop: explore (the decision space), log, learn, and deploy. Notably, our new explore and log abstractions ensure the system produces correct, unbiased data, which our learner uses for online learning and to enable real-time safeguards, all in a fully reproducible manner.

The Decision Service has a simple user interface and works with a variety of applications: we present two live production deployments for content recommendation that achieved click-through improvements of 25-30%, another with an 18% revenue lift on the landing page, and ongoing applications in tech support and machine failure handling. The service makes real-time decisions and learns continuously and scalably, while significantly lowering technical debt.
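To make the explore and log abstractions concrete, here is a minimal sketch in Python (illustrative names, not the Decision Service's actual client API) of epsilon-greedy exploration that records the probability of every chosen action, which is what keeps the logged (x, a, r, p) data unbiased:

```python
import random

def choose_action(score, context, actions, epsilon=0.1):
    """Epsilon-greedy exploration: return the chosen action and its probability."""
    best = max(actions, key=lambda a: score(context, a))  # the current policy's pick
    action = random.choice(actions) if random.random() < epsilon else best
    # Probability that this particular action was selected under the scheme above.
    p = (1 - epsilon) + epsilon / len(actions) if action == best else epsilon / len(actions)
    return action, p

log = []  # the "log" abstraction: exploration data kept as (x, a, r, p) records

def record(context, action, probability, reward):
    log.append({"x": context, "a": action, "r": reward, "p": probability})

# Toy usage with a dummy scorer standing in for the deployed policy.
score = lambda x, a: float(len(a))
action, p = choose_action(score, "user42", ["sports", "tech", "finance"])
record("user42", action, p, reward=1.0)  # the reward arrives later from the app
```

Because the probability is stored with each decision, later policies can be trained and evaluated from this log without bias.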


Papers_We_Love

June 26, 2017

Transcript

  1. Papers We Love June 26 https://arxiv.org/abs/1606.03966 Contextual Decisions w/ Low Technical Debt
  2. Ex: Which news? Repeatedly: 1. Observe features of user+articles 2. Choose a news article. 3. Observe click-or-not Goal: Maximize fraction of clicks
  3. None
  4. … > 25% increase in clicks (without much tuning)

  5. From data (x, a, r), learn r̂(x, a) and act by arg max_a r̂(x, a). Test fails ☹
  6. None
  7. None
  8. Contextual Bandits not Supervised! Repeatedly: 1. Observe features x 2. Choose action a ∈ A 3. Observe reward r Goal: Maximize expected reward
  9. None

  10. Explore Log Learn Deploy What could possibly go wrong?

  11. Client Library Or Web API Join Server Online Learning Offline Learning Policy App context decision reward (x, a, p, key), (x, a, r, p)
  12. Client Library Or Web API Join Server Online Learning Offline Learning Policy App context decision reward (x, a, p, key), (x, a, r, p) Explore
  13. Client Library Or Web API Join Server Online Learning Offline Learning Policy App context decision reward (x, a, p, key), (x, a, r, p) Log
  14. Client Library Or Web API Join Server Online Learning Offline Learning Policy App context decision reward (x, a, p, key), (x, a, r, p) Learn
  15. Client Library Or Web API Join Server Online Learning Offline Learning Policy App context decision reward (x, a, p, key), (x, a, r, p) Deploy
  16. Client Library Or Web API Join Server Online Learning Offline Learning Policy App context decision reward (x, a, p, key), (x, a, r, p) Offline Learn Data (see the sketches after the transcript)
  17. None
  18. http://ds.microsoft.com http://aka.ms/mwt http://arxiv.org/abs/1606.03966 http://hunch.net/~vw
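Slides 11-16 trace one event around the loop: the client logs the decision together with its probability and a unique key, the app later reports a reward under the same key, and the Join Server matches the two into complete (x, a, r, p) examples for the learner. A rough sketch of that join, with hypothetical record layouts and a default reward for events that never report one (an assumption for illustration):

```python
def join_events(decisions, rewards, default_reward=0.0):
    """Match decision records (key, x, a, p) with reward records (key, r)."""
    reward_by_key = dict(rewards)
    examples = []
    for key, x, a, p in decisions:
        r = reward_by_key.get(key, default_reward)  # no reward reported: use the default
        examples.append((x, a, r, p))
    return examples

decisions = [("evt-1", "user42", "sports", 0.93),
             ("evt-2", "user43", "tech", 0.03)]
rewards = [("evt-1", 1.0)]  # evt-2 never produced a click
print(join_events(decisions, rewards))
# [('user42', 'sports', 1.0, 0.93), ('user43', 'tech', 0.0, 0.03)]
```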
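Slide 5's supervised arg-max approach fails because the log only reveals the reward of the action actually taken; slide 8's bandit formulation, plus the logged probabilities, is what lets any candidate policy be evaluated offline. A sketch of the standard inverse propensity score (IPS) estimator over the joined examples (illustrative, not the service's exact learner):

```python
def ips_value(examples, policy):
    """Unbiased estimate of a policy's average reward from (x, a, r, p) logs."""
    total = 0.0
    for x, a, r, p in examples:
        if policy(x) == a:  # only logged actions the candidate policy agrees with count,
            total += r / p  # reweighted by the probability they were logged with
    return total / len(examples)

logged = [("user42", "sports", 1.0, 0.93), ("user43", "tech", 0.0, 0.03)]
always_sports = lambda x: "sports"
print(ips_value(logged, always_sports))  # estimated clicks per decision for this policy
```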