Learning with Fenchel-Young Losses
Han Bao
June 10, 2020
I read the paper "Learning with Fenchel-Young Losses" (JMLR 2020):
https://arxiv.org/abs/1901.02324
Transcript
Learning with Fenchel-Young Losses
Created by Han Bao (PhD at UTokyo CS)
[Blondel, Martins, and Niculae; JMLR 2020]

What are loss functions?
- Measuring the difference between target and prediction
- Example: regression. Make $f(x)$ closer to $y$ with a loss $\ell(y - f(x))$: squared loss, Huber loss
- Example: binary classification. Make $\mathrm{sign}(f(x))$ equal to $\mathrm{sign}(y)$ with a loss $\ell(y f(x))$, where $y f(x) > 0$ is correct and $y f(x) < 0$ is wrong: 0-1 loss, logistic loss, hinge loss
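
The five losses named on this slide can be written down directly. The following is a minimal sketch, not taken from the deck: the function names, the 1/2 scaling of the squared loss, the Huber threshold `delta`, and the convention that a margin of exactly zero counts as wrong are all assumptions made for illustration.

```python
import math


def squared_loss(y, fx):
    """Regression loss on the residual y - f(x); the 1/2 factor is a convention."""
    return 0.5 * (y - fx) ** 2


def huber_loss(y, fx, delta=1.0):
    """Quadratic near zero, linear for residuals larger than delta."""
    r = abs(y - fx)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def zero_one_loss(y, fx):
    """Classification loss on the margin y * f(x), with y in {-1, +1}."""
    return 0.0 if y * fx > 0 else 1.0


def logistic_loss(y, fx):
    """log(1 + exp(-m)) for margin m, computed without overflow."""
    m = y * fx
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def hinge_loss(y, fx):
    """max(0, 1 - m) for margin m."""
    return max(0.0, 1.0 - y * fx)


if __name__ == "__main__":
    print(squared_loss(1.0, 3.0), huber_loss(0.0, 3.0))
    print(zero_one_loss(1, -0.5), logistic_loss(1, 0.0), hinge_loss(1, 2.0))
```

The regression losses depend only on the residual $y - f(x)$ and the classification losses only on the margin $y f(x)$, which matches the two axes on the slide.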
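
Fenchel-Young Loss
Definition: Let $\Omega : \mathbb{R}^d \to \mathbb{R}$ be a "prediction" regularizer. The Fenchel-Young loss is
$L_\Omega(\theta; y) := \Omega^\star(\theta) + \Omega(y) - \langle \theta, y \rangle$
- $\theta \in \mathbb{R}^d$: prediction score
- $y \in \mathrm{dom}(\Omega)$: target label
- Fenchel conjugate: $\Omega^\star(\theta) := \sup_{\mu \in \mathrm{dom}(\Omega)} \langle \theta, \mu \rangle - \Omega(\mu)$
What on earth does it mean? Potential questions (to be answered):
- Q. What is a "prediction" regularizer?
- Q. Why do we need regularization of prediction?
- Q. Why is the loss defined as above?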
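
Pipeline of Supervised Learning
- Input space → score space ($\mathbb{R}^d$) → output space
- A parametrized model $f_W$ maps the input $x$ to the score $\theta$; a prediction function $\hat{y}_\Omega$ maps $\theta$ to the output $\hat{y}$
- Example: classification (one-hot vectors). A DNN $f_W$ produces the score $\theta = (0.821, 1.215, \dots, 5.382, \dots, -1.012)$, and argmax as $\hat{y}_\Omega$ produces the output $(0, 0, \dots, 1, \dots, 0)$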

Prediction Functions: argmax
$\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle$
- Chooses a vertex of the simplex (figure: $\Delta^3$)
- Tractable
- Non-unique solution
- Non-differentiable
- No uncertainty

Prediction Functions: softmax
$\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y)$, where $H_S$ is the Shannon entropy
- Regularizes the argmax solution towards the center of the simplex
- Ordinary expression: $\left[ \operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y) \right]_i = \frac{\exp \theta_i}{\sum_{j=1}^d \exp \theta_j}$
- Differentiable
- Uncertainty
- Dense support
- For large $d$, the sum $\sum_{j=1}^d$ is intractable
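
To make the "regularized argmax" reading of softmax concrete, the sketch below evaluates the entropy-regularized objective at the softmax output and at a few other points of the simplex. It is an illustration under assumptions, not code from the deck: the helper names and the test score vector are made up, and only the Python standard library is used.

```python
import math


def softmax(theta):
    """Closed-form maximizer of <theta, y> + H_S(y) over the simplex."""
    shift = max(theta)  # subtracting the max avoids overflow in exp
    exps = [math.exp(t - shift) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def shannon_entropy(y):
    """H_S(y) = -sum_j y_j log y_j, with 0 log 0 treated as 0."""
    return -sum(p * math.log(p) for p in y if p > 0.0)


def objective(theta, y):
    """The regularized objective <theta, y> + H_S(y)."""
    return sum(t * p for t, p in zip(theta, y)) + shannon_entropy(y)


if __name__ == "__main__":
    theta = [1.0, 2.0, -0.5]  # hypothetical scores
    p = softmax(theta)
    print("softmax:", p)
    print("objective at softmax:", objective(theta, p))
    print("objective at vertex :", objective(theta, [0.0, 1.0, 0.0]))
    print("objective at center :", objective(theta, [1 / 3, 1 / 3, 1 / 3]))
```

The value of the objective at the softmax output equals $\log \sum_j \exp \theta_j$, which is the Fenchel conjugate that appears later in the loss definition.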

Prediction Functions: sparsemax
$\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle - \frac{1}{2} \|y\|_2^2$ (Tsallis entropy)
- Euclidean projection onto the simplex: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle - \frac{1}{2} \|y\|_2^2 = \operatorname{argmin}_{y \in \Delta^d} \|y - \theta\|_2^2$
- Tends to be sparse
- Figures: 2D case ($\Delta^2$) and 3D case ($\Delta^3$), each with regions marked "dense" and "sparse". Points in the dense region have a dense projection; all other points have a sparse projection. The dense region is far smaller than $\mathbb{R}^d$
- Tractable (still, it depends)
- Differentiable
- Unique solution
- Sparse support → interpretable
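
The Euclidean projection onto the simplex has a well-known sort-and-threshold solution. The sketch below uses that standard algorithm; the deck itself does not show one, so the function names and example scores are assumptions made for illustration.

```python
def sparsemax(theta):
    """Euclidean projection of theta onto the probability simplex.

    Sort the scores in decreasing order, find the largest support size j
    with 1 + j * z_(j) > z_(1) + ... + z_(j), and shift by the matching
    threshold tau before clipping at zero.
    """
    z = sorted(theta, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j  # threshold for support size j
        else:
            break  # the condition cannot hold again for larger j
    return [max(t - tau, 0.0) for t in theta]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


if __name__ == "__main__":
    theta = [0.5, 0.3, -1.0]  # hypothetical scores
    p = sparsemax(theta)
    print("sparsemax:", p)  # the last coordinate is exactly zero
    print("distance to theta:", squared_distance(p, theta))
    print("distance from center:", squared_distance([1 / 3] * 3, theta))
```

Unlike softmax, coordinates whose scores fall below the threshold receive exactly zero probability, which is the sparse support mentioned on the slide.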
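
"Regularized" Prediction
- argmax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle$
- softmax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y)$ (Shannon entropy)
- sparsemax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle - \frac{1}{2} \|y\|_2^2$ (Tsallis entropy)
Definition: Let $\Omega : \mathbb{R}^d \to \mathbb{R}$ be a regularizer. The prediction function regularized by $\Omega$ is
$\hat{y}_\Omega(\theta) = \operatorname{argmax}_{y \in \mathrm{dom}(\Omega)} \langle \theta, y \rangle - \Omega(y)$
- $\theta \in \mathbb{R}^d$: prediction score
- $\Omega$ moves the prediction away from the vertices
- Be aware: this is different from the usual regularization $\mathrm{Loss}(f_W) + \lambda \|W\|_F^2$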

Further: Structured Prediction (this part just motivates regularization further; you may skip it)
Example: Sequence labeling
- The output space $\mathcal{Y}$ consists of structured objects such as graphs
- Input $x$: "I love loss functions" (length $n$)
- Output candidates $y$: label sequences such as N N N N, V N N N, …, with scores $\theta$ living on the probability simplex
- Set of labels {V, N, J, …} of size $m$
- Exponential: $|\mathcal{Y}| = m^n$

Further: Structured Prediction (this part just motivates regularization further; you may skip it)
Example: Linear assignment (e.g., listwise ranking)
- The output space $\mathcal{Y}$ consists of structured objects such as graphs
- Input $x$: a list of docs (number of docs: $n$)
- Output candidates $y$: assignments of the docs, with scores $\theta$ living on the Birkhoff polytope
- Exponential: $|\mathcal{Y}| = n!$

Further: Structured Prediction (this part just motivates regularization further; you may skip it)
- A low-dimensional inherent structure exists (though $|\mathcal{Y}|$ is exponentially large)
- Example: Sequence labeling with inputs $x_1, x_2, x_3, x_4$ and labels $y_1, y_2, y_3, y_4$
  - Assumption: the input word matters
  - Assumption: the previous label matters
- Pipeline: input $x$ → (model $f_W$) → low-dim. score space $\eta$ → (linear transformation $M$, problem-dependent) → score space $\theta$
- ⇒ low-dim. structure of size $O(nm^2)$, widely used in linear-chain CRFs
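
Further: Structured Prediction (this part just motivates regularization further; you may skip it)
- MAP inference: $\operatorname{argmax}_{y \in \mathrm{conv}(\mathcal{Y})} \langle \theta, y \rangle$. Tractable, no uncertainty, no differentiation
- Marginal inference: $\operatorname{argmax}_{y \in \mathrm{conv}(\mathcal{Y})} \langle \theta, y \rangle + H_S(y)$ (Shannon entropy). Differentiable, uncertainty, often intractable, dense support
- SparseMAP: $\operatorname{argmax}_{y \in \mathrm{conv}(\mathcal{Y})} \langle \theta, y \rangle + H_2(y)$ (Tsallis entropy). Differentiable, uncertainty, tractable (Frank-Wolfe), sparse support
Remark: sparsemax does not utilize the low-dim. structure
Remark: Tractability
- Sequence labeling: MAP (Viterbi) and marginal are $O(nm^2)$
- Linear assignment: MAP (Hungarian) is $O(n^3)$; marginal is #P-complete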
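
The $O(nm^2)$ cost quoted for MAP inference in sequence labeling comes from dynamic programming over positions and label pairs. The sketch below is a plain Viterbi pass checked against brute-force enumeration of all $m^n$ sequences. The score layout (a table `unary[t][b]` for "input word matters" and a table `trans[a][b]` for "previous label matters") and all numbers are assumptions chosen for illustration, not taken from the deck.

```python
from itertools import product


def sequence_score(path, unary, trans):
    """Total score of a label sequence under unary and transition scores."""
    total = sum(unary[t][label] for t, label in enumerate(path))
    total += sum(trans[a][b] for a, b in zip(path, path[1:]))
    return total


def viterbi(unary, trans):
    """MAP label sequence in O(n m^2) time for n positions and m labels."""
    m = len(trans)
    score = list(unary[0])  # best score of any prefix ending in each label
    backpointers = []
    for t in range(1, len(unary)):
        new_score, ptr = [], []
        for b in range(m):
            best_a = max(range(m), key=lambda a: score[a] + trans[a][b])
            ptr.append(best_a)
            new_score.append(score[best_a] + trans[best_a][b] + unary[t][b])
        score = new_score
        backpointers.append(ptr)
    last = max(range(m), key=lambda a: score[a])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, max(score)


def brute_force(unary, trans):
    """Enumerate all m^n sequences; only feasible for tiny problems."""
    n, m = len(unary), len(trans)
    best = max(product(range(m), repeat=n),
               key=lambda y: sequence_score(y, unary, trans))
    return list(best), sequence_score(best, unary, trans)


if __name__ == "__main__":
    unary = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.1]]  # n = 3 positions, m = 2 labels
    trans = [[0.0, -1.0], [-1.0, 0.0]]            # switching labels is penalized
    print(viterbi(unary, trans))
    print(brute_force(unary, trans))
```

The brute-force routine is there only to illustrate the exponential size of the output space; the dynamic program visits each of the $n$ positions and $m^2$ label pairs once.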

How to design loss?
- Pipeline: input space → score space → output space, with $x \mapsto \theta$ via $f_W$ and $\theta \mapsto \hat{y}$ via $\hat{y}_\Omega$
- Prediction $\hat{y}$ versus target label $y$. Q. How to measure?
Prediction function → Loss function
- softmax, $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y)$ → cross-entropy, $\log \sum_i \exp \theta_i - \theta_k$ (with $k$ the target class). Q. Why is it good?
- sparsemax, $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle - \frac{1}{2} \|y\|_2^2$ → ??? Q. How to design?

Fenchel-Young Loss
Definition: Let $\Omega : \mathbb{R}^d \to \mathbb{R}$ be a "prediction" regularizer. The Fenchel-Young loss is
$L_\Omega(\theta; y) := \Omega^\star(\theta) + \Omega(y) - \langle \theta, y \rangle$
- $\theta$: prediction score
- $y$: target label
- Fenchel conjugate: $\Omega^\star(\theta) := \sup_{\mu \in \mathrm{dom}(\Omega)} \langle \theta, \mu \rangle - \Omega(\mu)$
Two key properties:
- FY loss is non-negative
- Correct prediction iff zero loss: $L_\Omega(\theta; y) = 0 \iff y = \hat{y}_\Omega(\theta)$
Minimizing FY loss makes the prediction close to the target label.
Proof: Use the Fenchel-Young inequality: $\Omega^\star(\theta) + \Omega(y) \ge \{\langle \theta, y \rangle - \Omega(y)\} + \Omega(y) = \langle \theta, y \rangle$
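
Both properties can be checked numerically for one concrete regularizer. The sketch below assumes $\Omega$ is the negative Shannon entropy restricted to the simplex, for which the conjugate is the log-sum-exp function and the regularized prediction is softmax. The function names and test vectors are made up for illustration.

```python
import math


def logsumexp(theta):
    """Fenchel conjugate of the negative Shannon entropy on the simplex."""
    shift = max(theta)
    return shift + math.log(sum(math.exp(t - shift) for t in theta))


def neg_entropy(y):
    """Omega(y) = sum_j y_j log y_j, with 0 log 0 treated as 0."""
    return sum(p * math.log(p) for p in y if p > 0.0)


def fy_loss(theta, y):
    """L_Omega(theta; y) = Omega*(theta) + Omega(y) - <theta, y>."""
    inner = sum(t * p for t, p in zip(theta, y))
    return logsumexp(theta) + neg_entropy(y) - inner


def softmax(theta):
    """Regularized prediction for this choice of Omega."""
    lse = logsumexp(theta)
    return [math.exp(t - lse) for t in theta]


if __name__ == "__main__":
    theta = [1.0, 2.0, -0.5]  # hypothetical scores
    print("loss at y = softmax(theta):", fy_loss(theta, softmax(theta)))
    print("loss at y = e_1           :", fy_loss(theta, [1.0, 0.0, 0.0]))
    print("loss at a mixed target    :", fy_loss(theta, [0.2, 0.5, 0.3]))
```

The first value is zero up to rounding, and the other two are strictly positive, in line with the two properties. For a one-hot target $e_k$ the loss reduces to $\log \sum_j \exp \theta_j - \theta_k$.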

Geometrical Interpretation
- For a regularizer $\Omega(y)$, draw the tangent at $\hat{y}_\Omega(\theta)$: $\mu \mapsto \langle \theta, \mu \rangle - \Omega^\star(\theta)$ (by the definition of the Fenchel conjugate)
- The tangent takes the value $\langle \theta, y \rangle - \Omega^\star(\theta)$ at $y$ and $-\Omega^\star(\theta)$ at the origin
- The loss $L_\Omega(\theta; y)$ is the distance between $\Omega$ and the tangent at $y$ (≈ Bregman divergence)
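
The picture can be spelled out in a few lines of algebra. This is a sketch rather than a statement from the deck: $t_\theta$ is a name introduced here for the tangent, $\hat{y}$ abbreviates $\hat{y}_\Omega(\theta)$, and the final Bregman identity relies on the extra assumption that $\Omega$ is differentiable at $\hat{y}$ with $\theta = \nabla \Omega(\hat{y})$.

```latex
% Tangent of \Omega at \hat{y} = \hat{y}_\Omega(\theta), as a function of \mu:
%   t_\theta(\mu) = \langle \theta, \mu \rangle - \Omega^\star(\theta).
% It touches \Omega at \hat{y} because the supremum in \Omega^\star is attained there:
%   \Omega^\star(\theta) = \langle \theta, \hat{y} \rangle - \Omega(\hat{y}).
\begin{align*}
L_\Omega(\theta; y)
  &= \Omega(y) - \bigl( \langle \theta, y \rangle - \Omega^\star(\theta) \bigr)
   = \Omega(y) - t_\theta(y) \\
  &= \Omega(y) - \Omega(\hat{y}) - \langle \theta,\, y - \hat{y} \rangle \\
  &= \Omega(y) - \Omega(\hat{y}) - \langle \nabla \Omega(\hat{y}),\, y - \hat{y} \rangle
   = B_\Omega(y \,\|\, \hat{y})
   \qquad \text{if } \theta = \nabla \Omega(\hat{y}).
\end{align*}
```

The first line is the vertical gap between $\Omega$ and its tangent at $y$; the last line identifies that gap with a Bregman divergence only under the stated assumption.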
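
Example: Shannon Entropy
- $H_S(y) = -\sum_{j=1}^d y_j \log y_j$, used as the regularizer $\Omega = -H_S$
- Prediction: $\hat{y}_{-H_S}(\theta) = \operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y) = \frac{\exp \theta}{\sum_{j=1}^d \exp \theta_j}$ (softmax; the binary softmax is the sigmoid; figure: $\hat{y}(\theta)$ against $\theta$)
- Loss: $L_{-H_S}(\theta; y) = (-H_S)^\star(\theta) - H_S(y) - \langle \theta, y \rangle = \log \sum_{j=1}^d \exp \theta_j - \theta_k$, assuming $y = e_k$
- This is the cross-entropy, which is the logistic loss in the binary case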
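
Example: Tsallis Entropy
- Tsallis $\alpha$-entropy: $H_\alpha(y) = \frac{1}{\alpha(\alpha - 1)} \sum_{j=1}^d (y_j - y_j^\alpha)$
- $\alpha \to 2$: $H_2(y) = \frac{1}{2} \sum_{j=1}^d y_j (1 - y_j)$ (a.k.a. Gini index)
- $\alpha \to 1$: $H_S(y) = -\sum_{j=1}^d y_j \log y_j$ (Shannon entropy)
- Prediction with $\Omega = -H_2$: sparsemax (figure: $\hat{y}(\theta)$ against $\theta$)
- Sparsemax loss: $L_{-H_2}(\theta; y) = (-H_2)^\star(\theta) - H_2(y) - \langle \theta, y \rangle$ (figure: $\ell(m)$ against $m$)
- Specialized to binary classification, this is the modified Huber loss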
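
The two limits of the Tsallis family and the resulting sparsemax loss can be checked numerically. The sketch below is illustrative only: the function names and test vectors are assumptions, and the loss is computed with $\Omega(y) = \frac{1}{2} \|y\|_2^2$ on the simplex, which differs from $-H_2$ by a constant that cancels inside the Fenchel-Young loss.

```python
import math


def tsallis_entropy(y, alpha):
    """H_alpha(y) = sum_j (y_j - y_j^alpha) / (alpha (alpha - 1)), alpha != 1."""
    return sum(p - p ** alpha for p in y) / (alpha * (alpha - 1.0))


def gini_entropy(y):
    """H_2(y) = (1/2) sum_j y_j (1 - y_j)."""
    return 0.5 * sum(p * (1.0 - p) for p in y)


def shannon_entropy(y):
    """H_S(y) = -sum_j y_j log y_j, with 0 log 0 treated as 0."""
    return -sum(p * math.log(p) for p in y if p > 0.0)


def sparsemax(theta):
    """Euclidean projection of theta onto the simplex (sort and threshold)."""
    z = sorted(theta, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j
        else:
            break
    return [max(t - tau, 0.0) for t in theta]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sparsemax_loss(theta, y):
    """Fenchel-Young loss for Omega(y) = ||y||^2 / 2 on the simplex."""
    p = sparsemax(theta)
    conjugate = dot(theta, p) - 0.5 * dot(p, p)  # the sup is attained at sparsemax
    return conjugate + 0.5 * dot(y, y) - dot(theta, y)


if __name__ == "__main__":
    y = [0.7, 0.2, 0.1]
    print(tsallis_entropy(y, 2.0), gini_entropy(y))           # alpha = 2
    print(tsallis_entropy(y, 1.0 + 1e-6), shannon_entropy(y))  # alpha near 1
    theta = [0.5, 0.3, -1.0]
    print(sparsemax_loss(theta, sparsemax(theta)))  # zero up to rounding
    print(sparsemax_loss(theta, [0.0, 0.0, 1.0]))   # positive for a wrong target
```

Evaluating the conjugate at the sparsemax output avoids solving a separate optimization problem, since the regularized prediction is by definition the maximizer inside $\Omega^\star$.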

Other Nice Properties (Overview)
- Separation margin: a finite score attains zero loss if $\Omega$ is "sparse"
  - Logistic (Shannon): $\ell(m)$ vanishes only at $m \to \infty$ ⇒ no separation margin
  - Sparsemax (Tsallis): separation margin ⇒ no penalization on large enough prediction margins
- Calibrated surrogate: minimizing FY loss leads to minimizing classification error (more discussion is needed for structured prediction)
- Efficient optimization: always convex by nature, and optimizable with the Frank-Wolfe algorithm (iteratively minimizing a linear approximation)
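
The separation-margin contrast is easiest to see in the binary case. The sketch below assumes two classes with scores $(m, 0)$ and the first class as target, so that $m$ is the margin; under that assumption the sparsemax loss has the closed form $0$ for $m \ge 1$, $(1 - m)^2 / 4$ for $|m| < 1$, and $-m$ for $m \le -1$. The function names are hypothetical.

```python
import math


def logistic_binary_loss(m):
    """Shannon case: log(1 + exp(-m)), strictly positive for every finite m."""
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))


def sparsemax_binary_loss(m):
    """Tsallis (alpha = 2) case: exactly zero once the margin reaches 1."""
    if m >= 1.0:
        return 0.0
    if m <= -1.0:
        return -m
    return 0.25 * (1.0 - m) ** 2


if __name__ == "__main__":
    for m in [-2.0, -1.0, 0.0, 0.5, 1.0, 2.5, 10.0]:
        print(m, logistic_binary_loss(m), sparsemax_binary_loss(m))
```

For margins of at least 1 the sparsemax loss is exactly zero, while the logistic loss stays positive and only decays towards zero as the margin grows.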

Summary
- Pipeline: input space → score space → output space, with $x \mapsto \theta$ via $f_W$ and $\theta \mapsto \hat{y}$ via $\hat{y}_\Omega$; the prediction $\hat{y}$ is compared with the target label $y$
- Regularized prediction: $\hat{y}_\Omega(\theta) = \operatorname{argmax}_{y \in \mathrm{dom}(\Omega)} \langle \theta, y \rangle - \Omega(y)$ (makes the prediction sparse, tractable, …)
- Fenchel-Young loss: $L_\Omega(\theta; y) := \Omega^\star(\theta) + \Omega(y) - \langle \theta, y \rangle$ (a systematic way of constructing a loss from $\Omega$)