Learning with Fenchel-Young Losses
Han Bao
June 10, 2020
I read the paper "Learning with Fenchel-Young Losses" (JMLR 2020):
https://arxiv.org/abs/1901.02324
Transcript
Learning with Fenchel-Young Losses
Created by Han Bao, PhD at UTokyo CS
[Blondel, Martins, and Niculae; JMLR 2020]
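
What are loss functions?
- A loss function measures the difference between the target and the prediction.
- Example: regression
  - Goal: make $f(x)$ close to $y$.
  - Losses of the residual, $\ell(y - f(x))$: squared loss, Huber loss.
- Example: binary classification
  - Goal: make $\mathrm{sign}(f(x))$ equal to $\mathrm{sign}(y)$.
  - Losses of the margin, $\ell(y f(x))$: 0-1 loss, logistic loss, hinge loss.
  - The prediction is correct when $y f(x) > 0$ and wrong when $y f(x) < 0$.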
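
To make the losses named on this slide concrete, here is a minimal Python sketch. The function names, the unscaled form of the squared loss, and the Huber threshold `delta=1.0` are assumptions chosen for illustration; the slides only name the losses.

```python
import math

def squared_loss(y, fx):
    """Squared residual (y - f(x))^2 for regression."""
    r = y - fx
    return r * r

def huber_loss(y, fx, delta=1.0):
    """Quadratic near zero, linear in the tails; delta is an assumed threshold."""
    r = abs(y - fx)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)

def zero_one_loss(y, fx):
    """1 if sign(f(x)) disagrees with sign(y), with y in {-1, +1}."""
    return 0.0 if y * fx > 0 else 1.0

def logistic_loss(y, fx):
    """log(1 + exp(-y f(x))), written to avoid overflow for large |y f(x)|."""
    m = y * fx
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))

def hinge_loss(y, fx):
    """max(0, 1 - y f(x))."""
    return max(0.0, 1.0 - y * fx)
```

The regression losses depend only on the residual $y - f(x)$, while the classification losses depend only on the margin $y f(x)$, which is why the slide plots them against those two quantities.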
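
Fenchel-Young Loss
Definition: Let $\Omega : \mathbb{R}^d \to \mathbb{R}$ be a "prediction" regularizer. The Fenchel-Young loss is
$L_\Omega(\theta; y) := \Omega^\star(\theta) + \Omega(y) - \langle \theta, y \rangle$,
where $\theta \in \mathbb{R}^d$ is the prediction score, $y \in \mathrm{dom}(\Omega)$ is the target label, and
$\Omega^\star(\theta) := \sup_{\mu \in \mathrm{dom}(\Omega)} \langle \theta, \mu \rangle - \Omega(\mu)$
is the Fenchel conjugate of $\Omega$.
What on earth does it mean? Potential questions (to be answered):
- Q. What is a "prediction" regularizer?
- Q. Why do we need regularization of the prediction?
- Q. Why is the loss defined as above?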
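
As a small worked instance of the definition, consider the squared-norm regularizer on all of $\mathbb{R}^d$. This particular choice of $\Omega$ is an assumption made here for illustration; it is picked because the conjugate has a closed form.

```latex
\begin{align*}
\Omega(y) &= \tfrac{1}{2}\|y\|_2^2, \qquad \mathrm{dom}(\Omega) = \mathbb{R}^d \\
\Omega^\star(\theta) &= \sup_{\mu \in \mathbb{R}^d} \langle \theta, \mu \rangle - \tfrac{1}{2}\|\mu\|_2^2
  = \tfrac{1}{2}\|\theta\|_2^2 \quad (\text{attained at } \mu = \theta) \\
L_\Omega(\theta; y) &= \tfrac{1}{2}\|\theta\|_2^2 + \tfrac{1}{2}\|y\|_2^2 - \langle \theta, y \rangle
  = \tfrac{1}{2}\|\theta - y\|_2^2
\end{align*}
```

Under this assumption the Fenchel-Young loss reduces to the squared loss, which is non-negative and zero exactly when $\theta = y$.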
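
Pipeline of Supervised Learning
- Input space → score space ($\mathbb{R}^d$) → output space: $x \mapsto \theta \mapsto \hat{y}$.
- A parametrized model $f_W$ (e.g. a DNN) maps the input $x$ to the score $\theta$.
- A prediction function $\hat{y}_\Omega$ (e.g. argmax) maps the score $\theta$ to the output $\hat{y}$.
- Example: classification with 1-hot vectors. The score $\theta = (0.821, 1.215, \dots, 5.382, \dots, -1.012)$ is mapped by argmax to the output $\hat{y} = (0, 0, \dots, 1, \dots, 0)$.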

Prediction Functions
- argmax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle$
  - Chooses a vertex of the simplex (illustrated on $\Delta^3$).
  - Pros: tractable.
  - Cons: non-unique solution, non-differentiable, no uncertainty.
- softmax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y)$, where $H_S$ is the Shannon entropy
  - Regularizes the prediction towards the center of the simplex.
  - Ordinary expression: $\left[ \exp\theta_i / \sum_{j=1}^d \exp\theta_j \right]_i$.
  - Pros: differentiable, expresses uncertainty.
  - Cons: dense support; the sum $\sum_{j=1}^d$ is intractable for large $d$.
- sparsemax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle - \frac{1}{2}\|y\|_2^2$ (Tsallis entropy)
  - Euclidean projection onto the simplex: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle - \frac{1}{2}\|y\|_2^2 = \operatorname{argmin}_{y \in \Delta^d} \|y - \theta\|_2^2$.
  - The projection tends to be sparse (figures: 2d case on $\Delta^2$ and 3d case on $\Delta^3$, each split into a dense and a sparse region). Scores in the dense region get a dense projection, all other scores get a sparse projection, and the dense region is far smaller than $\mathbb{R}^d$.
  - Pros: tractable (still, it depends), differentiable, unique solution, sparse support (→ interpretable).
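
The contrast between the dense softmax output and the sparse sparsemax output can be seen in a short Python sketch. The sparsemax routine below uses the common sort-and-threshold method for Euclidean projection onto the simplex; the slides do not say which algorithm they have in mind, so treat this as one possible implementation. Function names are illustrative.

```python
import math

def softmax(theta):
    """exp(theta_i) / sum_j exp(theta_j), shifted by max(theta) for stability."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(theta):
    """Euclidean projection of theta onto the probability simplex."""
    z = sorted(theta, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        if 1.0 + k * zk <= cumsum:
            break  # coordinate k and all smaller ones fall outside the support
        tau = (cumsum - 1.0) / k  # threshold for a support of size k
    return [max(t - tau, 0.0) for t in theta]

theta = [2.0, 1.0, 0.1]
dense = softmax(theta)     # every entry is strictly positive
sparse = sparsemax(theta)  # lands on a vertex of the simplex for this theta
```

For this `theta` the gap between the top two scores is large enough that sparsemax puts all mass on the first class, while softmax still assigns positive probability to every class.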
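
"Regularized" Prediction
- argmax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle$
- softmax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y)$ (Shannon entropy)
- sparsemax: $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle - \frac{1}{2}\|y\|_2^2$ (Tsallis entropy)
Definition: Let $\Omega : \mathbb{R}^d \to \mathbb{R}$ be a regularizer. The prediction function regularized by $\Omega$ is
$\hat{y}_\Omega(\theta) = \operatorname{argmax}_{y \in \mathrm{dom}(\Omega)} \langle \theta, y \rangle - \Omega(y)$,
where $\theta \in \mathbb{R}^d$ is the prediction score.
- In this notation, softmax corresponds to $\Omega = -H_S$ and sparsemax to $\Omega = \frac{1}{2}\|\cdot\|_2^2$, both with domain $\Delta^d$.
- The regularizer moves the prediction away from the vertices.
- Be aware: this is different from the usual regularization of the model parameters, $\mathrm{Loss}(f_W) + \lambda \|W\|_F^2$.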

Further: Structured Prediction (this part just motivates regularization further; you may skip it)
The output space consists of structured objects such as graphs.
- Example: sequence labeling
  - Input $x$: the sentence "I love loss functions" (length $n$).
  - Output candidates $y$: label sequences built from a set of labels {V, N, J, ...} of size $m$.
  - Scores $\theta$ over the candidates; predictions live in the probability simplex over the candidates.
  - The number of candidates is exponential: $|\mathcal{Y}| = m^n$.
- Example: linear assignment (e.g. listwise ranking)
  - Input $x$: $n$ docs; output candidates $y$: orderings of the docs; scores $\theta$.
  - Predictions live in the Birkhoff polytope.
  - The number of candidates is exponential: $|\mathcal{Y}| = n!$.
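
A quick way to check the two candidate counts is to enumerate the output sets for a tiny instance. The sketch below assumes three labels (N, V, J) and the four-word sentence from the slide, plus four documents for the ranking example; the variable names are illustrative.

```python
import itertools
import math

labels = ["N", "V", "J"]                      # assumed label set, m = 3
words = ["I", "love", "loss", "functions"]    # sentence length n = 4

# Sequence labeling: one label per word gives m ** n candidate label sequences.
taggings = list(itertools.product(labels, repeat=len(words)))
num_taggings = len(taggings)

# Linear assignment / ranking: every ordering of n docs is a candidate, n! in total.
docs = range(4)
rankings = list(itertools.permutations(docs))
num_rankings = len(rankings)
```

Even at this toy size there are 81 label sequences and 24 rankings, and explicit enumeration stops being practical very quickly as $n$ grows.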
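
Further: Structured Prediction (this part just motivates regularization further; you may skip it)
- A low-dimensional inherent structure exists (though $|\mathcal{Y}|$ is exponentially large).
- Example: sequence labeling on inputs $x_1, \dots, x_4$ with labels $y_1, \dots, y_4$
  - Assumption 1: the input word matters.
  - Assumption 2: the previous label matters.
  - ⇒ a low-dimensional structure of size $O(nm^2)$, widely used in linear-chain CRFs.
- Pipeline: the model $f_W$ maps the input $x$ into a low-dimensional score space ($\eta$), and a problem-dependent linear transformation $M$ connects it to the score space ($\theta$).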
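
Further: Structured Prediction (this part just motivates regularization further; you may skip it)
- MAP inference: $\operatorname{argmax}_{y \in \mathrm{conv}(\mathcal{Y})} \langle \theta, y \rangle$
  - Tractable; no uncertainty; no differentiation.
- Marginal inference: $\operatorname{argmax}_{y \in \mathrm{conv}(\mathcal{Y})} \langle \theta, y \rangle + H_S(y)$ (Shannon entropy)
  - Differentiable; expresses uncertainty; often intractable; dense support.
- SparseMAP: $\operatorname{argmax}_{y \in \mathrm{conv}(\mathcal{Y})} \langle \theta, y \rangle + H_2(y)$ (Tsallis entropy)
  - Differentiable; expresses uncertainty; tractable (Frank-Wolfe); sparse support.
- Remark: sparsemax does not utilize the low-dimensional structure.
- Remark (tractability):
  - Sequence labeling: MAP (Viterbi) and marginal inference are both $O(nm^2)$.
  - Linear assignment: MAP (Hungarian) is $O(n^3)$; marginal inference is #P-complete.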
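
The slide names Viterbi as the $O(nm^2)$ MAP solver for sequence labeling. Below is a minimal Python sketch, assuming the score of a label sequence decomposes into per-position scores `unary[t][j]` and label-to-label scores `trans[i][j]`. Both names and this exact decomposition are assumptions for illustration, in the spirit of a linear-chain CRF.

```python
def viterbi(unary, trans):
    """Return (best label sequence, its score) by dynamic programming.

    unary[t][j]: score of label j at position t (n positions, m labels).
    trans[i][j]: score of label i being followed by label j.
    Runs in O(n * m^2) time instead of enumerating all m^n sequences.
    """
    n, m = len(unary), len(unary[0])
    best = list(unary[0])   # best[j]: best score of a prefix ending in label j
    back = []               # back[t-1][j]: best previous label for label j at t
    for t in range(1, n):
        new_best, ptr = [], []
        for j in range(m):
            i_star = max(range(m), key=lambda i: best[i] + trans[i][j])
            new_best.append(best[i_star] + trans[i_star][j] + unary[t][j])
            ptr.append(i_star)
        best = new_best
        back.append(ptr)
    # Pick the best final label, then follow the back-pointers to the start.
    j = max(range(m), key=lambda i: best[i])
    score = best[j]
    path = [j]
    for ptr in reversed(back):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path, score
```

The inner loop over `(i, j)` pairs at each of the `n` positions is where the $O(nm^2)$ cost comes from.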

How to design loss?
- Pipeline: the model $f_W$ maps the input $x$ to the score $\theta$, and the prediction function $\hat{y}_\Omega$ maps $\theta$ to the prediction $\hat{y}$, which is compared with the target label $y$.
- Q. How to measure the difference between the prediction $\hat{y}$ and the target label $y$?
- Prediction function and its loss function:
  - softmax, $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y)$: cross-entropy, $\log \sum_i \exp\theta_i - \theta_k$ ($k$: target class). Q. Why is it good?
  - sparsemax, $\operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle - \frac{1}{2}\|y\|_2^2$: ??? Q. How to design?

Fenchel-Young Loss
Definition: Let $\Omega : \mathbb{R}^d \to \mathbb{R}$ be a "prediction" regularizer. The Fenchel-Young loss is
$L_\Omega(\theta; y) := \Omega^\star(\theta) + \Omega(y) - \langle \theta, y \rangle$,
with prediction score $\theta$, target label $y$, and Fenchel conjugate $\Omega^\star(\theta) := \sup_{\mu \in \mathrm{dom}(\Omega)} \langle \theta, \mu \rangle - \Omega(\mu)$.
Two key properties:
- The FY loss is non-negative.
- Correct prediction iff zero loss: $L_\Omega(\theta; y) = 0 \iff y = \hat{y}_\Omega(\theta)$ (when the maximizer is unique, e.g. for strictly convex $\Omega$).
⇒ Minimizing the FY loss makes the prediction close to the target label.
Proof: Use the Fenchel-Young inequality. Taking $\mu = y$ in the supremum gives $\Omega^\star(\theta) \ge \langle \theta, y \rangle - \Omega(y)$, so
$\Omega^\star(\theta) + \Omega(y) \ge \{\langle \theta, y \rangle - \Omega(y)\} + \Omega(y) = \langle \theta, y \rangle$,
with equality exactly when $y$ attains the supremum.
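
The two key properties can be checked numerically. The sketch below evaluates the Fenchel-Young loss by plugging the regularized prediction into the supremum, since that prediction is the maximizer. It assumes $\Omega = -H_S$ (softmax) and $\Omega = \frac{1}{2}\|\cdot\|_2^2$ (sparsemax) on the simplex; all function names are illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(theta):
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(theta):
    """Euclidean projection onto the simplex (sort-and-threshold)."""
    z = sorted(theta, reverse=True)
    cumsum, tau = 0.0, 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        if 1.0 + k * zk <= cumsum:
            break
        tau = (cumsum - 1.0) / k
    return [max(t - tau, 0.0) for t in theta]

def neg_shannon(y):
    """Omega = -H_S, with the convention 0 log 0 = 0."""
    return sum(p * math.log(p) for p in y if p > 0.0)

def half_sq_norm(y):
    """Omega = (1/2) ||y||^2."""
    return 0.5 * sum(p * p for p in y)

def fy_loss(theta, y, omega, predict):
    """Omega*(theta) + Omega(y) - <theta, y>, where the conjugate is
    evaluated at its maximizer y_hat = predict(theta)."""
    y_hat = predict(theta)
    conjugate = dot(theta, y_hat) - omega(y_hat)
    return conjugate + omega(y) - dot(theta, y)

theta = [2.0, 1.0, 0.1]
y = [1.0, 0.0, 0.0]  # one-hot target e_1
loss_softmax = fy_loss(theta, y, neg_shannon, softmax)
loss_sparsemax = fy_loss(theta, y, half_sq_norm, sparsemax)
```

Here `loss_sparsemax` is exactly zero because sparsemax already predicts the target vertex, while `loss_softmax` stays strictly positive because softmax never reaches a vertex for finite scores.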

Geometrical Interpretation
- Plot the regularizer $\Omega(\mu)$ as a function of $\mu$.
- Draw the tangent of $\Omega$ at $\hat{y}_\Omega(\theta)$. By the definition of the Fenchel conjugate, this tangent is $\mu \mapsto \langle \theta, \mu \rangle - \Omega^\star(\theta)$, with intercept $-\Omega^\star(\theta)$.
- At the target label $y$, the tangent takes the value $\langle \theta, y \rangle - \Omega^\star(\theta)$ and the regularizer takes the value $\Omega(y)$.
- The loss $L_\Omega(\theta; y)$ is the gap between $\Omega$ and the tangent at $y$ (a Bregman divergence).
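
The link to the Bregman divergence can be spelled out in a few lines. This is a sketch under simplifying assumptions that are not stated on the slide: $\Omega$ is differentiable and strictly convex with $\mathrm{dom}(\Omega) = \mathbb{R}^d$, so that $\hat{y} = \hat{y}_\Omega(\theta)$ satisfies $\theta = \nabla\Omega(\hat{y})$.

```latex
\begin{align*}
\Omega^\star(\theta) &= \langle \theta, \hat{y} \rangle - \Omega(\hat{y})
  && \text{(the supremum is attained at } \hat{y}\text{)} \\
L_\Omega(\theta; y) &= \Omega^\star(\theta) + \Omega(y) - \langle \theta, y \rangle \\
  &= \Omega(y) - \Omega(\hat{y}) - \langle \theta, y - \hat{y} \rangle \\
  &= \Omega(y) - \Omega(\hat{y}) - \langle \nabla\Omega(\hat{y}), y - \hat{y} \rangle
  =: B_\Omega(y \,\|\, \hat{y})
\end{align*}
```

The last line is the standard Bregman divergence generated by $\Omega$, which is the gap between $\Omega$ and its tangent at $\hat{y}$, evaluated at $y$. When $\mathrm{dom}(\Omega)$ is constrained (for example to the simplex), the relation needs more care than this sketch provides.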
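
Example: Shannon Entropy
- Entropy: $H_S(y) = -\sum_{j=1}^d y_j \log y_j$; the regularizer is $\Omega = -H_S$ on $\Delta^d$.
- Prediction: $\hat{y}_{-H_S}(\theta) = \operatorname{argmax}_{y \in \Delta^d} \langle \theta, y \rangle + H_S(y) = \exp\theta / \sum_{j=1}^d \exp\theta_j$ (softmax). In the binary case, softmax is the sigmoid (figure: $\hat{y}(\theta)$ against $\theta$).
- Loss: $L_{-H_S}(\theta; y) = (-H_S)^\star(\theta) - H_S(y) - \langle \theta, y \rangle = \log \sum_{j=1}^d \exp\theta_j - \theta_k$, assuming $y = e_k$. This is the cross-entropy, equivalently $-\log [\hat{y}_{-H_S}(\theta)]_k$, and it is the logistic loss in the binary case.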
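
The closed form above is easy to evaluate directly. A minimal sketch, assuming a one-hot target $y = e_k$ given by its class index `k` (the function names are illustrative):

```python
import math

def logsumexp(theta):
    """log sum_j exp(theta_j), shifted by max(theta) to avoid overflow."""
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))

def cross_entropy(theta, k):
    """Fenchel-Young loss for Omega = -H_S with target e_k."""
    return logsumexp(theta) - theta[k]

def logistic_loss(s):
    """log(1 + exp(-s)), written to avoid overflow for large |s|."""
    return max(-s, 0.0) + math.log1p(math.exp(-abs(s)))

# Binary case: scores (s, 0) with target class 0 give the logistic loss of s.
s = 1.5
binary_ce = cross_entropy([s, 0.0], 0)
```

With two classes and scores `(s, 0)`, `binary_ce` equals `logistic_loss(s)` up to rounding, which is the binary-case remark on the slide.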
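
Example: Tsallis Entropy
- Tsallis $\alpha$-entropy: $H_\alpha(y) = \frac{1}{\alpha(\alpha - 1)} \sum_{j=1}^d (y_j - y_j^\alpha)$.
  - $\alpha \to 1$: Shannon entropy, $H_S(y) = -\sum_{j=1}^d y_j \log y_j$.
  - $\alpha \to 2$: $H_2(y) = \frac{1}{2} \sum_{j=1}^d y_j (1 - y_j)$, a.k.a. the Gini index.
- Prediction with $\Omega = -H_2$: sparsemax (figure: $\hat{y}(\theta)$ against $\theta$).
- Loss: the sparsemax loss, $L_{-H_2}(\theta; y) = (-H_2)^\star(\theta) - H_2(y) - \langle \theta, y \rangle$ (figure: $\ell(m)$ against the margin $m$). Specialized to binary classification, it is the modified Huber loss.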
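
The two limits of the Tsallis entropy can be checked numerically. The sketch below uses an arbitrary example distribution and a value of $\alpha$ slightly above 1 to approximate the Shannon limit; both choices are assumptions made for illustration.

```python
import math

def tsallis_entropy(y, alpha):
    """H_alpha(y) = sum_j (y_j - y_j^alpha) / (alpha (alpha - 1)), alpha != 1."""
    return sum(p - p ** alpha for p in y) / (alpha * (alpha - 1.0))

def shannon_entropy(y):
    """H_S(y) = -sum_j y_j log y_j, with 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in y if p > 0.0)

def gini_index(y):
    """H_2(y) = (1/2) sum_j y_j (1 - y_j)."""
    return 0.5 * sum(p * (1.0 - p) for p in y)

y = [0.5, 0.3, 0.2]                      # example distribution (assumed)
h2 = tsallis_entropy(y, 2.0)             # matches gini_index(y)
h_near_1 = tsallis_entropy(y, 1.0 + 1e-6)  # close to shannon_entropy(y)
```

At $\alpha = 2$ the formula agrees with the Gini index exactly, and as $\alpha$ approaches 1 it approaches the Shannon entropy.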

Other Nice Properties (Overview)
- Separation margin: a finite score attains zero loss if $\Omega$ is "sparse".
  - Logistic loss (Shannon): $\ell(m)$ vanishes only as $m \to \infty$ ⇒ no separation margin.
  - sparsemax loss (Tsallis): has a separation margin ⇒ no penalization on large-enough prediction margins.
- Calibrated surrogate: minimizing the FY loss leads to minimizing the classification error (more discussion is needed for structured prediction).
- Efficient optimization: the loss is always convex by nature and is optimizable with the Frank-Wolfe algorithm (iteratively minimizing a linear approximation).
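
The separation-margin property can be illustrated in the binary case. The sketch below assumes two classes, a one-hot target on the first class, and writes `t` for the score gap between the target class and the other class. The piecewise formula for the sparsemax loss was worked out for this sketch from the Fenchel-Young definition with $\Omega = \frac{1}{2}\|\cdot\|_2^2$ on the simplex; it is not copied from the slides, and the slide's margin $m$ may be scaled differently.

```python
import math

def logistic_loss(t):
    """Binary FY loss for the Shannon entropy: log(1 + exp(-t))."""
    return max(-t, 0.0) + math.log1p(math.exp(-abs(t)))

def sparsemax_binary_loss(t):
    """Binary FY loss for Omega = (1/2)||y||^2 on the simplex, as a
    function of the score gap t (a rescaled modified Huber loss)."""
    if t >= 1.0:
        return 0.0             # separation margin: exactly zero loss
    if t <= -1.0:
        return -t              # linear for confidently wrong scores
    return 0.25 * (1.0 - t) ** 2

gaps = [-2.0, 0.0, 1.0, 3.0]
logistic_values = [logistic_loss(t) for t in gaps]
sparsemax_values = [sparsemax_binary_loss(t) for t in gaps]
```

The sparsemax loss is exactly zero once the gap reaches 1, while the logistic loss remains positive for every finite gap.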

Summary
- Pipeline: the model $f_W$ maps the input $x$ to the score $\theta$, and the prediction function $\hat{y}_\Omega$ maps $\theta$ to the prediction $\hat{y}$, which is compared with the target label $y$.
- Regularized prediction: $\hat{y}_\Omega(\theta) = \operatorname{argmax}_{y \in \mathrm{dom}(\Omega)} \langle \theta, y \rangle - \Omega(y)$. The regularizer can make the prediction sparse, tractable, ...
- Fenchel-Young loss: $L_\Omega(\theta; y) := \Omega^\star(\theta) + \Omega(y) - \langle \theta, y \rangle$. A systematic way of constructing a loss from $\Omega$.