Slide 30
Features Selection/Engineering ~3. Replacing Softmax with Entmax~
• Assumption
• Softmax never outputs exactly zero for any target class, but in the training labels many classes have zero probability
-> A sparser output may improve the score
• What we did
• Replaced softmax with entmax at inference time only
• entmax is a softmax-like function that can produce sparser outputs, including exact zeros
• We used α = 1.03
• Improved both public and private LB scores by ~0.004
Example input: [-2, 0, 0.5]
softmax (= entmax with α=1.0): [0.049, 0.359, 0.592]
entmax (α=1.5): [0.0, 0.326, 0.674]
entmax (α=2.0): [0.0, 0.25, 0.75]
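The example values above can be reproduced with a minimal NumPy sketch of α-entmax. This is an illustration under stated assumptions, not the team's implementation: the function names `softmax` and `entmax_bisect` are hypothetical, and solving for the threshold by bisection is one common way to compute entmax (the slide does not say which implementation was used). It uses the standard form p_i = [(α-1)·z_i - τ]_+^(1/(α-1)), with τ chosen so the outputs sum to 1.

```python
import numpy as np


def softmax(z):
    """Standard softmax; the alpha -> 1 limit of entmax."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()


def entmax_bisect(z, alpha=1.5, n_iter=60):
    """alpha-entmax of a 1-D score vector (illustrative sketch).

    Finds the threshold tau by bisection so that
    sum_i max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1)) == 1.
    """
    z = np.asarray(z, dtype=float)
    if alpha == 1.0:
        return softmax(z)

    zs = (alpha - 1.0) * z
    exponent = 1.0 / (alpha - 1.0)

    # At tau = lo the largest entry alone contributes 1, so the sum is >= 1.
    # At tau = hi every entry contributes at most 1/d, so the sum is <= 1.
    lo = zs.max() - 1.0
    hi = zs.max() - len(z) ** (1.0 - alpha)

    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        total = (np.clip(zs - tau, 0.0, None) ** exponent).sum()
        if total >= 1.0:
            lo = tau  # threshold too low: probabilities sum to more than 1
        else:
            hi = tau  # threshold too high: probabilities sum to less than 1

    p = np.clip(zs - lo, 0.0, None) ** exponent
    return p / p.sum()  # remove the small residual bisection error


if __name__ == "__main__":
    scores = [-2.0, 0.0, 0.5]
    for a in (1.0, 1.03, 1.5, 2.0):
        print(a, np.round(entmax_bisect(scores, alpha=a), 3))
```

Because the change applies at inference only, a sketch like this would be applied to the logits of an already trained model in place of the final softmax; no retraining is implied. With α close to 1 (such as the 1.03 used here) the output stays close to softmax, and larger α pushes more low-scoring classes to exactly zero.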