Slide 13
V. Comparison & Tuning
[Reimers & Gurevych 2017]
• “Optimal Hyperparameters for Deep LSTM-Networks for Sequence Labeling Tasks”
• “Reporting Score Distributions Makes a Difference: Performance Study of
LSTM-networks for Sequence Tagging” (summary of above paper)
• Evaluation with over 50,000 (!) setups
• [Huang et al. 2015], [Lample et al. 2016], and [Ma & Hovy 2016]
• NNs are non-deterministic → results depend on initialization
• Diff. between “state-of-the-art” & “mediocre” may be insignificant
• Score distributions would be more trustworthy
• Hyperparameters
• Word Embeddings, Character Representation, Optimizer, Gradient
Clipping & Normalization, Tagging Schemes, Classifier, Dropout,
Number of LSTM Layers / Recurrent Units, Mini-batch Size, Backend
• Lots of empirical knowledge and best-practice advice
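The point about score distributions can be sketched in a few lines: instead of reporting one test score from a single random initialization, repeat the run over several seeds and report the spread. The snippet below is a toy illustration, not the paper's setup; `train_and_evaluate` is a hypothetical stand-in for a full LSTM training run, with seed-dependent noise mimicking the effect of random initialization.

```python
import random
import statistics

def train_and_evaluate(seed):
    # Hypothetical stand-in for a full LSTM training run: the random
    # initialization (controlled by the seed) perturbs the final score.
    rng = random.Random(seed)
    base_f1 = 0.90                        # assumed "true" model quality
    return base_f1 + rng.gauss(0, 0.005)  # seed-dependent variation

# Repeat the run with several seeds instead of reporting a single score.
scores = [train_and_evaluate(seed) for seed in range(10)]
mean = statistics.mean(scores)
stdev = statistics.stdev(scores)
print(f"F1 over 10 seeds: {mean:.3f} +/- {stdev:.3f} "
      f"(min {min(scores):.3f}, max {max(scores):.3f})")
```

Reporting the mean, standard deviation, and range makes it visible when the gap between two systems is smaller than the seed-induced variation, which is exactly the case where a single-score comparison is misleading.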
13 / 17