Slide 35
References 2 (Analysis of Transformer Language Models)
• [Clark+’19] What Does BERT Look at? An Analysis of BERT’s Attention. In Proceedings of BlackboxNLP,
pp.276-286, 2019.
https://aclanthology.org/W19-4828/
• [Vig&Belinkov’19] Analyzing the Structure of Attention in a Transformer Language Model. In Proceedings
of BlackboxNLP, pp.63-76, 2019.
https://aclanthology.org/W19-4808/
• [Xiao+’23] Efficient Streaming Language Models with Attention Sinks. arXiv preprint, arXiv:2309.17453,
2023.
https://arxiv.org/abs/2309.17453
• [Miller’23] Attention Is Off By One. Blog post, 2023.
https://www.evanmiller.org/attention-is-off-by-one.html
• [Tenney+’19] BERT Rediscovers the Classical NLP Pipeline. In Proceedings of ACL, pp.4593-4601, 2019.
https://www.aclweb.org/anthology/P19-1452/
• [Modarressi+’22] GlobEnc: Quantifying Global Token Attribution by Incorporating the Whole Encoder
Layer in Transformers. In Proceedings of NAACL, pp.258-271, 2022.
https://aclanthology.org/2022.naacl-main.19/