
Paper Introduction: What Context Features Can Transformer Language Models Use?

yuri
September 09, 2021


Transcript

  1. What Context Features Can
    Transformer Language Models Use?
    Presenter: Yuri Murayama (Ochanomizu University), 2021/09/17, 13th 最先端NLP勉強会 (Advanced NLP Study Group)
    Joe O’Connor and Jacob Andreas, ACL 2021 (4 advance votes)

  2. Research Question
    John went to the library to check out a book.
    p(book | context)
    • Count-based LMs: 10-20 tokens [Brown 2011]
    • RNNs: ~200 tokens [Khandelwal+ 2018]
    • Transformer LMs: 1,000+ tokens [Beltagy+ 2020]
    Why is a longer context better? In other words, what does a long context provide?

  3. What information in the context is useful?
    In 2000, producer David Heyman asked Radcliffe to audition for the role of
    Harry Potter for the film adaptation of Harry Potter and the Philosopher’s Stone,
    the best-selling book by British author J.K. Rowling. Rowling had been
    searching for an unknown British actor to personify the character, and the
    movie’s director Chris Columbus recalled thinking, ”This is what I want. This is
    Harry Potter”, after he saw a video of the young actor in David Copperfield.
    Eight months later, and after several auditions, Radcliffe was selected to play
    the part. Rowling also endorsed the selection saying, ”I don’t think Chris
    Columbus could have found a better Harry.”
    Assuming that only named-entity information is used from context far from the target:
    p(Harry | full context) ≈ p(Harry | named-entity-only context + ordinary context)

  4. What information in the context is useful?
    Assuming that only named-entity information is used from context far from the target:
    p(Harry | full context) ≈ p(Harry | named-entity-only context + ordinary context)
    If the difference in information between the two is small, the assumption holds.

  5. Ablated Information
    • Ablated information
    • Ablated likelihood
    • Intuitively, A(f, k) measures what fraction of the information added by k tokens
    is lost when ablation f is applied to those k tokens
    • Near 0: no information is lost; near 1: all of the information is lost
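The slide's formulas are not reproduced in this transcript. The block below is a hedged reconstruction that matches the intuition stated above; the symbol names are assumptions, not taken from the slide. Here \mathcal{L}_{\text{full}} is the LM loss with all n context tokens intact, \mathcal{L}_{\text{short}} is the loss with only the nearest n-k tokens, and \mathcal{L}_{f} is the loss when ablation f is applied to the k farthest tokens while the nearest n-k stay intact.

```latex
\[
A(f, k) \;=\;
\frac{\mathcal{L}_{f} - \mathcal{L}_{\text{full}}}
     {\mathcal{L}_{\text{short}} - \mathcal{L}_{\text{full}}}
\]
% A(f, k) = 0 when \mathcal{L}_{f} = \mathcal{L}_{\text{full}}  (the ablation loses nothing)
% A(f, k) = 1 when \mathcal{L}_{f} = \mathcal{L}_{\text{short}} (the ablation loses everything the k tokens added)
```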

  6. Ablated Information
    • Ablated information and ablated likelihood, as on the previous slide
    • The formula is annotated with the context lengths involved: n (full context), n-k (shortened context), and k ablated tokens followed by n-k ordinary tokens
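As a minimal sketch of how the ratio described on this slide could be computed from three measured losses: the function name, argument names, and the numeric values are hypothetical and chosen only for illustration. The losses are assumed to be average negative log-likelihoods measured with the full context of n tokens, with only the nearest n-k tokens, and with ablation f applied to the k extra tokens.

```python
def ablated_information(loss_ablated: float, loss_full: float, loss_short: float) -> float:
    """Fraction of the information added by k extra tokens that an ablation removes.

    loss_full:    loss with all n context tokens intact
    loss_short:   loss with only the nearest n-k tokens
    loss_ablated: loss with the ablation applied to the k extra tokens
    """
    gain = loss_short - loss_full  # information the k extra tokens add
    if gain == 0:
        raise ValueError("the extra tokens add no information; the ratio is undefined")
    return (loss_ablated - loss_full) / gain


# Hypothetical loss values, for illustration only (not results from the paper).
full, short = 3.0, 3.5
print(ablated_information(3.0, full, short))   # 0.0: the ablation loses nothing
print(ablated_information(3.5, full, short))   # 1.0: the ablation loses everything
print(ablated_information(3.25, full, short))  # 0.5: half of the added information is lost
```

Normalizing by the gain from the extra tokens makes ablations comparable even when the absolute loss differences are small.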

  7. 実験設定
    GPT-2 [Radford+ 2019] trained on the WikiText-103 dataset [Merity+ 2016]
    • roughly 100 training runs
    [Figure: input to the Transformer LM. The ablated context comes first; in the example only named entities are kept ("2000 David Heyman Radcliffe Harry Potter Harry Potter and the Philosopher’s Stone British J.K. Rowling."). The ordinary context follows unchanged (the rest of the passage from slide 3, starting at "Rowling had been searching"). Position markers: 512, 512+256, 512+512. Span labels: mid-range, long-range.]
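The layout in the figure can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `build_input`, `evaluation_spans`, and `keep_every_other` are hypothetical names, and the mapping of "mid-range" to the first 256 ordinary tokens and "long-range" to the next 256 is an assumption read off the position markers 512, 512+256, and 512+512.

```python
from typing import Callable, Dict, List, Sequence, Tuple

ABLATED_LEN = 512  # length of the context that gets ablated (from the figure)
SPAN_LEN = 256     # assumed width of each evaluation span


def build_input(
    tokens: Sequence[str],
    ablate: Callable[[List[str]], List[str]],
) -> Tuple[List[str], int]:
    """Ablate the first ABLATED_LEN tokens and keep the rest unchanged.

    Returns the model input and the index where the ordinary context starts.
    """
    far = list(tokens[:ABLATED_LEN])
    near = list(tokens[ABLATED_LEN:])
    ablated = list(ablate(far))
    return ablated + near, len(ablated)


def evaluation_spans(start: int) -> Dict[str, Tuple[int, int]]:
    """Half-open index ranges of the ordinary tokens that are scored (assumed)."""
    return {
        "mid-range": (start, start + SPAN_LEN),
        "long-range": (start + SPAN_LEN, start + 2 * SPAN_LEN),
    }


def keep_every_other(far: List[str]) -> List[str]:
    """Toy ablation used only for this demo."""
    return far[::2]


tokens = [f"tok{i}" for i in range(1024)]
model_input, start = build_input(tokens, keep_every_other)
print(len(model_input), start)  # 768 256
print(evaluation_spans(start))
```

Tracking the start of the ordinary context matters because an ablation that deletes tokens shortens the input and shifts every later position.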

  8. Does order matter?
    Pierre Vinken, 61 years old, will join the board as a
    nonexecutive director Nov. 29.
    Mr. Vinken is chairman of Elsevier N.V., the Dutch
    publishing group.

  9. Does order matter?
    Quite destructive

  10. Does order matter?

  11. Does order matter?
    As long as local co-occurrence relations are preserved, the correct word order matters little
    • dog bites man ≈ man bites dog
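To make the contrast concrete, here is a minimal sketch of two order ablations: a global shuffle, which destroys local co-occurrence, and a shuffle inside small fixed windows, which keeps nearby words together but scrambles their order. The function names and the window size are assumptions for illustration; the exact shuffling schemes used in the paper are not spelled out on the slide.

```python
import random
from typing import List, Sequence


def shuffle_all(tokens: Sequence[str], rng: random.Random) -> List[str]:
    """Permute every token: word order and local co-occurrence are both lost."""
    out = list(tokens)
    rng.shuffle(out)
    return out


def shuffle_within_windows(tokens: Sequence[str], window: int, rng: random.Random) -> List[str]:
    """Permute tokens only inside consecutive windows of `window` tokens.

    Words that were neighbors stay within `window` positions of each other,
    so local co-occurrence survives while the exact order does not.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[str] = []
    for start in range(0, len(tokens), window):
        chunk = list(tokens[start:start + window])
        rng.shuffle(chunk)
        out.extend(chunk)
    return out


rng = random.Random(0)  # fixed seed so the demo is repeatable
sentence = "Pierre Vinken will join the board as a nonexecutive director".split()
print(shuffle_all(sentence, rng))
print(shuffle_within_windows(sentence, 3, rng))
```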

  12. Does order matter?
    Replace the entire input with the immediately preceding 512 tokens from the same
    document (topically similar)
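The replacement described above amounts to a simple slice over the document. The sketch below assumes the document is already tokenized into a list; `preceding_window` and the toy document are hypothetical and only illustrate the indexing.

```python
from typing import List, Sequence


def preceding_window(doc_tokens: Sequence[str], context_start: int, length: int = 512) -> List[str]:
    """Return the `length` tokens that come right before `context_start`."""
    if context_start < length:
        raise ValueError("not enough preceding tokens in the document")
    return list(doc_tokens[context_start - length:context_start])


doc = [f"tok{i}" for i in range(2048)]
original = doc[1024:1536]                   # the 512-token context to be replaced
replacement = preceding_window(doc, 1024)   # the 512 tokens just before it
print(replacement[0], replacement[-1])      # tok512 tok1023
```

Because the replacement comes from the same document, it shares the topic of the original context but not its specific content.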

  13. Does order matter?
    • More than half of the information is lost
    • So the context is not simply supplying topic information?

  14. Do all words matter?
    • Keeping only the named entities is not enough
    • Nouns supply almost all of the useful information
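A word-deletion ablation of this kind can be sketched as a filter over annotated tokens. The example below is illustrative only: the `Token` class, the hand-assigned tags, and the choice to count proper nouns as nouns are all assumptions, and a real experiment would take tags from a part-of-speech tagger and a named-entity recognizer.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Token:
    word: str
    pos: str         # part-of-speech tag (hand-assigned for this demo)
    is_entity: bool  # True if the token belongs to a named entity


def keep_only(tokens: Sequence[Token], predicate: Callable[[Token], bool]) -> List[str]:
    """Delete every token that does not satisfy `predicate`."""
    return [t.word for t in tokens if predicate(t)]


sentence = [
    Token("Pierre", "PROPN", True), Token("Vinken", "PROPN", True),
    Token("will", "AUX", False), Token("join", "VERB", False),
    Token("the", "DET", False), Token("board", "NOUN", False),
    Token("as", "ADP", False), Token("a", "DET", False),
    Token("nonexecutive", "ADJ", False), Token("director", "NOUN", False),
]

entities_only = keep_only(sentence, lambda t: t.is_entity)
nouns_only = keep_only(sentence, lambda t: t.pos in {"NOUN", "PROPN"})
print(entities_only)  # ['Pierre', 'Vinken']
print(nouns_only)     # ['Pierre', 'Vinken', 'board', 'director']
```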

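  15. Summary
    • Investigated how transformer models use information from long-range context
    • Useful information is carried mainly by content words and local co-occurrence
    relations
    • The benefit of long context cannot be explained by topic or named entities alone
    • Replacing low-information words in the context (e.g. padding tokens) with
    high-information words (e.g. nouns+verbs) did not improve the results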
