An Introduction to Applied Case Studies Related to Word Embedding / Case Study in Word Embedding

Sansan
January 30, 2019

■Event
[Held in Kyoto] SIL Study Session #1: Natural Language Processing
https://sansan.connpass.com/event/116853/

■Talk overview
Title:
An introduction to applied case studies related to word embedding

Speaker:
DSOC R&D researcher 奥田裕樹

▼Sansan Builders Box
https://buildersbox.corp-sansan.com/

Transcript


  1. SIL

  2.


  3. [Figure: text units, Character, Word, Clause, Sentence, Document]

  4. A La Carte Embedding
    EmbedRank

  5. A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors
    Sansan Advent Calendar
    https://yag-ays.github.io/project/alacarte/

  6. n-gram
    A La Carte Embedding

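A La Carte Embedding induces a vector for a rare word or an n-gram from the contexts it appears in. Each occurrence is summarized by the word vectors around it, and a linear transform A, learned on words that already have vectors, maps that summary back into the embedding space. The Python sketch below (NumPy) is a minimal illustration under simplifying assumptions: one tokenized corpus, a fixed symmetric window, pretrained vectors supplied as a dict, and plain least squares for A. The helper names (`occurrences`, `context_vector`, `learn_transform`, `induce`) are hypothetical and not taken from the paper's code.

```python
# Minimal A La Carte sketch; helper names are hypothetical, not from the paper's code.
from typing import Dict, List, Tuple

import numpy as np


def occurrences(tokens: List[str], feature: Tuple[str, ...]) -> List[int]:
    """Start indices where the n-gram `feature` occurs in `tokens`."""
    n = len(feature)
    return [i for i in range(len(tokens) - n + 1) if tuple(tokens[i:i + n]) == feature]


def context_vector(tokens: List[str], feature: Tuple[str, ...],
                   vectors: Dict[str, np.ndarray], window: int = 2) -> np.ndarray:
    """Sum the vectors of known words within `window` tokens on each side of
    every occurrence of `feature`, then average over occurrences."""
    n = len(feature)
    dim = len(next(iter(vectors.values())))
    starts = occurrences(tokens, feature)
    total = np.zeros(dim)
    for i in starts:
        context = tokens[max(0, i - window):i] + tokens[i + n:i + n + window]
        for w in context:
            if w in vectors:
                total += vectors[w]
    return total / max(len(starts), 1)


def learn_transform(tokens: List[str], vectors: Dict[str, np.ndarray],
                    window: int = 2) -> np.ndarray:
    """Least-squares A minimizing sum_w ||v_w - A u_w||^2 over known words."""
    present = set(tokens)
    words = [w for w in vectors if w in present]
    U = np.stack([context_vector(tokens, (w,), vectors, window) for w in words])
    V = np.stack([vectors[w] for w in words])
    # Row form: U @ A.T approximates V, so lstsq solves for A.T.
    a_t, *_ = np.linalg.lstsq(U, V, rcond=None)
    return a_t.T


def induce(tokens: List[str], feature: Tuple[str, ...],
           vectors: Dict[str, np.ndarray], A: np.ndarray, window: int = 2) -> np.ndarray:
    """Embed an unseen word or n-gram as A times its averaged context vector."""
    return A @ context_vector(tokens, feature, vectors, window)
```

Because A is learned once, embedding a new n-gram only requires averaging its contexts and applying one matrix product, which is what makes on-the-fly induction cheap.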

  7.

  8. SGNS, CBOW

  9.

  10.

  11. 2010 2009, 2012, 2008, 2011, 2013, 2007, 2006, 2014, 2011, 2006

  12. A La Carte Embedding
    on-the-fly
    skip-thought

  13. Simple Unsupervised Keyphrase Extraction using Sentence Embeddings
    Maximal Marginal Relevance
    https://github.com/swisscom/ai-research-keyphrase-extraction
    EmbedRank, EmbedRank++

  14. EmbedRank
    Sentence Embedding
    sent2vec, doc2vec
    • Pagliardini et al., 2017. “Unsupervised Learning of Sentence Embeddings using Compositional n-Gram Features.”
    • Lau and Baldwin, 2016. “An Empirical Evaluation of doc2vec with Practical Insights into Document Embedding Generation.”

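At its core, EmbedRank embeds the document and each candidate phrase in the same space (for example with sent2vec or doc2vec) and ranks candidates by cosine similarity to the document. The sketch below assumes candidate extraction and embedding have already been done. The toy 2-D vectors and the `embedrank` function name are hypothetical stand-ins, not output of either model.

```python
# Minimal EmbedRank-style ranking sketch; `embedrank` is a hypothetical helper.
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embedrank(doc_vec: Sequence[float],
              cand_vecs: Dict[str, Sequence[float]],
              top_n: int) -> List[Tuple[str, float]]:
    """Rank candidate phrases by cosine similarity to the document vector."""
    scored = [(phrase, cosine(vec, doc_vec)) for phrase, vec in cand_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy vectors standing in for sentence-embedding outputs (made up).
doc = [1.0, 0.0]
candidates = {
    "keyphrase extraction": [1.0, 0.0],
    "weather report": [0.0, 1.0],
    "sentence embedding": [1.0, 1.0],
}
print(embedrank(doc, candidates, top_n=2))
```

Ranking by relevance alone is simple, but it does nothing to stop near-duplicate phrases from filling the top slots.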

  15. naïve
    Maximal Marginal Relevance

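Maximal Marginal Relevance adds keyphrases one at a time. Each step rewards similarity to the document and penalizes similarity to the phrases already selected. A common way to write it uses a candidate set C, an already-selected set K, and a trade-off parameter λ in [0, 1]. EmbedRank++ builds on this criterion, but its exact variant may differ from this plain form (for example, in how the similarities are normalized).

```latex
\mathrm{MMR} = \operatorname*{arg\,max}_{C_i \in C \setminus K}
  \Big[ \lambda \cos(C_i, \mathit{doc})
        - (1 - \lambda) \max_{C_j \in K} \cos(C_i, C_j) \Big]
```

With λ = 1 the penalty term vanishes, and the selection reduces to ordering by similarity to the document alone.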

  16. MMR
    topological shape
    topological shapes

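Near-duplicates such as "topological shape" and "topological shapes" can both rank highly. A greedy MMR selection shows how that redundancy gets suppressed. The vectors below are made-up toy values, not real sentence embeddings. `mmr` is a hypothetical helper that follows the standard MMR form, with λ = 0.5 by default.

```python
# Greedy MMR selection sketch; `mmr` and the toy vectors are hypothetical.
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr(doc_vec: Sequence[float], cand_vecs: Dict[str, Sequence[float]],
        top_n: int, lam: float = 0.5) -> List[str]:
    """Pick phrases one by one, trading relevance against redundancy."""
    selected: List[str] = []
    remaining = list(cand_vecs)

    def mmr_score(phrase: str) -> float:
        relevance = cosine(cand_vecs[phrase], doc_vec)
        # Redundancy: highest similarity to any phrase already selected.
        redundancy = max((cosine(cand_vecs[phrase], cand_vecs[s]) for s in selected),
                         default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while remaining and len(selected) < top_n:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


doc = [1.0, 0.5]
candidates = {
    "topological shape": [1.0, 0.10],
    "topological shapes": [1.0, 0.12],
    "keyphrase extraction": [0.6, 0.8],
}
print(mmr(doc, candidates, top_n=2))
print(mmr(doc, candidates, top_n=2, lam=1.0))
```

With these toy vectors and λ = 0.5, the second pick is "keyphrase extraction" instead of the near-duplicate. With λ = 1.0 (relevance only), both "topological" variants are returned.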

  17. PositionRank https://github.com/ymym3412/position-rank
    ACL2018, PageRank
    termextract https://github.com/kanjirz50/termextract

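PositionRank extends PageRank on a word co-occurrence graph by biasing the random jump toward words that appear early and often. Each word's jump probability is proportional to the sum of the inverse positions of its occurrences. The sketch below is a simplified illustration, not the linked implementation. It assumes the input is already filtered to candidate words and uses a hypothetical `position_rank` function with a window of 2 (adjacent words) and damping α = 0.85. It scores single words only; PositionRank then builds phrase scores from these word scores.

```python
# Simplified PositionRank sketch; `position_rank` is a hypothetical helper.
from collections import defaultdict
from typing import Dict, List, Tuple


def position_rank(words: List[str], window: int = 2, alpha: float = 0.85,
                  iters: int = 100) -> List[Tuple[str, float]]:
    """Position-biased PageRank over a word co-occurrence graph."""
    # Position bias: sum of 1/position over every occurrence (1-indexed),
    # normalized to a probability distribution.
    raw: Dict[str, float] = defaultdict(float)
    for pos, w in enumerate(words, start=1):
        raw[w] += 1.0 / pos
    total = sum(raw.values())
    bias = {w: b / total for w, b in raw.items()}

    # Undirected, weighted edges between distinct words that co-occur
    # within `window` consecutive tokens.
    weight: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                weight[w][words[j]] += 1.0
                weight[words[j]][w] += 1.0
    out_weight = {w: sum(weight[w].values()) for w in bias}

    # Power iteration: S = (1 - alpha) * bias + alpha * weighted neighbor scores.
    score = dict(bias)
    for _ in range(iters):
        score = {
            w: (1 - alpha) * bias[w]
            + alpha * sum(weight[v][w] / out_weight[v] * score[v] for v in weight[w])
            for w in bias
        }
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)


words = ["keyphrase", "extraction", "graph", "keyphrase", "ranking"]
print(position_rank(words))
```

"keyphrase" gets both the largest position bias (positions 1 and 4) and the most neighbors, so it ranks first here.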

  18. EmbedRank, PositionRank, termextract
    https://www.jstage.jst.go.jp/article/pjsai/JSAI2017/0/JSAI2017_1J14/_article/-char/ja/

  19. https://ja.wikinews.org/wiki/
    EmbedRank, PositionRank, termextract

  20.

  21. EmbedRank
    MMR
    sentence embedding

  22. A La Carte Embedding
    EmbedRank
