Let’s study trends of entry to conference sessions

Slides presented at Kanazawa.rb meetup #84.

Sample code used for this analysis:
https://github.com/TAKAyukiatkwsk/session_analytics_sample

TAKAyukiatkwsk

August 17, 2019

Transcript

  1. Let’s study trends of entry to conference sessions
    Kanazawa.rb meetup #84
    Takayuki Takagi

  2. Who am I?
    ● Takayuki Takagi (高木貴之 / ニボシーニョ)
    ● @TAKAyuki_atkwsk / takayukiatkwsk
    ● Freelance programmer
    ● Remote work
    ● Scala, Ruby, Python, AWS, Docker, etc.
    ● Likes beer and gyoza

  3. Today’s topic
    ● I want to know the trends in which sessions get selected for conferences
    ○ Extract characteristic words from session titles and descriptions with
    NLP (Natural Language Processing)
    ○ But I’m NOT familiar with NLP, so I want to use the easiest tools
    possible
    ○ Easy tools: Cloud APIs

  4. Cloud APIs for NLP
    ● Amazon Comprehend API (AWS)
    ● Cloud Natural Language API (GCP)
    ○ Syntactic analysis
    ● Text Analytics API (Azure)
    ○ Key-phrase extraction API
    -> These APIs can be used directly on Japanese text!!

  5. Make input data
    ● Copy session titles and descriptions into a spreadsheet
    ○ Japan Container Days 2018 (no descriptions)
    ○ Scala Kansai Summit 2018
    ○ JAWS DAYS 2019
    ○ Scala Matsuri 2019
    ○ Google Cloud Next Tokyo 2019
    ● Export as CSV (input for the scripts)
    ○ id, title
    ○ id, title + description
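
A minimal sketch of how the two CSV inputs could be built from one spreadsheet export. The column names (`id`, `title`, `description`) and the sample rows are assumptions for illustration, not taken from the sample repository.

```python
import csv
import io

# Hypothetical spreadsheet export; column names and rows are assumptions.
SPREADSHEET_CSV = """id,title,description
1,Kubernetes in production,Lessons learned from running containers
2,Akka Streams basics,
"""


def build_inputs(source):
    """Return two row lists: (id, title) and (id, title + description)."""
    title_rows, full_rows = [], []
    for row in csv.DictReader(source):
        title = row["title"].strip()
        description = (row.get("description") or "").strip()
        title_rows.append((row["id"], title))
        # Sessions without a description fall back to the title alone.
        full_rows.append((row["id"], f"{title} {description}".strip()))
    return title_rows, full_rows


title_rows, full_rows = build_inputs(io.StringIO(SPREADSHEET_CSV))
```

Keeping both variants keyed by the same `id` makes it easy to compare title-only results with title + description results later.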

  6. Analysis methods
    1. Extract key-phrases with Text Analytics API
    2. Analyze syntax with Cloud Natural Language API
    3. Analyze syntax with MeCab + NEologd (for comparison)
    ● Source code
    ○ https://github.com/TAKAyukiatkwsk/session_analytics_sample
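
For method 1, a rough sketch of preparing a key-phrase request and reading the result. The body shape (`documents` with `id`, `language`, `text`) and the response shape (`keyPhrases` per document) are assumptions based on the v2.1 REST API and should be checked against the current documentation. The function names are hypothetical, no request is actually sent, and the response is hand-written.

```python
import json


def build_key_phrase_request(rows, language="ja"):
    """Build a request body from (id, text) pairs.

    The body shape is an assumption based on the Text Analytics v2.1 REST API.
    """
    return {
        "documents": [
            {"id": str(doc_id), "language": language, "text": text}
            for doc_id, text in rows
        ]
    }


def key_phrases_by_id(response):
    """Map document id -> list of key phrases from a response-shaped dict."""
    return {doc["id"]: doc["keyPhrases"] for doc in response.get("documents", [])}


body = build_key_phrase_request([(1, "Kubernetes で始めるマイクロサービス")])
payload = json.dumps(body, ensure_ascii=False)  # JSON text that would be POSTed

# Hypothetical response, written by hand to illustrate the parsing step.
sample_response = {
    "documents": [{"id": "1", "keyPhrases": ["Kubernetes", "マイクロサービス"]}],
    "errors": [],
}
phrases = key_phrases_by_id(sample_response)
```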

  7. Extract key-phrases with Text Analytics API

  8. Key-phrase frequency
    ● [title] Finds technical words, but their frequencies are low (max = 4, mostly 1)
    ● [title + description] Higher frequencies (max = 9), but the frequent phrases are not technical words
    ● “Kubernetes” appears 4 times with title only, but 3 times with title + description
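
A small sketch of how key-phrase frequencies like these could be tallied. The slides do not say whether a phrase is counted once per session or once per occurrence. This version assumes once per session, and the phrase lists are made-up examples.

```python
from collections import Counter


def phrase_frequency(phrases_per_session):
    """Count in how many sessions each key phrase appears."""
    counts = Counter()
    for phrases in phrases_per_session:
        # Deduplicate within a session so each session counts a phrase once.
        counts.update(set(phrases))
    return counts


# Hypothetical key phrases returned for three session titles.
title_phrases = [["Kubernetes", "Istio"], ["Kubernetes"], ["Akka"]]
counts = phrase_frequency(title_phrases)
top = counts.most_common(1)
```

Running the same function on the title-only and the title + description results gives two tables that can be compared side by side.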

  9. Analyze syntax with Cloud Natural Language API

  10. N-gram Frequency
    ● [title unigram] More general topics (e.g. Scala, Kubernetes, サービス (service), コンテナ (container), Cloud)
    ● Trends: Scala, Kubernetes, Akka, 機械学習 (machine learning), サーバーレス (serverless), Cloud Spanner, マイクロサービス (microservices)

  11. N-gram Frequency
    ● [title + desc bigram] More understandable phrases than the title bigrams
    ● “分散トレーシング” (distributed tracing) is a characteristic phrase
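
Unigram and bigram counts like the ones on these slides could be computed along the following lines once each text has been split into tokens. The token lists here are hypothetical and do not come from either tokenizer's real output.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams (as tuples) of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_frequency(tokenized_texts, n):
    """Count n-grams over all texts; n-grams never span two texts."""
    counts = Counter()
    for tokens in tokenized_texts:
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical tokenizer output for two session titles.
tokenized = [
    ["分散", "トレーシング", "入門"],
    ["分散", "トレーシング", "の", "実践"],
]
unigrams = ngram_frequency(tokenized, 1)
bigrams = ngram_frequency(tokenized, 2)
```

A bigram such as ("分散", "トレーシング") surfaces a phrase that a tokenizer splits into two tokens, which is why bigrams can read more naturally than unigrams.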

  12. Analyze syntax with MeCab

  13. N-gram Frequency
    ● “型” (type) is tokenized as a noun (the Cloud Natural Language API tags it as an affix)
    ● “機械学習” (machine learning) and “サーバーレス” (serverless) are each tokenized as one word
    ● “関数型” (functional) is a characteristic phrase

  14. N-gram Frequency
    ● No abstract words such as “よう”, “こと”, “ため” appear
    ● “GraphQL” and “マイクロサービス” (microservices) are each tokenized as one word (not in chart)
    ● “分散トレーシング” (distributed tracing) is a characteristic phrase
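
The slides do not say why the abstract words are absent. One possible way to exclude them, assuming IPAdic-style part-of-speech features where such words are tagged as non-independent nouns (名詞,非自立), is sketched below. The function name and the (surface, feature) pairs are hypothetical, hand-written stand-ins for tokenizer output.

```python
def content_nouns(tokens):
    """Keep noun surfaces, dropping non-independent nouns (非自立)."""
    kept = []
    for surface, feature in tokens:
        fields = feature.split(",")
        pos = fields[0]
        subtype = fields[1] if len(fields) > 1 else ""
        # Assumption: abstract words like こと or ため carry the 非自立 subtype.
        if pos == "名詞" and subtype != "非自立":
            kept.append(surface)
    return kept


# Hypothetical (surface, feature) pairs in an IPAdic-like format.
tokens = [
    ("関数型", "名詞,固有名詞,一般"),
    ("こと", "名詞,非自立,一般"),
    ("する", "動詞,自立"),
]
nouns = content_nouns(tokens)
```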

  15. Results
    ● Trends: Kubernetes, Serverless, Scala, 分散トレーシング (distributed tracing),
    関数型 (functional), Akka, Cloud Spanner
    ○ Frequencies depend on the quality of the titles and descriptions
    ● Cloud APIs are useful
    ● Key-phrase extraction alone is not enough; it is better to use N-grams as well
    ● MeCab + NEologd can analyze better than the Cloud Natural
    Language API (for Japanese, or for this specific domain?)

  16. Results
    ● If you are familiar with NLP, please teach me about NLP and
    analysis methods!!!

  17. References
    ● Cloud Natural Language | Cloud Natural Language API | Google Cloud
    ○ https://cloud.google.com/natural-language/?hl=ja
    ● What is the Text Analytics API? - Features - Azure Cognitive Services | Microsoft Docs
    ○ https://docs.microsoft.com/ja-jp/azure/cognitive-services/text-analytics/overview
    ● Amazon Comprehend (discover insights and relationships in text) | AWS
    ○ https://aws.amazon.com/jp/comprehend/
    ● Review trends of highly rated ramen shops as seen through TF-IDF (natural language processing, TF-IDF, MeCab, wordcloud, morphological analysis, word segmentation) - ギークなエンジニアを目指す男
    ○ https://www.takapy.work/entry/2019/01/14/142128
    ● Text analysis using N-gram models - Index page
    ○ http://www.shuiren.org/chuden/teach/n-gram/index-j.html
