Let's study trends in conference session selection
Kanazawa.rb meetup #84
Takayuki Takagi
Slide 2
Who am I?
● Takayuki Takagi (高木貴之 / ニボシーニョ)
● @TAKAyuki_atkwsk / takayukiatkwsk
● Freelance programmer
● Remote work
● Scala, Ruby, Python, AWS, Docker, etc.
● Likes beer and gyoza
Slide 3
Today’s topic
● I want to know the trends in which sessions get selected for conferences
○ Extract characteristic words from session titles and descriptions with NLP (natural language processing)
○ But I'm NOT familiar with NLP, so I want to use the easiest tools possible
○ Easy tools: cloud APIs
Slide 4
Cloud APIs for NLP
● Amazon Comprehend API (AWS)
● Cloud Natural Language API (GCP)
○ Syntactic analysis
● Text Analytics API (Azure)
○ Key-phrase extraction API
-> These APIs accept Japanese text directly!!
Slide 5
Make input data
● Copy session titles and descriptions into a spreadsheet
○ Japan Container Days 2018 (no descriptions)
○ Scala Kansai Summit 2018
○ JAWS DAYS 2019
○ Scala Matsuri 2019
○ Google Cloud Next Tokyo 2019
● Export as CSV (script input)
○ id, title
○ id, title + description
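As a rough sketch of the script-input step, the exported CSV could be read as below. This assumes two columns per row (id, then text) and no header row; the function name `load_sessions` and the sample rows are hypothetical and not taken from the linked repository.

```python
import csv
import io

def load_sessions(csv_file):
    """Read two-column rows (id, text) into a dict keyed by session id."""
    sessions = {}
    for row in csv.reader(csv_file):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        session_id, text = row[0].strip(), row[1].strip()
        sessions[session_id] = text
    return sessions

# Hypothetical sample standing in for the exported "id, title" CSV.
sample = io.StringIO("1,Kubernetes in production\n2,Scala and Akka basics\n")
sessions = load_sessions(sample)
print(sessions["1"])
```

For a real export, the file would be opened with `open(path, newline="", encoding="utf-8")` so that quoted fields and Japanese text are read correctly.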
Slide 6
Analysis methods
1. Extract key-phrases with Text Analytics API
2. Analyze syntax with Cloud Natural Language API
3. Analyze syntax with MeCab + NEologd (for comparison)
● Source code
○ https://github.com/TAKAyukiatkwsk/session_analytics_sample
Slide 7
Extract key-phrases with Text Analytics API
Slide 8
Key-phrase frequency
● [title] Finds technical words, but at low frequency (max 4, mostly 1)
● [title + description] Higher frequency (max 9), but the frequent phrases are not technical words
● “Kubernetes” appears 4 times with title only, but 3 times with title + description
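The frequencies above could be tallied along these lines once key phrases have been extracted per session. This is only a sketch: it assumes each phrase is counted once per session (the slides do not say which counting rule was used), and `key_phrase_frequency` and the sample phrases are hypothetical. No call to the Text Analytics API is shown.

```python
from collections import Counter

def key_phrase_frequency(key_phrases_by_session):
    """Count in how many sessions each key phrase appears."""
    counts = Counter()
    for phrases in key_phrases_by_session.values():
        counts.update(set(phrases))  # one count per session, duplicates ignored
    return counts

# Hypothetical extraction results keyed by session id.
extracted = {
    "1": ["Kubernetes", "運用"],
    "2": ["Kubernetes", "Scala"],
    "3": ["Scala"],
}
for phrase, count in key_phrase_frequency(extracted).most_common(2):
    print(phrase, count)
```

Because the extractor decides what counts as a key phrase, the same word can be counted in fewer sessions when the input text changes, which may be one reason a count differs between title and title + description.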
Slide 9
Analyze syntax with Cloud Natural Language API
Slide 10
N-gram Frequency
● [title unigram] More general topics (e.g. Scala, Kubernetes, サービス, コンテナ, Cloud)
● Trends: Scala, Kubernetes, Akka, 機械学習, サーバーレス, Cloud Spanner, マイクロサービス
Slide 11
N-gram Frequency
● [title + description bigram] Phrases are easier to understand than with title bigrams
● “分散トレーシング” is a characteristic phrase
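The unigram and bigram counts on these slides could be computed roughly as follows, given token lists from any tokenizer. This is a minimal sketch: the names `ngrams` and `ngram_frequency` and the sample tokens are hypothetical, and the tokenization step itself (Cloud Natural Language API or MeCab) is assumed to have already happened.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_frequency(tokenized_sessions, n):
    """Count n-grams across all sessions; n-grams never span two sessions."""
    counts = Counter()
    for tokens in tokenized_sessions:
        counts.update(ngrams(tokens, n))
    return counts

# Hypothetical token lists, one per session.
tokenized = [
    ["分散", "トレーシング", "入門"],
    ["Kubernetes", "と", "分散", "トレーシング"],
]
bigram_counts = ngram_frequency(tokenized, 2)
print(bigram_counts[("分散", "トレーシング")])
```

With n=1 the same function gives the unigram counts. A phrase such as “分散トレーシング” only surfaces as a bigram when the tokenizer splits it into two tokens.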
Slide 12
Analyze syntax with MeCab
Slide 13
N-gram Frequency
● “型” is tokenized as a noun (the Natural Language API tags it as an affix)
● “機械学習” and “サーバーレス” are each tokenized as one word
● “関数型” is a characteristic phrase
Slide 14
N-gram Frequency
● Abstract words such as “よう”, “こと”, and “ため” do not appear
● “GraphQL” and “マイクロサービス” are each tokenized as one word (not in chart)
● “分散トレーシング” is a characteristic phrase
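One way abstract words like these could be kept out of the counts is to filter tokens by part of speech before building N-grams. The sketch below assumes tokens arrive as (surface, part of speech, subcategory) tuples using IPADIC-style tags, where such words are typically tagged 名詞 with subcategory 非自立. Whether the original script filtered this way is not stated in the slides; `content_nouns` and the sample tuples are hypothetical and hand-written, not tool output.

```python
def content_nouns(tokens):
    """Keep noun surfaces, dropping non-independent nouns such as こと or ため."""
    kept = []
    for surface, pos, subcategory in tokens:
        if pos != "名詞":
            continue  # particles, verbs, symbols, etc.
        if subcategory == "非自立":
            continue  # abstract, non-independent nouns
        kept.append(surface)
    return kept

# Hand-written sample tuples for illustration only.
sample_tokens = [
    ("関数", "名詞", "一般"),
    ("型", "名詞", "接尾"),
    ("の", "助詞", "連体化"),
    ("こと", "名詞", "非自立"),
]
print(content_nouns(sample_tokens))
```

The filtered surfaces can then be passed to whatever N-gram counting step is used.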
Slide 15
Results
● Trends: Kubernetes, Serverless, Scala, 分散トレーシング, 関数型, Akka, Cloud Spanner
○ The frequency depends on title and description quality
● Cloud APIs are useful
● Key phrases alone are not enough; combining them with N-grams works better
● MeCab + NEologd can analyze better than the Natural Language API (in Japanese/specific category?)
Slide 16
Results
● If you are familiar with NLP, please teach me about NLP and analysis methods!!!
Slide 17
References
● Cloud Natural Language | Cloud Natural Language API | Google Cloud
○ https://cloud.google.com/natural-language/?hl=ja
● What is the Text Analytics API? - Azure Cognitive Services | Microsoft Docs
○ https://docs.microsoft.com/ja-jp/azure/cognitive-services/text-analytics/overview
● Amazon Comprehend (discover insights and relationships in text) | AWS
○ https://aws.amazon.com/jp/comprehend/
● Review trends of highly rated ramen shops seen through TF-IDF (natural language processing, TF-IDF, MeCab, wordcloud, morphological analysis, word segmentation) - ギークなエンジニアを目指す男
○ https://www.takapy.work/entry/2019/01/14/142128
● Text analysis using N-gram models: index page
○ http://www.shuiren.org/chuden/teach/n-gram/index-j.html