Text Mining: Exploratory Data Analysis to Machine Learning
Julia Silge
March 04, 2019
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING: EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO: I'm Julia Silge, Data Scientist at Stack Overflow. @juliasilge https://juliasilge.com/
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
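To make "tidy data principles + count-based methods" concrete, here is a minimal sketch in R using `unnest_tokens()` from tidytext and `count()` from dplyr. The toy sentences and the names `text_df`, `tidy_words`, and `word_counts` are invented for illustration and are not from the talk.

```r
library(dplyr)
library(tidytext)

# Toy data, invented for illustration: one row per line of text
text_df <- tibble(
  line = 1:3,
  text = c("Text data is increasingly important",
           "Tidy data principles make text data easier to handle",
           "Count the words in the text")
)

# Tidy text format: one token (here, one word) per row.
# unnest_tokens() lowercases and strips punctuation by default.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Count-based method: word frequencies, most common first
word_counts <- tidy_words %>%
  count(word, sort = TRUE)
```

Once the text is in this one-token-per-row shape, the usual dplyr verbs (filter, join, group, summarise) apply directly.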
EXPLORATORY DATA ANALYSIS, N-GRAMS AND MORE WORDS, MACHINE LEARNING
EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
TERM FREQUENCY, INVERSE DOCUMENT FREQUENCY
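A rough sketch of how tf-idf can be computed with tidytext's `bind_tf_idf()`. The two toy documents and the names `docs`, `doc_words`, and `doc_tf_idf` are assumptions made up for this example. A word that appears in every document (here "the") gets an inverse document frequency of zero, so it drops out as a description of what any one document is about.

```r
library(dplyr)
library(tidytext)

# Two toy documents, invented for illustration
docs <- tibble(
  doc  = c("a", "b"),
  text = c("the rocket launch the rocket", "the ocean tide")
)

# Word counts per document
doc_words <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word)

# Adds tf, idf, and tf_idf columns; idf = ln(n_docs / n_docs containing the word)
doc_tf_idf <- doc_words %>%
  bind_tf_idf(word, doc, n) %>%
  arrange(desc(tf_idf))
```

Words with a high tf-idf are frequent within one document but rare across the collection.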
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL: N-GRAMS, NETWORKS, & NEGATION
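One way n-grams and negation might be handled with tidytext is to tokenize into bigrams and then look at which words follow "not". This is a sketch under assumptions: the toy sentences and the names `reviews`, `bigrams`, and `negated` are invented here, and the network step from the slide title is left out.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy sentences, invented for illustration
reviews <- tibble(
  id   = 1:2,
  text = c("I am not happy with this", "I am happy with this")
)

# Tokenize into bigrams (pairs of consecutive words),
# then split each bigram into its two words
bigrams <- reviews %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ")

# Words preceded by "not": candidates for reversed sentiment
negated <- bigrams %>%
  filter(word1 == "not") %>%
  count(word1, word2, sort = TRUE)
```

A single-word sentiment count would treat "happy" the same way in both sentences; the bigram view shows that one of them is negated.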
TAKING TIDYTEXT TO THE NEXT LEVEL: TOPIC MODELING
TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
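The two mixtures above can be inspected directly after fitting a model. The transcript does not say which topic-modeling package the talk used, so the sketch below assumes latent Dirichlet allocation via `topicmodels::LDA()`, with tidytext's `cast_dtm()` and `tidy()` as the bridge. The toy documents, `k = 2`, and the seed are invented for illustration.

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Toy documents, invented for illustration
docs <- tibble(
  doc  = c("d1", "d2", "d3", "d4"),
  text = c("rocket launch orbit rocket",
           "orbit satellite launch",
           "ocean tide wave ocean",
           "wave tide current")
)

# Cast tidy word counts into a document-term matrix
dtm <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word) %>%
  cast_dtm(doc, word, n)

# Fit a 2-topic LDA model (k and seed are arbitrary choices here)
lda <- LDA(dtm, k = 2, control = list(seed = 1234))

# Each TOPIC = mixture of words: per-topic word probabilities
topic_words <- tidy(lda, matrix = "beta")

# Each DOCUMENT = mixture of topics: per-document topic proportions
doc_topics <- tidy(lda, matrix = "gamma")
```

Within each topic the `beta` values sum to 1, and within each document the `gamma` values sum to 1, which is what "mixture" means in both bullets.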
TAKING TIDYTEXT TO THE NEXT LEVEL: TEXT CLASSIFICATION
TRAIN A GLMNET MODEL
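TEXT CLASSIFICATION
library(glmnet)
library(doMC)
# Register a parallel backend with 8 cores for cross-validation
registerDoMC(cores = 8)

# Response: TRUE for documents from "Pride and Prejudice"
is_jane <- books_joined$title == "Pride and Prejudice"

# Cross-validated logistic regression on the sparse_words matrix
model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
                   parallel = TRUE, keep = TRUE)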
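The slide does not show how `sparse_words` and `books_joined` were built. One plausible construction, offered only as an assumption, is tidytext's `cast_sparse()` followed by a join that lines the titles up with the matrix rows. The toy `docs` table and the title "Another Book" are hypothetical.

```r
library(dplyr)
library(tidytext)

# Toy documents with known titles, invented for illustration
docs <- tibble(
  document = 1:4,
  title = c("Pride and Prejudice", "Pride and Prejudice",
            "Another Book", "Another Book"),
  text = c("she danced at the ball",
           "a letter arrived at the estate",
           "the ship left the harbour",
           "the captain watched the sea")
)

# Sparse document-by-word count matrix (a dgCMatrix from the Matrix package)
sparse_words <- docs %>%
  unnest_tokens(word, text) %>%
  count(document, word) %>%
  cast_sparse(document, word, n)

# Align the titles with the matrix rows via the row names
books_joined <- tibble(document = as.integer(rownames(sparse_words))) %>%
  left_join(select(docs, document, title), by = "document")

# Logical response vector, in the same order as the rows of sparse_words
is_jane <- books_joined$title == "Pride and Prejudice"
```

Joining on the row names keeps the response in the same order as the rows of the predictor matrix, which `cv.glmnet()` relies on.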
THANK YOU
JULIA SILGE @juliasilge https://juliasilge.com
Author portraits from Wikimedia. Photos by Glen Noble and Kimberly Farmer on Unsplash.