Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
230
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
270
The right tool for the job
juliasilge
0
41
Good practices for applied machine learning
juliasilge
0
200
Applied machine learning with tidymodels
juliasilge
0
110
Maintaining an R Package
juliasilge
0
360
Publishing the Stack Overflow Developer Survey
juliasilge
2
67
Text Mining Using Tidy Data Principles
juliasilge
0
130
North American Developer Hiring Landscape
juliasilge
0
50
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
5min GuardDuty Extended Threat Detection EKS
takakuni
0
130
【TiDB GAME DAY 2025】Shadowverse: Worlds Beyond にみる TiDB 活用術
cygames
0
1k
急成長を支える基盤作り〜地道な改善からコツコツと〜 #cre_meetup
stefafafan
0
120
Agentic DevOps時代の生存戦略
kkamegawa
1
1.3k
Observability infrastructure behind the trillion-messages scale Kafka platform
lycorptech_jp
PRO
0
140
AWS CDK 実践的アプローチ N選 / aws-cdk-practical-approaches
gotok365
6
720
ハノーバーメッセ2025座談会.pdf
iotcomjpadmin
0
160
25分で解説する「最小権限の原則」を実現するための AWS「ポリシー」大全 / 20250625-aws-summit-aws-policy
opelab
9
1.1k
フィンテック養成勉強会#54
finengine
0
170
Prox Industries株式会社 会社紹介資料
proxindustries
0
270
rubygem開発で鍛える設計力
joker1007
2
190
LinkX_GitHubを基点にした_AI時代のプロジェクトマネジメント.pdf
iotcomjpadmin
0
170
Featured
See All Featured
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
107
19k
Adopting Sorbet at Scale
ufuk
77
9.4k
Producing Creativity
orderedlist
PRO
346
40k
Docker and Python
trallard
44
3.4k
Fantastic passwords and where to find them - at NoRuKo
philnash
51
3.3k
GraphQLとの向き合い方2022年版
quramy
47
14k
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
44
2.4k
Keith and Marios Guide to Fast Websites
keithpitt
411
22k
Building Applications with DynamoDB
mza
95
6.5k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
46
9.6k
Stop Working from a Prison Cell
hatefulcrawdad
270
20k
実際に使うSQLの書き方 徹底解説 / pgcon21j-tutorial
soudai
PRO
181
53k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE