Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
220
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
220
The right tool for the job
juliasilge
0
28
Good practices for applied machine learning
juliasilge
0
180
Applied machine learning with tidymodels
juliasilge
0
97
Maintaining an R Package
juliasilge
0
330
Publishing the Stack Overflow Developer Survey
juliasilge
2
57
Text Mining Using Tidy Data Principles
juliasilge
0
120
North American Developer Hiring Landscape
juliasilge
0
36
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.4k
Other Decks in Technology
See All in Technology
Potential EM 制度を始めた理由、そして2年後にやめた理由 - EMConf JP 2025
hoyo
2
1.6k
Two Blades, One Journey: Engineering While Managing
ohbarye
3
770
コンピュータビジョンの社会実装について考えていたらゲームを作っていた話
takmin
1
570
Autonomous Database Serverless 技術詳細 / adb-s_technical_detail_jp
oracle4engineer
PRO
17
45k
転生CISOサバイバル・ガイド / CISO Career Transition Survival Guide
kanny
3
1.1k
【詳説】コンテンツ配信 システムの複数機能 基盤への拡張
hatena
0
190
依存パッケージの更新はコツコツが勝つコツ! / phpcon_nagoya2025
blue_goheimochi
3
200
クラウドサービス事業者におけるOSS
tagomoris
3
970
php-conference-nagoya-2025
fuwasegu
0
140
OPENLOGI Company Profile
hr01
0
60k
システム・ML活用を広げるdbtのデータモデリング / Expanding System & ML Use with dbt Modeling
i125
1
310
ディスプレイ広告(Yahoo!広告・LINE広告)におけるバックエンド開発
lycorptech_jp
PRO
0
200
Featured
See All Featured
Building Applications with DynamoDB
mza
93
6.2k
RailsConf 2023
tenderlove
29
1k
A Philosophy of Restraint
colly
203
16k
Site-Speed That Sticks
csswizardry
4
400
Optimizing for Happiness
mojombo
376
70k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.7k
Measuring & Analyzing Core Web Vitals
bluesmoon
6
250
The Language of Interfaces
destraynor
156
24k
YesSQL, Process and Tooling at Scale
rocio
172
14k
A better future with KSS
kneath
238
17k
The Art of Programming - Codeland 2020
erikaheidi
53
13k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
33
2.1k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE