Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
240
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
310
The right tool for the job
juliasilge
0
50
Good practices for applied machine learning
juliasilge
0
210
Applied machine learning with tidymodels
juliasilge
0
130
Maintaining an R Package
juliasilge
0
370
Publishing the Stack Overflow Developer Survey
juliasilge
2
75
Text Mining Using Tidy Data Principles
juliasilge
0
140
North American Developer Hiring Landscape
juliasilge
0
54
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
これでもう迷わない!Jetpack Composeの書き方実践ガイド
zozotech
PRO
0
880
Codeful Serverless / 一人運用でもやり抜く力
_kensh
7
430
フルカイテン株式会社 エンジニア向け採用資料
fullkaiten
0
8.7k
品質視点から考える組織デザイン/Organizational Design from Quality
mii3king
0
210
20250910_障害注入から効率的復旧へ_カオスエンジニアリング_生成AIで考えるAWS障害対応.pdf
sh_fk2
3
260
Language Update: Java
skrb
2
300
20250913_JAWS_sysad_kobe
takuyay0ne
2
220
2025年になってもまだMySQLが好き
yoku0825
8
4.8k
DDD集約とサービスコンテキスト境界との関係性
pandayumi
3
280
会社紹介資料 / Sansan Company Profile
sansan33
PRO
6
380k
スマートファクトリーの第一歩 〜AWSマネージドサービスで 実現する予知保全と生成AI活用まで
ganota
2
220
Firestore → Spanner 移行 を成功させた段階的移行プロセス
athug
1
490
Featured
See All Featured
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.7k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
580
How GitHub (no longer) Works
holman
315
140k
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Practical Orchestrator
shlominoach
190
11k
Agile that works and the tools we love
rasmusluckow
330
21k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
34
6k
The Success of Rails: Ensuring Growth for the Next 100 Years
eileencodes
46
7.6k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
358
30k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.6k
Context Engineering - Making Every Token Count
addyosmani
3
49
How STYLIGHT went responsive
nonsquared
100
5.8k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE