Julia Silge
March 04, 2019
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO
I’m Julia Silge
Data Scientist at Stack Overflow
@juliasilge
https://juliasilge.com/
TIDYTEXT
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
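The equation on this slide can be illustrated with a small sketch. It uses base R only and a made-up two-document corpus (the `docs` data frame and its contents are assumptions, not from the talk). The tidytext package offers `unnest_tokens()` for the tokenizing step on real data frames; this version just shows the idea of one token per row followed by counting.

```r
# Toy corpus (hypothetical; not from the talk).
docs <- data.frame(
  doc = c("a", "b"),
  text = c("The cat sat on the mat", "The dog sat"),
  stringsAsFactors = FALSE
)

# Split each text into lowercase word tokens.
tokens <- strsplit(tolower(docs$text), "[^a-z]+")

# Tidy data principle: one row per token, keeping the document id.
tidy_words <- data.frame(
  doc = rep(docs$doc, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# Count-based method: how often each word appears in each document.
word_counts <- as.data.frame(
  table(doc = tidy_words$doc, word = tidy_words$word),
  responseName = "n", stringsAsFactors = FALSE
)
word_counts <- word_counts[word_counts$n > 0, ]
```

Once text is in this one-token-per-row shape, ordinary data frame tools (filtering, grouping, joining, plotting) apply to it directly.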
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
• EXPLORATORY DATA ANALYSIS
• N-GRAMS AND MORE WORDS
• MACHINE LEARNING
EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
TERM FREQUENCY
INVERSE DOCUMENT FREQUENCY
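A minimal sketch of how term frequency and inverse document frequency combine, in base R with made-up counts (the `counts` data frame is an assumption for illustration). It uses tf = n / (total words in the document) and idf = ln(number of documents / number of documents containing the word), which is the definition tidytext's `bind_tf_idf()` uses.

```r
# Hypothetical word counts, one row per (document, word) pair.
counts <- data.frame(
  doc = c("a", "a", "b", "b"),
  word = c("the", "cat", "the", "dog"),
  n = c(2, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Term frequency: share of the document's words taken up by this word.
counts$tf <- counts$n / ave(counts$n, counts$doc, FUN = sum)

# Inverse document frequency: ln(total documents / documents with the word).
n_docs <- length(unique(counts$doc))
docs_with_word <- ave(counts$n, counts$word, FUN = length)
counts$idf <- log(n_docs / docs_with_word)

# tf-idf is high for words frequent in one document but rare across documents.
counts$tf_idf <- counts$tf * counts$idf
```

In this toy corpus "the" appears in every document, so its idf and tf-idf are zero, while "cat" and "dog" each get a positive score in the one document that contains them.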
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL
N-GRAMS, NETWORKS, & NEGATION
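The slides for this section are image-only in the transcript, so the following is only a guess at the kind of analysis the title refers to. It is a base R sketch on a made-up sentence: consecutive word pairs (bigrams) are built as a two-column data frame, which makes it possible to find words preceded by "not" and to count pairs as edges of a word network. With tidytext, `unnest_tokens()` with `token = "ngrams"` and `n = 2` produces bigrams from real text.

```r
# Hypothetical sentence, already lowercase and free of punctuation.
text <- "i am not happy and i am not sad"
words <- strsplit(text, " ")[[1]]

# Bigrams: each word paired with the word that follows it.
bigrams <- data.frame(
  word1 = head(words, -1),
  word2 = tail(words, -1),
  stringsAsFactors = FALSE
)

# Negation: words that directly follow "not".
negated <- bigrams$word2[bigrams$word1 == "not"]

# Networks: count each distinct pair; the rows are weighted directed edges.
edges <- aggregate(n ~ word1 + word2, data = cbind(bigrams, n = 1), FUN = sum)
```

Words such as "happy" that appear after "not" are the ones a word-level sentiment score would misread, which is one reason to look at pairs rather than single words.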
TAKING TIDYTEXT TO THE NEXT LEVEL
TOPIC MODELING
TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
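The two bullets above can be made concrete with a small numeric sketch in base R. The numbers are invented for illustration and are not the output of a fitted model: `theta` holds each document's mixture over topics, `beta` holds each topic's mixture over words, and multiplying them gives the word distribution each document implies.

```r
# Hypothetical document-topic proportions (each row sums to 1).
theta <- matrix(
  c(0.9, 0.1,
    0.2, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("topic1", "topic2"))
)

# Hypothetical topic-word probabilities (each row sums to 1).
beta <- matrix(
  c(0.5, 0.4, 0.1,
    0.1, 0.2, 0.7),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("topic1", "topic2"), c("ship", "sea", "love"))
)

# Word distribution per document: a topic-weighted blend of word mixtures.
doc_words <- theta %*% beta
```

A topic model works in the opposite direction: it starts from observed word counts and estimates matrices like `theta` and `beta`.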
TAKING TIDYTEXT TO THE NEXT LEVEL
TEXT CLASSIFICATION
TRAIN A GLMNET MODEL
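TEXT CLASSIFICATION
library(glmnet)
library(doMC)
# Register 8 cores so cv.glmnet can fit its cross-validation folds in parallel
registerDoMC(cores = 8)
# Response: TRUE where the title is "Pride and Prejudice"
is_jane <- books_joined$title == "Pride and Prejudice"
# Cross-validated regularized logistic regression on sparse_words
model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
                   parallel = TRUE, keep = TRUE)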
THANK YOU
JULIA SILGE
@juliasilge
https://juliasilge.com
Author portraits from Wikimedia
Photos by Glen Noble and Kimberly Farmer on Unsplash