Julia Silge
March 04, 2019
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO
TIDYTEXT
I’m Julia Silge
Data Scientist at Stack Overflow
@juliasilge
https://juliasilge.com/
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
EXPLORATORY DATA ANALYSIS
N-GRAMS AND MORE WORDS
MACHINE LEARNING

EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
TERM FREQUENCY
INVERSE DOCUMENT FREQUENCY
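The slide names the two quantities but gives no formula. As a sketch, they are commonly combined as shown below. This assumes the usual count-based definition with a natural-log idf, and the symbols ($n_{t,d}$, $N$, $D$) are notation introduced here for illustration, not taken from the talk.

```latex
\[
\mathrm{tf}(t, d) = \frac{n_{t,d}}{\sum_{t'} n_{t',d}},
\qquad
\mathrm{idf}(t) = \ln\!\left(\frac{N}{\lvert \{\, d \in D : n_{t,d} > 0 \,\} \rvert}\right),
\qquad
\text{tf-idf}(t, d) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t)
\]
% n_{t,d}: number of times term t occurs in document d
% N:       number of documents in the collection D
```

For example, a term that appears in every document has $\mathrm{idf} = \ln(1) = 0$, so its tf-idf is zero no matter how often it occurs. Only terms concentrated in a few documents score highly, which is why the statistic helps answer what a document is about.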
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL
N-GRAMS, NETWORKS, & NEGATION
TAKING TIDYTEXT TO THE NEXT LEVEL
TOPIC MODELING

TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
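The two bullets above can be written as a single mixture formula. This is a sketch that assumes a standard probabilistic topic model such as LDA, which the transcript does not name. The symbols $\gamma$ and $\beta$ are chosen for illustration, with $K$ the assumed number of topics.

```latex
\[
p(w \mid d) = \sum_{k=1}^{K} \gamma_{d,k}\, \beta_{k,w},
\qquad
\sum_{k=1}^{K} \gamma_{d,k} = 1,
\qquad
\sum_{w} \beta_{k,w} = 1
\]
% \gamma_{d,k}: proportion of document d drawn from topic k  (document = mixture of topics)
% \beta_{k,w}:  probability of word w under topic k          (topic = mixture of words)
```

Read this way, fitting a topic model means estimating both sets of proportions at once from the observed word counts.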
TAKING TIDYTEXT TO THE NEXT LEVEL
TEXT CLASSIFICATION

TRAIN A GLMNET MODEL
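TEXT CLASSIFICATION

# books_joined and sparse_words are not defined in this transcript;
# they are assumed to come from earlier steps of the analysis.
library(glmnet)
library(doMC)

# Register a parallel backend so cv.glmnet can fit the folds in parallel
registerDoMC(cores = 8)

# Response: TRUE for rows whose title is "Pride and Prejudice"
is_jane <- books_joined$title == "Pride and Prejudice"

# Cross-validated regularized logistic regression on the sparse matrix
model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
                   parallel = TRUE, keep = TRUE)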
THANK YOU
JULIA SILGE
@juliasilge
https://juliasilge.com
Author portraits from Wikimedia
Photos by Glen Noble and Kimberly Farmer on Unsplash