$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
240
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
330
The right tool for the job
juliasilge
0
58
Good practices for applied machine learning
juliasilge
0
210
Applied machine learning with tidymodels
juliasilge
0
140
Maintaining an R Package
juliasilge
0
390
Publishing the Stack Overflow Developer Survey
juliasilge
2
78
Text Mining Using Tidy Data Principles
juliasilge
0
150
North American Developer Hiring Landscape
juliasilge
0
62
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
Kill the Vibe?Architecture in the age of AI
stoth
1
150
AI駆動開発によるDDDの実践
dip_tech
PRO
0
220
インフラ屋さんはAIコーディングエージェントとどう生きるか/How infrastructure engineers interact with Kiro
ozawa
2
110
IPv6-mostly field report from RubyKaigi 2026
sorah
0
250
DGX SparkでローカルLLMをLangChainで動かした話
ruzia
1
230
All About Sansan – for New Global Engineers
sansan33
PRO
1
1.3k
あなたの知らないDateのひみつ / The Secret of "Date" You Haven't known #tqrk16
expajp
0
110
AIにおける自由の追求
shujisado
2
430
研究開発部メンバーの働き⽅ / Sansan R&D Profile
sansan33
PRO
3
21k
Digitization部 紹介資料
sansan33
PRO
1
6.1k
Multimodal AI Driving Solutions to Societal Challenges
keio_smilab
PRO
1
100
プラットフォームエンジニアリングとは何であり、なぜプラットフォームエンジニアリングなのか
doublemarket
1
520
Featured
See All Featured
Raft: Consensus for Rubyists
vanstee
140
7.2k
Facilitating Awesome Meetings
lara
57
6.6k
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
51k
How to Ace a Technical Interview
jacobian
280
24k
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
A better future with KSS
kneath
240
18k
The Cult of Friendly URLs
andyhume
79
6.7k
Building an army of robots
kneath
306
46k
Being A Developer After 40
akosma
91
590k
Making the Leap to Tech Lead
cromwellryan
135
9.6k
Keith and Marios Guide to Fast Websites
keithpitt
413
23k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE