Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
240
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
310
The right tool for the job
juliasilge
0
48
Good practices for applied machine learning
juliasilge
0
200
Applied machine learning with tidymodels
juliasilge
0
130
Maintaining an R Package
juliasilge
0
370
Publishing the Stack Overflow Developer Survey
juliasilge
2
73
Text Mining Using Tidy Data Principles
juliasilge
0
140
North American Developer Hiring Landscape
juliasilge
0
54
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
サラリーマンの小遣いで作るtoCサービス - Cloudflare Workersでスケールする開発戦略
shinaps
1
330
なぜSaaSがMCPサーバーをサービス提供するのか?
sansantech
PRO
8
2.7k
COVESA VSSによる車両データモデルの標準化とAWS IoT FleetWiseの活用
osawa
1
240
Obsidian応用活用術
onikun94
1
440
おやつは300円まで!の最適化を模索してみた
techtekt
PRO
0
290
ZOZOマッチのアーキテクチャと技術構成
zozotech
PRO
3
1.4k
Skrub: machine-learning with dataframes
gaelvaroquaux
0
120
allow_retry と Arel.sql / allow_retry and Arel.sql
euglena1215
1
160
Language Update: Java
skrb
2
280
オブザーバビリティが広げる AIOps の世界 / The World of AIOps Expanded by Observability
aoto
PRO
0
330
下手な強制、ダメ!絶対! 「ガードレール」を「檻」にさせない"ガバナンス"の取り方とは?
tsukaman
2
410
5年目から始める Vue3 サイト改善 #frontendo
tacck
PRO
3
200
Featured
See All Featured
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
Imperfection Machines: The Place of Print at Facebook
scottboms
268
13k
A designer walks into a library…
pauljervisheath
207
24k
The Art of Programming - Codeland 2020
erikaheidi
55
13k
Building a Modern Day E-commerce SEO Strategy
aleyda
43
7.6k
Rebuilding a faster, lazier Slack
samanthasiow
83
9.2k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
252
21k
Side Projects
sachag
455
43k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
The Cult of Friendly URLs
andyhume
79
6.6k
YesSQL, Process and Tooling at Scale
rocio
173
14k
Statistics for Hackers
jakevdp
799
220k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE