Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
220
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
190
The right tool for the job
juliasilge
0
23
Good practices for applied machine learning
juliasilge
0
180
Applied machine learning with tidymodels
juliasilge
0
92
Maintaining an R Package
juliasilge
0
320
Publishing the Stack Overflow Developer Survey
juliasilge
2
55
Text Mining Using Tidy Data Principles
juliasilge
0
110
North American Developer Hiring Landscape
juliasilge
0
33
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.4k
Other Decks in Technology
See All in Technology
サーバレスアプリ開発者向けアップデートをキャッチアップしてきた #AWSreInvent #regrowth_fuk
drumnistnakano
0
190
フロントエンド設計にモブ設計を導入してみた / 20241212_cloudsign_TechFrontMeetup
bengo4com
0
1.9k
Snowflake女子会#3 Snowpipeの良さを5分で語るよ
lana2548
0
220
re:Invent 2024 Innovation Talks(NET201)で語られた大切なこと
shotashiratori
0
300
20241220_S3 tablesの使い方を検証してみた
handy
3
250
日本版とグローバル版のモバイルアプリ統合の開発の裏側と今後の展望
miichan
1
120
大幅アップデートされたRagas v0.2をキャッチアップ
os1ma
2
520
統計データで2024年の クラウド・インフラ動向を眺める
ysknsid25
2
840
バクラクのドキュメント解析技術と実データにおける課題 / layerx-ccc-winter-2024
shimacos
2
1k
小学3年生夏休みの自由研究「夏休みに Copilot で遊んでみた」
taichinakamura
0
150
AI時代のデータセンターネットワーク
lycorptech_jp
PRO
1
280
AIのコンプラは何故しんどい?
shujisado
1
190
Featured
See All Featured
Writing Fast Ruby
sferik
628
61k
KATA
mclloyd
29
14k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
226
22k
Done Done
chrislema
181
16k
Embracing the Ebb and Flow
colly
84
4.5k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
5
440
GraphQLとの向き合い方2022年版
quramy
44
13k
Art, The Web, and Tiny UX
lynnandtonic
298
20k
Rails Girls Zürich Keynote
gr2m
94
13k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
45
2.2k
Typedesign – Prime Four
hannesfritz
40
2.4k
Stop Working from a Prison Cell
hatefulcrawdad
267
20k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE