Julia Silge
March 04, 2019
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Transcript
TEXT MINING: EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO
I’m Julia Silge, Data Scientist at Stack Overflow
@juliasilge https://juliasilge.com/
TEXT DATA IS INCREASINGLY IMPORTANT
NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = TIDYTEXT
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
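As a minimal sketch of what "tidy data principles + count-based methods" can look like in practice, the example below tokenizes a toy corpus into one row per word and counts the words. The `docs` table and its contents are invented for illustration and are not from the talk; `unnest_tokens()` comes from tidytext and `count()` from dplyr.

```r
library(dplyr)
library(tidytext)

# Toy corpus, invented for illustration (not data from the talk)
docs <- tibble(
  doc  = c("a", "b"),
  text = c("Text data is increasingly important.",
           "Tidy data principles make text data easier to analyze.")
)

# One row per word: unnest_tokens() lowercases and strips punctuation by default
tidy_words <- docs %>%
  unnest_tokens(word, text)

# Count-based summary: how often each word appears across the corpus
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

word_counts
```

Because the result is an ordinary data frame, the usual dplyr and ggplot2 verbs apply to it directly.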
TIDYTEXT: EXPLORATORY DATA ANALYSIS, N-GRAMS AND MORE WORDS, MACHINE LEARNING
EXPLORATORY DATA ANALYSIS
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT?
TERM FREQUENCY
INVERSE DOCUMENT FREQUENCY
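A small, hedged sketch of term frequency and inverse document frequency using tidytext's `bind_tf_idf()`. The two toy documents are invented for illustration. Words that occur in every document get an inverse document frequency of zero, so only the distinctive words keep a non-zero tf-idf.

```r
library(dplyr)
library(tidytext)

# Two toy documents, invented for illustration
docs <- tibble(
  doc  = c("physics", "biology"),
  text = c("the orbit of the planet", "the cell of the plant")
)

# Count each word within each document
doc_words <- docs %>%
  unnest_tokens(word, text) %>%
  count(doc, word)

# tf  = word count / total words in the document
# idf = ln(number of documents / number of documents containing the word)
doc_tf_idf <- doc_words %>%
  bind_tf_idf(word, doc, n) %>%
  arrange(desc(tf_idf))

doc_tf_idf
```

Here "the" and "of" appear in both documents and score zero, while "orbit", "planet", "cell", and "plant" are the words that characterize each document.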
• As part of the NASA Datanauts program, I worked on a project to understand NASA datasets
• Metadata includes title, description, keywords, etc.
TAKING TIDYTEXT TO THE NEXT LEVEL: N-GRAMS, NETWORKS, & NEGATION
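One way n-grams and negation might fit together, as a sketch: tokenize into bigrams, split each bigram into its two words, and look at which words follow "not". The sentences are invented for illustration; `token = "ngrams"` and `n = 2` are arguments of tidytext's `unnest_tokens()`, and `separate()` comes from tidyr.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Toy sentences, invented for illustration
docs <- tibble(text = c("I am not happy today", "I am happy today"))

# Tokenize into overlapping pairs of consecutive words (bigrams),
# then split each bigram into its first and second word
bigrams <- docs %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  separate(bigram, into = c("word1", "word2"), sep = " ")

# Words preceded by "not": candidates for flipped sentiment
negated <- bigrams %>%
  filter(word1 == "not") %>%
  count(word2, sort = TRUE)

negated
```

The same `word1`/`word2` pairs could also serve as the edge list of a word network.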
TAKING TIDYTEXT TO THE NEXT LEVEL: TOPIC MODELING
• Each DOCUMENT = mixture of topics
• Each TOPIC = mixture of words
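The two mixtures above can be inspected directly once a topic model is fit. The sketch below assumes latent Dirichlet allocation via the topicmodels package, which the slides do not name, and uses invented word counts. tidytext's `tidy()` returns the per-topic word probabilities (`matrix = "beta"`) and the per-document topic proportions (`matrix = "gamma"`).

```r
library(dplyr)
library(tidytext)
library(topicmodels)

# Toy word counts, invented for illustration
word_counts <- tibble(
  document = c("d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4"),
  word     = c("planet", "orbit", "orbit", "planet",
               "cell", "gene", "gene", "cell"),
  n        = c(3L, 2L, 3L, 1L, 3L, 2L, 3L, 1L)
)

# Cast the tidy counts into a document-term matrix and fit a 2-topic model
dtm <- word_counts %>% cast_dtm(document, word, n)
lda <- LDA(dtm, k = 2, control = list(seed = 1234))

# Each TOPIC = mixture of words (beta sums to 1 within a topic)
topic_words <- tidy(lda, matrix = "beta")

# Each DOCUMENT = mixture of topics (gamma sums to 1 within a document)
doc_topics <- tidy(lda, matrix = "gamma")
```

Both results are tidy data frames, so they can be filtered, summarized, and plotted like any other table.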
TAKING TIDYTEXT TO THE NEXT LEVEL: TEXT CLASSIFICATION
TRAIN A GLMNET MODEL
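    library(glmnet)
    library(doMC)

    # Register 8 cores so cv.glmnet() can fit cross-validation folds in parallel
    registerDoMC(cores = 8)

    # books_joined and sparse_words are assumed to exist already
    # Response: TRUE for rows that come from "Pride and Prejudice"
    is_jane <- books_joined$title == "Pride and Prejudice"

    # Cross-validated, regularized logistic regression on the sparse word matrix
    model <- cv.glmnet(sparse_words, is_jane, family = "binomial",
                       parallel = TRUE, keep = TRUE)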
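The slide does not show how `sparse_words` and `books_joined` are built. As a sketch under that caveat, the inputs could be prepared along these lines with tidytext's `cast_sparse()`. The `book_lines` table, its text, and the second title are hypothetical stand-ins, not the talk's data.

```r
library(dplyr)
library(tidytext)

# Hypothetical input: one row per chunk of text, with the book it came from
book_lines <- tibble(
  document = 1:4,
  title    = c("Pride and Prejudice", "Pride and Prejudice",
               "Another Book", "Another Book"),
  text     = c("a truth universally acknowledged",
               "a single man in possession of a good fortune",
               "the machine rose over the common",
               "the heat ray swept the common")
)

# Sparse document-by-word matrix of counts (a Matrix::dgCMatrix)
sparse_words <- book_lines %>%
  unnest_tokens(word, text) %>%
  count(document, word) %>%
  cast_sparse(document, word, n)

# Line the titles up with the matrix rows so the response matches row for row
books_joined <- tibble(document = as.integer(rownames(sparse_words))) %>%
  left_join(select(book_lines, document, title), by = "document")

is_jane <- books_joined$title == "Pride and Prejudice"
```

Joining on the matrix row names, rather than reusing the original table order, guards against the rows and the response drifting out of alignment if any document is dropped during tokenizing or filtering.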
THANK YOU
JULIA SILGE
@juliasilge https://juliasilge.com
Author portraits from Wikimedia
Photos by Glen Noble and Kimberly Farmer on Unsplash