Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
220
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
160
The right tool for the job
juliasilge
0
22
Good practices for applied machine learning
juliasilge
0
180
Applied machine learning with tidymodels
juliasilge
0
91
Maintaining an R Package
juliasilge
0
310
Publishing the Stack Overflow Developer Survey
juliasilge
2
55
Text Mining Using Tidy Data Principles
juliasilge
0
110
North American Developer Hiring Landscape
juliasilge
0
33
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.4k
Other Decks in Technology
See All in Technology
Can We Measure Developer Productivity?
ewolff
1
150
インフラとバックエンドとフロントエンドをくまなく調べて遅いアプリを早くした件
tubone24
1
430
20241120_JAWS_東京_ランチタイムLT#17_AWS認定全冠の先へ
tsumita
2
230
rootlessコンテナのすゝめ - 研究室サーバーでもできる安全なコンテナ管理
kitsuya0828
3
380
Security-JAWS【第35回】勉強会クラウドにおけるマルウェアやコンテンツ改ざんへの対策
4su_para
0
170
Lexical Analysis
shigashiyama
1
150
フルカイテン株式会社 採用資料
fullkaiten
0
40k
マルチプロダクトな開発組織で 「開発生産性」に向き合うために試みたこと / Improving Multi-Product Dev Productivity
sugamasao
1
300
複雑なState管理からの脱却
sansantech
PRO
1
140
100 名超が参加した日経グループ横断の競技型 AWS 学習イベント「Nikkei Group AWS GameDay」の紹介/mediajaws202411
nikkei_engineer_recruiting
1
170
DMARC 対応の話 - MIXI CTO オフィスアワー #04
bbqallstars
1
160
社内で最大の技術的負債のリファクタリングに取り組んだお話し
kidooonn
1
550
Featured
See All Featured
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Writing Fast Ruby
sferik
627
61k
Testing 201, or: Great Expectations
jmmastey
38
7.1k
A better future with KSS
kneath
238
17k
Rebuilding a faster, lazier Slack
samanthasiow
79
8.7k
Faster Mobile Websites
deanohume
305
30k
Fashionably flexible responsive web design (full day workshop)
malarkey
405
65k
The Power of CSS Pseudo Elements
geoffreycrofte
73
5.3k
Put a Button on it: Removing Barriers to Going Fast.
kastner
59
3.5k
BBQ
matthewcrist
85
9.3k
Keith and Marios Guide to Fast Websites
keithpitt
409
22k
The Cult of Friendly URLs
andyhume
78
6k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE