Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
220
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
110
The right tool for the job
juliasilge
0
22
Good practices for applied machine learning
juliasilge
0
180
Applied machine learning with tidymodels
juliasilge
0
88
Maintaining an R Package
juliasilge
0
310
Publishing the Stack Overflow Developer Survey
juliasilge
2
55
Text Mining Using Tidy Data Principles
juliasilge
0
110
North American Developer Hiring Landscape
juliasilge
0
32
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.4k
Other Decks in Technology
See All in Technology
10分でわかるfreeeのQA
freee
1
3.4k
来年もre:Invent2024 に行きたいあなたへ - “集中”と“つながり”で楽しむ -
ny7760
0
470
visionOSでの空間表現実装とImmersive Video表示について / ai-immersive-visionos
cyberagentdevelopers
PRO
1
110
マネジメント視点でのre:Invent参加 ~もしCEOがre:Inventに行ったら~
kojiasai
0
460
Nix入門パラダイム編
asa1984
2
200
わたしとトラックポイント / TrackPoint tips
masahirokawahara
1
240
CyberAgent 生成AI Deep Dive with Amazon Web Services / genai-aws
cyberagentdevelopers
PRO
1
480
Vueで Webコンポーネントを作って Reactで使う / 20241030-cloudsign-vuefes_after_night
bengo4com
4
2.5k
グローバル展開を見据えたサービスにおける機械翻訳プラクティス / dp-ai-translating
cyberagentdevelopers
PRO
1
150
顧客が本当に必要だったもの - パフォーマンス改善編 / Make what is needed
soudai
24
6.8k
物価高なラスベガスでの過ごし方
zakky
0
380
Jr. Championsになって、強く連携しながらAWSをもっと使いたい!~AWSに対する期待と行動~
amixedcolor
0
190
Featured
See All Featured
The Art of Programming - Codeland 2020
erikaheidi
51
13k
Designing Experiences People Love
moore
138
23k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
4
290
YesSQL, Process and Tooling at Scale
rocio
167
14k
The Myth of the Modular Monolith - Day 2 Keynote - Rails World 2024
eileencodes
14
1.9k
A designer walks into a library…
pauljervisheath
202
24k
Building a Scalable Design System with Sketch
lauravandoore
459
33k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
25
1.8k
How to Think Like a Performance Engineer
csswizardry
19
1.1k
KATA
mclloyd
29
13k
Ruby is Unlike a Banana
tanoku
96
11k
Understanding Cognitive Biases in Performance Measurement
bluesmoon
26
1.4k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE