Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Sponsored
·
Your Podcast. Everywhere. Effortlessly.
Share. Educate. Inspire. Entertain. You do you. We'll handle the rest.
→
Julia Silge
March 04, 2019
Technology
1
250
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
340
The right tool for the job
juliasilge
0
69
Good practices for applied machine learning
juliasilge
0
230
Applied machine learning with tidymodels
juliasilge
0
150
Maintaining an R Package
juliasilge
0
410
Publishing the Stack Overflow Developer Survey
juliasilge
2
86
Text Mining Using Tidy Data Principles
juliasilge
0
170
North American Developer Hiring Landscape
juliasilge
0
70
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.6k
Other Decks in Technology
See All in Technology
AIと新時代を切り拓く。これからのSREとメルカリIBISの挑戦
0gm
0
1.4k
プロダクト成長を支える開発基盤とスケールに伴う課題
yuu26
4
1.3k
Kiro IDEのドキュメントを全部読んだので地味だけどちょっと嬉しい機能を紹介する
khmoryz
0
200
顧客との商談議事録をみんなで読んで顧客解像度を上げよう
shibayu36
0
240
顧客の言葉を、そのまま信じない勇気
yamatai1212
1
360
Introduction to Bill One Development Engineer
sansan33
PRO
0
360
名刺メーカーDevグループ 紹介資料
sansan33
PRO
0
1k
登壇駆動学習のすすめ — CfPのネタの見つけ方と書くときに意識していること
bicstone
3
110
会社紹介資料 / Sansan Company Profile
sansan33
PRO
15
400k
StrandsとNeptuneを使ってナレッジグラフを構築する
yakumo
1
120
Digitization部 紹介資料
sansan33
PRO
1
6.8k
Context Engineeringが企業で不可欠になる理由
hirosatogamo
PRO
3
590
Featured
See All Featured
Jess Joyce - The Pitfalls of Following Frameworks
techseoconnect
PRO
1
66
Color Theory Basics | Prateek | Gurzu
gurzu
0
200
KATA
mclloyd
PRO
34
15k
Claude Code のすすめ
schroneko
67
210k
What Being in a Rock Band Can Teach Us About Real World SEO
427marketing
0
170
GitHub's CSS Performance
jonrohan
1032
470k
Docker and Python
trallard
47
3.7k
Information Architects: The Missing Link in Design Systems
soysaucechin
0
780
Git: the NoSQL Database
bkeepers
PRO
432
66k
The SEO Collaboration Effect
kristinabergwall1
0
350
Navigating the moral maze — ethical principles for Al-driven product design
skipperchong
2
240
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
254
22k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE