Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
220
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
220
The right tool for the job
juliasilge
0
27
Good practices for applied machine learning
juliasilge
0
180
Applied machine learning with tidymodels
juliasilge
0
96
Maintaining an R Package
juliasilge
0
330
Publishing the Stack Overflow Developer Survey
juliasilge
2
56
Text Mining Using Tidy Data Principles
juliasilge
0
120
North American Developer Hiring Landscape
juliasilge
0
35
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.4k
Other Decks in Technology
See All in Technology
AI エージェント開発を支える MaaS としての Azure AI Foundry
ryohtaka
6
630
Visualize, Visualize, Visualize and rclone
tomoaki0705
9
67k
Amazon S3 Tablesと外部分析基盤連携について / Amazon S3 Tables and External Data Analytics Platform
nttcom
0
150
AndroidXR 開発ツールごとの できることできないこと
donabe3
0
130
Swiftの “private” を テストする / Testing Swift "private"
yutailang0119
0
140
速くて安いWebサイトを作る
nishiharatsubasa
14
15k
室長と気ままに学ぶマイクロソフトのビジネスアプリケーションとビジネスプロセス
ryoheig0405
0
370
PHPで印刷所に入稿できる名札データを作る / Generating Print-Ready Name Tag Data with PHP
tomzoh
0
140
コンテナサプライチェーンセキュリティ
kyohmizu
1
110
明日からできる!技術的負債の返済を加速するための実践ガイド~『ホットペッパービューティー』の事例をもとに~
recruitengineers
PRO
3
510
人はなぜISUCONに夢中になるのか
kakehashi
PRO
6
1.7k
わたしのOSS活動
kazupon
2
300
Featured
See All Featured
The Invisible Side of Design
smashingmag
299
50k
The World Runs on Bad Software
bkeepers
PRO
67
11k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Making Projects Easy
brettharned
116
6k
Unsuck your backbone
ammeep
669
57k
Typedesign – Prime Four
hannesfritz
40
2.5k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
29
1k
The Straight Up "How To Draw Better" Workshop
denniskardys
232
140k
KATA
mclloyd
29
14k
Producing Creativity
orderedlist
PRO
344
39k
jQuery: Nuts, Bolts and Bling
dougneiner
63
7.6k
ReactJS: Keep Simple. Everything can be a component!
pedronauck
666
120k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE