Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
220
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
210
The right tool for the job
juliasilge
0
27
Good practices for applied machine learning
juliasilge
0
180
Applied machine learning with tidymodels
juliasilge
0
96
Maintaining an R Package
juliasilge
0
330
Publishing the Stack Overflow Developer Survey
juliasilge
2
56
Text Mining Using Tidy Data Principles
juliasilge
0
120
North American Developer Hiring Landscape
juliasilge
0
35
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.4k
Other Decks in Technology
See All in Technology
関東Kaggler会LT: 人狼コンペとLLM量子化について
nejumi
3
420
SCSAから学ぶセキュリティ管理
masakamayama
0
140
データ基盤の成長を加速させる:アイスタイルにおける挑戦と教訓
tsuda7
3
650
APIファーストで実現する運用性の高い IoT プラットフォーム: SORACOMのアプローチ
soracom
PRO
0
240
マルチデータプロダクト開発・運用に耐えるためのデータ組織・アーキテクチャの遷移
mtpooh
1
420
リアルタイム分析データベースで実現する SQLベースのオブザーバビリティ
mikimatsumoto
0
860
モノレポ開発のエラー、誰が見る?Datadog で実現する適切なトリアージとエスカレーション
biwashi
6
770
第13回 Data-Centric AI勉強会, 画像認識におけるData-centric AI
ksaito_osx
0
350
MC906491 を見据えた Microsoft Entra Connect アップグレード対応
tamaiyutaro
1
480
目の前の仕事と向き合うことで成長できる - 仕事とスキルを広げる / Every little bit counts
soudai
22
5.5k
Developers Summit 2025 浅野卓也(13-B-7 LegalOn Technologies)
legalontechnologies
PRO
0
120
[2025-02-07]生成AIで変える問い合わせの未来 〜チームグローバル化の香りを添えて〜
tosite
1
290
Featured
See All Featured
Code Reviewing Like a Champion
maltzj
521
39k
KATA
mclloyd
29
14k
Being A Developer After 40
akosma
89
590k
The World Runs on Bad Software
bkeepers
PRO
67
11k
Keith and Marios Guide to Fast Websites
keithpitt
410
22k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Stop Working from a Prison Cell
hatefulcrawdad
267
20k
A Modern Web Designer's Workflow
chriscoyier
693
190k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
160
15k
Distributed Sagas: A Protocol for Coordinating Microservices
caitiem20
330
21k
Docker and Python
trallard
44
3.3k
Designing Experiences People Love
moore
139
23k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE