Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Text Mining: Exploratory Data Analysis to Machi...
Search
Julia Silge
March 04, 2019
Technology
1
240
Text Mining: Exploratory Data Analysis to Machine Learning
March 2019 talk at WiDS Salt Lake City regional event
Julia Silge
March 04, 2019
Tweet
Share
More Decks by Julia Silge
See All by Julia Silge
Introducing Positron
juliasilge
1
300
The right tool for the job
juliasilge
0
47
Good practices for applied machine learning
juliasilge
0
200
Applied machine learning with tidymodels
juliasilge
0
120
Maintaining an R Package
juliasilge
0
370
Publishing the Stack Overflow Developer Survey
juliasilge
2
72
Text Mining Using Tidy Data Principles
juliasilge
0
140
North American Developer Hiring Landscape
juliasilge
0
53
Understanding Principal Component Analysis Using Stack Overflow Data
juliasilge
13
4.5k
Other Decks in Technology
See All in Technology
Claude Codeから我々が学ぶべきこと
oikon48
10
2.8k
開発 × 生成AI × コミュニケーション:GENDAの開発現場で感じたコミュニケーションの変化 / GENDA Tech Talk #1
genda
0
290
LLM 機能を支える Langfuse / ClickHouse のサーバレス化
yuu26
9
2.5k
LTに影響を受けてテンプレリポジトリを作った話
hol1kgmg
0
380
九州の人に知ってもらいたいGISスポット / gis spot in kyushu 2025
sakaik
0
180
Serverless Meetup #21
yoshidashingo
1
130
生成AIによるデータサイエンスの変革
taka_aki
0
3k
Amazon Inspector コードセキュリティで手軽に実現するシフトレフト
maimyyym
0
130
JAWS-UG のイベントで使うハンズオンシナリオを Amazon Q Developer for CLI で作ってみた話
kazzpapa3
0
110
Claude Codeは仕様駆動の夢を見ない
gotalab555
23
7k
Eval-Centric AI: Agent 開発におけるベストプラクティスの探求
asei
0
140
Findy Freelance 利用シーン別AI活用例
ness
0
670
Featured
See All Featured
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
53
2.9k
Measuring & Analyzing Core Web Vitals
bluesmoon
8
550
Refactoring Trust on Your Teams (GOTO; Chicago 2020)
rmw
34
3.1k
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
Embracing the Ebb and Flow
colly
86
4.8k
Intergalactic Javascript Robots from Outer Space
tanoku
272
27k
The Cost Of JavaScript in 2023
addyosmani
53
8.8k
Designing for humans not robots
tammielis
253
25k
GraphQLの誤解/rethinking-graphql
sonatard
71
11k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
36
2.5k
Building an army of robots
kneath
306
45k
StorybookのUI Testing Handbookを読んだ
zakiyama
30
6k
Transcript
T E X T M I N I N G
EXPLORATORY DATA ANALYSIS TO MACHINE LEARNING
HELLO T I D Y T E X T Data
Scientist at Stack Overflow @juliasilge https://juliasilge.com/ I’m Julia Silge
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT
T I D Y T E X T TEXT DATA
IS INCREASINGLY IMPORTANT NLP TRAINING IS SCARCE ON THE GROUND
TIDY DATA PRINCIPLES + COUNT-BASED METHODS = T I D
Y T E X T
https://github.com/juliasilge/tidytext
https://github.com/juliasilge/tidytext
http://tidytextmining.com/
T I D Y T E X T EXPLORATORY DATA
ANALYSIS N-GRAMS AND MORE WORDS MACHINE LEARNING
EXPLORATORY DATA ANALYSIS T I D Y T E X
T
from the Washington Post’s Wonkblog
from the Washington Post’s Wonkblog
D3 visualization on Glitch
WHAT IS A DOCUMENT ABOUT? T I D Y T
E X T TERM FREQUENCY INVERSE DOCUMENT FREQUENCY
None
None
• As part of the NASA Datanauts program, I worked
on a project to understand NASA datasets • Metadata includes title, description, keywords, etc
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L N-GRAMS, NETWORKS, & NEGATION
None
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TOPIC MODELING
TOPIC MODELING T I D Y T E X T
•Each DOCUMENT = mixture of topics •Each TOPIC = mixture of words
None
None
None
None
T A K I N G T I D Y
T E X T T O T H E N E X T L E V E L TEXT CLASSIFICATION
TRAIN A GLMNET MODEL T I D Y T E
X T
TEXT CLASSIFICATION T I D Y T E X T
> library(glmnet) > library(doMC) > registerDoMC(cores = 8) > > is_jane <- books_joined$title == "Pride and Prejudice" > > model <- cv.glmnet(sparse_words, is_jane, family = "binomial", + parallel = TRUE, keep = TRUE)
None
None
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com JULIA SILGE
THANK YOU T I D Y T E X T
@juliasilge https://juliasilge.com Author portraits from Wikimedia Photos by Glen Noble and Kimberly Farmer on Unsplash JULIA SILGE