Web scraping for data scientists
Irio Musskopf
May 24, 2016
Programming
Transcript
Web scraping for data scientists
Irio Musskopf
Data Science Retreat
Finding data: not always easy
1. Downloadable dataset
2. APIs
3. Scraping
4. Talk with other companies
5. Produce it yourself
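Options 1 and 2 tend to be the cheapest routes because the data already arrives structured. Below is a minimal sketch of both using only Python's standard library. It assumes the dataset is published as CSV and the API answers with JSON. The helper names and the example URL are hypothetical.

```python
import csv
import io
import json
from urllib.request import urlopen


def fetch_text(url, encoding="utf-8"):
    """Download a resource (dataset file or API response) and decode it as text."""
    with urlopen(url) as response:
        return response.read().decode(encoding)


def parse_csv(text):
    """Option 1: turn a downloadable CSV dataset into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def parse_json(text):
    """Option 2: decode a JSON API response body into Python objects."""
    return json.loads(text)
```

For example, `parse_csv(fetch_text("https://example.com/dataset.csv"))` would return one dict per row (the URL is a placeholder). Keeping the download separate from the parsing makes the parsing easy to test offline.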
No matter how complex the system is, scraping it is possible.
Unless there’s a captcha.
DEMO
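The demo itself is not part of the transcript. As a stand-in, here is a minimal sketch of the core scraping step using Python's built-in `html.parser`. It collects the text of every element with a given tag and `class`, which is roughly what a CSS selector such as `td.price` matches. `ClassSelector`, `select_text` and the sample markup are hypothetical. In practice, libraries such as lxml or Beautiful Soup offer richer CSS or XPath selectors.

```python
from html.parser import HTMLParser


class ClassSelector(HTMLParser):
    """Collect the text of every <tag class="..."> element, like the CSS selector tag.cls."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0       # > 0 while inside a matching element (handles nesting)
        self.buffer = []     # text pieces of the element currently being matched
        self.results = []    # one string per matching element

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested tags of the same name so we close on the right end tag.
            if tag == self.tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.buffer = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer).strip())

    def handle_data(self, data):
        if self.depth:
            self.buffer.append(data)


def select_text(html, tag, css_class):
    """Return the text of all elements matching tag.css_class in an HTML string."""
    parser = ClassSelector(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

For instance, `select_text(page_html, "td", "price")` would return the text of every price cell in the page as a list of strings.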
Selectors
Limitations
User agents
Proxies
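The slides only name these topics. Below is a minimal sketch of how user agents and proxies are commonly handled with `urllib`. It reads "Limitations" as the limits a site places on crawlers (robots.txt rules and request rate), and that reading is an assumption. `make_opener`, `is_allowed`, `fetch_all`, the user-agent string and the two-second delay are hypothetical choices, not the talk's.

```python
import time
from urllib.error import HTTPError
from urllib.parse import urlsplit
from urllib.request import ProxyHandler, build_opener
from urllib.robotparser import RobotFileParser


def make_opener(user_agent, proxy=None):
    """Build an opener that sends our User-Agent and, optionally, goes through a proxy."""
    handlers = []
    if proxy:
        # Send both HTTP and HTTPS requests through one proxy, e.g. "http://host:8080".
        handlers.append(ProxyHandler({"http": proxy, "https": proxy}))
    opener = build_opener(*handlers)
    # Replace urllib's default "Python-urllib/x.y" User-Agent header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt body lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_all(urls, user_agent, proxy=None, delay=2.0):
    """Fetch every URL robots.txt allows, sleeping `delay` seconds after each page (hypothetical default)."""
    opener = make_opener(user_agent, proxy)
    robots = {}  # robots.txt body, cached per site
    pages = {}
    for url in urls:
        parts = urlsplit(url)
        site = f"{parts.scheme}://{parts.netloc}"
        if site not in robots:
            try:
                with opener.open(site + "/robots.txt") as resp:
                    robots[site] = resp.read().decode("utf-8", "replace")
            except HTTPError as err:
                if err.code in (401, 403):
                    robots[site] = "User-agent: *\nDisallow: /"  # access denied: disallow all
                elif 400 <= err.code < 500:
                    robots[site] = ""  # e.g. 404, no robots.txt: nothing disallowed
                else:
                    raise
        if not is_allowed(robots[site], user_agent, url):
            continue  # skip pages the site asks crawlers not to fetch
        with opener.open(url) as resp:
            pages[url] = resp.read().decode("utf-8", "replace")
        time.sleep(delay)  # crude rate limit between page requests
    return pages
```

robots.txt is fetched through the same opener rather than with `RobotFileParser.read()`, so the custom User-Agent and proxy also apply to that request. The handling of 401/403 versus other 4xx responses follows what `RobotFileParser.read()` does.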
Irio Musskopf
[email protected]
Thanks