Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Web scraping for data scientists
Search
Sponsored
·
Ship Features Fearlessly
Turn features on and off without deploys. Used by thousands of Ruby developers.
→
Irio Musskopf
May 24, 2016
Programming
0
81
Web scraping for data scientists
Irio Musskopf
May 24, 2016
Tweet
Share
More Decks by Irio Musskopf
See All by Irio Musskopf
Using Machine Learning and Open Data to Report 216 Brazilian Congresspeople for Corruption
irio
0
360
Por que functional programming é mais rápido?
irio
0
69
No país das maravilhas
irio
0
51
Desenvolvendo o mínimo com Ruby on Rails
irio
0
140
Implementando pagamentos usando Moip
irio
0
92
vim 101
irio
1
220
Other Decks in Programming
See All in Programming
Data-Centric Kaggle
isax1015
2
780
プロダクトオーナーから見たSOC2 _SOC2ゆるミートアップ#2
kekekenta
0
220
CSC307 Lecture 08
javiergs
PRO
0
670
MDN Web Docs に日本語翻訳でコントリビュート
ohmori_yusuke
0
650
開発者から情シスまで - 多様なユーザー層に届けるAPI提供戦略 / Postman API Night Okinawa 2026 Winter
tasshi
0
210
HTTPプロトコル正しく理解していますか? 〜かわいい猫と共に学ぼう。ฅ^•ω•^ฅ ニャ〜
hekuchan
2
690
インターン生でもAuth0で認証基盤刷新が出来るのか
taku271
0
190
CSC307 Lecture 03
javiergs
PRO
1
490
QAフローを最適化し、品質水準を満たしながらリリースまでの期間を最短化する #RSGT2026
shibayu36
2
4.4k
Unicodeどうしてる? PHPから見たUnicode対応と他言語での対応についてのお伺い
youkidearitai
PRO
1
2.6k
副作用をどこに置くか問題:オブジェクト指向で整理する設計判断ツリー
koxya
1
610
MUSUBIXとは
nahisaho
0
140
Featured
See All Featured
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
128
55k
Bootstrapping a Software Product
garrettdimon
PRO
307
120k
The Anti-SEO Checklist Checklist. Pubcon Cyber Week
ryanjones
0
70
The MySQL Ecosystem @ GitHub 2015
samlambert
251
13k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
359
30k
Bridging the Design Gap: How Collaborative Modelling removes blockers to flow between stakeholders and teams @FastFlow conf
baasie
0
450
How to Think Like a Performance Engineer
csswizardry
28
2.5k
Leveraging LLMs for student feedback in introductory data science courses - posit::conf(2025)
minecr
0
160
The #1 spot is gone: here's how to win anyway
tamaranovitovic
2
950
What's in a price? How to price your products and services
michaelherold
247
13k
Jamie Indigo - Trashchat’s Guide to Black Boxes: Technical SEO Tactics for LLMs
techseoconnect
PRO
0
64
How Software Deployment tools have changed in the past 20 years
geshan
0
32k
Transcript
Web scraping Irio Musskopf Data Science Retreat for data scientists
Finding data Not always easy
1. Downloadable dataset
2.APIs
3. Scraping
4.Talk with other companies
4.Produce yourself
Doesn’t matter how complex the system is. It is possible.
Doesn’t matter how complex the system is. It is possible.
Unless there’s a captcha.
None
DEMO
Selectors Limitations User agents Proxies
Irio Musskopf
[email protected]
Thanks