Web scraping for data scientists
Irio Musskopf
May 24, 2016
Transcript
Web scraping for data scientists
Irio Musskopf, Data Science Retreat
Finding data: not always easy
1. Downloadable dataset
2. APIs
3. Scraping
4. Talk with other companies
5. Produce it yourself
Doesn’t matter how complex the system is. It is possible.
Unless there’s a captcha.
DEMO
Selectors, limitations, user agents, proxies
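The transcript does not include the demo code, so the following is only a minimal sketch of three of the topics named on this slide: selecting elements, setting a user agent, and routing through a proxy. It uses the Python standard library only. The names `LinkExtractor`, `extract_links`, and `make_opener`, the CSS class, the user agent string, and the proxy address are all hypothetical, chosen for illustration, and are not taken from the talk.

```python
import urllib.request
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags that carry a given CSS class.

    This is a hand-rolled stand-in for a selector such as `a.deck`;
    a dedicated selector library would usually be more convenient.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute may list several space-separated classes.
        classes = (attributes.get("class") or "").split()
        if self.css_class in classes and attributes.get("href"):
            self.links.append(attributes["href"])


def extract_links(html, css_class):
    """Return the hrefs of all <a class="..."> elements matching css_class."""
    parser = LinkExtractor(css_class)
    parser.feed(html)
    return parser.links


def make_opener(user_agent, proxy_url=None):
    """Build a urllib opener with a custom User-Agent and an optional proxy."""
    handlers = []
    if proxy_url:
        # Route both HTTP and HTTPS traffic through the same (assumed) proxy.
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
        )
    opener = urllib.request.build_opener(*handlers)
    # Replace the default "Python-urllib/x.y" header.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener
```

A page would then be fetched with `make_opener("example-bot/0.1").open(url).read().decode()` and the result passed to `extract_links(html, "deck")`. Keeping fetching and parsing separate lets the selector logic be checked against saved HTML without touching the network.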
Irio Musskopf
[email protected]
Thanks