Web scraping for data scientists
Irio Musskopf
May 24, 2016
Transcript
Web scraping for data scientists
Irio Musskopf
Data Science Retreat
Finding data
Not always easy
1. Downloadable dataset
2. APIs
3. Scraping
4. Talk with other companies
5. Produce it yourself
Doesn’t matter how complex the system is. It is possible.
Unless there’s a captcha.
DEMO
Selectors
Limitations
User agents
Proxies
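The slides only name these topics, so the following is a minimal sketch rather than the code shown in the demo. It assumes Python's standard library only: a simple class-based selector built on `html.parser`, and a request that carries a custom user agent and an optional proxy via `urllib.request`. The names `ClassTextExtractor`, `extract_by_class`, `build_opener_and_request`, the sample markup, the user-agent string, and the proxy address are all hypothetical, chosen for illustration.

```python
from html.parser import HTMLParser
import urllib.request


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element that carries a given CSS class.

    Limitation: assumes well-formed markup with no void elements
    (such as <br>) inside a matching element.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0      # greater than 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1             # nested tag inside a match
        elif self.css_class in classes:
            self.depth = 1              # entering a new match
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def extract_by_class(html, css_class):
    """Return the text content of each element with the given class."""
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.results


def build_opener_and_request(url, user_agent, proxy=None):
    """Prepare (but do not send) a request with a custom user agent.

    `proxy` is an optional proxy URL applied to both http and https.
    """
    handlers = []
    if proxy:
        handlers.append(
            urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        )
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    return opener, request


# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = (
    '<ul>'
    '<li class="deck title">vim 101</li>'
    '<li class="deck">Other</li>'
    '<li class="title"><b>Web</b> scraping</li>'
    '</ul>'
)

titles = extract_by_class(SAMPLE_HTML, "title")

# Nothing is sent over the network here; opener.open(request) would do that.
opener, request = build_opener_and_request(
    "http://example.com/",
    user_agent="my-scraper/0.1",
    proxy="http://127.0.0.1:8080",
)
```

In practice a dedicated parsing library with full CSS selector support is usually more convenient than a hand-written parser; the standard-library version is used here only to keep the sketch self-contained.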
Irio Musskopf
[email protected]
Thanks