Web Scraping 101
Cyrus Stoller
November 17, 2015
Transcript
Web Scraping @cyrusstoller November 17, 2015
Repetitive tasks? No thank you.
Ruby: gem install faraday nokogiri
Python: pip install scrapy
JavaScript / Node.js: npm install cheerio
cURL / wget:
  curl -o example.html http://example.com
  wget -r --level=2 http://example.com/
Defining the data we want
You can look this up on your own
What’s an HTTP request?
Making an HTTP request
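As a minimal sketch of this step, the snippet below builds and sends a GET request using only Python's standard library. The slides do not show which client was used in the talk, so the helper names `build_request` and `fetch`, the User-Agent string, and the timeout are all illustrative assumptions.

```python
from urllib.request import Request, urlopen


def build_request(url, user_agent="example-scraper/0.1"):
    """Build a GET request with an explicit User-Agent header.

    The User-Agent value is a made-up placeholder, not one from the talk.
    """
    return Request(url, headers={"User-Agent": user_agent}, method="GET")


def fetch(url, timeout=10):
    """Send the request and return (status code, decoded body)."""
    with urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
        return response.status, body
```

Calling `fetch("http://example.com/")` performs the network request; `build_request` alone can be inspected offline to check the method and headers before anything is sent.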
Dealing with Authentication
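The slide does not say which authentication scheme the talk covered. As one possible illustration, the sketch below assumes HTTP Basic authentication and attaches the credentials as an `Authorization` header; the function names and the credentials in the usage note are hypothetical. Sites that use login forms and session cookies need a different approach.

```python
import base64
from urllib.request import Request


def basic_auth_header(username, password):
    """Return the value of an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"


def authenticated_request(url, username, password):
    """Build (but do not send) a request carrying Basic credentials."""
    headers = {"Authorization": basic_auth_header(username, password)}
    return Request(url, headers=headers)
```

For example, `authenticated_request("http://example.com/private", "user", "pass")` returns a request object that can be passed to `urllib.request.urlopen`. Basic credentials are only encoded, not encrypted, so this is normally used over HTTPS.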
Concurrency
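One common way to fetch several pages at once is a small thread pool, since scraping is mostly spent waiting on the network. The sketch below is an assumption about how this could look, not the approach from the talk: `fetch_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real HTTP call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=4):
    """Apply `fetch` to every URL using a thread pool.

    Returns a dict mapping each URL to its result. `pool.map` yields
    results in input order, so zipping with `urls` pairs them correctly.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))


def fake_fetch(url):
    """Offline stand-in for a real HTTP request (hypothetical)."""
    return f"<html>{url}</html>"
```

Keeping `max_workers` small limits the load placed on the target site; swapping `fake_fetch` for a real request function is the only change needed to hit the network.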
Picking what you want
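Once a page is downloaded, the next step is pulling out only the elements of interest. The deck lists nokogiri, scrapy, and cheerio for this; the sketch below instead uses Python's built-in `html.parser` so it runs without extra installs. The `LinkExtractor` class and `extract_links` helper are illustrative names, and the choice to collect links is just an example target.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return its links as (href, text) tuples."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

A dedicated library with CSS or XPath selectors is usually more convenient for real pages; this version only shows the idea of walking the markup and keeping what matches.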
<code walkthrough>
Turn it up
Questions?
Twitter: @cyrusstoller
GitHub: @cyrusstoller
Blog: cyrusstoller.com

Possible spring workshop series on automation and web scraping