Lock in $30 Savings on PRO—Offer Ends Soon! ⏳
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Scrapy Overview
Search
JusBrasil
April 12, 2013
Programming
2
190
Scrapy Overview
An overview of the Scrapy framework by @cacovsky
JusBrasil
April 12, 2013
Tweet
Share
Other Decks in Programming
See All in Programming
React Native New Architecture 移行実践報告
taminif
1
150
CSC509 Lecture 14
javiergs
PRO
0
220
俺流レスポンシブコーディング 2025
tak_dcxi
14
8.4k
チームをチームにするEM
hitode909
0
290
バックエンドエンジニアによる Amebaブログ K8s 基盤への CronJobの導入・運用経験
sunabig
0
140
手が足りない!兼業データエンジニアに必要だったアーキテクチャと立ち回り
zinkosuke
0
580
LLMで複雑な検索条件アセットから脱却する!! 生成的検索インタフェースの設計論
po3rin
2
640
Rediscover the Console - SymfonyCon Amsterdam 2025
chalasr
2
160
認証・認可の基本を学ぼう前編
kouyuume
0
190
TUIライブラリつくってみた / i-just-make-TUI-library
kazto
1
360
AIエージェントを活かすPM術 AI駆動開発の現場から
gyuta
0
350
Full-Cycle Reactivity in Angular: SignalStore mit Signal Forms und Resources
manfredsteyer
PRO
0
120
Featured
See All Featured
It's Worth the Effort
3n
187
29k
Site-Speed That Sticks
csswizardry
13
990
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
Documentation Writing (for coders)
carmenintech
76
5.2k
Building a Scalable Design System with Sketch
lauravandoore
463
34k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
31
9.8k
Speed Design
sergeychernyshev
33
1.4k
The Cost Of JavaScript in 2023
addyosmani
55
9.3k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
12
970
How STYLIGHT went responsive
nonsquared
100
5.9k
BBQ
matthewcrist
89
9.9k
Scaling GitHub
holman
464
140k
Transcript
Scrapy an overview
/skræpi/
Web Crawler vs. Web Scraper
None
None
Scrapy Framework Scraping / Crawling / Monitoring / Testing
Stable Active Large community
~200 pages of docs
Commercial support
Framework?
None
None
None
Twisted event loop (reactor)
None
Your code goes here
The scraping logic
None
HttpErrorMiddleware UrlLengthMiddleware DepthMiddleware
HttpProxyMiddleware HttpCacheMiddleware RedirectMiddleware
Media download Persistence Post-processing
Data flow control
Queuing
Talk is cheap, show me the code.
$ pip install Scrapy $ scrapy startproject home_news
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project root
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project config
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project module
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your items
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your pipelines
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your settings
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your spiders...
None
//*[@id="glbcorpo"]/div/div[1]/div[1]/div[2]/div[1]/div[1]/div/div/a/@href
//*[@id="glbmateria"]/div[2]/h1/text()
//*[@id="materialetra"]/div/div/p[1]/text()
None
$ pwd /home/caco/studies/scrapy_news/home_news
$ pwd /home/caco/studies/scrapy_news/home_news (project root)
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json (feed exporters: json,csv,xml)
None
None
None
Other nice features • scrapyd: run as a service •
Webservice (issue commands via http requests) • Signals • Stats module • Contribs (CrawlSpider etc)
Obrigado! @cacovsky Thanks! @cacovsky
Images Spatula http://www.duebuoi.it/x/uk_usd/catalog/p/spatulas~805-16x10.html Spiderman http://tincan21.deviantart.com/art/muro-spidey-307810412