Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Scrapy Overview
Search
JusBrasil
April 12, 2013
Programming
2
190
Scrapy Overview
An overview of the Scrapy framework by @cacovsky
JusBrasil
April 12, 2013
Tweet
Share
Other Decks in Programming
See All in Programming
Rediscover the Console - SymfonyCon Amsterdam 2025
chalasr
2
120
スタートアップを支える技術戦略と組織づくり
pospome
8
14k
Building AI with AI
inesmontani
PRO
1
460
【Streamlit x Snowflake】データ基盤からアプリ開発・AI活用まで、すべてをSnowflake内で実現
ayumu_yamaguchi
1
110
データファイルをAWSのDWHサービスに格納する / 20251115jawsug-tochigi
kasacchiful
2
100
AIエージェントを活かすPM術 AI駆動開発の現場から
gyuta
0
190
TypeScriptで設計する 堅牢さとUXを両立した非同期ワークフローの実現
moeka__c
6
2.8k
AI時代もSEOを頑張っている話
shirahama_x
0
210
dnx で実行できるコマンド、作ってみました
tomohisa
0
130
S3 VectorsとStrands Agentsを利用したAgentic RAGシステムの構築
tosuri13
4
250
Querying Design System デザインシステムの意思決定を支える構造検索
ikumatadokoro
1
1.2k
TUIライブラリつくってみた / i-just-make-TUI-library
kazto
1
290
Featured
See All Featured
jQuery: Nuts, Bolts and Bling
dougneiner
65
8k
Unsuck your backbone
ammeep
671
58k
Making the Leap to Tech Lead
cromwellryan
135
9.6k
Building Better People: How to give real-time feedback that sticks.
wjessup
370
20k
Being A Developer After 40
akosma
91
590k
Automating Front-end Workflow
addyosmani
1371
200k
[RailsConf 2023] Rails as a piece of cake
palkan
58
6.1k
Music & Morning Musume
bryan
46
7k
Site-Speed That Sticks
csswizardry
13
980
Writing Fast Ruby
sferik
630
62k
Performance Is Good for Brains [We Love Speed 2024]
tammyeverts
12
1.3k
Measuring & Analyzing Core Web Vitals
bluesmoon
9
690
Transcript
Scrapy an overview
/skræpi/
Web Crawler vs. Web Scraper
None
None
Scrapy Framework Scraping / Crawling / Monitoring / Testing
Stable Active Large community
~200 pages of docs
Commercial support
Framework?
None
None
None
Twisted event loop (reactor)
None
Your code goes here
The scraping logic
None
HttpErrorMiddleware UrlLengthMiddleware DepthMiddleware
HttpProxyMiddleware HttpCacheMiddleware RedirectMiddleware
Media download Persistence Post-processing
Data flow control
Queuing
Talk is cheap, show me the code.
$ pip install Scrapy $ scrapy startproject home_news
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project root
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project config
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project module
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your items
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your pipelines
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your settings
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your spiders...
None
//*[@id="glbcorpo"]/div/div[1]/div[1]/div[2]/div[1]/div[1]/div/div/a/@href
//*[@id="glbmateria"]/div[2]/h1/text()
//*[@id="materialetra"]/div/div/p[1]/text()
None
$ pwd /home/caco/studies/scrapy_news/home_news
$ pwd /home/caco/studies/scrapy_news/home_news (project root)
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json (feed exporters: json,csv,xml)
None
None
None
Other nice features • scrapyd: run as a service •
Webservice (issue commands via http requests) • Signals • Stats module • Contribs (CrawlSpider etc)
Obrigado! @cacovsky Thanks! @cacovsky
Images Spatula http://www.duebuoi.it/x/uk_usd/catalog/p/spatulas~805-16x10.html Spiderman http://tincan21.deviantart.com/art/muro-spidey-307810412