Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Scrapy Overview
Search
JusBrasil
April 12, 2013
Programming
2
190
Scrapy Overview
An overview of the Scrapy framework by @cacovsky
JusBrasil
April 12, 2013
Tweet
Share
Other Decks in Programming
See All in Programming
Rust 製のコードエディタ “Zed” を使ってみた
nearme_tech
PRO
0
110
SourceGeneratorのススメ
htkym
0
180
責任感のあるCloudWatchアラームを設計しよう
akihisaikeda
3
140
ZJIT: The Ruby 4 JIT Compiler / Ruby Release 30th Anniversary Party
k0kubun
1
390
AI Schema Enrichment for your Oracle AI Database
thatjeffsmith
0
210
AIエージェントの設計で注意するべきポイント6選
har1101
7
3.4k
KIKI_MBSD Cybersecurity Challenges 2025
ikema
0
1.3k
【卒業研究】会話ログ分析によるユーザーごとの関心に応じた話題提案手法
momok47
0
190
Implementation Patterns
denyspoltorak
0
270
生成AIを使ったコードレビューで定性的に品質カバー
chiilog
0
210
Patterns of Patterns
denyspoltorak
0
1.3k
15年続くIoTサービスのSREエンジニアが挑む分散トレーシング導入
melonps
0
140
Featured
See All Featured
Rails Girls Zürich Keynote
gr2m
96
14k
Highjacked: Video Game Concept Design
rkendrick25
PRO
1
280
Build your cross-platform service in a week with App Engine
jlugia
234
18k
The State of eCommerce SEO: How to Win in Today's Products SERPs - #SEOweek
aleyda
2
9.5k
How to Think Like a Performance Engineer
csswizardry
28
2.4k
Context Engineering - Making Every Token Count
addyosmani
9
640
Exploring the relationship between traditional SERPs and Gen AI search
raygrieselhuber
PRO
2
3.6k
The Illustrated Children's Guide to Kubernetes
chrisshort
51
51k
Digital Ethics as a Driver of Design Innovation
axbom
PRO
1
170
Designing for Performance
lara
610
70k
Designing Experiences People Love
moore
144
24k
Art, The Web, and Tiny UX
lynnandtonic
304
21k
Transcript
Scrapy an overview
/skræpi/
Web Crawler vs. Web Scraper
None
None
Scrapy Framework Scraping / Crawling / Monitoring / Testing
Stable Active Large community
~200 pages of docs
Commercial support
Framework?
None
None
None
Twisted event loop (reactor)
None
Your code goes here
The scraping logic
None
HttpErrorMiddleware UrlLengthMiddleware DepthMiddleware
HttpProxyMiddleware HttpCacheMiddleware RedirectMiddleware
Media download Persistence Post-processing
Data flow control
Queuing
Talk is cheap, show me the code.
$ pip install Scrapy $ scrapy startproject home_news
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project root
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project config
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project module
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your items
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your pipelines
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your settings
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your spiders...
None
//*[@id="glbcorpo"]/div/div[1]/div[1]/div[2]/div[1]/div[1]/div/div/a/@href
//*[@id="glbmateria"]/div[2]/h1/text()
//*[@id="materialetra"]/div/div/p[1]/text()
None
$ pwd /home/caco/studies/scrapy_news/home_news
$ pwd /home/caco/studies/scrapy_news/home_news (project root)
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json (feed exporters: json,csv,xml)
None
None
None
Other nice features • scrapyd: run as a service •
Webservice (issue commands via http requests) • Signals • Stats module • Contribs (CrawlSpider etc)
Obrigado! @cacovsky Thanks! @cacovsky
Images Spatula http://www.duebuoi.it/x/uk_usd/catalog/p/spatulas~805-16x10.html Spiderman http://tincan21.deviantart.com/art/muro-spidey-307810412