Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Scrapy Overview
Search
JusBrasil
April 12, 2013
Programming
2
190
Scrapy Overview
An overview of the Scrapy framework by @cacovsky
JusBrasil
April 12, 2013
Tweet
Share
Other Decks in Programming
See All in Programming
AI OCR API on Lambdaを Datadogで可視化してみた
nealle
0
210
個人軟體時代
ethanhuang13
0
110
MLH State of the League: 2026 Season
theycallmeswift
0
160
tool ディレクティブを導入してみた感想
sgash708
1
150
CSC305 Summer Lecture 12
javiergs
PRO
0
130
testingを眺める
matumoto
1
120
フロントエンドのmonorepo化と責務分離のリアーキテクト
kajitack
2
140
コーディングは技術者(エンジニア)の嗜みでして / Learning the System Development Mindset from Rock Lady
mackey0225
2
590
私の後悔をAWS DMSで解決した話
hiramax
4
160
オープンセミナー2025@広島LT技術ブログを続けるには
satoshi256kbyte
0
140
[FEConf 2025] 모노레포 절망편, 14개 레포로 부활하기까지 걸린 1년
mmmaxkim
0
1.2k
Trem on Rails - Prompt Engineering com Ruby
elainenaomi
1
100
Featured
See All Featured
Rails Girls Zürich Keynote
gr2m
95
14k
Thoughts on Productivity
jonyablonski
69
4.8k
Build The Right Thing And Hit Your Dates
maggiecrowley
37
2.8k
Being A Developer After 40
akosma
90
590k
Producing Creativity
orderedlist
PRO
347
40k
Large-scale JavaScript Application Architecture
addyosmani
512
110k
Mobile First: as difficult as doing things right
swwweet
223
9.9k
Creating an realtime collaboration tool: Agile Flush - .NET Oxford
marcduiker
31
2.2k
The Psychology of Web Performance [Beyond Tellerrand 2023]
tammyeverts
49
3k
Gamification - CAS2011
davidbonilla
81
5.4k
Fight the Zombie Pattern Library - RWD Summit 2016
marcelosomers
234
17k
4 Signs Your Business is Dying
shpigford
184
22k
Transcript
Scrapy an overview
/skræpi/
Web Crawler vs. Web Scraper
None
None
Scrapy Framework Scraping / Crawling / Monitoring / Testing
Stable Active Large community
~200 pages of docs
Commercial support
Framework?
None
None
None
Twisted event loop (reactor)
None
Your code goes here
The scraping logic
None
HttpErrorMiddleware UrlLengthMiddleware DepthMiddleware
HttpProxyMiddleware HttpCacheMiddleware RedirectMiddleware
Media download Persistence Post-processing
Data flow control
Queuing
Talk is cheap, show me the code.
$ pip install Scrapy $ scrapy startproject home_news
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project root
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project config
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Project module
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your items
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your pipelines
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your settings
home_news/ scrapy.cfg home_news/ __init__.py items.py pipelines.py settings.py spiders/ __init__.py ...
Your spiders...
None
//*[@id="glbcorpo"]/div/div[1]/div[1]/div[2]/div[1]/div[1]/div/div/a/@href
//*[@id="glbmateria"]/div[2]/h1/text()
//*[@id="materialetra"]/div/div/p[1]/text()
None
$ pwd /home/caco/studies/scrapy_news/home_news
$ pwd /home/caco/studies/scrapy_news/home_news (project root)
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json
$ pwd /home/caco/studies/scrapy_news/home_news $ scrapy crawl g1 -o scraped_data.json -t
json (feed exporters: json,csv,xml)
None
None
None
Other nice features • scrapyd: run as a service •
Webservice (issue commands via http requests) • Signals • Stats module • Contribs (CrawlSpider etc)
Obrigado! @cacovsky Thanks! @cacovsky
Images Spatula http://www.duebuoi.it/x/uk_usd/catalog/p/spatulas~805-16x10.html Spiderman http://tincan21.deviantart.com/art/muro-spidey-307810412