Developing a Crawler API with Scrapy and Klein - PyCon Colombia 2020
Tutorial Goal: Today we will develop an API to search for quotes by tag on the site http://quotes.toscrape.com/. Our API should receive a tag as a parameter, scrape the page, and return JSON containing a list of the quotes and authors that belong to that tag.
Workshop Summary: Points to Cover
- Introduction and Setup
- Scrapy Spiders and Selectors
- Building the Spider (exercise)
- Handling Scrapy's async behaviour with Klein
- Building the API (exercise)
- Wrapping up and Questions
What is Scrapy? Scrapy is a free, open-source web-crawling framework written in Python. It is currently maintained by Scrapinghub, a web-scraping development and services company.
Why Scrapy?
- It's open source and free to use.
- It's easy to build and scale.
- It has a tool called Selector for data extraction.
- It handles calls asynchronously and quickly.
Spiders and Selectors
SPIDERS: Spiders are classes that we define and that Scrapy uses to crawl information on websites.
SELECTORS: Scrapy comes with its own mechanism for extracting data. They’re called selectors because they “select” certain parts of the HTML document, specified either by XPath or CSS expressions.
Why Klein? Klein is a micro-framework for developing production-ready web services with Python. It’s built on widely used and well-tested components like Werkzeug and Twisted, and has near-complete test coverage.
TWISTED: Twisted is an event-driven networking engine written in Python.
Why Klein? Remember that Scrapy handles calls asynchronously? For that reason, it usually doesn't work well with frameworks that handle requests synchronously. But Klein can help with that!