Developing a Crawler API with Scrapy and Klein - PyCon Colombia 2020

By Betina Costa

Betina Costa
Software engineer, Brazilian, speaker, crazy cat lady, pole dancer, introvert, movie geek.

Tutorial Goal
Today we will develop an API to search quotes by tag on the site http://quotes.toscrape.com/. Our API should receive a tag as a parameter, scrape the page, and return JSON containing a list of the quotes, with their authors, that belong to that tag.
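As a rough illustration of the goal, the response could be shaped like the sketch below. The function name `build_response` and the keys `tag`, `quotes`, `text`, and `author` are assumptions made for this example; the slides do not fix an exact schema.

```python
import json


def build_response(tag, quotes):
    """Serialize scraped quotes for one tag into a JSON string.

    `quotes` is assumed to be a list of dicts with at least
    "text" and "author" keys (a hypothetical item shape).
    """
    payload = {
        "tag": tag,
        # Keep only the fields the API promises: the quote and its author.
        "quotes": [{"text": q["text"], "author": q["author"]} for q in quotes],
    }
    # ensure_ascii=False keeps typographic quotes and accents readable.
    return json.dumps(payload, ensure_ascii=False)
```

Calling `build_response("change", items)` with the items produced by a spider would give the body that the API returns for the tag `change`.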

Workshop Summary: Points to Cover
Introduction and Setup
Scrapy Spiders and Selectors
Building the Spider (exercise)
Handle Scrapy async behaviour with Klein
Building the API (exercise)
Wrapping up and Questions

System Requirements
Python 3
Pipenv
$ pip install pipenv

What is Scrapy?
Scrapy is a free, open-source web-crawling framework written in Python. It is currently maintained by Scrapinghub, a web-scraping development and services company.

Why Scrapy?
It's open source and free to use;
It's easy to build and scale;
It has a tool called Selector for data extraction;
It handles calls asynchronously and quickly.

SPIDERS AND SELECTORS
Let's dive into some HTML and CSS... Please, don't run away

Spiders and Selectors
Spiders: Spiders are classes that we define and that Scrapy uses to crawl information on websites.
Selectors: Scrapy comes with its own mechanism for extracting data. They’re called selectors because they “select” certain parts of the HTML document specified either by XPath or CSS expressions.


<div class="tags"> <a class="tag">change</a> </div>

LET'S GET TO WORK!
http://bit.ly/workshop_py2020

HANDLING ASYNC BEHAVIOUR
With Klein \o/

Why Klein?
Klein is a micro-framework for developing production-ready web services with Python. It’s built on widely used and well tested components like Werkzeug and Twisted, and has near-complete test coverage.

Twisted is an event-driven networking engine written in Python.

Why Klein?
Remember that Scrapy handles calls asynchronously? For that reason, it doesn't usually work well with frameworks that handle requests synchronously. But Klein can help with that!

BACK TO WORK!
http://bit.ly/workshop_py2020
We will have lunch soon, stay with me

THANK YOU!
