Developing a Crawler API with Scrapy and Klein - PyCon Colombia 2020

Today we will develop an API to search quotes by tag on the site http://quotes.toscrape.com/. Our API should receive a tag as a parameter, scrape the page, and return a JSON document containing a list of the quotes and authors that belong to that tag.

Betina Costa

February 09, 2020

Transcript

  1. DEVELOPING A CRAWLER API WITH SCRAPY AND KLEIN PYCON COLOMBIA 2020 BETINA COSTA
  2. Betina Costa SOFTWARE ENGINEER BRAZILIAN SPEAKER CRAZY CAT LADY POLE DANCER INTROVERT MOVIE GEEK
  3. Tutorial Goal: Today we will develop an API to search quotes by tag on the site http://quotes.toscrape.com/. Our API should receive a tag as a parameter, scrape the page, and return a JSON document containing a list of the quotes and authors that belong to that tag.
  4. Workshop Summary. Points to cover: Introduction and Setup; Scrapy Spiders and Selectors; Building the Spider (exercise); Handling Scrapy's async behaviour with Klein; Building the API (exercise); Wrapping up and Questions
  5. System Requirements: Python 3 and Pipenv. $ pip install pipenv
  6. What is Scrapy? Scrapy is a free, open-source web-crawling framework written in Python. It is currently maintained by Scrapinghub, a web-scraping development and services company.
  7. Why Scrapy? It's open source and free to use; it's easy to build and scale; it has a tool called Selector for data extraction; it handles calls asynchronously and quickly.
  9. SPIDERS AND SELECTORS Let's dive into some HTML and CSS... Please, don't run away
  10. Spiders and Selectors. SPIDERS: Spiders are classes that we define and that Scrapy uses to crawl information on websites. SELECTORS: Scrapy comes with its own mechanism for extracting data. They’re called selectors because they “select” certain parts of the HTML document specified either by XPath or CSS expressions.
  11. PYCON COLOMBIA

  12. PYCON COLOMBIA

  13. <div class="tags"> <a class="tag"> "change"

  14. LET'S GET TO WORK! http://bit.ly/workshop_py2020

  15. HANDLING ASYNC BEHAVIOUR With Klein \o/

  16. Why Klein? Klein is a micro-framework for developing production-ready web services with Python. It’s built on widely used and well-tested components like Werkzeug and Twisted, and has near-complete test coverage.
  17. Why Klein? Klein is a micro-framework for developing production-ready web services with Python. It’s built on widely used and well-tested components like Werkzeug and Twisted, and has near-complete test coverage. TWISTED: Twisted is an event-driven networking engine written in Python.
  18. Why Klein? Remember that Scrapy handles calls asynchronously? For that reason, it doesn't usually work well with frameworks that handle requests synchronously. But Klein can help with that!
  19. BACK TO WORK! http://bit.ly/workshop_py2020 WE WILL HAVE LUNCH SOON, STAY WITH ME
  20. THANK YOU!