Web Scraping 101

Cyrus Stoller

November 17, 2015

Transcript

  1. Web Scraping @cyrusstoller November 17, 2015

  2. Repetitive tasks? No thank you.

  3. None
  4. None
  5. Ruby: gem install faraday nokogiri
     Python: pip install scrapy
     Javascript / node.js: npm install cheerio
     cURL / wget: curl -o http://example.com
                  wget -r --level=2 http://example.com/
  6. None
  7. None
  8. Defining the data we want

  9. You can look this up on your own

  10. You can look this up on your own

  11. What’s an HTTP request?

  12. Making an HTTP request

  13. Dealing with Authentication

  14. None
  15. None
  16. Concurrency

  17. Picking what you want

  18. None
  19. <code walkthrough>

  20. Turn it up

  21. Questions?

  22. twitter: @cyrusstoller
      github: @cyrusstoller
      blog: cyrusstoller.com
      possible spring workshop series on automation and web scraping
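
Slides 12, 16 and 17 name three steps (making an HTTP request, concurrency, picking what you want), but the transcript carries no code for them. The following is a minimal sketch of how those steps might fit together, not the code from the talk's walkthrough. It assumes Python's standard library only, rather than the scrapy package listed on slide 5, and the names `LinkExtractor`, `extract_links`, `fetch` and `scrape_all`, as well as the User-Agent string, are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Pick what we want out of a page: here, just the link targets."""
    parser = LinkExtractor()
    parser.feed(html)
    return parser.links


def fetch(url, timeout=10):
    """Make one HTTP GET request and return the response body as text."""
    # The User-Agent value is an illustrative assumption.
    request = Request(url, headers={"User-Agent": "scraping-101-sketch"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_all(urls, fetcher=fetch, max_workers=4):
    """Fetch several pages concurrently and map each URL to its links."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetcher, urls))
    return {url: extract_links(page) for url, page in zip(urls, pages)}
```

Calling `scrape_all(["http://example.com/"])` would issue a real request. The `fetcher` parameter exists so the parsing and concurrency logic can be tried against canned HTML first, and `max_workers` is kept small so a target site is not flooded with parallel requests.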