Web Scraping with Node: Exposing crawler data by API.

Rafaell Lycan

July 23, 2015

Transcript

  1. Who am I
    Rafael Almeida (Lycan) - @RafaellLycan
    Front-End Engineer at Tripda.
    Red Bull, RPG, Star Wars, Lego.
  2. Schedule
    What is Web Scraping/Crawler?
    Why do we do that?
    How do we do that in NodeJS?
    Our Target
    Code (Meh.)
  3. What is Web Scraping
    Web scraping is a software technique for extracting information from websites.
  4. What is Web Scraping
    Web scraping is closely related to web indexing, which indexes information on the web using a bot or web crawler and is a universal technique adopted by most search engines.
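As a minimal sketch of the extraction step described in the two slides above: the code below pulls the text of the `<title>` tag out of an HTML string. The helper name `extractTitle` and the sample markup are hypothetical and not from the deck. A regular expression is used only to keep the sketch dependency-free; the later slides use cheerio for this, which is the more robust choice for real pages.

```javascript
// Hypothetical helper (not from the deck): return the text inside <title>,
// or null when the document has no title tag.
function extractTitle(html) {
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Sample markup standing in for a downloaded page (an assumption for the demo).
var sampleHtml = '<html><head><title> Star Wars Characters </title></head><body></body></html>';

console.log(extractTitle(sampleHtml));
```

In a crawler, the HTML string would come from an HTTP response body rather than a literal.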
  5. Why we do that
    To challenge ourselves
    To get more accessible data
    To improve our hacking skills
    To have fun
  6. Using Express
    var express = require('express');
    var app = express();
    // Answer GET / with a small JSON payload.
    app.get('/', function (req, res) {
      res.json({ message: 'Star Wars Scraping API' });
    });
    app.listen(3000);
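  7. Using Express
    //===========
    var routes = require('./routes');
    app.use('/', routes);
    // Error-handling middleware: Express recognizes it by its four arguments.
    app.use(function (err, req, res, next) {
      // An Error object serializes to {} in JSON, so send its message instead.
      res.status(400).json({ error: err.message || err });
    });
    //===========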
  8. Using Express
    var express = require('express');
    var router = express.Router();
    router.get('/', function (req, res, next) {
      res.json({ message: 'Star Wars Scraping API' });
    });
    module.exports = router;
  9. Using Request
    var request = require('request');
    request(url, function (error, response, body) {
      if (error) {
        throw error;
      }
      // Get body and do something nice! :)
    });
  10. Using Bluebird
    var Promise = require('bluebird');
    var request = require('request');
    var cheerio = require('cheerio');
    module.exports = {
      getTitle: function (url) {
        return new Promise(function (resolve, reject) {
          request(url, function (error, response, html) {
            if (error) {
              // Return so the code below does not run after a failure.
              return reject(error);
            }
            var $ = cheerio.load(html);
            var title = $('title').text();
            resolve(title);
          });
        });
      }
    };
  11. Using Bluebird
    //===========
    var request = Promise.promisify(require('request'));
    //===========
    getTitle: function (url) {
      return request(url).then(function (result) {
        var html = result[1]; // result is [response, body]
        var $ = cheerio.load(html);
        var title = $('title').text();
        return title;
      });
    }
    //===========
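The `result[1]` in the slide above relies on the promisified `request` resolving with an array of the callback's success values, `[response, body]`. A rough sketch of that idea follows, written with the built-in `Promise` so it runs without dependencies. It is not Bluebird's implementation: `promisifyMulti` and `fakeRequest` are hypothetical names, and `fakeRequest` is a stand-in that never touches the network. Newer Bluebird releases need the `multiArgs: true` option of `Promise.promisify` to get the array form.

```javascript
// Hypothetical stand-in for request(): calls back with (error, response, body)
// without touching the network, so the sketch runs anywhere.
function fakeRequest(url, callback) {
  if (!url) {
    callback(new Error('url is required'));
    return;
  }
  callback(null, { statusCode: 200 }, '<title>Star Wars</title>');
}

// Minimal promisify for Node-style callbacks that pass several success values:
// the returned promise resolves with an array of them, e.g. [response, body].
function promisifyMulti(fn) {
  return function () {
    var args = Array.prototype.slice.call(arguments);
    return new Promise(function (resolve, reject) {
      fn.apply(null, args.concat(function (error) {
        if (error) {
          return reject(error);
        }
        // Drop the leading error slot and keep the remaining values.
        resolve(Array.prototype.slice.call(arguments, 1));
      }));
    });
  };
}

var requestAsync = promisifyMulti(fakeRequest);

requestAsync('http://example.com/').then(function (result) {
  console.log(result[1]); // the body, as with result[1] in the slide
});
```

Resolving with an array keeps both the response object and the body available to the `.then()` handler, at the cost of positional access.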
  12. Using Google Analytics
    var ua = require('universal-analytics');
    var Analytics = {
      visitor: ua('UA-XXXX-XX'),
      // Record a pageview for the API root.
      track: function () {
        this.visitor.pageview('/').send();
      }
    };
    module.exports = Analytics;