
COVID 19 API: Scraping ArcGIS for Fun, Knowledge, and Humanity

Odi
April 18, 2020

How I made COVID-19 API, presented at LiveCamp by WWWID: livecamp.wwwid.org

Transcript

  1. Agenda:

     1. Pre: Who, What, Why
     2. Main: How, Tools, Tools, TOOLS
     3. Post: Growth, Reach, Impact, Usage, Lessons learned
  2–4. Who (image slides)

  5. Background: I had just finished surgery in the hospital when tech Twitter started talking about the new coronavirus.
  6. If the COVID19 data was more accessible to everyone, more

    useful things can be made to combat it. Why
  7–8. Why: 1. Too many sources (all REPUTABLE) 2. Not formatted uniformly 3. CORS. [Screenshot placeholders: JHU CSSE, Worldometers]
  9. How: analyze the target. The dashboard is an auto-updating SPA, no login is required, and the data we need is already available in the page.
  10. How (image slide)

  11–13. How: who would win? 1. An extensive HTTP client library combined with a blazing fast DOM parser/manipulator, or 2. a smol (nay, STRONK) inspect element boi? (A code sketch follows the image slides below.)
  14–19. How (image slides)
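
The deck shows no code for this step, but the winner, per the slides that follow, is the inspect-element approach: find the XHR the dashboard fires at the ArcGIS REST API in devtools, then replay it with a plain HTTP call instead of parsing the DOM. A minimal TypeScript sketch; the service URL and row shape are placeholders, not the talk's actual endpoint:

    // Sketch: query an ArcGIS FeatureServer layer directly, the way the
    // dashboard's own XHR does. Host and layer path are placeholders.
    const BASE =
      "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/cases/FeatureServer/0/query";

    async function fetchCases(): Promise<unknown[]> {
      const params = new URLSearchParams({
        f: "json",              // JSON instead of the HTML explorer page
        where: "1=1",           // no filter: every row
        outFields: "*",         // every column
        returnGeometry: "false" // attributes only, skip the geo payload
      });
      const res = await fetch(`${BASE}?${params}`);
      const body = await res.json();
      // ArcGIS wraps each row as { attributes: {...} }
      return body.features.map((f: { attributes: unknown }) => f.attributes);
    }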

  20–21. Data wrangling. Things to consider: all XHR requests to the ArcGIS servers use the same format (ArcGIS has docs on this), and there are two main shapes: the returned data is an array (a collection), or the returned data is a single item (statistics).
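
In code, the two shapes might be handled like this; the wrapper object matches the ArcGIS query responses described above, while the attribute fields themselves are illustrative:

    // Both shapes arrive inside the same wrapper:
    //   collection:  { features: [{ attributes: {...} }, ...] }  many rows
    //   statistics:  { features: [{ attributes: {...} }] }       one row
    interface ArcGisResponse<T> {
      features: { attributes: T }[];
    }

    // A collection unwraps to an array of rows...
    function unwrapCollection<T>(res: ArcGisResponse<T>): T[] {
      return res.features.map((f) => f.attributes);
    }

    // ...while a statistics query unwraps to its single row.
    function unwrapStatistic<T>(res: ArcGisResponse<T>): T {
      return res.features[0].attributes;
    }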
  22–23. Values in [brackets] will be available in req.query. Handle overflowing data (ArcGIS caps a single query at 1000 results). Do heavy calculations server side, but put the result in a cache for a bit, I guess. Repeat as needed. (A sketch of all three points follows.)
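
A sketch of those points on ZEIT Now, assuming a route file like /api/[country].ts. The paging parameters (resultOffset, resultRecordCount) are real ArcGIS query options; the endpoint, TTL, and field names are made up:

    import { NowRequest, NowResponse } from "@now/node";

    const PAGE = 1000; // ArcGIS hard cap per query
    const BASE =
      "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/cases/FeatureServer/0/query"; // placeholder

    // Page past the 1000-row limit with resultOffset/resultRecordCount.
    async function fetchAll(base: string): Promise<any[]> {
      const rows: any[] = [];
      for (let offset = 0; ; offset += PAGE) {
        const params = new URLSearchParams({
          f: "json",
          where: "1=1",
          outFields: "*",
          returnGeometry: "false",
          resultOffset: String(offset),
          resultRecordCount: String(PAGE)
        });
        const body = await (await fetch(`${base}?${params}`)).json();
        rows.push(...body.features.map((f: any) => f.attributes));
        if (body.features.length < PAGE) return rows; // last page reached
      }
    }

    // Tiny in-memory TTL cache: the heavy work runs at most once per
    // minute per warm lambda instance.
    const cache = new Map<string, { at: number; data: any }>();
    async function cached(key: string, ttlMs: number, fn: () => Promise<any>) {
      const hit = cache.get(key);
      if (hit && Date.now() - hit.at < ttlMs) return hit.data;
      const data = await fn();
      cache.set(key, { at: Date.now(), data });
      return data;
    }

    // /api/[country].ts: the bracketed segment arrives as req.query.country.
    export default async (req: NowRequest, res: NowResponse) => {
      const country = req.query.country as string;
      const rows = await cached("all", 60_000, () => fetchAll(BASE));
      // Country_Region is an illustrative field name, not necessarily the real one.
      res.json(rows.filter((r: any) => r.Country_Region === country));
    };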
  24–28. Open Graph image gen: 1. Fetch the required data 2. Generate an HTML string (+CSS/JS) 3. Screenshot the generated HTML using Puppeteer 4. Return the image/png.
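
A minimal sketch of those four steps, assuming puppeteer is installed; the HTML template and stats shape are illustrative:

    import puppeteer from "puppeteer";

    // Steps 2-4: turn already-fetched stats (step 1) into an OG image.
    async function ogImage(stats: { confirmed: number; deaths: number }) {
      // 2. Generate an HTML string (inline CSS keeps it self-contained).
      const html = `<html><body style="font: 48px sans-serif">
        <h1>COVID-19</h1>
        <p>Confirmed: ${stats.confirmed}</p>
        <p>Deaths: ${stats.deaths}</p>
      </body></html>`;

      // 3. Screenshot the generated HTML with Puppeteer.
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.setViewport({ width: 1200, height: 630 }); // OG image size
      await page.setContent(html);
      const png = await page.screenshot({ type: "png" });
      await browser.close();

      // 4. The route handler returns this with Content-Type: image/png.
      return png;
    }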
  29. Others: a ton of new friends, some job offers, a sponsorship. Made a tool to help "scrape" in this way using Puppeteer; a ZEIT Now version upgrade broke it (max 10s per lambda).
  30. Lessons learned: I would use ZEIT again when there are no long-running processes (a good fit for lambdas), since it's very economical and highly scalable. I would set up a persistence layer from day 1, and git-style diffing from day 1 as well. Analytics are useful but they can be EXPENSIVE. Integration tests are ESSENTIAL, especially when you are scraping. Just build it.
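
On that last point, the kind of integration test that earns its keep when scraping is one that fails as soon as the upstream schema drifts. A sketch, assuming a Jest-style runner; the endpoint and field names are placeholders:

    import assert from "node:assert";

    test("ArcGIS layer still returns the fields we scrape", async () => {
      const res = await fetch(
        "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/cases/FeatureServer/0/query" +
          "?f=json&where=1=1&outFields=*&resultRecordCount=1"
      );
      const body = await res.json();
      assert.ok(Array.isArray(body.features), "features array is gone");
      const row = body.features[0]?.attributes ?? {};
      // The fields the scraper depends on; an upstream rename = breakage.
      for (const field of ["Confirmed", "Deaths", "Recovered"]) {
        assert.ok(field in row, `upstream dropped field: ${field}`);
      }
    });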