
COVID 19 API: Scraping ArcGIS for Fun, Knowledge, and Humanity

Odi
April 18, 2020

How I made COVID-19 API, presented at LiveCamp by WWWID: livecamp.wwwid.org

Transcript

  1. Agenda:

     1. Pre: Who, What, Why
     2. Main: How, Tools, Tools, TOOLS
     3. Post: Growth, Reach, Impact, Usage, Lessons learned
  2–4. Who (image slides)

  5. Background: I had just finished surgery in the hospital when tech Twitter started talking about the new coronavirus.
  6. If the COVID19 data was more accessible to everyone, more

    useful things can be made to combat it. Why
  7–8. Why: 1. Too many sources (all REPUTABLE) 2. Not formatted uniformly 3. CORS. [Screenshot placeholders: JHU CSSE, Worldometers]
  9. How: analyze the target. The dashboard is an auto-updating SPA, no login is required, and the data we need is already available in the page.
  10. How (image slide)

  11–13. How: who would win? 1. An extensive HTTP client library combined with a blazing fast DOM parser/manipulator, or 2. a smol (nay, STRONK) inspect element boi? (A code sketch follows the image slides below.)
  14–19. How (image slides)
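
The deck shows no code for this step, but the winner, per the slides that follow, is the inspect-element approach: find the XHR the dashboard fires at the ArcGIS REST API in devtools, then replay it with a plain HTTP call instead of parsing the DOM. A minimal TypeScript sketch; the service URL and row shape are placeholders, not the talk's actual endpoint:

    // Sketch: query an ArcGIS FeatureServer layer directly, the way the
    // dashboard's own XHR does. Host and layer path are placeholders.
    const BASE =
      "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/cases/FeatureServer/0/query";

    async function fetchCases(): Promise<unknown[]> {
      const params = new URLSearchParams({
        f: "json",              // JSON instead of the HTML explorer page
        where: "1=1",           // no filter: every row
        outFields: "*",         // every column
        returnGeometry: "false" // attributes only, skip the geo payload
      });
      const res = await fetch(`${BASE}?${params}`);
      const body = await res.json();
      // ArcGIS wraps each row as { attributes: {...} }
      return body.features.map((f: { attributes: unknown }) => f.attributes);
    }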

  20–21. Data wrangling. Things to consider: all XHR requests to the ArcGIS servers use the same format (ArcGIS has docs on this), and there are two main shapes: the returned data is an array (a collection), or the returned data is a single item (statistics).
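
In code, the two shapes might be handled like this; the wrapper object matches the ArcGIS query responses described above, while the attribute fields themselves are illustrative:

    // Both shapes arrive inside the same wrapper:
    //   collection:  { features: [{ attributes: {...} }, ...] }  many rows
    //   statistics:  { features: [{ attributes: {...} }] }       one row
    interface ArcGisResponse<T> {
      features: { attributes: T }[];
    }

    // A collection unwraps to an array of rows...
    function unwrapCollection<T>(res: ArcGisResponse<T>): T[] {
      return res.features.map((f) => f.attributes);
    }

    // ...while a statistics query unwraps to its single row.
    function unwrapStatistic<T>(res: ArcGisResponse<T>): T {
      return res.features[0].attributes;
    }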
  22–23. Values in [brackets] will be available in req.query. Handle overflowing data (ArcGIS caps a single query at 1000 results). Do heavy calculations server side, but put the result in a cache for a bit, I guess. Repeat as needed. (A sketch of all three points follows.)
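
A sketch of those points on ZEIT Now, assuming a route file like /api/[country].ts. The paging parameters (resultOffset, resultRecordCount) are real ArcGIS query options; the endpoint, TTL, and field names are made up:

    import { NowRequest, NowResponse } from "@now/node";

    const PAGE = 1000; // ArcGIS hard cap per query
    const BASE =
      "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/cases/FeatureServer/0/query"; // placeholder

    // Page past the 1000-row limit with resultOffset/resultRecordCount.
    async function fetchAll(base: string): Promise<any[]> {
      const rows: any[] = [];
      for (let offset = 0; ; offset += PAGE) {
        const params = new URLSearchParams({
          f: "json",
          where: "1=1",
          outFields: "*",
          returnGeometry: "false",
          resultOffset: String(offset),
          resultRecordCount: String(PAGE)
        });
        const body = await (await fetch(`${base}?${params}`)).json();
        rows.push(...body.features.map((f: any) => f.attributes));
        if (body.features.length < PAGE) return rows; // last page reached
      }
    }

    // Tiny in-memory TTL cache: the heavy work runs at most once per
    // minute per warm lambda instance.
    const cache = new Map<string, { at: number; data: any }>();
    async function cached(key: string, ttlMs: number, fn: () => Promise<any>) {
      const hit = cache.get(key);
      if (hit && Date.now() - hit.at < ttlMs) return hit.data;
      const data = await fn();
      cache.set(key, { at: Date.now(), data });
      return data;
    }

    // /api/[country].ts: the bracketed segment arrives as req.query.country.
    export default async (req: NowRequest, res: NowResponse) => {
      const country = req.query.country as string;
      const rows = await cached("all", 60_000, () => fetchAll(BASE));
      // Country_Region is an illustrative field name, not necessarily the real one.
      res.json(rows.filter((r: any) => r.Country_Region === country));
    };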
  24–28. Open Graph image gen: 1. Fetch the required data 2. Generate an HTML string (+CSS/JS) 3. Screenshot the generated HTML using Puppeteer 4. Return the image/png.
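
A minimal sketch of those four steps, assuming puppeteer is installed; the HTML template and stats shape are illustrative:

    import puppeteer from "puppeteer";

    // Steps 2-4: turn already-fetched stats (step 1) into an OG image.
    async function ogImage(stats: { confirmed: number; deaths: number }) {
      // 2. Generate an HTML string (inline CSS keeps it self-contained).
      const html = `<html><body style="font: 48px sans-serif">
        <h1>COVID-19</h1>
        <p>Confirmed: ${stats.confirmed}</p>
        <p>Deaths: ${stats.deaths}</p>
      </body></html>`;

      // 3. Screenshot the generated HTML with Puppeteer.
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.setViewport({ width: 1200, height: 630 }); // OG image size
      await page.setContent(html);
      const png = await page.screenshot({ type: "png" });
      await browser.close();

      // 4. The route handler returns this with Content-Type: image/png.
      return png;
    }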
  29. Others: a ton of new friends, some job offers, a sponsorship. Made a tool to help "scrape" in this way using Puppeteer; a ZEIT Now version upgrade broke it (max 10s per lambda).
  30. Lessons learned: I would use ZEIT again when there are no long-running processes (a good fit for lambdas), since it's very economical and highly scalable. I would set up a persistence layer from day 1, and git-style diffing from day 1 as well. Analytics are useful but they can be EXPENSIVE. Integration tests are ESSENTIAL, especially when you are scraping. Just build it.
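
On that last point, the kind of integration test that earns its keep when scraping is one that fails as soon as the upstream schema drifts. A sketch, assuming a Jest-style runner; the endpoint and field names are placeholders:

    import assert from "node:assert";

    test("ArcGIS layer still returns the fields we scrape", async () => {
      const res = await fetch(
        "https://services.arcgis.com/EXAMPLE/arcgis/rest/services/cases/FeatureServer/0/query" +
          "?f=json&where=1=1&outFields=*&resultRecordCount=1"
      );
      const body = await res.json();
      assert.ok(Array.isArray(body.features), "features array is gone");
      const row = body.features[0]?.attributes ?? {};
      // The fields the scraper depends on; an upstream rename = breakage.
      for (const field of ["Confirmed", "Deaths", "Recovered"]) {
        assert.ok(field in row, `upstream dropped field: ${field}`);
      }
    });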