
2021-06-24_UNVT exercise day03


Material for a casual vector tile exercise (Day 03): an exercise in vector tile conversion using Node.js.

UBUKAWA Taro

June 24, 2021
Transcript

  1. UNVT Exercise Day 3, 24 June 2021, by Taro Ubukawa (Senior Geospatial Expert at UN Geospatial Information Section). Some slides have been removed from the original because they covered internal content.
  2. Today's contents: data conversion (advanced)
    • Part 1: Data structure
    • Part 2: Data conversion from a PostGIS database
    • Part 3: Updating method
    [Workflow diagram: the UNVT workflow for developing a base-map vector tile (Import, Produce, Optimize, Style, Host), connecting the source DB to the web app and its users, supported by UNVT open-source scripts. Our focus today: data conversion.]
  3. Design your vector tile structure
    • From your source data, you first need to decide the structure of your vector tile (layer names, zoom levels, etc.).
    • It helps to adjust the vector tile structure so that you get a better map (and so that you can easily make style layers).
    [Diagram: your source in PostGIS (views/tables) is converted into vector tile (VT) layers; a web map with a certain style then accesses the vector tile through style layers.]
    The structures of the source and the VT are not necessarily the same. For example:
    • You can remove some layers/features at certain zoom levels.
    • You can drop unnecessary attributes in the VT.
  4. VT design tips (1)
    You can merge two sources into a single vector tile layer. (Technically, vice versa: you can also make more than one VT layer from a single source.)
    • For example, for the vector tile layer "landmass", we used two tables from PostGIS. The source changes depending on the vector tile scale.

    Example: landmass (two source tables go into a single VT layer)
    name of view/table in source | Category/Group | VT layer name | min zoom | max zoom
    custom_planet_land_a         | Base           | landmass      | 8        | 15
    custom_planet_land_a_l08     | Base           | landmass      | 0        | 7
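    For illustration, a configuration like the following could express this mapping. The key names (src, url, layer, minzoom, maxzoom) are assumptions for this sketch, not necessarily the schema of the exercise's config/default.json:

      {
        "src": [
          { "url": "custom_planet_land_a_l08", "layer": "landmass", "minzoom": 0, "maxzoom": 7 },
          { "url": "custom_planet_land_a",     "layer": "landmass", "minzoom": 8, "maxzoom": 15 }
        ]
      }

    Because each source entry carries the same layer name with its own zoom range, both sources end up in one "landmass" layer of the tile set.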
  5. VT design tips (2)
    Another example: the "bndl" layer.
    • We use bndl25 and bndl05 for small scales.
    • For larger scales, we use more than one source. Their attributes were adjusted during conversion (you can apply a kind of filter where necessary).

    name of view/table in source | Necessary attributes for conversion (in addition to geom) | Category/Group | VT layer name | min zoom | max zoom
    unhq_bndl25         | bdytyp_code, bdytype, iso3cd      | Admin | bndl         | 0 | 2
    unhq_bndl05         | bdytyp_code, bdytype, iso3cd      | Admin | bndl         | 3 | 4
    unhq_bndl           | bdytyp_code, bdytype, iso3cd      | Admin | bndl (hq)    | 5 | 15
    custom_unmap_0_bndl | type_code, type                   | Admin | bndl (c_bnd) | 6 | 15
    un_unvmc_igac_bndl  | (all)                             | Admin | bndl (vmc)   | 6 | 15
    un_unmik_bndl       | boundary_type_code, boundary_type | Admin | bndl (mik)   | 7 | 15
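    As a sketch of the kind of attribute filter mentioned above (the function name and the column list are illustrative, not the exercise's actual code):

      // keep only the attributes the "bndl" VT layer needs, drop the rest
      const keep = ['bdytyp_code', 'bdytype', 'iso3cd']
      const filterProperties = feature => {
        const props = {}
        for (const key of keep) {
          if (feature.properties[key] !== undefined) props[key] = feature.properties[key]
        }
        feature.properties = props
        return feature
      }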
  6. Prepare data and start Docker
    • Copy the data from the MS Teams channel, or download it at https://github.com/unvt/ezPrac (ex3-data).
    • Start a command prompt (I used Windows PowerShell) and move to the working directory.
    • Then start Docker (unvt/nanban):
      > docker pull unvt/nanban
      > docker run -it --rm -v ${PWD}:/data unvt/nanban
    Please check your location and move to the /data directory (cd /data). I do not explain the preparation of the Docker environment here; please see https://docs.docker.com/docker-for-windows/install/
  7. Vector tile conversion - warm-up 1
    Practice 1: Using GDAL (ogr2ogr), let's get a GeoJSON sequence (as a file and as standard output) from the source files (shp).
    • Export to a GeoJSON sequence file (with GDAL 2.4.0 or later):
      > ogr2ogr -f GeoJSONSeq test.geojsons location.shp
    • You can also output the result to standard output:
      > ogr2ogr -f GeoJSONSeq -lco RS=YES /vsistdout/ location.shp
    The part shown in red on the slide is where you write your own source. Try various data.
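    Each line of the GeoJSONSeq output is one Feature object (preceded by an RS control character, 0x1e, when -lco RS=YES is given). A record looks roughly like this (the attribute values are made up):

      {"type":"Feature","properties":{"name":"sample"},"geometry":{"type":"Point","coordinates":[139.76,35.68]}}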
  8. (BTW) Why did we try standard output?
    Should we have intermediate files? It depends on the situation.
    • Source → GeoJSON file (GDAL, etc.) → Tippecanoe → vector tiles: we practiced this on Day 2. It is simple and easy for beginners.
    • Source → GeoJSON sequence streamed straight into Tippecanoe (GDAL, etc.) → vector tiles: more efficient for large data. We will try this today.
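    For example, the streaming route can be written as a single pipeline with no intermediate file (a sketch; the file name and flags are illustrative):

      > ogr2ogr -f GeoJSONSeq /vsistdout/ location.shp | tippecanoe --output-to-directory=tiles --no-tile-compression

    Tippecanoe reads the GeoJSON features from its standard input here, so nothing has to be written to disk in between.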
  9. Vector tile conversion - warm-up 2
    Practice 2: Using GDAL (ogr2ogr), let's get a GeoJSON sequence (as a file and as standard output) from a PostGIS source. Then, let's access the PostGIS database.
    • Export to a GeoJSONSeq file:
      > ogr2ogr -f GeoJSONSeq output.geojsons PG:"dbname='your_db' host='xx.xxx.xxx.xx' port='54xx' user='yourNAME' password='Your pass'" -sql "SELECT * FROM yourTable"
    • You can also export to standard output:
      > ogr2ogr -f GeoJSONSeq -lco RS=YES /vsistdout/ PG:"dbname='your_db' host='xx.xxx.xxx.xx' port='54xx' user='yourNAME' password='Your pass'" -sql "SELECT * FROM yourTable"
    The command gets longer, but the red part is where you write your own source. We do not have a PostGIS server for this practice; try it with your own server!
  10. Exported GeoJSON sequence
    • Now you can export the source to standard output. (Keep Docker open for the next practice.)
    You will see a sequence of objects, which will be fed into the conversion tool Tippecanoe. Each object has a "properties" property and a "geometry" property. We will add a "tippecanoe" property to each object before the conversion.
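    After the "tippecanoe" property has been added, one object in the sequence looks like this (layer name, zoom range and attributes are illustrative):

      {"type":"Feature","tippecanoe":{"layer":"location","minzoom":0,"maxzoom":5},"properties":{"name":"sample"},"geometry":{"type":"Point","coordinates":[139.76,35.68]}}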
  11. How to efficiently prepare the GeoJSON sequence for data conversion
    • We will add a "tippecanoe" property to each object before the conversion.
    • We may also need to adjust properties before the conversion.
    How can we do this efficiently? → We will use Node.js. (It is also good at asynchronous processing, which makes the conversion efficient.)
  12. Use of Node.js: Concept (1)
    • Practice: basic idea before the conversion (standard output).
    Pipeline: ogr2ogr (data into GeoJSON sequence, run as a child process with spawn) → its stdout is piped into a parser (edit each line of the GeoJSON sequence) → standard output. Read the source, then adjust/add properties; this is where you can modify the source data properties.

      const downstream = process.stdout
      const ogr2ogr = spawn(ogr2ogrPath, [
        '-f', 'GeoJSONSeq', '-lco', 'RS=YES', '/vsistdout/', src.url
      ])
      const parser = new Parser()
        .on('data', f => {
          f = renameProperties(f)
          f.tippecanoe = {
            layer: src.layer,
            minzoom: src.minzoom,
            maxzoom: src.maxzoom
          }
          downstream.write(`\x1e${JSON.stringify(f)}\n`)
        })
      ogr2ogr.stdout.pipe(parser)
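    Put together, a minimal runnable version of Concept (1) could look like the script below. It assumes the json-text-sequence npm package for the Parser and a hard-coded source object; the exercise's index01.js reads these from config/default.json and may differ in detail.

      // concept1.js - stream a source through ogr2ogr and print an edited GeoJSON text sequence
      const { spawn } = require('child_process')
      const { Parser } = require('json-text-sequence')  // assumption: npm install json-text-sequence

      // hypothetical source definition (in the exercise this comes from config/default.json)
      const src = { url: 'location.shp', layer: 'location', minzoom: 0, maxzoom: 5 }
      const downstream = process.stdout

      // the parser turns the RS-delimited stream back into feature objects
      const parser = new Parser().on('data', f => {
        // adjust/add properties here, then attach the tippecanoe directives
        f.tippecanoe = { layer: src.layer, minzoom: src.minzoom, maxzoom: src.maxzoom }
        downstream.write(`\x1e${JSON.stringify(f)}\n`)
      })

      // ogr2ogr converts the source into a GeoJSON text sequence on its stdout
      const ogr2ogr = spawn('ogr2ogr', ['-f', 'GeoJSONSeq', '-lco', 'RS=YES', '/vsistdout/', src.url])
      ogr2ogr.stdout.pipe(parser)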
  13. Practice 3 (Use of Node.js: Concept (1))
    • In Docker, run the following script:
      > cd /data
      > npm install
      > node index01.js
    • What did you see? You will see a GeoJSON sequence on your standard output. (Try editing config/default.json to change the source, etc.)
  14. Use of Node.js: Concept (2)
    • Step 2: extend the previous practice and pipe the output into Tippecanoe.
    Pipeline: ogr2ogr (data into GeoJSON sequence) → parser (edit each line of the GeoJSON sequence) → Tippecanoe stdin (data conversion). Read the source, adjust/add properties, then convert.

      const tippecanoe = spawn(tippecanoePath, [
        `--output-to-directory=${dstDir}`,
        `--no-tile-compression`,
        `--minimum-zoom=${minzoom}`,
        `--maximum-zoom=${maxzoom}`
      ], { stdio: ['pipe', 'inherit', 'inherit'] })
      const downstream = tippecanoe.stdin

    Tippecanoe does not work in PowerShell, so we need to run the script in the Docker container (or try it on the Windows Subsystem for Linux).
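    A minimal end-to-end sketch of Concept (2), under the same assumptions as the Concept (1) sketch above (json-text-sequence for the parser; paths, zoom levels and flag values are illustrative, and index02.js may differ):

      // concept2.js - same stream as Concept (1), but the downstream is tippecanoe's stdin
      const { spawn } = require('child_process')
      const { Parser } = require('json-text-sequence')  // assumption

      const src = { url: 'location.shp', layer: 'location', minzoom: 0, maxzoom: 5 }

      const tippecanoe = spawn('tippecanoe', [
        '--output-to-directory=tiles',
        '--no-tile-compression',
        `--minimum-zoom=${src.minzoom}`,
        `--maximum-zoom=${src.maxzoom}`
      ], { stdio: ['pipe', 'inherit', 'inherit'] })
      const downstream = tippecanoe.stdin  // the only change from Concept (1)

      const parser = new Parser().on('data', f => {
        f.tippecanoe = { layer: src.layer, minzoom: src.minzoom, maxzoom: src.maxzoom }
        downstream.write(`\x1e${JSON.stringify(f)}\n`)
      })

      const ogr2ogr = spawn('ogr2ogr', ['-f', 'GeoJSONSeq', '-lco', 'RS=YES', '/vsistdout/', src.url])
      ogr2ogr.stdout.pipe(parser)
      parser.on('finish', () => downstream.end())  // closing stdin lets tippecanoe finish tiling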
  15. Practice 4 (Use of Node.js: Concept (2))
    • In Docker, run the following script:
      > cd /data
      > npm install
      > node index02.js
    • Then you will directly get a vector tile. (Try editing config/default.json to change the source, etc.)
    Congratulations! Now you can convert your PostGIS data using Node.js (on Docker) by yourself.
    We do not have a PostGIS server for this practice, so it uses shapefiles (or GeoJSON) in the local folder. Try with your own PostGIS data by replacing the source and the Node.js script.
  16. For further extension (1)
    • To read data directly from PostGIS you can use GDAL (ogr2ogr), but you can also try the Node.js package "pg", a non-blocking PostgreSQL client for Node.js. (Our UNVT tools often use it.)
    PostGIS data is sometimes so large that it is hard for GDAL to handle it all at once.
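    A minimal sketch of the pg route (connection settings, table and column names are placeholders; for very large tables a cursor or stream such as pg-cursor avoids loading all rows at once):

      // pg-to-geojsonseq.js - read rows with the non-blocking "pg" client and emit a GeoJSON text sequence
      const { Client } = require('pg')  // npm install pg

      const main = async () => {
        const client = new Client({ host: 'localhost', database: 'your_db', user: 'yourNAME', password: 'yourPass' })
        await client.connect()
        // let PostGIS serialize the geometry; the table and columns are hypothetical
        const res = await client.query('SELECT iso3cd, ST_AsGeoJSON(geom) AS geometry FROM your_table')
        for (const row of res.rows) {
          const f = {
            type: 'Feature',
            properties: { iso3cd: row.iso3cd },
            geometry: JSON.parse(row.geometry)
          }
          process.stdout.write(`\x1e${JSON.stringify(f)}\n`)
        }
        await client.end()
      }
      main()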
  17. For further extension (2): one more thought on spatial extent
    • If your data is big, converting the whole area at once can be tough work. It is wise to convert the data by area modules.
    • For example, if your data is about 150 GB in total, you may want to convert it area by area.
    • You can do this with a PostGIS query. For example: https://github.com/un-vector-tile-toolkit/produce-gsc-3-FH/blob/main/produce-gsc-osm-46/index_day01.js#L173
    For large data, the mbtiles format works better than plain pbf, but I try to keep each single mbtiles file within a few GB.
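    As a sketch of the area-by-area idea (table, column and bounding-box values are placeholders; see the linked index_day01.js for the actual production query):

      // limit the PostGIS query to one module's bounding box so each run stays small
      const bbox = { minLon: 0, minLat: 40, maxLon: 6, maxLat: 46 }  // hypothetical module extent
      const sql = `
        SELECT osm_id, name, ST_AsGeoJSON(ST_Transform(way, 4326)) AS geometry
        FROM planet_osm_polygon
        WHERE way && ST_Transform(ST_MakeEnvelope(
          ${bbox.minLon}, ${bbox.minLat}, ${bbox.maxLon}, ${bbox.maxLat}, 4326), 3857)`
      // run this with the pg client from the previous sketch, once per module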
  18. Basic idea of updating
    [Diagram: the PostGIS source (un_base, osm_base) produces small-scale tiles (ZL0-5, mbtiles) and large-scale tiles (ZL6-15, mbtiles, 841 files). Large-scale OSM VT: daily/weekly update; small-scale UN VT and large-scale UN VT: update when necessary. Large-scale UN and OSM tiles are merged and placed on the hosting server.]
    We do not have to update all the data sources regularly because some of the sources are stable. Some layers need regular updates, while others only need to be updated when necessary.
  19. Weekly/daily updating of the OSM source by area
    • It would take more than 3 days to convert the whole globe.
    • The 841 modules are classified into several groups that go through daily/weekly updates, as shown in the figure.
    • The data size distribution is not homogeneous.
    • We have a Node.js script for each group.
  20. Data conversion
    Conversion work on our server in the dev environment (841 modules in total):

    Area (priority) | Number of modules (ZL4-5-6) | Data size (osm) | OSM conversion time (c=3) | Data size (osm & un merged) | Time needed to merge
    (everyday)      | 69 (0-13-56)    | 9.8 GB | 3h36m | 14 GB | 1h11m
    1 (day 1)       | 114 (0-13-101)  | 22 GB  | 6h36m | 25 GB | 1h51m
    2 (day 2)       | 116 (6-38-72)   | 16 GB  | 5h47m | 20 GB | 1h38m
    3 (day 3)       | 31 (0-0-31)     | 26 GB  | 5h28m | 26 GB | 1h22m
    4 (day 4)       | 118 (6-29-83)   | 25 GB  | 7h03m | 27 GB | 2h23m
    5 (day 5)       | 126 (37-29-60)  | 19 GB  | 5h27m | 21 GB | 2h05m
    6 (day 6)       | 137 (36-48-53)  | 16 GB  | 5h54m | 19 GB | 2h06m
    7 (day 7)       | 130 (90-40-0)   | 23 GB  | 5h15m | 29 GB | 3h55m
  21. Update as a scheduled task: use of crontab
    • You can use crontab on your Linux server to update the vector tiles regularly (given that you have scripts for the update work).
    • In our case, the regular tasks include:
      • Running regular conversions of the OSM source data (daily or weekly) to obtain updated vector tiles (in mbtiles format).
      • Merging the updated (OSM-sourced) tiles with the consistent UN-sourced tiles by running the "tile-join" command, which is part of Tippecanoe. (UN-sourced tiles are updated when needed, not regularly.)
      • Copying the merged tiles to the hosting server with the "scp" command.
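    For illustration, a crontab entry for one daily group might look like this (the script path, log path and time are placeholders):

      # run the hypothetical day-1 conversion/merge/copy script every night at 01:00
      0 1 * * * /bin/sh /home/unvt/scripts/update_day01.sh >> /home/unvt/log/update_day01.log 2>&1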
  22. Example
    <Example: sh file for the scheduled task>
      node (location)/index_day01.js
      for f in (location of osm updated tile)/*.mbtiles; do
        /usr/local/bin/tile-join --no-tile-size-limit -f -o (location of combined tile)/tile_day01/`basename ${f}` (location of UN tile)/`basename ${f}` (location of osm tile)/`basename ${f}`
        date
        echo `basename ${f}`
        ls -alh large_tiles/unosm/tile_day01/`basename ${f}`
      done
      scp -i (path to ssh key) -r (location of combined tile)/tile_day01/* (user name)@(hosting server):(location in the hosting server)
  23. Summary
    • Part 1: You learned that you can design your vector tile structure flexibly; it does not have to be the same as the source PostGIS table/view structure.
    • Part 2: You learned the basic idea of data conversion using Node.js.
    • Part 3: You learned how I update the vector tiles on a Linux server.
  24. References
    • Matsuzawa, T. (2019): Create/distribute vector tile, https://speakerdeck.com/smellman/distrubute-vector-tile
    • Install Docker Desktop on Windows: https://docs.docker.com/docker-for-windows/install/
    • Tippecanoe: https://github.com/mapbox/tippecanoe
    • Maputnik editor: https://maputnik.github.io/editor/
    • UNVT workshop (in Japanese): https://github.com/unvt/512
    • Fujimura, H. (2018): Making the administrative boundary data of the T dashboard into vector tiles (in Japanese), https://qiita.com/hfu/items/35e6dc67d55f3bcec181
    • Fujimura, H. (2021): UNVT Workshop: Introduction and Application, https://speakerdeck.com/hfu/unvt-workshop-introduction-and-application