Slide 1

Slide 1 text

@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY

Slide 2

Slide 2 text

@Kartones CartoDB

Slide 3

Slide 3 text

@Kartones Agenda 1) CSV Format Issues 2) Import Issues

Slide 4

Slide 4 text

@Kartones CSV FORMAT ISSUES

Slide 5

Slide 5 text

@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180 (2005)

Slide 6

Slide 6 text

@Kartones Intro Plain text Simple format Simple rules

Slide 7

Slide 7 text

@Kartones Usage

Slide 8

Slide 8 text

@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!", {""value"":""es""}

Slide 9

Slide 9 text

@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10, 10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text

Slide 10

Slide 10 text

@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary

Slide 11

Slide 11 text

@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/

Slide 12

Slide 12 text

@Kartones IMPORT ISSUES

Slide 13

Slide 13 text

@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots of columns (~1600) XLS/XLSX -> CSV

Slide 14

Slide 14 text

@Kartones Typical Stream HTTP downloaded file Stream file between servers Stream data import to DB

Slide 15

Slide 15 text

@Kartones Typical

Slide 16

Slide 16 text

@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors fixing Sync tables -> No downtime allowed

Slide 17

Slide 17 text

@Kartones DB-Specific Leave DB indexes as last step Prefer big INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html

Slide 18

Slide 18 text

@Kartones Questions? Thanks! [email protected]