Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
43
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
35
Python static typing with MyPy
kartones
0
60
High-impact refactors keeping the lights on
kartones
0
58
Remote Work
kartones
0
72
Intro to GameBoy Development
kartones
0
88
Myths & The Real World of OpenSource Development
kartones
0
37
CartoDB Tech Intro
kartones
0
42
Copy Protection & Cracking History
kartones
0
97
Cómo ganar dinero con tus juegos online
kartones
1
100
Other Decks in Programming
See All in Programming
Exploring the Implementation of “t.Run”, “t.Parallel”, and “t.Cleanup”
akarin
1
110
Milestoner
bkuhlmann
1
410
"config" ってなんだ? / What is "config"?
okashoi
0
250
初心者のためのRubyKaigi入門/RubyKaigi Introduction
a_matsuda
8
1.4k
#phpcon_odawara オープン・クローズドなテストフィクスチャを求めて / open closed test fixtures
77web
3
240
Next.js App Router
quramy
11
1.6k
MicrosoftのPlatform Engineeringガイドを読んで実際になにかやってみた
ymd65536
1
500
Go製Webアプリケーションのエラーとの向き合い方大全、あるいはやっぱりスタックトレース欲しいやん / Kyoto.go #50
utgwkk
6
1.7k
スクラムガイドのスプリントレトロスペクティブを改めて読みかえしてみた / Re-reading the Sprint Retrospective Section in the Scrum Guide
mackey0225
3
480
デフォルトにして至高、RubyMineの大好きな所
ruzia
0
680
PHP8.3の機能を振り返る / Review of PHP 8.3 features
seike460
PRO
1
120
From Spring Boot 2 to Spring Boot 3 with Java 21 and Jakarta EE
ivargrimstad
0
420
Featured
See All Featured
Bootstrapping a Software Product
garrettdimon
PRO
302
110k
Statistics for Hackers
jakevdp
790
220k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
20
1.7k
Designing with Data
zakiwarfel
96
4.8k
The World Runs on Bad Software
bkeepers
PRO
61
6.7k
Designing for Performance
lara
602
67k
The Art of Programming - Codeland 2020
erikaheidi
43
12k
XXLCSS - How to scale CSS and keep your sanity
sugarenia
242
1.2M
Teambox: Starting and Learning
jrom
128
8.4k
Unsuck your backbone
ammeep
663
57k
Atom: Resistance is Futile
akmur
260
25k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
6
3.4k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]