$30 off During Our Annual Pro Sale. View Details »
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
55
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
45
Python static typing with MyPy
kartones
0
71
High-impact refactors keeping the lights on
kartones
0
66
Remote Work
kartones
0
94
Intro to GameBoy Development
kartones
0
100
Myths & The Real World of OpenSource Development
kartones
0
49
CartoDB Tech Intro
kartones
0
49
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
120
Other Decks in Programming
See All in Programming
社内オペレーション改善のためのTypeScript / TSKaigi Hokuriku 2025
dachi023
1
540
非同期処理の迷宮を抜ける: 初学者がつまづく構造的な原因
pd1xx
1
680
dnx で実行できるコマンド、作ってみました
tomohisa
0
140
WebRTC、 綺麗に見るか滑らかに見るか
sublimer
1
160
AIエージェントを活かすPM術 AI駆動開発の現場から
gyuta
0
330
モデル駆動設計をやってみようワークショップ開催報告(Modeling Forum2025) / model driven design workshop report
haru860
0
240
TypeScript 5.9 で使えるようになった import defer でパフォーマンス最適化を実現する
bicstone
1
1.2k
配送計画の均等化機能を提供する取り組みについて(⽩⾦鉱業 Meetup Vol.21@六本⽊(数理最適化編))
izu_nori
0
140
エディターってAIで操作できるんだぜ
kis9a
0
690
tparseでgo testの出力を見やすくする
utgwkk
1
170
俺流レスポンシブコーディング 2025
tak_dcxi
13
8.3k
How Software Deployment tools have changed in the past 20 years
geshan
0
28k
Featured
See All Featured
Unsuck your backbone
ammeep
671
58k
Practical Orchestrator
shlominoach
190
11k
Context Engineering - Making Every Token Count
addyosmani
9
490
Git: the NoSQL Database
bkeepers
PRO
432
66k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
253
22k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1.1k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
32
1.8k
Become a Pro
speakerdeck
PRO
30
5.7k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
36
6.2k
Rails Girls Zürich Keynote
gr2m
95
14k
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
37
2.6k
Docker and Python
trallard
46
3.7k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]