Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
53
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
44
Python static typing with MyPy
kartones
0
70
High-impact refactors keeping the lights on
kartones
0
65
Remote Work
kartones
0
92
Intro to GameBoy Development
kartones
0
100
Myths & The Real World of OpenSource Development
kartones
0
48
CartoDB Tech Intro
kartones
0
48
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
120
Other Decks in Programming
See All in Programming
これだけで丸わかり!LangChain v1.0 アップデートまとめ
os1ma
4
280
なぜ強調表示できず ** が表示されるのか — Perlで始まったMarkdownの歴史と日本語文書における課題
kwahiro
12
7.4k
TypeScriptで設計する 堅牢さとUXを両立した非同期ワークフローの実現
moeka__c
5
2.6k
OSS開発者の憂鬱
yusukebe
14
12k
関数の挙動書き換える
takatofukui
4
750
JJUG CCC 2025 Fall: Virtual Thread Deep Dive
ternbusty
3
500
データファイルをAWSのDWHサービスに格納する / 20251115jawsug-tochigi
kasacchiful
2
100
251126 TestState APIってなんだっけ?Step Functionsテストどう変わる?
east_takumi
0
260
Full-Cycle Reactivity in Angular: SignalStore mit Signal Forms und Resources
manfredsteyer
PRO
0
140
Web エンジニアが JavaScript で AI Agent を作る / JSConf JP 2025 sponsor session
izumin5210
4
2.1k
Building AI with AI
inesmontani
PRO
1
310
チーム開発の “地ならし"
konifar
8
6.3k
Featured
See All Featured
How to train your dragon (web standard)
notwaldorf
97
6.4k
CoffeeScript is Beautiful & I Never Want to Write Plain JavaScript Again
sstephenson
162
15k
Build your cross-platform service in a week with App Engine
jlugia
234
18k
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
659
61k
Site-Speed That Sticks
csswizardry
13
970
RailsConf 2023
tenderlove
30
1.3k
Leading Effective Engineering Teams in the AI Era
addyosmani
8
1.2k
The Cult of Friendly URLs
andyhume
79
6.7k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
253
22k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Speed Design
sergeychernyshev
33
1.3k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
9
1k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]