Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
47
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
43
Python static typing with MyPy
kartones
0
65
High-impact refactors keeping the lights on
kartones
0
63
Remote Work
kartones
0
86
Intro to GameBoy Development
kartones
0
95
Myths & The Real World of OpenSource Development
kartones
0
45
CartoDB Tech Intro
kartones
0
44
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
110
Other Decks in Programming
See All in Programming
「”誤った使い方をすることが困難”な設計」で良いコードの基礎を固めよう / phpcon-odawara-2025
taniguhey
0
180
ComposeでWebアプリを作る技術
tbsten
0
130
Creating Awesome Change in SmartNews! En
martin_lover
0
100
eBPF超入門「o11yに使える」とは (20250424_eBPF_o11y)
thousanda
1
100
スモールスタートで始めるためのLambda×モノリス(Lambdalith)
akihisaikeda
2
320
Contribute to Comunities | React Tokyo Meetup #4 LT
sasagar
0
580
Enterprise Web App. Development (1): Build Tool Training Ver. 5
knakagawa
1
120
Boost Your Performance and Developer Productivity with Jakarta EE 11
ivargrimstad
0
590
タイムゾーンの奥地は思ったよりも闇深いかもしれない
suguruooki
1
810
カウシェで Four Keys の改善を試みた理由
ike002jp
1
120
Empowering Developers with HTML-Aware ERB Tooling @ RubyKaigi 2025, Matsuyama, Ehime
marcoroth
2
850
設計の本質:コード、システム、そして組織へ / The Essence of Design: To Code, Systems, and Organizations
nrslib
10
3.6k
Featured
See All Featured
How to Create Impact in a Changing Tech Landscape [PerfNow 2023]
tammyeverts
52
2.4k
Bootstrapping a Software Product
garrettdimon
PRO
307
110k
Fireside Chat
paigeccino
37
3.4k
Evolution of real-time – Irina Nazarova, EuRuKo, 2024
irinanazarova
8
690
Raft: Consensus for Rubyists
vanstee
137
6.9k
Become a Pro
speakerdeck
PRO
28
5.3k
Building Better People: How to give real-time feedback that sticks.
wjessup
367
19k
Designing Experiences People Love
moore
142
24k
Design and Strategy: How to Deal with People Who Don’t "Get" Design
morganepeng
129
19k
Speed Design
sergeychernyshev
29
920
10 Git Anti Patterns You Should be Aware of
lemiorhan
PRO
656
60k
Intergalactic Javascript Robots from Outer Space
tanoku
270
27k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks! kartones@cartodb.com