Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda
1) CSV Format Issues
2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro
.csv / MIME: text/csv
Unknown birthdate (80s?)
RFC 4180 (2005)
@Kartones Intro
Plain text
Simple format
Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
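The row above mixes a hex-encoded geometry, quoted text, a timestamp and a JSON value whose inner quotes are doubled, which is how RFC 4180 escapes a quote inside a quoted field. A minimal sketch of reading such a row with Python's standard `csv` module follows. The header names and the exact row layout are hypothetical (the slide's original line breaks are not recoverable from the transcript), and the sketch assumes the JSON field is wrapped in outer quotes.

```python
import csv
import io
import json

# Hypothetical row modelled on the slide: hex geometry, id, text,
# timestamp, and a JSON value using RFC 4180 doubled quotes.
SAMPLE = (
    "the_geom,id,name,created_at,payload\n"
    '0101000020E610000000000000008049C000000000000038C0,1083,"alien",'
    '2014-11-04 15:24:40.43413+00,"{""value"":""es""}"\n'
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
# The csv module has already turned "" back into ", so the field is valid JSON.
payload = json.loads(rows[0]["payload"])
print(rows[0]["name"], payload["value"])
```

Every value comes back as a string, so numeric, date and geometry columns still need their own conversion step.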
@Kartones WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
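To make the WKT examples concrete, here is a rough sketch of a parser for the two simplest cases, POINT and LINESTRING. The function name `parse_simple_wkt` is hypothetical, and nested geometries such as POLYGON or MULTIPOLYGON are deliberately rejected; a real importer would more likely rely on a geometry library.

```python
import re

# Matches only the flat "TYPE (x y, x y, ...)" forms.
_WKT_RE = re.compile(r"^\s*(POINT|LINESTRING)\s*\(([^()]*)\)\s*$", re.IGNORECASE)


def parse_simple_wkt(wkt):
    """Return (type, coordinates) for a POINT or LINESTRING WKT string."""
    match = _WKT_RE.match(wkt)
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {wkt!r}")
    kind = match.group(1).upper()
    # Each comma-separated item is a whitespace-separated coordinate pair.
    coords = [
        tuple(float(number) for number in pair.split())
        for pair in match.group(2).split(",")
    ]
    if kind == "POINT":
        if len(coords) != 1:
            raise ValueError("a POINT must contain exactly one coordinate pair")
        return kind, coords[0]
    return kind, coords


print(parse_simple_wkt("POINT (30 10)"))
print(parse_simple_wkt("LINESTRING (30 10, 10 30, 40 40)"))
```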
@Kartones WKB: Well-Known Binary
POINT(2.0 4.0) = 000000000140000000000000004010000000000000
https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
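The hex string on this slide breaks down into a byte-order flag (00, big-endian), a 4-byte geometry type (1, Point) and two 8-byte doubles. A small sketch using Python's `struct` module reproduces that layout; the helper names are hypothetical and only plain 2D points are handled.

```python
import struct

WKB_POINT = 1  # WKB geometry type code for Point


def point_to_wkb_hex(x, y):
    """Encode a 2D point as big-endian WKB and return it as hex text."""
    # ">" = big-endian, B = byte-order flag (0 means big-endian),
    # I = 32-bit geometry type, dd = the two coordinates as doubles.
    return struct.pack(">BIdd", 0, WKB_POINT, x, y).hex().upper()


def wkb_hex_to_point(hex_text):
    """Decode a plain 2D WKB point (either byte order) into (x, y)."""
    raw = bytes.fromhex(hex_text)
    order = "<" if raw[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"not a plain WKB point (type {geom_type})")
    return struct.unpack_from(order + "dd", raw, 5)


print(point_to_wkb_hex(2.0, 4.0))
print(wkb_hex_to_point("000000000140000000000000004010000000000000"))
```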
@Kartones GeoJSON
{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } }
http://geojson.org/
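GeoJSON stores a point as [longitude, latitude], which lines up with the "x y" order of WKT. As an illustration, the feature from this slide can be converted to a WKT point with the standard `json` module; `point_feature_to_wkt` is a hypothetical helper that only handles Point geometries.

```python
import json

FEATURE = (
    '{ "type": "Feature", "geometry": { "type": "Point", '
    '"coordinates": [125.6, 10.1] }, '
    '"properties": { "name": "Dinagat Islands" } }'
)


def point_feature_to_wkt(feature_json):
    """Convert a GeoJSON Point feature into a WKT POINT string."""
    feature = json.loads(feature_json)
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    # GeoJSON order is longitude first, then latitude.
    lon, lat = geometry["coordinates"][:2]
    return f"POINT ({lon} {lat})"


print(point_feature_to_wkt(FEATURE))
```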
@Kartones IMPORT ISSUES
@Kartones Typical
Huge files (>1GB)
Lots of rows (+2M)
Lots of columns (~1600)
XLS/XLSX -> CSV
@Kartones Typical
Stream HTTP downloaded file
Stream file between servers
Stream data import to DB
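The common idea behind these three points is to never hold the whole file in memory. One possible shape for that, sketched with the standard library only, is a generator that reads lines lazily and hands rows over in fixed-size batches. The name `iter_batches` and the batch size are assumptions for illustration, not something taken from the talk.

```python
import csv
import io
from itertools import islice


def iter_batches(lines, batch_size=1000):
    """Yield (header, rows) batches, reading lazily from an iterable of lines."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to yield
    while True:
        # islice pulls at most batch_size rows, so memory use stays bounded.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield header, batch


# A StringIO stands in for a large file here; any iterable of text lines
# (such as an open file object) can be passed the same way.
source = io.StringIO("name,lat,lon\na,40.4,-3.7\nb,41.4,2.2\nc,10.1,125.6\n")
for header, rows in iter_batches(source, batch_size=2):
    print(header, len(rows))
```

Each batch can then be written to the database before the next one is read, so download, parsing and import overlap instead of running one after another.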
@Kartones Typical
@Kartones CartoDB-specific
Content guessing (e.g. lat/lon)
Type guessing
Geometry error fixing
Sync tables -> No downtime allowed
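Content guessing for lat/lon can be approximated by combining column names with value ranges. The sketch below is only an illustration of that idea, not CartoDB's actual logic: the candidate name lists and the function `guess_lat_lon` are hypothetical.

```python
# Hypothetical lists of header names that hint at coordinates.
LAT_NAMES = {"lat", "latitude", "y"}
LON_NAMES = {"lon", "lng", "long", "longitude", "x"}


def _all_in_range(values, low, high):
    """True if every value parses as a float within [low, high]."""
    try:
        return all(low <= float(value) <= high for value in values)
    except ValueError:
        return False


def _find_column(header, columns, names, low, high):
    """Return the first column whose name and sampled values both fit."""
    for name in header:
        if name.strip().lower() in names and _all_in_range(columns[name], low, high):
            return name
    return None


def guess_lat_lon(header, sample_rows):
    """Return (lat_column, lon_column), or None if no plausible pair exists."""
    columns = {
        name: [row[index] for row in sample_rows]
        for index, name in enumerate(header)
    }
    lat = _find_column(header, columns, LAT_NAMES, -90.0, 90.0)
    lon = _find_column(header, columns, LON_NAMES, -180.0, 180.0)
    if lat is None or lon is None:
        return None
    return lat, lon


print(guess_lat_lon(
    ["name", "Latitude", "Longitude"],
    [["Madrid", "40.4", "-3.7"], ["Dinagat Islands", "10.1", "125.6"]],
))
```

Checking only a sample of rows keeps the guess cheap on multi-gigabyte files, at the cost of occasionally missing bad values further down.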
@Kartones DB-Specific
Create DB indexes as the last step
Prefer one big INSERT over multiple UPDATEs
GDAL’s ogr2ogr > Ruby/Python scripts
http://www.gdal.org/ogr2ogr.html
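The first two points can be sketched with SQLite from the Python standard library, used here purely as a stand-in for the real database. Rows are fully prepared before loading, so they go in with one batched INSERT inside a single transaction instead of being inserted and then patched with UPDATE statements, and the index is created only once the data is loaded. Table and column names are hypothetical.

```python
import sqlite3


def load_places(rows):
    """Load (name, lon, lat) tuples into a fresh in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (name TEXT, lon REAL, lat REAL, the_geom_wkt TEXT)"
    )
    # Compute the derived column up front so no follow-up UPDATE is needed.
    prepared = [
        (name, lon, lat, f"POINT ({lon} {lat})") for name, lon, lat in rows
    ]
    with conn:  # a single transaction for the whole batch
        conn.executemany("INSERT INTO places VALUES (?, ?, ?, ?)", prepared)
    # Build the index last, after all rows are in place.
    conn.execute("CREATE INDEX idx_places_name ON places (name)")
    return conn


conn = load_places([("Madrid", -3.7, 40.4), ("Dinagat Islands", 125.6, 10.1)])
print(conn.execute("SELECT name, the_geom_wkt FROM places ORDER BY name").fetchall())
```

Creating the index after the load means it is built once over the final data rather than updated on every inserted row.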
@Kartones Questions? Thanks!