Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda
1) CSV Format Issues
2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro
.csv / MIME: text/csv
Unknown birthdate (80s?)
RFC 4180 (2005)
@Kartones Intro
Plain text
Simple format
Simple rules
@Kartones Usage
@Kartones CSV
0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!", {""value"":""es""}
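The field fragments on this slide (embedded commas, doubled quotes around JSON) are the kind of content that breaks naive comma splitting. A minimal sketch, assuming Python's standard csv module and a made-up three-column sample (the column names, the second data row and the helper name parse_rows are hypothetical, not from the talk):

```python
import csv
import io
import json

# Hypothetical sample: quoted fields holding a comma, a line break, and a
# JSON document whose inner quotes are doubled, as RFC 4180 describes.
SAMPLE = (
    'id,name,meta\r\n'
    '1083,"jump, jump up!","{""value"":""es""}"\r\n'
    '1084,"two\r\nlines",{}\r\n'
)


def parse_rows(text):
    """Parse CSV text with the standard library reader."""
    # newline="" stops StringIO from translating line breaks, so a line
    # break inside a quoted field reaches the csv reader unchanged.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_rows(SAMPLE)

# The JSON column only becomes valid JSON after the CSV layer is unquoted.
meta = json.loads(rows[1][2])

# For contrast: splitting the same record on commas cuts the name in two.
naive = SAMPLE.split("\r\n")[1].split(",")
```

Here the reader returns three fields for the first data row, while the naive split returns four pieces, which is the gap between "simple rules" and real files.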
@Kartones WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
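To make the WKT examples concrete, here is a rough sketch of reading the two flat cases (POINT and LINESTRING) with only the standard library. The function name parse_simple_wkt is hypothetical, and nested types such as POLYGON are deliberately rejected; a real importer would more likely rely on a geometry library.

```python
import re

# Matches "<TYPE> (<body>)" for the two flat geometry types handled here.
_WKT_RE = re.compile(r"^\s*(POINT|LINESTRING)\s*\((.*)\)\s*$", re.IGNORECASE)


def parse_simple_wkt(text):
    """Return (geometry_type, [(x, y), ...]) for a 2D POINT or LINESTRING."""
    match = _WKT_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {text!r}")
    geom_type = match.group(1).upper()
    coords = []
    for pair in match.group(2).split(","):
        # Each pair is "x y"; anything else raises ValueError on unpacking.
        x, y = pair.split()
        coords.append((float(x), float(y)))
    if geom_type == "POINT" and len(coords) != 1:
        raise ValueError("a POINT must have exactly one coordinate pair")
    return geom_type, coords


point = parse_simple_wkt("POINT (30 10)")
line = parse_simple_wkt("LINESTRING (30 10, 10 30, 40 40)")
```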
@Kartones WKB: Well-Known Binary
POINT(2.0 4.0) = 000000000140000000000000004010000000000000
https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
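The hex string above can be unpacked by hand: one byte-order byte, a 4-byte geometry type, then two 8-byte doubles. The sketch below decodes it with the standard struct module. It also accepts the PostGIS EWKB variant, in which a flag bit in the type announces a 4-byte SRID before the coordinates; that appears to be the form of the long hex value on the earlier CSV slide. The function name decode_wkb_point is hypothetical and only 2D points are handled.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # PostGIS EWKB extension: an SRID follows the type
WKB_POINT = 1


def decode_wkb_point(hex_text):
    """Decode a hex-encoded 2D point in WKB or EWKB form.

    Returns (srid_or_None, x, y).
    """
    data = bytes.fromhex(hex_text)
    # First byte is the byte order: 0 = big endian, 1 = little endian.
    endian = ">" if data[0] == 0 else "<"
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
    # Anything other than a plain 2D point is out of scope for this sketch.
    if (geom_type & ~EWKB_SRID_FLAG) != WKB_POINT:
        raise ValueError("only 2D point geometries are handled here")
    x, y = struct.unpack_from(endian + "dd", data, offset)
    return srid, x, y


plain = decode_wkb_point("000000000140000000000000004010000000000000")
extended = decode_wkb_point("0101000020E610000000000000008049C000000000000038C0")
```

The first call gives (None, 2.0, 4.0), matching the slide. The second gives SRID 4326 with coordinates (-51.0, -24.0).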
@Kartones GeoJSON
{
  "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
  "properties": { "name": "Dinagat Islands" }
}
http://geojson.org/
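Since a single CSV column may carry any of these encodings, an importer often normalizes them to one form. As an illustration only (the function name geojson_point_to_wkt is hypothetical), this sketch turns the GeoJSON feature above into WKT, relying on GeoJSON's longitude-then-latitude coordinate order:

```python
import json

FEATURE = """
{
  "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
  "properties": { "name": "Dinagat Islands" }
}
"""


def geojson_point_to_wkt(feature_text):
    """Return (wkt, properties) for a GeoJSON Feature holding a Point."""
    feature = json.loads(feature_text)
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    # GeoJSON stores positions as [longitude, latitude], i.e. x then y.
    lon, lat = geometry["coordinates"][:2]
    return f"POINT ({lon} {lat})", feature.get("properties") or {}


wkt, properties = geojson_point_to_wkt(FEATURE)
```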
@Kartones IMPORT ISSUES
@Kartones Typical
Huge files (>1GB)
Lots of rows (+2M)
Lots of columns (~1600)
XLS/XLSX -> CSV
@Kartones Typical
Stream HTTP downloaded file
Stream file between servers
Stream data import to DB
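The streaming idea can be sketched as reading rows lazily and handing them on in fixed-size batches, so that a multi-gigabyte file never sits in memory at once. This is an assumption about how such a pipeline might look, not CartoDB's actual importer; iter_batches and the batch size are hypothetical.

```python
import csv
import io
from itertools import islice


def iter_batches(text_stream, batch_size=1000):
    """Yield (header, rows) batches from a CSV text stream, lazily."""
    reader = csv.reader(text_stream)
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to yield
    while True:
        # islice pulls at most batch_size rows from the underlying stream.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        yield header, batch


# Usage with an in-memory stand-in for a downloaded file.
stream = io.StringIO("id,name\r\n1,a\r\n2,b\r\n3,c\r\n4,d\r\n5,e\r\n", newline="")
sizes = [len(rows) for _, rows in iter_batches(stream, batch_size=2)]
```

Any file-like object opened in text mode with newline="" can take the place of the StringIO, and each batch can then go to the database step.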
@Kartones Typical
@Kartones CartoDB-specific
Content guessing (e.g. lat/lon)
Type guessing
Fixing geometry errors
Sync tables -> No downtime allowed
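Content and type guessing can be illustrated with a small heuristic: try progressively looser casts on a sample of values, and treat a pair of columns as latitude/longitude only when both their names and their value ranges fit. The candidate names and function names below are assumptions for illustration and do not reflect CartoDB's real rules.

```python
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}


def _all_cast(values, cast):
    """True when every value survives the given cast."""
    try:
        for value in values:
            cast(value)
        return True
    except ValueError:
        return False


def guess_type(values):
    """Guess 'integer', 'float' or 'text' from a sample of string values."""
    sample = [v for v in values if v.strip()]
    if not sample:
        return "text"
    if _all_cast(sample, int):
        return "integer"
    if _all_cast(sample, float):
        return "float"
    return "text"


def _in_range(values, limit):
    """True when all values are numbers within [-limit, limit]."""
    try:
        return bool(values) and all(abs(float(v)) <= limit for v in values)
    except ValueError:
        return False


def guess_lat_lon(header, rows):
    """Return (lat_index, lon_index), or None when no plausible pair exists."""
    names = [name.strip().lower() for name in header]
    lat = next((i for i, n in enumerate(names) if n in LAT_NAMES), None)
    lon = next((i for i, n in enumerate(names) if n in LON_NAMES), None)
    if lat is None or lon is None:
        return None
    lat_values = [row[lat] for row in rows if row[lat].strip()]
    lon_values = [row[lon] for row in rows if row[lon].strip()]
    if _in_range(lat_values, 90) and _in_range(lon_values, 180):
        return lat, lon
    return None


header = ["name", "Latitude", "lng"]
rows = [["Madrid", "40.4", "-3.7"], ["Barcelona", "41.3", "2.1"]]
pair = guess_lat_lon(header, rows)
```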
@Kartones DB-Specific
Leave DB indexes as last step
Prefer big INSERT to multiple UPDATE
GDAL’s ogr2ogr > Ruby/Python scripts
http://www.gdal.org/ogr2ogr.html
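The first two database tips can be shown in miniature. The sketch below uses Python's built-in sqlite3 purely as a stand-in for the real database (an assumption; the table, column and index names are hypothetical): all rows go in through one batched INSERT inside a single transaction, and the index is created only after the data is loaded.

```python
import sqlite3


def load_points(rows):
    """Bulk-load (name, lat, lon) rows, then build the index as the last step."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE points (name TEXT, lat REAL, lon REAL)")
    # One transaction and one batched INSERT instead of row-by-row writes.
    with conn:
        conn.executemany(
            "INSERT INTO points (name, lat, lon) VALUES (?, ?, ?)", rows
        )
    # Creating the index after loading avoids updating it on every insert.
    conn.execute("CREATE INDEX points_lat_lon_idx ON points (lat, lon)")
    return conn


conn = load_points([("Madrid", 40.4, -3.7), ("Barcelona", 41.3, 2.1)])
count = conn.execute("SELECT COUNT(*) FROM points").fetchone()[0]
```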
@Kartones Questions? Thanks!