Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
56
0
Share
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
46
Python static typing with MyPy
kartones
0
71
High-impact refactors keeping the lights on
kartones
0
68
Remote Work
kartones
0
97
Intro to GameBoy Development
kartones
0
100
Myths & The Real World of OpenSource Development
kartones
0
49
CartoDB Tech Intro
kartones
0
50
Copy Protection & Cracking History
kartones
0
130
Cómo ganar dinero con tus juegos online
kartones
1
120
Other Decks in Programming
See All in Programming
事業会社でのセキュリティ長期インターンについて
masachikaura
0
230
脱 雰囲気実装!AgentCoreを良い感じにWEBアプリケーションに組み込むために
takuyay0ne
3
440
Mastering Event Sourcing: Your Parents Holidayed in Yugoslavia
super_marek
0
150
今年もTECHSCOREブログを書き続けます!
hiraoku101
0
230
AWS re:Invent 2025の少し振り返り + DevOps AgentとBacklogを連携させてみた
satoshi256kbyte
2
150
条件判定に名前、つけてますか? #phperkaigi #c
77web
2
970
VueエンジニアがReactを触って感じた_設計の違い
koukimiura
0
160
Strategy for Finding a Problem for OSS: With Real Examples
kibitan
0
140
実践ハーネスエンジニアリング #MOSHTech
kajitack
7
6k
Laravel Nightwatchの裏側 - Laravel公式Observabilityツールを支える設計と実装
avosalmon
1
320
ドメインイベントでビジネスロジックを解きほぐす #phpcon_odawara
kajitack
2
120
車輪の再発明をしよう!PHP で実装して学ぶ、Web サーバーの仕組みと HTTP の正体
h1r0
3
510
Featured
See All Featured
The Curious Case for Waylosing
cassininazir
0
290
Building Adaptive Systems
keathley
44
3k
StorybookのUI Testing Handbookを読んだ
zakiyama
31
6.7k
Statistics for Hackers
jakevdp
799
230k
What Being in a Rock Band Can Teach Us About Real World SEO
427marketing
0
210
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
231
23k
How to make the Groovebox
asonas
2
2.1k
Bash Introduction
62gerente
615
210k
The SEO identity crisis: Don't let AI make you average
varn
0
440
Joys of Absence: A Defence of Solitary Play
codingconduct
1
340
Designing Dashboards & Data Visualisations in Web Apps
destraynor
231
54k
Money Talks: Using Revenue to Get Sh*t Done
nikkihalliwell
0
200
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]