Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
52
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
44
Python static typing with MyPy
kartones
0
70
High-impact refactors keeping the lights on
kartones
0
65
Remote Work
kartones
0
92
Intro to GameBoy Development
kartones
0
98
Myths & The Real World of OpenSource Development
kartones
0
47
CartoDB Tech Intro
kartones
0
48
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
120
Other Decks in Programming
See All in Programming
Verilator + Rust + gRPC と Efinix の RISC-V でAIアクセラレータをAIで作ってる話 RTLを語る会(18) 2025/11/08
ryuz88
0
170
ドメイン駆動設計のエッセンス
masuda220
PRO
15
7.4k
One Enishi After Another
snoozer05
PRO
0
180
GC25 Recap: The Code You Reviewed is Not the Code You Built / #newt_gophercon_tour
mazrean
0
140
AIと人間の共創開発!OSSで試行錯誤した開発スタイル
mae616
2
870
iOSでSVG画像を扱う
kishikawakatsumi
0
180
Kotlin 2.2が切り拓く: コンテキストパラメータで書く関数型DSLと新しい依存管理のかたち
knih
0
280
外接に惑わされない自システムの処理時間SLIをOpenTelemetryで実現した話
kotaro7750
0
160
AkarengaLT vol.38
hashimoto_kei
1
130
When Dependencies Fail: Building Antifragile Applications in a Fragile World
selcukusta
0
120
Introducing RemoteCompose: break your UI out of the app sandbox.
camaelon
2
450
CSC509 Lecture 08
javiergs
PRO
0
280
Featured
See All Featured
GraphQLの誤解/rethinking-graphql
sonatard
73
11k
Code Review Best Practice
trishagee
72
19k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
10
910
Fantastic passwords and where to find them - at NoRuKo
philnash
52
3.5k
What’s in a name? Adding method to the madness
productmarketing
PRO
24
3.7k
Building an army of robots
kneath
306
46k
Optimizing for Happiness
mojombo
379
70k
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
127
54k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
34
2.5k
Docker and Python
trallard
46
3.6k
KATA
mclloyd
PRO
32
15k
Designing Experiences People Love
moore
142
24k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]