Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
52
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
43
Python static typing with MyPy
kartones
0
69
High-impact refactors keeping the lights on
kartones
0
65
Remote Work
kartones
0
90
Intro to GameBoy Development
kartones
0
98
Myths & The Real World of OpenSource Development
kartones
0
45
CartoDB Tech Intro
kartones
0
48
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
120
Other Decks in Programming
See All in Programming
管你要 trace 什麼、bpftrace 用下去就對了 — COSCUP 2025
shunghsiyu
0
470
Claude Codeで実装以外の開発フロー、どこまで自動化できるか?失敗と成功
ndadayo
2
1.4k
GitHub Copilotの全体像と活用のヒント AI駆動開発の最初の一歩
74th
8
3.2k
レガシープロジェクトで最大限AIの恩恵を受けられるようClaude Codeを利用する
tk1351
3
1.2k
250830 IaCの選定~AWS SAMのLambdaをECSに乗り換えたときの備忘録~
east_takumi
0
190
WebAssemblyインタプリタを書く ~Component Modelを添えて~
ruccho
1
910
開発チーム・開発組織の設計改善スキルの向上
masuda220
PRO
15
8.1k
オープンセミナー2025@広島LT技術ブログを続けるには
satoshi256kbyte
0
130
AI時代のドメイン駆動設計-DDD実践におけるAI活用のあり方 / ddd-in-ai-era
minodriven
23
9k
AIエージェント開発、DevOps and LLMOps
ymd65536
1
350
モバイルアプリからWebへの横展開を加速した話_Claude_Code_実践術.pdf
kazuyasakamoto
0
270
コーディングエージェント時代のNeovim
key60228
1
100
Featured
See All Featured
Product Roadmaps are Hard
iamctodd
PRO
54
11k
Become a Pro
speakerdeck
PRO
29
5.5k
Site-Speed That Sticks
csswizardry
10
790
Put a Button on it: Removing Barriers to Going Fast.
kastner
60
4k
The Invisible Side of Design
smashingmag
301
51k
Rebuilding a faster, lazier Slack
samanthasiow
83
9.1k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
48
9.6k
Principles of Awesome APIs and How to Build Them.
keavy
126
17k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
7
820
Learning to Love Humans: Emotional Interface Design
aarron
273
40k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
33
2.4k
Practical Tips for Bootstrapping Information Extraction Pipelines
honnibal
PRO
23
1.4k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]