Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro: .csv / MIME: text/csv; unknown birthdate (80s?); RFC 4180 (2005)
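As a rough illustration of what RFC 4180 allows inside a single field, the sketch below parses a small sample with Python's standard `csv` module. The sample data and the `parse_csv` helper are made up for this example and are not from the talk; they only show that commas, doubled quotes, and line breaks are all legal inside a quoted field.

```python
import csv
import io

# Hypothetical sample (not from the slides): CRLF line endings, a quoted
# field containing a comma, a doubled quote, and an embedded line break.
SAMPLE = (
    'id,name,comment\r\n'
    '1,"Madrid, Spain","He said ""hola"""\r\n'
    '2,plain,"two\r\nlines"\r\n'
)


def parse_csv(text):
    """Return every record of a CSV document as a list of strings."""
    # newline="" disables newline translation so the csv module sees the
    # original line breaks, including the ones inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_csv(SAMPLE)
for row in rows:
    print(row)
```

Splitting on commas or on line breaks by hand would break on both data rows here, which is presumably part of the "hidden complexity" the title refers to.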
@Kartones Intro: plain text; simple format; simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
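The first value in the sample above looks like a hex-encoded PostGIS EWKB point (WKB with an extra SRID field). The following sketch decodes it with the standard `struct` module, assuming the usual EWKB layout: one byte-order byte, a 32-bit type with the SRID flag `0x20000000`, an optional 32-bit SRID, then two doubles. The function name `decode_ewkb_point` is hypothetical, and only 2D points are handled.

```python
import struct

# Hex string copied from the CSV sample on the slide.
SAMPLE_EWKB = "0101000020E610000000000000008049C000000000000038C0"

SRID_FLAG = 0x20000000      # PostGIS extension: an SRID follows the type
ZM_FLAGS = 0xC0000000       # Z and M dimension flags, unsupported here


def decode_ewkb_point(hex_string):
    """Decode a hex-encoded 2D (E)WKB point into (srid, x, y)."""
    raw = bytes.fromhex(hex_string)
    # First byte is the byte order: 1 = little-endian, 0 = big-endian.
    order = "<" if raw[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    offset = 5
    if geom_type & ZM_FLAGS or geom_type & 0xFF != 1:
        raise ValueError("only 2D points are supported in this sketch")
    srid = None
    if geom_type & SRID_FLAG:
        (srid,) = struct.unpack_from(order + "I", raw, offset)
        offset += 4
    x, y = struct.unpack_from(order + "dd", raw, offset)
    return srid, x, y


print(decode_ewkb_point(SAMPLE_EWKB))
```

Under those assumptions the value decodes to SRID 4326 and the point (-51, -24).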
@Kartones WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
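To give a feel for what reading WKT from a CSV cell involves, here is a deliberately small parser sketch. It only covers 2D POINT, LINESTRING, and POLYGON, and the name `parse_wkt_simple` is invented for this example; a real import would more likely rely on an existing geometry library.

```python
import re

# One "x y" coordinate pair, e.g. "30 10" or "-3.7 40.4".
_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def _pairs(text):
    """Extract all coordinate pairs from a WKT fragment as float tuples."""
    return [(float(x), float(y)) for x, y in _PAIR.findall(text)]


def parse_wkt_simple(wkt):
    """Parse a 2D POINT, LINESTRING or POLYGON into (type, coordinates)."""
    match = re.fullmatch(r"\s*([A-Za-z]+)\s*(\(.*\))\s*", wkt)
    if not match:
        raise ValueError("not a WKT geometry: %r" % wkt)
    geom_type = match.group(1).upper()
    body = match.group(2)
    if geom_type == "POINT":
        return geom_type, _pairs(body)[0]
    if geom_type == "LINESTRING":
        return geom_type, _pairs(body)
    if geom_type == "POLYGON":
        # Each ring is one innermost parenthesised group.
        rings = re.findall(r"\(([^()]*)\)", body)
        return geom_type, [_pairs(ring) for ring in rings]
    raise ValueError("unsupported geometry type: " + geom_type)


print(parse_wkt_simple("POINT (30 10)"))
print(parse_wkt_simple("LINESTRING (30 10, 10 30, 40 40)"))
```

Note that the commas inside a WKT value are one reason such a cell has to be quoted when it is stored in a CSV file.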
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
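The hex string on the slide can be reproduced with a few lines of `struct` code. This is only a sketch of the plain 2D point layout (byte-order byte, 32-bit geometry type 1, two IEEE 754 doubles); `point_to_wkb_hex` is a name chosen for this example.

```python
import struct


def point_to_wkb_hex(x, y, big_endian=True):
    """Encode a 2D point as hex WKB.

    Layout: 1 byte order flag (0 = big-endian, 1 = little-endian),
    a 32-bit geometry type (1 = Point), then x and y as doubles.
    """
    order, flag = (">", 0) if big_endian else ("<", 1)
    return struct.pack(order + "BIdd", flag, 1, x, y).hex()


print(point_to_wkb_hex(2.0, 4.0))                    # big-endian, as on the slide
print(point_to_wkb_hex(2.0, 4.0, big_endian=False))  # same point, little-endian
```

The two outputs describe the same point but differ in every multi-byte field, so an importer has to read the byte-order flag before anything else.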
@Kartones GeoJSON
{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } }
http://geojson.org/
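A GeoJSON cell is ordinary JSON, so the standard `json` module is enough to read it. The sketch below uses the feature from the slide; the helper `point_lon_lat` is hypothetical. One detail worth keeping in mind is that GeoJSON coordinates are ordered longitude first, then latitude.

```python
import json

# The feature shown on the slide.
FEATURE = """
{ "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
  "properties": { "name": "Dinagat Islands" } }
"""


def point_lon_lat(feature_text):
    """Return (lon, lat, properties) for a GeoJSON Point feature."""
    feature = json.loads(feature_text)
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("expected a Point, got " + geometry["type"])
    # GeoJSON positions are [longitude, latitude, (optional altitude)].
    lon, lat = geometry["coordinates"][:2]
    return lon, lat, feature.get("properties") or {}


lon, lat, properties = point_lon_lat(FEATURE)
print(properties["name"], lon, lat)
```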
@Kartones IMPORT ISSUES
@Kartones Typical: huge files (>1 GB); lots of rows (2M+); lots of columns (~1600); XLS/XLSX -> CSV
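With files of that size, loading everything into memory is rarely an option. One common approach, sketched here under the assumption that rows can be processed independently, is to read the CSV lazily and hand it on in fixed-size batches. `iter_batches` and the sample data are hypothetical names and values for this example.

```python
import csv
import io
from itertools import islice


def iter_batches(lines, batch_size):
    """Yield (header, rows) batches from an iterable of CSV lines.

    Only one batch is held in memory at a time, so the same code works
    for a small test string and for a multi-gigabyte file object.
    """
    reader = csv.reader(lines)
    header = next(reader)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield header, batch


# Hypothetical sample: a header plus five data rows.
sample = io.StringIO("id,value\n" + "".join("%d,v%d\n" % (i, i) for i in range(5)))
for header, rows in iter_batches(sample, batch_size=2):
    print(header, len(rows))
```

For a real file, `lines` would be something like `open(path, newline="", encoding="utf-8")` instead of the `StringIO` used above.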
@Kartones Typical: stream the HTTP-downloaded file; stream the file between servers; stream the data import into the DB
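The streaming idea can be sketched as a chunked copy between two file-like objects, so that memory use stays constant regardless of file size. This is a generic illustration, not CartoDB's implementation; `stream_copy` is a made-up helper, and the checksum is an extra added here to show that work can be done while the data passes through.

```python
import hashlib
import io


def stream_copy(source, destination, chunk_size=64 * 1024):
    """Copy a binary stream chunk by chunk.

    Returns (bytes_copied, sha256_hex). Any objects with read(n) and
    write(data) work, for example an HTTP response and an open file.
    """
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        destination.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()


# In-memory stand-ins for a download and its destination.
source = io.BytesIO(b"lat,lon\n40.4,-3.7\n" * 1000)
destination = io.BytesIO()
print(stream_copy(source, destination, chunk_size=4096))
```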
@Kartones Typical
@Kartones CartoDB-specific: content guessing (e.g. lat/lon); type guessing; fixing geometry errors; sync tables -> no downtime allowed
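The slides do not say how the guessing is implemented, so the following is only a plausible sketch: infer a column type by trying progressively looser casts on sampled values, then look for latitude and longitude columns by name and value range. The candidate column names and both function names are assumptions made for this example.

```python
# Hypothetical lists of column names that hint at coordinates.
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}


def guess_type(values):
    """Guess 'integer', 'float' or 'text' from sampled string values."""
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "text"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for value in non_empty:
                cast(value)
        except ValueError:
            continue
        return name
    return "text"


def guess_lat_lon(header, rows):
    """Return {'lat': index, 'lon': index}, or None if not both found."""
    found = {}
    for index, name in enumerate(header):
        column = [row[index] for row in rows]
        if guess_type(column) == "text":
            continue
        numbers = [float(v) for v in column if v.strip()]
        key = name.strip().lower()
        if key in LAT_NAMES and all(-90 <= n <= 90 for n in numbers):
            found["lat"] = index
        elif key in LON_NAMES and all(-180 <= n <= 180 for n in numbers):
            found["lon"] = index
    return found if len(found) == 2 else None


header = ["name", "Latitude", "lng"]
rows = [["Madrid", "40.4", "-3.7"], ["Barcelona", "41", "2.1"]]
print(guess_lat_lon(header, rows))
```

Python's `int` and `float` are more permissive than a database would be (they accept values such as `1_000` or `nan`), so a production guesser would need stricter checks and a fallback when later rows contradict the sample.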
@Kartones DB-specific: leave DB indexes as the last step; prefer one big INSERT to multiple UPDATEs; GDAL's ogr2ogr > Ruby/Python scripts; http://www.gdal.org/ogr2ogr.html
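The first two points can be illustrated with a small load routine: insert all rows with their final values in one transaction, and create the index only after the data is in place. The sketch uses Python's built-in `sqlite3` purely as a stand-in so that it runs anywhere; the talk is about a PostgreSQL-based stack, and the table and index names here are invented.

```python
import sqlite3


def bulk_load(connection, rows):
    """Load (name, lon, lat) rows, then build the index as the last step."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "CREATE TABLE places (name TEXT, lon REAL, lat REAL)"
        )
        # Rows arrive with their final values, so no per-row UPDATE
        # pass is needed after the insert.
        connection.executemany(
            "INSERT INTO places (name, lon, lat) VALUES (?, ?, ?)", rows
        )
        # Building the index once over the loaded data avoids updating
        # it on every inserted row.
        connection.execute("CREATE INDEX places_name_idx ON places (name)")


connection = sqlite3.connect(":memory:")
bulk_load(connection, [("Madrid", -3.7, 40.4), ("Barcelona", 2.1, 41.4)])
print(connection.execute("SELECT COUNT(*) FROM places").fetchone()[0])
```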
@Kartones Questions? Thanks!