Kartones
October 27, 2015
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
CartoDB
Agenda
  1) CSV Format Issues
  2) Import Issues
CSV FORMAT ISSUES
Intro
  .csv / MIME: text/csv
  Unknown birthdate (80s?)
  RFC 4180 (2005)
Intro
  Plain text
  Simple format
  Simple rules
Usage
CSV
  0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!", {""value"":""es""}
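The doubled quotes in the sample above are the RFC 4180 way of escaping a quote inside a quoted field. A minimal sketch of why a naive split on commas is not enough, using Python's standard `csv` module. The row used here is a made-up, simplified variant of the slide's sample (the `sample` string and the `parse_row` helper are assumptions for illustration, not part of the talk):

```python
import csv
import io
import json


def parse_row(line):
    """Parse one CSV record using RFC 4180 style quoting rules."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(line, newline=""))
    return next(reader)


# Hypothetical row: a short id, a number, text containing a comma,
# and a JSON object whose inner quotes are doubled.
sample = '0101,1083,"jump, jump up!","{""value"":""es""}"\r\n'

naive_fields = sample.split(",")   # breaks the quoted text in two
fields = parse_row(sample)         # keeps quoted fields together
payload = json.loads(fields[3])    # doubled quotes were unescaped

print(len(naive_fields), len(fields))
print(fields[2], payload["value"])
```

The quoted comma and the embedded JSON both survive the `csv` reader, while the naive split produces one field too many.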
WKT: Well-Known Text
  POINT (30 10)
  LINESTRING (30 10, 10 30, 40 40)
  POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
  MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
  MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
  https://en.wikipedia.org/wiki/Well-known_text
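To give a feel for what an importer has to do when a CSV cell contains WKT, here is a deliberately small sketch that only understands the two simplest shapes listed above, POINT and LINESTRING. The function name `parse_simple_wkt` is an assumption for illustration; a real import would more likely rely on GDAL or the database itself.

```python
import re

# Only the two simplest geometry types are recognised in this sketch.
_WKT_RE = re.compile(r"^\s*(POINT|LINESTRING)\s*\((.*)\)\s*$", re.IGNORECASE)


def parse_simple_wkt(text):
    """Return (kind, coordinates) for a 2D POINT or LINESTRING."""
    match = _WKT_RE.match(text)
    if match is None:
        raise ValueError("unsupported or malformed WKT: %r" % text)
    kind = match.group(1).upper()
    coords = []
    for pair in match.group(2).split(","):
        x, y = pair.split()  # raises ValueError unless exactly two numbers
        coords.append((float(x), float(y)))
    if kind == "POINT":
        if len(coords) != 1:
            raise ValueError("a POINT needs exactly one coordinate pair")
        return kind, coords[0]
    return kind, coords


print(parse_simple_wkt("POINT (30 10)"))
print(parse_simple_wkt("LINESTRING (30 10, 10 30, 40 40)"))
```

Polygons and multi-geometries need nested parenthesis handling, which is where hand-written parsers tend to get fragile.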
WKB: Well-Known Binary
  POINT(2.0 4.0) = 000000000140000000000000004010000000000000
  https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
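The hex string above can be unpacked by hand: one byte for the byte order, a 4-byte geometry type, then two 8-byte doubles. The sketch below decodes 2D points only and also accepts the PostGIS "EWKB" variant, which sets the flag `0x20000000` on the type and inserts a 4-byte SRID. The helper name `decode_wkb_point` is an assumption for illustration.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # PostGIS extension: an SRID follows the type
WKB_POINT = 1


def decode_wkb_point(hex_string):
    """Decode a hex-encoded 2D point; returns (srid_or_None, x, y)."""
    data = bytes.fromhex(hex_string)
    # First byte: 0 = big-endian (XDR), 1 = little-endian (NDR).
    endian = ">" if data[0] == 0 else "<"
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
    if (geom_type & ~EWKB_SRID_FLAG) != WKB_POINT:
        raise ValueError("only 2D points are handled in this sketch")
    x, y = struct.unpack_from(endian + "dd", data, offset)
    return srid, x, y


# The big-endian example from the slide.
print(decode_wkb_point("000000000140000000000000004010000000000000"))

# The geometry cell from the CSV sample slide looks like little-endian EWKB.
print(decode_wkb_point("0101000020E610000000000000008049C000000000000038C0"))
```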
GeoJSON
  {
    "type": "Feature",
    "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
    "properties": { "name": "Dinagat Islands" }
  }
  http://geojson.org/
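A GeoJSON cell can be read with any JSON parser; the detail that tends to bite is that coordinates are ordered longitude first, then latitude. A minimal sketch that turns the feature above into a WKT point (the helper name `feature_to_wkt_point` is an assumption for illustration):

```python
import json

FEATURE_TEXT = """
{
  "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
  "properties": { "name": "Dinagat Islands" }
}
"""


def feature_to_wkt_point(text):
    """Return (wkt, properties) for a GeoJSON Feature holding a Point."""
    feature = json.loads(text)
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    # GeoJSON order is [longitude, latitude], which maps to WKT's "x y".
    lon, lat = geometry["coordinates"][:2]
    return "POINT ({} {})".format(lon, lat), feature.get("properties", {})


wkt, properties = feature_to_wkt_point(FEATURE_TEXT)
print(wkt, properties["name"])
```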
IMPORT ISSUES
Typical
  Huge files (>1GB)
  Lots of rows (+2M)
  Lots of columns (~1600)
  XLS/XLSX -> CSV
Typical
  Stream HTTP downloaded file
  Stream file between servers
  Stream data import to DB
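The streaming points above share one idea: never hold a multi-gigabyte file in memory, move it in fixed-size chunks instead. A minimal sketch, assuming any pair of file-like objects (the `stream_copy` helper and the 64 KiB chunk size are illustrative choices, not taken from the talk):

```python
import io

CHUNK_SIZE = 64 * 1024  # 64 KiB per read; an arbitrary illustrative value


def stream_copy(source, destination, chunk_size=CHUNK_SIZE):
    """Copy from one binary file-like object to another in chunks.

    Memory use stays around chunk_size regardless of the total size.
    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# In-memory stand-ins for a download and a target file.
fake_download = io.BytesIO(b"id,lat,lon\n" * 100000)
target = io.BytesIO()
copied = stream_copy(fake_download, target)
print(copied)
```

The same function works with the response object returned by `urllib.request.urlopen` as the source and a file opened in `"wb"` mode as the destination, since both expose `read` and `write`.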
Typical
CartoDB-specific
  Content guessing (e.g. lat/lon)
  Type guessing
  Fixing geometry errors
  Sync tables -> No downtime allowed
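Content and type guessing can be approximated with simple heuristics over a sample of rows. The sketch below is only an illustration of the idea and makes no claim about how CartoDB implements it; the column name sets, the function names and the rules are all assumptions.

```python
# Hypothetical header names that hint at coordinate columns.
LAT_NAMES = {"lat", "latitude", "y"}
LON_NAMES = {"lon", "lng", "long", "longitude", "x"}


def _as_float(value):
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def guess_column_type(values):
    """Return 'integer', 'float' or 'text' for a list of string cells."""
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "text"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for value in non_empty:
                caster(value)
            return name
        except ValueError:
            continue
    return "text"


def guess_lat_lon(header, rows):
    """Return (lat_index, lon_index) or None, using names plus value ranges."""
    found = {}
    for index, name in enumerate(header):
        key = name.strip().lower()
        numbers = [_as_float(row[index]) for row in rows]
        if any(n is None for n in numbers):
            continue  # a non-numeric cell rules the column out
        if key in LAT_NAMES and all(-90 <= n <= 90 for n in numbers):
            found.setdefault("lat", index)
        elif key in LON_NAMES and all(-180 <= n <= 180 for n in numbers):
            found.setdefault("lon", index)
    if "lat" in found and "lon" in found:
        return found["lat"], found["lon"]
    return None


header = ["name", "Latitude", "lng"]
rows = [["Madrid", "40.4", "-3.7"], ["Barcelona", "41.4", "2.2"]]
print(guess_lat_lon(header, rows))
print([guess_column_type([row[i] for row in rows]) for i in range(3)])
```

Checking the value ranges as well as the header name avoids treating a column called "x" that holds arbitrary numbers as a longitude.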
DB-Specific
  Create DB indexes as the last step
  Prefer one big INSERT to multiple UPDATEs
  GDAL's ogr2ogr > Ruby/Python scripts
  http://www.gdal.org/ogr2ogr.html
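The first two points can be sketched as "load everything in one batch, then build the index once". The example uses Python's built-in `sqlite3` purely as a stand-in so it runs anywhere; the talk is about a PostgreSQL-backed service, and the table, column and index names here are assumptions for illustration.

```python
import sqlite3


def bulk_load(rows):
    """Insert all rows in a single transaction, then create the index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
    # One transaction for the whole batch instead of a commit per row.
    with conn:
        conn.executemany("INSERT INTO places VALUES (?, ?, ?)", rows)
    # The index is built once, after the data is in place, so it is not
    # updated again and again during the load.
    with conn:
        conn.execute("CREATE INDEX places_lat_lon ON places (lat, lon)")
    return conn


conn = bulk_load([("Madrid", 40.4, -3.7), ("Barcelona", 41.4, 2.2)])
print(conn.execute("SELECT COUNT(*) FROM places").fetchone()[0])
```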
Questions?
Thanks!