Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
46
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
42
Python static typing with MyPy
kartones
0
64
High-impact refactors keeping the lights on
kartones
0
63
Remote Work
kartones
0
82
Intro to GameBoy Development
kartones
0
94
Myths & The Real World of OpenSource Development
kartones
0
45
CartoDB Tech Intro
kartones
0
44
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
110
Other Decks in Programming
See All in Programming
知られざるDMMデータエンジニアの生態 〜かつてツチノコと呼ばれし者〜
takaha4k
1
450
Azure AI Foundryのご紹介
qt_luigi
1
210
Androidアプリのモジュール分割における:x:commonを考える
okuzawats
1
280
令和7年版 あなたが使ってよいフロントエンド機能とは
mugi_uno
10
5.2k
テストコードのガイドライン 〜作成から運用まで〜
riku929hr
7
1.4k
いりゃあせ、PHPカンファレンス名古屋2025 / Welcome to PHP Conference Nagoya 2025
ttskch
1
180
責務を分離するための例外設計 - PHPカンファレンス 2024
kajitack
9
2.4k
.NETでOBS Studio操作してみたけど…… / Operating OBS Studio by .NET
skasweb
0
120
最近のVS Codeで気になるニュース 2025/01
74th
1
100
為你自己學 Python
eddie
0
520
アクターシステムに頼らずEvent Sourcingする方法について
j5ik2o
6
700
Package Traits
ikesyo
1
210
Featured
See All Featured
Visualization
eitanlees
146
15k
Making the Leap to Tech Lead
cromwellryan
133
9k
[RailsConf 2023 Opening Keynote] The Magic of Rails
eileencodes
28
9.2k
Dealing with People You Can't Stand - Big Design 2015
cassininazir
365
25k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
44
9.4k
Building a Modern Day E-commerce SEO Strategy
aleyda
38
7k
Art, The Web, and Tiny UX
lynnandtonic
298
20k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
28
2.2k
Fantastic passwords and where to find them - at NoRuKo
philnash
50
2.9k
Chrome DevTools: State of the Union 2024 - Debugging React & Beyond
addyosmani
3
240
Fashionably flexible responsive web design (full day workshop)
malarkey
406
66k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
132
33k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]