Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
52
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
43
Python static typing with MyPy
kartones
0
69
High-impact refactors keeping the lights on
kartones
0
65
Remote Work
kartones
0
91
Intro to GameBoy Development
kartones
0
98
Myths & The Real World of OpenSource Development
kartones
0
45
CartoDB Tech Intro
kartones
0
48
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
120
Other Decks in Programming
See All in Programming
スケールする組織の実現に向けた インナーソース育成術 - ISGT2025
teamlab
PRO
3
200
「社内LT会」を1年続けてみた! / Our Year-Long Journey of Internal Lightning Talks
mackey0225
1
110
半自動E2Eで手っ取り早くリグレッションテストを効率化しよう
beryu
6
1.6k
「待たせ上手」なスケルトンスクリーン、 そのUXの裏側
teamlab
PRO
0
700
Cache Me If You Can
ryunen344
2
6.7k
プロパティベーステストによるUIテスト: LLMによるプロパティ定義生成でエッジケースを捉える
tetta_pdnt
0
6.5k
The Past, Present, and Future of Enterprise Java with ASF in the Middle
ivargrimstad
0
390
Performance for Conversion! 分散トレーシングでボトルネックを 特定せよ
inetand
0
5.7k
CSC305 Lecture 01
javiergs
PRO
1
370
パフォーマンスチューニングで Web 技術を深掘り直す
progfay
17
4.4k
Introducing FrankenPHP gRPC
dunglas
2
950
Swiftビルド弾丸ツアー - Swift Buildが作る新しいエコシステム
giginet
PRO
0
1k
Featured
See All Featured
VelocityConf: Rendering Performance Case Studies
addyosmani
332
24k
Cheating the UX When There Is Nothing More to Optimize - PixelPioneers
stephaniewalter
285
14k
Become a Pro
speakerdeck
PRO
29
5.5k
Context Engineering - Making Every Token Count
addyosmani
3
100
4 Signs Your Business is Dying
shpigford
185
22k
Fashionably flexible responsive web design (full day workshop)
malarkey
407
66k
Unsuck your backbone
ammeep
671
58k
Balancing Empowerment & Direction
lara
4
660
ピンチをチャンスに:未来をつくるプロダクトロードマップ #pmconf2020
aki_iinuma
127
53k
Producing Creativity
orderedlist
PRO
347
40k
Connecting the Dots Between Site Speed, User Experience & Your Business [WebExpo 2025]
tammyeverts
8
550
GraphQLとの向き合い方2022年版
quramy
49
14k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]