Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
47
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
43
Python static typing with MyPy
kartones
0
64
High-impact refactors keeping the lights on
kartones
0
63
Remote Work
kartones
0
85
Intro to GameBoy Development
kartones
0
94
Myths & The Real World of OpenSource Development
kartones
0
45
CartoDB Tech Intro
kartones
0
44
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
110
Other Decks in Programming
See All in Programming
AHC041解説
terryu16
0
590
Flutter × Firebase Genkit で加速する生成 AI アプリ開発
coborinai
0
150
Djangoアプリケーション 運用のリアル 〜問題発生から可視化、最適化への道〜 #pyconshizu
kashewnuts
1
230
Linux && Docker 研修/Linux && Docker training
forrep
23
4.5k
[JAWS-UG横浜 #79] re:Invent 2024 の DB アップデートは Multi-Region!
maroon1st
1
140
『品質』という言葉が嫌いな理由
korimu
0
160
Honoのおもしろいミドルウェアをみてみよう
yusukebe
1
200
Ruby on cygwin 2025-02
fd0
0
140
CNCF Project の作者が考えている OSS の運営
utam0k
5
690
なぜイベント駆動が必要なのか - CQRS/ESで解く複雑系システムの課題 -
j5ik2o
7
2.5k
技術を根付かせる / How to make technology take root
kubode
1
240
Kubernetes History Inspector(KHI)を触ってみた
bells17
0
200
Featured
See All Featured
Raft: Consensus for Rubyists
vanstee
137
6.8k
Writing Fast Ruby
sferik
628
61k
Optimising Largest Contentful Paint
csswizardry
34
3.1k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
31
2.1k
The Illustrated Children's Guide to Kubernetes
chrisshort
48
49k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
We Have a Design System, Now What?
morganepeng
51
7.4k
Fontdeck: Realign not Redesign
paulrobertlloyd
82
5.4k
Helping Users Find Their Own Way: Creating Modern Search Experiences
danielanewman
29
2.4k
Visualizing Your Data: Incorporating Mongo into Loggly Infrastructure
mongodb
44
9.4k
RailsConf & Balkan Ruby 2019: The Past, Present, and Future of Rails at GitHub
eileencodes
132
33k
StorybookのUI Testing Handbookを読んだ
zakiyama
28
5.5k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]