Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones
GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
CartoDB
Agenda
  1) CSV Format Issues
  2) Import Issues
CSV FORMAT ISSUES
Intro
  .csv / MIME: text/csv
  Unknown birthdate (80s?)
  RFC 4180 (2005)
Intro
  Plain text
  Simple format
  Simple rules
Usage
CSV
  0101000020E610000000000000008049C000000000000038C0,1083
  "alien",2014-11-04 15:24:40.43413+00
  category 1, "jump jump up!", {""value"":""es""}
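The sample values above mix a hex-encoded geometry, quoted strings, and a JSON value whose inner quotes are doubled. A minimal sketch of how such a row parses, assuming RFC 4180 quoting and Python's standard `csv` module; the row itself is hypothetical, assembled from the slide's values for illustration:

```python
import csv
import io

# Hypothetical row assembled from the values on the slide: a hex geometry,
# an integer, a quoted string, a timestamp, and a JSON value whose inner
# quotes are doubled as RFC 4180 requires.
raw = (
    '0101000020E610000000000000008049C000000000000038C0,1083,"alien",'
    '2014-11-04 15:24:40.43413+00,"jump jump up!","{""value"":""es""}"\r\n'
)

def parse_rows(text):
    """Parse CSV text into lists of strings using RFC 4180-style quoting."""
    return list(csv.reader(io.StringIO(text, newline="")))

rows = parse_rows(raw)
for field in rows[0]:
    print(repr(field))
```

The last field comes back as `{"value":"es"}`: the doubled quotes collapse into single ones, so the JSON can be parsed in a second step.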
WKT: Well-Known Text
  POINT (30 10)
  LINESTRING (30 10, 10 30, 40 40)
  POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
  MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
  MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
  https://en.wikipedia.org/wiki/Well-known_text
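To give a feel for what reading WKT out of a CSV cell involves, here is a rough sketch that handles only the two simplest shapes from the slide, POINT and LINESTRING. The helper name `parse_simple_wkt` is made up for this example; a real import would rely on a geometry library rather than a regular expression.

```python
import re

# Matches e.g. "POINT (30 10)" or "LINESTRING (30 10, 10 30, 40 40)".
_WKT_RE = re.compile(r"^\s*(POINT|LINESTRING)\s*\(([^()]*)\)\s*$", re.IGNORECASE)

def parse_simple_wkt(text):
    """Return (geometry_type, [(x, y), ...]) for a POINT or LINESTRING.

    Hypothetical helper: nested types (POLYGON, MULTI*) are out of scope.
    """
    match = _WKT_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {text!r}")
    geom_type = match.group(1).upper()
    coords = []
    for pair in match.group(2).split(","):
        x, y = pair.split()  # raises ValueError unless exactly two numbers
        coords.append((float(x), float(y)))
    if geom_type == "POINT" and len(coords) != 1:
        raise ValueError("a POINT must have exactly one coordinate pair")
    return geom_type, coords

print(parse_simple_wkt("POINT (30 10)"))
print(parse_simple_wkt("LINESTRING (30 10, 10 30, 40 40)"))
```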
WKB: Well-Known Binary
  POINT(2.0 4.0) = 000000000140000000000000004010000000000000
  https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
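The hex string above breaks down as one byte-order byte, a 4-byte geometry type, and two 8-byte doubles. A small sketch of decoding it with Python's `struct` module follows. It also accepts the PostGIS EWKB variant, where a flag bit in the type announces a 4-byte SRID, since the hex value in the earlier CSV sample has that shape. The function name is an assumption made for this example.

```python
import struct

WKB_POINT = 1
EWKB_SRID_FLAG = 0x20000000  # PostGIS extension: an SRID follows the type

def decode_wkb_point(hex_text):
    """Decode a hex-encoded WKB/EWKB 2D point into (x, y, srid_or_None)."""
    data = bytes.fromhex(hex_text)
    # First byte is the byte order: 0 = big-endian, 1 = little-endian.
    endian = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
        geom_type &= ~EWKB_SRID_FLAG
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point (type {geom_type})")
    x, y = struct.unpack_from(endian + "dd", data, offset)
    return x, y, srid

# The WKB example from the slide: prints (2.0, 4.0, None)
print(decode_wkb_point("000000000140000000000000004010000000000000"))
# The hex geometry from the CSV sample: prints (-51.0, -24.0, 4326)
print(decode_wkb_point("0101000020E610000000000000008049C000000000000038C0"))
```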
GeoJSON
  { "type": "Feature",
    "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
    "properties": { "name": "Dinagat Islands" } }
  http://geojson.org/
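When a CSV cell carries GeoJSON like the feature above, the importer has to parse the JSON and pull out the geometry. A minimal sketch with the standard `json` module, assuming only Point geometries; `point_feature_to_wkt` is a hypothetical helper, not part of any library:

```python
import json

feature_text = """
{ "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
  "properties": { "name": "Dinagat Islands" } }
"""

def point_feature_to_wkt(text):
    """Return (wkt, properties) for a GeoJSON Feature holding a Point."""
    feature = json.loads(text)
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    # GeoJSON stores coordinates as [longitude, latitude].
    lon, lat = geometry["coordinates"][:2]
    return f"POINT ({lon} {lat})", feature.get("properties") or {}

wkt, props = point_feature_to_wkt(feature_text)
print(wkt, props["name"])
```

Note the axis order: GeoJSON puts longitude first, which is an easy thing to get backwards when the source columns are named lat/lon.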
IMPORT ISSUES
Typical
  Huge files (>1GB)
  Lots of rows (+2M)
  Lots of columns (~1600)
  XLS/XLSX -> CSV
Typical
  Stream HTTP downloaded file
  Stream file between servers
  Stream data import to DB
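The common thread in the streaming points above is never holding the whole file in memory. One possible shape for that, sketched as a generator that reads rows lazily and hands them on in fixed-size batches; the function name and batch size are assumptions, and the `StringIO` object stands in for a large file or a streamed HTTP body:

```python
import csv
import io
from itertools import islice

def iter_batches(text_stream, batch_size=1000):
    """Yield (header, rows) batches from a CSV text stream, lazily."""
    reader = csv.reader(text_stream)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        yield header, batch

# Stand-in for a large file or a streamed HTTP response body.
sample = io.StringIO("id,name\n1,a\n2,b\n3,c\n")
for header, rows in iter_batches(sample, batch_size=2):
    print(header, rows)
```

Each batch can then be written to the database before the next one is read, so memory use depends on the batch size rather than the file size.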
Typical
CartoDB-specific
  Content guessing (e.g. lat/lon)
  Type guessing
  Fixing geometry errors
  Sync tables -> No downtime allowed
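Content and type guessing can be illustrated with a toy heuristic. The sketch below is not CartoDB's implementation; the header names, the three type labels, and both function names are assumptions chosen for the example. It guesses a column type by trying casts on sample values, and looks for latitude/longitude columns by header name and value range.

```python
# Hypothetical header names treated as coordinate hints.
LAT_NAMES = {"lat", "latitude", "y"}
LON_NAMES = {"lon", "lng", "long", "longitude", "x"}

def guess_type(values):
    """Guess 'integer', 'float' or 'text' from a sample of string values."""
    non_empty = [v for v in values if v.strip()]
    if not non_empty:
        return "text"
    for name, cast in (("integer", int), ("float", float)):
        try:
            for v in non_empty:
                cast(v)
            return name
        except ValueError:
            continue
    return "text"

def guess_lat_lon(header, rows):
    """Return (lat_index, lon_index) or None, using names and value ranges."""
    lowered = [h.strip().lower() for h in header]
    lat = next((i for i, h in enumerate(lowered) if h in LAT_NAMES), None)
    lon = next((i for i, h in enumerate(lowered) if h in LON_NAMES), None)
    if lat is None or lon is None:
        return None
    try:
        ok = all(
            abs(float(r[lat])) <= 90 and abs(float(r[lon])) <= 180 for r in rows
        )
    except ValueError:
        return None
    return (lat, lon) if ok else None

print(guess_type(["1", "2", ""]))
print(guess_lat_lon(["name", "Latitude", "Longitude"], [["Madrid", "40.4", "-3.7"]]))
```

Guesses like these are made on a sample of rows, so a real importer needs a fallback for when a later row contradicts the guess.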
DB-Specific
  Leave DB indexes as last step
  Prefer big INSERT to multiple UPDATE
  GDAL’s ogr2ogr > Ruby/Python scripts
  http://www.gdal.org/ogr2ogr.html
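The first two points can be shown in miniature. This sketch uses Python's built-in `sqlite3` as a stand-in for the real database, which is an assumption made so the example runs anywhere; the table and index names are invented. It loads all rows in one batched insert inside a single transaction and only then creates the index.

```python
import sqlite3

def bulk_load(conn, rows):
    """Load rows in one transaction, then build the index as the last step."""
    conn.execute("CREATE TABLE points (id INTEGER, lat REAL, lon REAL)")
    with conn:  # one transaction for the whole batch
        conn.executemany("INSERT INTO points VALUES (?, ?, ?)", rows)
    # Creating the index after loading avoids updating it on every insert.
    conn.execute("CREATE INDEX points_lat_lon ON points (lat, lon)")
    return conn.execute("SELECT COUNT(*) FROM points").fetchone()[0]

conn = sqlite3.connect(":memory:")
print(bulk_load(conn, [(1, 40.4, -3.7), (2, 41.4, 2.2)]))
```

The same ordering applies to other databases: load first, index afterwards, so the index is built once over the finished table.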
Questions? Thanks!