Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
46
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
41
Python static typing with MyPy
kartones
0
63
High-impact refactors keeping the lights on
kartones
0
60
Remote Work
kartones
0
82
Intro to GameBoy Development
kartones
0
94
Myths & The Real World of OpenSource Development
kartones
0
45
CartoDB Tech Intro
kartones
0
42
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
110
Other Decks in Programming
See All in Programming
17年周年のWebアプリケーションにTanStack Queryを導入する / Implementing TanStack Query in a 17th Anniversary Web Application
saitolume
0
250
プロダクトの品質に コミットする / Commit to Product Quality
pekepek
2
770
ソフトウェアの振る舞いに着目し 複雑な要件の開発に立ち向かう
rickyban
0
890
Keeping it Ruby: Why Your Product Needs a Ruby SDK - RubyWorld 2024
envek
0
180
Symfony Mapper Component
soyuka
2
730
アクターシステムに頼らずEvent Sourcingする方法について
j5ik2o
4
180
たのしいparse.y
ydah
3
120
Haze - Real time background blurring
chrisbanes
1
500
Criando Commits Incríveis no Git
marcelgsantos
2
170
Recoilを剥がしている話
kirik
5
6.6k
複雑な仕様に立ち向かうアーキテクチャ
myohei
0
170
The Efficiency Paradox and How to Save Yourself and the World
hollycummins
1
440
Featured
See All Featured
Site-Speed That Sticks
csswizardry
2
190
jQuery: Nuts, Bolts and Bling
dougneiner
61
7.5k
Become a Pro
speakerdeck
PRO
26
5k
Fireside Chat
paigeccino
34
3.1k
Save Time (by Creating Custom Rails Generators)
garrettdimon
PRO
28
900
[Rails World 2023 - Day 1 Closing Keynote] - The Magic of Rails
eileencodes
33
1.9k
Building a Scalable Design System with Sketch
lauravandoore
460
33k
Bootstrapping a Software Product
garrettdimon
PRO
305
110k
Making Projects Easy
brettharned
116
5.9k
Responsive Adventures: Dirty Tricks From The Dark Corners of Front-End
smashingmag
251
21k
VelocityConf: Rendering Performance Case Studies
addyosmani
326
24k
The Web Performance Landscape in 2024 [PerfNow 2024]
tammyeverts
2
290
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]