Upgrade to PRO for Only $50/Year—Limited-Time Offer! 🔥
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
46
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
41
Python static typing with MyPy
kartones
0
63
High-impact refactors keeping the lights on
kartones
0
60
Remote Work
kartones
0
81
Intro to GameBoy Development
kartones
0
94
Myths & The Real World of OpenSource Development
kartones
0
45
CartoDB Tech Intro
kartones
0
42
Copy Protection & Cracking History
kartones
0
120
Cómo ganar dinero con tus juegos online
kartones
1
110
Other Decks in Programming
See All in Programming
tidymodelsによるtidyな生存時間解析 / Japan.R2024
dropout009
1
450
Jakarta EE meets AI
ivargrimstad
0
760
CSC305 Lecture 23
javiergs
PRO
0
120
Arm移行タイムアタック
qnighy
0
400
事業成長を爆速で進めてきたプロダクトエンジニアたちの成功談・失敗談
nealle
3
1.2k
かんたんデザイン編集やってみた~「完全に理解した」までの道のり~
morit4ryo
1
110
Serverless苦闘史
mosh_inc
0
140
PipeCDの歩き方
kuro_kurorrr
4
140
今からはじめるAndroidアプリ開発 2024 / DevFest 2024
star_zero
0
710
flutterkaigi_2024.pdf
kyoheig3
0
450
Reckoner における Datadog Browser Test の活用事例 / Datadog Browser Test at Reckoner
nomadblacky
0
190
テスト自動化失敗から再挑戦しチームにオーナーシップを委譲した話/STAC2024 macho
ma_cho29
0
440
Featured
See All Featured
Into the Great Unknown - MozCon
thekraken
33
1.5k
Put a Button on it: Removing Barriers to Going Fast.
kastner
59
3.6k
Writing Fast Ruby
sferik
627
61k
Music & Morning Musume
bryan
46
6.2k
The Language of Interfaces
destraynor
154
24k
4 Signs Your Business is Dying
shpigford
181
21k
The MySQL Ecosystem @ GitHub 2015
samlambert
250
12k
Templates, Plugins, & Blocks: Oh My! Creating the theme that thinks of everything
marktimemedia
27
2.1k
Large-scale JavaScript Application Architecture
addyosmani
510
110k
A Tale of Four Properties
chriscoyier
157
23k
How to train your dragon (web standard)
notwaldorf
88
5.7k
Visualization
eitanlees
145
15k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]