Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
Programming
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
CartoDB
Agenda: 1) CSV Format Issues 2) Import Issues
CSV FORMAT ISSUES
Intro: .csv / MIME: text/csv; unknown birthdate (80s?); RFC 4180 (2005)
Intro: Plain text; simple format; simple rules
Usage
CSV: 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!", {""value"":""es""}
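The fragments on this slide hint at why CSV is harder than it looks: binary geometry as hex, quoted text, timestamps with offsets, and JSON with doubled quotes. A minimal sketch using Python's standard `csv` module follows. The row layout in `SAMPLE` is an assumption assembled from the slide's fragments, not the exact file shown in the talk, and `parse_rows` is a hypothetical helper.

```python
import csv
import io
import json

# Illustrative row assembled from the slide's fragments (column layout is assumed).
SAMPLE = (
    '0101000020E610000000000000008049C000000000000038C0,1083,"alien",'
    '2014-11-04 15:24:40.43413+00,category 1,"jump jump up!",'
    '"{""value"":""es""}"\n'
)


def parse_rows(text, **fmt):
    """Parse CSV text with the standard library reader (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), **fmt))


row = parse_rows(SAMPLE)[0]

# Inside a quoted field, RFC 4180 escapes a quote by doubling it,
# so the last column comes back as valid JSON.
payload = json.loads(row[6])

# A space after the delimiter changes how a quoted field is read:
# by default the quotes become part of the value.
strict = parse_rows('1, "jump jump up!"\n')[0]
lenient = parse_rows('1, "jump jump up!"\n', skipinitialspace=True)[0]
```

Here `strict[1]` keeps the leading space and the quote characters, while `lenient[1]` is the bare text, which is one of the small dialect differences an importer has to decide on.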
WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
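To make the WKT examples concrete, here is a minimal sketch that reads only the two simplest shapes, POINT and LINESTRING. The function name `parse_simple_wkt` is hypothetical, and a real importer would more likely rely on an existing geometry library than on a regular expression.

```python
import re

# Only the two flat geometry types; nested ones (POLYGON, MULTI*) are rejected.
_WKT = re.compile(r"^\s*(POINT|LINESTRING)\s*\(\s*(.*?)\s*\)\s*$", re.IGNORECASE)


def parse_simple_wkt(text):
    """Return (kind, coordinates) for a POINT or LINESTRING in WKT."""
    match = _WKT.match(text)
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {text!r}")
    kind = match.group(1).upper()
    # Coordinates are space-separated numbers; pairs are comma-separated.
    coords = [
        tuple(float(number) for number in pair.split())
        for pair in match.group(2).split(",")
    ]
    if kind == "POINT":
        if len(coords) != 1:
            raise ValueError("a POINT holds exactly one coordinate pair")
        return kind, coords[0]
    return kind, coords
```

For the slide's first two examples this yields `("POINT", (30.0, 10.0))` and a three-pair LINESTRING; the POLYGON and MULTI* examples raise `ValueError` by design.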
WKB: Well-Known Binary
POINT(2.0 4.0) = 000000000140000000000000004010000000000000
https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
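The hex string above can be taken apart by hand: one byte for byte order, four bytes for the geometry type, then two 8-byte doubles. A minimal sketch with the standard `struct` module, assuming plain 2D WKB points only (no SRID or Z/M extensions); `decode_wkb_point` is a hypothetical name.

```python
import struct

# The slide's example, split into its fields for readability.
WKB_HEX = (
    "00"                # byte order: 00 = big-endian, 01 = little-endian
    "00000001"          # geometry type 1 = Point
    "4000000000000000"  # x as IEEE 754 double
    "4010000000000000"  # y as IEEE 754 double
)


def decode_wkb_point(hex_text):
    """Decode a plain 2D WKB point given as a hex string into (x, y)."""
    raw = bytes.fromhex(hex_text)
    order = "<" if raw[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", raw, 1)
    if geom_type != 1:
        raise ValueError(f"not a plain 2D point (type {geom_type})")
    x, y = struct.unpack_from(order + "dd", raw, 5)
    return x, y
```

Calling `decode_wkb_point(WKB_HEX)` recovers the coordinates of `POINT(2.0 4.0)`.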
GeoJSON: { "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } }
http://geojson.org/
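One detail worth spelling out is that GeoJSON stores coordinates as longitude first, then latitude, which is easy to swap by accident when importing. A minimal sketch with the standard `json` module; `point_lon_lat` is a hypothetical helper that handles Point features only.

```python
import json

# The feature from the slide.
FEATURE = """
{ "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
  "properties": { "name": "Dinagat Islands" } }
"""


def point_lon_lat(feature_text):
    """Return (longitude, latitude) from a GeoJSON Point feature."""
    feature = json.loads(feature_text)
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("only Point geometries are handled in this sketch")
    # GeoJSON order is [longitude, latitude], optionally followed by altitude.
    lon, lat = geometry["coordinates"][:2]
    return lon, lat
```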
IMPORT ISSUES
Typical: Huge files (>1GB); lots of rows (+2M); lots of columns (~1600); XLS/XLSX -> CSV
Typical: Stream HTTP downloaded file; stream file between servers; stream data import to DB
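The streaming idea can be sketched as reading the CSV in fixed-size batches so that a multi-gigabyte file never sits in memory at once. This is an illustration under assumptions, not CartoDB's importer: `iter_batches` and the batch size are hypothetical, and `lines` can be any iterable of text lines, such as an open file or a decoded HTTP response.

```python
import csv
import io
from itertools import islice


def iter_batches(lines, batch_size=1000):
    """Yield (header, rows) batches from an iterable of CSV text lines."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return
    while True:
        # islice pulls at most batch_size rows without reading further ahead.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield header, batch


# Usage with an in-memory stand-in for a large file.
source = io.StringIO("name,lat\na,1\nb,2\nc,3\n")
batches = [rows for _header, rows in iter_batches(source, batch_size=2)]
```

Each batch can then be handed to the database step before the next one is read.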
Typical
CartoDB-specific: Content guessing (e.g. lat/lon); type guessing; fixing geometry errors; sync tables -> no downtime allowed
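Content and type guessing can be illustrated with two small heuristics: try progressively looser casts on a sample of values, and look for well-known column names for coordinates. Everything here is an assumption for illustration (the function names, the name lists, the three type labels); the slides do not describe how CartoDB actually guesses.

```python
# Hypothetical column names treated as coordinates.
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}


def guess_type(values):
    """Guess 'integer', 'float' or 'text' from sample string values."""
    non_empty = [value.strip() for value in values if value.strip()]
    if not non_empty:
        return "text"
    # Try the strictest type first and fall back to looser ones.
    for name, cast in (("integer", int), ("float", float)):
        try:
            for value in non_empty:
                cast(value)
        except ValueError:
            continue
        return name
    return "text"


def guess_lat_lon(header):
    """Return (lat_column, lon_column) if both can be found by name, else None."""
    lat = next((col for col in header if col.strip().lower() in LAT_NAMES), None)
    lon = next((col for col in header if col.strip().lower() in LON_NAMES), None)
    return (lat, lon) if lat and lon else None
```

Guessing from a sample is cheap but can be wrong further down the file, so an importer built this way still needs a fallback when a later row fails to cast.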
DB-Specific: Leave DB indexes as last step; prefer big INSERT to multiple UPDATE; GDAL’s ogr2ogr > Ruby/Python scripts
http://www.gdal.org/ogr2ogr.html
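The first two points can be sketched as: load all rows in one batched insert, then build the index once at the end. To keep the example runnable without a server it uses the standard `sqlite3` module as a stand-in; the talk concerns CartoDB's own database, and the table, column and index names here are hypothetical.

```python
import sqlite3


def bulk_load(rows):
    """Insert rows in a single transaction, then create the index last."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE places (name TEXT, lat REAL, lon REAL)")
    # One transaction for the whole batch instead of many per-row statements.
    with conn:
        conn.executemany(
            "INSERT INTO places (name, lat, lon) VALUES (?, ?, ?)", rows
        )
    # Building the index after loading avoids updating it on every insert.
    conn.execute("CREATE INDEX places_lat_lon ON places (lat, lon)")
    return conn


conn = bulk_load([("a", 1.0, 2.0), ("b", 3.0, 4.0)])
row_count = conn.execute("SELECT COUNT(*) FROM places").fetchone()[0]
```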
Questions? Thanks!