Upgrade to Pro
— share decks privately, control downloads, hide ads and more …
Speaker Deck
Features
Speaker Deck
PRO
Sign in
Sign up for free
Search
Search
Geospatial CSV Imports Hidden Complexity
Search
Kartones
October 27, 2015
Programming
0
46
Geospatial CSV Imports Hidden Complexity
@ Mindcamp 7.0 2015, Madrid
Kartones
October 27, 2015
Tweet
Share
More Decks by Kartones
See All by Kartones
Building Autonomous Agents with gym-retro
kartones
0
41
Python static typing with MyPy
kartones
0
63
High-impact refactors keeping the lights on
kartones
0
60
Remote Work
kartones
0
80
Intro to GameBoy Development
kartones
0
92
Myths & The Real World of OpenSource Development
kartones
0
45
CartoDB Tech Intro
kartones
0
42
Copy Protection & Cracking History
kartones
0
110
Cómo ganar dinero con tus juegos online
kartones
1
110
Other Decks in Programming
See All in Programming
3 Effective Rules for Using Signals in Angular
manfredsteyer
PRO
0
100
Pinia Colada が実現するスマートな非同期処理
naokihaba
4
220
タクシーアプリ『GO』のリアルタイムデータ分析基盤における機械学習サービスの活用
mot_techtalk
4
1.4k
ペアーズにおけるAmazon Bedrockを⽤いた障害対応⽀援 ⽣成AIツールの導⼊事例 @ 20241115配信AWSウェビナー登壇
fukubaka0825
6
1.8k
Compose 1.7のTextFieldはPOBox Plusで日本語変換できない
tomoya0x00
0
190
みんなでプロポーザルを書いてみた
yuriko1211
0
260
リアーキテクチャxDDD 1年間の取り組みと進化
hsawaji
1
220
Macとオーディオ再生 2024/11/02
yusukeito
0
370
Contemporary Test Cases
maaretp
0
130
Jakarta EE meets AI
ivargrimstad
0
510
AWS Lambdaから始まった Serverlessの「熱」とキャリアパス / It started with AWS Lambda Serverless “fever” and career path
seike460
PRO
1
250
Outline View in SwiftUI
1024jp
1
320
Featured
See All Featured
Ruby is Unlike a Banana
tanoku
97
11k
I Don’t Have Time: Getting Over the Fear to Launch Your Podcast
jcasabona
28
2k
Fontdeck: Realign not Redesign
paulrobertlloyd
82
5.2k
Keith and Marios Guide to Fast Websites
keithpitt
409
22k
How to Ace a Technical Interview
jacobian
276
23k
CSS Pre-Processors: Stylus, Less & Sass
bermonpainter
356
29k
How To Stay Up To Date on Web Technology
chriscoyier
788
250k
Documentation Writing (for coders)
carmenintech
65
4.4k
Designing for humans not robots
tammielis
250
25k
"I'm Feeling Lucky" - Building Great Search Experiences for Today's Users (#IAC19)
danielanewman
226
22k
個人開発の失敗を避けるイケてる考え方 / tips for indie hackers
panda_program
93
16k
Exploring the Power of Turbo Streams & Action Cable | RailsConf2023
kevinliebholz
27
4.3k
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro .csv / MIME:text/csv Unknown birthdate (80s?) RFC 4180
(2005)
@Kartones Intro Plain text Simple format Simple rules
@Kartones Usage
@Kartones CSV 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!",
{""value"":""es""}
@Kartones WKT: Well-Known Text POINT (30 10) LINESTRING (30 10,
10 30, 40 40) POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10)) MULTIPOINT ((10 40), (40 30), (20 20), (30 10)) MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5))) https://en.wikipedia.org/wiki/Well-known_text
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
@Kartones GeoJSON { "type": "Feature", "geometry": { "type": "Point", "coordinates":
[125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
@Kartones IMPORT ISSUES
@Kartones Typical Huge files (>1GB) Lots of rows (+2M) Lots
of columns (~1600) XLS/XLSX -> CSV
@Kartones Typical Stream HTTP downloaded file Stream file between servers
Stream data import to DB
@Kartones Typical
@Kartones CartoDB-specific Content guessing (e.g. lat/lon) Type guessing Geometry errors
fixing Sync tables -> No downtime allowed
@Kartones DB-Specific Leave DB indexes as last step Prefer big
INSERT to multiple UPDATE GDAL’s ogr2ogr > Ruby/Python scripts http://www.gdal.org/ogr2ogr.html
@Kartones Questions? Thanks!
[email protected]