Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
CartoDB
Agenda
1) CSV Format Issues
2) Import Issues
CSV FORMAT ISSUES
Intro: extension .csv, MIME type text/csv. Unknown birthdate (the 80s?), yet only documented as RFC 4180 in 2005.
Intro: plain text, a simple format with simple rules.
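Simple rules still rule out splitting on commas. The Python standard library `csv` module can parse a made-up sample that hits three RFC 4180 rules at once: a quoted field containing a comma, a doubled double quote, and a line break inside a quoted field. This is a minimal sketch, and the helper names are illustrative:

```python
import csv
import io
from typing import List

# Made-up sample exercising three RFC 4180 rules: a field with a comma is
# quoted, a double quote inside a quoted field is doubled, and a quoted
# field may contain a line break.
RAW = 'id,name,notes\r\n1,"Doe, John","He said ""hi""\nthen left"\r\n'


def parse_csv(text: str) -> List[List[str]]:
    """Parse CSV text with the standard library reader."""
    # newline="" keeps line endings untouched so the reader itself decides
    # which line breaks end a record and which belong to a quoted field.
    return list(csv.reader(io.StringIO(text, newline="")))


def naive_split(text: str) -> List[List[str]]:
    """What splitting on line breaks and commas would produce (wrong)."""
    return [line.split(",") for line in text.splitlines()]


rows = parse_csv(RAW)
print(rows)
# [['id', 'name', 'notes'], ['1', 'Doe, John', 'He said "hi"\nthen left']]
print(len(naive_split(RAW)), "naive records vs", len(rows), "real records")
# 3 naive records vs 2 real records
```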
Usage
CSV (sample row): 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!", {""value"":""es""}
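The first field of the sample row appears to be a hex-encoded EWKB geometry, which is the form PostGIS prints by default. That is an assumption about how this export was produced. A small decoder shows what the field contains. It is written only for illustration and handles 2D points only:

```python
import struct
from typing import Optional, Tuple


def decode_ewkb_point(hex_str: str) -> Tuple[Optional[int], float, float]:
    """Decode a hex-encoded 2D EWKB point into (srid, x, y).

    Sketch only: no Z/M coordinates, no other geometry types.
    """
    data = bytes.fromhex(hex_str)
    # Byte 0 is the byte order: 0 = big endian (XDR), 1 = little endian (NDR).
    endian = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    # EWKB keeps flags in the high bits of the type word:
    # 0x80000000 = Z, 0x40000000 = M, 0x20000000 = SRID present.
    if geom_type & 0xC0000000:
        raise ValueError("Z/M coordinates are not supported by this sketch")
    if (geom_type & 0x0FFFFFFF) != 1:
        raise ValueError("not a point")
    offset = 5
    srid = None
    if geom_type & 0x20000000:
        (srid,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
    x, y = struct.unpack_from(endian + "dd", data, offset)
    return srid, x, y


sample = "0101000020E610000000000000008049C000000000000038C0"
print(decode_ewkb_point(sample))  # (4326, -51.0, -24.0)
```

Under that reading, the cell holds POINT(-51 -24) with SRID 4326 (WGS 84). An importer that only looks for WKT or lat/lon columns would treat it as opaque text.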
WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
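WKT coordinate lists contain commas, so a WKT column must be quoted in CSV. The sketch below is regex-based and not a full WKT parser, and its helper names are hypothetical. It pulls the type and coordinates out of the samples above and shows the quoting the csv writer applies:

```python
import csv
import io
import re
from typing import List, Tuple

# One "x y" pair; Z/M values, EMPTY and exponent notation are not handled.
_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def wkt_type_and_points(wkt: str) -> Tuple[str, List[Tuple[float, float]]]:
    """Return the WKT geometry type and its 2D coordinates, flattened.

    Ring and part nesting is discarded; only the coordinate pairs remain.
    """
    geom_type = wkt.split("(", 1)[0].strip().upper()
    points = [(float(x), float(y)) for x, y in _PAIR.findall(wkt)]
    return geom_type, points


def wkt_csv_line(row_id: int, wkt: str) -> str:
    """Serialize one (id, geometry) row; commas inside WKT force quoting."""
    buf = io.StringIO()
    csv.writer(buf).writerow([row_id, wkt])
    return buf.getvalue()


print(wkt_type_and_points("LINESTRING (30 10, 10 30, 40 40)"))
# ('LINESTRING', [(30.0, 10.0), (10.0, 30.0), (40.0, 40.0)])
print(repr(wkt_csv_line(1, "POINT (30 10)")))
# '1,POINT (30 10)\r\n'
print(repr(wkt_csv_line(2, "LINESTRING (30 10, 10 30, 40 40)")))
# '2,"LINESTRING (30 10, 10 30, 40 40)"\r\n'
```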
WKB: Well-Known Binary
POINT(2.0 4.0) = 000000000140000000000000004010000000000000
https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
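WKB is the same geometry packed as bytes: a byte-order flag, a 32-bit geometry type (1 for Point), and the coordinates as 64-bit floats. A minimal Python `struct` sketch reproduces the big-endian hex shown above. The function name is illustrative:

```python
import struct


def point_to_wkb_hex(x: float, y: float, big_endian: bool = True) -> str:
    """Encode a 2D point as plain OGC WKB and return it as uppercase hex."""
    # Layout: uint8 byte order (0 = big endian, 1 = little endian),
    # uint32 geometry type (1 = Point), then two float64 coordinates.
    order_flag = 0 if big_endian else 1
    fmt = ">BIdd" if big_endian else "<BIdd"
    return struct.pack(fmt, order_flag, 1, x, y).hex().upper()


print(point_to_wkb_hex(2.0, 4.0))
# 000000000140000000000000004010000000000000
```

The same point in little-endian order starts with `0101000000`, which is why a reader must check the first byte before unpacking anything else.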
GeoJSON
{
  "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
  "properties": { "name": "Dinagat Islands" }
}
http://geojson.org/
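GeoJSON stores positions as [longitude, latitude]. The minimal sketch below handles Point features only, and its helper names are hypothetical. It turns the feature above into WKT. It also shows that a JSON value written into a CSV cell gets its inner quotes doubled, the same doubling seen in the `{""value"":""es""}` field of the sample row:

```python
import csv
import io
import json

FEATURE_TEXT = """
{ "type": "Feature",
  "geometry": { "type": "Point", "coordinates": [125.6, 10.1] },
  "properties": { "name": "Dinagat Islands" } }
"""


def point_feature_to_wkt(feature: dict) -> str:
    """Convert a GeoJSON Point Feature to WKT.

    GeoJSON positions are [longitude, latitude], i.e. x then y, which is the
    order PostGIS-style WKT expects, so nothing is swapped.
    """
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("this sketch only handles Point geometries")
    x, y = geometry["coordinates"][:2]
    return f"POINT ({x} {y})"


def json_csv_cell(value: dict) -> str:
    """Serialize a JSON object as one CSV field (inner quotes get doubled)."""
    buf = io.StringIO()
    csv.writer(buf).writerow([json.dumps(value, separators=(",", ":"))])
    return buf.getvalue().rstrip("\r\n")


feature = json.loads(FEATURE_TEXT)
print(point_feature_to_wkt(feature))   # POINT (125.6 10.1)
print(json_csv_cell({"value": "es"}))  # "{""value"":""es""}"
```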
IMPORT ISSUES
Typical: huge files (>1GB), lots of rows (2M+), lots of columns (~1600), XLS/XLSX -> CSV conversion.
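With files over 1GB and millions of rows, the importer cannot load everything into memory. A minimal Python sketch reads rows lazily and raises the csv module's per-field size limit for huge geometry cells. It also rejects overly wide headers. The 1600 default mirrors PostgreSQL's maximum of 1600 columns per table. The function name and the policy of skipping ragged rows are assumptions:

```python
import csv
import io
from typing import Iterator, List, TextIO

# Large polygons serialized as WKT or GeoJSON easily exceed the csv module's
# default per-field limit of 131072 characters. 2**31 - 1 fits in a C long on
# every platform (sys.maxsize can overflow on Windows). This is process-wide.
csv.field_size_limit(2**31 - 1)


def iter_rows(stream: TextIO, max_columns: int = 1600) -> Iterator[List[str]]:
    """Yield data rows one by one, so memory use does not grow with file size.

    Rows whose field count differs from the header are skipped; that policy
    is a hypothetical choice for this sketch (an importer might log or fix them).
    """
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:  # empty input
        return
    if len(header) > max_columns:
        raise ValueError(f"{len(header)} columns exceeds limit of {max_columns}")
    for row in reader:
        if len(row) == len(header):
            yield row


big_cell = "x" * 200_000  # larger than the default field limit
data = "id,geom\n1," + big_cell + "\n2,short\n3\n"
for row in iter_rows(io.StringIO(data, newline="")):
    print(row[0], len(row[1]))
# 1 200000
# 2 5
```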
Typical: stream the HTTP-downloaded file, stream the file between servers, stream the data import into the DB.
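Streaming end to end means the rows flow from the input stream into the database in fixed-size batches. In this sketch `sqlite3` stands in for the real target so that it runs anywhere; CartoDB itself stores data in PostgreSQL/PostGIS. `stream_import` and `batched` are hypothetical names:

```python
import csv
import io
import sqlite3
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, TextIO


def batched(rows: Iterable[Sequence[str]], size: int) -> Iterator[List[Sequence[str]]]:
    """Group an iterator into lists of at most `size` items."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def stream_import(stream: TextIO, conn: sqlite3.Connection, table: str,
                  batch_size: int = 1000) -> int:
    """Copy CSV rows from `stream` into `table` without loading the whole file.

    Column names come straight from the header; a real importer must
    sanitize them (this sketch assumes they are plain identifiers).
    """
    reader = csv.reader(stream)
    header = next(reader)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    insert = f'INSERT INTO "{table}" VALUES ({placeholders})'
    total = 0
    for chunk in batched(reader, batch_size):
        conn.executemany(insert, chunk)  # one executemany call per batch
        total += len(chunk)
    conn.commit()
    return total


csv_text = "id,name\n" + "".join(f"{i},row{i}\n" for i in range(2500))
conn = sqlite3.connect(":memory:")
print(stream_import(io.StringIO(csv_text, newline=""), conn, "imported"))  # 2500
```

For an HTTP source, the same function could in principle read from `io.TextIOWrapper(urllib.request.urlopen(url), encoding="utf-8", newline="")`, though that path is not exercised here.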
Typical
CartoDB-specific: content guessing (e.g. lat/lon columns), type guessing, geometry error fixing, sync tables -> no downtime allowed.
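Content and type guessing can be approximated with two heuristics. Lat/lon columns are found by header name plus a value-range check. Column types are found by trial parsing. This is a hedged sketch, not CartoDB's actual logic, and the alias lists and type names are assumptions:

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical header aliases; a production importer would use a longer list.
LAT_NAMES = {"lat", "latitude", "y"}
LON_NAMES = {"lon", "lng", "long", "longitude", "x"}


def _all_parse(values: List[str], parser: Callable[[str], object]) -> bool:
    try:
        for value in values:
            parser(value)
    except ValueError:
        return False
    return True


def guess_type(values: Sequence[str]) -> str:
    """Guess a column type from sample cells: boolean, integer, float or text."""
    cells = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if not cells:
        return "text"
    if all(c.lower() in {"true", "false"} for c in cells):
        return "boolean"
    if _all_parse(cells, int):
        return "integer"
    if _all_parse(cells, float):
        return "float"
    return "text"


def _within(values: Sequence[str], limit: float) -> bool:
    cells = [v for v in values if v.strip()]
    return bool(cells) and _all_parse(cells, float) and all(
        -limit <= float(v) <= limit for v in cells)


def guess_lat_lon(header: Sequence[str],
                  rows: Sequence[Sequence[str]]) -> Optional[Tuple[str, str]]:
    """Return (lat_column, lon_column) if both can be found, else None.

    Combines a name match with a range check: latitudes in [-90, 90],
    longitudes in [-180, 180].
    """
    columns = {name: [row[i] for row in rows] for i, name in enumerate(header)}
    lat = next((n for n in header
                if n.strip().lower() in LAT_NAMES and _within(columns[n], 90)), None)
    lon = next((n for n in header
                if n.strip().lower() in LON_NAMES and _within(columns[n], 180)), None)
    return (lat, lon) if lat and lon else None


header = ["name", "Latitude", "Longitude", "visits"]
rows = [["a", "40.4", "-3.7", "10"], ["b", "41.3", "2.1", "7"]]
print(guess_lat_lon(header, rows))  # ('Latitude', 'Longitude')
print([guess_type([r[i] for r in rows]) for i in range(len(header))])
# ['text', 'float', 'float', 'integer']
```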
DB-specific: leave DB indexes as the last step; prefer one big INSERT over multiple UPDATEs; GDAL's ogr2ogr > Ruby/Python scripts. http://www.gdal.org/ogr2ogr.html
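The first two pieces of DB-specific advice can be illustrated with `sqlite3` standing in for PostgreSQL. Raw rows go into an unindexed table, and the final table is produced with one set-based `INSERT ... SELECT` rather than one `UPDATE` per row; that is our reading of "prefer big INSERT". The index is created at the very end. Table and index names are hypothetical:

```python
import sqlite3
from typing import Iterable, Tuple


def load_then_index(conn: sqlite3.Connection,
                    rows: Iterable[Tuple[int, str, str]]) -> None:
    """Bulk load raw text, derive the clean table in one statement, index last."""
    # 1) Raw staging table with no indexes: inserts only append rows.
    conn.execute("CREATE TABLE raw_points (id INTEGER, lat TEXT, lon TEXT)")
    conn.executemany("INSERT INTO raw_points VALUES (?, ?, ?)", rows)
    # 2) One set-based INSERT ... SELECT casts and filters every row at once,
    #    instead of an UPDATE statement per row to fix values in place.
    conn.execute("CREATE TABLE points (id INTEGER, lat REAL, lon REAL)")
    conn.execute(
        "INSERT INTO points (id, lat, lon) "
        "SELECT id, CAST(lat AS REAL), CAST(lon AS REAL) FROM raw_points "
        "WHERE lat <> '' AND lon <> ''"
    )
    conn.execute("DROP TABLE raw_points")
    # 3) Build the index once, over the complete data set.
    conn.execute("CREATE INDEX points_id_idx ON points (id)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load_then_index(conn, [(1, "40.4", "-3.7"), (2, "", ""), (3, "41.3", "2.1")])
print(conn.execute("SELECT COUNT(*) FROM points").fetchone()[0])  # 2
```

The ogr2ogr recommendation is not shown here; the point of the sketch is only the ordering of load, transform and index steps.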
Questions? Thanks!