Geospatial CSV Imports Hidden Complexity
Kartones
October 27, 2015
Programming
@ Mindcamp 7.0 2015, Madrid
Transcript
@Kartones GEOSPATIAL CSV IMPORTS HIDDEN COMPLEXITY
@Kartones CartoDB
@Kartones Agenda 1) CSV Format Issues 2) Import Issues
@Kartones CSV FORMAT ISSUES
@Kartones Intro: .csv / MIME: text/csv; unknown birthdate (80s?); RFC 4180 (2005)
@Kartones Intro: plain text; simple format; simple rules
@Kartones Usage
@Kartones CSV: 0101000020E610000000000000008049C000000000000038C0,1083 "alien",2014-11-04 15:24:40.43413+00 category 1, "jump jump up!", {""value"":""es""}
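The sample above hints at why "simple rules" are not that simple: quoted fields may contain commas, and inner quotes are doubled. A minimal sketch of reading such a row with Python's standard csv module follows. The row itself is a hypothetical reconstruction (the slide's original line breaks are lost in the transcript), so treat its layout as an assumption.

```python
import csv
import io
import json

# Hypothetical row, loosely based on the slide: a hex geometry, an id,
# a quoted text containing a comma, and a JSON value whose inner quotes
# are doubled, as RFC 4180 describes.
raw = (
    "0101000020E610000000000000008049C000000000000038C0,1083,"
    '"jump, jump up!","{""value"":""es""}"\r\n'
)

# newline="" leaves line endings untouched so the csv module can handle them.
row = next(csv.reader(io.StringIO(raw, newline="")))
geom_hex, row_id, text, payload = row

# The doubled quotes are collapsed by the reader, leaving valid JSON.
lang = json.loads(payload)["value"]
print(len(row), text, lang)
```

Splitting the same line on "," by hand would yield five pieces instead of four, which is the kind of hidden complexity the talk is about.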
@Kartones WKT: Well-Known Text
POINT (30 10)
LINESTRING (30 10, 10 30, 40 40)
POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))
MULTIPOINT ((10 40), (40 30), (20 20), (30 10))
MULTIPOLYGON (((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))
https://en.wikipedia.org/wiki/Well-known_text
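To make the WKT shapes above concrete, here is a small sketch that parses only the two simplest cases, POINT and LINESTRING. The helper name parse_simple_wkt is hypothetical, and a real importer would more likely rely on an existing geometry library than on a regular expression.

```python
import re

def parse_simple_wkt(wkt):
    """Parse POINT and LINESTRING only; other WKT types are rejected."""
    match = re.fullmatch(
        r"\s*(POINT|LINESTRING)\s*\(([^()]*)\)\s*", wkt, flags=re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unsupported or malformed WKT: {wkt!r}")
    kind = match.group(1).upper()
    # Coordinates are space-separated numbers; vertices are comma-separated.
    coords = [
        tuple(float(number) for number in vertex.split())
        for vertex in match.group(2).split(",")
    ]
    if kind == "POINT":
        if len(coords) != 1:
            raise ValueError("a POINT must have exactly one vertex")
        return kind, coords[0]
    return kind, coords

print(parse_simple_wkt("POINT (30 10)"))
print(parse_simple_wkt("LINESTRING (30 10, 10 30, 40 40)"))
```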
@Kartones WKB: Well-Known Binary POINT(2.0 4.0) = 000000000140000000000000004010000000000000 https://en.wikipedia.org/wiki/Well-known_text#Well-known_binary
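The hex string above can be unpacked by hand, which shows what the format carries: a byte-order flag, a geometry type, then the coordinates as doubles. The sketch below decodes point geometries only. It also assumes that the first field of the earlier CSV sample is PostGIS-style EWKB, where the bit 0x20000000 in the type marks an embedded SRID; the function name decode_point is hypothetical.

```python
import struct

WKB_POINT = 1
EWKB_SRID_FLAG = 0x20000000  # PostGIS extension bit, not part of plain WKB

def decode_point(hex_text):
    """Return (srid, x, y) for a WKB or EWKB point given as hex text."""
    data = bytes.fromhex(hex_text)
    # First byte: 1 means little-endian, 0 means big-endian.
    endian = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        (srid,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
        geom_type &= ~EWKB_SRID_FLAG
    if geom_type != WKB_POINT:
        raise ValueError("this sketch only decodes points")
    x, y = struct.unpack_from(endian + "2d", data, offset)
    return srid, x, y

# The big-endian example from the slide.
print(decode_point("000000000140000000000000004010000000000000"))
# The first field of the CSV sample, assumed here to be EWKB.
print(decode_point("0101000020E610000000000000008049C000000000000038C0"))
```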
@Kartones GeoJSON: { "type": "Feature", "geometry": { "type": "Point", "coordinates": [125.6, 10.1] }, "properties": { "name": "Dinagat Islands" } } http://geojson.org/
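A short sketch of turning the GeoJSON feature above into a name plus a WKT point, using only the standard json module. One detail worth keeping in mind is that GeoJSON stores coordinates as longitude first, then latitude. The function name point_feature_to_wkt is hypothetical.

```python
import json

def point_feature_to_wkt(feature_text):
    """Return (name, wkt) for a GeoJSON Feature holding a Point."""
    feature = json.loads(feature_text)
    geometry = feature["geometry"]
    if geometry["type"] != "Point":
        raise ValueError("this sketch only handles Point geometries")
    # GeoJSON order is longitude, latitude (x, y).
    lon, lat = geometry["coordinates"][:2]
    return feature["properties"].get("name"), f"POINT ({lon} {lat})"

sample = (
    '{"type": "Feature", "geometry": {"type": "Point", '
    '"coordinates": [125.6, 10.1]}, "properties": {"name": "Dinagat Islands"}}'
)
name, wkt = point_feature_to_wkt(sample)
print(name, wkt)
```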
@Kartones IMPORT ISSUES
@Kartones Typical: huge files (>1GB); lots of rows (2M+); lots of columns (~1600); XLS/XLSX -> CSV
@Kartones Typical: stream the HTTP-downloaded file; stream the file between servers; stream the data import to the DB
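The streaming idea above can be sketched as a generator that reads a CSV stream in fixed-size batches, so a file larger than memory is never loaded whole. This is an illustration only, not CartoDB's importer; iter_batches and the batch size are hypothetical.

```python
import csv
import io
from itertools import islice

def iter_batches(text_stream, batch_size):
    """Yield (header, rows) pairs, holding at most batch_size rows at once."""
    reader = csv.reader(text_stream)
    header = next(reader)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield header, batch

# Any text stream works: an open file, a decoded HTTP response, a pipe.
stream = io.StringIO("name,lat,lon\na,1,2\nb,3,4\nc,5,6\nd,7,8\ne,9,10\n")
sizes = [len(rows) for _, rows in iter_batches(stream, batch_size=2)]
print(sizes)
```

Each batch can then be handed to the database step while the next one is still being read.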
@Kartones Typical
@Kartones CartoDB-specific: content guessing (e.g. lat/lon); type guessing; fixing geometry errors; sync tables -> no downtime allowed
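As a rough illustration of content and type guessing, the sketch below picks a column type by trying casts on sampled values and looks for latitude/longitude columns by name. The slides do not describe CartoDB's actual heuristics, so the name lists, the function names, and the cast order are all assumptions.

```python
# Hypothetical name lists; a real importer would likely know many more.
LAT_NAMES = {"lat", "latitude"}
LON_NAMES = {"lon", "lng", "long", "longitude"}

def guess_type(values):
    """Guess 'integer', 'float' or 'text' from sampled string values."""
    sample = [value.strip() for value in values if value.strip()]
    if not sample:
        return "text"
    for type_name, cast in (("integer", int), ("float", float)):
        try:
            for value in sample:
                cast(value)
            return type_name
        except ValueError:
            continue
    return "text"

def guess_lat_lon(header):
    """Return the (lat, lon) column names found in the header, or None."""
    by_lowered = {name.strip().lower(): name for name in header}
    lat = next((by_lowered[n] for n in by_lowered if n in LAT_NAMES), None)
    lon = next((by_lowered[n] for n in by_lowered if n in LON_NAMES), None)
    return lat, lon

print(guess_type(["1", "2", ""]), guess_type(["1.5", "2"]), guess_type(["a", "1"]))
print(guess_lat_lon(["Name", "Latitude", "LNG"]))
```

Guessing from a sample is cheap but can be wrong further down the file, so an importer needs a fallback when a later row does not fit the guessed type.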
@Kartones DB-specific: leave DB indexes as the last step; prefer one big INSERT to multiple UPDATEs; GDAL's ogr2ogr > Ruby/Python scripts; http://www.gdal.org/ogr2ogr.html
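One possible reading of the first two database tips, sketched with Python's built-in sqlite3 as a stand-in for the real database: load raw text into a staging table, convert it with a single INSERT ... SELECT instead of updating rows one by one, and create the index only after the data is in place. Table and column names are hypothetical.

```python
import sqlite3

def load_points(raw_rows):
    """Load (name, lat, lon) text rows and return the open connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (name TEXT, lat TEXT, lon TEXT)")
    conn.execute("CREATE TABLE points (name TEXT, lat REAL, lon REAL)")
    with conn:  # one transaction for the whole load
        conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", raw_rows)
        # One big INSERT does the type conversion, no per-row UPDATEs.
        conn.execute(
            "INSERT INTO points "
            "SELECT name, CAST(lat AS REAL), CAST(lon AS REAL) FROM staging"
        )
        # The index comes last, so it is built once over the final data.
        conn.execute("CREATE INDEX points_name_idx ON points (name)")
    return conn

conn = load_points([("a", "1.5", "2.0"), ("b", "-3", "4.25")])
loaded = conn.execute("SELECT name, lat, lon FROM points ORDER BY name").fetchall()
print(loaded)
```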
@Kartones Questions? Thanks!